I did this. Fuck what anyone else says, just put the pedal to the metal and BUILD.
Push spaghetti code.
Nobody cares about OOP.
Doesn’t matter what anyone thinks.
Just keep on doing.
Document in public.
Don’t listen to the haters.
Release more than you refactor.
Just keep…
We made Whisper even faster. ~40% faster!! 🔥
Whisper solidifies its lead by an even larger margin!
With the latest changes in transformers - large v3 is the best* and the fastest among the top 5 on the Open ASR Leaderboard.
Below is the reduction in Real-time factor (RTF):…
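The RTF comparison above boils down to simple arithmetic; here's a minimal sketch (the timing numbers are illustrative, not the actual benchmark):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time taken to transcribe / duration of the audio (lower is better)."""
    return processing_seconds / audio_seconds

# Illustrative numbers only: 10 minutes of audio transcribed in 15 seconds.
rtf = real_time_factor(15.0, 600.0)
print(rtf)  # 0.025, i.e. ~40x faster than real time
```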
Insanely fast whisper now with Speaker Diarisation! 🔥
100% local and works on your Mac or on Nvidia GPUs.
All thanks to @hbredin's Pyannote library, you can now get blazingly fast transcriptions and speaker segmentations! ⚡️
Here's how you can use it too:
pipx install…
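Under the hood, combining Whisper chunks with Pyannote speaker turns is essentially an interval-overlap assignment. A minimal sketch of that merge step, with hypothetical model outputs (not the actual insanely-fast-whisper implementation):

```python
def overlap(a, b):
    """Length (seconds) of the intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def assign_speakers(asr_chunks, speaker_turns):
    """Label each transcribed chunk with the speaker turn it overlaps the most."""
    return [
        (max(speaker_turns, key=lambda turn: overlap(span, turn[1]))[0], text)
        for text, span in asr_chunks
    ]

# Hypothetical outputs from the two models:
chunks = [("Hello there.", (0.0, 2.0)), ("Hi, how are you?", (2.1, 4.0))]
turns = [("SPEAKER_00", (0.0, 2.05)), ("SPEAKER_01", (2.05, 4.5))]
print(assign_speakers(chunks, turns))
# [('SPEAKER_00', 'Hello there.'), ('SPEAKER_01', 'Hi, how are you?')]
```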
Whisper powered by Apple Neural Engine! 🔥
The lads at @argmaxinc optimised Whisper to work at blazingly fast speeds on iOS and Mac!
> All code is MIT-licensed.
> Up to 3x faster than the competition.
> Neural Engine as well as Metal runners.
> Open source CoreML models.
> 2…
Mixtral 8x7B Instruct with AWQ & Flash Attention 2 🔥
All in ~24GB GPU VRAM!
With the latest release of AutoAWQ - you can now run Mixtral 8x7B MoE with Flash Attention 2 for blazingly fast inference.
All in < 10 lines of code.
The only real change except loading AWQ weights…
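The ~24GB figure follows from back-of-the-envelope weight-memory math (the parameter count is approximate; activations and KV cache are ignored):

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone (no activations/KV cache)."""
    return n_params * bits_per_param / 8 / 1e9

MIXTRAL_PARAMS = 46.7e9  # all 8 experts stay resident: ~46.7B total parameters

print(weight_memory_gb(MIXTRAL_PARAMS, 16))  # fp16: ~93 GB, far beyond one GPU
print(weight_memory_gb(MIXTRAL_PARAMS, 4))   # 4-bit AWQ: ~23 GB, fits in 24 GB
```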
Insanely fast whisper now with Whisper Large V3 🔥
Transcribe 150 minutes of audio in less than 98 seconds (powered by Transformers & @tri_dao's Flash Attention 2).
Don't believe it? look at the benchmarks below ;)
All of this with the familiar Transformers API and optionally…
Introducing Command R Plus ⚡
> Beats Claude 3, Mistral Large and GPT-4 Turbo.
> 104 Billion parameters.
> Built with multi-step tool use and RAG.
> Supports 10 languages.
> Context length of 128K.
> Trained with grounded generation capabilities - citations and responses based on…
Alrighty! W2V-BERT 2.0: Speech encoder for low-resource languages! 🔥
With < 15 hours of audio, you can beat Whisper and get your own SoTA ASR model!
> Pre-trained on 4.5M hours of data.
> 600M parameters.
> 143+ languages.
> 10-30x faster than Whisper.
> Best part: MIT license…
After 70x faster Whisper, we present to you - 5x faster Whisper fine-tuning! ⚡️
Powered by LoRA and 🤗 PEFT - squeeze in 5x larger batch sizes and fit the Whisper-large checkpoint in < 8GB VRAM! 🔥
Best part? With almost no degradation in WER! 🤯
Check it out:
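The reason LoRA frees up so much memory: only small low-rank adapters are trained, not the full weight matrices. A rough sketch with an illustrative layer size (not Whisper's exact configuration):

```python
def lora_trainable(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for a rank-r LoRA adapter (A: d_in x r, B: r x d_out)."""
    return r * (d_in + d_out)

# Illustrative single projection matrix; Whisper-large layers are 1280-wide.
full = 1280 * 1280                        # 1,638,400 weights if trained directly
lora = lora_trainable(1280, 1280, r=32)   # 81,920 adapter weights
print(f"{100 * lora / full:.1f}%")        # 5.0% of the matrix is trainable
```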
4-bit quantised Mistral 7B instruct v0.2! - fasttt! 🏎️
On Mac (M2). Powered by MLX. Fully local.
Requires < 10GB RAM for 4-bits. (GPU poors, rise up)
Have to say, MLX is a solid alternative to llama.cpp
Welcome OpenVoice! 🎙️
A versatile voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages.
Open access weights 🔥
It enables granular control over voice styles, including…
Current best local model:
1. LLM - Mistral Instruct v0.2 7B/ Command R (4bit)
2. TTS - Parler-TTS/ Style-TTS 2
3. ASR - distil-whisper/ faster-whisper
4. VLM - Idefics 2/ CogVLM
Best stack:
1. Use llama.cpp to run LLM/ VLM via the server
2. Transformers to run Parler TTS/…
LETS GOO! Parler TTS 🔥
A fully open-source, Apache 2.0 licensed Text-to-speech model focused on providing maximum controllability.
Through voice prompts, you can control the pitch, speed, gender, noise levels, emotion characteristics and more!
> Trained on 10K hours of…
Let's go, 200% faster Whisper w/ speculative decoding! 🔥
Whisper (baseline) - 73 seconds
Whisper w/ Speculative Decoding - 33 seconds
All with zero drop in performance! ⚡
Pseudocode:
1. Initialise a Teacher model ex: openai/whisper-large-v2.
2. Load an assistant model ex:…
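The teacher/assistant loop above can be sketched as a toy greedy speculative decoder. The lambdas below are deterministic stand-ins for the real models, purely to show the accept/reject mechanics:

```python
def speculative_decode(target, draft, prompt, n_tokens, k=4):
    """Toy greedy speculative decoding over integer token lists.

    The cheap `draft` model proposes k tokens; `target` checks each one and
    keeps the longest agreeing prefix, so the output is identical to plain
    greedy decoding with `target` alone - just with fewer slow calls whenever
    the two models agree.
    """
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        ctx, proposal = list(out), []
        for _ in range(k):                   # assistant drafts k tokens cheaply
            proposal.append(draft(ctx))
            ctx.append(proposal[-1])
        for t in proposal:                   # teacher verifies the draft
            if target(out) == t:
                out.append(t)                # accepted draft token
            else:
                out.append(target(out))      # first mismatch: teacher's token
                break
    return out[len(prompt):len(prompt) + n_tokens]

# Deterministic stand-ins for teacher/assistant next-token functions:
target = lambda ctx: len(ctx) % 5
draft = lambda ctx: len(ctx) % 5 if len(ctx) % 7 else 0  # occasionally wrong
print(speculative_decode(target, draft, [1, 2], 6))  # [2, 3, 4, 0, 1, 2]
```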
Oof! Whisper on @Apple's MLX backend is quite stonkingly fast! 🏃
Not only that, it optimises GPU + CPU usage quite well!
What is MLX?
MLX is a framework released by Apple for ML researchers to train and infer ML models efficiently. MLX has a Python API that closely follows…
Run Mixtral 8x7B w/ ~13 GB VRAM 🤯
*On a free colab too, powered by Transformers & AQLM!
AQLM is a new SOTA method for low-bitwidth LLM quantization, targeted to the “extreme” 2-3bit / parameter range.
In less than 5 lines of code, you can try it out too! ⚡
Make sure to…
Insanely fast whisper - now with a CLI⚡️
You can now translate/ transcribe 100s of hours of data across 99 languages! - all from your terminal.
Here's how you can use it:
1. Install requirements
pip install transformers accelerate optimum
2. Grab the transcribe py file and…
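For long files, the pipeline chunks audio into overlapping windows and merges the outputs. A sketch of the windowing step (the chunk and stride lengths here are illustrative, not necessarily the CLI's defaults):

```python
def chunk_bounds(duration_s, chunk_s=30.0, stride_s=5.0):
    """(start, end) windows for chunked long-form transcription.

    Consecutive windows overlap by `stride_s` seconds so words cut at a
    boundary can be recovered when the chunk outputs are merged.
    """
    bounds, start = [], 0.0
    while start < duration_s:
        bounds.append((start, min(start + chunk_s, duration_s)))
        start += chunk_s - stride_s
    return bounds

print(chunk_bounds(70.0))
# [(0.0, 30.0), (25.0, 55.0), (50.0, 70.0)]
```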
Whisper running on WatchOS! 🔥
> Powered by WhisperKit by @argmaxinc
> Supports up to Whisper base
> Leverages Neural Engine ⚡
> Three lines of code ;)
> Works real-time!
> MIT license
Quite amazed by the speed with which Argmax is shipping.
Possibly the fastest & reliable…
Whisper Large V3 has landed in Transformers! 🎉
The large-v3 checkpoint open-sourced by OpenAI yesterday is now fully compatible with Transformers!
Best part: It is fully compatible with the ASR pipeline! Here's how you can use it:
import torch
from transformers import…
Introducing Distil-Whisper v3 ⚡
> ~50% fewer parameters and 6x faster than Large-v3.
> More accurate than large-v3 on long-form transcription.
Available with 🦀 WebGPU, Whisper.cpp, Transformers, Faster-Whisper and Transformers.js support!
Drop in; no changes are required! 🔥
Introducing Open TTS Tracker! 🗣️
*sound on*
A one-stop shop to track all open access/ source TTS models!
Ranging from XTTS to Pheme, OpenVoice to VITS, and more... ⚡
For each model, we compile:
1. Source code
2. Checkpoints
3. License
4. Fine-tuning code
5. Languages…
Mistral QLoRA w/ MLX on your Mac ⚡
Utilising 100% GPU, fully offline.
You can now convert any Hugging Face model to a Quantised format and use it to fine-tune on-device!
python convert.py --hf-path mistralai/Mistral-7B-v0.1 -q
Then, to fine-tune, run:
python lora.py…
Introducing MLX-LM! ⚡ *sound on*
Run LLMs on-device directly on your Mac with 3 lines of code! ;)
100% local and quite spiffy (even faster with 4-bit)!
I made a quick video covering the package, its capabilities and a bit of quantisation.
The video goes through what MLX is,…
Insanely fast whisper now on Mac 🚀
You can now get the same Whisper experience in the comfort of your Mac, too! This is made possible by the torch.mps backend.
It isn't as fast as CUDA; however, it works pretty fast and can utilise the GPU well!
All you need to do is this:…
MusicLang 🎶 - Llama 2 based Music generation model!
> Llama2 based, trained from scratch.
> Permissively licensed - open source.
> Optimised to run on CPU. 🔥
> Highly controllable, choose tempo, chord progression, bar range and more! ;)
Absolutely love playing with the demo,…
MIT licensed Phi running on Mac powered by Rust! 🦀
Spiffy and fast, powered by Candle! ⚡
As simple as running:
cargo run --example phi --release --features metal -- --model 2 --prompt "A skier slides down a frictionless slope of height 40m and length 80m. What's the skiers…
Introducing Qwen 1.5! 🔥
> 6 model sizes, including 0.5B, 1.8B, 4B, 7B, 14B, and 72B.
> Beats GPT 3.5, Mistral-Medium.
> Multilingual support of both base and chat models.
> Supports 32K context length.
> Base + chat model checkpoints released.
> Runs natively with…
Introducing fast-llm rs! 🦀
Infer LLMs like Mistral, Llama and Mixtral on your Mac, right from your CLI!
Powered by Candle and Rust! ⚡
Works on Metal and CPU - Infer your GGUF checkpoints in pure Rust! ;)
All you gotta do is:
Step 1: git clone
https://github.…
High-quality speech/ text translations with SeamlessM4T v2 by @AIatMeta 🔉
M4T == Massively Multilingual and Multimodal Machine Translation seamlessly ;)
You can now translate in 100 languages from/ to speech or text with transformers!
Here's how you can do it, too! 👇
1.…
Nous Hermes Yi 34B beats Mixtral 8X7B 🔥
With AWQ, you only need ~20GB VRAM to run this beast, 100% local and offline!
Trained on 1M+ GPT-4-generated data points! (synthetic data ftw!)
Here's how you can run it, too (w/ transformers and AutoAWQ):
from transformers import…
Announcing TTS Arena! 🗣️
*sound on*
One place to test, rate and find the champion of current open models.
A continually updated space with the greatest and the best of the current TTS landscape! ⚡
Rate once, rate twice - help us find the best out there.
Starting with five…
NeMo Canary 1B by @NVIDIAAI 🔥
*Sound on 🔊*
> Tops the Open ASR Leaderboard.
> Beats Whisper to the punch for ASR.
> Beats Seamless M4Tv2 for Speech Translation.
> Supports 4 languages - English, Spanish, French & German.
> Trained on 85,000 hours of annotated audio.
>…
Let's goo! StyleTTS 2 - New king of the Text to Speech Arena! 👑
StyleTTS 2 is fully open source, and the authors are training better and larger checkpoints. 🔥
Stay tuned for some exciting updates re: StyleTTS v2 - things will get excitinggg!
Side note: 200 stars on the TTS…
OpenMath Instruct-1 by @NVIDIAAI 🧮
> 1.8 Million Problem-Solution (synthetic) pairs.
> Uses GSM8K & MATH training subsets.
> Uses Mixtral 8x7B to produce the pairs.
> Leverages both text reasoning + code interpreter during generation.
> Released Llama, CodeLlama, Mistral,…
Whisper Large-v3: New champion for the Open ASR leaderboard! 👑
We evaluated the latest Whisper checkpoint on a series of datasets and found it the most performant!
Here are a couple of quick takeaways from running these evaluations:
1. The best performance for Whisper…
Want to train your own Bark/MusicGen-like TTS/TTA models? 👀
The SoTA Encodec model by @MetaAI has now landed in 🤗Transformers!
It supports compression down to 1.5 kbps and produces discrete audio representations. ⚡️
Model:
Colab:
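EnCodec's lowest bandwidth setting (1.5 kbps) falls straight out of the residual-VQ arithmetic. A quick sanity check (frame rate and codebook size per the EnCodec paper; this is arithmetic, not the library API):

```python
def rvq_bitrate_kbps(frame_rate_hz, n_codebooks, bits_per_code):
    """Bitrate of a residual-VQ codec: frames/s x codebooks x bits per code."""
    return frame_rate_hz * n_codebooks * bits_per_code / 1000

# EnCodec at 24 kHz: 75 frames/s, 1024-entry codebooks (10 bits per index).
print(rvq_bitrate_kbps(75, 2, 10))  # 1.5 kbps with 2 codebooks
print(rvq_bitrate_kbps(75, 8, 10))  # 6.0 kbps with 8 codebooks
```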
BOOM! Whisper + Speaker Diarisation! 🔥
Blazingly fast meeting transcription all with a simple call to an API - powered by Inference Endpoints ⚡
- Whisper to transcribe speech to text (w/ Flash Attention)
- Diarization to break down the transcription by speakers (w/ Pyannote)…
Parakeet RNNT & CTC models top the Open ASR Leaderboard! 👑
Brought to you by @NVIDIAAI and @suno_ai_, Parakeet beats Whisper and regains its first place.
The models are released under a commercially permissive license! 🥳
The models inherit the same FastConformer…
Insanely fast whisper now with Flash Attention 2 🔥
With the latest release of Transformers (4.35), you can run Whisper & Distil-Whisper even faster with Flash Attention 2.
To benefit from it, make sure to upgrade your transformers & flash-attn version:
pip install --upgrade…
Distil-whisper now with apple neural engine support via WhisperKit! 🔥
You can now:
brew install whisperkit-cli
Followed by:
whisperkit-cli transcribe --model-prefix "distil" --model "large-v3" --verbose --audio-path ~/Downloads/jfk.wav
Bonus: If you have an M2 or higher…
UPDATE: New benchmark for insanely fast whisper! 🤗
You can transcribe 3000 hours of audio in less than 2 hours!
Batching + BetterTransformer is still the fastest way to transcribe audio insanely fast!
PSA 📣: MLX can now pull Mistral/ Llama/ TinyLlama safetensors directly from the Hub! 🔥
pip install -U mlx is all you need!
All mistral/ llama fine-tunes supported too! 20,000+ checkpoints overall!
P.S. We also provide a script to convert and quantise checkpoints and…
mixtral 8x22B - things we know so far 🫡
> 176B parameters
> performance between GPT-4 and Claude Sonnet (according to their Discord)
> same/ similar tokeniser as Mistral 7B
> 65536 sequence length
> 8 experts, 2 experts per token: More
> would require ~260GB VRAM in…
Welcome distil-whisper 🔥
49% smaller, 6x faster, and within 1% WER of Whisper-large-v2!
All in the good ol' Transformers API.
1. Make sure to upgrade transformers to the latest release.
pip install --upgrade transformers
2. Import torch & transformers…
Transcribe 150 minutes of Audio in less than 5 minutes with Whisper large! 🏎️
Powered by Transformers and Optimum, you get blazingly fast transcriptions in a few lines of code!
pipe = pipeline(
    "automatic-speech-recognition",
    "openai/whisper-large-v2",
    torch_dtype=torch.float16,…
恭喜发财 Gong xi fa cai 🧧
The impact of China on the current AI/ ML landscape has been ginormous. From LLMs to TTS to ASR, we've gotten SoTA models weekly from China-based labs!
Some highlights for me:
LLM/ VLMs
1. Qwen 1.5 & Qwen VL -
2. OpenBMB…
SURPRISE: Google just dropped CodeGemma 1.1 7B IT 🔥
The models get incrementally better at Single and Multi-generations.
Major boost in C#, Go and Python 🐍
Along with the 7B IT, they released an updated 2B base model too.
Enjoy!
Introducing @NVIDIAAI & @suno_ai_'s Parakeet-TDT! ✨
The latest in the Parakeet series, Nvidia & Suno beat Whisper again and won the Open ASR Leaderboard - this time by ~1 WER.
All of this while making the model ~175% faster than the previous generation. ⚡
Bonus:…
Making audio a first-class citizen in LLMs: Qwen Audio 🔉
Using a multi-task training framework, Qwen Audio combines OpenAI's Whisper-large-v2 (audio encoder) with the Qwen 7B LM to train jointly on over 30 audio tasks.
Tasks ranging from Speech Recognition to Music Captioning…
Whisper on MLX just got better! 🔥
Word-level timestamps + confidence scores and models on the 🤗Hub ;)
Don't forget to `git pull` before you get whisper-ing.
Kudos to @awnihannun & bofenghuang!
P.S. It now also supports Large-v3 \o/
llms this, llms that!
why aren't people releasing more audio stuff 😭
i want tts, asr, speech translation, voice cloning, text to audio, text to music, anything..
VITS is probably the most underrated TTS model out there!
At just 150M params, it runs in real time on CPU 🤯
Sure, it isn't the most realistic, but it does its job for most on-device use cases like reading an article, practising a language, etc.!!
Here's how you can use it with…
Open Whisper-style Speech Model (OWSM) 🔉
OWSM reproduces Whisper training using an open-source toolkit (ESPNet) and publicly available datasets. OWSM is much more efficient in training and is robust at multi-directional translations.
Open source training, inference scripts and…
Want high-quality Audio embeddings? CLAP! 👏
We support the latest general, music and speech CLAP models in Transformers! Use it for Text-to-Speech/ Text-to-Music training and more.
What is CLAP?
CLAP (Contrastive Language-Audio Pretraining) is a neural network trained on…
VILA by @NVIDIAAI & @MIT 🔥
> 13B, 7B and 2.7B model checkpoints.
> Beats the current SoTA models like QwenVL.
> Interleaved Vision + Text pre-training.
> Followed by joint SFT.
> Works with AWQ for 4-bit inference.
Models on the Hugging Face Hub:
Common Voice 16 by @mozilla is out on the Hub! 🔥
This brings a total of 30,328 hours of audio spread across 120 languages!
Out of the total 30K hours of audio, 19.5K are validated! ✨
You can access it all in less than 2 lines of code with the datasets library:
from datasets…
MASSIVE UPDATE: Text to Speech arena adds OpenVoice v2, PlayHT 2.0 & Voicecraft 2.0 🔥
*sound on 🔔*
Why them?
OpenVoice v2 is the latest release from MyShell AI, trained with more data and a better training strategy and, most importantly, released under the MIT license
Voicecraft 2.0…
TIL: You can drop in GPTQ weights directly in the Transformers API 🤯
Load Zephyr 7B with less than 5 GB of GPU VRAM!
GPTQ (Post Training Quantisation) makes LLMs much smaller using a calibration dataset.
Thanks to Optimum and AutoGPTQ - Transformers now supports GPTQ weights…
Let's go!! Common Voice 17 - now on the Hub! 🔥
With 31,000 hours of audio (& transcriptions) across 124 languages.
*sound on 🎶*
847 hours of data were added in CV 17, along with 493 hours of validated data.
Four new languages have been added to this edition: Haitian…
UPDATE: Four new open models on the Text to Speech Arena! 🔥
*sound on🔉*
As the Text-to-Speech ecosystem is heating up, we decided to add more competition.
> Parler TTS
> VoiceCraft
> Vokan
> GPT-SOVITS
Why is this important?
The TTS ecosystem is riddled with opaque metrics…
Ratchet: A web-first, cross-platform ML developer toolkit! ⚡
*written in Rust 🦀
> Inference only.
> WebGPU/CPU only. 🔥
> First class quantisation support.
> Lazy computation.
> Inplace by default.
Supports Whisper out of the box! More models - LLMs, GGUF, etc are coming…
Want to train your own MusicLM? 🎶
The MusicCaps dataset is now on the 🤗Hub:
The MusicCaps dataset contains 5,521 music examples, each of which is labeled with an English aspect list and a free text caption written by musicians. 🎸
Introducing Idefics 2 🤯
An 8B Vision-Language Model that punches well above its weight.
> Apache 2.0 licensed! 🔥
> Competitive with 30B models like MM1-Chat
> 12 point increase in VQAv2, 30 point increase in TextVQA (compared to Idefics 1)
> 10x fewer parameters than…
MetaVoice-1B on Metal powered by Candle! 🦀
Apache 2.0 licensed TTS with Voice Cloning. Thanks to @lmazare, you can now use MetaVoice in Rust. ⚡
Try it out via candle-examples:
cargo run --example metavoice --features metal --release -- --prompt "Hey hey my name is VB."
llama.cpp with OpenAI chat completions API! 🦙
100% local. Powered by Metal!
*sound on*
In 2 steps:
1. brew install ggerganov/ggerganov/llama.cpp
2. llama-server --model <path to model> -c 2048
P.S. All of this with a binary size of less than 5MB ;)
That's it! 🤗
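Since the server speaks the OpenAI chat-completions format, any HTTP client works. A minimal Python sketch using only the standard library (host, port and model name are assumptions based on llama-server defaults; adjust to your setup):

```python
import json
import urllib.request

# llama-server listens on port 8080 by default; the model name is largely
# ignored because the server already has one model loaded.
payload = {
    "model": "local",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hi in five words."},
    ],
    "temperature": 0.7,
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# With llama-server running, uncomment to get an OpenAI-style JSON response:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```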
🚨 @huggingface is releasing its Audio Course this Wednesday (Jun 14th)!
Fully open source and 100% free.
A 6-week self-paced course to level up your Machine Learning game with Audio ⚡️
Sign up and don't forget to tune in for our launch event:
Welcome MAGNeT by @AIatMeta 🎶
Open access weights, training and inference codebase! \o/
2 variants (10 sec, 30 sec), 2 sizes (small - 300M, medium - 1.5B)
> MAGNeT is a text-to-music and text-to-sound model capable of generating high-quality audio samples.
> Masked…
Introducing the Text-to-Speech/ Audio pipeline! ⚡️
@suno_ai_'s Bark, @AIatMeta's MMS-TTS, @MSFTResearch's SpeechT5, Kakao Research's VITS & MusicGen!
1000+ languages, open-access models. All of these are accessible in just a few lines of code! 🤯
4x faster Llama inference! 🔥
> leverages static cache.
> uses torch compile for decoder models.
> very minimal code changes required.
> coming to mistral and other models soon.
> opens possibility to unlock even more speed-ups.
massive kudos to @art_zucker for working on this…
🧘♀️Meditate with an AI-generated melody ☮️
Brought to you by MusicGen - a simple and controllable music generation model by @MetaAI 🎶
Models on the🤗Hub:
Check it out here 👉
Welcome MusicGen Stereo! 🎶
You can now generate high-quality stereo sounds at the speed of thought!
Powered by Audiocraft from @honualx and @AIatMeta 🤗
Oh, and you can use it with Transformers with just 3 lines of code!
import torch
from transformers import pipeline…
CodeQwen1.5 7B - GPU poor ftw! 🔥
> pre-trained on 3 trillion tokens.
> 64K context.
> supports tasks like code generation, code editing, sql, chat and more.
> performs better than DeepSeek Coder and GPT-3.5 on SWE-bench.
> open access model, weights on the Hub.
Introducing StarCoder2 15B 🌟
> Beats CodeLlama 34B.
> 16,384 context window.
> Trained on 600+ programming languages from The Stack v2.
> Trained with a fill-in-the-middle objective on 4 trillion+ tokens.
Along with that, we release smol-StarCoder2 3B & 7B ⭐
> 16K context…
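Fill-in-the-middle training means you can prompt with a prefix and a suffix and let the model generate the hole. A sketch of the sentinel-token prompt format used by the StarCoder family (token names per the model card; the prompt content is illustrative):

```python
def fim_prompt(prefix: str, suffix: str) -> str:
    """Build a fill-in-the-middle prompt using StarCoder-style sentinel
    tokens; the model generates the missing middle after <fim_middle>."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

# Illustrative hole: ask the model to complete the function body.
print(fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))"))
```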
MusicGen + LLM = High-quality tunes 🌟
Creating tunes from mere text prompts is no easy feat; there have been multiple attempts, but anecdotally, I have yet to find any that beats MusicGen by @AIatMeta!
All you need is about 5GB of GPU VRAM (or a Google Colab) ;)
Here's how you…
Faster Mixtral 8x7B with fused modules & AWQ 🔥
Powered by AutoAWQ & Transformers.
Fused modules offer improved accuracy and performance by replacing the Attention, MLP, and Layernorm layers with their corresponding fused version.
Fused modules can use faster kernels and…
🗣️ A new speech community event is incoming!!
📆 The Whisper fine-tuning sprints will be held from the 5th to the 19th of December.
🌍 Come join us to build better and faster speech recognition systems in 70+ languages.
🔥 Claim SoTA in a language of your choice!