Vaibhav (VB) Srivastav

@reach_vb

11,294
Followers
172
Following
719
Media
3,862
Statuses

GPU poor @Huggingface | F1 fan | Here for @at_sofdog’s wisdom | *opinions my own

nvidia-smi
Joined June 2017
Pinned Tweet
@reach_vb
Vaibhav (VB) Srivastav
4 months
I did this. Fuck what anyone else says, just put the pedal to the metal and BUILD. Push spaghetti code. Nobody cares about OOPs. Doesn’t matter what anyone thinks. Just keep on doing. Document in public. Don’t listen to the haters. Release more than you refactor. Just keep…
@NickADobos
Nick Dobos
4 months
Just do stuff
20
199
2K
55
165
2K
@reach_vb
Vaibhav (VB) Srivastav
3 months
Let's go! MetaVoice 1B 🔉 > 1.2B parameter model. > Trained on 100K hours of data. > Supports zero-shot voice cloning. > Short & long-form synthesis. > Emotional speech. > Best part: Apache 2.0 licensed. 🔥 Powered by a simple yet robust architecture: > Encodec (Multi-Band…
31
317
2K
@reach_vb
Vaibhav (VB) Srivastav
5 months
We made Whisper even faster. ~40% faster!! 🔥 Whisper solidifies its lead by an even larger margin! With the latest changes in transformers - large v3 is the best* and the fastest among the top 5 on the Open ASR Leaderboard. Below is the reduction in Real-time factor (RTF):…
19
191
1K
@reach_vb
Vaibhav (VB) Srivastav
6 months
Insanely fast whisper now with Speaker Diarisation! 🔥 100% local and works on your Mac or on Nvidia GPUs. All thanks to @hbredin's Pyannote library, you can now get blazingly fast transcriptions and speaker segmentations! ⚡️ Here's how you can use it too: pipx install…
37
167
1K
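Combining the two outputs comes down to aligning Whisper's timestamped text segments with Pyannote's speaker turns. A minimal sketch, assuming both arrive as (start_s, end_s, payload) tuples: each transcript segment is labelled with the speaker whose turn overlaps it most. The function names and data shapes are hypothetical, not insanely-fast-whisper's actual code.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start_s, end_s, payload)


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript: List[Segment], turns: List[Segment]) -> List[Segment]:
    """Prefix each transcript segment (payload = text) with the speaker label
    (payload of a turn) whose turn overlaps the segment the most."""
    labelled = []
    for start, end, text in transcript:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for t_start, t_end, speaker in turns:
            ov = overlap(start, end, t_start, t_end)
            if ov > best_overlap:
                best_speaker, best_overlap = speaker, ov
        labelled.append((start, end, f"{best_speaker}: {text}"))
    return labelled


# Hypothetical outputs of the ASR step and the diarisation step.
transcript = [(0.0, 2.5, "Hi, welcome back."), (2.6, 5.0, "Thanks for having me.")]
turns = [(0.0, 2.7, "SPEAKER_00"), (2.7, 5.2, "SPEAKER_01")]
for seg in assign_speakers(transcript, turns):
    print(seg)
```

Maximum overlap is the simplest rule; a segment that straddles a speaker change goes wholly to whoever talks longest inside it.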
@reach_vb
Vaibhav (VB) Srivastav
6 months
Llama 2 7B chat, running 100% private on Mac, powered by CoreML! ⚡️ We're optimising this setup to get much faster generation. 🔥
29
134
1K
@reach_vb
Vaibhav (VB) Srivastav
3 months
Whisper powered by Apple Neural Engine! 🔥 The lads at @argmaxinc optimised Whisper to work at blazingly fast speeds on iOS and Mac! > All code is MIT-licensed. > Up to 3x faster than the competition. > Neural Engine as well as Metal runners. > Open source CoreML models. > 2…
25
143
1K
@reach_vb
Vaibhav (VB) Srivastav
4 months
Mixtral 8x7B Instruct with AWQ & Flash Attention 2 🔥 All in ~24GB GPU VRAM! With the latest release of AutoAWQ - you can now run Mixtral 8x7B MoE with Flash Attention 2 for blazingly fast inference. All in < 10 lines of code. The only real change except loading AWQ weights…
22
167
980
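A back-of-the-envelope check on the ~24GB figure: weight memory is roughly parameter count × bits per weight ÷ 8. This sketch assumes Mixtral 8x7B has about 46.7B total parameters (every expert has to sit in memory even though only two run per token) and ignores the KV cache, activations and AWQ's per-group scales, so read the result as a lower bound.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


MIXTRAL_8X7B_PARAMS = 46.7e9  # total parameters across all experts

for label, bits in [("fp16", 16), ("4-bit AWQ", 4)]:
    print(f"{label:>10}: ~{weight_memory_gb(MIXTRAL_8X7B_PARAMS, bits):.1f} GB")
```

The same arithmetic matches the DBRX figure later in the feed: 132B parameters at 16 bits per weight is 264 GB.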
@reach_vb
Vaibhav (VB) Srivastav
6 months
Insanely fast whisper now with Whisper Large V3 🔥 Transcribe 150 minutes of audio in less than 98 seconds (powered by Transformers & @tri_dao Flash Attention 2). Don't believe it? look at the benchmarks below ;) All of this with the familiar Transformers API and optionally…
36
131
978
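"150 minutes in less than 98 seconds" can be restated as a real-time factor (RTF): processing time divided by audio duration, where lower is faster and anything below 1 beats real time. A small sketch of that arithmetic using the numbers quoted above:

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = time spent transcribing / duration of the audio (lower is faster)."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


audio_seconds = 150 * 60   # 150 minutes of audio
processing_seconds = 98    # upper bound quoted in the tweet
rtf = real_time_factor(processing_seconds, audio_seconds)
print(f"RTF ~ {rtf:.4f}, i.e. ~{1 / rtf:.0f}x faster than real time")
```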
@reach_vb
Vaibhav (VB) Srivastav
4 months
What are the top open source TTS models out there? 🤔 Here’s my list so far: XTTS - YourTTS - FastSpeech2 - VITS - TorToiSe - Pheme - …
40
196
928
@reach_vb
Vaibhav (VB) Srivastav
1 month
Introducing Command R Plus ⚡ > Beats claude-3, mistral-large, gpt-4 turbo. > 104 Billion parameters. > Built with multi-step tool use and RAG. > Supports 10 languages. > Context length of 128K. > Trained with grounded generation capabilities - citations and responses based on…
36
154
890
@reach_vb
Vaibhav (VB) Srivastav
4 months
Alrighty! W2V-BERT 2.0: Speech encoder for low-resource languages! 🔥 With < 15 hours of audio, you can beat Whisper and get your own SoTA ASR model! > Pre-trained on 4.5M hours of data. > 600M parameters. > 143+ languages. > 10-30x faster than Whisper. > Best part: MIT license…
9
174
865
@reach_vb
Vaibhav (VB) Srivastav
1 year
After 70x faster Whisper, we present to you - 5x faster Whisper fine-tuning! ⚡️ Powered by LoRA and 🤗 PEFT - Squeeze in 5x larger batch sizes, fit Whisper-large checkpoint < 8GB VRAM! 🔥 Best part? With almost no degradation in WER! 🤯 Check it out:
12
144
857
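Part of why LoRA frees so much memory: the pretrained weight W (d_out × d_in) stays frozen and only two thin matrices B (d_out × r) and A (r × d_in) are trained, so gradients and optimiser state exist only for r·(d_out + d_in) parameters per adapted matrix. A sketch of that count for one 1280 × 1280 projection (Whisper large's hidden size); the rank of 32 is an illustrative assumption, not a value from the tweet.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """LoRA freezes W (d_out x d_in) and trains B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


d_model = 1280  # hidden size of Whisper large
rank = 32       # illustrative LoRA rank (assumption)

full = d_model * d_model
lora = lora_trainable_params(d_model, d_model, rank)
print(f"full matrix: {full:,} params | LoRA adapters: {lora:,} params ({lora / full:.1%})")
```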
@reach_vb
Vaibhav (VB) Srivastav
5 months
distil-whisper small now in insanely-fast-whisper ⚡️ 100% private, on Mac < 1.5GB VRAM! 166M parameters are all you need* ;)
14
117
859
@reach_vb
Vaibhav (VB) Srivastav
10 days
Phi 3 running on your browser! 100% local, powered by WebGPU & Rust 🦀
13
147
850
@reach_vb
Vaibhav (VB) Srivastav
5 months
4-bit quantised Mistral 7B instruct v0.2! - fasttt! 🏎️ On Mac (M2). Powered by MLX. Fully local. Requires < 10GB RAM for 4-bits. (GPU poors, rise up) Have to say, MLX is a solid alternative to llama.cpp
21
97
801
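"4-bit" here means group-wise quantisation: the weights are split into small groups and each group is stored as 4-bit integers plus a per-group scale and offset. A minimal pure-Python sketch of that idea using round-to-nearest affine quantisation; the group size of 64 is an assumption based on MLX's default, and this is not MLX's implementation.

```python
import random
from typing import List, Tuple


def quantize_group(values: List[float], bits: int = 4) -> Tuple[List[int], float, float]:
    """Affine round-to-nearest quantisation of one group.
    Returns integer codes in [0, 2**bits - 1], the scale and the group minimum."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, lo: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [lo + c * scale for c in codes]


def quantize(weights: List[float], group_size: int = 64, bits: int = 4):
    """Split a flat weight vector into groups and quantise each independently."""
    return [quantize_group(weights[i:i + group_size], bits)
            for i in range(0, len(weights), group_size)]


random.seed(0)
w = [random.gauss(0.0, 0.02) for _ in range(256)]  # fake weights
groups = quantize(w)
restored = [x for codes, s, lo in groups for x in dequantize_group(codes, s, lo)]
max_err = max(abs(a - b) for a, b in zip(w, restored))
print(f"{len(groups)} groups, max abs error {max_err:.5f}")
```

If the scale and offset are kept in 16 bits each, a group of 64 adds 0.5 bits per weight, so the effective cost is about 4.5 bits per weight rather than exactly 4.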
@reach_vb
Vaibhav (VB) Srivastav
4 months
fuck yeah! whisper on metal powered by rust 🦀 100% local + fastt! brought to you by 🤗candle.
22
81
792
@reach_vb
Vaibhav (VB) Srivastav
4 months
Welcome OpenVoice! 🎙️ A versatile voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages. Open access weights 🔥 It enables granular control over voice styles, including…
11
154
784
@reach_vb
Vaibhav (VB) Srivastav
25 days
Current best local model: 1. LLM - Mistral Instruct v0.2 7B/ Command R (4bit) 2. TTS - Parler-TTS/ Style-TTS 2 3. ASR - distil-whisper/ faster-whisper 4. VLM - Idefics 2/ CogVLM Best stack: 1. Use llama.cpp to run LLM/ VLM via the server 2. Transformers to run Parler TTS/…
43
128
790
@reach_vb
Vaibhav (VB) Srivastav
1 month
LETS GOO! Parler TTS 🔥 A fully open-source, Apache 2.0 licensed Text-to-speech model focused on providing maximum controllability. Through voice prompts, you can control the pitch, speed, gender, noise levels, emotion characteristics and more! > Trained on 10K hours of…
26
129
768
@reach_vb
Vaibhav (VB) Srivastav
5 months
Mistral just dropped an improved instruct fine-tuned version of their 7B model - v0.2 Good day for GPU poor! 🔥
17
92
772
@reach_vb
Vaibhav (VB) Srivastav
4 months
Let's go, ~2x faster Whisper w/ speculative decoding! 🔥 Whisper (baseline): 73 seconds. Whisper w/ Speculative Decoding: 33 seconds. All with zero drop in performance! ⚡ Pseudocode: 1. Initialise a Teacher model ex: openai/whisper-large-v2. 2. Load an assistant model ex:…
20
119
758
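The pseudocode above boils down to: a small assistant model drafts a few tokens cheaply, the large model checks them, and only drafts that match the large model's own greedy choice are kept, which is why the output is unchanged. A toy sketch with plain functions standing in for the two models (the names and toy "models" are hypothetical; in Transformers the real thing is driven by the `assistant_model` argument to `generate`):

```python
from typing import Callable, List

# A "model" here is just a function mapping a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain autoregressive greedy decoding: one model call per new token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding; returns exactly what greedy_decode(target, ...) would."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft (assistant) model proposes k tokens.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks proposals left to right. A real model scores all k
        #    positions in a single forward pass, which is where the speed-up comes from.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # the target's token is always what we keep
            if tok != expected:
                break  # first disagreement: discard the rest of the draft
        else:
            accepted.append(target(out + accepted))  # all matched: bonus token from the same pass
        out.extend(accepted)
    return out[:len(prompt) + max_new]


# Toy stand-ins: the "large" model continues a Fibonacci-like sequence mod 10;
# the "small" model agrees except right after a 7, where it always guesses 0.
def target_model(seq: List[int]) -> int:
    return (seq[-1] + (seq[-2] if len(seq) > 1 else 0)) % 10


def draft_model(seq: List[int]) -> int:
    return 0 if seq[-1] == 7 else target_model(seq)


print(speculative_decode(target_model, draft_model, [1, 1], max_new=12))
```

The more often the assistant agrees with the main model, the more tokens are accepted per verification pass; a bad assistant only costs speed, never quality.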
@reach_vb
Vaibhav (VB) Srivastav
5 months
Oof! Whisper on @Apple's MLX backend is quite stonkingly fast! 🏃 Not only that, it optimises GPU + CPU usage quite well! What is MLX? MLX is a framework released by Apple for ML researchers to train and infer ML models efficiently. MLX has a Python API that closely follows…
21
98
731
@reach_vb
Vaibhav (VB) Srivastav
3 months
Run Mixtral 8x7B w/ ~13 GB VRAM 🤯 *On a free colab too, powered by Transformers & AQLM! AQLM is a new SOTA method for low-bitwidth LLM quantization, targeted to the “extreme” 2-3bit / parameter range. In less than 5 lines of code, you can try it out too! ⚡ Make sure to…
17
106
716
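Where the "extreme" 2-bit figure comes from: in additive quantisation, each group of g consecutive weights is stored as the sum of M codewords, each picked from a codebook of 2^B vectors, so only M·B bits of indices are kept per group. A sketch of that arithmetic and of decoding one group; the tiny codebooks are made up for illustration, real AQLM learns its codebooks, and the codebooks themselves also take memory.

```python
from typing import List, Sequence


def bits_per_weight(num_codebooks: int, codebook_bits: int, group_size: int) -> float:
    """Index storage per weight when each group of `group_size` weights is encoded
    by `num_codebooks` indices of `codebook_bits` bits each (codebooks excluded)."""
    return num_codebooks * codebook_bits / group_size


def decode_group(codebooks: Sequence[Sequence[Sequence[float]]],
                 indices: Sequence[int]) -> List[float]:
    """Additive decoding: the group is the element-wise sum of one codeword per codebook."""
    group_size = len(codebooks[0][0])
    out = [0.0] * group_size
    for book, idx in zip(codebooks, indices):
        for j, value in enumerate(book[idx]):
            out[j] += value
    return out


# One 16-bit codebook over groups of 8 weights -> 2 bits per weight.
print(bits_per_weight(num_codebooks=1, codebook_bits=16, group_size=8))  # 2.0

# Toy example: two codebooks with two codewords each, group size 4.
books = [
    [[0.1, 0.0, -0.1, 0.2], [0.3, 0.3, 0.0, 0.0]],
    [[0.0, 0.05, 0.0, -0.05], [-0.1, 0.0, 0.1, 0.0]],
]
print(decode_group(books, indices=[1, 0]))
```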
@reach_vb
Vaibhav (VB) Srivastav
6 months
Insanely fast whisper - now with a CLI⚡️ You can now translate/ transcribe 100s of hours of data across 99 languages! - all from your terminal. Here's how you can use it: 1. Install requirements: pip install transformers accelerate optimum 2. Grab the transcribe py file and…
18
94
686
@reach_vb
Vaibhav (VB) Srivastav
3 months
Whisper running on WatchOS! 🔥 > Powered by WhisperKit by @argmaxinc > Supports up to Whisper base > Leverages Neural Engine ⚡ > Three lines of code ;) > Works real-time! > MIT license Quite amazed by the speed with which Argmax is shipping. Possibly the fastest & reliable…
12
106
678
@reach_vb
Vaibhav (VB) Srivastav
2 months
ChatMusician is 🔥 *sound on* > Llama 2 pre-trained + fine-tuned further. > Beats GPT 4. > Can compose well-structured, full-length music conditioned on texts, chords, melodies, motifs, and musical forms. > Code, data, model, benchmark - open source - MIT licensed! ⚡
17
124
669
@reach_vb
Vaibhav (VB) Srivastav
6 months
Whisper Large V3 has landed in Transformers! 🎉 The large-v3 checkpoint open-sourced by OpenAI yesterday is now fully compatible with Transformers! Best part: It is fully compatible with the ASR pipeline! Here's how you can use it: import torch from transformers import…
7
107
673
@reach_vb
Vaibhav (VB) Srivastav
2 months
Introducing Distil-Whisper v3 ⚡ > ~50% fewer parameters and 6x faster than Large-v3. > More accurate than large-v3 on long-form transcription. Available with 🦀 WebGPU, Whisper.cpp, Transformers, Faster-Whisper and Transformers.js support! Drop in; no changes are required! 🔥
19
79
665
@reach_vb
Vaibhav (VB) Srivastav
4 months
Introducing Open TTS Tracker! 🗣️ *sound on* A one-stop shop to track all open access/ source TTS models! Ranging from XTTS to Pheme, OpenVoice to VITS, and more... ⚡ For each model, we compile: 1. Source code 2. Checkpoints 3. License 4. Fine-tuning code 5. Languages…
25
107
664
@reach_vb
Vaibhav (VB) Srivastav
4 months
Mistral QLoRA w/ MLX on your Mac ⚡ Utilising 100% GPU, fully offline. You can now convert any Hugging Face model to a quantised format and use it to fine-tune on-device! python convert.py --hf-path mistralai/Mistral-7B-v0.1 -q Then, to fine-tune, run: python lora.py…
4
126
636
@reach_vb
Vaibhav (VB) Srivastav
4 months
Introducing MLX-LM! ⚡ *sound on* Run LLMs on-device directly on your Mac with 3 lines of code! ;) 100% local and quite spiffy (even faster with 4-bit)! I made a quick video covering the package, its capabilities and a bit of quantisation. The video goes through what MLX is,…
19
117
636
@reach_vb
Vaibhav (VB) Srivastav
6 months
Insanely fast whisper now on Mac 🚀 You can now get the same experience of whisper in the comfort of your Mac, too! This is made possible by torch.mps backend. It isn't as fast as CUDA; however, it works pretty fast and can utilise the GPU well! All you need to do is this:…
25
68
599
@reach_vb
Vaibhav (VB) Srivastav
2 months
MusicLang 🎶 - Llama 2 based Music generation model! > Llama2 based, trained from scratch. > Permissively licensed - open source. > Optimised to run on CPU. 🔥 > Highly controllable, choose tempo, chord progression, bar range and more! ;) Absolutely love playing with the demo,…
8
113
577
@reach_vb
Vaibhav (VB) Srivastav
4 months
MIT licensed Phi running on Mac powered by Rust! 🦀 Spiffy and fast, powered by Candle! ⚡ As simple as running: cargo run --example phi --release --features metal -- --model 2 --prompt "A skier slides down a frictionless slope of height 40m and length 80m. What's the skiers…
18
94
572
@reach_vb
Vaibhav (VB) Srivastav
3 months
Introducing Qwen 1.5! 🔥 > 6 model sizes, including 0.5B, 1.8B, 4B, 7B, 14B, and 72B. > Beats GPT 3.5, Mistral-Medium. > Multilingual support of both base and chat models. > Supports 32K context length. > Base + chat model checkpoints released. > Runs natively with…
15
86
570
@reach_vb
Vaibhav (VB) Srivastav
3 months
Introducing fast-llm rs! 🦀 Infer LLMs like Mistral, LLama, Mixtral, on your Mac at the touch of your CLI! Powered by Candle and Rust! ⚡ Works on Metal and CPU - Infer your GGUF checkpoints in pure Rust! ;) All you gotta do is: Step 1: git clone https://github.…
27
89
547
@reach_vb
Vaibhav (VB) Srivastav
18 days
Snowflake dropped a 480B Dense + Hybrid MoE 🔥 > 17B active parameters > 128 experts > trained on 3.5T tokens > uses top-2 gating > fully apache 2.0 licensed (along with data recipe too) > excels at tasks like SQL generation, coding, instruction following > 4K context window,…
18
100
542
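"Top-2 gating" means a router scores all 128 experts for each token, but only the two highest-scoring experts actually run, and their outputs are mixed with softmax weights. That is why the active parameter count (17B) is a small fraction of the total. A sketch of one common formulation (softmax over the top-k router logits); the toy experts are hypothetical, and Snowflake's exact router may differ in details.

```python
import math
import random
from typing import Callable, List, Tuple


def top_k_gating(router_logits: List[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k experts with the highest router logits and softmax-normalise their
    scores so the mixing weights sum to 1. Returns (expert_index, weight) pairs."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x: float, experts: List[Callable[[float], float]],
              router_logits: List[float], k: int = 2) -> float:
    """Weighted sum of ONLY the selected experts; the others are never evaluated."""
    return sum(w * experts[i](x) for i, w in top_k_gating(router_logits, k))


random.seed(0)
experts = [lambda x, s=s: x * s for s in range(128)]  # 128 toy "experts"
logits = [random.gauss(0.0, 1.0) for _ in range(128)]  # toy router output for one token
print(top_k_gating(logits, k=2))
print(moe_layer(1.0, experts, logits, k=2))
```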
@reach_vb
Vaibhav (VB) Srivastav
5 months
High-quality speech/ text translations with SeamlessM4T v2 by @AIatMeta 🔉 M4T == Massively Multilingual and Multimodal Machine Translation seamlessly ;) You can now translate in 100 languages from/ to speech or text with transformers! Here's how you can do it, too! 👇 1.…
16
76
513
@reach_vb
Vaibhav (VB) Srivastav
5 months
Nous Hermes Yi 34B beats Mixtral 8X7B 🔥 With AWQ, you only need ~20GB VRAM to run this beast, 100% local and offline! Trained on 1M+ GPT4 generated data points! (synthetic data ftw!) Here's how you can run it, too (w/ transformers and AutoAWQ): from transformers import…
13
88
514
@reach_vb
Vaibhav (VB) Srivastav
2 months
Wow! @CohereForAI just released CMD-R 🔥 > Beats GPT 3.5 > 128K context window. > 35 billion parameters. > 10 languages. > Optimised for reasoning, question answering and summarisation. > Use it directly in transformers 🤗
10
87
506
@reach_vb
Vaibhav (VB) Srivastav
3 months
Announcing TTS Arena! 🗣️ *sound on* One place to test, rate and find the champion of current open models. A continually updated space with the greatest and the best of the current TTS landscape! ⚡ Rate once, rate twice - help us find the best out there. Starting with five…
29
101
486
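Turning "rate once, rate twice" votes into a leaderboard needs a rating scheme. A minimal sketch, assuming an Elo-style update like the one popularised by LMSYS Chatbot Arena; the K-factor of 32, the starting rating of 1000 and the model names are all illustrative assumptions, not TTS Arena's actual settings.

```python
from typing import Dict, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_winner: float, r_loser: float, k: float = 32.0) -> Tuple[float, float]:
    """Return the new (winner, loser) ratings after one head-to-head vote."""
    delta = k * (1.0 - expected_score(r_winner, r_loser))
    return r_winner + delta, r_loser - delta


ratings: Dict[str, float] = {"model_a": 1000.0, "model_b": 1000.0, "model_c": 1000.0}
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_c", "model_b")]  # (winner, loser)
for winner, loser in votes:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser])

for name, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:.1f}")
```

An upset (a low-rated model beating a high-rated one) moves ratings more than an expected win, so a handful of surprising votes can reshuffle the top of the board.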
@reach_vb
Vaibhav (VB) Srivastav
3 months
NeMo Canary 1B by @NVIDIAAI 🔥 *Sound on 🔊* > Tops the Open ASR Leaderboard. > Beats Whisper to the punch for ASR. > Beats Seamless M4Tv2 for Speech Translation. > Supports 4 languages - English, Spanish, French & German. > Trained on 85,000 hours of annotated audio. >…
12
94
480
@reach_vb
Vaibhav (VB) Srivastav
2 months
Let's goo! StyleTTS 2 - New king of the Text to Speech Arena! 👑 StyleTTS 2 is fully open source, and the authors are training better and larger checkpoints. 🔥 Stay tuned for some exciting updates re: StyleTTS v2 - things will get excitinggg! Side note: 200 stars on the TTS…
11
73
450
@reach_vb
Vaibhav (VB) Srivastav
3 months
OpenMath Instruct-1 by @NVIDIAAI 🧮 > 1.8 Million Problem-Solution (synthetic) pairs. > Uses GSM8K & MATH training subsets. > Uses Mixtral 8x7B to produce the pairs. > Leverages both text reasoning + code interpreter during generation. > Released LLama, CodeLlama, Mistral,…
4
95
468
@reach_vb
Vaibhav (VB) Srivastav
6 months
Whisper Large-v3: New champion for the Open ASR leaderboard! 👑 We evaluated the latest Whisper checkpoint on a series of datasets and found it the most performant! Here are a couple of quick takeaways from running these evaluations: 1. The best performance for Whisper…
11
83
454
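Leaderboard rankings like this one are based on word error rate (WER): the word-level edit distance between the reference and the model's transcript (substitutions + deletions + insertions), divided by the number of reference words. A minimal sketch; real leaderboards typically normalise text (casing, punctuation) before scoring, which this skips.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the DP row for the previous reference prefix (starts with the empty prefix).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # one deletion / 6 words
```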
@reach_vb
Vaibhav (VB) Srivastav
11 months
Want to train your own Bark/MusicGen-like TTS/TTA models? 👀 The SoTA Encodec model by @MetaAI has now landed in 🤗Transformers! It supports compression down to 1.5 kbps and produces discrete audio representations. ⚡️ Model: Colab:
10
103
450
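Encodec's "discrete audio representations" are codebook indices from residual vector quantisation, so its bitrate is simply frames per second × number of codebooks × bits per index. A sketch assuming the commonly cited configuration of the 24 kHz model (75 frames per second, 1024-entry codebooks, i.e. 10 bits per index):

```python
import math


def encodec_bitrate_kbps(frame_rate_hz: float, num_codebooks: int, codebook_size: int) -> float:
    """Bitrate of a residual-VQ codec: every frame stores one index per codebook."""
    bits_per_index = math.log2(codebook_size)
    return frame_rate_hz * num_codebooks * bits_per_index / 1000


# Assumed 24 kHz Encodec configuration: 75 frames/s, 1024-entry codebooks.
for n_q in (2, 4, 8, 16, 32):
    print(f"{n_q:>2} codebooks -> {encodec_bitrate_kbps(75, n_q, 1024):g} kbps")
# prints 1.5, 3, 6, 12 and 24 kbps
```

Dropping residual codebooks lowers the bitrate at the cost of fidelity, which is also why TTS/TTA models built on Encodec can predict a few coarse codebooks first and refine the rest later.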
@reach_vb
Vaibhav (VB) Srivastav
9 days
BOOM! Whisper + Speaker Diarisation! 🔥 Blazingly fast meeting transcription all with a simple call to an API - powered by Inference Endpoints ⚡ - Whisper to transcribe speech to text (w/ Flash Attention) - Diarization to break down the transcription by speakers (w/ Pyannote)…
19
53
457
@reach_vb
Vaibhav (VB) Srivastav
5 months
Running Phi-2 on Mac - fully utilising the GPU! Powered by MLX! ⚡️ 2.7B model running 100% locally!
12
53
443
@reach_vb
Vaibhav (VB) Srivastav
4 months
Parakeet RNNT & CTC models top the Open ASR Leaderboard! 👑 Brought to you by @NVIDIAAI and @suno_ai_, Parakeet beats Whisper and regains its first place. The models are released under a commercially permissive license! 🥳 The models inherit the same FastConformer…
17
75
441
@reach_vb
Vaibhav (VB) Srivastav
6 months
Insanely fast whisper now with Flash Attention 2 🔥 With the latest release of Transformers (4.35), you can run Whisper & Distil-Whisper even faster with Flash Attention 2. To benefit from it, make sure to upgrade your transformers & flash-attn version: pip install --upgrade…
6
63
443
@reach_vb
Vaibhav (VB) Srivastav
1 month
Distil Whisper - streaming on an iPhone! 🔥 100% local. Fully on-device. WhisperKit ftw! ⚡ brew install whisperkit-cli && whisperkit-cli transcribe --model-prefix distil --model distil-large-v3 --audio-path <audio-path>
10
56
439
@reach_vb
Vaibhav (VB) Srivastav
7 months
UPDATE: New benchmark for insanely fast whisper! 🤗 You can transcribe 3000 hours of audio in less than 2 hours! Batching + BetterTransformer is still the fastest way to transcribe audio insanely fast!
13
59
433
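Batching long audio works by cutting it into fixed-length windows that overlap slightly, so no word is sliced at a boundary, and then feeding several windows to the GPU per forward pass. A sketch of that bookkeeping; it mirrors the idea behind the Transformers pipeline's `chunk_length_s` and `batch_size` arguments, but the helper names and the 30 s / 5 s / 24 values are illustrative assumptions.

```python
from typing import List, Tuple


def chunk_spans(duration_s: float, chunk_s: float = 30.0,
                stride_s: float = 5.0) -> List[Tuple[float, float]]:
    """Split `duration_s` seconds of audio into windows of `chunk_s` seconds that
    overlap by `stride_s` seconds on each side."""
    if not 0 <= 2 * stride_s < chunk_s:
        raise ValueError("stride must be smaller than half the chunk length")
    step = chunk_s - 2 * stride_s
    spans, start = [], 0.0
    while True:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        if end >= duration_s:
            break
        start += step
    return spans


def batches(items: list, batch_size: int) -> List[list]:
    """Group chunks so the model processes `batch_size` of them per forward pass."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


spans = chunk_spans(60 * 60)  # one hour of audio
print(f"{len(spans)} chunks -> {len(batches(spans, 24))} batches of up to 24")
```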
@reach_vb
Vaibhav (VB) Srivastav
4 months
PSA 📣: MLX can now pull Mistral/ Llama/ TinyLlama safetensors directly from the Hub! 🔥 pip install -U mlx is all you need! All mistral/ llama fine-tunes supported too! 20,000+ checkpoints overall! P.S. We also provide a script to convert and quantise checkpoints and…
13
56
430
@reach_vb
Vaibhav (VB) Srivastav
1 month
mixtral 8x22B - things we know so far 🫡 > 176B parameters > performance in between gpt4 and claude sonnet (according to their discord) > same/ similar tokeniser used as mistral 7b > 65536 sequence length > 8 experts, 2 experts per token: More > would require ~260GB VRAM in…
14
59
430
@reach_vb
Vaibhav (VB) Srivastav
1 month
IT WORKS! Running Mixtral 8x22B with Transformers! 🔥 Running on a DGX (4x A100 - 80GB) with CPU offloading 🤯
14
54
424
@reach_vb
Vaibhav (VB) Srivastav
7 months
Generate melodies with MusicGen & Transformers, but faster! ⚡️
import torch
from transformers import pipeline
pipe = pipeline("text-to-audio", "facebook/musicgen-small", torch_dtype=torch.float16)
pipe("upbeat lo-fi music")
That's it! 🤗
12
85
401
@reach_vb
Vaibhav (VB) Srivastav
6 months
Welcome distil-whisper 🔥 49% smaller, 6x faster, and within the 1% performance range of Whisper-large-v2! All in the good ol' Transformers API. 1. Make sure to upgrade transformers to the latest release. pip install --upgrade transformers 2. Import torch & transformers…
8
72
415
@reach_vb
Vaibhav (VB) Srivastav
7 months
Transcribe 150 minutes of Audio in less than 5 minutes with Whisper large! 🏎️ Powered by Transformers and Optimum, you get blazingly fast transcriptions in a few lines of code! pipe = pipeline("automatic-speech-recognition", "openai/whisper-large-v2", torch_dtype=torch.float16,…
11
59
397
@reach_vb
Vaibhav (VB) Srivastav
2 months
Damn! DBRX 132B is wild! 🤯 > Trained on 12 Trillion tokens. > Beats Grok-1, Mixtral, etc. > Mixture of Experts. 16 experts, 4 active. > Uses RoPE, GLU and GQA. > Context size of 32K. > Open access - Base and Instruct. 🔥 > Requires 264 GB RAM; inference with Transformers! 🤗
16
62
388
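Two of the DBRX numbers can be checked with quick arithmetic. Choosing 4 of 16 experts per token allows far more expert combinations than Mixtral's 2 of 8, and the 264 GB figure matches 132B parameters stored at 2 bytes (16 bits) each:

```python
from math import comb

dbrx_combos = comb(16, 4)    # 16 experts, 4 active per token
mixtral_combos = comb(8, 2)  # 8 experts, 2 active per token
print(dbrx_combos, mixtral_combos, dbrx_combos / mixtral_combos)  # 1820 28 65.0

dbrx_weights_gb = 132e9 * 2 / 1e9  # 132B params x 2 bytes per 16-bit weight
print(dbrx_weights_gb)  # 264.0
```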
@reach_vb
Vaibhav (VB) Srivastav
3 months
恭喜发财 Gong xi fa cai 🧧 The impact of China on the current AI/ ML landscape has been ginormous. From LLMs to TTS to ASR, we've gotten SoTA models weekly from China-based labs! Some highlights for me: LLM/ VLMs 1. Qwen 1.5 & Qwen VL - 2. OpenBMB…
14
79
393
@reach_vb
Vaibhav (VB) Srivastav
8 days
SURPRISE: Google just dropped CodeGemma 1.1 7B IT 🔥 The models get incrementally better at Single and Multi-generations. Major boost in C#, Go, Python 🐍 Along with the 7B IT they released an updated 2B base model too. Enjoy!
14
82
394
@reach_vb
Vaibhav (VB) Srivastav
3 months
Hear me out, fam! > Zuck releases llama 3 beats GPT 4. > Zuck releases VoiceBox beats OAI TTS. > Zuck releases Imagine beats Dall-E. > Zuck releases Seamless beats Whisper. Open AI > OpenAI. Quite likely, this is true. 2024 would be huge if this happens.
21
20
381
@reach_vb
Vaibhav (VB) Srivastav
3 months
Introducing @NVIDIAAI & @suno_ai_'s Parakeet-TDT! ✨ The latest in the Parakeet series, Nvidia & Suno beat Whisper again and won the Open ASR Leaderboard - this time by ~1 WER. All of this by making the model ~175% faster than the last generation of the models. ⚡ Bonus:…
7
65
384
@reach_vb
Vaibhav (VB) Srivastav
5 months
Making audio a first-class citizen in LLMs: Qwen Audio 🔉 Using a Multi-Task Training Framework, Qwen Audio - Combines OpenAI's Whisper large v2 (Audio encoder) with Qwen 7B LM to train on over 30 audio tasks jointly. Tasks ranging from Speech Recognition to Music Captioning…
3
66
378
@reach_vb
Vaibhav (VB) Srivastav
2 months
Fuck! This is wild! 🤯 Infinite AI Jam - mix instruments, genres and more! MusicFX:
9
63
379
@reach_vb
Vaibhav (VB) Srivastav
4 months
Whisper on MLX just got better! 🔥 Word-level timestamps + confidence scores and models on the 🤗Hub ;) Don't forget to `git pull` before you get whisper-ing. Kudos to @awnihannun & bofenghuang! P.S. It now also supports Large-v3 \o/
8
41
376
@reach_vb
Vaibhav (VB) Srivastav
19 days
llms this, llms that! why aren't people releasing more audio stuff 😭 i want tts, asr, speech translation, voice cloning, text to audio, text to music, anything..
57
27
376
@reach_vb
Vaibhav (VB) Srivastav
6 months
VITS is probably the most underrated TTS model out there! At just 150M params, it runs on CPU 🤯 Sure, it isn't the most realistic, but it does its job for most on-device use cases like reading an article, practising a language, etc.!! Here's how you can use it with…
18
59
370
@reach_vb
Vaibhav (VB) Srivastav
3 months
Let's fucking goooo! CodeLlama 70B is here. > 67.8 on HumanEval!
10
47
369
@reach_vb
Vaibhav (VB) Srivastav
6 months
Open Whisper-style Speech Model (OWSM) 🔉 OWSM reproduces Whisper training using an open-source toolkit (ESPNet) and publicly available datasets. OWSM is much more efficient in training and is robust at multi-directional translations. Open source training, inference scripts and…
8
77
364
@reach_vb
Vaibhav (VB) Srivastav
6 months
Want high-quality Audio embeddings? CLAP! 👏 We support the latest general, music and speech CLAP models in Transformers! Use it for Text-to-Speech/ Text-to-Music training and more. What is CLAP? CLAP (Contrastive Language-Audio Pretraining) is a neural network trained on…
12
57
367
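CLAP maps audio and text into one shared embedding space, which is what makes zero-shot audio classification possible: embed the clip, embed each candidate label, and pick the label with the highest cosine similarity. A toy sketch with made-up 3-dimensional embeddings standing in for real model outputs; a real CLAP model produces much larger vectors and multiplies the similarities by a learned scale (the `logit_scale` parameter here). Transformers wraps the real workflow in its zero-shot-audio-classification pipeline.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_classify(audio_emb: List[float], text_embs: Dict[str, List[float]],
                       logit_scale: float = 1.0) -> Dict[str, float]:
    """Softmax over scaled cosine similarities between one audio embedding and each label."""
    sims = {label: logit_scale * cosine(audio_emb, emb) for label, emb in text_embs.items()}
    m = max(sims.values())
    exps = {label: math.exp(s - m) for label, s in sims.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical embeddings, for illustration only.
labels = {"a dog barking": [1.0, 0.0, 0.0], "rain": [0.0, 1.0, 0.0], "piano": [0.0, 0.0, 1.0]}
probs = zero_shot_classify([0.9, 0.1, 0.0], labels, logit_scale=10.0)
print(max(probs, key=probs.get), probs)
```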
@reach_vb
Vaibhav (VB) Srivastav
2 months
This is wild! A team trained Mistral 7B playing DOOM 🤯 Trained on ASCII representations - love the ingenuity and sheer creativity! Open/ acc.
13
43
361
@reach_vb
Vaibhav (VB) Srivastav
2 months
VILA by @NVIDIAAI & @MIT 🔥 > 13B, 7B and 2.7B model checkpoints. > Beats the current SoTA models like QwenVL. > Interleaved Vision + Text pre-training. > Followed by joint SFT. > Works with AWQ for 4-bit inference. Models on the Hugging Face Hub:
6
61
365
@reach_vb
Vaibhav (VB) Srivastav
5 months
Common Voice 16 by @mozilla is out on the Hub! 🔥 This brings a total of 30,328 hours of audio spread across 120 languages! Of those ~30K hours, 19.5K are validated! ✨ You can access it all in less than 2 lines of code with the datasets library: from datasets…
7
70
360
@reach_vb
Vaibhav (VB) Srivastav
11 days
MASSIVE UPDATE: Text to Speech arena adds OpenVoice v2, PlayHT 2.0 & Voicecraft 2.0 🔥 *sound on 🔔* Why them? OpenVoice v2 is the latest release from myshell ai, trained with more data, better training strategy and more importantly released under MIT license Voicecraft 2.0…
13
59
368
@reach_vb
Vaibhav (VB) Srivastav
6 months
TIL: You can drop in GPTQ weights directly in the Transformers API 🤯 Load a Zephyr 7B in less than 5 GB GPU VRAM! GPTQ (Post Training Quantisation) makes LLMs much smaller using a calibration dataset. Thanks to Optimum and AutoGPTQ - Transformers now supports GPTQ weights…
7
64
348
@reach_vb
Vaibhav (VB) Srivastav
12 days
Let's go!! Common Voice 17 - now on the Hub! 🔥 With 31,000 hours of audio (& transcriptions) across 124 languages. *sound on 🎶* 847 hours of data were added in CV 17, along with 493 hours of validated data. Four new languages have been added to this edition: Haitian…
6
68
351
@reach_vb
Vaibhav (VB) Srivastav
27 days
UPDATE: Four new open models on the Text to Speech Arena! 🔥 *sound on🔉* As the Text-to-Speech ecosystem is heating up, we decided to add more competition. > Parler TTS > VoiceCraft > Vokan > GPT-SOVITS Why is this important? The TTS ecosystem is riddled with opaque metrics…
13
65
342
@reach_vb
Vaibhav (VB) Srivastav
1 month
Ratchet: A web-first, cross-platform ML developer toolkit! ⚡ *written in Rust 🦀 > Inference only. > WebGPU/CPU only. 🔥 > First class quantisation support. > Lazy computation. > In-place by default. Supports Whisper out of the box! More models - LLMs, GGUF, etc are coming…
5
56
341
@reach_vb
Vaibhav (VB) Srivastav
7 months
Open-access GPT 3.5 replacement - Zephyr 7B! ⚡️ import torch from transformers import pipeline pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-alpha", torch_dtype=torch.bfloat16, device_map="auto") messages = [ { "role": "system", "content": "You are a…
9
59
333
@reach_vb
Vaibhav (VB) Srivastav
1 year
Want to train your own MusicLM? 🎶 The MusicCaps dataset is now on the 🤗Hub: The MusicCaps dataset contains 5,521 music examples, each of which is labeled with an English aspect list and a free text caption written by musicians. 🎸
3
54
333
@reach_vb
Vaibhav (VB) Srivastav
6 months
Insanely Fast Whisper is trending #1 on HackerNews! 🤯 Transformers, Optimum and Flash Attention ftw! ⚡️
11
29
329
@reach_vb
Vaibhav (VB) Srivastav
2 months
Melo TTS - fast real-time TTS on CPU! ⚡ *sound on* > Multi-lingual - English, Spanish, French, Chinese, Japanese and Korean. > Apache 2.0 licensed. > Allows code-switching in ZH + EN. > Works on Mac! 🔥 > Models on the Hub ;) GG @myshell_ai 🤗
16
53
330
@reach_vb
Vaibhav (VB) Srivastav
27 days
Introducing Idefics 2 🤯 An 8B Vision-Language Model - literally punching above its weight. > Apache 2.0 licensed! 🔥 > Competitive with 30B models like MM1-Chat > 12 point increase in VQAv2, 30 point increase in TextVQA (compared to Idefics 1) > 10x fewer parameters than…
10
75
328
@reach_vb
Vaibhav (VB) Srivastav
2 months
MetaVoice-1B on Metal powered by Candle! 🦀 Apache 2.0 licensed TTS with Voice Cloning. Thanks to @lmazare, you can now use MetaVoice in Rust. ⚡ Try it out via candle-examples: cargo run --example metavoice --features metal --release -- --prompt "Hey hey my name is VB."
11
58
323
@reach_vb
Vaibhav (VB) Srivastav
2 months
llama.cpp with OpenAI chat completions API! 🦙 100% local. Powered by Metal! *sound on* In 2 steps: 1. brew install ggerganov/ggerganov/llama.cpp 2. llama-server --model <path to model> -c 2048 P.S. All of this with a binary size of less than 5MB ;) That's it! 🤗
10
40
318
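Because `llama-server` speaks the OpenAI chat completions format, any HTTP client can talk to it. A minimal standard-library sketch, assuming the server runs locally on its default port 8080 and exposes `/v1/chat/completions`; the helper names are hypothetical, and only the request body is built when the script runs, so nothing is sent until `chat()` is called.

```python
import json
import urllib.request
from typing import Dict, List

# Assumption: llama-server is running locally on its default port.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(messages: List[Dict[str, str]], max_tokens: int = 256,
                       temperature: float = 0.7) -> bytes:
    """Serialise an OpenAI-style chat completions request body."""
    payload = {"messages": messages, "max_tokens": max_tokens, "temperature": temperature}
    return json.dumps(payload).encode("utf-8")


def chat(messages: List[Dict[str, str]], url: str = SERVER_URL) -> str:
    """POST the request and return the assistant's reply text (needs a running server)."""
    req = urllib.request.Request(url, data=build_chat_request(messages),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


print(build_chat_request([{"role": "user", "content": "Say hi in five words."}]).decode("utf-8"))
```

With the server from step 2 running, `chat([{"role": "user", "content": "Hello"}])` should return the model's reply; pointing an existing OpenAI client at the same base URL is the other common route.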
@reach_vb
Vaibhav (VB) Srivastav
11 months
🚨 @huggingface is releasing its Audio Course this Wednesday (Jun 14th)! Fully open source and 100% free. 6 weeks self-paced course to level up your Machine Learning game with Audio ⚡️ Sign up and don't forget to tune in for our launch event:
3
66
311
@reach_vb
Vaibhav (VB) Srivastav
4 months
Welcome MAGNeT by @AIatMeta 🎶 Open access weights, training and inference codebase! \o/ 2 variants (10 sec, 30 sec), 2 sizes (small - 300M, medium - 1.5B) > MAGNeT is a text-to-music and text-to-sound model capable of generating high-quality audio samples. > Masked…
9
59
318
@reach_vb
Vaibhav (VB) Srivastav
7 months
Introducing the Text-to-Speech/ Audio pipeline! ⚡️ @suno_ai_'s Bark, @AIatMeta's MMS-TTS, @MSFTResearch's SpeechT5, Kakao Research's VITS & MusicGen! 1000+ languages, open-access models. All of these are accessible in just a few lines of code! 🤯
12
66
312
@reach_vb
Vaibhav (VB) Srivastav
3 months
4x faster Llama inference! 🔥 > leverages static cache. > uses torch compile for decoder models. > minimal code changes required. > coming to mistral and other models soon. > opens possibility to unlock even more speed-ups. massive kudos to @art_zucker for working on this…
3
41
312
@reach_vb
Vaibhav (VB) Srivastav
1 month
UPDATE: Gemma 1.1 7B & 2B - Instruction Tuned! ✨ > substantial gains in quality, coding capabilities, factuality, and instruction following. > better multi-turn conversation quality.
5
60
312
@reach_vb
Vaibhav (VB) Srivastav
11 months
🧘‍♀️Meditate with an AI-generated melody ☮️ Brought to you by, MusicGen - A simple and controllable music generation model by @MetaAI 🎶 Models on the🤗Hub: Check it out here 👉
5
62
299
@reach_vb
Vaibhav (VB) Srivastav
6 months
Welcome MusicGen Stereo! 🎶 You can now generate high-quality stereo sounds at the speed of thought! Powered by Audiocraft from @honualx and @AIatMeta 🤗 Oh, and you can use it with Transformers with just 3 lines of code! import torch from transformers import pipeline…
6
53
297
@reach_vb
Vaibhav (VB) Srivastav
26 days
CodeQwen1.5 7B - GPU poor ftw! 🔥 > pre-trained on 3 trillion tokens. > 64K context. > supports tasks like code generation, code editing, sql, chat and more. > performs better than deepseek coder and chat gpt 3.5 on SWE bench. > open access model, weights on the Hub.
8
45
298
@reach_vb
Vaibhav (VB) Srivastav
2 months
Introducing StarCoder2 15B 🌟 > Beats CodeLlama 34B. > 16,384 context window. > Trained on 600+ programming languages from The Stack v2. > Trained with a Fill-in-the-middle objective on 4 trillion+ tokens. Along with that, we release smol-StarCoder2 3B & 7B ⭐ > 16K context…
7
49
292
@reach_vb
Vaibhav (VB) Srivastav
5 months
MusicGen + LLM = High-quality tunes 🌟 Creating tunes by mere text prompts is no easy feat; there have been multiple attempts, but anecdotally, I have yet to find any that beats MusicGen by @AIatMeta! All you need is about 5GB of GPU VRAM (or a Google Colab) ;) Here's how you…
2
64
288
@reach_vb
Vaibhav (VB) Srivastav
4 months
Yi-VL-6B New GPU poor Vision Language Model just dropped! ✨ > 6B & 34B parameter models > Multi-round text-image conversations > Bilingual: Chinese + English > Strong image comprehension > Fine-grained image resolution - 448 x 448
6
48
289
@reach_vb
Vaibhav (VB) Srivastav
1 month
Distil-whisper now with apple neural engine support via WhisperKit! 🔥 You can now: brew install whisperkit-cli Followed by: whisperkit-cli transcribe --model-prefix "distil" --model "large-v3" --verbose --audio-path ~/Downloads/jfk.wav Bonus: If you have an M2 or higher…
4
46
288
@reach_vb
Vaibhav (VB) Srivastav
4 months
Faster Mixtral 8x7B with fused modules & AWQ 🔥 Powered by AutoAWQ & Transformers. Fused modules offer improved accuracy and performance by replacing the Attention, MLP, and Layernorm layers with their corresponding fused versions. Fused modules can use faster kernels and…
7
56
285
@reach_vb
Vaibhav (VB) Srivastav
1 year
🗣️ A new speech community event is incoming!! 📆 The Whisper fine-tuning sprints will be held from the 5th to the 19th of December. 🌍 Come join us to build better and faster speech recognition systems in 70+ languages. 🔥 Claim SoTA in a language of your choice!
9
56
285