BlinkDL Profile Banner
BlinkDL Profile
BlinkDL

@BlinkDL_AI

6,790
Followers
92
Following
82
Media
186
Statuses

RWKV = 100% RNN with GPT-level performance.

Joined September 2022
Pinned Tweet
@BlinkDL_AI
BlinkDL
1 year
#RWKV is One Dev's Journey to Dethrone GPT Transformers. The largest RNN ever (up to 14B). Parallelizable. Faster inference & training. Supports INT8/4. No KV cache. 3 years of hard work. DEMO: Computation sponsored by @StabilityAI @AiEleuther @EMostaque
Tweet media one
45
348
2K
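A minimal sketch (not the actual RWKV code) of why a 100% RNN needs no KV cache: generation carries a fixed-size state forward, so per-token compute and memory stay constant however long the context gets. Every name below (rnn_cell, the toy weight matrices) is a stand-in invented for illustration.

```python
import numpy as np

def rnn_cell(x, state, Wx, Wh):
    """Stand-in recurrent cell: new_state = tanh(Wx @ x + Wh @ state)."""
    return np.tanh(Wx @ x + Wh @ state)

def generate(prompt, emb, Wx, Wh, head, n_new):
    state = np.zeros(Wh.shape[0])        # fixed-size state: the model's only memory
    for t in prompt:                     # "prefill": feed the prompt token by token
        state = rnn_cell(emb[t], state, Wx, Wh)
    out = []
    for _ in range(n_new):               # decode: O(1) memory per step, no KV cache
        t = int(np.argmax(head @ state)) # greedy pick, for simplicity
        out.append(t)
        state = rnn_cell(emb[t], state, Wx, Wh)
    return out

rng = np.random.default_rng(0)
V, d = 100, 16                           # toy vocab size and model width
emb  = rng.normal(size=(V, d))
Wx   = rng.normal(size=(d, d)) * 0.1
Wh   = rng.normal(size=(d, d)) * 0.1
head = rng.normal(size=(V, d))
print(generate([1, 2, 3], emb, Wx, Wh, head, 5))
```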
@BlinkDL_AI
BlinkDL
3 months
RWKV-5 "Eagle" 7B: beats Mistral-7B at multilingual, reaches Llama2-7B level at English, while being 100% attention-free RNN and only trained 1.1T tokens. Gradio Demo: RWKV-6 "Finch" 1B5 in ~10days, 3B in ~30days.
Tweet media one
Tweet media two
10
88
424
@BlinkDL_AI
BlinkDL
1 year
#RWKV : Reinventing RNNs for the Transformer Era
Tweet media one
Tweet media two
Tweet media three
Tweet media four
@AiEleuther
EleutherAI
1 year
Everyone knows that transformers are synonymous with large language models… but what if they weren’t? Over the past two years @BlinkDL_AI and team have been hard at work scaling RNNs to unprecedented scales. Today we are releasing a preprint on our work
5
117
474
8
95
344
@BlinkDL_AI
BlinkDL
1 year
Raven v6🐦7B (added gpt4all etc.) Please compare with 7B test5🙂 RWKV-4-Raven-7B-v6-Eng (99% English + 1% Multilang) RWKV-4-Raven-7B-v6-EngChnJpn (98% English + 1% Chn Jpn [GuanacoDataset] + 1% Multilang)
4
57
261
@BlinkDL_AI
BlinkDL
7 months
RWKV-5 World v2 - The best multilingual & code 1.5B language model is here🙂Online Demo: 100% RNN & attention-free. 3B & 7B coming soon.
Tweet media one
6
55
248
@BlinkDL_AI
BlinkDL
1 year
Raven 14B ( #RWKV finetuned on alpaca+codealpaca): Raven 7B: Raven 7B Gradio Demo: Try "+i Tell me about ravens." in ChatRWKV v2 to use them🐦
6
47
236
@BlinkDL_AI
BlinkDL
10 months
RWKV-5 increases headsz from 1 to 64, similar to applying RWKV-style "RNNify" to Linear Transformers (2006.16236). States are now matrix-valued and larger (also shown in RetNet to be helpful). Note there is no need for positional encoding.
Tweet media one
3
38
221
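A rough numpy sketch of what "head size 64 with matrix-valued states" means, in the spirit of the linear-attention recurrence cited in the tweet (2006.16236). The exact RWKV-5 formulation adds a bonus term, token-shift and gating, so treat this as illustration only; all names here are made up.

```python
import numpy as np

def matrix_state_head(r, k, v, w):
    """One head. State S is (head_size x head_size) and decays per channel each step:
        S_t = diag(w) @ S_{t-1} + outer(k_t, v_t)     # write
        y_t = r_t @ S_t                               # read with receptance
    """
    T, hs = r.shape
    S = np.zeros((hs, hs))
    ys = []
    for t in range(T):
        S = w[:, None] * S + np.outer(k[t], v[t])
        ys.append(r[t] @ S)
    return np.stack(ys)

T, hs = 8, 64
rng = np.random.default_rng(0)
r, k, v = (rng.normal(size=(T, hs)) for _ in range(3))
w = np.exp(-np.exp(rng.normal(size=hs)))     # per-channel decay in (0, 1)
print(matrix_state_head(r, k, v, w).shape)   # (8, 64); token order lives in the
                                             # recurrence itself, so no positional encoding
```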
@BlinkDL_AI
BlinkDL
7 months
RWKV-5 World v2: the strongest 1.5B language model ever, supports 100+ world languages & code. Release in 12 days. training 3B & 7B too.
Tweet media one
4
42
218
@BlinkDL_AI
BlinkDL
1 year
Raven v8🐦14B to the moon🚀based on #RWKV (100% #RNN language model) 14B/7B/3B/1B Download: And v9 soon (ctxlen 8192, 3x SFT data)🚀
Tweet media one
7
40
204
@BlinkDL_AI
BlinkDL
10 months
A tiny #RWKV with 2.9M (!) params can solve 18239.715*9.728263 or 4.2379*564.778-1209.01 etc. with CoT, while being 100% #RNN (L6-D192)🤯The trick: generate lots of data with reversed numbers (denoted by "f" here) to train the model🚀Try it now:
Tweet media one
5
40
197
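A sketch of the data-generation trick the tweet describes: emit arithmetic samples in which every number is also written digit-reversed (prefixed with "f", following the screenshot), so the model can produce the least-significant digit first before giving the normal-order answer. The exact formatting of the real training data may differ from this.

```python
import random

def rev(x: str) -> str:
    """Digit-reversed rendering, e.g. 123.45 -> f54.321 ('f' marks a reversed number)."""
    return "f" + x[::-1]

def sample() -> str:
    a = round(random.uniform(1, 20000), random.randint(0, 6))
    b = round(random.uniform(1, 1000), random.randint(0, 6))
    prod = a * b
    # chain of thought: restate the operands reversed, answer reversed, then answer normally
    return (f"{a}*{b} = {rev(str(a))} * {rev(str(b))} "
            f"= {rev(f'{prod:.6f}')} = {prod:.6f}")

for _ in range(3):
    print(sample())
```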
@BlinkDL_AI
BlinkDL
5 months
nanoRWKV: does not require custom CUDA kernel to train, works for any GPU / CPU 🙂 This is built on RWKV "x051" (current RWKV v5 models are "x052"). RWKV is a 100% RNN with GPT-level performance. DEMO:
Tweet media one
4
35
194
@BlinkDL_AI
BlinkDL
1 year
New RWKV multilang tokenizer (sz 65525) for future RWKV models. Better than 20B_tokenizer for all langs & code🚀Very easy encoding: simply greedy match to pick the longest token. Supports European langs and CJK and more. Clean vocab. Good at numbers too🔥
Tweet media one
2
38
190
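The encoding rule in the tweet (greedy longest match) is simple enough to sketch directly. The real tokenizer walks a trie over its ~65k-entry vocabulary for speed and covers every byte, but the behavior matches this naive version; the toy vocab below is invented.

```python
def greedy_encode(text: str, vocab: dict) -> list:
    """At each position take the longest vocab entry that matches; no merges, no regex."""
    max_len = max(len(t) for t in vocab)
    ids, i = [], 0
    while i < len(text):
        for l in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + l]
            if piece in vocab:
                ids.append(vocab[piece])
                i += l
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")  # the real vocab covers all bytes
    return ids

# toy vocab with made-up ids: longest match picks "hello" over "he" + "ll" + "o"
vocab = {"h": 1, "e": 2, "l": 3, "o": 4, " ": 5, "he": 6, "ll": 7, "hello": 8, "world": 9}
print(greedy_encode("hello world", vocab))   # [8, 5, 9]
```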
@BlinkDL_AI
BlinkDL
8 months
Coming soon: new RWKV-5 with matrix-valued states / u / w, for best performance ever🚀
Tweet media one
2
21
177
@BlinkDL_AI
BlinkDL
1 year
Raven v9🐦7B 14B ctx8192 (better at everything), and multi-lang 10%Chn-JpnEspKor2%-Other2% models (7B finished. 14B training)🚀Language ratios determined by amt of ChatGPT data. Please share more ChatGPT data to increase the ratio of your lang🚀
4
29
169
@BlinkDL_AI
BlinkDL
1 year
Raven v7🐦7B & 14B, 3x v6 data (Alpaca+Vicuna-style, with plenty of multiround chats) finetuned from #RWKV (100% RNN Language Model). It's especially good at coding (Simply chat with it. Screenshot is 7B-Eng-v7). Download: Demo:
Tweet media one
4
38
166
@BlinkDL_AI
BlinkDL
1 year
Raven is #RWKV finetuned on Alpaca dataset🐦 DEMO:
Tweet media one
6
28
163
@BlinkDL_AI
BlinkDL
1 year
Note that RWKV "wins more than 50% of non-tied matches against all other open-source models except Vicuna."🙂We are optimizing the Arena settings to match ChatRWKV, and the real position of RWKV is very likely just below Vicuna.
Tweet media one
@lmsysorg
lmsys.org
1 year
Announcing the Week 2 update for the Chatbot Arena leaderboard! We've added some new models that are showcasing strong performance. Currently, @OpenAI 's GPT-4 and @AnthropicAI 's Claude lead the pack, with open-source models in hot pursuit. More findings:
Tweet media one
48
278
1K
4
29
163
@BlinkDL_AI
BlinkDL
5 months
RWKV-6 illustrated (formulas: ). Other projects are comparing with RWKV-4 (and calling it "RWKV"). They don't even dare to show RWKV-5 numbers😂RWKV-5 3B Gradio demo:
Tweet media one
@BlinkDL_AI
BlinkDL
5 months
RWKV6🐦 vs Mamba🐍. RWKV6 will be the strongest multilingual model (data = only 1.1T tokens), which takes up some capacity that could otherwise go to English, but it's worth it🙂 Mamba @ 0.3T (Pile) is great at 1.4B, less so at 2.8B. It will be interesting to see the results when it is given more training data.
Tweet media one
3
19
145
6
25
152
@BlinkDL_AI
BlinkDL
3 months
RWKV-5 "Eagle" 7B is Mistral-7B level for language modeling of unseen arxiv CS & Physics papers, and significantly better than Llama2🐦We are testing more new data.
Tweet media one
@BlinkDL_AI
BlinkDL
5 months
Uncheatable LLM benchmark🙂A dev is testing new data: tokenize the first 5000 chars of 1000 new arXiv papers, compute the sum of [neg. log prob.], smaller = better. RWKV-5 is good here. Phi-2 is not good. Makes you think🤔
Tweet media one
6
20
143
2
26
150
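The evaluation described in the quoted tweet (sum of negative log-probabilities over the first 5000 characters of brand-new documents) can be reproduced with a few lines of Hugging Face code. The model name and document list below are placeholders, not the dev's actual setup, and very long papers get truncated to the model's context length here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                      # placeholder: swap in any causal LM from the Hub
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def sum_neg_logprob(text: str) -> float:
    """Sum of -log p(token | prefix) over the first 5000 chars; smaller = better."""
    ids = tok(text[:5000], return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)     # HF returns mean cross-entropy over predicted tokens
    return out.loss.item() * (ids.shape[1] - 1)   # undo the mean to get the sum

papers = ["...full text of a freshly posted arXiv paper..."]   # placeholder documents
print(sum(sum_neg_logprob(p) for p in papers))
```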
@BlinkDL_AI
BlinkDL
6 months
RWKV-5 7B 49% trained and it's already a strong model (100% RNN). try it in
Tweet media one
5
18
145
@BlinkDL_AI
BlinkDL
5 months
RWKV6🐦 vs Mamba🐍. RWKV6 will be the strongest multilingual model (data = only 1.1T tokens), which takes up some capacity that could otherwise go to English, but it's worth it🙂 Mamba @ 0.3T (Pile) is great at 1.4B, less so at 2.8B. It will be interesting to see the results when it is given more training data.
Tweet media one
3
19
145
@BlinkDL_AI
BlinkDL
5 months
Uncheatable LLM benchmark🙂A dev is testing new data: tokenize the first 5000 chars of 1000 new arXiv papers, compute the sum of [neg. log prob.], smaller = better. RWKV-5 is good here. Phi-2 is not good. Makes you think🤔
Tweet media one
6
20
143
@BlinkDL_AI
BlinkDL
10 months
The JPNtuned 7B #RWKV World is the best open-source Japanese LLM 🚀Runner: Model (55% trained, finishing in a few days): More languages are coming🌍RWKV is 100% RNN
Tweet media one
1
43
140
@BlinkDL_AI
BlinkDL
14 days
RWKV state-tuning alignment: because RWKV is 100% RNN, we can directly tune its RNN state to control its behavior🤯For example, a state-tuned RWKV-6 "Finch" 1.6B can be fun and use emojis🐦even for unseen prompts. Demo model: (use rwkv pip pkg 0.8.26+, and…
Tweet media one
7
19
137
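A conceptual sketch of state-tuning on a toy recurrent model (not the actual RWKV training code): freeze every weight and optimize only the initial hidden state, which is exactly the knob the tweet describes. The real recipe tunes RWKV's per-layer WKV states with the state-gradient CUDA kernel mentioned elsewhere in this timeline; the GRUCell here is just a stand-in.

```python
import torch

torch.manual_seed(0)
d, V, T = 64, 100, 16
emb  = torch.nn.Embedding(V, d)
cell = torch.nn.GRUCell(d, d)            # stand-in for a recurrent RWKV block
head = torch.nn.Linear(d, V)
for p in list(emb.parameters()) + list(cell.parameters()) + list(head.parameters()):
    p.requires_grad_(False)              # the base model stays frozen

init_state = torch.zeros(1, d, requires_grad=True)   # the ONLY trainable tensor
opt = torch.optim.Adam([init_state], lr=1e-2)

tokens = torch.randint(0, V, (1, T))     # placeholder "alignment" sample
for step in range(100):
    state, loss = init_state, 0.0
    for t in range(T - 1):
        state = cell(emb(tokens[:, t]), state)
        loss = loss + torch.nn.functional.cross_entropy(head(state), tokens[:, t + 1])
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))                       # the frozen model now behaves differently from this state
```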
@BlinkDL_AI
BlinkDL
2 months
RWKV as an efficient text compressor:
Tweet media one
4
22
127
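Why a language model doubles as a compressor: an arithmetic coder driven by the model's next-token probabilities needs about -log2 p bits per token, so a better model of the text means fewer bits. The sketch below only estimates that cost (it skips the actual arithmetic coder), and the model name is a placeholder rather than the setup behind the tweet.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"                            # placeholder: any causal LM works
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

def compressed_bits(text: str) -> float:
    """Ideal arithmetic-coding cost: sum over tokens of -log2 p(token | prefix)."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[:, :-1]            # position t predicts token t+1
    logp = torch.log_softmax(logits, dim=-1)
    picked = logp.gather(-1, ids[:, 1:].unsqueeze(-1))
    return float(-picked.sum() / torch.log(torch.tensor(2.0)))   # nats -> bits

text = "RWKV is a 100% RNN with GPT-level performance."
print(compressed_bits(text), "bits vs", 8 * len(text.encode()), "bits raw")
```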
@BlinkDL_AI
BlinkDL
2 months
Google's RG-LRU is the same as RWKV-6 / GLA / ... and used our planned "Hawk" name (for RWKV-8)😂Fortunately RWKV-7 "Goose" is safe and WIP.
@_akhaliq
AK
2 months
Google presents Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models. Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN
Tweet media one
7
100
533
4
17
126
@BlinkDL_AI
BlinkDL
3 months
100% composed by RWKV-6 120M params MIDI model🎶Still takes multiple trials for such high quality outputs, but I will fix this🙂
3
21
125
@BlinkDL_AI
BlinkDL
5 days
Introducing RWKV-6 "Finch" 7B🐦 the strongest 100% RNN LLM (attention-free). Trained on 2.5T multilingual tokens (supports 100+ languages🌍 and code). Further scaling soon🚀Gradio Demo: #RWKV #RNN
Tweet media one
5
22
131
@BlinkDL_AI
BlinkDL
5 months
RWKV-5 7B 72% trained (finishing before xmas) and uploaded. Supports 100+ languages and code. Attention-free RNN language model. Trained on just 1.1T tokens - imagine what happens when we have the compute to train on 2T+ tokens🙂
Tweet media one
5
20
122
@BlinkDL_AI
BlinkDL
2 months
RWKV-6.0 "Finch" 3B - reaching multilingual eval 58.9% (Mistral 7B = 58.2%). Gradio Demo: Will continue training it on World-2.1 (1.4T) to boost performance. Download:
Tweet media one
Tweet media two
5
21
118
@BlinkDL_AI
BlinkDL
5 months
And RWKV might be the only open source LLM architecture at this moment🙂I gave it to the Linux Foundation @LFAIDataFdn - please feel free to contact @picocreator for collaborations🎉
Tweet media one
@BlinkDL_AI
BlinkDL
5 months
nanoRWKV: does not require custom CUDA kernel to train, works for any GPU / CPU 🙂 This is built on RWKV "x051" (current RWKV v5 models are "x052"). RWKV is a 100% RNN with GPT-level performance. DEMO:
Tweet media one
4
35
194
3
13
120
@BlinkDL_AI
BlinkDL
1 year
Raven v10🐦7B/3B/1.5B based on #RWKV 100% RNN language model (14B soon). Now with 7B Eng89%-日本語10% version too 🚀 Gradio demo: (has chat mode now)
Tweet media one
0
22
116
@BlinkDL_AI
BlinkDL
11 months
RWKV World 7B release🌍chat & generate & code in 100+ world languages. The best small multi-lang model (available in 0.1~7B) and 100% RNN🚀Use to run it. DEMO: Download: #RWKV
Tweet media one
3
30
115
@BlinkDL_AI
BlinkDL
3 months
RWKV-6.0 "Finch" 1.6B is exceptionally good at multilingual (for its size): There will be more iterations (6.0, 6.1, 6.2) 🙂 #RWKV
Tweet media one
5
15
113
@BlinkDL_AI
BlinkDL
1 year
RWKV-4-World: Chat and text generation in 100 languages🌎Good at English zeroshot too. 0.1/0.4B done, 1.5/3/7B preview: Yes even 0.1B can chat in 100 langs🚀And there will be "RavenWorld" with further chat optimizations🐦
Tweet media one
Tweet media two
Tweet media three
2
22
110
@BlinkDL_AI
BlinkDL
1 year
Raven🐦14B-Eng v7 (100% RNN based on #RWKV ). Download: Run: (16G VRAM recommended). You can also try the Bot in RWKV Discord:
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
17
110
@BlinkDL_AI
BlinkDL
7 months
The 14% trained RWKV-5 World v2 is already almost RWKV-4 World level🚀 I am training 3B & 7B too. #RWKV
Tweet media one
4
10
105
@BlinkDL_AI
BlinkDL
4 months
RWKV-5 7B 86% uploaded to (100% on Jan-28🚀)
Tweet media one
7
14
102
@BlinkDL_AI
BlinkDL
2 years
RWKV-4: scaling RNN to 7B params and beyond, with GPT-level language modeling and zero-shot performance :)
Tweet media one
0
32
101
@BlinkDL_AI
BlinkDL
1 year
Raven-test5🐦: Meticulously finetuned #RWKV (the RNN LM) on alpaca & more. These are STRONG chat models too (with latest ChatRWKV v2 English-2 prompt). 7B 14B DEMO
2
20
101
@BlinkDL_AI
BlinkDL
2 months
RWKV-6 showing perfect MQAR performance within test range🙂(compared with Based / Mamba) in RWKV v5+v6 WIP paper discussion: (It's in EleutherAI discord: )
Tweet media one
1
10
98
@BlinkDL_AI
BlinkDL
7 months
RWKV-5 3B in 15 days, and 7B in December🙂 #RWKV checkpts:
Tweet media one
1
13
96
@BlinkDL_AI
BlinkDL
1 month
RWKV-6.0 "Finch"🐦1.6B on 2.5T (=World2+World2.1) tokens for great performance. 100% RNN & attention-free. Supports 100+ world languages & code. Demo: (try a few times for each prompt, as this is base model) 3B in April and 7B in early May🙂I haven't put…
Tweet media one
Tweet media two
Tweet media three
4
23
95
@BlinkDL_AI
BlinkDL
6 months
RWKV v5 models are particularly good at fiction🙂3B Demo:
Tweet media one
@BlinkDL_AI
BlinkDL
6 months
RWKV v5 3B ctx4k - a nice English model, and the best multilingual model in town. 3B demo: (compare with 1.5B demo: ) Will finetune to longer ctx soon.
Tweet media one
Tweet media two
Tweet media three
2
15
87
2
17
94
@BlinkDL_AI
BlinkDL
10 months
From the #RWKV MIDI model (100% RNN), a continuation of the first 15 seconds of the Raiden Shogun Theme #GenshinImpact . The model has zero knowledge of chords, instrumentation, etc. and learns everything by itself from the MIDI dataset🚀Download:
0
13
94
@BlinkDL_AI
BlinkDL
1 year
Use rwkv.cpp for fast INT4/INT8 CPU inference 🚀 works for Linux / Mac / Windows (Q8_0 recommended). Raven🐦14B v10 demo: 7B v11x demo: #RWKV is 100% RNN with great performance.
Tweet media one
Tweet media two
1
12
93
@BlinkDL_AI
BlinkDL
1 month
Diffusion-RWKV (RWKV-4) with good results🙂Let's see what happens when we upgrade to RWKV-6 #RWKV #RNN
@_akhaliq
AK
1 month
Diffusion-RWKV Scaling RWKV-Like Architectures for Diffusion Models Transformers have catalyzed advancements in computer vision and natural language processing (NLP) fields. However, substantial computational complexity poses limitations for their application in long-context
Tweet media one
2
35
175
1
13
93
@BlinkDL_AI
BlinkDL
1 year
RNN is All You Need? Role-playing with 100% RNN Language Model: #RWKV "Raven"🐦14B-Eng v7. We will be able to efficiently run it on CPU with (FP16 ready, INT4 WIP).
@BlinkDL_AI
BlinkDL
1 year
Raven🐦14B-Eng v7 (100% RNN based on #RWKV ). Download: Run: (16G VRAM recommended). You can also try the Bot in RWKV Discord:
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
17
110
1
15
92
@BlinkDL_AI
BlinkDL
2 months
Note RWKV is 100% RNN and all context is compressed into its state📦Here is RWKV-6 cuda kernel with state gradients: to enable state-tuning🚀(much more efficient and powerful than prompt-tuning) and genuine BPTT.
@BlinkDL_AI
BlinkDL
2 months
RWKV-6 showing perfect MQAR performance within test range🙂(compared with Based / Mamba) in RWKV v5+v6 WIP paper discussion: (It's in EleutherAI discord: )
Tweet media one
1
10
98
0
14
90
@BlinkDL_AI
BlinkDL
6 months
RWKV-6 looks good🙂Preview checkpts: supported in rwkv pip package 0.8.22+, and it's easy to migrate from RWKV-5
Tweet media one
@BlinkDL_AI
BlinkDL
7 months
RWKV-6 Todo. RWKV-5 Gradio Demo:
Tweet media one
3
10
61
2
17
89
@BlinkDL_AI
BlinkDL
4 days
RWKV6, xLSTM, Mamba, Griffin, GLA, HGRN2, ... All are similar "matrix-valued dynamic exponential decay"🙂 The only differences are sharing some parameters / adding some tweaks / adding some attention (hybrid). RWKV6 is the most battle-tested AFAIK: 7B dense @ 2.5T (attention-free to…
@BlinkDL_AI
BlinkDL
5 days
Introducing RWKV-6 "Finch" 7B🐦 the strongest 100% RNN LLM (attention-free). Trained on 2.5T multilingual tokens (supports 100+ languages🌍 and code). Further scaling soon🚀Gradio Demo: #RWKV #RNN
Tweet media one
5
22
131
2
11
104
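What "matrix-valued dynamic exponential decay" looks like in one toy step: the same outer-product state update as in the earlier linear-attention sketch, except the decay is computed from the current input instead of being a fixed parameter. This is the rough common core of the RWKV-6 / GLA-style designs named above; real models add low-rank projections, gating and more, and every name below is invented.

```python
import numpy as np

def dynamic_decay_step(S, x, Wk, Wv, Ww):
    """S is (hs x hs). The decay w_t depends on the token x, so the model chooses,
    per channel and per step, how fast to forget."""
    k, v = Wk @ x, Wv @ x
    w_t = 1.0 / (1.0 + np.exp(-(Ww @ x)))        # data-dependent decay in (0, 1)
    return w_t[:, None] * S + np.outer(k, v)     # exponential decay + outer-product write

hs = 8
rng = np.random.default_rng(0)
Wk, Wv, Ww = (rng.normal(size=(hs, hs)) * 0.1 for _ in range(3))
S = np.zeros((hs, hs))
for _ in range(5):
    S = dynamic_decay_step(S, rng.normal(size=hs), Wk, Wv, Ww)
print(S.shape)                                   # the state size never grows
```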
@BlinkDL_AI
BlinkDL
6 months
RWKV v5 3B ctx4k - a nice English model, and the best multilingual model in town. 3B demo: (compare with 1.5B demo: ) Will finetune to longer ctx soon.
Tweet media one
Tweet media two
Tweet media three
2
15
87
@BlinkDL_AI
BlinkDL
1 year
And here is #RWKV 14B Demo: 🚀
@huggingface
Hugging Face
1 year
The first RNN in transformers! 🤯 Announcing the integration of RWKV models in transformers with @BlinkDL_AI and RWKV community! RWKV is an attention free model that combines the best from RNNs and transformers. Learn more about the model in this blogpost:
Tweet media one
19
268
1K
7
33
82
@BlinkDL_AI
BlinkDL
25 days
Introducing RWKV-6 "Finch" 3B v2.1🐦a very performant base model, and 100% RNN. 7B soon. Gradio Demo: Download:
Tweet media one
@BlinkDL_AI
BlinkDL
1 month
RWKV-6.0 "Finch"🐦1.6B on 2.5T (=World2+World2.1) tokens for great performance. 100% RNN & attention-free. Supports 100+ world languages & code. Demo: (try a few times for each prompt, as this is base model) 3B in April and 7B in early May🙂I haven't put…
Tweet media one
Tweet media two
Tweet media three
4
23
95
1
8
79
@BlinkDL_AI
BlinkDL
11 months
The Math behind WKV kernel in #RWKV :
Tweet media one
1
19
77
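For reference, the commonly published form of the RWKV-4 WKV operator (as given in the RWKV paper; the attached screenshot itself is not recoverable here), with per-channel decay w, bonus u, keys k and values v, is

wkv_t = \frac{\sum_{i=1}^{t-1} e^{-(t-1-i)w + k_i}\, v_i \;+\; e^{u + k_t}\, v_t}{\sum_{i=1}^{t-1} e^{-(t-1-i)w + k_i} \;+\; e^{u + k_t}}

In practice it is evaluated recurrently with a running numerator and denominator plus a max-shift for numerical stability, which is what the custom kernel implements.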
@BlinkDL_AI
BlinkDL
1 year
#RWKV Language Model in pure CUDA🚀(no need for pytorch, works for Linux & Windows). RWKV is 100% RNN with great performance:
@HarrisonVander1
Harrison Vanderbyl
1 year
RWKV-CPP-CUDA is an implementation of RWKV written in pure C++ and CUDA, allowing for both high speed and portability. Examples of use include add-ons for game engines, such as Godot-rwkv, a distributable build of Godot for developing AI games. Details:
0
18
76
3
18
76
@BlinkDL_AI
BlinkDL
25 days
Vision-RWKV with RWKV-6 layers: Diffusion-RWKV: RWKV-5/6 paper: RWKV-6 "Finch"🐦7B under training and going strong. Fully 100% RNN and attention-free.
Tweet media one
@BlinkDL_AI
BlinkDL
1 month
Diffusion-RWKV (RWKV-4) with good results🙂Let's see what happens when we upgrade to RWKV-6 #RWKV #RNN
1
13
93
0
12
74
@BlinkDL_AI
BlinkDL
1 year
"EEG" Visualization of the #RWKV 1.5B RNN Language Model (the output after each block) by one of our community members😀
1
7
72
@BlinkDL_AI
BlinkDL
5 months
RWKV-6 progress. v6 1.6B in 20 days. #RWKV
Tweet media one
@BlinkDL_AI
BlinkDL
5 months
More RWKV-6 results: [v6 1.60b 42%] eval = [v5 1.58b 42%] eval + 0.9/1.2%🙂Uploaded to
Tweet media one
1
8
57
3
8
71
@BlinkDL_AI
BlinkDL
5 months
Transformer in 2024 be like🙂 Gen6 designs: RWKV-6🐦, Mamba🐍 Gen5 designs: RWKV-5, RetNet RWKV-5 3B Gradio Demo: RWKV Projects:
Tweet media one
4
14
70
@BlinkDL_AI
BlinkDL
10 months
#RWKV midi 120M params model 🎹🥁sample:
Tweet media one
2
14
71
@BlinkDL_AI
BlinkDL
5 months
RWKV-6: larger improvements in larger models. 3b 14% uploaded to
Tweet media one
1
13
63
@BlinkDL_AI
BlinkDL
14 days
RWKV-6 state-tuned 1.6B Gradio Demo: Only the initial state is tuned, so just 24x64x2048=3.1M parameters, and 0 inference overhead. Tuned a few thousand samples. Will add more🚀
@BlinkDL_AI
BlinkDL
14 days
RWKV state-tuning alignment: because RWKV is 100% RNN, we can directly tune its RNN state to control its behavior🤯For example, a state-tuned RWKV-6 "Finch" 1.6B can be fun and use emojis🐦even for unseen prompts. Demo model: (use rwkv pip pkg 0.8.26+, and…
Tweet media one
7
19
137
1
13
62
@BlinkDL_AI
BlinkDL
3 months
Local WebGPU inference of a 0.4B RWKV5 in your desktop/mobile browser🙂
Tweet media one
1
9
63
@BlinkDL_AI
BlinkDL
7 months
RWKV-6 Todo. RWKV-5 Gradio Demo:
Tweet media one
3
10
61
@BlinkDL_AI
BlinkDL
1 year
Raven 14B v11x & Q8_0 version (good for rwkv.cpp) 🚀 Gradio demo updated to 14B v11x too:
Tweet media one
Tweet media two
1
19
59
@BlinkDL_AI
BlinkDL
3 months
RWKV-6.0 "Finch" is much better at roleplaying with simple prompts and understanding instructions, while using the same data as v5🐦Gradio demo:
Tweet media one
@BlinkDL_AI
BlinkDL
3 months
RWKV-6.0 "Finch" 1.6B is exceptionally good at multilingual (for its size): There will be more iterations (6.0, 6.1, 6.2) 🙂 #RWKV
Tweet media one
5
15
113
2
12
57
@BlinkDL_AI
BlinkDL
1 year
Better Raven v6🐦7B & 14B finetuned from #RWKV (the 100% RNN Language Model) DEMO:
3
8
56
@BlinkDL_AI
BlinkDL
5 months
More RWKV-6 results: [v6 1.60b 42%] eval = [v5 1.58b 42%] eval + 0.9/1.2%🙂Uploaded to
Tweet media one
@BlinkDL_AI
BlinkDL
5 months
RWKV-6: larger improvements in larger models. 3b 14% uploaded to
Tweet media one
1
13
63
1
8
57
@BlinkDL_AI
BlinkDL
3 months
More RWKV-6 "Finch" evals on unseen data by 🐦RWKV-6 1.6B Gradio Demo:
Tweet media one
@BlinkDL_AI
BlinkDL
3 months
RWKV-6.0 "Finch" 1.6B is exceptionally good at multilingual (for its size): There will be more iterations (6.0, 6.1, 6.2) 🙂 #RWKV
Tweet media one
5
15
113
0
9
53
@BlinkDL_AI
BlinkDL
13 days
From community: RWKV-6 3B can be state-tuned to 99.2% LAMBADA, memorizing 400k+ tokens🧠 (only for testing capacity - it's training on test set). Method: check readme "State-tuning" part🚀 #RWKV #RNN #LL
Tweet media one
@BlinkDL_AI
BlinkDL
14 days
RWKV-6 state-tuned 1.6B Gradio Demo: Only the initial state is tuned, so just 24x64x2048=3.1M parameters, and 0 inference overhead. Tuned a few thousand samples. Will add more🚀
1
13
62
1
11
53
@BlinkDL_AI
BlinkDL
9 months
RWKV-4 world tuned to Japanese: and Arabic: (contact me on discord if you have data for other languages🙂) I'm preparing world v2 data for RWKV-5 0.1~14B runs🌍
1
18
47
@BlinkDL_AI
BlinkDL
6 months
And RWKV-5 models are as fast as RWKV-4 in latest rwkv.cpp now (try Q5_1)🚀
@BlinkDL_AI
BlinkDL
6 months
RWKV-5 7B 49% trained and it's already a strong model (100% RNN). try it in
Tweet media one
5
18
145
0
5
46
@BlinkDL_AI
BlinkDL
1 month
RWKV-5/6 models are great at modeling arXiv TeX papers🧑‍🔬(the best among similar-params models), because I put in both peS2o & SlimPaj_arxiv. Try "\section{Introduction}" in the Gradio Demo. Tested in Wonder if someone can use it for a TeX copilot🙂
Tweet media one
Tweet media two
@BlinkDL_AI
BlinkDL
1 month
RWKV-6.0 "Finch"🐦1.6B on 2.5T (=World2+World2.1) tokens for great performance. 100% RNN & attention-free. Supports 100+ world languages & code. Demo: (try a few times for each prompt, as this is base model) 3B in April and 7B in early May🙂I haven't put…
Tweet media one
Tweet media two
Tweet media three
4
23
95
0
8
43
@BlinkDL_AI
BlinkDL
1 year
@boborado Would you be interested in training more #RWKV models 😀 (on RedPajama & more)
5
3
45
@BlinkDL_AI
BlinkDL
5 months
Demo script to train RWKV-5 on MiniPile (1.5G tokens, will auto-download), works for WSL2 too:
Tweet media one
0
8
45
@BlinkDL_AI
BlinkDL
1 year
And that's why I prefer prompt-tuning😀I tried the "Alpaca prompt" on RWKV 14B ctx8192, and to my surprise it works out of the box without finetuning (RWKV is a 100% RNN trained on 100% Pile v1 and nothing else). demo: code:
Tweet media one
Tweet media two
@geoffreyhinton
Geoffrey Hinton
1 year
Reinforcement Learning by Human Feedback is just parenting for a supernaturally precocious child.
135
491
3K
2
6
42
@BlinkDL_AI
BlinkDL
1 year
I always wonder if we can do this for animals (such as cats and dogs) too. Will their "text encoder" be similar to ours (I feel so)? Can we finally decode their brain and talk with them?🐱🐶
@NishimotoShinji
Shinji Nishimoto
1 year
Our paper got accepted at #CVPR2023 ! (w/ @yu_takagi ) We modeled the relationship between human brain activity (early/semantic areas) and Stable Diffusion's latent representations and decoded perceptual contents from brain activity ("brain2image").
Tweet media one
16
117
426
2
4
39
@BlinkDL_AI
BlinkDL
3 months
Fast RWKV-5 inference on Intel iGPUs & GPU: 🚀
@BlinkDL_AI
BlinkDL
3 months
RWKV-5 "Eagle" 7B: beats Mistral-7B at multilingual, reaches Llama2-7B level at English, while being 100% attention-free RNN and only trained 1.1T tokens. Gradio Demo: RWKV-6 "Finch" 1B5 in ~10days, 3B in ~30days.
Tweet media one
Tweet media two
10
88
424
0
9
39
@BlinkDL_AI
BlinkDL
1 year
Information propagation in #RWKV (100% RNN Language Model) by our community member: "The values in the chart are the log-probability of outputting "Paris" when the states and activation of the layer at that token are recovered from corruption."
Tweet media one
0
7
39
@BlinkDL_AI
BlinkDL
7 months
More benchmarks. RWKV-5 World v2 download (training in progress): RWKV is 100% RNN - fast and saves VRAM🚀
Tweet media one
1
8
38
@BlinkDL_AI
BlinkDL
6 months
Cool RWKV-related papers: (ASR) (ICL) (boost RWKV4 performance). Latest 0.4B 3B 7B checkpts:
0
9
37
@BlinkDL_AI
BlinkDL
1 year
The @AiEleuther community is writing a paper on #RWKV . If you'd like to be a part of the team, join the rwkv paper channel: in EleutherAI Discord: 🚀
1
5
35
@BlinkDL_AI
BlinkDL
9 months
RWKV-4 and RWKV-5 ABC (music sheet format) model 🎹 and (100% RNN) #RWKV
Tweet media one
0
3
32
@BlinkDL_AI
BlinkDL
5 months
p.s. Transformers are RNNs with growing states (growing KV cache). Strange that this is not mentioned more.
@BlinkDL_AI
BlinkDL
5 months
RWKV-6 illustrated (formulas: ). Other projects are comparing with RWKV-4 (and calling it "RWKV"). They don't even dare to show RWKV-5 numbers😂RWKV-5 3B Gradio demo:
Tweet media one
6
25
152
2
7
32
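A toy sketch of the post above: step-by-step attention decoding is itself a recurrence whose "state" (the KV cache) grows by one entry per token, while an RWKV-style state stays the same size forever. Dimensions and weights below are arbitrary; this is not a real model.

```python
import numpy as np

d, steps = 16, 10
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

K_cache, V_cache = [], []           # the transformer's "RNN state": grows every step
rwkv_state = np.zeros((d, d))       # an RWKV-style state: fixed size forever

for t in range(steps):
    x = rng.normal(size=d)
    q, k, v = Wq @ x, Wk @ x, Wv @ x
    # attention decode step: append to the cache, attend over everything so far
    K_cache.append(k); V_cache.append(v)
    att = np.exp(np.stack(K_cache) @ q / np.sqrt(d))
    y_attn = (att / att.sum()) @ np.stack(V_cache)
    # RWKV-style step: fold the same information into a constant-size matrix
    rwkv_state = 0.9 * rwkv_state + np.outer(k, v)
    y_rnn = q @ rwkv_state
    print(f"t={t}: KV cache holds {len(K_cache)} entries; RNN state stays {rwkv_state.shape}")
```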
@BlinkDL_AI
BlinkDL
5 months
@felix_red_panda arXiv is the beginning. We can use the latest news, GitHub repos, arXiv papers, blog posts, new wiki entries, and more. The point is to benchmark LLMs on new data - although they can be polluted by ChatGPT too, it is still better than using very old (and actually noisy) evals.
2
2
30
@BlinkDL_AI
BlinkDL
1 year
@ipvkyte @StabilityAI @AiEleuther @EMostaque Can already do zero-shot instruction :) I will release an Alpaca-tuned version soon.
@BlinkDL_AI
BlinkDL
1 year
And that's why I prefer prompt-tuning😀I tried the "Alpaca prompt" on RWKV 14B ctx8192, and to my surprise it works out of the box without finetuning (RWKV is a 100% RNN trained on 100% Pile v1 and nothing else). demo: code:
Tweet media one
Tweet media two
2
6
42
0
0
29
@BlinkDL_AI
BlinkDL
1 year
The largest RNN ever: RWKV-4 14B release😀 Let's build ChatRWKV:
Tweet media one
0
9
30
@BlinkDL_AI
BlinkDL
1 year
@tugot17 @StabilityAI @AiEleuther @EMostaque Effective ctxlen easily goes beyond 4K :)
Tweet media one
1
0
29
@BlinkDL_AI
BlinkDL
1 year
Would you be interested in testing the quantization of RWKV 14B? 🙂 It has both GPT and RNN mode (can be used as 100% RNN), so faster and saves VRAM. #RWKV #ChatRWKV
Tweet media one
@Tim_Dettmers
Tim Dettmers
1 year
An update to our k-bit inference scaling laws paper: + Includes results for 175B OPT/BLOOM + Short analysis of scaling behavior of GPTQ + Better related work + Main takeaway: input-dependent quantization like GPTQ might unlock less than 4-bits.
3
40
163
3
1
29
@BlinkDL_AI
BlinkDL
1 year
@ArthurB @StabilityAI @AiEleuther @EMostaque RWKV can be trained in GPT mode too, so you get all the GPT benefits.
0
0
28
@BlinkDL_AI
BlinkDL
7 months
Benchmark of popular small LMs on HuggingFace using (removed TriviaQA for now, as lm_eval is not good at parsing verbose replies; we should probably use few-shot to restrict the response format)
Tweet media one
0
7
29
@BlinkDL_AI
BlinkDL
6 months
All efficient RWKV-5 backends: (nvidia, amd, intel, arm, mac, gpu, cpu, vulkan, ...)
@BlinkDL_AI
BlinkDL
6 months
RWKV v5 models are particularly good at fiction🙂3B Demo:
Tweet media one
2
17
94
0
3
29
@BlinkDL_AI
BlinkDL
1 year
Please take a look at RWKV 14B too🙂Our latest ctx4096 model is great. #RWKV #ChatRWKV
Tweet media one
@ggerganov
Georgi Gerganov
1 year
I think I can make 4-bit LLaMA-65B inference run on a 64 GB M1 Pro 🤔 Speed should be somewhere around 2 tokens/sec. Is this useful for anything?
37
17
453
0
2
27
@BlinkDL_AI
BlinkDL
6 months
Stranger things can happen as we get closer to the turning point of humankind, if my theory of the universe is correct🙃
2
3
26
@BlinkDL_AI
BlinkDL
7 months
Gradio demo of 1.5B RWKV-5 World v2 (70% trained) now with more examples (you can edit the prompt and try a few times): 🤖🚀
Tweet media one
Tweet media two
1
4
25