BlinkDL @BlinkDL_AI Twitter profile | Pikagi

BlinkDL

@BlinkDL_AI

6,790

Followers

92

Following

82

Media

186

Statuses

RWKV = 100% RNN with GPT-level performance. and

https://t.co/6ZHNbn36G4

Joined September 2022

Pinned Tweet

@BlinkDL_AI

BlinkDL

1 year

#RWKV is One Dev's Journey to Dethrone GPT Transformers. The largest RNN ever (up to 14B). Parallelizable. Faster inference & training. Supports INT8/4. No KV cache. 3 years of hard work. DEMO: Computation sponsored by @StabilityAI @AiEleuther @EMostaque

Tweet media one

45

348

2K

@BlinkDL_AI

BlinkDL

3 months

RWKV-5 "Eagle" 7B: beats Mistral-7B at multilingual, reaches Llama2-7B level at English, while being 100% attention-free RNN and only trained 1.1T tokens. Gradio Demo: RWKV-6 "Finch" 1B5 in ~10days, 3B in ~30days.

Tweet media one

Tweet media two

10

88

424

@BlinkDL_AI

BlinkDL

1 year

#RWKV : Reinventing RNNs for the Transformer Era

Tweet media one

Tweet media two

Tweet media three

Tweet media four

@AiEleuther

EleutherAI

1 year

Everyone knows that transformers are synonymous with large language models… but what if they weren’t? Over the past two years @BlinkDL_AI and team have been hard at work scaling RNNs to unprecedented scales. Today we are releasing a preprint on our work

5

117

474

8

95

344

@BlinkDL_AI

BlinkDL

1 year

Raven v6🐦7B (added gpt4all etc.) Please compare with 7B test5🙂 RWKV-4-Raven-7B-v6-Eng (99% English + 1% Multilang) RWKV-4-Raven-7B-v6-EngChnJpn (98% English + 1% Chn Jpn [GuanacoDataset] + 1% Multilang)

Tweet card media

BlinkDL/rwkv-4-pile-7b at main

4

57

261

@BlinkDL_AI

BlinkDL

7 months

RWKV-5 World v2 - The best multilingual & code 1.5B language model is here🙂Online Demo: 100% RNN & attention-free. 3B & 7B coming soon.

Tweet media one

6

55

248

@BlinkDL_AI

BlinkDL

1 year

Raven 14B ( #RWKV finetuned on alpaca+codealpaca): Raven 7B: Raven 7B Gradio Demo: Try "+i Tell me about ravens." in ChatRWKV v2 to use them🐦

Tweet card media

RWKV-Gradio-2 - a Hugging Face Space by BlinkDL

6

47

236

@BlinkDL_AI

BlinkDL

10 months

RWKV-5 increases headsz from 1 to 64, similar to applying RWKV-style "RNNify" to Linear Transformers (2006.16236). States are now matrix-valued and larger (also shown in RetNet to be helpful). Note there is no need for positional encoding.

Tweet media one

3

38

221

@BlinkDL_AI

BlinkDL

7 months

RWKV-5 World v2: the strongest 1.5B language model ever, supports 100+ world languages & code. Release in 12 days. training 3B & 7B too.

Tweet media one

4

42

218

@BlinkDL_AI

BlinkDL

1 year

Raven v8🐦14B to the moon🚀based on #RWKV (100% #RNN language model) 14B/7B/3B/1B Download: And v9 soon (ctxlen 8192, 3x SFT data)🚀

Tweet media one

7

40

204

@BlinkDL_AI

BlinkDL

10 months

A tiny #RWKV with 2.9M (!) params can solve 18239.715*9.728263 or 4.2379*564.778-1209.01 etc. with CoT, while being 100% #RNN (L6-D192)🤯The trick: generate lots of data with reversed numbers (denoted by "f" here) to train the model🚀Try it now:

Tweet media one

5

40

197

@BlinkDL_AI

BlinkDL

5 months

nanoRWKV: does not require custom CUDA kernel to train, works for any GPU / CPU 🙂 This is built on RWKV "x051" (current RWKV v5 models are "x052"). RWKV is a 100% RNN with GPT-level performance. DEMO:

Tweet media one

4

35

194

@BlinkDL_AI

BlinkDL

1 year

New RWKV multilang tokenizer (sz 65525) for future RWKV models. Better than 20B_tokenizer for all langs & code🚀Very easy encoding: simply greedy match to pick the longest token. Supports European langs and CJK and more. Clean vocab. Good at numbers too🔥

Tweet media one

2

38

190
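
The "greedy match to pick the longest token" encoding mentioned in the tokenizer tweet above can be sketched roughly as follows. This is a minimal illustration on character strings with a made-up vocabulary; the function name `greedy_encode` and `TOY_VOCAB` are hypothetical, and the real RWKV tokenizer (its vocabulary, byte handling, and lookup structure) may well differ.

```python
def greedy_encode(text, vocab):
    """Encode text by repeatedly taking the longest vocab entry that matches."""
    max_len = max(len(tok) for tok in vocab)
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shrink until something matches.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in vocab:
                ids.append(vocab[piece])
                pos += length
                break
        else:
            # No vocab entry matches at this position.
            raise ValueError(f"no token matches at position {pos}")
    return ids


# Toy vocabulary for illustration only (not the real RWKV vocab).
TOY_VOCAB = {"a": 0, "b": 1, "ab": 2, "abc": 3, "c": 4, " ": 5}

encoded = greedy_encode("abcab c", TOY_VOCAB)
print(encoded)
```

Because every step commits to the longest match, no merge table is consulted, which is presumably what makes the encoding "very easy".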

@BlinkDL_AI

BlinkDL

8 months

Coming soon: new RWKV-5 with matrix-valued states / u / w, for best performance ever🚀

Tweet media one

2

21

177

@BlinkDL_AI

BlinkDL

1 year

Raven v9🐦7B 14B ctx8192 (better at everything), and multi-lang 10%Chn-JpnEspKor2%-Other2% models (7B finished. 14B training)🚀Language ratios determined by amt of ChatGPT data. Please share more ChatGPT data to increase the ratio of your lang🚀

Tweet card media

BlinkDL/rwkv-4-raven · Hugging Face

4

29

169

@BlinkDL_AI

BlinkDL

1 year

Raven v7🐦7B & 14B, 3x v6 data (Alpaca+Vicuna-style, with plenty of multiround chats) finetuned from #RWKV (100% RNN Language Model). It's especially good at coding (Simply chat with it. Screenshot is 7B-Eng-v7). Download: Demo:

Tweet media one

4

38

166

@BlinkDL_AI

BlinkDL

1 year

Raven is #RWKV finetuned on Alpaca dataset🐦 DEMO:

Tweet media one

6

28

163

@BlinkDL_AI

BlinkDL

1 year

Note that RWKV "wins more than 50% of non-tied matches against all other open-source models except Vicuna."🙂We are optimizing Arena settings to match ChatRWKV, and the real position of RWKV is very likely just below Vicuna.

Tweet media one

@lmsysorg

lmsys.org

1 year

Announcing the Week 2 update for the Chatbot Arena leaderboard! We've added some new models that are showcasing strong performance. Currently, @OpenAI 's GPT-4 and @AnthropicAI 's Claude lead the pack, with open-source models in hot pursuit. More findings:

Tweet media one

48

278

1K

4

29

163

@BlinkDL_AI

BlinkDL

5 months

RWKV-6 illustrated (formulas: ). Other projects are comparing with RWKV-4 (and call it "RWKV"). They don't even dare to show RWKV-5 numbers😂RWKV-5 3B Gradio demo:

Tweet media one

@BlinkDL_AI

BlinkDL

5 months

RWKV6🐦 vs Mamba🐍. RWKV6 will be the strongest multilingual model (data = only 1.1T tokens), which can occupy some capacities for English, but worth it🙂 Mamba @ 0.3T (Pile) is great at 1.4B, less so at 2.8B. Will be interesting to see the results when provided more training data.

Tweet media one

3

19

145

6

25

152

@BlinkDL_AI

BlinkDL

3 months

RWKV-5 "Eagle" 7B is Mistral-7B level for language modeling of unseen arxiv CS & Physics papers, and significantly better than Llama2🐦We are testing more new data.

Tweet media one

@BlinkDL_AI

BlinkDL

5 months

Uncheatable LLM benchmark🙂A dev is testing new data: tokenize the first 5000 chars of 1000 new arXiv papers, compute sum of [neg. log prob.], smaller = better. RWKV-5 is good here. Phi-2 not good. makes you think🤔

Tweet media one

6

20

143

2

26

150
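
The scoring rule in the "Uncheatable LLM benchmark" tweet above (sum of negative log probabilities, smaller is better) can be sketched as below. The per-token probabilities are invented for illustration; in a real run they would come from each model's predicted probability of the actual next token, and `total_neg_log_prob` is a hypothetical helper name.

```python
import math


def total_neg_log_prob(token_probs):
    """Sum of -log(p), where p is the probability the model gave each actual next token."""
    return sum(-math.log(p) for p in token_probs)


# Hypothetical probabilities two models assigned to the same text.
model_a = [0.5, 0.25, 0.5]
model_b = [0.25, 0.25, 0.25]

score_a = total_neg_log_prob(model_a)
score_b = total_neg_log_prob(model_b)
print(score_a < score_b)  # the model with the smaller sum fits the text better
```

Summing over the whole text, rather than averaging per token, keeps the score comparable between models whose tokenizers split the same text into different numbers of tokens.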

@BlinkDL_AI

BlinkDL

6 months

RWKV-5 7B 49% trained and it's already a strong model (100% RNN). try it in

Tweet media one

5

18

145

@BlinkDL_AI

BlinkDL

10 months

The JPNtuned 7B #RWKV World is the best open-source Japanese LLM 🚀Runner: Model (55% trained, finishing in a few days): More languages are coming🌍RWKV is 100% RNN

Tweet media one

1

43

140

@BlinkDL_AI

BlinkDL

14 days

RWKV state-tuning alignment: because RWKV is 100% RNN, we can directly tune its RNN state to control its behavior🤯For example, a state-tuned RWKV-6 "Finch" 1.6B can be fun and use emojis🐦even for unseen prompts. Demo model: (use rwkv pip pkg 0.8.26+, and…

Tweet media one

7

19

137

@BlinkDL_AI

BlinkDL

2 months

RWKV as an efficient text compressor:

Tweet media one

4

22

127

@BlinkDL_AI

BlinkDL

2 months

Google's RG-LRU is the same as RWKV-6 / GLA / ... and used our planned "Hawk" name (for RWKV-8)😂Fortunately RWKV-7 "Goose" is safe and WIP.

@_akhaliq

AK

2 months

Google presents Griffin Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN

Tweet media one

7

100

533

4

17

126

@BlinkDL_AI

BlinkDL

3 months

100% composed by RWKV-6 120M params MIDI model🎶Still takes multiple trials for such high quality outputs, but I will fix this🙂

3

21

125

@BlinkDL_AI

BlinkDL

5 days

Introducing RWKV-6 "Finch" 7B🐦 the strongest 100% RNN LLM (attention-free). Trained on 2.5T multilingual tokens (supports 100+ languages🌍 and code). Further scaling soon🚀Gradio Demo: #RWKV #RNN

Tweet media one

5

22

131

@BlinkDL_AI

BlinkDL

5 months

RWKV-5 7B 72% trained (finishing before xmas) uploaded to Supports 100+ languages and code. Attention-free RNN Language model. Trained on just 1.1T tokens - imagine what happens when we have compute to train on 2T+ tokens🙂

Tweet media one

5

20

122

@BlinkDL_AI

BlinkDL

2 months

RWKV-6.0 "Finch" 3B - reaching multilingual eval 58.9% (Mistral 7B = 58.2%). Gradio Demo: Will continue training it on World-2.1 (1.4T) to boost performance. Download:

Tweet media one

Tweet media two

5

21

118

@BlinkDL_AI

BlinkDL

5 months

And RWKV might be the only open source LLM architecture at this moment🙂I gave it to the Linux Foundation @LFAIDataFdn - please feel free to contact @picocreator for collaborations🎉

Tweet media one

@BlinkDL_AI

BlinkDL

5 months

nanoRWKV: does not require custom CUDA kernel to train, works for any GPU / CPU 🙂 This is built on RWKV "x051" (current RWKV v5 models are "x052"). RWKV is a 100% RNN with GPT-level performance. DEMO:

Tweet media one

4

35

194

3

13

120

@BlinkDL_AI

BlinkDL

1 year

Raven v10🐦7B/3B/1.5B based on #RWKV 100% RNN language model (14B soon). Now with 7B Eng89%-日本語10% version too 🚀 Gradio demo: (has chat mode now)

Tweet media one

0

22

116

@BlinkDL_AI

BlinkDL

11 months

RWKV World 7B release🌍chat & generate & code in 100+ world languages. The best small multi-lang model (available in 0.1~7B) and 100% RNN🚀Use to run it. DEMO: Download: #RWKV

Tweet media one

3

30

115

@BlinkDL_AI

BlinkDL

3 months

RWKV-6.0 "Finch" 1.6B is exceptionally good at multilingual (for its size): There will be more iterations (6.0, 6.1, 6.2) 🙂 #RWKV

Tweet media one

5

15

113

@BlinkDL_AI

BlinkDL

1 year

RWKV-4-World: Chat and text generation in 100 languages🌎Good at English zeroshot too. 0.1/0.4B done, 1.5/3/7B preview: Yes even 0.1B can chat in 100 langs🚀And there will be "RavenWorld" with further chat optimizations🐦

Tweet media one

Tweet media two

Tweet media three

2

22

110

@BlinkDL_AI

BlinkDL

1 year

Raven🐦14B-Eng v7 (100% RNN based on #RWKV ). Download: Run: (16G VRAM recommended). You can also try the Bot in RWKV Discord:

Tweet media one

Tweet media two

Tweet media three

Tweet media four

1

17

110

@BlinkDL_AI

BlinkDL

7 months

The 14% trained RWKV-5 World v2 is already almost RWKV-4 World level🚀 I am training 3B & 7B too. #RWKV

Tweet media one

4

10

105

@BlinkDL_AI

BlinkDL

4 months

RWKV-5 7B 86% uploaded to (100% on Jan-28🚀)

Tweet media one

7

14

102

@BlinkDL_AI

BlinkDL

2 years

RWKV-4: scaling RNN to 7B params and beyond, with GPT-level language modeling and zero-shot performance :)

Tweet media one

0

32

101

@BlinkDL_AI

BlinkDL

1 year

@clif08_ @StabilityAI @AiEleuther @EMostaque Yes! Use with multiple "strategies" to run on low VRAM GPUs.

Tweet card media

GitHub - BlinkDL/ChatRWKV: ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model,...

ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source. - BlinkDL/ChatRWKV

0

5

102

@BlinkDL_AI

BlinkDL

1 year

Raven-test5🐦: Meticulously finetuned #RWKV (the RNN LM) on alpaca & more. These are STRONG chat models too (with latest ChatRWKV v2 English-2 prompt). 7B 14B DEMO

Tweet card media

RWKV-Gradio-2 - a Hugging Face Space by BlinkDL

2

20

101

@BlinkDL_AI

BlinkDL

2 months

RWKV-6 showing perfect MQAR performance within test range🙂(compared with Based / Mamba) in RWKV v5+v6 WIP paper discussion: (It's in EleutherAI discord: )

Tweet media one

1

10

98

@BlinkDL_AI

BlinkDL

7 months

RWKV-5 3B in 15 days, and 7B in December🙂 #RWKV checkpts:

Tweet media one

1

13

96

@BlinkDL_AI

BlinkDL

1 month

RWKV-6.0 "Finch"🐦1.6B on 2.5T (=World2+World2.1) tokens for great performance. 100% RNN & attention-free. Supports 100+ world languages & code. Demo: (try a few times for each prompt, as this is base model) 3B in April and 7B in early May🙂I haven't put…

Tweet media one

Tweet media two

Tweet media three

4

23

95

@BlinkDL_AI

BlinkDL

6 months

RWKV v5 models are particularly good at fiction🙂3B Demo:

Tweet media one

@BlinkDL_AI

BlinkDL

6 months

RWKV v5 3B ctx4k - a nice English model, and the best multilingual model in town. 3B demo: (compare with 1.5B demo: ) Will finetune to longer ctx soon.

Tweet media one

Tweet media two

Tweet media three

2

15

87

2

17

94

@BlinkDL_AI

BlinkDL

10 months

From #RWKV MIDI model (100% RNN), continuation of the first 15 seconds of Raiden Shogun Theme #GenshinImpact . The model has zero knowledge of chords, instrumentation, etc. and learns everything by itself from the MIDI dataset🚀Download:

0

13

94

@BlinkDL_AI

BlinkDL

1 year

Use rwkv.cpp for fast INT4/INT8 CPU inference 🚀 works for Linux / Mac / Windows (Q8_0 recommended). Raven🐦14B v10 demo: 7B v11x demo: #RWKV is 100% RNN with great performance.

Tweet media one

Tweet media two

1

12

93

@BlinkDL_AI

BlinkDL

1 month

Diffusion-RWKV (RWKV-4) with good results🙂Let's see what happens when we upgrade to RWKV-6 #RWKV #RNN

@_akhaliq

AK

1 month

Diffusion-RWKV Scaling RWKV-Like Architectures for Diffusion Models Transformers have catalyzed advancements in computer vision and natural language processing (NLP) fields. However, substantial computational complexity poses limitations for their application in long-context

Tweet media one

2

35

175

1

13

93

@BlinkDL_AI

BlinkDL

1 year

RNN is All You Need? Role-playing with 100% RNN Language Model: #RWKV "Raven"🐦14B-Eng v7. We will be able to efficiently run it on CPU with (FP16 ready, INT4 WIP).

Tweet card media

GitHub - RWKV/rwkv.cpp: INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model - RWKV/rwkv.cpp

@BlinkDL_AI

BlinkDL

1 year

Raven🐦14B-Eng v7 (100% RNN based on #RWKV ). Download: Run: (16G VRAM recommended). You can also try the Bot in RWKV Discord:

Tweet media one

Tweet media two

Tweet media three

Tweet media four

1

17

110

1

15

92

@BlinkDL_AI

BlinkDL

2 months

Note RWKV is 100% RNN and all context is compressed into its state📦Here is RWKV-6 cuda kernel with state gradients: to enable state-tuning🚀(much more efficient and powerful than prompt-tuning) and genuine BPTT.

@BlinkDL_AI

BlinkDL

2 months

RWKV-6 showing perfect MQAR performance within test range🙂(compared with Based / Mamba) in RWKV v5+v6 WIP paper discussion: (It's in EleutherAI discord: )

Tweet media one

1

10

98

0

14

90

@BlinkDL_AI

BlinkDL

6 months

RWKV-6 looks good🙂Preview checkpts: supported in rwkv pip package 0.8.22+, and it's easy to migrate from RWKV-5

Tweet media one

@BlinkDL_AI

BlinkDL

7 months

RWKV-6 Todo. RWKV-5 Gradio Demo:

Tweet media one

3

10

61

2

17

89

@BlinkDL_AI

BlinkDL

4 days

RWKV6, xLSTM, Mamba, Griffin, GLA, HGRN2, ... All are similar "matrix-valued dynamic exponential decay"🙂 Only differences are sharing some parameters / adding some tweaks / adding some attention (hybrid). RWKV6 is the most battle-tested AFAIK: 7B dense @ 2.5T (attention-free to…

@BlinkDL_AI

BlinkDL

5 days

Introducing RWKV-6 "Finch" 7B🐦 the strongest 100% RNN LLM (attention-free). Trained on 2.5T multilingual tokens (supports 100+ languages🌍 and code). Further scaling soon🚀Gradio Demo: #RWKV #RNN

Tweet media one

5

22

131

2

11

104

@BlinkDL_AI

BlinkDL

1 year

And here is #RWKV 14B Demo: 🚀

Tweet card media

RWKV-Gradio-1 - a Hugging Face Space by BlinkDL

@huggingface

Hugging Face

1 year

The first RNN in transformers! 🤯 Announcing the integration of RWKV models in transformers with @BlinkDL_AI and RWKV community! RWKV is an attention free model that combines the best from RNNs and transformers. Learn more about the model in this blogpost:

Tweet media one

19

268

1K

7

33

82

@BlinkDL_AI

BlinkDL

25 days

Introducing RWKV-6 "Finch" 3B v2.1🐦a very performant base model, and 100% RNN. 7B soon. Gradio Demo: Download:

Tweet media one

@BlinkDL_AI

BlinkDL

1 month

RWKV-6.0 "Finch"🐦1.6B on 2.5T (=World2+World2.1) tokens for great performance. 100% RNN & attention-free. Supports 100+ world languages & code. Demo: (try a few times for each prompt, as this is base model) 3B in April and 7B in early May🙂I haven't put…

Tweet media one

Tweet media two

Tweet media three

4

23

95

1

8

79

@BlinkDL_AI

BlinkDL

11 months

The Math behind WKV kernel in #RWKV :

Tweet media one

1

19

77

@BlinkDL_AI

BlinkDL

1 year

#RWKV Language Model in pure CUDA🚀(no need for pytorch, works for Linux & Windows). RWKV is 100% RNN with great performance:

Tweet card media

RWKV-Gradio-2 - a Hugging Face Space by BlinkDL

@HarrisonVander1

Harrison Vanderbyl

1 year

RWKV-CPP-CUDA, is an implementation of rwkv written in pure c++ and cuda, allowing for both high speed and portability. Examples of use include add-ons for game engines, such as Godot-rwkv, a distributable build of godot for developing AI games. Details:

0

18

76

3

18

76

@BlinkDL_AI

BlinkDL

25 days

Vision-RWKV with RWKV-6 layers: Diffusion-RWKV: RWKV-5/6 paper: RWKV-6 "Finch"🐦7B under training and going strong. Fully 100% RNN and attention-free.

Tweet media one

@BlinkDL_AI

BlinkDL

1 month

Diffusion-RWKV (RWKV-4) with good results🙂Let's see what happens when we upgrade to RWKV-6 #RWKV #RNN

1

13

93

0

12

74

@BlinkDL_AI

BlinkDL

1 year

"EEG" Visualization of the #RWKV 1.5B RNN Language Model (the output after each block) by one of our community members😀

1

7

72

@BlinkDL_AI

BlinkDL

5 months

RWKV-6 progress. v6 1.6B in 20 days. #RWKV

Tweet media one

@BlinkDL_AI

BlinkDL

5 months

More RWKV-6 results: [v6 1.60b 42%] eval = [v5 1.58b 42%] eval + 0.9/1.2%🙂Uploaded to

Tweet media one

1

8

57

3

8

71

@BlinkDL_AI

BlinkDL

5 months

Transformer in 2024 be like🙂 Gen6 designs: RWKV-6🐦, Mamba🐍 Gen5 designs: RWKV-5, RetNet RWKV-5 3B Gradio Demo: RWKV Projects:

Tweet media one

4

14

70

@BlinkDL_AI

BlinkDL

10 months

#RWKV midi 120M params model 🎹🥁sample:

Tweet media one

2

14

71

@BlinkDL_AI

BlinkDL

5 months

RWKV-6: larger improvements in larger models. 3b 14% uploaded to

Tweet media one

1

13

63

@BlinkDL_AI

BlinkDL

14 days

RWKV-6 state-tuned 1.6B Gradio Demo: Only the initial state is tuned, so just 24x64x2048=3.1M parameters, and 0 inference overhead. Tuned a few thousand samples. Will add more🚀

Tweet card media

RWKV-Gradio-2 - a Hugging Face Space by BlinkDL

@BlinkDL_AI

BlinkDL

14 days

RWKV state-tuning alignment: because RWKV is 100% RNN, we can directly tune its RNN state to control its behavior🤯For example, a state-tuned RWKV-6 "Finch" 1.6B can be fun and use emojis🐦even for unseen prompts. Demo model: (use rwkv pip pkg 0.8.26+, and…

Tweet media one

7

19

137

1

13

62
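
The "24x64x2048=3.1M parameters" figure in the state-tuned demo tweet above can be checked with a few lines. Reading the three factors as number of layers, head size, and embedding width is an assumption made here for illustration; the tweet only gives the bare product.

```python
# Assumed meaning of the three factors (not stated in the tweet):
n_layer = 24     # number of layers
head_size = 64   # size of each head
n_embd = 2048    # embedding width

# Size of the tuned initial state under that reading.
state_params = n_layer * head_size * n_embd
print(state_params)  # 3145728, i.e. about 3.1M
```

Under this reading only the initial state is trained, which is tiny next to the roughly 1.6B weights that stay frozen.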

@BlinkDL_AI

BlinkDL

3 months

Local WebGPU inference of a 0.4B RWKV5 in your desktop/mobile browser🙂

Tweet media one

1

9

63

@BlinkDL_AI

BlinkDL

1 year

Raven 14B v11x & Q8_0 version (good for rwkv.cpp) 🚀 Gradio demo updated to 14B v11x too:

Tweet media one

Tweet media two

1

19

59

@BlinkDL_AI

BlinkDL

3 months

RWKV-6.0 "Finch" is much better at roleplaying with simple prompts and understanding instructions, while using the same data as v5🐦Gradio demo:

Tweet media one

@BlinkDL_AI

BlinkDL

3 months

RWKV-6.0 "Finch" 1.6B is exceptionally good at multilingual (for its size): There will be more iterations (6.0, 6.1, 6.2) 🙂 #RWKV

Tweet media one

5

15

113

2

12

57

@BlinkDL_AI

BlinkDL

1 year

@tugot17 :)

Tweet card media

RyokoAI/ShareGPT52K · Datasets at Hugging Face

1

11

58

@BlinkDL_AI

BlinkDL

1 year

Better Raven v6🐦7B & 14B finetuned from #RWKV (the 100% RNN Language Model) DEMO:

Tweet card media

RWKV-Gradio-2 - a Hugging Face Space by BlinkDL

3

8

56

@BlinkDL_AI

BlinkDL

3 months

More RWKV-6 "Finch" evals on unseen data by 🐦RWKV-6 1.6B Gradio Demo:

Tweet media one

@BlinkDL_AI

BlinkDL

3 months

RWKV-6.0 "Finch" 1.6B is exceptionally good at multilingual (for its size): There will be more iterations (6.0, 6.1, 6.2) 🙂 #RWKV

Tweet media one

5

15

113

0

9

53

@BlinkDL_AI

BlinkDL

13 days

From community: RWKV-6 3B can be state-tuned to 99.2% LAMBADA, memorizing 400k+ tokens🧠 (only for testing capacity - it's training on test set). Method: check readme "State-tuning" part🚀 #RWKV #RNN #LL

Tweet media one

@BlinkDL_AI

BlinkDL

14 days

RWKV-6 state-tuned 1.6B Gradio Demo: Only the initial state is tuned, so just 24x64x2048=3.1M parameters, and 0 inference overhead. Tuned a few thousand samples. Will add more🚀

1

13

62

1

11

53

@BlinkDL_AI

BlinkDL

9 months

RWKV-4 world tuned to Japanese: and Arabic: (contact me on discord if you have data for other languages🙂) I'm preparing world v2 data for RWKV-5 0.1~14B runs🌍

1

18

47

@BlinkDL_AI

BlinkDL

6 months

And RWKV-5 models are as fast as RWKV-4 in latest rwkv.cpp now (try Q5_1)🚀

Tweet card media

GitHub - RWKV/rwkv.cpp: INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model - RWKV/rwkv.cpp

@BlinkDL_AI

BlinkDL

6 months

RWKV-5 7B 49% trained and it's already a strong model (100% RNN). try it in

Tweet media one

5

18

145

0

5

46

@BlinkDL_AI

BlinkDL

1 month

RWKV-5/6 models are great at modeling arXiv TeX papers🧑‍🔬(the best among models of similar parameter count), because I put in both peS2o & SlimPaj_arxiv. Try "\section{Introduction}" in Gradio Demo. Tested in Wonder if someone can use it for TeX copilot🙂

Tweet media one

Tweet media two

@BlinkDL_AI

BlinkDL

1 month

RWKV-6.0 "Finch"🐦1.6B on 2.5T (=World2+World2.1) tokens for great performance. 100% RNN & attention-free. Supports 100+ world languages & code. Demo: (try a few times for each prompt, as this is base model) 3B in April and 7B in early May🙂I haven't put…

Tweet media one

Tweet media two

Tweet media three

4

23

95

0

8

43

@BlinkDL_AI

BlinkDL

1 year

@boborado Would you be interested in training more #RWKV models 😀 (on RedPajama & more)

5

3

45

@BlinkDL_AI

BlinkDL

5 months

Demo script to train RWKV-5 on MiniPile (1.5G tokens, will auto-download), works for WSL2 too:

Tweet media one

0

8

45

@BlinkDL_AI

BlinkDL

1 year

And that's why I prefer prompt-tuning😀I try the "Alpaca prompt" on RWKV 14B ctx8192, and to my surprise it works out of box without finetuning (RWKV is a 100% RNN trained on 100% Pile v1 and nothing else). demo: code:

Tweet media one

Tweet media two

@geoffreyhinton

Geoffrey Hinton

1 year

Reinforcement Learning by Human Feedback is just parenting for a supernaturally precocious child.

135

491

3K

2

6

42

@BlinkDL_AI

BlinkDL

1 year

I always wonder if we can do this for animals (such as cats and dogs) too. Will their "text encoder" be similar to ours (I feel so)? Can we finally decode their brain and talk with them?🐱🐶

@NishimotoShinji

Shinji Nishimoto

1 year

Our paper got accepted at #CVPR2023 ! (w/ @yu_takagi ) We modeled the relationship between human brain activity (early/semantic areas) and Stable Diffusion's latent representations and decoded perceptual contents from brain activity ("brain2image").

Tweet media one

16

117

426

2

4

39

@BlinkDL_AI

BlinkDL

1 year

@lorenlugosch @StabilityAI @AiEleuther @EMostaque SpikeGPT is using RWKV and they have nice diagrams :)

Tweet card media

GitHub - ridgerchu/SpikeGPT: Implementation of "SpikeGPT: Generative Pre-trained Language Model...

Implementation of "SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks" - ridgerchu/SpikeGPT

0

1

39

@BlinkDL_AI

BlinkDL

3 months

Fast RWKV-5 inference on Intel iGPUs & GPU: 🚀

@BlinkDL_AI

BlinkDL

3 months

RWKV-5 "Eagle" 7B: beats Mistral-7B at multilingual, reaches Llama2-7B level at English, while being 100% attention-free RNN and only trained 1.1T tokens. Gradio Demo: RWKV-6 "Finch" 1B5 in ~10days, 3B in ~30days.

Tweet media one

Tweet media two

10

88

424

0

9

39

@BlinkDL_AI

BlinkDL

1 year

Information propagation in #RWKV (100% RNN Language Model) by our community member: "The values in the chart are the log-probability of outputting "Paris" when the states and activation of the layer at that token are recovered from corruption."

Tweet media one

0

7

39

@BlinkDL_AI

BlinkDL

7 months

More benchmarks. RWKV-5 World v2 download (training in progress): RWKV is 100% RNN - fast and saves VRAM🚀

Tweet media one

1

8

38

@BlinkDL_AI

BlinkDL

6 months

Cool RWKV-related papers: (ASR) (ICL) (boost RWKV4 performance). Latest 0.4B 3B 7B checkpts:

Tweet card media

BlinkDL/temp-latest-training-models at main

0

9

37

@BlinkDL_AI

BlinkDL

1 year

The @AiEleuther community is writing a paper on #RWKV . If you'd like to be a part of the team, join the rwkv paper channel: in EleutherAI Discord: 🚀

Tweet card media

Join the EleutherAI Discord Server!

Check out the EleutherAI community on Discord - hang out with 27270 other members and enjoy free voice and text chat.

1

5

35

@BlinkDL_AI

BlinkDL

1 year

@AlmeidaVitor21 @StabilityAI @AiEleuther @EMostaque You can use LoRA😀

Tweet card media

GitHub - Blealtan/RWKV-LM-LoRA: RWKV is a RNN with transformer-level LLM performance. It can be...

RWKV is a RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inf...

0

2

35

@BlinkDL_AI

BlinkDL

9 months

RWKV-4 and RWKV-5 ABC (music sheet format) model 🎹 and (100% RNN) #RWKV

Tweet media one

0

3

32

@BlinkDL_AI

BlinkDL

5 months

p.s. Transformers are RNNs with growing states (growing KV cache). Strange that this is not mentioned more.

@BlinkDL_AI

BlinkDL

5 months

RWKV-6 illustrated (formulas: ). Other projects are comparing with RWKV-4 (and call it "RWKV"). They don't even dare to show RWKV-5 numbers😂RWKV-5 3B Gradio demo:

Tweet media one

6

25

152

2

7

32
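
The remark above that "Transformers are RNNs with growing states (growing KV cache)" can be illustrated with a toy single-head attention decoder written as a recurrent update. This is a simplified sketch: the function name `attention_step` is hypothetical, the usual 1/sqrt(d) scaling and learned projections are omitted, and the same vector is reused as query, key, and value purely to keep the example short.

```python
import math


def attention_step(state, q, k, v):
    """One decoding step of softmax attention, phrased as an RNN update.

    `state` plays the role of the KV cache: a list of (key, value) pairs
    that grows by one entry per token.
    """
    state = state + [(k, v)]
    # Dot-product score of the current query against every cached key.
    scores = [sum(qi * ki for qi, ki in zip(q, key)) for key, _ in state]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    # Softmax-weighted average of the cached values.
    out = [
        sum(w * val[d] for w, (_, val) in zip(weights, state)) / total
        for d in range(len(v))
    ]
    return state, out


state = []
sizes = []
for t in range(4):
    vec = [float(t), 1.0]
    state, out = attention_step(state, vec, vec, vec)
    sizes.append(len(state))

print(sizes)  # the "state" holds one more (key, value) pair after each token
```

A fixed-state RNN would carry the same amount of state at every step, whereas here the state length equals the number of tokens seen so far.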

@BlinkDL_AI

BlinkDL

5 months

@felix_red_panda Arxiv is the beginning. We can use latest news, github repos, arxiv paper, blog posts, new wiki entries, and more. The point is to benchmark LLMs on new data - although they can be polluted by ChatGPT too, it is still better than using very old (and actually noisy) evals.

2

2

30

@BlinkDL_AI

BlinkDL

1 year

@ipvkyte @StabilityAI @AiEleuther @EMostaque Can already do zero-shot instruction :) I will release an Alpaca-tuned version soon.

@BlinkDL_AI

BlinkDL

1 year

And that's why I prefer prompt-tuning😀I try the "Alpaca prompt" on RWKV 14B ctx8192, and to my surprise it works out of box without finetuning (RWKV is a 100% RNN trained on 100% Pile v1 and nothing else). demo: code:

Tweet media one

Tweet media two

2

6

42

0

0

29

@BlinkDL_AI

BlinkDL

1 year

The largest RNN ever: RWKV-4 14B release😀 Let's build ChatRWKV:

Tweet media one

0

9

30

@BlinkDL_AI

BlinkDL

1 year

@tugot17 @StabilityAI @AiEleuther @EMostaque Effective ctxlen easily goes beyond 4K :)

Tweet media one

1

0

29

@BlinkDL_AI

BlinkDL

1 year

Would you be interested in testing the quantization of RWKV 14B? 🙂 It has both GPT and RNN mode (can be used as 100% RNN), so faster and saves VRAM. #RWKV #ChatRWKV

Tweet media one

@Tim_Dettmers

Tim Dettmers

1 year

An update to our k-bit inference scaling laws paper: + Includes results for 175B OPT/BLOOM + Short analysis of scaling behavior of GPTQ + Better related work + Main takeaway: input-dependent quantization like GPTQ might unlock less than 4-bits.

3

40

163

3

1

29

@BlinkDL_AI

BlinkDL

1 year

@ArthurB @StabilityAI @AiEleuther @EMostaque RWKV can be trained in GPT mode too, so you get all the GPT benefits.

0

0

28

@BlinkDL_AI

BlinkDL

7 months

Benchmark of popular small LMs on HuggingFace using (removed triviaQA for now as lm_eval is not good at parsing verbose replies. probably should use few-shot to restrict response format)

Tweet media one

0

7

29

@BlinkDL_AI

BlinkDL

6 months

All efficient RWKV-5 backends: (nvidia, amd, intel, arm, mac, gpu, cpu, vulkan, ...)

Tweet card media

GitHub - seasonjs/rwkv: pure go for rwkv

pure go for rwkv. Contribute to seasonjs/rwkv development by creating an account on GitHub.

@BlinkDL_AI

BlinkDL

6 months

RWKV v5 models are particularly good at fiction🙂3B Demo:

Tweet media one

2

17

94

0

3

29

@BlinkDL_AI

BlinkDL

1 year

Please take a look at RWKV 14B too🙂Our latest ctx4096 model is great. #RWKV #ChatRWKV

Tweet media one

@ggerganov

Georgi Gerganov

1 year

I think I can make 4-bit LLaMA-65B inference run on a 64 GB M1 Pro 🤔 Speed should be somewhere around 2 tokens/sec. Is this useful for anything?

37

17

453

0

2

27

@BlinkDL_AI

BlinkDL

6 months

Stranger things can happen as we get closer to the turning point of humankind, if my theory of the universe is correct🙃

2

3

26

@BlinkDL_AI

BlinkDL

7 months

Gradio demo of 1.5B RWKV-5 World v2 (70% trained) now with more examples (you can edit the prompt and try a few times): 🤖🚀

Tweet media one

Tweet media two

1

4

25