RWKV

@RWKV_AI

2,100
Followers
3
Following
13
Media
50
Statuses

AI model built by the community, for everyone in this world. Part of the Linux Foundation, Apache 2 licensed. An RNN scaled to 14B params with GPT-level performance.

World
Joined November 2023
Pinned Tweet
@RWKV_AI
RWKV
7 months
#RWKV is One Dev's Journey to Dethrone Transformers. The largest RNN ever (up to 14B). Parallelizable. Fast inference & training. Quantizable. Low VRAM usage. 3+ years of hard work. Created by @BlinkDL_AI. Computation sponsored by @StabilityAI and @AiEleuther
0
4
26
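For readers wondering how "an RNN with GPT-level performance" can also be parallelizable and cheap at inference, here is a minimal, heavily simplified sketch of the WKV recurrence at the core of RWKV-4, in recurrent (inference) mode. It is an illustration only: it omits token-shift, the receptance gate, channel mixing, and the numerically stable form used in practice, and the variable names (w, u, k, v) follow the paper's notation rather than anything in these tweets.

```python
import numpy as np

def wkv_recurrent(k, v, w, u):
    """Simplified RWKV-4 style WKV recurrence (single head, per-channel).

    k, v : arrays of shape (T, C) -- keys and values for T tokens, C channels
    w    : array of shape (C,)    -- per-channel decay, applied as e^{-w} each step
    u    : array of shape (C,)    -- per-channel "bonus" weight for the current token

    The state is just two (C,)-vectors, so the memory carried per token is O(C),
    independent of sequence length -- the RNN property the pinned tweet refers to.
    NOTE: the real kernels use a numerically stabilized form of this recurrence.
    """
    T, C = k.shape
    num = np.zeros(C)              # running weighted sum of values
    den = np.zeros(C)              # running sum of weights
    out = np.zeros((T, C))
    for t in range(T):
        ek = np.exp(k[t])
        # current token gets an extra bonus weight e^{u + k_t}
        out[t] = (num + np.exp(u) * ek * v[t]) / (den + np.exp(u) * ek)
        # decay the state and absorb the current token
        num = np.exp(-w) * num + ek * v[t]
        den = np.exp(-w) * den + ek
    return out
```

During training the same quantity can be computed for all timesteps in parallel (it is an exponentially weighted cumulative sum), which is what "parallelizable" refers to; at inference only the small per-layer state vectors are carried forward.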
@RWKV_AI
RWKV
4 months
Introducing Eagle-7B. Based on the RWKV-v5 architecture, bringing into the open-source space the strongest:
- multi-lingual model (beating even Mistral)
- attention-free transformer today (10-100x+ lower inference cost)
With English performance comparable to the best 1T-token 7B models
23
251
1K
@RWKV_AI
RWKV
4 months
All while being:
- Cleanly licensed Apache 2, under @linuxfoundation (do anything with it!)
- The world's greenest 7B model 🌲 (by per-token energy consumption)
You can find out more in our full writeup:
3
20
157
@RWKV_AI
RWKV
2 months
🦅 Eagle & 🐦 Finch. The RWKV v5 and v6 architecture paper is here. Both improve over RWKV-4, scaled up to 7.5B and 3.1B parameter multilingual models respectively. Open-source code, weights, and dataset. Apache 2 licensed, under the Linux Foundation
5
43
170
@RWKV_AI
RWKV
2 months
The conclusive EagleX is here. Based on the RWKV-v5 architecture, bringing into the open-source 7B space the best SOTA:
- multi-lingual model
- English perplexity model
- attention-free transformer today (10-100x+ lower inference cost)
With English performance comparable to Mistral
3
32
156
@RWKV_AI
RWKV
4 months
If you want to quickly give it a try, you can go to our official Hugging Face demo of our latest model here: We would strongly encourage you to try it in non-English languages!
4
10
102
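If the hosted demo is busy, the model can also be run locally. A hedged sketch, assuming the Hugging Face transformers library and an Eagle checkpoint published under the RWKV organization; the repository id below is a guess, so substitute the one linked from the demo page.

```python
# Hypothetical local run of an RWKV/Eagle checkpoint via Hugging Face transformers.
# The repo id is an assumption -- use the id linked from the official demo instead.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RWKV/v5-Eagle-7B-HF"  # assumed repo id; v5 checkpoints may need custom code
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "La capitale de la France est"  # try non-English prompts, as the tweet suggests
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```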
@RWKV_AI
RWKV
4 months
In terms of actual multi-lingual eval numbers, we see a substantial overall jump (by 4%!) from our previous RWKV-v4-based architecture, even with the same training dataset. A huge win for 50% of the world's population 🗺️ (reaching beyond the ~17% of the world that speaks English)
3
0
59
@RWKV_AI
RWKV
4 months
This is significant because it shows clear evidence that RWKV / linear transformers have strong potential to replace the existing attention-based architecture, with substantially lower inference cost and no feature compromise. So all we need to do next is get GPUs & scale
2
2
57
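To make the inference-cost argument concrete, here is a rough, purely illustrative comparison of the memory a transformer decoder versus an RWKV model must carry while generating. Every number in it (layer count, head count, dimensions, fp16, the RWKV state multiplier) is an assumption for a generic ~7B model, not a figure from the RWKV team; the point is only that a KV cache grows linearly with context length while a recurrent state stays constant.

```python
# Illustrative only: memory kept around during autoregressive generation.

def kv_cache_bytes(context_len, n_layers=32, n_heads=32, head_dim=128, bytes_per=2):
    """Transformer: K and V are cached for every past token, in every layer."""
    return 2 * n_layers * n_heads * head_dim * bytes_per * context_len

def rwkv_state_bytes(n_layers=32, d_model=4096, state_mult=2, bytes_per=2):
    """RWKV: a fixed-size recurrent state per layer, independent of context length.
    state_mult is a crude placeholder; the exact state size depends on the RWKV version."""
    return n_layers * state_mult * d_model * bytes_per

for ctx in (1_024, 8_192, 65_536):
    print(f"ctx={ctx:>6}  KV cache ~ {kv_cache_bytes(ctx) / 2**20:9.1f} MiB   "
          f"RWKV state ~ {rwkv_state_bytes() / 2**20:5.2f} MiB")
```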
@RWKV_AI
RWKV
7 months
RWKV V5 3B model (preview) is out. A final fine-tune to increase its context length to 8k is on its way, which will also hopefully give that final score bump 😉 For now it looks on track to match the top 3B models in English, and surpass everyone in multi-lingual benchmarks 🤞
1
9
54
@RWKV_AI
RWKV
4 months
Regardless, we plan to further train this model with another 1T tokens, to bring it into direct comparison with the LLaMA2 7B model, and hopefully surpass it. Because it seems like we are scaling like a transformer by token count? As seen by similar scores to Pythia at 300B tokens
1
0
43
@RWKV_AI
RWKV
4 months
English-based evals show a similar leap, bringing us in line with the token scaling laws of transformers: we trade blows with other models trained on a similar (or larger) token count, before losing out to much longer-trained models like Mistral
1
1
41
@RWKV_AI
RWKV
4 months
Wrapping up: #RWKV was originally created by @BlinkDL_AI as a project at @AIEleuther and is now being hosted by @LFAIDataFdn. Find out more on our wiki. Compute was sponsored by @AIEleuther, @StabilityAI & others. RWKV is not an official @StabilityAI product
2
0
37
@RWKV_AI
RWKV
2 months
All while being:
- Cleanly licensed Apache 2, under @linuxfoundation (do anything with it!)
- The world's greenest 7B model 🌲 (by per-token energy consumption)
- Trained on 2.25T tokens
You can find out more in our full writeup here:
1
1
23
@RWKV_AI
RWKV
2 months
Stay tuned for more details on our upcoming models this week:
- Eagle: 2.25T tokens, 7B
- Finch: 2.5T tokens, 1.6B
(Some of you probably already know where to find them, if you search through our repos / discord)
1
0
13
@RWKV_AI
RWKV
2 months
This also marks the final Eagle model in our v5 line. Future Finch models will be based on the v6 architecture, which is shown to have approximately 10% (give or take) improvement in performance over v5, while being upcycling-compatible with v5. So here comes the Finch 🐦
1
1
13
@RWKV_AI
RWKV
2 months
The RWKV community wiki can be found at:
Our discord can be found at:
Give the model a try, drop by our discord, and provide us feedback on how we can improve the model for the community.
0
0
10
@RWKV_AI
RWKV
2 months
Does this cover our latest model? No, this covers our previously released Eagle and Finch line of models, trained up to 1.1T tokens. A reminder that, as a fully open-source project, we release in the following sequence: code, weights, then the paper. Not the other way around
1
0
10
@RWKV_AI
RWKV
2 months
Why is this progress significant? Because it shows clear evidence that RWKV / linear transformers have the potential to replace the existing attention-based architecture, with substantially lower inference cost and no feature compromise. 🦅 paper at:
1
0
9
@RWKV_AI
RWKV
2 months
Wrapping up: #RWKV was originally created by @BlinkDL_AI as a project at @AIEleuther, and is now being hosted by @LFAIDataFdn. Compute for this training was sponsored by @recursal_AI. You can find the latest EagleX model on their cloud platform here:
1
0
8
@RWKV_AI
RWKV
2 months
As with the previous 7B model, we further the open-source SOTA landscape with leading English perplexity performance, while maintaining SOTA multi-lingual performance across 23 languages
1
0
8
@RWKV_AI
RWKV
2 months
This is in line with our OSS group's overall goal: to ensure the best AI models are made accessible to everyone worldwide, regardless of language or economic status (approximate map of languages supported worldwide)
1
0
8
@RWKV_AI
RWKV
2 months
All while surpassing LLaMA2 7B across a mixture of 21 popular English evals, and closing the gap with Mistral 7B. Proving that with continued training, the model architecture scales similarly to (or better than) transformers by token count.
1
0
6
@RWKV_AI
RWKV
2 months
Special shout-outs to:
- @BlinkDL_AI: the creator of RWKV
- @AiEleuther: awesome folks who help us in the paper authoring process
- @LFAIDataFdn: for hosting the OSS project
- @StabilityAI: for partially sponsoring the bulk of the GPUs used for these documented models
1
0
6
@RWKV_AI
RWKV
2 months
If you want to quickly give it a try, you can go to our official Hugging Face demo of our latest model here: We would strongly encourage you to try it in non-English languages!
1
0
5
@RWKV_AI
RWKV
7 months
You can give our 3B demo a try here:
And compare it against our 1.5B demo:
Model weights for this preview are available at:
1
0
3
@RWKV_AI
RWKV
2 months
@QuentinAnthon15 @BingchenZhao In addition, a shout-out to the various contributors to the dataset, the model architecture, and the training & inference code. Paper authorship is reflected by paper-writing contribution, which is separate from model creation / code / dataset contribution
0
0
3
@RWKV_AI
RWKV
4 months
@Mt_B00Ks "The bald eagle, the only predator that doesn't have humans as its enemy."
0
0
1
@RWKV_AI
RWKV
7 months
#RWKV was originally created by @BlinkDL_AI as a project at @AIEleuther and is now being hosted by @LFAIDataFdn. Find out more on our wiki. Compute was sponsored by @AIEleuther and @StabilityAI. RWKV is not an official @StabilityAI product
0
0
0
@RWKV_AI
RWKV
6 months
@amirsalimiiii @K_P_Ise @SebastienBubeck @apples_jimmy You are probably referring to: The above is, however, a different technique. Hopefully more methods to improve LLM math will appear in the future.
@BlinkDL_AI
BlinkDL
11 months
A tiny #RWKV with 2.9M (!) params can solve 18239.715*9.728263 or 4.2379*564.778-1209.01 etc. with CoT, while being 100% #RNN (L6-D192) 🤯 The trick: generate lots of data with reversed numbers (denoted by "f" here) to train the model 🚀 Try it now:
5
40
196
0
0
1
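For readers who want to reproduce the quoted trick, the key idea is data generation: write the answer (and the intermediate chain-of-thought steps) with digits reversed, so the model emits the least-significant digit first, which is the natural order for carrying. A minimal sketch of such a generator follows; the exact formatting, the placement of the "f" marker, and the value ranges are assumptions for illustration, not BlinkDL's actual training format.

```python
import random

def rev_num(x) -> str:
    """Reverse a number's digit string and tag it with 'f', mimicking the
    reversed-number notation mentioned in the quoted tweet (format assumed)."""
    return "f" + str(x)[::-1]

def make_example() -> str:
    """One synthetic arithmetic sample: plain operands in, reversed answer out."""
    a = round(random.uniform(1, 99999), random.randint(0, 6))
    b = round(random.uniform(1, 999), random.randint(0, 6))
    answer = round(a * b, 10)
    return f"{a} * {b} = {rev_num(answer)}"

if __name__ == "__main__":
    random.seed(0)
    for _ in range(3):
        print(make_example())
```

A real dataset in the spirit of the tweet would also spell out the digit-by-digit chain-of-thought on the reversed representation, which this sketch omits.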