EnricoShippole

@EnricoShippole

2,288 Followers · 50 Following · 57 Media · 933 Statuses

@TeraflopAI

Joined November 2019
Pinned Tweet
@EnricoShippole
EnricoShippole
3 months
Happy to have played a part in the release of Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers. This SOTA work was done by @RiversHaveWings , @Birchlabs , @StefanABaumann , @iScienceLuvr , and @DanielZKaplan .
Tweet media one
3
10
51
@EnricoShippole
EnricoShippole
8 months
Releasing Yarn-Llama-2-13b-128k, a Llama-2 model, trained for 128k context length using YaRN scaling. The model was trained in collaboration with u/bloc97 and @theemozilla of @NousResearch and @Void13950782 of @AiEleuther .
Tweet media one
28
174
791
@EnricoShippole
EnricoShippole
1 year
Introducing three new open-source PaLM models trained at a context length of 8k on C4. Open-sourcing LLMs is a necessity for the fair and equitable democratization of AI. The models, at sizes of 150m, 410m, and 1b parameters, are available to download and use here:
7
106
535
@EnricoShippole
EnricoShippole
9 months
Releasing LLongMA-2 16k, a suite of Llama-2 models, trained at 16k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1 .
14
94
392
@EnricoShippole
EnricoShippole
10 months
Releasing LLongMA-2, a suite of Llama-2 models, trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1 .
6
88
375
@EnricoShippole
EnricoShippole
10 months
Releasing LLongMA-2 13b, a Llama-2 model, trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1 .
Tweet media one
7
53
269
@EnricoShippole
EnricoShippole
11 months
Releasing Hermes-Falcon-7b-8k, a Falcon model fine-tuned at 8k context length on the @NousResearch Hermes instruction dataset.
16
38
230
@EnricoShippole
EnricoShippole
11 months
With Reddit and many other sites shutting down access to their APIs, it is now more important than ever to release high-quality open-source conversational data. I worked with @ShayneRedford to generate ~80GB of labeled FLAN dialog data.
2
40
208
@EnricoShippole
EnricoShippole
11 months
Releasing Flan-Open-Llama-7b, an OpenLLaMA model fine-tuned on the FLAN instruction dataset.
5
42
207
@EnricoShippole
EnricoShippole
1 month
We publicly released a cleaned open-source version of the case law data. You can train your own similar legal models with this dataset. We plan to release numerous other legal datasets consisting of billions of tokens in the upcoming weeks.
@ClementDelangue
clem 🤗
1 month
Even OAI is telling you that specialized models are better!
Tweet media one
Tweet media two
9
17
141
21
30
195
@EnricoShippole
EnricoShippole
11 months
Releasing Hermes-Open-Llama-7b-8k, an OpenLLaMA model fine-tuned at 8k context length on the @NousResearch Hermes instruction dataset.
4
41
185
@EnricoShippole
EnricoShippole
1 year
Introducing an open-source reproduction of the FLAN V2 dataset.
3
32
177
@EnricoShippole
EnricoShippole
10 months
Releasing Flan-Open-Llama-13b, an OpenLLaMA model fine-tuned on the FLAN instruction dataset.
3
30
153
@EnricoShippole
EnricoShippole
11 months
Releasing a new PaLM 2.1b model trained at a context length of 8k on C4. This model release is a continuation of the previously released 150m, 410m, and 1b models.
@EnricoShippole
EnricoShippole
1 year
Introducing three new open-source PaLM models trained at a context length of 8k on C4. Open-sourcing LLMs is a necessity for the fair and equitable democratization of AI. The models, at sizes of 150m, 410m, and 1b parameters, are available to download and use here:
7
106
535
3
23
140
@EnricoShippole
EnricoShippole
8 months
The model can be found on @huggingface here:
4
21
139
@EnricoShippole
EnricoShippole
9 months
Releasing Hermes-LLongMA-2 8k, a series of Llama-2 models, trained at 8k context length using linear positional interpolation scaling. The models were trained in collaboration with @Teknium1 and @theemozilla of @NousResearch , and @kaiokendev1 .
3
33
134
@EnricoShippole
EnricoShippole
10 months
Introducing LLongMA, a series of OpenLLaMA models, trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and Kaiokendev.
Tweet media one
3
25
127
@EnricoShippole
EnricoShippole
10 months
Releasing Tasksource-Open-Llama-13b, an OpenLLaMA model fine-tuned on the Tasksource instruction dataset.
2
24
106
@EnricoShippole
EnricoShippole
2 months
@TeraflopAI is excited to help support @caselawaccess and @HarvardLIL in the release of over 6.6 million state and federal court decisions published throughout U.S. history.
Tweet media one
4
35
92
@EnricoShippole
EnricoShippole
10 months
Introducing LLongMA 13b, an OpenLLaMA model trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1 .
1
19
82
@EnricoShippole
EnricoShippole
11 months
Releasing Flan-Open-Llama-3b, an OpenLLaMA model fine-tuned on the FLAN instruction dataset.
1
19
73
@EnricoShippole
EnricoShippole
2 years
Towards clean and open-source text data. A deduplicated version of wikitext-103-v1 is available on @Huggingface datasets. The dataset was deduplicated with MinHash LSH at a Jaccard similarity threshold of 0.80. #machinelearning #deeplearning #datascience
2
9
60
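The tweet above describes deduplication with MinHash LSH at a 0.80 Jaccard threshold. A minimal sketch of that style of pipeline, using the datasketch library (an assumption; the actual dedup code may differ):

```python
from datasketch import MinHash, MinHashLSH

def shingles(text, n=5):
    # Break a document into word 5-grams for similarity hashing.
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(max(len(tokens) - n + 1, 1))}

def minhash(text, num_perm=128):
    m = MinHash(num_perm=num_perm)
    for s in shingles(text):
        m.update(s.encode("utf-8"))
    return m

docs = {
    "a": "the court held that the contract was void for lack of consideration",
    "b": "the court held that the contract was void for lack of consideration .",
    "c": "an entirely different article about rotary position embeddings",
}

# Keep a document only if no previously kept document exceeds ~0.80 Jaccard similarity.
lsh = MinHashLSH(threshold=0.80, num_perm=128)
kept = []
for doc_id, text in docs.items():
    m = minhash(text)
    if lsh.query(m):  # near-duplicate of something already kept
        continue
    lsh.insert(doc_id, m)
    kept.append(doc_id)

print(kept)  # expected: ["a", "c"]
```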
@EnricoShippole
EnricoShippole
8 months
We are releasing all of the code, open-source, to fully reproduce the results of the paper. The repository containing u/bloc97 and @theemozilla ’s implementation of YaRN rotary embeddings can be found here:
2
6
54
@EnricoShippole
EnricoShippole
6 months
Happy to be a core contributor to @ShayneRedford 's Data Provenance Initiative. It is now more important than ever to verify the commercial licensing of available datasets in order to help ensure the integrity of the open-source community.
@ShayneRedford
Shayne Longpre
6 months
📢 Announcing the 🌟Data Provenance Initiative🌟 🧭 A rigorous public audit of 1800+ instruct/align datasets 🔍 Explore/filter sources, creators & license conditions ⚠️ We see a rising divide between commercially open vs. closed licensed data 🌐: 1/
10
151
461
0
15
52
@EnricoShippole
EnricoShippole
8 months
We worked to extend the context length of the Llama-2 13b and 7b models through fine-tuning. The models pass all our evaluations and maintain the same perplexity at 128k extrapolation, surpassing the performance of our other recent methodology, NTK-part scaling.
Tweet media one
2
12
48
@EnricoShippole
EnricoShippole
26 days
We are releasing trillions of high-quality, copyright-free, permissively licensed tokens and multimodal data. Be sure to follow our releases @TeraflopAI .
@TeraflopAI
TeraflopAI
26 days
Glad to see Stablelm-2-12B by @jonbtow , @dmayhem93 , and @StabilityAI using our permissively licensed data to push the cutting-edge of language modeling. Data quality is more important than ever. We are working to solve this challenge at scale at @TeraflopAI .
2
4
16
1
9
45
@EnricoShippole
EnricoShippole
2 months
It is important to democratize fair access to data for the public, the legal community, and researchers. You can find a processed and cleaned version of the data available on @huggingface here:
2
10
39
@EnricoShippole
EnricoShippole
10 months
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 7b model through fine-tuning. The model passes all our evaluations and maintains the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
Tweet media one
1
1
39
@EnricoShippole
EnricoShippole
10 months
We worked directly with Kaiokendev to extend the context length of the OpenLLaMA 7b and 3b models through fine-tuning. The fine-tuned models maintain the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
Tweet media one
1
5
37
@EnricoShippole
EnricoShippole
8 months
A Yarn-Llama-2-7b model trained for 128k context length is available on @huggingface here:
1
6
33
@EnricoShippole
EnricoShippole
11 months
@zhangir_azerbay Oak Ridge National Laboratory has a CUDA training series from 2021:
0
1
25
@EnricoShippole
EnricoShippole
10 months
The model can be found on @huggingface here:
1
3
26
@EnricoShippole
EnricoShippole
1 year
The models are also compatible with many of Lucidrain's popular repositories such as Toolformer-pytorch, PaLM-rlhf-pytorch, and PaLM-pytorch. Please be sure to sponsor and help support Phil's great work:
1
0
22
@EnricoShippole
EnricoShippole
10 months
The data used during fine-tuning was extensively decontaminated by @dmayhem93 and cleaned of any benchmarks the models were evaluated against.
@suchenzang
Susan Zhang
10 months
Odds of everyone starting to train on benchmarks? 🤔 Llama2 only briefly mentions this in Appendix A.6, but only published numbers they deemed "significant" (vs Table C.1 in the GPT-3 paper which shows actual contamination metrics across all benchmarks).
9
5
89
2
3
23
@EnricoShippole
EnricoShippole
4 months
YaRN: Efficient Context Window Extension of Large Language Models was accepted to ICLR 2024. @bloc97_ @theemozilla @Void13950782 @iclr_conf #ICLR2024
@EnricoShippole
EnricoShippole
8 months
Releasing Yarn-Llama-2-13b-128k, a Llama-2 model, trained for 128k context length using YaRN scaling. The model was trained in collaboration with u/bloc97 and @theemozilla of @NousResearch and @Void13950782 of @AiEleuther .
Tweet media one
28
174
791
2
2
24
@EnricoShippole
EnricoShippole
1 year
I have been working on an open-source replication of WebGPT using @LangChainAI . LangChain by @hwchase17 is by far the best library for building comprehensive language applications.
@LangChainAI
LangChain
1 year
🔎 More detailed search results: @EnricoShippole added a method to the search classes to return more detailed search info: title, snippet, link. 👀 WebGPT? Google Search: Bing Search:
1
1
19
3
0
22
@EnricoShippole
EnricoShippole
11 months
@OfirPress Reddit data is extremely low quality and should be filtered from almost all pre-training, so this won't make any difference regardless.
6
1
18
@EnricoShippole
EnricoShippole
1 year
You can find the weights on @huggingface if you prefer to download the @PyTorch .pt files from there instead:
2
1
18
@EnricoShippole
EnricoShippole
8 months
It is also worth reviewing the paper A Length-Extrapolatable Transformer and its xPos technique, which also applies scaling to rotary embeddings:
1
4
17
@EnricoShippole
EnricoShippole
10 months
A Llama-2 7b model trained at 16k context length will be released soon on @huggingface here:
1
4
17
@EnricoShippole
EnricoShippole
1 year
@wightmanr Currently working on this in collab with Lucid and a few members from Carper/EAI. @ShayneRedford has been helping me open-source the FLAN dataset so we can instruction fine-tune models from the Pythia suite, as well as train a flan-PaLM model.
1
3
17
@EnricoShippole
EnricoShippole
8 months
The models have similar performance to the base LLaMA 2 models on the Open LLM benchmarks while scaling context length directly to 128k.
Tweet media one
1
2
17
@EnricoShippole
EnricoShippole
10 months
A LLongMA-13b model trained at 8k context length will be released soon, as well as a suite of LLongMA models trained at 16k and 32k context lengths.
1
2
16
@EnricoShippole
EnricoShippole
10 months
A Llama-2 13b model trained at 8k will be released soon on @huggingface here:
1
1
16
@EnricoShippole
EnricoShippole
1 year
All of the C4 data has been pre-tokenized with the GPT-NeoX tokenizer and blocked at sequence lengths of 8192. This will help save you the high cost of preprocessing the data. The datasets are available on @huggingface . An example chunk can be found here:
2
1
15
@EnricoShippole
EnricoShippole
8 months
We also trained a set of models at 64k context length. You can find the Yarn-Llama-2-13b-64k model here:
1
2
16
@EnricoShippole
EnricoShippole
1 year
Our work on Toolformer, PaLM, and related projects is all thanks to the generous sponsorship by @carperai and @StabilityAI . A big thank you to @dmayhem93 , @jonbtow , Aman, and @zach_nussbaum as well for providing input on the @huggingface library.
1
0
16
@EnricoShippole
EnricoShippole
1 month
Not only that: we also built a search index over all of the data for RAG applications:
1
2
15
@EnricoShippole
EnricoShippole
21 days
Data is what makes the model. We at @TeraflopAI are working hard to provide the open-source community with permissively licensed, commercially usable datasets for training. Congrats to @arankomatsuzaki , @lintangsutawika , and @colinraffel . And thanks to @ShayneRedford for his work on FLAN.
@TeraflopAI
TeraflopAI
21 days
Glad to see our very own @arankomatsuzaki pushing the boundaries of open-source research with a new T5 release using our data. Congrats to @lintangsutawika and @colinraffel , and thanks to @ShayneRedford for his great efforts on FLAN.
0
4
9
0
1
14
@EnricoShippole
EnricoShippole
4 days
Happy to announce our paper, Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers, has been accepted to #ICML2024 . A huge congratulations to @RiversHaveWings , @StefanABaumann , and @Birchlabs . @icmlconf #ICML
@EnricoShippole
EnricoShippole
3 months
Happy to have played a part in the release of Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers. This SOTA work was done by @RiversHaveWings , @Birchlabs , @StefanABaumann , @iScienceLuvr , and @DanielZKaplan .
Tweet media one
3
10
51
2
7
36
@EnricoShippole
EnricoShippole
11 months
Additionally, you can find a Hermes-Falcon-7b-4k model fine-tuned at a context length of 4k on @huggingface here:
2
1
14
@EnricoShippole
EnricoShippole
26 days
Glad to see Stablelm-2-12B by @jonbtow , @dmayhem93 , and @StabilityAI using our permissively licensed data to push the cutting-edge of language modeling. Data quality is more important than ever. @arankomatsuzaki and I are working to solve this challenge at scale at @TeraflopAI .
@Euclaise_
Jade
27 days
Has anyone tried this yet? They seem to have perfected training small models (1.6B and 3B). If they were able to keep that while scaling up, this should be amazing.
5
8
85
0
2
14
@EnricoShippole
EnricoShippole
10 months
The model has similar performance to LLaMA 2 under 4k context length, performance scales directly to 8k, and works out-of-the-box with the new version of transformers (4.31) or with `trust_remote_code` for <= 4.30.
1
1
13
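For reference, loading a checkpoint like this with the transformers library follows the usual pattern; the repository id below is a placeholder rather than the exact model name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/LLongMA-2-7b-8k"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    # transformers >= 4.31 handles the scaled rotary embeddings natively;
    # on <= 4.30 the custom modeling code shipped with the checkpoint is required.
    trust_remote_code=True,
)
```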
@EnricoShippole
EnricoShippole
8 months
As well as the Yarn-Llama-2-7b-64k model here:
1
2
13
@EnricoShippole
EnricoShippole
11 months
The @NousResearch Hermes dataset consists of over 300,000 instruction data points. Thank you to @karan4d and @Teknium1 for providing the data to train these models.
1
1
13
@EnricoShippole
EnricoShippole
1 year
Further instruction-tuning will be done on the new FLAN datasets we have released. A big thank you to @ShayneRedford for helping!
1
0
13
@EnricoShippole
EnricoShippole
1 year
This is not an official Google or StabilityAI product. If you have any questions about the models or training, be sure to reach out and ask! I will try to respond promptly.
2
0
12
@EnricoShippole
EnricoShippole
10 months
Applying the method to the rotary position embedding requires only slight changes to the model's code by dividing the positional index, t, by a scaling factor.
Tweet media one
1
1
12
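A minimal sketch of that change, assuming a standard rotary-embedding helper (names and values here are illustrative, not the model's actual code):

```python
import torch

def rope_cos_sin(head_dim, seq_len, scale=4.0, base=10000.0):
    # Standard rotary-embedding angles, except the position index t is divided
    # by a scaling factor (e.g. 4.0 stretches a 2048-token model to 8192).
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seq_len).float() / scale  # linear positional interpolation
    angles = torch.outer(t, inv_freq)
    return angles.cos(), angles.sin()

cos, sin = rope_cos_sin(head_dim=128, seq_len=8192, scale=4.0)
```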
@EnricoShippole
EnricoShippole
8 months
You can find out more about the @NousResearch organization here:
0
2
12
@EnricoShippole
EnricoShippole
12 days
A big thank you to @joespeez of @Meta for mentioning our previous research, YaRN, at the @weights_biases Fully Connected conference. We have some exciting long-context releases coming up soon.
@TeraflopAI
TeraflopAI
12 days
Awesome to see @joespeez , AI Product Director, @Meta , mention our previous research, YaRN, on stage at the @weights_biases Fully Connected conference. We have another very exciting long-context release coming soon.
1
4
8
2
0
12
@EnricoShippole
EnricoShippole
1 year
A distributed training script is provided so that you may train or fine-tune your own PaLM models using @huggingface accelerate. More information and experiments about the training will be detailed in the repository:
1
1
11
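A stripped-down sketch of what an accelerate training loop looks like, with a toy model standing in for the actual PaLM implementation and script:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Toy stand-in for the real model and data; the released script wraps the PaLM model.
model = nn.Linear(512, 512)
optimizer = optim.AdamW(model.parameters(), lr=1e-4)
dataset = TensorDataset(torch.randn(64, 512), torch.randn(64, 512))
loader = DataLoader(dataset, batch_size=8)

accelerator = Accelerator()  # handles device placement, DDP, and mixed precision
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

model.train()
for inputs, targets in loader:
    loss = nn.functional.mse_loss(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Launched with `accelerate launch train.py`, the same loop scales from a single GPU to multi-GPU setups without code changes.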
@EnricoShippole
EnricoShippole
10 months
@Yampeleg This is a common practice that has been used for quite a few years. You can find an example of packing the text and appending an EOS/EOT token with Huggingface datasets and tokenizers here:
0
1
10
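A minimal sketch of that packing pattern with Hugging Face datasets and tokenizers (toy documents and the GPT-NeoX tokenizer chosen for illustration; the linked example may differ):

```python
from itertools import chain
from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
block_size = 8192  # real corpora fill the blocks; the toy data here stays tiny

ds = Dataset.from_dict({"text": ["first document", "second document", "third document"]})

def tokenize(batch):
    # Append an EOS token to every document so boundaries survive packing.
    return tokenizer([t + tokenizer.eos_token for t in batch["text"]])

def pack(batch):
    # Concatenate all token ids, then split them into fixed-length blocks.
    joined = {k: list(chain.from_iterable(batch[k])) for k in batch}
    total = (len(joined["input_ids"]) // block_size) * block_size
    return {k: [v[i:i + block_size] for i in range(0, total, block_size)]
            for k, v in joined.items()}

ds = ds.map(tokenize, batched=True, remove_columns=["text"])
ds = ds.map(pack, batched=True)
```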
@EnricoShippole
EnricoShippole
10 months
@EMostaque @alexgraveley @joao_gante Llama-2 8k is releasing tomorrow if all goes smoothly.
0
2
11
@EnricoShippole
EnricoShippole
8 months
All of the models can be found on Huggingface:
1
2
10
@EnricoShippole
EnricoShippole
9 months
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 13b and 7b models through fine-tuning. The models pass all our evaluations and maintain perplexity at 16k extrapolation, surpassing the performance of other recent methodologies.
Tweet media one
2
0
10
@EnricoShippole
EnricoShippole
11 months
The dialog data is available on @huggingface to download. It was processed at an extended context length of 8192. It contains relevant metadata such as Inputs, Targets, Task Source, and Task Name.
1
0
10
@EnricoShippole
EnricoShippole
1 year
Working on an open-source version of DeepMind's Sparrow: a conversational agent utilizing Google's web search API for factual grounding and RLHF. I am going to be taking the learnings from our previous implementation of Toolformer.
1
0
10
@EnricoShippole
EnricoShippole
9 months
The 7b model can be found on @huggingface here:
1
0
10
@EnricoShippole
EnricoShippole
10 months
The LLongMA 7b model is available on @huggingface to use:
1
0
10
@EnricoShippole
EnricoShippole
1 year
A basic inference script is provided in the repository which you can play around with. You may want to experiment with the sampling hyperparameters to get generations of varying quality; a setting such as temperature matters a lot.
1
0
10
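A generic sampling example in the same spirit, using a small stand-in checkpoint rather than the repository's own inference script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Open-sourcing LLMs is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.7,  # lower values are more conservative, higher more diverse
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(out[0], skip_special_tokens=True))
```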
@EnricoShippole
EnricoShippole
1 year
If you would like to preprocess your own dataset for training there is a dataset builder script provided. This uses @huggingface datasets to efficiently map, tokenize, and block the data:
1
0
9
@EnricoShippole
EnricoShippole
11 months
An additional FLAN Dialog submix dataset was also preprocessed for causal language modeling, fixing different encoding issues, and is available on @huggingface to download.
1
0
9
@EnricoShippole
EnricoShippole
10 months
If you would like to learn more about scaling rotary embeddings, I would strongly recommend reading @kaiokendev1 's blog posts on his findings:
1
1
8
@EnricoShippole
EnricoShippole
11 months
The @NousResearch Hermes dataset consists of over 300,000 instruction data points. Thank you to @karan4d and @Teknium1 for providing the data to train these models.
2
1
8
@EnricoShippole
EnricoShippole
1 year
Adding support for numerous different #opensource models and providers to @LangChainAI is an imperative step in establishing an ecosystem that is mutually beneficial to all. The work done by @hwchase17 will help lead to both fair and equal distribution of artificial intelligence.
@LangChainAI
LangChain
1 year
🦜🔗 v0.0.86 📂 Lots more open source model integrations! @EnricoShippole 🪵 PromptLayer ( @imjaredz ) and Helicone ( @justinstorre ) integrations. And lots of other docs and bug fixes! 🧵
1
6
54
1
1
9
@EnricoShippole
EnricoShippole
5 months
Disseminating artificial intelligence through clean, open-source user interfaces and experiences is absolutely necessary to bridge the gap between the research community and app developers. We must start furthering collaboration with front-end communities.
@rohanpaul_ai
Rohan Paul
5 months
Ollama iOS mobile app (open source) It works with all models served with Ollama.
11
41
174
2
0
8
@EnricoShippole
EnricoShippole
11 months
This is an earlier extension of his work to publicly release the FLAN collection:
@EnricoShippole
EnricoShippole
1 year
Introducing an open-source reproduction of the FLAN V2 dataset.
3
32
177
1
0
9
@EnricoShippole
EnricoShippole
5 days
I have more copyright-free and commercially viable data than I know what to do with. We are always actively looking for organizations to partner with to train on and serve this data to the community.
@PeterHndrsn
Peter Henderson
6 days
🚨More AI copyright lawsuits!🚨 1. Artists sue Google for Imagen (). 2. More newspapers sue MSFT/OpenAI (). The newspaper litigation has far more compelling examples and arguments than prior cases. One to watch.
0
5
28
0
1
9
@EnricoShippole
EnricoShippole
11 months
Additionally, you can find a Hermes-Open-Llama-7b-4k model fine-tuned at a context length of 4k on @huggingface here:
1
0
8
@EnricoShippole
EnricoShippole
1 year
"Helped" add the new @PyTorch 2.0 Flash Attention to Lucidrain's PaLM-rlhf-pytorch repository. The repository uses RLHF to build models similar to #ChatGPT and #GPT4 . Be sure to support/donate to his great open-source work.
0
2
7
@EnricoShippole
EnricoShippole
1 year
I have had the pleasure of working with @hwchase17 to expand the @LangChainAI ecosystem by adding support for numerous different #opensource models, such as those by @AiEleuther , and providers. It is a necessary step in ensuring the democratization of artificial intelligence.
@hwchase17
Harrison Chase
1 year
We need more options for integrating open source models (like those from @AiEleuther ) into @LangChainAI . 🏆 Thanks to @EnricoShippole we have exactly that. 🚀 First-class support for @gooseai_NLP , @cerebriumai , @ForefrontAI , and Petals. 📃 Docs:
1
5
53
0
2
9
@EnricoShippole
EnricoShippole
2 months
In collaboration with Ravel Law, @hlslib digitized over 40 million pages of U.S. court decisions, consisting of 6.7 million cases from the last 360 years, into a dataset that is widely accessible to use. You can bulk download the data using the CAP API:
1
1
7
@EnricoShippole
EnricoShippole
10 months
If you would like to learn more about scaling rotary embeddings, I would strongly recommend reading @kaiokendev1 's blog posts on his findings:
1
0
8
@EnricoShippole
EnricoShippole
11 months
@nisten @NousResearch Standard fine-tune at 8k. No landmark. No lora.
0
0
8
@EnricoShippole
EnricoShippole
2 months
Thank you to @nomic_ai for providing us with Atlas research credits to store and visualize each of the jurisdictions in this dataset. You can view a Nomic Atlas map of New York state court decisions here:
1
2
8
@EnricoShippole
EnricoShippole
3 months
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers was number 1 on Hacker News.
@Birchlabs
Birchlabs
3 months
our paper made it to #1 on Hacker News!
Tweet media one
6
11
236
0
1
8