EnricoShippole

@EnricoShippole

2,288 Followers · 50 Following · 57 Media · 933 Statuses

@TeraflopAI

Joined November 2019
Pinned Tweet
@EnricoShippole
EnricoShippole
3 months
Happy to have played a part in the release of Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers. This SOTA work was done by @RiversHaveWings , @Birchlabs , @StefanABaumann , @iScienceLuvr , and @DanielZKaplan .
Tweet media one
3
10
51
@EnricoShippole
EnricoShippole
8 months
Releasing Yarn-Llama-2-13b-128k, a Llama-2 model, trained for 128k context length using YaRN scaling. The model was trained in collaboration with u/bloc97 and @theemozilla of @NousResearch and @Void13950782 of @AiEleuther .
Tweet media one
28
174
791
@EnricoShippole
EnricoShippole
1 year
Introducing three new open-source PaLM models trained at a context length of 8k on C4. Open-sourcing LLMs is a necessity for the fair and equitable democratization of AI. The models, at sizes of 150m, 410m, and 1b parameters, are available to download and use here:
7
106
535
@EnricoShippole
EnricoShippole
9 months
Releasing LLongMA-2 16k, a suite of Llama-2 models, trained at 16k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1 .
14
94
392
@EnricoShippole
EnricoShippole
10 months
Releasing LLongMA-2, a suite of Llama-2 models, trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1 .
6
88
375
@EnricoShippole
EnricoShippole
10 months
Releasing LLongMA-2 13b, a Llama-2 model, trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1 .
Tweet media one
7
53
269
@EnricoShippole
EnricoShippole
11 months
Releasing Hermes-Falcon-7b-8k, a Falcon model fine-tuned at 8k context length on the @NousResearch Hermes instruction dataset.
16
38
230
@EnricoShippole
EnricoShippole
11 months
With Reddit and many other sites shutting down access to their APIs, it is now more important than ever to release high-quality open-source conversational data. I worked with @ShayneRedford to generate ~80GB of labeled FLAN dialog data.
2
40
208
@EnricoShippole
EnricoShippole
11 months
Releasing Flan-Open-Llama-7b, an OpenLLaMA model fine-tuned on the FLAN instruction dataset.
5
42
207
@EnricoShippole
EnricoShippole
1 month
We publicly released a cleaned open-source version of the case law data. You can train your own similar legal models with this dataset. We plan to release numerous other legal datasets consisting of billions of tokens in the upcoming weeks.
@ClementDelangue
clem 🤗
1 month
Even OAI is telling you that specialized models are better!
Tweet media one
Tweet media two
9
17
141
21
30
195
@EnricoShippole
EnricoShippole
11 months
Releasing Hermes-Open-Llama-7b-8k, an OpenLLaMA model fine-tuned at 8k context length on the @NousResearch Hermes instruction dataset.
4
41
185
@EnricoShippole
EnricoShippole
1 year
Introducing an open-source reproduction of the FLAN V2 dataset.
3
32
177
@EnricoShippole
EnricoShippole
10 months
Releasing Flan-Open-Llama-13b, an OpenLLaMA model fine-tuned on the FLAN instruction dataset.
3
30
153
@EnricoShippole
EnricoShippole
11 months
Releasing a new PaLM 2.1b model trained at a context length of 8k on C4. This model release is a continuation of the previously released 150m, 410m, and 1b models.
@EnricoShippole
EnricoShippole
1 year
Introducing three new open-source PaLM models trained at a context length of 8k on C4. Open-sourcing LLMs is a necessity for the fair and equitable democratization of AI. The models, at sizes of 150m, 410m, and 1b parameters, are available to download and use here:
7
106
535
3
23
140
@EnricoShippole
EnricoShippole
8 months
The model can be found on @huggingface here:
4
21
139
@EnricoShippole
EnricoShippole
9 months
Releasing Hermes-LLongMA-2 8k, a series of Llama-2 models, trained at 8k context length using linear positional interpolation scaling. The models were trained in collaboration with @Teknium1 and @theemozilla of @NousResearch , and @kaiokendev1 .
3
33
134
@EnricoShippole
EnricoShippole
10 months
Introducing LLongMA, a series of OpenLLaMA models, trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and Kaiokendev.
Tweet media one
3
25
127
@EnricoShippole
EnricoShippole
10 months
Releasing Tasksource-Open-Llama-13b, an OpenLLaMA model fine-tuned on the Tasksource instruction dataset.
2
24
106
@EnricoShippole
EnricoShippole
2 months
@TeraflopAI is excited to help support @caselawaccess and @HarvardLIL in the release of over 6.6 million state and federal court decisions published throughout U.S. history.
Tweet media one
4
35
92
@EnricoShippole
EnricoShippole
10 months
Introducing LLongMA 13b, an OpenLLaMA model trained at 8k context length using linear positional interpolation scaling. The model was trained in collaboration with @theemozilla of @NousResearch and @kaiokendev1 .
1
19
82
@EnricoShippole
EnricoShippole
11 months
Releasing Flan-Open-Llama-3b, an OpenLLaMA model fine-tuned on the FLAN instruction dataset.
1
19
73
@EnricoShippole
EnricoShippole
2 years
Towards clean and open-source text data. A deduplicated version of wikitext-103-v1 is available on @Huggingface datasets. The dataset was deduplicated with MinHash LSH at a Jaccard similarity threshold of 0.80. #machinelearning #deeplearning #datascience
2
9
60
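The tweet above describes deduplication with MinHash LSH at a 0.80 Jaccard threshold. A minimal sketch of that style of pipeline, using the datasketch library (an assumption; the actual dedup code may differ):

```python
from datasketch import MinHash, MinHashLSH

def shingles(text, n=5):
    # Break a document into word 5-grams for similarity hashing.
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(max(len(tokens) - n + 1, 1))}

def minhash(text, num_perm=128):
    m = MinHash(num_perm=num_perm)
    for s in shingles(text):
        m.update(s.encode("utf-8"))
    return m

docs = {
    "a": "the court held that the contract was void for lack of consideration",
    "b": "the court held that the contract was void for lack of consideration .",
    "c": "an entirely different article about rotary position embeddings",
}

# Keep a document only if no previously kept document exceeds ~0.80 Jaccard similarity.
lsh = MinHashLSH(threshold=0.80, num_perm=128)
kept = []
for doc_id, text in docs.items():
    m = minhash(text)
    if lsh.query(m):  # near-duplicate of something already kept
        continue
    lsh.insert(doc_id, m)
    kept.append(doc_id)

print(kept)  # expected: ["a", "c"]
```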
@EnricoShippole
EnricoShippole
8 months
We are releasing all of the code, open-source, to fully reproduce the results of the paper. The repository containing u/bloc97 and @theemozilla ’s implementation of YaRN rotary embeddings can be found here:
2
6
54
@EnricoShippole
EnricoShippole
6 months
Happy to be a core contributor to @ShayneRedford 's Data Provenance Initiative. It is now more important than ever to verify the commercial licensing of available datasets in order to help ensure the integrity of the open-source community.
@ShayneRedford
Shayne Longpre
6 months
📢 Announcing the 🌟Data Provenance Initiative🌟 🧭 A rigorous public audit of 1800+ instruct/align datasets 🔍 Explore/filter sources, creators & license conditions ⚠️ We see a rising divide between commercially open vs. closed licensed data 🌐: 1/
10
151
461
0
15
52
@EnricoShippole
EnricoShippole
8 months
We worked to extend the context length of the Llama-2 13b and 7b models through fine-tuning. The models pass all our evaluations and maintain the same perplexity at 128k extrapolation, surpassing the performance of our other recent methodology, NTK-part scaling.
Tweet media one
2
12
48
@EnricoShippole
EnricoShippole
26 days
We are releasing trillions of high-quality, copyright-free, permissively licensed tokens and multimodal data. Be sure to follow our releases @TeraflopAI .
@TeraflopAI
TeraflopAI
26 days
Glad to see Stablelm-2-12B by @jonbtow , @dmayhem93 , and @StabilityAI using our permissively licensed data to push the cutting-edge of language modeling. Data quality is more important than ever. We are working to solve this challenge at scale at @TeraflopAI .
2
4
16
1
9
45
@EnricoShippole
EnricoShippole
2 months
It is important to democratize fair access to data for the public, the legal community, and researchers. You can find a processed and cleaned version of the data available on @huggingface here:
2
10
39
@EnricoShippole
EnricoShippole
10 months
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 7b model through fine-tuning. The model passes all our evaluations and maintains the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
Tweet media one
1
1
39
@EnricoShippole
EnricoShippole
10 months
We worked directly with Kaiokendev to extend the context length of the OpenLLaMA 7b and 3b models through fine-tuning. The fine-tuned models maintain the same perplexity at 8k extrapolation, surpassing the performance of other recent methodologies.
Tweet media one
1
5
37
@EnricoShippole
EnricoShippole
8 months
A Yarn-Llama-2-7b model trained for 128k context length is available on @huggingface here:
1
6
33
@EnricoShippole
EnricoShippole
11 months
@zhangir_azerbay Oak Ridge National Laboratory has a CUDA training series from 2021:
0
1
25
@EnricoShippole
EnricoShippole
10 months
The model can be found on @huggingface here:
1
3
26
@EnricoShippole
EnricoShippole
1 year
The models are also compatible with many of Lucidrain's popular repositories such as Toolformer-pytorch, PaLM-rlhf-pytorch, and PaLM-pytorch. Please be sure to sponsor and help support Phil's great work:
1
0
22
@EnricoShippole
EnricoShippole
10 months
The data used during fine-tuning was extensively decontaminated by @dmayhem93 and cleaned of any benchmarks the models were evaluated against.
@suchenzang
Susan Zhang
10 months
Odds of everyone starting to train on benchmarks? 🤔 Llama2 only briefly mentions this in Appendix A.6, but only published numbers they deemed "significant" (vs Table C.1 in the GPT-3 paper which shows actual contamination metrics across all benchmarks).
9
5
89
2
3
23
@EnricoShippole
EnricoShippole
4 months
YaRN: Efficient Context Window Extension of Large Language Models was accepted to ICLR 2024. @bloc97_ @theemozilla @Void13950782 @iclr_conf #ICLR2024
@EnricoShippole
EnricoShippole
8 months
Releasing Yarn-Llama-2-13b-128k, a Llama-2 model, trained for 128k context length using YaRN scaling. The model was trained in collaboration with u/bloc97 and @theemozilla of @NousResearch and @Void13950782 of @AiEleuther .
Tweet media one
28
174
791
2
2
24
@EnricoShippole
EnricoShippole
1 year
I have been working on an open-source replication of WebGPT using @LangChainAI . LangChain by @hwchase17 is by far the best library for building comprehensive language applications.
@LangChainAI
LangChain
1 year
🔎 More detailed search results: @EnricoShippole added a method to the search classes to return more detailed search info: title, snippet, link. 👀 WebGPT? Google Search: Bing Search:
1
1
19
3
0
22
@EnricoShippole
EnricoShippole
11 months
@OfirPress Reddit data is extremely low quality and should be filtered from almost all pre-training, so this won't make any difference regardless.
6
1
18
@EnricoShippole
EnricoShippole
1 year
You can find the weights on @huggingface if you prefer to download the @PyTorch .pt files from there instead:
2
1
18
@EnricoShippole
EnricoShippole
8 months
It is also worth reviewing the paper A Length-Extrapolatable Transformer and its xPos technique, which also applies scaling to rotary embeddings:
1
4
17
@EnricoShippole
EnricoShippole
10 months
A Llama-2 7b model trained at 16k context length will be released soon on @huggingface here:
1
4
17
@EnricoShippole
EnricoShippole
1 year
@wightmanr Currently working on this in collab with Lucid and a few members from Carper/EAI. @ShayneRedford has been helping me open-source the FLAN dataset so we can instruction fine-tune models from the Pythia suite, as well as train a flan-PaLM model.
1
3
17
@EnricoShippole
EnricoShippole
8 months
The models have similar performance to the base LLaMA 2 models on the Open LLM benchmarks while scaling context length directly to 128k.
Tweet media one
1
2
17
@EnricoShippole
EnricoShippole
10 months
A LLongMA-13b model trained at 8k context length will be released soon, as well as a suite of LLongMA models trained at 16k and 32k context lengths.
1
2
16
@EnricoShippole
EnricoShippole
10 months
A Llama-2 13b model trained at 8k will be released soon on @huggingface here:
1
1
16
@EnricoShippole
EnricoShippole
1 year
All of the C4 data has been pre-tokenized with the GPT-NeoX tokenizer and blocked at sequence lengths of 8192. This will help save you the high cost of preprocessing the data. The datasets are available on @huggingface . An example chunk can be found here:
2
1
15
@EnricoShippole
EnricoShippole
8 months
We also trained a set of models at 64k context length. You can find the Yarn-Llama-2-13b-64k model here:
1
2
16
@EnricoShippole
EnricoShippole
1 year
Our work on Toolformer, PaLM, and related projects is all thanks to the generous sponsorship by @carperai and @StabilityAI . A big thank you to @dmayhem93 , @jonbtow , Aman, and @zach_nussbaum as well for providing input on the @huggingface library.
1
0
16
@EnricoShippole
EnricoShippole
1 month
Not only that: we also built a search index over all of the data for RAG applications:
1
2
15
@EnricoShippole
EnricoShippole
21 days
Data is what makes the model. We at @TeraflopAI are working hard to provide the open-source community with permissively licensed, commercially usable datasets for training. Congrats to @arankomatsuzaki , @lintangsutawika , and @colinraffel . And thanks to @ShayneRedford for his work on FLAN.
@TeraflopAI
TeraflopAI
21 days
Glad to see our very own @arankomatsuzaki pushing the boundaries of open-source research with a new T5 release using our data. Congrats to @lintangsutawika and @colinraffel , and thanks to @ShayneRedford for his great efforts on FLAN.
0
4
9
0
1
14
@EnricoShippole
EnricoShippole
4 days
Happy to announce our paper, Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers, has been accepted to #ICML2024 . A huge congratulations to @RiversHaveWings , @StefanABaumann , and @Birchlabs . @icmlconf #ICML
@EnricoShippole
EnricoShippole
3 months
Happy to have played a part in the release of Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers. This SOTA work was done by @RiversHaveWings , @Birchlabs , @StefanABaumann , @iScienceLuvr , and @DanielZKaplan .
Tweet media one
3
10
51
2
7
36
@EnricoShippole
EnricoShippole
11 months
Additionally, you can find a Hermes-Falcon-7b-4k model fine-tuned at a context length of 4k on @huggingface here:
2
1
14
@EnricoShippole
EnricoShippole
26 days
Glad to see Stablelm-2-12B by @jonbtow , @dmayhem93 , and @StabilityAI using our permissively licensed data to push the cutting-edge of language modeling. Data quality is more important than ever. @arankomatsuzaki and I are working to solve this challenge at scale at @TeraflopAI .
@Euclaise_
Jade
27 days
Has anyone tried this yet? They seem to have perfected training small models (1.6B and 3B). If they were able to keep that while scaling up, this should be amazing.
5
8
85
0
2
14
@EnricoShippole
EnricoShippole
10 months
The model has similar performance to LLaMA 2 under 4k context length, performance scales directly to 8k, and works out-of-the-box with the new version of transformers (4.31) or with `trust_remote_code` for <= 4.30.
1
1
13
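For reference, loading a checkpoint like this with the transformers library follows the usual pattern; the repository id below is a placeholder rather than the exact model name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/LLongMA-2-7b-8k"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    # transformers >= 4.31 handles the scaled rotary embeddings natively;
    # on <= 4.30 the custom modeling code shipped with the checkpoint is required.
    trust_remote_code=True,
)
```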
@EnricoShippole
EnricoShippole
8 months
As well as the Yarn-Llama-2-7b-64k model here:
1
2
13
@EnricoShippole
EnricoShippole
11 months
The @NousResearch Hermes dataset consists of over 300,000 instruction data points. Thank you to @karan4d and @Teknium1 for providing the data to train these models.
1
1
13
@EnricoShippole
EnricoShippole
1 year
Further instruction-tuning will be done on the new FLAN datasets we have released. A big thank you to @ShayneRedford for helping!
1
0
13
@EnricoShippole
EnricoShippole
1 year
This is not an official Google or StabilityAI product. If you have any questions about the models or training, be sure to reach out and ask! I will try to respond promptly.
2
0
12
@EnricoShippole
EnricoShippole
10 months
Applying the method to the rotary position embedding requires only slight changes to the model's code by dividing the positional index, t, by a scaling factor.
Tweet media one
1
1
12
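A minimal sketch of that change, assuming a standard rotary-embedding helper (names and values here are illustrative, not the model's actual code):

```python
import torch

def rope_cos_sin(head_dim, seq_len, scale=4.0, base=10000.0):
    # Standard rotary-embedding angles, except the position index t is divided
    # by a scaling factor (e.g. 4.0 stretches a 2048-token model to 8192).
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seq_len).float() / scale  # linear positional interpolation
    angles = torch.outer(t, inv_freq)
    return angles.cos(), angles.sin()

cos, sin = rope_cos_sin(head_dim=128, seq_len=8192, scale=4.0)
```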
@EnricoShippole
EnricoShippole
8 months
You can find out more about the @NousResearch organization here:
0
2
12
@EnricoShippole
EnricoShippole
12 days
A big thank you to @joespeez of @Meta for mentioning our previous research, YaRN, at the @weights_biases Fully Connected conference. We have some exciting long-context releases coming up soon.
@TeraflopAI
TeraflopAI
12 days
Awesome to see @joespeez , AI Product Director, @Meta , mention our previous research, YaRN, on stage at the @weights_biases Fully Connected conference. We have another very exciting long-context release coming soon.
1
4
8
2
0
12
@EnricoShippole
EnricoShippole
1 year
A distributed training script is provided so that you may train or fine-tune your own PaLM models using @huggingface accelerate. More information and experiments about the training will be detailed in the repository:
1
1
11
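A stripped-down sketch of what an accelerate training loop looks like, with a toy model standing in for the actual PaLM implementation and script:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Toy stand-in for the real model and data; the released script wraps the PaLM model.
model = nn.Linear(512, 512)
optimizer = optim.AdamW(model.parameters(), lr=1e-4)
dataset = TensorDataset(torch.randn(64, 512), torch.randn(64, 512))
loader = DataLoader(dataset, batch_size=8)

accelerator = Accelerator()  # handles device placement, DDP, and mixed precision
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

model.train()
for inputs, targets in loader:
    loss = nn.functional.mse_loss(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Launched with `accelerate launch train.py`, the same loop scales from a single GPU to multi-GPU setups without code changes.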
@EnricoShippole
EnricoShippole
10 months
@Yampeleg This is a common practice that has been used for quite a few years. You can find an example of packing the text and appending an EOS/EOT token with Huggingface datasets and tokenizers here:
0
1
10
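A minimal sketch of that packing pattern with Hugging Face datasets and tokenizers (toy documents and the GPT-NeoX tokenizer chosen for illustration; the linked example may differ):

```python
from itertools import chain
from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
block_size = 8192  # real corpora fill the blocks; the toy data here stays tiny

ds = Dataset.from_dict({"text": ["first document", "second document", "third document"]})

def tokenize(batch):
    # Append an EOS token to every document so boundaries survive packing.
    return tokenizer([t + tokenizer.eos_token for t in batch["text"]])

def pack(batch):
    # Concatenate all token ids, then split them into fixed-length blocks.
    joined = {k: list(chain.from_iterable(batch[k])) for k in batch}
    total = (len(joined["input_ids"]) // block_size) * block_size
    return {k: [v[i:i + block_size] for i in range(0, total, block_size)]
            for k, v in joined.items()}

ds = ds.map(tokenize, batched=True, remove_columns=["text"])
ds = ds.map(pack, batched=True)
```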
@EnricoShippole
EnricoShippole
10 months
@EMostaque @alexgraveley @joao_gante Llama-2 8k is releasing tomorrow if all goes smoothly.
0
2
11
@EnricoShippole
EnricoShippole
8 months
All of the models can be found on Huggingface:
1
2
10
@EnricoShippole
EnricoShippole
9 months
We worked directly with @kaiokendev1 to extend the context length of the Llama-2 13b and 7b models through fine-tuning. The models pass all our evaluations and maintain perplexity at 16k extrapolation, surpassing the performance of other recent methodologies.
Tweet media one
2
0
10
@EnricoShippole
EnricoShippole
11 months
The dialog data is available on @huggingface to download. It was processed at an extended context length of 8192. It contains relevant metadata such as Inputs, Targets, Task Source, and Task Name.
1
0
10
@EnricoShippole
EnricoShippole
1 year
Working on an open-source version of DeepMind's Sparrow: a conversational agent utilizing Google's web search API for factual grounding and RLHF. I am going to be taking the learnings from our previous implementation of Toolformer.
1
0
10
@EnricoShippole
EnricoShippole
9 months
The 7b model can be found on @huggingface here:
1
0
10
@EnricoShippole
EnricoShippole
10 months
The LLongMA 7b model is available on @huggingface to use:
1
0
10
@EnricoShippole
EnricoShippole
1 year
A basic inference script is provided in the repository which you can play around with. You may want to experiment with the sampling hyperparameters to get generations of varying quality; a setting such as temperature matters a lot.
1
0
10
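A generic sampling example in the same spirit, using a small stand-in checkpoint rather than the repository's own inference script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Open-sourcing LLMs is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.7,  # lower values are more conservative, higher more diverse
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(out[0], skip_special_tokens=True))
```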
@EnricoShippole
EnricoShippole
1 year
If you would like to preprocess your own dataset for training there is a dataset builder script provided. This uses @huggingface datasets to efficiently map, tokenize, and block the data:
1
0
9
@EnricoShippole
EnricoShippole
11 months
An additional FLAN Dialog submix dataset was also preprocessed for causal language modeling, fixing different encoding issues, and is available on @huggingface to download.
1
0
9
@EnricoShippole
EnricoShippole
10 months
If you would like to learn more about scaling rotary embeddings, I would strongly recommend reading @kaiokendev1 's blog posts on his findings:
1
1
8
@EnricoShippole
EnricoShippole
11 months
The @NousResearch Hermes dataset consists of over 300,000 instruction data points. Thank you to @karan4d and @Teknium1 for providing the data to train these models.
2
1
8
@EnricoShippole
EnricoShippole
1 year
Adding support for numerous different #opensource models and providers to @LangChainAI is an imperative step in establishing an ecosystem that is mutually beneficial to all. The work done by @hwchase17 will help lead to both fair and equal distribution of artificial intelligence.
@LangChainAI
LangChain
1 year
🦜🔗 v0.0.86 📂 Lots more open source model integrations! @EnricoShippole 🪵 PromptLayer ( @imjaredz ) and Helicone ( @justinstorre ) integrations. And lots of other docs and bug fixes! 🧵
1
6
54
1
1
9
@EnricoShippole
EnricoShippole
5 months
Disseminating artificial intelligence through clean, open-source user interfaces and experiences is absolutely necessary to bridge the gap between the research community and app developers. We must start furthering collaboration with front-end communities.
@rohanpaul_ai
Rohan Paul
5 months
Ollama iOS mobile app (open source) It works with all models served with Ollama.
11
41
174
2
0
8
@EnricoShippole
EnricoShippole
11 months
This is an earlier extension of his work to publicly release the FLAN collection:
@EnricoShippole
EnricoShippole
1 year
Introducing an open-source reproduction of the FLAN V2 dataset.
3
32
177
1
0
9
@EnricoShippole
EnricoShippole
5 days
I have more copyright-free and commercially viable data than I know what to do with. We are always actively looking for organizations to partner with to train on and serve this data to the community.
@PeterHndrsn
Peter Henderson
6 days
🚨More AI copyright lawsuits!🚨 1. Artists sue Google for Imagen (). 2. More newspapers sue MSFT/OpenAI (). The newspaper litigation has far more compelling examples and arguments than prior cases. One to watch.
0
5
28
0
1
9
@EnricoShippole
EnricoShippole
11 months
Additionally, you can find a Hermes-Open-Llama-7b-4k model fine-tuned at a context length of 4k on @huggingface here:
1
0
8
@EnricoShippole
EnricoShippole
1 year
"Helped" add the new @PyTorch 2.0 Flash Attention to Lucidrain's PaLM-rlhf-pytorch repository. The repository uses RLHF to build models similar to #ChatGPT and #GPT4 . Be sure to support/donate to his great open-source work.
0
2
7
@EnricoShippole
EnricoShippole
1 year
I have had the pleasure of working with @hwchase17 to expand the @LangChainAI ecosystem by adding support for numerous different #opensource models, such as those by @AiEleuther , and providers. It is a necessary step in ensuring the democratization of artificial intelligence.
@hwchase17
Harrison Chase
1 year
We need more options for integrating open source models (like those from @AiEleuther ) into @LangChainAI . 🏆 Thanks to @EnricoShippole we have exactly that. 🚀 First-class support for @gooseai_NLP , @cerebriumai , @ForefrontAI , and Petals. 📃 Docs:
1
5
53
0
2
9
@EnricoShippole
EnricoShippole
2 months
In collaboration with Ravel Law, @hlslib digitized over 40 million pages of U.S. court decisions, consisting of 6.7 million cases from the last 360 years, into a dataset that is widely accessible to use. You can bulk download the data using the CAP API:
1
1
7
@EnricoShippole
EnricoShippole
10 months
If you would like to learn more about scaling rotary embeddings, I would strongly recommend reading @kaiokendev1 's blog posts on his findings:
1
0
8
@EnricoShippole
EnricoShippole
11 months
@nisten @NousResearch Standard fine-tune at 8k. No landmark. No lora.
0
0
8
@EnricoShippole
EnricoShippole
2 months
Thank you to @nomic_ai for providing us with Atlas research credits to store and visualize each of the jurisdictions in this dataset. You can view a Nomic Atlas map of New York state court decisions here:
1
2
8
@EnricoShippole
EnricoShippole
3 months
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers was number 1 on Hacker News.
@Birchlabs
Birchlabs
3 months
our paper made it to #1 on Hacker News!
Tweet media one
6
11
236
0
1
8