@neuralmagic
Neural Magic
8 months
🚨 Our New LLM Research 🚨 We show how finetuning and sparsity come together to enable accurate LLMs that can be deployed on CPUs with DeepSparse. The result is a ~7x CPU speedup for a finetuned @MosaicML MPT-7B model vs. the FP32 baseline. 🙏 @ISTAustria for collaboration!
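For context, a minimal sketch of what CPU deployment with DeepSparse looks like, assuming the DeepSparse TextGeneration pipeline API; the SparseZoo stub below is a placeholder, not necessarily the official release stub (see the DeepSparse README for the exact model reference):

```python
# Minimal sketch: running a sparse, finetuned MPT-7B on CPU with DeepSparse.
# The SparseZoo stub below is a placeholder; substitute the stub or local ONNX
# export referenced in the DeepSparse README for the sparse MPT-7B release.
from deepsparse import TextGeneration

MODEL_STUB = "zoo:mpt-7b-dolly_mpt_pretrain-pruned50_quantized"  # placeholder stub

pipeline = TextGeneration(model=MODEL_STUB)  # compiles the model for the local CPU

output = pipeline(prompt="Explain sparse finetuning in one sentence.", max_new_tokens=64)
print(output.generations[0].text)
```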

Replies

@neuralmagic
Neural Magic
8 months
Get the paper here: Kudos to the paper authors @_EldarKurtic, Denis Kuznedelev, @elias_frantar, @mgoin_, @DAlistarh, and the wider teams at Neural Magic and ISTA for this remarkable work!
@neuralmagic
Neural Magic
8 months
For a quick paper summary and the impact of this research on the industry, read @robertshaw21 's recent blog:
@neuralmagic
Neural Magic
8 months
Want to try it out on your CPUs today? Go to the DeepSparse GitHub repo and find the steps in the README.
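A hedged sketch of measuring generation throughput on your own CPU, again assuming the TextGeneration pipeline API and a placeholder model stub; numbers will vary by hardware and this simple timer includes prompt prefill, so treat it as a rough estimate:

```python
# Rough tokens-per-second measurement for a DeepSparse text-generation pipeline.
# Assumes the TextGeneration API sketched above; the stub is a placeholder.
import time
from deepsparse import TextGeneration

MODEL_STUB = "zoo:mpt-7b-dolly_mpt_pretrain-pruned50_quantized"  # placeholder stub
pipeline = TextGeneration(model=MODEL_STUB)

prompt = "Write a short note about CPU inference."
new_tokens = 128

start = time.perf_counter()
pipeline(prompt=prompt, max_new_tokens=new_tokens)
elapsed = time.perf_counter() - start

print(f"~{new_tokens / elapsed:.1f} generated tokens/sec on this CPU (rough, includes prefill)")
```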
@daryl_imagineai
Daryl
8 months
@neuralmagic @MosaicML @ISTAustria Cool, any tokens-per-second benchmarks for Llama2 70B on Intel CPUs?
@cepe__
cepe
8 months
@neuralmagic @MosaicML @ISTAustria Does it work with Spanish words?
@bornjre
bornjre
8 months
@neuralmagic @MosaicML @ISTAustria 2x improvement from sparsity, right? And the 3.x improvement is from FP32 dense to INT8 dense? == 7x