@neuralmagic
Neural Magic
8 months
🚨 Our New LLM Research 🚨 We show how finetuning and sparsity come together to enable accurate LLMs that can be deployed on CPUs with DeepSparse. The result is a ~7x CPU speedup for a finetuned @MosaicML MPT-7B model vs. the FP32 baseline. 🙏 @ISTAustria for collaboration!
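For context, a minimal sketch of what CPU deployment with DeepSparse looks like, assuming the DeepSparse TextGeneration pipeline API; the SparseZoo stub below is a placeholder, not necessarily the official release stub (see the DeepSparse README for the exact model reference):

```python
# Minimal sketch: running a sparse, finetuned MPT-7B on CPU with DeepSparse.
# The SparseZoo stub below is a placeholder; substitute the stub or local ONNX
# export referenced in the DeepSparse README for the sparse MPT-7B release.
from deepsparse import TextGeneration

MODEL_STUB = "zoo:mpt-7b-dolly_mpt_pretrain-pruned50_quantized"  # placeholder stub

pipeline = TextGeneration(model=MODEL_STUB)  # compiles the model for the local CPU

output = pipeline(prompt="Explain sparse finetuning in one sentence.", max_new_tokens=64)
print(output.generations[0].text)
```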

Replies

@neuralmagic
Neural Magic
8 months
Get the paper here: Kudos to the paper authors @_EldarKurtic, Denis Kuznedelev, @elias_frantar, @mgoin_, @DAlistarh, and the wider teams at Neural Magic and ISTA for this remarkable work!
@neuralmagic
Neural Magic
8 months
For a quick paper summary and the impact of this research on the industry, read @robertshaw21 's recent blog:
@neuralmagic
Neural Magic
8 months
Want to try it out on your CPUs today? Go to the DeepSparse GitHub repo and find the steps in the README.
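A hedged sketch of measuring generation throughput on your own CPU, again assuming the TextGeneration pipeline API and a placeholder model stub; numbers will vary by hardware and this simple timer includes prompt prefill, so treat it as a rough estimate:

```python
# Rough tokens-per-second measurement for a DeepSparse text-generation pipeline.
# Assumes the TextGeneration API sketched above; the stub is a placeholder.
import time
from deepsparse import TextGeneration

MODEL_STUB = "zoo:mpt-7b-dolly_mpt_pretrain-pruned50_quantized"  # placeholder stub
pipeline = TextGeneration(model=MODEL_STUB)

prompt = "Write a short note about CPU inference."
new_tokens = 128

start = time.perf_counter()
pipeline(prompt=prompt, max_new_tokens=new_tokens)
elapsed = time.perf_counter() - start

print(f"~{new_tokens / elapsed:.1f} generated tokens/sec on this CPU (rough, includes prefill)")
```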
@daryl_imagineai
Daryl
8 months
@neuralmagic @MosaicML @ISTAustria Cool, any tokens-per-second benchmarks for Llama2 70B on Intel CPUs?
@cepe__
cepe
8 months
@neuralmagic @MosaicML @ISTAustria Does it work with Spanish words?
@bornjre
bornjre
8 months
@neuralmagic @MosaicML @ISTAustria 2x improvement from sparsity, right? And the 3.x improvement is from FP32 dense to INT8 dense? == 7x