🚨 Our New LLM Research 🚨
We show how finetuning and sparsity come together to enable accurate LLMs that can be deployed on CPUs with DeepSparse.
The result is a ~7x CPU speedup for a finetuned @MosaicML MPT-7B model vs. the FP32 baseline.
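For anyone who wants to try CPU inference, here's a minimal sketch using DeepSparse's text-generation pipeline. The model stub below is a placeholder, not the actual checkpoint from this work, and the prompt/parameters are illustrative assumptions.

```python
# Minimal sketch: running a sparse, finetuned MPT-7B on CPU with DeepSparse.
# NOTE: the model stub is a placeholder -- swap in the real SparseZoo stub
# or a local ONNX export of your finetuned sparse model.
from deepsparse import TextGeneration

MODEL = "zoo:mpt-7b-<finetuned-sparse-stub>"  # hypothetical placeholder

# Build the CPU text-generation pipeline from the sparse model.
pipeline = TextGeneration(model=MODEL)

# Generate text on CPU.
result = pipeline(prompt="Explain why sparsity speeds up CPU inference.")
print(result.generations[0].text)
```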
🙏 @ISTAustria for the collaboration!