Deploying a GPT-175B model requires 5 A100 80GB GPUs, each costing about $15,000.
That's $75,000 for inference 💰
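Where do those 5 GPUs come from? A quick back-of-the-envelope sketch (assuming FP16 weights only; activations and the KV cache need memory on top of this):

```python
# Back-of-the-envelope GPU count for serving a 175B-parameter model.
# Assumes FP16 weights (2 bytes each); real deployments need extra
# memory for activations, KV cache, and framework overhead.
PARAMS = 175e9          # 175 billion parameters
BYTES_PER_PARAM = 2     # FP16
GPU_MEMORY_GB = 80      # A100 80GB

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9    # 350 GB of weights alone
gpus_needed = -(-weights_gb // GPU_MEMORY_GB)  # ceiling division

print(weights_gb, gpus_needed)  # 350.0 5.0
```

At ~$15,000 per card, that is the $75,000 price tag for a single inference replica.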
You can reduce the model’s size by removing 50% of the weights with negligible loss in accuracy 🤯
Let's explore how to do that with the #SparseGPT algorithm.
A quick thread 🧵👇
SparseGPT is a post-training pruning method for compressing #LLMs like GPT-3.

It can prune an LLM in one shot, with minimal accuracy loss: for example, OPT-175B can be pruned to 50% #sparsity.
With SparseGPT, you can prune a larger proportion of the weights as the model gets bigger.
That means on the order of 100 billion weights of the large language model can be ignored at inference time, thanks to SparseGPT.
This increases the model's throughput while reducing latency.
Removing these weights naturally leads to a smaller model that is far more economical to deploy.
SparseGPT computes a pruning mask: weights outside the mask are set to 0, while the remaining weights are kept (and updated to compensate for the ones that were removed).
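To make the masking idea concrete, here is a minimal toy sketch that builds a 50% mask by weight magnitude. Note this is an illustration only: SparseGPT itself selects the mask with a second-order (Hessian-based) criterion and adjusts the surviving weights, rather than just zeroing the smallest ones.

```python
import numpy as np

# Toy illustration of a 50% pruning mask (magnitude-based here;
# SparseGPT picks the mask with a more sophisticated criterion
# and updates the remaining weights to compensate).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))          # a small weight matrix

k = W.size // 2                      # keep the top 50% by magnitude
threshold = np.sort(np.abs(W), axis=None)[-k]
mask = np.abs(W) >= threshold        # True = keep, False = prune

W_pruned = np.where(mask, W, 0.0)    # zero out weights not in the mask

print(f"sparsity: {(W_pruned == 0).mean():.0%}")  # sparsity: 50%
```

With a 2:4 or similar structured sparsity pattern, those zeros translate into real speedups on hardware that supports sparse execution.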
In this article, we explore the internal workings of SparseGPT in more detail.
@neuralmagic
Another game changer - LLMs will be everywhere, running on every device, in a few years... (they are already running on RasPis)
HYYYYPEEEEE