@neuralmagic
Neural Magic
1 year
Optimizing ML models for deployment is not optional, particularly for enterprises that want to lower their computing costs while improving production performance. Our team suggests three model optimization techniques to consider before deploying your next model. A 🧵:
1. AC/DC (Alternating Compressed/DeCompressed training) is a training-aware compression technique that trains dense and sparse models simultaneously. It simplifies training workflows and outperforms existing sparse training methods in accuracy. @dtransposed explains AC/DC in this video:
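At a high level, AC/DC alternates between dense ("decompressed") and sparse ("compressed") training phases so that both models are co-trained. A minimal pure-Python sketch of that schedule on a toy quadratic objective (the function names, per-step mask recomputation, and all hyperparameters here are our own illustrative assumptions, not Neural Magic's implementation):

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask keeping the largest-magnitude (1 - sparsity) fraction."""
    k = int(len(weights) * (1.0 - sparsity))
    keep = set(sorted(range(len(weights)), key=lambda i: -abs(weights[i]))[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(weights))]

def acdc_train(weights, grad_fn, steps, phase_len=5, sparsity=0.5, lr=0.1):
    """Alternate decompressed (dense) and compressed (sparse) training phases."""
    for step in range(steps):
        if (step // phase_len) % 2 == 1:
            mask = magnitude_mask(weights, sparsity)   # compressed phase: prune
        else:
            mask = [1.0] * len(weights)                # decompressed phase: dense
        grads = grad_fn(weights)
        weights = [(w - lr * g) * m for w, g, m in zip(weights, grads, mask)]
    final_mask = magnitude_mask(weights, sparsity)     # ship the sparse model
    return [w * m for w, m in zip(weights, final_mask)]

# Toy objective: sum_i (w_i - t_i)^2, so the gradient is 2 * (w_i - t_i).
targets = [3.0, 0.1, -2.0, 0.05]
grad_fn = lambda w: [2 * (wi - ti) for wi, ti in zip(w, targets)]
sparse_w = acdc_train([0.0] * 4, grad_fn, steps=40)
# The two large-target weights survive; the near-zero ones are pruned.
```

In the real method the sparse phases let the surviving weights adapt to the pruning mask, which is why the final sparse model recovers more accuracy than pruning a fully dense model once at the end.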
2. OBC/OBQ is a model quantization and optimization technique that compresses a model in a single shot, without retraining and with minimal computing cost. Minutes of work can yield a 4X speedup. Learn how at our next webinar:
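OBQ itself makes greedy, second-order-aware per-weight decisions; as a much simpler stand-in for the "one shot, no retraining" idea, here is plain symmetric round-to-nearest int8 post-training quantization (the function names and toy weights are our own illustration, not the OBC/OBQ algorithm):

```python
def quantize_int8(weights):
    """One-shot symmetric int8 quantization: a single pass, no retraining."""
    scale = max(abs(w) for w in weights) / 127.0   # map the largest weight to 127
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.0, 1.27]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Per-weight quantization error is bounded by scale / 2.
```

Storing 8-bit codes plus one scale in place of 32-bit floats is where the memory savings come from; the speedup comes from executing the compressed model with int8 kernels.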
3. The OBS (Optimal Brain Surgeon) pruning technique uses second-order derivatives to remove unimportant weights from a network while adjusting the remaining weights to minimize the resulting error. It maintains accuracy and increases performance while shrinking model size up to 10X. See our NLP example:
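The classic OBS step picks the weight q with the smallest saliency w_q^2 / (2 [H^-1]_qq), where H is the loss Hessian, and compensates the remaining weights by delta_w = -(w_q / [H^-1]_qq) H^-1 e_q. A small pure-Python sketch with a hand-inverted 2x2 Hessian (the example numbers are our own):

```python
def obs_prune_one(w, H_inv):
    """One Optimal Brain Surgeon step: prune the least-salient weight,
    then adjust the remaining weights to minimize the loss increase."""
    n = len(w)
    # Saliency of removing weight i: w_i^2 / (2 * [H^-1]_ii)
    q = min(range(n), key=lambda i: w[i] ** 2 / (2 * H_inv[i][i]))
    coef = w[q] / H_inv[q][q]
    new_w = [w[i] - coef * H_inv[i][q] for i in range(n)]  # delta_w compensation
    new_w[q] = 0.0  # the pruned weight is exactly zero
    return q, new_w

# 2x2 example: H = [[2, 1], [1, 2]] has inverse (1/3) * [[2, -1], [-1, 2]].
H_inv = [[2 / 3, -1 / 3], [-1 / 3, 2 / 3]]
w = [1.0, 0.2]
q, new_w = obs_prune_one(w, H_inv)
# The small weight is pruned and the surviving weight shifts to compensate.
```

The compensation term is what separates OBS from plain magnitude pruning: because the Hessian couples the weights, the survivors move to absorb the error introduced by the removal.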
Our research team collaborates with @ISTAustria to create these cutting-edge model optimization techniques for everyone to use. 🙏 Start with #SparseML to apply these and other methods to your models to enable faster and more efficient AI use cases:
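To give a flavor of the SparseML workflow, pruning is driven by a YAML recipe of modifiers along these lines (the modifier name and fields below are from our recollection of the SparseML docs; treat them as assumptions and verify against the current documentation before use):

```yaml
# Illustrative gradual-magnitude-pruning recipe (field names are assumptions;
# check the SparseML documentation for the exact schema)
modifiers:
    - !GMPruningModifier
        init_sparsity: 0.05
        final_sparsity: 0.85
        start_epoch: 1.0
        end_epoch: 25.0
        update_frequency: 1.0
        params: __ALL_PRUNABLE__
```

The recipe is applied to an existing training loop, so sparsification becomes a configuration change rather than a code rewrite.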