GPUs are becoming scarce. But no need to worry.
You can deploy #ML models on a CPU with the same performance as a T4 GPU.
Example: DeepSparse (a CPU runtime) and oBERT give you a 4.2x increase in throughput on the WNUT dataset at the same cost as a T4 GPU.
A 🧵:
DeepSparse is a sparsity-aware inference runtime that delivers GPU-class performance on commodity CPUs, purely in software, anywhere.
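Here's a minimal sketch of serving a sparse oBERT model for NER with DeepSparse's Python Pipeline API. The model path is illustrative, not a real stub; swap in a pruned-quantized oBERT ONNX export (e.g. from SparseZoo):

```python
from deepsparse import Pipeline

# Illustrative path: substitute a pruned-quantized oBERT model
# exported to ONNX (or a SparseZoo model stub).
ner = Pipeline.create(
    task="token_classification",  # NER, as in the WNUT example above
    model_path="path/to/obert-pruned-quant.onnx",
)

print(ner("DeepSparse runs sparse transformers on commodity CPUs"))
```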
If cost savings matter more than raw performance, DeepSparse and oBERT can still deliver a 1.5x throughput increase at 28% lower cost.
For latency-sensitive workloads, DeepSparse can also achieve a 2x speedup while still cutting cost by 6%.
For workloads running at batch size 1, you can cut production cost by 47% in exchange for 1.3x higher latency.
#DeepSparse lets you balance latency, throughput, and cost, so you can keep model hosting within budget while hitting your target performance metrics.
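A hedged sketch of how those tradeoffs map to configuration: batch size is the main knob. The model path and numbers here are illustrative, and keyword arguments can vary by DeepSparse version:

```python
from deepsparse import Pipeline

MODEL = "path/to/obert-pruned-quant.onnx"  # illustrative path

# Latency-oriented: batch_size=1 minimizes time per request
# (the batch-size-1 scenario behind the 47% cost figure above).
low_latency = Pipeline.create(
    task="token_classification",
    model_path=MODEL,
    batch_size=1,
)

# Throughput-oriented: bigger batches amortize per-call overhead,
# trading per-request latency for items/sec and lower cost.
high_throughput = Pipeline.create(
    task="token_classification",
    model_path=MODEL,
    batch_size=32,
)
```

To find the point that fits your budget, measure each config with the deepsparse.benchmark CLI (sync scenario for latency, async for throughput; exact flags vary by version).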