@neuralmagic
Neural Magic
1 year
GPUs are becoming scarce. But no need to worry: you can deploy #ML models on a CPU with the same performance as a T4 GPU. For example, DeepSparse (a CPU inference runtime) and oBERT deliver a 4.2X increase in throughput on the WNUT dataset at the same cost as a T4 GPU. A 🧵:
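Here's a minimal sketch of that setup with the DeepSparse Python Pipeline API. The SparseZoo stub below is a placeholder, not necessarily the exact model behind these numbers; swap in your own oBERT stub or a local ONNX path:

```python
from deepsparse import Pipeline

# Placeholder SparseZoo stub for a pruned + quantized oBERT token-classification
# model; substitute the stub (or local ONNX path) you actually want to deploy.
MODEL_STUB = "zoo:nlp/token_classification/obert-base/pytorch/huggingface/conll2003/pruned90_quant-none"

# Pipeline.create wires up tokenization, the sparse CPU engine, and
# post-processing for the task.
ner = Pipeline.create(
    task="token_classification",  # WNUT-style named entity recognition
    model_path=MODEL_STUB,
)

print(ner("Neural Magic is headquartered in Somerville, Massachusetts"))
```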
@neuralmagic
Neural Magic
1 year
DeepSparse is a sparsity-aware inference runtime that delivers GPU-class performance on commodity CPUs, purely in software, anywhere. If cost savings are more important than performance, DeepSparse and oBERT can still deliver a 1.5X increase in throughput and a 28% cost savings.
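If you want to check throughput numbers like these yourself, here's a rough way to estimate items/sec in Python. The batch size and iteration counts are illustrative, and the model stub is again a placeholder:

```python
import time
from deepsparse import Pipeline

BATCH_SIZE = 64  # larger batches favor throughput over latency

ner = Pipeline.create(
    task="token_classification",
    model_path="zoo:...",   # placeholder: your oBERT stub or local ONNX path
    batch_size=BATCH_SIZE,
)

batch = ["An example sentence for benchmarking."] * BATCH_SIZE

# Warm up before timing.
for _ in range(5):
    ner(batch)

n_batches = 50
start = time.perf_counter()
for _ in range(n_batches):
    ner(batch)
elapsed = time.perf_counter() - start

print(f"throughput: {n_batches * BATCH_SIZE / elapsed:.1f} items/sec")
```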
@neuralmagic
Neural Magic
1 year
DeepSparse can also achieve a 2X speedup for latency-sensitive workloads while still delivering a 6% lower cost. Alternatively, with workloads running at batch size 1, production cost can be cut by 47% if you can tolerate 1.3X higher latency.
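A sketch of how you'd measure that batch-size-1, latency-sensitive configuration yourself (the iteration count is illustrative, the stub a placeholder):

```python
import statistics
import time
from deepsparse import Pipeline

ner = Pipeline.create(
    task="token_classification",
    model_path="zoo:...",   # placeholder: your oBERT stub or local ONNX path
    batch_size=1,           # single-request, latency-sensitive serving
)

# Time individual requests end to end.
latencies_ms = []
for _ in range(200):
    start = time.perf_counter()
    ner("Single request, measured end to end.")
    latencies_ms.append((time.perf_counter() - start) * 1000)

print(f"median latency: {statistics.median(latencies_ms):.2f} ms")
print(f"p95 latency:    {statistics.quantiles(latencies_ms, n=20)[-1]:.2f} ms")
```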
@neuralmagic
Neural Magic
1 year
#DeepSparse lets you balance latency, throughput, and cost, so you can keep model hosting within budget while still hitting your target performance metrics.
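The main knobs for that trade-off are batch size and core count. A hedged sketch, assuming the DeepSparse 1.x Pipeline API (passing num_cores here is an assumption; verify it against your installed version):

```python
from deepsparse import Pipeline

# Throughput-oriented: large batches, all available cores.
bulk = Pipeline.create(
    task="token_classification",
    model_path="zoo:...",  # placeholder stub / ONNX path
    batch_size=64,
)

# Latency/cost-oriented: batch size 1; capping cores can let you
# run on a smaller, cheaper instance at a modest latency cost.
interactive = Pipeline.create(
    task="token_classification",
    model_path="zoo:...",  # placeholder stub / ONNX path
    batch_size=1,
    num_cores=4,           # assumption: limit cores to size down the instance
)
```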
@neuralmagic
Neural Magic
1 year
Try DeepSparse for your next model deployment to save money without sacrificing performance or accuracy.