GPUs are becoming scarce. But no need to worry.
You can deploy #ML models on a CPU with the same performance as a T4 GPU.
Example: DeepSparse (a CPU runtime) and oBERT give you a 4.2x increase in throughput on the WNUT dataset at the same cost as a T4 GPU.
A 🧵:
DeepSparse is a sparsity-aware inference runtime that delivers GPU-class performance on commodity CPUs, purely in software, anywhere.
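Here's a minimal sketch of serving a sparse oBERT model for NER with DeepSparse's Python Pipeline API. The model path is illustrative, not a real stub; swap in a pruned-quantized oBERT ONNX export (e.g. from SparseZoo):

```python
from deepsparse import Pipeline

# Illustrative path: substitute a pruned-quantized oBERT model
# exported to ONNX (or a SparseZoo model stub).
ner = Pipeline.create(
    task="token_classification",  # NER, as in the WNUT example above
    model_path="path/to/obert-pruned-quant.onnx",
)

print(ner("DeepSparse runs sparse transformers on commodity CPUs"))
```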
If cost savings matter more than raw performance, DeepSparse and oBERT can still deliver a 1.5x throughput increase at 28% lower cost.
For latency-sensitive workloads, DeepSparse can also achieve a 2x speedup while still cutting cost by 6%.
For workloads running at batch size 1, you can cut production cost by 47% in exchange for 1.3x higher latency.
#DeepSparse lets you balance latency, throughput, and cost, so you can keep model hosting within budget while hitting your target performance metrics.
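A hedged sketch of how those tradeoffs map to configuration: batch size is the main knob. The model path and numbers here are illustrative, and keyword arguments can vary by DeepSparse version:

```python
from deepsparse import Pipeline

MODEL = "path/to/obert-pruned-quant.onnx"  # illustrative path

# Latency-oriented: batch_size=1 minimizes time per request
# (the batch-size-1 scenario behind the 47% cost figure above).
low_latency = Pipeline.create(
    task="token_classification",
    model_path=MODEL,
    batch_size=1,
)

# Throughput-oriented: bigger batches amortize per-call overhead,
# trading per-request latency for items/sec and lower cost.
high_throughput = Pipeline.create(
    task="token_classification",
    model_path=MODEL,
    batch_size=32,
)
```

To find the point that fits your budget, measure each config with the deepsparse.benchmark CLI (sync scenario for latency, async for throughput; exact flags vary by version).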