Accelerate your
@huggingface
🤗 Inference Endpoints with DeepSparse to achieve 43x CPU speedup and 97% cost reduction over
@PyTorch
.
Side note: DeepSparse is even faster than a T4 GPU 🤯
Learn more in our blog:
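A quick sanity check on how the two headline numbers relate: if cost scales inversely with throughput (an assumption on our part, not stated in the thread), a 43x speedup implies you pay roughly 1/43 of the original compute cost, which is about a 97% reduction:

```python
# Hedged sketch: relate the claimed 43x CPU speedup to the ~97% cost
# reduction, ASSUMING cost scales inversely with throughput.
speedup = 43.0
cost_fraction = 1.0 / speedup          # fraction of the original cost you still pay
cost_reduction = 1.0 - cost_fraction   # fraction of the original cost you save
print(f"{cost_reduction:.1%}")         # → 97.7%
```

So the "97% cost reduction" figure is consistent with the "43x speedup" figure under that simple cost model.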
@neuralmagic
@huggingface
@PyTorch
This is very interesting. Looking at SparseZoo, it appears that you are not able to sparsify MobileNetV2 as well as GPT2. Is there any reason for this?
@neuralmagic
@huggingface
@PyTorch
Q:
What's the perf loss from sparsification?
If I have a TFLite network with inference scripts, can I try it with your tools and see how much sparsity can be achieved?
Do we get the source to the inference libraries you generate?
Have you considered any HW acceleration?