@neuralmagic
Neural Magic
1 year
Accelerate your @huggingface 🤗 Inference Endpoints with DeepSparse to achieve a 43x CPU speedup and a 97% cost reduction over @PyTorch. Side note: DeepSparse is even faster than a T4 GPU 🤯 Learn more in our blog:
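The thread itself contains no code, but as a rough sketch of the kind of deployment it describes, here is how a sparse model might be served on CPU with DeepSparse's Pipeline API. The SparseZoo stub and task below are illustrative assumptions, not details taken from the tweet:

```python
# Minimal sketch of CPU inference with DeepSparse
# (assumes `pip install deepsparse[transformers]`).
from deepsparse import Pipeline

# Build a pipeline that runs a pruned/quantized transformer on the
# DeepSparse CPU runtime. The zoo stub is a hypothetical placeholder.
pipeline = Pipeline.create(
    task="text-classification",
    model_path="zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none",
)

# Run inference; DeepSparse exploits the model's sparsity to speed up
# execution on commodity CPUs.
print(pipeline(["DeepSparse makes CPU inference fast."]))
```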

Replies

@OmarBessa
Pabli! 🔥💥💫
1 year
@sytelus
Shital Shah
1 year
@neuralmagic @huggingface @PyTorch This is very interesting. Looking at SparseZoo, it appears that you are not able to sparsify MobileNetV2 as well as GPT2. Is there any reason for this?
@ramsri_goutham
Ramsri Goutham Golla
1 year
@neuralmagic @huggingface @PyTorch Are there any text-generation (e.g., T5) metrics?
@BhavnickMinhas
Bhavnick Minhas (🧑‍💻,💙)
1 year
@neuralmagic @huggingface @PyTorch Compare it with ONNX-optimized graphs used with TritonRT for proper benchmarking, not the baselines.
@sandeepkaushik
Sandeep Sharma
1 year
@neuralmagic @huggingface @PyTorch Impressive. For a moment I read it beats TPUs. I guess Neural Magic will get there one day.
@MuzafferKal_
Muzaffer Kal @🏡 🦆⏩
1 year
@neuralmagic @huggingface @PyTorch Q: What's the perf loss from sparsification? If I have a TFLite network with inference scripts, can I try it with your tools and see how much sparsity can be achieved? Do we get the source to the inference libraries you generate? Have you considered any HW acceleration?