Neural Magic

@neuralmagic

4,875 Followers
1,663 Following
191 Media
730 Statuses

Deploy the fastest ML on CPUs and GPUs using only software. GitHub: #sparsity #opensource

Boston, MA
Joined May 2018
Pinned Tweet
@neuralmagic
Neural Magic
18 hours
If you use @vllm_project for #LLM inference serving, you must register for our open office hours on June 5. You’ll get direct access to two vLLM maintainers, @mgoin_ and @simon_mo_ , to ask any question you have. Can’t make June 5? Join us on June 20!
0
1
4
@neuralmagic
Neural Magic
1 year
Accelerate your @huggingface 🤗 Inference Endpoints with DeepSparse to achieve 43x CPU speedup and 97% cost reduction over @PyTorch . Side note: DeepSparse is even faster than a T4 GPU 🤯 Learn more in our blog:
Tweet media one
12
89
575
@neuralmagic
Neural Magic
2 years
HOT OFF THE PRESS! Neural Magic introduces sparsity and software-only ML execution to @MLPerf , boosting CPU performance 175X! 175X! Yes, you read that right. Read more about this amazing feat and replicate our results:
Tweet media one
4
57
454
@neuralmagic
Neural Magic
2 years
Transformers are huge. They are not efficient in deployment. But no worries. You can sparsify them with a few lines of code using SparseML: Result? More compression and better inference performance at the same accuracy. P.S. Same goes for CV models!
Tweet media one
5
77
355
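A minimal sketch of the "few lines of code" idea with SparseML, assuming an existing PyTorch fine-tuning setup; the recipe file, model choice, and step count below are placeholders rather than the exact code from the linked post.

```python
import torch
from transformers import AutoModelForSequenceClassification
from sparseml.pytorch.optim import ScheduledModifierManager

# Placeholder model and optimizer; any PyTorch module can be wired in the same way.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# The recipe (a YAML file) encodes the pruning schedule and target sparsity.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=1000)

# ... run the usual fine-tuning loop with the wrapped optimizer ...

manager.finalize(model)  # detach pruning hooks, leaving the sparsified weights in place
```

The same recipe-driven flow is what the postscript about CV models is referring to: swap the model, keep the recipe mechanism.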
@neuralmagic
Neural Magic
1 year
Did you know that you can use SparseGPT to apply one-shot sparsification to make large language models run faster on CPUs? This 🧵 explores how sparsification works, why it's a game changer for LLMs, and how to apply it to your models today. #Sparsification #LanguageModels
Tweet media one
3
20
280
@neuralmagic
Neural Magic
7 months
We applied our latest sparse fine-tuning research on the MPT-7b model, resulting in a 75% pruned model that doesn't drop accuracy. 🤯 75% fewer parameters means we can now run LLM inference performantly on commodity CPUs. @_mwitiderrick shares the details:
5
52
256
@neuralmagic
Neural Magic
2 years
DeepSparse Engine runs DL models on everyday CPUs at GPU speeds! For latency-sensitive applications, it makes a 4-core Intel Macbook more performant than a T4 GPU and an 8-core server more performant than a V100 GPU. 🤯 <-- If this is not you right now, read this tweet again!
Tweet media one
10
29
201
@neuralmagic
Neural Magic
7 months
Sparsity makes LLMs go 🚀 🚀 🚀 …on ordinary CPUs. Here’s how:
3
47
202
@neuralmagic
Neural Magic
1 year
Deploying a GPT-175B requires 5 A100 80GB GPUs, each costing $15,000. That's $75,000 for inference 💰 You can reduce the model’s size by removing 50% of the weights without losing accuracy 🤯 Let's explore how to do that with the #SparseGPT algorithm. -A quick thread- 🧵👇
5
25
129
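The back-of-envelope math behind those numbers, assuming FP16 weights and ignoring activations and the KV cache:

```python
params = 175e9                            # GPT-175B
weight_gb = params * 2 / 1e9              # 2 bytes per FP16 weight -> 350 GB of weights
gpus = -(-weight_gb // 80)                # ceil(350 / 80) -> 5 A100 80GB GPUs
print(weight_gb, gpus, gpus * 15_000)     # 350.0 5.0 75000.0
```

Removing 50% of the weights roughly halves that footprint when the zeros are stored sparsely, which is the saving the thread is pointing at.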
@neuralmagic
Neural Magic
2 years
DeepSparse Engine makes YOLOv5 models go super fast! 🚀 It outperforms ONNX Runtime by 4x, achieving real-time processing of 60fps on an 8-core CPU and 30fps on a 4-core CPU! 🤯 #objectdetection cc @ultralytics
Tweet media one
3
18
103
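A hedged sketch of running a sparse YOLOv5 model through DeepSparse on a CPU; the SparseZoo stub and image path below are placeholders, not values from the tweet.

```python
from deepsparse import Pipeline

yolo = Pipeline.create(
    task="yolo",
    model_path="zoo:<sparse-yolov5-stub>",   # hypothetical stub; a local ONNX export also works
)
detections = yolo(images=["street_scene.jpg"])  # placeholder image path
print(detections)
```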
@neuralmagic
Neural Magic
2 years
Ready to turn your CPUs into machine learning supercomputers? Listen to our founder Nir Shavit and @ykilcher break the possibilities down in this fun interview. Say goodbye to specialized hardware!
@ykilcher
Yannic Kilcher 🇸🇨
2 years
How to make your CPU as fast as a GPU? 🔥 Nir Shavit explains how clever algorithms can make use of sparsity in neural networks to deliver unprecedented inference speed, without any need for specialized hardware! Watch here:
Tweet media one
6
56
345
0
16
92
@neuralmagic
Neural Magic
11 months
We sparsified @ultralytics YOLOv8 models to be faster for inference on CPUs. This video shows a sparse-quantized YOLOv8m model running in real time on a 4-core system, delivering a 5x speedup over ONNX Runtime. 🚀🚀🚀 Best part? All sparse YOLOv8 models are open sourced!
2
9
76
@neuralmagic
Neural Magic
1 year
Accelerate your #NLP pipelines with sparse transformers! You can get a 3x #CPU performance increase by optimizing your models with only a few lines of code. 1/3
Tweet media one
2
9
73
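What the "few lines of code" could look like at inference time with DeepSparse; the SparseZoo stub is a placeholder for a sparse transformer checkpoint, not a real model path.

```python
from deepsparse import Pipeline

clf = Pipeline.create(
    task="text_classification",
    model_path="zoo:<sparse-bert-text-classification-stub>",  # hypothetical stub
)
print(clf(sequences=["Sparse inference on CPUs is surprisingly fast."]))
```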
@neuralmagic
Neural Magic
1 year
#MLPerf Inference v3.0 results are out! We delivered a 6X improvement over our previous submission 6 months ago, elevating our overall CPU performance to an astounding 1,000X while reducing power consumption by 92%. This is the power of software. Details 👇
Tweet media one
2
6
42
@neuralmagic
Neural Magic
7 months
🚨 Our New LLM Research 🚨 We show how finetuning and sparsity come together to enable accurate LLMs that can be deployed on CPUs with DeepSparse. The result is a ~7x CPU speedup for a finetuned @MosaicML MPT-7B model vs. the FP32 baseline. 🙏 @ISTAustria for collaboration!
Tweet media one
5
11
42
@neuralmagic
Neural Magic
3 years
We carried a 4-core laptop around Boston, comparing runs of a sparsified #YOLOv5 object detection model running on the #DeepSparse Engine and #ONNXRuntime . End result: Pruning + INT8 quantization = 10x faster and 12x smaller model. Replicate our results:
3
8
41
@neuralmagic
Neural Magic
2 years
BERT-Large (345M parameters) is now faster than the much smaller DistilBERT (66M parameters), while retaining the accuracy of the larger BERT-Large! It delivers an 8x latency speedup on commodity CPUs 🚀 🙏 @ZafrirOfir , @guybd35 , @markurtz_ & teams for a fantastic collab!
Tweet media one
0
8
36
@neuralmagic
Neural Magic
2 years
Ready for some image classification magic? 🪄🎩 Here's a 29x compressed ResNet-50, from the original 98 MB to 3.3 MB. It's sparse-quantized. It recovers to 99% of the baseline accuracy! 🤯🤯 Download it from .
0
9
36
@neuralmagic
Neural Magic
7 months
Check out QUIK, a new quantization method that leads to memory and runtime savings of 3-4x on OPT, LLaMA2, and Falcon models.
@DAlistarh
Dan Alistarh
7 months
Happy to release QUIK, a new accurate post-training quantization method which processes the majority of weights and activations using 4bit precision. [1/N] With @AshkboosSaleh @elias_frantar @thoefler Paper: Code: Snapshot:
Tweet media one
7
36
159
0
7
35
@neuralmagic
Neural Magic
2 years
DeepSparse 0.12 has landed! More speedups on AMD EPYC processors. Additional AVX2 performance. Hugging Face Transformers integrations update. A Streamlit app for deploying DeepSparse & exploring the inference performance of BERT. A brand new readme! Etc.
0
7
31
@neuralmagic
Neural Magic
2 years
@giffmana For inference specifically, check out the DeepSparse Engine. It decreases compute needed to execute a network by taking advantage of sparsity and large CPU memory. We’ll be at CVPR if you’d like to see a demo!
0
1
29
@neuralmagic
Neural Magic
2 years
If you are among the majority of data scientists who are NOT optimizing their deep learning models for production, this workshop is for you!
Tweet media one
1
2
28
@neuralmagic
Neural Magic
1 year
Do you use hardware accelerators like GPUs, TPUs, and IPUs when deploying your ML? If so, you probably haven't been able to realize acceptable performance with reasonable costs for scaled deployments. Agree? Check out this 🧵 on why you should deliver AI using only software:
Tweet media one
1
3
25
@neuralmagic
Neural Magic
1 year
Exciting AI news! Striveworks partners with Neural Magic to bring fast GPU-less model deployment options to their Chariot MLOps platform. No more expensive GPU infrastructure needed. Stay ahead with cutting-edge tech. #MLOps #AI #MachineLearning #chariot
0
6
27
@neuralmagic
Neural Magic
1 year
The *ultimate* guide to #imagesegmentation deployment is here, and we walk you through deploying real-time inference on CPUs without sacrificing performance 👨‍💻 Learn more on the blog:
0
4
28
@neuralmagic
Neural Magic
1 year
You can now run inference on very large language models on a single GPU, using new one-shot weight quantization. Nice job @elias_frantar , @DAlistarh , and everyone who's pioneering this research at #NeurIPS2022
@thoefler
Torsten Hoefler 🇨🇭
1 year
Wondering how to run a #GPT -3 sized 175 billion parameter language model on a single #GPU ? GPTQ makes it possible with 3-bit weight quantization; work led by @elias_frantar and @DAlistarh . Relevant in this week's #NeurIPS2022 discussions :-)!
Tweet media one
Tweet media two
3
12
48
1
12
26
@neuralmagic
Neural Magic
2 years
You can accurately classify 21,000+ images every second using only software and commodity CPUs. Is it time to say goodbye to GPUs in ML inference?
0
6
24
@neuralmagic
Neural Magic
2 years
Meet oBERT! 😻 A series of 90% sparse BERT-Base models that recover to 99% of the baseline accuracy. 🤯 A 12-layer version of #oBERT outperforms #DistilBERT . A 3-layer version outperforms #TinyBERT . 🤯🤯 oBERT makes single-digit #NLP latency on CPUs the new normal! #NLProc
Tweet media one
2
5
23
@neuralmagic
Neural Magic
11 months
100 FPS YOLOv8s on a 4-core CPU 🤯 GPUs are no longer needed for AI inference. Software always wins. Come see it at #CVPR2023 and talk to us about doing the same using our open source optimization libraries and CPU runtime.
Tweet media one
1
3
23
@neuralmagic
Neural Magic
7 months
Say goodbye to snail-paced LLM processing! 🐌 Our new blog reveals how DeepSparse + @OpenAI 's API = lightning-fast, local LLMs. ⚡️ If you are using OpenAI now, you can start using open-source LLMs locally, without changing your code. Here's how:
0
4
23
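The "without changing your code" part boils down to pointing the standard OpenAI client at a local, OpenAI-compatible endpoint. A sketch of that client-side swap, where the URL, port, and model name are assumptions rather than values from the blog:

```python
from openai import OpenAI

# Same client code as before; only the endpoint (and model name) changes.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")  # assumed local server
resp = client.chat.completions.create(
    model="local-sparse-llm",   # hypothetical local model name
    messages=[{"role": "user", "content": "Why does sparsity speed up CPU inference?"}],
)
print(resp.choices[0].message.content)
```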
@neuralmagic
Neural Magic
1 year
🚀 Exciting AI news from @neuralmagic ! Optimize large language models effortlessly with our software and deploy them on commodity CPUs using #DeepSparse for lightning-fast inference. Unleash unparalleled performance, scalability, and cost efficiency. And get to deployment
Tweet media one
1
4
23
@neuralmagic
Neural Magic
1 year
This is the team that’s assembling the future of machine learning. Meet them at #NeurIPS2022 this week to see what they’re up to and grab some of that cool swag! @mgoin_ @markurtz_ @finkatronic @robertshaw21
Tweet media one
2
3
23
@neuralmagic
Neural Magic
1 year
Ever wondered how to achieve lightning-fast AI inference speeds without compromising accuracy? 🚀 Sparsity holds the key! 🔍 On June 15, we'll show how you can apply sparsity to YOLOv8 models to deploy them on regular CPUs at GPU speeds. 🗓 🗓
0
4
23
@neuralmagic
Neural Magic
7 months
Everyone loves @LangChainAI for building GenAI apps! 🚀 But devs must choose between pricey APIs and clunky GPUs. 😫 Enter DeepSparse to turbocharge your LLMs on CPUs. ⚡️ Get 7X faster text generation anywhere there's a CPU, from cloud to edge. 💪 #AI #LangChain
Tweet media one
1
5
22
@neuralmagic
Neural Magic
1 year
🚨 Introducing Sparse BioBERT, our new #NLP model that is specifically fine-tuned on four unique #biology related datasets. Benchmarked against its dense variant on the ArXiv-biology dataset, the sparse BioBERT is over 9x faster ⚡️
1
4
21
@neuralmagic
Neural Magic
1 year
🏁  Accelerate #YOLOv8 with Neural Magic's DeepSparse by 10x! Developed by our partner @ultralytics , YOLOv8 takes #objectdetection to the next level with its anchor-free design.
1
5
20
@neuralmagic
Neural Magic
1 year
Yup, you can run LLMs super fast and accurately on commodity CPU infrastructure using SparseGPT and DeepSparse, our inference runtime. 🤯 Is it time to say goodbye to GPUs at inference? 👋 ✌️ Watch our latest video on SparseGPT and let us know!
Tweet media one
1
3
20
@neuralmagic
Neural Magic
1 year
Meet extractive question answering with sparse transformers! You can now search 1000s of documents super quickly, at lower cost, with the accuracy expected in today's world. Learn more: #nlp #sparsity
0
5
18
@neuralmagic
Neural Magic
1 year
Neural Magic’s DeepSparse Runtime is now available in the AWS Marketplace! Get GPU-class performance on regular EC2 instances. Read our launch blog to learn how easy it is to deploy CV and NLP models with just a few clicks.
0
4
18
@neuralmagic
Neural Magic
1 year
Compound sparsity FTW! 💯 Neural Magic's recent #MLPerf benchmarks show 92% more energy-efficient NLP execution compared to other providers. ♻️ Compound sparsity techniques and smart inferencing in DeepSparse are pushing the boundaries of #AI efficiency! #Sustainability
Tweet media one
0
1
17
@neuralmagic
Neural Magic
2 years
Here's the recording from our workshop yesterday: We covered the pros and cons of model optimization, SOTA optimization algorithms and techniques, example-driven ways to apply optimizations to #NLP and #CV models, and more!
0
4
18
@neuralmagic
Neural Magic
1 year
Large ML models are slow and expensive to maintain in deployment, even though 90% of the parameters don't influence the outputs at all. Hear from @markurtz_ and @dwhitena about how you can optimize models to run super fast on readily available, cheaper CPUs. There's more. A 🧵:
@PracticalAIFM
Practical AI 🤖
1 year
🤘 New episode of Practical AI! 💡 Large models on CPUs 🤩 with @markurtz_ from @neuralmagic 🎙 hosted by @dwhitena 🗂 #ai #machinelearning 💚
0
0
7
1
5
19
@neuralmagic
Neural Magic
1 year
Optimizing ML models for deployment is not optional, particularly for enterprises that want to lower their computing costs while improving production performance. Our team suggests three model optimization techniques to consider before deploying your next model. A 🧵:
1
3
17
@neuralmagic
Neural Magic
1 year
SparseGPT uses a pruning mask: weights outside the mask are set to 0, while the rest keep their current values. In this article, we explore the internal workings of SparseGPT in more detail.
1
3
17
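A toy illustration of the masking step described above, not the SparseGPT algorithm itself (the full method also chooses the mask and adjusts the remaining weights using approximate second-order information); magnitude is used here only as a stand-in selection criterion.

```python
import torch

weights = torch.randn(4, 4)
threshold = weights.abs().flatten().median()           # keep the larger-magnitude half
mask = weights.abs() > threshold
pruned = torch.where(mask, weights, torch.zeros_like(weights))  # zero everything outside the mask
print((pruned == 0).float().mean())                    # ~0.5 sparsity
```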
@neuralmagic
Neural Magic
1 year
Building ML models in isolation can be lonely and stressful. 🤖 In a community, you can get ideas and your questions answered quickly. 🔥 A community of like-minded people can help you ship models faster. 🚢 Here's why you should join the Neural Magic Community in Slack:
Tweet media one
1
4
15
@neuralmagic
Neural Magic
2 years
@Fra_Pochetti Thanks for the shoutout! You can give DeepSparse a run for free on our website across numerous NLP and computer vision use cases. And feel free to join our Slack community for any questions and general ML performance discussions!
2
1
15
@neuralmagic
Neural Magic
1 year
CPUs > GPUs for AI inference. Start with Neural Magic for free to deliver GPU-class performance on regular CPUs across various CV and NLP use cases. Deliver the same accuracy, smaller models, and much better AI performance. 90-day free trial link 👇
1
1
15
@neuralmagic
Neural Magic
1 year
Our 2022 #YearInReview is live. We released version 1.0, continued pushing the boundaries of sparsity, and furthered incredible integration efforts with @huggingface , @awscloud , @googlecloud , @weaviate_io , just to name a few 😉
1
3
16
@neuralmagic
Neural Magic
3 years
Thank you @NEA , @a16z , @Amdocs , @ComcastVentures , @pillar_vc , and Ridgeline Partners for your continued support. And shoutout to our employees for making it all happen.
1
6
15
@neuralmagic
Neural Magic
1 year
Are you still training ML models with default, slow data loaders? That's a lot of time and money wasted! Here's why you should use the Deep Lake data loader by @activeloopai to train your models 2x faster:
1
2
16
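A hedged sketch of the swap, based on Deep Lake's PyTorch integration; the dataset path, loader arguments, and tensor keys are assumptions, so treat the linked post as the source of truth.

```python
import deeplake

ds = deeplake.load("hub://activeloop/cifar100-train")            # assumed hosted dataset path
loader = ds.pytorch(batch_size=64, shuffle=True, num_workers=2)  # streams and decodes in the background

for batch in loader:                                   # batches arrive as dicts keyed by tensor name
    images, labels = batch["images"], batch["labels"]  # keys depend on the dataset's tensors
    break
```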
@neuralmagic
Neural Magic
1 year
1/ Are you familiar with #SparseTransferLearning ? It allows anyone to convert an ML model into a smaller, faster, and sometimes even more accurate variant than its dense counterpart. We created a video tutorial using transformer models from the @huggingface model hub. More 👇
1
3
16
@neuralmagic
Neural Magic
1 year
By deploying DENSE models, you are: 1️⃣ Wasting compute resources 💸 2️⃣ Delivering sub-par inference performance 📉 SPARSE models offer higher throughput and lower latency without affecting accuracy. Check out how to deploy sparse models on @huggingface Spaces, for free 👇
Tweet media one
1
2
15
@neuralmagic
Neural Magic
1 year
Achieve 7.5x real-time instance segmentation on a standard quad-core laptop CPU with DeepSparse and YOLACT 🚀 Learn more in our blog:
Tweet media one
0
0
15
@neuralmagic
Neural Magic
1 year
Bringing speed to object detection, while optimizing and simplifying your #YOLOv5 deployment 🔥 We’re thrilled to announce our collaboration with @ultralytics , read more about the partnership on the blog 🧑‍💻
0
5
15
@neuralmagic
Neural Magic
11 months
Forget specialized AI hardware. Get the performance you need on commodity CPUs with compound sparsity and sparsity-aware inference execution. Here's how.
2
2
14
@neuralmagic
Neural Magic
2 years
We are attending @ieeeICIP in Bordeaux, France right now! Our experts @dtransposed & @GulinKonstantin are showcasing the power of sparsity and our inference engine that delivers GPU-class performance on CPUs. Come over, say hi, and get ready to be amazed by our tech! #ICIP2022
Tweet media one
Tweet media two
0
4
14
@neuralmagic
Neural Magic
2 years
🧵 Our Deep Sparse Platform provides a suite of software tools to deploy sparse deep learning models on CPUs. There are multiple ways to plug into the DeepSparse Engine and run sparse models like ResNet-50 at accelerated speeds, but what is sparsification and why should you care?
1
3
13
@neuralmagic
Neural Magic
1 year
Join us at the virtual AWS Startup Showcase on March 9. We’ll be in the company of AI/ML startups building foundation model infrastructure. Learn from an exclusive speaker lineup from @awscloud , @anyscalecompute , @astronomerio , @octoml , @roboflow , @huggingface Free reg. 👇
Tweet media one
1
5
13
@neuralmagic
Neural Magic
1 year
Seamlessly auto-deploy sparse (read: smaller, faster, equally accurate) NLP models on AWS Lambda using this blog!
0
7
14
@neuralmagic
Neural Magic
11 months
We've optimized @ultralytics YOLOv8 models with SOTA sparsification techniques, resulting in 10x smaller and 8x faster models! Our video shows how you can apply our sparsity techniques to your #CV use cases to deliver fast and accurate inference on CPUs.
Tweet media one
0
3
14
@neuralmagic
Neural Magic
2 years
Neural Magic 1.2 release is here! Here's what's included: 1️⃣ An easier way for you to get started with Neural Magic 2️⃣ Even more performance for throughput use cases 3️⃣ You can now trial Neural Magic in production for free, for up to 90 days And more:
Tweet media one
0
5
12
@neuralmagic
Neural Magic
2 years
We are opening our Lunch & Learn to the wider community of practitioners and researchers interested in simpler and more efficient ML performance! We'd love for you to join us. Learn more and save your spot here: #MachineLearning #DeepLearning
Tweet media one
0
4
13
@neuralmagic
Neural Magic
1 year
It's possible to eliminate 90% of the BERT-Base weights to gain significant speedup without sacrificing model accuracy. Meet oBERT, an #opensource 90% pruned+quantized BERT-Base model that delivers the fastest CPU inference. Apply it to your use case:
0
1
13
@neuralmagic
Neural Magic
1 year
You’ve heard that making #ML models smaller improves performance. An inference runtime also plays a major role. For example, deploying a dense YOLOv8 model with DeepSparse leads to a 10X performance boost compared to the ONNX Runtime. A 🧵:
Tweet media one
1
2
12
@neuralmagic
Neural Magic
2 years
Want to build an efficient vector search on CPUs? Then this podcast is for you! @mgoin_ and @CShorten30 describe how Weaviate and Neural Magic make it easier and more economical to build ML deployments by enabling sparse inference acceleration on CPUs.
Tweet media one
0
7
13
@neuralmagic
Neural Magic
2 years
We are excited to introduce a zero-shot feature to our sparsity-aware engine, DeepSparse, to help developers classify text faster at lower costs using commodity CPUs. Read our blog for the benefits of *sparse* zero-shot and easy ways to get started:
0
5
12
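A hedged sketch of the sparse zero-shot idea with DeepSparse; the task name, argument names, and SparseZoo stub are assumptions, so the linked blog is the source of truth for the exact API.

```python
from deepsparse import Pipeline

zsc = Pipeline.create(
    task="zero_shot_text_classification",
    model_path="zoo:<sparse-zero-shot-stub>",   # hypothetical stub
)
print(zsc(
    sequences=["CPUs can now serve transformer models in real time."],
    labels=["sports", "politics", "technology"],
))
```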
@neuralmagic
Neural Magic
2 years
Ready for 🤯 ? You can sparsify ResNet-50 models up to 95% while retaining 99% of their accuracy! And then deploy them on commodity CPUs at GPU speeds! Learn more, benchmark, and apply your data with only a few lines of code:
Tweet media one
0
2
12
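A sketch of the "few lines of code" using the pruned-quantized ResNet-50 SparseZoo stub quoted verbatim elsewhere in this feed (an 85%-pruned variant, not the 95% one mentioned above); the image path is a placeholder.

```python
from deepsparse import Pipeline

stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned85_quant-none-vnni"
clf = Pipeline.create(task="image_classification", model_path=stub)
print(clf(images=["golden_retriever.jpg"]))   # placeholder image path
```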
@neuralmagic
Neural Magic
2 years
📣📣 You can now deploy the DeepSparse Engine on AWS SageMaker! 📣📣 In this example, we deployed sparse DistilBERT to achieve a 7x increase in model performance. 📈 You can do the same with a single CLI command! #sagemaker #machinelearning #sparsity
0
5
11
@neuralmagic
Neural Magic
4 years
In our new blog series, we kick off this week with a 20-year veteran and jack-of-all-trades when it comes to machine learning and data science: Mani Sarkar! Click to read more!
Tweet media one
0
3
12
@neuralmagic
Neural Magic
11 months
Want to explore the world of sparse transfer learning for transformer NLP models? Sparse transfer learning allows you to make ML models smaller, faster, and sometimes even more accurate than their dense variants. In this tutorial (), @Quantum_Stat
Tweet media one
0
0
12
@neuralmagic
Neural Magic
1 year
🚨 Attention all #AI enthusiasts! 🔥 Don't miss out on the upcoming webinar on Second Order Pruning Algorithms for SOTA Model Compression! 🤖 Learn how to take your models to the next level and stay ahead of the game. Register now: #MachineLearning
0
2
11
@neuralmagic
Neural Magic
1 year
You can now access our last 2022 product release notes before celebrating the new year 🥂
1
5
12
@neuralmagic
Neural Magic
2 years
Fact: Sparsified models are so light that you can load up to 19 sparse BERT models on only 16GB of RAM.
Tweet media one
1
1
12
@neuralmagic
Neural Magic
1 year
One of our very own Neural Magicians, @Quantum_Stat , has amassed tips and tricks on how to maximize #chatgpt for #developers and #contentcreators , and we’re ready to share it with all of you 🤓 Check out the only ChatGPT cheat sheet you’ll ever need:
0
3
11
@neuralmagic
Neural Magic
1 year
We are excited to be a Google Cloud Build Partner and to bring DeepSparse, our CPU inference runtime, to the @googlecloud Marketplace. You can now deploy fast and accurate AI use cases on the CPUs of choice, offering the performance and flexibility your organization needs.
Tweet media one
1
0
12
@neuralmagic
Neural Magic
1 year
🚀 Achieve 4X faster #NLP and #ComputerVision inference… …by INT8 quantizing and removing 60% of the model weights in one shot. ✂️ Minutes of work make your inference faster, cheaper, and more energy efficient. Learn how 🗓️ 👇
0
2
12
@neuralmagic
Neural Magic
1 year
Want to prune your #ML models at higher levels without impacting accuracy? ✂️ Join us for a virtual session 📺 on April 6 where we'll discuss second-order pruning methods that enable higher sparsity by removing weights that directly affect the loss function the least. 1/3
Tweet media one
1
1
12
@neuralmagic
Neural Magic
1 year
SparseGPT is a groundbreaking method that allows you to prune LLMs to at least 50% sparsity in one shot, without any retraining, at minimal loss of accuracy. You can then deploy LLMs on CPUs you already own, at GPU speeds. RSVP:
Tweet media one
1
1
11
@neuralmagic
Neural Magic
11 months
This Thursday, June 15, we'll spend 45 minutes showing you how to apply our open-source optimizations to your object detection use cases, so you can deliver best-in-class inference performance on CPUs you already own. RSVP:
1
4
10
@neuralmagic
Neural Magic
1 year
Our ultimate guide to #objectdetection is live! 🔥 Explore deep learning-based approaches and solutions to obstacles found in real-world applications. 🌟 Want more? SparseZoo makes it easy to access state-of-the-art, sparsified object detection models. 🔗
0
2
11
@neuralmagic
Neural Magic
1 year
Excited to share the recording of our recent webinar with all of you! 🎥🔥 @_EldarKurtic did a great job explaining how you can apply second-order pruning algorithms for SOTA model compression. Check it out now:
1
1
11
@neuralmagic
Neural Magic
1 year
GPUs are becoming scarce. But no need to worry. You can deploy #ML models on a CPU with the same performance as a T4 GPU. Example: DeepSparse (CPU Runtime) and oBERT give you a 4.2X increase in throughput on the WNUT Dataset at the same cost as a T4 GPU. A 🧵:
1
0
10
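A hedged sketch of a token-classification (NER) pipeline like the WNUT setup above; the SparseZoo stub and the input argument name are assumptions rather than values from the thread.

```python
from deepsparse import Pipeline

ner = Pipeline.create(
    task="token_classification",
    model_path="zoo:<sparse-obert-wnut-stub>",   # hypothetical stub
)
print(ner(inputs=["Neural Magic is based in Boston."]))
```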
@neuralmagic
Neural Magic
2 years
What performance do you get on your CPU?
pip install deepsparse
deepsparse.benchmark --scenario sync zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned85_quant-none-vnni
#ComputerVision #ImageClassification #ResNet50 #cpuAI
Tweet media one
0
2
10
@neuralmagic
Neural Magic
1 year
Will you be at #NeurIPS2022 next week? Come by booth #226 to meet our team! We'll be demoing object detection and NLP models running super-fast on commodity CPU hardware. And of course, we'll have fun Neural Magic merch to share, in addition to an AirPods giveaway!
0
3
9
@neuralmagic
Neural Magic
11 months
We're excited to reveal the performance results on the latest 4th Gen AMD EPYC processors. @AMD and Neural Magic are truly pushing the boundaries of AI inference performance density on CPUs. Read our blog for details:
0
1
10
@neuralmagic
Neural Magic
8 months
The latest #MLPerf inference results are in and they show #DeepSparse providing a ~50x improvement over the baseline BERT-Large reference implementation on both AWS ARM and GCP x86 instances. See how and replicate our results today:
Tweet media one
0
0
10
@neuralmagic
Neural Magic
2 years
The DeepSparse Engine is freely available for community use. You can give it a run and replicate the numbers from this graph by visiting our website:
1
0
9
@neuralmagic
Neural Magic
2 years
We pushed 🤗 BERT performance to new heights by also supporting quantization on top of sparsity. End result: 7x speedup over the dense model. 📈 You can easily benchmark sparse performance and apply it to your dataset by following this guide:
0
5
10
@neuralmagic
Neural Magic
7 months
The new C3D machine series from @Google Cloud, powered by 4th Gen @AMD EPYC™ processors, is now available and delivers accelerated AI inference performance for deep learning when you use DeepSparse, the @neuralmagic runtime.
0
4
11
@neuralmagic
Neural Magic
1 year
Join us TODAY at 1pm EDT for a deep dive into the latest advancements in second-order pruning algorithms for SOTA #modelcompression ! Learn practical techniques to optimize #ML models for efficiency. Confirm your spot: #MachineLearning #ai
0
0
10
@neuralmagic
Neural Magic
1 year
Hey, @mathemagic1an ! The SparseGPT paper authors will hold a webinar this Thursday (May 25), showing how you can use open-source tools to apply SparseGPT to LLMs in one shot, so they can be deployed on CPUs at super-fast speeds. More info and reg. form here:
@mathemagic1an
Jay Hack
1 year
Can we compress large language models for better perf? "SparseGPT: Massive Language Models can be Accurately Pruned in One Shot" Eliminates the need to use/store 50% of weights for a 175B param model with no significant sacrifice in perf. Here's how 👇
Tweet media one
16
145
815
2
2
10
@neuralmagic
Neural Magic
2 years
Announcing the Community Edition 1.0 and 1.1 of our DeepSparse Engine and SparseML libraries, featuring: 👩‍💻 New pipelines for info retrieval 🔍 Named entity recognition 🔄 A CustomTaskPipeline 📝 Effective docker management 🔬 Revamped testing
1
1
8
@neuralmagic
Neural Magic
7 months
You can improve the quality of generated text from your LLM by controlling the sampling process with parameters like temperature, top_k, top_p, and repetition_penalty. Scroll on for the why and the how...
Tweet media one
1
0
10
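An illustration of the four sampling knobs named above, shown with Hugging Face transformers' generate() since the parameter names are generic; the model choice is just an example, and whether a DeepSparse text-generation pipeline exposes the same argument names is an assumption to verify against the thread.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tok("Sparsity matters because", return_tensors="pt").input_ids

out = model.generate(
    ids,
    do_sample=True,
    temperature=0.7,          # <1 sharpens the token distribution, >1 flattens it
    top_k=50,                 # sample only from the 50 most likely tokens
    top_p=0.9,                # nucleus sampling: smallest set covering 90% of probability mass
    repetition_penalty=1.2,   # discourage repeating earlier tokens
    max_new_tokens=40,
)
print(tok.decode(out[0], skip_special_tokens=True))
```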
@neuralmagic
Neural Magic
2 years
@aldanajjd @marktenenholtz Not easy indeed. We try to make it a bit easier with SparseML. In a nutshell, SparseML simplifies the sparsification process by encoding the hyperparameters and instructions needed to create accurate pruned and pruned-quantized models like BERT, YOLO, ResNet-50, YOLACT…
0
0
10
@neuralmagic
Neural Magic
2 years
Does this mean the #cybertruck is 54% complete @elonmusk ? #CVPR22
Tweet media one
2
1
8
@neuralmagic
Neural Magic
1 year
Did you lose an hour of sleep due to #DaylightSavingTime ? Don’t worry, you can get it back using Neural Magic 😃 1/ #SparseZoo - prototype from already-optimized ML models 2/ #SparseML - apply your data with a few lines of code 3/ #DeepSparse - run ML in deployment super fast
0
0
9
@neuralmagic
Neural Magic
3 years
Curious how we deliver GPU-class #deeplearning performance on commodity CPUs? It all started at @MIT_CSAIL a few years back. Read the story here:
Tweet media one
0
3
10