Neural Magic

@neuralmagic

4,875 Followers
1,663 Following
191 Media
730 Statuses

Deploy the fastest ML on CPUs and GPUs using only software. GitHub: #sparsity #opensource

Boston, MA
Joined May 2018
Pinned Tweet
@neuralmagic
Neural Magic
18 hours
If you use @vllm_project for #LLM inference serving, you must register for our open office hours on June 5. You’ll get direct access to two vLLM maintainers, @mgoin_ and @simon_mo_ , to ask any question you have. Can’t make June 5? Join us on June 20!
0
1
4
@neuralmagic
Neural Magic
1 year
Accelerate your @huggingface 🤗 Inference Endpoints with DeepSparse to achieve 43x CPU speedup and 97% cost reduction over @PyTorch . Side note: DeepSparse is even faster than a T4 GPU 🤯 Learn more in our blog:
Tweet media one
12
89
575
@neuralmagic
Neural Magic
2 years
HOT OFF THE PRESS! Neural Magic introduces sparsity and software-only ML execution to @MLPerf , boosting CPU performance 175X! 175X! Yes, you read that right. Read more about this amazing feat and replicate our results:
Tweet media one
4
57
454
@neuralmagic
Neural Magic
2 years
Transformers are huge. They are not efficient in deployment. But no worries. You can sparsify them with a few lines of code using SparseML: Result? More compression and better inference performance at the same accuracy. P.S. Same goes for CV models!
Tweet media one
5
77
355
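A minimal sketch of the "few lines of code" idea with SparseML, assuming an existing PyTorch fine-tuning setup; the recipe file, model choice, and step count below are placeholders rather than the exact code from the linked post.

```python
import torch
from transformers import AutoModelForSequenceClassification
from sparseml.pytorch.optim import ScheduledModifierManager

# Placeholder model and optimizer; any PyTorch module can be wired in the same way.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# The recipe (a YAML file) encodes the pruning schedule and target sparsity.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=1000)

# ... run the usual fine-tuning loop with the wrapped optimizer ...

manager.finalize(model)  # detach pruning hooks, leaving the sparsified weights in place
```

The same recipe-driven flow is what the postscript about CV models is referring to: swap the model, keep the recipe mechanism.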
@neuralmagic
Neural Magic
1 year
Did you know that you can use SparseGPT to apply one-shot sparsification to make large language models run faster on CPUs? This 🧵 explores how sparsification works, why it's a game changer for LLMs, and how to apply it to your models today. #Sparsification #LanguageModels
Tweet media one
3
20
280
@neuralmagic
Neural Magic
7 months
We applied our latest sparse fine-tuning research on the MPT-7b model, resulting in a 75% pruned model that doesn't drop accuracy. 🤯 75% fewer parameters means we can now run LLM inference performantly on commodity CPUs. @_mwitiderrick shares the details:
5
52
256
@neuralmagic
Neural Magic
2 years
DeepSparse Engine runs DL models on everyday CPUs at GPU speeds! For latency-sensitive applications, it makes a 4-core Intel Macbook more performant than a T4 GPU and an 8-core server more performant than a V100 GPU. 🤯 <-- If this is not you right now, read this tweet again!
Tweet media one
10
29
201
@neuralmagic
Neural Magic
7 months
Sparsity makes LLMs go 🚀 🚀 🚀 …on ordinary CPUs. Here’s how:
3
47
202
@neuralmagic
Neural Magic
1 year
Deploying a GPT-175B requires 5 A100 80GB GPUs, each costing $15,000. That's $75,000 for inference 💰 You can reduce the model’s size by removing 50% of the weights without losing accuracy 🤯 Let's explore how to do that with the #SparseGPT algorithm. -A quick thread- 🧵👇
5
25
129
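The back-of-envelope math behind those numbers, assuming FP16 weights and ignoring activations and the KV cache:

```python
params = 175e9                            # GPT-175B
weight_gb = params * 2 / 1e9              # 2 bytes per FP16 weight -> 350 GB of weights
gpus = -(-weight_gb // 80)                # ceil(350 / 80) -> 5 A100 80GB GPUs
print(weight_gb, gpus, gpus * 15_000)     # 350.0 5.0 75000.0
```

Removing 50% of the weights roughly halves that footprint when the zeros are stored sparsely, which is the saving the thread is pointing at.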
@neuralmagic
Neural Magic
2 years
DeepSparse Engine makes YOLOv5 models go super fast! 🚀 It outperforms ONNX Runtime by 4x, achieving real-time processing of 60fps on an 8-core CPU and 30fps on a 4-core CPU! 🤯 #objectdetection cc @ultralytics
Tweet media one
3
18
103
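A hedged sketch of running a sparse YOLOv5 model through DeepSparse on a CPU; the SparseZoo stub and image path below are placeholders, not values from the tweet.

```python
from deepsparse import Pipeline

yolo = Pipeline.create(
    task="yolo",
    model_path="zoo:<sparse-yolov5-stub>",   # hypothetical stub; a local ONNX export also works
)
detections = yolo(images=["street_scene.jpg"])  # placeholder image path
print(detections)
```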
@neuralmagic
Neural Magic
2 years
Ready to turn your CPUs into machine learning supercomputers? Listen to our founder Nir Shavit and @ykilcher break the possibilities down in this fun interview. Say goodbye to specialized hardware!
@ykilcher
Yannic Kilcher 🇸🇨
2 years
How to make your CPU as fast as a GPU? 🔥 Nir Shavit explains how clever algorithms can make use of sparsity in neural networks to deliver unprecedented inference speed, without any need for specialized hardware! Watch here:
Tweet media one
6
56
345
0
16
92
@neuralmagic
Neural Magic
11 months
We sparsified @ultralytics YOLOv8 models to be faster for inference on CPUs. This video shows a sparse-quantized YOLOv8m model running in real time on a 4-core system, delivering a 5x speedup over ONNX Runtime. 🚀🚀🚀 Best part? All sparse YOLOv8 models are open sourced!
2
9
76
@neuralmagic
Neural Magic
1 year
Accelerate your #NLP pipelines with sparse transformers! You can get a 3x #CPU performance increase by optimizing your models with only a few lines of code. 1/3
Tweet media one
2
9
73
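What the "few lines of code" could look like at inference time with DeepSparse; the SparseZoo stub is a placeholder for a sparse transformer checkpoint, not a real model path.

```python
from deepsparse import Pipeline

clf = Pipeline.create(
    task="text_classification",
    model_path="zoo:<sparse-bert-text-classification-stub>",  # hypothetical stub
)
print(clf(sequences=["Sparse inference on CPUs is surprisingly fast."]))
```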
@neuralmagic
Neural Magic
1 year
#MLPerf Inference v3.0 results are out! We delivered a 6X improvement over our previous submission 6 months ago, elevating our overall CPU performance to an astounding 1,000X while reducing power consumption by 92%. This is the power of software. Details 👇
Tweet media one
2
6
42
@neuralmagic
Neural Magic
7 months
🚨 Our New LLM Research 🚨 We show how finetuning and sparsity come together to enable accurate LLMs that can be deployed on CPUs with DeepSparse. The result is a ~7x CPU speedup for a finetuned @MosaicML MPT-7B model vs. the FP32 baseline. 🙏 @ISTAustria for collaboration!
Tweet media one
5
11
42
@neuralmagic
Neural Magic
3 years
We carried a 4-core laptop around Boston, comparing runs of a sparsified #YOLOv5 object detection model running on the #DeepSparse Engine and #ONNXRuntime . End result: Pruning + INT8 quantization = 10x faster and 12x smaller model. Replicate our results:
3
8
41
@neuralmagic
Neural Magic
2 years
BERT-Large (345M parameters) is now faster than the much smaller DistilBERT (66M parameters), while retaining the accuracy of the larger BERT-Large! It delivers an 8x latency speedup on commodity CPUs 🚀 🙏 @ZafrirOfir , @guybd35 , @markurtz_ & teams for a fantastic collab!
Tweet media one
0
8
36
@neuralmagic
Neural Magic
2 years
Ready for some image classification magic? 🪄🎩 Here's a 29x compressed ResNet-50, from the original 98 MB to 3.3 MB. It's sparse-quantized. It recovers to 99% of the baseline accuracy! 🤯🤯 Download it from .
0
9
36
@neuralmagic
Neural Magic
7 months
Check out QUIK, a new quantization method that leads to memory and runtime savings of 3-4x on OPT, LLaMA2, and Falcon models.
@DAlistarh
Dan Alistarh
7 months
Happy to release QUIK, a new accurate post-training quantization method which processes the majority of weights and activations using 4bit precision. [1/N] With @AshkboosSaleh @elias_frantar @thoefler Paper: Code: Snapshot:
Tweet media one
7
36
159
0
7
35
@neuralmagic
Neural Magic
2 years
DeepSparse 0.12 has landed! More speedups on AMD EPYC processors. Additional AVX2 performance. Hugging Face Transformers integrations update. A Streamlit app for deploying DeepSparse & exploring the inference performance of BERT. A brand new readme! Etc.
0
7
31
@neuralmagic
Neural Magic
2 years
@giffmana For inference specifically, check out the DeepSparse Engine. It decreases compute needed to execute a network by taking advantage of sparsity and large CPU memory. We’ll be at CVPR if you’d like to see a demo!
0
1
29
@neuralmagic
Neural Magic
2 years
If you are among the majority of data scientists who are NOT optimizing their deep learning models for production, this workshop is for you!
Tweet media one
1
2
28
@neuralmagic
Neural Magic
1 year
Do you use hardware accelerators like GPUs, TPUs, and IPUs when deploying your ML? If so, you probably haven't been able to realize acceptable performance with reasonable costs for scaled deployments. Agree? Check out this 🧵 on why you should deliver AI using only software:
Tweet media one
1
3
25
@neuralmagic
Neural Magic
1 year
Exciting AI news! Striveworks partners with Neural Magic to bring fast GPU-less model deployment options to their Chariot MLOps platform. No more expensive GPU infrastructure needed. Stay ahead with cutting-edge tech. #MLOps #AI #MachineLearning #chariot
0
6
27
@neuralmagic
Neural Magic
1 year
The *ultimate* guide to #imagesegmentation deployment is here, and we walk you through deploying real-time inference on CPUs without sacrificing performance 👨‍💻 Learn more on the blog:
0
4
28
@neuralmagic
Neural Magic
1 year
You can now run inference on very large language models on a single GPU, using new one-shot weight quantization. Nice job @elias_frantar , @DAlistarh , and everyone who's pioneering this research at #NeurIPS2022
@thoefler
Torsten Hoefler 🇨🇭
1 year
Wondering how to run a #GPT -3 sized 175 billion parameter language model on a single #GPU ? GPTQ makes it possible with 3-bit weight quantization; work led by @elias_frantar and @DAlistarh . Relevant in this week's #NeurIPS2022 discussions :-)!
Tweet media one
Tweet media two
3
12
48
1
12
26
@neuralmagic
Neural Magic
2 years
You can accurately classify 21,000+ images every second using only software and commodity CPUs. Is it time to say goodbye to GPUs in ML inference?
0
6
24
@neuralmagic
Neural Magic
2 years
Meet oBERT! 😻 A series of 90% sparse BERT-Base models that recover to 99% of the baseline accuracy. 🤯 A 12-layer version of #oBERT outperforms #DistilBERT . A 3-layer version outperforms #TinyBERT . 🤯🤯 oBERT makes single-digit #NLP latency on CPUs the new normal! #NLProc
Tweet media one
2
5
23
@neuralmagic
Neural Magic
11 months
100 FPS YOLOv8s on a 4-core CPU 🤯 GPUs are no longer needed for AI inference. Software always wins. Come see it at #CVPR2023 and talk to us about doing the same using our open source optimization libraries and CPU runtime.
Tweet media one
1
3
23
@neuralmagic
Neural Magic
7 months
Say goodbye to snail-paced LLM processing! 🐌 Our new blog reveals how DeepSparse + @OpenAI 's API = lightning-fast, local LLMs. ⚡️ If you are using OpenAI now, you can start using open-source LLMs locally, without changing your code. Here's how:
0
4
23
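The "without changing your code" part boils down to pointing the standard OpenAI client at a local, OpenAI-compatible endpoint. A sketch of that client-side swap, where the URL, port, and model name are assumptions rather than values from the blog:

```python
from openai import OpenAI

# Same client code as before; only the endpoint (and model name) changes.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")  # assumed local server
resp = client.chat.completions.create(
    model="local-sparse-llm",   # hypothetical local model name
    messages=[{"role": "user", "content": "Why does sparsity speed up CPU inference?"}],
)
print(resp.choices[0].message.content)
```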
@neuralmagic
Neural Magic
1 year
🚀 Exciting AI news from @neuralmagic ! Optimize large language models effortlessly with our software and deploy them on commodity CPUs using #DeepSparse for lightning-fast inference. Unleash unparalleled performance, scalability, and cost efficiency. And get to deployment
Tweet media one
1
4
23
@neuralmagic
Neural Magic
1 year
This is the team that’s assembling the future of machine learning. Meet them at #NeurIPS2022 this week to see what they’re up to and grab some of that cool swag! @mgoin_ @markurtz_ @finkatronic @robertshaw21
Tweet media one
2
3
23
@neuralmagic
Neural Magic
1 year
Ever wondered how to achieve lightning-fast AI inference speeds without compromising accuracy? 🚀 Sparsity holds the key! 🔍 On June 15, we'll show how you can apply sparsity to YOLOv8 models to deploy them on regular CPUs at GPU speeds. 🗓 🗓
0
4
23
@neuralmagic
Neural Magic
7 months
Everyone loves @LangChainAI for building GenAI apps! 🚀 But devs must choose between pricey APIs and clunky GPUs. 😫 Enter DeepSparse to turbocharge your LLMs on CPUs. ⚡️ Get 7X faster text generation anywhere there's a CPU, from cloud to edge. 💪 #AI #LangChain
Tweet media one
1
5
22
@neuralmagic
Neural Magic
1 year
🚨 Introducing Sparse BioBERT, our new #NLP model that is specifically fine-tuned on four unique #biology related datasets. Benchmarked against its dense variant on the ArXiv-biology dataset, the sparse BioBERT is over 9x faster ⚡️
1
4
21
@neuralmagic
Neural Magic
1 year
🏁  Accelerate #YOLOv8 with Neural Magic's DeepSparse by 10x! Developed by our partner @ultralytics , YOLOv8 takes #objectdetection to the next level with its anchor-free design.
1
5
20
@neuralmagic
Neural Magic
1 year
Yup, you can run LLMs super fast and accurately on commodity CPU infrastructure using SparseGPT and DeepSparse, our inference runtime. 🤯 Is it time to say goodbye to GPUs at inference? 👋 ✌️ Watch our latest video on SparseGPT and let us know!
Tweet media one
1
3
20
@neuralmagic
Neural Magic
1 year
Meet extractive question answering with sparse transformers! You can now search 1000s of documents super quickly, at lower cost, with the accuracy expected in today's world. Learn more: #nlp #sparsity
0
5
18
@neuralmagic
Neural Magic
1 year
Neural Magic’s DeepSparse Runtime is now available in the AWS Marketplace! Get GPU-class performance on regular EC2 instances. Read our launch blog to learn how easy it is to deploy CV and NLP models with just a few clicks.
0
4
18
@neuralmagic
Neural Magic
1 year
Compound sparsity FTW! 💯 Neural Magic's recent #MLPerf benchmarks show 92% more energy-efficient NLP execution compared to other providers. ♻️ Compound sparsity techniques and smart inferencing in DeepSparse are pushing the boundaries of #AI efficiency! #Sustainability
Tweet media one
0
1
17
@neuralmagic
Neural Magic
2 years
Here's the recording from our workshop yesterday: We covered the pros and cons of model optimization, SOTA optimization algorithms and techniques, example-driven ways to apply optimizations to #NLP and #CV models, and more!
0
4
18
@neuralmagic
Neural Magic
1 year
Large ML models are slow and expensive to maintain in deployment, even though 90% of the parameters don't influence the outputs at all. Hear from @markurtz_ and @dwhitena about how you can optimize models to run super fast on readily available, cheaper CPUs. There's more. A 🧵:
@PracticalAIFM
Practical AI 🤖
1 year
🤘 New episode of Practical AI! 💡 Large models on CPUs 🤩 with @markurtz_ from @neuralmagic 🎙 hosted by @dwhitena 🗂 #ai #machinelearning 💚
0
0
7
1
5
19
@neuralmagic
Neural Magic
1 year
Optimizing ML models for deployment is not optional, particularly for enterprises that want to lower their computing costs while improving production performance. Our team suggests three model optimization techniques to consider before deploying your next model. A 🧵:
1
3
17
@neuralmagic
Neural Magic
1 year
SparseGPT uses a pruning mask: weights outside the mask are set to 0, while the rest keep their current values. In this article, we explore the internal workings of SparseGPT in more detail.
1
3
17
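A toy illustration of the masking step described above, not the SparseGPT algorithm itself (the full method also chooses the mask and adjusts the remaining weights using approximate second-order information); magnitude is used here only as a stand-in selection criterion.

```python
import torch

weights = torch.randn(4, 4)
threshold = weights.abs().flatten().median()           # keep the larger-magnitude half
mask = weights.abs() > threshold
pruned = torch.where(mask, weights, torch.zeros_like(weights))  # zero everything outside the mask
print((pruned == 0).float().mean())                    # ~0.5 sparsity
```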
@neuralmagic
Neural Magic
1 year
Building ML models in isolation can be lonely and stressful. 🤖 In a community, you can get ideas and your questions answered quickly. 🔥 A community of like-minded people can help you ship models faster. 🚢 Here's why you should join the Neural Magic Community in Slack:
Tweet media one
1
4
15
@neuralmagic
Neural Magic
2 years
@Fra_Pochetti Thanks for the shoutout! You can give DeepSparse a run for free on our website across numerous NLP and computer vision use cases. And feel free to join our Slack community for any questions and general ML performance discussions!
2
1
15
@neuralmagic
Neural Magic
1 year
CPUs > GPUs for AI inference. Start with Neural Magic for free to deliver GPU-class performance on regular CPUs across various CV and NLP use cases. Deliver the same accuracy, smaller models, and much better AI performance. 90-day free trial link 👇
1
1
15
@neuralmagic
Neural Magic
1 year
Our 2022 #YearInReview is live. We released version 1.0, continued pushing the boundaries of sparsity, and furthered incredible integration efforts with @huggingface , @awscloud , @googlecloud , @weaviate_io , just to name a few 😉
1
3
16
@neuralmagic
Neural Magic
3 years
Thank you @NEA , @a16z , @Amdocs , @ComcastVentures , @pillar_vc , and Ridgeline Partners for your continued support. And shoutout to our employees for making it all happen.
1
6
15
@neuralmagic
Neural Magic
1 year
Are you still training ML models with default, slow data loaders? That's a lot of time and money wasted! Here's why you should use the Deep Lake data loader by @activeloopai to train your models 2x faster:
1
2
16
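A hedged sketch of the swap, based on Deep Lake's PyTorch integration; the dataset path, loader arguments, and tensor keys are assumptions, so treat the linked post as the source of truth.

```python
import deeplake

ds = deeplake.load("hub://activeloop/cifar100-train")            # assumed hosted dataset path
loader = ds.pytorch(batch_size=64, shuffle=True, num_workers=2)  # streams and decodes in the background

for batch in loader:                                   # batches arrive as dicts keyed by tensor name
    images, labels = batch["images"], batch["labels"]  # keys depend on the dataset's tensors
    break
```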
@neuralmagic
Neural Magic
1 year
1/ Are you familiar with #SparseTransferLearning ? It allows anyone to convert an ML model into a smaller, faster, and sometimes even more accurate variant than its dense counterpart. We created a video tutorial using transformer models from the @huggingface model hub. More 👇
1
3
16
@neuralmagic
Neural Magic
1 year
By deploying DENSE models, you are: 1️⃣ Wasting compute resources 💸 2️⃣ Delivering sub-par inference performance 📉 SPARSE models offer higher throughput and lower latency without affecting accuracy. Check out how to deploy sparse models on @huggingface Spaces, for free 👇
Tweet media one
1
2
15
@neuralmagic
Neural Magic
1 year
Achieve 7.5x real-time instance segmentation on a standard quad-core laptop CPU with DeepSparse and YOLACT 🚀 Learn more in our blog:
Tweet media one
0
0
15
@neuralmagic
Neural Magic
1 year
Bringing speed to object detection, while optimizing and simplifying your #YOLOv5 deployment 🔥 We’re thrilled to announce our collaboration with @ultralytics , read more about the partnership on the blog 🧑‍💻
0
5
15
@neuralmagic
Neural Magic
11 months
Forget specialized AI hardware. Get the performance you need on commodity CPUs with compound sparsity and sparsity-aware inference execution. Here's how.
2
2
14
@neuralmagic
Neural Magic
2 years
We are attending @ieeeICIP in Bordeaux, France right now! Our experts @dtransposed & @GulinKonstantin are showcasing the power of sparsity and our inference engine that delivers GPU-class performance on CPUs. Come over, say hi, and get ready to be amazed by our tech! #ICIP2022
Tweet media one
Tweet media two
0
4
14
@neuralmagic
Neural Magic
2 years
🧵 Our Deep Sparse Platform provides a suite of software tools to deploy sparse deep learning models on CPUs. There are multiple ways to plug into the DeepSparse Engine and run sparse models like ResNet-50 at accelerated speeds, but what is sparsification and why should you care?
1
3
13
@neuralmagic
Neural Magic
1 year
Join us at the virtual AWS Startup Showcase on March 9. We’ll be in the company of AI/ML startups building foundation model infrastructure. Learn from an exclusive speaker lineup from @awscloud , @anyscalecompute , @astronomerio , @octoml , @roboflow , @huggingface Free reg. 👇
Tweet media one
1
5
13
@neuralmagic
Neural Magic
1 year
Seamlessly auto-deploy sparse (read: smaller, faster, equally accurate) NLP models on AWS Lambda using this blog!
0
7
14
@neuralmagic
Neural Magic
11 months
We've optimized @ultralytics YOLOv8 models with SOTA sparsification techniques, resulting in 10x smaller and 8x faster models! Our video shows how you can apply our sparsity techniques to your #CV use cases to deliver fast and accurate inference on CPUs.
Tweet media one
0
3
14
@neuralmagic
Neural Magic
2 years
Neural Magic 1.2 release is here! Here's what's included: 1️⃣ An easier way for you to get started with Neural Magic 2️⃣ Even more performance for throughput use cases 3️⃣ You can now trial Neural Magic in production for free, for up to 90 days And more:
Tweet media one
0
5
12
@neuralmagic
Neural Magic
2 years
We are opening our Lunch & Learn to the wider community of practitioners and researchers interested in simpler and more efficient ML performance! We'd love for you to join us. Learn more and save your spot here: #MachineLearning #DeepLearning
Tweet media one
0
4
13
@neuralmagic
Neural Magic
1 year
It's possible to eliminate 90% of the BERT-Base weights to gain significant speedup without sacrificing model accuracy. Meet oBERT, an #opensource 90% pruned+quantized BERT-Base model that delivers the fastest CPU inference. Apply it to your use case:
0
1
13
@neuralmagic
Neural Magic
1 year
You’ve heard that making #ML models smaller improves performance. An inference runtime also plays a major role. For example, deploying a dense YOLOv8 model with DeepSparse leads to a 10X performance boost compared to the ONNX Runtime. A 🧵:
Tweet media one
1
2
12
@neuralmagic
Neural Magic
2 years
Want to build an efficient vector search on CPUs? Then this podcast is for you! @mgoin_ and @CShorten30 describe how Weaviate and Neural Magic make it easier and more economical to build ML deployments by enabling sparse inference acceleration on CPUs.
Tweet media one
0
7
13
@neuralmagic
Neural Magic
2 years
We are excited to introduce a zero-shot feature to our sparsity-aware engine, DeepSparse, to help developers classify text faster at lower costs using commodity CPUs. Read our blog for the benefits of *sparse* zero-shot and easy ways to get started:
0
5
12
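A hedged sketch of the sparse zero-shot idea with DeepSparse; the task name, argument names, and SparseZoo stub are assumptions, so the linked blog is the source of truth for the exact API.

```python
from deepsparse import Pipeline

zsc = Pipeline.create(
    task="zero_shot_text_classification",
    model_path="zoo:<sparse-zero-shot-stub>",   # hypothetical stub
)
print(zsc(
    sequences=["CPUs can now serve transformer models in real time."],
    labels=["sports", "politics", "technology"],
))
```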
@neuralmagic
Neural Magic
2 years
Ready for 🤯 ? You can sparsify ResNet-50 models up to 95% while retaining 99% of their accuracy! And then deploy them on commodity CPUs at GPU speeds! Learn more, benchmark, and apply your data with only a few lines of code:
Tweet media one
0
2
12
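A sketch of the "few lines of code" using the pruned-quantized ResNet-50 SparseZoo stub quoted verbatim elsewhere in this feed (an 85%-pruned variant, not the 95% one mentioned above); the image path is a placeholder.

```python
from deepsparse import Pipeline

stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned85_quant-none-vnni"
clf = Pipeline.create(task="image_classification", model_path=stub)
print(clf(images=["golden_retriever.jpg"]))   # placeholder image path
```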
@neuralmagic
Neural Magic
2 years
📣📣 You can now deploy the DeepSparse Engine on AWS SageMaker! 📣📣 In this example, we deployed sparse DistilBERT to achieve a 7x increase in model performance. 📈 You can do the same with a single CLI command! #sagemaker #machinelearning #sparsity
0
5
11
@neuralmagic
Neural Magic
4 years
In our new blog series, we kick off this week with a 20-year veteran and jack-of-all-trades when it comes to machine learning and data science: Mani Sarkar! Click to read more!
Tweet media one
0
3
12
@neuralmagic
Neural Magic
11 months
Want to explore the world of sparse transfer learning for transformer NLP models? Sparse transfer learning allows you to make ML models smaller, faster, and sometimes even more accurate than their dense variants. In this tutorial (), @Quantum_Stat
Tweet media one
0
0
12
@neuralmagic
Neural Magic
1 year
🚨 Attention all #AI enthusiasts! 🔥 Don't miss out on the upcoming webinar on Second Order Pruning Algorithms for SOTA Model Compression! 🤖 Learn how to take your models to the next level and stay ahead of the game. Register now: #MachineLearning
0
2
11
@neuralmagic
Neural Magic
1 year
You can now access our last 2022 product release notes before celebrating the new year 🥂
1
5
12
@neuralmagic
Neural Magic
2 years
Fact: Sparsified models are so light that you can load up to 19 sparse BERT models on only 16GB of RAM.
Tweet media one
1
1
12
@neuralmagic
Neural Magic
1 year
One of our very own Neural Magicians, @Quantum_Stat , has amassed tips and tricks on how to maximize #chatgpt for #developers and #contentcreators , and we’re ready to share it with all of you 🤓 Check out the only ChatGPT cheat sheet you’ll ever need:
0
3
11
@neuralmagic
Neural Magic
1 year
We are excited to be a Google Cloud Build Partner and to bring DeepSparse, our CPU inference runtime, to the @googlecloud Marketplace. You can now deploy fast and accurate AI use cases on the CPUs of choice, offering the performance and flexibility your organization needs.
Tweet media one
1
0
12
@neuralmagic
Neural Magic
1 year
🚀 Achieve 4X faster #NLP and #ComputerVision inference… …by INT8 quantizing and removing 60% of the model weights in one shot. ✂️ Minutes of work make your inference faster, cheaper, and more energy efficient. Learn how 🗓️ 👇
0
2
12
@neuralmagic
Neural Magic
1 year
Want to prune your #ML models at higher levels without impacting accuracy? ✂️ Join us for a virtual session 📺 on April 6 where we'll discuss second-order pruning methods that enable higher sparsity by removing weights that directly affect the loss function the least. 1/3
Tweet media one
1
1
12
@neuralmagic
Neural Magic
1 year
SparseGPT is a groundbreaking method that allows you to prune LLMs to at least 50% sparsity in one shot, without any retraining, at minimal loss of accuracy. You can then deploy LLMs on CPUs you already own, at GPU speeds. RSVP:
Tweet media one
1
1
11
@neuralmagic
Neural Magic
11 months
This Thursday, June 15, we'll spend 45 minutes showing you how to apply our open-source optimizations to your object detection use cases, so you can deliver best-in-class inference performance on CPUs you already own. RSVP:
1
4
10
@neuralmagic
Neural Magic
1 year
Our ultimate guide to #objectdetection is live! 🔥 Explore deep learning-based approaches and solutions to obstacles found in real-world applications. 🌟 Want more? SparseZoo makes it easy to access state-of-the-art, sparsified object detection models. 🔗
0
2
11
@neuralmagic
Neural Magic
1 year
Excited to share the recording of our recent webinar with all of you! 🎥🔥 @_EldarKurtic did a great job explaining how you can apply second-order pruning algorithms for SOTA model compression. Check it out now:
1
1
11
@neuralmagic
Neural Magic
1 year
GPUs are becoming scarce. But no need to worry. You can deploy #ML models on a CPU with the same performance as a T4 GPU. Example: DeepSparse (CPU Runtime) and oBERT give you a 4.2X increase in throughput on the WNUT Dataset at the same cost as a T4 GPU. A 🧵:
1
0
10
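A hedged sketch of a token-classification (NER) pipeline like the WNUT setup above; the SparseZoo stub and the input argument name are assumptions rather than values from the thread.

```python
from deepsparse import Pipeline

ner = Pipeline.create(
    task="token_classification",
    model_path="zoo:<sparse-obert-wnut-stub>",   # hypothetical stub
)
print(ner(inputs=["Neural Magic is based in Boston."]))
```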
@neuralmagic
Neural Magic
2 years
What performance do you get on your CPU?
pip install deepsparse
deepsparse.benchmark --scenario sync zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned85_quant-none-vnni
#ComputerVision #ImageClassification #ResNet50 #cpuAI
Tweet media one
0
2
10
@neuralmagic
Neural Magic
1 year
Will you be at #NeurIPS2022 next week? Come by booth #226 to meet our team! We'll be demoing object detection and NLP models running super-fast on commodity CPU hardware. And of course, we'll have fun Neural Magic merch to share, in addition to an AirPods giveaway!
0
3
9
@neuralmagic
Neural Magic
11 months
We're excited to reveal the performance results on the latest 4th Gen AMD EPYC processors. @AMD and Neural Magic are truly pushing the boundaries of AI inference performance density on CPUs. Read our blog for details:
0
1
10
@neuralmagic
Neural Magic
8 months
The latest #MLPerf inference results are in and they show #DeepSparse providing a ~50x improvement over the baseline BERT-Large reference implementation on both AWS ARM and GCP x86 instances. See how and replicate our results today:
Tweet media one
0
0
10
@neuralmagic
Neural Magic
2 years
The DeepSparse Engine is freely available for community use. You can give it a run and replicate the numbers from this graph by visiting our website:
1
0
9
@neuralmagic
Neural Magic
2 years
We pushed 🤗 BERT performance to new heights by also supporting quantization on top of sparsity. End result: 7x speedup over the dense model. 📈 You can easily benchmark sparse performance and apply it to your dataset by following this guide:
0
5
10
@neuralmagic
Neural Magic
7 months
The new C3D machine series from @Google Cloud, powered by 4th Gen @AMD EPYC™ processors, is now available and delivers accelerated AI inference performance for deep learning when you use DeepSparse, the @neuralmagic runtime.
0
4
11
@neuralmagic
Neural Magic
1 year
Join us TODAY at 1pm EDT for a deep dive into the latest advancements in second-order pruning algorithms for SOTA #modelcompression ! Learn practical techniques to optimize #ML models for efficiency. Confirm your spot: #MachineLearning #ai
0
0
10
@neuralmagic
Neural Magic
1 year
Hey, @mathemagic1an ! The SparseGPT paper authors will hold a webinar this Thursday (May 25), showing how you can use open-source tools to apply SparseGPT to LLMs in one shot, so they can be deployed on CPUs at super-fast speeds. More info and reg. form here:
@mathemagic1an
Jay Hack
1 year
Can we compress large language models for better perf? "SparseGPT: Massive Language Models can be Accurately Pruned in One Shot" Eliminates the need to use/store 50% of weights for a 175B param model with no significant sacrifice in perf. Here's how 👇
Tweet media one
16
145
815
2
2
10
@neuralmagic
Neural Magic
2 years
Announcing the Community Edition 1.0 and 1.1 of our DeepSparse Engine and SparseML libraries, featuring: 👩‍💻 New pipelines for info retrieval 🔍 Named entity recognition 🔄 A CustomTaskPipeline 📝 Effective docker management 🔬 Revamped testing
1
1
8
@neuralmagic
Neural Magic
7 months
You can improve the quality of generated text from your LLM by controlling the sampling process with parameters like temperature, top_k, top_p, and repetition_penalty. Scroll on for the why and the how...
Tweet media one
1
0
10
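An illustration of the four sampling knobs named above, shown with Hugging Face transformers' generate() since the parameter names are generic; the model choice is just an example, and whether a DeepSparse text-generation pipeline exposes the same argument names is an assumption to verify against the thread.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tok("Sparsity matters because", return_tensors="pt").input_ids

out = model.generate(
    ids,
    do_sample=True,
    temperature=0.7,          # <1 sharpens the token distribution, >1 flattens it
    top_k=50,                 # sample only from the 50 most likely tokens
    top_p=0.9,                # nucleus sampling: smallest set covering 90% of probability mass
    repetition_penalty=1.2,   # discourage repeating earlier tokens
    max_new_tokens=40,
)
print(tok.decode(out[0], skip_special_tokens=True))
```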
@neuralmagic
Neural Magic
2 years
@aldanajjd @marktenenholtz Not easy indeed. We try to make it a bit easier with SparseML. In a nutshell, SparseML simplifies the sparsification process by encoding the hyperparameters and instructions needed to create accurate pruned and pruned-quantized models like BERT, YOLO, ResNet-50, YOLACT…
0
0
10
@neuralmagic
Neural Magic
2 years
Does this mean the #cybertruck is 54% complete @elonmusk ? #CVPR22
Tweet media one
2
1
8
@neuralmagic
Neural Magic
1 year
Did you lose an hour of sleep due to #DaylightSavingTime ? Don’t worry, you can get it back using Neural Magic 😃 1/ #SparseZoo - prototype from already-optimized ML models 2/ #SparseML - apply your data with a few lines of code 3/ #DeepSparse - run ML in deployment super fast
0
0
9
@neuralmagic
Neural Magic
3 years
Curious how we deliver GPU-class #deeplearning performance on commodity CPUs? It all started at @MIT_CSAIL a few years back. Read the story here:
Tweet media one
0
3
10