When Nvidia said they were sending me a DGX A100 for machine learning, they didn't say I had to build a room for it!
Say hello to Zubi & Oompa Loompa, btw. These are Maine Coons, they are a bit smaller than tigers. I am not prone to exaggeration.
How big was the Cray-1?
NN-SVG is a tool for creating Neural Network architecture drawings parametrically rather than manually!
It also provides the ability to export those drawings to Scalable Vector Graphics (SVG) files, suitable for inclusion in academic papers or web pages
handcalcs: a library to render Python calculation code automatically in LaTeX for your Jupyter Notebook!
It renders in a manner that mimics handwritten math: write the symbolic formula, followed by numeric substitutions, and then the result.
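To make the "formula, substitution, result" idea concrete, here is a minimal plain-text sketch using only the standard library. It is not the handcalcs API: the function `render_calc` and its arguments are hypothetical names chosen for illustration, and it prints plain text rather than LaTeX.

```python
import re

def render_calc(name, formula, values):
    """Return 'name = formula = substituted = result' as plain text."""
    # Replace each known variable name with its numeric value.
    def substitute(match):
        token = match.group(0)
        return str(values[token]) if token in values else token

    substituted = re.sub(r"\b[A-Za-z_]\w*\b", substitute, formula)
    # Evaluate with builtins disabled; only the supplied values are visible.
    result = eval(formula, {"__builtins__": {}}, dict(values))
    return f"{name} = {formula} = {substituted} = {result}"

line = render_calc("area", "width * height", {"width": 3, "height": 4})
print(line)  # area = width * height = 3 * 4 = 12
```

A real renderer would parse the expression instead of using regex substitution, but the three-part output is the same pattern the post describes.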
Need to draw your system architecture? Don't want to use slow and expensive Microsoft Visio?
Diagrams lets you draw your system architecture in Python code.
It was born for prototyping a new system architecture design without any design tools.
Book lottery! Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python.
@rasbt
Like and you're in the pool for one of three copies!
River is a Python library for online machine learning. It is the result of a merger between creme and scikit-multiflow. River's ambition is to be the go-to library for doing machine learning on streaming data.
Gooey: Turn (almost) any Python command line program into a full GUI application with one line!
pip install Gooey
Don't forget to star the repository!
Top2Vec is an algorithm for topic modeling and semantic search. It automatically detects topics present in text and generates jointly embedded topic, document and word vectors.
GitHub
Paper
Where to get data for your next machine learning project?
An overview of 8 amazing resources to accelerate your next project with data!
- Google Datasets
- Big Bad NLP Datasets
- Hugging Face Datasets
- Papers with Code Datasets
- Open Data on AWS
- Awesome Public Datasets
Awesome Explainable Graph Reasoning! A collection of research papers and software related to explainability in graph machine learning.
@benrozemberczki
Don't forget to spend some star love for the repository!
Schedule your Jupyter Notebooks and send the results as an HTML report!
Notebooker executes your Jupyter notebooks when you commit to Git!
Turning your Jupyter Notebook into a production-style web-based report in a few clicks.
The unofficial PyTorch implementation of the Attention Free Transformer by Apple Inc.
$ pip install aft-pytorch
Don't forget to spend some star love for the repository!
StackOverflow implemented its semantic search solution with Weaviate. How did they do it? They used a pre-trained BERT model from the SentenceTransformers library to generate the embeddings. Their reasons for using Weaviate: it's open source, and you can host it on your own…
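For readers new to embedding-based search, the core ranking step can be sketched with the standard library alone. This is only an illustration under stated assumptions: the three-dimensional vectors and the question titles below are made up, and no Weaviate or SentenceTransformers call is shown. A real setup would get the vectors from the embedding model and let the vector database do the nearest-neighbour lookup.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical toy "embeddings"; a real model produces hundreds of dimensions.
docs = {
    "How do I reverse a list in Python?": [0.9, 0.1, 0.0],
    "What is a segmentation fault in C?": [0.1, 0.9, 0.1],
    "How to center a div with CSS?": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # stand-in for the embedded user query

# Rank documents from most to least similar to the query.
ranked = sorted(docs, key=lambda title: cosine(docs[title], query_vec), reverse=True)
print(ranked[0])
```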
I am looking for interns to help me build our Community at
@explosion_ai
- Become our next NLP & Machine Learning advocate
- Build content and projects with our stack
- Paid & full remote
- We will slowly introduce you to the role and help you learn!
Awesome Bioinformatics
A curated list of awesome Bioinformatics software, resources, and libraries. Mostly command line based, and free or open-source.
tsai is an open-source deep learning package built on top of PyTorch & fastai, focused on state-of-the-art techniques for time series tasks like classification, regression, forecasting, and imputation.
Kobra: a visual programming language for machine learning.
Kobra is designed to help you learn machine learning without needing to learn how to code first.
Insights from an open source influencer
I'm often asked how I get my content. Over the years I've built an unusual technology stack for it.
Some insights:
Manim is an animation engine for explanatory math videos.
It's used to create precise animations programmatically, as demonstrated in the videos of 3Blue1Brown.
darts: a Python library for easy manipulation and forecasting of time series.
It contains a variety of models, from classics such as ARIMA to deep neural networks.
FLAML (Fast and Lightweight AutoML) by Microsoft: a lightweight Python library that finds accurate machine learning models automatically, efficiently, and economically.
Pyinstrument: a Python profiler that helps you optimize your code and make it faster. To get the biggest speed increase, focus on the slowest part of your program. Pyinstrument helps you find it!
$ pip install pyinstrument
A clone of GitHub Copilot.
Instead of using AI, this extension sends your search query to Google, retrieves Stack Overflow answers, and autocompletes them for you.
@harishkgarg
Book lottery: Machine Learning in Biotechnology and Life Sciences: Build machine learning models using Python and deploy them on the cloud!
Like and you're in the pool for one of three copies!
@PacktPub
GPT-J: the open source cousin of GPT-3, everyone can use it!
A 6 billion parameter, autoregressive text generation model trained on The Pile.
@arankomatsuzaki
Product Hunt
GitHub
The Pile dataset
DocArray is a library for nested, unstructured data such as text, image, audio, video, 3D mesh. It allows deep learning engineers to efficiently process, embed, search, recommend, store, transfer the data with Pythonic API.
@JinaAI_
Rich: a Python library for rich text and beautiful formatting in the terminal!
Rich can also render pretty tables, progress bars, markdown, syntax highlighted source code, tracebacks, and more, out of the box.
@willmcgugan
LLaMA2-Accessory: an open-source toolkit for pre-training, fine-tuning, and deployment of Large Language Models and multimodal LLMs
- Pre-training: RefinedWeb & StarCoder
- Single-modal
- Multi-modal fine-tuning
- LLM for API Control
Having trouble getting started with data annotation?
Use bulk labeling to select clusters of text and annotate them 🤯
GitHub Bulk
Binary annotation and a model in the loop
Tutorial
Edit your DataFrame like a spreadsheet! Mito is a Python package that lets you turn your data into an interactive spreadsheet.
Each edit you make in Mito will generate the equivalent Python in the code cell below.
Ploomber is the fastest way to build data pipelines!
Use your favorite editor: Jupyter, VSCode, PyCharm to develop interactively and deploy without code changes as Kubernetes, Airflow, AWS Batch, and SLURM pipelines.
pip install ploomber
Journalism AI: Quotes extraction for modular journalism - An NLP pipeline to extract quotes from news articles using NER, add coreferencing information, and format the results for an exploratory search tool!
GitHub
Blog
New release: Aim v3.0.0, an open-source, self-hosted AI experiment tracking tool. Use Aim to deeply inspect hundreds of hyperparameter-sensitive training runs at once.
GitHub
Blog
Web
ipygany: Jupyter into the third dimension - A new interactive widgets library that allows you to visualize and analyze volumetric data in your Jupyter Notebook!
How to deploy machine learning models as a microservice using FastAPI by
@tiangolo
Advantages of using FastAPI
• Make your code components reusable
• Highly maintained
• Ease of testing
• Quick in response time
GitHub
How do you create a beautiful interface for your machine learning or data science project?
Handmade from scratch?
Any good tools?
Sure there are incredible tools:
Kornia is a differentiable computer vision library for PyTorch
It consists of a set of routines and differentiable modules to solve generic computer vision problems.
Transformers Interpret is a model explainability tool designed to work exclusively with the
@huggingface
transformers package
Model explainability that works seamlessly with 🤗 transformers.
Explain your transformers model in just 2 lines of code.
SQLModel is a library for interacting with SQL databases from Python code, with Python objects
It is designed to be intuitive, easy to use, highly compatible & robust
SQLModel is based on Python type annotations, and powered by Pydantic and SQLAlchemy
Extracting information from PDFs or scanned documents is still a challenge! Use the
@huggingface
LayoutLMv3 model and Prodigy to tackle this challenge ✨
Blog
GitHub
NoiseCraft is an open source, browser-based visual programming language & platform for sound synthesis and music making that runs in your web browser
@DrTBehrens
Orchest lets you code, run and monitor data pipelines all from your browser!
It's an Airflow alternative that's easier to use.
Instead of configuring cloud infrastructure, simply hit the schedule button in Orchest.
Faker is a Python package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill-in your persistence to stress test it, or anonymize data taken from a production service, Faker is for you!
Today we're releasing Verba, the Golden RAGtriever
It's completely open source, so you can bring your own data, such as an internal knowledge base and documentation
Use Verba to build your own RAG (Retrieval Augmented Generation) pipeline and utilize LLMs for internal-based outputs…
RPA for Python: a package for doing Robotic Process Automation in Python
$ pip install rpa
Features
• Web automation
• Visual automation
• OCR automation
• Keyboard automation
• Mouse automation
San Francisco is when your driver wrote software for the NASA Pathfinder mission in 1997
At first I was a bit skeptical, but he knew so much about Assembly, C, and some already dead programming languages 🤣 What a legend! I started in the 90s with Turbo Pascal and Basic pure…
Panel: a high-level app and dashboarding solution for Python!
Panel provides tools for easily composing widgets, plots, tables, and other viewable objects and controls into custom analysis tools, apps, and dashboards
@MarcSkovMadsen
@Panel_org
Not enough training data for your NLP project?
Data augmentation to the rescue!
This standard technique from visual machine learning can also be used in natural language processing.
But it works slightly differently.
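To show what "slightly differently" can mean for text, here is a minimal sketch of two simple token-level augmentations, random swap and random deletion. The choice of these two operations, the function names, and the deletion probability `p` are assumptions made for illustration; the post itself does not name a specific method.

```python
import random

def random_swap(tokens, rng):
    """Return a copy with two randomly chosen positions swapped."""
    out = list(tokens)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(tokens, rng, p=0.2):
    """Drop each token with probability p, keeping at least one token.

    Assumes tokens is non-empty.
    """
    kept = [t for t in tokens if rng.random() >= p]
    return kept or [rng.choice(tokens)]

rng = random.Random(0)  # seeded so runs are repeatable
tokens = "the quick brown fox jumps over the lazy dog".split()
swapped = random_swap(tokens, rng)
shortened = random_deletion(tokens, rng)
print(" ".join(swapped))
print(" ".join(shortened))
```

Unlike flipping or cropping an image, these edits can change the meaning of a sentence, so augmented text is usually worth spot-checking before training on it.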
The Synthetic Data Vault (SDV) is a synthetic data generation ecosystem to easily learn single-table, multi-table, and time-series datasets to generate new synthetic data that has the same format and statistical properties as the original dataset.
Cog: Containers for machine learning. An open source tool that lets you package machine learning models in a standard, production-ready container!
- Docker containers without the pain
- No more CUDA hell
- Much more
Don't forget to star the repository!
Implementation of Vision Transformer in PyTorch: a simple way to achieve SOTA in vision classification with only a single transformer encoder
$ pip install vit-pytorch
GitHub
Paper
SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model.
It connects optimal credit allocation with local explanations using the classic Shapley values from game theory.
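The "credit allocation" idea can be sketched by computing exact Shapley values for a tiny game with brute force. This is not the shap library's API. The toy `value` function and the feature names `a`, `b`, `c` are hypothetical, and enumerating every ordering is only feasible for a handful of players.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}

# Hypothetical toy "model": feature a contributes 10, b contributes 20,
# and a and c together add an interaction bonus of 6.
def value(coalition):
    score = 0.0
    if "a" in coalition:
        score += 10
    if "b" in coalition:
        score += 20
    if {"a", "c"} <= coalition:
        score += 6
    return score

phi = shapley_values(["a", "b", "c"], value)
print(phi)  # {'a': 13.0, 'b': 20.0, 'c': 3.0}
```

The interaction bonus of 6 is split evenly between `a` and `c`, and the three values sum to the score of the full coalition (36), which is the efficiency property that makes Shapley values attractive for explanations.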
TorchIO: a medical image preprocessing and augmentation toolkit for deep learning in PyTorch.
Efficiently read, preprocess, sample, augment, and write 3D medical images in deep learning applications.