Thanks to the Latent Consistency Model (LCM), we're nearing real-time image diffusion. I've made a simple MJPEG server that streams generations from the diffusers img2img pipeline. It's really fun to play with. Can't wait for the ControlNet version.
try it:
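For anyone curious about the core loop, here's a minimal sketch of how an LCM img2img call looks in diffusers; the MJPEG part just JPEG-encodes each result into a multipart HTTP stream. The checkpoint and parameters are assumptions, not the demo's exact settings:

```python
# Minimal sketch, not the actual server code. Assumes the public
# SimianLuo/LCM_Dreamshaper_v7 checkpoint and a CUDA GPU.
import torch
from diffusers import AutoPipelineForImage2Image
from PIL import Image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float16
).to("cuda")

def generate(frame: Image.Image, prompt: str) -> Image.Image:
    # LCM needs only a handful of steps, which is what makes
    # near real-time streaming feasible.
    return pipe(
        prompt=prompt,
        image=frame,
        num_inference_steps=4,
        strength=0.5,
        guidance_scale=8.0,
    ).images[0]
```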
I've made it possible to share your screen on the real-time Latent Consistency Model demo, thanks to the Screen Capture Web API! No need for a custom drawing tool, use your favorite one 🤩. Plus, enjoy Musk rickrolling
demo:
Create sentence embeddings in your browser with transformers.js! My guide walks you through generating embeddings and applying UMAP dimensionality reduction + more - all in JavaScript, no server needed
Great news! The
@huggingface
hub now has the first QR code AI art generator. You only need the QR code content and a text-to-image prompt idea, or you can upload your own image!
Check it out!
Just added ControlNet Canny to the near real-time Latent Consistency Model demo. It's much better than just img2img! Any updates to the UI parameters and prompts happen instantly. Video here at 2x speed
Demo:
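A rough sketch of how ControlNet Canny can be combined with LCM speed in diffusers, using the LCM-LoRA + LCMScheduler pattern; the demo's actual wiring may differ, and the model ids here are assumptions:

```python
# Sketch only: SD1.5 + ControlNet Canny, accelerated with LCM-LoRA.
import torch
from diffusers import (ControlNetModel, LCMScheduler,
                       StableDiffusionControlNetImg2ImgPipeline)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Swap in the LCM scheduler and LoRA for few-step inference.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")
```

At inference time the Canny edge map of the incoming frame goes in as `control_image`, which is why prompt and parameter edits show up on the very next frame.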
InstantID works with ControlNet Pose and LCM, and it might actually work with any ControlNet. The trade-off of stacking multiple ControlNets is a slight loss of facial detail.
In a couple of weeks, we went from ~5fps with the LCM LoRA to ~17fps with the latest SD-Turbo distilled model. Thanks
@StabilityAI
! The quality with SD-Turbo is incredible! Video is at normal speed. See Musk and Gates with curly hair 😅
demo:
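A hedged sketch of the SD-Turbo img2img call (not the demo's exact code); with `strength=0.5` and `num_inference_steps=2`, the pipeline performs a single denoising step:

```python
import torch
from diffusers import AutoPipelineForImage2Image
from PIL import Image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16
).to("cuda")

frame = Image.open("webcam_frame.jpg")  # placeholder input frame
result = pipe(
    prompt="portrait with curly hair",
    image=frame,
    num_inference_steps=2,
    strength=0.5,
    guidance_scale=0.0,  # turbo models are trained without CFG
).images[0]
```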
Here is the DragGAN Face Inversion
@Gradio
demo. You can upload your image and experiment with some wild edits. Please be patient, as the inversion training process takes approximately 2 minutes 😞
Here's the demo "Enhance This"! It's a surreal image magnifier that creates a high-res version by imagining new details, using the SDXL base model. Thanks to
@RuoyiDu
's DemoFusion research. It takes ~1 minute to generate a 2024x2024 image.
I've also put together a text-to-image version so you can get a sense of what it's like to live-prompt a model, thanks to the Latent Consistency Model's speed. BTW, I'm not good at prompting. Video is at 4x, as my typing isn't fast.
here:
Check out a new ControlNet face model on the hub by
@JCatrambone
&
@DarthMarkov
, trained on the LAION-Face dataset; it works with multiple faces. I've updated the live conditioning
@gradio
component; try the official demo here
Quick test with SDXL Turbo, another amazing, super-fast diffusion model. It works right out of the box with
@diffuserslib
. Unofficial demo with txt2img and img2img:
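For reference, single-step SDXL Turbo text-to-image with diffusers looks roughly like this (a sketch, with a made-up prompt):

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a cinematic photo of a fox in the snow",
    num_inference_steps=1,  # one step is enough for turbo models
    guidance_scale=0.0,     # no classifier-free guidance
).images[0]
```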
Try the Meta Segment Anything Model (SAM) right in your browser! It performs both embedding and point prompting inferences, all powered by the Rust Candle Framework compiled to Wasm
Space:
Source:
#webml
Since you all liked the face landmarks component, I made a custom
@Gradio
component for live pose estimation that generates a conditioning image for the ControlNet openpose model. It's really fun to play with!
Try the live demo here
Here's an experimental drawing tool I made to interact with Text2Human, a generative model capable of generating humans with clothes and textures. Playing with it and pretending to be a fashion designer is fun. By
@Jiang_Yuming
et al.
@huggingface
Loving the hype for Drag Your GAN! As we wait for its official code release, check out a cool
@gradio
demo I made for its sibling project, UserControllableLT (User-Controllable Latent Transformer)
#DragGAN
demo:
Here are ten 42M TinyStories models running simultaneously in the browser.
@karpathy
's Llama2.c code has been ported to Rust using the Candle framework and compiled to Wasm.
Word-level timestamps are now available in Transformers. I've updated an old demo project for word-level video trimming, using the
@gradio
HighlightedText component as input. It's fun for short videos.
try it now:
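The gist in transformers (a sketch; the demo's model choice may differ):

```python
from transformers import pipeline

# Any Whisper checkpoint should work here; whisper-base is an assumption.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
out = asr("video_audio.wav", return_timestamps="word")

# Each chunk carries a (start, end) pair in seconds, which maps
# directly to cut points for trimming the video.
for chunk in out["chunks"]:
    print(chunk["text"], chunk["timestamp"])
```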
Just discovered macOS Automator: attach a keyboard shortcut to run a Python script on highlighted text, instantly swapping it with LLM magic! Now proofing grammar on-demand with Mixtral 8x7B via
@huggingface
or Mistral 7B with
@ollama
LocalLLM
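The script Automator runs can be as small as this hypothetical sketch: read the highlighted text from stdin, ask the local Ollama server to proofread it, and print the replacement (the prompt wording is made up):

```python
import json
import sys
import urllib.request

text = sys.stdin.read()
req = urllib.request.Request(
    "http://localhost:11434/api/generate",  # Ollama's local endpoint
    data=json.dumps({
        "model": "mistral",
        "prompt": f"Fix the grammar. Reply with the corrected text only:\n{text}",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # Automator replaces the selection with whatever the script prints.
    print(json.loads(resp.read())["response"].strip())
```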
Now you can train LoRA Diffusion DPO models using
@diffuserslib
, thanks to
@RisingSayak
. Check out this thread for SD2.1 results, and watch LoRA's real-time impact with SD-Turbo.
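Using the result is just a LoRA load on top of a base pipeline. A sketch, where the repo id is a placeholder for whichever DPO LoRA you trained:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("your-username/sd21-dpo-lora")  # placeholder repo id

# Scale the LoRA up or down to compare against the base model.
image = pipe(
    "a photo of an astronaut riding a horse",
    cross_attention_kwargs={"scale": 1.0},
).images[0]
```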
New ML gem on the hub: LDM3D by
@intel
. This diffusion model generates image & depth from text prompts. Using a custom
@gradio
6DoF three.js component, you can generate immersive 360-degree views from prompts
demo:
model:
I've been having fun with the Meta MusicGen music generation model - it's mind-blowing! I tweaked the demo to allow: mic input, melody trim, song continuation, and sharing on community discussions
Check the
@gradio
demo here
🔊 Bach's Toccata And Fugue,…
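Under the hood it's Meta's audiocraft library; a rough sketch of the melody-conditioned path (file names and parameters are assumptions):

```python
import torchaudio
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained("facebook/musicgen-melody")
model.set_generation_params(duration=12)

melody, sr = torchaudio.load("mic_recording.wav")  # e.g. the mic input
# Generate music that follows the recorded melody and a text description.
wav = model.generate_with_chroma(
    descriptions=["baroque organ piece"],
    melody_wavs=melody[None],  # add a batch dimension
    melody_sample_rate=sr,
)
```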
Since
@Gradio
3.0 was released last week, I've built a proof of concept for a video editor where you edit the video by editing the text. Powered by the newest
@Gradio
Blocks API and
@huggingface
automatic speech recognition pipeline.
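A bare-bones sketch of the wiring (not the actual demo code; the ASR model is an assumption): transcribe with word timestamps, render the words in a HighlightedText component, and map removed words back to (start, end) cut points:

```python
import gradio as gr
from transformers import pipeline

asr = pipeline("automatic-speech-recognition",
               model="facebook/wav2vec2-base-960h")

def transcribe(audio_path):
    out = asr(audio_path, return_timestamps="word")
    # (word, label) pairs for the HighlightedText component
    return [(c["text"], None) for c in out["chunks"]]

with gr.Blocks() as demo:
    audio = gr.Audio(type="filepath")
    words = gr.HighlightedText()
    audio.change(transcribe, inputs=audio, outputs=words)

demo.launch()
```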
Here's my initial attempt at running NerfStudio as a
@huggingface
Space template. Everything's within the same container, both the trainer and the viewer. You can now use our GPUs to train your NeRFs. PS: there are still a few steps you'll need to edit in the Dockerfile. I wish the UI…
Here is the link for the
#StableDiffusion
multiplayer experiment.
If the frame is empty, we run text2img. Otherwise, we inpaint/outpaint empty areas. You can zoom, draw a custom mask or override painted areas. Looking forward to seeing what you create.
Here is a quick
@huggingface
Spaces demo for the original PIFu project. From a single image, it can generate a 3D model with colors!! While it's an old method (2 years old 😂), it's still very, very impressive.
Testing the new pix2pix-Turbo in real time, a very interesting GAN architecture that leverages the SD-Turbo model. Here I'm using edge2image LoRA single-step inference 🤯
My latest experiment: since
@huggingface
transformers now includes a zero-shot depth estimation model and
@Gradio
has a new 3D model viewer, why not convert the depth map to a 3D object?
Try it yourself; it works really well with selfies.
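The pipeline behind it, roughly (a sketch; the naive unprojection here ignores camera intrinsics):

```python
import numpy as np
from PIL import Image
from transformers import pipeline

depth_estimator = pipeline("depth-estimation")  # defaults to a DPT model
image = Image.open("selfie.jpg")
depth = np.array(depth_estimator(image)["depth"], dtype=np.float32)

# Lift each pixel to a 3D point, using the relative depth as z.
h, w = depth.shape
xs, ys = np.meshgrid(np.arange(w), np.arange(h))
points = np.stack([xs, ys, depth], axis=-1).reshape(-1, 3)
# From here, triangulate neighboring pixels into a mesh and export
# e.g. glTF/OBJ for the Gradio 3D model viewer.
```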
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models
paper page:
Despite the ability of existing large-scale text-to-image (T2I) models to generate high-quality images from detailed textual descriptions, they often lack the ability to…
Today I'm excited to share that I have joined the
@huggingface
product team as a Frontend Engineer. I'm thrilled to be surrounded by top-notch researchers and developers and to share my passion for Data Visualization and Data Science.
The InstantID demo is live on
@huggingface
amazing work by
@Haofan_Wang
!
Note: it seems you can change the base model to any SDXL checkpoint. I tested it with sdxl-turbo for 4 steps, and here is the result.
demo:
Another distillation technique by ByteDance: Hyper-SD
I love the unified multi-step LoRA support. I made a demo with InstantStyle + ControlNet, all compatible with diffusers
Our Community Inference API now supports Image-to-Image models! Here are some examples of how to use it with our JavaScript library - huggingface.js. Any model with the image-to-image task tag can work.
Of course, ControlNet models work too, e.g.
lllyasviel/control_v11f1p_sd15_depth
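The post shows the JavaScript side; here's a rough Python analogue of the same Inference API call via huggingface_hub (an assumption for comparison, not the post's huggingface.js code):

```python
from huggingface_hub import InferenceClient

client = InferenceClient()  # optionally pass an HF token
result = client.image_to_image(
    "input.jpg",
    prompt="a fantasy landscape",
    model="lllyasviel/control_v11f1p_sd15_depth",
)
result.save("output.jpg")
```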
Trying out the new
@gradio
/lite, running the entire Gradio app in the browser with no servers. With some internal tweaks, you can run external JS code with it. Here's a Gradio UI demo for the Candle Segment Anything model
Thanks to
@SebastienBubeck
for uploading the weights to the hub! Here's what we can build with it: Phi-2 quantized, running in the browser at ~3 tok/s with a ~1.57GB artifact. **Video is sped up**
demo:
Rerender update: you can use various SD base models, such as Analog Diffusion & Stable Diffusion PaperCut, as shown below. Unbatched processing takes ~10min for 6s of video. While it struggles with fast/large motions, the results are remarkable. I can't wait to see the cool videos to come!
Have you tried using llama.cpp with a GPU? Check out this Space Docker template for Llama-2-7B-Chat-GGML. Easy to duplicate and switch models. Inference is lightning fast!
Space:
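If you'd rather script it than use the Space, here's a hedged sketch with llama-cpp-python (the model path is a placeholder; n_gpu_layers does the GPU offloading):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-7b-chat.ggmlv3.q4_0.bin",  # placeholder path
    n_gpu_layers=35,  # offload most layers to the GPU
    n_ctx=2048,
)
out = llm("Q: What is a GGML file? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```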
While the
@Gradio
team is working towards the 3.0 release, I've been stress-testing the new low-level Blocks API. This time I've stitched together zero-shot depth estimation from an image into an autostereogram (Magic Eye) on
@huggingface
Spaces
Finally played with LayerDiffuse Latent Transparency, and it's a lot of fun! You can blend from background, foreground, or just a transparent image.
Made a Gradio demo
I'm very excited that you can now use JS to interact with Hugging Face! I created this interactive
@observablehq
notebook to explore all the supported tasks. Thanks to
@coyotte508
@linesofcodedev
VLMs have a resolution problem, which prevents them from finding small details in large images. In this
@huggingface
community post, I discuss ways to solve it and describe the details of the MC-LLaVA architecture:
I've tried integrating StyleGAN image inversion from pixel2style2pixel, but ended up creating a bizarre face generator😱 Even though the inversion works moderately well, the transformations in the latent space with UserControllableLT didn't work as expected.…
If you need to search for
@huggingface
models, datasets, or spaces in your
@Gradio
app, we now have a hub quick-search custom component. Here's an example with the mergekit config generator.
AI video-to-video translation
demo:
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
The paper proposes a novel zero-shot text-guided video-to-video translation framework to adapt image models to videos. The framework includes two parts:…
Exciting tool for
#JavaScript
devs from
@Gradio
. The new gradio/client lets you use any Gradio app as an API, providing an easier way to build custom front-end components. Check out this interactive tutorial on
@observablehq
OK, now the Space is running on a GPU container; it takes less than 10s to generate a model, and you can also download the glTF 3D file. Thanks to
@ak92501
I'm now using U-2-Net for background removal
Try this new and impressive TTS
@gradio
demo by
@elevenlabsio
, showcasing its multilingual capabilities.
"Bonjour, mon ami ¿Cómo estás hoy? Como vai? Ich hoffe, es geht dir gut. La vita è bella, non è vero?. Ας χαμογελάσουμε στη ζωή."
Excited to share new
@huggingface
demos for
@bria_ai_
text-to-image 🚨
As always, our models do not violate copyrights, as they're trained on 100% legal data 🌟
BRIA HD offers full HD resolution (1920x1080) for high-quality textures 🚀
👉
Introducing Doodle Dash, an ML-powered web game that runs completely in your browser, thanks to Transformers.js! 🤯
You have 60 seconds to draw as many words as you can, while a neural network tries to guess what you're drawing in real time!
Play here:
Introducing the DDPM inversion Space 🤗-
a new Space for real image editing 🖼️
Based on the very cool edit-friendly DDPM inversion method by
@inbarhub
This technique somehow flew under the radar and should get more attention🔥
So how does it work? 🧵1/7
Another Space template for you: 🧘♀️ Fooocus by
@lvminzhang
, a minimal and magical UI so you can focus on prompting and generating. Duplicate it and use it on your own GPU.
live demo:
code: