Today, along with my collaborators at
@GoogleAI
, we announce DreamBooth! It allows a user to generate a subject of choice (pet, object, etc.) in myriad contexts and with text-guided semantic variations! The options are endless. (Thread 👇)
webpage:
1/N
Today, with collaborators at
@Google
, we're excited to announce 🥳🥳HyperDreamBooth🥳🥳! It's like DreamBooth, but smaller, faster, and better. 25x faster. Think 30 minutes vs. 14 hours for 100 models. And it works on a single image!
(Thread 👇)
webpage:
With collaborators
@Google
we're announcing 💫 ZipLoRA 💫! Merging LoRAs has been a big thing in the community, but tuning can be an onerous process. ZipLoRA allows us to easily combine any subject LoRA with any style LoRA! Easy to reimplement 🥳 (sketch below)
link:
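To make "easy to reimplement" concrete, here is a minimal weight-space sketch of the idea as I understand it: learn per-column merger coefficients for the subject and style LoRA deltas while penalizing overlap between the scaled columns. In the paper the fidelity terms are denoising losses on subject/style reference images; the weight-space proxy below, and all names, are illustrative, not the official code.

```python
# Minimal ZipLoRA-style merge sketch (illustrative, not the official code).
# Learn per-column merger coefficients m_s, m_y for a subject delta and a
# style delta of the same layer, penalizing column-wise overlap between the
# scaled deltas. NOTE: the paper's fidelity terms are denoising losses on
# reference images; the weight-space proxy below is a simplification.
import torch

def zip_merge(delta_subject, delta_style, steps=200, lr=1e-2, overlap_w=0.01):
    """delta_*: (out_dim, in_dim) LoRA weight deltas (B @ A) for one layer."""
    in_dim = delta_subject.shape[1]
    m_s = torch.ones(in_dim, requires_grad=True)   # subject merger coeffs
    m_y = torch.ones(in_dim, requires_grad=True)   # style merger coeffs
    opt = torch.optim.Adam([m_s, m_y], lr=lr)
    for _ in range(steps):
        scaled_s, scaled_y = delta_subject * m_s, delta_style * m_y
        # Crude proxy: keep each scaled delta close to its original ...
        fidelity = ((scaled_s - delta_subject) ** 2).mean() \
                 + ((scaled_y - delta_style) ** 2).mean()
        # ... while discouraging per-column interference between the two.
        overlap = (scaled_s * scaled_y).sum(dim=0).abs().mean()
        loss = fidelity + overlap_w * overlap
        opt.zero_grad(); loss.backward(); opt.step()
    return delta_subject * m_s.detach() + delta_style * m_y.detach()
```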
Today, along with collaborators at
@GoogleAI
, we’re excited to announce StyleDrop! It allows a user to generate new images that follow a specific style of their choice given only a single style reference image 🤯 (Thread 👇)
webpage:
Today, with collaborators at Google, we're announcing 🤩RealFill🤩! A generative AI approach to fill missing regions of an image with the content that should have been there. The best way to turn almost perfect pictures into invaluable memories!
page:
AI generated writing *feels* AI-generated at a visceral level, and even if you ask an LLM to make the writing feel or read less AI-generated it horrifically fails and makes it feel even more AI-generated. Any tricks that can help? Any prompts to share?
We are 🔥super excited🔥 to release the Platypus family of finetuned LLMs 🥳🥳. Platypus achieves the top score on the Hugging Face Open LLM Leaderboard 🏆! The main focus of our work is to achieve cheap, fast, and powerful refinement of base LLMs.
page:
@cpicciolini
@SamHarrisOrg
Can you address his explanation, namely that you attributed specific positions to those two people which they do not actually hold?
@bradesposito
Interestingly, she describes exactly what is wrong with streaming now. Many times I have to see an album cover to remember that it's great and that I should listen to it again!
Super happy to announce that I will be joining
@Google
as a Research Scientist and will be starting tomorrow! Extremely excited by this new step and very grateful for everyone that made this possible. 🥳🥳🥳
🥳 DreamBooth has been accepted to CVPR 2023. And with this comes a *big update* to the paper, including the largest evaluation dataset for subject-driven generation and an evaluation protocol! Find it on the project webpage:
(a thread)
#Dreambooth
1/N
Our team is looking for student researchers doing a PhD starting in January either full-time or part-time (prefer full-time). If you want to work on new exciting applications and methods like I did with DreamBooth, then please reach out. DMs open.
Excited 🥳🥳🥳 to release my first senior-author work, done while still a student at BU, with a star-studded lineup of collaborators and an incredible student first author
@ArielNLee
🌻🙌- it's all about differences between Vision Transformers & CNNs 👇
On Distillation of Guided Diffusion Models
abs:
On ImageNet 64x64 and CIFAR-10, the approach is able to generate images visually comparable to those of the original model using as few as 4 sampling steps.
Today, at NeurIPS, we announce counterfactual simulation testing, a new framework for comparing vastly different network architectures using counterfactuals. We use it to compare the robustness of modern ConvNets and Transformers. (Thread 👇)
webpage:
Our method has some surprising capabilities inherited from large diffusion models. For example, it can generate novel art renditions of a subject! Here are some renditions of a specific dog in the style of famous painters.
4/N
So, I quickly implemented ZipLoRA with 🤗🧨 (some people have already noticed, though)
code:
I hope it helps somehow; feel free to drop your comments and feedback~
Big thanks to the authors for their awesome work 🙌
The more I look at the videos, the more the motions feel like a video game (the walking here), but the appearance of only some videos looks like video-game footage. Maybe this model is trained on a lot of game footage? Models are good at learning to change style from simulated to real.
Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.
Prompt: “Beautiful, snowy…
Text-to-image diffusion models are extremely powerful and allow for flexible generation of images with complex user captions. One limitation is that controlling the subject’s appearance and identity using text is very hard.
2/N
We can even do realistic viewpoint changes for some subjects that have a strong class prior! Here are some examples of different viewpoints for a cat. Notice that the detailed fur patterns on the forehead are preserved. 🤯
7/N
By finetuning a model (Imagen here) with a few images of a subject (~3-5), a user can generate variations of the subject, e.g., by controlling its environment and context. Ever wanted to have a high-quality picture of your dog in Paris (no travel required)?
3/N
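Conceptually, the finetuning behind this is the standard denoising objective run on the handful of subject images, with a prompt containing a rare identifier token. A toy sketch (the paper uses Imagen; `unet`, `encode_text`, the noise schedule, and the "sks" token here are stand-ins, not the actual implementation):

```python
# Toy DreamBooth-style finetuning loop (illustrative; the paper uses Imagen).
# The denoiser is finetuned on ~3-5 subject images with a prompt that
# contains a rare identifier token bound to the subject.
import torch
import torch.nn.functional as F

def finetune_on_subject(unet, encode_text, subject_latents,
                        prompt="a photo of sks dog",
                        steps=1000, lr=5e-6, num_timesteps=1000):
    """unet(noisy, t, cond) -> predicted noise; encode_text(str) -> conditioning.
    subject_latents: (N, C, H, W) latents of the few subject images."""
    opt = torch.optim.AdamW(unet.parameters(), lr=lr)
    cond = encode_text(prompt)
    for _ in range(steps):
        x0 = subject_latents[torch.randint(len(subject_latents), (1,))]
        t = torch.randint(num_timesteps, (1,))
        noise = torch.randn_like(x0)
        alpha = 1.0 - t.float() / num_timesteps      # toy linear schedule
        noisy = alpha.sqrt() * x0 + (1.0 - alpha).sqrt() * noise
        loss = F.mse_loss(unet(noisy, t, cond), noise)
        opt.zero_grad(); loss.backward(); opt.step()
    return unet
```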
My first paper as senior author (done while I was still a PhD student at BU!). So proud of Ariel and grateful for all coauthors 🙏🌸. I feel blessed. Thread coming out tomorrow 🔥
Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing
paper page:
Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional…
@TectonixGEO
@vdbDennis
@xmodesocial
I mean, how anonymized is it really if you can track a phone's location? You can easily figure out where people live, and identifying the person is one step away (maybe even a Google search away)
Cool work that proposes a "lower-rank" LoRA, similar to the Lightweight DreamBooth we proposed in our HyperDreamBooth work (), but for LLMs. 10x reduction in size, just like in our case!
VeRA: Vector-based Random Matrix Adaptation
paper page:
Low-rank adaptation (LoRA) is a popular method that reduces the number of trainable parameters when finetuning large language models, but still faces acute storage challenges when scaling to even…
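The gist, as I read the abstract: LoRA's trainable low-rank matrices are replaced by frozen random matrices shared across layers, and only two small per-layer scaling vectors are trained. A minimal sketch (my reading of the idea, not the official code):

```python
# Minimal VeRA-style layer sketch (my reading of the paper, not official code).
# A and B are frozen random matrices shared across all adapted layers; only
# the per-layer scaling vectors d (rank-sized) and b (output-sized) train.
import torch
import torch.nn as nn

class VeRALinear(nn.Module):
    def __init__(self, base: nn.Linear, A: torch.Tensor, B: torch.Tensor):
        super().__init__()
        self.base = base.requires_grad_(False)   # frozen pretrained weight
        self.register_buffer("A", A)             # (rank, in_features), frozen
        self.register_buffer("B", B)             # (out_features, rank), frozen
        self.d = nn.Parameter(torch.ones(A.shape[0]))    # rank scaling vector
        self.b = nn.Parameter(torch.zeros(B.shape[0]))   # output scaling vector

    def forward(self, x):
        delta = (x @ self.A.T) * self.d          # down-project, scale per rank dim
        delta = (delta @ self.B.T) * self.b      # up-project, scale per output dim
        return self.base(x) + delta

# Shared frozen randomness: one (A, B) pair reused by every adapted layer,
# so only d and b need to be stored per layer.
rank, d_in, d_out = 4, 768, 768
A = torch.randn(rank, d_in) / d_in ** 0.5
B = torch.randn(d_out, rank) / rank ** 0.5
layer = VeRALinear(nn.Linear(d_in, d_out), A, B)
```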
ZipLoRA-pytorch with
@Gradio
demo by
@mk1stats
local demo:
Methods for finetuning generative models for concept-driven personalization generally achieve strong results for subject-driven or style-driven generation. Recently, low-rank adaptations (LoRA)…
Finally, our method can generate new images of a subject with different expressions/emotions. Note that the original images of the subject dog here did not exhibit any of these expressions.
8/N
Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models
paper page:
Text-to-image (T2I) personalization allows users to guide the creative image generation process by combining their own visual concepts in natural language…
🚀 Presenting our latest SOTA LLM: OpenOrca-Platypus2-13B 🚀. Kudos to
@ArielNLee
and
@ColeJHunter
and the great people of
@alignment_lab
for topping the Hugging Face leaderboard in the 13B parameter category! Excited by this collaboration.
link:
I’m defending my PhD thesis tomorrow 🎉 at 3pm EST. It’s called: Simulating to Learn. Such a fun journey. Will post the video afterwards. If you want the zoom link send me a dm.
@JxckSweeney
@elonmusk
So you run a Twitter account that tracks Musk's jet purportedly because it is "of service" and "interesting", yet here you are offering to take it down if the amount they pay you is enough? I don't understand.
@JamesTodaroMD
@elonmusk
James, there has been a lot of criticism of the Santa Clara study, and it might overestimate positive cases because of the biased sample and the false-positive rate of antibody tests. The 0.1% IFR I computed with that data would imply a prevalence of more than 100% in NYC.
But here is a ZipLoRA result I really didn't expect. What surprises me is how well it handles the translation of ideas into arbitrary styles, changing the object's shape to fit the style, and following stylistic flourishes and geometric style components.
Congratulations to
@kihyuk_sohn
,
@dilipkay
and to all authors involved in this work! The list is long and can be found below. For more amazing examples go to the project page.
paper:
project webpage:
Google announces PALP
Prompt Aligned Personalization of Text-to-Image Models
paper page:
Content creators often aim to create personalized images using personal subjects that go beyond the capabilities of conventional text-to-image models. Additionally,…
DreamBooth featured at Google I/O 🥳 with an insane concept: a card game with 7+ million unique generated characters! Amazing work by the I/O Flip team! 🤯 The first instance of such a card game? (clip linked)
One main difficulty in finetuning a diffusion model using few images is overfitting. We tackle this problem by presenting an autogenous class-specific prior preservation loss. More details in the paper (and a toy sketch below).
9/N
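Roughly, the idea: before finetuning, the frozen model generates images for the plain class prompt (e.g. "a dog"), and a second denoising term on those images anchors the class prior while the subject term personalizes. A toy sketch (names illustrative, not the paper's code):

```python
# Toy sketch of the class-specific prior-preservation objective (illustrative).
# noisy_subj/noise_subj come from the ~3-5 subject images ("a photo of sks dog");
# noisy_prior/noise_prior come from images the frozen model generated for the
# plain class prompt ("a photo of a dog") before finetuning began.
import torch.nn.functional as F

def dreambooth_loss(unet, noisy_subj, t_subj, cond_subj, noise_subj,
                    noisy_prior, t_prior, cond_prior, noise_prior, lam=1.0):
    subject_term = F.mse_loss(unet(noisy_subj, t_subj, cond_subj), noise_subj)
    prior_term = F.mse_loss(unet(noisy_prior, t_prior, cond_prior), noise_prior)
    return subject_term + lam * prior_term   # lam weights prior preservation
```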
In order to do so, we propose an optimized, small, yet very powerful dataset named Open-Platypus, which is a curated subset of open datasets and focuses on enhancing LLMs' STEM and logic proficiency. We release this dataset to the public.
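If you want to inspect it, the dataset can be pulled from the Hugging Face Hub; the hub id below is my assumption, so check the project page:

```python
# Load Open-Platypus (hub id is an assumption; verify on the project page).
from datasets import load_dataset

ds = load_dataset("garage-bAInd/Open-Platypus")
print(ds["train"][0])    # each row pairs an instruction with its output
```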
Before diving into technical details, let's explore some impressive examples. StyleDrop can extract the color palette and overall style from this watercolor cat painting, and generate almost anything one can imagine in that same style.
I think
@Scenario_gg
are pushing the limits of DreamBooth in crazy ways. They really are alchemists working with the original DreamBooth idea to make it much stronger and to be able to do more things with it.
We just made creating your next Consistent Character waaaaaaay easier :D
Workflow 1/3
I am sharing THREE workflows this week for using the new "Character Base" LoRAs that we just added to Scenario to:
- Use as a consistent character
- Create a new consistent character from
-…
Our freshly minted ICCV2023 paper: The nice anti-aliasing of mip-NeRF 360, but with most of the speed of Instant NGP. Error rate reductions of 8%-77% compared to either prior technique, and 24x faster than the most accurate NeRF baseline we tried.
We are able to alleviate overfitting using this approach. We show that finetuning without this loss term leads to accelerated overfitting to the subject's pose, appearance, and context, which decreases generation variability and produces incorrect scenes.
10/N
We also thank the Imagen team for lending us access to their incredible model. And we deeply thank all of the great people who helped with reviews and feedback (all acknowledged in the paper).
Again, our project website is:
13/13 (END)
@afneil
The study is hard to read. From what I saw, it 1. is a retrospective study, 2. treats patients who are severely ill, probably late in the course of the disease.
HCQ has in vitro antiviral effects against SARS-CoV-2 and should be used EARLY. It is not effective when used late!
Party time! The SD3 paper made it to arxiv:
Key takeaways:
- flow matching is very nice.
- back to work with
@pess_r
and a fantastic team ♥️
The paper is full of details on improved flow matching, scaling and engineering. Enjoy!
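For the curious, the core of flow matching (in its rectified-flow form) really is a few lines: interpolate linearly between noise and data and regress the constant velocity. A minimal sketch, not the SD3 training code:

```python
# Minimal rectified-flow / flow-matching training step (illustrative sketch,
# not the SD3 training code). The model regresses the constant velocity
# (x1 - x0) along the straight path between noise x0 and data x1.
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x1):
    """model(x_t, t) -> predicted velocity; x1: a batch of data samples."""
    x0 = torch.randn_like(x1)                             # noise endpoint
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))  # per-sample time
    x_t = (1 - t) * x0 + t * x1                           # straight-line interpolant
    target_velocity = x1 - x0
    return F.mse_loss(model(x_t, t), target_velocity)
```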
@marwilliamson
The sanctions were primarily targeted towards the regime (who wine and dine at expensive restaurants while the people starve). I just don’t agree with this specific example.
@alexandrosM
@R_H_Ebright
This letter is pretty startling, I have to say. As scientists, how could they have been so certain about the origins of the virus about a month after the news of the outbreak? It's always important to have a little bit of doubt when the evidence is not fully there yet.
"This Should Be Impossible!" 🥳🥳 Our RealFill work at
@Google
made it into Two Minute Paper (
@twominutepapers
🙏). Truly great presentation of the work!
TLDR: Meet ✨Lumiere✨ our new text-to-video model from
@GoogleAI
!
Lumiere is designed to create entire clips in just one go!
Seamlessly opening up possibilities for many applications:
Image-to-video 🖼️ Stylized generation 🖌️ Video editing 🪩 and beyond.
See 🧵👇
The core idea is to train a HyperNetwork that predicts weight deltas for the diffusion model in order to make it personalized. This initialization is strong enough that, given fast finetuning, we can achieve great identity preservation with impressive editability and variety 🔥
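Schematically, it's a small network that maps a face embedding to low-rank weight deltas for target layers of the diffusion model, which then serve as the initialization for fast finetuning. A minimal single-layer sketch (shapes and names are illustrative, not the paper's architecture):

```python
# Minimal HyperNetwork sketch (illustrative): predict a low-rank weight delta
# for one target layer of the diffusion model from an image embedding.
import torch
import torch.nn as nn

class DeltaHyperNetwork(nn.Module):
    def __init__(self, embed_dim=512, rank=1, d_in=768, d_out=768, hidden=1024):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(embed_dim, hidden), nn.GELU())
        self.to_A = nn.Linear(hidden, rank * d_in)   # down-projection factor
        self.to_B = nn.Linear(hidden, d_out * rank)  # up-projection factor
        self.rank, self.d_in, self.d_out = rank, d_in, d_out

    def forward(self, image_embedding):
        h = self.trunk(image_embedding)
        A = self.to_A(h).view(self.rank, self.d_in)
        B = self.to_B(h).view(self.d_out, self.rank)
        return B @ A                                  # (d_out, d_in) weight delta

hyper = DeltaHyperNetwork()
delta_W = hyper(torch.randn(512))   # initialization for fast finetuning
```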
Google presents ObjectDrop
Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Diffusion models have revolutionized image editing but often generate images that violate physical laws, particularly the effects of objects on the scene, e.g.,
Thank you for your time. And thank you to all of my collaborators
@AbermanKfir
, Yuanzhen Li,
@jampani_varun
,
@MikiRubinstein
, Yael Pritch. I had an amazing time working on this with you and am looking forward to future uses of this technology and more research!
12/N
I’m kinda done with people posting screenshots of a single example of a failed LLM query and going “absolutely trash model, XYZ model is much better”
Have seen this happen for every single popular model out there. You would think they’re all bad and never give good answers
These are the input images for our character Anselmo. We generate a fully-fledged comic with Anselmo in new poses, with different accessories and even with text and speech bubbles automatically drawn by the diffusion model! (just prompt for "a [V] cartoon saying XYZ"!)
2/N
How does it work? We use Muse, a masked generative Transformer for text-to-image synthesis (project: ). Muse seems to have some properties that make it excel at learning and reproducing style.
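Unlike a diffusion model, Muse generates by iteratively unmasking discrete image tokens in parallel. A toy sketch of that style of decoding loop (simplified; e.g., it re-selects all tokens every step rather than locking in earlier ones):

```python
# Toy sketch of Muse-style iterative parallel decoding (illustrative).
# Start fully masked; each step the transformer predicts all tokens and the
# most confident predictions are kept, the rest re-masked.
import torch

def masked_decode(transformer, cond, num_tokens=256, steps=12, mask_id=8192):
    # NOTE: the transformer's embedding table must include the extra mask id.
    tokens = torch.full((num_tokens,), mask_id)
    for step in range(steps):
        logits = transformer(tokens, cond)            # (num_tokens, vocab)
        conf, pred = logits.softmax(-1).max(-1)
        keep = int(num_tokens * (step + 1) / steps)   # unmasking schedule
        top = conf.topk(keep).indices
        tokens = torch.full_like(tokens, mask_id)
        tokens[top] = pred[top]                       # keep most confident tokens
    return tokens
```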
Fast Llama 2 on CPUs With Sparse Fine-Tuning and DeepSparse by
@neuralmagic
with
@Gradio
demo
demo:
run with docker:
duplicate space for private use:
blog:
The first key idea is Lightweight DreamBooth (LiDB), a customized model that is only ~100KB instead of more than 1GB for a typical Stable Diffusion model. This makes it 10k times smaller.
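Roughly how the size drops, as I understand it: take a very low-rank LoRA and factor each of its two matrices through a frozen random projection, so only two tiny inner matrices per layer need to be trained and stored. A minimal sketch with illustrative shapes, not the official code:

```python
# Minimal Lightweight-DreamBooth-style sketch (illustrative shapes, not the
# official code): a rank-1 LoRA whose down/up matrices are each factored
# through a frozen random projection, shrinking the trainable footprint.
import torch
import torch.nn as nn

class LiDBLinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=1, aux_dim=100):
        super().__init__()
        d_in, d_out = base.in_features, base.out_features
        self.base = base.requires_grad_(False)
        # Frozen random projections (regenerated from a seed, not stored per subject).
        self.register_buffer("A_aux", torch.randn(aux_dim, d_in) / d_in ** 0.5)
        self.register_buffer("B_aux", torch.randn(d_out, aux_dim) / aux_dim ** 0.5)
        # Tiny trainable inner matrices: this is all that must be saved.
        self.A_train = nn.Parameter(torch.zeros(rank, aux_dim))
        self.B_train = nn.Parameter(torch.randn(aux_dim, rank) * 0.01)

    def forward(self, x):
        delta = x @ self.A_aux.T @ self.A_train.T      # down: d_in -> aux -> rank
        delta = delta @ self.B_train.T @ self.B_aux.T  # up: rank -> aux -> d_out
        return self.base(x) + delta
```

With rank 1 and aux_dim 100, the stored trainable parameters per layer are just 2 * rank * aux_dim = 200 values, which is where the orders-of-magnitude shrinkage would come from.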