Hila Chefer@ ICLR’24 Profile

@hila_chefer

2,021 Followers · 204 Following · 56 Media · 359 Statuses

PhD candidate @TelAvivUni, student researcher @GoogleAI, interested in Deep Learning, Computer Vision, and explainable AI

Joined December 2020
Pinned Tweet
@hila_chefer
Hila Chefer@ ICLR’24
4 months
TLDR: Meet ✨Lumiere✨, our new text-to-video model from @GoogleAI! Lumiere is designed to create entire clips in just one go, seamlessly opening up possibilities for many applications: Image-to-video 🖼️ Stylized generation 🖌️ Video editing 🪩 and beyond. See 🧵👇
@hila_chefer
Hila Chefer@ ICLR’24
2 years
[1/n] Can explainability improve model accuracy? Our latest work shows the answer is yes! We noticed that ViTs suffer from salient issues: their output is often based on supportive signals (background) rather than the actual object.
@hila_chefer
Hila Chefer@ ICLR’24
11 months
Last Sunday, I got to present a tutorial for the ✨first time✨ alongside the best partner @RisingSayak 🤩 📢 Today, we are happy to release **all** materials from our #CVPR2023 @CVPR tutorial: the slides, code, demos, and recordings 🚀 👀 recording ⬇️ 1/
@RisingSayak
Sayak Paul
11 months
Starting today from 9:00 AM Vancouver time 🔥 If you're attending #CVPR2023 (virtually or physically), please do drop by! If you're attending in person, the location is **West 211**. @hila_chefer and I have wrapped our preps.
@hila_chefer
Hila Chefer@ ICLR’24
11 months
Text-to-image models revolutionized computer vision, but what do they learn about the world?🤔 We present Conceptor 🧐, a method to inspect the inner representation of a concept by decomposing it into a set of interpretable tokens, which reveals surprising semantic structures🧵
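To make the decomposition idea concrete, here is a minimal, self-contained sketch: approximate a concept vector as a sparse weighted sum over a vocabulary of token embeddings and read off the dominant tokens. The random stand-in vocabulary, the shapes, and the plain MSE-plus-L1 objective are illustrative assumptions; the actual Conceptor optimizes the coefficients against the diffusion model's denoising predictions.

```python
# Illustrative sketch only: decompose a "concept" vector into a sparse
# weighted sum over a (random, stand-in) vocabulary of token embeddings.
import torch

torch.manual_seed(0)
d, vocab_size = 64, 1000
vocab = torch.randn(vocab_size, d)           # stand-in word-embedding matrix
concept = vocab[[3, 17, 42]].mean(dim=0)     # pretend the concept mixes 3 words

coeffs = torch.zeros(vocab_size, requires_grad=True)  # one weight per token
opt = torch.optim.Adam([coeffs], lr=0.05)
for _ in range(500):
    approx = coeffs @ vocab                  # weighted sum of token embeddings
    loss = (approx - concept).pow(2).sum() + 1e-3 * coeffs.abs().sum()  # fit + sparsity
    opt.zero_grad(); loss.backward(); opt.step()

top = torch.topk(coeffs.abs(), k=5)
print("dominant token ids:", top.indices.tolist())  # typically recovers 3, 17, 42
```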
@hila_chefer
Hila Chefer@ ICLR’24
2 years
Ever wonder which text tokens CLIP uses to match a text and an image? Using our multi-modal explainability method, your wondering days are over! Code: Notebook:
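As a rough, do-it-yourself stand-in for the method in this tweet, one can rank text tokens by plain gradient saliency of the CLIP similarity score. The sketch below uses the Hugging Face transformers CLIP API and a blank stand-in image; the tweet's actual method propagates relevance through the attention layers rather than using raw input gradients.

```python
# Gradient-saliency stand-in (NOT the relevancy-propagation method itself).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

captured = {}
def keep_embeddings(module, inputs, output):
    output.retain_grad()                 # keep the gradient on token embeddings
    captured["emb"] = output
model.text_model.embeddings.token_embedding.register_forward_hook(keep_embeddings)

image = Image.new("RGB", (224, 224), "white")          # blank stand-in image
batch = processor(text=["an image of a dog"], images=image,
                  return_tensors="pt", padding=True)

score = model(**batch).logits_per_image[0, 0]          # image-text similarity
score.backward()

# Per-token saliency: gradient magnitude w.r.t. that token's embedding.
saliency = captured["emb"].grad[0].norm(dim=-1)
tokens = processor.tokenizer.convert_ids_to_tokens(batch["input_ids"][0])
for tok, s in sorted(zip(tokens, saliency.tolist()), key=lambda p: -p[1]):
    print(f"{tok:>12s}  {s:.4f}")
```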
@hila_chefer
Hila Chefer@ ICLR’24
2 years
[1/n] @eccvconf #ECCV2022 paper thread! 1. Image-Based CLIP-Guided Essence Transfer (TargetCLIP): we extract the essence of a target while preserving realism and source identity. 2. No Token Left Behind: we use explainability to stabilize the unreliable CLIP similarity scores.
@_akhaliq
AK
3 years
Image-Based CLIP-Guided Essence Transfer abs: github: new method creates a blending operator that is optimized to be simultaneously additive in both latent spaces
@hila_chefer
Hila Chefer@ ICLR’24
1 year
Excited to share that we will be presenting our work in person at #NeurIPS2022 @NeurIPSConf ! Interested in leveraging explainability to improve accuracy and robustness? Come check out our poster and chat 🥳 Code: @Gradio demo:
@hila_chefer
Hila Chefer@ ICLR’24
2 years
[1/n] Can explainability improve model accuracy? Our latest work shows the answer is yes! We noticed that ViTs suffer from salient issues: their output is often based on supportive signals (background) rather than the actual object.
@hila_chefer
Hila Chefer@ ICLR’24
11 months
Interested in understanding how diffusion models produce images from simple text prompts? 🧐 Or how Transformers leverage attention to make predictions? 🤔 Join my talk at @forai_ml to hear about my latest research! 🤩 Looking forward to it ✨
@CohereForAI
Cohere For AI
11 months
Our open science community is excited to welcome @hila_chefer to present their work on explainable vision transformer networks, with discussion around interpreting generative models. Join our community to attend: Thanks to @nahidalam for hosting!
@hila_chefer
Hila Chefer@ ICLR’24
2 months
2024 is the year of text-to-video models 🎥 Join me tomorrow as we dive into ✨Lumiere✨our T2V model from @GoogleAI ! We'll discuss Lumiere's architecture, applications, and more🤩 ⏰ tomorrow at 20:00 IDT (10:00 PDT) 🚀 sign up now:
@hila_chefer
Hila Chefer@ ICLR’24
3 months
Excited to share insights about our text-to-video model✨Lumiere✨ Join us at the Vision-Language Club meetup 📅 We'll explore Lumiere's architecture, applications, and results 💡 Big thanks to @dana_arad4 for the invite! Secure your spot by signing up👇
@hila_chefer
Hila Chefer@ ICLR’24
3 months
Excited to share insights about our text-to-video model✨Lumiere✨ Join us at the Vision-Language Club meetup 📅 We'll explore Lumiere's architecture, applications, and results 💡 Big thanks to @dana_arad4 for the invite! Secure your spot by signing up👇
@hila_chefer
Hila Chefer@ ICLR’24
3 years
Our paper just got accepted to #CVPR2021 ! With @shir_gur and Lior Wolf
@_akhaliq
AK
3 years
Transformer Interpretability Beyond Attention Visualization pdf: abs: github:
@hila_chefer
Hila Chefer@ ICLR’24
3 years
Very excited to share our latest work: Image-Based CLIP-Guided Essence Transfer (paper will reach arXiv soon)! We develop a method to transfer the semantic essence from a target image to any source image. code: With @BenaimSagie, Roni Paiss, and Lior Wolf
@hila_chefer
Hila Chefer@ ICLR’24
3 years
Our paper finally hit arXiv! Check it out 🥳 With @BenaimSagie, Roni Paiss, and Lior Wolf
@_akhaliq
AK
3 years
Image-Based CLIP-Guided Essence Transfer abs: github: new method creates a blending operator that is optimized to be simultaneously additive in both latent spaces
@hila_chefer
Hila Chefer@ ICLR’24
4 months
Stylized generation 🖌️ is one of my favorite features! We show that Lumiere can stylize motion🏃‍♀️ by the appearance of a single image 🖼️🎨 using various cool styles from StyleDrop (by Kihyuk Sohn, @natanielruizg et al.).
@hila_chefer
Hila Chefer@ ICLR’24
3 years
So excited to share our new work!
- We expand to explaining *any* type of Transformer (also co-attention and encoder-decoder).
- We remove the complex LRP.
- We explain VQA: models actually understand the text + image!
- We use explainability to generate segmentation from DETR.
@_akhaliq
AK
3 years
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers pdf: abs: github:
@hila_chefer
Hila Chefer@ ICLR’24
4 months
For more details and many more cool results, check out our website!
@hila_chefer
Hila Chefer@ ICLR’24
4 months
💡Lumiere's key observation: instead of generating short videos and temporally upsampling them, we perform joint spatial and *temporal* downsampling, increasing both the length and the quality of the generated videos
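A toy illustration of what joint space-time downsampling means in code, assuming video activations shaped (batch, channels, time, height, width); the real Space-Time U-Net inflates a pretrained text-to-image U-Net and is far more involved.

```python
import torch
import torch.nn as nn

class SpaceTimeDown(nn.Module):
    """Halve temporal AND spatial resolution with one strided 3D convolution."""
    def __init__(self, cin, cout):
        super().__init__()
        self.conv = nn.Conv3d(cin, cout, kernel_size=3, stride=2, padding=1)

    def forward(self, x):                     # x: (batch, channels, time, H, W)
        return torch.relu(self.conv(x))

video = torch.randn(1, 8, 16, 64, 64)        # 16 frames of 64x64 feature maps
print(SpaceTimeDown(8, 16)(video).shape)     # torch.Size([1, 16, 8, 32, 32])
```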
@hila_chefer
Hila Chefer@ ICLR’24
1 year
Really excited to finally share our work! Attend-and-Excite has been accepted to #SIGGRAPH2023 🥳 w/ the amazing @yuvalalaluf @YVinker @DanielCohenOr1 and Lior Wolf Text-to-image models are amazingly expressive, but have you tried generating an image of, say, a cat and a dog?
@_akhaliq
AK
1 year
Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models abs: project page: github:
@hila_chefer
Hila Chefer@ ICLR’24
4 months
Lumiere can also edit videos based on text with and without an input mask 🪩
@hila_chefer
Hila Chefer@ ICLR’24
4 months
I have been extremely fortunate to work on this with the most talented team @omerbartal @omer_tov Charles Herrmann @Roni_Paiss @ShiranZada @arielephrat @JunhwaHur Yuanzhen Li, Tomer Michaeli @oliver_wang2 @DeqingSun @talidekel and our great team leader ✨ @InbarMosseri
@hila_chefer
Hila Chefer@ ICLR’24
9 months
Thanks for inviting me @twelve_labs ! 🤩🥳 Classifier interpretability is fascinating and important, but it’s about time we start understanding generative models too 🧑🏼‍🎨🚨 Interested in taking a look under the hood of diffusion models? 🧐 Sign up and join us tomorrow! 👇
@twelve_labs
Twelve Labs (twelvelabs.io)
9 months
@evonotivo @activeloopai @hila_chefer will present her research "The Hidden Language of Diffusion Models", which proposes a novel interpretability method for text-to-image diffusion models 🎨
@hila_chefer
Hila Chefer@ ICLR’24
2 years
Thanks for sharing our work @ak92501 ! Check out the thread with more details on the paper 👇 @Gradio demo coming soon!
@hila_chefer
Hila Chefer@ ICLR’24
2 years
[1/n] Can explainability improve model accuracy? Our latest work shows the answer is yes! We noticed that ViTs suffer from salient issues: their output is often based on supportive signals (background) rather than the actual object.
@hila_chefer
Hila Chefer@ ICLR’24
2 years
Check out our new @Gradio demo (): even for out-of-domain inputs such as images generated by DALL-E 2 or animations, our method corrects the original model to produce a plausible prediction! (code: )
@_akhaliq
AK
2 years
Optimizing Relevance Maps of Vision Transformers Improves Robustness abs: github: show that by finetuning the explainability maps of ViTs, a significant increase in robustness can be achieved
@hila_chefer
Hila Chefer@ ICLR’24
3 years
Very happy to announce that our paper got accepted to #ICCV2021 as an oral!!!
@_akhaliq
AK
3 years
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers pdf: abs: github:
@hila_chefer
Hila Chefer@ ICLR’24
2 years
I had the pleasure of speaking at @Columbia’s vision seminar, kindly hosted by @SongShuran and @sy_gadre. My talk focused on using Transformer explainability algorithms to improve the performance of downstream tasks (e.g. image editing). Check it out :)
@hila_chefer
Hila Chefer@ ICLR’24
1 year
Attention is now everywhere! Interested in understanding attention and how to use it for downstream tasks? Come check out our tutorial at @CVPR #CVPR2023 with the one and only @RisingSayak! Super excited to see all of you there and discuss the future of attention in vision 🥳
@RisingSayak
Sayak Paul
1 year
Incredibly excited to announce our @CVPR tutorial w/ the amazing @hila_chefer ! "All Things ViTs: Understanding and Interpreting Attention in Vision" Come for cool visualizations, exclusive insights, & interesting approaches ❤️ Catch the details here ⬇️
@hila_chefer
Hila Chefer@ ICLR’24
10 months
Amazing accomplishment! @YVinker is on a roll with 2 best papers in a row 😱 Congrats 🎉🥳
@YVinker
Yael Vinker🎗
10 months
I'm super excited to share that our paper Word-as-Image received an honorable mention award this year at #SIGGRAPH2023 ! 👩‍🎨 This is such a HUGE honor for me and the team! Thanks for the recognition, looking forward to seeing you in LA soon!🤩
@hila_chefer
Hila Chefer@ ICLR’24
1 year
Check out our project website for more information: A special thanks to the amazing team from Hugging Face @_akhaliq @hysts12321 @YiYiMarz @RisingSayak for creating an awesome demo: and integrating our code into diffusers!
@hila_chefer
Hila Chefer@ ICLR’24
3 years
@ak92501 Thanks! Another example with targets that weren't inverted (directions are initialized randomly):
@hila_chefer
Hila Chefer@ ICLR’24
11 months
[3] Evidently, the model represents dual-meaning concepts by interpolating both meanings, even if only one object is generated. For example, a crane borrows its structure from the bird 🦤. When removing the token “stork” from the decomposition, the structure changes significantly.
@hila_chefer
Hila Chefer@ ICLR’24
11 months
Thrilled to host @MokadyRon in our upcoming #CVPR2023 tutorial! 🤩 Ron’s works on leveraging attention for image editing are truly inspiring, and demonstrate how powerful the attention mechanism can be! ✨ For more details, check out:
@MokadyRon
Ron Mokady
11 months
Thrilled to share that I'll be speaking (remotely) at the amazing #CVPR2023 tutorial on "Attention"! 🎉😃 by @hila_chefer @RisingSayak Join me as I dive into the importance of attention for diffusion-based image editing 👼.
@hila_chefer
Hila Chefer@ ICLR’24
2 years
2/2 papers accepted to #ECCV2022 ! Works were done with @BenaimSagie , Roni Paiss, and Lior Wolf. Details to come soon 🥳 See you in Tel Aviv! 🎉
@eccvconf
European Conference on Computer Vision #ECCV2024
2 years
#ECCV2022 list of paper IDs of accepted papers is NOW available!
@hila_chefer
Hila Chefer@ ICLR’24
3 years
@ak92501 We added support for CLIP to our repo! Check out the repo and Colab notebook for more examples and details :)
@hila_chefer
Hila Chefer@ ICLR’24
11 months
[2] Conceptor also extracts the tokens from the decomposition that correspond to a specific image! We find that Stable Diffusion links concepts based on non-trivial semantic features such as texture and shape, e.g., sweet peppers are generated as peppers🌶️ shaped like fingers 💅
@hila_chefer
Hila Chefer@ ICLR’24
2 years
Check out our Colab notebook, where you can experiment with our fine-tuned models and compare them to the original ones:
@hila_chefer
Hila Chefer@ ICLR’24
2 years
We use only 3 labeled examples for 500 classes and achieve a large increase in robustness (ImageNet-A, ImageNet-R, ObjectNet)
@hila_chefer
Hila Chefer@ ICLR’24
9 months
Super excited to be a part of the first @huggingface event in Tel Aviv 🥳 alongside such incredible speakers 🤩 Thanks to the amazing @linoy_tsaban for making it happen 🌸
@linoy_tsaban
Linoy Tsaban🎗️
9 months
LETS GO🤩 (First!) Hugging Face meetup in Tel Aviv, September 4th🤗 Featuring an amazing group of speakers🔥: @hila_chefer @MokadyRon @RinonGal @EladRichardson @omerbartal You have a cool demo you’d like to showcase? Demo registration also is open! 🚀:
@hila_chefer
Hila Chefer@ ICLR’24
2 years
[7/n] No Token Left Behind (RT @_akhaliq): We find that CLIP similarity scores can be unreliable since they rely on a small subset of the text tokens. E.g., the noisy image scores higher due to the influence of the words "image of" on the similarity score.
@_akhaliq
AK
2 years
No Token Left Behind: Explainability-Aided Image Classification and Generation abs:
@hila_chefer
Hila Chefer@ ICLR’24
2 years
@eccvconf #ECCV2022 is right around the corner! 🥳 @BenaimSagie and I would be happy to meet! Feel free to DM and let's grab coffee 🙂 Also, come check out our posters (at Hall B) 🤓 TargetCLIP: Tuesday 11:00-13:30. No Token Left Behind with @Roni_Paiss: Thursday 15:30-17:30.
@hila_chefer
Hila Chefer@ ICLR’24
2 years
In this work, we show that one can *directly optimize the explanations*, i.e., use a loss on the explainability signal to ensure that the classification is based on the *right reasons*: the foreground and not the background.
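A hedged sketch of what a loss on the explainability signal can look like: concentrate the (normalized) relevance map inside a foreground mask. The random relevance map, the toy mask, and the exact penalty are illustrative assumptions; the paper's relevance computation and loss differ in the details.

```python
import torch

def explanation_loss(relevance, fg_mask):
    """Penalize relevance mass that falls outside the foreground mask."""
    rel = relevance / (relevance.sum(dim=(1, 2), keepdim=True) + 1e-8)
    inside = (rel * fg_mask).sum(dim=(1, 2))
    outside = (rel * (1 - fg_mask)).sum(dim=(1, 2))
    return (outside - inside).mean()

relevance = torch.rand(4, 14, 14, requires_grad=True)   # per-patch relevance maps
fg_mask = torch.zeros(4, 14, 14)
fg_mask[:, 4:10, 4:10] = 1.0                            # toy foreground boxes

loss = explanation_loss(relevance, fg_mask)  # would be added to the usual CE loss
loss.backward()
print(float(loss))
```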
@hila_chefer
Hila Chefer@ ICLR’24
3 years
So excited to be able to share our work! Hope you’ll find it useful :) With @shiretzet and Lior Wolf.
@Arxiv_Daily
arXiv Daily
3 years
Transformer Interpretability Beyond Attention Visualization by Hila Chefer et al. including @shiretzet #TransformationNetwork #ComputerVision
@hila_chefer
Hila Chefer@ ICLR’24
11 months
Check out our project website for more details and examples: It was a pleasure working on this project during my internship at @GoogleAI ! Thanks to my amazing co-authors @OranLang , @megamor2 , and our advisors Michal Irani, Inbar Mosseri, and Lior Wolf! 🙏
@hila_chefer
Hila Chefer@ ICLR’24
4 months
@WilliamLamkin @GoogleAI @huggingface Thanks @WilliamLamkin ! I think you broke a world record in paper-reading speed! Not sure about a demo ☹️ but I'll take this opportunity to highlight that this was a joint effort by many great people, e.g., this awesome investigation of temporal consistency was done by @ZadaShiran
@hila_chefer
Hila Chefer@ ICLR’24
2 years
Amazing results!
@_akhaliq
AK
2 years
Volumetric Disentanglement for 3D Scene Manipulation abs: project page:
@hila_chefer
Hila Chefer@ ICLR’24
3 years
@ak92501 Also, check out our Colab notebook, where you can edit your own image with one of our directions:
@hila_chefer
Hila Chefer@ ICLR’24
2 years
[2/n] This phenomenon results in poor generalization to domain shifts: a bus is classified as a snowplow due to the snow, a lemon is classified as a golf ball due to the grass, a forklift is classified as a garbage truck due to the garbage
@hila_chefer
Hila Chefer@ ICLR’24
9 months
Excited to share our #ICCV2023 work! 🥳 Great effort led by @idansc and @vesteinns 🎉
@idansc
Idan Schwartz
9 months
Thanks @_akhaliq ! Code for our #ICCV23 paper is now available! :) Discriminative Class Tokens enable direct editing from a pre-trained classifier, allowing fine-grained edits and more. See project page for more details: Code:
@hila_chefer
Hila Chefer@ ICLR’24
11 months
[4] Conceptor also enables fine-grained semantic editing by manipulating the coefficients of tokens in the decomposition. For example, we can make a sculpture more abstract 👩‍🎨. This manipulation also allows us to visualize the impact of each token in the learned decomposition 🖼️
@hila_chefer
Hila Chefer@ ICLR’24
11 months
[1] We observe that some concepts such as “a president” and “a rapper” are represented as simple interpolations of famous personalities (e.g., “Obama”, and “Biden” for presidents, “Tupac”, and “Drake” for rappers), suggesting that the model tends to learn from examples💡
@hila_chefer
Hila Chefer@ ICLR’24
3 months
@natanielruizg It is also worth mentioning that the method for the stylized generation is based on weight interpolation between the original and fine-tuned text-to-image weights. This is inspired by a great work on GANs by @Buntworthy and @Norod78 !
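The interpolation trick itself fits in a few lines. A minimal sketch, assuming two checkpoints with identical keys (toy Linear modules here; in practice this would be applied to the text-to-image UNet weights):

```python
import torch
import torch.nn as nn

def interpolate_weights(base_sd, styled_sd, alpha=0.5):
    """Per-key blend: (1 - alpha) * base + alpha * styled."""
    return {k: torch.lerp(base_sd[k].float(), styled_sd[k].float(), alpha)
            for k in base_sd}

base, styled, mixed = nn.Linear(4, 4), nn.Linear(4, 4), nn.Linear(4, 4)
# Pretend `styled` was fine-tuned on a style; load the interpolated weights.
mixed.load_state_dict(
    interpolate_weights(base.state_dict(), styled.state_dict(), alpha=0.7))
```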
@hila_chefer
Hila Chefer@ ICLR’24
3 years
Examples for the Joker:
@hila_chefer
Hila Chefer@ ICLR’24
2 years
@JacobGildenblat [1/n] Thanks @JacobGildenblat! That's a great question. Of course, this thread is a TLDR of the work, but I'll try to answer it in multiple parts. Part 1: the method does not entirely eliminate the use of the background (it is, as you said, useful); see the example below
@hila_chefer
Hila Chefer@ ICLR’24
2 years
The text explainability scores can also explain cases where the similarity scores are unstable: in the example below, the noisy image scores higher than a dog for the text "an image of a dog", due to the influence of the words "image of" on the similarity score.
@hila_chefer
Hila Chefer@ ICLR’24
1 year
Amazing work by the authors! Can’t wait to try it out!
@_akhaliq
AK
1 year
Designing an Encoder for Fast Personalization of Text-to-Image Models TL;DR: use an encoder to personalize a text-to-image model to new concepts with a single image and 5-15 tuning steps abs: project page:
@hila_chefer
Hila Chefer@ ICLR’24
11 months
[5] The obtained decompositions capture biases that are not easily detectable from the generated images, such as “millennials” for the concept “drinking” 🍺.
@hila_chefer
Hila Chefer@ ICLR’24
2 years
It turns out that CLIP uses a sparse subset of the input tokens to determine the similarity score:
@hila_chefer
Hila Chefer@ ICLR’24
11 months
Thanks to my great partner @RisingSayak for the incredible work on this despite the time differences and visa issues ❤️ And to our wonderful guest speaker @MokadyRon for a fascinating and inspiring talk! 🥳
@hila_chefer
Hila Chefer@ ICLR’24
9 months
@giffmana @chriswolfvision Or maybe it could mean that the paper assignment does not always align with the reviewers’ expertise? I have been assigned 2/6 papers that are fairly far from my area of research, and it was very hard to provide a high-quality review.
@hila_chefer
Hila Chefer@ ICLR’24
11 months
@DigThatData Very interesting work! Thanks for sharing! Seems to correspond nicely to the results we found when decomposing concepts (“president” is a linear combination of presidents, “rapper” is a linear combination of rappers, and even “dog” is a linear combination of dog breeds)
@hila_chefer
Hila Chefer@ ICLR’24
11 months
[1] We observe that some concepts such as “a president” and “a rapper” are represented as simple interpolations of famous personalities (e.g., “Obama”, and “Biden” for presidents, “Tupac”, and “Drake” for rappers), suggesting that the model tends to learn from examples💡
@hila_chefer
Hila Chefer@ ICLR’24
3 months
@EladRichardson @kfir99 @yuvalalaluf Well deserved! Congrats 🎊🍾
@hila_chefer
Hila Chefer@ ICLR’24
2 months
Amazing opportunity to work with @MokadyRon 🤩
@MokadyRon
Ron Mokady
2 months
TL;DR: We are looking for Researchers 🌟 We at @bria_ai_ , having recently secured funding, are set to train 3 foundational models this year (One of them is T2I, with more details to come) If you wish to train large models 💪but in a startup environment 🚀 - please DM
@hila_chefer
Hila Chefer@ ICLR’24
3 years
We use explainability for DETR and generate segmentation masks from a model trained only for object detection!
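A hedged sketch of the mask-extraction step: upsample a per-query relevance heatmap to image resolution and threshold it. The random heatmap and the mean-plus-std threshold are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

relevance = torch.rand(1, 1, 25, 34)     # toy low-res relevance map for one query
full = F.interpolate(relevance, size=(800, 1066),
                     mode="bilinear", align_corners=False)[0, 0]
mask = (full > full.mean() + full.std()).float()   # simple adaptive threshold
print(mask.shape, f"{mask.mean().item():.2%} foreground")
```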
@hila_chefer
Hila Chefer@ ICLR’24
8 months
Another inspiring work by the amazing @YVinker 🤩
@YVinker
Yael Vinker🎗
8 months
Excited to share that "Inspiration Tree" was accepted to #SIGGRAPH Asia 2023! 🥳 We show how diffusion models can assist humans in extracting *different aspects* of existing concepts, to provide inspiration for design 🎨👩‍🎨🖌️ webpage 👉
@hila_chefer
Hila Chefer@ ICLR’24
2 years
@eccvconf [3/n] TargetCLIP: We demonstrate that using CLIP guidance and the powerful StyleGAN, we can extract an essence vector, a vector of semantic properties that correspond to the “signature characteristics” that are also identified by humans as related to the specific target.
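A very rough sketch of the essence-transfer objective, with frozen toy Linear stand-ins for the StyleGAN generator and the CLIP image encoder: optimize a single direction d so that adding it to any source latent moves the generated output toward the target in CLIP space. The paper's identity- and realism-preservation terms are omitted here.

```python
import torch

G = torch.nn.Linear(128, 512)           # stand-in generator: latent -> image feats
clip_embed = torch.nn.Linear(512, 64)   # stand-in CLIP image encoder
for p in list(G.parameters()) + list(clip_embed.parameters()):
    p.requires_grad_(False)

sources = torch.randn(8, 128)                      # batch of source latents
target_clip = clip_embed(G(torch.randn(1, 128)))   # CLIP embedding of the target

d = torch.zeros(1, 128, requires_grad=True)        # the shared "essence" direction
opt = torch.optim.Adam([d], lr=0.01)
for _ in range(200):
    edited = clip_embed(G(sources + d))            # same d added to every source
    loss = (1 - torch.cosine_similarity(edited, target_clip)).mean() + 0.1 * d.norm()
    opt.zero_grad(); loss.backward(); opt.step()
```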
@hila_chefer
Hila Chefer@ ICLR’24
3 years
@ak92501 Our notebook is expanding with more interesting targets that weren't inverted :) Doc Brown (Back to the Future), Morgan Freeman, Beyoncé, and Ariel (The Little Mermaid). Ideas for additional targets are always welcome :)
@hila_chefer
Hila Chefer@ ICLR’24
4 months
@_akhaliq Thanks for sharing @_akhaliq ! Check out this thread for more details :)
@hila_chefer
Hila Chefer@ ICLR’24
4 months
TLDR: Meet ✨Lumiere✨, our new text-to-video model from @GoogleAI! Lumiere is designed to create entire clips in just one go, seamlessly opening up possibilities for many applications: Image-to-video 🖼️ Stylized generation 🖌️ Video editing 🪩 and beyond. See 🧵👇
@hila_chefer
Hila Chefer@ ICLR’24
2 years
[10/n] No Token Left Behind- Additionally, we demonstrate that using the image explainability heatmaps, it is possible to generate an image from a given layout of bounding boxes!
@hila_chefer
Hila Chefer@ ICLR’24
1 year
To mitigate this, we introduce Generative Semantic Nursing (GSN). We implement GSN using an attention-based formulation -- Attend and Excite -- to refine SD's attention to attend to ALL the subject tokens from the input prompt. This is done ON the fly, so NO retraining!
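A hedged sketch of one GSN update, with a toy stand-in for reading the cross-attention maps; the real implementation hooks Stable Diffusion's UNet attention layers (diffusers ships it as StableDiffusionAttendAndExcitePipeline).

```python
import torch

def gsn_step(latent, attn_for, subject_token_ids, step_size=0.1):
    # Strongest spatial activation for each subject token...
    maxima = [attn_for(latent, t).max() for t in subject_token_ids]
    # ...and push up the most neglected subject.
    loss = 1.0 - torch.stack(maxima).min()
    (grad,) = torch.autograd.grad(loss, latent)
    return (latent - step_size * grad).detach().requires_grad_(True)

proj = torch.nn.Linear(4, 77)  # toy "cross-attention": 4 channels -> 77 tokens
def attn_for(z, t):
    # (B, C, H, W) -> (B, H, W, C) -> per-pixel distribution over text tokens
    return torch.softmax(proj(z.permute(0, 2, 3, 1)), dim=-1)[..., t]

latent = torch.randn(1, 4, 64, 64, requires_grad=True)
latent = gsn_step(latent, attn_for, subject_token_ids=[2, 5])  # on-the-fly update
print(latent.shape)   # unchanged shape: torch.Size([1, 4, 64, 64])
```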
@hila_chefer
Hila Chefer@ ICLR’24
2 years
[n/n] Joint work with @idansc and Lior Wolf from Tel Aviv University
@hila_chefer
Hila Chefer@ ICLR’24
1 year
Amazing work! @ShellySheynin congrats 🥳
@ShellySheynin
Shelly Sheynin
1 year
Check out our new text-to-4D work! 🥳🥳
@hila_chefer
Hila Chefer@ ICLR’24
2 years
@giffmana Cool work! We have a #NeurIPS2022 paper that deals with spurious correlations too: We apply a loss to restrict the explainability maps to focus on the correct parts of the image, which improves robustness significantly with a short fine-tuning process :)
@hila_chefer
Hila Chefer@ ICLR’24
2 years
[1/n] Can explainability improve model accuracy? Our latest work shows the answer is yes! We noticed that ViTs suffer from salient issues: their output is often based on supportive signals (background) rather than the actual object.
@hila_chefer
Hila Chefer@ ICLR’24
4 months
@nickfloats Thanks for sharing @nickfloats ! Motion stylization is one of my favorite findings of our work! 🙏 Linking here the full thread with more details 🤩
@hila_chefer
Hila Chefer@ ICLR’24
4 months
TLDR: Meet ✨Lumiere✨, our new text-to-video model from @GoogleAI! Lumiere is designed to create entire clips in just one go, seamlessly opening up possibilities for many applications: Image-to-video 🖼️ Stylized generation 🖌️ Video editing 🪩 and beyond. See 🧵👇
@hila_chefer
Hila Chefer@ ICLR’24
2 years
@eccvconf [2/n] TargetCLIP: Code: Humans identify characteristics independently of the target identity. For example, the Joker is identified by the signature hair color, face, and makeup, regardless of the identity of the actor.
@hila_chefer
Hila Chefer@ ICLR’24
3 years
Ever wonder if a model simply guessed an answer to a visual question? We show that models can connect the text to the image and answer based on joint information!
@hila_chefer
Hila Chefer@ ICLR’24
2 years
[8/n] No Token Left Behind- We demonstrate that explainability can be used as an additional loss to penalize similarity scores that are not based on the semantic text tokens.
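A minimal sketch of such a penalty, assuming per-token relevance scores are already available (random here; in the paper they come from the Transformer explainability method):

```python
import torch

def semantic_grounding_loss(token_relevance, semantic_mask):
    """Fraction of relevance mass that falls on non-semantic tokens."""
    rel = token_relevance / (token_relevance.sum() + 1e-8)
    return (rel * (1 - semantic_mask)).sum()

tokens = ["<sos>", "an", "image", "of", "a", "dog", "<eos>"]
relevance = torch.rand(len(tokens))                    # stand-in relevance scores
semantic = torch.tensor([0, 0, 0, 0, 0, 1, 0], dtype=torch.float)  # only "dog"
print(float(semantic_grounding_loss(relevance, semantic)))  # add to CLIP guidance
```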
@hila_chefer
Hila Chefer@ ICLR’24
5 months
@talidekel Very well deserved! Congrats Tali 🎊🎉🍾
@hila_chefer
Hila Chefer@ ICLR’24
2 years
[9/n] No Token Left Behind: Using this additional explainability loss, we demonstrate that downstream tasks that use CLIP guidance, such as image classification, image generation, and editing, can be significantly improved.
@hila_chefer
Hila Chefer@ ICLR’24
1 year
@yuvalalaluf @YVinker @DanielCohenOr1 We find that Stable Diffusion suffers from catastrophic neglect, meaning it simply ignores entire parts of the prompt
@hila_chefer
Hila Chefer@ ICLR’24
1 year
Happening today! Come see us at poster session 2 :) 4PM-6PM (CST) @ Hall J #428
@hila_chefer
Hila Chefer@ ICLR’24
2 years
@JacobGildenblat Part 2: Rephrasing from the ImageNet-A paper: "image classification datasets contain `spurious cues' or `shortcuts'. For instance, cows tend to co-occur with green pastures... models may predict `cow', using primarily the green pasture background cue."
@hila_chefer
Hila Chefer@ ICLR’24
2 years
[4/n] TargetCLIP: The extracted essence vectors are global and can be applied to any source while maintaining realism and the original identity of the source.
@hila_chefer
Hila Chefer@ ICLR’24
1 year
Work done with my amazing co-authors @idansc and Lior Wolf at @TelAvivUni
@hila_chefer
Hila Chefer@ ICLR’24
10 months
@baifeng_shi Cool work! Congrats! 🎉 We actually have a similar work published at NeurIPS’22 that shows that by a simple optimization process on the attention maps, a significant improvement to robustness can be achieved:
@hila_chefer
Hila Chefer@ ICLR’24
2 years
@pbaylies @danielrussruss Thanks for sharing this cool application of our work! I definitely noticed this too. Using 16 resulted in more fragmented heatmaps and more artifacts. My best guess is that this is a training issue, since I didn't notice it when working with ViT-B/16 explainability on ImageNet
@hila_chefer
Hila Chefer@ ICLR’24
2 years
@BDehbozorgi83 @illc_amsterdam Thanks @BDehbozorgi83 , looking forward to it! 🚀
@hila_chefer
Hila Chefer@ ICLR’24
11 months
This investigation of dual-meaning concepts was inspired by a great work by @RoyiRassin et al. Our findings show that their conclusions generalize even if only one object is generated in the image 💡
@hila_chefer
Hila Chefer@ ICLR’24
1 year
Amazing results!
@_akhaliq
AK
1 year
CLIPascene: Scene Sketching with Different Types and Levels of Abstraction abs: project page:
@hila_chefer
Hila Chefer@ ICLR’24
2 years
[5/n] TargetCLIP - As an alternative to long optimization, we show that one can also fine-tune an inversion encoder to output the essence vector of a target, allowing for instant extraction of the essence for each target!
@hila_chefer
Hila Chefer@ ICLR’24
3 years
Our method is able to transfer semantics from out-of-domain images
@hila_chefer
Hila Chefer@ ICLR’24
3 years
Examples with churches:
@hila_chefer
Hila Chefer@ ICLR’24
3 years
With @shir_gur and Lior Wolf :)
@hila_chefer
Hila Chefer@ ICLR’24
8 months
Super interesting observations by @MokadyRon ! 🚀
@MokadyRon
Ron Mokady
8 months
🔬Exploring Alignment in Diffusion Models - a 🧵 TL;DR: Diffusion models trained on *different datasets* can surprisingly generate similar images when fed with the same noise 🤯 [1/N]
@hila_chefer
Hila Chefer@ ICLR’24
2 years
@JacobGildenblat We attempt to solve the absolute reliance on shortcuts, and instead provide a more object-centric output. In fact, we observed that as the model obtains higher accuracy the salient issues get worse... (below: ViT Large heatmap that is unrelated to the object)
@hila_chefer
Hila Chefer@ ICLR’24
2 years
@ai_for_humans @JacobGildenblat I think the distinction is that the background can be beneficial as a supportive cue to the foreground (e.g. a water snake is usually inside the water); when the prediction is mostly based on the background, it is overfitting.
@hila_chefer
Hila Chefer@ ICLR’24
3 months
@ykilcher Thank you for the awesome summary of our work 🙏