Check out our
@CVPR
paper on making caption-based Vision-Language Models do object-localization without _any_ human-supervised detection data!
⁉️ We develop a new *VLM-specific PEFT method* 🤩 which is more powerful than LoRA etc. We evaluate on categories unseen during training only!
How can one easily teach caption-pretrained VLMs to localize objects? We show that a small Positional Insert (PIN) can unlock object localization abilities in frozen autoregressive VLMs, without any annotated data.
#CVPR2024
📝:
🌐:
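A minimal numpy sketch of the core PIN idea as described above (shapes and names are made up for illustration; the real method trains the insert with a self-supervised objective, which is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen vision-encoder output: (num_patches, dim) features from a
# caption-pretrained VLM (hypothetical toy shapes).
num_patches, dim = 16, 32
frozen_features = rng.standard_normal((num_patches, dim))

# The PIN: a small *trainable* spatial embedding added onto the frozen
# features, injecting position information the frozen model can use.
pin = np.zeros((num_patches, dim))  # the ONLY trainable parameters

def forward(features, pin):
    # Everything except `pin` stays frozen.
    return features + pin

out = forward(frozen_features, pin)
print(pin.size)  # 512 trainable parameters vs. millions in the VLM
```

The point of the sketch: the localization ability comes from a tiny additive insert, not from finetuning the model itself.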
We computer vision researchers rarely look at the individual data points inside our datasets.
Mainly because we are too lazy and/or do not have the right tools.
This needs to change. And now we have a great tool from
@TensorFlow
datasets team: Know Your Data. A thread.🧵(1/5)
Very happy to announce that VeRA is accepted at
@iclr_conf
with scores 8,8,8,5!
VeRA makes LoRA ~10x more parameter efficient while retaining the same performance & also works for vision!
Paper:
Our very light-weight webpage😏:
VeRA: Vector-based Random Matrix Adaptation
paper page:
Low-rank adaptation (LoRA) is a popular method that reduces the number of trainable parameters when finetuning large language models, but it still faces acute storage challenges when scaling to even…
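A toy numpy sketch of how VeRA (as I read the abstract) differs from LoRA: the low-rank matrices are frozen and random, and only two small scaling vectors are trained per layer. Shapes are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 8

# LoRA: Delta W = B @ A, with A (r x d_in) and B (d_out x r) trainable.
# VeRA: A and B are *frozen random* matrices (shared across layers);
# only the scaling vectors d_vec and b_vec are trained.
A = rng.standard_normal((r, d_in))   # frozen, shared
B = rng.standard_normal((d_out, r))  # frozen, shared
d_vec = np.ones(r)       # trainable, per layer
b_vec = np.zeros(d_out)  # trainable, per layer (init 0 -> Delta W = 0)

def vera_delta(A, B, d_vec, b_vec):
    # Delta W = diag(b_vec) @ B @ diag(d_vec) @ A
    return (b_vec[:, None] * B) @ (d_vec[:, None] * A)

delta = vera_delta(A, B, d_vec, b_vec)

lora_params = r * d_in + d_out * r  # 1024 trainable in LoRA
vera_params = r + d_out             # 72 trainable in VeRA
print(lora_params, vera_params)     # ~14x fewer here; ~10x in the paper
```

Initializing `b_vec` to zero keeps the adapted weight identical to the pretrained one at the start of training, mirroring LoRA's zero-init of B.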
Today we
@Oxford_VGG
release the dataset 𝐏𝐀𝐒𝐒: 1.4M, CC-BY licensed images that do not depict humans, human parts or other PID. It can be used for self-supervised pretraining and works as well as ImageNet.
info & models:
paper:
Missed the QUVA Deep Vision lecture by Dr. Andreas Geiger (
@uni_tue
,
@MPI_IS
) yesterday?
Here's his talk on "Neural Implicit Representations for 3D Vision and Beyond".
You can also find other interesting talks from the past on our group's channel!
Our 2nd workshop on "Self-supervised Learning - What is Next?" is coming to
#ECCV22
!
More updates soon! In the meantime, check out the previous iteration and marvel at how far we've come in SSL since then:
Organised with
@chrirupp
@dlarlus
A. Zisserman
What comes next: I'm excited to say that today I start as an Assistant Professor at the QUVA lab at the University of Amsterdam 🎊. I'll be working with
@cgmsnoek
,
@wellingmax
,
@egavves
,
@QCOMResearch
and the many talented students here! Onwards!
Yesterday I gave my first lecture for the
@UvA_IvI
Deep Learning 1 course at
@Lab42UvA
. Extremely happy for the great team of TAs (led by Christos Athanasiadis and
@phillip_lippe
) and super motivated to make the students self-sufficient deep learners!
🌐:
Getting excited for my "Self-supervised and Vision-Language Learning" lectures starting tomorrow for the
@UvA_IvI
's MSc in AI, Deep Learning 2 course:
Sharing a preview in
@CSProfKGD
style :)
Soo much recent progress, learned a lot in preparing it.😊
Since everyone only tags these when things go wrong.. Great job
@overleaf
and
@openreviewnet
👏🎉! Submitting just a few minutes before
@CVPR
deadline worked without any problems!! I do wonder how many GB they get in the final minutes/seconds 🤔..
Our recordings are online 🎉! Listen to the talks from our Self-Supervised Learning workshop at
#ECCV2020
you've missed here:
and find slides at . Thanks again to all speakers and participants!
Looking forward to the Self-Supervised Learning Workshop we’ve organized with
@chrirupp
, A. Vedaldi and A. Joulin at
#ECCV2020
.
Join us tomorrow for our speakers:
@avdnoord
, P. Favaro,
@CarlDoersch
, A. Zisserman, I. Misra, S. Yu, A. Efros,
@pathak2206
! .
Really cool work I finally read now:
Learning to Discover and Detect Objects by
@tvlfom
,
@Ismail_Elezi
D. Ramanan
@lealtaixe
@AljosaOsep
Nice application of SSL + supervised tools for novel-class object detection. And: cool that SeLa's SK makes it work😀💪
📄:
Check out our
@iclr_conf
[oral] paper on learning state-of-the-art ViTs from a single video from scratch!
One of the coolest things is that multi-object tracking emerges from the different heads in the plain ViTs (three heads visualised below in R,G,B).
Really happy to share that DoRA is accepted as an Oral to
@iclr_conf
#ICLR2024
Using just “1 video” from our new egocentric dataset - Walking Tours, we develop a new method that outperforms DINO pretrained on ImageNet on image and video downstream tasks.
More details in 🧵👇
Looking forward to giving a talk on self-supervised learning -- the data, using multi-modal data and leveraging augmentations at
@QCOMResearch
in San Diego.
@egavves
@m_hofmann
🎉! I have a new opening: PhD position on "Adaptable Foundation Models". Jointly supervised with
@cgmsnoek
and integrated into the Qualcomm-UvA Lab within the diverse AI environment of Amsterdam, you will conduct boundary-shifting, out-of-the-box research.
🌐:
With
@iclr_conf
done &
@NeurIPSConf
deadline rapidly approaching, here's something to look forward to 🤩:
Our workshop
@ICCVConference
: "🍔BigMAC: Big Model Adaptation for Computer Vision" with amazing speakers
🌐:
📆: 2nd October 9am-1pm, details soon
🎉We open-sourced the code and models to our
#NeurIPS2020
SeLaVi ("Labelling unlabelled videos from scratch with multi-modal self-supervision") paper! Joint w/
@mandelapatrick_
,
@facebookai
,
@chrirupp
, VGG
Cluster your own videos or explore our clusters!
This week marks my one year anniversary of being assistant prof at the
@UvA_Amsterdam
. 🥳🎉
To celebrate this, I want to share a few of my distilled reflections.
🎉! Today's the start of our Deep Learning 1 course at the
@UvA_IvI
MSc in AI. Excited to introduce a new cohort of ~200 students to the intricacies, nitty-gritty details and mysteries of deep learning together with an amazing team of TAs:
Visit our ICLR poster "Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing" with
@irolaina
and A. Vedaldi, from
@Oxford_VGG
. We linear-probe SSL models, but in ǝsɹǝʌǝɹ!🤯 For better interpretability.
in 1h:
So that's what it feels like if your PhD student gives an oral presentation in front of hundreds of researchers at
@ICCVConference
🥹. Great job, Pengwan!
Final day at
@ICCVConference
! We have one oral at around 9:20am: "Self-ordering Point Clouds" where we learn how to select the most informative points with hierarchical contrastive learning (subsets as positive augmentations) and use Sinkhorn-Knopp for differentiable sorting.
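A small numpy sketch of the Sinkhorn-Knopp normalization mentioned above, which makes (soft) assignment/sorting differentiable by producing an approximately doubly stochastic matrix. The scores are random toys, not the paper's point-cloud setup:

```python
import numpy as np

def sinkhorn(scores, n_iters=50, eps=0.1):
    """Sinkhorn-Knopp: turn a score matrix into a soft doubly
    stochastic matrix by alternating row/column normalisation."""
    P = np.exp(scores / eps)
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)  # rows sum to ~1
        P /= P.sum(axis=0, keepdims=True)  # columns sum to 1
    return P

# Toy example: softly assigning 4 points to 4 ranks.
rng = np.random.default_rng(0)
scores = rng.standard_normal((4, 4))
P = sinkhorn(scores)
print(P.sum(axis=0), P.sum(axis=1))  # both close to [1, 1, 1, 1]
```

Because every step is differentiable, gradients flow through the soft sorting, unlike a hard argsort.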
New preprint!
LLM-to-SLM: let the big LM "think" (1x forward) and the small LM write (Nx forwards). Simple idea, and it also shows encoder LMs are still super useful.
🚀 Excited to share our latest work "Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding" now on arXiv! We're taking strides in making language models faster & more efficient on text generation tasks like translation & summarization.🔍 []
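The "1x forward big, Nx forwards small" split can be sketched like this. Everything here is a hypothetical stand-in (random "models", mean-pool conditioning, and the cheap step even ignores the prefix), purely to show the control flow:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, vocab = 16, 10

def big_lm_encode(tokens):
    # The expensive model runs ONCE: one representation per prompt token.
    return rng.standard_normal((len(tokens), dim))

W = rng.standard_normal((dim, vocab))

def small_lm_step(prefix, cond):
    # Cheap decoding step conditioned on the big model's (mean-pooled)
    # encoding. A real SLM would also attend to the prefix.
    logits = cond.mean(axis=0) @ W
    return int(np.argmax(logits))

def generate(tokens, n_new=5):
    cond = big_lm_encode(tokens)  # big LM: 1x forward ("think")
    out = list(tokens)
    for _ in range(n_new):        # small LM: Nx forwards ("write")
        out.append(small_lm_step(out, cond))
    return out

print(generate([1, 2, 3]))
```

The speedup comes from the loop running only the small model; the large model's cost is amortized over all generated tokens.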
Come to our ECCV workshop (Hall C) from 9am: "Self-supervised Learning: What is Next?". We have a great line-up of speakers including:
@jalayrac
,
@mcaron31
,
@NagraniArsha
,
@imisra_
, Linda Smith and Andrea Vedaldi, and we start our series at 9:20am with Alyosha Efros!
SimMIM: A Simple Framework for Masked Image Modeling
abs:
When applied to a larger model of about 650 million parameters, SwinV2-H, it achieves 87.1% top-1 accuracy on ImageNet-1K using only ImageNet-1K data
🤩🤩 Extremely happy to announce our ELLIS PhD school in Amsterdam: bringing together amazing researchers from all things foundation / large-scale / multi-modal / self-supervised AI!
If you're a PhD student, don't miss this opportunity, it's straight after the
@eccvconf
deadline
Today, my friend & collaborator
@TengdaHan
sent me this: I've arrived at 1000 citations! 🥳 Or rather, the works I've co-authored with many brilliant & inspiring individuals, have, collectively reached a nice arbitrary number! Still: 🥳🎉!
To celebrate: here's some TL;DRs
I've learned a lot about equivariances and geometric deep learning for preparing today's lecture on graph neural networks. Still so much to learn, it's great!📚🤓
We've now released the
@ICCVConference
's BigMAC workshop recording:
Also: most of the speakers' slides are now on the website too (and we're chasing the remaining ones) :).
Don't forget to apply to this PhD vacancy🎓! We will work on exciting topics related to computer vision / self-supervised learning and video! We prioritize diversity and are committed to creating an inclusive environment for everyone.
📅 Deadline is 31st Jan.
We have a PhD vacancy in the QUVA Lab 🎉! If you want to work with me and
@cgmsnoek
on pushing the frontier of SSL and video learning check this out!! :)
Day 2. Starting with
@chrirupp
from
@Oxford_VGG
. From Lucas-Kanade to epipolar geometry to tracking to open vocabulary segmentation to leveraging generative augmentations. A whirlwind of fundamentals and state-of-the-art! 👏
Full house at the practical of our SSL + vision-language module!
Want to follow along? Find the Colab notebook made by my fabulous TAs
@ivonajdenkoska
and
@mmderakhshani
here
💻:
lecture 2 📺:
slides 📄:
That's a wrap! 🎉
Thanks
@visumschool
for inviting us and thanks to the participants for their great questions to our session "Self-supervised learning for computer vision from images, video and audio. The why & how".
With
@TengdaHan
from
@Oxford_VGG
.
Wrap-up of day 2 at our
@ELLISforEurope
winter school on Foundation Models: our poster session with drinks and Dutch borrel! Seriously strong research here. Also check out our amazing t-shirts :)
A little late, but here's a quick thread of one of my two
@iclr_conf
accepted papers!
"The augmented image prior: Distilling 1000 classes by extrapolating from a single image", joint work with
@aaqib_saeed
@TUeindhoven
🌐:
🔖:
Finally it's out! 🎉 Our new work on leveraging the passage of time in videos for learning better image encoders. Big improvements for spatial tasks like unsupervised object segmentation. Check out the great thread below! Paper
@ICCVConference
New paper on exploring the power of videos for learning better image encoders 🎥🧠. Introducing "TimeTuning", a self-supervised method that tunes models on the temporal dimension, enhancing their capabilities for spatially dense tasks, such as unsupervised semantic segmentation.
LoReFT is very interesting and follows our line of work in VeRA (ICLR'24) of making PEFT methods _much_ more parameter-efficient.
Added their numbers in our table so you can compare it against other methods in one overview.
ReFT: Representation Finetuning for Language Models
10x-50x more parameter-efficient than prior state-of-the-art parameter-efficient fine-tuning methods
repo:
abs:
Now at
@iclr_conf
! If you're also here and want to have a chat, just ping me! :) if you don't know me, I mostly work on self-supervised and multi-modal learning and love to meet new people. 👋
What are important concepts that are hard to understand when encountering VAEs for the first time?
What do you wish you would've been told/taught about this?
Happy to finally share our
#ICML2022
paper "CITRIS: Causal Identifiability from Temporal Intervened Sequences". We investigate causal representation learning for multidimensional variables, and introduce the concept of "minimal causal variables".
🧵 1/11
And following up on this, we have
@Thom_Wolf
from
@huggingface
teaching us how to train LLMs with all the nitty gritty details. And the strong message to focus and inspect THE DATA. Also, second speaker in a row referring to
@karpathy
's tokenizer lecture 🎏.
Last week we had two Guest Lectures in our Deep Learning course: A. Vedaldi from
@Oxford_VGG
on Learning 3D Geometry and
@AmlabUva
's
@wellingmax
on AI4Science.
Students saw previous concepts like GNNs, AEs, ViTs & self-sup. learning applied to two radically different topics. 🤩
I will be giving a public talk about some of my research tomorrow as part of the QUVA Deep Vision lecture.
It'll be about self-supervised learning and privacy/ethics in CV (SeLa, PASS, SeLaVi, GDT and our GPT-2 bias paper).
Tune in here:
Some news: In our new paper
@PNASNews
we show that a simple, agent-based macro-economic model where agents act myopically, can exhibit business-cycle like dynamics and near long-term economic optimality
Using KYD, it's easy to explore our common computer vision datasets and analyse them in various dimensions.
Here, I show you some things I played around with:
For example, we can use meta-data and show images with high latitude:
(2/5)
Check out our paper on making plain ViTs faster and better (supervised and SSL training), now accepted at
@iclr_conf
.
Was very fun co-supervising the project during your internship
@shawshank_v
:)
Delighted to share that SkipAT has been accepted to
#ICLR2024
@iclr_conf
This work was done as part of my internship at
@QCOMResearch
together with some amazing researchers - Amir Ghodrati,
@y_m_asano
, Fatih and
@amir_habibian
.
Please check the thread 👇 for more details.
Our
#ECCV2022
@eccvconf
workshop on self-supervised learning and its many new forms.
This time with a call for papers with a deadline conveniently after ECCV decisions.
Check it out ☑️ and share🔀!
Last day of our
@ELLISforEurope
FOMO winter school starting with a big 💥:
@wellingmax
on fluids and representations and the huge potential impact of AI in science. His very first slide was the best explanation of free energy I've heard (and I studied physics!)
Hire this guy! He's a great researcher and engineer and working with him is fun! He's also the first author of our super cool and fun "1 video SSL learning" paper, where we beat ImageNet DINO models by training on 1 video from scratch. 👀
Hi everyone,
I am graduating early next year with a PhD from
@Inria
in AI. I am looking for an industrial research position and would appreciate your support.
Thank you in advance for any connections, advice, or opportunities you can offer.
My profile:
Research -> meeting friends! 🥳 After speaking at Bristol's Machine Learning and Vision group of
@dimadamen
and having exciting discussions about research yesterday, I was happy to see old and new colleagues from
@Oxford_VGG
at my talk at
@oxengsci
in Oxford today.
DINOv2: Learning Robust Visual Features without Supervision
train a ViT model with 1B parameters and distill it into a series of smaller models that surpass
the best available all-purpose features, OpenCLIP on most of the benchmarks at image and pixel levels
abs:…
Happy to share with you the QUvA Deep Vision Seminar talk from
@olivierhenaff
from
@DeepMind
on
"The virtuous cycle of object discovery and representation" is now online:
Enjoy😊:
I'm super happy to say that this unsupervised labelling of videos paper (via SSL-clustering and SK-optimal transport) with
@mandelapatrick_
,
@chrirupp
and Andrea Vedaldi got accepted at
#NeurIPS2020
! 🎊🎊
I'm sure that if we had this tool earlier, efforts like ImageNet-blurred, removing the person-subtree and PASS would've happened earlier.
We don't have an excuse anymore to not see the bias and problems in our datasets. (5/5)
Fixing outlier issues caused by softmaxes in transformers with three different solutions: a) add gates, b) clip values, c) add "+1" to the softmax denominator. Cool talk from
@TiRune
at the LBQNN workshop. Perhaps "register tokens" are yet another solution.
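The "+1" fix (sometimes called quiet softmax) is easy to sketch. A minimal numpy version, with the max-subtraction trick carried through for numerical stability:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def softmax_plus_one(x):
    # "+1" in the denominator: probabilities can now sum to LESS than 1,
    # so an attention head can abstain (put ~0 mass everywhere) instead
    # of being forced to spread mass and produce outlier activations.
    # With max-subtraction, the "+1" becomes exp(-max).
    e = np.exp(x - x.max())
    return e / (e.sum() + np.exp(-x.max()))

x = np.array([-10.0, -10.0, -10.0])
print(softmax(x).sum())           # always exactly 1.0
print(softmax_plus_one(x).sum())  # tiny: the head "says nothing"
```

With strongly negative logits the standard softmax still distributes a full unit of mass, while the "+1" variant lets the output collapse toward zero.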
Gathering a list of topics for our Foundation/Big Models course
@UvA_IvI
in the spring, that I'll be teaching together with
@cgmsnoek
.
Besides the list below, what should we include?
Ended up being >600 applications.🤯
Now: keep checking your inboxes, we're sending acceptances continuously and will keep doing so (spaces open up due to visa issues etc.).
See you all in Amsterdam
@Ellis_Amsterdam
,
@Lab42UvA
!
Huge thanks to co-organisers
@pranindiati
&
@cgmsnoek
⏱ 3 days left to apply for the ELLIS Winter School on Foundation Models!
👉🏻 Check out the school & submit your application NOW: (speakers list and schedules have been recently updated!)
👀 We aim to have participants from highly diverse backgrounds
Great point 👇. Also, research is a bit like hiking in unknown terrain: you don't know the map at the beginning, you build it by exploring; still, you never know if the trail won't suddenly stop. Plus, in AI research the terrain changes every 6 months 😅
Our paper on causal representation learning from videos got accepted to ICML. 🍋🎉 While we use toy datasets, this is a great step for the future of representation learning.
Very happy to be at the in-person
@TheBMVA
Symposium! Yesterday I was able to give a talk on our NeurIPS'21 oral Motionformer paper; get people excited about video data; and, importantly, meet many old (
@Oxford_VGG
) and new friends! Now day 2!
The last two weeks I had the pleasure and honor to travel in Ghana and Côte d'Ivoire and learn more about the countries and meet people. Particularly grateful for the opportunity to speak with folks
@GoogleAI
in Ghana
@bapadadada
and
@UnivofGh
w/ Jamal Deen.
Super stoked and honored to be talking at the SSL workshop
@NeurIPSConf
tomorrow morning, alongside such amazing speakers.
Will be bringing some Dutch paintings along!
Had a great time chatting with
@bhutanisanyam1
! A pleasure and an honour! We talk about my recent research and my life as a PhD student in deep learning.
More CV & cutting edge research interviews on
@ctdsshow
🍵
In this ep, I interview
@y_m_asano
, PhD student
@Oxford_VGG
,
@UniofOxford
🙏
We talk about Self-Supervision, Self-Labelling & CV
Discussed Papers in thread
Audio:
Video:
Check out our new work "Labelling unlabelled videos from scratch with multi-modal self-supervision" by
@y_m_asano
@mandelapatrick_
@chrirupp
and Andrea Vedaldi in colab with
@facebookai
!
See below for our automatically discovered clusters on VGG-Sound!
This year I very much found that science isn't just doing research and papers! As area chair for CVPR/NeurIPS/ICLR/WACV & workshop organizer at ICCV/NeurIPS & via meeting friends of friends at conferences I learned: everyone can make their own community. Thanks for being in it!🥳
It was such a nice event! Seeing friends and meeting new ones all the while seeing NeurIPS works -- and it being simply downstairs from my office is 💯 . Especially also super happy to see the mingling between PhD and MSc students and other researchers! 🙌
We got an oral at NeurIPS!! 🎊😋 In this work we developed a new attention mechanism specifically for vision transformers that work on (longer) videos. As always great and fun collab with
@mandelapatrick_
et al.!
Our work Motionformer was accepted as an ORAL at NeurIPS 2021! We present a novel trajectory-focused self-attention block, which essentially tracks space-time patches for video transformers. Work with amazing collaborators
@facebookai
@Oxford_VGG
. Code:
Now
@RisingSayak
from
@huggingface
giving a great overview, insights and limitations into controlling Text-to-Image generative models. Solutions at low compute. Token inversion, controlnet, attend & excite etc.
Or we can sort the dataset by number of "detected" faces.
Here's PASS:
Here's ImageNet:
(I will not link to other problematic human-containing pictures in ImageNet, but please, as a researcher, explore a bit.) (4/5)
Start of day 3 of our
@Ellis_Amsterdam
winter school. Starting with
@TimSalimans
from
@Google
on image and video generation with diffusion models. From basics to sota 1-4 step generation methods. 👌
Another cool LLM paper accepted to
@iclr_conf
! This time on structured pruning for LLMs without losing much performance. Work done by
@tychovdo
who did his internship
@QCOMResearch
Amsterdam -- was very fun to help out! Check the informative thread below 👇
⭐️New paper ⭐️ Excited to share 'The LLM Surgeon', accepted at ICLR 2024. We obtain SOTA pruning performance and even demonstrate structured LLM pruning of full rows and cols. Direct practical impact enabling compression up to 20-30% with negligible loss in performance.🧵1/9👇
Alyosha: "not saying we need to move back to kNN, but..." [a) we should focus on data over algorithms, b) think about what comes after copying, intelligence as an emergent phenomenon, c) self-supervised learning and data generation for robotics]
I’m very excited to share our work on Gemini today! Gemini is a family of multimodal models that demonstrate really strong capabilities across the image, audio, video, and text domains. Our most-capable model, Gemini Ultra, advances the state of the art in 30 of 32 benchmarks,…
Wow!! We already have >240 applications for our ~100 spots! 🎉🎉
We're trying to increase capacity and decided to start sending acceptances on a rolling basis. We will still accept applications until the 15th Feb though, but this should help with visa/hotel booking. Go submit!🚀
Finally got around to uploading some more pretrained torchvision resnet models for our SeLa method!
NMI=76%, aNMI=53% and Linear probing of 68.8% simply by using the SimCLR augmentation - still just 280 epochs and linear head. SSL=augmentations. Enjoy!🎉
We introduce LLM2Vec, a simple approach to transform any decoder-only LLM into a text encoder. We achieve SOTA performance on MTEB in the unsupervised and supervised category (among the models trained only on publicly available data). 🧵1/N
Paper:
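The first LLM2Vec step, enabling bidirectional attention in a decoder-only model and pooling token states, can be sketched in a few lines. This toy single-head attention is illustrative only; the paper's masked-token and contrastive training stages are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
T, dim = 5, 8
q = k = v = rng.standard_normal((T, dim))

def attention(q, k, v, causal):
    scores = q @ k.T / np.sqrt(dim)
    if causal:
        # Decoder-only LLM: token i cannot attend to tokens > i.
        scores = np.where(np.tri(T, dtype=bool), scores, -1e9)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# LLM2Vec step 1: simply drop the causal mask so every token sees the
# whole sequence, then mean-pool into a single text embedding.
hidden = attention(q, k, v, causal=False)
embedding = hidden.mean(axis=0)  # the "text encoder" output
print(embedding.shape)
```

Removing the mask changes what each position can see, which is why the model then needs the adaptation training the thread describes.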
Small one but a good one! We test out whether object detection on COCO works with blurred and swapped faces: It does. Towards more private and less biased training of models! work with
@OxfordAI
.
Now is a good time to urge people to spend more effort on cleaning datasets and on valuing that work.
I feel like in '21, with the first NeurIPS Datasets track, there was momentum, but with LAION winning best paper in '22, my hope went down.
Our SSL IN1-k alternative:
big breaking news: LAION just removed its datasets, following a study from Stanford that found thousands of instances of suspected child sexual abuse material
BMVA Symposium on Vision and Language
One Day BMVA symposium in London, UK Wednesday, 17th January 2024
Chairs: Michael Wray (University of Bristol), Davide Moltisanti (University of Bath), and Tengda Han (University of Oxford).
The Segment Anything model is great & I was happy to read that the SA-1B dataset is licensed and "privacy preserving"! Yet when I looked at the data for 10min, I found >8 imgs with faces unblurred. A reminder that making AI data private is not a finished task, let's keep going!💪
Last week, we were fortunate to have Prof.
@dimadamen
give an amazing talk about "Video Understanding - An Egocentric Perspective" at our QUVA lecture. The talk is now online on our channel:
Find previous and future talks here:
Applause for our
#ELLISUnit
@Ellis_Amsterdam
, which hosted the ELLIS Winter School on
#FoundationModels
! Supported by
@ai_elise
, the vibrant event provided an in-depth look into how Europe is guiding its own research agenda in this crucial field!
At
@TampereUni
, students in our Signal Processing and Machine Learning major take part in a secret society where experts in the field share secrets about their research. Today: Yuki Asano (
@y_m_asano
) from
@UvA_Amsterdam