Gabriele Berton

@gabriberton

1,058
Followers
760
Following
7
Media
199
Statuses

Visiting PhD at @CarnegieMellon - PhD student @PoliTOnews - ex @Amazon @IITalk . Looking for CV jobs from November

Joined December 2021
@gabriberton
Gabriele Berton
9 days
This simple PyTorch trick will cut your GPU memory use in half / double your batch size (for real). Instead of adding losses and then computing backward, it's better to compute the backward on each loss (which frees the computational graph). Results will be exactly identical
42
262
2K
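The trick in the tweet above can be sketched as follows. This is a minimal CPU illustration, not the code from the original image: the model, the two batches, and the loss functions `loss_a` and `loss_b` are hypothetical stand-ins. It only checks that both variants give the same gradients; it does not measure memory.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(8, 4)
batch_a = torch.randn(16, 8)  # hypothetical data for the first loss
batch_b = torch.randn(16, 8)  # hypothetical data for the second loss

def loss_a(m):
    return m(batch_a).pow(2).mean()

def loss_b(m):
    return m(batch_b).abs().mean()

# Variant 1: add the losses, then one backward.
# Both computational graphs are alive at the same time.
model.zero_grad()
(loss_a(model) + loss_b(model)).backward()
grads_summed = [p.grad.clone() for p in model.parameters()]

# Variant 2: one backward per loss.
# The first graph is freed before the second one is built.
model.zero_grad()
loss_a(model).backward()
loss_b(model).backward()  # gradients are added to the existing ones
grads_separate = [p.grad.clone() for p in model.parameters()]

same = all(torch.allclose(g1, g2)
           for g1, g2 in zip(grads_summed, grads_separate))
```

On a GPU, the peak memory of the two variants could be compared with `torch.cuda.max_memory_allocated()`.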
@gabriberton
Gabriele Berton
9 days
Limitations: this only works when you have more than one loss (with disentangled computational graphs). Bonus: the more losses you have, the more memory you'll save
6
2
93
@gabriberton
Gabriele Berton
9 days
Ok I'll use this - I'm looking for computer vision jobs/internships from November, anywhere in the world. I have industry experience (Amazon and more) and good publications (first authored 6 CVPR/ICCV). Background in VPR/IR but expanding my horizons. ~1000 stars on GitHub
1
2
35
@gabriberton
Gabriele Berton
9 days
@jon_barron It needs to keep the graph as long as you're not freeing it. If you never call backward() it will never free it
1
1
38
@gabriberton
Gabriele Berton
8 months
Next week I'll be at ICCV with two papers and one demo on Visual Place Recognition! Our demo is about geo-localizing images with city-wide 3D models. You can send us your photos and we try to geo-localize them! We have 3D models of Paris, San Francisco and Berlin! 🧵
1
2
29
@gabriberton
Gabriele Berton
9 days
Wow this is getting traction. I will post more tricks on how to write efficient code then
0
0
31
@gabriberton
Gabriele Berton
3 months
@ISS astronauts take photos of Earth, which are traditionally manually localized for disaster management and climate change research: our #CVPR2024 paper shows that localization can be automated! The localized photos can be explored at Short thread:
2
2
18
@gabriberton
Gabriele Berton
1 year
Tomorrow at #CVPR2023 I'll be giving a 30-minute talk on geolocalization for large-scale scenarios at the tutorial on Visual Geo-Localization! If you're curious about the task, see you there! We'll show you how to geolocalize clips from famous movies!
0
0
14
@gabriberton
Gabriele Berton
5 months
Looking for a summer school in machine learning? Check out this list of tens of ML summer schools per year, including dates, locations, fees, grants. PRs with new summer schools / updated info are welcome! Thanks to @sshkhr16 for this awesome repo!
0
2
12
@gabriberton
Gabriele Berton
9 days
@lwangoku99066 Yeah but then the trick doesn't really help (does not save memory), so IMO if the losses share part of the graph then doing "loss = loss1 + loss2" is fine
2
0
17
@gabriberton
Gabriele Berton
26 days
@ducha_aiki @Parskatt We gave an answer to this question for 17 matching methods on an unconventional domain (satellite images) in our latest paper (EarthMatch). Most methods plateau between 512 and 768, RoMa thrives at 1024, SIFT-LG (one of our best performing) at 256
0
2
11
@gabriberton
Gabriele Berton
8 days
@PMinervini This accumulates the gradient, but when people say "gradient accumulation" they refer to splitting the batch into sub-batches, which is quite different to the above. Standard grad acc can lead to batch norm issues (BN doesn't like small batches), the code above doesn't
0
1
14
@gabriberton
Gabriele Berton
9 days
@asmirnovxyz Exactly, and also because the backward() adds gradient to existing ones instead of replacing them (and that's why we use zero_grad() btw)
0
0
13
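The accumulation behaviour mentioned in the reply above can be checked with a tiny tensor. This is a minimal sketch; the tensor `w` and the toy loss are made up for illustration.

```python
import torch

w = torch.ones(3, requires_grad=True)

(2 * w).sum().backward()
first = w.grad.clone()    # tensor([2., 2., 2.])

(2 * w).sum().backward()  # no zero_grad in between: the gradient is added
second = w.grad.clone()   # tensor([4., 4., 4.])

w.grad.zero_()            # reset, which is the job zero_grad() does for a model
(2 * w).sum().backward()
third = w.grad.clone()    # tensor([2., 2., 2.])
```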
@gabriberton
Gabriele Berton
2 years
Are you at #CVPR ? Come check out our 2 papers on visual geolocalization and visual place recognition! Both are open-source and fully reproducible, and we have a cool demo at the poster session! More info below
@masone_carlo
Carlo Masone
2 years
The Vandal Lab is at #CVPR2022 with several members, from June 19-24. We are ecstatic to present our 8 papers on First Person Action Recognition, Semantic Segmentation, Visual Geo-localization, Object Detection, Incremental Learning. See you all there!
0
5
21
1
0
8
@gabriberton
Gabriele Berton
9 days
What is the reason for not releasing final reviews to authors in conferences like @eccvconf , @CVPR , @ICCVConference ? It would be pretty helpful for ECCV authors to have the final reviews right now - we'd have a better idea of P(acceptance)
2
0
10
@gabriberton
Gabriele Berton
11 days
The perfect GitHub repo allows you to git clone -> pip install -> python and should work automagically. The code should download data and models if necessary
1
0
6
@gabriberton
Gabriele Berton
3 months
@AlexStoken Tagging here a few people that might be interested in the localized images - we'd love to see the science that comes from these! @stim3on @this_is_tckb @phi48 @DaveAtCOGS @RikyUnreal @AGeeknologist @sergioajv1
4
0
6
@gabriberton
Gabriele Berton
4 months
@CVPR @CVPRConf Thanks, I've sent them an email. In case anyone else cares, I'll post their answer here below
2
0
7
@gabriberton
Gabriele Berton
4 months
Hi @CVPR @CVPRConf could we have some further details on the "no links in rebuttal rule"? E.g., if a link was already in the main paper (but we think reviewers missed it), can we add it? And what about links to news articles? Alternatively, can we add them as references?
2
1
5
@gabriberton
Gabriele Berton
1 month
@ducha_aiki @nico_dufour @xkungfu @RomainLoiseau15 @Elt_Vincent @captnloic Really nice paper, easy to read and straight to the point. Also interesting to see CLIP outperforming DINOv2 by a large margin, quite unusual among CV tasks
3
0
5
@gabriberton
Gabriele Berton
11 days
I've created a simple wget-like tool to download big files with python. Just `$ pip install py3_wget` and use like `import py3_wget; py3_wget.download_file(url)`. It's useful to download data and models within the code to improve usability/reproducibility
@gabriberton
Gabriele Berton
11 days
The perfect GitHub repo allows you to git clone -> pip install -> python and should work automagically. The code should download data and models if necessary
1
0
6
1
0
6
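The "download data and models if necessary" idea can be sketched with the standard library alone. This is not how `py3_wget` is implemented, and the helper name `ensure_downloaded` is hypothetical; it only shows the pattern of downloading a file the first time the code needs it.

```python
import os
import urllib.request

def ensure_downloaded(url, path):
    """Download `url` to `path` unless the file is already there.

    Returns True if a download happened, False if the file existed.
    (Hypothetical helper, for illustration only.)
    """
    if os.path.exists(path):
        return False
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    tmp_path = path + ".part"  # download to a temporary name first
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, path)  # rename only after a complete download
    return True
```

Calling it at the top of a training or evaluation script means a fresh clone can run without a manual download step, and repeated runs skip the download.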
@gabriberton
Gabriele Berton
2 years
Rethinking Visual Geo-localization for Large-Scale Applications
Poster session 1.2 (Tue 2:30)
We present a 40M+ dataset of San Francisco, and a new highly scalable training method which reaches sota at a fraction of the cost
0
2
6
@gabriberton
Gabriele Berton
3 months
@ducha_aiki @AlexStoken @bcaputo_iit Thank you for the tweet! The main contribution of the paper is proving that the task of Astronaut Photography Localization is actually solvable with large-scale image retrieval (nobody tried it before, and those images were mostly localized manually before EarthLoc)
0
0
5
@gabriberton
Gabriele Berton
3 months
@Parskatt @ducha_aiki @AlexStoken @bcaputo_iit And we're doing some post-processing/matching on these images and RoMa looks really good on that, more updates coming soon 😉
2
0
5
@gabriberton
Gabriele Berton
3 months
@Parskatt @ducha_aiki @AlexStoken @bcaputo_iit AnyLoc is really really good on this, considering it's not specifically designed for astronaut imagery! But its feature extraction is too slow and features are way too heavy for practical deployment
0
0
4
@gabriberton
Gabriele Berton
9 days
@jon_barron For me what makes pytorch great is the large user control over the training. In practice I never saw more than 5 losses being used, and most pipelines use 1 or 2
0
0
4
@gabriberton
Gabriele Berton
9 days
@AntonObukhov1 Doesn't break batchnorm actually
0
0
4
@gabriberton
Gabriele Berton
1 month
@pesarlin @dantkz Or if you're looking for a more generic solution, then AnyLoc is the way @Nik__V__ @JayKarhade
0
0
4
@gabriberton
Gabriele Berton
1 month
@dantkz Yeah, not too much progress between 2015 (NetVLAD code released) and 2022 (I'd cite SFRS/OpenIBL as main highlight of this VPR winter). Around 2022 VPR has taken off, but we're usually focusing on urban large-scale datasets, not much on generalization/indoor/nature
0
0
4
@gabriberton
Gabriele Berton
3 months
@Parskatt @ducha_aiki @AlexStoken @bcaputo_iit @BokmanGeorg So it's a bit different from the setting from @AlexStoken FMAP paper (and Steerers)
0
0
4
@gabriberton
Gabriele Berton
3 months
@AlexStoken FMAP has localized over 100k images since last year - with retrieval-oriented EarthLoc, we have localized over 500k in just a few days!
2
0
3
@gabriberton
Gabriele Berton
3 months
@Parskatt @ducha_aiki @AlexStoken @bcaputo_iit @BokmanGeorg We're working on Steerers too (and a dozen more models). For the post-processing both images share "up": in EarthLoc we compute retrieval with 4 rotations (every 90°) of each DB image. So in case of "perfect" retrieval, they share similar orientation
2
0
4
@gabriberton
Gabriele Berton
3 months
@BokmanGeorg @AlexStoken @Parskatt @ducha_aiki @bcaputo_iit Model weights are now published, I'm now really curious to see if that works 👀
1
0
4
@gabriberton
Gabriele Berton
19 days
@akuwantsagi @giffmana Yep, can be done :) We automated localization of photos taken from the ISS (a task mostly done by hand till recently) using similar techniques that are used for visual place recognition (using retrieval instead of classification) in our CVPR24 paper
1
0
4
@gabriberton
Gabriele Berton
4 months
@CVPR @CVPRConf 2) For news articles and pre-existing libraries not generated by the authors, I think ACs would likely see this as acceptable as long as it was clearly relevant and needed to address reviewer comments.
1
0
3
@gabriberton
Gabriele Berton
3 months
@isaaccorley_ @ducha_aiki @AlexStoken @bcaputo_iit normalization, so I tested them with the default (ImageNet). Can you share some info on which normalization should be used on those? I agree that it could have a big impact on results
0
0
1
@gabriberton
Gabriele Berton
9 days
@jon_barron Not really, I love pytorch and I don't see anything wrong with this behaviour
1
0
4
@gabriberton
Gabriele Berton
4 months
@CVPR @CVPRConf Here and in the next reply is what they said: 1) If the link is already in the main paper, perhaps the rebuttal could just point out the link in the main paper using line numbers, to avoid any perception of policy violation.
1
0
3
@gabriberton
Gabriele Berton
4 months
@Parskatt There are city-wide street-view datasets freely accessible (not from Google), some coupled with 3D meshes of the city. E.g. from San Francisco images 3D mesh I don't understand why more people aren't using these
3
0
3
@gabriberton
Gabriele Berton
9 days
@jon_barron @gazorp5 I agree with the troll answer though 😂 In fact the reasoning behind my tweet is similar to the one behind gradient accumulation: instead of computing the loss with bs 1024, compute 8 sub-losses with bs 128, and run the backward sequentially for each of the 8 losses
0
0
3
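The sub-batch idea described above (8 sub-losses of batch size 128 instead of one loss of batch size 1024) can be sketched as follows. The model and data are hypothetical stand-ins. Dividing each sub-loss by the number of chunks is an assumption made here so that the accumulated gradient matches the full-batch mean loss. The model has no batch norm, which another reply in this thread flags as a problem for small sub-batches.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(8, 1)
images = torch.randn(1024, 8)   # stand-in data
targets = torch.randn(1024, 1)
criterion = nn.MSELoss()        # mean reduction over the batch

# Full batch: one loss, one backward.
model.zero_grad()
criterion(model(images), targets).backward()
full = [p.grad.clone() for p in model.parameters()]

# 8 sub-batches of 128: one backward per sub-loss, gradients accumulate.
num_chunks = 8
model.zero_grad()
for x, y in zip(images.chunk(num_chunks), targets.chunk(num_chunks)):
    (criterion(model(x), y) / num_chunks).backward()
chunked = [p.grad.clone() for p in model.parameters()]

matches = all(torch.allclose(a, b, atol=1e-5) for a, b in zip(full, chunked))
```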
@gabriberton
Gabriele Berton
19 days
@giffmana @captnloic @akuwantsagi Agree that 100M+ can't densely cover the world, but in very large-scale VPR experiments we ran (I've been working on this 4+ years) we (unexpectedly) found that a good VPR system retrieves images from nearby locations to the query, even if there is absolutely no covisibility
1
0
3
@gabriberton
Gabriele Berton
2 months
@AlexStoken @ducha_aiki @Parskatt @BokmanGeorg @zhenjun_zhao I opened the comments to write the same thing! That section will save many people a lot of time, I wish more papers had that
0
0
3
@gabriberton
Gabriele Berton
1 month
@pesarlin @dantkz Good to know that retrieval is the main bottleneck in LaMAR. Which metric would you use to benchmark retrieval methods on it (without having to run the matching as well)?
2
0
3
@gabriberton
Gabriele Berton
3 months
Photos are taken with a DSLR camera from the ISS cupola. The ISS location at the moment of taking the photo is known, but the footprint of the photo (i.e. the area represented in it) can be thousands of km away, making the localization problem very challenging and large-scale.
1
0
2
@gabriberton
Gabriele Berton
2 years
@CVPR what's the use of the "single “slide” serving as the high-level summary of your work"? Should it be a simpler version of the poster for virtual attendees, or more like a thumbnail?
1
0
1
@gabriberton
Gabriele Berton
9 days
@evgeniyzhe @lwangoku99066 The main use case for me is when I'm using different images for different losses
1
0
6
@gabriberton
Gabriele Berton
1 month
@pesarlin @dantkz BTW if you're looking for indoor VPR models, a student in our lab released these two models (just two models tuned on indoor, we're not planning a paper on it)
1
0
3
@gabriberton
Gabriele Berton
2 years
@ducha_aiki Thank you for the tweet :) Just I would say that the biggest contribution of the paper is not the dataset, rather the scalable training method that allows us to largely outperform previous methods on all datasets
0
0
3
@gabriberton
Gabriele Berton
3 months
@BokmanGeorg @AlexStoken @Parskatt @ducha_aiki @bcaputo_iit Really cool, thanks for trying! I don't really understand what the images are supposed to show though, but if the steerer achieves the goal of avoiding 4xTTA and improving results, that would be a huge win!
1
0
3
@gabriberton
Gabriele Berton
2 years
@giotolias @SattlerTorsten @pesarlin @Poyonoz @SergeBelongie @karpathy @alexgkendall +1 on the "train for classification and repurpose the network for retrieval" part, that's pretty much what we do in one of our CVPR22 papers. It's quite interesting to see that results are already quite good (definitely super-human level) for a very large city
2
0
3
@gabriberton
Gabriele Berton
19 days
@akuwantsagi @giffmana Astronaut photos are used for disaster management and climate research. Localizing random (ground) photos has fewer applications afaik, but the fact that some startups are working on this (e.g. ) makes me think that there are more use cases than we imagine
2
0
3
@gabriberton
Gabriele Berton
3 months
@AlexStoken If you're a computer vision scientist looking for a new and unexplored task with lots of practical applications (and lots of available data!) then Astronaut Photography Localization is the task for you! Feel free to ping me for any question.
1
1
2
@gabriberton
Gabriele Berton
3 months
They are also used to measure, track and discover wildfires and floods.
1
0
1
@gabriberton
Gabriele Berton
2 years
@giotolias @SattlerTorsten @pesarlin @Poyonoz @SergeBelongie @karpathy @alexgkendall For world scale, I'd go for a coarse-to-fine approach (i.e. first classification then retrieval) though it would require something like a 10B+ images dataset (and it would only work with queries that are not far from a road)
0
0
1
@gabriberton
Gabriele Berton
3 months
@BokmanGeorg @AlexStoken @Parskatt @ducha_aiki @bcaputo_iit And I'm also really happy to already see some development on this. Being a new task, there is a ton of low-hanging fruit ready to be picked - and improving results on this task has direct real world benefits on many domains
1
0
2
@gabriberton
Gabriele Berton
8 months
@AdamDHines Really cool, considering that it's not explicitly trained for this! I just tried these two photos on this amazing demo from @_ericmb which is trained for "world-wide geolocation" and it correctly predicts Brisbane for both images!
1
0
2
@gabriberton
Gabriele Berton
3 months
Given their importance, these images used to be manually localized at NASA (300k have manual labels!). Then @AlexStoken came up with FMAP (from CVPR 2023) which computes image matching between the astronaut photo and all its potential matches (spanning over millions of sq km).
1
0
2
@gabriberton
Gabriele Berton
2 years
(Oral) Deep Visual Geo-localization Benchmark
Poster session 2.1 (Wed 10:00)
A benchmark that provides a flexible codebase which we used to define best practices for the task!
1
2
2
@gabriberton
Gabriele Berton
8 months
Unfortunately, all three presentations are simultaneous on Thursday the 5th (posters at 10:30-12:30 Room Foyer Sud and demo 10:30-16:00 Demos Area): I'll be at the EigenPlaces poster in the morning and the demo in the afternoon, if you're at ICCV come to say hi!
1
0
1
@gabriberton
Gabriele Berton
9 days
@surmenok @lwangoku99066 In this ICCV paper I use 3 losses, each of which uses different data. Also the trick can be used (with some adjustments) when using a shared frozen backbone and multiple trainable heads (and the same batch). I wouldn't say it's rare, but neither too common
0
1
2
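One possible reading of the "shared frozen backbone and multiple trainable heads" case is sketched below. The tweet does not spell out the adjustments, so this is an assumption rather than the author's code. The features are computed once without building a graph, and each head then gets its own backward. The layer sizes and losses are hypothetical.

```python
import torch
from torch import nn

torch.manual_seed(0)
backbone = nn.Linear(8, 16).requires_grad_(False)  # frozen, shared
head1 = nn.Linear(16, 4)                           # trainable
head2 = nn.Linear(16, 2)                           # trainable
images = torch.randn(32, 8)                        # the same batch for both heads

# The backbone is frozen, so no graph is needed for it.
with torch.no_grad():
    feats = backbone(images)

# One backward per head: the graph of head1 is freed before head2 runs.
head1(feats).pow(2).mean().backward()
head2(feats).abs().mean().backward()

heads_have_grads = all(p.grad is not None
                       for head in (head1, head2)
                       for p in head.parameters())
backbone_untouched = all(p.grad is None for p in backbone.parameters())
```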
@gabriberton
Gabriele Berton
3 months
@francoisfleuret @PyTorch @cHHillee It's crazy that nobody has yet found an answer to this, I'm so curious now
0
0
2
@gabriberton
Gabriele Berton
3 months
@AlexStoken Possible future directions might include: training directly on (some of the) localized queries, using seasonal data, applying domain adaptation, using available metadata (like camera lens) during training, using stronger backbones, and more.
1
0
1
@gabriberton
Gabriele Berton
4 months
@CSProfKGD @CVPR @CVPRConf Thanks for the advice, I'll stick to that. I'm just worried that R2 isn't going to spend 10 seconds of their time to find that link in the paper 😅 whereas providing a direct link might encourage them to click on it. Same for news articles, I doubt they'll google them on their own
0
0
2
@gabriberton
Gabriele Berton
11 days
@Parskatt E.g. in these repos the code takes care of downloading the model weights. They work automatically out of the box and can be used on other data as well and
0
0
2
@gabriberton
Gabriele Berton
9 days
@eccvconf @CVPR @ICCVConference I mean releasing the final reviews as soon as the reviews are finalized by reviewers (which is now for ECCV) instead of waiting until there is the final decision from the AC
0
0
2
@gabriberton
Gabriele Berton
3 months
@eccvconf after the Feb 29th registration deadline, can we still change the abstract, title, and other submission fields?
2
0
2
@gabriberton
Gabriele Berton
3 months
@RikyUnreal @phi48 Correct, we didn't try EarthLoc (and it wouldn't work) on night images, but we will develop something for those in future developments
0
0
2
@gabriberton
Gabriele Berton
7 days
@jon_barron Do you think this could really happen? I wish it did, but I think in practice this kind of behavior (or other scientific malpractice) is not punished harshly enough. And it's also hard to prove
1
0
3
@gabriberton
Gabriele Berton
9 days
@dariocazzani Only if they share the computational graph, otherwise no. And if they do share, then I wouldn't use retain_graph, I'd just add the losses and compute a single backward
2
0
3
@gabriberton
Gabriele Berton
2 months
@TimDarcet Really cool work! It will be really interesting to see how this data will be used for forest management. The project is quite similar to the BioMassters challenge where the goal was to estimate biomass (I'm now wondering if the tree height has a 1-to-1 correlation with biomass)
0
0
2
@gabriberton
Gabriele Berton
19 days
@giffmana @captnloic @akuwantsagi And regarding real-world applications, all those mentioned above would have a human in the loop and don't require real-time, so retrieval would be more suitable (retrieval produces visual confirmation unlike classification). Probably classif+retrieval is the best choice though
0
0
2
@gabriberton
Gabriele Berton
4 months
@Parskatt Also this one from Denmark by @FrederikWarburg @jcivera
0
0
2
@gabriberton
Gabriele Berton
3 months
Astronaut photos span decades and have high spatial resolution, making them ideal to study long-term climate change effects. Unlike most satellite imagery, they can also be taken at an angle, which is helpful to measure heights (e.g. smoke height from volcanic eruptions)
1
0
2
@gabriberton
Gabriele Berton
3 months
We propose to use image retrieval to find the image in a database of satellite imagery, with known footprints, that is most similar to the astronaut photo. Our database represents all the area visible to an astronaut at photo capture time.
1
0
1
@gabriberton
Gabriele Berton
3 months
@isaaccorley_ @ducha_aiki @AlexStoken @bcaputo_iit I think one reason for their poor results is that those models are not actually trained for the task of APL (because no models existed for APL, I tried a lot of models for many related tasks). But TBH on torchgeo I didn't find anything about those models using different ...
2
0
1
@gabriberton
Gabriele Berton
1 month
@oh_that_hat Sunday afternoon in Mauerpark is a must (Berlin wall+graffiti+karaoke+street artists). Cycling from Alexanderplatz to the Brandenburg Gate, and cycling around Tempelhofer Feld (a huge ex airport which is now a park, full of people enjoying life). Also try a Biergarten and Sisyphos
0
0
2
@gabriberton
Gabriele Berton
9 days
@Nik__V__ Now I feel the pressure 😂
0
0
2
@gabriberton
Gabriele Berton
3 months
@Nik__V__ @Parskatt @ducha_aiki @AlexStoken @bcaputo_iit True, we could add AnyLoc + PCA but then I'd have to add PCA to other methods too 😅 Looking forward to the speed fix, I can still update the paper (extraction speed) for the camera ready 😉
0
0
2
@gabriberton
Gabriele Berton
9 days
@OutofAi Not sure about numerical stability, IMO the results with and without the trick are exactly identical
2
0
2
@gabriberton
Gabriele Berton
9 days
@zatxhi I mean if you have different inputs for each loss, then the trick is as simple as what I posted above. If you have something like
out = backbone(images)
loss1 = do_loss1(out)
loss2 = do_loss2(out)
then you can also use the trick above (but the feats need to be cloned first)
0
0
4
@gabriberton
Gabriele Berton
1 month
@Suomela_L @ieee_ras_icra @JoniKamarainen Really cool work, faster and more accurate than previous solutions! See you at ICRA in 2 weeks!
1
0
2
@gabriberton
Gabriele Berton
11 days
@Parskatt Usually repos have a readme which tells you first to download code and data, then run . It would be much easier if the download was automatic. Then of course, the code should also be usable with other datasets/models as a library
2
0
1
@gabriberton
Gabriele Berton
2 years
@ducha_aiki Does it make sense to open it in multiple tabs on my browser?
1
0
1
@gabriberton
Gabriele Berton
9 days
@cthorrez Not only that. It is also if you're training the same model with different batches for each loss, or if you're passing the same batch through the model once for each loss
0
0
1
@gabriberton
Gabriele Berton
2 months
@v_lomonaco Thank you!
0
0
1
@gabriberton
Gabriele Berton
11 months
@kagglingdieter @nvidia @kaggle Any chance that the talk will be recorded and publicly shared?
0
0
0
@gabriberton
Gabriele Berton
2 years
@relja_work @ducha_aiki @SattlerTorsten @bcaputo_iit So happy to have feedback from you :) I totally agree that those things should be investigated. A good thing about a contrastive loss would also be the better scalability to large datasets, given that mining wouldn't be required
1
0
1
@gabriberton
Gabriele Berton
9 days
@zatxhi It depends on the implementation - are you using different batches for the two losses? Is the backbone/common branch frozen?
1
0
1
@gabriberton
Gabriele Berton
3 months
@chrfrde @iss In this thread there is a conversation about it. It seems that NASA tried something like that in 2015
@stim3on
Simeon Schmauß
7 months
@this_is_tckb @AlexStoken Maybe it's possible to create a small device that could be attached to the camera hot shoe that would record an accurate timestamp and pointing information which could be synced with the images after the fact.
2
0
0
0
0
1
@gabriberton
Gabriele Berton
9 days
@cthorrez Not really, if they use the same encoder but separately then they're disentangled. They're not disentangled if for example they use the same features, like
feats = model(images)
l1 = compute_loss1(feats)
l2 = compute_loss2(feats)
1
0
1
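The entangled case from the reply above can be sketched as follows, with a hypothetical model and two toy losses. When both losses are computed from the same `feats`, the first backward frees the shared part of the graph, so a second backward fails unless `retain_graph=True` is passed. Adding the losses and calling backward once, as suggested elsewhere in this thread, avoids the problem.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(8, 4)
images = torch.randn(16, 8)

# Entangled: both losses are computed from the same features.
feats = model(images)
l1 = feats.pow(2).mean()
l2 = feats.abs().mean()

l1.backward()  # frees the graph of model(images), which l2 also needs
try:
    l2.backward()
    second_backward_failed = False
except RuntimeError:
    second_backward_failed = True

# Alternative for shared graphs: add the losses, single backward.
model.zero_grad()
feats = model(images)
(feats.pow(2).mean() + feats.abs().mean()).backward()
grads_ok = all(p.grad is not None for p in model.parameters())
```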
@gabriberton
Gabriele Berton
3 months
@BokmanGeorg @Parskatt @ducha_aiki @AlexStoken @bcaputo_iit You can either make the embeddings rotation invariant (faster option), or use rotation-dependent embeddings and generate many rotated views of the database (slower but more accurate). We did something in between, with 4 rotations of the database
1
0
1
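The "4 rotations of the database" option can be sketched as follows. This is not the EarthLoc code: `embed` is a hypothetical stand-in for a rotation-dependent retrieval model (it just flattens and L2-normalizes pixels), and the images are random tensors.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def embed(images):
    # Hypothetical stand-in for a rotation-dependent embedding model.
    return F.normalize(images.flatten(1), dim=1)

database = torch.rand(5, 3, 16, 16)  # 5 database images

# Four rotated copies (0°, 90°, 180°, 270°) of every database image.
rotated = torch.stack(
    [torch.rot90(database, k, dims=(2, 3)) for k in range(4)], dim=1
)                                              # shape (5, 4, 3, 16, 16)
db_embeddings = embed(rotated.flatten(0, 1))   # shape (20, D), index = image * 4 + k

# Query: database image 2, seen with a 90° rotation.
query = torch.rot90(database[2], 1, dims=(1, 2)).unsqueeze(0)
scores = embed(query) @ db_embeddings.T        # cosine similarities, shape (1, 20)
best_image, best_rotation = divmod(scores.argmax().item(), 4)
```

Here `best_image` is 2 and `best_rotation` is 1, so the retrieved database view has the same orientation as the query. The database embeddings are 4x larger than without rotations.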
@gabriberton
Gabriele Berton
8 days
@dodlapati_reddy I've used this for many years with many models - from ResNet18 to DINO-v2
1
0
1
@gabriberton
Gabriele Berton
9 days
@xperseguers Pytorch doesn't replace the gradient, it adds it to any existing gradient. So just don't call zero_grad in between the two backwards and you don't miss the gradients for any loss
0
0
2