This simple PyTorch trick will cut your GPU memory use in half / double your batch size (for real). Instead of summing the losses and then calling backward once, compute the backward on each loss separately (which frees that loss's computational graph). Results will be exactly identical
Limitations: this only works when you have more than one loss (with disentangled computational graphs).
Bonus: the more losses you have, the more memory you'll save
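A minimal sketch of the trick (hypothetical model and losses, not from the original thread), showing that per-loss backward yields the same gradients as summing, because PyTorch accumulates into `.grad`:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(10, 10)
x1, x2 = torch.randn(8, 10), torch.randn(8, 10)

# Variant A: sum the losses, single backward (both graphs stay in memory)
loss = model(x1).pow(2).mean() + model(x2).abs().mean()
loss.backward()
grad_sum = model.weight.grad.clone()

# Variant B: backward per loss (each graph is freed right after its backward)
model.zero_grad()
model(x1).pow(2).mean().backward()  # graph of loss1 is freed here
model(x2).abs().mean().backward()   # gradients accumulate into .grad
grad_per_loss = model.weight.grad

print(torch.allclose(grad_sum, grad_per_loss))  # True: identical results
```

The `pow(2)`/`abs()` losses are placeholders; any two losses with disentangled graphs behave the same way.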
Ok I'll use this - I'm looking for computer vision jobs/internships from November, anywhere in the world. I have industry experience (Amazon and more) and good publications (first-authored 6 CVPR/ICCV papers). Background in VPR/IR but expanding my horizons. ~1000 stars on GitHub
Next week I'll be at ICCV with two papers and one demo on Visual Place Recognition!
Our demo is about geo-localizing images with city-wide 3D models. You can send us your photos and we try to geo-localize them! We have 3D models of Paris, San Francisco and Berlin! 🧵
@ISS
astronauts take photos of Earth, which are traditionally manually localized for disaster management and climate change research: our
#CVPR2024
paper shows that localization can be automated! The localized photos can be explored at
Short thread:
Tomorrow at
#CVPR2023
I'll be giving a 30-minute talk on geo-localization for large-scale scenarios at the tutorial on Visual Geo-Localization! If you're curious about the task, see you there! We'll show you how to geo-localize clips from famous movies!
Looking for a summer school in machine learning?
Check out this list of dozens of ML summer schools per year, including dates, locations, fees, and grants.
PRs with new summer schools / updated info are welcome!
Thanks to
@sshkhr16
for this awesome repo!
@lwangoku99066
Yeah but then the trick doesn't really help (it doesn't save memory), so IMO if the losses share part of the graph then doing "loss = loss1 + loss2" is fine
@ducha_aiki
@Parskatt
We gave an answer to this question for 17 matching methods on an unconventional domain (satellite images) in our latest paper (EarthMatch). Most methods plateau between 512 and 768, RoMa thrives at 1024, SIFT-LG (one of our best performing) at 256
@PMinervini
This accumulates the gradient, but when people say "gradient accumulation" they refer to splitting the batch into sub-batches, which is quite different from the above.
Standard grad accumulation can lead to batch norm issues (BN doesn't like small batches); the code above doesn't
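For contrast, a minimal sketch of standard gradient accumulation (hypothetical model and loss): each sub-batch gets its own forward pass, so a BatchNorm layer would see only the small sub-batches of 128, not the full 1024:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
batch = torch.randn(1024, 10)
accum_steps = 8

# Standard gradient accumulation: one forward/backward per sub-batch,
# then a single optimizer step.
opt.zero_grad()
for sub in batch.chunk(accum_steps):
    # Scale so the accumulated gradient matches the full-batch mean
    loss = model(sub).pow(2).mean() / accum_steps
    loss.backward()
accum_grad = model.weight.grad.clone()  # equals the bs=1024 gradient
opt.step()
```

The `pow(2).mean()` loss is a placeholder; the scaling by `accum_steps` is what keeps the result equivalent to one full-batch forward (up to float error, and ignoring BN statistics).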
Are you at
#CVPR
?
Come to check our 2 papers on visual geolocalization and visual place recognition!
Both are open-source and fully reproducible, and we have a cool demo at the poster session!
More info below
The Vandal Lab is at
#CVPR2022
with several members, from June 19-24. We are ecstatic to present our 8 papers on First Person Action Recognition, Semantic Segmentation, Visual Geo-localization, Object Detection, Incremental Learning.
See you all there!
What is the reason for not releasing final reviews to authors in conferences like
@eccvconf
,
@CVPR
,
@ICCVConference
?
It would be pretty helpful for ECCV authors to have the final reviews right now - we'd have a better idea of P(acceptance)
The perfect GitHub repo allows you to
git clone -> pip install -> python
and should work automagically.
The code should download data and models if necessary
Hi
@CVPR
@CVPRConf
could we have some further details on the "no links in rebuttal rule"?
E.g., if a link was already in the main paper (but we think reviewers missed it), can we add it?
And what about links to news articles? Alternatively, can we add them as references?
I've created a simple wget-like tool to download big files with Python.
Just `$ pip install py3_wget` and use it like `import py3_wget; py3_wget.download_file(url)`.
It's useful for downloading data and models within the code to improve usability/reproducibility
Rethinking Visual Geo-localization for Large-Scale Applications
Poster session 1.2 (Tue 2:30)
We present a 40M+ dataset of San Francisco, and a new highly scalable training method which reaches SOTA at a fraction of the cost
@ducha_aiki
@AlexStoken
@bcaputo_iit
Thank you for the tweet! The main contribution of the paper is proving that the task of Astronaut Photography Localization is actually solvable with large-scale image retrieval (nobody had tried it before, and those images were mostly localized manually before EarthLoc)
@Parskatt
@ducha_aiki
@AlexStoken
@bcaputo_iit
AnyLoc is really, really good on this, considering it's not specifically designed for astronaut imagery!
But its feature extraction is too slow and its features are way too heavy for practical deployment
@jon_barron
For me what makes PyTorch great is the degree of control it gives the user over training. In practice I've never seen more than 5 losses being used, and most pipelines use 1 or 2
@dantkz
Yeah, not too much progress between 2015 (NetVLAD code released) and 2022 (I'd cite SFRS/OpenIBL as main highlight of this VPR winter). Around 2022 VPR has taken off, but we're usually focusing on urban large-scale datasets, not much on generalization/indoor/nature
@Parskatt
@ducha_aiki
@AlexStoken
@bcaputo_iit
@BokmanGeorg
We're working on Steerers too (and a dozen more models). As post-processing, to make both images share the same "up", in EarthLoc we compute retrieval with 4 rotations (every 90°) of each DB image. So in the case of "perfect" retrieval, they share a similar orientation
@akuwantsagi
@giffmana
Yep, can be done :) We automated localization of photos taken from the ISS (a task mostly done by hand till recently) using similar techniques that are used for visual place recognition (using retrieval instead of classification) in our CVPR24 paper
@CVPR
@CVPRConf
2) For news articles and pre-existing libraries not generated by the authors, I think ACs would likely see this as acceptable as long as it was clearly relevant and needed to address reviewer comments.
@isaaccorley_
@ducha_aiki
@AlexStoken
@bcaputo_iit
normalization, so I tested them with the default (ImageNet). Can you share some info on which normalization should be used on those? I agree that it could have a big impact on results
@CVPR
@CVPRConf
Here and in the next reply is what they said:
1) If the link is already in the main paper, perhaps the rebuttal could just point out the link in the main paper using line numbers, to avoid any perception of policy violation.
@Parskatt
There are city-wide street-view datasets freely accessible (not from Google), some coupled with 3D meshes of the city.
E.g. from San Francisco
images
3D mesh
I don't understand why more people aren't using these
@jon_barron
@gazorp5
I agree with the troll answer though 😂
In fact the reasoning behind my tweet is similar to the one behind gradient accumulation: instead of computing the loss with batch size 1024, compute 8 sub-losses with batch size 128, and run the backward sequentially for each of the 8 losses
@giffmana
@captnloic
@akuwantsagi
Agree that 100M+ images can't densely cover the world, but in very large-scale VPR experiments we ran (I've been working on this for 4+ years) we (unexpectedly) found that a good VPR system retrieves images from locations near the query, even if there is absolutely no covisibility
@pesarlin
@dantkz
Good to know that retrieval is the main bottleneck in LaMAR. Which metric would you use to benchmark retrieval methods on it (without having to run the matching as well)?
Photos are taken with a DSLR camera from the ISS cupola. The ISS location at the moment of taking the photo is known, but the footprint of the photo (i.e. the area represented in it) can be thousands of km away, making the localization problem very challenging and large-scale.
@CVPR
what's the use of the "single “slide” serving as the high-level summary of your work"?
Should it be a simpler version of the poster for virtual attendees, or more like a thumbnail?
@pesarlin
@dantkz
BTW if you're looking for indoor VPR models, a student in our lab released these two models (just two models tuned on indoor, we're not planning a paper on it)
@ducha_aiki
Thank you for the tweet :) Just I would say that the biggest contribution of the paper is not the dataset, rather the scalable training method that allows us to largely outperform previous methods on all datasets
@BokmanGeorg
@AlexStoken
@Parskatt
@ducha_aiki
@bcaputo_iit
Really cool, thanks for trying! I don't really understand what the images are supposed to show though, but if the steerer achieves the goal of avoiding 4xTTA and improving results, that would be a huge win!
@akuwantsagi
@giffmana
Astronaut photos are used for disaster management and climate research. Localizing random (ground) photos has less applications afaik, but the fact that some startups are working on this (e.g. ) makes me think that there are more use cases than we imagine
@AlexStoken
If you're a computer vision scientist looking for a new and unexplored task with lots of practical applications (and lots of available data!) then Astronaut Photography Localization is the task for you! Feel free to ping me for any question.
Kudos to all co-authors
@gabTrivv
,
@lolleko_
,
@masone_carlo
, Juan Aragon, Barbara Caputo, Thomas Pollok and Jürgen Beyerer!
Links to code and trained models of EigenPlaces:
and Divide&Classify:
@BokmanGeorg
@AlexStoken
@Parskatt
@ducha_aiki
@bcaputo_iit
And I'm also really happy to see some development on this already. Being a new task, there is a ton of low-hanging fruit ready to be picked - and improving results on this task has direct real-world benefits in many domains
@AdamDHines
Really cool, considering that it's not explicitly trained for this!
I just tried these two photos on this amazing demo from
@_ericmb
which is trained for "world-wide geolocation" and it correctly predicts Brisbane for both images!
Given their importance, these images used to be manually localized at NASA (300k have manual labels!). Then
@AlexStoken
came up with FMAP (from CVPR 2023) which computes image matching between the astronaut photo and all its potential matches (spanning over millions of sq km).
(Oral) Deep Visual Geo-localization Benchmark
Poster session 2.1 (Wed 10:00)
A benchmark that provides a flexible codebase which we used to define best practices for the task!
Unfortunately, all three presentations are simultaneous on Thursday the 5th (posters at 10:30-12:30 Room Foyer Sud and demo 10:30-16:00 Demos Area): I'll be at the EigenPlaces poster in the morning and the demo in the afternoon, if you're at ICCV come to say hi!
@surmenok
@lwangoku99066
In this ICCV paper I use 3 losses, each of which uses different data
Also the trick can be used (with some adjustments) when using a shared frozen backbone and multiple trainable heads (and the same batch). I wouldn't say it's rare, but it's not too common either
@AlexStoken
Possible future directions might include: training directly on (some of the) localized queries, using seasonal data, applying domain adaptation, using available metadata (like camera lens) during training, using stronger backbones, and more.
@CSProfKGD
@CVPR
@CVPRConf
Thanks for the advice, I'll stick to that.
I'm just worried that R2 isn't going to spend 10 seconds of their time finding that link in the paper 😅 whereas providing a direct link might encourage them to click on it.
Same for news articles, I doubt they'll google them on their own
@Parskatt
E.g. in these repos the code takes care of downloading the model weights. They work automatically out of the box and can be used on other data as well
and
@eccvconf
@CVPR
@ICCVConference
I mean releasing the final reviews as soon as the reviews are finalized by reviewers (which is now for ECCV) instead of waiting until there is the final decision from the AC
@RikyUnreal
@phi48
Correct, we didn't try EarthLoc on night images (and it wouldn't work on them), but we plan to develop something for those in future work
@jon_barron
Do you think this could really happen? I wish it did, but I think in practice this kind of behavior (or other scientific malpractice) is not punished harshly enough. And it's also hard to prove
@dariocazzani
Only if they share the computational graph, otherwise no. And if they do share, then I wouldn't use retain_graph, I'd just add the losses and compute a single backward
@TimDarcet
Really cool work! It will be really interesting to see how this data will be used for forest management. The project is quite similar to the BioMassters challenge where the goal was to estimate biomass (I'm now wondering if the tree height has a 1-to-1 correlation with biomass)
@giffmana
@captnloic
@akuwantsagi
And regarding real-world applications, all those mentioned above would have a human in the loop and don't require real-time, so retrieval would be more suitable (retrieval produces visual confirmation unlike classification). Probably classif+retrieval is the best choice though
Astronaut photos span decades and have high spatial resolution, making them ideal to study long-term climate change effects. Unlike most satellite imagery, they can also be taken at an angle, which is helpful to measure heights (e.g. smoke height from volcanic eruptions)
We propose to use image retrieval over an image in a database of satellite imagery, with known footprints, that is most similar to the astronaut photo. Our database represents all the area visible to an astronaut at photo capture time.
@isaaccorley_
@ducha_aiki
@AlexStoken
@bcaputo_iit
I think one reason for their poor results is that those models are not actually trained for the task of APL (because no models existed for APL, I tried a lot of models for many related tasks). But TBH on torchgeo I didn't find anything about those models using different ...
@oh_that_hat
Sunday afternoon in Mauerpark is a must (Berlin Wall + graffiti + karaoke + street artists). Cycling from Alexanderplatz to the Brandenburg Gate, and cycling around Tempelhofer Feld (a huge ex-airport which is now a park, full of people enjoying life).
Also try a beer garden and Sisyphos
@zatxhi
I mean if you have different inputs for each loss, then the trick is as simple as what I posted above.
If you have something like
out = backbone(images)
loss1 = do_loss1(out)
loss2 = do_loss2(out)
Then you can also use the trick above (but the feats need to be cloned first)
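One way to adapt the trick when the losses share the backbone features (a sketch with hypothetical backbone/head names, using detach rather than clone): backward each head separately to free its small graph, then push the accumulated feature gradient through the backbone once:

```python
import torch

torch.manual_seed(0)
backbone = torch.nn.Linear(10, 16)
head1 = torch.nn.Linear(16, 4)   # hypothetical task heads
head2 = torch.nn.Linear(16, 4)
images = torch.randn(8, 10)

out = backbone(images)
# Detach so each head builds its own small graph on top of the features:
feats = out.detach().requires_grad_(True)

head1(feats).pow(2).mean().backward()  # frees head1's graph only
head2(feats).abs().mean().backward()   # frees head2's graph only

# One final backward pushes the accumulated feature gradient
# through the backbone's graph
out.backward(feats.grad)
```

The resulting gradients (backbone and heads) match summing both losses and calling backward once, while only one head's graph is alive at a time.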
@Parskatt
Usually repos have a README which tells you first to download code and data, then run . It would be much easier if the download were automatic. Then of course, the code should also be usable with other datasets/models as a library
@cthorrez
Not only that. It also works if you're training the same model with different batches for each loss, or if you're passing the same batch through the model once for each loss
@relja_work
@ducha_aiki
@SattlerTorsten
@bcaputo_iit
So happy to have feedback from you :)
I totally agree that those things should be investigated.
A good thing about a contrastive loss would also be the better scalability to large datasets, given that mining wouldn't be required
@this_is_tckb
@AlexStoken
Maybe it's possible to create a small device that could be attached to the camera hot shoe that would record an accurate timestamp and pointing information which could be synced with the images after the fact.
@cthorrez
Not really, if they use the same encoder but separately then they're disentangled.
They're not disentangled if for example they use the same features, like
feats = model(images)
l1 = compute_loss1(feats)
l2 = compute_loss2(feats)
@BokmanGeorg
@Parskatt
@ducha_aiki
@AlexStoken
@bcaputo_iit
You can either make the embeddings rotation invariant (faster option), or use rotation-dependent embeddings and generate many rotated views of the database (slower but more accurate).
We did something in between, with 4 rotations of the database
@xperseguers
PyTorch doesn't replace the gradient, it adds it to any existing gradient. So just don't call zero_grad between the two backward calls and you won't miss the gradient from either loss
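A tiny sketch of this accumulation behavior, with a toy scalar:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
(3 * x).backward()
print(x.grad)  # tensor(3.)
(4 * x).backward()  # no zero_grad in between: gradients add up
print(x.grad)  # tensor(7.)
```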