Jerome Revaud

@JeromeRevaud

1,229
Followers
100
Following
24
Media
370
Statuses

AI researcher :: Computer vision :: 3D vision :: Team Lead :: Project Lead :: Naver Labs Europe

Grenoble, France
Joined June 2023
@JeromeRevaud
Jerome Revaud
3 months
Another convenient usage of DUSt3R (kind of anecdotal): while looking for a rental apartment on Airbnb for my vacation, I noticed I had difficulty grasping the layout and space of the apartment from photos alone. Solution: put all the photos into DUSt3R and voilà :)
15
82
496
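A minimal sketch of this kind of multi-photo reconstruction, assuming the public naver/dust3r repository and its README-era Python API (the checkpoint name, image size and optimizer settings below are assumptions and may have changed):

# Minimal sketch, assuming the public naver/dust3r repo; names and defaults may differ.
import torch
from dust3r.inference import inference
from dust3r.model import AsymmetricCroCo3DStereo
from dust3r.utils.image import load_images
from dust3r.image_pairs import make_pairs
from dust3r.cloud_opt import global_aligner, GlobalAlignerMode

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AsymmetricCroCo3DStereo.from_pretrained(
    "naver/DUSt3R_ViTLarge_BaseDecoder_512_dpt").to(device)

# A handful of apartment photos: no intrinsics, no poses, just images.
images = load_images(["photo1.jpg", "photo2.jpg", "photo3.jpg"], size=512)
pairs = make_pairs(images, scene_graph="complete", prefilter=None, symmetrize=True)
output = inference(pairs, model, device, batch_size=1)

# Align all pairwise pointmaps into one common coordinate frame.
scene = global_aligner(output, device=device, mode=GlobalAlignerMode.PointCloudOptimizer)
scene.compute_global_alignment(init="mst", niter=300, schedule="cosine", lr=0.01)

pts3d = scene.get_pts3d()      # dense per-image pointmaps in the shared frame
poses = scene.get_im_poses()   # recovered camera poses
focals = scene.get_focals()    # recovered focal lengths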
@JeromeRevaud
Jerome Revaud
3 months
An example of how DUSt3R can do "impossible matching": given two images without any shared visual content (my office, obviously never seen during training), it can output an accurate reconstruction (no intrinsics, no poses!) in seconds
18
109
459
@JeromeRevaud
Jerome Revaud
5 months
We believe we made a breakthrough in Geometric 3D Vision: meet DUSt3R, an all-in-one 3D Reconstruction method
Tweet media one
13
55
267
@JeromeRevaud
Jerome Revaud
1 month
Our #DUSt3R is not even a spotlight, it's a poster, even though we had excellent reviews... I don't get it
@CSProfKGD
Kosta Derpanis
1 month
#CVPR2024 presentation decisions are now posted on OpenReview.
1
4
44
30
3
130
@JeromeRevaud
Jerome Revaud
2 months
Some more impressive 3D reconstruction examples from #DUSt3R kindly prepared by our boss ;-) @martinhu @naverlabseurope
@Vinc3nt_Leroy
Vincent Leroy
2 months
Truly amazed by the response of the community to DUSt3R, thanks a lot everyone! The repo is on fire 🔥🔥 We are busy RN with the deadline, but rest assured we will handle the open issues ASAP!
0
7
68
2
20
119
@JeromeRevaud
Jerome Revaud
1 month
UniDepth is a super interesting approach for jointly predicting depth and camera intrinsics. Conceptually, it's very similar to #DUSt3R , except for the second view and the representation of space coordinates. We tried to use their (azimuth,elevation,log_z) for #DUSt3R
@rsasaki0109
Ryohei Sasaki
1 month
UniDepth: Universal Monocular Metric Depth Estimation Zero-Shot Visualization YouTube (The Office - Parkour)
1
33
261
2
15
111
@JeromeRevaud
Jerome Revaud
24 days
Coming back from London, after being kindly invited to give a talk about DUSt3R + MASt3R in the Dyson lab of @AjdDavison at Imperial College and in the 3DV team of @LourdesAgapito at UCL. Had insightful discussions with members of both labs; it was truly a great time spent there ❤️
5
2
84
@JeromeRevaud
Jerome Revaud
23 days
@WayneINR @AjdDavison @LourdesAgapito Here are the slides of my recent invited talk about CroCo + DUSt3R + MASt3R. Note: it's a PDF, I know the videos won't work, but there's really nothing hitherto unseen. I can point to where to find each individual video upon request.
5
11
83
@JeromeRevaud
Jerome Revaud
2 months
Something I find fascinating with DUSt3R is its uncanny ability to perform auto-calibration... even with single images. Is it precise? Quite so, as you can see below. This challenges the prevailing notion that single-image calibration is ill-posed (ambiguous & under-constrained)!
Tweet media one
@nackjaylor
Jack Naylor
3 months
Sometimes you see a paper that just knocks your socks off in terms of coolness - this is one of those papers.
0
5
21
2
7
66
@JeromeRevaud
Jerome Revaud
3 months
Done, DUSt3R is now first on the leaderboard!
@eric_brachmann
Eric Brachmann
3 months
@JeromeRevaud @ducha_aiki @Parskatt Nice! Even if it's not in the paper, it cannot hurt to submit your best entry to the leaderboard to mark your territory 😉 You did the bulk of work anyway already.
1
0
5
3
4
58
@JeromeRevaud
Jerome Revaud
2 months
Nice! Note that DUSt3R is a very data-driven approach, and it has never seen any human bodies during training. More generally, it kind of struggles with situations it has never seen. Still, not so bad :)
@cocktailpeanut
cocktail peanut
3 months
Instant Video-to-3D with DUSt3R. DUSt3R generates a whole 3D scene from just a couple of images. What if it could: 1. Accept a VIDEO 2. Extract the video frames 3. Turn them into 3D? So I added a Gradio Video Component. Here's me generating 3D from a video cc: @naverlabseurope
6
50
279
2
7
49
@JeromeRevaud
Jerome Revaud
2 months
So InstantSplat obliterates all competing pose-free methods in terms of recovered poses and rendering quality. And it's 50-200x faster at that. Wow! Congrats @WayneINR @yuewang314 👏
Tweet media one
@janusch_patas
MrNeRF
2 months
TL;DR: some new black magic! "InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds" Paper: not yet Project: Cc @JeromeRevaud they use DUSt3R Method 1 | 2 ⬇️
5
31
245
2
1
48
@JeromeRevaud
Jerome Revaud
3 months
For reference, here is what my office truly looks like
Tweet media one
3
2
27
@JeromeRevaud
Jerome Revaud
1 month
@arankomatsuzaki Interesting how they excluded CroCo (NeurIPS’22), a visual foundation model for 3D vision which is *specifically designed* to remedy the lack of multi-view awareness. CroCo is powering #DUSt3R btw :)
@WeinzaepfelP
Weinzaepfel Philippe
11 months
Happy to present CroCo 🐊, our NeurIPS'22 paper on self-supervised learning tailored to geometric tasks by Cross-view Completion, during the 3DMV workshop at #CVPR2023 . Cross-View Completion extends Masked Image Modeling (MIM), eg MAE, with a second view of the same scene. 👇 1/8
1
1
20
3
0
26
@JeromeRevaud
Jerome Revaud
5 months
Existing image-based 3D reconstruction methods (e.g. COLMAP) rely on complex pipelines with many handcrafted steps. In contrast, DUSt3R's main component is a deep network that directly outputs a 3D model from 2 input images (no pose, no calibration required)
Tweet media one
2
1
26
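As a rough illustration of what "directly outputs a 3D model" looks like at the code level, here is a sketch of a single pairwise forward pass; it assumes the public naver/dust3r repo, and the output dictionary keys are assumptions based on one reading of that code:

# Sketch of the raw pairwise output; repo API and key names are assumptions.
from dust3r.inference import inference
from dust3r.model import AsymmetricCroCo3DStereo
from dust3r.utils.image import load_images
from dust3r.image_pairs import make_pairs

device = "cuda"
model = AsymmetricCroCo3DStereo.from_pretrained(
    "naver/DUSt3R_ViTLarge_BaseDecoder_512_dpt").to(device)

imgs = load_images(["view_a.jpg", "view_b.jpg"], size=512)  # two uncalibrated photos
pairs = make_pairs(imgs, scene_graph="complete", prefilter=None, symmetrize=True)
out = inference(pairs, model, device, batch_size=1)

pts3d_a = out["pred1"]["pts3d"]                # pointmap of view A, in A's camera frame
pts3d_b = out["pred2"]["pts3d_in_other_view"]  # pointmap of view B, also expressed in A's frame
conf_a = out["pred1"]["conf"]                  # per-pixel confidence maps
conf_b = out["pred2"]["conf"]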
@JeromeRevaud
Jerome Revaud
5 months
@ducha_aiki More examples, videos and demonstrations at
1
6
25
@JeromeRevaud
Jerome Revaud
5 months
The key insight is to output "pointmaps" = dense 2D-3D one-to-one mappings that live in the same 3D coordinate space. From this output, we can extract all scene and camera parameters effortlessly. When there are more than 2 images, we align all pointmaps in a common coordinate frame
Tweet media one
Tweet media two
1
1
21
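To make the "effortlessly" part concrete for the focal length: with a pointmap expressed in its own camera frame, the pinhole model links each pixel to its predicted 3D point, so f can be recovered by a small weighted fit. A sketch of the objective, with (u', v') the pixel coordinates relative to the principal point, (x, y, z) the predicted 3D point and C its confidence (the exact robust solver used in practice, e.g. a Weiszfeld-style estimator, is a separate matter):

f^{\star} = \arg\min_{f} \sum_{u,v} C_{u,v} \left\| (u', v') - \frac{f}{z_{u,v}} \,(x_{u,v},\, y_{u,v}) \right\|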
@JeromeRevaud
Jerome Revaud
3 months
@ddetone The answer is simple: witchcraft🪄🧙‍♀️
@chrisoffner3d
Chris Offner
3 months
@riverakid1 @JeromeRevaud @bchidlovskii @Vinc3nt_Leroy Witchcraft. 😄 I just ran it on my own images (after changing all mentions of CUDA to [device = "mps" if () else "cpu"]) and wow, this is super cool! 🤩
1
6
32
1
1
18
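The device swap mentioned in the quoted tweet (its exact condition is elided above) typically boils down to a standard PyTorch backend check; a sketch of the usual pattern, not the exact edit:

import torch

# Pick the best available backend: CUDA GPU, Apple Silicon (MPS), otherwise CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
# ...then move the model and tensors with .to(device)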
@JeromeRevaud
Jerome Revaud
5 months
DUSt3R is not only simple and fast (inference takes 40ms per image pair on an H100 GPU; global alignment for >2 images takes a few seconds), it also outperforms the state of the art in monocular AND multi-view depth estimation, and in relative pose estimation.
Tweet media one
Tweet media two
1
0
16
@JeromeRevaud
Jerome Revaud
3 months
@janusch_patas @ssh4net I can understand if you look at it purely from an application-driven viewpoint. Clearly not usable yet for real-world use cases. For me, it's a revolution because it's the first time 3D reconstruction is done so straightforwardly, and for the first time, without calibration 😎
2
0
16
@JeromeRevaud
Jerome Revaud
1 month
Fixed! RoMa is now under BSD license
@ducha_aiki
Dmytro Mishkin 🇺🇦
4 months
@romain_bregier @JeromeRevaud @Parskatt Oh, I see the catch now: non-commercial. Any reason why? I mean, I can see a reason for the new methods, like R2D2, etc, but for the basic 3d geometry code? Anyway, that's better for kornia then ;)
Tweet media one
1
0
0
2
3
15
@JeromeRevaud
Jerome Revaud
5 months
Surprisingly, it can also establish "impossible" pixel correspondences between opposite viewpoints which share practically no visual content! (examples not cherry-picked) @ducha_aiki
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2
1
15
@JeromeRevaud
Jerome Revaud
2 months
@janusch_patas @WayneINR @PMel3D On our side, we are planning to make DUSt3R better than COLMAP in every respect
1
0
13
@JeromeRevaud
Jerome Revaud
1 month
@eric_brachmann Sorry, the opportunity was too good to miss, so here we go with DUSt3R to the rescue with impossible pairs 😉
Tweet media one
Tweet media two
1
0
13
@JeromeRevaud
Jerome Revaud
3 months
@janusch_patas @ssh4net (SfM like COLMAP works without calibration, but it's quite a complex pipeline). Hopefully one day we'll beat COLMAP in terms of precision too :)
1
0
12
@JeromeRevaud
Jerome Revaud
24 days
Many thanks again to @AjdDavison, @LourdesAgapito and Andrea Vedaldi @ VGG Oxford; also thanks to @makezur for taking care of the organisation, and to all the students who took the time to make a demo! @HideMatsu82 It was very impressive, really feeling a vibe of convergence in #3dv
1
0
12
@JeromeRevaud
Jerome Revaud
23 days
@vincesitzmann Oh, after ACE-Zero, another deep SfM method! It's exciting to be part of this race to beat COLMAP :)
1
0
11
@JeromeRevaud
Jerome Revaud
5 months
@ducha_aiki Work done with Shuzhe Wang, Vincent Leroy, Yohann Cabon, @bchidlovskii at @naverlabseurope cc @WeinzaepfelP @skamalas
1
1
11
@JeromeRevaud
Jerome Revaud
5 months
@ducha_aiki @chriswolfvision @naverlabseurope didn't have time yet... We plan on submitting more experimental results soon (e.g. visual localisation).
0
0
10
@JeromeRevaud
Jerome Revaud
1 month
@WayneINR When there are that many details, it's clear that the resolution of #DUSt3R is not sufficient to capture them. As a result, the pointmap initialization is faulty on many small parts. Still, most of the scene is OK... I feel we're almost there, we just need to train a better DUSt3R
2
0
9
@JeromeRevaud
Jerome Revaud
1 month
oh wow
@WayneINR
Zhiwen(Aaron) Fan
1 month
Exciting update: our latest results using THREE training views! Despite unknown poses and intrinsics, #instantsplat achieves this in under 20 seconds, even faster with sparser views.
10
12
137
0
1
9
@JeromeRevaud
Jerome Revaud
3 months
@ducha_aiki Agreed. While DUSt3R is impressive with extreme viewpoint changes, it's still relatively sensitive to "long-term" changes (e.g. summer vs winter), which dominate the WxBS benchmark. A robust matcher like RoMa based on DINOv2 makes more sense @Parskatt
1
0
8
@JeromeRevaud
Jerome Revaud
3 months
@janusch_patas Note: DUSt3R also has a problem with the sky. It confidently predicts some finite distance. Basically, there was never any annotation to say "this is infinite depth". We're working to fix it
1
0
8
@JeromeRevaud
Jerome Revaud
3 months
@eric_brachmann @ducha_aiki @Parskatt Yes we tried that... It's a pity we didn't have time to do it for the CVPR submission
Tweet media one
Tweet media two
3
0
8
@JeromeRevaud
Jerome Revaud
1 month
@eric_brachmann Yes, that's definitely a much more scalable approach than current pairwise matchers (including DUSt3R)
1
0
8
@JeromeRevaud
Jerome Revaud
2 months
@janusch_patas The results are in plain sight on a public leaderboard… now waiting for someone to notice 😛
2
0
7
@JeromeRevaud
Jerome Revaud
3 months
@jasonyzhang2 Nice paper, congratulations! Note that we @naverlabseurope pushed the idea a bit further with DUSt3R
@JeromeRevaud
Jerome Revaud
5 months
We believe we made a breakthrough in Geometric 3D Vision: meet DUSt3R, an all-in-one 3D Reconstruction method
Tweet media one
13
55
267
1
1
7
@JeromeRevaud
Jerome Revaud
22 days
I honestly find this concerning. It essentially means that in the hidden pre-prompt, the LLM is asked to deny at all costs that it has access to private info, even though it clearly does. Basically, someone at Meta decided « OK, we're gonna lie to the face of our users »
@OnwardsProject
Jerrod Lew
23 days
This is so creepy from Meta AI. It clearly can use my location but doesn’t admit to knowing this and instead completely lies about it.
Tweet media one
536
2K
25K
0
0
7
@JeromeRevaud
Jerome Revaud
1 month
@habib__slim oh i see. Now i understand😆
0
0
6
@JeromeRevaud
Jerome Revaud
3 months
@Rafael_L_Spring Something like 20ms on an A100 GPU?
0
0
6
@JeromeRevaud
Jerome Revaud
2 months
@ducha_aiki @LingjieLiu1 @KostasPenn Looks great. I really like this idea of optimizing a "canonical 3D space". It drops many constraints from the hard 4D (3D+t) reconstruction problem
0
0
4
@JeromeRevaud
Jerome Revaud
24 days
@eric_brachmann That's pretty impressive. Awesome work, congrats! Also really happy you compared to the DUSt3R baseline :)
1
0
6
@JeromeRevaud
Jerome Revaud
24 days
2
0
6
@JeromeRevaud
Jerome Revaud
4 months
@nburrus @luxonis Here is a foretaste. DUSt3R outputs this quasi-instantaneously (<100ms). No camera parameters, no calibration file, just 2 PNGs given as input! :)
1
0
6
@JeromeRevaud
Jerome Revaud
20 days
@olliecrosen @AjdDavison @LourdesAgapito There are two things: first, CroCo is bad at semantic tasks, as mentioned above; second, we also tried to use a DINOv2 encoder (instead of CroCo) in DUSt3R and MASt3R, either frozen or finetuned, and that was kind of a failure, but we only evaluated on geometric tasks.
2
0
5
@JeromeRevaud
Jerome Revaud
2 months
@eric_brachmann @ducha_aiki Something tells me it might come sooner than you think.
1
0
5
@JeromeRevaud
Jerome Revaud
3 months
@ducha_aiki @Parskatt @jasonyzhang2 @amyxlase @RamananDeva @shubhtuls Working on it... sometimes the results are relatively impressive, sometimes it totally fails (especially on the RGB-IR pairs! wtf is this?). Anyway, DUSt3R code and model weights will be released this week.
2
0
5
@JeromeRevaud
Jerome Revaud
3 months
@janusch_patas @Atehortuajf @_linus_franke Totally agree. Let's say that directly outputting 3D Gaussians with DUSt3R is on my todo list :)
1
0
5
@JeromeRevaud
Jerome Revaud
3 months
@janusch_patas @Atehortuajf We have not yet tried to optimize 3D Gaussians from the output of DUSt3R. Most likely it will fail, because the cameras and the reconstruction are somewhat sloppy, and 3DGS needs really precise geometry at initialization. We're working on making it a lot more precise...
1
0
5
@JeromeRevaud
Jerome Revaud
2 months
@chriswolfvision he deserves some extra-reward
0
0
5
@JeromeRevaud
Jerome Revaud
2 months
@Parskatt oh that looks very interesting indeed
0
0
5
@JeromeRevaud
Jerome Revaud
11 days
@ducha_aiki Not for long. Wait for it
1
0
5
@JeromeRevaud
Jerome Revaud
4 months
Tweet media one
0
0
5
@JeromeRevaud
Jerome Revaud
2 months
@ptrck_brgr @WayneINR Also, COLMAP-free 3DGS requires the camera intrinsics. How do you get the intrinsics? ...COLMAP. Honestly, I don't know of a single deep-learning-based 3D vision method that works without the intrinsics... except for #DUSt3R ofc :D
1
0
5
@JeromeRevaud
Jerome Revaud
2 months
(The castle pic (second image pair) shows our actual lab!)
Tweet media one
1
0
5
@JeromeRevaud
Jerome Revaud
2 months
@camenduru Please try to put the resulting images in #DUSt3R, and voilà, you should have a nice 3D model
@Vinc3nt_Leroy
Vincent Leroy
3 months
Been waiting for the code release of DUSt3R? Wait no more, there it is! Reminiscent of RayDiffusion (), we present a well behaved variant to Bundle Adjustment, this time with pointmaps.
3
86
471
0
0
5
@JeromeRevaud
Jerome Revaud
2 months
@ducha_aiki @Vinc3nt_Leroy By the way, we're also working to release the full DUSt3R training set
1
0
5
@JeromeRevaud
Jerome Revaud
2 months
@yuewang314 No need to thank me, I'm just really happy and excited to see DUSt3R contributing to the 3DV community
0
0
5
@JeromeRevaud
Jerome Revaud
1 month
@Michael_J_Black Well, language is already expressed in a purely semantic manner (by definition), so that effectively removes a big part of the perception problem we have with images. Also, massive text data is available, much more than images in terms of « semantic quantity » imo
1
0
5
@JeromeRevaud
Jerome Revaud
2 months
lol neat
@VisionBernie
Bernhard Egger
2 months
It’s time to revolutionize authorship order in publications across fields #ScienceTwitter ! Our work “AMOR: Ambiguous Authorship Order” resolves author ordering problems once and for all! We just implement animated author lists – that's it @SIGBOVIK : 1/n
8
12
98
0
0
5
@JeromeRevaud
Jerome Revaud
11 days
@ducha_aiki Exactly haha 👍
1
0
5
@JeromeRevaud
Jerome Revaud
24 days
@WayneINR @AjdDavison @LourdesAgapito Arxiv paper coming soon (we’re polishing some details)
2
0
5
@JeromeRevaud
Jerome Revaud
1 month
@ducha_aiki @Parskatt @SattlerTorsten We tried different flavors of 3DGS (like SuGaR) to convert small videos to dense 3D point-clouds. Too many floaters.
0
0
4
@JeromeRevaud
Jerome Revaud
3 months
@Parskatt @ducha_aiki @jasonyzhang2 @amyxlase @RamananDeva @shubhtuls In other words, the representation space is ill-posed. Therefore this kind of approach can only be successful in constrained settings, e.g. with object-centric datasets where the focal length is small and roughly constant. Modelling rays rather than extrinsics is already better posed!
1
0
4
@JeromeRevaud
Jerome Revaud
23 days
@jianyuan_wang I like the very honest « limitations » section at the end of the appendix :)
0
0
4
@JeromeRevaud
Jerome Revaud
2 months
@Parskatt True, in many cases the problem is clearly ill-posed. One clue that we suspect helps DUSt3R is right angles. In most images you can find right angles, even in outdoor images in the wild: trees are vertical on horizontal ground. That's enough to solve for the focal
1
0
4
@JeromeRevaud
Jerome Revaud
1 month
But contrary to the claims in UniDepth that decoupling rays/depth is important, we observe a decrease in performance. Here is the regression loss on Habitat and MegaDepth. Not sure why... Note: log(z) diverges (the 2nd view can be behind the camera), so we added log(distance_to_origin)
Tweet media one
Tweet media two
0
0
4
@JeromeRevaud
Jerome Revaud
1 month
@jfbrly Thanks. We are!
0
0
4
@JeromeRevaud
Jerome Revaud
1 month
@Michael_J_Black @AjdDavison Thankfully that's what our plan always was... starting from #CroCo in 2022😀 Thanks for the tip! 🙏
0
0
4
@JeromeRevaud
Jerome Revaud
1 month
@eric_dexheimer @AjdDavison Great work, congrats (and thanks for releasing the code so fast!)
2
0
4
@JeromeRevaud
Jerome Revaud
5 months
@mayfer Yes, there were similar attempts in the past, like DeMoN and DeepV2D, but these approaches still require camera intrinsics. Performing 3D reconstruction without knowing the intrinsics was deemed infeasible (unless using explicit BA); well, we show it's not.
1
0
4
@JeromeRevaud
Jerome Revaud
1 month
@eric_brachmann @NianticLabs @axelbarrosotw @scriptide @viprad Congrats, that's a really cool paper further pushing the boundaries of image matching! Did you try optimizing the pose accuracy instead of VCRE?
1
0
4
@JeromeRevaud
Jerome Revaud
3 months
@ducha_aiki @Parskatt also, DUSt3R seems more robust with indoor data. I think there might be a training imbalance in that regard... we need more public outdoor datasets :)
1
0
3
@JeromeRevaud
Jerome Revaud
3 months
@FangQ Thanks 🙏 but note that DUSt3R has never seen human bodies during its entire training. Since DUSt3R is a very data-driven approach by nature, reconstruction results on human bodies can be relatively sloppy and disappointing.
1
0
4
@JeromeRevaud
Jerome Revaud
23 days
@AjdDavison Same goes in the other direction. I wasn't expecting that many excellent, really top-notch papers and demos. Super interesting research overall! Also really liked the atmosphere in your lab, for sure one of the best labs I've visited
1
0
4
@JeromeRevaud
Jerome Revaud
4 months
@chriswolfvision @danish_nazir1 @ducha_aiki Same story with DUSt3R, which challenges the prevailing notion that camera intrinsics are indispensable for 3D reconstruction and a whole bunch of 3DV tasks (unless using costly bundle adjustment). Yet, here we are😎
@JeromeRevaud
Jerome Revaud
5 months
We believe we made a breakthrough in Geometric 3D Vision: meet DUSt3R, an all-in-one 3D Reconstruction method
Tweet media one
13
55
267
0
0
4
@JeromeRevaud
Jerome Revaud
1 month
@dreamingtulpa haha. It's trying
Tweet media one
0
0
4
@JeromeRevaud
Jerome Revaud
1 month
@chriswolfvision « In the spirit of friendly academic shitposting » 🤣
1
0
4
@JeromeRevaud
Jerome Revaud
3 months
@georgiagkioxari Oh, OK! We actually borrowed this formulation from "Multi-task learning using uncertainty to weigh losses for scene geometry and semantics", CVPR'18, by Kendall et al. And thanks to you, I just noticed we didn't properly cite their paper 😕
1
0
3
@JeromeRevaud
Jerome Revaud
2 months
1
0
4
@JeromeRevaud
Jerome Revaud
1 month
@WayneINR I was waiting for this result... The first video that you posted was static, but this is even more impressive!
1
0
3
@JeromeRevaud
Jerome Revaud
3 months
@AndrewVoirol The only issue I see is that DUSt3R is not metric, so you'll have to find a way to properly scale the reconstructed 3D model
0
0
3
@JeromeRevaud
Jerome Revaud
4 months
@ducha_aiki HPatches is ... how to say politely... a bit outdated
0
0
2
@JeromeRevaud
Jerome Revaud
2 months
@JsonBasedman Same thing in Korea with Naver and Kakao (big IT competitors there), even though they're actually close friends from the same school (they worked together at Samsung too)
0
0
3
@JeromeRevaud
Jerome Revaud
1 month
@hupobuboo @janusch_patas Great presentation, thank you for the coverage!
0
0
3
@JeromeRevaud
Jerome Revaud
3 months
0
0
3
@JeromeRevaud
Jerome Revaud
4 months
@ducha_aiki hahaha🤣
Tweet media one
0
0
3
@JeromeRevaud
Jerome Revaud
1 month
@taiyasaki Thanks for letting us know 🙏
0
0
3
@JeromeRevaud
Jerome Revaud
2 months
@Parskatt This example, for instance, is easy-peasy: lots of right angles everywhere :)
@JeromeRevaud
Jerome Revaud
4 months
Tweet media one
0
0
5
0
0
3
@JeromeRevaud
Jerome Revaud
2 months
@Parskatt @chrisoffner3d Also, note that you can load and store .exr files with OpenCV, which you most likely already have installed
1
0
2
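A quick sketch of that OpenCV route for float EXR files (the environment variable is needed in recent opencv-python builds, where OpenEXR reading is disabled by default; the filenames are made up):

import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"  # must be set before importing cv2 in recent builds
import cv2
import numpy as np

# Read a float32 EXR (e.g. a depth map or a pointmap channel) without 8-bit conversion.
img = cv2.imread("depth.exr", cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)

# Write it back; OpenCV expects float32 data for EXR output.
cv2.imwrite("depth_copy.exr", img.astype(np.float32))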
@JeromeRevaud
Jerome Revaud
12 days
@erhartford @maximelabonne That’s very interesting if true
0
0
3
@JeromeRevaud
Jerome Revaud
2 months
@Poyonoz @WayneINR @yuewang314 yeah... but everything is too fast nowadays anyway
0
0
2
@JeromeRevaud
Jerome Revaud
23 days
0
0
3
@JeromeRevaud
Jerome Revaud
2 months
1
0
3