Ben Mildenhall Profile
Ben Mildenhall

@BenMildenhall

4,908 Followers · 1,003 Following · 25 Media · 96 Statuses

making stuff 3D. formerly research scientist at Google, PhD at Berkeley.

San Francisco, CA
Joined December 2020
@BenMildenhall
Ben Mildenhall
2 years
If you're still at CVPR and have the stamina to make it through another poster session, check out RawNeRF tomorrow morning! We exploit the fact that NeRF is surprisingly robust to image noise to reconstruct scenes directly from raw HDR sensor data.
19
200
1K
@BenMildenhall
Ben Mildenhall
3 months
will it nerf? yep ✅ congrats to @_tim_brooks @billpeeb and colleagues, absolutely incredible results!!
@_tim_brooks
Tim Brooks
3 months
Sora is our first video generation model - it can create HD videos up to 1 min long. AGI will be able to simulate the physical world, and Sora is a key step in that direction. thrilled to have worked on this with @billpeeb at @openai for the past year
150
159
1K
16
93
725
@BenMildenhall
Ben Mildenhall
2 years
Code finally released for our CVPR 2022 papers (mip-NeRF 360/Ref-NeRF/RawNeRF)! You can also find links for each paper's dataset on its project page. The code has some nice new camera utilities for larger real scenes, like this one.
@jon_barron
Jon Barron
2 years
We've finally released code for three of our CVPR2022 papers: mip-NeRF 360, Ref-NeRF, and RawNeRF. Instead of three separate releases, we've done something a little unusual and merged them into a single repo. Excited to see what people do with this!
7
118
763
17
103
584
@BenMildenhall
Ben Mildenhall
5 months
Friday was my last day at Google after 3 years. Will dearly miss hanging out with all my amazing coworkers, but looking forward to trying something new. The last few years of progress have been crazy and I expect even wilder things to come. goodbye to my nerfing camera 💔
21
5
511
@BenMildenhall
Ben Mildenhall
5 months
ReconFusion = standard single-scene optimized 3D reconstruction, additionally guided by a multi-view diffusion prior to allow for decent outputs from significantly fewer input views. some thoughts below for view synthesis fanatics... 1/
6
49
419
@BenMildenhall
Ben Mildenhall
2 years
The use of sigma in NeRF volume rendering is neither wrong nor a bug, but an intentional choice. Here's the integral form of the radiative transfer equation, from rendering experts @_jannovak @wkjarosz et al. How does this become the equation in NeRF?
[Image: the integral form of the radiative transfer equation, from Novák et al.]
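For reference, a sketch of that equation in its standard emission/absorption/in-scattering form (the notation in the original image may differ):

L(\mathbf{x}, \omega) = \int_0^D T(t) \big[ \mu_a(\mathbf{x}_t) L_e(\mathbf{x}_t, \omega) + \mu_s(\mathbf{x}_t) L_s(\mathbf{x}_t, \omega) \big] \, dt + T(D) \, L(\mathbf{x}_D, \omega),
\qquad T(t) = \exp\Big( -\int_0^t \mu_t(\mathbf{x}_s) \, ds \Big), \quad \mu_t = \mu_a + \mu_s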
@ftm_guney
F. Güney
2 years
Richard Hartley finding a bug in the formulation of NeRF that makes it easier to optimize... this day has already turned out more fun than I thought.
5
21
276
1
54
337
@BenMildenhall
Ben Mildenhall
2 years
one of my favorite new renderings from this project
@jon_barron
Jon Barron
2 years
Very glad I can finally talk about our newly-minted #CVPR2022 paper. We extended mip-NeRF to handle unbounded "360" scenes, and it got us ~photorealistic renderings and beautiful depth maps. Explainer video: and paper:
23
274
2K
8
20
271
@BenMildenhall
Ben Mildenhall
2 years
i am so sorry 😭
@ziruiwang_
Zirui Wang
2 years
Nice coordinate conversion 😆😆😆
3
9
138
6
14
176
@BenMildenhall
Ben Mildenhall
2 years
"a monkey hitting a laptop with a hammer" #dreamfusion
@GideonOnGaming
Gideon on Gaming
2 years
Me, learning blender, seeing this…
0
0
14
4
14
157
@BenMildenhall
Ben Mildenhall
1 year
moved back. London ➡️ SF
6
4
154
@BenMildenhall
Ben Mildenhall
2 years
check out the restaurant flythrough! @PeterHedman3 @duck
@JeffDean
Jeff Dean (@🏡)
2 years
The new @googlemaps Immersive View feature is going to be pretty amazing (and uses Neural Radiance Fields, or NeRFs, developed by @GoogleAI, UC Berkeley and UCSD researchers) See Watch Maps Immersive portion of #GoogleIO keynote:
7
80
504
0
19
154
@BenMildenhall
Ben Mildenhall
2 years
Variations on "a corgi in a bath robe reading a newspaper," suggested by @georgiagkioxari. Find more corgis at #dreamfusion
@georgiagkioxari
Georgia Gkioxari
2 years
@poolio @BenMildenhall @ajayj_ @jon_barron Is it a good time to ask for "a corgi in a bath robe reading a newspaper"? I'd like the mesh if it's not too much to ask! I wanna 3D print it :) @jon_barron made me do this!
1
0
28
3
21
132
@BenMildenhall
Ben Mildenhall
4 months
+1 to Ben P. And between Bill Freeman’s famous (and accurate) plot and never-ending complaints about the randomness of the review process… can we really blame people for coming to the misguided conclusion “draw as many samples as I can to maximize the chance of getting an outlier”?
[Image: Bill Freeman's plot]
@poolio
Ben Poole
4 months
One paper can change your life. But which one? Overproductivity doesn't just come from paper counting, but from the desperate acts of young researchers under extreme pressure to be part of that one paper.
1
2
137
3
7
131
@BenMildenhall
Ben Mildenhall
2 years
Playing with this has been ridiculously fun
@poolio
Ben Poole
2 years
Happy to announce DreamFusion, our new method for Text-to-3D! We optimize a NeRF from scratch using a pretrained text-to-image diffusion model. No 3D data needed! Joint work w/ the incredible team of @BenMildenhall @ajayj_ @jon_barron #dreamfusion
136
1K
6K
4
16
131
@BenMildenhall
Ben Mildenhall
2 years
z axis ✅ #dreamfusion ("a chimpanzee chiseling a marble statue of a monkey")
@paulg
Paul Graham
2 years
Prediction: More impasto coming from human artists. Dall-E has taken a lot of territory in the x and y dimensions, but none at all in the z.
52
25
441
2
12
116
@BenMildenhall
Ben Mildenhall
8 months
check out our work on improving joint NeRF+camera optimization! shoutout to this iOS app that records images and ARKit poses; I was able to use it to capture all the ARKit scenes for this paper in about 15 minutes
@KeunhongP
Keunhong Park
8 months
Introducing CamP🏕️ — a method to precondition camera optimization for NeRFs to significantly improve quality. With CamP we’re able to create high quality reconstructions even when input poses are bad. Project page: ArXiv: (1/n)
4
67
366
1
7
93
@BenMildenhall
Ben Mildenhall
2 years
Find out more in our talk on "NeRF in the Dark" Friday morning at 8:30am in Great Hall B-C or visit our poster (#19) from 10am-12:30pm. @PeterHedman3 @_pratul_ @rmbrualla @jon_barron Paper and more results at
3
11
85
@BenMildenhall
Ben Mildenhall
2 years
nerfirenze
3
4
70
@BenMildenhall
Ben Mildenhall
2 years
to each their own #dreamfusion
@the_shweenz
Chris Sweeney
2 years
@ajayj_ @poolio @jon_barron @BenMildenhall I'm much more interested to see my home office redone to be mid-century modern style than I am in seeing a purple duck eating an everything bagel. A generative model conditioned on my personal experience would allow me to explore decisions that are within my reach to achieve [2/n]
1
0
4
6
6
58
@BenMildenhall
Ben Mildenhall
3 years
From our latest project, an homage to the original Photo Tourism visualizations by @Jimantha et al. - interpolating between camera pose, focal length, aspect ratio, and scene appearance from different tourist images. More details at @_pratul_ @jon_barron
4
8
52
@BenMildenhall
Ben Mildenhall
2 years
These days, who can say what crazy stuff is happening to your photos after they're captured, especially on a cellphone? Better to go straight to the source and grab those pixels fresh from the Bayer quads.
2
2
45
@BenMildenhall
Ben Mildenhall
3 years
Making these is really fun; I particularly like the crazy dolly zoom in the middle of this one of Sacré-Cœur
1
6
43
@BenMildenhall
Ben Mildenhall
5 months
That means fewer views are required to reconstruct the same quality as before. No one wants to capture 100-1000+ images for a good NeRF or splatting result, it's super tedious!! Progress toward this means progress toward faster, easier, more casual 3D capture in the future 🎉 n/n
3
0
43
@BenMildenhall
Ben Mildenhall
3 months
no words please watch wow
@FMatrixGuy
Daniel Wedge
3 months
Here's yet another #NeRF video... but this one is of the musical variety. Enjoy! #WeAreTheNeRFs
1
21
98
2
0
42
@BenMildenhall
Ben Mildenhall
2 years
Plus, optimizing NeRF in the linear space of raw data means you can postprocess its rendered novel views just like any raw photograph (e.g., adjusting exposure, tonemapping, or white balance). You can even render synthetic defocus with correctly exposed bokeh.
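For illustration, postprocessing a linear-space rendering works just like developing a raw photo. A minimal sketch (the exposure, white-balance gains, and the crude gamma tonemap here are made-up example values, not anything from the RawNeRF code):

import numpy as np

def postprocess_linear(rendering, exposure=1.0, wb_gains=(2.0, 1.0, 1.6), gamma=2.2):
    """Exposure, white balance, and a simple gamma tonemap applied to a
    linear-space (H, W, 3) rendering, like developing a raw photo."""
    img = rendering * exposure                     # exposure adjustment (scale in linear space)
    img = img * np.asarray(wb_gains)               # per-channel white balance gains
    img = np.clip(img, 0.0, 1.0) ** (1.0 / gamma)  # crude tonemap to display space
    return img

# e.g. brighten a rendered novel view by one stop before tonemapping:
# display = postprocess_linear(linear_render, exposure=2.0)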
3
1
39
@BenMildenhall
Ben Mildenhall
3 months
[Image]
1
0
33
@BenMildenhall
Ben Mildenhall
3 years
Great overview from @fdellaert! I'd also like to highlight @TinghuiZhou and @Jimantha et al. for bringing volume rendering into deep learning for view synthesis with their paper Stereo Magnification in 2018.
@fdellaert
Frank Dellaert
3 years
2020 was the year in which *neural volume rendering* exploded onto the scene, triggered by the impressive NeRF paper by Mildenhall et al. I wrote a post as a way of getting up to speed in a fascinating and very young field and share my journey with you:
13
215
956
2
2
28
@BenMildenhall
Ben Mildenhall
2 years
3. T * sigma can be thought of as a PDF, implying that we're returning expected color along the ray. This is easily extended to return the expected value of any other quantity (e.g., T * sigma * distance to get expected depth). /end
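In symbols (a sketch of the idea; the weights only form a proper PDF up to whatever transmittance escapes the far end of the ray):

\hat{C}(\mathbf{r}) = \int T(t)\, \sigma(t)\, \mathbf{c}(t)\, dt \approx \mathbb{E}[\text{color}],
\qquad \hat{z}(\mathbf{r}) = \int T(t)\, \sigma(t)\, t \, dt \approx \mathbb{E}[\text{depth}]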
2
1
27
@BenMildenhall
Ben Mildenhall
2 years
This is such a huge blocker for neural rendering research! Graphics algorithms have so much dynamism and it is nigh impossible to try integrating these with current ML frameworks without a lot of custom low-level work
@ericjang11
Eric Jang
2 years
for instance, a simulator of the future could involve NNs in kD tree lookup, NNs for sim2real, NNs for contact prediction. These pieces depend on each other but not necessarily on a per-gradient update basis, so potentially the AD software could be designed around that.
1
0
9
1
0
22
@BenMildenhall
Ben Mildenhall
4 months
@pesarlin certainly true but emphasizing this too much can also result in psychological torment, leading people to lock themselves in a room obsessing over how to create the Next Big Thing, rather than achieving a healthy balance and creative flow of research progress over time
1
0
21
@BenMildenhall
Ben Mildenhall
5 months
I'm excited about ReconFusion because it shows the potential of taking that basic setup and guiding optimization with a smarter prior that has a strong opinion about what uncaptured novel views should look like, based on learning from large multiview image datasets. 8/
1
1
21
@BenMildenhall
Ben Mildenhall
3 years
give your hotdogs the midas touch
@_pratul_
Pratul Srinivasan
3 years
We can even edit the materials in our recovered models --> hotdog alchemy! (3/3)
2
0
9
1
1
21
@BenMildenhall
Ben Mildenhall
2 years
1. It produces rendering weights T * alpha that match the alpha compositing model used in previous view synthesis methods like Neural Volumes and multiplane images. 2. It's a more common convention in graphics/rendering work (for the reasons in the screenshot above).
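Concretely, with the quadrature from the NeRF paper (sample spacings \delta_i along the ray):

\hat{C} = \sum_i T_i\, \alpha_i\, \mathbf{c}_i, \qquad \alpha_i = 1 - \exp(-\sigma_i \delta_i), \qquad T_i = \prod_{j<i} (1 - \alpha_j) = \exp\Big( -\sum_{j<i} \sigma_j \delta_j \Big)

i.e. an "over" composite of the samples with opacities \alpha_i, matching MPI-style alpha compositing.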
1
0
19
@BenMildenhall
Ben Mildenhall
2 years
This produces the typical integrand T * sigma * color from NeRF. Ok, so what's up with the sigma debate? As some replies to the original thread have noted, this is a matter of volume rendering convention. Later on the same page of Novák et al. we see:
[Image: a later excerpt from the same page of Novák et al.]
3
0
16
@BenMildenhall
Ben Mildenhall
3 years
Time to revise these areas... so awkward trying to fit neural rendering submissions into these categories. Many are hiding in "stereo and multiview 3d", the place to be for orals this year 🔥🔥🔥
@CSProfKGD
Kosta Derpanis
3 years
#ICCV2021 paper stats
1
16
66
0
0
15
@BenMildenhall
Ben Mildenhall
2 years
So we can either keep sigma * L in the integrand or fold them together into a new emitted L'. This is analogous to the question of premultiplied alpha in over-compositing. We *intentionally* chose sigma * L in NeRF. It makes sense for a variety of reasons:
1
0
15
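On the premultiplied-alpha analogy in the tweet above: writing premultiplied colors as c' = \alpha c, the standard "over" operator is

c'_{out} = c'_A + (1 - \alpha_A)\, c'_B, \qquad \alpha_{out} = \alpha_A + (1 - \alpha_A)\, \alpha_B

With straight (unpremultiplied) colors you additionally have to divide back out by \alpha_{out} at the end. Folding sigma into the emitted color is the same kind of bookkeeping choice.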
@BenMildenhall
Ben Mildenhall
2 years
T is transmittance, L_e is (emitted) color, and mu_a is what we call "density" (though it's technically the absorption coefficient), typically denoted by sigma in NeRF. Since NeRF's rendering model doesn't include in- or out-scattering, mu_s = 0 and the second term disappears.
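Dropping that term (and the boundary radiance at the far bound) leaves the familiar NeRF rendering integral:

C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\, \sigma(t)\, \mathbf{c}(t)\, dt, \qquad T(t) = \exp\Big( -\int_{t_n}^{t} \sigma(s)\, ds \Big)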
1
0
14
@BenMildenhall
Ben Mildenhall
3 years
🎉🎉 Check out the project page at and come talk to us during the poster sessions today at 4pm or Thursday at 9am Eastern time!
@ICCV_2021
ICCV2021
3 years
Honourable Mention @ICCV_2021 Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. Jonathan T. Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, Pratul Srinivasan [Session 5 A/B]
1
6
22
0
0
13
@BenMildenhall
Ben Mildenhall
3 years
Appearance interpolation is done directly in NeRF weight space! Use simple meta-learning (Reptile) to get the initial scene weights, then just do a couple extra gradient steps to acquire the appearance of any new input image.
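Schematically, Reptile plus the test-time adaptation described here could look like the sketch below (assuming a PyTorch-style model and a hypothetical sgd_steps(model, data, steps, lr) fine-tuning helper; this is not the paper's actual training code):

import copy
import random

def reptile_init(model, scenes, outer_steps=1000, inner_steps=16, inner_lr=1e-3, meta_lr=0.1):
    """Reptile: repeatedly fine-tune a copy of the model on one scene/image,
    then nudge the shared initialization toward the fine-tuned weights."""
    for _ in range(outer_steps):
        scene = random.choice(scenes)
        adapted = copy.deepcopy(model)
        sgd_steps(adapted, scene, steps=inner_steps, lr=inner_lr)   # inner-loop fit
        for p, q in zip(model.parameters(), adapted.parameters()):
            p.data += meta_lr * (q.data - p.data)                   # Reptile meta-update
    return model

def adapt_appearance(meta_model, new_image, steps=4, lr=1e-3):
    """A couple of extra gradient steps from the meta-learned init
    to match the appearance of a single new input image."""
    adapted = copy.deepcopy(meta_model)
    sgd_steps(adapted, new_image, steps=steps, lr=lr)
    return adapted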
0
1
13
@BenMildenhall
Ben Mildenhall
5 months
Many view synthesis projects later, I'd say we are somewhat remiss for not having included more of these plots in our papers. The difficulty of reconstructing a captured scene is 100% tied to view sampling density, and as a community we tend to underemphasize this fact. 5/
1
0
11
@BenMildenhall
Ben Mildenhall
5 months
This rate is basically sampling with 1 pixel of disparity between input views -- that is, you move the camera such that the nearest thing in the scene only moves 1 pixel in the resulting image (!!). Obviously, this is totally intractable. 3/
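To put rough numbers on the 1-pixel-disparity condition (illustrative values only): a camera translation of baseline B moves a point at depth z by roughly d = f B / z pixels, where f is the focal length in pixels, so

d = \frac{f B}{z_{\min}} \le 1 \text{ px} \;\Rightarrow\; B \le \frac{z_{\min}}{f}, \qquad \text{e.g. } z_{\min} = 1\,\text{m},\ f = 1000\,\text{px} \;\Rightarrow\; B \le 1\,\text{mm}

i.e. about a millimeter between neighboring views for a typical capture, which is why sampling at this rate is hopeless in practice.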
1
0
9
@BenMildenhall
Ben Mildenhall
5 months
The point of NeRF was: setting up a dumb brute-force optimization for a dense scene reconstruction that can rerender all your input images works surprisingly well. There's nothing "learned" there beyond the prior encoded by your 3D representation, plus the input images. 7/
1
0
9
@BenMildenhall
Ben Mildenhall
5 months
In ReconFusion, we ran this test on the kitchenlego scene (quality vs # input views) and got a very satisfying result. btw -- the lines do end up crossing for some metrics, at around 160 views :) The original full capture is about 250 input images. 6/
[Image: reconstruction quality vs. number of input views on the kitchenlego scene]
1
0
8
@BenMildenhall
Ben Mildenhall
5 months
Back in grad school, @_pratul_, Ren, and I talked a lot about plenoptic sampling in the context of view synthesis. There's a fundamental Nyquist sampling rate that needs to be achieved to guarantee "perfect" view synthesis/light field interpolation at a given resolution. 2/
1
0
8
@BenMildenhall
Ben Mildenhall
3 years
@BartWronsk Some very cool results on tracing through variable index of refraction in “Refractive Radiative Transfer Equation” by Ament et al. in SIGGRAPH 2014. Not including the wind simulation though!
1
0
7
@BenMildenhall
Ben Mildenhall
4 months
@mmalex one thing the SIGGRAPH conference track nailed: don't force authors to label their own paper as "lower tier" -- leave it as a dual-track submission that can end up as either conference or journal and let reviewers decide. win/win, as even good papers are encouraged to tighten up to 7 pages :)
2
0
6
@BenMildenhall
Ben Mildenhall
3 years
Their beautiful high-res results convinced @_pratul_ and me to immediately switch over from depth-map-based warping to a volumetric rendering model using multiplane images.
1
0
5
@BenMildenhall
Ben Mildenhall
1 year
@jperldev ha oops. that was meant to be in contrast to surface-based rendering (discontinuity at occlusion boundaries is hard to deal with), especially of SDFs (needs implicit differentiation). only "trivial" as in, implement the fwd pass and autodiff gives you the gradient for free...
0
0
6
@BenMildenhall
Ben Mildenhall
2 years
@AntonHand @vsaitoo I see, this is quite a large model so it's around 128MB on disk.
1
0
4
@BenMildenhall
Ben Mildenhall
5 months
In LLFF, we related our view synthesis performance to this fundamental limit, showing that if we used N planes in an MPI, we could increase the baseline between sampled views to N pixels. You can read this as image quality vs. # of input views (going from most to least). 4/
[Image: plot of image quality vs. number of input views, from LLFF]
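Carrying through the illustrative numbers from the 3/ tweet above: allowing N pixels of disparity relaxes the baseline bound to

B \le \frac{N\, z_{\min}}{f}, \qquad \text{e.g. } N = 32,\ z_{\min} = 1\,\text{m},\ f = 1000\,\text{px} \;\Rightarrow\; B \le 3.2\,\text{cm}

roughly N times fewer input views along each direction of camera motion.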
1
0
4
@BenMildenhall
Ben Mildenhall
1 year
@avisingh599 airbnbs for now but most likely finding a place in actual SF!
0
0
4
@BenMildenhall
Ben Mildenhall
4 months
@mmalex the cultural shift is the main issue yeah, people have made various well-intentioned attempts to do this but i think most of them have missed the mark due to failing/refusing to understand the motivations behind prestige seeking behavior...
1
0
4
@BenMildenhall
Ben Mildenhall
3 years
@ChrisJReiser @jon_barron This is definitely possible. You can also get pretty far with a simple sparsity loss on density (e.g., fig. 4 in @PeterHedman3's SNeRG paper)
[Image: fig. 4 from the SNeRG paper]
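As an illustration of the kind of penalty meant here (a minimal sketch; a Cauchy-style log penalty on densities at randomly sampled points is one common choice, though the exact loss in SNeRG may differ):

import torch

def density_sparsity_loss(sigma, scale=0.05, weight=1e-4):
    """Heavy-tailed (Cauchy-style) penalty that pushes most sampled
    densities toward zero, averaged over randomly sampled points."""
    return weight * torch.log1p((sigma / scale) ** 2).mean()

# hypothetical usage: loss = photometric_loss + density_sparsity_loss(sampled_sigma)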
1
0
2
@BenMildenhall
Ben Mildenhall
2 years
@simesgreen siggraph submissions never fail to surprise; this is definitely right up there with computational citrus peeling and the knitting compiler. "We partner with several zoos and circuses..."
1
0
2
@BenMildenhall
Ben Mildenhall
5 months
@dacapo_go yep but not from where I was sitting 😅
0
0
2
@BenMildenhall
Ben Mildenhall
2 years
@AntonHand @vsaitoo I took about 500 input images to capture this scene, if that's what you're asking.
2
0
2
Ben Mildenhall Retweeted
@_akhaliq
AK
2 years
An implementation of text-to-3D DreamFusion, powered by Stable Diffusion github:
24
434
2K
@BenMildenhall
Ben Mildenhall
3 years
Come hang out with me and @PeterHedman3 in London next summer!
@jon_barron
Jon Barron
3 years
This is probably a good time to mention that my team (a subset of this author list, and a part of Google Research) is recruiting interns for next summer at our San Francisco and London offices. Email me at barron@google.com if you're interested in working on something NeRF-y.
0
0
17
0
0
1