If you're still at CVPR and have the stamina to make it through another poster session, check out RawNeRF tomorrow morning! We exploit the fact that NeRF is surprisingly robust to image noise to reconstruct scenes directly from raw HDR sensor data.
Sora is our first video generation model - it can create HD videos up to 1 min long. AGI will be able to simulate the physical world, and Sora is a key step in that direction. Thrilled to have worked on this with @billpeeb at @openai for the past year.
Code finally released for our CVPR 2022 papers (mip-NeRF 360/Ref-NeRF/RawNeRF)! You can also find links for each paper's dataset on its project page.
The code has some nice new camera utilities for larger real scenes, like this one.
We've finally released code for three of our CVPR2022 papers: mip-NeRF 360, Ref-NeRF, and RawNeRF. Instead of three separate releases, we've done something a little unusual and merged them into a single repo. Excited to see what people do with this!
Friday was my last day at Google after 3 years.
Will dearly miss hanging out with all my amazing coworkers, but looking forward to trying something new. The last few years of progress have been crazy and I expect even wilder things to come.
goodbye to my nerfing camera 💔
ReconFusion = standard single-scene optimized 3D reconstruction, additionally guided by a multi-view diffusion prior to allow for decent outputs from significantly fewer input views.
some thoughts below for view synthesis fanatics...
1/
The use of sigma in NeRF volume rendering is neither wrong nor a bug, but an intentional choice.
Here's the integral form of the radiative transfer equation, from rendering experts @_jannovak, @wkjarosz, et al. How does this become the equation in NeRF?
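For anyone reading without the attached screenshot, the integral form being referenced is approximately the following (a reconstruction in the notation defined later in this thread, not the exact image):

```latex
L(\mathbf{x}, \omega) = \int_0^\infty T(t)\,\big[\,\mu_a(\mathbf{x}_t)\,L_e(\mathbf{x}_t, \omega)
  \;+\; \mu_s(\mathbf{x}_t)\,L_s(\mathbf{x}_t, \omega)\,\big]\,\mathrm{d}t,
\qquad
T(t) = \exp\!\Big(-\!\int_0^t \mu_t(\mathbf{x}_s)\,\mathrm{d}s\Big)
```

where \(\mu_t = \mu_a + \mu_s\) is the extinction coefficient.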
Very glad I can finally talk about our newly-minted #CVPR2022 paper. We extended mip-NeRF to handle unbounded "360" scenes, and it got us ~photorealistic renderings and beautiful depth maps. Explainer video: and paper:
The new @googlemaps Immersive View feature is going to be pretty amazing (and uses Neural Radiance Fields, or NeRFs, developed by @GoogleAI, UC Berkeley, and UCSD researchers)
See
Watch the Maps Immersive portion of the #GoogleIO keynote:
+1 to Ben P
and between bill freeman’s famous (and accurate) plot and never-ending complaints about the randomness of the review process… can we really blame people for coming to the misguided conclusion “draw as many samples as I can to maximize chance of getting an outlier”?
One paper can change your life.
But which one? Overproductivity doesn't just come from paper counting, but from the desperate acts of young researchers under extreme pressure to be part of that one paper.
Happy to announce DreamFusion, our new method for Text-to-3D!
We optimize a NeRF from scratch using a pretrained text-to-image diffusion model. No 3D data needed!
Joint work w/ the incredible team of @BenMildenhall @ajayj_ @jon_barron #dreamfusion
check out our work on improving joint NeRF+camera optimization!
shoutout to this iOS app that records images and ARKit poses; I was able to use it to capture all the ARKit scenes for this paper in about 15 minutes
Introducing CamP🏕️ — a method to precondition camera optimization for NeRFs to significantly improve quality. With CamP we’re able to create high quality reconstructions even when input poses are bad.
Project page:
ArXiv:
(1/n)
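As a rough sketch of what "preconditioning camera optimization" can look like (this is my own illustrative whitening-style construction, not necessarily CamP's exact formulation): take the Jacobian J of some projected proxy points with respect to the camera parameters, and reparameterize so a unit step in the new variables moves the projections by roughly a unit amount in every direction:

```python
import numpy as np

def camera_preconditioner(jacobian, eps=1e-6):
    """Whiten camera parameters using the Jacobian of projected proxy points
    w.r.t. those parameters. Optimizing psi, with params = P @ psi, decorrelates
    updates to e.g. focal length vs. translation, which otherwise have wildly
    different effects on the projected geometry."""
    cov = jacobian.T @ jacobian                     # parameter-space metric J^T J
    eigval, eigvec = np.linalg.eigh(cov)
    # P = (J^T J)^{-1/2}, regularized so near-null directions don't blow up
    return eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T
```

With P applied, the effective Jacobian J @ P has an (approximately) identity Gram matrix, so gradient steps are equally scaled in all camera-parameter directions.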
@ajayj_ @poolio @jon_barron @BenMildenhall
I'm much more interested to see my home office redone to be mid-century modern style than I am in seeing a purple duck eating an everything bagel. A generative model conditioned on my personal experience would allow me to explore decisions that are within my reach to achieve. [2/n]
From our latest project, an homage to the original Photo Tourism visualizations by @Jimantha et al. - interpolating between camera pose, focal length, aspect ratio, and scene appearance from different tourist images. More details at
@_pratul_ @jon_barron
These days, who can say what crazy stuff is happening to your photos after they're captured, especially on a cellphone? Better to go straight to the source and grab those pixels fresh from the Bayer quads.
That means fewer views are required to reconstruct at the same quality as before. No one wants to capture 100-1000+ images for a good nerf or splatting result, it's super tedious!! Progress toward this means progress toward faster, easier, more casual 3D capture in the future 🎉
n/n
Plus, optimizing NeRF in the linear space of raw data means you can postprocess its rendered novel views just like any raw photograph (e.g., adjusting exposure, tonemapping, or white balance). You can even render synthetic defocus with correctly exposed bokeh.
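A toy illustration of that "postprocess the linear render like a raw photo" point (the gains and curve here are made up; sRGB gamma stands in for a real camera tonemap):

```python
import numpy as np

def postprocess_linear(rgb_linear, exposure=1.0, wb_gains=(2.0, 1.0, 1.6)):
    """Apply exposure, white balance, and a simple tonemap to a linear-space
    rendering, the same way one would develop a raw photograph.
    Illustrative sketch only: gains and curve are placeholders."""
    x = rgb_linear * exposure * np.asarray(wb_gains)  # per-channel WB gains
    x = np.clip(x, 0.0, 1.0)
    # sRGB gamma curve as a stand-in for a camera's tonemapping step
    return np.where(x <= 0.0031308, 12.92 * x, 1.055 * x ** (1 / 2.4) - 0.055)
```

Because the NeRF itself lives in linear space, you can re-run this step with any exposure or white balance after rendering, instead of baking one choice into the reconstruction.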
Great overview from @fdellaert! I'd also like to highlight @TinghuiZhou and @Jimantha et al. for bringing volume rendering into deep learning for view synthesis with their paper Stereo Magnification in 2018.
2020 was the year in which *neural volume rendering* exploded onto the scene, triggered by the impressive NeRF paper by Mildenhall et al. I wrote a post as a way of getting up to speed in a fascinating and very young field and share my journey with you:
3. T * sigma can be thought of as a PDF, implying that we're returning the expected color along the ray. This is easily extended to return the expected value of any other quantity (e.g., T * sigma * distance to get expected depth).
/end
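The weight/expectation view from point 3, as a minimal numpy sketch (quadrature follows the NeRF paper's discretization; this is illustrative, not the official code):

```python
import numpy as np

def render_ray(sigmas, colors, ts):
    """NeRF-style quadrature: w_i = T_i * (1 - exp(-sigma_i * delta_i)).
    The weights sum to at most 1, so they act like a (sub-)PDF along the ray:
    expected color = sum_i w_i * c_i, expected depth = sum_i w_i * t_i."""
    deltas = np.diff(ts, append=ts[-1] + 1e10)             # interval lengths
    alphas = 1.0 - np.exp(-sigmas * deltas)                # per-sample opacity
    trans = np.cumprod(np.append(1.0, 1.0 - alphas))[:-1]  # transmittance T_i
    weights = trans * alphas
    color = (weights[:, None] * colors).sum(axis=0)
    depth = (weights * ts).sum()
    return color, depth, weights
```

Swapping `ts` for any other per-sample quantity in the final expectation gives you the expected value of that quantity along the ray.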
This is such a huge blocker for neural rendering research! Graphics algorithms have so much dynamism and it is nigh impossible to try integrating these with current ML frameworks without a lot of custom low level work
for instance, a simulator of the future could involve NNs in k-d tree lookup, NNs for sim2real, NNs for contact prediction. These pieces depend on each other but not necessarily on a per-gradient-update basis, so potentially the AD software could be designed around that.
@pesarlin
certainly true but emphasizing this too much can also result in psychological torment, leading people to lock themselves in a room obsessing over how to create the Next Big Thing, rather than achieving a healthy balance and creative flow of research progress over time
I'm excited about ReconFusion because it shows the potential of taking that basic setup and guiding optimization with a smarter prior that has a strong opinion about what uncaptured novel views should look like, based on learning from large multiview image datasets.
8/
1. It produces rendering weights T * alpha that match the alpha compositing model used in previous view synthesis methods like Neural Volumes and multiplane images.
2. It's a more common convention in graphics/rendering work (for the reasons in the screenshot above).
This produces the typical integrand T * sigma * color from NeRF.
Ok, so what's up with the sigma debate? As some replies to the original thread have noted, this is a matter of volume rendering convention. Later on the same page of Novák et al. we see:
Time to revise these areas... so awkward trying to fit neural rendering submissions into these categories. Many are hiding in "stereo and multiview 3d", the place to be for orals this year 🔥🔥🔥
So we can either keep sigma * L in the integrand or fold them together into a new emitted L'. This is analogous to the question of premultiplied alpha in over-compositing.
We *intentionally* chose sigma * L in NeRF. It makes sense for a variety of reasons:
T is transmittance, L_e is (emitted) color, and mu_a is what we call "density" (but is technically the absorption coefficient) and typically denote by sigma in NeRF.
Since NeRF's rendering model doesn't include in- or out-scattering, mu_s = 0 and the second term disappears.
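Concretely, with mu_s = 0, writing sigma for mu_a and c for L_e, the integral reduces to the familiar NeRF volume rendering equation:

```latex
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,\mathrm{d}t,
\qquad
T(t) = \exp\!\Big(-\!\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,\mathrm{d}s\Big)
```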
Honourable Mention @ICCV_2021
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields
Jonathan T. Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, Pratul Srinivasan
[Session 5 A/B]
Appearance interpolation is done directly in NeRF weight space! Use simple meta-learning (Reptile) to get the initial scene weights, then just do a couple extra gradient steps to acquire the appearance of any new input image
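A toy sketch of that Reptile loop, with a one-parameter least-squares fit standing in for per-scene NeRF optimization (all names and hyperparameters here are illustrative):

```python
import numpy as np

def reptile(theta, sample_task, inner_steps=5, inner_lr=0.1, outer_lr=0.5, meta_iters=200):
    """Reptile meta-learning: adapt to a sampled task with a few SGD steps,
    then nudge the meta-initialization toward the adapted weights."""
    theta = theta.copy()
    rng = np.random.default_rng(0)
    for _ in range(meta_iters):
        target = sample_task(rng)                 # a "task" is just a target to fit
        phi = theta.copy()
        for _ in range(inner_steps):
            phi = phi - inner_lr * 2.0 * (phi - target)  # grad of ||phi - target||^2
        theta = theta + outer_lr * (phi - theta)  # Reptile outer update
    return theta

# Tasks: fit a scalar to targets drawn near 3.0; the meta-init converges near 3.0,
# so adapting to any new task takes only a couple of gradient steps.
meta_init = reptile(np.array([0.0]), lambda rng: 3.0 + 0.1 * rng.standard_normal(1))
```

In the paper's setting, the inner loop is NeRF training on one scene and the "couple extra gradient steps" specialize the meta-initialized weights to a new input image.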
Many view synthesis projects later, I'd say we are somewhat remiss for not having included more of these plots in our papers. The difficulty of reconstructing a captured scene is 100% tied to view sampling density, and as a community we tend to underemphasize this fact.
5/
This rate is basically sampling with 1 pixel of disparity between input views -- that is, you move the camera such that the nearest thing in the scene only moves 1 pixel in the resulting image (!!) Obviously, this is totally intractable.
3/
The point of NeRF was: setting up a dumb brute-force optimization for a dense scene reconstruction that can rerender all your input images works surprisingly well.
There's nothing "learned" there beyond the prior encoded by your 3D representation, plus the input images.
7/
In ReconFusion, we ran this test on the kitchenlego scene (quality vs # input views) and got a very satisfying result.
btw -- the lines do end up crossing for some metrics, at around 160 views :) The original full capture is about 250 input images.
6/
Back in grad school, @_pratul_, Ren, and I talked a lot about plenoptic sampling in the context of view synthesis. There's a fundamental Nyquist sampling rate that needs to be achieved to guarantee "perfect" view synthesis/light field interpolation at a given resolution.
2/
@BartWronsk
Some very cool results on tracing through variable index of refraction in "Refractive Radiative Transfer Equation" by Ament et al. in SIGGRAPH 2014. Not including the wind simulation though!
@mmalex
one thing siggraph conference track nailed is, don't force authors to label their own paper as "lower tier" -- leave it as dual-track that can end up as either conference/journal and let reviewers decide. win/win as even good papers are encouraged to tighten up to 7 pages :)
Their beautiful high-res results convinced @_pratul_ and me to immediately switch over from working on depth-map-based warping to a volumetric rendering model using multiplane images.
@jperldev
ha oops. that was meant to be in contrast to surface-based rendering (discontinuity at occlusion boundaries is hard to deal with), especially of SDFs (needs implicit differentiation).
only "trivial" as in, implement the fwd pass and autodiff gives you the gradient for free...
In LLFF, we related our view synthesis performance to this fundamental limit, showing that if we used N planes in an MPI, we could increase the baseline between sampled views to N pixels.
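Back-of-the-envelope version of that argument (pinhole camera; the numbers are made up): the nearest point's disparity between adjacent views is focal_px * baseline / z_near pixels, so allowing N pixels of disparity scales the maximum camera spacing by N:

```python
def max_baseline(n_pixels, z_near, focal_px):
    """Largest camera spacing such that the nearest scene point (at depth z_near)
    moves at most n_pixels between adjacent views, using the pinhole relation
    disparity_px = focal_px * baseline / depth."""
    return n_pixels * z_near / focal_px

# e.g. nearest object at 2 m, focal length of 1000 px:
b_lightfield = max_baseline(1, z_near=2.0, focal_px=1000.0)   # 1 px disparity: 2 mm steps
b_mpi32 = max_baseline(32, z_near=2.0, focal_px=1000.0)       # 32-plane MPI: 64 mm steps
```

Moving the camera only 2 mm between captures is why the raw Nyquist rate is intractable, and why an N-plane MPI relaxing it by a factor of N matters so much.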
You can read this as image quality vs. # of input views (going from most to least).
4/
@mmalex
the cultural shift is the main issue yeah, people have made various well-intentioned attempts to do this but i think most of them have missed the mark due to failing/refusing to understand the motivations behind prestige seeking behavior...
@simesgreen
siggraph submissions never fail to surprise, this is definitely right up there with computational citrus peeling and the knitting compiler
"We partner with several zoos and circuses..."
This is probably a good time to mention that my team (a subset of this author list, and a part of Google Research) is recruiting interns for next summer at our San Francisco and London offices. Email me at barron@google.com if you're interested in working on something NeRF-y.