Given a single blueprint (the image), TeCH involves a collaboration between an “architect” (reconstructing w/ the image) and a “painter” (imagining w/ the image descriptions). We specifically illustrate the correlation between generation and reconstruction w.r.t. the input views. (3/10)
ECON got accepted by
#CVPR2023
Detailed clothed human recovery from a single image via normal integration.
Is an implicit MLP a must? NO.
Is data-driven learning a must? NO.
How to keep pose robustness w/o sacrificing topological flexibility?
See
ECON's
@huggingface
is ready to play with!
Besides human digitization from a single unconstrained image, it supports pose+prompt guided image generation (ControlNet) as well.
Shout out to Lee Kwan Joong () for developing an "all-in-one" Blender Add-on, which includes an image-based clothed human reconstructor, an avatarizer for animation, and a texture generator.
Tutorial:
ECON (
#CVPR2023
) reconstructs high-fidelity 3D humans, even those wearing 𝗹𝗼𝗼𝘀𝗲 𝗰𝗹𝗼𝘁𝗵𝗶𝗻𝗴 or in 𝗰𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗶𝗻𝗴 𝗽𝗼𝘀𝗲𝘀, from a single image. The reconstructions can be animated with SMPL-X poses. Here we demo the Rasputin dance using ECON+HybrIK-X. (1/n)
Foundation models (LLMs, Diffusion, SAM) are like IBM's mainframe computers, while finetuning acts as the PC for personalization and customization.
LoRA adjusts weights through addition, while BOFT uses matrix multiplication to rotate them.
Project:
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
paper page:
Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream…
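The LoRA-vs-BOFT contrast above (additive low-rank update vs. multiplicative orthogonal rotation) can be sketched in a few lines of numpy. This is a toy illustration under stated assumptions: the rotation here comes from a plain Cayley transform, whereas BOFT actually factorizes it into a product of sparse butterfly matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                       # hidden size, low rank (toy values)
W = rng.standard_normal((d, d))   # frozen pretrained weight

# LoRA: additive low-rank update, W' = W + B @ A
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))              # zero-init so finetuning starts at W' = W
W_lora = W + B @ A

# Orthogonal finetuning: multiplicative update, W' = R @ W with R orthogonal.
# Here R is built from a skew-symmetric S via the Cayley transform; BOFT
# instead parameterizes R as a product of sparse butterfly factors.
S = rng.standard_normal((d, d)) * 0.01
S = S - S.T                                         # skew-symmetric
R = np.linalg.solve(np.eye(d) + S, np.eye(d) - S)   # orthogonal by construction

W_oft = R @ W
# A rotation preserves pairwise angles between neuron weight vectors,
# which is the hyperspherical-energy argument behind orthogonal finetuning.
print(np.allclose(W_oft.T @ W_oft, W.T @ W))  # True
```

The design choice in one line: LoRA moves weights inside a low-rank subspace, while orthogonal finetuning rigidly rotates them, leaving norms and angles intact.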
Excited to share our
#NeurIPS2022
(Dataset and Benchmark Track) work, DART, which extends MANO, the widely used hand model, with Diverse 3D Accessories and Rich Textures, to synthesize more realistic hand data.
Project:
Code:
Finally, ICON joined the big family of
@huggingface
and
@Gradio
. Upload or generate a human image, select a method (PIFu/PaMIR/ICON), and you will get 1) body and clothed human meshes, 2) SMPL parameters, and 3) a rendered video. Special thanks
@NimaBoscarino
@_akhaliq
@fffiloni
And it's alive! There it is, working, the
@Gradio
demo for ICON: Implicit Clothed humans Obtained from Normals on
@huggingface
demo:
Congrats
@yuliangxiu
😌👏
Thanks
@_akhaliq
for sharing our new work TeCH.
Reconstruction is a form of Conditional Generation, especially in one-shot and few-shot settings.
Reconstruct the visible like an architect, imagine the invisible like a painter.
Project:
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans
abs:
paper page:
Despite recent research advancements in reconstructing clothed humans from a single image, accurately restoring the "unseen regions" with high-level…
TeCH gets in
@3DVconf
. Honestly, I am a bit depressed that this paper did not receive an oral acceptance. However, I strongly believe that TeCH showcases a paradigm shift in avatar creation, bridging reconstruction and generation.
Code:
We see Reconstruction as a form of conditional Generation. Conditioned on a single image and the descriptive prompts derived from it, TeCH can reconstruct a “Lifelike” clothed human. “Lifelike” refers to detailed shape and high-fidelity texture, even on the BACKSIDE. (1/10)
After 2 years of hard work by the team, we are thrilled to release today! Scholar Inbox is a personal paper recommender which enables you to stay up-to-date with the most relevant progress by delivering personal suggestions directly to your inbox.🧵
G-Shell is an explicit representation that can effectively model both watertight and non-watertight shapes. It is compatible with rasterization-based rendering, and it is fast and flexible across tasks such as reconstruction and generation.
Ghost on the Shell:
An 'open surface' is like a floating ghost on the template watertight mesh.
I like this paper. The idea is clean, sounds reasonable, and offers strong modelling of non-watertight surfaces.
Combining SDF and mSDF, G-Shell integrates…
If these 3D models are automatically generated by ANY algorithm, whether data-driven or optimization-based, the algorithm itself should get the best paper award, and all the researchers working on monocular 3D reconstruction should switch their focus.
Yes, we're also developing Text2Avatar, but TADA has distinct advantages:
1. Simplicity: TADA uses SMPL-X + displacement layer, no NeRF/NeuS needed.
2. Alignment (geometry & texture): TADA ensures semantic alignment on face and pattern alignment on clothing.
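Point 1 above can be sketched in a few lines. All names and sizes here are hypothetical toy values, not TADA's actual implementation (which learns the displacements and texture via score distillation on an upsampled SMPL-X mesh):

```python
import numpy as np

rng = np.random.default_rng(0)
V, J = 1000, 4   # toy vertex/joint counts (SMPL-X itself has ~10k verts, 55 joints)

template = rng.standard_normal((V, 3))    # canonical SMPL-X-style body vertices
normals = rng.standard_normal((V, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

# The "displacement layer": learnable per-vertex offsets along the normals
# (zero-initialized placeholder here; optimization would move these).
displacement = np.zeros((V, 1))
shaped = template + displacement * normals  # clothed canonical shape

# Animate with plain linear blend skinning: x' = sum_j w_j (R_j x + t_j).
weights = rng.random((V, J))
weights /= weights.sum(axis=1, keepdims=True)
R = np.stack([np.eye(3)] * J)               # identity pose for this demo
t = np.zeros((J, 3))
posed = np.einsum("vj,jab,vb->va", weights, R, shaped) + weights @ t

print(posed.shape)  # (1000, 3)
```

Because the avatar stays a mesh throughout, it animates and renders with standard graphics pipelines, which is exactly why no NeRF/NeuS is needed.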
We present “3D magician”: TADA! Text to Animatable Digital Avatars.
Given a textual description as input only, our method TADA generates expressive animatable 3D avatars with high-quality geometry and lifelike textures. (1/10)
Used NeRF to make a "Bullet Time" effect for a friend's wedding.
We set up 15 iPhones to capture slow-motion video, then used
@NVIDIAAIDev
Instant-NGP to train a bunch of NeRFs on the frames.
..need to work on improving the quality / resolution a bit more.
New add-on of
#ICON
to extract 3D garments from fashion images, by Daniel Gudmundsson, Marion Barrau-Joyeaux, Arthur Collette, and Amalie Kjaer from ETH Zürich.
Check for the details. Special thanks to
@songyoupeng
for his mentorship in the 3DV course😉
The number of papers does not matter.
- professors who always tweet "N (N>10) papers got into CVPR/ICCV/SIGGRAPH/NeurIPS..." and receive tens of "congrats" afterwards
I shared DELTA () yesterday, focusing on Video2Avatar, a traditional reconstruction task.
What if we apply the hybrid representation in the Text2Avatar generation task? See
@YaoFeng1995
's new work, TECA ()
Text-Guided Generation and Editing of Compositional 3D Avatars
paper page:
Our goal is to create a realistic 3D facial avatar with hair and accessories using only a text description. While this challenge has attracted significant recent interest,…
Thanks
@_akhaliq
and
@Gradio
, a strong and supportive team, for helping me set up the space. Here are a few PERSONAL expectations for
@huggingface
space: terminal support, an anonymous & unlisted mode (for the double-blind review process), and behavior & traffic analysis.
Multi-modal
#LLMs
understand a lot about humans. But do they understand our 3D pose? We train
#PoseGPT
to estimate, generate, and reason about 3D human pose (
#SMPL
) in images and text. This is the first true foundation model for understanding 3D humans.
Day 260 of
#Blender3d
|
#b3d
I generated an image using
#stablediffusion
and
#DALLE
2
I put the image into
#ICON
to turn it into a 3D model.
Then I used
#mixamo
to animate the model. (Then did a little extra thing with Blender) Details below!
Can we construct avatars from pixels? If we can take an image or video and get a detailed 3D likeness of a person that can be animated and inserted into games, it would open up many applications. ICON (
#CVPR2022
) takes a step in this direction. (1/9)
Introducing the Luma✨Unreal Engine alpha! Fully volumetric Luma NeRFs running realtime on Windows in UE 5 for incredible cinematic shots and experiences, starting today!
Try now:
Seems more like "conditional generation" than "pixel-aligned reconstruction". But still, very impressive results!
Please correct me if I am wrong.🥸
We're thrilled to announce a breakthrough in 3D world generation. Now, transform ANY image - AI-generated, concept art or real world shots - into high-resolution game-engine ready 3D asset. Check it out:
🎈 Public Showcase on Discord:
🤖 Generate your own…
AFAIK, more and more Canadian visitor visa applications are SERIOUSLY delayed, including mine from March 11th. With
@CVPR
only a month away, is it possible to expedite this process via the exemption letter like
@eccvconf
?
@CitImmCanada
@ctocevents
Again, would it be possible for
@ICCVConference
to allocate space for
@CVPR
posters upon request? This may potentially attract more attendees and compensate for those who were unable to attend due to the visa issues.
Seeing so many empty posters & missing authors at
#CVPR2023
is heartbreaking - how many? 20%? Many PhD students worked hard but this absurd visa system jeopardized their chance to proudly present their work. I know that PCs
@CVPR
took action but this was largely insufficient… 1/
Apart from the design of per-point displacement, my favorite part of S3F is the supmat, where the authors explored a range of ideas that ended up degrading or not affecting performance. I learned a lot from it!
📢📢 Our paper "Structured 3D Features (S3F) for Reconstructing Relightable and Animatable Avatars" was accepted at
#CVPR2023
!
S3Fs take an input image and generate a 3D human reconstruction that can be animated, relighted or edited (eg. change clothes) without post-processing!
The "Explicit" vs "Implicit" debate parallels the UI design dispute between "Skeuomorphism" and "Flat". Representations vary in suitability!
@YaoFeng1995
's DELTA combines meshes for the body+face with NeRF for the clothing+hair, and renders them in a unified way. Don't just bookmark it, READ it RIGHT NOW.
Learning Disentangled Avatars with Hybrid 3D Representations
paper page:
Tremendous efforts have been made to learn animatable and photorealistic human avatars. Towards this end, both explicit and implicit 3D representations are heavily studied for a…
Besides, AlphaPose supports mainstream DL frameworks
@PyTorch
@JittorHub
@ApacheMXNet
, and SMPL body estimation. It has been a pioneer in top-down multi-person keypoint estimation approaches. Easy to set up and use!
Excited to announce that AlphaPose got accepted at TPAMI🎉
AlphaPose now supports 136 whole-body keypoints estimation and tracking in real-time.
arXiv:
Code:
In
@ScienceMagazine
, we present
#AlphaCode
- the first AI system to write computer programs at a human level in competitions.
It placed in the top 54% of participants in coding contests by solving new and complex problems.
How does it work? 🧵
I am happy to announce that the 1/1 paper I rejected with high confidence finally got rejected, and the 2/2 papers I voted for acceptance finally got into
#ECCV2022
✨ Today we are launching NeRF Reshoot on iOS!
Capture lifelike 3D and then create incredible shots all day using AI and the most intuitive 3D editor ever, right on your iPhone.
Available on the AppStore, today!
#3d
#ai
#nerf
#lumaai
CLIP is so powerful! OpenScene is a great example of how to extend CLIP to 3D data, yielding a scalable paradigm for long-tail cases.
Don't miss this awesome work from
@songyoupeng
OpenScene: 3D Scene Understanding with Open Vocabularies
@songyoupeng
, Kyle Genova, Chiyu "Max" Jiang,
@taiyasaki
,
@mapo1
, Thomas Funkhouser
tl;dr: CLIP meets point cloud.
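The "CLIP meets point cloud" recipe can be caricatured as follows. Everything here is a placeholder: random vectors stand in for real CLIP image/text embeddings, and OpenScene actually distills per-point features by fusing pixel features from posed 2D views.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 1000, 512  # points, embedding dim (512 matches CLIP ViT-B)

# Placeholder per-point features, assumed already distilled into CLIP space.
point_feats = rng.standard_normal((N, D))
point_feats /= np.linalg.norm(point_feats, axis=1, keepdims=True)

queries = ["chair", "table", "a pile of laundry"]    # ANY open-vocabulary text
text_feats = rng.standard_normal((len(queries), D))  # stand-in for CLIP text encoder
text_feats /= np.linalg.norm(text_feats, axis=1, keepdims=True)

# Zero-shot 3D segmentation: label each point by cosine similarity
# against the text queries in the shared embedding space.
sim = point_feats @ text_feats.T   # (N, num_queries)
labels = sim.argmax(axis=1)

print(labels.shape)  # (1000,)
```

The scalability comes from the query side: adding a new (long-tail) category is just another text string, with no 3D retraining.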
PointAvatar represents avatars as rigged and animated point clouds, learned from monocular video. PointAvatar jointly optimizes the point geometry, texture and deformations. It disentangles the observed color into albedo and shading values, allowing basic relighting. (2/9)
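The albedo/shading disentanglement amounts to factoring the observed color, which is what enables the basic relighting. A minimal sketch with Lambertian shading; the normals and colors are random stand-ins for PointAvatar's learned quantities:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5000  # points in the avatar point cloud

albedo = rng.random((N, 3))                  # per-point base color in [0, 1]
normals = rng.standard_normal((N, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

def shade(normals, light_dir):
    """Scalar Lambertian shading per point: max(0, n . l)."""
    l = np.asarray(light_dir, dtype=float)
    l /= np.linalg.norm(l)
    return np.clip(normals @ l, 0.0, 1.0)[:, None]

# Observed color factors into albedo * shading ...
color = albedo * shade(normals, [0.0, 0.0, 1.0])

# ... so basic relighting is just swapping in a new shading term.
relit = albedo * shade(normals, [1.0, 0.0, 0.0])

print(color.shape, relit.shape)  # (5000, 3) (5000, 3)
```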
What an amazing team lineup! Unfortunately, I won't be able to attend CVPR due to the vexing visa issue. However, for Paris, there is no need to concern oneself with visa matters!
.
@Stanford
@UCBerkeley
&
@Caltech
computer vision faculty & their students meet today to exchange research ideas; topics include 3D vision, language-visual models, robotic learning, computational photography, vision foundation models, etc. At the end of the day, AI is truly fun science! 1/
Would it be possible for
@ICCVConference
to allocate space for
@CVPR
posters upon request? This may potentially attract more attendees and compensate for those who were unable to attend either conference due to visa and COVID-related issues.
@CSProfKGD
Been in this community for more than 4 years, got 2 CVPR and 1 ICCV papers accepted, but have NEVER been to a computer vision conference in person, even once… this is the last chance during my PhD, but I still didn't make it. I don't think I'm the only one🤔.
LLMs do connect isolated islands, e.g., 2D landmarks, 3D pose space, pixels, and language, under a unified knowledge space. Various "Projectors/Adaptors" will emerge for different output formats in downstream tasks.
I think about the field of 3D human pose, shape, and motion estimation as having three phases. 1: Optimization. 2: Regression. 3: Reasoning. With
#PoseGPT
, we are just entering phase 3. I summarize the coming paradigm shift in this blog post:
To evaluate the stability of poses estimated by IPMAN, we place them in a Bullet physics simulation. We find that IPMAN produces 14.8% more stable bodies than a baseline method. Example IPMAN poses are in blue, the baseline in orange. 8/9
First time advertising Vid2Avatar personally! A surprising demo here that might interest young (Chinese) students/researchers. Though I cannot be in Vancouver due to the visa issue, do come by our poster session WED-PM-048 and talk to the other co-authors!
It's so disappointing to see an ETH professor defend his inappropriate slide like this. I am also attaching his reply to the original post:
This is NOT acceptable and we need a statement.
@ETH_en
@Joel_Mesot
@G_Dissertori
@springman_sarah
KeypointNeRF's "relative spatial keypoint encoder" is a general plug-and-play module for different downstream tasks. I have integrated it into ICON, where it achieves performance comparable to the expensive body SDF.
More details at:
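The gist of a relative spatial keypoint encoding is to describe a query point by where it sits relative to detected body keypoints rather than by its absolute position. A hypothetical minimal version (not KeypointNeRF's exact encoder; names and weighting are my own placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 24                                  # body keypoints (e.g., skeleton joints)
keypoints = rng.standard_normal((K, 3))

def relative_keypoint_encoding(query, keypoints, sigma=1.0):
    """Encode a 3D query by its offsets to each keypoint, Gaussian-weighted
    by distance so that nearby joints dominate the feature."""
    offsets = query[None, :] - keypoints            # (K, 3)
    dists = np.linalg.norm(offsets, axis=1)         # (K,)
    w = np.exp(-(dists ** 2) / (2 * sigma ** 2))
    return (w[:, None] * offsets).reshape(-1)       # (3K,)

q = np.array([0.1, 0.2, 0.3])
feat = relative_keypoint_encoding(q, keypoints)
print(feat.shape)  # (72,)

# The encoding depends only on relative geometry: translating the whole
# scene (query + keypoints together) leaves the feature unchanged.
shift = np.array([5.0, -2.0, 1.0])
print(np.allclose(relative_keypoint_encoding(q + shift, keypoints + shift), feat))  # True
```

That built-in relative structure is what makes such an encoder easy to drop into different pixel-aligned pipelines like ICON.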
The episode + all relevant links and resources are available on:
On my blog:
On YouTube:
On the Podcast:
(Available on Spotify, Apple Podcasts, Google Podcasts, and more!)
The GOAT of tennis
@DjokerNole
said: "35 is the new 25.” I say: “60 is the new 35.” AI research has kept me strong and healthy. AI could work wonders for you, too!
As an advisor, there is nothing better than seeing your students and post docs succeed, grow, and become part of the community. This group is so impressive. I love how they support each other and I love their intellectual curiosity. I’m only sad for the ones who couldn’t come.
Imagine if 'GPT' could rapidly become an expert in ANY subject, autonomously gathering information from around the world, retaining it indefinitely, and continuously learning 24/7 for centuries on end...
We should definitely fire the PhDs and purchase more GPUs
PhD students, don't worry. Technologies, trends, and even whole fields come and go. A PhD makes you an expert in a field but, more importantly, teaches you how to become an expert. Once you know that you can learn anything, you can adapt to major disruptions in your field.
Have tried several SMPL-based pose estimators on very challenging images, and PyMAF performs most robustly. Hence I finally set PyMAF as the default HPS of ICON.
Cannot wait to replace PyMAF with PyMAF-X!
Compared with other methods, TADA excels in generating high-fidelity results on different avatars, with various shapes and clothes. TADA enables real-world applications, such as virtual try-on, texture editing, and geometrical editing between two avatars. (9/10)
Reading the "GPTs are GPTs" paper. It's super interesting that those with a bachelor's degree seem to be the most exposed to LLMs in the labor market. You'd be less exposed with LESS OR MORE education.