@NVIDIA — Sr. Research Manager & Lead of Embodied AI (GEAR Lab). Creating foundation models for Humanoid Robots & Gaming. @Stanford Ph.D. @OpenAI's first intern.
Today is the beginning of our moonshot to solve embodied AGI in the physical world. I’m so excited to announce Project GR00T, our new initiative to create a general-purpose foundation model for humanoid robot learning.
The GR00T model will enable a robot to understand multimodal…
If you think OpenAI Sora is a creative toy like DALLE, ... think again. Sora is a data-driven physics engine. It is a simulation of many worlds, real or fantastical. The simulator learns intricate rendering, "intuitive" physics, long-horizon reasoning, and semantic grounding, all…
I asked GPT-4 to take over Twitter and outsmart @elonmusk. It comes up with "Operation TweetStorm"😮 and wants to publicly challenge Elon to a "Tweet-off showdown". Highlights:
- GPT-4 wants to *own an unrestricted version of itself*: develop an LLM to power a bot army of…
The famed Stanford Smallville is officially open-source!
25 AI agents inhabit a digital Westworld, unaware that they are living in a simulation. They go to work, gossip, organize socials, make new friends, and even fall in love. Each has unique personality and backstory.…
What if we set GPT-4 free in Minecraft? ⛏️
I’m excited to announce Voyager, the first lifelong learning agent that plays Minecraft purely in-context. Voyager continuously improves itself by writing, refining, committing, and retrieving *code* from a skill library.
GPT-4 unlocks…
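The write→refine→commit→retrieve loop can be sketched in a few lines. This is a minimal, hypothetical stand-in: keyword overlap instead of the embedding-based retrieval Voyager actually uses, and Python stubs instead of its JavaScript skills:

```python
# Toy skill library in the spirit of Voyager (illustrative API, not the real one).

class SkillLibrary:
    """Stores named code skills; retrieves by keyword overlap with the
    skill description (a crude stand-in for embedding similarity search)."""

    def __init__(self):
        self.skills = {}  # name -> (description, source code)

    def commit(self, name, description, code):
        self.skills[name] = (description, code)

    def retrieve(self, query, top_k=1):
        words = set(query.lower().split())
        ranked = sorted(
            self.skills.items(),
            key=lambda kv: -len(words & set(kv[1][0].lower().split())),
        )
        return [name for name, _ in ranked[:top_k]]

lib = SkillLibrary()
lib.commit("mine_wood", "chop a tree to collect wood logs", "def mine_wood(bot): ...")
lib.commit("craft_table", "craft a crafting table from wood planks", "def craft_table(bot): ...")
print(lib.retrieve("collect wood from a tree"))  # → ['mine_wood']
```

The real agent would feed the retrieved skill's source code back into the LLM context before attempting a new task.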
My team at NVIDIA is hiring. We 🩷 you all from OpenAI. Engineers, researchers, product team, alike. Email me at linxif@nvidia.com. DM is open too. NVIDIA has warm GPUs for you on a cold winter night like this, fresh out of the oven.🩷
I do research on AI agents. Gaming+AI,…
AI Twitter is flooded with low-quality stuff recently. No, GPT is not “dethroned”. And thin wrapper apps are not “insane”. At all.
I feel obligated to surface some quality posts I bookmarked. Every one of them should've been promoted 10x, but ¯\_(ツ)_/¯
In no particular order:
10x engineer is a myth. 100x AI-powered engineer is more real than ever. As OpenAI winds down Codex, Microsoft announces GitHub Copilot X. I think it's almost as exciting as GPT-4 itself:
- Copilot Chat: any piece of text database will be "chattable", and codebase is no…
We’ve seen a gazillion startups using OpenAI APIs to do “co-pilot for X”. What’s next?
Enter *physical* co-pilot! Here’s a compelling demo: you improvise by playing a “low resolution” piano, and the co-pilot compiles it real-time to Hi-Fi music! It unleashes our inner pianist.🧵
This is a master 4D chess move. WOW.
1. No new corporate structure. MSFT is literally one of the oldest for-profit tech companies out there, with a mature legal structure. Whether it's good for AGI is up for debate.
2. MSFT always wants to own the GPT weights. Now the moment has…
We remain committed to our partnership with OpenAI and have confidence in our product roadmap, our ability to continue to innovate with everything we announced at Microsoft Ignite, and in continuing to support our customers and partners. We look forward to getting to know Emmett…
I don't give a damn about what is or isn't AGI. It doesn't matter.
Below is GPT-4's performance on many standardized exams: the Bar, LSAT, GRE, AP, etc.
The truth is, GPT-4 can apply to Stanford as a student now. AI's reasoning ability is OFF THE CHARTS. Exponential growth is the…
Can GPT-4 teach a robot hand to do pen spinning tricks better than you do?
I'm excited to announce Eureka, an open-ended agent that designs reward functions for robot dexterity at super-human level. It’s like Voyager in the space of a physics simulator API!
Eureka bridges the…
You'll soon see lots of "Llama just dethroned ChatGPT" or "OpenAI is so done" posts on Twitter. Before your timeline gets flooded, I'll share my notes:
▸ Llama-2 likely costs $20M+ to train. Meta has done an incredible service to the community by releasing the model with a…
HuggingGPT is the most interesting paper I read this week. It gets very close to the "Everything App" vision that I described a while ago.
ChatGPT acts as a controller over the *AI model space*, picks the right model (app) given the human specification, and assembles them…
Here’s the recipe to make Siri/Alexa 10x better:
1. Whisper to convert speech to text. Best open-source speech model out there.
2. ChatGPT to generate smart home API calls and/or text response.
3. VALL-E to synthesize speech. It can mimic anyone’s voice sample!
Quick figure 1/3
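The three stages chain together naturally. In this sketch every function body is a stub standing in for the real model call (Whisper, a chat LLM, VALL-E), and the parse logic is purely illustrative:

```python
# Sketch of the 3-stage voice-assistant pipeline. The bodies are stubs:
# in a real system each would call the corresponding model.

def speech_to_text(audio: bytes) -> str:
    # Whisper would transcribe here; stubbed for illustration.
    return "turn off the living room lights"

def plan_action(utterance: str) -> dict:
    # The LLM would emit a smart-home API call; stubbed as a fixed parse.
    device = "living_room_lights" if "living room lights" in utterance else "unknown"
    action = "off" if "turn off" in utterance else "on"
    return {"device": device, "action": action, "reply": f"Okay, lights {action}."}

def text_to_speech(reply: str) -> bytes:
    # VALL-E would synthesize audio here; stubbed as UTF-8 bytes.
    return reply.encode("utf-8")

call = plan_action(speech_to_text(b"\x00fake-audio"))
print(call["device"], call["action"])  # → living_room_lights off
```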
Somehow in this epic meltdown, Satya swoops in, wins it all, and wins with grace. I'm floored.
OpenAI was invincible until Friday. Now Microsoft will fully own an in-house GPT-4 in ~9 months, leverage its massive distribution power to spin the biggest data flywheel ever, collect…
Million dollar idea: LLM keyboard.
Every time I type on my phone and autocorrect makes a stupid mistake, it screams LLM. This is *literally* next word prediction.
We should be typing 10x faster. Input methods need serious upgrades. The LLM doesn’t have to be big and can be…
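The core mechanic is literal next-word prediction. A toy bigram counter (a crude stand-in for the small on-device LLM the tweet imagines) already illustrates the interface:

```python
# Minimal next-word predictor: count bigrams, suggest the most frequent follower.

from collections import Counter, defaultdict

def train_bigram(corpus: str):
    counts = defaultdict(Counter)
    words = corpus.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def suggest(counts, prev_word: str) -> str:
    prev_word = prev_word.lower()
    if prev_word not in counts:
        return ""
    return counts[prev_word].most_common(1)[0][0]

model = train_bigram("see you soon . see you tomorrow . see you soon")
print(suggest(model, "see"))  # → you
print(suggest(model, "you"))  # → soon
```

A real LLM keyboard would swap the bigram table for a small transformer, but the suggest-on-every-keystroke interface stays the same.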
NVIDIA basically compressed 30 years of its corporate memory into 13B parameters. Our greatest creations add up to 24B tokens, including chip designs, internal codebases, and engineering logs like bug reports. Let that sink in.
The model "ChipNeMo" is deployed internally, like a…
*If* GPT-4 is multimodal, we can predict with reasonable confidence what GPT-4 *might* be capable of, given Microsoft’s prior work Kosmos-1:
- Visual IQ test: yes, the ones that humans take!
- OCR-free reading comprehension: input a screenshot, scanned document, street sign, or…
How to make ChatGPT 100x better at solving math, science, and engineering problems for real?
Teach it to use the Wolfram language.
ChatGPT: the best neural reasoning engine.
Mathematica: the best symbolic reasoning engine.
I can’t think of a happier marriage. 🧵 with example:
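A sketch of that division of labor: a stubbed "neural" side translates the question into a formal expression, and a tiny exact evaluator (standing in for the Wolfram engine) computes the answer without arithmetic slips. The question-to-expression mapping is hard-coded for illustration:

```python
# Neural side proposes a formal expression; symbolic side evaluates it exactly.

import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Pow: operator.pow}

def symbolic_eval(expr: str):
    """Exact evaluation of +, -, *, ** over integers (the 'Mathematica' side)."""
    def walk(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval").body)

def llm_translate(question: str) -> str:
    # Stub for the neural side: map a question to a formal expression.
    return {"What is 2 to the power 100?": "2**100"}[question]

print(symbolic_eval(llm_translate("What is 2 to the power 100?")))
# → 1267650600228229401496703205376
```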
The music & sound-effect industry has not fully understood the size of the storm about to hit.
There's not just one, or two, but FOUR new audio models in the past week *alone*.
If 2022 is the year of pixels for generative AI, then 2023 is the year of sound waves.
Deep dive with me: 🧵
The first time I met Jensen was also the first time I met @elonmusk. I was interning at OpenAI that day and witnessed the moment Jensen handed Elon the first DGX. I slipped in my signature ;)
Elon, if you recall, I asked how "we (OpenAI) can beat DeepMind". You told me, "by…
6 years ago today, "Attention is All You Need" went up on arXiv! Happy birthday, Transformer! 🎂
Fun facts:
- Transformer did not invent attention, but pushed it to the extreme. The first attention paper was published 3 years prior (2014) and had an unassuming title: "Neural Machine…
We are looking at the future of VR, YouTube & Google Street View.
This is zip-NeRF, a 3D neural rendering tech rapidly approaching the quality of a real, high-res drone flight video. Think of NeRF as transporting reality into simulation. Metaverse will finally work this time.
The AI explosion is warping our sense of time. Can you believe Stable Diffusion is only 4 months old, and ChatGPT <4 weeks old 🤯? If you blink, you miss a whole new industry. Here are my TOP 10 AI spotlights, from a breathtaking 2022 in rewind ⏮: a long thread 🧵
Reading @MetaAI's Segment-Anything, and I believe today is one of the "GPT-3 moments" in computer vision. It has learned the *general* concept of what an "object" is, even for unknown objects, unfamiliar scenes (e.g. underwater & cell microscopy), and ambiguous cases.
I still…
We live in such strange times. Apple, a company famous for its secrecy, published a paper with a staggering amount of detail on their multimodal foundation model. Those who are supposed to be open are now wayyy less open than Apple.
MM1 is a treasure trove of analysis. They discuss…
MidJourney hired an engineer from Apple Vision Pro to be "Head of Hardware". My best guess is that they are thinking about generating full synthetic worlds for AR/VR, because of their rumored works on text-to-3D. Data-driven simulation is a hot topic at NVIDIA and very dear to my…
Enough with LLMs - exciting things are happening in the world of atoms.
This is Stanford ALOHA, a low-cost and agile robot platform. The whole system is open-source (!!): hardware design, CAD models for 3D printing, simulator, and training code. Time to …
This is an ape ("Kanzi") playing Minecraft! A fascinating experiment on non-human biological neural networks 🙉
I've been teaching AI to play Minecraft for too long. There're so many similar techniques that the ape trainers used:
- In-context reinforcement learning: Kanzi gets…
This is the way to unlock the next trillion high-quality tokens, currently frozen in textbook pixels that are not LLM-ready.
Nougat: an open-source OCR model that accurately scans books with heavy math/scientific notations. It's ages ahead of other open OCR options. Meta is…
After ChatGPT, the future belongs to multimodal LLMs. What’s even better? Open-sourcing.
Announcing Prismer, my team’s latest vision-language AI, empowered by domain-expert models in depth, surface normal, segmentation, etc.
No paywall. No forms. …
A neural network can smell like humans do for the first time!👃🏽
Digital smell is a modality that the AI community has long ignored, but maybe one day useful for a robot chef 👩🏽‍🍳? Here's how to do smell2text:
1. Collect 5,000 molecules and ask humans to label "creamy, chocolate,…
AutoGPT just exceeded PyTorch itself in GitHub stars (74k vs 65k). I see AutoGPT as a fun experiment, as the authors point out too. But nothing more. Prototypes are not meant to be production-ready. Don't let media fool you - most of the "cool demos" are heavily cherry-picked: 🧵
Career update: I am co-founding a new research group called "GEAR" at NVIDIA, with my long-time friend and collaborator Prof. @yukez. GEAR stands for Generalist Embodied Agent Research.
We believe in a future where every machine that moves will be autonomous, and robots and…
Apparently people are starting to wear prosthetic fingers, so that surveillance images look like they're generated by Stable Diffusion 😅
The human race is overfitting to the quirks of our AI overlords.
Microsoft will let companies create their own ChatGPT. “BYOD”: Bring Your Own Data.
Do you get the implication? Startups that are just thin wrappers around OpenAI API may finally get their moat! I think this is even more exciting than Bing+ChatGPT.
Start collecting data now.
Chatbot UI: an MIT-licensed, community-driven clone of the ChatGPT UI.
What most people don't realize is that you can pay *much less* to enjoy the same features as the official app. $20 worth of gpt-3.5 API is about writing a full Harry Potter book every …
The Adam optimizer is at the heart of modern AI. Researchers have been trying to dethrone Adam for years.
How about we ask a machine to do a better job?
@GoogleAI uses evolution to discover a simpler & more efficient algorithm with remarkable features.
It’s just 8 lines of code: 🧵
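The discovered optimizer was named Lion, and its update rule really is tiny: interpolate the gradient into momentum, take the sign, apply decoupled weight decay. A plain-Python, single-parameter sketch (hyperparameter values here are illustrative, not the paper's):

```python
# Lion-style update, per "Symbolic Discovery of Optimization Algorithms".

def sign(x):
    return (x > 0) - (x < 0)

def lion_step(w, g, m, lr=0.1, beta1=0.9, beta2=0.99, wd=0.0):
    update = sign(beta1 * m + (1 - beta1) * g)  # interpolate, then take the sign
    w = w - lr * (update + wd * w)              # decoupled weight decay
    m = beta2 * m + (1 - beta2) * g             # momentum tracked with beta2
    return w, m

w, m = lion_step(w=1.0, g=2.0, m=0.0)
print(w, m)  # → 0.9 0.02
```

Because the update is a sign, every parameter moves by exactly ±lr (plus decay), which is why Lion typically wants a smaller learning rate than Adam.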
AutoGPT is a prototype of the next frontier: "Agent Smith" AI that recursively clones itself.
Achieved by (1) identifying *when* its context gets overwhelming and needs offloading;
(2) distilling the “cognitive overflow” part into a prompt directive for its clone;
(3) talking…
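The three steps can be caricatured as a toy recursion. This is a hypothetical structure for illustration, not AutoGPT's actual code:

```python
# Clone-on-overflow sketch: detect a full context, distill it into a
# directive, and hand the remaining work to a fresh clone.

def run_agent(tasks, directive="start", context_limit=3, depth=0):
    context, done = [directive], []
    for i, task in enumerate(tasks):
        if len(context) >= context_limit:                       # (1) detect overflow
            summary = f"resume: {len(done)} tasks already done"  # (2) distill context
            return done + run_agent(tasks[i:], summary, context_limit, depth + 1)  # (3) clone
        context.append(task)
        done.append(f"[clone {depth}] {task}")
    return done

print(run_agent(["a", "b", "c", "d", "e"]))
# → ['[clone 0] a', '[clone 0] b', '[clone 1] c', '[clone 1] d', '[clone 2] e']
```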
This may be Apple's biggest move on open-source AI so far: MLX, a PyTorch-style NN framework optimized for Apple Silicon, e.g. laptops with M-series chips.
The release did an excellent job on designing an API familiar to the deep learning audience, and showing minimalistic…
My guess is that MidJourney has been doing a massive-scale reinforcement learning from human feedback ("RLHF") - possibly the largest ever for text-to-image.
When human users choose to upscale an image, it's because they prefer it over the alternatives. It'd be a huge waste not…
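Mechanically, each upscale click yields three "chosen beat rejected" pairs from a 4-image grid, which is exactly the data a reward model wants. A sketch using the standard Bradley-Terry objective (the data layout is my assumption, not MidJourney's disclosed pipeline):

```python
# Turn upscale clicks into pairwise preference data for a reward model.

import math

def clicks_to_pairs(grid, upscaled_index):
    """A 4-image grid plus one upscale click → 3 (preferred, rejected) pairs."""
    chosen = grid[upscaled_index]
    return [(chosen, other) for i, other in enumerate(grid) if i != upscaled_index]

def bradley_terry_loss(score_chosen, score_rejected):
    """-log sigmoid(s_chosen - s_rejected): low when the model agrees with the click."""
    return -math.log(1 / (1 + math.exp(-(score_chosen - score_rejected))))

pairs = clicks_to_pairs(["img_a", "img_b", "img_c", "img_d"], upscaled_index=2)
print(pairs)  # → [('img_c', 'img_a'), ('img_c', 'img_b'), ('img_c', 'img_d')]
print(bradley_terry_loss(2.0, 0.0) < bradley_terry_loss(0.0, 2.0))  # → True
```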
OpenAI just announced ChatGPT Plugins. If ChatGPT's debut was the "iPhone event", today is the "iOS App Store" event.
3 official plugins available now:
- Web browser: adding Bing in the loop
- Code interpreter: adding a live Python interpreter in a …
You think MidJourney's /describe is just a cool new tool? Think again. I believe hidden behind /describe is MidJourney's next-generation data flywheel.
/describe guesses the prompt from an image you upload. Then you can select from (or edit) 4 choices to generate more images.…
In my decade spent on AI, I've never seen an algorithm that so many people fantasize about. Just from a name, no paper, no stats, no product. So let's reverse engineer the Q* fantasy. VERY LONG READ:
To understand the powerful marriage between Search and Learning, we need to go…
Blackwell, the new beast in town.
> DGX Grace-Blackwell GB200: exceeding 1 Exaflop compute in a single rack.
> Put numbers in perspective: the first DGX that Jensen delivered to OpenAI was 0.17 Petaflops.
> GPT-4-1.8T parameters can finish training in 90 days on 2000 Blackwells.…
Let's reverse engineer the phenomenal Tesla Optimus. No insider info, just my own analysis. Long read:
1. The smooth hand movements are almost certainly trained by imitation learning ("behavior cloning") from human operators. The alternative is reinforcement learning in…
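Behavior cloning itself is nothing exotic: supervised learning on (observation, operator-action) pairs from teleoperation. A minimal 1-D linear-policy sketch with made-up numbers:

```python
# Behavior cloning = regress operator actions from observations.

def fit_linear_policy(obs, acts, lr=0.05, steps=500):
    """Fit action ≈ w * obs + b by gradient descent on squared error."""
    w, b = 0.0, 0.0
    n = len(obs)
    for _ in range(steps):
        grad_w = sum((w * o + b - a) * o for o, a in zip(obs, acts)) * 2 / n
        grad_b = sum((w * o + b - a) for o, a in zip(obs, acts)) * 2 / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Teleoperation demos where the operator's action is roughly 2x the observation:
demos_obs = [0.0, 1.0, 2.0, 3.0]
demos_act = [0.1, 2.0, 4.1, 6.0]
w, b = fit_linear_policy(demos_obs, demos_act)
print(round(w, 1))  # → 2.0
```

A real humanoid policy swaps the linear map for a large network over camera images, but the loss is the same idea.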
Many people don’t understand how challenging Minecraft is for AI agents.
Let me put it this way. AlphaGo solves a board game with only 1 task, countably many states, and full observability.
Minecraft has infinite tasks, infinite gameplay, and tons of hidden world knowledge. 🧵
This is a neural network flying a drone at extremely high speed, beating human champions in FPV drone racing.
- Reinforcement learning as a tool is so marvelously versatile. It's able to solve both fast, reactive tasks and slow, deliberate tasks (ChatGPT RLHF).
- Trained in…
Kaiming He, inventor of ResNet, is leaving industry to join MIT faculty in 2024!! He’s one of the most impactful figures in deep learning.
- Residual layer is a fundamental building block of LLMs.
- Faster/Mask R-CNN are industry standards for image segmentation and robot…
I was OpenAI's first intern in 2016. I used to chat about the next learning paradigm with @ilyasut, engineering with @gdb, and scaling & safety with Dario. That summer reshaped my perspective and taste on AI research forever. I have huge admiration and respect for all of them.…
Apparently some folks don't get "data-driven physics engine", so let me clarify. Sora is an end-to-end, diffusion transformer model. It inputs text/image and outputs video pixels directly. Sora learns a physics engine implicitly in the neural parameters by gradient descent…
I can finally discuss something extremely exciting publicly. Jensen just announced NVIDIA AI Foundations:
- Foundation Model as a Service is coming to enterprise, customized for your proprietary data.
- Multimodal from day 1: text LLM is just one part. Bring your images, videos,…
GPT-4 is HERE. Most important bits you need to know:
- Multimodal: API accepts images as inputs to generate captions & analyses.
- GPT-4 scores 90th percentile on the Bar exam!!! And 99th percentile with vision on the Biology Olympiad! Its reasoning capabilities are far more advanced…
I think DALL·E 3 is not just a stance against MidJourney. It's actually a sneak peek of the upcoming, epic battle of massively multimodal LLMs, against DeepMind Gemini.
Quote: "DALL·E 3 is built natively on ChatGPT". This is the key phrase.
DALL·E 3's extraordinary language…
This is likely the most significant lawsuit in AI history - its outcome would have far-reaching impact on the whole industry.
The arguments get fairly philosophical. Quote:
"The purpose of copyright law, OpenAI argued, is 'to promote the Progress of Science and useful Arts' by…
It took my brain a while to parse what's going on in this video. We are so obsessed with "human-level" robotics that we forget it is just an artificial ceiling. Why don't we make a new species superhuman from day one? Boston Dynamics has once again reinvented itself. Gradually,…
GPT-4's vision API isn't public yet, but something better is here.
Genmo: a creative & multimodal chatbot that not only takes image as input, but also generates and EDITs images and videos. Unlike Midjourney, Genmo is an *interactive* assistant able to …
I'm waking up to the prospect that in my prime years, I'll see both mainstream superconducting and AGI. The former will propel the latter, and the latter will propel every scientific breakthrough.
These should've stayed in sci-fi for another 20 yrs. But somehow, they are eerily…
I see some vocal objections: "Sora is not learning physics, it's just manipulating pixels in 2D".
I respectfully disagree with this reductionist view. It's similar to saying "GPT-4 doesn't learn coding, it's just sampling strings". Well, what transformers do is just manipulating…
Everyone should read the celebrated mathematician Terence Tao's blog on LLM. He predicts that AI will be a trustworthy co-author in mathematical research by 2026, when combined with search and symbolic math tools.
I believe math will be the first scientific discipline to see…
There're few who can deliver both great AI research and charismatic talks. OpenAI Chief Scientist @ilyasut is one of them.
I watched Ilya's lecture at Simons Institute, where he delved into why unsupervised learning works through the lens of compression.
Sharing my notes:
-…
Google is hosting the first "Machine Unlearning" challenge. Yes you heard it right - it's the art of forgetting, an emergent research field.
GPT-4 lobotomy is a type of machine unlearning. OpenAI tried for months to remove abilities it deems unethical or harmful, sometimes…
Here's my prediction of what's next. The infinite energy of @sama & @gdb cannot be contained. They will re-build Rome from the ashes with an even greater sense of urgency. OpenAI just created its mightiest competitor, and we are all seeing it unfold in real-time.
And it happened…
The launch of GPT-4 will be a predictably seismic event this year.
But I can predict with high confidence what GPT-4 *cannot do*:
It can’t cook spaghetti, play tennis, or build a lego treehouse.
Robotics will be the last moat we conquer in the grand quest for AI 🤖🦾
It’s pretty obvious that synthetic data will provide the next trillion high-quality training tokens. I bet most serious LLM groups know this. The key question is how to SUSTAIN the quality and avoid plateauing too soon.
The Bitter Lesson by @RichardSSutton continues to guide AI…
One of the best tutorial-style repos since @karpathy's minGPT! GPT-Fast: a minimalistic, PyTorch-only decoding implementation loaded with best practices: int8/int4 quantization, speculative decoding, Tensor parallelism, etc. Boosts the "clock speed" of LLM OS by 10x with no model…
GPT3 is powerful but blind. The future of Foundation Models will be embodied agents that proactively take actions, endlessly explore the world, and continuously self-improve. What does it take? In our NeurIPS Outstanding Paper “MineDojo”, we provide a blueprint for this future:🧵
Why does generative AI struggle with hands?
It is not a mystical Bermuda Triangle in the latent space. There're compelling reasons:
1. Data size (duh). Face pics are much more common than hand pics. Even when the whole body is shown, hands tend to occupy much smaller pixel real…
How to dodge a question like a Jedi master:
"You're a very experienced reporter. You know I can't comment on that. I know you know I can't comment on that. You know I know you know I can't comment on that. In the spirit of shortness of life, why do you ask?"
Way to go @sama 🤣
Why does ChatGPT work so well? Is it “just scaling up GPT-3” under the hood? In this 🧵, let’s discuss the “Instruct” paradigm, its deep technical insights, and a big implication: “prompt engineering” as we know it may likely disappear soon:👇
Transformers are here to stay for a while. Not because it’s the absolute best architecture, but because the staggering amount of resources lock us to the existing weights.
Starting another model evolution tree will literally burn forests to the ground (CO2). You only train once.
In…
If you don’t feel like paying $20/mo for ChatGPT Pro, try out Poe (by Quora, @adamdangelo). It is currently the only frontend that supports Claude @AnthropicAI, and the ChatGPT interface runs silk smooth. Free for now (at least?)
I like that Poe automatically highlights key…
DALL-E generates pixels from text. Now meet its cousin, VALL-E, that generates audio from text @MSFTResearch!
VALL-E’s resemblance to DALL-E v1 and Parti @GoogleAI is striking. Image and audio are both continuous signals, but they can be quantized into discrete tokens.
1/🧵
We train Transformers to encode algorithms in their weights, such as sorting, counting, and balancing parentheses from lots of data.
I never thought we may also go in the *reverse* direction: *compile* Transformer weights directly from explicit code! Cool paper @DeepMind:
1/🧵
OpenAI is now helping Coca-Cola improve its marketing & operations.
I find this move highly consequential. It signals OpenAI’s strategic shift away from a horizontal provider (ChatGPT, DALLE, Codex) towards capturing massive values from verticals.
Thin-wrapper startups should…
What GPT-4 gains in IQ, it sacrifices in empathy. Below is someone with suicidal thoughts seeking help. GPT-4 answers like an automated call center, unlike ChatGPT.
In the sci-fi series Westworld, Dr. Ford (creator of AGI) says that "suffering" is the final step for AI to awaken…
Do you know that DeepMind has actually open-sourced the heart of AlphaGo & AlphaZero?
It’s hidden in an unassuming repo called “mctx”:
It provides JAX-native Monte Carlo Tree Search (MCTS) that runs on batches of inputs, in parallel, and blazing fast.
🧵
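The loop mctx accelerates is conceptually small. Here is a pure-Python sketch of select → rollout → backup, collapsed to depth one (so it reduces to UCB1 over stochastic rollouts; this is not the mctx API):

```python
# Core MCTS loop at depth one: UCB1 selection, stochastic rollout, backup.

import math, random

def mcts_choose(rewards, simulations=200, c=1.4, seed=0):
    """Pick the action with the most visits after `simulations` rounds.
    `rewards[a](rng)` samples a rollout return for action a."""
    rng = random.Random(seed)
    visits = [0] * len(rewards)
    values = [0.0] * len(rewards)
    for t in range(1, simulations + 1):
        # Selection: maximize the UCB score (unvisited actions first).
        def ucb(a):
            if visits[a] == 0:
                return float("inf")
            return values[a] / visits[a] + c * math.sqrt(math.log(t) / visits[a])
        a = max(range(len(rewards)), key=ucb)
        # Rollout + backup.
        values[a] += rewards[a](rng)
        visits[a] += 1
    return max(range(len(rewards)), key=lambda a: visits[a])

arms = [lambda rng: rng.random() * 0.4,        # mean return ~0.2
        lambda rng: 0.5 + rng.random() * 0.4]  # mean return ~0.7
print(mcts_choose(arms))  # → 1
```

The full algorithm recurses this loop down a game tree; mctx's contribution is running it batched and JIT-compiled on accelerators.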
The upcoming Llama-3-400B+ will mark the watershed moment when the community gains open-weight access to a GPT-4-class model. It will change the calculus for many research efforts and grassroots startups. I pulled the numbers on Claude 3 Opus, GPT-4-2024-04-09, and Gemini.…
Autonomous driving with Chain of Thought - autopilot thinking out loud in text!
LINGO-1 is the most interesting work I've read in autodriving for a while.
Before: perception -> driving action
After: perception -> textual reasoning -> action
LINGO-1 trains a video-language…
If Google didn't publish the Transformer paper, the history of AI (and possibly humanity) would be set back many years. Everyone would've been worse off.
Open research is a powerful strategy. It pains me to see an emerging trend of not only closing models, but also refusing to…
Meta started open-sourcing a lot and is now becoming one of the best companies in the world at shipping AI features. Coincidence? I don’t think so.
Contrary to popular belief, a company (or a country) sharing their research, models and datasets publicly in open-source makes them…
Hmmm, @OpenAI just acquired a company called "Global Illumination" that makes an open-source Minecraft clone.
What's next, multi-agent civilization sim running on GPT-5? Maybe Minecraft is indeed all you need for AGI? I'm intrigued.🤔
Announcement:
Company:…
Wow, @MetaAI has been on open-source steroids since Llama.
ImageBind: Meta's latest multimodal embedding, covering not only the usual suspects (text, image, audio), but also depth, thermal (infrared), and IMU signals!
OpenAI Embedding is the foundation for AI-powered search and…
led by @elonmusk is the latest heavyweight player in AI. I see a few unique strengths in Elon's ecosystem:
▸ Lots of multimodal data on Twitter: dialogue text, images, and a growing collection of long videos. is the only AI…
You can now operate robots by just thinking about it. With your brain signals. WOW.
This robot system from Stanford has so much sci-fi vibe and wild implications that I don't even know where to start.
NOIR decodes the EEG signal from your head into a library of robot skills.…
A fact worth highlighting: NVIDIA is making its own *CPU*, and will increasingly excel at it. To max out GPU performance, building the CPU in-house is an inevitable path.
Below is GH200, the first superchip that includes all home-grown components: CPU (Grace), GPU (Hopper), and…
I confirmed with friends at the team that they did not speed up the video. Having such smooth motions at real-time, especially in hand dexterity, will unlock LOTS of new capabilities down the road. Regardless of how well you train the model in the world of bits, a slow and…
My TED talk is finally live!! I proposed the recipe for the "Foundation Agent": a single model that learns how to act in different worlds. LLM scales across lots and lots of texts. Foundation Agent scales across lots and lots of realities. If it is able to master 10,000 diverse…
What did I tell you a few days ago? 2024 is the year of robotics. Mobile-ALOHA is an open-source robot hardware that can do dexterous, bimanual tasks like cooking a meal (with human teleoperation). Very soon, hardware will no longer bottleneck us on the quest for human-level,…
If there's a higher being who writes the simulation code for our reality, we can estimate the file size of the compiled binary. Meta AI's Emu Video is 6B parameters. Let's say if Sora is 10x larger with bfloat16, then the Creator's binary might be no larger than 111 Gb.
Caveats:…
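For the curious, the arithmetic behind that number, spelled out (the 10x scale factor and bfloat16 storage are the estimate's own assumptions):

```python
# Back-of-envelope size of the "Creator's binary".
emu_params = 6e9                   # Emu Video: 6B parameters
sora_params = emu_params * 10      # assume Sora is ~10x larger
bytes_total = sora_params * 2      # bfloat16 = 2 bytes per parameter
print(int(bytes_total / 2**30))    # → 111  (GiB; the "111 Gb" in the tweet)
```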
I'm going to OpenAI Dev Day! If the leaks are true, it'll be a pivotal moment for the AI consumer market:
OpenAI is becoming a full-blown UGC platform, where users can create and share any AI agents. It's a superset of RPA, Character AI, Plugin store, and much more. The…
Tesla FSD v13 will likely be grokking language tokens. What excites me the most about Grok-1.5V is the potential to solve edge cases in self-driving. Using language for "chain of thought" will help the car break down a complex scenario, reason with rules and counterfactuals, and…