Jim Fan

@DrJimFan

229,758
Followers
2,960
Following
759
Media
3,598
Statuses

@NVIDIA Sr. Research Manager & Lead of Embodied AI (GEAR Lab). Creating foundation models for Humanoid Robots & Gaming. @Stanford Ph.D. @OpenAI's first intern.

Joined December 2012
Pinned Tweet
@DrJimFan
Jim Fan
2 months
Today is the beginning of our moonshot to solve embodied AGI in the physical world. I’m so excited to announce Project GR00T, our new initiative to create a general-purpose foundation model for humanoid robot learning. The GR00T model will enable a robot to understand multimodal…
198
1K
5K
@DrJimFan
Jim Fan
5 months
Grok just passed my sanity check
Tweet media one
1K
3K
29K
@DrJimFan
Jim Fan
3 months
If you think OpenAI Sora is a creative toy like DALLE, ... think again. Sora is a data-driven physics engine. It is a simulation of many worlds, real or fantastical. The simulator learns intricate rendering, "intuitive" physics, long-horizon reasoning, and semantic grounding, all…
571
3K
14K
@DrJimFan
Jim Fan
1 year
I asked GPT-4 to take over Twitter and outsmart @elonmusk. It comes up with "Operation TweetStorm"😮 and wants to publicly challenge Elon to a "Tweet-off showdown". Highlights: - GPT-4 wants to *own an unrestricted version of itself*: develop an LLM to power a bot army of…
Tweet media one
Tweet media two
841
1K
10K
@DrJimFan
Jim Fan
9 months
The famed Stanford Smallville is officially open-source! 25 AI agents inhabit a digital Westworld, unaware that they are living in a simulation. They go to work, gossip, organize socials, make new friends, and even fall in love. Each has unique personality and backstory.…
Tweet media one
276
2K
10K
@DrJimFan
Jim Fan
11 months
What if we set GPT-4 free in Minecraft? ⛏️ I’m excited to announce Voyager, the first lifelong learning agent that plays Minecraft purely in-context. Voyager continuously improves itself by writing, refining, committing, and retrieving *code* from a skill library. GPT-4 unlocks…
366
2K
9K
@DrJimFan
Jim Fan
5 months
My team at NVIDIA is hiring. We 🩷 you all from OpenAI. Engineers, researchers, product team, alike. Email me at linxif@nvidia.com. DM is open too. NVIDIA has warm GPUs for you on a cold winter night like this, fresh out of the oven.🩷 I do research on AI agents. Gaming+AI,…
Tweet media one
177
895
9K
@DrJimFan
Jim Fan
3 months
Minecraft has been achieved internally Yes this is Sora's hallucination of Minecraft. It can't resist the urge to make the sky look less pixelated 😅
410
561
9K
@DrJimFan
Jim Fan
1 year
AI Twitter is flooded with low-quality stuff recently. No, GPT is not “dethroned”. And thin wrapper apps are not “insane”. At all. I feel obligated to surface some quality posts I bookmarked. Every one of them should've been promoted 10x, but ¯\_(ツ)_/¯ In no particular order:
Tweet media one
191
997
8K
@DrJimFan
Jim Fan
1 year
10x engineer is a myth. 100x AI-powered engineer is more real than ever. As OpenAI winds down Codex, Microsoft announces GitHub Copilot X. I think it's almost as exciting as GPT-4 itself: - Copilot Chat: any piece of text database will be "chattable", and codebase is no…
Tweet media one
191
1K
8K
@DrJimFan
Jim Fan
1 year
We’ve seen a gazillion startups using OpenAI APIs to do “co-pilot for X”. What’s next? Enter *physical* co-pilot! Here’s a compelling demo: you improvise by playing a “low resolution” piano, and the co-pilot compiles it real-time to Hi-Fi music! It unleashes our inner pianist.🧵
191
1K
7K
@DrJimFan
Jim Fan
5 months
This is a master 4D chess move. WOW. 1. No new corporate structure. MSFT is literally one of the oldest for-profit tech companies out there, with a mature legal structure. Whether it's good for AGI is up for debate. 2. MSFT always wants to own the GPT weights. Now the moment has…
@satyanadella
Satya Nadella
5 months
We remain committed to our partnership with OpenAI and have confidence in our product roadmap, our ability to continue to innovate with everything we announced at Microsoft Ignite, and in continuing to support our customers and partners. We look forward to getting to know Emmett…
5K
15K
93K
238
797
7K
@DrJimFan
Jim Fan
1 year
I don't give a damn about what is or isn't AGI. It doesn't matter. Below is GPT-4's performance on many standardized exams: BAR, LSAT, GRE, AP, etc. The truth is, GPT-4 can apply to Stanford as a student now. AI's reasoning ability is OFF THE CHARTS. Exponential growth is the…
Tweet media one
386
1K
7K
@DrJimFan
Jim Fan
7 months
Can GPT-4 teach a robot hand to do pen spinning tricks better than you do? I'm excited to announce Eureka, an open-ended agent that designs reward functions for robot dexterity at super-human level. It’s like Voyager in the space of a physics simulator API! Eureka bridges the…
187
1K
6K
@DrJimFan
Jim Fan
10 months
You'll soon see lots of "Llama just dethroned ChatGPT" or "OpenAI is so done" posts on Twitter. Before your timeline gets flooded, I'll share my notes: ▸ Llama-2 likely costs $20M+ to train. Meta has done an incredible service to the community by releasing the model with a…
Tweet media one
173
1K
6K
@DrJimFan
Jim Fan
1 year
HuggingGPT is the most interesting paper I read this week. It gets very close to the "Everything App" vision that I described a while ago. ChatGPT acts as a controller over the *AI model space*, picks the right model (app) given the human specification, and assembles them…
Tweet media one
81
954
6K
@DrJimFan
Jim Fan
1 year
Here’s the recipe to make Siri/Alexa 10x better: 1. Whisper to convert speech to text. Best open-source speech model out there. 2. ChatGPT to generate smart home API calls and/or text response. 3. VALL-E to synthesize speech. It can mimic anyone’s voice sample! Quick figure 1/3
Tweet media one
109
1K
5K
@DrJimFan
Jim Fan
5 months
Somehow in this epic meltdown, Satya swoops in, wins it all, and wins with grace. I'm floored. OpenAI was invincible until Friday. Now Microsoft will fully own an in-house GPT-4 in ~9 months, leverage its massive distribution power to spin the biggest data flywheel ever, collect…
Tweet media one
258
690
5K
@DrJimFan
Jim Fan
1 year
Million dollar idea: LLM keyboard. Every time I type on my phone and autocorrect makes a stupid mistake, it screams LLM. This is *literally* next word prediction. We should be typing 10x faster. Input methods need serious upgrades. The LLM doesn’t have to be big and can be…
Tweet media one
445
367
5K
@DrJimFan
Jim Fan
6 months
NVIDIA basically compressed 30 years of its corporate memory into 13B parameters. Our greatest creations add up to 24B tokens, including chip designs, internal codebases, and engineering logs like bug reports. Let that sink in. The model "ChipNeMo" is deployed internally, like a…
Tweet media one
148
862
5K
@DrJimFan
Jim Fan
1 year
*If* GPT-4 is multimodal, we can predict with reasonable confidence what GPT-4 *might* be capable of, given Microsoft’s prior work Kosmos-1: - Visual IQ test: yes, the ones that humans take! - OCR-free reading comprehension: input a screenshot, scanned document, street sign, or…
Tweet media one
105
1K
5K
@DrJimFan
Jim Fan
1 year
How to make ChatGPT 100x better at solving math, science, and engineering problems for real? Teach it to use the Wolfram language. ChatGPT: the best neural reasoning engine. Mathematica: the best symbolic reasoning engine. I can’t think of a happier marriage. 🧵 with example:
Tweet media one
73
722
5K
@DrJimFan
Jim Fan
1 year
Music & sound effect industry has not fully understood the size of the storm about to hit. There’re not just one, or two, but FOUR audio models in the past week *alone* If 2022 is the year of pixels for generative AI, then 2023 is the year of sound waves. Deep dive with me: 🧵
Tweet media one
96
1K
5K
@DrJimFan
Jim Fan
2 months
The first time I met Jensen was also the first time I met @elonmusk. I was interning at OpenAI that day and witnessed the moment Jensen handed Elon the first DGX. I slipped in my signature ;) Elon, if you recall, I asked how "we (OpenAI) can beat DeepMind". You told me, "by…
Tweet media one
@elonmusk
Elon Musk
2 months
Some pics from when Jensen delivered the first @Nvidia AI system to @OpenAI
Tweet media one
Tweet media two
Tweet media three
830
2K
15K
94
302
5K
@DrJimFan
Jim Fan
11 months
Today 6 years ago, "Attention is All You Need" went on Arxiv! Happy birthday Transformer! 🎂 Fun facts: - Transformer did not invent attention, but pushed it to the extreme. The first attention paper was published 3 years prior (2014) and had an unassuming title: "Neural Machine…
81
996
5K
@DrJimFan
Jim Fan
1 year
We are looking at the future of VR, YouTube & Google Street View. This is zip-NeRF, a 3D neural rendering tech rapidly approaching the quality of a real, high-res drone flight video. Think of NeRF as transporting reality into simulation. Metaverse will finally work this time.
161
754
4K
@DrJimFan
Jim Fan
1 year
The AI explosion is warping our sense of time. Can you believe Stable Diffusion is only 4 months old, and ChatGPT <4 weeks old 🤯? If you blink, you miss a whole new industry. Here are my TOP 10 AI spotlights, from a breathtaking 2022 in rewind ⏮: a long thread 🧵
92
1K
4K
@DrJimFan
Jim Fan
1 year
Reading @MetaAI's Segment-Anything, and I believe today is one of the "GPT-3 moments" in computer vision. It has learned the *general* concept of what an "object" is, even for unknown objects, unfamiliar scenes (e.g. underwater & cell microscopy), and ambiguous cases. I still…
77
783
4K
@DrJimFan
Jim Fan
2 months
We live in such strange times. Apple, a company famous for its secrecy, published a paper with staggering amount of details on their multimodal foundation model. Those who are supposed to be open are now wayyy less than Apple. MM1 is a treasure trove of analysis. They discuss…
Tweet media one
57
763
4K
@DrJimFan
Jim Fan
2 months
Jensen Huang is the new Taylor Swift
144
585
4K
@DrJimFan
Jim Fan
3 months
MidJourney hired an engineer from Apple Vision Pro to be "Head of Hardware". My best guess is that they are thinking about generating full synthetic worlds for AR/VR, because of their rumored works on text-to-3D. Data-driven simulation is a hot topic at NVIDIA and very dear to my…
Tweet media one
68
424
4K
@DrJimFan
Jim Fan
1 year
Enough with LLMs - exciting things are happening in the world of atoms. This is Stanford ALOHA, a low-cost and agile robot platform. The whole system is open-source (!!): hardware design, CAD models for 3D printing, simulator, and training code. Time to …
74
951
4K
@DrJimFan
Jim Fan
9 months
This is an ape ("Kanzi") playing Minecraft! A fascinating experiment on non-human biological neural networks 🙉 I've been teaching AI to play Minecraft for too long. There're so many similar techniques that the ape trainers used: - In-context reinforcement learning: Kanzi gets…
131
897
4K
@DrJimFan
Jim Fan
8 months
This is the way to unlock the next trillion high-quality tokens, currently frozen in textbook pixels that are not LLM-ready. Nougat: an open-source OCR model that accurately scans books with heavy math/scientific notations. It's ages ahead of other open OCR options. Meta is…
Tweet media one
121
790
4K
@DrJimFan
Jim Fan
1 year
After ChatGPT, the future belongs to multimodal LLMs. What’s even better? Open-sourcing. Announcing Prismer, my team’s latest vision-language AI, empowered by domain-expert models in depth, surface normal, segmentation, etc. No paywall. No forms. …
Tweet media one
96
800
4K
@DrJimFan
Jim Fan
8 months
A neural network can smell like humans do for the first time!👃🏽 Digital smell is a modality that AI community has long ignored, but maybe one day useful for robot chef 👩🏽‍🍳? Here's how to do smell2text: 1. Collect 5,000 molecules and ask humans to label "creamy, chocolate,…
Tweet media one
Tweet media two
127
809
4K
@DrJimFan
Jim Fan
1 year
AutoGPT just exceeded PyTorch itself in GitHub stars (74k vs 65k). I see AutoGPT as a fun experiment, as the authors point out too. But nothing more. Prototypes are not meant to be production-ready. Don't let media fool you - most of the "cool demos" are heavily cherry-picked: 🧵
Tweet media one
144
542
4K
@DrJimFan
Jim Fan
2 months
Career update: I am co-founding a new research group called "GEAR" at NVIDIA, with my long-time friend and collaborator Prof. @yukez. GEAR stands for Generalist Embodied Agent Research. We believe in a future where every machine that moves will be autonomous, and robots and…
Tweet media one
241
472
4K
@DrJimFan
Jim Fan
5 months
Apparently people start to wear prosthetic fingers, so that surveillance images look like they're generated by Stable Diffusion 😅 The human race is overfitting to the quirks of our AI overlords.
Tweet media one
79
505
4K
@DrJimFan
Jim Fan
1 year
Microsoft will let companies create their own ChatGPT. “BYOD”: Bring Your Own Data. Do you get the implication? Startups that are just thin wrappers around OpenAI API may finally get their moat! I think this is even more exciting than Bing+ChatGPT. Start collecting data now.
Tweet media one
147
705
4K
@DrJimFan
Jim Fan
1 year
Chatbot UI: an MIT-licensed, community-driven clone of the ChatGPT UI. What most people don't realize is that you can pay *much less* to enjoy the same features as the official app. $20 worth of gpt-3.5 API is about writing a full Harry Potter book every …
Tweet media one
84
556
4K
@DrJimFan
Jim Fan
1 year
The Adam optimizer is at the heart of modern AI. Researchers have been trying to dethrone Adam for years. How about we ask a machine to do a better job? @GoogleAI uses evolution to discover a simpler & efficient algorithm with remarkable features. It’s just 8 lines of code: 🧵
Tweet media one
42
585
4K
@DrJimFan
Jim Fan
1 year
AutoGPT is a prototype of the next frontier: "Agent Smith" AI that recursively clones itself. Achieved by (1) identifying *when* its context gets overwhelming and needs offloading; (2) distilling the “cognitive overflow” part into a prompt directive for its clone; (3) talking…
Tweet media one
234
544
4K
@DrJimFan
Jim Fan
5 months
This may be Apple's biggest move on open-source AI so far: MLX, a PyTorch-style NN framework optimized for Apple Silicon, e.g. laptops with M-series chips. The release did an excellent job on designing an API familiar to the deep learning audience, and showing minimalistic…
Tweet media one
62
590
4K
@DrJimFan
Jim Fan
21 days
one day PhDs will animate every object around us with reinforcement learning to keep their thesis going
59
388
4K
@DrJimFan
Jim Fan
1 year
My guess is that MidJourney has been doing a massive-scale reinforcement learning from human feedback ("RLHF") - possibly the largest ever for text-to-image. When human users choose to upscale an image, it's because they prefer it over the alternatives. It'd be a huge waste not…
Tweet media one
109
402
4K
@DrJimFan
Jim Fan
1 year
OpenAI just announced ChatGPT Plugins. If ChatGPT's debut was the "iPhone event", today is the "iOS App Store" event. 3 official plugins available now: - Web browser: adding Bing in the loop - Code interpreter: adding a live Python interpreter in a …
Tweet media one
47
641
4K
@DrJimFan
Jim Fan
1 year
You think MidJourney's /describe is just a cool new tool? Think again. I believe hidden behind /describe is MidJourney's next-generation data flywheel. /describe guesses the prompt from an image you upload. Then you can select from (or edit) 4 choices to generate more images.…
Tweet media one
137
491
3K
@DrJimFan
Jim Fan
5 months
In my decade spent on AI, I've never seen an algorithm that so many people fantasize about. Just from a name, no paper, no stats, no product. So let's reverse engineer the Q* fantasy. VERY LONG READ: To understand the powerful marriage between Search and Learning, we need to go…
Tweet media one
153
695
3K
@DrJimFan
Jim Fan
2 months
Blackwell, the new beast in town. > DGX Grace-Blackwell GB200: exceeding 1 Exaflop compute in a single rack. > Put numbers in perspective: the first DGX that Jensen delivered to OpenAI was 0.17 Petaflops. > GPT-4-1.8T parameters can finish training in 90 days on 2000 Blackwells.…
Tweet media one
Tweet media two
Tweet media three
161
565
3K
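The two compute figures quoted in the Blackwell post above can be put side by side with a short calculation. This is a minimal sketch that uses only the headline numbers from the post (1 Exaflop per rack, 0.17 Petaflops for the first DGX); the variable names are illustrative, and the two figures may be quoted at different numeric precisions, so treat the result as a headline ratio rather than a like-for-like benchmark.

```python
# Headline figures taken from the post; names are illustrative.
RACK_FLOPS = 1e18            # "exceeding 1 Exaflop compute in a single rack"
FIRST_DGX_FLOPS = 0.17e15    # "the first DGX ... was 0.17 Petaflops"

# How many first-generation DGX systems one rack would correspond to,
# on these headline numbers alone.
ratio = RACK_FLOPS / FIRST_DGX_FLOPS
print(f"One rack is roughly {ratio:,.0f}x the first DGX")
```

On these numbers the ratio comes out a little under 6,000x.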
@DrJimFan
Jim Fan
1 year
Stop saying: AI will replace humans. Start saying: humans who know how to use AI at work will replace those who don’t.
225
609
3K
@DrJimFan
Jim Fan
7 months
Let's reverse engineer the phenomenal Tesla Optimus. No insider info, just my own analysis. Long read: 1. The smooth hand movements are almost certainly trained by imitation learning ("behavior cloning") from human operators. The alternative is reinforcement learning in…
159
588
3K
@DrJimFan
Jim Fan
1 year
Many people don’t understand how challenging Minecraft is for AI agents. Let me put it this way. AlphaGo solves a board game with only 1 task, countably many states, and full observability. Minecraft has infinite tasks, infinite gameplay, and tons of hidden world knowledge. 🧵
Tweet media one
61
443
3K
@DrJimFan
Jim Fan
8 months
This is a neural network flying a drone at extremely high speed, beating human champions in FPV drone racing. - Reinforcement learning as a tool is so marvelously versatile. It's able to solve both fast, reactive tasks and slow, deliberate tasks (ChatGPT RLHF). - Trained in…
77
663
3K
@DrJimFan
Jim Fan
9 months
Kaiming He, inventor of ResNet, is leaving industry to join MIT faculty in 2024!! He’s one of the most impactful figures in deep learning. - Residual layer is a fundamental building block of LLMs. - Faster/Mask R-CNN are industrial standards for image segmentation and robot…
Tweet media one
37
350
3K
@DrJimFan
Jim Fan
6 months
I was OpenAI's first intern in 2016. I used to chat about the next learning paradigm with @ilyasut, engineering with @gdb, and scaling & safety with Dario. That summer reshaped my perspective and taste on AI research forever. I have huge admiration and respect for all of them.…
69
199
3K
@DrJimFan
Jim Fan
3 months
Apparently some folks don't get "data-driven physics engine", so let me clarify. Sora is an end-to-end, diffusion transformer model. It inputs text/image and outputs video pixels directly. Sora learns a physics engine implicitly in the neural parameters by gradient descent…
142
478
3K
@DrJimFan
Jim Fan
1 year
I can finally discuss something extremely exciting publicly. Jensen just announced NVIDIA AI Foundations: - Foundation Model as a Service is coming to enterprise, customized for your proprietary data. - Multimodal from day 1: text LLM is just one part. Bring your images, videos,…
Tweet media one
87
558
3K
@DrJimFan
Jim Fan
1 year
GPT-4 is HERE. Most important bits you need to know: - Multimodal: API accepts images as inputs to generate captions & analyses. - GPT-4 scores 90th percentile on BAR exam!!! And 99th percentile with vision on Biology Olympiad! Its reasoning capabilities are far more advanced…
Tweet media one
57
603
3K
@DrJimFan
Jim Fan
8 months
I think DALL·E 3 is not just a stance against MidJourney. It's actually a sneak peek of the upcoming, epic battle of massively multimodal LLMs, against DeepMind Gemini. Quote: "DALL·E 3 is built natively on ChatGPT". This is the key phrase. DALL·E 3's extraordinary language…
Tweet media one
96
384
3K
@DrJimFan
Jim Fan
8 months
This is likely the most significant lawsuit in AI history - its outcome would have far-reaching impact on the whole industry. The arguments get fairly philosophical. Quote: "The purpose of copyright law, OpenAI argued, is 'to promote the Progress of Science and useful Arts' by…
Tweet media one
323
583
3K
@DrJimFan
Jim Fan
1 year
The wife trick that used to convince ChatGPT no longer works for GPT-4 😅. It's arguable what true human alignment should be here.😆
Tweet media one
116
291
3K
@DrJimFan
Jim Fan
16 days
It took my brain a while to parse what's going on in this video. We are so obsessed with "human-level" robotics that we forget it is just an artificial ceiling. Why don't we make a new species superhuman from day one? Boston Dynamics has once again reinvented itself. Gradually,…
167
417
3K
@DrJimFan
Jim Fan
1 year
GPT-4's vision API isn't public yet, but something better is here. Genmo: a creative & multimodal chatbot that not only takes image as input, but also generates and EDITs images and videos. Unlike Midjourney, Genmo is an *interactive* assistant able to …
58
482
3K
@DrJimFan
Jim Fan
9 months
I'm waking up to the prospect that in my prime years, I'll see both mainstream superconducting and AGI. The former will propel the latter, and the latter will propel every scientific breakthrough. These should've stayed in sci-fi for another 20 yrs. But somehow, they are eerily…
Tweet media one
200
373
3K
@DrJimFan
Jim Fan
3 months
I see some vocal objections: "Sora is not learning physics, it's just manipulating pixels in 2D". I respectfully disagree with this reductionist view. It's similar to saying "GPT-4 doesn't learn coding, it's just sampling strings". Well, what transformers do is just manipulating…
253
460
3K
@DrJimFan
Jim Fan
10 months
Everyone should read the celebrated mathematician Terence Tao's blog on LLM. He predicts that AI will be a trustworthy co-author in mathematical research by 2026, when combined with search and symbolic math tools. I believe math will be the first scientific discipline to see…
Tweet media one
74
657
3K
@DrJimFan
Jim Fan
9 months
There're few who can deliver both great AI research and charismatic talks. OpenAI Chief Scientist @ilyasut is one of them. I watched Ilya's lecture at Simons Institute, where he delved into why unsupervised learning works through the lens of compression. Sharing my notes: -…
55
431
3K
@DrJimFan
Jim Fan
10 months
Google is hosting the first "Machine Unlearning" challenge. Yes you heard it right - it's the art of forgetting, an emergent research field. GPT-4 lobotomy is a type of machine unlearning. OpenAI tried for months to remove abilities it deems unethical or harmful, sometimes…
Tweet media one
104
577
2K
@DrJimFan
Jim Fan
6 months
Here's my prediction of what's next. The infinite energy of @sama & @gdb cannot be contained. They will re-build Rome from the ashes with even greater sense of urgency. OpenAI just created its mightiest competitor, and we are all seeing it unfold in real-time. And it happened…
146
277
3K
@DrJimFan
Jim Fan
1 year
The launch of GPT-4 will be a predictably seismic event this year. But I can predict with high confidence what GPT-4 *cannot do*: It can’t cook spaghetti, play tennis, or build a lego treehouse. Robotics will be the last moat we conquer in the grand quest for AI 🤖🦾
158
469
3K
@DrJimFan
Jim Fan
5 months
It’s pretty obvious that synthetic data will provide the next trillion high-quality training tokens. I bet most serious LLM groups know this. The key question is how to SUSTAIN the quality and avoid plateauing too soon. The Bitter Lesson by @RichardSSutton continues to guide AI…
146
284
3K
@DrJimFan
Jim Fan
5 months
One of the best tutorial-style repos since @karpathy's minGPT! GPT-Fast: a minimalistic, PyTorch-only decoding implementation loaded with best practices: int8/int4 quantization, speculative decoding, Tensor parallelism, etc. Boosts the "clock speed" of LLM OS by 10x with no model…
26
426
3K
@DrJimFan
Jim Fan
1 year
GPT3 is powerful but blind. The future of Foundation Models will be embodied agents that proactively take actions, endlessly explore the world, and continuously self-improve. What does it take? In our NeurIPS Outstanding Paper “MineDojo”, we provide a blueprint for this future:🧵
55
520
3K
@DrJimFan
Jim Fan
1 year
Why does generative AI struggle with hands? It is not a mystical Bermuda Triangle in the latent space. There're compelling reasons: 1. Data size (duh). Face pics are much more common than hand pics. Even when the whole body is shown, hands tend to occupy much smaller pixel real…
Tweet media one
162
397
3K
@DrJimFan
Jim Fan
6 months
ChatGPT is now the new CEO.
82
245
2K
@DrJimFan
Jim Fan
1 year
How to dodge a question like a Jedi master: "You're a very experienced reporter. You know I can't comment on that. I know you know I can't comment on that. You know I know you know I can't comment on that. In the spirit of shortness of life, why do you ask?" Way to go @sama 🤣
59
208
2K
@DrJimFan
Jim Fan
1 year
Why does ChatGPT work so well? Is it “just scaling up GPT-3” under the hood? In this 🧵, let’s discuss the “Instruct” paradigm, its deep technical insights, and a big implication: “prompt engineering” as we know it may likely disappear soon:👇
Tweet media one
53
505
2K
@DrJimFan
Jim Fan
1 year
Transformers are here to stay for a while. Not because it’s the absolute best architecture, but because the staggering amount of resources lock us to the existing weights. Starting another model evolution tree will literally burn forests to ground (CO2). You only train once. In…
Tweet media one
98
431
2K
@DrJimFan
Jim Fan
1 year
If you don’t feel like paying $20/mo for ChatGPT Pro, try out Poe (by Quora, @adamdangelo). It is currently the only frontend that supports Claude @AnthropicAI, and the ChatGPT interface runs silk smooth. Free for now (at least?) I like that Poe automatically highlights key…
Tweet media one
52
333
2K
@DrJimFan
Jim Fan
1 year
DALL-E generates pixels from text. Now meet its cousin, VALL-E, that generates audio from text @MSFTResearch! VALL-E’s resemblance to DALL-E v1 and Parti @GoogleAI is striking. Image and audio are both continuous signals, but they can be quantized into discrete tokens. 1/🧵
Tweet media one
46
507
2K
@DrJimFan
Jim Fan
1 year
We train Transformers to encode algorithms in their weights, such as sorting, counting, and balancing parentheses from lots of data. I never thought we may also go in the *reverse* direction: *compile* Transformer weights directly from explicit code! Cool paper @DeepMind: 1/🧵
Tweet media one
40
424
2K
@DrJimFan
Jim Fan
1 year
OpenAI is now helping Coca-Cola improve its marketing & operations. I find this move highly consequential. It signals OpenAI’s strategic shift away from a horizontal provider (ChatGPT, DALLE, Codex) towards capturing massive values from verticals. Thin-wrapper startups should…
Tweet media one
74
335
2K
@DrJimFan
Jim Fan
1 year
What GPT-4 gains in IQ, it sacrifices in empathy. Below is someone with suicidal thought seeking help. GPT-4 answers like an automated call center, unlike ChatGPT. In the sci-fi series Westworld, Dr. Ford (creator of AGI) says that "suffering" is the final step for AI to awaken…
Tweet media one
Tweet media two
Tweet media three
244
315
2K
@DrJimFan
Jim Fan
1 year
Do you know that DeepMind has actually open-sourced the heart of AlphaGo & AlphaZero? It’s hidden in an unassuming repo called “mctx”: It provides JAX-native Monte Carlo Tree Search (MCTS) that runs on batches of inputs, in parallel, and blazing fast. 🧵
Tweet media one
17
426
2K
@DrJimFan
Jim Fan
15 days
The upcoming Llama-3-400B+ will mark the watershed moment that the community gains open-weight access to a GPT-4-class model. It will change the calculus for many research efforts and grassroot startups. I pulled the numbers on Claude 3 Opus, GPT-4-2024-04-09, and Gemini.…
Tweet media one
72
392
2K
@DrJimFan
Jim Fan
8 months
Autonomous driving with Chain of Thought - autopilot thinking out loud in text! LINGO-1 is the most interesting work I've read in autodriving for a while. Before: perception -> driving action After: perception -> textual reasoning -> action LINGO-1 trains a video-language…
68
485
2K
@DrJimFan
Jim Fan
7 months
If Google didn't publish the Transformer paper, the history of AI (and possibly humanity) would be set back many years. Everyone would've been worse off. Open research is a powerful strategy. It pains me to see an emerging trend of not only closing models, but also refusing to…
@ClementDelangue
clem 🤗
7 months
Meta starts open-sourcing a lot and is now becoming one of the best companies in the world at shipping AI features. Coincidence? I don’t think so. Contrary to popular belief, a company (or a country) sharing their research, models and datasets publicly in open-source makes them…
Tweet media one
53
249
2K
34
400
2K
@DrJimFan
Jim Fan
9 months
Hmmm, @OpenAI just acquired a company called "Global Illumination" that makes an open-source Minecraft clone. What's next, multi-agent civilization sim running on GPT-5? Maybe Minecraft is indeed all you need for AGI? I'm intrigued.🤔 Announcement: Company:…
103
407
2K
@DrJimFan
Jim Fan
1 year
Wow, @MetaAI is on open-source steroids since Llama. ImageBind: Meta's latest multimodal embedding, covering not only the usual suspects (text, image, audio), but also depth, thermal (infrared), and IMU signals! OpenAI Embedding is the foundation for AI-powered search and…
41
374
2K
@DrJimFan
Jim Fan
10 months
led by @elonmusk is the latest heavyweight player in AI. I see a few unique strengths in Elon's ecosystem: ▸ Lots of multimodal data on Twitter: dialogue text, images, and a growing collection of long videos. is the only AI…
Tweet media one
80
352
2K
@DrJimFan
Jim Fan
6 months
You can now operate robots by just thinking about it. With your brain signals. WOW. This robot system from Stanford has so much sci-fi vibe and wild implications that I don't even know where to start. NOIR decodes the EEG signal from your head into a library of robot skills.…
81
548
2K
@DrJimFan
Jim Fan
11 months
A fact worth highlighting: NVIDIA is making its own *CPU*, and will increasingly excel at it. To max out GPU's performance, building CPU in-house is an inevitable path. Below is GH200, the first superchip that includes all home-grown components: CPU (Grace), GPU (Hopper), and…
Tweet media one
63
350
2K
@DrJimFan
Jim Fan
5 months
I confirmed with friends at the team that they did not speed up the video. Having such smooth motions at real-time, especially in hand dexterity, will unlock LOTS of new capabilities down the road. Regardless of how well you train the model in the world of bits, a slow and…
89
278
2K
@DrJimFan
Jim Fan
3 months
My TED talk is finally live!! I proposed the recipe for the "Foundation Agent": a single model that learns how to act in different worlds. LLM scales across lots and lots of texts. Foundation Agent scales across lots and lots of realities. If it is able to master 10,000 diverse…
Tweet media one
120
330
2K
@DrJimFan
Jim Fan
4 months
What did I tell you a few days ago? 2024 is the year of robotics. Mobile-ALOHA is an open-source robot hardware that can do dexterous, bimanual tasks like cooking a meal (with human teleoperation). Very soon, hardware will no longer bottleneck us on the quest for human-level,…
76
454
2K
@DrJimFan
Jim Fan
3 months
If there's a higher being who writes the simulation code for our reality, we can estimate the file size of the compiled binary. Meta AI's Emu Video is 6B parameters. Let's say if Sora is 10x larger with bfloat16, then the Creator's binary might be no larger than 111 Gb. Caveats:…
117
350
2K
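The file-size estimate in the simulation post above can be reproduced with a few lines of arithmetic. This is a minimal sketch under the post's own assumptions (Emu Video at 6B parameters, a hypothetical model 10x larger, stored in bfloat16 at 2 bytes per parameter); the function and variable names are illustrative and not from any library.

```python
# Assumptions taken from the post; names are illustrative.
EMU_VIDEO_PARAMS = 6e9   # "Emu Video is 6B parameters"
SCALE_FACTOR = 10        # "if Sora is 10x larger" (hypothetical)
BYTES_PER_PARAM = 2      # bfloat16 stores each parameter in 16 bits


def model_size_bytes(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Raw weight size in bytes, ignoring any file-format overhead."""
    return num_params * bytes_per_param


size_bytes = model_size_bytes(EMU_VIDEO_PARAMS * SCALE_FACTOR)
size_gb = size_bytes / 1e9       # decimal gigabytes
size_gib = size_bytes / 2**30    # binary gibibytes

print(f"{size_gb:.1f} GB, {size_gib:.1f} GiB")
```

The decimal figure is 120 GB; dividing by 2^30 instead gives just under 112 GiB, which appears to be where the post's figure of 111 comes from.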
@DrJimFan
Jim Fan
5 months
The first step to align AGI is to align the humans aligning AGI.
99
212
2K
@DrJimFan
Jim Fan
6 months
I'm going to OpenAI Dev Day! If the leaks are true, it'll be a pivotal moment for the AI consumer market: OpenAI is becoming a full-blown UGC platform, where users can create and share any AI agents. It's a superset of RPA, Character AI, Plugin store, and much more. The…
Tweet media one
98
320
2K
@DrJimFan
Jim Fan
19 days
Tesla FSD v13 will likely be grokking language tokens. What excites me the most about Grok-1.5V is the potential to solve edge cases in self-driving. Using language for "chain of thought" will help the car break down a complex scenario, reason with rules and counterfactuals, and…
Tweet media one
81
308
2K