Evaluation is everything! While testing Inflection-2.5, we found that MT-Bench has a bunch of incorrect answers.
Here we share the corrections for everyone to use, and we release a new Physics GRE benchmark for people to try out.
I'm also in Vienna this week for
@iclr_conf
. Reach out if you want to chat.
And if anyone has advice on moving a family of four to the Bay Area I'm also interested. :)
Read up on the EM algorithm. (It's all the rage now in RL methods!)
This 1998 Neal/Hinton paper is *so clear and readable*, I am amazed.
Far more accessible than the Wikipedia article on the topic.
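The E-step/M-step loop itself is short enough to sketch. Here's a toy EM fit of a two-component 1D Gaussian mixture with fixed unit variances (my illustration, not code from the Neal/Hinton paper):

```python
import math

def em_gmm_1d(data, iters=50):
    # Toy EM for a 1D mixture of two Gaussians with variance fixed at 1.
    mu = [min(data), max(data)]  # crude initialization
    pi = [0.5, 0.5]              # mixture weights
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            w = [pi[k] * math.exp(-0.5 * (x - mu[k]) ** 2) for k in range(2)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: re-estimate means and weights from responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            pi[k] = nk / len(data)
    return mu, pi
```

On well-separated data (e.g. points near 0 and near 5), the means converge to the two cluster centers in a handful of iterations.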
Spending my lockdown weekends on this *excellent* Physics lecture series by V. Balakrishnan (thanks
@j_foerst
for the recommendation!) in combination with Sussman and Wisdom's SICM. Scheme is fun! I wish
@SymPy
was this functional.
this is wild: kNN using a gzip-based distance metric outperforms BERT and other neural methods for OOD sentence classification
intuition: 2 texts similar if cat-ing one to the other barely increases gzip size
no training, no tuning, no params; this is the entire algorithm:
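The screenshot didn't survive here, but the idea fits in a few lines. A sketch of the gzip-distance kNN (my reconstruction of the idea, not the paper's exact code):

```python
import gzip

def ncd(x: str, y: str) -> float:
    # Normalized compression distance: if concatenating y to x barely
    # grows the compressed size, the texts share a lot of structure.
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress((x + " " + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

def knn_classify(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    # train: list of (text, label). Majority vote among the k nearest.
    nearest = sorted(train, key=lambda t: ncd(query, t[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)
```

No gradients, no parameters: the compressor *is* the model.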
A while back I tweeted about discounting in policy gradient methods and how the policy gradient isn't even a gradient. With the help of
@MetaAI
colleague Yann Ollivier, I think I understand what's going on now. A thread 🧵. 1/14
Very happy our NLE paper (
#NetHack
for RL) has been accepted at
@NeurIPSConf
2020.
We also worked hard to make it even faster than before; it's now 10x faster. Complex and challenging environments needn't be slow or expensive!
Very happy to see this laborious piece of research get good reviews: RL needs more analysis of quantitative results. Often the tricks that make things work are barely mentioned in our publications as they distract from the story. But they are essential!
Another big, counter-intuitive, take-away: there is no "transfer of skills", multi-tasking merely has "a regularizing effect".
This is a bit too subtle to explain on X, but we have 4 completely different experiments leading to the same conclusion, see Sections 5.4.x
Mild disagreement. PEP 8 explicitly makes the opposite idiomatic and for some data structures (e.g., trees) checking emptiness can be O(1) while length is O(n).
Unpopular opinion: don't rely on implicit truthy constructs in your language, and instead always convert to bool yourself.
For example, in Python rather than "if mylist:", do "if len(mylist) > 0:".
An example of trading more keystrokes for less cognitive burden for readers.
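Both idioms side by side, plus the edge case where they diverge (my illustration):

```python
items = [1, 2, 3]

# PEP 8 idiom: implicit truthiness of the sequence.
if items:
    print("non-empty")

# Explicit version: more keystrokes, less inference for the reader.
if len(items) > 0:
    print("non-empty")

# Where they diverge: objects without __len__/__bool__, e.g. generators,
# are always truthy, so the implicit check silently does the wrong thing.
gen = (x for x in [])
assert bool(gen) is True  # truthy even though it yields nothing
# len(gen) would raise TypeError, surfacing the mistake immediately.
```

For a plain `list`, both checks are O(1); the performance argument only bites for containers (like a naive linked list) where computing the length walks the structure.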
HUGE congrats to Prof Dr
@_rockt
for finally beating the game of
#nethack
and ascending to demigodhood.
I now expect an AI to achieve the same in no time ;)
That's all! A fully scalable agent in a few lines of code.
To learn more about moolib, check out our repo [1], read our whitepaper [2] or look at our API documentation [3].
[1]
[2]
[3]
Recently, you have begun to find yourself unfulfilled and distant in your daily occupation. Strange dreams of training, learning, evaluating, and analysing have haunted you in your sleep for many months, but you aren't sure of
the reason. (1/N)
Thanks for all the great responses to yesterday's thread on discounted visitation frequencies in RL.
Here's another 🧵 with some of the papers I learned about this way.
We (Vegard Mella,
@erichammy
,
@DanielleRotherm
) wrote moolib to help with our RL workloads, but it can do much more (e.g., distributed retrieval for knowledge-intensive NLP tasks, ).
Most tricks work better on the stupid than the smart. But one trick that does work on many smart people is making things complicated. Over-engineered systems and over-written prose give them more (though pointless) distinctions to proudly master.
Which brings me to my final thanks: I'm extremely thankful for the opportunity Karén,
@mustafasuleyman
, and
@reidhoffman
gave me by adding me to the founding team in early 2022. Being employee number 2 (after
@JoeFenton
) was an incredible experience.
6/
Some progress on NetHack. You love to see it.
For context: The AI is still exploring only a small part of this hard game. Models like GPT-4 know a lot about NetHack when asked but haven't yet been able to play anywhere near human level.
Can reinforcement learning from AI feedback unlock new capabilities in AI agents?
Introducing Motif, an LLM-powered method for intrinsic motivation from AI feedback. Motif extracts reward functions from Llama 2's preferences and uses them to train agents with reinforcement learning.
Our latest model Inflection-2.5 () is not bad. In fact, it was the ~4th best publicly "known" model when it was released in early March. And it was created by our pretraining team of < 15 people!
2/
moolib is based on async RPCs between peers and supports IMPALA-style dynamic batching. For higher-level use cases, its Accumulator object synchronizes gradients between peers, asynchronously. The accumulator is a state machine with 'wants', 'reduces', and 'has' gradients states.
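A toy sketch of that kind of state machine (my illustration of the idea only; moolib's actual Accumulator does this asynchronously over RPC and has a different API):

```python
from enum import Enum, auto

class State(Enum):
    WANTS = auto()    # waiting for peers to contribute gradients
    REDUCES = auto()  # averaging contributed gradients
    HAS = auto()      # reduced gradients ready for the optimizer step

class ToyAccumulator:
    def __init__(self):
        self.state = State.WANTS
        self.contributions = []

    def contribute(self, grads):
        # A peer hands in its local gradients (a flat list of floats here).
        assert self.state in (State.WANTS, State.REDUCES)
        self.contributions.append(grads)
        self.state = State.REDUCES

    def reduce(self):
        # Average all contributions element-wise, then hold the result.
        assert self.state == State.REDUCES
        n = len(self.contributions)
        self.reduced = [sum(col) / n for col in zip(*self.contributions)]
        self.contributions = []
        self.state = State.HAS
        return self.reduced

    def step_done(self):
        # The optimizer consumed the gradients; start the next round.
        assert self.state == State.HAS
        self.state = State.WANTS
```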
Living on the other side of the pond, I never got the full American Thanksgiving experience. Thankfully, these days we have social media to observe and learn.
@ylecun
@Grady_Booch
@Meta
You, my former friend, are burning your reputation to the ground.
Everyone is telling you to lie down and go home.
Listen to what they are saying. Not for me; for yourself.
Inspired by
@CsabaSzepesvari
's excellent Bandit Algorithms book, here's another _very niche_ blog post: How to show that the Lebesgue measure and integral are well-defined. Many authors make this more complicated than necessary!
Pleased as punch (the drinky kind, not the hurty kind) to be returning to Google
@DeepMind
as Director of Research today. It's an exciting time to be helping develop general agents that can adapt to open-ended environments, communicate with us, and help us in novel ways!
In terms of model quality for its size, Inflection-2.5 is through the roof. How could we train such a good model with such a small team? That's primarily thanks to Jordan Hoffmann. Jordan is amazing and in my opinion one of the world's best AI researchers.
3/
Want to play around with
#StarCraft
, but 256 colors are just too much and you'd miss
@NetHack_LE
's ttyrec replays? And
#NetHack
is more interesting anyway?
I got the solution for you.
I could list people doing amazing things using W&B all day. We should probably make this a regular thing!
Instead I will leave you with some of our users, telling us in their own words what they love about W&B.
But the rest of the team was also amazing all around. That includes our HPC lead, everyone working on modeling and, dearest to me, the infra folks I had the honor to support and learn from.
4/
I sometimes complain about unhelpful "pseudocode" in RL papers. So credit where credit is due: The pseudocode hidden in the Supplementary Data of the AlphaStar paper is _excellent_. Kudos to
@OriolVinyalsML
,
@ibab_ml
,
@trevorycai
!
We built our pretraining and inference stack, partially on top of open source solutions (btw thank you
@PyTorch
), partially just writing things from scratch. And we were the first team to train LLMs on H100 GPUs, using Inflection's 22k cluster ()
5/
Update: Little one just turned 6 months (well, 6x4 weeks) and it's better than ever. First tooth! On the verge of crawling. Locomotor skills better than any from deep RL but still with cute failures. Sleep almost not an issue.
Happy Father's day everyone!
Update 4 months in: Having a kid is lots of fun, can still recommend. Richard Ferber has a point. Pat leave is a great invention. I have no idea how > 1 is supposed to even work. :D
New blog post: Capital asset pricing & Fama-French factor models as examples of Linear Regression. Thanks to the
@RationalRemind
podcast (
@benjaminwfelix
,
@CameronPassmore
) for teaching me this subject and
@egrefen
for bugging me to finally write this up.
First up, a follow-up to Thomas (2014): Nota and Thomas: Is the Policy Gradient a Gradient? (2020)
I just love the grumpiness of this one! They quote from a number of well-known RL papers and conclude for each one: "[their] claim [...] is erroneous"!
Join us at the
@NeurIPSConf
2020 poster session on Thu 5pm GMT if you want to learn about the NetHack Learning Environment and why we believe a terminal-based procedurally generated game from the 80s is pushing the frontier of single-agent RL research.
I am really excited to reveal what
@GoogleDeepMind
's Open Endedness Team has been up to. We introduce Genie 🧞, a foundation world model trained exclusively from Internet videos that can generate an endless variety of action-controllable 2D worlds given image prompts.
@SimonDeDeo
@peterboghossian
@Liz_Shepherd
@BretWeinstein
You might try serving 3B users with a product developed by tens of thousands of engineers and report back on your failure rate.
I get how this looks and I understand this all seems so easy. Until you try that is.
After seven years, I have returned to
@DeepMind
today! Excited about what lies ahead, and catching up with many old friends and new ones over the coming months!
Got a complicated RL exploration problem? Sparse/no reward? It's dangerous to go alone: bring an AMIGo! This thread introduces work done by Andres Campero, with
@robertarail
, Josh B. Tenenbaum,
@HeinrichKuttler
,
@_rockt
and me during Andres' internship at FAIR London. [1/5]
In today's episode,
@l2k
interviews
@_rockt
and
@HeinrichKuttler
, from the
@facebookai
team, on how they are leveling the playing field for training RL models with the help of NetHack, an archaic rogue-like video game from the late 80s.
#deeplearning
Lot of new Twitter followers over the last day. I'm a little sad if that correlates to perceived social power. I did not actually give an order to fire Altman, and if you're here for that, you may as well leave now.
@_rockt
I think part of this is due to our field being driven by clickbait titles. Same reason we show hi-res videos although our agents train on 84x84.
Tired of playing with font sizes and other matplotlib parameters every time you start a new project or write a new plotting function? Use this repo to make your own style file interactively in a jupyter notebook!
@HeleneBismarck
@BorisJohnson
If only the Ukrainian ambassador had shared his take on the German government's position at that time, that might have enlightened things.
Huge congrats to
@samvelyan
for having conceived of and developed MiniHack. Many great people contributed, but it would not have happened without Mika.
Creating rich and complex environments for RL has never been easier!
I'm excited to introduce MiniHack: A Sandbox for Open-Ended Reinforcement Learning Research.
Code:
Paper:
Blogpost:
We (
@HeinrichKuttler
@nntsn
@robertarail
@egrefen
) are looking forward to meeting you at our poster A1 in room B3 in two hours
With NLE and 2 GPUs you can train deep RL agents at 1,200,000,000 steps a day in a challenging stochastic procgen environment
Today I'm excited to announce the first version of our new personal AI, Pi...
Pi is smart, kind and supportive. It's designed to be better at natural, flowing conversation than lists, plans, or code.
@b0rk
Also running:
"HEAD is a symbolic reference pointing to wherever you are in your commit history."
"commits are hashes of a tree + parent(s) + author + timestamp + commit message"
"ugit is git in Python!!"
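The "commits are hashes of a tree + parent(s) + author + timestamp + commit message" fact fits in a few lines (a toy version; real git hashes an exact "commit" object byte layout):

```python
import hashlib

def toy_commit_id(tree, parents, author, timestamp, message):
    # Toy illustration: a commit id as the SHA-1 of its fields.
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author} {timestamp}")
    lines.append("")
    lines.append(message)
    return hashlib.sha1("\n".join(lines).encode()).hexdigest()

# Changing any field, even just the message, yields a different commit id:
a = toy_commit_id("abc123", [], "alice", 1700000000, "initial commit")
b = toy_commit_id("abc123", [], "alice", 1700000000, "tweaked message")
assert a != b
```

This is also why you can't edit history without rewriting every descendant commit: each commit's id depends on its parent ids.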
3 years ago my teammates and I set out toward a goal that seemed like science fiction: to build an AI that could strategically outnegotiate humans *in natural language* in Diplomacy. Today, I'm excited to share our Science paper showing we've succeeded! 🧵
It's really hard to argue Spain needs to reduce its gas consumption if Germany insists on shutting down its remaining nuclear reactors. Why should others suffer for Berlin's idiosyncratic policy choices?
NORTH vs SOUTH 2.0:
Spain, Greece and Portugal reject the EU call for 15% cuts in natural gas consumption to help Germany
Spanish Energy Minister (clearly aiming at Berlin): "Contrary to other countries, Spain hasn't been living beyond its means in energy terms"
#EnergyCrisis
We have a long history of supporting responsible open source & science, which can drive rapid research progress, so we're proud to release Gemma: a set of lightweight open models, best-in-class for their size, inspired by the same tech used for Gemini
"Finally, we would like to pay tribute to the 863,918,816 simulated NetHack heroes who lost their lives in the name of science for this project (thus far)."