a good software engineer will often debug further up the stack to find a bug in third-party software rather than reaching for the first workaround. a great software engineer will debug further up the stack until they realise the root bug is Society
i'm pretty anti-college, but doing iconic programming projects (connect 4, raytracer, operating systems, nand2tetris) early-career is a huge indicator of potential,,, a classically trained programmer
i summarized and compiled all of the literature i feel is relevant for catching up on the state of ai in the lm-flavoured space. everything links directly to the pdf (not the arxiv home)~ it covers 22 models along with two dozen other techniques
transformer inference performance is becoming increasingly important and there's not as much lore on it, so here is a lot of lore that i think fully models llm inference performance
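a taste of the kind of lore the thread covers: a minimal back-of-envelope sketch of decode latency as the max of a memory-bandwidth bound and a compute bound. all the numbers here (a hypothetical 13B fp16 model, ~1.5 TB/s of HBM bandwidth, ~300 TFLOP/s peak) are illustrative assumptions, not figures from the thread.

```python
# Illustrative, assumed hardware/model numbers -- not measurements.
PARAMS = 13e9          # parameter count (hypothetical 13B model)
BYTES_PER_PARAM = 2    # fp16 weights
MEM_BW = 1.5e12        # bytes/second of HBM bandwidth (assumed)
FLOPS = 3e14           # peak matmul FLOP/s (assumed)

def decode_latency_per_token(batch_size: int) -> float:
    """Seconds per decode step, as the max of two simple bounds."""
    # memory-bound: every weight is read once per decode step,
    # regardless of how many sequences are in the batch
    t_mem = PARAMS * BYTES_PER_PARAM / MEM_BW
    # compute-bound: ~2 FLOPs per parameter per token in the batch
    t_compute = 2 * PARAMS * batch_size / FLOPS
    return max(t_mem, t_compute)
```

at small batch the memory term dominates (weights are read once either way), which is why batching decode is nearly free until the compute bound takes over.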
it's sad watching founders with perfectly good companies get trainitis, an affliction which compels people to train their own models from scratch. the cause of failure isn't big co doing the startup, but the startup doing the big co. trainitis can happen to anyone 🥺
I love and respect Tom Scott so much, he is an icon of integrity and truth-seeking. The channel has been incredibly consistent in its quality, and I love that he told us he'd stop half a year ago. Big loss but everyone should end their projects with this much…
golden handcuffs apply to status as well, i think it's good for the soul and for personal success to maintain a consistent low-status practice, whatever that is for you
I've been at Anthropic for over six months now and I'm happy to recommend it to a friend! We're hiring for software engineers to work on our research, product and infrastructure, and particularly you can come work with me on a newly formed✨Tokens Team!
asked some guy at a party last night at three am to tell me about the last five french presidents and he just did it?? is this what educated people are like zomg
when i've asked someone more senior to debug something and they magically just do it, the answer to "how was i supposed to know that" is usually tacit knowledge providing a strong intuition of where the bug is coming from
this man once told me he doesn't like vscode because the characters take too long to render and we're now making him do machine learning... please help him 🥹
I'm hiring for my performance optimization team at Anthropic! Join our excellent team doing kernels, distributed parallelization, and architecture co-design for GPUs, TPUs and Trainium. No problem if you've only done CPUs before! 🧵 More about us:
anyone who wants to go after openai for not open sourcing can take it up with me. i, for one, applaud them for overcoming the unjust forces of nominative determinism
i don't know how anyone survives reading fiction. barely have memories from my early teens because i was just reading fiction. all my idle thoughts are about the book. i lose the real world so easily
I saw the best minds of my generation destroyed by synthetic data, starving hysterical naked,
dragging themselves through the eldritch tokens at dawn looking for an exquisite batch
Lot of pitches this week for "perpetual data machines". Either laundering self-generated data or attributing prescience to reward models. Just want to caution that this is a common trap smart people fall into.
we have a mathematician who now does ml research, but last week we caught him simulating matmuls entirely in his head for bit-for-bit accurate results. is this normal behaviour or do we need to give him more proofs?
the vc money pouring into ai is an annoying bubble sure, but there's also the bubble of people who don't realise that the world is about to change faster than they've ever experienced?
usually figuring out if a paper is credible takes reading through it quite thoroughly, even though my strong prior is that papers are bad
sometimes, title is all you need! i know exactly which paper i'd trust more.
this is probably the thought i should've had when i wrote . i lacked foundation and technique as an uneducated webdev, ray tracing actually helped so much
all language models will have lower loss on code than natural language because code has a bunch of boring tokens. so despite this loss difference, haiku will be qualitatively worse at code than text. the notable part of this plot is that text flattened out and code hasn't…
it's only been three years since i stopped working on compilers and i've forgotten a concerning amount. annoyingly i've retained opinions and not facts, and though i trust my past self's love for the ocaml garbage collector i do wish i'd kept the facts instead
i'll be at neurips! please find me to talk about alignment, societal impacts, llm (performance), startups and compilers!
also it's my birthday monday and i just took a flight to LA to take a 46 hour train to nola. will tweet train updates in thread💫
did you know? reading a paper signed by the author doubles your learning rate! we are finally releasing our collection of signed machine learning papers. today we are launching where signed machine learning papers are being sold for charity 💖
i've always cared about my job and i feel strongly about companies in general ❤️
it was hard to leave cohere but i had a really great time getting to know these other co:mpanies too 💞
a lot of researchers who think they're struggling because great work has already been done in the field are actually struggling because they don't know how to build on and work with the existing work, which is a different skillset from finding a solution to a problem when there…
the cohere api is generally available! it's available, generally. we have lovely generation models, but more excitingly embeddings models that don't fit into a nice screenshot, you'll have to experience them for yourself! (all non-cherrypicked samples)
im hiring for my team of two at cohere where we're about to start building up entirely new inferencing infrastructure for llms with jax. this is a big greenfield undertaking and we're looking for some lovely engineers (summer interns or full time) to come hold the torch with us
in my head my bar for "human level" is actually "top 0.1% of humans" and i think this is more correct than human medians, in the same spirit of this classic dan luu work
but also i think it's funny that our models will probably have lsat scores worthy of…
i went back to toronto and the vast majority of people i wanted to see have already moved to the us and the ones i did see were mostly plotting to leave
very sad to see how canada is incapable of holding onto ambition but the food and city are phenomenal
in machine learning, i find that "inventors" are never the ones who best understand what they've created, within a year easily dozens of people will know better (except for noam shazeer)
i'm still convinced that insecurity and narcissism are mostly the same thing and that it would do the world a lot of good if insecurity were regarded with the same distaste
tbc i'm pretty rogue, the only classics i've done are raytracing, contest programming and half of raft,,,, desperately want to do os but only have time for nand2tetris
i think startup people use the word "moat" because people approach saas-shaped products on more or less equal footing. ai is talent constrained and builds on research (practices) that are not easily imported. "high ground" is probably going to be more relevant than "moat"
it's really important to me that we don't fear-monger about ai safety, i've known too many people who've had their mental health damaged by worrying about xrisk, and it has only slowed alignment progress.
college doesn't do the classics though. raytracing and os are electives, compilers is always taught terribly, and nand2tetris and connect 4 are not that common.
most of my friends who've been early at a startup have seen the same patterns of the founder losing their identity (not just changing as a person, but becoming very inconsistent in whatever identity they adopt) in ways that are unaesthetic and suboptimal for the company
and with that, the ants pulled their crumpled up copies of "computers can be understood" out of the recycling bins and mounted them back onto the frames on their desks. what will cause their next moment of doubt, and how shall they overcome it?
The fact that most individual neurons are uninterpretable presents a serious roadblock to a mechanistic understanding of language models. We demonstrate a method for decomposing groups of neurons into interpretable features with the potential to move past that roadblock.
thinking about the poor vc who replaced google with chatgpt and asks chatgpt for the weather in san francisco every day before putting on his patagonia
my type a personality gets really frustrated when seeing someone insult my place of employment on twitter because i know i could insult it better but i'm not supposed to do that
ant is hiring for a manager (and ics) for a lm systemsy team on pretraining!
whoever gets the job gets to decide what to name it 👀 my top choices are "steps", "throughput", "occupancy" and "goodput"
everyone should be talking about how the body font for seems to be an unreleased butterick font called khyber. he reactivated his bar membership and then made a font for the most beautiful lawsuit website. it's *so good* i love this
spending two months running a startup intensely compromised my values and replaced them with really crummy ones. dropping out of my yc batch and moving to ottawa to work on obscure compilers was me returning to my roots to save myself.
The "Toy Models" part of all this speaks to it being a magnificent nerdsnipe and probably the most interactive mechanistic interpretability work that has been put out. The thread has a colab where you can replicate figures on a single gpu, and the work was replicated three times.
Neural networks often pack many unrelated concepts into a single neuron – a puzzling phenomenon known as 'polysemanticity' which makes interpretability much more challenging. In our latest work, we build toy models where the origins of polysemanticity can be fully understood.
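the geometric intuition behind polysemanticity can be sketched in a few lines. this is my own toy numeric illustration, not the paper's training setup: pack 5 "features" as unit vectors in 2 dimensions, so no pair is orthogonal and every basis direction mixes several features, yet a sparse active feature can still be read back.

```python
import numpy as np

# Five hypothetical features as unit vectors evenly spaced in 2D.
n, d = 5, 2
angles = 2 * np.pi * np.arange(n) / n
W = np.stack([np.cos(angles), np.sin(angles)])  # d x n feature directions

# No pair is orthogonal: neighbours overlap by cos(72°) ~ 0.309 and
# near-antipodal pairs by ~ -0.809, so both "neurons" are polysemantic.
gram = W.T @ W

# With only one feature active (sparsity), a ReLU readout still recovers it:
x = np.zeros(n)
x[2] = 1.0
hidden = W @ x                           # the 2 neuron activations
readout = np.maximum(0.0, W.T @ hidden)  # ReLU clips the negative interference
# the active feature reads back at 1.0; interference stays <= ~0.309
```

the point of the toy: with more features than dimensions, interference is unavoidable, but sparsity plus the nonlinearity keeps it tolerable, which is exactly when superposition pays off.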
i don't think we should never build asi, i just don't think the optimal time is asap. i also think that the funniest way to slow down would be requiring datacenters to be zoned for residential and commercial use
i have a friend who is so smart, and we talked so much that i can simulate him, so "what would he think" is like a "let's think step by step" prompt hack for my brain
anthropic is hiring for a tokens ̶t̶y̶r̶a̶n̶t̶ manager! tokens is a small, high-leverage team that works on data for pretraining. the role looks something like a research manager who spends say 20% of their time coding. ask me about it if you know me!