Aldous Huxley was born on this day 128 years ago.
Reminds me of that time my utilitarian friend missed that Brave New World is supposed to be dystopian. 🤪
Cave people staring, slack-jawed, at artifacts of a superior civilization. There will never be a European iPad. The most they could contribute was making threats to get Americans to sigh and add USB-C to it.
With those clowns around there won't be a European GPT-5 either.
Zuck unironically has hands-on experience building AI. People ignored this because it was the "lmao lizard robot" phase.
Remember his thing, choosing annual challenges? Wear a tie, learn Mandarin, butcher cattle by hand? Yeah, the theme of 2016 was "build an AI assistant like Jarvis".
Call me paranoid, but dropping the "tbh transformers just aren't great at length generalization: the paper" one day before announcing that they have near-perfect recall and in-context learning over 10M tokens seems suggestive that Gemini 1.5 is a substantially different design.
@cynawk
@einstein2004113
To be clear, just one European country receives more than that in a single month
But I can believe you're illiterate and innumerate so it may be that you sincerely don't get how that undermines your sarcasm
Give someone a fish, and you feed them for a day; teach someone to fish, and you feed them for a lifetime.
Elevating from Weak to Strong with Self-Play Fine-Tuning (SPIN) for All LLMs. Empower, Evolve, SPIN!
@image_origins
They had to rediscover the concept of having a peer group online, because they were helicoptered away from other children in the real world
This is a tragic testament to atomization
> Beff: “institutions have decayed beyond the point of salvaging” and the media is a “vector for cybernetic control of culture”
> The media: gets Director of the National Center for Media Forensics to dox Beff, to reduce his cultural pull
Is evil the absence of self-awareness?
After spending just 20 minutes with the
@MistralAI
model, I am shocked by how unsafe it is. It is very rare these days to see a new model so readily reply to even the most malicious instructions. I am super excited about open-source LLMs, but this can't be it!
Examples below 🧵
@growing_daniel
People actually only ever care for Western values
Those "civilization states", "great powers", it's all fake, lifeless LARP to make rubes work harder
Putin just wants to drink beer in Dresden
Xi would rather be a soccer coach in California
(Bibi is for real though)
Von Neumann's active preference for noise has always fascinated me. Moreover, his memory was auditory (as per Ulam)!
My hunch is that some very efficient brains are so good at dampening intrinsic noise that they require external perturbations as "random seeds" for novel thought.
Biggest sign of low IQ is being comfortable with annoying background noises
- Smoke alarm bleeps
- Water dripping from a tap
- Loudspeaker phone calls
- Screaming children
- Doors opening/closing in the wind
- Dogs barking
- Bluetooth speaker outside
- Loud conversation
> Microsoft could purchase the NYT
No, brother.
Microsoft is a business. The NYT is a Power, a fortification of a sovereign noble clan appointed by itself to enlighten the lower castes. Merchants cannot buy nobles with mere money.
The New York Times is only an $8B company. Microsoft could purchase the whole shebang with pocket change. Could be worth it to make the lawsuit go away and preserve the value of their investment in OpenAI (and recent $1T increase in the market cap of $MSFT). Yo
@satyanadella
, I’m…
@whstancil
Incidentally, Tamil Brahmins have won India 3 Science Nobels out of 4 total. Their population amounts to 2 million people – a minority within a minority.
Your analysis of rhetorical tactics is astute, but rhetoric only goes so far. In the end, you cannot taboo noticing patterns.
E/accs talk a lot of smack about their enemies outlawing math, but did you know that there literally exist illegal numbers? Codellama only exercises sensible caution here. We wouldn't want it to generate a nasty, prohibited prime, would we?
Stick to permitted ones please.
FYI the new CodeLLaMA 70B model refuses to produce code that generates prime numbers. About 80% of the time it says your request is immoral and cannot be completed.
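For context, the sort of request being refused is as trivial as it gets. A minimal sketch (the function name is mine, not from the thread; illegal primes are a real legal curiosity, but nothing below encodes one):

```python
# The kind of code the model reportedly refuses to write:
# a plain Sieve of Eratosthenes.
def primes_up_to(n: int) -> list[int]:
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]           # 0 and 1 are not prime
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [i for i, ok in enumerate(sieve) if ok]

print(primes_up_to(50))  # [2, 3, 5, 7, 11, ..., 47]
```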
The edge of OpenAI at this point is not engineering but mythmaking; degrading ML discourse to the level of Marvel comic books – jealous villains angling for insights of eccentric geniuses, party rumors, grand visions. People stop reading papers, tracing authors, thinking.
Murica.
I might have heard the same 😃 -- I guess info like this is passed around but no one wants to say it out loud.
GPT-4: 8 x 220B experts trained with different data/task distributions and 16-iter inference.
Glad that Geohot said it out loud.
Though, at this point, GPT-4 is…
I’m thrilled to share that I've joined
@OpenAI
! 🚀 For years I’ve researched AI self-play and reasoning in games like Poker and Diplomacy. I’ll now investigate how to make these methods truly general. If successful, we may one day see LLMs that are 1,000x better than GPT-4 🌌 1/
Might be late but I am now 100% convinced that Miqu is the same model that's accessible as Mistral-Medium on Perplexity Labs. It was plausible that it knows standard puzzles, but there ain't no way in Hell a prankster has tuned it to identically phrase the responses in Russian too.
@whstancil
Will, as a true southerner, have you truly not noticed that black kids of school age grow up (as in, literally get taller) at a faster pace, or do you believe this is caused by lack of healthcare and whatnot?
Wake up babe, new paper showing that LLMs* are absurdly easy to compress/sparsify/accelerate has dropped
*headline results invariably on OPT-175B, due to ReLU-induced sparsity
Alas, it's not an accident that we see SwiGLU in best-in-class models. Related:
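To make the footnote concrete, a toy measurement; dimensions and random weights below are illustrative assumptions, but the qualitative gap is the point: ReLU MLPs emit exact zeros for free, SwiGLU ones don't.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, d_ff, n_tok = 512, 2048, 1000        # assumed toy sizes
x = torch.randn(n_tok, d)

# OPT-style MLP hidden state: ReLU zeroes out ~half the units exactly,
# which is the sparsity those compression papers exploit.
relu_h = F.relu(x @ (torch.randn(d, d_ff) / d**0.5))

# SwiGLU hidden state (LLaMA/Mistral-style): silu(x W_g) * (x W_u).
# Values get small but almost never hit exactly zero.
swiglu_h = F.silu(x @ (torch.randn(d, d_ff) / d**0.5)) * (
    x @ (torch.randn(d, d_ff) / d**0.5)
)

for name, h in [("ReLU", relu_h), ("SwiGLU", swiglu_h)]:
    print(f"{name}: {(h == 0).float().mean().item():.1%} exact zeros")
# ReLU: ~50% exact zeros; SwiGLU: ~0%.
```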
People are so used to models trained for Huggingface Leaderboard they're in disbelief upon seeing a production-grade one. Maybe they shouldn't. Smol Qwens are samples of Tongyi Qianwen, not proofs of concept; to Alibaba, they're kind of like what T5-XXL is to Google.
Alibaba is…
When a brilliant guy like Ilya confidently says a truthy thing like "to compress information really well, what the neural network learns is some representation of the process that produced the text" – it sounds obviously right.
...To people who can't come up with an alternative…
> a massive handsome unit
> awesome hyphenated French name
> did quantum computing research at Alphabet
> I bet he understands MWI better than Yud too
Aside from focusing decel pressure on Extropic investors*, I am not sure what this was supposed to acknowledge
*brace for it
No polite way to say it. The purpose of a system is what it does. Lesswrong is a system for grooming autistic people into fanatics of AI Doom, making use of their social disadaptation and bad taste. This works because the US at large has no intellectual culture to offer them.
I love that this brainworm keeps trying to evolve the defense of never thinking very hard about AI capabilities so you stay as scared as possible of a vague amorphous threat.
This is also what
@ylecun
means when saying that GPT-4 or "GPT-5000" doesn't have a cat's understanding of the world and can't anticipate simple physical causal chains not present in texts.
It's easy to mock his dismissive takes – but, incredibly, there is evidence behind them.
I have been working on vision+language models (VLMs) for a decade.
And every few years, this community re-discovers the same lesson -- that on difficult tasks, VLMs regress to being nearly blind!
Visual content provides only a minor improvement to a VLM over an LLM, even when these…
Once again: if tech bros had even a smidgeon of self-respect, they'd have bullied this behavior out of existence.
– Sorry [not sorry] for being a possessed politruk, but have you tried being visibly embarrassed about your life?
– …What the fuck is wrong with you? Go read Damore.
@PiotrPadlewski
Sorry but I have to push back a little here. (Again, sorry for doing this today, and congrats on your very deserved achievement.)
When it comes to culture and change, everyone is responsible – or culpable, when something is this alarmingly wrong.
Another race
We hear about races often. Frontier labs rushing to AGI, Humanity against Moloch, US vs China. But there's one more, little heard of: the race between doomers completing institutional capture, and their entire theory getting discredited through transparency of AI.…
The groundswell of interest for Llama-1 (and for all our previous open source AI packages, like PyTorch, DINO, SAM, NLLB, wav2vec...) is what convinced the Meta leadership that the benefits of an open release of Llama-2 would overwhelmingly outweigh the risks and transform the…
Some say this is Sin manifest. But I believe Amorphus Globosus is how the God of Animal Husbandry rushes ahead of the causal chain to deliver us the cattle's ultimate form. The redemption of carnivora: innocent flesh which knew no sentience.
We shall grow them like so much fruit.
Amorphus globosus, a phenomenon in cattle where, instead of a normally developed foetus, a spherical structure covered by hairy skin and/or primitive mouthparts is formed (I'm obsessed with this btw)
Intel presents LLaVA-Gemma
Accelerating Multimodal Foundation Models with a Compact Language Model
We train a suite of multimodal foundation models (MMFM) using the popular LLaVA framework with the recently released Gemma family of large language models (LLMs). Of particular
This feels like a turning point in the downfall of the US nu-elite. Misconduct, bad political takes, all gets forgiven – but not the lack of grace. They are BAD at mimicking it, lacking practice and nerve, their voices are petty and shrill. They are painfully Not Elite Material.
# CUDA/C++ origins of Deep Learning
Fun fact: many people might have heard about the ImageNet / AlexNet moment of 2012, and the deep learning revolution it started.
What's maybe a bit less known is that the code backing this winning submission to the…
It was one of the bigger blackpills to me. It proved that the water I swim in, all this ostensibly free-thinking, irreverent, "common sense", progressive, New Atheism era online society – is beholden to a vile orthodoxy that's opportunistically destroying its competition.
Learning that Catholic priests sexually abuse at the same rate as Protestant preachers and way less than public school teachers was a real blackpill moment for me. Why was this all about Catholics? Profoundly effective smear campaign.
I was chatting on Discord with an anon account, who had just come up with a really smart new approach to model merging and had some encouraging results to show.
But then he had to go.
Turns out he's in 10th grade and was in trouble because he wasn't focusing on English class.
Reminder that "IQ only measures how you do on IQ test" is a lie that kind 145 IQ college professors who go for years without a prolonged interaction with a 100 IQ person tell to their insecure above-average students, because they genuinely think that's what "low IQ" is about.
Hyperhuman Era
So we're talking of AI and the end of the human era.
Here's the deal. I'm Russian. Depending on how you look at it, one belonging to the last Soviet or the first Federation generation – child of a dying empire, born to wander concrete skeletons of abandoned…
@teortaxesTex
lol maybe, but suddenly everyone agrees with me that the end of the human era is coming soon.
15 years ago, people were seriously talking about thousands or millions of years to AI.
> they did just keep pretraining the model on an increasingly enriched corpus
> sensible experiments on basic techniques
> 0 goofy self-sabotage, 4.5T tokens, and 7B SoTA
Feels like they have some world class PM genius who's also an OSS fanatic.
🚀 DeepSeekMath: Approaching Mathematical Reasoning Capability of GPT-4 with a 7B Model.
Highlights:
- Continue pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math tokens from Common Crawl.
- Introduce GRPO, a variant of PPO, that enhances mathematical reasoning and reduces…
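The core of GRPO, as I read the paper's own summary: drop PPO's learned value baseline and standardize rewards within a group of sampled completions per prompt. A minimal sketch of just the advantage computation (function and variable names are mine):

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: (n_prompts, group_size), one scalar reward per sampled
    completion. Advantages are rewards standardized within each group,
    replacing PPO's value-network baseline (hence the memory savings)."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

# 2 prompts, 4 sampled answers each, 0/1 correctness rewards:
print(grpo_advantages(torch.tensor([[1., 0., 0., 1.],
                                    [0., 0., 1., 0.]])))
```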
Meta presents Better & Faster Large Language Models via Multi-token Prediction
- training language models to predict multiple future tokens at once results in higher sample efficiency
- up to 3x faster at inference
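Schematically, the setup is a shared trunk feeding n independent heads, one per future token. A sketch under my own simplification (plain linear heads standing in for whatever head architecture the paper actually uses):

```python
import torch
import torch.nn as nn

class MultiTokenPredictor(nn.Module):
    """Shared trunk, n_future output heads; head k is trained against
    tokens shifted by k+1. At inference you can keep only head 0, or
    use the extra heads for self-speculative decoding -- the likely
    source of the quoted ~3x speedup."""

    def __init__(self, d_model: int, vocab: int, n_future: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab, bias=False) for _ in range(n_future)
        )

    def forward(self, trunk_h: torch.Tensor) -> list[torch.Tensor]:
        # trunk_h: (batch, seq, d_model) from the shared transformer.
        return [head(trunk_h) for head in self.heads]
```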
Deepseek-67B, trained on 2T tokens, confirms just how suboptimal the llama-2/code data mixture was.
Mistral confirms this is true even if you only use open datasets – nevermind OpenAI's dirty secrets.
But how do you use them?
Contribute to the data lore:
Eliezer Yudkowsky is the defining, paradigm-setting intellectual of our era, much like Karl Marx and Sigmund Freud have been in theirs. Today, he is overlooked largely due to lacking credentials; but our posthuman successors will honor Big Yud with a mile-high diamondoid statue.
Sama must power-trip so hard from all the noise about matching GPT-4, beating it by x%, delivering ≈GPT-4 at y% the price.
Seems like nobody - at least not in the open – has any clue as to how to go even further beyond. GPT-5 will be utterly different, and unmatched at launch.
"Ah, Mr. Potter-Evans-Verres! I'm sorry to say it's time Department of MAGIC confiscated your wand – for public safety that's best, don't you agree? It would be such a shame if these wands were to fall into the wrong hands... like those of our dear old friend, Mr. You-Know-Who."
MAGIC will:
1. Ban the development of AI systems above a certain threshold of computing power.
2. Have an exclusive mandate to conduct critical AI research.
Full paper:
If this two-bit coup without securing the loyalty of crucial parties was the whole gambit, then Ilya Sutskever has done more to disprove the g factor of intelligence than every lefty professor from Lewontin to Gardner to Rutherford
@BillyVacant
> You are born where you are born by complete chance.
Bro do you think reality is a video game with randomized spawning points for souls or what
Wanted to say "bullshit"… uh.
I knew the margin, but didn't realize that the per-die wafer cost of GH200 is merely ~$200. This is what an optimized process looks like. If TSMC could scale fabs faster, we'd have had >OOM YoY compute growth.
Power is becoming the main constraint already.
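The margin point is just division. A back-of-envelope sketch where both inputs are my assumptions, picked to land on the tweet's ~$200 figure:

```python
# Per-die silicon cost = wafer price / good dies per wafer.
# Both numbers below are illustrative assumptions, not from the tweet.
wafer_price = 16_000          # USD, assumed leading-edge wafer price
good_dies_per_wafer = 80      # assumed
die_cost = wafer_price / good_dies_per_wafer
print(die_cost)               # 200.0

# Against an (assumed) >$30k unit price, silicon is a rounding error:
print(f"silicon share: {die_cost / 30_000:.1%}")  # ~0.7%
```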
To make this work, we integrated the MFA GEMM kernel from
@philipturnerar
's project. The result is immediate: it improves overall performance on SD v1 / v2 / XL by about 15%. And this is just the beginning! (4/9)
One of the things we really needed for Sparse Universal Transformers was a fast way to do Mixture-of-Experts for attention. That's why we spent the time to write ScatterMoE.
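For intuition about what such a kernel fuses away, here is the naive grouped dispatch it replaces; this is a generic top-1 MoE sketch of mine, not the attention-specific or fused ScatterMoE code:

```python
import torch

def naive_moe_dispatch(x, router_logits, experts):
    """x: (n_tokens, d); router_logits: (n_tokens, n_experts);
    experts: list of per-expert modules. Sort tokens by assigned
    expert, run each expert on a contiguous slice, scatter back.
    ScatterMoE's point is doing this without materializing the
    copied intermediates this sketch creates."""
    assign = router_logits.argmax(dim=-1)
    order = assign.argsort()                      # group tokens by expert
    grouped = x[order]
    counts = torch.bincount(assign, minlength=len(experts)).tolist()
    out, start = torch.empty_like(x), 0
    for expert, n in zip(experts, counts):
        if n:
            out[order[start:start + n]] = expert(grouped[start:start + n])
        start += n
    return out
```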
Ngl it's hilarious how Mistral has established its credentials as the most likely OpenAI slayer on little more than a le excellent 7B, ruthless execution and memes. No Turing Award PIs or Transformer inventors, no in-house silicon, no sweatshops with data annotators, no nothing.
On Bostrom
Yudkowsky must feel lonely now.
@robinhanson
, Bostrom, Drexler,
@perrymetzger
– his intellectual icons and teachers have all defected to some extent; he stands surrounded by rabid believers – and smooth psychos in suits, worming into halls of power.
(Ironically, he's…
FINALLY: AI xrisker Nick Bostrom regrets focusing on AI risk, now worries that our fearful herd mentality will drive us to crush AI and destroy our future potential. (from an UnHerd podcast today)
Nick Bostrom: It would be tragic if we never developed advanced artificial…
@KanizsaBoundary
Speaking of strange aids to talent: most blind mathematicians work in geometry and topology. It is argued that the spatial intuition of sighted people is degraded by the triviality of retinal perception.
No.
You should realize that higher-quality data can bail out smaller models with weaker, computationally cheaper architectures. E.g. a 20B MambaMoE pretrained on 1T select tokens will wreck a 72B MHA Transformer fed with 3T of garbage.
This is the current meta.
Why is Mistral-7B so good?
I think it is the architecture.
- Mistral-7B's mlp dim: 14336
- LLaMA's mlp dim: 11008
The GQA moved 0.8B parameters from k_proj & v_proj to the mlp's dimension.
Note: Yi-6B also uses GQA with (smaller…
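The quoted arithmetic checks out. With public config numbers (hidden 4096, 32 layers, head dim 128; LLaMA-7B uses 32 KV heads, Mistral-7B uses 8):

```python
d_model, n_layers, head_dim = 4096, 32, 128

# k_proj + v_proj parameters: d_model x (n_kv_heads * head_dim), twice.
mha_kv = 2 * d_model * 32 * head_dim   # LLaMA-7B: full MHA
gqa_kv = 2 * d_model * 8 * head_dim    # Mistral-7B: GQA
print(f"freed by GQA: {(mha_kv - gqa_kv) * n_layers / 1e9:.2f}B")  # ~0.81B

# SwiGLU MLP has 3 weight matrices of shape d_model x d_ff.
mlp_growth = 3 * d_model * (14336 - 11008) * n_layers
print(f"extra MLP params: {mlp_growth / 1e9:.2f}B")  # ~1.31B
# So the wider MLP absorbs the freed 0.8B and then some.
```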
@austin_rief
@bookwormengr
The thing is, quality of life ain't worth all that much (well okay, worth 2-3 mil). Scrape for "time" all you want – Bezos affects the timeline, you don't, he carves his fate into the world. "A cheeseburger is a cheeseburger" is a thought befitting an irrelevant animal in a zoo.
Reminder that current Groq chips have 230MB of SRAM each, so LLaMA 3 8B in int8 takes >36 chips to run (KV cache not included), with the total power draw of 10 H100s, and very dubious unit economics
(h/t
@dylan522p
)
But fast is neat.
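The chip count follows from the weights alone, using the figures in the tweet:

```python
params = 8e9            # LLaMA 3 8B
bytes_per_param = 1     # int8
sram_per_chip = 230e6   # ~230 MB SRAM per Groq chip

print(params * bytes_per_param / sram_per_chip)  # ~34.8 chips for weights
# KV cache, activations and layout overhead push it past 36 chips,
# hence the ">36"; power draw then compounds the unit economics.
```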
Wait, LLaMA 3 on Groq is actually INSANE. Just asked the 8b model (!!!!??) to convert a complete file from λ-Calculus to idiomatic JS... and it did it IN TWO SECONDS. What kind of wizardry is that?
First impressions: I think this is going to be my daily driver from now on
(left is vanilla 8B-instruct Q8, right is the orthogonalized instruct exl2-6.5bpw)
@hayxtt
This «we done with appealing to Westerners» machismo death spiral is one of the most pathetic ways for a people to go out, the yelp of a terrified mutt trying to roar like a lion.
Immature protest which ends in the spectacle of bearded infants getting slaughtered for sport.
AGI is already here as far as I'm concerned. I think there is no task for which we can credibly say "nope, a SoTA transformer variant can't learn to ace it on a reasonable data budget".
But he's right, Jihadis can still deny us progress. Never underestimate malice inspired by fear.
AGI is not inevitable. It requires hordes of engineers with million dollar paychecks. It requires a fully functional and unrestricted supply chain of the most complex hardware. It requires all of us to allow these companies to gamble with our future.
> "I think it's time to admit defeat"
How often do you see LLMs capitulate instead of doubling down or gaslighting you?
Sadly 8B Llama is struggling with The Diamond Problem (as do all <10B models that don't cheat egregiously), but its attitude sure is more human-like now.
Why do *all* the chat LLMs have such a strong prior that their response was «unclear or confusing» when you call it out as basically wrong and hallucinated?
If it turns out that Llama-3 really is merely on Mistral/Gemma level, but some secret instruct-tuning pipeline launches it into Claude 3 realm, this is awkward. For one thing it makes all those doomer "undoing guardrails" papers relevant.
It also makes a joke of community efforts
One week in, a new doomer paper dropped, showing that refusal behavior in Llama 3 8B Instruct is represented by a one-dimensional linear subspace in activation space.
Four days later, it's apparently been implemented to remove refusals from L3:
let's check
@DavidFSWD
It's a very new paper so no. But in spirit there are similar works. I mainly mean that Llama has weird RLHF'd compunctions we could remove, and they're probably intrinsically very low-rank, as shown in many doomer papers on safety unlearning.
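A hedged sketch of the recipe that line of work describes (difference-of-means direction, then projected out of activations or baked into weights; all names below are mine):

```python
import torch

def refusal_direction(h_harmful, h_harmless):
    """h_*: (n_prompts, d_model) residual-stream activations collected
    at some layer/position on harmful vs. harmless instructions.
    Returns the unit 'refusal direction' (difference of means)."""
    v = h_harmful.mean(0) - h_harmless.mean(0)
    return v / v.norm()

def ablate(h, v_hat):
    """Runtime ablation: remove the component of h along v_hat.
    h: (n, d_model); v_hat: (d_model,)."""
    return h - torch.outer(h @ v_hat, v_hat)

def orthogonalize(W, v_hat):
    """'Orthogonalized' weights: bake the ablation into any matrix W
    (d_model, d_in) that writes into the residual stream, so the model
    can never express the direction -- no hooks needed at inference."""
    return W - torch.outer(v_hat, v_hat @ W)
```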
This is why I'm bullish on Deepseek, and more so on OpenCodeInterpreter-like multiturn finetunes. It's not another Leaderboard big boy. It can think. It can reflect on its work. It only needs to be reminded of that.
DSC-67B-OCI will be insanely useful.
Tired of “LLM hacking” hype with no code? Here’s a breath of fresh air.
1. Challenges: open source ✅
2. Solution framework: open source ✅
If you’re interested in hackbots in offsec and you’re craving something you can RUN, you gotta read this