Teortaxes▶️

@teortaxesTex

8,073
Followers
1,357
Following
3,812
Media
27,531
Statuses

Ours is the age of unaligned utilitarians. Other problems are relatively unimportant, but sometimes I tweet about them anyway. (кто/кого)

Joined September 2010
Pinned Tweet
@teortaxesTex
Teortaxes▶️
5 months
Tweet media one
@StefanFSchubert
Stefan Schubert
2 years
Aldous Huxley was born on this day 128 years ago. Reminds me of that time my utilitarian friend had missed that Brave New World is supposed to be dystopian. 🤪
Tweet media one
6
1
75
6
5
113
@teortaxesTex
Teortaxes▶️
5 months
Cave people staring, slack-jawed, at artifacts of a superior civilization. There will never be a European iPad. The most they could contribute was make threats to get Americans to sigh and add USB-C to it. With those clowns around there won't be a European GPT-5 either.
@ThierryBreton
Thierry Breton
5 months
Reviewing what could be the final drafting. #AIAct #Trilogue
Tweet media one
282
83
624
68
196
2K
@teortaxesTex
Teortaxes▶️
3 months
@einstein2004113 0.01% of its total population is like 50 thousand people but you know that
24
5
2K
@teortaxesTex
Teortaxes▶️
24 days
Zuck unironically has hands-on experience building AI. People ignored this because it was the "lmao lizard robot" phase. Remember his thing, choosing annual challenges? Wear a tie, learn Mandarin, butcher cattle by hand? Yeah, the theme of 2016 was "build an AI assistant like Jarvis".
Tweet media one
Tweet media two
@aidan_mclau
Aidan McLau
25 days
unbelievable based. zuck is llama3 author
8
25
452
15
75
1K
@teortaxesTex
Teortaxes▶️
6 months
@growing_daniel It *is* neat though. And she asks the right question – how is it not a thing yet?
16
1
1K
@teortaxesTex
Teortaxes▶️
3 months
Call me paranoid, but dropping the "tbh transformers just aren't great at length generalization: the paper" one day before announcing that they have near-perfect recall and in-context learning over 10M tokens seems suggestive that Gemini 1.5 is a substantially different design.
Tweet media one
Tweet media two
35
47
817
@teortaxesTex
Teortaxes▶️
8 months
@toonholechris Royally Inbred vs [ostensibly] outbred

> inb4 Alexei Nikolaevich

Yes, well known as a drooling mutant, now go look him up
Tweet media one
Tweet media two
24
26
722
@teortaxesTex
Teortaxes▶️
3 months
@cynawk @einstein2004113 To be clear, just one European country receives more than that in a single month. But I can believe you're illiterate and innumerate, so it may be that you sincerely don't get how that undermines your sarcasm
Tweet media one
6
8
677
@teortaxesTex
Teortaxes▶️
2 months
It seems that results of that Microsoft paper about ternary LLMs can be replicated after all – for 3B @100B at least.
Tweet media one
18
97
697
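For reference, the ternary scheme being replicated here (BitNet b1.58-style absmean quantization) fits in a few lines. This is a rough sketch of the reported method, not the paper's actual code:

```python
def ternarize(weights):
    # Absmean ternary quantization, BitNet b1.58-style (illustrative sketch):
    # scale by the mean |w|, then round and clip every weight to {-1, 0, +1}.
    gamma = sum(abs(w) for w in weights) / len(weights)
    q = [max(-1, min(1, round(w / gamma))) for w in weights]
    return q, gamma

q, gamma = ternarize([0.9, -0.05, 0.4, -1.2])
print(q)  # [1, 0, 1, -1] -- full precision survives only in the scale gamma
```

Real implementations apply this per weight matrix during training, with a straight-through estimator for the gradient; the toy above only shows the forward quantization.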
@teortaxesTex
Teortaxes▶️
4 months
I don't think about this often, but sometimes I do
Tweet media one
@QuanquanGu
Quanquan Gu
4 months
Give someone a fish, and you feed them for a day; teach someone to fish, and you feed them for a lifetime. Elevating from Weak to Strong with Self-Play Fine-Tuning (SPIN) for All LLMs. Empower, Evolve, SPIN!
Tweet media one
15
62
426
14
34
687
@teortaxesTex
Teortaxes▶️
5 months
@image_origins They had to rediscover the concept of having a peer group online, because they were helicoptered away from other children in the real world. This is a tragic testament to atomization
5
7
621
@teortaxesTex
Teortaxes▶️
9 months
An obscure but very powerful effect in condensed matter physics, might explain a lot about LK-99
Tweet media one
19
38
618
@teortaxesTex
Teortaxes▶️
5 months
> Beff: “institutions have decayed beyond the point of salvaging” and the media is a “vector for cybernetic control of culture”
> The media: gets the Director of the National Center for Media Forensics to dox Beff, to reduce his cultural pull

Is evil the absence of self-awareness?
Tweet media one
@NickThoughtRepo
Nick
5 months
@thegarrettscott @BasedBeffJezos Even better, Emily Baker-White gaslit us on the definition of doxxing
Tweet media one
10
5
169
21
67
569
@teortaxesTex
Teortaxes▶️
8 months
Guys like this are the precise reason Mistral was announced with a magnet link
@paul_rottger
Paul Röttger
8 months
After spending just 20 minutes with the @MistralAI model, I am shocked by how unsafe it is. It is very rare these days to see a new model so readily reply to even the most malicious instructions. I am super excited about open-source LLMs, but this can't be it! Examples below 🧵
217
107
768
13
25
566
@teortaxesTex
Teortaxes▶️
26 days
From the thread: llama-3 8b has at least 32k near-perfect needle retrieval (RoPE theta of 4)
Tweet media one
11
55
584
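The "RoPE theta" knob mentioned here is the rotary base frequency in the model config; raising it stretches positional resolution over longer contexts. A hedged sketch of what a ×4 bump looks like (Hugging Face-style key name and Llama-3-like default assumed, values illustrative):

```python
def scale_rope_theta(config: dict, factor: float) -> dict:
    # Return a copy of the config with the rotary base frequency raised.
    out = dict(config)
    out["rope_theta"] = out["rope_theta"] * factor
    return out

cfg = {"rope_theta": 500000.0, "max_position_embeddings": 8192}  # assumed defaults
print(scale_rope_theta(cfg, 4)["rope_theta"])  # 2000000.0
```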
@teortaxesTex
Teortaxes▶️
4 months
@kunley_drukpa That's not slop though
8
0
553
@teortaxesTex
Teortaxes▶️
6 months
@growing_daniel People actually only ever care for Western values

Those "civilization states", "great powers", it's all fake, lifeless LARP to make rubes work harder

Putin just wants to drink beer in Dresden
Xi would rather be a soccer coach in California

(Bibi is for real though)
3
3
548
@teortaxesTex
Teortaxes▶️
5 months
@ichthys30 > mid baiting so hard
4
3
528
@teortaxesTex
Teortaxes▶️
6 months
@karpathy what kind of heart emoji is that Andrej
8
4
538
@teortaxesTex
Teortaxes▶️
4 months
Von Neumann's active preference for noise has always fascinated me. Moreover, his memory was auditory (as per Ulam)! My hunch is that some very efficient brains are so good at dampening intrinsic noise that they require external perturbations as "random seeds" for novel thought.
Tweet media one
Tweet media two
@howyegettingon
💉simon
4 months
Biggest sign of low IQ is being comfortable with annoying background noises - Smoke alarm bleeps - Water dripping from a tap - Loudspeaker phone calls - Screaming children - Doors opening/closing in the wind - Dogs barking - Bluetooth speaker outside - Loud conversation
904
801
10K
34
32
519
@teortaxesTex
Teortaxes▶️
5 months
> Microsoft could purchase the NYT

No, brother. Microsoft is a business. The NYT is a Power, a fortification of a sovereign noble clan appointed by itself to enlighten the lower castes. Merchants cannot buy nobles with mere money.
Tweet media one
@MikePFrank
Michael P. Frank is joining a startup!
5 months
The New York Times is only an $8B company. Microsoft could purchase the whole shebang with pocket change. Could be worth it to make the lawsuit go away and preserve the value of their investment in OpenAI (and recent $1T increase in the market cap of $MSFT). Yo @satyanadella , I’m…
54
8
127
30
19
475
@teortaxesTex
Teortaxes▶️
4 months
@whstancil Incidentally, Tamil Brahmins have won India 3 Science Nobels out of 4 total. Their population amounts to 2 million people – a minority within a minority. Your analysis of rhetorical tactics is astute, but rhetoric only goes so far. In the end, you cannot taboo noticing patterns.
Tweet media one
24
16
456
@teortaxesTex
Teortaxes▶️
6 months
@DrJimFan Satya is the most powerful Indian on the planet now
10
14
460
@teortaxesTex
Teortaxes▶️
6 months
@nivi @Orwelian84 The board said that Mr. Altman has been "lowkey one shifty mf, fr fr", according to the people familiar with the executives.
6
19
457
@teortaxesTex
Teortaxes▶️
4 months
E/accs talk a lot of smack about their enemies outlawing math, but did you know that there literally exist illegal numbers? Codellama only exercises sensible caution here. We wouldn't want it to generate a nasty, prohibited prime, would we? Stick to permitted ones please.
Tweet media one
@MrCatid
catid (e/acc)
4 months
FYI the new CodeLLaMA 70B model refuses to produce code that generates prime numbers.. About 80% of the time it says your request is immoral and cannot be completed.
64
43
863
16
29
451
@teortaxesTex
Teortaxes▶️
1 month
Never give up
Tweet media one
4
38
437
@teortaxesTex
Teortaxes▶️
11 months
The edge of OpenAI at this point is not engineering but mythmaking; degrading ML discourse to the level of Marvel comic books – jealous villains angling for insights of eccentric geniuses, party rumors, grand visions. People stop reading papers, tracing authors, thinking. Murica.
Tweet media one
Tweet media two
@soumithchintala
Soumith Chintala
11 months
i might have heard the same 😃 -- I guess info like this is passed around but no one wants to say it out loud. GPT-4: 8 x 220B experts trained with different data/task distributions and 16-iter inference. Glad that Geohot said it out loud. Though, at this point, GPT-4 is…
57
390
2K
9
38
404
@teortaxesTex
Teortaxes▶️
8 months
@toonholechris Some people who like self-deprecating humor are surprisingly thin-skinned, lol
Tweet media one
3
7
368
@teortaxesTex
Teortaxes▶️
5 months
please say it ain't so @ylecun
Tweet media one
@DimitrisPapail
Dimitris Papailiopoulos
1 year
Can transformers follow instructions? We explore this in: "Looped Transformers as Programmable Computers" led by Angeliki ( @AngelikiGiannou ) and Shashank ( @shashank_r12 ) in collaboration with @Kangwook_Lee and @jasondeanlee. Here is a 🧵
Tweet media one
18
158
787
12
24
384
@teortaxesTex
Teortaxes▶️
6 months
Remember this guy? Him moving to OpenAI was big news in small circles. He might know more about "Q*" than most.
@polynoamial
Noam Brown
10 months
I’m thrilled to share that I've joined @OpenAI ! 🚀 For years I’ve researched AI self-play and reasoning in games like Poker and Diplomacy. I’ll now investigate how to make these methods truly general. If successful, we may one day see LLMs that are 1,000x better than GPT-4 🌌 1/
66
133
2K
15
22
373
@teortaxesTex
Teortaxes▶️
3 months
"Whoa! What was this shot with!? Fiber-optic endoscope?" "H100, silly" "…oh"
@model_mechanic
Aditya Ramesh
3 months
"pov footage of an ant navigating the inside of an ant nest" Video generated by Sora
273
770
7K
4
12
359
@teortaxesTex
Teortaxes▶️
4 months
@whstancil Factually wrong and pathetic
5
1
327
@teortaxesTex
Teortaxes▶️
26 days
New generation is here

They're so innocent

*tears up a bit*
Tweet media one
16
7
323
@teortaxesTex
Teortaxes▶️
4 months
Might be late, but I am now 100% convinced that Miqu is the same model that's accessible as Mistral-Medium on Perplexity Labs. It was plausible that it knows standard puzzles, but there ain't no way in Hell a prankster has tuned it to identically phrase the responses in Russian too.
Tweet media one
Tweet media two
18
23
322
@teortaxesTex
Teortaxes▶️
6 months
@DrJimFan Cortana revenge arc begins
3
9
310
@teortaxesTex
Teortaxes▶️
4 months
@whstancil Will, as a true southerner, have you truly not noticed that black kids of school age grow up (as in, literally get taller) at a faster pace, or do you believe this is caused by lack of healthcare and whatnot?
7
2
290
@teortaxesTex
Teortaxes▶️
5 months
Wake up babe, new paper showing that LLMs* are absurdly easy to compress/sparsify/accelerate has dropped

*headline results invariably on OPT-175B, due to ReLU-induced sparsity

Alas, it's not an accident that we see SwiGLU in best-in-class models. Related:
Tweet media one
Tweet media two
Tweet media three
@deliprao
Delip Rao e/σ
5 months
This is huge! Now watch the LLM API costs dropping even further. [.cn PDF link]
Tweet media one
44
228
2K
6
32
302
@teortaxesTex
Teortaxes▶️
6 months
The most glorious mistake of elites in history probably.
@plazehodler
plazehodler
6 months
@Plinz Just look at what the Internet has become. And now it's too late and completely out of control.
12
1
28
10
27
292
@teortaxesTex
Teortaxes▶️
2 months
Bro really thought he's going to get away with it huh
Tweet media one
@tszzl
roon
2 months
people will always think my vague tweets are about agi but they’re about love
158
61
1K
5
10
277
@teortaxesTex
Teortaxes▶️
8 months
People are so used to models trained for Huggingface Leaderboard they're in disbelief upon seeing a production-grade one. Maybe they shouldn't. Smol Qwens are samples of Tongyi Qianwen, not proofs of concept; to Alibaba, they're kind of like what T5-XXL is to Google. Alibaba is…
Tweet media one
Tweet media two
Tweet media three
Tweet media four
3
23
268
@teortaxesTex
Teortaxes▶️
24 days
If you're in finetuning, pivot to representation engineering (I'm serious)
14
12
266
@teortaxesTex
Teortaxes▶️
7 months
When a brilliant guy like Ilya confidently says a truthy thing like "to compress information really well, what the neural network learns is some representation of the process that produced the text" – it sounds obviously right. ...To people who can't come up with an alternative…
19
19
263
@teortaxesTex
Teortaxes▶️
6 months
An underappreciated way to reduce P(doom) is to not be a doomer
Tweet media one
13
24
262
@teortaxesTex
Teortaxes▶️
3 months
@cynawk @einstein2004113 Maybe you learn not to stan actual braindead troglodytes?
Tweet media one
3
1
239
@teortaxesTex
Teortaxes▶️
5 months
> a massive handsome unit
> awesome hyphenated French name
> did quantum computing research at Alphabet
> I bet he understands MWI better than Yud too

Aside from focusing decel pressure on Extropic investors*, I am not sure what this was supposed to acknowledge

*brace for it
Tweet media one
11
3
255
@teortaxesTex
Teortaxes▶️
12 days
No polite way to say it. The purpose of a system is what it does. Lesswrong is a system for grooming autistic people into fanatics of AI Doom, making use of their social maladjustment and bad taste. This works because the US at large has no intellectual culture to offer them.
Tweet media one
Tweet media two
@jd_pressman
John David Pressman
12 days
I love that this brainworm keeps trying to evolve the defense of never thinking very hard about AI capabilities so you stay as scared as possible of a vague amorphous threat.
10
8
151
28
13
255
@teortaxesTex
Teortaxes▶️
3 months
It feels like Groq's real strategy is having Groq employees dominate all discussions of Groq
Tweet media one
24
8
252
@teortaxesTex
Teortaxes▶️
1 month
This is also what @ylecun means when saying that GPT-4 or "GPT-5000" doesn't have a cat's understanding of the world and can't anticipate simple physical causal chains not present in texts. It's easy to mock his dismissive takes – but, incredibly, there is evidence behind them.
Tweet media one
@DhruvBatraDB
Dhruv Batra
1 month
I have been working on vision+language models (VLMs) for a decade. And every few years, this community re-discovers the same lesson -- that on difficult tasks, VLMs regress to being nearly blind! Visual content provides minor improvement to a VLM over an LLM, even when these…
Tweet media one
23
116
781
24
15
254
@teortaxesTex
Teortaxes▶️
6 months
Once again: if tech bros had even a smidgeon of self-respect, they'd have bullied this behavior out of existence.

– Sorry [not sorry] for being a possessed politruk, but have you tried being visibly embarrassed about your life?
– …What the fuck is wrong with you? Go read Damore.
@savvyRL
Rosanne Liu
6 months
@PiotrPadlewski Sorry but I have to push back a little here. (Again, sorry for doing this today, and congrats on your very deserved achievement.) When it comes to culture and change, everyone is responsible, or culpable if something is so alarmingly wrong.
288
0
49
14
11
235
@teortaxesTex
Teortaxes▶️
4 months
@GrantSlatton High trust society moment
1
0
236
@teortaxesTex
Teortaxes▶️
7 months
Another race

We hear about races often. Frontier labs rushing to AGI, Humanity against Moloch, US vs China. But there's one more, little heard of: the race between doomers completing institutional capture, and their entire theory getting discredited through transparency of AI.…
Tweet media one
Tweet media two
Tweet media three
Tweet media four
@ylecun
Yann LeCun
7 months
The groundswell of interest for Llama-1 (and for all our previous open source AI packages, Like PyTorch, DINO, SAM, NLLB, wav2vec...) is what convinced the Meta leadership that the benefits of an open release of Llama-2 would overwhelmingly outweigh the risks and transform the…
64
158
2K
12
41
233
@teortaxesTex
Teortaxes▶️
8 months
Some say this is Sin manifest. But I believe Amorphus Globosus is how the God of Animal Husbandry rushes ahead of the causal chain to deliver us the cattle's ultimate form. The redemption of carnivora: innocent flesh which knew no sentience. We shall grow them like so much fruit.
@TeyanaToad
🇵🇸 Teyana // Lord Have Mercy ❤️‍🔥 🕊
8 months
Amorphous Globosa, a phenomenon in cattle where instead of a normally developed foetus, a spherical structure covered by hairy skin and/or primitive mouthparts is formed (im obsessed by this btw)
983
2K
19K
17
21
229
@teortaxesTex
Teortaxes▶️
1 month
> try
> fail, miserably
> document and report your failure

I've called Intel a shameless company once but this deserves respect.
Tweet media one
@_akhaliq
AK
1 month
Intel presents LLaVA-Gemma Accelerating Multimodal Foundation Models with a Compact Language Model We train a suite of multimodal foundation models (MMFM) using the popular LLaVA framework with the recently released Gemma family of large language models (LLMs). Of particular
Tweet media one
2
43
199
4
12
233
@teortaxesTex
Teortaxes▶️
2 months
R**n is a goddamn attention whore
A*ella is a boring data-driven whore

Both pollute my TL like no tomorrow
24
4
229
@teortaxesTex
Teortaxes▶️
4 months
This feels like a turning point in the downfall of the US nu-elite. Misconduct, bad political takes, all gets forgiven – but not the lack of grace. They are BAD at mimicking it, lacking practice and nerve, their voices are petty and shrill. They are painfully Not Elite Material.
Tweet media one
11
12
220
@teortaxesTex
Teortaxes▶️
4 months
@johannes_hage @fouriergalois @tugot17 We should build all data centers like this
5
1
224
@teortaxesTex
Teortaxes▶️
11 days
Small minds say "what did Ilya see"
Middling minds ask "where did Ilya go?"
Big minds wonder "wtf happened to Alex Krizhevsky?"
@karpathy
Andrej Karpathy
11 days
# CUDA/C++ origins of Deep Learning Fun fact many people might have heard about the ImageNet / AlexNet moment of 2012, and the deep learning revolution it started. What's maybe a bit less known is that the code backing this winning submission to the…
Tweet media one
135
931
7K
9
8
214
@teortaxesTex
Teortaxes▶️
4 months
It was one of the bigger blackpills to me. It proved that the water I swim in, all this ostensibly free-thinking, irreverent, "common sense", progressive, New Atheism era online society – is beholden to a vile orthodoxy that's opportunistically destroying its competition.
@growing_daniel
Daniel
4 months
Learning that Catholic priests sexually abuse at the same rate as Protestant preachers and way less than public school teachers was a real blackpill moment for me. Why was this all about Catholics? Profoundly effective smear campaign.
939
2K
16K
9
23
208
@teortaxesTex
Teortaxes▶️
8 months
Tweet media one
@jeremyphoward
Jeremy Howard
8 months
I was chatting on Discord with an anon account ,who had just come up with a really smart new approach to model merging and had some encouraging results to show. But then he had to go. Turns out he's in 10th grade and was in trouble because he wasn't focussing on English class.
63
343
4K
2
14
207
@teortaxesTex
Teortaxes▶️
5 months
Heroes of Forbes

Villains of Forbes
Tweet media one
Tweet media two
6
7
204
@teortaxesTex
Teortaxes▶️
7 months
Reminder that "IQ only measures how you do on IQ test" is a lie that kind 145 IQ college professors who go for years without a prolonged interaction with a 100 IQ person tell to their insecure above-average students, because they genuinely think that's what "low IQ" is about.
@realchasegeiser
Chase Geiser
7 months
Here's a real world example of a man with an IQ of 75 describing what it's like.
261
199
3K
14
11
202
@teortaxesTex
Teortaxes▶️
3 months
@RichardMCNgo Labor theory of value in practice? In the UK? Oh the irony
1
5
206
@teortaxesTex
Teortaxes▶️
7 months
Hyperhuman Era

So we're talking of AI and the end of the human era. Here's the deal. I'm Russian. Depending on how you look at it, one belonging to the last Soviet or the first Federation generation – child of a dying empire, born to wander concrete skeletons of abandoned…
Tweet media one
@RokoMijic
Roko
7 months
@teortaxesTex lol maybe, but suddenly everyone agrees with me that the end of the human era is coming soon. 15 years ago, people were seriously talking about thousands or millions of years to AI.
5
1
45
28
24
196
@teortaxesTex
Teortaxes▶️
3 months
> they did just keep pretraining the model on an increasingly enriched corpus
> sensible experiments on basic techniques
> 0 goofy self-sabotage, 4.5T tokens, and 7B SoTA

Feels like they have some world class PM genius who's also an OSS fanatic.
Tweet media one
Tweet media two
Tweet media three
@deepseek_ai
DeepSeek
3 months
🚀 DeepSeekMath: Approaching Mathematical Reasoning Capability of GPT-4 with a 7B Model. Highlights: - Continue pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math tokens from Common Crawl. - Introduce GRPO, a variant of PPO, that enhances mathematical reasoning and reduces…
Tweet media one
21
168
937
4
16
198
@teortaxesTex
Teortaxes▶️
14 days
On the futility of small scaling experiments; or why frontier labs will out-innovate you

Devastating figure
Tweet media one
@arankomatsuzaki
Aran Komatsuzaki
14 days
Meta presents Better & Faster Large Language Models via Multi-token Prediction - training language models to predict multiple future tokens at once results in higher sample efficiency - up to 3x faster at inference
Tweet media one
16
140
901
11
10
202
@teortaxesTex
Teortaxes▶️
3 months
Go home guys, it's already over. Sam... I'm sorry. AGI has been achieved internally, by one Daniel Olsher.
Tweet media one
Tweet media two
19
20
200
@teortaxesTex
Teortaxes▶️
5 months
@cremieuxrecueil Eurasia has always been at war with Eurasia
0
0
185
@teortaxesTex
Teortaxes▶️
6 months
Deepseek-67B, trained on 2T tokens, confirms just how suboptimal the llama-2/code data mixture was. Mistral confirms this is true even if you only use open datasets – nevermind OpenAI's dirty secrets. But how do you use them? Contribute to the data lore:
Tweet media one
Tweet media two
@georgejrjrjr
George
6 months
Reading papers is cool, but if you would like an informal review of the state of the art, I'm writing one here:
4
5
114
4
13
193
@teortaxesTex
Teortaxes▶️
22 days
We should wean scientists off conda probably
33
4
190
@teortaxesTex
Teortaxes▶️
1 month
Eliezer Yudkowsky is the defining, paradigm-setting intellectual of our era, much like Karl Marx and Sigmund Freud have been in theirs. Today, he is overlooked largely due to lacking credentials; but our posthuman successors will honor Big Yud with a mile-high diamondoid statue.
41
7
189
@teortaxesTex
Teortaxes▶️
19 days
Sama must power-trip so hard from all the noise about matching GPT-4, beating it by x%, delivering ≈GPT-4 at y% the price. Seems like nobody - at least not in the open – has any clue as to how to go even further beyond. GPT-5 will be utterly different, and unmatched at launch.
23
5
190
@teortaxesTex
Teortaxes▶️
6 months
"Ah, Mr. Potter-Evans-Verres! I'm sorry to say it's time Department of MAGIC confiscated your wand – for public safety that's best, don't you agree? It would be such a shame if these wands were to fall into the wrong hands... like those of our dear old friend, Mr. You-Know-Who."
Tweet media one
@_andreamiotti
Andrea Miotti
6 months
MAGIC will: 1. Ban the development of AI systems above a certain threshold of computing power. 2. Have an exclusive mandate to conduct critical AI research. Full paper:
52
2
26
9
15
186
@teortaxesTex
Teortaxes▶️
6 months
If this two-bit coup without securing the loyalty of crucial parties was the whole gambit, then Ilya Sutskever has done more to disprove the g factor of intelligence than every lefty professor from Lewontin to Gardner to Rutherford
17
9
182
@teortaxesTex
Teortaxes▶️
7 months
@woke8yearold Tbh based, would revive England
2
0
180
@teortaxesTex
Teortaxes▶️
6 months
tfw some guy from the Netherlands «solved corrigibility», it's even been discussed on LW in 2019, and nobody seems to care much
Tweet media one
@domenic
Domenic Denicola
6 months
@norabelrose @teortaxesTex And... it was solved, but nobody ever talks about it?
7
3
88
9
7
185
@teortaxesTex
Teortaxes▶️
3 months
@BillyVacant
> You are born where you are born by complete chance.

Bro do you think reality is a video game with randomized spawning points for souls or what
2
4
173
@teortaxesTex
Teortaxes▶️
5 months
@JacquesThibs @ericjang11 doesn't miss (The original authorship is hilarious)
Tweet media one
1
11
182
@teortaxesTex
Teortaxes▶️
23 days
Wanted to say "bullshit"… uh. I knew the margin, but didn't realize that the wafer cost of the H100 is merely $200. This is what an optimized process looks like. If TSMC could scale fabs faster, we'd have had >OOM YoY compute growth. Power is becoming the main constraint already.
Tweet media one
@YouJiacheng
YouJiacheng
23 days
@teortaxesTex @dylan522p H100 price breakdown (very rough): $40000=$38000 for NVIDIA + $1800 for HBM&CoWoS(SK Hynix&TSMC) + $200 for 4nm process(TSMC).
2
3
25
11
17
185
@teortaxesTex
Teortaxes▶️
10 months
Watch this guy
Tweet media one
@drawthingsapp
Draw Things
10 months
To make this work, we integrated MFA GEMM kernel from @philipturnerar of project. The result is instant: it improves the overall performance on SD v1 / v2 / XL by about 15%. And this is just a beginning! (4/9)
1
1
49
7
14
183
@teortaxesTex
Teortaxes▶️
4 months
Some days @yacineMTB feels extra generous and does stuff like this
Tweet media one
14
0
182
@teortaxesTex
Teortaxes▶️
2 months
Read this if you haven't yet:
Tweet media one
@tanshawn
Shawn Tan
2 months
One of the things we really needed for Sparse Universal Transformers was a fast way to do Mixture-of-Experts for attention. That's why we spent the time to write ScatterMoE.
3
3
71
4
14
180
@teortaxesTex
Teortaxes▶️
3 months
Ngl it's hilarious how Mistral has established its credentials as the most likely OpenAI slayer on little more than a le excellent 7B, ruthless execution and memes. No Turing Award PIs or Transformer inventors, no in-house silicon, no sweatshops with data annotators, no nothing.
@abacaj
anton
3 months
If mistral's new large model couldn't surpass gpt-4, what hope does anyone else have? OpenAI lead is > 1 year
99
37
1K
7
9
177
@teortaxesTex
Teortaxes▶️
6 months
On Bostrom

Yudkowsky must feel lonely now. @robinhanson , Bostrom, Drexler, @perrymetzger – his intellectual icons and teachers have all defected to some extent; he stands surrounded by rabid believers – and smooth psychos in suits, worming into halls of power. (Ironically, he's…
Tweet media one
Tweet media two
Tweet media three
@jachaseyoung
Jordan Chase-Young
6 months
FINALLY: AI xrisker Nick Bostrom regrets focusing on AI risk, now worries that our fearful herd mentality will drive us to crush AI and destroy our future potential. (from an UnHerd podcast today) Nick Bostrom: It would be tragic if we never developed advanced artificial…
56
102
663
16
31
179
@teortaxesTex
Teortaxes▶️
4 months
@KanizsaBoundary Speaking of strange aids to talent: most blind mathematicians work in geometry and topology. It is argued that the spatial intuition of sighted people is degraded by the triviality of retinal perception.
Tweet media one
Tweet media two
6
24
179
@teortaxesTex
Teortaxes▶️
6 months
@pine_tree_riots I tried hard to suppress my cynicism and not believe rightoids on this one
15
0
161
@teortaxesTex
Teortaxes▶️
5 months
@jayallen_uj Do you even have a job, or is spreading vile misinformation about Japan the extent of it, Seattle guy?
0
2
170
@teortaxesTex
Teortaxes▶️
3 months
No. You should realize that higher-quality data can bail out smaller models with weaker, computationally cheaper architectures. E.g. a 20B MambaMoE pretrained on 1T select tokens will wreck a 72B MHA Transformer fed with 3T of garbage. This is the current meta.
@Yampeleg
Yam Peleg
3 months
Why is 𝙼𝚒𝚜𝚝𝚛𝚊𝚕-𝟽𝙱 so good? I think it is the architecture. - 𝙼𝚒𝚜𝚝𝚛𝚊𝚕-𝟽𝙱's 𝚖𝚕𝚙 dim: 14336 - 𝙻𝙻𝚊𝙼𝙰's 𝚖𝚕𝚙 dim: 11008 The GQA moved 0.8B parameters from 𝚔_𝚙𝚛𝚘𝚓 & 𝚟_𝚙𝚛𝚘𝚓 to the 𝚖𝚕𝚙's dimension. Note: 𝚈𝚒-𝟼𝙱 also uses GQA with (smaller…
14
25
243
7
9
175
@teortaxesTex
Teortaxes▶️
7 months
@davidrkadler To think that a nation of 45 million could be saved with a few grams of bioavailable lithium
6
4
166
@teortaxesTex
Teortaxes▶️
3 months
@austin_rief @bookwormengr The thing is, quality of life ain't worth all that much (well okay, worth 2-3 mil). Scrape for "time" all you want – Bezos affects the timeline, you don't, he carves his fate into the world. "A cheeseburger is a cheeseburger" is a thought befitting an irrelevant animal in a zoo.
23
2
167
@teortaxesTex
Teortaxes▶️
24 days
Reminder that current Groq chips carry 230MB of SRAM per chip, so LLaMA 3 8B in int8 takes >36 chips to run (KV cache not included), with the total power draw of 10 H100s, and very dubious unit economics (h/t @dylan522p )

But fast is neat.
Tweet media one
@VictorTaelin
Taelin
24 days
Wait, LLaMA 3 on Groq is actually INSANE. Just asked the 8b model (!!!!??) to convert a complete file from λ-Calculus to idiomatic JS... and it did it IN TWO SECONDS. What kind of wizardry is that?
Tweet media one
25
72
796
11
10
162
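The chip count is straightforward arithmetic on the figures in the tweet (~230MB of SRAM per chip, int8 weights); a quick sanity check, treating both numbers as given:

```python
params = 8.0e9            # Llama 3 8B parameter count
bytes_per_param = 1       # int8: one byte per weight
sram_per_chip = 230e6     # bytes of on-chip SRAM per Groq chip (as quoted)

chips = params * bytes_per_param / sram_per_chip
print(round(chips, 1))  # 34.8 -- weights alone; KV cache and overhead push it past 36
```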
@teortaxesTex
Teortaxes▶️
13 days
First impressions: I think this is going to be my daily driver from now (left is vanilla 8B-instruct Q8, right is the orthogonalized instruct exl2-6.5bpw)
Tweet media one
18
12
162
@teortaxesTex
Teortaxes▶️
3 months
@pmddomingos Almost a Greek tragedy
3
2
157
@teortaxesTex
Teortaxes▶️
6 months
@hayxtt This «we done with appealing to Westerners» machismo death spiral is one of the most pathetic ways for a people to go out, the yelp of a terrified mutt trying to roar like a lion. Immature protest which ends in the spectacle of bearded infants getting slaughtered for sport.
4
1
146
@teortaxesTex
Teortaxes▶️
2 months
AGI is already here far as I'm concerned. I think there is no task for which we can credibly say "nope, a SoTA transformer variant can't learn to ace it on a reasonable data budget". But he's right, Jihadis can still deny us progress. Never underestimate malice inspired by fear.
@PauseAI
PauseAI ⏸
2 months
AGI is not inevitable. It requires hordes of engineers with million dollar paychecks. It requires a fully functional and unrestricted supply chain of the most complex hardware. It requires all of us to allow these companies to gamble with our future.
61
36
312
19
11
154
@teortaxesTex
Teortaxes▶️
4 months
The ego on this guy. How does one make it his life's mission to harm the field he tried to make a name in?
Tweet media one
32
1
154
@teortaxesTex
Teortaxes▶️
26 days
> "I think it's time to admit defeat" How often do you see LLMs capitulate instead of doubling down or gaslighting you? Sadly 8B Llama is struggling with The Diamond Problem (as do all <10B models that don't cheat egregiously), but its attitude sure is more human-like now.
Tweet media one
@teortaxesTex
Teortaxes▶️
1 month
Why do *all* the chat LLMs have such a strong prior that their response was «unclear or confusing» when you call it out as basically wrong and hallucinated?
15
2
67
10
8
154
@teortaxesTex
Teortaxes▶️
22 days
If it turns out that Llama-3 really is merely on Mistral/Gemma level, but some secret instruct-tuning pipeline launches it into Claude 3 realm, this is awkward. For one thing it makes all those doomer "undoing guardrails" papers relevant. It also makes a joke of community efforts
11
1
153
@teortaxesTex
Teortaxes▶️
13 days
One week in, a new doomer paper dropped, showing that refusal behavior in Llama 3 8B Instruct is represented by a one-dimensional linear subspace in activation space. Four days later, it's apparently been implemented to remove refusals from L3: let's check
Tweet media one
Tweet media two
@teortaxesTex
Teortaxes▶️
25 days
@DavidFSWD It's a very new paper so no. But in spirit there are similar works. I mainly mean that Llama has weird RLHF'd compunctions we could remove, and they're probably intrinsically very low-rank, like shown in many doomer papers on safety unlearning.
1
0
13
7
10
154
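Mechanically, "orthogonalizing" against a one-dimensional refusal direction just means projecting that direction out of the weights that write into the residual stream. A toy, dependency-free sketch (the direction here is made up; finding the real one is the paper's contribution):

```python
def project_out(W, r):
    # W: weight matrix as a list of rows (d_out x d_in) writing into the
    # residual stream; r: candidate "refusal" direction in output space.
    norm = sum(v * v for v in r) ** 0.5
    r = [v / norm for v in r]
    d_out, d_in = len(W), len(W[0])
    # how strongly each input column writes along r...
    coeff = [sum(r[i] * W[i][j] for i in range(d_out)) for j in range(d_in)]
    # ...then subtract exactly that component, leaving everything else intact
    return [[W[i][j] - r[i] * coeff[j] for j in range(d_in)] for i in range(d_out)]

W = [[1.0, 2.0], [3.0, 4.0]]
r = [0.0, 1.0]  # pretend refusal lives on the second residual axis
W2 = project_out(W, r)
print(W2)  # [[1.0, 2.0], [0.0, 0.0]] -- the layer can no longer write along r
```

Done to every matrix writing into the residual stream, this bakes the ablation into the weights, which is why the result ships as an ordinary checkpoint.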
@teortaxesTex
Teortaxes▶️
1 month
Well played Elon. Grok is becoming indispensable to me as a solution to one problem Twitter could have fixed with a few LOCs, namely:
Tweet media one
15
5
153
@teortaxesTex
Teortaxes▶️
3 months
This is why I'm bullish on Deepseek, and more so on OpenCodeInterpreter-like multiturn finetunes. It's not another Leaderboard big boy. It can think. It can reflect on its work. It only needs to be reminded of that. DSC-67B-OCI will be insanely useful.
Tweet media one
@shncldwll
shane @ rsa
3 months
Tired of “LLM hacking” hype with no code? Here’s a breath of fresh air. 1. Challenges: open source ✅ 2. Solution framework: open source ✅ If you’re interested in hackbots in offsec and you’re craving something you can RUN, you gotta read this
0
54
232
4
14
149