Aldous Huxley was born on this day 128 years ago.
Reminds me of that time my utilitarian friend missed that Brave New World is supposed to be dystopian. 🤪
Cave people staring, slack-jawed, at artifacts of a superior civilization. There will never be a European iPad. The most they could contribute was making threats to get Americans to sigh and add USB-C to it.
With those clowns around there won't be a European GPT-5 either.
Zuck unironically has hands-on experience building AI. People ignored this because it was the "lmao lizard robot" phase.
Remember his thing, choosing annual challenges? Wear a tie, learn Mandarin, butcher cattle by hand? Yeah, the theme of 2016 was "build an AI assistant like Jarvis".
Call me paranoid, but dropping the "tbh transformers just aren't great at length generalization: the paper" one day before announcing that they have near-perfect recall and in-context learning over 10M tokens seems suggestive that Gemini 1.5 is a substantially different design.
@cynawk
@einstein2004113
To be clear, just one European country receives more than that in a single month
But I can believe you're illiterate and innumerate so it may be that you sincerely don't get how that undermines your sarcasm
Give someone a fish, and you feed them for a day; teach someone to fish, and you feed them for a lifetime.
Elevating from Weak to Strong with Self-Play Fine-Tuning (SPIN) for All LLMs. Empower, Evolve, SPIN!
@image_origins
They had to rediscover the concept of having a peer group online, because they were helicoptered away from other children in the real world
This is a tragic testament to atomization
> Beff: “institutions have decayed beyond the point of salvaging” and the media is a “vector for cybernetic control of culture”
> The media: gets Director of the National Center for Media Forensics to dox Beff, to reduce his cultural pull
Is evil the absence of self-awareness?
After spending just 20 minutes with the
@MistralAI
model, I am shocked by how unsafe it is. It is very rare these days to see a new model so readily reply to even the most malicious instructions. I am super excited about open-source LLMs, but this can't be it!
Examples below 🧵
@growing_daniel
People actually only ever care for Western values
Those "civilization states", "great powers", it's all fake, lifeless LARP to make rubes work harder
Putin just wants to drink beer in Dresden
Xi would rather be a soccer coach in California
(Bibi is for real though)
Von Neumann's active preference for noise has always fascinated me. Moreover, his memory was auditory (as per Ulam)!
My hunch is that some very efficient brains are so good at dampening intrinsic noise that they require external perturbations as "random seeds" for novel thought.
Biggest sign of low IQ is being comfortable with annoying background noises
- Smoke alarm bleeps
- Water dripping from a tap
- Loudspeaker phone calls
- Screaming children
- Doors opening/closing in the wind
- Dogs barking
- Bluetooth speaker outside
- Loud conversation
> Microsoft could purchase the NYT
No, brother.
Microsoft is a business. The NYT is a Power, a fortification of a sovereign noble clan appointed by itself to enlighten the lower castes. Merchants cannot buy nobles with mere money.
The New York Times is only an $8B company. Microsoft could purchase the whole shebang with pocket change. Could be worth it to make the lawsuit go away and preserve the value of their investment in OpenAI (and recent $1T increase in the market cap of $MSFT). Yo
@satyanadella
, I’m…
@whstancil
Incidentally, Tamil Brahmins have won India 3 Science Nobels out of 4 total. Their population amounts to 2 million people – a minority within a minority.
Your analysis of rhetorical tactics is astute, but rhetoric only goes so far. In the end, you cannot taboo noticing patterns.
E/accs talk a lot of smack about their enemies outlawing math, but did you know that there literally exist illegal numbers? Codellama only exercises sensible caution here. We wouldn't want it to generate a nasty, prohibited prime, would we?
Stick to permitted ones please.
FYI the new CodeLLaMA 70B model refuses to produce code that generates prime numbers. About 80% of the time it says your request is immoral and cannot be completed.
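For context, the sort of request being refused is as trivial as it gets. A minimal sketch (the function name is mine, not from the thread; illegal primes are a real legal curiosity, but nothing below encodes one):

```python
# The kind of code the model reportedly refuses to write:
# a plain Sieve of Eratosthenes.
def primes_up_to(n: int) -> list[int]:
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]           # 0 and 1 are not prime
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [i for i, ok in enumerate(sieve) if ok]

print(primes_up_to(50))  # [2, 3, 5, 7, 11, ..., 47]
```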
The edge of OpenAI at this point is not engineering but mythmaking; degrading ML discourse to the level of Marvel comic books – jealous villains angling for insights of eccentric geniuses, party rumors, grand visions. People stop reading papers, tracing authors, thinking.
Murica.
I might have heard the same 😃 -- I guess info like this is passed around but no one wants to say it out loud.
GPT-4: 8 x 220B experts trained with different data/task distributions and 16-iter inference.
Glad that Geohot said it out loud.
Though, at this point, GPT-4 is…
I’m thrilled to share that I've joined
@OpenAI
! 🚀 For years I’ve researched AI self-play and reasoning in games like Poker and Diplomacy. I’ll now investigate how to make these methods truly general. If successful, we may one day see LLMs that are 1,000x better than GPT-4 🌌 1/
Might be late but I am now 100% convinced that Miqu is the same model that's accessible as Mistral-Medium on Perplexity Labs. It was plausible that it knows standard puzzles, but there ain't no way in Hell a prankster has tuned it to identically phrase the responses in Russian too.
@whstancil
Will, as a true southerner, have you truly not noticed that black kids of school age grow up (as in, literally get taller) at a faster pace, or do you believe this is caused by lack of healthcare and whatnot?
Wake up babe, new paper showing that LLMs* are absurdly easy to compress/sparsify/accelerate has dropped
*headline results invariably on OPT-175B, due to ReLU-induced sparsity
Alas, it's not an accident that we see SwiGLU in best-in-class models. Related:
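To make the footnote concrete, a toy measurement; dimensions and random weights below are illustrative assumptions, but the qualitative gap is the point: ReLU MLPs emit exact zeros for free, SwiGLU ones don't.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, d_ff, n_tok = 512, 2048, 1000        # assumed toy sizes
x = torch.randn(n_tok, d)

# OPT-style MLP hidden state: ReLU zeroes out ~half the units exactly,
# which is the sparsity those compression papers exploit.
relu_h = F.relu(x @ (torch.randn(d, d_ff) / d**0.5))

# SwiGLU hidden state (LLaMA/Mistral-style): silu(x W_g) * (x W_u).
# Values get small but almost never hit exactly zero.
swiglu_h = F.silu(x @ (torch.randn(d, d_ff) / d**0.5)) * (
    x @ (torch.randn(d, d_ff) / d**0.5)
)

for name, h in [("ReLU", relu_h), ("SwiGLU", swiglu_h)]:
    print(f"{name}: {(h == 0).float().mean().item():.1%} exact zeros")
# ReLU: ~50% exact zeros; SwiGLU: ~0%.
```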
People are so used to models trained for Huggingface Leaderboard they're in disbelief upon seeing a production-grade one. Maybe they shouldn't. Smol Qwens are samples of Tongyi Qianwen, not proofs of concept; to Alibaba, they're kind of like what T5-XXL is to Google.
Alibaba is…
When a brilliant guy like Ilya confidently says a truthy thing like "to compress information really well, what the neural network learns is some representation of the process that produced the text" – it sounds obviously right.
...To people who can't come up with an alternative…
> a massive handsome unit
> awesome hyphenated French name
> did quantum computing research at Alphabet
> I bet he understands MWI better than Yud too
Aside from focusing decel pressure on Extropic investors*, I am not sure what this was supposed to acknowledge
*brace for it
No polite way to say it. The purpose of a system is what it does. Lesswrong is a system for grooming autistic people into fanatics of AI Doom, making use of their social disadaptation and bad taste. This works because the US at large has no intellectual culture to offer them.
I love that this brainworm keeps trying to evolve the defense of never thinking very hard about AI capabilities so you stay as scared as possible of a vague amorphous threat.
This is also what
@ylecun
means when saying that GPT-4 or "GPT-5000" doesn't have a cat's understanding of the world and can't anticipate simple physical causal chains not present in texts.
It's easy to mock his dismissive takes – but, incredibly, there is evidence behind them.
I have been working on vision+language models (VLMs) for a decade.
And every few years, this community re-discovers the same lesson -- that on difficult tasks, VLMs regress to being nearly blind!
Visual content provides only a minor improvement to a VLM over an LLM, even when these…
Once again: if tech bros had even a smidgeon of self-respect, they'd have bullied this behavior out of existence.
– Sorry [not sorry] for being a possessed politruk, but have you tried being visibly embarrassed about your life?
– …What the fuck is wrong with you? Go read Damore.
@PiotrPadlewski
Sorry but I have to push back a little here. (Again, sorry for doing this today, and congrats on your very deserved achievement.)
When it comes to culture and change, everyone is responsible – or culpable, when something is this alarmingly wrong.
Another race
We hear about races often. Frontier labs rushing to AGI, Humanity against Moloch, US vs China. But there's one more, little heard of: the race between doomers completing institutional capture, and their entire theory getting discredited through transparency of AI.…
The groundswell of interest for Llama-1 (and for all our previous open source AI packages, like PyTorch, DINO, SAM, NLLB, wav2vec...) is what convinced the Meta leadership that the benefits of an open release of Llama-2 would overwhelmingly outweigh the risks and transform the…
Some say this is Sin manifest. But I believe Amorphus Globosus is how the God of Animal Husbandry rushes ahead of the causal chain to deliver us the cattle's ultimate form. The redemption of carnivora: innocent flesh which knew no sentience.
We shall grow them like so much fruit.
Amorphus globosus, a phenomenon in cattle where, instead of a normally developed foetus, a spherical structure covered by hairy skin and/or primitive mouthparts is formed (I'm obsessed with this btw)
Intel presents LLaVA-Gemma
Accelerating Multimodal Foundation Models with a Compact Language Model
We train a suite of multimodal foundation models (MMFM) using the popular LLaVA framework with the recently released Gemma family of large language models (LLMs). Of particular
This feels like a turning point in the downfall of the US nu-elite. Misconduct, bad political takes, all gets forgiven – but not the lack of grace. They are BAD at mimicking it, lacking practice and nerve, their voices are petty and shrill. They are painfully Not Elite Material.
# CUDA/C++ origins of Deep Learning
Fun fact: many people might have heard about the ImageNet / AlexNet moment of 2012, and the deep learning revolution it started.
What's maybe a bit less known is that the code backing this winning submission to the…
It was one of the bigger blackpills to me. It proved that the water I swim in, all this ostensibly free-thinking, irreverent, "common sense", progressive, New Atheism era online society – is beholden to a vile orthodoxy that's opportunistically destroying its competition.
Learning that Catholic priests sexually abuse at the same rate as Protestant preachers and way less than public school teachers was a real blackpill moment for me. Why was this all about Catholics? Profoundly effective smear campaign.
I was chatting on Discord with an anon account, who had just come up with a really smart new approach to model merging and had some encouraging results to show.
But then he had to go.
Turns out he's in 10th grade and was in trouble because he wasn't focusing on English class.
Reminder that "IQ only measures how you do on IQ test" is a lie that kind 145 IQ college professors who go for years without a prolonged interaction with a 100 IQ person tell to their insecure above-average students, because they genuinely think that's what "low IQ" is about.
Hyperhuman Era
So we're talking of AI and the end of the human era.
Here's the deal. I'm Russian. Depending on how you look at it, one belonging to the last Soviet or the first Federation generation – child of a dying empire, born to wander concrete skeletons of abandoned…
@teortaxesTex
lol maybe, but suddenly everyone agrees with me that the end of the human era is coming soon.
15 years ago, people were seriously talking about thousands or millions of years to AI.
> they did just keep pretraining the model on an increasingly enriched corpus
> sensible experiments on basic techniques
> 0 goofy self-sabotage, 4.5T tokens, and 7B SoTA
Feels like they have some world class PM genius who's also an OSS fanatic.
🚀 DeepSeekMath: Approaching Mathematical Reasoning Capability of GPT-4 with a 7B Model.
Highlights:
- Continue pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math tokens from Common Crawl.
- Introduce GRPO, a variant of PPO, that enhances mathematical reasoning and reduces…
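The core of GRPO, as I read the paper's own summary: drop PPO's learned value baseline and standardize rewards within a group of sampled completions per prompt. A minimal sketch of just the advantage computation (function and variable names are mine):

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: (n_prompts, group_size), one scalar reward per sampled
    completion. Advantages are rewards standardized within each group,
    replacing PPO's value-network baseline (hence the memory savings)."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

# 2 prompts, 4 sampled answers each, 0/1 correctness rewards:
print(grpo_advantages(torch.tensor([[1., 0., 0., 1.],
                                    [0., 0., 1., 0.]])))
```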
Meta presents Better & Faster Large Language Models via Multi-token Prediction
- training language models to predict multiple future tokens at once results in higher sample efficiency
- up to 3x faster at inference
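Schematically, the setup is a shared trunk feeding n independent heads, one per future token. A sketch under my own simplification (plain linear heads standing in for whatever head architecture the paper actually uses):

```python
import torch
import torch.nn as nn

class MultiTokenPredictor(nn.Module):
    """Shared trunk, n_future output heads; head k is trained against
    tokens shifted by k+1. At inference you can keep only head 0, or
    use the extra heads for self-speculative decoding -- the likely
    source of the quoted ~3x speedup."""

    def __init__(self, d_model: int, vocab: int, n_future: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab, bias=False) for _ in range(n_future)
        )

    def forward(self, trunk_h: torch.Tensor) -> list[torch.Tensor]:
        # trunk_h: (batch, seq, d_model) from the shared transformer.
        return [head(trunk_h) for head in self.heads]
```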
Deepseek-67B, trained on 2T tokens, confirms just how suboptimal the llama-2/code data mixture was.
Mistral confirms this is true even if you only use open datasets – nevermind OpenAI's dirty secrets.
But how do you use them?
Contribute to the data lore:
Eliezer Yudkowsky is the defining, paradigm-setting intellectual of our era, much like Karl Marx and Sigmund Freud have been in theirs. Today, he is overlooked largely due to lacking credentials; but our posthuman successors will honor Big Yud with a mile-high diamondoid statue.
Sama must power-trip so hard from all the noise about matching GPT-4, beating it by x%, delivering ≈GPT-4 at y% the price.
Seems like nobody - at least not in the open – has any clue as to how to go even further beyond. GPT-5 will be utterly different, and unmatched at launch.
"Ah, Mr. Potter-Evans-Verres! I'm sorry to say it's time Department of MAGIC confiscated your wand – for public safety that's best, don't you agree? It would be such a shame if these wands were to fall into the wrong hands... like those of our dear old friend, Mr. You-Know-Who."
MAGIC will:
1. Ban the development of AI systems above a certain threshold of computing power.
2. Have an exclusive mandate to conduct critical AI research.
Full paper:
If this two-bit coup without securing the loyalty of crucial parties was the whole gambit, then Ilya Sutskever has done more to disprove the g factor of intelligence than every lefty professor from Lewontin to Gardner to Rutherford
@BillyVacant
> You are born where you are born by complete chance.
Bro do you think reality is a video game with randomized spawning points for souls or what
Wanted to say "bullshit"… uh.
I knew the margin, but didn't realize that the per-die wafer cost of GH200 is merely ~$200. This is what an optimized process looks like. If TSMC could scale fabs faster, we'd have had >OOM YoY compute growth.
Power is becoming the main constraint already.
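The margin point is just division. A back-of-envelope sketch where both inputs are my assumptions, picked to land on the tweet's ~$200 figure:

```python
# Per-die silicon cost = wafer price / good dies per wafer.
# Both numbers below are illustrative assumptions, not from the tweet.
wafer_price = 16_000          # USD, assumed leading-edge wafer price
good_dies_per_wafer = 80      # assumed
die_cost = wafer_price / good_dies_per_wafer
print(die_cost)               # 200.0

# Against an (assumed) >$30k unit price, silicon is a rounding error:
print(f"silicon share: {die_cost / 30_000:.1%}")  # ~0.7%
```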
To make this work, we integrated the MFA GEMM kernel from
@philipturnerar
's project. The result is immediate: it improves overall performance on SD v1 / v2 / XL by about 15%. And this is just the beginning! (4/9)
One of the things we really needed for Sparse Universal Transformers was a fast way to do Mixture-of-Experts for attention. That's why we spent the time to write ScatterMoE.
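For intuition about what such a kernel fuses away, here is the naive grouped dispatch it replaces; this is a generic top-1 MoE sketch of mine, not the attention-specific or fused ScatterMoE code:

```python
import torch

def naive_moe_dispatch(x, router_logits, experts):
    """x: (n_tokens, d); router_logits: (n_tokens, n_experts);
    experts: list of per-expert modules. Sort tokens by assigned
    expert, run each expert on a contiguous slice, scatter back.
    ScatterMoE's point is doing this without materializing the
    copied intermediates this sketch creates."""
    assign = router_logits.argmax(dim=-1)
    order = assign.argsort()                      # group tokens by expert
    grouped = x[order]
    counts = torch.bincount(assign, minlength=len(experts)).tolist()
    out, start = torch.empty_like(x), 0
    for expert, n in zip(experts, counts):
        if n:
            out[order[start:start + n]] = expert(grouped[start:start + n])
        start += n
    return out
```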
Ngl it's hilarious how Mistral has established its credentials as the most likely OpenAI slayer on little more than a le excellent 7B, ruthless execution and memes. No Turing Award PIs or Transformer inventors, no in-house silicon, no sweatshops with data annotators, no nothing.
On Bostrom
Yudkowsky must feel lonely now.
@robinhanson
, Bostrom, Drexler,
@perrymetzger
– his intellectual icons and teachers have all defected to some extent; he stands surrounded by rabid believers – and smooth psychos in suits, worming into halls of power.
(Ironically, he's…
FINALLY: AI xrisker Nick Bostrom regrets focusing on AI risk, now worries that our fearful herd mentality will drive us to crush AI and destroy our future potential. (from an UnHerd podcast today)
Nick Bostrom: It would be tragic if we never developed advanced artificial…
@KanizsaBoundary
Speaking of strange aids to talent: most blind mathematicians work in geometry and topology. It is argued that the spatial intuition of sighted people is degraded by the triviality of retinal perception.
No.
You should realize that higher-quality data can bail out smaller models with weaker, computationally cheaper architectures. E.g. a 20B MambaMoE pretrained on 1T select tokens will wreck a 72B MHA Transformer fed with 3T of garbage.
This is the current meta.
Why is Mistral-7B so good?
I think it is the architecture.
- Mistral-7B's mlp dim: 14336
- LLaMA's mlp dim: 11008
The GQA moved 0.8B parameters from k_proj & v_proj to the mlp's dimension.
Note: Yi-6B also uses GQA with (smaller…
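The quoted arithmetic checks out. With public config numbers (hidden 4096, 32 layers, head dim 128; LLaMA-7B uses 32 KV heads, Mistral-7B uses 8):

```python
d_model, n_layers, head_dim = 4096, 32, 128

# k_proj + v_proj parameters: d_model x (n_kv_heads * head_dim), twice.
mha_kv = 2 * d_model * 32 * head_dim   # LLaMA-7B: full MHA
gqa_kv = 2 * d_model * 8 * head_dim    # Mistral-7B: GQA
print(f"freed by GQA: {(mha_kv - gqa_kv) * n_layers / 1e9:.2f}B")  # ~0.81B

# SwiGLU MLP has 3 weight matrices of shape d_model x d_ff.
mlp_growth = 3 * d_model * (14336 - 11008) * n_layers
print(f"extra MLP params: {mlp_growth / 1e9:.2f}B")  # ~1.31B
# So the wider MLP absorbs the freed 0.8B and then some.
```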
@austin_rief
@bookwormengr
The thing is, quality of life ain't worth all that much (well okay, worth 2-3 mil). Scrape for "time" all you want – Bezos affects the timeline, you don't, he carves his fate into the world. "A cheeseburger is a cheeseburger" is a thought befitting an irrelevant animal in a zoo.
Reminder that current Groq chips have 230MB of SRAM each, so LLaMA 3 8B in int8 takes >36 chips to run (KV cache not included), with the total power draw of 10 H100s, and very dubious unit economics
(h/t
@dylan522p
)
But fast is neat.
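The chip count follows from the weights alone, using the figures in the tweet:

```python
params = 8e9            # LLaMA 3 8B
bytes_per_param = 1     # int8
sram_per_chip = 230e6   # ~230 MB SRAM per Groq chip

print(params * bytes_per_param / sram_per_chip)  # ~34.8 chips for weights
# KV cache, activations and layout overhead push it past 36 chips,
# hence the ">36"; power draw then compounds the unit economics.
```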
Wait, LLaMA 3 on Groq is actually INSANE. Just asked the 8b model (!!!!??) to convert a complete file from λ-Calculus to idiomatic JS... and it did it IN TWO SECONDS. What kind of wizardry is that?
First impressions: I think this is going to be my daily driver from now on
(left is vanilla 8B-instruct Q8, right is the orthogonalized instruct exl2-6.5bpw)
@hayxtt
This «we done with appealing to Westerners» machismo death spiral is one of the most pathetic ways for a people to go out, the yelp of a terrified mutt trying to roar like a lion.
Immature protest which ends in the spectacle of bearded infants getting slaughtered for sport.
AGI is already here as far as I'm concerned. I think there is no task for which we can credibly say "nope, a SoTA transformer variant can't learn to ace it on a reasonable data budget".
But he's right, Jihadis can still deny us progress. Never underestimate malice inspired by fear.
AGI is not inevitable. It requires hordes of engineers with million dollar paychecks. It requires a fully functional and unrestricted supply chain of the most complex hardware. It requires all of us to allow these companies to gamble with our future.
> "I think it's time to admit defeat"
How often do you see LLMs capitulate instead of doubling down or gaslighting you?
Sadly 8B Llama is struggling with The Diamond Problem (as do all <10B models that don't cheat egregiously), but its attitude sure is more human-like now.
Why do *all* the chat LLMs have such a strong prior that their response was «unclear or confusing» when you call it out as basically wrong and hallucinated?
If it turns out that Llama-3 really is merely on Mistral/Gemma level, but some secret instruct-tuning pipeline launches it into Claude 3 realm, this is awkward. For one thing it makes all those doomer "undoing guardrails" papers relevant.
It also makes a joke of community efforts
One week in, a new doomer paper dropped, showing that refusal behavior in Llama 3 8B Instruct is represented by a one-dimensional linear subspace in activation space.
Four days later, it's apparently been implemented to remove refusals from L3:
let's check
@DavidFSWD
It's a very new paper so no. But in spirit there are similar works. I mainly mean that Llama has weird RLHF'd compunctions we could remove, and they're probably intrinsically very low-rank, as shown in many doomer papers on safety unlearning.
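A hedged sketch of the recipe that line of work describes (difference-of-means direction, then projected out of activations or baked into weights; all names below are mine):

```python
import torch

def refusal_direction(h_harmful, h_harmless):
    """h_*: (n_prompts, d_model) residual-stream activations collected
    at some layer/position on harmful vs. harmless instructions.
    Returns the unit 'refusal direction' (difference of means)."""
    v = h_harmful.mean(0) - h_harmless.mean(0)
    return v / v.norm()

def ablate(h, v_hat):
    """Runtime ablation: remove the component of h along v_hat.
    h: (n, d_model); v_hat: (d_model,)."""
    return h - torch.outer(h @ v_hat, v_hat)

def orthogonalize(W, v_hat):
    """'Orthogonalized' weights: bake the ablation into any matrix W
    (d_model, d_in) that writes into the residual stream, so the model
    can never express the direction -- no hooks needed at inference."""
    return W - torch.outer(v_hat, v_hat @ W)
```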
This is why I'm bullish on Deepseek, and more so on OpenCodeInterpreter-like multiturn finetunes. It's not another Leaderboard big boy. It can think. It can reflect on its work. It only needs to be reminded of that.
DSC-67B-OCI will be insanely useful.
Tired of “LLM hacking” hype with no code? Here’s a breath of fresh air.
1. Challenges: open source ✅
2. Solution framework: open source ✅
If you’re interested in hackbots in offsec and you’re craving something you can RUN, you gotta read this