i’ll defend devin and their team here. most of the critiques here are things you’d hear from a senior engineer (dinosaur) who’s watching over your shoulder and constantly negging you. it’s completely regarded
most of his arguments are strawmen
- devin used an incorrect version…
julius is basically entirely backend engineers bottlenecked by figuring out how to make a button render quickly in the right color and location, and i cannot overstate how much we appreciate people who understand this stuff helping out
We built an interface around
#GPT3
to create , a tool that computationally produces text-based ads for products across different contexts. We're expanding our pool of limited beta users; check it out!
We're excited to supercharge
@JuliusAI_
's mathematical computations with our newest partnership...bringing Computational Intelligence to Artificial Intelligence 😊
Check them out here:
@goodside
Formalizing something and giving it a name is still useful though. People were hitting the AI Dungeon scorebots years ago with prompt injection () but no one thought it was interesting enough to codify at the time.
2/ JuliusAI
Jupyter notebook, on steroids.
Chat-powered data analytics and AI agents, all in a notebook interface. Answer any question about your data with a single prompt.
@badphilosopher
@0interestrates
@juliusAI_
Introducing the Julius iOS App📱🤖
AI that:
- Solves Math
- Analyzes Data
- Creates Visualizations
- Writes & Executes Code
is now in your pocket 🚀. Download here:
Android next!
Text-Davinci-003 has some really cool properties when it comes to getting structured output and we figured we'd write about it because it's really useful
So we ran the numbers and it turns out GPT3's been able to do better than chance on Word in Context all along, and it's been improving with each new model.
these llms can get like 80, 90% of the way there for a lot of tasks but then it turns into whack-a-mole on the last bit (speed/cost as well). there's a lot of potential in making it easier to get the finishing touches in.
Introducing AI Graph Editing in Julius 📊📈
Users can now tweak and customize their graphs with AI using just natural language
Easily modify the plot legend, size, and title, or just tell the AI in plain English how you want it modified😇
Give it a try!
@EMostaque
@HBO
i heard of this newfangled thing called a neural network that's essentially a lossy compression algorithm but I don't know if anyone's figured out how to get them to do anything with text or images yet
Updated the WIC GPT3 benchmarks, looks like gpt-3.5-turbo does about the same as text-davinci-003 for JSON/Code formatted prompts, although slightly worse when text only
@gruber
@alextfife
lists of stolen credit cards are cheap (card testing is now part of the background radiation of the internet, unfortunately). "paid" isn't much of a signal, if any, when trying to prevent scams and fraud, especially with a skeleton crew.
Last call to RSVP to SF's AI Tinkerers event on April 29th!
Pizza and demos -- 6pm at GitHub HQ. LFG. 🔥
Also, announcing our fireside Chat:
@eoghan
McCabe, the founder of AI-first customer service unicorn
@intercom
.
See you there!
@willdepue
hard problem. so many times with Word in Context I got excited about improvement on the first 100 examples, then saw worse performance on the next 500.
ab testing in prod gets better scale but is still problematic =/
How much was OpenAI really worth, a week ago? And what are the employees thinking about right now?
Here’s a thought:
If OpenAI has a lot of breakthrough, difficult-to-replicate IP (code, data, infrastructure, etc.) with a real business case, a lot of employees would stick…
i still don't know what self-aware means in the context of gpt/claude, but either one can power the awesome infinite self-reflecting while-loop chatbot from the scale hackathon last year out of the box once i upgraded the apis for it :)
GPT-3 is a super powerful tool, but there's still a lot of unknowns around how to use it effectively. Because of that we've decided to write a series of blog posts around how to use GPT-3 from what we've learned building AI Dungeon. Check it out here!
Claude can also do few shot word in context in JSON and Code! Got access at a hackathon over the weekend so added it to the list. It seems to do its best with JSON prompting.
@realGeorgeHotz
Just replacing the from and putting in an @ gets most of the way there :P Thought draftjs was gonna be a bigger pain. I suppose should do cleanup after the search (and intercept clicks) but that's a lot of work.
@sama
There's still a lot of handwaving on best practices.
Benchmarking prompt usage is expensive and not centralized anywhere; it would be amazing to have datasets and scoreboards using held-out test sets, with free/subsidized compute for the community to figure out best usage.
This is a game changer.
I can now run the code Claude 3 or GPT-4 generates inside the chat box. Look at me playing “Simon Says” after one single prompt!
Here is how you can try it free:
@xtimv
It's because of latency (we serve one). The LLM generates each token in sequence, and if you're generating hundreds of tokens this can take several seconds, especially when underprovisioned. By serving each token as it becomes available (just stream=True on the APIs), it feels faster because users see it working.
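A minimal sketch of why streaming helps (the callback and the simulated stream here are illustrative, not any real API's interface): surface each token the moment it arrives instead of buffering the whole completion.

```python
def stream_tokens(token_iter, on_token):
    """Surface each token as soon as it arrives instead of waiting for
    the full completion -- the same idea as stream=True on the LLM APIs.
    token_iter is any iterable of token strings (simulated below)."""
    pieces = []
    for tok in token_iter:
        pieces.append(tok)
        on_token(tok)  # render immediately; perceived latency drops
    return "".join(pieces)

# Simulate an API response arriving token by token.
shown = []
full = stream_tokens(iter(["Hel", "lo", ", ", "world", "!"]), shown.append)
```

The total generation time is unchanged; what improves is time-to-first-token, which is what users actually perceive.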
After playing around with how to format the prompt so it walks itself through a task, I figured it was worth having the prompt double-check its own output as well. It gives improvements to arithmetic in GPT-J (and also GPT-3, but J's cheaper :P)
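A sketch of the double-check idea: a second prompt feeds the first answer back in and asks the model to verify it. The wording and the helper name below are illustrative, not the exact prompt from the experiment.

```python
def build_self_check_prompt(question, proposed_answer):
    # Hypothetical second-pass prompt: ask the model to redo the
    # arithmetic step by step and correct the first answer if needed.
    return (
        f"Q: {question}\n"
        f"Proposed answer: {proposed_answer}\n"
        "Check the proposed answer by redoing the arithmetic one step at a time.\n"
        "If it is wrong, state the corrected answer.\n"
        "Check:"
    )

prompt = build_self_check_prompt("What is 17 * 24?", "408")
```

The second pass costs another completion, hence the note that GPT-J is the cheaper place to iterate.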
Making progress on getting GPT-3 to compare/summarize news coverage across sources. Can be a bit slow on the fly but it's not too bad for grabbing differences in coverage across platforms and doing content analysis without me having to read much. Still a ways to go though.
🎉 We have some BIG news for
@HyperWriteAI
users! 🎉
You can now generate amazing images on nearly ANY website with just a few words!
No more wasting time on Photoshop or searching for stock photos. Just let HyperWrite do the magic for you!
Anyway, all the code and results are at too.
Now we gotta go run ANLI to figure out evaluating the limitations for using this stuff for editing and whatnot.
needs to happen fast. students already simply paying humans to do their homework and those companies make their services cheaper by abusing ai free tiers. efficient marketplace and all
At the core of the AI crisis in education is that the system invents fake work for ppl to do so that they can be evaluated against the quality of other people’s invented work.
As soon as the frame switches to *actually making progress*, the incentive is to use AI to the greatest…
@danielbigham
@FeepingCreature
It can get 67% on WiC test set for figuring out where people are using the same word in different ways, . Not at human level (which would be 80%), but not random.
Also ran GPT-4 through ANLI - it gets 69.75% without any major tricks. Gotta go back and figure out what was getting 40% on original davinci, though, because I didn't see it with these runs.
Similarly, we've been using pseudo-python prompts, where we declare a few classes and use their functions without defining them and let text-davinci-003 guess what the output would be (except overriding the print with f-strings so we get the output formatted the way we want).
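For illustration, a pseudo-python prompt in roughly that style might look like this. The class, function, and product below are made up; the point is that nothing is defined, and the final print's f-string dictates the output format the model should imitate.

```python
# The classes/functions in the prompt are deliberately undefined: the
# text is sent to the model as-is, and the model is asked to continue
# after "# Output:" with what the imaginary program would print.
PSEUDO_PYTHON_PROMPT = '''\
class Product:
    def __init__(self, name, price): ...

def write_ad(product, audience): ...

p = Product("solar charger", 39.99)
ad = write_ad(p, audience="campers")
print(f"AD FOR {p.name.upper()}: {ad}")
# Output:
'''

def build_prompt(context=""):
    # Prepend any extra context (few-shot examples etc.) to the template.
    return context + PSEUDO_PYTHON_PROMPT
```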
I mostly write GPT-3 prompts in a formal subset of JavaScript consisting only of top-level assignments of JSON literals to variables, using a stop sequence of ";⏎" (semicolon, newline) so generations can be parsed as JSON.
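That format can be sketched as follows (built here in Python; the variable names are made up): each prior assignment is a JSON literal, the prompt ends mid-assignment, and with ";\n" as the stop sequence the raw completion parses directly as JSON.

```python
import json

STOP = ";\n"  # the ";⏎" stop sequence

def make_js_prompt(examples, query_var):
    # Each example is a top-level JS assignment of a JSON literal; the
    # prompt ends with "var <query_var> = " so the model completes it
    # with another JSON literal.
    lines = [f"var {name} = {json.dumps(value)};" for name, value in examples]
    lines.append(f"var {query_var} = ")
    return "\n".join(lines)

def parse_completion(completion):
    # Generation stops at ";\n", so whatever comes back is a JSON literal.
    return json.loads(completion.split(STOP)[0])

prompt = make_js_prompt(
    [("capital_of_france", {"city": "Paris"})], "capital_of_japan"
)
```

The payoff is that no ad-hoc output parsing is needed: the stop sequence guarantees the completion is exactly one JSON value.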
Also, here's the wiki on Taken 4: The Musical:
So just using TFIDF for context stuffing with GPT API gets 9% exact match on the Break benchmark. (Which will be an important step for chaining prompts back into one another I suspect).
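A hand-rolled sketch of that retrieval step (real pipelines would typically use e.g. scikit-learn; the documents below are toy data): rank candidate passages by TF-IDF cosine similarity to the question, then stuff the top hits into the prompt.

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Rank docs by TF-IDF cosine similarity to the query."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    qv = vec(query.lower().split())

    def cosine(v):
        dot = sum(qv.get(t, 0.0) * w for t, w in v.items())
        na = math.sqrt(sum(w * w for w in qv.values()))
        nb = math.sqrt(sum(w * w for w in v.values()))
        return dot / (na * nb) if na and nb else 0.0

    order = sorted(range(n), key=lambda i: cosine(vec(tokenized[i])),
                   reverse=True)
    return [docs[i] for i in order]

def stuff_context(query, docs, k=2):
    # Put the top-k passages above the question in the prompt.
    context = "\n".join(tfidf_rank(query, docs)[:k])
    return f"{context}\n\nQ: {query}\nA:"

docs = [
    "the cat sat on the mat",
    "dogs bark loudly at night",
    "cat food goes in the bowl",
]
ranked = tfidf_rank("where is the cat", docs)
```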
@ShaanVP
@craigthomler
@OpenAI
@sama
It was much harder before text-davinci-003, but also their safety policy made it a bit hard. We'd been doing our code interviews for
@HyperWriteAI
with chatbots using original davinci and then text-davinci-002 and they'd been making a lot of improvements over time.
Playing with AI is one of the fun things I do at
@OpenAI
.
GPT-3 is much more capable than people realize when you utilize advanced prompt design. Here’s one simple trick that can make it 20x more efficient.
#gpt3
#openai
I guess not surprisingly, the API can also figure out where words appended to a chat with asterisks go :)
In: interpretCorrections(text)
Out:
"I'm gonna eat pizza on the couch at 3am"
For the business logic, you can simply say in the prompt that there's a function f that does <your task> and let the AI guess the output. And, what's really cool, you can include conditionals in the function as well -
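A sketch of what such a prompt might look like (the task wording and the example input are illustrative): declare f in plain text, add a conditional branch, and ask for f's output on a concrete input.

```python
def build_imaginary_fn_prompt(task_description, example_input):
    # Declare a function f in plain text, include a conditional branch,
    # and ask the model to guess f's output on a concrete input.
    return (
        f"There is a function f(text) that {task_description}.\n"
        "If the text contains no asterisk-marked correction, "
        "f returns it unchanged.\n"
        f"In: f({example_input!r})\n"
        "Out:"
    )

prompt = build_imaginary_fn_prompt(
    "applies corrections marked with asterisks",
    "I'm gonna eat pizza on the coach *couch* at 3am",
)
```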
@JJSchroden
it's a systemic problem (same as the 'lying to ourselves' white paper from 2015, ). when the process incentivizes lies that everyone knows are lies, the responsibility falls at the top.
@paulg
@clark_aviation
the red/white neutrality bands were painted on after the fact so allied forces wouldn't keep accidentally shooting them down like pico balloons.
Want an inside look into how we make some of the cool features of AI Dungeon?
Check out our newest article in our series on how to use GPT-3 to build cool things!
@scottastevenson
lot of poli sci research over the last 75 yrs backs that up - e.g. converse 1964 () / zaller '92 ()
maybe gpt just does what people do but we don't like to believe we do it
But you can also get your output formatted the way you want for easy parsing. Just wrap your imaginary functions in a helper function that returns its input a JSON object with all the nesting you want and you can just call json.loads / parse()! It's great.
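A sketch of that JSON-wrapper trick (the helper name to_json and the schema here are made up): the prompt claims a helper that returns its argument as a JSON object, so the completion can go straight into json.loads.

```python
import json

def wrap_as_json_prompt(imaginary_call, fields):
    # Claim (in the prompt) that to_json returns its argument as a JSON
    # object with the given fields, then print it so the model emits JSON.
    schema = ", ".join(f'"{f}": ...' for f in fields)
    return (
        f"to_json(x) returns x as a JSON object shaped like {{{schema}}}.\n"
        f"print(to_json({imaginary_call}))\n"
        "# Output:\n"
    )

def parse_output(completion):
    # The completion should be a JSON literal we can load directly.
    return json.loads(completion)

prompt = wrap_as_json_prompt("summarize(article)", ["title", "key_points"])
```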
@GregJaffe
I'm not sure there are many articles on the Afghan unit that now has responsibility for the base / assessments of their historical performance to really evaluate that question.
Generally LLMs are not great at what we might consider general logic, but since text-davinci-00n is a variant of the codex model (if I read their blog posts correctly) it's pretty good at predicting the output of code (see , )
This GPT virtual machine post is only the tip of the iceberg.
@joshlabau
and I have discovered that text-davinci-003 has the capability to do something we're calling HALLUCINATED SCRIPTS
Buckle up for a thread, this one is mind bending 🤯
@mathemagic1an
yep, we roll our own. ended up with a lot of hard coded function calls w/ variables our scripts can use and there's some tradeoff between having to figure out how much parsing each script needs to do for different data sets (e.g. emails/webpages, search methods/count etc).
@antoniogm
+1. The thing that surprised me the most about Afghanistan is I never ran into another American service member who spoke Pashto (the local language in the east/south). And that was after we'd been there for 15 years already.
@shyamalanadkat
@goodside
it always has a probability for all next tokens though, right? i'd think that because the number of tokens over some threshold changes with each additional token in the context, the cost of exploring all possible paths ahead with that cutoff may sometimes increase instead of decrease
LLMs are great, although once you start dealing with arbitrary inputs from users and a variety of data, it can be a bit hard to ensure you're generating the content they're expecting. To deal with this, you normally have to do a bunch of pre- and post-processing.
Essentially you can route around and combine the imaginary inputs and outputs of multiple imaginary function calls. There's limits to how many you can include in a single prompt though.
@arankomatsuzaki
I believe "I mathed out the result of each step as I walk through the solution." gets a .805, if I ran it correctly; would love for someone to double-check
Attempting Word in Context with the API yesterday showed no change in single-prompt few-shot performance when varying the number of examples. It still beats the results in the paper, but is worse than multi-step methods and task-specific SOTA.
the apple green-lanternist version of foundation seems so far to be a total rejection of the original where psychohistory was possible because everyone has agency in an efficient universe so the end result is predictable (so when everyone loses agency via the mule things break).
@nickwalton00
On the flip side, I do think a lot of people are using the API wrong. If you just ask it to purely auto-complete text, it'll just spit something out. You can get better reasoning and internal constraint by having it context-stuff itself (which improves its ANLI/WiC scores)
@nickwalton00
I think there's quite a bit of political science literature that would support the perspective that it's not uncommon to repeat ideas without internalizing them / people have varying levels of internal constraint. E.g. Converse (1964)
@GregJaffe
Not sure GIRoA won't loot it anyway (e.g. similar to the looting issue briefly mentioned in , ). There just seems to be a lot of unknowns about our partners even after being there 20 years.