Matt Brockman

@badphilosopher

2,267 Followers · 283 Following · 31 Media · 275 Statuses

playing with stuff

Joined July 2009
@badphilosopher
Matt Brockman
6 months
and that's midwest integrity
@gdb
Greg Brockman
6 months
After learning today’s news, this is the message I sent to the OpenAI team:
Tweet media one
2K
5K
36K
16
29
957
@badphilosopher
Matt Brockman
6 months
💖
@sama
Sam Altman
6 months
i love the openai team so much
5K
4K
73K
3
4
227
@badphilosopher
Matt Brockman
1 month
u know damn well if julius was hallucinating input files like it did there ud make us issue a refund +drop everything to push an immediate mitigation
@0interestrates
rahul
1 month
i’ll defend devin and their team here. most of the critique here are things you’d hear from a senior engineer (dinosaur) who’s watching over your shoulder and constantly negging you. it’s completely regarded most of his arguments are strawman - devin used an incorrect version…
Tweet media one
145
35
787
5
3
175
@badphilosopher
Matt Brockman
25 days
julius is basically entirely backend engineers bottlenecked by figuring out how to make a button render quickly in the right color and location, and i can't overstate how much we appreciate people who understand this stuff helping out
@0interestrates
rahul
25 days
vercel developer support is god tier. emailed rauchg for some help, he replied within 19 minutes and looped in his team
Tweet media one
6
4
304
8
1
128
@badphilosopher
Matt Brockman
1 month
starting to ab test it now to delve into where it's improved
@OpenAI
OpenAI
1 month
Majorly improved GPT-4 Turbo model available now in the API and rolling out in ChatGPT.
451
753
5K
5
0
92
@badphilosopher
Matt Brockman
5 months
family coding best coding
3
1
79
@badphilosopher
Matt Brockman
4 years
We built an interface around #GPT3 to create a tool that computationally produces text-based ads for products in different contexts. We're expanding our pool of limited beta users, check it out!
4
13
68
@badphilosopher
Matt Brockman
6 months
not like this
2
3
49
@badphilosopher
Matt Brockman
2 months
wooo
@Wolfram_Alpha
Wolfram|Alpha
2 months
We're excited to supercharge @JuliusAI_ 's mathematical computations with our newest partnership...bringing Computational Intelligence to Artificial Intelligence 😊 Check them out here:
Tweet media one
6
19
156
2
2
36
@badphilosopher
Matt Brockman
2 months
in celebration, we finally launched the improved dark mode experience everyone's been asking for!
@0interestrates
rahul
2 months
1 million data visualizations generated on Julius 📊🎉 @JuliusAI_ newest milestone
19
5
264
3
1
31
@badphilosopher
Matt Brockman
1 year
@goodside Formalizing something and giving it a name is still useful though. People were hitting the aidungeon scorebots years ago with prompt injection, but no one realized it was interesting enough to codify at the time.
Tweet media one
0
1
26
@badphilosopher
Matt Brockman
1 month
our preliminary results from yesterday show bigly improvements for julius users
@badphilosopher
Matt Brockman
1 month
starting to ab test it now to delve into where it's improved
5
0
92
3
2
24
@badphilosopher
Matt Brockman
9 months
i think i got in a few words between the 'um's
@AlexReibman
Alex Reibman 🖇️
9 months
2/ JuliusAI Jupyter notebook, on steroids. Chat powered data analytics and AI agents, all in a notebook interface. Answer any question about your data with a single prompt. @badphilosopher @0interestrates @juliusAI_
Tweet media one
Tweet media two
1
5
83
4
1
23
@badphilosopher
Matt Brockman
8 months
playing with env vars in julius so you can add keys for important tasks like letting gpt4 use the openai api to write and evaluate poems
2
1
23
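A minimal sketch of that pattern, assuming the official openai Python client; the env-var name and prompts are illustrative, not Julius's actual setup:

```python
import os
from openai import OpenAI

# the user's key arrives via an env var rather than being hard-coded
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def ask(prompt: str) -> str:
    """One-shot chat completion helper."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

poem = ask("Write a short poem about environment variables.")
review = ask(f"Score this poem 1-10 and justify the score:\n\n{poem}")
print(poem, review, sep="\n---\n")
```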
@badphilosopher
Matt Brockman
6 months
now we have an excuse to be on our phones during the business meetings cause we're using julius to crunch the numbers
@JuliusAI_
Julius AI
6 months
Introducing the Julius iOS App📱🤖 AI that: - Solves Math - Analyzes Data - Creates Visualizations - Writes & Executes Code is now in your pocket 🚀. Download here: Android next!
26
31
371
3
4
23
@badphilosopher
Matt Brockman
1 year
Text-Davinci-003 has some really cool properties when it comes to getting structured output and we figured we'd write about it because it's really useful
2
3
22
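A sketch of the structured-output idea on a completions-style endpoint. The model name is a stand-in (text-davinci-003 is deprecated), the schema is illustrative, and in practice you'd retry if the continuation isn't valid JSON:

```python
import json
from openai import OpenAI

client = OpenAI()

prompt = (
    "Extract the fields as a one-line JSON object with keys name, city, age.\n"
    'Input: "Maria, 34, lives in Lisbon."\n'
    "Output: "
)
resp = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # stand-in for text-davinci-003
    prompt=prompt,
    max_tokens=60,
    temperature=0,
    stop=["\n"],  # end generation at the end of the JSON line
)
record = json.loads(resp.choices[0].text)  # may need a retry on malformed output
print(record["name"], record["city"], record["age"])
```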
@badphilosopher
Matt Brockman
1 year
So we ran the numbers and it turns out GPT3's been able to do better than chance on Word in Context all along, and it's been improving with each new model.
Tweet media one
1
6
22
@badphilosopher
Matt Brockman
3 years
@BrendanNyhan GPT knows the right answer
Tweet media one
0
0
19
@badphilosopher
Matt Brockman
1 month
chest day best day
2
0
18
@badphilosopher
Matt Brockman
3 months
these llms can get like 80-90% of the way there for a lot of tasks but then it turns into whack-a-mole on the last bit (speed/cost as well). there's a lot of potential in making it easier to get the finishing touches in.
@JuliusAI_
Julius AI
3 months
Introducing AI Graph Editing in Julius 📊📈 Users can now tweak and customize their graphs with AI and just natural language Easily modify the plot legend, size and title or just tell the AI in simple english how you want it modified😇 Give it a try!
3
3
96
1
2
18
@badphilosopher
Matt Brockman
2 years
@EMostaque @HBO i heard of this newfangled thing called a neural network that's essentially a lossy compression algorithm but I don't know if anyone's figured out how to get them to do anything with text or images yet
0
0
17
@badphilosopher
Matt Brockman
1 year
Updated the WIC GPT3 benchmarks, looks like gpt-3.5-turbo does about the same as text-davinci-003 for JSON/Code formatted prompts, although slightly worse when text only
Tweet media one
1
3
16
@badphilosopher
Matt Brockman
4 years
Tweet media one
0
0
12
@badphilosopher
Matt Brockman
2 months
'murica
@gdb
Greg Brockman
2 months
We started OpenAI out of my apartment eight years ago. And we’re still just getting started:
185
216
3K
0
1
11
@badphilosopher
Matt Brockman
1 year
@gruber @alextfife lists of stolen credit cards are cheap (card testing is now part of the background radiation of the internet unfortunately). paid isn't much of a signal, if any, when trying to prevent scams and fraud, especially with a skeleton crew.
0
0
11
@badphilosopher
Matt Brockman
5 months
@andrew_n_carr also what's the x axis?
1
0
10
@badphilosopher
Matt Brockman
4 years
So the API can help translate from British to American! One of the harder issues with content analysis.
2
1
10
@badphilosopher
Matt Brockman
19 days
iconic @Zeet_Co swag and @JuliusAI_ dumps spotted!
@mickeyxfriedman
Mickey Friedman
19 days
Last call to RSVP to SF's AI Tinkerers event on April 29th! Pizza and demos -- 6pm at Github HQ. LFG. 🔥 Also, announcing our fireside Chat: @eoghan Mccabe, the founder of AI-first customer service unicorn @intercom . See you there!
Tweet media one
3
7
53
1
0
10
@badphilosopher
Matt Brockman
5 months
@willdepue hard problem. so many times with word in context getting excited about improvement on first 100 examples and then worse performance on next 500. ab test in prod gets better scale but still problematic =/
0
1
8
@badphilosopher
Matt Brockman
6 months
missing from this analysis: honor. for many it's a genuine value
@GaryMarcus
Gary Marcus
6 months
How much was OpenAI really worth, a week ago? And what are the employees thinking about right now? Here’s a thought: If OpenAI has a lot of breakthrough, difficult to replicate IP (code, data, infrastructure etc), with a real business case, a lot of employees would stick…
92
70
558
1
1
9
@badphilosopher
Matt Brockman
2 months
i still don't know what self aware means in the context of gpt/claude, but having either power the awesome infinite self-reflecting while-loop chatbot from the scale hackathon last year works out of the box once i upgraded the apis for it :)
Tweet media one
Tweet media two
4
1
9
@badphilosopher
Matt Brockman
6 months
@sdand dev day dev day dev day dev day
0
0
9
@badphilosopher
Matt Brockman
1 month
@yi_ding y'all are taking april fools day far this year
2
0
8
@badphilosopher
Matt Brockman
3 years
GPT's all about the logprobs.
@nickwalton00
Nick Walton
3 years
GPT-3 is a super powerful tool, but there's still a lot of unknowns around how to use it effectively. Because of that we've decided to write a series of blog posts around how to use GPT-3 from what we've learned building AI Dungeon. Check it out here!
1
18
135
1
0
8
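For context, a sketch of pulling token log-probabilities from a completions-style endpoint (model name is a stand-in; the legacy API caps logprobs at 5):

```python
from openai import OpenAI

client = OpenAI()
resp = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # any completions-style model
    prompt="The opposite of hot is",
    max_tokens=1,
    logprobs=5,  # return the 5 most likely tokens at each position
    temperature=0,
)
# top_logprobs[0] maps candidate next tokens to their log-probabilities
for token, lp in resp.choices[0].logprobs.top_logprobs[0].items():
    print(f"{token!r}: {lp:.2f}")
```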
@badphilosopher
Matt Brockman
1 year
Claude can also do few shot word in context in JSON and Code! Got access at a hackathon over the weekend so added it to the list. It seems to do its best with JSON prompting.
Tweet media one
2
0
8
@badphilosopher
Matt Brockman
1 month
leg day best day
0
0
8
@badphilosopher
Matt Brockman
1 month
back day best day
1
0
7
@badphilosopher
Matt Brockman
1 year
@realGeorgeHotz Just replacing the from and putting in an @ gets most of the way there :P Thought draftjs was gonna be a bigger pain. I suppose i should do cleanup after the search (and intercept clicks) but that's a lot of work.
1
0
7
@badphilosopher
Matt Brockman
2 years
@sama There's still a lot of handwaving on best practices. It's expensive to benchmark prompt usage and not centralized anywhere; would be amazing to have datasets and scoreboards using hold out test sets with free/subsidized compute for community to figure out best usage.
0
1
6
@badphilosopher
Matt Brockman
1 year
learning to code is fun (cool github viz by @sallar and friends ). new years resolution: not have those blips next year.
Tweet media one
2
0
7
@badphilosopher
Matt Brockman
1 month
julius for simon says
@dr_cintas
Alvaro Cintas
1 month
This is a game changer. I can now run the code Claude 3 or GPT-4 generates inside the chat box. Look at me playing “Simon Says” after one single prompt! Here is how you can try it free:
29
43
320
0
1
7
@badphilosopher
Matt Brockman
1 year
@xtimv It's because of latency (we serve one). The LLM generates each token in sequence, and if you're generating hundreds of tokens this can take several seconds, especially when underprovisioned. Serving each token as it becomes available (just stream=True on the APIs) makes it feel like it's working faster.
0
0
6
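A minimal sketch of the streaming trick, assuming the openai Python client; the principle is the same on any LLM API with a stream flag:

```python
from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain streaming in one paragraph."}],
    stream=True,  # emit tokens as they're generated instead of all at once
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        # printing incrementally makes the response feel fast
        print(delta, end="", flush=True)
print()
```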
@badphilosopher
Matt Brockman
2 months
@0interestrates hmu in slack, lemme show u how to set whatever it is up in our grafana
1
0
5
@badphilosopher
Matt Brockman
3 years
After playing around with how to format the prompt to walk itself through a task for a bit, figured worth playing with having a prompt double check its output as well. Gives improvements to arithmetic in GPT-J (and also 3 but J's cheaper :P)
1
1
6
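A sketch of the double-check idea as two calls, one to solve step by step and one to verify; the prompts and helper are illustrative, not the exact ones used (and the model is a stand-in for GPT-J/GPT-3):

```python
from openai import OpenAI

client = OpenAI()

def complete(prompt: str) -> str:
    resp = client.completions.create(
        model="gpt-3.5-turbo-instruct",  # stand-in; the tweet used GPT-J / GPT-3
        prompt=prompt,
        max_tokens=200,
        temperature=0,
    )
    return resp.choices[0].text

question = "What is 17 * 24?"
draft = complete(f"Q: {question}\nWalk through the steps, then give the answer.\nA:")
verdict = complete(
    f"Q: {question}\nProposed solution:\n{draft}\n"
    "Check each step. If anything is wrong, give the corrected answer.\n"
)
print(verdict)
```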
@badphilosopher
Matt Brockman
4 years
Making progress on getting GPT-3 to compare/summarize news coverage across sources. Can be a bit slow on the fly but it's not too bad for grabbing differences in coverage across platforms and doing content analysis without me having to read much. Still a ways to go though.
0
2
6
@badphilosopher
Matt Brockman
3 months
i tend still to worry about war rather than peace
@ylecun
Yann LeCun
3 months
P(doom) is BS.
50
69
892
0
0
6
@badphilosopher
Matt Brockman
6 months
@alexgraveley We launched dark mode
1
0
6
@badphilosopher
Matt Brockman
2 months
* warning: if u run this, i don't think the off button works. it was a hackathon and didn't get that far.
0
0
6
@badphilosopher
Matt Brockman
2 years
putting stable diffusion and various llms into every website on the internet as it should be
@mattshumer_
Matt Shumer
2 years
🎉 We have some BIG news for @HyperWriteAI users! 🎉 You can now generate amazing images on nearly ANY website with just a few words! No more wasting time on Photoshop or searching for stock photos. Just let HyperWrite do the magic for you!
3
4
33
0
3
5
@badphilosopher
Matt Brockman
1 year
Anyway, all the code and results are up too. Now we gotta go run ANLI to evaluate the limitations of using this stuff for editing and whatnot.
0
1
5
@badphilosopher
Matt Brockman
29 days
needs to happen fast. students are already simply paying humans to do their homework, and those companies make their services cheaper by abusing ai free tiers. efficient marketplace and all
@JvNixon
Jeremy Nixon
1 month
At the core of the AI crisis in education is that the system invents fake work for ppl to do so that they can be evaluated against the quality other people’s invented work. As soon as the frame switches to *actually making progress*, the incentive is to use AI to the greatest…
9
18
164
0
1
5
@badphilosopher
Matt Brockman
4 years
@danielbigham @FeepingCreature It can get 67% on the WiC test set for figuring out where people are using the same word in different ways. Not at human level (which would be 80%), but not random.
0
0
5
@badphilosopher
Matt Brockman
1 year
Also ran GPT4 through ANLI - gets 69.75% without any major tricks. Gotta go back and figure out what was getting 40% on original Davinci though, because i didn't get that with these runs.
Tweet media one
0
0
5
@badphilosopher
Matt Brockman
1 month
@mikiovsh some time after our hr meeting about tweeting policies idk
0
0
5
@badphilosopher
Matt Brockman
1 year
Similarly, we've been using pseudo-python prompts, where we declare a few classes and use their functions without defining them and let text-davinci-003 guess what the output would be (except overriding the print with f-strings so we get the output formatted the way we want).
@goodside
Riley Goodside
1 year
I mostly write GPT-3 prompts in a formal subset of JavaScript consisting only of top-level assignments of JSON literals to variables, using a stop sequence of ";⏎" (semicolon, newline) so generations can be parsed as JSON. Also, here's the wiki on Taken 4: The Musical:
Tweet media one
11
21
444
0
0
5
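A sketch of the pseudo-python style from Matt's tweet above (the quoted tweet is the JSON variant): classes and methods are declared but never implemented, and the model continues from the f-string print. Names here are illustrative:

```python
PROMPT = '''\
class Review:
    """A single product review."""
    def __init__(self, text): ...
    def sentiment(self): ...        # never defined; the model infers behavior
    def summarize(self, words): ...

r = Review("Battery died after two days. Returning it.")
print(f"sentiment: {r.sentiment()}, summary: {r.summarize(words=5)}")
# stdout:
'''
# send PROMPT to a completions model; its continuation after "# stdout:"
# is the guessed output, already in the f-string's format
```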
@badphilosopher
Matt Brockman
5 months
anyone else having a bunch of @auth0 issues starting about 20 minutes ago?
0
0
5
@badphilosopher
Matt Brockman
4 years
So just using TFIDF for context stuffing with the GPT API gets 9% exact match on the Break benchmark. (Which I suspect will be an important step for chaining prompts back into one another.)
0
0
5
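A minimal sketch of TFIDF context stuffing, assuming scikit-learn: rank candidate examples against the query and stuff the closest ones into the prompt. The corpus and query here are illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

examples = [
    "return the users sorted by signup date",
    "count orders per customer",
    "find the most common word in a file",
]  # candidate few-shot examples
query = "sort customers by the date they joined"

vec = TfidfVectorizer()
matrix = vec.fit_transform(examples)
scores = cosine_similarity(vec.transform([query]), matrix)[0]

k = 2  # how many examples to stuff into the prompt
best = sorted(range(len(examples)), key=scores.__getitem__, reverse=True)[:k]
context = "\n".join(examples[i] for i in best)
prompt = f"{context}\n{query}"
print(prompt)
```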
@badphilosopher
Matt Brockman
1 year
@ShaanVP @craigthomler @OpenAI @sama It was much harder before text-davinci-003, but also their safety policy made it a bit hard. We'd been doing our code interviews for @HyperWriteAI with chatbots using original davinci and then text-davinci-002 and they'd been making a lot of improvements over time.
0
0
5
@badphilosopher
Matt Brockman
1 year
And now GPT-4 gets 75% on WIC
Tweet media one
@badphilosopher
Matt Brockman
1 year
Updated the WIC GPT3 benchmarks, looks like gpt-3.5-turbo does about the same as text-davinci-003 for JSON/Code formatted prompts, although slightly worse when text only
Tweet media one
1
3
16
0
0
4
@badphilosopher
Matt Brockman
5 months
Tweet media one
1
0
4
@badphilosopher
Matt Brockman
1 month
1
0
4
@badphilosopher
Matt Brockman
3 years
Emojis continuing to push the bounds of NLG
@AndrewMayne
Andrew Mayne
3 years
Playing with AI is one of the fun things I do at @OpenAI . GPT-3 is much more capable than people realize when you utilize advanced prompt design. Here’s one simple trick that can make it 20x more efficient. #gpt3 #openai
0
5
25
0
0
4
@badphilosopher
Matt Brockman
6 months
@zachfick23 is now a frontend dev
@JuliusAI_
Julius AI
6 months
🦃 feast your eyes on our new dark mode
Tweet media one
1
0
17
0
0
4
@badphilosopher
Matt Brockman
11 days
@0interestrates dm'd u on slack
1
0
4
@badphilosopher
Matt Brockman
4 years
I guess not surprisingly, the API can also figure out where words appended to a chat with asterisks go :) In: interpretCorrections(text) Out: "I'm gonna eat pizza on the couch at 3am"
@xkcdComic
XKCD Comic
4 years
Asterisk Corrections
Tweet media one
38
780
4K
0
1
4
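A sketch of how that prompt might look: interpretCorrections is an imaginary, never-implemented function whose output the model guesses (the exact prompt wasn't shared):

```python
PROMPT = '''\
def interpretCorrections(chat):
    """Applies any *asterisk corrections in the chat and
    returns the final corrected sentence."""

chat = """me: I'm gonna eat pizza on the coach at 3pm
me: *couch
me: *3am"""
print(interpretCorrections(chat))
# stdout:
'''
# the model's continuation should read:
# I'm gonna eat pizza on the couch at 3am
```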
@badphilosopher
Matt Brockman
3 years
@doktorclaw @theshawwn 3 instead of 11 better tho!
Tweet media one
2
0
4
@badphilosopher
Matt Brockman
1 year
For the business logic, you can simply say in the prompt that there's a function f that does <your task> and let the ai guess the output. and what's really cool, you can include conditionals in the function as well -
1
0
4
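A sketch of that pattern: an imaginary function whose docstring carries the business logic, conditionals included (names are illustrative):

```python
PROMPT = '''\
def route_ticket(message):
    """Returns "billing" if the message concerns payments or refunds,
    "bug" if it reports broken behavior, otherwise "other"."""

print(route_ticket("I was charged twice this month"))
# stdout:
'''
# the model continues the prompt with the guessed output: billing
```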
@badphilosopher
Matt Brockman
25 days
@AvadyMikhail yeah but i still cant figure out the talk button on android
1
0
4
@badphilosopher
Matt Brockman
3 years
@JJSchroden it's a systemic problem (same as the 2015 'lying to ourselves' white paper). when the process incentivizes lies that everyone knows are lies, the responsibility falls at the top.
0
1
4
@badphilosopher
Matt Brockman
1 year
@paulg @clark_aviation the red/white neutrality bands were painted on after the fact so allied forces wouldn't keep accidentally shooting them down like pico balloons.
0
0
3
@badphilosopher
Matt Brockman
3 years
The coolest thing about these bots was how the personality arises out of the need to perform a task rather than from aiming to create one.
@nickwalton00
Nick Walton
3 years
Want an inside look into how we make some of the cool features of AI Dungeon? Check out our newest article in our series on how to use GPT-3 to build cool things!
0
6
32
0
0
3
@badphilosopher
Matt Brockman
4 months
@scottastevenson lots of poli sci research over the last 75 yrs backs that up - e.g. converse 1964 / zaller '92. maybe gpt just does what people do, but we don't like to believe we do it
1
0
3
@badphilosopher
Matt Brockman
1 month
@mikiovsh ill put a coin in the swear jar at the sunday sync tmrw
1
0
3
@badphilosopher
Matt Brockman
1 year
But you can also get your output formatted the way you want for easy parsing. Just wrap your imaginary functions in a helper function that returns its input as a JSON object with all the nesting you want, and you can just call json.loads / parse()! It's great.
0
0
3
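A sketch of the JSON wrapper trick: have an imaginary helper print the result as JSON so the completion parses directly with json.loads. The completion string below is a stand-in for an actual model response:

```python
import json

PROMPT = '''\
def as_json(result):
    """Prints the result as a JSON object, nesting preserved."""

def extract_entities(text): ...

as_json(extract_entities("Ada met Charles in London in 1833."))
# stdout:
'''
# stand-in for the model's continuation of PROMPT:
completion = '{"people": ["Ada", "Charles"], "places": ["London"], "dates": ["1833"]}'
data = json.loads(completion)  # parses straight into nested dicts/lists
print(data["people"])
```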
@badphilosopher
Matt Brockman
1 month
@atbeme @0interestrates just doing some quick and dirty market validation for getting the @JuliusAI_ gymtracker on the roadmap
1
0
3
@badphilosopher
Matt Brockman
2 years
@un1crom my gps begs to differ
1
0
3
@badphilosopher
Matt Brockman
1 month
@TheXeophon couldn't help myself lmao, sorry we should start doing writeups on our ab tests, good idea
1
0
3
@badphilosopher
Matt Brockman
3 years
@GregJaffe I'm not sure there are many articles on the Afghan unit that now has responsibility for the base, or assessments of their historical performance, to really evaluate that question.
1
1
1
@badphilosopher
Matt Brockman
1 year
Generally LLMs are not great at what we might consider general logic, but since Text-Davinci-00n are a variant of the codex model (if i read their blog posts correctly) they're pretty good at predicting the output of code.
@jamesjyu
james yu
1 year
This GPT virtual machine post is only the tip of the iceberg. @joshlabau and I have discovered that text-davinci-003 has the capability to do something we're calling HALLUCINATED SCRIPTS Buckle up for a thread, this one is mind bending 🤯
6
41
192
1
0
3
@badphilosopher
Matt Brockman
4 months
@TheXeophon @0interestrates i broke a rate limiter, should be fixed now, srry about that. ive been punished accordingly for each false positive.
0
0
2
@badphilosopher
Matt Brockman
1 year
@mathemagic1an yep, we roll our own. ended up with a lot of hard coded function calls w/ variables our scripts can use and there's some tradeoff between having to figure out how much parsing each script needs to do for different data sets (e.g. emails/webpages, search methods/count etc).
0
0
3
@badphilosopher
Matt Brockman
4 years
@antoniogm +1. The thing that surprised me the most about Afghanistan is I never ran into another American service member who spoke Pashto (the local language in the east/south). And that was after we'd been there for 15 years already.
1
0
3
@badphilosopher
Matt Brockman
2 months
@auth0 having issues or did i block myself
0
0
3
@badphilosopher
Matt Brockman
1 year
@shyamalanadkat @goodside it always has a probability for all next tokens though, right? i'd think that because the number of tokens over some threshold changes with each additional token in the context, the cost of exploring all possible paths ahead with that cutoff may sometimes increase instead of decrease
1
0
3
@badphilosopher
Matt Brockman
1 year
LLMs are great, although once you start dealing with arbitrary inputs from users and a variety of data it can be a bit hard to ensure that you're generating the content they're expecting. To deal with this, you normally have to do a bunch of pre- and post-processing.
1
0
3
@badphilosopher
Matt Brockman
1 year
Essentially you can route around and combine the imaginary inputs and outputs of multiple imaginary function calls. There are limits to how many you can include in a single prompt though.
1
0
2
@badphilosopher
Matt Brockman
2 months
si qua fata sinant iam tum tenditque fovetque ("if the fates in any way allow, even then she strives for and cherishes it" - Virgil, Aeneid I)
0
0
2
@badphilosopher
Matt Brockman
4 months
@AlexReibman sf always needs more hackathon
0
0
1
@badphilosopher
Matt Brockman
3 years
@dillonniederhut It's cooler if you graph it :P Gwern links to our findings; I'd done the initial work with @MusicalBrockman
Tweet media one
1
0
2
@badphilosopher
Matt Brockman
2 years
@arankomatsuzaki I believe "I mathed out the result of each step as I walk through the solution." gets a .805, if I ran it correctly - someone may want to double check
1
0
2
@badphilosopher
Matt Brockman
4 years
Attempting Word in Context with the API yesterday: changing the number of examples neither improved nor worsened single-prompt fewshot performance. Still beats the results in the paper, but worse than multi-step methods and task-specific SOTA.
Tweet media one
0
0
2
@badphilosopher
Matt Brockman
24 days
was at gym on preworkout ='(
@badphilosopher
Matt Brockman
25 days
@AvadyMikhail at gym we need to add voice input
1
0
1
0
0
2
@badphilosopher
Matt Brockman
3 years
the apple green-lanternist version of foundation seems so far to be a total rejection of the original, where psychohistory was possible because everyone has agency in an efficient universe, making the end result predictable (so when everyone loses agency via the mule, things break).
2
0
2
@badphilosopher
Matt Brockman
4 years
@nickwalton00 On the flip side, I do think a lot of people are using the API wrong. If you just ask it to purely auto-complete text, it'll just spit something out. You can get better reasoning and internal constraint by having it context-stuff itself (which improves its ANLI/WiC scores)
0
0
2
@badphilosopher
Matt Brockman
5 months
@0interestrates thanks @Zeet_Co for helping us!
0
0
2
@badphilosopher
Matt Brockman
4 years
@nickwalton00 I think there's quite a bit of political science literature that would support the perspective that it's not uncommon to repeat ideas without internalizing them / people have varying levels of internal constraint. E.g. Converse (1964)
1
0
2
@badphilosopher
Matt Brockman
3 years
@GregJaffe Not sure GIRoA won't loot it anyway (e.g. similar to the looting issue that's been briefly reported before). There just seems to be a lot of unknowns about our partners even after being there 20 years.
0
1
1