i’ll defend devin and their team here. most of the critiques here are things you’d hear from a senior engineer (dinosaur) who’s watching over your shoulder and constantly negging you. it’s completely regarded
most of his arguments are strawmen
- devin used an incorrect version…
julius is basically entirely backend engineers bottlenecked by figuring out how to make a button render quickly in the right color and location, and i cannot overstate how much we appreciate people who understand this stuff helping out
We built an interface around
#GPT3
to create , a tool that computationally produces text-based ads for products across different contexts. We're expanding our pool of limited beta users; check it out!
We're excited to supercharge
@JuliusAI_
's mathematical computations with our newest partnership...bringing Computational Intelligence to Artificial Intelligence 😊
Check them out here:
@goodside
Formalizing something and giving it a name is still useful though. People were hitting the AI Dungeon scorebots years ago with prompt injection () but no one thought it was interesting enough to codify at the time.
2/ JuliusAI
Jupyter notebook, on steroids.
Chat-powered data analytics and AI agents, all in a notebook interface. Answer any question about your data with a single prompt.
@badphilosopher
@0interestrates
@juliusAI_
Introducing the Julius iOS App📱🤖
AI that:
- Solves Math
- Analyzes Data
- Creates Visualizations
- Writes & Executes Code
is now in your pocket 🚀. Download here:
Android next!
Text-Davinci-003 has some really cool properties when it comes to getting structured output and we figured we'd write about it because it's really useful
So we ran the numbers and it turns out GPT3's been able to do better than chance on Word in Context all along, and it's been improving with each new model.
these llms can get like 80, 90% of the way there for a lot of tasks but then it turns into whack-a-mole on the last bit (speed/cost as well). there's a lot of potential in making it easier to get the finishing touches in.
Introducing AI Graph Editing in Julius 📊📈
Users can now tweak and customize their graphs with AI using just natural language
Easily modify the plot legend, size, and title, or just tell the AI in plain English how you want it modified😇
Give it a try!
@EMostaque
@HBO
i heard of this newfangled thing called a neural network that's essentially a lossy compression algorithm but I don't know if anyone's figured out how to get them to do anything with text or images yet
Updated the WIC GPT3 benchmarks, looks like gpt-3.5-turbo does about the same as text-davinci-003 for JSON/Code formatted prompts, although slightly worse when text only
@gruber
@alextfife
lists of stolen credit cards are cheap (card testing is now part of the background radiation of the internet, unfortunately). "paid" isn't much of a signal, if any, when trying to prevent scams and fraud, especially with a skeleton crew.
Last call to RSVP to SF's AI Tinkerers event on April 29th!
Pizza and demos -- 6pm at GitHub HQ. LFG. 🔥
Also, announcing our fireside Chat:
@eoghan
McCabe, the founder of AI-first customer service unicorn
@intercom
.
See you there!
@willdepue
hard problem. so many times with Word in Context I got excited about improvement on the first 100 examples, then saw worse performance on the next 500.
ab testing in prod gets better scale but is still problematic =/
How much was OpenAI really worth, a week ago? And what are the employees thinking about right now?
Here’s a thought:
If OpenAI has a lot of breakthrough, difficult-to-replicate IP (code, data, infrastructure, etc.) with a real business case, a lot of employees would stick…
i still don't know what self-aware means in the context of gpt/claude, but either one can power the awesome infinite self-reflecting while-loop chatbot from the scale hackathon last year out of the box once i upgraded the apis for it :)
GPT-3 is a super powerful tool, but there's still a lot of unknowns around how to use it effectively. Because of that we've decided to write a series of blog posts around how to use GPT-3 from what we've learned building AI Dungeon. Check it out here!
Claude can also do few shot word in context in JSON and Code! Got access at a hackathon over the weekend so added it to the list. It seems to do its best with JSON prompting.
@realGeorgeHotz
Just replacing the from and putting in an @ gets most of the way there :P Thought draftjs was gonna be a bigger pain. I suppose should do cleanup after the search (and intercept clicks) but that's a lot of work.
@sama
There's still a lot of handwaving on best practices.
Benchmarking prompt usage is expensive and not centralized anywhere; it would be amazing to have datasets and scoreboards using held-out test sets, with free/subsidized compute for the community to figure out best usage.
This is a game changer.
I can now run the code Claude 3 or GPT-4 generates inside the chat box. Look at me playing “Simon Says” after one single prompt!
Here is how you can try it free:
@xtimv
It's because of latency (we serve one). The LLM generates each token in sequence, and if you're generating hundreds of tokens this can take several seconds, especially when underprovisioned. By serving each token as it becomes available (just stream=True on the APIs), it feels faster because users see it working.
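A minimal sketch of why streaming helps (the callback and the simulated stream here are illustrative, not any real API's interface): surface each token the moment it arrives instead of buffering the whole completion.

```python
def stream_tokens(token_iter, on_token):
    """Surface each token as soon as it arrives instead of waiting for
    the full completion -- the same idea as stream=True on the LLM APIs.
    token_iter is any iterable of token strings (simulated below)."""
    pieces = []
    for tok in token_iter:
        pieces.append(tok)
        on_token(tok)  # render immediately; perceived latency drops
    return "".join(pieces)

# Simulate an API response arriving token by token.
shown = []
full = stream_tokens(iter(["Hel", "lo", ", ", "world", "!"]), shown.append)
```

The total generation time is unchanged; what improves is time-to-first-token, which is what users actually perceive.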
After playing around with how to format the prompt so it walks itself through a task, I figured it was worth having the prompt double-check its own output as well. It gives improvements to arithmetic in GPT-J (and also GPT-3, but J's cheaper :P)
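A sketch of the double-check idea: a second prompt feeds the first answer back in and asks the model to verify it. The wording and the helper name below are illustrative, not the exact prompt from the experiment.

```python
def build_self_check_prompt(question, proposed_answer):
    # Hypothetical second-pass prompt: ask the model to redo the
    # arithmetic step by step and correct the first answer if needed.
    return (
        f"Q: {question}\n"
        f"Proposed answer: {proposed_answer}\n"
        "Check the proposed answer by redoing the arithmetic one step at a time.\n"
        "If it is wrong, state the corrected answer.\n"
        "Check:"
    )

prompt = build_self_check_prompt("What is 17 * 24?", "408")
```

The second pass costs another completion, hence the note that GPT-J is the cheaper place to iterate.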
Making progress on getting GPT-3 to compare/summarize news coverage across sources. Can be a bit slow on the fly but it's not too bad for grabbing differences in coverage across platforms and doing content analysis without me having to read much. Still a ways to go though.
🎉 We have some BIG news for
@HyperWriteAI
users! 🎉
You can now generate amazing images on nearly ANY website with just a few words!
No more wasting time on Photoshop or searching for stock photos. Just let HyperWrite do the magic for you!
Anyway, all the code and results are at too.
Now we gotta go run ANLI to figure out evaluating the limitations for using this stuff for editing and whatnot.
needs to happen fast. students already simply paying humans to do their homework and those companies make their services cheaper by abusing ai free tiers. efficient marketplace and all
At the core of the AI crisis in education is that the system invents fake work for ppl to do so that they can be evaluated against the quality of other people’s invented work.
As soon as the frame switches to *actually making progress*, the incentive is to use AI to the greatest…
@danielbigham
@FeepingCreature
It can get 67% on WiC test set for figuring out where people are using the same word in different ways, . Not at human level (which would be 80%), but not random.
Also ran GPT-4 through ANLI - it gets 69.75% without any major tricks. Gotta go back and figure out what was getting 40% on original davinci, though, because I didn't see it with these runs.
Similarly, we've been using pseudo-python prompts, where we declare a few classes and use their functions without defining them and let text-davinci-003 guess what the output would be (except overriding the print with f-strings so we get the output formatted the way we want).
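For illustration, a pseudo-python prompt in roughly that style might look like this. The class, function, and product below are made up; the point is that nothing is defined, and the final print's f-string dictates the output format the model should imitate.

```python
# The classes/functions in the prompt are deliberately undefined: the
# text is sent to the model as-is, and the model is asked to continue
# after "# Output:" with what the imaginary program would print.
PSEUDO_PYTHON_PROMPT = '''\
class Product:
    def __init__(self, name, price): ...

def write_ad(product, audience): ...

p = Product("solar charger", 39.99)
ad = write_ad(p, audience="campers")
print(f"AD FOR {p.name.upper()}: {ad}")
# Output:
'''

def build_prompt(context=""):
    # Prepend any extra context (few-shot examples etc.) to the template.
    return context + PSEUDO_PYTHON_PROMPT
```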
I mostly write GPT-3 prompts in a formal subset of JavaScript consisting only of top-level assignments of JSON literals to variables, using a stop sequence of ";⏎" (semicolon, newline) so generations can be parsed as JSON.
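That format can be sketched as follows (built here in Python; the variable names are made up): each prior assignment is a JSON literal, the prompt ends mid-assignment, and with ";\n" as the stop sequence the raw completion parses directly as JSON.

```python
import json

STOP = ";\n"  # the ";⏎" stop sequence

def make_js_prompt(examples, query_var):
    # Each example is a top-level JS assignment of a JSON literal; the
    # prompt ends with "var <query_var> = " so the model completes it
    # with another JSON literal.
    lines = [f"var {name} = {json.dumps(value)};" for name, value in examples]
    lines.append(f"var {query_var} = ")
    return "\n".join(lines)

def parse_completion(completion):
    # Generation stops at ";\n", so whatever comes back is a JSON literal.
    return json.loads(completion.split(STOP)[0])

prompt = make_js_prompt(
    [("capital_of_france", {"city": "Paris"})], "capital_of_japan"
)
```

The payoff is that no ad-hoc output parsing is needed: the stop sequence guarantees the completion is exactly one JSON value.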
Also, here's the wiki on Taken 4: The Musical:
So just using TFIDF for context stuffing with GPT API gets 9% exact match on the Break benchmark. (Which will be an important step for chaining prompts back into one another I suspect).
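A hand-rolled sketch of that retrieval step (real pipelines would typically use e.g. scikit-learn; the documents below are toy data): rank candidate passages by TF-IDF cosine similarity to the question, then stuff the top hits into the prompt.

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Rank docs by TF-IDF cosine similarity to the query."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    qv = vec(query.lower().split())

    def cosine(v):
        dot = sum(qv.get(t, 0.0) * w for t, w in v.items())
        na = math.sqrt(sum(w * w for w in qv.values()))
        nb = math.sqrt(sum(w * w for w in v.values()))
        return dot / (na * nb) if na and nb else 0.0

    order = sorted(range(n), key=lambda i: cosine(vec(tokenized[i])),
                   reverse=True)
    return [docs[i] for i in order]

def stuff_context(query, docs, k=2):
    # Put the top-k passages above the question in the prompt.
    context = "\n".join(tfidf_rank(query, docs)[:k])
    return f"{context}\n\nQ: {query}\nA:"

docs = [
    "the cat sat on the mat",
    "dogs bark loudly at night",
    "cat food goes in the bowl",
]
ranked = tfidf_rank("where is the cat", docs)
```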
@ShaanVP
@craigthomler
@OpenAI
@sama
It was much harder before text-davinci-003, but also their safety policy made it a bit hard. We'd been doing our code interviews for
@HyperWriteAI
with chatbots using original davinci and then text-davinci-002 and they'd been making a lot of improvements over time.
Playing with AI is one of the fun things I do at
@OpenAI
.
GPT-3 is much more capable than people realize when you utilize advanced prompt design. Here’s one simple trick that can make it 20x more efficient.
#gpt3
#openai
I guess not surprisingly, the API can also figure out where words appended to a chat with asterisks go :)
In: interpretCorrections(text)
Out:
"I'm gonna eat pizza on the couch at 3am"
For the business logic, you can simply say in the prompt that there's a function f that does <your task> and let the AI guess the output. And, what's really cool, you can include conditionals in the function as well -
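A sketch of what such a prompt might look like (the task wording and the example input are illustrative): declare f in plain text, add a conditional branch, and ask for f's output on a concrete input.

```python
def build_imaginary_fn_prompt(task_description, example_input):
    # Declare a function f in plain text, include a conditional branch,
    # and ask the model to guess f's output on a concrete input.
    return (
        f"There is a function f(text) that {task_description}.\n"
        "If the text contains no asterisk-marked correction, "
        "f returns it unchanged.\n"
        f"In: f({example_input!r})\n"
        "Out:"
    )

prompt = build_imaginary_fn_prompt(
    "applies corrections marked with asterisks",
    "I'm gonna eat pizza on the coach *couch* at 3am",
)
```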
@JJSchroden
it's a systemic problem (same as the 'lying to ourselves' white paper from 2015, ). when the process incentivizes lies that everyone knows are lies, the responsibility falls at the top.
@paulg
@clark_aviation
the red/white neutrality bands were painted on after the fact so allied forces wouldn't keep accidentally shooting them down like pico balloons.
Want an inside look into how we make some of the cool features of AI Dungeon?
Check out our newest article in our series on how to use GPT-3 to build cool things!
@scottastevenson
lot of poli sci research over the last 75 yrs backs that up - e.g. converse 1964 () / zaller '92 ()
maybe gpt just does what people do but we don't like to believe we do it
But you can also get your output formatted the way you want for easy parsing. Just wrap your imaginary functions in a helper function that returns its input a JSON object with all the nesting you want and you can just call json.loads / parse()! It's great.
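A sketch of that JSON-wrapper trick (the helper name to_json and the schema here are made up): the prompt claims a helper that returns its argument as a JSON object, so the completion can go straight into json.loads.

```python
import json

def wrap_as_json_prompt(imaginary_call, fields):
    # Claim (in the prompt) that to_json returns its argument as a JSON
    # object with the given fields, then print it so the model emits JSON.
    schema = ", ".join(f'"{f}": ...' for f in fields)
    return (
        f"to_json(x) returns x as a JSON object shaped like {{{schema}}}.\n"
        f"print(to_json({imaginary_call}))\n"
        "# Output:\n"
    )

def parse_output(completion):
    # The completion should be a JSON literal we can load directly.
    return json.loads(completion)

prompt = wrap_as_json_prompt("summarize(article)", ["title", "key_points"])
```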
@GregJaffe
I'm not sure there are many articles on the Afghan unit that now has responsibility for the base / assessments of their historical performance to really evaluate that question.
Generally LLMs are not great at what we might consider general logic, but since text-davinci-00n is a variant of the codex model (if I read their blog posts correctly) it's pretty good at predicting the output of code (see , )
This GPT virtual machine post is only the tip of the iceberg.
@joshlabau
and I have discovered that text-davinci-003 has the capability to do something we're calling HALLUCINATED SCRIPTS
Buckle up for a thread, this one is mind bending 🤯
@mathemagic1an
yep, we roll our own. ended up with a lot of hard coded function calls w/ variables our scripts can use and there's some tradeoff between having to figure out how much parsing each script needs to do for different data sets (e.g. emails/webpages, search methods/count etc).
@antoniogm
+1. The thing that surprised me the most about Afghanistan is I never ran into another American service member who spoke Pashto (the local language in the east/south). And that was after we'd been there for 15 years already.
@shyamalanadkat
@goodside
it always has a probability for all next tokens though, right? i'd think that because the number of tokens over some threshold changes with each additional token in the context, the cost of exploring all possible paths ahead with that cutoff may sometimes increase instead of decrease
LLMs are great, although once you start dealing with arbitrary inputs from users and a variety of data, it can be a bit hard to ensure you're generating the content they're expecting. To deal with this, you normally have to do a bunch of pre- and post-processing.
Essentially you can route around and combine the imaginary inputs and outputs of multiple imaginary function calls. There's limits to how many you can include in a single prompt though.
@arankomatsuzaki
I believe "I mathed out the result of each step as I walk through the solution." gets a .805, if I ran it correctly; would love for someone to double-check
Attempting Word in Context with the API yesterday showed no change in single-prompt few-shot performance when varying the number of examples. It still beats the results in the paper, but is worse than multi-step methods and task-specific SOTA.
the apple green-lanternist version of foundation seems so far to be a total rejection of the original where psychohistory was possible because everyone has agency in an efficient universe so the end result is predictable (so when everyone loses agency via the mule things break).
@nickwalton00
On the flip side, I do think a lot of people are using the API wrong. If you just ask it to purely auto-complete text, it'll just spit something out. You can get better reasoning and internal constraint by having it context-stuff itself (which improves its ANLI/WiC scores)
@nickwalton00
I think there's quite a bit of political science literature that would support the perspective that it's not uncommon to repeat ideas without internalizing them / people have varying levels of internal constraint. E.g. Converse (1964)
@GregJaffe
Not sure GIRoA won't loot it anyway (e.g. similar to the looting issue briefly mentioned in , ). There just seems to be a lot of unknowns about our partners even after being there 20 years.