Ansong Ni Profile
Ansong Ni

@AnsongNi

1,421
Followers
385
Following
73
Media
513
Statuses

Final-year PhD student @Yale , #NLProc , LLM for Code. (ex-)intern @GoogleDeepMind , @MetaAI , @MSFTResearch , @allen_ai . MS from @SCSatCMU . Opinions are my own.

New Haven
Joined February 2020
Pinned Tweet
@AnsongNi
Ansong Ni
2 months
Excited to share our work at @GoogleDeepMind ! We propose Naturalized Execution Tuning (NExT), a self-training method that drastically improves the LLM's ability to reason about code execution, by learning to inspect execution traces and generate chain-of-thought rationales 🧵👇
Tweet media one
15
123
577
@AnsongNi
Ansong Ni
5 months
I recently gave a guest lecture (outline below) about LLMs for code and math for the "AI Foundation Models" course at Yale, and I've just made the slides and recordings publicly available: slides: recordings:
Tweet media one
5
61
306
@AnsongNi
Ansong Ni
1 year
Execution results are strong indicators of program correctness. But how can we improve LLMs for code generation with execution? In our new paper, we propose LEVER, a simple method that learns to verify and rerank LLM-generated programs with their execution results. 🧵👇 (1/n)
Tweet media one
3
44
254
@AnsongNi
Ansong Ni
1 year
As one of Drago's current PhD students, I am still in shock and disbelief. Drago means so much more to me than the word "advisor" could ever entail. He is a great mentor, a good friend, and one of the kindest and most down-to-earth people I've ever known 1/
Tweet media one
@hmkyale
Harlan Krumholz
1 year
The #AI community, the #computerscience community, the @YaleSEAS community, and humanity have suddenly lost a remarkable person, @dragomir_radev - kind and brilliant, devoted to his family and friends... gone too soon. A sad day @Yale @YINSedge @YaleCompsci #NLP2023
Tweet media one
Tweet media two
41
87
389
6
12
250
@AnsongNi
Ansong Ni
9 months
How good are current LLMs at translating natural language into executable code? Introducing L2CEval, where we benchmark language-to-code (L2C) generation abilities of 54 models from 12 orgs, testing on 7 tasks from 3 core domains. Here is what we found in this first release of
Tweet media one
6
50
241
@AnsongNi
Ansong Ni
1 year
Late advertisement but I'm giving a talk @MIT_CSAIL on Program Synthesis w/ LLM this afternoon at 4PM EST. This talk is open to everyone so feel free to join in person or over zoom! More info (w/ zoom link) here:
Tweet media one
3
8
106
@AnsongNi
Ansong Ni
1 year
While the same NL specs can often be satisfied by different programs, most datasets only provide one for learning. This can easily lead to overfitting, as shown in the figure below. In our new #ICLR2023 paper, we show how we can mitigate this issue with self-sampling 🧵1/7
Tweet media one
4
15
82
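The thread above only names the idea, so here is a rough, hypothetical sketch of what self-sampling could look like in practice: sample alternative programs from the model itself and keep the ones that behave like the single gold program under execution. The helpers `sample_programs` and `execute` are assumptions made for illustration, and the paper's actual filtering criterion and training objective may differ.

```python
from typing import Callable, List

def self_sample(
    spec: str,
    gold_program: str,
    test_inputs: list,
    sample_programs: Callable[[str, int], List[str]],  # (NL spec, k) -> k model-sampled programs
    execute: Callable[[str, object], object],          # (program, input) -> output
    k: int = 20,
) -> List[str]:
    # Behavior of the single annotated gold program on the available inputs.
    gold_outputs = [execute(gold_program, x) for x in test_inputs]
    augmented = [gold_program]
    for program in sample_programs(spec, k):
        # Keep sampled programs whose execution behavior matches the gold program's.
        if [execute(program, x) for x in test_inputs] == gold_outputs:
            augmented.append(program)
    return augmented  # multiple semantically-equivalent training targets for one NL spec
```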
@AnsongNi
Ansong Ni
2 months
I know I'm being bullish with tokenizers but this one really made my day
Tweet media one
2
7
75
@AnsongNi
Ansong Ni
3 months
HumanEval & MBPP are top datasets in evaluating LLMs for code. Despite common suspicion of contamination, quantifying it is hard, as it would require massive pairwise comparison between the examples in the datasets and the pretraining corpus – and we've done exactly this 🧵👇 In this
Tweet media one
1
13
68
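The tweet doesn't spell out the comparison itself; one simple way to approximate such a massive pairwise comparison is surface-level n-gram overlap between each benchmark example and every pretraining document, as sketched below. This is only an illustration; the similarity measures and thresholds used in the actual study may differ.

```python
def ngrams(text: str, n: int = 10) -> set:
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(max(len(toks) - n + 1, 1))}

def max_overlap(example: str, corpus_docs: list) -> float:
    # Highest Jaccard similarity between the example's n-grams and any single corpus document.
    ex = ngrams(example)
    best = 0.0
    for doc in corpus_docs:
        dg = ngrams(doc)
        best = max(best, len(ex & dg) / max(len(ex | dg), 1))
    return best  # values near 1.0 suggest the example (nearly) appears verbatim in the corpus

# Hypothetical usage: flag benchmark examples whose best overlap exceeds a chosen threshold.
# flagged = [ex for ex in benchmark_examples if max_overlap(ex, pretraining_docs) > 0.6]
```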
@AnsongNi
Ansong Ni
1 year
#ICLR2023 is my first in-person conference post-covid and it's the most rewarding experience ever. I reunited with my friends, collaborators, mentors… some of whom I haven't seen in years and some I actually met in person for the first time. 1/n
Tweet media one
Tweet media two
Tweet media three
2
6
64
@AnsongNi
Ansong Ni
1 year
Glad to share that LEVER is accepted to @icmlconf and I'm attending #ICML2023 in person in Hawaii 🏖️
@AnsongNi
Ansong Ni
1 year
Execution results are strong indicators of program correctness. But how can we improve LLMs for code generation with execution? In our new paper, we propose LEVER, a simple method that learns to verify and rerank LLM-generated programs with their execution results. 🧵👇 (1/n)
Tweet media one
3
44
254
2
7
60
@AnsongNi
Ansong Ni
3 months
ICML/ACL reviewing thoughts: It's crazy how creative people are in disguising the fact that they simply distill from GPT-4
4
5
59
@AnsongNi
Ansong Ni
10 months
This is my last week as an intern @GoogleDeepMind . The past summer was nothing but inspiring. Thanks everyone in the learning for code team, for making me feel so welcomed since day one. Next, I'll be on the job market later this year, stay tuned!
Tweet media one
Tweet media two
1
0
59
@AnsongNi
Ansong Ni
1 month
Happy to share that NExT is accepted to #ICML2024 !
@AnsongNi
Ansong Ni
2 months
Excited to share our work at @GoogleDeepMind ! We propose Naturalized Execution Tuning (NExT), a self-training method that drastically improves the LLM's ability to reason about code execution, by learning to inspect execution traces and generate chain-of-thought rationales 🧵👇
Tweet media one
15
123
577
1
4
57
@AnsongNi
Ansong Ni
1 year
Seems like a good time to share that I am joining Google Brain (now Google DeepMind) as a research intern this summer! I will be working on code generation + LLM with @pengchengyin in @RandomlyWalking 's team.
@demishassabis
Demis Hassabis
1 year
The phenomenal teams from Google Research's Brain and @DeepMind have made many of the seminal research advances that underpin modern AI, from Deep RL to Transformers. Now we're joining forces as a single unit, Google DeepMind, which I'm thrilled to lead!
159
654
4K
2
1
56
@AnsongNi
Ansong Ni
1 year
OpenAI announced that the Codex API will be discontinued in 3 days. Me in the middle of a Codex-related project is like:
Tweet media one
2
4
55
@AnsongNi
Ansong Ni
10 months
When you do a PhD, don't do it to impress others, do it only when it brings you closer to your dream
6
5
55
@AnsongNi
Ansong Ni
11 months
This is how I "hacked" the seat back screen on a Boeing 737 to be an external monitor for my MacBook:
Tweet media one
1
2
52
@AnsongNi
Ansong Ni
1 year
So… In 2021, Codex was out during the 1st month of my internship @MSFTResearch ; In 2022, OPT was released right before my internship @MetaAI ; Now in 2023, PaLM 2 is out 3 weeks before my internship @DeepMind I surely know how to appear in the right place at the right time 😆
@DynamicWebPaige
👩‍💻 Paige Bailey
1 year
📄 Make sure to check out the technical report for fun examples, details about building the model, and more: #GoogleIO #GoogleIO2023 😭 Am also tearing up a little. So dang proud of this awesome team and excited to continue this work at Google DeepMind:
Tweet media one
9
11
159
4
1
46
@AnsongNi
Ansong Ni
3 years
Switching from one library to another (e.g., tf->torch) and tired of the manual refactoring that needs to be done? We aim to solve this problem with our new @ICSEconf #icse21 paper: "SOAR: A Synthesis Approach for Data Science API Refactoring". [1/4]
Tweet media one
1
9
39
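For readers unfamiliar with the setting, below is a hand-written example of the kind of tf->torch migration that SOAR aims to synthesize automatically; the specific API pair is my own illustration and is not taken from the paper.

```python
import numpy as np
import tensorflow as tf  # source library
import torch             # target library

x = np.arange(6, dtype=np.float32).reshape(2, 3)

# Original TensorFlow call:
tf_out = tf.reduce_mean(tf.constant(x), axis=0)

# Refactored PyTorch equivalent (the kind of mapping SOAR searches for):
torch_out = torch.mean(torch.from_numpy(x), dim=0)

print(tf_out.numpy(), torch_out.numpy())  # both print [1.5 2.5 3.5]
```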
@AnsongNi
Ansong Ni
11 months
Hey cool people at #ICML2023 : We will present our poster tomorrow (Thu) from 1:30PM to whenever it takes! Come and chat with me, @VictoriaLinML and @sidawxyz if you're interested in code generation, LLM or training verifiers!
Tweet media one
@AnsongNi
Ansong Ni
1 year
Execution results are strong indicators of program correctness. But how can we improve LLMs for code generation with execution? In our new paper, we propose LEVER, a simple method that learns to verify and rerank LLM-generated programs with their execution results. 🧵👇 (1/n)
Tweet media one
3
44
254
0
5
36
@AnsongNi
Ansong Ni
1 year
At last, I want to say that I will be forever proud to be Drago's student, and I will do my best to honor his legacy for the many years to come. 8/8
0
0
36
@AnsongNi
Ansong Ni
11 months
With #ACL2023NLP wrapping up, it's time to warm up for #ICML2023 ! Check out the online demo for our ICML paper, "LEVER: Learning to Verify Language-to-Code Generation using Execution", now available on 🤗 spaces! We also release code, model weights, and more in 🧵👇:
2
4
33
@AnsongNi
Ansong Ni
11 months
Arrived! I'll be at ICML till the 30th, let me know if you'd like to meet and talk about code LLM, code AI and more! DMs are welcome!
Tweet media one
0
1
30
@AnsongNi
Ansong Ni
5 months
Btw, if you haven't, check out this great course by @armancohan on AI foundation models, which covers a wide range of topics about LLMs (e.g., PET, RAG, etc). All course materials (slides, notes, hws, code) are publicly available. course website:
3
7
29
@AnsongNi
Ansong Ni
1 year
When I applied to PhD programs a few years ago, I got rejected from 14/15 schools I applied to, and Drago is the only one who accepted me. And it turns out I was not the only one in our lab. He always sees the best in the students and offers opportunities whenever he can 2/
1
0
29
@AnsongNi
Ansong Ni
2 years
Been trying out #ChatGPT today and honestly I am not very impressed. Many known issues from GPT-3 still remain. Here is my favorite failure case, where it shows no logic in its reasoning. More in the 🧵 below (1/7):
Tweet media one
6
6
27
@AnsongNi
Ansong Ni
1 year
Hey cool people at #ICLR2023 ! We are presenting this work in the poster room at station #26 today (5.3) from 11:30AM to whenever it takes! Come and talk with us about the paper, program synthesis, LLM and more! More info:
@AnsongNi
Ansong Ni
1 year
While the same NL specs can often be satisfied by different programs, most datasets only provide one for learning. This can easily lead to overfitting, as shown in the figure below. In our new #ICLR2023 paper, we show how we can mitigate this issue with self-sampling 🧵1/7
Tweet media one
4
15
82
0
2
24
@AnsongNi
Ansong Ni
1 year
I was deeply touched by the number of people sharing their stories with Drago, realizing that his influence was far beyond my imagination, which is why I took the courage to share mine. As his PhD student, I will continue his research and more importantly, spread his kindness 7/
1
0
25
@AnsongNi
Ansong Ni
11 months
Okay, the hack is to fold the cover of your iPad and insert it into the crack on top of the seat back screen, and wirelessly connect to your Mac using sidecar, works like 82% of the time.
0
0
25
@AnsongNi
Ansong Ni
7 months
if you value paper acceptance above all other qualities of your work, you're gonna have a bad time
0
2
23
@AnsongNi
Ansong Ni
3 years
In multi-doc and open-domain QA, the correct answer can often be derived from different sources of evidence, but typically only one is annotated as gold. How does this affect the training of retrieval and reasoning models? Check out our new #EMNLP2021 paper (a thread): 👇
Tweet media one
1
2
22
@AnsongNi
Ansong Ni
6 months
Loving my new @GoogleDeepMind swag 🤩🤩🤩 It's an open secret that free swag boosts employee satisfaction and productivity by 200%
Tweet media one
0
0
21
@AnsongNi
Ansong Ni
3 months
The first thing I do whenever a new "SoTA" model is released is testing its ability to reason about program execution. Unfortunately, Claude 3 Opus (large) can't even reason about the simplest program. But GPT-4 does this quite well. Claude (left) vs. GPT (right)
Tweet media one
Tweet media two
1
1
19
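The programs in the screenshots are not recoverable, but a typical execution-reasoning probe of the kind described here gives the model a short program and asks it to predict the output, which can then be checked by simply running the code. The snippet below is a hypothetical example of such a probe, not the one from the tweet.

```python
# Hypothetical probe: ask the model what this prints, then compare against the real output.
def f(xs):
    out = []
    for i, x in enumerate(xs):
        if i % 2 == 0:         # keep elements at even indices...
            out.append(x * 2)  # ...doubled
    return out

prompt = (
    "What does print(f([1, 2, 3, 4])) output? Reason step by step.\n"
    "def f(xs):\n"
    "    out = []\n"
    "    for i, x in enumerate(xs):\n"
    "        if i % 2 == 0:\n"
    "            out.append(x * 2)\n"
    "    return out\n"
)
print(f([1, 2, 3, 4]))  # ground truth the model's answer is checked against: [2, 6]
```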
@AnsongNi
Ansong Ni
1 year
He offered me this opportunity and believed in me. And I have been working hard ever since, trying to repay his trust and prove that he did not place his bet wrong. And as I have only 1 yr left in my PhD, it really pains me to think that I won't see him at the finish line 3/
1
0
19
@AnsongNi
Ansong Ni
3 years
One year after joining the lab, I was finally able to meet my awesome advisor and lab mates 😆 @ruizhang_nlp @taoyds @alexfabbri4 @Violaoreal #VaccinesWork
Tweet media one
0
0
18
@AnsongNi
Ansong Ni
1 year
It pains me to see the papers he sent in my inbox just days ago. It also pains me to think the paper I am writing will be the last paper I coauthor with him. However, I think his greatest legacy is not the papers he published, but the people he influenced throughout the years 6/
1
0
18
@AnsongNi
Ansong Ni
1 year
Quite a lot are missing: @MetaAI InCoder @Replit Replit-code @MSFTResearch MIM @Tsinghua_Uni CodeGeeX @GoogleAI PaLM-Coder/Bard for code Also, @BigscienceW BLOOM, @MetaAI LLaMA series, @AiEleuther Pythia/GPT-NeoX/J, @AnthropicAI Claude all have non-trivial coding ability
@TheTuringPost
TuringPost
1 year
8 options for code generation: ▪️ @huggingface & @ServiceNowRSRCH StarCoder ▪️ @DeepMind AlphaCode ▪️ @AmazonScience CodeWhisperer ▪️ @OpenAI Codex & @github copilot ▪️ @OpenAI ChatGPT ▪️ @salesforce Codegen ▪️ @salesforce CodeT5 ▪️ @VHellendoorn Polycoder ▪️ @tabnine Did we
Tweet media one
5
23
97
1
1
17
@AnsongNi
Ansong Ni
1 year
During our meetings, I would talk to Drago about anything, from research to career advice, from soccer to rock bands. He would show me videos of him being a translator at the 1994 World Cup, and I would share videos of me playing rock guitar solos 4/
1
0
17
@AnsongNi
Ansong Ni
7 months
So you can see your fellow reviewers' names for ICLR this year? This must have taken those self-citers asking for "missing references" by surprise lol. Amazed by the number of people who actually do that.
2
1
16
@AnsongNi
Ansong Ni
9 months
L2CEval is very much an ongoing work but I simply can't keep those results to myself any longer 😉 Behind each number is a jsonl file that saves all the output tokens and logits, and we are doing more digging as we speak 🤩 So let us know if you think of something interesting!
Tweet media one
2
0
16
@AnsongNi
Ansong Ni
3 years
Excited to finally share our new summarization toolkit - SummerTime, which is accepted to #EMNLP2021 Demos! GitHub (100+⭐️): We built this library specifically for non-expert users, and it has several merits (a thread 🧵):
Tweet media one
1
4
16
@AnsongNi
Ansong Ni
11 months
"Whenever" ended up being 3:45PM 😵 Thanks everyone who stopped by our poster yesterday! If you missed it but still would like to know more about this work, feel free to DM me!
Tweet media one
Tweet media two
@AnsongNi
Ansong Ni
11 months
Hey cool people at #ICML2023 : We will present our poster tomorrow (Thu) from 1:30PM to whenever it takes! Come and chat with me, @VictoriaLinML and @sidawxyz if you're interested in code generation, LLM or training verifiers!
Tweet media one
0
5
36
2
0
16
@AnsongNi
Ansong Ni
11 days
The AI safety I'm worried about: * Self-driving car crashes * Robot loses grip of a knife when cooking * AI wrote a bug in rocket launching software * False-negative diagnosis of diseases The AI safety I'm not worried about: * A language model going rogue and plotting against me
0
2
15
@AnsongNi
Ansong Ni
4 years
#AAAI2020 I will be giving a 20min oral presentation of our work #8812 "Merging Weak and Active Supervision for Semantic Parsing" on 12th (Wed), 15:50 at Sutton South room. Joint work with @pengchengyin and @gneubig . Paper link:
0
3
15
@AnsongNi
Ansong Ni
4 years
I had the pleasure of working with Graham during my CMU days. Btw, I had zero NLP experience when we got started, so you know Graham means it when he says "all backgrounds are welcomed" :)
@gneubig
Graham Neubig
4 years
Next year I will be looking for 1-2 PhD students who are interested in doing deep and impactful work on NLP! (areas are open, but I like multilingual NLP/compling, natural language interfaces, ML for NLP) Please apply below and mention me in your app: 1/2
5
88
252
0
0
14
@AnsongNi
Ansong Ni
1 year
If you missed this one, I am going to talk about it again in an online seminar @hkust on Monday at 9AM HKT. Thanks @shenjiasi for the invite! More info (w/ zoom link) here:
@AnsongNi
Ansong Ni
1 year
Late advertisement but I'm giving a talk @MIT_CSAIL on Program Synthesis w/ LLM this afternoon at 4PM EST. This talk is open to everyone so feel free to join in person or over zoom! More info (w/ zoom link) here:
Tweet media one
3
8
106
1
3
14
@AnsongNi
Ansong Ni
6 months
If synthetic data / self-improvement is in the formula for super-intelligence, code and math are probably the first places to see it happen.
0
2
14
@AnsongNi
Ansong Ni
1 year
I thought they were just gonna start charging, but discontinuing the API w/ a 3-day notice shows that they have no respect for the research community whatsoever…
@deliprao
Delip Rao e/ฯƒ
1 year
OAI will discontinue support for Codex models starting March 26. And just like that, all papers and ideas built atop codex (> 200 on ArXiv) will not be replicable or usable as is. Do we still think openness doesn't matter?
Tweet media one
Tweet media two
48
222
1K
2
0
14
@AnsongNi
Ansong Ni
1 year
My last email from Drago was him saying sorry about missing my talk on Sunday as he needed to get his daughter ready for bed. And when I got the chance to reply, I said "no worries, I will go out on a limb and say I did a great job :)". Little did I know he never got this msg 5/
1
0
14
@AnsongNi
Ansong Ni
2 months
Tweet media one
0
0
13
@AnsongNi
Ansong Ni
1 year
Wanna see a movie after the NeurIPS deadline, but "Transformers: Rise of the Beasts"? Sounds like a documentary about LLMs
Tweet media one
0
0
13
@AnsongNi
Ansong Ni
8 months
I wasn't really buying the whole "multi-modal LLM is the future" thing till I used GPT-4V. This is mind-blowing, can't imagine how many use cases are out there.
Tweet media one
1
0
13
@AnsongNi
Ansong Ni
1 year
@WenhuChen NLP + Programming Languages / Software Engineering. Time to follow up on your "program of thoughts" paper 😉
1
1
13
@AnsongNi
Ansong Ni
1 year
In addition to the paper, here is a nice photo with @vesko_st and @VictoriaLinML in Menlo Park to commemorate the great summer of 2022 😆
Tweet media one
1
1
13
@AnsongNi
Ansong Ni
4 years
Wait, so no "from typing import List, Dict, Tuple and a million others" anymore?
@svpino
Santiago
4 years
Python 3.9 🐍 is out! 🥳 Here are the 5 new features you care about. 🧵👇
47
2K
7K
1
2
13
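For context on the quoted Python 3.9 release: PEP 585 lets built-in collection types be used directly as generics in annotations, which is what makes most of those `typing` imports unnecessary. A minimal before/after:

```python
# Before 3.9, annotations needed the typing aliases:
#   from typing import List, Dict, Tuple
#   def count_words(lines: List[str]) -> Dict[str, int]: ...

# Python 3.9+ (PEP 585): the built-in types work as generics directly.
def count_words(lines: list[str]) -> dict[str, int]:
    counts: dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

pairs: list[tuple[str, int]] = sorted(count_words(["a b a"]).items())
print(pairs)  # [('a', 2), ('b', 1)]
```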
@AnsongNi
Ansong Ni
1 year
If one day @gregd_nlp stops making memes… I can sleep at night, knowing that he has passed the skill on to his student @xiye_nlp
@xiye_nlp
Xi Ye
1 year
As an ordinary PhD student studying NLP, I have mixed feelings about GPT-4. It is certainly disheartening as it makes me question the worth of my own research. But the thrill is too overwhelming 😀
Tweet media one
66
158
2K
0
0
13
@AnsongNi
Ansong Ni
1 year
Just wrote a recommendation letter for the first time (in support of a tenure case, as a student). This feeling of being able to support someone who helped me tremendously in the past is truly great.
0
0
12
@AnsongNi
Ansong Ni
2 months
To help LLMs better understand program execution traces, we propose an inline trace representation, which encodes execution states as updated variable values within inline comments. We also add ordinal numbers "(0) ..." to denote execution order [2/n]
Tweet media one
1
0
12
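To make the format concrete, here is a hypothetical rendering of such an inline trace on a toy function; the exact serialization used in NExT may differ in its details.

```python
# Execution states appear as variable values inside comments, and "(0)", "(1)", ...
# mark the order in which the lines execute (illustration only).
def running_max(xs):
    best = xs[0]          # (0) best = 3
    for x in xs[1:]:      # (1) x = 1      (3) x = 5
        if x > best:      # (2) False      (4) True
            best = x      #                (5) best = 5
    return best           # (6) returns 5

print(running_max([3, 1, 5]))  # 5
```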
@AnsongNi
Ansong Ni
1 year
We conducted experiments on 4️⃣ NL2Code tasks and 3️⃣ code LLMs. Results show that LEVER consistently improves performance across different LLMs and datasets, while achieving new SOTA results on all of them using code-davinci-002. (3/n)
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
1
12
@AnsongNi
Ansong Ni
3 months
@agihippo Like "we instruction tuned on xx dataset" and got massive 20% improvements and beats all other OS models. Then after tracking down appendix/citation you realize "xx" is in-domain data generated by GPT-4
2
0
12
@AnsongNi
Ansong Ni
1 year
Wow, we had ~20 attendees in person and 40+ more on zoom. Thanks everyone for dropping by! Also thanks a lot for hosting me, @minimario1729 and Armando! Link to the recordings: (start from minute 27)
Tweet media one
@AnsongNi
Ansong Ni
1 year
Late advertisement but I'm giving a talk @MIT_CSAIL on Program Synthesis w/ LLM this afternoon at 4PM EST. This talk is open to everyone so feel free to join in person or over zoom! More info (w/ zoom link) here:
Tweet media one
3
8
106
2
2
12
@AnsongNi
Ansong Ni
3 months
@huybery I think it's easier for people with 1,000 citations to go from 0->5k followers on twitter than for people with 5k followers to go from 0->1k citations
1
0
11
@AnsongNi
Ansong Ni
3 years
@lvwerra This is awesome! I am wondering if you've tested how much GPU RAM it is able to use? Since the CPU and GPU share the same RAM, it would be wonderful if it's actually able to take advantage of the whole memory.
1
0
11
@AnsongNi
Ansong Ni
8 months
@Swarooprm7
Swaroop Mishra
8 months
Introducing 🔥 'Step back prompting' 🔥 fueled by the power of abstraction. Joint work with the awesome collaborators at @GoogleDeepMind : @HuaixiuZheng , @xinyun_chen_ , @HengTze , @edchi , @quocleix , @denny_zhou . LLMs struggle to answer specific questions such as: "Estella
Tweet media one
6
71
351
1
0
10
@AnsongNi
Ansong Ni
1 year
Yet reviewer 2: I don't think combining code generation from LLM with code execution is practical in "real-world applications".
@sama
Sam Altman
1 year
we are starting our rollout of ChatGPT plugins. you can install plugins to help with a wide variety of tasks. we are excited to see what developers create!
615
3K
18K
1
0
10
@AnsongNi
Ansong Ni
6 months
My gf @ZiyueZoeyYang got me the best Christmas present for an Xoogler
Tweet media one
0
0
10
@AnsongNi
Ansong Ni
1 year
LEVER is trained to verify the correctness of a program based on the NL input, the program itself and its exec results. Then the verification prob is used in combination with the generation prob to rerank the program candidates sampled from the LLMs (2/n)
Tweet media one
1
2
10
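A minimal sketch of how such a combination could work, assuming the verification probability is simply multiplied with the generation probability (i.e., their log-probs are summed); the exact aggregation in LEVER (e.g., marginalizing over candidates with identical execution results) may differ.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    program: str
    gen_logprob: float  # log P(program | NL input), from the code LLM
    verify_prob: float  # P(correct | NL input, program, execution result), from the verifier

def rerank(candidates: list) -> list:
    # Score = generation log-prob + verification log-prob; higher is better.
    def score(c: Candidate) -> float:
        return c.gen_logprob + math.log(max(c.verify_prob, 1e-9))
    return sorted(candidates, key=score, reverse=True)

# Hypothetical usage with two sampled candidates:
cands = [
    Candidate("def f(x): return x + 1", gen_logprob=-2.3, verify_prob=0.91),
    Candidate("def f(x): return x - 1", gen_logprob=-1.9, verify_prob=0.12),
]
print(rerank(cands)[0].program)  # the verifier-preferred candidate wins the reranking
```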
@AnsongNi
Ansong Ni
2 years
May have found the shortest text to make DALL-E fail: "keyboard". Ablations: 1) "computer keyboard" also fails; 2) adding "English" does not help. More suggestions are welcome 😉
Tweet media one
Tweet media two
Tweet media three
0
0
9
@AnsongNi
Ansong Ni
1 year
To me, the key to #ChatGPT's wild popularity is not its technological innovation but 1) adopting the conversational format; 2) having an open web demo. This makes it possible for everyone to try it right in a dialogue box.
@sundarpichai
Sundar Pichai
1 year
1/ In 2021, we shared next-gen language + conversation capabilities powered by our Language Model for Dialogue Applications (LaMDA). Coming soon: Bard, a new experimental conversational #GoogleAI service powered by LaMDA.
743
3K
15K
1
0
9
@AnsongNi
Ansong Ni
5 months
Writing code is much more enjoyable and satisfying than writing papers.
2
0
9
@AnsongNi
Ansong Ni
2 months
As an iterative self-training method, NExT first bootstraps a set of high-quality chain-of-thought rationales, by naturalizing the execution traces into execution-aware CoT rationales written in NL. Then we finetune LLMs on the rationales that lead to correct code outputs [3/n]
Tweet media one
1
0
9
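Putting the thread's description together, the loop could be sketched roughly as below; `generate_rationale_and_fix`, `run_tests`, and `finetune` are hypothetical callables standing in for model sampling, execution-based filtering, and finetuning, and the real NExT recipe differs in its details (sampling strategy, trace format, filtering).

```python
from typing import Callable, Iterable, List, Tuple

def next_self_training(
    model,
    tasks: Iterable,
    generate_rationale_and_fix: Callable,  # (model, task, trace) -> (NL rationale, fixed program)
    run_tests: Callable,                   # (program, tests) -> bool
    finetune: Callable,                    # (model, examples) -> updated model
    num_iterations: int = 3,
    samples_per_task: int = 8,
):
    for _ in range(num_iterations):
        examples: List[Tuple] = []
        for task in tasks:
            trace = task.execution_trace  # inline execution trace of the failing program
            for _ in range(samples_per_task):
                rationale, program = generate_rationale_and_fix(model, task, trace)
                if run_tests(program, task.tests):  # keep only execution-verified samples
                    examples.append((task, trace, rationale, program))
        model = finetune(model, examples)  # train on self-generated, verified rationales
    return model
```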
@AnsongNi
Ansong Ni
6 months
Your GPUs might have more FLOPS but mine glow in the dark
Tweet media one
2
0
9
@AnsongNi
Ansong Ni
6 months
POV: start writing cpp again after years of Python-only programming
Tweet media one
1
0
8
@AnsongNi
Ansong Ni
3 months
I sent this pic to my friends to brag and they asked if I'm testing new code LLMs I created. Idk if I should take this as a compliment or an insult 😂
Tweet media one
1
0
8
@AnsongNi
Ansong Ni
2 months
Tracking the training process, we found that reasoning with execution traces is crucial for the success of NExT. We also found that learning to reason in natural language not only provides interpretability, but also improves generalization and sample diversity [7/n]
Tweet media one
1
1
8
@AnsongNi
Ansong Ni
1 year
My collaborators would always tell me to remove the latex comments before submitting to arxiv, and I was like "who would be bored enough to dig up latex comments" and here we go 😅
You might know that MSFT has released a 154-page paper () on #OpenAI #GPT4 , but do you know they also commented out many parts from the original version? 🧵: A thread of hidden information from their latex source code [1/n]
Tweet media one
27
314
1K
0
0
8
@AnsongNi
Ansong Ni
1 year
I would typically write long and thorough reviews but it's just so frustrating to see some random error thrown and 2+ hrs effort gone. @openreviewnet isn't bad, just saying, @aclmeeting . At least it has auto-save.
Tweet media one
2
1
8
@AnsongNi
Ansong Ni
1 year
Sending these tweets on my way to Kigali 🇷🇼! Hope to see everyone there! Feel free to DM me if you'd like to chat about program synthesis, LLM + Code, neuro-symbolic methods, and many more! 7/7
Tweet media one
1
1
8
@AnsongNi
Ansong Ni
2 months
We experiment with two program repair (debugging) datasets, MBPP-R and HumanEvalFix+, which are MBPP and HE+ re-purposed for program repair. On MBPP-R, NExT improves the program fix rate of the PaLM 2-L model by 26.1%, and it also yields large improvements on HumanEvalFix+ [4/n]
Tweet media one
1
0
8
@AnsongNi
Ansong Ni
1 year
Things like this make me feel life is so unfair. Please consider making donations to help Drago's family; any amount will be greatly appreciated.
@noemieelhadad
Noรฉmie Elhadad
1 year
My friend Drago just passed away. He left behind his wife, Axinia, and daughters, Laura and Victoria, who has a disability and requires care. We set up a GoFundMe so that Axinia can provide Victoria with the care she needs. If you can, please contribute:
0
33
45
0
2
7
@AnsongNi
Ansong Ni
2 months
Is anyone actually surprised
Tweet media one
1
0
7
@AnsongNi
Ansong Ni
9 months
💯💯💯 "AI is software" will also help people understand that AI has vulnerabilities and may malfunction just like any software, and we don't have to prove a piece of software to be bug-free before deploying it.
@rasbt
Sebastian Raschka
9 months
@ylecun I sometimes wonder if saying AI is software would help in these contexts. Most people nowadays know what software is, and it's also true that the existence of open-source software has not caused any specific harm (afaik). On the contrary, it helped a lot with scientific progress.
6
1
51
2
0
7
@AnsongNi
Ansong Ni
5 months
Wow when I was preparing a guest lecture about code LLMs, I was like "we should really have a course on this" and here we go!
@wellecks
Sean Welleck
5 months
Teaching a new course on Neural Code Generation with @dan_fried ! Here is the lecture on pretraining and scaling laws:
Tweet media one
3
73
406
0
0
7
@AnsongNi
Ansong Ni
11 days
TIL you can sign a letter anonymously
@AndrewCurran_
Andrew Curran
11 days
A group of current, and former, OpenAI employees - some of them anonymous - along with Yoshua Bengio, Geoffrey Hinton, and Stuart Russell have released an open letter this morning entitled 'A Right to Warn about Advanced Artificial Intelligence'.
Tweet media one
90
230
807
0
0
7
@AnsongNi
Ansong Ni
4 months
"This offers a partial answer to the long-standing question: does code training improve reasoning abilities? We believe it does, at least for mathematical reasoning."
@deepseek_ai
DeepSeek
4 months
🚀 DeepSeekMath: Approaching Mathematical Reasoning Capability of GPT-4 with a 7B Model. Highlights: - Continue pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math tokens from Common Crawl. - Introduce GRPO, a variant of PPO, that enhances mathematical reasoning and reduces
Tweet media one
22
168
946
0
0
7
@AnsongNi
Ansong Ni
2 years
Just when I thought I couldn't be more excited about my internship at @MetaAI this coming summer :) Hope we can get an open source large code LM like codex soon!
@AIatMeta
AI at Meta
2 years
Today Meta AI is sharing OPT-175B, the first 175-billion-parameter language model to be made available to the broader AI research community. OPT-175B can generate creative text on a vast range of topics. Learn more & request access:
50
643
2K
0
0
7
@AnsongNi
Ansong Ni
1 year
"Prompt engineering is AI babysitting" - @tprstly Might've been the best quote I've heard in 2023.
@OfficialLoganK
Logan Kilpatrick
1 year
Hot take 🔥: you should not become a prompt engineer, even if someone paid you to be one. Here's why 🧵
78
100
761
0
0
7
@AnsongNi
Ansong Ni
1 year
Additional studies show that the learning of LEVER is data efficient and the learned knowledge is transferable across different LLMs for the same task. (5/n)
Tweet media one
Tweet media two
1
1
7
@AnsongNi
Ansong Ni
10 months
@TaliaRinger @DynamicWebPaige An author here 🙋‍♂️ We actually debated a lot on whether to use the word "verify" as (formal) verification indeed means very different things in PL. But in the end we feel we need to be consistent with prev works on "verifiers" like
2
0
7
@AnsongNi
Ansong Ni
1 year
Lastly, I want to thank everyone I've shared a meal, a coffee/drink, or even just a conversation with. I'm like 96.2% sure that I'm the only one from Yale that's attending in person this year. Thanks for keeping me company and making me feel I'm not alone. 4/n, n=4
Tweet media one
0
1
7
@AnsongNi
Ansong Ni
2 months
Has anyone tested DeepSeek-Coder on APPS? Was reviewing a paper and saw DS-Coder *6.7B* is 12% better than GPT-4 on APPS?? But according to DS's paper it's definitely worse than GPT-4 on other benchmarks. Or is it an open secret that DS-Coder is SFT-ed on APPS
3
0
6
@AnsongNi
Ansong Ni
1 year
The majority of this work was completed during a research internship @MetaAI . Had a great time working with my collaborators: @sriniiyer88 @dragomir_radev @vesko_st @scottyih , and my two wonderful co-mentors @sidawxyz and @VictoriaLinML . (8/n, n=8)
1
2
6
@AnsongNi
Ansong Ni
2 years
Was hoping to see an effort like this for so long. Happy to join the project (if they'll allow me) and maybe you should too :)
@BigCodeProject
BigCode
2 years
print("Hello world! ๐ŸŽ‰") Excited to announce the BigCode project led by @ServiceNowRSRCH and @huggingface ! In the spirit of BigScience we aim to develop large language models for code in an open and responsible way. Join here: A thread with our goals๐Ÿงต
Tweet media one
5
72
214
0
1
6
@AnsongNi
Ansong Ni
1 year
More results for the InCoder and CodeGen models are shown here. Ablation studies show that exec info is essential for the success of LEVER, and it works seamlessly with weakly-supervised settings w/o large perf drop. (4/n)
Tweet media one
Tweet media two
1
1
6
@AnsongNi
Ansong Ni
1 year
The risk of taking "low-hanging fruit" in AI4Code research is no longer just getting scooped by other researchers, but also by companies' new products 🤦‍♂️ So we gotta dream BIG now 😄
@marktenenholtz
Mark Tenenholtz
1 year
Microsoft is releasing Github Copilot X 🎉 It includes: • AI-generated answers from code docs • Chat interface for code suggestions • Copilot for the command line • Voice interface with Copilot • Copilot for pull requests Okay, NOW it's so over.
63
440
3K
0
0
6