Excited to share our work at
@GoogleDeepMind
!
We propose Naturalized Execution Tuning (NExT), a self-training method that drastically improves LLMs' ability to reason about code execution by learning to inspect execution traces and generate chain-of-thought rationales 🧵👇
I recently gave a guest lecture (outline below) about LLMs for code and math for the "AI Foundation Models" course at Yale, and I've just made the slides and recordings publicly available:
slides:
recordings:
Execution results are strong indicators of program correctness. But how can we improve LLMs for code generation with execution?
In our new paper, we propose LEVER, a simple method that learns to verify and rerank LLM-generated programs with their execution results. 🧵👇 (1/n)
As one of Drago's current PhD students, I am still in shock and disbelief. Drago means so much more to me than the word "advisor" could ever entail. He is a great mentor, a good friend, and one of the kindest and most down-to-earth people I've ever known 1/
How good are current LLMs at translating natural language into executable code?
Introducing L2CEval, where we benchmark language-to-code (L2C) generation abilities of 54 models from 12 orgs, testing on 7 tasks from 3 core domains.
Here is what we found in this first release of
Late advertisement, but I'm giving a talk
@MIT_CSAIL
on Program Synthesis w/ LLM this afternoon at 4PM EST. This talk is open to everyone so feel free to join in person or over zoom! More info (w/ zoom link) here:
While the same NL spec can often be satisfied by different programs, most datasets provide only one for learning. This can easily lead to overfitting, as the figure below shows.
In our new
#ICLR2023
paper, we show how we can mitigate this issue with self-sampling 🧵 1/7
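Roughly, the idea is to let the model generate additional correct programs for itself to learn from. A minimal sketch in Python (simplified, and the helper names below are made up, not from the paper):

def self_sample(sample_fn, gold_program, check_fn, n_samples=32):
    """Collect extra training targets: self-sampled programs that pass the checks."""
    extra = []
    for program in sample_fn(n_samples):
        if program != gold_program and check_fn(program):
            extra.append(program)
    return [gold_program] + extra  # several correct targets instead of just one

# Toy usage with stand-in functions:
def check(program):
    ns = {}
    exec(program, ns)            # run the candidate program
    return ns["f"](3) == 9       # verify against a test case

sample = lambda n: ["def f(x): return x ** 2", "def f(x): return x + x"]
gold = "def f(x): return x * x"
print(self_sample(sample, gold, check))  # keeps the gold program plus the x**2 variant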
HumanEval & MBPP are top datasets in evaluating LLMs for code. Despite common suspicion of contamination, quantifying it is hard, as it would require a massive pairwise comparison between the examples in these datasets and the pretraining corpus - and we've done exactly this 🧵👇
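As a rough intuition (not necessarily the exact method in our paper), a surface-level check could flag a benchmark example that shares a long n-gram with any pretraining document; the n=10 threshold below is an arbitrary illustration:

def ngrams(text, n=10):
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(example, corpus_docs, n=10):
    # Flag the example if any long n-gram also appears in a pretraining document.
    ex = ngrams(example, n)
    return any(ex & ngrams(doc, n) for doc in corpus_docs)

example = "def add(a, b): return a + b  # add two numbers and return the resulting sum"
corpus = ["... some web page ... def add(a, b): return a + b  # add two numbers and return the resulting sum ..."]
print(is_contaminated(example, corpus))  # True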
This
#ICLR2023
is my first in-person conference post-covid and it's the most rewarding experience ever. I reunited with my friends, collaborators, mentors… some of whom I haven't seen in years and some I actually met in person for the first time. 1/n
This is my last week as an intern
@GoogleDeepMind
. The past summer was nothing but inspiring. Thanks to everyone on the learning for code team for making me feel so welcome since day one. Next, I'll be on the job market later this year, stay tuned!
Seems like a good time to share that I am joining Google Brain (now Google DeepMind) as a research intern this summer! I will be working on code generation + LLM with
@pengchengyin
in
@RandomlyWalking
's team.
The phenomenal teams from Google Research's Brain and
@DeepMind
have made many of the seminal research advances that underpin modern AI, from Deep RL to Transformers. Now we're joining forces as a single unit, Google DeepMind, which I'm thrilled to lead!
So…
In 2021, Codex was out during the 1st month of my internship
@MSFTResearch
;
In 2022, OPT was released right before my internship
@MetaAI
;
Now in 2023, PaLM 2 is out 3 weeks before my internship
@DeepMind
I sure know how to appear in the right place at the right time.
Make sure to check out the technical report for fun examples, details about building the model, and more:
#GoogleIO
#GoogleIO2023
😭 Am also tearing up a little. So dang proud of this awesome team and excited to continue this work at Google DeepMind:
Switching from one library to another (e.g., tf->torch) and tired of the manual refactoring that needs to be done?
We aim to solve this problem with our new
@ICSEconf
#icse21
paper: "SOAR: A Synthesis Approach for Data Science API Refactoring".
[1/4]
Hey cool people at
#ICML2023
: We will present our poster tomorrow (Thu) from 1:30PM to whenever it takes! Come and chat with me,
@VictoriaLinML
and
@sidawxyz
if you're interested in code generation, LLMs or training verifiers!
With
#ACL2023NLP
wrapping up, it's time to warm up for
#ICML2023
!
Check out the online demo for our ICML paper, "LEVER: Learning to Verify Language-to-Code Generation using Execution", now available on 🤗 Spaces! We also release code, model weights, and more in 🧵👇:
Btw, if you haven't, check out this great course by
@armancohan
on AI foundation models, which covers a wide range of topics about LLMs (e.g., PET, RAG, etc). All course materials (slides, notes, hws, code) are publicly available.
course website:
When I applied to PhD programs a few years ago, I got rejected from 14/15 schools I applied to, and Drago is the only one who accepted me. And it turns out I was not the only one in our lab. He always sees the best in the students and offers opportunities whenever he can 2/
Been trying out
#ChatGPT
today and honestly I am not very impressed. Many known issues for GPT-3 still remain. Here is my favorite failure case, where it shows no logic in its reasoning. More in the 🧵 below (1/7):
Hey cool people at
#ICLR2023
! We are presenting this work in the poster room at station
#26
today (5.3) from 11:30AM to whenever it takes! Come and talk with us about the paper, program synthesis, LLM and more!
More info:
I was deeply touched by the number of people sharing their stories with Drago, realizing that his influence reached far beyond my imagination, which is why I gathered the courage to share mine. As his PhD student, I will continue his research and, more importantly, spread his kindness 7/
Okay, the hack is to fold the cover of your iPad and insert it into the crack on top of the seat back screen, and wirelessly connect to your Mac using sidecar, works like 82% of the time.
In multi-doc and open-domain QA, the correct answer can often be derived from different sources of evidence, but typically only one is annotated as gold. How does this affect the training of retrieval and reasoning models?
Check out our new
#EMNLP2021
paper (a thread): 👇
The first thing I do whenever a new "SoTA" model is released is to test its ability to reason about program execution.
Unfortunately, Claude 3 Opus (large) can't even reason about the simplest program. But GPT-4 does this quite well.
Claude (left) vs. GPT (right)
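For the curious, the probe is as simple as this kind of thing (a toy example, not the exact prompt I used):

probe = """What does this program print? Reason step by step, then give the final output.

    x = [1, 2, 3]
    y = x
    y.append(4)
    print(len(x), sum(y))
"""
# Expected answer: "4 10", since x and y alias the same list.
# Send `probe` to the model under test and compare its final answer against "4 10".
print(probe)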
He offered me this opportunity and believed in me. And I have been working hard ever since, trying to repay his trust and prove that he did not place his bet wrong. And as I have only 1 yr left in my PhD, it really pains me to think that I won't see him at the finish line 3/
It pains me to see the papers he sent in my inbox just days ago. It also pains me to think the paper I am writing will be the last paper I coauthor with him. However, I think his greatest legacy is not the papers he published, but the people he influenced throughout the years 6/
During our meetings, I would talk to Drago about anything, from research to career advice, from soccer to rock bands. He would show me videos of him working as a translator at the 1994 World Cup, and I would share videos of me playing rock guitar solos 4/
So you can see your fellow reviewers' names for ICLR this year? This must have taken those self-citers for "missing references" by surprise lol. Amazed by the number of people who actually do that.
L2CEval is very much ongoing work, but I simply can't keep these results to myself any longer.
Behind each number is a jsonl file that saves all the output tokens and logits, and we are doing more digging as we speak 🤩 So let us know if you think of something interesting!
Excited to finally share our new summarization toolkit, SummerTime, which is accepted to
#EMNLP2021
Demos!
GitHub (100+ ⭐️):
We built this library specifically for non-expert users, and it comes with several merits (a thread 🧵):
"Whenever" ended up being 3:45PM. Thanks everyone who stopped by our poster yesterday! If you missed it but still would like to know more about this work, feel free to DM me!
The AI safety I'm worried about:
* Self-driving car crashes
* Robot loses grip of a knife while cooking
* AI writes a bug in rocket-launching software
* False-negative diagnosis of diseases
The AI safety I'm not worried about:
* A language model going rogue and plotting against me
#AAAI2020
I will be giving a 20min oral presentation of our work
#8812
"Merging Weak and Active Supervision for Semantic Parsing" on the 12th (Wed), 15:50, at the Sutton South room. Joint work with
@pengchengyin
and
@gneubig
. Paper link:
I had the pleasure of working with Graham during my CMU days. Btw, I had zero NLP experience when we got started, so you know Graham means it when he says "all backgrounds are welcomed" :)
Next year I will be looking for 1-2 PhD students who are interested in doing deep and impactful work on NLP! (areas are open, but I like multilingual NLP/compling, natural language interfaces, ML for NLP)
Please apply below and mention me in your app: 1/2
If you missed this one, I am going to talk about it again in an online seminar
@hkust
on Monday at 9AM HKT. Thanks
@shenjiasi
for the invite! More info (w/ zoom link) here:
I thought they were just gonna start charging, but discontinuing the API w/ a 3-day notice shows that they have no respect for the research community whatsoever…
OAI will discontinue support for Codex models starting March 26. And just like that, all papers and ideas built atop Codex (> 200 on arXiv) will not be replicable or usable as is. Do we still think openness doesn't matter?
My last email from Drago was him saying sorry about missing my talk on Sunday as he needed to get his daughter ready for bed. And when I got the chance to reply, I said "no worries, I will go out on a limb and say I did a great job :)". Little did I know he never got this msg 5/
I wasn't really buying the whole "multi-modal LLM is the future" thing till I used GPT-4V. This is mind-blowing, can't imagine how many use cases are out there.
As an ordinary PhD student studying NLP, I have mixed feelings about GPT-4. It is certainly disheartening, as it makes me question the worth of my own research. But the thrill is too overwhelming 😀
Just wrote a recommendation letter for the first time (in support of a tenure case, as a student). This feeling of being able to support someone who helped me tremendously in the past is truly great.
To help LLMs better understand program execution traces, we propose an inline trace representation, which encodes execution states as updated variable values within inline comments. We also add ordinal numbers "(0) ..." to denote execution order [2/n]
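To make this concrete, here is roughly what an inline trace looks like (an illustrative example I made up, not one taken from the paper):

def count_evens(nums):    # (0) nums = [3, 4]
    count = 0             # (1) count = 0
    for n in nums:        # (2) n = 3        (4) n = 4
        if n % 2 == 0:    # (3) -> False     (5) -> True
            count += 1    #                  (6) count = 1
    return count          # (7) returns 1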
We conducted experiments on 4️⃣ NL2Code tasks and 3️⃣ code LLMs. Results show that LEVER consistently improves performance across different LLMs and datasets, while achieving new SOTA results on all of them using code-davinci-002. (3/n)
@agihippo
Like "we instruction tuned on xx dataset" and got massive 20% improvements and beats all other OS models.
Then after tracking down appendix/citation you realize "xx" is in-domain data generated by GPT-4
Wow, we had ~20 attendees in person and 40+ more on zoom. Thanks everyone for dropping by! Also thanks a lot for hosting me,
@minimario1729
and Armando! Link to the recordings: (start from minute 27)
@huybery
I think it's easier for people with 1,000 citations to go from 0->5k followers on Twitter than for people with 5k followers to go from 0->1k citations
@lvwerra
This is awesome! I am wondering if you've tested how much GPU RAM it is able to use? Since the CPU and GPU share the same RAM, it would be wonderful if it's actually able to take advantage of the whole memory.
we are starting our rollout of ChatGPT plugins.
you can install plugins to help with a wide variety of tasks. we are excited to see what developers create!
LEVER is trained to verify the correctness of a program based on the NL input, the program itself, and its execution results. The verification probability is then combined with the generation probability to rerank the program candidates sampled from the LLMs (2/n)
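In code, the reranking step boils down to something like this (a simplified sketch; the exact scoring in the paper may differ):

import math

def rerank(candidates):
    # candidates: list of (program, generation_logprob, verifier_prob) tuples.
    # Rank by the sum of the generation log-prob and the verifier's log-prob.
    scored = [(gen_lp + math.log(max(v_p, 1e-9)), prog)
              for prog, gen_lp, v_p in candidates]
    return [prog for _, prog in sorted(scored, reverse=True)]

# Toy usage: the verifier demotes a likely-but-wrong candidate.
cands = [("df.sort('age')", -1.2, 0.10),
         ("df.sort_values('age')", -2.0, 0.95)]
print(rerank(cands)[0])  # -> df.sort_values('age')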
May have found the shortest text to make DALL-E fail: "keyboard". Ablations: 1) "computer keyboard" also fails; 2) adding "English" does not help. More suggestions are welcome!
To me, the key to
#ChatGPT
's wild popularity is not its technological innovation but 1) adopting the conversational format; 2) having an open web demo. This makes it possible for everyone to try it in just a dialogue box.
1/ In 2021, we shared next-gen language + conversation capabilities powered by our Language Model for Dialogue Applications (LaMDA). Coming soon: Bard, a new experimental conversational
#GoogleAI
service powered by LaMDA.
As an iterative self-training method, NExT first bootstraps a set of high-quality chain-of-thought rationales by naturalizing execution traces into execution-aware CoT rationales written in NL. Then we finetune LLMs on the rationales that lead to correct code outputs [3/n]
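One round of this, heavily simplified (the function names below are placeholders, not the paper's actual setup):

def next_round(model, problems, sample_fn, run_tests, finetune_fn, k=16):
    # Sample k (rationale, fixed_code) pairs per problem, conditioned on the
    # naturalized execution trace of the buggy program; keep only the samples
    # whose fixed program passes the tests, then fine-tune on the survivors.
    kept = []
    for prob in problems:
        for rationale, fixed_code in sample_fn(model, prob["code"], prob["trace"], k):
            if run_tests(fixed_code, prob["tests"]):
                kept.append((prob["code"], prob["trace"], rationale, fixed_code))
    return finetune_fn(model, kept)  # the fine-tuned model for the next iteration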
Tracking the training process, we found that reasoning with execution traces is crucial for the success of NExT. We also found that learning to reason in natural language not only provides interpretability, but also improves generalization and sample diversity [7/n]
My collaborators would always tell me to remove the comments from the LaTeX source before submitting to arXiv, and I was like "who would be bored enough to dig up LaTeX comments", and here we go.
You might know that MSFT has released a 154-page paper () on
#OpenAI
#GPT4
, but did you know they also commented out many parts from the original version?
🧵: A thread of hidden information from their LaTeX source code
[1/n]
I would typically write long and thorough reviews, but it's just so frustrating to see some random error thrown and 2+ hrs of effort gone.
@openreviewnet
isn't bad, just saying,
@aclmeeting
. At least it has auto-save.
Sending these tweets on my way to Kigali 🇷🇼! Hope to see everyone there! Feel free to DM me if you'd like to chat about program synthesis, LLM + Code, neuro-symbolic methods, and many more! 7/7
We experiment with two program repair (debugging) datasets, MBPP-R and HumanEvalFix+, which are MBPP and HE+ re-purposed for program repair. On MBPP-R, NExT improves the program fix rate of the PaLM 2-L model by 26.1%, and it also yields large improvements on HeFix+ [4/n]
My friend Drago just passed away. He left behind his wife, Axinia, and daughters, Laura and Victoria, who has a disability and requires care. We set up a GoFundMe so that Axinia can provide Victoria with the care she needs. If you can, please contribute:
💯💯💯 "AI is software" will also help people understand that AI has vulnerabilities and may malfunction just like any software, and we don't have to prove a piece of software to be bug-free before deploying it.
@ylecun
I sometimes wonder if saying AI is software would help in these contexts. Most people nowadays know what software is, and it's also true that the existence of open-source software has not caused any specific harm (afaik). On the contrary, it has helped a lot with scientific progress.
A group of current, and former, OpenAI employees - some of them anonymous - along with Yoshua Bengio, Geoffrey Hinton, and Stuart Russell have released an open letter this morning entitled 'A Right to Warn about Advanced Artificial Intelligence'.
"This offers a partial answer to the long-standing question: does code training improve reasoning abilities? We believe it does, at least for mathematical reasoning."
DeepSeekMath: Approaching Mathematical Reasoning Capability of GPT-4 with a 7B Model.
Highlights:
- Continue pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math tokens from Common Crawl.
- Introduce GRPO, a variant of PPO, that enhances mathematical reasoning and reduces
Just when I thought I couldn't be more excited about my internship at
@MetaAI
this coming summer :) Hope we can get an open-source large code LM like Codex soon!
Today Meta AI is sharing OPT-175B, the first 175-billion-parameter language model to be made available to the broader AI research community. OPT-175B can generate creative text on a vast range of topics. Learn more & request access:
Additional studies show that the learning of LEVER is data-efficient and the learned knowledge is transferable across different LLMs for the same task. (5/n)
@TaliaRinger
@DynamicWebPaige
An author here 🙋 We actually debated a lot on whether to use the word "verify", as (formal) verification indeed means very different things in PL. But in the end we felt we needed to be consistent with prior works on "verifiers" like
Lastly, I want to thank everyone that I've shared a meal, a coffee/drink, or even just a conversation with. I'm like 96.2% sure that I'm the only one from Yale that's attending in person this year. Thanks for keeping me company and making me feel I'm not alone. 4/n, n=4
Has anyone tested DeepSeek-Coder on APPS? Was reviewing a paper and saw DS-Coder *6.7B* is 12% better than GPT-4 on APPS?? But according to DS's paper it's definitely worse than GPT-4 on other benchmarks. Or is it an open secret that DS-Coder is SFT-ed on APPS?
print("Hello world! ๐")
Excited to announce the BigCode project led by
@ServiceNowRSRCH
and
@huggingface
! In the spirit of BigScience we aim to develop large language models for code in an open and responsible way.
Join here:
A thread with our goals 🧵
More results for the InCoder and CodeGen models are shown here. Ablation studies show that execution info is essential to the success of LEVER, and it works seamlessly in weakly-supervised settings without a large performance drop. (4/n)
The risk of taking "low-hanging fruit" in AI4Code research is no longer just getting scooped by other researchers, but also by companies' new products 🤦 So we gotta dream BIG now.
Microsoft is releasing GitHub Copilot X
It includes:
• AI-generated answers from code docs
• Chat interface for code suggestions
• Copilot for the command line
• Voice interface with Copilot
• Copilot for pull requests
Okay, NOW it's so over.