Ashwinee Panda @PandaAshwinee Twitter profile

Pinned Tweet

Ashwinee Panda

@PandaAshwinee

2 months

Some cool stuff is coming, stay tuned =)

Jan Leike

@janleike

2 months

The superalignment fast grants are now decided! We got a *ton* of really strong applications, so unfortunately we had to say no to many we're very excited about. There is still so much good research waiting to be funded. Congrats to all recipients!

18

22

379

6

2

149

Last Seen Profiles

@BerlinNANDAMURI

@andrew_mummert

@MarkusBillions

@socialistsuresh

@Aten73110311

@stwmaniax

@TheSafeAnnaAnon

@nijitanbirthday

@leslie_bit

@Emilyyyyy8888

@SYNscaner

@GGM_SOARS

@MaciekGramatyka

@PajtimStatovci

@arsicika22

@stw_pdg

@jandakembangstw

@koho_yahiko

@RayMirandaRocks

@dHe8abDeEK51033

@TheSafeAnnaAnon

@paniaofthekeef

@RealDWillis

@csfelementary

@2couple1

@MarcusSantoroDJ

@NRCS_AssistSup

@lauren_minette

@PenzRob

@Somaalilanders

@meganeichiba_cp

@MistressSkylarS

@swindonadver

@ofcourselemans

@5211314_bit

@cult_leader_en

Ashwinee Panda

@PandaAshwinee

3 months

Mfs will get a setup like this and then ship the best cv paper you've ever seen

dan

@maybe_dan_

3 months

Mfs will get a setup like this and then ship the most ass code you've ever seen

197

734

15K

10

9

293

Ashwinee Panda

@PandaAshwinee

2 months

All these LLM watermarking / detection papers being written and the best tool we have is ctrl+f “delve”

Jeremy Nguyen ✍🏼 🚢

@JeremyNguyenPhD

2 months

Are medical studies being written with ChatGPT? Well, we all know ChatGPT overuses the word "delve". Look below at how often the word 'delve' is used in papers on PubMed (2023 was the first full year of ChatGPT).

411

2K

13K

3

15

202

Ashwinee Panda

@PandaAshwinee

6 months

@jxmnop we had alec and ilya give guest lectures in @pabbeel 's grad class in 2019 and alec's lecture on language models was more useful than the entirety of cal's nlp class

Lecture 10 - Alec Radford - Language Models and Their Uses.pdf

drive.google.com

3

8

167

Ashwinee Panda

@PandaAshwinee

7 months

@yacineMTB the pizza tracking system is a literal fugazi btw and the order interface isn’t hard to build, you pick a location first and make your order and the location gets a ticket

2

0

107

Ashwinee Panda

@PandaAshwinee

1 year

@DimitrisPapail I can’t decide whether this trivializes the DM result more or is an impressive showcase of GPT4’s abilities. To the latter I’m pretty sure that the optimized sort3/4/5 are in the training data but never pushed to libc bc of how long it takes to push changes there.

6

1

101

Ashwinee Panda

@PandaAshwinee

2 months

@DimitrisPapail I would have given the same response as chatgpt lol

2

0

88

Ashwinee Panda

@PandaAshwinee

4 months

"Teach LLMs to Phish: Stealing Private Information from Language Models" has been accepted at #ICLR2024 ! We introduce Neural Phishing attacks that can enable adversaries to extract information like credit card #s from LLMs with success as high as 80%.

3

18

72

Ashwinee Panda

@PandaAshwinee

1 year

@nabeelqu This is the kind of thing that sounds really cool at first and the more you think about it the more you realize it’s pretty much impossible. Watermarking generative images? Maybe, but you can just feed it into an open source model whose only purpose would be to shake any mark.

0

66

Ashwinee Panda

@PandaAshwinee

2 months

pov: you are reading a paper that rolled their own prompts instead of using lm-eval-harness

Abhi Venigalla

@abhi_venigalla

3 months

What is going on with arc-challenge evals? Lots of great new models report scores in the high 80s-90s in their blogs. But then OSS eval frameworks like @AiEleuther harness and @MosaicML gauntlet seem to report lower scores... Clearest example is Mixtral-8x7B: * blog post: 0.858

1

6

51

1

7

64

Ashwinee Panda

@PandaAshwinee

4 months

"Privacy-Preserving In-Context Learning for Large Language Models" has been accepted at #ICLR2024 ! We present a method for generating sentences from in-context learning while keeping the in-context exemplars differentially private, that can be applied to blackbox APIs (ex RAG)

3

7

64

Ashwinee Panda

@PandaAshwinee

6 months

@jeremyphoward minus points for not maintaining the misspelling of literally

1

0

52

Ashwinee Panda

@PandaAshwinee

1 month

@jxmnop “these people” bro that is your advisor

0

2

53

Ashwinee Panda

@PandaAshwinee

2 months

Hot take: this is a good thing. Paper submission is a far more even playing field than the typical ECs that privileged HS kids participate in to pad their resumes these days. I'm looking forward to meeting these high schoolers at NeurIPS 2024!

Gautam Kamath

@thegautamkamath

2 months

NeurIPS 2024 will have a track for papers from high schoolers.

79

91

597

12

2

51

Ashwinee Panda

@PandaAshwinee

4 months

@jxmnop bro this is heinous

2

0

52

Ashwinee Panda

@PandaAshwinee

1 month

@tamaybes This reminds me of an eye-opening post by @suchenzang

Susan Zhang

@suchenzang

1 year

After ignoring the details in all these "lets-fit-a-cloud-of-points-to-a-single-line" papers (all likely wrong when you really extrapolate), @stephenroller finally convinced me to work through the math in the Chinchilla paper and as expected, this was a doozy. [1/7]

5

46

309

0

1

46

Ashwinee Panda

@PandaAshwinee

14 days

@francoisfleuret They trained the model on chunks of 2M by using ring attention which is a system optimization (not an approximation). However, because it’s hard to find 2M token documents, the context may not be very good, as it’s just a lot of shorter documents stitched.

Ring Attention with Blockwise Transformers for Near-Infinite Context

Transformers have emerged as the architecture of choice for many state-of-the-art AI models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands...

arxiv.org

4

2

40

Ashwinee Panda

@PandaAshwinee

7 months

@JulyanArbel i literally require this to be satire

0

37

Ashwinee Panda

@PandaAshwinee

2 months

It seems to me that the two schools of “faster decoding” and “linear time attention” are not compatible with each other. Both are trying to eat the same lunch. I would be pleased if someone has a counter to this, as it means these improvements can be stacked and just haven’t yet.

AK

@_akhaliq

2 months

Apple presents Recurrent Drafter for Fast Speculative Decoding in Large Language Models In this paper, we introduce an improved approach of speculative decoding aimed at enhancing the efficiency of serving large language models. Our method capitalizes on the

6

44

206

1

7

36

Ashwinee Panda

@PandaAshwinee

2 months

"technical report" as a term has gotten kind of watered down in meaning by recent technical reports that are just literally PR with no real details. this is like an actual technical report. have been playing around w the core model (ty @agihippo @maxhbain for access) recently.

Reka

@RekaAILabs

2 months

Along with Core, we have published a technical report detailing the training, architecture, data, and evaluation for the Reka models.

2

62

371

1

5

35

Ashwinee Panda

@PandaAshwinee

3 months

@shakoistsLog This is the default impact statement for ICML2024 submissions. See @zicokolter recent posts

1

2

34

Ashwinee Panda

@PandaAshwinee

14 days

I'm glad that Jan was able to send out the Superalignment grants before he left. The $10M OpenAI invested into various projects, including ours, will hopefully enable a lot of great research. The tension between product and security in industry is one reason why I enjoy academia

Jan Leike

@janleike

14 days

I believe much more of our bandwidth should be spent getting ready for the next generations of models, on security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, societal impact, and related topics.

33

283

3K

0

2

33

Ashwinee Panda

@PandaAshwinee

1 year

@mmitchell_ai @jeremyphoward @60Minutes Also worth noting that Bengali and Odiya are extremely similar (I can understand Bengali despite only knowing Odiya) bc the states are neighboring each other in India and the cultures have a lot shared. This is the case for many Indian languages, so grain of salt is needed.

1

2

32

Ashwinee Panda

@PandaAshwinee

23 days

Come say hi @iclr_conf for the next two hours to see our poster on in context learning made better and more private!

1

28

Ashwinee Panda

@PandaAshwinee

6 months

@tracewoodgrains @BasedBeffJezos this was a hit piece??? his h-index is twice sqrt( #papers ) i legit thought this was a marketing article for his startup

1

0

27

Ashwinee Panda

@PandaAshwinee

6 months

@thefirehacker @lilianweng Alec is one of the first names on the list lol

1

25

Ashwinee Panda

@PandaAshwinee

7 months

@yacineMTB it has gone down before, i say this as a member of a family that proudly ordered pizza online from the local dominos weekly for most of my life

2

0

25

Ashwinee Panda

@PandaAshwinee

19 days

@jasondeanlee Yes, you can write it exactly this way

KAN is just MLP.ipynb

Colab notebook

colab.research.google.com

1

6

25

Ashwinee Panda

@PandaAshwinee

2 months

Too late to fight the expectation that undergrads need papers to get into PhD. That’s been true since I applied back in 2020. How else are you going to differentiate? Getting into research as an undergrad is also competitive, so having papers in HS can be the differentiator.

Clément Canonne

@ccanonne_

2 months

OK, one of the (many) things that irks me with that NeurIPS "High School projects" track: You shouldn't set the expectation for HS students to have papers, or research experience. Hell, you shouldn't set the expectation for UNDERGRADS to have papers, or research experience.

20

41

568

5

2

22

Ashwinee Panda

@PandaAshwinee

22 days

@sarahookr It’s just a random number that was written in one of the Stanford survey papers, and it caught on somehow.

2

0

20

Ashwinee Panda

@PandaAshwinee

2 months

@aaron_defazio No compute / mem overhead and better than tuned Adam??? I’ve been burned before but I’m ready to be burned again…

0

19

Ashwinee Panda

@PandaAshwinee

1 month

@HighFreqAsuka I can’t believe this app is free

0

19

Ashwinee Panda

@PandaAshwinee

1 year

@deliprao and that’s why you don’t build research on top of industry apis

1

0

19

Ashwinee Panda

@PandaAshwinee

7 months

@WenhuChen You need to do this poll with way more higher end options. I use >10k A100 GPU hours per month at Princeton (roughly just my 32 80GB A100s running round the clock) and there are also 300 H100s.

3

19

Ashwinee Panda

@PandaAshwinee

2 months

“and now he’s back…as I knew he would be someday.” from

1

0

18

Ashwinee Panda

@PandaAshwinee

2 months

@peter_richtarik Theorem: provides guarantees for convex models Experiments: study convex models Limitations section: this does not work for nonconvex models, ex LLMs Reviewers: this Theorem does not work for LLMs because they’re nonconvex

1

0

18

Ashwinee Panda

@PandaAshwinee

2 months

This looks pretty cool. There are a bunch of jailbreak papers that propose automatic optimization techniques. But why reinvent the wheel if you can just use @lateinteraction DSPy to do the prompt optimization automatically? The code is also super clean!

Haize Labs

@haizelabs

2 months

🕊️red-teaming LLMs with DSPy🕊️ tldr; we use DSPy, a framework for structuring & optimizing language programs, to red-team LLMs 🥳this is the first attempt to use an auto-prompting framework for red-teaming, and one of the *deepest* language programs to date

9

44

266

1

18

Ashwinee Panda

@PandaAshwinee

6 months

@nearcyan "what google would do with their inventions" apparently nothing

0

16

Ashwinee Panda

@PandaAshwinee

8 months

Very excited that our paper on DP image classification received a spotlight at #NeurIPS2023 !

Xinyu Tang

@XinyuTang7

1 year

Our recent work on differentially private image classification obtains SOTA results across a range of tasks. 🧵 ↓ Paper: . Code: .

1

7

40

1

16

Ashwinee Panda

@PandaAshwinee

6 months

What makes workshops more exciting than the main program? Is it the emphasis on poster sessions over talks? The smaller but more focused communities? The introduction of new work and not “yeah I read that paper 6 months ago when they posted it on Twitter”? All of the above?

1

2

16

Ashwinee Panda

@PandaAshwinee

3 months

@typedfemale if you're implying 'gemini used hyperattention' i would say that the QT is not saying that, it's saying that Google practically implemented long-context, something that Amin showed was possible in a prior paper. also, i think has the best take on this.

AK

@_akhaliq

3 months

Simple linear attention language models balance the recall-throughput tradeoff Recent work has shown that attention-based language models excel at recall, the ability to ground generations in tokens previously seen in context. However, the efficiency of attention-based models is

3

45

230

3

0

14

Ashwinee Panda

@PandaAshwinee

1 month

What do you cite for instruction tuning / SFT, "Finetuned Language Models Are Zero-Shot Learners" (Wei) or "Training language models to follow instructions with human feedback" (Ouyang) ?

Wei (ICLR 2022)

48

Ouyang (NeurIPS 2022)

54

4

1

15

Ashwinee Panda

@PandaAshwinee

4 years

Excited to announce that we're launching DiscreetAI!

npm - Product Information, Latest Updates, and Reviews 2024 | Product Hunt

Relied upon by more than 11 million developers worldwide, npm is committed to making JavaScript development elegant, productive, and safe. The free npm Registry has become the center of JavaScript...

www.producthunt.com

0

3

15

Ashwinee Panda

@PandaAshwinee

2 months

@ExaAILabs search has filled the last remaining niche that Google had for me, which was superior indexing/search of Twitter and Arxiv. I can find any paper I’m thinking of on their tab and it also includes tweets (twitter search is so terrible seriously). Hope it stays good.

2

16

Ashwinee Panda

@PandaAshwinee

3 months

@_saurabh This is awesome! I’ve been manually doing this for some cases with just prompt regex but it’s quite annoying. Have been looking for a functional approach.

1

0

13

Ashwinee Panda

@PandaAshwinee

4 months

@zicokolter @icmlconf My coauthor; “how did you know the impact statement is required? It’s not in the .tex?” Me; “you have to follow Zico on twitter” My coauthor; “…”

1

14

Ashwinee Panda

@PandaAshwinee

1 month

Something interesting from my last week of testing Mistral7b vs Llama38b is that Mistral is actually better *for some vram budgets* bc the quants are better than those of Llama3, even tho at full precision Mistral is much worse than Llama3. Interesting Pareto frontier.

3

0

12

Ashwinee Panda

@PandaAshwinee

1 month

"first time a new arch is trained on the same 2t tokens as llama" isn't that because nobody else can train on Llama data? I hope we don't get a wave of arch papers trained on different proprietary datasets. makes it impossible to compare evals btwn this and ex Griffin.

Beidi Chen

@BeidiChen

1 month

This is the first time we see a new architecture making🍎to🍎 comparison at scale with Llama-7B trained on the same 2T tokens and win (unlimited context length, lower ppl, constant kv at inference, ...)! Very excited to be part of the team! Thanks for the lead @violet_zct

2

5

65

0

1

13

Ashwinee Panda

@PandaAshwinee

1 year

@ylecun What a tasty cake @bob_burrough this was from NeurIPS in 2019 I think

3

0

14

Ashwinee Panda

@PandaAshwinee

3 months

@rickasaurus this post is a bit light on details and seems to contradict databricks, who concluded that h100 is about 2x faster still

LLM Training and Inference with Intel Gaudi 2 AI Accelerators | Databricks Blog

We benchmarked LLM training and inference on an Intel Gaudi2 cluster and found that it delivered great single-node and multi-node performance.

www.databricks.com

2

0

14

Ashwinee Panda

@PandaAshwinee

6 months

@abacaj Hf is 5-10x slower for anything (loading models, inference, training) than the best way, but you can do everything in one api. Vs using gptneox for train, vLLM for inference, loading models this way.

0

12

Ashwinee Panda

@PandaAshwinee

8 months

@abeirami @emnlpmeeting Oh, so there _are_ conferences with worse reviews than NeurICMLR. That makes me feel a bit better.

0

13

Ashwinee Panda

@PandaAshwinee

1 month

I'll be presenting the Neural Phishing Attack paper at #ICLR2024 next week, DM me if you want to chat! We'll be in "Halle B #220 " at Thu 9 May 10:45 a.m. CEST (this is more for me to remember lol)

Ashwinee Panda

@PandaAshwinee

4 months

"Teach LLMs to Phish: Stealing Private Information from Language Models" has been accepted at #ICLR2024 ! We introduce Neural Phishing attacks that can enable adversaries to extract information like credit card #s from LLMs with success as high as 80%.

3

18

72

1

12

Ashwinee Panda

@PandaAshwinee

4 months

@WenhuChen better than daily :D

1

0

12

Ashwinee Panda

@PandaAshwinee

10 months

@hayou_soufiane @peter_richtarik I agree with this. In my experience the empiricists will just give a low confidence accept or at worst borderline reject of a pure theory paper. But if you put in some toy experiments then suddenly they will become experts and recommend reject.

0

1

11

Ashwinee Panda

@PandaAshwinee

1 year

@finbarrtimbers int4 quantization is just so crazy to me because your parameters can’t even represent letters anymore, like what is happening

3

0

12

Ashwinee Panda

@PandaAshwinee

1 year

@PreetumNakkiran Counteroffer: I optimize over random seed, pick a 5-digit one, report results only on that one, and then never respond to a single GitHub issue that asks why people cannot reproduce results

1

0

12

Ashwinee Panda

@PandaAshwinee

6 months

@bryancsk 3b is neither accel nor decel, it’s about people. human tenacity > human innovation. da shi tells us; even bugs can’t be underestimated. the sword holder wins via trickery not technology. humanity survives not bc of ftl tech but bc one ship blows up another and flies off.

0

10

Ashwinee Panda

@PandaAshwinee

2 years

@sritej From open source GH commits to a director level position…

0

11

Ashwinee Panda

@PandaAshwinee

7 months

@roydanroy Something on regret -> Adagrad papers (John duchy and Brendan McMahan) -> rmsprop -> ADAM -> existential crisis at the realization that nobody in the entire world knows what is really going on in the adaptive optimizer that everyone uses for everything -> do you “regret” reading?

0

11

Ashwinee Panda

@PandaAshwinee

2 months

@__tinygrad__ This is actually galaxy brain. Assume that they fail and have to give the machine back + 10k. They just got 8 weeks of machine time for 10k, which is a gargantuan discount, and a ton of PR.

0

11

Ashwinee Panda

@PandaAshwinee

2 months

@oliverjohansson I lived near Shewchuk one year and saw him and his partner a couple of times. They seemed quite happy.

0

11

Ashwinee Panda

@PandaAshwinee

6 months

@SebastienBubeck This doesn’t really look like the same curves?

1

0

11

Ashwinee Panda

@PandaAshwinee

3 months

@giffmana The outputs of the model…are based on the data?! Heresy! It must have developed self awareness!

0

11

Ashwinee Panda

@PandaAshwinee

1 month

This is really cool work! talked about something like this last year with @VSehwag_ @ashwini1024 but found that getting even trivial DP guarantees would require adding way too much Gaussian noise to images. the comments on this post are pretty mean though...

Giannis Daras

@giannis_daras

1 month

Consistent Diffusion Meets Tweedie. Our latest paper introduces an exact framework to train/finetune diffusion models like Stable Diffusion XL solely with noisy data. A year's worth of work breakthrough in reducing memorization and its implications on copyright 🧵

18

68

405

1

9

Ashwinee Panda

@PandaAshwinee

3 months

@giffmana you’re telling me…in the YEAR OF OUR LORD 2024…that the OUTPUTS OF THE MODEL…depend on the TRAINING DATA?!

0

10

Ashwinee Panda

@PandaAshwinee

3 months

When you show up to the {turning compute into useful work} competition and your opponent is the transformer slide creds @giffmana (soft launching memes onto my twitter bc i don't get enough of my papers accepted to only tweet about those lol)

0

1

10

Ashwinee Panda

@PandaAshwinee

2 months

@jxmnop HFTrainer is always the culprit. Never use HFTrainer.

2

1

10

Ashwinee Panda

@PandaAshwinee

2 months

Some nice work that follows up on our ICLR2024 work showing that the language part of a diffusion model can be finetuned on synthetic data to amplify privacy risks of real images! AI is more likely to leak data when it's more familiar w the data via synth.

Teach LLMs to Phish: Stealing Private Information from Language Models

When large language models are trained on private data, it can be a \textit{significant} privacy risk for them to memorize and regurgitate sensitive information. In this work, we propose a new...

openreview.net

Junyuan Hong

@hjy836

2 months

[Finetuning can amplify the privacy risks of Generative AI (Diffusion Models)] Last week, I was honored to give a talk at the Good Systems Symposium (), where I shared our recent work on the 🚨 privacy risks of Generative AI via finetuning. Our leading

1

10

32

1

10

Ashwinee Panda

@PandaAshwinee

6 months

@colinraffel wake up kids new colin raffel blog post just dropped

0

10

Ashwinee Panda

@PandaAshwinee

7 months

@francoisfleuret these specs are all that's required to get SOTA perf? that's consumer grade hardware!

0

10

Ashwinee Panda

@PandaAshwinee

9 months

@yoavgo Any mature enough administrative body will get to the point where it prioritizes managerial skill rather than specific skill in what it’s managing, right? The president of a landscaping company might not need to be an expert landscaper to be good at playing board politics.

1

0

10

Ashwinee Panda

@PandaAshwinee

3 months

@TheXeophon so this is how the leak happens? by inefficient caching software?

0

9

Ashwinee Panda

@PandaAshwinee

19 days

@iclr_conf is a wrap! Somehow in the melange of crowded poster sessions, bites of delicious middle eastern cuisine, and networking- I managed to squeeze in some time to work on research. Good luck on @NeurIPSConf submissions, and see you back in Vienna for @icmlconf !

0

9

Ashwinee Panda

@PandaAshwinee

10 months

@rasbt @roydanroy >=3 is technically true but if you look at the job reqs for, eg NVIDIA, they require 8 publications (for a new PhD grad position).

4

0

7

Ashwinee Panda

@PandaAshwinee

9 months

@MF_FOOM We saw this as well for optimizing the prompt in our adversarial example paper () even after 5000 iterations the loss is still going down but at some point you have to just call it.

Visual Adversarial Examples Jailbreak Aligned Large Language Models

Recently, there has been a surge of interest in integrating vision into Large Language Models (LLMs), exemplified by Visual Language Models (VLMs) such as Flamingo and GPT-4. This paper sheds...

arxiv.org

0

1

9

Ashwinee Panda

@PandaAshwinee

6 months

@michael_nielsen From your post; “If a set of principles throws off a lot of rotten fruit, it's a sign of something wrong with the principles, a reductio ad absurdum.” EA is losing the culture war bc of these rotten fruit, so I don’t see why you’re dismissing these as false straw men.

1

0

9

Ashwinee Panda

@PandaAshwinee

1 year

@kchonyc I guess decisions might not come out on 22 then?

1

0

9

Ashwinee Panda

@PandaAshwinee

1 year

@YiTayML Agreed, my top advice to new researchers is just “sign up for twitter”. Having said that I still read every paper in my field at ICLR…

0

9

Ashwinee Panda

@PandaAshwinee

9 months

@jxmnop Consider a VLM aided robotic arm, and a method that creates visual adversarial examples that fool the language model into providing malicious instructions. That system and attack exist right now!

Xiangyu Qi

@xiangyuqi_pton

11 months

Our recent study: Visual Adversarial Examples Jailbreak Large Language Models! 🧵 ↓ Paper: Github Repo:

2

34

88

0

9

Ashwinee Panda

@PandaAshwinee

9 months

@yacineMTB I think this is the right approach, any model that’s claiming to be smaller/better/faster/stronger should see real use by the community. But how do you actually track it other than hf downloads?

1

0

9

Ashwinee Panda

@PandaAshwinee

9 months

@SebastienBubeck I wish you could release model checkpoints such as those released by @AiEleuther , releasing the final trained model only lets us observe behavior for finetuning and of the trained model but studying pretraining is also important

0

9

Ashwinee Panda

@PandaAshwinee

9 months

@shortstein somewhere out there is a neurips reviewer who asked 'why don't you compare to steinke et al (jun 2023)' to a hapless phd student

0

8

Ashwinee Panda

@PandaAshwinee

2 months

@MatharyCharles I mean obviously he doesn’t want Joshi to get the credit for actually “proving” ABC right? Joshi’s main motivation and what he says over a and over again in that i-iv series is that Mochizuki’s proof is incomplete and that he is completing it.

2

1

8

Ashwinee Panda

@PandaAshwinee

8 months

@giffmana @francoisfleuret @neurosutras alex graves was on fire ca 2006-2023, from 2006-2008 he was specifically on fire in that area of cv

1

0

6

Ashwinee Panda

@PandaAshwinee

6 months

Pleased to say that our paper on visual adversarial examples has been accepted at #AAAI2024 !

Xiangyu Qi

@xiangyuqi_pton

11 months

Our recent study: Visual Adversarial Examples Jailbreak Large Language Models! 🧵 ↓ Paper: Github Repo:

2

34

88

0

7

Ashwinee Panda

@PandaAshwinee

3 months

@felix_red_panda @jeethu the inference is fp16 acc to that independent eval org

2

0

8

Ashwinee Panda

@PandaAshwinee

25 days

@srush_nlp @jasondeanlee yeah but they’re actually implementing PPO online. I don’t think the sauce is anywhere but “on policy sampling”.

1

8

Ashwinee Panda

@PandaAshwinee

1 year

@thegautamkamath 1. Blue check mark crowds out my favorite professors and I miss their tweets 2. There are >20 tweets about the same papers (eg RTM) that don’t even go over the most basic details (simulated memorization noise task) 3. :(

2

0

8

Ashwinee Panda

@PandaAshwinee

3 months

@david_picard @jbhuang0604 This is someone’s random paper describing ReLU that just gets cited bc Google shows it as the ReLU paper

0

8

Ashwinee Panda

@PandaAshwinee

10 months

I used to be really good about posting books read on Goodreads () but I moved to posting on Instagram. I’d welcome any recs based on my books read from 2021, and will try to find the time to update with 2022 onwards.

Goodreads 2021 Year in Books

Check out My 2021 Year in Books on Goodreads!

www.goodreads.com

6

0

8

Ashwinee Panda

@PandaAshwinee

7 months

@kevinsxu You should not care about this, all LLMs have mostly the same arch. Anyways they already apologized (despite the apology not being necessary) and made a commitment to fix it.

01-ai/Yi-34B · llama-compatibility

huggingface.co

1

0

7

Ashwinee Panda

@PandaAshwinee

8 months

@shortstein Can’t wait to publicize the AC’s meta review that said we need to implement a comparison to a submission to NeurIPS 2023 in order to meet the bar …for NeurIPS 2023.

0

8

Ashwinee Panda

@PandaAshwinee

2 months

Merging fails when experts disagree with each other. Sparsity helps by minimizing disagreements. If you do magnitude pruning then your merging will work with LoRA and full finetuning. But another important step is resolving disagreements when they arise.

Teortaxes▶️

@teortaxesTex

2 months

@georgejrjrjr It seems to me that model merging, when it works at all, works due to sparsity of finetunes (often created using LoRA/QLoRA) and low divergence from the shared root. Any ideas?

1

0

2

1

0

8

Ashwinee Panda

@PandaAshwinee

3 months

The latest in the line of work starting from ReLoRA, I would really be interested to see: - how does this work when training on many more tokens? - what if seqlens are longer? - can this be used for making distributed pretraining more efficient?

Prof. Anima Anandkumar

@AnimaAnandkumar

3 months

For the first time, we show that the Llama 7B LLM can be trained on a single consumer-grade GPU (RTX 4090) with only 24GB memory. This represents more than 82.5% reduction in memory for storing optimizer states during training. Training LLMs from scratch currently requires huge

48

389

2K

2

0

8

Ashwinee Panda

@PandaAshwinee

11 months

Missed our #ACL2023 TrustNLP work on Differentially Private In-Context Learning? Here's a recap. Our work is the first method that can augment cloud-based LLMs with private data, no retraining necessary.

1

0

8

Ashwinee Panda

@PandaAshwinee

4 months

Why do we care about the privacy of in-context data? We're seeing papers that adapt LLMs to medical usecases via ICL, and projects that use RAG for sensitive DBs/commercial data. In these settings, it's important that the private in-context database doesn't get leaked by the LLM.

1

0

7

Ashwinee Panda

@PandaAshwinee

1 month

@natolambert @andersonbcdefg @aryaman2020 @dylan522p I think we missed it lol, after reading the paper 100/100 people would bet on phi-3 to lose. Should have bet sooner.

1

0

7

Ashwinee Panda

@PandaAshwinee

5 months

@natolambert @dingboard_ first result: official corporate tweet with an appropriate level of decorum second result: what does this mean for dingboard's legacy?