Ashwinee Panda

@PandaAshwinee

1,079
Followers
643
Following
73
Media
1,534
Statuses

PhD @princeton , @Cal alum, currently working on LLMs

Joined February 2020
Pinned Tweet
@PandaAshwinee
Ashwinee Panda
2 months
Some cool stuff is coming, stay tuned =)
Tweet media one
@janleike
Jan Leike
2 months
The superalignment fast grants are now decided! We got a *ton* of really strong applications, so unfortunately we had to say no to many we're very excited about. There is still so much good research waiting to be funded. Congrats to all recipients!
18
22
379
6
2
149
@PandaAshwinee
Ashwinee Panda
3 months
Mfs will get a setup like this and then ship the best cv paper you've ever seen
Tweet media one
@maybe_dan_
dan
3 months
Mfs will get a setup like this and then ship the most ass code you've ever seen
Tweet media one
197
734
15K
10
9
293
@PandaAshwinee
Ashwinee Panda
2 months
All these LLM watermarking / detection papers being written and the best tool we have is ctrl+f “delve”
@JeremyNguyenPhD
Jeremy Nguyen ✍🏼 🚢
2 months
Are medical studies being written with ChatGPT? Well, we all know ChatGPT overuses the word "delve". Look below at how often the word 'delve' is used in papers on PubMed (2023 was the first full year of ChatGPT).
Tweet media one
411
2K
13K
3
15
202
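The ctrl+f heuristic above is easy to sketch. A toy Python version might look like this (the word list and scoring are made-up illustrations, not a validated detector):

```python
import re

# Words ChatGPT is said to overuse; this list is an illustrative
# guess, not a validated detector.
SUSPECT_WORDS = {"delve", "tapestry", "multifaceted"}

def suspect_word_rate(text: str) -> float:
    """Fraction of tokens that are 'suspect' words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in SUSPECT_WORDS)
    return hits / len(tokens)

sample = "In this paper we delve into the multifaceted tapestry of results."
print(suspect_word_rate(sample))  # 3 hits out of 11 tokens
```

Obviously a real detector would need a baseline word-frequency distribution rather than a hardcoded list, which is the point of the joke.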
@PandaAshwinee
Ashwinee Panda
6 months
@jxmnop we had alec and ilya give guest lectures in @pabbeel 's grad class in 2019 and alec's lecture on language models was more useful than the entirety of cal's nlp class
3
8
167
@PandaAshwinee
Ashwinee Panda
7 months
@yacineMTB the pizza tracking system is a literal fugazi btw and the order interface isn’t hard to build, you pick a location first and make your order and the location gets a ticket
2
0
107
@PandaAshwinee
Ashwinee Panda
1 year
@DimitrisPapail I can’t decide whether this trivializes the DM result more or is an impressive showcase of GPT4’s abilities. As to the latter, I’m pretty sure the optimized sort3/4/5 are in the training data but never pushed to libc bc of how long it takes to push changes there.
6
1
101
@PandaAshwinee
Ashwinee Panda
2 months
@DimitrisPapail I would have given the same response as chatgpt lol
2
0
88
@PandaAshwinee
Ashwinee Panda
4 months
"Teach LLMs to Phish: Stealing Private Information from Language Models" has been accepted at #ICLR2024 ! We introduce Neural Phishing attacks that can enable adversaries to extract information like credit card #s from LLMs with success rates as high as 80%.
Tweet media one
3
18
72
@PandaAshwinee
Ashwinee Panda
1 year
@nabeelqu This is the kind of thing that sounds really cool at first and the more you think about it the more you realize it’s pretty much impossible. Watermarking generative images? Maybe, but you can just feed it into an open source model whose only purpose would be to shake any mark.
0
0
66
@PandaAshwinee
Ashwinee Panda
2 months
pov: you are reading a paper that rolled their own prompts instead of using lm-eval-harness
Tweet media one
@abhi_venigalla
Abhi Venigalla
3 months
What is going on with arc-challenge evals? Lots of great new models report scores in the high 80s-90s in their blogs. But then OSS eval frameworks like @AiEleuther harness and @MosaicML gauntlet seem to report lower scores... Clearest example is Mixtral-8x7B: * blog post: 0.858
1
6
51
1
7
64
@PandaAshwinee
Ashwinee Panda
4 months
"Privacy-Preserving In-Context Learning for Large Language Models" has been accepted at #ICLR2024 ! We present a method for generating sentences via in-context learning while keeping the in-context exemplars differentially private; it can be applied to black-box APIs (e.g. RAG)
3
7
64
@PandaAshwinee
Ashwinee Panda
6 months
@jeremyphoward minus points for not maintaining the misspelling of literally
1
0
52
@PandaAshwinee
Ashwinee Panda
1 month
@jxmnop “these people” bro that is your advisor
0
2
53
@PandaAshwinee
Ashwinee Panda
2 months
Hot take: this is a good thing. Paper submission is a far more even playing field than the typical ECs that privileged HS kids participate in to pad their resumes these days. I'm looking forward to meeting these high schoolers at NeurIPS 2024!
@thegautamkamath
Gautam Kamath
2 months
NeurIPS 2024 will have a track for papers from high schoolers.
Tweet media one
79
91
597
12
2
51
@PandaAshwinee
Ashwinee Panda
4 months
@jxmnop bro this is heinous
2
0
52
@PandaAshwinee
Ashwinee Panda
1 month
@tamaybes This reminds me of an eye-opening post by @suchenzang
@suchenzang
Susan Zhang
1 year
After ignoring the details in all these "lets-fit-a-cloud-of-points-to-a-single-line" papers (all likely wrong when you really extrapolate), @stephenroller finally convinced me to work through the math in the Chinchilla paper and as expected, this was a doozy. [1/7]
5
46
309
0
1
46
@PandaAshwinee
Ashwinee Panda
14 days
@francoisfleuret They trained the model on chunks of 2M by using ring attention, which is a systems optimization (not an approximation). However, because it’s hard to find 2M-token documents, the long-context quality may not be very good, as it’s just a lot of shorter documents stitched together.
4
2
40
@PandaAshwinee
Ashwinee Panda
7 months
@JulyanArbel i literally require this to be satire
0
0
37
@PandaAshwinee
Ashwinee Panda
2 months
It seems to me that the two schools of “faster decoding” and “linear time attention” are not compatible with each other. Both are trying to eat the same lunch. I would be pleased if someone has a counter to this, as it means these improvements can be stacked and just haven’t yet.
@_akhaliq
AK
2 months
Apple presents Recurrent Drafter for Fast Speculative Decoding in Large Language Models In this paper, we introduce an improved approach of speculative decoding aimed at enhancing the efficiency of serving large language models. Our method capitalizes on the
Tweet media one
6
44
206
1
7
36
@PandaAshwinee
Ashwinee Panda
2 months
"technical report" as a term has gotten kind of watered down in meaning by recent technical reports that are just literally PR with no real details. this is like an actual technical report. have been playing around w the core model (ty @agihippo @maxhbain for access) recently.
@RekaAILabs
Reka
2 months
Along with Core, we have published a technical report detailing the training, architecture, data, and evaluation for the Reka models.
Tweet media one
Tweet media two
2
62
371
1
5
35
@PandaAshwinee
Ashwinee Panda
3 months
@shakoistsLog This is the default impact statement for ICML2024 submissions. See @zicokolter recent posts
1
2
34
@PandaAshwinee
Ashwinee Panda
14 days
I'm glad that Jan was able to send out the Superalignment grants before he left. The $10M OpenAI invested into various projects, including ours, will hopefully enable a lot of great research. The tension between product and security in industry is one reason why I enjoy academia.
@janleike
Jan Leike
14 days
I believe much more of our bandwidth should be spent getting ready for the next generations of models, on security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, societal impact, and related topics.
33
283
3K
0
2
33
@PandaAshwinee
Ashwinee Panda
1 year
@mmitchell_ai @jeremyphoward @60Minutes Also worth noting that Bengali and Odiya are extremely similar (I can understand Bengali despite only knowing Odiya) bc the states neighbor each other in India and the cultures share a lot. This is the case for many Indian languages, so a grain of salt is needed.
1
2
32
@PandaAshwinee
Ashwinee Panda
23 days
Come say hi @iclr_conf for the next two hours to see our poster on making in-context learning better and more private!
Tweet media one
1
1
28
@PandaAshwinee
Ashwinee Panda
6 months
@tracewoodgrains @BasedBeffJezos this was a hit piece??? his h-index is twice sqrt( #papers ) i legit thought this was a marketing article for his startup
1
0
27
@PandaAshwinee
Ashwinee Panda
6 months
@thefirehacker @lilianweng Alec is one of the first names on the list lol
1
1
25
@PandaAshwinee
Ashwinee Panda
7 months
@yacineMTB it has gone down before, i say this as a member of a family that proudly ordered pizza online from the local dominos weekly for most of my life
2
0
25
@PandaAshwinee
Ashwinee Panda
19 days
@jasondeanlee Yes, you can write it exactly this way
1
6
25
@PandaAshwinee
Ashwinee Panda
2 months
Too late to fight the expectation that undergrads need papers to get into PhD. That’s been true since I applied back in 2020. How else are you going to differentiate? Getting into research as an undergrad is also competitive, so having papers in HS can be the differentiator.
@ccanonne_
Clément Canonne
2 months
OK, one of the (many) things that irks me with that NeurIPS "High School projects" track: You shouldn't set the expectation for HS students to have papers, or research experience. Hell, you shouldn't set the expectation for UNDERGRADS to have papers, or research experience.
20
41
568
5
2
22
@PandaAshwinee
Ashwinee Panda
22 days
@sarahookr It’s just a random number that was written in one of the Stanford survey papers, and it caught on somehow.
2
0
20
@PandaAshwinee
Ashwinee Panda
2 months
@aaron_defazio No compute / mem overhead and better than tuned Adam??? I’ve been burned before but I’m ready to be burned again…
0
0
19
@PandaAshwinee
Ashwinee Panda
1 month
@HighFreqAsuka I can’t believe this app is free
0
0
19
@PandaAshwinee
Ashwinee Panda
1 year
@deliprao and that’s why you don’t build research on top of industry apis
1
0
19
@PandaAshwinee
Ashwinee Panda
7 months
@WenhuChen You need to do this poll with way more higher end options. I use >10k A100 GPU hours per month at Princeton (roughly just my 32 80GB A100s running round the clock) and there are also 300 H100s.
3
3
19
@PandaAshwinee
Ashwinee Panda
2 months
“and now he’s back…as I knew he would be someday.” from
Tweet media one
1
0
18
@PandaAshwinee
Ashwinee Panda
2 months
@peter_richtarik Theorem: provides guarantees for convex models Experiments: study convex models Limitations section: this does not work for nonconvex models, ex LLMs Reviewers: this Theorem does not work for LLMs because they’re nonconvex
1
0
18
@PandaAshwinee
Ashwinee Panda
2 months
This looks pretty cool. There are a bunch of jailbreak papers that propose automatic optimization techniques. But why reinvent the wheel if you can just use @lateinteraction DSPy to do the prompt optimization automatically? The code is also super clean!
@haizelabs
Haize Labs
2 months
🕊️red-teaming LLMs with DSPy🕊️ tldr; we use DSPy, a framework for structuring & optimizing language programs, to red-team LLMs 🥳this is the first attempt to use an auto-prompting framework for red-teaming, and one of the *deepest* language programs to date
Tweet media one
9
44
266
1
1
18
@PandaAshwinee
Ashwinee Panda
6 months
@nearcyan "what google would do with their inventions" apparently nothing
0
0
16
@PandaAshwinee
Ashwinee Panda
8 months
Very excited that our paper on DP image classification received a spotlight at #NeurIPS2023 !
@XinyuTang7
Xinyu Tang
1 year
Our recent work on differentially private image classification obtains SOTA results across a range of tasks. 🧵 ↓ Paper: . Code: .
Tweet media one
1
7
40
1
1
16
@PandaAshwinee
Ashwinee Panda
6 months
What makes workshops more exciting than the main program? Is it the emphasis on poster sessions over talks? The smaller but more focused communities? The introduction of new work and not “yeah I read that paper 6 months ago when they posted it on Twitter”? All of the above?
1
2
16
@PandaAshwinee
Ashwinee Panda
3 months
@typedfemale if you're implying 'gemini used hyperattention' i would say that the QT is not saying that, it's saying that Google practically implemented long-context, something that Amin showed was possible in a prior paper. also, i think has the best take on this.
@_akhaliq
AK
3 months
Simple linear attention language models balance the recall-throughput tradeoff Recent work has shown that attention-based language models excel at recall, the ability to ground generations in tokens previously seen in context. However, the efficiency of attention-based models is
Tweet media one
3
45
230
3
0
14
@PandaAshwinee
Ashwinee Panda
1 month
What do you cite for instruction tuning / SFT, "Finetuned Language Models Are Zero-Shot Learners" (Wei) or "Training language models to follow instructions with human feedback" (Ouyang) ?
Wei (ICLR 2022)
48
Ouyang (NeurIPS 2022)
54
4
1
15
@PandaAshwinee
Ashwinee Panda
2 months
@ExaAILabs search has filled the last remaining niche that Google had for me, which was superior indexing/search of Twitter and Arxiv. I can find any paper I’m thinking of on their tab and it also includes tweets (twitter search is so terrible seriously). Hope it stays good.
2
2
16
@PandaAshwinee
Ashwinee Panda
3 months
@_saurabh This is awesome! I’ve been manually doing this for some cases with just prompt regex but it’s quite annoying. Have been looking for a functional approach.
1
0
13
@PandaAshwinee
Ashwinee Panda
4 months
@zicokolter @icmlconf My coauthor: “how did you know the impact statement is required? It’s not in the .tex?” Me: “you have to follow Zico on twitter” My coauthor: “…”
1
1
14
@PandaAshwinee
Ashwinee Panda
1 month
Something interesting from my last week of testing Mistral7b vs Llama38b is that Mistral is actually better *for some vram budgets* bc the quants are better than those of Llama3, even tho at full precision Mistral is much worse than Llama3. Interesting Pareto frontier.
3
0
12
@PandaAshwinee
Ashwinee Panda
1 month
"first time a new arch is trained on the same 2t tokens as llama" isn't that because nobody else can train on Llama data? I hope we don't get a wave of arch papers trained on different proprietary datasets. makes it impossible to compare evals btwn this and e.g. Griffin.
@BeidiChen
Beidi Chen
1 month
This is the first time we see a new architecture making🍎to🍎 comparison at scale with Llama-7B trained on the same 2T tokens and win (unlimited context length, lower ppl, constant kv at inference, ...)! Very excited to be part of the team! Thanks for the lead @violet_zct
Tweet media one
2
5
65
0
1
13
@PandaAshwinee
Ashwinee Panda
1 year
@ylecun What a tasty cake @bob_burrough this was from NeurIPS in 2019 I think
Tweet media one
3
0
14
@PandaAshwinee
Ashwinee Panda
6 months
@abacaj Hf is 5-10x slower at everything (loading models, inference, training) than the best alternative, but you can do everything in one api, vs. using gptneox for training, vLLM for inference, and yet another way of loading models.
0
0
12
@PandaAshwinee
Ashwinee Panda
8 months
@abeirami @emnlpmeeting Oh, so there _are_ conferences with worse reviews than NeurICMLR. That makes me feel a bit better.
0
0
13
@PandaAshwinee
Ashwinee Panda
1 month
I'll be presenting the Neural Phishing Attack paper at #ICLR2024 next week, DM me if you want to chat! We'll be in "Halle B #220 " at Thu 9 May 10:45 a.m. CEST (this is more for me to remember lol)
@PandaAshwinee
Ashwinee Panda
4 months
"Teach LLMs to Phish: Stealing Private Information from Language Models" has been accepted at #ICLR2024 ! We introduce Neural Phishing attacks that can enable adversaries to extract information like credit card #s from LLMs with success rates as high as 80%.
Tweet media one
3
18
72
1
1
12
@PandaAshwinee
Ashwinee Panda
4 months
@WenhuChen better than daily :D
1
0
12
@PandaAshwinee
Ashwinee Panda
10 months
@hayou_soufiane @peter_richtarik I agree with this. In my experience the empiricists will just give a low confidence accept or at worst borderline reject of a pure theory paper. But if you put in some toy experiments then suddenly they will become experts and recommend reject.
0
1
11
@PandaAshwinee
Ashwinee Panda
1 year
@finbarrtimbers int4 quantization is just so crazy to me because your parameters can’t even represent letters anymore, like what is happening
3
0
12
@PandaAshwinee
Ashwinee Panda
1 year
@PreetumNakkiran Counteroffer: I optimize over random seed, pick a 5-digit one, report results only on that one, and then never respond to a single GitHub issue that asks why people cannot reproduce results
1
0
12
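The int4 intuition above is easy to make concrete: each quantized weight can take only 16 distinct values. A toy sketch (the scale and rounding scheme here are illustrative, not any specific library's quantizer):

```python
# int4 quantization: each weight is one of only 16 levels.
# Even a single ASCII letter needs 26+ distinct codes, so a lone
# int4 parameter literally cannot encode 'a'..'z' on its own.
INT4_MIN, INT4_MAX = -8, 7
levels = INT4_MAX - INT4_MIN + 1
print(levels)  # 16 representable values per weight

def quantize(w: float, scale: float) -> int:
    """Symmetric round-to-nearest int4 quantization (toy sketch)."""
    q = round(w / scale)
    return max(INT4_MIN, min(INT4_MAX, q))

print(quantize(0.03, scale=0.01))  # 3
print(quantize(1.00, scale=0.01))  # clipped to 7
```

Real schemes recover expressivity by sharing a float scale per group of weights, so the information lives in the group, not in any single 4-bit value.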
@PandaAshwinee
Ashwinee Panda
6 months
@bryancsk 3b is neither accel nor decel, it’s about people. human tenacity > human innovation. Da Shi tells us: even bugs can’t be underestimated. the sword holder wins via trickery not technology. humanity survives not bc of ftl tech but bc one ship blows up another and flies off.
0
0
10
@PandaAshwinee
Ashwinee Panda
2 years
@sritej From open source GH commits to a director level position…
0
0
11
@PandaAshwinee
Ashwinee Panda
7 months
@roydanroy Something on regret -> Adagrad papers (John Duchi and Brendan McMahan) -> rmsprop -> Adam -> existential crisis at the realization that nobody in the entire world knows what is really going on in the adaptive optimizer that everyone uses for everything -> do you “regret” reading?
0
0
11
@PandaAshwinee
Ashwinee Panda
2 months
@__tinygrad__ This is actually galaxy brain. Assume that they fail and have to give the machine back + 10k. They just got 8 weeks of machine time for 10k, which is a gargantuan discount, and a ton of PR.
0
0
11
@PandaAshwinee
Ashwinee Panda
2 months
@oliverjohansson I lived near Shewchuk one year and saw him and his partner a couple of times. They seemed quite happy.
0
0
11
@PandaAshwinee
Ashwinee Panda
6 months
@SebastienBubeck This doesn’t really look like the same curves?
1
0
11
@PandaAshwinee
Ashwinee Panda
3 months
@giffmana The outputs of the model…are based on the data?! Heresy! It must have developed self awareness!
0
0
11
@PandaAshwinee
Ashwinee Panda
1 month
This is really cool work! I talked about something like this last year with @VSehwag_ @ashwini1024 but found that getting even trivial DP guarantees would require adding way too much Gaussian noise to images. The comments on this post are pretty mean though...
@giannis_daras
Giannis Daras
1 month
Consistent Diffusion Meets Tweedie. Our latest paper introduces an exact framework to train/finetune diffusion models like Stable Diffusion XL solely with noisy data. A year's worth of work breakthrough in reducing memorization and its implications on copyright 🧵
Tweet media one
18
68
405
1
1
9
@PandaAshwinee
Ashwinee Panda
3 months
@giffmana you’re telling me…in the YEAR OF OUR LORD 2024…that the OUTPUTS OF THE MODEL…depend on the TRAINING DATA?!
0
0
10
@PandaAshwinee
Ashwinee Panda
3 months
When you show up to the {turning compute into useful work} competition and your opponent is the transformer slide creds @giffmana (soft launching memes onto my twitter bc i don't get enough of my papers accepted to only tweet about those lol)
Tweet media one
0
1
10
@PandaAshwinee
Ashwinee Panda
2 months
@jxmnop HFTrainer is always the culprit. Never use HFTrainer.
2
1
10
@PandaAshwinee
Ashwinee Panda
2 months
Some nice work that follows up on our ICLR2024 work showing that the language part of a diffusion model can be finetuned on synthetic data to amplify privacy risks of real images! AI is more likely to leak data when it's more familiar w the data via synth.
@hjy836
Junyuan Hong
2 months
[Finetuning can amplify the privacy risks of Generative AI (Diffusion Models)] Last week, I was honored to give a talk at the Good Systems Symposium (), where I shared our recent work on the 🚨 privacy risks of Generative AI via finetuning. Our leading
Tweet media one
1
10
32
1
1
10
@PandaAshwinee
Ashwinee Panda
6 months
@colinraffel wake up kids new colin raffel blog post just dropped
0
0
10
@PandaAshwinee
Ashwinee Panda
7 months
@francoisfleuret these specs are all that's required to get SOTA perf? that's consumer grade hardware!
Tweet media one
0
0
10
@PandaAshwinee
Ashwinee Panda
9 months
@yoavgo Any mature enough administrative body will get to the point where it prioritizes managerial skill rather than specific skill in what it’s managing, right? The president of a landscaping company might not need to be an expert landscaper to be good at playing board politics.
1
0
10
@PandaAshwinee
Ashwinee Panda
3 months
@TheXeophon so this is how the leak happens? by inefficient caching software?
0
0
9
@PandaAshwinee
Ashwinee Panda
19 days
@iclr_conf is a wrap! Somehow in the melange of crowded poster sessions, bites of delicious Middle Eastern cuisine, and networking, I managed to squeeze in some time to work on research. Good luck on @NeurIPSConf submissions, and see you back in Vienna for @icmlconf !
Tweet media one
Tweet media two
Tweet media three
0
0
9
@PandaAshwinee
Ashwinee Panda
10 months
@rasbt @roydanroy >=3 is technically true but if you look at the job reqs for, e.g., NVIDIA, they require 8 publications (for a new PhD grad position).
4
0
7
@PandaAshwinee
Ashwinee Panda
9 months
@MF_FOOM We saw this as well when optimizing the prompt in our adversarial example paper (). Even after 5000 iterations the loss is still going down, but at some point you have to just call it.
0
1
9
@PandaAshwinee
Ashwinee Panda
6 months
@michael_nielsen From your post: “If a set of principles throws off a lot of rotten fruit, it's a sign of something wrong with the principles, a reductio ad absurdum.” EA is losing the culture war bc of these rotten fruit, so I don’t see why you’re dismissing these as false straw men.
1
0
9
@PandaAshwinee
Ashwinee Panda
1 year
@kchonyc I guess decisions might not come out on 22 then?
1
0
9
@PandaAshwinee
Ashwinee Panda
1 year
@YiTayML Agreed, my top advice to new researchers is just “sign up for twitter”. Having said that I still read every paper in my field at ICLR…
0
0
9
@PandaAshwinee
Ashwinee Panda
9 months
@jxmnop Consider a VLM aided robotic arm, and a method that creates visual adversarial examples that fool the language model into providing malicious instructions. That system and attack exist right now!
@xiangyuqi_pton
Xiangyu Qi
11 months
Our recent study: Visual Adversarial Examples Jailbreak Large Language Models! 🧵 ↓ Paper: Github Repo:
Tweet media one
2
34
88
0
0
9
@PandaAshwinee
Ashwinee Panda
9 months
@yacineMTB I think this is the right approach, any model that’s claiming to be smaller/better/faster/stronger should see real use by the community. But how do you actually track it other than hf downloads?
1
0
9
@PandaAshwinee
Ashwinee Panda
9 months
@SebastienBubeck I wish you could release model checkpoints such as those released by @AiEleuther . Releasing only the final trained model lets us study finetuning and end-of-training behavior, but studying pretraining is also important
0
0
9
@PandaAshwinee
Ashwinee Panda
9 months
@shortstein somewhere out there is a neurips reviewer who asked 'why don't you compare to steinke et al (jun 2023)' to a hapless phd student
0
0
8
@PandaAshwinee
Ashwinee Panda
2 months
@MatharyCharles I mean obviously he doesn’t want Joshi to get the credit for actually “proving” ABC right? Joshi’s main motivation, and what he says over and over again in that i-iv series, is that Mochizuki’s proof is incomplete and that he is completing it.
2
1
8
@PandaAshwinee
Ashwinee Panda
8 months
@giffmana @francoisfleuret @neurosutras alex graves was on fire ca 2006-2023, from 2006-2008 he was specifically on fire in that area of cv
1
0
6
@PandaAshwinee
Ashwinee Panda
6 months
Pleased to say that our paper on visual adversarial examples has been accepted at #AAAI2024 !
@xiangyuqi_pton
Xiangyu Qi
11 months
Our recent study: Visual Adversarial Examples Jailbreak Large Language Models! 🧵 ↓ Paper: Github Repo:
Tweet media one
2
34
88
0
0
7
@PandaAshwinee
Ashwinee Panda
3 months
@felix_red_panda @jeethu the inference is fp16 according to that independent eval org
2
0
8
@PandaAshwinee
Ashwinee Panda
25 days
@srush_nlp @jasondeanlee yeah but they’re actually implementing PPO online. I don’t think the sauce is anywhere but “on policy sampling”.
1
1
8
@PandaAshwinee
Ashwinee Panda
1 year
@thegautamkamath 1. Blue check mark crowds out my favorite professors and I miss their tweets 2. There are >20 tweets about the same papers (eg RTM) that don’t even go over the most basic details (simulated memorization noise task) 3. :(
2
0
8
@PandaAshwinee
Ashwinee Panda
3 months
@david_picard @jbhuang0604 This is someone’s random paper describing ReLU that just gets cited bc Google shows it as the ReLU paper
0
0
8
@PandaAshwinee
Ashwinee Panda
10 months
I used to be really good about posting books read on Goodreads () but I moved to posting on Instagram. I’d welcome any recs based on my books read from 2021, and will try to find the time to update with 2022 onwards.
6
0
8
@PandaAshwinee
Ashwinee Panda
7 months
@kevinsxu You should not care about this, all LLMs have mostly the same arch. Anyways they already apologized (despite the apology not being necessary) and made a commitment to fix it.
1
0
7
@PandaAshwinee
Ashwinee Panda
8 months
@shortstein Can’t wait to publicize the AC’s meta review that said we need to implement a comparison to a submission to NeurIPS 2023 in order to meet the bar …for NeurIPS 2023.
0
0
8
@PandaAshwinee
Ashwinee Panda
2 months
Merging fails when experts disagree with each other. Sparsity helps by minimizing disagreements. If you do magnitude pruning then your merging will work with LoRA and full finetuning. But another important step is resolving disagreements when they arise.
@teortaxesTex
Teortaxes▶️
2 months
@georgejrjrjr It seems to me that model merging, when it works at all, works due to sparsity of finetunes (often created using LoRA/QLoRA) and low divergence from the shared root. Any ideas?
1
0
2
1
0
8
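The recipe above (magnitude-prune each finetune's delta, then resolve sign disagreements before averaging) can be sketched in a few lines of numpy. This is a toy illustration in the spirit of TIES-style merging, with made-up names and ratios, not any paper's exact method:

```python
import numpy as np

def merge_deltas(deltas: list, keep: float = 0.7) -> np.ndarray:
    """Toy sign-consistent merge of finetune deltas (weights minus base).

    1. Magnitude-prune each delta to its top-`keep` fraction
       (sparsity minimizes disagreements between experts).
    2. Keep only values agreeing with the elementwise-majority sign,
       then average the survivors.
    """
    pruned = []
    for d in deltas:
        k = max(1, int(keep * d.size))
        thresh = np.sort(np.abs(d).ravel())[-k]
        pruned.append(np.where(np.abs(d) >= thresh, d, 0.0))
    stacked = np.stack(pruned)
    majority_sign = np.sign(stacked.sum(axis=0))
    agree = np.sign(stacked) == majority_sign
    total = np.where(agree, stacked, 0.0).sum(axis=0)
    count = np.maximum(agree.sum(axis=0), 1)
    return total / count

a = np.array([1.0, -0.1, 2.0])  # expert 1's delta
b = np.array([0.9, 0.2, -2.0])  # expert 2's delta
print(merge_deltas([a, b]))  # third entry disagrees in sign, cancels to 0
```

Note how plain averaging would turn the disagreeing third coordinate into 0 anyway but would also halve every agreed-upon update; the sign-resolution step averages only over the experts that actually voted for each entry.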
@PandaAshwinee
Ashwinee Panda
3 months
The latest in the line of work starting from ReLoRA, I would really be interested to see: - how does this work when training on many more tokens? - what if seqlens are longer? - can this be used for making distributed pretraining more efficient?
@AnimaAnandkumar
Prof. Anima Anandkumar
3 months
For the first time, we show that the Llama 7B LLM can be trained on a single consumer-grade GPU (RTX 4090) with only 24GB memory. This represents more than 82.5% reduction in memory for storing optimizer states during training. Training LLMs from scratch currently requires huge
48
389
2K
2
0
8
@PandaAshwinee
Ashwinee Panda
11 months
Missed our #ACL2023 TrustNLP work on Differentially Private In-Context Learning? Here's a recap. Our work is the first method that can augment cloud-based LLMs with private data, no retraining necessary.
Tweet media one
1
0
8
@PandaAshwinee
Ashwinee Panda
4 months
Why do we care about the privacy of in-context data? We're seeing papers that adapt LLMs to medical usecases via ICL, and projects that use RAG for sensitive DBs/commercial data. In these settings, it's important that the private in-context database doesn't get leaked by the LLM.
Tweet media one
1
0
7
@PandaAshwinee
Ashwinee Panda
1 month
@natolambert @andersonbcdefg @aryaman2020 @dylan522p I think we missed it lol, after reading the paper 100/100 people would bet on phi-3 to lose. Should have bet sooner.
1
0
7
@PandaAshwinee
Ashwinee Panda
5 months
@natolambert @dingboard_ first result: official corporate tweet with an appropriate level of decorum second result: what does this mean for dingboard's legacy?
Tweet media one
1
0
7
@PandaAshwinee
Ashwinee Panda
2 months
i have so many more meme ideas for this ctrl+f strategy so people need to publish more papers about it
Tweet media one
@james_y_zou
James Zou
2 months
Our new study estimates that ~17% of recent CS arXiv papers used #LLMs substantially in its writing. Around 8% for bioRxiv papers 🧵
Tweet media one
5
57
256
0
0
7