Hao Zhang Profile
Hao Zhang

@haozhangml

2,924 Followers · 275 Following · 3 Media · 330 Statuses

Asst. Prof. @HDSIUCSD and @ucsd_cse, running @haoailab . Co-founder and lead of @lmsysorg . 20% with @SnowflakeDB

San Francisco
Joined July 2021
Pinned Tweet
@haozhangml
Hao Zhang
2 months
Check out our latest blog post discussing a better metric for LLM serving -- goodput (throughput subject to latency constraints) -- and our new technique, prefill-decode disaggregation, which optimizes goodput and achieves lower cost-per-query and higher service quality at the same time!
@haoailab
Hao AI Lab
2 months
Still optimizing throughput for LLM Serving? Think again: Goodput might be a better choice! Splitting prefill from decode to different GPUs yields - up to 4.48x goodput - up to 10.2x stricter latency criteria Blog: Paper:
4
50
179
0
9
35
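For readers new to the metric, here is a minimal sketch of goodput vs. raw throughput (an illustration only, with hypothetical SLO thresholds, not the Hao AI Lab implementation):

```python
# Illustrative sketch of goodput vs. throughput (hypothetical request records).
# A request counts toward goodput only if it meets both latency SLOs:
# time-to-first-token (TTFT) and time-per-output-token (TPOT).

def goodput(requests, ttft_slo=0.2, tpot_slo=0.05, window_s=60.0):
    """Requests/second that satisfy the latency constraints."""
    good = sum(1 for r in requests
               if r["ttft"] <= ttft_slo and r["tpot"] <= tpot_slo)
    return good / window_s

def throughput(requests, window_s=60.0):
    """Raw requests/second, ignoring latency."""
    return len(requests) / window_s

requests = [
    {"ttft": 0.15, "tpot": 0.04},   # meets both SLOs
    {"ttft": 0.80, "tpot": 0.03},   # prefill too slow -> hurts goodput only
    {"ttft": 0.10, "tpot": 0.09},   # decode too slow  -> hurts goodput only
]
print(throughput(requests), goodput(requests))
```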
@haozhangml
Hao Zhang
8 months
As our PagedAttention paper is live, it is time to delve into several key techniques in LLM serving. The 3 most important and *MUST-KNOW* techniques for a (2023-ish) top-notch LLM serving system: (1) continuous batching: 5 - 10x throughput improvement, (2) paged attention: 3x…
@_akhaliq
AK
8 months
Efficient Memory Management for Large Language Model Serving with PagedAttention paper page: High throughput serving of large language models (LLMs) requires batching sufficiently many requests at a time. However, existing systems struggle because the…
5
66
296
9
141
627
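A toy sketch of the paged KV-cache idea behind PagedAttention (names and structure are illustrative, not vLLM internals): each sequence keeps a block table mapping logical blocks to fixed-size physical blocks, so KV memory is allocated on demand and blocks can be shared across sequences.

```python
# Toy sketch of the paged KV-cache idea (my own illustration, not vLLM code).
BLOCK_SIZE = 16  # tokens per KV block

class BlockAllocator:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def alloc(self):
        return self.free.pop()

class Sequence:
    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []   # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # Allocate a new physical block only when the last one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.alloc())
        self.num_tokens += 1

alloc = BlockAllocator(num_blocks=1024)
seq = Sequence(alloc)
for _ in range(40):        # 40 tokens -> ceil(40/16) = 3 physical blocks
    seq.append_token()
print(seq.block_table)
```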
@haozhangml
Hao Zhang
1 year
A much-delayed life update: I will be joining @UCSanDiego @HDSIUCSD as Asst. Prof. this July. I am recruiting postdocs and students who are interested in ML and systems! I do a lot of research on large models like #chatgpt ! Check out my page ! 1/5
20
20
275
@haozhangml
Hao Zhang
1 year
We just released the delta weights of our Vicuna-13B. Try it on your own GPU -- this is the open-source chatbot closest to ChatGPT! We #lmsysorg 💕 open source and open science.
@lmsysorg
lmsys.org
1 year
We are excited to release the weights of Vicuna-13B. 🔥 Run it with a single GPU on your own machine! Get the weights: Web UI demo: Command line demo: see below
42
371
1K
4
26
166
@haozhangml
Hao Zhang
6 months
New results just dropped. Check out our new, fast decoding algorithm -- lookahead decoding!
@lmsysorg
lmsys.org
6 months
Introduce lookahead decoding: - a parallel decoding algo to accelerate LLM inference - w/o the need for a draft model or a data store - linearly decreases # decoding steps relative to log(FLOPs) used per decoding step. Blog: Code:
23
250
1K
2
13
147
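A simplified guess-and-verify sketch of the principle behind parallel decoding methods like lookahead decoding (the real algorithm generates guesses via Jacobi iteration and an n-gram pool; this toy only shows the accept step):

```python
# Conceptual toy, not the actual lookahead decoding algorithm. The key idea:
# score several guessed tokens in what would be ONE forward pass and accept
# the longest prefix the model agrees with, emitting multiple tokens per step.

def verify(model_argmax, context, guesses):
    """model_argmax(context) -> next token the model would pick greedily."""
    accepted = []
    for g in guesses:
        # In a real system all positions are checked in a single batched
        # forward pass; this per-token loop just illustrates the acceptance rule.
        if model_argmax(context + accepted) == g:
            accepted.append(g)           # guess matches the model: keep going
        else:
            break                        # first mismatch ends the accepted run
    return accepted

# Toy "model": always continues the sequence 1, 2, 3, ...
toy_model = lambda ctx: (ctx[-1] + 1) if ctx else 1
print(verify(toy_model, [1, 2], [3, 4, 9]))   # -> [3, 4]
```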
@haozhangml
Hao Zhang
6 months
Was about to drop some cool research results on X last Friday, but then the OpenAI drama wave hit. Thought I'd wait for the calm on Monday to share, but seems like the drama sea is endless. 😅😂
3
2
130
@haozhangml
Hao Zhang
6 months
Hi @mustafasuleyman : how about you give us (lmsys) an API access so we can put inflection-2 in chatbot arena? Thanks!
@mustafasuleyman
Mustafa Suleyman
6 months
Thrilled to announce that Inflection-2 is now the 2nd best LLM in the world! 💚✨🎉 It will be powering very soon. And available to select API partners in time. Tech report linked... Come run with us!
79
118
1K
3
8
124
@haozhangml
Hao Zhang
6 months
Congrats @MSFTDeepSpeed . SplitFuse is a pretty interesting and effective technique to further speedup inference. Similar techniques called piggybacking/chunked-prefill were studied by another group of MSR researchers and posted on arxiv earlier: . Worth a…
@MSFTDeepSpeed
DeepSpeed
6 months
Introducing DeepSpeed-FastGen 🚀 Serve LLMs and generative AI models with - 2.3x higher throughput - 2x lower average latency - 4x lower tail latency w. Dynamic SplitFuse batching Auto TP, load balancing w. perfect linear scaling, plus easy-to-use API
6
121
560
0
13
112
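A toy sketch of chunked-prefill ("SplitFuse"/piggybacking) batch composition, assuming a fixed per-step token budget (illustrative, not the DeepSpeed-FastGen or MSR implementation):

```python
# Each engine step has a fixed token budget; decode requests contribute one
# token each, and the remaining budget is filled with a chunk of a pending
# prompt's prefill, so long prompts never stall ongoing decodes.

def build_step_batch(decode_reqs, prefill_queue, token_budget=256):
    batch = [("decode", r, 1) for r in decode_reqs]      # 1 token per decode
    budget = token_budget - len(decode_reqs)
    while budget > 0 and prefill_queue:
        req_id, remaining = prefill_queue[0]
        chunk = min(remaining, budget)
        batch.append(("prefill", req_id, chunk))
        budget -= chunk
        if chunk == remaining:
            prefill_queue.pop(0)                          # prefill finished
        else:
            prefill_queue[0] = (req_id, remaining - chunk)
    return batch

decodes = ["a", "b", "c"]                  # 3 requests already generating
prefills = [("d", 1000)]                   # 1000-token prompt still to prefill
print(build_step_batch(decodes, prefills)) # decodes + a 253-token prefill chunk
```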
@haozhangml
Hao Zhang
7 months
Our latest work on long sequence training (almost) reinvented sequence parallelism with many new optimizations very specific to today's decoder LLMs and memory-efficient attention. Training 2x faster on 8x longer sequences! There is a secret trick that can readily accelerate…
@RulinShao
Rulin Shao
7 months
Introduce LightSeq for long-context LLM training: - Highly optimized for decoder models - smarter checkpointing - better support for models with fewer heads. Up to 2x faster, 2-8x longer sequences vs Megatron-LM.
7
93
379
0
18
100
@haozhangml
Hao Zhang
6 months
Nice list, but I must admit there's a tinge of sadness that our line of work on Alpa is not on it. Our @lmsysorg team, though widely recognized for Vicuna and vLLM, has been deeply studying model-parallel training for years and has derived a lot of math/systems to help…
@StasBekman
Stas Bekman
6 months
The Model Parallelism chapter of the ML Engineering is now quite complete. The future of training LLM/VLMs is exciting with so many great minds putting their smarts into giving the ML community amazing tools to work with. I will now stop making too many…
5
62
456
5
9
69
@haozhangml
Hao Zhang
1 year
Recently we ported several new open-source large models into the alpa.llm_serving package -- BLOOM-176B, Codegen-16B, and OPT-IML (WIP)!🔥 Try hosting them on your cluster with Alpa. We'll also repurpose to offer inference API endpoints for all of them, stay tuned!
3
5
68
@haozhangml
Hao Zhang
5 months
Thanks for your words. Please consider donating a bunch of GPUs or endpoints to us so we can host more models (esp. those you want to see)! This is really a community effort and we leverage a lot of help, resource-wise and manpower-wise, from community members like @mbzuai
@erhartford
Eric Hartford
5 months
@lmsysorg You are right; I was harsh in my previous comment. I am sorry. thank you for your warm response.
0
0
17
5
4
48
@haozhangml
Hao Zhang
10 days
Welcome @yuxiangw_cs to UCSD!! Yu-Xiang was my go-to TA for convex optimization back when we were at CMU, and I still remember every night I spent on his assignments 😄😁 Excited to reunite at UCSD and take UCSD's ML/systems to the next level!
@yuxiangw_cs
Yu-Xiang Wang
10 days
It's an eventful month in ML with #ICML2024 notice, #AISTATS & #ICLR happening, and #NeurIPS deadline creeping up. It's high time for a *career update*: I've joined @UCSanDiego as an associate professor in @hdsiucsd and @ucsd_cse . Super excited about the new chapter to come!
49
11
290
4
4
48
@haozhangml
Hao Zhang
8 months
The vLLM/PagedAttention paper is available on arxiv!
@_akhaliq
AK
8 months
Efficient Memory Management for Large Language Model Serving with PagedAttention paper page: High throughput serving of large language models (LLMs) requires batching sufficiently many requests at a time. However, existing systems struggle because the…
5
66
296
1
3
46
@haozhangml
Hao Zhang
8 months
Congrats to Mistral on the release of the best 7B model ever! Extremely exciting to see that Mistral adopted the full stack of LLM infra we built at : fastchat as the finetuning and serving infra, vllm as the inference engine, and mt-bench for evaluation!
@GuillaumeLample
Guillaume Lample @ ICLR 2024
8 months
Mistral 7B is out. It outperforms Llama 2 13B on every benchmark we tried. It is also superior to LLaMA 1 34B in code, math, and reasoning, and is released under the Apache 2.0 licence.
52
488
3K
0
1
38
@haozhangml
Hao Zhang
2 years
I'll be speaking at #RaySummit tomorrow (Aug 23) in San Francisco! I'll introduce Alpa (again :-) and explain the technology behind our free unlimited OPT-175B hosting at ! If you happen to be around, let's grab a coffee and chat about big models!🙂
1
5
35
@haozhangml
Hao Zhang
5 months
Flying to Neurips, staying there until 12/17. Happy to chat with anyone on topics: - LLMs: pretraining, finetuning, data curation, etc. - LMSYS: how we can do better to improve our projects at @lmsysorg : arena, fastchat, vLLM, and more! - Phd applicants: I am recruiting 2-3…
@lmsysorg
lmsys.org
5 months
LMSys members will be at NeurIPS! @lm_zheng @infwinston @ying11231 @DachengLi177 @haozhangml Looking forward to meeting people. Let us know if you’d like to chat!
1
2
39
0
3
32
@haozhangml
Hao Zhang
2 months
This is the first project launch we did at my lab at UCSD (). Feeling super proud of my student @Junda_Chen_ for the hard work, and thanks to @Lanxiang_Hu for helping us build this lab website 😁😆
@haozhangml
Hao Zhang
2 months
Check out our latest blogpost discussing a better metric -- goodput (throughput s.t. latency constraints) -- for LLM serving, and our new technique prefill-decoding disaggregation that optimizes goodput and achieves lower cost-per-query and high service quality at the same time!
0
9
35
0
1
32
@haozhangml
Hao Zhang
26 days
Check out Megalodon: a new alternative architecture to transformers: - head-to-head comparison at the scale of 7B and 2T tokens showing lower ppl - unlimited ctx len - constant KV cache at inference. Exciting work by @MaxMa1987 @violet_zct @_xiaomengy_ Ckpts available soon!
@violet_zct
Chunting Zhou
26 days
How to enjoy the best of both worlds of efficient training (less communication and computation) and inference (constant KV-cache)? We introduce a new efficient architecture for long-context modeling – Megalodon that supports unlimited context length. In a controlled head-to-head…
4
51
228
0
8
32
@haozhangml
Hao Zhang
2 years
ICML is a lot of fun! First in-person conference since the pandemic! Gave a big-model tutorial yesterday with @lm_zheng @zhuohan123 and Ion. Met a lot of friends! Check out the tutorial website: there's a lot of useful information there!
0
4
28
@haozhangml
Hao Zhang
1 year
Super excited about this new work (to appear at #ICLR2023 ) that offers a new perspective for private transformer inference! Also, the first work that I supervised as an advisor 😁!
@DachengLi177
Dacheng Li
1 year
(1/5) Love using Copilot but don’t want to send code to the cloud? Our framework enables private inference with Secure Multiparty Computation (MPC) for Transformers (Copilot, ChatGPT, OPT, etc) #ICLR2023 (spotlight) Paper: Code:
4
13
29
1
3
27
@haozhangml
Hao Zhang
1 month
Happy 1 year birthday🎂 to Vicuna. The past 12 months have been super fun building @lmsysorg with amazing students and faculty here
@lmsysorg
lmsys.org
1 month
One year ago was Vicuna's birthday🎂! We were so excited and built a demo for it at chat.lmsys.org. We never imagined it could get this far. Millions of people downloaded our models, visited our demo, and played with our fine-tuning recipe in FastChat project. We then…
7
21
198
0
0
26
@haozhangml
Hao Zhang
18 days
It is super fun working with @vivek7ue and the Snowflake AI Research team on shipping the Snowflake Arctic model. Try this model which went live today if you haven't. One thing I found super exciting is that we discovered so many new challenges and open problems when you shift…
@vivek7ue
Vivek Raghunathan
18 days
A lot of the insider knowledge on how to build an LLM has gone underground in the last 24 months. We are going to build #SnowflakeArctic in the open Model arch ablations, training and inference system performance, dataset and data composition ablations, post-training fun, big…
9
82
632
0
3
26
@haozhangml
Hao Zhang
12 days
And credits to @infwinston , @lm_zheng , and members at @lmsysorg who tirelessly update models and tweets, and moderate arena for the entire community!
@natolambert
Nathan Lambert
12 days
Really interesting talk -- good for people to remember that even ChatBotArena almost died in the summer of 2023 because it took them that long to get traction/support. Huge congrats @haozhangml and team.
2
13
70
1
3
26
@haozhangml
Hao Zhang
15 days
(perhaps) the most important topic in LLMs -- the data recipe!
@SnowflakeDB
Snowflake
16 days
We’re excited to share insights and lessons learned collecting the data needed for Arctic as part of our #SnowflakeArctic Cookbook Series. 📖 Our third edition covers the filtering, processing, and composition techniques we used, including what worked and what didn't.
1
5
35
0
1
25
@haozhangml
Hao Zhang
1 year
We #lmsys are pushing the limit to democratize LLMs. Check out our official release of the Vicuna-7B weights. What's more exciting -- we added Apple Silicon support! Now you can run it on your M1/M2 MacBook, with or without 8-bit, depending on your memory setup!
@lmsysorg
lmsys.org
1 year
We’re releasing Vicuna-7B: small, efficient, yet capable. 💻 MacBook users can simply "pip install fschat" and run Vicuna-7B with GPU acceleration on M1 chips! code: weights:
23
226
1K
0
3
24
@haozhangml
Hao Zhang
8 months
@abacaj And won't be sued by OpenAI?
4
0
22
@haozhangml
Hao Zhang
11 months
MPT-30B is on the leaderboard now -- check it out and cast a vote in the Arena! Within just 6 hours of its release, MPT-30B has already made its way onto the Arena and leaderboard. Kudos to the amazing team behind it: @infwinston , @ying11231 , and @lm_zheng ! 🎉
@lmsysorg
lmsys.org
11 months
Update: a strong model MPT-30B-Chat by @MosaicML has just landed Arena🤖! And yes.. We’ve also evaluated it with MT-bench and updated our leaderboard in the blog! See screenshots. Arena: MT-bench demo with model answers/judgments
2
19
113
1
2
22
@haozhangml
Hao Zhang
6 months
@StasBekman Our team gave a tutorial discussing this at ICML 2022 last year:
1
0
21
@haozhangml
Hao Zhang
11 months
Check out vLLM -- redefining the new state-of-the-art in LLM serving: 24x and 3.5x more throughput than HF transformers and TGI, respectively. Secret sauce behind @lmsysorg : vLLM + FastChat = making Chatbot/LLM serving accessible to everyone!
@zhuohan123
Zhuohan Li
11 months
🌟 Thrilled to introduce vLLM with @woosuk_k ! 🚀 vLLM is an open-source LLM inference and serving library that accelerates HuggingFace Transformers by 24x and powers @lmsysorg Vicuna and Chatbot Arena. Github: Blog:
20
265
1K
1
0
21
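A minimal vLLM offline-inference sketch (API as documented around this period; the model name here is just an example), showing how little code it takes to get PagedAttention and continuous batching:

```python
# Minimal vLLM offline-inference sketch; paged KV caching and continuous
# batching are applied automatically under the hood.
from vllm import LLM, SamplingParams

llm = LLM(model="lmsys/vicuna-7b-v1.5")          # any HF causal LM works
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

outputs = llm.generate(["What is paged attention?"], params)
for out in outputs:
    print(out.outputs[0].text)
```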
@haozhangml
Hao Zhang
1 year
@Stone_Tao @lmsysorg see the blogpost -- we also performed a win-rate analysis here: But we use Elo because it is scalable -- in the future, when a new model joins, we can easily compare capabilities via Elo ratings
1
0
21
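For reference, the standard pairwise Elo update the tweet alludes to (an illustration of why Elo scales to new models; the Arena's exact rating computation and K-factor may differ from this sketch):

```python
# Standard Elo update for one pairwise "battle".
def elo_update(r_a, r_b, score_a, k=32):
    """score_a: 1 if model A wins, 0 if it loses, 0.5 for a tie."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

# A brand-new model starts at a default rating and is compared to all others
# through whichever battles it happens to play.
print(elo_update(1000.0, 1200.0, score_a=1.0))   # upset win -> big rating gain
```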
@haozhangml
Hao Zhang
11 months
We were among the first to eval Vicuna & other open models using GPT-4 -- previously it was for fun (though it was later adopted by many others) -- but now we have a rigorous study of this method after running Chatbot Arena for 1 month! GPT-4 eval is actually *credible* when…
@lmsysorg
lmsys.org
11 months
Since Vicuna's debut of GPT-4 as a judge, interest has sparked in using powerful LLMs to evaluate chatbots. But can we trust LLM judges? Our latest study delves into this question using a multi-turn benchmark, MT-bench, and data from Chatbot Arena.
3
44
256
0
0
20
@haozhangml
Hao Zhang
6 months
Yes, we are partnering with Kaggle to push open evaluation of LLMs! Our Chatbot Arena platform () has seen substantial improvements (both UI and backend). Cast your vote and contribute to open LLM development!
@lmsysorg
lmsys.org
6 months
We're super excited to partner with @kaggle , welcoming the ML and data science community to Arena! With yesterday's Kaggle launch, we recorded our highest traffic since the Arena launch! Over 4K votes in a day🗳️ Our mission remains building an open and community-first…
2
23
163
0
0
20
@haozhangml
Hao Zhang
1 year
Hey! Check this out. Come and play with Vicuna, which we built recently. We also came up with an interesting way to evaluate chatbots using GPT-4, and GPT-4 says Vicuna is very close to ChatGPT quality!
@lmsysorg
lmsys.org
1 year
Introducing Vicuna, an open-source chatbot impressing GPT-4! 🚀 Vicuna reaches 90%* quality of ChatGPT/Bard while significantly outperforming other baselines, according to GPT-4's assessment. Blog: Demo:
58
549
2K
1
3
17
@haozhangml
Hao Zhang
9 months
Vicuna-v1.5 (built on top of Llama) series were just released! Improved MT-Bench and MMLU, longer context length, more permissive license!
@lmsysorg
lmsys.org
9 months
Excited to release our latest Vicuna v1.5 series, featuring 4K and 16K context lengths with improved performance on almost all benchmarks! Vicuna v1.5 is based on the commercial-friendly Llama 2 and has extended context length via positional interpolation. Since its release,…
24
141
680
1
0
17
@haozhangml
Hao Zhang
5 months
It seems we got two grants from A16z in 6 months 😆 ( @lmsysorg and vllm)
@BornsteinMatt
Matt Bornstein
5 months
We're announcing the second batch of @a16z open source AI grants today This cohort focuses on: ▶️ tools for LLM training/ hosting/ evals ▶️ visual AI models & communities Thank you to the grantees for your contributions! More info in the linked post
15
41
244
0
0
18
@haozhangml
Hao Zhang
1 year
The really interesting part I found while doing this project: with just a small amount of good data, it is actually easy and inexpensive to tune a chatbot that responds quite well to users. Tuning Vicuna only costs ~$300. LLMs will be very accessible!
@lmsysorg
lmsys.org
1 year
Through careful prompt engineering, GPT-4 is able to accurately evaluate the response quality in most cases, as shown in the example below. More examples: Code:
2
10
72
0
1
16
@haozhangml
Hao Zhang
1 year
Latest work by our Alpa team on serving large models like #gpt3 #chatgpt at scale; the idea is super simple yet the results are surprising! Check this out!
@zhuohan123
Zhuohan Li
1 year
Unlock the full potential of model parallelism with AlpaServe 🚀: Besides scaling models beyond one GPU, our new paper shows that model parallelism can process NN serving requests 10x faster even if the models fit into 1 GPU! Paper: 👇 [1/8]
3
20
150
0
0
16
@haozhangml
Hao Zhang
11 months
How well do open LLMs truly deliver on their promised context lengths? We designed some simple tests to reveal some false promises!
@lmsysorg
lmsys.org
11 months
🔥Introducing LongChat🤖, our new chatbots supporting 16K tokens context, and LongEval, our new benchmark for testing long context chatbots. 🤥Surprisingly, we found open LLMs often fail to achieve their promised context length. Check our blog for details:
4
106
477
0
0
15
@haozhangml
Hao Zhang
1 year
I have actually switched to Vicuna (previously ChatGPT) for everyday writing assistance, since I host it myself😁
@marcotcr
Marco Tulio Ribeiro
1 year
Blog post: playing with Vicuna-13B, ChatGPT (3.5), MPT-7B-Chat on harder stuff TL;DR: We think ChatGPT is still way ahead, but sometimes the extra control from open source models is worth it.
4
54
313
2
1
15
@haozhangml
Hao Zhang
11 months
We just released a new series of Vicuna v1.3 models -- including the latest, eagerly awaited Vicuna-33B! We also significantly updated the Chatbot Arena leaderboard -- more models entering the Arena, more comprehensive metrics: Elo, MT-bench scores, and MMLU.
@lmsysorg
lmsys.org
11 months
🔥Big news from Chatbot Arena: Meet our new MT-Bench leaderboard & Vicuna-33B! We present a comprehensive, scalable, and validated leaderboard differentiating across open (Falcon, Wizard & Guanaco) and proprietary models (GPT-4, Claude & PaLM). Blog post:
14
102
437
1
2
14
@haozhangml
Hao Zhang
2 years
Our ICML'22 tutorial recording is now publicly available (no registration needed): Interested in knowing how large models like GPT-3 are trained and served? This tutorial has plenty of info! Check out for more!
0
3
14
@haozhangml
Hao Zhang
9 months
Glad that vLLM is recognized!
@zhuohan123
Zhuohan Li
9 months
Deeply honored to be the first cohort of the program and a big shout-out to @a16z for setting up the grant and recognizing vLLM! Let's go, open source!
8
3
91
0
0
15
@haozhangml
Hao Zhang
5 months
Very, very valuable release by my colleagues at Petuum/MBZUAI/CMU. The first ever fully open-sourced LLM pretraining trajectory. I plan to read every detail of this paper: !
@llm360
LLM360
5 months
🚀 1/7 We are thrilled to launch LLM360 — pushing the frontier of open-source & transparent LLMs! Starting with Amber (7B) & CrystalCoder (7B), we are releasing brand new pre-trained LLMs with all training code, data, and up to 360 model checkpoints. 🔗
19
191
1K
2
2
15
@haozhangml
Hao Zhang
8 months
@chrisatgradient Do you mind submitting a PR to vLLM or open an issue? We can fix it in vLLM.
1
0
14
@haozhangml
Hao Zhang
1 year
Actually many of the Vicuna developers #lmsys are hardcore distributed system folks and we'll have more efficient stuff coming out soon! 😍😍
@lmsysorg
lmsys.org
1 year
We encourage users to use our default library, FastChat, for using Vicuna. - It correctly handles prompt templates for the best model quality. - Supports CPU/GPU/Mac. - Nice CLI and GUI with streaming and syntax highlighting - More updates coming soon [3/3]
1
0
22
1
2
14
@haozhangml
Hao Zhang
30 days
We just added categories in chatbot arena. Now you can see how these models compare to each other under code/different languages/context length!
@lmsysorg
lmsys.org
1 month
We tag all the conversations containing code snippets in Coding Arena. In this domain, we find GPT-4-Turbo performs even stronger. This aligns with the recent finding in challenging coding benchmark such as LiveCodeBench by You can also easily view…
Tweet media one
3
10
118
1
3
14
@haozhangml
Hao Zhang
10 months
Thanks, everyone, for casting votes at Chatbot Arena. Besides the leaderboard, we're releasing the conversations and votes we've collected so far to the community to foster more open research down the road! Check the blog post for dataset details:
@lmsysorg
lmsys.org
10 months
We are excited to announce the first major release of the Chatbot Arena conversation dataset! - 33K conversations with pairwise human preferences - 20 SOTA models such as GPT-4, Claude, and LLaMA-based Vicuna - From 13K unique IPs in the wild - An additional 3K expert-level…
14
177
731
0
0
14
@haozhangml
Hao Zhang
1 year
@generatorman_ai @lmsysorg @OpenAI @AnthropicAI the team is working hard to add them and contributions are welcome!
2
0
14
@haozhangml
Hao Zhang
10 months
Yep I'll be at ICML next week with many other team members -- looking forward to catching up!
@lmsysorg
lmsys.org
10 months
Our members @ying11231 @lm_zheng @infwinston @haozhangml will attend ICML 🏝️ next week. DM us if you want to chat!
0
2
21
0
0
14
@haozhangml
Hao Zhang
1 year
@arankomatsuzaki Interesting paper with an obvious conclusion that surprises no one. IMO the most important things here are which capability of the LLMs you care about most and how you evaluate it.
0
0
14
@haozhangml
Hao Zhang
1 year
0
0
13
@haozhangml
Hao Zhang
18 days
Our Reza is sharing how to optimize big MoE kernels !
@Reza_LOD
Reza
18 days
1/4 Have you wondered how to optimize sys-perf for training Arctic-like models (MoE arch)? Let’s dive in! Our first technique: custom fused kernels. By crafting these kernels, we streamline irregular and sparse operators, boosting efficiency. #SnowflakeArctic #SystemOptimization
6
9
37
0
0
12
@haozhangml
Hao Zhang
6 months
@srush_nlp The sharing feature we have now: when you sample multiple candidates from a prompt, vLLM shares as many KV caches as possible at generation time, such as the KV cache of the prompt itself and overlapping beam branches. If you specify your sampling method as parallel sampling…
4
0
12
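A sketch of the parallel-sampling case described above, assuming vLLM's SamplingParams(n=...) interface; the KV-cache sharing itself happens inside the engine:

```python
# Asking vLLM for n candidates from one prompt lets the engine share the
# prompt's KV cache blocks across all candidates (exact sharing behavior is
# internal to the engine; this only shows the caller side).
from vllm import LLM, SamplingParams

llm = LLM(model="lmsys/vicuna-7b-v1.5")
params = SamplingParams(n=4, temperature=0.9, max_tokens=64)  # 4 candidates

result = llm.generate(["Write a haiku about GPUs."], params)[0]
for cand in result.outputs:      # all 4 reuse the prompt's KV blocks
    print(cand.text)
```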
@haozhangml
Hao Zhang
9 months
Longchat-7b gets an upgrade! 32K context length with Llama-2 as the new base!
@DachengLi177
Dacheng Li
9 months
Along with Vicuna-v1.5, we also released LongChat-v1.5, based on Llama-2 and 32k context length. You can try it in FastChat or evaluate it in the LongChat repo !
1
18
79
0
0
12
@haozhangml
Hao Zhang
6 months
UCSD HDSI/CS is on a steep upward trajectory! Our broad area search is open, including assistant professor positions!
@MountainOfMoon
Arya Mazumdar
6 months
Just a reminder that application portals are open for PhD admissions, multiple postdoc positions and faculty positions at @HDSIUCSD and @ucsd_cse
0
1
10
0
1
12
@haozhangml
Hao Zhang
6 months
Our department at UCSD is looking for new faculty members. Check out the job postings below ⬇️⬇️
@GuptaUcsd
Rajesh K. Gupta
7 months
Multiple faculty positions are open at @HDSIUCSD at all levels of seniority. * Broad Area Search: All areas of AI, Machine Learning * Statistical Foundations of Data Science * Data Science and Bioengineering * Data Sciences and Public Policy * Teaching Faculty in all areas of…
0
10
25
1
1
12
@haozhangml
Hao Zhang
9 months
I know recently there are two classes of LLM devs: the GPU-rich are racing to AGI while the GPU-poor are dethroning one another on yet another benchmark. As GPU-poor, I hope this isn't something considered a "counter-productive use of our skills and time" 🤣🤣
@lmsysorg
lmsys.org
9 months
We’ve just added two powerful chat and coding models, Llama-chat-70b and CodeLlama-34b-instruct, to Arena! Challenge them with your toughest prompts and watch them climb the leaderboard. We’ll soon update  ranking once we get enough votes🗳️ Link:
5
19
108
0
0
12
@haozhangml
Hao Zhang
6 months
New information might change my perspective, but my strong respect goes to @sama and @gdb for leading our current AI industry and shipping one of the greatest products, ChatGPT. Looking forward to seeing your next venture and hope to see another model produced by you championing our…
@gdb
Greg Brockman
6 months
After learning today’s news, this is the message I sent to the OpenAI team:
2K
5K
36K
1
0
11
@haozhangml
Hao Zhang
6 months
@srush_nlp that seems like a feature we can support pretty quickly
1
0
11
@haozhangml
Hao Zhang
7 months
@arankomatsuzaki Isn't this just section 4.4 of the paged attention/vllm paper? 😅😅
1
0
10
@haozhangml
Hao Zhang
26 days
@Michaelvll1
Zhanghao Wu
26 days
I am honored to share that our recent paper won the Outstanding Paper Award in NSDI’24! The paper explores the policy design of our SkyPilot managed spot for @skypilot_org : Can’t Be Late: Optimizing Spot Instance Savings under Deadlines It would not be possible, if it were not…
10
5
88
0
0
11
@haozhangml
Hao Zhang
6 months
Check out the latest benchmark comparing vLLM and DeepSpeed
@woosuk_k
Woosuk Kwon
6 months
We’ve just released a new blog post comparing vLLM with DeepSpeed-FastGen. While we are happy to see the open-source technology advancements from the DeepSpeed team, we’ve got different results with more extensive performance benchmarks. vLLM is actually faster than DeepSpeed in…
3
30
210
0
0
10
@haozhangml
Hao Zhang
2 months
Check this important update on the chatbot arena leaderboard we just posted!
@lmsysorg
lmsys.org
2 months
[Arena Update] 70K+ new Arena votes🗳️ are in! Claude-3 Haiku has impressed all, even reaching GPT-4 level by our user preference! Its speed, capabilities & context length are unmatched now in the market🔥 Congrats @AnthropicAI on the incredible Claude-3 launch! More exciting…
30
236
1K
0
0
10
@haozhangml
Hao Zhang
1 year
Come and play Databricks Dolly 2.0; compare open-source chatbots yourself! 😁
@lmsysorg
lmsys.org
1 year
@databricks Dolly 2.0 is now live on @lmsysorg demo server! Come and chat with 🐑🦙
0
10
22
0
1
10
@haozhangml
Hao Zhang
2 years
@Marc__Watkins @pmddomingos For now, due to high traffic (as a free web demo), we restrict max_seq_len to 512 and hardcode some generation params. But you can refer to our tutorial on how to tune a good set of generation params for your prompt; and yes, it is indeed OPT-175B!
0
0
9
@haozhangml
Hao Zhang
1 year
Our approach to evaluating LLMs: 🎉The Season 1 results of the EPIC open-source Chatbot Arena are revealed! 👀💥 Join the arena at , and be the judge yourself!
@lmsysorg
lmsys.org
1 year
Evaluating LLMs is notoriously difficult, and academic benchmarks may fail. Inspired by chess and MOBA games, we are taking a new approach by calculating Elo ratings of models with crowdsourced battle data. - Blog: - Leaderboard:
31
277
1K
0
0
9
@haozhangml
Hao Zhang
1 year
The past 8 years have been a fantastic journey: studying @SCSatCMU , doing a startup with @PetuumInc , postdoc @ucbrise , and working with friends at @anyscalecompute . Really value all the friends and advisors I've met, and looking forward to the next stage at @UCSD . 2/5
0
0
9
@haozhangml
Hao Zhang
1 year
Join the action NOW at ! 💪🏼💻
@lmsysorg
lmsys.org
1 year
We are hosting an exciting battle between open LLMs at and need your help! The goal is to collect 10k anonymous battle results and release a leaderboard. # progress ▓▓░░░░░░░░ 17.21%
5
59
203
0
0
8
@haozhangml
Hao Zhang
11 months
It was great to talk to Emily @electric_humans and thanks for describing our LLM evaluation effort in Chatbot Arena to PCMag audiences!
@electric_humans
Emily Dreibelbis
11 months
You may have put the same prompt into ChatGPT, Bard, Claude, or another chatbot to see which one you like best. But what happens when 40K people do that? Check out the winner in this live UC Berkeley competition. @haozhangml @lmsysorg #OpenAI #GPT4 #GPT3
1
3
4
1
0
9
@haozhangml
Hao Zhang
1 year
This is our major effort at #lmsysorg on building commercial-use-friendly chatbots: absolutely NO restrictions on the license. Primarily contributed by my former student @DachengLi177 .
@lmsysorg
lmsys.org
1 year
We are excited to release FastChat-T5: our compact and commercial-friendly chatbot! - Fine-tuned from Flan-T5, ready for commercial usage! - Outperforms Dolly-V2 with 4x fewer parameters. Link:
30
153
743
1
1
9
@haozhangml
Hao Zhang
19 days
Check out this awesome work by (it is maitrix, not matrix!😄)
@MaitrixOrg
Maitrix.org
19 days
Releasing 🔥LLM Reasoners v1.0🔥 🥇Popular library for advanced LLM reasoning - Reasoning-via-Planning (RAP)🎶 - Chain-of-Thoughts (CoT)⛓️ - Tree-of-Thoughts (ToT)🌴 - Grace decoding💄 - Beam search🔎 🥇Enhances #Llama3 , GPT4, LLMs on @huggingface
2
59
189
0
1
9
@haozhangml
Hao Zhang
9 months
Thanks for putting vLLM on the top 😁
@appenz
Guido Appenzeller
9 months
With Llama 2 turning out to be a viable GPT-3.5 alternative, LLM serving frameworks are getting a lot of attention. Interesting new benchmark: vLLM and CTranslate looking good as usual, and MII/DeepSpeed if you want multiple replicas.
2
4
15
0
0
8
@haozhangml
Hao Zhang
8 months
Yes, FastChat is absolutely amazing. It - is super scalable and elastic: we used it to serve models across global regions - defines and unifies chatbot templates with the community's effort - integrates diverse backend serving engines and frontend chatbot models. If you are looking to serve…
@profjoeyg
Joey Gonzalez
8 months
Our FastChat project is killing it! Great work @lm_zheng , @infwinston , and @haozhangml ! I am looking forward to the big announcements next week. 🏟️
1
3
18
1
0
8
@haozhangml
Hao Zhang
6 months
@mustafasuleyman And let's see if it is also the second best on chatbot arena 😃:
@mustafasuleyman
Mustafa Suleyman
6 months
Thrilled to announce that Inflection-2 is now the 2nd best LLM in the world! 💚✨🎉 It will be powering very soon. And available to select API partners in time. Tech report linked... Come run with us!
79
118
1K
1
0
8
@haozhangml
Hao Zhang
10 months
@lmsysorg According to @natolambert 's estimation, the data might be valued at 36K * $10/prompt (assuming 2 turns on average) = $360K? No joke 🤑🤑🤣
@natolambert
Nathan Lambert
10 months
if you're wondering why preference data is so expensive update: I did some math wrong, the ~$20 is per prompt, so it's like >$5 per turn which is still way more than people think. $10mil, not 20
2
1
24
1
0
7
@haozhangml
Hao Zhang
1 month
@PY_Z001 Why don’t you try LightSeq?😀
1
1
7
@haozhangml
Hao Zhang
4 months
Amazing. Very much looking forward to Gemini Ultra now. Will it dethrone GPT-4-turbo?
@lmsysorg
lmsys.org
4 months
🔥Breaking News from Arena Google's Bard has just made a stunning leap, surpassing GPT-4 to the SECOND SPOT on the leaderboard! Big congrats to @Google for the remarkable achievement! The race is heating up like never before! Super excited to see what's next for Bard + Gemini…
155
632
3K
0
1
7
@haozhangml
Hao Zhang
6 months
True -- this is also the principle that guided the design of lookahead decoding. If there are near-infinite FLOPs in the near future, LLM decoding will just become instant.
@elonmusk
Elon Musk
6 months
True
4K
3K
23K
0
0
7
@haozhangml
Hao Zhang
11 months
very interesting 🧐
@Francis_YAO_
Yao Fu
11 months
Is Falcon really better than LLaMA? Short take: probably not. Longer take: we reproduced LLaMA 65B eval on MMLU and we got 61.4, close to the official number (63.4), much higher than its Open LLM Leaderboard number (48.8), and clearly higher than Falcon (52.7). Code and prompt…
34
129
722
0
0
7
@haozhangml
Hao Zhang
10 months
Vicuna-33B silently tops the AlpacaEval leaderboard 😅🧐.
@lmsysorg
lmsys.org
10 months
Thrilled to see Vicuna-33B top on the AlpacaEval leaderboard! Nonetheless, it's crucial to recognize that open models are still lagging behind in some areas, such as math, coding, and extraction as per our latest MT-bench study [2, 3]. Plus, GPT-4 may occasionally misjudge,…
8
77
362
0
0
7
@haozhangml
Hao Zhang
11 months
The new MT-bench can highlight the gaps between LLMs in specific categories/tasks -- see interesting findings in the blog:
@lmsysorg
lmsys.org
11 months
Key takeaways: - Strong correlation between MT-Bench & Arena Elo - Noticeable gaps between best open vs. proprietary models - Category breakdown reveals area for open models to improve. e.g., Extraction, Coding, & Math - Multi-turn helps differentiating chatbots
2
2
22
0
0
7
@haozhangml
Hao Zhang
1 year
Github link here:
0
0
7
@haozhangml
Hao Zhang
1 year
@4evaBehindSOTA @lmsysorg honestly it is extremely hard to evaluate chatbots in the presence of models pretrained on all Internet data
2
0
7
@haozhangml
Hao Zhang
1 year
Congrats @DachengLi177 on the new work! Dacheng is applying for Fall 2023 PhD programs!!
@DachengLi177
Dacheng Li
1 year
This is a joint work with @HongyiWang10 @haozhangml and @ericxing . Check out our code at: (4/4)
0
1
1
0
1
7
@haozhangml
Hao Zhang
5 months
New updates on our Chatbot Arena -- a lot of the latest open & closed models are in play!
@lmsysorg
lmsys.org
5 months
Exciting Arena Leaderboard Updates! Six new models: - Tulu-2-DPO-70B and Yi-34B-Chat are the new SoTA open models - Mistral-based 7B models (OpenChat, OpenHermes-2.5, Starling-7B) are stronger than ever Big congrats to the OSS AI community! Learn more
12
74
344
0
0
6
@haozhangml
Hao Zhang
6 months
@jerryjliu0 Glad that I revised and submitted two rebuttals lol
0
0
5
@haozhangml
Hao Zhang
1 year
I interviewed with @HDSIUCSD last spring and met a lot of extremely kind and smart people @TweetAtAKK @GuptaUcsd @ilkayaltintas @yuqirose etc., and my old friend @ZhitingHu . The culture in HDSI and UCSD campuses is super welcoming. 3/5
0
0
6
@haozhangml
Hao Zhang
11 months
Meanwhile we introduce two LongChat models with a context length of 16K, using the RoPE condensing method discussed by @yacineMTB . They have much better retrieval abilities than other open LLMs (at the claimed context length); try them and give us feedback!
@lmsysorg
lmsys.org
11 months
LongChat is currently available in two sizes: 7B and 13B. The preview versions are available at and
2
3
24
0
1
6
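A sketch of the RoPE "condensing" (position interpolation) idea, assuming standard rotary angle computation (illustrative only; LongChat's actual implementation lives in its training code): positions beyond the trained length are scaled down so they fall back into the range the base model saw.

```python
# Positions are scaled by trained_len / target_len before computing rotary
# angles, so e.g. a 16K-token position behaves like a 2K-range position.
import numpy as np

def rope_angles(positions, dim=128, base=10000.0, scale=1.0):
    """Return rotary angles for the given (possibly scaled) positions."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    scaled_pos = np.asarray(positions, dtype=np.float64) * scale
    return np.outer(scaled_pos, inv_freq)        # shape [num_pos, dim/2]

trained_len, target_len = 2048, 16384
scale = trained_len / target_len                  # "condense" factor = 1/8

plain = rope_angles([16000])                      # out-of-range position
condensed = rope_angles([16000], scale=scale)     # behaves like position 2000
print(plain.max(), condensed.max())
```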
@haozhangml
Hao Zhang
6 months
@exists_forall Hi, nice paper, but I am wondering if you are aware of our work LightSeq: posted on arxiv a few months ago. The core idea of your optimization for the causal mask appears almost the same as ours.
1
0
6
@haozhangml
Hao Zhang
5 months
@migtissera can you tune mixtral using 4x80G?
1
0
5
@haozhangml
Hao Zhang
11 months
@ReporterWeather @BorisMPower @lmsysorg Not exactly equivalent.. Testing Bard is harder as there is no API access -- we're trying though.
0
0
5
@haozhangml
Hao Zhang
3 months
Pretty neat trick to make LLMs faster for structured outputs
@lmsysorg
lmsys.org
3 months
Here is another example using LLaVA! You can use this feature to extract structured information from images.
6
2
31
0
0
5
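A toy illustration of why constrained/structured decoding can be faster (a sketch of the general "jump-forward" idea, not the lmsys implementation): when the output must follow a fixed template, the literal spans are appended with zero model calls, and only the variable fields cost forward passes.

```python
# Literal template spans are free; only fields hit the model.
def fill_template(template_parts, decode_field):
    """template_parts: list of ("lit", text) or ("field", name) items."""
    out, model_calls = [], 0
    for kind, value in template_parts:
        if kind == "lit":
            out.append(value)              # free: no forward pass needed
        else:
            out.append(decode_field(value))
            model_calls += 1               # only fields hit the model
    return "".join(out), model_calls

template = [("lit", '{"name": "'), ("field", "name"),
            ("lit", '", "age": '), ("field", "age"), ("lit", "}")]
fake_decode = lambda name: {"name": "Ada", "age": "36"}[name]
print(fill_template(template, fake_decode))   # 2 model calls for 5 spans
```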