@BanghuaZ
Banghua Zhu
5 months
Not sure that's a fair comparison when Bard is using a search API while GPT-4 and the other models are not (see the attached example). The bare-metal Gemini Pro API seems to sit between Mixtral 8x7B and GPT-3.5. So is the key difference search, which greatly improves human preference?
@lmsysorg
lmsys.org
5 months
🔥Breaking News from Arena: Google's Bard has just made a stunning leap, surpassing GPT-4 to take the SECOND SPOT on the leaderboard! Big congrats to @Google for the remarkable achievement! The race is heating up like never before! Super excited to see what's next for Bard + Gemini…

Replies

@whoisjohngaltcl
Who is John Galt? 🇨🇱
5 months
@axel_pond
Axel Pond | Generapedia
5 months
@BanghuaZ @jeremyphoward I was excited at first but upon closer inspection it is clear they are comparing apples to oranges.
@gblazex
Blaze (Balázs Galambosi)
5 months
@BanghuaZ Perplexity Online is using a search API too in Chatbot Arena (try asking about recent events). I think it's "fair" in the sense that they list the Gemini API separately from Bard, so we can compare. Many things differ on Arena anyway: parameter counts, MoE or not, closed or open.
@gblazex
Blaze (Balázs Galambosi)
5 months
Very important that Bard is available for FREE on
- Yes, it might use RAG or other techniques in the background that open-source models don't
- But GPT-4, Claude & Mistral Medium are closed too; we cannot see under the hood either
- Bard is the only free one!
@TheDayQuest
The Day Quest
5 months
@BanghuaZ I have to admit, beyond what you're saying, I think it's in the best interests of society if Google fades away. They gave us a lot, but they stopped improving humanity a long time ago. Google search is a joke these days; it only returns an alternating list of topic summaries.
@florianleuerer
florianleuerer
5 months
@BanghuaZ Yes, and it really makes the leaderboard kind of useless. @lmsysorg please add at least a column and a filter for "chatbot / model". It used to be the last useful LLM benchmark …
@kolergy
kolergy
5 months
@BanghuaZ It is about how useful those models are, not just about comparing the raw models underneath.