Tweet added by Banghua Zhu @BanghuaZ

Banghua Zhu

5 months

Not sure if that's a fair comparison when Bard is using a search API while GPT-4 and other models are not (example below). The bare-metal Gemini Pro API seems to be in between Mixtral 8x7B and GPT-3.5. So is the key difference search, which greatly improves human preference?

lmsys.org

@lmsysorg

5 months

🔥Breaking News from Arena: Google's Bard has just made a stunning leap, surpassing GPT-4 to the SECOND SPOT on the leaderboard! Big congrats to @Google for the remarkable achievement! The race is heating up like never before! Super excited to see what's next for Bard + Gemini…

Who is John Galt? 🇨🇱

@whoisjohngaltcl

5 months

@BanghuaZ @jeremyphoward Tools…

Axel Pond | Generapedia

@axel_pond

5 months

@BanghuaZ @jeremyphoward I was excited at first but upon closer inspection it is clear they are comparing apples to oranges.

Blaze (Balázs Galambosi)

@gblazex

5 months

@BanghuaZ Perplexity Online is using a search API too in Chatbot Arena. (Try asking about recent events.) I think it's "fair" in the sense that they separate the Gemini API from Bard, so we can compare. Many things are different on Arena: params, MoE or not, closed or open.

Blaze (Balázs Galambosi)

@gblazex

5 months

Very important that Bard is available for FREE on
- Yes, it might use RAG or other techniques in the background that open-source models don't
- But GPT-4, Claude & Mistral Medium are closed too. Cannot see under the hood either
- Bard is the only free one!

The Day Quest

@TheDayQuest

5 months

@BanghuaZ I have to admit, beyond what you're saying, I think it's in the best interests of society if Google fades away. They gave us a bunch, but they stopped improving humanity a long time ago. Google search is a joke these days that only returns an alternating list of topic summaries.

florianleuerer

@florianleuerer

5 months

@BanghuaZ Yes, and it really makes the leaderboard kind of useless. @lmsysorg please add at least a column and a filter for "chatbot / model". It used to be the last useful LLM benchmark …

kolergy

@kolergy

5 months

@BanghuaZ It is about how useful those models are, not about comparing just the underlying raw model.
