Gemini 1.5 Model Family: Technical Report updates now published
In the report we present the latest models of the Gemini family – Gemini 1.5 Pro and Gemini 1.5 Flash, two highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information
gemini-1.5 flash was the winner (by a substantial margin!)
- claude haiku is a yes-man :( he told me a shitty classifier was nearly 100% accurate
- 3.5-turbo errs on the side of false negatives: impossible to please!
- flash is the most balanced & has highest agreement with gpt4o (rough scoring sketch below)
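A hypothetical scoring sketch (made-up verdicts, not the actual eval harness): each judge labels the same classifier outputs as correct/incorrect, then we count false positives, false negatives, and agreement with gpt-4o as the reference judge.

```python
# Hypothetical sketch: `truth` marks whether each classifier output was actually
# correct; each judge gives a yes/no verdict on the same examples (toy data).
truth = [True, False, False, True, False, True, False, False]
judges = {
    "gpt-4o":           [True, False, False, True, False, True, True,  False],
    "gemini-1.5-flash": [True, False, False, True, False, True, False, True],
    "claude-haiku":     [True, True,  True,  True, False, True, True,  True],    # says yes a lot
    "gpt-3.5-turbo":    [False, False, False, True, False, False, False, False],  # says no a lot
}

reference = judges["gpt-4o"]
for name, verdicts in judges.items():
    fp = sum(v and not t for v, t in zip(verdicts, truth))  # judge approved a wrong output
    fn = sum(t and not v for v, t in zip(verdicts, truth))  # judge rejected a correct output
    agree = sum(v == r for v, r in zip(verdicts, reference)) / len(truth)
    print(f"{name:18s} false-pos={fp} false-neg={fn} agreement-with-gpt4o={agree:.2f}")
```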
Having worked both as a heavy GCP user and as an engineer on GCP, I’ve been frustrated both by reading bad docs and by spending a lot of time figuring out how to write accurate but succinct docs. Extremely bullish on LLMs to solve this, especially with cheap long context
@suchenzang
@TheSeaMouse
GCP’s biggest flaw is that the documentation is bloated and impossible to navigate. It’s such a pain in the ass to figure out how to do anything
introducing SEAL leaderboards -- strong contender to
@lmsysorg
with (1) private/secure and (2) expert-annotated, high-quality datasets.
some models were even self-hosted on the infrastructure built by our ML infra team!
check it out!
Sonnet by Gemini 1.5 Flash:
The second order climb, a steep ascent,
Where algorithms strive, their purpose clear,
To find the peak, where truth is manifest,
A summit bright, where all the answers cheer.
The optimizer, tireless in its quest,
(1/3)
@ankesh_anand
@CasualBrady
more latency improvements coming soon - Groq also supports only 8k context, with its max speeds coming from aggressively-quantized Llama 70B at 4k
With gradients as its map, it seeks to know
The hidden path, the secret of the crest,
Where wisdom waits, a treasure yet to show,
And heaven's will descends, to put to test
The searching mind, the heart that yearns to find
The truth that lies beyond, for all mankind.
(3/3)
@suquimdecaja
@jeremyphoward
Caching should produce the exact same response distribution as no caching, so no impact on hallucination or inaccuracy rate
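To make that concrete, here's a minimal numpy sketch (toy single-head attention, not any particular serving stack) showing that attention computed with cached prefix keys/values is numerically identical to recomputing the full sequence, which is why caching leaves the output distribution unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                 # head dimension
prefix, suffix = 6, 4  # "cached" tokens vs. new tokens in a follow-up query
n = prefix + suffix

x = rng.normal(size=(n, d))                          # token representations
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def causal_attention(q, k, v, offset=0):
    """Single-head causal attention; `offset` shifts the causal mask so
    suffix queries can attend to all cached prefix positions."""
    scores = q @ k.T / np.sqrt(d)
    nq, nk = scores.shape
    mask = np.arange(nk)[None, :] > (np.arange(nq)[:, None] + offset)
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# 1) No caching: attention over the full sequence in one pass.
full = causal_attention(x @ Wq, x @ Wk, x @ Wv)

# 2) Caching: prefix K/V computed once and reused; only suffix queries run now.
k_cache, v_cache = x[:prefix] @ Wk, x[:prefix] @ Wv
cached = causal_attention(
    x[prefix:] @ Wq,
    np.concatenate([k_cache, x[prefix:] @ Wk]),
    np.concatenate([v_cache, x[prefix:] @ Wv]),
    offset=prefix,
)

# Same outputs for the new tokens, so responses are distributed identically
# with or without the cache.
assert np.allclose(full[prefix:], cached)
```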
@SavinovNikolay
Very interesting that 1.5 Flash performs better than 1.5 Pro on higher-level languages like py/ts and worse on lower-level cpp/rs
@deedydas
The right framing is that AI will fundamentally change software engineering, just as software engineering today is a fundamentally different job than it was in the 1950s. This new form of building logical abstractions will replace today’s software engineering.
People claim LLM knowledge distillation is trivial with logprobs, but that's not quite right...
It's very tricky to distill between different tokenizers. [1]
Internally, we've solved this with a clever algorithm we call tokenization transfer
(1/7)
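For context, the "trivial" recipe people have in mind is the same-tokenizer setup: minimize the KL divergence between teacher and student token distributions at every position. Below is a minimal numpy sketch of that loss (toy shapes; this is the baseline whose assumption breaks across tokenizers, not the tokenization-transfer algorithm mentioned above, which isn't described in this thread).

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_kl(teacher_logits, student_logits, T=2.0):
    """Token-level forward KL(teacher || student), averaged over positions.
    Only meaningful if both models emit logits over the SAME vocabulary and
    split the text into the SAME tokens -- exactly what fails when teacher
    and student use different tokenizers."""
    p = softmax(teacher_logits, T)               # teacher distribution per position
    log_q = np.log(softmax(student_logits, T))   # student log-probs per position
    return float(np.mean(np.sum(p * (np.log(p) - log_q), axis=-1)))

# Toy logits: [num_positions, vocab_size]. With different tokenizers neither
# dimension lines up, so this loss can't even be formed directly.
teacher = np.random.randn(5, 32000)
student = np.random.randn(5, 32000)
print(distill_kl(teacher, student))
```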
@yasinhassanien
@SullyOmarr
@OfficialLoganK
1. Cost: if you have a codebase or list of docs you want to reference in your queries, you can reduce the cost of subsequent queries instead of continuing to pay full price for what’s cached
2. Latency: context caching can reduce latency on subsequent queries
3. Easier prompt construction (rough usage sketch below)
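A rough usage sketch of 1-3, assuming the google-generativeai Python SDK's context-caching API (module and method names may differ by SDK version; the docs file and query are placeholders):

```python
import datetime
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Pay to ingest the large shared context (codebase / docs) once and cache it.
big_docs_text = open("gcp_compute_docs.txt").read()  # placeholder corpus
cache = genai.caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    display_name="gcp-docs",
    system_instruction="Answer questions using only the attached docs.",
    contents=[big_docs_text],
    ttl=datetime.timedelta(hours=1),
)

# Subsequent queries reference the cache instead of resending the context:
# cheaper (1), lower latency (2), and the prompt is just the question (3).
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
resp = model.generate_content("How do I create a VM with 16 GB RAM and 4 vCPUs?")
print(resp.text)
```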
@danesonance
Re: 1) I think this is where I imagine feeding the entire REST API spec to an LLM and having it summarize / generate docs relevant to your particular query (e.g. how to create a VM with X GB RAM, Y CPUs).
I think LLMs still hallucinate too much to do this 100% reliably, but it’s coming.