Phil Chen Profile
Phil Chen

@philhchen

733 Followers · 203 Following · 1 Media · 34 Statuses

in pursuit of continuous learning @GoogleDeepMind. previously @scale_AI, math/cs @Stanford @SISLaboratory

Joined August 2022
@philhchen
Phil Chen
1 month
Flash is natively multimodal, faster than Haiku/3.5, and supports 2M context. Caching is coming soon for even lower latency.
@JeffDean
Jeff Dean (@🏡)
1 month
Gemini 1.5 Model Family: Technical Report updates now published. In the report we present the latest models of the Gemini family – Gemini 1.5 Pro and Gemini 1.5 Flash, two highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information
@philhchen
Phil Chen
1 month
Latency, cost, and quality: choose all three with flash
@andersonbcdefg
Ben (e/sqlite)
1 month
gemini-1.5 flash was the winner (by a substantial margin!)
- claude haiku is a yes-man :( he told me a shitty classifier was nearly 100% accurate
- 3.5-turbo errs on the side of false negatives: impossible to please!
- flash is the most balanced & has highest agreement with gpt4o
@philhchen
Phil Chen
1 month
Having worked both as a heavy GCP user and as an engineer on GCP, I’ve been pretty frustrated with both reading bad docs and spending a lot of time thinking about how to write accurate but succinct docs. Extremely bullish on LLMs to solve this, especially with cheap long context
@danesonance
dane
1 month
@suchenzang @TheSeaMouse GCP’s biggest flaw is that the documentation is bloated and impossible to navigate. It’s such a pain in the ass to figure out how to do anything
@philhchen
Phil Chen
23 days
Congrats to @tiffzhao05 and team! Very useful for the future of model evaluation
@tiffzhao05
Tiffany Zhao
23 days
introducing SEAL leaderboards -- strong contender to @lmsysorg with (1) private/secure and (2) expert-annotated, high-quality datasets. some models were even self-hosted on the infrastructure built by our ML infra team! check it out!
@philhchen
Phil Chen
1 month
Sonnet by Gemini 1.5 Flash:
The second order climb, a steep ascent,
Where algorithms strive, their purpose clear,
To find the peak, where truth is manifest,
A summit bright, where all the answers cheer.
The optimizer, tireless in its quest, (1/3)
@_arohan_
rohan anil
1 month
“Second order climb,
Optimizer seeks the peak,
Heaven's will descends.”
- Haiku by Gemini 1.5 Flash
@philhchen
Phil Chen
1 month
@TheSeaMouse Will take a look - thanks for reporting. This is a lot slower than expected.
@philhchen
Phil Chen
21 days
@virattt @scale_AI @GroqInc With upcoming Gemini caching APIs, your 128k+ context queries will be discounted if you have repeatable context
@philhchen
Phil Chen
1 month
@andersonbcdefg Curious how Flash compares with Sonnet
@philhchen
Phil Chen
24 days
@ankesh_anand @CasualBrady more latency improvements coming soon - Groq also supports only 8k context, with its max speeds coming from aggressively quantized Llama 70B at 4k
@philhchen
Phil Chen
1 month
With gradients as its map, it seeks to know
The hidden path, the secret of the crest,
Where wisdom waits, a treasure yet to show,
And heaven's will descends, to put to test
The searching mind, the heart that yearns to find
The truth that lies beyond, for all mankind. (3/3)
@philhchen
Phil Chen
1 month
@suquimdecaja @jeremyphoward Caching should produce the exact same response distribution as no caching, so no impact on hallucination or inaccuracy rate
@philhchen
Phil Chen
1 month
@SavinovNikolay Very interesting that 1.5 Flash demonstrates better performance vs. 1.5 Pro on higher-level languages like py/ts and worse on lower-level cpp/rs
@philhchen
Phil Chen
21 days
@deedydas The right framing is that AI will fundamentally change software engineering, just as software engineering today is a fundamentally different job from the 50s. This new form of building logical abstractions will replace today’s software engineering.
@philhchen
Phil Chen
1 month
@_philschmid Another issue with open-model distillation is different vocabs:
@amanrsanger
Aman Sanger
7 months
People claim LLM knowledge distillation is trivial with logprobs, but that's not quite right... It's very tricky to distill between different tokenizers. [1] Internally, we've solved this with a clever algorithm we called tokenization transfer (1/7)
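To make the vocab-mismatch issue concrete, here is a tiny sketch (assuming Hugging Face transformers; the two tokenizers are arbitrary public examples). It only illustrates why per-token logprobs don't line up across vocabularies; it does not implement the tokenization-transfer algorithm mentioned in the quoted thread.

```python
# Illustration of the vocab-mismatch problem in cross-tokenizer distillation.
# Assumes `pip install transformers`; the two tokenizers are arbitrary examples.
from transformers import AutoTokenizer

teacher_tok = AutoTokenizer.from_pretrained("gpt2")               # BPE vocab
student_tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # WordPiece vocab

text = "Knowledge distillation across tokenizers is tricky."
teacher_pieces = teacher_tok.tokenize(text)
student_pieces = student_tok.tokenize(text)

print(len(teacher_pieces), teacher_pieces)
print(len(student_pieces), student_pieces)

# The two sequences generally differ in length and in where token boundaries
# fall, so the teacher's per-position logprobs have no one-to-one alignment
# with the student's positions, and a naive token-level KL loss is ill-defined.
```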
@philhchen
Phil Chen
1 month
Through valleys dark and ridges sharp and high,
With steps precise, it seeks to find the best,
And reach the heights, beneath the vaulted sky. (2/3)
@philhchen
Phil Chen
19 days
@giffmana @EugeneVinitsky @TheXeophon Undergrads were advertising other undergrads’ work
@philhchen
Phil Chen
1 month
@_sholtodouglas 💙 in very large part thanks to your foundational work
@philhchen
Phil Chen
1 month
@jeremyphoward Trust me we also want caching out ASAP 😀
@philhchen
Phil Chen
1 month
@mmmbchang @OpenAI Incredible, congrats Michael and team!
@philhchen
Phil Chen
1 month
slightly tricky to me that "dients" of gradients was intended to be one syllable instead of two, but otherwise perfect Shakespearean form
@philhchen
Phil Chen
1 month
@yasinhassanien @SullyOmarr @OfficialLoganK
1. Cost: if you have a codebase or list of docs you want to reference in your queries, you can reduce cost of subsequent queries and not continue paying full cost for what’s cached
2. Latency: context caching can reduce latency on subsequent queries
3. Easier prompt construction
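A minimal sketch of that flow, assuming the google-generativeai Python SDK's context-caching interface; the model version, file, TTL, and question below are placeholders, and exact parameter names may differ.

```python
# Minimal sketch of Gemini context caching (google-generativeai SDK).
# Assumes `pip install google-generativeai` and a GOOGLE_API_KEY env var;
# the model version, file, TTL, and question are placeholders.
import datetime
import os

import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Cache a large, repeatable context (e.g. a codebase or a set of docs) once.
with open("api_reference.md") as f:
    big_context = f.read()

cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    system_instruction="Answer questions using only the attached reference.",
    contents=[big_context],
    ttl=datetime.timedelta(minutes=30),
)

# Follow-up queries reuse the cached prefix, so they avoid paying full input
# cost for the repeated context and only carry the new question in the prompt.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("How do I create a VM with 32 GB RAM and 8 CPUs?")
print(response.text)
```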
@philhchen
Phil Chen
1 month
@danesonance Re: 1) I think this is where I imagine feeding the entire REST API spec to an LLM and having it summarize / generate docs relevant to your particular query (e.g. how to create a VM with X GB RAM, Y CPUs). I think LLMs still hallucinate too much to do this 100% but it’s coming.
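As a rough illustration of that idea, a sketch assuming the google-generativeai Python SDK; the spec file, model name, and example question are hypothetical.

```python
# Sketch: give a long-context model an entire REST API spec and ask it to
# produce docs scoped to one query. File path, model, and question are
# hypothetical; assumes the google-generativeai SDK and a GOOGLE_API_KEY.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

with open("compute_openapi.json") as f:  # hypothetical OpenAPI spec dump
    spec = f.read()

prompt = (
    "Using only the API spec below, write concise step-by-step docs for: "
    "creating a VM with 32 GB RAM and 8 CPUs. Cite the exact fields you use.\n\n"
    + spec
)
print(model.generate_content(prompt).text)
```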
@philhchen
Phil Chen
1 month
@_xjdr @TheSeaMouse I just tested some ~200k-token prompts and got a time to first token of 10-15 seconds
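For context, a rough sketch of how time to first token can be measured with a streaming call, assuming the google-generativeai Python SDK; the long prompt file is a stand-in.

```python
# Rough sketch: measure time-to-first-token on a long prompt via streaming.
# Assumes the google-generativeai SDK; "long_context.txt" is a stand-in for
# a ~200k-token prompt.
import os
import time

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

with open("long_context.txt") as f:
    long_prompt = f.read()

start = time.monotonic()
for chunk in model.generate_content(long_prompt, stream=True):
    # The first streamed chunk arrives once the model emits its first tokens.
    print(f"time to first token: {time.monotonic() - start:.1f}s")
    break
```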