Gemini 1.5 Model Family: Technical Report updates now published
In the report we present the latest models of the Gemini family – Gemini 1.5 Pro and Gemini 1.5 Flash, two highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information
gemini-1.5 flash was the winner (by a substantial margin!)
- claude haiku is a yes-man :( he told me a shitty classifier was nearly 100% accurate
- 3.5-turbo errs on the side of false negatives: impossible to please!
- flash is the most balanced & has highest agreement with gpt4o (rough scoring sketch below)
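A hypothetical scoring sketch (made-up verdicts, not the actual eval harness): each judge labels the same classifier outputs as correct/incorrect, then we count false positives, false negatives, and agreement with gpt-4o as the reference judge.

```python
# Hypothetical sketch: `truth` marks whether each classifier output was actually
# correct; each judge gives a yes/no verdict on the same examples (toy data).
truth = [True, False, False, True, False, True, False, False]
judges = {
    "gpt-4o":           [True, False, False, True, False, True, True,  False],
    "gemini-1.5-flash": [True, False, False, True, False, True, False, True],
    "claude-haiku":     [True, True,  True,  True, False, True, True,  True],    # says yes a lot
    "gpt-3.5-turbo":    [False, False, False, True, False, False, False, False],  # says no a lot
}

reference = judges["gpt-4o"]
for name, verdicts in judges.items():
    fp = sum(v and not t for v, t in zip(verdicts, truth))  # judge approved a wrong output
    fn = sum(t and not v for v, t in zip(verdicts, truth))  # judge rejected a correct output
    agree = sum(v == r for v, r in zip(verdicts, reference)) / len(truth)
    print(f"{name:18s} false-pos={fp} false-neg={fn} agreement-with-gpt4o={agree:.2f}")
```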
Having worked both as a heavy GCP user and as an engineer on GCP, I’ve been frustrated both by reading bad docs and by spending a lot of time figuring out how to write accurate but succinct docs. Extremely bullish on LLMs to solve this, especially with cheap long context
@suchenzang
@TheSeaMouse
GCP’s biggest flaw is that the documentation is bloated and impossible to navigate. It’s such a pain in the ass to figure out how to do anything
introducing SEAL leaderboards -- strong contender to
@lmsysorg
with (1) private/secure and (2) expert-annotated, high-quality datasets.
some models were even self-hosted on the infrastructure built by our ML infra team!
check it out!
Sonnet by Gemini 1.5 Flash:
The second order climb, a steep ascent,
Where algorithms strive, their purpose clear,
To find the peak, where truth is manifest,
A summit bright, where all the answers cheer.
The optimizer, tireless in its quest,
(1/3)
@ankesh_anand
@CasualBrady
more latency improvements coming soon - Groq also supports only 8k context, with its max speeds coming from aggressively-quantized Llama 70B at 4k
With gradients as its map, it seeks to know
The hidden path, the secret of the crest,
Where wisdom waits, a treasure yet to show,
And heaven's will descends, to put to test
The searching mind, the heart that yearns to find
The truth that lies beyond, for all mankind.
(3/3)
@suquimdecaja
@jeremyphoward
Caching should produce the exact same response distribution as no caching, so no impact on hallucination or inaccuracy rate
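To make that concrete, here's a minimal numpy sketch (toy single-head attention, not any particular serving stack) showing that attention computed with cached prefix keys/values is numerically identical to recomputing the full sequence, which is why caching leaves the output distribution unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                 # head dimension
prefix, suffix = 6, 4  # "cached" tokens vs. new tokens in a follow-up query
n = prefix + suffix

x = rng.normal(size=(n, d))                          # token representations
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def causal_attention(q, k, v, offset=0):
    """Single-head causal attention; `offset` shifts the causal mask so
    suffix queries can attend to all cached prefix positions."""
    scores = q @ k.T / np.sqrt(d)
    nq, nk = scores.shape
    mask = np.arange(nk)[None, :] > (np.arange(nq)[:, None] + offset)
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# 1) No caching: attention over the full sequence in one pass.
full = causal_attention(x @ Wq, x @ Wk, x @ Wv)

# 2) Caching: prefix K/V computed once and reused; only suffix queries run now.
k_cache, v_cache = x[:prefix] @ Wk, x[:prefix] @ Wv
cached = causal_attention(
    x[prefix:] @ Wq,
    np.concatenate([k_cache, x[prefix:] @ Wk]),
    np.concatenate([v_cache, x[prefix:] @ Wv]),
    offset=prefix,
)

# Same outputs for the new tokens, so responses are distributed identically
# with or without the cache.
assert np.allclose(full[prefix:], cached)
```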
@SavinovNikolay
Very interesting that 1.5 Flash performs better than 1.5 Pro on higher-level languages like py/ts and worse on lower-level cpp/rs
@deedydas
The right framing is that AI will fundamentally change software engineering, just as software engineering today is a fundamentally different job than it was in the 1950s. This new form of building logical abstractions will replace today’s software engineering.
People claim LLM knowledge distillation is trivial with logprobs, but that's not quite right...
It's very tricky to distill between different tokenizers. [1]
Internally, we've solved this with a clever algorithm we call tokenization transfer
(1/7)
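For context, the "trivial" recipe people have in mind is the same-tokenizer setup: minimize the KL divergence between teacher and student token distributions at every position. Below is a minimal numpy sketch of that loss (toy shapes; this is the baseline whose assumption breaks across tokenizers, not the tokenization-transfer algorithm mentioned above, which isn't described in this thread).

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_kl(teacher_logits, student_logits, T=2.0):
    """Token-level forward KL(teacher || student), averaged over positions.
    Only meaningful if both models emit logits over the SAME vocabulary and
    split the text into the SAME tokens -- exactly what fails when teacher
    and student use different tokenizers."""
    p = softmax(teacher_logits, T)               # teacher distribution per position
    log_q = np.log(softmax(student_logits, T))   # student log-probs per position
    return float(np.mean(np.sum(p * (np.log(p) - log_q), axis=-1)))

# Toy logits: [num_positions, vocab_size]. With different tokenizers neither
# dimension lines up, so this loss can't even be formed directly.
teacher = np.random.randn(5, 32000)
student = np.random.randn(5, 32000)
print(distill_kl(teacher, student))
```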
@yasinhassanien
@SullyOmarr
@OfficialLoganK
1. Cost: if you have a codebase or list of docs you want to reference in your queries, you can reduce the cost of subsequent queries instead of continuing to pay full price for what’s cached
2. Latency: context caching can reduce latency on subsequent queries
3. Easier prompt construction (rough usage sketch below)
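A rough usage sketch of 1-3, assuming the google-generativeai Python SDK's context-caching API (module and method names may differ by SDK version; the docs file and query are placeholders):

```python
import datetime
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Pay to ingest the large shared context (codebase / docs) once and cache it.
big_docs_text = open("gcp_compute_docs.txt").read()  # placeholder corpus
cache = genai.caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    display_name="gcp-docs",
    system_instruction="Answer questions using only the attached docs.",
    contents=[big_docs_text],
    ttl=datetime.timedelta(hours=1),
)

# Subsequent queries reference the cache instead of resending the context:
# cheaper (1), lower latency (2), and the prompt is just the question (3).
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
resp = model.generate_content("How do I create a VM with 16 GB RAM and 4 vCPUs?")
print(resp.text)
```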
@danesonance
Re: 1) I think this is where I imagine feeding the entire REST API spec to an LLM and having it summarize / generate docs relevant to your particular query (e.g. how to create a VM with X GB RAM, Y CPUs).
I think LLMs still hallucinate too much to do this 100% reliably, but it’s coming.