Super excited to introduce 🌳Acadia (@AcadiaAI) Playground, an interpretable data exploration tool to understand your evaluation data’s quality and help unlock insights into model performance using AI!
🧵
Thrilled to welcome our newest cohort of Venture Partners to the Contrary family! With nearly 1300 applications, this year was our most competitive yet.
We’re excited to work with you all to meet and invest in the next generation of exceptional founders and companies.
We also…
from last-minute, late-night ideas to fruition, the beautiful Figma offices to the inspiring ppl. true thanks to @hackclub and the Assemble team for making things happen!
#assemble22
#sf
3/ Hierarchical semantic clustering
A clustering scheme that generates an interconnected hierarchy, linking ideas together into a single post
Consolidate your notes into a blog post
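The hierarchy-building step above can be sketched as a simple bottom-up merge over note embeddings. This is not Acadia's actual algorithm, just a minimal stdlib sketch; the toy 2-D vectors stand in for real note embeddings.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def agglomerate(vectors, n_clusters):
    # Start with every note in its own cluster, then repeatedly merge the
    # two most similar clusters (centroid linkage) until n_clusters remain.
    clusters = [[i] for i in range(len(vectors))]

    def centroid(c):
        dim = len(vectors[0])
        return [sum(vectors[i][d] for i in c) / len(c) for d in range(dim)]

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

# Four toy "note" embeddings: two about one theme, two about another.
notes = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.0, 0.9]]
print(agglomerate(notes, 2))  # [[0, 1], [2, 3]]
```

Cutting the merge process at different cluster counts gives the different levels of the hierarchy.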
🥇First place
@JvNixon
@_nathanmarquez_
@zvhgpyxqtnys
the most sf imagery from today is seeing two ppl squeeze into a waymo front seat and another waymo blow up from fireworks 😳 anyways.. happy lunar new year!🧧
This is our first of many steps towards bringing interpretability into datasets and evals of growing scale, complexity, and modality.
We want to make it easy to unlock high quality signal from the data for many LLM + multimodal applications.
6/6
Here's a new SOTA text-to-image eval metric that's much better at complex compositional reasoning than current ones (e.g., CLIPScore, PickScore)!
We also show that it generalizes to video/3D evaluation + released a comprehensive t2visual meta-eval metrics benchmark.
Great to have…
In text-to-image generation, evaluating how well the generated image matches the prompt is a major challenge. We address this with VQAScore: a SOTA metric that significantly surpasses CLIPScore, PickScore, ImageReward, TIFA, and more!
VQAScore works especially well on complex…
AI companies: introducing our new talented 👏 brilliant 👏incredible👏amazing👏show stopping👏spectacular👏never the same👏model
Also AI companies: you can't use it yet
🗃️ Combine "Topics" of choice to filter and inspect individual data points
🧐 Select a model of interest, toggle on failure-case mode, log, and visualize where failure cases occur
2/6
@khoomeik
@ArYoMo
i'm curious--how are you baselining with gpt4v exactly? inputting a screenshot & directly prompting it to output observation, thought, and action? i usually find gpt4v to be better at relative spatial reasoning/spitting out img descriptions
🛝You can define a custom set of task-specific "Topics" of interest, and Acadia Playground visually decomposes a target dataset's content into these categories
🔍 Explore dynamic embedding views of your data points--either embedded by overall semantics or “Topic” slices
1/6
pulling out a weekend project from a few mos ago...
Fireo🔥, a neural net tensor shape debugger!
- Useful print statements only
- Only needs pseudo input + model class
- No more hours spent manually tracing through shapes in your dl model dev workflow
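Fireo's internals aren't shown here, so this is only a minimal plain-Python sketch of the core idea: given a pseudo input shape and a model's layer stack, propagate and print shapes only. The toy Linear/Flatten layers below are hypothetical stand-ins for real framework modules.

```python
class Linear:
    def __init__(self, in_dim, out_dim):
        self.in_dim, self.out_dim = in_dim, out_dim

    def out_shape(self, shape):
        # Catch shape mismatches at trace time, before any real forward pass.
        assert shape[-1] == self.in_dim, f"expected {self.in_dim}, got {shape[-1]}"
        return shape[:-1] + (self.out_dim,)

class Flatten:
    def out_shape(self, shape):
        batch, *rest = shape
        n = 1
        for d in rest:
            n *= d
        return (batch, n)

def trace_shapes(layers, pseudo_input_shape):
    # Useful print statements only: one line per layer, shapes and nothing else.
    shape = pseudo_input_shape
    report = [shape]
    for layer in layers:
        shape = layer.out_shape(shape)
        report.append(shape)
        print(f"{type(layer).__name__}: -> {shape}")
    return report

model = [Flatten(), Linear(28 * 28, 128), Linear(128, 10)]
trace_shapes(model, (32, 28, 28))  # (32, 784) -> (32, 128) -> (32, 10)
```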
@AcadiaAI Playground is multimodal! We used it to analyze
🖼️ Winoground (VLM image caption matching task)
💻 HumanEval (LLM code generation task)
More details coming soon :)
3/6
@itsandrewgao
the swin transformer for example. also, although naive attention's work is O(n^2), multi-head attention's parallelizability makes the span closer to linear or even log n.
JUST IN: Meta AI introduces LLaMA, a 65B parameter LLM.
LLaMA relies only on publicly available data and outperforms GPT-3 on most benchmarks despite being 10x smaller.
@AcadiaAI Playground can also be used for:
- Cross-compare various models to evaluate the best model for your use case
- Identify and target weaknesses in your dataset distribution (such as duplication or misrepresented categories) to inform better data curation
4/6
when reading research papers, isn't it so annoying to click the link to see the citations but then have to scroll all the way back up or am i missing out on something?
demo day was awesome. cv has always been extremely interesting to me but I had never first-hand witnessed how inspiring it may also be for others until today, esp by its real-world applications that bridge imaginative sci-fi with reality. 🦾
#gangstaminecraft
200 on clip is crazy 😱. there’ll probably be a lot more on nerfs / 3d vision once 2d vision is solved (alr feels like it has by gpt4v but opensource still has a long way to go)
@YiMaTweets
hmm feels like it's more former ⊆ latter. classification/recog. are discriminative tasks whose objective is to learn the conditional prob distribution P(Y|X) aka decision boundaries, which is a subset of generative models that learn a joint distribution P(X,Y) that we can sample from
@akbirthko
awesome, this was what i was leaning towards. but in this case, what is the point of even having different heads if their end result is concatenated together anyways b4 the linear layer? don't the q, k, v operate independently between the different hidden dims anyway?
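On the "why bother with separate heads" question, here's a toy numpy sketch (random weights, purely illustrative) of the detail that keeps heads from being redundant: the softmax is applied per head, over each head's own Q/K slice, so the heads produce genuinely different attention patterns before the concat + output linear.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq, d_model, heads = 3, 8, 2
head_dim = d_model // heads

x = rng.normal(size=(seq, d_model))
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model))

# Split the projected q/k along the hidden dim: one slice per head.
q = (x @ Wq).reshape(seq, heads, head_dim)
k = (x @ Wk).reshape(seq, heads, head_dim)

# One attention map per head, each softmaxed independently: (heads, seq, seq).
scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
attn = softmax(scores, axis=-1)

# The per-head softmax is the nonlinearity a single fused projection can't
# reproduce; with random weights the two heads attend differently.
print(np.allclose(attn[0], attn[1]))  # False
```

So yes, the q/k/v slices operate independently, but each head's own softmax over its own subspace is what makes the concatenation more expressive than one wide projection.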
@HaoliYin
I've actually thought about this b4 haha! I feel like generating accurate and robust 3D meshes/point clouds/surfaces is a pretty difficult and unsolved problem.
imagine if there exists an arXiv that consists of papers/logs of project ideas that failed or went nowhere. that way, actual innovation might progress much faster.
reliable models only result from robust evaluations and metrics. what are (relatively) non-subjective ways to eval generative models or is that just its nature?
IT IS OFFICIAL!!! The world’s biggest, most powerful rocket ever, will attempt its first launch on the morning of Monday, April 17th!!! We have our stream ready to go with some amazing views and incredible audio to help bring you along!
@gdb
increase in RPD limits; random server errors occur at times; browser version feels like it’s much more willing to describe; log probs would be great!
@MarioKrenn6240
Due to the influx of papers, it's rare for any AI researcher to have read every single paper in their respective subdomain, so there're undoubtedly lots of overlapping "novelties." Even just having a systematic approach for tracking defs and training paradigms would be helpful
currently playing with
@runwayml
's gen-2 video gen models -- definitely something going on
"A baker pulling freshly baked bread out of an oven in a bakery"
send in some prompts👇
today i ran into a symposium at the CMU robotics institute while exploring campus. interesting work + had very nice convos abt language and vision w/ these grad students who presented at CVPR & ICML
@HaoliYin
@alexfmckinney
i say try the former, if not good enough then the latter, we def have stronger text embedding models than vision. also i'm interested to see how close CLIP img encoder embeddings are to img->description->CLIP text embeddings, perhaps that could be a finetuning objective for CLIP
The software engineering aspect of deep learning repos I've been watching closely is how they store, catalogue, override, manage and plumb hyperparameter configs. Have come to dislike argparse, YAMLs (too inflexible), and fully enumerated kwargs on classes/defs. Any favorites?
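Not a settled answer, but one pattern that addresses those exact complaints is typed, frozen, nested dataclasses with explicit immutable overrides: flatter than YAML, more discoverable than argparse, and type-checked. All the config names below are made up for illustration.

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class OptimizerConfig:
    lr: float = 3e-4
    weight_decay: float = 0.01

@dataclass(frozen=True)
class TrainConfig:
    batch_size: int = 64
    epochs: int = 10
    optimizer: OptimizerConfig = field(default_factory=OptimizerConfig)

base = TrainConfig()
# Overrides are explicit immutable copies: easy to log, diff, and sweep.
run = replace(base, batch_size=128,
              optimizer=replace(base.optimizer, lr=1e-4))
print(run.batch_size, run.optimizer.lr)  # 128 0.0001
```

Frozen instances also make it impossible for training code to silently mutate the config mid-run, which is half the plumbing pain.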
Apart from intention-based factors such as company direction and algorithm design, it's interesting to note the dissimilarity in current knowledge transfer ability between natural language-based (twitter) and vision/img/vid-based (insta, tiktok) mediums. language is clearly ahead
quick technical question: does increasing the # of heads in the transformer MSA increase the param count? i've gotten mixed answers. if this is implementation-dependent, then is there a standard? for most implementations i've seen (pytorch & swin) the answer seems to be no.
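Under the convention the question describes (PyTorch/Swin style, where d_model is split so head_dim = d_model // num_heads), a quick arithmetic check confirms the "no". This sketch assumes that splitting convention; implementations that instead give every head a full d_model-wide projection would scale with the head count.

```python
def msa_param_count(d_model, num_heads, bias=True):
    # Standard splitting convention: each head projects d_model -> head_dim.
    assert d_model % num_heads == 0
    head_dim = d_model // num_heads
    # Q, K, V: num_heads projections of size d_model x head_dim each,
    # which together always total 3 * d_model * d_model weights.
    qkv = 3 * num_heads * (d_model * head_dim + (head_dim if bias else 0))
    # Output projection maps the concatenated heads back to d_model.
    out = d_model * d_model + (d_model if bias else 0)
    return qkv + out

# Same count regardless of the number of heads:
for h in (1, 2, 4, 8):
    print(h, msa_param_count(512, h))  # always 1050624
```

The head count only changes how the fixed d_model x d_model budget is partitioned, not its size.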