The most clearest and crisp explanation, I've ever heard, of how large language models compress and capture a "world-model" in their weights simply by learning to predict the next word accurately.
Furthermore, how the raw power of these base models can then be tamed by teaching
Anthropic was able to solve the "lost in the middle" problem "by adding the sentence “Here is the most relevant sentence in the context:” to the start of Claude’s response. This was enough to raise Claude 2.1’s score from 27% to 98% on the original evaluation."
Does it just take
Using a 7B Model + RAG to Identify and Edit Word-level Hallucinations in LLMs better then GPT-4:
In Short⏩:
> Train a model that consists of a Retreiver and a Language Model:
>> The retriever, Mret, takes the original output you want to check to hallucination (y) and
Why do large language models pay more attention to and reason better over the beginning and end of what you tell them in prompts?🤔
@nelsonfliu
and Percy Liang's group at Stanford recently published a paper () that discovered this "lost in the middle"
Fine-tuned larger language models and longer context lengths eliminate the need for retrieval from external knowledge/vector databases, right? ... Not quite!!
NVIDIA asked the same question last month!
They published a new paper() examining how well very
This is one of the best courses on NLP I've taken!
Stanford - CS224U: NLU
It covers:
Part 1: Details around both the encoder(bidirectional encoder representations from transformer - BERT) and decoder(generative pretrained transformers -GPTs) parts of the original transformer.
❓When using LLMs is unsupervised fine-tuning better than RAG for knowledge-intensive tasks? Should you do both?
If you want to augment an LLM with knowledge of your enterprise data you can do so by augmenting the parametric (finetune) or non-parametric(w/ a vector db like
A breakdown of the Long Context Retrieval Embedding Models from Stanford!💥
In Short⏩:
1. They release 3 long context(2k/8k/32k) BERT-like encoder embedding models on HuggingFace
2. The models are only 80M params and outperform MUCH larger models (4-85x larger)
3. Accessible
Ted Chiang has one of the most profound articles, I've read, on explaining LLMs.
He mentions that understanding and compression are two sides of the same coin.🪙
And interestingly, when we’re dealing with predicting words, lossy compression looks smarter than lossless
Can we use synthetic, LLM generated, data to train the next generations of bigger and better LLMs? How far will synthetic data take us in the pursuit of AGI?🤔
A paper() from researchers at Oxford and Cambridge addressed these questions earlier this year.
Karpathy explains the "SolidGoldMagikarp" attack!
He thinks it's due to a Reddit username that was frequently found in the tokenizer training set but never found in the LLM training set.
Thus when used in a prompt the model doesn't know how to behave as it was never trained.
How do you teach a Large Language Model to understand images?
This paper proposes a technique called Visual Instruction Tuning that is now used by many of the language vision models we see in the field such as GPT4-Vision and Gemini etc.
In Short:
The paper introduces a method
Just watched the awesome talk from
@rao2z
! 💯
Fav Quote: "If you have infinite memory, that you can retrieve from, then you don't need to reason!"
Talk breakdown of topics:
> What are LLMs - very large and sparse n-gram models
> Can they plan?
> Retrieval vs. reasoning.
Can we finetune our LLM and retriever together to improve RAG performance?
This paper proposes a technique to do exactly that!
RAG Basics:
When you prompt an LLM, RAG supplies relevant documents. A separate retrieval model computes the probability of each text chunk being
Can you see how much information the Matryoshka sub-vectors of the OpenAI text-embeddings-3-large model capture?
The graph shows the smoothed stdev. per dimension of 10k random samples from DBpedia embedded with the new text-emb3-large model.
We can see the variance is a
Influence functions are a really cool way to understand why an LLM generated a specific answer to a prompt.
They allow you to trace a sequence of generated tokens back to the training documents that most influenced ita generation.
They also allow you to see which layers of the
What's better than retrieval augmented generation(RAG)? 🥁🥁🥁
Multimodal RAG! 😎👌🔥
RAG allows you to pack retrieved context into a prompt so that a language model can read relevant information before generating a response - this function is critical and allows us to
I knew it! Dota2 MMR is not just a number!?
This study finds "that performance in the popular MOBA correlates with intelligence".
> So my MMR is low basically b/c I'm stupid
> Performance in FPS games doesn't correlate w/ intelligence.
@yacineMTB
@NahazDota
@emollick
Combine that paper with this and you get 10x potential.
This paper finds that talking science 1 on 1 with a science buddy that you trust is a very good for the creative process.
Really cool perspective on why next token prediction might be enough to reach AGI.
In this snippet from the
@dwarkesh_sp
podcast, Ilya argues that great next-token prediction requires a profound level of understanding of underlying concepts to accomplish.
This is in stark
❓Your RAG workflow is only as good as the retrieved context. Can you use LLMs to improve recall and search relevance for dense retrievers?🤔
📜Work from Microsoft () uses synthetic data + LLMs as embedding models to achieve SOTA on the MTEB benchmark.🧵
An Overview of OpenAI's New Truncatable - Matryoshka Embeddings🪆
OpenAI recently announced embeddings that you can simply use chunks of (say the first 8, 16, 32, 64, 128 or 256 ... dimensions of the total 2048d vector) they use Matryoshka representation learning(MRL).
This is
Anytime I use vector search I always wonder what my embeddings "look" like so I decided to take a look!🔍
Each dot is one of 100k Wikipedia article chunks embedded into vector space.
Demo + Code 👇
Here's what I did:
1⃣ Took a dataset of 100k Wikipedia Article chunks
2⃣
❓What text chunk size should we use in our RAG workflows? How does chunk size impact retrieval recall? Are bigger chunks better? Smaller chunks but keep more top-k?
📜The new paper from Tencent and Carnegie Mellon() asked:
1. What chunk size is best to
Let your LLMs work from home🏠🧑💻
ChatGPT works reliably harder when told that "it can WFH"😂- mean response length is higher and response length st. deviation is pretty low.
All other times responses are pretty erratic(super high stdev) - most so after a 2-hour commute🤣
A breakdown of the different types of hallucinations from AI2:🍄
1. Verifiably Factually Wrong ❌
- Entity: an entity in a statement is incorrect (eg. Christmas falls on Nov. 25th)
- Relation: semantic relationship in a statement is incorrect (eg. The mouse ate the cat.)
-
Just watched this discussion on DSPy and ColBERT!
My takeaways:
1. Language models are not yet reliable enough to be used as standalone systems, but they can be very powerful when used as components of larger pipelines.
2. DSPy (a framework for building these pipelines)
Hey everyone! I am BEYOND EXCITED to publish our interview with Omar Khattab (
@lateinteraction
) from
@stanfordnlp
! 🔥
Omar delivers remarkably clear explanations of DSPy and ColBERT, as well as views on the state of AI! I hope you find this useful! 🎙️
How does the underlying data representation change as you use more and more dimensions of a Matryoshka embedding?
Below every frame is a 3d vector space that was generated using PCA on up to only a certain number of MRL vector dimensions.
Took 10k random samples from DBpedia
Multimodal ML models can convert multiple data types (text, images, or audio) into vectors. Each embedding is then stored in the same vector space.
Learn more about multi-modal embeddings and pairing it with vector search in this
@DeepLearningAI
workshop:
Can large language models infer causation from correlation?
And if they can't automatically bridge the gap from correlation to causation, then can we at least fine-tune them to improve at this task?
These two questions were addressed by researchers at the Max Planck
Full blog on MM-RAG out now! 👇
I cover:
1. Contrastive Training of Multimodal Models
2. Any-to-any search and retrieval with code examples
3. MM-RAG with code examples using GPT4-V
What's better than retrieval augmented generation(RAG)? 🥁🥁🥁
Multimodal RAG! 😎👌🔥
RAG allows you to pack retrieved context into a prompt so that a language model can read relevant information before generating a response - this function is critical and allows us to
@abacaj
I feel like right now we're at a spot where prompt engineering is more alchemy then a science. We're just taking shots in the dark and sometimes shit just works...
❓Can you really get a LLM to self-check its own responses for hallucinations?
📜Researchers from Cambridge released a paper() developing a method called SelfCheckGPT - a framework that uses only black-box access to a LLM through an API to assess if it's
A great explanation of 3 levels of "creativity" that AI systems can exhibit and be tested for:
Level 1: Interpolation - making more of the same by averaging known examples
Level 2: Extrapolation - extending the boundaries of your knowledge using what you know
Level 3:
"TikTok video generation SOTA over Runway, Pika 1.0, Morph, Moon Valley and Stable Video Diffusion"
Q: What does it take to bake a delicious ML SOTA model cake?
Data is KING 👑
Next Q: Who has more data than TikTok?
ByteDance just announced MagicVideo-V2
Multi-Stage High-Aesthetic Video Generation
paper page:
The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field. In this work, we introduce
Wrapped up an incredible workshop with
@DeepLearningAI
yesterday!
@sebawita
and I deep dive into Multi-Modal Search with Vector Databases! 📷
YouTube:
Github Repo:
Register for the Weaviate course here:
A Simple Overview of the LLM Training Steps:🔡
1. Unsupervised Pretraining:
>> High quantity, low quality data
>> The model is trained to predict the next token for trillions of tokens.
>> Produces what is called the foundation or base model.
2. Supervised Finetuning:
>>
HYBRID SEARCH in vector db's be like...
Hybrid search performs a hybrid of both ML based semantic search with dense vectors AND keyword search with sparse vectors in parallel.
Semantic search is great for matching concepts and leveraging an ML model's understanding of concepts.
Q: Now that LLMs are widely used what happens if future LLMs are trained on increasingly more diluted AI-generated scraped content?
A: Nothing good, like making a photocopy of a photocopy, data dist. degrades over time.
@iliaishacked
,
@NicolasPapernot
explore this problem...🧵
🧑💻Demo Alert: Cook up some multimodal search magic of your own!🪄🧙
Github Repo:
Multimodal models generate a data representation space unified across modalities - this means we can represent audio, video, text, and images in the same numerical language!
Had a blast speaking at
@KGConference
2023 about designing recommender systems using
@weaviate_io
vector database to provide real-time filtering over millions of objects! Slides from the presentation are here:
Can fine-tuning be used to teach an LLMs entirely novel capabilities or does it just modulate existing ones?
This paper shows that:
(i) fine-tuning rarely alters the underlying model capabilities;
(ii) a minimal transformation is typically learned on top of the underlying
This new work from the Allen Institute is so cool.🔥😎👌
They released, papermage, a Python toolkit for analyzing and processing PDFs. You can use it to loop over the text, tables, and figures in a PDF.
Imagine using this with multimodal embedding and generative models! Cooking
The Revolution Will Not Be Unimodal!
This new paper provides an overview of 26 large multimodal model architectures and training pipelines. They also cover the performance of these LMM on mainstream benchmarks.
VILA the latest LMM from NVIDIA seems to be dominating.
Prof. Hinton is the master of intuitively explaining complicated concepts.
His motivation for how he came up with the idea of dropout is still one of my favorite things in entire the field of deep learning!
Dropout does this seemingly crazy thing of randomly eliminating nodes
Does a 10M(or even ∞) context length mean the death of RAG?
I think that context length and RAG are not competing but rather synergistic solutions.
For every irrelevant token you pass into the context window you unnecessarily increase inference time and cost. And it does add
Interested in multimodality and how it's revolutionizing e-commerce product recommendations?
I'll be speaking with David Wood, CTO of
@moonsift
, about how he's building multimodal recommender systems to improve personalized product discovery!
Sign up here👇
The Python v4 client is really user friendly, my biggest wow moments when internally testing it for the first time:
1. Feels pythonic - just the way it should!🐍
2. It's so much faster - gRPC ftw!⚡
Explore the new data addition capabilities in
@weaviate_io
#Python
client v4! Learn how to add data with the new syntax, from single inserts to improved batching methods and error handling techniques.
Watch the tutorial by
@_jphwang
here or on YouTube:
Can you tell the difference between human-written language and AI-generated text?🤔
To solve this problem we need watermarks!📃
Researchers at the University of Maryland() created a way for us to modify LLMs such that a watermark would automatically be
Can you reliably tell apart fake, LLM-generated, text from human-written text?🤖⚖️👱
Binoculars is a technique that requires no training and can 0-shot detect 90% of LLM-generated content at a 0.01% false positive rate.
In Short⏩:
Human tokens are, on average more surprising
Check out my talk at
@twelve_labs
about the promise of multimodal-RAG used with
@weaviate_io
. A cool concept that was presented at ICML 2023() that can be used to further control the output of mm generative models.
Really excited about the paper reviews initiative...LFG🚀🚀🚀
The goal is to provide digestible 1-2 min summaries of current research in IR, representation learning, RAG, ML to make it more accessible to our community.
Did you know about the Binoculars technique to reliably tell apart fake, LLM-generated, text from human-written text? Or have you heard about Modular RAG? What about Matryoshka Embeddings?
These are just a few posts in our new paper reviews page, where we create 1-2 minute
Pumped to be presenting at the Big Data Conference Europe 2023 today. 🎙️
My talk will focus on how people can get started practically using vector databases, explaining the fundamentals behind how they work, along with a spicy demo of
@weaviate_io
in action!
We attained a ~40% QPS speed up at 90% Recall in Weaviate running on Intel's new Xeon Processor, Emerald Rapids! 🚀
Check out our new blog post for a technical deep dive into different implementations for vector distance calculations and optimizations enabled by Intel's new 5th
Sure, you can train a LLM, perhaps you can even finetune one! But can you brainwash one into forgetting specific concepts?🧠
How would you erase a concept from a LLM's parametric memory?
This question was addressed by researchers at MicrosoftAI in their new
A recent research direction has explored directly prompting LLMs to perform unsupervised ranking using pointwise, pairwise, or listwise techniques. Some of these techniques even surpass the performance of state-of-the-art supervised systems.
In the 10th session of
#MultimodalWeekly
, we have exciting speakers who will share their work in interior design and scaling multimodal models to production
Interested in how you can use multimodal embeddings and large multimodal models with vector databases?
Just finished off a talk at NDC London. I covered vector search, multimodal embedding models, multimodal recommender systems, and MM-RAG!
All slides and demos are below! 👇
Why does binary quantization reduce vector embedding size soo much?
It's kind of like turning a colored image into a black and white image.
You can still understand what the image is about but you lose a lot of information.
More on this in the blog👇
Stoked to be giving a talk and running a workshop next week at
@NDC_Conferences
in London!🇬🇧
Talk/workshop will be focussing on lightning-fast, production-ready multimodal search using
@weaviate_io
.
Workshop:
Talk:
Scientific Discovery at the Speed of Compute!🔬⚡💻
Over the next decade, I am most excited about deep learning accelerated science discoveries (particularly biomedical).
Very cool talk from Chris Bishop on how machine intelligence can augment this discovery process by
You don't read the entire library if you only need to review one paragraph.
Not to mention, you pay per token in both time and money.
Search just scales better.
I'm looking forward to next weeks Haystack2023 EU conference in Berlin!
Tune in to see how multimodal retrieval can help improve search relevance with
@weaviate_io
!
Check it out 🚀💫
GPT-5 live-testing its Q* DotA mode (internally known as QpenAI-5), just one of many capabilities that I've heard emerge at ultramassive scale.
Still some work left to do though.
My recent talk on intro to vector DBs from ML Conf42 is now available on youtube.
The slides I use are here: .
If you have any questions or comments/corrections let me know!
Want to learn more about Vector Databases?
Check out the recording of "A Gentle Introduction to Vector Databases" by
@ZainHasan6
at
@conf42com
2023.
Learn how these databases are transforming the world of Machine Learning!
@ChristophMolnar
Another great resource for self supervised learning is this book by
@ylecun
and colleagues.
"Our goal is to lower the barrier to entry into SSL research by laying the foundations and latest SSL recipes in the style of a cookbook."