1/12. @softwaredoug wrote a blog post about #ChatGPT as a natural language programming paradigm. He posed some interesting questions about building #search systems with #LLMs. We interviewed our CTO @amin3141 to get his take on these.
2/12. Question 1: Imagine scaling up this system, but I can index millions of documents and it can run queries in sub 100ms time. Is that even possible?
3/12. Answer: There’s lots of interesting research that focuses on training #neural networks to directly serve queries on a corpus of documents by inputting query strings and outputting document ids. While it’s very costly, the gains in performance are also huge.
4/12. With the massive, ongoing investments into #ML hardware innovation, I expect such approaches to become commercially feasible within the next 2-3 years.
5/12. A more tractable approach is leveraging #LLMs to produce document #embeddings that capture #semantic information. With #embeddings, you can achieve a natural language query response latency of sub-100ms.
6/12. #Embeddings are high-dimensional #vectors that allow the interaction between a query and a document to be reduced to a linear function, thereby allowing scaling to millions, even billions, of documents in a knowledge base.
8/12. Answer: Compiling to optimized machine code is already happening: e.g., for #PyTorch, the compiled, optimized representation is TorchScript. But it goes much further than that: #neural networks today run on purpose-built hardware like TPU, Inferentia, Tensor Chip, NVIDIA A/H100, IPU.
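As a minimal sketch of the TorchScript compilation mentioned above (assuming PyTorch is installed; the function here is arbitrary, just to show the mechanism):

```python
import torch

# torch.jit.script compiles a Python function into TorchScript, an
# optimized, serializable graph representation that runs without the
# Python interpreter in the loop.
@torch.jit.script
def scaled_relu(x: torch.Tensor) -> torch.Tensor:
    return torch.relu(x) * 2.0

out = scaled_relu(torch.tensor([-1.0, 3.0]))
print(out)  # tensor([0., 6.])
```

The scripted function can be saved with `scaled_relu.save(...)` and loaded in a C++ runtime, which is the deployment path the compilation story enables.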
9/12. The biggest problem with #generative #LLMs isn't their speed, but a problem known in the research literature as hallucination. They can unpredictably fabricate information that sounds plausible, but is, in reality, false & unsupported.
10/12. While research into controlling or eliminating hallucination is very active, and I have no doubt that it will be solved at some point, it is a serious impediment to realizing the potential of #generative #LLMs.
11/12. When hallucination is solved, #generative #neural systems are going to reshape the business world. They will be powered on the backend by extractive #neural platforms like Vectara, and will fully automate lots of knowledge-based jobs in every industry.
12/12. Unlike #generative systems, extractive #search platforms put customers in full control of the provenance of #search results, which are always extracted, verbatim, from material explicitly indexed in the system.