1/12. @softwaredoug wrote a blog post about #ChatGPT as a natural language programming paradigm. He posed some interesting questions about building #search systems with #LLMs. We interviewed our CTO @amin3141 to get his take on these.
2/12. Question 1: Imagine scaling up this system, but I can index millions of documents and it can run queries in sub 100ms time. Is that even possible?
3/12. Answer: There’s lots of interesting research that focuses on training #neural networks to directly serve queries on a corpus of documents by inputting query strings and outputting document ids. While it’s very costly, the gains in performance are also huge.
4/12. With the massive, ongoing investments into #ML hardware innovation, I expect such approaches to become commercially feasible within the next 2-3 years.
5/12. A more tractable approach is leveraging #LLMs to produce document #embeddings that capture #semantic information. With #embeddings, you can achieve a natural language query response latency of sub-100ms.
6/12. #Embeddings are high-dimensional #vectors that allow the interaction between a query and a document to be reduced to a linear function, thereby allowing scaling to millions, even billions, of documents in a knowledge base.
8/12. Answer: Compiling to optimized machine code is already happening: e.g., for #PyTorch, the compiled, optimized representation is TorchScript. But it goes much further than that: #neural networks today run on purpose-built hardware like TPU, Inferentia, Tensor Chip, NVIDIA A/H100, IPU.
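As a minimal sketch of the TorchScript compilation mentioned above (assuming PyTorch is installed; the function here is arbitrary, just to show the mechanism):

```python
import torch

# torch.jit.script compiles a Python function into TorchScript, an
# optimized, serializable graph representation that runs without the
# Python interpreter in the loop.
@torch.jit.script
def scaled_relu(x: torch.Tensor) -> torch.Tensor:
    return torch.relu(x) * 2.0

out = scaled_relu(torch.tensor([-1.0, 3.0]))
print(out)  # tensor([0., 6.])
```

The scripted function can be saved with `scaled_relu.save(...)` and loaded in a C++ runtime, which is the deployment path the compilation story enables.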
9/12. The biggest problem with #generative #LLMs isn't their speed, but a problem known in the research literature as hallucination. They can unpredictably fabricate information that sounds plausible, but is, in reality, false & unsupported.
10/12. While research into controlling or eliminating hallucination is very active, and I have no doubt that it will be solved at some point, it is a serious impediment to realizing the potential of #generative #LLMs.
11/12. When hallucination is solved, #generative #neural systems are going to reshape the business world. They will be powered on the backend by extractive #neural platforms like Vectara, and will fully automate lots of knowledge-based jobs in every industry.
12/12. Unlike #generative systems, extractive #search platforms put customers in full control of the provenance of #search results, which are always extracted, verbatim, from material explicitly indexed in the system.