LLMs as information retrieval devices

Discover how LLMs like ChatGPT redefine information access, and learn how to tackle the challenge of delivering accurate data through prompt engineering and architecture. An efficient, cost-effective path to precision.

2 February 2024

ChatGPT's front page says it can occasionally generate incorrect information and has limited knowledge of the world after 2021. But what if you want reliable, factual information? What if you want the most up-to-date information? Should you fine-tune a model for that? No! You can use some basic engineering techniques to build a program that gets you factual information. Read this article to learn how.

Large Language Models (LLMs) like ChatGPT and GPT-4 have become the talk of the town. They're designed to process and generate human-like text, ingest tons of data and, at the same time, stay friendly to the machines by allowing parallel computation -- saving us a lot of time and reducing the drop in performance caused by long-range dependencies.

These brainy language models have impressed nearly everyone with their ability to take a chunk of text as input and produce relevant and coherent responses. Whether they are the path towards Artificial General Intelligence (AGI) is still unknown, but they can serve as the building blocks of technologies that were totally science fiction before.

Just like any other technology, however, LLMs have their own strengths and weaknesses. While bloviation and confidently spitting out wrong information remain arguably their biggest weaknesses, there are some areas where they perform really well -- namely in-context learning and semantic reasoning.

These strengths can be leveraged to greatly offset the weaknesses, which means fewer hallucinations. By combining them, you end up with a tool that can query any knowledge source through a natural language interface -- removing the need to know the query language of that particular source.

I've been experimenting with this idea a lot recently, and I feel that sharing my experiences and insights in an article would be valuable for others with similar interests or motives.

Architecture

A simple information retrieval device can be created by putting together the following components:

Disambiguator

User inputs can be ambiguous, and such inputs passed to an LLM can cause hallucinations. It's better to first remove as much ambiguity from a query as possible.

This component is responsible for taking a user query as input and removing semantic ambiguities from it by asking clarification questions. Once the system is confident that it comprehends the query, it can determine the data that the query is requesting from the knowledge source. Its output is a transformed, more explicit query that can be better understood by the next components.
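As a rough sketch, the loop might look like the following. The prompt wording and the `ask_llm`/`ask_user` helpers are assumptions on my part, not a prescribed implementation:

```python
# A minimal sketch of the disambiguation loop. `ask_llm` and `ask_user` are
# hypothetical callables: one sends a prompt to your LLM of choice and
# returns its reply, the other relays a clarification question to the user.

DISAMBIGUATION_PROMPT = """\
You are the first stage of an information retrieval system.
Given the user's query below, reply with exactly one of:
1. CLARIFY: <one question>, if the query is ambiguous, or
2. QUERY: <rewritten query>, fully explicit about the entities,
   time range, and attributes being requested.

User query: {query}
"""

def disambiguate(query: str, ask_llm, ask_user, max_rounds: int = 3) -> str:
    """Loop until the LLM declares the query unambiguous."""
    for _ in range(max_rounds):
        reply = ask_llm(DISAMBIGUATION_PROMPT.format(query=query))
        if reply.startswith("QUERY:"):
            return reply.removeprefix("QUERY:").strip()
        # Still ambiguous: put the clarification question to the user and
        # fold the answer back into the query before retrying.
        answer = ask_user(reply.removeprefix("CLARIFY:").strip())
        query = f"{query} ({answer})"
    return query  # give up after max_rounds and pass the query through
```

Capping the number of rounds keeps the system from interrogating the user indefinitely when a query simply cannot be pinned down.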

If you have a huge amount of complex data in your knowledge source, it's better to first design a taxonomy of the queries that can be asked. This grouping gives you a high-level idea of the kinds of queries you can run into, which is useful for crafting category-specific content that can then be dynamically fetched via a vector database lookup and inserted into the LLM's context.
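For illustration, here is a toy version of that lookup. The bag-of-words `embed` function and the example categories are stand-ins for a real embedding model and vector database:

```python
from collections import Counter
import numpy as np

# Toy embedding: bag-of-words counts over a fixed vocabulary. A real system
# would use an embedding model and a proper vector database instead.
VOCAB = ["revenue", "quarter", "customer", "churn", "product", "region"]

def embed(text: str) -> np.ndarray:
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in VOCAB], dtype=float)

# Category-specific prompt content, keyed by the kind of query it supports.
CATEGORY_CONTENT = {
    "revenue by quarter and region": "Schema: sales(region, quarter, revenue) ...",
    "customer churn over time": "Schema: customers(id, signup_date, churned) ...",
}

def fetch_context(query: str) -> str:
    """Return the category content whose key is most similar to the query."""
    q = embed(query)
    def similarity(key: str) -> float:
        k = embed(key)
        denom = (np.linalg.norm(q) * np.linalg.norm(k)) or 1.0
        return float(q @ k) / denom
    return CATEGORY_CONTENT[max(CATEGORY_CONTENT, key=similarity)]

print(fetch_context("total revenue per region for the last quarter"))
```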

Query Agent

This component takes the unambiguous query as input and breaks it down into small tasks that are then carried out to fetch the relevant information from the knowledge source. These tasks can include probing the knowledge source for schema-related information, doing a vector database lookup to find relevant documents, and determining whether the system has all the prerequisite information required to effectively query the knowledge source.

Since we already know what kinds of queries we are going to receive, we can define a series of specific tasks with predefined acceptance criteria. Once a task has completed, its acceptance criteria can be used to validate its output, as in the sketch below.
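The `Task` structure and placeholder task bodies here are my own assumptions about how such a plan could be wired together; real tasks would probe your knowledge source:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str
    run: Callable[[dict], object]     # executes against shared state
    accept: Callable[[object], bool]  # validates the task's output

def execute_plan(tasks: list[Task]) -> dict:
    """Run tasks in order, validating each output before moving on."""
    state: dict = {}
    for task in tasks:
        output = task.run(state)
        if not task.accept(output):
            raise RuntimeError(f"Task {task.name!r} failed its acceptance criteria")
        state[task.name] = output
    return state

plan = [
    Task("fetch_schema",
         run=lambda state: {"tables": ["sales", "customers"]},
         accept=lambda out: bool(out["tables"])),
    Task("find_documents",
         run=lambda state: ["doc_12", "doc_48"],
         accept=lambda out: len(out) > 0),
]
print(execute_plan(plan))
```

Failing fast on an unmet acceptance criterion is what keeps a bad intermediate result from silently propagating into the final query.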

Once all the tasks have finished executing, a formal, structured query that the knowledge source understands is generated and the results are obtained.

You can then perform an LLM summarization round on the information obtained from your knowledge source, or just return it in raw form.
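If you do summarize, it can be a single extra LLM call; the prompt wording and the `ask_llm` helper below are assumptions:

```python
SUMMARY_PROMPT = """\
Summarize the following query results for the user.
Use only facts present in the results; do not add information.

Question: {question}
Results: {results}
"""

def summarize(question: str, results: str, ask_llm) -> str:
    # One extra LLM pass that turns raw records into a readable answer.
    return ask_llm(SUMMARY_PROMPT.format(question=question, results=results))
```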

The effectiveness of both of these components depends on two things:

  • Crafting effective prompts that constrain the LLM to think in a certain way.
  • Making the LLM reason correctly.

A comprehensive discussion of both topics is out of scope for this article, but I have some pointers that will give you a clearer picture of the dos and don'ts.

Prompt Engineering

Prompt engineering involves crafting use-case-specific inputs that steer an LLM toward predictable and desirable outputs. In other words, it is about supervising the thought process of an LLM so that it aligns with the goals of the user.

There are a lot of prompting techniques out there that you can use depending on what you want to achieve. They provide you with a framework for structuring your prompt, but what should the content be? Mostly, your prompt should contain information relevant to the query.

Having specialized prompts that provide query-specific information greatly decreases hallucinations. But if you make your prompt too specific, you'll end up with a brittle system that is likely to hallucinate on unseen inputs. Your job is to find content that generalizes across different aspects of the query, as sketched below.
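One way to strike that balance is to keep the template fixed and general, and specialize only the injected content fetched per query category. The schema and guidance strings here are made up for illustration:

```python
# A general template: the structure stays fixed, and only the injected
# sections (fetched per query category) carry the specific detail.
PROMPT_TEMPLATE = """\
You answer questions against the knowledge source described below.
Use only the schema and guidance provided; if the question cannot be
answered from them, say so instead of guessing.

Relevant schema:
{schema}

Category-specific guidance:
{guidance}

Question: {question}
"""

prompt = PROMPT_TEMPLATE.format(
    schema="sales(region, quarter, revenue)",
    guidance="Quarters are fiscal quarters, starting in February.",
    question="What was Q2 revenue in EMEA?",
)
print(prompt)
```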

Knowledge-Intensive Reasoning Tasks

LLMs do struggle with reasoning, and they may even fail to get some of the most basic things right. That's probably because they rely a bit too much on their System 1, surface-level thinking. Kojima et al. found that just adding "Let's think step by step" to the input significantly improved performance on diverse reasoning tasks.

Step-by-step thinking coupled with few-shot examples in the context leads to more accurate and traceable outputs. Additionally, you can use mechanisms to check the relevance of the LLM's answers at each step. This is a simple, elegant and powerful framework that provides good accuracy on several reasoning tasks.
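Concretely, both ideas come down to how you assemble the prompt: a worked few-shot demonstration plus the zero-shot trigger from Kojima et al. The demonstration below is made up for illustration:

```python
# Sketch: one few-shot demonstration plus the zero-shot chain-of-thought
# trigger ("Let's think step by step.") from Kojima et al.
FEW_SHOT = """\
Q: A warehouse holds 120 units and ships 45. How many remain?
A: Let's think step by step. We start with 120 units. Shipping removes 45.
120 - 45 = 75. The answer is 75.
"""

def reasoning_prompt(question: str) -> str:
    return f"{FEW_SHOT}\nQ: {question}\nA: Let's think step by step."

print(reasoning_prompt("A store has 200 items, sells 60, then restocks 25. How many now?"))
```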

Final Thoughts

This article covered designing a basic information retrieval system that provides a natural language interface over a knowledge source. Such a system gives you the accuracy, credibility and traceability you need in scenarios where getting factual information is important. It is also much cheaper than training or fine-tuning your own LLM on custom datasets. Moreover, all the user queries that returned relevant information can be stored and dynamically inserted into the LLM's context whenever a similar question is asked again, resulting in a system whose accuracy improves over time -- a mechanism sketched below.
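That caching step could reuse the similarity lookup sketched earlier; here is a minimal version, with `embed` again standing in for a real embedding model:

```python
import numpy as np

# Past (question, formal query) pairs that returned relevant information.
cache: list[tuple[str, str]] = []

def remember(question: str, formal_query: str) -> None:
    cache.append((question, formal_query))

def similar_examples(question: str, embed, top_k: int = 2) -> list[str]:
    """Return past pairs to insert into the LLM's context as examples."""
    q = embed(question)
    def similarity(pair: tuple[str, str]) -> float:
        k = embed(pair[0])
        denom = (np.linalg.norm(q) * np.linalg.norm(k)) or 1.0
        return float(q @ k) / denom
    best = sorted(cache, key=similarity, reverse=True)[:top_k]
    return [f"Q: {ques}\nFormal query: {fq}" for ques, fq in best]
```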

Please let me know your thoughts on it. I look forward to your feedback 🤗🤗.

References:

  • François Chollet. Tweet: https://twitter.com/fchollet/status/1614069777953361920?t=JRBkyjtoEoUEyvTOrgADLA&s=19
  • Zhuosheng Zhang, Aston Zhang, Mu Li, & Alex Smola. (2022). Automatic Chain of Thought Prompting in Large Language Models.
  • Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, & Yusuke Iwasawa. (2023). Large Language Models are Zero-Shot Reasoners.
  • Zamani, H., Dumais, S., Craswell, N., Bennett, P., & Lueck, G. (2020). Generating Clarifying Questions for Information Retrieval. In Proceedings of The Web Conference 2020 (pp. 418–428). Association for Computing Machinery.

This article was written by Muhammad Saad, Co-founder & CTO at Antematter.io