Retrieval-Augmented Generation (RAG) is a technique for optimizing the output of a Large Language Model (LLM) so that it references an external knowledge source (documents, PDFs, etc.) or knowledge base outside its training data before generating a response.
The launch of ChatGPT in November 2022 was a significant turning point for RAG. Since then, RAG has been widely used to combine external documentation and knowledge sources with the reasoning capabilities of Large Language Models (LLMs).
In this blog post, you will get a brief introduction to the RAG concept, its purpose, and its characteristics.
What is Retrieval-Augmented Generation (RAG)?
RAG, or Retrieval-Augmented Generation, is a framework that combines two main approaches in natural language processing: retrieval and generation.
It involves retrieving relevant information from a large dataset (unstructured or semi-structured data) and then using that information to enhance the generation of text-based outputs, such as summaries or answers to questions.
This combination aims to improve the quality and relevance of generated text by providing additional context from retrieved sources.
Why is Retrieval-Augmented Generation important?
Retrieval-Augmented Generation (RAG) is important because it addresses several challenges faced by Large Language Models (LLMs), which are used in AI for chatbots and natural language processing:
Reducing False Information: Sometimes, LLMs answer questions with false information, especially when they don't know the answer. RAG helps by guiding the LLM to find and use reliable information.
Keeping Information Current: LLMs often have a knowledge cutoff, meaning they can't provide the latest information. RAG connects the LLM to up-to-date sources, ensuring responses are current and relevant.
Using Trustworthy Sources: Without RAG, LLMs might create responses from sources that aren't reliable. RAG ensures the LLM uses authoritative and trustworthy sources, improving the accuracy of its responses.
Clarifying Terminology Confusion: LLMs can get confused when the same words mean different things in different contexts. RAG helps by retrieving information from sources that use terminology accurately and consistently.
Think of an LLM like a keen new employee who's eager to answer every question confidently but isn't always well-informed. This can harm user trust. RAG helps this "employee" to stay informed by guiding them to reliable, current information, thereby improving trust and accuracy in chatbot responses.
What are the benefits of Retrieval-Augmented Generation?
Why do we still need RAG when we already have LLMs? The reason is simple: an LLM alone cannot solve the problems that RAG addresses. These problems include:
Model hallucination problem: Text generation in an LLM is based on probability. Without sufficient factual support, the model may generate content that sounds plausible and authoritative but is factually incorrect.
Timeliness problem: The larger an LLM's parameter count, the higher the training cost and the longer training takes. As a result, time-sensitive data may not make it into training in a timely manner, leaving the model unable to answer time-sensitive questions directly.
Data security problem: Generic LLMs do not have access to enterprise-internal or user-private data. To ensure data security while using an LLM, a good solution is to store the data locally and perform all data computations locally; the cloud LLM then only serves to summarize information.
Answer constraint problem: RAG provides more control over LLM generation. For instance, when a question involves multiple knowledge points, the clues retrieved through RAG can be used to limit the boundaries of LLM generation.
Workflow of RAG
Indexing: Indexing is a multi-step process, as you can see in the picture. Step by step, you first create an external dataset, chunk it into smaller, usable pieces, and use an embedding model to convert those pieces into numeric vectors.
The new data outside of the LLM's original training dataset is called external data. It comes from various sources such as APIs, databases, and repositories, and arrives in different formats: files, database records, or long-form text.
1) The data is converted into useful, smaller chunks.
2) The data is labeled (this step is optional but can improve vector database search efficiency).
3) The data is converted into numerical form by an embedding model and stored in a vector database, creating a usable knowledge library for AI models.
Finally, an index is created to store these text chunks and their vector embeddings as key-value pairs, enabling efficient and scalable search. The result is a knowledge library that generative AI models can understand; a minimal sketch of this pipeline follows.
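To make this concrete, here is a minimal sketch of the indexing pipeline in Python. The `embed` function is a toy stand-in for a real embedding model (for example, a sentence-transformers model), and the in-memory list stands in for a real vector database; both are assumptions for illustration only.

```python
# Minimal indexing sketch: chunk documents, embed the chunks, build an index.

def embed(text: str, dim: int = 8) -> list[float]:
    # Toy stand-in for a real embedding model -- a deterministic dummy
    # vector derived from the text's hash, for illustration only.
    h = abs(hash(text))
    return [((h >> (4 * i)) % 100) / 100.0 for i in range(dim)]

def chunk(document: str, chunk_size: int = 500) -> list[str]:
    """Split a long document into smaller, usable chunks."""
    return [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]

# External data gathered from files, databases, APIs, etc. (sample text).
documents = [
    "Employees accrue 1.5 days of annual leave per month of service.",
    "Unused annual leave may be carried over, up to a maximum of 10 days.",
]

# The index stores (embedding, chunk) pairs -- the key-value structure above.
index: list[tuple[list[float], str]] = []
for doc in documents:
    for piece in chunk(doc):
        index.append((embed(piece), piece))
```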
Incorporating the pgvector extension with PostgreSQL not only results in substantial cost savings but also leverages the strengths of an open-source database system. The best part is that a PostgreSQL database can be used at multiple stages of enterprise data management: for structured data, semi-structured data, and vector data.
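As a sketch of what that looks like with pgvector (assuming PostgreSQL with the extension installed and the psycopg2 driver; the connection string, table name, and sample values are hypothetical):

```python
import psycopg2

# Hypothetical connection details -- adjust for your environment.
conn = psycopg2.connect("dbname=rag_demo user=postgres")
cur = conn.cursor()

# Enable pgvector and create a table with a vector column.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS doc_chunks (
        id        bigserial PRIMARY KEY,
        content   text,
        embedding vector(8)   -- dimension must match your embedding model
    );
""")

# Store a chunk with its embedding (pgvector accepts vectors as string literals).
cur.execute(
    "INSERT INTO doc_chunks (content, embedding) VALUES (%s, %s);",
    ("Employees accrue 1.5 days of annual leave per month.",
     "[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8]"),
)

# Nearest-neighbour search: <=> is pgvector's cosine-distance operator.
cur.execute(
    "SELECT content FROM doc_chunks ORDER BY embedding <=> %s::vector LIMIT 3;",
    ("[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8]",),
)
print(cur.fetchall())
conn.commit()
```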
(To learn more about vector databases, please read: Vector database and why it's Popular in AI)
Retrieve Information: The next step is to perform a relevancy search. The system transforms the user's query into a vector and searches for relevant documents in the vector database. For instance, a chatbot for HR queries retrieves related policy documents and the employee's records based on the specific query, such as "How much annual leave do I have?". These documents are returned because they are highly relevant to what the employee has input; the relevancy is calculated and established using mathematical vector representations and similarity calculations.
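Continuing the toy index from the indexing sketch above (this reuses that sketch's `embed` function and `index` list), relevancy search reduces to embedding the query and ranking stored chunks by vector similarity, here cosine similarity:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors -- 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, index: list[tuple[list[float], str]],
             top_k: int = 3) -> list[str]:
    """Embed the query and return the top_k most similar stored chunks."""
    q_vec = embed(query)  # embed() from the indexing sketch
    ranked = sorted(index,
                    key=lambda pair: cosine_similarity(q_vec, pair[0]),
                    reverse=True)
    return [text for _, text in ranked[:top_k]]

relevant_chunks = retrieve("How much annual leave do I have?", index)
```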
Augment Input: The RAG model then enhances the user's input by adding the retrieved information. This uses prompt engineering to interact more effectively with the language model, resulting in more precise responses.
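A minimal sketch of that augmentation step; the prompt template below is one plausible wording, not a prescribed format:

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Augment the user's question with retrieved context before calling the LLM."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )

# Sample retrieved chunks (would come from the retrieval step above).
chunks = [
    "Employees accrue 1.5 days of annual leave per month.",
    "Unused annual leave may be carried over, up to a maximum of 10 days.",
]
prompt = build_prompt("How much annual leave do I have?", chunks)
# `prompt` is then sent to the LLM of your choice.
```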
Update Data: To keep the external data fresh and relevant, it's updated asynchronously, either in real-time or through scheduled batch processes, addressing a common challenge in data analytics and management.
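As a rough sketch of a scheduled batch refresh (reusing `chunk` and `embed` from the indexing sketch; a real system would typically use a job scheduler such as cron or Airflow, or react to change-data-capture events):

```python
import time

def refresh_index(index: list, documents: list[str]) -> None:
    """Re-chunk and re-embed the source documents, replacing stale entries."""
    fresh = [(embed(piece), piece) for doc in documents for piece in chunk(doc)]
    index[:] = fresh  # swap the contents in place

# Crude daily batch-update loop -- illustration only.
while True:
    refresh_index(index, documents)
    time.sleep(24 * 60 * 60)
```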
To sum up, Retrieval-Augmented Generation (RAG) for Large Language Models (LLMs) can be compared to an open-book test. Just as students are allowed to use their study materials to locate pertinent information when answering questions, RAG enables LLMs to consult a wide array of external data sources to generate more informed and accurate responses.
Thank you for reading.
Stay Learning -
Aj
Please feel free to read my blogs related to Data
Vector database and why it's Popular in AI
Database Landscape: What Are the Different Types of Databases? (Part-1)
The Importance of Data Governance and Data Security in Modern Organizations
References
AWS provides a detailed explanation of what Retrieval-Augmented Generation is and its importance. This source covers aspects such as cost-effectiveness, the relevance of current information, user trust enhancement, and developer control in the context of RAG: AWS - What is Retrieval-Augmented Generation?
Datastax offers a comprehensive guide on RAG, discussing its technical implementation, comparison with Semantic Search, applications, and benefits. This could provide in-depth insights and examples for your blog: Datastax - Retrieval Augmented Generation: A Comprehensive Guide
IBM Research Blog presents an explainer on RAG, focusing on its application in grounding LLMs with accurate and up-to-date information. It also touches on the challenges and potential in implementing RAG in AI frameworks: IBM Research Blog - What is retrieval-augmented generation?
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv preprint arXiv:2005.11401, 2020.