(Bonus) Adaptive RAG Overview



TL;DR: You can dynamically adapt the number of documents in a RAG prompt using feedback from the LLM. This allows a 4x cost reduction for RAG-based LLM question answering while maintaining good accuracy. The method also helps explain the lineage of LLM outputs.

Understanding Adaptive RAG

As you already know, Retrieval Augmented Generation (RAG) allows Large Language Models (LLMs) to answer questions based on knowledge not present in their original training set.

At Pathway, we use RAG to build document intelligence solutions that answer questions based on private document collections, such as a repository of legal contracts. We are constantly working on improving the accuracy and explainability of our models while keeping costs low. Adaptive RAG is a technique that helps us achieve these goals.

It's all about Balancing Costs and Accuracy

In practical implementations, the number of documents in the prompt must balance costs, desired answer quality, and explainability. A larger number of documents increases the LLM's ability to provide a correct answer, but it also increases costs and makes it harder to trace which documents the answer came from.
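
To make the cost side of this tradeoff concrete, here is a back-of-the-envelope estimate. The chunk size and per-token price below are illustrative assumptions, not figures from our experiments:

```python
# Rough prompt-cost estimate for different context sizes.
# All numbers are illustrative assumptions, not measured values.
tokens_per_document = 500          # assumed average size of one retrieved chunk
question_tokens = 50               # assumed size of the question + instructions
price_per_1k_prompt_tokens = 0.01  # assumed price in USD

for num_documents in (1, 2, 4, 8, 16):
    prompt_tokens = question_tokens + num_documents * tokens_per_document
    cost = prompt_tokens / 1000 * price_per_1k_prompt_tokens
    print(f"{num_documents:>2} docs -> ~{prompt_tokens} prompt tokens, ~${cost:.4f} per question")
```

Under these assumptions, a 16-document prompt costs more than ten times as much per question as a 1-document prompt, which is why starting small pays off.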

Adaptive RAG strategy

Adaptive RAG dynamically adjusts the context size based on the question's complexity and the LLM's feedback:

  1. Initial Query: Ask the LLM with a small number of context documents.

  2. Adaptive Expansion: If the LLM refuses to answer, re-ask with a larger prompt, expanding the context size using a geometric series (doubling the number of documents each time).
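
A minimal sketch of this loop in Python is shown below. The `retrieve` and `ask_llm` helpers are hypothetical placeholders for your own retriever and LLM client; the Pathway implementation linked at the end of this page handles these details for you.

```python
# Minimal sketch of the Adaptive RAG loop described above.
# `retrieve` and `ask_llm` are hypothetical helpers standing in for your
# retriever and LLM client; they are not part of any specific library.

def adaptive_rag_answer(question, retrieve, ask_llm, start_k=1, max_k=16):
    """Ask with a small context first; double it whenever the LLM refuses."""
    k = start_k
    while k <= max_k:
        documents = retrieve(question, top_k=k)   # k most similar chunks
        context = "\n\n".join(documents)
        prompt = (
            "Answer the question using only the context below. "
            "If the context is not sufficient, reply exactly 'I do not know.'\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        answer = ask_llm(prompt)
        if "do not know" not in answer.lower():   # the LLM committed to an answer
            return answer, documents              # returned documents show the answer's lineage
        k *= 2                                    # geometric expansion: double the context
    return "I do not know.", []
```

Returning the documents alongside the answer is what makes the lineage of the output easy to inspect: when the loop stops early, only a handful of documents need to be checked.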

Experiment Insights

We conducted experiments to analyze the accuracy and cost efficiency of this approach:

  • Base RAG: Shows a typical relationship between accuracy and supporting context size.

  • Error Analysis: Reveals that more context reduces "Do not know" responses but increases hallucinated answers.

  • Adaptive RAG: Efficiently balances cost and accuracy by starting with a minimal prompt and expanding only when necessary.

Key Findings:

  • Even a single supporting document yields 68% accuracy.

  • Doubling the context size only when needed keeps costs low while incrementally improving accuracy.

  • Overlapping prompt strategy maintains accuracy better than non-overlapping prompts.
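
Roughly speaking, an overlapping expansion keeps the documents from the smaller prompt and adds the next most similar ones, while a non-overlapping expansion would replace them with the next batch (see the blog post below for the precise definitions). A tiny hypothetical illustration:

```python
# Hypothetical illustration of overlapping vs. non-overlapping expansion
# when growing the context from k=2 to 2k=4 documents.
# `ranked_docs` stands for chunks sorted by similarity to the question.
ranked_docs = ["doc1", "doc2", "doc3", "doc4", "doc5", "doc6"]

k = 2
overlapping = ranked_docs[:2 * k]       # ['doc1', 'doc2', 'doc3', 'doc4']: keeps what was already tried
non_overlapping = ranked_docs[k:2 * k]  # ['doc3', 'doc4']: drops the documents from the first attempt
```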

To understand this better: Read the Blog Post on Adaptive RAG

This might also be a good time for you to revisit the complete talk by Łukasz Kaiser (co-creator of ChatGPT, the Transformer, GPT-4o, and TensorFlow) and Jan Chorowski (co-author with Bengio and Hinton, ex-Google Brain, CTO at Pathway) on the future of LLMs at a recent Pathway SF Meetup.

The talk includes various elements we've covered so far.

Adaptive RAG: cut your LLM costs without sacrificing accuracy | Pathway