Basic RAG Architecture with Key Components


Now that we've explored the various components that make up the architecture of Large Language Models (LLMs), let's dive into how Retrieval-Augmented Generation (RAG) works in concert with them. The aim is to show how RAG can supercharge an LLM's capabilities by seamlessly integrating real-time or static data sources into the retrieval and generation process.

RAG/LLM Architecture

For a nuanced understanding of how Retrieval-Augmented Generation (RAG) optimizes Large Language Models, let's walk through the essential elements and procedural steps that make up a RAG-enabled LLM architecture.

  1. Data Sources: Whether your starting point is cloud storage, Git repositories, or databases like PostgreSQL, the first task is to bring these varied data forms together through pre-configured connectors.

  2. Dynamic Vector Indexing: Text from these data sources is broken down into smaller segments (also called "chunks") and converted into vector representations. Models specialized for text embeddings, such as OpenAI's text-embedding-ada-002, are employed here. These vectors are continuously indexed to facilitate rapid search later on. (A minimal end-to-end sketch of steps 2-6 follows this list.)

  3. Query Transformation: A user’s input query is likewise transformed into a compatible vector representation, ensuring that it can be effectively matched with the indexed vectors for data retrieval.

  4. Contextual Retrieval: Algorithms like Locality-Sensitive Hashing (LSH) are applied to find the closest matches between the user query and the indexed data vectors, while keeping the retrieved context within the model's token limits (a toy LSH sketch also follows the list).

  5. Text Generation: With the retrieved context, foundational LLMs like GPT-3.5 Turbo or Llama-2 employ techniques from the Transformer architecture, such as self-attention, to generate an appropriate response.

  6. User Interface: Finally, the generated text is presented to the user via interfaces like Streamlit or a ChatGPT-style chat window (a minimal Streamlit sketch appears at the end of this page).
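
To make these steps concrete, here is a minimal sketch of the whole loop in Python. It assumes the official `openai` client (with an `OPENAI_API_KEY` in the environment) for both text-embedding-ada-002 and GPT-3.5 Turbo, and it stands in a brute-force cosine-similarity scan for the continuously indexed, LSH-backed search a production system would use. The document strings, chunk size, prompt, and helper names are illustrative assumptions, not the course's reference code.

```python
# Minimal RAG pipeline sketch. Document strings, chunk size, prompt, and
# helper names are illustrative assumptions, not course code.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def chunk(text: str, size: int = 500) -> list[str]:
    """Step 2: break source text into smaller segments ("chunks")."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(texts: list[str]) -> np.ndarray:
    """Steps 2-3: convert chunks (or a user query) into vectors."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

# Step 1: pretend these strings arrived via your connectors
# (cloud storage, Git repositories, PostgreSQL, ...).
documents = ["...contents of document A...", "...contents of document B..."]
chunks = [c for doc in documents for c in chunk(doc)]
index = embed(chunks)  # Step 2: the vector index (in-memory here)

def retrieve(query: str, k: int = 3) -> list[str]:
    """Steps 3-4: embed the query, return the k most similar chunks.
    Brute-force cosine similarity stands in for LSH/kNN indexing."""
    q = embed([query])[0]
    sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]

def answer(query: str) -> str:
    """Step 5: generate a response grounded in the retrieved context."""
    context = "\n\n".join(retrieve(query))
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer using only this context:\n" + context},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content

print(answer("What does document A say?"))  # Step 6: show it to the user
```

In a real deployment the index is updated as the connectors stream in new data, which is exactly the gap that frameworks like Pathway, LlamaIndex, and Langchain fill in the hands-on modules.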

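Step 4 names Locality-Sensitive Hashing. As a rough intuition (a toy sketch, not what any production vector store actually ships), random-hyperplane LSH gives each vector a short bit signature; vectors that agree on every bit are very likely to point in similar directions, so at query time you scan only one bucket instead of the whole index. The `index` array and `embed` helper come from the sketch above.

```python
# Toy random-hyperplane LSH for cosine similarity (illustrative only).
# Reuses `index` and `embed` from the pipeline sketch above.
import numpy as np

rng = np.random.default_rng(0)
n_planes, dim = 12, 1536  # text-embedding-ada-002 vectors have 1536 dims
planes = rng.standard_normal((n_planes, dim))

def signature(v: np.ndarray) -> tuple:
    """One bit per hyperplane: which side of the plane does v fall on?"""
    return tuple((planes @ v > 0).astype(int))

# Index time: vectors with identical signatures share a bucket.
buckets: dict[tuple, list[int]] = {}
for i, v in enumerate(index):
    buckets.setdefault(signature(v), []).append(i)

# Query time: scan only the query's bucket, not the whole index.
# (Real systems use several hash tables and re-rank candidates with
# exact cosine similarity; an empty bucket means "widen the search".)
q = embed(["example query"])[0]
candidates = buckets.get(signature(q), [])
```
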
LLM architecture diagram showing how RAG works with a real-time or static data source
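
And for step 6, wiring the pipeline to a user interface can be as small as a few lines of Streamlit. Here `rag_pipeline` is a hypothetical module holding the `answer()` helper from the first sketch.

```python
# A tiny Streamlit front-end for the pipeline (run: streamlit run app.py).
import streamlit as st
from rag_pipeline import answer  # hypothetical module with answer() above

st.title("RAG Demo")
query = st.text_input("Ask a question about your documents")
if query:
    st.write(answer(query))
```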