RAG versus Fine-Tuning and Prompt Engineering

In the rapidly evolving landscape of Large Language Models (LLMs), cost-efficiency and operational simplicity are critical, and this is where Retrieval-Augmented Generation (RAG) shines. Compared to alternatives like fine-tuning and prompt engineering, RAG stands out for its cost-effectiveness, simplicity, and adaptability.

Let's explore each of these options to understand where RAG excels.

1. Fine-Tuning vs RAG

For those less familiar with the concept, fine-tuning means taking a pre-trained language model (such as GPT-3.5 Turbo, Mistral-7b, or Llama-2) and further training it on a smaller, targeted dataset so that it performs well on a specific use case.
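
For a concrete sense of what this involves, here's a minimal sketch of launching such a job through OpenAI's fine-tuning API with its Python SDK; the training-file name is an illustrative assumption, not course code.

```python
# Minimal sketch of launching a fine-tuning job with the OpenAI Python SDK.
# The file name ("train_examples.jsonl") is an illustrative assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Upload a JSONL file of chat-formatted training examples.
training_file = client.files.create(
    file=open("train_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Start the fine-tuning job on a base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```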

While fine-tuning avoids the need to build a model from scratch, it does have its drawbacks, which RAG effectively addresses.

Data Preparation Challenges

  • Having control over the training data permits steps to address biases, yet implementing such measures is far from straightforward. Interventions like reweighting features or ensuring a balanced data distribution demand in-depth data-analysis skills (a minimal balancing sketch follows this list).

  • Furthermore, expertise in the subject matter is essential for accurately annotating data that serves specialized or research-specific functions.
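
To illustrate one such intervention, here is a minimal sketch that inspects and balances the label distribution of a hypothetical labeled fine-tuning dataset; the file name and the "label" field are illustrative assumptions about how the data is stored.

```python
# Minimal sketch: inspecting and balancing the label distribution in a
# hypothetical labeled fine-tuning dataset. The file name and "label"
# field are illustrative assumptions.
import json
import random
from collections import defaultdict

buckets = defaultdict(list)
with open("train_examples.jsonl") as f:
    for line in f:
        example = json.loads(line)
        buckets[example["label"]].append(example)

# Upsample minority classes so every label appears equally often.
target = max(len(examples) for examples in buckets.values())
balanced = []
for label, examples in buckets.items():
    balanced.extend(examples)
    balanced.extend(random.choices(examples, k=target - len(examples)))
random.shuffle(balanced)

print({label: len(ex) for label, ex in buckets.items()}, "->", len(balanced))
```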

Cost Efficiency

  • Retraining and deployment are not only time-consuming but also financially taxing.

  • For instance, using a vector embeddings API in a RAG setup is roughly 80 times less expensive than the commonly used fine-tuning APIs from OpenAI (see the pricing pages linked below, and the embeddings sketch after this list).

  • Consider having to repeat this retraining each time your company launches a new product, just to ensure your teams aren't served outdated information by your Gen AI model.
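
By contrast, the RAG path only requires an embedding pass over your documents, which can be updated incrementally as content changes. Below is a minimal sketch using OpenAI's embeddings endpoint; the model name and sample documents are illustrative.

```python
# Minimal sketch: embedding documents once for a RAG index instead of
# retraining the model. The model name and documents are illustrative.
from openai import OpenAI

client = OpenAI()

documents = [
    "Product A launches in Q3 with a 20% discount for enterprise plans.",
    "Product B is deprecated as of this quarter.",
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=documents,
)
vectors = [item.embedding for item in response.data]
print(len(vectors), "vectors of dimension", len(vectors[0]))
```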

Data Freshness

  • When developing an LLM application, it's only logical to expect your large language model to consistently deliver current and pertinent output.

  • With fine-tuning, the model's accuracy can decline significantly if the underlying data changes or isn't regularly refreshed. Consequently, despite the challenges above, retraining must be repeated at frequent intervals to maintain the model's efficacy.

Note: If needed, you can combine RAG with LLMs that are fine-tuned for your use case; see the discussion on OpenAI's Developer Forum linked below.

2. Prompt Engineering vs RAG

Prompt engineering might seem like a lighter alternative; however, it comes with its challenges, such as data privacy, inefficient retrieval of information, and the technical constraint of a token limit.

  • Data Privacy: For organizations handling sensitive information, manually pasting large chunks of data into prompts to retrieve a specific piece poses a risk of unintended data exposure.

  • Inefficient Retrieval: When dealing with vast data corpora, knowing where to find the relevant data becomes crucial. Manual prompt engineering lacks the efficiency offered by automated mechanisms, such as vector indexing in RAG, which enables quick and semantically accurate data retrieval.

  • Token Limit Constraints: Language models have built-in token limits that restrict how much text they can process in a single prompt, making it challenging to include all the necessary information in one interaction (see the token-counting sketch after this list).
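
As a quick illustration of that constraint, here is a minimal sketch that counts prompt tokens with the tiktoken library; the 4,096-token limit is an illustrative figure, since actual context windows vary by model.

```python
# Minimal sketch: checking whether a manually assembled prompt fits a
# model's context window. The 4,096-token limit is illustrative; actual
# limits vary by model.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "Context: " + "..."  # imagine pages of pasted documents here
n_tokens = len(encoding.encode(prompt))

TOKEN_LIMIT = 4096
if n_tokens > TOKEN_LIMIT:
    print(f"Prompt uses {n_tokens} tokens and exceeds the {TOKEN_LIMIT}-token limit.")
else:
    print(f"Prompt uses {n_tokens} of {TOKEN_LIMIT} tokens.")
```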

In contrast, RAG's approach of storing data in efficient vector indexes circumvents these limitations by facilitating quick and semantically relevant information retrieval, making it a more viable option for dealing with large and complex data sets.
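
To make this concrete, here is a minimal, self-contained sketch of the retrieval step; a toy bag-of-words embedding and brute-force cosine similarity stand in for the real embedding model and vector index a production RAG pipeline would use.

```python
# Minimal sketch of RAG-style retrieval: embed documents into vectors,
# then fetch the most similar ones for a query via cosine similarity.
# A toy bag-of-words embedding stands in for a real embedding model.
import numpy as np

documents = [
    "Our new product ships with a built-in vector index.",
    "The cafeteria menu changes every Monday.",
    "Vector indexes enable fast semantic retrieval for RAG.",
]

vocab = sorted({w for d in documents for w in d.lower().split()})

def embed(text: str) -> np.ndarray:
    # Count vocabulary words in the text (stand-in for a learned embedding).
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

index = np.stack([embed(d) for d in documents])  # one row per document

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    # Cosine similarity between the query and every indexed document.
    sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q) + 1e-9)
    return [documents[i] for i in np.argsort(sims)[::-1][:k]]

# Only the top-k relevant chunks go into the prompt, not the whole corpus.
print(retrieve("How does semantic retrieval work with a vector index?"))
```

In a real pipeline, the index would live in a vector store and only the top-k retrieved chunks would be appended to the LLM prompt, keeping each request well within the token limit.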

References

  • OpenAI Pricing
  • Google Cloud Vertex AI Pricing
  • Fine-tuning vs Context-Injection (RAG): OpenAI Developer Forum