
Versatility and Efficiency in RAG

To round out your understanding of RAG and its adaptability, let's address some frequently asked questions and challenges that developers often encounter. A couple of these were asked during the bootcamp itself.

1. Which data types can RAG handle?

One of the most compelling features of RAG, and of frameworks such as LangChain, LlamaIndex, or Pathway that we will use going forward, is the flexibility to work with a wide array of data types. Whether it's relational databases, APIs, transcribed audio, or even live feeds from the internet, RAG can seamlessly integrate these into its retrieval mechanism. This adaptability enhances the model's ability to generate contextually accurate and informative responses.

  • Data types supported by these popular frameworks: relational databases, free-form text, PDFs, APIs, transcribed audio, and streaming platforms such as Kafka, Debezium, or Redpanda (see the ingestion sketch below).
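For a concrete picture, here is a minimal ingestion sketch using LlamaIndex, one of the frameworks named above. It assumes `llama-index` is installed and an `OPENAI_API_KEY` is set in the environment; the `./data` directory and the query string are hypothetical stand-ins for your own corpus.

```python
# Minimal ingestion sketch with LlamaIndex. Assumes `pip install llama-index`
# and an OPENAI_API_KEY in the environment; ./data is a stand-in for your corpus.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# SimpleDirectoryReader picks a parser per file type (PDF, .txt, .md, .csv, ...)
documents = SimpleDirectoryReader("./data").load_data()

# Build an in-memory vector index and ask a question against it
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("Summarize the key points across these documents."))
```

The same `documents` list can mix file formats freely, which is exactly the flexibility described above: the framework normalizes everything into text chunks before indexing.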

2. Are Vector Databases necessary for RAG?

Short answer: no. Let's unpack why.

In case the term is new to you, a vector database is a specialized database for efficiently storing, searching, and retrieving vector embeddings, with primary use cases in LLMs and recommender systems. Much like the emergence of foundational Large Language Models (LLMs), or what some call the "AI Wave", Retrieval Augmented Generation (RAG) is a relatively new concept. It gained prominence after the 2020 publication "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" by a team at Facebook AI Research.

Initially, the rise of vector embeddings led developers, quite naturally, to reach for dedicated vector databases, which were already common in recommender systems.

However, this understanding is quickly evolving.

  • Misconception about vector databases: It's a common belief that vector databases are essential for RAG. While libraries for LLMs, such as LLM App, are compatible with well-known vector databases (for example, Pinecone or Weaviate), these are not mandatory components.

  • Enterprise Challenges: Introducing any new database into an enterprise environment comes with its own set of complexities and challenges, making a simpler solution preferable in many instances.

  • Built-in real-time indexing within specific frameworks: Tools like Pathway can generate and manage their own real-time vector indexes, negating the need for a separate vector database. Additionally, conventional databases such as PostgreSQL are expanding their features to include built-in support for vector indexing, thanks to extensions like pgvector.

So, while the allure of vector databases is real, it's important to understand that they are not the only path to an efficient RAG implementation. The sketch below shows how far plain in-memory search can take you.
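As a quick illustration, here is exact nearest-neighbour search over embeddings using nothing but NumPy. The random vectors and the 384-dimension size are placeholders; in practice the embeddings would come from your embedding model.

```python
# Exact nearest-neighbour search with plain NumPy -- no vector database.
# The random vectors below are placeholders for real embeddings.
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 384))   # pretend: 10k document embeddings
query = rng.normal(size=(384,))           # pretend: one query embedding

# Cosine similarity = dot product of L2-normalised vectors
corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
query_n = query / np.linalg.norm(query)
scores = corpus_n @ query_n

top_k = np.argsort(scores)[-5:][::-1]     # indices of the 5 closest documents
print(top_k, scores[top_k])
```

For corpora of this size, brute-force search like this is often fast enough; dedicated indexes (and databases) mostly pay off at much larger scales.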

3. Is a Separate Real-Time Processing Framework Needed for a Real-time Stream of Data?

Good news: no. Under the hood, the LLM App uses Pathway, an ultra-performant data processing engine (see the 2023 benchmarks) that is suitable for both batch and streaming (live data) use cases. But how does it manage stream-processing use cases efficiently? Check out the bonus resource on incremental indexing, linked from the Bonus Module on kNN + LSH.
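To make the batch/streaming duality tangible, here is a minimal Pathway sketch. The `./docs/` path and the schema are hypothetical; flipping `mode` between `"streaming"` and `"static"` switches the very same pipeline between live and batch processing.

```python
# One Pathway pipeline, two modes: mode="streaming" keeps processing rows as
# they arrive; mode="static" runs once over the existing data and exits.
# The ./docs/ path and the schema are hypothetical.
import pathway as pw

class DocSchema(pw.Schema):
    text: str

docs = pw.io.csv.read("./docs/", schema=DocSchema, mode="streaming")
stats = docs.select(n_chars=pw.apply(len, pw.this.text))
pw.io.csv.write(stats, "./doc_stats.csv")
pw.run()
```

Because the transformation code is identical in both modes, there is no separate stream-processing framework to learn or operate.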

4. Is the ChatGPT Plugin for Bing Search an example of RAG?

Some may wonder whether certain web search plugins in LLM applications like ChatGPT utilize a Retrieval-Augmented Generation (RAG) approach. For example, a ChatGPT interface with a web search plugin can pull current information from the internet, providing a more accurate and up-to-date response.

Example: such a plugin lets the model answer questions about events that happened after its training cutoff by retrieving information from a blog on the internet.

In a way, these can be considered LLM applications that employ a retrieval-augmented generation strategy. However, they do not retrieve data from a diverse set of data sources, such as a collection of PDFs, links, or Kafka topics. This affects the quality of their responses. In such scenarios, LLMs like ChatGPT can only offer the best answer they find from the specific webpage that the plugin accessed.

On the other hand, when you incorporate RAG into your custom LLM application, you benefit from efficient vector embeddings and vector search capabilities. These allow you to extract much more relevant information from a comprehensive data corpus, aiding in the identification of the most pertinent answers. When used with something like the Pathway LLM xpack, optimized for real-time data streams (kudos to Pathway's engine under the hood), you're looking at retrieving the most relevant information from a real-time vector index. Isn't that fascinating? 🤩
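As a final sketch, here is the first step of that retrieval flow: turning a user question into a query embedding. It assumes the `openai` Python package and an `OPENAI_API_KEY`; the model name is real but should be treated as a configurable assumption, and the question text is hypothetical.

```python
# Turning a user question into an embedding before retrieval. Assumes
# `pip install openai` and an OPENAI_API_KEY; the model name is current
# at the time of writing but should be treated as configurable.
from openai import OpenAI

client = OpenAI()
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="What changed in our pricing last week?",
)
query_vector = resp.data[0].embedding  # feed this into your vector index lookup
print(len(query_vector))               # 1536 dimensions for this model
```

The resulting `query_vector` is what gets compared against the (continuously updated) document embeddings, exactly as in the NumPy search sketch earlier.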

By understanding these facets, you're better equipped to leverage the strengths of RAG in your LLM architecture, whether for an enterprise solution or a personal project.
