04. Key Terms and Variants
Naive RAG
The baseline implementation: one retrieval pass using vector similarity, top-k chunks fed directly to the LLM. Fast to build, sufficient for simple factual queries, but degrades on ambiguous questions, domain-specific terminology, and queries requiring multi-step reasoning.
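The single-pass flow described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and `retrieve`, `build_prompt`, and `CHUNKS` are hypothetical names introduced only for this example.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # One retrieval pass: rank every chunk by similarity, keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    # The top-k chunks go straight into the prompt with no refinement step.
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

CHUNKS = [
    "The refund window is 30 days from delivery.",
    "Shipping takes 3 to 5 business days.",
    "Refunds are issued to the original payment method.",
]

top = retrieve("refund window", CHUNKS, k=1)
```

Note that the toy embedding misses the third chunk because "refunds" does not match "refund" exactly, which is a small instance of the terminology mismatch that naive RAG struggles with.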
Advanced RAG
Adds multiple refinement layers around the core pipeline:
- Query rewriting to resolve ambiguity and improve domain-specific matching
- Hybrid retrieval combining dense vectors with sparse BM25 keyword search
- Reranking using a cross-encoder for post-retrieval precision
- Context compression to remove redundant or low-value passages before generation
- Feedback loops that allow chunks to be scored and improved over time
Advanced RAG is the recommended production default for most applications. For the majority of queries, it offers a better balance of cost and answer quality than either naive RAG or agentic RAG.
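One common way to combine dense and sparse results in hybrid retrieval is reciprocal rank fusion (RRF). The text above does not prescribe a fusion method, so RRF is an assumption here, as are the document IDs and the conventional constant `k = 60`. The sketch takes two already-ranked lists and merges them by summing `1 / (k + rank)` for each document.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking is a list of document IDs, best first.
    # A document's fused score is the sum of 1 / (k + rank) over all rankings
    # in which it appears; documents absent from a ranking contribute nothing.
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# Hypothetical outputs of a dense (vector) retriever and a sparse (BM25) retriever.
dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
```

Documents that both retrievers rank highly rise to the top, and the fused list would typically be passed to a reranker next.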
Agentic RAG
Replaces the fixed one-pass pipeline with an autonomous agent loop. The agent:
- Decides whether the retrieved context is sufficient
- Decomposes complex questions into sub-queries
- Performs multi-hop retrieval (each result informs the next query)
- Validates retrieved content for contradictions and relevance
- Routes queries to different tools: vector stores, SQL databases, web search, APIs, code execution
Frameworks supporting agentic RAG include LangGraph (often considered the most mature option for production), LlamaIndex Agents, Microsoft AutoGen, and CrewAI.
Agentic RAG typically consumes 3 to 10 times as many tokens as advanced RAG, because each loop iteration adds further retrieval and model calls. It is justified only for hard multi-step reasoning questions or cross-source synthesis tasks where standard retrieval fails.
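The agent loop can be sketched as a bounded retrieve-check-refine cycle. This is a simplified illustration that does not use any of the frameworks named above: `agentic_retrieve` and the three `toy_*` callbacks are hypothetical, and in a real system the sufficiency check and the follow-up query would be LLM calls rather than string tests.

```python
from typing import Callable

def agentic_retrieve(
    question: str,
    retrieve: Callable[[str], list[str]],
    is_sufficient: Callable[[str, list[str]], bool],
    next_query: Callable[[str, list[str]], str],
    max_steps: int = 4,
) -> tuple[list[str], int]:
    # Returns the accumulated context and the number of retrieval steps used.
    context: list[str] = []
    query = question
    for step in range(1, max_steps + 1):
        for chunk in retrieve(query):
            if chunk not in context:
                context.append(chunk)
        if is_sufficient(question, context):
            return context, step
        # Multi-hop: what has been found so far shapes the next query.
        query = next_query(question, context)
    return context, max_steps  # step budget exhausted

# Toy stand-ins so the loop can run without a model or a vector store.
def toy_retrieve(query: str) -> list[str]:
    if "Dana Reyes" in query:
        return ["Dana Reyes was born in Lisbon."]
    if "founder" in query:
        return ["Acme was founded by Dana Reyes."]
    return []

def toy_is_sufficient(question: str, context: list[str]) -> bool:
    return any("born in" in c for c in context)

def toy_next_query(question: str, context: list[str]) -> str:
    return "Where was Dana Reyes born?"

context, steps = agentic_retrieve(
    "Where was the founder of Acme born?",
    toy_retrieve, toy_is_sufficient, toy_next_query,
)
```

The `max_steps` cap matters in practice: every extra iteration adds retrieval and model calls, which is where the higher token cost comes from.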
Graph RAG
A variant that stores knowledge in a graph structure (nodes are entities, edges are relationships) rather than a flat vector index. Useful for queries that require traversing relationships, such as "Which companies are connected to this person through board membership?" Microsoft's GraphRAG research (2024) demonstrated gains on community-level summarization and multi-hop reasoning tasks.
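The board-membership query can be illustrated with a small traversal over entity-relationship triples. This is a sketch of the general idea only, not Microsoft's GraphRAG pipeline; the triples, the `board_member_of` relation name, and the function names are made up for the example.

```python
from collections import defaultdict

# (subject, relation, object) triples: nodes are entities, edges are relationships.
TRIPLES = [
    ("Alice", "board_member_of", "Acme"),
    ("Alice", "board_member_of", "Globex"),
    ("Bob", "board_member_of", "Globex"),
    ("Bob", "board_member_of", "Initech"),
    ("Carol", "employee_of", "Initech"),
]

def build_index(triples):
    # Index only the board-membership edges, in both directions.
    boards_of = defaultdict(set)   # person -> companies
    members_of = defaultdict(set)  # company -> people
    for subject, relation, obj in triples:
        if relation == "board_member_of":
            boards_of[subject].add(obj)
            members_of[obj].add(subject)
    return boards_of, members_of

def connected_companies(person: str, triples, max_hops: int = 2) -> dict[str, int]:
    # Breadth-first traversal. Hop 1 is the person's own boards; each further
    # hop follows shared board members to the other boards they sit on.
    boards_of, members_of = build_index(triples)
    found: dict[str, int] = {}
    frontier = {person}
    seen_people = {person}
    for hop in range(1, max_hops + 1):
        next_people: set[str] = set()
        for p in frontier:
            for company in boards_of[p]:
                if company not in found:
                    found[company] = hop
                next_people |= members_of[company] - seen_people
        seen_people |= next_people
        frontier = next_people
    return found

result = connected_companies("Alice", TRIPLES)
```

Here Initech is reachable from Alice only through Bob, a relationship that a flat similarity search over isolated chunks would have no direct way to follow.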
Modalities
RAG is not limited to text. Multimodal RAG retrieves images, audio transcripts, tables, or structured data alongside text chunks, allowing the generative model to reason across modalities.