What is RAG?
Mar 25, 2026
In This Chapter
- The main idea behind this part of the RAG system
- The trade-offs that matter in practice
- The interview framing that makes the topic easier to explain
The Core Idea
RAG stands for Retrieval-Augmented Generation.
The idea is simple:
- retrieve external information
- give that information to the model as context
- ask the model to answer from that context
Instead of expecting the model to know everything from training, you let it look things up at query time.
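The "look things up at query time" step can be sketched in a few lines. This is a toy retriever: word overlap stands in for the embedding similarity a real system would use, and the document list is invented for illustration.

```python
# Minimal sketch of retrieval at query time.
# Word overlap is a stand-in for embedding similarity.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    q = set(query.lower().split())
    return sorted(docs,
                  key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]

docs = [
    "Enterprise plans include a 30-day refund window.",
    "Our office is closed on public holidays.",
    "Refunds require a proof of purchase.",
]
top = retrieve("refund policy for enterprise plans", docs, k=2)
# `top` would then be placed into the prompt as context for the model.
```

The retrieved text, not the model's memory, becomes the source of the answer.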
Why RAG Exists
LLMs have two major limitations:
- they do not know data added after training
- they can hallucinate when asked for specific facts
RAG helps with both by grounding the answer in external documents, but it does not eliminate hallucination on its own.
The most useful way to describe RAG in interviews is:
RAG is a knowledge access pattern, not a reasoning upgrade
It helps the model read the right information at query time. It does not automatically make the model smarter.
A Simple Example
Imagine a support assistant asked:
What is our refund policy for enterprise plans?
If the answer lives in an internal policy document, a plain LLM may guess.
A RAG system instead:
- finds the relevant policy chunks
- places them into the prompt
- asks the model to answer using only that evidence
That changes the model's job from "remember the answer" to "read the answer from retrieved text."
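One way to phrase that "read the answer from retrieved text" instruction is to build the prompt around the evidence. The template and chunk text below are illustrative, not a fixed standard.

```python
# Sketch of prompt assembly: retrieved chunks become numbered evidence,
# and the instruction restricts the model to that evidence.

def grounded_prompt(question: str, chunks: list[str]) -> str:
    """Build a prompt that asks the model to answer only from the chunks."""
    evidence = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using ONLY the evidence below. "
        "If the evidence does not contain the answer, say so.\n\n"
        f"Evidence:\n{evidence}\n\n"
        f"Question: {question}"
    )

chunks = ["Enterprise plans: refunds within 30 days of purchase."]
prompt = grounded_prompt(
    "What is our refund policy for enterprise plans?", chunks
)
```

The explicit fallback instruction ("say so") is one common way to discourage guessing when retrieval misses.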
What RAG Is Not
RAG is not the same as fine-tuning.
- RAG keeps knowledge outside the model and retrieves it at query time
- fine-tuning changes the model's weights
For most factual, changing, or private knowledge bases, RAG is the better first choice. The deeper comparison belongs in a dedicated RAG vs Fine-tuning article.
The Two-Phase Mental Model
Every RAG system has two broad phases:
- indexing: prepare documents for retrieval
- query time: retrieve evidence and generate an answer
That is the basic mental model you should hold before diving into chunking, embeddings, vector databases, or evaluation.
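The two-phase split can be made concrete with a toy inverted index, standing in for the real chunking-and-embedding pipeline. Everything here (the index shape, the document ids) is an assumption for illustration.

```python
# Two phases: index_documents runs offline; query runs per request.

from collections import defaultdict

def index_documents(docs: dict[str, str]) -> dict[str, set[str]]:
    """Indexing phase: prepare documents for retrieval (word -> doc ids)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def query(question: str, index: dict[str, set[str]],
          docs: dict[str, str]) -> list[str]:
    """Query phase: retrieve evidence; a full system would then generate from it."""
    hits = set()
    for word in question.lower().split():
        hits |= index.get(word, set())
    return [docs[i] for i in sorted(hits)]

docs = {
    "policy": "refunds allowed within 30 days",
    "hours": "open weekdays only",
}
idx = index_documents(docs)
evidence = query("are refunds allowed", idx, docs)
```

The key point the code makes is that indexing cost is paid once per document, while retrieval cost is paid per query.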
The Core Trade-Offs
RAG is attractive because it improves freshness and grounding, but it introduces system trade-offs:
- better factual grounding vs more retrieval complexity
- fresher knowledge vs indexing and maintenance work
- higher answer quality vs more latency at query time
That trade-off framing is often what interviewers actually want.
Why Interviewers Ask About RAG
RAG is not just an AI buzzword. It is a system design pattern.
Interviewers ask about it because it tests whether you understand:
- how to connect models to changing knowledge
- how retrieval quality affects answer quality
- how to trade off latency, precision, and recall
- how to reduce hallucination in practical systems
They are usually not testing whether you memorized the acronym. They are testing whether you understand the trade-offs of connecting models to external knowledge.
Key Questions
Q: What does RAG stand for?
RAG stands for Retrieval-Augmented Generation. It means retrieving relevant external information first, then using that retrieved information to augment the model's generation.
Q: Why do people use RAG instead of relying only on an LLM?
Because LLMs have knowledge cutoffs and can hallucinate. RAG lets the system answer from current or private documents at query time, which improves factual accuracy and grounding.
Q: What does grounding mean in RAG?
Grounding means the model's answer is anchored to retrieved evidence rather than generated from memory alone. A grounded answer should be traceable back to specific chunks.
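Traceability can be made concrete by keeping chunk ids attached to the evidence, so every answer can cite where it came from. The matching logic and chunk ids below are hypothetical placeholders; a real system would call an LLM to produce the answer text.

```python
# Sketch of traceable grounding: evidence keeps its chunk id,
# so the answer can be audited back to specific chunks.

def answer_with_citations(question: str, chunks: dict[str, str]) -> dict:
    """Return matching evidence plus the chunk ids it came from."""
    q_words = set(question.lower().split())
    used = {cid: text for cid, text in chunks.items()
            if q_words & set(text.lower().split())}
    return {"evidence": list(used.values()), "citations": sorted(used)}

chunks = {
    "policy-3": "enterprise refunds are honored within 30 days",
    "faq-1": "support is available by email",
}
result = answer_with_citations("what are enterprise refunds", chunks)
```

Returning citations alongside the answer is what makes a grounded response checkable rather than merely plausible.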
Q: What is the most important trade-off introduced by RAG?
RAG improves freshness and grounding, but it adds retrieval infrastructure, latency, and more failure points. You are trading a simpler pure-model system for a more controllable but more complex system.