Mar 31, 2026
RAG Overview
A visual overview of retrieval, ranking, and generation flow in a RAG pipeline.
Challenges with simple RAG
1. Messy raw data (e.g., images, diagrams, charts)
2. Inaccurate retrieval
3. Complex questions
RAG (Retrieval-Augmented Generation)
Training time: pre-training and fine-tuning bake knowledge into the LLM itself.
Inference time: RAG supplies retrieved context alongside the query.
Ways to Give an LLM Knowledge
From less expensive, easier to implement, lower latency/cost to more expensive, harder to implement, higher latency/cost:
- Table stakes
- Advanced retrieval
- Fine-tuning
- Agentic behavior
- Table stakes: better parsers, chunk sizes, metadata filters, hybrid search
- Advanced retrieval: reranking, recursive retrieval, embedded tables, small-to-big retrieval
- Fine-tuning: embedding fine-tuning
- Agentic behavior: routing, query planning, multi-document agents
Ways to improve RAG performance
1. Better data parsers
Big chunks: less granularity, less diverse sources, lost-in-the-middle problem.
Small chunks: more granularity, more diverse sources, incomplete context.
Tools:
- pypdf -> LlamaParse
- LlamaParse (local documents)
- FireCrawl (website to markdown)
2. Chunk size
* Set up evals to find the optimal chunk size (response time, faithfulness, average relevancy)
* Routing RAG pipeline (optimal chunk size for specific documents)
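The chunk-size trade-off can be made concrete with a minimal splitter; this is a toy character-based sketch (a real splitter would be token- or sentence-aware), where `chunk_size` controls the granularity trade-off:

```python
def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character chunks with optional overlap.

    Toy stand-in for a real splitter; smaller chunk_size means more,
    finer-grained chunks, larger chunk_size means fewer, coarser ones.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "RAG pipelines retrieve context before generation."
small = chunk_text(doc, chunk_size=16, overlap=4)  # more, finer chunks
big = chunk_text(doc, chunk_size=40, overlap=4)    # fewer, coarser chunks
```

An eval would sweep `chunk_size` and measure response time, faithfulness, and average relevancy per setting.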
3. Reranking
Most relevant chunks → Reranker → Better-ranked chunks
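A reranker takes the retrieved chunks and reorders them by relevance to the query. This sketch uses term overlap as a toy scoring function; a production reranker would use a cross-encoder model instead:

```python
def rerank(query: str, chunks: list[str]) -> list[str]:
    """Reorder retrieved chunks by a relevance score.

    Toy scorer: fraction of query terms appearing in the chunk.
    A real reranker would score each (query, chunk) pair with a
    cross-encoder model rather than word overlap.
    """
    terms = set(query.lower().split())

    def score(chunk: str) -> float:
        words = set(chunk.lower().split())
        return len(terms & words) / len(terms)

    return sorted(chunks, key=score, reverse=True)

retrieved = [
    "Keyword search matches exact identifiers.",
    "Rerankers reorder retrieved chunks by relevance to the query.",
    "Vector search captures semantic similarity.",
]
ranked = rerank("how do rerankers reorder chunks", retrieved)
```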
4. Hybrid Search
Vector search + keyword search → Reranker → Better-ranked chunks
Keyword search captures exact terms, entities, and identifiers; vector search captures semantic similarity.
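One common way to merge the keyword and vector result lists is reciprocal rank fusion (RRF), which needs only the two rankings, not their raw scores:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists (e.g. keyword and vector results) with RRF.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    k=60 is the commonly used constant.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_c", "doc_d"]  # exact terms, entities, ids
vector_hits = ["doc_b", "doc_a", "doc_e"]   # semantic similarity
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

`doc_a` rises to the top because it appears in both lists; documents found by only one search method follow.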
5. Agentic RAG
Query planning / query translation
Step-back prompting:
- instead of answering the user's question directly,
- the model first steps back and asks a more general, higher-level question,
- then uses that broader view to answer the original question.
Query planning:
- before retrieving or answering,
- the system first breaks a complex user question into smaller steps or sub-queries,
- and decides how to solve it step by step.
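Both techniques boil down to an extra LLM call before retrieval. The prompt wording below is purely illustrative (not taken from any particular paper or library):

```python
# Illustrative prompt templates; exact wording is an assumption.
STEP_BACK_PROMPT = (
    "Before answering, first ask a more general question behind this one.\n"
    "Question: {question}\n"
    "Step-back question:"
)

DECOMPOSE_PROMPT = (
    "Break the question into smaller sub-queries that can be retrieved "
    "and answered one by one.\n"
    "Question: {question}\n"
    "Sub-queries:"
)

def build_prompts(question: str) -> dict[str, str]:
    """Fill both templates for a user question; an LLM call would follow."""
    return {
        "step_back": STEP_BACK_PROMPT.format(question=question),
        "decompose": DECOMPOSE_PROMPT.format(question=question),
    }

prompts = build_prompts("Why did revenue drop in Q3 2024 for product X?")
```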
Metadata filtering / routing
Routing and metadata filtering in agentic RAG allow the agent to dynamically select the right data source and narrow retrieval using attributes such as date, source, product, or document type.
Flow: User query → Agent → selected data source + metadata filters → retrieval with filters → retrieved chunks
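The routing step above can be sketched as follows; the corpus, source names, and keyword-based router are hypothetical stand-ins (a real agent would let an LLM pick the source and filters, then query separate vector stores):

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str    # e.g. "product_docs", "support_tickets" (made-up names)
    doc_type: str  # e.g. "manual", "faq"

# Hypothetical mini corpus standing in for multiple data sources.
CORPUS = [
    Chunk("How to reset the device.", "product_docs", "manual"),
    Chunk("Refund policy details.", "support_tickets", "faq"),
    Chunk("Device firmware changelog.", "product_docs", "changelog"),
]

def route(query: str) -> dict[str, str]:
    """Toy router: choose metadata filters from keywords in the query."""
    if "refund" in query.lower():
        return {"source": "support_tickets"}
    return {"source": "product_docs"}

def retrieve_with_filters(query: str, filters: dict[str, str]) -> list[Chunk]:
    """Apply metadata filters before any similarity scoring."""
    return [
        c for c in CORPUS
        if all(getattr(c, key) == value for key, value in filters.items())
    ]

filters = route("What is the refund policy?")
hits = retrieve_with_filters("What is the refund policy?", filters)
```

Filtering first shrinks the search space, so similarity search only competes among chunks that can plausibly answer the query.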
Corrective RAG Agent
A standard RAG pipeline has two parts:
- Indexing part: raw data → chunking → embedding → store in vector database
- Query part: user query → embedding → similarity search → retrieved relevant chunks → context + query to LLM → response
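The indexing and query parts fit in a few lines. This sketch uses a bag-of-words count vector as a stand-in for a real embedding model:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (stands in for an embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Indexing part: raw data -> chunking -> embedding -> store.
docs = [
    "Chunking splits raw data into pieces before embedding.",
    "Similarity search finds the chunks closest to the query.",
]
index = [(doc, embed(doc)) for doc in docs]

# Query part: user query -> embedding -> similarity search -> context.
query = "which chunks match the query in similarity search"
qvec = embed(query)
context = max(index, key=lambda item: cosine(qvec, item[1]))[0]
# `context` plus the query would then be sent to the LLM for the response.
```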
Corrective RAG (CRAG) flow, for an example query x: "Who was the screenwriter for Death of a Batman?":
- Retrieval returns documents d1, d2; a retrieval evaluator asks whether the retrieved documents are correct for x and labels the result Correct, Ambiguous, or Incorrect.
- Correct → knowledge refinement: decompose the documents into strips (strip1 … stripk), filter out irrelevant strips, and recompose the rest into internal knowledge k_in.
- Incorrect → knowledge searching: rewrite x into a search query (e.g., "Death of a Batman; screenwriter; Wikipedia"), run a web search over results k1, k2 … kn, and select external knowledge k_ex.
- Ambiguous → apply both refinement and searching.
- Generation: the generator receives x + k_in (Correct), x + k_in + k_ex (Ambiguous), or x + k_ex (Incorrect).
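The evaluator's three-way branch can be sketched as a threshold on its confidence score. The thresholds and placeholder knowledge strings here are illustrative assumptions, not values from the CRAG paper:

```python
def crag_context(query: str, score: float, k_in: str, k_ex: str,
                 upper: float = 0.7, lower: float = 0.3) -> str:
    """Pick the generator input from the retrieval evaluator's score.

    Illustrative thresholds: k_in is refined internal knowledge,
    k_ex is external (web-search) knowledge.
    """
    if score >= upper:   # Correct: trust refined retrieved knowledge
        return f"{query}\n{k_in}"
    if score <= lower:   # Incorrect: fall back to web-search knowledge
        return f"{query}\n{k_ex}"
    # Ambiguous: combine internal and external knowledge
    return f"{query}\n{k_in}\n{k_ex}"

ctx = crag_context(
    "Who was the screenwriter for Death of a Batman?",
    score=0.5, k_in="[refined strips]", k_ex="[web results]",
)
```

With a mid-range score the query falls into the Ambiguous branch, so the generator sees both knowledge sources.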
Source:
- AI Jason (YouTube): https://www.youtube.com/watch?v=u5Vcrwpzoz8&t=41s