Reranking
Mar 30, 2026
In This Chapter
- What reranking does and where it sits in a RAG pipeline
- The quality and latency trade-offs that matter in practice
- The interview framing that makes reranking easier to explain
What Reranking Does
Reranking is a second-pass scoring step applied after initial retrieval.
Instead of trusting the vector database's first ordering, the system takes the top candidate chunks and asks a stronger model to sort them again by relevance.
query
-> retrieve top-20
-> rerank top-20
-> keep best 5
That is the basic pattern.
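The retrieve, rerank, keep pattern can be sketched in a few lines. This is a minimal illustration, not a production implementation: `retrieve_then_rerank`, `word_overlap`, and `phrase_match` are hypothetical names, and both scorers are toy stand-ins for a real vector search and a real reranking model.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    first_pass: Scorer,
    reranker: Scorer,
    retrieve_k: int = 20,
    keep_k: int = 5,
) -> List[str]:
    # Stage 1: cheap score over the whole corpus, keep the top retrieve_k.
    candidates = sorted(corpus, key=lambda c: first_pass(query, c), reverse=True)
    candidates = candidates[:retrieve_k]
    # Stage 2: stronger (slower) score over the candidates only.
    reranked = sorted(candidates, key=lambda c: reranker(query, c), reverse=True)
    return reranked[:keep_k]


def word_overlap(query: str, chunk: str) -> float:
    """Toy first-pass score (assumption): count of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(chunk.lower().split())))


def phrase_match(query: str, chunk: str) -> float:
    """Toy rerank score (assumption): overlap plus a bonus for an exact phrase hit."""
    bonus = 10.0 if query.lower() in chunk.lower() else 0.0
    return word_overlap(query, chunk) + bonus


corpus = [
    "reset the password from the settings page",
    "the password reset link expires after one hour",
    "billing questions go to the finance team",
]
best = retrieve_then_rerank(
    "password reset", corpus, word_overlap, phrase_match, retrieve_k=2, keep_k=1
)
print(best)
```

In this toy run the first pass ties the two password chunks, and the second pass promotes the one that contains the exact phrase.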
Why Reranking Helps
Initial retrieval is optimized for speed over a large corpus.
Rerankers are optimized for relevance over a small candidate set.
This matters because the first retrieval pass often returns chunks that are:
- topically related
- partially relevant
- not the best answer-bearing evidence
Reranking improves the final ordering so the generation step sees cleaner context.
Typical Model Choice
Rerankers are often cross-encoders or other relevance models that score a pair of inputs together:
- the query
- one candidate chunk at a time
They are slower than embedding search, which is why they are usually applied only to a small candidate set.
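The pairwise interface, and why its cost scales with the candidate count, can be sketched as follows. `score_pairs` and `CountingScorer` are hypothetical names; the Jaccard word overlap inside the scorer is only a stand-in for a real cross-encoder, which would run a model forward pass on each pair.

```python
from typing import Callable, List, Tuple

PairScorer = Callable[[str, str], float]


def score_pairs(
    query: str, chunks: List[str], scorer: PairScorer
) -> List[Tuple[float, str]]:
    """Score each (query, chunk) pair jointly; return (score, chunk) best first."""
    scored = [(scorer(query, chunk), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


class CountingScorer:
    """Stand-in for a cross-encoder (assumption) that counts scored pairs."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, query: str, chunk: str) -> float:
        self.calls += 1  # one "model call" per candidate chunk
        q = set(query.lower().split())
        c = set(chunk.lower().split())
        union = q | c
        return len(q & c) / len(union) if union else 0.0


scorer = CountingScorer()
ranked = score_pairs(
    "vector index latency",
    ["index build time", "vector index latency tuning", "team offsite notes"],
    scorer,
)
print(ranked[0][1], scorer.calls)
```

The call counter makes the cost visible: three candidates mean three scorer calls, so the work grows with the size of the candidate set.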
What It Improves
Reranking usually helps with:
- precision at the top of the list
- noisy corpora
- long documents with many semantically similar chunks
- hybrid search pipelines with mixed candidates
It is especially useful when retrieval is "almost right" but the best chunks are not consistently first.
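One way to check whether reranking improves the top of the list is to compare precision@k and reciprocal rank before and after. The sketch below uses made-up chunk IDs and relevance labels purely for illustration; `precision_at_k` and `reciprocal_rank` are hypothetical helper names.

```python
from typing import List, Set


def precision_at_k(ranked_ids: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are labeled relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / k


def reciprocal_rank(ranked_ids: List[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant result, or 0.0 if none appears."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


# Illustrative data (assumption): same five candidates, two orderings.
relevant = {"c1", "c4"}
before = ["c7", "c2", "c9", "c1", "c4"]  # first-pass order
after = ["c1", "c4", "c7", "c2", "c9"]   # order after reranking

print(precision_at_k(before, relevant, 2), precision_at_k(after, relevant, 2))
print(reciprocal_rank(before, relevant), reciprocal_rank(after, relevant))
```

Both orderings contain the same chunks; only their positions differ, which is the "almost right" case where reranking tends to pay off.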
What It Does Not Fix
Reranking is not a fix for bad indexing.
It cannot rescue the system if:
- the answer-bearing chunk was never retrieved
- chunking is broken
- metadata filters are wrong
Reranking only improves ordering within the candidate set it receives.
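This limit is easy to demonstrate: reranking permutes the candidate set, so recall over that set is the same before and after. The sketch below uses hypothetical chunk IDs and hand-written scores in place of a real reranker.

```python
from typing import Dict, List, Set


def rerank(candidates: List[str], scores: Dict[str, float]) -> List[str]:
    """Reorder candidates by score, best first. Nothing is added or removed."""
    return sorted(candidates, key=lambda c: scores.get(c, 0.0), reverse=True)


def candidate_recall(candidates: List[str], relevant: Set[str]) -> float:
    """Fraction of relevant chunks that appear anywhere in the candidate set."""
    if not relevant:
        return 0.0
    return len(set(candidates) & relevant) / len(relevant)


# Illustrative case (assumption): the answer-bearing chunk "c1" was never retrieved.
relevant = {"c1"}
candidates = ["c3", "c8", "c5"]
scores = {"c3": 0.2, "c8": 0.9, "c5": 0.5}

reranked = rerank(candidates, scores)
print(reranked)
print(candidate_recall(candidates, relevant), candidate_recall(reranked, relevant))
```

The order changes, but recall stays at zero because the missing chunk has to be recovered upstream, in retrieval, chunking, or filtering.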
Latency Tradeoff
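The main cost is latency: every candidate has to be scored against the query, so the cost grows with the size of the candidate set.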
A common production pattern is:
- broad first-pass retrieval
- rerank a small set such as top-20 or top-50
- pass only the best few chunks to generation
That keeps quality gains while limiting cost.
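A back-of-the-envelope estimate shows why the candidate set is kept small. The model below is deliberately crude: it assumes candidates are scored in sequential batches with a fixed cost per batch, and the batch size and timing are invented numbers, not measurements of any real reranker.

```python
import math


def estimate_rerank_latency_ms(
    num_candidates: int, batch_size: int, ms_per_batch: float
) -> float:
    """Rough estimate: number of sequential batches times a fixed per-batch cost."""
    if num_candidates <= 0:
        return 0.0
    return math.ceil(num_candidates / batch_size) * ms_per_batch


# Hypothetical settings (assumption): 16 pairs per batch, 25 ms per batch.
for n in (20, 50, 1000):
    print(n, estimate_rerank_latency_ms(n, batch_size=16, ms_per_batch=25.0))
```

Under these assumed numbers, top-20 and top-50 stay within a modest budget, while reranking 1000 candidates would add well over a second to each query.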
Key Questions
Q: What is reranking in RAG?
Reranking is a second-pass relevance scoring step applied after retrieval. The system first fetches a candidate set, then uses a stronger model to reorder those candidates so the best evidence appears at the top.
Q: Why not use the reranker on the whole corpus directly?
Because rerankers are much slower than vector search. They typically score each query-candidate pair with a separate model pass at query time, so the cost grows with the number of candidates. That makes them practical only after a fast first-pass retriever narrows the search space.
Q: What problem does reranking solve best?
It is best at fixing ordering quality when retrieval is close but imperfect. If the right chunks are already in the candidate set but not ranked highly enough, reranking can move them to the top before generation.