From Tokens to Embodied Minds · Drill cards · Chapter 20
Drills
Advanced RAG and evals
10 atomic recall cards. Export to Anki and let spaced repetition do its slow work.
In Anki: File → Import, choose this TSV, set field separator to Tab, deck = Tokens to Embodied Minds · Ch 20, note type = Basic.
| Front | Back |
|---|---|
| What does BM25 catch that dense embedding retrieval may miss? | BM25 scores exact token matches using term frequency, inverse document frequency, and document-length normalization, so it catches proper nouns, ticker symbols, and entity names that dense embeddings may blur through semantic generalization. |
| What is Reciprocal Rank Fusion (RRF) used for in hybrid retrieval? | RRF merges the ranked lists from BM25 and dense retrieval. Each document's fused score is the sum of 1/(k + rank) over the lists it appears in (k is a smoothing constant, commonly 60), so the merged ranking benefits from both sparse and dense signals without requiring comparable raw scores. |
| What does a cross-encoder reranker do differently from a bi-encoder? | A cross-encoder reads the query and document concatenated in a single input and outputs a relevance score directly, whereas a bi-encoder embeds them separately. It is typically more accurate than cosine similarity over bi-encoder embeddings but requires O(N) forward passes per query, one per candidate document. |
| What is contextual retrieval and what improvement does it provide? | It prepends LLM-generated context describing each chunk's role in the full document before the chunk is embedded and BM25-indexed. Anthropic reported a 49% reduction in retrieval failures when contextual embeddings are combined with contextual BM25 (Sept 2024). |
| What is HyDE and when should you avoid it? | HyDE generates a hypothetical answer to the query and embeds that instead of the raw query. Avoid it on domain-specific entity questions, where the LLM may confabulate wrong facts and pull the embedding away from the truly relevant documents. |
| Name the four metrics RAGAS measures. | Context precision, context recall, faithfulness (answer claims grounded in context), and answer relevancy (answer addresses the question). |
| Why is it wrong to measure only end-to-end accuracy for a RAG system? | The four RAGAS metrics fail independently. A system can be faithful but miss relevant chunks (high faithfulness, low context recall), or retrieve the right chunks and still hallucinate (high context recall, low faithfulness). Collapsing to one metric erases that diagnostic signal. |
| What query type does GraphRAG specifically outperform dense retrieval on? | Global sensemaking questions requiring cross-document entity resolution and aggregation — e.g., 'Which portfolio founders have prior exits?' — where dense retrieval cannot aggregate across all documents. |
| What is the practical cost of building a GraphRAG index over 500 VC memos? | Roughly 1,000 LLM calls for entity extraction (~$5 at GPT-4o mini pricing) plus offline graph construction (Leiden community detection) and storage in a graph database or parquet format. |
| What is the parent-document retrieval pattern and why is it useful? | Index small child chunks for retrieval precision, but return the full parent section (e.g., the full financial section of a memo) when a child chunk matches. This preserves retrieval precision while giving the LLM sufficient context to answer. |
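
The RRF card can be made concrete with a short sketch. This is a minimal illustration, not the chapter's implementation: the function name `rrf_merge`, the memo IDs, and the default `k=60` are assumptions chosen for the example (60 is a commonly used smoothing constant, not a requirement).

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first.
    k: smoothing constant (assumed default of 60).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top document in a list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a sparse and a dense retriever.
bm25_ranking = ["memo_17", "memo_03", "memo_42"]
dense_ranking = ["memo_03", "memo_08", "memo_17"]

merged = rrf_merge([bm25_ranking, dense_ranking])
print(merged)
```

Because only ranks enter the sum, the BM25 and cosine scores never need to be put on a common scale. Documents that appear in both lists (here `memo_03` and `memo_17`) accumulate two terms and rise above documents found by a single retriever.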