RAG
Retrieval-Augmented Generation (RAG) retrieves relevant external knowledge and adds it to an LLM's context before the model answers.
Use RAG when the model needs knowledge that is private, frequently updated, or domain-specific, or when answers must be backed by sources.
Use when
- Knowledge base Q&A
- Customer support over product docs
- Internal document assistants
- Source-backed answers with citations
Avoid when
- Strict calculations
- Transactional business workflows
- Pure tone or style adaptation
- Problems caused mainly by bad prompts
When RAG is the right first move
RAG is usually the first technical answer when the core problem is knowledge access. If the model does not know your private docs, product policies, changelog, support history, or domain corpus, retrieval gives it relevant context at answer time.
It is also useful when answers need visible evidence. A support assistant, research assistant, or internal knowledge bot often needs to show where the answer came from, not just sound plausible.
When RAG is the wrong fix
Do not reach for RAG when the real problem is workflow control, deterministic calculation, account permissions, or output formatting. Retrieval can give the model context, but it does not make the model a rules engine.
RAG also will not fix bad content. If the documents are outdated, duplicated, or vague, retrieval tends to carry those weaknesses straight into the answers.
Common mistakes
- Treating vector search as the whole RAG system.
- Chunking documents without testing answer quality.
- Skipping citations and evaluation.
- Adding more retrieved text when the prompt needs a clearer decision boundary.
MVP implementation path
Start with a small, high-value corpus such as product docs, support articles, or internal policies. Split the documents into chunks, create embeddings, retrieve the most relevant chunks for a user question, and ask the model to answer only from the retrieved evidence.
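A minimal sketch of those steps using only the standard library. The function names (`chunk_text`, `embed`, `build_index`, `retrieve`) and the two sample documents are hypothetical. `embed` is a toy bag-of-words stand-in for a real embedding model, and the in-memory list stands in for a vector store. The point is the shape of the pipeline, not the retrieval quality.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of up to max_words words; neighbors share `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Ingestion: one (doc_id, chunk, vector) row per chunk."""
    return [(doc_id, chunk, embed(chunk))
            for doc_id, text in docs.items()
            for chunk in chunk_text(text)]


def retrieve(index: list[tuple[str, str, Counter]], question: str, k: int = 3) -> list[tuple[str, str]]:
    """Return the k most similar chunks, keeping their source ids for citations."""
    q = embed(question)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(doc_id, chunk) for doc_id, chunk, _ in ranked[:k]]


docs = {
    "refunds.md": "Refunds are available within 30 days of purchase. Contact support to request a refund.",
    "shipping.md": "Orders ship within two business days. Tracking links are emailed after dispatch.",
}
index = build_index(docs)
print(retrieve(index, "How many days do I have to request a refund?", k=1))  # top hit is from refunds.md
```

Keeping the source id next to every chunk is what makes citations possible later; swapping `embed` for a real model and the list for a vector store should not change the rest of the flow.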
For the first version, keep the architecture boring: ingestion script, vector store, retrieval query, answer prompt, and citations. Add hybrid search or reranking only after you see real misses in retrieval quality.
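The answer-prompt step might look like the sketch below. `build_answer_prompt` is a hypothetical helper, and the bracketed `[source_id]` citation format is an assumption, not a standard. It only builds the prompt string; sending it to a model depends on whichever client you use.

```python
def build_answer_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Assemble a grounded answer prompt from (source_id, chunk_text) pairs."""
    # Label each chunk with its source id so the model can cite it.
    evidence = "\n".join(f"[{source_id}] {text}" for source_id, text in chunks)
    return (
        "Answer the question using only the evidence below.\n"
        "After each claim, cite its source id in square brackets, e.g. [refunds.md].\n"
        "If the evidence does not contain the answer, say that you don't know.\n\n"
        f"Evidence:\n{evidence}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_answer_prompt(
    "How long do I have to request a refund?",
    [("refunds.md", "Refunds are available within 30 days of purchase.")],
)
print(prompt)
```

The "say that you don't know" line gives the model an explicit alternative to answering from memory when retrieval misses.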
Failure modes to watch
RAG fails quietly when the wrong context looks plausible. Track whether the retrieved chunks actually contain the answer, whether the final answer cites the right source, and whether users ask follow-up questions because the first answer was too broad.
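A sketch of the first two checks for a single answer, assuming the answer cites sources as `[source_id]` in square brackets (an assumption about your prompt format). `check_citations` is a hypothetical helper; it flags whether retrieval found the expected source and whether the answer cites it without inventing sources.

```python
import re


def check_citations(answer: str, retrieved_ids: set[str], expected_id: str) -> dict[str, bool]:
    """Per-answer checks, assuming citations look like [source_id]."""
    cited = set(re.findall(r"\[([^\[\]]+)\]", answer))
    return {
        "retrieved_expected": expected_id in retrieved_ids,  # did retrieval find the right source?
        "has_citation": bool(cited),
        "cites_only_retrieved": cited <= retrieved_ids,      # no sources outside the retrieved set
        "cites_expected": expected_id in cited,
    }


good = check_citations("Refunds are available within 30 days [refunds.md].",
                       {"refunds.md", "shipping.md"}, "refunds.md")
bad = check_citations("Orders ship quickly [faq.md].", {"shipping.md"}, "shipping.md")
print(good)
print(bad)
```

The second example shows the quiet failure: retrieval found the right document, but the answer cites a source that was never retrieved.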
The most useful early evaluation set does not need to be large. Ten to twenty realistic questions with expected source documents can reveal bad chunking, weak retrieval, missing metadata, and prompt instructions that let the model answer from memory instead of evidence.
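One way to run such a set is to measure how often the expected source lands in the top k retrieved results. The corpus, questions, and `keyword_retrieve_ids` below are hypothetical placeholders for your real documents and retrieval query; `retrieval_hit_rate` accepts any function that maps a question and k to ranked source ids.

```python
import re
from typing import Callable

# Hypothetical mini-corpus; replace with your real documents.
DOCS = {
    "refunds.md": "refund requests are accepted within 30 days",
    "shipping.md": "orders ship in two business days",
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_retrieve_ids(question: str, k: int) -> list[str]:
    """Placeholder retriever: rank documents by words shared with the question."""
    q = tokens(question)
    ranked = sorted(DOCS, key=lambda doc_id: len(q & tokens(DOCS[doc_id])), reverse=True)
    return ranked[:k]


def retrieval_hit_rate(cases: list[tuple[str, str]],
                       retrieve_ids: Callable[[str, int], list[str]],
                       k: int = 3) -> tuple[float, list[str]]:
    """Fraction of (question, expected_source) cases whose source is in the top k, plus the misses."""
    misses = [question for question, expected in cases
              if expected not in retrieve_ids(question, k)[:k]]
    rate = 1 - len(misses) / len(cases) if cases else 0.0
    return rate, misses


cases = [
    ("How long do I have to request a refund?", "refunds.md"),
    ("When do orders ship?", "shipping.md"),
    ("Can I change my delivery address?", "shipping.md"),
]
rate, misses = retrieval_hit_rate(cases, keyword_retrieve_ids, k=1)
print(f"hit rate @1: {rate:.2f}")
for question in misses:
    print("MISS:", question)
```

The third question misses because it shares no words with `shipping.md`, the kind of vocabulary gap a small eval set exposes before users do. Reading the misses one by one is usually more informative than the rate itself.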
Next decision
Compare RAG with fine-tuning when the question is whether to add knowledge or change behavior. Compare it with long context when the corpus is small enough to fit directly into the prompt.
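A rough first check for the long-context option is whether the whole corpus plausibly fits. This sketch assumes about four characters per token, a common rough heuristic for English text that varies by tokenizer, and uses example context-window sizes; `estimate_tokens` and `fits_in_context` are hypothetical helpers, and a real decision should use the target model's own tokenizer and limits.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): ~4 characters per token for English text.
    return (len(text) + 3) // 4


def fits_in_context(docs: list[str], context_window: int, reserved: int = 4000) -> bool:
    """True if all docs plausibly fit, leaving `reserved` tokens for instructions, question, and answer."""
    total = sum(estimate_tokens(d) for d in docs)
    return total <= context_window - reserved


corpus = ["x" * 400_000]                            # ~100k estimated tokens
print(fits_in_context(corpus, context_window=128_000))  # example window size
print(fits_in_context(corpus, context_window=32_000))
```

Fitting is necessary but not sufficient: sending the full corpus on every request also raises per-request cost and latency.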
Compare RAG with prompt engineering when the model already knows enough but needs clearer instructions. If better instructions fix the output, retrieval may be unnecessary complexity.