RAG
Retrieval-Augmented Generation (RAG) retrieves relevant external documents at query time and injects them into the prompt, so the LLM can answer from knowledge it was never trained on.
Use RAG when the model needs private, updated, source-backed, or domain-specific knowledge.
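The core loop is retrieve, then augment the prompt, then generate. A minimal sketch of that loop, using a toy word-overlap scorer as a stand-in for a real embedding index (the corpus, scoring, and prompt format are all illustrative assumptions, not a fixed API):

```python
# Minimal RAG loop. The word-overlap score is a toy stand-in for a real
# embedding-based retriever; the prompt template is one common pattern.

def score(query: str, chunk: str) -> int:
    """Count query words that also appear in the chunk (toy relevance)."""
    q = set(query.lower().split())
    return len(q & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks with the highest overlap score."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Inject the retrieved chunks as context ahead of the user question."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Premium plans include priority support.",
]
print(build_prompt("How long do refunds take?", docs))
```

The resulting prompt would then go to the LLM; everything before that call is ordinary search and string assembly, which is why retrieval quality, not the model, is usually the first thing to debug.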
Use when
- Knowledge base Q&A
- Customer support over product docs
- Internal document assistants
- Source-backed answers with citations
Avoid when
- Strict calculations
- Transactional business workflows
- Pure tone or style adaptation
- Problems caused mainly by bad prompts
When RAG is the right first move
RAG is usually the first technical answer when the core problem is knowledge access. If the model does not know your private docs, product policies, changelog, support history, or domain corpus, retrieval gives it relevant context at answer time.
It is also useful when answers need visible evidence. A support assistant, research assistant, or internal knowledge bot often needs to show where the answer came from, not just sound plausible.
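To make evidence visible, the retriever can carry a source identifier with each chunk and format it into the context, so the model can echo citations back. A sketch under the same toy-retriever assumption (the `[source]` marker convention and the corpus ids are hypothetical):

```python
# Attach a source id to each chunk so answers can cite where they came from.
# Word overlap is a toy stand-in for a real retriever.

CORPUS = {
    "policy.md#refunds": "Refunds are issued within 14 days of purchase.",
    "policy.md#shipping": "Standard shipping takes 3-5 business days.",
    "faq.md#support": "Premium plans include priority support.",
}

def retrieve_with_sources(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return (source_id, chunk) pairs ranked by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(
        CORPUS.items(),
        key=lambda item: len(q & set(item[1].lower().split())),
        reverse=True,
    )
    return ranked[:k]

def cited_context(query: str) -> str:
    """Format retrieved chunks with [source] markers the model can repeat."""
    return "\n".join(f"[{src}] {text}" for src, text in retrieve_with_sources(query))

print(cited_context("How long does shipping take?"))
```

A system prompt along the lines of "cite the [source] marker for every claim" then lets answers point back to specific documents instead of just sounding plausible.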
When RAG is the wrong fix
Do not reach for RAG when the real problem is workflow control, deterministic calculation, account permissions, or output formatting. Retrieval can give the model context, but it does not make the model a rules engine.
RAG will not improve bad content either. If the documents are outdated, duplicated, or vague, retrieval simply surfaces those weaknesses in the answers.
Common mistakes
- Treating vector search as the whole RAG system.
- Chunking documents without testing answer quality.
- Skipping citations and evaluation.
- Adding more retrieved text when the prompt needs a clearer decision boundary.
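The chunking mistake above is cheap to catch: even a tiny gold Q&A set can show whether the chunk holding the answer is actually retrieved at a given chunk size. A rough sketch, again with a word-overlap retriever standing in for a real one, and a deliberately small document and gold set:

```python
# Compare chunk sizes by retrieval hit rate on a tiny gold Q&A set.
# Real systems use embeddings and larger eval sets; word overlap keeps
# this sketch self-contained.

DOC = (
    "Refunds are issued within 14 days of purchase. "
    "Standard shipping takes five business days. "
    "Premium plans include priority support and a dedicated manager."
)

GOLD = [  # (question, phrase the retrieved chunk must contain)
    ("How long do refunds take?", "14 days"),
    ("What do premium plans include?", "priority support"),
]

def chunk(text: str, size: int) -> list[str]:
    """Split text into chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def top_chunk(query: str, chunks: list[str]) -> str:
    """Toy retriever: the chunk sharing the most words with the query."""
    q = set(query.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.lower().split())))

def hit_rate(size: int) -> float:
    """Fraction of gold questions whose answer phrase is in the top chunk."""
    chunks = chunk(DOC, size)
    hits = sum(answer in top_chunk(q, chunks) for q, answer in GOLD)
    return hits / len(GOLD)

for size in (8, 16, 32):
    print(size, hit_rate(size))
```

Running a loop like this before shipping turns chunk size from a guess into a measured choice, and the same harness extends naturally to comparing retrievers or prompt formats.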
Next decision
Compare RAG with fine-tuning when the question is whether to add knowledge or change behavior. Compare it with long context when the corpus is small enough to fit directly into the prompt.