RAG

Retrieval-Augmented Generation (RAG) retrieves relevant external knowledge and passes it to an LLM as context before the model answers.

Decision

Use RAG when the model needs private, up-to-date, source-backed, or domain-specific knowledge.

Use when

  • Knowledge base Q&A
  • Customer support over product docs
  • Internal document assistants
  • Source-backed answers with citations

Avoid when

  • Strict calculations
  • Transactional business workflows
  • Pure tone or style adaptation
  • Problems caused mainly by bad prompts

When RAG is the right first move

RAG is usually the first technical answer when the core problem is knowledge access. If the model does not know your private docs, product policies, changelog, support history, or domain corpus, retrieval gives it relevant context at answer time.

It is also useful when answers need visible evidence. A support assistant, research assistant, or internal knowledge bot often needs to show where the answer came from, not just sound plausible.

When RAG is the wrong fix

Do not reach for RAG when the real problem is workflow control, deterministic calculation, account permissions, or output formatting. Retrieval can give the model context, but it does not make the model a rules engine.

RAG also will not automatically improve bad content. If the documents are outdated, duplicated, or vague, retrieval mostly exposes those weaknesses in the answers instead of fixing them.

Common mistakes

  1. Treating vector search as the whole RAG system.
  2. Chunking documents without testing answer quality.
  3. Skipping citations and evaluation.
  4. Adding more retrieved text when the prompt needs a clearer decision boundary.

MVP implementation path

Start with a small, high-value corpus such as product docs, support articles, or internal policies. Split the documents into chunks, create embeddings, retrieve the most relevant chunks for a user question, and ask the model to answer only from the retrieved evidence.
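The chunk, embed, and retrieve steps can be sketched in a few lines of plain Python. This is a minimal illustration, not a production design: the `embed` function below is a toy word-count stand-in for a real embedding model, the in-memory list stands in for a vector store, and the function names, the fixed-size word chunking, and the sample documents are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into chunks of at most max_words words (naive, no overlap)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs: dict[str, str], max_words: int = 40) -> list[dict]:
    """Chunk every document and store each chunk with its source and vector."""
    index = []
    for source, text in docs.items():
        for n, chunk in enumerate(chunk_text(text, max_words)):
            index.append({"source": source, "chunk_id": n, "text": chunk, "vector": embed(chunk)})
    return index


def retrieve(index: list[dict], question: str, k: int = 3) -> list[dict]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda entry: cosine(q, entry["vector"]), reverse=True)
    return ranked[:k]


# Hypothetical two-document corpus, for illustration only.
docs = {
    "refunds.md": "Refunds are issued within 14 days of purchase. Contact support to request a refund.",
    "shipping.md": "Standard shipping takes 3 to 5 business days. Express shipping is available at checkout.",
}
index = build_index(docs)
top = retrieve(index, "How long does shipping take?", k=1)
print(top[0]["source"], "->", top[0]["text"])
```

Keeping the source name on every chunk is the design choice that matters here: it is what later makes citations and retrieval evaluation possible.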

For the first version, keep the architecture boring: ingestion script, vector store, retrieval query, answer prompt, and citations. Add hybrid search or reranking only after you see real misses in retrieval quality.
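The answer prompt and citation pieces of that architecture can be as simple as a string template. The sketch below is one possible shape, assuming each retrieved chunk is a dict with `source` and `text` keys; the function name `build_answer_prompt`, the prompt wording, and the sample chunks are hypothetical. The call to the model itself is left out because it depends on the provider you use.

```python
def build_answer_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a grounded prompt; each chunk gets a numbered citation label."""
    evidence = "\n".join(
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the evidence below.\n"
        "Cite the evidence you use as [n]. "
        "If the evidence does not contain the answer, say you do not know.\n\n"
        f"Evidence:\n{evidence}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical retrieved chunks, for illustration only.
chunks = [
    {"source": "refunds.md", "text": "Refunds are issued within 14 days of purchase."},
    {"source": "faq.md", "text": "Contact support to request a refund."},
]
prompt = build_answer_prompt("How long do refunds take?", chunks)
print(prompt)
```

The numbered labels give the model something concrete to cite, and the explicit "say you do not know" instruction gives it a way out when retrieval misses.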

Failure modes to watch

RAG fails quietly when the wrong context looks plausible. Track whether the retrieved chunks actually contain the answer, whether the final answer cites the right source, and whether users ask follow-up questions because the first answer was too broad.

The most useful early evaluation set is not large. Ten to twenty realistic questions with expected source documents can reveal bad chunking, weak retrieval, missing metadata, and prompt instructions that let the model answer from memory instead of evidence.
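A small evaluation set like this can be scored with a simple retrieval hit rate: for each question, check whether the expected source document appears among the retrieved sources. The sketch below assumes the retriever is exposed as a function from a question to a list of source names; `retrieval_hit_rate`, the sample questions, and the keyword-based `fake_retriever` are all hypothetical stand-ins for your own pipeline.

```python
from typing import Callable


def retrieval_hit_rate(
    eval_set: list[dict],
    retrieve_sources: Callable[[str], list[str]],
) -> tuple[float, list[str]]:
    """Return the share of questions whose expected source was retrieved, plus the misses."""
    misses = []
    for case in eval_set:
        retrieved = retrieve_sources(case["question"])
        if case["expected_source"] not in retrieved:
            misses.append(case["question"])
    hit_rate = 1 - len(misses) / len(eval_set) if eval_set else 0.0
    return hit_rate, misses


# Hypothetical evaluation set: realistic questions with expected source documents.
eval_set = [
    {"question": "How long do refunds take?", "expected_source": "refunds.md"},
    {"question": "Is express shipping available?", "expected_source": "shipping.md"},
    {"question": "How do I reset my password?", "expected_source": "account.md"},
    {"question": "Can I change my delivery address?", "expected_source": "shipping.md"},
]


def fake_retriever(question: str) -> list[str]:
    """Stand-in retriever (keyword lookup) so the example runs without a vector store."""
    q = question.lower()
    if "refund" in q:
        return ["refunds.md"]
    if "shipping" in q:
        return ["shipping.md"]
    return ["faq.md"]


hit_rate, misses = retrieval_hit_rate(eval_set, fake_retriever)
print(f"hit rate: {hit_rate:.2f}")
for question in misses:
    print("missed:", question)
```

The list of missed questions is usually more useful than the number itself. Here the "delivery address" miss is the kind of vocabulary mismatch that points at weak retrieval or missing metadata.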

Next decision

Compare RAG with fine-tuning when the question is whether to add knowledge or change behavior. Compare it with long context when the corpus is small enough to fit directly into the prompt.
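A rough size check can inform the long-context comparison. The sketch below assumes about four characters per token, which is only a coarse rule of thumb for English text; the real count depends on the model's tokenizer. The function names, the `reserved_tokens` budget for instructions and the answer, and the sample sizes are hypothetical.

```python
def estimate_tokens(texts: list[str], chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (assumed ratio, not a real tokenizer)."""
    return round(sum(len(t) for t in texts) / chars_per_token)


def fits_in_context(
    texts: list[str],
    context_window_tokens: int,
    reserved_tokens: int = 2000,
) -> bool:
    """True if the whole corpus plausibly fits, leaving room for the prompt and answer."""
    return estimate_tokens(texts) <= context_window_tokens - reserved_tokens


# Hypothetical corpus of 12,000 characters, about 3,000 estimated tokens.
corpus = ["a" * 4000, "b" * 8000]
print(estimate_tokens(corpus))
print(fits_in_context(corpus, context_window_tokens=8000))
print(fits_in_context(corpus, context_window_tokens=4000))
```

If the corpus fits comfortably, putting it straight into the prompt may be simpler than building retrieval. If it is close to the limit or growing, retrieval is likely the safer choice.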

Compare RAG with prompt engineering when the model already knows enough but needs clearer instructions. If better instructions fix the output, retrieval may be unnecessary complexity.