
Primer · Pattern

MCP vs RAG.

The comparison is misframed. MCP is a tool-call protocol; RAG is a retrieval pattern. Most real systems run both, at different layers of the stack.

The MCP-versus-RAG question gets asked constantly and is almost always misframed. The terms describe different layers of the stack. Model Context Protocol governs how a model talks to capabilities. Retrieval-augmented generation governs what gets fed into a prompt before the model speaks. The only systems forced to pick between them are the ones that have not thought carefully about either.

§01What each thing is

MCP is a protocol. Anthropic published it in late 2024 as a way to standardize how an agent reaches outward. File systems, browsers, databases, issue trackers, anything you can wrap in a callable interface. Before MCP, every agent framework reinvented its own tool-calling shape. After MCP, the model talks to a Linear server and a Postgres server and a Playwright server the same way. The protocol does not care what the tool does. It cares that the request and the response have predictable structure.
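That "predictable structure" is concrete: MCP messages ride on JSON-RPC 2.0, and a tool invocation uses the `tools/call` method regardless of which server is on the other end. A minimal sketch of building such a request (the `query` tool name and SQL argument are made-up examples, not a real server's interface):

```python
import json

def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP tools/call request. The envelope is JSON-RPC 2.0,
    so it is identical whether the tool hits Postgres, Linear, or a browser."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# The same envelope works for any server; only name and arguments change.
req = tool_call_request(1, "query", {"sql": "SELECT 1"})
```

The protocol's indifference to the tool's semantics lives entirely in that `arguments` object: the client never needs to know what the server does with it.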

RAG is a pattern, not a protocol. Take a corpus you own. Chunk it. Embed each chunk into a vector. Store the vectors. At query time, embed the user's question, retrieve the nearest chunks, and stuff those into the prompt as context before the model generates. The whole thing is a single round-trip through the model with extra context smuggled in. Nothing here cares how the model talks to tools. The model is not even told that retrieval happened.
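The steps above can be sketched end to end. This is deliberately a toy: the bag-of-words `embed` stands in for a trained embedding model, and the three-sentence corpus stands in for a real chunked store.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline uses a trained model.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Nearest-neighbour search over the "vector store" at query time.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "MCP standardizes how a model calls tools.",
    "RAG retrieves chunks and stuffs them into the prompt.",
    "Vector stores hold embeddings for nearest-neighbour search.",
]
question = "how does retrieval augment the prompt?"
context = retrieve(question, chunks)

# The retrieved chunks are smuggled into the prompt; the model is never
# told that retrieval happened.
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
```

Everything interesting happens before generation: one embedding call, one nearest-neighbour lookup, one prompt, one model round-trip.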

§02They live on different layers

The categories pass right past each other. MCP is about how an agent reaches outward. RAG is about what you put in front of the model before it speaks. You can run RAG without ever touching MCP, just by stuffing retrieved chunks into the system prompt of a vanilla chat call. You can run MCP without ever doing RAG, by giving an agent a filesystem MCP and a Linear MCP and letting it search and read live, on demand.

You can also do both at once, which is where most production systems eventually settle. The cleanest version of "both" is the one almost nobody draws on the whiteboard: wrap the vector store as an MCP tool. The agent gets a search_docs tool that, internally, runs a RAG retrieval. The agent decides when to call it. The retrieval is still RAG. The dispatch is still MCP.
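That wrapping can be sketched with an in-process stand-in for an MCP server. The tool registry and the `search_docs` name are hypothetical, and the substring match stands in for a real embed-and-k-NN retrieval; what matters is the shape, retrieval behind a named, dispatchable tool:

```python
# Hypothetical in-process stand-in for an MCP server: a registry of
# named tools taking JSON-ish argument dicts.
TOOLS = {}

def tool(name):
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("search_docs")
def search_docs(arguments: dict) -> list[str]:
    # Stand-in retrieval: substring match over a tiny corpus. A real
    # server would run embed -> k-NN over a vector store here.
    query = arguments["query"]
    corpus = [
        "MCP governs how a model talks to capabilities.",
        "RAG governs what gets fed into the prompt.",
    ]
    return [c for c in corpus if query.lower() in c.lower()]

# MCP-style dispatch: the agent names a tool and passes arguments.
# The protocol never learns that retrieval happened inside.
result = TOOLS["search_docs"]({"query": "prompt"})
```

The retrieval is still RAG; the dispatch is still a named tool call the agent chose to make.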

§03When RAG carries the weight

Pick RAG when the corpus is large, mostly static, and the answer is extractive. Documentation. Product knowledge bases. Compliance archives. Legal corpora. Any place where the user's question can be satisfied by finding the right paragraph and rephrasing it.

Pick RAG when queries are predictable enough to tune for. Embedding choice, chunk size, reranker. These knobs reward effort, and the effort compounds across queries. A one-off agentic search loop will not match a well-tuned retrieval pipeline on either cost or latency for the queries it was tuned for.

Pick RAG when the latency budget is tight. One round-trip beats N round-trips, and an agent that must call a tool, read the result, decide whether to call again, and only then answer will lose on latency to a single retrieve-then-generate pipeline.

The cost of RAG shows up in the seams. Retrieval is opaque to the model. If retrieval misses, the model has no way to ask for a different search. It will answer from whatever it got, confidently and wrongly. This is the failure mode that RAG evals are trying to catch, and it is the one that makes RAG hard to ship in agentic contexts where questions are not purely extractive.

§04When MCP carries the weight

Pick MCP when the task requires action, not just retrieval. Creating a Linear issue. Running a SQL query. Editing a file. RAG cannot mutate anything. MCP can.

Pick MCP when the workflow spans systems that already have structure. Linear has a graph. Postgres has rows. The filesystem has files. Wrapping any of those in a vector store discards the structure that made them useful and then tries to recover it through embeddings. Querying them directly through an MCP server gives the model the real shape.

Pick MCP when freshness is the constraint. Vector stores are snapshots. The moment the underlying system changes, the embeddings drift. For anything whose answer turns on what is true right now, ticket status or deploy state or billing balance, RAG becomes a tax. MCP just reads the live system.

§05Where they meet

The honest pattern for most production AI products is RAG behind an MCP tool. You build the retrieval pipeline you would have built anyway. You wrap it as an MCP server with one or two well-named tools, search_engineering_docs or search_compliance_corpus or whatever the corpus actually is. The agent decides when to call those tools, receives retrieved chunks, and proceeds with whatever live tools it also has on hand.

This pattern keeps the things that make RAG efficient (the static corpus, the tuned retrieval, the cheap round-trip when it works) and the things that make MCP general (the agent's option to combine retrieval with action, to retry with a different query, to fall back to live data when retrieval misses). It costs one extra layer of indirection. In return, the agent gets the option to not retrieve, which turns out to matter more than anyone expects.
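That choice, retrieve from the corpus, hit a live system, or answer directly, can be sketched as a routing step. In a real agent the model itself makes this decision; here a keyword heuristic stands in for it so the flow stays runnable, and the tool names (`search_docs`, `get_ticket`) are hypothetical:

```python
def route(question: str) -> str:
    """Stand-in for the agent's per-turn decision about its next move."""
    if "status" in question:
        return "get_ticket"        # fresh state: a live MCP tool
    if "docs" in question:
        return "search_docs"       # static corpus: RAG behind an MCP tool
    return "answer_directly"       # the option to not retrieve at all

moves = [route(q) for q in (
    "what is the ticket status?",
    "where in the docs is auth covered?",
    "summarize our last exchange",
)]
```

The third branch is the one a bare retrieve-then-generate pipeline cannot express: there is no path through classic RAG that skips retrieval.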

◆ pull quote

The systems that struggled with RAG were not the ones that needed better retrieval. They were the ones where retrieval was the only move the model could make.

§06Why the framing keeps showing up

Two reasons. The first is commercial. Vendors selling RAG infrastructure have every reason to position MCP as a passing fad, and vendors selling MCP servers have every reason to position RAG as the old way of doing things. Both messages travel well. Neither is true.

The second is older. Comparisons are easier to read than layered architectures. "X vs Y, which should I use?" is a tractable post. "X and Y are on different layers and most real systems combine them" is a worse headline. The cleaner answer loses to the more clickable one, the same way it has for the past two decades of infrastructure punditry.

The piece worth writing is not the comparison. It is the architecture. Which parts of the product need a static corpus, which parts need live state, which parts need action, and where the agent should be allowed to choose between them at run time. That is the work. The vs framing is a shortcut around it.

◇ summary · field notes
$ vibgineer summarize mcp-vs-rag
  1. 01
    MCP, the protocol
    • tool-call transport between model and capabilities
    • file systems, browsers, databases, issue trackers
    • standardizes invocation, response, and error shape
  2. 02
    RAG, the pattern
    • embed corpus, store vectors
    • retrieve k-NN at query time
    • stuff into prompt before generation
  3. 03
    When each carries the weight
    • RAG for large static corpora and extractive answers
    • MCP for live systems, actions, multi-tool work
    • latency versus freshness as the main axis
  4. 04
    Where they meet
    • wrap the vector store as a search MCP tool
    • agent decides when to retrieve
    • falls back to other MCP tools on miss
    • most production systems end here
✓ Stop comparing. Start layering.