RAG Is Starting to Feel a Lot Like Microservices Did in 2017.

Ryan · ryanxf.com

RAG has become the default answer to an uncomfortable number of AI product questions.

Need better answers from the model? Add retrieval.

Need access to internal docs? Add retrieval.

Need fresher context? Add retrieval.

Need the demo to stop hallucinating long enough to survive a customer call? You guessed it: add retrieval.

Sometimes that is exactly the right move.

A lot of the time, though, RAG is starting to feel less like disciplined system design and more like microservices felt in 2017: a pattern with real value that got promoted into a reflex before most teams learned where the sharp edges were.

That is usually when architecture turns into theater.

We have seen this movie before. A useful technique arrives. It solves real problems for the teams that actually have those problems. Then the market decides it is not a technique anymore. It is a maturity signal. Soon every roadmap, every vendor pitch, and every over-caffeinated prototype has the same answer whether the problem requires it or not.

That is where I think a lot of retrieval work is drifting.

Not because RAG is fake. Because “add retrieval” is becoming the new “split it into services.”

And just like last time, a lot of teams are about to buy themselves more moving parts than understanding.

RAG is a tool. The market keeps trying to make it a personality.

At a high level, retrieval-augmented generation is not mysterious.

You take external context (documents, tickets, notes, code, specs, policies, whatever), retrieve the parts that look relevant, stuff them into the model's context window, and ask the model to answer with that material in view.
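Stripped to its bones, that loop fits in a few lines. The sketch below is illustrative only. It scores documents by plain word overlap instead of embeddings, skips the actual model call, and uses hypothetical helper names (`retrieve`, `build_prompt`) rather than any particular framework's API.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a crude stand-in for a real embedding model.
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, doc: str) -> int:
    # Count tokens shared between query and document (with multiplicity).
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum((q & d).values())


def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    # Rank doc ids by overlap score, keep the top k, and drop zero-score hits.
    ranked = sorted(docs, key=lambda doc_id: score(query, docs[doc_id]), reverse=True)
    return [doc_id for doc_id in ranked[:k] if score(query, docs[doc_id]) > 0]


def build_prompt(query: str, docs: dict[str, str], k: int = 2) -> str:
    # "Stuff" the retrieved passages into the context ahead of the question.
    context = "\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id in retrieve(query, docs, k))
    return (
        "Answer using only the sources below. Cite source ids.\n"
        f"{context}\n\nQuestion: {query}"
    )
```

Given a small dict with a VPN doc, a PTO doc, and a cafeteria doc, `build_prompt("How do I reset my VPN token?", docs)` pulls only the VPN doc into the prompt. Even at this size, the scoring rule and the cutoff `k` are decisions somebody had to make.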

That is useful. In many cases it is exactly what you want.

If the model needs access to private knowledge that was not in training, retrieval helps. If the answer depends on fast-changing information, retrieval helps. If you want source-grounded responses instead of pure statistical improv, retrieval helps.

All true.

The trouble starts when teams stop asking whether retrieval is the right solution to their actual failure mode.

Because not every bad AI result is a retrieval problem.

Sometimes the issue is:

weak prompt structure
unclear task boundaries
bad source documents
missing system instructions
no output constraints
poor evaluation design
too much context, not too little
a workflow problem pretending to be a model problem

But retrieval has become fashionable enough that people reach for it before they diagnose any of that.

So now the stack gets more complex, latency goes up, observability gets worse, relevance gets hand-wavy, and everyone acts surprised when the answers are still inconsistent.

You did not fix the system. You just gave the confusion embeddings.

A lot of RAG deployments are solving for investor comfort, not system quality

This is the slightly mean part, but it is true.

For a lot of teams, RAG is not just a technical choice. It is a signaling choice.

It sounds serious. It sounds enterprise. It sounds like there is a pipeline, an index, a ranking layer, a chunking strategy, a freshness model, and a governance story behind the scenes.

Whether there actually is one is another question.

I keep seeing products where retrieval is clearly present because it had to be present in order for the architecture slide to look modern. Not because the team had a disciplined reason for introducing it.

That leads to systems that can say:

we use vector search
we ground outputs in proprietary knowledge
we support enterprise retrieval
we built a knowledge layer for agents

Great.

Now answer the more useful questions:

Are the retrieved documents actually the right ones?
What happens when the top chunks conflict?
How stale is the index?
What is the failure mode when the retrieval misses?
Can you explain why one source ranked above another?
Do users know when the answer came from weak or partial context?
Have you measured whether retrieval improves correctness or just confidence?

That is the part where a lot of the swagger suddenly gets very quiet.

Because retrieval systems can make outputs look more grounded without making them meaningfully more correct.

And if you have ever watched a company congratulate itself for adding operational complexity before proving operational value, you already know why this bothers me.

Microservices became a cargo cult when teams forgot to price coordination

This is why the microservices comparison keeps nagging at me.

Microservices were not a bad idea. They were, and still are, the right answer for certain scale, ownership, deployment, and isolation problems.

But once the pattern escaped into the broader market, a lot of teams stopped treating it like a tradeoff and started treating it like adulthood.

That is when perfectly ordinary applications got exploded into distributed systems they did not need.

Suddenly everybody had:

service discovery
network retries
partial failure
cross-service debugging
schema drift
operational sprawl
ownership ambiguity
latency tax
new categories of failure they were not staffed to understand

All in exchange for solving problems many of them did not actually have yet.

That is the risk with RAG.

If you add retrieval casually, you are also adding:

ingestion pipelines
chunking decisions
indexing jobs
freshness questions
ranking behavior
source quality issues
context-window budgeting
citation expectations
failure modes that are harder to inspect than a plain prompt
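Context-window budgeting is a good example of a cost that only appears once retrieval is in place. A minimal sketch, assuming ranked chunks arrive best-first and approximating tokens by word count (a real system would use the model's own tokenizer): greedily keep chunks until a hypothetical `budget` is spent, and return what was dropped so it can be logged.

```python
def approx_tokens(text: str) -> int:
    # Crude proxy (assumption): one token per whitespace-separated word.
    return len(text.split())


def pack_context(ranked_chunks: list[str], budget: int) -> tuple[list[str], list[str]]:
    """Greedily keep chunks in rank order while they fit in the token budget.

    Returns (kept, dropped) so the caller can log what never reached the model.
    """
    kept: list[str] = []
    dropped: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = approx_tokens(chunk)
        if used + cost <= budget:
            kept.append(chunk)
            used += cost
        else:
            # Skipped, but later (smaller) chunks may still fit.
            dropped.append(chunk)
    return kept, dropped
```

With a budget of 6 and chunks costing 3, 2, 4, and 1 words, this keeps the first, second, and fourth chunks and drops the third. Whether a lower-ranked chunk should be allowed to slip in behind a skipped one is itself a design choice, not a given.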

That does not mean do not use it. It means stop pretending it is free.

Every architecture pattern looks elegant before you factor in the cost of operating it.

Retrieval quality is downstream of information quality

This is the part a lot of AI teams still do not want to hear.

If your documentation is stale, contradictory, scattered, vague, politically edited, or missing the actual operational details people need, RAG does not magically fix that.

It industrializes access to the mess.

That can still be useful, to be fair. Sometimes faster access to messy information is better than no access.

But teams keep talking as if retrieval turns internal knowledge into a coherent system of truth. It does not. It turns your existing information environment into an input surface.

If that environment is bad, the model inherits the badness. It may even make it harder to notice, because now the answer is fluent, sourced, and decorated with plausible confidence.

This is one reason weak RAG systems are dangerous. They can create the feeling of groundedness without the substance of it.

The model found a document. Wonderful.

Was it the right document? Was it current? Did it reflect the actual policy, or just the policy somebody meant to replace six months ago? Did it describe the intended workflow or the real workflow humans silently use because the documented one never worked?

Retrieval does not answer those questions. Operators still do.

The retrieval pipeline is often where hidden product decisions go to hide

One thing I wish more teams understood is that RAG is not just infrastructure. It is product behavior.

Every retrieval system bakes in choices about what the product will consider knowable, relevant, trustworthy, recent, and worth showing.

Those choices show up in places like:

chunk size
overlap strategy
metadata filters
query rewriting
ranking logic
freshness windows
access controls
source weighting
fallback behavior
citation formatting
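Two of those knobs, chunk size and overlap, fit in a single function. This is a minimal word-window sketch with arbitrary illustrative defaults (`chunk_size=200`, `overlap=40`); production chunkers often split on headings, sentences, or tokens instead, and each of those choices changes what a chunk means.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of chunk_size words; consecutive windows share overlap words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

For example, `chunk_words("one two three four five six seven", chunk_size=3, overlap=1)` yields `["one two three", "three four five", "five six seven"]`. Overlap lowers the chance that a step gets cut away from its condition, but it does not eliminate it.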

That is not plumbing. That is policy.

If you chunk the docs badly, you break meaning. If you rank for lexical similarity instead of task relevance, you get elegant nonsense. If you privilege old canonical docs over recent incident notes, you can retrieve the official answer instead of the answer that actually works. If you optimize for recall without precision, you flood the model with junk and call it context.
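To make that concrete, here is a sketch in which the ranking policy is written down as explicit constants. Everything in it is an assumption for illustration: the `Chunk` fields, the source labels, the weights, the 90-day half-life, and the 0.3 score floor. The retriever's raw similarity is taken as already computed.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    similarity: float  # raw retriever score in [0, 1], assumed precomputed
    source_type: str   # hypothetical labels, e.g. "canonical_doc" or "incident_note"
    age_days: float


# Every constant below is a product decision, not a neutral default.
SOURCE_WEIGHTS = {"canonical_doc": 1.0, "incident_note": 1.2}
UNKNOWN_SOURCE_WEIGHT = 0.5  # unlabeled sources are trusted less
HALF_LIFE_DAYS = 90.0        # a chunk's score halves every 90 days of age
MIN_SCORE = 0.3              # precision floor: weaker chunks never reach the model


def policy_score(c: Chunk) -> float:
    weight = SOURCE_WEIGHTS.get(c.source_type, UNKNOWN_SOURCE_WEIGHT)
    decay = 0.5 ** (c.age_days / HALF_LIFE_DAYS)
    return c.similarity * weight * decay


def rank(chunks: list[Chunk]) -> list[Chunk]:
    # Sort by policy score, then drop anything under the precision floor.
    scored = sorted(chunks, key=policy_score, reverse=True)
    return [c for c in scored if policy_score(c) >= MIN_SCORE]
```

Under these numbers, a six-month-old canonical runbook with similarity 0.9 scores 0.9 × 1.0 × 0.25 = 0.225 and is filtered out, while a week-old incident note with similarity 0.7 scores about 0.80 and is kept. Change the half-life or the weights and the model sees a different answer.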

These are not small details. These are the system.

A lot of companies are still treating retrieval as if it were a neutral add-on they can staple on near the end. That is how you end up with a very expensive ambiguity engine.

Sometimes a simpler design beats RAG outright

This is the part the hype cycle never likes.

There are plenty of cases where the best next move is not retrieval at all.

Sometimes you want:

better prompt structure
a curated knowledge base instead of a giant document dump
deterministic routing to specific sources
a narrow workflow with explicit fields
hand-built context assembly
templates plus validation
search without generation
a plain old form and a better process

That last one hurts people’s feelings, but it is true more often than the market would like.

Not every knowledge problem needs a language model to synthesize from semantically retrieved chunks. Sometimes the system should just fetch the exact runbook, exact account record, exact schema doc, or exact policy page and show it clearly.
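Deterministic routing of that kind can be very small. A minimal sketch, assuming a hypothetical registry of request patterns and placeholder URLs on `wiki.example.com`: a recognized request maps straight to the exact source, and anything unrecognized returns `None` so the caller can fall back to plain search. No model is involved.

```python
import re
from typing import Optional

# Hypothetical registry: an explicit identifier maps to one exact source page.
ROUTES = {
    r"\brunbook[:\s]+([a-z0-9-]+)": "https://wiki.example.com/runbooks/{}",
    r"\bpolicy[:\s]+([a-z0-9-]+)": "https://wiki.example.com/policies/{}",
}


def route(request: str) -> Optional[str]:
    """Return the exact source URL for a recognized request, or None to fall back to search."""
    for pattern, template in ROUTES.items():
        match = re.search(pattern, request.lower())
        if match:
            return template.format(match.group(1))
    return None
```

Here `route("Open runbook: db-failover")` returns `https://wiki.example.com/runbooks/db-failover`, and `route("why is the build slow?")` returns `None`.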

Sometimes what people call a RAG use case is really a UI failure.

The user does not need a generated answer. They need the right source surfaced faster.

If generation does not add meaningful compression, judgment, or translation, then you may just be paying extra to make search less honest.

The teams that win with RAG will be the ones that treat it like systems engineering

I am not bearish on retrieval. I am bearish on lazy retrieval.

The teams getting durable value out of it are usually the ones doing the unsexy work:

cleaning source content
defining which documents deserve trust
measuring retrieval quality separately from generation quality
logging what was retrieved and why
testing freshness and conflict behavior
constraining where generation is allowed to improvise
making failures inspectable instead of magical
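Measuring retrieval on its own can start as simply as recall@k against a small hand-labeled set. A sketch, assuming `retriever` is any hypothetical callable that returns ranked doc ids for a query, and that each eval query has at least one labeled relevant doc. The per-query log doubles as a record of what was retrieved.

```python
from typing import Callable


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the labeled relevant doc ids that appear in the top k results."""
    if not relevant:
        raise ValueError("each eval query needs at least one labeled relevant doc")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def evaluate_retriever(
    retriever: Callable[[str], list[str]],
    labeled: dict[str, set[str]],
    k: int = 5,
) -> dict:
    """Score the retriever alone, before any generation step, and keep a per-query log."""
    if not labeled:
        raise ValueError("labeled eval set is empty")
    log = []
    for query, relevant in labeled.items():
        retrieved = retriever(query)
        log.append({
            "query": query,
            "retrieved": retrieved[:k],
            "recall": recall_at_k(retrieved, relevant, k),
        })
    mean = sum(entry["recall"] for entry in log) / len(log)
    return {"mean_recall_at_k": mean, "log": log}
```

Because this never calls a model, a drop in the score points at retrieval rather than generation.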

That is what maturity looks like here.

Not “we added a vector database.” Not “our agent is grounded in enterprise knowledge.” Not “we solved hallucinations with retrieval.”

You did not solve hallucinations. You changed the shape of them.

Maybe you improved the system. Good. But prove it with evaluation, not adjectives.
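One way to do that is to run the same questions with and without retrieval and compare. A minimal sketch, assuming exact-match grading (real evals often need rubric or human grading) and two hypothetical answer functions, one per configuration.

```python
from typing import Callable


def accuracy(answer_fn: Callable[[str], str], eval_set: list[tuple[str, str]]) -> float:
    """Exact-match accuracy (case- and whitespace-insensitive) over (question, expected) pairs."""
    if not eval_set:
        raise ValueError("eval set is empty")
    correct = sum(
        answer_fn(question).strip().lower() == expected.strip().lower()
        for question, expected in eval_set
    )
    return correct / len(eval_set)


def retrieval_lift(
    baseline: Callable[[str], str],
    with_retrieval: Callable[[str], str],
    eval_set: list[tuple[str, str]],
) -> float:
    """Accuracy with retrieval minus accuracy without it, on the same questions."""
    return accuracy(with_retrieval, eval_set) - accuracy(baseline, eval_set)
```

A positive `retrieval_lift` on a representative eval set is evidence. Tracking model confidence alongside correctness would show whether retrieval made the system more right or just more sure.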

The real question is not “should we use RAG?”

The real question is:

What failure are we actually trying to fix, and what complexity are we willing to own in exchange?

That is the question teams skipped during the microservices stampede, and a lot of them paid for it in paging noise, coordination overhead, and architecture regret.

We should not repeat the same mistake just because the nouns changed.

RAG is powerful. It is also easy to overuse, easy to hand-wave, and easy to turn into a status symbol instead of a disciplined design choice.

That is why it is starting to feel familiar.

A real pattern. A real use case. A real source of leverage.

And, if teams are not careful, the next architecture fashion people adopt faster than they understand.

Use retrieval when the problem actually demands retrieval. Measure whether it helps. Pay the coordination cost with your eyes open.

Otherwise you may end up reenacting 2017 with embeddings instead of YAML.