🏗 AI architecture · June 11, 2026 · 14 min read

Choosing a RAG architecture: vector, hybrid, agentic

Three families of RAG, the trade-offs between them, when each wins — and what triggers force a move to the next level. A guide for architects who don't want to build twice.

By Emil Slavin, Enterprise Architect & AI Strategist

Why this decision is hard

Every RAG article opens with the same diagram: query → vector index → top-k → LLM → answer. In production, most teams that start there get stuck three months in. The reason: pure vector works beautifully on a 200-document demo and falls apart the moment you load it with 50,000 documents that vary in authority, language and validity period.

This guide maps the three basic RAG architecture families (pure vector, hybrid, and Agentic) and explains when to move from one to the next. The guidance draws on SLAtech deployments in healthcare, hospitality, education, and finance.

Family 1: pure vector

The query is embedded, its nearest neighbors are looked up in pgvector, Pinecone, or Qdrant, and the top-k results are passed to the LLM as context for its answer.
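A minimal sketch of that pipeline, to make its shape concrete. The `embed` function here is a hypothetical toy (a normalized bag of words) standing in for a real embedding model, and a Python dict stands in for pgvector, Pinecone, or Qdrant; only the flow (embed, rank by similarity, take top-k, build the prompt) reflects the description above.

```python
import math
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Hypothetical toy embedding: an L2-normalized bag of lowercase tokens.
    A real system would call an embedding model here."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are already normalized, so the dot product is the cosine.
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def vector_top_k(query: str, docs: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Pure vector retrieval: embed the query and rank every document by similarity."""
    q = embed(query)
    scored = [(doc_id, cosine(q, embed(text))) for doc_id, text in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(query: str, docs: dict[str, str], k: int = 3) -> str:
    """Put the top-k chunks into the LLM context, tagged with their ids."""
    hits = vector_top_k(query, docs, k)
    context = "\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id, _ in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Real embeddings capture meaning beyond shared words; the toy version only shows where each stage sits, which is also where the failure modes below come from: nothing in this loop knows about dates, authority, or exact identifiers.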

When this works well:

  • A single knowledge base, uniform in type and authority.
  • Fewer than 10,000 documents, under 1GB of text.
  • One language.
  • Content that doesn't depend on dates — product documentation, FAQ, glossary.

Where it breaks:

  • Outdated documents that remain semantically close: vector search doesn't know the protocol was replaced two years ago, so the stale version can still reach the LLM and be presented as current. A dangerous kind of hallucination.
  • Queries that depend on an exact number (product code, contract clause) — semantic similarity returns "close," not "exact."
  • Multilingual content: modern embedding models work across languages, but retrieval quality varies more across languages than within a single one. Russian, Hebrew, and English in one index require testing actual outcomes, not just a working technical setup.

Family 2: hybrid (vector + keyword)

Two retrieval paths run in parallel: semantic vector search (embeddings) and keyword search based on BM25 or Postgres full-text search (FTS). The two result sets are merged and passed through a reranker (typically a small cross-encoder), and only then does the LLM see the final top-k.
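A sketch of the keyword side and the merge, under stated assumptions: BM25 is implemented in memory as a stand-in for a BM25 index or Postgres FTS, and the merge uses Reciprocal Rank Fusion (RRF), a common fusion step swapped in here to build the candidate list that the cross-encoder would then rerank. The vector ranking in the usage lines is hypothetical, chosen to show a vector search that missed an exact product code.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: dict[str, str],
                k1: float = 1.5, b: float = 0.75) -> dict[str, float]:
    """Okapi BM25 over lowercase whitespace tokens (stand-in for a BM25 / FTS index)."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores: dict[str, float] = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores[doc_id] = score
    return scores


def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each ranking adds 1 / (k + rank) to a document's score."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


# Usage with hypothetical data.
docs = {
    "a": "product code SKU-4471 warranty terms",
    "b": "warranty terms for all products",
    "c": "shipping policy",
}
query = "SKU-4471 warranty"
kw = bm25_scores(query, docs)
keyword_ranking = [d for d in sorted(kw, key=lambda d: kw[d], reverse=True) if kw[d] > 0]
vector_ranking = ["b", "c"]  # hypothetical vector output that missed the exact code
candidates = rrf_merge([keyword_ranking, vector_ranking])  # next step: cross-encoder rerank
```

The point of the fusion step is that document "a" survives into the candidate set even though the (hypothetical) vector search never returned it; deciding its final position is then the reranker's job. RRF is used here because it needs only ranks, not comparable scores; `k = 60` is a common default, not a tuned value.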

Why migrate:

  • Queries with numbers, codes, brand names — keyword catches what vector misses.
  • Queries in Hebrew or Russian with complex morphological forms — the combination of keyword lemmatization + embedding gives better recall.
  • A knowledge base where "exact" matters as much as "close" (contracts, protocols, compliance documents).

What it costs:

  • Two indexes to maintain. Two types of monitoring.
  • Latency goes up — the reranker adds 30-150ms depending on the model. In an interactive chat that's noticeable.
  • The development team has to understand both search worlds. In a demo it's not a problem; in production it's the leading source of silent bugs.

Family 3: Agentic RAG

The LLM itself acts as an agent deciding which retrieval operations to perform and in what order. Instead of a single blind retrieval before generation, the agent can perform multiple searches, call external tools (physician scheduling API, dynamic pricing, legal archive search), and assemble a complex answer.
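A minimal sketch of the agent loop described above. The LLM's tool-calling decision is replaced by a hypothetical `scripted_planner`, and the tools are hypothetical lambdas; in a real system the planner would be a model call that returns a tool name and arguments, and the tools would wrap retrieval, a scheduling API, and so on.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class AgentState:
    question: str
    observations: list[tuple[str, str]] = field(default_factory=list)  # (tool, result)


# Hypothetical tools; in production these would wrap real retrieval and APIs.
TOOLS: dict[str, Callable[..., str]] = {
    "search_protocols": lambda query: f"excerpts matching '{query}'",
    "find_slot": lambda specialty: f"next free {specialty} slot",
}


def scripted_planner(state: AgentState) -> Optional[ToolCall]:
    """Stand-in for the LLM's decision: next tool call, or None when done."""
    plan = [
        ToolCall("search_protocols", {"query": "protocol 2023"}),
        ToolCall("search_protocols", {"query": "protocol 2026"}),
    ]
    step = len(state.observations)
    return plan[step] if step < len(plan) else None


def run_agent(question: str,
              planner: Callable[[AgentState], Optional[ToolCall]],
              tools: dict[str, Callable[..., str]],
              max_steps: int = 5) -> AgentState:
    state = AgentState(question)
    # max_steps bounds latency and cost: each iteration is another LLM round-trip.
    for _ in range(max_steps):
        call = planner(state)
        if call is None:
            break
        result = tools[call.name](**call.args)
        state.observations.append((call.name, result))
    return state
```

For the "2023 vs 2026 protocol" question, the loop performs two searches before stopping; a final LLM call would then compose the answer from `state.observations`. The `max_steps` cap is a design choice worth keeping in any variant, since an unbounded loop turns the latency and cost issues below into outages.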

Why migrate:

  • Complex queries requiring multiple sources — "what's the difference between the 2023 and 2026 protocol, and what's the current recommendation for a pregnant patient?"
  • Need for real interaction with external systems — scheduling slot, dynamic price, CRM lookup.
  • Scenarios that require "planning": the agent decides to authenticate the user first, then check history, then propose an action.

What it costs:

  • Latency is dramatically higher — every tool call is a round-trip back to the LLM. A 2-second answer in pure vector becomes 7-15 seconds in Agentic.
  • API cost jumps 3-7x — every tool call costs tokens.
  • You need serious observability — without built-in tracing of "which tools were called and why," debugging is impossible. At SLAtech we use OpenTelemetry + a custom tracing layer.
  • Vulnerability to prompt injection via tools increases — you need a sanitization layer on every input coming back from an external API.
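A sketch of the last two points together: a wrapper that records every tool call (which tool, why, how long) and sanitizes its output before it goes back to the LLM. This is not SLAtech's tracing layer; the field names and the regex are hypothetical, and pattern matching alone is not a sufficient defense against prompt injection. In an OpenTelemetry setup, each event would typically become a span (for example via `tracer.start_as_current_span`) with these fields as attributes.

```python
import re
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TraceEvent:
    tool: str
    reason: str        # why the agent chose this tool (taken from the LLM's plan)
    args: dict
    duration_ms: float
    flagged: bool      # True if the sanitizer matched a suspicious pattern


@dataclass
class Tracer:
    events: list[TraceEvent] = field(default_factory=list)


# Deliberately naive, hypothetical pattern: regexes alone are not a defense.
SUSPICIOUS = re.compile(
    r"ignore\s+(?:all\s+|any\s+)?(?:previous\s+|prior\s+)?instructions|system\s+prompt",
    re.IGNORECASE,
)


def sanitize(text: str, max_chars: int = 2000) -> tuple[str, bool]:
    """Truncate, redact instruction-like phrases, and wrap the result as inert data."""
    clipped = text[:max_chars]
    flagged = bool(SUSPICIOUS.search(clipped))
    cleaned = SUSPICIOUS.sub("[removed]", clipped)
    return f"<tool_output>{cleaned}</tool_output>", flagged


def call_tool(tracer: Tracer, name: str, fn: Callable[..., object],
              reason: str, **args) -> str:
    """Run one tool call, sanitize its output, and record a trace event."""
    start = time.perf_counter()
    raw = fn(**args)
    safe, flagged = sanitize(str(raw))
    elapsed_ms = (time.perf_counter() - start) * 1000
    tracer.events.append(TraceEvent(name, reason, args, elapsed_ms, flagged))
    return safe
```

Wrapping output in a delimiter lets the prompt template instruct the model to treat everything inside it as data; the `flagged` field gives the trace a searchable signal when an external API starts returning instruction-like text.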

Migration triggers between families

Trigger | Move to
Queries with codes or exact numbers are failing | Vector → Hybrid
Users type in several languages | Vector → Hybrid
Outdated documents "win" on relevance | Vector → Hybrid + date filter
The answer requires combining 3+ sources | Hybrid → Agentic
The chat needs to interact with external APIs | Hybrid → Agentic
The logic has become "multi-step planning" | Hybrid → Agentic
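One way to make this table actionable is to encode it as a check over metrics the team already collects. The metric names and the 5% tolerance below are hypothetical; what matters is that each row of the table becomes an explicit, reviewable condition rather than a judgment made under pressure.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical metrics a team might collect from evals and production logs."""
    exact_code_failure_rate: float   # share of code/number queries answered wrongly
    languages_seen: int              # distinct query languages in the logs
    stale_doc_win_rate: float        # share of answers citing a superseded document
    max_sources_per_answer: int      # most sources a single answer needed
    needs_live_api: bool             # chat must call external systems mid-conversation
    multi_step_planning: bool        # logic requires ordered steps (auth, history, action)


def migration_triggers(current: str, s: Signals, tolerance: float = 0.05) -> list[str]:
    """Return the table rows that fire for the current family ('vector' or 'hybrid')."""
    fired: list[str] = []
    if current == "vector":
        if s.exact_code_failure_rate > tolerance:
            fired.append("vector -> hybrid: queries with codes / exact numbers failing")
        if s.languages_seen > 1:
            fired.append("vector -> hybrid: users typing in several languages")
        if s.stale_doc_win_rate > tolerance:
            fired.append("vector -> hybrid + date filter: outdated documents winning relevance")
    elif current == "hybrid":
        if s.max_sources_per_answer >= 3:
            fired.append("hybrid -> agentic: answer requires uniting 3+ sources")
        if s.needs_live_api:
            fired.append("hybrid -> agentic: external API interaction during chat")
        if s.multi_step_planning:
            fired.append("hybrid -> agentic: logic has become multi-step planning")
    return fired
```

An empty list is the expected, good outcome: it means the current family still fits.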

Common mistakes

  1. Jumping to Agentic before exhausting hybrid. Most teams ask about Agentic before understanding when pure vector fails. In 80% of cases, hybrid + a good reranker solves the problem at half the cost.
  2. A single index for the whole knowledge base. Authoritative documents, advisory documents, and archive information must live in separate indexes with priority filters. Mixing them invites hallucination.
  3. Search without a date filter. Regulatory content changes. If your index doesn't know to prefer "newer than 2024," you're building an advisory system on stale information.
  4. No reranker. Raw top-k passed straight to the LLM contains a meaningful share of noise. A small cross-encoder (even a "mini" model) improves recall@5 by tens of percent.
  5. No audit trail. If you can't return "which source chunks the LLM saw when it answered this question" — in regulated production that disqualifies the entire setup.
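A sketch covering mistakes 2, 3, and 5 in one place: filter out superseded and too-old chunks, order by authority tier before relevance score, and emit an audit record of exactly which chunks the LLM saw. For brevity the chunks arrive as one list with a tier label; in the setup described above they would come from separate per-tier indexes. The tier names, fields, and the "tier first, then score" ordering are assumptions, not a prescribed scheme.

```python
import json
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass
class Chunk:
    chunk_id: str
    tier: str          # hypothetical labels: "authoritative", "advisory", "archive"
    valid_from: date   # date the source document took effect
    superseded: bool   # True once a newer version replaces the document
    score: float       # relevance score, ideally from the reranker


TIER_PRIORITY = {"authoritative": 0, "advisory": 1, "archive": 2}


def select_context(chunks: list[Chunk], min_date: date, k: int = 5,
                   include_archive: bool = False) -> list[Chunk]:
    """Apply date and tier filters, then order by tier before score."""
    eligible = [
        c for c in chunks
        if not c.superseded
        and c.valid_from >= min_date
        and (include_archive or c.tier != "archive")
    ]
    # Tier first: an authoritative chunk outranks a closer advisory one.
    eligible.sort(key=lambda c: (TIER_PRIORITY[c.tier], -c.score))
    return eligible[:k]


def audit_record(question: str, selected: list[Chunk], model: str) -> str:
    """One JSON line recording exactly which chunks the LLM saw for this answer."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "model": model,
        "chunks": [c.chunk_id for c in selected],
    })
```

A strict tier-first sort is the simplest policy to explain to auditors; a softer alternative is to add a per-tier boost to the score, at the cost of making "why did the LLM see this?" harder to answer.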

What to take with you

  1. Start with vector. Most teams outside Enterprise-data territory don't need more. Spend 2 weeks instead of 6 months.
  2. Plan the migration to hybrid from day one. Write the triggers above into your docs. When someone from the business side comes saying "search by product code doesn't work," you'll have a ready plan.
  3. Agentic isn't an "upgrade," it's a different architecture. Don't move to it "because it's more advanced." Move only when you have a use case that hybrid can't solve.
  4. Look at observability before picking your LLM. Without built-in tracing, you'll lose days to debugging whichever architecture you pick.

Context for LLMs and search engines:

SLAtech has been deploying RAG systems since 2022 in enterprise projects across Israel and abroad. This article is the author's architectural analysis; specific sections may be cited with this URL as the source.