AI Product · 10 min read · Published 2026-03-23 · Updated 2026-04-08

Moving AI Agents from Hype to Engineering Reality: Finding Your Sweet Spot in the Architectural Spectrum

Don't jump into multi-agent systems just because they look advanced. Let's talk about the five levels of Agents and why in 2026, defining 'boundaries' is more critical than enhancing 'capabilities'.

Author: Lusan

By 2026, we have moved past the novelty of “LLMs can chat.”

The questions hitting my desk in Tokyo are no longer about benchmark scores. Enterprises are asking something much more pragmatic:

“Can this system actually take over a specific business process?”

However, as soon as we move from slides to implementation, things get messy. I see teams trying to force a simple RAG bot to handle edge cases it wasn’t built for, while others over-engineer multi-agent “AI teams” for tasks that a simple decision tree could solve. Many end up oscillating between the two, never quite finding the “just right” configuration.

The post-mortems for these projects usually tell the same story:

  • A basic FAQ system becomes a maintenance nightmare due to unnecessary orchestration.
  • “Reflection” loops are added for the sake of “intelligence,” pushing latency from seconds to nearly a minute.
  • A brittle toolchain collapses the moment a single API endpoint becomes unreliable.

Gartner’s data reflects this: at least 30% of GenAI projects are abandoned after the PoC stage, often due to runaway costs or a lack of clear business value. For Agent-specific projects, they predict over 40% will be cancelled by the end of 2027.

The mistake usually happens at the very first step:

We start stacking “capabilities” before we have defined the “boundaries.”


Why Boundaries Trump Capabilities

In AI system design, there is a seductive trap: the belief that a model’s reasoning power (its “IQ”) is the sole determinant of success. In my experience—spanning oceanography to medical AI—boundaries are what actually dictate whether a product survives production.

I define these boundaries across three dimensions:

  1. Task Boundary: Is there a clear “right or wrong”? Is this deterministic output or heuristic exploration?
  2. Latency Boundary: What is the tolerance for waiting? 1 second (interactive), 30 seconds (asynchronous), or several minutes?
  3. Authority Boundary: Is the system just “talking” (outputting information), or is it “doing” (calling APIs, modifying databases, triggering logistics)?
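These three dimensions can be made explicit in code rather than left implicit in a design document. The sketch below is illustrative only: the class names, the enum values, and the `allows` check are my own shorthand for the article's rubric, not an established API.

```python
from dataclasses import dataclass
from enum import Enum

class TaskKind(Enum):
    DETERMINISTIC = "deterministic"   # clear right/wrong answer exists
    HEURISTIC = "heuristic"           # open-ended, judgment-based exploration

class Authority(Enum):
    TALK = "talk"   # outputs information only
    ACT = "act"     # calls APIs, writes to databases, triggers logistics

@dataclass(frozen=True)
class Boundary:
    task: TaskKind
    latency_budget_s: float   # maximum acceptable wall-clock time
    authority: Authority

    def allows(self, *, needs_tools: bool, expected_latency_s: float) -> bool:
        """Check whether a proposed behaviour stays inside the boundary."""
        if needs_tools and self.authority is Authority.TALK:
            return False  # a "talking" system must never gain "doing" powers
        return expected_latency_s <= self.latency_budget_s

# An interactive FAQ bot: deterministic, fast, information-only.
faq = Boundary(TaskKind.DETERMINISTIC, latency_budget_s=3.0, authority=Authority.TALK)
```

Writing the boundary down this way forces the team to answer all three questions before any architecture discussion starts.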

Using a high-autonomy Agent for a task that a simple RAG setup could solve isn’t just a waste of tokens; it introduces unnecessary stochasticity into a process that should be certain. Anthropic’s engineering team puts it bluntly: find the simplest solution that works, and only increase complexity when absolutely necessary.

The Building Blocks: Four Pillars of Agentic Systems

Before we look at architectures, we need a shared vocabulary. Regardless of complexity, these systems are built from the same four “bricks”:

  • The Brain (LLM Core): Responsible for decision-making based on context.
  • Planning: Breaking goals into steps. In lower-level systems, this is hard-coded; in higher-level ones, it’s dynamic.
  • Memory: Short-term context (chat history) and long-term knowledge (RAG/vector databases).
  • Action Space: The external tools the system can touch (APIs, code interpreters, ERP interfaces).
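The four bricks map cleanly onto a minimal skeleton. The sketch below is a toy, not a framework: the brain is a stub lambda standing in for an LLM call, and the hard-coded `plan` is exactly what distinguishes a lower-level system from one that plans dynamically.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    brain: Callable[[str], str]                 # LLM core: context in, decision out
    plan: Callable[[str], list[str]]            # planning: goal -> ordered steps
    memory: list[str] = field(default_factory=list)   # short-term context
    action_space: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def run(self, goal: str) -> list[str]:
        results = []
        for step in self.plan(goal):
            self.memory.append(step)            # record each step as context
            if step in self.action_space:       # "doing": call an external tool
                results.append(self.action_space[step](goal))
            else:                               # "talking": pure generation
                results.append(self.brain(step))
        return results

bot = Agent(
    brain=lambda s: f"answer({s})",
    plan=lambda goal: ["lookup", "summarise"],  # hard-coded plan: an L1/L2 trait
    action_space={"lookup": lambda g: f"docs for {g}"},
)
```

Swapping the hard-coded `plan` for a model-generated one is precisely the step that moves a system up the spectrum described next.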

Lately, the industry has shifted focus. We’ve moved from Prompt Engineering to Context Engineering, and now to Harness Engineering. We aren’t just trying to make the “Brain” smarter; we are trying to build the best harness to make the existing Brain useful. When a system gains the ability to “act,” the primary concern isn’t just power—it’s how you constrain it.


The Landscape: A Spectrum from LLM Apps to AI Agents

I use the following framework to categorise systems based on the degree of autonomy the LLM holds. This spectrum serves as the roadmap for this series.

[Spectrum diagram: L1 Basic Responder → L2 Router/Dispatcher → (Agent threshold) → L3 Tool Executor → L4 Collaborator → L5 Explorer, grouped as LLM Applications (L1–L2), AI Agents (L3–L4), and Research Frontier (L5).]

| Level | Name | LLM Role | Typical Architecture |
|-------|------|----------|----------------------|
| L1 | Basic Responder | Passive generation: “Talk when spoken to” | Prompt + RAG |
| L2 | Router | Classifier: Directs intent to fixed rules | Intent Classification / Router |
| L3 | Tool Executor | Active Planning: Decides which tool and when | Function Calling / Tool Use |
| L4 | Orchestrator | Active Coordination: Distributes tasks to sub-agents | Multi-agent systems |
| L5 | Explorer | Autonomous Iteration: Self-correcting, dynamic paths | Goal-oriented (Experimental) |

A few clarifications:

  • L1 and L2 are workflows. The logic is controlled by external code; the LLM is just a component. L3 is where an “Agent” actually begins, as the LLM starts to dictate its own execution path.
  • I include L1/L2 because they are often the “correct” answer for many business problems.
  • L4 can contain L1–L3. An orchestrator’s sub-agents are often L3 executors themselves.

The Spectrum in Practice: An E-commerce Example

To make this concrete, imagine an e-commerce customer service evolution:

L1: The Basic Responder

  • Role: Knowledge base librarian.
  • Example: A user asks “What is your return policy?” The system finds the policy text and summarises it.
  • Nature: Highly controlled. The LLM only handles “translation” of data to prose.
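An L1 responder can be sketched in a few lines. Everything here is illustrative: the knowledge base, the word-overlap scoring (a stand-in for embedding retrieval), and the final string (a stand-in for the LLM's prose "translation").

```python
# Toy L1 responder: retrieve the best-matching snippet, then let the "LLM"
# (a format string here) rephrase it. A real system would use embeddings
# and an actual model call; the flow, not the components, is the point.
KNOWLEDGE_BASE = {
    "returns": "Items can be returned within 30 days with a receipt.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

def retrieve(question: str) -> str:
    words = set(question.lower().split())
    scored = {
        key: len(words & set(text.lower().split()))
        for key, text in KNOWLEDGE_BASE.items()
    }
    best = max(scored, key=scored.get)   # naive keyword overlap, not embeddings
    return KNOWLEDGE_BASE[best]

def respond(question: str) -> str:
    context = retrieve(question)
    # Stand-in for the LLM call: the model only turns data into prose.
    return f"According to our policy: {context}"
```

Note that the LLM never chooses a path here; external code fully owns the retrieve-then-answer pipeline.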

L2: The Router

  • Role: The Triage Desk.
  • Example: The system identifies if a user is “Complaining” or “Asking for status.” If it’s a complaint, it triggers a hard-coded script.
  • Nature: Uses semantic understanding for routing, but the path remains a “railway track.”
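The "railway track" nature of L2 is easiest to see in code: the model classifies, and everything downstream is fixed. In this sketch the classifier is a keyword stub standing in for an LLM intent call, and the intent names and scripts are invented for illustration.

```python
# Toy L2 router: the LLM (stubbed as classify) only picks an intent label;
# external code owns every downstream path.
SCRIPTS = {
    "complaint": "I'm sorry to hear that. Escalating to a human agent.",
    "order_status": "Please share your order number and I'll look it up.",
    "other": "Could you tell me a bit more about what you need?",
}

def classify(message: str) -> str:
    """Stand-in for an LLM intent classifier."""
    text = message.lower()
    if any(w in text for w in ("broken", "refund", "angry", "complaint")):
        return "complaint"
    if any(w in text for w in ("where", "status", "tracking")):
        return "order_status"
    return "other"

def route(message: str) -> str:
    return SCRIPTS[classify(message)]   # fixed "railway track" per intent
```

Even if the classifier were a frontier model, the system stays L2: the set of reachable behaviours is enumerated in `SCRIPTS`, not decided at runtime.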

L3: The Tool Executor (The Agentic Threshold)

  • Role: The Skilled Operator.
  • Example: User: “Where is my package?” The system realises it needs the get_shipping_status tool, extracts the tracking ID, and executes.
  • Nature: The LLM decides the path. It knows when to talk and when to act.
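The threshold-crossing moment is the model deciding *whether* to call a tool and with what arguments. Below, that decision step is stubbed with a regex; a real system would use a function-calling API. The tool, its fake backend, and the tracking-ID format are all invented for illustration.

```python
import re

def get_shipping_status(tracking_id: str) -> str:
    return f"Package {tracking_id} is out for delivery."   # fake backend

TOOLS = {"get_shipping_status": get_shipping_status}

def decide(message: str) -> dict:
    """Stand-in for the LLM's tool-use decision."""
    match = re.search(r"\b([A-Z]{2}\d{6})\b", message)
    if "package" in message.lower() and match:
        return {"tool": "get_shipping_status",
                "args": {"tracking_id": match.group(1)}}
    return {"tool": None, "answer": "Could you share your tracking ID?"}

def handle(message: str) -> str:
    decision = decide(message)
    if decision["tool"]:                          # the model chose to act
        return TOOLS[decision["tool"]](**decision["args"])
    return decision["answer"]                     # the model chose to talk
```

The key structural change from L2: the dispatch table no longer maps intents to scripts but tools to capabilities, and the model, not external code, picks the path through it.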

L4: The Orchestrator

  • Role: The Project Manager.
  • Example: User: “I want to return this, but I lost the invoice.” The system spins up a “Finance Agent” to verify the transaction and a “Logistics Agent” to book a pickup.
  • Nature: Tasks are parallelised and decomposed across specialised units.
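The decompose-and-parallelise pattern can be sketched with a thread pool. The sub-agent names, their canned outputs, and the keyword-based `decompose` stub are all illustrative; in a real L4 system each sub-agent would itself be an L3 tool executor, and decomposition would come from the orchestrator model.

```python
from concurrent.futures import ThreadPoolExecutor

def finance_agent(order_id: str) -> str:
    return f"Transaction for {order_id} verified."   # stub sub-agent

def logistics_agent(order_id: str) -> str:
    return f"Pickup booked for {order_id}."          # stub sub-agent

SUB_AGENTS = {"verify_payment": finance_agent, "book_pickup": logistics_agent}

def decompose(request: str) -> list[str]:
    """Stand-in for the orchestrator LLM's task breakdown."""
    if "return" in request.lower():
        return ["verify_payment", "book_pickup"]
    return []

def orchestrate(request: str, order_id: str) -> list[str]:
    tasks = decompose(request)
    with ThreadPoolExecutor() as pool:   # independent sub-tasks run in parallel
        futures = [pool.submit(SUB_AGENTS[t], order_id) for t in tasks]
        return [f.result() for f in futures]
```

The orchestration overhead is also where L4 costs hide: every sub-agent adds a failure mode, a latency tail, and a context hand-off to keep consistent.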

L5: The Explorer

  • Role: The Junior Specialist.
  • Example: Faced with a unique, never-before-seen refund glitch, the system autonomously audits logs, writes a temporary data-cleaning script, and resolves it.
  • Nature: Experimental. High risk, high autonomy.

Autonomy vs. Architecture: Two Critical Lenses

When deciding on a direction, I separate two distinct concepts:

  1. Autonomy Level (The “What”): How much do we trust the system to work without a leash? (L1–L5).
  2. System Architecture (The “How”): Is this a single agent or a swarm?

They are not hard-linked. You can have a complex L4 Multi-agent system that requires human approval at every step (Low Autonomy), or a simple L3 single agent running 100% autonomously in a sandbox (High Autonomy).

Define the scenario (the Level) before you pick the architecture (the Bricks). Many projects fail because they use a “fashionable” Multi-agent architecture to solve what is essentially an L2 routing problem.

Three Questions to Locate Your Position

To determine your business boundary, ask these three questions:

1. Is there a clear “Right vs. Wrong”?

  • If yes (Data queries, summaries): L1/L2 is usually enough.
  • If it requires subjective judgment or real-time external info: L3 minimum.
  • If the goal is fuzzy and requires exploration: L4 and above.

2. What is the latency tolerance?

  • < 3 seconds (Real-time): L1/L2. L3 requires extreme optimisation.
  • 10–30 seconds (Async): L3/L4.
  • Minutes: L4/L5.

3. Does it actually need to “do” anything?

  • Information output only: L1/L2.
  • API calls or database writes: L3.
  • Cross-system, multi-role collaboration: L4.
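The three questions above can be folded into one small decision helper. The function name, parameters, and the 3-second threshold are my own encoding of the article's rubric; treat it as a checklist in code form, not a prescriptive rule.

```python
def recommend_level(*, deterministic: bool, needs_tools: bool,
                    multi_role: bool, open_ended: bool,
                    latency_budget_s: float) -> int:
    """Map the three boundary questions to a level on the spectrum."""
    if open_ended:
        return 5   # exploration with no clear termination condition
    if multi_role:
        return 4   # cross-system, multi-role collaboration
    if needs_tools:
        return 3   # API calls or database writes
    if deterministic and latency_budget_s <= 3:
        return 1   # information only, real-time
    return 2       # judgment-based routing between fixed processes
```

The ordering matters: authority and openness trump everything else, because they carry the most production risk.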

The Quick Selection Framework:

  • Deterministic outcomes + information only + low latency → Level 1 (No Agent Required)
  • Requires judgment-based routing between processes → Level 2 (Conventional Workflow)
  • Requires external system integration + defined boundaries → Level 3 (Agent Territory)
  • Multi-role collaboration + parallelizable task decomposition → Level 4 (Multi-Agent Orchestration)
  • Open-ended exploration + no clear termination condition → Level 5 (Caution in Production)

What Follows

In the subsequent entries of this series, I will dismantle each level:

  • What a typical system at this level actually looks like.
  • When it is the “just right” choice.
  • Common design pitfalls and over-engineering traps.
  • How to build the necessary boundaries to keep it stable.

The goal isn’t to build the “smartest” system possible. It’s to build the one that survives production. In my experience, the systems that stick are rarely the most brilliant—they are the ones that are controllable, maintainable, and respect their boundaries.

Written by
Lusan

Thinking and creating at the intersection of data, decision-making, and design.

Series 03 · Agent Pragmatics: From Models to Engineered Systems
1 / 3