The War Beyond Models — What Agent Engineers Actually Build

“Hook up an LLM API, write a good prompt, and you’ve got an agent.”

That formula worked six months ago. Not anymore.

In December 2025, the Agentic AI Foundation (AAIF) launched under the Linux Foundation. Founding members: AWS, Anthropic, Google, Microsoft, OpenAI, Block, Cloudflare, and Bloomberg, companies that compete fiercely yet sit at the same table. Anthropic donated MCP, OpenAI contributed AGENTS.md, and Block gave goose. Even direct competitors agreed that agent systems need shared infrastructure.

This signals that agents have moved beyond “prompt + API call” territory. Agent engineering is becoming its own discipline.

One Big Direction: From Conversation to Execution

If you boil the current agent wave down to a single statement, it’s this:

LLM applications are shifting from “conversational answer tools” to “stateful execution systems.”

Agent engineers can no longer focus on prompts and models alone. An agent is now a software system comprising these components:

Component	Role
Model	Reasoning, planning, generation
Tools	APIs, DB, browser, code execution, SaaS connectors
Context	Files, conversations, user state, organizational data
Memory	Short/long-term memory, preferences, task history
State	In-progress tasks, failure/retry state
Policy	Permissions, prohibited actions, approval conditions
Evaluation	Result verification, quality measurement
Observability	Logs, traces, cost, latency
Human-in-the-loop	Approval, correction, abort, rollback

Notice that the model is just one of nine components. An agent engineer isn’t “someone who calls LLMs well”; they’re closer to someone who safely embeds an uncertain reasoning engine inside an execution system.

Matt Webb’s concept of “context plumbing” captures this well — the engineering infrastructure to move context from diverse sources to where agents need it. The job of making an agent work well isn’t about using smarter models. It’s about designing better information flow infrastructure.

7 Companies, 7 Central Objects

The essence of each company’s agent strategy comes down to one question:

What object does the agent revolve around?

This single lens cleanly separates seven different approaches.

Google: Context Broker Across the User Surface

Central object: user, calendar, documents, search results.

Gmail, Calendar, Docs, Sheets, Android — Google’s user surface area is overwhelming. ADK 2.0 introduced a graph-based workflow runtime combining deterministic flows with adaptive AI reasoning, and the A2A Protocol v1.0.0 became an agent communication standard with 50+ partners.

The key engineering challenge is multi-app context integration and permission boundary design: stitching together email, documents, calendar, and search results into a single task context while limiting what data can be read and written across apps.

Google shows us that agents are evolving into context brokers spanning the entire user surface.

Anthropic / Claude: Verifiable Professional Workflows

Central object: specialized work documents, analysis tasks.

Claude’s direction isn’t a general-purpose assistant; it’s vertical agents for high-trust work. Anthropic donated MCP to AAIF to standardize tool integration and published the Agent Skills Standard for declaring agent capabilities. Claude Code reached $2.5B annualized revenue, becoming the reference implementation for coding agents.

The key challenge is domain-specific task decomposition and verifiable execution: breaking complex work into review, extraction, judgment, and writing steps, connecting every judgment to source documents, and ensuring humans can review intermediate results.

The lesson: not “delegate everything to the agent,” but separate what LLMs do well from what deterministic logic should handle.

Cursor: The Coding Agent Where Diff Is the Output

Central object: codebase, diff, branch, PR.

Cursor was named a Leader in Gartner’s 2026 Magic Quadrant for Enterprise AI Coding Agents. Cursor 3 unified the workspace, Composer 2.5 significantly improved long-horizon agentic task performance, and Cloud Agents handle extended autonomous development work.

The key challenge is task isolation, code change verification, and parallel agent execution: finding the right files and dependencies, executing in sandboxed environments, and isolating multiple tasks to prevent conflicts.

The crucial insight: agent output isn’t “text answers” — it’s diffs. Agent quality is measured by change accuracy, test passage, and reviewability — not conversation quality. The supervision paradox from blog #11 is sharpest here.

OpenAI / Codex: General-purpose Agent Runtime

Central object: tool, runtime, agent step.

The Agents SDK evolved into a lightweight framework supporting 100+ LLMs, offering Sandbox Agents (containerized long-running tasks), Sessions (automatic history management), and Tracing (built-in observability). Codex expanded to a macOS app with 1M+ monthly developers.

The key challenge is designing an agent framework where model, tools, state, and evaluation are decoupled. Tool interfaces persist even when models change, different models are selected per task, and quality loops automatically evaluate results.

The takeaway from OpenAI’s approach is the idea of an agent substrate: treat the model as a swappable component and design the layers above it.
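To make the decoupling concrete, here is a minimal sketch. All names in it (`Model`, `Tool`, `UpperModel`, `ReverseModel`, `run_step`) are hypothetical and are not taken from the Agents SDK or any other real library; the stand-in "models" simply return a tool name.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Model(Protocol):
    """Anything that can turn a prompt into text (hypothetical interface)."""

    def complete(self, prompt: str) -> str: ...


@dataclass(frozen=True)
class Tool:
    """A tool interface that stays fixed regardless of which model is used."""

    name: str
    run: Callable[[str], str]


class UpperModel:
    """Stand-in model that always asks for the 'upper' tool."""

    def complete(self, prompt: str) -> str:
        return "upper"


class ReverseModel:
    """A second stand-in model that always asks for the 'reverse' tool."""

    def complete(self, prompt: str) -> str:
        return "reverse"


# The tool registry is defined once and does not depend on any model.
TOOLS = {
    "upper": Tool("upper", str.upper),
    "reverse": Tool("reverse", lambda s: s[::-1]),
}


def run_step(model: Model, task: str) -> str:
    """Ask the model which tool to use, then run that tool on the task."""
    tool_name = model.complete(f"Pick a tool for: {task}")
    return TOOLS[tool_name].run(task)
```

Swapping `UpperModel()` for `ReverseModel()` changes the decision, but `TOOLS` and `run_step` stay untouched, which is the point of treating the model as one replaceable part.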

Microsoft: Control Plane for Agent Management

Central object: agent identity, policy, tenant.

Microsoft Foundry Agent Service offers three agent types: Prompt Agents (no-code), Workflow Agents (visual/YAML orchestration), and Hosted Agents (container-based custom code). Microsoft Entra Agent ID applies Zero Trust security to agents with RBAC, content filters, and VNet isolation.

The key challenge is agent registry, identity, permission, and audit. When many agents exist, the question isn’t just “do they work?” but “who controls them?” Agents need accounts, permissions, logs, and accountability.

Microsoft’s core message: an agent is simultaneously software and an “actor.” Agent engineering will become tightly coupled with DevOps, SecOps, and IAM going forward.

ServiceNow: Stateful Process Orchestrator

Central object: ticket, workflow, incident, approval.

These agents operate around business objects, not chat windows. State transitions (received → analyzing → processing → approved → completed), role handoffs between humans, systems, and agents, SLA awareness, and event-driven execution are the core.

ServiceNow-style agents are closer to process agents than task agents. The focus isn’t “one intelligent response” but maintaining state over long periods and driving processes to completion.

Salesforce: Customer Agent on Domain Objects

Central object: customer, account, case, opportunity.

Reasoning is grounded in CRM objects like Account, Lead, and Case, and the agent performs business actions like follow-ups, quotes, and case updates.

The lesson: agent memory shouldn’t be simple conversation history — it should be business objects and relationship graphs. The AI-Ready Data conditions from blog #13 apply directly here — agents need structured domain objects to function well.
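As a rough illustration of memory held as business objects rather than chat history, consider the sketch below. The `Account` and `Case` classes are simplified, hypothetical stand-ins and do not reflect Salesforce's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    case_id: str
    status: str


@dataclass
class Account:
    account_id: str
    name: str
    # Relationship edge: an account owns its cases.
    cases: list[Case] = field(default_factory=list)


def open_cases(account: Account) -> list[str]:
    """Build agent context from object state instead of conversation history."""
    return [c.case_id for c in account.cases if c.status == "open"]


acme = Account("A1", "Acme", [Case("C1", "open"), Case("C2", "closed")])
```

Here the agent's "memory" of Acme is whatever the objects currently say, so `open_cases(acme)` stays correct even if the case was closed by a human outside any conversation.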

Developer Insight

The central object is the starting point for all design. Seven companies each placed different objects at the center, and from that, context, actions, state, permissions, and evaluation all follow. When designing a new agent system, the first question isn’t “which model should we use?” — it’s “what is our agent’s central object?”

The “Central Object” Determines Everything

Here’s the seven-company summary in one table:

Company	Central Object	Engineering Edge
Google	User, calendar, docs, search results	Multi-app context and personalization
Anthropic/Claude	Specialized work docs, analysis tasks	High-trust vertical workflow
Cursor	Codebase, diff, branch, PR	Code change execution and verification loop
OpenAI	Tool, runtime, agent step	General-purpose agent runtime and tool abstraction
Microsoft	Agent identity, policy, tenant	Agent governance and control plane
ServiceNow	Ticket, workflow, incident	Stateful business process orchestration
Salesforce	Customer, account, case	Domain object-grounded CRM agent

Once the central object is defined, everything else follows:

Design Question	Example Answers (Depend on Central Object)
Where does context come from?	Repo, CRM, docs, email, tickets
What are the actions?	Create PR, send email, update case, call API
How does state change?	pending → running → blocked → approved → done
Where do permissions attach?	User, agent, tool, object
How is evaluation done?	Tests, grounding accuracy, SLA, revenue, CSAT
Where is human review needed?	Before sending, before deploying, before customer impact
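One way to keep these questions honest is to write the answers down as data. The sketch below is only an illustration: `AgentDesign` and its field names are hypothetical, and the example values are picked from the table above for a coding agent.

```python
from dataclasses import dataclass, field, fields


@dataclass
class AgentDesign:
    """Answers to the six design questions for one central object."""

    central_object: str
    context_sources: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)
    states: list[str] = field(default_factory=list)
    permission_scopes: list[str] = field(default_factory=list)
    evaluation: list[str] = field(default_factory=list)
    human_review_points: list[str] = field(default_factory=list)

    def unanswered(self) -> list[str]:
        """Return the names of design questions that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


coding_agent = AgentDesign(
    central_object="codebase",
    context_sources=["repo"],
    actions=["create PR"],
    states=["pending", "running", "blocked", "approved", "done"],
    permission_scopes=["agent", "tool"],
    evaluation=["tests"],
    # human_review_points deliberately left empty to show the gap
)
```

Calling `coding_agent.unanswered()` lists `human_review_points`, a reminder that the design is incomplete until every question has an answer.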

6 Essential Design Patterns

As Anthropic’s “Building Effective Agents” emphasizes, successful agent implementations come from simple, composable patterns — not complex frameworks. Here are the six patterns every agent engineer needs.

Pattern 1. Plan → Act → Observe → Verify

The most fundamental agent loop:

Goal → Plan → Tool Call → Observation → Verification → Next Step or Done

The key: Verify must always follow Act. Don’t trust tool call results blindly. As Armin Ronacher pointed out, testing and evaluation are the hardest problems in agent design — requiring integrated observability, not isolated unit tests.
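The loop can be sketched in a few lines. This is a simplified illustration, not a reference implementation: the plan is a fixed list rather than model output, and `run_plan`, `flaky_count`, and the retry policy are all assumptions made for the example.

```python
from typing import Callable

Step = Callable[[], int]        # a "tool call" that returns an observation
Check = Callable[[int], bool]   # a verifier for that observation


def run_plan(plan: list[tuple[str, Step, Check]], max_retries: int = 1) -> list[str]:
    """Execute each planned step and verify its observation before moving on."""
    log: list[str] = []
    for name, act, verify in plan:
        for attempt in range(max_retries + 1):
            observation = act()          # Act + Observe
            if verify(observation):      # Verify always follows Act
                log.append(f"{name}: ok")
                break
            log.append(f"{name}: verification failed (attempt {attempt + 1})")
        else:
            # Retries exhausted: stop instead of building on a bad result.
            log.append(f"{name}: giving up")
            return log
    log.append("done")
    return log


calls = {"n": 0}


def flaky_count() -> int:
    """Hypothetical tool that returns a wrong value on the first call only."""
    calls["n"] += 1
    return 0 if calls["n"] == 1 else 3


demo_log = run_plan([("count rows", flaky_count, lambda rows: rows > 0)])
```

In the demo, the first observation fails verification, the retry succeeds, and only then does the loop proceed to "done".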

Pattern 2. Human-gated Action

High-risk actions must pass through human approval. This isn’t theoretical — in December 2025, an autonomous agent sent spam to Rob Pike. In May 2026, an agent ordered 120 eggs for a cafe with no kitchen.

Draft Action → Explain reason → Show evidence → Ask approval → Execute → Log result

Actions requiring approval: email sends (external communication), payments (financial impact), DB updates (data mutation), code merge/deploy (service impact), permission changes (security impact).
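A minimal sketch of such a gate follows. The action kinds mirror the list above, but the identifiers (`HIGH_RISK`, `DraftAction`, `execute_gated`) and the callback-based approval are assumptions for illustration; a real system would route approval through a UI or ticketing flow.

```python
from dataclasses import dataclass
from typing import Callable

# Action kinds taken from the list above; the identifier spellings are illustrative.
HIGH_RISK = {"send_email", "payment", "db_update", "deploy", "permission_change"}


@dataclass
class DraftAction:
    kind: str
    reason: str     # shown to the approver: why the agent wants to do this
    evidence: str   # shown to the approver: what the decision is based on


def execute_gated(
    action: DraftAction,
    approve: Callable[[DraftAction], bool],
    execute: Callable[[DraftAction], str],
    audit_log: list[str],
) -> str:
    """Run low-risk actions directly; ask a human before high-risk ones."""
    if action.kind in HIGH_RISK and not approve(action):
        audit_log.append(f"rejected:{action.kind}")
        return "rejected"
    result = execute(action)
    audit_log.append(f"executed:{action.kind}")
    return result
```

Note that both outcomes are logged: a rejected action is as much a part of the audit trail as an executed one.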

Pattern 3. Tool Permission Matrix

Separate what the model can do from what the agent is allowed to do.

Agent Type	Read	Write	External Risk
Research Agent	Docs, web	None	Low
Coding Agent	Repo read	Branch write	Medium
Support Agent	Customer/policy read	Ticket draft	High
Admin Agent	Config read	Permission changes	Very high
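The matrix above can be expressed as a deny-by-default lookup. This is a sketch under assumptions: the `PERMISSIONS` keys and resource names are shorthand for the table rows, and `is_allowed` is a hypothetical helper, not part of any framework.

```python
# Rows of the matrix above, written as data. Names are illustrative shorthand.
PERMISSIONS = {
    "research": {"read": {"docs", "web"}, "write": set()},
    "coding": {"read": {"repo"}, "write": {"branch"}},
    "support": {"read": {"customer", "policy"}, "write": {"ticket_draft"}},
    "admin": {"read": {"config"}, "write": {"permissions"}},
}


def is_allowed(agent_type: str, operation: str, resource: str) -> bool:
    """Deny by default: unknown agents, operations, or resources are refused."""
    return resource in PERMISSIONS.get(agent_type, {}).get(operation, set())
```

The check runs outside the model: even if the model proposes a write, `is_allowed("research", "write", "docs")` is false and the call never reaches the tool.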

Pattern 4. Agent State Machine

Agent work must be managed as a state machine — enabling retry, abort, resume, and audit.

CREATED → PLANNING → WAITING_FOR_TOOL → RUNNING → WAITING_FOR_APPROVAL → COMPLETED

On failure: RUNNING → FAILED → RETRYING → ESCALATED

Without state, nobody knows “how far it got and why it stopped.” LangChain’s ADLC (Agent Development Lifecycle) framework also puts state tracking at the core of its Build→Test→Deploy→Monitor structure.
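Here is one possible encoding of the transitions above. The state names come from the text; the `RETRYING → RUNNING` edge and the `AgentTask` class are assumptions added so that a retry can actually resume work.

```python
from enum import Enum, auto


class State(Enum):
    CREATED = auto()
    PLANNING = auto()
    WAITING_FOR_TOOL = auto()
    RUNNING = auto()
    WAITING_FOR_APPROVAL = auto()
    COMPLETED = auto()
    FAILED = auto()
    RETRYING = auto()
    ESCALATED = auto()


# Happy path and failure path from the text.
# RETRYING -> RUNNING is an assumption made for this sketch.
TRANSITIONS = {
    State.CREATED: {State.PLANNING},
    State.PLANNING: {State.WAITING_FOR_TOOL},
    State.WAITING_FOR_TOOL: {State.RUNNING},
    State.RUNNING: {State.WAITING_FOR_APPROVAL, State.FAILED},
    State.WAITING_FOR_APPROVAL: {State.COMPLETED},
    State.FAILED: {State.RETRYING},
    State.RETRYING: {State.RUNNING, State.ESCALATED},
}


class AgentTask:
    """Tracks the current state and the full history for audit and resume."""

    def __init__(self) -> None:
        self.state = State.CREATED
        self.history = [State.CREATED]

    def advance(self, new_state: State) -> None:
        """Move to new_state, refusing any transition not listed above."""
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        self.history.append(new_state)
```

Because every move goes through `advance`, `history` answers both questions from the text: how far the task got, and at which transition it stopped.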

Pattern 5. Evidence-first Response

For high-trust agents, the “generate answer first, attach evidence later” approach is dangerous.

Retrieve evidence → Rank evidence → Extract facts → Generate answer → Check answer against evidence

Without this structure, there is no checkpoint at which a hallucinated claim can be caught before it reaches the user. As covered in blog #13, evidence quality ultimately depends on how AI-Ready your data is.
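A toy version of the pipeline is sketched below. It is deliberately crude: retrieval and ranking use word overlap, the fact-extraction step is skipped, generation is a caller-supplied function, and the grounding check only tests that every answer token appears in the passage. All function names and the `CORPUS` contents are invented for the example.

```python
from typing import Callable, Optional

CORPUS = {
    "policy": "refunds are allowed within 30 days",
    "faq": "shipping takes 5 days",
}


def retrieve(question: str, corpus: dict[str, str]) -> list[tuple[str, str]]:
    """Retrieve and rank passages by shared words with the question (toy retriever)."""
    words = set(question.lower().split())
    scored = [
        (len(words & set(text.lower().split())), doc_id, text)
        for doc_id, text in corpus.items()
    ]
    return [(d, t) for score, d, t in sorted(scored, reverse=True) if score > 0]


def is_supported(answer: str, passage: str) -> bool:
    """Toy grounding check: every answer token must occur in the passage."""
    return set(answer.lower().split()) <= set(passage.lower().split())


def answer_with_evidence(
    question: str,
    corpus: dict[str, str],
    generate: Callable[[str, str], str],
) -> dict[str, object]:
    """Retrieve first, generate second, and release the answer only if it checks out."""
    evidence = retrieve(question, corpus)
    if not evidence:
        return {"answer": None, "sources": []}
    doc_id, passage = evidence[0]
    draft: Optional[str] = generate(question, passage)
    if draft is None or not is_supported(draft, passage):
        return {"answer": None, "sources": [doc_id]}
    return {"answer": draft, "sources": [doc_id]}
```

If the generator drifts from the passage (say, "90 days" instead of "30 days"), the check fails and no answer is released, while the source is still reported for human follow-up.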

Pattern 6. Evaluator as First-class Component

Agents must have evaluators:

Evaluator	What It Measures
Grounding evaluator	Evidence-answer alignment
Tool result evaluator	Tool call result interpretation accuracy
Safety evaluator	Prohibited actions, sensitive data exposure
Task completion evaluator	Goal achievement
Cost evaluator	Tokens, API calls, execution time
Regression evaluator	Whether new versions break existing performance

Going forward, agent system quality will be determined by evaluation harnesses, not prompts.
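A small harness along these lines might look like the following sketch. The `RunRecord` fields, the three checks, and the 1000-token budget are all illustrative assumptions; real grounding and safety evaluators would be far more involved.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunRecord:
    """What one agent run produced (fields are illustrative)."""

    answer: str
    evidence: str
    tokens_used: int
    goal_met: bool


Evaluator = Callable[[RunRecord], bool]

EVALUATORS: dict[str, Evaluator] = {
    "grounding": lambda r: r.answer in r.evidence,   # crude substring check
    "task_completion": lambda r: r.goal_met,
    "cost": lambda r: r.tokens_used <= 1000,         # arbitrary example budget
}


def evaluate(record: RunRecord) -> dict[str, bool]:
    """Run every registered evaluator against one run."""
    return {name: check(record) for name, check in EVALUATORS.items()}


def regressions(old: dict[str, bool], new: dict[str, bool]) -> list[str]:
    """Names of checks that passed for the old version and fail for the new one."""
    return sorted(name for name in old if old[name] and not new.get(name, False))
```

Running `evaluate` on the same task before and after a change and passing both results to `regressions` gives a basic regression evaluator built from the other evaluators.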

Developer Insight

Understanding why a pattern is needed matters more than memorizing it. The common thread across all six patterns: LLMs are uncertain engines, so you need structures that isolate, verify, and control that uncertainty. For deeper pattern analysis, see the Agentic AI Patterns Guide.

The Agent Engineer’s Tech Stack

Building an agent system requires seven layers:

┌─────────────────────────────────────────────┐
│  Application Layer                          │
│  Chat UI · IDE UI · Workflow UI · Admin     │
├─────────────────────────────────────────────┤
│  Agent Layer                                │
│  Planner · Router · Executor · Verifier     │
│  Memory Manager                             │
├─────────────────────────────────────────────┤
│  Tool Layer                                 │
│  APIs · DB · Browser · Code Runner          │
│  File System · SaaS Connectors · MCP        │
├─────────────────────────────────────────────┤
│  Context Layer                              │
│  RAG · Search · Vector DB · Metadata Store  │
│  Knowledge Graph                            │
├─────────────────────────────────────────────┤
│  Control Layer                              │
│  Policy · Permission · Approval             │
│  Audit Log · Rate Limit                     │
├─────────────────────────────────────────────┤
│  Evaluation Layer                           │
│  Test Sets · Simulations · LLM Judge        │
│  Rule-based Checks · Replay                 │
├─────────────────────────────────────────────┤
│  Observability Layer                        │
│  Trace · Token Cost · Latency               │
│  Tool Error · User Feedback                 │
└─────────────────────────────────────────────┘

You need to see this entire picture to call yourself an “agent engineer.” Most people stop at the Agent and Tool layers, but production agents cannot operate without Control, Evaluation, and Observability.

The Coming Axes of Differentiation

Competition among agent companies won’t be decided by model performance alone. The real differentiation happens on these axes:

Axis	Core Question
Context depth	How deep and accurate is the context retrieved?
Tool reach	How many systems can be safely operated?
Statefulness	How reliably can long-running tasks execute?
Trust	Are there evidence, verification, approval, and logs?
UX	Does the user feel in control?
Governance	Can the organization manage its agents?
Evaluation	Can quality be continuously measured?
Distribution	Is the agent already embedded where users are?

A “good agent” isn’t just a smart model:

Good agent = Good model × Good context × Safe tool execution × Verifiable state management × User-trusted UX

Learning Priorities — Where Most People Stop

For growing as an agent engineer, this order is effective:

1. Tool calling / function calling structure
2. RAG and context engineering
3. Agent state machine ★
4. Workflow orchestration
5. Sandboxed execution
6. Human approval UX
7. Permission / policy / audit ★
8. Evaluation harness ★
9. Observability / tracing
10. Multi-agent coordination

Items 3, 7, and 8 are critical. Most people stop at 1 and 2. But production-grade agents cannot operate without state management, permission management, and evaluation systems.

Connecting this to the developer competencies for the AX era from blog #12: tool calling and RAG are “fundamentals,” while state machines and evaluation are “differentiators.” An agent engineer’s real value comes from the ability to design execution systems, not from calling models well.

For foundations, see the AI Agent Guide. For pattern deep-dives, see the Agentic AI Patterns Guide.

Conclusion: It’s Execution System Engineering, Not LLM App Development

The direction of agent engineering is clear:

LLM app development → Execution system engineering that includes LLMs

Future competitiveness will come from how well you design state, tools, permissions, verification, and observability — not from how well you call models. The AAIF launch, A2A and MCP standardization, every company’s investment in agent governance — it all points the same way.

One question for agent engineers:

Are you still picking models, or are you designing systems?

References

Agentic AI Foundation (AAIF) Launch — Under Linux Foundation, 8 founding members
Google A2A Protocol v1.0.0 — 50+ partners, agent communication standard
Google Agent Development Kit (ADK) 2.0 — Graph-based workflow runtime
Anthropic, “Building Effective Agents” — Simple, composable patterns
OpenAI Agents SDK — Lightweight multi-LLM framework
Microsoft Foundry Agent Service — 3 agent types, Entra Agent ID
Matt Webb, “Context Plumbing” — Agent essence is information flow infrastructure
Armin Ronacher, “Agent Design is Still Hard” — Testing/evaluation is the hardest problem
LangChain, Agent Development Lifecycle (ADLC) — Build→Test→Deploy→Monitor
Cursor 3 / Composer 2.5 — 2026 Gartner Leader
