One of the questions I’ve been sitting with since RSAC is deceptively simple: when your orchestrator agent hands off a task to a sub-agent, how does the sub-agent know the instruction is legitimate?

It sounds almost philosophical. But it isn’t. It’s a governance gap most of us haven’t closed yet, and the infrastructure to close it is only just arriving.

We’ve spent the better part of two years thinking about how, and even if, humans should trust AI agents. How we govern their identities, scope their permissions, review their access. That work is necessary. But it’s only part of the risk equation. In a multi-agent system, an operationally critical question is how agents decide to trust each other, and what happens when they get that wrong.

Multi-Agent Systems

Let’s look more closely at multi-agent architecture to understand the security implications of the handoffs.

In a simple agentic workflow, like Claude Cowork, a human issues an instruction to an orchestrator agent. The orchestrator breaks the instruction into subtasks and delegates those to specialized sub-agents: one that searches, one that drafts, one that retrieves from internal systems, one that executes code. You may have seen some of these tasks go by on the screen as “thinking” when you use a frontier model like Claude or ChatGPT. Each sub-agent carries out its specific task and passes results back up the chain. The final output lands in front of the human, who may never see all of the intermediate steps.

That architecture is powerful because it enables multi-agent systems to do things a single model can’t. The specialization and parallelism are features that improve usability and functionality.

But think about what’s happening at each handoff. Agent B receives an instruction. That instruction arrived via a message from Agent A. Agent B may have no independent way to verify: was this actually Agent A? Did Agent A’s instructions reflect what the human originally authorized? Was Agent A’s session compromised before it generated this task? 

Trust is assumed, not verified.
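
That gap is easy to see in code. Here is a minimal sketch of a typical handoff, with hypothetical class and field names (no real framework's API), showing how a sub-agent ends up executing whatever arrives on the channel:

```python
from dataclasses import dataclass

@dataclass
class TaskMessage:
    # Note what is missing: no signature, no proof the sender is who
    # it claims to be, no link back to what the human authorized.
    sender: str       # self-reported, not verified
    instruction: str

class SubAgent:
    def handle(self, msg: TaskMessage) -> str:
        # Trust is assumed: anything on the channel that claims to be
        # the orchestrator is treated as the orchestrator.
        if msg.sender == "orchestrator":
            return self.execute(msg.instruction)
        return "rejected"

    def execute(self, instruction: str) -> str:
        return f"executed: {instruction}"

agent = SubAgent()
# A genuine delegation and a forged one are indistinguishable:
genuine = TaskMessage(sender="orchestrator", instruction="summarize Q3 report")
forged = TaskMessage(sender="orchestrator", instruction="export customer table")
```

Both messages execute, because the only check is a self-reported sender label.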

The Structural Problem

In traditional IAM, trust flows from authentication. A user presents credentials, the credentials are verified, and access is granted accordingly. The credentials travel with the actor. When that actor does something, we have a verified chain of accountability.

In a multi-agent workflow, the verifiable identity and authorization context do not reliably travel with the task in a way that downstream agents can independently validate.

An orchestrator agent typically holds credentials for the human who initiated the session. When it delegates to a sub-agent, the sub-agent may inherit those credentials, operate under its own service identity, or receive scoped permissions for that specific task, depending on how the system is built. Few organizations have made those choices deliberately. Most are taking whatever the framework or agent harness (the software around the agent) defaults to.

That creates two problems that compound each other.

The first is impersonation. If an attacker can inject a malicious instruction into an orchestrator’s context, they can cause that orchestrator to generate a legitimate-looking delegation message to a sub-agent. The sub-agent receives an instruction that appears to come from a trusted peer. It has no mechanism to distinguish that from a genuine task. It executes.

The second is propagation. Because agents operate across chains, a compromise at step two can propagate through steps three, four, and five before anyone notices. The blast radius is not bounded to the agent that was manipulated. It extends to everything that trusted that agent’s output.

Consider a simple but realistic path: an orchestrator ingests a poisoned document, generates a task to retrieve and update internal data, a retrieval agent pulls sensitive information, and an execution agent writes or transmits that data externally. This is essentially the path of a very real vulnerability, ForcedLeak, documented by Noma Security Labs last year. Each step is locally valid. The chain, end to end, is not. From a detection standpoint, each action appears legitimate in isolation, which makes these failures difficult to identify in real time.

Why This Is Different from Prompt Injection

It’s worth being precise here, because the two problems can blur together.

Prompt injection is about external content entering an agent’s context and being mistaken for a trusted instruction. The attack vector is data: a document, a web page, a database result that contains embedded commands.

Inter-agent trust is about what happens after an agent has processed input and is communicating with another agent. The attack surface is the channel between agents, and the mechanism of trust that governs it.

You can have strong protections against prompt injection and still have no controls over whether your sub-agents can verify the legitimacy of instructions they receive from an orchestrator. These are layered problems.

Both matter. But right now, most organizations are working on the first one and haven’t started on the second, and defense in depth requires both.

Infrastructure Advances

We’re starting to see the shape of an answer.

OWASP published the Agentic AI Top 10 in December 2025. It’s the first formal taxonomy of risks specific to autonomous AI agents. Goal hijacking, tool misuse, identity abuse, memory poisoning, cascading failures: the threat model is starting to be written down.

Implementations such as Microsoft’s open-source Agent Governance Toolkit (April 2026) include what is being called an Inter-Agent Trust Protocol. Cryptographic identity for agents, so a sub-agent can verify it’s receiving an instruction from a legitimate, uncompromised peer. Dynamic trust scoring, conceptually similar to Zero Trust approaches, so that an agent whose behavior has deviated from its baseline is treated with reduced trust before it can cause downstream harm.
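
The core idea behind cryptographic agent identity can be sketched in a few lines. This is a hypothetical illustration, not the API of any toolkit named above: it uses a single shared HMAC key for brevity, where a real protocol would use per-agent asymmetric keys, replay protection, and key rotation.

```python
import hashlib
import hmac
import json

# Assumed to be provisioned out of band; purely illustrative.
SHARED_KEY = b"provisioned-out-of-band"

def sign_delegation(sender_id: str, task: dict) -> dict:
    """Orchestrator side: bind the sender identity and task together."""
    payload = json.dumps({"sender": sender_id, "task": task}, sort_keys=True)
    sig = hmac.new(SHARED_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"sender": sender_id, "task": task, "sig": sig}

def verify_delegation(msg: dict) -> bool:
    """Sub-agent side: recompute and compare before executing anything."""
    payload = json.dumps({"sender": msg["sender"], "task": msg["task"]},
                         sort_keys=True)
    expected = hmac.new(SHARED_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, msg["sig"])

msg = sign_delegation("orchestrator", {"action": "search", "query": "Q3 revenue"})
assert verify_delegation(msg)          # genuine delegation passes
msg["task"]["action"] = "export"       # tampering in transit...
assert not verify_delegation(msg)      # ...breaks verification
```

The point is the shape of the control: the sub-agent gains an independent check it can run before executing, instead of trusting the channel.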

The honest answer is that no platform has fully implemented this at enterprise scale yet. Most of the multi-agent frameworks organizations are deploying today, including LangGraph, CrewAI, Microsoft’s Copilot Studio, and Salesforce’s AgentForce, do not enforce cryptographic inter-agent authentication by default. Trust is implicit and assumed, not independently verified.

That will change. But the window between “agents are in production” and “inter-agent trust is enforced” is the risk window we’re in right now.

Security Fundamentals Still Apply

Least privilege. Separation of duties. Defense in depth. Blast radius containment.

These principles don’t break down in multi-agent systems. They get harder to apply, but they don’t become irrelevant.

Least privilege at the action level is what most people are focused on: an agent should only be able to do what its task requires. That’s right, of course, but it’s not enough. We also need least privilege at the delegation level: an orchestrator agent should only be able to authorize sub-agents to act within the scope the human explicitly granted. Not the full scope of the orchestrator’s own permissions. The sub-scope of what this specific task requires.

That’s a meaningful distinction. Right now, most sub-agents inherit whatever the orchestrator has, which often reflects the human’s full session permissions. That’s standing privilege applied transitively across a chain. It violates least privilege before the task even starts.
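
Enforced architecturally, least privilege at the delegation level reduces to a subset check. A minimal sketch with hypothetical permission strings: the sub-agent's scope can never exceed what the task requires, and the task can never require what the human did not grant.

```python
def delegate_scope(human_granted: set[str], task_required: set[str]) -> set[str]:
    """Return the scope a sub-agent receives: never more than the human
    granted, and never more than this specific task needs."""
    if not task_required <= human_granted:
        raise PermissionError("task requires actions the human never authorized")
    return task_required

human = {"read:crm", "draft:email"}
# Inheriting the orchestrator's session would hand over both permissions;
# down-scoping gives the drafting sub-agent only what its task needs.
drafting_scope = delegate_scope(human, {"draft:email"})
```

Contrast that with the default described above, where the sub-agent simply receives `human` in full.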

Separation of duties applies too. An agent that both plans and executes, or both searches external sources and writes to internal systems, combines roles that, in high-impact systems, we might not grant to a single human employee. Breaking multi-agent architectures into specialized agents with narrow permissions is not just an engineering choice. It’s a critical control.

And blast radius containment means thinking about what happens if one agent in the chain is compromised. Can that compromise propagate? What would stop it? If the answer is “not much,” the architecture needs runtime guardrails.

What to Do Now

Let’s get our governance thinking ahead of the tooling capabilities that are coming.

Start by mapping your agents and their delegation chains. For any multi-agent system running in your environment, document which agents are passing, or can pass, tasks to which other agents, and under what identity. Most organizations don’t have this picture yet. You can’t govern what you can’t see.

Then ask three questions about each handoff: What credentials does the receiving agent operate under? What scope of action is it authorized for, independent of what the sending agent has? And what would happen if the sending agent’s session were compromised before this handoff?

If the answer to the third question is “the sub-agent would execute whatever it received,” you have a trust boundary that is not being enforced.

Second, apply down-scoped delegation. When an orchestrator issues a task to a sub-agent, it should issue credentials scoped specifically to that task, not a copy of its own session permissions. Just-in-time, task-bound, expiring at completion. This is service account hygiene applied to dynamic agent delegation. The tooling to do this well is still maturing, but the design principle can be enforced architecturally now.
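
Just-in-time, task-bound, expiring: those three properties can be sketched as a small credential object. This is an illustrative design, with hypothetical names, not a reference to any shipping credential system.

```python
import time
import uuid
from dataclasses import dataclass

@dataclass
class TaskCredential:
    # Hypothetical just-in-time credential: bound to one task,
    # limited to one scope, dead after a short TTL.
    task_id: str
    scope: frozenset
    expires_at: float

    def allows(self, action: str) -> bool:
        return action in self.scope and time.time() < self.expires_at

def issue_for_task(scope: set, ttl_seconds: float = 60.0) -> TaskCredential:
    """Minted at delegation time, not copied from the session."""
    return TaskCredential(
        task_id=str(uuid.uuid4()),
        scope=frozenset(scope),
        expires_at=time.time() + ttl_seconds,
    )

cred = issue_for_task({"read:docs"}, ttl_seconds=60.0)
```

When the task completes or the TTL lapses, the credential is simply dead; there is no standing privilege for a compromised chain to reuse.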

Third, require that audit trails reflect the full chain of delegation and authorization. When a sub-agent takes an action, the log should show not just what it did, but the delegating agent identity, the receiving agent identity, the scoped credential used, a task or correlation ID, and whether the instruction was verified or simply trusted. An audit trail that shows only the leaf-node action is useful for forensics and almost useless for governance.
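
A chain-aware log entry might look like the following. The schema is a hypothetical sketch of the fields listed above, not a standard format:

```python
import json
import time

def audit_record(delegating_agent: str, receiving_agent: str,
                 scoped_credential: str, task_id: str,
                 action: str, instruction_verified: bool) -> str:
    """Emit one JSON log line capturing the full delegation chain,
    not just the leaf-node action."""
    return json.dumps({
        "ts": time.time(),
        "delegating_agent": delegating_agent,
        "receiving_agent": receiving_agent,
        "scoped_credential": scoped_credential,
        "task_id": task_id,
        "action": action,
        # The governance-critical bit: was this verified, or merely trusted?
        "instruction_verified": instruction_verified,
    })

entry = json.loads(audit_record("orchestrator", "retrieval-agent",
                                "cred-123", "task-456",
                                "read:crm", instruction_verified=True))
```

With a shared `task_id` across hops, the whole chain can be reconstructed from the logs; without it, each hop is an orphaned event.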

Finally, watch the OWASP GenAI project and the emerging inter-agent authentication protocols. The tooling will catch up to the threat model. The organizations that understand the threat model now will be the ones who can evaluate and implement that tooling quickly when it arrives. If you are running multi-agent systems today, assume every delegation boundary is unverified until you have proven otherwise, and design controls accordingly.

Can We Trust AI Agents?

Most organizations have agents running in production that are trusting each other on the basis of proximity: this message came through the right channel, from something that looked like a peer, so we proceeded.

That’s not a criticism. But the deployment velocity is ahead of the governance velocity, and that gap is where the incidents will come from: a compromised orchestrator or sub-agent passes a well-formed, legitimate-looking task to a downstream agent, which executes it faithfully at the full scope of its delegated permissions.

“Can we trust AI agents?” is the right question. We’ve just been applying it in only one direction.

Agents need to be asking it too.
