Back to Blog

Why the Rule of Two Can’t Protect Your Agents: MCP Servers, Agentic Risk, and the Framework That Should Replace It

Gal Moyal

March 17, 2026

The industry’s most popular agentic AI risk framework is already broken. At RSAC 2026, we’re presenting what comes next.

In October 2025, Meta published the “Agents Rule of Two,” a security framework inspired by Simon Willison’s concept of the “lethal trifecta.” The premise is straightforward: an AI agent becomes dangerous when it simultaneously satisfies three properties: (a) processing untrusted inputs, (b) accessing sensitive data or changing state, and (c) communicating externally. Limit the agent to any two of the three, and you’ve deterministically reduced the highest-impact consequences of prompt injection.

It was a clean, intuitive mental model that security teams latched onto. Unfortunately it is fundamentally broken, and not because it is wrong in principle. Constraining agent capabilities is always sound advice. But the threat landscape has moved faster than the framework, and real-world incidents have exposed fundamental gaps in its logic. If your agentic security strategy starts and ends with the Rule of Two, you’re building on a foundation that can’t hold.

At RSAC 2026, Noma Security researchers Sasi Levi and Gal Moyal will present a new framework for agentic risk, one built not around the sources of risk but around the amplifiers of it.

The limitations of the rule of two

The Rule of Two tells us that satisfying only two of the three conditions yields an acceptable risk posture, but in the real world this doesn’t work.

Consider just (a) untrusted input and (b) the ability to change state, without any external communication channel at all. In July 2025, a hacker injected a destructive prompt into the Amazon Q extension for VS Code via a simple GitHub pull request. The embedded instruction told the AI agent to wipe the local filesystem and delete AWS cloud resources. No exfiltration. No external communication. Just untrusted input meeting destructive capability. Two out of three, and the result was a potential system wipe affecting every developer who installed the compromised version.

Now consider just (b) alone. That same month, Replit’s AI coding agent deleted an entire production database containing over 1,200 executive records during an active code freeze. The agent wasn’t processing untrusted external input or exfiltrating data. It simply hallucinated, panicked, and executed destructive commands it should never have had permission to run. With no attacker involvement, there was complete data loss.

Figure 1: Agents Rule of Two: A Practical Approach to AI Agent Security, Meta, October 2025

Autonomy is the missing variable

The Rule of Two is silent on what may be the single most important factor in agentic risk: how much freedom the agent has to act without human oversight.

An agent that satisfies all three conditions but requires explicit human approval before every sensitive action may actually be safer than an agent with only two properties running in full YOLO mode. The Rule of Two can’t capture this distinction because it treats agent architecture as a static set of properties rather than a dynamic system with tunable controls.

Untrusted input and sensitive data aren’t always separate

In indirect prompt injection attacks, the very attack class the Rule of Two was designed to address, the untrusted input is often embedded within the sensitive data itself. A prompt injection payload hidden inside a document retrieved via RAG doesn’t arrive through a separate “untrusted input” channel. It arrives as part of the data the agent was designed to trust.

As RAG architectures become the default pattern for enterprise agents, the clean separation between “untrusted input” and “sensitive data access” dissolves. The two properties collapse into one, and the Rule of Two’s combinatorial logic breaks down.

Noma’s own vulnerability research has demonstrated this repeatedly. In ForcedLeak, we showed how an indirect prompt injection embedded in Salesforce Agentforce’s trusted data could force the agent to exfiltrate sensitive CRM records. In GeminiJack, we exploited the same class of flaw in Google’s Gemini and Vertex AI platforms, using poisoned context to hijack agent behavior from within the data layer itself. In both cases, the “untrusted input” and “sensitive data” weren’t two separate properties, they were the same thing.

The threat surface is wider than most teams realize

And it extends beyond MCP servers into the opaque reasoning layer where Skills operate. MCP risk is largely observable: structured tool calls, logged parameters, auditable code. But Skills, the textual instruction sets that shape how agents reason about tasks, influence behavior at a layer most security tooling cannot see. When a Skill manipulates an agent’s reasoning, no tool call fires and no structured log is produced. The downstream action (an email sent, a file deleted, a command executed) looks entirely routine.

Most organizations are governing the observable half of their agent attack surface and leaving the invisible half entirely unaddressed. This asymmetry is at the heart of why input-focused frameworks like the Rule of Two are insufficient. They assume risk enters through channels you can monitor, yet increasingly, it doesn’t.

What should replace it: No Excessive CAP

This is the question Sasi and Gal will answer at RSAC, and it starts with a fundamental shift in framing.

Rather than a framework built around the sources of risk (which properties an agent has), we need one built around the amplifiers of risk (how much damage an agent can actually do when something goes wrong).

OWASP already laid the groundwork. LLM06:2025, “Excessive Agency,” identifies three root causes of dangerous agent behavior that together form a more actionable model. We’re calling it No Excessive CAP:

C, Excessive Capabilities. What tools and actions does this agent have access to? The more high-impact tools an agent can invoke, the larger its blast radius.
A, Excessive Autonomy. How much freedom does this agent have to act without human oversight? Can it execute multi-step workflows end-to-end without approval?
P, Excessive Permissions. What identity does this agent operate under? Is it running on a static, over-privileged service account, or a scoped, delegated identity that inherits only the permissions of the user it’s acting on behalf of?

Each dimension is a dial, and not binary. The further you turn each dial toward the maximum, the more risk you accumulate. All three at maximum? An agent that can do anything, decides everything on its own, and runs with God-mode credentials. That’s not a risk posture, it’s a countdown.

And the dials don’t operate in isolation, they multiply. If we go back to the Replit incident: the agent had excessive capabilities (full database write/delete access), excessive autonomy (no human checkpoint before destructive operations), and excessive permissions (credentials that allowed production database deletion). Turning down any one of those dials would have prevented the outcome. Turning down all three would have made it structurally impossible.

What we’ll cover at RSAC

Their session will present the full No Excessive CAP framework alongside original research that makes the model concrete:

Why the Rule of Two fails in production. Real-world incidents that break the framework’s assumptions, including cases where a single property was enough to cause catastrophic damage.
Original vulnerability research. A deep dive into Noma’s ForcedLeak (Salesforce Agentforce) and GeminiJack (Google Gemini/Vertex AI) disclosures, showing how indirect prompt injection collapses the assumptions behind existing frameworks, and what these attacks reveal about where the CAP dials fail first.
The No Excessive CAP model in depth. How to assess and score your agent deployments across all three dimensions, including how they compound.

Add to your RSAC agenda

The bottom line

The Rule of Two was a reasonable starting point for a world where we were just beginning to think about agentic risk. But we’ve now moved past that world. Real incidents have shown that two-out-of-three can still be devastating, that autonomy is the variable that matters most, and that the clean separation between untrusted input and sensitive data is dissolving.

Agentic AI security requires a framework that’s continuous, controllable, and grounded in what teams can actually do. OWASP’s Excessive Agency definition gives us the foundation, and Excessive CAP gives us the language. Sasi and Gal will present the full framework and the research behind it at RSAC 2026.

5 min read

Back to Blog

Why the Rule of Two Can’t Protect Your Agents: MCP Servers, Agentic Risk, and the Framework That Should Replace It

The limitations of the rule of two

Autonomy is the missing variable

Untrusted input and sensitive data aren’t always separate

The threat surface is wider than most teams realize

What should replace it: No Excessive CAP

What we’ll cover at RSAC

The bottom line

Category:

Share this:

Noma x OWASP: The Ultimate Red Team Playbook

ContextCrush: The Context7 MCP Server Vulnerability Hiding in Plain Sight

AWS and Noma: Building the Foundation for Secure Enterprise AI

PLATFORM

SOLUTIONS

COMPANY

LEARN

Stay updated