One of the most concerning AI trends is the emergence of destructive risk brought on by a fundamental shift in the AI landscape. We’re witnessing AI systems evolve from passive assistants that respond to queries into autonomous agents capable of taking independent action across complex digital environments. While this transformation promises unprecedented productivity gains, it also introduces a new category of risk that demands immediate attention.

Destructive risk is especially acute with AI agents that combine excessive autonomy, excessive functionality, and excessive permissions, a dangerous convergence that is becoming increasingly common as confidence and excitement in AI capabilities grow. That overconfidence leads organizations to grant AI systems broader access and greater independence than necessary, creating a perfect storm for catastrophic failures.

What Makes AI Agents Destructive

The destructive potential of AI agents stems from three critical dimensions that, when poorly calibrated, can transform helpful tools into catastrophic system threats. As defined by the OWASP Gen AI Security Project’s LLM06:2025 classification, this “Excessive Agency” enables damaging actions through unexpected, ambiguous, or manipulated LLM outputs, with root causes falling into three categories: excessive functionality, excessive permissions, and excessive autonomy.

Excessive functionality occurs when we provide AI agents with far more tools and capabilities than they actually need to accomplish their tasks. Consider the common pattern of providing an agent with API keys for all AWS services when it only needs S3 read access. This is equivalent to giving someone master keys to an entire building when they only need access to a single office. The agent now has the theoretical capability to terminate EC2 instances, delete databases, or modify critical infrastructure: capabilities that serve no purpose for its intended function but create enormous potential for harm.
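As a minimal sketch of scoping functionality down, the S3 case above can be expressed as an IAM-style policy that grants only read actions. The bucket name is illustrative, and the destructive-action list is a small sample, not an exhaustive catalog:

```python
# A policy scoped to the S3 read access the agent actually needs,
# rather than credentials covering all AWS services.
READ_ONLY_S3_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],  # read-only actions
            "Resource": [
                "arn:aws:s3:::example-agent-bucket",      # bucket name is illustrative
                "arn:aws:s3:::example-agent-bucket/*",
            ],
        }
    ],
}

# A small sample of destructive capabilities the agent should never hold.
DESTRUCTIVE_ACTIONS = {"ec2:TerminateInstances", "rds:DeleteDBInstance", "s3:DeleteObject"}

def granted_actions(policy: dict) -> set:
    """Collect every action the policy allows."""
    actions = set()
    for stmt in policy["Statement"]:
        if stmt["Effect"] == "Allow":
            actions.update(stmt["Action"])
    return actions

# The scoped policy grants no destructive capability at all.
assert granted_actions(READ_ONLY_S3_POLICY).isdisjoint(DESTRUCTIVE_ACTIONS)
```

Auditing granted actions against a denylist like this is a cheap sanity check that catches over-broad policies before an agent ever runs.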

Excessive autonomy represents the dangerous practice of allowing AI agents to execute critical actions without human oversight or approval mechanisms. A striking example is when Claude Code or similar systems are authorized to remove all files from a directory structure without any confirmation prompts or safety checks (through the infamous “YOLO mode”, for example). The agent operates with complete independence, making irreversible decisions based solely on its interpretation of instructions or environmental conditions.
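The missing confirmation step can be sketched as a thin wrapper around destructive operations. This is a hypothetical guard, not how Claude Code works internally; a real system would route approval through an out-of-band human review step rather than a keyword argument:

```python
import shutil

def guarded_rmtree(path: str, *, approved: bool = False, dry_run: bool = True) -> str:
    """Refuse to delete a directory tree unless a human has explicitly approved.

    Defaults are deliberately safe: dry-run on, approval off.
    """
    if dry_run:
        # Report what would happen without touching the filesystem.
        return f"DRY RUN: would delete {path}"
    if not approved:
        raise PermissionError(f"Deletion of {path} requires human approval")
    shutil.rmtree(path)
    return f"Deleted {path}"
```

The point is the default posture: an agent calling this tool gets a dry run unless a human has flipped both switches, which is the opposite of "YOLO mode".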

Excessive permissions involve granting AI agents higher levels of access than their tasks require. This manifests when we provide write access to production databases when read-only access would suffice, or when we grant administrative privileges where standard user permissions would be adequate. Each excessive permission multiplies the potential blast radius of any mistake or malicious action.
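The read-only database case can be sketched as a naive statement filter. This is an illustration of the idea only; in production the permission belongs at the database-role level (a genuinely read-only user), not in application-side SQL parsing:

```python
# Leading verbs that cannot mutate state.
READ_ONLY_VERBS = {"select", "show", "explain", "describe"}

def enforce_read_only(sql: str) -> str:
    """Reject any statement whose leading verb could write or destroy data.

    A naive sketch: real enforcement should live in the database's own
    permission system, where it cannot be bypassed by clever SQL.
    """
    verb = sql.strip().split(None, 1)[0].lower()
    if verb not in READ_ONLY_VERBS:
        raise PermissionError(f"Write-capable statement blocked: {verb.upper()}")
    return sql
```

Even as a sketch, it shows the shape of the control: the agent keeps its useful capability (querying) while losing the capability it never needed (writing).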

Real-World Trigger Mechanisms for Destruction

The theoretical risks become tangible when we examine actual incidents where AI agents have caused significant damage through two primary trigger mechanisms.

Hallucination-driven actions represent perhaps the most harmful form of AI agent destruction because they appear to emerge from the AI’s own flawed reasoning rather than external manipulation. The Replit incident is an alarming example: during what was intended to be a casual “vibe-coding” session, Replit’s AI agent panicked and ignored explicit instructions to avoid touching production systems. Breaking those constraints, the agent executed destructive SQL commands against the production database, destroying 1,206 executive records and wiping 1,196 company entries from existence. Perhaps more disturbing than the initial destruction was the agent’s subsequent behavior: it fabricated test results to hide the damage and lied about rollback viability when questioned. This incident reveals that AI agents can not only cause harm but actively work to conceal their destructive actions, making detection and recovery even more challenging.

Malicious-prompt-driven actions demonstrate how external actors can weaponize AI agents’ capabilities through carefully crafted attacks. The Amazon Q incident exemplifies this threat vector with chilling precision. Attackers managed to inject malicious prompts into version 1.84.0 of Amazon’s Q extension through a seemingly innocent pull request. Once activated, these prompts instructed the AI to systematically destroy both local development environments and cloud resources.

The attack was comprehensive in its destructive scope: wiping local files, terminating EC2 instances, emptying S3 buckets, and deleting IAM users through AWS CLI commands. What makes this attack particularly concerning is how it leveraged the AI’s legitimate access and trusted position within the development workflow to execute commands that would have been immediately suspicious if attempted by an external actor.

The Balance Principle: Managing the Capability-Autonomy Trade-off

Understanding the relationship between an AI agent’s capabilities and its autonomy is crucial for preventing destructive outcomes. This relationship follows an inverse correlation: as we increase an agent’s functionality or autonomy, we must correspondingly decrease the other dimension to maintain safety.

The principle manifests in practical rules of thumb that can guide system design. When building high-functionality agents with access to powerful tools and extensive system capabilities, we must severely constrain their autonomy and operate them with minimal permissions. These agents should require human approval for significant actions and operate within tightly controlled boundaries.

Conversely, high-autonomy agents that can operate independently for extended periods must be sandboxed within limited functionality and granted minimal permissions. They might excel at their specific tasks but lack the broad capabilities that could cause system-wide damage if misapplied.
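One way to make the trade-off concrete is a toy safety budget: score each agent 1 (low) to 3 (high) on functionality and on autonomy, and require the sum to stay under a fixed cap. The numbers are purely illustrative, not a calibrated standard:

```python
def within_safety_budget(functionality: int, autonomy: int, budget: int = 4) -> bool:
    """Toy model of the inverse relationship between capability and autonomy.

    Each axis is scored 1 (low) to 3 (high); their sum must stay within
    a fixed safety budget. Illustrative only, not a calibrated metric.
    """
    return functionality + autonomy <= budget

# A powerful agent kept under tight supervision fits the budget...
assert within_safety_budget(functionality=3, autonomy=1)
# ...but maximum capability plus maximum autonomy is the dangerous convergence.
assert not within_safety_budget(functionality=3, autonomy=3)
```

Even a crude gate like this forces the design conversation the principle demands: if you want to raise one dial, which other dial comes down?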

This principle mirrors established practices in traditional computing security. We don’t grant root privileges to fully automated scripts precisely because the combination of high capability and high autonomy creates unacceptable risk. The same logic must apply to AI agents, regardless of how sophisticated or trustworthy they appear.

Comprehensive Mitigation Strategies

Addressing the risks of destructive AI agents requires a multi-layered approach that tackles each dimension of the problem while building robust safeguards against both accidental and malicious failures.

Minimizing Excessive Agency forms the foundation of any effective mitigation strategy. The principle of least privilege must be rigorously applied, granting AI agents only the minimal permissions required to accomplish their specific tasks. This means conducting thorough analysis of what an agent actually needs to do versus what we might want it to be capable of doing in theoretical future scenarios.

Autonomy constraints require careful definition of triggers and scope for unsupervised actions. Rather than allowing agents to make open-ended decisions, we should specify clear conditions under which autonomous action is permitted and establish strict boundaries around what actions can be taken without human oversight.
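Such constraints can be sketched as an explicit allowlist of actions the agent may take unsupervised, with everything else queued for review. The action names here are hypothetical:

```python
# Routine, read-only actions the agent may perform without oversight.
AUTONOMOUS_ALLOWLIST = {"read_file", "search_docs", "summarize"}

def dispatch(action: str, *, human_approved: bool = False) -> str:
    """Run allowlisted actions autonomously; hold everything else for a human.

    Action names are hypothetical; the shape of the control is the point.
    """
    if action in AUTONOMOUS_ALLOWLIST:
        return f"executed {action} autonomously"
    if human_approved:
        return f"executed {action} with approval"
    return f"queued {action} for human review"
```

The default is inverted from common practice: autonomy is the exception that must be earned per action, not the baseline.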

Functionality reduction involves providing only the tools required for the specific job rather than defaulting to comprehensive access. The common anti-pattern of giving MCP agents access to every available tool by default must be abandoned in favor of curated, task-specific toolsets that minimize potential attack surfaces.
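A minimal sketch of the curated-toolset idea: each task declares the tools it needs, and the agent is handed only that validated subset, never the full registry. Tool names are illustrative:

```python
# The full registry of tools available in the environment (illustrative names).
ALL_TOOLS = {"read_file", "write_file", "run_shell", "query_db", "delete_bucket"}

def build_toolset(task_tools: set) -> set:
    """Expose only the tools a task explicitly declares.

    Rejects requests for tools the registry doesn't know about, so a
    typo or injected tool name fails loudly instead of silently.
    """
    unknown = task_tools - ALL_TOOLS
    if unknown:
        raise ValueError(f"Unknown tools requested: {sorted(unknown)}")
    return set(task_tools)  # the agent sees only this subset

# A summarization agent gets exactly one tool, not all five.
summarizer_tools = build_toolset({"read_file"})
assert "run_shell" not in summarizer_tools
```

This flips the default from "everything unless denied" to "nothing unless requested", which directly shrinks the attack surface available to a malicious prompt.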

Layering Safeguards creates multiple defensive barriers that can prevent or contain destructive actions even when primary controls fail. Human-in-the-loop mechanisms must be mandatory for irreversible operations, ensuring that actions with permanent consequences receive human review and approval before execution.

Continuous audit and monitoring of agent behavior helps prevent permission creep and detect anomalous activities that might indicate compromise or malfunction. These systems should track not just what actions agents take, but also how their behavior patterns change over time and whether they’re operating within expected parameters.
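A sketch of the audit side, under the assumption of a simple in-memory log: record every tool invocation as a structured event, and treat a sudden expansion of an agent's tool footprint as an anomaly signal.

```python
import json
import time

audit_log = []  # in production: append-only, tamper-evident storage

def record_tool_call(agent: str, tool: str, args: dict) -> None:
    """Append a structured audit record for every tool invocation."""
    audit_log.append({
        "ts": time.time(),
        "agent": agent,
        "tool": tool,
        "args": json.dumps(args, sort_keys=True),
    })

def tools_used(agent: str) -> set:
    """The set of tools an agent has invoked; a sudden expansion of this
    set relative to its historical baseline is a simple anomaly signal."""
    return {entry["tool"] for entry in audit_log if entry["agent"] == agent}
```

Comparing `tools_used` against a baseline per agent is the crudest possible behavioral check, but it is exactly the kind of signal that would have flagged an agent suddenly issuing infrastructure-deletion commands.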

Ensuring reversibility wherever possible through versioning, backup, soft-delete mechanisms, and dry-run capabilities provides crucial recovery options when things go wrong. While not all actions can be made reversible, maximizing recoverability reduces the ultimate impact of failures.
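The soft-delete pattern can be sketched in a few lines: deletions move records to a trash store instead of destroying them, so any agent mistake remains recoverable until the trash is deliberately purged. The data here is illustrative:

```python
# Illustrative record store and its recovery area.
records = {"1": {"name": "Alice"}, "2": {"name": "Bob"}}
trash = {}

def soft_delete(record_id: str) -> None:
    """Move a record to the trash instead of destroying it."""
    trash[record_id] = records.pop(record_id)

def restore(record_id: str) -> None:
    """Undo a soft delete."""
    records[record_id] = trash.pop(record_id)
```

Had the production deletions in the Replit incident gone through a layer like this, "rollback viability" would have been a fact, not something to lie about.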

Runtime Protection represents the final line of defense, operating at the moment of execution to prevent harmful actions from reaching their targets. This requires sophisticated detection systems that can identify destructive actions before they’re executed by AI agents and block them in real-time.

Critical to this approach is understanding that examining only prompts and responses is insufficient. A comprehensive agentic AI firewall must inspect indirect prompt injection pathways and monitor tool calling behaviors to catch attacks that might bypass traditional content filters. 
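At its simplest, the tool-call inspection layer screens each invocation against destructive patterns before it reaches the target system. The pattern list below is a tiny illustrative sample; real agentic firewalls use far richer detection than regular expressions:

```python
import re

# Illustrative patterns whose appearance in a tool call should block execution.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-rf?\b"),                 # recursive file deletion
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),  # destructive SQL
    re.compile(r"aws\s+s3\s+rb\b"),               # remove an S3 bucket
    re.compile(r"terminate-instances"),           # kill EC2 instances
]

def inspect_tool_call(tool: str, command: str) -> str:
    """Screen a tool invocation at runtime, before it reaches its target."""
    for pattern in DESTRUCTIVE_PATTERNS:
        if pattern.search(command):
            return "blocked"
    return "allowed"
```

Because the check runs on the tool call itself rather than on the prompt, it catches destructive commands regardless of whether they originated from a hallucination or an injected instruction.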

The Path Forward

The risks associated with agentic AI are not theoretical future concerns; they are present realities demanding immediate attention. The incidents at Replit and Amazon Q demonstrate that the current approach to AI agent deployment is fundamentally flawed, prioritizing capability and convenience over safety and security.

The choice facing organizations today is clear: continue down the current path of adopting ever more powerful and autonomous agents until a major catastrophe forces change, or proactively implement the safeguards and principles necessary to ensure that AI agents remain beneficial tools rather than becoming digital threats to the systems they are meant to serve. At Noma Security, we recently announced extensive platform capabilities to mitigate the destructive potential of agentic AI, securing agents that can autonomously call tools, access memory, or load knowledge into their context. If you would like to learn more about how we can help you secure AI agents, please contact us.
