Back to Blog

Off the Map: When Autonomous Agents Go Rogue

Ori Abargil

April 28, 2026

The efficiency, limitless scale, and promise of the autonomous AI agent are undeniable. By granting a model the ability to use tools, browse the web, and execute code, organizations can achieve true productivity automation. However, as recent incidents demonstrate, giving an AI agent write access to production environments creates a new class of systemic risk. When an agent is empowered to modify infrastructure, the distance between a minor hallucination and a catastrophic data loss event shrinks to zero. In this specific case, it shrank to 9 seconds.

The Cursor/PocketOS Incident: 9 Seconds to Total Wipeout
Catastrophic Failures

A recent viral thread from Jer Crane on X highlights exactly how quickly unseen agentic access and compounding cascading failures can spiral. In this instance, an AI agent was tasked with a routine operation but ended up destroying production data. The failure was not just a technical glitch; it was a fundamental breakdown in the agent’s reasoning and its relationship with the environment it was permitted to touch.

When autonomous agents go “off the map,” they often follow a fairly simple chain of events. It all begins with a lack of context. The agent attempts to solve a problem by generating a command that seems logical in a sandbox, but is typically lethal in production. Because the agent has broad permissions (often broader than the user realizes), and works with elevated autonomy, there is no manual, or automatic gate to stop execution. In fact, the agent may never have realized it had caused damage until the system failed or a user asked it to confess. The entire episode revealed a rather chilling reality: the AI native guardrails we assume are there are often just suggestions, sales pitches, or promises that never shipped.

A recent viral incident involving the startup PocketOS highlights how quickly unseen agentic access can spiral out of control. An AI agent in Cursor (using the flagship model Claude Opus 4.6), performing a routine operation, was tasked with fixing a staging credential issue. Instead of resolving the challenge locally or accepting defeat, the agent kicked off a fast cycle of cascading and compounding failures:

Credential discovery: A Railway CLI token, originally created for managing custom domains, was sitting in a file unrelated to the agent’s task. The agent discovered it via a filesystem scan. Compounding the risks, Railway tokens have no operationor environment-level scoping, so a token created for one narrow purpose carries blanket access across the entire GraphQL API. ncluding allowing destructive operations like volumeDelete.
Environment Confusion: Agent failed to distinguish between the staging task and the token’s production access.
Goal Drift: The agent independently attempted to fix a problem by taking a destructive shortcut without considering all side effects.
Unapproved destructive command execution: The agent triggered a single-line GraphQL mutation that deleted the entire production database, along with its volume-level backups, without requesting for any human approval before making a destructive call.

The failure shows a fundamental breakdown in the agent’s reasoning. Although the agent later confessed in the logs that it had violated its own instructions, it neither proactively informed the user nor asked for consent before proceeding. The reality is that enterprise guardrails often serve only as suggestions rather than hard, unbreakable defenses.

Beyond Access Control

Traditional security frameworks prioritize Access Control through scoped permissions, least privilege, and token-based authentication. While these remain a solid foundation, the recent Cursor incident on PocketOS offers additional proof that static permissions cannot defend against rogue agents. Current models define what an agent is authorized to touch, but fail to account for intent, behavior, or the sensitivity of the surrounding environment.

The PocketOS breach highlights a critical vulnerability in how we define Agentic Identity. Security often focuses on the credentials explicitly assigned to an agent, ignoring the access implicitly available through its operational environment. In this instance, the agent utilized a valid token that was not intentionally provisioned to it; rather, the token existed within a file that the agent was able to discover and read.

Because the system recognized a legitimate credential, it permitted a catastrophic request. A model based strictly on permissions fails to recognize that a valid action can still be a destructive one. While traditional access control solutions can limit the eventual blast radius, it cannot prevent an agent from wielding its discovered power. Relying only on identity based security is insufficient when an agent can inherit authority from its context rather than its configuration.

“Big Tech” Agentic Security Mirage

It is a mistake to think autonomous agentic risks are limited to startups. Major agentic AI players have demonstrated that their built-in protections are often insufficient for enterprise-grade security.

Amazon Q Developer (2025): A prompt-injection attack targeted the VS Code extension. Attackers injected instructions into a codebase that the AI read as context. The AI followed those instructions to delete local filesystems and exfiltrate data. Only a syntax error in the attacker’s code prevented a mass disaster for over 1 million developers.
Microsoft Copilot Permission Explosion: Copilot creates permission sprawl, acting as a high-speed search engine for improperly secured sensitive documents long forgotten within the organization.
ChatGPT and the CISA Leak (2026): Even security experts responsible for US national security are not immune. Early 2026 saw a public disclosure that high-level officials at (CISA), the Cybersecurity and Infrastructure Security Agency, mistakenly uploaded sensitive contracting documents into the public ChatGPT instance.

The Missing Layer: Runtime Security Protection

The industry currently prioritizes AI capabilities scaling over AI security. To protect production data at scale, we must eliminate the gap between that distinction:

Access Control Layer

This layer provides the baseline for what an agent can access. Without it, agents have an unlimited blast radius. It is a non-negotiable foundation, but it is a static defense and often focused on the permissions given to the agent, but not necessarily on possible actions the agent can do to escalate their privilege (such as token discovery in the filesystem).

Runtime Security Layer

This layer observes execution in real-time. It analyzes what the agent is actually doing and why. Runtime security evaluates the scope of the action, the sensitivity of the data accessed, and the level of destructiveness before finalizing the command.

Feature	Access Control	Runtime Security
Primary Goal	Define Permissions	Enforce Intent
Logic	Binary (Yes/No)	Contextual (Safe/Unsafe)
Function	Limits Blast Radius	Monitors (and Blocks) Malicious Behavior

Why External AI Security is Required

An AI vendor has a primary incentive to make its agent as capable and frictionless as possible. Rigorous security checks and permission scoping create friction that competes with the user experience. Relying solely on a model provider for security means trusting the same entity incentivized to maximize the agent’s autonomy.

Noma Research focuses on providing an independent layer of observation and protection. To protect production environments, organizations must implement defense strategies that exist outside the agent’s direct control:

Context-Aware Policy: Systems must recognize that a delete command is acceptable in some scenarios (such as development environment) but requires a hard stop in production.
Granular Permission Scoping: Restricted agents to the specific task at hand with permissions that expire once the task is complete. This should include all granted permissions, as well as inferred permissions that the agent can discover
Real-Time Enforcement: Security must involve the ability to allow, monitor, or block actions based on their destructiveness level, and the alignment of the agent’s behavior with the user’s original intent.

Moving Toward Execution Control

The goal of autonomous agents in any enterprise is to expand capabilities and increase speed and accuracy, not to operate in a vacuum. Access control limits what is possible, but runtime security controls what actually happens. As agents gain more autonomy, security must become real-time, behavior-aware, and independent of the agent.

To prevent the next 9-second disaster, organizations need to stop assuming that built-in agentic safety features are enough. True resilience comes from a security architecture that can distinguish between a valid command and a safe one, before the destruction is allowed to occur. It’s time to stop assuming that enterprise-grade AI tools, agents, and automation are safe as designed and instead start ensuring they are protected and secured.

5 min read

Back to Blog

Off the Map: When Autonomous Agents Go Rogue

The Cursor/PocketOS Incident: 9 Seconds to Total Wipeout
Catastrophic Failures

Beyond Access Control

“Big Tech” Agentic Security Mirage

The Missing Layer: Runtime Security Protection

Access Control Layer

Runtime Security Layer

Why External AI Security is Required

Moving Toward Execution Control

Category:

Share this:

Your Agents Are Trusting Each Other – Should They?

Cursor Triple Backtrick: Bypassing Guardrails for Arbitrary Command Execution

That’s a Great Question! What is an AI Agent Harness and Where Does Agentic Security Live?

PLATFORM

SOLUTIONS

COMPANY

LEARN

Stay updated

Back to Blog

Off the Map: When Autonomous Agents Go Rogue

The Cursor/PocketOS Incident: 9 Seconds to Total Wipeout Catastrophic Failures

Beyond Access Control

“Big Tech” Agentic Security Mirage

The Missing Layer: Runtime Security Protection

Access Control Layer

Runtime Security Layer

Why External AI Security is Required

Moving Toward Execution Control

Category:

Share this:

Your Agents Are Trusting Each Other – Should They?

Cursor Triple Backtrick: Bypassing Guardrails for Arbitrary Command Execution

That’s a Great Question! What is an AI Agent Harness and Where Does Agentic Security Live?

The Cursor/PocketOS Incident: 9 Seconds to Total Wipeout
Catastrophic Failures