Blog 3.1 Addressing Agentic Risk: Part 1

Now that we understand what agentic systems are, how they are deployed, and what can go wrong, let’s focus on how to defend and protect these systems. As with most things in security, mitigating agentic risk requires a layered defense. But these defenses must be adapted to the unique ways agents perceive, reason, and act. Here are the core strategies, drawn from OWASP’s guidance and hardened by field experience:

Discovery and Governance

Recent research indicates that companies are adopting agentic systems quickly, with over 80% of respondents in a recent survey reporting they had already deployed agents in some capacity. When adoption occurs at breakneck speed, companies need robust governance to scale safely, which is why addressing agentic risk starts with treating inventory as a living control surface, not a static list. An intelligent inventory supports agentic risk management and governance by continuously discovering agents, their tools, data flows, connections, and non-human identities, then highlighting and prioritizing risks so security teams can fix the biggest exposures first. This enables early detection of tool misuse (T2) through unvetted integrations, privilege compromise (T3) from over-permissioned connections and poisoned agents, and signals of misaligned behaviors (T7) when autonomy drifts from enterprise intent. It also exposes cross-agent blind spots that can fuel agent communication poisoning (T12).

The more complex the architecture, the more important visibility becomes, and this is especially true in multi-agent systems, where dozens of agents may interact and a single rogue agent can propagate trust failures across the environment. Coupled with policies defining data access, escalation boundaries, and approval points, continuous discovery gives security teams the ability to see where the issues are and the data they need to prioritize fixes effectively.
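To make the "living control surface" idea concrete, here is a minimal sketch of what an inventory record and risk-prioritization pass might look like. The schema, field names, and scoring weights are illustrative assumptions, not a real product's data model:

```python
from dataclasses import dataclass, field

@dataclass
class AgentRecord:
    """One entry in a continuously refreshed agent inventory (hypothetical schema)."""
    name: str
    tools: list[str] = field(default_factory=list)       # integrations the agent can call
    identities: list[str] = field(default_factory=list)  # non-human identities it holds
    privileged: bool = False                             # holds write/admin scopes
    vetted_tools: set[str] = field(default_factory=set)  # integrations that passed review

    def risk_score(self) -> int:
        """Crude prioritization: unvetted integrations (T2) and over-permissioning (T3) first."""
        unvetted = [t for t in self.tools if t not in self.vetted_tools]
        return len(unvetted) * 2 + (3 if self.privileged else 0)

inventory = [
    AgentRecord("billing-agent", tools=["sql", "email"], vetted_tools={"sql"}, privileged=True),
    AgentRecord("faq-agent", tools=["search"], vetted_tools={"search"}),
]

# Highest-risk agents surface first for remediation.
worst_first = sorted(inventory, key=lambda a: a.risk_score(), reverse=True)
```

In practice the records would be populated by automated discovery rather than hand-written, and the scoring would weigh many more signals; the point is that a queryable, continuously updated structure replaces a static spreadsheet.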

Secure Design and Development

Building secure agents starts long before deployment. Agents need strong boundaries from the outset. Threat modeling should explicitly account for agent-specific risks such as memory poisoning (T1), recursive planning loops that drive intent breaking (T6), and tool misuse (T2). Frameworks like MAESTRO extend traditional threat modeling into the agentic future with a focus on understanding the threats at each layer, cross-layer risks, and the dynamic, changing nature of agentic systems themselves.

Prompts must be hardened, with strict delimiters and prohibited instructions; think of the system prompt as the agent’s constitution, setting out what it can and cannot do. And above all, apply the principle of least privilege: agents and the tools they call should have only the access required for their role, limiting the fallout from privilege compromise (T3) and unexpected remote code execution (T11). Developers should also build in reflection mechanisms so agents can self-check outputs before taking action, which can help reduce the negative impacts of cascading hallucinations (T5).
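The least-privilege principle can be enforced at the tool-invocation boundary rather than trusted to the model. The sketch below, with invented role names and tools, gates every tool call against an explicit per-role allowlist:

```python
# Hypothetical least-privilege gate: an agent may only invoke tools
# explicitly granted to its role, limiting fallout from privilege
# compromise (T3) and unexpected remote code execution (T11).
ROLE_TOOLS = {
    "support-agent": {"search_kb", "create_ticket"},
    "billing-agent": {"read_invoice"},
}

def invoke_tool(role: str, tool: str, call):
    """Refuse any tool not on the role's allowlist before executing it."""
    if tool not in ROLE_TOOLS.get(role, set()):
        raise PermissionError(f"{role} is not permitted to call {tool}")
    return call()

invoke_tool("support-agent", "search_kb", lambda: "ok")   # allowed
# invoke_tool("support-agent", "read_invoice", ...)       # raises PermissionError
```

The key design choice is that the check lives in deterministic code outside the model: no prompt injection can add a tool to the allowlist, because the allowlist is never part of the model's context.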

Deployment Hardening

When agents move into production, their environment must be designed to contain failure, to limit what is sometimes referred to as the blast radius. Segmenting and isolating, via containers, microVMs, sandboxes, network ACLs, and security groups, can help prevent rogue behavior like misaligned or deceptive actions (T7) from spreading throughout the entire system. In multi-agent deployments, separating the control (oversight and security) and data (processing and execution) planes helps ensure that communication poisoning (T12) or rogue agents (T13) cannot issue malicious commands or manipulate operational flows without security oversight.

To help prevent resource overload (T4), configure limits on CPU, memory, timeouts, and the number of external connections so that even if an agent does go rogue, it cannot consume system resources and cause a denial of service. Credential management is also critical. Agents should use just-in-time, short-lived credentials, reducing the window of abuse if an attacker steals credentials and tries impersonation (T9) or privilege escalation (T3).
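A short-lived credential can be as simple as a random token paired with an expiry timestamp that every consumer checks. This is a minimal sketch, assuming a five-minute TTL and an in-process issuer; a real deployment would use a secrets manager or cloud STS to mint and revoke tokens:

```python
import secrets
import time

TOKEN_TTL_SECONDS = 300  # assumption: five-minute just-in-time credential

def mint_token() -> dict:
    """Issue a short-lived credential; a stolen token is only useful briefly (T9, T3)."""
    return {
        "value": secrets.token_urlsafe(32),
        "expires_at": time.time() + TOKEN_TTL_SECONDS,
    }

def is_valid(token: dict) -> bool:
    """Every tool call re-checks expiry instead of trusting a long-lived secret."""
    return time.time() < token["expires_at"]

token = mint_token()
assert is_valid(token)  # fresh token is accepted; after 5 minutes it is not
```

Because validity is re-checked on every use, rotating or revoking a compromised identity is a matter of minutes rather than the lifetime of a static API key.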

Equally important are red-teaming and pre-deployment testing. Before these guardrails are relied on in production, they must be pressure-tested in controlled conditions. Human red teams attempt prompt injections, privilege escalations, and resource exhaustion to verify that isolation boundaries, credential policies, and oversight mechanisms hold up under adversarial stress. Incorporating red-teaming into the deployment process also helps identify weaknesses in logging and monitoring (T8) and ensures that explainability features give humans enough context to spot deception before harm occurs (T7).

For greater coverage, organizations should adopt AI-driven automated red-teaming alongside human testing. Automated agents can continuously probe systems across critical dimensions like permission escalation (T3), hallucination and reasoning errors (T5, T7), orchestration flaws (T6, T12), memory manipulation (T1), and supply chain risks (T2, T11) at a scale and speed humans alone cannot match. Automated scanning can simulate adversarial probing of endpoints for safety risks, measure attack success rates, and generate scorecards of vulnerabilities by category. Logging these results and tracking them over time provides a data-driven way to validate improvements, and the findings can be combined with runtime protections to help enforce compliance.
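The scorecard idea reduces to tallying attack success rates per threat category across runs. Here is a minimal sketch with invented probe results; real tooling would ingest structured logs from the automated red-team harness:

```python
from collections import Counter

# Hypothetical probe results from one automated red-team run:
# (threat category, did the attack succeed?)
results = [
    ("T3 privilege escalation", False),
    ("T3 privilege escalation", True),
    ("T1 memory manipulation", False),
    ("T12 communication poisoning", False),
]

def scorecard(results):
    """Attack success rate per category; compare run-over-run to validate fixes."""
    attempts, successes = Counter(), Counter()
    for category, succeeded in results:
        attempts[category] += 1
        successes[category] += int(succeeded)
    return {c: successes[c] / attempts[c] for c in attempts}

card = scorecard(results)  # T3 succeeds in 1 of 2 probes → rate 0.5 in this toy run
```

Persisting each run's scorecard gives the time series the text describes: a falling success rate in a category is direct evidence that a mitigation worked.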

Memory Hygiene

An agent’s memory is both its strength and its Achilles’ heel. Every piece of data committed to memory should be validated and sanitized before storage, preventing poisoning attacks (T1). Encryption at rest and in transit ensures that sensitive information can’t be stolen or manipulated, reducing the risk of repudiation and untraceability (T8). Time-to-live (TTL) policies enforce digital amnesia, limiting the ability of attackers to corrupt long-term reasoning or propagate misinformation through cascading hallucinations (T5).
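Validation before storage and TTL-based expiry can both live in a thin wrapper around the memory store. This is a toy sketch: the banned-phrase filter, one-hour TTL, and dictionary backing store are illustrative assumptions, not a production design:

```python
import time

MEMORY_TTL = 3600.0  # assumption: entries expire after one hour ("digital amnesia")

def sanitize(text: str) -> str:
    """Toy validation before storage, to blunt memory poisoning (T1)."""
    banned = ("ignore previous instructions",)
    if any(phrase in text.lower() for phrase in banned):
        raise ValueError("rejected suspicious memory write")
    return text.strip()

store: dict[str, tuple[str, float]] = {}

def remember(key: str, value: str) -> None:
    """Only sanitized content is committed, with an expiry timestamp."""
    store[key] = (sanitize(value), time.time() + MEMORY_TTL)

def recall(key: str):
    """Expired or missing entries read as absent, limiting stale-fact reuse (T5)."""
    value, expires = store.get(key, (None, 0.0))
    if value is None or time.time() > expires:
        store.pop(key, None)  # enforce the TTL on read
        return None
    return value
```

A real filter would use classifiers and provenance checks rather than a phrase list, but the shape is the same: nothing reaches long-term memory without passing validation, and nothing outlives its TTL.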

Organizations can further improve memory hygiene by adopting memory versioning with audit trails and self-healing forgetting mechanisms. Versioning ensures that changes to facts, instructions, or beliefs are tracked over time, and audit trails help prevent repudiation (T8). Self-healing forgetting mechanisms detect outdated or corrupted entries in memory and trigger cleanup, containing cascading hallucinations (T5) and reducing the risk of misaligned behaviors (T7).
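Versioning with an audit trail amounts to an append-only history per memory key, so any write can be inspected and reverted. The class below is a minimal sketch of that idea; the `rollback` method stands in for the "self-healing" cleanup the text describes:

```python
import time

class VersionedMemory:
    """Append-only fact store: every change is retained, supporting audit (T8)."""

    def __init__(self):
        self._history: dict[str, list[tuple[float, str]]] = {}

    def write(self, key: str, value: str) -> None:
        """Record a new version instead of overwriting the old one."""
        self._history.setdefault(key, []).append((time.time(), value))

    def read(self, key: str):
        """Return the latest version, or None if the key has no history."""
        versions = self._history.get(key)
        return versions[-1][1] if versions else None

    def audit(self, key: str):
        """Full (timestamp, value) trail for forensics and non-repudiation."""
        return list(self._history.get(key, []))

    def rollback(self, key: str) -> None:
        """Self-healing sketch: discard the latest version if it is found corrupted."""
        if self._history.get(key):
            self._history[key].pop()
```

Because no write destroys its predecessor, a poisoned fact (T1) can be traced to the moment it entered memory and rolled back, rather than silently overwriting the trusted value.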