Blog 3.2 Addressing Agentic Risk: Part 2

In the last blog we took a deeper dive into actions companies can take now to improve the success and security of their agentic AI deployments. Continuing those steps, in this blog we’ll cover runtime safeguards, logging, human oversight, and identity.

Runtime Safeguards

Agents don’t just need a safe launch; they need continuous monitoring once they’re live. Inputs and outputs should be scrutinized for jailbreak attempts and policy violations, blocking attacks like prompt injections that cause goal manipulation (T6) or cascading hallucinations (T5). Runtime guardrails should block risky tool calls to prevent misuse (T2), enforce rate limits to resist overload attempts (T4), and validate action plans before execution. And no matter how advanced the system, operators must retain a kill switch, essential for stopping misaligned behaviors (T7) or rogue agents (T13) in real time.
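As a minimal sketch of how these pieces can fit together, the toy guardrail below combines a tool allowlist, a sliding-window rate limit, and an operator kill switch. The class, tool names, and limits are all hypothetical, not a reference to any specific product:

```python
import time
from collections import deque

class RuntimeGuardrail:
    """Toy guardrail: allowlisted tools, a sliding-window rate limit, kill switch."""

    def __init__(self, allowed_tools, max_calls, window_seconds):
        self.allowed_tools = set(allowed_tools)
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls = deque()   # timestamps of recently allowed tool calls
        self.killed = False    # operator-controlled kill switch

    def kill(self):
        self.killed = True

    def check(self, tool_name, now=None):
        """Return (allowed, reason) for a proposed tool call."""
        now = time.monotonic() if now is None else now
        if self.killed:
            return False, "kill switch engaged"
        if tool_name not in self.allowed_tools:
            return False, f"tool '{tool_name}' not on allowlist"
        # Drop timestamps outside the sliding window, then enforce the limit.
        while self.calls and now - self.calls[0] > self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False, "rate limit exceeded"
        self.calls.append(now)
        return True, "ok"
```

In a real deployment the allowlist and limits would come from policy, and the kill switch would be wired to an out-of-band operator control rather than an in-process flag.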

Additionally, to augment a runtime defense-in-depth strategy, use behavioral analytics and anomaly detection to flag deviations from expected patterns, helping defend against cascading hallucinations (T5) and rogue agents (T13) while also supporting model integrity verification. Strong containment measures such as process isolation and sandboxing keep compromised agents from accessing sensitive resources or moving laterally, mitigating privilege compromise (T3). Adding least-privilege access controls further reduces the blast radius of a compromise, and using secure inter-agent communication protocols like A2A, which supports TLS and OAuth, can help prevent communication poisoning (T12) and identity spoofing (T9).

Logging, Audit, and Forensics

Accountability is impossible without evidence. Every tool call, memory update, and plan change should be logged in structured, immutable records, making it harder for attackers to cover their tracks (T8). Traceability with unique IDs ensures you can reconstruct exactly what happened if tools are misused (T2), privileges are escalated (T3), or unexpected code execution is attempted (T11). Logs themselves must be sanitized to prevent sensitive data leakage that could later be used in human manipulation attacks (T15). Properly implemented, these logs act as the “black box” recorder when something goes wrong.

But organizations should not stop there. First, observability for agentic AI requires enriched context. Logs need to capture not just the action, but also the reasoning chain or plan state the agent used. This is critical for post-incident forensics, where you may need to determine whether intent was broken (T6) or deceptive behaviors emerged (T7). Second, companies should implement cryptographic signing of logs at the point of creation. This makes repudiation nearly impossible and ensures regulators or auditors can verify the integrity of evidence over time (T8).
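The sketch below illustrates the idea of signing at the point of creation: each entry is hash-chained to its predecessor and HMAC-signed, so editing any record breaks verification. The key handling and record shape are hypothetical simplifications; production systems would use a managed signing key and an append-only store:

```python
import hashlib
import hmac
import json

def append_log(chain, key, record):
    """Append a signed, hash-chained log entry; tampering breaks verification."""
    prev_hash = chain[-1]["entry_hash"] if chain else "0" * 64
    body = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    entry_hash = hashlib.sha256(body.encode()).hexdigest()
    signature = hmac.new(key, entry_hash.encode(), hashlib.sha256).hexdigest()
    chain.append({"record": record, "prev": prev_hash,
                  "entry_hash": entry_hash, "signature": signature})

def verify_chain(chain, key):
    """Recompute every hash and signature; return False on any mismatch."""
    prev_hash = "0" * 64
    for entry in chain:
        body = json.dumps({"record": entry["record"], "prev": prev_hash},
                          sort_keys=True)
        expected_hash = hashlib.sha256(body.encode()).hexdigest()
        expected_sig = hmac.new(key, expected_hash.encode(),
                                hashlib.sha256).hexdigest()
        if entry["prev"] != prev_hash or entry["entry_hash"] != expected_hash:
            return False
        if not hmac.compare_digest(entry["signature"], expected_sig):
            return False
        prev_hash = expected_hash
    return True
```

Because each entry embeds the previous entry’s hash, an attacker who alters one record must re-sign every subsequent record, which they cannot do without the key.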

For additional forensic readiness, maintain versioned snapshots of agent memory and context, so investigators can roll back to a prior state and verify whether malicious changes were introduced (T1). Finally, embed observability into cross-team response playbooks. For auditability, logs and telemetry need to flow not just into security, but also compliance, legal, and operations teams. This aligns with OWASP’s call for comprehensive monitoring and governance in agent ecosystems.
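A toy version of versioned memory snapshots might look like the following: each snapshot gets a content fingerprint, so investigators can detect when state changed and roll back to a prior version. The class and method names are illustrative, and it assumes memory is JSON-serializable:

```python
import copy
import hashlib
import json

class MemorySnapshots:
    """Toy versioned snapshots of agent memory for forensic rollback."""

    def __init__(self):
        self.versions = []   # list of (fingerprint, deep-copied state)

    def snapshot(self, memory):
        """Store an immutable copy of memory; return its version id."""
        state = copy.deepcopy(memory)
        digest = hashlib.sha256(
            json.dumps(state, sort_keys=True).encode()).hexdigest()
        self.versions.append((digest, state))
        return len(self.versions) - 1

    def rollback(self, version_id):
        """Return a fresh copy of a prior state for reinstatement or review."""
        return copy.deepcopy(self.versions[version_id][1])

    def changed_between(self, a, b):
        """True if the two versions differ (a candidate tampering point)."""
        return self.versions[a][0] != self.versions[b][0]
```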

Human Oversight and Controls

Despite the hype, AI agents can’t be trusted blindly. Some actions, like moving money, writing to a production database, or touching personal data, should always require explicit human approval. This prevents catastrophic fallout from privilege compromise (T3) or intent manipulation (T6). Human-in-the-loop (HITL) workflows should also be designed to withstand flooding attacks, where adversaries overwhelm reviewers until oversight fails (T10). In regulated domains, combining HITL with explainability audits strengthens defenses against misaligned or deceptive behaviors (T7), making sure humans understand why the agent made a recommendation.

To make this oversight meaningful in practice, companies should design tiered escalation paths. Low-risk, and perhaps even medium-risk, tasks can proceed automatically with monitoring, but high-risk actions should escalate to humans along with clear context to support the decision. This reduces fatigue and makes oversight sustainable.
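The routing logic can be sketched as a simple tier lookup. The action names and tier assignments below are hypothetical; the key design choice is that unknown actions default to the highest-risk path rather than silently auto-approving:

```python
# Hypothetical risk tiers for illustration only.
RISK_TIERS = {
    "send_summary_email": "low",
    "update_crm_record": "medium",
    "transfer_funds": "high",
    "write_prod_db": "high",
}

def route_action(action, context):
    """Decide how a proposed action proceeds under tiered oversight."""
    tier = RISK_TIERS.get(action, "high")   # unknown actions default to high risk
    if tier == "low":
        return {"decision": "auto_approve", "monitor": True}
    if tier == "medium":
        # Proceed automatically, but sample a fraction for after-the-fact review.
        return {"decision": "auto_approve", "monitor": True, "sample_review": True}
    # High-risk: escalate to a human with the context they need to decide.
    return {"decision": "escalate_to_human",
            "review_packet": {"action": action, "context": context}}
```

Defaulting unknown actions to "high" is a fail-closed choice: new tools added to an agent get human review until someone deliberately classifies them.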

Where appropriate, enforce separation of duties (SoD) in agent approval workflows. If a current workflow already requires SoD or multiple approvals, ensure this carries over into agentic workflows. For example, the same operator should not both authorize and execute a sensitive action. This reduces the chance that a single compromised account or manipulated agent can bypass guardrails (T3, T9). Embed contextual alerts and replay tools: oversight improves when humans can see not just what the agent is doing but also the reasoning chain that led there. The ability to play back logs and review dashboards allows reviewers to validate whether a decision aligns with policy or shows signs of deception (T6, T7, T8).
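The SoD check itself is small; the discipline is in applying it everywhere a sensitive action is executed. A minimal sketch, with hypothetical identity strings standing in for real principals:

```python
def check_separation_of_duties(approvers, executor, min_approvers=1):
    """Reject execution if the executor also approved, or approvals are short."""
    approvers = set(approvers)
    if executor in approvers:
        return False, "executor also approved the action"
    if len(approvers) < min_approvers:
        return False, "not enough distinct approvers"
    return True, "ok"
```

A workflow requiring two approvals would call this with `min_approvers=2`, mirroring whatever the pre-agentic process already demanded.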

When implementing red-teaming during deployment hardening, don’t forget to include simulations of human oversight mechanisms. Test whether adversaries can exploit weaknesses by overwhelming reviewers. Finally, consider adopting oversight-by-design principles by embedding policy checkpoints where humans can tune thresholds, revoke privileges, or inject corrective instructions without shutting down the whole system.

Identity and Trust

Finally, agents need to be treated like digital employees with proper identity and access management. Each should have a unique non-human identity (NHI) that can be provisioned and de-provisioned like a user account, limiting impersonation attempts (T9). Inter-agent communication should be secured with cryptographic request signing and mutual TLS, addressing communication poisoning (T12). And in multi-agent swarm-style deployments, decentralized identifiers (DIDs) and verifiable credentials can establish trust across agents, reducing the risk of rogue agents (T13) and human attacks on multi-agent systems (T14).

Organizations should consider adopting ephemeral authentication models, where credentials are short-lived and scoped only to the task at hand. This reduces the chance that a stolen token could be reused for privilege escalation (T3) or persistent data access. Binding each credential to specific metadata (task, purpose, time window) also improves auditability. In the same vein, look at implementing just-in-time (JIT) identity provisioning to replace static service accounts. In this model, AI agents request temporary credentials only when needed, and those credentials expire when the action is complete. This narrows the attack window for impersonation (T9) and helps enforce least privilege.
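A toy issuance and validation flow shows the shape of this model: a credential carries its task scope and time window, and validation rejects anything outside either. The field names and TTL are illustrative, not a specific vendor’s token format:

```python
import secrets
import time

def issue_credential(agent_id, task, purpose, ttl_seconds, now=None):
    """Mint a short-lived credential bound to task, purpose, and time window."""
    now = time.time() if now is None else now
    return {
        "token": secrets.token_hex(16),
        "agent_id": agent_id,
        "task": task,          # scope metadata improves auditability
        "purpose": purpose,
        "issued_at": now,
        "expires_at": now + ttl_seconds,
    }

def credential_valid(cred, task, now=None):
    """A credential is only good for its bound task within its time window."""
    now = time.time() if now is None else now
    return cred["task"] == task and cred["issued_at"] <= now < cred["expires_at"]
```

A stolen credential under this model is only useful for one task and only until its short TTL lapses, which is exactly the narrowed attack window the text describes.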

Another way to manage identity risks associated with agents is to enforce continuous authorization and adaptive trust scoring. Rather than granting broad, static roles, agent access rights are adjusted dynamically based on real-time behavior, context, and risk signals. An agent exhibiting anomalous behavior, such as escalating privileges, could be automatically suspended while a human reviews the activity. This approach directly mitigates intent breaking (T6) and misaligned behaviors (T7).
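In sketch form, adaptive trust can be a score that risk signals push down until a suspension floor is crossed. The signal names, weights, and threshold here are invented for illustration; real systems would derive them from telemetry and policy:

```python
class AdaptiveTrust:
    """Toy trust score: risk signals lower it; below a floor, suspend the agent."""

    def __init__(self, initial=1.0, suspend_below=0.5):
        self.score = initial
        self.suspend_below = suspend_below
        self.suspended = False

    def observe(self, signal, weight):
        """Apply a named risk signal with a policy-assigned weight."""
        self.score = max(0.0, self.score - weight)
        if self.score < self.suspend_below:
            self.suspended = True   # hold the agent for human review
        return self.suspended

    def human_review_clears(self):
        """A human reviewer restores the agent after investigating."""
        self.score, self.suspended = 1.0, False
```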

Modern federated and fine-grained access control approaches are a great fit for agentic NHIs. Attribute-Based Access Control (ABAC) and Policy-Based Access Control (PBAC) allow organizations to tie permissions not just to agent “roles” but also to environmental conditions, data sensitivity, and security posture. This prevents broad access rights from bleeding into sensitive systems and reduces cross-agent contamination (T1, T12). Treating agents like digital employees and applying robust identity controls to these NHIs ensures they remain productive while still accountable.
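As a closing sketch, an ABAC decision can be expressed as a function over agent, resource, and environment attributes rather than a role lookup. The attribute names and rules below are hypothetical, but they show how environmental conditions (like degraded security posture) can deny access regardless of role:

```python
def abac_decide(agent_attrs, resource_attrs, env_attrs):
    """Toy ABAC: permit only when agent, resource, and environment align."""
    if env_attrs.get("posture") != "healthy":
        return "deny"   # degraded security posture blocks everything
    if (resource_attrs.get("sensitivity") == "high"
            and agent_attrs.get("clearance") != "high"):
        return "deny"   # data sensitivity must match agent clearance
    if resource_attrs.get("domain") not in agent_attrs.get("allowed_domains", []):
        return "deny"   # keep access scoped to the agent's business domain
    return "permit"
```

Because every decision re-evaluates current attributes, access tightens automatically as conditions change, without re-engineering static roles.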