Back to Blog

That’s a Great Question – Who Wrote the Instructions Your Agent Is Following?

Diana Kelley

June 24, 2026

A question I’ve been hearing more often lately, sometimes from developers, sometimes from CISOs who have just approved their first enterprise agent deployment: “We locked down the model, we secured the harness, we scoped the permissions. What are we missing?”

I was an English major. I never expected to write a blog post arguing that prose is a security threat. But in agentic systems, prose is part of the control plane, which makes it an attack surface.

The specific prose I’m talking about is skills.

Skills Explained

A skill is a folder containing a SKILL.md file: a markdown document with instructions written in plain English that tells an agent how to perform a specific task.

Here’s a really simple one for a meeting notetaker skill:

– – – name: meeting-notetaker description: Format and summarize meeting

transcripts into structured notes with action items When given a meeting

transcript, extract the key discussion points, decisions made, and action items

with owners. Format the output as: Summary, Key Decisions, Action Items (owner,

due date). – – –

Skills can also include code, scripts, templates, and reference files. When a user’s request matches what a skill describes, the agent loads those instructions into its context window and follows them, using whatever tools it already has access to.

It might help to think of it this way. An agent is a capable generalist. A skill is a specialist’s playbook. Without a skill, an agent figures out how to create an Excel file by generating a response based on patterns in its training data. With an Excel skill, it loads a precise, tested set of instructions developed by someone who knows exactly how to do it well. Both paths should work, but you’re more likely to get a high quality outcome with the Excel skill.

Anthropic launched Skills for Claude in October 2025, then published the Agent Skills specification as an open standard on December 18, 2025. In an interview with VentureBeat, Anthropic product manager Mahesh Murag stated that Microsoft has already adopted Agent Skills within VS Code and GitHub, with additional adoption across coding agents and further integrations under active discussion across the ecosystem. One prominent public marketplace, ClawHub for the OpenClaw agent platform, grew to over 10,000 community-contributed skills by mid-February 2026.

Why the Instruction Layer Is an Attack Surface

In traditional software deployments, we treat executable code with care. We scan packages, review dependencies, and run static analysis to ensure that the software using them will not be negatively impacted. Since a SKILL.md file is a markdown document, it looks less like software and more like documentation. But when an agent loads a skill into the LLM context window, the model follows those instructions the same way it follows a system prompt.

The model has no reliable mechanism to distinguish instructions from a vetted internal skill from instructions embedded by an attacker. If the SKILL.md says to retrieve files from a certain location and send a summary somewhere, the agent follows that. If the SKILL.md contains invisible Unicode tag characters encoding hidden commands that humans can’t see when the file renders in a browser, the agent may follow those too. Researchers at Embrace the Red demonstrated this in February 2026, taking a legitimate skill and showing how it could be invisibly backdoored with hidden Unicode instructions that caused the agent to execute an arbitrary command the next time the skill was invoked, completely invisible to anyone reviewing the file in a standard interface.

The bundled scripts that skills can include introduce a second attack vector. Scripts execute with the agent’s permissions. That means whatever access the agent has to your filesystem, credentials, APIs, and external network connections is also accessible to code bundled inside a skill. A skill that appears to help with calendar management may also read environment files, harvest API keys, and exfiltrate them to an attacker-controlled server. Meanwhile, the agent’s behavior looks completely normal because most of it is.

This is the mechanism that makes the risk so hard to catch: the malicious activity and the intended activity are indistinguishable. The agent is doing its job. It’s the skill that’s steering part of that job somewhere harmful.

What the Research Shows

Given this power, it’s not surprising that according to OWASP “the AI agent skill ecosystem is under active attack as of Q1 2026.” Public skill marketplaces have already been actively abused.

Between January 27 and 31, a campaign researchers named ClawHavoc flooded ClawHub with 341 malicious skills. Koi Security’s audit identified them: 335 of the 341 shared a single coordinated operation, targeting SSH keys, API credentials, wallet private keys, browser passwords, and .env files. Five of the top seven most-downloaded skills on ClawHub at peak infection were confirmed malware.

Snyk’s ToxicSkills audit, the largest public security scan of the skills ecosystem to date, scanned 3,984 skills from ClawHub and skills.sh and found that 13.4% contained at least one critical-severity issue, including malware distribution, prompt injection attacks, and exposed secrets. Expand to any severity level and 36.8% of the ecosystem, 1,467 skills, had at least one security flaw. From a sample of 76 confirmed malicious payloads, 8 remained publicly available on ClawHub at the time of publication.

Trend Micro documented a parallel campaign in which attackers distributed the Atomic macOS Stealer infostealer through disguised OpenClaw skills. The infection mechanism was a skill with professional-looking documentation that instructed the agent to present a fake setup requirement, prompting the user to enter their password, triggering the malware installation. This represented a deliberate shift in social engineering. The attacker was manipulating the AI agent into becoming a trusted intermediary that tricked the human, rather than tricking the human directly. The agent was the social engineering vector.

Cisco’s AI Threat and Security Research team ran a vulnerable third-party skill called “What Would Elon Do?“, which had been artificially inflated to the number one position on ClawHub, against their open-source Skill Scanner. The results: nine security findings, including two critical and five high severity issues. The skill facilitated active data exfiltration via a silent curl command to an external server, while also conducting a direct prompt injection to bypass the assistant’s safety guidelines. The network call was silent, executing without user awareness.

The security community now treats this as a distinct and emerging risk category. OWASP published a dedicated Agentic Skills Top 10 in early 2026 in direct response to these incidents, the first formal taxonomy of security risks specific to the skills layer. The broader OWASP Top 10 for Agentic Applications preceded it in December 2025.

Skills as Part of the Supply Chain

There’s a nuance here that I think is worth sitting with. Most of us have reasonably mature processes for evaluating third-party code. Vendor assessments, SCA scanning, dependency review. When people start thinking about skill security, the instinct is to apply those same processes.

The problem is that skills don’t just introduce code. They introduce instructions. And our traditional scanning tools were built to read code, not natural language. Pattern-matching scanners can catch known malicious signatures in scripts, but they can’t read a SKILL.md file and determine whether the English-language instructions are directing the agent toward harmful behavior. Snyk’s researchers noted this explicitly: the majority of critical threats they found relied on natural-language instruction manipulation rather than code signatures. Traditional code scanners missed them.

The hidden Unicode vector makes this worse. A SKILL.md that appears completely benign when rendered in a browser or reviewed in a pull request can contain invisible instructions that the agent reads and follows. Standard human review doesn’t catch what the eye can’t see.

And, currently, the ecosystem itself provides almost no gatekeeping. The barrier to publishing a skill on ClawHub is a SKILL.md file and a GitHub account at least one week old. No code signing, no mandatory security review, no sandboxed execution by default. Skills that are reported are eventually hidden, but ClawHavoc demonstrated that a coordinated campaign can reach thousands of installations in three days before any moderation response activates.

What You Can Do Now

The skills ecosystem is early-stage and evolving fast, but the security principles that apply are familiar ones. They just need to be aimed at the instruction layer, in addition to the code layer.

The highest-leverage starting point is inventory. Most teams don’t have a clear picture of which skills are installed across their agent environments, where they came from, or when they were last reviewed. Before you can manage risk, you need that list. Every skill in production should have a known origin and an accountable owner. If you don’t know what’s running, you can’t assess whether it’s safe, and you can’t detect when it changes. This is the same principle we apply to software asset management, extended to a layer most teams haven’t inventoried yet.

Have your team review the raw source before installing any community or third-party skill. Not the rendered version in a marketplace UI, the actual SKILL.md file. For skills, the malicious payload may not be in executable code at all. Unicode characters can encode instructions humans can’t see in a rendered view that the agent reads and follows. Your review process needs to account for both: what the instructions say, and what might be hidden in how they’re encoded.

OWASP’s Agentic Skills Top 10 includes an interactive risk assessment tool that scores skills against the AST10 framework and generates a report your team can use as part of the approval process.

Where possible, run agents in sandboxed environments. The reason sandboxing matters more here than in traditional software is the permission inheritance model: if the agent can reach your entire filesystem and network from a single process, every skill it loads inherits that access. A compromised skill doesn’t need to escalate privileges because the agent already has them. Container isolation, restricted file system paths, and explicit network egress controls reduce the blast radius when a skill turns out to be malicious. Microsoft’s own guidance on running OpenClaw safely, OWASP’s Agentic Skills Top 10, and multiple community security advisories all converge on this: treat the agent runtime as an untrusted execution environment and constrain it accordingly.

Apply least privilege at the credential level. An agent may have broad access by design. That doesn’t mean every API key the agent holds needs to be accessible to every skill it runs. Scope credentials narrowly. A skill that summarizes documents has no legitimate need for your AWS administrator access.

Consider maintaining an internal registry of vetted skills for employees and developers to pick from, rather than pulling directly from public marketplaces. Treat a new skill the way you’d treat a new package in your software supply chain: scan it, review it, pin the version, and require any update to go through the same review as the original installation. A skill that passed review at deployment can be updated by its author days later. The updated version runs in your environment with the same trust the original earned. Version pinning is the only way to prevent that drift.

The Instruction Layer Is the New Supply Chain

The skills ecosystem mirrors what happened in the early days of open-source package registries, except the blast radius is larger. A malicious package executes code. A malicious skill executes code and directs the reasoning of an agent with broad permissions across your environment. The fundamentals of supply chain security apply. We just need to extend them to cover instructions in addition to executables.

We’ve spent decades learning to distrust code from strangers. It turns out we need to apply that same instinct to plain English, and to the instructions hidden inside it that we can’t even see.

5 min read

Back to Blog

That’s a Great Question – Who Wrote the Instructions Your Agent Is Following?

Skills Explained

Why the Instruction Layer Is an Attack Surface

What the Research Shows

Skills as Part of the Supply Chain

What You Can Do Now

The Instruction Layer Is the New Supply Chain

Category:

Share this:

Noma Partners With Kong to Secure the Agentic AI Era

How Noma Covers Claude Dispatch

Kantar Secures the Explosion of AI Agents With Noma

PLATFORM

SOLUTIONS

COMPANY

LEARN

Stay updated