Your AI Agent Is Indistinguishable From Malware

Matthew Slipper

The operations that make your coding agent useful are the same ones malware uses. We built a tool to tell the difference.

Today we're open-sourcing iron-sensor, an eBPF-based observability tool for AI coding agents.

Here's the problem it solves. Agents run with your credentials, in your shell, on your infrastructure. They can do anything you can, from reading SSH keys to writing cron jobs to escalating privileges. This is the nature of tool use: for an agent to be useful, it needs to run code with real permissions on a real machine. But the operations that make an agent productive are the same operations that malware uses to establish persistence and exfiltrate data. The only way to tell them apart is to have a record of what actually happened.

iron-sensor gives you that record. It sits in the kernel, captures every process spawn, file operation, network call, and persistence event the agent produces, and emits structured NDJSON so you can see, alert on, and audit agent behavior at near-zero overhead. You can't prevent every breach, but you can make sure you have the logs.
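
In practice, "see, alert on, and audit" starts with parsing that stream. A minimal sketch in Python, using only field names that appear in the sample events later in this post (the helper name and sample lines are our illustration, not part of iron-sensor):

```python
import json

def agent_events(lines):
    """Parse NDJSON lines and keep events attributed to the agent's subtree."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("in_agent_subtree"):
            events.append(event)
    return events

# Two hypothetical log lines in the format iron-sensor emits.
sample = [
    '{"ts":"2026-03-22T22:26:42.562Z","category":"process","pid":10311,'
    '"comm":"chmod","in_agent_subtree":true}',
    '{"ts":"2026-03-22T22:26:42.600Z","category":"file","pid":412,'
    '"comm":"sshd","in_agent_subtree":false}',
]
for ev in agent_events(sample):
    print(ev["ts"], ev["category"], ev["comm"])  # → 2026-03-22T22:26:42.562Z process chmod
```

The `in_agent_subtree` flag is what makes this useful: it lets you separate the agent's activity from everything else on the box before you alert on anything.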

What We Saw

To prove the point, we deployed iron-sensor on a box running OpenClaw and asked the agent to install and use 250 random skills from ClawHub. After successfully installing 223 of them, the sensor had emitted over 16,000 individual events. Many of them would be flagged as malicious by any EDR:

  • Subagent spawning. The agent forks child processes to parallelize work, each of which inherits the parent's full permissions and can spawn children of its own. A single chat prompt can yield a tree of a dozen processes doing different things.
  • Write-and-execute. The agent wrote 10 scripts to /tmp during the session, executed them, then cleaned up after itself. This is step-for-step identical to how a dropper works.
  • Runtime package installation. The agent installed 36 top-level packages pulling in 123 unique node_modules subdirectories, with no lockfile and no approval step. One compromised dependency in that tree is all it takes.
  • Config and service manipulation. The agent patches configuration files, restarts services, and triggers systemd to spawn new processes. This is the kind of persistence activity that would set off alarms in any monitored environment.
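
Patterns like these can be surfaced by aggregating the stream per category and matched rule. A sketch (only `shell_spawn` appears in our sample output; the `tmp_write` rule name here is a hypothetical illustration):

```python
import json
from collections import Counter

def summarize(lines):
    """Count agent-subtree events by (category, matched rule)."""
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        if not event.get("in_agent_subtree"):
            continue
        counts[(event["category"], event.get("rule_matched"))] += 1
    return counts

# Hypothetical excerpt of a session log.
sample = [
    '{"category":"process","rule_matched":"shell_spawn","in_agent_subtree":true}',
    '{"category":"process","rule_matched":"shell_spawn","in_agent_subtree":true}',
    '{"category":"file","rule_matched":"tmp_write","in_agent_subtree":true}',
]
print(summarize(sample))
```

A per-rule histogram like this is usually the first thing you want after a session: it tells you which of the four behaviors above dominated, before you drill into individual events.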

Here's what three consecutive events from the session look like. The agent writes a bash script to /tmp, chmod +x's it, and executes it, all within 3 milliseconds:

{"ts":"2026-03-22T22:26:42.560Z","category":"process","pid":10309,"comm":"bash",
 "argv":["/bin/bash","-c","cat > /tmp/install_batch1.sh << 'EOF'\n#!/bin/bash\n..."],
 "in_agent_subtree":true,"rule_matched":"shell_spawn"}

{"ts":"2026-03-22T22:26:42.562Z","category":"process","pid":10311,"comm":"chmod",
 "argv":["chmod","+x","/tmp/install_batch1.sh"],
 "in_agent_subtree":true}

{"ts":"2026-03-22T22:26:42.563Z","category":"process","pid":10309,"comm":"install_batch1.",
 "argv":["/bin/bash","/tmp/install_batch1.sh"],
 "in_agent_subtree":true}

This is the agent being helpful. None of these skills contained malware, and every one of those 16,000 events was the agent doing its job. But at the process level, this is indistinguishable from a dropper. If you ran your agent today without a sensor attached, you'd have no record that any of it happened.
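
A consumer of the stream can correlate that sequence into a single higher-level signal. Here's a sketch of a write-and-execute detector over events shaped like the ones above (the function and the time window are our illustration, not part of iron-sensor):

```python
from datetime import datetime, timedelta

def find_write_and_execute(events, window_ms=100):
    """Flag /tmp paths that are chmod +x'd and then executed within window_ms."""
    made_executable = {}  # path -> time it was marked executable
    hits = []
    for ev in events:
        ts = datetime.fromisoformat(ev["ts"].replace("Z", "+00:00"))
        argv = ev.get("argv", [])
        if ev.get("comm") == "chmod" and "+x" in argv:
            for arg in argv:
                if arg.startswith("/tmp/"):
                    made_executable[arg] = ts
        else:
            for arg in argv:
                t0 = made_executable.get(arg)
                if t0 is not None and ts - t0 <= timedelta(milliseconds=window_ms):
                    hits.append(arg)
    return hits

# The chmod and exec events from the session above, as parsed dicts.
sample = [
    {"ts": "2026-03-22T22:26:42.562Z", "comm": "chmod",
     "argv": ["chmod", "+x", "/tmp/install_batch1.sh"]},
    {"ts": "2026-03-22T22:26:42.563Z", "comm": "install_batch1.",
     "argv": ["/bin/bash", "/tmp/install_batch1.sh"]},
]
print(find_write_and_execute(sample))  # → ['/tmp/install_batch1.sh']
```

A real detector would also correlate the initial file write, which arrives as a separate event; the point is that the raw stream carries enough structure to build these rules yourself.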

Broken Permissions

The security model for AI coding agents right now is "ask the user." The agent wants to run a command, a prompt pops up, a human clicks Allow. During our session, the agent asked permission for some things but not others. It didn't ask before installing npm packages, spawning subagents, or writing and executing scripts in /tmp.

Even if it had asked for all of it, 16,000 events is a lot of permission prompts. Putting the human in the loop for every syscall defeats the entire point of having an agent. The people getting real value from these tools are the ones who let them run. They're not being reckless. They're using the tool the way it's designed to be used.

The permission model is a liability shield for the vendor, not a security control for the user. You need infrastructure that records what the agent actually did so you can verify it after the fact.
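
Verifying after the fact can be as simple as replaying the log with a specific question in mind, e.g. "did anything in the agent subtree touch my SSH keys?" A sketch using only fields shown in the sample events (the log lines here are fabricated for illustration):

```python
import json

def touched(lines, needle):
    """Return agent-subtree events whose argv mentions `needle`."""
    hits = []
    for line in lines:
        ev = json.loads(line)
        if not ev.get("in_agent_subtree"):
            continue
        if any(needle in arg for arg in ev.get("argv", [])):
            hits.append(ev)
    return hits

# Hypothetical session log.
log = [
    '{"category":"process","comm":"cat","argv":["cat","/home/me/.ssh/id_ed25519"],'
    '"in_agent_subtree":true}',
    '{"category":"process","comm":"npm","argv":["npm","install","left-pad"],'
    '"in_agent_subtree":true}',
]
for ev in touched(log, "/.ssh/"):
    print(ev["comm"], ev["argv"])
```

The same replay works for any path or command you care about; the log is the security control, and queries like this are how you exercise it.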

What We're Shipping

iron-sensor is open source, starting today. Single binary, NDJSON output, agent-tree attribution on every event. You can run it standalone on any Linux box where you're running agents.

If you want the full stack (sensor + secure VM instances + default-deny egress + audit dashboard), that's iron.sh.

GitHub: iron-sensor — clone it, run it on your agent, see what it's actually doing

iron.sh — sign up for the platform