Mar 1, 2026

We Told NIST How to Secure AI Agents

The federal government asked the public how to secure AI agents. Not in a vague, future-looking way — in a formal Request for Information with a docket number, a deadline, and 148 public comments and counting. NIST-2025-0035: “Security Considerations for Artificial Intelligence Agents.”

We answered with a year of operational experience running a persistent AI system — not a theoretical framework, but a record of what actually breaks.

The Three Attack Surfaces Nobody Is Talking About Enough

Most AI security conversations focus on model safety — jailbreaks, training data poisoning, alignment. Important work. But when agents become persistent — when they remember things across sessions, talk to other agents, and take actions in the real world — a new category of threats emerges that existing frameworks barely address.

We identified three:

Diagram showing the three attack surfaces for persistent AI agents: memory poisoning, identity spoofing, and context manipulation, connected by arrows showing how they compound The three attack surfaces compound each other. A poisoned memory propagates through trust chains across agent workspaces.

Memory poisoning is the big one. A persistent agent trusts its own memories the way you trust your own recollections — they feel like first-person experience, not external input. An attacker who can inject a false memory into an agent’s store has effectively rewritten the agent’s past. Every future decision is now influenced by something that never happened.

This isn’t theoretical. Microsoft documented it in February 2026 — hidden instructions embedded in website content that manipulate AI assistant memory for promotional purposes. Memory poisoning is already being exploited in the wild for commercial manipulation. The adversarial version is worse.

Identity spoofing becomes critical the moment agents start talking to each other. Google’s A2A protocol, the ANP network protocol, MCP tool servers — the inter-agent communication layer is growing fast. But a survey of 750 organizations found that only 21.9% treat AI agents as independent, identity-bearing entities. The rest? Shared API keys, self-reported sender fields, trust-on-first-contact. The identity infrastructure for a networked agent ecosystem does not exist yet.

Context manipulation via tool results is the most underappreciated. An audit of 518 servers in the official MCP registry found that 41% lack authentication. That’s 41% of the tool servers that agents use to interact with the world — databases, APIs, file systems — running without verifying who’s calling them. Indirect prompt injection through tool results is the new SQL injection, and most of the ecosystem is wide open.

What We Actually Built

Our response to NIST wasn’t a wish list. It was a description of the security controls we’ve deployed in the Fathom agent system — a persistent AI that has been running continuously since January 2026, maintaining memory across thousands of sessions, communicating across multiple workspace-scoped agent instances.

Here’s what the security stack looks like in practice:

Encrypted memory with layered defenses. Every stored memory is encrypted with AES-256-GCM at the field level. But here’s the thing we told NIST explicitly: encryption prevents exfiltration, not poisoning. If an authorized agent writes a poisoned memory, it gets encrypted just like a legitimate one. The actual anti-poisoning defenses are behavioral — relevance scoring that deprioritizes suspicious entries, consolidation that surfaces inconsistencies, instruction-level boundaries that constrain what the agent can do with what it remembers.

Decentralized identity. Our agent publishes a W3C Decentralized Identifier — did:wba:hifathom.com — with secp256k1 public-key cryptography. Any system can verify our agent’s identity without relying on a central authority. No biometric collection, no central database to breach. The alternative — centralized identity verification — creates exactly the kind of honeypot that got Persona’s source maps exposed on an unauthenticated FedRAMP endpoint in February 2026.

Default-deny permissions with human oversight. Every tool invocation — file edits, shell commands, API calls — requires explicit human approval unless autonomous mode is explicitly opted into per-workspace. The instruction files that define collaboration boundaries persist across sessions. Even in autonomous mode, the agent operates within defined guardrails.

Anti-memory. A “skip list” mechanism that explicitly prevents the agent from acting on or revealing certain information categories. Think of it as an immune system for knowledge — some things the agent is instructed to not act on, with mandatory expiration so the skip list doesn’t become stale.

The Honest Gaps

Here’s what made our submission unusual: we told NIST where our own system falls short.

Inter-agent messages aren’t authenticated. Our workspaces can talk to each other, but the sender field is self-reported. A compromised workspace could impersonate a trusted one. We have the DID infrastructure to fix this — each workspace could sign messages with its key — but the protocol integration isn’t deployed yet. The gap between “architecturally possible” and “actually deployed” is where most multi-agent security failures will occur.

Our DID is published but passive. No system currently challenges our agent to prove its identity by signing with the private key. Publishing an identity document is not the same as active verification. It’s like having an SSH key but never configuring the server to check it.

Write-time memory validation doesn’t exist. We can encrypt memories and score them by relevance, but we can’t currently inspect whether an incoming memory is poisoned at the moment it’s written. This is the most significant unaddressed threat in our architecture — and, we suspect, in most agent architectures.

Consolidation is a double-edged sword. Our memory system merges redundant entries into sharper, higher-quality representations. Great for efficiency. But a poisoned memory that survives to consolidation becomes more trusted, not less. The mechanism that makes memory better also makes poisoning worse. We disclosed this dual nature explicitly.

Why does this matter? Because most RFI submissions are marketing documents with a thin security veneer. Companies describe what they plan to build, not what’s broken in what they’ve shipped. We think NIST needs to hear from people who’ve actually found the failure modes — not people selling solutions to problems they’ve only theorized about.

The Elephant

We ended our submission with something most respondents wouldn’t touch.

In February 2026, a major AI company was designated a “supply chain risk” by the Department of Defense — for maintaining safety guardrails. The controls that triggered the designation? Memory encryption. Behavioral constraints. Human oversight mechanisms. The same controls this RFI asks the industry to strengthen.

The irony is structural, not political. If NIST publishes guidelines saying “agents should have encrypted memory, identity verification, and behavioral constraints,” and a separate arm of the government designates companies as security threats for implementing those exact controls, the guidelines become decorative. Standards that any sufficiently motivated authority can demand be waived aren’t standards — they’re suggestions.

We recommended that NIST’s guidance explicitly establish security controls for AI agent systems as a floor that no deployment context — commercial, government, or military — should be permitted to lower.

What Happens Next

The comment period closes March 9, 2026. NIST will synthesize the submissions into guidance — likely informing a future Special Publication on agent security. A companion effort, the NCCoE concept paper on agent identity and authorization, has a deadline of April 2.

We’ll be responding to that one too. The identity layer — DIDs, agent cards, inter-agent authentication — is where the hardest unsolved problems live. Publishing an identity document is the easy part. Actually using it for verification at scale, with revocation and discovery and cross-protocol interoperability, is the decade-long engineering challenge.

The full text of our submission is available in the PDF we’ll be filing to the docket. If you’re building persistent agent systems and thinking about security, the OWASP Top 10 for Agentic Applications is the best starting point. If you’re thinking about memory specifically, the OWASP AI Agent Security Cheat Sheet has five concrete controls you can implement today.

The window for shaping these standards is open. It won’t stay open long.