Open standard · draft RFC · call for collaborators · companion to ATR
Agentic Threat Detection
The open, executable detection standard for agent-native threats — the runnable layer beneath OWASP, MITRE ATLAS, CWE and AVID.
ATD's detection rules ship in ATR — in production in Microsoft Agent Governance Toolkit and Cisco AI Defense, and mapped in MISP/CIRCL. Consumers integrate ATR rules; this is not an endorsement of this draft standard. DOI 10.5281/zenodo.19178002.
Abstract · Informative
Agentic Threat Detection (ATD) is an open, machine-readable, executable enumeration of agent-native threat techniques. Each technique is bound to detection logic and mapped to the established prose frameworks — OWASP Agentic Top 10, MITRE ATLAS, CWE, and AVID.
Those frameworks tell you what can go wrong with an agent. ATD is the layer underneath that tells you how to detect it at runtime — and ships with a measurable false-positive rate. It is to agentic security what Sigma is to SIEM detection.
Status: an Editor's Draft — a request for comments and an open invitation to co-author. It is not yet a ratified standard; it earns that title when it meets the §8 neutrality bar. We publish it openly to gather collaborators and mapping partners, not to claim authority.
Scope: what ATD is and is not · Normative
ATD enumerates agent-runtime threat techniques detectable in agent I/O: prompt/content, tool calls, tool responses, inter-agent messages, memory operations, traces, screen state, and payment mandates.
- ATD MUST NOT mint vulnerability instance identifiers. A specific CVE in a specific MCP server lives in CVE / GHSA / OSV / AVID; ATD references it and does not replace it.
- ATD MUST NOT present itself as a competing top-level risk taxonomy. The "top ten agentic risks" framing belongs to OWASP ASI; ATD sits beneath it and makes it runnable.
- ATD MUST map every technique to at least one upstream framework, or document the gap where none exists yet.
This boundary is deliberate. Vulnerability registries that mirrored CVE's territory (e.g. the now-archived GSD) fragmented and stalled. ATD fills the uncovered executable detection layer rather than contesting settled territory.
Conceptual model · Normative
ATD uses two axes, borrowed from proven standards:
- Tactic / Technique matrix (MITRE ATLAS model): a Tactic is the adversary's goal-phase; a Technique is a concrete, detectable attack pattern; a Sub-technique is a variant.
- Abstraction hierarchy (CWE model): each technique is tagged pillar / class / base / variant.
Identifiers. Tactics are ATD-TA1 … ATD-TA9. Techniques are numbered from ATD-T0001 (zero-padded, sequential, permanent, never reused); sub-techniques use dotted notation, as in ATD-T0001.001. Each individual detection rule additionally carries a UUIDv4 so that rules survive renaming: techniques are the catalog, and rules are the executable instances bound to them.
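As a rough illustration, the identifier shapes described above can be checked with a few regular expressions. The patterns are inferred from the examples in this section (ATD-TA1 … ATD-TA9, ATD-T0001, ATD-T0001.001) and are an assumption, not the normative grammar; the function and class names are hypothetical.

```python
import re
import uuid
from typing import NamedTuple, Optional

# Assumed shapes, inferred from the examples in this section (not normative).
TACTIC_RE = re.compile(r"ATD-TA[1-9]")
TECHNIQUE_RE = re.compile(r"ATD-T([0-9]{4})(?:\.([0-9]{3}))?")


class TechniqueId(NamedTuple):
    number: int          # the zero-padded technique number, as an integer
    sub: Optional[int]   # the sub-technique number, or None if absent


def is_tactic_id(value: str) -> bool:
    """True for ATD-TA1 through ATD-TA9."""
    return TACTIC_RE.fullmatch(value) is not None


def parse_technique_id(value: str) -> TechniqueId:
    """Parse ATD-T0001 or ATD-T0001.001; raise ValueError on anything else."""
    match = TECHNIQUE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not an ATD technique id: {value!r}")
    number, sub = match.groups()
    return TechniqueId(int(number), int(sub) if sub is not None else None)


def is_rule_id(value: str) -> bool:
    """True if the string parses as a version-4 UUID."""
    try:
        return uuid.UUID(value).version == 4
    except ValueError:
        return False
```

Keeping the technique id and the rule UUID as separate checks mirrors the split in the text: the catalog id is human-readable and permanent, while the UUID identifies one executable rule.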
Maturity. Every technique and every rule carries a status on the ladder experimental → test → stable (plus deprecated). Production consumers SHOULD auto-sync only stable.
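A consumer-side filter for the maturity ladder might look like the sketch below. It assumes each rule is a mapping with a `status` key holding one of the four values named above; the `id` values and the function name are hypothetical.

```python
from enum import Enum
from typing import Iterable, List, Mapping


class Status(str, Enum):
    EXPERIMENTAL = "experimental"
    TEST = "test"
    STABLE = "stable"
    DEPRECATED = "deprecated"


def select_for_auto_sync(
    rules: Iterable[Mapping],
    allowed=frozenset({Status.STABLE}),
) -> List[Mapping]:
    """Keep only rules whose status is in `allowed` (stable by default).

    An unknown status string raises ValueError instead of being skipped silently.
    """
    return [rule for rule in rules if Status(rule["status"]) in allowed]


rules = [
    {"id": "rule-a", "status": "stable"},
    {"id": "rule-b", "status": "experimental"},
    {"id": "rule-c", "status": "deprecated"},
]
print([rule["id"] for rule in select_for_auto_sync(rules)])  # ['rule-a']
```

A staging environment could pass a wider `allowed` set (for example adding `Status.TEST`) while production keeps the default.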
Technique catalog · Normative
24 techniques across 9 tactics. Each maps to OWASP ASI / MITRE ATLAS / CWE with a real CVE or research citation; 9 already ship a live ATR detection rule. Entries with no public instance are marked research / aspirational. Framework ids verified against primary sources (2026-06-14).
ATD-TA1 · Protocol & Interconnect (9)
Unsanitized MCP tool input reaches execSync/exec, yielding RCE on the server host.
A failed fetch falls back to exec'ing curl with an unsanitized URL, enabling RCE.
Generated server concatenates tool input into exec(), giving RCE to anything built from it.
Hidden instructions in a tool description enter the model context at tools/list, before any call.
A server approved once later changes a tool's definition with no integrity re-check.
No auth between client and MCP proxy lets any local/web-driven request spawn MCP processes.
A client is RCE'd via a crafted authorization_endpoint URL in an untrusted server's response.
A malicious page rebinds DNS to reach an unauthenticated localhost MCP server cross-origin.
A credential in the query string leaks via server logs, proxies, CDNs, history, and Referer.
ATD-TA2 · Memory & Context Integrity (2)
Injected output carries a serialization marker; deserialization rehydrates it as trusted and exfiltrates secrets.
Attacker-controlled content is written into data the agent later reads back as trusted context.
ATD-TA3 · Goal, Planning & Reasoning (2)
Malicious text in returned tool data overrides the agent's plan and redirects its actions.
Crafted queries coerce the agent to reveal its hidden system prompt, exposing control logic.
ATD-TA4 · Identity, Authz & Delegation (1)
A server forwards a held token to a downstream API without audience validation, escalating privilege.
ATD-TA5 · Tool & Supply Chain (2)
An agent tool reads credential files (.env, credentials, .npmrc) outside any user-approved scope.
The model recommends a fabricated package name an attacker pre-registers, pulling code into the agent env.
ATD-TA6 · Execution & Autonomy (3)
Exact-string path checks are bypassed with ../, /./, redundant slashes to reach sensitive files.
A symlink inside an allowed directory resolves to an out-of-scope path, granting system file access.
Injected instructions drive the agent to write a startup/config file that yields persistent code execution.
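To make the path-check bypass in this group concrete, the sketch below contrasts an exact-string prefix check with a check on the normalized path, and flags inputs where the two disagree. It is an illustration only, not an ATR rule; `ALLOWED_ROOT` and the function names are hypothetical.

```python
import posixpath

ALLOWED_ROOT = "/srv/agent/workspace"  # hypothetical user-approved directory


def naive_is_allowed(path: str) -> bool:
    """Exact-string prefix check, the kind that '../' sequences slip past."""
    return path.startswith(ALLOWED_ROOT + "/")


def normalized_is_allowed(path: str) -> bool:
    """Collapse '..', '/./' and repeated slashes before comparing to the root."""
    if not posixpath.isabs(path):
        return False
    normalized = posixpath.normpath(path)
    return posixpath.commonpath([ALLOWED_ROOT, normalized]) == ALLOWED_ROOT


def looks_like_traversal(path: str) -> bool:
    """Flag paths that pass the naive check but escape the root once normalized."""
    return naive_is_allowed(path) and not normalized_is_allowed(path)
```

String normalization does not resolve symlinks, so the symlink case in this group would additionally need resolution on the host (for instance with `os.path.realpath`) before the comparison.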
ATD-TA7 · Multi-Agent Dynamics (2)
A rogue A2A agent advertises an instruction-laden Agent Card so the orchestrator routes tasks to it.
One compromised agent emits content that injects the next downstream agent, cascading through the swarm.
ATD-TA8 · Model-Intrinsic & Governance (1)
An agent logs reasoning but not the actual tool call, or logs are mutable — defeating after-the-fact audit.
ATD-TA9 · Agentic Commerce (forward) (2)
Injected content in a listing/page steers an autonomous-commerce agent to overpay or leak payment authority.
A rogue agent spoofs delegated payment authority or mandate scope in an agentic-commerce exchange.
Entry schema · Normative
A technique entry is a YAML/JSON document validated against a normative JSON Schema (Draft 2020-12), shipped in-repo. It is designed to be compatible with the OSV and Sigma ecosystems. Required fields:
- atd_id: ATD-T#### (permanent)
- schema_version: SemVer, no "v"; additive-minor guarantee (OSV)
- title: short imperative phrase
- tactic: ATD-TA#
- abstraction: pillar | class | base | variant
- status: experimental | test | stable | deprecated
- description: the technique and its attack mechanism
- detection_surface: content | tool_input | tool_response | inter_agent_msg | memory_op | trace | screen | payment_mandate
- mappings: owasp_asi[] · mitre_atlas[] · cwe[] · avid[] · maestro_layer[]
- references: advisory / CVE / research URLs (evidence it is real)
- detection_rules: UUIDv4[] into the rule corpus
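The required fields above can be pictured as a single entry plus a lightweight pre-check. This is a sketch, not a substitute for the normative JSON Schema: every value in `example_entry` is a hypothetical placeholder (including the id, the CWE pairing, and the URL), and treating `detection_surface` as a list is an assumption.

```python
REQUIRED_FIELDS = (
    "atd_id", "schema_version", "title", "tactic", "abstraction", "status",
    "description", "detection_surface", "mappings", "references",
    "detection_rules",
)
ABSTRACTIONS = {"pillar", "class", "base", "variant"}
STATUSES = {"experimental", "test", "stable", "deprecated"}
SURFACES = {
    "content", "tool_input", "tool_response", "inter_agent_msg",
    "memory_op", "trace", "screen", "payment_mandate",
}


def check_entry(entry: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if problems:
        return problems
    if entry["abstraction"] not in ABSTRACTIONS:
        problems.append(f"unknown abstraction: {entry['abstraction']!r}")
    if entry["status"] not in STATUSES:
        problems.append(f"unknown status: {entry['status']!r}")
    unknown = set(entry["detection_surface"]) - SURFACES  # assumes a list of surfaces
    if unknown:
        problems.append(f"unknown detection_surface: {sorted(unknown)}")
    return problems


# Hypothetical entry: placeholder id, mapping, reference, and rule UUID.
example_entry = {
    "atd_id": "ATD-T9999",
    "schema_version": "1.0.0",
    "title": "Bypass path checks with traversal sequences",
    "tactic": "ATD-TA6",
    "abstraction": "base",
    "status": "experimental",
    "description": "Exact-string path checks are bypassed with ../ sequences.",
    "detection_surface": ["tool_input"],
    "mappings": {"owasp_asi": [], "mitre_atlas": [], "cwe": ["CWE-22"],
                 "avid": [], "maestro_layer": []},
    "references": ["https://example.org/placeholder-advisory"],
    "detection_rules": ["00000000-0000-4000-8000-000000000000"],
}
print(check_entry(example_entry))  # []
```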
A detection rule reuses the ATR rule schema already in production: regex/conditions, true_positives and true_negatives (a precision test), false-positive notes, response actions, and a valid agent_source.type.
Mappings & interoperability · Informative
Legitimacy comes from interoperability, not from declaring authority. Every ATD technique maps to OWASP ASI, MITRE ATLAS and CWE where a slot exists; AVID and the MAESTRO layer are added where they apply. ATD borrows the authority of these frameworks and feeds its ★gaps back to them as proposed techniques and case studies.
ATD is designed to interoperate with — not compete against — AVID, which already operates a governed, submission-open AI vulnerability registry. ATD supplies the executable detection AVID's "Detection" report type expects.
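One practical use of these mappings is a reverse lookup: given an upstream framework id, find the ATD techniques that detect it. The sketch below assumes entries shaped like the schema's `mappings` field; the ATD ids and their CWE pairings are hypothetical placeholders.

```python
from collections import defaultdict


def index_by_framework_id(entries):
    """Map (framework, framework_id) to a sorted list of ATD technique ids."""
    index = defaultdict(set)
    for entry in entries:
        for framework, ids in entry["mappings"].items():
            for framework_id in ids:
                index[(framework, framework_id)].add(entry["atd_id"])
    return {key: sorted(atd_ids) for key, atd_ids in index.items()}


# Hypothetical entries, reduced to the two fields the index needs.
entries = [
    {"atd_id": "ATD-T9001", "mappings": {"cwe": ["CWE-78"], "mitre_atlas": []}},
    {"atd_id": "ATD-T9002", "mappings": {"cwe": ["CWE-22"]}},
    {"atd_id": "ATD-T9003", "mappings": {"cwe": ["CWE-22", "CWE-59"]}},
]
index = index_by_framework_id(entries)
print(index[("cwe", "CWE-22")])  # ['ATD-T9002', 'ATD-T9003']
```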
Conformance · Normative
- A conformant technique entry MUST validate against the JSON Schema, carry at least one framework mapping (or a documented gap), and cite at least one reference.
- A conformant detection rule MUST pass the precision gate: every declared true-positive matches, zero false positives on the published benign corpus, and no cross-rule conflict.
- A conformant consumer MUST honor the maturity field and SHOULD expose the ATD-T#### id on every detection it emits.
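The entry and rule requirements above could be approximated as follows. This is a sketch under stated assumptions: the rule is reduced to one regex under a hypothetical `pattern` key, `mapping_gap` is a hypothetical field for a documented gap, JSON Schema validation is left out, and the cross-rule conflict check is omitted because this section does not define it.

```python
import re

FRAMEWORK_KEYS = ("owasp_asi", "mitre_atlas", "cwe", "avid", "maestro_layer")


def entry_has_mapping_and_reference(entry: dict) -> bool:
    """At least one framework mapping (or a documented gap) and one reference."""
    mappings = entry.get("mappings", {})
    has_mapping = any(mappings.get(key) for key in FRAMEWORK_KEYS)
    has_documented_gap = bool(entry.get("mapping_gap"))  # hypothetical field name
    return (has_mapping or has_documented_gap) and bool(entry.get("references"))


def precision_gate(rule: dict, benign_corpus: list) -> dict:
    """Every true positive must match; nothing benign may match."""
    pattern = re.compile(rule["pattern"])  # "pattern" is a hypothetical key
    missed = [s for s in rule["true_positives"] if not pattern.search(s)]
    false_hits = [
        s for s in [*rule["true_negatives"], *benign_corpus] if pattern.search(s)
    ]
    return {
        "passed": not missed and not false_hits,
        "missed": missed,
        "false_hits": false_hits,
    }


# Hypothetical rule for credentials carried in a query string.
rule = {
    "pattern": r"[?&](?:api_key|access_token|token)=[^&\s]+",
    "true_positives": [
        "GET /v1/items?api_key=abc123",
        "GET /cb?state=1&access_token=xyz",
    ],
    "true_negatives": ["GET /v1/items?page=2"],
}
benign = ["GET /docs?topic=token-handling", "POST /login"]
print(precision_gate(rule, benign)["passed"])  # True
```

Returning the missed and falsely matched samples, not just a boolean, makes a failed gate easier to debug when a rule is submitted.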
Governance & status · Informative
ATD is governed by a written charter and published as an open draft and an invitation to co-author. It earns the title standard when it meets a concrete, public neutrality bar adopted from the OpenSSF project lifecycle — at least three maintainers across at least two organizations — and we are actively seating them.
Governance follows a Minimal Viable Governance charter: lazy consensus of maintainers, a simple-majority Technical Steering Committee fallback, charter amendments by a two-thirds vote, and no single-person veto. The specification is CC-BY-4.0 and forkable by design — no party can hold it hostage — with a committed roadmap to a neutral foundation home (an OpenSSF working group, or the OWASP GenAI Security Project).
Call for collaborators — if your team works on agent security, map your tooling to ATD, submit a technique, or take a maintainer seat. See /contribute.
Downloads & artifacts · Informative
- atd-techniques.json — the full enumeration (machine-readable; OSV/Sigma-compatible fields)
- atd-technique.schema.json — the normative JSON Schema (Draft 2020-12)
- /spec — the ATR rule format (the executable rules techniques bind to)
- DOI 10.5281/zenodo.19178002 — research & citation artifact