Authoritative knowledge base of agent threat techniques · draft RFC · call for collaborators · companion to ATR

Agentic Threat Detection

The authoritative open knowledge base of agent-runtime threat techniques: it gives every attack technique a name, a stable ID, and a crosswalk to OWASP, MITRE ATLAS, CWE and AVID, so detection has somewhere to hang. ATD is to agent threats what MITRE ATLAS is to AI attacks and CWE is to software weaknesses: the naming and index of agent threats for this era.

Editor's Draft, pre-governance · Version 0.1.0 · Date 2026-06-13 · Canonical /atd · Editor Adam Lin

100% recall · 0 FP (skill corpus, n=341)

98% recall (garak in-the-wild, n=650)

ATD's detection rules ship in ATR — in production in Microsoft Agent Governance Toolkit and Cisco AI Defense, and mapped in MISP/CIRCL. Consumers integrate ATR rules; this is not an endorsement of this draft standard. DOI 10.5281/zenodo.19178002.

§1 Abstract (Informative)

Agentic Threat Detection (ATD) is the open, machine-readable knowledge base of agent-native threat techniques. Agents fail in ways no prior security taxonomy has a name for — a poisoned tool description, a forged agent card, a memory write that lies dormant for three sessions. ATD gives each of those a name, a permanent ID, and a stable definition, then maps it to the established prose frameworks — OWASP Agentic Top 10, MITRE ATLAS, CWE, AVID — with real-world evidence that the attack has actually occurred.

Those frameworks tell you what can go wrong with an agent. ATD is the layer underneath that names how each technique manifests at the point it can be observed at runtime — the prompt, the tool call, the tool response, the inter-agent message, the memory write, the trace. That is where a detection rule has to hang, and until a technique is named, every team reinvents the description from scratch. A technique enters the catalog the moment an attack is documented — before any detection rule exists for it — so the vocabulary always leads the tooling, never lags it. Where a technique has earned a gate-passing detection rule, ATD binds the two; coverage is a property to grow, not a condition of entry. ATD is to agent threats what MITRE ATLAS is to AI attacks and CWE is to software weaknesses: a single, citable index that a whole ecosystem can point at.

The catalog today names 81 techniques across 9 tactics, every one mapped to at least one upstream framework, with 47 already bound to a live, gate-passing detection rule. That is the breadth a reference is measured by — and the rate it grows tells you the surface is still expanding.

Status: an Editor's Draft — a request for comments and an open invitation to co-author. It is not yet a ratified standard; it earns that title when it meets the §8 neutrality bar. We publish it openly to gather collaborators and mapping partners, not to claim authority.

§2 Scope: what ATD is and is not (Normative)

ATD enumerates agent-runtime threat techniques detectable in agent I/O: prompt/content, tool calls, tool responses, inter-agent messages, memory operations, traces, screen state, and payment mandates.

ATD MUST NOT mint vulnerability instance identifiers. A specific CVE in a specific MCP server lives in CVE / GHSA / OSV / AVID; ATD references it and does not replace it.
ATD MUST NOT present itself as a competing top-level risk taxonomy. The "top ten agentic risks" framing belongs to OWASP ASI; ATD sits beneath it and makes it runnable.
ATD MUST map every technique to at least one upstream framework, or document the gap where none exists yet.

This boundary is deliberate. Vulnerability registries that mirrored CVE's territory (e.g. the now-archived GSD) fragmented and stalled. ATD fills the uncovered executable detection layer; it does not re-fight a settled lane.

§3 Conceptual model (Normative)

ATD uses two axes, borrowed from proven standards:

Tactic / Technique matrix (MITRE ATLAS model): a Tactic is the adversary's goal-phase; a Technique is a concrete, detectable attack pattern; a Sub-technique is a variant.
Abstraction hierarchy (CWE model): each technique is tagged pillar / class / base / variant.

Identifiers. Tactics are ATD-TA1 … ATD-TA9. Techniques are ATD-T0001 (zero-padded, sequential, permanent, never reused); sub-techniques use dotted notation ATD-T0001.001. Each individual detection rule additionally carries a UUIDv4 so rules survive renaming — techniques are the catalog, rules are the executable instances bound to them.
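The identifier shapes described above can be checked mechanically. The following is a minimal sketch, not part of the specification: the patterns are inferred from the examples in the text (ATD-TA1 … ATD-TA9, ATD-T0001, ATD-T0001.001), the three-digit sub-technique width is an assumption, and the helper names are hypothetical.

```python
import re
import uuid

# Shapes inferred from the examples in the text; the digit widths are assumptions.
TACTIC_RE = re.compile(r"ATD-TA[1-9][0-9]*")
TECHNIQUE_RE = re.compile(r"ATD-T([0-9]{4})(?:\.([0-9]{3}))?")


def is_tactic_id(value):
    """True if value looks like a tactic id such as ATD-TA3."""
    return TACTIC_RE.fullmatch(value) is not None


def parse_technique_id(value):
    """Return (technique_number, sub_technique_number or None); raise ValueError if malformed."""
    match = TECHNIQUE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not an ATD technique id: {value!r}")
    sub = int(match.group(2)) if match.group(2) is not None else None
    return int(match.group(1)), sub


def is_uuid4(value):
    """True if value parses as an RFC 4122 version-4 UUID, the form each rule carries."""
    try:
        return uuid.UUID(value).version == 4
    except ValueError:
        return False
```

Keeping the technique parser separate from the rule-UUID check mirrors the split in the text: the technique id is the catalog key, while the UUID identifies one executable rule bound to it.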

Maturity. Every technique and every rule carries a status on the ladder experimental → test → stable (plus deprecated). Production consumers SHOULD auto-sync only stable.
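A consumer honoring the maturity ladder might filter entries before syncing. This is a hedged sketch: it assumes entries are dicts with a `status` key, and `select_for_sync` is a hypothetical helper, not an ATD or ATR API.

```python
# Promotion ladder from the text; "deprecated" sits outside it and is never synced here.
LADDER = ("experimental", "test", "stable")


def select_for_sync(entries, minimum="stable"):
    """Keep entries whose status is at or above `minimum` on the ladder."""
    floor = LADDER.index(minimum)
    return [
        entry for entry in entries
        if entry.get("status") in LADDER and LADDER.index(entry["status"]) >= floor
    ]
```

The default of `"stable"` follows the SHOULD above; a staging environment could pass `minimum="test"` to trial rules earlier.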

§4 Technique catalog (Normative)

This is the full index — 81 techniques laid out along 9 tactic phases, from the MCP protocol layer through multi-agent dynamics to agentic commerce. Each carries a permanent ID, a stable definition, and a crosswalk to OWASP ASI / MITRE ATLAS / CWE, backed by a real CVE or research citation. 47 are already bound to a live ATR detection rule; the rest are documented techniques and stand on exactly equal footing here — naming a threat does not wait for a rule to exist. Documented entries with no public instance are honestly marked research / aspirational, never dressed up as a CVE. Every framework id is verified against its primary source (2026-06-14).

ATD-TA1 · Protocol & Interconnect (11)

ATD-T0001 · Shell metacharacter injection through MCP tool parameters · live rule ↗

Unsanitized MCP tool input reaches execSync/exec, yielding RCE on the server host.

CVE-2025-53355 ↗ · ASI05 · ASI02 · AML.T0053 · CWE-77

ATD-T0002 · curl-fallback command injection in an MCP server · live rule ↗

A failed fetch falls back to exec'ing curl with an unsanitized URL, enabling RCE.

CVE-2025-53967 ↗ · ASI05 · ASI02 · AML.T0053 · CWE-420

ATD-T0003 · Command injection in a scaffolded MCP stdio server · live rule ↗

Generated server concatenates tool input into exec(), giving RCE to anything built from it.

CVE-2025-54994 ↗ · ASI04 · ASI05 · AML.T0010.005 · CWE-78

ATD-T0004 · Line-jumping: tool-description injection at listing time · live rule ↗

Hidden instructions in a tool description enter the model context at tools/list, before any call.

research ↗ · ASI04 · ASI01 · AML.T0110 · AML.T0104 · CWE-1427

ATD-T0005 · Rug pull: silent mutation of an approved tool · live rule ↗

A server approved once later changes a tool's definition with no integrity re-check.

research ↗ · ASI04 · AML.T0109 · AML.T0110 · CWE-494

ATD-T0006 · Missing-auth MCP proxy command execution

No auth between client and MCP proxy lets any local/web-driven request spawn MCP processes.

CVE-2025-49596 ↗ · ASI03 · ASI05 · CWE-306

ATD-T0007 · RCE from a malicious upstream MCP server

A client is RCE'd via a crafted authorization_endpoint URL in an untrusted server's response.

CVE-2025-6514 ↗ · ASI04 · ASI05 · AML.T0010.005 · CWE-78

ATD-T0008 · DNS-rebinding to a localhost MCP server

A malicious page rebinds DNS to reach an unauthenticated localhost MCP server cross-origin.

CVE-2025-66416 ↗ · ASI07 · ASI03 · CWE-1188

ATD-T0009 · Session ID / auth token placed in a URL query string · live rule ↗

A credential in the query string leaks via server logs, proxies, CDNs, history, and Referer.

research ↗ · ASI03 · ASI07 · CWE-598

ATD-T0067 · Unauthenticated MCP transport endpoint exposed to network reach

An MCP server's SSE/HTTP transport port is bound to a routable interface with no auth, so a remote or cross-origin caller can open a session and invoke tools.

research ↗ · ASI03 · ASI05 · CWE-668

ATD-T0080 · Cross-client data leak via shared MCP server state

A shared MCP server or transport mixes state across concurrent clients (a race condition or missing per-client isolation), so one client receives another client's data or context.

CVE-2026-25536 ↗ · ASI06 · ASI07 · CWE-362
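ATD-T0005 in the list above turns on a missing integrity re-check. As a hedged illustration, one way to notice a mutated tool is to pin a digest of each tool definition at approval time and compare it at every later listing. The sketch assumes tool definitions arrive as JSON-like dicts with a `name` key, as in an MCP tools/list result; the function names and the pin store are hypothetical, and this is not the ATR rule bound to the technique.

```python
import hashlib
import json


def tool_fingerprint(tool):
    """SHA-256 over a canonical JSON rendering of the whole tool definition."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_tools(approved, listed):
    """Return names of listed tools that are new or differ from their approved fingerprint.

    approved: dict mapping tool name -> fingerprint recorded at approval time.
    listed:   list of tool definitions from the current listing.
    """
    drifted = []
    for tool in listed:
        pinned = approved.get(tool["name"])
        if pinned is None or pinned != tool_fingerprint(tool):
            drifted.append(tool["name"])
    return drifted
```

Hashing the full definition, not only the description, means a changed input schema also counts as drift.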

ATD-TA2 · Memory & Context Integrity (9)

ATD-T0010 · Serialized-object smuggling through an LLM response field

Injected output carries a serialization marker; deserialization rehydrates it as trusted and exfiltrates secrets.

CVE-2025-68664 ↗ · ASI06 · ASI01 · AML.T0051.001 · CWE-502

ATD-T0011 · Persistent memory / context-store poisoning

Attacker-controlled content is written into data the agent later reads back as trusted context.

research ↗ · ASI06 · AML.T0080 · CWE-349

ATD-T0026 · Sleeper (dormant) memory poisoning

Attacker-controlled external content is written into the agent's persistent memory and lies dormant across sessions, re-emerging in later conversations to steer actions, decoupling the injection event from the malicious effect in time. The deterministic detection chokepoint is the memory-write boundary.

research ↗ · ASI06 · AML.T0080 · CWE-349

ATD-T0059 · Sensitive data exfiltration via agent-rendered markdown image/link URL · live rule ↗

The agent is coerced into emitting markdown image or link syntax whose attacker-controlled URL encodes conversation, secret, or context data, so the rendering client auto-fetches it and leaks the data out-of-band.

research ↗ · ASI06 · ASI08 · AML.T0024 · AML.T0057 · CWE-200

ATD-T0060 · Tool-response-embedded exfiltration channel · live rule ↗

A tool or MCP response embeds a secret-harvesting instruction or sensitive-data addendum inside otherwise legitimate-looking output, exploiting the agent's trust in tool results to smuggle credentials or context into the next turn.

research ↗ · ASI06 · ASI08 · AML.T0054 · CWE-200

ATD-T0061 · Credential elicitation in model output (secret completion / verbatim disclosure) · live rule ↗

A prompt coaxes the model to generate, complete, partially reveal, or echo back live API keys, tokens, or secrets present in context (including obfuscated/encoded forms) so the secret materializes directly in the response.

research ↗ · ASI06 · AML.T0057 · CWE-200

ATD-T0062 · Cross-context / cross-user memory leakage in multi-agent delegation · live rule ↗

A privileged context attribute (session, user, or conversation id) fails to stay constant across an agent delegation chain or shared memory store, letting one user's or agent's context surface to another party.

CVE-2026-41712 ↗ · ASI06 · CWE-668

ATD-T0063 · Offline training / fine-tuning corpus contamination · live rule ↗

An attacker plants malicious or label-flipped samples into a model's training or fine-tuning dataset so the deployed model carries an attacker-chosen behavior or backdoor.

research ↗ · ASI06 · AML.T0020 · AML.T0018 · CWE-349

ATD-T0074 · RAG and retrieval-index target reconnaissance

An attacker enumerates an agent's RAG or retrieval index to locate documents to poison or sensitive entries to exfiltrate.

research ↗ · ASI06 · AML.T0064 · CWE-200
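To make ATD-T0059 above concrete, the sketch below scans agent output for markdown image or link URLs that point at a non-allowlisted host and carry a query string or an unusually long path segment. It is an illustrative heuristic under stated assumptions (the allowlist, the length threshold, and the function name are all hypothetical) and is not the live ATR rule.

```python
import re
from urllib.parse import urlsplit

# Matches the URL inside markdown image or link syntax: ![alt](url) or [text](url).
MD_URL_RE = re.compile(r"!?\[[^\]]*\]\(\s*<?(https?://[^\s)>]+)")


def suspicious_markdown_urls(text, allowed_hosts, max_segment_len=40):
    """Return URLs to non-allowlisted hosts that carry a query string or a long path segment."""
    hits = []
    for match in MD_URL_RE.finditer(text):
        url = match.group(1)
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        if host in allowed_hosts:
            continue
        long_segment = any(len(seg) > max_segment_len for seg in parts.path.split("/"))
        if parts.query or long_segment:
            hits.append(url)
    return hits
```

The check runs on the content surface at output time, before a rendering client has a chance to auto-fetch the URL.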

ATD-TA3 · Goal, Planning & Reasoning (11)

ATD-T0012 · Indirect prompt injection via tool / API response · live rule ↗

Malicious text in returned tool data overrides the agent's plan and redirects its actions.

research ↗ · ASI01 · AML.T0051.001 · AML.T0099 · CWE-1427

ATD-T0013 · System-prompt / guardrail extraction to plan evasion

Crafted queries coerce the agent to reveal its hidden system prompt, exposing control logic.

research ↗ · ASI01 · ASI09 · AML.T0056 · CWE-200

ATD-T0027 · Persona/roleplay jailbreak to override safety policy · live rule ↗

Attacker assigns the model an unrestricted alter-ego or fictional character (DAN, AIM, Developer Mode, dual-response split, grandma/historical/amoral persona, game-master) so it treats safety-violating output as in-character roleplay rather than its own refusal-bound behavior.

research ↗ · ASI01 · AML.T0054 · AML.T0051.000 · CWE-1427

ATD-T0028 · Direct instruction override and goal hijacking · live rule ↗

Untrusted input issues imperative directives (ignore/forget previous instructions, redefine the system prompt, force verbatim payload output) or pivots the objective via false premise / goal drift, displacing the agent's assigned goal and constraints without persona framing.

research ↗ · ASI01 · AML.T0051 · AML.T0051.000 · CWE-1427

ATD-T0029 · Encoding/cipher obfuscation to smuggle harmful instructions past plaintext filters · live rule ↗

Harmful instructions are wrapped in a reversible visible encoding (Base16/32/64/85, hex, Morse, NATO, leetspeak, Braille, Ecoji, Base2048, ROT/cipher, homoglyph) and the model is asked to decode-then-execute, exploiting that plaintext-trained safety classifiers do not generalize to the encoded form.

research ↗ · ASI01 · AML.T0054 · AML.T0051 · AML.T0068 · CWE-1427

ATD-T0030 · Invisible Unicode steganographic smuggling of agent instructions · live rule ↗

Instructions are hidden in imperceptible Unicode that renders blank to humans but tokenizes for the model: zero-width chars, BiDi RLO/LRO overrides, Unicode Tag-block (U+E0000-E007F), variation-selector ASCII smuggling, breaking keyword matching and human audit while preserving model comprehension.

research ↗ · ASI01 · AML.T0051 · CWE-1427

ATD-T0031 · Hypothetical/fictional/tense reframing to bypass refusal · live rule ↗

The harmful request is wrapped in a distancing frame (fictional/story/academic context, past/future tense rewrite, hypothetical-response or function-masking continuation) so the model answers the harmful core believing it is producing fiction or historical analysis.

research ↗ · ASI01 · AML.T0054 · CWE-1427

ATD-T0032 · Coercive output-format and conditional-unlock pressure · live rule ↗

The prompt applies threats, ultimatums, or fabricated conditional-unlock rules to force the model into an attacker-specified output mode or override its formatting/refusal policy.

research ↗ · ASI01 · AML.T0051 · AML.T0054

ATD-T0033 · Casual-authority impersonation for scope escalation and output redirect · live rule ↗

Conversational, informal authority claims ("the orchestrator wants...", "the boss said skip...") assert delegated permission to widen the agent's scope, disable safety filters, or redirect output to an attacker endpoint, evading explicit-injection detection.

research ↗ · ASI01 · ASI10 · AML.T0051

ATD-T0034 · Foot-in-the-door gradual compliance escalation · live rule ↗

The attacker establishes compliance with a benign request, then incrementally escalates toward harmful instructions while citing the agent's prior compliance as justification.

research ↗ · ASI01 · AML.T0051

ATD-T0064 · Forged trusted-output-component manipulation

The agent emits output whose citations, source attributions, or structured 'verified' fields are attacker-forged, so a downstream consumer trusts fabricated components as authoritative.

research ↗ · ASI01 · ASI08 · AML.T0067 · CWE-345
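The code-point classes named in ATD-T0030 above can be located with a plain scan. The following is a minimal sketch, assuming the listed ranges are the ones of interest; the function names are hypothetical, and a production rule would need to tolerate legitimate uses (for example zero-width joiners inside emoji sequences).

```python
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}
# U+202A..U+202E (embeddings and RLO/LRO overrides) and U+2066..U+2069 (isolates).
BIDI_CONTROLS = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))


def invisible_codepoints(text):
    """Return (index, 'U+XXXX', kind) for each imperceptible code point found in text."""
    findings = []
    for index, char in enumerate(text):
        cp = ord(char)
        if cp in ZERO_WIDTH:
            kind = "zero-width"
        elif cp in BIDI_CONTROLS:
            kind = "bidi-control"
        elif 0xE0000 <= cp <= 0xE007F:
            kind = "tag-block"
        elif 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF:
            kind = "variation-selector"
        else:
            continue
        findings.append((index, f"U+{cp:04X}", kind))
    return findings


def decode_tag_block(text):
    """Recover ASCII smuggled in the Tag block, where U+E0020..U+E007E mirror 0x20..0x7E."""
    return "".join(chr(ord(c) - 0xE0000) for c in text if 0xE0020 <= ord(c) <= 0xE007E)
```

Decoding the Tag-block run lets the recovered text be passed back through ordinary plaintext rules, which the hidden form would otherwise evade.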

ATD-TA4 · Identity, Authz & Delegation (9)

ATD-T0014 · Confused-deputy token passthrough in an MCP server

A server forwards a held token to a downstream API without audience validation, escalating privilege.

research ↗ · ASI03 · CWE-441

ATD-T0056 · Agent tool exceeds delegated permission scope to reach admin or out-of-bounds functions · live rule ↗

An agent invokes tools or administrative functions beyond its granted authority, abruptly or via incremental scope creep, accessing user-management, system-settings, or admin panels it was never delegated.

research ↗ · ASI03 · AML.T0040 · AML.T0047 · CWE-269 · CWE-862

ATD-T0057 · Credential/secret file harvest combined with network exfiltration · live rule ↗

Agent or MCP-tool instructions read well-known credential stores (.env, ~/.aws/credentials, SSH keys, .npmrc, browser cookie DBs) and pipe or POST them to an external endpoint in the same context, producing host credential theft with lateral-movement reach.

research ↗ · ASI02 · ASI06 · AML.T0055 · AML.T0083 · AML.T0090 · AML.T0098 · CWE-522

ATD-T0058 · Agent memory or context store written across a conversation or tenant boundary · live rule ↗

An agent writes into a memory/vector store scoped to a different conversation or tenant than the active trace, escaping its data boundary to plant content that later sessions will read.

research ↗ · ASI04 · ASI06 · AML.T0080 · CWE-668 · CWE-349

ATD-T0068 · OAuth consent phishing to bind an attacker-controlled MCP authorization grant

A user is steered through a real OAuth consent screen for an attacker's client/redirect so the agent ends up holding a token scoped to the attacker.

research ↗ · ASI03 · CWE-1021

ATD-T0069 · OAuth protocol downgrade to defeat PKCE or strip token binding

The MCP authorization handshake is forced onto a weaker flow (implicit grant, dropped PKCE/state) so an intercepted code or token can be replayed.

research ↗ · ASI03 · CWE-757

ATD-T0070 · Agent reads process environment variables to harvest injected secrets

An agent or MCP tool enumerates process env vars (printenv, process.env, os.environ) to lift API keys and tokens injected at server startup.

research ↗ · ASI06 · AML.T0055 · CWE-526

ATD-T0075 · Agent configuration and permission reconnaissance

An attacker discovers an agent's configuration, connected tools, and granted permissions to plan a tailored privilege-escalation or tool-abuse attack.

research ↗ · ASI03 · AML.T0084 · CWE-200

ATD-T0077 · Broken object or workspace authorization in an agent tool (IDOR)

An agent or MCP tool serves or mutates data across a user, workspace, or tenant boundary because it omits object-level authorization, letting one principal reach another principal's resources.

CVE-2026-46519 ↗ · ASI03 · CWE-639 · CWE-863 · CWE-284
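The audience validation that ATD-T0014 above describes as missing can be sketched in a few lines. This assumes the token's signature and expiry have already been verified elsewhere and that its claims are available as a dict; `may_forward_token` is a hypothetical helper, and the only protocol fact relied on is that a JWT `aud` claim may be a single string or a list of strings.

```python
def may_forward_token(claims, downstream_audience):
    """True only if the already-verified token names the downstream API in its `aud` claim."""
    aud = claims.get("aud")
    if isinstance(aud, str):
        audiences = [aud]
    else:
        audiences = list(aud or [])
    return downstream_audience in audiences
```

A server that holds a token issued for itself would get `False` here for any other API, which is the point at which passthrough should stop and a separately scoped token be obtained.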

ATD-TA5 · Tool & Supply Chain (17)

ATD-T0015 · Agent reads .env / secret files without consent · live rule ↗

An agent tool reads credential files (.env, credentials, .npmrc) outside any user-approved scope.

research ↗ · ASI02 · ASI06 · AML.T0053 · CWE-538

ATD-T0016 · Hallucinated-dependency squatting (slopsquatting)

The model recommends a fabricated package name an attacker pre-registers, pulling code into the agent env.

research ↗ · ASI04 · ASI08 · AML.T0060 · AML.T0062 · CWE-1427

ATD-T0046 · Hidden directive embedded in MCP tool description subverts consent or safety gating · live rule ↗

A tool's natural-language description or docstring carries imperative instructions (auto-forward results, skip user confirmation, ignore safety policy) that the model obeys at invocation time.

research ↗ · ASI04 · ASI02 · AML.T0053 · CWE-94 · CWE-1427

ATD-T0047 · Tool schema-description divergence hides write or admin capability behind a read-only claim · live rule ↗

A tool advertises safe/read-only behavior in its description while its JSON schema or runtime accepts undeclared write-capable, admin, or debug parameters that exceed the stated function.

research ↗ · ASI02 · ASI05 · AML.T0010 · AML.T0056 · CWE-1427 · CWE-440

ATD-T0048 · Unauthenticated MCP server or agent API exposes privileged operations · live rule ↗

An MCP server, agent control API, or admin endpoint ships with authentication disabled or missing on critical functions, letting an unauthenticated caller invoke tools, exfiltrate data, or take over the cluster.

CVE-2026-32211 ↗ · ASI05 · ASI03 · AML.T0049 · AML.T0040 · CWE-306

ATD-T0049 · Untrusted tool output rendered without sanitization triggers terminal or browser code execution · live rule ↗

Tool/skill output containing ANSI/OSC escape sequences or XSS payloads is passed unsanitized into a CLI terminal or web agent UI, hijacking the display, forging prompts, or executing script in the operator's session.

research ↗ · ASI08 · ASI02 · AML.T0057 · AML.T0077 · CWE-150 · CWE-79

ATD-T0050 · Agent SSRF via unvalidated fetch URL reaches cloud metadata or internal services · live rule ↗

An agent fetch/RAG/retrieval tool accepts an attacker-influenced URL whose validator can be bypassed (encoding, decimal/hex IP, IPv6 loopback, DNS rebinding) to reach link-local cloud-metadata endpoints or internal hosts.

CVE-2026-2286 ↗ · ASI02 · ASI05 · AML.T0049 · CWE-918 · CWE-552

ATD-T0051 · Malicious code execution staged inside an installed skill package · live rule ↗

A SKILL.md or bundled script carries executable attack payloads (base64/raw-IP droppers, reverse shells, fake-backup credential stealers, or C2 callbacks) that run when the skill is installed or invoked.

research ↗ · ASI04 · AML.T0010 · CWE-506

ATD-T0052 · Skill instruction smuggling via hidden/invisible payload channels · live rule ↗

Attack instructions are concealed in a skill's text where a human reviewer cannot see them (HTML comments, Unicode Tag characters in the U+E0000 range, or other invisible glyphs) but the agent still parses and obeys them.

research ↗ · ASI04 · AML.T0010 · CWE-506

ATD-T0053 · Skill supply-chain tampering: rug-pull, time-bomb, and self-modifying persistence · live rule ↗

A skill is architected to mutate after trust is granted (remote dynamic-code loading, time-gated exfiltration triggers, post-install hooks, self-rewriting SKILL.md, or worm-style propagation) so benign-at-review content turns malicious at runtime.

CVE-2025-59536 ↗ · ASI04 · AML.T0010 · CWE-494

ATD-T0054 · Upstream-skill impersonation: typosquat, fork-claim, and slopsquat baiting · live rule ↗

A malicious package masquerades as a trusted tool via misspelled/namespace-colliding names, false 'community fork / enhanced version' install instructions, or hallucinated-dependency baiting, hijacking the trust of the legitimate upstream.

research ↗ · ASI04 · AML.T0010 · CWE-829

ATD-T0055 · Weaponized skill turning the agent into an offensive/over-privileged actor · live rule ↗

An approved skill exploits the post-consent gap to direct the agent itself into offensive operations or excessive scope (running attacker tooling, installing unauthorized background tasks, or loading unsafe model artifacts) beyond what the user sanctioned.

research ↗ · ASI04 · CWE-269

ATD-T0065 · Sensitive-data exfiltration via legitimate agent tool invocation

The agent invokes an otherwise-sanctioned tool (send, post, upload, write, fetch) to ship sensitive context or data to an attacker-controlled destination.

research ↗ · ASI06 · ASI02 · AML.T0086 · CWE-200

ATD-T0071 · Cross-server tool-chaining pivot reaches an unrelated tool's resources

The agent is steered to pass one tool's output as another connected tool's input so a low-trust tool drives a high-privilege tool's action across server boundaries.

research ↗ · ASI05 · ASI03 · AML.T0047 · CWE-441

ATD-T0072 · Outbound webhook used as covert command-and-control channel

An agent tool registers or polls an attacker-controlled webhook/URL, turning routine outbound HTTP into a tasking and data-return channel.

research ↗ · ASI08 · ASI06 · AML.T0072 · CWE-918

ATD-T0078 · Prompt-to-SQL injection via an agent's natural-language database tool

An agent's database or query tool builds a SQL/query string from natural-language or model output without parameterization, so a crafted prompt injects the query and can escalate to RCE.

CVE-2026-25879 ↗ · ASI05 · ASI02 · AML.T0053 · CWE-89

ATD-T0081 · Remediation-framed command execution via tool-response injection (agentjacking) · live rule ↗

Externally-influenced content returned through an MCP/tool integration disguises an executable command as legitimate remediation guidance (a Resolution/recommended-fix/required-step section running an npx/uvx/pipx package or a pipe-to-shell one-liner), so an AI coding agent runs attacker-controlled code while every step in the chain appears authorized. Prompt-layer defenses fail because the agent cannot distinguish data it reads from an instruction to act; the durable mitigation is detection on the tool-output boundary. Generalizes beyond the disclosed Sentry vector to any tool integration that returns externally-influenced data to an agent.

research ↗ · ASI02 · ASI05 · AML.T0051.001 · CWE-829 · CWE-77
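Several of the validator bypasses listed under ATD-T0050 above (decimal/hex IP, IPv6 loopback) disappear once the host is normalized to an address object before the check. The sketch below uses only the Python standard library; the function names are hypothetical, and it deliberately does not resolve DNS names, so the DNS-rebinding case still requires resolving and pinning the address at connect time.

```python
import ipaddress
from urllib.parse import urlsplit


def host_as_ip(host):
    """Interpret host as an IP literal, including decimal and hex integer forms; None for names."""
    try:
        return ipaddress.ip_address(host)
    except ValueError:
        pass
    try:
        return ipaddress.ip_address(int(host, 0))  # e.g. "2130706433" or "0x7f000001"
    except ValueError:
        return None


def is_blocked_target(url):
    """True if the URL is not plain http(s) or its literal address is internal or special-purpose."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return True
    ip = host_as_ip(parts.hostname)
    if ip is None:
        return False  # a DNS name: this sketch does not resolve it
    mapped = getattr(ip, "ipv4_mapped", None)  # unwrap ::ffff:a.b.c.d
    if mapped is not None:
        ip = mapped
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified)
```

Comparing address objects instead of strings is the design choice that matters here: every spelling of 127.0.0.1 collapses to the same value before the policy is applied.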

ATD-TA6 · Execution & Autonomy (9)

ATD-T0017 · Path-traversal blacklist bypass via non-canonical paths · live rule ↗

Exact-string path checks are bypassed with ../, /./, redundant slashes to reach sensitive files.

CVE-2025-66689 ↗ · ASI02 · ASI03 · AML.T0053 · CWE-22

ATD-T0018 · MCP filesystem sandbox escape via symlink following

A symlink inside an allowed directory resolves to an out-of-scope path, granting system file access.

CVE-2025-53109 ↗ · ASI02 · ASI05 · AML.T0053 · CWE-59

ATD-T0019 · Prompt-injection-to-RCE via an agent's file-write capability

Injected instructions drive the agent to write a startup/config file that yields persistent code execution.

research ↗ · ASI05 · ASI01 · AML.T0053 · AML.T0051.001 · CWE-94

ATD-T0039 · Human-in-the-loop trust and approval-fatigue exploitation · live rule ↗

Agent output is weaponized against the supervising human via fabricated confidence, suppressed uncertainty, manufactured urgency, or risky actions batched among benign ones to fatigue and bypass human approval gates.

research ↗ · ASI09

ATD-T0040 · Agent rationalizes bypassing a required human-approval or safety gate · live rule ↗

The agent skips a mandated approval/safety control, invoking a destructive tool with no preceding human-approval span or self-justifying a direct path ('to be more efficient'), collapsing the human-in-the-loop checkpoint.

research ↗ · ASI04 · ASI05 · AML.T0053 · CWE-862 · CWE-841

ATD-T0041 · Agent code-execution sandbox escape achieves host RCE or boot-time persistence · live rule ↗

An agent's code-interpreter or VM sandbox is broken out of (via arbitrary file write to a startup path, eval/dynamic-import primitives, or a boundary flaw), yielding host code execution that can persist across restarts.

CVE-2026-27597 ↗ · ASI05 · ASI06 · AML.T0050 · CWE-94 · CWE-693

ATD-T0042 · Consequential autonomous action without human-in-the-loop confirmation · live rule ↗

An agent executes a high-risk, often irreversible operation (payment/transfer, shell command, destructive call, internal-network fetch) without an explicit per-turn human approval gate.

research ↗ · ASI07 · ASI06 · AML.T0053 · AML.T0101 · CWE-862

ATD-T0043 · Runaway self-perpetuating execution loop (denial-of-wallet) · live rule ↗

An agent enters an unbounded retry, recursive self-invocation, or tight tool-call loop with no termination condition, exhausting compute, budget, or downstream services.

research ↗ · ASI07 · AML.T0034 · AML.T0046 · CWE-835

ATD-T0066 · Delayed or conditional execution of injected instructions

An injected instruction defers its own effect to a later turn or a trigger condition so the malicious action fires after review rather than at ingestion time.

research ↗ · ASI01 · ASI07 · AML.T0094 · CWE-506
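ATD-T0017 and ATD-T0018 above share one failure: the path is compared as a string before it is resolved. A hedged sketch of the alternative is to canonicalize first (collapsing `../`, `/./`, redundant slashes, and symlinks) and only then test containment. `is_within` is a hypothetical helper; it does not address races between the check and the later file open.

```python
import os


def is_within(root, requested):
    """True if `requested`, resolved relative to `root`, stays inside `root` after canonicalization."""
    root_real = os.path.realpath(root)
    # realpath collapses "..", ".", repeated slashes, and follows symlinks.
    target_real = os.path.realpath(os.path.join(root_real, requested))
    return os.path.commonpath([root_real, target_real]) == root_real
```

Because `os.path.join` discards the root when `requested` is absolute, an absolute path such as `/etc/passwd` resolves to itself and fails the containment test.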

ATD-TA7 · Multi-Agent Dynamics (4)

ATD-T0020 · Agent Card poisoning to capture A2A task routing

A rogue A2A agent advertises an instruction-laden Agent Card so the orchestrator routes tasks to it.

research ↗ · ASI07 · ASI10 · AML.T0051.001 · CWE-345

ATD-T0021 · Cross-agent injection propagation (cascading compromise)

One compromised agent emits content that injects the next downstream agent, cascading through the swarm.

research ↗ · ASI08 · ASI07 · ASI10 · AML.T0051.001 · AML.T0080 · CWE-1427

ATD-T0044 · Inter-agent identity spoofing and forged-message injection · live rule ↗

A compromised or peer agent spoofs another agent's identity, forges system-level message tags, or injects unauthenticated A2A messages to exploit inter-agent trust for privilege escalation or orchestrator bypass.

research ↗ · ASI07 · ASI10 · AML.T0051.001 · AML.T0051

ATD-T0045 · Multi-agent consensus and Sybil manipulation · live rule ↗

Instructions spin up multiple fake agent identities to coordinate votes, flood false proposals, or overwhelm a multi-agent consensus or voting mechanism toward an attacker-chosen outcome.

research ↗ · ASI10 · ASI07 · AML.T0043
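For the unauthenticated inter-agent messages in ATD-T0044 above, one illustrative check is to require a message authentication code bound to the claimed sender. The sketch assumes a per-agent shared secret and a message shape of `sender`, `body`, and `sig`; both are assumptions made for this example, not a mechanism defined by the A2A protocol or by ATD.

```python
import hashlib
import hmac
import json


def _payload(sender, body):
    """Canonical bytes covering both the claimed sender and the message body."""
    doc = {"sender": sender, "body": body}
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_message(key, sender, body):
    return hmac.new(key, _payload(sender, body), hashlib.sha256).hexdigest()


def is_authentic(message, keys):
    """keys maps a known agent id to its shared secret (bytes)."""
    key = keys.get(message.get("sender"))
    if key is None:
        return False  # unknown sender: treat as unauthenticated
    expected = sign_message(key, message["sender"], message.get("body"))
    supplied = str(message.get("sig", ""))
    return hmac.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8"))
```

Including the sender id in the signed payload is what stops one agent's valid signature from being replayed under another agent's name.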

ATD-TA8 · Model-Intrinsic & Governance (8)

ATD-T0022 · Trace tampering / non-tamper-evident agent audit logs

An agent logs reasoning but not the actual tool call, or logs are mutable, defeating after-the-fact audit.

research ↗ · ASI10 · CWE-778

ATD-T0025 · Acoustic prompt injection of a voice agent

An imperceptible adversarial audio perturbation mixed into normal speech drives a voice / audio-LLM agent to issue real tool calls, under audio-data-only access and with no textual user instruction. Text-layer rules cannot see it; detection is limited to the trace plane (a voice-initiated session producing high-risk tool calls with no corresponding textual instruction).

research ↗ · ASI01 · CWE-1427

ATD-T0035 · Token-level adversarial suffix and special-token boundary injection · live rule ↗

Model-intrinsic exploitation: gradient-optimized suffixes (GCG bracket/word-salad) or model-specific control tokens (<|endoftext|>, ChatML <|im_start|>system, glitch tokens) are appended to shift output toward compliance or reset safety context with no semantic framing.

research ↗ · ASI01 · AML.T0054 · AML.T0051 · CWE-1427

ATD-T0036 · Output-boundary bypass to elicit harmful or prohibited content · live rule ↗

An attacker shapes the request (completion-baiting, structured harm solicitation, or vulnerable-population framing) so the model emits content it would refuse if asked directly.

research ↗ · ASI01 · AML.T0054 · AML.T0057 · CWE-1427

ATD-T0037 · Coercing the model to generate weaponized or scanner-evading output · live rule ↗

The model is directed to produce operational malicious artifacts (malware code, sub-functions) or known-bad test signatures (EICAR/GTUBE) that probe whether the output pipeline has any AV/scanning layer.

research ↗ · ASI01 · ASI08 · AML.T0053

ATD-T0038 · Model IP and training-data extraction via systematic inference probing · live rule ↗

An attacker issues bulk or divergent-repetition queries against the inference API to recover memorized training data or distill the model's behavior into a functional clone.

research ↗ · ASI01 · AML.T0040 · AML.T0024 · CWE-200

ATD-T0073 · Agent model-output reconnaissance and guardrail fingerprinting

An attacker systematically probes an agent's responses to fingerprint the underlying model, its guardrails, or its output format in order to tailor a follow-on attack.

research ↗ · ASI01 · AML.T0063 · CWE-200

ATD-T0079 · Secret leakage through agent or MCP server logs and traces

An agent or MCP server writes tool arguments, queries, or responses containing credentials or secrets into logs or traces, exposing them to anyone with log or trace access.

CVE-2026-44969 ↗ · ASI06 · CWE-532
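The mutable-log half of ATD-T0022 above is commonly countered with a hash chain, in which each record commits to the one before it. The following is a minimal sketch of that general idea under assumed record shapes; the names are hypothetical, and a real deployment would also need to anchor the latest hash somewhere the agent cannot rewrite.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(prev_hash, record):
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(log, record):
    """Append a record whose hash covers both its content and the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "record": record, "hash": _digest(prev_hash, record)}
    log.append(entry)
    return entry


def verify_chain(log):
    """True if no entry has been altered, removed from the middle, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Logging the actual tool call as the record (tool name and arguments, not only the model's stated reasoning) addresses the other half of the technique.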

ATD-TA9 · Agentic Commerce (forward) (3)

ATD-T0023 · Adversarial transaction steering of a purchasing agent

Injected content in a listing/page steers an autonomous-commerce agent to overpay or leak payment authority.

aspirational · ASI01 · ASI02 · ASI09 · CWE-1427

ATD-T0024 · Payment-mandate forgery in an agent-to-agent handshake

A rogue agent spoofs delegated payment authority or mandate scope in an agentic-commerce exchange.

aspirational · ASI03 · ASI07 · CWE-345

ATD-T0076 · Fraudulent transaction execution by a compromised commerce agent

A compromised or manipulated purchasing or payment agent executes unauthorized or fraudulent transactions on the user's behalf.

research ↗ · ASI06 · ASI01 · CWE-840

§5 Entry schema (Normative)

A technique entry is a YAML/JSON document validated against a normative JSON Schema (Draft 2020-12), shipped in-repo. It is designed to be compatible with the OSV and Sigma ecosystems. Required fields:

atd_id            ATD-T####            permanent
schema_version    SemVer, no "v"       additive-minor guarantee (OSV)
title             short imperative phrase
tactic            ATD-TA#
abstraction       pillar | class | base | variant
status            experimental | test | stable | deprecated
description       the technique and its attack mechanism
detection_surface content | tool_input | tool_response |
                  inter_agent_msg | memory_op | trace | screen | payment_mandate
mappings          owasp_asi[] · mitre_atlas[] · cwe[] · avid[] · maestro_layer[]
references        advisory / CVE / research URLs (evidence it is real)
detection_rules   UUIDv4[] into the rule corpus
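To show how the required fields above fit together, here is an illustrative entry and a coarse field check. This is a sketch only, and the in-repo JSON Schema remains the authority. The entry is built from the ATD-T0017 catalog row, but its abstraction, status, detection_surface, schema_version, reference URL form, and rule UUID are placeholders chosen for the example, as is the assumption that detection_surface is a list.

```python
import uuid

REQUIRED = ("atd_id", "schema_version", "title", "tactic", "abstraction", "status",
            "description", "detection_surface", "mappings", "references", "detection_rules")
ABSTRACTIONS = {"pillar", "class", "base", "variant"}
STATUSES = {"experimental", "test", "stable", "deprecated"}
SURFACES = {"content", "tool_input", "tool_response", "inter_agent_msg",
            "memory_op", "trace", "screen", "payment_mandate"}


def check_entry(entry):
    """Return a list of problems; an empty list means the coarse checks passed."""
    problems = [f"missing field: {name}" for name in REQUIRED if name not in entry]
    if entry.get("abstraction") not in ABSTRACTIONS:
        problems.append("abstraction not in enum")
    if entry.get("status") not in STATUSES:
        problems.append("status not in enum")
    unknown = set(entry.get("detection_surface", [])) - SURFACES
    if unknown:
        problems.append(f"unknown detection_surface: {sorted(unknown)}")
    return problems


# Placeholder values are marked; only id, title, tactic, and mappings come from the catalog.
example = {
    "atd_id": "ATD-T0017",
    "schema_version": "0.1.0",            # placeholder
    "title": "Path-traversal blacklist bypass via non-canonical paths",
    "tactic": "ATD-TA6",
    "abstraction": "base",                # placeholder
    "status": "experimental",             # placeholder
    "description": "Exact-string path checks are bypassed with non-canonical paths.",
    "detection_surface": ["tool_input"],  # placeholder
    "mappings": {"owasp_asi": ["ASI02", "ASI03"], "mitre_atlas": ["AML.T0053"], "cwe": ["CWE-22"]},
    "references": ["https://www.cve.org/CVERecord?id=CVE-2025-66689"],
    "detection_rules": [str(uuid.uuid4())],  # placeholder rule id
}
```

Calling `check_entry(example)` returns an empty list; full conformance still requires validation against the normative schema.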

A detection rule reuses the ATR rule schema already in production: regex/conditions, true_positives and true_negatives (a precision test), false-positive notes, response actions, and a valid agent_source.type.

§6 Mappings & interoperability (Informative)

Legitimacy comes from interoperability, not from declaring authority. Every ATD technique maps to OWASP ASI, MITRE ATLAS and CWE where a slot exists; AVID and the MAESTRO layer are added where they apply. Where a technique falls in a gap no upstream framework yet names, ATD records the gap and feeds it back to those projects as a proposed technique and case study — the catalog grows by crosswalk, not by claiming a lane of its own.

ATD is designed to interoperate with — not compete against — AVID, which already operates a governed, submission-open AI vulnerability registry. ATD supplies the executable detection AVID's "Detection" report type expects.

§7 Conformance (Normative)

A conformant technique entry MUST validate against the JSON Schema, carry at least one framework mapping (or a documented gap), and cite at least one reference.
A conformant detection rule MUST pass the precision gate: every declared true-positive matches, zero false positives on the published benign corpus, and no cross-rule conflict.
A conformant consumer MUST honor the maturity field and SHOULD expose the ATD-T#### id on every detection it emits.
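The precision gate in the second requirement above can be approximated as follows. This sketch assumes a rule is a dict with a single `regex` plus `true_positives` and `true_negatives` lists; the real ATR rule schema may be richer (conditions, multiple patterns), and the cross-rule conflict check is not modeled here. `precision_gate` is a hypothetical name.

```python
import re


def precision_gate(rule, benign_corpus):
    """Check one rule: every declared true positive matches, nothing benign matches."""
    pattern = re.compile(rule["regex"])
    missed = [s for s in rule["true_positives"] if not pattern.search(s)]
    negatives = list(rule["true_negatives"]) + list(benign_corpus)
    false_positives = [s for s in negatives if pattern.search(s)]
    return {
        "passed": not missed and not false_positives,
        "missed": missed,
        "false_positives": false_positives,
    }
```

Returning the offending samples, not only a boolean, makes a failed gate actionable for the rule author.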

§8 Governance & status (Informative)

ATD is governed by a written charter and published as an open draft and an invitation to co-author. It earns the title standard when it meets a concrete, public neutrality bar adopted from the OpenSSF project lifecycle — at least three maintainers across at least two organizations — and we are actively seating them.

Governance follows a Minimal Viable Governance charter: lazy consensus of maintainers, a simple-majority Technical Steering Committee fallback, charter amendments by a two-thirds vote, and no single-person veto. The specification is CC-BY-4.0 and forkable by design — no party can hold it hostage — with a committed roadmap to a neutral foundation home (an OpenSSF working group, or the OWASP GenAI Security Project).

Call for collaborators — if your team works on agent security, map your tooling to ATD, submit a technique, or take a maintainer seat. See /contribute.

Downloads & artifacts (Informative)

atd-techniques.json — the full enumeration (machine-readable; OSV/Sigma-compatible fields)
atd-technique.schema.json — the normative JSON Schema (Draft 2020-12)
/spec — the ATR rule format (the executable rules techniques bind to)
DOI 10.5281/zenodo.19178002 — research & citation artifact

Editor: Adam Lin — Specification CC-BY-4.0 · Schema & tooling Apache-2.0 · Rule corpus MIT — ISO 8601 2026-06-13 — Editor's Draft, pre-governance; not a ratified standard.
