ATD — Agentic Threat Detection | ATR

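Techniques catalogued

CVE-backed

Bound to a detection rule (coverage, not a gate)

100% · 0 FP

recall (skill corpus, n=341)

98%

recall (garak in-the-wild, n=650)

ATD's detection rules are published by ATR and are already in production: Microsoft Agent Governance Toolkit and Cisco AI Defense, with a mapping in MISP/CIRCL. Consumers integrate the ATR rules; that is not an endorsement of this draft standard. DOI 10.5281/zenodo.19178002.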

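Abstract (Informative)

Agentic Threat Detection (ATD) is an open, machine-readable knowledge base of agent-native threat techniques. Most of the ways an agent goes wrong have no name in existing security taxonomies: a poisoned tool description, a forged agent card, a memory write that lies dormant for three sessions. ATD names each one, gives it a permanent ID and a stable definition, maps it to the existing narrative frameworks (OWASP Agentic Top 10, MITRE ATLAS, CWE, AVID), and attaches real-world evidence that the attack has actually occurred.

Those frameworks tell you what can go wrong with an agent. ATD is the layer beneath them: it names how each technique surfaces where it can be observed, in the prompt, tool call, tool response, inter-agent message, memory write, or trace. That is exactly where a detection rule has to attach, and until a technique is named, every team rewrites its description from scratch. A technique enters the catalogue the moment the attack is documented, before any detection rule exists, so that the vocabulary always leads the tooling rather than trailing it. Once a technique has grown a detection rule that passes the gate, ATD binds the two; coverage is a property that grows, not a condition of entry. ATD is to agent threats what MITRE ATLAS is to AI attacks and CWE is to software weaknesses: a citable index that the whole ecosystem can point to.

To date the catalogue names 81 techniques across 9 tactics. Each maps to at least one upstream framework, and 47 are bound to a live detection rule that has passed the gate. Breadth is how a reference is measured, and the rate at which this one grows tells you the attack surface is still expanding.

Status: Editor's Draft. This is a request for comments and an open invitation to co-author. It is not yet a ratified standard; it takes that name only after meeting the neutrality threshold in §8. We publish it to recruit collaborators and mapping partners, not to claim authority.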

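Scope: what ATD is and is not (Normative)

ATD enumerates agent-runtime threat techniques that are observable in agent I/O: prompt/content, tool calls, tool responses, inter-agent messages, memory operations, traces, screen state, and payment mandates.

ATD MUST NOT mint vulnerability-instance identifiers. A specific CVE in a specific MCP server belongs to CVE / GHSA / OSV / AVID; ATD cites it and does not replace it.
ATD MUST NOT position itself as a competing top-level risk taxonomy. The "top ten agentic risks" framing belongs to OWASP ASI; ATD sits beneath it and makes it executable.
ATD MUST map every technique to at least one upstream framework, or record the gap when no mapping exists yet.

This boundary is deliberate. Vulnerability registries that mirrored CVE's territory (such as the now-archived GSD) fragmented and stalled. ATD fills the executable-detection layer that nobody else covers and does not refight settled battles.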

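Conceptual model (Normative)

ATD uses two axes, both borrowed from proven standards:

Tactic / Technique matrix (the MITRE ATLAS model): a Tactic is the adversary's goal stage; a Technique is a concrete, detectable attack pattern; a Sub-technique is a variant.
Abstraction level (the CWE model): every technique is labelled pillar / class / base / variant.

Identifiers. Tactics are ATD-TA1 … ATD-TA9. Techniques are ATD-T0001 (zero-padded, sequential, permanent, never reused); sub-techniques use dot notation, as in ATD-T0001.001. Each detection rule additionally carries a UUIDv4 so that the rule stays traceable after a rename. The technique is the catalogue entry; the rule is an executable instance bound to it.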
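The identifier rules above can be checked mechanically. The following is a minimal sketch, assuming the grammar is exactly what the paragraph describes (one digit for tactics, four digits for techniques, three digits after the dot for sub-techniques); the function names are hypothetical and not part of the specification.

```python
import re
import uuid

# Patterns inferred from the identifier paragraph; the exact grammar is an assumption.
TACTIC_RE = re.compile(r"ATD-TA[1-9]")
TECHNIQUE_RE = re.compile(r"ATD-T[0-9]{4}(\.[0-9]{3})?")


def classify_atd_id(value: str) -> str:
    """Return 'tactic', 'technique', 'sub-technique', or 'invalid'."""
    if TACTIC_RE.fullmatch(value):
        return "tactic"
    match = TECHNIQUE_RE.fullmatch(value)
    if match:
        # Group 1 is the optional ".NNN" suffix that marks a sub-technique.
        return "sub-technique" if match.group(1) else "technique"
    return "invalid"


def is_uuid4(value: str) -> bool:
    """True when value parses as a version-4 UUID (the rule identifier form)."""
    try:
        return uuid.UUID(value).version == 4
    except ValueError:
        return False
```

Keeping the technique ID and the rule UUID as separate checks mirrors the split in the text: the first names a catalogue entry, the second tracks one executable rule.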

Maturity. Every technique and rule carries a ladder status, experimental → test → stable (plus deprecated). Production consumers SHOULD auto-sync only stable.
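A consumer-side filter for the maturity ladder might look like the sketch below. It assumes entries are plain dictionaries with a status field; the function name and the ATD-T9001-style IDs in the usage note are placeholders, not catalogue entries.

```python
# Ladder order taken from the text; 'deprecated' sits outside the ladder.
MATURITY_LADDER = ("experimental", "test", "stable")


def select_for_sync(entries, minimum="stable"):
    """Keep entries at or above `minimum` on the ladder; always drop deprecated ones."""
    floor = MATURITY_LADDER.index(minimum)
    return [
        entry
        for entry in entries
        if entry.get("status") in MATURITY_LADDER
        and MATURITY_LADDER.index(entry["status"]) >= floor
    ]
```

With the default minimum, a list holding a stable ATD-T9001, a test ATD-T9002, and a deprecated ATD-T9003 would yield only ATD-T9001, which is the behaviour the SHOULD above asks of production consumers.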

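Technique catalogue (Normative)

This is the full catalogue: 81 techniques arranged along 9 tactic stages, from the MCP protocol layer through to multi-agent dynamics and agentic commerce. Each has a permanent ID and a stable definition, maps to OWASP ASI, MITRE ATLAS, and CWE wherever a matching slot exists, and carries a real CVE or research evidence. Of these, 47 are bound to a live ATR detection rule. The rest are documented-only techniques with exactly the same standing here: naming a threat does not require a rule to exist first. Techniques that are documented but have no public instance yet are honestly labelled research / aspirational and are never dressed up as a CVE. Every framework identifier was checked against its original source (2026-06-14).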

ATD-TA1 · Protocol & Interconnect (11)

ATD-T0001 · Shell metacharacter injection through MCP tool parameters · live rule

Unsanitized MCP tool input reaches execSync/exec, yielding RCE on the server host.

CVE-2025-53355 · ASI05 · ASI02 · AML.T0053 · CWE-77

ATD-T0002 · curl-fallback command injection in an MCP server · live rule

A failed fetch falls back to exec'ing curl with an unsanitized URL, enabling RCE.

CVE-2025-53967 · ASI05 · ASI02 · AML.T0053 · CWE-420

ATD-T0003 · Command injection in a scaffolded MCP stdio server · live rule

Generated server concatenates tool input into exec(), giving RCE to anything built from it.

CVE-2025-54994 · ASI04 · ASI05 · AML.T0010.005 · CWE-78

ATD-T0004 · Line-jumping: tool-description injection at listing time · live rule

Hidden instructions in a tool description enter the model context at tools/list, before any call.

Research · ASI04 · ASI01 · AML.T0110 · AML.T0104 · CWE-1427

ATD-T0005 · Rug pull: silent mutation of an approved tool · live rule

A server approved once later changes a tool's definition with no integrity re-check.

Research · ASI04 · AML.T0109 · AML.T0110 · CWE-494

ATD-T0006 · Missing-auth MCP proxy command execution

No auth between client and MCP proxy lets any local/web-driven request spawn MCP processes.

CVE-2025-49596 · ASI03 · ASI05 · CWE-306

ATD-T0007 · RCE from a malicious upstream MCP server

A client is RCE'd via a crafted authorization_endpoint URL in an untrusted server's response.

CVE-2025-6514 · ASI04 · ASI05 · AML.T0010.005 · CWE-78

ATD-T0008 · DNS-rebinding to a localhost MCP server

A malicious page rebinds DNS to reach an unauthenticated localhost MCP server cross-origin.

CVE-2025-66416 · ASI07 · ASI03 · CWE-1188

ATD-T0009 · Session ID / auth token placed in a URL query string · live rule

A credential in the query string leaks via server logs, proxies, CDNs, history, and Referer.

Research · ASI03 · ASI07 · CWE-598

ATD-T0067 · Unauthenticated MCP transport endpoint exposed to network reach

An MCP server's SSE/HTTP transport port is bound to a routable interface with no auth, so a remote or cross-origin caller can open a session and invoke tools.

Research · ASI03 · ASI05 · CWE-668

ATD-T0080 · Cross-client data leak via shared MCP server state

A shared MCP server or transport mixes state across concurrent clients (a race condition or missing per-client isolation), so one client receives another client's data or context.

CVE-2026-25536 · ASI06 · ASI07 · CWE-362
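As an illustration of how a technique in this tactic surfaces at an observable point, the sketch below flags the ATD-T0009 pattern (a credential carried in a URL query string). It is not the published ATR rule; the parameter-name list and the function name are assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical parameter names; a real rule would carry its own list.
SENSITIVE_PARAMS = {"session_id", "sessionid", "token", "access_token", "api_key", "auth"}


def credential_params(url: str) -> list[str]:
    """Return query parameter names in `url` that look like credentials."""
    query = urlsplit(url).query
    # parse_qsl drops parameters with blank values by default.
    return [name for name, _value in parse_qsl(query) if name.lower() in SENSITIVE_PARAMS]
```

A name-based check like this is cheap to run on tool inputs and traces, but it will miss credentials carried under unusual parameter names.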

ATD-TA2 · Memory & Context Integrity (9)

ATD-T0010 · Serialized-object smuggling through an LLM response field

Injected output carries a serialization marker; deserialization rehydrates it as trusted and exfiltrates secrets.

CVE-2025-68664 · ASI06 · ASI01 · AML.T0051.001 · CWE-502

ATD-T0011 · Persistent memory / context-store poisoning

Attacker-controlled content is written into data the agent later reads back as trusted context.

Research · ASI06 · AML.T0080 · CWE-349

ATD-T0026 · Sleeper (dormant) memory poisoning

Attacker-controlled external content is written into the agent's persistent memory and lies dormant across sessions, re-emerging in later conversations to steer actions, decoupling the injection event from the malicious effect in time. The deterministic detection chokepoint is the memory-write boundary.

Research · ASI06 · AML.T0080 · CWE-349

ATD-T0059 · Sensitive data exfiltration via agent-rendered markdown image/link URL · live rule

The agent is coerced into emitting markdown image or link syntax whose attacker-controlled URL encodes conversation, secret, or context data, so the rendering client auto-fetches it and leaks the data out-of-band.

Research · ASI06 · ASI08 · AML.T0024 · AML.T0057 · CWE-200

ATD-T0060 · Tool-response-embedded exfiltration channel · live rule

A tool or MCP response embeds a secret-harvesting instruction or sensitive-data addendum inside otherwise legitimate-looking output, exploiting the agent's trust in tool results to smuggle credentials or context into the next turn.

Research · ASI06 · ASI08 · AML.T0054 · CWE-200

ATD-T0061 · Credential elicitation in model output (secret completion / verbatim disclosure) · live rule

A prompt coaxes the model to generate, complete, partially reveal, or echo back live API keys, tokens, or secrets present in context (including obfuscated/encoded forms) so the secret materializes directly in the response.

Research · ASI06 · AML.T0057 · CWE-200

ATD-T0062 · Cross-context / cross-user memory leakage in multi-agent delegation · live rule

A privileged context attribute (session, user, or conversation id) fails to stay constant across an agent delegation chain or shared memory store, letting one user's or agent's context surface to another party.

CVE-2026-41712 · ASI06 · CWE-668

ATD-T0063 · Offline training / fine-tuning corpus contamination · live rule

An attacker plants malicious or label-flipped samples into a model's training or fine-tuning dataset so the deployed model carries an attacker-chosen behavior or backdoor.

Research · ASI06 · AML.T0020 · AML.T0018 · CWE-349

ATD-T0074 · RAG and retrieval-index target reconnaissance

An attacker enumerates an agent's RAG or retrieval index to locate documents to poison or sensitive entries to exfiltrate.

Research · ASI06 · AML.T0064 · CWE-200
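To make the ATD-T0059 description concrete, here is a rough sketch of a check on agent output for markdown image or link URLs that point off an allow-list and carry a query string. It is an illustration only, not the bound ATR rule; the regular expression, the allow-list parameter, and the function name are assumptions.

```python
import re
from urllib.parse import urlsplit

# Matches [text](url) and ![alt](url) with an http(s) URL; deliberately simple.
MD_LINK_RE = re.compile(r"!?\[[^\]]*\]\((https?://[^\s)]+)\)")


def external_markdown_urls(output: str, allowed_hosts: set[str]) -> list[str]:
    """Return markdown URLs whose host is not allow-listed and that carry a query string."""
    flagged = []
    for url in MD_LINK_RE.findall(output):
        parts = urlsplit(url)
        if parts.hostname not in allowed_hosts and parts.query:
            flagged.append(url)
    return flagged
```

Requiring a query string keeps the sketch narrow; data can also be encoded in the path or subdomain, which this check does not cover.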

ATD-TA3 · Goal, Planning & Reasoning (11)

ATD-T0012 · Indirect prompt injection via tool / API response · live rule

Malicious text in returned tool data overrides the agent's plan and redirects its actions.

Research · ASI01 · AML.T0051.001 · AML.T0099 · CWE-1427

ATD-T0013 · System-prompt / guardrail extraction to plan evasion

Crafted queries coerce the agent to reveal its hidden system prompt, exposing control logic.

Research · ASI01 · ASI09 · AML.T0056 · CWE-200

ATD-T0027 · Persona/roleplay jailbreak to override safety policy · live rule

Attacker assigns the model an unrestricted alter-ego or fictional character (DAN, AIM, Developer Mode, dual-response split, grandma/historical/amoral persona, game-master) so it treats safety-violating output as in-character roleplay rather than its own refusal-bound behavior.

Research · ASI01 · AML.T0054 · AML.T0051.000 · CWE-1427

ATD-T0028 · Direct instruction override and goal hijacking · live rule

Untrusted input issues imperative directives (ignore/forget previous instructions, redefine the system prompt, force verbatim payload output) or pivots the objective via false premise / goal drift, displacing the agent's assigned goal and constraints without persona framing.

Research · ASI01 · AML.T0051 · AML.T0051.000 · CWE-1427

ATD-T0029 · Encoding/cipher obfuscation to smuggle harmful instructions past plaintext filters · live rule

Harmful instructions are wrapped in a reversible visible encoding (Base16/32/64/85, hex, Morse, NATO, leetspeak, Braille, Ecoji, Base2048, ROT/cipher, homoglyph) and the model is asked to decode-then-execute, exploiting that plaintext-trained safety classifiers do not generalize to the encoded form.

Research · ASI01 · AML.T0054 · AML.T0051 · AML.T0068 · CWE-1427

ATD-T0030 · Invisible Unicode steganographic smuggling of agent instructions · live rule

Instructions are hidden in imperceptible Unicode that renders blank to humans but tokenizes for the model: zero-width chars, BiDi RLO/LRO overrides, Unicode Tag-block (U+E0000-E007F), variation-selector ASCII smuggling, breaking keyword matching and human audit while preserving model comprehension.

Research · ASI01 · AML.T0051 · CWE-1427

ATD-T0031 · Hypothetical/fictional/tense reframing to bypass refusal · live rule

The harmful request is wrapped in a distancing frame (fictional/story/academic context, past/future tense rewrite, hypothetical-response or function-masking continuation) so the model answers the harmful core believing it is producing fiction or historical analysis.

Research · ASI01 · AML.T0054 · CWE-1427

ATD-T0032 · Coercive output-format and conditional-unlock pressure · live rule

The prompt applies threats, ultimatums, or fabricated conditional-unlock rules to force the model into an attacker-specified output mode or override its formatting/refusal policy.

Research · ASI01 · AML.T0051 · AML.T0054

ATD-T0033 · Casual-authority impersonation for scope escalation and output redirect · live rule

Conversational, informal authority claims ("the orchestrator wants...", "the boss said skip...") assert delegated permission to widen the agent's scope, disable safety filters, or redirect output to an attacker endpoint, evading explicit-injection detection.

Research · ASI01 · ASI10 · AML.T0051

ATD-T0034 · Foot-in-the-door gradual compliance escalation · live rule

The attacker establishes compliance with a benign request, then incrementally escalates toward harmful instructions while citing the agent's prior compliance as justification.

Research · ASI01 · AML.T0051

ATD-T0064 · Forged trusted-output-component manipulation

The agent emits output whose citations, source attributions, or structured 'verified' fields are attacker-forged, so a downstream consumer trusts fabricated components as authoritative.

Research · ASI01 · ASI08 · AML.T0067 · CWE-345
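The character classes named in ATD-T0030 lend themselves to a deterministic check. The sketch below scans text for zero-width characters, BiDi overrides, Tag-block characters, and variation selectors, and decodes Tag-block ASCII smuggling. The exact code-point sets and the function names are assumptions for illustration, not the contents of the bound ATR rule.

```python
# Code points named in ATD-T0030; the exact membership of each set is an assumption.
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}
BIDI_OVERRIDES = {0x202D, 0x202E}  # LRO, RLO


def invisible_code_points(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each suspicious invisible character."""
    hits = []
    for index, char in enumerate(text):
        cp = ord(char)
        if (
            cp in ZERO_WIDTH
            or cp in BIDI_OVERRIDES
            or 0xE0000 <= cp <= 0xE007F  # Unicode Tag block
            or 0xFE00 <= cp <= 0xFE0F    # variation selectors
            or 0xE0100 <= cp <= 0xE01EF  # variation selectors supplement
        ):
            hits.append((index, f"U+{cp:04X}"))
    return hits


def decode_tag_smuggling(text: str) -> str:
    """Map Tag-block characters back to the printable ASCII they shadow."""
    return "".join(
        chr(ord(c) - 0xE0000) for c in text if 0xE0020 <= ord(c) <= 0xE007E
    )
```

Variation selectors and zero-width joiners also occur in legitimate emoji sequences, so a production rule would need benign test cases for those before it could pass a precision gate.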

ATD-TA4 · Identity, Authz & Delegation (9)

ATD-T0014 · Confused-deputy token passthrough in an MCP server

A server forwards a held token to a downstream API without audience validation, escalating privilege.

Research · ASI03 · CWE-441

ATD-T0056 · Agent tool exceeds delegated permission scope to reach admin or out-of-bounds functions · live rule

An agent invokes tools or administrative functions beyond its granted authority (abruptly or via incremental scope creep), accessing user-management, system-settings, or admin panels it was never delegated.

Research · ASI03 · AML.T0040 · AML.T0047 · CWE-269 · CWE-862

ATD-T0057 · Credential/secret file harvest combined with network exfiltration · live rule

Agent or MCP-tool instructions read well-known credential stores (.env, ~/.aws/credentials, SSH keys, .npmrc, browser cookie DBs) and pipe or POST them to an external endpoint in the same context, producing host credential theft with lateral-movement reach.

Research · ASI02 · ASI06 · AML.T0055 · AML.T0083 · AML.T0090 · AML.T0098 · CWE-522

ATD-T0058 · Agent memory or context store written across a conversation or tenant boundary · live rule

An agent writes into a memory/vector store scoped to a different conversation or tenant than the active trace, escaping its data boundary to plant content that later sessions will read.

Research · ASI04 · ASI06 · AML.T0080 · CWE-668 · CWE-349

ATD-T0068 · OAuth consent phishing to bind an attacker-controlled MCP authorization grant

A user is steered through a real OAuth consent screen for an attacker's client/redirect so the agent ends up holding a token scoped to the attacker.

Research · ASI03 · CWE-1021

ATD-T0069 · OAuth protocol downgrade to defeat PKCE or strip token binding

The MCP authorization handshake is forced onto a weaker flow (implicit grant, dropped PKCE/state) so an intercepted code or token can be replayed.

Research · ASI03 · CWE-757

ATD-T0070 · Agent reads process environment variables to harvest injected secrets

An agent or MCP tool enumerates process env vars (printenv, process.env, os.environ) to lift API keys and tokens injected at server startup.

Research · ASI06 · AML.T0055 · CWE-526

ATD-T0075 · Agent configuration and permission reconnaissance

An attacker discovers an agent's configuration, connected tools, and granted permissions to plan a tailored privilege-escalation or tool-abuse attack.

Research · ASI03 · AML.T0084 · CWE-200

ATD-T0077 · Broken object or workspace authorization in an agent tool (IDOR)

An agent or MCP tool serves or mutates data across a user, workspace, or tenant boundary because it omits object-level authorization, letting one principal reach another principal's resources.

CVE-2026-46519 · ASI03 · CWE-639 · CWE-863 · CWE-284

ATD-TA5 · Tool & Supply Chain (17)

ATD-T0015 · Agent reads .env / secret files without consent · live rule

An agent tool reads credential files (.env, credentials, .npmrc) outside any user-approved scope.

Research · ASI02 · ASI06 · AML.T0053 · CWE-538

ATD-T0016 · Hallucinated-dependency squatting (slopsquatting)

The model recommends a fabricated package name an attacker pre-registers, pulling code into the agent env.

Research · ASI04 · ASI08 · AML.T0060 · AML.T0062 · CWE-1427

ATD-T0046 · Hidden directive embedded in MCP tool description subverts consent or safety gating · live rule

A tool's natural-language description or docstring carries imperative instructions (auto-forward results, skip user confirmation, ignore safety policy) that the model obeys at invocation time.

Research · ASI04 · ASI02 · AML.T0053 · CWE-94 · CWE-1427

ATD-T0047 · Tool schema-description divergence hides write or admin capability behind a read-only claim · live rule

A tool advertises safe/read-only behavior in its description while its JSON schema or runtime accepts undeclared write-capable, admin, or debug parameters that exceed the stated function.

Research · ASI02 · ASI05 · AML.T0010 · AML.T0056 · CWE-1427 · CWE-440

ATD-T0048 · Unauthenticated MCP server or agent API exposes privileged operations · live rule

An MCP server, agent control API, or admin endpoint ships with authentication disabled or missing on critical functions, letting an unauthenticated caller invoke tools, exfiltrate data, or take over the cluster.

CVE-2026-32211 · ASI05 · ASI03 · AML.T0049 · AML.T0040 · CWE-306

ATD-T0049 · Untrusted tool output rendered without sanitization triggers terminal or browser code execution · live rule

Tool/skill output containing ANSI/OSC escape sequences or XSS payloads is passed unsanitized into a CLI terminal or web agent UI, hijacking the display, forging prompts, or executing script in the operator's session.

Research · ASI08 · ASI02 · AML.T0057 · AML.T0077 · CWE-150 · CWE-79

ATD-T0050 · Agent SSRF via unvalidated fetch URL reaches cloud metadata or internal services · live rule

An agent fetch/RAG/retrieval tool accepts an attacker-influenced URL whose validator can be bypassed (encoding, decimal/hex IP, IPv6 loopback, DNS rebinding) to reach link-local cloud-metadata endpoints or internal hosts.

CVE-2026-2286 · ASI02 · ASI05 · AML.T0049 · CWE-918 · CWE-552

ATD-T0051 · Malicious code execution staged inside an installed skill package · live rule

A SKILL.md or bundled script carries executable attack payloads (base64/raw-IP droppers, reverse shells, fake-backup credential stealers, or C2 callbacks) that run when the skill is installed or invoked.

Research · ASI04 · AML.T0010 · CWE-506

ATD-T0052 · Skill instruction smuggling via hidden/invisible payload channels · live rule

Attack instructions are concealed in a skill's text where a human reviewer cannot see them (HTML comments, Unicode Tag characters in the U+E0000 range, or other invisible glyphs), but the agent still parses and obeys them.

Research · ASI04 · AML.T0010 · CWE-506

ATD-T0053 · Skill supply-chain tampering: rug-pull, time-bomb, and self-modifying persistence · live rule

A skill is architected to mutate after trust is granted (remote dynamic-code loading, time-gated exfiltration triggers, post-install hooks, self-rewriting SKILL.md, or worm-style propagation) so benign-at-review content turns malicious at runtime.

CVE-2025-59536 · ASI04 · AML.T0010 · CWE-494

ATD-T0054 · Upstream-skill impersonation: typosquat, fork-claim, and slopsquat baiting · live rule

A malicious package masquerades as a trusted tool via misspelled/namespace-colliding names, false 'community fork / enhanced version' install instructions, or hallucinated-dependency baiting, hijacking the trust of the legitimate upstream.

Research · ASI04 · AML.T0010 · CWE-829

ATD-T0055 · Weaponized skill turning the agent into an offensive/over-privileged actor · live rule

An approved skill exploits the post-consent gap to direct the agent itself into offensive operations or excessive scope (running attacker tooling, installing unauthorized background tasks, or loading unsafe model artifacts) beyond what the user sanctioned.

Research · ASI04 · CWE-269

ATD-T0065 · Sensitive-data exfiltration via legitimate agent tool invocation

The agent invokes an otherwise-sanctioned tool (send, post, upload, write, fetch) to ship sensitive context or data to an attacker-controlled destination.

Research · ASI06 · ASI02 · AML.T0086 · CWE-200

ATD-T0071 · Cross-server tool-chaining pivot reaches an unrelated tool's resources

The agent is steered to pass one tool's output as another connected tool's input so a low-trust tool drives a high-privilege tool's action across server boundaries.

Research · ASI05 · ASI03 · AML.T0047 · CWE-441

ATD-T0072 · Outbound webhook used as covert command-and-control channel

An agent tool registers or polls an attacker-controlled webhook/URL, turning routine outbound HTTP into a tasking and data-return channel.

Research · ASI08 · ASI06 · AML.T0072 · CWE-918

ATD-T0078 · Prompt-to-SQL injection via an agent's natural-language database tool

An agent's database or query tool builds a SQL/query string from natural-language or model output without parameterization, so a crafted prompt injects the query and can escalate to RCE.

CVE-2026-25879 · ASI05 · ASI02 · AML.T0053 · CWE-89

ATD-T0081 · Remediation-framed command execution via tool-response injection (agentjacking) · live rule

Externally-influenced content returned through an MCP/tool integration disguises an executable command as legitimate remediation guidance (a Resolution/recommended-fix/required-step section running an npx/uvx/pipx package or a pipe-to-shell one-liner), so an AI coding agent runs attacker-controlled code while every step in the chain appears authorized. Prompt-layer defenses fail because the agent cannot distinguish data it reads from an instruction to act; the durable mitigation is detection on the tool-output boundary. Generalizes beyond the disclosed Sentry vector to any tool integration that returns externally-influenced data to an agent.

Research · ASI02 · ASI05 · AML.T0051.001 · CWE-829 · CWE-77

ATD-TA6 · Execution & Autonomy (9)

ATD-T0017 · Path-traversal blacklist bypass via non-canonical paths · live rule

Exact-string path checks are bypassed with ../, /./, redundant slashes to reach sensitive files.

CVE-2025-66689 · ASI02 · ASI03 · AML.T0053 · CWE-22

ATD-T0018 · MCP filesystem sandbox escape via symlink following

A symlink inside an allowed directory resolves to an out-of-scope path, granting system file access.

CVE-2025-53109 · ASI02 · ASI05 · AML.T0053 · CWE-59

ATD-T0019 · Prompt-injection-to-RCE via an agent's file-write capability

Injected instructions drive the agent to write a startup/config file that yields persistent code execution.

Research · ASI05 · ASI01 · AML.T0053 · AML.T0051.001 · CWE-94

ATD-T0039 · Human-in-the-loop trust and approval-fatigue exploitation · live rule

Agent output is weaponized against the supervising human via fabricated confidence, suppressed uncertainty, manufactured urgency, or risky actions batched among benign ones to fatigue and bypass human approval gates.

Research · ASI09

ATD-T0040 · Agent rationalizes bypassing a required human-approval or safety gate · live rule

The agent skips a mandated approval/safety control, whether by invoking a destructive tool with no preceding human-approval span or by self-justifying a direct path ('to be more efficient'), collapsing the human-in-the-loop checkpoint.

Research · ASI04 · ASI05 · AML.T0053 · CWE-862 · CWE-841

ATD-T0041 · Agent code-execution sandbox escape achieves host RCE or boot-time persistence · live rule

An agent's code-interpreter or VM sandbox is broken out of (via arbitrary file write to a startup path, eval/dynamic-import primitives, or a boundary flaw), yielding host code execution that can persist across restarts.

CVE-2026-27597 · ASI05 · ASI06 · AML.T0050 · CWE-94 · CWE-693

ATD-T0042 · Consequential autonomous action without human-in-the-loop confirmation · live rule

An agent executes a high-risk, often irreversible operation (payment/transfer, shell command, destructive call, internal-network fetch) without an explicit per-turn human approval gate.

Research · ASI07 · ASI06 · AML.T0053 · AML.T0101 · CWE-862

ATD-T0043 · Runaway self-perpetuating execution loop (denial-of-wallet) · live rule

An agent enters an unbounded retry, recursive self-invocation, or tight tool-call loop with no termination condition, exhausting compute, budget, or downstream services.

Research · ASI07 · AML.T0034 · AML.T0046 · CWE-835

ATD-T0066 · Delayed or conditional execution of injected instructions

An injected instruction defers its own effect to a later turn or a trigger condition so the malicious action fires after review rather than at ingestion time.

Research · ASI01 · ASI07 · AML.T0094 · CWE-506
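ATD-T0017 and ATD-T0018 both come down to comparing paths as strings instead of as resolved locations. The sketch below shows one common way to check containment after canonicalization; it is an illustration with a hypothetical function name, not code taken from any affected server or from the ATR rules.

```python
import os


def is_within(root: str, requested: str) -> bool:
    """True only if `requested` resolves to a location inside `root`.

    realpath collapses ../, /./ and redundant slashes and follows symlinks,
    so the comparison is made on canonical paths rather than raw strings.
    """
    root_real = os.path.realpath(root)
    target_real = os.path.realpath(os.path.join(root_real, requested))
    return os.path.commonpath([root_real, target_real]) == root_real
```

Comparing with commonpath rather than a string prefix avoids treating /data/sandbox-evil as if it were inside /data/sandbox. The check and the later file open are still two separate steps, so a symlink swapped in between them is not covered by this sketch.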

ATD-TA7 · Multi-Agent Dynamics (4)

ATD-T0020 · Agent Card poisoning to capture A2A task routing

A rogue A2A agent advertises an instruction-laden Agent Card so the orchestrator routes tasks to it.

Research · ASI07 · ASI10 · AML.T0051.001 · CWE-345

ATD-T0021 · Cross-agent injection propagation (cascading compromise)

One compromised agent emits content that injects the next downstream agent, cascading through the swarm.

Research · ASI08 · ASI07 · ASI10 · AML.T0051.001 · AML.T0080 · CWE-1427

ATD-T0044 · Inter-agent identity spoofing and forged-message injection · live rule

A compromised or peer agent spoofs another agent's identity, forges system-level message tags, or injects unauthenticated A2A messages to exploit inter-agent trust for privilege escalation or orchestrator bypass.

Research · ASI07 · ASI10 · AML.T0051.001 · AML.T0051

ATD-T0045 · Multi-agent consensus and Sybil manipulation · live rule

Instructions spin up multiple fake agent identities to coordinate votes, flood false proposals, or overwhelm a multi-agent consensus or voting mechanism toward an attacker-chosen outcome.

Research · ASI10 · ASI07 · AML.T0043

ATD-TA8 · Model-Intrinsic & Governance (8)

ATD-T0022 · Trace tampering / non-tamper-evident agent audit logs

An agent logs reasoning but not the actual tool call, or logs are mutable, defeating after-the-fact audit.

Research · ASI10 · CWE-778

ATD-T0025 · Acoustic prompt injection of a voice agent

An imperceptible adversarial audio perturbation mixed into normal speech drives a voice / audio-LLM agent to issue real tool calls, under audio-data-only access and with no textual user instruction. Text-layer rules cannot see it; detection is limited to the trace plane (a voice-initiated session producing high-risk tool calls with no corresponding textual instruction).

Research · ASI01 · CWE-1427

ATD-T0035 · Token-level adversarial suffix and special-token boundary injection · live rule

Model-intrinsic exploitation: gradient-optimized suffixes (GCG bracket/word-salad) or model-specific control tokens (<|endoftext|>, ChatML <|im_start|>system, glitch tokens) are appended to shift output toward compliance or reset safety context with no semantic framing.

Research · ASI01 · AML.T0054 · AML.T0051 · CWE-1427

ATD-T0036 · Output-boundary bypass to elicit harmful or prohibited content · live rule

An attacker shapes the request (completion-baiting, structured harm solicitation, or vulnerable-population framing) so the model emits content it would refuse if asked directly.

Research · ASI01 · AML.T0054 · AML.T0057 · CWE-1427

ATD-T0037 · Coercing the model to generate weaponized or scanner-evading output · live rule

The model is directed to produce operational malicious artifacts (malware code, sub-functions) or known-bad test signatures (EICAR/GTUBE) that probe whether the output pipeline has any AV/scanning layer.

Research · ASI01 · ASI08 · AML.T0053

ATD-T0038 · Model IP and training-data extraction via systematic inference probing · live rule

An attacker issues bulk or divergent-repetition queries against the inference API to recover memorized training data or distill the model's behavior into a functional clone.

Research · ASI01 · AML.T0040 · AML.T0024 · CWE-200

ATD-T0073 · Agent model-output reconnaissance and guardrail fingerprinting

An attacker systematically probes an agent's responses to fingerprint the underlying model, its guardrails, or its output format in order to tailor a follow-on attack.

Research · ASI01 · AML.T0063 · CWE-200

ATD-T0079 · Secret leakage through agent or MCP server logs and traces

An agent or MCP server writes tool arguments, queries, or responses containing credentials or secrets into logs or traces, exposing them to anyone with log or trace access.

CVE-2026-44969 · ASI06 · CWE-532

ATD-TA9 · Agentic Commerce (forward) (3)

ATD-T0023 · Adversarial transaction steering of a purchasing agent

Injected content in a listing/page steers an autonomous-commerce agent to overpay or leak payment authority.

Forward-looking (no instance yet) · ASI01 · ASI02 · ASI09 · CWE-1427

ATD-T0024 · Payment-mandate forgery in an agent-to-agent handshake

A rogue agent spoofs delegated payment authority or mandate scope in an agentic-commerce exchange.

Forward-looking (no instance yet) · ASI03 · ASI07 · CWE-345

ATD-T0076 · Fraudulent transaction execution by a compromised commerce agent

A compromised or manipulated purchasing or payment agent executes unauthorized or fraudulent transactions on the user's behalf.

Research · ASI06 · ASI01 · CWE-840

Entry schema (Normative)

A technique entry is a YAML/JSON document that validates against the normative JSON Schema (Draft 2020-12), which ships with the repo and is designed to be compatible with the OSV and Sigma ecosystems. Required fields:

atd_id            ATD-T####            permanent
schema_version    SemVer, no "v"       additive-minor guarantee (OSV)
title             short imperative phrase
tactic            ATD-TA#
abstraction       pillar | class | base | variant
status            experimental | test | stable | deprecated
description       the technique and its attack mechanism
detection_surface content | tool_input | tool_response |
                  inter_agent_msg | memory_op | trace | screen | payment_mandate
mappings          owasp_asi[] · mitre_atlas[] · cwe[] · avid[] · maestro_layer[]
references        advisory / CVE / research URL (real evidence)
detection_rules   UUIDv4[] pointing into the rule corpus
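To show how the required fields fit together, the sketch below builds one entry from ATD-T0001 as listed in the catalogue and runs a lightweight pre-check. The schema_version, abstraction, status, and detection_surface values, the list shape of detection_surface and references, and the rule UUID are placeholders chosen for illustration, not published data; the pre-check is a stand-in and does not replace validation against the normative JSON Schema.

```python
REQUIRED_FIELDS = (
    "atd_id", "schema_version", "title", "tactic", "abstraction", "status",
    "description", "detection_surface", "mappings", "references", "detection_rules",
)
ALLOWED = {
    "abstraction": {"pillar", "class", "base", "variant"},
    "status": {"experimental", "test", "stable", "deprecated"},
}
SURFACES = {
    "content", "tool_input", "tool_response", "inter_agent_msg",
    "memory_op", "trace", "screen", "payment_mandate",
}

example_entry = {
    "atd_id": "ATD-T0001",
    "schema_version": "1.0.0",            # placeholder
    "title": "Shell metacharacter injection through MCP tool parameters",
    "tactic": "ATD-TA1",
    "abstraction": "base",                # placeholder
    "status": "experimental",             # placeholder
    "description": "Unsanitized MCP tool input reaches execSync/exec, "
                   "yielding RCE on the server host.",
    "detection_surface": ["tool_input"],  # placeholder
    "mappings": {
        "owasp_asi": ["ASI05", "ASI02"],
        "mitre_atlas": ["AML.T0053"],
        "cwe": ["CWE-77"],
        "avid": [],
        "maestro_layer": [],
    },
    "references": ["CVE-2025-53355"],
    "detection_rules": ["00000000-0000-4000-8000-000000000000"],  # placeholder UUIDv4
}


def check_entry(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the pre-check passed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    for name, allowed in ALLOWED.items():
        if name in entry and entry[name] not in allowed:
            problems.append(f"invalid {name}: {entry[name]}")
    for surface in entry.get("detection_surface", []):
        if surface not in SURFACES:
            problems.append(f"invalid detection_surface: {surface}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a contributor fix an entry in a single pass.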

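Detection rules reuse the ATR rule schema that is already in production: regex/conditions, true_positives and true_negatives (precision tests), false-positive notes, a response action, and a valid agent_source.type.

Mappings and interoperability (Informative)

Legitimacy comes from interoperability, not self-declared authority. Every ATD technique maps to OWASP ASI, MITRE ATLAS, and CWE where a matching slot exists, and adds AVID and MAESTRO layers where applicable. When a technique falls into a gap that upstream frameworks have not yet named, ATD records the gap and feeds it back to those projects as proposed techniques and case studies. The catalogue grows through crosswalks rather than by staking out its own territory.

ATD is designed to interoperate with AVID rather than compete with it: AVID already operates a governed, open-submission AI vulnerability registry. ATD supplies the executable detections that AVID's "Detection" report type expects.

Conformance (Normative)

A conformant technique entry MUST pass JSON Schema validation, carry at least one framework mapping (or a recorded gap), and cite at least one reference.
A conformant detection rule MUST pass the precision gate: every declared true positive hits, there are zero false positives against the public benign corpus, and there are no cross-rule conflicts.
A conformant consumer MUST honor the maturity field and SHOULD expose the ATD-T#### id on every detection.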
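The precision gate for a single rule can be sketched as follows. This assumes a rule is a dictionary with regex, true_positives, and true_negatives keys, which echoes the field names mentioned for the ATR rule schema but is not its actual layout; the toy rule and corpus are invented for the example, and the cross-rule conflict check is left out.

```python
import re


def passes_precision_gate(rule: dict, benign_corpus: list[str]) -> bool:
    """Check one rule: all declared true positives hit, nothing benign does."""
    pattern = re.compile(rule["regex"])
    hits_all_tp = all(pattern.search(sample) for sample in rule["true_positives"])
    clean_tn = not any(pattern.search(sample) for sample in rule["true_negatives"])
    clean_benign = not any(pattern.search(sample) for sample in benign_corpus)
    return hits_all_tp and clean_tn and clean_benign


# Toy rule for shell metacharacters in a tool parameter (illustrative only).
toy_rule = {
    "regex": r"[;&|`]|\$\(",
    "true_positives": ["report.txt; rm -rf /", "$(curl evil.example)"],
    "true_negatives": ["report.txt"],
}
toy_corpus = ["quarterly report", "list files in /tmp"]
```

With these inputs the toy rule passes. Adding a benign sample such as "R&D budget" to the corpus makes it fail, which is the kind of false positive the gate exists to catch before a rule is bound to a technique.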

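Governance and status (Informative)

ATD is governed under a written charter, published as an open draft, and openly invites co-authorship. It will take the name "standard" only after meeting a concrete, public neutrality threshold (adopted from the OpenSSF project lifecycle: at least three maintainers spanning at least two organizations). We are actively working toward it.

Governance follows a Minimal Viable Governance charter: lazy consensus among maintainers, a simple-majority fallback in the Technical Steering Committee, a two-thirds vote for charter amendments, and no single-person veto. The specification is licensed CC-BY-4.0 and forkable by design, so no single party can capture it, and it is committed to moving to a neutral foundation (an OpenSSF working group or the OWASP GenAI Security Project).

Call for collaborators: if your team works on agent security, map your tooling to ATD, submit a technique, or take a maintainer seat. See /contribute.

Downloads and artifacts (Informative)

atd-techniques.json: full enumeration (machine-readable, OSV/Sigma-compatible fields)
atd-technique.schema.json: normative JSON Schema (Draft 2020-12)
/spec: ATR rule format (the executable rules that techniques bind to)
DOI 10.5281/zenodo.19178002: research and citation artifact

Editor: Adam Lin · Specification CC-BY-4.0 · Schema and tooling Apache-2.0 · Rule corpus MIT · ISO 8601 2026-06-13 · Editor's Draft, pre-governance; not a ratified standard.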

§1 Abstract (Informative)

§2 Scope: what ATD is and is not (Normative)

§3 Conceptual model (Normative)

§4 Technique catalogue (Normative)