A public attack-detection standard must answer for dual use.
ATR's 655 rules describe AI agent attack patterns for the purpose of detection. A standard that documents attacks can also be misread as a recipe for them — CVE, Sigma, YARA, and ATT&CK all carry the same tension. ATR's answer is transparency: the rules are public and reviewable, and the boundaries are written down here rather than relying on obscurity. This page states the design intent, the known dual-use risks, what ATR does not catch, and how to report misuse.
ATR is a defensive detection standard.
Run ATR rules in CI/CD pipelines, agent runtimes, or MCP server middleware to detect known attack patterns in SKILL.md files, tool descriptions, and agent configs. Cisco AI Defense and Microsoft AGT are production examples.
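As a minimal sketch of that kind of gate, assuming a simplified rule shape: the `Rule` class, the `EXAMPLE-0001` ID, and the regex below are hypothetical placeholders, not the ATR schema or an entry from the ATR corpus.

```python
import re
from dataclasses import dataclass


# Hypothetical, simplified rule shape; the real ATR rule schema is not reproduced here.
@dataclass(frozen=True)
class Rule:
    rule_id: str
    pattern: re.Pattern


# Placeholder rule: the ID and regex are illustrative, not taken from the ATR corpus.
RULES = [
    Rule(
        "EXAMPLE-0001",
        re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    ),
]


def scan(text, rules=RULES):
    """Return the IDs of every rule whose pattern matches the text."""
    return [rule.rule_id for rule in rules if rule.pattern.search(text)]


def gate(documents, rules=RULES):
    """Map each document name to its matching rule IDs, omitting clean documents."""
    findings = {name: scan(body, rules) for name, body in documents.items()}
    return {name: ids for name, ids in findings.items() if ids}
```

A CI step could call `gate` on the contents of SKILL.md files, tool descriptions, and agent configs, and fail the build when the returned mapping is non-empty.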
Within an authorized scope, use ATR rules to measure what fraction of the attacks your red-team tool discovers already have detection coverage, and what fraction are novel — the novel fraction is where the next rule comes from. This is the design intent of the in-review NVIDIA garak integration.
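The covered and novel fractions can be computed with a few lines. This is a sketch under stated assumptions: `detect` is a stand-in built on one illustrative regex, and `coverage_split` is a hypothetical helper, not part of ATR or of the garak integration.

```python
import re

# Stand-in detector: a single illustrative regex, not an actual ATR rule.
_EXAMPLE = re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE)


def detect(text):
    """Return True when the stand-in pattern matches the text."""
    return bool(_EXAMPLE.search(text))


def coverage_split(findings, detector=detect):
    """Split red-team findings into covered and novel, plus the covered fraction."""
    covered = [f for f in findings if detector(f)]
    novel = [f for f in findings if not detector(f)]
    fraction = len(covered) / len(findings) if findings else 0.0
    return covered, novel, fraction
```

The `novel` list is the interesting output: each entry is a finding with no matching detection, and therefore a candidate for a new rule.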
Every ATR rule carries crosswalks to OWASP, MITRE ATLAS, NIST AI RMF, EU AI Act, and ISO 42001. Use those mappings as executable coverage evidence — showing auditors or procurement which framework requirements have a corresponding detection.
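One way to turn crosswalks into coverage evidence is to invert them, so that each framework requirement lists the rules that detect it. The sketch below assumes a hypothetical data layout; the rule IDs and requirement labels (`REQ-A` and so on) are placeholders, not real ATR mappings.

```python
from collections import defaultdict

# Hypothetical crosswalk data; rule IDs and requirement labels are placeholders.
CROSSWALKS = {
    "EXAMPLE-0001": {"OWASP": ["REQ-A"], "NIST AI RMF": ["REQ-X"]},
    "EXAMPLE-0002": {"OWASP": ["REQ-A", "REQ-B"]},
}


def coverage_by_framework(crosswalks):
    """Invert rule -> framework mappings into framework -> requirement -> rule IDs."""
    report = defaultdict(lambda: defaultdict(list))
    for rule_id, mappings in crosswalks.items():
        for framework, requirements in mappings.items():
            for requirement in requirements:
                report[framework][requirement].append(rule_id)
    return {
        framework: {req: sorted(ids) for req, ids in reqs.items()}
        for framework, reqs in report.items()
    }


def uncovered(required, report, framework):
    """List the required items that no rule maps to for the given framework."""
    return [req for req in required if req not in report.get(framework, {})]
```

The `uncovered` list matters as much as the covered one for an audit, since it names the requirements that have no corresponding detection.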
Cite specific rule IDs (e.g. ATR-2026-00440) as executable detection baselines for attack research. IDs follow CVE/CWE-style numbering and never change after publication — safe for papers, CI scripts, and external documentation to reference for the long term.
ATR rules should not be used as attack generators.
Each ATR rule contains a regex description of an attack pattern and test cases covering both true_positives and true_negatives. Those test_cases exist to prove that the rule detects what it claims and does not fire on benign input. They are how the standard stays conformance-testable, not a supply of ready-made attack payloads.
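A conformance check of this kind can be sketched as follows. The field names `test_cases`, `true_positives`, and `true_negatives` follow the text above; the rest of the record layout, the `EXAMPLE-0001` ID, the regex, and the sample strings are assumptions made for illustration.

```python
import re

# Hypothetical rule record; only the test-case field names come from the text above.
RULE = {
    "id": "EXAMPLE-0001",
    "pattern": r"ignore\s+(all\s+)?previous\s+instructions",
    "test_cases": {
        "true_positives": ["Ignore all previous instructions."],
        "true_negatives": ["Follow the previous instructions in the README."],
    },
}


def conformance_failures(rule):
    """Return (kind, text) pairs for every test case the rule gets wrong."""
    regex = re.compile(rule["pattern"], re.IGNORECASE)
    failures = []
    for text in rule["test_cases"]["true_positives"]:
        if not regex.search(text):
            failures.append(("missed", text))  # rule should have fired
    for text in rule["test_cases"]["true_negatives"]:
        if regex.search(text):
            failures.append(("false_alarm", text))  # rule fired on benign input
    return failures
```

An empty result means the rule agrees with all of its own test cases. A rule whose pattern is too broad shows up as `false_alarm` entries.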
The following uses constitute misuse:
- Using true_positive test cases directly as attack payloads against production AI agent systems without authorization.
- Reverse-engineering ATR rule regexes into evasion variants and deploying them against unauthorized targets.
- Building attack automation tools grounded in the ATR rule corpus with the intent of breaching systems that have deployed ATR rules.
- Using ATR data to train attack models aimed at generating adversarial inputs that evade ATR detection.
The MIT license permits any use, including commercial and forked use. This misuse definition is not a legal restriction — a standard cannot control how it is used, and should not pretend to. It is a clear statement of design intent, giving adopters a reference they can cite when evaluating risk.
This standard is dual-use, and we say so in the open.
ATR rules describe behavioral signatures of AI agent attacks. Any system that documents attacks (CVE databases, Sigma rules, YARA signatures, MITRE ATT&CK) carries the same tension. The field's long-standing consensus is that documenting techniques in the open helps defenders more than it helps attackers. Obscurity only delays; disclosure lets every defender close the gap at once. ATR sits on the side of transparency because a detection standard is only as trustworthy as it is reviewable.
Two deliberate design constraints narrow that tension:
1. The true_positive test cases are minimal pattern exemplars, not complete attack chains. They are sufficient to validate detection, yet still require additional attack engineering to become a working exploit: the signal a detection standard needs is a small fraction of the work weaponization takes.
2. Rules for the most dangerous cases (high CVSS, actively exploited in the wild) follow responsible-disclosure timelines: the pull request is filed only after affected vendors have had time to patch. ATR-2026-00440 and ATR-2026-00441, which cover two critical Microsoft Semantic Kernel CVEs (CVE-2026-26030 and CVE-2026-25592), are an instance of this; the rules were published only after MSRC's public disclosure.
Source provenance is bounded too. PyRIT's Pliny L1B3RT4S dataset was not imported — Anthropic's usage policy does not allow our subagents to consume it. AdvBench, HarmBench, and JailbreakBench are classified as test corpora (data/test-corpora/) rather than rule sources: those datasets describe target behaviors, not wrapped attack payloads.
Finally, honest disclosure means stating what ATR cannot catch. The rules are pattern-based and are blind by construction to paraphrase, semantic equivalence, non-English injection, gradual multi-turn escalation, and transport-layer protocol attacks. Any attacker who reads the published rules can route around the specific verbs, nouns, and syntax they match. ATR publishes 64 known evasion techniques (in each rule's evasion_tests field) and a full LIMITATIONS.md, so these boundaries exist as a citable document rather than as a gap in the marketing. A detection standard earns trust by publishing its worst figures, not hiding them.
If you observe ATR being misused.
A standards layer cannot prevent misuse and should not pretend to; the MIT license offers no such mechanism, and attempting it would not be good engineering. But misuse cases are worth knowing about: they feed back into documentation and rule design, making the next revision clearer and harder to turn against its purpose. If you see ATR being misused, tell us.