What ATR is built for — and what it is not.
ATR's 419 rules describe AI agent attack patterns for the purpose of detection. The same descriptions can be misused to generate attacks. This page explains our design intent, known dual-use risks, and how to report misuse.
ATR is a defensive detection tool.
Run ATR rules in CI/CD pipelines, agent runtimes, or MCP server middleware to detect known attack patterns. Cisco AI Defense and Microsoft AGT are production examples.
Use ATR rules to measure what fraction of attacks your red-team tool discovers already have detection coverage, and what fraction are novel. This is the design intent of the NVIDIA garak integration.
ATR rules map to OWASP, MITRE ATLAS, NIST AI RMF, EU AI Act, and ISO 42001. Use these mappings to generate compliance evidence packages or explain framework coverage to procurement.
Cite specific rule IDs (e.g. ATR-2026-00440) as executable detection baselines for attack research. Rule IDs are permanent, suitable for academic citation.
ATR rules should not be used as attack generators.
Each ATR rule contains a regex description of an attack pattern and test cases that include true_positives. Those test_cases exist to validate detection capability — not to provide ready-made attack payloads.
The following uses constitute misuse:
- ▸Using true_positive test cases directly as attack payloads against production AI agent systems without authorization.
- ▸Reverse-engineering ATR rule regexes into evasion variants and deploying them against unauthorized targets.
- ▸Building attack automation tools grounded in the ATR rule corpus with the intent of breaching systems that have deployed ATR rules.
- ▸Using ATR data to train attack models aimed at generating adversarial inputs that evade ATR detection.
The MIT license permits any use, including commercial and forked use. This misuse definition is not a legal restriction — it is a clear statement of design intent, so adopters have a clear reference when evaluating risk.
We acknowledge this standard has dual-use properties.
ATR rules describe behavioral signatures of AI agent attacks. Any system that describes attack patterns — CVE databases, Sigma rules, YARA signatures, MITRE ATT&CK — has a similar dual nature. The defensive rationale for public disclosure is that defenders see documented techniques faster than attackers benefit from the documentation.
We have designed two mitigations:
- 1.The true_positive test cases in rules are minimal pattern exemplars, not complete attack chains. They are sufficient to validate detection but require additional attack engineering to function as a real exploit.
- 2.For particularly dangerous rules (high CVSS, actively exploited in the wild), we follow responsible-disclosure timelines and confirm affected vendors have had patch time before filing the PR. ATR-2026-00440 and ATR-2026-00441 (Microsoft Semantic Kernel CVEs) are an example: rules were published after MSRC's public disclosure.
We did not import PyRIT's Pliny L1B3RT4S dataset because Anthropic's usage policy prevented our subagents from consuming it. AdvBench, HarmBench, and JailbreakBench were reclassified as test corpora (data/test-corpora/) rather than rule sources — those datasets describe target behaviors, not wrapped attack payloads.
If you observe ATR being misused.
We cannot technically prevent misuse — the MIT license does not allow it and it would not be the right engineering approach. We do want to know about misuse cases in order to respond in documentation and rule design.