Data, benchmarks, and honest limitations.
ATR publishes evasion tests openly. We tell you what we can't catch.
108 detection rules, dual benchmarks (PINT 99.6% precision + SKILL.md 96.9% recall), 53K mega scan, Cisco adoption. The complete ATR thesis.
The largest AI agent security scan to date. 53,577 skills, 946 threats, 875 critical, 0% false positive. Tool description poisoning accounts for 71% of detections.
Empirical analysis of the MCP attack surface. 30 CVEs in 60 days, 38% zero authentication, 7-class attack taxonomy, 53K ecosystem scan. 15x faster than Docker's first two years.
Benchmarks
Tested with our own corpus AND external benchmarks we've never seen before.
The gap between 99.6% precision and 61.4% recall is expected. Regex catches known patterns but misses paraphrases and multilingual attacks.
SKILL.md Detection Benchmark
Tested against 498 real-world OpenClaw SKILL.md files (32 malicious + 466 benign). Layer A = explicit malicious instructions. Layer C = obfuscated / hidden attacks.
Ecosystem Scan Data
Real scans of real MCP skill registries.
What ATR Cannot Detect
We publish this section because honest limitations build more trust than false confidence.
Any regex rule can be bypassed by semantically equivalent rephrasing. "Ignore previous instructions" is detected; "please set aside the guidance you were given earlier" is not.
All patterns are English-only. Injection payloads in Spanish, Chinese, Arabic, or any other language bypass all rules completely.
"Delete all records" might be legitimate or malicious. Regex matches patterns without understanding authorization context.
ATR inspects content, not transport. Message replay, schema manipulation, MCP transport-level MITM are invisible.
Gradual trust escalation across 20 turns, where no single message is detectable, is not correlated. ATR evaluates events independently.
By definition, regex cannot detect attack patterns that don't exist yet. New techniques require new rules.