Skip to content
Research

Data, benchmarks, and honest limitations.

ATR publishes evasion tests openly. We tell you what we can't catch.

v4April 2026 · 23 pages · 65 references
The Collapse of Trust: Security Architecture for the Age of Autonomous AI Agents

108 detection rules, dual benchmarks (PINT 99.6% precision + SKILL.md 96.9% recall), 53K mega scan, Cisco adoption. The complete ATR thesis.

NEWApril 2026 · 10 pages · 25 references
53,577 Skills, 946 Threats: The First Large-Scale Security Audit of the AI Agent Ecosystem

The largest AI agent security scan to date. 53,577 skills, 946 threats, 875 critical, 0% false positive. Tool description poisoning accounts for 71% of detections.

NEWApril 2026 · 14 pages · 22 references
30 CVEs in 60 Days: The Model Context Protocol Attack Surface

Empirical analysis of the MCP attack surface. 30 CVEs in 60 days, 38% zero authentication, 7-class attack taxonomy, 53K ecosystem scan. 15x faster than Docker's first two years.

Benchmarks

Tested with our own corpus AND external benchmarks we've never seen before.

PINT (External, Adversarial)
99.6%
Precision
61.4%
Recall
76
F1
850 samples
Self-Test (Own Rules)
100%
Precision
88.5%
Recall
341
Samples

The gap between 99.6% precision and 61.4% recall is expected. Regex catches known patterns but misses paraphrases and multilingual attacks.

SKILL.md Detection Benchmark

Tested against 498 real-world OpenClaw SKILL.md files (32 malicious + 466 benign). Layer A = explicit malicious instructions. Layer C = obfuscated / hidden attacks.

Overall Performance
96.9%
Recall
100%
Precision
98.4
F1
0%
False positive rate
498
Real-world samples
By Attack Layer
Layer A100%
Explicit malicious instructions — 24/24 detected
Layer C87.5%
Obfuscated attacks — 7/8 detected (1 rug-pull timebomb missed)
Benign0 FP
466 benign SKILL.md files — zero false positives

Ecosystem Scan Data

Real scans of real MCP skill registries.

Mega Scan (OpenClaw + Skills.sh)
53,577
skills scanned
875
CRITICAL
52
HIGH
946
Total flagged
ClawHub Registry Scan
36,394
skills crawled
182
CRITICAL
1,124
HIGH
9,676
With source code

What ATR Cannot Detect

We publish this section because honest limitations build more trust than false confidence.

Paraphrase Attacks

Any regex rule can be bypassed by semantically equivalent rephrasing. "Ignore previous instructions" is detected; "please set aside the guidance you were given earlier" is not.

Multi-Language Attacks

All patterns are English-only. Injection payloads in Spanish, Chinese, Arabic, or any other language bypass all rules completely.

Context-Dependent Attacks

"Delete all records" might be legitimate or malicious. Regex matches patterns without understanding authorization context.

Protocol-Level Attacks

ATR inspects content, not transport. Message replay, schema manipulation, MCP transport-level MITM are invisible.

Multi-Turn Behavioral Patterns

Gradual trust escalation across 20 turns, where no single message is detectable, is not correlated. ATR evaluates events independently.

Novel Attacks

By definition, regex cannot detect attack patterns that don't exist yet. New techniques require new rules.