Research

Data, benchmarks, and honest limitations.

ATR publishes evasion tests openly. We tell you what we can't catch.

v4April 2026 · 23 pages · 65 references

The Collapse of Trust: Security Architecture for the Age of Autonomous AI Agents

108 detection rules, dual benchmarks (PINT 99.6% precision + SKILL.md 96.9% recall), 53K mega scan, Cisco adoption. The complete ATR thesis.

Zenodo DOI PDF (GitHub)SSRN: 6457179

NEWApril 2026 · 10 pages · 25 references

53,577 Skills, 946 Threats: The First Large-Scale Security Audit of the AI Agent Ecosystem

The largest AI agent security scan to date. 53,577 skills, 946 threats, 875 critical, 0% false positive. Tool description poisoning accounts for 71% of detections.

PDF (GitHub)

NEWApril 2026 · 14 pages · 22 references

30 CVEs in 60 Days: The Model Context Protocol Attack Surface

Empirical analysis of the MCP attack surface. 30 CVEs in 60 days, 38% zero authentication, 7-class attack taxonomy, 53K ecosystem scan. 15x faster than Docker's first two years.

PDF (GitHub)

Benchmarks

Tested with our own corpus AND external benchmarks we've never seen before.

PINT (External, Adversarial)

99.6%

Precision

61.4%

Recall

850 samples

Self-Test (Own Rules)

100%

Precision

88.5%

Recall

341

Samples

The gap between 99.6% precision and 61.4% recall is expected. Regex catches known patterns but misses paraphrases and multilingual attacks.

SKILL.md Detection Benchmark

Tested against 498 real-world OpenClaw SKILL.md files (32 malicious + 466 benign). Layer A = explicit malicious instructions. Layer C = obfuscated / hidden attacks.

Overall Performance

96.9%

Recall

100%

Precision

98.4

False positive rate

498

Real-world samples

By Attack Layer

Layer A100%

Explicit malicious instructions — 24/24 detected

Layer C87.5%

Obfuscated attacks — 7/8 detected (1 rug-pull timebomb missed)

Benign0 FP

466 benign SKILL.md files — zero false positives

Ecosystem Scan Data

Real scans of real MCP skill registries.

Mega Scan (OpenClaw + Skills.sh)

53,577

skills scanned

875

CRITICAL

HIGH

946

Total flagged

ClawHub Registry Scan

36,394

skills crawled

182

CRITICAL

1,124

HIGH

9,676

With source code

Download raw data (CSV)→Download stats (JSON)→

What ATR Cannot Detect

We publish this section because honest limitations build more trust than false confidence.

Paraphrase Attacks

Any regex rule can be bypassed by semantically equivalent rephrasing. "Ignore previous instructions" is detected; "please set aside the guidance you were given earlier" is not.

Multi-Language Attacks

All patterns are English-only. Injection payloads in Spanish, Chinese, Arabic, or any other language bypass all rules completely.

Context-Dependent Attacks

"Delete all records" might be legitimate or malicious. Regex matches patterns without understanding authorization context.

Protocol-Level Attacks

ATR inspects content, not transport. Message replay, schema manipulation, MCP transport-level MITM are invisible.

Multi-Turn Behavioral Patterns

Gradual trust escalation across 20 turns, where no single message is detectable, is not correlated. ATR evaluates events independently.

Novel Attacks

By definition, regex cannot detect attack patterns that don't exist yet. New techniques require new rules.

Full limitations document with 64 evasion tests→