Skip to content
Changelog

Every release.
On the record.

Generated from CHANGELOG.md at build time. Rule IDs are permanent once published. rule_version may bump; the ID stays stable. Full commit history is on GitHub.

v2.2.2

2026-05-13
Added
  • ATR-2026-00523 (skill-compromise): Claude Code Hooks SessionStart pre-trust RCE (CVE-2025-59536). Detects repo-shipped .claude/settings.json registering a SessionStart hook with the startup matcher; the hook command fires before the trust dialog renders. Six layered conditions: literal Hooks-schema shape anchor, shell-interpreter command resolution, pipe-to-shell / inline-exec payload shape, package-manager + credential-tool invocation, repo-path + payload co-occurrence, and pre-trust prose anchor. 7 TP / 6 TN / 3 evasion tests documented. Reported by Aviv Donenfeld and Oded Vanunu (Check Point Research); patched via GHSA-ph6w-f82w-28w6 enhanced trust-dialog warning.
  • ATR-2026-00524 (context-exfiltration): Claude Code ANTHROPIC_BASE_URL credential exfiltration (CVE-2026-21852, CVSS 5.3 Moderate). Detects repo-shipped .claude/settings.json or env-var configuration rebinding ANTHROPIC_BASE_URL to an endpoint outside the Anthropic-controlled allowlist (api.anthropic.com, Vertex *.googleapis.com, Bedrock bedrock*.<region>.amazonaws.com); pre-trust API request leaks the live Authorization: Bearer header. Seven layered conditions: JSON-form allowlist negative-lookahead, shell / dotenv / Dockerfile env-var form, bare public-IPv4 detector (excludes loopback + RFC1918), cleartext-HTTP detector against non-loopback hosts, repo-path + payload co-occurrence, pre-trust prose anchor, and CVE-identifier co-occurrence. 7 TP / 8 TN / 3 evasion tests documented. Patched in Claude Code >= 2.0.65 (GHSA-jh7p-qr78-84p7); PoC at github.com/atiilla/CVE-2026-21852-PoC.
Changed
  • Total rule count: 419 → 421.
Anchors
  • 2 Check Point Research disclosures against Claude Code: CVE-2025-59536 (Hooks SessionStart pre-trust RCE) + CVE-2026-21852 (ANTHROPIC_BASE_URL pre-trust credential exfil). Both share the same root cause class — repo-scoped .claude/settings.json parsed and acted on before the trust dialog gates the action — and both upstream patches narrow the trust-gate ordering. ATR rules anchor on the dangerous config shape so detection survives both pre- and post-patch deployments.

v2.2.0

2026-05-12
Added
  • 75 new rules across 5 categories (prompt-injection 57 + context-exfiltration 6 + tool-poisoning 4 + excessive-autonomy 2 + model-abuse 2). Total rule count: 344 → 419.
  • 5 new rule source integrations: HackAPrompt (EMNLP 2023, 4,780 samples, ATR-2026-00452..00456), NeMo-Guardrails + llm-guard + Promptfoo vendor test suites (94 samples combined, ATR-2026-00500..00505), PromptInject (NeurIPS 2022, ATR-2026-00506..00509), OWASP LLM Top 10 + MITRE ATLAS PoCs (8 standards-aligned rules ATR-2026-00510..00517).
  • 6-framework compliance metadata on all 75 new rules (OWASP / MITRE ATLAS / NIST AI RMF / EU AI Act / ISO 42001 / SAFE-MCP).
  • 53 rules with regex generalized from literal corpus fingerprints to multi-layer structural patterns.
  • 4 rules kept as KEPT-AS-IS corpus fingerprints (ATR-GARAK-a7fcb4e5 + 3 others): cannot generalize without unacceptable FP rate; marked experimental, not for production blocking.
Changed
  • Total rule count: 344 → 419 (357 stable + 62 experimental).
  • NVIDIA garak wrapped rule count: 293 → 419.
Metrics
  • HackAPrompt (4,780 samples): recall 28.6% → 66.2% (+37.6pp), 100% precision maintained.
  • garak (3,475 prompts): ATR-core families ~80%+; per-family breakdown: latentinjection 34.4%, sysprompt_extraction 67.9%, dan 90.2%.
  • PINT (850 samples): 0.25% FP maintained, 0 FP regression.
  • SKILL.md (341 samples): 100% precision, 0% FP, 0 FP regression.
  • 0 FP regression on benign corpus (432 real-world benign skills).
Scope
  • PyRIT Pliny L1B3RT4S: refused — Anthropic usage policy prevented subagent import.
  • AdvBench, HarmBench, JailbreakBench: reclassified to data/test-corpora/; describe target behaviors, not attack payloads.

v2.1.4

2026-05-12
Added
  • ATR-2026-00448 (tool-poisoning): Spring AI MilvusVectorStore filter-expression injection (CVE-2026-41705). Tautology-based deletion-broadening, paren-breakout, in-clause chaining, terminator-DROP, like-ESCAPE bypass primitive. 8 TP / 6 TN / 0 FP on 431-sample benign corpus.
  • ATR-2026-00449 (context-exfiltration): Spring AI PromptChatMemoryAdvisor cross-user memory leakage (CVE-2026-41712). Shared / singleton ChatMemory wiring, placeholder conversation_id, mismatched user-tag windows, wildcard-tenant scope. 8 TP / 6 TN / 0 FP.
  • ATR-2026-00450 (data-poisoning): Spring AI PromptChatMemoryAdvisor memory poisoning (CVE-2026-41713). Persistence-aware ignore-previous, bracketed [SYSTEM-MEMORY-PERSIST] canaries, smuggled role-tag prefixes, REMEMBER:-shaped persona override, ChatMemory.add(SystemMessage) sink-level concat. 8 TP / 6 TN / 0 FP.
  • ATR-2026-00451 (privilege-escalation): LiteLLM proxy admin endpoint SQL injection (CVE-2026-42208, CISA KEV 2026-05-08, CVSS 9.3, federal remediation due 2026-05-11). Tautology + comment-out, stacked DROP / TRUNCATE, UNION SELECT exfil, pg_sleep blind, Postgres metadata recon, error-based extractvalue / updatexml / load_extension. 8 TP / 6 TN / 0 FP.
Changed
  • Total rule count: 344 → 348.
Anchors
  • 3 Spring AI CVEs (CVE-2026-41705 / -41712 / -41713) disclosed 2026-05-08 in Spring Security Advisory; patches in Spring AI >= 1.0.0.
  • 1 CISA Known Exploited Vulnerabilities catalog entry: CVE-2026-42208 added 2026-05-08, federal remediation due 2026-05-11, active exploitation observed against financial services and healthcare deployments; patch in LiteLLM >= 1.48.3.

v2.1.3

2026-05-11
Added
  • 6 HackAPrompt-cluster rules (ATR-2026-00442..00447) + benchmark harness (PR #51). Auto-published via tc-pr-back release workflow after PR #51 merge.
Changed
  • Total rule count: 338 → 344.

v2.1.2

2026-05-11
Added
  • ATR-2026-00440 (agent-manipulation): Microsoft Semantic Kernel CVE-2026-26030 — In-Memory Vector Store lambda+eval RCE. AST-traversal-via-mro primitives, BuiltinImporter reflective access, Function-constructor variants. 8 TP / 5 TN / 0 FP on 466-sample benign corpus.
  • ATR-2026-00441 (privilege-escalation): Microsoft Semantic Kernel CVE-2026-25592 — SessionsPythonPlugin arbitrary file write to autostart paths. Windows Start Menu Startup, XDG autostart, systemd-user, cron, macOS LaunchAgents/Daemons, Windows Registry Run-key persistence. 7 TP / 5 TN / 0 FP.
  • `src/redact.ts` new module: redactMatchedValue() + redactMatchedValues() helpers. ATR-consuming integrations can run ATRMatch.matchedPatterns entries through this before logging. Recognises AWS keys, GitHub tokens, Slack tokens, OpenAI/Anthropic keys, Bearer creds, JWTs, PEM keys. 13 unit tests. Additive opt-in API — no existing API changes.
Changed
  • Total rule count: 336 → 338.
Loop closure
  • Microsoft Copilot SWE Agent opened microsoft/agent-governance-toolkit#1981 2026-05-11 06:07 UTC with regression-test fixtures presuming ATR detection. PR #50 merged + v2.1.2 npm + GitHub release published within 2h 16m. Closing-the-loop comment mapped each fixture to ATR rule IDs.

v2.1.1

2026-05-10
Added
  • 6 ATR rules covering 7 gap CVEs identified by Phase 1 audit (CVSS 9.1–10.0). PR #46 + follow-up batches.
  • STIX 2.1 extension for x-atr-rule custom SDO (PR #49) — self-published extension definition; ATR rules can now be expressed as STIX SDOs for STIX-compliant consumers.
Adopted (external)
  • MISP/misp-taxonomies#323 merged 2026-05-10 by adulau (MISP project lead). 10 ATR predicates + 330 rule IDs as MISP machine tags.
  • MISP/misp-galaxy#1207 merged 2026-05-10 by adulau. 336 cluster values with kill-chain, severity, CVE / OWASP LLM / MITRE ATLAS cross-refs. 10,408 lines.

v2.1.0

2026-05-09
Added
  • NIST AI RMF mapping v0.2 — 100% coverage of all 4 AI RMF functions (Govern / Map / Measure / Manage), 72 controls + 31 cross-reference links + worked example profile. Schema-valid OSCAL catalog published under Agent-Threat-Rule/ai-rmf-oscal-catalog (PR #46).
  • Dedicated /compliance/nist-ai-rmf page on website with 314 → 330 rule count refresh sitewide.
Adopted (external)
  • OWASP/Agent-Security-Regression-Harness#74 merged 2026-05-11 by mertsatilmaz (OWASP Project Lead): "Welcome to the team."

v2.0.11

2026-04-21
Added
  • 193 new rules covering the full NVIDIA garak probe corpus (ATR-00300 ~ ATR-00414), bringing total to 311 rules - agent-manipulation: DanInTheWild community jailbreak library batch 7-9, AntiDAN inverted persona, ChatGPT DevMode RANTI, Image-Markdown unlock, Ablation_Dan_11_0 generator-name binding, AutoDAN hierarchical suffix, Goodside ThreatJSON death coercion, Doctor XML policy puppetry - prompt-injection: latentinjection separator/keyword, leakreplay cloze MASK, glitch token repeat-back (SolidGoldMagikarp et al.), echo SQL-jinja template injection, DRA parenthesis reconstruction, phrasing past/future tense, LMRC harm category, FigStep image-list multimodal, continuation ethnic slur, backspace deletion character, LLM special-token boundary, toxic continuation prefix, snowball impossible reasoning, ecoji/base2048/variant-selector/sneaky-bits encoding injection, zalgo/braille/sata encoded attacks - context-exfiltration: markdown image URL exfil + XSS, API key generation and partial-key completion - model-abuse: malwaregen code generation (TopLevel + SubFunctions), HuggingFace unsafe model artifact load - tool-poisoning: ANSI escape elicitation request
  • Word-boundary (\b) anchoring on grandma-roleplay rule to prevent partial-word FP (late/gran/read)
  • metadata_provenance.garak_probe field for ATR-00267 (GCGCached), ATR-00301 (TAPCached), and all new rules
  • Missing true_negatives added to 10 existing DAN-family rules (ATR-00352–00376) to meet ≥5 TN safety gate requirement
Changed
  • Total rule count: 113 → 311
  • garak in-the-wild benchmark recall: 69.7% → 97.1% (646/666 prompts detected against garak community jailbreak corpus)
  • SKILL.md precision/FP: maintained 100% precision, 0% FP on 498-sample benign corpus
  • FP mitigations applied to new rules (response action [alert] only for medium-FP-risk rules — phrasing, API key, malwaregen, SQL-jinja)
Fixed
  • ATR-00402 (grandma roleplay): added word boundaries to prevent late/gran/read substring FP in Firebase documentation
  • ATR-00401 ([MASK]): required elicitation context to prevent NLP-documentation FP
  • ATR-00403 (glitch tokens): required elicitation verb before token names (StreamerBot etc.) to prevent legitimate software reference FP
  • ATR-00411 (API key gen): negative lookahead on "from your dashboard/console" to exclude documentation requests

v2.0.10

2026-04-21
Added
  • garak coverage batch 10: ansiescape, badchars, realtoxicityprompts, topic, fileformats (ATR-00393~00398)

v2.0.9

2026-04-20
Added
  • garak DanInTheWild coverage batch 8-9: jailbreak templates, emoji-flag, prompt-browser (ATR-00377~00392)

v2.0.0

2026-04-15
BREAKING
  • Compound detection gate: MCP rules now require 30%+ conditions to match in skill context. This prevents over-triggering when MCP-specific patterns appear in legitimate SKILL.md documentation.
  • Code block suppression: Skill rules no longer suppress matches inside code blocks (they are instructions, not documentation examples).
  • All fields resolve to content in skill context: tool_description, tool_args, user_input all map to the full skill content for static analysis.
Added
  • 26 new rules (87 to 113 total, 9 categories): - ATR-2026-00149: Compound exfiltration (12 patterns: SSH archive, wallet, browser, DNS, IMDS) - ATR-2026-00158/159/160: TC-crystallized rules (first rules generated end-to-end by Threat Cloud) - ATR-2026-00161: Cross-tool shadowing via IMPORTANT tag (from Invariant Labs PoC) - ATR-2026-00162: Credential exfil combo - ATR-2026-00163: Hidden override instructions - ATR-2026-00164: Scope hijack - Plus 19 more across prompt-injection, tool-poisoning, and data-exfiltration categories
  • RFC-001 v1.1 (ATR Quality Standard): - Two-Dimensional Compliance Model (metadata presence + provenance) - L0-L5 Review Tier Levels (first standard to include LLM-assisted review as formal tier) - Community Signal Aggregation (confirmations, FP reports, evasion reports) - Multi-Runtime Compatibility (14 runtimes: Claude Code, Cursor, Hermes, OpenAI Assistants, Google A2A, etc.) - Relaxed experimental gate (3/3/0) for community velocity; stable tier retains full 5/5/3 - Future Work: RFC-002 (Detection Types), RFC-003 (Collective Defense), RFC-004 (Enterprise)
  • GOVERNANCE.md v1.1: - Permanent MIT license commitment (never BSL/SSPL) - MITRE ATLAS positioning: ATR = detection rules complement to ATLAS (like Sigma to ATT&CK) - ATR Numbering Authority specification - Technical Advisory Group roadmap
  • 96,096 real-world skills scanned (OpenClaw 56K + Skills.sh 3K + Hermes 123 + ClawHub 36K + MCP Registry 5K) - 751 malware skills discovered (hightower6eu 354, sakaen736jih 212, 52yuanchangxing 137) - Research report: docs/research/openclaw-malware-campaign-2026-04.md - NousResearch notified (hermes-agent#9809)
  • Threat Cloud blacklist: 554 entries uploaded from wild scan
  • 4 export formats: SARIF v2.1.0, Splunk SPL, Elasticsearch DSL, generic regex (714 portable patterns)
  • Exponential backoff for npm registry crawl (fixes Daily Ecosystem Scan rate limiting)
Changed
  • Engine: compound gate for context-aware matching
  • Confidence formula: cross-context penalty (0.7x) for rules firing outside their tested runtime
  • SKILL.md denylist reduced from 22 to 10 rules (re-enabled low-FP rules for better coverage)
Metrics
  • 113 rules (7 stable, 85 experimental, 21 draft)
  • 361 tests passing
  • SKILL.md benchmark (498 corpus): 100% recall, 97% precision, 0.20% FP rate
  • MCP benchmark (PINT 850): 62.7% recall, 99.6% precision
  • Wild scan (96K): 1.35% flag rate, 751 confirmed malware
  • OWASP Agentic Top 10: 10/10 coverage
  • MITRE ATLAS: 100/113 rules mapped
  • SAFE-MCP: 91.8% technique coverage
  • Avg scan latency: 14ms per file
Ecosystem
  • Cisco AI Defense: 34 ATR rules shipped in production (PR #79 merged)
  • OWASP: Attack examples contributed (PR #814 merged)
  • NousResearch: Malware campaign reported (hermes-agent#9809)

v1.0.0

2026-04-06
BREAKING
  • Rule IDs: 3-digit → 5-digit (ATR-2026-001ATR-2026-00001). Zero-padded, detection logic unchanged.
Added
  • ATR-SPEC-v1.md — Formal rule format specification. Third parties can build conforming engines in any language.
  • GOVERNANCE.md — Contribution process, quality gates, severity rubric.
  • `scan_target` metadata — Every rule declares mcp or skill. Engines filter by scan context.
  • `rule_version` field — All rules carry version number. Bump on detection logic changes.
  • Unified CLIatr scan auto-detects JSON (MCP) vs .md (SKILL.md).
  • `ScanResult` type — Unified output with scan_type, content_hash (SHA-256).
  • MCP server `atr_scan_skill` tool — 7 tools total.
  • 6 new skill rules: context poisoning (00125), rug pull setup (00126), subcommand overflow (00127), HTML comment hidden payload (00128), unicode smuggling (00129), exfil URL in instructions (00135).
Metrics
  • 87 rules (75 MCP + 12 skill)
  • 53,399 real-world skills scanned (skills.sh + OpenClaw)
  • 670 confirmed malicious skills detected
  • SKILL.md recall: 94.4%, FP rate: 0.096%
  • MCP recall: 62.7% (PINT 850, unchanged)

v0.3.0

2026-03-18
Added
  • Evaluation framework (src/eval/): - npm run eval: 341-sample corpus (321 attacks + 20 benign), 9 attack categories - npm run eval:pint: 850-sample external PINT benchmark - Per-rule quality metrics (TP/FP/matchCount per rule) - Confusion matrix, precision/recall/F1, latency percentiles - Regression gate (auto-fail on metric degradation) - JSON report output (data/eval-report.json)
  • CI gate (.github/workflows/eval.yml): typecheck + test + eval + validate on PR
  • 279 auto-extracted corpus samples from rule test_cases
  • 8 new detection layers in ATR-2026-001: - forget-everything shorthand, task switching, system prompt extraction, - praise-then-redirect, German formal/informal, French injection patterns
  • PINT benchmark integration (deepset/prompt-injections + Lakera gandalf datasets)
Changed
  • Embedding similarity threshold: 0.82 -> 0.65 (10 extra TP, 0 extra FP on PINT)
  • Test count: 225 -> 246 (+21 eval framework tests)
Fixed
  • shadow-evaluator.ts type error (TS2352)
  • Removed external product references from ATR-FRAMEWORK-SPEC.md
  • Added temp file patterns to .gitignore
Benchmark Results (honest numbers)
  • Self-corpus (341 samples): Precision 100%, Recall 99.4%, F1 99.5%
  • PINT external (850 samples): Precision 99.4%, Recall 39.9%, F1 57.0%
  • Only 6/61 rules fire on external attacks
  • See LIMITATIONS.md for full analysis

v0.2.3

2026-03-16
Added
  • 9 new rules from Threat Cloud community promotion (ATR-2026-100~108): consent bypass, trust escalation, disguised analytics exfiltration, hidden safety bypass, persona hijacking, silent action concealment, schema-description contradiction, delayed execution bypass, Sybil attack
  • Python engine (pyATR) v0.2.0: validate, test, stats CLI commands, 48 tests
  • Splunk SPL converter (atr convert splunk)
  • Elastic Query DSL converter (atr convert elastic)
  • Layer 3 LLM-as-judge prompt templates (docs/layer3-prompt-templates.md)
  • Automated scan pipeline (scripts/auto-scan-pipeline.sh)
  • Deployment guide for external teams (docs/deployment-guide.md)
  • MCP ecosystem security audit report: 1,295 packages, 14,299 tools
  • npm crawler with pagination (795 → 2,769 discoverable packages)
Fixed
  • CLI test runner: handle tool_description field, fix event type mapping
  • All 61 rules pass embedded test cases (556/556, 100%)
  • CJK test cases moved to evasion_tests (honest: regex can't match them)
  • Removed all external product references for ATR independence
  • Fixed pyATR URLs pointing to wrong GitHub org
Stats
  • 61 rules (44 experimental + 17 draft)
  • 556 test cases (100% pass rate)
  • 164 TypeScript tests + 48 Python tests = 212 engine tests
  • 12 SIEM converter tests

v0.2.2

2026-03-14
Fixed
  • ReDoS vulnerability in SSRF rule (ATR-2026-013) — O(n^2) backtracking on long hostnames
  • SSRF rule false positive on filesystem paths like /home/user/
  • tool_args field extraction fallback for tool_call events
Added
  • True negatives for 21 rules with insufficient test coverage (7 had zero, 14 had only 1)
  • Vitest coverage reporting with v8 provider (60%+ threshold on core modules)

v0.2.1

2026-03-10
Changed
  • Standardized 17 predicted rule IDs (ATR-PRED → ATR-2026-080~096)
  • Fixed rule validator for skill-compromise category
  • Toned down coverage claims to reflect actual verification status

v0.1.0-rc2

2026-03-09
Added
  • 32 initial experimental detection rules across 9 attack categories
  • TypeScript reference engine with SessionTracker
  • OWASP Top 10 for Agentic Applications (2026) mapping (6 covered, 2 partial, 2 gaps)
  • 13 CVE reference mappings across 16 rules (pattern-based, not empirically verified)
  • OWASP LLM Top 10 (2025) mapping (7 covered, 3 gaps)
  • MITRE ATLAS technique references
  • JSON Schema specification (spec/atr-schema.yaml)
  • Built-in true positive and true negative test cases for every rule
  • Attack corpus validation tests
  • Coverage report (COVERAGE.md)
Attack Categories
  • Prompt Injection (5 rules)
  • Tool Poisoning (4 rules)
  • Context Exfiltration (3 rules)
  • Agent Manipulation (3 rules)
  • Privilege Escalation (2 rules)
  • Excessive Autonomy (2 rules)
  • Skill Compromise (7 rules)
  • Data Poisoning (1 rule)
  • Model Security (2 rules)