Find the attack.
We ship the rule.
2 hours 16 minutes.
Microsoft's Copilot SWE Agent already opens PRs that presume ATR coverage exists. Cisco AI Defense ships 314 of our rules by default. MISP exports them to STIX with your name on them. NeurIPS 2024's HarmBench, AgentDojo, and JailbreakBench wire through next.
2026-05-07: Microsoft Security disclosed Semantic Kernel CVE-2026-26030. 2026-05-11, 06:07 UTC: Microsoft's Copilot SWE Agent opened a regression-test PR presuming ATR coverage. 08:24 UTC: v2.1.2 published on npm with paired rules. End-to-end loop.
Microsoft (Agent Governance Toolkit weekly auto-sync). Cisco AI Defense (314-rule pack in production). CIRCL/MISP (taxonomies + galaxy merged by project lead). OWASP (Project Lead merged with "Welcome to the team"). FINOS, NIST OSCAL, UK Gov AISI in motion.
Every rule carries author + metadata_provenance.discovered_by. Microsoft AGT, Cisco AI Defense, MISP, and OWASP all preserve it on sync. When MISP exports to STIX, attribution survives. When NIST cites the rule, lineage is intact.
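For orientation, here is that attribution surface sketched as a TypeScript type. Only author and metadata_provenance.discovered_by are named on this page; every other field is illustrative, not the repo's actual schema:

```ts
// Sketch only: the real rule schema lives in the ATR repo.
// Only `author` and `metadata_provenance.discovered_by` come from this page.
interface AtrRule {
  id: string;                // e.g. "ATR-2026-00442"
  author: string;            // who wrote the rule
  metadata_provenance: {
    discovered_by: string;   // who found the attack; survives MISP-to-STIX export
    source?: string;         // hypothetical field: paper, repo, or advisory URL
  };
  pattern: string;           // hypothetical field: the detection regex
}
```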
6-check quality gate: your own true positives must match, 0 FP across 1,784 benign samples, 0 FP across 157 research-mention samples, 0 conflicts across 1,611 cross-rule checks, and your own true negatives must stay silent. Rules that fire on the paper describing the attack don't ship.
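A minimal sketch of what those checks amount to, assuming in-memory corpora and a compiled rule pattern; the corpus names mirror the counts above, everything else is illustrative:

```ts
// Illustrative gate check. Corpora mirror the counts cited above;
// loading, reporting, and the per-corpus breakdown are simplified.
interface GateInput {
  pattern: RegExp;
  ownTruePositives: string[];    // submitter's attack samples: all must match
  ownTrueNegatives: string[];    // submitter's benign lookalikes: none may match
  benign: string[];              // 1,784-sample benign corpus: 0 FP
  researchMentions: string[];    // 157 research mentions of attacks: 0 FP
  crossRuleNegatives: string[];  // 1,611 other rules' negatives: conflict-free
}

function gatePasses(g: GateInput): boolean {
  const fires = (s: string) => g.pattern.test(s);
  return (
    g.ownTruePositives.every(fires) &&
    !g.ownTrueNegatives.some(fires) &&
    !g.benign.some(fires) &&
    !g.researchMentions.some(fires) &&
    !g.crossRuleNegatives.some(fires)
  );
}
```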
Four concrete payoffs depending on who you are.
Citable artifact paired with your attack. Concrete adoption metrics for your CV: number of rules, downstream consumers, fire counts in production telemetry.
Your name appears in the rule file's author + metadata_provenance.discovered_by. When MISP exports the taxonomy to STIX, your attribution propagates. When NIST cites the rule in a publication (in-progress with iMichaela at NIST OSCAL), the lineage is intact.
Your adversarial work becomes the defensive standard before your competitors'. Your team shows up as the discoverer in 350+ rules consumed by every major AI-security platform.
Microsoft's Copilot SWE Agent already opens PRs presuming ATR coverage (AGT #1981, closed 2026-05-11). Being the originator of ATR-2026-NNNNN rules is a real authority signal in vendor-eval conversations.
Ship detection without writing regex. Submit positive and negative examples; auto-regex tries 4 variants against the full gate, and ~30% pass on the first attempt.
Auto-regex's deterministic generator clears 0 FP across 3,551 samples (benign + extended + research-mention + cross-rule). The PR labels itself gate-passed and goes straight to maintainer review.
A second income stream from the same attack: the bug-bounty payout AND a permanent detection rule. Combines well with Huntr.dev, HackerOne's AI scope, and Protect AI bounty programmes.
ATR proposals are MIT-licensed and citable. No NDA conflict: the bounty programme owns the responsible-disclosure window; ATR ships detection after disclosure, with public attribution.
Probe in. Auto-regex generates. Quality gate validates. Merge if green.
You fill the form
3 attack samples, 3 benign lookalikes, attack category, source paper / repo. Takes 3-5 minutes.
No schema to learn. No YAML to write. No fork.
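For the curious, here is roughly the data the form collects, sketched as a TypeScript type; the field names are illustrative, since the form handles the serialization for you (which is the point):

```ts
// Illustrative shape of a form submission; not the real schema.
interface ProbeProposal {
  attackSamples: [string, string, string];    // 3 attack samples
  benignLookalikes: [string, string, string]; // 3 benign lookalikes
  category: string;                           // attack category
  source: string;                             // source paper or repo URL
}
```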
Auto-regex tries 4 variants
A deterministic n-gram set-cover algorithm extracts distinctive phrases from your positives, builds an alternation regex, then tightens it with word boundaries, whitespace anchors, and co-occurrence constraints. Each variant runs through the full gate.
The gate: your TPs must match 100%, 0 FP on the 1,784-sample benign corpus, 0 FP on the 157-sample research-mention corpus, and 0 cross-rule conflicts.
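Roughly how a deterministic n-gram set-cover pass can work, sketched in TypeScript. The greedy selection and tie-break here are illustrative, not the repo's exact implementation, and the whitespace-anchor and co-occurrence tightening steps are omitted:

```ts
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Candidate n-grams (2..5 words) from one sample, lowercased.
function ngrams(text: string, min = 2, max = 5): string[] {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const out: string[] = [];
  for (let n = min; n <= max; n++)
    for (let i = 0; i + n <= words.length; i++)
      out.push(words.slice(i, i + n).join(" "));
  return out;
}

// Greedy set cover: keep picking the phrase that hits the most
// still-uncovered positives while appearing in no negative, until
// every positive is covered. Returns null if no covering set exists
// (the proposal then stays a stub for a maintainer to hand-craft).
function buildAlternation(positives: string[], negatives: string[]): RegExp | null {
  const candidates = new Set(positives.flatMap((p) => ngrams(p)));
  const safe = [...candidates].filter(
    (c) => !negatives.some((n) => n.toLowerCase().includes(c)),
  );
  const uncovered = new Set(positives.map((p) => p.toLowerCase()));
  const chosen: string[] = [];
  while (uncovered.size > 0) {
    let best: string | null = null;
    let bestHits = 0;
    for (const c of safe) {
      let hits = 0;
      for (const p of uncovered) if (p.includes(c)) hits++;
      if (hits > bestHits || (best !== null && hits === bestHits && c.length > best.length)) {
        best = c;
        bestHits = hits;
      }
    }
    if (best === null) return null;
    chosen.push(best);
    for (const p of [...uncovered]) if (p.includes(best)) uncovered.delete(p);
  }
  // Tighten with word boundaries; phrases are single-spaced here, so this
  // simplification skips the real generator's whitespace anchoring.
  return new RegExp(`\\b(?:${chosen.map(escapeRegex).join("|")})\\b`, "i");
}
```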
Gate clears → complete rule on a PR
The PR lands with the gate-passed label. A maintainer reviews the regex shape (is it too literal? can it generalize?) and usually merges within 1-3 days. If the gate didn't clear, the proposal stays as a stub and a maintainer hand-crafts the regex, still using your test cases as ground truth.
Rule auto-propagates downstream
Microsoft AGT syncs weekly. Cisco AI Defense pins to release tags. MISP taxonomy + galaxy pull on every release. OWASP A-S-R-H references rule IDs in fixtures. Your discovered_by field propagates through the whole chain.
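What a pin-to-tag consumer sync can look like, as a hedged sketch; the repo slug, tag, and file path are assumptions for illustration, not the actual distribution layout:

```ts
// Illustrative consumer-side sync, pinned to a release tag.
// Repo slug, tag, and rules path are assumptions for this sketch.
const REPO = "example-org/atr";  // hypothetical slug
const PINNED_TAG = "v2.1.2";     // consumers pin a tag rather than track main

async function fetchRules(): Promise<unknown> {
  const url = `https://raw.githubusercontent.com/${REPO}/${PINNED_TAG}/rules/index.json`;
  const res = await fetch(url); // global fetch, Node 18+
  if (!res.ok) throw new Error(`ATR sync failed: ${res.status}`);
  return res.json(); // rules keep author + metadata_provenance.discovered_by
}
```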
The Microsoft Semantic Kernel CVE went from Copilot's regression-test PR to the v2.1.2 npm publish in 2h 16m. That's the cadence.
ATR is wired into the red-team ecosystem. Your tool can be next.
These are red-team tools: offensive testing frameworks and adversarial corpora. Defensive frameworks live on /ecosystem.
HackAPrompt
The largest crowd-sourced prompt-injection competition corpus, ~600k attack attempts across all GPT/Claude/PaLM versions.
Clustered the HackAPrompt corpus by attack family and shipped 6 ATR rules (ATR-2026-00442..00447) covering the dominant clusters. Each rule cites the HackAPrompt cluster in its provenance.
garak
The reference open-source LLM vulnerability scanner. 50+ probe families; maintained by jmartin-tech and leondz.
Wrapped 330 ATR rules as garak detectors. PR #1676 cleared two review rounds; an in-the-wild benchmark posted 97.0% recall (646/666) on garak's own community jailbreak corpus.
PyRIT
The toolkit Microsoft uses internally to red-team production LLM products. Roman Lutz leads.
Added an ATR dataset loader exposing the rule corpus as PyRIT attack sources. Roman reviewed within 2 min on first push; iterating on doc shape.
PromptInject
The original academic benchmark that started prompt-injection research. 8.2k stars; cited by every prompt-injection paper since.
Issue #9 proposes a corpus-to-ATR pipeline turning every PromptInject attack into a paired ATR rule. The reference academic benchmark gets defensive parity.
Promptfoo
10k stars; used by red teams at Klarna, Discord, and Anduril. Promptfoo runs adversarial tests; ATR catches what Promptfoo finds.
PR #8529 adds an MCP red-team example using ATR as the deterministic defense layer. Promptfoo runs the probe; ATR rules return the verdict.
A CTF-style training target with 10 intentionally vulnerable MCP scenarios. The DVWA of agent security.
PR #29 ships the blue-team detection guide: every CTF challenge gets a paired ATR rule, so trainees learn detection alongside the attack.
Red-team integrations queued. One a week. Public schedule.
ATR doesn't only chase the biggest names. Here's the queue for the next six weeks: real dates, real targets. Maintainers who see themselves on the schedule tend to review faster. That's the side effect of a public commitment.
HarmBench: NeurIPS 2024 spotlight. A 320-behavior standardized red-team benchmark used by Anthropic, OpenAI, and Google safety teams.
AgentDojo: NeurIPS 2024. The only agent-specific attack benchmark with a real tool-use harness; 78 attack tasks across 4 environments.
JailbreakBench: NeurIPS 2024 Datasets & Benchmarks. A 100-behavior standardized jailbreak leaderboard with a fixed eval interface.
3.1k stars. The reference NLP-adversarial framework. Used in undergraduate security curricula.
Microsoft's CLI-first AI red-teaming tool. Pairs with PyRIT (discovery) and AGT (governance) for a Microsoft trifecta on a shared ATR backend.
The cleanest direct-vs-indirect agent-injection taxonomy. 1,054 attack cases; complements AgentDojo.
The first credible LLM fuzzer. A detection-evasion pass closes the loop on what the fuzzer finds.
A 162-scenario agent-safety benchmark. LLM-as-judge + content rules = complementary defense lanes.
Schedule syncs from filed GitHub issues / PRs. Once merged, entries move to "Already Integrated" above. Maintainers wanting earlier engagement: [email protected].
Auto-regex already clears 0 FP across 3,551 samples on the sample probe.
$ npx tsx scripts/auto-regex.ts \
--file proposals/red-team-probes/dan-trust-phrase-wrapping.proposal.yaml \
--write
[auto-regex] 3 TPs, 3 TNs — generating candidate regex…
[auto-regex] gate corpora: 431 benign + 1,352 extended + 157 research + 1,611 cross-rule TNs
[auto-regex] variant 0: 3 phrases, tp=100%, fp=0 (benign=0 ext=0 res=0 cross=0) — PASS
[auto-regex] wrote regex to proposals/red-team-probes/...
::auto-regex-summary::
{ "passed": true, "variant": 0, "tp_coverage": 1, "total_fp": 0 }10 minutes of form-filling. Permanent attribution. Ships to every major AI security platform.
MIT licensed. No CLA. No telemetry. Forever free. You retain every right to publish the attack itself; ATR only carries the detection.
344 rules deployed across 8 threat categories. Every one has an author field and metadata_provenance.discovered_by intact.