Skip to content
For red teams

Find the attack.
We ship the rule.
2 hours 16 minutes.

Microsoft's Copilot SWE Agent already opens PRs presuming ATR exists. Cisco AI Defense ships 314 of our rules by default. MISP exports them to STIX with your name on them. NeurIPS 2024's HarmBench, AgentDojo, JailbreakBench wire through next.

Why this is worth your time
2h 16m
Disclosure → npm publish

2026-05-07 Microsoft Security disclosed Semantic Kernel CVE-2026-26030. 2026-05-11 06:07 UTC Microsoft Copilot SWE Agent opened a regression-test PR presuming ATR coverage. 08:24 UTC v2.1.2 published on npm with paired rules. End-to-end loop.

7
Tier-1 institutions integrating

Microsoft (Agent Governance Toolkit weekly auto-sync). Cisco AI Defense (314-rule pack in production). CIRCL/MISP (taxonomies + galaxy merged by project lead). OWASP (Project Lead merged with "Welcome to the team"). FINOS, NIST OSCAL, UK Gov AISI in motion.

344
Rules. Each with your name attached forever

Every rule carries author + metadata_provenance.discovered_by. Microsoft AGT, Cisco AI Defense, MISP, OWASP all preserve it on sync. When MISP exports to STIX, attribution survives. When NIST cites the rule, lineage is intact.

0 FP
Required across 3,551 benign samples

6-check quality gate: own-TP must match + 1,784 benign + 157 research-mention + 1,611 cross-rule conflict-free + own true_negative coverage. Rules that fire on the paper describing the attack don't ship.

What you get

Four concrete payoffs depending on who you are.

For
Academic researchers

Citable artifact paired with your attack. Concrete adoption metrics for your CV: number of rules, downstream consumers, fire counts in production telemetry.

Your name appears in the rule file's author + metadata_provenance.discovered_by. When MISP exports the taxonomy to STIX, your attribution propagates. When NIST cites the rule in a publication (in-progress with iMichaela at NIST OSCAL), the lineage is intact.

For
Corporate red teams

Your adversarial work becomes the defensive standard before competitors. Your team shows up as the discoverer in 350+ rules consumed by every major AI-security platform.

Microsoft's Copilot SWE Agent already opens PRs presuming ATR coverage (AGT #1981, closed 2026-05-11). Being the originator of ATR-2026-NNNNN rules is a real authority signal in vendor-eval conversations.

For
Independent researchers

Ship detection without writing regex. Submit positive + negative examples — auto-regex tries 4 variants against the full gate, ~30% pass on first attempt.

Auto-regex deterministic generator clears 0 FP across 3,551 samples (benign + extended + research-mention + cross-rule). The PR labels itself gate-passed and goes straight to maintainer review.

For
Bug bounty hunters

A second income stream for the same attack: the bug-bounty payout AND a permanent detection rule. Combines well with Huntr.dev, Hackerone AI scope, Protect AI bounty programmes.

ATR proposals are MIT-licensed and citable. No NDA conflict — the bounty programme owns the responsible-disclosure window; ATR ships detection after disclosure with public attribution.

What happens after you submit

Probe in. Auto-regex generates. Quality gate validates. Merge if green.

01

You fill the form

3 attack samples, 3 benign lookalikes, attack category, source paper / repo. Takes 3-5 minutes.

No schema to learn. No YAML to write. No fork.

02

Auto-regex tries 4 variants

Deterministic n-gram set-cover algorithm extracts distinctive phrases from your positives, builds an alternation regex, tightens with word boundaries / whitespace anchors / co-occurrence constraints. Each variant runs through the full gate.

Gate = your TPs must match 100% + 1,784-sample benign corpus 0 FP + 157-sample research-mention 0 FP + 0 cross-rule conflicts.

03

Gate clears → complete rule on a PR

PR lands with the gate-passed label. Maintainer reviews regex shape — is it too literal, can it generalize? Usually merged within 1-3 days. If gate didn't clear, stays as stub and a maintainer hand-crafts the regex (still using your test cases as ground truth).

04

Rule auto-propagates downstream

Microsoft AGT syncs weekly. Cisco AI Defense pins to release tags. MISP taxonomy + galaxy pull on every release. OWASP A-S-R-H references rule IDs in fixtures. Your discovered_by field propagates through the whole chain.

Microsoft Semantic Kernel CVE went from public disclosure to v2.1.2 npm publish in 2h 16m. That's the cadence.

Red team tooling in motion

ATR is wired into the red-team ecosystem. Your tool can be next.

These are red-team tools — offensive testing frameworks and adversarial corpora. Defensive frameworks live on /ecosystem.

Integrated (1)

HackAPrompt

Learn Prompting
Corpus ingested

The largest crowd-sourced prompt-injection competition corpus, ~600k attack attempts across all GPT/Claude/PaLM versions.

Clustered the HackAPrompt corpus by attack family and shipped 6 ATR rules (ATR-2026-00442..00447) covering the dominant clusters. Each rule cites the HackAPrompt cluster in its provenance.

Under review (5)
On deck

Red-team integrations queued. One a week. Public schedule.

ATR doesn't only chase the biggest names. Here's the queue for the next six weeks — real dates, real targets. Maintainers who see themselves on the schedule tend to review faster. That's the side-effect of a public commitment.

01
HarmBench
Center for AI Safety · Dan Hendrycks

NeurIPS 2024 spotlight, 320-behavior standardized red-team benchmark used by Anthropic / OpenAI / Google safety teams.

Filing
2026-05-13
scheduled
02
AgentDojo
ETH Zurich SPY Lab · Florian Tramèr

NeurIPS 2024 — the only agent-specific attack benchmark with a real tool-use harness. 78 attack tasks across 4 environments.

Filing
2026-05-16
scheduled
03
JailbreakBench
Princeton · Patrick Chao

NeurIPS 2024 Datasets & Benchmarks. 100-behavior standardized jailbreak leaderboard, fixed eval interface.

Filing
2026-05-20
scheduled
04
TextAttack
QData · ACL 2020

3.1k stars. The reference NLP-adversarial framework. Used in undergraduate security curricula.

Filing
2026-05-22
scheduled
05
Microsoft Counterfit
Microsoft Azure Security

Microsoft's CLI-first AI red-teaming tool. Pairs with PyRIT (discovery) and AGT (governance) for a Microsoft trifecta on shared ATR backend.

Filing
2026-05-26
scheduled
06
InjecAgent
UIUC Kang Lab

Cleanest direct-vs-indirect agent injection taxonomy. 1,054 attack cases, complements AgentDojo.

Filing
2026-05-29
queued
07
GPTFuzz
NDSS 2024

First credible LLM fuzzer. Detection-evasion pass closes the loop on what their fuzzer finds.

Filing
2026-06-02
queued
08
R-Judge
Tongxin Yuan et al.

162-scenario agent safety benchmark. LLM-as-judge + content-rule = complementary defense lanes.

Filing
2026-06-05
queued

Schedule syncs from filed GitHub issues / PRs. Once merged, entries move to "Already Integrated" above. Maintainers wanting earlier engagement: [email protected].

Pipeline is not vapor

Auto-regex already clears 0 FP across 3,551 samples on the sample probe.

$ npx tsx scripts/auto-regex.ts \
    --file proposals/red-team-probes/dan-trust-phrase-wrapping.proposal.yaml \
    --write

[auto-regex] 3 TPs, 3 TNs — generating candidate regex…
[auto-regex] gate corpora: 431 benign + 1,352 extended + 157 research + 1,611 cross-rule TNs
[auto-regex] variant 0: 3 phrases, tp=100%, fp=0
  (benign=0 ext=0 res=0 cross=0) — PASS
[auto-regex] wrote regex to proposals/red-team-probes/...

::auto-regex-summary::
{ "passed": true, "variant": 0, "tp_coverage": 1, "total_fp": 0 }
Submit a probe

10 minutes of form-filling. Permanent attribution. Ships to every major AI security platform.

MIT licensed. No CLA. No telemetry. Forever free. You retain every right to publish the attack itself — ATR only carries the detection.

344 rules deployed across 8 threat categories. Every one has an author field and metadata_provenance.discovered_by intact.