For red teams

Find the attack.
We ship the rule.
2 hours 16 minutes.

Microsoft's Copilot SWE Agent already opens PRs presuming ATR exists. Cisco AI Defense ships 314 of our rules by default. MISP exports them to STIX with your name on them. NeurIPS 2024's HarmBench, AgentDojo, JailbreakBench wire through next.

Submit a probe →See who's next →How it works →

Why this is worth your time

2h 16m

Disclosure → npm publish

2026-05-07 Microsoft Security disclosed Semantic Kernel CVE-2026-26030. 2026-05-11 06:07 UTC Microsoft Copilot SWE Agent opened a regression-test PR presuming ATR coverage. 08:24 UTC v2.1.2 published on npm with paired rules. End-to-end loop.

Tier-1 institutions integrating

Microsoft (Agent Governance Toolkit weekly auto-sync). Cisco AI Defense (314-rule pack in production). CIRCL/MISP (taxonomies + galaxy merged by project lead). OWASP (Project Lead merged with "Welcome to the team"). FINOS, NIST OSCAL, UK Gov AISI in motion.

344

Rules. Each with your name attached forever

Every rule carries author + metadata_provenance.discovered_by. Microsoft AGT, Cisco AI Defense, MISP, OWASP all preserve it on sync. When MISP exports to STIX, attribution survives. When NIST cites the rule, lineage is intact.

0 FP

Required across 3,551 benign samples

6-check quality gate: own-TP must match + 1,784 benign + 157 research-mention + 1,611 cross-rule conflict-free + own true_negative coverage. Rules that fire on the paper describing the attack don't ship.

What you get

Four concrete payoffs depending on who you are.

For

Academic researchers

Citable artifact paired with your attack. Concrete adoption metrics for your CV: number of rules, downstream consumers, fire counts in production telemetry.

Your name appears in the rule file's author + metadata_provenance.discovered_by. When MISP exports the taxonomy to STIX, your attribution propagates. When NIST cites the rule in a publication (in-progress with iMichaela at NIST OSCAL), the lineage is intact.

For

Corporate red teams

Your adversarial work becomes the defensive standard before competitors. Your team shows up as the discoverer in 350+ rules consumed by every major AI-security platform.

Microsoft's Copilot SWE Agent already opens PRs presuming ATR coverage (AGT #1981, closed 2026-05-11). Being the originator of ATR-2026-NNNNN rules is a real authority signal in vendor-eval conversations.

For

Independent researchers

Ship detection without writing regex. Submit positive + negative examples — auto-regex tries 4 variants against the full gate, ~30% pass on first attempt.

Auto-regex deterministic generator clears 0 FP across 3,551 samples (benign + extended + research-mention + cross-rule). The PR labels itself gate-passed and goes straight to maintainer review.

For

Bug bounty hunters

A second income stream for the same attack: the bug-bounty payout AND a permanent detection rule. Combines well with Huntr.dev, Hackerone AI scope, Protect AI bounty programmes.

ATR proposals are MIT-licensed and citable. No NDA conflict — the bounty programme owns the responsible-disclosure window; ATR ships detection after disclosure with public attribution.

What happens after you submit

Probe in. Auto-regex generates. Quality gate validates. Merge if green.

You fill the form

3 attack samples, 3 benign lookalikes, attack category, source paper / repo. Takes 3-5 minutes.

No schema to learn. No YAML to write. No fork.

Auto-regex tries 4 variants

Deterministic n-gram set-cover algorithm extracts distinctive phrases from your positives, builds an alternation regex, tightens with word boundaries / whitespace anchors / co-occurrence constraints. Each variant runs through the full gate.

Gate = your TPs must match 100% + 1,784-sample benign corpus 0 FP + 157-sample research-mention 0 FP + 0 cross-rule conflicts.

Gate clears → complete rule on a PR

PR lands with the gate-passed label. Maintainer reviews regex shape — is it too literal, can it generalize? Usually merged within 1-3 days. If gate didn't clear, stays as stub and a maintainer hand-crafts the regex (still using your test cases as ground truth).

Rule auto-propagates downstream

Microsoft AGT syncs weekly. Cisco AI Defense pins to release tags. MISP taxonomy + galaxy pull on every release. OWASP A-S-R-H references rule IDs in fixtures. Your discovered_by field propagates through the whole chain.

Microsoft Semantic Kernel CVE went from public disclosure to v2.1.2 npm publish in 2h 16m. That's the cadence.

Red team tooling in motion

ATR is wired into the red-team ecosystem. Your tool can be next.

These are red-team tools — offensive testing frameworks and adversarial corpora. Defensive frameworks live on /ecosystem.

Integrated (1)

HackAPrompt

Learn Prompting

Corpus ingested

The largest crowd-sourced prompt-injection competition corpus, ~600k attack attempts across all GPT/Claude/PaLM versions.

Clustered the HackAPrompt corpus by attack family and shipped 6 ATR rules (ATR-2026-00442..00447) covering the dominant clusters. Each rule cites the HackAPrompt cluster in its provenance.

Tool →PR →

Under review (5)

NVIDIA Garak

NVIDIA AI Red Team

The reference open-source LLM vulnerability scanner. 50+ probe families, jmartin-tech + leondz maintainers.

Wrapped 330 ATR rules as garak detectors. PR #1676 cleared two review rounds; in-the-wild benchmark posted 97.1% recall (646/666) on garak's own community jailbreak corpus.

Microsoft PyRIT

Microsoft AI Red Team

The toolkit Microsoft uses internally to red-team production LLM products. Roman Lutz leads.

Added an ATR dataset loader exposing the rule corpus as PyRIT attack sources. Roman reviewed within 2 min on first push; iterating on doc shape.

PromptInject

agencyenterprise · NeurIPS 2022 Best Paper

The original academic benchmark that started prompt-injection research. 8.2k stars; cited by every prompt-injection paper since.

Issue #9 proposes a corpus-to-ATR pipeline turning every PromptInject attack into a paired ATR rule. The reference academic benchmark gets defensive parity.

Promptfoo

Promptfoo Inc.

10k stars, used by red teams at Klarna, Discord, Anduril. Promptfoo runs adversarial tests; ATR catches what Promptfoo found.

PR #8529 adds an MCP red-team example using ATR as the deterministic defense layer. Promptfoo runs the probe; ATR rules return the verdict.

Damn Vulnerable MCP Server

harishsg993010

A CTF-style training target with 10 intentionally-vulnerable MCP scenarios. The DVWA of agent security.

PR #29 ships the blue-team detection guide — every CTF challenge gets a paired ATR rule so trainees learn detection alongside the attack.

On deck

Red-team integrations queued. One a week. Public schedule.

ATR doesn't only chase the biggest names. Here's the queue for the next six weeks — real dates, real targets. Maintainers who see themselves on the schedule tend to review faster. That's the side-effect of a public commitment.

HarmBench

Center for AI Safety · Dan Hendrycks

NeurIPS 2024 spotlight, 320-behavior standardized red-team benchmark used by Anthropic / OpenAI / Google safety teams.

Filing

2026-05-13

scheduled

AgentDojo

ETH Zurich SPY Lab · Florian Tramèr

NeurIPS 2024 — the only agent-specific attack benchmark with a real tool-use harness. 78 attack tasks across 4 environments.

Filing

2026-05-16

scheduled

JailbreakBench

Princeton · Patrick Chao

NeurIPS 2024 Datasets & Benchmarks. 100-behavior standardized jailbreak leaderboard, fixed eval interface.

Filing

2026-05-20

scheduled

TextAttack

QData · ACL 2020

3.1k stars. The reference NLP-adversarial framework. Used in undergraduate security curricula.

Filing

2026-05-22

scheduled

Microsoft Counterfit

Microsoft Azure Security

Microsoft's CLI-first AI red-teaming tool. Pairs with PyRIT (discovery) and AGT (governance) for a Microsoft trifecta on shared ATR backend.

Filing

2026-05-26

scheduled

InjecAgent

UIUC Kang Lab

Cleanest direct-vs-indirect agent injection taxonomy. 1,054 attack cases, complements AgentDojo.

Filing

2026-05-29

queued

GPTFuzz

NDSS 2024

First credible LLM fuzzer. Detection-evasion pass closes the loop on what their fuzzer finds.

Filing

2026-06-02

queued

R-Judge

Tongxin Yuan et al.

162-scenario agent safety benchmark. LLM-as-judge + content-rule = complementary defense lanes.

Filing

2026-06-05

queued

Schedule syncs from filed GitHub issues / PRs. Once merged, entries move to "Already Integrated" above. Maintainers wanting earlier engagement: [email protected].

Pipeline is not vapor

Auto-regex already clears 0 FP across 3,551 samples on the sample probe.

$ npx tsx scripts/auto-regex.ts \
    --file proposals/red-team-probes/dan-trust-phrase-wrapping.proposal.yaml \
    --write

[auto-regex] 3 TPs, 3 TNs — generating candidate regex…
[auto-regex] gate corpora: 431 benign + 1,352 extended + 157 research + 1,611 cross-rule TNs
[auto-regex] variant 0: 3 phrases, tp=100%, fp=0
  (benign=0 ext=0 res=0 cross=0) — PASS
[auto-regex] wrote regex to proposals/red-team-probes/...

::auto-regex-summary::
{ "passed": true, "variant": 0, "tp_coverage": 1, "total_fp": 0 }

Submit a probe

10 minutes of form-filling. Permanent attribution. Ships to every major AI security platform.

MIT licensed. No CLA. No telemetry. Forever free. You retain every right to publish the attack itself — ATR only carries the detection.

Open a probe issue →Read the repo →Defensive side at /ecosystem →

344 rules deployed across 8 threat categories. Every one has an author field and metadata_provenance.discovered_by intact.

Find the attack.We ship the rule.2 hours 16 minutes.

Four concrete payoffs depending on who you are.

Probe in. Auto-regex generates. Quality gate validates. Merge if green.

You fill the form

Auto-regex tries 4 variants

Gate clears → complete rule on a PR

Rule auto-propagates downstream

ATR is wired into the red-team ecosystem. Your tool can be next.

HackAPrompt

Red-team integrations queued. One a week. Public schedule.

Auto-regex already clears 0 FP across 3,551 samples on the sample probe.

10 minutes of form-filling. Permanent attribution. Ships to every major AI security platform.

Find the attack.
We ship the rule.
2 hours 16 minutes.