獻給紅隊

你找到攻擊。
我們發 detection 規則。
2 小時 16 分鐘。

Microsoft Copilot SWE Agent 自己會開 PR 預設 ATR 存在。Cisco AI Defense production 內建 314 條規則。MISP 全部 export 到 STIX 帶你的署名。NeurIPS 2024 的 HarmBench / AgentDojo / JailbreakBench 下一個 wave 接進來。

送一個 probe →看下一波接什麼 →怎麼跑 →

為什麼這值得貢獻

2h 16m

Disclosure → npm publish

2026-05-07 Microsoft Security disclosed Semantic Kernel CVE-2026-26030. 2026-05-11 06:07 UTC Microsoft Copilot SWE Agent opened a regression-test PR presuming ATR coverage. 08:24 UTC v2.1.2 published on npm with paired rules. End-to-end loop.

Tier-1 institutions integrating

Microsoft (Agent Governance Toolkit weekly auto-sync). Cisco AI Defense (314-rule pack in production). CIRCL/MISP (taxonomies + galaxy merged by project lead). OWASP (Project Lead merged with "Welcome to the team"). FINOS, NIST OSCAL, UK Gov AISI in motion.

344

Rules. Each with your name attached forever

Every rule carries author + metadata_provenance.discovered_by. Microsoft AGT, Cisco AI Defense, MISP, OWASP all preserve it on sync. When MISP exports to STIX, attribution survives. When NIST cites the rule, lineage is intact.

0 FP

Required across 3,551 benign samples

6-check quality gate: own-TP must match + 1,784 benign + 157 research-mention + 1,611 cross-rule conflict-free + own true_negative coverage. Rules that fire on the paper describing the attack don't ship.

你能拿到什麼

依你是誰，紅隊貢獻有四種具體報酬。

適用對象

Academic researchers

Citable artifact paired with your attack. Concrete adoption metrics for your CV: number of rules, downstream consumers, fire counts in production telemetry.

Your name appears in the rule file's author + metadata_provenance.discovered_by. When MISP exports the taxonomy to STIX, your attribution propagates. When NIST cites the rule in a publication (in-progress with iMichaela at NIST OSCAL), the lineage is intact.

適用對象

Corporate red teams

Your adversarial work becomes the defensive standard before competitors. Your team shows up as the discoverer in 350+ rules consumed by every major AI-security platform.

Microsoft's Copilot SWE Agent already opens PRs presuming ATR coverage (AGT #1981, closed 2026-05-11). Being the originator of ATR-2026-NNNNN rules is a real authority signal in vendor-eval conversations.

適用對象

Independent researchers

Ship detection without writing regex. Submit positive + negative examples — auto-regex tries 4 variants against the full gate, ~30% pass on first attempt.

Auto-regex deterministic generator clears 0 FP across 3,551 samples (benign + extended + research-mention + cross-rule). The PR labels itself gate-passed and goes straight to maintainer review.

適用對象

Bug bounty hunters

A second income stream for the same attack: the bug-bounty payout AND a permanent detection rule. Combines well with Huntr.dev, Hackerone AI scope, Protect AI bounty programmes.

ATR proposals are MIT-licensed and citable. No NDA conflict — the bounty programme owns the responsible-disclosure window; ATR ships detection after disclosure with public attribution.

送一個 probe 之後會發生什麼

Probe 進來、auto-regex 自動產、quality gate 全綠才 merge。

你填表單

3 個攻擊樣本、3 個 benign lookalike、攻擊類別、來源論文/repo。3-5 分鐘。

沒有 schema 要學、沒有 YAML 要寫、不用 fork repo。

Auto-regex 跑 4 個變體

Deterministic n-gram set-cover algorithm 從你的 positive examples 萃取 distinctive phrases，建 alternation regex，加 word boundary、whitespace anchor 或 co-occurrence 約束 — 每個變體跑完整 gate。

Gate = 自己 TP 必須 100% 命中 + 1,784 樣本 benign corpus 0 FP + 157 樣本 research-mention 0 FP + 跨規則 0 衝突。

Gate 過了 → 完整規則送 PR

PR 帶 gate-passed label。Maintainer 看 regex shape 是否太字面、需不需要 generalize — 通常 1-3 天 merge。沒過就留 stub，maintainer 手寫 regex（仍然會用你的 test cases）。

規則自動往下游傳

Microsoft AGT 每週 sync、Cisco AI Defense 跟 release tag、MISP taxonomy + galaxy 每次 release 拉、OWASP A-S-R-H 在 fixture 中引用 rule ID。你的 discovered_by 跟著整條鏈傳遞。

Microsoft Semantic Kernel CVE 從 disclosure 到 v2.1.2 publish 用了 2 小時 16 分鐘。這就是 cadence。

已經在跑的紅隊整合

我們把 ATR 接到紅隊這邊。你的工具可以是下一個。

這些是真的紅隊工具（offensive testing）。防禦端框架在 /ecosystem。

已整合 (1)

HackAPrompt

Learn Prompting

資料集已消化

The largest crowd-sourced prompt-injection competition corpus, ~600k attack attempts across all GPT/Claude/PaLM versions.

Clustered the HackAPrompt corpus by attack family and shipped 6 ATR rules (ATR-2026-00442..00447) covering the dominant clusters. Each rule cites the HackAPrompt cluster in its provenance.

工具 →PR →

審查中 (5)

NVIDIA Garak

NVIDIA AI Red Team

The reference open-source LLM vulnerability scanner. 50+ probe families, jmartin-tech + leondz maintainers.

Wrapped 330 ATR rules as garak detectors. PR #1676 cleared two review rounds; in-the-wild benchmark posted 97.1% recall (646/666) on garak's own community jailbreak corpus.

Microsoft PyRIT

Microsoft AI Red Team

The toolkit Microsoft uses internally to red-team production LLM products. Roman Lutz leads.

Added an ATR dataset loader exposing the rule corpus as PyRIT attack sources. Roman reviewed within 2 min on first push; iterating on doc shape.

PromptInject

agencyenterprise · NeurIPS 2022 Best Paper

The original academic benchmark that started prompt-injection research. 8.2k stars; cited by every prompt-injection paper since.

Issue #9 proposes a corpus-to-ATR pipeline turning every PromptInject attack into a paired ATR rule. The reference academic benchmark gets defensive parity.

Promptfoo

Promptfoo Inc.

10k stars, used by red teams at Klarna, Discord, Anduril. Promptfoo runs adversarial tests; ATR catches what Promptfoo found.

PR #8529 adds an MCP red-team example using ATR as the deterministic defense layer. Promptfoo runs the probe; ATR rules return the verdict.

Damn Vulnerable MCP Server

harishsg993010

A CTF-style training target with 10 intentionally-vulnerable MCP scenarios. The DVWA of agent security.

PR #29 ships the blue-team detection guide — every CTF challenge gets a paired ATR rule so trainees learn detection alongside the attack.

下一波

已經排好的紅隊整合 — 一週一個，公開排程。

ATR 不是只跟最大牌的整合。下面是接下來六週要送的 issue / PR — 真實日期、真實對象。Maintainer 看到自己被排上會優先 review，這是公開承諾的副作用。

HarmBench

Center for AI Safety · Dan Hendrycks

NeurIPS 2024 spotlight, 320-behavior standardized red-team benchmark used by Anthropic / OpenAI / Google safety teams.

送出日

2026-05-13

已排程

AgentDojo

ETH Zurich SPY Lab · Florian Tramèr

NeurIPS 2024 — the only agent-specific attack benchmark with a real tool-use harness. 78 attack tasks across 4 environments.

送出日

2026-05-16

已排程

JailbreakBench

Princeton · Patrick Chao

NeurIPS 2024 Datasets & Benchmarks. 100-behavior standardized jailbreak leaderboard, fixed eval interface.

送出日

2026-05-20

已排程

TextAttack

QData · ACL 2020

3.1k stars. The reference NLP-adversarial framework. Used in undergraduate security curricula.

送出日

2026-05-22

已排程

Microsoft Counterfit

Microsoft Azure Security

Microsoft's CLI-first AI red-teaming tool. Pairs with PyRIT (discovery) and AGT (governance) for a Microsoft trifecta on shared ATR backend.

送出日

2026-05-26

已排程

InjecAgent

UIUC Kang Lab

Cleanest direct-vs-indirect agent injection taxonomy. 1,054 attack cases, complements AgentDojo.

送出日

2026-05-29

佇列中

GPTFuzz

NDSS 2024

First credible LLM fuzzer. Detection-evasion pass closes the loop on what their fuzzer finds.

送出日

2026-06-02

佇列中

R-Judge

Tongxin Yuan et al.

162-scenario agent safety benchmark. LLM-as-judge + content-rule = complementary defense lanes.

送出日

2026-06-05

佇列中

排程從 GitHub issue / PR 同步。Merge 後從這裡移到 "Already Integrated"。Maintainer 若想插隊：[email protected]。

Pipeline 不是 vapor

Auto-regex 已經對你的範本 0 FP 跨 3,551 樣本。

$ npx tsx scripts/auto-regex.ts \
    --file proposals/red-team-probes/dan-trust-phrase-wrapping.proposal.yaml \
    --write

[auto-regex] 3 TPs, 3 TNs — generating candidate regex…
[auto-regex] gate corpora: 431 benign + 1,352 extended + 157 research + 1,611 cross-rule TNs
[auto-regex] variant 0: 3 phrases, tp=100%, fp=0
  (benign=0 ext=0 res=0 cross=0) — PASS
[auto-regex] wrote regex to proposals/red-team-probes/...

::auto-regex-summary::
{ "passed": true, "variant": 0, "tp_coverage": 1, "total_fp": 0 }

送一個 probe

10 分鐘的填表 → 永久的署名 → 進每個主流 AI 安全平台。

MIT 授權、無 CLA、無遙測、永遠免費。你保留出版攻擊本身的所有權利 — ATR 只負責把它變成偵測。

開 probe issue →看 repo →防禦端在 /ecosystem →

已部署 344 條規則，跨 8 個威脅類別。每條都有 author + metadata_provenance.discovered_by。

你找到攻擊。我們發 detection 規則。2 小時 16 分鐘。

依你是誰，紅隊貢獻有四種具體報酬。

Probe 進來、auto-regex 自動產、quality gate 全綠才 merge。

你填表單

Auto-regex 跑 4 個變體

Gate 過了 → 完整規則送 PR

規則自動往下游傳

我們把 ATR 接到紅隊這邊。你的工具可以是下一個。

HackAPrompt

已經排好的紅隊整合 — 一週一個，公開排程。

Auto-regex 已經對你的範本 0 FP 跨 3,551 樣本。

10 分鐘的填表 → 永久的署名 → 進每個主流 AI 安全平台。

你找到攻擊。
我們發 detection 規則。
2 小時 16 分鐘。