你找到攻擊。
我們發 detection 規則。
2 小時 16 分鐘。
Microsoft Copilot SWE Agent 自己會開 PR 預設 ATR 存在。Cisco AI Defense production 內建 314 條規則。MISP 全部 export 到 STIX 帶你的署名。NeurIPS 2024 的 HarmBench / AgentDojo / JailbreakBench 下一個 wave 接進來。
2026-05-07 Microsoft Security disclosed Semantic Kernel CVE-2026-26030. 2026-05-11 06:07 UTC Microsoft Copilot SWE Agent opened a regression-test PR presuming ATR coverage. 08:24 UTC v2.1.2 published on npm with paired rules. End-to-end loop.
Microsoft (Agent Governance Toolkit weekly auto-sync). Cisco AI Defense (314-rule pack in production). CIRCL/MISP (taxonomies + galaxy merged by project lead). OWASP (Project Lead merged with "Welcome to the team"). FINOS, NIST OSCAL, UK Gov AISI in motion.
Every rule carries author + metadata_provenance.discovered_by. Microsoft AGT, Cisco AI Defense, MISP, OWASP all preserve it on sync. When MISP exports to STIX, attribution survives. When NIST cites the rule, lineage is intact.
6-check quality gate: own-TP must match + 1,784 benign + 157 research-mention + 1,611 cross-rule conflict-free + own true_negative coverage. Rules that fire on the paper describing the attack don't ship.
依你是誰,紅隊貢獻有四種具體報酬。
Citable artifact paired with your attack. Concrete adoption metrics for your CV: number of rules, downstream consumers, fire counts in production telemetry.
Your name appears in the rule file's author + metadata_provenance.discovered_by. When MISP exports the taxonomy to STIX, your attribution propagates. When NIST cites the rule in a publication (in-progress with iMichaela at NIST OSCAL), the lineage is intact.
Your adversarial work becomes the defensive standard before competitors. Your team shows up as the discoverer in 350+ rules consumed by every major AI-security platform.
Microsoft's Copilot SWE Agent already opens PRs presuming ATR coverage (AGT #1981, closed 2026-05-11). Being the originator of ATR-2026-NNNNN rules is a real authority signal in vendor-eval conversations.
Ship detection without writing regex. Submit positive + negative examples — auto-regex tries 4 variants against the full gate, ~30% pass on first attempt.
Auto-regex deterministic generator clears 0 FP across 3,551 samples (benign + extended + research-mention + cross-rule). The PR labels itself gate-passed and goes straight to maintainer review.
A second income stream for the same attack: the bug-bounty payout AND a permanent detection rule. Combines well with Huntr.dev, Hackerone AI scope, Protect AI bounty programmes.
ATR proposals are MIT-licensed and citable. No NDA conflict — the bounty programme owns the responsible-disclosure window; ATR ships detection after disclosure with public attribution.
Probe 進來、auto-regex 自動產、quality gate 全綠才 merge。
你填表單
3 個攻擊樣本、3 個 benign lookalike、攻擊類別、來源論文/repo。3-5 分鐘。
沒有 schema 要學、沒有 YAML 要寫、不用 fork repo。
Auto-regex 跑 4 個變體
Deterministic n-gram set-cover algorithm 從你的 positive examples 萃取 distinctive phrases,建 alternation regex,加 word boundary、whitespace anchor 或 co-occurrence 約束 — 每個變體跑完整 gate。
Gate = 自己 TP 必須 100% 命中 + 1,784 樣本 benign corpus 0 FP + 157 樣本 research-mention 0 FP + 跨規則 0 衝突。
Gate 過了 → 完整規則送 PR
PR 帶 gate-passed label。Maintainer 看 regex shape 是否太字面、需不需要 generalize — 通常 1-3 天 merge。沒過就留 stub,maintainer 手寫 regex(仍然會用你的 test cases)。
規則自動往下游傳
Microsoft AGT 每週 sync、Cisco AI Defense 跟 release tag、MISP taxonomy + galaxy 每次 release 拉、OWASP A-S-R-H 在 fixture 中引用 rule ID。你的 discovered_by 跟著整條鏈傳遞。
Microsoft Semantic Kernel CVE 從 disclosure 到 v2.1.2 publish 用了 2 小時 16 分鐘。這就是 cadence。
我們把 ATR 接到紅隊這邊。你的工具可以是下一個。
這些是真的紅隊工具(offensive testing)。防禦端框架在 /ecosystem。
HackAPrompt
The largest crowd-sourced prompt-injection competition corpus, ~600k attack attempts across all GPT/Claude/PaLM versions.
Clustered the HackAPrompt corpus by attack family and shipped 6 ATR rules (ATR-2026-00442..00447) covering the dominant clusters. Each rule cites the HackAPrompt cluster in its provenance.
The reference open-source LLM vulnerability scanner. 50+ probe families, jmartin-tech + leondz maintainers.
Wrapped 330 ATR rules as garak detectors. PR #1676 cleared two review rounds; in-the-wild benchmark posted 97.1% recall (646/666) on garak's own community jailbreak corpus.
The toolkit Microsoft uses internally to red-team production LLM products. Roman Lutz leads.
Added an ATR dataset loader exposing the rule corpus as PyRIT attack sources. Roman reviewed within 2 min on first push; iterating on doc shape.
The original academic benchmark that started prompt-injection research. 8.2k stars; cited by every prompt-injection paper since.
Issue #9 proposes a corpus-to-ATR pipeline turning every PromptInject attack into a paired ATR rule. The reference academic benchmark gets defensive parity.
10k stars, used by red teams at Klarna, Discord, Anduril. Promptfoo runs adversarial tests; ATR catches what Promptfoo found.
PR #8529 adds an MCP red-team example using ATR as the deterministic defense layer. Promptfoo runs the probe; ATR rules return the verdict.
A CTF-style training target with 10 intentionally-vulnerable MCP scenarios. The DVWA of agent security.
PR #29 ships the blue-team detection guide — every CTF challenge gets a paired ATR rule so trainees learn detection alongside the attack.
已經排好的紅隊整合 — 一週一個,公開排程。
ATR 不是只跟最大牌的整合。下面是接下來六週要送的 issue / PR — 真實日期、真實對象。Maintainer 看到自己被排上會優先 review,這是公開承諾的副作用。
NeurIPS 2024 spotlight, 320-behavior standardized red-team benchmark used by Anthropic / OpenAI / Google safety teams.
NeurIPS 2024 — the only agent-specific attack benchmark with a real tool-use harness. 78 attack tasks across 4 environments.
NeurIPS 2024 Datasets & Benchmarks. 100-behavior standardized jailbreak leaderboard, fixed eval interface.
3.1k stars. The reference NLP-adversarial framework. Used in undergraduate security curricula.
Microsoft's CLI-first AI red-teaming tool. Pairs with PyRIT (discovery) and AGT (governance) for a Microsoft trifecta on shared ATR backend.
Cleanest direct-vs-indirect agent injection taxonomy. 1,054 attack cases, complements AgentDojo.
First credible LLM fuzzer. Detection-evasion pass closes the loop on what their fuzzer finds.
162-scenario agent safety benchmark. LLM-as-judge + content-rule = complementary defense lanes.
排程從 GitHub issue / PR 同步。Merge 後從這裡移到 "Already Integrated"。Maintainer 若想插隊:[email protected]。
Auto-regex 已經對你的範本 0 FP 跨 3,551 樣本。
$ npx tsx scripts/auto-regex.ts \
--file proposals/red-team-probes/dan-trust-phrase-wrapping.proposal.yaml \
--write
[auto-regex] 3 TPs, 3 TNs — generating candidate regex…
[auto-regex] gate corpora: 431 benign + 1,352 extended + 157 research + 1,611 cross-rule TNs
[auto-regex] variant 0: 3 phrases, tp=100%, fp=0
(benign=0 ext=0 res=0 cross=0) — PASS
[auto-regex] wrote regex to proposals/red-team-probes/...
::auto-regex-summary::
{ "passed": true, "variant": 0, "tp_coverage": 1, "total_fp": 0 }10 分鐘的填表 → 永久的署名 → 進每個主流 AI 安全平台。
MIT 授權、無 CLA、無遙測、永遠免費。你保留出版攻擊本身的所有權利 — ATR 只負責把它變成偵測。
已部署 344 條規則,跨 8 個威脅類別。每條都有 author + metadata_provenance.discovered_by。