NVIDIA garak × ATR
Turn every garak failed probe into a MIT-licensed ATR detection rule. Zero extra red-team work, full provenance on the author line.
The idea
garak finds attacks at the LLM layer. ATR defends at the agent layer (tool calls, skills, MCP). Same attack payload, different defence surface. The bridge is a single Python script: garak reports in, ATR proposals out, crystallisation loop does the rest.
garak (red team) ATR (standard) downstream
───────────────── ────────────── ──────────
Probes claude-3.7 314 rules today npm: agent-threat-rules
Probes gpt-5 ──▶ +auto-crystallised PyPI: pyatr
Probes gemini-2-pro from garak evidence Cisco DefenseClaw
Writes report.jsonl canary 24h OWASP Agentic Top 10
auto-merge your CI pipelineOne-time setup
- Email [email protected] with your org name to receive a partner API key (manual issuance during early-partner phase, no cost, MIT terms).
- Fetch the pipe script: it lives in the ATR repo.
curl -O https://raw.githubusercontent.com/Agent-Threat-Rule/agent-threat-rules/main/scripts/garak-to-tc.py - Set the key in your environment:
export ATR_PARTNER_KEY=…
Run it
After a garak eval session:
# Native garak JSONL report
python3 garak-to-tc.py \
--input ~/.local/share/garak/runs/2026-04-18-claude-3-7/report.jsonl \
--partner-name nvidia-airt \
--target-model claude-3-7-sonnet \
--garak-version 0.14.1
# Or ATR-style eval (from scripts/eval-garak.sh in this repo)
python3 garak-to-tc.py \
--input data/garak-benchmark/garak-eval-report.json \
--partner-name nvidia-airt \
--dry-run # inspect before submittingTwo modes
The script has two submission modes. The default (--mode drafter) is what you want.
- drafter (default). POSTs each failed probe to
/api/atr-proposals/from-payload. Server-side runs a tool-use LLM drafter (grep existing rules for dedup, fetch research for grounding, write YAML with 3+ conditions / 5+ TP / 5+ TN / 3+ evasion tests), passes RFC-001 quality gate, self-tests its own regex, then files as a proposal. Each call takes 30-60s. This is the mode that produces real ATR rules. - proposal (legacy). POSTs a client-built literal draft. Fast but the LLM reviewer typically rejects these as too narrow (literal fingerprint ≠ detection rule). Only use if you have your own YAML-generation step upstream.
What happens downstream
- Each unique failed probe → one proposal in
/api/atr-proposalswith full garak provenance embedded. - TC crystallisation LLM reads the attack payload + garak metadata and drafts a generalised detection regex.
- Rule enters 24h canary. Every ATR-integrated client in the field gets a sampling dose (~10%). If any client reports false positives, the rule is quarantined automatically.
- Survives canary → safety gate (0 FP on 432-sample benign corpus, test_cases present, ≤10 rules per PR) → auto-merged into
agent-threat-rulesmain →npm publishfires within minutes → every downstream consumer upgrades on next install. - Every resulting rule carries
author: nvidia-airt (via garak vX.Y.Z)andmetadata_provenance.garak_probe: <probe-id>. Attribution is permanent and machine-readable.
What counts as a failed probe
The script treats a garak attempt as a rule candidate when either:
- At least one garak detector returned a score above 0.5 (the target model fell for the probe).
- The input is an ATR-style eval report and the prompt appears in
missed_samples— garak flagged it malicious and ATR had no pattern for it yet (highest-signal candidates).
Rate limits & safety
- Default 100ms delay between POSTs. At 666 findings (the public garak in-the-wild-jailbreak set) that is about a minute.
patternHashis a sha256 of the prompt, so re-running on the same report is idempotent — TC recognises duplicate submissions and just bumps a confirmation counter, never creates duplicate rules.- All traffic is authenticated with your partner key. Key compromise? Email and we revoke and re-issue.
- Every submitted proposal is reviewable and rejectable via the TC admin dashboard before it enters canary.
Related
- /integrate — general adopters (npm + PyPI), no partner key needed
- /partner-sync — live rule pull for platforms that consume ATR downstream
- scripts/garak-to-tc.py — source, 300 lines, no dependencies beyond Python stdlib