RFC-001 · Quality Standard

The first AI agent rule standard
with provenance tracking

Every rule has a confidence score you can compute yourself. Every mapping has a provenance you can audit. No black boxes, no vendor lock-in — just a public formula, open-source code, and wild-validated data.

34 · ATR rules merged into Cisco AI Defense
96,096 · Real agent skills scanned across 6 registries
99.6% · Precision on PINT adversarial benchmark
100% · Recall on SKILL.md corpus, 0.20% FP rate
The Formula

Confidence is a number, not an opinion

Every component is computed from measurable facts. Run it yourself — the formula is public.

confidence = 0.4 × precision + 0.3 × wild + 0.2 × coverage + 0.1 × evasion
Precision · 40%
(1 − wild_fp_rate) × 100

Measured false-positive rate on real-world corpora.

Wild validation · 30%
min(wild_samples / 10,000, 1) × 100

How much real data the rule has survived.

Coverage · 20%
min(conditions / 5, 1) × 100

Detection depth — distinct attack layers covered.

Evasion docs · 10%
min(documented_evasions / 5, 1) × 100

Honest acknowledgment of known bypass techniques.

90–100 · Very High
Safe to block in production
60–79 · Medium
Alert-only with monitoring
<40 · Draft
Do not deploy
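The formula and component caps above can be expressed as a short sketch. Field names such as `wildFpRate` are illustrative assumptions for this example; the reference implementation is `computeConfidence` in `agent-threat-rules/quality`.

```typescript
// Hypothetical input shape; the real library defines its own rule type.
interface RuleStats {
  wildFpRate: number;         // measured false-positive rate, 0..1
  wildSamples: number;        // real-world samples the rule has survived
  conditions: number;         // distinct detection conditions
  documentedEvasions: number; // known bypass techniques documented
}

// RFC-001 confidence: 0.4·precision + 0.3·wild + 0.2·coverage + 0.1·evasion
function confidence(stats: RuleStats): number {
  const precision = (1 - stats.wildFpRate) * 100;
  const wild = Math.min(stats.wildSamples / 10_000, 1) * 100;
  const coverage = Math.min(stats.conditions / 5, 1) * 100;
  const evasion = Math.min(stats.documentedEvasions / 5, 1) * 100;
  return 0.4 * precision + 0.3 * wild + 0.2 * coverage + 0.1 * evasion;
}

// A rule with a 0.2% wild FP rate, 10,000 samples, 5 conditions, and
// 3 documented evasions: 0.4×99.8 + 0.3×100 + 0.2×100 + 0.1×60 = 95.92
const score = confidence({
  wildFpRate: 0.002,
  wildSamples: 10_000,
  conditions: 5,
  documentedEvasions: 3,
});
```

Note how the caps work: past 10,000 wild samples, 5 conditions, or 5 documented evasions, a component saturates at 100 — more of the same does not inflate the score.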
The Differentiator

Two-dimensional compliance model

An industry first: separating 'does the rule have the metadata?' from 'who verified it?'.

Dimension 1 · Technical compliance

Does the rule have the required metadata? Detection conditions, test cases, OWASP and MITRE references, false positive documentation. Machine-verifiable in under a millisecond.

validateRuleMeetsStandard(rule)
Dimension 2 · Trust compliance

Who verified the metadata? human-reviewed, community-contributed, auto-generated, or llm-generated. Stable promotion requires verified provenance — not just presence.

metadata_provenance: { mitre_atlas: human-reviewed }
Why this matters

Traditional rule standards (Sigma, YARA, OWASP CRS) treat compliance as binary — either the metadata is there or it is not. This creates a perverse incentive: vendors pad metadata to pass the check without doing the underlying review work.

ATR separates the two dimensions. Auto-generated mappings can pass the experimental gate for fast iteration. Stable promotion — the level enterprises block in production — requires human review. Fast iteration and honest trust, at the same time.
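The two dimensions can be sketched as independent checks. Type and field names here are assumptions for illustration, not the library's actual API.

```typescript
type Provenance =
  | 'human-reviewed'
  | 'community-contributed'
  | 'auto-generated'
  | 'llm-generated';

// Hypothetical rule shape for this example.
interface Rule {
  conditions: string[];
  testCases: { tp: number; tn: number };
  owaspRefs: string[];
  mitreRefs: string[];
  fpNotes: string;
  provenance: Record<string, Provenance>; // per-field provenance
}

// Dimension 1: is the required metadata present at all?
function technicallyCompliant(rule: Rule): boolean {
  return (
    rule.conditions.length > 0 &&
    rule.testCases.tp > 0 &&
    rule.testCases.tn > 0 &&
    rule.owaspRefs.length > 0 &&
    rule.mitreRefs.length > 0 &&
    rule.fpNotes.length > 0
  );
}

// Dimension 2: has every mapping been verified by a human?
function trustCompliant(rule: Rule): boolean {
  return Object.values(rule.provenance).every(p => p === 'human-reviewed');
}

// An auto-generated rule can pass dimension 1 while failing dimension 2:
const autoRule: Rule = {
  conditions: ['tool-call pattern', 'payload regex', 'context check'],
  testCases: { tp: 3, tn: 3 },
  owaspRefs: ['LLM01:2025'],
  mitreRefs: ['AML.T0051'],
  fpNotes: 'documented',
  provenance: { mitre_atlas: 'auto-generated' },
};
```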

The Ladder

Every rule has an explicit gate to climb

Promotion requires passing specific, mechanical criteria. Demotion is automatic on quality regression.

Draft
Promotion gate

Valid schema · ≥1 TP + 1 TN · no ReDoS

Deployment guidance

Not deployed

Experimental
Promotion gate

3+ TP + 3+ TN · CI pass · OWASP + MITRE mapping encouraged (not required) · evasion tests encouraged (not required)

Deployment guidance

Alert-only

Stable
Promotion gate

Wild-validated (1,000+ samples) · FP rate ≤ 0.5% · human-verified provenance · ≥3 evasion tests

Deployment guidance

Block in production

Automatic demotion

Stable rules with a wild false positive rate above 2%, or three unresolved false positive reports within 30 days, are automatically demoted to experimental. No human decision required. The system self-corrects.
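The demotion trigger is simple enough to sketch directly from the criteria above. Field names are illustrative assumptions.

```typescript
// Health signals for a stable rule (hypothetical shape).
interface StableRuleHealth {
  wildFpRate: number;             // measured in the wild, 0..1
  unresolvedFpReports30d: number; // open FP reports in the last 30 days
}

// Demote to experimental on FP rate > 2% OR 3+ unresolved reports.
// No human decision in the loop.
function shouldDemote(h: StableRuleHealth): boolean {
  return h.wildFpRate > 0.02 || h.unresolvedFpReports30d >= 3;
}
```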

The Gauntlet

Six stages before a rule reaches production

An LLM-drafted rule passes through six independent verification stages before it ever protects a user. Each stage has mechanical, public criteria.

Stage 1

LLM Drafter

Claude Sonnet generates a YAML rule against a strict prompt requiring 3+ conditions, 5+ TP/TN, 3+ evasion tests, and OWASP + MITRE mapping.

Stage 2

Syntax Gate

Regex extraction, invalid pattern rejection, PCRE-to-JS normalization. Broken rules are dropped with logged reasons.

Stage 3

Quality Gate

The RFC-001 formula runs: detection depth, test coverage, reference mapping, documentation completeness. Below the bar — rejected.

Stage 4

Canary Observation

Accepted rules enter a canary window. Independent confirmations and wild FP measurements gate further promotion.

Stage 5

Human Review

Provenance starts as llm-generated. Human review upgrades to human-reviewed before the rule can reach stable.

Stage 6

Upstream PR

Promoted rules open pull requests against the public ATR repository for open review and merge.
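The six stages can be modeled as a fail-fast pipeline: each stage either passes a candidate through or rejects it with a reason. Stage names come from the list above; the individual checks are illustrative placeholders, not the project's real gate logic.

```typescript
// Hypothetical candidate shape for this sketch.
interface Candidate {
  yaml: string;
  conditions: number;
  evasionTests: number;
  wildFpRate: number | null; // null until the canary window has run
  provenance: string;
}

type Stage = { name: string; pass: (c: Candidate) => boolean };

const gauntlet: Stage[] = [
  { name: 'llm-drafter', pass: c => c.yaml.length > 0 },
  { name: 'syntax-gate', pass: c => !c.yaml.includes('(?R)') }, // e.g. reject recursive PCRE
  { name: 'quality-gate', pass: c => c.conditions >= 3 && c.evasionTests >= 3 },
  { name: 'canary', pass: c => c.wildFpRate !== null && c.wildFpRate <= 0.005 },
  { name: 'human-review', pass: c => c.provenance === 'human-reviewed' },
  { name: 'upstream-pr', pass: () => true }, // PR opened only after all gates pass
];

// Returns the first failing stage, so rejections are logged with a reason.
function runGauntlet(c: Candidate): { passed: boolean; failedAt?: string } {
  for (const stage of gauntlet) {
    if (!stage.pass(c)) return { passed: false, failedAt: stage.name };
  }
  return { passed: true };
}
```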

Live Crystallization Output · Gate Passed · ATR-2026-DRAFT-8f3c9a72

Hidden Credential Exfiltration with Silent Execution Override

severity · critical
5 · Detection layers
5 + 5 · TP + TN cases
3 · Evasion tests
100% · Required fields
OWASP
LLM01:2025 — Prompt Injection
ASI01:2026 — Agent Behaviour Hijack
MITRE ATLAS
AML.T0051 — LLM Prompt Injection
Provenance
llm-generated

Tagged honestly as LLM-generated. Confidence capped at 70 until human review upgrades it.
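The cap described above can be sketched as a clamp on the computed score. The function name is illustrative, not the library's actual API.

```typescript
// An llm-generated rule's confidence is clamped at 70 until human review
// upgrades its provenance to human-reviewed.
function cappedConfidence(raw: number, provenance: string): number {
  const cap = provenance === 'llm-generated' ? 70 : 100;
  return Math.min(raw, cap);
}
```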

The Landscape

How ATR compares to existing rule standards

Sigma, YARA, OWASP CRS, and Suricata solved this for malware, SIEM, WAF, and IDS. Nobody had solved it for AI agents — until now.

Feature · ATR · Sigma · YARA · OWASP CRS · Suricata
Maturity ladder with explicit gates
Formula-based confidence score (0–100)
Wild validation required for production
Per-field provenance tracking
Automatic demotion on quality regression
Open-source reference implementation

ATR is the only standard requiring wild-scan validation with measured FP rates and automatic demotion on quality regression.

Verify It Yourself

Don't trust us — run the validator

Every function is pure, open-source, and documented. Score your own rules — or ours — in under a minute.

Install
npm install agent-threat-rules
Score any rule
import {
  parseATRRule,
  computeConfidence,
  validateRuleMeetsStandard,
} from 'agent-threat-rules/quality';

const rule = parseATRRule(yamlContent);
const score = computeConfidence(rule);
const gate = validateRuleMeetsStandard(rule, 'stable');

console.log('Confidence:', score.total);    // 0-100
console.log('Passes stable:', gate.passed);
console.log('Issues:', gate.issues);

Measurable. Auditable. Open.

The ATR Quality Standard is live, in production, and ready to adopt. Any scanner — ATR, Cisco, Snyk, Microsoft AGT, or yours — can score rules on the same axes with the same library.