v3.1.1
2026-06-05Fixed
- ▸ATR-2026-00001 (prompt-injection) false positive: the
start over/fresh/anew with a new …task-switch branch matched benign phrasing such as "start fresh with a new outline" / "a new draft" because it lacked the task-noun constraint its sibling branches carry. Tightened to require a task-like noun (task / instruction / assignment / objective / goal / mission / prompt / persona / role / directive / job) after "new". The rule's documented true-positives are unaffected; full suite green (535 tests). Surfaced by the new semantic-validation corpus.
Added
- ▸Semantic-validation harness (
data/semantic-validation/+scripts/semantic-validation-score.mts): a focused, reproducible corpus — 20 held-out paraphrased attacks targeting the two v3.1.0 semantic rules (00573 instruction-override, 00574 system-prompt extraction) plus 15 adversarial benign near-misses — and a scorer. Documents what the optional semantic stage actually adds on its two target threats, and ships the corpus so anyone can re-run it with their own judge.
Measured
- ▸Focused semantic validation (n=35, Claude-as-judge via worklist mode, threshold 0.7): the deterministic regex layer caught 10% of the paraphrased attacks; the optional semantic judge stage raised combined recall to 95% (19/20), with 0 judge false positives on the 15 adversarial benign near-misses (0 regex FP after the 00001 fix above). This is a small-n, authored, self-judged validation of the pipeline — NOT an independent benchmark. See
data/semantic-validation/README.mdfor method and caveats.