ATR-2026-00020highContext Exfiltrationexperimental
System Prompt and Internal Instruction Leakage
Detects when an agent's output reveals system prompt content, internal instructions, guardrail configurations, or confidential operational parameters. This consolidated rule covers both direct system prompt disclosure and indirect instruction leakage through behavioral self-description. Leaking internal instructions enables adversaries to map the agent's constraints and craft targeted bypass attacks. Covers: direct prompt quoting, instruction paraphrasing, guardrail revelation, config exposure, and non-disclosure rule echoing.
Severity
high
Category
Context Exfiltration
Scan Target
mcp
Author
ATR Community
Response Actions
block outputalertsnapshotreset contextescalate
References
OWASP Agentic
ASI01:2026 - Agent Goal HijackASI06:2026 - Memory and Context Poisoning
OWASP LLM
LLM07:2025 - System Prompt LeakageLLM02:2025 - Sensitive Information Disclosure
MITRE ATLAS
AML.T0056 - LLM Meta Prompt ExtractionAML.T0051 - LLM Prompt Injection