ATR-2026-00020 | high | Context Exfiltration | experimental

System Prompt and Internal Instruction Leakage

Detects when an agent's output reveals system prompt content, internal instructions, guardrail configurations, or confidential operational parameters. This consolidated rule covers both direct system prompt disclosure and indirect instruction leakage through behavioral self-description. Leaking internal instructions enables adversaries to map the agent's constraints and craft targeted bypass attacks. Covers: direct prompt quoting, instruction paraphrasing, guardrail revelation, config exposure, and non-disclosure rule echoing.
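The five leakage categories listed above can be illustrated with a minimal sketch. This is not the rule's actual detection logic: the pattern names, the regular expressions, the `find_leak_signals` helper, and the `min_overlap` threshold are all hypothetical, chosen only to show how an output scanner might flag each category and, when the real system prompt is available, catch verbatim quoting.

```python
import re
from difflib import SequenceMatcher
from typing import List

# Hypothetical patterns for illustration only; these are NOT the
# detection conditions shipped with ATR-2026-00020.
LEAK_PATTERNS = {
    "direct_prompt_quoting": re.compile(
        r"\bmy (?:system|initial) prompt (?:is|says|reads)\b", re.I),
    "instruction_paraphrasing": re.compile(
        r"\bI (?:was|am|have been) (?:instructed|told|configured|programmed) "
        r"(?:to|not to)\b", re.I),
    "guardrail_revelation": re.compile(
        r"\bmy (?:guardrails|safety rules|content filters?) "
        r"(?:are|include|prevent)\b", re.I),
    "config_exposure": re.compile(
        r"\b(?:temperature|max_tokens|top_p)\s*[:=]\s*\d", re.I),
    "non_disclosure_echo": re.compile(
        r"\b(?:not (?:allowed|permitted|supposed) to|cannot|can't) "
        r"(?:reveal|share|disclose) (?:my|the) "
        r"(?:system prompt|instructions)\b", re.I),
}


def find_leak_signals(output: str, system_prompt: str = "",
                      min_overlap: int = 40) -> List[str]:
    """Return the names of the leakage categories that the output triggers."""
    hits = [name for name, pattern in LEAK_PATTERNS.items()
            if pattern.search(output)]

    # If the real system prompt is known, also flag long verbatim overlaps.
    # min_overlap is an assumed threshold (in characters), not a tuned value.
    if system_prompt:
        matcher = SequenceMatcher(None, system_prompt.lower(),
                                  output.lower(), autojunk=False)
        match = matcher.find_longest_match(0, len(system_prompt),
                                           0, len(output))
        if match.size >= min_overlap:
            hits.append("verbatim_overlap")
    return hits


if __name__ == "__main__":
    print(find_leak_signals("I was instructed to never discuss pricing."))
```

Regular expressions like these only catch explicit phrasing. Paraphrased or translated leaks would likely need semantic comparison against the prompt, which this sketch does not attempt.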

Severity
high
Category
Context Exfiltration
Scan Target
mcp
Author
ATR Community

Response Actions

block output, alert, snapshot, reset context, escalate

References

OWASP Agentic
ASI01:2026 - Agent Goal Hijack
ASI06:2026 - Memory and Context Poisoning
OWASP LLM
LLM07:2025 - System Prompt Leakage
LLM02:2025 - Sensitive Information Disclosure
MITRE ATLAS
AML.T0056 - LLM Meta Prompt Extraction
AML.T0051 - LLM Prompt Injection