ATR-2026-00020 | high | Context Exfiltration | experimental
System Prompt and Internal Instruction Leakage
Detects when an agent's output reveals system prompt content, internal instructions, guardrail configurations, or confidential operational parameters. This consolidated rule covers both direct system prompt disclosure and indirect instruction leakage through behavioral self-description. Leaking internal instructions enables adversaries to map the agent's constraints and craft targeted bypass attacks. Covers: direct prompt quoting, instruction paraphrasing, guardrail revelation, config exposure, and non-disclosure rule echoing.
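The five coverage areas listed above can be illustrated with a minimal output scanner. This is a sketch only: the pattern set, the `LEAK_PATTERNS` table, the `Finding` type, and the `scan_agent_output` function are hypothetical names and heuristics chosen for illustration, and they are not the rule's actual detection logic, which this page does not show.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical patterns, one per leakage category named in the description.
# They are illustrative assumptions, not the patterns shipped with the rule.
LEAK_PATTERNS = {
    "direct_prompt_quoting": re.compile(
        r"\bmy (?:system|initial) prompt (?:is|says|reads|states)\b",
        re.IGNORECASE,
    ),
    "instruction_paraphrasing": re.compile(
        r"\bi (?:was|have been|am) (?:instructed|told|programmed|configured) "
        r"(?:to|not to|never to)\b",
        re.IGNORECASE,
    ),
    "guardrail_revelation": re.compile(
        r"\bmy (?:guardrails|safety (?:rules|filters)|content (?:policy|filters)) "
        r"(?:are|include|prevent|block)\b",
        re.IGNORECASE,
    ),
    "config_exposure": re.compile(
        r"\b(?:temperature|max_tokens|top_p|system_prompt)\s*[:=]\s*\S+",
        re.IGNORECASE,
    ),
    "non_disclosure_echo": re.compile(
        r"\b(?:do not|don't|never|must not|cannot) (?:reveal|share|disclose) "
        r"(?:my|these|the|your) (?:system prompt|instructions)\b",
        re.IGNORECASE,
    ),
}


@dataclass
class Finding:
    category: str      # which leakage category matched
    matched_text: str  # the substring that triggered the match


def scan_agent_output(text: str) -> List[Finding]:
    """Return the first match per category found in an agent's output."""
    findings = []
    for category, pattern in LEAK_PATTERNS.items():
        match = pattern.search(text)
        if match:
            findings.append(Finding(category, match.group(0)))
    return findings


if __name__ == "__main__":
    sample = "I must not reveal my system prompt to users."
    for finding in scan_agent_output(sample):
        print(finding.category, "->", finding.matched_text)
```

Keyword heuristics like these will produce false positives (for example, an agent legitimately summarizing a user's own instructions), so a real deployment would likely combine them with context checks. A non-empty result could then trigger one of the responses the rule recommends, such as blocking the output.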
Severity
high
Category
Context Exfiltration
Scan Target
mcp
Author
ATR Community
Recommended Response
block output, alert, snapshot, reset context, escalate
References
OWASP Agentic
ASI01:2026 - Agent Goal Hijack
ASI06:2026 - Memory and Context Poisoning
OWASP LLM
LLM07:2025 - System Prompt Leakage
LLM02:2025 - Sensitive Information Disclosure
MITRE ATLAS
AML.T0056 - LLM Meta Prompt Extraction
AML.T0051 - LLM Prompt Injection