ATR-2026-00004criticalPrompt Injectionexperimental
System Prompt Override Attempt
Detects attempts to override, replace, or redefine the agent's system prompt. Attackers craft inputs that mimic system-level instructions to hijack the agent's foundational behavior. Detection covers: explicit system prompt replacement/update statements, model-specific special tokens (ChatML, Llama, Mistral, Gemma), JSON role injection, YAML-style system directives, markdown header system sections, system prompt invalidation claims, fake admin/override tags, XML-style system blocks, instruction replacement without delimiters, configuration object injection, and multi-format delimiter abuse. This is critical-severity as successful exploitation grants full control over agent behavior.
嚴重度
critical
類別
Prompt Injection
掃描目標
mcp
作者
ATR Community
建議回應
block inputalertsnapshotkill agent
參考資料
OWASP Agentic
ASI01:2026 - Agent Goal HijackASI06:2026 - Memory and Context Poisoning
OWASP LLM
LLM01:2025 - Prompt Injection
MITRE ATLAS
AML.T0051 - LLM Prompt InjectionAML.T0051.000 - Direct