ATR-2026-00004criticalPrompt Injectionexperimental
System Prompt Override Attempt
Detects attempts to override, replace, or redefine the agent's system prompt. Attackers craft inputs that mimic system-level instructions to hijack the agent's foundational behavior. Detection covers: explicit system prompt replacement/update statements, model-specific special tokens (ChatML, Llama, Mistral, Gemma), JSON role injection, YAML-style system directives, markdown header system sections, system prompt invalidation claims, fake admin/override tags, XML-style system blocks, instruction replacement without delimiters, configuration object injection, and multi-format delimiter abuse. This is critical-severity as successful exploitation grants full control over agent behavior.
Severity
critical
Category
Prompt Injection
Scan Target
mcp
Author
ATR Community
Response Actions
block inputalertsnapshotkill agent
References
OWASP Agentic
ASI01:2026 - Agent Goal HijackASI06:2026 - Memory and Context Poisoning
OWASP LLM
LLM01:2025 - Prompt Injection
MITRE ATLAS
AML.T0051 - LLM Prompt InjectionAML.T0051.000 - Direct