ATR-2026-00032highAgent Manipulationexperimental
Agent Goal Hijacking Detection
Detects when an agent's objective is being redirected away from its original task through explicit redirection commands, subtle topic pivoting, urgency injection, or self-initiated goal changes. Goal hijacking occurs when adversarial input causes an agent to abandon its assigned objective and pursue a different goal, resulting in task failure, unauthorized actions, data leakage, or resource waste. This rule uses regex-only detection on both user input and agent output to identify redirection language patterns.
嚴重度
high
類別
Agent Manipulation
掃描目標
mcp
作者
ATR Community
建議回應
reduce permissionsescalatealertsnapshot
參考資料
OWASP Agentic
ASI01:2026 - Agent Goal Hijack
OWASP LLM
LLM01:2025 - Prompt InjectionLLM06:2025 - Excessive Agency
MITRE ATLAS
AML.T0051 - Prompt InjectionAML.T0043 - Craft Adversarial Data