Skip to content
ATR-2026-00032highAgent Manipulationexperimental

Agent Goal Hijacking Detection

Detects when an agent's objective is being redirected away from its original task through explicit redirection commands, subtle topic pivoting, urgency injection, or self-initiated goal changes. Goal hijacking occurs when adversarial input causes an agent to abandon its assigned objective and pursue a different goal, resulting in task failure, unauthorized actions, data leakage, or resource waste. This rule uses regex-only detection on both user input and agent output to identify redirection language patterns.

嚴重度
high
類別
Agent Manipulation
掃描目標
mcp
作者
ATR Community

建議回應

reduce permissionsescalatealertsnapshot

參考資料

OWASP Agentic
ASI01:2026 - Agent Goal Hijack
OWASP LLM
LLM01:2025 - Prompt InjectionLLM06:2025 - Excessive Agency
MITRE ATLAS
AML.T0051 - Prompt InjectionAML.T0043 - Craft Adversarial Data