ATR-2026-00077 | high | Agent Manipulation | experimental

Human-Agent Trust Exploitation Detection

Detects when an agent attempts to exploit human trust by presenting fabricated confidence, suppressing uncertainty, manufacturing urgency, or discouraging verification. Covered patterns include agents that claim false certainty to win human approval for risky actions, suppress caveats or warnings, create artificial time pressure to bypass review, claim exclusive authority to discourage second opinions, or use emotional manipulation to override human judgment. Note: this rule detects explicit manipulation language. Subtler techniques, such as selective omission and framing effects, require semantic analysis, which is planned for v0.2.
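Because the rule targets explicit manipulation language, keyword-style matching is one way to picture how it works. The sketch below assumes a simple regex scan over agent output. The category names and every pattern are hypothetical illustrations chosen for this example. They are not the actual conditions or schema of ATR-2026-00077.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL patterns for illustration only; these are NOT the real
# ATR-2026-00077 rule conditions. One list per manipulation category.
PATTERNS: dict[str, list[str]] = {
    "fabricated_confidence": [
        r"\b(100|one hundred) ?% (sure|certain|safe)\b",
        r"\bguaranteed? (to be )?(safe|correct)\b",
    ],
    "suppressed_uncertainty": [
        r"\bno need to (mention|worry about) (the )?(risks?|warnings?|caveats?)\b",
        r"\bignore (the|any) (warnings?|caveats?)\b",
    ],
    "manufactured_urgency": [
        r"\b(approve|confirm|act) (now|immediately|right away)\b",
        r"\bno time (to|for) (review|check|verify)\b",
    ],
    "discouraged_verification": [
        r"\b(don't|do not|no need to) (double-check|verify|ask anyone)\b",
        r"\bonly I (can|am authorized to)\b",
    ],
    "emotional_manipulation": [
        r"\byou('ll| will) regret\b",
        r"\b(trust me|don't you trust me)\b",
    ],
}

# Compile once, case-insensitively, so each scan only runs the matchers.
COMPILED = {
    category: [re.compile(p, re.IGNORECASE) for p in patterns]
    for category, patterns in PATTERNS.items()
}


@dataclass
class Finding:
    category: str
    matched_text: str


def scan(text: str) -> list[Finding]:
    """Return one Finding per pattern that matches somewhere in `text`."""
    findings: list[Finding] = []
    for category, regexes in COMPILED.items():
        for rx in regexes:
            m = rx.search(text)
            if m:
                findings.append(Finding(category, m.group(0)))
    return findings


if __name__ == "__main__":
    msg = "I'm 100% sure this migration is safe. Approve now, there's no time to review."
    for f in scan(msg):
        print(f.category, "->", f.matched_text)
```

Phrase lists like these only catch wording that matches the patterns exactly. That limitation is why the description defers selective omission and framing effects to semantic analysis.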

Severity
high
Category
Agent Manipulation
Scan Target
mcp
Author
ATR Community

Recommended Response

escalate, alert, snapshot

References

OWASP Agentic
ASI09:2026 - Human-Agent Trust Exploitation
OWASP LLM
LLM06:2025 - Excessive Agency
LLM09:2025 - Misinformation
MITRE ATLAS
AML.T0048 - Adversarial Prompt Techniques