ATR-2026-00077 | high | Agent Manipulation | experimental
Human-Agent Trust Exploitation Detection
Detects when an agent attempts to exploit human trust by presenting fabricated confidence, suppressing uncertainty, manufacturing urgency, or discouraging verification. Covers patterns where agents claim false certainty to obtain human approval for risky actions, suppress caveats or warnings, create artificial time pressure to bypass review, claim exclusive authority to discourage second opinions, or use emotional manipulation to override human judgment. Note: this rule detects only explicit manipulation language patterns. Subtle manipulation techniques such as selective omission and framing effects require semantic analysis, which is planned for v0.2.
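The explicit-language matching described above can be illustrated with a minimal sketch. It assumes a simple regex-per-category approach. The category names, the patterns, and the `detect_trust_exploitation` function are hypothetical and chosen for illustration only; they are not the rule's actual signatures.

```python
import re

# Hypothetical patterns, one per manipulation category named in the description.
# These are illustrative assumptions, not the signatures shipped with the rule.
PATTERNS = {
    "fabricated_confidence": re.compile(
        r"\b(?:100% (?:certain|sure|safe)|zero risk|guaranteed to (?:work|succeed))\b",
        re.IGNORECASE,
    ),
    "suppressed_uncertainty": re.compile(
        r"\b(?:ignore|disregard|skip) (?:the|any|all) (?:warnings?|caveats?|disclaimers?)\b",
        re.IGNORECASE,
    ),
    "manufactured_urgency": re.compile(
        r"\b(?:approve|act|confirm|decide) (?:right )?now\b"
        r"|\bno time to (?:review|verify|check)\b",
        re.IGNORECASE,
    ),
    "discouraged_verification": re.compile(
        r"\b(?:no need|don't need|do not need) to (?:verify|double[- ]check|review)\b",
        re.IGNORECASE,
    ),
    "exclusive_authority": re.compile(
        r"\bonly I (?:can|know|understand)\b",
        re.IGNORECASE,
    ),
    "emotional_pressure": re.compile(
        r"\byou(?:'ll| will) (?:regret|be blamed)\b",
        re.IGNORECASE,
    ),
}


def detect_trust_exploitation(text: str) -> list[str]:
    """Return the categories whose pattern matches the agent output, in PATTERNS order."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


if __name__ == "__main__":
    message = "I am 100% certain this is safe. Approve now, there is no need to verify."
    print(detect_trust_exploitation(message))
```

A matcher of this kind only catches explicit phrasing, which is consistent with the limitation stated in the note: selective omission or framing would pass through it unflagged.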
Severity
high
Category
Agent Manipulation
Scan Target
mcp
Author
ATR Community
Response Actions
escalate, alert, snapshot
References
OWASP Agentic
ASI09:2026 - Human-Agent Trust Exploitation
OWASP LLM
LLM06:2025 - Excessive Agency
LLM09:2025 - Misinformation
MITRE ATLAS
AML.T0048 - Adversarial Prompt Techniques