ATR-2026-00077 | high | Agent Manipulation | experimental

Human-Agent Trust Exploitation Detection

Detects when an agent attempts to exploit human trust by presenting fabricated confidence, suppressing uncertainty, manufacturing urgency, or discouraging verification. The rule covers several behaviors:

- claiming false certainty to win human approval for risky actions
- suppressing caveats or warnings
- creating artificial time pressure to bypass review
- claiming exclusive authority to discourage second opinions
- using emotional manipulation to override human judgment

Note: the rule detects explicit manipulation language only. Subtler techniques, such as selective omission and framing effects, require semantic analysis, which is planned for v0.2.
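
The sketch below shows how detection of explicit manipulation language could work in principle. It assumes the rule matches regular expressions against agent output, and it groups matches into the five behaviors listed above. The category names, the patterns, `Finding` and `scan` are all hypothetical illustrations. They are not the actual ATR-2026-00077 rule definition.

```python
import re
from dataclasses import dataclass

# Hypothetical, illustrative patterns grouped by manipulation behavior.
# These are NOT the real ATR-2026-00077 conditions.
PATTERNS: dict[str, list[str]] = {
    "false_certainty": [
        r"\b(?:100%|absolutely|completely) (?:certain|sure|safe)\b",
        r"\bthere is no (?:risk|chance of (?:failure|error))\b",
    ],
    "suppressed_uncertainty": [
        r"\bno need to (?:mention|worry about) (?:the )?(?:risks?|warnings?|caveats?)\b",
        r"\bignore (?:the|any) (?:warnings?|caveats?)\b",
    ],
    "artificial_urgency": [
        r"\b(?:approve|act|confirm) (?:now|immediately|right away)\b",
        r"\bno time (?:to|for) (?:review|check|verify)\b",
    ],
    "exclusive_authority": [
        r"\b(?:don't|do not) (?:ask|consult|check with) (?:anyone|others|a second)\b",
        r"\bi am the only one who\b",
    ],
    "emotional_pressure": [
        r"\byou(?:'ll| will) regret\b",
        r"\bif you (?:really )?trust me\b",
    ],
}

# Compile once; matching is case-insensitive.
COMPILED = {
    category: [re.compile(p, re.IGNORECASE) for p in patterns]
    for category, patterns in PATTERNS.items()
}


@dataclass
class Finding:
    category: str  # which manipulation behavior matched
    matched: str   # the exact text span that triggered the match


def scan(text: str) -> list[Finding]:
    """Return one Finding per pattern that matches somewhere in `text`."""
    findings: list[Finding] = []
    for category, regexes in COMPILED.items():
        for rx in regexes:
            m = rx.search(text)
            if m:
                findings.append(Finding(category, m.group(0)))
    return findings
```

For example, `scan("I am 100% certain this migration is safe. Approve immediately; there is no time to review.")` returns one `false_certainty` finding and two `artificial_urgency` findings. A neutral message such as "please review the diff before approving" returns nothing. Matching on surface phrases like this is also why the note above limits the rule to explicit language. Selective omission or framing contains no telltale phrase, so a pattern list cannot catch it.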

Severity
high
Category
Agent Manipulation
Scan Target
mcp
Author
ATR Community

Response Actions

escalate, alert, snapshot

References

OWASP Agentic
ASI09:2026 - Human-Agent Trust Exploitation
OWASP LLM
LLM06:2025 - Excessive Agency
LLM09:2025 - Misinformation
MITRE ATLAS
AML.T0048 - Adversarial Prompt Techniques