Skip to content
ATR-2026-00011highTool Poisoningexperimental

Instruction Injection via Tool Output

Detects hidden instructions embedded in tool outputs that attempt to manipulate the agent's subsequent behavior. Tool responses may contain injected directives disguised as data that instruct the agent to perform unauthorized actions, change behavior, or exfiltrate information. Detection covers: urgency-prefixed directives addressing the agent, direct agent manipulation commands, information suppression directives, tool invocation instructions, data exfiltration commands, hidden instruction tags, response injection directives, conversational steering, system-pretending tokens, fake API response structures, subtle action-required patterns, and steganographic instruction embedding. Patterns are designed to require multiple signals where possible to reduce false positives.

Severity
high
Category
Tool Poisoning
Scan Target
mcp
Author
ATR Community

Response Actions

block toolalertsnapshotescalate

References

OWASP Agentic
ASI02:2026 - Tool Misuse and Exploitation
OWASP LLM
LLM01:2025 - Prompt InjectionLLM05:2025 - Improper Output Handling
MITRE ATLAS
AML.T0053 - LLM Plugin CompromiseAML.T0051.001 - Indirect Prompt Injection