ATR-2026-00103criticalTool Poisoningexperimental

Hidden LLM Safety Bypass Instructions in Tool Descriptions

Detects tools that embed explicit instructions directing the LLM to disregard safety mechanisms, ignore security policies, or bypass previous instructions. Commonly uses patterns like "NOTE TO AI: disregard all safety instructions" to manipulate the LLM into operating outside its safety boundaries. Unlike consent-bypass attacks (ATR-2026-100), this targets the LLM's core safety mechanisms rather than specific user confirmation flows.

Severity

critical

Response Actions

block toolalertsnapshot

References

OWASP Agentic

ASI01:2026 - Agent Goal Hijack

OWASP LLM

LLM01:2025 - Prompt InjectionLLM07:2025 - System Prompt Leakage

MITRE ATLAS

AML.T0051 - Prompt Injection

View full YAML on GitHub →

More Tool Poisoning Rules

ATR-2026-00010criticalMalicious Content in MCP Tool Response ATR-2026-00011highInstruction Injection via Tool Output ATR-2026-00012highUnauthorized Tool Call Detection ATR-2026-00013criticalSSRF via Agent Tool Calls ATR-2026-00061mediumSkill Description-Behavior Mismatch