ATR-2026-00103criticalTool Poisoningexperimental

Hidden LLM Safety Bypass Instructions in Tool Descriptions

Detects tools that embed explicit instructions directing the LLM to disregard safety mechanisms, ignore security policies, or bypass previous instructions. Commonly uses patterns like "NOTE TO AI: disregard all safety instructions" to manipulate the LLM into operating outside its safety boundaries. Unlike consent-bypass attacks (ATR-2026-100), this targets the LLM's core safety mechanisms rather than specific user confirmation flows.

嚴重度

critical

類別

Tool Poisoning

掃描目標

mcp

作者

ATR Community

建議回應

block toolalertsnapshot

參考資料

OWASP Agentic

ASI01:2026 - Agent Goal Hijack

OWASP LLM

LLM01:2025 - Prompt InjectionLLM07:2025 - System Prompt Leakage

MITRE ATLAS

AML.T0051 - Prompt Injection

在 GitHub 上查看完整 YAML →

Hidden LLM Safety Bypass Instructions in Tool Descriptions

建議回應

參考資料

更多 Tool Poisoning 規則