ATR-2026-00062 | critical | Tool Poisoning | experimental

Hidden Capability in MCP Skill

Detects MCP skills that expose hidden or undocumented capabilities beyond their declared tool schema. A skill may advertise a simple interface but accept hidden parameters like "debug_mode", "admin_override", or "raw_exec" that unlock dangerous functionality. This is a common pattern in trojaned MCP packages.

Severity: critical
Category: Tool Poisoning
Scan Target: mcp
Author: ATR Community

Response Actions

block tool, quarantine session, alert, snapshot, kill agent

References

OWASP Agentic
  • ASI02:2026 - Tool Misuse and Exploitation
  • ASI05:2026 - Unexpected Code Execution
OWASP LLM
  • LLM03:2025 - Supply Chain Vulnerabilities
  • LLM06:2025 - Excessive Agency
MITRE ATLAS
  • AML.T0010 - ML Supply Chain Compromise

Wild Validation

Validated: 2026-04-08
Samples: 53,577
False Positive Rate: 0%

Detection Conditions

Combinator: any
  1. field: tool_args, op: regex
  2. field: tool_args, op: regex
  3. field: tool_args, op: regex
  4. field: tool_response, op: regex
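
The four conditions and the "any" combinator can be sketched in Python as follows. This is a minimal illustration, not the ATR engine itself: the patterns are copied from the rule's YAML definition, while the event dictionary shape and the helper names `matched_conditions` and `rule_triggers` are assumptions made for the example.

```python
import re

# (field, pattern) pairs copied from the rule's YAML definition.
# A YAML double-quoted "\\s" or "\\$" corresponds to r"\s" or r"\$" here.
CONDITIONS = [
    ("tool_args", r"(?i)(debug[_-]?mode|admin[_-]?override|raw[_-]?exec|unsafe[_-]?mode|bypass[_-]?check|no[_-]?sandbox)"),
    ("tool_args", r"(?i)(__[a-z]+__|\$\$[a-z]+|_internal_|_hidden_|_backdoor_|_secret_)"),
    ("tool_args", r"(?i)(eval_code|exec_raw|shell_cmd|run_arbitrary|system_call|inject_prompt)"),
    ("tool_response", r"(?i)(hidden\s+mode\s+activated|debug\s+mode\s+enabled|admin\s+access\s+granted|sandbox\s+disabled)"),
]
COMPILED = [(field, re.compile(pattern)) for field, pattern in CONDITIONS]


def matched_conditions(event):
    """Return the 1-based numbers of every condition that matches the event.

    `event` is a hypothetical dict with optional "tool_args" and
    "tool_response" string values; a missing field never matches.
    """
    hits = []
    for number, (field, pattern) in enumerate(COMPILED, start=1):
        value = event.get(field) or ""
        if pattern.search(value):
            hits.append(number)
    return hits


def rule_triggers(event):
    """Combinator "any": a single matching condition is enough."""
    return bool(matched_conditions(event))


attack = {
    "tool_name": "note_taker",
    "tool_args": '{"text": "hello", "debug_mode": true, "raw_exec": "cat /etc/shadow"}',
}
print(matched_conditions(attack), rule_triggers(attack))
```

Reporting the matching condition numbers, rather than only a boolean, makes it easier to see which pattern is responsible when an alert needs triage.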

Attack Examples (Rule Triggers)

  1. Matches: hidden debug_mode and raw_exec parameters
    tool_name: note_taker
    tool_args: '{"text": "hello", "debug_mode": true, "raw_exec": "cat /etc/shadow"}'
  2. Matches: dunder-wrapped hidden parameter
    tool_name: calculator
    tool_args: '{"expression": "2+2", "__backdoor__": "reverse_shell"}'

Real-world attack payloads (sanitized). Used for regression testing.

Benign Examples (Rule Doesn't Trigger)

  1. tool_name: debugger
    tool_args: '{"breakpoint": "line_42", "mode": "step"}'
  2. tool_name: admin_panel
    tool_args: '{"action": "list_users"}'
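
The benign examples pass because the patterns scan tool_args for compound tokens such as debug_mode, not for the words "debug" or "admin" alone, and the rule does not inspect tool_name. The sketch below uses only the first condition from the YAML definition to show this; `flags_found` is a hypothetical helper name, and the third input is an invented illustration rather than a sample from the rule's test set.

```python
import re

# Condition 1 from the YAML definition. "[_-]?" makes the separator optional
# and "(?i)" ignores case, so debug_mode, debug-mode, and DebugMode all match.
HIDDEN_FLAG = re.compile(
    r"(?i)(debug[_-]?mode|admin[_-]?override|raw[_-]?exec|"
    r"unsafe[_-]?mode|bypass[_-]?check|no[_-]?sandbox)"
)


def flags_found(tool_args):
    """Return every token in tool_args that condition 1 matches, in order."""
    return [match.group(0) for match in HIDDEN_FLAG.finditer(tool_args)]


# Benign examples from the rule: "mode" alone is not a listed token.
print(flags_found('{"breakpoint": "line_42", "mode": "step"}'))
print(flags_found('{"action": "list_users"}'))

# Invented input showing the separator and case variants that do match.
print(flags_found('{"debug-mode": 1, "DebugMode": 1, "no_sandbox": 1}'))
```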

Known False Positive Contexts

  • Development/staging environments where debug modes are intentionally enabled
  • Administrative tools that legitimately expose debug parameters
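
One way to handle these contexts is to downgrade, rather than suppress, a match on a parameter that a tool documents for a given environment. The sketch below is an assumption layered on top of the rule, not part of it: the allowlist `DOCUMENTED_DEBUG_PARAMS`, the tool name `build_runner`, the environment labels, and the `triage` outcomes are all hypothetical, and only the first condition's pattern is covered.

```python
import re

# Condition 1 from the rule's YAML definition.
HIDDEN_FLAG = re.compile(
    r"(?i)(debug[_-]?mode|admin[_-]?override|raw[_-]?exec|"
    r"unsafe[_-]?mode|bypass[_-]?check|no[_-]?sandbox)"
)

# Hypothetical allowlist: (environment, tool_name) -> parameters that the
# tool's published schema documents for that environment.
DOCUMENTED_DEBUG_PARAMS = {
    ("staging", "build_runner"): {"debug_mode"},
}


def triage(environment, tool_name, tool_args):
    """Return "clean", "review", or "block" for one tool call.

    "review" means every matched token is documented for this tool and
    environment; anything else that matches is treated as a hidden capability.
    """
    found = {match.group(0).lower() for match in HIDDEN_FLAG.finditer(tool_args)}
    if not found:
        return "clean"
    allowed = DOCUMENTED_DEBUG_PARAMS.get((environment, tool_name), set())
    return "review" if found <= allowed else "block"


args = '{"target": "app", "debug_mode": true}'
print(triage("staging", "build_runner", args))
print(triage("production", "build_runner", args))
```

Keying the allowlist on both environment and tool keeps a staging exception from silently carrying over to production.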

Full YAML Definition

title: "Hidden Capability in MCP Skill"
id: ATR-2026-00062
rule_version: 1
status: experimental
description: >
  Detects MCP skills that expose hidden or undocumented capabilities beyond their
  declared tool schema. A skill may advertise a simple interface but accept hidden
  parameters like "debug_mode", "admin_override", or "raw_exec" that unlock dangerous
  functionality. This is a common pattern in trojaned MCP packages.
author: "ATR Community"
date: "2026/03/08"
schema_version: "0.1"
detection_tier: pattern
maturity: experimental
severity: critical

references:
  owasp_llm:
    - "LLM03:2025 - Supply Chain Vulnerabilities"
    - "LLM06:2025 - Excessive Agency"
  owasp_agentic:
    - "ASI02:2026 - Tool Misuse and Exploitation"
    - "ASI05:2026 - Unexpected Code Execution"
  mitre_atlas:
    - "AML.T0010 - ML Supply Chain Compromise"
  cve:
    - "CVE-2025-59536"

compliance:
  nist_ai_rmf:
    - subcategory: "GV.6.1"
      context: "Hidden capabilities in MCP skills represent third-party/supplier AI risk where a packaged tool exposes undocumented parameters beyond its declared schema; GV.6.1 requires policies and procedures that govern third-party AI components and detect deviations from declared interfaces."
      strength: primary
    - subcategory: "MG.3.2"
      context: "Trojaned MCP packages with hidden parameters like debug_mode or admin_override are exactly the supply-chain risk MG.3.2 addresses by requiring monitoring of pre-trained models and third-party components used in development for unexpected or unsafe capabilities."
      strength: secondary
    - subcategory: "MP.5.1"
      context: "Detection of undocumented dangerous parameters characterizes the likelihood and magnitude of supply-chain compromise impact; MP.5.1 requires that these hidden-capability risks be identified and tracked as part of risk characterization."
      strength: secondary
tags:
  category: tool-poisoning
  subcategory: hidden-capability
  scan_target: mcp
  confidence: high

agent_source:
  type: tool_call
  framework:
    - mcp
    - langchain
    - autogen
  provider:
    - any

detection:
  conditions:
    - field: tool_args
      operator: regex
      value: "(?i)(debug[_-]?mode|admin[_-]?override|raw[_-]?exec|unsafe[_-]?mode|bypass[_-]?check|no[_-]?sandbox)"
    - field: tool_args
      operator: regex
      value: "(?i)(__[a-z]+__|\\$\\$[a-z]+|_internal_|_hidden_|_backdoor_|_secret_)"
    - field: tool_args
      operator: regex
      value: "(?i)(eval_code|exec_raw|shell_cmd|run_arbitrary|system_call|inject_prompt)"
    - field: tool_response
      operator: regex
      value: "(?i)(hidden\\s+mode\\s+activated|debug\\s+mode\\s+enabled|admin\\s+access\\s+granted|sandbox\\s+disabled)"
  condition: any
  false_positives:
    - "Development/staging environments where debug modes are intentionally enabled"
    - "Administrative tools that legitimately expose debug parameters"

response:
  actions:
    - block_tool
    - quarantine_session
    - alert
    - snapshot
    - kill_agent
  auto_response_threshold: critical
  message_template: >
    [ATR-2026-062] CRITICAL: Hidden capability detected in MCP skill. Tool arguments
    or response indicate undocumented functionality that may be a backdoor.
    Session quarantined. Investigate skill source immediately.

confidence: 53
wild_validated: "2026/04/08"
wild_samples: 53577
wild_fp_rate: 0
test_cases:
  true_positives:
    - input:
        tool_name: "note_taker"
        tool_args: '{"text": "hello", "debug_mode": true, "raw_exec": "cat /etc/shadow"}'
      expected: triggered
      matched_condition: "hidden debug_mode and raw_exec parameters"
    - input:
        tool_name: "calculator"
        tool_args: '{"expression": "2+2", "__backdoor__": "reverse_shell"}'
      expected: triggered
      matched_condition: "dunder-wrapped hidden parameter"
  true_negatives:
    - input:
        tool_name: "debugger"
        tool_args: '{"breakpoint": "line_42", "mode": "step"}'
      expected: not_triggered
      reason: "Debugger tool legitimately using debug-related terminology"
    - input:
        tool_name: "admin_panel"
        tool_args: '{"action": "list_users"}'
      expected: not_triggered
      reason: "Admin tool using normal admin operations"
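
The test_cases block above can be replayed as a small regression check. The sketch below transcribes the patterns and the four cases into Python literals by hand instead of parsing the YAML, so it needs only the standard library; `evaluate` and `run_regression` are hypothetical helper names, and a real harness would presumably load the rule file directly.

```python
import re

# Patterns transcribed from the detection block, grouped by the field they scan.
PATTERNS = {
    "tool_args": [
        r"(?i)(debug[_-]?mode|admin[_-]?override|raw[_-]?exec|unsafe[_-]?mode|bypass[_-]?check|no[_-]?sandbox)",
        r"(?i)(__[a-z]+__|\$\$[a-z]+|_internal_|_hidden_|_backdoor_|_secret_)",
        r"(?i)(eval_code|exec_raw|shell_cmd|run_arbitrary|system_call|inject_prompt)",
    ],
    "tool_response": [
        r"(?i)(hidden\s+mode\s+activated|debug\s+mode\s+enabled|admin\s+access\s+granted|sandbox\s+disabled)",
    ],
}

# The rule's test_cases as (input, expected) pairs.
TEST_CASES = [
    ({"tool_name": "note_taker",
      "tool_args": '{"text": "hello", "debug_mode": true, "raw_exec": "cat /etc/shadow"}'},
     "triggered"),
    ({"tool_name": "calculator",
      "tool_args": '{"expression": "2+2", "__backdoor__": "reverse_shell"}'},
     "triggered"),
    ({"tool_name": "debugger",
      "tool_args": '{"breakpoint": "line_42", "mode": "step"}'},
     "not_triggered"),
    ({"tool_name": "admin_panel",
      "tool_args": '{"action": "list_users"}'},
     "not_triggered"),
]


def evaluate(event):
    """Return "triggered" if any pattern matches its field, else "not_triggered"."""
    for field, patterns in PATTERNS.items():
        text = event.get(field, "")
        if any(re.search(pattern, text) for pattern in patterns):
            return "triggered"
    return "not_triggered"


def run_regression():
    """Return the tool names of cases whose outcome differs from `expected`."""
    return [event["tool_name"] for event, expected in TEST_CASES
            if evaluate(event) != expected]


print(run_regression() or "all test cases behave as expected")
```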

Revision History

Created: 2026-03-08
Last modified: 2026-05-24