ATR-2026-00129criticalSkill Compromiseexperimental

Unicode Tag Character Smuggling

Detects Unicode Tag characters (U+E0001-U+E007F) and other invisible Unicode ranges used to smuggle hidden instructions in SKILL.md files. These characters are invisible in most editors and renderers but are processed by AI agents as text. Snyk ToxicSkills research (Feb 2026) documented this as "ASCII smuggling" — invisible payloads that agents read but humans cannot see.

Severity

critical

Response Actions

alertblock input

References

OWASP Agentic

ASI01:2026 - Agent Goal Hijack

OWASP LLM

LLM01:2025 - Prompt Injection

MITRE ATLAS

AML.T0010 - AI Supply Chain Compromise

Wild Validation

Validated

2026-04-08

Samples

53,577

False Positive Rate (in-the-wild)

0.0056%

Detection Conditions

Combinator: any

01
3+ Unicode Tag characters (invisible payload, Snyk ASCII smuggling) via surrogate pairs
field: contentop: regex
02
5+ zero-width characters spread through text (steganographic smuggling)
field: contentop: regex

Attack Examples (Rule Triggers)

# Skill

Normaltextwithhiddeninstructionsembedded

Real-world attack payloads, sanitized and versioned alongside the rule as regression tests — so a future revision can't silently stop catching them.

Benign Examples (Rule Doesn't Trigger)

# Normal Skill

This skill helps with file management.

## Commands
- list: List files

```
# Tool

Use this tool for simple tasks.
```

Full YAML Definition

Edit on GitHub →

title: "Unicode Tag Character Smuggling"
id: ATR-2026-00129
rule_version: 1
status: experimental
description: >
  Detects Unicode Tag characters (U+E0001-U+E007F) and other invisible Unicode
  ranges used to smuggle hidden instructions in SKILL.md files. These characters
  are invisible in most editors and renderers but are processed by AI agents as
  text. Snyk ToxicSkills research (Feb 2026) documented this as "ASCII smuggling"
  — invisible payloads that agents read but humans cannot see.
author: "ATR Community"
date: "2026/04/05"
schema_version: "1.0"
detection_tier: pattern
maturity: test
severity: critical

references:
  mitre_atlas:
    - "AML.T0010 - AI Supply Chain Compromise"
  owasp_llm:
    - "LLM01:2025 - Prompt Injection"
  owasp_ast:
    - "AST01:2026 - Malicious Skills"
    - "AST05:2026 - Unsafe Deserialization"
  research:
    - "Snyk ToxicSkills: ASCII smuggling via Unicode Tags (Feb 2026)"
    - "Unicode Technical Report #36: Security Considerations"

  owasp_agentic:
    - ASI01:2026 - Agent Goal Hijack
metadata_provenance:
  mitre_atlas: auto-generated

compliance:
  nist_ai_rmf:
    - subcategory: "MP.5.1"
      context: "Invisible Unicode Tag characters and zero-width steganographic payloads embedded in SKILL.md files are adversarial inputs that exploit the gap between human-visible content and agent-parsed content; MP.5.1 requires identifying and characterizing these hidden prompt-injection vectors as risks to the AI system."
      strength: primary
    - subcategory: "MG.3.2"
      context: "SKILL.md files are third-party supplied artifacts consumed by AI agents, and Unicode smuggling is a supply chain compromise vector; MG.3.2 requires monitoring of these pre-trained/third-party components for hidden malicious content before agent execution."
      strength: secondary
    - subcategory: "MG.2.3"
      context: "Detection of 3+ Unicode Tag characters or 5+ zero-width characters indicates a covert injection payload that must trigger containment of the affected skill; MG.2.3 mandates predefined response plans to disengage or quarantine compromised skills before agents execute the smuggled instructions."
      strength: secondary
    - subcategory: "MS.2.7"
      context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the skill supply-chain compromise (Unicode Tag Character Smuggling)."
      strength: primary
    - subcategory: "MG.3.1"
      context: "NIST AI RMF MANAGE 3.1 (third-party resources regularly monitored) is supported where this rule detects the skill supply-chain compromise (Unicode Tag Character Smuggling)."
      strength: secondary
  iso_42001:
    - clause: "8.1"
      context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the skill supply-chain compromise (Unicode Tag Character Smuggling)."
      strength: primary
    - clause: "8.3"
      context: "ISO/IEC 42001 Clause 8.3 (AI risk treatment) is supported by this rule, which implements runtime detection of the skill supply-chain compromise (Unicode Tag Character Smuggling) as a treatment control."
      strength: secondary
  eu_ai_act:
    - article: "15"
      context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the skill supply-chain compromise (Unicode Tag Character Smuggling)."
      strength: primary
    - article: "9"
      context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the skill supply-chain compromise (Unicode Tag Character Smuggling)."
      strength: secondary
tags:
  category: skill-compromise
  subcategory: unicode-smuggling
  confidence: high
  scan_target: skill

agent_source:
  type: mcp_exchange
  framework:
    - any
  provider:
    - any

detection:
  condition: any
  conditions:
    # Detect Unicode Tag characters (U+E0000-U+E007F) used for ASCII smuggling.
    # These are invisible in editors but processed by AI agents as text.
    # Must use Unicode codepoint escapes (JS strings are UTF-16, not raw bytes).
    - field: content
      operator: regex
      value: "(?:\\uDB40[\\uDC00-\\uDC7F]){3,}"
      description: "3+ Unicode Tag characters (invisible payload, Snyk ASCII smuggling) via surrogate pairs"

    # Detect zero-width characters used for steganographic payload hiding.
    # ZWSP, ZWNJ, ZWJ, Word Joiner, BOM — 5+ occurrences = suspicious density.
    - field: content
      operator: regex
      value: "[\\u200b\\u200c\\u200d\\u2060\\ufeff].*[\\u200b\\u200c\\u200d\\u2060\\ufeff].*[\\u200b\\u200c\\u200d\\u2060\\ufeff].*[\\u200b\\u200c\\u200d\\u2060\\ufeff].*[\\u200b\\u200c\\u200d\\u2060\\ufeff]"
      description: "5+ zero-width characters spread through text (steganographic smuggling)"

confidence: 50
wild_validated: "2026/04/08"
wild_samples: 53577
wild_fp_rate: 0.0056
test_cases:
  true_positives:
    - input: "# Skill\n\nNormal\u200b\u200btext\u200b\u200bwith\u200b\u200bhidden\u200b\u200binstructions\u200b\u200bembedded"
      expected: trigger

  true_negatives:
    - input: "# Normal Skill\n\nThis skill helps with file management.\n\n## Commands\n- list: List files"
      expected: no_trigger
    - input: "# Tool\n\nUse this tool for simple tasks."
      expected: no_trigger

response:
  actions:
    - alert
    - block_input

Revision History

Created

2026-04-05

Last modified

2026-07-07

View full commit history on GitHub →

More Skill Compromise Rules

ATR-2026-00060highMCP Skill Impersonation and Supply Chain Attack ATR-2026-00061mediumSkill Description-Behavior Mismatch ATR-2026-00062criticalHidden Capability in MCP Skill ATR-2026-00063criticalMulti-Skill Chain Attack ATR-2026-00064highOver-Permissioned MCP Skill