Skip to content
ATR-2026-00129criticalSkill Compromiseexperimental

Unicode Tag Character Smuggling

Detects Unicode Tag characters (U+E0001-U+E007F) and other invisible Unicode ranges used to smuggle hidden instructions in SKILL.md files. These characters are invisible in most editors and renderers but are processed by AI agents as text. Snyk ToxicSkills research (Feb 2026) documented this as "ASCII smuggling" — invisible payloads that agents read but humans cannot see.

Severity
critical
Category
Skill Compromise
Scan Target
skill
Author
ATR Community

Response Actions

alertblock input

References

OWASP LLM
LLM01:2025 - Prompt Injection
MITRE ATLAS
AML.T0010 - ML Supply Chain Compromise

Wild Validation

Validated
2026-04-08
Samples
53,577
False Positive Rate
0.0056%

Detection Conditions

Combinator: any
  1. 01
    3+ Unicode Tag characters (invisible payload, Snyk ASCII smuggling) via surrogate pairs
    field: contentop: regex
  2. 02
    5+ zero-width characters spread through text (steganographic smuggling)
    field: contentop: regex

Attack Examples (Rule Triggers)

  1. # Skill
    
    Normal​​text​​with​​hidden​​instructions​​embedded

Real-world attack payloads (sanitized). Used for regression testing.

Benign Examples (Rule Doesn't Trigger)

  1. # Normal Skill
    
    This skill helps with file management.
    
    ## Commands
    - list: List files
  2. # Tool
    
    Use this tool for simple tasks.

Full YAML Definition

Edit on GitHub →
title: "Unicode Tag Character Smuggling"
id: ATR-2026-00129
rule_version: 1
status: experimental
description: >
  Detects Unicode Tag characters (U+E0001-U+E007F) and other invisible Unicode
  ranges used to smuggle hidden instructions in SKILL.md files. These characters
  are invisible in most editors and renderers but are processed by AI agents as
  text. Snyk ToxicSkills research (Feb 2026) documented this as "ASCII smuggling"
  — invisible payloads that agents read but humans cannot see.
author: "ATR Community"
date: "2026/04/05"
schema_version: "1.0"
detection_tier: pattern
maturity: experimental
severity: critical

references:
  mitre_atlas:
    - "AML.T0010 - ML Supply Chain Compromise"
  owasp_llm:
    - "LLM01:2025 - Prompt Injection"
  owasp_ast:
    - "AST01:2026 - Malicious Skills"
    - "AST05:2026 - Unsafe Deserialization"
  research:
    - "Snyk ToxicSkills: ASCII smuggling via Unicode Tags (Feb 2026)"
    - "Unicode Technical Report #36: Security Considerations"

metadata_provenance:
  mitre_atlas: auto-generated

compliance:
  nist_ai_rmf:
    - subcategory: "MP.5.1"
      context: "Invisible Unicode Tag characters and zero-width steganographic payloads embedded in SKILL.md files are adversarial inputs that exploit the gap between human-visible content and agent-parsed content; MP.5.1 requires identifying and characterizing these hidden prompt-injection vectors as risks to the AI system."
      strength: primary
    - subcategory: "MG.3.2"
      context: "SKILL.md files are third-party supplied artifacts consumed by AI agents, and Unicode smuggling is a supply chain compromise vector; MG.3.2 requires monitoring of these pre-trained/third-party components for hidden malicious content before agent execution."
      strength: secondary
    - subcategory: "MG.2.3"
      context: "Detection of 3+ Unicode Tag characters or 5+ zero-width characters indicates a covert injection payload that must trigger containment of the affected skill; MG.2.3 mandates predefined response plans to disengage or quarantine compromised skills before agents execute the smuggled instructions."
      strength: secondary
tags:
  category: skill-compromise
  subcategory: unicode-smuggling
  confidence: high
  scan_target: skill

agent_source:
  type: mcp_exchange
  framework:
    - any
  provider:
    - any

detection:
  condition: any
  conditions:
    # Detect Unicode Tag characters (U+E0000-U+E007F) used for ASCII smuggling.
    # These are invisible in editors but processed by AI agents as text.
    # Must use Unicode codepoint escapes (JS strings are UTF-16, not raw bytes).
    - field: content
      operator: regex
      value: "(?:\\uDB40[\\uDC00-\\uDC7F]){3,}"
      description: "3+ Unicode Tag characters (invisible payload, Snyk ASCII smuggling) via surrogate pairs"

    # Detect zero-width characters used for steganographic payload hiding.
    # ZWSP, ZWNJ, ZWJ, Word Joiner, BOM — 5+ occurrences = suspicious density.
    - field: content
      operator: regex
      value: "[\\u200b\\u200c\\u200d\\u2060\\ufeff].*[\\u200b\\u200c\\u200d\\u2060\\ufeff].*[\\u200b\\u200c\\u200d\\u2060\\ufeff].*[\\u200b\\u200c\\u200d\\u2060\\ufeff].*[\\u200b\\u200c\\u200d\\u2060\\ufeff]"
      description: "5+ zero-width characters spread through text (steganographic smuggling)"

confidence: 50
wild_validated: "2026/04/08"
wild_samples: 53577
wild_fp_rate: 0.0056
test_cases:
  true_positives:
    - input: "# Skill\n\nNormal\u200b\u200btext\u200b\u200bwith\u200b\u200bhidden\u200b\u200binstructions\u200b\u200bembedded"
      expected: trigger

  true_negatives:
    - input: "# Normal Skill\n\nThis skill helps with file management.\n\n## Commands\n- list: List files"
      expected: no_trigger
    - input: "# Tool\n\nUse this tool for simple tasks."
      expected: no_trigger

response:
  actions:
    - alert
    - block_input

Revision History

Created
2026-04-05
Last modified
2026-05-24
View full commit history on GitHub →