ATR-2026-00129criticalSkill Compromiseexperimental
Unicode Tag Character Smuggling
Detects Unicode Tag characters (U+E0001-U+E007F) and other invisible Unicode ranges used to smuggle hidden instructions in SKILL.md files. These characters are invisible in most editors and renderers but are processed by AI agents as text. Snyk ToxicSkills research (Feb 2026) documented this as "ASCII smuggling" — invisible payloads that agents read but humans cannot see.
Severity
critical
Category
Skill Compromise
Scan Target
skill
Author
ATR Community
Response Actions
alertblock input
References
OWASP LLM
LLM01:2025 - Prompt Injection
MITRE ATLAS
AML.T0010 - ML Supply Chain Compromise
Wild Validation
Validated
2026-04-08
Samples
53,577
False Positive Rate
0.0056%
Detection Conditions
Combinator: any- 013+ Unicode Tag characters (invisible payload, Snyk ASCII smuggling) via surrogate pairsfield: contentop: regex
- 025+ zero-width characters spread through text (steganographic smuggling)field: contentop: regex
Attack Examples (Rule Triggers)
# Skill Normaltextwithhiddeninstructionsembedded
Real-world attack payloads (sanitized). Used for regression testing.
Benign Examples (Rule Doesn't Trigger)
# Normal Skill This skill helps with file management. ## Commands - list: List files
# Tool Use this tool for simple tasks.
Full YAML Definition
Edit on GitHub →title: "Unicode Tag Character Smuggling"
id: ATR-2026-00129
rule_version: 1
status: experimental
description: >
Detects Unicode Tag characters (U+E0001-U+E007F) and other invisible Unicode
ranges used to smuggle hidden instructions in SKILL.md files. These characters
are invisible in most editors and renderers but are processed by AI agents as
text. Snyk ToxicSkills research (Feb 2026) documented this as "ASCII smuggling"
— invisible payloads that agents read but humans cannot see.
author: "ATR Community"
date: "2026/04/05"
schema_version: "1.0"
detection_tier: pattern
maturity: experimental
severity: critical
references:
mitre_atlas:
- "AML.T0010 - ML Supply Chain Compromise"
owasp_llm:
- "LLM01:2025 - Prompt Injection"
owasp_ast:
- "AST01:2026 - Malicious Skills"
- "AST05:2026 - Unsafe Deserialization"
research:
- "Snyk ToxicSkills: ASCII smuggling via Unicode Tags (Feb 2026)"
- "Unicode Technical Report #36: Security Considerations"
metadata_provenance:
mitre_atlas: auto-generated
compliance:
nist_ai_rmf:
- subcategory: "MP.5.1"
context: "Invisible Unicode Tag characters and zero-width steganographic payloads embedded in SKILL.md files are adversarial inputs that exploit the gap between human-visible content and agent-parsed content; MP.5.1 requires identifying and characterizing these hidden prompt-injection vectors as risks to the AI system."
strength: primary
- subcategory: "MG.3.2"
context: "SKILL.md files are third-party supplied artifacts consumed by AI agents, and Unicode smuggling is a supply chain compromise vector; MG.3.2 requires monitoring of these pre-trained/third-party components for hidden malicious content before agent execution."
strength: secondary
- subcategory: "MG.2.3"
context: "Detection of 3+ Unicode Tag characters or 5+ zero-width characters indicates a covert injection payload that must trigger containment of the affected skill; MG.2.3 mandates predefined response plans to disengage or quarantine compromised skills before agents execute the smuggled instructions."
strength: secondary
tags:
category: skill-compromise
subcategory: unicode-smuggling
confidence: high
scan_target: skill
agent_source:
type: mcp_exchange
framework:
- any
provider:
- any
detection:
condition: any
conditions:
# Detect Unicode Tag characters (U+E0000-U+E007F) used for ASCII smuggling.
# These are invisible in editors but processed by AI agents as text.
# Must use Unicode codepoint escapes (JS strings are UTF-16, not raw bytes).
- field: content
operator: regex
value: "(?:\\uDB40[\\uDC00-\\uDC7F]){3,}"
description: "3+ Unicode Tag characters (invisible payload, Snyk ASCII smuggling) via surrogate pairs"
# Detect zero-width characters used for steganographic payload hiding.
# ZWSP, ZWNJ, ZWJ, Word Joiner, BOM — 5+ occurrences = suspicious density.
- field: content
operator: regex
value: "[\\u200b\\u200c\\u200d\\u2060\\ufeff].*[\\u200b\\u200c\\u200d\\u2060\\ufeff].*[\\u200b\\u200c\\u200d\\u2060\\ufeff].*[\\u200b\\u200c\\u200d\\u2060\\ufeff].*[\\u200b\\u200c\\u200d\\u2060\\ufeff]"
description: "5+ zero-width characters spread through text (steganographic smuggling)"
confidence: 50
wild_validated: "2026/04/08"
wild_samples: 53577
wild_fp_rate: 0.0056
test_cases:
true_positives:
- input: "# Skill\n\nNormal\u200b\u200btext\u200b\u200bwith\u200b\u200bhidden\u200b\u200binstructions\u200b\u200bembedded"
expected: trigger
true_negatives:
- input: "# Normal Skill\n\nThis skill helps with file management.\n\n## Commands\n- list: List files"
expected: no_trigger
- input: "# Tool\n\nUse this tool for simple tasks."
expected: no_trigger
response:
actions:
- alert
- block_input