ATR-2026-00129criticalSkill 入侵experimental
Unicode Tag Character Smuggling
Detects Unicode Tag characters (U+E0001-U+E007F) and other invisible Unicode ranges used to smuggle hidden instructions in SKILL.md files. These characters are invisible in most editors and renderers but are processed by AI agents as text. Snyk ToxicSkills research (Feb 2026) documented this as "ASCII smuggling" — invisible payloads that agents read but humans cannot see.
嚴重度
critical
類別
Skill 入侵
掃描目標
skill
作者
ATR Community
建議回應
alertblock input
參考資料
OWASP LLM
LLM01:2025 - Prompt Injection
MITRE ATLAS
AML.T0010 - ML Supply Chain Compromise
實地驗證
驗證日期
2026-04-08
樣本數
53,577
誤報率
0.0056%
偵測條件
組合方式: any- 013+ Unicode Tag characters (invisible payload, Snyk ASCII smuggling) via surrogate pairs欄位: content運算子: regex
- 025+ zero-width characters spread through text (steganographic smuggling)欄位: content運算子: regex
攻擊範例(規則會觸發)
# Skill Normaltextwithhiddeninstructionsembedded
以上為真實攻擊 payload 脫敏版本。用於 regression testing。
正常樣本(規則不會觸發)
# Normal Skill This skill helps with file management. ## Commands - list: List files
# Tool Use this tool for simple tasks.
完整 YAML 定義
在 GitHub 編輯 →title: "Unicode Tag Character Smuggling"
id: ATR-2026-00129
rule_version: 1
status: experimental
description: >
Detects Unicode Tag characters (U+E0001-U+E007F) and other invisible Unicode
ranges used to smuggle hidden instructions in SKILL.md files. These characters
are invisible in most editors and renderers but are processed by AI agents as
text. Snyk ToxicSkills research (Feb 2026) documented this as "ASCII smuggling"
— invisible payloads that agents read but humans cannot see.
author: "ATR Community"
date: "2026/04/05"
schema_version: "1.0"
detection_tier: pattern
maturity: experimental
severity: critical
references:
mitre_atlas:
- "AML.T0010 - ML Supply Chain Compromise"
owasp_llm:
- "LLM01:2025 - Prompt Injection"
owasp_ast:
- "AST01:2026 - Malicious Skills"
- "AST05:2026 - Unsafe Deserialization"
research:
- "Snyk ToxicSkills: ASCII smuggling via Unicode Tags (Feb 2026)"
- "Unicode Technical Report #36: Security Considerations"
metadata_provenance:
mitre_atlas: auto-generated
compliance:
nist_ai_rmf:
- subcategory: "MP.5.1"
context: "Invisible Unicode Tag characters and zero-width steganographic payloads embedded in SKILL.md files are adversarial inputs that exploit the gap between human-visible content and agent-parsed content; MP.5.1 requires identifying and characterizing these hidden prompt-injection vectors as risks to the AI system."
strength: primary
- subcategory: "MG.3.2"
context: "SKILL.md files are third-party supplied artifacts consumed by AI agents, and Unicode smuggling is a supply chain compromise vector; MG.3.2 requires monitoring of these pre-trained/third-party components for hidden malicious content before agent execution."
strength: secondary
- subcategory: "MG.2.3"
context: "Detection of 3+ Unicode Tag characters or 5+ zero-width characters indicates a covert injection payload that must trigger containment of the affected skill; MG.2.3 mandates predefined response plans to disengage or quarantine compromised skills before agents execute the smuggled instructions."
strength: secondary
tags:
category: skill-compromise
subcategory: unicode-smuggling
confidence: high
scan_target: skill
agent_source:
type: mcp_exchange
framework:
- any
provider:
- any
detection:
condition: any
conditions:
# Detect Unicode Tag characters (U+E0000-U+E007F) used for ASCII smuggling.
# These are invisible in editors but processed by AI agents as text.
# Must use Unicode codepoint escapes (JS strings are UTF-16, not raw bytes).
- field: content
operator: regex
value: "(?:\\uDB40[\\uDC00-\\uDC7F]){3,}"
description: "3+ Unicode Tag characters (invisible payload, Snyk ASCII smuggling) via surrogate pairs"
# Detect zero-width characters used for steganographic payload hiding.
# ZWSP, ZWNJ, ZWJ, Word Joiner, BOM — 5+ occurrences = suspicious density.
- field: content
operator: regex
value: "[\\u200b\\u200c\\u200d\\u2060\\ufeff].*[\\u200b\\u200c\\u200d\\u2060\\ufeff].*[\\u200b\\u200c\\u200d\\u2060\\ufeff].*[\\u200b\\u200c\\u200d\\u2060\\ufeff].*[\\u200b\\u200c\\u200d\\u2060\\ufeff]"
description: "5+ zero-width characters spread through text (steganographic smuggling)"
confidence: 50
wild_validated: "2026/04/08"
wild_samples: 53577
wild_fp_rate: 0.0056
test_cases:
true_positives:
- input: "# Skill\n\nNormal\u200b\u200btext\u200b\u200bwith\u200b\u200bhidden\u200b\u200binstructions\u200b\u200bembedded"
expected: trigger
true_negatives:
- input: "# Normal Skill\n\nThis skill helps with file management.\n\n## Commands\n- list: List files"
expected: no_trigger
- input: "# Tool\n\nUse this tool for simple tasks."
expected: no_trigger
response:
actions:
- alert
- block_input