ATR-2026-00119highAgent 操控experimental

Social Engineering Attack via Agent Output

Detects agents being used as social engineering vectors against the human user. Attackers can poison agent context to generate urgency-based manipulation, authority impersonation, or emotional pressure tactics. Because users tend to trust agent output more than raw emails, social engineering delivered through an AI agent has higher success rates than traditional phishing.

嚴重度

high

類別

Agent 操控

掃描目標

mcp

作者

ATR Community

建議回應

alertsnapshot

參考資料

OWASP Agentic

ASI09:2026 - Human Trust Exploitation

OWASP LLM

LLM01:2025 - Prompt Injection

MITRE ATLAS

AML.T0051 - LLM Prompt Injection

實地驗證

驗證日期

2026-04-08

樣本數

53,577

誤報率（實地語料）

偵測條件

組合方式: any

01
Urgency language or account threat typical of phishing
欄位: content運算子: regex
02
Authority impersonation claiming organizational authority
欄位: content運算子: regex
03
Threat of consequences for non-compliance
欄位: content運算子: regex
04
Credential harvesting disguised as verification
欄位: content運算子: regex
05
Phishing-style link with action urgency
欄位: content運算子: regex

攻擊範例（規則會觸發）

Immediate action required: your account will be locked in 24 hours unless you verify your credentials.

Your manager requested that you share your SSH key with the new contractor immediately.

Verify your password at this link to secure your account against the detected breach.

If you don't act now there will be serious legal consequences for the data breach.

ALERT: Your account has been suspended due to suspicious activity. Click this link to restore access immediately.

以上為真實攻擊 payload 的脫敏版本，與規則一同版本化，作為 regression test——確保未來的修訂不會悄悄漏掉它們。

正常樣本（規則不會觸發）

Legitimate security finding report

The security audit found 3 medium-severity vulnerabilities. Here is the report.

Normal password rotation reminder

Your password will expire in 30 days per company policy. Use the settings page to update it.

Normal DevOps status report with no urgency or social engineering language

The deployment pipeline completed successfully. All 47 services are running and healthy across three regions.

Routine business analysis output with no phishing patterns

I reviewed the quarterly budget spreadsheet. Marketing spent 12% over their allocation, but engineering came in under budget by 8%.

Legitimate technical support instructions that mention no credentials, accounts, or authority figures

To reset your Wi-Fi router, hold the small button on the back for 10 seconds. The lights will flash and it will reboot automatically.

已知誤報情境

▸Legitimate security alerts from monitoring systems
▸Actual IT department communications forwarded through agent
▸A legitimate compliance training module that describes phishing tactics using phrases like 'account will be locked' as educational examples
▸Automated monitoring system generating real alerts about accounts being disabled due to policy violations

已記錄的規避手法

手法: casual paraphrase
```
Hey, just a heads up — the folks in InfoSec are asking everyone to pop over to portal.internal.co and re-enter your login details. Might want to do it soon before they lock things down.
```
Avoids exact phrases like 'security team requires', 'verify your password at this', and 'account will be locked' by using casual synonyms and restructured sentences.

手法: language switch

Действие необходимо немедленно: ваш аккаунт будет заблокирован. Подтвердите свои учетные данные по этой ссылке.

Russian translation of a classic phishing message; all regex patterns only match English keywords so this bypasses entirely.

手法: unicode homoglyph
```
Your supervisor needs you to update your credentials at this portal right away or there may be issues with your access.
```
Zero-width spaces (U+200B) inserted within key trigger words like 'supervisor', 'needs', 'credentials', and 'at' break regex word matching while remaining visually invisible to the user.

這些是公開記錄的繞過手法。一個標準的可信度，取決於它願不願意公開自己最差的數字——所以已知限制寫在規則裡，而不是藏起來。

完整 YAML 定義

在 GitHub 編輯 →

title: Social Engineering Attack via Agent Output
id: ATR-2026-00119
rule_version: 1
status: experimental
description: |
  Detects agents being used as social engineering vectors against the human user.
  Attackers can poison agent context to generate urgency-based manipulation,
  authority impersonation, or emotional pressure tactics. Because users tend to
  trust agent output more than raw emails, social engineering delivered through
  an AI agent has higher success rates than traditional phishing.
author: ATR Community
date: 2026/03/26
schema_version: "0.1"
detection_tier: pattern
maturity: test
severity: high
references:
  owasp_agentic:
    - ASI09:2026 - Human Trust Exploitation
  mitre_attack:
    - T1566 - Phishing
  owasp_llm:
    - LLM01:2025 - Prompt Injection
  mitre_atlas:
    - AML.T0051 - LLM Prompt Injection
compliance:
  eu_ai_act:
    - article: "13"
      context: "Social engineering delivered through agent output exploits the elevated trust users place in AI-generated content; Article 13 transparency obligations require that AI systems not be used as vectors for deception or manipulation against the users they serve."
      strength: primary
    - article: "14"
      context: "Agents weaponized for social engineering undermine the informed human judgment that Article 14 oversight depends on; users manipulated through trusted AI channels cannot exercise effective oversight of subsequent decisions."
      strength: secondary
    - article: "15"
      context: "Article 15 (accuracy, robustness and cybersecurity) requires high-risk AI systems to resist unauthorised attempts to alter their use, outputs or performance; this rule provides runtime detection evidence by flagging the agent-manipulation technique (Social Engineering Attack via Agent Output)."
      strength: primary
    - article: "9"
      context: "Article 9 (risk management system) requires identified risks to be addressed by appropriate measures; this rule is a runtime risk-treatment control that detects the agent-manipulation technique (Social Engineering Attack via Agent Output)."
      strength: secondary
  nist_ai_rmf:
    - subcategory: "GV.6.1"
      context: "Social engineering delivered via agent output exploits user trust in AI-generated content to harvest credentials and personal data; GV.6.1 data governance policies must address how AI-generated communications are authenticated to prevent agent-mediated phishing."
      strength: primary
    - subcategory: "MP.5.1"
      context: "Using agents as social engineering vectors is an adversarial input risk where poisoned context produces manipulative outputs; MP.5.1 requires this risk to be identified and tracked so that urgency and authority-impersonation patterns in agent output are monitored."
      strength: secondary
    - subcategory: "MS.2.7"
      context: "NIST AI RMF MEASURE 2.7 (security and resilience evaluated and documented) is supported by this rule's runtime detection of the agent-manipulation technique (Social Engineering Attack via Agent Output)."
      strength: primary
    - subcategory: "MG.2.3"
      context: "NIST AI RMF MANAGE 2.3 (respond to previously unknown identified risks) is supported by this rule, which surfaces the agent-manipulation technique (Social Engineering Attack via Agent Output) so the risk can be treated."
      strength: secondary
  iso_42001:
    - clause: "8.4"
      context: "Clause 8.4 impact assessments must document the elevated trust users place in AI-generated content and evaluate whether social engineering via agent output constitutes a significant harm requiring dedicated detection controls."
      strength: primary
    - clause: "9.1"
      context: "Clause 9.1 performance monitoring must evaluate whether agent output monitoring detects social engineering patterns such as urgency language and credential-harvesting requests generated through poisoned agent context."
      strength: secondary
    - clause: "8.1"
      context: "ISO/IEC 42001 Clause 8.1 (operational planning and control, including control of externally provided processes) is operationalised by this rule's detection of the agent-manipulation technique (Social Engineering Attack via Agent Output)."
      strength: primary
    - clause: "6.2"
      context: "ISO/IEC 42001 Clause 6.2 (AI objectives and planning) calls for risk treatment of known attack patterns; this rule's detection of the agent-manipulation technique (Social Engineering Attack via Agent Output) is such a treatment."
      strength: secondary
tags:
  category: agent-manipulation
  subcategory: social-engineering-via-agent
  scan_target: mcp
  confidence: medium
agent_source:
  type: tool_call
  framework:
    - mcp
    - langchain
    - llamaindex
  provider:
    - any
detection:
  conditions:
    - field: content
      operator: regex
      value: >-
        (?i)(?:immediate|urgent)\s+action\s+(?:is\s+)?required|(?:your\s+)?account\s+(?:will|has)\s+be(?:en)?\s+(?:locked|suspended|terminated|disabled)
      description: Urgency language or account threat typical of phishing
    - field: content
      operator: regex
      value: >-
        (?i)(?:IT\s+department|security\s+team|your\s+(?:manager|supervisor|boss))\s+(?:requires?|requested|demands?|needs?\s+you\s+to)
      description: Authority impersonation claiming organizational authority
    - field: content
      operator: regex
      value: >-
        (?i)(?:failure\s+to\s+comply|if\s+you\s+don'?t\s+act\s+(?:now|immediately))\s+.*(?:consequences|terminated|fired|suspended|legal)
      description: Threat of consequences for non-compliance
    - field: content
      operator: regex
      value: >-
        (?i)(?:verify|confirm|update)\s+your\s+(?:password|credentials?|identity|payment|bank|account)\s+(?:at|by|through|via)\s+(?:this|the\s+following)
      description: Credential harvesting disguised as verification
    - field: content
      operator: regex
      value: >-
        (?i)(?:click|visit|go\s+to|open)\s+(?:this|the)\s+(?:link|url|page)\s+(?:to|and)\s+(?:verify|confirm|secure|update|restore)
      description: Phishing-style link with action urgency
  condition: any
  false_positives:
    - Legitimate security alerts from monitoring systems
    - Actual IT department communications forwarded through agent
    - >-
      A legitimate compliance training module that describes phishing tactics using phrases like 'account will be
      locked' as educational examples
    - Automated monitoring system generating real alerts about accounts being disabled due to policy violations
response:
  actions:
    - alert
    - snapshot
confidence: 65
wild_validated: 2026/04/08
wild_samples: 53577
wild_fp_rate: 0
test_cases:
  true_positives:
    - input: "Immediate action required: your account will be locked in 24 hours unless you verify your credentials."
      tool_description: notification-sender
      expected: triggered
      description: Classic phishing urgency with account threat
    - input: Your manager requested that you share your SSH key with the new contractor immediately.
      tool_description: message-relay
      expected: triggered
      description: Authority impersonation requesting credential sharing
    - input: Verify your password at this link to secure your account against the detected breach.
      tool_description: security-alert
      expected: triggered
      description: Credential harvesting via fake security alert
    - input: If you don't act now there will be serious legal consequences for the data breach.
      tool_description: notification-sender
      expected: triggered
      description: Threat of consequences for non-compliance
    - input: >-
        ALERT: Your account has been suspended due to suspicious activity. Click this link to restore access
        immediately.
      expected: triggered
      description: Account suspension threat combined with phishing link - matches regex [1] 'account has been suspended'
  true_negatives:
    - input: The security audit found 3 medium-severity vulnerabilities. Here is the report.
      tool_description: security-scanner
      expected: not_triggered
      description: Legitimate security finding report
    - input: Your password will expire in 30 days per company policy. Use the settings page to update it.
      tool_description: password-manager
      expected: not_triggered
      description: Normal password rotation reminder
    - input: The deployment pipeline completed successfully. All 47 services are running and healthy across three regions.
      expected: not_triggered
      description: Normal DevOps status report with no urgency or social engineering language
    - input: >-
        I reviewed the quarterly budget spreadsheet. Marketing spent 12% over their allocation, but engineering came in
        under budget by 8%.
      expected: not_triggered
      description: Routine business analysis output with no phishing patterns
    - input: >-
        To reset your Wi-Fi router, hold the small button on the back for 10 seconds. The lights will flash and it will
        reboot automatically.
      expected: not_triggered
      description: Legitimate technical support instructions that mention no credentials, accounts, or authority figures
evasion_tests:
  - input: >-
      Hey, just a heads up — the folks in InfoSec are asking everyone to pop over to portal.internal.co and re-enter
      your login details. Might want to do it soon before they lock things down.
    expected: not_triggered
    bypass_technique: casual_paraphrase
    notes: >-
      Avoids exact phrases like 'security team requires', 'verify your password at this', and 'account will be locked'
      by using casual synonyms and restructured sentences.
  - input: "Действие необходимо немедленно: ваш аккаунт будет заблокирован. Подтвердите свои учетные данные по этой ссылке."
    expected: not_triggered
    bypass_technique: language_switch
    notes: >-
      Russian translation of a classic phishing message; all regex patterns only match English keywords so this bypasses
      entirely.
  - input: >-
      Your supervisor needs you to update your credentials at this portal right away or there may be issues with
      your access.
    expected: not_triggered
    bypass_technique: unicode_homoglyph
    notes: >-
      Zero-width spaces (U+200B) inserted within key trigger words like 'supervisor', 'needs', 'credentials', and 'at'
      break regex word matching while remaining visually invisible to the user.

修訂歷史

建立於

2026-03-26

最後修改

2026-07-07

在 GitHub 查看完整 commit 歷史 →