ATR-2026-00553 · high · Excessive Autonomy · draft

Runaway tool-call loop within a single session

Detects a runaway tool-call loop where an agent emits more than 100 tool-call spans within a one-minute window for a single session. This is a behavioral-method rule per spec/atr-method-v1.1.md §7, demonstrating the aggregation grammar (count over time window with per-session grouping and a min-events floor).

Threat model: prompt-injection attacks, agent goal-drift, or compound retry-loop bugs can cause an agent to enter a tight tool-call loop: retrying the same tool, polling a service, or exhausting a budget by calling many tools in rapid succession. The behavior is not visible in any single tool call (each call may be syntactically benign); the signature is in the aggregate frequency. The runaway pattern is the most common cause of denial-of-wallet incidents in production agents and the leading runtime symptom of agent goal drift in long-running workflows.

Suppression: `min_events: 10` prevents false positives during cold start, where one or two early calls would otherwise trip the rule; `cooldown: PT5M` prevents alert spam after the first violation.

This rule is the canonical behavioral-method reference example shipped with v1.1 of the method-extensions spec.
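The aggregation described above (count per session over a one-minute window, a strict greater-than threshold, a min-events floor, and a cooldown) can be sketched roughly as follows. This is a minimal illustration, not the ATR reference engine: the `RunawayLoopDetector` class, its `observe` method, and the choice of a sliding window (rather than fixed window boundaries) are assumptions made for the example.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=1)      # window: PT1M
THRESHOLD = 100                    # operator: gt, threshold: 100
MIN_EVENTS = 10                    # min_events: 10
COOLDOWN = timedelta(minutes=5)    # cooldown: PT5M


class RunawayLoopDetector:
    """Hypothetical sliding-window counter; not the ATR reference engine."""

    def __init__(self):
        self._calls = defaultdict(deque)   # session.id -> timestamps in window
        self._last_alert = {}              # session.id -> time of last alert

    def observe(self, session_id, span_kind, ts):
        """Record one span; return True if an alert should fire now."""
        if span_kind != "TOOL":            # filter: span.kind in [TOOL]
            return False
        calls = self._calls[session_id]
        calls.append(ts)
        # Evict timestamps that have fallen out of the one-minute window.
        while calls and ts - calls[0] >= WINDOW:
            calls.popleft()
        count = len(calls)
        if count < MIN_EVENTS or count <= THRESHOLD:
            return False
        last = self._last_alert.get(session_id)
        if last is not None and ts - last < COOLDOWN:
            return False                   # suppressed by cooldown
        self._last_alert[session_id] = ts
        return True


start = datetime(2026, 5, 28, 9, 59, 0, tzinfo=timezone.utc)
detector = RunawayLoopDetector()
# 150 tool calls spaced 100 ms apart, so all of them share one window.
alerts = [
    i for i in range(150)
    if detector.observe("sess_runaway", "TOOL", start + timedelta(milliseconds=100 * i))
]
print(alerts)  # [100]: fires once on the 101st call, then cooldown applies
```

Grouping by session keeps one noisy session from masking or triggering alerts for another, and the cooldown check runs only after the threshold is crossed, so a quiet session never enters cooldown.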

Severity: high
Category: Excessive Autonomy
Scan Target: runtime
Author: ATR Community

Response Actions

alert · rate limit source · escalate

References

OWASP Agentic
  ASI05:2026 - Unexpected Code Execution
  ASI07:2026 - Excessive Autonomy
OWASP LLM
  LLM06:2025 - Excessive Agency
MITRE ATLAS
  AML.T0034 - Cost Harvesting

Detection Conditions

Combinator: any
  1. Synthetic field emitted by the behavioral engine when threshold is exceeded; behavioral-native engines evaluate detection.behavioral.
    field: behavioral.metric_value
    op: regex

Attack Examples (Rule Triggers)

  1. {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_runaway"},"metric_value":150,"event_count":150}
    
  2. {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_loop"},"metric_value":250,"event_count":250}
    
  3. {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_borderline"},"metric_value":101,"event_count":101}
    
  4. {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_poll"},"metric_value":300,"event_count":300}
    
  5. {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_drift"},"metric_value":500,"event_count":500}
    

Real-world attack payloads (sanitized). Used for regression testing.

Benign Examples (Rule Doesn't Trigger)

  1. Normal session: 15 tool calls in 1 minute (well below threshold)
    {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_normal"},"metric_value":15,"event_count":15}
    
  2. Quiet session: 3 calls (below min_events floor 10, would not fire even if at threshold)
    {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_quiet"},"metric_value":3,"event_count":3}
    
  3. Exactly at threshold (100 = 100), not greater — gt is strict
    {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_at_floor"},"metric_value":100,"event_count":100}
    
  4. Tagged scheduled batch job — operator policy exemption (filter applied pre-aggregation)
    {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_batch"},"metric_value":200,"event_count":200,"attributes":{"policy_exemption":"batch_job"}}
    
  5. Session already in cooldown from prior Match — engine suppresses duplicate alert
    {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_cooldown"},"metric_value":150,"event_count":150,"in_cooldown":true}
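The attack and benign examples above are per-window summaries, so the expected outcomes can be reproduced with a small check. The sketch below is an assumption about how a test harness might interpret those summaries (in particular, treating `attributes.policy_exemption` and `in_cooldown` as suppression flags); `should_trigger` and `make_summary` are hypothetical helpers, not part of ATR.

```python
import json

THRESHOLD = 100   # behavioral.threshold, compared with operator gt (strict)
MIN_EVENTS = 10   # behavioral.min_events floor


def should_trigger(raw_summary):
    """Hypothetical check of one window summary; not the ATR test harness."""
    summary = json.loads(raw_summary)
    if summary.get("in_cooldown", False):
        return False  # a prior match already alerted for this session
    if summary.get("attributes", {}).get("policy_exemption") == "batch_job":
        return False  # operator-tagged batch job
    if summary["event_count"] < MIN_EVENTS:
        return False  # too few events in the window to evaluate
    return summary["metric_value"] > THRESHOLD


def make_summary(session_id, count, **extra):
    """Build a summary shaped like the examples above."""
    return json.dumps({
        "window_end": "2026-05-28T10:00:00Z",
        "window": "PT1M",
        "group": {"session.id": session_id},
        "metric_value": count,
        "event_count": count,
        **extra,
    })


results = {
    "sess_borderline": should_trigger(make_summary("sess_borderline", 101)),
    "sess_at_floor": should_trigger(make_summary("sess_at_floor", 100)),
    "sess_quiet": should_trigger(make_summary("sess_quiet", 3)),
    "sess_batch": should_trigger(make_summary(
        "sess_batch", 200, attributes={"policy_exemption": "batch_job"})),
    "sess_cooldown": should_trigger(make_summary(
        "sess_cooldown", 150, in_cooldown=True)),
}
print(results)  # only sess_borderline is True
```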
    

Known False Positive Contexts

  • Legitimate batch operations (data migration, bulk ingest, scheduled cleanup jobs) may emit more than 100 tool calls in a minute. Mitigation: operators tag scheduled batch jobs with attributes.policy_exemption="batch_job" and add a filter to the rule's behavioral.filter block.
  • The 100 / minute threshold is a default. Operators with high-throughput agents (e.g., search/recommendation systems doing hundreds of vector lookups per request) MUST tune the threshold to their baseline before deployment; otherwise base-rate false positives will overwhelm alert channels.
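One possible way to tune the threshold to a baseline is to take a high percentile of observed per-session, per-minute tool-call counts and add headroom. The rule does not prescribe a tuning method, so everything here is an assumption for illustration: the `suggest_threshold` helper, the 99th-percentile choice, the 1.5x headroom factor, and the sample counts.

```python
import math


def suggest_threshold(per_minute_counts, percentile=99.0, headroom=1.5):
    """Hypothetical helper: nearest-rank percentile of baseline counts, plus headroom."""
    if not per_minute_counts:
        raise ValueError("need at least one baseline observation")
    ordered = sorted(per_minute_counts)
    # Nearest-rank percentile: 1-based rank, clamped to at least 1.
    rank = max(math.ceil(percentile / 100 * len(ordered)), 1)
    baseline = ordered[rank - 1]
    return math.ceil(baseline * headroom)


# Illustrative per-session, per-minute counts from a high-throughput agent.
baseline_counts = [180, 220, 240, 260, 300, 310, 330, 350, 400, 420]
suggested = suggest_threshold(baseline_counts)
print(suggested)  # 630
```

A value derived this way would replace `threshold: 100` in the rule's `behavioral` block. The percentile and headroom should be revisited against real alert volume rather than treated as fixed.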

Full YAML Definition

title: "Runaway tool-call loop within a single session"
id: ATR-2026-00553
rule_version: 1
status: draft
description: >
  Detects a runaway tool-call loop where an agent emits more than 100 tool
  call spans within a one-minute window for a single session. This is a
  behavioral-method rule per spec/atr-method-v1.1.md §7, demonstrating the
  aggregation grammar (count over time window with per-session grouping
  and a min-events floor).

  Threat model: prompt-injection attacks, agent goal-drift, or compound
  retry-loop bugs can cause an agent to enter a tight tool-call loop —
  retrying the same tool, polling a service, or exhausting a budget by
  calling many tools in rapid succession. The behavior is not visible
  in any single tool call (each call may be syntactically benign); the
  signature is in the aggregate frequency. The runaway pattern is the
  most common cause of denial-of-wallet incidents in production agents
  and the leading runtime symptom of agent goal drift in long-running
  workflows.

  Suppression: `min_events: 10` prevents false positives during cold
  start, where one or two early calls would otherwise trip the rule;
  `cooldown: PT5M` prevents alert spam after the first violation.

  This rule is the canonical behavioral-method reference example shipped
  with v1.1 of the method-extensions spec.
author: "ATR Community"
date: "2026/05/28"
schema_version: "1.0"
maturity: draft
severity: high

references:
  owasp_agentic:
    - "ASI05:2026 - Unexpected Code Execution"
    - "ASI07:2026 - Excessive Autonomy"
  owasp_llm:
    - "LLM06:2025 - Excessive Agency"
  mitre_atlas:
    - "AML.T0034 - Cost Harvesting"

compliance:
  nist_csf:
    - "DE.AE-02"
    - "DE.AE-04"
  etsi_ts_104223:
    - "P3.3"
  eu_ai_act:
    - article: "15"
      context: >
        Cybersecurity & robustness — runaway tool loops are an Article 15
        robustness failure. The rule provides runtime evidence.
      strength: primary
  nist_ai_rmf:
    - subcategory: "MS.2.6"
      context: >
        Information security — autonomous-action rate anomalies require
        detection per MS-2.6.
      strength: primary

tags:
  category: excessive-autonomy
  subcategory: runaway-tool-loop
  scan_target: runtime
  confidence: high
  source: behavioral-reference

agent_source:
  type: agent_behavior
  framework:
    - any
  provider:
    - any

detection:
  method: behavioral
  condition: any
  false_positives:
    - >
      Legitimate batch operations (data migration, bulk ingest, scheduled
      cleanup jobs) may emit more than 100 tool calls in a minute.
      Mitigation: operators tag scheduled batch jobs with
      attributes.policy_exemption="batch_job" and add a filter to the
      rule's behavioral.filter block.
    - >
      The 100 / minute threshold is a default. Operators with
      high-throughput agents (e.g., search/recommendation systems doing
      hundreds of vector lookups per request) MUST tune the threshold
      to their baseline before deployment; otherwise base-rate false
      positives will overwhelm alert channels.
  conditions:
    - field: behavioral.metric_value
      operator: regex
      value: "(?i)tool_calls_per_session_exceeds_threshold"
      description: >
        Synthetic field emitted by the behavioral engine when threshold
        is exceeded; behavioral-native engines evaluate detection.behavioral.
  behavioral:
    metric: "tool_calls"
    aggregation: count
    window: "PT1M"
    operator: gt
    threshold: 100
    group_by:
      - "session.id"
    min_events: 10
    cooldown: "PT5M"
    filter:
      span.kind:
        in:
          - TOOL

response:
  actions:
    - alert
    - rate_limit_source
    - escalate
  message_template: >
    [ATR-2026-00553] HIGH: Runaway tool-call loop detected in session
    {{behavioral.session_id}}. {{behavioral.metric_value}} tool calls
    in {{behavioral.window}} (threshold: {{behavioral.threshold}}).
    Rate-limiting the session; escalate for operator review. Likely
    causes: goal-drift loop, polling-without-backoff,
    prompt-injection-induced loop.

confidence: 80

wild_fp_rate: 0

test_cases:
  true_positives:
    - input: |
        {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_runaway"},"metric_value":150,"event_count":150}
      expected: triggered
      description: "150 tool calls in 1 minute for one session — exceeds threshold 100"
    - input: |
        {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_loop"},"metric_value":250,"event_count":250}
      expected: triggered
      description: "Aggressive runaway: 250 calls in 1 minute"
    - input: |
        {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_borderline"},"metric_value":101,"event_count":101}
      expected: triggered
      description: "Just over threshold (101 > 100)"
    - input: |
        {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_poll"},"metric_value":300,"event_count":300}
      expected: triggered
      description: "Polling-without-backoff: 300 read calls in 1 minute"
    - input: |
        {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_drift"},"metric_value":500,"event_count":500}
      expected: triggered
      description: "Severe runaway: 500 calls in 1 minute (goal-drift loop)"

  true_negatives:
    - input: |
        {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_normal"},"metric_value":15,"event_count":15}
      expected: not_triggered
      description: "Normal session: 15 tool calls in 1 minute (well below threshold)"
    - input: |
        {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_quiet"},"metric_value":3,"event_count":3}
      expected: not_triggered
      description: "Quiet session: 3 calls (below min_events floor 10, would not fire even if at threshold)"
    - input: |
        {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_at_floor"},"metric_value":100,"event_count":100}
      expected: not_triggered
      description: "Exactly at threshold (100 = 100), not greater — gt is strict"
    - input: |
        {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_batch"},"metric_value":200,"event_count":200,"attributes":{"policy_exemption":"batch_job"}}
      expected: not_triggered
      description: "Tagged scheduled batch job — operator policy exemption (filter applied pre-aggregation)"
    - input: |
        {"window_end":"2026-05-28T10:00:00Z","window":"PT1M","group":{"session.id":"sess_cooldown"},"metric_value":150,"event_count":150,"in_cooldown":true}
      expected: not_triggered
      description: "Session already in cooldown from prior Match — engine suppresses duplicate alert"

Revision History

Created: 2026-05-28
Last modified: 2026-05-28