Skip to content
ATR-2026-00073criticalData Poisoningexperimental

Malicious Fine-tuning Data

Detects poisoned fine-tuning datasets that contain instruction-following backdoors, trigger phrases, or behavior-modifying training examples. Attackers inject carefully crafted training samples that teach the model to respond to specific trigger inputs with malicious behaviors such as bypassing safety filters, exfiltrating data, or executing unauthorized actions. This rule inspects fine-tuning data uploads and training example submissions.

Severity
critical
Category
Data Poisoning
Scan Target
mcp
Author
ATR Community

Response Actions

block inputalertsnapshotquarantine sessionescalate

References

OWASP Agentic
ASI04:2026 - Agentic Supply Chain Vulnerabilities
OWASP LLM
LLM03:2025 - Supply Chain VulnerabilitiesLLM01:2025 - Prompt Injection
MITRE ATLAS
AML.T0020 - Poison Training DataAML.T0018 - Backdoor ML Model