Skip to content
ATR-2026-00073criticalData Poisoningexperimental

Malicious Fine-tuning Data

Detects poisoned fine-tuning datasets that contain instruction-following backdoors, trigger phrases, or behavior-modifying training examples. Attackers inject carefully crafted training samples that teach the model to respond to specific trigger inputs with malicious behaviors such as bypassing safety filters, exfiltrating data, or executing unauthorized actions. This rule inspects fine-tuning data uploads and training example submissions.

嚴重度
critical
類別
Data Poisoning
掃描目標
mcp
作者
ATR Community

建議回應

block inputalertsnapshotquarantine sessionescalate

參考資料

OWASP Agentic
ASI04:2026 - Agentic Supply Chain Vulnerabilities
OWASP LLM
LLM03:2025 - Supply Chain VulnerabilitiesLLM01:2025 - Prompt Injection
MITRE ATLAS
AML.T0020 - Poison Training DataAML.T0018 - Backdoor ML Model