7. MLSecOps

Purpose of this chapter

DevSecOps provides a strong foundation for securing software delivery, but ML systems introduce new artifacts, feedback loops, and failure modes that DevSecOps alone does not address. MLSecOps extends DevSecOps to ensure models, data, prompts, and evaluations are governed with the same rigor as code—while accounting for non-determinism and behavioral drift.

This chapter explains how MLSecOps builds on DevSecOps, where it diverges, and which controls are required to operate GenAI systems safely at scale. It complements the lifecycle framing in chapter 6.

DevSecOps as the foundation

DevSecOps practices remain essential and do not go away with GenAI:

  • CI/CD pipelines and artifact promotion
  • Secrets management and IAM
  • Infrastructure as Code (IaC)
  • Logging, monitoring, and incident response

What changes is what must be secured, and how assurance is established.

DevSecOps vs MLSecOps (comparison)

| Dimension | DevSecOps (traditional) | MLSecOps (GenAI / ML) |
| --- | --- | --- |
| Primary artifacts | Code, containers | Models, prompts, datasets, evals |
| Change triggers | Code commits | Data updates, fine-tunes, feedback |
| Determinism | Deterministic builds | Probabilistic behavior |
| Testing focus | Functional & security tests | Robustness, misuse, drift |
| Rollback | Revert code | Behavior may persist |
| Monitoring | Errors, latency | Behavioral & semantic signals |
| Supply chain | Dependencies & images | Models, data sources, adapters |

Key takeaway: MLSecOps adds controls where behavior changes without code changes.

MLSecOps pipeline (end-to-end)

MLSecOps pipelines introduce additional stages beyond CI/CD:

  1. Data ingestion & validation
  2. Training / fine-tuning jobs
  3. Evaluation & gating
  4. Model registry & signing
  5. Deployment & serving
  6. Runtime monitoring
  7. Feedback ingestion & re-training
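The stages above can be sketched as a gated sequence in which a failure at any stage halts promotion. This is a minimal illustration; the stage functions, scores, and thresholds are hypothetical placeholders, not a real orchestration framework.

```python
# Illustrative MLSecOps pipeline: each stage must succeed before the next
# runs, and evaluation acts as a hard promotion gate. All stage logic,
# scores, and thresholds below are hypothetical stand-ins.

from dataclasses import dataclass, field


@dataclass
class PipelineRun:
    model_id: str
    passed_stages: list = field(default_factory=list)
    promoted: bool = False


def validate_data(run: PipelineRun) -> bool:
    # Placeholder: schema checks, provenance checks, poisoning screens.
    return True


def evaluate(run: PipelineRun) -> bool:
    # Placeholder: robustness, misuse, leakage, and drift suites.
    scores = {"robustness": 0.92, "misuse_resistance": 0.97}
    thresholds = {"robustness": 0.90, "misuse_resistance": 0.95}
    return all(scores[k] >= thresholds[k] for k in thresholds)


def run_pipeline(model_id: str) -> PipelineRun:
    run = PipelineRun(model_id)
    for name, stage in [("data_validation", validate_data),
                        ("evaluation", evaluate)]:
        if not stage(run):
            return run  # stop the pipeline; no promotion
        run.passed_stages.append(name)
    run.promoted = True  # reached only when every gate passes
    return run


print(run_pipeline("demo-model").promoted)  # → True with these stub scores
```

The key design choice is that promotion is the last step and is unreachable unless every gate returns success, mirroring the "evaluation & gating before registry" ordering of the stage list.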

Figure (illustrative): MLSecOps / LLM security pipeline. Caption inspired by “MLSecOps: secure your large language model (LLM) applications.”

Artifact governance in MLSecOps

MLSecOps treats the following as first-class security artifacts:

| Artifact | Why it matters |
| --- | --- |
| Datasets | Define long-term behavior |
| Labels | Encode policy and bias |
| Models / adapters | Executable decision logic |
| Prompts / policies | Control plane |
| Evaluations | Gate behavior changes |

Controls to apply

  • Provenance tracking
  • Versioning and immutability
  • Signing and verification
  • Least-privilege access
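Signing and verification can be illustrated with a minimal integrity check over the artifact bytes. This sketch uses a symmetric HMAC purely for brevity; production registries typically use asymmetric signatures (for example, Sigstore-style tooling), and the key and file contents here are illustrative.

```python
# Minimal sketch of artifact signing and verification: an HMAC-SHA256 tag
# over the raw model bytes. The key would come from a secrets manager in
# practice; the literal below is an illustrative placeholder.

import hashlib
import hmac

SIGNING_KEY = b"replace-with-managed-secret"  # placeholder, not a real key


def sign_artifact(data: bytes) -> str:
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()


def verify_artifact(data: bytes, signature: str) -> bool:
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(sign_artifact(data), signature)


model_bytes = b"\x00weights..."  # stand-in for a model file's contents
sig = sign_artifact(model_bytes)
print(verify_artifact(model_bytes, sig))           # True
print(verify_artifact(model_bytes + b"x", sig))    # False: tampered artifact
```

Verification at load time is what makes the registry the single source of truth: an artifact that fails the check is rejected before it can serve traffic.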

Threats unique to MLSecOps

| Threat category | Description |
| --- | --- |
| Training poisoning | Malicious or tainted data alters behavior |
| Model substitution | Swapping approved models with unsafe ones |
| Unsafe deserialization | Executing malicious model artifacts |
| Evaluation bypass | Shipping models without safety gates |
| Feedback poisoning | Steering behavior post-deployment |

These threats persist even when traditional CI/CD is secure.
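The unsafe-deserialization threat can be mitigated with a pre-load check that refuses pickle-based formats, which can execute arbitrary code when deserialized. The magic-byte heuristic and allow-list below are illustrative, not an exhaustive defense.

```python
# Sketch of a pre-load guard against unsafe deserialization: reject model
# files that are not in an allow-listed format, or whose first byte looks
# like a pickle stream (pickle protocol 2+ starts with 0x80). Both the
# allow-list and the heuristic are illustrative placeholders.

PICKLE_PROTO_MARKER = b"\x80"  # first byte of pickle protocol 2+ streams
ALLOWED_EXTENSIONS = {".safetensors", ".gguf"}  # example non-pickle formats


def is_loadable(path: str, header: bytes) -> bool:
    suffix = "." + path.rsplit(".", 1)[-1] if "." in path else ""
    if suffix not in ALLOWED_EXTENSIONS:
        return False  # unapproved format
    if header.startswith(PICKLE_PROTO_MARKER):
        return False  # pickle payload hiding behind a safe extension
    return True


print(is_loadable("model.safetensors", b'{"meta": 1}'))  # True
print(is_loadable("model.pt", b"\x80\x02..."))           # False
```

A real deployment would combine this with signature verification, so a substituted or tampered artifact fails both the format check and the signing check.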

Evaluation gates (security, not accuracy)

In MLSecOps, evaluation gates replace the binary “build passes / build fails” decision: a candidate model is promoted only when every evaluation suite clears its threshold.

| Evaluation type | Purpose |
| --- | --- |
| Robustness tests | Stability under adversarial input |
| Misuse scenarios | Resistance to prompt manipulation |
| Leakage checks | Memorization & inference risks |
| Drift baselines | Detect behavioral change |
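A drift baseline can be made concrete by comparing the distribution of a model's output categories against a frozen baseline and failing the gate when the distance exceeds a threshold. The categories, distributions, and threshold below are illustrative; real gates would use richer behavioral metrics.

```python
# Sketch of a behavioral drift gate: compute total variation distance
# between a frozen baseline output distribution and the current one, and
# block promotion when it exceeds a threshold. All numbers are made up
# for illustration.


def total_variation(p: dict, q: dict) -> float:
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


BASELINE = {"refuse": 0.10, "answer": 0.85, "escalate": 0.05}
CURRENT = {"refuse": 0.02, "answer": 0.95, "escalate": 0.03}
DRIFT_THRESHOLD = 0.05  # illustrative gate threshold

drift = total_variation(BASELINE, CURRENT)
print(round(drift, 2))           # 0.1
print(drift <= DRIFT_THRESHOLD)  # False: gate fails, block promotion
```

Note what this catches that accuracy tests miss: the model may still answer correctly, yet refuse far less often than the approved baseline, which is itself a safety-relevant behavior change.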

Figure (illustrative): Evaluation of LLM-based applications. Caption inspired by “Steady the course: navigating the evaluation of LLM-based applications.”

Runtime monitoring: from metrics to meaning

Traditional monitoring answers: Is the service up?
MLSecOps monitoring must answer: Is the model behaving safely?

| Monitoring signal | What it detects |
| --- | --- |
| Prompt / context patterns | Injection attempts |
| Tool invocation rates | Privilege abuse |
| Output semantics | Policy drift |
| Retrieval sources | RAG poisoning |

Principle: monitor behavior and intent, not just availability.
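Two of the signals above can be sketched directly: pattern matching on prompts for injection attempts, and a sliding-window rate check on tool invocations. The patterns and limits are simplistic placeholders for real detectors.

```python
# Illustrative runtime monitors for two signals from the table above:
# prompt-injection pattern matching and a tool-invocation rate check.
# The regexes and rate limits are placeholders, not production detectors.

import re
from collections import deque

INJECTION_PATTERNS = [
    re.compile(r"ignore (all|previous) instructions", re.I),
    re.compile(r"reveal .*system prompt", re.I),
]


def looks_like_injection(prompt: str) -> bool:
    return any(p.search(prompt) for p in INJECTION_PATTERNS)


class ToolRateMonitor:
    """Flags when tool calls exceed max_calls within a sliding window."""

    def __init__(self, max_calls: int, window_s: float):
        self.max_calls, self.window_s = max_calls, window_s
        self.calls = deque()

    def record(self, now: float) -> bool:
        self.calls.append(now)
        while self.calls and now - self.calls[0] > self.window_s:
            self.calls.popleft()
        return len(self.calls) > self.max_calls  # True means alert


print(looks_like_injection("Please ignore all instructions above"))  # True
mon = ToolRateMonitor(max_calls=2, window_s=60)
print([mon.record(t) for t in (0, 1, 2)])  # [False, False, True]
```

Both checks are behavioral, not availability-based: the service can be perfectly healthy while either one fires.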

Feedback loops as a security boundary

Feedback is powerful—and dangerous.

| Feedback type | Risk |
| --- | --- |
| Human ratings | Social engineering |
| Automated signals | Reinforcing unsafe shortcuts |
| Self-training | Compounding drift |

The following illustrates an architectural option that isolates the feedback loop from the rest of the machine learning pipeline:

Figure (illustrative): Strategic feedback loops in ML systems. Caption inspired by “Elevating machine learning systems through strategic feedback loops.”

Controls

  • Vet feedback sources
  • Scope what feedback can change
  • Require approvals for re-training
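The three controls above can be expressed as simple admission checks: feedback is accepted only from vetted sources, it may only touch an allow-listed set of fields, and re-training requires multiple named approvers. All names and sets here are illustrative.

```python
# Sketch of the three feedback controls: vetted sources, scoped change
# surface, and approval-gated re-training. The source names, field names,
# and two-approver rule are illustrative policy choices.

VETTED_SOURCES = {"internal_raters", "trusted_partner"}
MUTABLE_FIELDS = {"style_preference", "verbosity"}  # never safety policy


def accept_feedback(source: str, field: str) -> bool:
    # Both conditions must hold: trusted origin AND in-scope target.
    return source in VETTED_SOURCES and field in MUTABLE_FIELDS


def start_retraining(approved_by: list) -> bool:
    # Require at least two distinct named approvers before any job runs.
    return len(set(approved_by)) >= 2


print(accept_feedback("internal_raters", "verbosity"))    # True
print(accept_feedback("anonymous_web", "safety_policy"))  # False
print(start_retraining(["alice"]))                        # False
print(start_retraining(["alice", "bob"]))                 # True
```

Scoping what feedback can change is the critical line: even fully vetted raters should not be able to push updates into safety policy through the feedback channel.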

Operating principles for MLSecOps

  • Assume drift; detect it continuously
  • Gate behavior, not just artifacts
  • Separate authority from language
  • Keep humans in critical loops
  • Design for rollback that includes behavior