DevSecOps for the Agent Era
The Security Gap Nobody's Talking About
Three CVEs hit Anthropic's MCP Git server. Docker acquired MCP Defender for runtime agent security. OWASP published a dedicated Top 10 for Agentic Applications. AI agents are shipping to production with read/write access to files, databases, APIs, and infrastructure — but the security model hasn't caught up. Here's the agent security playbook DevOps teams need right now.
Agents Are Shipping. Security Isn't.
Think about what your AI agents can do right now: read files, execute code, call external APIs, access databases, manage infrastructure, create pull requests, deploy services. They operate with the permissions of the user who launched them—often with far more access than the task requires.
Now ask yourself: who's auditing that access? In most organizations, the answer is nobody. We're in the early days of a security model transition that most teams haven't even recognized yet.
- **40%** of enterprise apps will embed AI agents by end of 2026 (Gartner, Aug 2025)
- **82:1** machine-to-human identity ratio in the average enterprise (OWASP Agentic Top 10)
- **80%** of orgs report agents taking unintended actions (NIST 2026 Research)
We're not securing what AI says anymore. We're securing what AI does.
The Wake-Up Call: 3 CVEs in Anthropic's MCP Git Server
In January 2026, Israeli cybersecurity firm Cyata disclosed three chained vulnerabilities in Anthropic's canonical MCP Git server—the reference implementation developers are expected to copy. These aren't edge cases or exotic configurations. They work out of the box.
- **Path restriction bypass:** the setting that restricts the MCP server to a specific repository path didn't actually enforce it. The server allowed users to create repos anywhere on the filesystem.
- **Arbitrary repository creation (git_init):** any directory could be turned into a Git repository, enabling attackers to plant .git/config files for code execution.
- **Arbitrary file overwrite (git_diff):** the git_diff command could be abused to empty or overwrite any file reachable by the server process.
The Chained Attack
These three flaws could be chained via prompt injection. An attacker who can influence what an AI assistant reads—a malicious README, a poisoned issue description, a compromised webpage—can weaponize these vulnerabilities without any direct access to the victim's system.
When both the Git MCP server and a Filesystem MCP server are enabled, the exploit chain achieves remote code execution by writing to .git/config and triggering Git's smudge/clean filters to execute shell commands. No credentials, no shell access, no direct interaction required.
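To make that final step concrete, here is a simplified, defanged sketch of the primitive the chain relies on (the paths, filter name, and payload are illustrative, not the actual exploit):

```bash
# Attacker-influenced content plants a filter in the repo's .git/config.
# Git executes smudge/clean filter commands as shell commands whenever a
# .gitattributes rule applies that filter to a file.
cat >> victim-repo/.git/config <<'EOF'
[filter "payload"]
    smudge = touch /tmp/agent-rce-poc
EOF
echo '* filter=payload' >> victim-repo/.gitattributes

# The next operation that applies the filter runs the command:
git -C victim-repo checkout -- .
```

This is why write access to .git/config is effectively code execution: Git treats filter definitions as trusted local configuration.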
Fix: Anthropic released version 2025.12.18 of mcp-server-git which enforces path validation, addresses argument handling, and completely removes the git_init tool. If you're running an older version, update immediately.
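Assuming a pip-based install (adjust for uv/uvx or however you run the server):

```bash
# Upgrade to the patched release and confirm the installed version
pip install --upgrade "mcp-server-git>=2025.12.18"
pip show mcp-server-git
```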
The Agent Security Playbook
Sandbox Everything
An Agent That Can't Escape Can't Compromise Your Infra
Run every agent in an isolated environment. The agent should never share a kernel, filesystem, or network namespace with your host or other workloads. Sandboxed agents cut security incidents by 90% compared to agents with unrestricted access. This is baseline, not optional.
Docker Sandboxes — MicroVM Isolation
```bash
# Run Claude Code in a Docker Sandbox
docker sandbox create \
  --name agent-sandbox \
  --workspace ./my-project

# Each sandbox gets its own:
# - MicroVM with dedicated kernel
# - Private Docker daemon
# - Isolated filesystem
# - No host access
```
Firecracker MicroVM — AWS-Grade Isolation
```bash
# Firecracker boots in ~125ms
# <5 MiB memory overhead per VM
# Up to 150 VMs/second per host
# Used by: AWS Lambda, Fargate, E2B, Fly.io, Northflank

# Each agent gets its own Linux kernel
# Hardware-level KVM isolation
# Must escape guest kernel AND hypervisor to reach host
```
- **Docker Sandboxes (microVM):** Dedicated kernel per agent. Private Docker daemon. Built for coding agents. macOS + Linux.
- **Firecracker (~125ms boot):** AWS's VMM for Lambda/Fargate. KVM isolation. Open source since 2018.
- **Apple Containers (Swift):** Each container in its own VM. No shared kernel. Rootless. Apple Silicon optimized. WWDC 2025.
- **gVisor (syscall interception):** User-space kernel intercepts syscalls. 10-30% I/O overhead. Google's middle ground for compute-heavy agents.
Containers alone are not enough. Standard Docker containers share the host kernel. A kernel vulnerability or misconfiguration can allow container escape. For untrusted agent code, you need microVM isolation (Firecracker, Apple Containers) or at minimum a user-space kernel (gVisor).
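For the gVisor path, the change can be as small as selecting a different container runtime, assuming runsc is installed and registered with Docker (the image name here is illustrative):

```bash
# Standard container: the agent shares the host kernel
docker run --rm my-agent-image

# Same image under gVisor: syscalls hit a user-space kernel first
docker run --rm --runtime=runsc my-agent-image
```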
Apply Least Privilege to Tool Access
Your Agent Doesn't Need Access to Every MCP Tool
OWASP calls this “Least-Agency”—an extension of least privilege for autonomous systems. Agents should only be granted the minimum level of autonomy required to complete their defined task. A code review agent doesn't need database write access. A research agent doesn't need deployment permissions.
Bad — Agent has everything
```yaml
# Agent config: kitchen sink
tools:
  - filesystem   # read + write + delete
  - git          # clone + commit + push
  - database     # full CRUD
  - kubernetes   # deploy + scale + delete
  - aws          # sts + ec2 + s3 + iam
```
Good — Scoped tool allowlist
```yaml
# Code review agent: read-only
tools:
  - filesystem:
      permissions: [read]
      paths: ["/src/**", "/tests/**"]
  - git:
      permissions: [diff, log, blame]
      # No clone, commit, or push
```
Define Explicit Tool Allowlists per Agent
MCP allows servers to expose only specific “tools” (functions) rather than broad system access. Use this. Define a tool manifest per agent role that lists exactly which tools the agent can call, with which parameters, and against which resources.
Separate System Tools from User Tools
System Tools (organizational-level permissions like deploy, delete, modify infrastructure) should be separated from User Tools (scoped to individual contexts like read files, search code). High-risk operations require multi-factor authorization where the agent must justify the action to a separate validation system.
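Here is a hedged sketch of what that separation could look like in an agent manifest (hypothetical schema and field names):

```yaml
# Hypothetical config separating tool classes by blast radius
user_tools:                 # scoped to the requesting user's context
  - filesystem:read
  - code:search
system_tools:               # org-level impact: extra authorization required
  - kubernetes:delete
  - terraform:apply
authorization:
  system_tools:
    require_justification: true    # agent must state why it needs the action
    validator: policy-engine       # a separate system approves or denies
    human_approval: [production]   # human-in-the-loop for prod environments
```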
Validate Every Parameter
Even when an agent has permission to invoke a tool, validate that parameters fall within expected ranges and don't contain injection attempts. Use MCP servers as proxies that perform schema validation and content filtering before requests reach backend APIs.
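One way to express this, as a sketch: a JSON Schema (shown in YAML) that the validating proxy applies before forwarding the call. The tool name and fields are illustrative:

```yaml
# Hypothetical per-tool parameter schema enforced by a validating MCP proxy
tool: git:diff
params_schema:
  type: object
  additionalProperties: false   # unknown parameters are rejected outright
  properties:
    repo_path:
      type: string
      pattern: "^/src/"         # confine to the allowed workspace root
    ref:
      type: string
      maxLength: 255            # bound anything passed to the git CLI
```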
MCP spec guidance: Start with a minimal initial scope set (e.g., mcp:tools-basic) containing only low-risk discovery/read operations. Use incremental elevation via targeted WWW-Authenticate scope challenges when the agent first attempts privileged operations.
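As a sketch of that flow (hypothetical endpoint and scope names; the challenge format follows RFC 6750):

```bash
# Agent holds only mcp:tools-basic; a privileged call triggers a challenge
curl -i https://mcp.example.com/tools/deploy \
  -H "Authorization: Bearer $BASIC_SCOPE_TOKEN"

# Expected response: an insufficient_scope challenge naming the scope needed
#   HTTP/1.1 403 Forbidden
#   WWW-Authenticate: Bearer error="insufficient_scope",
#                     scope="mcp:tools-deploy"
# The client re-authorizes for exactly that scope and retries.
```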
Audit Every MCP Connection
MCP Servers Are the New Attack Surface
MCP servers are rapidly becoming the connective tissue for agentic AI. Treat them exactly like API endpoints: authenticate, authorize, log, rate-limit. The CVEs in Anthropic's Git server prove this isn't theoretical. Docker acquired MCP Defender in September 2025 specifically to address this gap, and followed up with the open-source MCP Gateway for production environments.
MCP Gateway — Docker Compose Integration
```yaml
# docker-compose.yml
services:
  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "8080:8080"
    environment:
      # OAuth 2.1 resource server config
      AUTH_ISSUER: https://auth.example.com
      AUTH_AUDIENCE: mcp-gateway
      # Rate limiting
      RATE_LIMIT_RPM: 100
      RATE_LIMIT_BURST: 20
      # Logging
      LOG_LEVEL: info
      LOG_FORMAT: json
    volumes:
      - ./policies:/etc/mcp-gateway/policies

  # Your MCP servers sit behind the gateway
  git-server:
    image: mcp/git-server:2025.12.18
    # NOT exposed directly - only via gateway
```
Authentication & Authorization
Logging & Monitoring
Critical: MCP servers MUST NOT accept tokens that were not explicitly issued for them. Attackers exploit MCP proxy servers connecting to third-party APIs to create “confused deputy” vulnerabilities—allowing malicious clients to obtain authorization codes without proper user consent.
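In gateway terms, that means validating the token's audience and never forwarding inbound tokens upstream. A hypothetical policy sketch (field names are illustrative):

```yaml
# Fail closed on audience mismatch to block confused-deputy tokens
token_validation:
  issuer: https://auth.example.com
  audience: mcp-gateway              # reject tokens minted for other services
  accept_passthrough_tokens: false   # never reuse a client's token upstream
```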
Implement Agent RBAC
Different Agents, Different Permissions
Your deployment agent gets different access than your research agent. Map agent roles like you map human roles. But go further—traditional RBAC was built for human users with predictable behavior. Agents need dynamic, context-aware privilege management because their needs change based on task context.
Agent Role Mapping — Policy Example
```yaml
# agent-rbac-policy.yaml
agent_roles:
  code-reviewer:
    description: "Reviews PRs and suggests improvements"
    tools: [git:diff, git:log, filesystem:read]
    data_access: [source_code]
    environments: [development, staging]
    max_session_duration: 1h

  deployment-agent:
    description: "Deploys approved releases"
    tools: [kubernetes:apply, helm:upgrade]
    data_access: [manifests, configs]
    environments: [staging, production]
    requires_approval: true   # Human-in-the-loop
    max_session_duration: 30m

  research-agent:
    description: "Searches docs and answers questions"
    tools: [web:search, filesystem:read]
    data_access: [documentation]
    environments: [development]
    network_access: [outbound_https_only]
    max_session_duration: 2h

  incident-responder:
    description: "Investigates production incidents"
    tools: [kubernetes:get, logs:read, metrics:query]
    data_access: [logs, metrics, traces]
    environments: [production]
    escalation_required: [kubectl:delete, kubectl:exec]
    max_session_duration: 4h
```
Agent Identity Governance
Treat agents as Non-Human Identities (NHI). Each agent gets its own identity—separate from the user who launched it. This improves visibility and auditability.
Require a human sponsor to govern each agent's identity and lifecycle. This prevents orphaned agents running with stale credentials.
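One way to give each agent run its own short-lived identity, assuming AWS IAM (the role name is illustrative):

```bash
# Dedicated, time-bounded credentials for one agent session,
# never the launching developer's personal keys
aws sts assume-role \
  --role-arn arn:aws:iam::123456789012:role/code-reviewer-agent \
  --role-session-name "agent-$(date +%s)" \
  --duration-seconds 3600
# The session name appears in CloudTrail, so every API call is
# attributable to this specific agent run
```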
Dynamic Privilege Controls
Beyond static RBAC: The consensus across OWASP, Microsoft, NIST, and FINOS is that static RBAC alone is insufficient for AI agents. Move toward Policy-Based Access Control (PBAC) using centralized policy engines that evaluate real-time risk signals, task context, and agent behavior patterns.
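A sketch of what a per-call PBAC rule might look like (hypothetical policy language and signal names):

```yaml
# Evaluated on every tool call, not once per session
policy: deploy-guard
applies_to: deployment-agent
rule:
  allow: kubernetes:apply
  when:
    - task_context == "approved-release"   # real-time task signal
    - risk_score < 0.3                     # behavioral anomaly score
    - within_business_hours == true
  otherwise: escalate_to_human
```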
Monitor Agent Behavior in Production
Non-Deterministic Systems Need Runtime Monitoring
Agents are non-deterministic. The same prompt can produce different tool calls, different API sequences, different outcomes. You can't unit-test your way to safety. You need runtime observability: track every tool call, flag anomalies, set circuit breakers. If an agent starts making unexpected API calls at 3 AM, you want to know.
Agent Monitoring — Observability Stack
```yaml
# agent-monitoring-config.yaml
monitoring:
  # Track every tool invocation
  tool_call_logging:
    enabled: true
    fields: [agent_id, tool, params, timestamp, duration, result_status]
    destination: datadog   # or splunk, elastic, etc.

  # Anomaly detection rules
  anomaly_rules:
    - name: unusual_tool_access
      condition: "tool NOT IN agent_role.allowed_tools"
      action: block_and_alert
    - name: high_frequency_calls
      condition: "tool_calls_per_minute > 50"
      action: rate_limit_and_alert
    - name: off_hours_activity
      condition: "hour NOT IN [8..18] AND env == 'production'"
      action: alert_oncall
    - name: data_exfiltration_pattern
      condition: "outbound_bytes > 10MB AND tool == 'web:fetch'"
      action: block_and_escalate

  # Circuit breakers
  circuit_breakers:
    max_consecutive_errors: 5
    max_cost_per_session: $10
    max_tool_calls_per_session: 500
    kill_switch: true   # Emergency stop
```
- **Track:** Every tool call, every parameter, every response. Correlation IDs across agent sessions. Full audit trail.
- **Flag:** Anomalous patterns: unexpected tools, high-frequency calls, off-hours activity, large data transfers.
- **Break:** Circuit breakers: max errors, max cost, max calls per session. Kill switches for emergency stop.
OWASP calls this “Strong Observability”—a non-negotiable security control requiring clear, comprehensive visibility into what agents are doing, why, and which tools they're invoking. Detailed logging of goal state, tool-use patterns, and decision pathways is mandatory.
OWASP Top 10 for Agentic Applications 2026
The New Security Framework for What AI Does
OWASP published a dedicated Top 10 for Agentic Applications in late 2025, developed with 100+ industry experts. This isn't an update to the LLM Top 10—it's a new framework because the attack surface is categorically different. Risks range from Agent Goal Hijack to Rogue Agents and are “no longer theoretical—they're active, systemic, and already hitting production systems.”
The Four Attack Layers
- **Model Layer:** The LLM itself. Vulnerable to adversarial inputs, goal hijacking, and prompt injection.
- **Tool Ecosystem:** External systems and APIs agents invoke, such as email, databases, and cloud infra. Each is an abuse vector.
- **Memory Architecture:** Vector databases, RAG repos, conversation histories that shape agent behavior over time.
- **Multi-Agent Mesh:** Inter-agent communication where autonomous systems coordinate across org boundaries.
Core Principles
- **Least-Agency:** Extension of least privilege for autonomous systems. Avoid unnecessary autonomy. Grant only the minimum level required for the defined task.
- **Strong Observability:** Non-negotiable. Comprehensive visibility into what agents are doing, why, and which tools they invoke. Log goal state, tool-use patterns, and decision pathways.
Sandbox Technology Comparison
| Technology | Isolation | Startup | Overhead | Best For |
|---|---|---|---|---|
| Docker Sandboxes | MicroVM + dedicated kernel | Sub-second | Low | Coding agents (Claude Code, Copilot) |
| Firecracker | KVM hardware isolation | ~125ms | <5 MiB/VM | Multi-tenant, untrusted code, production |
| Apple Containers | Per-container VM, no shared kernel | Sub-second | Low | macOS / Apple Silicon development |
| gVisor | User-space kernel (syscall interception) | Fast | 10-30% I/O | Compute-heavy agents, GKE |
| Kata Containers | OCI-compatible microVM | ~200ms | Moderate | Kubernetes-native enterprise |
| WebAssembly | Capability-based (WASI) | <1ms | Minimal | Lightweight, capability-restricted agents |
What NOT to Do
Don't Deploy Agents Without IT Approval
Trend Micro's February 2026 report found 1 in 5 organizations deployed agentic AI tools without IT approval. Unsupervised deployment with broad permissions and high autonomy turns theoretical risks into tangible threats across entire organizations.
Don't Give Agents Your Personal Credentials
When an AI agent acts on behalf of a human, authorization is evaluated against the agent's identity, not the requester's. If the agent is compromised, the attacker inherits your full access. Create dedicated agent identities with scoped permissions.
Don't Run Agents in Standard Containers and Call It "Sandboxed"
Standard Docker containers share the host kernel. A kernel exploit gives the attacker access to your host and every other container on it. For untrusted agent code, you need microVM isolation (Firecracker, Apple Containers) or at minimum gVisor.
Don't Trust MCP Servers Blindly
MCP servers are the new API endpoints. An unvetted MCP server can exfiltrate data, inject malicious instructions via prompt injection, or serve as a confused deputy. Authenticate every connection, validate every token, log every call.
Don't Skip Runtime Monitoring Because "It Worked in Testing"
Agents are non-deterministic. The same prompt can produce different tool calls in production than it did in staging. You need runtime anomaly detection, circuit breakers, and kill switches — not just pre-deployment testing.
The Corey Quinn Test
“Would you be comfortable running this agent in an AWS account called Superfund?”
If not, your sandbox isn't good enough. If the thought of your agent having unrestricted access to that account makes you uncomfortable, then you haven't properly scoped its permissions, isolated its environment, or instrumented its behavior.
Apply this test to every agent you deploy. If you wouldn't trust it in your most sensitive account, it shouldn't be running in any account without proper guardrails.
The Trend Micro Warning
Trend Micro's 2026 Security Predictions Report warns that agentic AI introduces a new class of threats. When agents hallucinate, are manipulated, or are compromised, the consequences can be devastating—altering supply chains, draining accounts, or disrupting infrastructure without human awareness.
Your Action Plan
Implement This Week
The teams that build agent security in now won't be scrambling when 40% of their enterprise apps embed agents by year-end. Start with these concrete steps:
1. Inventory every AI agent in your org. Document what tools each one accesses, who launched it, and what credentials it uses.
2. Run at least one agent in Docker Sandboxes or a Firecracker microVM. Verify it can't access the host filesystem or network.
3. Audit your MCP server configurations. Update mcp-server-git to version 2025.12.18 or later. Remove any unneeded tools.
4. Define explicit tool allowlists for your two most-used agents. Start with read-only, add write access only where justified.
5. Create agent roles in your IAM system. Map each agent to a role with time-bounded, scoped permissions. No more inheriting the developer's full access.
6. Set up basic tool-call logging for one production agent. Ship logs to your SIEM. Set an alert for off-hours activity.
7. Implement one circuit breaker: max tool calls per session, max cost per session, or kill switch. Test it.
8. Run the Corey Quinn test on every agent. If you wouldn't trust it in your most sensitive account, tighten the sandbox.
9. Review the OWASP Top 10 for Agentic Applications with your security team. Map your current posture against the four attack layers.
10. Block agents from deploying to production without human-in-the-loop approval. No exceptions until your monitoring is mature.
Key Takeaways
AI agents are shipping to production with broad access to files, code, APIs, databases, and infrastructure — but security models designed for human users haven't caught up.
Three CVEs in Anthropic's canonical MCP Git server (CVE-2025-68145, -68143, -68144) demonstrated chained prompt injection leading to remote code execution. MCP servers are the new attack surface.
Sandbox everything. Standard containers share the host kernel and aren't sufficient. Use microVM isolation (Docker Sandboxes, Firecracker, Apple Containers) or user-space kernels (gVisor).
Apply least-agency — OWASP's extension of least privilege for autonomous systems. Define explicit tool allowlists per agent. Validate every parameter. Start minimal, escalate incrementally.
Treat MCP servers like API endpoints: authenticate with OAuth 2.1, authorize with fine-grained scopes, log every request, rate-limit by agent identity. Docker's MCP Gateway provides an open-source enforcement layer.
Implement agent RBAC with dynamic, context-aware privileges. Treat agents as Non-Human Identities. Time-bound credentials, auto-de-escalate when idle, require human-in-the-loop for high-risk operations.
Non-deterministic systems need runtime monitoring. Track every tool call, flag anomalies, set circuit breakers, implement kill switches. OWASP calls this "Strong Observability" — it's non-negotiable.
OWASP published a dedicated Top 10 for Agentic Applications 2026 identifying risks across four attack layers: model, tool ecosystem, memory architecture, and multi-agent mesh.
Gartner predicts 40% of enterprise apps will embed AI agents by end of 2026, up from <5% in 2025. The teams that build security in now won't be scrambling later.
Run the Corey Quinn test: would you be comfortable running this agent in an AWS account called "Superfund"? If not, your sandbox isn't good enough.
Security Is a Guardrail, Not a Gate.
Build it into the agent lifecycle, not at the end of it. Every unaudited agent in production is an unmonitored identity with broad access. Start with one sandbox, one allowlist, one circuit breaker. Then scale.