DevOps 2025 Year in Review
The 5 Biggest Infrastructure Shifts
Ever wonder why your infrastructure team suddenly became AI experts? From AI becoming infrastructure to eBPF going mainstream, these five shifts defined 2025—and set the stage for what's coming in 2026.
2025: The Year Everything Changed
If 2024 was about experimenting with AI in development workflows, 2025 was about AI becoming the workload itself. DevOps teams went from "how do we deploy faster" to "how do we run trillion-parameter models profitably."
DevOps Market by 2028
up from $10.4B in 2023
Multi-Cloud Adoption
average of 3.4 cloud providers
Platform Teams by 2026
Gartner prediction
Shift #1: AI is Infrastructure Now
It's not experimental anymore. DevOps teams own LLM costs, latency, and SLAs. The question is no longer "should we use AI?" but "how do we run AI workloads profitably at scale?"
The Cost Reality
80% of enterprises miss AI spending forecasts by 25% or more. 84% report margin erosion of at least 6% due to unexpected LLM costs. Enterprise deployments can hit $10,000-$20,000/month for cloud hosting and scaling alone.
LLM market in 2025
projected by 2029 (28% CAGR)
DevOps Becomes AIOps
Infrastructure teams now track metrics that didn't exist two years ago: cost per query, tokens per query, cache hit rates, GPU utilization, model usage mix, and inference latency SLAs.
- Over 60% of teams integrate automated AI security scans into CI/CD pipelines
- 72% of businesses plan to increase AI budgets in 2026
- Nearly 40% already spend over $250,000 annually on LLM initiatives
- Strategic optimization can reduce LLM infrastructure costs by 30-50%
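To make those metrics concrete, here is a minimal sketch of how a team might derive cost per query, tokens per query, and cache hit rate from a request log. The `InferenceRequest` fields, the flat per-token price, and the assumption that cached responses cost nothing are all illustrative, not taken from any particular provider's billing model.

```python
from dataclasses import dataclass


@dataclass
class InferenceRequest:
    # Hypothetical log record; field names are illustrative.
    prompt_tokens: int
    completion_tokens: int
    cache_hit: bool
    latency_ms: float


def summarize(requests, price_per_million_tokens):
    """Aggregate per-query metrics from a list of InferenceRequest records."""
    if not requests:
        raise ValueError("need at least one request")
    n = len(requests)
    total_tokens = sum(r.prompt_tokens + r.completion_tokens for r in requests)
    # Assumption: a cache hit is served without calling the model, so it is not billed.
    billed_tokens = sum(
        r.prompt_tokens + r.completion_tokens for r in requests if not r.cache_hit
    )
    total_cost = billed_tokens * price_per_million_tokens / 1_000_000
    return {
        "tokens_per_query": total_tokens / n,
        "cost_per_query": total_cost / n,
        "cache_hit_rate": sum(r.cache_hit for r in requests) / n,
        "max_latency_ms": max(r.latency_ms for r in requests),
    }


sample = [
    InferenceRequest(800, 200, False, 420.0),
    InferenceRequest(800, 200, True, 15.0),
    InferenceRequest(1500, 500, False, 910.0),
    InferenceRequest(300, 700, False, 650.0),
]
stats = summarize(sample, price_per_million_tokens=2.0)
print(stats)
```

In practice the same aggregation would run per model and per team, so that the usage mix and the latency SLA can be tracked alongside spend.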
The Silver Lining
AI inference costs are dropping approximately 10x year-over-year without sacrificing performance. GPT-3 cost $60/million tokens in 2021. Equivalent-performance models cost $0.06/million tokens in late 2024. This trend continued aggressively through 2025.
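The arithmetic behind the "10x per year" figure can be checked directly. The sketch below assumes the two price points quoted above are roughly three years apart (2021 to late 2024); the function names are illustrative.

```python
def annual_decline_factor(start_price, end_price, years):
    """Constant yearly factor by which the price must fall to go from start to end."""
    return (start_price / end_price) ** (1 / years)


def projected_price(start_price, factor, years):
    """Price after `years` if the same yearly decline factor keeps holding."""
    return start_price / (factor ** years)


# $60 per million tokens in 2021, $0.06 per million tokens in late 2024.
factor = annual_decline_factor(60.0, 0.06, 3)
print(f"implied yearly decline: about {factor:.1f}x")

# Hypothetical extrapolation one more year, assuming the trend holds unchanged.
print(f"one year later: about ${projected_price(0.06, factor, 1):.4f} per million tokens")
```

A 1,000x drop over three years works out to roughly 10x per year, which is where the headline number comes from. The extrapolation is only as good as the assumption that the trend continues.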
Shift #2: Platform Engineering Became Mandatory
If your developers are still wrestling with YAML, you're behind. Platform engineering moved from "emerging trend" to "boardroom priority" in 2025.
Current Adoption
of organizations have adopted platform engineering in 2025, with 92% of CIOs planning AI integrations
Gartner 2026 Prediction
of large software engineering orgs will establish platform teams—up from 45% in 2022
Why 2025 Was the Tipping Point
- Platform engineering now appears on 10+ Gartner hype cycles, a 5x increase from the previous year
- Over 60% of Kubernetes-heavy enterprises have dedicated platform teams
- 49% cite reducing reliance on repetitive tasks through automation as a primary driver
- An active community of 15,000+ platform engineers shares best practices
The Business Case
Gartner predicts that by 2027, platform engineering principles will influence more than 50% of infrastructure and operations technology decisions—up from less than 20% today. That's not a trend; it's a fundamental restructuring of how organizations build software.
The shift: DevOps empowered developers to "build it and run it." Platform engineering lets them do that without becoming part-time infrastructure experts.
Shift #3: Multi-Cloud Hit 89% Adoption
But here's the twist: deep cloud integrations matter more than raw portability. The "write once, run anywhere" dream ran headfirst into the reality of specialized cloud services.
Multi-cloud adoption
Average providers used
Hybrid/multi-cloud by EOY
Global cloud spend 2025
The AWS-Google Multicloud Breakthrough
In December 2025, something unprecedented happened: AWS and Google Cloud launched a joint multicloud networking product. AWS Interconnect for multicloud is now in preview, with Microsoft Azure joining in 2026. Native high-speed private connections between the big three clouds—with an open specification for interoperability.
This isn't about portability anymore. It's about leveraging the best of each cloud without network latency penalties.
Industry Adoption Leaders
Media & Entertainment
Content distribution, VFX, streaming
Financial Services
Compliance-friendly platforms
The Complexity Tax
- 66% find managing multi-cloud environments challenging (80% of enterprises)
- Multi-cloud environments face 38% more vulnerabilities due to complex access management
- 68% of IT leaders say multi-cloud improves risk mitigation and service resilience
- 37% adopted multi-cloud to avoid vendor lock-in—up 8% YoY
Shift #4: eBPF Went Mainstream
Amazon EKS now supports Cilium as the CNI for EKS Hybrid Nodes. If you're deploying Kubernetes without an eBPF-based CNI, you're leaving performance and security on the table.
Performance Gains
- ~20% CPU usage reduction on test workloads
- Kube-proxy replacement for efficient routing
- Exceptional observability via Hubble UI
- No external agents required
Security Evolution
- Identity-based policies, not just IP addresses
- Tetragon for runtime security in the kernel
- Deep packet inspection at kernel level
- Replaces iptables-based chains entirely
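To show what "identity-based, not IP-based" looks like in practice, here is a small sketch that builds a `CiliumNetworkPolicy` manifest selecting workloads by label. The policy name, namespace, labels, and port are hypothetical; the manifest is emitted as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def identity_policy(name, namespace, target_labels, allowed_labels, port):
    """Build a CiliumNetworkPolicy that allows ingress by workload labels, not IPs."""
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods the policy applies to, selected by label.
            "endpointSelector": {"matchLabels": target_labels},
            "ingress": [
                {
                    # Only pods carrying these labels may connect.
                    "fromEndpoints": [{"matchLabels": allowed_labels}],
                    "toPorts": [
                        {"ports": [{"port": str(port), "protocol": "TCP"}]}
                    ],
                }
            ],
        },
    }


# Hypothetical example: only the frontend may call the inference API on port 8080.
policy = identity_policy(
    name="allow-frontend-to-inference",
    namespace="ml-serving",
    target_labels={"app": "inference-api"},
    allowed_labels={"app": "frontend"},
    port=8080,
)
manifest = json.dumps(policy, indent=2)
print(manifest)
```

Because the rule names labels rather than addresses, it keeps working when pods are rescheduled and their IPs change.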
AWS EKS Expansion (August 2025)
Amazon EKS expanded Cilium support as the CNI for EKS Hybrid Nodes. With EKS now supporting up to 100,000 nodes per cluster, Cilium's role in hybrid environments becomes pivotal for ultra-scale AI training—potentially handling 1.6 million accelerators.
Max nodes per EKS cluster
Public production users
Major cloud providers
Cilium Everywhere
Cilium is now the CNI for Alibaba, APPUiO, Azure, AWS, DigitalOcean, Exoscale, Google Cloud, Hetzner, and Tencent Cloud. From the 2025 annual report: on-premises bare metal has surpassed AWS as the most common deployment environment—signaling that organizations are building complex self-managed platforms for HPC and AI.
Shift #5: Deep Integrations Beat Raw Portability
The "multi-cloud everything" dream faced reality. Turns out, best-in-class integrations matter more than theoretical portability. Case in point: the Google Cloud + Palo Alto Networks partnership.
The $10 Billion Partnership (December 2025)
Palo Alto Networks and Google Cloud forged what Reuters reports as a contract "approaching $10 billion" over several years. This isn't just vendor lock-in—it's strategic alignment that delivers capabilities no portable abstraction layer could match.
What Palo Alto Gets:
- Migration of key workloads to Google Cloud
- Vertex AI platform and Gemini LLMs
- Power for their security copilots
What Customers Get:
- Prisma AIRS for AI workload protection
- Secure Vertex AI and Agent Engine
- 75+ joint integrations
The AI Security Imperative
Palo Alto's December 2025 State of Cloud Security Report found that 99% of respondents experienced at least one attack on their AI infrastructure over the last year. Deep platform integration isn't optional when you're defending AI workloads.
99% of organizations experienced AI infrastructure attacks in 2025
The Lesson for 2026
Abstract what can be abstracted. Integrate deeply where it matters. The organizations winning at multi-cloud aren't the ones with the most portable architectures—they're the ones who strategically chose where to go deep and where to stay flexible.
Pro Tip: What 2026 Will Be About
2026 will be about execution at scale. The hype is over. Now it's about running AI workloads profitably, securing infrastructure properly, and building platforms developers actually want to use.
Run AI Profitably
- Track cost per query
- Optimize inference pipelines
- Right-size GPU allocations
- Implement LLM caching strategies
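As one illustration of an LLM caching strategy, the sketch below is an exact-match response cache keyed on a hash of the model name and a normalized prompt, with hit and miss counters. The `PromptCache` class, the `fake_model` stand-in, and the whitespace and case normalization are all assumptions made for the example; real deployments often add TTLs, size limits, or semantic matching.

```python
import hashlib


class PromptCache:
    """In-memory exact-match cache for model responses (illustrative only)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model, prompt):
        # Assumption: prompts differing only in whitespace or case are equivalent.
        normalized = " ".join(prompt.split()).lower()
        return hashlib.sha256(f"{model}\n{normalized}".encode("utf-8")).hexdigest()

    def get_or_compute(self, model, prompt, compute):
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = compute(prompt)  # the expensive model call happens only on a miss
        self._store[key] = result
        return result

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


calls = []


def fake_model(prompt):
    # Stand-in for a real inference call; records how often it is invoked.
    calls.append(prompt)
    return f"answer to: {prompt}"


cache = PromptCache()
first = cache.get_or_compute("demo-model", "What is eBPF?", fake_model)
second = cache.get_or_compute("demo-model", "  what is   eBPF? ", fake_model)
print(len(calls), cache.hit_rate)
```

The hit rate exposed here is the same cache hit rate metric that feeds cost per query: every hit is a request the model never had to serve.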
Secure Properly
- AI-aware security posture
- eBPF-based runtime protection
- Identity-first network policies
- Continuous AI red teaming
Build Better Platforms
- Developer experience metrics
- Golden paths, not golden cages
- Self-service with guardrails
- Measure time-to-production
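Time-to-production is easy to talk about and easy to leave unmeasured. A minimal sketch, assuming each change is recorded as a pair of ISO 8601 timestamps (merged, then deployed), might look like this. The sample data and the choice of the median as the summary statistic are illustrative.

```python
from datetime import datetime
from statistics import median


def hours_to_production(changes):
    """Return elapsed hours between merge and production deploy for each change."""
    durations = []
    for merged_at, deployed_at in changes:
        delta = datetime.fromisoformat(deployed_at) - datetime.fromisoformat(merged_at)
        durations.append(delta.total_seconds() / 3600)
    return durations


# Hypothetical (merged_at, deployed_at) pairs pulled from a deploy log.
changes = [
    ("2025-11-03T09:00:00", "2025-11-03T15:00:00"),
    ("2025-11-04T10:00:00", "2025-11-05T10:00:00"),
    ("2025-11-06T08:30:00", "2025-11-06T10:30:00"),
]
durations = hours_to_production(changes)
median_hours = median(durations)
print(f"median time-to-production: {median_hours:.1f} hours")
```

The median is used rather than the mean so that one slow release does not hide an otherwise fast golden path.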
The Bottom Line
2025 was the year infrastructure teams stopped asking "if" and started mastering "how."
AI is infra
Platforms mandatory
Multi-cloud mature
eBPF mainstream
Integrations win
The organizations that will thrive in 2026 are the ones treating these shifts not as trends to watch, but as foundations to build on.
More insights at talk-nerdy-to-me.com/blog