Netflix migrates millions of batch workloads to Kubernetes-native Kueue system
Netflix has successfully migrated its managed batch compute solution to the Kubernetes-native job queueing system Kueue, integrating it within its Titus container platform. This architectural transition introduces preemption-based fair sharing, resulting in significantly increased resource utilization for running back-end processing pipelines. The migration of millions of batch workloads was achieved with zero downtime, maintaining complete API parity for internal users.
Key Takeaways
- Kueue replaces custom logic in Netflix's 2018-era Compute Managed Batch (CMB) solution while maintaining full API parity for internal users.
- The implementation uses preemption-based fair sharing to allow high-priority business-critical workloads to reclaim resources from lower-priority jobs.
- Netflix achieved a full production rollout in four weeks by prioritizing the migration of its largest and most complex customer first.
- Kueue was integrated specifically to avoid fragmented job placement, allowing it to work with existing specialized Titus scheduling profiles.
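To make the preemption idea in the takeaways concrete, here is a minimal toy sketch in Python of a higher-priority job reclaiming capacity from lower-priority ones. It is not Kueue's or Netflix's actual algorithm: the `Job` class, the `admit` function, the job names, and the single CPU-count capacity model are all hypothetical, and it covers only priority-based reclaim, not fair-share weighting between tenants.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int  # higher value = more important (assumed convention)
    cpus: int


def admit(running, incoming, capacity):
    """Try to admit `incoming`, preempting strictly lower-priority jobs if needed.

    Returns (new_running, preempted). If the job cannot fit even after
    preempting every lower-priority job, the running set is left unchanged.
    """
    free = capacity - sum(j.cpus for j in running)

    # Preemption candidates, lowest priority first.
    victims = sorted(
        (j for j in running if j.priority < incoming.priority),
        key=lambda j: j.priority,
    )

    preempted = []
    for victim in victims:
        if free >= incoming.cpus:
            break
        preempted.append(victim)
        free += victim.cpus

    if free < incoming.cpus:
        return list(running), []  # not admissible; nothing is preempted

    preempted_ids = {id(j) for j in preempted}
    kept = [j for j in running if id(j) not in preempted_ids]
    return kept + [incoming], preempted


# Hypothetical example: a full 8-CPU pool and a business-critical arrival.
running = [Job("nightly-report", 1, 4), Job("ad-hoc-analysis", 2, 4)]
new_running, preempted = admit(running, Job("encode-title", 10, 4), capacity=8)
```

In this toy model only the lowest-priority job is evicted, and only as much capacity as the incoming job needs is reclaimed. A real queueing system would also requeue the preempted work and account for quotas shared across teams.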
Why It Matters
By adopting Kueue, Netflix continues its strategic pivot from homegrown infrastructure toward standard Kubernetes-native components. For the streaming industry, this move demonstrates how to achieve high-density resource utilization and automated preemption at massive scale, which is crucial for managing the high-cost batch processing required for video encoding and machine learning. As platforms face increasing cost pressure and hardware heterogeneity, this shift provides an architectural blueprint for balancing specialized isolation with shared capacity efficiency. Watch for whether other major streaming players move away from custom schedulers like Apache Mesos in favor of Kueue for their AI and encoding pipelines.
Additional Context
The migration to Kueue is a key component of Netflix’s broader multi-year effort to modernize its internal container platform, Titus. Originally launched in 2015 on Apache Mesos, Titus was open-sourced in 2018 as a way to manage three million containers per week. However, as Kubernetes solidified as the industry standard, Netflix began a phased 'lift and shift' of the Titus control plane to Kubernetes-native extensions. Per internal technical reports from late 2025, this transition allows Netflix to leverage the broader open-source ecosystem while maintaining the specialized runtime environments required for its custom systemd-compatible containers and deep AWS integrations.
This infrastructure evolution aligns with recent industry shifts toward high-performance batch processing for generative AI and data analytics. According to the CNCF 2025 Annual Survey, approximately 84% of organizations now utilize Kubernetes in production, with a growing subset adopting tools like Kueue to handle bursty, resource-intensive workloads. Similar efforts are visible across the streaming landscape; for instance, per engineering logs from early 2026, Spotify manages over 4,000 microservices and specialized training jobs across 200 clusters. By standardizing on Kueue, Netflix reduces its internal maintenance burden and positions its stack to better handle diverse hardware flavors, including the specific GPU requirements of modern recommendation engines.
Read full article at netflixtechblog.com
