AWS scales EKS for AI with 100,000-node clusters and sub-second inference
Amazon Web Services details how Amazon EKS (Elastic Kubernetes Service) supports AI/ML workloads, including inference, training, and generative AI applications, highlighting its performance, scalability, and cost optimization capabilities. The platform lets organizations apply existing Kubernetes expertise to orchestrating complex AI/ML pipelines, and it integrates with open-source tools and AWS services. Several companies, including BMW Group and Booking.com, use EKS for various AI/ML tasks and report significant efficiency gains and cost savings.
Key Takeaways
- Amazon EKS now supports up to 100,000 worker nodes per cluster, facilitating the training of trillion-parameter models.
- Booking.com uses EKS for search ranking inference, processing 250,000 requests per second with 40 ms p99.9 latency.
- Content moderation firm Unitary achieved an 80% reduction in container boot times for processing 26 million daily videos.
- Synthesia reported a 30x improvement in machine learning model training throughput for generative video creation.
- Anthropic runs its Claude foundation models on EKS using AWS Trainium and NVIDIA GPU clusters.
Why It Matters
This development suggests that Kubernetes is becoming a primary control plane for high-scale AI in the streaming industry. For platforms managing massive libraries or live user-generated content (UGC), running content moderation and metadata extraction with sub-second latency on unified infrastructure avoids the high cost of maintaining fragmented GPU environments. As streaming shifts toward agentic AI for personalization and automated highlight generation, EKS provides the orchestration needed to scale these services without bespoke infrastructure. Watch for the adoption of AWS Trainium3 instances within EKS clusters, which could further drive down training costs for proprietary video models.
Additional Context
The expansion of EKS capabilities aligns with broader industry shifts toward container-native AI infrastructure. Per AWS at re:Invent 2025, GPU usage managed by Kubernetes doubled year-over-year between 2024 and 2025, driven largely by agentic and multimodal workloads. Gartner predicts that by 2028, roughly 95% of new AI workloads will run on Kubernetes, a substantial increase from less than 30% in late 2024. One example of this growth is Flawless AI, which reported a 5x speedup in film localization experiments and a reduction in training times from weeks to days after migrating to EKS hybrid nodes.
Simultaneously, AWS is integrating EKS more deeply with its broader AI stack to simplify the developer experience. Per an announcement in December 2025, AWS launched 'EKS Auto Mode' and integrated it with Amazon Q to automate GPU provisioning and troubleshooting. Specialized media tools are also being integrated via the Model Context Protocol (MCP), allowing AI agents on EKS to interact directly with creative platforms like Blender. This infrastructure layer is critical as Prime Video and others deploy generative AI for real-time artwork moderation and live stream quality enhancement, according to AWS reporting from late 2025.
Hardware innovation remains a central pillar of this ecosystem. Per CNBC in December 2025, the newly launched Trainium3 chips, built on 3nm technology, offer 4.4x more compute performance and 4x greater energy efficiency than previous generations. These chips are being deployed in 'UltraServers' within EKS environments to help media companies manage the 'token generation' bottlenecks common in large-scale video understanding and localized dubbing workflows.
Read full article at docs.aws.amazon.com