AWS maps global services to control-plane failure boundaries
This AWS documentation page describes how global AWS services are architected, separating their control planes from their data planes for fault isolation. It groups them into services that are unique by partition, services in the edge network, and global single-Region operations, and it recommends building high availability (HA) and disaster recovery (DR) mechanisms that do not depend on control planes during a recovery event.
Key Takeaways
- IAM, AWS Organizations, and AWS Account Management have their control planes in us-east-1, while Route 53 Application Recovery Controller and AWS Network Manager have theirs in us-west-2.
- Edge-network global services include Route 53 Public DNS, CloudFront, AWS WAF for CloudFront, ACM for CloudFront, AWS Global Accelerator (AGA), and AWS Shield Advanced.
- AWS warns against making Route 53 record changes, IAM CRUDL (create, read, update, delete, list) actions, AGA traffic-dial changes, or CloudFront origin edits part of the failover path.
- Amazon S3 bucket configuration changes such as PutBucketPolicy, PutBucketVersioning, and PutBucketReplication depend on us-east-1 in the aws partition.
- AWS recommends using Regional STS endpoints, pre-provisioning Elastic Load Balancing (ELB) load balancers and API Gateway endpoints, and keeping break-glass users ready in case SAML federated sign-in is unavailable.
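As a minimal sketch of the Regional STS recommendation above, the helper below builds the Regional endpoint URL for a Region in the `aws` partition. The function name `regional_sts_endpoint` is hypothetical, and the URL pattern shown applies only to the standard `aws` partition; other partitions use different domain suffixes.

```python
def regional_sts_endpoint(region: str) -> str:
    """Return the Regional STS endpoint URL for a Region in the aws partition.

    Assumption: the caller passes a standard aws-partition Region name such as
    "us-west-2". This helper does not validate that the Region exists.
    """
    if not region or region == "aws-global":
        raise ValueError("a concrete Region name is required, e.g. 'us-west-2'")
    return f"https://sts.{region}.amazonaws.com"


# With boto3 installed, the URL could be passed to the client explicitly:
#   import boto3
#   sts = boto3.client(
#       "sts",
#       region_name="us-west-2",
#       endpoint_url=regional_sts_endpoint("us-west-2"),
#   )
print(regional_sts_endpoint("us-west-2"))
```

Setting the environment variable `AWS_STS_REGIONAL_ENDPOINTS=regional` is another way to steer AWS SDKs toward Regional STS endpoints without code changes.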
Why It Matters
The immediate takeaway is operational: AWS is telling teams to design failover so that recovery does not require a live global control plane. That matters across the streaming stack because DNS, load balancing, object storage, and edge delivery often sit in the recovery path, and AWS explicitly calls out Route 53, S3, CloudFront, ELB, API Gateway, and STS dependencies. The next signal to watch is whether your own DR runbooks still require creating DNS records, changing bucket configuration, or provisioning edge endpoints during an outage.
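One way to check a DR runbook for these dependencies is sketched below. It assumes a hypothetical runbook format in which each step names an AWS API operation as `service:Operation`. The operation names are real AWS API operations, but the list is illustrative rather than exhaustive, and `audit_runbook` is not an AWS-provided tool.

```python
# Illustrative denylist of control-plane operations flagged in the summary.
# The "service:Operation" notation is an assumption of this sketch.
CONTROL_PLANE_ACTIONS = {
    "route53:ChangeResourceRecordSets": "Route 53 record change",
    "s3:PutBucketPolicy": "S3 bucket configuration change",
    "s3:PutBucketVersioning": "S3 bucket configuration change",
    "s3:PutBucketReplication": "S3 bucket configuration change",
    "cloudfront:UpdateDistribution": "CloudFront origin edit",
    "globalaccelerator:UpdateEndpointGroup": "Global Accelerator traffic-dial change",
    "iam:CreateRole": "IAM control-plane action",
    "elasticloadbalancing:CreateLoadBalancer": "ELB not pre-provisioned",
}


def audit_runbook(steps):
    """Return (step_number, action, reason) for each control-plane step."""
    findings = []
    for number, step in enumerate(steps, start=1):
        reason = CONTROL_PLANE_ACTIONS.get(step["action"])
        if reason is not None:
            findings.append((number, step["action"], reason))
    return findings


# Hypothetical runbook: step 2 edits DNS during failover and is flagged.
runbook = [
    {"action": "s3:GetObject"},
    {"action": "route53:ChangeResourceRecordSets"},
    {"action": "s3:PutBucketReplication"},
]

for number, action, reason in audit_runbook(runbook):
    print(f"step {number}: {action} ({reason})")
```

A flagged step is a prompt to pre-provision the resource or move the change out of the recovery path, not proof that the runbook will fail.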
Read full article at docs.aws.amazon.com
