AWS Architecture Blog
Build a multi-Region AWS PrivateLink backed service with seamless failover
Global Payments Inc. is a leading worldwide provider of payment technology and software solutions, headquartered in Atlanta, Georgia. The company processes more than 75 billion transactions annually, serving more than 5 million merchant locations and nearly 2,000 financial institutions globally. Through its merger with TSYS in 2019, Global Payments expanded its capabilities beyond merchant acquiring services to include issuer processing solutions.
The company’s services now encompass ecommerce and omnichannel payments, business management software, customer engagement tools, and cloud-based solutions. Their commitment to technological innovation and customer service has positioned them as one of the largest financial technology companies globally, consistently ranking among the Fortune 500.
This post demonstrates how the Issuer Solutions business of Global Payments, as a service provider, implemented cross-Region failover for an AWS PrivateLink backed service exposed to their customers. Their solution enables failover to a secondary Region without customer coordination, reducing Recovery Time Objective (RTO).
AWS PrivateLink involves two key roles: service providers and service consumers. Service providers build, own, and manage endpoint services. Service consumers create and manage Amazon Virtual Private Cloud (Amazon VPC) endpoints that connect their VPC to an Amazon VPC endpoint service privately, without exposing the traffic to the public internet. Enterprises often build services in multiple AWS Regions for resilience. This approach requires endpoint services in two Regions, with service consumers creating VPC endpoints for each service.
The architecture uses Amazon Route 53 to resolve the service’s Fully Qualified Domain Name (FQDN) to the active Region’s service. Amazon Application Recovery Controller (ARC) is used to initiate the failover.
Customer requirements
Issuer Solutions had the following requirements for this implementation:
- The ability for consumers to access the service privately, without traversing the public internet
- Resilience to degradation in a single Region by allowing failover to a secondary Region
- Independent failover without customer coordination
- Reliable and simple failover process
Solution overview
The following simplified architecture diagram illustrates the connectivity and failover mechanisms. The exact service implementation of Issuer Solutions is beyond this post’s scope. For simplicity, we represent the service as a Network Load Balancer backed by Amazon Elastic Compute Cloud (Amazon EC2) instances.
The solution consists of the following key components:
- The service provider deploys identical services in two Regions. The service is represented in this simplified version with a Network Load Balancer backed by EC2 instances. Each Region is independent and therefore resilient against failures in the other Region.
- Services are exposed through PrivateLink as VPC endpoint services in each Region, allowing client connections without needing NAT gateways or internet gateways and keeping the traffic within the customers’ own private IP space.
- The service provider authorizes the consumer AWS account to find the services using the VPC endpoint service names.
- The service consumer uses the VPC PrivateLink endpoint service names to create VPC endpoints with Elastic Network Interfaces (ENIs) in two Availability Zones in each Region. The consumer does this in both consumer Regions, and each consumer Region has two sets of VPC endpoints: one for the primary service of the provider and one for the secondary service.
- The service consumer creates a Route 53 private hosted zone in each of the two Regions they use, each with two alias records, with a simple routing policy, pointing to the VPC endpoints’ FQDNs in their own Regions. These two alias records are primary.example.com and secondary.example.com.
- The service provider creates a Route 53 ARC cluster with routing controls for the primary and secondary Regions.
- The service provider creates a private hosted zone with a failover record set for a consumer specific CNAME like custC.service.p.com that resolves to primary.example.com and secondary.example.com. These records are associated with health checks that are associated with Route 53 ARC routing controls.
As shown in the architecture diagram, it is not necessary for the provider’s and consumer’s Regions to be the same because AWS supports creating VPC endpoints in a Region different to that of the VPC endpoint service itself. Refer to AWS PrivateLink now supports cross-region connectivity for more details.
How the DNS resolution works
Each consumer Region has two Route 53 private hosted zones involved in DNS resolution in this approach: the service provider’s hosted zone and the service consumer’s hosted zone. Both hosted zones are associated with the VPC used by the service consumer. Here is how the DNS resolution works:
- When a client in the consumer VPC wants to reach the service, it uses the FQDN
custC.service.p.com
. - The hosted zone in the service provider’s account resolves custC.service.p.com to either primary.example.com or secondary.example.com depending on the status of the health checks controlled by ARC. For now, let’s assume this resolves to primary.example.com.
- Next, primary.example.com resolves to the VPC endpoint FQDN
com.amazonaws.vpce.<primary-region>.vpce-svcabc123
due to the hosted zone in the service consumer.
At the time of failover, the service provider will update the ARC health checks to turn off the primary control and turn on the secondary routing control. This causes the service provider’s hosted zone to resolve custC.service.p.com to secondary.example.com, which in turn resolves to the VPC endpoint FQDN in the secondary Region due to the service consumer’s hosted zone.
Considerations
With this setup, the service provider can fail over when they need to without the service consumer having to manually make any changes. This is especially useful for services that have multiple service consumers. Additionally, service consumers can make changes to the VPC endpoint as they see fit. They only need to update the hosted zone they manage to make sure that primary.example.com and secondary.example.com point to the correct VPC endpoints.
We used ARC in this post because it offers a robust solution for cross-Region failover with built-in static stability. ARC also avoids single points of failure in the failover logic by distributing it across multiple Regions.
This setup demonstrates an active-passive configuration where traffic is routed to a single Region at a time using the Route 53 failover routing policy. For an active-active approach, you can adapt this setup by employing alternative Route 53 policies such as weighted or latency-based routing, as detailed in Active-active and active-passive failover.
Code sample
We have created a GitHub repository with Terraform code to demonstrate this solution. The repository has the steps to set it up and test it.
Conclusion
This implementation of cross-Region failover by Global Payments Issuer Solutions for their PrivateLink backed service demonstrates a robust and flexible approach to providing high availability and resilience. By using AWS services such as PrivateLink, Route 53, and Route 53 ARC, they have created a solution that meets their key requirements. This architecture not only benefits Global Payments by allowing them to manage failovers efficiently, but also provides advantages to their service consumers. Customers maintain control over their own infrastructure while benefiting from seamless service continuity. As cloud architectures continue to evolve, solutions like this showcase the power of combining various AWS services to create highly available and fault-tolerant systems that meet complex business needs.
Clone our GitHub repository now and deploy this solution in your own AWS environment to try out this approach to cross-Region failover. Contact your AWS representative today to begin your journey toward enhanced business continuity.