AWS re:Invent 2018 – AWS Network Architecture!

AWS re:Invent 2018! Yes, I am starting my blog series on it again so that I can share as much internal information as I collected from the multiple sessions held there. To be specific, I am not discussing AWS VPC, Direct Connect, CloudFront, PrivateLink, routing, subnets, etc. here; instead, I am giving insight into the internal infrastructure AWS uses to deliver those services.

Today AWS has 19 Regions and 57 Availability Zones (AZs); the largest AZ has a maximum of 14 data centers, each with a maximum of 3,000 physical servers.

Some important information:

  • The Regional model allows AWS to offer connectivity at low cost.
  • Availability Zones are at least a mile away from each other to avoid correlated failures, while staying close enough for low latency.
  • 388 fiber spans run between the Availability Zones and Transit Centers, delivering high throughput and low latency; in aggregate they provide 4,947 terabits of connectivity in just that single region.
  • AWS is reducing network infrastructure cost by packing more fibers into a single cable.
  • Regions and AZs continue to grow: five new Regions are planned in Bahrain, Cape Town, Hong Kong SAR, Milan, and Stockholm.
  • AWS also announced 15 more AZs that will be coming soon in the new Regions.
  • Get more detailed insight on their Global Infrastructure page.
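If you want to verify the Region and AZ counts from your own account, a minimal boto3 sketch like the one below can enumerate them (this assumes AWS credentials are already configured; the counts you see depend on which Regions exist and are enabled at the time you run it):

```python
import boto3

# List all Regions visible to this account, then count the AZs in each.
ec2 = boto3.client("ec2", region_name="us-east-1")
regions = [r["RegionName"] for r in ec2.describe_regions()["Regions"]]

total_azs = 0
for region in regions:
    regional_ec2 = boto3.client("ec2", region_name=region)
    azs = regional_ec2.describe_availability_zones()["AvailabilityZones"]
    total_azs += len(azs)
    print(f"{region}: {len(azs)} AZs")

print(f"Total: {len(regions)} Regions, {total_azs} AZs")
```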

The following AWS global backbone network maps from 2017 and 2018 show the difference they have made in just one year, and we can see how much effort they are putting into owning their own physical infrastructure as soon as possible, so that they can isolate themselves from other ISPs and run a better-quality network. I don't think monopoly is behind this aggressiveness; rather, they want to provide better services to customers in a shorter time span and gain more market share in public cloud infrastructure.

Fig 1 – AWS Global Backbone Network 2017   

Fig 2 – AWS Global Backbone Network 2018

AWS has a strong Point of Presence footprint across the globe, with 150 edge locations in its CloudFront network, through which it provides a content delivery network and caching for faster performance.

  • They have developed a huge edge-based infrastructure with various ISPs around the world. For example, they might have a tie-up with your local Internet service provider so you get better throughput in less time.
  • AWS has 180 Direct Connect partners. A small sketch for exploring Direct Connect locations follows below.
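As a quick way to explore that Direct Connect footprint yourself, here is a minimal boto3 sketch (again assuming configured credentials; each call only returns the locations served from the Region you query):

```python
import boto3

# List the Direct Connect locations available from one Region.
dx = boto3.client("directconnect", region_name="us-east-1")

for loc in dx.describe_locations()["locations"]:
    print(f"{loc['locationCode']}: {loc['locationName']}")
```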

AWS has three new backbone networking projects to expand its wings across the world, deploying more backbone cables in Asia and South Africa; the projects are named Hawaiki, Jupiter, and Bay to Bay Express. Bay to Bay Express runs from Hong Kong via Singapore to the US and will be ready in 2021, while the Jupiter cable runs from Japan to the US and will be ready by 2020.

The three images below give the complete submarine cable details; references can be found for Hawaiki, Jupiter, and Bay to Bay Express.

Hawaiki submarine cable

Jupiter submarine cable

Bay to Bay Express submarine cable

Why is AWS building such a massive backbone infrastructure, which takes billions of dollars of investment? The answer is:

  • Security
  • Availability
  • Reliable Performance
  • Connecting closer to the customer

Let me give you some insight into how your traffic reaches the physical hypervisor on which your EC2 instances are running inside AWS data centers.

  • The session diagram shows traffic flowing from the Internet into the AWS global network, then to an AWS Edge POP, traveling through the Transit Centers in each region, then reaching an Availability Zone, from the AZ to a specific data center building, from there to your VPC, and then directly to your host hypervisor through Nitro technology.
  • Each AZ has multiple DCs; the minimum is 2, and all new upcoming AZs will have 3 DCs.
  • Each Region has 2 Transit Centers, and both are connected to each AZ in that region. A rough sketch of this hierarchy follows below.
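To make that hierarchy easier to picture, here is a tiny illustrative Python model; every name in it is made up, and only the "2 Transit Centers" and "minimum 2 DCs per AZ" figures come from the session:

```python
# Illustrative model of the physical hierarchy described above:
# Region -> 2 Transit Centers, Region -> AZs -> data centers -> hosts.
region = {
    "name": "example-region-1",           # hypothetical region
    "transit_centers": ["tc-1", "tc-2"],  # each region has 2 Transit Centers
    "availability_zones": {
        "az-a": {"data_centers": ["dc-1", "dc-2", "dc-3"]},  # new AZs: 3 DCs
        "az-b": {"data_centers": ["dc-4", "dc-5"]},          # minimum of 2 DCs
    },
}

# Traffic path: Internet -> Edge POP -> Transit Center -> AZ -> DC -> VPC -> Nitro host.
path = ["internet", "edge-pop", "tc-1", "az-a", "dc-1", "vpc", "nitro-host"]
print(" -> ".join(path))
```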

AWS provides AZs to customers so they can design HA into their infrastructure. The Transit Centers in each AWS region receive data from the Internet and forward it to the specific DC in an AZ. All DCs in an AZ are interconnected through high-throughput, low-latency fiber cables. AZs in the same region are also interconnected, and an adjoining AZ receives network traffic directly from one of the DCs connected to it.

  • A standard AWS DC has around 3,000 physical servers; those servers are entirely designed by AWS, and racked and stacked by them, for better performance. A minimal sketch of spreading instances across AZs for HA follows below.
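As a minimal sketch of what "designing HA around AZs" means in practice, the snippet below launches one instance in each of two AZs so that a single-AZ failure does not take down the whole tier (the AMI ID, instance type, and AZ names are placeholders you would replace with your own):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Spread identical instances across two AZs so that a single-AZ
# failure does not take down the whole application tier.
for az in ["us-east-1a", "us-east-1b"]:
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": az},
    )
```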

The following few diagrams will look similar because Amazon has a standard design across their Transit Centers, AZs, and DCs, but you will find different terminology used for those similar components so that they can be differentiated; they use the same terms while supporting their internal network infrastructure.

Some of the terms used in the cellular DC architecture are:

  1. Transit Center
  2. Core Edge Cell
  3. Core Inter-AZ Cell
  4. Core Intra-AZ Cell
  5. Core Access Cell
  6. Access Cell
  7. Hosts
  8. Other AZ
  9. Local AZ DC

What is a Transit Center?

  • Provides internet and inter-Region (backbone) connectivity.
  • All AZs are connected to it redundantly.
  • Located in facilities with dense internet interconnection.

Edge POP features:

  • Extends the AWS global network to the internet edge.
  • Increased network scaling.
  • Optimal interconnection with external networks.
  • All major network services like Route53, Direct Connect, CloudFront, and Shield (DDoS detection and scrubbing) come under the Edge POP network.
  • AWS Shield stops malicious traffic at the internet edge location before it reaches the AWS backbone network.
  • Provides private peering/private network interconnection (PNI).
  • Provides public peering/internet exchange.

The following diagram shows the AWS global network backbone fabric infrastructure.

Some of the terms used in the backbone fabric are:

  1. Transit Center/Edge POP
  2. Remote POP
  3. Backbone Cell

The following diagram shows what happens inside an Edge POP, as various networking components interact directly with these POP devices.

Some of the terms used inside an Edge POP are:

  1. Backbone Cell
  2. External Internet Cell
  3. External Network
  4. AWS services like CloudFront, Route53, Direct Connect, etc.

The following diagram shows the Edge POP internet edge (outbound) structure, which purely uses the Border Gateway Protocol to talk to adjoining routers, whether those are hosted in AWS infrastructure or outside it, such as major ISPs or other backbones. AWS designed their network entirely from scratch, as existing network gear vendors don't have the capacity and capability to handle such a massive distributed IT infrastructure. They designed their own networking gear and their own processing chips so that the network performs better, with higher throughput and lower latency, within their infrastructure.

It contains networking terms like:

  1. AWS Edge POP
  2. Router
  3. External Network
  4. BGP running on each routing device

In the diagram above we can see two lines: the red line shows a path that does not have connectivity from the AWS infrastructure to the actual customer device, and the green line shows a path that has full connectivity to reach the customer DC or device. AWS routers use published BGP ASNs to learn which routes are reachable from connected routers, and AWS then decides through which router to send the packets so that they reach the end device with higher throughput and lower latency. A simplified sketch of this kind of path selection follows below.
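To illustrate the idea (this is a toy sketch, not AWS's actual implementation), here is a simplified BGP-style best-path selection in Python, where the advertisement with the shortest AS path toward the destination wins:

```python
# Toy BGP-style best-path selection: each route advertisement carries the
# AS path it traversed; prefer the shortest AS path, as a BGP router would.
# (Real BGP also weighs local preference, MED, origin type, and more.)
routes = [
    {"prefix": "203.0.113.0/24", "next_hop": "isp-router-1", "as_path": [64501, 64510, 64520]},
    {"prefix": "203.0.113.0/24", "next_hop": "isp-router-2", "as_path": [64502, 64520]},
]

best = min(routes, key=lambda r: len(r["as_path"]))
print(f"Forward via {best['next_hop']} (AS path length {len(best['as_path'])})")
```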

AWS has clarity on where to use a large-chassis network platform and where to use a single-chip network platform, and they heavily utilize the single-chip platform as it reduces the failure blast radius and helps them recover easily in case of failure.

  • AWS doesn't believe in standby-mode HA, as they say they don't feel confident in such a setup: the standby system can fail at the same time, or become non-functional due to multiple factors such as firmware, software, or hardware issues.
  • AWS finds that most internal traffic flows between the line cards and the switch fabric.

Single-chip network platforms are entirely designed by AWS from scratch to handle their workload in their own way.

Some of the benefits of this are:

  1. Easy to replace in case of failure.
  2. Updates can be pushed to it easily.
  3. Simple data-transfer logic.
  4. Not much protocol implementation; configured as per requirement.
  5. Less complexity in power management, etc.
  6. Loosely coupled routing gear.

The following diagram shows the older and newer fiber cables, where they have pushed the limits and combined 3,456 fibers in a 2-inch conduit, and now 6,912 fibers in the same conduit. They developed these cables with their partners and continue to innovate in this domain. They use Dense Wavelength Division Multiplexing (DWDM) technology for their inter- and intra-AZ connectivity, but it is expensive, so they achieve a better balance by combining dense fiber with normal fiber. They are working to deploy these cables in the Atlantic, India, and Africa.

Fig: Sample old fiber cable compared with the new fiber cable

AWS is aggressively deploying their own infrastructure and avoiding shared ISP infrastructure, as they see many downsides in that traditional approach:

  • One network is not aware of the health of the other networks.
  • Route information and path calculation can take minutes, which causes network congestion.
  • No independence in routing decisions.
  • Two ISPs can forward traffic to the same malfunctioning network, which leads to more congestion and recalculation.
  • The old approach was fine for most applications, but not good for QoS-sensitive applications like voice, video, and other interactive applications such as gaming.

So now AWS deploys its own network infrastructure, with services like CloudFront, Direct Connect, etc., and achieves benefits like:

  • Complete visibility
  • Latency-aware decisions
  • Avoiding the downsides of the shared ISP approach
  • Optimization of your application using the AWS global infrastructure
  • Built-in fault isolation
  • Control over how routing happens per geo-location
  • Predictable, observable failure handling

More details about these services and the related infrastructure can be read on the AWS links provided, but the core idea comes through in the flow diagrams from the session. Two new services were launched at this re:Invent to help overcome the network/ISP/shared-infrastructure problems mentioned above:

  1. AWS Transit Gateway
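Just to give a flavor here (details in a later blog), below is a minimal boto3 sketch of creating a Transit Gateway and attaching one VPC to it; the VPC and subnet IDs are placeholders, and in practice you would wait for the gateway to reach the 'available' state before attaching:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a Transit Gateway: a regional hub that replaces a mesh of
# VPC peering connections.
tgw = ec2.create_transit_gateway(
    Description="hub for multi-VPC connectivity",
    Options={"AmazonSideAsn": 64512},  # private ASN for the AWS side
)["TransitGateway"]

# Attach a VPC to the hub (placeholder IDs; the gateway must be
# 'available' before this call succeeds).
ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId=tgw["TransitGatewayId"],
    VpcId="vpc-0123456789abcdef0",
    SubnetIds=["subnet-0123456789abcdef0"],
)
```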

2. AWS Global Accelerator
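Similarly, here is a minimal boto3 sketch for Global Accelerator; note that its control-plane API is served from us-west-2, and the accelerator name here is just a placeholder:

```python
import boto3

# The Global Accelerator API is served from us-west-2.
ga = boto3.client("globalaccelerator", region_name="us-west-2")

# Create an accelerator: you get static anycast IPs, and client traffic
# enters the AWS global network at the nearest edge location.
accelerator = ga.create_accelerator(
    Name="my-app-accelerator",  # placeholder name
    IpAddressType="IPV4",
    Enabled=True,
)["Accelerator"]

print(accelerator["AcceleratorArn"])
print(accelerator["IpSets"][0]["IpAddresses"])
```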

I will cover more details about each component and how they help customers in my upcoming blogs on AWS re:Invent 2018, so keep an eye out for them. I don't want to stretch this blog; my point is to list the components and services so that we get a glimpse in a single blog and can study these components further.

You are not going to interact with the mentioned network components directly; they are described here for a better understanding of how your packets travel from your DC to the AWS DCs, so that you can design your infrastructure with more awareness. As solution architects, we should keep all these considerations in mind and keep ourselves informed about the services available from such vendors. In the end, I can say AWS is customer-focused: they listen to customer problems and provide new services on demand. As we know, big companies have offices/DCs in multiple locations and end up with multiple VPCs and peering connections, which creates management complexity; I hope these new services will help us manage our hosted infrastructure well from the AWS Console.