Wednesday 7 August 2019

Designing HA Architecture in AWS part-3

Amazon Web Services (AWS) offers a wide range of services that we can leverage to implement High Availability (HA) for any web application that we deploy on the AWS cloud.

I am going to create a web application that is highly available, resilient, optimized and secure. It delivers high performance and low latency every time users access it, from wherever they access it.
I have deployed this application with the following parameters:

Region - ap-south-1 (Asia Pacific Mumbai)
Availability Zone - AZ-a and AZ-b
A highly available architecture that spans two Availability Zones. Multi-AZ deployment.
VPC - I have created 2 Virtual Private Clouds
  • testVPC (10.0.0.0/16)
  • new-testVPC (10.1.0.0/16)

Subnets - I have created 2 public subnets and 2 private subnets in testVPC.

Under Availability Zone AZ-b
  • Public_subnet1b = 10.0.1.0/24
  • Private_subnet1b = 10.0.2.0/24

Under Availability Zone AZ-a
  • Public_subnet1a = 10.0.20.0/24
  • Private_subnet1a = 10.0.21.0/24
WPS - I have provisioned EC2 instances in both of the public subnets, installed WordPress on them and named them WPS. So at any point in time, traffic will be served by at least 2 instances (1 in each subnet), up to a maximum capacity of 6 instances (3 in each subnet).

Route53 - It's a DNS service used for domain name registration, maintaining DNS record sets and setting up different routing policies for domain names.
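
The VPC and subnet CIDR blocks listed above can be sanity-checked before anything is provisioned. The following is a minimal sketch using only Python's standard `ipaddress` module; it assumes that all four subnets belong to testVPC (inferred from their 10.0.x.x addresses), and the helper names `find_overlaps` and `subnets_outside` are made up for this illustration.

```python
import ipaddress
from itertools import combinations

# CIDR blocks copied from the parameters above.
VPCS = {
    "testVPC": "10.0.0.0/16",
    "new-testVPC": "10.1.0.0/16",
}
# Assumption: all four subnets live in testVPC.
SUBNETS = {
    "Public_subnet1b": "10.0.1.0/24",
    "Private_subnet1b": "10.0.2.0/24",
    "Public_subnet1a": "10.0.20.0/24",
    "Private_subnet1a": "10.0.21.0/24",
}


def find_overlaps(cidrs):
    """Return every pair of names whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


def subnets_outside(vpc_cidr, subnets):
    """Return the names of subnets that do not fit inside the VPC block."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return [
        name
        for name, cidr in subnets.items()
        if not ipaddress.ip_network(cidr).subnet_of(vpc)
    ]


if __name__ == "__main__":
    print("Overlapping VPCs:", find_overlaps(VPCS))
    print("Overlapping subnets:", find_overlaps(SUBNETS))
    print("Subnets outside testVPC:", subnets_outside(VPCS["testVPC"], SUBNETS))
```

Non-overlapping VPC ranges matter later in this post, because a VPC peering connection cannot be created between VPCs whose CIDR blocks overlap.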

Schematic diagram of the HA architecture below:






I have explained this in a Problem and Solution fashion: each section states a problem I faced while designing the HA architecture and how I implemented the solution to it.


Problem - How to manage unpredictable load on webserver EC2 instances in our architecture?
Solution - Auto-Scaling Groups

  • An Auto-Scaling group is dependent on 2 things: Launch Configuration and Scaling Policies.

Implementation -

  • First, choose the Launch Configuration, then the VPC and subnets in which the Auto-Scaling group will run.
  • Define the group size, starting with 1 instance.
  • Next, define the minimum and maximum size of the group (Min=1 and Max=3 instances).
  • Create an alarm for Increase Group Size: Avg. CPU utilization >= 75% for a consecutive period of 5 mins.
  • Optionally, send a notification to an SNS topic when the above alarm is triggered.
  • Similarly, create an alarm for Decrease Group Size: Avg. CPU utilization < 60% for a consecutive period of 5 mins.
  • After selecting the Auto-Scaling group, the Instances tab shows the lifecycle/status of its instances.
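
To make the effect of the two alarms concrete, here is a small simulation in Python. It is only a sketch of the scaling rules described above, not how CloudWatch or Auto Scaling evaluates alarms internally. It assumes each reading is one 5-minute CPU average and that each alarm changes the group size by one instance (the step size is my assumption); `desired_capacity` and `simulate` are hypothetical helper names.

```python
def desired_capacity(current, avg_cpu_percent, minimum=1, maximum=3,
                     scale_out_at=75.0, scale_in_below=60.0):
    """Apply the two scaling rules once and clamp the result to [minimum, maximum]."""
    if avg_cpu_percent >= scale_out_at:
        target = current + 1      # "Increase Group Size" alarm fired
    elif avg_cpu_percent < scale_in_below:
        target = current - 1      # "Decrease Group Size" alarm fired
    else:
        target = current          # between the thresholds: no change
    return max(minimum, min(maximum, target))


def simulate(cpu_readings, start=1):
    """Feed successive 5-minute CPU averages through the rules, recording group size."""
    size = start
    history = []
    for cpu in cpu_readings:
        size = desired_capacity(size, cpu)
        history.append(size)
    return history


if __name__ == "__main__":
    print(simulate([80, 90, 85, 70, 50, 40, 30]))
```

Readings between 60% and 75% leave the group unchanged. That gap between the two thresholds keeps the group from flapping between sizes.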



Problem - How to get an auto-configured server every time an instance is added to the Auto-Scaling group?
Solution - Snapshot
  • A Snapshot is a backup of a single EBS volume. A Snapshot by itself is not bootable, but you can create an AMI from it, and an AMI is.

Implementation -

  • Provision a template EC2 instance -> install Apache, WordPress and all the key configurations.
  • Take a Snapshot of the root volume of the EC2 instance (we can terminate the template instance after taking the Snapshot).
  • Create an AMI from this Snapshot and name it myWebAppAMI.
  • Use this AMI in the Launch Configuration of the Auto Scaling group to spawn auto-configured EC2 instances in the subnets.
  • This way, newly created EC2 instances boot faster, and there is no need to configure a server remotely every time one boots up.
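
For readers who prefer scripting over the console, the "create an AMI from the Snapshot" step maps to the EC2 `RegisterImage` API. The sketch below only builds the request parameters as a plain dictionary, so it runs without AWS credentials. The snapshot ID, the `/dev/xvda` root device name and the x86_64 architecture are placeholder assumptions, and `register_image_params` is a hypothetical helper, not part of any SDK.

```python
def register_image_params(snapshot_id, name="myWebAppAMI", root_device="/dev/xvda"):
    """Build keyword arguments for the EC2 RegisterImage call from a root-volume snapshot."""
    return {
        "Name": name,
        "Architecture": "x86_64",        # assumption: 64-bit x86 template instance
        "VirtualizationType": "hvm",
        "RootDeviceName": root_device,   # assumption: depends on the base image
        "BlockDeviceMappings": [
            {
                "DeviceName": root_device,
                "Ebs": {"SnapshotId": snapshot_id, "DeleteOnTermination": True},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical snapshot ID, for illustration only.
    params = register_image_params("snap-0123456789abcdef0")
    print(params)
    # With boto3 installed and credentials configured, this could be sent as:
    #   boto3.client("ec2").register_image(**params)
```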



Problem - How to update/patch the EC2 instances in a private subnet when there is neither inbound nor outbound internet access?
Solution - NAT instance

  • A NAT instance allows your private instances outgoing connectivity to the internet while at the same time blocking inbound traffic from the internet.
  • A NAT instance is a regular EC2 instance launched from a NAT-optimized HVM AMI.

Implementation -
  • In the public subnet (public_subnet1a), I have provisioned a NAT instance to allow outbound internet access for the DB_server instance in the private subnet (private_subnet1a).
  • After launching this instance, I disabled its Source/Destination check. To do this, right click on your NAT Instance within the AWS Console and select ‘Networking > Change Source/Dest. Check > Yes, Disable’.
  • NAT Gateways provide the same functionality as a NAT instance; however, a NAT Gateway is an AWS managed NAT service. As a result, NAT Gateways offer greater availability and bandwidth and require less configuration and administration. This is a costlier option, though, and better suited to very large-scale applications.


Problem - How to have SSH access of EC2 instances in private subnets?
Solution - Bastion hosts

  • A Bastion Host is a special-purpose computer on a network, designed and configured to withstand attacks.
  • It acts as a jump server, allowing you to use SSH or RDP to log in to other instances within private subnets.

Implementation -

  • As we know, the servers in private subnets are not configured to talk to the outside network. I have updated the Private Security Group to accept all inbound/outbound traffic from the Public Security Group.
  • In the public subnets, EC2 instances in an Auto Scaling group act as bastion hosts, allowing inbound Secure Shell (SSH) access to the EC2 instances in the private subnets.



Problem - How to maintain performance of website with the increasing site traffic?
Solution - Elastic Load Balancer (ELB)
Implementation -

  • An Application Load Balancer must be deployed into at least two subnets; here it spans Public_subnet1a and Public_subnet1b and distributes HTTP and HTTPS requests across the WordPress instances (WPS1, WPS2, ..., WPSn).
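
As an illustration of what "distributing requests" means, the sketch below spreads requests across targets in round-robin order, which is the default routing algorithm for an Application Load Balancer target group. It is a toy model in plain Python, not the ALB implementation; the target names and the `distribute` helper are made up for the example.

```python
from collections import Counter
from itertools import cycle


def distribute(requests, targets):
    """Assign each request to the next target in rotation (round robin)."""
    rotation = cycle(targets)
    return [(request, next(rotation)) for request in requests]


if __name__ == "__main__":
    targets = ["WPS1", "WPS2", "WPS3"]            # hypothetical instance names
    requests = [f"GET /post/{i}" for i in range(9)]
    assignment = distribute(requests, targets)
    # Count how many requests each instance received.
    print(Counter(target for _, target in assignment))
```

With three healthy targets and nine requests, each instance receives three requests, so no single WordPress server becomes the bottleneck.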



Problem - Increasing site traffic means increasing query load on the database. How do we cope with it?
Solution - ElastiCache

  • Caching improves application performance by storing critical pieces of data in memory for low latency access.
  • Cached information may include the results of I/O-intensive database queries or the results of computationally-intensive calculations.
  • Suppose we are running an online business and customers keep asking for information about a particular product. Instead of querying the DB for that product every time, we can cache the product data using ElastiCache.

Implementation -

  • I have used Redis nodes for caching database queries.
  • As Redis keeps its data in memory, we should create the Redis node with a Memory Optimized instance type (X1, R5, R4, R3).
  • The port can be left at the Redis default, 6379.
  • We can set the number of replicas in the cluster from 0 to 5.
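
The caching idea above is usually implemented as the cache-aside pattern: look in the cache first, fall back to the database on a miss, then store the result for the next caller. Below is a minimal sketch of that pattern. To stay self-contained it uses a small in-memory class as a stand-in for the Redis node; `TTLCache`, `get_product` and the `query_db` callback are hypothetical names, and a real deployment would talk to the ElastiCache endpoint through a Redis client instead.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a Redis node: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]      # expired: behave like a cache miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)


def get_product(product_id, cache, query_db):
    """Cache-aside read: try the cache, fall back to the DB, then populate the cache."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    row = query_db(product_id)        # the expensive database query
    cache.set(key, row)
    return row


if __name__ == "__main__":
    db_calls = []

    def fake_db(product_id):
        db_calls.append(product_id)
        return {"id": product_id, "name": "demo product"}

    cache = TTLCache(ttl_seconds=60)
    get_product(7, cache, fake_db)
    get_product(7, cache, fake_db)
    print("database was queried", len(db_calls), "time(s)")
```

The expiry time is a design choice: a short TTL keeps product data fresh, while a long TTL shields the database more.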



Problem - How to control the inbound/outbound traffic in the VPC?
Solution - Security Groups and Network Access Control Lists

  • Security Groups add a security layer to EC2 instances, controlling both inbound and outbound traffic at the instance level.
  • NACLs add an additional layer of security associated with subnets, controlling both inbound and outbound traffic at the subnet level.

Implementation -

  • I have created 2 Security Groups - test_security_group and test_security_group_private.
  • I have declared the routes in the route tables and associated the subnets with the respective table - public_route_table and private_route_table
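
A security group is an allow-list: traffic is permitted only if at least one rule matches, and everything else is dropped. The sketch below models just the inbound side in plain Python. It assumes that test_security_group is the public group and that the private group accepts all ports from it, as described in the bastion section; the `InboundRule` class and `is_inbound_allowed` function are illustrative names, not an AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundRule:
    source_group: str   # security group the traffic must originate from
    from_port: int      # first port of the allowed range
    to_port: int        # last port of the allowed range


def is_inbound_allowed(rules, source_group, port):
    """Allow traffic only if some rule matches both the source group and the port."""
    return any(
        rule.source_group == source_group and rule.from_port <= port <= rule.to_port
        for rule in rules
    )


# Assumption: the private group accepts every port from the public group.
PRIVATE_SG_RULES = [InboundRule("test_security_group", 0, 65535)]

if __name__ == "__main__":
    print(is_inbound_allowed(PRIVATE_SG_RULES, "test_security_group", 22))
    print(is_inbound_allowed(PRIVATE_SG_RULES, "some_other_group", 22))
```

In a stricter setup, the single wide rule could be narrowed to the ports actually needed, such as 22 for SSH from the bastion and 3306 for MySQL.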



Problem - Why use an RDS instance instead of running a MySQL database on an EC2 instance?
Solution - RDS (MySQL)
Implementation -

  • Creating an RDS instance instead of deploying a MySQL database on an EC2 instance is better in many ways.
  • It frees you from time-consuming database administration tasks such as provisioning, backups, software patching, monitoring, and hardware scaling.
  • We do not have to purchase a database license separately.
  • Amazon RDS provides high availability for MySQL through its Multi-AZ capability, and this reduces the risk of data loss.



Problem - How should I keep an up-to-date backup of our WebApp?
Solution - Amazon S3

  • S3 buckets store the data as objects. We have many advantages of backing up any webapp on these buckets because of their high availability and durability.

Implementation -

  • I am using the S3-IA or S3-RRS storage class instead of S3-Standard to reduce the cost of this model, as the backup does not need to be accessed frequently.
  • I have created a Cron job running on the WebApp EC2 instance to take a backup of the site hourly.
  • 0 * * * * aws s3 sync --delete /var/www/html s3://<bucket_name>



Problem - How to minimize the high latency if users are accessing this web application from US/Europe or any other part of globe?
Solution - CloudFront
Implementation -

  • I have set up CloudFront as the Content Delivery Network for static content, because static files (such as JPG images, media and audio files) are heavy and put unnecessary load on the EC2 instances.
  • All the static content is served to users worldwide from the edge location nearest to them. This has helped a lot in keeping latency low regardless of where the EC2 instances are placed.



Problem - The Web application needs to talk to an EC2 instance in a different VPC, how should we do it?
Solution - VPC Peering

  • A VPC peering connection is a networking connection between two VPCs that enables you to route traffic between them using private IPv4 addresses or IPv6 addresses.
  • Instances in either VPC can communicate with each other as if they are within the same network.
  • You can create a VPC peering connection between your own VPCs, or with a VPC in another AWS account.
  • The VPCs can be in different regions (known as an inter-region VPC peering connection), which provides a simple and cost-effective way to share resources between regions.

Implementation -

  • This peering was added to the architecture because the WebApp running in the main VPC might need to communicate with, or fetch data from, an EC2 instance in a different VPC.
  • I have created a VPC peering connection named test_peer between the 2 VPCs (testVPC and new-testVPC), which has made communication possible between them.
  • Additionally, I had to add an entry to the route table of each public subnet whose EC2 instances need to communicate.
    • public_route_table -> 10.1.0.0/16 test_peer
    • new_public_route_table -> 10.0.0.0/16 test_peer
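
To see why these two entries are enough, recall that a VPC route table picks the most specific (longest-prefix) route that matches the destination address. The sketch below reproduces that lookup with Python's standard `ipaddress` module. The "local" route is the one AWS adds for the VPC's own range, the 0.0.0.0/0 internet gateway entry is my assumption about what a public route table contains, and `lookup` is a hypothetical helper name.

```python
import ipaddress


def lookup(route_table, destination_ip):
    """Return the target of the most specific route matching destination_ip, or None."""
    address = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # Longest prefix wins, e.g. a /16 beats the 0.0.0.0/0 default route.
    _, target = max(matches, key=lambda match: match[0].prefixlen)
    return target


PUBLIC_ROUTE_TABLE = {
    "10.0.0.0/16": "local",              # testVPC's own range
    "10.1.0.0/16": "test_peer",          # entry added for the peering connection
    "0.0.0.0/0": "internet-gateway",     # assumption: default route of a public subnet
}

if __name__ == "__main__":
    for ip in ("10.0.1.5", "10.1.3.4", "8.8.8.8"):
        print(ip, "->", lookup(PUBLIC_ROUTE_TABLE, ip))
```

Without the 10.1.0.0/16 entry, traffic for new-testVPC would match only the default route and leave through the internet gateway instead of the peering connection.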

I hope this information will be helpful for aspiring solution architects. If you have any doubt/query, please feel free to ask in the comment section below.
