Amazon Web Services (AWS) offers a wide range of services that we can leverage to implement High Availability (HA) for any web application that we deploy on the AWS cloud.
I am going to create a web application which is highly available, resilient, optimized and secure. It delivers high performance and low latency every time users access it, from wherever they access it.
I have deployed this application with the following parameters:
Region - ap-south-1 (Asia Pacific Mumbai)
Availability Zone - AZ-a and AZ-b
A highly available architecture that spans two Availability Zones. Multi-AZ deployment.
VPC - I have created 2 Virtual Private Clouds
- testVPC (10.0.0.0/16)
- new-testVPC (10.1.0.0/16)
Subnets - I have created 2 public subnets and 2 private subnets in testVPC.
Under Availability Zone AZ-b
- Public_subnet1b = 10.0.1.0/24
- Private_subnet1b = 10.0.2.0/24
Under Availability Zone AZ-a
- Public_subnet1a = 10.0.20.0/24
- Private_subnet1a = 10.0.21.0/24
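The CIDR layout above can be sanity-checked before creating anything in AWS. This is a minimal sketch using Python's standard `ipaddress` module. The CIDR values are the ones listed above, while the dictionary and function names are only illustrative.

```python
import ipaddress
from itertools import combinations

# CIDR blocks taken from the lists above.
VPCS = {
    "testVPC": ipaddress.ip_network("10.0.0.0/16"),
    "new-testVPC": ipaddress.ip_network("10.1.0.0/16"),
}
SUBNETS = {
    "Public_subnet1b": ipaddress.ip_network("10.0.1.0/24"),
    "Private_subnet1b": ipaddress.ip_network("10.0.2.0/24"),
    "Public_subnet1a": ipaddress.ip_network("10.0.20.0/24"),
    "Private_subnet1a": ipaddress.ip_network("10.0.21.0/24"),
}


def subnets_outside(vpc, subnets):
    """Return the names of subnets that do not fit inside the VPC range."""
    return [name for name, net in subnets.items() if not net.subnet_of(vpc)]


def overlapping_pairs(networks):
    """Return every pair of network names whose address ranges overlap."""
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


print(subnets_outside(VPCS["testVPC"], SUBNETS))  # subnets not inside testVPC
print(overlapping_pairs(SUBNETS))                 # subnets that collide
print(overlapping_pairs(VPCS))                    # VPC ranges that collide
```

All three lists come back empty for this layout: every subnet sits inside testVPC, no two subnets overlap, and the two VPC ranges are disjoint, which is a prerequisite for peering them later.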
Route53 - It is a DNS service used for domain name registration, maintaining DNS record sets and setting up different routing policies for domain names.
Schematic diagram of the HA architecture below:
I have explained this in a Problem and Solution fashion: the problems I faced while designing an HA architecture and how I implemented the solution to each one.
Problem - How to manage unpredictable load on webserver EC2 instances in our architecture?
Solution - Auto-Scaling Groups
- An Auto-Scaling group is dependent on 2 things: Launch Configuration and Scaling Policies.
Implementation -
- First choose the Launch Configuration and the VPC and Subnet in which we are implementing the Auto-Scaling group.
- Define the group size starting with 1 instance.
- Next, define the minimum and maximum size of our group (Min=1 and Max=3 instances).
- Create an alarm for Increase Group Size: Avg. CPU utilization >= 75% for a consecutive period of 5 minutes.
- You can optionally send a notification to an SNS topic when the above alarm is triggered.
- Similarly, create an alarm for Decrease Group Size: Avg. CPU utilization < 60% for a consecutive period of 5 minutes.
- Under the Instances tab of the selected Auto-Scaling group, we can see the lifecycle/status of the instances.
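To see how the two alarms and the group limits interact, here is a small model of the scaling decision. It is only a sketch, not how AWS evaluates alarms internally. It assumes each alarm changes the group by one instance (the step size is not stated above), and the function name is made up for illustration.

```python
def desired_capacity(current, avg_cpu, minimum=1, maximum=3,
                     scale_out_at=75.0, scale_in_below=60.0):
    """Return the group size after one alarm evaluation.

    avg_cpu is the average CPU utilization (percent) over the 5-minute period.
    Assumption: each alarm adds or removes exactly one instance.
    """
    if avg_cpu >= scale_out_at:
        target = current + 1      # Increase Group Size alarm fired
    elif avg_cpu < scale_in_below:
        target = current - 1      # Decrease Group Size alarm fired
    else:
        target = current          # between the thresholds: no change
    # The group never leaves the [minimum, maximum] range.
    return max(minimum, min(maximum, target))


size = 1
for cpu in [80, 90, 95, 70, 40, 30, 20]:
    size = desired_capacity(size, cpu)
    print(f"avg CPU {cpu}% -> {size} instance(s)")
```

With this sample load the group grows from 1 to 3 instances, stays at 3 while the CPU is between the thresholds, and shrinks back to 1 as the load drops. The gap between 60% and 75% keeps the group from flapping between sizes.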
Problem - How to get an auto-configured server every time an instance is added to the Auto-Scaling group?
Solution - Snapshot
- A Snapshot is a backup of a single EBS volume. A Snapshot on its own is not a bootable copy, but you can create an AMI from it, and an AMI is bootable.
Implementation -
- Provision a template EC2 instance -> Install Apache, WordPress and all the key configurations.
- Take a Snapshot of the root volume of the EC2 instance (we can terminate the template instance after taking the Snapshot).
- Create an AMI from this Snapshot and name it myWebAppAMI.
- Use this AMI in the Launch Configuration of the Auto Scaling group to spawn auto-configured EC2 instances in the subnets.
- In this manner we save boot time for newly created EC2 instances, and there is no need to configure a server remotely every time it boots up.
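The snapshot and AMI steps can also be scripted instead of done in the console. The sketch below assumes the `ec2` argument is a boto3 EC2 client (for example `boto3.client("ec2", region_name="ap-south-1")`). The function name, the root device name `/dev/xvda` and the `x86_64` architecture are assumptions, so check them against your own template instance.

```python
def create_ami_from_root_volume(ec2, volume_id, ami_name="myWebAppAMI",
                                root_device="/dev/xvda"):
    """Snapshot an EBS root volume and register an AMI backed by it.

    ec2 is expected to behave like boto3.client("ec2").
    volume_id is the root volume of the template instance.
    root_device is an assumption; use the device name of your instance.
    """
    snapshot = ec2.create_snapshot(
        VolumeId=volume_id,
        Description="Root volume of the template web server",
    )
    snapshot_id = snapshot["SnapshotId"]

    # An AMI can only be registered once the snapshot has completed.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    image = ec2.register_image(
        Name=ami_name,
        Architecture="x86_64",          # assumption: 64-bit Intel/AMD instance
        VirtualizationType="hvm",
        RootDeviceName=root_device,
        BlockDeviceMappings=[
            {"DeviceName": root_device, "Ebs": {"SnapshotId": snapshot_id}},
        ],
    )
    return image["ImageId"]
```

The returned image ID is what the Launch Configuration of the Auto Scaling group would then reference.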
Problem - How to update/patch the EC2 instance in a private subnet when it has neither inbound nor outbound internet access?
Solution - NAT instance
- A NAT instance allows your private instances outgoing connectivity to the internet while at the same time blocking inbound traffic from the internet.
- A NAT instance is a normal EC2 instance launched from a NAT-optimized HVM AMI.
Implementation -
- In the public subnet (Public_subnet1a), I provisioned a NAT instance to allow outbound internet access for the DB_server instance in the private subnet (Private_subnet1a).
- After launching this instance I disabled its Source/Destination check. To do this, right click on your NAT instance within the AWS Console and select ‘Networking > Change Source/Dest. Check > Yes, Disable’.
- NAT Gateways provide the same functionality as a NAT instance; however, a NAT Gateway is an AWS managed NAT service. As a result, NAT Gateways offer greater availability and bandwidth and require less configuration and administration. For this setup it was the costlier option, and it is better suited to very large scale applications.
Problem - How to have SSH access of EC2 instances in private subnets?
Solution - Bastion hosts
- A Bastion Host is a special purpose computer on a network, designed and configured to withstand attacks.
- It acts as a jump server, allowing you to use SSH or RDP to log in to other instances within private subnets.
Implementation -
- As we know, the servers in private subnets are not configured to talk to the outside network. I have updated the Private Security Group to accept all inbound/outbound traffic from the Public Security Group.
- In the public subnets, the bastion hosts run as EC2 instances in an Auto Scaling group to allow inbound Secure Shell (SSH) access to EC2 instances in private subnets.
Problem - How to maintain performance of website with the increasing site traffic?
Solution - Elastic Load Balancer (ELB)
Implementation -
- An Application Load Balancer must be deployed into at least two subnets (Public_subnet1a & Public_subnet1b) to distribute HTTP and HTTPS requests across multiple WordPress instances (WPS1,WPS2,..,WPSn).
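As a rough illustration of what "distributing requests" means, the sketch below hands requests to the WordPress instances in round-robin order. It is a simplified model and not the ALB implementation; the target names follow the WPS1..WPSn naming above, and three targets is an arbitrary choice.

```python
from collections import Counter
from itertools import cycle


def distribute(requests, targets):
    """Assign each request to a target in round-robin order."""
    rotation = cycle(targets)
    return [(request, next(rotation)) for request in requests]


targets = ["WPS1", "WPS2", "WPS3"]     # hypothetical: three registered instances
assignments = distribute(range(9), targets)
print(Counter(target for _, target in assignments))
```

Nine requests over three targets give each instance three requests, so no single web server carries the whole load.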
Problem - Increasing site traffic means increasing query load on the Database. How to cope?
Solution - ElastiCache
- Caching improves application performance by storing critical pieces of data in memory for low latency access.
- Cached information may include the results of I/O-intensive database queries or the results of computationally-intensive calculations.
- Suppose we are running an online business and customers continuously ask for information about a particular product. Instead of querying the DB every time for that product, we can cache the product data using ElastiCache.
Implementation -
- I have used Redis nodes for caching database queries.
- As the Redis cache depends entirely on memory, we should create the Redis node with a Memory Optimized instance type (X1, R5, R4, R3).
- The port should be kept at the default, 6379.
- We can set the number of Replicas, from 0 to 5, to be a part of the cluster.
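The product example above follows the cache-aside pattern: look in the cache first and query the database only on a miss. In the minimal sketch below a plain dictionary stands in for the Redis node so that it runs anywhere. The class name, the 300-second TTL and the `fake_db_lookup` function are assumptions made for illustration.

```python
import time


class ProductCache:
    """Cache-aside lookups; a dict stands in for the Redis node."""

    def __init__(self, load_from_db, ttl_seconds=300, clock=time.monotonic):
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}          # product_id -> (expires_at, value)
        self.db_calls = 0         # how many times the database was queried

    def get(self, product_id):
        entry = self._store.get(product_id)
        if entry is not None and entry[0] > self._clock():
            return entry[1]       # cache hit: no database query
        self.db_calls += 1        # cache miss or expired entry
        value = self._load_from_db(product_id)
        self._store[product_id] = (self._clock() + self._ttl, value)
        return value


def fake_db_lookup(product_id):
    """Hypothetical stand-in for an expensive MySQL query."""
    return {"id": product_id, "name": f"product-{product_id}"}


cache = ProductCache(fake_db_lookup)
for _ in range(5):
    cache.get(42)
print(cache.db_calls)  # five lookups, one database query
```

Only the first of the five lookups reaches the database. The TTL bounds how stale a cached product can get, so it should be chosen according to how often the product data changes.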
Problem - How to control the inbound/outbound traffic in the VPC?
Solution - Security Groups and Network Access Control Lists
- Security Groups add a security layer to EC2 instances that control both inbound and outbound traffic at the instance level.
- NACL also adds an additional layer of security associated with subnets that control both inbound and outbound traffic at the subnet level.
Implementation -
- I have created 2 Security Groups - test_security_group and test_security_group_private.
- I have declared the routes in the route tables and associated the subnets with the respective table - public_route_table and private_route_table
Problem - Why not create a MySQL Database on an EC2 instance instead of an RDS instance?
Solution - RDS (MySQL)
Implementation -
- Creating an RDS endpoint instead of deploying a MySQL Database on an EC2 instance is better in many ways.
- It frees you from time-consuming database administration tasks such as provisioning, backups, software patching, monitoring, and hardware scaling.
- RDS for MySQL uses the general-public-license model, so we do not have to purchase a license separately.
- Amazon RDS provides high availability for MySQL using its multi-availability zone (Multi-AZ) capability, and this reduces the risk of data loss.
Problem - How should I keep an active most recent backup of our WebApp?
Solution - Amazon S3
- S3 buckets store the data as objects. We have many advantages of backing up any webapp on these buckets because of their high availability and durability.
Implementation -
- I am using the S3-IA or S3-RRS storage class instead of S3-Standard to reduce the cost of this model, as we do not need to access the backup frequently.
- I have created a Cron job running on the WebApp EC2 instance to back up the site hourly.
- 0 * * * * aws s3 sync --delete /var/www/html s3://<bucket_name>
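The first field of a crontab line is the minute, so how often a job fires within an hour depends on that field. The sketch below counts the runs per hour for a given line. It supports only a small subset of cron syntax ('*', '*/N' and a single number), and the helper names are made up for illustration.

```python
def minutes_matched(minute_field):
    """Return the minutes (0-59) matched by a cron minute field.

    Supports only '*', '*/N' and a single number; real cron accepts more.
    """
    if minute_field == "*":
        return list(range(60))
    if minute_field.startswith("*/"):
        return list(range(0, 60, int(minute_field[2:])))
    return [int(minute_field)]


def runs_per_hour(cron_line):
    """Count how often a crontab line fires within an hour that it matches."""
    return len(minutes_matched(cron_line.split()[0]))


command = "aws s3 sync --delete /var/www/html s3://<bucket_name>"
print(runs_per_hour("0 * * * * " + command))   # minute pinned to 0
print(runs_per_hour("* * * * * " + command))   # minute left as a wildcard
```

Pinning the minute to 0 gives one run per hour, while a wildcard in the minute field fires the sync 60 times an hour.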
Problem - How to minimize the high latency if users are accessing this web application from US/Europe or any other part of globe?
Solution - CloudFront
Implementation -
- I have implemented CloudFront as the Content Delivery Network for static content, because static content (such as JPG images, media and audio files) is heavy and puts unnecessary load on the EC2 instances.
- All the static content is served to users worldwide from their nearest edge locations. This has helped a lot in maintaining low latency irrespective of the placement of the EC2 instances.
Problem - The Web application needs to talk to an EC2 instance in a different VPC, how should we do it?
Solution - VPC Peering
- A VPC peering connection is a networking connection between two VPCs that enables you to route traffic between them using private IPv4 addresses or IPv6 addresses.
- Instances in either VPC can communicate with each other as if they are within the same network.
- You can create a VPC peering connection between your own VPCs, or with a VPC in another AWS account.
- The VPCs can be in different regions (also known as an inter-region VPC peering connection). This provides a simple and cost-effective way to share resources between regions.
Implementation -
- This peering was added to the architecture because the WebApp running in the main VPC might need to communicate with or fetch data from an EC2 instance in a different VPC.
- I have created a VPC peering connection named test_peer between the 2 VPCs (testVPC and new-testVPC), which has made communication possible between these VPCs.
- Additionally, I had to add an entry to the route table of each public subnet in which the EC2 instances need to communicate.
- public_route_table -> 10.1.0.0/16 test_peer
- new_public_route_table -> 10.0.0.0/16 test_peer
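To show why both route entries are needed, the sketch below models the two route tables and picks the most specific matching route for a destination address, which is how a VPC route table chooses a target. The "local" entries represent the implicit route each VPC has for its own CIDR block. The sample IP addresses are hypothetical, and other routes (such as a default route to an internet gateway) are left out.

```python
import ipaddress

# Peering routes as listed above, plus the implicit "local" route of each VPC.
ROUTE_TABLES = {
    "public_route_table": {"10.0.0.0/16": "local", "10.1.0.0/16": "test_peer"},
    "new_public_route_table": {"10.1.0.0/16": "local", "10.0.0.0/16": "test_peer"},
}


def next_hop(table_name, destination_ip):
    """Return the target of the most specific route matching the address."""
    address = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in ROUTE_TABLES[table_name].items()
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no matching route: the traffic goes nowhere
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(next_hop("public_route_table", "10.1.0.25"))      # towards new-testVPC
print(next_hop("new_public_route_table", "10.0.1.10"))  # reply towards testVPC
print(next_hop("public_route_table", "10.0.20.7"))      # stays inside testVPC
```

The first two lookups resolve to test_peer and the third to local. If the entry in new_public_route_table were missing, requests would still reach the other VPC but the replies would have no route back.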
I hope this information will be helpful for aspiring solution architects. If you have any doubts or queries, please feel free to ask in the comment section below.