Friday, 13 September 2019

Docker in a glimpse

Docker is a great tool for building microservices, allowing you to create cloud-based applications and systems.
It is a container management service based on the concept of "Develop, Ship and Run Anywhere": you develop apps, ship them in containers, and deploy those containers on any platform.

With the initial release of Docker in March 2013, a new technical term took off: containerization. It has been revolutionizing the older concept of virtualization, where we need to build a complete guest OS on top of a host OS to run an application.

Virtual machines are resource-intensive, as they consume a lot of resources (compute, memory, etc.) at runtime, whereas containers are lightweight and boot up in seconds.

Virtualization vs Containerization





Architecture of Docker


The basic architecture of Docker is a Client-Server architecture and consists of 3 major parts:

1. Docker Host - The Docker Host runs the Docker Daemon, which listens for Docker API requests such as ‘docker run’, ‘docker build’ and so on.
It manages Docker objects such as images, containers, networks, and volumes.

2. Docker Client - The Docker Client is used to trigger Docker commands. The commands you type in the CLI are sent by the client to the Docker Daemon through the Docker REST API. A client can communicate with more than one daemon.

3. Registry - The Registry is a stateless, highly scalable server-side application that stores and lets you distribute Docker images. You can create your own image and upload it to Docker Hub or any other registry.
When you run docker pull or docker run, the required images are pulled from your configured registry (Docker Hub by default).
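To make the registry lookup concrete, here is a simplified Python sketch of how a short name such as nginx is expanded before an image is pulled. It follows Docker's defaults (Docker Hub at docker.io, the implicit library/ namespace for official images, and the latest tag); the function name is hypothetical, and digest references (@sha256:...) are deliberately not handled.

```python
"""Simplified model of Docker image-reference normalization (not Docker's code)."""
from typing import Tuple

DEFAULT_REGISTRY = "docker.io"  # Docker Hub
DEFAULT_TAG = "latest"


def parse_image_reference(ref: str) -> Tuple[str, str, str]:
    """Split an image reference into (registry, repository, tag)."""
    # The tag is whatever follows the last ":" *after* the last "/",
    # so the port in "host:5000/app" is not mistaken for a tag.
    name, tag = ref, DEFAULT_TAG
    last_colon = ref.rfind(":")
    if last_colon > ref.rfind("/"):
        name, tag = ref[:last_colon], ref[last_colon + 1:]

    # The first path component is a registry host only if it looks like one.
    first, _, rest = name.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry, repository = DEFAULT_REGISTRY, name

    # Official Docker Hub images live under the implicit "library/" namespace.
    if registry == DEFAULT_REGISTRY and "/" not in repository:
        repository = "library/" + repository
    return registry, repository, tag


if __name__ == "__main__":
    for ref in ["nginx", "ubuntu:18.04", "myuser/myapp:v2",
                "registry.example.com:5000/team/app:1.2"]:
        print(ref, "->", parse_image_reference(ref))
```

So a plain docker pull nginx talks to Docker Hub and asks for library/nginx:latest, while a reference that starts with a hostname (the example.com registry above is hypothetical) goes to that registry instead.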





What are containers?


A container is a special type of process that is isolated from other processes using Linux kernel features such as namespaces and control groups (cgroups). Containers are assigned resources that no other process can access, and they cannot access any resources that are not explicitly assigned to them.


The technology behind the containers!!


Docker containers evolved from LXC (Linux Containers).
LXC is a well-known and heavily tested low-level Linux container runtime. It has been in active development since 2008 and builds on well-known containerization features of the Linux kernel.
The goal of LXC is to create an environment as close as possible to a standard Linux installation, but without the need for a separate kernel. You can read more about LXC here.

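LXD (the LXD daemon) is a related project built on top of LXC; the Docker Daemon/Engine did not evolve from it. Docker originally used LXC as its execution driver and, from version 0.9, made its own libcontainer library the default.
LXD is a next-generation system container manager. It offers a completely fresh and intuitive user experience with a single command line tool to manage your containers. Containers can be managed over the network in a transparent way through a REST API.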


Features of Containers:



  • Complete isolation - Two containerized processes can run side-by-side on the same computer, but they can’t interfere with each other. They can’t access each other’s data unless explicitly configured to do so.
  • Shared physical infrastructure - We don't need separate hardware for running different applications; they can all share the same hardware, and shared hardware means lower costs.
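  • More secure - Only the hardware (and the host kernel) is shared, while processes and data remain isolated, which makes containers quite secure. Keep in mind, though, that the shared kernel makes this isolation weaker than that of a virtual machine.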
  • Faster scaling of applications - Containers can be created and destroyed in negligible time, and there is no need to purchase physical infrastructure to scale up an application, as there was many years ago.



The future of Docker


Technology has shifted from mainframes to PCs, from bare metal to virtualization, and from data centers to the cloud. Now is the time to move from hosts to containers (going serverless).
As per trend analysis, by 2020 more than 50% of global organizations will be running containers in production.

After going through many articles, I infer that the technology trend has leaned towards Kubernetes since DockerCon 2017, as Swarm (Docker's in-house container orchestration tool) started facing tough competition.

Still, it is the simplicity of Docker Swarm as a container orchestrator that has taken Docker to this level.

We haven't seen any recent development in the Docker Swarm repository for quite a long time. (https://github.com/docker/swarm/wiki)

Even Docker itself has adopted Kubernetes as a container orchestrator.

Is this an indication that Docker Swarm will soon be out of the picture, as more and more industries adopt Kubernetes in their architecture?

Needless to say, Docker has survived so far and will keep running as an organization, regardless of speculation that some big organization will acquire it. A lot of new features are coming up, and development is still going on, for example:

  • Incorporating cgroups v2, which will give Docker better resource isolation and management capabilities.
  • Adopting a P2P model for image delivery, distributing images using something like BitTorrent Sync.




See you in the upcoming tutorials for taking a deeper dive into Docker.

Monday, 2 September 2019

ESP8266 IOT sensing with Blynk

Basically the internet of things, or IoT, is a system of interrelated computing devices, mechanical and digital machines, objects, animals or people that are provided with unique identifiers (UIDs) and the ability to transfer data over a network without requiring human-to-human or human-to-computer interaction.

Evolution of IOT


The internet of things is also a natural extension of SCADA (supervisory control and data acquisition), a category of software application program for process control, the gathering of data in real time from remote locations to control equipment and conditions.
SCADA systems include hardware and software components. The hardware gathers data and feeds it into a computer with SCADA software installed, where it is processed and presented in a timely manner. SCADA has evolved such that late-generation SCADA systems became the first-generation IoT systems.


Designing our first IoT system


Here we are using an ESP8266 NodeMCU module, which collects real-time data from the temperature/humidity sensor (DHT-22) and the soil moisture sensor once every second.

I have written Arduino code that collects this sensor data and sends it to the Blynk-hosted API with the help of the Blynk libraries.

  1. Download the latest Blynk_Release_vXX.zip file from the GitHub page and extract it into the libraries folder of the Arduino software.
  2. Download the Blynk app on your Android or iOS device and follow these steps to get started.
  3. Download the code from my github repository.
git clone https://github.com/saurabh221089/esp8266_iot_blynk.git

Generate your auth token from the app and update it in the code.
char auth[] = "53e4da8793764b6197fc44a673ce4e21";
Change the SSID and password in the code to match your Wi-Fi network.
char ssid[] = "wifi4u";  //Enter your WIFI Name
char pass[] = "abcd1234";  //Enter your WIFI Password

In the project you created in the app, assign the virtual pins used by the variables in the program, so that the app pulls the data from the API and shows it on the dashboard.

Blynk.virtualWrite(V5, h);  //V5 is for Humidity
Blynk.virtualWrite(V6, t);  //V6 is for Temperature
Blynk.virtualWrite(V7, m);  //V7 is for Soil Moisture
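The repository's exact conversion for the soil moisture value m isn't shown here, so the following is only a hypothetical Python sketch of the common approach: calibrate the capacitive sensor at its dry and wet readings and map linearly between them. The calibration numbers are placeholders, and it assumes the reading is high when dry and low when wet on the ESP8266's 10-bit ADC (0-1023). On the board itself, the Arduino map() and constrain() functions do the same job.

```python
"""Hypothetical sketch: convert a raw capacitive soil-moisture reading to a percentage."""

DRY_READING = 850  # hypothetical calibration: sensor held in dry air
WET_READING = 400  # hypothetical calibration: sensor dipped in water


def moisture_percent(raw: int, dry: int = DRY_READING, wet: int = WET_READING) -> float:
    """Map `dry` -> 0% and `wet` -> 100% linearly, clamped to 0..100."""
    percent = (dry - raw) * 100.0 / (dry - wet)
    return max(0.0, min(100.0, percent))


if __name__ == "__main__":
    for raw in (900, 850, 625, 400, 300):
        print(raw, "->", round(moisture_percent(raw), 1), "%")
```

Measure your own sensor's dry and wet values before relying on the percentages; capacitive probes vary quite a bit from one unit to the next.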

Flash this code on your NodeMCU module and run the app at the same time to start monitoring.

If you just want to build a standalone weather monitoring module without any networking involved, you can create a weather monitoring station with an ESP8266 and an OLED display.

All the best, my friends, for your exploration of the IoT world. Happy IoTing!!

ESP8266 weather monitor

This tutorial will help you set up your own weather monitor with a NodeMCU development board, which uses the ESP8266 module as its microcontroller.

We are going to build a weather monitor that shows all the vital stats for my home plants:
temperature (°C), humidity, heat index and soil moisture percentage. This has helped me maintain the right levels for healthy growth of my indoor plants.
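The heat index combines temperature and humidity into a "feels like" temperature. The Python sketch below follows the US National Weather Service equations (a simple estimate first, then the Rothfusz regression with its adjustments); as far as I know, the Adafruit DHT library's computeHeatIndex() uses the same NWS approach, but treat this version as an illustration rather than the code running on the NodeMCU.

```python
"""Heat index sketch based on the US National Weather Service equations."""
import math


def heat_index_f(t_f: float, rh: float) -> float:
    """Heat index in Fahrenheit from temperature (F) and relative humidity (%)."""
    # Simple estimate; NWS uses it when its average with T stays below 80 F.
    simple = 0.5 * (t_f + 61.0 + (t_f - 68.0) * 1.2 + rh * 0.094)
    if (simple + t_f) / 2 < 80:
        return simple

    # Rothfusz regression for hotter conditions.
    hi = (-42.379 + 2.04901523 * t_f + 10.14333127 * rh
          - 0.22475541 * t_f * rh - 0.00683783 * t_f ** 2
          - 0.05481717 * rh ** 2 + 0.00122874 * t_f ** 2 * rh
          + 0.00085282 * t_f * rh ** 2 - 0.00000199 * t_f ** 2 * rh ** 2)

    # NWS adjustments for very dry or very humid air in specific ranges.
    if rh < 13 and 80 <= t_f <= 112:
        hi -= ((13 - rh) / 4) * math.sqrt((17 - abs(t_f - 95)) / 17)
    elif rh > 85 and 80 <= t_f <= 87:
        hi += ((rh - 85) / 10) * ((87 - t_f) / 5)
    return hi


def heat_index_c(t_c: float, rh: float) -> float:
    """Heat index in Celsius: convert to Fahrenheit, compute, convert back."""
    t_f = t_c * 9 / 5 + 32
    return (heat_index_f(t_f, rh) - 32) * 5 / 9


if __name__ == "__main__":
    print(round(heat_index_c(32.0, 50.0), 1))  # a hot, moderately humid afternoon
```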

What is an ESP8266?


The ESP8266 is a System on a Chip (SoC) manufactured by the Chinese company Espressif. It consists of a Tensilica L106 32-bit microcontroller unit (MCU) and a Wi-Fi transceiver. It has 11 GPIO pins* (General Purpose Input/Output pins) and an analog input as well. This means that you can program it like any normal Arduino or other microcontroller.

And on top of that, you get Wi-Fi communication, so you can use it to connect to your Wi-Fi network, connect to the Internet, host a web server with real webpages, let your smartphone connect to it, etc. The possibilities are endless! It's no wonder that this chip has become one of the most popular IoT chips on the market today.

Pre-Requirements


1. Arduino software to upload the .ino sketch to the NodeMCU
2. To program the ESP8266, you'll need a plugin for the Arduino IDE. It can be downloaded manually from GitHub, but it is easier to just add its URL in the Arduino IDE:
  1. Open the Arduino IDE.
  2. Go to File > Preferences.
  3. Paste the URL http://arduino.esp8266.com/stable/package_esp8266com_index.json into the Additional Board Manager URLs field. (You can add multiple URLs, separating them with commas.)
  4. Go to Tools > Board > Board Manager and search for 'esp8266'. Select the newest version, and click install. (As of Sep 1st 2019, the latest stable version is 2.5.0.)

Components Required


  • NodeMCU (ESP8266 development board)
  • OLED 128x64 screen
  • DHT-22 temperature/humidity sensor
  • Capacitive Soil Moisture sensor
  • A few jumper cables and a breadboard to connect the components

Github Repo to clone this project and code




Schematic diagram of the complete circuit



If you want to monitor the weather on your Android device when you are away from home, and leverage the full potential of the ESP8266 module, you can create an IoT-enabled weather monitoring station with an ESP8266, a DHT-22 sensor and the Blynk library, and monitor the weather from anywhere in the world.

Monday, 26 August 2019

Role of Ansible in DevOps


Ansible is an open source automation platform. It can help you with configuration management, application deployment, task automation and IT orchestration.
It is a simple-to-set-up, efficient and powerful tool that makes IT professionals' lives easier.

Features of Ansible:

  1. Ansible uses YAML syntax to define playbook configuration files, which have minimal syntax and are easy for humans to understand.
  2. You write only a few lines of code to manage and provision your infrastructure.
  3. It removes a bottleneck: developers' work velocity used to suffer while sysadmins took time to configure servers manually.
  4. Roll-outs and rollbacks are possible, e.g. if we want to go back to Java 1.8 from a newer Java version.
  5. Servers can be grouped for partial deployment, let's say on 40 servers out of 100.
  6. It uses a push-based configuration methodology: the control server pushes changes to the managed nodes over SSH, and no agent needs to be installed on them.
  7. It is an agent-less configuration tool, unlike Puppet and Chef, which use a pull-based (agent-based) configuration methodology.
  8. Ansible Tower is a GUI-based tool for large-scale enterprises, where deployments can be done from a web UI.


Sample Ansible playbook config file, created to help you understand the syntax:


Download the sample file - Click Here


Playbooks are simple files written in YAML. They are used to declare configurations and to launch tasks synchronously or asynchronously.
A YAML file usually starts with '---' (three hyphens), which marks the start of a document.
Indentation matters a lot in YAML files. Running an incorrectly indented file will end in an error, and you may have to spend a lot of time looking for that extra space.


Hosts are simply the remote machines that Ansible manages. They can have individual variables assigned to them, and can also be organized in groups.


Tasks combine an action with a name. Playbooks exist to run tasks.


Action is a part of a task that specifies which of the modules to run and which arguments to pass to that module. Each task can have only one action, but it may also have other parameters.


Notify is the act of a task registering a change event and informing a handler task that another action needs to be run at the end of the play. If a handler is notified by multiple tasks, it will still be run only once.


Handlers are just like tasks, but they only run when notified by a task that reports a change.
Handlers are run in the order they are listed, not in the order they are notified. They are placed at the same level (indentation) as hosts and tasks in YAML.
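The notify/handler rules above can be summarized in a toy Python model. It is not Ansible's implementation, and the task and handler names are hypothetical; it only shows three rules: a handler runs if a task that notifies it reports "changed", it runs once at the end of the play however many tasks notified it, and handlers run in their listed order.

```python
"""Toy model of Ansible's notify/handler rules (illustration only)."""
from typing import Dict, List


def run_play(tasks: List[Dict], handlers: List[str]) -> List[str]:
    """Return the names of everything that runs, in order."""
    executed = []
    notified = set()
    for task in tasks:
        executed.append(task["name"])
        # Only a task that changed something triggers its handlers.
        if task.get("changed"):
            notified.update(task.get("notify", []))
    # End of play: run notified handlers in their listed order, each at most once.
    executed.extend(h for h in handlers if h in notified)
    return executed


if __name__ == "__main__":
    tasks = [
        {"name": "copy httpd.conf", "changed": True, "notify": ["restart httpd"]},
        {"name": "copy index.html", "changed": False, "notify": ["reload firewall"]},
        {"name": "copy vhost.conf", "changed": True, "notify": ["restart httpd", "clear cache"]},
    ]
    handlers = ["clear cache", "restart httpd", "reload firewall"]
    print(run_play(tasks, handlers))
```

Here "restart httpd" is notified twice but runs once, "clear cache" runs before it because it is listed first, and "reload firewall" never runs because its task reported no change.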

-----------------------------------------------------------------------------------------

We can execute commands on remote machines in two ways:

1. Directly, by issuing an ad hoc command
ansible all -m copy -a "src=/home/user1/test.html dest=/home/user2"

2. By creating a playbook file and running it.
ansible-playbook /opt/playbooks/copyfile.yml


##To check the syntax of a playbook file

ansible-playbook test-playbook.yml --syntax-check


##To list the modules that come installed with Ansible
(You will be surprised to know that around 3000 modules are installed by default)
ansible-doc -l


##To test connectivity between the controller and the managed nodes with the ping module
ansible -m ping web-servers


If you start exploring the various modules in Ansible, trust me, you will fall in love with this tool, as there is hardly anything it cannot accomplish.

Friday, 23 August 2019

Setting up an Ansible Server on AWS EC2 instance


To set up an Ansible server on an EC2 instance, we first need to launch an EC2 instance and SSH into it. Then follow the commands below to set it up as the control server (Ansible itself must also be installed on it).

##Create the ansadmin user and set its password
useradd ansadmin
passwd ansadmin

##Add the ansadmin user to the sudoers file
echo "ansadmin ALL=(ALL) ALL" >> /etc/sudoers

##Use sed to replace "PasswordAuthentication no" with "yes" in sshd_config in place (no editor needed), then restart sshd
sed -i 's/PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
service sshd restart

##Create SSH keys for password-less authentication between the Ansible control server and the hosts.

#Log in as the ansadmin user and generate an SSH key on the master
ssh-keygen

#Create the same ansadmin user on each target host server.

#Copy the master's SSH key onto all Ansible host nodes
ssh-copy-id <target-host-server>

#Add the target servers' IPs to the /etc/ansible/hosts file on the master (always use the internal private IP address)
echo "<target host server IP>" >> /etc/ansible/hosts


The Ansible hosts file should look like this (cat /etc/ansible/hosts):

[web-servers]
10.0.1.20
10.0.1.21
10.0.1.22

#Run an ansible command as the ansadmin user on the control server. It should succeed.
ansible all -m ping
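To show how Ansible reads the inventory, here is a simplified Python reader for the INI-style hosts file above. It is a hypothetical helper, not Ansible's parser: it understands [group] sections, ungrouped hosts at the top and optional key=value host variables, and it skips [group:vars] and [group:children] sections. Since echo ... >> /etc/ansible/hosts appends to the end of the file, a new IP lands in whichever group section comes last.

```python
"""Simplified reader for an INI-style Ansible inventory (not Ansible's parser)."""
from typing import Dict, List


def parse_inventory(text: str) -> Dict[str, List[str]]:
    groups: Dict[str, List[str]] = {"ungrouped": []}
    current = "ungrouped"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            current = None if ":" in name else name  # skip :vars / :children
            if current is not None:
                groups.setdefault(current, [])
            continue
        if current is not None:
            groups[current].append(line.split()[0])  # drop key=value host vars
    return groups


def all_hosts(groups: Dict[str, List[str]]) -> List[str]:
    """Every host once, in first-seen order (roughly what the 'all' pattern targets)."""
    return list(dict.fromkeys(h for hosts in groups.values() for h in hosts))


SAMPLE = """
[web-servers]
10.0.1.20
10.0.1.21
10.0.1.22
"""

if __name__ == "__main__":
    inventory = parse_inventory(SAMPLE)
    print(inventory["web-servers"])
    print(all_hosts(inventory))
```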


We have now set up password-less authentication between the control server and all the hosts. From the control server we can handle any type of task, like installing an application, starting/stopping a service or copying a config file to the servers.
We will discuss this in our next blog.

Setting up a Tomcat server on AWS EC2 instance


This will create a Tomcat server on an AWS EC2 instance. Don't forget to open port 8080 in the Security Group so that everyone can connect to the server.

Step 2, as before, is installing the Java JDK and setting up JAVA_HOME in the PATH variable.


##Download Tomcat (9.0.22 here; check for the latest release) and extract it under the /opt directory
cd /opt
wget https://archive.apache.org/dist/tomcat/tomcat-9/v9.0.22/bin/apache-tomcat-9.0.22.tar.gz
tar -xzf apache-tomcat-9.0.22.tar.gz

##Update the permissions to make startup.sh/shutdown.sh executable.
chmod +x /opt/apache-tomcat-9.0.22/bin/startup.sh /opt/apache-tomcat-9.0.22/bin/shutdown.sh


##Create shortcut for starting/shutting down tomcat server
ln -s /opt/apache-tomcat-9.0.22/bin/startup.sh /usr/local/bin/tomcatup
ln -s /opt/apache-tomcat-9.0.22/bin/shutdown.sh /usr/local/bin/tomcatdown


##Add the following inside the <tomcat-users> element of conf/tomcat-users.xml, as we will need the deployer user to deploy our WAR file.

<role rolename="manager-gui"/>
<role rolename="manager-script"/>
<role rolename="manager-jmx"/>
<role rolename="manager-status"/>
<user username="admin" password="admin" roles="manager-gui, manager-script, manager-jmx, manager-status"/>
<user username="deployer" password="deployer" roles="manager-script"/>
<user username="tomcat" password="s3cret" roles="manager-gui"/>


##Search for context.xml files

find / -name context.xml

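The command above gives 3 context.xml files. In the ones under the webapps directory (for the manager and host-manager apps), comment out (with <!-- and -->) the Valve element whose className is org.apache.catalina.valves.RemoteAddrValve, which otherwise restricts access to localhost.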

##Restart Tomcat through our shortcuts
tomcatdown
tomcatup

You can check whether the server is up and running by opening http://public-IP:8080 in your browser.

Setting up a Jenkins Server on AWS EC2 instance


Let us first understand what Jenkins is. Jenkins is a self-contained, open source automation server which can be used to automate all sorts of tasks related to building, testing, and delivering or deploying software. It is widely used for continuous integration.

Prerequisites:

Launch and SSH into EC2 Instance.
Don't forget to open port 8080 in the Security Group for all incoming connections.
Java JDK should be installed.

Install Java 1.8 JDK

  • sudo su
  • yum update -y
  • yum install java-1.8*

Update .bash_profile to set the JAVA_HOME variable


  • To get the path of java binary - find /usr/lib/jvm/ -name java
  • Add below lines to your .bash_profile file in user's home directory. 
    • JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64
    • export JAVA_HOME
    • PATH=$PATH:$JAVA_HOME/bin
  • Save the file (:wq in vi)
  • To make the changes effective immediately, source your .bash_profile
    • source ~/.bash_profile
  • java -version (check the java version)

Install Jenkins


yum -y install wget
wget -O /etc/yum.repos.d/jenkins.repo https://pkg.jenkins.io/redhat-stable/jenkins.repo

Import the Jenkins-CI key file to enable installation from the package repository

rpm --import https://pkg.jenkins.io/redhat-stable/jenkins.io.key
yum install jenkins -y


Start Jenkins as a service


## Start jenkins service
systemctl start jenkins

## Set up Jenkins to start at boot
systemctl enable jenkins


Accessing Jenkins UI


Visit the following address in your browser, http://EC2-SERVER-PUBLIC-IP:8080


Things to do after first login:


  • Default Password Location: cat /var/lib/jenkins/secrets/initialAdminPassword

  • Change admin password
    • Admin > Configure > Password

  • Configure java path
    • Manage Jenkins > Global Tool Configuration > JDK

  • Now you are ready to create a job for your build. To create a test Jenkins job:
    • Click "New Item"
    • Enter an item name – My-first-job
    • Choose Freestyle project
    • Under Build section - choose option - Execute shell : echo "Welcome to Jenkins Demo"
    • Save your job
    • Build job
    • Check "console output" for successful completion of job.
Hurray!! You have just created and run your first job.

Restarting Jenkins through the browser (only if you are logged in as an admin user):
http://<ip-address>:8080/restart

Wednesday, 7 August 2019

Designing HA Architecture in AWS part-3

Amazon Web Services (AWS) offers a wide range of services that we can leverage to implement High Availability (HA) for any web application that we deploy on the AWS cloud.

I am going to create a web application which is highly available, resilient, optimized and secure. It delivers high performance and low latency every time users access it, wherever they access it from.
I have deployed this application with the following parameters:

Region - ap-south-1 (Asia Pacific Mumbai)
Availability Zone - AZ-a and AZ-b
A highly available architecture that spans two Availability Zones (Multi-AZ deployment).
VPC - I have created 2 Virtual Private Clouds:
  • testVPC (10.0.0.0/16)
  • new-testVPC (10.1.0.0/16)

Subnets - I have created 2 public and 2 private subnets in testVPC, one of each per Availability Zone.

Under Availability Zone AZ-b
  • Public_subnet1b = 10.0.1.0/24
  • Private_subnet1b = 10.0.2.0/24

Under Availability Zone AZ-a
  • Public_subnet1a = 10.0.20.0/24
  • Private_subnet1a = 10.0.21.0/24
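Before creating subnets like these, it helps to check that every subnet sits inside the VPC's CIDR block and that no two subnets overlap. The sketch below does that with Python's standard ipaddress module, using the testVPC ranges above; check_plan is a hypothetical helper, and the 5 reserved addresses per subnet reflect AWS's documented reservation (the first four addresses and the last one).

```python
"""Sanity-check the subnet plan with Python's standard ipaddress module."""
import ipaddress
from itertools import combinations

VPC = ipaddress.ip_network("10.0.0.0/16")  # testVPC
SUBNETS = {
    "Public_subnet1b": "10.0.1.0/24",
    "Private_subnet1b": "10.0.2.0/24",
    "Public_subnet1a": "10.0.20.0/24",
    "Private_subnet1a": "10.0.21.0/24",
}
AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one


def check_plan(vpc, subnets):
    """Validate containment and overlaps; return usable addresses per subnet."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            raise ValueError(f"{name} {net} is outside {vpc}")
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            raise ValueError(f"{a} overlaps {b}")
    return {name: net.num_addresses - AWS_RESERVED_PER_SUBNET for name, net in nets.items()}


if __name__ == "__main__":
    print(check_plan(VPC, SUBNETS))  # each /24 leaves 251 usable addresses
```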
WPS - I have provisioned EC2 instances in both public subnets, installed WordPress on them and named them WPS. So at any point in time, traffic will be served by at least 2 instances (1 in each subnet), up to a maximum capacity of 6 instances (3 in each subnet).

Route53 - A DNS service used for domain name registration, maintaining DNS record sets and setting up different routing policies for domain names.

Schematic diagram of the HA architecture below:






I have tried to explain this in a problem-and-solution fashion: the problems I faced while designing an HA architecture, and how I implemented a solution to each of them.


Problem - How to manage unpredictable load on webserver EC2 instances in our architecture?
Solution - Auto-Scaling Groups

  • An Auto Scaling group depends on 2 things: a Launch Configuration and Scaling Policies.

Implementation -

  • First choose the Launch Configuration and the VPC and Subnet in which we are implementing the Auto-Scaling group.
  • Define the group size, starting with 1 instance.
  • Next, define the minimum and maximum size of the group (Min=1 and Max=3 instances).
  • Create an alarm to increase the group size: Avg. CPU utilization >= 75% for a consecutive period of 5 mins.
  • You can optionally send a notification to an SNS topic when the above alarm is triggered.
  • Similarly, create an alarm to decrease the group size: Avg. CPU utilization < 60% for a consecutive period of 5 mins.
  • Under the Instances tab of the selected Auto Scaling group, we can see the lifecycle/status of the instances.
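The two alarms above can be read as a small decision rule. The Python sketch below is a toy model under stated assumptions, not AWS's scaling engine: one CPU sample per minute, the alarm looks at the average of the last 5 samples, and each alarm adds or removes exactly one instance within the Min=1/Max=3 bounds.

```python
"""Toy model of the scale-out / scale-in alarms (not AWS's actual algorithm)."""
from typing import List

MIN_SIZE, MAX_SIZE = 1, 3  # group bounds from the steps above
SCALE_OUT_AT = 75.0        # avg CPU >= 75% over the window -> add one instance
SCALE_IN_BELOW = 60.0      # avg CPU < 60% over the window -> remove one instance
WINDOW_MINUTES = 5


def desired_capacity(current: int, cpu_per_minute: List[float]) -> int:
    """Return the group size after evaluating the most recent 5-minute window."""
    if len(cpu_per_minute) < WINDOW_MINUTES:
        return current  # not enough data points yet
    window = cpu_per_minute[-WINDOW_MINUTES:]
    avg = sum(window) / WINDOW_MINUTES
    if avg >= SCALE_OUT_AT:
        current += 1
    elif avg < SCALE_IN_BELOW:
        current -= 1
    # The group never leaves its configured min/max bounds.
    return max(MIN_SIZE, min(MAX_SIZE, current))


if __name__ == "__main__":
    print(desired_capacity(1, [80, 85, 90, 78, 76]))  # sustained load: scale out
    print(desired_capacity(2, [65, 70, 62, 68, 66]))  # between thresholds: no change
    print(desired_capacity(1, [20, 15, 10, 12, 18]))  # already at the minimum
```

The gap between 60% and 75% acts as a dead band: a group hovering around 65% CPU is left alone instead of flapping between adding and removing instances.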



Problem - How to get an auto-configured server every time an instance is added to the Auto Scaling group?
Solution - Snapshot
  • A snapshot is a backup of a single EBS volume. A snapshot is not a bootable copy, but you can create a bootable AMI from it.

Implementation -

  • Provision a template EC2 instance -> install Apache, WordPress and all the key configurations.
  • Take a snapshot of the root volume of the EC2 instance (we can terminate the template instance after taking the snapshot).
  • Create an AMI from this snapshot and name it myWebAppAMI.
  • Use this AMI in the Launch Configuration of the Auto Scaling group to spawn auto-configured EC2 instances in the subnets.
  • This way we save setup time for newly created EC2 instances, and there is no need to apply configurations remotely every time a server boots up.
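The snapshot-to-AMI steps can also be scripted. This is a sketch with boto3 (the AWS SDK for Python), assuming the template instance's root volume is /dev/xvda on an x86_64 HVM instance; the volume ID in the usage comment is a placeholder, and the function names are hypothetical. Newer instance types may also need EnaSupport=True when registering the image.

```python
"""Sketch: EBS snapshot -> AMI with boto3 (IDs are placeholders)."""


def register_image_params(snapshot_id: str, name: str, root_device: str = "/dev/xvda") -> dict:
    """Arguments for ec2.register_image() that boot an AMI from one EBS snapshot."""
    return {
        "Name": name,
        "Architecture": "x86_64",
        "VirtualizationType": "hvm",
        "RootDeviceName": root_device,
        "BlockDeviceMappings": [
            {"DeviceName": root_device, "Ebs": {"SnapshotId": snapshot_id}},
        ],
    }


def snapshot_to_ami(ec2, volume_id: str, name: str) -> str:
    """Snapshot the root volume, wait until it completes, then register an AMI."""
    snap = ec2.create_snapshot(VolumeId=volume_id, Description=f"Root volume for {name}")
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])
    image = ec2.register_image(**register_image_params(snap["SnapshotId"], name))
    return image["ImageId"]


# Usage (requires boto3 and AWS credentials; the volume ID is a placeholder):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="ap-south-1")
#   ami_id = snapshot_to_ami(ec2, "vol-0123456789abcdef0", "myWebAppAMI")
```

If you do not need the separate snapshot step, ec2.create_image(InstanceId=..., Name=...) creates an AMI directly from the template instance.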



Problem - How to update/patch EC2 instances in a private subnet when they have neither inbound nor outbound internet access?
Solution - NAT instance

  • A NAT instance allows your private instances outgoing connectivity to the internet, while at the same time blocking inbound traffic initiated from the internet.
  • A NAT instance is a normal EC2 instance launched from a NAT-optimized HVM AMI.

Implementation -
  • In the public subnet (Public_subnet1a), I provisioned a NAT instance to allow outbound internet access for the DB_server instance in the private subnet (Private_subnet1a).
  • After launching this instance, I disabled its Source/Destination check. To do this, right-click your NAT instance in the AWS Console and select ‘Networking > Change Source/Dest. Check > Yes, Disable’. The private subnet's route table also needs a 0.0.0.0/0 route pointing to the NAT instance.
  • NAT Gateways provide the same functionality as a NAT instance; however, a NAT Gateway is an AWS-managed NAT service. As a result, NAT Gateways offer greater availability and bandwidth and require less configuration and administration. However, this was a costlier option and is better suited to very large-scale applications.
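The NAT-instance wiring can be sketched with boto3 as well: disable the source/destination check (the console step above) and add a default route from the private route table to the NAT instance. The function names are hypothetical and the IDs in the usage comment are placeholders.

```python
"""Sketch: wire up a NAT instance with boto3 (IDs are placeholders)."""


def nat_route_params(private_route_table_id: str, nat_instance_id: str) -> dict:
    """Arguments for ec2.create_route(): send internet-bound traffic to the NAT instance."""
    return {
        "RouteTableId": private_route_table_id,
        "DestinationCidrBlock": "0.0.0.0/0",
        "InstanceId": nat_instance_id,
    }


def configure_nat_instance(ec2, nat_instance_id: str, private_route_table_id: str) -> None:
    # Equivalent to "Change Source/Dest. Check > Yes, Disable" in the console.
    ec2.modify_instance_attribute(InstanceId=nat_instance_id, SourceDestCheck={"Value": False})
    ec2.create_route(**nat_route_params(private_route_table_id, nat_instance_id))


# Usage (requires boto3 and AWS credentials; IDs are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="ap-south-1")
#   configure_nat_instance(ec2, "i-0123456789abcdef0", "rtb-0123456789abcdef0")
```

The source/destination check must be off because a NAT instance forwards packets that are neither addressed to it nor sent by it; with the check on, EC2 drops that traffic.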


Problem - How to get SSH access to EC2 instances in private subnets?
Solution - Bastion hosts

  • A bastion host is a special-purpose computer on a network, designed and configured to withstand attacks.
  • It acts as a jump server, allowing you to use SSH or RDP to log in to other instances within private subnets.

Implementation -

  • Servers in private subnets are not configured to talk to the outside network, so I updated the private Security Group to accept all inbound/outbound traffic from the public Security Group.
  • In the public subnets, I placed bastion EC2 instances in an Auto Scaling group to allow inbound Secure Shell (SSH) access to EC2 instances in private subnets.



Problem - How to maintain website performance as site traffic increases?
Solution - Elastic Load Balancer (ELB)
Implementation -

  • An Application Load Balancer must be deployed into at least two subnets in different Availability Zones (Public_subnet1a & Public_subnet1b); it distributes HTTP and HTTPS requests across multiple WordPress instances (WPS1, WPS2, ..., WPSn).



Problem - Increasing site traffic means increasing query load on the database. How do we cope?
Solution - ElastiCache

  • Caching improves application performance by storing critical pieces of data in memory for low latency access.
  • Cached information may include the results of I/O-intensive database queries or the results of computationally-intensive calculations.
  • Suppose we are running an online business and customers continuously ask for information about a particular product. Instead of calling the DB every time for that product, we can cache the product data using ElastiCache.

Implementation -

  • I have used Redis nodes for caching database queries.
  • As Redis keeps its data in memory, we should always create Redis nodes with a memory-optimized instance type (X1, R5, R4, R3).
  • The port should be kept at the default, 6379.
  • We can set the number of replicas in the cluster from 0 to 5.
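The product example above is the cache-aside pattern: check the cache first, and query the database only on a miss, storing the result with an expiry. Below is a Python sketch with hypothetical names. The cache only needs get() and setex(), the method names redis-py uses, so an in-memory stand-in keeps the example runnable without a Redis node; with ElastiCache you would pass redis.Redis(host="<primary-endpoint>", port=6379) instead.

```python
"""Cache-aside sketch for product lookups (names are hypothetical)."""
import json
import time


class InMemoryCache:
    """Minimal stand-in for a Redis client: get() and setex() with expiry."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        value, expires_at = self._data.get(key, (None, 0.0))
        return value if time.monotonic() < expires_at else None

    def setex(self, key, seconds, value):
        self._data[key] = (value, time.monotonic() + seconds)


def get_product(product_id, cache, db_lookup, ttl_seconds=300):
    """Return product data, hitting the database only on a cache miss."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database query
    product = db_lookup(product_id)  # cache miss: query the database
    cache.setex(key, ttl_seconds, json.dumps(product))
    return product


if __name__ == "__main__":
    queries = []

    def fake_db(pid):
        queries.append(pid)
        return {"id": pid, "name": "Sample product"}

    cache = InMemoryCache()
    get_product(42, cache, fake_db)
    get_product(42, cache, fake_db)
    print("database queries:", len(queries))
```

The TTL bounds how stale a cached product can get; a shorter TTL means fresher data but more database queries.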



Problem - How to control the inbound/outbound traffic in the VPC?
Solution - Security Groups and Network Access Control Lists

  • Security Groups add a security layer to EC2 instances that controls both inbound and outbound traffic at the instance level.
  • NACLs add another layer of security, associated with subnets, that controls both inbound and outbound traffic at the subnet level.

Implementation -

  • I have created 2 Security Groups - test_security_group and test_security_group_private.
  • I have declared the routes in the route tables and associated the subnets with the respective table - public_route_table and private_route_table
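One difference between the two layers is worth seeing in code. A network ACL checks its numbered rules in ascending order, the first match wins, and anything unmatched falls through to the final "*" rule and is denied. NACLs are also stateless, so return traffic needs its own rule, whereas security groups are stateful and allow return traffic automatically. The Python sketch below is a toy model that ignores protocols, and the rule set is hypothetical (it uses documentation IP ranges).

```python
"""Toy model of network ACL rule evaluation (protocols ignored)."""
import ipaddress
from typing import List, NamedTuple


class Rule(NamedTuple):
    number: int
    cidr: str
    port_from: int
    port_to: int
    action: str  # "allow" or "deny"


def evaluate(rules: List[Rule], source_ip: str, port: int) -> str:
    """Return the action of the lowest-numbered matching rule, else deny."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if ip in ipaddress.ip_network(rule.cidr) and rule.port_from <= port <= rule.port_to:
            return rule.action  # first matching rule decides
    return "deny"  # the implicit "*" rule


if __name__ == "__main__":
    inbound = [
        Rule(100, "0.0.0.0/0", 80, 80, "allow"),
        Rule(110, "0.0.0.0/0", 443, 443, "allow"),
        Rule(90, "203.0.113.0/24", 0, 65535, "deny"),  # hypothetical blocked range
    ]
    print(evaluate(inbound, "198.51.100.7", 443))  # allowed by rule 110
    print(evaluate(inbound, "203.0.113.9", 80))    # denied: rule 90 is checked first
    print(evaluate(inbound, "198.51.100.7", 22))   # denied: no rule matches
```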



Problem - Why not run MySQL on an EC2 instance instead of using an RDS instance?
Solution - RDS (MySQL)
Implementation -

  • Creating an RDS endpoint instead of deploying a MySQL database on an EC2 instance is better in many ways.
  • It frees you from time-consuming database administration tasks such as provisioning, backups, software patching, monitoring, and hardware scaling.
  • For commercial engines such as Oracle and SQL Server, RDS supports a "License Included" model, so we do not have to purchase a license separately; MySQL itself is open source.
  • Amazon RDS provides high availability for MySQL using its Multi-AZ capability, which reduces the risk of data loss.



Problem - How should I keep an up-to-date backup of our WebApp?
Solution - Amazon S3

  • S3 buckets store data as objects. Backing up a web app to these buckets has many advantages because of their high availability and durability.

Implementation -

  • I am using S3-IA or S3-RRS buckets instead of S3-Standard to reduce the cost of this model, as we do not need to access the backups frequently.
  • I have created a cron job on the WebApp EC2 instance to back up the site hourly (at minute 0 of every hour):
  • 0 * * * * aws s3 sync --delete /var/www/html s3://<bucket_name>
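The minute field is easy to get wrong in an hourly schedule: a * there matches every minute. This small Python sketch counts how often a schedule fires in one day. It is a simplified matcher for illustration only (minute and hour fields, each "*", "*/n" or a single number), not a full cron parser.

```python
"""Count how often a cron schedule fires in one day (minute and hour fields only)."""


def field_matches(field: str, value: int) -> bool:
    """Match one cron field of the form "*", "*/n" or a single number."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return value == int(field)


def runs_per_day(schedule: str) -> int:
    """Assumes the day-of-month, month and day-of-week fields are "*"."""
    minute, hour = schedule.split()[:2]
    return sum(
        field_matches(minute, m) and field_matches(hour, h)
        for h in range(24)
        for m in range(60)
    )


if __name__ == "__main__":
    print(runs_per_day("* */1 * * *"))  # every minute of every hour
    print(runs_per_day("0 * * * *"))    # once an hour, at minute 0
```

"* */1 * * *" fires 1440 times a day, while "0 * * * *" fires 24 times, which is what an hourly backup needs.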



Problem - How to minimize latency for users accessing this web application from the US, Europe or any other part of the globe?
Solution - CloudFront
Implementation -

  • I have implemented CloudFront as the Content Delivery Network for static content, because static content (such as JPG images, media and audio files) is heavy and puts unnecessary load on the EC2 instances.
  • All the static content is being served from the nearest edge locations to the users worldwide. This has helped a lot in maintaining a low latency irrespective of the placement of the EC2 instances.



Problem - The web application needs to talk to an EC2 instance in a different VPC. How should we do it?
Solution - VPC Peering

  • A VPC peering connection is a networking connection between two VPCs that enables you to route traffic between them using private IPv4 addresses or IPv6 addresses.
  • Instances in either VPC can communicate with each other as if they are within the same network.
  • You can create a VPC peering connection between your own VPCs, or with a VPC in another AWS account.
  • The VPCs can be in different regions (an inter-region VPC peering connection), which provides a simple and cost-effective way to share resources between regions.

Implementation -

  • This peering was added to the architecture because some service used by the WebApp in the main VPC might need to communicate with, or fetch data from, an EC2 instance in a different VPC.
  • I have created a VPC peering connection named test_peer between the 2 VPCs (testVPC and new-testVPC), which makes communication between them possible.
  • Additionally, I had to add an entry to the route tables of both public subnets in which the EC2 instances need to communicate:
    • public_route_table -> 10.1.0.0/16 test_peer
    • new_public_route_table -> 10.0.0.0/16 test_peer
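AWS does not allow peering two VPCs whose CIDR blocks overlap, which is one reason testVPC and new-testVPC use different /16 ranges. The Python sketch below (peering_routes is a hypothetical helper) checks for overlap with the standard ipaddress module and derives the route entries each side needs: each VPC routes the other VPC's CIDR through the peering connection.

```python
"""Check that two VPC CIDRs can be peered and derive both route entries."""
import ipaddress
from typing import List, Tuple


def peering_routes(vpc_a: str, vpc_b: str, peering_name: str) -> List[Tuple[str, str, str]]:
    """Return (vpc, destination CIDR, target) route entries for both sides."""
    a, b = ipaddress.ip_network(vpc_a), ipaddress.ip_network(vpc_b)
    if a.overlaps(b):
        raise ValueError(f"{a} and {b} overlap, so they cannot be peered")
    # Each VPC routes the *other* VPC's CIDR block through the peering connection.
    return [(str(a), str(b), peering_name), (str(b), str(a), peering_name)]


if __name__ == "__main__":
    for vpc, dest, target in peering_routes("10.0.0.0/16", "10.1.0.0/16", "test_peer"):
        print(f"route table of {vpc}: {dest} -> {target}")
```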

I hope this information will be helpful for upcoming solution architects. If you have any doubts or queries, please feel free to ask in the comment section below.

Components of AWS part-2

In this blog we are going to take a deeper dive into the components that make up AWS, instead of taking a bird's-eye view.
You might be wondering what infrastructure lies behind AWS, which itself provides infrastructure to millions of users and thousands of customers worldwide.

AWS Global Infrastructure

The following are the components that make up the AWS infrastructure:

  • Availability Zones - Think of an Availability Zone as a data center. An Availability Zone is a facility (one or more data centers) somewhere in a country or in a city. Inside this facility we can have multiple servers, switches, load balancers and firewalls. The things which interact with the cloud sit inside the data centers.
  • Region - Region is a distinct geographical area & can have 2 or more AZ. A region is a collection of data centers which are completely isolated from other regions. Currently there are 22 regions across the globe.
  • Edge locations - Edge locations are AWS endpoints used for caching content, typically by CloudFront, Amazon's Content Delivery Network (CDN). They are located in most major cities to distribute content to end users with reduced latency. Currently there are more than 150 edge locations.
  • Regional Edge Caches - A Regional Edge Cache lies between the CloudFront origin servers and the edge locations, and has a larger cache than an individual edge location. Data may be removed from the cache at an edge location while it is retained at the Regional Edge Cache. When a user requests data that is no longer available at the edge location, the edge location retrieves the cached data from the Regional Edge Cache instead of the origin servers, which have higher latency.
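The lookup order described above (edge location, then Regional Edge Cache, then origin) can be modeled as a tiered cache. In the toy Python sketch below, plain dicts stand in for the caches and a function stands in for the origin; none of the names are CloudFront APIs.

```python
"""Toy model of tiered lookup: edge -> regional edge cache -> origin."""
from typing import Callable, Dict, Tuple


def fetch(path: str, edge: Dict[str, bytes], regional: Dict[str, bytes],
          origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Return (content, where it was served from), filling caches on the way back."""
    if path in edge:
        return edge[path], "edge"
    if path in regional:
        edge[path] = regional[path]  # repopulate the edge location
        return regional[path], "regional"
    content = origin(path)  # slowest path: go all the way to the origin
    regional[path] = content
    edge[path] = content
    return content, "origin"


if __name__ == "__main__":
    edge, regional = {}, {}
    origin = lambda p: b"<content of " + p.encode() + b">"
    print(fetch("/logo.jpg", edge, regional, origin)[1])  # first request: origin
    print(fetch("/logo.jpg", edge, regional, origin)[1])  # now cached at the edge
    edge.clear()                                          # evicted at the edge only
    print(fetch("/logo.jpg", edge, regional, origin)[1])  # served by the regional cache
```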
Below are the major services under various domains:

Networking and Content Delivery

  1. VPC - A Virtual Private Cloud is your private section of AWS, it provides a logically isolated area where you can launch AWS resources, and allow/restrict access to them. 
    • We can create Subnets (Private/Public) in a VPC and can assign custom IP address ranges in each subnet.
    • Max 5 VPCs allowed in each AWS Region by default.

  2. Route 53 - DNS routing service (manages the DNS records of a domain). Routing of traffic to EC2 instances can be based on:
    • Weighted percentages
    • Latency based
    • Failover, by creating health checks on each record set
    • Geolocation based
    • Multivalue answer (similar to simple routing, but with health checks)

  3. API Gateway - It is a gateway which lets incoming API calls communicate with a set of Lambda functions, creating a serverless system that serves users the responses from those functions.

  4. CloudFront - CloudFront is Amazon's Content Delivery Network (CDN). A CDN is a system of distributed servers that delivers web pages and other web content to a user based on the geographic location of the user, the origin of the webpage and a content delivery server. A CDN comprises the following components:
    1. Edge Location: An edge location is where the content is cached. It is separate from an AWS Region or AWS Availability Zone.
    2. Origin: It defines the origin of all the files that CDN will distribute. Origin can be either an S3 bucket, an EC2 instance or an Elastic Load Balancer.
    3. Distribution: It is the name given to the CDN, which consists of a collection of edge locations. Creating a new CDN with AWS means creating a distribution. A distribution can be of two types:
      1. Web Distribution: It is typically used for websites. When a user requests content, the request is automatically routed to the nearest edge location so that the content is delivered with the best possible performance.
      2. RTMP Distribution: It is used for media streaming.
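To make the VPC subnet point concrete, here is a small planning helper built on Python's standard `ipaddress` module. It encodes two VPC rules from the AWS documentation. First, subnet blocks must be between /16 and /28. Second, AWS reserves 5 addresses in every subnet: the network address, three addresses for the VPC router, DNS and future use, and the broadcast address. The CIDR `10.0.0.0/16` and the function name `plan_subnets` are illustrative choices.

```python
import ipaddress
from itertools import islice
from typing import List, Tuple

# AWS reserves 5 addresses in every subnet: the network address, the next three
# (VPC router, DNS, reserved for future use) and the last (broadcast) address.
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc_cidr: str, new_prefix: int, count: int) -> List[Tuple[str, int]]:
    """Return the first `count` /new_prefix subnets of a VPC with their usable host counts."""
    if not 16 <= new_prefix <= 28:
        raise ValueError("VPC subnets must be between /16 and /28")
    vpc = ipaddress.ip_network(vpc_cidr)
    plan = []
    for subnet in islice(vpc.subnets(new_prefix=new_prefix), count):
        plan.append((str(subnet), subnet.num_addresses - AWS_RESERVED_PER_SUBNET))
    return plan


# For example, one public and one private subnet inside a 10.0.0.0/16 VPC:
for cidr, usable in plan_subnets("10.0.0.0/16", 24, 2):
    print(cidr, usable)  # 10.0.0.0/24 251, then 10.0.1.0/24 251
```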

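Weighted routing is the easiest of the Route 53 policies listed above to reason about. Each record receives traffic in proportion to its weight divided by the sum of all weights. The sketch below only simulates that arithmetic with Python's `random.choices`. The record names `blue` and `green` and the 70/30 split are hypothetical, and the sketch does not model health checks or DNS caching.

```python
import random
from typing import Dict


def traffic_share(records: Dict[str, int]) -> Dict[str, float]:
    """Fraction of DNS answers each record should receive: weight / total weight."""
    total = sum(records.values())
    if total == 0:
        raise ValueError("this sketch needs at least one non-zero weight")
    return {name: weight / total for name, weight in records.items()}


def pick_record(records: Dict[str, int], rng: random.Random) -> str:
    """Answer one DNS query by choosing a record with probability proportional to its weight."""
    names = list(records)
    return rng.choices(names, weights=[records[n] for n in names], k=1)[0]


records = {"blue": 70, "green": 30}  # e.g. 70% of traffic to the current fleet
print(traffic_share(records))  # {'blue': 0.7, 'green': 0.3}

rng = random.Random(42)
answers = [pick_record(records, rng) for _ in range(10_000)]
print(answers.count("blue"))  # roughly 7,000
```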
Compute

  1. EC2 - EC2 stands for Amazon Elastic Compute Cloud.
    1. Amazon EC2 is a web service that provides resizable compute capacity in the cloud.
    2. Amazon EC2 reduces the time required to obtain and boot new server instances to minutes, whereas in the older days this was a very time-consuming process.
    3. You can scale the compute capacity up and down as your computing requirements change.
    4. Amazon EC2 has changed the economics of computing by allowing you to pay only for the resources that you actually use. Previously we used to lease or purchase physical servers, so we had to plan as much as 5 years in advance, which ended up tying up a lot of capital in such investments.
    5. EC2 pricing options
      1. On Demand - On Demand is perfect for users who want the low cost and flexibility of Amazon EC2 without any up-front investment or long-term commitment. It is suitable for applications with short-term, spiky or unpredictable workloads that cannot be interrupted.
      2. Reserved - With a Reserved Instance you sign a contract for a fixed term (1 or 3 years), optionally paying part of the cost upfront, and in return you get a significant discount on the hourly charge for the instance. It is used for applications with steady-state usage that require reserved capacity.
      3. Spot Instances - They allow you to bid whatever price you want for spare instance capacity, providing even greater savings if your applications have flexible start and end times. They are useful for applications that are only feasible at very low compute prices.
      4. Dedicated Host - A dedicated host is a physical server with EC2 instance capacity which is fully dedicated to your use.

  2. Elastic Beanstalk - It is a PaaS (Platform as a Service) used for deploying and scaling web applications/services developed with Java, PHP and Node.js on familiar servers like Apache, Nginx, Tomcat and IIS.
    1. Elastic Beanstalk is one layer of abstraction away from the EC2 layer. Elastic Beanstalk sets up an "environment" for you that can contain a number of EC2 instances, an optional database, and a few other AWS components such as an Elastic Load Balancer, an Auto Scaling group and a security group. This gives you:
      1. Load Balancing
      2. Auto Scaling
      3. Health Monitoring
    2. EB offers two different Environment tiers:
      1. Web Server Environment: Handles HTTP requests from clients
      2. Worker Environment: Processes background tasks which are resource consuming and time intensive
    3. Each environment runs only a single application version at a time. However, it is possible to run the same or different versions of an application in many environments at the same time.
    4. A terminated environment can be restored if it was terminated within the last six weeks.

  3. Lambda - It is a pay-as-you-go serverless compute service, known as Function as a Service (FaaS).
    1. Lambda functions are stateless, meaning they cannot rely on storing persistent data between invocations.
    2. You deploy some code, it gets invoked, processes some input, and returns a value.
    3. It is commonly used in conjunction with API Gateway to create a serverless model, where API Gateway invokes the function for each incoming request. Lambda functions can also be invoked by other sources, such as SNS notifications.
    4. Lambda sits at the top of a stack of abstraction layers: data centres, hardware, assembly code/protocols, high-level languages, operating systems and AWS APIs.
    5. Lambda is a compute service where you can upload your code and create the Lambda function.
    6. Lambda takes care of provisioning and managing the servers used to run the code.
    7. While using Lambda, you don't have to worry about scaling, patching, operating systems, etc.
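The trade-off between the On Demand and Reserved pricing options listed under EC2 is simple arithmetic. An On Demand instance is billed only for the hours it runs, while a reservation is billed for every hour of its term whether or not the instance is running. The sketch below finds the monthly usage at which a reservation starts to pay off. The prices ($0.10/hour On Demand, $0.06/hour Reserved with no upfront payment) are made-up numbers for illustration, not real AWS rates, and 730 hours is an average month.

```python
HOURS_PER_MONTH = 730  # 8,760 hours in a year / 12 months


def on_demand_monthly(rate_per_hour: float, hours_used: float) -> float:
    """On Demand: pay only for the hours the instance actually runs."""
    return rate_per_hour * hours_used


def reserved_monthly(rate_per_hour: float, upfront: float = 0.0, term_months: int = 12) -> float:
    """Reserved: every hour of the term is billed, plus any upfront payment spread over the term."""
    return upfront / term_months + rate_per_hour * HOURS_PER_MONTH


def break_even_hours(on_demand_rate: float, reserved_cost_per_month: float) -> float:
    """Monthly running hours above which the reservation is the cheaper option."""
    return reserved_cost_per_month / on_demand_rate


reserved = reserved_monthly(0.06)         # about $43.80 per month (hypothetical rate)
hours = break_even_hours(0.10, reserved)  # about 438 hours
print(f"Reserve if the instance runs more than {hours:.0f} of {HOURS_PER_MONTH} hours a month")
```

With these made-up prices, the reservation wins once the instance runs more than about 60% of the month.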

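Putting the API Gateway and Lambda descriptions together, a Lambda function behind API Gateway simply receives the request as an event and returns a response. The sketch below assumes API Gateway's Lambda proxy integration. In that integration the function returns a dictionary with `statusCode`, optional `headers`, and a string `body`. `lambda_handler` is the default handler name the Lambda console uses for Python, and the `name` query parameter is a hypothetical example. The function keeps no state between invocations, in line with Lambda's stateless model.

```python
import json


def lambda_handler(event, context):
    """Handle one API Gateway request (Lambda proxy integration); no state is kept between calls."""
    # API Gateway passes queryStringParameters as null when the URL has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local test with a trimmed-down event (a real proxy event carries many more fields):
response = lambda_handler({"queryStringParameters": {"name": "AWS"}}, None)
print(response["statusCode"], response["body"])  # 200 {"message": "Hello, AWS!"}
```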

Storage

  1. EBS stands for Elastic Block Store.
    1. Amazon EBS allows you to create storage volumes and attach them to the EC2 instances.
    2. Once the storage volume is created, you can create a file system on top of it, and then run a database, store files and applications, or even use it as a raw block device in some other way.
    3. Amazon EBS volumes are placed in a specific availability zone, and they are automatically replicated to protect you from the failure of a single component.
    4. The EBS volume attached to an EC2 instance on which Windows or Linux is installed is known as the root device volume.
    5. EBS volume types fall into two categories:
      1. SSD-backed volumes
      2. HDD-backed volumes
    6. SSD-backed volumes are further classified into two types:
      1. General Purpose SSD - General Purpose SSD is also referred to as gp2. It suits most workloads and is used where an application needs up to 16,000 IOPS per volume.
      2. Provisioned IOPS SSD - It is also referred to as io1. It is mainly used for high-performance, I/O-intensive applications such as relational databases, when you require more IOPS than gp2 can provide.

  2. S3 stands for Simple Storage Service.
    1. It is object-based storage, i.e., you can store images, Word files, PDF files, etc.
    2. The files which are stored in S3 can be from 0 bytes to 5 TB in size.
    3. It offers unlimited storage, meaning you can store as much data as you want.
    4. Files are stored in buckets. A bucket is like a folder in S3 that stores the files. You can set permissions individually on your files or on the complete bucket.
    5. S3 is a universal namespace, i.e., bucket names must be globally unique, and each bucket gets its own DNS address.
    6. If you create a bucket, its URL looks like: https://<bucket-name>.s3-<AWS-region>.amazonaws.com

  3. Snowball - These are physical devices that help migrate large amounts of data into and out of the cloud without depending on networks.
    1. Snowball is a suitcase-sized device, Snowball Edge is a rack-mountable and clusterable suitcase-sized device with compute capabilities, and Snowmobile is a shipping container moved with a tractor-trailer.
    2. With the Snowball service we can migrate amounts of data ranging from 100 terabytes to 10 petabytes.

  4. Storage Gateway - Storage Gateway is a service in AWS that connects an on-premises software appliance with the cloud-based storage to provide secure integration between an organization's on-premises IT environment and AWS storage infrastructure.
    1. File Gateway (NFS) - It is used to store flat files in S3, such as Word files, PDF files, pictures, videos, etc.
      1. Files are directly stored as objects in S3 buckets, and they are accessed through a Network File System (NFS) mount point.
      2. Ownership, permissions, and timestamps are durably stored in S3 in the user metadata of the object associated with the file.
    2. Volume Gateway (iSCSI) - Volume Gateway is an interface that presents your applications with disk volumes using the iSCSI block protocol.
      1. These volumes are block-based storage that can hold an operating system and applications, and can also run SQL Server or other databases.
      2. Data written to these volumes can be asynchronously backed up as point-in-time snapshots and stored in the cloud as EBS snapshots.
    3. Tape Gateway (VTL) - It is mainly used for taking backups.
      1. Tape Gateway offers a durable, cost-effective solution to archive your data in AWS cloud.
      2. The VTL interface provides a tape-based backup application infrastructure to store data on virtual tape cartridges that you create on your tape Gateway.
      3. It is supported by backup applications such as NetBackup, Backup Exec and Veeam. Instead of physical tapes, these applications use virtual tapes, which are stored in Amazon S3.
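Coming back to the EBS volume types listed above, the baseline performance of a General Purpose SSD (gp2) volume can be turned into a small helper. This is a sketch that assumes the gp2 figures AWS documented around the time of writing: a baseline of 3 IOPS per GiB of volume size, a floor of 100 IOPS and a ceiling of 16,000 IOPS. Burst credits are not modelled (smaller gp2 volumes can burst above their baseline). The gp2-or-io1 rule in `suggest_volume_type` is my own simplification, not an AWS recommendation.

```python
GP2_IOPS_PER_GIB = 3   # baseline performance scales with volume size
GP2_MIN_IOPS = 100     # every gp2 volume gets at least this baseline
GP2_MAX_IOPS = 16_000  # per-volume ceiling assumed for gp2


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a gp2 volume of the given size (burst credits ignored)."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib))


def suggest_volume_type(required_iops: int, size_gib: int) -> str:
    """Pick gp2 if its baseline covers the requirement, otherwise Provisioned IOPS (io1)."""
    return "gp2" if required_iops <= gp2_baseline_iops(size_gib) else "io1"


print(gp2_baseline_iops(20))               # 100 (the floor applies)
print(gp2_baseline_iops(500))              # 1500
print(suggest_volume_type(20_000, 1_000))  # io1
```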

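S3 bucket names form part of a DNS address, so S3 restricts what they may look like. The sketch below checks the main naming rules and then builds the bucket URL in the `https://<bucket-name>.s3-<AWS-region>.amazonaws.com` format used in the S3 section. The rules checked are:

- 3 to 63 characters
- only lowercase letters, digits, dots and hyphens
- starting and ending with a letter or digit
- no consecutive dots
- not formatted like an IP address

It is not an exhaustive validator, and `my-blog-assets` and `ap-south-1` are example values.

```python
import ipaddress
import re

# First and last characters: lowercase letter or digit; 1-61 allowed characters between.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check the main S3 bucket naming rules (not an exhaustive validator)."""
    if not _BUCKET_NAME.fullmatch(name) or ".." in name:
        return False
    try:
        ipaddress.ip_address(name)  # names like "192.168.5.4" are not allowed
    except ValueError:
        return True
    return False


def bucket_url(bucket: str, region: str) -> str:
    """Build the bucket URL in the https://<bucket-name>.s3-<AWS-region>.amazonaws.com format."""
    if not is_valid_bucket_name(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    return f"https://{bucket}.s3-{region}.amazonaws.com"


print(is_valid_bucket_name("My_Bucket"))           # False
print(bucket_url("my-blog-assets", "ap-south-1"))  # https://my-blog-assets.s3-ap-south-1.amazonaws.com
```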

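Some back-of-the-envelope arithmetic shows why devices like Snowball exist: how long would the same data take to travel over a network link? The helper below assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and an illustrative 80% sustained link utilisation. Real throughput depends on many other factors.

```python
SECONDS_PER_DAY = 86_400


def network_transfer_days(terabytes: float, link_mbps: float, utilisation: float = 0.8) -> float:
    """Days needed to push `terabytes` of data through a link of `link_mbps` megabits per second."""
    bits = terabytes * 10**12 * 8  # bytes -> bits
    bits_per_second = link_mbps * 10**6 * utilisation
    return bits / bits_per_second / SECONDS_PER_DAY


# 100 TB, the lower end of the Snowball range, over a 1 Gbps link:
print(f"{network_transfer_days(100, 1_000):.1f} days")  # 11.6 days
# The same data over a 100 Mbps link:
print(f"{network_transfer_days(100, 100):.0f} days")    # 116 days
```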
Database

  1. RDS - RDS stands for Relational Database Service. It supports six commonly used database engines. The Amazon RDS Free Tier provides a single db.t2.micro instance as well as up to 20 GiB of storage.

  2. DynamoDB - It is a fast and flexible NoSQL database service.

  3. ElastiCache - It is a web service used to deploy, operate, and scale an in-memory cache in the cloud. It improves the performance of web applications by letting you retrieve information from a fast, managed in-memory cache instead of relying entirely on slower disk-based databases. Caching improves application performance by storing critical pieces of data in memory for low-latency access. ElastiCache supports two engines:
    1. Memcached - Memcached keeps its data in memory, eliminating the need to access the disk.
      1. Memcached is an in-memory key-value store that avoids seek-time delays and can access data in microseconds.
      2. It is distributed, meaning it can be scaled out by adding new nodes.
    2. Redis - Redis stands for Remote Dictionary Server.
      1. It is a fast, open-source, in-memory key-value data store.
      2. It delivers sub-millisecond response times and can serve millions of requests per second for real-time applications such as gaming, AdTech, financial services, health care and IoT.
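A common way for an application to use ElastiCache is the cache-aside (lazy loading) pattern. The application looks in the cache first, and only on a miss reads from the database and stores the result with an expiry time. In the sketch below, the dictionary-backed `TTLCache` is a hypothetical stand-in for a Memcached or Redis node, and `load_user_from_db` stands in for a slow database query. A real application would use a Memcached or Redis client library instead. The clock is injectable so the expiry can be demonstrated without waiting.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for an ElastiCache node, with per-key expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:  # stale entry: treat it as a miss
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (self.clock() + self.ttl, value)


def get_user(user_id: int, cache: TTLCache, load_user_from_db: Callable[[int], Any]) -> Any:
    """Cache-aside read: serve from the cache on a hit, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_user_from_db(user_id)  # slow, disk-based path
    cache.set(key, value)
    return value


now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
db_reads = []


def load_user_from_db(user_id: int) -> dict:
    """Hypothetical database query that records each call."""
    db_reads.append(user_id)
    return {"id": user_id, "name": "example"}


get_user(7, cache, load_user_from_db)  # miss: reads the database
get_user(7, cache, load_user_from_db)  # hit: served from memory
now[0] = 61.0                          # 61 seconds later the entry has expired
get_user(7, cache, load_user_from_db)  # miss again
print(db_reads)  # [7, 7]
```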

Security, Identity & Compliance

  1. IAM - Identity and Access Management
    1. IAM permissions are defined in policies, which are written as JSON documents called policy documents.
    2. Policies can be attached to users, groups and roles. A role is an identity with its own policies that can be assumed by users, applications or AWS services such as EC2.
    3. Best practice is to attach policies to groups and add users to those groups, rather than attaching policies to each user.
    4. SAML (Security Assertion Markup Language) is a standard for achieving Single Sign-On (SSO), meaning users log in once and can use the same credentials to log in to other service providers.
    5. SAML provides security by eliminating passwords for an app and replacing them with security tokens.
    6. A SAML exchange involves two parties: the identity provider (IdP), which authenticates the user, and the service provider (SP), which trusts the IdP's assertion.
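Here is a heavily simplified evaluator showing how a JSON policy document is interpreted. It follows two IAM evaluation rules:

- everything is denied by default unless an `Allow` statement matches
- an explicit `Deny` always overrides any `Allow`

It only looks at `Effect`, `Action` and `Resource` (no conditions, principals or `NotAction`). It matches wildcards with Python's `fnmatch` and uses a hypothetical bucket called `example-bucket`. Real IAM evaluation involves many more policy types.

```python
import json
from fnmatch import fnmatchcase

POLICY_DOCUMENT = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    {"Effect": "Deny", "Action": "s3:*", "Resource": "arn:aws:s3:::example-bucket/private/*"}
  ]
}
""")


def _as_list(value):
    """Action and Resource may be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Simplified check of one request against one policy document."""
    allowed = False  # implicit deny
    for statement in policy["Statement"]:
        # Action names are case-insensitive in IAM; resource ARNs are compared as-is.
        action_match = any(fnmatchcase(action.lower(), a.lower()) for a in _as_list(statement["Action"]))
        resource_match = any(fnmatchcase(resource, r) for r in _as_list(statement["Resource"]))
        if not (action_match and resource_match):
            continue
        if statement["Effect"] == "Deny":
            return False  # an explicit Deny always wins
        allowed = True
    return allowed


print(is_allowed(POLICY_DOCUMENT, "s3:GetObject", "arn:aws:s3:::example-bucket/photo.jpg"))        # True
print(is_allowed(POLICY_DOCUMENT, "s3:GetObject", "arn:aws:s3:::example-bucket/private/key.pem"))  # False
```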

Management and Governance

  1. CloudFormation - It is a tool from AWS that allows you to spin up resources effortlessly. You define all the resources you want AWS to spin up in a blueprint document, click a button, and AWS creates all of the components. This blueprint is called a template.
    1. CloudFormation makes sure that dependent resources in your template are created in the proper order. For example, if a DNS record points to an EC2 instance, CloudFormation will provision the EC2 instance first, wait for it to be ready and then create the Route 53 DNS record afterwards.
    2. CloudFormation templates are written in JSON or YAML format.

  2. CloudWatch - CloudWatch is a service used to monitor your AWS resources and applications that you run on AWS in real time. 
    1. CloudWatch is used to collect and track metrics that measure your resources and applications.
    2. Following are the terms associated with CloudWatch:
      1. Dashboards: CloudWatch is used to create dashboards to show what is happening with your AWS environment.
      2. Alarms: It allows you to set alarms to notify you whenever a particular threshold is hit.
      3. Logs: CloudWatch logs help you to aggregate, monitor, and store logs.
      4. Events: CloudWatch Events help you respond to state changes in your AWS resources.

  3. Auto Scaling - Auto Scaling scales your EC2 instance capacity automatically, with help from Amazon CloudWatch. It scales EC2 instances in or out of an Auto Scaling group according to the launch configuration, when a scheduled event occurs or a CloudWatch alarm is triggered. We first have to create a launch configuration (the choice of AMI and EC2 instance type) and then the Auto Scaling group.
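CloudFormation's ordering guarantee can be illustrated with a topological sort over a template's resources. The sketch below treats `Ref`, `Fn::GetAtt` and `DependsOn` as dependency edges, which mirrors how CloudFormation infers dependencies. The trimmed template (a security group, an EC2 instance and a Route 53 record pointing at it) is hypothetical and not deployable as written. CloudFormation itself creates independent resources in parallel rather than in one sequence.

```python
from typing import Any, Dict, List, Set

TEMPLATE = {
    "Resources": {
        "DnsRecord": {
            "Type": "AWS::Route53::RecordSet",
            "Properties": {"ResourceRecords": [{"Fn::GetAtt": ["WebServer", "PublicIp"]}]},
        },
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {"SecurityGroups": [{"Ref": "WebSecurityGroup"}]},
        },
        "WebSecurityGroup": {"Type": "AWS::EC2::SecurityGroup", "Properties": {}},
    }
}


def find_references(node: Any, found: Set[str]) -> Set[str]:
    """Collect logical IDs referenced through Ref or Fn::GetAtt anywhere inside `node`."""
    if isinstance(node, dict):
        if "Ref" in node:
            found.add(node["Ref"])
        if "Fn::GetAtt" in node:
            target = node["Fn::GetAtt"]
            found.add(target[0] if isinstance(target, list) else target.split(".")[0])
        for value in node.values():
            find_references(value, found)
    elif isinstance(node, list):
        for value in node:
            find_references(value, found)
    return found


def creation_order(template: Dict[str, Any]) -> List[str]:
    """Order resources so each comes after the ones it depends on (Kahn's algorithm)."""
    resources = template["Resources"]
    pending: Dict[str, Set[str]] = {}
    for name, body in resources.items():
        refs = find_references(body.get("Properties", {}), set())
        depends_on = body.get("DependsOn", [])
        refs.update([depends_on] if isinstance(depends_on, str) else depends_on)
        # Keep only other resources; Ref can also point at parameters such as AWS::Region.
        pending[name] = {r for r in refs if r in resources}

    order: List[str] = []
    ready = sorted(n for n, deps in pending.items() if not deps)
    waiting = {n: deps for n, deps in pending.items() if deps}
    while ready:
        current = ready.pop(0)
        order.append(current)
        for name in sorted(waiting):
            waiting[name].discard(current)
            if not waiting[name]:
                del waiting[name]
                ready.append(name)
        ready.sort()
    if waiting:
        raise ValueError(f"circular dependency between: {sorted(waiting)}")
    return order


print(creation_order(TEMPLATE))  # ['WebSecurityGroup', 'WebServer', 'DnsRecord']
```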

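CloudWatch alarms and Auto Scaling work together. An alarm watches a metric, and when it fires, a scaling policy changes the group's capacity within its minimum and maximum size. The sketch below models both halves in a simplified way:

- `alarm_state` uses the "M out of N datapoints" idea from CloudWatch alarms, ignoring missing-data handling.
- `desired_capacity` uses a proportional rule in the spirit of target tracking.

Neither function is AWS's exact algorithm, and all numbers are illustrative.

```python
import math
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int, datapoints_to_alarm: int) -> str:
    """ALARM if at least M of the last N datapoints exceed the threshold (simplified)."""
    recent = list(datapoints)[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in recent if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Resize the group so the metric (e.g. average CPU %) moves back towards the target."""
    wanted = math.ceil(current * metric / target)
    return max(min_size, min(max_size, wanted))  # stay inside the group's bounds


cpu = [40.0, 85.0, 91.0, 88.0]  # average CPU % per 5-minute period
if alarm_state(cpu, threshold=80, evaluation_periods=3, datapoints_to_alarm=3) == "ALARM":
    print(desired_capacity(current=2, metric=cpu[-1], target=50, min_size=1, max_size=6))  # 4
```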
Application Integration

  1. SNS - SNS stands for Simple Notification Service.
    1. It is a way of sending messages. For example, when you are using Auto Scaling, it can trigger an SNS notification that emails you that "your EC2 instance is growing".
    2. SNS notifications can also trigger a Lambda function. When a message is published to an SNS topic that has a Lambda function subscribed to it, the Lambda function is invoked with the payload of the message.
    3. Amazon SNS allows you to group multiple recipients using topics, where a topic is a logical access point that sends identical copies of the same message to the subscribed recipients.
    4. To prevent the loss of data, all messages published to SNS are stored redundantly across multiple availability zones.

  2. SQS - SQS stands for Simple Queue Service.
    1. Amazon SQS is a web service that gives you access to a message queue that can be used to store messages while waiting for a computer to process them.
    2. Amazon SQS is a distributed queue system that enables web service applications to quickly and reliably queue messages that one component of the application generates, to be consumed by another component. A queue is a temporary repository for messages that are awaiting processing.
    3. Messages can contain up to 256 KB of text in any format, such as JSON, XML, etc.
    4. It is useful when the producer is producing work faster than the consumer can process it, or when the producer or consumer is only intermittently connected to the network.
    5. The default visibility timeout is 30 seconds. It can be increased if your task takes more than 30 seconds; the maximum visibility timeout is 12 hours.
    6. There are two types of Queue:
      1. Standard Queues (default)
      2. FIFO Queues (First-In-First-Out)

  3. SWF - SWF stands for Simple Workflow Service.
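When SNS invokes a Lambda function, as described in the SNS section, the event contains a `Records` list with one record per notification. The text of each notification is under `Sns.Message`. The handler below is a minimal sketch that just extracts the subject and message. The sample event is trimmed down to the fields the handler reads, and the function name `handler` is arbitrary.

```python
from typing import Any, Dict, List


def handler(event: Dict[str, Any], context: Any) -> List[Dict[str, Any]]:
    """Lambda function subscribed to an SNS topic: extract each delivered message."""
    messages = []
    for record in event.get("Records", []):
        sns = record["Sns"]
        messages.append({"subject": sns.get("Subject"), "message": sns["Message"]})
    return messages


sample_event = {
    "Records": [
        {
            "EventSource": "aws:sns",
            "Sns": {"Subject": "Auto Scaling", "Message": "your EC2 instance is growing"},
        }
    ]
}
print(handler(sample_event, None))
# [{'subject': 'Auto Scaling', 'message': 'your EC2 instance is growing'}]
```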

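A toy queue makes the SQS visibility timeout easier to understand. A received message is hidden from other consumers for the timeout period and reappears if it is not deleted in time. `ToyQueue` below is a hypothetical in-memory model, not the SQS API, and it simplifies SQS in three ways:

- It takes the current time as an argument so the behaviour is deterministic.
- It returns messages in insertion order, whereas standard SQS queues only offer best-effort ordering.
- It deletes by message ID, whereas real SQS uses a receipt handle.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

MAX_MESSAGE_BYTES = 256 * 1024  # SQS message size limit: 256 KB


@dataclass
class _Entry:
    body: str
    invisible_until: float = 0.0


class ToyQueue:
    """In-memory model of an SQS queue's visibility timeout (hypothetical, for illustration)."""

    def __init__(self, visibility_timeout: float = 30.0) -> None:
        self.visibility_timeout = visibility_timeout
        self._messages: Dict[int, _Entry] = {}
        self._ids = itertools.count(1)

    def send(self, body: str) -> int:
        if len(body.encode("utf-8")) > MAX_MESSAGE_BYTES:
            raise ValueError("message is larger than 256 KB")
        message_id = next(self._ids)
        self._messages[message_id] = _Entry(body)
        return message_id

    def receive(self, now: float) -> Optional[Tuple[int, str]]:
        """Return the first visible message and hide it for the visibility timeout."""
        for message_id, entry in self._messages.items():
            if entry.invisible_until <= now:
                entry.invisible_until = now + self.visibility_timeout
                return message_id, entry.body
        return None

    def delete(self, message_id: int) -> None:
        """The consumer deletes the message once it has finished processing it."""
        self._messages.pop(message_id, None)


queue = ToyQueue(visibility_timeout=30)
msg_id = queue.send("resize image 42")
print(queue.receive(now=0))    # (1, 'resize image 42'): now invisible until t=30
print(queue.receive(now=10))   # None: still being processed by the first consumer
print(queue.receive(now=31))   # (1, 'resize image 42'): reappeared, first consumer too slow
queue.delete(msg_id)
print(queue.receive(now=100))  # None: processed and deleted
```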
So far we have covered the major services that are useful for creating a highly available architecture in the cloud.
I have created a simple architecture for hosting a web app, which I kept evolving while I was learning AWS. I have now scaled it to a highly available, reliable and scalable architecture, which we will cover in the next section: Designing HA Architecture in AWS part-3

Saturday, 3 August 2019

Concepts of AWS part-1

Nowadays we have so many cloud services hosted on the internet by various providers like Apple iCloud, Google Cloud Platform, Microsoft Azure, Amazon Web Services, IBM Cloud, Salesforce and many others.
Everybody has their own perception of how things are uploaded/downloaded/accessed from a cloud, but few people are keen to learn how things actually work in this process.
Because of the versatility and vastness of Amazon Web Services, I have chosen it to understand cloud architecture and clear up my concepts.

Let's start by understanding what AWS and its services are. In the end we will be able to create a cloud architecture ourselves by connecting all the dots together.

What is AWS?

AWS stands for Amazon Web Services.

Amazon's Cloud services provide great flexibility in provisioning, duplicating and scaling resources to balance the requirements of users, hosted applications and solutions.
Cloud services are built, operated and managed by a cloud service provider, which works to ensure end-to-end availability, reliability and security of the cloud.

AWS ensures the three aspects of security, i.e., confidentiality, integrity, and availability of users' data.

There are three basic types of cloud services:

  • Software as a service (SaaS)
  • Infrastructure as a service (IaaS)
  • Platform as a service (PaaS)

In addition to the services above, AWS also offers Function as a Service (FaaS), which is the concept on which serverless computing is built.
These services are the building blocks that can be used to create and deploy any type of application in the cloud.
Currently there are around 165 services that are being offered by AWS.
If you want to know more about the services offered by AWS, please feel free to follow this link.

History of AWS

  • In 2003, Chris Pinkham and Benjamin Black presented a paper on what Amazon's own internal infrastructure should look like. They suggested selling it as a service and prepared a business case for it in a six-page document, which was reviewed to decide whether or not to proceed. The decision was to go ahead.
  • In 2004, the first web service SQS which stands for "Simple Queue Service" was officially launched.
  • In 2006, AWS (Amazon Web Services) was officially re-launched, combining the three initial service offerings of Amazon S3 cloud storage, SQS and EC2.
  • In 2007, over 180,000 developers had signed up for AWS.
  • In 2014, AWS claimed its aim was to achieve 100% renewable energy usage in the future.
  • In 2015, Amazon reported AWS revenue separately for the first time: it reached $6 billion USD per annum and was growing 90% every year.
  • By 2016, revenue had doubled and reached $13 billion USD per annum.
  • In 2018, AWS launched the Machine Learning Specialty certification, reflecting its heavy focus on automating artificial intelligence and machine learning.

Advantages of AWS

  1. High availability and durability (S3, for example, is designed for 99.999999999% durability)
  2. High Scalability/Elasticity (expand/shrink on demand)
  3. Fault tolerance (Reliable/Resilient)
  4. Based on the concept of Pay-As-You-Go - Pay for the resources when you need them.
  5. Cost-effectiveness - No long term commitments/huge investments in physical infrastructure.
  6. Loosely coupled architecture - Best fit for adopting microservice architecture.
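Durability figures are easier to grasp as expected losses. S3 is designed for 99.999999999% (eleven nines) durability. Treating that as an annual loss probability of 10^-11 per object, which is how AWS itself illustrates the figure, gives the arithmetic below. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def expected_annual_losses(objects: int, nines: int) -> Fraction:
    """Expected number of objects lost per year at a durability of `nines` nines."""
    annual_loss_probability = Fraction(1, 10**nines)  # 1 - 0.999...9
    return objects * annual_loss_probability


losses = expected_annual_losses(10_000_000, 11)
print(losses)      # 1/10000 objects per year
print(1 / losses)  # 10000, i.e. on average one object lost every 10,000 years
```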

How to SignUp to the AWS platform

  • First visit https://aws.amazon.com/, then click on Complete Sign Up to create an account and fill in the required details.
  • Now, fill in your contact information.
  • After providing the contact information, fill in your payment information. (Don't worry, nothing will be deducted from your account.)
  • After providing your payment information, confirm your identity by entering your phone number and security check code, and then click on the "Contact me" button.
  • AWS will contact you to verify whether the provided contact number is correct or not via a phone call.
  • The final step is the confirmation step. Click on the link to log in again; it redirects you to the "Management Console".
AWS provides 4 support plans; you can choose one as per your usage/features:
  • Basic - Free
  • Developer - Starting at $29 per month
  • Business - Starting at $100 per month
  • Enterprise - Starting at $15,000 per month

Wondering how much of the internet runs on AWS?

AWS reportedly hosts about 5% of all websites and accounts for about 40% of all Internet traffic.
You can check this by blocking all traffic coming from the IP address ranges of AWS-hosted servers, which AWS publishes here.
There is a simple script called AWS Blocker, created by a developer, which retrieves the official list of AWS IPv4 and IPv6 ranges and then blocks them all using iptables.
After running this script on your Linux machine, you won't be able to listen to Spotify, book a flight on Expedia, look at rooms on Airbnb, or even watch your favorite series on Netflix :'(
This is what the internet would look like if Amazon Web Services suddenly ceased to exist.
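The idea behind a blocker like this takes only a few lines of code. The sketch parses AWS's published IP range list, tests addresses against it, and generates one firewall rule per prefix. This is not the AWS Blocker script itself. It makes these assumptions:

- The JSON layout follows AWS's `ip-ranges.json`: a `prefixes` list with `ip_prefix` entries and an `ipv6_prefixes` list with `ipv6_prefix` entries.
- The sample prefixes are documentation-only ranges, not real AWS ranges.
- It only prints `iptables`/`ip6tables` commands rather than running them.

```python
import ipaddress
import json
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Sample data in the ip-ranges.json layout; these are documentation prefixes, not AWS ranges.
SAMPLE_IP_RANGES = """
{
  "prefixes": [
    {"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "AMAZON"}
  ],
  "ipv6_prefixes": [
    {"ipv6_prefix": "2001:db8::/32", "region": "us-east-1", "service": "AMAZON"}
  ]
}
"""


def load_networks(raw_json: str) -> List[Network]:
    """Parse IPv4 and IPv6 prefixes from a document in the ip-ranges.json layout."""
    data = json.loads(raw_json)
    v4 = [ipaddress.ip_network(p["ip_prefix"]) for p in data.get("prefixes", [])]
    v6 = [ipaddress.ip_network(p["ipv6_prefix"]) for p in data.get("ipv6_prefixes", [])]
    return v4 + v6


def is_listed(address: str, networks: List[Network]) -> bool:
    """True if the address falls inside any listed prefix (IPv4 never matches IPv6)."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in networks)


def firewall_rules(networks: List[Network]) -> List[str]:
    """One DROP rule per prefix for outgoing traffic (printed, not executed)."""
    rules = []
    for net in networks:
        tool = "iptables" if net.version == 4 else "ip6tables"
        rules.append(f"{tool} -A OUTPUT -d {net} -j DROP")
    return rules


networks = load_networks(SAMPLE_IP_RANGES)
print(is_listed("203.0.113.10", networks))  # True
print("\n".join(firewall_rules(networks)))
```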


In the next blog post, we will discuss in depth the components that make AWS such an amazing cloud platform: Components of AWS part-2