
AWS ESSENTIALS BASICS (theoretical) Part - 2


AWS Networking:

Networking is how we connect computers across the world, allowing them to communicate with one another. You've already seen some examples of networking in action, such as AWS's global infrastructure, which connects data centers, Availability Zones, and Regions together. Now, let's dive into the networking basics and how AWS uses networking to make the cloud function smoothly.

The Basics of Networking: Think of Sending a Letter

Imagine you're sending a letter. To get that letter to its destination, you need three pieces of information:

  • Payload: The actual letter inside the envelope.
  • Sender's Address: Where the letter is coming from.
  • Recipient's Address: Where the letter is going.

Each address includes information like the sender and recipient's names, street, city, state, postal code, and country. Without these details, the letter might never reach its destination. In the digital world, computers handle the delivery of messages in a similar way—this process is called routing.

What Are IP Addresses?

Just like houses have mailing addresses, every computer has an IP address. Instead of using street names and cities, IP addresses are made up of numbers, specifically in binary (0s and 1s). For example, here’s a 32-bit binary address:


11000000 10101000 00000001 00011110

But don’t worry, you won’t usually see it like this!

What Is IPv4 Notation?

Usually, instead of showing you the binary version, IP addresses are written in decimal format. This is known as IPv4 (Internet Protocol version 4) notation. Here’s an example of how that works:


192.168.1.30

In this notation, the 32 bits are split into octets (groups of 8 bits) and converted to decimal, with each group separated by periods. But often, we need to express a range of IP addresses, not just one single address.

CIDR Notation: A Compressed Way to Specify IP Ranges

Let’s say you want to express a range of IP addresses from 192.168.1.0 to 192.168.1.255. Instead of listing all 256 addresses, you can use CIDR (Classless Inter-Domain Routing) notation, like this:


192.168.1.0/24

The /24 tells you that the first 24 bits of the IP address are fixed, and the remaining 8 bits are flexible, allowing for up to 256 different addresses (2^8).

  • A larger prefix (e.g., /28) fixes more bits, leaving fewer addresses (16).
  • A smaller prefix (e.g., /16) fixes fewer bits, leaving more addresses (65,536).

In AWS, when you set up your cloud network, you choose its size using CIDR notation. The smallest range allowed is /28 (16 IP addresses) and the largest is /16 (65,536 addresses).
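
If you want to sanity-check these numbers yourself, Python's built-in ipaddress module can do the CIDR math. This is only an illustration; the addresses below are example values:

import ipaddress

# A /24 leaves 32 - 24 = 8 host bits, so 2**8 = 256 addresses
network = ipaddress.ip_network("192.168.1.0/24")
print(network.num_addresses)      # 256
print(network[0], network[-1])    # 192.168.1.0 192.168.1.255

# A /28 leaves only 4 host bits: 16 addresses
print(ipaddress.ip_network("192.168.1.0/28").num_addresses)   # 16

# A /16 leaves 16 host bits: 65,536 addresses
print(ipaddress.ip_network("10.0.0.0/16").num_addresses)      # 65536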

Creating and Managing VPCs in AWS: A Simple Breakdown

A Virtual Private Cloud (VPC) is an isolated network you create in the AWS cloud, similar to a traditional network in an on-premises data center. Here’s a simplified guide to setting up and managing your VPC:

1. Setting Up Your VPC

When creating a VPC, three main decisions need to be made:

  • Name of Your VPC: Give your VPC an identifiable name.
  • Region: Choose the AWS Region where your VPC will reside. Each VPC spans across multiple Availability Zones (AZs) within that region.
  • IP Range (CIDR Notation): Define the size of your network. For example, a /16 CIDR block provides 65,536 IP addresses, and each VPC can have up to four /16 ranges.

Once these choices are made, AWS provisions the network and allocates IP addresses based on the CIDR block.

2. Creating Subnets

After setting up your VPC, the next step is to create subnets—smaller networks within your VPC. These are useful for organizing and optimizing resources:

  • VPC Selection: Choose the VPC the subnet will belong to.
  • Availability Zone: Pick an AZ for the subnet.
  • CIDR Block: Allocate a subset of the VPC's CIDR block (e.g., 10.0.0.0/24).

Subnets help isolate network traffic and ensure high availability. For example, an EC2 instance is deployed inside a subnet within an AZ.

3. High Availability in VPC

To ensure redundancy and fault tolerance, create multiple subnets in different AZs. If one AZ fails, resources in another AZ can continue to operate, preventing downtime.

4. Reserved IP Addresses

AWS reserves five IP addresses in each subnet for internal use (e.g., for routing and DNS services):

  • Example: A /24 subnet provides 256 IP addresses, but only 251 are usable due to AWS's reserved IPs.

When designing networks, it's a good practice to start with a /16 CIDR block for the VPC and divide it into /24 subnets, ensuring plenty of IP addresses for flexibility.

5. Gateways

To make your VPC functional and connected, you need gateways:

  • Internet Gateway (IGW): This connects your VPC to the public internet, acting like a modem. Once created, the IGW must be attached to your VPC to enable internet access.
  • Virtual Private Gateway (VGW): This allows you to connect your VPC to another private network through an encrypted VPN. The VGW serves as the AWS-side anchor, while a Customer Gateway (CGW)—a physical or software-based device—connects on your end.

What is VPN?

A VPN (Virtual Private Network) creates a secure, encrypted tunnel over the public internet between two points, such as between your corporate network and your AWS VPC. This ensures data can travel safely without being intercepted.

Example:

Let’s say you want to send data from your office network (192.168.1.10) to your AWS VPC (10.0.0.5):

  1. Without VPN: Data travels over the internet unprotected, making it vulnerable to interception.
  2. With VPN: Data is encrypted and sent through a secure tunnel. Only the intended recipient (AWS VPC) can decrypt and access the data, keeping it safe.

In summary, VPNs offer secure communication between your on-premises network and AWS, ensuring that sensitive data remains private.

Steps to Create a VPC and Subnet in AWS

  1. Log in to AWS Console

    • Go to your AWS Console and make sure you’re in the correct Region (e.g., Oregon).
  2. Navigate to VPC Dashboard

    • In the AWS search bar, type VPC and select it to enter the VPC dashboard.
  3. Create a VPC

    • Select Your VPCs on the left panel and click Create VPC.
    • Enter a CIDR block (e.g., 10.1.0.0/16), and give your VPC a name (e.g., "app-vpc").
    • Leave the rest as default and click Create VPC.
  4. Create Subnets

    • Go to the Subnets section in the left panel and click Create Subnet.
    • Select your VPC (e.g., "app-vpc").
    • Choose an Availability Zone (e.g., us-west-2a).
    • Enter a CIDR block for your subnet (e.g., 10.1.1.0/24 for public subnet).
    • Give the subnet a name (e.g., "Public Subnet 1").
    • Repeat these steps for a Private Subnet with a different CIDR block (e.g., 10.1.3.0/24).
  5. Enable Internet Connectivity (Optional)

    • For internet access, go to Internet Gateways on the left panel and click Create Internet Gateway.
    • Name the internet gateway, and then Attach it to your VPC by selecting Actions > Attach to VPC and choosing your VPC.
  6. High Availability

    • For high availability, create additional subnets in another AZ (e.g., us-west-2b), duplicating your public and private subnets (e.g., 10.1.2.0/24 for public and 10.1.4.0/24 for private).

Now you have a VPC with multiple subnets across different availability zones. You can start deploying resources like EC2 instances inside these subnets.
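
If you prefer to script these steps instead of clicking through the console, here is a minimal boto3 (Python) sketch of the same flow. It assumes your AWS credentials are already configured and reuses the example names, Region, and CIDR blocks from the steps above:

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# 1. Create the VPC with a /16 CIDR block and tag it with a name
vpc_id = ec2.create_vpc(CidrBlock="10.1.0.0/16")["Vpc"]["VpcId"]
ec2.create_tags(Resources=[vpc_id], Tags=[{"Key": "Name", "Value": "app-vpc"}])

# 2. Create a public and a private subnet in us-west-2a
public_1 = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.1.1.0/24",
                             AvailabilityZone="us-west-2a")
private_1 = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.1.3.0/24",
                              AvailabilityZone="us-west-2a")

# 3. Create an internet gateway and attach it to the VPC
igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

# For high availability, repeat step 2 in us-west-2b with 10.1.2.0/24 and 10.1.4.0/24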


1. Routing Tables: Directing Traffic

A Routing Table is like a map that tells your VPC how to get traffic from one point to another. Think of it as a guide that routes network requests from one place to another within your network or even outside it.

  • Role: A Routing Table defines how the traffic should flow between subnets in your VPC or between your VPC and the outside world (like the internet or your on-premises network).
  • Example: Imagine you have a user trying to access your web server hosted in an AWS EC2 instance. The routing table ensures that the request is directed to the appropriate subnet (where your EC2 instance resides) through the internet gateway.
  • Key Point: By default, AWS creates a main route table that allows traffic between subnets in your VPC. You can create custom route tables to route specific traffic to certain subnets (like public and private ones).

In a routing table, you specify where traffic should go:

  • Destination: 10.1.0.0/16, Target: Local (traffic stays within the VPC)
  • Destination: 0.0.0.0/0, Target: Internet Gateway (IGW) (internet-bound traffic is sent to the IGW)

Here, traffic within the VPC can flow freely (local), and internet-bound traffic is routed through the internet gateway.
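
As a rough illustration, the same two routes can be set up with boto3. The IDs below are placeholders; in practice they come from the VPC, internet gateway, and subnet you created earlier:

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

vpc_id = "vpc-0123456789abcdef0"               # placeholder
igw_id = "igw-0123456789abcdef0"               # placeholder
public_subnet_id = "subnet-0123456789abcdef0"  # placeholder

# Create a custom route table for the public subnet
rt_id = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]

# The local route for the VPC CIDR (e.g., 10.1.0.0/16) is added automatically;
# add a route that sends internet-bound traffic (0.0.0.0/0) to the internet gateway
ec2.create_route(RouteTableId=rt_id,
                 DestinationCidrBlock="0.0.0.0/0",
                 GatewayId=igw_id)

# Associate the route table with the public subnet
ec2.associate_route_table(RouteTableId=rt_id, SubnetId=public_subnet_id)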

2. Network ACLs: Filtering Traffic at the Subnet Level

A Network Access Control List (Network ACL) acts like a subnet-level firewall, allowing or denying specific types of traffic into and out of your subnets. It's the first layer of security that filters traffic even before it reaches any specific instance inside the subnet.

  • Role: Network ACLs filter both inbound and outbound traffic to and from the subnets.
    • Stateless: This means you must explicitly define both inbound and outbound rules. If you allow inbound traffic, you also need to specify an outbound rule for the corresponding response.
  • Example: Let’s say you want to allow HTTPS traffic into your public subnet but block all other traffic. You would configure your Network ACL to allow traffic on port 443 (HTTPS) and deny everything else.

Network ACLs filter traffic at the subnet level. Here’s a common configuration for allowing only HTTPS traffic and denying everything else.

Inbound Rules:

  • Rule 100: Allow HTTPS traffic from anywhere (Port: 443, Source: 0.0.0.0/0)
  • Rule * (implicit deny): Deny all other traffic

Outbound Rules:

  • Rule 100: Allow all traffic out (Destination: 0.0.0.0/0, Port: Any)
  • Rule * (implicit deny): Deny all other outbound traffic

This configuration allows HTTPS traffic to the subnet while blocking all other inbound traffic.
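
For reference, here is roughly how those rules could be created with boto3 (a sketch with a placeholder VPC ID; the new ACL would still need to be associated with your subnet):

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")
vpc_id = "vpc-0123456789abcdef0"   # placeholder

# Create a network ACL in the VPC
nacl_id = ec2.create_network_acl(VpcId=vpc_id)["NetworkAcl"]["NetworkAclId"]

# Inbound rule 100: allow HTTPS (TCP port 443) from anywhere
ec2.create_network_acl_entry(NetworkAclId=nacl_id, RuleNumber=100,
                             Protocol="6", RuleAction="allow", Egress=False,
                             CidrBlock="0.0.0.0/0",
                             PortRange={"From": 443, "To": 443})

# Outbound rule 100: allow all traffic out; anything not matched by a numbered
# rule falls through to the implicit deny (*)
ec2.create_network_acl_entry(NetworkAclId=nacl_id, RuleNumber=100,
                             Protocol="-1", RuleAction="allow", Egress=True,
                             CidrBlock="0.0.0.0/0")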

3. Security Groups: Instance-Level Firewalls

A Security Group acts as an instance-level firewall for controlling traffic to and from specific instances like EC2. Security Groups allow you to define rules about which traffic is allowed into and out of your instance.

  • Role: Security Groups work at the instance level, allowing or denying traffic based on inbound and outbound rules.
    • Stateful: This means if you allow inbound traffic (e.g., a web request), the response traffic (outbound) is automatically allowed. You don’t need to define separate outbound rules for the response.
  • Example: If you have an EC2 instance running a web server, you might allow inbound traffic on port 80 (HTTP) and port 443 (HTTPS), while blocking all other inbound traffic.

Security groups filter traffic at the instance level and are stateful. Here's an example for a web server:

Inbound Rules:

  • Allow HTTP traffic on port 80 from anywhere (Source: 0.0.0.0/0)
  • Allow HTTPS traffic on port 443 from anywhere (Source: 0.0.0.0/0)

Outbound Rules:

  • Allow all outbound traffic (Destination: 0.0.0.0/0, Port: Any)

This configuration allows web traffic (HTTP/HTTPS) to the EC2 instance.
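
The equivalent web-server security group could be created along these lines (a sketch; the group name, description, and VPC ID are placeholders):

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")
vpc_id = "vpc-0123456789abcdef0"   # placeholder

# Create the security group for the web server
sg_id = ec2.create_security_group(GroupName="web-server-sg",
                                  Description="Allow HTTP and HTTPS",
                                  VpcId=vpc_id)["GroupId"]

# Inbound: allow HTTP (80) and HTTPS (443) from anywhere
ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
)

# Because security groups are stateful, responses to this traffic are allowed
# automatically, and all outbound traffic is allowed by default.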

How They Work Together:

Let’s break down how Routing Tables, Network ACLs, and Security Groups work together in a typical VPC:

  1. Routing Traffic:

    • Routing Tables direct incoming requests (like a web request) to the appropriate subnet. For example, if a user is accessing your web server, the routing table ensures that the traffic is sent to the correct public subnet through the internet gateway.
  2. Filtering at the Subnet Level:

    • Once the traffic reaches the public subnet, Network ACLs check whether the incoming request is allowed. In our example, you might have a Network ACL that allows HTTPS traffic (port 443) but blocks all other traffic.
  3. Filtering at the Instance Level:

    • If the traffic is allowed by the Network ACL, it then reaches the specific EC2 instance hosting your web server. The Security Group for this EC2 instance further filters the traffic, ensuring only allowed traffic (like HTTPS) can reach the server.

Example Workflow for a Web Request:

  • Routing Table: Routes your request to the correct subnet.
  • Network ACL: Allows or denies traffic at the subnet level based on predefined rules.
  • Security Group: Controls which traffic is allowed to reach your EC2 instance (e.g., allowing HTTP/HTTPS traffic).

Key Differences:

  • Routing Table: Manages traffic routing between subnets and external networks.
  • Network ACL: Filters traffic at the subnet level and is stateless.
  • Security Group: Controls traffic at the instance level and is stateful.

By combining these elements, you can finely control and secure the flow of traffic into and out of your AWS resources!

Next, let's break down the different options for connecting an on-premises data center to AWS in hybrid deployments.

Hybrid Deployment Overview

In a hybrid deployment, part of your infrastructure exists on AWS (in the cloud), while other components remain in an on-premises data center (on-site). This hybrid setup is common when companies don’t want to move everything to the cloud or need to maintain some control over certain resources, such as legacy systems, databases, or sensitive data.

To connect your on-premises data center to your AWS infrastructure (VPC), there are two key options:

1. AWS VPN (Virtual Private Network)

This is the most common and cost-effective way to connect a remote data center to AWS securely. AWS VPN offers two types of VPN connections:

  • AWS Site-to-Site VPN:
    • This allows your on-premises data center to connect securely to your VPC in AWS over the internet.
    • It establishes a secure, encrypted connection between your network and AWS.
    • Suitable for workloads where internet-based connections are acceptable.
  • AWS Client VPN:
    • This is designed for administrators or employees who need secure access to AWS resources or the corporate data center.
    • It's more like a personal VPN connection that you would use to securely access AWS resources from your laptop, for instance.

Think of AWS VPN as your “virtual tunnel” through the public internet, providing a secure way to connect your office or data center to AWS.

2. AWS Direct Connect

Direct Connect is a more advanced solution when you need higher performance, lower latency, and more reliable connectivity than a standard VPN.

  • AWS Direct Connect creates a private, dedicated connection between your data center and AWS, bypassing the public internet entirely.
  • Traffic over Direct Connect is routed through AWS’s global network, which ensures better performance and security.
  • It’s ideal for businesses that have heavy traffic and need guaranteed bandwidth and reliability.

For example, if you run workloads that constantly transfer large amounts of data (like real-time applications, backups, or analytics), Direct Connect is a better choice since it offers a dedicated, high-speed connection.

Key Differences Between AWS VPN and Direct Connect:

  • AWS VPN:

    • Uses the public internet.
    • Lower cost.
    • Easier and quicker to set up.
    • Slightly higher latency and less reliable compared to Direct Connect.
  • AWS Direct Connect:

    • Private and dedicated connection.
    • No public internet involvement.
    • Higher bandwidth and consistent performance.
    • More suitable for mission-critical and high-volume traffic scenarios.
    • Higher cost and longer setup time (involves coordination with AWS delivery partners).

Combining Both: VPN as a Backup

Some companies use both AWS VPN and Direct Connect together. Direct Connect can be the primary connection for high-performance tasks, while VPN serves as a backup. This provides redundancy, ensuring that if your Direct Connect link fails, the VPN automatically takes over to maintain connectivity.

Real-World Example:

Imagine a large company has their employee directory application hosted in AWS, but their on-premises HR database is still located in their data center. They need a way to securely access that database from AWS. Here’s how they can connect:

  • Option 1: Use AWS Site-to-Site VPN to securely connect the AWS VPC to the HR database on-premises.
  • Option 2: For higher performance and reliability, they could set up AWS Direct Connect to create a dedicated connection between the on-premises data center and AWS.

By doing this, the company ensures smooth communication between their cloud and on-premises infrastructure, allowing them to maintain a hybrid model effectively.


Understanding AWS Storage Types: Block, File, and Object Storage

When working with cloud storage on AWS, it’s essential to know the different types of storage services available. AWS offers three primary categories of storage: block storage, file storage, and object storage. Each of these has unique characteristics and use cases, so let’s break them down.

1. File Storage: A Familiar System for Organizing Files

File storage is the most common type of storage for users who are familiar with organizing files on systems like Windows File Explorer or MacOS Finder. In file storage, data is organized in a tree-like hierarchy, using folders and subfolders.

Example:

Imagine you have a collection of cat photos on your computer. You might create a folder named Cat Photos and store all the images inside. If you’re using these cat photos for an application, you could then place the Cat Photos folder into another folder named Application Files for better organization. Each file also comes with metadata, such as the file name, size, and creation date, and has a path like:
computer/Application_files/Cat_photos/cats-03.png

File storage is typically used for cases where multiple hosts need centralized access to shared files, such as:

  • Large content repositories
  • Development environments
  • User home directories

AWS offers services like Amazon EFS (Elastic File System) for scalable file storage in the cloud.

2. Block Storage: High-Performance Storage for Critical Applications

Block storage takes a different approach. Instead of storing data as whole files, it splits the data into small fixed-size chunks known as blocks. Each block has a unique address, allowing them to be retrieved quickly and efficiently.

Unlike file storage, block storage doesn't store metadata. This approach makes block storage ideal for high-performance applications like databases or enterprise resource planning (ERP) systems that require low-latency storage solutions. AWS’s Elastic Block Store (EBS) is a popular block storage service.

Why Block Storage?

Let’s say you want to modify one character in a large document. In block storage, you don’t need to change the entire file. Instead, you only update the specific block that contains that character, which reduces bandwidth and improves performance. It’s particularly useful for:

  • Databases
  • Transactional applications
  • High-performance workloads

3. Object Storage: Unlimited Scalability for Unstructured Data

Object storage is another method of storing data but is optimized for unstructured and static assets. Unlike file storage, which organizes files in folders, object storage uses a flat structure. Every object (file) is stored with a unique identifier that can be used to retrieve it. In addition to the data itself, each object contains metadata, which can describe the object’s contents.

Why Object Storage?

While object storage is not optimized for frequent updates (changing one character means updating the entire object), it’s ideal for storing vast amounts of data that doesn’t need to change often. Use cases include:

  • Media files (images, videos)
  • Backups
  • Static website content

With AWS, Amazon S3 (Simple Storage Service) is the go-to for scalable object storage. It’s perfect for storing large datasets without worrying about file size limits.

Choosing the Right Storage Type

To summarize:

  • File Storage (e.g., Amazon EFS): Best for organizing and sharing files across multiple systems, similar to traditional file systems.
  • Block Storage (e.g., Amazon EBS): Ideal for high-performance workloads, offering fast access to specific blocks of data.
  • Object Storage (e.g., Amazon S3): Perfect for scalable, long-term storage of unstructured data, like media files and backups.

Understanding the differences between these storage types will help you select the best AWS service for your specific use case. Whether you need fast performance, easy sharing, or scalable storage, AWS has a solution tailored for you!

Amazon EC2 Instance Storage and Amazon EBS: Simplifying Block Storage

When you launch an Amazon EC2 instance, you'll need some form of block storage to go with it. Block storage acts like your laptop’s internal or external drives, and AWS provides two main options for this: Instance Store and Amazon Elastic Block Store (Amazon EBS). Let's break these down.

1. Amazon EC2 Instance Store: Fast but Temporary

Instance Store provides directly attached storage to your EC2 instance, meaning it's physically attached to the same server your instance is running on. Because of this direct attachment, it's very fast. However, there’s a downside: the storage is temporary. If you stop or terminate your instance, all the data in the Instance Store is lost. This makes Instance Store ephemeral, meaning it’s useful only for temporary data like buffers, caches, or scratch data that doesn’t need to persist after the instance is shut down.

When to Use Instance Store:

  • Hosting Hadoop clusters or distributed workloads where data replication ensures data availability.
  • Storing temporary or frequently changing data that doesn't need long-term storage.

2. Amazon Elastic Block Store (EBS): Persistent and Flexible

Unlike Instance Store, Amazon EBS volumes provide persistent storage, meaning your data survives even if your instance is stopped or terminated. Think of an EBS volume as an external hard drive attached to your EC2 instance. You can attach multiple EBS volumes to one instance, and if necessary, detach a volume and attach it to another instance within the same Availability Zone.

Key Features of EBS:

  • Persistence: Data remains safe even if the instance is terminated.
  • Flexibility: You can increase the volume size or change volume types without stopping the instance.
  • Backup: EBS supports snapshots, which are incremental backups of your volumes.

EBS is suitable for long-term storage and is often used for:

  • Operating system boot volumes
  • Databases
  • Enterprise applications that require quick, reliable access to data
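
To make the "external hard drive" idea concrete, here is a small boto3 sketch that creates a volume and attaches it to a running instance. The instance ID and device name are placeholders:

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Create a 100 GiB General Purpose SSD volume in the same AZ as the instance
volume_id = ec2.create_volume(AvailabilityZone="us-west-2a", Size=100,
                              VolumeType="gp3")["VolumeId"]

# Wait until the volume is available, then attach it
ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
ec2.attach_volume(VolumeId=volume_id,
                  InstanceId="i-0123456789abcdef0",   # placeholder instance ID
                  Device="/dev/sdf")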

3. EBS Volume Types: SSD vs. HDD

Amazon EBS offers different volume types that are split into two main categories: SSD (Solid-State Drives) and HDD (Hard Disk Drives). Each category serves different workloads.

  • EBS Provisioned IOPS SSD: High performance for latency-sensitive applications like databases.
  • EBS General Purpose SSD: Balanced for everyday workloads like boot volumes and development environments.
  • Throughput Optimized HDD: Great for big data, log processing, or other frequently accessed workloads.
  • Cold HDD: Best for infrequently accessed data.

For example, if you're running a high-performance NoSQL database, you might choose Provisioned IOPS SSD for maximum speed. But if you're storing large logs, Throughput Optimized HDD is a more cost-effective option.

4. Backing Up Data: EBS Snapshots

Things can go wrong, and that’s why backing up your data is essential. AWS makes this easy with EBS snapshots, which are incremental backups. This means only the blocks that have changed since your last backup are stored, saving time and space.

For instance, if you have a 10 GB EBS volume and only 2 GB of data has changed since the last snapshot, only those 2 GB will be backed up.

Snapshots are stored in Amazon S3, providing high availability and durability across multiple Availability Zones. You can use these snapshots to create new volumes in different Availability Zones, making it easy to restore data or scale your resources.
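
Taking and restoring a snapshot can also be scripted. A minimal boto3 sketch, assuming an existing volume ID (placeholder below):

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Take an incremental snapshot of an existing volume
snapshot_id = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0",
                                  Description="nightly backup")["SnapshotId"]
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

# Restore by creating a new volume from the snapshot, here in a different AZ
restored = ec2.create_volume(SnapshotId=snapshot_id,
                             AvailabilityZone="us-west-2b")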

Conclusion: Choosing the Right Block Storage

Choosing between Instance Store and EBS comes down to your need for speed versus persistence:

  • Use Instance Store for temporary, fast storage in situations where data loss upon instance termination isn’t an issue.
  • Use Amazon EBS for persistent, scalable storage that you can back up and recover, ensuring data safety even when instances fail.

For most applications requiring long-term data storage and reliability, Amazon EBS is the go-to solution.

Understanding Object Storage with Amazon S3

When it comes to storing vast amounts of data reliably and securely, Amazon Simple Storage Service (Amazon S3) is a powerhouse. Unlike other storage solutions like Amazon EBS, which are tied to compute services like EC2, Amazon S3 is a standalone object storage solution. This means it enables you to store and retrieve data from anywhere on the web, making it highly versatile for a variety of use cases, from backups to hosting websites.

What is Object Storage?

Amazon S3 is an object storage service, which means it stores data as objects. An object is simply a file combined with metadata (such as file name, size, or creation date), and you can store an unlimited number of objects in S3. The main difference between object storage and traditional storage methods like block or file storage is the flat storage structure of S3. In object storage, there’s no file hierarchy like you would see in a file system. Instead, every object is stored in a "bucket," and each object is identified with a unique key.

Key Concepts of Amazon S3

1. Buckets and Objects

To store data in S3, you first need to create a bucket. Think of a bucket as a container where you store your data (or objects). Each bucket is region-specific, meaning you choose which AWS region your bucket resides in. This decision is crucial because the region affects the redundancy and latency of your data.

Bucket names must be globally unique across all AWS accounts, and once created, a name stays associated with your account until you delete the bucket. Each object within a bucket can be accessed via a URL that looks something like this:

http://bucket-name.s3.amazonaws.com/folder-name/object-key

2. Metadata and Object Keys

Every object in S3 has associated metadata, such as the file type, size, and permissions. Additionally, each object is given a unique key that allows it to be accessed. While you can organize objects using "folders," S3’s flat structure means that all objects are essentially stored on the same level. The folder view is simply for easier human understanding.
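
A quick boto3 sketch shows how the flat structure works in practice: the "folder" is nothing more than a prefix inside the object key. The bucket and file names below are made up for the example:

import boto3

s3 = boto3.client("s3")

# Upload a local file; "cat_photos/" is just part of the key, not a real folder
s3.upload_file("cats-03.png", "my-example-bucket", "cat_photos/cats-03.png")

# Download the object again using the same key
s3.download_file("my-example-bucket", "cat_photos/cats-03.png", "cats-03-copy.png")

# List everything that shares the folder-like prefix
response = s3.list_objects_v2(Bucket="my-example-bucket", Prefix="cat_photos/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])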

S3 Use Cases

With its flexibility and scalability, Amazon S3 is used in a variety of scenarios:

  • Backup and Storage: One of the most common uses for S3. Thanks to its redundancy across multiple Availability Zones, it's ideal for backing up critical data.
  • Media Hosting: If you're hosting large files like images, videos, or music, S3 is perfect due to its unlimited storage potential.
  • Software Delivery: S3 can be used to host downloadable software for customers.
  • Data Lakes: S3’s scalability makes it a great foundation for storing vast amounts of structured and unstructured data.
  • Static Websites: S3 is often used to host static websites made up of HTML, CSS, and JavaScript.

Managing Access in Amazon S3

By default, all objects in Amazon S3 are private. However, AWS provides multiple methods to control who has access to your buckets and objects:

  • IAM Policies: These are general-purpose policies applied to IAM users, groups, or roles that grant or deny access to AWS resources, including S3.
  • Bucket Policies: Unlike IAM policies, bucket policies are specifically tied to the bucket itself. They can be used to manage cross-account access or make certain objects publicly available.

Example of S3 Bucket Policy:

Let’s say you have a bucket called employeebucket and you want to allow anonymous users to view the contents of the bucket (e.g., your employee photos). You can use an S3 bucket policy to make the objects publicly accessible.

Here’s an example of what the policy might look like:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::employeebucket/*"]
    }
  ]
}

This policy allows anyone to read objects from the employeebucket by granting the s3:GetObject permission. However, be careful when making buckets public, as this allows anyone on the internet to access your files.
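
If you manage the bucket from code, the same policy can be applied with boto3. This is only a sketch, and on newer accounts you may also have to relax the bucket's Block Public Access settings before a public policy is accepted:

import json
import boto3

s3 = boto3.client("s3")

public_read_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicRead",
        "Effect": "Allow",
        "Principal": "*",
        "Action": ["s3:GetObject"],
        "Resource": ["arn:aws:s3:::employeebucket/*"],
    }],
}

s3.put_bucket_policy(Bucket="employeebucket",
                     Policy=json.dumps(public_read_policy))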

Encryption and Security

Amazon S3 also allows for data encryption both at rest and in transit. Here are your encryption options:

  • Server-Side Encryption (SSE): S3 handles encryption and decryption of your objects automatically.
  • Client-Side Encryption: You manage the encryption process and keys yourself before uploading the data to S3.

Versioning: Preserving Your Data

S3 also offers a versioning feature that allows you to preserve, retrieve, and restore different versions of objects in a bucket. This means that when you overwrite or delete an object, the old version is not lost—it can be recovered if necessary. This feature is particularly useful for critical data that may be accidentally overwritten or deleted.

Example of Versioning in Action:

Let’s say you have a photo of your employee called employee.jpg stored in your S3 bucket. You’ve enabled versioning for your bucket.

  • You upload employee.jpg on day 1. S3 assigns a unique version ID to it (e.g., 111111).
  • On day 3, you accidentally overwrite employee.jpg with a newer version. S3 assigns a new version ID to the new object (e.g., 222222).

Now, even though you’ve overwritten the original file, you can still retrieve the old version by specifying its version ID.
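
A short boto3 sketch of the same idea (the version ID passed to get_object is the made-up value from the example):

import boto3

s3 = boto3.client("s3")

# Enable versioning on the bucket
s3.put_bucket_versioning(Bucket="employeebucket",
                         VersioningConfiguration={"Status": "Enabled"})

# After an overwrite, every version of the object is still listed
versions = s3.list_object_versions(Bucket="employeebucket", Prefix="employee.jpg")
for v in versions.get("Versions", []):
    print(v["Key"], v["VersionId"], v["IsLatest"])

# Retrieve an older version by passing its version ID
old = s3.get_object(Bucket="employeebucket", Key="employee.jpg",
                    VersionId="111111")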

S3 Storage Classes

S3 offers multiple storage classes, allowing you to optimize costs based on how frequently you need to access your data:

  1. S3 Standard: For frequently accessed data, offering low latency and high throughput.
  2. S3 Intelligent-Tiering: Automatically moves data between two access tiers (frequent and infrequent) based on changing access patterns.
  3. S3 Standard-Infrequent Access (IA): For data that’s accessed less often but still needs fast retrieval.
  4. S3 One Zone-IA: For less frequently accessed data that doesn’t need the redundancy of multiple Availability Zones.
  5. S3 Glacier and Glacier Deep Archive: For long-term data archiving with low retrieval frequency, offering the lowest storage costs.

Automating Data Lifecycle with Lifecycle Management

If you’re looking to optimize storage costs or archive older data automatically, lifecycle policies are the way to go. You can configure lifecycle rules to transition data to cheaper storage classes or expire objects after a certain period.

For example:

  • Move files to S3 Standard-IA after 30 days.
  • Archive logs to S3 Glacier after one year.
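
Those two rules could be expressed in a lifecycle configuration roughly like this (a boto3 sketch with an example bucket name and rule ID):

import boto3

s3 = boto3.client("s3")

# Move objects to Standard-IA after 30 days and to Glacier after one year
s3.put_bucket_lifecycle_configuration(
    Bucket="employeebucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-old-objects",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},     # apply to every object in the bucket
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
        }]
    },
)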

Choosing the Right AWS Storage Service for Your Use Case

In the world of cloud computing, storage plays a pivotal role in determining the efficiency, scalability, and cost-effectiveness of your infrastructure. Amazon Web Services (AWS) offers several storage solutions, each tailored to different use cases. Understanding which storage service is best for your needs can be tricky, but by breaking down the core AWS storage options—Amazon S3, Amazon EBS, Amazon EC2 Instance Store, Amazon EFS, and Amazon FSx—you can find the perfect fit for your project.

Key AWS Storage Services

1. Amazon EC2 Instance Store

Instance Store is ephemeral block storage directly attached to the physical server that runs your EC2 instance. This means that when the EC2 instance is terminated or stopped, the data in the instance store is lost. It’s fast and ideal for temporary data that doesn't need to persist beyond the instance’s lifecycle.

  • Key Features:
    • Storage is tied to the instance’s lifecycle.
    • Best suited for temporary storage of buffers, caches, and scratch data.
    • Included in the EC2 instance price.

Example Use Case:
You are running a video processing application that temporarily caches intermediate results while performing video transcoding. The intermediate data doesn’t need to be stored permanently, making EC2 Instance Store a cost-effective option for this temporary data.

2. Amazon Elastic Block Store (EBS)

Amazon EBS provides persistent block storage that can be attached to your EC2 instances. It’s perfect for workloads where data must persist beyond the life of the EC2 instance, such as databases or operating system boot volumes. Unlike Instance Store, EBS volumes can be stopped, detached, and reattached to different instances without losing data.

  • Key Features:
    • Data persists even after the instance is stopped or terminated.
    • Offers both SSD and HDD-backed volumes depending on your workload needs.
    • Highly durable, with data replicated within the same Availability Zone.

Example Use Case:
You’re running an ecommerce website on an EC2 instance, storing customer orders and payment information in a MySQL database. Since you need your data to be reliable and durable, Amazon EBS is the right choice because it provides persistent storage for the database even if the EC2 instance is stopped.

3. Amazon Simple Storage Service (S3)

Amazon S3 is an object storage service ideal for storing vast amounts of unstructured data like images, videos, backups, or logs. S3 provides near-limitless scalability and high durability, ensuring that your data is available when needed. Unlike block storage, S3 stores data in a flat structure, where each object is stored with a unique identifier (key) in a "bucket."

  • Key Features:
    • Object storage with virtually unlimited scalability.
    • Data is automatically replicated across multiple Availability Zones.
    • Perfect for media hosting, backups, and data lakes.

Example Use Case:
You are building a website where users can upload and share their photos. Since the photos don’t need to be processed in real time and scalability is a priority, Amazon S3 is a perfect solution. It allows you to store large numbers of photos, each with a unique identifier, and scale effortlessly as your user base grows.

4. Amazon Elastic File System (EFS)

Amazon EFS is a scalable file storage solution that can be mounted to multiple EC2 instances simultaneously. It provides a shared file system, making it a great choice for applications that need to access the same data across several EC2 instances, like content management systems or application servers.

  • Key Features:
    • Fully managed, scalable, and shared file system.
    • Automatically grows and shrinks as you add or remove files.
    • Supports NFS protocol and can be mounted to multiple EC2 instances.

Example Use Case:
You are running a fleet of web servers for a WordPress site that needs to store user uploads and configurations in a shared location. With Amazon EFS, you can mount the same file system to all the EC2 instances, ensuring that all the servers have access to the same data without the need for complicated synchronization.

5. Amazon FSx

Amazon FSx provides fully managed file systems optimized for specific workloads, such as Windows file shares (FSx for Windows File Server) or high-performance computing (FSx for Lustre). FSx makes it easy to manage shared file systems with specific performance or protocol requirements.

  • Key Features:
    • Supports different file system types, such as Windows File Server and Lustre.
    • Fully managed and scalable to meet your specific performance needs.
    • Integrates seamlessly with AWS services like S3.

Example Use Case:
Your organization is using applications that rely on Windows-based file shares. With Amazon FSx for Windows File Server, you can lift and shift your on-premises file shares to the cloud, ensuring seamless integration with Active Directory and enabling secure file sharing across your network.

Example 1: Real-Time Stock Trading App

You are developing a real-time stock trading app that needs to store transaction logs and market data snapshots temporarily for quick retrieval. The data is processed in real time, and durability isn’t a concern because it is discarded after the market closes.
Solution: Use EC2 Instance Store for fast, ephemeral storage of this temporary data. Since the data doesn't need to persist beyond the trading session, Instance Store is a great fit.

Example 2: Large Media Hosting

You are building an online platform for video streaming. Each user uploads video content, which is then transcoded and stored for long-term availability. Scalability and durability are the key factors.
Solution: Use Amazon S3 for object storage to host the media files. S3’s ability to store massive amounts of unstructured data and deliver it globally makes it perfect for this use case.

Example 3: Big Data Analytics

You are running a big data analytics job on large volumes of log files from various sources. The job involves sequential reads and writes, and the workload is throughput-intensive.
Solution: Use Amazon EBS HDD-backed volumes for storage. The sequential data I/O performance makes HDD-backed volumes an ideal solution for this type of workload.

Example 4: Disaster Recovery

You want to set up a disaster recovery system for your company’s mission-critical data, ensuring that backups are stored safely in a different region.
Solution: Use Amazon S3 Glacier for long-term, low-cost storage of your backups. S3 Glacier provides a highly durable and cost-effective solution for data that’s accessed infrequently but needs to be retained for several years.

Example 5: Shared Application Configuration

You are running an e-commerce platform that uses multiple EC2 instances for load balancing. Each instance needs access to the same application configurations and user session data.
Solution: Use Amazon EFS to create a shared file system accessible by all instances. This setup ensures that each instance has access to the same data in real time, helping maintain consistency across the platform.

Exploring Databases on AWS

Databases have been central to enterprises for decades, starting with the rise of relational databases in the 1970s. Back then, the choice was straightforward: most businesses opted for a relational database, with all data organized into tables and linked through relationships. However, with the emergence of cloud services, databases have evolved, and now AWS offers a wide range of managed and unmanaged database solutions to suit varying needs.

Understanding Relational Databases

A relational database organizes data into tables, where each table consists of rows and columns. Rows store individual records, while columns represent attributes. These tables can be linked by common columns, forming relationships between different sets of data.

For example, in an e-commerce database, you might have:

  • A books table with columns like ISBN, title, and author.
  • A sales table with details like sale ID, book ISBN, and date.
  • An authors table containing author details.

This linking allows you to efficiently query related data.

Relational Database Management Systems (RDBMS)

An RDBMS enables the management of relational databases. AWS supports many popular RDBMS options, including:

  • MySQL
  • PostgreSQL
  • Amazon Aurora
  • Oracle
  • SQL Server

Relational databases use SQL queries to retrieve and manipulate data. SQL allows you to write complex queries that pull data from multiple tables, providing a clear and organized view of related data across the system.

Benefits of Relational Databases:

  • Joins: Linking tables provides a clear understanding of data relationships.
  • Reduced Redundancy: Data is stored in one table and referenced across others, reducing duplicates.
  • Familiarity: Relational databases have been used for decades, making them well understood by most developers.
  • ACID Compliance: Ensures that data is always stored with high integrity.

Managed vs. Unmanaged Databases

When deploying a database on AWS, you have two options: managed or unmanaged.

Unmanaged Databases

In this scenario, AWS is responsible only for the underlying physical infrastructure, and you manage the rest. If you run a relational database on Amazon EC2, AWS maintains the hardware and the virtualization layer, but you handle everything else, including the operating system, database installation and configuration, performance tuning, and backups.

This option provides more control but requires more manual effort.

Managed Databases

Using AWS’s Amazon RDS, you offload most of the database management tasks to AWS. This includes provisioning, patching, backups, and failover. You still maintain control over data security and query optimization, but AWS manages the heavy lifting. This option provides more convenience, though you have less control over some backend processes.

Amazon RDS supports:

  • Commercial engines: Oracle, SQL Server
  • Open-source engines: MySQL, PostgreSQL, MariaDB
  • Cloud-native: Amazon Aurora (MySQL and PostgreSQL compatible)

What is Amazon RDS?

Amazon RDS (Relational Database Service) allows you to set up, operate, and scale relational databases without worrying about the underlying infrastructure. For instance, an e-commerce business may use RDS to manage its product inventory and customer data, focusing on growing the business rather than managing database infrastructure.

Key Features of Amazon RDS:

  • Automated backups and manual snapshots for data recovery.
  • Multi-AZ deployments for high availability and automatic failover.
  • Security: Network access control lists (ACLs), security groups, and AWS Identity and Access Management (IAM) policies to protect your database.

Backing Up Data

Backups are essential to prevent data loss. Amazon RDS offers two types of backups:

  • Automatic backups: These occur during a user-defined window and allow point-in-time recovery, where you can restore the database to a specific time.
  • Manual snapshots: These are initiated manually and can be retained as long as needed, beyond the 35-day limit of automatic backups.

High Availability with Amazon RDS Multi-AZ

Amazon RDS Multi-AZ provides redundancy by creating a standby copy of your database in a different Availability Zone (AZ). If the primary database fails, AWS automatically fails over to the standby instance, ensuring minimal downtime. Multi-AZ is perfect for applications that require high availability and fault tolerance.
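
Provisioning a Multi-AZ database is a single API call. A minimal boto3 sketch; the identifier, instance class, and credentials are example values you would replace:

import boto3

rds = boto3.client("rds", region_name="us-west-2")

# Launch a small MySQL instance with a standby replica in another AZ
rds.create_db_instance(
    DBInstanceIdentifier="app-db",
    Engine="mysql",
    DBInstanceClass="db.t3.micro",
    AllocatedStorage=20,                     # GiB
    MasterUsername="admin",
    MasterUserPassword="change-me-please",   # use a secrets manager in real setups
    MultiAZ=True,                            # standby copy + automatic failover
    BackupRetentionPeriod=7,                 # days of automatic backups
)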

Introduction to Amazon DynamoDB: A Scalable NoSQL Database

Amazon DynamoDB is a fully managed NoSQL database service, designed for performance, scalability, and ease of use. With DynamoDB, AWS takes care of the operational overhead, so you don’t have to worry about hardware provisioning, scaling, or performance bottlenecks. Let’s explore how DynamoDB works and why it’s a great fit for certain use cases.

What Makes Amazon DynamoDB Unique?

Unlike traditional relational databases, DynamoDB doesn't enforce rigid table relationships or schemas. This flexibility makes it ideal for storing and retrieving large volumes of data that may vary in structure. Whether you have just a few records or millions, DynamoDB manages the underlying storage and performance for you.

Key Features of DynamoDB:

  • No Server Management: It’s serverless, so you don’t have to manage infrastructure or handle scaling.
  • Highly Scalable: Automatically handles large volumes of data and traffic with no performance degradation.
  • Fast and Predictable Performance: DynamoDB delivers single-digit millisecond response times, even with massive datasets.
  • High Availability: Your data is replicated across multiple AWS Availability Zones, ensuring redundancy and durability.

Core Components of DynamoDB

In DynamoDB, data is organized in a simple yet powerful structure. The main components include:

  • Tables: Similar to tables in other databases, but with more flexibility. For example, you can have a People table to store personal information or a Cars table for vehicle data.

  • Items: Each item in a table represents a unique record, like a person in the People table or a car in the Cars table. Items in DynamoDB can be likened to rows in relational databases, but without the limit on the number of items you can store.

  • Attributes: These are the pieces of data that describe an item. For example, in a People table, attributes could be PersonID, FirstName, and LastName. Attributes in DynamoDB are similar to columns in relational databases.
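
Here is how those components look in code: a small boto3 sketch that creates a People table keyed on PersonID and writes two items with different attributes (names and values are made up):

import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-west-2")

# Create the table; only the key attribute needs to be declared up front
table = dynamodb.create_table(
    TableName="People",
    KeySchema=[{"AttributeName": "PersonID", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "PersonID", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

# Items in the same table can carry different attributes
table.put_item(Item={"PersonID": "p-001", "FirstName": "Ada", "LastName": "Lovelace"})
table.put_item(Item={"PersonID": "p-002", "FirstName": "Alan", "Nickname": "Prof"})

# Fetch a single item by its key
print(table.get_item(Key={"PersonID": "p-001"})["Item"])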

Benefits of DynamoDB

  1. No Schema Enforcement: Unlike relational databases, DynamoDB doesn’t require a predefined schema. You can add or remove attributes to individual items without affecting the entire table. This flexibility allows your data structure to evolve as needed.

  2. High Availability and Durability: DynamoDB replicates your data across multiple Availability Zones within a region. This replication guarantees high availability, fault tolerance, and data durability.

  3. Elastic Scalability: DynamoDB automatically scales up and down based on the demands of your application, making it ideal for unpredictable or fluctuating workloads.

  4. NoSQL Performance: DynamoDB can handle large-scale requests, with performance optimized for rapid data retrieval and updates. It’s designed for applications where speed and scalability are critical, such as mobile apps, gaming platforms, and IoT systems.

How DynamoDB Differs from Relational Databases

Relational databases like MySQL or PostgreSQL require well-defined schemas and can struggle with scaling when large volumes of data or traffic are involved. They rely heavily on ACID compliance to ensure data integrity across complex relationships.

In contrast, DynamoDB is designed for:

  • Flexibility: No fixed schema is required.
  • High Availability: Data is automatically replicated across multiple AZs.
  • Scalability: The system can handle large amounts of traffic without performance degradation.

While relational databases excel at handling structured data and complex queries that span multiple tables, DynamoDB thrives in use cases where fast access to unstructured or semi-structured data is essential.

Security and Encryption

Security is a top priority for AWS services, and DynamoDB is no different. It offers encryption at rest, protecting sensitive data without you having to manage encryption keys or the complexity of securing your data manually. AWS Key Management Service (KMS) can handle this for you, providing a seamless and secure experience.

Choosing the Right AWS Database Service for Your Application

When you're building apps in the cloud, picking the right database can make all the difference. AWS offers various database services tailored for different needs. Here’s a quick overview of the main options:

1. Relational Databases (Amazon RDS, Amazon Aurora)

  • Use case: Traditional apps like ERP, CRM, and e-commerce.
  • Best for: Structured data that needs ACID transactions and high integrity.

2. Key-Value Databases (Amazon DynamoDB)

  • Use case: High-traffic apps like gaming or e-commerce.
  • Best for: Fast, scalable read and write operations for simple key-value pairs.

3. In-Memory Databases (Amazon ElastiCache)

  • Use case: Real-time leaderboards, caching, or session management.
  • Best for: Super-fast access to data stored in memory.

4. Document Databases (Amazon DocumentDB)

  • Use case: Content management and catalogs.
  • Best for: Storing flexible, semi-structured data like JSON.

5. Graph Databases (Amazon Neptune)

  • Use case: Social networks, fraud detection, or recommendation engines.
  • Best for: Managing and querying complex relationships.

Why Use Purpose-Built Databases?

In modern app development, it’s often better to use multiple databases tailored for different tasks. For example, you might use a relational database for handling transactions and a key-value database like DynamoDB for managing fast, high-traffic queries.
