IAM USER CREATION:
AWS services require authentication to interact with them. Normally, this is done through IAM users, roles, or temporary credentials, but it is possible to use Boto3 without an IAM user by leveraging the root user's access key (which is not recommended due to security risks).
Using the Root User’s Access Key (Not Recommended)
Although AWS does not create an access key for the root user by default, you can manually create one and use it.
Steps to Generate a Root User Access Key (Not Recommended)
1. Log in to the AWS Console as the root user.
2. Navigate to IAM → Security Credentials.
3. Scroll to Access Keys and click Create New Access Key.
4. Copy the Access Key ID and Secret Access Key (you won’t see the secret key again).
5. Configure the AWS CLI by running aws configure and entering:
- The Access Key ID (from step 4)
- The Secret Access Key (from step 4)
- The default region name (e.g., us-east-1)
- The default output format (json or table)
Now, you can use Boto3 to access AWS services.
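As a quick sanity check that the configured credentials are picked up, you can ask STS which identity you are calling as. This is a minimal sketch using the default credential chain:

```python
import boto3

# Ask STS which identity the configured credentials belong to
sts = boto3.client("sts")
identity = sts.get_caller_identity()

print("Account:", identity["Account"])
print("ARN:", identity["Arn"])  # for a root access key this ends in ":root"
```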
Why Is This Not Recommended?
- Root user has unlimited privileges—if compromised, it can fully control your AWS account.
- Best practice: Use IAM users or roles instead.
Working with Amazon S3 Using Python SDK (Boto3)
Since IAM user setup is complete, let’s start with Amazon S3 (Simple Storage Service), which is one of the most commonly used AWS services in Generative AI workflows.
🚀 Lesson 1: Creating an S3 Bucket
Before we can upload or manage files, we need to create an S3 bucket.
📌 Code: Create an S3 Bucket
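A minimal sketch using Boto3; the bucket name below is a hypothetical placeholder and must be globally unique:

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Bucket names are global across all AWS accounts; this one is a placeholder
bucket_name = "my-genai-demo-bucket-12345"

s3.create_bucket(Bucket=bucket_name)  # no extra configuration needed in us-east-1
print(f"Created bucket: {bucket_name}")
```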
Real-Time Example: Secure Private Bucket
If you need a private S3 bucket with proper security, this would be a real-time production-ready way to do it:
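A sketch of one way to do this, assuming a hypothetical bucket name and region: it blocks all public access, enables default encryption, and turns on versioning.

```python
import boto3

region = "us-east-1"                             # placeholder region
bucket_name = "my-genai-secure-bucket-12345"     # placeholder; must be globally unique

s3 = boto3.client("s3", region_name=region)

# 1. Create the bucket (outside us-east-1, add CreateBucketConfiguration with LocationConstraint)
s3.create_bucket(Bucket=bucket_name)

# 2. Block every form of public access
s3.put_public_access_block(
    Bucket=bucket_name,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# 3. Encrypt all objects at rest by default
s3.put_bucket_encryption(
    Bucket=bucket_name,
    ServerSideEncryptionConfiguration={
        "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
    },
)

# 4. Keep previous versions of objects
s3.put_bucket_versioning(
    Bucket=bucket_name,
    VersioningConfiguration={"Status": "Enabled"},
)

print(f"Private, encrypted, versioned bucket ready: {bucket_name}")
```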
Important Parameters in create_bucket()
When creating an S3 bucket with create_bucket(), these are the important parameters to know:
Bucket
- The bucket name must be globally unique across all of AWS.
- A distinct name avoids conflicts when multiple applications need separate buckets.
CreateBucketConfiguration
- Defines bucket properties, such as the region.
- Needed when creating buckets in regions other than us-east-1.
LocationConstraint
- Specifies the AWS region for the bucket.
- Helps avoid latency by creating the bucket closer to users.
ACL (Access Control List)
- Controls the access level of the bucket (e.g., private, public-read, etc.).
- Should be set as private for sensitive data or public-read only when necessary.
ObjectLockEnabledForBucket
- Enables object lock to prevent data modification.
- Useful for compliance and legal data retention.
GrantRead / GrantWrite
- Assigns specific permissions to AWS users.
- Used when sharing access with another AWS account.
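As an illustration, here is a sketch that combines several of these parameters; the bucket name and region are placeholders, and note that newer buckets with Object Ownership enforced may ignore or reject ACLs.

```python
import boto3

region = "ap-south-1"                           # placeholder region outside us-east-1
s3 = boto3.client("s3", region_name=region)

s3.create_bucket(
    Bucket="my-genai-regional-bucket-12345",    # placeholder; must be globally unique
    ACL="private",                              # keep the bucket private (ACLs may be
                                                # disabled if Object Ownership is enforced)
    CreateBucketConfiguration={
        "LocationConstraint": region            # required for any region other than us-east-1
    },
    ObjectLockEnabledForBucket=True,            # immutable storage for compliance workloads
)
```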
Common Mistakes to Avoid:
Not specifying LocationConstraint
- Always define LocationConstraint when creating a bucket outside us-east-1.
Using a non-unique bucket name
- AWS bucket names are global—always choose a unique name.
Setting ACL to public-read accidentally
- This makes the bucket publicly accessible; use private unless needed.
Forgetting to enable object lock for compliance
- If you need to store immutable data, always enable ObjectLockEnabledForBucket=True.
Deleting an S3 Bucket
To delete a bucket, you must work through three steps (a code sketch follows this list):
Step 1: Delete all objects
- AWS does not allow you to delete a non-empty bucket.
Step 2: Delete versions (if versioning is enabled)
- Versioned objects must be explicitly removed before bucket deletion.
Step 3: Delete the bucket
- The final step after clearing all contents.
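A minimal sketch of all three steps using the Boto3 resource API; the bucket name is a placeholder, and the version cleanup only matters if versioning was ever enabled.

```python
import boto3

s3 = boto3.resource("s3")
bucket = s3.Bucket("my-genai-demo-bucket-12345")  # placeholder bucket name

# Step 1: delete all current objects
bucket.objects.all().delete()

# Step 2: delete all versions and delete markers (needed only if versioning is/was enabled)
bucket.object_versions.all().delete()

# Step 3: delete the now-empty bucket
bucket.delete()
print("Bucket deleted")
```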
🔴 Common Mistakes & Fixes
BucketNotEmpty Error
- Ensure you delete all objects before attempting to delete the bucket.
Access Denied Error
- The IAM user must have s3:DeleteBucket and s3:DeleteObject permissions.
Trying to delete a non-existing bucket
- Verify that the bucket name is correct before attempting deletion.
🔍 Do We Still Need Step 2?
Do we need to delete versions explicitly after deleting all standard objects in Step 1? The answer depends on whether the bucket has versioning enabled.
✅ What Happens in Step 1?
- If versioning is disabled: This completely removes all objects.
- If versioning is enabled: This does not delete previous versions. Instead, it creates delete markers, meaning the files become "hidden" but are still stored in S3.
✅ What Happens in Step 2?
- If versioning is disabled: This step is unnecessary (because all files are already deleted in Step 1).
- If versioning is enabled: This removes all previous versions, which Step 1 does not do.
Multipart Uploads in S3
🔍 Why Use Multipart Uploads?
- Uploads large files efficiently by splitting them into parts.
- Allows parallel uploads, making the process faster.
- Resumable uploads in case of failure, reducing the risk of losing progress.
- Required for files larger than 5GB, since a single PUT upload is capped at 5GB (objects can grow to 5TB with multipart).
📌 Full Code: Upload a Large File Using Multipart Upload
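A sketch of the sequential flow, assuming a hypothetical bucket, key, and local file; part size and error handling are illustrative choices.

```python
import boto3

s3 = boto3.client("s3")

bucket = "my-genai-demo-bucket-12345"     # placeholder bucket
key = "datasets/large_training_set.zip"   # placeholder key
file_path = "large_training_set.zip"      # placeholder local file
part_size = 100 * 1024 * 1024             # 100 MB per part (minimum part size is 5 MB)

# 1. Start the multipart upload and get an UploadId
upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

parts = []
try:
    # 2. Upload the file in numbered parts
    with open(file_path, "rb") as f:
        part_number = 1
        while True:
            data = f.read(part_size)
            if not data:
                break
            part = s3.upload_part(
                Bucket=bucket,
                Key=key,
                PartNumber=part_number,
                UploadId=upload_id,
                Body=data,
            )
            parts.append({"PartNumber": part_number, "ETag": part["ETag"]})
            part_number += 1

    # 3. Combine all uploaded parts into a single object
    s3.complete_multipart_upload(
        Bucket=bucket,
        Key=key,
        UploadId=upload_id,
        MultipartUpload={"Parts": parts},
    )
    print("Multipart upload complete")
except Exception:
    # 4. Cancel the upload so incomplete parts don't accumulate storage costs
    s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
    raise
```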
📌 How Multipart Upload Works
create_multipart_upload()
- Starts the multipart upload and returns an UploadId.
upload_part()
- Uploads chunks of the file in parallel.
complete_multipart_upload()
- Combines all uploaded parts into a single file.
abort_multipart_upload()
- Cancels the upload if an error occurs.
✅ Real-Time Use Cases
Uploading a 10GB AI dataset
- Prevents upload failure by splitting the file into smaller chunks.
Handling network interruptions
- Resumable uploads allow retrying failed parts instead of restarting from scratch.
Parallel uploads for speed
- Multiple parts upload simultaneously, reducing total upload time.
🚀 How to Make Multipart Upload Truly Parallel?
To upload parts concurrently, we can use concurrent.futures.ThreadPoolExecutor to send multiple upload_part() requests simultaneously.
✅ Optimized Parallel Multipart Upload
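One possible version using ThreadPoolExecutor; the bucket, key, file path, part size, and worker count are placeholders. It reads the parts up front, uploads them concurrently, filters out failed parts, and aborts the upload if any part is missing.

```python
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")

bucket = "my-genai-demo-bucket-12345"     # placeholder bucket
key = "datasets/large_training_set.zip"   # placeholder key
file_path = "large_training_set.zip"      # placeholder local file
part_size = 100 * 1024 * 1024             # 100 MB per part

# Read all parts into memory first (fine for a few GB; stream chunks for huge files)
chunks = []
with open(file_path, "rb") as f:
    part_number = 1
    while True:
        data = f.read(part_size)
        if not data:
            break
        chunks.append((part_number, data))
        part_number += 1

upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

def upload_chunk(chunk):
    """Upload one part; return None on failure so it can be filtered out."""
    number, data = chunk
    try:
        resp = s3.upload_part(
            Bucket=bucket, Key=key, PartNumber=number,
            UploadId=upload_id, Body=data,
        )
        return {"PartNumber": number, "ETag": resp["ETag"]}
    except Exception:
        return None

try:
    # Upload multiple parts at the same time
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(upload_chunk, chunks))

    parts = [p for p in results if p is not None]
    if len(parts) != len(chunks):
        raise RuntimeError("Some parts failed to upload")

    s3.complete_multipart_upload(
        Bucket=bucket, Key=key, UploadId=upload_id,
        MultipartUpload={"Parts": sorted(parts, key=lambda p: p["PartNumber"])},
    )
    print("Parallel multipart upload complete")
except Exception:
    # Abort so the incomplete upload does not keep accruing storage costs
    s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
    raise
```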
✅ What’s Different in This Code?
Processing
- Before (Sequential Upload): Uploads one part at a time (blocking).
- Now (Parallel Upload): Uploads multiple parts simultaneously.
Performance
- Before: Slower for large files.
- Now: Faster, as multiple chunks upload in parallel.
Error Handling
- Before: Works but has slower recovery.
- Now: Uses threading and retries failed parts automatically.
🚀 Key Optimizations
- Reads all parts first and stores them in chunks before uploading.
- Uses ThreadPoolExecutor.map() to upload multiple parts concurrently.
- Filters out failed parts before completing the upload.
- Fails safely if no parts are successfully uploaded.
✅ Real-Time Use Cases
Uploading AI-generated videos
- Speeds up upload by sending chunks simultaneously.
Uploading multi-GB datasets
- Prevents bottlenecks caused by single-threaded execution.
Resilient to slow networks
- Faster recovery from failed uploads.
🔴 Common Mistakes & Fixes
Not all parts uploaded
- Use a part check before completing the upload.
Upload speed not improving
- Increase the ThreadPoolExecutor() worker count for better concurrency.
Memory issues with huge files
- Process chunks one by one instead of storing everything in memory.
Presigned URLs in S3 (Secure File Access Without Credentials)
Now that you've mastered multipart uploads, let's move to Presigned URLs, which are useful for securely accessing private S3 objects without exposing credentials.
🔍 Why Use Presigned URLs?
| Scenario | Why Presigned URLs? |
|---|---|
| Securely share files | Allows temporary access to private S3 files. |
| Restrict access duration | URLs automatically expire after a set time. |
| Download or upload without IAM user credentials | Users can access files without needing AWS credentials. |
| Integrate with AI workflows | Model training data, logs, or AI-generated outputs can be shared via time-limited URLs. |
📌 Code: Generate a Presigned URL for Downloading a File
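A minimal sketch for generating a time-limited download link; the bucket and key are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Time-limited link to a private object
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-genai-demo-bucket-12345", "Key": "reports/results.csv"},
    ExpiresIn=3600,  # the link stops working after 1 hour
)
print("Download link:", url)
```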
📌 Code: Generate a Presigned URL for Uploading a File
If you want external users to upload a file to your S3 bucket, use this:
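A minimal sketch for generating a time-limited upload link; the bucket and key are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Time-limited link an external user can PUT a file to
url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-genai-demo-bucket-12345", "Key": "uploads/user_image.png"},
    ExpiresIn=900,  # 15 minutes to complete the upload
)
print("Upload link:", url)
```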
✅ How Presigned URLs Work
generate_presigned_url('get_object')
- Generates a URL to download a file.
- Shareable with end-users for secure access.
generate_presigned_url('put_object')
- Generates a URL to upload a file.
- Allows users to upload files securely without AWS credentials.
🚀 Real-World Use Cases
AI Model Hosting
- Serve trained AI models to users without exposing S3 access.
Secure File Sharing
- Share reports, logs, or datasets with temporary access.
Client-side Uploads
- Users can upload AI-generated images without needing AWS credentials.
🔴 Common Mistakes & Fixes
"Access Denied" Error
- Ensure your IAM user has s3:GetObject (for downloads) or s3:PutObject (for uploads).
Presigned URL Expired Too Soon
- Increase the ExpiresIn value (maximum: 7 days).
Generated URL Doesn’t Work
- Ensure the file exists in S3 before sharing the URL.
🚀 Optimized Code for Uploading a File Using a Presigned URL
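One way to exercise the upload URL end to end, shown as a sketch: the bucket owner generates the URL, then the third-party requests library uploads the file through plain HTTP. Bucket, key, and file names are placeholders.

```python
import boto3
import requests  # third-party HTTP client, used only to exercise the presigned URL

s3 = boto3.client("s3")
bucket = "my-genai-demo-bucket-12345"          # placeholder bucket
key = "uploads/user_image.png"                 # placeholder key

# 1. The bucket owner generates the presigned upload URL
url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": bucket, "Key": key},
    ExpiresIn=900,
)

# 2. Anyone holding the URL can upload the file without AWS credentials
with open("user_image.png", "rb") as f:        # placeholder local file
    response = requests.put(url, data=f)

print("Upload status:", response.status_code)  # 200 means the object was stored
```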
S3 Transfer Acceleration (Faster Uploads & Downloads)
Now that you've mastered Presigned URLs, let’s move to S3 Transfer Acceleration, which helps you upload and download files faster using Amazon CloudFront’s global network.
🔍 Why Use S3 Transfer Acceleration?
| Scenario | Why Use It? |
|---|---|
| Slow uploads from distant locations | Uses CloudFront’s edge locations to accelerate data transfer. |
| Large AI datasets or model weights | Reduces upload time by optimizing the route to S3. |
| Global users need fast access | Improves performance for users far from the S3 bucket’s region. |
📌 Step 1: Enable Transfer Acceleration on an S3 Bucket
Before using acceleration, you must enable it on the bucket.
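A minimal sketch, assuming a hypothetical bucket name; the helper name matches the enable_transfer_acceleration() referenced in the fixes below.

```python
import boto3

s3 = boto3.client("s3")

def enable_transfer_acceleration(bucket_name):
    """Turn on S3 Transfer Acceleration for the given bucket."""
    s3.put_bucket_accelerate_configuration(
        Bucket=bucket_name,
        AccelerateConfiguration={"Status": "Enabled"},
    )
    print(f"Transfer Acceleration enabled on {bucket_name}")

enable_transfer_acceleration("my-genai-demo-bucket-12345")  # placeholder bucket
```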
📌 Step 2: Upload a File Using Transfer Acceleration
Once enabled, use the accelerated endpoint to upload files.
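A sketch that routes requests through the accelerated endpoint via botocore's Config; the file and bucket names are placeholders.

```python
import boto3
from botocore.config import Config

# Route requests through the s3-accelerate.amazonaws.com endpoint
s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))

s3.upload_file(
    Filename="large_training_set.zip",          # placeholder local file
    Bucket="my-genai-demo-bucket-12345",        # placeholder bucket
    Key="datasets/large_training_set.zip",
)
print("Accelerated upload complete")
```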
📌 Step 3: Download a File Using Transfer Acceleration
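A matching download sketch with the same accelerated-endpoint configuration; names are placeholders.

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))

s3.download_file(
    Bucket="my-genai-demo-bucket-12345",        # placeholder bucket
    Key="datasets/large_training_set.zip",
    Filename="large_training_set_local.zip",    # where to save the file locally
)
print("Accelerated download complete")
```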
📌 How to Verify If Acceleration is Enabled?
Run this check status script:
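A minimal status check, assuming the same hypothetical bucket:

```python
import boto3

s3 = boto3.client("s3")

response = s3.get_bucket_accelerate_configuration(Bucket="my-genai-demo-bucket-12345")
status = response.get("Status", "Not configured")  # "Enabled", "Suspended", or absent
print("Transfer Acceleration status:", status)
```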
✅ When Should You Use Transfer Acceleration?
Use It When...
- Uploading large AI models or datasets from remote locations.
- You have global users accessing the bucket.
- You need to minimize network latency for faster uploads.
Don't Use It When...
- Your uploads are already fast in the same AWS region.
- Your users are mostly in the same AWS region.
- You only work with small files (<100MB), where acceleration isn't necessary.
🔴 Common Mistakes & Fixes
"Transfer Acceleration Not Enabled" Error
- Run enable_transfer_acceleration() before using accelerated endpoints.
Uploads still slow
- Ensure you're using the s3-accelerate.amazonaws.com endpoint.
Not seeing improvement
- Only use Transfer Acceleration if your location is far from the AWS region of your bucket.
Object Tagging in S3 (Organizing Your Data Efficiently)
Now that we’ve covered S3 Transfer Acceleration, let's move on to Object Tagging, which helps in categorizing, searching, and managing files in S3.
🔍 Why Use Object Tagging?
| Scenario | Why Use It? |
|---|---|
| Organizing AI datasets | Tag files as training, testing, or validation. |
| Cost optimization | Apply lifecycle policies based on tags (e.g., archive old data). |
| Access control | Restrict permissions using tags (e.g., limit access to sensitive files). |
| Efficient searching | Find files quickly by filtering based on metadata. |
📌 Step 1: Upload a File with Tags
When uploading a file, we can add tags to categorize it.
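A sketch assuming a hypothetical bucket, key, local file, and tag values:

```python
import boto3

s3 = boto3.client("s3")

# Tags are passed as a URL-encoded string at upload time
with open("train.csv", "rb") as f:            # placeholder local file
    s3.put_object(
        Bucket="my-genai-demo-bucket-12345",  # placeholder bucket
        Key="datasets/train.csv",
        Body=f,
        Tagging="dataset=training&project=genai",
    )
```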
📌 Step 2: Add/Update Tags for an Existing File
If a file is already in S3, we can update or add new tags.
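A sketch using put_object_tagging, which replaces the object's entire tag set; names and tag values are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Replaces the complete tag set on an existing object
s3.put_object_tagging(
    Bucket="my-genai-demo-bucket-12345",      # placeholder bucket
    Key="datasets/train.csv",                 # placeholder key
    Tagging={
        "TagSet": [
            {"Key": "dataset", "Value": "training"},
            {"Key": "status", "Value": "validated"},
        ]
    },
)
```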
📌 Step 3: Retrieve Tags for a File
You can fetch tags of an object to check how it’s categorized.
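A sketch using get_object_tagging; names are placeholders.

```python
import boto3

s3 = boto3.client("s3")

response = s3.get_object_tagging(
    Bucket="my-genai-demo-bucket-12345",      # placeholder bucket
    Key="datasets/train.csv",                 # placeholder key
)
for tag in response["TagSet"]:
    print(f"{tag['Key']} = {tag['Value']}")
```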
📌 Step 4: Remove Tags from a File
If you want to remove all tags, use this:
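A sketch using delete_object_tagging; names are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Removes every tag from the object
s3.delete_object_tagging(
    Bucket="my-genai-demo-bucket-12345",      # placeholder bucket
    Key="datasets/train.csv",                 # placeholder key
)
```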
Lesson 5: S3 Bucket Policies & Access Control (IAM, Bucket Policies, and Public Access)
Now that you've mastered Object Tagging, let's move on to S3 Bucket Policies & Access Control, which are crucial for securing and managing access to your S3 bucket.
🔍 Why Use Bucket Policies & Access Control?
Restrict Unauthorized Access
- Ensures only authorized users/services can access the bucket.
Public or Private File Access
- Controls whether a file is accessible via a public URL.
Secure AI Model Storage
- Protects AI datasets and models from unintended modifications.
Enable Cross-Account Access
- Allows trusted AWS accounts to access the bucket securely.
✅ Three Ways to Control Access in S3
IAM Policies
- Best for: Controlling access based on AWS users & roles.
- Applied to: AWS Users, Groups, or Roles.
Bucket Policies
- Best for: Managing access to the entire bucket.
- Applied to: S3 Buckets.
Block Public Access Settings
- Best for: Ensuring a bucket is never publicly accessible.
- Applied to: S3 Buckets.
📌 Step 1: Restrict or Allow Public Access to an S3 Bucket
By default, AWS blocks public access for security. You must disable block public access settings if you need to allow public reads.
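A sketch of a helper that exposes the allow_public flag described below; the function name and bucket are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

def set_public_access(bucket_name, allow_public=False):
    """Toggle the four Block Public Access settings for a bucket."""
    block = not allow_public
    s3.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": block,
            "IgnorePublicAcls": block,
            "BlockPublicPolicy": block,
            "RestrictPublicBuckets": block,
        },
    )

# Keep the bucket fully private (placeholder bucket name)
set_public_access("my-genai-demo-bucket-12345", allow_public=False)
```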
- 🔹 Set allow_public=False to completely restrict public access.
- 🔹 Set allow_public=True to allow public access (use with caution!).
📌 Step 2: Apply an S3 Bucket Policy to Control Access
A Bucket Policy allows or denies access to the entire bucket.
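A sketch that applies a public-read policy with put_bucket_policy; the bucket name is a placeholder.

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "my-genai-demo-bucket-12345"          # placeholder bucket

public_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",                  # anyone on the internet
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }
    ],
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(public_read_policy))
```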
- 🔹 This makes all objects in the bucket publicly readable.
- 🔹 Modify Principal to restrict access to a specific AWS account.
📌 Step 3: Grant Read Access to a Specific File (Recommended Fix for ACL Issue)
Instead of using ACLs, AWS now recommends using a Bucket Policy to grant access to specific objects.
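A sketch that scopes the policy to a single object key; the bucket and key are placeholders.

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "my-genai-demo-bucket-12345"          # placeholder bucket
key = "reports/public_summary.pdf"             # placeholder object to expose

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadSingleObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/{key}",  # only this object, not the whole bucket
        }
    ],
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```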
- 🔹 This fixes the previous issue where put_object_acl() failed due to enforced ACL restrictions.
- 🔹 This method ensures that only this specific file is public, not the entire bucket.
📌 Step 4: Grant Cross-Account Access (Allow Another AWS Account to Access Bucket)
If you need to allow another AWS account to access the bucket, modify the bucket policy.
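A sketch of a cross-account policy; the bucket name and account ID are placeholders.

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "my-genai-demo-bucket-12345"          # placeholder bucket
account_id = "111122223333"                    # placeholder trusted AWS account ID

cross_account_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CrossAccountAccess",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{bucket}",      # ListBucket applies to the bucket itself
                f"arn:aws:s3:::{bucket}/*",    # GetObject applies to the objects inside it
            ],
        }
    ],
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(cross_account_policy))
```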
- 🔹 Replace account_id with the AWS Account ID you want to allow access.
🚀 Which Method Should You Use?
Restrict access per AWS user or role
- Use IAM Policies.
Manage access for the entire bucket
- Use Bucket Policies.
Allow or block all public access
- Use Block Public Access Settings.
Allow cross-account access
- Use Bucket Policies with Principal.
Grant public read access to a single file
- Use Bucket Policies (NOT ACLs).
✅ Summary of S3 Public Access Settings
BlockPublicAcls
- Effect: Prevents public ACLs.
- Best Use Case: Use when relying on Bucket Policies instead of ACLs.
IgnorePublicAcls
- Effect: Ignores existing ACLs.
- Best Use Case: Use to completely disable ACL-based access.
BlockPublicPolicy
- Effect: Prevents public bucket policies.
- Best Use Case: Use to restrict public access even via policies.
RestrictPublicBuckets
- Effect: Fully blocks public access.
- Best Use Case: Strongest security measure to prevent accidental exposure.