Data Engineering on AWS Vol 1 - OLAP & Data Warehouse
Published 1/2025
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 KHz
Language: English | Size: 18.54 GB | Duration: 46h 4m
Detailed training (Level 350) on AWS Data Engineering Services: Redshift, S3, Athena, Hive, Glue Catalog, Lake Formation
What you'll learn
Understand Data Engineering (Volume 1) on AWS using S3, Redshift, Athena and Hive
Know Redshift, S3 and Athena up to Level 350+ with HANDS-ON
Production-level projects and hands-on exercises that give candidates on-the-job-like training
Get access to datasets of size 100 GB - 200 GB and practice with them
Learn Python for Data Engineering with HANDS-ON (Functions, Arguments, OOP (class, object, self), Modules, Packages, Multithreading, File Handling etc.)
Learn SQL for Data Engineering with HANDS-ON (Database objects, CASE, Window Functions, CTE, CTAS, MERGE, Materialized View etc.)
Requirements
Good to have AWS and SQL knowledge
Description
This is Volume 1 of the Data Engineering course on AWS. It gives detailed explanations of AWS data engineering services: S3 (Simple Storage Service), Redshift, Athena, Hive, Glue Data Catalog and Lake Formation. The course delves into the data warehouse, or consumption and storage, layer of a data engineering pipeline. In Volume 2, I will showcase Data Processing (Batch and Streaming) services. You will get opportunities to do hands-on work with large datasets (100 GB - 300 GB or more of data). Moreover, this course provides hands-on exercises that match real-world scenarios such as Redshift query performance tuning, streaming ingestion, window functions, ACID transactions, the COPY command, distribution and sort keys, WLM, row-level and column-level security, Athena partitioning, Athena WLM etc.
Some other highlights:
Training on data modelling: Normalization & ER Diagrams for OLTP systems, Dimensional modelling for OLAP/DWH systems.
Data modelling hands-on.
Other technologies covered: EC2, EBS, VPC and IAM.
This is Part 1 (Volume 1) of the full data engineering course. In Part 2 (Volume 2), I will be covering the following topics:
Spark (Batch and Stream processing using AWS EMR, AWS Glue ETL, GCP Dataproc)
Kafka (on AWS & GCP)
Flink
Apache Airflow
Apache Pinot
AWS Kinesis and more.
Overview
Section 1: Introduction - Data Engineering Volume 1 on AWS
Lecture 1 Course Introduction and Resources
Lecture 2 2. Course Introduction and Course Contents
Lecture 3 Course Details - Projects, About Me
Section 2: (Optional) AWS Pre-requisites - EC2 & EBS
Lecture 4 AWS Cloud and EC2 Introduction
Lecture 5 EC2 Console & HandsOn
Lecture 6 EBS Theory
Lecture 7 EBS Hands On
Section 3: (Optional) AWS Pre-requisites - VPC
Lecture 8 VPC Introduction & Components
Lecture 9 VPC Components Hands On
Lecture 10 Bastion Host
Lecture 11 Security Groups
Lecture 12 NAT Gateway & VPC Endpoint
Lecture 13 VPC Peering
Section 4: (Optional) AWS Pre-requisites - IAM
Lecture 14 IAM Introduction & Hands On
Lecture 15 IAM Service Roles
Section 5: (Optional) Non AWS Pre-requisites - SQL Basics
Lecture 16 1. SQL Introduction
Lecture 17 2. SQL Client & Server Setup
Lecture 18 3. SQL Database Objects Theory
Lecture 19 4. Database Objects Hands On
Lecture 20 5. CRUD Operations
Lecture 21 6. SELECT Operators
Lecture 22 7. CASE COALESCE Functions
Lecture 23 8. DATE Functions
Lecture 24 9. CTAS Cast Concat
Lecture 25 10. Update Delete Truncate
Lecture 26 11. HAVING Clause
Lecture 27 12. Inner Join, Left Join, Right Join, Outer Join
Lecture 28 13. Union Intersect View
Lecture 29 14. Materialized View
Lecture 30 15. Common Table Expression (CTE)
Lecture 31 16. SQL Window Functions
Lecture 32 17. MERGE statement & Summary
Section 6: (Optional) Non AWS Pre-requisites - Python Basics
Lecture 33 1. Python Intro - Architecture, PyCharm, Virtual Env
Lecture 34 2. PyCharm & CLI Walkthrough
Lecture 35 3. Compiled vs Interpreted
Lecture 36 4. Everything in Python is an Object
Lecture 37 5. String Data Type
Lecture 38 6. Number Data Type
Lecture 39 7. List Data Type
Lecture 40 8. Tuple Data Type
Lecture 41 9. Set & Dict Data Type, Type Conversion
Lecture 42 10. Python Operators
Lecture 43 11. Set up Python interpreter in PyCharm
Lecture 44 12. Print & Input Functions
Lecture 45 13. IF Statement
Lecture 46 14. For & While loops
Lecture 47 15. Functions Intro
Lecture 48 16. Function Scoping
Lecture 49 17. Functions RETURN
Lecture 50 18. Function Arguments
Lecture 51 19. Modify Arguments
Lecture 52 20. Positional & Keyword Arguments
Lecture 53 21. args & kwargs
Lecture 54 22. Class Object Self
Lecture 55 23. Class-Instance Variables, __init__
Lecture 56 24. Class Object Exercise 1
Lecture 57 25. Class Object Exercise 2
Lecture 58 26. Inheritance
Lecture 59 27. Python Memory Management
Lecture 60 28. Modules & Packages
Lecture 61 29. HandsOn Exercise
Lecture 62 30. Module Pre-compilation
Lecture 63 31. Namespace & __name__
Lecture 64 32. Error Handling in Python
Lecture 65 33. File Handling
Lecture 66 34. CSV & JSON module
Lecture 67 35. Python Multi-threading concept
Lecture 68 36. Multi-threading hands-on and exercise
Lecture 69 37. Debugging & Profiling
Section 7: Data Engineering Introduction
Lecture 70 1. Data Engineering introduction, OLTP & OLAP
Lecture 71 2. Data Mart & Data Mesh
Lecture 72 3. Data Lake, Data Lakehouse, DWH
Section 8: AWS Distributed Storage - S3 (Simple Storage Service) for Data Engineers
Lecture 73 1. S3 Introduction 1
Lecture 74 2. S3 Introduction 2
Lecture 75 3. S3 Basics
Lecture 76 4. S3 Basics Hands-on
Lecture 77 5. S3 Versioning
Lecture 78 6. S3 Encryption
Lecture 79 7. Storage Class
Lecture 80 8. S3 Multipart Upload
Lecture 81 9. Lifecycle Policies
Lecture 82 10. Cross Region Replication
Lecture 83 11. S3 Mountpoint
Lecture 84 12. Security - S3 Identity Based Policy
Lecture 85 13. Security - S3 Bucket Policy
Lecture 86 14. Bucket Policy with VPC, IP address, VPCE
Lecture 87 15. S3 Access Point
Lecture 88 16. S3 Object Lambda
Lecture 89 17. Pre-signed URL
Lecture 90 18. S3 Performance Considerations
Lecture 91 19. S3 Pricing
Lecture 92 20. Architectural Patterns using S3
Section 9: Data Modelling - Normalization, ER Diagram, Dimensional Modelling
Lecture 93 1. Data Modelling Introduction
Lecture 94 2. Normal Forms 1NF 2NF 3NF
Lecture 95 3. Relations: one-to-one, one-to-many, many-to-one, many-to-many
Lecture 96 4. Dimensional modelling - Facts, Dimensions & Grains
Lecture 97 5. Grains Exercise
Lecture 98 6. Dimensional Modelling Technique
Lecture 99 7. Types of Fact & Dimension Tables
Section 10: Data Warehouse on AWS - Redshift Infra
Lecture 100 1. Redshift Infra
Lecture 101 2. Redshift Infra HandsOn
Lecture 102 3. Redshift Architecture - Zone Map, Columnar Storage
Lecture 103 4. Cluster Resize - Elastic & Classic
Lecture 104 5. Cluster Resize - HandsOn
Lecture 105 6. Cluster Pause & Rename
Lecture 106 7. Snapshot & Backup
Lecture 107 8. Redshift Infra Conclusion
Section 11: Redshift Objects
Lecture 108 1. Querying, Connection, RSQL, QEV2
Lecture 109 2. Query Editor & RSQL setup
Lecture 110 3. Object Hierarchy, tables hands-on
Lecture 111 4. Data Types Hands-on
Lecture 112 5. Table operations Hands-on
Lecture 113 6. Redshift ACID, Locks, Isolation Level
Lecture 114 7. Implement Transactions
Lecture 115 8. AccessShareLock & ShareRowExclusiveLock HandsOn
Lecture 116 9. Redshift SUPER datatype
Lecture 117 10. Section Summary
Section 12: Redshift Deep Dive
Lecture 118 1. Distribution Key & Style, Sort Key
Lecture 119 2. Column Compression
Lecture 120 3. Modify Dist Sort Key, Compression HandsOn
Lecture 121 4. COPY Command Theory
Lecture 122 5. COPY Command HandsOn
Lecture 123 6. UNLOAD Command
Lecture 124 7. AWS DMS - Move from OLTP to DWH
Lecture 125 8. DMS - Setup Source OLTP & Python application
Lecture 126 9. Setup DMS Instance, Endpoint, Task
Lecture 127 10. DMS Task - OLTP to DWH
Lecture 128 11. Table Maintenance - VACUUM & ANALYZE
Lecture 129 12. Vacuum & Analyze HandsOn
Section 13: Redshift Features
Lecture 130 1. Materialized View (MV)
Lecture 131 2. MV HandsOn
Lecture 132 3. Query Federation
Lecture 133 4. Redshift Spectrum
Lecture 134 5. Streaming Ingestion
Lecture 135 6. Redshift Feature Use Cases
Section 14: Redshift Query Tuning
Lecture 136 1. Query Execution
Lecture 137 2. EXPLAIN Plan & System Joins
Lecture 138 3. System Joins HandsOn
Lecture 139 4. Data RE-distribution
Lecture 140 5. EXPLAIN & RE-distribution HandsOn
Lecture 141 6. Query Tuning Exercise - Part 1
Lecture 142 7. Query Tuning Exercise - Part 2
Lecture 143 8. Query Tuning Exercise - Part 3
Section 15: Redshift Workload Management (WLM)
Lecture 144 1. WLM Intro & Query Queue
Lecture 145 2. Concurrency Scaling, Short Query Acceleration
Lecture 146 3. Configure WLM HandsOn
Lecture 147 4. Create Query Queue HandsOn
Lecture 148 5. Query Queue In Action
Lecture 149 6. Concurrency Scaling In Action
Section 16: Redshift Security - RBAC, CLS, RLS, Dynamic Data Masking (DDM)
Lecture 150 1. Users, Roles, RBAC
Lecture 151 2. Users, Roles, RBAC HandsOn
Lecture 152 3. Row & Column Level Security (RLS & CLS)
Lecture 153 4. Multiple RLS Policies HandsOn
Lecture 154 5. CLS HandsOn
Lecture 155 6. Combine RLS & CLS HandsOn
Lecture 156 7. Dynamic Data Masking
Lecture 157 8. Track Users, Roles, CLS and RLS
Lecture 158 9. Audit Logging
Section 17: Monitoring in Redshift
Lecture 159 1. Monitor Redshift using Console
Lecture 160 2. System Views for Monitoring Queries, Redshift Objects, Configuration Parms
Section 18: Redshift Serverless
Lecture 161 1. Introduction to Redshift Serverless
Lecture 162 2. Create & Delete Redshift Serverless Resources
Lecture 163 3. COPY & UNLOAD in Serverless
Lecture 164 4. ZeroETL Integration Setup
Lecture 165 5. ZeroETL in Action
Lecture 166 6. Query Tuning Similarities
Lecture 167 7. Migrate from Provisioned to Serverless
Section 19: Detailed Redshift Pricing
Lecture 168 1. Redshift Pricing Components
Lecture 169 2. Pricing Example - Provisioned, Serverless, Concurrency Scaling, Spectrum
Lecture 170 3. AWS Pricing Calculator
Section 20: Redshift Additional Information
Lecture 171 1. Redshift Integration with AWS Services
Lecture 172 2. Redshift & Snowflake Comparison
Lecture 173 3. Redshift Best Practices
Lecture 174 4. Redshift Limitations and Challenges
Section 21: AWS Metadata Repository - Glue Data Catalog
Lecture 175 1. AWS Glue Catalog - Theory
Lecture 176 2. Glue Catalog - Setup Data Stores & IAM Roles HandsOn
Lecture 177 3. Store Aurora metadata in Glue Catalog
Lecture 178 4. Store S3 and Redshift metadata in Glue Catalog
Section 22: Data Governance using AWS Lake Formation
Lecture 179 Lake Formation Introduction
Lecture 180 Permission Flow HandsOn 1
Lecture 181 Permission Flow HandsOn 2
Lecture 182 Lake Formation - Tag Based Access Control (LF-TBAC)
Lecture 183 LF-TBAC HandsOn
Lecture 184 LF - Data Filtering
Lecture 185 LF Clean Up (Please complete this)
Section 23: Data Lakehouse on AWS - Athena
Lecture 186 1. Athena Introduction
Lecture 187 2. Athena Intro Hands On
Lecture 188 3. Athena SerDe, File & Row format
Lecture 189 4. SerDe, Format, CTAS Hands On
Lecture 190 5. UNLOAD, Prepare & Execute, Query JSON
Lecture 191 6. UNLOAD, Prepare & Execute, Query JSON Hands On
Lecture 192 7. Schema Evolution, JSON_EXTRACT
Lecture 193 8. Iceberg, ACID
Lecture 194 9. Athena Partitioning & Bucketing
Lecture 195 10. More DDL Commands
Lecture 196 11. Athena WLM Theory
Lecture 197 12. Workgroup HandsOn
Lecture 198 13. Capacity Reservation HandsOn
Lecture 199 14. Performance Tuning Theory
Lecture 200 15. Athena Pricing & Performance Tuning
Lecture 201 16. Architectural Patterns using Athena
Section 24: Big Data Warehouse - Hive
Lecture 202 1. Hadoop Theory
Lecture 203 2. File Formats
Lecture 204 3. Hive Architecture & Components
Lecture 205 4. Hive CLI
Lecture 206 5. Data Types, databases, tables, File & Row Format, Hive SerDe
Lecture 207 6. Hive Databases hands-on
Lecture 208 7. Hive Tables hands-on
Lecture 209 8. Partitioning & Bucketing
Lecture 210 9. Partitioning & Bucketing hands-on
Lecture 211 10. Load, insert, ACID, Materialized Views etc
Lecture 212 11. JOINs, Locks, Configuration Parameters
Who this course is for
Data Engineers, Data Scientists, Data Analysts
Python Developers, Application Developers, Big Data Developers
Database Administrators (DBA), Big Data Administrators
Solutions Architects, Cloud Architects, Big Data Architects
Technical Managers, Engineering Managers, Project Managers