Data Engineering on AWS Vol 1 - OLAP & Data Warehouse

Posted By: ELK1nG

Published 1/2025
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 KHz
Language: English | Size: 18.54 GB | Duration: 46h 4m

Detailed training (Level 350) on AWS Data Engineering Services: Redshift, S3, Athena, Hive, Glue Data Catalog, Lake Formation

What you'll learn

Understand Data Engineering (Volume 1) on AWS using S3, Redshift, Athena and Hive

Know Redshift, S3 and Athena up to Level 350+ with HANDS-ON

Production-level projects and hands-on exercises that give candidates on-the-job-like training

Get access to datasets of 100 GB - 200 GB and practice with them

Learn Python for Data Engineering with HANDS-ON (Functions, Arguments, OOP (class, object, self), Modules, Packages, Multithreading, file handling etc.)

Learn SQL for Data Engineering with HANDS-ON (Database objects, CASE, Window Functions, CTE, CTAS, MERGE, Materialized View etc.)

Requirements

Good to have AWS and SQL knowledge

Description

This is Volume 1 of the Data Engineering course on AWS. It gives detailed explanations of AWS Data Engineering Services such as S3 (Simple Storage Service), Redshift, Athena, Hive, Glue Data Catalog and Lake Formation, and delves into the data warehouse, or consumption and storage, layer of a Data Engineering pipeline. In Volume 2, I will showcase Data Processing (Batch and Streaming) Services. You will get opportunities to do hands-on work with large datasets (100 GB - 300 GB or more of data). Moreover, this course provides hands-on exercises that match real-world scenarios such as Redshift query performance tuning, streaming ingestion, Window functions, ACID transactions, the COPY command, Distribution & Sort keys, WLM, row-level and column-level security, Athena partitioning, Athena WLM etc.

Some other highlights:

Contains training on data modelling - Normalization & ER Diagram for OLTP systems, Dimensional modelling for OLAP/DWH systems.

Data modelling hands-on.

Other technologies covered - EC2, EBS, VPC and IAM.

This is Part 1 (Volume 1) of the full data engineering course. In Part 2 (Volume 2), I will be covering the following topics:

Spark (Batch and Stream processing using AWS EMR, AWS Glue ETL, GCP Dataproc)

Kafka (on AWS & GCP)

Flink

Apache Airflow

Apache Pinot

AWS Kinesis and more.

Overview

Section 1: Introduction - Data Engineering Volume 1 on AWS

Lecture 1 Course Introduction and Resources

Lecture 2 2. Course Introduction and Course Contents

Lecture 3 Course Details - Projects, About Me

Section 2: (Optional) AWS Pre-requisites - EC2 & EBS

Lecture 4 AWS Cloud and EC2 Introduction

Lecture 5 EC2 Console & HandsOn

Lecture 6 EBS Theory

Lecture 7 EBS Hands On

Section 3: (Optional) AWS Pre-requisites - VPC

Lecture 8 VPC Introduction & Components

Lecture 9 VPC Components Hands On

Lecture 10 Bastion Host

Lecture 11 Security Groups

Lecture 12 NAT Gateway & VPC Endpoint

Lecture 13 VPC Peering

Section 4: (Optional) AWS Pre-requisites - IAM

Lecture 14 IAM Introduction & Hands On

Lecture 15 IAM Service Roles

Section 5: (Optional) Non AWS Pre-requisites - SQL Basics

Lecture 16 1. SQL Introduction

Lecture 17 2. SQL Client & Server Setup

Lecture 18 3. SQL Database Objects Theory

Lecture 19 4. Database Objects Hands On

Lecture 20 5. CRUD Operations

Lecture 21 6. SELECT Operators

Lecture 22 7. CASE COALESCE Functions

Lecture 23 8. DATE Functions

Lecture 24 9. CTAS Cast Concat

Lecture 25 10. Update Delete Truncate

Lecture 26 11. HAVING Clause

Lecture 27 12. Inner Join, Left Join, Right Join, Outer Join

Lecture 28 13. Union Intersect View

Lecture 29 14. Materialized View

Lecture 30 15. Common Table Expression (CTE)

Lecture 31 16. SQL Window Functions

Lecture 32 17. MERGE statement & Summary

Section 6: (Optional) Non AWS Pre-requisites - Python Basics

Lecture 33 1. Python Intro - Architecture, PyCharm, Virtual Env

Lecture 34 2. PyCharm & CLI Walkthrough

Lecture 35 3. Compiled vs Interpreted

Lecture 36 4. Everything in Python is an Object

Lecture 37 5. String Data Type

Lecture 38 6. Number Data Type

Lecture 39 7. List Data Type

Lecture 40 8. Tuple Data Type

Lecture 41 9. Set & Dict Data Type, Type Conversion

Lecture 42 10. Python Operators

Lecture 43 11. Set up Python interpreter in PyCharm

Lecture 44 12. Print & Input Functions

Lecture 45 13. IF Statement

Lecture 46 14. For & While loops

Lecture 47 15. Functions Intro

Lecture 48 16. Function Scoping

Lecture 49 17. Functions RETURN

Lecture 50 18. Function Arguments

Lecture 51 19. Modify Arguments

Lecture 52 20. Positional & Keyword Arguments

Lecture 53 21. args & kwargs

Lecture 54 22. Class Object Self

Lecture 55 23. Class-Instance Variables, __init__

Lecture 56 24. Class Object Exercise 1

Lecture 57 25. Class Object Exercise 2

Lecture 58 26. Inheritance

Lecture 59 27. Python Memory Management

Lecture 60 28. Modules & Packages

Lecture 61 29. HandsOn Exercise

Lecture 62 30. Module Pre-compilation

Lecture 63 31. Namespace & __name__

Lecture 64 32. Error Handling in Python

Lecture 65 33. File Handling

Lecture 66 34. CSV & JSON module

Lecture 67 35. Python Multi-threading concept

Lecture 68 36. Multi-threading hands-on and exercise

Lecture 69 37. Debugging & Profiling

Section 7: Data Engineering Introduction

Lecture 70 1. Data Engineering introduction, OLTP & OLAP

Lecture 71 2. Data Mart & Data Mesh

Lecture 72 3. Data Lake, Data Lakehouse, DWH

Section 8: AWS Distributed Storage - S3 (Simple Storage Service) for Data Engineers

Lecture 73 1. S3 Introduction 1

Lecture 74 2. S3 Introduction 2

Lecture 75 3. S3 Basics

Lecture 76 4. S3 Basics Hands-on

Lecture 77 5. S3 Versioning

Lecture 78 6. S3 Encryption

Lecture 79 7. Storage Class

Lecture 80 8. S3 Multipart Upload

Lecture 81 9. Lifecycle Policies

Lecture 82 10. Cross Region Replication

Lecture 83 11. S3 Mountpoint

Lecture 84 12. Security - S3 Identity Based Policy

Lecture 85 13. Security - S3 Bucket Policy

Lecture 86 14. Bucket Policy with VPC, IP address, VPCE

Lecture 87 15. S3 Access Point

Lecture 88 16. S3 Object Lambda

Lecture 89 17. Pre-signed URL

Lecture 90 18. S3 Performance Considerations

Lecture 91 19. S3 Pricing

Lecture 92 20. Architectural Patterns using S3

Section 9: Data Modelling - Normalization, ER Diagram, Dimensional Modelling

Lecture 93 1. Data Modelling Introduction

Lecture 94 2. Normal Forms 1NF 2NF 3NF

Lecture 95 3. Relations: one-to-one, one-to-many, many-to-one, many-to-many

Lecture 96 4. Dimensional modelling - Facts, Dimensions & Grains

Lecture 97 5. Grains Exercise

Lecture 98 6. Dimensional Modelling Technique

Lecture 99 7. Types of Fact & Dimension Tables

Section 10: Data Warehouse on AWS - Redshift Infra

Lecture 100 1. Redshift Infra

Lecture 101 2. Redshift Infra HandsOn

Lecture 102 3. Redshift Architecture - Zone Map, Columnar Storage

Lecture 103 4. Cluster Resize - Elastic & Classic

Lecture 104 5. Cluster Resize - HandsOn

Lecture 105 6. Cluster Pause & Rename

Lecture 106 7. Snapshot & Backup

Lecture 107 8. Redshift Infra Conclusion

Section 11: Redshift Objects

Lecture 108 1. Querying, Connection, RSQL, QEV2

Lecture 109 2. Query Editor & RSQL setup

Lecture 110 3. Object Hierarchy, tables hands-on

Lecture 111 4. Data Types Hands-on

Lecture 112 5. Table operations Hands-on

Lecture 113 6. Redshift ACID, Locks, Isolation Level

Lecture 114 7. Implement Transactions

Lecture 115 8. AccessShareLock & ShareRowExclusiveLock HandsOn

Lecture 116 9. Redshift SUPER datatype

Lecture 117 10. Section Summary

Section 12: Redshift Deep Dive

Lecture 118 1. Distribution Key & Style, Sort Key

Lecture 119 2. Column Compression

Lecture 120 3. Modify Dist Sort Key, Compression HandsOn

Lecture 121 4. COPY Command Theory

Lecture 122 5. COPY Command HandsOn

Lecture 123 6. UNLOAD Command

Lecture 124 7. AWS DMS - Move from OLTP to DWH

Lecture 125 8. DMS - Setup Source OLTP & Python application

Lecture 126 9. Setup DMS Instance, Endpoint, Task

Lecture 127 10. DMS Task - OLTP to DWH

Lecture 128 11. Table Maintenance - VACUUM & ANALYZE

Lecture 129 12. Vacuum & Analyze HandsOn

Section 13: Redshift Features

Lecture 130 1. Materialized View (MV)

Lecture 131 2. MV HandsOn

Lecture 132 3. Query Federation

Lecture 133 4. Redshift Spectrum

Lecture 134 5. Streaming Ingestion

Lecture 135 6. Redshift Feature Use Cases

Section 14: Redshift Query Tuning

Lecture 136 1. Query Execution

Lecture 137 2. EXPLAIN Plan & System Joins

Lecture 138 3. System Joins HandsOn

Lecture 139 4. Data RE-distribution

Lecture 140 5. EXPLAIN & RE-distribution HandsOn

Lecture 141 6. Query Tuning Exercise - Part 1

Lecture 142 7. Query Tuning Exercise - Part 2

Lecture 143 8. Query Tuning Exercise - Part 3

Section 15: Redshift Workload Management (WLM)

Lecture 144 1. WLM Intro & Query Queue

Lecture 145 2. Concurrency Scaling, Short Query Acceleration

Lecture 146 3. Configure WLM HandsOn

Lecture 147 4. Create Query Queue HandsOn

Lecture 148 5. Query Queue In Action

Lecture 149 6. Concurrency Scaling In Action

Section 16: Redshift Security - RBAC, CLS, RLS, Dynamic Data Masking (DDM)

Lecture 150 1. Users, Roles, RBAC

Lecture 151 2. Users, Roles, RBAC HandsOn

Lecture 152 3. Row & Column Level Security (RLS & CLS)

Lecture 153 4. Multiple RLS Policies HandsOn

Lecture 154 5. CLS HandsOn

Lecture 155 6. Combine RLS & CLS HandsOn

Lecture 156 7. Dynamic Data Masking

Lecture 157 8. Track Users, Roles, CLS and RLS

Lecture 158 9. Audit Logging

Section 17: Monitoring in Redshift

Lecture 159 1. Monitor Redshift using Console

Lecture 160 2. System Views for Monitoring Queries, Redshift Objects, Configuration Parameters

Section 18: Redshift Serverless

Lecture 161 1. Introduction to Redshift Serverless

Lecture 162 2. Create & Delete Redshift Serverless Resources

Lecture 163 3. COPY & UNLOAD in Serverless

Lecture 164 4. ZeroETL Integration Setup

Lecture 165 5. ZeroETL in Action

Lecture 166 6. Query Tuning Similarities

Lecture 167 7. Migrate from Provisioned to Serverless

Section 19: Detailed Redshift Pricing

Lecture 168 1. Redshift Pricing Components

Lecture 169 2. Pricing Example - Provisioned, Serverless, Concurrency Scaling, Spectrum

Lecture 170 3. AWS Pricing Calculator

Section 20: Redshift Additional Information

Lecture 171 1. Redshift Integration with AWS Services

Lecture 172 2. Redshift & Snowflake Comparison

Lecture 173 3. Redshift Best Practices

Lecture 174 4. Redshift Limitations and Challenges

Section 21: AWS Metadata Repository - Glue Data Catalog

Lecture 175 1. AWS Glue Catalog - Theory

Lecture 176 2. Glue Catalog - Setup Data Stores & IAM Roles HandsOn

Lecture 177 3. Store Aurora metadata in Glue Catalog

Lecture 178 4. Store S3 and Redshift metadata in Glue Catalog

Section 22: Data Governance using AWS Lake Formation

Lecture 179 Lake Formation Introduction

Lecture 180 Permission Flow HandsOn 1

Lecture 181 Permission Flow HandsOn 2

Lecture 182 Lake Formation - Tag Based Access Control (LF-TBAC)

Lecture 183 LF-TBAC HandsOn

Lecture 184 LF - Data Filtering

Lecture 185 LF Clean Up (Please complete this)

Section 23: Data Lakehouse on AWS - Athena

Lecture 186 1. Athena Introduction

Lecture 187 2. Athena Intro Hands On

Lecture 188 3. Athena SerDe, File & Row format

Lecture 189 4. SerDe, Format, CTAS Hands On

Lecture 190 5. UNLOAD, Prepare & Execute, Query JSON

Lecture 191 6. UNLOAD, Prepare & Execute, Query JSON Hands On

Lecture 192 7. Schema Evolution, JSON_EXTRACT

Lecture 193 8. Iceberg, ACID

Lecture 194 9. Athena Partitioning & Bucketing

Lecture 195 10. More DDL Commands

Lecture 196 11. Athena WLM Theory

Lecture 197 12. Workgroup HandsOn

Lecture 198 13. Capacity Reservation HandsOn

Lecture 199 14. Performance Tuning Theory

Lecture 200 15. Athena Pricing & Performance Tuning

Lecture 201 16. Architectural Patterns using Athena

Section 24: Big Data Warehouse - Hive

Lecture 202 1. Hadoop Theory

Lecture 203 2. File Formats

Lecture 204 3. Hive Architecture & Components

Lecture 205 4. Hive CLI

Lecture 206 5. Data Types, databases, tables, File & Row Format, Hive SerDe

Lecture 207 6. Hive Databases hands-on

Lecture 208 7. Hive Tables hands-on

Lecture 209 8. Partitioning & Bucketing

Lecture 210 9. Partitioning & Bucketing hands-on

Lecture 211 10. Load, insert, ACID, Materialized Views etc

Lecture 212 11. JOINs, Locks, Configuration Parameters

Data Engineers, Data Scientists, Data Analysts, Python Developers, Application Developers, Big Data Developers, Database Administrators (DBA), Big Data Administrators, Solutions Architects, Cloud Architects, Big Data Architects, Technical Managers, Engineering Managers, Project Managers