Start Your Data Engineering Journey: Project Based Learning
Last updated 8/2025
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 KHz
Language: English | Size: 1.58 GB | Duration: 5h 11m
Learn By Doing with APIs, SQL, Python, Docker, Airflow, CI/CD, Functional & Data Quality Tests and More!
What you'll learn
Build Python scripts that extract data from APIs (explored with Postman), load it into the data warehouse, and transform it there (ELT)
Use PostgreSQL as a data warehouse. Interact with the data warehouse using both psql & DBeaver
Discover how to containerize data applications using Docker, making your data pipelines portable and easy to scale.
Master the basics of orchestrating and automating your data workflows with Apache Airflow, a must-have tool in data engineering.
Understand how to perform unit, integration & end-to-end (E2E) tests using a combination of pytest and Airflow's DAG tests to validate your data pipelines.
Implement data quality tests using SODA to ensure your data meets business and technical requirements.
Learn to automate deployment pipelines using GitHub Actions to ensure smooth, continuous integration and delivery.
Requirements
At least 8 GB of RAM, though 16 GB is better for smoother performance
Python, Docker & Git installed to run/access the course code
Basic Python & SQL knowledge will be required
Knowledge of Docker & CI/CD is a plus but not necessary
Description
Data Engineering is the backbone of modern data-driven companies. To excel, you need experience with the tools and processes that power data pipelines in real-world environments. This course gives you practical, project-based learning with the following tools: PostgreSQL, Python, Docker, Airflow, Postman, SODA and GitHub Actions. I will guide you through how to use each of them.
What you will learn in the course:
Python for Data Engineering: Build Python scripts that extract data from APIs (explored with Postman), load it into the data warehouse, and transform it there (ELT).
SQL for Data Pipelines: Use PostgreSQL as a data warehouse. Interact with the data warehouse using both psql & DBeaver.
Docker for Containerized Deployments: Discover how to containerize data applications using Docker, making your data pipelines portable and easy to scale.
Airflow for Workflow Automation: Master the basics of orchestrating and automating your data workflows with Apache Airflow, a must-have tool in data engineering.
Testing and Data Quality Assurance: Understand how to perform unit, integration & end-to-end (E2E) tests using a combination of pytest and Airflow's DAG tests to validate your data pipelines. Implement data quality tests using SODA to ensure your data meets business and technical requirements.
CI/CD for Automated Testing & Deployment: Learn to automate deployment pipelines using GitHub Actions to ensure smooth, continuous integration and delivery.
Overview
Section 1: Introduction
Lecture 1 Welcome!
Lecture 2 Prerequisites
Lecture 3 Tools Installation for Course - [IMPORTANT]
Lecture 4 Project Overview
Lecture 5 Building the Code
Lecture 6 APPENDIX
Section 2: Data Extraction using API
Lecture 7 Data Extraction Introduction
Lecture 8 What is an API
Lecture 9 Getting the YouTube API Key
Lecture 10 Google Cloud Shell
Lecture 11 YouTube API Explorer and Postman
Lecture 12 Setting Up Git Remote
Lecture 13 Create Virtual Environment
Lecture 14 Analysis of Data Extraction Variables
Lecture 15 Building the Videos Statistics script - Part 1 Playlist ID
Lecture 16 Introducing the .env
Lecture 17 Building the Videos Statistics script - Part 2 Unique Video IDs
Lecture 18 Building the Videos Statistics script - Part 3 Video Data
Lecture 19 Building the Videos Statistics script - Part 4 Save to JSON
Lecture 20 Put logs/ folder in .gitignore
Lecture 21 APPENDIX
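To give a taste of what Section 2 builds, here is a minimal sketch of an extraction script that pages through a channel's uploads playlist via the YouTube Data API v3 and saves the video IDs to JSON. The endpoint and parameters are the real API's; the environment-variable names and output path are illustrative assumptions, not the course's exact code.
```python
import json
import os

import requests  # third-party: pip install requests

# Endpoint is the real YouTube Data API v3; names below are illustrative.
API_URL = "https://www.googleapis.com/youtube/v3/playlistItems"
API_KEY = os.environ["YOUTUBE_API_KEY"]          # kept out of the code via .env (Lecture 16)
PLAYLIST_ID = os.environ["UPLOADS_PLAYLIST_ID"]  # the channel's uploads playlist


def fetch_video_ids(playlist_id: str) -> list[str]:
    """Page through a playlist and collect every video ID."""
    video_ids: list[str] = []
    page_token = None
    while True:
        params = {
            "part": "contentDetails",
            "playlistId": playlist_id,
            "maxResults": 50,  # the API's per-page maximum
            "key": API_KEY,
        }
        if page_token:
            params["pageToken"] = page_token
        response = requests.get(API_URL, params=params, timeout=30)
        response.raise_for_status()
        payload = response.json()
        video_ids += [item["contentDetails"]["videoId"] for item in payload["items"]]
        page_token = payload.get("nextPageToken")
        if not page_token:
            return video_ids


if __name__ == "__main__":
    with open("video_ids.json", "w") as f:
        json.dump(fetch_video_ids(PLAYLIST_ID), f, indent=2)
```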
Section 3: Docker
Lecture 22 Why Docker
Lecture 23 Dockerfile
Lecture 24 Build the Docker Image
Lecture 25 Airflow Architecture
Lecture 26 Airflow Directories
Lecture 27 .env file
Lecture 28 Amending the .env
Lecture 29 Current docker-compose.yaml
Lecture 30 Docker Compose
Lecture 31 docker commands
Lecture 32 Stopping Docker containers before shutting down laptop - [IMPORTANT]
Lecture 33 APPENDIX
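For orientation, here is a bare-bones Dockerfile of the shape Section 3 works toward: install the dependencies, copy the scripts in, run them. The base image, paths and entry script name are assumptions, not the course's exact file.
```dockerfile
# Illustrative sketch only; the course's Dockerfile (Lecture 23) differs.
FROM python:3.11-slim

WORKDIR /app

# Copy and install dependencies first so Docker caches this layer.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the extraction scripts into the image.
COPY src/ ./src/

CMD ["python", "src/video_statistics.py"]
```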
Section 4: Airflow
Lecture 34 Airflow Introduction
Lecture 35 Refactoring of scripts to use Airflow
Lecture 36 APPENDIX
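As a rough sketch of the refactor in Section 4, the extraction and load steps become tasks in a DAG. This uses Airflow 2.x's TaskFlow API; the DAG ID, schedule and task bodies are assumptions.
```python
# Illustrative sketch (Airflow 2.x TaskFlow API); IDs and bodies are assumptions.
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    dag_id="youtube_elt",
    schedule="@daily",
    start_date=datetime(2025, 1, 1),
    catchup=False,
)
def youtube_elt():
    @task
    def extract() -> str:
        """Call the YouTube API and write the raw JSON; return the file path."""
        ...  # the Section 2 extraction logic moves in here
        return "/opt/airflow/data/video_ids.json"

    @task
    def load(path: str) -> None:
        """Load the JSON file into the Postgres staging schema."""
        ...

    load(extract())


youtube_elt()
```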
Section 5: Postgres Data Warehouse
Lecture 37 Postgres Data Warehouse Introduction
Lecture 38 Loading to Data Warehouse & Transformations
Lecture 39 Setting up Connection to Data Warehouse using Airflow
Lecture 40 Creating the Schemas and Tables
Lecture 41 Loading the JSON data
Lecture 42 Inserts, Updates & Deletes
Lecture 43 Transformations
Lecture 44 Populating Staging and Core Tables
Lecture 45 Defining the Data Warehouse DAG & Debugging
Lecture 46 Interacting with the Data Warehouse using DBeaver
Lecture 47 APPENDIX
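A compressed sketch of the load-and-transform pattern Section 5 covers, using Airflow's Postgres provider: raw rows land in a staging schema, then an idempotent upsert promotes them to core. The connection ID, schemas and columns are assumptions, not the course's exact SQL.
```python
# Illustrative sketch; connection ID, schemas and columns are assumptions.
import json

from airflow.providers.postgres.hooks.postgres import PostgresHook


def load_video_stats(path: str) -> None:
    """Insert raw JSON rows into staging, then upsert them into core."""
    hook = PostgresHook(postgres_conn_id="postgres_dwh")  # set up via the Airflow UI (Lecture 39)
    with open(path) as f:
        rows = json.load(f)
    hook.insert_rows(
        table="staging.video_stats",
        rows=[(r["video_id"], r["view_count"]) for r in rows],
        target_fields=["video_id", "view_count"],
    )
    # Idempotent promotion from staging to core; assumes core.video_stats
    # has a primary key (or unique constraint) on video_id.
    hook.run("""
        INSERT INTO core.video_stats (video_id, view_count)
        SELECT video_id, view_count FROM staging.video_stats
        ON CONFLICT (video_id) DO UPDATE SET view_count = EXCLUDED.view_count;
    """)
```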
Section 6: Testing
Lecture 48 Testing Introduction
Lecture 49 Using Soda for Data Quality Tests
Lecture 50 Airflow Integration for DQ Tests
Lecture 51 Functional Tests Introduction
Lecture 52 Unit Tests
Lecture 53 Integration Tests
Lecture 54 End to End (E2E) Test
Lecture 55 DAGs Re-Structure
Lecture 56 APPENDIX
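Two representative tests in the spirit of Section 6: a pytest unit test for a pure transformation function (parse_video is a hypothetical helper) and the common DagBag check that every DAG file imports cleanly. The course's suite, including the SODA data quality checks, goes further than this.
```python
# Illustrative sketches; src.transform.parse_video is a hypothetical helper.
from airflow.models import DagBag

from src.transform import parse_video


def test_parse_video_extracts_expected_fields():
    """Unit test: a pure function checked against a hand-built API payload."""
    item = {"id": "abc123", "statistics": {"viewCount": "42"}}
    assert parse_video(item) == {"video_id": "abc123", "view_count": 42}


def test_dags_import_without_errors():
    """DAG test: every DAG file must at least import cleanly."""
    dagbag = DagBag(include_examples=False)
    assert dagbag.import_errors == {}
```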
Section 7: CI/CD
Lecture 57 CI/CD Introduction
Lecture 58 Commit and Push
Lecture 59 CI-CD Part 1 - Docker Image Builds
Lecture 60 CI-CD Part 2 - Testing
Lecture 61 GitHub Actions Workflow Dispatch
Lecture 62 APPENDIX
Lecture 63 The End
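To give a flavour of the CI pipeline Section 7 assembles, here is a bare-bones GitHub Actions workflow that builds the Docker image and runs the tests inside it, with a workflow_dispatch trigger for manual runs (Lecture 61). Job names, triggers and steps are assumptions, not the course's exact workflow.
```yaml
# .github/workflows/ci.yml -- illustrative sketch; the course's workflow differs.
name: ci
on:
  push:
    branches: [main]
  workflow_dispatch:  # manual trigger, as covered in Lecture 61

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build the Docker image
        run: docker build -t youtube-elt:ci .
      - name: Run the tests inside the image
        run: docker run --rm youtube-elt:ci pytest -q
```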
Who this course is for:
Aspiring Data Engineers: If you're just starting out and want to learn Data Engineering by working with real tools and projects, this course will provide you with the foundational skills you need to start your career.
Beginner Data Professionals: If you have some experience as a Data Engineer or Data Scientist but want to deepen your understanding of essential tools like Docker, CI/CD, and automated testing, this course will help you build on what you already know.
Data Enthusiasts: Those passionate about data who want practical, hands-on experience with the tools used by modern Data Engineers.