Start Your Data Engineering Journey: Project Based Learning

Posted By: ELK1nG

Last updated 8/2025
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 KHz
Language: English | Size: 1.58 GB | Duration: 5h 11m

Learn By Doing with APIs, SQL, Python, Docker, Airflow, CI/CD, Functional & Data Quality Tests and More!

What you'll learn

Build Python scripts that extract data from APIs (explored first with Postman), load it into the data warehouse, and transform it there (ELT)
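
To give a concrete flavour of this step, here is a minimal sketch (not taken from the course) of paging through the YouTube Data API's playlistItems endpoint with requests; the environment variable and function names are hypothetical:

```python
import os

import requests

# Hypothetical names; the course's own script and variables may differ.
API_KEY = os.environ["YOUTUBE_API_KEY"]  # e.g. loaded from a .env file
URL = "https://www.googleapis.com/youtube/v3/playlistItems"


def fetch_playlist_items(playlist_id: str) -> list[dict]:
    """Return every raw item in a playlist, following pagination."""
    items: list[dict] = []
    page_token = None
    while True:
        params = {
            "part": "contentDetails",
            "playlistId": playlist_id,
            "maxResults": 50,
            "key": API_KEY,
        }
        if page_token:
            params["pageToken"] = page_token
        resp = requests.get(URL, params=params, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        items.extend(data["items"])
        page_token = data.get("nextPageToken")
        if not page_token:
            return items
```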

Use PostgreSQL as a data warehouse. Interact with the data warehouse using both psql & DBeaver
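
As a sketch of the warehouse side, statements like the ones below (with hypothetical schema and column names) could be run from psql or DBeaver to set up a staging table:

```sql
-- Hypothetical schema and column names; the course defines its own layout.
CREATE SCHEMA IF NOT EXISTS staging;
CREATE SCHEMA IF NOT EXISTS core;

CREATE TABLE IF NOT EXISTS staging.video_statistics (
    video_id   TEXT PRIMARY KEY,
    title      TEXT,
    view_count BIGINT,
    like_count BIGINT,
    loaded_at  TIMESTAMPTZ DEFAULT now()
);
```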

Discover how to containerize data applications using Docker, making your data pipelines portable and easy to scale.
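
For orientation, a containerized extraction script might start from a Dockerfile along these lines (a minimal sketch; the course's actual Dockerfile and file names will differ):

```dockerfile
# Minimal sketch; the script name is hypothetical.
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python", "extract_video_statistics.py"]
```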

Master the basics of orchestrating and automating your data workflows with Apache Airflow, a must-have tool in data engineering.
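
As a taste of Airflow's Python API, here is a minimal TaskFlow-style DAG (Airflow 2.x); the DAG and task names are hypothetical, not the course's:

```python
from datetime import datetime

from airflow.decorators import dag, task


# Hypothetical DAG and task names; the course builds its own pipeline.
@dag(schedule="@daily", start_date=datetime(2025, 1, 1), catchup=False)
def video_statistics_pipeline():
    @task
    def extract() -> str:
        # Call the API and write raw JSON; return the file path.
        return "/opt/airflow/data/video_statistics.json"

    @task
    def load(json_path: str) -> None:
        # Copy the JSON file into the warehouse's staging schema.
        print(f"loading {json_path}")

    load(extract())


video_statistics_pipeline()
```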

Understand how to perform unit, integration & end-to-end (E2E) tests using a combination of pytest and Airflow's DAG tests to validate your data pipelines.
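
For example, a unit test in pytest might look like the sketch below; the function under test (parse_video_ids) is hypothetical:

```python
# test_extract.py -- minimal pytest sketch; parse_video_ids is hypothetical.
from extract_video_statistics import parse_video_ids


def test_parse_video_ids_deduplicates():
    items = [
        {"contentDetails": {"videoId": "abc"}},
        {"contentDetails": {"videoId": "abc"}},
        {"contentDetails": {"videoId": "xyz"}},
    ]
    assert parse_video_ids(items) == ["abc", "xyz"]
```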

Implement data quality tests using SODA to ensure your data meets business and technical requirements.
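
SODA checks are written in SodaCL (YAML). A minimal sketch, assuming a hypothetical video_statistics table:

```yaml
# checks.yml -- SodaCL sketch; table and column names are hypothetical.
checks for video_statistics:
  - row_count > 0
  - missing_count(video_id) = 0
  - duplicate_count(video_id) = 0
```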

Learn to automate deployment pipelines using GitHub Actions to ensure smooth, continuous integration and delivery.
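
A GitHub Actions workflow for this kind of pipeline could start from something like the sketch below (hypothetical file, job, and step names):

```yaml
# .github/workflows/ci.yml -- minimal sketch; job and step names are hypothetical.
name: ci

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest
```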

Requirements

At least 8 GB of RAM, though 16 GB is better for smoother performance

Python, Docker & Git installed to run and access the course code

Basic Python & SQL knowledge will be required

Knowledge of Docker & CI/CD is a plus but not necessary

Description

Data Engineering is the backbone of modern data-driven companies. To excel, you need experience with the tools and processes that power data pipelines in real-world environments. This course gives you practical, project-based learning with the following tools: PostgreSQL, Python, Docker, Airflow, Postman, SODA and GitHub Actions. I will guide you through how to use each of them.

What you will learn in the course:

Python for Data Engineering: Build Python scripts that extract data from APIs (explored first with Postman), load it into the data warehouse, and transform it there (ELT).

SQL for Data Pipelines: Use PostgreSQL as a data warehouse. Interact with the data warehouse using both psql & DBeaver.

Docker for Containerized Deployments: Discover how to containerize data applications using Docker, making your data pipelines portable and easy to scale.

Airflow for Workflow Automation: Master the basics of orchestrating and automating your data workflows with Apache Airflow, a must-have tool in data engineering.

Testing and Data Quality Assurance: Understand how to perform unit, integration & end-to-end (E2E) tests using a combination of pytest and Airflow's DAG tests to validate your data pipelines. Implement data quality tests using SODA to ensure your data meets business and technical requirements.

CI/CD for Automated Testing & Deployment: Learn to automate deployment pipelines using GitHub Actions to ensure smooth, continuous integration and delivery.

Overview

Section 1: Introduction

Lecture 1 Welcome!

Lecture 2 Prerequisites

Lecture 3 Tools Installation for Course - [IMPORTANT]

Lecture 4 Project Overview

Lecture 5 Building the Code

Lecture 6 APPENDIX

Section 2: Data Extraction using API

Lecture 7 Data Extraction Introduction

Lecture 8 What is an API

Lecture 9 Getting the YouTube API Key

Lecture 10 Google Cloud Shell

Lecture 11 YouTube API Explorer and Postman

Lecture 12 Setting Up Git Remote

Lecture 13 Create Virtual Environment

Lecture 14 Analysis of Data Extraction Variables

Lecture 15 Building the Videos Statistics script - Part 1 Playlist ID

Lecture 16 Introducing the .env

Lecture 17 Building the Videos Statistics script - Part 2 Unique Video IDs

Lecture 18 Building the Videos Statistics script - Part 3 Video Data

Lecture 19 Building the Videos Statistics script - Part 4 Save to JSON

Lecture 20 Put logs/ folder in .gitignore

Lecture 21 APPENDIX

Section 3: Docker

Lecture 22 Why Docker

Lecture 23 Dockerfile

Lecture 24 Build the Docker Image

Lecture 25 Airflow Architecture

Lecture 26 Airflow Directories

Lecture 27 .env file

Lecture 28 Amending the .env

Lecture 29 Current docker-compose.yaml

Lecture 30 Docker Compose

Lecture 31 docker commands

Lecture 32 Stopping Docker containers before shutting down laptop - [IMPORTANT]

Lecture 33 APPENDIX

Section 4: Airflow

Lecture 34 Airflow Introduction

Lecture 35 Refactoring of scripts to use Airflow

Lecture 36 APPENDIX

Section 5: Postgres Data Warehouse

Lecture 37 Postgres Data Warehouse Introduction

Lecture 38 Loading to Data Warehouse & Transformations

Lecture 39 Setting up Connection to Data Warehouse using Airflow

Lecture 40 Creating the Schemas and Tables

Lecture 41 Loading the JSON data

Lecture 42 Inserts, Updates & Deletes

Lecture 43 Transformations

Lecture 44 Populating Staging and Core Tables

Lecture 45 Defining the Data Warehouse DAG & Debugging

Lecture 46 Interacting with the Data Warehouse using DBeaver

Lecture 47 APPENDIX

Section 6: Testing

Lecture 48 Testing Introduction

Lecture 49 Using Soda for Data Quality Tests

Lecture 50 Airflow Integration for DQ Tests

Lecture 51 Functional Tests Introduction

Lecture 52 Unit Tests

Lecture 53 Integration Tests

Lecture 54 End to End (E2E) Test

Lecture 55 DAGs Re-Structure

Lecture 56 APPENDIX

Section 7: CI/CD

Lecture 57 CI/CD Introduction

Lecture 58 Commit and Push

Lecture 59 CI-CD Part 1 - Docker Image Builds

Lecture 60 CI-CD Part 2 - Testing

Lecture 61 GitHub Actions Workflow Dispatch

Lecture 62 APPENDIX

Lecture 63 The End

Who this course is for:

Aspiring Data Engineers: If you're just starting out and want to learn Data Engineering by working with real tools and projects, this course will provide you with the foundational skills you need to start your career.

Beginner Data Professionals: If you have some experience as a Data Engineer/Data Scientist but want to deepen your understanding of essential tools like Docker, CI/CD, and automated testing, this course will help you build on what you already know.

Data Enthusiasts: Those passionate about data and interested in getting practical, hands-on experience with the tools used by modern Data Engineers.