Apache Iceberg: Complete Hands-On Masterclass
Published 7/2025
Duration: 3h 22m | .MP4 1920x1080 30 fps(r) | AAC, 44100 Hz, 2ch | 2.07 GB
Genre: eLearning | Language: English
Master Apache Iceberg with hands-on labs using PySpark, Databricks, and Google Colab—no setup or data needed
What you'll learn
- Learn why Apache Iceberg is redefining modern data lakes and how it overcomes limitations of Hive, Delta Lake, and Hudi.
- Set up Iceberg in Databricks & Google Colab—no local install or cloud budget needed—and start building hands-on from your browser.
- Perform DDL and DML operations (insert, update, delete) on Iceberg tables and explore internal metadata structures such as snapshots, manifests, and partitions.
- Master Iceberg’s time travel and metadata tables to build scalable, version-controlled, cost-efficient data pipelines.
- Understand Iceberg’s architecture under the hood—how it handles schema, partition evolution, and decouples compute from storage.
- Explore real-world debugging, rollback, and auditing use cases using Iceberg’s powerful snapshot and metadata tracking system.
Requirements
- Basic understanding of Python programming is helpful.
- Some familiarity with PySpark is recommended.
- Familiarity with file formats like Parquet or ORC is useful but not mandatory.
- No need for cloud subscriptions or local installations — Databricks Community Edition and Google Colab are used (both are free).
Description
This course offers a practical, hands-on introduction to Apache Iceberg, the modern open table format designed for today’s large-scale data lakes and lakehouses. Whether you’re a data engineer, developer, or architect, this course will help you understand and apply Iceberg concepts through real-world exercises, without the need for any infrastructure setup.
You’ll learn to create, query, and manage Iceberg tables using PySpark in both Databricks Community Edition and Google Colab, two free platforms accessible from your browser. We cover table formats, DDL and DML operations, partition evolution, schema evolution, metadata tables, and Iceberg’s powerful time travel capability.
All code and sample data are provided chapter by chapter. You’ll generate data on the fly, inspect table structures, and compare metadata files using VS Code and online JSON viewers. No local installation and no external datasets, just clear, interactive learning.
What You’ll Learn
- Key differences between file formats and table formats in big data
- How to create and manage Apache Iceberg tables using PySpark
- Comparing Hive tables and Iceberg with practical demos
- Running Iceberg on Databricks and Google Colab (setup included)
- Performing DDL and DML operations (insert, update, delete)
- Using Iceberg’s built-in metadata tables to inspect file-level and snapshot info
- Implementing time travel to query historical data versions
- Understanding how Iceberg handles schema evolution and partition changes
- Comparing Iceberg with Delta Lake and Hudi in practical scenarios
By the end of the course, you’ll have a strong working knowledge of Apache Iceberg and be ready to use it in real-world data projects with confidence.
Who this course is for:
- This course is designed for data engineers, data architects, and developers who work with large-scale data pipelines and are looking to modernize their data lake or lakehouse architecture. If you’re currently using Hive, Delta Lake, or Hudi and want to explore a more flexible, scalable, and engine-agnostic table format, this course is for you.
- It’s also ideal for those who want to gain hands-on experience with Apache Iceberg using free tools like Databricks Community Edition and Google Colab — without needing complex infrastructure or cloud setup.
- Whether you’re building batch or streaming pipelines, working on schema evolution, or just want to understand Iceberg’s time travel and metadata capabilities, this course will help you build practical skills with real-world applications.
- No prior experience with Iceberg is required — just a willingness to learn and some basic familiarity with Python or PySpark.