PySpark Essential Training: Introduction to Building Data Pipelines

Posted By: lucky_aut

Released: 08/2025
Duration: 1h 18m | .MP4 1280x720, 30 fps(r) | AAC, 48000 Hz, 2ch | 146.46 MB
Genre: eLearning | Language: English


PySpark is a powerful library that brings Apache Spark’s distributed computing capabilities to Python, making it a key tool for processing large-scale data efficiently. In this course, data engineer and analyst Sam Bail provides a structured, hands-on introduction to PySpark, starting with an overview of Apache Spark, its architecture, and its ecosystem. Learn about Spark’s core concepts, such as the DataFrame API, transformations, lazy evaluation, and actions, before setting up a lab environment and working with a real dataset. Plus, gain insights into how PySpark fits into the broader data engineering ecosystem, along with best practices for running PySpark in a production environment.
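
To make the idea of transformations, lazy evaluation, and actions more concrete, here is a minimal plain-Python sketch. It is not taken from the course and does not use PySpark itself: `LazyFrame`, `is_large`, and the sample rows are hypothetical names invented for illustration. The sketch only mimics the general pattern, in which transformations record steps in a plan and nothing runs until an action asks for results.

```python
class LazyFrame:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not PySpark's API)."""

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = plan  # recorded transformations, not yet executed

    def filter(self, predicate):
        # Transformation: record the step and return a new LazyFrame.
        return LazyFrame(self._rows, self._plan + (("filter", predicate),))

    def map(self, func):
        # Transformation: also only recorded, never run here.
        return LazyFrame(self._rows, self._plan + (("map", func),))

    def collect(self):
        # Action: run every recorded step and materialize the result.
        rows = iter(self._rows)
        for kind, fn in self._plan:
            rows = filter(fn, rows) if kind == "filter" else map(fn, rows)
        return list(rows)


calls = []  # tracks when the predicate actually runs


def is_large(row):
    calls.append(row)
    return row["value"] > 1


frame = LazyFrame([
    {"key": "a", "value": 1},
    {"key": "b", "value": 2},
    {"key": "a", "value": 3},
])

# Building the pipeline only records two steps; no row is touched yet.
pipeline = frame.filter(is_large).map(lambda row: row["value"] * 10)
calls_before_action = len(calls)

# The action triggers execution of the whole recorded plan.
result = pipeline.collect()
```

In this toy model the predicate is never called while the pipeline is being built; it runs only once `collect()` is invoked. Spark's real engine goes much further (for example, it optimizes and distributes the plan), so this should be read as an analogy rather than a description of its internals.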