    Data Engineering Essentials using SQL, Python, and PySpark

    Last updated 7/2023
    Duration: 56h | .MP4 1280x720, 30 fps(r) | AAC, 44100 Hz, 2ch | 14.9 GB
    Genre: eLearning | Language: English

    Learn key Data Engineering Skills such as SQL, Python, and Apache Spark (Spark SQL and PySpark) through Exercises and Projects

    What you'll learn
    Set up an environment to learn SQL and Python essentials for Data Engineering
    Database Essentials for Data Engineering using Postgres, such as creating tables and indexes, running SQL queries, and using important predefined functions
    Data Engineering Programming Essentials using Python, such as basic programming constructs, collections, Pandas, and database programming
    Data Engineering using the Spark DataFrame APIs (PySpark) on Databricks. Learn all the important Spark DataFrame APIs, such as select, filter, groupBy, and orderBy.
    Data Engineering using Spark SQL (PySpark and Spark SQL). Learn how to write high-quality Spark SQL queries using SELECT, WHERE, GROUP BY, ORDER BY, etc.
    The role of the Spark Metastore and the integration of DataFrames with Spark SQL
    Ability to build Data Engineering Pipelines using Spark, with Python as the programming language
    Use of different file formats, such as Parquet, JSON, and CSV, in building Data Engineering Pipelines
    Set up Hadoop and Spark clusters on GCP using Dataproc
    Understanding the complete Spark application development life cycle for building Spark applications using PySpark, and reviewing the applications using the Spark UI
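    To illustrate the file-format conversions mentioned above, here is a minimal sketch, not taken from the course, that converts CSV text into JSON Lines using only the Python standard library. The function name csv_to_json_lines is hypothetical, and Parquet is left out because writing it requires a third-party library such as pyarrow.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text: str) -> str:
    """Convert CSV text (with a header row) into JSON Lines text.

    Hypothetical helper for illustration. Each CSV row becomes one JSON
    object per line; all values stay strings because the csv module does
    not infer types.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = [json.dumps(row) for row in reader]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "order_id,status\n1,CLOSED\n2,PENDING\n"
    print(csv_to_json_lines(sample))
```

    In a Spark pipeline the same idea is usually expressed by reading with one format and writing with another; this sketch only shows the row-by-row mapping on a small input.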
    Requirements
    Laptop with a decent configuration (minimum 4 GB RAM and a dual-core CPU)
    Sign up for GCP with the available credit, or have AWS access
    Set up a self-support lab on cloud platforms (you may have to pay the applicable cloud fees unless you have credit)
    A CS or IT degree or prior IT experience is highly desirable
    Description
    As part of this course, you will learn all the Data Engineering Essentials related to building Data Pipelines using SQL and Python, as well as Hadoop, Hive, or Spark SQL and the PySpark DataFrame APIs. You will also understand the development and deployment lifecycle of Python applications using Docker, as well as of PySpark applications on multinode clusters. Finally, you will gain basic knowledge of reviewing Spark jobs using the Spark UI.
    About Data Engineering
    Data Engineering is essentially the processing of data according to downstream needs. As part of Data Engineering, we build different kinds of pipelines, such as Batch Pipelines and Streaming Pipelines. All roles related to Data Processing are consolidated under Data Engineering; conventionally, they have been known as ETL Development, Data Warehouse Development, etc.
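    As a rough illustration of a batch pipeline, the following sketch extracts rows from CSV text, transforms them by totaling the amount per order status, and loads the result into a database table. It is an assumption-laden example rather than course code: sqlite3 stands in for Postgres, and the table name revenue_by_status and the column names are hypothetical.

```python
import csv
import io
import sqlite3
from collections import defaultdict


def run_batch_pipeline(csv_text: str, conn: sqlite3.Connection) -> None:
    # Extract: parse CSV rows (assumed header: order_id,status,amount).
    rows = csv.DictReader(io.StringIO(csv_text))

    # Transform: total amount per order status.
    totals = defaultdict(float)
    for row in rows:
        totals[row["status"]] += float(row["amount"])

    # Load: replace the hypothetical target table's contents with the new
    # aggregates, so rerunning the batch does not duplicate rows.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revenue_by_status "
        "(status TEXT PRIMARY KEY, revenue REAL)"
    )
    conn.execute("DELETE FROM revenue_by_status")
    conn.executemany(
        "INSERT INTO revenue_by_status (status, revenue) VALUES (?, ?)",
        sorted(totals.items()),
    )
    conn.commit()


if __name__ == "__main__":
    source = "order_id,status,amount\n1,CLOSED,10.5\n2,PENDING,4.0\n3,CLOSED,9.5\n"
    conn = sqlite3.connect(":memory:")
    run_batch_pipeline(source, conn)
    print(conn.execute(
        "SELECT status, revenue FROM revenue_by_status ORDER BY status"
    ).fetchall())
```

    Deleting before inserting keeps the load step idempotent, which matters when a scheduled batch job has to be rerun after a failure.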
    Here are some of the challenges learners face when acquiring key Data Engineering Skills such as Python, SQL, and PySpark:
    Having an appropriate environment with Apache Hadoop, Apache Spark, Apache Hive, etc. working together.
    Good quality content with proper support.
    Enough tasks and exercises for practice.
    This course is designed to address these key challenges, helping professionals at all levels acquire the required Data Engineering Skills (Python, SQL, and Apache Spark).
    Set up an environment to learn Data Engineering Essentials such as SQL (using Postgres), Python, etc.
    Set up the required tables in Postgres to practice SQL
    Writing basic SQL Queries with practical examples using WHERE, JOIN, GROUP BY, HAVING, ORDER BY, etc
    Advanced SQL Queries with practical examples such as cumulative aggregations, ranking, etc
    Scenarios covering troubleshooting and debugging related to Databases.
    Performance Tuning of SQL Queries
    Exercises and Solutions for SQL Queries.
    Basics of Programming using Python
    Python Collections for Data Engineering
    Data Processing or Data Engineering using Pandas
    Two real-world Python Projects with explanations (File Format Converter and Database Loader)
    Scenarios covering troubleshooting and debugging in Python Applications
    Performance Tuning Scenarios related to Data Engineering Applications using Python
    Getting Started with Google Cloud Platform to set up a Spark Environment using Databricks
    Writing Basic Spark SQL Queries with practical examples using WHERE, JOIN, GROUP BY, HAVING, ORDER BY, etc
    Creating Delta Tables in Spark SQL along with CRUD Operations such as INSERT, UPDATE, DELETE, MERGE, etc
    Advanced Spark SQL Queries with practical examples such as ranking
    Integration of Spark SQL and PySpark
    In-depth coverage of Apache Spark Catalyst Optimizer for Performance Tuning
    Reading Explain Plans of Spark SQL Queries or PySpark DataFrame APIs
    In-depth coverage of columnar file formats and Performance tuning using Partitioning
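    The cumulative aggregations and ranking listed above are typically written with SQL window functions. The sketch below runs such a query through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) as a stand-in for Postgres or Spark SQL, both of which accept the same SUM(...) OVER and RANK() OVER syntax. The daily_revenue table and its rows are hypothetical.

```python
import sqlite3

# Running total by date plus a rank by revenue (highest revenue gets rank 1).
QUERY = """
SELECT
    order_date,
    revenue,
    SUM(revenue) OVER (ORDER BY order_date) AS cumulative_revenue,
    RANK() OVER (ORDER BY revenue DESC) AS revenue_rank
FROM daily_revenue
ORDER BY order_date
"""


def cumulative_and_rank(rows):
    """Load (order_date, revenue) rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_revenue (order_date TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO daily_revenue VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("2023-07-01", 100), ("2023-07-02", 300), ("2023-07-03", 200)]
    for row in cumulative_and_rank(sample):
        print(row)
```

    With the sample rows, the running total grows 100, 400, 600 while the ranks follow revenue rather than date, which is the distinction between a cumulative window and a ranking window.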
    Who this course is for:
    Computer Science or IT Students, or other graduates with a passion for getting into IT
    Data Warehouse Developers who want to transition to Data Engineering roles
    ETL Developers who want to transition to Data Engineering roles
    Database or PL/SQL Developers who want to transition to Data Engineering roles
    BI Developers who want to transition to Data Engineering roles
    QA Engineers who want to learn about Data Engineering
    Application Developers who want to gain Data Engineering Skills
