Mastering Apache PySpark

Published 4/2023
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 kHz
Language: English | Size: 17.38 GB | Duration: 38h 10m

What you'll learn

1. PySpark Foundation

2. Python for Data Engineering

3. PySpark Core Programming – RDD Programming

4. SQL for Data Engineering

5. PySpark SQL Programming

6. AWS Foundation

7. Linux Essentials

8. PySpark Cluster Setup (AWS, Java, Scala, Python, MySQL, Apache Hadoop, Apache Hive, Apache Kafka, Apache Cassandra, Apache Spark, etc.)

9. PySpark Integrations

Requirements

1. Python for Data Engineering

2. SQL for Data Engineering

3. Linux Essentials

4. Any Cloud Foundation

Note: All of the above prerequisites are covered in this Mastering Apache PySpark course.

Description

About Data Engineering

Data Engineering is the processing of data according to downstream needs. As part of Data Engineering, we build different pipelines, such as batch pipelines and streaming pipelines. All roles related to data processing are consolidated under Data Engineering; conventionally, they are known as ETL Development, Data Warehouse Development, etc. Apache Spark has evolved into a leading technology for Data Engineering at scale.

I have prepared this course for anyone who would like to transition into a Data Engineer role using PySpark (Python + Spark). I am a Data Engineering Solution Architect with proven experience in designing solutions using Apache PySpark.

Let us go through the details of what you will learn in this course. The course is built around hands-on tasks that give you practice with the right tools, and there are many tasks and exercises to evaluate yourself. We will provide details about the resources and environments for learning PySpark 3 using Python 3.

Mastering Apache PySpark Developer/Programmer

· PySpark Foundation
· Python for Data Engineering
· PySpark Core Programming – RDD Programming
· SQL for Data Engineering
· PySpark SQL Programming
· AWS Foundation
· Linux Essentials
· PySpark Cluster Setup (AWS, Java, Scala, Python, MySQL, Apache Hadoop, Apache Hive, Apache Kafka, Apache Cassandra, Apache Spark, etc.)
· PySpark Integrations
  · PySpark Integration with Apache Hadoop
  · PySpark Integration with Apache Hive
  · PySpark Integration with any cloud filesystem, such as AWS S3
  · PySpark Integration with any RDBMS, such as MySQL, Oracle, PostgreSQL, etc.
  · PySpark Integration with any NoSQL database, such as Apache Cassandra, MongoDB, etc.
  · PySpark Integration with any streaming framework, such as Apache Kafka, etc.
  · Etc.
· Any VCS (Version Control System), such as Git, GitHub, GitLab, Bitbucket, etc.

Who is the target audience?

· Any IT aspirant or professional who wants to become a Data Engineer using Apache Spark
· Python developers who want to learn Spark to add a key skill for becoming a Data Engineer
· Scala-based Data Engineers who would like to learn Spark using Python as the programming language
· Freshers or experienced professionals who want to become Data Engineers
· Programmers in Java, Scala, .NET, Python, etc. who want to become Data Engineers using Apache PySpark
· Database Developers and DBAs who want to become Data Engineers using Apache PySpark
· Data Warehouse and Reporting professionals who want to become Data Engineers using Apache PySpark
· Non-programmers, such as Test Engineers, who want to become Data Engineers using Apache PySpark

By Akkem Sreenivasulu, Founder of CFAMILY IT

Overview

Section 1: Apache PySpark Programming Foundation

Lecture 1 PySpark Foundation Course Introduction

Lecture 2 Introduction to PySpark - How to Become a Master PySpark Developer

Lecture 3 Apache PySpark vs Apache Hadoop - Difference between Apache PySpark & Hadoop

Section 2: Python for Data Engineering

Lecture 4 Python for Data Engineering Introduction & Course Content

Lecture 5 Python for Data Engineering Part 1 - Python Programming Introduction

Lecture 6 Python for Data Engineering Part 2 - What are different ways to write Python

Lecture 7 Python for Data Engineering Part 3 - Python Installation on Windows Operating System

Lecture 8 Python for Data Engineering Part 4 - Anaconda Python Installation

Lecture 9 Python for Data Engineering Part 5 - Python Editors & IDE Software

Lecture 10 Python for Data Engineering Part 6 - Python Indentation rules & Examples

Lecture 11 Python for Data Engineering Part 7 - Language Fundamentals Part 1 – Comments,

Lecture 12 Python for Data Engineering Part 8 - Language Fundamentals Part 2

Lecture 13 Python for Data Engineering Part 9 - Language Fundamentals Part 3

Lecture 14 Python for Data Engineering Part 10 - Language Fundamentals Part 4

Lecture 15 Python for Data Engineering Part 11 - Language Fundamentals Part 5

Lecture 16 Python for Data Engineering Part 12 - Python Flow Control Part 1

Lecture 17 Python for Data Engineering Part 13 - Python Flow Control Part 2

Lecture 18 Python for Data Engineering Part 14 - Python Flow Control Part 3

Lecture 19 Python for Data Engineering Part 15 - Python Flow Control Part 4

Lecture 20 Python for Data Engineering Part 16 - Python Modules, Packages & Libraries

Lecture 21 Python for Data Engineering Part 17 - Python Functions & Lambda Functions

Lecture 22 Python for Data Engineering Part 18 - Python Object Oriented Programming

Lecture 23 Python for Data Engineering in One Day
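The Python lectures above cover functions, lambda functions, flow control and object-oriented programming. As a minimal, hypothetical illustration of how these pieces typically combine in data-engineering-style code, here is a short sketch. The record layout and the names `Employee` and `parse_line` are assumptions for illustration, not taken from the course.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    """A simple record type (hypothetical schema for illustration)."""
    name: str
    dept: str
    salary: float


def parse_line(line: str) -> Employee:
    """Parse one comma-separated line into an Employee."""
    name, dept, salary = line.strip().split(",")
    return Employee(name, dept, float(salary))


raw_lines = [
    "alice,eng,120000",
    "bob,sales,80000",
    "carol,eng,110000",
]

# A named function applied with map(), then a lambda used with filter().
employees = list(map(parse_line, raw_lines))
engineers = list(filter(lambda e: e.dept == "eng", employees))

# Flow control: a loop with a conditional accumulates a total.
total_eng_salary = 0.0
for e in engineers:
    if e.salary > 0:
        total_eng_salary += e.salary

print([e.name for e in engineers], total_eng_salary)
```

The pattern of passing plain functions and lambdas to `map` and `filter` carries over directly to PySpark, where RDD transformations also take Python functions as arguments.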

Section 3: Apache PySpark Core Programming - RDD Programming

Lecture 24 Apache PySpark Core Programming Introduction

Lecture 25 First PySpark Core Program in Script Mode using SparkContext and SparkSession

Lecture 26 First PySpark Core Program in Interactive Mode using SparkContext & SparkSession

Lecture 27 What is an RDD in PySpark Core Programming

Lecture 28 What is RDD Programming or What are RDD Operations

Lecture 29 How to Create an RDD or Different Ways to Create an RDD

Lecture 30 What is map Transformation and write a PySpark example for map Transformation

Lecture 31 What is flatMap Transformation and write a PySpark example for flatMap

Lecture 32 What is filter Transformation and write a PySpark example for filter

Lecture 33 What is reduceByKey Transformation and write a PySpark Wordcount example

Lecture 34 Spark Web UI Introduction

Lecture 35 What is Application, Job, Stage and Task in Spark Programming

Lecture 36 How to set Configuration to PySpark Application

Lecture 37 Partitions in PySpark - Introduction, How Partitions are Created

Lecture 38 How to Increase or Decrease Partitions in PySpark - repartition and coalesce

Lecture 39 Persistence in PySpark - Introduction, When do we Persist an RDD in PySpark

Lecture 40 How to Persist an RDD in PySpark - What are StorageLevel Class and Different
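Lectures 30 to 33 build up to the classic word count using flatMap, filter, map and reduceByKey. In PySpark, this pipeline is commonly written as `sc.parallelize(lines).flatMap(lambda l: l.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`. The sketch below reproduces the same dataflow in plain Python, so you can check each step's semantics locally without a Spark installation. The helpers `flat_map` and `reduce_by_key` are hypothetical stand-ins. They do not model partitioning, lazy evaluation or shuffles.

```python
from collections import defaultdict
from functools import reduce


def flat_map(data, fn):
    # Like RDD.flatMap: apply fn to each element and flatten the resulting iterables.
    return [out for item in data for out in fn(item)]


def reduce_by_key(pairs, fn):
    # Like RDD.reduceByKey: group values by key, then fold each group with fn.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(fn, values) for key, values in groups.items()}


lines = ["spark is fast", "pyspark is spark with python", ""]

words = flat_map(lines, lambda line: line.split())      # flatMap: lines -> words
words = list(filter(lambda w: len(w) > 2, words))        # filter: drop short words
pairs = list(map(lambda w: (w, 1), words))               # map: word -> (word, 1)
counts = reduce_by_key(pairs, lambda a, b: a + b)        # reduceByKey: sum per word

print(sorted(counts.items()))
```

In real Spark, `reduceByKey` also combines values within each partition before the shuffle. This is why the reduce function should be associative and commutative, as addition is here.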

Section 4: SQL for Data Engineering

Lecture 41 SQL for Data Engineering Introduction

Lecture 42 SQL Introduction - Structured Query Language Introduction

Lecture 43 SQL for Data Engineering Lab Setup - MySQL Server and MySQL Workbench Installation

Lecture 44 Databases in SQL - What is a Database, How to Create, Delete or List Databases

Lecture 45 SQL Datatypes

Lecture 46 Tables in SQL - What is a Table, How to Create, Describe or List Tables

Lecture 47 Inserting Data into Table in SQL

Lecture 48 Alter Table Definition in SQL

Lecture 49 Delete, Truncate and Drop Tables in SQL

Lecture 50 Update Table Data in SQL

Lecture 51 SQL Select Examples Lab 1

Lecture 52 SQL Select Examples Lab 2
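Lectures 44 to 52 walk through the basic SQL statement lifecycle on MySQL. To try the same sequence without installing anything, you can use Python's built-in `sqlite3` module. SQLite is a swapped-in stand-in for MySQL here, so dialect details differ. For example, SQLite has no `TRUNCATE TABLE`, and `DELETE FROM` without a `WHERE` clause plays that role. The `emp` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (SQLite, not MySQL)
cur = conn.cursor()

# CREATE TABLE with a few common datatypes.
cur.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# INSERT rows using parameter placeholders.
cur.executemany(
    "INSERT INTO emp (id, name, salary) VALUES (?, ?, ?)",
    [(1, "alice", 120000), (2, "bob", 80000), (3, "carol", 110000)],
)

# ALTER TABLE: add a new column (existing rows get NULL).
cur.execute("ALTER TABLE emp ADD COLUMN dept TEXT")

# UPDATE existing data.
cur.execute("UPDATE emp SET dept = 'eng' WHERE name IN ('alice', 'carol')")
cur.execute("UPDATE emp SET dept = 'sales' WHERE name = 'bob'")

# DELETE specific rows (removes bob, whose salary is below the threshold).
cur.execute("DELETE FROM emp WHERE salary < 90000")

# SELECT with grouping, aggregation and ordering.
rows = cur.execute(
    "SELECT dept, COUNT(*), SUM(salary) FROM emp GROUP BY dept ORDER BY dept"
).fetchall()
print(rows)

# DROP TABLE removes both the table definition and its data.
cur.execute("DROP TABLE emp")
conn.close()
```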

Section 5: Apache PySpark SQL

Lecture 53 PySpark SQL Introduction - What is PySpark SQL Programming & Advantages

Lecture 54 PySpark DataFrame Introduction - What is DataFrame - How to Create DataFrame

Lecture 55 Create a DataFrame from CSV file using format function in PySpark

Lecture 56 Create a DataFrame from CSV file using csv function in PySpark SQL

Lecture 57 Custom Schema in PySpark SQL with Examples - Different Ways of Creating a Custom Schema

Lecture 58 Create a DataFrame from JSON file using json function in PySpark SQL

Lecture 59 What are DSL Queries in PySpark and How to Write a DSL Query in PySpark

Lecture 60 Write a DSL Query to display DataFrame Data in different ways in PySpark SQL

Lecture 61 Write a DSL Query to Filter the Data of a DataFrame in PySpark SQL Programming

Lecture 62 Write a DSL Query to Add new Columns to DataFrame in PySpark SQL Programming

Lecture 63 Write a DSL Query to Replace the Value of a Column of a DataFrame in PySpark SQL

Lecture 64 Type Casting in PySpark SQL Programming

Lecture 65 Rename Column Names of DataFrame in PySpark SQL Programming

Lecture 66 Drop Columns from DataFrame in PySpark SQL Programming

Lecture 67 Native SQL Introduction in PySpark SQL Programming

Lecture 68 Temporary Table in PySpark SQL Programming

Lecture 69 Permanent Table in PySpark SQL Programming

Section 6: Apache PySpark Streaming

Lecture 70 What is PySpark Streaming Programming

Lecture 71 Batch Processing vs Streaming Processing

Lecture 72 Important Points for PySpark Streaming Programming

Lecture 73 How to create Streaming Object in PySpark Streaming Programming

Lecture 74 Write a PySpark Application to read data from Network Ports or Sockets

Section 7: AWS Fundamentals for Data Engineering

Lecture 75 AWS Foundation Course Content

Lecture 76 Cloud Computing Intro and What is Cloud Computing, Benefits of Cloud Computing

Lecture 77 What is AWS - Amazon Web Services

Lecture 78 AWS Account Creation

Lecture 79 What is AWS Free Tier, and How do I use it

Lecture 80 AWS Services & Categories Introduction

Lecture 81 AWS Global Infrastructure

Lecture 82 What are AWS Clients - AWS Management Console, AWS Application Clients & AWS CLI

Lecture 83 AWS Certifications - AWS Foundational, Associate, Professional & Specialization

Section 8: Data Engineering Machine Setup on AWS

Lecture 84 PySpark Setup on AWS or PySpark Installation on AWS EC2 or PySpark Installation

Section 9: Linux Essentials for Data Engineering

Lecture 85 Linux Essentials Syllabus

Lecture 86 What is UNIX and UNIX History

Lecture 87 What is Linux

Lecture 88 What are Linux Features

Lecture 89 Components of UNIX or Linux Operating System - Linux Shell and Kernel

Lecture 90 Linux Filesystem - Introduction, Types and Hierarchy

Lecture 91 Linux Lab Setup - Virtualization Software Installation

Lecture 92 Linux Lab Setup - Linux Installation - Ubuntu Desktop Installation

Lecture 93 Linux Lab Setup - SSH Clients - Putty & MobaXterm

Lecture 94 Linux Basic Commands Part 1

Lecture 95 Linux Basic Commands Part 2

Lecture 96 Linux Basic Commands Part 3 - Directories in Linux

Lecture 97 Linux Basic Commands Part 4 - Files in Linux

Section 10: Apache PySpark Integrations

Lecture 98 RDBMS Integration

Lecture 99 PySpark Integration with MySQL

Lecture 100 PySpark Integration with Hadoop

Lecture 101 PySpark Integration with Apache Hive

Lecture 102 Apache Hive Integration with PySpark Example 1

Lecture 103 Apache Hive Integration with PySpark Example 2
