Mastering Apache PySpark
Published 4/2023
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
Language: English | Size: 17.38 GB | Duration: 38h 10m
What you'll learn
1. PySpark Foundation
2. Python for Data Engineering
3. PySpark Core Programming – RDD Programming
4. SQL for Data Engineering
5. PySpark SQL Programming
6. AWS Foundation
7. Linux Essentials
8. PySpark Cluster Setup (AWS, Java, Scala, Python, MySQL, Apache Hadoop, Apache Hive, Apache Kafka, Apache Cassandra, Apache Spark, etc.)
9. PySpark Integrations
Requirements
1. Python for Data Engineering
2. SQL for Data Engineering
3. Linux Essentials
4. Any Cloud Foundation
Note: All of the prerequisites above are covered in this Mastering Apache PySpark course.
Description
About Data Engineering
Data Engineering is the processing of data to meet downstream needs. It involves building different kinds of pipelines, such as batch pipelines and streaming pipelines. All roles related to data processing are consolidated under Data Engineering; conventionally, they are known as ETL Development, Data Warehouse Development, etc. Apache Spark has evolved into a leading technology for Data Engineering at scale.
I have prepared this course for anyone who would like to transition into a Data Engineer role using PySpark (Python + Spark). I am a Data Engineering Solution Architect with proven experience in designing solutions using Apache PySpark.
Let us go through the details of what you will be learning in this course. Keep in mind that the course is built around a lot of hands-on tasks, which will give you enough practice using the right tools. There are also many tasks and exercises to evaluate yourself. We will provide details about the resources and environments for learning PySpark 3 using Python 3.
Mastering Apache PySpark Developer/Programmer
1. PySpark Foundation
2. Python for Data Engineering
3. PySpark Core Programming - RDD Programming
4. SQL for Data Engineering
5. PySpark SQL Programming
6. AWS Foundation
7. Linux Essentials
8. PySpark Cluster Setup (AWS, Java, Scala, Python, MySQL, Apache Hadoop, Apache Hive, Apache Kafka, Apache Cassandra, Apache Spark, etc.)
9. PySpark Integrations
· PySpark Integration with Apache Hadoop
· PySpark Integration with Apache Hive
· PySpark Integration with any cloud filesystem, like AWS S3
· PySpark Integration with any RDBMS, like MySQL, Oracle, PostgreSQL, etc.
· PySpark Integration with any NoSQL database, like Apache Cassandra, MongoDB, etc.
· PySpark Integration with any streaming framework, like Apache Kafka, etc.
· Any VCS (Version Control System), like Git, GitHub, GitLab, Bitbucket, etc.
Who is the Target Audience?
· Any IT aspirant or professional willing to learn Data Engineering or become a Data Engineer using Apache Spark
· Python developers who want to learn Spark to add a key skill for becoming a Data Engineer
· Scala-based Data Engineers who would like to learn Spark using Python as the programming language
· Freshers or experienced professionals who want to become Data Engineers
· Programmers (Java, Scala, .NET, Python, etc.) willing to learn Data Engineering using Apache PySpark
· Database developers and DBAs willing to learn Data Engineering using Apache PySpark
· Data warehouse and reporting professionals willing to learn Data Engineering using Apache PySpark
· Non-programmers, such as test engineers, willing to learn Data Engineering using Apache PySpark
By Akkem Sreenivasulu, Founder of CFAMILY IT
Overview
Section 1: Apache PySpark Programming Foundation
Lecture 1 PySpark Foundation Course Introduction
Lecture 2 Introduction to PySpark - How to Become Master in PySpark Developer
Lecture 3 Apache PySpark vs Apache Hadoop - Difference between Apache PySpark & Hadoop
Section 2: Python for Data Engineering
Lecture 4 Python for Data Engineering Introduction & Course Content
Lecture 5 Python for Data Engineering Part 1 - Python Programming Introduction
Lecture 6 Python for Data Engineering Part 2 - What are different ways to write Python
Lecture 7 Python for Data Engineering Part 3 - Python Installation on Windows Operating
Lecture 8 Python for Data Engineering Part 4 - Anaconda Python Installation
Lecture 9 Python for Data Engineering Part 5 - Python Editors & IDE Software
Lecture 10 Python for Data Engineering Part 6 - Python Indentation rules & Examples
Lecture 11 Python for Data Engineering Part 7 - Language Fundamentals Part 1 – Comments,
Lecture 12 Python for Data Engineering Part 8 - Language Fundamentals Part 2
Lecture 13 Python for Data Engineering Part 9 - Language Fundamentals Part 3
Lecture 14 Python for Data Engineering Part 10 - Language Fundamentals Part 4
Lecture 15 Python for Data Engineering Part 11 - Language Fundamentals Part 5
Lecture 16 Python for Data Engineering Part 12 - Python Flow Control Part 1
Lecture 17 Python for Data Engineering Part 13 - Python Flow Control Part 2
Lecture 18 Python for Data Engineering Part 14 - Python Flow Control Part 3
Lecture 19 Python for Data Engineering Part 15 - Python Flow Control Part 4
Lecture 20 Python for Data Engineering Part 16 - Python Modules, Packages & Libraries
Lecture 21 Python for Data Engineering Part 17 - Python Functions & Lambda Functions
Lecture 22 Python for Data Engineering Part 18 - Python Object Oriented Programming
Lecture 23 Python for Data Engineering in One Day
Section 3: Apache PySpark Core Programming - RDD Programming
Lecture 24 Apache PySpark Core Programming Introduction
Lecture 25 First PySpark Core Program in Script Mode using SparkContext and SparkSession
Lecture 26 First PySpark Core Program in Interactive Mode using SparkContext & SparkSession
Lecture 27 What is an RDD in PySpark Core Programming
Lecture 28 What is RDD Programming or What are RDD Operations
Lecture 29 How to create an RDD or Different ways to create an RDD
Lecture 30 What is map Transformation and write a PySpark example for map Transformation
Lecture 31 What is flatMap Transformation and write a PySpark example for flatMap
Lecture 32 What is filter Transformation and write a PySpark example for filter
Lecture 33 What is reduceByKey Transformation and write a PySpark Wordcount example
Lecture 34 Spark Web UI Introduction
Lecture 35 What is Application, Job, Stage and Task in Spark Programming
Lecture 36 How to set Configuration to PySpark Application
Lecture 37 Partitions in PySpark - Introduction, How Partitions are Created
Lecture 38 How to Increase or Decrease Partitions in PySpark - repartition and coalesce
Lecture 39 Persistence in PySpark - Introduction, When do we Persist an RDD in PySpark
Lecture 40 How to Persist an RDD in PySpark - What are StorageLevel Class and Different
Section 4: SQL for Data Engineering
Lecture 41 SQL for Data Engineering Introduction
Lecture 42 SQL Introduction - Structured Query Language Introduction
Lecture 43 SQL for Data Engineering Lab Setup - MySQL Server and MySQL Workbench Install
Lecture 44 Databases in SQL - What is Database, How to Create or delete or list database
Lecture 45 SQL Datatypes
Lecture 46 Tables in SQL - What is Table, How to Create or Describe or List table
Lecture 47 Inserting Data into Table in SQL
Lecture 48 Alter Table Definition in SQL
Lecture 49 Delete, Truncate and Drop Tables in SQL
Lecture 50 Update Table Data in SQL
Lecture 51 SQL Select Examples Lab1
Lecture 52 SQL Select Examples Lab2
Section 5: Apache PySpark SQL
Lecture 53 PySpark SQL Introduction - What is PySpark SQL Programming & Advantages
Lecture 54 PySpark DataFrame Introduction - What is DataFrame - How to Create DataFrame
Lecture 55 Create a DataFrame from CSV file using format function in PySpark
Lecture 56 Create a DataFrame from CSV file using csv function in PySpark SQL
Lecture 57 Custom Schema in PySpark SQL with Examples - Different ways of Creating Custom Schema
Lecture 58 Create a DataFrame from JSON file using json function in PySpark SQL
Lecture 59 What are DSL Queries in PySpark and How to write a DSL Query in PySpark
Lecture 60 Write a DSL Query to display DataFrame Data in different ways in PySpark SQL
Lecture 61 Write DSL Query to Filter the Data of DataFrame in PySpark SQL Programming
Lecture 62 Write a DSL Query to Add new Columns to DataFrame in PySpark SQL Programming
Lecture 63 Write a DSL Query to replace value of a column of a DataFrame in PySpark SQL
Lecture 64 Type Casting in PySpark SQL Programming
Lecture 65 Rename Column Names of DataFrame in PySpark SQL Programming
Lecture 66 Drop Columns from DataFrame in PySpark SQL Programming
Lecture 67 Native SQL Introduction in PySpark SQL Programming
Lecture 68 Temporary Table in PySpark SQL Programming
Lecture 69 Permanent Table in PySpark SQL Programming
Section 6: Apache PySpark Streaming
Lecture 70 What is PySpark Streaming Programming
Lecture 71 Batch Processing vs Streaming Processing
Lecture 72 Important Point for PySpark Streaming Programming
Lecture 73 How to create Streaming Object in PySpark Streaming Programming
Lecture 74 Write a PySpark Application to read data from Network Ports or Sockets
Section 7: AWS Fundamentals for Data Engineering
Lecture 75 AWS Foundation Course Content
Lecture 76 Cloud Computing Intro and What is Cloud Computing, Benefits of Cloud Compute
Lecture 77 What is AWS - Amazon Web Services
Lecture 78 AWS Account Creation
Lecture 79 What is AWS Free Tier, and How do I use it
Lecture 80 AWS Services & Categories Introduction
Lecture 81 AWS Global Infrastructure
Lecture 82 What are AWS Clients - AWS Management Console, AWS Application Clients & AWS CLI
Lecture 83 AWS Certifications - AWS Foundational, Associate, Professional & Specialization
Section 8: Data Engineering Machine Setup on AWS
Lecture 84 PySpark Setup on AWS or PySpark Installation on AWS EC2 or PySpark Installation
Section 9: Linux Essentials for Data Engineering
Lecture 85 Linux Essentials Syllabus
Lecture 86 What is UNIX and UNIX History
Lecture 87 What is Linux
Lecture 88 What are Linux Features
Lecture 89 Components of UNIX or Linux Operating System - Linux Shell and Kernel
Lecture 90 Linux Filesystem - Introduction, Types and Hierarchy
Lecture 91 Linux Lab Setup - Virtualization Software Installation
Lecture 92 Linux Lab Setup - Linux Installation - Ubuntu Desktop Installation
Lecture 93 Linux Lab Setup - SSH Clients - Putty & MobaXterm
Lecture 94 Linux Basic Commands Part 1
Lecture 95 Linux Basic Commands Part 2
Lecture 96 Linux Basic Commands Part 3 - Directories in Linux
Lecture 97 Linux Basic Commands Part 4 - Files in Linux
Section 10: Apache PySpark Integrations
Lecture 98 RDBMS Integration
Lecture 99 PySpark Integration with MySQL
Lecture 100 PySpark Integration with Hadoop
Lecture 101 PySpark Integration with Apache Hive
Lecture 102 Apache Hive Integration with PySpark Example 1
Lecture 103 Apache Hive Integration with PySpark Example 2