Apache Spark - Pyspark

Posted By: ELK1nG

Published 6/2023
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
Language: English | Size: 10.70 GB | Duration: 19h 59m

PySpark

What you'll learn

Learners will understand the Apache Spark Foundation and Spark Architecture

How Apache Spark can be used in Data Engineering and Data Processing

Working with different Data Sources and types of Datasets

Working with Data Frames and PySpark

Use Python and Spark together to analyze Big Data

Learners will understand PySpark RDDs

PySpark DataFrame Actions and Transformations

Use of different file formats such as Parquet, JSON, and CSV in building Data Engineering Pipelines

Requirements

Basic knowledge of Python and SQL is necessary

Having a reliable internet connection and a strong desire to learn are essential prerequisites.

Description

Learn the latest Big Data technology, Apache Spark, and its integration with Python, one of the most popular programming languages. This comprehensive course covers everything from the basics to advanced levels of data analysis.

Apache Spark is a highly sought-after technology in the Big Data analytics industry, with top companies like Google, Facebook, Netflix, Airbnb, Amazon, and NASA utilizing it to solve their data challenges. Its performance, up to 100 times faster than Hadoop MapReduce for some in-memory workloads, has led to a surge in demand for professionals skilled in Spark. By mastering Spark and its DataFrame framework, which is relatively new and in high demand, you'll position yourself as a highly knowledgeable candidate in the job market.

Throughout the course, you'll work with PySpark for data analysis, exploring Spark RDDs, DataFrames, and the various transformations and actions you can perform on data using them.

In addition, the course covers essential topics such as Spark architecture, the Data Sources API, and the DataFrame API. You'll learn how to efficiently ingest CSV files, as well as simple and complex JSON files, into the data lake as Parquet files or tables.

The course also delves into important PySpark transformations, including filtering, joining, simple aggregations, and groupBy operations. These transformations enable you to manipulate and analyze data effectively within PySpark.

Furthermore, you'll gain expertise in creating local and temporary views, allowing you to organize and work with data more efficiently in PySpark.

With comprehensive coverage of topics ranging from Spark architecture to transformations and view creation, this course equips you with the necessary skills to become a proficient PySpark Developer. Its more than 150 concise tutorial videos provide a thorough understanding of the concepts and methodologies of PySpark.
Whether you're aiming to become a PySpark Developer or enhance your Big Data skills, this course is a must-have.

Overview

Section 1: THE FUNDAMENTALS

Lecture 1 Data VS Information

Lecture 2 Data Storage and Processing

Lecture 3 Data Sources

Lecture 4 Big Data Introduction

Section 2: THE FOUNDATIONS OF BIG DATA

Lecture 5 Emergence of Big Data

Lecture 6 Basic Terminologies

Lecture 7 Central theme of Big Data

Lecture 8 Requirements of Programming Model

Lecture 9 Understand Distributed Processing through a Story

Section 3: ENVIRONMENT AND INSTALLATION

Lecture 10 Oracle Virtual Machine Installation

Lecture 11 How to install the Ubuntu operating system on VirtualBox

Lecture 12 How to install PySpark on Ubuntu with Java and Python 3

Lecture 13 How to configure PySpark with PyCharm, with Installation

Lecture 14 Google Cloud Platform Setup

Section 4: HADOOP ECOSYSTEM

Lecture 15 Introduction to Hadoop Ecosystem

Section 5: PYTHON FOR PYSPARK

Lecture 16 Introduction to Programming

Lecture 17 Introduction to Python

Lecture 18 Environment for Python

Lecture 19 Executing Python Code

Lecture 20 Syntax, Indentation and Comments

Lecture 21 Syntax, Indentation and Comments - Practical

Lecture 22 Variables

Lecture 23 Variables Practicals

Lecture 24 Python Datatypes

Lecture 25 Python Datatypes Practicals

Lecture 26 Python Operator Concepts

Lecture 27 Python Operator Practicals

Lecture 28 Control Flows in Python

Lecture 29 Control Flows - IF ELSE Concepts

Lecture 30 If Else Practical

Lecture 31 Loops Theory

Lecture 32 Loops Practical

Lecture 33 Python Function Concepts

Lecture 34 Python Function Hands-on

Section 6: APACHE SPARK

Lecture 35 Why Spark?

Lecture 36 Advantages of Spark

Lecture 37 What is Spark?

Lecture 38 Components of Spark

Lecture 39 History of Spark

Section 7: OVERVIEW OF SPARK

Lecture 40 Architecture of Spark

Lecture 41 Spark Session

Lecture 42 Spark Session Terminal & Jupyter notebook Hands-On

Lecture 43 Spark Language API

Lecture 44 Dataframes and Partitions

Lecture 45 Spark Transformations

Lecture 46 Spark Actions

Section 8: STRUCTURED API OVERVIEW

Lecture 47 Structured APIs - Dataframes and Datasets

Lecture 48 Schema Definition

Lecture 49 Spark Types

Lecture 50 Structured API Execution

Section 9: OPERATIONS ON DATAFRAMES

Lecture 51 Dataframe Columns

Lecture 52 Columns as Expression

Lecture 53 Dataframe Rows

Lecture 54 Ways of Creating Dataframe

Lecture 55 Methods to Manipulate Columns

Lecture 56 DataFrame Transformations

Lecture 57 Dataframe Transformation - Columns

Lecture 58 Dataframe Transformations - Rows Part1

Lecture 59 Dataframe Transformation - Rows Part2

Section 10: WORKING WITH DIFFERENT TYPES OF DATA

Lecture 60 Introduction to working with Different Types of Data

Lecture 61 Working with Booleans

Lecture 62 Working with Strings

Lecture 63 Working with Strings Practical1

Lecture 64 Working with Strings Practical2

Lecture 65 Working with Date and Time Stamps

Lecture 66 Working with Null Concepts

Lecture 67 Working with Nulls Practicals

Lecture 68 Working with Complex Types

Lecture 69 Working with Complex types practical

Lecture 70 User Defined Functions - Concepts

Lecture 71 Working with Complex types practical

Section 11: CREATING DATAFRAMES FROM DIFFERENT SOURCES

Lecture 72 Data Sources Introduction

Lecture 73 Read-API- Data Sources

Lecture 74 Read-API-Practical

Lecture 75 Write-API-Data Sources

Lecture 76 Write-API-Practical

Lecture 77 Reading from CSV Files

Lecture 78 Writing into CSV Files

Lecture 79 Reading from JSON Files and Writing into JSON

Lecture 80 Reading from Parquet and writing into Parquet

Lecture 81 Reading from ORC and writing into ORC

Lecture 82 Unstructured Data - Text File - Reading and Writing

Lecture 83 Introduction to reading data from structured sources

Lecture 84 Reading data from structured sources - Database - Concepts

Lecture 85 Reading data from structured sources - Database - Practicals

Lecture 86 Query Pushdown Concepts

Lecture 87 Query Pushdown Practicals

Lecture 88 Writing into structured sources - Database - Concepts

Lecture 89 Writing into structured sources - Database - Practicals

Section 12: AGGREGATIONS

Lecture 90 Introduction to Aggregations

Lecture 91 Aggregation Concepts - Count

Lecture 92 Aggregation-Practical-1-Count

Lecture 93 Aggregation Concepts - First, Sum and Average

Lecture 94 Aggregation - Practical 2 - First Last Average

Lecture 95 Aggregation-Practical-3-StatisticalFunctions

Lecture 96 Aggregation Concepts - Grouping

Lecture 97 Aggregation-Practical-4-GroupBy

Lecture 98 Aggregation Concepts - Window Functions

Lecture 99 Aggregation-Practical-5-WindowFunctions

Lecture 100 Aggregation Concepts - RollUp and Cube

Lecture 101 Aggregation-Practical-6-RollupandCube

Section 13: SPARK JOINS

Lecture 102 Spark Joins Theory-1-Introduction

Lecture 103 Spark Joins Theory-2-How Joins Work

Lecture 104 Spark Joins-Theory-3-Inner Joins

Lecture 105 Spark Joins -Practical -1-Innerjoins

Lecture 106 Spark Joins - Theory-4 - Outer Joins

Lecture 107 Spark Joins -Practical - Outer Joins

Lecture 108 Spark Joins -Theory - 5-Left Semi & Anti Joins

Lecture 109 Spark Joins - Practical - Left Semi & Anti Joins

Lecture 110 Spark Joins -Theory -6-CrossJoin

Lecture 111 Spark Joins - Practical- Cross Joins

Lecture 112 Spark Joins -Theory -7-Challenges In Joins

Lecture 113 Spark Joins-5-Practical-Tackling the Challenges in Joins

Lecture 114 Spark Joins -Theory -8-Communication Strategies

Section 14: RESILIENT DISTRIBUTED DATASETS- RDDs

Lecture 115 What is an RDD ?

Lecture 116 Introduction to Low Level APIs

Lecture 117 Properties Of RDD

Lecture 118 When to use RDDs

Lecture 119 Creating RDDs

Lecture 120 RDD Practical-1-Creating RDDs

Lecture 121 RDD Lineage

Lecture 122 RDD Transformations

Lecture 123 RDD - Transformations Practical

Lecture 124 RDD Actions

Lecture 125 RDD Actions - Practical

Lecture 126 RDD Saving to a File

Lecture 127 RDD Saving to a File - Practical

Section 15: DISTRIBUTED VARIABLES

Lecture 128 Distributed Variables - Introduction

Lecture 129 Broadcast Variables

Lecture 130 Broadcast Variables - Practical

Lecture 131 Accumulators

Lecture 132 Accumulators - Practical

Section 16: HOW SPARK WORKS ON A CLUSTER

Lecture 133 Introduction

Lecture 134 How Spark runs on a Cluster - Cluster Manager

Lecture 135 How Spark runs on a Cluster - Execution Modes

Lecture 136 Life Cycle of a Spark Application - Outside Spark

Lecture 137 Life Cycle of a Spark Application - Inside Spark

Who this course is for

Computer Science or IT students, or other graduates with a passion to get into IT

Data Warehouse Developers or Testers who want to transition to Data Engineering roles

Anyone who is very familiar with another programming language and needs to learn Spark

Data Engineers, Data Scientists, Data Analysts, and Database Developers