Apache Spark - Pyspark
Published 6/2023
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
Language: English | Size: 10.70 GB | Duration: 19h 59m
PySpark
What you'll learn
Learners will understand the Apache Spark Foundation and Spark Architecture
How Apache Spark can be used in Data Engineering and Data Processing
Working with different Data Sources and types of Datasets
Working with Data Frames and PySpark
Use Python and Spark together to analyze Big Data
Learners will understand PySpark RDDs
PySpark DataFrame actions and transformations
Using different file formats such as Parquet, JSON, and CSV to build Data Engineering pipelines
Requirements
Basic knowledge of Python and SQL is necessary
Having a reliable internet connection and a strong desire to learn are essential prerequisites.
Description
Learn Apache Spark, a leading Big Data technology, and how to use it with Python, one of the most popular programming languages. This comprehensive course covers everything from the basics to advanced levels of data analysis.
Apache Spark is a highly sought-after technology in the Big Data analytics industry, with top companies like Google, Facebook, Netflix, Airbnb, Amazon, and NASA using it to solve their data challenges. Its performance, which can be up to 100 times faster than Hadoop MapReduce for some in-memory workloads, has led to a surge in demand for professionals skilled in Spark. By mastering Spark and its DataFrame framework, which is relatively new and in high demand, you'll position yourself as a highly knowledgeable candidate in the job market.
Throughout the course, you'll work with PySpark for data analysis, exploring Spark RDDs, DataFrames, and the various transformations and actions you can perform on data using them.
In addition, the course covers essential topics such as Spark architecture, the Data Sources API, and the DataFrame API. You'll learn how to efficiently ingest CSV files, as well as simple and complex JSON files, into the data lake as Parquet files or tables.
The course also delves into important PySpark transformations, including filtering, joining, simple aggregations, and groupBy operations. These transformations enable you to manipulate and analyze data effectively within PySpark.
Furthermore, you'll gain expertise in creating local and temporary views, allowing you to organize and work with data more efficiently in PySpark.
With coverage ranging from Spark architecture to transformations and view creation, this course equips you with the skills needed to become a proficient PySpark Developer. Its more than 150 concise tutorial videos build a thorough understanding of the concepts and methodologies of PySpark.
Whether you're aiming to become a PySpark Developer or enhance your Big Data skills, this course is a must-have.
Overview
Section 1: THE FUNDAMENTALS
Lecture 1 Data VS Information
Lecture 2 Data Storage and Processing
Lecture 3 Data Sources
Lecture 4 Big Data Introduction
Section 2: THE FOUNDATIONS OF BIG DATA
Lecture 5 Emergence of Big Data
Lecture 6 Basic Terminologies
Lecture 7 Central theme of Big Data
Lecture 8 Requirements of Programming Model
Lecture 9 Understand Distributed Processing through a Story
Section 3: ENVIRONMENT AND INSTALLATION
Lecture 10 Oracle_VirtualMachine_Installation
Lecture 11 How to install Ubuntu operating system on Virtual Box
Lecture 12 How to install PySpark on Ubuntu with Java and Python_3
Lecture 13 How to configure Pyspark with Pycharm_with_Installation
Lecture 14 Google Cloud Platform Setup
Section 4: HADOOP ECOSYSTEM
Lecture 15 Introduction to Hadoop Ecosystem
Section 5: PYTHON FOR PYSPARK
Lecture 16 INTRODUCTION TO PROGRAMMING
Lecture 17 Introduction to Python
Lecture 18 Environment for Python
Lecture 19 Executing Python Code
Lecture 20 Syntax, Indentation and Comments
Lecture 21 Syntax, Indentation and Comments - Practical
Lecture 22 Variables
Lecture 23 Variable Practicals
Lecture 24 Python Datatypes
Lecture 25 Python Datatypes Practicals
Lecture 26 Python Operator Concepts
Lecture 27 Python Operator Practicals
Lecture 28 Control Flows in Python
Lecture 29 Control Flows - IF ELSE Concepts
Lecture 30 If Else Practical
Lecture 31 Loops Theory
Lecture 32 Loops Practical
Lecture 33 Python Function Concepts
Lecture 34 Python Function Hands-on
Section 6: APACHE SPARK
Lecture 35 Why Spark?
Lecture 36 Advantages of Spark
Lecture 37 What is Spark?
Lecture 38 Components of Spark
Lecture 39 History of Spark
Section 7: OVERVIEW OF SPARK
Lecture 40 Architecture of Spark
Lecture 41 Spark Session
Lecture 42 Spark Session Terminal & Jupyter notebook Hands-On
Lecture 43 Spark Language API
Lecture 44 Dataframes and Partitions
Lecture 45 Spark Transformations
Lecture 46 Spark Actions
Section 8: STRUCTURED API OVERVIEW
Lecture 47 Structured APIs - Dataframes and Datasets
Lecture 48 Schema Definition
Lecture 49 Spark Types
Lecture 50 Structured API Execution
Section 9: OPERATIONS ON DATAFRAMES
Lecture 51 Dataframe Columns
Lecture 52 Columns as Expression
Lecture 53 Dataframe Rows
Lecture 54 Ways of Creating Dataframe
Lecture 55 Methods to Manipulate Columns
Lecture 56 DataFrame Transformations
Lecture 57 Dataframe Transformation - Columns
Lecture 58 Dataframe Transformations - Rows Part1
Lecture 59 Dataframe Transformation - Rows Part2
Section 10: WORKING WITH DIFFERENT TYPES OF DATA
Lecture 60 Introduction to working with Different Types of Data
Lecture 61 Working with Booleans
Lecture 62 Working with Strings
Lecture 63 Working with Strings Practical1
Lecture 64 Working with Strings Practical2
Lecture 65 Working with Date and Time Stamps
Lecture 66 Working with Null Concepts
Lecture 67 Working with Nulls Practicals
Lecture 68 Working with Complex Types
Lecture 69 Working with Complex types practical
Lecture 70 User Defined Functions - Concepts
Lecture 71 Working with Complex types practical
Section 11: CREATING DATAFRAMES FROM DIFFERENT SOURCES
Lecture 72 Data Sources Introduction
Lecture 73 Read-API- Data Sources
Lecture 74 Read-API-Practical
Lecture 75 Write-API-Data Sources
Lecture 76 Write-API-Practical
Lecture 77 Reading from CSV Files
Lecture 78 Writing into CSV Files
Lecture 79 Reading from JSON Files and Writing into JSON
Lecture 80 Reading from Parquet and writing into Parquet
Lecture 81 Reading from ORC and writing into ORC
Lecture 82 Unstructured Data - Text File - Reading and Writing
Lecture 83 Introduction to reading data from structured sources
Lecture 84 Reading data from structured sources - Database - Concepts
Lecture 85 Reading data from structured sources - Database - Practicals
Lecture 86 Query Pushdown Concepts
Lecture 87 Query Pushdown Practicals
Lecture 88 Writing into structured sources - Database - Concepts
Lecture 89 Writing into structured sources - Database - Practicals
Section 12: AGGREGATIONS
Lecture 90 Introduction to Aggregations
Lecture 91 Aggregation Concepts - Count
Lecture 92 Aggregation_Practical-1-Count
Lecture 93 Aggregation Concepts - First, Sum and Average
Lecture 94 Aggregation - Practical 2 - First Last Average
Lecture 95 Aggregation-Practical-3-StatisticalFunctions
Lecture 96 Aggregation Concepts - Grouping
Lecture 97 Aggregation-Practical-4-GroupBy
Lecture 98 Aggregation Concepts - Window Functions
Lecture 99 Aggregation-Practical-5-WindowFunctions
Lecture 100 Aggregation Concepts - RollUp and Cube
Lecture 101 Aggregation-Practical-6-RollupandCube
Section 13: SPARK JOINS
Lecture 102 Spark Joins Theory-1-Introduction
Lecture 103 Spark Joins Theory-2-How Joins Work
Lecture 104 Spark Joins-Theory-3-Inner Joins
Lecture 105 Spark Joins -Practical -1-Innerjoins
Lecture 106 Spark Joins - Theory-4 - Outer Joins
Lecture 107 Spark Joins -Practical - Outer Joins
Lecture 108 Spark Joins -Theory - 5-Left Semi & Anti Joins
Lecture 109 Spark Joins - Practical - Left Semi & Anti Joins
Lecture 110 Spark Joins -Theory -6-CrossJoin
Lecture 111 Spark Joins - Practical- Cross Joins
Lecture 112 Spark Joins -Theory -7-Challenges In Joins
Lecture 113 Spark Joins-5-Practical-Tackling the Challenges in Joins
Lecture 114 Spark Joins -Theory -8-Communication Strategies
Section 14: RESILIENT DISTRIBUTED DATASETS- RDDs
Lecture 115 What is an RDD ?
Lecture 116 Introduction to Low Level APIs
Lecture 117 Properties Of RDD
Lecture 118 When to use RDDs
Lecture 119 Creating RDDs
Lecture 120 RDD Practical-1-Creating RDDs
Lecture 121 RDD Lineage
Lecture 122 RDD Transformations
Lecture 123 RDD - Transformations Practical
Lecture 124 RDD Actions
Lecture 125 RDD Actions - Practical
Lecture 126 RDD Saving To File
Lecture 127 RDD Saving to a File - Practical
Section 15: DISTRIBUTED VARIABLES
Lecture 128 Distributed Variables - Introduction
Lecture 129 Broadcast Variables
Lecture 130 Broadcast Variables - Practical
Lecture 131 Accumulators
Lecture 132 Accumulators - Practical
Section 16: HOW SPARK WORKS ON A CLUSTER
Lecture 133 Introduction
Lecture 134 How Spark runs on a Cluster - Cluster Manager
Lecture 135 How Spark runs on a Cluster - Execution Modes
Lecture 136 Life Cycle of a Spark Application - Outside Spark
Lecture 137 Life Cycle of a Spark Application - Inside Spark
Who this course is for
Computer Science or IT students, or other graduates with a passion to get into IT
Data Warehouse Developers or Testers who want to transition to Data Engineering roles
Anyone who is very familiar with another programming language and needs to learn Spark
Data Engineers
Data Scientists
Data Analysts
Database Developers