
    Spark Sql And Pyspark 3 Using Python 3 Hands-On With Labs

    Posted By: ELK1nG

    Last updated 8/2022
    MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
    Language: English | Size: 10.03 GB | Duration: 32h 12m

    A Comprehensive Course on Spark SQL as well as Data Frame APIs using Python 3, with complimentary lab access

    What you'll learn
    Setup the Single Node Hadoop and Spark using Docker locally or on AWS Cloud9
    Review ITVersity Labs (exclusively for ITVersity Lab Customers)
    All the HDFS commands that are relevant to validating files and folders in HDFS
    Quick recap of the Python that is relevant to learning Spark
    Ability to use Spark SQL to solve problems using SQL-style syntax
    Pyspark Data Frame APIs to solve problems using Data Frame-style APIs
    Relevance of Spark Metastore to convert Data Frames into Temporary Views so that one can process data in Data Frames using Spark SQL
    Apache Spark Application Development Life Cycle
    Apache Spark Application Execution Life Cycle and Spark UI
    Setup SSH Proxy to access Spark Application logs
    Deployment Modes of Spark Applications (Cluster and Client)
    Passing Application Properties Files and External Dependencies while running Spark Applications
    Requirements
    Basic programming skills using any programming language
    Self-supported lab (instructions provided) or ITVersity lab at additional cost for an appropriate environment
    64-bit operating system; minimum memory depends on the environment you are using
    4 GB RAM with access to a proper cluster, or 16 GB RAM to set up the environment using Docker
    Description
    As part of this course, you will learn all the key skills to build Data Engineering Pipelines using Spark SQL and Spark Data Frame APIs, with Python as the programming language. This course used to be the CCA 175 Spark and Hadoop Developer course for certification exam preparation. As of 10/31/2021 the exam is sunset, and we have renamed the course to Apache Spark 2 and 3 using Python 3, as it covers industry-relevant topics beyond the scope of certification.

    About Data Engineering

    Data Engineering is nothing but processing data depending upon our downstream needs. We need to build different pipelines, such as Batch Pipelines, Streaming Pipelines, etc., as part of Data Engineering. All roles related to data processing are consolidated under Data Engineering; conventionally, they are known as ETL Development, Data Warehouse Development, etc. Apache Spark has evolved as a leading technology to take care of Data Engineering at scale.

    I have prepared this course for anyone who would like to transition into a Data Engineer role using Pyspark (Python + Spark). I am a Data Engineering Solution Architect with proven experience in designing solutions using Apache Spark. Let us go through the details of what you will be learning in this course. Keep in mind that the course is created with a lot of hands-on tasks, which will give you enough practice using the right tools, and there are tons of tasks and exercises to evaluate yourself. We will provide details about resources and environments to learn Spark SQL and PySpark 3 using Python 3, as well as reference material on GitHub for practice.
    Keep in mind that you can use the cluster at your workplace, set up the environment using the provided instructions, or use ITVersity Lab to take this course.

    Setup of Single Node Big Data Cluster

    Many of you would like to transition to Big Data from conventional technologies such as Mainframes, Oracle PL/SQL, etc., and might not have access to Big Data clusters. It is very important for you to set up the environment in the right manner. Don't worry if you do not have a cluster handy; we will guide you through support via Udemy Q&A.

    Setup an Ubuntu-based AWS Cloud9 instance with the right configuration
    Ensure Docker is set up
    Setup Jupyter Lab and other key components
    Setup and validate Hadoop, Hive, YARN, and Spark

    Are you feeling a bit overwhelmed about setting up the environment? Don't worry, we will provide complimentary lab access for up to 2 months. Here are the details: you will get 2 weeks of lab access to begin with; if you like the environment and acknowledge it by providing a 5* rating and feedback, the lab access will be extended by an additional 6 weeks (2 months in total). Feel free to send an email to support@itversity.com to get complimentary lab access. Also, if your employer provides a multi-node environment, we will help you set up the material for practice as part of a live session. On top of Q&A support, we also provide required support via live sessions.

    A quick recap of Python

    This course requires a decent knowledge of Python. To make sure you understand Spark from a Data Engineering perspective, we added a module to quickly warm up with Python. If you are not familiar with Python, we suggest you go through our other course, Data Engineering Essentials - Python, SQL, and Spark.

    Master required Hadoop skills to build Data Engineering applications

    As part of this section, you will primarily focus on HDFS commands so that we can copy files into HDFS.
    The data copied into HDFS will be used to build data engineering pipelines using Spark and Hadoop with Python as the programming language.

    Overview of HDFS commands
    Copy files into HDFS using the put or copyFromLocal command
    Review whether the files were copied properly to HDFS
    Get the size of the files using HDFS commands such as du, df, etc.
    Fundamental concepts related to HDFS, such as block size, replication factor, etc.

    Data Engineering using Spark SQL

    Let us deep-dive into Spark SQL to understand how it can be used to build Data Engineering Pipelines. Spark with SQL provides the ability to leverage the distributed computing capabilities of Spark coupled with easy-to-use, developer-friendly SQL-style syntax.

    Getting started with Spark SQL
    Basic transformations using Spark SQL
    Managing tables - basic DDL and DML in Spark SQL
    Managing tables - DML and creating partitioned tables using Spark SQL
    Overview of Spark SQL functions to manipulate strings, dates, null values, etc.
    Windowing functions using Spark SQL for ranking, advanced aggregations, etc.

    Data Engineering using Spark Data Frame APIs

    Spark Data Frame APIs are an alternative way of building Data Engineering applications at scale, leveraging the distributed computing capabilities of Spark.
    Data Engineers from application development backgrounds might prefer Data Frame APIs over Spark SQL to build Data Engineering applications.

    Data processing overview using Spark or Pyspark Data Frame APIs
    Projecting or selecting data from Spark Data Frames - renaming columns, providing aliases, dropping columns, etc.
    Processing column data using Spark or Pyspark Data Frame APIs - functions to manipulate strings, dates, null values, etc.
    Basic transformations on Spark Data Frames - filtering, aggregations, and sorting using functions such as filter/where, groupBy with agg, sort or orderBy, etc.
    Joining data sets on Spark Data Frames using join - inner joins, outer joins, etc., with the right examples
    Windowing functions on Spark Data Frames to perform advanced aggregations, ranking, and analytic functions
    Spark Metastore databases and tables, and the integration between Spark SQL and Data Frame APIs

    Apache Spark Application Development and Deployment Life Cycle

    Once you go through the content related to Spark using the Jupyter-based environment, we will also walk you through how Spark applications are typically developed using Python, deployed, and reviewed.

    Setup a Python virtual environment and project for Spark application development using Pycharm
    Understand the complete Spark application development lifecycle using Pycharm and Python
    Build a zip file for the Spark application, copy it to the environment where it is supposed to run, and run it
    Understand how to review the Spark application execution lifecycle

    All the demos are given on our state-of-the-art Big Data cluster. You can avail of one month of complimentary lab access by reaching out to support@itversity.com with a Udemy receipt.

    Overview

    Section 1: Introduction about Spark SQL and PySpark 3 using Python 3

    Lecture 1 Introduction to Spark SQL and PySpark 3 using Python 3

    Lecture 2 Curriculum for Spark SQL and Pyspark 3 using Python 3

    Lecture 3 Purchasing the Spark SQL and PySpark using Python 3 Course

    Lecture 4 Introduction to Udemy Course Landing Page

    Lecture 5 Overview of Udemy Course or Video Player

    Lecture 6 Adding Notes to Course Lectures

    Lecture 7 Using Course Sidebar to move between lectures

    Lecture 8 Overview of Support to ITVersity courses on Udemy

    Lecture 9 Best Practices to get ITVersity Support using Udemy

    Lecture 10 Resources for Spark SQL and Pyspark 3 using Python 3

    Lecture 11 Material for Spark SQL and PySpark 3 using Python 3

    Lecture 12 Become Part of ITVersity Data Engineering Community

    Lecture 13 Rate and Leave Feedback - Spark SQL and PySpark 3 using Python 3

    Lecture 14 Udemy for Business Customers - Important Information about labs for practice

    Section 2: Using ITVersity Labs for hands-on practice (for ITVersity Lab Customers only)

    Lecture 15 Setup Development Environment using VS Code Remote Development Extension Pack

    Lecture 16 Review Data Sets Provided as part of Gateway Nodes of Hadoop and Spark Cluster

    Lecture 17 Validate HDFS on Multi Node Hadoop and Spark Cluster from Gateway Node

    Lecture 18 Validate Hive on Hadoop and Spark Multinode Cluster

    Lecture 19 Review Hadoop HDFS and YARN Property Files on Hadoop and Spark Cluster

    Lecture 20 Review Hadoop HDFS and YARN Property Files using Visual Studio Code Editor

    Lecture 21 Review Hive Property Files on Multinode Hadoop and Spark Cluster

    Lecture 22 Review Spark 2 Property Files and Important Properties

    Lecture 23 Validate Spark Shell CLI using Spark 2

    Lecture 24 Validate Pyspark CLI using Spark 2

    Lecture 25 Validate Spark SQL CLI using Spark 2

    Lecture 26 Review Spark 3 Property Files and Important Properties

    Lecture 27 Validate Spark Shell CLI using Spark 3

    Lecture 28 Validate Pyspark CLI using Spark 3

    Lecture 29 Validate Spark SQL CLI using Spark 3

    Section 3: Setup Hadoop and Spark Single Node Cluster on Windows 11 using Docker

    Lecture 30 Prerequisites for Single Node Hadoop and Spark Cluster on Windows

    Lecture 31 Overview of Windows System Configuration

    Lecture 32 Setup Ubuntu on Windows 11 using wsl

    Lecture 33 Setup and Validate Ubuntu VM on Windows using wsl

    Lecture 34 Install Docker Desktop on Windows 11 using wsl2

    Lecture 35 Overview of Docker Desktop on Windows 11

    Lecture 36 Validate Docker Commands using Windows Powershell as well as wsl Ubuntu

    Lecture 37 Setup Visual Studio Code IDE on Windows

    Lecture 38 Install Visual Studio Code Extension for Remote Development

    Lecture 39 Clone GitHub Repository for Pyspark Course using Visual Studio Code

    Lecture 40 Launching Terminal using Visual Studio Code and WSL

    Lecture 41 Review Docker Compose File to setup Hadoop and Spark Lab

    Lecture 42 Start Hadoop and Spark Lab along with Jupyter Lab on Windows 11

    Lecture 43 Review the resource utilization of Windows for Hadoop and Spark Lab

    Lecture 44 Review Docker Desktop for Hadoop and Spark Lab using Docker

    Lecture 45 Overview of Docker Compose Commands to manage Hadoop and Spark Lab

    Lecture 46 Validate Hadoop and Spark setup using Docker on Windows

    Section 4: Setup Hadoop and Spark Single Node Cluster on AWS Cloud9 using Docker

    Lecture 47 Getting Started with AWS Cloud9

    Lecture 48 Creating AWS Cloud9 Environment

    Lecture 49 Warming up with AWS Cloud9 IDE

    Lecture 50 Review Operating System Details on AWS Cloud9

    Lecture 51 Overview of EC2 Instance related to AWS Cloud9

    Lecture 52 Opening ports for AWS Cloud9 Instance

    Lecture 53 Associating Elastic IPs to AWS Cloud9 Instance

    Lecture 54 Increase EBS Volume Size of AWS Cloud9 Instance

    Lecture 55 Setup Docker Compose on AWS Cloud9 Instance

    Lecture 56 Clone GitHub Repository on AWS Cloud9 for the Course Material

    Lecture 57 Review Docker Compose File to setup Hadoop and Spark Lab

    Lecture 58 Start Hadoop and Spark Lab along with Jupyter Lab on AWS Cloud9

    Lecture 59 Overview of Docker Compose Commands to manage Hadoop and Spark Lab

    Lecture 60 Validate Hadoop and Spark setup using Docker

    Section 5: Python Fundamentals

    Lecture 61 Introduction and Setting up Python

    Lecture 62 Basic Programming Constructs

    Lecture 63 Functions in Python

    Lecture 64 Python Collections

    Lecture 65 Map Reduce operations on Python Collections

    Lecture 66 Setting up Data Sets for Basic I/O Operations

    Lecture 67 Basic I/O operations and processing data using Collections
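    The map-reduce operations on Python collections covered above are the conceptual warm-up for Spark transformations and actions. A small, self-contained sketch (the order records and field layout are illustrative, not from the course data sets):

```python
from functools import reduce

# Plain Python list standing in for a distributed data set
orders = [
    ("2025-06-01", "COMPLETE", 100.0),
    ("2025-06-01", "PENDING", 50.0),
    ("2025-06-02", "COMPLETE", 75.0),
]

# filter -> map -> reduce: the same shape as Spark's
# transformations (filter, map) followed by an action (reduce)
complete = filter(lambda o: o[1] == "COMPLETE", orders)
revenues = map(lambda o: o[2], complete)
total = reduce(lambda a, b: a + b, revenues, 0.0)
print(total)  # 175.0
```

    Once this pattern is comfortable, the jump to Spark is mostly a change of data structure, not of thinking.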

    Section 6: Overview of Hadoop HDFS Commands

    Lecture 68 Getting help or usage

    Lecture 69 Listing HDFS Files

    Lecture 70 Managing HDFS Directories

    Lecture 71 Copying files from local to HDFS

    Lecture 72 Copying files from HDFS to local

    Lecture 73 Getting File Metadata

    Lecture 74 Previewing Data in HDFS File

    Lecture 75 HDFS Block Size

    Lecture 76 HDFS Replication Factor

    Lecture 77 Getting HDFS Storage Usage

    Lecture 78 Using HDFS Stat Commands

    Lecture 79 HDFS File Permissions

    Lecture 80 Overriding Properties

    Section 7: Apache Spark 2.x - Data processing - Getting Started

    Lecture 81 Introduction

    Lecture 82 Review of Setup Steps for Spark Environment

    Lecture 83 Using ITVersity labs

    Lecture 84 Apache Spark Official Documentation (Very Important)

    Lecture 85 Quick Review of Spark APIs

    Lecture 86 Spark Modules

    Lecture 87 Spark Data Structures - RDDs and Data Frames

    Lecture 88 Develop Simple Application

    Lecture 89 Apache Spark - Framework

    Lecture 90 Create Data Frames from Text Files

    Lecture 91 Create Data Frames from Hive Tables

    Section 8: Apache Spark using SQL - Getting Started

    Lecture 92 Getting Started - Overview

    Lecture 93 Overview of Spark Documentation

    Lecture 94 Launching and using Spark SQL CLI

    Lecture 95 Overview of Spark SQL Properties

    Lecture 96 Running OS Commands using Spark SQL

    Lecture 97 Understanding Spark Metastore Warehouse Directory

    Lecture 98 Managing Spark Metastore Databases using Spark SQL

    Lecture 99 Managing Spark Metastore Tables using Spark SQL

    Lecture 100 Retrieve Metadata of Spark Metastore Tables using Spark SQL Describe Command

    Lecture 101 Role of Spark Metastore or Hive Metastore

    Lecture 102 Exercise - Getting Started with Spark SQL

    Section 9: Apache Spark using SQL - Basic Transformations using Spark SQL

    Lecture 103 Basic Transformations using Spark SQL - Introduction

    Lecture 104 Spark SQL - Overview

    Lecture 105 Define Problem Statement

    Lecture 106 Prepare Spark Metastore Tables for Basic Transformations using Spark SQL

    Lecture 107 Projecting Data using Spark SQL Select Clause

    Lecture 108 Filtering Data using Spark SQL Where Clause

    Lecture 109 Joining Tables using Spark SQL - Inner

    Lecture 110 Joining Tables using Spark SQL - Outer

    Lecture 111 Aggregating Data using Group By in Spark SQL

    Lecture 112 Sorting Data using Order By in Spark SQL

    Lecture 113 Conclusion - Final Solution for the problem statement using Spark SQL

    Section 10: Apache Spark using SQL - Basic DDL and DML

    Lecture 114 Introduction to Basic DDL and DML in Spark SQL

    Lecture 115 Create Spark Metastore Tables using Spark SQL Create Statement

    Lecture 116 Overview of Data Types used in Spark Metastore Tables

    Lecture 117 Adding Comments to Spark Metastore Tables using Spark SQL

    Lecture 118 Loading Data from Local File System Into Tables using Spark SQL Load Statement

    Lecture 119 Loading Data from HDFS Folders Into Tables using Spark SQL Load Statement

    Lecture 120 Difference between Load with Append and Overwrite using Spark SQL Load Statement

    Lecture 121 Creating External Spark Metastore Tables using Spark SQL

    Lecture 122 Difference between Managed and External Spark Metastore Tables

    Lecture 123 Overview of File Formats used in Spark Metastore Tables

    Lecture 124 Drop Spark Metastore Tables and Databases using Spark SQL

    Lecture 125 Truncating Spark Metastore Tables

    Lecture 126 Exercise - Managed Spark Metastore Tables

    Section 11: Apache Spark using SQL - DML and Partitioning

    Lecture 127 Introduction to DML and Partitioning using Spark SQL on Spark Metastore Tables

    Lecture 128 Introduction to Partitioning of Spark Metastore Tables using Spark SQL

    Lecture 129 Creating Spark Metastore Tables using Parquet File Format

    Lecture 130 Difference between Load and Insert to get data into Spark Metastore Tables

    Lecture 131 Inserting Data using Stage Table leveraging Spark SQL

    Lecture 132 Creating Spark Metastore Partitioned Tables using Spark SQL

    Lecture 133 Adding Partitions to Spark Metastore Tables using Spark SQL

    Lecture 134 Loading Data into Spark Metastore Partitioned Tables using Spark SQL

    Lecture 135 Inserting Data into Spark Metastore Partitions using Spark SQL Insert Statement

    Lecture 136 Using Dynamic Partition Mode while inserting into Spark Partitioned Tables

    Lecture 137 Exercise - Partitioned Tables using Spark SQL

    Section 12: Apache Spark using SQL - Pre-defined Functions

    Lecture 138 Introduction - Overview of Spark SQL Pre-defined Functions

    Lecture 139 Overview of Spark SQL Pre-defined Functions

    Lecture 140 Validating Spark SQL Functions

    Lecture 141 String Manipulation using Spark SQL Functions

    Lecture 142 Date Manipulation using Spark SQL Functions

    Lecture 143 Overview of Numeric Functions in Spark SQL

    Lecture 144 Data Type Conversion using Spark SQL

    Lecture 145 Dealing with Nulls using Spark SQL

    Lecture 146 Using CASE and WHEN in Spark SQL Queries

    Lecture 147 Query Example - Word Count using Spark SQL

    Section 13: Apache Spark SQL - Windowing Functions

    Lecture 148 Introduction to Windowing Functions in Spark SQL

    Lecture 149 Prepare HR Database for Windowing Functions in Spark SQL

    Lecture 150 Overview of Windowing Functions using Spark SQL

    Lecture 151 Aggregations using Spark SQL Windowing Functions

    Lecture 152 Using LEAD or LAG in Spark SQL Windowing Functions

    Lecture 153 Getting first and last values using Spark SQL Windowing Functions

    Lecture 154 Ranking using Spark SQL Windowing Functions - rank, dense_rank and row_number

    Lecture 155 Order of execution of Spark SQL Queries

    Lecture 156 Overview of Subqueries in Spark SQL

    Lecture 157 Filtering Window Function Results using Spark SQL

    Section 14: Apache Spark using Python - Data Processing Overview

    Lecture 158 Starting Spark Context - pyspark

    Lecture 159 Overview of Spark Read APIs

    Lecture 160 Understanding airlines data

    Lecture 161 Inferring Schema using Spark Data Frame APIs

    Lecture 162 Previewing Airlines Data using Spark Data Frame APIs

    Lecture 163 Overview of Data Frame APIs

    Lecture 164 Overview of Functions on Spark Data Frames

    Lecture 165 Overview of Spark Write APIs

    Section 15: Apache Spark using Python - Processing Column Data

    Lecture 166 Overview of Predefined Functions on Spark Data Frame Columns

    Lecture 167 Create Dummy Data Frame to explore Functions on Data Frame Columns

    Lecture 168 Categories of Predefined Functions used on Spark Data Frame Columns

    Lecture 169 Special Functions for Spark Data Frame Columns - col and lit

    Lecture 170 Common String Manipulation Functions for Spark Data Frame Columns

    Lecture 171 Extracting Strings using substring from Spark Data Frame Columns

    Lecture 172 Extracting Strings using split from Spark Data Frame Columns

    Lecture 173 Padding Characters around Strings in Spark Data Frame Columns

    Lecture 174 Trimming Characters from Strings in Spark Data Frame Columns

    Lecture 175 Date and Time Manipulation Functions for Spark Data Frame Columns

    Lecture 176 Date and Time Arithmetic on Spark Data Frame Columns

    Lecture 177 Using Date and Time Trunc Functions on Spark Data Frame Columns

    Lecture 178 Date and Time Extract Functions for Spark Data Frame Columns

    Lecture 179 Using to_date and to_timestamp on Spark Data Frame Columns

    Lecture 180 Using date_format Function on Spark Data Frame Columns

    Lecture 181 Dealing with Unix Timestamp in Spark Data Frame Columns

    Lecture 182 Dealing with Nulls in Spark Data Frame Columns

    Lecture 183 Using CASE and WHEN on Spark Data Frame Columns

    Section 16: Apache Spark using Python - Basic Transformations

    Lecture 184 Overview of Basic Transformations on Spark Data Frames

    Lecture 185 Spark Data Frames for basic transformations

    Lecture 186 Basic Filtering of Data or rows using where from Spark Data Frames

    Lecture 187 Filtering Example using dates on Spark Data Frames

    Lecture 188 Boolean Operators while filtering from Spark Data Frames

    Lecture 189 Using IN Operator or isin Function while filtering from Spark Data Frames

    Lecture 190 Using LIKE Operator or like Function while filtering from Spark Data Frames

    Lecture 191 Using BETWEEN Operator while filtering from Spark Data Frames

    Lecture 192 Dealing with Nulls while Filtering from Spark Data Frames

    Lecture 193 Total Aggregations on Spark Data Frames

    Lecture 194 Aggregate data using groupBy from Spark Data Frames

    Lecture 195 Aggregate data using rollup on Spark Data Frames

    Lecture 196 Aggregate data using cube on Spark Data Frames

    Lecture 197 Overview of Sorting Spark Data Frames

    Lecture 198 Solution - Problem 1 - Get Total Aggregations

    Lecture 199 Solution - Problem 2 - Get Total Aggregations By FlightDate

    Section 17: Apache Spark using Python - Joining Data Sets

    Lecture 200 Prepare Datasets for Joining Spark Data Frames

    Lecture 201 Analyze Datasets for Joining Spark Data Frames

    Lecture 202 Problem Statements for Joining Spark Data Frames

    Lecture 203 Overview of Joins on Spark Data Frames

    Lecture 204 Using Inner Joins on Spark Data Frames

    Lecture 205 Left or Right Outer Join on Spark Data Frames

    Lecture 206 Solution - Get Flight Count Per US Airport using Spark Data Frame APIs

    Lecture 207 Solution - Get Flight Count Per US State using Spark Data Frame APIs

    Lecture 208 Solution - Get Dormant US Airports using Spark Data Frame APIs

    Lecture 209 Solution - Get Origins without master data using Spark Data Frame APIs

    Lecture 210 Solution - Get Count of Flights without master data using Spark Data Frame APIs

    Lecture 211 Solution - Get Count of Flights per Airport without master data

    Lecture 212 Solution - Get Daily Revenue using Spark Data Frame APIs

    Lecture 213 Solution - Get Daily Revenue rolled up till Yearly using Spark Data Frame APIs

    Section 18: Apache Spark using Python - Spark Metastore

    Lecture 214 Overview of APIs to deal with Spark Metastore

    Lecture 215 Exploring Spark Catalog

    Lecture 216 Creating Spark Metastore Tables using catalog

    Lecture 217 Inferring Schema while creating Spark Metastore Tables using Spark Catalog

    Lecture 218 Define Schema for Spark Metastore Tables using StructType

    Lecture 219 Inserting into Existing Spark Metastore Tables using Spark Data Frame APIs

    Lecture 220 Read and Process data from Spark Metastore Tables using Data Frame APIs

    Lecture 221 Create Spark Metastore Partitioned Tables using Data Frame APIs

    Lecture 222 Saving as Spark Metastore Partitioned Table using Data Frame APIs

    Lecture 223 Creating Temporary Views on top of Spark Data Frames

    Lecture 224 Using Spark SQL against Temporary Views on Spark Data Frames

    Section 19: Getting Started with Semi Structured Data using Spark

    Lecture 225 Introduction to Getting Started with Semi Structured Data using Spark

    Lecture 226 Create Spark Metastore Table with Special Data Types

    Lecture 227 Overview of ARRAY Type in Spark Metastore Table

    Lecture 228 Overview of MAP and STRUCT Type in Spark Metastore Table

    Lecture 229 Insert Data into Spark Metastore Table with Special Type Columns

    Lecture 230 Create Spark Data Frame with Special Data Types

    Lecture 231 Create Spark Data Frame with Special Types using Python List

    Lecture 232 Insert Spark Data Frame with Special Types into Spark Metastore Table

    Lecture 233 Review Data in the JSON File with Special Data Types

    Lecture 234 Setup JSON Data Set to explore Spark APIs on Special Data Type Columns

    Lecture 235 Read JSON Data with Special Types into Spark Data Frame

    Lecture 236 Flatten Array Fields in Spark Data Frames using explode and explode_outer

    Lecture 237 Get Size or Length of Array Type Columns in Spark Data Frame

    Lecture 238 Concatenate Array Values into Delimited String using Spark APIs

    Lecture 239 Convert Delimited Strings from Spark Data Frame Columns to Arrays

    Lecture 240 Setup Data Sets to Build Arrays using Spark

    Lecture 241 Read JSON Data into Spark Data Frame and Review Aggregate Operations

    Lecture 242 Build Arrays from Flattened Rows of Spark Data Frame

    Lecture 243 Getting Started with Spark Data Frames with Struct Columns

    Lecture 244 Concatenate Struct Column Values in Spark Data Frame

    Lecture 245 Filter Data on Struct Column Attributes in Spark Data Frame

    Lecture 246 Create Spark Data Frame using Map Type Column

    Lecture 247 Project Map Values as Columns using Spark Data Frame APIs

    Lecture 248 Conclusion of Getting Started with Semi Structured Data using Spark

    Section 20: Process Semi Structured Data using Spark Data Frame APIs

    Lecture 249 Introduction to Process Semi Structured Data using Spark Data Frame APIs

    Lecture 250 Review the Data Sets to generate denormalized JSON Data using Spark

    Lecture 251 Setup JSON Data Sets in HDFS using HDFS Command

    Lecture 252 Create Spark Data Frames using Data Frame APIs

    Lecture 253 Join Orders and Order Items using Spark Data Frame APIs

    Lecture 254 Generate Struct Field for Order Details using Spark

    Lecture 255 Generate Array of Struct Field for Order Details using Spark

    Lecture 256 Join Data Sets to generate denormalized JSON Data using Spark

    Lecture 257 Denormalize Join Results using Spark Data Frame APIs

    Lecture 258 Write Denormalized Customer Details to JSON Files using Spark

    Lecture 259 Publish JSON Files for downstream applications

    Lecture 260 Read Denormalized Data into Spark Data Frame

    Lecture 261 Filter Denormalized Data Frame using Spark APIs

    Lecture 262 Perform Aggregations on Denormalized Data Frame using Spark

    Lecture 263 Flatten Semi Structured Data or Denormalized Data using Spark

    Lecture 264 Compute Monthly Customer Revenue using Spark on Denormalized Data

    Lecture 265 Conclusion of Processing Semi Structured Data using Spark Data Frame APIs

    Section 21: Apache Spark - Application Development Life Cycle

    Lecture 266 Setup Virtual Environment and Install Pyspark

    Lecture 267 Getting Started with Pycharm

    Lecture 268 Passing Run Time Arguments

    Lecture 269 Accessing OS Environment Variables

    Lecture 270 Getting Started with Spark

    Lecture 271 Create Function for Spark Session

    Lecture 272 Setup Sample Data

    Lecture 273 Read data from files

    Lecture 274 Process data using Spark APIs

    Lecture 275 Write data to files

    Lecture 276 Validating Writing Data to Files

    Lecture 277 Productionizing the Code

    Lecture 278 Setting up Data for Production Validation

    Lecture 279 Running the application using YARN

    Lecture 280 Detailed Validation of the Application

    Section 22: Spark Application Execution Life Cycle and Spark UI

    Lecture 281 Deploying and Monitoring Spark Applications - Introduction

    Lecture 282 Overview of Types of Spark Cluster Managers

    Lecture 283 Setup EMR Cluster with Hadoop and Spark

    Lecture 284 Overall Capacity of Big Data Cluster with Hadoop and Spark

    Lecture 285 Understanding YARN Capacity of an Enterprise Cluster

    Lecture 286 Overview of Hadoop HDFS and YARN Setup on Multi-node Cluster

    Lecture 287 Overview of Spark Setup on top of Hadoop

    Lecture 288 Setup Data Set for Word Count application

    Lecture 289 Develop Word Count Application

    Lecture 290 Review Deployment Process of Spark Application

    Lecture 291 Overview of Spark Submit Command

    Lecture 292 Switch between Python Versions to run Spark Applications or launch Pyspark CLI

    Lecture 293 Switch between Pyspark Versions to run Spark Applications or launch Pyspark CLI

    Lecture 294 Review Spark Configuration Properties at Run Time

    Lecture 295 Develop Shell Script to run Spark Application

    Lecture 296 Run Spark Application and review default executors

    Lecture 297 Overview of Spark History Server UI

    Section 23: Setup SSH Proxy to access Spark Application logs

    Lecture 298 Setup SSH Proxy to access Spark Application logs - Introduction

    Lecture 299 Overview of Private and Public ips of servers in the cluster

    Lecture 300 Overview of SSH Proxy

    Lecture 301 Setup sshuttle on Mac or Linux

    Lecture 302 Proxy using sshuttle on Mac or Linux

    Lecture 303 Accessing Spark Application logs via SSH Proxy using sshuttle on Mac or Linux

    Lecture 304 Side effects of using SSH Proxy to access Spark Application Logs

    Lecture 305 Steps to setup SSH Proxy on Windows to access Spark Application Logs

    Lecture 306 Setup PuTTY and PuTTYgen on Windows

    Lecture 307 Quick Tour of PuTTY on Windows

    Lecture 308 Configure Passwordless Login using PuTTYGen Keys on Windows

    Lecture 309 Run Spark Application on Gateway Node using PuTTY

    Lecture 310 Configure Tunnel to Gateway Node using PuTTY on Windows for SSH Proxy

    Lecture 311 Setup Proxy on Windows and validate using Microsoft Edge browser

    Lecture 312 Understanding Proxying Network Traffic overcoming Windows Caveats

    Lecture 313 Update Hosts file for worker nodes using private ips

    Lecture 314 Access Spark Application logs using SSH Proxy

    Lecture 315 Overview of performing tasks related to Spark Applications using Mac

    Section 24: Deployment Modes of Spark Applications

    Lecture 316 Deployment Modes of Spark Applications - Introduction

    Lecture 317 Default Execution Master Type for Spark Applications

    Lecture 318 Launch Pyspark using local mode

    Lecture 319 Running Spark Applications using Local Mode

    Lecture 320 Overview of Spark CLI Commands such as Pyspark

    Lecture 321 Accessing Local Files using Spark CLI or Spark Applications

    Lecture 322 Overview of submitting spark application using client deployment mode

    Lecture 323 Overview of submitting spark application using cluster deployment mode

    Lecture 324 Review the default logging while submitting Spark Applications

    Lecture 325 Changing Spark Application Log Level using custom log4j properties

    Lecture 326 Submit Spark Application using client mode with log level info

    Lecture 327 Submit Spark Application using cluster mode with log level info

    Lecture 328 Submit Spark Applications using SPARK_CONF_DIR with custom properties files

    Lecture 329 Submit Spark Applications using Properties File

    Section 25: Passing Application Properties Files and External Dependencies

    Lecture 330 Passing Application Properties Files and External Dependencies - Introduction

    Lecture 331 Steps to pass application properties using JSON

    Lecture 332 Setup Working Directory to pass application properties using JSON

    Lecture 333 Build the JSON with Application Properties

    Lecture 334 Explore APIs to process JSON Data using Pyspark

    Lecture 335 Refactor the Spark Application Code to use properties from JSON

    Lecture 336 Pass Application Properties to Spark Application using local files in client mode

    Lecture 337 Pass Application Properties to Spark Application using local files in cluster mode

    Lecture 338 Pass Application Properties to Spark Application using HDFS files

    Lecture 339 Steps to pass external Python Libraries using pyfiles

    Lecture 340 Create required YAML File to externalize application properties

    Lecture 341 Install PyYAML into specific folder and build zip

    Lecture 342 Explore APIs to process YAML Data using Pyspark

    Lecture 343 Refactor the Spark Application Code to use properties from YAML

    Lecture 344 Pass External Dependencies to Spark Application using local files in client mode

    Lecture 345 Pass External Dependencies to Spark Apps using local files in cluster mode

    Lecture 346 Pass External Dependencies to Spark Application using HDFS files
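    The JSON-properties pattern this section builds toward can be sketched with the standard library alone: keep environment-specific settings in a JSON file, read it at application startup, and parameterize the job from the loaded values. File name, keys, and paths below are illustrative, not the course's actual files.

```python
import json
import os
import tempfile

# Write a sample properties file; in a real deployment this would live
# alongside the application, be shipped via --files, or be read from HDFS
props = {
    "environment": "dev",
    "input.base.dir": "/public/retail_db",
    "output.base.dir": "/user/itversity/retail_demo",
}
workdir = tempfile.mkdtemp()
props_path = os.path.join(workdir, "application.json")
with open(props_path, "w") as fp:
    json.dump(props, fp)

# At startup, load the properties and pull out the values the job needs
with open(props_path) as fp:
    conf = json.load(fp)

input_dir = conf["input.base.dir"]
output_dir = conf["output.base.dir"]
print(input_dir, output_dir)
```

    Swapping JSON for YAML (as the later lectures do) changes only the parsing call; the externalize-then-load shape stays the same.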

    Any IT aspirant/professional willing to learn Data Engineering using Apache Spark
    Python Developers who want to learn Spark to add the key skill to be a Data Engineer
    Scala-based Data Engineers who would like to learn Spark using Python as the programming language