    Spark SQL and Spark 3 using Scala Hands-On with Labs

    Last updated 2/2022
    MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 kHz
    Language: English | Size: 8.75 GB | Duration: 24h 12m

    A comprehensive course on Spark SQL as well as Data Frame APIs using Scala, with complimentary lab access

    What you'll learn
    All the HDFS Commands that are relevant to validate files and folders in HDFS.
    Enough Scala to work on Data Engineering projects using Scala as the programming language
    Spark Data Frame APIs to solve problems using Data Frame-style APIs
    Basic Transformations such as Projection, Filtering, Total as well as Aggregations by Keys using Spark Data Frame APIs
    Inner as well as outer joins using Spark Data Frame APIs
    Ability to use Spark SQL to solve problems using SQL-style syntax
    Basic Transformations such as Projection, Filtering, Total as well as Aggregations by Keys using Spark SQL
    Inner as well as outer joins using Spark SQL
    Basic DDL to create and manage tables using Spark SQL
    Basic DML or CRUD Operations using Spark SQL
    Create and Manage Partitioned Tables using Spark SQL
    Manipulating Data using Spark SQL Functions
    Advanced Analytical or Windowing Functions to perform aggregations and ranking using Spark SQL
    Requirements
    Basic programming skills
    Self-support lab (instructions provided) or ITVersity lab at additional cost for an appropriate environment
    Minimum memory required, depending on the environment you are using, with a 64-bit operating system:
    4 GB RAM with access to proper clusters or 16 GB RAM with virtual machines such as Cloudera QuickStart VM
    Description
    As part of this course, you will learn all the key skills to build Data Engineering pipelines using Spark SQL and Spark Data Frame APIs, with Scala as the programming language. This course used to be a CCA 175 Spark and Hadoop Developer course for preparing for the certification exam. As of 10/31/2021, the exam has been sunset, and we have renamed the course to Spark SQL and Spark 3 using Scala, as it covers industry-relevant topics beyond the scope of the certification.
    About Data Engineering
    Data Engineering is essentially processing data according to downstream needs. As part of Data Engineering, we need to build different pipelines, such as batch pipelines and streaming pipelines. All roles related to data processing are consolidated under Data Engineering; conventionally, they are known as ETL Development, Data Warehouse Development, etc. Apache Spark has evolved into a leading technology for Data Engineering at scale.
    I have prepared this course for anyone who would like to transition into a Data Engineer role using Spark (Scala). I am a Data Engineering Solution Architect with proven experience in designing solutions using Apache Spark.
    Let us go through the details of what you will be learning in this course. Keep in mind that the course is created with a lot of hands-on tasks, which will give you enough practice using the right tools. There are also tons of tasks and exercises to evaluate yourself.
    Setup of Single Node Big Data Cluster
    Many of you would like to transition to Big Data from conventional technologies such as Mainframes, Oracle PL/SQL, etc., and you might not have access to Big Data clusters. It is very important for you to set up the environment in the right manner. Don't worry if you do not have a cluster handy; we will guide you through support via Udemy Q&A.
    Set up an Ubuntu-based AWS Cloud9 instance with the right configuration
    Ensure Docker is set up
    Set up Jupyter Lab and other key components
    Set up and validate Hadoop, Hive, YARN, and Spark
    Are you feeling a bit overwhelmed about setting up the environment? Don't worry! We will provide complimentary lab access for up to 2 months. Here are the details.
    Training using an interactive environment. You will get 2 weeks of lab access to begin with. If you like the environment and acknowledge it by providing a 5* rating and feedback, the lab access will be extended by an additional 6 weeks (2 months in total). Feel free to send an email to support@itversity.com to get complimentary lab access. Also, if your employer provides a multi-node environment, we will help you set up the material for practice as part of the live session. On top of Q&A support, we also provide the required support via live sessions.
    A Quick Recap of Scala
    This course requires a decent knowledge of Scala. To make sure you understand Spark from a Data Engineering perspective, we added a module to quickly warm up with Scala. If you are not familiar with Scala, we suggest you go through relevant courses on Scala as a programming language.
    Data Engineering using Spark SQL
    Let us deep-dive into Spark SQL to understand how it can be used to build Data Engineering pipelines. Spark with SQL gives us the ability to leverage the distributed computing capabilities of Spark, coupled with easy-to-use, developer-friendly SQL-style syntax.
    Getting Started with Spark SQL
    Basic Transformations using Spark SQL
    Managing Spark Metastore Tables - Basic DDL and DML
    Managing Spark Metastore Tables - DML and Partitioning
    Overview of Spark SQL Functions
    Windowing Functions using Spark SQL
    Data Engineering using Spark Data Frame APIs
    Spark Data Frame APIs are an alternative way of building Data Engineering applications at scale, leveraging the distributed computing capabilities of Spark. Data Engineers from application development backgrounds might prefer Data Frame APIs over Spark SQL to build Data Engineering applications.
    Data Processing Overview using Spark Data Frame APIs leveraging Scala as Programming Language
    Processing Column Data using Spark Data Frame APIs leveraging Scala as Programming Language
    Basic Transformations using Spark Data Frame APIs leveraging Scala as Programming Language - Filtering, Aggregations, and Sorting
    Joining Data Sets using Spark Data Frame APIs leveraging Scala as Programming Language
    All the demos are given on our state-of-the-art Big Data cluster. You can avail yourself of one-month complimentary lab access by reaching out to support@itversity.com with a Udemy receipt.

    Overview

    Section 1: Introduction

    Lecture 1 CCA 175 Spark and Hadoop Developer - Curriculum

    Section 2: Setting up Environment using AWS Cloud9

    Lecture 2 Getting Started with Cloud9

    Lecture 3 Creating Cloud9 Environment

    Lecture 4 Warming up with Cloud9 IDE

    Lecture 5 Overview of EC2 related to Cloud9

    Lecture 6 Opening ports for Cloud9 Instance

    Lecture 7 Associating Elastic IPs to Cloud9 Instance

    Lecture 8 Increase EBS Volume Size of Cloud9 Instance

    Lecture 9 Setup Jupyter Lab on Cloud9

    Lecture 10 [Commands] Setup Jupyter Lab on Cloud9

    Section 3: Setting up Environment - Overview of GCP and Provision Ubuntu VM

    Lecture 11 Signing up for GCP

    Lecture 12 Overview of GCP Web Console

    Lecture 13 Overview of GCP Pricing

    Lecture 14 Provision Ubuntu VM from GCP

    Lecture 15 Setup Docker

    Lecture 16 Why are we setting up Python and Jupyter Lab for a Scala-related course?

    Lecture 17 Validating Python

    Lecture 18 Setup Jupyter Lab

    Section 4: Setup Hadoop on Single Node Cluster

    Lecture 19 Introduction to Single Node Hadoop Cluster

    Lecture 20 Setup Prerequisites

    Lecture 21 [Commands] - Setup Prerequisites

    Lecture 22 Setup Passwordless login

    Lecture 23 [Commands] - Setup Passwordless login

    Lecture 24 Download and Install Hadoop

    Lecture 25 [Commands] - Download and Install Hadoop

    Lecture 26 Configure Hadoop HDFS

    Lecture 27 [Commands] - Configure Hadoop HDFS

    Lecture 28 Start and Validate HDFS

    Lecture 29 [Commands] - Start and Validate HDFS

    Lecture 30 Configure Hadoop YARN

    Lecture 31 [Commands] - Configure Hadoop YARN

    Lecture 32 Start and Validate YARN

    Lecture 33 [Commands] - Start and Validate YARN

    Lecture 34 Managing Single Node Hadoop

    Lecture 35 [Commands] - Managing Single Node Hadoop

    Section 5: Setup Hive and Spark on Single Node Cluster

    Lecture 36 Setup Data Sets for Practice

    Lecture 37 [Commands] - Setup Data Sets for Practice

    Lecture 38 Download and Install Hive

    Lecture 39 [Commands] - Download and Install Hive

    Lecture 40 Setup Database for Hive Metastore

    Lecture 41 [Commands] - Setup Database for Hive Metastore

    Lecture 42 Configure and Setup Hive Metastore

    Lecture 43 [Commands] - Configure and Setup Hive Metastore

    Lecture 44 Launch and Validate Hive

    Lecture 45 [Commands] - Launch and Validate Hive

    Lecture 46 Scripts to Manage Single Node Cluster

    Lecture 47 [Commands] - Scripts to Manage Single Node Cluster

    Lecture 48 Download and Install Spark 2

    Lecture 49 [Commands] - Download and Install Spark 2

    Lecture 50 Configure Spark 2

    Lecture 51 [Commands] - Configure Spark 2

    Lecture 52 Validate Spark 2 using CLIs

    Lecture 53 [Commands] - Validate Spark 2 using CLIs

    Lecture 54 Validate Jupyter Lab Setup

    Lecture 55 [Commands] - Validate Jupyter Lab Setup

    Lecture 56 Integrate Spark 2 with Jupyter Lab

    Lecture 57 [Commands] - Integrate Spark 2 with Jupyter Lab

    Lecture 58 Download and Install Spark 3

    Lecture 59 [Commands] - Download and Install Spark 3

    Lecture 60 Configure Spark 3

    Lecture 61 [Commands] - Configure Spark 3

    Lecture 62 Validate Spark 3 using CLIs

    Lecture 63 [Commands] - Validate Spark 3 using CLIs

    Lecture 64 Integrate Spark 3 with Jupyter Lab

    Lecture 65 [Commands] - Integrate Spark 3 with Jupyter Lab

    Section 6: Scala Fundamentals

    Lecture 66 Introduction and Setting up of Scala

    Lecture 67 Setup Scala on Windows

    Lecture 68 Basic Programming Constructs

    Lecture 69 Functions

    Lecture 70 Object Oriented Concepts - Classes

    Lecture 71 Object Oriented Concepts - Objects

    Lecture 72 Object Oriented Concepts - Case Classes

    Lecture 73 Collections - Seq, Set and Map

    Lecture 74 Basic Map Reduce Operations

    Lecture 75 Setting up Data Sets for Basic I/O Operations

    Lecture 76 Basic I/O Operations and using Scala Collections APIs

    Lecture 77 Tuples

    Lecture 78 Development Cycle - Create Program File

    Lecture 79 Development Cycle - Compile source code to jar using SBT

    Lecture 80 Development Cycle - Setup SBT on Windows

    Lecture 81 Development Cycle - Compile changes and run jar with arguments

    Lecture 82 Development Cycle - Setup IntelliJ with Scala

    Lecture 83 Development Cycle - Develop Scala application using SBT in IntelliJ
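
    Section 6 covers case classes, collections (Seq, Set and Map), tuples and basic map-reduce operations. The sketch below shows these ideas together in plain Scala, with no Spark involved. The `Order` case class, its fields and the sample values are hypothetical and are not taken from the course material.

```scala
// Hypothetical record type, used only for illustration.
case class Order(id: Int, customerId: Int, amount: Double, status: String)

object ScalaBasics {
  // A small in-memory data set: a Seq of case class instances.
  val orders: Seq[Order] = Seq(
    Order(1, 100, 25.0, "COMPLETE"),
    Order(2, 101, 40.0, "PENDING"),
    Order(3, 100, 15.0, "COMPLETE")
  )

  // Map-reduce style total: filter, map each order to its amount, then fold into a sum.
  // foldLeft with a zero value (unlike reduce) also works on an empty Seq.
  def completedRevenue(os: Seq[Order]): Double =
    os.filter(_.status == "COMPLETE").map(_.amount).foldLeft(0.0)(_ + _)

  // Set: duplicate customer ids collapse into one element.
  def distinctCustomers(os: Seq[Order]): Set[Int] =
    os.map(_.customerId).toSet

  // Map: status -> number of orders with that status.
  def countByStatus(os: Seq[Order]): Map[String, Int] =
    os.groupBy(_.status).map { case (status, group) => status -> group.size }

  // Tuples: pair each order id with its amount.
  def idAmountPairs(os: Seq[Order]): Seq[(Int, Double)] =
    os.map(o => (o.id, o.amount))
}
```

    With the sample data, `completedRevenue(orders)` adds 25.0 and 15.0, giving 40.0. The same filter, map and reduce shape appears again later with Spark Data Frames.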

    Section 7: Overview of Hadoop HDFS Commands

    Lecture 84 Getting help or usage of HDFS Commands

    Lecture 85 Listing HDFS Files

    Lecture 86 Managing HDFS Directories

    Lecture 87 Copying files from local to HDFS

    Lecture 88 Copying files from HDFS to local

    Lecture 89 Getting File Metadata

    Lecture 90 Previewing Data in HDFS File

    Lecture 91 HDFS Block Size

    Lecture 92 HDFS Replication Factor

    Lecture 93 Getting HDFS Storage Usage

    Lecture 94 Using HDFS Stat Commands

    Lecture 95 HDFS File Permissions

    Lecture 96 Overriding Properties

    Section 8: Apache Spark 2 using Scala - Data Processing - Overview

    Lecture 97 Introduction for the module

    Lecture 98 Starting Spark Context using spark-shell

    Lecture 99 Overview of Spark read APIs

    Lecture 100 Previewing Schema and Data using Spark APIs

    Lecture 101 Overview of Spark Data Frame APIs

    Lecture 102 Overview of Functions to Manipulate Data in Spark Data Frames

    Lecture 103 Overview of Spark Write APIs

    Section 9: Apache Spark 2 using Scala - Processing Column Data using Pre-defined Functions

    Lecture 104 Introduction to Pre-defined Functions

    Lecture 105 Creating Spark Session Object in Notebook

    Lecture 106 Create Dummy Data Frames for Practice

    Lecture 107 Categories of Functions on Spark Data Frame Columns

    Lecture 108 Using Spark Special Functions - col

    Lecture 109 Using Spark Special Functions - lit

    Lecture 110 Manipulating String Columns using Spark Functions - Case Conversion and Length

    Lecture 111 Manipulating String Columns using Spark Functions - substring

    Lecture 112 Manipulating String Columns using Spark Functions - split

    Lecture 113 Manipulating String Columns using Spark Functions - Concatenating Strings

    Lecture 114 Manipulating String Columns using Spark Functions - Padding Strings

    Lecture 115 Manipulating String Columns using Spark Functions - Trimming unwanted characters

    Lecture 116 Date and Time Functions in Spark - Overview

    Lecture 117 Date and Time Functions in Spark - Date Arithmetic

    Lecture 118 Date and Time Functions in Spark - Using trunc and date_trunc

    Lecture 119 Date and Time Functions in Spark - Using date_format and other functions

    Lecture 120 Date and Time Functions in Spark - dealing with unix timestamp

    Lecture 121 Pre-defined Functions in Spark - Conclusion
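
    The date and time lectures above cover Spark's built-in functions. As a rough illustration of what a few of them compute, the sketch below uses `java.time` in plain Scala rather than Spark. The mapping to Spark function names in the comments is approximate. For example, Spark evaluates `from_unixtime` in the session time zone, whereas this sketch assumes UTC.

```scala
import java.time.{Instant, LocalDate, ZoneOffset}
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

object DateSketch {
  // Roughly date_add(d, n): the date n days after d.
  def addDays(d: LocalDate, n: Int): LocalDate = d.plusDays(n.toLong)

  // Roughly datediff(end, start): the number of days from start to end.
  def daysBetween(start: LocalDate, end: LocalDate): Long =
    ChronoUnit.DAYS.between(start, end)

  // Roughly trunc(d, 'MM'): the first day of d's month.
  def truncToMonth(d: LocalDate): LocalDate = d.withDayOfMonth(1)

  // Roughly date_format(d, 'yyyyMM'): a year-month key such as "201307".
  def yearMonthKey(d: LocalDate): String =
    d.format(DateTimeFormatter.ofPattern("yyyyMM"))

  // Roughly to_date(from_unixtime(ts)), but always in UTC (an assumption of this sketch).
  def fromUnixSeconds(ts: Long): LocalDate =
    Instant.ofEpochSecond(ts).atZone(ZoneOffset.UTC).toLocalDate
}
```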

    Section 10: Apache Spark 2 using Scala - Basic Transformations using Data Frames

    Lecture 122 Introduction to Basic Transformations using Data Frame APIs

    Lecture 123 Starting Spark Context

    Lecture 124 Overview of Filtering using Spark Data Frame APIs

    Lecture 125 Filtering Data from Spark Data Frames - Reading Data and Understanding Schema

    Lecture 126 Filtering Data from Spark Data Frames - Task 1 - Equal Operator

    Lecture 127 Filtering Data from Spark Data Frames - Task 2 - Comparison Operators

    Lecture 128 Filtering Data from Spark Data Frames - Task 3 - Boolean AND

    Lecture 129 Filtering Data from Spark Data Frames - Task 4 - IN Operator

    Lecture 130 Filtering Data from Spark Data Frames - Task 5 - Between and Like

    Lecture 131 Filtering Data from Spark Data Frames - Task 6 - Using functions in Filter

    Lecture 132 Overview of Aggregations using Spark Data Frame APIs

    Lecture 133 Overview of Sorting using Spark Data Frame APIs

    Lecture 134 Solution - Get Delayed Counts using Spark Data Frame APIs - Part 1

    Lecture 135 Solution - Get Delayed Counts using Spark Data Frame APIs - Part 2

    Lecture 136 Solution - Getting Delayed Counts By Date using Spark Data Frame APIs
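
    Lectures 124 to 136 build up to counting delayed flights by date, using filtering, aggregation and sorting. The plain-Scala sketch below mimics that pipeline on in-memory data. The `Flight` case class and its fields are hypothetical simplifications. The Spark call in the comment is only a rough equivalent, not the course's solution.

```scala
// Hypothetical, heavily simplified flight record (the real data set has many more columns).
case class Flight(flightDate: String, origin: String, depDelay: Int)

object DelayedCounts {
  val sample: Seq[Flight] = Seq(
    Flight("2008-01-01", "ORD", 10),
    Flight("2008-01-01", "JFK", 0),
    Flight("2008-01-02", "ORD", 5),
    Flight("2008-01-01", "LAX", -3),
    Flight("2008-01-02", "SFO", 0),
    Flight("2008-01-01", "SEA", 1)
  )

  // Roughly: flights.filter(col("depDelay") > 0).groupBy("flightDate").count().orderBy("flightDate")
  def delayedCountsByDate(flights: Seq[Flight]): Seq[(String, Int)] =
    flights
      .filter(_.depDelay > 0)                           // filtering: delayed departures only
      .groupBy(_.flightDate)                            // aggregation by key: one group per date
      .map { case (date, group) => (date, group.size) } // count per group
      .toSeq
      .sortBy(_._1)                                     // sorting: ISO dates sort correctly as strings
}
```

    On the sample data, three flights have a positive delay: two on 2008-01-01 and one on 2008-01-02.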

    Section 11: Apache Spark 2 using Scala - Joining Data Sets

    Lecture 137 Prepare and Validate Data Sets

    Lecture 138 Starting Spark Session or Spark Context

    Lecture 139 Analyze Data Sets for Joins using Spark Data Frame APIs

    Lecture 140 Eliminate Duplicate records from Data Frame using Spark Data Frame APIs

    Lecture 141 Recap of Basic Transformations using Spark Data Frame APIs

    Lecture 142 Joining Data Sets using Spark Data Frame APIs - Problem Statements

    Lecture 143 Overview of Joins using Spark Data Frame APIs

    Lecture 144 Inner Join using Spark Data Frame APIs - Get number of flights departed from US airports

    Lecture 145 Inner Join using Spark Data Frame APIs - Get number of flights departed from US States

    Lecture 146 Outer Join using Spark Data Frame APIs - Get Airports - Never Used
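
    The join lectures above count flights departed from US airports with an inner join and find airports that were never used with an outer join. The sketch below reproduces those two join semantics on plain Scala collections, using a simple nested-loop approach. The `Airport` and `Departure` case classes and the sample data are hypothetical. In Spark, the unmatched right-hand side of an outer join appears as nulls; here it is modeled as `None`.

```scala
// Hypothetical, simplified records for illustration.
case class Airport(iata: String, state: String, country: String)
case class Departure(origin: String, flightNum: Int)

object JoinSketch {
  val sampleAirports: Seq[Airport] = Seq(
    Airport("ORD", "IL", "USA"), Airport("JFK", "NY", "USA"),
    Airport("YYZ", "ON", "Canada"), Airport("BOS", "MA", "USA")
  )
  val sampleDepartures: Seq[Departure] = Seq(
    Departure("ORD", 1), Departure("ORD", 2), Departure("JFK", 3), Departure("YYZ", 4)
  )

  // Inner join on airport code: a pair is emitted only when both sides match.
  def innerJoin(airports: Seq[Airport], deps: Seq[Departure]): Seq[(Airport, Departure)] =
    for {
      d <- deps
      a <- airports
      if a.iata == d.origin
    } yield (a, d)

  // Left outer join: every airport is kept; airports without departures pair with None.
  def leftOuterJoin(airports: Seq[Airport], deps: Seq[Departure]): Seq[(Airport, Option[Departure])] =
    airports.flatMap { a =>
      val matches = deps.filter(_.origin == a.iata)
      if (matches.isEmpty) Seq((a, Option.empty[Departure]))
      else matches.map(d => (a, Option(d)))
    }

  // Number of flights departed per US airport, built on the inner join.
  def departuresPerUsAirport(airports: Seq[Airport], deps: Seq[Departure]): Map[String, Int] =
    innerJoin(airports.filter(_.country == "USA"), deps)
      .groupBy { case (a, _) => a.iata }
      .map { case (code, pairs) => code -> pairs.size }

  // Airports never used: left outer join, then keep only the rows with no match.
  def airportsNeverUsed(airports: Seq[Airport], deps: Seq[Departure]): Seq[String] =
    leftOuterJoin(airports, deps).collect { case (a, None) => a.iata }
}
```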

    Section 12: Apache Spark using SQL - Getting Started

    Lecture 147 Getting Started with Spark SQL - Overview

    Lecture 148 Overview of Spark Documentation

    Lecture 149 Launching and using Spark SQL CLI

    Lecture 150 Overview of Spark SQL Properties

    Lecture 151 Running OS Commands using Spark SQL

    Lecture 152 Understanding Spark Metastore Warehouse Directory

    Lecture 153 Managing Spark Metastore Databases

    Lecture 154 Managing Spark Metastore Tables

    Lecture 155 Retrieve Metadata of Spark Metastore Tables

    Lecture 156 Role of Spark Metastore or Hive Metastore

    Lecture 157 Exercise - Getting Started with Spark SQL

    Section 13: Apache Spark using SQL - Basic Transformations

    Lecture 158 Basic Transformation using Spark SQL - Introduction

    Lecture 159 Spark SQL - Overview

    Lecture 160 Define Problem Statement for Basic Transformations using Spark SQL

    Lecture 161 Prepare or Create Tables using Spark SQL

    Lecture 162 Projecting or Selecting Data using Spark SQL

    Lecture 163 Filtering Data using Spark SQL

    Lecture 164 Joining Tables using Spark SQL - Inner

    Lecture 165 Joining Tables using Spark SQL - Outer

    Lecture 166 Aggregating Data using Spark SQL

    Lecture 167 Sorting Data using Spark SQL

    Lecture 168 Conclusion - Final Solution using Spark SQL

    Section 14: Apache Spark using SQL - Basic DDL and DML

    Lecture 169 Introduction to Basic DDL and DML using Spark SQL

    Lecture 170 Create Spark Metastore Tables using Spark SQL

    Lecture 171 Overview of Data Types for Spark Metastore Table Columns

    Lecture 172 Adding Comments to Spark Metastore Tables using Spark SQL

    Lecture 173 Loading Data Into Spark Metastore Tables using Spark SQL - Local

    Lecture 174 Loading Data Into Spark Metastore Tables using Spark SQL - HDFS

    Lecture 175 Loading Data into Spark Metastore Tables using Spark SQL - Append and Overwrite

    Lecture 176 Creating External Tables in Spark Metastore using Spark SQL

    Lecture 177 Managed Spark Metastore Tables vs External Spark Metastore Tables

    Lecture 178 Overview of Spark Metastore Table File Formats

    Lecture 179 Drop Spark Metastore Tables and Databases

    Lecture 180 Truncating Spark Metastore Tables

    Lecture 181 Exercise - Managed Spark Metastore Tables

    Section 15: Apache Spark using SQL - DML and Partitioning

    Lecture 182 Introduction to DML and Partitioning of Spark Metastore Tables using Spark SQL

    Lecture 183 Introduction to Partitioning of Spark Metastore Tables using Spark SQL

    Lecture 184 Creating Spark Metastore Tables using Parquet File Format

    Lecture 185 Load vs. Insert into Spark Metastore Tables using Spark SQL

    Lecture 186 Inserting Data using Stage Spark Metastore Table using Spark SQL

    Lecture 187 Creating Partitioned Spark Metastore Tables using Spark SQL

    Lecture 188 Adding Partitions to Spark Metastore Tables using Spark SQL

    Lecture 189 Loading Data into Partitioned Spark Metastore Tables using Spark SQL

    Lecture 190 Inserting Data into Partitions of Spark Metastore Tables using Spark SQL

    Lecture 191 Using Dynamic Partition Mode to insert data into Spark Metastore Tables

    Lecture 192 Exercise - Partitioned Spark Metastore Tables using Spark SQL
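
    Lectures 183 to 192 cover partitioned Spark Metastore tables and dynamic partition mode. The plain-Scala sketch below illustrates the idea, not Spark's implementation. It derives a partition value from each row's data and groups rows into Hive-style `column=value` directories under a table location. The `OrderRow` case class and the `order_month` column are hypothetical. In Spark SQL itself, an all-dynamic `INSERT INTO ... PARTITION (order_month) SELECT ...` into a Hive-format table typically requires setting `hive.exec.dynamic.partition.mode` to `nonstrict` first.

```scala
// Hypothetical order row; orderDate is an ISO date string such as "2013-07-25".
case class OrderRow(orderId: Int, orderDate: String, status: String)

object PartitionSketch {
  val sampleRows: Seq[OrderRow] = Seq(
    OrderRow(1, "2013-07-25", "CLOSED"),
    OrderRow(2, "2013-07-26", "PENDING"),
    OrderRow(3, "2013-08-01", "COMPLETE")
  )

  // Dynamic partitioning: the partition value comes from the row itself,
  // e.g. "2013-07-25" -> "201307".
  def orderMonth(r: OrderRow): String =
    r.orderDate.substring(0, 7).replace("-", "")

  // Group rows into Hive-style directories: <tableLocation>/order_month=<value>
  def partitionDirs(tableLocation: String, rows: Seq[OrderRow]): Map[String, Seq[OrderRow]] =
    rows.groupBy(orderMonth).map { case (month, group) =>
      s"$tableLocation/order_month=$month" -> group
    }
}
```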

    Section 16: Apache Spark using SQL - Pre-defined Functions

    Lecture 193 Introduction - Overview of Spark SQL Functions

    Lecture 194 Overview of Pre-defined Functions using Spark SQL

    Lecture 195 Validating Functions using Spark SQL

    Lecture 196 String Manipulation Functions using Spark SQL

    Lecture 197 Date Manipulation Functions using Spark SQL

    Lecture 198 Overview of Numeric Functions using Spark SQL

    Lecture 199 Data Type Conversion using Spark SQL

    Lecture 200 Dealing with Nulls using Spark SQL

    Lecture 201 Using CASE and WHEN using Spark SQL

    Lecture 202 Query Example - Word Count using Spark SQL
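
    Lecture 202 uses word count as a query example. The same steps (split, explode, group and count) can be sketched on plain Scala collections as shown below. This is an illustrative analogue, not the query from the course. Splitting on a whitespace regular expression is an assumption of the sketch.

```scala
object WordCountSketch {
  // Roughly the Spark SQL pattern:
  //   SELECT word, count(1) FROM (SELECT explode(split(line, ' ')) AS word FROM lines) GROUP BY word
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))                    // split + explode: one element per word
      .filter(_.nonEmpty)                          // drop empty tokens from leading whitespace
      .groupBy(identity)                           // GROUP BY word
      .map { case (word, ws) => word -> ws.size }  // count(1) per group
}
```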

    Section 17: Apache Spark using SQL - Pre-defined Functions - Exercises

    Lecture 203 Prepare Users Table using Spark SQL

    Lecture 204 Exercise 1 - Get number of users created per year

    Lecture 205 Exercise 2 - Get the day name of the birthdays of users

    Lecture 206 Exercise 3 - Get the names and email ids of users added in the year 2019

    Lecture 207 Exercise 4 - Get the number of users by gender

    Lecture 208 Exercise 5 - Get last 4 digits of unique ids

    Lecture 209 Exercise 6 - Get the count of users based on country code

    Section 18: Apache Spark using SQL - Windowing Functions

    Lecture 210 Introduction to Windowing Functions using Spark SQL

    Lecture 211 Prepare HR Database in Spark Metastore using Spark SQL

    Lecture 212 Overview of Windowing Functions using Spark SQL

    Lecture 213 Aggregations using Windowing Functions using Spark SQL

    Lecture 214 LEAD or LAG Functions using Spark SQL

    Lecture 215 Getting first and last values using Spark SQL

    Lecture 216 Ranking using Windowing Functions in Spark SQL

    Lecture 217 Order of execution of Spark SQL Queries

    Lecture 218 Overview of Subqueries using Spark SQL

    Lecture 219 Filtering Window Function Results using Spark SQL
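
    Lectures 212 to 219 cover aggregations, ranking and filtering on window function results. The sketch below shows what `rank()` and `dense_rank()` compute over a window partitioned by department and ordered by salary descending. It also shows why filtering on a rank needs a subquery. The sketch is deliberately naive plain Scala (quadratic in the number of rows), so it is only suitable for small examples. The `Employee` case class is hypothetical, loosely modeled on an HR table.

```scala
// Hypothetical employee record, loosely modeled on an HR table.
case class Employee(id: Int, deptId: Int, salary: Int)

object WindowSketch {
  val sample: Seq[Employee] = Seq(
    Employee(1, 10, 5000),
    Employee(2, 10, 7000),
    Employee(3, 10, 7000),
    Employee(4, 10, 3000),
    Employee(5, 20, 4000)
  )

  // Salaries strictly higher than e's within the same partition (same deptId).
  private def higherSalaries(emps: Seq[Employee], e: Employee): Seq[Int] =
    emps.filter(o => o.deptId == e.deptId && o.salary > e.salary).map(_.salary)

  // rank() OVER (PARTITION BY deptId ORDER BY salary DESC):
  // ties share a rank and the next rank is skipped (1, 1, 3, ...).
  def rank(emps: Seq[Employee]): Seq[(Employee, Int)] =
    emps.map(e => (e, higherSalaries(emps, e).size + 1))

  // dense_rank() over the same window: ties share a rank, with no gaps (1, 1, 2, ...).
  def denseRank(emps: Seq[Employee]): Seq[(Employee, Int)] =
    emps.map(e => (e, higherSalaries(emps, e).distinct.size + 1))

  // Top-n per department: the rank must exist before it can be filtered, which is why
  // the SQL version wraps the window query in a subquery (WHERE runs before window functions).
  def topNPerDept(emps: Seq[Employee], n: Int): Seq[Employee] =
    rank(emps).collect { case (e, r) if r <= n => e }
}
```

    In department 10, employees 2 and 3 tie at rank 1. Employee 1 then gets rank 3 under `rank()` but rank 2 under `dense_rank()`.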

    Section 19: Sample scenarios with solutions

    Lecture 220 Introduction to Sample Scenarios and Solutions

    Lecture 221 Problem Statements - General Guidelines

    Lecture 222 Initializing the job - General Guidelines

    Lecture 223 Getting crime count per type per month - Understanding Data

    Lecture 224 Getting crime count per type per month - Implementing the logic - Core API

    Lecture 225 Getting crime count per type per month - Implementing the logic - Data Frames

    Lecture 226 Getting crime count per type per month - Validating Output

    Lecture 227 Get inactive customers - using Core Spark API (leftOuterJoin)

    Lecture 228 Get inactive customers - using Data Frames and SQL

    Lecture 229 Get top 3 crimes in RESIDENCE - using Core Spark API

    Lecture 230 Get top 3 crimes in RESIDENCE - using Data Frame and SQL

    Lecture 231 Convert NYSE data from text file format to parquet file format

    Lecture 232 Get word count - with custom control arguments, num keys and file format

    Who this course is for
    Any IT aspirant/professional willing to learn Data Engineering using Apache Spark
    Python Developers who want to learn Spark using Scala to add an additional skill as a Data Engineer
    Java or Scala Developers who want to learn Spark using Scala to add Data Engineering skills to their profile