    Data Engineering Essentials Using Sql, Python, And Pyspark (updated 2/2023)

    Posted By: ELK1nG
    Last updated 2/2023
    MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
    Language: English | Size: 31.34 GB | Duration: 65h 57m

    Learn key Data Engineering Skills such as SQL, Python, Apache Spark (Spark SQL and Pyspark) with Exercises and Projects

    What you'll learn

    Setup Development Environment to learn building Data Engineering Applications on GCP

    Database Essentials for Data Engineering using Postgres such as creating tables, indexes, running SQL Queries, using important pre-defined functions, etc.

    Data Engineering Programming Essentials using Python such as basic programming constructs, collections, Pandas, Database Programming, etc.

    Data Engineering using Spark Dataframe APIs (PySpark). Learn all important Spark Data Frame APIs such as select, filter, groupBy, orderBy, etc.

    Data Engineering using Spark SQL (PySpark and Spark SQL). Learn how to write high quality Spark SQL queries using SELECT, WHERE, GROUP BY, ORDER BY, etc.

    Relevance of Spark Metastore and integration of Dataframes and Spark SQL

    Ability to build Data Engineering Pipelines using Spark leveraging Python as Programming Language

    Use of different file formats such as Parquet, JSON, CSV etc in building Data Engineering Pipelines

    Setup self support single node Hadoop and Spark Cluster to get enough practice on HDFS and YARN

    Understanding Complete Spark Application Development Life Cycle to build Spark Applications using Pyspark. Review the applications using Spark UI.
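
    As a small taste of the SELECT, WHERE, GROUP BY, and ORDER BY clauses named in the list above, here is a minimal sketch. It is not taken from the course material: it uses Python's built-in sqlite3 module instead of Postgres or Spark SQL purely so that it runs without any setup, and the orders table, its columns, and the sample rows are hypothetical.

```python
import sqlite3


def daily_revenue(rows, status="COMPLETE"):
    """Return (order_date, revenue) pairs for orders with the given status.

    The table layout is an assumption made for illustration only.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute(
            "CREATE TABLE orders (order_date TEXT, order_status TEXT, amount REAL)"
        )
        # Load all rows in one batch instead of one INSERT per row.
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            """
            SELECT order_date, ROUND(SUM(amount), 2) AS revenue
            FROM orders
            WHERE order_status = ?
            GROUP BY order_date
            ORDER BY order_date
            """,
            (status,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data: (order_date, order_status, amount)
sample_orders = [
    ("2023-02-01", "COMPLETE", 100.0),
    ("2023-02-01", "COMPLETE", 50.5),
    ("2023-02-01", "CANCELED", 20.0),
    ("2023-02-02", "COMPLETE", 75.0),
]

if __name__ == "__main__":
    for order_date, revenue in daily_revenue(sample_orders):
        print(order_date, revenue)
```

    The same query shape should carry over to Postgres or Spark SQL with little change, although data types and function names can differ between engines.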

    Requirements

    Laptop with decent configuration (Minimum 4 GB RAM and Dual Core)

    Sign up for GCP with the available credit or AWS Access

    Setup self support lab on cloud platforms (you might have to pay the applicable cloud fee unless you have credit)

    CS or IT degree or prior IT experience is highly desired

    Description

    As part of this course, you will learn all the Data Engineering Essentials related to building Data Pipelines using SQL and Python with Hadoop, Hive, or Spark SQL, as well as PySpark Data Frame APIs. You will also understand the development and deployment lifecycle of Python applications using Docker as well as PySpark on multi-node clusters. You will also gain basic knowledge about reviewing Spark Jobs using Spark UI.

    About Data Engineering

    Data Engineering is about processing data to suit downstream needs. We need to build different pipelines such as Batch Pipelines, Streaming Pipelines, etc. as part of Data Engineering. All roles related to Data Processing are consolidated under Data Engineering. Conventionally, they are known as ETL Development, Data Warehouse Development, etc.

    Here are some of the challenges learners face when learning key Data Engineering Skills such as Python, SQL, PySpark, etc.

    Having an appropriate environment with Apache Hadoop, Apache Spark, Apache Hive, etc. working together.

    Good quality content with proper support.

    Enough tasks and exercises for practice.

    This course is designed to address these key challenges for professionals at all levels to acquire the required Data Engineering Skills (Python, SQL, and Apache Spark).

    To make sure you spend time learning rather than struggling with technical challenges, here is what we have done.

    Training using an interactive environment. You will get 2 weeks of lab access to begin with. If you like the environment and acknowledge it by providing ratings and feedback, the lab access will be extended by an additional 6 weeks (2 months in total). Feel free to send an email to support@itversity.com to get complimentary lab access. Also, if your employer provides a multi-node environment, we will help you set up the material for the practice as part of the live session.
    On top of Q&A Support, we also provide required support via live sessions.

    Make sure you have a system with the right configuration and quickly set up a lab using Docker with all the required Python, SQL, PySpark as well as Spark SQL material. It will address a lot of pain points related to networking, database integration, etc. Feel free to reach out to us via Udemy Q&A in case you get stuck while setting up the environment.

    You will start with foundational skills such as Python as well as SQL using a Jupyter-based environment. Most of the lectures have quite a few tasks, and at the end of each module there are enough exercises or practice tests to evaluate the skills taught.

    Once you are comfortable with programming using Python and SQL, you will learn how to quickly set up and access a Single Node Hadoop and Spark Cluster.

    The content is streamlined in such a way that you use learner-friendly interfaces such as Jupyter Lab to practice.

    If you end up signing up for the course, do not forget to rate us 5* if you like the content. If not, feel free to reach out to us and we will address your concerns.

    Highlights of this course

    Here are some of the highlights of this Data Engineering course using technologies such as Python, SQL, Hadoop, Spark, etc.

    The course is designed by a veteran with 20+ years of experience (Durga Gadiraju), most of it around data. He has more than a decade of Data Engineering as well as Big Data experience with several certifications. He has a history of training hundreds of thousands of IT professionals in Data Engineering as well as Big Data.

    Simplified setup of all the key tools to learn Data Engineering or Big Data such as Hadoop, Spark, Hive, etc.

    Dedicated support, with 100% of questions answered over the past few months.

    Tons of material with real-world experiences and Data Sets.
    The material is made available both in the Git repository and in the lab which you are going to set up.

    Complimentary Lab Access for 2 Weeks which can be extended to 8 Weeks.

    30 Day Money back guarantee.

    Content Details

    As part of this course, you will be learning Data Engineering Essentials such as SQL, and Programming using Python and Apache Spark. Here is the detailed agenda for the course.

    Data Engineering Labs - Python and SQL

    You will start with setting up self-support Data Engineering Labs on Cloud9 or on your Mac or PC so that you can learn the key skills related to Data Engineering with a lot of practice leveraging tasks and exercises provided by us. As you pass the sections related to SQL and Python, you will also be guided to set up Hadoop and Spark Lab.

    Provision AWS Cloud9 Instance (in case your Mac or PC does not have enough capacity)

    Setup Docker Compose to start the containers to learn Python and SQL (using Postgresql)

    Access the material via Jupyter Lab environment setup using Docker and learn via hands-on practice.

    Once the environment is set up, the material will be directly accessible.

    Database Essentials - SQL using Postgres

    It is important to be proficient with SQL in order to build data engineering pipelines. SQL is used for understanding the data, performing ad-hoc analysis, and also in building data engineering pipelines.

    Getting Started with Postgres

    Basic Database Operations (CRUD or Insert, Update, Delete)

    Writing Basic SQL Queries (Filtering, Joins, and Aggregations)

    Creating Tables and Indexes using Postgres DDL Commands

    Partitioning Tables and Indexes using Postgres DDL Commands

    Predefined Functions using SQL (String Manipulation, Date Manipulation, and other functions)

    Writing Advanced SQL Queries using Postgresql

    Programming Essentials using Python

    Python is among the most preferred programming languages for developing data engineering applications.
    As part of several sections related to Python, you will be learning most of the important aspects of Python to build data engineering applications effectively.

    Perform Database Operations

    Getting Started with Python

    Basic Programming Constructs in Python (for loops, if conditions)

    Predefined Functions in Python (string manipulation, date manipulation, and other standard functions)

    Overview of Collections such as list and set in Python

    Overview of Collections such as dict and tuple in Python

    Manipulating Collections using loops in Python. This is primarily designed to get enough practice with Python Programming around Python Collections.

    Understanding Map Reduce Libraries in Python. You will learn functions such as map, filter, etc. You will also understand details about itertools.

    Overview of Python Pandas Libraries. You will be learning how to read from files and process the data in Pandas Data Frames by applying Standard Transformations such as filtering, joins, sorting, etc. Also, you'll be learning how to write data to files.

    Database Programming using Python - CRUD Operations

    Database Programming using Python - Batch Operations. There will be enough emphasis on best practices to load data into Databases in bulk or batches.

    Setting up Single Node Data Engineering Cluster for Practice

    A common approach to building data engineering applications at scale is to use Apache Spark integrated with HDFS and YARN. Before getting into data engineering using Apache Spark and Hadoop, we need to set up an environment to practice data engineering using Apache Spark. As part of this section, we will primarily focus on setting up a single node cluster to learn key skills related to data engineering using distributed frameworks such as Apache Spark and Apache Hadoop.

    We have simplified the complex tasks of setting up Apache Hadoop, Apache Hive, and Apache Spark leveraging Docker.
    You should be able to set up the cluster within an hour without running into too many technical issues. However, if you run into any issues, feel free to reach out to us and we will help you to overcome the challenges.

    Master required Hadoop Skills to build Data Engineering Applications

    As part of this section, you will primarily focus on HDFS commands so that you can copy files into HDFS. The data copied into HDFS will be used as part of building data engineering pipelines using Spark and Hadoop with Python as a Programming Language.

    Overview of HDFS Commands

    Copy Files into HDFS using the put or copyFromLocal command

    Review whether the files are copied properly or not to HDFS using HDFS Commands.

    Get the size of the files using HDFS commands such as du, df, etc.

    Some fundamental concepts related to HDFS such as block size, replication factor, etc.

    Data Engineering using Spark SQL

    Let us deep-dive into Spark SQL to understand how it can be used to build Data Engineering Pipelines. Spark with SQL gives us the ability to leverage the distributed computing capabilities of Spark coupled with easy-to-use, developer-friendly SQL-style syntax.

    Getting Started with Spark SQL

    Basic Transformations using Spark SQL

    Managing Tables - Basic DDL and DML in Spark SQL

    Managing Tables - DML and Create Partitioned Tables using Spark SQL

    Overview of Spark SQL Functions to manipulate strings, dates, null values, etc.

    Windowing Functions using Spark SQL for ranking, advanced aggregations, etc.

    Data Engineering using Spark Data Frame APIs

    Spark Data Frame APIs are an alternative way of building Data Engineering applications at scale leveraging distributed computing capabilities of Apache Spark.
    Data Engineers from application development backgrounds might prefer Data Frame APIs over Spark SQL to build Data Engineering applications.

    Data Processing Overview using Spark or PySpark Data Frame APIs.

    Projecting or Selecting data from Spark Data Frames, renaming columns, providing aliases, dropping columns from Data Frames, etc. using PySpark Data Frame APIs.

    Processing Column Data using Spark or PySpark Data Frame APIs - You will be learning functions to manipulate strings, dates, null values, etc.

    Basic Transformations on Spark Data Frames using PySpark Data Frame APIs such as Filtering, Aggregations, and Sorting using functions such as filter/where, groupBy with agg, sort or orderBy, etc.

    Joining Data Sets on Spark Data Frames using PySpark Data Frame APIs such as join. You will learn inner joins, outer joins, etc. using the right examples.

    Windowing Functions on Spark Data Frames using PySpark Data Frame APIs to perform advanced Aggregations, Ranking, and Analytic Functions

    Spark Metastore Databases and Tables and integration between Spark SQL and Data Frame APIs

    Development, Deployment as well as Execution Life Cycle of Spark Applications

    Once you go through the content related to Apache Spark using a Jupyter-based environment, we will also walk you through how Spark applications are typically developed using Python, deployed, and reviewed.

    Setup Python Virtual Environment and Project for Spark Application Development using Pycharm

    Understand complete Spark Application Development Lifecycle using Pycharm and Python

    Build a zip file for the Spark Application, copy it to the environment where it is supposed to run, and run.

    Understand how to review the Spark Application Execution Life Cycle.

    Desired Audience for this Data Engineering Essentials course

    People from different backgrounds can aim to become Data Engineers.
    We cover most of the Data Engineering essentials for aspirants who want to get into the IT field as Data Engineers as well as professionals who want to propel their career toward Data Engineering from legacy technologies.

    College students and entry-level professionals to get hands-on expertise with respect to Data Engineering. This course will provide enough skills to face interviews for entry-level data engineers.

    Experienced application developers to gain expertise related to Data Engineering.

    Conventional Data Warehouse Developers, ETL Developers, Database Developers, and PL/SQL Developers to gain enough skills to transition to being successful Data Engineers.

    Testers to improve their testing capabilities related to Data Engineering applications.

    Other hands-on IT Professionals who want to gain knowledge about Data Engineering with Hands-On Practice.

    Prerequisites to practice Data Engineering Skills

    Here are the prerequisites for someone who wants to be a Data Engineer.

    Logistics

    Computer with decent configuration (At least 4 GB RAM, however 8 GB is highly desired). However, this will not suffice if you do not have a multi-node cluster.
    We will walk you through the cheaper options to set up the environment and practice.

    Dual Core is required and Quad-Core is highly desired

    Chrome Browser

    High-Speed Internet

    Desired Background

    Engineering or Science Degree

    Ability to use a computer

    Knowledge or working experience with databases and any programming language is highly desired

    Training Approach for learning required Data Engineering Skills

    Here are the details related to the training approach for you to master all the key Data Engineering Skills to propel your career toward Data Engineering.

    It is self-paced with reference material, code snippets, and videos provided as part of Udemy.

    One can either use the environment provided by us or set up their own environment using Docker on AWS or GCP or the platform of their choice.

    We would recommend completing 2 modules every week by spending 4 to 5 hours per week.

    It is highly recommended to take care of the exercises at the end to ensure that you are able to meet all the key objectives for each module.

    Support will be provided through Udemy Q&A.

    The course is designed in such a way that one can self-evaluate through the course and confirm whether the skills are acquired.

    Here is the approach we recommend for taking this course.

    The course is hands-on with thousands of tasks, so you should practice as you go through the course.

    You should also spend time understanding the concepts. If you do not understand a concept, we would recommend moving on and coming back to the topic later.

    Go through the consolidated exercises and see if you are able to solve the problems or not.

    Make sure to follow the order we have defined as part of the course.

    After each and every section or module, make sure to solve the exercises. We have provided enough information to validate the output.

    By the end of the course, you should be able to conclude that you have mastered the essential skills related to SQL, Python, and Apache Spark.
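
    The agenda above mentions Python's map and filter functions and the itertools module. The following is a minimal sketch of how they can be combined to aggregate delimited records. It is not taken from the course material: the revenue_per_date function, the record layout (order_date,order_status,amount), and the sample lines are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def revenue_per_date(lines):
    """Sum the amounts of COMPLETE orders per date.

    Each line is assumed to look like 'order_date,order_status,amount'.
    """
    # map: split each delimited string into its fields
    records = map(lambda line: line.split(","), lines)
    # filter: keep completed orders only
    completed = filter(lambda rec: rec[1] == "COMPLETE", records)
    # map: keep only the date and the numeric amount
    pairs = map(lambda rec: (rec[0], float(rec[2])), completed)
    # groupby only groups adjacent items, so sort by the key first
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        order_date: sum(amount for _, amount in group)
        for order_date, group in groupby(ordered, key=itemgetter(0))
    }


# Hypothetical sample data
sample_lines = [
    "2023-02-01,COMPLETE,100.0",
    "2023-02-02,COMPLETE,75.0",
    "2023-02-01,CANCELED,20.0",
    "2023-02-01,COMPLETE,50.5",
]

if __name__ == "__main__":
    print(revenue_per_date(sample_lines))
```

    Sorting before calling groupby matters here: without it, records for the same date that are not next to each other would end up in separate groups.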

    Overview

    Section 1: Introduction about the course

    Lecture 1 Introduction about course

    Lecture 2 Desired Audience

    Lecture 3 Pre-requisites

    Lecture 4 [Must Watch] 30 Day Money Back Guarantee - Feedback and Rating

    Lecture 5 Training Approach

    Lecture 6 Overview of Environment for Hands on Practice

    Lecture 7 How to access data sets used in this course?

    Section 2: Getting Started with ITVersity Labs for Data Engineering Essentials on Udemy

    Lecture 8 Introduction to Getting Started with ITVersity Labs and Udemy

    Lecture 9 Logging in into the ITVersity Python and Data Engineering Lab

    Lecture 10 Setup Data Engineering Material from GitHub

    Lecture 11 Overview of ITVersity Labs and Udemy

    Lecture 12 Overview of Jupyter Lab Environment

    Lecture 13 Using Jupyter Lab Sidebar to Navigate through the content

    Lecture 14 Understanding Jupyter Launcher

    Lecture 15 Creating Jupyter Notebooks and Overview of Kernels

    Lecture 16 Managing Tabs and Kernels using Jupyter Lab Environment

    Lecture 17 Overview of Jupyter Notebooks and Cells

    Lecture 18 Running Shell Commands using Jupyter Notebook

    Lecture 19 Getting Information to Connect to Databases to run queries

    Lecture 20 Running SQL Queries using Jupyter Notebooks

    Section 3: Setup Environment to learn Python, SQL, Hadoop, Spark using Docker on Windows 11

    Lecture 21 Setup Environment using Docker on Windows 11 - Introduction

    Lecture 22 Understanding System Configuration of Windows 11 PC

    Lecture 23 Steps to setup Docker Desktop on Windows 11

    Lecture 24 Enable WSL2 on Windows 11 by installing Ubuntu VM using WSL

    Lecture 25 Install Linux Kernel Update Package on Windows 11 for Docker Desktop

    Lecture 26 Download and Install Docker Desktop on Windows 11

    Lecture 27 Validating git using WSL Ubuntu on Windows 11

    Lecture 28 Clone Data Engineering Essentials Material on Windows 11

    Lecture 29 Start Python and SQL Containers using docker-compose command on Windows 11

    Lecture 30 Download and Install Pycharm on Windows 11

    Lecture 31 Setup Pycharm Project for Data Engineering

    Lecture 32 Review Docker Compose File for Data Engineering Essentials Material

    Lecture 33 Review important Docker Compose Commands to manage services

    Lecture 34 Access Jupyter Based Environment to learn Python and SQL

    Lecture 35 Getting Jupyter Lab Token to login into Jupyter Lab

    Section 4: Setup Environment to learn Python, SQL, Hadoop, Spark using Docker on Windows 10

    Lecture 36 Understanding System Configuration

    Lecture 37 Setup Docker Desktop on Windows

    Lecture 38 Validate Docker on Windows using Command Line leveraging Power Shell

    Lecture 39 Review Docker Desktop Resource Configurations

    Lecture 40 Clone GitHub Repository on Windows

    Lecture 41 Setup Pycharm Project for Data Engineering Essentials

    Lecture 42 Update Git Global Settings related to Line Endings

    Lecture 43 Review Services Docker Compose

    Lecture 44 Start Python and SQL Environment using Docker Compose

    Lecture 45 Review resource utilization after setting up Python and SQL Environment

    Lecture 46 Access Jupyter Based Environment to learn Python

    Lecture 47 Getting Jupyter Lab Token to login into Jupyter Lab

    Section 5: Setup Environment to learn Python, SQL, Hadoop and Spark using Docker on Mac

    Lecture 48 Setup Environment using Mac

    Lecture 49 Setup Docker Desktop on Mac

    Lecture 50 Validate Docker Setup on Mac

    Lecture 51 Review Memory and CPU Settings of Docker Desktop for Mac

    Lecture 52 Configure Docker Desktop for Data Engineering Essentials Environment

    Lecture 53 Clone GitHub Repository for Data Engineering Essentials

    Lecture 54 Setup as Pycharm Project to review the files using IDE

    Lecture 55 Review Docker Compose file for Python and SQL Lab

    Lecture 56 Start Python and SQL Environment using Docker Compose

    Lecture 57 Review resource utilization after setting up Python and SQL Environment

    Lecture 58 Access Jupyter Based Environment to learn Python

    Lecture 59 Getting Jupyter Lab Token to login into Jupyter Lab

    Section 6: Setting up Environment to learn Python, SQL as well as Spark using AWS Cloud9

    Lecture 60 Getting Started with Cloud9

    Lecture 61 Creating Cloud9 Environment

    Lecture 62 Warming up with Cloud9 IDE

    Lecture 63 Details about material to setup postgres database using docker

    Lecture 64 Overview of EC2 related to Cloud9

    Lecture 65 Opening ports for Cloud9 Instance

    Lecture 66 Associating Elastic IPs to Cloud9 Instance

    Lecture 67 Increase EBS Volume Size of Cloud9 Instance

    Lecture 68 Setup Docker Compose on AWS Cloud9 Instance

    Lecture 69 Clone GitHub Repository

    Lecture 70 Setup Python and SQL Environment using Docker Compose

    Lecture 71 Update Inbound Rules of AWS EC2 Security Group

    Lecture 72 Login into the Jupyter based environment

    Section 7: Networking Concepts for Beginners - ip addresses and port numbers

    Lecture 73 Enable telnet on Windows

    Lecture 74 Different IP Address Types

    Lecture 75 Port Numbers associated with Applications or Services

    Lecture 76 Reverting port for SSH to default port number

    Lecture 77 Setup Apache2 on Ubuntu

    Lecture 78 Overview of localhost

    Lecture 79 Overview of Private IP Address associated with a server

    Lecture 80 Overview of Public IP Address associated with a server

    Lecture 81 Setup Web Application and access using local ip

    Lecture 82 Setup Web Application and access using private ip

    Lecture 83 Disable Access to Web Application using Public ip

    Lecture 84 Install sshuttle on Mac using brew

    Lecture 85 Access Web Application using Private IP using SSH as proxy

    Section 8: Database Essentials - Getting Started

    Lecture 86 Setup SMS Database using Postgres

    Lecture 87 Connecting to Postgresql Database

    Lecture 88 Using psql to interact with Postgresql Database using CLI

    Lecture 89 Data Loading Utilities in Postgresql

    Section 9: Database Essentials - Database Operations

    Lecture 90 Database Operations - Overview

    Lecture 91 Database CRUD Operations

    Lecture 92 Creating Table in Postgres Database

    Lecture 93 Inserting Data into Postgres Database Table

    Lecture 94 Updating Data in Postgres Database Table

    Lecture 95 Deleting Data in Postgres Database Table

    Lecture 96 Overview of Database Transactions

    Lecture 97 Exercise - DML or CRUD Operations using Postgresql

    Section 10: Database Essentials - Writing Basic SQL Queries

    Lecture 98 Standard Transformations

    Lecture 99 Overview of Data Model

    Lecture 100 Define Problem Statement

    Lecture 101 Preparing Database Tables using Postgres

    Lecture 102 Selecting or Projecting Data from Postgres Database Tables using SQL

    Lecture 103 Filtering Data from Postgres Database Tables using SQL

    Lecture 104 Joining Postgres Database Tables using SQL - Inner

    Lecture 105 Joining Postgres Database Tables using SQL - Outer

    Lecture 106 Performing Aggregations using SQL on Postgres Database Tables

    Lecture 107 Sorting Data in Postgres Tables using SQL

    Lecture 108 Solution - Daily Product Revenue using SQL on Postgres Database Tables

    Lecture 109 Exercises - Writing Basic SQL Queries on Postgres Database Tables

    Section 11: Database Essentials - Creating Tables and Indexes

    Lecture 110 DDL - Data Definition Language

    Lecture 111 Overview of Data Types used while creating Postgres Database Tables

    Lecture 112 Adding or Modifying Columns using Alter in Postgres Database Tables

    Lecture 113 Different Type of Constraints used on Database Tables

    Lecture 114 Managing Constraints on Postgres Database Tables

    Lecture 115 Indexes on Postgres Database Tables

    Lecture 116 Indexes for Constraints on Postgres Database Tables

    Lecture 117 Overview of Sequences used on Postgres Database Tables

    Lecture 118 Truncating Postgres Database Tables

    Lecture 119 Dropping Postgres Database Tables

    Lecture 120 Exercises and Solutions - Managing Database Objects using Postgresql

    Section 12: Database Essentials - Partitioning Tables and Indexes

    Lecture 121 Overview of Partitioning of Postgres Database Tables

    Lecture 122 List Partitioning of Database Tables

    Lecture 123 Managing Partitions of Postgres Database Tables - List

    Lecture 124 Manipulating Data in Postgres Database Partitioned Tables

    Lecture 125 Range Partitioning of Postgres Database Tables

    Lecture 126 Managing Partitions of Postgres Database Tables - Range

    Lecture 127 Repartitioning of Postgres Database Tables - Range

    Lecture 128 Hash Partitioning of Postgres Database Tables

    Lecture 129 Managing Partitions of Postgres Database Tables - Hash

    Lecture 130 Usage Scenarios of Database Partitioned Tables

    Lecture 131 Sub Partitioning of Postgres Database Tables

    Lecture 132 Exercise - Partitioned Tables of Postgres Database Tables

    Section 13: Database Essentials - Predefined Functions

    Lecture 133 Overview of SQL Functions in Postgres

    Lecture 134 String Manipulation Functions in SQL using Postgres

    Lecture 135 Case Conversion and Length using Functions in SQL using Postgres

    Lecture 136 Extracting Data - Using substr and split_part Functions in SQL using Postgres

    Lecture 137 Using position or strpos Functions in SQL using Postgres

    Lecture 138 Trimming and Padding Functions in SQL using Postgres

    Lecture 139 Reverse and Concatenate Multiple Strings using Functions in SQL using Postgres

    Lecture 140 String Replacement using Functions in SQL using Postgres

    Lecture 141 Date Manipulation Functions using SQL in Postgres

    Lecture 142 Getting Current Date or Timestamp using Functions in SQL using Postgres

    Lecture 143 Date Arithmetic using Functions in SQL using Postgres

    Lecture 144 Beginning Date or Time using date_trunc Function in SQL using Postgres

    Lecture 145 Using to_char and to_date Functions in SQL using Postgres

    Lecture 146 Extracting Information using extract Function in SQL using Postgres

    Lecture 147 Dealing with Unix Timestamp or epoch using Functions in SQL using Postgres

    Lecture 148 Overview of Numeric Functions using SQL in Postgres

    Lecture 149 Data Type Conversion using Functions in SQL using Postgres

    Lecture 150 Handling NULL Values using SQL in Postgres

    Lecture 151 Using CASE and WHEN as part of SQL in Postgres

    Section 14: Database Essentials - Writing Advanced SQL Queries

    Lecture 152 Overview of Database Views using Postgres Database

    Lecture 153 Overview of Named Queries using SQL in Postgres

    Lecture 154 Overview of Sub Queries using SQL in Postgres

    Lecture 155 CTAS - Create Table As Select using Postgres

    Lecture 156 Advanced DML Operations on Postgres Database Tables

    Lecture 157 Merging or Upserting Data into Postgres Database Tables

    Lecture 158 Pivoting Rows into Columns using SQL in Postgres

    Lecture 159 Overview of Analytic Functions using SQL in Postgres

    Lecture 160 Analytic Functions - Aggregations using SQL in Postgres

    Lecture 161 Cumulative or Moving Aggregations using SQL in Postgres

    Lecture 162 Analytic Functions using SQL in Postgres - Windowing

    Lecture 163 Analytic Functions using SQL in Postgres - Ranking

    Lecture 164 Analytic Functions using SQL in Postgres - Filtering

    Lecture 165 Ranking and Filtering using SQL in Postgres - Recap

    Lecture 166 Exercises - Writing Advanced Queries

    Section 15: Programming Essentials using Python - Perform Database Operations

    Lecture 167 Introduction - Perform Database Operations

    Lecture 168 Overview of SQL

    Lecture 169 Create Database and Users Table

    Lecture 170 DDL - Data Definition Language

    Lecture 171 DML - Data Manipulation Language

    Lecture 172 DQL - Data Query Language

    Lecture 173 CRUD Operations - DML and DQL

    Lecture 174 TCL - Transaction Control Language

    Lecture 175 Example - Data Engineering

    Lecture 176 Example - Web Application

    Lecture 177 Exercise - Database Operations

    Section 16: Programming Essentials using Python - Getting Started with Python

    Lecture 178 Installing Python on Windows

    Lecture 179 Overview of Anaconda

    Lecture 180 Python CLI and Jupyter Notebook

    Lecture 181 Overview of Jupyter Lab

    Lecture 182 Using IDEs - Pycharm

    Lecture 183 Using Visual Studio Code

    Lecture 184 Using ITVersity Labs

    Lecture 185 Leveraging Google Colab

    Section 17: Programming Essentials using Python - Basic Programming Constructs

    Lecture 186 Basic Programming Constructs using Python - Introduction

    Lecture 187 Getting Help using help function in Python

    Lecture 188 Python Variables and Objects

    Lecture 189 Python Data Types - Commonly Used

    Lecture 190 Operators in Python

    Lecture 191 Tasks - Data Types and Operators using Python

    Lecture 192 Developing Conditionals using Python

    Lecture 193 All about for loops in Python

    Lecture 194 Running os commands in Python

    Lecture 195 Exercises - Basic Programming Constructs using Python

    Lecture 196 Dynamic Arithmetic Operations using eval and exec in Python

    Section 18: Programming Essentials using Python - Predefined Functions

    Lecture 197 Predefined Functions in Python - Introduction

    Lecture 198 Overview of Predefined Functions in Python

    Lecture 199 Numeric Functions in Python

    Lecture 200 Overview of Strings in Python

    Lecture 201 String Manipulation Functions in Python

    Lecture 202 Formatting Strings in Python

    Lecture 203 Print and Input Functions in Python

    Lecture 204 Date Manipulation Functions in Python

    Lecture 205 Exercises - Predefined Functions in Python

    Section 19: Programming Essentials using Python - User Defined Functions

    Lecture 206 Developing User Defined Functions in Python - Introduction

    Lecture 207 Defining Functions in Python

    Lecture 208 Doc Strings in Python

    Lecture 209 Returning Variables from Python Functions

    Lecture 210 Passing Function Parameters and Arguments to Python Functions

    Lecture 211 Varying Arguments in Python

    Lecture 212 Keyword Arguments in Python

    Lecture 213 Recap of User Defined Functions in Python

    Lecture 214 Passing Functions as Arguments to Python Functions

    Lecture 215 Lambda or Anonymous Functions in Python

    Lecture 216 Usage of Lambda Functions in Python Functions

    Lecture 217 Exercise - User Defined Functions in Python

    Section 20: Programming Essentials using Python - Overview of Collections - list and set

    Lecture 218 Overview of Collections in Python - list and set - Introduction

    Lecture 219 Overview of list and set in Python

    Lecture 220 Common Operations on Python Collections

    Lecture 221 Accessing elements from Python list

    Lecture 222 Adding elements to Python list

    Lecture 223 Updating and Deleting elements from Python list

    Lecture 224 Other or Miscellaneous Python list operations

    Lecture 225 Adding and Deleting elements using Python set

    Lecture 226 Typical Python set operations

    Lecture 227 Validating Python sets

    Lecture 228 Usage of Python list and set

    Lecture 229 Exercises - Basic Operations on Python list and set

    Lecture 230 Python List of Delimited Strings

    Lecture 231 Sorting data in Python lists and tuples

    Lecture 232 Sorting list of Delimited Strings using Python

    Lecture 233 Exercises - Sorting lists and sets in Python

    Section 21: Programming Essentials using Python - Overview of Collections - dict and tuple

    Lecture 234 Manipulating Collections using loops in Python - Introduction

    Lecture 235 Overview of Python dict and tuple

    Lecture 236 Common Operations on dict and tuple using Python

    Lecture 237 Accessing Elements from Python tuples

    Lecture 238 Accessing Elements from Python dict

    Lecture 239 Manipulating Python dict

    Lecture 240 Common Examples of Python dict

    Lecture 241 Representing Tables or Excel Sheets as Python List of Tuples

    Lecture 242 Representing Tables or Excel Sheets as Python List of dicts

    Lecture 243 Process Python dict values

    Lecture 244 Processing Python dict items

    Lecture 245 Sorting Python dict items

    Lecture 246 Exercises - Overview of Python Collections - dict and tuple

    Section 22: Programming Essentials using Python - Manipulating Collections using loops

    Lecture 247 Manipulating Collections using loops in Python - Introduction

    Lecture 248 Reading Files into Python Collections

    Lecture 249 Overview of Standard Transformations

    Lecture 250 Row Level Transformations using Python loops

    Lecture 251 Getting Unique Elements using Python loops

    Lecture 252 Filtering Data using Python loops and conditionals

    Lecture 253 Preparing Data Sets

    Lecture 254 Quick recap of Python dict operations

    Lecture 255 Performing Total Aggregations using Python loops

    Lecture 256 Overview of Grouped Aggregations using Python loops

    Lecture 257 Get Order Count by Status using Python loops

    Lecture 258 Get Revenue Details per Order using Python loops

    Lecture 259 Get Order Count by Month using Python loops

    Lecture 260 Joining Data Sets using Python loops

    Lecture 261 Manipulate Collections using Comprehensions in Python

    Lecture 262 List Comprehensions using Python

    Lecture 263 Set Comprehensions using Python

    Lecture 264 Dict Comprehensions in Python

    Lecture 265 Limitations of using loops to process data sets

    Lecture 266 Exercises - Manipulating Collections using Python loops

    Section 23: Programming Essentials using Python - Development of Map Reduce APIs

    Lecture 267 Develop myFilter Function using Python loops and conditionals

    Lecture 268 Validate myFilter using Python loops and conditionals

    Lecture 269 Develop myMap Function using Python loops

    Lecture 270 Validate myMap Function using Python loops

    Lecture 271 Develop myReduce Function using Python loops

    Lecture 272 Validate myReduce Function using Python loops

    Lecture 273 Develop myReduceByKey Function using Python loops

    Lecture 274 Validate myReduceByKey Function using Python loops

    Lecture 275 Develop myJoin Function using Python loops

    Lecture 276 Validate myJoin Function using Python loops

    Lecture 277 Exercises - Development of Map Reduce APIs using Python loops and Conditionals

    Section 24: Programming Essentials using Python - Understanding Map Reduce Libraries

    Lecture 278 Preparing Data Sets

    Lecture 279 Filtering Data using Python filter

    Lecture 280 Projecting data using Python map

    Lecture 281 Row Level Transformations using Python map

    Lecture 282 Aggregations using Python reduce

    Lecture 283 Get Revenue for a given product id using Python Map Reduce

    Lecture 284 Get total items sold and revenue for a product using Python Map reduce

    Lecture 285 Get total commission amount using Python Map Reduce

    Lecture 286 Overview of itertools

    Lecture 287 Cumulative Operations using Python itertools

    Lecture 288 Using Python itertools starmap

    Lecture 289 Overview of Python itertools groupby

    Lecture 290 Get order count by status using Python itertools groupby

    Lecture 291 Get revenue per order using Python itertools groupby

    Lecture 292 Limitations of Python Map Reduce Libraries

    Lecture 293 Exercises - Understanding Python Map Reduce Libraries

    Section 25: Programming Essentials using Python - Basics of File IO using Python

    Lecture 294 Basics of File IO using Python - Introduction

    Lecture 295 Overview of File IO using Python

    Lecture 296 Understand concepts behind Folders and Files

    Lecture 297 Getting File Paths and File Names

    Lecture 298 Overview of Retail Data

    Lecture 299 Read text file into string using Python File I/O

    Lecture 300 Write string to text file using Python File I/O

    Lecture 301 Overview of modes to write into files using Python File I/O

    Lecture 302 Overview of Delimited Strings

    Lecture 303 Read csv into list of strings using Python File I/O

    Lecture 304 Writing Strings to file in Append Mode using Python File I/O

    Lecture 305 Managing Files and Folders using Python File I/O

    Section 26: Programming Essentials using Python - Delimited Files and Collections

    Lecture 306 Understanding Delimited Files and Collections

    Lecture 307 Overview of Delimited Text Files

    Lecture 308 Recap of basic file IO using Python

    Lecture 309 Read Delimited files into list of tuples using Python File I/O

    Lecture 310 Write Delimited Strings into files using Python File I/O

    Lecture 311 Overview of Python CSV Module to process files

    Lecture 312 Read Delimited data into list using Python CSV APIs

    Lecture 313 Writing iterables to files using Python CSV APIs

    Lecture 314 Advantages of using APIs in Python CSV module

    Lecture 315 Apply Schema on lists from files using Python

    Section 27: Programming Essentials using Python - Overview of Pandas Libraries

    Lecture 316 Overview of Python Pandas Libraries

    Lecture 317 Understanding Python Pandas Data Structures

    Lecture 318 Overview of Python Series

    Lecture 319 Creating Python Data Frames from lists

    Lecture 320 Basic Operations on Python Data Frames

    Lecture 321 Reading Data from CSV Files to Python Pandas Data Frames

    Lecture 322 Projecting and Filtering using Python Pandas Data Frame APIs

    Lecture 323 Performing Total Aggregations using Python Pandas Data Frame APIs

    Lecture 324 Performing Grouped Aggregations using Python Pandas Data Frame APIs

    Lecture 325 Writing Python Pandas Data Frames to Files

    Lecture 326 Joining Data in Python Pandas Data Frames using join

    Section 28: Programming Essentials using Python - Database Programming - CRUD Operations

    Lecture 327 Database Operations using Python - CRUD Operations - Introduction

    Lecture 328 Overview of Database Programming using Python

    Lecture 329 Recap of RDBMS Concepts

    Lecture 330 Setup Database Client Libraries for Python Applications

    Lecture 331 Develop Function to get Database Connection using Python

    Lecture 332 Create Database Table in Postgres using Python

    Lecture 333 Inserting Data into Table in Postgres using Python

    Lecture 334 Updating Existing Table Data in Postgres using Python

    Lecture 335 Deleting Data From Table in Postgres using Python

    Lecture 336 Querying Data From Table in Postgres using Python

    Lecture 337 Recap - CRUD Operations using Python

    Section 29: Programming Essentials using Python - Database Programming - Batch Operations

    Lecture 338 Database Programming using Python - Batch Operations - Introduction

    Lecture 339 Recap of Insert using Python

    Lecture 340 Preparing Database to perform batch operations using Python

    Lecture 341 Reading Data From File using Python File I/O

    Lecture 342 Batch Loading of Data into Database Table using Python

    Lecture 343 Best Practices for Batch Loading into Database Table using Python

    Section 30: Programming Essentials using Python - Processing JSON Data

    Lecture 344 Processing JSON Data - Introduction

    Lecture 345 Process JSON using Python Pandas

    Lecture 346 JSON Data Types

    Lecture 347 Create JSON String

    Lecture 348 Process JSON String

    Lecture 349 Single JSON Document in Files

    Lecture 350 Multiple JSON Documents in files

    Lecture 351 Process JSON using Pandas

    Lecture 352 Different JSON Formats supported by Python Pandas

    Lecture 353 Common Use Cases for JSON

    Lecture 354 Write to JSON files using Python json module

    Lecture 355 Write to JSON files using Python Pandas

    Section 31: Programming Essentials using Python - Processing REST Payloads

    Lecture 356 Overview of REST APIs

    Lecture 357 Using curl command

    Lecture 358 Overview of Postman

    Lecture 359 Getting Started with Python requests module

    Lecture 360 Convert REST Payload to Python Objects

    Lecture 361 Process REST Payload using Python Collection Operations

    Lecture 362 Process REST Payload using Python Pandas

    Section 32: Understanding Python Virtual Environments

    Lecture 363 Introduction to Python Virtual Environments

    Lecture 364 Validating Python Versions

    Lecture 365 Create Python Virtual Environment for Web Application

    Lecture 366 Reviewing dependencies installed in Python Virtual Environment

    Lecture 367 Installing Dependencies for Web Application using Python pip

    Lecture 368 Getting Details about installed packages using Python pip

    Lecture 369 Uninstall Packages using Python pip

    Lecture 370 Cleanup Python Virtual Environment

    Lecture 371 Recreate and Activate Python Virtual Environment for Web Application

    Lecture 372 Define requirements file for Python Web Application

    Lecture 373 Install Dependencies using requirements file for Python Web Application

    Lecture 374 Create Virtual Environment for Data Engineering Application using Python

    Lecture 375 Install Dependencies for Data Engineering Application using Python

    Lecture 376 Install Dependencies for Data Engineering Application using Python 3.6

    Lecture 377 Validate Python and Package Compatibility and Install Python 3.6

    Lecture 378 Conclusion about understanding Python Virtual Environments

    Section 33: Overview of Pycharm for Python Application Development

    Lecture 379 Introduction to Pycharm for Python Application Development

    Lecture 380 Installation of Pycharm on Windows for Python Application Development

    Lecture 381 Installation of Pycharm on Mac for Python Application Development

    Lecture 382 Setup Python Getting Started Project using Pycharm

    Lecture 383 Setup Python Getting Started Project using Pycharm on Mac

    Lecture 384 Setup de-demo Python project using Pycharm

    Lecture 385 Accessing Settings in Pycharm and Changing Font Size

    Lecture 386 Accessing Settings in Pycharm and Changing Font Size on Mac

    Lecture 387 Install Python Packages using Pycharm

    Lecture 388 Overview of Pycharm Integrated Terminal

    Lecture 389 Overview of Pycharm Integrated Terminal on Mac

    Lecture 390 Overview of Run Time Arguments for Python Applications

    Lecture 391 Passing Run Time Arguments to Python Applications using Pycharm

    Section 34: Data Copier - Getting Started

    Lecture 392 Introduction to Getting Started for Data copier using Python

    Lecture 393 Problem Statement - Data Copier using Python

    Lecture 394 Create Working Directory for the Python Project

    Lecture 395 Setup Docker on Windows 10 Pro

    Lecture 396 Quick Overview of Docker

    Lecture 397 Prepare Dataset

    Lecture 398 Create Postgres Container

    Lecture 399 Setup Postgres Database for development

    Lecture 400 Overview of Postgres Database Commands

    Lecture 401 Setup Python Project using Pycharm

    Lecture 402 Managing Python Dependencies for the project

    Lecture 403 Create GitHub Project

    Section 35: Data Copier - Reading Data using Pandas

    Lecture 404 Reading Data using Python Pandas - Introduction

    Lecture 405 Overview of Retail Data

    Lecture 406 Adding Python Pandas to the project

    Lecture 407 Reading JSON Data using Python Pandas

    Lecture 408 Previewing Data using Python Pandas

    Lecture 409 Reading Data in Chunks using Python Pandas

    Lecture 410 Dynamically read files using Python os module

    Section 36: Data Copier - Database Programming using Pandas

    Lecture 411 Database Programming using Python Pandas - Introduction

    Lecture 412 Validate Postgres Setup using Docker

    Lecture 413 Add required dependencies for database programming using Python pandas

    Lecture 414 Create users table in retail_db Database

    Lecture 415 Populating Sample Data into users table

    Lecture 416 Reading data from table using Python Pandas

    Lecture 417 Truncate users Postgres Database Table

    Lecture 418 Writing Python Pandas Dataframe to table

    Lecture 419 Validating users data in Postgres Database Table

    Lecture 420 Drop users Postgres Database Table

    Section 37: Data Copier - Loading Data from files to tables

    Lecture 421 Loading Data from files to tables - Introduction

    Lecture 422 Populating Departments data into table

    Lecture 423 Validate departments table

    Lecture 424 Populating orders table in chunks using Python Pandas

    Lecture 425 Validate orders table in Postgres Database

    Lecture 426 Validate orders table using pandas

    Section 38: Data Copier - Modularizing the application

    Lecture 427 Overview of Python main function

    Lecture 428 Overview of Python Environment Variables

    Lecture 429 Using Python os module for Environment Variables

    Lecture 430 Passing Environment Variables to Python Applications using Pycharm

    Lecture 431 Read logic using Python Pandas

    Lecture 432 Validate read logic developed using Python Pandas

    Lecture 433 Write logic using Python Pandas

    Lecture 434 Validate write logic developed using Python Pandas

    Lecture 435 Integrate read and write logic using Python

    Lecture 436 Validate Integration logic developed using Python

    Lecture 437 Develop logic to load multiple tables using Python

    Lecture 438 Validate Python logic for table list as run time argument

    Lecture 439 Push Python Application Changes to remote git repository

    Section 39: Data Copier - Dockerizing the application

    Lecture 440 Dockerizing the application - Introduction

    Lecture 441 Prepare Database for validation

    Lecture 442 Pull and validate appropriate python image

    Lecture 443 Create and attach network to database docker container

    Lecture 444 Quick recap about Docker containers

    Lecture 445 Review Python based Data Copier Application

    Lecture 446 Deploying Python application and installing dependencies in the docker container

    Lecture 447 Copy source data files into container

    Lecture 448 Add Python Data Copier container to custom network

    Lecture 449 Installing OS libraries as part of Docker container

    Lecture 450 Validate Network Connectivity between Docker Containers

    Lecture 451 Running Application from the Docker Container

    Lecture 452 Delete Docker Container

    Section 40: Data Copier - Using custom Docker Image

    Lecture 453 Using Custom Docker Image - Introduction

    Lecture 454 Getting started with docker custom image

    Lecture 455 Install OS Modules in custom docker image

    Lecture 456 Copying Python Source Code to Docker Custom Image

    Lecture 457 Adding dependencies to the custom image

    Lecture 458 Understanding docker custom image build process

    Lecture 459 Mounting Data Folders on to Docker Container

    Lecture 460 Passing Environment Variables to Docker Container

    Lecture 461 Add Python Data Copier Container to custom network

    Lecture 462 Run Python application using Docker

    Section 41: Data Copier - Deploy and Validate Application on Remote Server

    Lecture 463 Deploy and Validate Python Application on Remote Server - Introduction

    Lecture 464 Push Application Changes to GitHub Repository

    Lecture 465 Requirements to deploy application on Virtual Machine

    Lecture 466 Clone Application on remote machine

    Lecture 467 Setup Data Set for Validation

    Lecture 468 Setup Network and Database Folder for Database using Docker

    Lecture 469 Setup Docker Container for the Database

    Lecture 470 Setup Database and Tables as part of Docker based Database Server

    Lecture 471 Building Custom Docker Image for application

    Lecture 472 Run and Validate Dockerized Application

    Section 42: Validate ITVersity Hadoop and Spark Cluster (for ITVersity lab customers)

    Lecture 473 Setup Development Environment using VS Code Remote Development Extension Pack

    Lecture 474 Review Data Sets Provided as part of Gateway Nodes of Hadoop and Spark Cluster

    Lecture 475 Validate HDFS on Multi Node Hadoop and Spark Cluster from Gateway Node

    Lecture 476 Validate Hive on Hadoop and Spark Multinode Cluster

    Lecture 477 Review Hadoop HDFS and YARN Property Files on Hadoop and Spark Cluster

    Lecture 478 Review Hadoop HDFS and YARN Property Files using Visual Studio Code Editor

    Lecture 479 Review Hive Property Files on Multinode Hadoop and Spark Cluster

    Lecture 480 Review Spark 2 Property Files and Important Properties

    Lecture 481 Validate Spark Shell CLI using Spark 2

    Lecture 482 Validate Pyspark CLI using Spark 2

    Lecture 483 Validate Spark SQL CLI using Spark 2

    Lecture 484 Review Spark 3 Property Files and Important Properties

    Lecture 485 Validate Spark Shell CLI using Spark 3

    Lecture 486 Validate Pyspark CLI using Spark 3

    Lecture 487 Validate Spark SQL CLI using Spark 3

    Section 43: Setup Single Node Hadoop and Spark Cluster or Lab using Docker

    Lecture 488 Setup Single Node Hadoop and Spark Cluster or Lab using Docker

    Lecture 489 Pre-requisites to setup Hadoop and Spark Lab

    Lecture 490 Configure Docker Desktop

    Lecture 491 Update Hadoop and Spark Content

    Lecture 492 Clone GitHub Repository to setup and learn Hadoop and Spark

    Lecture 493 Cleaning up Docker Containers used for Python and SQL Practice

    Lecture 494 Review Hadoop and Spark Lab details in Docker Compose File

    Lecture 495 Pull Docker Image for Single Node Hadoop and Spark

    Lecture 496 Start Docker Containers related to Hadoop and Spark

    Lecture 497 Overview of reviewing Hadoop and Spark Lab setup using Docker

    Lecture 498 Connecting to Terminal of Spark and Hadoop Containers

    Lecture 499 Review HDFS and YARN on Single Node Hadoop and Spark Cluster

    Lecture 500 Review and Validate Hive on Single Node Hadoop and Spark Cluster

    Lecture 501 Validate Spark 2 using Pyspark and Spark SQL on Single Node Lab

    Lecture 502 Validate Spark 3 using Pyspark and Spark SQL on Single Node Lab

    Lecture 503 Validate Hive Metastore used as part of Single Node Hadoop and Spark Cluster

    Lecture 504 Access Hadoop and Spark Material using Jupyter lab environment

    Lecture 505 Managing Single Node Hadoop and Spark Cluster using Docker

    Section 44: Introduction to Hadoop eco system - Overview of HDFS

    Lecture 506 Getting help or usage

    Lecture 507 Listing HDFS Files

    Lecture 508 Managing HDFS Directories

    Lecture 509 Copying files from local to HDFS

    Lecture 510 Copying files from HDFS to local

    Lecture 511 Getting Files Metadata

    Lecture 512 Previewing Data in HDFS Files

    Lecture 513 HDFS Block Size

    Lecture 514 HDFS Replication Factor

    Lecture 515 Getting HDFS Storage Usage

    Lecture 516 Using HDFS Stat Commands

    Lecture 517 HDFS File Permissions

    Lecture 518 Overriding Properties of Hadoop or HDFS commands

    Section 45: Data Engineering using Spark SQL - Getting Started

    Lecture 519 Getting Started - Overview

    Lecture 520 Overview of Spark Documentation

    Lecture 521 Launching and using Spark SQL CLI

    Lecture 522 Overview of Spark SQL Properties

    Lecture 523 Running OS Commands using Spark SQL

    Lecture 524 Understanding Warehouse Directory

    Lecture 525 Managing Spark Metastore Databases

    Lecture 526 Managing Spark Metastore Tables

    Lecture 527 Retrieve Metadata of Tables

    Lecture 528 Role of Spark Metastore or Hive Metastore

    Lecture 529 Exercise - Getting Started with Spark SQL

    Section 46: Data Engineering using Spark SQL - Basic Transformations

    Lecture 530 Basic Transformations - Introduction

    Lecture 531 Spark SQL - Overview

    Lecture 532 Define Problem Statement

    Lecture 533 Prepare Tables

    Lecture 534 Projecting Data

    Lecture 535 Filtering Data

    Lecture 536 Joining Tables - Inner

    Lecture 537 Joining Tables - Outer

    Lecture 538 Aggregating Data

    Lecture 539 Sorting Data

    Lecture 540 Conclusion - Final Solution

    Section 47: Data Engineering using Spark SQL - Managing Tables - Basic DDL and DML

    Lecture 541 Introduction

    Lecture 542 Create Spark Metastore Tables

    Lecture 543 Overview of Data Types

    Lecture 544 Adding Comments

    Lecture 545 Loading Data Into Tables - Local

    Lecture 546 Loading Data Into Tables - HDFS

    Lecture 547 Loading Data - Append and Overwrite

    Lecture 548 Creating External Tables

    Lecture 549 Managed Tables vs External Tables

    Lecture 550 Overview of File Formats

    Lecture 551 Drop Tables and Databases

    Lecture 552 Truncating Tables

    Lecture 553 Exercise - Managed Tables

    Section 48: Data Engineering using Spark SQL - Managing Tables - DML and Partitioning

    Lecture 554 Introduction - Managing Tables - DML and Partitioning

    Lecture 555 Introduction to Partitioning

    Lecture 556 Creating Tables using Parquet

    Lecture 557 Load vs Insert

    Lecture 558 Inserting Data using Stage Table

    Lecture 559 Creating Partitioned Tables

    Lecture 560 Adding Partitions to Tables

    Lecture 561 Loading Data into Partitioned Tables

    Lecture 562 Inserting Data into Partitions

    Lecture 563 Using Dynamic Partition Mode

    Lecture 564 Exercise - Partitioned Tables

    Section 49: Data Engineering using Spark SQL - Overview of Spark SQL Functions

    Lecture 565 Introduction - Overview of Spark SQL Functions

    Lecture 566 Overview of Functions

    Lecture 567 Validating Functions

    Lecture 568 String Manipulation Functions

    Lecture 569 Date Manipulation Functions

    Lecture 570 Overview of Numeric Functions

    Lecture 571 Data Type Conversion

    Lecture 572 Dealing with Nulls

    Lecture 573 Using CASE and WHEN

    Lecture 574 Query Example - Word Count

    Section 50: Data Engineering using Spark SQL - Windowing Functions

    Lecture 575 Introduction - Windowing Functions

    Lecture 576 Prepare HR Database

    Lecture 577 Overview of Windowing Functions

    Lecture 578 Aggregations using Windowing Functions

    Lecture 579 Using LEAD or LAG

    Lecture 580 Getting first and last values

    Lecture 581 Ranking using Windowing Functions

    Lecture 582 Order of execution of SQL

    Lecture 583 Overview of Subqueries

    Lecture 584 Filtering Windowing Function Results

    Section 51: Apache Spark using Python - Data Processing Overview

    Lecture 585 Starting Spark Context - pyspark

    Lecture 586 Overview of Spark Read APIs

    Lecture 587 Understanding airlines data

    Lecture 588 Inferring Schema

    Lecture 589 Previewing Airlines Data

    Lecture 590 Overview of Data Frame APIs

    Lecture 591 Overview of Functions

    Lecture 592 Overview of Spark Write APIs

    Section 52: Apache Spark using Python - Processing Column Data

    Lecture 593 Overview of Predefined Functions in Spark

    Lecture 594 Create Dummy Data Frame

    Lecture 595 Categories of Functions

    Lecture 596 Special Functions - col and lit

    Lecture 597 Common String Manipulation Functions

    Lecture 598 Extracting Strings using substring

    Lecture 599 Extracting Strings using split

    Lecture 600 Padding Characters around Strings

    Lecture 601 Trimming Characters from Strings

    Lecture 602 Date and Time Manipulation Functions

    Lecture 603 Date and Time Arithmetic

    Lecture 604 Using Date and Time Trunc Functions

    Lecture 605 Date and Time Extract Functions

    Lecture 606 Using to_date and to_timestamp

    Lecture 607 Using date_format Function

    Lecture 608 Dealing with Unix Timestamp

    Lecture 609 Dealing with Nulls

    Lecture 610 Using CASE and WHEN

    Section 53: Apache Spark using Python - Basic Transformations

    Lecture 611 Overview of Basic Transformations

    Lecture 612 Data Frames for basic transformations

    Lecture 613 Basic Filtering of Data

    Lecture 614 Filtering Example using dates

    Lecture 615 Boolean Operators

    Lecture 616 Using IN Operator or isin Function

    Lecture 617 Using LIKE Operator or like Function

    Lecture 618 Using BETWEEN Operator

    Lecture 619 Dealing with Nulls while Filtering

    Lecture 620 Total Aggregations

    Lecture 621 Aggregate data using groupBy

    Lecture 622 Aggregate data using rollup

    Lecture 623 Aggregate data using cube

    Lecture 624 Overview of Sorting Data Frames

    Lecture 625 Solution - Problem 1 - Get Total Aggregations

    Lecture 626 Solution - Problem 2 - Get Total Aggregations By FlightDate

    Section 54: Apache Spark using Python - Joining Data Sets

    Lecture 627 Prepare Datasets for Joins

    Lecture 628 Analyze Datasets for Joins

    Lecture 629 Problem Statements for Joins

    Lecture 630 Overview of Joins

    Lecture 631 Using Inner Joins

    Lecture 632 Left or Right Outer Join

    Lecture 633 Solution - Get Flight Count Per US Airport

    Lecture 634 Solution - Get Flight Count Per US State

    Lecture 635 Solution - Get Dormant US Airports

    Lecture 636 Solution - Get Origins without master data

    Lecture 637 Solution - Get Count of Flights without master data

    Lecture 638 Solution - Get Count of Flights per Airport without master data

    Lecture 639 Solution - Get Daily Revenue

    Lecture 640 Solution - Get Daily Revenue rolled up till Yearly

    Section 55: Apache Spark using Python - Spark Metastore

    Lecture 641 Overview of Spark Metastore

    Lecture 642 Exploring Spark Catalog

    Lecture 643 Creating Metastore Tables using catalog

    Lecture 644 Inferring Schema for Tables

    Lecture 645 Define Schema for Tables using StructType

    Lecture 646 Inserting into Existing Tables

    Lecture 647 Read and Process data from Metastore Tables

    Lecture 648 Create Partitioned Tables

    Lecture 649 Saving as Partitioned Table

    Lecture 650 Creating Temporary Views

    Lecture 651 Using Spark SQL

    Section 56: Getting Started with Semi Structured Data using Spark

    Lecture 652 Introduction to Getting Started with Semi Structured Data using Spark

    Lecture 653 Create Spark Metastore Table with Special Data Types

    Lecture 654 Overview of ARRAY Type in Spark Metastore Table

    Lecture 655 Overview of MAP and STRUCT Type in Spark Metastore Table

    Lecture 656 Insert Data into Spark Metastore Table with Special Type Columns

    Lecture 657 Create Spark Data Frame with Special Data Types

    Lecture 658 Create Spark Data Frame with Special Types using Python List

    Lecture 659 Insert Spark Data Frame with Special Types into Spark Metastore Table

    Lecture 660 Review Data in the JSON File with Special Data Types

    Lecture 661 Setup JSON Data Set to explore Spark APIs on Special Data Type Columns

    Lecture 662 Read JSON Data with Special Types into Spark Data Frame

    Lecture 663 Flatten Array Fields in Spark Data Frames using explode and explode_outer

    Lecture 664 Get Size or Length of Array Type Columns in Spark Data Frame

    Lecture 665 Concatenate Array Values into Delimited String using Spark APIs

    Lecture 666 Convert Delimited Strings from Spark Data Frame Columns to Arrays

    Lecture 667 Setup Data Sets to Build Arrays using Spark

    Lecture 668 Read JSON Data into Spark Data Frame and Review Aggregate Operations

    Lecture 669 Build Arrays from Flattened Rows of Spark Data Frame

    Lecture 670 Getting Started with Spark Data Frames with Struct Columns

    Lecture 671 Concatenate Struct Column Values in Spark Data Frame

    Lecture 672 Filter Data on Struct Column Attributes in Spark Data Frame

    Lecture 673 Create Spark Data Frame using Map Type Column

    Lecture 674 Project Map Values as Columns using Spark Data Frame APIs

    Lecture 675 Conclusion of Getting Started with Semi Structured Data using Spark

    Section 57: Process Semi Structured Data using Spark Data Frame APIs

    Lecture 676 Introduction to Process Semi Structured Data using Spark Data Frame APIs

    Lecture 677 Review the Data Sets to generate denormalized JSON Data using Spark

    Lecture 678 Setup JSON Data Sets in HDFS using HDFS Command

    Lecture 679 Create Spark Data Frames using Data Frame APIs

    Lecture 680 Join Orders and Order Items using Spark Data Frame APIs

    Lecture 681 Generate Struct Field for Order Details using Spark

    Lecture 682 Generate Array of Struct Field for Order Details using Spark

    Lecture 683 Join Data Sets to generate denormalized JSON Data using Spark

    Lecture 684 Denormalize Join Results using Spark Data Frame APIs

    Lecture 685 Write Denormalized Customer Details to JSON Files using Spark

    Lecture 686 Publish JSON Files for downstream applications

    Lecture 687 Read Denormalized Data into Spark Data Frame

    Lecture 688 Filter Denormalized Data Frame using Spark APIs

    Lecture 689 Perform Aggregations on Denormalized Data Frame using Spark

    Lecture 690 Flatten Semi Structured Data or Denormalized Data using Spark

    Lecture 691 Compute Monthly Customer Revenue using Spark on Denormalized Data

    Lecture 692 Conclusion of Processing Semi Structured Data using Spark Data Frame APIs

    Section 58: Apache Spark - Development Life Cycle using Python

    Lecture 693 Setup Virtual Environment and Install Pyspark

    Lecture 694 [Commands] - Setup Virtual Environment and Install Pyspark

    Lecture 695 Getting Started with Pycharm

    Lecture 696 - Getting Started with Pycharm
    
    Lecture 697 Passing Run Time Arguments
    
    Lecture 698 Accessing OS Environment Variables
    
    Lecture 699 Getting Started with Spark
    
    Lecture 700 Create Function for Spark Session
    
    Lecture 701 [code] - Create Function for Spark Session
    
    Lecture 702 Setup Sample Data
    
    Lecture 703 Read Data from Files
    
    Lecture 704 [code] - Read data from files
    
    Lecture 705 Process Data using Spark APIs
    
    Lecture 706 [code] - Process data using Spark APIs
    
    Lecture 707 Write Data to Files
    
    Lecture 708 [code] - Write data to files
    
    Lecture 709 Validating Writing Data to Files
    
    Lecture 710 Productionizing the Code
    
    Lecture 711 [code] - Productionizing the code
    
    Lecture 712 Setting up Data for Production Validation
    
    Lecture 713 Running Application using YARN
    
    Lecture 714 Detailed Validation of the Application
    
    Section 59: Spark Application Execution Life Cycle and Spark UI
    
    Lecture 715 Deploying and Monitoring Spark Applications - Introduction
    
    Lecture 716 Overview of Types of Spark Cluster Managers
    
    Lecture 717 Setup EMR Cluster with Hadoop and Spark
    
    Lecture 718 Overall Capacity of Big Data Cluster with Hadoop and Spark
    
    Lecture 719 Understanding YARN Capacity of an Enterprise Cluster
    
    Lecture 720 Overview of Hadoop HDFS and YARN Setup on Multi-node Cluster
    
    Lecture 721 Overview of Spark Setup on top of Hadoop
    
    Lecture 722 Setup Data Set for Word Count application
    
    Lecture 723 [Instructions and Commands] Setup Data Set for Word Count Application
    
    Lecture 724 Develop Word Count Application
    
    Lecture 725 [code] Develop Word Count Application
    
    Lecture 726 Review Deployment Process of Spark Application
    
    Lecture 727 Overview of Spark Submit Command
    
    Lecture 728 Switching between Python Versions to run Spark Apps or launch Pyspark CLI
    
    Lecture 729 Switching between Pyspark Versions to run Spark Apps or launch Pyspark CLI
    
    Lecture 730 Review Spark Configuration Properties at Run Time
    
    Lecture 731 Develop Shell Script to run Spark Application
    
    Lecture 732 [code] Develop Shell Script to run Spark Application
    
    Lecture 733 Run Spark Application and review default executors
    
    Lecture 734 Overview of Spark History Server UI
    
    Section 60: Setup SSH Proxy to access Spark Application logs
    
    Lecture 735 Setup SSH Proxy to access Spark Application logs - Introduction
    
    Lecture 736 Overview of Private and Public IPs of servers in the cluster
    
    Lecture 737 Overview of SSH Proxy
    
    Lecture 738 Setup sshuttle on Mac or Linux
    
    Lecture 739 Proxy using sshuttle on Mac or Linux
    
    Lecture 740 Accessing Spark Application logs via SSH Proxy using sshuttle on Mac or Linux
    
    Lecture 741 Side effects of using SSH Proxy to access Spark Application Logs
    
    Lecture 742 Steps to setup SSH Proxy on Windows to access Spark Application Logs
    
    Lecture 743 Setup PuTTY and PuTTYgen on Windows
    
    Lecture 744 Quick Tour of PuTTY on Windows
    
    Lecture 745 Configure Passwordless Login using PuTTYGen Keys on Windows
    
    Lecture 746 Run Spark Application on Gateway Node using PuTTY
    
    Lecture 747 Configure Tunnel to Gateway Node using PuTTY on Windows for SSH Proxy
    
    Lecture 748 Setup Proxy on Windows and validate using Microsoft Edge browser
    
    Lecture 749 Understanding Proxying Network Traffic overcoming Windows Caveats
    
    Lecture 750 Update Hosts file for worker nodes using private IPs
    
    Lecture 751 Access Spark Application logs using SSH Proxy
    
    Lecture 752 Overview of performing tasks related to Spark Applications using Mac
    
    Section 61: Deployment Modes of Spark Applications
    
    Lecture 753 Deployment Modes of Spark Applications - Introduction
    
    Lecture 754 Default Execution Master Type for Spark Applications
    
    Lecture 755 Launch Pyspark using local mode
    
    Lecture 756 Running Spark Applications using Local Mode
    
    Lecture 757 Overview of Spark CLI Commands such as Pyspark
    
    Lecture 758 Accessing Local Files using Spark CLI or Spark Applications
    
    Lecture 759 Overview of submitting spark application using client deployment mode
    
    Lecture 760 Overview of submitting spark application using cluster deployment mode
    
    Lecture 761 Review the default logging while submitting Spark Applications
    
    Lecture 762 Changing Spark Application Log Level using custom log4j properties
    
    Lecture 763 Submit Spark Application using client mode with log level info
    
    Lecture 764 Submit Spark Application using cluster mode with log level info
    
    Lecture 765 Submit Spark Applications using SPARK_CONF_DIR with custom properties files
    
    Lecture 766 Submit Spark Applications using Properties File
    
    Computer Science or IT Students or other graduates with passion to get into IT

    Data Warehouse Developers who want to transition to Data Engineering roles

    ETL Developers who want to transition to Data Engineering roles

    Database or PL/SQL Developers who want to transition to Data Engineering roles

    BI Developers who want to transition to Data Engineering roles

    QA Engineers to learn about Data Engineering

    Application Developers to gain Data Engineering Skills