Data Engineering Master Course: Spark/Hadoop/Kafka/Mongodb
Last updated 5/2025
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 KHz
Language: English | Size: 5.61 GB | Duration: 12h 12m
Full hands-on course to become a Big Data Engineer: Spark/Kafka/Hadoop/Flume/Hive/Sqoop/MongoDB. Data Engineering course.
What you'll learn
Hadoop Ecosystem, Sqoop, Flume, Hive
Expertise in writing code with Apache Spark
Learn Kafka fundamentals and how to use Kafka Connectors
Learn to write queries and clients in MongoDB
Learn Data Engineering technologies
Requirements
None
Description
In this course, you will start by learning what the Hadoop Distributed File System (HDFS) is and the most common Hadoop commands required to work with it.

Then you will be introduced to Sqoop Import:
Understand the lifecycle of a Sqoop command.
Use the sqoop import command to migrate data from MySQL to HDFS.
Use the sqoop import command to migrate data from MySQL to Hive.
Use various file formats, compressions, field delimiters, where clauses, and queries while importing the data.
Understand split-by and boundary queries.
Use incremental mode to migrate data from MySQL to HDFS.

Further, you will learn Sqoop Export to migrate data:
What Sqoop export is.
Using sqoop export, migrate data from HDFS to MySQL.
Using sqoop export, migrate data from Hive to MySQL.

Further, you will learn about Apache Flume:
Understand Flume architecture.
Using Flume, ingest data from Twitter and save it to HDFS.
Using Flume, ingest data from netcat and save it to HDFS.
Using Flume, ingest data from exec and show it on the console.
Describe Flume interceptors and see examples of using interceptors.
Flume multiple agents
Flume consolidation

In the next section, we will learn about Apache Hive:
Hive Intro
External & Managed Tables
Working with Different Files - Parquet, Avro
Compressions
Hive Analysis
Hive String Functions
Hive Date Functions
Partitioning
Bucketing

You will learn about Apache Spark:
Spark Intro
Cluster Overview
RDD
DAG/Stages/Tasks
Actions & Transformations
Transformation & Action Examples
Spark Dataframes
Spark Dataframes - Working with Different File Formats & Compression
Dataframe APIs
Spark SQL
Dataframe Examples
Spark with Cassandra Integration
Running Spark on Intellij IDE
Running Spark on EMR

You will learn about Apache Kafka:
Kafka Architecture
Partitions and Offsets
Kafka Producers and Consumers
Kafka SerDes
Kafka Messages
Kafka Connector
Ingesting Data using Kafka Connector

You will learn about MongoDB:
MongoDB Use Cases
CRUD Operations
MongoDB Operators
Working with Arrays
MongoDB with Spark

Data Engineering Interview Preparation:
Sqoop Interview Questions
Hive Interview Questions
Spark Interview Questions
Data Engineering common questions
Data Engineering real project questions
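The description lists split-by and boundary queries among the Sqoop Import topics. As a rough illustration of the idea only: Sqoop finds the minimum and maximum of the split-by column (the boundary query) and divides that range among its mappers. The sketch below is a simplified approximation of that idea for an integer column, not Sqoop's actual splitter. The function name `split_ranges` and its parameters are hypothetical and are not taken from the course.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive integer range [lo, hi] into contiguous sub-ranges.

    Hypothetical helper for illustration; a simplified model of how a
    boundary (min, max) could be shared out among parallel import tasks.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")

    total = hi - lo + 1
    # Never create more sub-ranges than there are distinct values.
    parts = min(num_mappers, total)
    base, extra = divmod(total, parts)

    ranges = []
    start = lo
    for i in range(parts):
        # The first `extra` sub-ranges take one additional value each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    # Assumed example: ids 1..100 shared out among 4 parallel tasks.
    for low, high in split_ranges(1, 100, 4):
        print(f"WHERE id >= {low} AND id <= {high}")
```

Each printed line stands for the slice of rows one parallel task would read. A heavily skewed split-by column would make these slices uneven in row count, which is one reason the choice of split column matters.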
Overview
Section 1: Big Data Introduction
Lecture 1 Meet your Instructor
Lecture 2 Course Intro
Lecture 3 Big Data Intro
Lecture 4 Understanding Big Data Ecosystem
Section 2: Google Cloud Cluster Setup
Lecture 5 Google Cloud Account Setup
Lecture 6 Troubleshooting Guide (April 2025)
Lecture 7 Dataproc Cluster Setup - Part1
Lecture 8 DataProc Cluster Setup - Part2
Lecture 9 Upload Files on Google Cloud
Lecture 10 Sqoop Setup
Lecture 11 Environment Update
Section 3: Hadoop & Yarn
Lecture 12 HDFS and Hadoop Commands
Lecture 13 Yarn Cluster Overview
Section 4: Sqoop Import
Lecture 14 Sqoop Introduction
Lecture 15 Managing Target Directories
Lecture 16 Working with Different Compressions
Lecture 17 Conditional Imports
Lecture 18 Split-by and Boundary Queries
Lecture 19 Field Delimiters
Lecture 20 Incremental Appends
Lecture 21 Sqoop-Hive Cluster Fix
Lecture 22 Access Hive on Google Cloud
Lecture 23 Sqoop Hive Import
Lecture 24 Sqoop List Tables/Database
Lecture 25 Sqoop Import Practice1
Lecture 26 Sqoop Import Practice2
Section 5: Sqoop Export
Lecture 27 Export from HDFS to MySQL
Lecture 28 Export from Hive to MySQL
Lecture 29 Export Avro Compressed to MySQL
Lecture 30 Bonus Lecture: Sqoop with Airflow
Section 6: Apache Flume
Lecture 31 Flume Setup
Lecture 32 Flume Introduction & Architecture
Lecture 33 Exec Source and Logger Sink
Lecture 34 Moving data from Twitter to HDFS
Lecture 35 Moving data from NetCat to HDFS
Lecture 36 Flume Interceptors
Lecture 37 Flume Interceptor Example
Lecture 38 Flume Multi-Agent Flow
Lecture 39 Flume Consolidation
Section 7: Apache Hive
Lecture 40 Access Hive Shell on Google Cloud
Lecture 41 Hive Introduction
Lecture 42 Hive Database
Lecture 43 Hive Managed Tables
Lecture 44 Hive External Tables
Lecture 45 Hive Inserts
Lecture 46 Hive Analytics
Lecture 47 Working with Parquet
Lecture 48 Compressing Parquet
Lecture 49 Working with Fixed File Format
Lecture 50 Alter Command
Lecture 51 Hive String Functions
Lecture 52 Hive Date Functions
Lecture 53 Hive Partitioning
Lecture 54 Hive Bucketing
Section 8: Spark with Yarn & HDFS
Lecture 55 What is Apache Spark
Lecture 56 Understanding Cluster Manager (Yarn)
Lecture 57 Understanding Distributed Storage (HDFS)
Lecture 58 Running Spark on Yarn/HDFS
Lecture 59 Understanding Deploy Modes
Section 9: GCS Cluster
Lecture 60 Spark on GCS Cluster
Lecture 61 Upload Data files for Spark
Section 10: Spark Internals
Lecture 62 Drivers & Executors
Lecture 63 RDDs & Dataframes
Lecture 64 Transformation & Actions
Lecture 65 Wide & Narrow Transformations
Lecture 66 Understanding Execution Plan
Lecture 67 Different Plans by Driver
Section 11: Spark RDD : Transformation & Actions
Lecture 68 Map/FlatMap Transformation
Lecture 69 Filter/Intersection
Lecture 70 Union/Distinct Transformation
Lecture 71 GroupByKey/ Group people based on Birthday months
Lecture 72 ReduceByKey / Total Number of students in each Subject
Lecture 73 SortByKey / Sort students based on their rollno
Lecture 74 MapPartition / MapPartitionWithIndex
Lecture 75 Change number of Partitions
Lecture 76 Join / join email address based on customer name
Lecture 77 Spark Actions
Section 12: Spark RDD Practice
Lecture 78 Upload Files
Lecture 79 Scala Tuples
Lecture 80 Filter Error Logs
Lecture 81 Frequency of word in Text File
Lecture 82 Population of each city
Lecture 83 Orders placed by Customers
Lecture 84 Average rating of movie
Section 13: Spark Dataframes & Spark SQL
Lecture 85 Dataframe Intro
Lecture 86 Dataframe from JSON Files
Lecture 87 Dataframe from Parquet Files
Lecture 88 Dataframe from CSV Files
Lecture 89 Dataframe from Avro File
Lecture 90 Working with XML
Lecture 91 Working with Columns
Lecture 92 Working with String
Lecture 93 Working with Dates
Lecture 94 Dataframe Filter API
Lecture 95 DataFrame API Part1
Lecture 96 DataFrame API Part2
Lecture 97 Spark SQL
Lecture 98 Working with Hive Tables in Spark
Lecture 99 Datasets versus Dataframe
Lecture 100 User Defined Functions (UDFs)
Section 14: Using Intellij IDE
Lecture 101 Intellij Setup
Lecture 102 Project Setup
Lecture 103 Writing first Spark program on IDE
Lecture 104 Understanding spark configuration
Lecture 105 Adding Actions/Transformations
Lecture 106 Understanding Execution Plan
Section 15: Running Spark on EMR (AWS Cloud)
Lecture 107 EMR Cluster Overview
Lecture 108 Cluster Setup
Lecture 109 Setting Spark Code for EMR
Lecture 110 Using Spark-submit
Lecture 111 Running Spark on EMR Cluster
Section 16: Spark with Cassandra
Lecture 112 Cassandra Course
Lecture 113 Creating Spark RDD from Cassandra Table
Lecture 114 Processing Cassandra data in Spark
Lecture 115 Cassandra Rows to Case Class
Lecture 116 Saving Spark RDD to Cassandra
Section 17: Apache Kafka
Lecture 117 Kafka Section Intro
Lecture 118 Confluent Cluster Setup
Lecture 119 Kafka Architecture
Lecture 120 Partitions and Offsets
Lecture 121 Kafka Consumer/Producers
Lecture 122 Kafka Message
Lecture 123 Kafka Serialization & Deserialization
Lecture 124 Your First Python Producer
Lecture 125 Your First Python Consumer
Section 18: Kafka Connector
Lecture 126 What is Connector?
Lecture 127 Kafka Connector - AWS S3 to Kafka
Section 19: Spark Structured Streaming & Kafka (Coming Soon)
Lecture 128 Spark streaming Intro
Section 20: MongoDB
Lecture 129 MongoDB Intro
Lecture 130 MongoDB Usecase & Limitations
Lecture 131 MongoDB Installation
Section 21: CRUD Operations
Lecture 132 Find
Lecture 133 Find With Filter
Lecture 134 Insert
Lecture 135 Update
Lecture 136 Update Continues
Lecture 137 Projections
Lecture 138 Delete
Section 22: Working with Operators
Lecture 139 In / not in Operators
Lecture 140 gte / lte Operators
Lecture 141 and / or operators
Lecture 142 regex operator
Section 23: MongoDB Compass
Lecture 143 Working with GUI
Section 24: Advanced Mongo
Lecture 144 Validation/Schema
Lecture 145 Working with Indexes
Section 25: Spark with Mongo
Lecture 146 Spark Mongo Integration
Section 26: Data Engineering Interview Preparation
Lecture 147 Data Engineer Resume template
Lecture 148 Sqoop Interview Questions
Lecture 149 Hive Interview Questions
Lecture 150 Spark Interview Questions
Lecture 151 Data Engineering common Questions
Lecture 152 Data Engineering Real project Questions
Who this course is for
Anyone who wants to learn Big Data technologies
Anyone who wants to become a Data Engineer