Learning Apache Spark | Master Spark For Big Data Processing
Published 10/2024
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 kHz
Language: English | Size: 2.77 GB | Duration: 7h 11m
Embark on a comprehensive journey to master Apache Spark, from data manipulation to machine learning!
What you'll learn
Understand the fundamentals of Spark’s architecture and its distributed computing capabilities
Learn to write and optimize Spark SQL queries for efficient data processing
Master the creation and manipulation of DataFrames, a core component of Spark
Learn to read data from different file formats such as CSV and Parquet
Develop skills in filtering, sorting, and aggregating data to extract meaningful insights
Learn to process and analyze streaming data for real-time insights
Explore the capabilities of Spark’s MLlib for machine learning
Learn to create and fine-tune models using pipelines and transformers for predictive analytics
Requirements
You should know how to write and run Python code
Basic understanding of Python syntax and concepts is necessary
Understanding SQL (Structured Query Language) is important
You should know how to create and manage tables, transform data, and run queries
Description
Unlock the power of big data with Apache Spark!
In this course, you’ll learn how to use Apache Spark with Python to work with data. We’ll start with the basics and move up to advanced projects and machine learning. Whether you’re just starting or already know some Python, this course will teach you step-by-step how to process and analyze big data.
What You’ll Learn:
Use PySpark’s DataFrame: Learn to organize and work with data.
Store Data Efficiently: Use formats like Parquet to store data quickly.
Use SQL in PySpark: Work with data using SQL, just like with DataFrames.
Connect PySpark with Python Tools: Dig deeper into data with Python’s data tools.
Machine Learning with PySpark’s MLlib: Work on big projects using machine learning.
Real-World Examples: Learn by doing with practical examples.
Handle Large Data Sets: Understand how to manage big data easily.
Solve Real-World Problems: Apply Spark to real-life data challenges.
Build Confidence in PySpark: Get better at big data processing.
Manage and Analyze Data: Gain skills for both work and personal projects.
Prepare for Data Jobs: Build skills for jobs in tech, finance, and healthcare.
By the end of this course, you’ll have a solid foundation in Spark, ready to tackle real-world data challenges.
Overview
Section 1: Getting Started
Lecture 1 Why Should You Learn Apache Spark?
Lecture 2 What Does This Course Offer on Apache Spark?
Section 2: All about Apache Spark
Lecture 3 Let’s understand WordCount
Lecture 4 Let’s understand Map and Reduce
Lecture 5 Programming with Map and Reduce
Lecture 6 Let’s understand Hadoop
Lecture 7 Apache Hadoop Architecture
Lecture 8 Apache Hadoop and Apache Spark
Lecture 9 Apache Spark Architecture
Lecture 10 What is PySpark
Section 3: Installations for Apache Spark
Lecture 11 Install Java JDK
Lecture 12 Install Python
Lecture 13 Install JupyterLab
Lecture 14 Install PySpark
Lecture 15 Spark Session Initialization
Lecture 16 Running PySpark on AWS EC2 Instances P1
Lecture 17 Running PySpark on AWS EC2 Instances P2
Section 4: Using Databricks Community Edition
Lecture 18 Why Use Databricks Community Edition
Lecture 19 Register for Databricks Community Edition
Lecture 20 When to use Databricks Community Edition
Lecture 21 Running Magic Commands in Databricks P1
Lecture 22 Running Magic Commands in Databricks P2
Section 5: Spark DataFrames
Lecture 23 Apache Spark DataFrame
Lecture 24 Create DataFrames from CSV Files P1
Lecture 25 Create DataFrames from CSV Files P2
Lecture 26 Create DataFrames from Parquet Files
Section 6: Spark Data Transformations
Lecture 27 Using SELECT
Lecture 28 Using FILTER
Lecture 29 Using ORDER BY
Lecture 30 Using GROUP BY
Lecture 31 Using AGGREGATE Functions
Lecture 32 Using INNER JOIN
Section 7: Spark SQL Catalog
Lecture 33 Spark SQL Catalogs
Lecture 34 Access Spark SQL Catalogs
Lecture 35 List Databases from Catalogs
Lecture 36 List Tables from Current Database
Lecture 37 Create Spark Temp View
Lecture 38 Run SQL Queries on Temp Views
Lecture 39 Drop Temp Views
Section 8: Databricks Utility FileSystem for Apache Spark
Lecture 40 Using Databricks Utilities
Lecture 41 Using dbfs - Databricks Utility FileSystem
Lecture 42 Using dbfs - Make Directory
Lecture 43 Using dbfs - Copy Files
Lecture 44 Using dbfs - Delete Files
Section 9: Pandas API on Spark
Lecture 45 Introduction to Pandas
Lecture 46 Pandas API on Spark
Lecture 47 Reading and Writing Data with Pandas P1
Lecture 48 Reading and Writing Data with Pandas P2
Lecture 49 Data Manipulation with PySpark Pandas
Lecture 50 Merging and Joining in PySpark Pandas
Lecture 51 Grouping and Aggregation with PySpark Pandas
Lecture 52 Visualizing Data in PySpark Pandas
Section 10: Structured Streaming Using Apache Spark
Lecture 53 What is Apache Spark Structured Streaming
Lecture 54 How Apache Spark handles Structured Streaming
Lecture 55 Handling Programmatically Streaming Data
Lecture 56 Programmatic Modes by Apache Spark
Lecture 57 DataFrames for Streaming
Lecture 58 readStream API
Lecture 59 writeStream API
Lecture 60 Querying Data
Lecture 61 StreamingQuery - stop
Lecture 62 Structured Streaming with Kafka and Spark P1
Lecture 63 Structured Streaming with Kafka and Spark P2
Lecture 64 Structured Streaming with Kafka and Spark P3
Lecture 65 Terminate the Kafka Environment
Lecture 66 Handling Late Data Arrivals and Watermarking P1
Lecture 67 Handling Late Data Arrivals and Watermarking P2
Section 11: Machine Learning with Spark
Lecture 68 About this section
Lecture 69 Learning about Machine Learning
Lecture 70 How to build a Machine Learning Model
Lecture 71 Apache Spark MLlib Overview
Lecture 72 Learning about ML Pipelines using Spark MLlib
Lecture 73 Data Sources by Spark MLlib to Build ML Models
Lecture 74 Create DataFrames from Data Sources
Lecture 75 Learning about Featurization using Spark MLlib
Lecture 76 Using Apache Spark MLlib - Feature Transformers
Lecture 77 Using Tokenizer
Lecture 78 Using StringIndexer
Lecture 79 Using Pipelines
Lecture 80 Using VectorAssembler
Lecture 81 Using VectorIndexer
Lecture 82 Using MLlib Estimator - Linear Regression
Lecture 83 Using MLlib Estimator - Logistic Regression
Lecture 84 Measure ML Efficiency using Spark MLlib Evaluators
Lecture 85 Using ML for Solving Real World Problem
Lecture 86 Building ML Model P1 - Using Local Host
Lecture 87 Building ML Model P2 - Using Databricks Community Edition
Lecture 88 Using Apache Spark MLflow with Databricks Community Edition
Who this course is for
IT professionals interested in big data and analytics
Aspiring Data Scientists
Aspiring Data Analysts
Aspiring Machine Learning Engineers
Business Analysts
Software Engineers
Students and Academics
Researchers
Anyone Interested in Big Data