Big Data With Apache Spark 3 And Python: From Zero To Expert
Published 11/2022
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
Language: English | Size: 1.76 GB | Duration: 4h 19m
Complete bootcamp to learn PySpark, Databricks, Spark Machine Learning, Advanced Analytics, Koalas and Spark Streaming
What you'll learn
Introduction to Big Data and Apache Spark Fundamentals
Spark RDDs, DataFrames and Spark Koalas
Machine Learning with Spark
Advanced features with Apache Spark
Advanced analytics and data visualization tools
Spark in cloud with Azure and Databricks
Spark Streaming and GraphX
Databricks
Machine learning in Databricks
Requirements
Python
Description
If you are looking for a hands-on, complete and advanced course to learn Big Data with Apache Spark and Python, you have come to the right place.
This course is designed to cover the complete skillset of Apache Spark, from RDDs, Spark SQL, DataFrames, and Spark Streaming, to Machine Learning with Spark ML, Advanced Analytics, data visualization, Spark Koalas, and Databricks.
With lessons, downloadable study guides, hands-on exercises, and real-world use cases, this is the only course you'll need to learn Apache Spark.
Apache Spark has become the reference tool for Big Data, surpassing Hadoop MapReduce. Spark can run up to 100 times faster than Hadoop MapReduce and has a complete ecosystem of functionalities for machine learning and data analytics. This makes Apache Spark one of the most in-demand skills for data engineers, data scientists, and related roles. Big Data is one of the most valuable skills today, so this course will teach you everything you need to position yourself in the Big Data job market.
In this course we will teach you the complete skillset of Apache Spark and PySpark, from the basics to the most advanced features. We will use visual presentations in PowerPoint, sharing clear explanations and useful professional advice.
This course has the following sections:
Introduction to big data and fundamentals of Apache Spark
Installation of Apache Spark and libraries such as Anaconda, Java, etc.
Spark RDDs
Spark DataFrames
Advanced features with Apache Spark
Advanced analytics and data visualization
Spark Koalas
Machine Learning with Spark
Spark Streaming
Spark GraphX
Databricks
Spark in the cloud (Azure)
If you're ready to sharpen your skills, increase your career opportunities, and become a Big Data expert, join today and get immediate and lifetime access to:
• Complete guide to Apache Spark (PDF e-book)
• Downloadable Spark project files and code
• Hands-on exercises and quizzes
• Spark resources such as cheat sheets and summaries
• 1-to-1 expert support
• Course question and answer forum
• 30-day money-back guarantee
See you there!
Overview
Section 1: Spark Fundamentals
Lecture 1 How to get the most out of this course
Lecture 2 Course material
Lecture 3 Spark Fundamentals
Lecture 4 Apache Spark execution
Lecture 5 Apache Spark ecosystem and documentation
Lecture 6 PySpark: operation, cluster administration and architecture
Section 2: Installing Apache Spark locally
Lecture 7 Download Spark, Java and Anaconda
Lecture 8 Setting environment variables
Lecture 9 Running Spark in Prompt and Jupyter Notebook
Lecture 10 Fixing common problems
Section 3: Basic Features and RDDs
Lecture 11 PySpark Cheat Sheet
Lecture 12 RDD Fundamentals
Lecture 13 Initialize PySpark with SparkSession and the SparkContext
Lecture 14 Transformations in RDDs like map, filter, flatMap and distinct
Lecture 15 Transformations in RDDs like reduceByKey, groupByKey or sortByKey
Lecture 16 RDD actions such as count, first, collect or take
Section 4: Spark DataFrames and Apache Spark SQL
Lecture 17 PySpark Cheatsheet: SQL
Lecture 18 Fundamentals and advantages of DataFrames
Lecture 19 Characteristics of DataFrames and data sources
Lecture 20 Creating DataFrames in PySpark
Lecture 21 Operations with PySpark DataFrames
Lecture 22 Different types of joins in DataFrames
Lecture 23 SQL queries in PySpark
Lecture 24 Advanced features for loading and exporting data in PySpark
Section 5: Advanced features in Apache Spark
Lecture 25 Advanced functions and performance optimization
Lecture 26 Broadcast Join and caching
Lecture 27 User Defined Functions (UDF) and advanced SQL functions
Lecture 28 Handling and imputation of missing values
Lecture 29 Partitioning and catalog of APIs
Lecture 30 Practical Exercise: Advanced Analytics with Apache Spark
Section 6: Advanced Analytics with Apache Spark
Lecture 31 Introduction to advanced analytics with Spark
Lecture 32 Data loading and data schema modification
Lecture 33 Inspect data in PySpark
Lecture 34 Column transformation in PySpark
Lecture 35 Advanced missing data imputation in PySpark
Lecture 36 Data selection with PySpark and PySpark SQL
Lecture 37 Data visualization and graph generation in PySpark
Lecture 38 Persist data with PySpark
Section 7: Koalas: The Apache Spark Pandas API
Lecture 39 Spark Koalas Fundamentals
Lecture 40 Feature Engineering with Koalas
Lecture 41 Creating DataFrames with Koalas
Lecture 42 Data manipulation and DataFrames with Koalas
Lecture 43 Working with missing data in Koalas
Lecture 44 Data visualization and graph generation with Koalas
Lecture 45 Importing and exporting data with Koalas
Lecture 46 Hands-on exercise with Koalas
Section 8: Machine Learning with Apache Spark
Lecture 47 Fundamentals of Machine Learning with Spark
Lecture 48 Spark Machine Learning Components
Lecture 49 Stages of developing a Machine Learning model
Lecture 50 Import data and exploratory data analysis (EDA)
Lecture 51 Data preprocessing with PySpark
Lecture 52 Training the machine learning model in PySpark
Lecture 53 Evaluation of the Machine Learning model
Section 9: Spark Streaming
Lecture 54 Practical example of counting words with Spark Streaming
Lecture 55 Spark Streaming Configurations: Output Modes and Operation Types
Lecture 56 Time Window Operations in Spark Streaming
Lecture 57 Spark Streaming Capabilities
Lecture 58 Use case: Real-time bank fraud detection (Part I)
Lecture 59 Use case: Real-time bank fraud detection (Part II)
Lecture 60 Spark Streaming Exercise
Section 10: Introduction to Databricks
Lecture 61 Introduction to Databricks
Lecture 62 Databricks Terminology and Databricks Community
Lecture 63 Delta Lake
Lecture 64 Create a free Databricks account
Section 11: Apache Spark on Databricks
Lecture 65 Introduction to the Databricks environment
Lecture 66 Getting started with Databricks
Lecture 67 Creating and saving DataFrames in Databricks
Lecture 68 Data transformation and visualization in Databricks
Lecture 69 Use case: Population data analytics
Section 12: Machine Learning in Databricks
Lecture 70 Import and exploratory analysis of the data
Lecture 71 Variable preprocessing with PySpark and Databricks
Lecture 72 Definition of the Machine Learning model and development of the Pipeline
Lecture 73 Model evaluation with PySpark and Databricks
Lecture 74 Hyperparameter tuning and registration in MLflow
Lecture 75 Predictions with new data and visualization of the results
Section 13: Additional material
Lecture 76 Additional Resources: Complete Guide to Spark
Who this course is for
Anyone who wants to learn advanced big data skills
Anyone who knows Python and wants to acquire Big Data processing skills
Anyone who wants to make a career as a data engineer, data analyst or data scientist
Anyone interested in learning Apache Spark and PySpark for Big Data analysis
Anyone who wants to learn cutting-edge technology in Big Data