Stream Processing Frameworks For Big Data: The Internals

Posted By: ELK1nG

Published 1/2023
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 kHz
Language: English | Size: 697.07 MB | Duration: 3h 9m

A deep dive into the internals of Flink, Spark Streaming, Structured Streaming, and Kafka Streams

What you'll learn

The features and internals of Flink, Spark Streaming, Structured Streaming and Kafka Streams.

How to select the right stream processing framework for a use case.

The current state of the art in distributed stream processing.

References to equivalent implementations in all frameworks.

This is not a programming course! This is a course on understanding how these systems work.

Requirements

Some familiarity with distributed systems (e.g. the Spark batch API) is helpful but not required.

Description

Do you need to use stream processing for your next project but have no idea where to begin? Or do you want to grow into a data engineering role and start building up knowledge of stream processing?

In this course, we give a detailed explanation and comparison of several popular stream processing frameworks. By the end, you will be able to make a well-grounded selection of the right framework for your use case, or to start your learning process. We will cover Flink, Kafka Streams, Spark Streaming and Structured Streaming, the four frameworks that currently represent the state of the art in the industry.

You will understand their features, characteristics and differences. This course is a primer that helps you start learning, and better understand, the APIs and programming languages behind these frameworks.

This course covers all relevant aspects:

- general characteristics
- APIs
- latency and throughput performance
- scalability
- elasticity
- fault tolerance
- state management
- deployment
- …

We will dive deeply into the workings, advantages and disadvantages of the different mechanisms and approaches.

This course is not a programming course; it focuses on the more theoretical aspects. At the end, you will be provided with a concise overview of what was covered.

The content of this course is based on the results of Giselle's PhD work, in which she benchmarked and analyzed these frameworks on all of these characteristics.

Overview

Section 1: Introduction

Lecture 1 Introduction

Lecture 2 Course overview

Section 2: General characteristics

Lecture 3 Overview

Lecture 4 Stream processing and distributed processing

Lecture 5 Frameworks: Flink

Lecture 6 Frameworks: Kafka Streams

Lecture 7 Frameworks: Spark Streaming and Structured Streaming

Lecture 8 Ecosystem: Connectors

Lecture 9 Ecosystem: Batch Processing

Lecture 10 Ecosystem: ML Libraries and Other Libraries

Lecture 11 Maturity

Lecture 12 Streaming models

Section 3: APIs

Lecture 13 Programming languages

Lecture 14 API levels

Lecture 15 Operators

Lecture 16 Operators: Sliding and Tumbling Windows

Lecture 17 Operators: Session and Count Windows

Lecture 18 Operators: Joining

Lecture 19 Operators: Low-level Operators

Lecture 20 Configuration

Section 4: Time

Lecture 21 Time characteristics I

Lecture 22 Time characteristics II

Lecture 23 Out-of-order processing

Lecture 24 Triggers

Section 5: Performance: Latency and throughput

Lecture 25 Latency: Definition and influence of streaming model

Lecture 26 Latency: influence of operation

Lecture 27 Latency: predictability

Lecture 28 Throughput

Lecture 29 General advice

Section 6: Scalability, elasticity and parallelization

Lecture 30 Scalability

Lecture 31 Elasticity

Lecture 32 Parallelization

Section 7: State management

Lecture 33 State

Lecture 34 State backends

Lecture 35 State features

Section 8: Fault tolerance

Lecture 36 Message delivery guarantees

Lecture 37 Checkpointing

Lecture 38 Checkpointing: savepoints

Lecture 39 Write-ahead logs

Lecture 40 Fault tolerance in Kafka Streams

Lecture 41 Master and worker failures

Section 9: Summary

Lecture 42 Summary

Who this course is for

Anybody who needs to get a feel for how to select the right framework for a use case.

Anybody who wants to build up firm, in-depth knowledge of the differences and characteristics of these frameworks.

Anybody who wants to build up a deep understanding of stream processing in general.