System Design For Big Data Pipelines
Published 4/2023
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 kHz
Language: English | Size: 1.82 GB | Duration: 6h 33m
Analyze, Design and Build scalable, resilient and cost-effective Big Data pipelines with a methodical process
What you'll learn
Learn about the building blocks of a big data pipeline, their functions and challenges
Adopt an end-to-end methodical approach to designing a big data pipeline
Explore techniques to ensure overall scaling of a big data pipeline
Study design patterns for building blocks, their advantages, shortcomings, applications and available technologies
Focus additionally on Infrastructure, Operations and Security for Big Data deployments
Apply what you learn in the course to a batch and a real-time use case study
Requirements
Big Data Technology Concepts
Familiarity with Big Data Technologies like Apache Spark, Apache Kafka and NoSQL
Development / Deployment Experience with Big Data Technologies and Pipelines
Software Design and Development Experience including Cloud & Microservices
Description
Big data technologies have grown rapidly over the past few years and have spread into every domain and industry in software development. Working with them has become a core skill for software engineers. Robust and effective big data pipelines are needed to support the growing volume of data and applications in the big data world. These pipelines have become business critical: they help increase revenue and reduce cost.
Do quality big data pipelines happen by magic? No. Building and maintaining them takes high-quality designs that are scalable, reliable and cost-effective.
How do you build an end-to-end big data pipeline that uses big data technologies and practices effectively to solve business problems? How do you integrate them in a scalable and reliable manner? How do you deploy, secure and operate them? How do you look at the overall forest and not just the individual trees? This course addresses that skill gap.
What topics does this course cover?
We start by discussing the building blocks of big data pipelines, their functions and challenges.
We introduce a structured design process for building big data pipelines.
We then discuss the individual building blocks, focusing on the available design patterns, their advantages, shortcomings, use cases and available technologies.
We recommend several best practices throughout the course.
Finally, we work through two use cases to illustrate how to apply what the course teaches to a real-world problem. One is a batch use case and the other is a real-time use case.
Overview
Section 1: Introduction & Expectations
Lecture 1 Need for Quality Pipeline Design
Lecture 2 Course Coverage and Pre-requisites
Lecture 3 Cloud Serverless Technologies
Section 2: Building Blocks for Big Data Pipelines
Lecture 4 The Big Data Pipeline Network
Lecture 5 Data Acquisition Blocks
Lecture 6 Data Transport Blocks
Lecture 7 Data Processing Blocks
Lecture 8 Data Storage Blocks
Lecture 9 Data Serving Blocks
Lecture 10 Data Pipeline Infrastructure
Lecture 11 Data Pipeline Operations
Section 3: System Design Process
Lecture 12 System Design Process Overview
Lecture 13 Analyze Functional Requirements
Lecture 14 Analyze Pipeline Input
Lecture 15 Analyze Non-functional Requirements
Lecture 16 Draw a Pipeline Flowchart
Lecture 17 Create a Skeleton Design
Lecture 18 Analyze Scaling
Lecture 19 Select Technologies
Lecture 20 Design Infrastructure and Operations
Lecture 21 Develop a Test Strategy
Section 4: Scalable Pipelines - Design Principles
Lecture 22 Batch vs Realtime Pipelines
Lecture 23 Distributed Architectures
Lecture 24 Microservices based Architectures
Lecture 25 Batch Pipelines - Best Practices
Lecture 26 Realtime Pipelines - Best Practices
Lecture 27 Performance Benchmarking for Big Data Pipelines
Section 5: Data Acquisition Design
Lecture 28 File Transfer Pattern
Lecture 29 Extraction Client Pattern
Lecture 30 Ingestion API Pattern
Lecture 31 Pub Sub Acquisition Pattern
Lecture 32 Data Acquisition Design Practices
Section 6: Data Transport Design
Lecture 33 Extract Load Pattern
Lecture 34 Request Response Pattern
Lecture 35 Event Streaming Pattern
Lecture 36 Data Transport Design Practices
Section 7: Data Processing & Transformation Design
Lecture 37 Data Processing Patterns
Lecture 38 Distributed Processing with Big Data
Lecture 39 Batch Processing Design Practices - Part 1
Lecture 40 Batch Processing Design Practices - Part 2
Lecture 41 Stream Processing Design Practices
Lecture 42 Batch vs Realtime Processing
Lecture 43 Input and Output Considerations for Processing
Lecture 44 Processing Engine Technologies
Section 8: Storage Design
Lecture 45 Distributed File System Pattern
Lecture 46 Relational Database Pattern
Lecture 47 Document Database Pattern
Lecture 48 Columnar Database Pattern
Lecture 49 Graph Database Pattern
Lecture 50 Distributed Cache Pattern
Lecture 51 Data Storage Design Practices - 1
Lecture 52 Data Storage Design Practices - 2
Section 9: Serving Design
Lecture 53 Query Interface Pattern
Lecture 54 Serving API Pattern
Lecture 55 Push Client Pattern
Lecture 56 Publish Subscribe Pattern
Lecture 57 Data Serving Design Practices
Section 10: Infrastructure and Deployments
Lecture 58 Infrastructure Technologies
Lecture 59 Microservices Deployments
Lecture 60 Processing Jobs Deployments
Lecture 61 Databases and Queues Deployments
Lecture 62 Geographical Distribution
Section 11: Security
Lecture 63 Pipeline Security by Design
Lecture 64 Secure External Interfaces
Lecture 65 Secure Data Storage
Lecture 66 Privacy Considerations
Lecture 67 Multi-Tenancy Considerations
Section 12: Serviceability
Lecture 68 Elements of Serviceability
Lecture 69 Monitoring Pipelines
Lecture 70 Data to Monitor
Lecture 71 Pipeline Troubleshooting
Section 13: Use Case I: Customer Journey Analytics (CJA)
Lecture 72 Problem Definition for CJA
Lecture 73 Study CJA Functional Requirements
Lecture 74 Analyze CJA Input Data
Lecture 75 Study CJA Non-Functional Requirements
Lecture 76 Study CJA Pipeline Flowchart
Lecture 77 Create CJA Skeleton Design
Lecture 78 Analyze CJA Scaling
Lecture 79 Select Technologies for CJA
Lecture 80 Design Infrastructure and Operations for CJA
Section 14: Use Case II: Suspicious Login Alerting (SLA)
Lecture 81 Problem Definition for SLA
Lecture 82 Study SLA Functional Requirements
Lecture 83 Analyze SLA Input Data
Lecture 84 Study SLA Non-Functional Requirements
Lecture 85 Draw SLA Pipeline Flowchart
Lecture 86 Create SLA Skeleton Design
Lecture 87 Analyze SLA Scaling
Lecture 88 Select SLA Technologies
Lecture 89 Define SLA Infrastructure and Operations
Section 15: Conclusion
Lecture 90 Thank You
Who this course is for
Big Data Pipeline Designers & Architects
Big Data Developers looking to move into Design/Architecture roles
Software Architects looking to gain Big Data experience