Site Reliability Engineering Essential Training
.MP4, AVC, 1280x720, 30 fps | English, AAC, 2 Ch | 4h 9m | 534 MB
Instructor: Karun Subramanian
Unlock the power of Site Reliability Engineering (SRE) with this comprehensive video course. SRE is a critical discipline that combines software engineering with IT operations to ensure high system reliability, scalability, and performance. This course provides a deep dive into the core principles and practices of SRE, equipping you with the tools to build reliable systems and improve operational efficiency.
Learn key SRE concepts, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets, with practical examples that help you apply these principles to your own organization. The course also addresses crucial aspects of incident management, such as managing on-call duties, running war rooms for critical incidents, and conducting blameless postmortems to learn from failures. Additionally, discover release management strategies that minimize user impact during deployments, and learn how to monitor your CI/CD pipeline and roll out changes progressively.
Learning objectives
- Set a strong foundation by implementing core Site Reliability Engineering (SRE) principles to ensure system reliability and performance.
- Build and optimize a robust monitoring and observability system using essential telemetry data such as logs, metrics, and traces.
- Monitor system health effectively through observability platforms to maintain optimal system performance.
- Apply Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets to improve system reliability and performance.
- Manage incidents effectively, run war rooms for critical situations, and conduct blameless postmortems to learn from failures.
- Design reliable system architectures using load balancing and auto-scaling, and apply the CAP theorem to weigh consistency and availability trade-offs for system resilience.