Data Engineering with AWS Cookbook: A recipe-based approach to help you tackle data engineering problems with AWS services
English | 2024 | ISBN: 1805127284 | 528 pages | EPUB (True) | 35.64 MB
Master AWS data engineering services and techniques for orchestrating pipelines, building layers, and managing migrations
Key Features
Get up to speed with the different AWS technologies for data engineering
Learn the different aspects and considerations of building data lakes, such as security, storage, and operations
Get hands-on with key AWS services such as Glue, EMR, Redshift, QuickSight, and Athena for practical learning
Purchase of the print or Kindle book includes a free PDF eBook
Book Description
Performing data engineering with Amazon Web Services (AWS) combines AWS's scalable infrastructure with robust data processing tools, enabling efficient data pipelines and analytics workflows. This comprehensive guide to AWS data engineering will teach you all you need to know about data lake management, pipeline orchestration, and serving layer construction.
Through clear explanations and hands-on exercises, you’ll master essential AWS services such as Glue, EMR, Redshift, QuickSight, and Athena. Additionally, you’ll explore various data platform topics such as data governance, data quality, DevOps, CI/CD, planning and performing data migration, and creating Infrastructure as Code. As you progress, you will gain insights into how to enrich your platform and use various AWS cloud services such as Amazon EventBridge, Amazon DataZone, and AWS SCT and DMS to solve data platform challenges.
Each recipe in this book is tailored to a daily challenge that a data engineering team faces while building a cloud platform. By the end of this book, you will be well-versed in AWS data engineering and have gained proficiency in key AWS services and data processing techniques. You will develop the necessary skills to tackle large-scale data challenges with confidence.
What you will learn
Define your centralized data lake solution, and secure and operate it at scale
Identify the most suitable AWS solution for your specific needs
Build data pipelines using multiple ETL technologies
Discover how to handle data orchestration and governance
Explore how to build a high-performing data serving layer
Delve into DevOps and data quality best practices
Migrate your data from on-premises to AWS
Who this book is for
If you're involved in designing, building, or overseeing data solutions on AWS, this book provides proven strategies for addressing challenges in large-scale data environments. Data engineers and big data professionals who want to deepen their understanding of AWS features and optimize their workflows will find value here, even if they're new to the platform. Basic familiarity with AWS security (users and roles) and the command shell is recommended.
Table of Contents
Managing Data Lake Storage
Sharing Your Data Across Environments and Accounts
Ingesting and Transforming Your Data with AWS Glue
A Deep Dive into AWS Orchestration Frameworks
Running Big Data Workloads with Amazon EMR
Governing Your Platform
Data Quality Management
DevOps – Defining IaC and Building CI/CD Pipelines
Monitoring Data Lake Cloud Infrastructure
Building a Serving Layer with AWS Analytics Services
Migrating to AWS – Steps, Strategies, and Best Practices for Modernizing Your Analytics and Big Data Workloads
Harnessing the Power of AWS for Seamless Data Warehouse Migration
Strategizing Hadoop Migrations – Cost, Data, and Workflow Modernization with AWS