Hands-On Big Data Analytics with PySpark by Rudy Lai and Bartłomiej Potaczek
English | 2019 | ISBN: 183864413X | 182 pages
Use PySpark to easily crush messy data at scale and discover proven techniques for creating testable, immutable, and easily parallelizable Spark jobs.
Key Features
• Work with large amounts of agile data using distributed datasets and in-memory caching
• Source data from popular data hosting platforms and formats, such as HDFS, Hive, JSON, and S3
• Employ the easy-to-use PySpark API to deploy big data analytics for production