Rating: 4.33 (84 reviews)
Mastering AWS Elastic Map Reduce (EMR) for Data Engineers
Build PySpark and Spark SQL Applications on AWS EMR, Orchestrate using Step Functions, Manage EMR using Boto3, and more
AWS Elastic Map Reduce (EMR) is one of the key AWS services for large-scale data processing with Big Data technologies such as Apache Hadoop, Apache Spark, and Hive. In this course, you will learn AWS EMR by building end-to-end data pipelines with Apache Spark and AWS Step Functions.
Here is the detailed outline of the course.
- First, you will learn how to Get Started with AWS Elastic Map Reduce (EMR) by using the AWS Web Console to create and manage EMR Clusters. You will review the key features of the Web Console, connect to the master node of the cluster, and validate the important command-line interfaces such as spark-shell, pyspark, and hive, as well as the hdfs and aws CLI commands.
- Once you understand how to get started with AWS EMR, you will go through the details related to Setting up Development Cluster using AWS EMR. There are quite a few advantages to using AWS EMR Clusters for development purposes and most enterprises do so.
- After setting up a development cluster using AWS EMR, you will go through the Development Life Cycle of Spark Applications using AWS EMR Development Cluster. You will be using Visual Studio Code Remote Development on top of the AWS EMR Development Cluster to go through the details.
- Once the development is done, you will go through the details related to Deploying Spark Application on AWS EMR Cluster. You will build the zip file and learn how to run the application from the CLI in both client and cluster deploy modes. You will also learn how to deploy the Spark application as a step on AWS EMR Clusters and how to troubleshoot issues in Spark Applications by going through the relevant logs.
- Typically we run Spark Applications programmatically. After going through the details related to deploying spark applications on AWS EMR Clusters, you will be learning how to Manage AWS EMR Clusters using Python Boto3. You will not only learn how to create clusters programmatically but also how to deploy Spark Applications as Steps programmatically using Python Boto3.
- End-to-end Data Pipelines using AWS EMR are built using AWS Step Functions. Once you understand how to manage EMR Clusters and deploy Spark Applications on them using Python Boto3, it is important to learn how to Build EMR-based Workflows or Pipelines using AWS Step Functions. You will learn how to create the cluster, deploy a Spark Application as a Step on the cluster, and then terminate the cluster as part of a basic pipeline or State Machine using AWS Step Functions.
- You will also learn how to perform validations as part of State Machines by Enhancing AWS EMR-based State Machine or Pipeline. You will check if the files specified already exist as part of the validations.
- We can also build Data Processing Applications or Pipelines using Spark SQL on AWS EMR. First, you will learn how to design and develop solutions as a Spark SQL Script and how to validate the script by running it with the relevant runtime arguments.
- Once you understand the development process for Spark SQL solutions on AWS EMR, you will learn how to build a Data Pipeline using AWS Step Functions that deploys the Spark SQL Script on an EMR Cluster. You will also learn the concept of Boto3 Waiters, which make sure the steps are executed sequentially.
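As a rough illustration of the Boto3 parts of the outline above (deploying a Spark application as a step and using a waiter so work proceeds sequentially), here is a minimal sketch. It is not taken from the course material. The function names, the S3 path, and the choice of `CONTINUE` on failure are assumptions made for illustration; `add_job_flow_steps` and the `step_complete` waiter are part of the Boto3 EMR client.

```python
from typing import Any, Dict, List


def build_spark_step(name: str, app_uri: str, app_args: List[str]) -> Dict[str, Any]:
    """Build an EMR step that runs spark-submit through command-runner.jar.

    `app_uri` is a hypothetical location of the application file, e.g. on S3.
    """
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # assumption: keep the cluster alive if the step fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", app_uri, *app_args],
        },
    }


def run_step_and_wait(cluster_id: str, step: Dict[str, Any], emr_client: Any = None) -> str:
    """Add the step to a running cluster and block until the step completes."""
    if emr_client is None:
        import boto3  # imported lazily so the builder above works without AWS access

        emr_client = boto3.client("emr")
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    step_id = response["StepIds"][0]
    # The waiter polls the step status, so the next action only starts afterwards.
    emr_client.get_waiter("step_complete").wait(ClusterId=cluster_id, StepId=step_id)
    return step_id


if __name__ == "__main__":
    # Only builds and prints the step definition; no AWS call is made here.
    print(build_spark_step("demo-step", "s3://example-bucket/apps/app.py", ["prod"]))
```

Passing the client in as a parameter keeps the step definition separate from the AWS call, which makes the definition easy to inspect before anything is submitted.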
Getting Started on Windows with Required Tools
Getting Started with AWS EMR
Setup Development Cluster using AWS EMR
- 5. Planning of EMR Cluster (Video lesson)
- 6. Create EC2 Key Pair (Video lesson)
- 7. Setup EMR Cluster with Spark (Video lesson)
- 8. Understanding Summary of AWS EMR Cluster (Video lesson)
- 9. Review EMR Cluster Application User Interfaces (Video lesson)
- 10. Review EMR Cluster Monitoring (Video lesson)
- 11. Review EMR Cluster Hardware and Cluster Scaling Policy (Video lesson)
- 12. Review EMR Cluster Configurations (Video lesson)
- 13. Review EMR Cluster Events (Video lesson)
- 14. Review EMR Cluster Steps (Video lesson)
- 15. Review EMR Cluster Bootstrap Actions (Video lesson)
- 16. Connecting to EMR Master Node using SSH (Video lesson)
- 17. Disabling Termination Protection and Terminating the Cluster (Video lesson)
- 18. Clone and Create New Cluster (Video lesson)
- 19. Listing AWS S3 Buckets and Objects using AWS CLI on EMR Cluster (Video lesson)
- 20. Listing AWS S3 Buckets and Objects using HDFS CLI on EMR Cluster (Video lesson)
- 21. Managing Files in AWS s3 using HDFS CLI on EMR Cluster (Video lesson)
- 22. Review Glue Catalog Databases and Tables (Video lesson)
- 23. Accessing Glue Catalog Databases and Tables using EMR Cluster (Video lesson)
- 24. Accessing spark-sql CLI of AWS EMR Cluster (Video lesson)
- 25. Accessing pyspark CLI of AWS EMR Cluster (Video lesson)
- 26. Accessing spark-shell CLI of AWS EMR Cluster (Video lesson)
- 27. Create AWS EMR Cluster for Notebooks (Video lesson)
Development Life Cycle using AWS EMR Development Cluster
- 28. Create bootstrap script for AWS EMR Cluster (Video lesson)
- 29. Provision Elastic IP for Master Node of AWS EMR Cluster (Video lesson)
- 30. Create AWS EMR for Development (Video lesson)
- 31. Troubleshooting Issues related to Bootstrap of EMR Cluster (Video lesson)
- 32. Fix Bootstrap Script for AWS EMR Cluster (Video lesson)
- 33. Validate AWS EMR Cluster with Bootstrap Action with updated script (Video lesson)
- 34. Setup Python Virtual Environment as part of VS Code Workspace (Video lesson)
- 35. Getting Started with Boto3 to Manage AWS EMR Clusters (Video lesson)
- 36. Setup boto3 to explore APIs to manage AWS EMR Clusters (Video lesson)
- 37. Set AWS Profile using env file in Visual Studio Code (Video lesson)
- 38. Get Cluster Details of AWS EMR Development Cluster using boto3 (Video lesson)
- 39. Getting Instance Id of the Master Node of AWS EMR Cluster using boto3 (Video lesson)
- 40. Getting Allocation Id of the Elastic Ip using AWS boto3 (Video lesson)
- 41. Associating Elastic Ip with AWS EMR Master Node using Boto3 (Video lesson)
- 42. Setup Notebook Environment for EMR Cluster using IAM User (Video lesson)
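The lessons above on finding the master node's instance ID, looking up the Elastic IP's allocation ID, and associating the two could be sketched roughly as follows. This is not the course's code. The function names are hypothetical, and the sketch assumes a cluster that uses instance groups (not instance fleets) and an Elastic IP that has already been provisioned. The calls `list_instances`, `describe_addresses`, and `associate_address` are Boto3 EMR and EC2 client methods.

```python
from typing import Any


def get_master_instance_id(emr_client: Any, cluster_id: str) -> str:
    """Return the EC2 instance ID of the cluster's master node."""
    response = emr_client.list_instances(
        ClusterId=cluster_id,
        InstanceGroupTypes=["MASTER"],  # assumption: the cluster uses instance groups
    )
    return response["Instances"][0]["Ec2InstanceId"]


def get_allocation_id(ec2_client: Any, public_ip: str) -> str:
    """Return the allocation ID of an already provisioned Elastic IP."""
    response = ec2_client.describe_addresses(PublicIps=[public_ip])
    return response["Addresses"][0]["AllocationId"]


def associate_elastic_ip(emr_client: Any, ec2_client: Any, cluster_id: str, public_ip: str) -> str:
    """Attach the Elastic IP to the master node and return the association ID."""
    instance_id = get_master_instance_id(emr_client, cluster_id)
    allocation_id = get_allocation_id(ec2_client, public_ip)
    response = ec2_client.associate_address(
        AllocationId=allocation_id,
        InstanceId=instance_id,
    )
    return response["AssociationId"]
```

With real credentials, the clients would come from `boto3.client("emr")` and `boto3.client("ec2")`. A stable Elastic IP means the SSH or VS Code remote configuration does not need to change each time the development cluster is recreated.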
Deploy Spark Application on AWS EMR Cluster
- 43. Open Remote Window on AWS EMR Master Node using VS Code (Video lesson)
- 44. Setup Workspace on AWS EMR Master using Git Repository (Video lesson)
- 45. Best Practices and Advantages of using AWS EMR Cluster for Team Development (Video lesson)
- 46. Install VSCode Extensions in remote Workspace for Python (Video lesson)
- 47. Review Python and Pyspark details on EMR Cluster (Video lesson)
- 48. Running Applications using local and yarn during development (Video lesson)
- 49. Getting Started with Development of Spark Applications on EMR Cluster (Video lesson)
- 50. Create Function for Spark Session (Video lesson)
- 51. Upload Files to AWS s3 for the development using AWS EMR Cluster (Video lesson)
- 52. Develop read logic for the Spark Application (Video lesson)
- 53. Process Data Frame using Spark APIs (Video lesson)
- 54. Write Data to Files using Spark APIs (Video lesson)
- 55. Productionize the Code and setup required data sets for validation (Video lesson)
- 56. Resize the AWS EMR Cluster using Web Console (Video lesson)
- 57. Validate Changes to productionize the Application Code (Video lesson)
- 58. Take the backup and terminate the cluster (Video lesson)
Manage AWS EMR Clusters using Python Boto3
- 59. Recreate the AWS EMR Cluster to deploy Spark Applications (Video lesson)
- 60. Setup Code Repository on the AWS EMR Master Node (Video lesson)
- 61. Resize the AWS EMR Cluster to validate application on larger data sets (Video lesson)
- 62. Build Zip File for the Spark Application (Video lesson)
- 63. Validate the Spark Application using zip file and client as deploy mode (Video lesson)
- 64. Run Spark Application on EMR using Cluster Deployment Mode (Video lesson)
- 65. Run Spark Application copied to s3 on EMR using Cluster Deployment Mode (Video lesson)
- 66. Deploy Spark Application as Step to the AWS EMR Cluster (Video lesson)
- 67. Setup Multiple Files to Manage AWS s3 Objects using State Machines (Video lesson)
- 68. Validate Spark Application Deployed as Step on AWS EMR Cluster (Video lesson)
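To make the difference between the client and cluster deploy modes in the lessons above concrete, here is a small sketch that assembles a `spark-submit` command line for either mode. The helper name, file names, and S3 paths are hypothetical. The flags `--master`, `--deploy-mode`, and `--py-files` are standard `spark-submit` options.

```python
import shlex
from typing import List, Optional, Sequence


def spark_submit_command(
    app_file: str,
    deploy_mode: str,
    py_files: Optional[str] = None,
    app_args: Sequence[str] = (),
) -> List[str]:
    """Assemble a spark-submit command for YARN in client or cluster deploy mode."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode}")
    cmd = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]
    if py_files:
        # The zip file carries the application's own Python modules to the executors.
        cmd += ["--py-files", py_files]
    cmd.append(app_file)
    cmd.extend(app_args)
    return cmd


if __name__ == "__main__":
    # Client mode: the driver runs on the node where the command is issued.
    print(shlex.join(spark_submit_command("app.py", "client", py_files="app.zip")))
    # Cluster mode: the driver runs inside a YARN container on the cluster.
    print(shlex.join(spark_submit_command(
        "s3://example-bucket/apps/app.py",
        "cluster",
        py_files="s3://example-bucket/apps/app.zip",
    )))
```

The same argument list, minus nothing but the shell quoting, can be reused as the `Args` of an EMR step that runs through `command-runner.jar`.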
Build EMR based Workflows or Pipelines using AWS Step Functions
- 69. Update Material related to Managing AWS EMR using Boto3 (Video lesson)
- 70. Create AWS EMR Cluster using AWS CLI Command (Video lesson)
- 71. Manage AWS EMR Clusters using AWS CLI Commands (Video lesson)
- 72. Overview of AWS boto3 to Manage AWS EMR Clusters (Video lesson)
- 73. Overview of Run Job Flow API to create AWS EMR Cluster (Video lesson)
- 74. Create AWS EMR Cluster or Job Flow Cluster using AWS Boto3 (Video lesson)
- 75. Prepare Data Sets to add Spark Application as Step to AWS EMR Cluster (Video lesson)
- 76. Add Spark Application as Step to AWS EMR Cluster using Boto3 (Video lesson)
- 77. Exercise to add Spark Application as Step to EMR Cluster using boto3 (Video lesson)
- 78. Terminate the AWS EMR Cluster used for adding Steps (Video lesson)
- 79. Exercise to Create AWS EMR Cluster with Steps for Spark Application (Video lesson)
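The Run Job Flow lessons above create a cluster programmatically. A minimal sketch of such a request is shown below. It is illustrative only: the release label, instance types and counts, the log location, and the use of the default EMR roles are all assumptions, and the function names are hypothetical. `run_job_flow` is the Boto3 EMR client method that creates the cluster and returns its `JobFlowId`.

```python
from typing import Any, Dict


def build_cluster_request(name: str, log_uri: str, key_name: str) -> Dict[str, Any]:
    """Build keyword arguments for the EMR run_job_flow call."""
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-6.15.0",  # assumption: any release that ships Spark works
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "Ec2KeyName": key_name,
            # Keep the cluster running so steps can be added later.
            "KeepJobFlowAliveWhenNoSteps": True,
            "TerminationProtected": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumption: default roles exist in the account
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


def create_cluster(request: Dict[str, Any], emr_client: Any = None) -> str:
    """Create the cluster, wait until it is running, and return its ID."""
    if emr_client is None:
        import boto3  # imported lazily so the request builder works without AWS access

        emr_client = boto3.client("emr")
    cluster_id = emr_client.run_job_flow(**request)["JobFlowId"]
    emr_client.get_waiter("cluster_running").wait(ClusterId=cluster_id)
    return cluster_id
```

Adding a `Steps` list to the request would submit Spark applications at creation time, and setting `KeepJobFlowAliveWhenNoSteps` to `False` would then let the cluster terminate once those steps finish.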
Develop State Machine using AWS Step Functions to manage s3
- 80. Review of Development Environment for AWS Step Functions and EMR (Video lesson)
- 81. Quick Overview of Important Terms of AWS Step Functions (Video lesson)
- 82. Getting Started with EMR based Pipeline using AWS Step Functions (Video lesson)
- 83. Overview of AWS IAM Role associated with State Machine copy (Video lesson)
- 84. Overview of Creating EMR Cluster using AWS Step Functions (Video lesson)
- 85. Parameters to Create EMR Cluster using AWS Step Functions (Video lesson)
- 86. Attach Permissions to Step Function Role to Create AWS EMR Cluster (Video lesson)
- 87. Add Step to AWS EMR Cluster using AWS Step Function (Video lesson)
- 88. Validate Adding Step to AWS EMR Cluster using Step Functions (Video lesson)
- 89. Add Action to Step Machine to Terminate the AWS EMR Cluster (Video lesson)
- 90. Validate the execution of State Machine to run Spark Application on AWS EMR (Video lesson)
- 91. Terminate AWS EMR Clusters Created to Validate State Machine copy (Video lesson)
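The basic pipeline covered by the lessons above (create the cluster, add a Spark step, terminate the cluster) might be expressed as a Step Functions state machine definition along these lines. This is a sketch and not the course's definition: the state names, cluster settings, and S3 path are assumptions. The three `elasticmapreduce` resource ARNs are the Step Functions service integrations for EMR, and the `.sync` suffix makes each state wait for the EMR operation to finish.

```python
import json
from typing import Any, Dict


def build_state_machine_definition(app_uri: str) -> Dict[str, Any]:
    """Return an Amazon States Language definition as a Python dict."""
    return {
        "Comment": "Create an EMR cluster, run one Spark step, terminate the cluster",
        "StartAt": "CreateCluster",
        "States": {
            "CreateCluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
                "Parameters": {
                    "Name": "demo-pipeline-cluster",  # hypothetical cluster name
                    "ReleaseLabel": "emr-6.15.0",  # assumption
                    "Applications": [{"Name": "Spark"}],
                    "ServiceRole": "EMR_DefaultRole",
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                    "Instances": {
                        "KeepJobFlowAliveWhenNoSteps": True,
                        "InstanceGroups": [
                            {"Name": "Master", "InstanceRole": "MASTER",
                             "InstanceType": "m5.xlarge", "InstanceCount": 1},
                            {"Name": "Core", "InstanceRole": "CORE",
                             "InstanceType": "m5.xlarge", "InstanceCount": 2},
                        ],
                    },
                },
                # Store the result so later states can read the cluster ID.
                "ResultPath": "$.cluster",
                "Next": "RunSparkStep",
            },
            "RunSparkStep": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    "ClusterId.$": "$.cluster.ClusterId",
                    "Step": {
                        "Name": "spark-app",
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {
                            "Jar": "command-runner.jar",
                            "Args": ["spark-submit", "--deploy-mode", "cluster", app_uri],
                        },
                    },
                },
                "ResultPath": "$.step",
                "Next": "TerminateCluster",
            },
            "TerminateCluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
                "Parameters": {"ClusterId.$": "$.cluster.ClusterId"},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    definition = build_state_machine_definition("s3://example-bucket/apps/app.py")
    print(json.dumps(definition, indent=2))
```

The printed JSON can be pasted into the Step Functions console or passed as the `definition` string when creating a state machine. The role attached to the state machine needs permission to create clusters, add steps, and terminate clusters.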
How long do I have access to the course materials?
You can view and review the lecture materials indefinitely, like an on-demand channel.
Can I take my courses with me wherever I go?
Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don't have an internet connection, some instructors also let their students download course lectures. That's up to the instructor though, so make sure you get on their good side!

30-Day Money-Back Guarantee
Course details
Video: 11 hours
Lectures: 1
Certificate of Completion
Full lifetime access
Access on mobile and TV