Setup Big Data Development Environment for Spark and Hadoop
A key part of working on Big Data projects with technologies such as Spark and Hadoop is having an appropriate development environment. By the end of the course, you will have a development environment ready for building Spark-based applications that leverage the power of multi-node clusters such as AWS EMR, Databricks, etc.
Even though interactive CLIs are effective for learning, they are not sufficient for the collaborative development of Spark applications. Here is what you will do to set up an environment for application development using Big Data technologies such as Hadoop and Spark:
- Overview of IDEs (Integrated Development Environments) such as VS Code, PyCharm, etc.
- Set up Visual Studio Code on Windows or Mac along with the Remote Development Extension Pack.
- Set up a multi-node Big Data cluster using AWS Elastic MapReduce (AWS EMR).
- Validate connectivity to the master node of the AWS EMR cluster.
- Set up a workspace on the master node of the AWS EMR cluster using the Visual Studio Code Remote Development Extension Pack.
- Understand the application development life cycle using Spark.
- Validate the application locally using the spark-submit command.
- Set up the required data sets in AWS S3.
- Build the Spark application bundle as a zip file and deploy it using both client and cluster modes.
- Run the Spark application using the CLI on the master node of the cluster.
- Deploy the Spark application as a step on the EMR cluster.
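The bundle-and-deploy steps in the list above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the course's own code: the file names (`app.py`, `util/reader.py`, `app.zip`) and both helper functions are hypothetical. The `spark-submit` options used (`--master`, `--deploy-mode`, `--py-files`) are standard ones. The sketch only builds the zip and prints the commands; it does not run them.

```python
import shlex
import tempfile
import zipfile
from pathlib import Path
from typing import List


def build_bundle(src_dir: Path, zip_path: Path) -> List[str]:
    """Zip every .py file under src_dir, storing paths relative to src_dir."""
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for path in sorted(src_dir.rglob("*.py")):
            arcname = path.relative_to(src_dir).as_posix()
            bundle.write(path, arcname)
            names.append(arcname)
    return names


def spark_submit_command(entry_point: str, bundle: str, deploy_mode: str) -> List[str]:
    """Build (but do not run) a spark-submit command for YARN."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--py-files", bundle,  # extra Python modules shipped with the job
        entry_point,           # the driver script
    ]


# Demo with a throwaway, hypothetical project layout.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    (src / "util").mkdir(parents=True)
    (src / "app.py").write_text("print('hello from the driver')\n")
    (src / "util" / "__init__.py").write_text("")
    (src / "util" / "reader.py").write_text("def read():\n    return []\n")
    bundled = build_bundle(src, Path(tmp) / "app.zip")

print(bundled)
for mode in ("client", "cluster"):
    print(shlex.join(spark_submit_command("app.py", "app.zip", mode)))
```

On a real cluster the bundle and entry point would be actual files, and on EMR they are often referenced by their S3 locations instead of local paths; treat that as an assumption to verify against your own setup.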
- 5: Planning of EMR Cluster (Video lesson)
- 6: Create EC2 Key Pair (Video lesson)
- 7: Setup EMR Cluster with Spark (Video lesson)
- 8: Understanding Summary of AWS EMR Cluster (Video lesson)
- 9: Review EMR Cluster Application User Interfaces (Video lesson)
- 10: Review EMR Cluster Monitoring (Video lesson)
- 11: Review EMR Cluster Hardware and Cluster Scaling Policy (Video lesson)
- 12: Review EMR Cluster Configurations (Video lesson)
- 13: Review EMR Cluster Events (Video lesson)
- 14: Review EMR Cluster Steps (Video lesson)
- 15: Review EMR Cluster Bootstrap Actions (Video lesson)
- 16: Connecting to EMR Master Node using SSH (Video lesson)
- 17: Disabling Termination Protection and Terminating the Cluster (Video lesson)
- 18: Clone and Create New Cluster (Video lesson)
- 19: Listing AWS S3 Buckets and Objects using AWS CLI on EMR Cluster (Video lesson)
- 20: Listing AWS S3 Buckets and Objects using HDFS CLI on EMR Cluster (Video lesson)
- 21: Managing Files in AWS s3 using HDFS CLI on EMR Cluster (Video lesson)
- 22: Accessing spark-sql CLI of AWS EMR Cluster (Video lesson)
- 23: Accessing pyspark CLI of AWS EMR Cluster (Video lesson)
- 24: Accessing spark-shell CLI of AWS EMR Cluster (Video lesson)
- 25: Create AWS EMR Cluster for Notebooks (Video lesson)
- 26: Create bootstrap script for AWS EMR Cluster (Video lesson)
- 27: Provision Elastic IP for Master Node of AWS EMR Cluster (Video lesson)
- 28: Create AWS EMR Cluster for Development (Video lesson)
- 29: Troubleshooting Issues related to Bootstrap of EMR Cluster (Video lesson)
- 30: Fix Bootstrap Script for AWS EMR Cluster (Video lesson)
- 31: Validate AWS EMR Cluster with Bootstrap Action with updated script (Video lesson)
- 32: Setup Python Virtual Environment as part of VS Code Workspace (Video lesson)
- 33: Getting Started with Boto3 to Manage AWS EMR Clusters (Video lesson)
- 34: Setup boto3 to explore APIs to manage AWS EMR Clusters (Video lesson)
- 35: Set AWS Profile using env file in Visual Studio Code (Video lesson)
- 36: Get Cluster Details of AWS EMR Development Cluster using boto3 (Video lesson)
- 37: Getting Instance Id of the Master Node of AWS EMR Cluster using boto3 (Video lesson)
- 38: Getting Allocation Id of the Elastic Ip using AWS boto3 (Video lesson)
- 39: Associating Elastic Ip with AWS EMR Master Node using Boto3 (Video lesson)
- 40: Setup Notebook Environment for EMR Cluster using IAM User (Video lesson)
- 41: Open Remote Window on AWS EMR Master Node using VS Code (Video lesson)
- 42: Setup Workspace on AWS EMR Master using Git Repository (Video lesson)
- 43: Best Practices and Advantages of using AWS EMR Cluster for Team Development (Video lesson)
- 44: Install VSCode Extensions in remote Workspace for Python (Video lesson)
- 45: Review Python and Pyspark details on EMR Cluster (Video lesson)
- 46: Running Applications using local and yarn during development (Video lesson)
- 47: Getting Started with Development of Spark Applications on EMR Cluster (Video lesson)
- 48: Create Function for Spark Session (Video lesson)
- 49: Upload Files to AWS s3 for the development using AWS EMR Cluster (Video lesson)
- 50: Develop read logic for the Spark Application (Video lesson)
- 51: Process Data Frame using Spark APIs (Video lesson)
- 52: Write Data to Files using Spark APIs (Video lesson)
- 53: Productionize the Code and setup required data sets for validation (Video lesson)
- 54: Resize the AWS EMR Cluster using Web Console (Video lesson)
- 55: Validate Changes to productionize the Application Code (Video lesson)
- 56: Take the backup and terminate the cluster (Video lesson)
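Lessons 36 to 39 cover looking up cluster details, finding the master node's instance ID, finding the Elastic IP's allocation ID, and associating the two with boto3. A minimal sketch of that flow might look like the following. It is not the course's own code: the function names, the cluster ID `j-XXXXXXXXXXXXX`, the IP `203.0.113.10`, and the `default` profile are hypothetical placeholders. It also assumes the cluster uses instance groups (not instance fleets) and that the master node is already running. The functions take the clients as arguments so they can be exercised without AWS access.

```python
def get_cluster_summary(emr_client, cluster_id):
    """Return a few fields from EMR describe_cluster for the given cluster."""
    cluster = emr_client.describe_cluster(ClusterId=cluster_id)["Cluster"]
    return {
        "name": cluster["Name"],
        "state": cluster["Status"]["State"],
        # The DNS name can be absent while the cluster is still starting.
        "master_dns": cluster.get("MasterPublicDnsName"),
    }


def find_master_instance_id(emr_client, cluster_id):
    """Return the EC2 instance ID of the running master node."""
    response = emr_client.list_instances(
        ClusterId=cluster_id,
        InstanceGroupTypes=["MASTER"],  # assumption: instance groups, not fleets
        InstanceStates=["RUNNING"],
    )
    instances = response["Instances"]
    if not instances:
        raise LookupError("no running master instance found for " + cluster_id)
    return instances[0]["Ec2InstanceId"]


def find_allocation_id(ec2_client, public_ip):
    """Return the allocation ID of an Elastic IP given its public address."""
    response = ec2_client.describe_addresses(PublicIps=[public_ip])
    return response["Addresses"][0]["AllocationId"]


def associate_elastic_ip(emr_client, ec2_client, cluster_id, public_ip):
    """Attach the Elastic IP to the cluster's master node."""
    instance_id = find_master_instance_id(emr_client, cluster_id)
    allocation_id = find_allocation_id(ec2_client, public_ip)
    ec2_client.associate_address(AllocationId=allocation_id, InstanceId=instance_id)
    return instance_id, allocation_id


def main():
    """Example wiring with real clients; needs boto3 and AWS credentials."""
    import boto3  # third-party; imported here so the helpers stay importable without it

    session = boto3.Session(profile_name="default")  # hypothetical profile name
    emr = session.client("emr")
    ec2 = session.client("ec2")
    print(get_cluster_summary(emr, "j-XXXXXXXXXXXXX"))  # placeholder cluster ID
    print(associate_elastic_ip(emr, ec2, "j-XXXXXXXXXXXXX", "203.0.113.10"))
```

Calling `main()` would issue real AWS API calls, so it is left uncalled here. A stable Elastic IP on the master node is useful because the SSH host used for the remote VS Code window then stays the same when the cluster is recreated.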