Practical Guide to Setting Up a Hadoop and Spark Cluster Using CDH
Cloudera is one of the leading vendors of Hadoop and Spark distributions. In this practical guide, you will learn the step-by-step process of setting up a Hadoop and Spark cluster using CDH.
Install – Demonstrate an understanding of the installation process for Cloudera Manager, CDH, and the ecosystem projects.
-
Set up a local CDH repository
-
Perform OS-level configuration for Hadoop installation
-
Install Cloudera Manager server and agents
-
Install CDH using Cloudera Manager
-
Add a new node to an existing cluster
-
Add a service using Cloudera Manager
Configure – Perform basic and advanced configuration needed to effectively administer a Hadoop cluster
-
Configure a service using Cloudera Manager
-
Create an HDFS user’s home directory
-
Configure NameNode HA
-
Configure ResourceManager HA
-
Configure a proxy for HiveServer2/Impala
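As a rough illustration of the "Create an HDFS user's home directory" objective above, the sketch below only builds the two commands usually involved (`hdfs dfs -mkdir -p` and `hdfs dfs -chown`); it does not run them. The user name, the group, and the `/user` parent directory are illustrative assumptions, and prefixing with `sudo -u hdfs` assumes the HDFS superuser account is called `hdfs`, which is common but not guaranteed on every cluster.

```python
def home_dir_commands(user, group=None, superuser="hdfs"):
    """Return the commands (as argument lists) that would create
    /user/<user> in HDFS and hand ownership to that user.

    Nothing is executed here; the caller decides how to run them.
    """
    group = group or user          # assumption: primary group named after the user
    home = f"/user/{user}"         # assumption: home directories live under /user
    prefix = ["sudo", "-u", superuser]
    return [
        prefix + ["hdfs", "dfs", "-mkdir", "-p", home],
        prefix + ["hdfs", "dfs", "-chown", f"{user}:{group}", home],
    ]


if __name__ == "__main__":
    # "alice" is a hypothetical user name.
    for cmd in home_dir_commands("alice"):
        print(" ".join(cmd))
```

Returning argument lists rather than one shell string keeps the commands easy to pass to an automation tool without quoting problems.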
Manage – Maintain and modify the cluster to support day-to-day operations in the enterprise
-
Rebalance the cluster
-
Set up alerting for excessive disk fill
-
Define and install a rack topology script
-
Install a new type of I/O compression library in the cluster
-
Revise YARN resource assignment based on user feedback
-
Commission/decommission a node
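For the "Define and install a rack topology script" objective above, a minimal sketch may help. Hadoop calls the configured topology script with one or more host names or IP addresses as arguments and reads one rack path per argument from standard output. The addresses and rack names in the mapping below are hypothetical placeholders, not values from this course.

```python
#!/usr/bin/env python3
import sys

# Hypothetical host-to-rack mapping; replace with the cluster's own addresses.
RACKS = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.21": "/rack2",
}

# Rack reported for any host that is not listed in the mapping.
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop may pass several hosts in a single invocation.
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

Falling back to a default rack for unknown hosts is a design choice in this sketch: it keeps a newly added node usable before the mapping is updated.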
Secure – Enable relevant services and configure the cluster to meet goals defined by security policy; demonstrate knowledge of basic security practices
-
Configure HDFS ACLs
-
Install and configure Sentry
-
Configure Hue user authorization and authentication
-
Enable/configure log and query redaction
-
Create encrypted zones in HDFS
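To make the "Configure HDFS ACLs" objective above a little more concrete, here is a small sketch that assembles ACL entries in the `type:name:permissions` form accepted by `hdfs dfs -setfacl -m`. It only builds the command; it assumes ACL support is already enabled on the NameNode (the `dfs.namenode.acls.enabled` property), and the path, user, and group names are hypothetical.

```python
import re

# Permissions are exactly three characters: r or -, w or -, x or -.
PERM_RE = re.compile(r"^[r-][w-][x-]$")
ENTRY_TYPES = ("user", "group", "other", "mask")


def acl_entry(kind, name, perms, default=False):
    """Build one ACL entry such as 'user:alice:r-x'.

    Use an empty name for 'other' and 'mask' entries. Set default=True
    for a default ACL entry, which only makes sense on a directory.
    """
    if kind not in ENTRY_TYPES:
        raise ValueError(f"unknown ACL entry type: {kind!r}")
    if not PERM_RE.match(perms):
        raise ValueError(f"permissions must look like 'r-x', got {perms!r}")
    entry = f"{kind}:{name}:{perms}"
    return f"default:{entry}" if default else entry


def setfacl_command(path, entries):
    """Return the argument list for 'hdfs dfs -setfacl -m <spec> <path>'."""
    return ["hdfs", "dfs", "-setfacl", "-m", ",".join(entries), path]


if __name__ == "__main__":
    # Hypothetical directory, user, and group.
    entries = [
        acl_entry("user", "alice", "r-x"),
        acl_entry("group", "analysts", "rwx", default=True),
    ]
    print(" ".join(setfacl_command("/data/sales", entries)))
```

The result can be checked afterwards with `hdfs dfs -getfacl` on the same path.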
Test – Benchmark the cluster operational metrics, test system configuration for operation and efficiency
-
Execute file system commands via HttpFS
-
Efficiently copy data within a cluster/between clusters
-
Create/restore a snapshot of an HDFS directory
-
Get/set ACLs for a file or directory structure
-
Benchmark the cluster (I/O, CPU, network)
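For the first objective in the list above, file system commands are sent to HttpFS as WebHDFS-style REST calls under the `/webhdfs/v1` prefix. The sketch below only builds such a URL. It assumes the usual HttpFS default port of 14000 and simple (non-Kerberos) authentication through the `user.name` query parameter; the gateway host name and user are hypothetical.

```python
from urllib.parse import quote, urlencode


def httpfs_url(host, path, op, user, port=14000, **params):
    """Build an HttpFS REST URL for one file system operation.

    host and user are whatever the cluster uses; port 14000 is an
    assumed default that may differ on a given installation.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": op, "user.name": user, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    # Hypothetical gateway host and user name.
    print(httpfs_url("gw01.example.com", "/user/alice", "LISTSTATUS", "alice"))
```

The printed URL can be passed to any HTTP client, for example `curl`, to list the directory.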
Troubleshoot – Demonstrate ability to find the root cause of a problem, optimize inefficient execution, and resolve resource contention scenarios
-
Resolve errors/warnings in Cloudera Manager
-
Resolve performance problems/errors in cluster operation
-
Determine reason for application failure
-
Configure the Fair Scheduler to resolve application delays
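As background for the Fair Scheduler objective above: the scheduler reads queue definitions from an allocation file whose root element is `<allocations>`, with one `<queue>` element per queue. The sketch below generates a minimal file of that shape, giving each queue a weight so that delayed workloads can be granted a larger share. The queue names and weights are hypothetical, and on a cluster managed by Cloudera Manager this configuration is normally edited through the dynamic resource pool settings rather than by hand.

```python
import xml.etree.ElementTree as ET


def build_allocations(queue_weights):
    """Return a Fair Scheduler allocation document as an XML string.

    queue_weights maps a queue name to its relative weight; a queue
    with weight 3 is entitled to three times the share of weight 1.
    """
    root = ET.Element("allocations")
    for name, weight in queue_weights.items():
        queue = ET.SubElement(root, "queue", name=name)
        ET.SubElement(queue, "weight").text = str(float(weight))
        ET.SubElement(queue, "schedulingPolicy").text = "fair"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical queues: batch ETL gets three times the share of ad hoc work.
    print(build_allocations({"etl": 3, "adhoc": 1}))
```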
Our Approach
-
You will start by creating a Cloudera QuickStart VM (provided your laptop has 16 GB of RAM and a quad-core CPU). This will help you get comfortable with Cloudera Manager.
-
You will be able to sign up for GCP and claim up to $300 in credit while the offer lasts. Credits are valid for up to a year.
-
You will then get a brief overview of GCP and provision 7 to 8 virtual machines using templates. You will also attach an external hard drive to configure for HDFS later.
-
Once the servers are provisioned, you will set up Ansible for server automation.
-
You will set up a local repository for Cloudera Manager and the Cloudera Distribution of Hadoop using packages.
-
You will then set up Cloudera Manager with a custom database, followed by the Cloudera Distribution of Hadoop using the wizard that comes with Cloudera Manager.
-
While setting up the Cloudera Distribution of Hadoop, you will set up HDFS, learn HDFS commands, set up YARN, configure HDFS and YARN high availability, learn about schedulers, set up Spark, transition to parcels, set up Hive and Impala, set up HBase and Kafka, and more.
-
6. Introduction (Video lesson)
-
7. Setup Ubuntu using Windows Subsystem (Video lesson)
-
8. Sign up for GCP (Video lesson)
-
9. Create template for Big Data Server (Video lesson)
-
10. Provision Servers for Big Data Cluster (Video lesson)
-
11. Review Concepts (Video lesson)
-
12. Setting up gcloud (Video lesson)
-
13. Setup ansible on first server (Video lesson)
-
14. Format JBOD (Video lesson)
-
15. Cluster Topology (Video lesson)
-
22. Introduction (Video lesson)
-
23. Setup Pre-requisites (Video lesson)
-
24. Install Cloudera Manager (Video lesson)
-
25. Licensing and Installation Options (Video lesson)
-
26. Install CM and CDH on all nodes (Video lesson)
-
27. CM Agents and CM Server (Video lesson)
-
28. Setup Cloudera Management Service (Video lesson)
-
29. Cloudera Management Service – Components (Video lesson)
-
36. Introduction (Video lesson)
-
37. Setup HDFS (Video lesson)
-
38. Copy Data into HDFS (Video lesson)
-
39. Copy Data into HDFS Contd (Video lesson)
-
40. Components of HDFS (Video lesson)
-
41. Components of HDFS Contd (Video lesson)
-
42. Configuration files and Important Properties (Video lesson)
-
43. Review Web UIs and log files (Video lesson)
-
44. Checkpointing (Video lesson)
-
45. Checkpointing Contd (Video lesson)
-
46. Namenode Recovery Process (Video lesson)
-
47. Configure Rack Awareness (Video lesson)
-
48. Introduction (Video lesson)
-
49. Getting list of commands and help (Video lesson)
-
50. Creating Directories and Changing Ownership (Video lesson)
-
51. Managing Files and File Permissions - Deleting Files from HDFS (Video lesson)
-
52. Managing Files and File Permissions - Copying Files Local File System and HDFS (Video lesson)
-
53. Managing Files and File Permissions - Copying Files within HDFS (Video lesson)
-
54. Managing Files and File Permissions - Previewing Data in HDFS (Video lesson)
-
55. Managing Files and File Permissions - Changing File Permissions (Video lesson)
-
56. Controlling Access using ACLs - Enable ACLs On Cluster (Video lesson)
-
57. Controlling Access using ACLs - ACLs On Files (Video lesson)
-
58. Controlling Access using ACLs - ACLs On Directories (Video lesson)
-
59. Controlling Access using ACLs - Removing ACLs (Video lesson)
-
60. Overriding Properties (Video lesson)
-
61. HDFS usage commands and getting metadata (Video lesson)
-
62. Creating Snapshots (Video lesson)
-
63. Using CLI for administration (Video lesson)
-
64. Introduction (Video lesson)
-
65. Setup YARN + MR2 (Video lesson)
-
66. Run Simple Map Reduce Job (Video lesson)
-
67. Components of YARN and MR2 (Video lesson)
-
68. Configuration files and Important Properties - Overview (Video lesson)
-
69. Configuration files and Important Properties - Review YARN Properties (Video lesson)
-
70. Configuration files and Important Properties - Review Map Reduce Properties (Video lesson)
-
71. Configuration files and Important Properties - Running Jobs (Video lesson)
-
72. Review Web UIs and log files (Video lesson)
-
73. YARN and MR2 CLI (Video lesson)
-
74. YARN Application Life Cycle (Video lesson)
-
75. Map Reduce Job Execution Life Cycle (Video lesson)
-
76. Introduction (Video lesson)
-
77. High Availability – Overview (Video lesson)
-
78. Configure HDFS Namenode HA (Video lesson)
-
79. Review Properties – HDFS Namenode HA (Video lesson)
-
80. HDFS Namenode HA – Quick Recap of HDFS typical Configuration (Video lesson)
-
81. HDFS Namenode HA – Components (Video lesson)
-
82. HDFS Namenode HA – Automatic failover (Video lesson)
-
83. Configure YARN Resource Manager HA (Video lesson)
-
84. Review – YARN Resource Manager HA (Video lesson)
-
85. High Availability – Implications (Video lesson)
