Spark SQL & Hadoop (For Data Scientists & Big Data Analysts)
Apache Spark is currently one of the most popular systems for processing big data.
Apache Hadoop continues to be used by many organisations that store their data on premises. Hadoop allows these organisations to efficiently store big datasets ranging in size from gigabytes to petabytes.
As the number of vacancies for data science, big data analysis and data engineering roles continues to grow, so too will the demand for individuals who know Spark and Hadoop well enough to fill them.
This course has been designed specifically for data scientists, big data analysts and data engineers looking to leverage the power of Hadoop and Apache Spark to make sense of big data.
This course will help those who want to analyse big data interactively, or to begin writing production applications that prepare data for further analysis, using Spark SQL in a Hadoop environment.
The course is also well suited to university students and recent graduates who are keen to gain exposure to Spark and Hadoop, and to anyone who simply wants to apply their SQL skills in a big data environment using Spark SQL.
This course has been designed to be concise: it gives students enough theory to use Hadoop and Spark effectively, without getting bogged down in older low-level APIs such as RDDs.
By solving the problems in this course, students will begin to develop the skills and confidence needed to handle real-world scenarios in a production environment.
(a) There are just under 30 problems in this course. These cover HDFS commands, basic data engineering tasks and data analysis.
(b) Fully worked-out solutions to all the problems.
(c) Also included is the Verulam Blue virtual machine, an environment with a Spark Hadoop cluster already installed so that you can practise working on the problems.
- The VM contains a Spark Hadoop environment which allows students to read data from and write data to the Hadoop file system, as well as to store tables in the Hive metastore.
- All the datasets students will need for the problems are already loaded onto HDFS, so there is no need for students to do any extra work.
- The VM also has Apache Zeppelin installed. This is a web-based notebook with built-in Spark support, similar to the Jupyter notebook.
This course will allow students to get hands-on experience working in a Spark Hadoop environment as they practise:
- Converting a set of data values in a given format stored in HDFS into new data values or a new data format and writing them into HDFS.
- Loading data from HDFS for use in Spark applications & writing the results back into HDFS using Spark.
- Reading and writing files in a variety of file formats.
- Performing standard extract, transform, load (ETL) processes on data using the Spark API.
- Using metastore tables as an input source or an output sink for Spark applications.
- Applying the fundamentals of querying datasets in Spark.
- Filtering data using Spark.
- Writing queries that calculate aggregate statistics.
- Joining disparate datasets using Spark.
- Producing ranked or sorted data.
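As a rough illustration of the filtering, aggregating, joining and sorting listed above, the sketch below runs one SQL statement against two small made-up tables. It is not taken from the course: the table names, columns and values are hypothetical, and it uses SQLite from Python's standard library rather than Spark so that it runs without a cluster. In a Spark session, a statement of this shape would typically be passed to `spark.sql(...)`, assuming tables with the same names are registered.

```python
import sqlite3

# One statement that filters, joins, aggregates and sorts.
# Table and column names are hypothetical, chosen only for illustration.
QUERY = """
    SELECT c.name, COUNT(*) AS n_orders, SUM(o.amount) AS total_amount
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE o.amount >= 20
    GROUP BY c.name
    ORDER BY total_amount DESC
"""


def summarise_orders():
    """Build two tiny in-memory tables and return the query result as a list of tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        # Made-up sample data; the course's own datasets are not reproduced here.
        conn.executescript("""
            CREATE TABLE customers (customer_id INTEGER, name TEXT, country TEXT);
            CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
            INSERT INTO customers VALUES (1, 'Ada', 'UK'), (2, 'Bo', 'US'), (3, 'Cy', 'UK');
            INSERT INTO orders VALUES
                (10, 1, 120.0), (11, 1, 80.0), (12, 2, 300.0), (13, 3, 40.0), (14, 3, 15.0);
        """)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, n_orders, total_amount in summarise_orders():
        print(name, n_orders, total_amount)
```

The `WHERE` clause drops the one order below 20 before grouping, so each customer's count and total cover only the remaining orders, and the `ORDER BY` puts the largest total first.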
- 3. Section Introduction (Video lesson)
- 4. Big Data (Video lesson)
- 5. Distributed Storage & Processing (Video lesson)
- 6. Introduction to Hadoop (Video lesson)
- 7. Introduction to Spark (Video lesson)
- 8. Spark Applications (Video lesson)
- 9. Spark's Interactive Shell (Video lesson)
- 10. Distributed Processing on a Hadoop Cluster using Spark (Video lesson)
- 11. Section Introduction (Video lesson)
- 12. Install Oracle VM VirtualBox (Video lesson)
- 13. The Verulam Blue VM - Zipped Files for Downloading (Text lesson)
- 14. Loading the Verulam Blue VM (Video lesson)
- 15. Booting up the VM (Video lesson)
- 16. Spin Up Cluster (Video lesson)
- 17. spark-shell (Video lesson)
- 18. Run Zeppelin Notebook (Video lesson)
- 19. Problems & practice test questions (Video lesson)
- 20. Interacting with HDFS (Video lesson)
- 21. The File System Shell (FS Shell) (Video lesson)
- 22. Commands and operations -help (Video lesson)
- 23. Commands and operations -ls (Video lesson)
- 24. Commands and operations -find (Video lesson)
- 25. Commands and operations -mkdir (Video lesson)
- 26. Commands and operations -put (Video lesson)
- 27. Commands and operations -cp -mv (Video lesson)
- 28. Commands and operations -cat -tail -text (Video lesson)
- 29. Commands and operations -rmdir -rm (Video lesson)
- 30. Commands and operations -get (Video lesson)
- 31. Health warning (Video lesson)
- 32. HDFS Basic File Management - Problems & Solutions (Text lesson)
- 46. Section Introduction (Video lesson)
- 47. The ETL Process (Video lesson)
- 48. The Extract Phase of an ETL process (Video lesson)
- 49. The Extract Phase - Loading CSV and Text files (Video lesson)
- 50. The Extract Phase - Loading JSON and Parquet files (Video lesson)
- 51. The Extract Phase - Loading Avro and ORC files (Video lesson)
- 52. The Transform Phase of an ETL process (Video lesson)
- 53. The Transform Phase - String Transformations (Video lesson)
- 54. The Transform Phase - Numerical Transformations (Video lesson)
- 55. The Transform Phase - Date & Time Transformations (Video lesson)
- 56. The Transform Phase - Data Type Transformations (Video lesson)
- 57. The Transform Phase - Transformations of Nulls (Video lesson)
- 58. The Load Phase of an ETL process (Video lesson)
- 59. The Load Phase - Saving DataFrame data to Files I (Video lesson)
- 60. The Load Phase - Saving DataFrame data to Files II (Video lesson)
- 61. The Load Phase - Saving DataFrame data to Tables (Video lesson)
- 62. Data Engineering - Solutions to Problems (Text lesson)
- 63. Section Introduction (Video lesson)
- 64. Metastore Tables as Input Sources or Output Sinks (Video lesson)
- 65. Querying datasets in Spark (Video lesson)
- 66. Math Functions in SQL (Video lesson)
- 67. Filtering (Video lesson)
- 68. Sorting & Ranking (Video lesson)
- 69. Aggregation (Video lesson)
- 70. Grouping (Video lesson)
- 71. Multi Table Queries (Video lesson)
- 72. Multi Table Queries - Joins (Video lesson)
- 73. Multi Table Queries - Types of Joins (Video lesson)
- 74. Multi Table Queries - Unions (Video lesson)
- 75. Data Analysis - Solutions to Problems (Text lesson)