Real Time Spark Project for Beginners: Hadoop, Spark, Docker
In many data centers, servers of different types generate large amounts of data in real time. An event, in this case, is a status report from a server in the data center.
This data needs to be processed in real time to generate insights for the people who monitor the servers and the data center, so they can track each server's status continuously and resolve issues as they occur, improving server stability.
Since the data is large and arrives in real time, we need to choose the right architecture, with scalable storage and computation frameworks.
Hence we build a real-time data pipeline using Apache Kafka, Apache Spark, Hadoop, PostgreSQL, Django, and Flexmonster on Docker to generate insights from this data.
The Spark data pipeline is built with Apache Spark, using both Scala and PySpark, on an Apache Hadoop cluster running on top of Docker.
The data visualization is built with the Django web framework and Flexmonster.
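The event simulator described in the curriculum can be sketched in plain Python. This is a minimal sketch, not the course's actual code: the field names (`server_id`, `status`, `cpu_utilization`, `memory_utilization`) and the status values are illustrative assumptions about what a server-status event might contain.

```python
import json
import random
from datetime import datetime, timezone

# Hypothetical event schema -- the course's actual fields may differ.
STATUSES = ["RUNNING", "DEGRADED", "DOWN"]

def make_server_status_event(server_id: int) -> dict:
    """Build one server-status event as a plain dict."""
    return {
        "server_id": f"server-{server_id:03d}",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": random.choice(STATUSES),
        "cpu_utilization": round(random.uniform(0.0, 100.0), 2),
        "memory_utilization": round(random.uniform(0.0, 100.0), 2),
    }

def simulate(n_servers: int = 5):
    """Yield one JSON-encoded event per server, as a producer would send to Kafka."""
    for server_id in range(n_servers):
        yield json.dumps(make_server_status_event(server_id))

if __name__ == "__main__":
    for message in simulate():
        print(message)  # in the real pipeline this would be sent to a Kafka topic
```

In the pipeline, each JSON string would be published to a Kafka topic, from which the Spark Structured Streaming job reads.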
3. Setting up Docker Environment (video lesson)
4. Create Single Node Kafka Cluster on Docker (video lesson)
5. Create Single Node Apache Hadoop and Spark Cluster on Docker (video lesson)
6. Setting up IntelliJ IDEA Community Edition (IDE) (video lesson)
7. Setting up PyCharm Community Edition (IDE) (video lesson)
8. Setting up Django Web Framework (video lesson)
9. Event Simulator using Python (Server Status Detail) (video lesson)
10. Building Streaming Data Pipeline using Scala | Spark Structured Streaming (video lesson)
11. Building Streaming Data Pipeline using PySpark | Spark Structured Streaming (video lesson)
12. Setting up PostgreSQL Database (Events Database) (video lesson)
13. Building Dashboard using Django Web Framework and Flexmonster | Visualization (video lesson)
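The streaming-pipeline lessons build the same job twice, in Scala and in PySpark. The core aggregation such a job might perform (here, counting events per status, an assumed metric and assumed field names) can be sketched in plain Python, analogous to a `groupBy("status").count()` on a micro-batch in Spark Structured Streaming:

```python
from collections import Counter

def count_by_status(events):
    """Roll up a micro-batch of server-status events by status,
    analogous to groupBy("status").count() in Spark Structured Streaming."""
    return dict(Counter(e["status"] for e in events))

# Example micro-batch (hypothetical field names):
batch = [
    {"server_id": "server-001", "status": "RUNNING"},
    {"server_id": "server-002", "status": "DOWN"},
    {"server_id": "server-003", "status": "RUNNING"},
]
print(count_by_status(batch))  # {'RUNNING': 2, 'DOWN': 1}
```

In the actual pipeline, the Spark job would compute aggregates like this continuously over the Kafka stream and write the results to the PostgreSQL events database, which the Django/Flexmonster dashboard then visualizes.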