
The Complete GCP Data Engineering Project - Retailer Domain

An industry-standard project in the retailer domain using GCP services such as GCS, BigQuery, Dataproc, and Composer, along with GitHub and CI/CD
Instructor
coursevania
  • This project focuses on building a data lake on Google Cloud Platform (GCP) for the retailer domain.

  • The goal is to centralize, clean, and transform data from multiple sources, enabling retail providers and insurance companies to streamline billing, claims processing, and revenue tracking.

  • GCP Services Used:

    • Google Cloud Storage (GCS): Stores raw and processed data files.

    • BigQuery: Serves as the analytical engine for storing and querying structured data.

    • Dataproc: Used for large-scale data processing with Apache Spark.

    • Cloud Composer (Apache Airflow): Automates ETL pipelines and workflow orchestration.

    • Cloud SQL (MySQL): Stores transactional Electronic Medical Records (EMR) data.

    • GitHub & Cloud Build: Enables version control and CI/CD implementation.

    • CI/CD (Continuous Integration & Continuous Deployment): Automates deployment pipelines for data processing and ETL workflows.

  • Techniques involved:

    • Metadata Driven Approach

    • SCD Type 2 implementation

    • CDM (Common Data Model)

    • Medallion Architecture

    • Logging and Monitoring

    • Error Handling

    • Optimizations

    • CI/CD implementation

    • Many more best practices
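Of the techniques listed above, SCD Type 2 is the most mechanical, so a small illustration may help. The following is a minimal in-memory sketch in plain Python, not the course's implementation (which presumably runs in BigQuery or Spark). The `DimRow` class, the `apply_scd2` function, and the sample product data are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension record (hypothetical structure)."""
    key: str                          # business key, e.g. a product id
    attrs: tuple                      # tracked attributes as sorted (name, value) pairs
    valid_from: date
    valid_to: Optional[date] = None   # None means the version is still open
    is_current: bool = True


def apply_scd2(dim, incoming, load_date):
    """Apply one batch of incoming records to a dimension, SCD Type 2 style.

    `incoming` maps business key -> dict of tracked attributes.
    Changed records get their current version expired and a new version
    appended; unchanged records are left alone; unseen keys are inserted.
    """
    result = []
    keys_with_current_row = set()
    for row in dim:
        new_attrs = incoming.get(row.key)
        if row.is_current and new_attrs is not None:
            keys_with_current_row.add(row.key)
            new_tuple = tuple(sorted(new_attrs.items()))
            if new_tuple != row.attrs:
                # Expire the old version and open a new one on the load date.
                result.append(replace(row, valid_to=load_date, is_current=False))
                result.append(DimRow(row.key, new_tuple, load_date))
                continue
        result.append(row)
    # Keys with no current version are inserted as brand-new rows.
    for key, attrs in incoming.items():
        if key not in keys_with_current_row:
            result.append(DimRow(key, tuple(sorted(attrs.items())), load_date))
    return result


if __name__ == "__main__":
    dim = [DimRow("P1", (("price", 10),), date(2024, 1, 1))]
    batch = {"P1": {"price": 12}, "P2": {"price": 5}}
    for row in apply_scd2(dim, batch, date(2024, 2, 1)):
        print(row)
```

The key property is that history is never overwritten: a price change on `P1` leaves the old row in place with a closed validity window, so queries can reconstruct the state as of any date.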

  • Data Sources

    • MySQL Retailer Database

    • MySQL Supplier Database

    • API Reviews (api-reviews)
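To show how a metadata-driven approach might tie these three sources together, here is a minimal sketch in which each source is described by a configuration entry and the extraction logic is generated from it. Everything in it is an assumption made for illustration: the source identifiers, the table names (`products`, `suppliers`), the `updated_at` watermark column, and the landing path layout are hypothetical and do not come from the course.

```python
# Hypothetical metadata describing each source; in practice this might
# live in a config file or a BigQuery control table.
SOURCES = [
    {"source": "retailer_db", "kind": "mysql", "table": "products",
     "load": "incremental", "watermark": "updated_at"},
    {"source": "supplier_db", "kind": "mysql", "table": "suppliers",
     "load": "full", "watermark": None},
    {"source": "api-reviews", "kind": "api", "table": None,
     "load": "full", "watermark": None},
]


def build_extract_query(entry, last_watermark=None):
    """Build the extraction SQL for a MySQL source entry.

    Returns None for non-SQL sources such as the reviews API.
    String interpolation is used only to keep the sketch short; real code
    should pass the watermark as a query parameter.
    """
    if entry["kind"] != "mysql":
        return None
    query = f"SELECT * FROM {entry['table']}"
    if entry["load"] == "incremental" and last_watermark is not None:
        # Only pull rows changed since the previous successful run.
        query += f" WHERE {entry['watermark']} > '{last_watermark}'"
    return query


def landing_path(entry, run_date):
    """Return a hypothetical landing prefix for the raw extract of one run."""
    return f"landing/{entry['source']}/{run_date}/"


if __name__ == "__main__":
    for entry in SOURCES:
        print(entry["source"],
              build_extract_query(entry, "2024-01-01"),
              landing_path(entry, "2024-02-01"))
```

With this shape, onboarding a new table means adding one metadata entry rather than writing a new pipeline.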

       

  • Expected Outcomes

    • Efficient Data Pipeline: Automating the ingestion and transformation of RCM data.

    • Structured Data Warehouse: Gold tables in BigQuery for analytical queries.

    • Dashboards and Reporting: After analysis, Looker BI is used to generate dashboards and reports based on the gold-layer tables.

    • All processes (data extraction, loading into GCS, transformation in BigQuery) are managed using Apache Airflow, ensuring automation, scheduling, and monitoring.
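The ordering described above (extract, load into GCS, transform in BigQuery) is the kind of dependency graph an Airflow DAG encodes. The sketch below illustrates only that ordering idea using Python's standard-library `graphlib`, not Airflow itself. The task names and the `run` helper are hypothetical, and the task bodies are placeholders.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
PIPELINE = {
    "extract_sources": set(),
    "load_to_gcs": {"extract_sources"},
    "transform_in_bigquery": {"load_to_gcs"},
}


def run_order(pipeline):
    """Return task names in an order that respects every dependency."""
    return list(TopologicalSorter(pipeline).static_order())


def run(pipeline, tasks):
    """Run each task callable in dependency order and return the execution log."""
    log = []
    for name in run_order(pipeline):
        tasks[name]()      # a real scheduler would also add retries and alerting
        log.append(name)
    return log


if __name__ == "__main__":
    # Placeholder task bodies that only print their own name.
    placeholder_tasks = {name: (lambda n=name: print("running", n)) for name in PIPELINE}
    run(PIPELINE, placeholder_tasks)
```

A real orchestrator adds what this sketch leaves out: scheduling, retries, and monitoring.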

       

Who this course is for:

  • Aspiring data engineers and data professionals
  • Anyone preparing for data engineering interviews
Course details
Level Beginner
