Skip to content
#

apache-hadoop

Here are 102 public repositories matching this topic...

This repository showcases a Medallion Architecture Data Lakehouse designed for both batch and real-time processing of e-commerce and marketing data. It supports comprehensive data analysis, reporting, and monitoring, providing a scalable solution for deriving insights from integrated datasets.

  • Updated Sep 26, 2024
  • Jupyter Notebook

This project simulates a real-world enterprise data migration and modernization strategy. It extracts transactional data from a simulated "On-Premise" environment (hosted on AWS EC2), performs heavy distributed processing using a Hadoop/Spark cluster, and ultimately serves the data via a Cloud-Native, serverless architecture to optimize costs .

  • Updated Mar 19, 2026
  • Python

Improve this page

Add a description, image, and links to the apache-hadoop topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the apache-hadoop topic, visit your repo's landing page and select "manage topics."

Learn more