DOCUTEE/HaMu

🚀 HaMu is a tool for quickly deploying a fully containerized pseudo-distributed Hadoop cluster, making Hadoop setup faster and easier.


📖 My Story (feel free to skip)

Setting up a Hadoop cluster manually can be frustrating, especially for beginners. My friends and I faced several challenges when deploying a multi-node Hadoop cluster on VMware, such as configuration issues, poor scalability, and inefficient resource usage. To solve these problems, I developed HaMu, a tool that simplifies Hadoop deployment using Docker containers.

I first learned about Hadoop in the Introduction to Big Data course at my university. The subject itself was not difficult, but it became challenging when my friends and I had to deploy a multi-node Hadoop cluster on VMware. While Hadoop is powerful, using VMware introduced several inconveniences, such as:

  • Inability to scroll up to read errors in the terminal
  • Awkward communication with external systems
  • Weak performance due to virtualization overhead
  • Difficulty in scaling out
  • High storage usage for backups
  • Long rebuild times when making mistakes

To address these issues, I decided to containerize the Hadoop system using Docker. This project serves as a final summary of my knowledge in both Hadoop and Docker.

💡 I hope HaMu helps you quickly set up a multi-node Hadoop cluster, making it easier and more efficient to practice Hadoop. 🚀

👥 Authors

✨ Features

🖥️ OS Support

  • 🪟 Windows (via WSL2 or Docker Desktop)
  • 🐧 Linux (Ubuntu, CentOS, Debian, etc.): ⏳ Coming soon

📌 Prerequisites

  • 🐳 Docker
  • 🗃️ Basic knowledge of Hadoop

🚀 Installation Guide

Please select one of the two options: Windows or Linux.

Modify the Owner Name

If you need to change the owner name, run the rename-owner.py script and enter your new owner name when prompted.

⏳ Note: The current owner name is stored in OwnerName.txt.

📌 Limitation: choose a name that does not clash with Hadoop or Docker keywords. For example, avoid names like 'hdfs', 'yarn', 'container', or 'docker-compose'.

python rename-owner.py
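To catch a bad owner name before running the script, you could pre-check it. This is a hypothetical sketch, not the logic inside rename-owner.py: the reserved-word set is the README's examples plus "hadoop" and "docker", and the character rule assumes the name ends up in Docker Compose project or container names (lowercase letters, digits, "-" and "_", starting with a letter or digit).

```python
import re

# Hypothetical reserved words: the README's examples plus "hadoop" and "docker".
# The real rename-owner.py may use a different (or longer) list.
RESERVED_WORDS = {"hadoop", "hdfs", "yarn", "docker", "container", "docker-compose"}

# Assumption: the owner name is reused in Docker Compose project/container names.
NAME_PATTERN = re.compile(r"^[a-z0-9][a-z0-9_-]*$")


def validate_owner_name(name: str) -> list[str]:
    """Return a list of problems with `name`; an empty list means it looks usable."""
    problems = []
    candidate = name.strip()
    if not candidate:
        problems.append("name is empty")
        return problems
    if candidate.lower() in RESERVED_WORDS:
        problems.append(f"'{candidate}' clashes with a Hadoop/Docker keyword")
    if not NAME_PATTERN.match(candidate):
        problems.append("use only lowercase letters, digits, '-' and '_'")
    return problems


if __name__ == "__main__":
    for name in ["alice", "hdfs", "My Cluster", ""]:
        print(repr(name), validate_owner_name(name) or "ok")
```

Returning a list of problems rather than a single boolean lets a prompt loop explain every issue with a name at once.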

🌐 Interact with the Web UI

You can access the following web interfaces to monitor and manage your Hadoop cluster:

  • YARN ResourceManager UI → http://localhost:9004
    Provides an overview of cluster resource usage, running applications, and job details.

  • NameNode UI → http://localhost:9870
    Displays HDFS file system details, block distribution, and overall health status.
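If a page does not load, a quick probe tells you whether anything is listening on those ports at all. The sketch below uses only the Python standard library; `WEB_UIS` and `ui_is_up` are illustrative names, and the ports are the host ports listed above, so adjust them if your setup maps different ones.

```python
import urllib.error
import urllib.request

# Host ports taken from this README; change them if your setup maps different ports.
WEB_UIS = {
    "YARN ResourceManager": "http://localhost:9004",
    "HDFS NameNode": "http://localhost:9870",
}


def ui_is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url` (any status code counts)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, just not with a 2xx/3xx status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, DNS failure, ...


if __name__ == "__main__":
    for name, url in WEB_UIS.items():
        print(f"{name:22} {url:24} {'UP' if ui_is_up(url) else 'DOWN'}")
```

`HTTPError` is caught before `URLError` because it is a subclass; an error page still proves the service is reachable.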

Contributors ✨

Thanks go to these wonderful people (emoji key):

Nguyen Quoc Huy 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

Contact

📧 Email: quangforwork1203@gmail.com

💬 My project still has many aspects that need improvement. I would greatly appreciate your feedback!
