HaMu is a tool for quickly deploying a fully containerized pseudo-distributed Hadoop cluster, making Hadoop setup faster and easier.
- My Story
- Authors
- Features
- OS support
- Prerequisites
- Installation Guide
- Modify the Owner Name
- Interact with the Web UI
- Contributors
- Contact
Setting up a Hadoop cluster manually can be frustrating, especially for beginners. My friends and I faced several challenges when deploying a multi-node Hadoop cluster on VMware, such as configuration issues, poor scalability, and inefficient resource usage. To solve these problems, I developed HaMu, a tool that simplifies Hadoop deployment using Docker containers.
I first learned about Hadoop in the Introduction to Big Data course at my university. The subject itself was not difficult, but it became challenging when my friends and I had to deploy a multi-node Hadoop cluster on VMware. While Hadoop is powerful, using VMware introduced several inconveniences, such as:
- Inability to scroll up to read errors in the terminal
- Unfriendly communication with external systems
- Weak performance due to virtualization overhead
- Difficulty in scaling out
- High storage usage for backups
- Long rebuild times when making mistakes
To address these issues, I decided to containerize the Hadoop system using Docker. This project also serves as a capstone for what I have learned about both Hadoop and Docker.
I hope HaMu helps you quickly set up a multi-node Hadoop cluster, making it easier and more efficient to practice Hadoop.
- Deploy a Hadoop multi-node cluster with a single command.
- Customize the number of slave nodes.
- Set the cluster owner's name.
- Interact with the cluster via CLI or Web UI.
- Windows (via WSL2 or Docker Desktop)
- Linux (Ubuntu, CentOS, Debian, etc.): coming soon
- Docker
- Basic knowledge of Hadoop
Please select one of the two options!


If you need to change the owner name, run the `rename-owner.py` script and enter your new owner name when prompted.
Note: To check the current owner name, look in `OwnerName.txt`.
There are some limitations: choose a name that does not collide with words from Hadoop or Docker syntax. For example, avoid names like 'hdfs', 'yarn', 'container', or 'docker-compose'.
python rename-owner.py
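The naming limitation above can be checked before you run the script. The following is a minimal sketch, not part of HaMu itself: the helper `is_safe_owner_name` and the `RESERVED_WORDS` set are hypothetical, and the set contains only the four examples named in this README, so it is likely incomplete. It also assumes that only exact (case-insensitive) matches are a problem.

```python
# Hypothetical pre-check for an owner name; not part of rename-owner.py.
# RESERVED_WORDS holds only the examples listed in this README (assumption:
# the real set of problematic names may be larger).
RESERVED_WORDS = {"hdfs", "yarn", "container", "docker-compose"}


def is_safe_owner_name(name: str) -> bool:
    """Return True if the name is non-empty and not one of the reserved examples."""
    candidate = name.strip().lower()
    if not candidate:
        # An empty or whitespace-only name is rejected.
        return False
    return candidate not in RESERVED_WORDS


if __name__ == "__main__":
    for example in ("alice", "HDFS", "docker-compose"):
        verdict = "ok" if is_safe_owner_name(example) else "avoid"
        print(f"{example}: {verdict}")
```

If the check passes, run `python rename-owner.py` as usual and enter the name when prompted.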
You can access the following web interfaces to monitor and manage your Hadoop cluster:
- YARN Resource Manager UI: http://localhost:9004
  Provides an overview of cluster resource usage, running applications, and job details.
- NameNode UI: http://localhost:9870
  Displays HDFS file system details, block distribution, and overall health status.
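To confirm that both interfaces are up after deployment, you can probe the two URLs from the host. This is a minimal sketch using only the Python standard library. The helper `is_reachable` and the `WEB_UIS` mapping are illustrative names, not part of HaMu, and the sketch assumes the ports listed above are published on `localhost`.

```python
import http.client
import urllib.error
import urllib.request

# URLs taken from the list above; the dictionary name is illustrative.
WEB_UIS = {
    "YARN Resource Manager UI": "http://localhost:9004",
    "NameNode UI": "http://localhost:9870",
}


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status, so it is running.
        return True
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a malformed response.
        return False


if __name__ == "__main__":
    for name, url in WEB_UIS.items():
        status = "reachable" if is_reachable(url) else "not reachable"
        print(f"{name} ({url}): {status}")
```

A "not reachable" result right after startup may simply mean the containers are still initializing, so it can be worth retrying after a short wait.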
Thanks go to these wonderful people:
- Nguyen Quoc Huy (code)
This project follows the all-contributors specification. Contributions of any kind welcome!
Email: quangforwork1203@gmail.com
My project still has many aspects that need improvement. I would greatly appreciate your feedback!