git clone https://github.com/huy-dataguy/Spark-on-YARN.git
cd Spark-on-YARN
⏳ Note: The first build may take a few minutes as no cached layers exist.
docker build -t base -f docker/base.dockerfile .
- Build the images (needed on the first run, or after changing a Dockerfile)
docker compose -f docker/compose.yaml build
- Run the containers
docker compose -f docker/compose.yaml up -d
- Open a shell inside the master container
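One way to do this, assuming the master service is named `master` in `docker/compose.yaml` (check the file for the actual service name), is:

```shell
# Open an interactive shell in the master container.
# The service name "master" is an assumption; adjust it to match docker/compose.yaml.
docker compose -f docker/compose.yaml exec master bash
```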
💡 Start the HDFS and YARN services:
start-dfs.sh
start-yarn.sh
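To sanity-check that the daemons came up, you can list the running JVM processes with `jps` (bundled with the JDK) and ask YARN for its registered nodes; the exact process names you see depend on how the cluster is laid out across containers:

```shell
# On the master node you would typically expect NameNode and ResourceManager;
# worker nodes typically show DataNode and NodeManager.
jps

# Confirm that the worker nodes have registered with the ResourceManager.
yarn node -list
```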

Create a folder on HDFS to store Spark logs
hdfs dfs -mkdir /spark-logs
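The `/spark-logs` directory is only useful if Spark is configured to write event logs into it. If the image does not already set this up, a sketch of the relevant `spark-defaults.conf` entries (property names are standard Spark settings; the HDFS path matches the directory created above) would be:

```properties
# $SPARK_HOME/conf/spark-defaults.conf
spark.eventLog.enabled          true
spark.eventLog.dir              hdfs:///spark-logs
spark.history.fs.logDirectory   hdfs:///spark-logs
```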
Run Spark on YARN
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 10
If it succeeds, you will see the answer: Pi is roughly 3.14159
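If the result scrolls past in the driver output, you can retrieve it afterwards from YARN's aggregated logs (the application ID below is a placeholder to replace with a real one):

```shell
# List recently finished applications and note the application ID.
yarn application -list -appStates FINISHED

# Fetch the aggregated logs for a finished application;
# replace <application_id> with the real ID from the list above.
yarn logs -applicationId <application_id>
```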
You can access the following web interfaces to monitor and manage your Hadoop cluster:
- YARN ResourceManager UI → http://localhost:9004
  Provides an overview of cluster resource usage, running applications, and job details.
- NameNode UI → http://localhost:9870
  Displays HDFS file system details, block distribution, and overall health status.
- Spark Web UI → http://localhost:4040
  Provides an interface to monitor running Spark jobs, stages, and tasks. Note: because you are using YARN client mode, the Spark UI will automatically redirect to the master node's web UI.


📧 Email: quochuy.working@gmail.com
💬 Feel free to contribute and improve this project! 🚀