# Spark-on-YARN

This repository contains the configuration and scripts necessary to run Apache Spark on a Hadoop YARN cluster in client mode. The setup allows you to leverage the scalability of YARN for distributed data processing with Spark.
## 🏢 Spark on YARN Architecture (Client Mode)

*(architecture diagram)*

## 🚀 Installation Guide

### Step 1: Clone the Repository

```bash
git clone https://github.com/huy-dataguy/Spark-on-YARN.git
cd Spark-on-YARN
```

### Step 2: Build the Base Image

⏳ Note: The first build may take a few minutes as no cached layers exist.

```bash
docker build -t base -f docker/base.dockerfile .
```
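To confirm the image was built, you can list it by the tag passed to `-t` above:

```bash
docker images base
```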

### Step 3: Build and Start the Cluster

Build the images (needed the first time, or after changing a Dockerfile):

```bash
docker compose -f docker/compose.yaml build
```

Start the containers:

```bash
docker compose -f docker/compose.yaml up -d
```
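A quick way to check that the containers came up (the service names come from `docker/compose.yaml`):

```bash
docker compose -f docker/compose.yaml ps
```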

### Step 4: Verify the Installation

Open a shell inside the master container, for example as shown below.
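A sketch, assuming the master service is named `master` in `docker/compose.yaml` (check the actual name with `docker compose -f docker/compose.yaml ps`):

```bash
docker exec -it master bash
```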

💡 Start the HDFS and YARN services:

```bash
start-dfs.sh
start-yarn.sh
```
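To check that the daemons actually started, a quick sketch (which processes appear on which node depends on this repo's cluster layout):

```bash
# On the master: expect processes such as NameNode and ResourceManager
jps

# Cluster-wide checks
hdfs dfsadmin -report   # live DataNodes
yarn node -list         # registered NodeManagers
```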

### Step 5: Run spark-submit on YARN in Client Mode

Create an HDFS directory to store the Spark logs:

```bash
hdfs dfs -mkdir /spark-logs
```
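You can confirm the directory was created (it presumably matches the Spark event-log location configured elsewhere in this repo):

```bash
hdfs dfs -ls /
```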

Run Spark on YARN:

```bash
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 10
```
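The command above relies on the cluster's Spark configuration to pick YARN as the master. If your setup does not define that default, the standard `spark-submit` flags can be passed explicitly; a minimal sketch:

```bash
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 10
```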

If the job succeeds, you will see the answer Pi = 3.14159 in the output.


## 🌐 Interact with the Web UI

You can access the following web interfaces to monitor and manage your Hadoop cluster:

- **YARN Resource Manager UI**: http://localhost:9004
  Provides an overview of cluster resource usage, running applications, and job details.

- **NameNode UI**: http://localhost:9870
  Displays HDFS file system details, block distribution, and overall health status.

- **Spark Web UI**: http://localhost:4040
  Provides an interface to monitor running Spark jobs, stages, and tasks. Note: because the application runs in YARN client mode, the Spark UI will automatically redirect to the master node's web UI.
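If you prefer the command line, the same information can be checked from a shell; a sketch assuming the port mappings listed above (the `yarn` command must run where the Hadoop client is configured, e.g. inside the master container):

```bash
# Applications known to YARN (the SparkPi run should appear here)
yarn application -list -appStates ALL

# Reachability checks for the UIs from the host
curl -sI http://localhost:9004 | head -n 1
curl -sI http://localhost:9870 | head -n 1
```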


## 📞 Contact

📧 Email: quochuy.working@gmail.com

💬 Feel free to contribute and improve this project! 🚀
