Spark/Hadoop Cluster

Revision as of 06:48, 29 January 2024

Getting Started

This assumes the Spark/Hadoop cluster was configured in a particular way. The general configuration is visible on the Foreman page. In short, Spark was installed to /usr/local/spark and Hadoop to /usr/local/hadoop.

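The Apache Hadoop single-node setup guide (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html) is a good reference for the general setup of a single-node cluster.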

Passwordless SSH from Master

The spark user on the master needs to SSH to itself (for a local worker) and to each worker, so passwordless SSH must be enabled. Log in as the spark user on the master server and run:

ssh-keygen -t rsa -P ""

By default, this writes the private key to /home/spark/.ssh/id_rsa and the public key to /home/spark/.ssh/id_rsa.pub. To allow spark to SSH to itself, append the public key to its authorized_keys file:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
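The cat command adds another copy of the key each time this step is repeated. Below is a minimal idempotent sketch, assuming the default key path from ssh-keygen. The function name authorize_own_key is hypothetical. The permission changes follow sshd's StrictModes check and are not a documented requirement of this cluster.

```shell
#!/bin/sh
# Hypothetical helper: append the public key to ~/.ssh/authorized_keys only
# if that exact line is not already there, so re-running setup is harmless.
# Assumes the default ssh-keygen key path; pass a different path as $1.
authorize_own_key() (
    pubkey="${1:-$HOME/.ssh/id_rsa.pub}"
    auth="$HOME/.ssh/authorized_keys"

    if [ ! -r "$pubkey" ]; then
        echo "public key not found: $pubkey" >&2
        exit 1
    fi

    mkdir -p "$HOME/.ssh"
    chmod 700 "$HOME/.ssh"
    touch "$auth"
    chmod 600 "$auth"   # with StrictModes, sshd rejects key files writable by others

    key=$(cat "$pubkey")
    # -F: literal match (keys contain '+' and '/'); -x: whole line only
    if grep -qxF -- "$key" "$auth"; then
        echo "key already authorized"
    else
        printf '%s\n' "$key" >> "$auth"
        echo "key added to $auth"
    fi
)
```

Usage, as the spark user after ssh-keygen: authorize_own_key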

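Or, for each worker (including localhost), copy the public key with ssh-copy-id, which appends it to the remote user's authorized_keys file: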

ssh-copy-id -i ~/.ssh/id_rsa.pub spark@localhost
ssh-copy-id -i ~/.ssh/id_rsa.pub spark@spark2.lab.bpopp.net
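With more than a couple of workers, the ssh-copy-id calls can be driven from a host list. The sketch below is hypothetical: copy_key_to_workers is not part of Spark or OpenSSH. It assumes a file with one hostname per line. Spark 3.x standalone mode keeps such a list in conf/workers (conf/slaves in older releases), so /usr/local/spark/conf/workers is a plausible source. Check what this cluster actually uses.

```shell
#!/bin/sh
# Hypothetical helper: run ssh-copy-id for every host in a worker list file.
# One hostname per line; blank lines and '#' comments are ignored.
copy_key_to_workers() (
    workers_file="$1"
    pubkey="${2:-$HOME/.ssh/id_rsa.pub}"
    failed=0

    while IFS= read -r line || [ -n "$line" ]; do
        # drop any '#' comment, then strip all whitespace
        host=$(printf '%s' "${line%%#*}" | tr -d '[:space:]')
        [ -n "$host" ] || continue
        echo "copying key to spark@$host"
        # </dev/null keeps ssh-copy-id from consuming the rest of the file;
        # ssh still reads any password prompt from the terminal.
        if ! ssh-copy-id -i "$pubkey" "spark@$host" </dev/null; then
            echo "failed: $host" >&2
            failed=1
        fi
    done < "$workers_file"

    exit "$failed"
)
```

Usage: copy_key_to_workers /usr/local/spark/conf/workers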


Starting Spark

su spark
cd /usr/local/spark/sbin
./start-all.sh
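Spark's sbin/start-all.sh starts the master locally and then reaches each worker over SSH, so a worker that still asks for a password will not come up. The following is a hedged pre-flight check using a hypothetical check_passwordless_ssh function. The ssh option BatchMode=yes makes the connection fail instead of prompting, which turns a missing key into a visible failure.

```shell
#!/bin/sh
# Hypothetical helper: confirm that each host accepts key-based SSH login
# for the spark user. BatchMode=yes fails instead of asking for a password.
check_passwordless_ssh() (
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "spark@$host" true </dev/null; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    exit "$failed"
)
```

Usage, from /usr/local/spark/sbin: check_passwordless_ssh localhost spark2.lab.bpopp.net && ./start-all.sh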

Starting Hadoop

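Note that the NameNode must be formatted before HDFS is started for the first time, or HDFS will not start. Do this only once, because reformatting an existing NameNode erases the HDFS metadata.

As the spark user: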

hdfs namenode -format
cd /usr/local/hadoop/sbin
./start-all.sh
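Formatting an already-formatted NameNode discards the HDFS metadata, so the format step can be guarded. This sketch treats the presence of current/VERSION in the NameNode's name directory as a sign that it has already been formatted. That directory is whatever dfs.namenode.name.dir is set to in hdfs-site.xml. The name format_namenode_once is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: format the NameNode only if its name directory has
# not been initialised. A formatted name directory contains current/VERSION.
format_namenode_once() (
    name_dir="$1"
    if [ -z "$name_dir" ]; then
        echo "usage: format_namenode_once NAME_DIR" >&2
        exit 2
    fi
    if [ -f "$name_dir/current/VERSION" ]; then
        echo "NameNode already formatted in $name_dir; skipping"
    else
        echo "formatting NameNode in $name_dir"
        hdfs namenode -format
    fi
)
```

The configured directory can be printed with hdfs getconf -confKey dfs.namenode.name.dir. It may come back as a file:// URI, so strip the prefix before passing it in, then run: format_namenode_once /path/to/name/dir && /usr/local/hadoop/sbin/start-all.sh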

Spark UI

http://spark1.lab.bpopp.net:8080
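The master UI may take a few seconds to respond after start-all.sh. Below is a small polling sketch, assuming curl is installed on the machine running it. The name wait_for_ui is hypothetical, and the default of 10 attempts with a 2-second delay is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: poll a web UI until it answers or attempts run out.
# curl -f treats HTTP 4xx/5xx as failures; -s hides progress output.
wait_for_ui() (
    url="$1"
    tries="${2:-10}"
    i=1
    while [ "$i" -le "$tries" ]; do
        if curl -fs -o /dev/null --max-time 5 "$url"; then
            echo "up: $url"
            exit 0
        fi
        # wait between attempts, but not after the last one
        if [ "$i" -lt "$tries" ]; then
            sleep 2
        fi
        i=$((i + 1))
    done
    echo "no response from $url after $tries attempts" >&2
    exit 1
)
```

Usage: wait_for_ui http://spark1.lab.bpopp.net:8080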