Thursday, 10 November 2016



Assuming Hadoop is already installed on all machines:
  • Download and install Scala, either from a tarball or with the command
    sudo apt-get install scala
  • Check the installed Scala version with the command
    scala -version
  • Update /etc/hosts on every machine by adding the IP address and hostname of each machine in the cluster
    sudo vi /etc/hosts
      120.120.120.114  bigdata-node-1-1
      120.120.120.113  bigdata-node-1-2
      120.120.120.112  bigdata-node-1-3
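The hosts entries above can also be added from a script, which is handy when the same step has to be repeated on every node. The following is a minimal sketch, not part of the original procedure: the add_host function and the HOSTS_FILE variable are hypothetical names, and HOSTS_FILE defaults to a scratch file so the script can be tried without touching /etc/hosts.

```shell
#!/usr/bin/env bash
# Append cluster host entries to a hosts file, skipping names already present.
# HOSTS_FILE (hypothetical variable) defaults to a scratch file; on a real node
# set HOSTS_FILE=/etc/hosts and run the script with sudo.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

add_host() {
    local ip="$1" name="$2"
    # -F: fixed string, -w: whole-word match, -q: no output.
    if ! grep -qwF -- "$name" "$HOSTS_FILE" 2>/dev/null; then
        printf '%s  %s\n' "$ip" "$name" >> "$HOSTS_FILE"
    fi
}

add_host 120.120.120.114 bigdata-node-1-1
add_host 120.120.120.113 bigdata-node-1-2
add_host 120.120.120.112 bigdata-node-1-3
```

Because existing names are skipped, running the script a second time leaves the file unchanged.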
  • Download Apache Spark, untar it, and move it to a permanent location such as /opt
    tar xzf spark-1.6.0-bin-hadoop2.6.tgz
    sudo mv spark-1.6.0-bin-hadoop2.6/ /opt/
  • Update the configuration files under /opt/spark-1.6.0-bin-hadoop2.6/conf/
    Add the IP address or hostname of each worker (slave) node to the slaves file, one per line
    vi slaves
      bigdata-node-1-2
      bigdata-node-1-3
    Create spark-env.sh from its template and add the lines below at the end of the file
    sudo cp spark-env.sh.template spark-env.sh
    vi spark-env.sh
      SPARK_JAVA_OPTS=-Dspark.driver.port=53411
      HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
      SPARK_MASTER_IP=bigdata-node-1-1
    Create spark-defaults.conf from its template and add the lines below
    sudo cp spark-defaults.conf.template spark-defaults.conf
    vi spark-defaults.conf
      spark.master                     spark://bigdata-node-1-1:7077
      spark.serializer                 org.apache.spark.serializer.KryoSerializer
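Since the same three files have to be prepared on every machine, the edits above could be scripted instead of made by hand in vi. The sketch below is one possible way to do that, under stated assumptions: CONF_DIR is a hypothetical variable that defaults to a scratch directory, and the slaves file is overwritten rather than appended to, which suits a fresh install only.

```shell
#!/usr/bin/env bash
# Write the Spark standalone configuration described above.
# CONF_DIR (hypothetical variable) defaults to a scratch directory; on a real
# node it would be /opt/spark-1.6.0-bin-hadoop2.6/conf (writing there may need sudo).
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# Worker hostnames, one per line. This overwrites any existing slaves file.
printf '%s\n' bigdata-node-1-2 bigdata-node-1-3 > "$CONF_DIR/slaves"

# The quoted 'EOF' keeps $HADOOP_HOME literal, so it is expanded later,
# when Spark's scripts source spark-env.sh, not when this script runs.
cat >> "$CONF_DIR/spark-env.sh" <<'EOF'
SPARK_JAVA_OPTS=-Dspark.driver.port=53411
HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_MASTER_IP=bigdata-node-1-1
EOF

cat >> "$CONF_DIR/spark-defaults.conf" <<'EOF'
spark.master                     spark://bigdata-node-1-1:7077
spark.serializer                 org.apache.spark.serializer.KryoSerializer
EOF

echo "Configuration written to $CONF_DIR"
```

The two heredocs append, so run the script once per node; a second run would duplicate the lines in spark-env.sh and spark-defaults.conf.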
  • Repeat the steps above on every machine in the cluster
  • Generate an SSH key pair on the master machine
    ssh-keygen -t rsa
    Press Enter at each prompt to accept the defaults
    Append the public key to the master's own authorized_keys file
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    Append the master's public key to ~/.ssh/authorized_keys on every slave machine, so that the master can log in to them without a password
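One common way to copy the master's public key to the slave machines is ssh-copy-id, which appends the key to ~/.ssh/authorized_keys on the remote host. The following is a sketch under assumptions that the original does not state: the same user account exists on every machine, password login is still enabled on the slaves, and the WORKERS array, DRY_RUN variable, and copy_key_to_workers function are hypothetical names. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Worker hostnames, matching the entries in the slaves file.
WORKERS=(bigdata-node-1-2 bigdata-node-1-3)

# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

copy_key_to_workers() {
    local host
    for host in "${WORKERS[@]}"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh-copy-id -i $HOME/.ssh/id_rsa.pub $host"
        else
            # Prompts for the remote password once per host.
            ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$host"
        fi
    done
}

copy_key_to_workers
```

Once the keys are in place, `ssh bigdata-node-1-2` from the master should log in without asking for a password.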
  • Add the Spark environment variables to ~/.bashrc
    vi ~/.bashrc
      export SPARK_HOME=/opt/spark-1.6.0-bin-hadoop2.6
      export PATH=$PATH:$SPARK_HOME/bin
      export PATH=$PATH:$SPARK_HOME/sbin
    Save and exit, then reload the shell configuration
    source ~/.bashrc
    exec bash
  • Start all the daemons from the master with start-all.sh, or with start-master.sh followed by start-slaves.sh
    The Spark web UI is available at http://<ip-address_of_master>:8080/

  • To stop the daemons, use stop-all.sh, or stop-master.sh and stop-slaves.sh
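After starting the daemons, a quick way to confirm that the master is up is to check that its ports accept connections: 7077 for the master URL configured in spark-defaults.conf and 8080 for the web UI. The sketch below assumes bash and the coreutils timeout command are available; port_open and MASTER_HOST are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Succeed if a TCP connection to host:port can be opened within 3 seconds.
# Relies on bash's /dev/tcp redirection and on coreutils' timeout.
port_open() {
    local host="$1" port="$2"
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# MASTER_HOST (hypothetical variable) defaults to the master used in this guide.
MASTER_HOST="${MASTER_HOST:-bigdata-node-1-1}"

for port in 7077 8080; do
    if port_open "$MASTER_HOST" "$port"; then
        echo "$MASTER_HOST:$port is reachable"
    else
        echo "$MASTER_HOST:$port is NOT reachable"
    fi
done
```

A port reported as not reachable after start-all.sh usually means the master did not start or a firewall is blocking it; the log files under $SPARK_HOME/logs are the place to look first.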