Rack Awareness on Hadoop


Category : Bigdata

If your Hadoop cluster has more than 30-40 nodes, it is better to configure it with rack awareness, because communication between two DataNodes on the same rack is more efficient than communication between two nodes on different racks.

Rack awareness also helps reduce network traffic while reading/writing HDFS files: the NameNode chooses DataNodes that are on the same rack as, or a rack near, the client node making the read/write request.

The NameNode achieves this by maintaining the rack ID of each DataNode. This concept of choosing closer DataNodes based on rack information is called Rack Awareness in Hadoop.

Note: A default Hadoop installation assumes that all nodes belong to the same rack.

In this article I will explain how to make your cluster rack aware.

Step 1: Create a topology data file on the master node (i.e. the NameNode) that maps each DataNode IP address to its rack number. Place it in /etc/hadoop/conf, which is where the script in Step 2 will look for it.

[root@m1 ~]# vi /etc/hadoop/conf/topology.data

[root@m1 ~]# cat /etc/hadoop/conf/topology.data

192.168.56.51 01

192.168.56.52 02

192.168.56.53 01

192.168.56.54 02

192.168.56.55 01

192.168.56.56 02

Step 2: Now create rack-topology.sh, which maps a node address to a rack using the data file above, and make it executable.

[root@m1 ~]# vi rack-topology.sh

[root@m1 ~]# chmod +x rack-topology.sh

[root@m1 ~]# cat rack-topology.sh

#!/bin/bash

# Adjust/add the property "net.topology.script.file.name" in
# core-site.xml with the "absolute" path to this file.
# ENSURE the file is "executable".

# Supply an appropriate rack prefix
RACK_PREFIX=default

# To test, supply one or more hostnames/IPs as script arguments:
if [ $# -gt 0 ]; then

  CTL_FILE=${CTL_FILE:-"topology.data"}
  HADOOP_CONF=${HADOOP_CONF:-"/etc/hadoop/conf"}

  if [ ! -f ${HADOOP_CONF}/${CTL_FILE} ]; then
    echo -n "/$RACK_PREFIX/rack "
    exit 0
  fi

  while [ $# -gt 0 ] ; do
    nodeArg=$1
    exec< ${HADOOP_CONF}/${CTL_FILE}
    result=""
    while read line ; do
      ar=( $line )
      if [ "${ar[0]}" = "$nodeArg" ] ; then
        result="${ar[1]}"
      fi
    done
    shift
    if [ -z "$result" ] ; then
      echo -n "/$RACK_PREFIX/rack "
    else
      echo -n "/$RACK_PREFIX/rack_$result "
    fi
  done

else
  echo -n "/$RACK_PREFIX/rack "
fi
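The mapping logic can be sanity-checked locally before Hadoop ever calls the script. The sketch below reproduces the lookup loop against a throwaway copy of the data file; the /tmp path and the `lookup` helper name are illustrative for testing only, not part of the real script:

```shell
# Write a throwaway copy of the topology data to test against.
cat > /tmp/topology.data <<'EOF'
192.168.56.51 01
192.168.56.52 02
EOF

# Same lookup loop as rack-topology.sh, reduced to a function for testing.
lookup() {
  local result=""
  while read -r ip rack; do
    [ "$ip" = "$1" ] && result="$rack"
  done < /tmp/topology.data
  if [ -z "$result" ]; then
    echo -n "/default/rack "      # unknown hosts fall back to a default rack
  else
    echo -n "/default/rack_$result "
  fi
}

lookup 192.168.56.51   # prints "/default/rack_01 "
lookup 10.0.0.9        # prints "/default/rack " (not in the data file)
```

Each mapping the script prints must be a rack path beginning with "/", since Hadoop treats the output as a network topology path.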

Step 3: Add the following property to core-site.xml (or set it through Ambari), pointing at the absolute path of the script:

<property>
<name>net.topology.script.file.name</name>
<value>/root/rack-topology.sh</value>
</property>

(On older Hadoop 1.x installations the property is named topology.script.file.name.)
Step 4: Now restart the HDFS service for the change to take effect.
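After the restart, you can confirm the NameNode picked up the rack mapping by printing its view of the topology (this requires a running cluster; the rack paths shown are what you would expect with the example data above):

```shell
# Ask the NameNode for its view of the cluster topology.
# Each DataNode should now be listed under its rack,
# e.g. /default/rack_01 or /default/rack_02 instead of /default-rack.
hdfs dfsadmin -printTopology
```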
I hope this article helped you make your cluster rack aware. Please feel free to give your feedback.