Wednesday, 13 November 2013

Hadoop-2.2.0 Installation Steps for Single-Node Cluster (On Ubuntu 12.04)



1.       Download and install VMware Player matching your host OS (32-bit or 64-bit)


2.       Download the .iso image file of Ubuntu 12.04 LTS (32-bit or 64-bit depending on your requirements)


3.       Install Ubuntu from the image in VMware. (For efficient use, configure the virtual machine with at least 2 GB of RAM (4 GB preferred) and at least 2 processor cores.)


Note: Install it with whatever user ID and password you prefer for your Ubuntu installation. We will create a separate user for the Hadoop installation later.


4.       After Ubuntu is installed, log in and go to User Accounts (top-right corner) to create a new user for Hadoop


5.       Click “Unlock” and enter your administrator password to unlock the settings.


6.        Then click “+” at the bottom-left to add a new user. Set the account type to Administrator (I prefer this, but Standard also works), enter “hduser” as the username, and create the account.
Note: After creating the account you may see it marked as Disabled. Click the dropdown where Disabled is shown and select “Set Password” to set a password for the account, or “Enable” to enable it without a password.
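Tip: if you prefer the terminal, the same user can be created without the GUI. adduser prompts for the password and details, and adding the user to the sudo group gives it administrator rights:
$ sudo adduser hduser
$ sudo adduser hduser sudo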


7.       Your account is set. Now log in to your new “hduser” account.


8.       Open terminal window by pressing Ctrl + Alt + T


9. Install OpenJDK using the following command:
$ sudo apt-get install openjdk-7-jdk

10. Verify the Java version installed (your exact build numbers may differ):
$ java -version
java version "1.7.0_25"
OpenJDK Runtime Environment (IcedTea 2.3.12) (7u25-2.3.12-4ubuntu3)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)


11. Create a symlink from the OpenJDK default directory name to ‘jdk’ using the following commands (sudo is needed because /usr/lib/jvm is owned by root; on a 32-bit installation the directory is java-7-openjdk-i386):
$ cd /usr/lib/jvm
$ sudo ln -s java-7-openjdk-amd64 jdk
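To confirm the symlink points at the right directory:
$ ls -ld /usr/lib/jvm/jdk     # should show: jdk -> java-7-openjdk-amd64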


12. Install the SSH client and server:
$ sudo apt-get install openssh-client
$ sudo apt-get install openssh-server


13. Add the hadoop group and put hduser in it:
$ sudo addgroup hadoop
$ sudo usermod -a -G hadoop hduser


To verify that hduser has been added to the group hadoop use the command:
$ groups hduser


which will display the groups hduser is in.
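The output should look something like this (the exact group list depends on how the account was created; what matters is that hadoop appears):
hduser : hduser sudo hadoop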


14. Configure passwordless SSH (press Enter to accept the default key location when prompted):
$ ssh-keygen -t rsa -P ''
...
Your identification has been saved in /home/hduser/.ssh/id_rsa
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub
...
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost
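On the first connection, ssh will ask you to confirm the host key; answer yes. You should then be logged in without being asked for a password. Type exit to get back to your original shell:
$ exit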


15. Disable IPv6, since Hadoop may bind to IPv6 addresses and fail to communicate properly. Run the following command:
$ gksudo gedit /etc/sysctl.conf


16. Add the following line to the end of the file:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1


Save and close the file. Then restart the system and log in as hduser again.
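After the reboot you can check that IPv6 is really disabled; the following command should print 1:
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6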


17. Download the Hadoop 2.2.0 release tarball to your Downloads folder
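One way to fetch it is with wget; the URL below is the Apache archive location for the 2.2.0 tarball (check the Apache Hadoop download page if it has moved):
$ cd ~/Downloads
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz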


18. Extract Hadoop into /usr/local, rename the directory, and make hduser own it:
$ cd Downloads
$ sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/local
$ cd /usr/local
$ sudo mv hadoop-2.2.0 hadoop
$ sudo chown -R hduser:hadoop hadoop



19. Open the .bashrc file to edit it:
$ cd ~
$ gksudo gedit .bashrc


20. Add the following lines to the end of the file:
#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jdk/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
#end of paste


Save and close the file.
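Then reload the file so the new variables take effect in the current session (or simply open a new terminal):
$ source ~/.bashrc
$ echo $HADOOP_INSTALL     # should print /usr/local/hadoop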


21. Open hadoop-env.sh to edit it:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh


and modify the JAVA_HOME variable in the file:


export JAVA_HOME=/usr/lib/jvm/jdk/


Save and close the file.
Then restart the system and log in again so the changes take effect.


22. Verify the Hadoop version installed using the following command in the terminal:


$ hadoop version

The output should look like the following:
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar


This confirms that Hadoop is installed; all that remains now is to configure it.


23. Run the following command:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml


24. Add the following between the <configuration> ... </configuration> tags (fs.default.name is deprecated in Hadoop 2.x in favour of fs.defaultFS, but the old name still works):
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>


Then save and close the file.


25. In the terminal, run:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml

and paste the following between the <configuration> … </configuration> tags:
<property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
</property>


<property>
 <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
 <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

Note: the key of the class property must contain the service name exactly as declared in yarn.nodemanager.aux-services (mapreduce_shuffle, with an underscore).


26. Run the following command:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml.template


27. Add the following between the <configuration> ... </configuration> tags
<property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
</property>



28. Instead of saving the file directly, use “Save As…” and set the filename to mapred-site.xml. Verify that the file is saved in the /usr/local/hadoop/etc/hadoop/ directory (Hadoop reads mapred-site.xml, not the .template).
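Equivalently, you can copy the template from the command line before editing, which avoids the Save As… step:
$ cd /usr/local/hadoop/etc/hadoop
$ cp mapred-site.xml.template mapred-site.xml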


29. Run the following commands to create the folders for the namenode and datanode:


$ cd ~
$ mkdir -p mydata/hdfs/namenode
$ mkdir -p mydata/hdfs/datanode

  


30. Run the following:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml



31. Add the following lines between the <configuration> … </configuration> tags
<property>
 <name>dfs.replication</name>
 <value>1</value>
</property>


<property>
 <name>dfs.namenode.name.dir</name>
 <value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>


<property>
 <name>dfs.datanode.data.dir</name>
 <value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>



32. Format the namenode with HDFS:
$ hdfs namenode -format
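If the format succeeds, the output should contain a line similar to the following (the path will match the dfs.namenode.name.dir you configured):
INFO common.Storage: Storage directory /home/hduser/mydata/hdfs/namenode has been successfully formatted.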



33. Start Hadoop Services:


$ start-dfs.sh
....
$ start-yarn.sh
….
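If a daemon does not come up, look at its log file; by default the logs are written under the installation directory:
$ ls /usr/local/hadoop/logs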



34. Run the following command to verify that the Hadoop services are running:
$ jps


If everything was successful, you should see the following services running (the process IDs will differ on your machine):
2583 DataNode
2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode
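You can also open the web interfaces in a browser: the NameNode UI at http://localhost:50070 and the ResourceManager UI at http://localhost:8088 (the default ports in Hadoop 2.2.0). As a quick smoke test, you can run the pi example that ships with the release; the small arguments (2 map tasks, 5 samples each) keep the run short:
$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5

When you are done, stop the services with:
$ stop-yarn.sh
$ stop-dfs.sh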


Acknowledgement & Reference:
1. For testing the Hadoop file system by copying and deleting files from the local machine, refer to the following link:



2. There are not many good references online for installing Hadoop 2.2.0 (I had to go through a lot of them before finding one that worked). I followed the steps given on the following webpage.


The changes are mine, made to fix the errors I faced during the installation process.


3. Check out this video if you wish to see how to configure Ubuntu on VMware. It also explains how to install the previous (1.x) version of Hadoop:

Hadoop 1.x Video