Wednesday, 13 November 2013

Hadoop-2.2.0 Installation Steps for Single-Node Cluster (On Ubuntu 12.04)

1.       Download and install VMware Player for your host OS (32-bit or 64-bit)

2.       Download the .iso image file of Ubuntu 12.04 LTS (32-bit or 64-bit, depending on your requirements)

3.       Install Ubuntu from the image in VMware. (For efficient use, configure the virtual machine with at least 2GB of RAM (4GB preferred) and at least 2 processor cores.)

Note: Install it using any user id and password you prefer to keep for your Ubuntu installation. We will create a separate user for Hadoop installation later.

4.       After Ubuntu is installed, log in and go to User Accounts (top-right corner) to create a new user for Hadoop

5.       Click on “Unlock” and unlock the settings by entering your administrator password.

6.        Then click on “+” at the bottom-left to add a new user. Set the account type to Administrator (I prefer this, but you can also select Standard), enter “hduser” as the username, and create the account.
Note: After creating the account, it may show as Disabled. Click on the dropdown that says “Disabled” and select “Set Password” to set a password for this account, or select “Enable” to enable the account without a password.

7.       Your account is set. Now log in to your new “hduser” account.

8.       Open terminal window by pressing Ctrl + Alt + T

9. Install OpenJDK 7 using the following command:
$ sudo apt-get install openjdk-7-jdk

10. Verify the Java version installed:
$ java -version
java version "1.7.0_25"
OpenJDK Runtime Environment (IcedTea 2.3.12) (7u25-2.3.12-4ubuntu3)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)

11. Create a symlink from the default OpenJDK directory name to ‘jdk’ using the following commands:
$ cd /usr/lib/jvm
$ sudo ln -s java-7-openjdk-amd64 jdk

Note: on a 32-bit Ubuntu install the directory is java-7-openjdk-i386 instead.

12. Install the SSH client and server:
$ sudo apt-get install openssh-client
$ sudo apt-get install openssh-server

13. Add a hadoop group and add hduser to it:
$ sudo addgroup hadoop
$ sudo usermod -a -G hadoop hduser

To verify that hduser has been added to the group hadoop use the command:
$ groups hduser

which will display the groups hduser is in.

14. Configure SSH (create a passwordless key pair so Hadoop can ssh to localhost without a prompt):
$ ssh-keygen -t rsa -P ''
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost

15. Disable IPv6, since Hadoop is not supported on IPv6 networks and leaving it enabled can cause Hadoop to bind to IPv6 addresses. Run the following command:
$ gksudo gedit /etc/sysctl.conf

16. Add the following line to the end of the file:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Save and close the file. Then restart the system and log in with hduser again. You can verify that IPv6 is disabled with:
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
which should return 1.

17. Download the Hadoop 2.2.0 release (hadoop-2.2.0.tar.gz) from the Apache Hadoop downloads page to your Downloads folder

18. Extract Hadoop and move it to /usr/local and make this user own it:
$ cd Downloads
$ sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/local
$ cd /usr/local
$ sudo mv hadoop-2.2.0 hadoop
$ sudo chown -R hduser:hadoop hadoop

19. Open the .bashrc file to edit it:
$ cd ~
$ gksudo gedit .bashrc

20. Add the following lines to the end of the file:
#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jdk/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
#end of paste

Save and close the file.

21. Open hadoop-env.sh to edit it:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

and modify the JAVA_HOME variable in the File:

export JAVA_HOME=/usr/lib/jvm/jdk/

Save and close the file
Restart the system and re-login

22. Verify the Hadoop Version installed using the following command in the terminal:

$ hadoop version

The output should be like:
Hadoop 2.2.0
Subversion -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar

This confirms that Hadoop is installed; all that remains now is to configure it.

23. Run the following command:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml

24. Add the following between the <configuration> ... </configuration> tags
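A typical single-node setting here points the default filesystem at HDFS on localhost (port 9000 is the conventional single-node choice):

```xml
<property>
   <name>fs.default.name</name>
   <value>hdfs://localhost:9000</value>
</property>
```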

Then save and close the file.

25. Run the following command:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml

and paste the following between the <configuration> … </configuration> tags
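For Hadoop 2.2.0, the usual yarn-site.xml entries enable the MapReduce shuffle auxiliary service:

```xml
<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
</property>
<property>
   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```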


26. Run the following command:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml.template

27. Add the following between the <configuration> ... </configuration> tags
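The usual mapred-site.xml entry tells MapReduce jobs to run on the YARN framework:

```xml
<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>
```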

28. Instead of saving the file directly, use Save As… and set the filename to mapred-site.xml. Verify that the file is saved in the /usr/local/hadoop/etc/hadoop/ directory.

29. Type the following commands to create folders for the namenode and datanode:

$ cd ~
$ mkdir -p mydata/hdfs/namenode
$ mkdir -p mydata/hdfs/datanode


30. Run the following:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

31. Add the following lines between the <configuration> … </configuration> tags
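A typical single-node hdfs-site.xml sets the replication factor to 1 and points HDFS at the namenode and datanode folders from step 29 (the paths assume hduser's home directory):

```xml
<property>
   <name>dfs.replication</name>
   <value>1</value>
</property>
<property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>
```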



32. Format the namenode with HDFS:
$ hdfs namenode -format

33. Start the Hadoop services:
$ start-dfs.sh
$ start-yarn.sh

34. Run the following command to verify that the Hadoop services are running:
$ jps

If everything was successful, you should see the following services running:
2583 DataNode
2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode

Acknowledgement & Reference:
1. For testing the Hadoop file system by copying and deleting files from the local machine, please refer to the following link:

2. There are not many good references available online for installing Hadoop 2.2.0 (I had to go through a lot of them to find one that actually worked). I followed the steps given on the following webpage.

The changes are mine, made to fix the errors I faced during the installation process.

3. Check out this video if you wish to see how to configure Ubuntu on VMware. This video also explains how to install the previous (1.x) version of Hadoop

Hadoop 1.x Video