Wednesday, 13 November 2013

Hadoop-2.2.0 Installation Steps for Single-Node Cluster (On Ubuntu 12.04)



1.       Download and install VMware Player depending on your Host OS (32 bit or 64 bit)


2.       Download the .iso image file of Ubuntu 12.04 LTS (32-bit or 64-bit depending on your requirements)


3.       Install Ubuntu from the image in VMware Player. (For efficient use, configure the virtual machine with at least 2 GB of RAM (4 GB preferred) and at least 2 processor cores.)


Note: Install it with whatever username and password you prefer for your Ubuntu installation. We will create a separate user for the Hadoop installation later.


4.       After Ubuntu is installed, log in and go to User Accounts (top-right corner) to create a new user for Hadoop


5.       Click on “Unlock” and unlock the settings by entering your administrator password.


6.        Then click “+” at the bottom-left to add a new user. Set the account type to Administrator (I prefer this, but Standard also works), enter the username “hduser”, and create it.
Note: After creating the account you may see it marked as Disabled. Click the dropdown where “Disabled” is shown and select “Set Password” to set a password for this account, or “Enable” to enable the account without a password.


7.       Your account is set up. Now log in to your new “hduser” account.


8.       Open a terminal window by pressing Ctrl + Alt + T


9. Install OpenJDK using the following command:
$ sudo apt-get install openjdk-7-jdk

10. Verify the installed Java version:
$ java -version
java version "1.7.0_25"
OpenJDK Runtime Environment (IcedTea 2.3.12) (7u25-2.3.12-4ubuntu3)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)


11. Create a symlink from the OpenJDK directory to ‘jdk’ using the following commands (on a 32-bit install the directory is java-7-openjdk-i386 instead of java-7-openjdk-amd64):
$ cd /usr/lib/jvm
$ sudo ln -s java-7-openjdk-amd64 jdk


12. Install ssh server:
$ sudo apt-get install openssh-client
$ sudo apt-get install openssh-server


13. Add a hadoop group and add hduser to it:
$ sudo addgroup hadoop
$ sudo usermod -a -G hadoop hduser


To verify that hduser has been added to the group hadoop use the command:
$ groups hduser


which will display the groups hduser is in.


14. Configure passwordless SSH (press Enter to accept the default file location when prompted):
$ ssh-keygen -t rsa -P ''
...
Your identification has been saved in /home/hduser/.ssh/id_rsa
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub
...
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost
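If `ssh localhost` still prompts for a password, the usual culprit is file permissions: sshd ignores key files that are group- or world-accessible. A minimal fix, assuming the default ~/.ssh layout created above:

```shell
# sshd refuses authorized_keys files with loose permissions,
# so restrict the directory and key file to the owner only:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```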


15. Disable IPv6, which is known to cause problems for Hadoop on Ubuntu. Run the following command:
$ gksudo gedit /etc/sysctl.conf


16. Add the following line to the end of the file:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1


Save and close the file. Then restart the system and login with hduser again.
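After the reboot you can confirm that IPv6 was actually disabled; the sysctl value should now read 1:

```shell
# Prints 1 if IPv6 has been disabled by the sysctl settings above,
# 0 if it is still enabled:
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
```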


17. Download Hadoop - 2.2.0 from the following link to your Downloads folder


18. Extract Hadoop, move it to /usr/local, and give hduser ownership of it:
$ cd Downloads
$ sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/local
$ cd /usr/local
$ sudo mv hadoop-2.2.0 hadoop
$ sudo chown -R hduser:hadoop hadoop



19. Open the .bashrc file to edit it:
$ cd ~
$ gksudo gedit .bashrc


20. Add the following lines to the end of the file:
#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jdk/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
#end of paste


Save and close the file.
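To pick up the new variables without logging out, you can reload .bashrc in the current shell:

```shell
# Re-read .bashrc so the Hadoop variables take effect immediately:
source ~/.bashrc
echo $HADOOP_INSTALL   # should print /usr/local/hadoop if step 20 was applied
```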


21. Open hadoop-env.sh to edit it:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh


and modify the JAVA_HOME variable in the File:


export JAVA_HOME=/usr/lib/jvm/jdk/


Save and close the file
Restart the system and re-login


22. Verify the Hadoop Version installed using the following command in the terminal:


$ hadoop version

The output should be like:
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar


This makes sure that Hadoop is installed and we just have to configure it now.


23. Run the following command:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml


24. Add the following between the <configuration> ... </configuration> tags. (In Hadoop 2.x, fs.default.name is a deprecated alias for fs.defaultFS; either name works here.)
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>


Then Save and close the file


25. Run the following command:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml

and Paste following between <configuration> … </configuration> tags
<property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
</property>


<property>
 <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
 <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>


26. Run the following command:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml.template


27. Add the following between the <configuration> ... </configuration> tags
<property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
</property>



28. Instead of saving the file directly, use Save As… and set the filename to mapred-site.xml. Verify that the file is saved to the /usr/local/hadoop/etc/hadoop/ directory.
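If gedit's Save As… dialog is fiddly, copying the template from the terminal achieves the same result (assuming Hadoop was extracted to /usr/local/hadoop as in step 18); you can then edit the copy instead of the .template file:

```shell
# Create mapred-site.xml from the shipped template:
cd /usr/local/hadoop/etc/hadoop
sudo cp mapred-site.xml.template mapred-site.xml
```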


29. Type the following commands to create folders for the namenode and datanode:


$ cd ~
$ mkdir -p mydata/hdfs/namenode
$ mkdir -p mydata/hdfs/datanode

  


30. Run the following:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml



31. Add the following lines between the <configuration> … </configuration> tags
<property>
 <name>dfs.replication</name>
 <value>1</value>
</property>


<property>
 <name>dfs.namenode.name.dir</name>
 <value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>


<property>
 <name>dfs.datanode.data.dir</name>
 <value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>



32. Format the namenode with HDFS:
$ hdfs namenode -format

The output should include a line saying the storage directory has been successfully formatted.



33. Start Hadoop Services:


$ start-dfs.sh
....
$ start-yarn.sh
….



34. Run the following command to verify that the Hadoop services are running:
$ jps


If everything was successful, you should see output similar to the following (the process IDs will differ):
2583 DataNode
2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode
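When you are done, the services can be stopped with the matching stop scripts, which live in $HADOOP_INSTALL/sbin (already on the PATH via step 20):

```shell
# Stop YARN first, then HDFS:
stop-yarn.sh
stop-dfs.sh
```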


Acknowledgement & Reference:
1. For testing Hadoop File System with copying and deleting files from Local machine, please refer to the following link:



2. There are not many good references available online for installing Hadoop 2.2.0 (I went through a lot of them before finding one that worked). I followed the steps given on the following webpage.


I made some changes to fix errors I ran into during the installation process.


3. Check out this video if you wish to see how to configure Ubuntu on VMware. This video also explains how to install the previous (1.x) version of Hadoop

Hadoop 1.x Video
