Wednesday, 13 November 2013

Hadoop-2.2.0 Installation Steps for Single-Node Cluster (On Ubuntu 12.04)



1.       Download and install VMware Player depending on your Host OS (32 bit or 64 bit)


2.       Download the .iso image file of Ubuntu 12.04 LTS (32-bit or 64-bit depending on your requirements)


3.       Install Ubuntu from the image in VMware. For efficient use, configure the virtual machine with at least 2 GB of RAM (4 GB preferred) and at least 2 processor cores.


Note: Install it using any user id and password you prefer to keep for your Ubuntu installation. We will create a separate user for Hadoop installation later.


4.       After Ubuntu is installed, log in and go to User Accounts (top-right corner) to create a new user for Hadoop.


5.       Click on “Unlock” and unlock the settings by entering your administrator password.


6.        Then click on “+” at the bottom-left to add a new user. Set the account type to Administrator (I prefer this, but you can also select Standard), enter “hduser” as the username and create the account.
Note: After creating the account you may see it listed as disabled. Click on the dropdown that shows “Disabled” and select “Set Password” to set a password for this account, or select “Enable” to enable the account without a password.


7.       Your account is set. Now log in to your new “hduser” account.


8.       Open terminal window by pressing Ctrl + Alt + T


9. Install openJDK using the following command
$ sudo apt-get install openjdk-7-jdk

10. Verify the java version installed
$ java -version
java version "1.7.0_25"
OpenJDK Runtime Environment (IcedTea 2.3.12) (7u25-2.3.12-4ubuntu3)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)


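11. Create a symlink named ‘jdk’ that points to the default OpenJDK directory, using the following commands (writing to /usr/lib/jvm requires sudo; on a 32-bit installation the directory is java-7-openjdk-i386 instead):
$ cd /usr/lib/jvm
$ sudo ln -s java-7-openjdk-amd64 jdk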


12. Install ssh server:
$ sudo apt-get install openssh-client
$ sudo apt-get install openssh-server


13. Add the hadoop group and add hduser to it:
$ sudo addgroup hadoop
$ sudo usermod -a -G hadoop hduser


To verify that hduser has been added to the group hadoop use the command:
$ groups hduser


which will display the groups hduser is in.


14. Configure SSH:
$ ssh-keygen -t rsa -P ''
...
Your identification has been saved in /home/hduser/.ssh/id_rsa
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub
...
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost


15. Disable IPv6, because it creates problems in Hadoop. Run the following command:
$ gksudo gedit /etc/sysctl.conf


16. Add the following lines to the end of the file:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1


Save and close the file. Then restart the system and login with hduser again.


17. Download Hadoop - 2.2.0 from the following link to your Downloads folder


18. Extract Hadoop and move it to /usr/local and make this user own it:
$ cd Downloads
$ sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/local
$ cd /usr/local
$ sudo mv hadoop-2.2.0 hadoop
$ sudo chown -R hduser:hadoop hadoop



19. Open the .bashrc file to edit it:
$ cd ~
$ gksudo gedit .bashrc


20. Add the following lines to the end of the file:
#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jdk/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
#end of paste


Save and close the file.


21. Open hadoop-env.sh to edit it:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh


and modify the JAVA_HOME variable in the File:


export JAVA_HOME=/usr/lib/jvm/jdk/


Save and close the file
Restart the system and re-login


22. Verify the Hadoop Version installed using the following command in the terminal:


$ hadoop version

The output should be like:
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar


This makes sure that Hadoop is installed and we just have to configure it now.


23. Run the following command:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml


24. Add the following between the <configuration> ... </configuration> tags
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>


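Then save and close the file. (The property name fs.default.name still works in Hadoop 2.2.0, but it is deprecated in favour of fs.defaultFS.)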


25. In the terminal, run:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml

and paste the following between the <configuration> … </configuration> tags:
<property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
</property>


<property>
 <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
 <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>


26. Run the following command:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml.template


27. Add the following between the <configuration> ... </configuration> tags
<property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
</property>



28. Instead of saving the file directly, use Save As… and set the filename to mapred-site.xml. Make sure the file is saved in the /usr/local/hadoop/etc/hadoop/ directory.


29. Type following commands to make folders for namenode and datanode:


$ cd ~
$ mkdir -p mydata/hdfs/namenode
$ mkdir -p mydata/hdfs/datanode

  


30. Run the following:
$ gksudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml



31. Add the following lines between the <configuration> … </configuration> tags
<property>
 <name>dfs.replication</name>
 <value>1</value>
</property>


<property>
 <name>dfs.namenode.name.dir</name>
 <value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>


<property>
 <name>dfs.datanode.data.dir</name>
 <value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>



32. Format the namenode with HDFS:
$ hdfs namenode -format



33. Start Hadoop Services:


$ start-dfs.sh
....
$ start-yarn.sh
….



34. Run the following command to verify that hadoop services are running
$ jps


If everything was successful, you should see the following services running:
2583 DataNode
2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode
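
Checking the jps listing by eye makes it easy to overlook a missing daemon. The following is only a rough sketch, written in JavaScript (the language used in the other posts here) and meant to be run under Node.js. The helper name missingDaemons and the EXPECTED_DAEMONS list are my own assumptions based on the output shown above; they are not part of Hadoop.

```javascript
// Daemons that the jps listing above shows for this single-node setup.
const EXPECTED_DAEMONS = [
  'NameNode',
  'DataNode',
  'SecondaryNameNode',
  'ResourceManager',
  'NodeManager',
];

// missingDaemons (hypothetical helper): takes the text printed by `jps`
// and returns the expected daemon names that do not appear in it.
function missingDaemons(jpsOutput) {
  const running = new Set(
    jpsOutput
      .split('\n')
      .map((line) => line.trim().split(/\s+/)[1]) // second column is the process name
      .filter(Boolean) // drop blank lines
  );
  return EXPECTED_DAEMONS.filter((name) => !running.has(name));
}

// The listing from step 34, pasted in as a string.
const sample = [
  '2583 DataNode',
  '2970 ResourceManager',
  '3461 Jps',
  '3177 NodeManager',
  '2361 NameNode',
  '2840 SecondaryNameNode',
].join('\n');

console.log(missingDaemons(sample)); // → []
```

An empty array means every expected daemon was listed; any names in the array are the services to investigate in the Hadoop logs.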


Acknowledgement & Reference:
1. For testing Hadoop File System with copying and deleting files from Local machine, please refer to the following link:



2. There are not many good references available for the installation of Hadoop - 2.2.0 online (I had to go through a lot of them to actually find the one). I followed the steps as given on the following webpage.


The changes made are by me to fix the errors which I faced during the installation process.


3. Check out this video if you wish to see how to configure Ubuntu on VMware. This video also explains how to install the previous (1.x) version of Hadoop

Hadoop 1.x Video

Thursday, 8 August 2013

Using Rich Snippets: Microdata DOM API

This post continues my previous post, in which I explained Microdata. Here I discuss the Microdata DOM API, which can be used in code to retrieve the structured data written on an HTML page. I will use the same paragraph as in my last post as the example.

The code of the HTML page along with the JavaScript is as follows:

<!doctype html>
<html lang="en">


<head>
<meta charset="utf-8" />
<title>Microdata API</title>
</head>
 
<body>


<p itemscope>
Hello, my name is <span itemprop="given-name">Piyush</span>
<span itemprop="family-name">Agarwal</span> and I am a
<span itemprop="role">Student</span> at
<span itemprop="organisation">University of North Carolina</span>
at <span itemprop="city">Raleigh</span>, 

<span itemprop="state">NC</span>
</p>
 

<button id="btnGetVals" onclick="getVals();">Get Values</button>
<br><br>
 

<div id="divContainer">
<strong>Extracted Microdata:</strong><br>
</div>
 
<script>
function getVals(){
  var dispDiv = document.getElementById('divContainer');
  if (document.getItems){
    var docItems = document.getItems();
    dispDiv.innerHTML += 'given-name: ' + docItems[0].properties['given-name'][0].itemValue;
    dispDiv.innerHTML += '<br>family-name: ' + docItems[0].properties['family-name'][0].itemValue;
    dispDiv.innerHTML += '<br>role: ' + docItems[0].properties['role'][0].itemValue;
    dispDiv.innerHTML += '<br>organisation: ' + docItems[0].properties['organisation'][0].itemValue;
    dispDiv.innerHTML += '<br>city: ' + docItems[0].properties['city'][0].itemValue;
    dispDiv.innerHTML += '<br>state: ' + docItems[0].properties['state'][0].itemValue;
  }
  else{
    alert('Your browser does not support Microdata API');
  }
}
</script>
 

</body>
</html>



In the above code, I first create a paragraph which contains the microdata (the same one as in my previous post). Then I add a button, "Get Values", whose click calls the JavaScript function getVals. Next comes a div with the heading "Extracted Microdata:"; when the button is clicked, the microdata extracted from the HTML page is displayed in this div. The getVals function is defined in the script tags. It first checks whether the browser supports the Microdata DOM API by testing "document.getItems". If that function exists, the browser supports the API and the extraction code runs; otherwise an alert message is shown to the user.

Note: It is always worth making this check. Browser support for the Microdata DOM API has been limited (it shipped in Firefox and Opera and was later removed), so many users will have a browser without it. Checking first avoids script errors for them.

The document.getItems function returns a node-list containing all the microdata items on the HTML page, along with their properties. We can use this list to get the property value of a particular item. docItems[0] refers to the first item in the node-list, which in this case is our paragraph. Next we look up the property we want in the item's HTMLPropertiesCollection by specifying its name.
Example: docItems[0].properties['city']
The above expression returns a node-list of the elements with the specified property name, which here has just one element. So we select the first object in the list by its index, 0, and then read the value of the property from its itemValue property (as shown in the code above).
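
For browsers where document.getItems is not available, the same values can be read with standard DOM calls. The following is a minimal sketch and not part of the Microdata DOM API: collectItemProps is a hypothetical helper name, and it assumes a flat item like the paragraph above (no nested itemscope, one name per itemprop attribute, values taken from the element text).

```javascript
// collectItemProps (hypothetical helper): gathers itemprop values from one
// itemscope element using only querySelectorAll, getAttribute and textContent.
// Returns an object that maps each property name to an array of values,
// mirroring the "node-list per property" shape described above.
function collectItemProps(itemElement) {
  // A prototype-free object, so a property named e.g. "constructor" is safe.
  const props = Object.create(null);
  for (const node of itemElement.querySelectorAll('[itemprop]')) {
    const name = node.getAttribute('itemprop');
    if (!props[name]) {
      props[name] = [];
    }
    props[name].push(node.textContent.trim());
  }
  return props;
}

// Usage in a browser page that contains the paragraph above:
//   const item = document.querySelector('[itemscope]');
//   const props = collectItemProps(item);
//   props['city'][0] then holds the text of the city span.
```

The sketch trades completeness for brevity: elements such as img, a or meta carry their microdata value in an attribute rather than in their text, so a fuller version would need to handle those cases separately.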

So, you can see how helpful microdata can be, both for programmers (through the Microdata DOM API) and for search engines (through the structured microdata on the HTML page). Although Microdata is flexible in that we can use any names for our item properties, it is still better to use predefined names so that anyone who looks at your code understands it. For this purpose the website schema.org has been created, which lists predefined microdata item names according to their use. Search engines such as Bing, Google and Yahoo look for the microdata names specified on this website to improve their search results.


So, this is all about the Microdata DOM API. Please feel free to post your comments and views. If you have any questions, I will be more than happy to answer them. You can also watch the video tutorial for this post on my Youtube Channel.

Thursday, 1 August 2013

Using Rich Snippets: Microdata

You must all be familiar with Rich Snippets by now. If not, read my previous post, which gives a brief description of Rich Snippets and especially Microformats.

In this post I am going to briefly discuss Microdata. Microdata is also a type of rich snippet. The similarities between Microdata and Microformats are:

1. As you know by now, both are types of the same class, i.e., Rich Snippets;
2. Both are embedded into HTML pages;
3. Both help increase the machine-readable information in web pages;
4. Both are considered Structured Data in web pages and help increase the semantic meaning of the HTML code; and
5. Both can be tested using Google's Structured Data Testing Tool.

Besides these similarities, there are also differences between the two (Microformats and Microdata). These differences can be summed up as follows:

1. Microformats are predefined formats, such as the formats for People, Recipes, Reviews, etc., whereas Microdata is completely flexible and customizable. You can define any item as Microdata in your HTML by specifying the itemscope and itemprop (item property) attributes;
2. Microdata is universally recognised due to its flexible nature;
3. You don't have to remember any predefined classes in Microdata, as you do with Microformats.

By now, I hope you have a basic idea of Microdata and how it differs from Microformats. Now let me give you an example of Microdata.

Suppose you want to write a paragraph about someone in HTML (in this example I am writing a short paragraph about myself). In normal HTML, i.e., without the use of Rich Snippets, you would write the paragraph somewhat as follows:

<p>
Hello, my name is Piyush Agarwal and I am a Student at University of North Carolina 
at Raleigh, NC
</p>

Now if we write the same paragraph using Microdata, it would look something like this:

<p itemscope>
Hello, my name is <span itemprop="given-name">Piyush</span>
<span itemprop="family-name">Agarwal</span> and I am a 
<span itemprop="role">Student</span> at 
<span itemprop="organisation">University of North Carolina</span>
at <span itemprop="city">Raleigh</span>, <span itemprop="state">NC</span>

</p>

You can clearly see the difference between the code of the two paragraphs. This is what Microdata (or rather Rich Snippets) does to your code: it makes it structured, more meaningful, and more machine-readable. It's as simple as that, but I would still like to walk you through the code. The paragraph tag (<p>) has an attribute, itemscope, which marks the scope of the item. Here it means that everything between the opening and closing tags of the paragraph is one single item. Next comes the itemprop attribute, which gives the name of the item property whose value is enclosed within the opening and closing tags of that span. So, in the first span, the property name is "given-name" and its value is "Piyush" [ hey that's my name :-) ]. Similarly, in the second span, the property name is "family-name" and its value is "Agarwal", and so on.

That is all there is to Microdata. Just by using itemscope and itemprop, it makes our code well structured and easy for machines to read. The property names I have used in the above code are arbitrary and based only on my choice. They are completely customizable and you can use any property name you want, as long as it fits the content you are writing [otherwise you yourself will get confused ;-)].

You can test how the above code will be rendered with Google's Structured Data Testing Tool and check the difference yourself. (See the Extracted Data section for the difference.)

So that's a summary of Microdata. You can read more about this topic in the help section of Google's Webmaster Tools, where you can also find out about other rich snippets.

You can also view the video tutorial for this post on my Youtube Channel.

Please feel free to post any queries or suggestions in the comments, and I will try to answer them.

Tuesday, 30 July 2013

Using Rich Snippets: Microformats

Rich Snippets are used in HTML to give more meaningful information to search engines. Search engines like Google, Bing, Yahoo!, etc. look for these rich snippets in the HTML content of a page to get more information about that page. Rich snippets are machine-readable data embedded directly into your webpages.

There are various kinds of rich snippets, but in this post I am going to talk only about Microformats. Microformats are designed to give search engines more meaningful information about the content of the page. Many different kinds of microformats are defined, such as Reviews, People, Recipes, etc., whose formats can easily be found on the website linked at the end of the post. Here I am just going to explain how microformats make your information more meaningful, using the example of the contact details of a person.

So, if we want to write a person's contact details in plain HTML, i.e., without using microformats, we would write something like this:

<div>
<img src="www.sample.com/piyush.jpg" />
<strong>Piyush Agarwal</strong>
Student at University of Mason
100 Round St
Raleigh, NC 45678
</div>

The above information, although absolutely correct, still does not give much information about the content. Look at the same information written in HTML, but with Microformats:

<div class="vcard">
<img class="photo" src="www.sample.com/piyush.jpg" />
<strong class="fn">Piyush Agarwal</strong>
<span class="title">Student</span> at <span class="org"> University of Mason</span>
<span class="adr">
<span class="street-address">100 Round St</span>
<span class="locality">Raleigh</span>,
<span class="region">NC</span>
<span class="postal-code">45678</span>
</span>
</div>

One can clearly see the difference between the two pieces of code. The microformat used above is known as the hCard microformat. It starts with the class vcard, which marks the beginning of the hCard. Although it is called the hCard microformat, the class is written as vcard (this is not a typo). After that we specify the image of the person with the class photo, which tells the search engine that it is the photo of the person the contact refers to. Then we specify the full name of the person with the class fn. Then comes the title, i.e., Student, and then the organisation the person is associated with, i.e., org. Then starts the address span (represented by the class adr), which is subdivided into street-address (for Street Address), locality (for City), region (for State) and postal-code (for Postal Code).
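
To illustrate how a program can use these class names, here is a minimal JavaScript sketch that reads the text fields of the hCard shown above. The helper name readHCard and the key names in HCARD_FIELDS are my own assumptions for illustration; only the class names (fn, title, org, and so on) come from the example.

```javascript
// Class names from the hCard example above, mapped to friendlier keys.
const HCARD_FIELDS = {
  name: 'fn',
  title: 'title',
  organisation: 'org',
  street: 'street-address',
  city: 'locality',
  state: 'region',
  postalCode: 'postal-code',
};

// readHCard (hypothetical helper): given the element that carries
// class="vcard", returns the trimmed text of each known field,
// or null when the field is not present in the markup.
function readHCard(cardElement) {
  const card = {};
  for (const [key, className] of Object.entries(HCARD_FIELDS)) {
    const node = cardElement.querySelector('.' + className);
    card[key] = node ? node.textContent.trim() : null;
  }
  return card;
}

// Usage in a browser page that contains the markup above:
//   const card = readHCard(document.querySelector('.vcard'));
//   card.city then holds the text of the locality span.
```

The photo is left out on purpose, because its value lives in the src attribute of the img element rather than in text content.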

You can also check how your HTML containing microformats will be rendered by Google Search, using Google's Structured Data Testing Tool.

This is just a simple hCard microformat, which shows how helpful rich snippets can be in structuring your webpage data. You can read more about this topic in the help section of Google's Webmaster Tools, where you can also find the formats for the other microformats I mentioned at the beginning of this post.

You can also view the video tutorial for this post on my Youtube Channel.

Please feel free to post any queries or suggestions in the comments, and I will try to answer them.

Wednesday, 16 January 2013

Arrange the Checkbox list vertically

This is a small piece of code which is needed many times to arrange a Check-Box list vertically so that it can be scrolled.

<asp:CheckBoxList runat="server" ID="chkbxlstFacility" RepeatDirection="Vertical"
Height="300" RepeatColumns="1" RepeatLayout="Flow">
</asp:CheckBoxList>


Search in a Check-Box list

This is a very useful piece of code which is needed many times to search in a Check-Box list on a web page. It requires a text box for the search text and a check box list in which the search is performed. The code finds the first item in the list whose label starts with the search text and then scrolls the containing div to that item.



function SearchBox(ListBox, SearchTextBox, ChkBoxSA, divSel) {
        // ListBox: index (0-4) of the check box list to search.
        // SearchTextBox: text box element that holds the search text.
        // ChkBoxSA: check box that receives focus when the search text is empty.
        // divSel: scrollable div that contains the check box list.
        var arrListBox = [
            '<%=chkbxlstFacility.ClientID %>',
            '<%=chkbxlstProductLine.ClientID %>',
            '<%=chkbxlstState.ClientID %>',
            '<%=chkbxlstCountry.ClientID %>',
            '<%=chkbxlstContinent.ClientID %>'
        ];

        var abox = document.getElementById(arrListBox[ListBox]);
        var checkBoxArray = abox.getElementsByTagName('input');
        var checkBoxLabel = abox.getElementsByTagName('label');
        var toSer = SearchTextBox.value.toLowerCase();

        for (var j = 0; j < checkBoxArray.length; j++) {
            if (toSer === "") {
                // Nothing to search for.
                ChkBoxSA.focus();
                break;
            }
            // Case-insensitive "starts with" comparison against the label text.
            var ser = checkBoxLabel[j].innerText.toLowerCase();
            if (ser.startsWith(toSer)) {
                // Focus the first match and scroll the div to it.
                checkBoxArray[j].focus();
                divSel.scrollTop = checkBoxArray[j].offsetTop + 20;
                break;
            }
        }
        // Return focus to the search box so the user can keep typing.
        SearchTextBox.focus();
     }
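
The matching rule used by the function above (first label that starts with the search text, ignoring case) can also be written as a small function that does not depend on the page, which makes it easy to try out on its own. This is only a sketch: findFirstPrefixMatch is a hypothetical name, and trimming the whitespace is my own addition.

```javascript
// findFirstPrefixMatch (hypothetical helper): returns the index of the first
// label that starts with searchText, ignoring case and surrounding
// whitespace, or -1 when the search text is empty or nothing matches.
function findFirstPrefixMatch(labels, searchText) {
  const needle = searchText.trim().toLowerCase();
  if (needle === '') {
    return -1;
  }
  return labels.findIndex((label) =>
    label.trim().toLowerCase().startsWith(needle)
  );
}

console.log(findFirstPrefixMatch(['Asia', 'Europe', 'Africa'], 'af')); // → 2
```

In the page, the returned index would pick the check box to focus and scroll to, as SearchBox does with its loop variable j.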