Though HDFS is the default distributed file system attached to Hadoop, HBase came to the limelight due to several limitations in HDFS.

HDFS Limitations

1. HDFS is optimized for streaming relatively large files (hundreds of MB and upwards) that are accessed through MapReduce in "batch mode".

2. HDFS files are write-once, read-many, and HDFS does not perform well for random writes or random reads.

To overcome the above, HBase was developed with the following features.

HBase Features

1. Can access a small amount of data from a large data set (e.g., a billion-row table) in real time.

2. Fast scanning across tables.

3. Flexible data model.

So, with all the above, let's dive into the world of HBase.

Installation (Standalone Mode)

HBase can be installed in three different modes.

1. Standalone / Local Mode

2. Pseudo-Distributed Mode

3. Fully Distributed Mode

Just to get a feel for HBase, we will first try out the Standalone Mode here.

Step 1: Download HBase. Use the binaries available at the Apache HBase mirrors.

In this tutorial I am using hbase-0.94.8.tar.gz.

Further, you can check the configuration dependencies at http://hbase.apache.org/book/configuration.html (see Table 2.1, Hadoop Version Support Matrix). According to this, you are required to have Hadoop 1.0.3 to run HBase.
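
If you prefer the command line, something like the following can fetch the tarball (the archive.apache.org URL is an assumption; substitute whichever mirror you choose):

----
# Download the HBase 0.94.8 binary tarball from the Apache archive
wget http://archive.apache.org/dist/hbase/hbase-0.94.8/hbase-0.94.8.tar.gz
----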

Step 2: Extract the HBase binaries to a desired location. Here I extract to /usr/local/hbase-0.94.8 and make it HBASE_HOME.
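
A minimal sketch of the extraction, assuming the tarball is in the current directory and you have sudo rights on /usr/local:

----
# Unpack into /usr/local, which creates /usr/local/hbase-0.94.8
sudo tar -xzf hbase-0.94.8.tar.gz -C /usr/local
----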

Step 3: Set the HBASE_HOME in .bashrc

You are required to export HBASE_HOME and add it to the PATH in your .bashrc file.

----
export HBASE_HOME=/usr/local/hbase-0.94.8
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin:$PATH
----
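
After editing, reload the file so the new variables take effect in the current shell (or simply open a new terminal):

----
source ~/.bashrc
----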

Step 4: Change the HBase configuration files.

Edit $HBASE_HOME/conf/hbase-env.sh to set JAVA_HOME:

export JAVA_HOME=/usr/local/jdk1.6.0_37

Edit $HBASE_HOME/conf/hbase-site.xml and add the following:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///usr/local/hbase-0.94.8/var</value>
  </property>
</configuration>

The hbase.rootdir needs to be specified in order to give HBase a fixed directory to work in. Otherwise, HBase uses a temporary folder it assigns itself (typically under /tmp), which may be wiped on reboot.
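
Optionally, you can create this directory up front; this is just a precaution, as HBase should also be able to create it on first start:

----
# Pre-create the HBase root directory configured above (optional)
mkdir -p /usr/local/hbase-0.94.8/var
----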

Step 5: Change the /etc/hosts file to reflect your external IP.

The changed entries should look like below. On the second line, replace the default 127.0.1.1 with your external IP (192.168.1.33 here) or with 127.0.0.1.

127.0.0.1       localhost
192.168.1.33    crishantha-Notebook-PC
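
To confirm the change, check that your hostname now resolves to the intended address:

----
# Should print the IP you configured above next to this machine's hostname
getent hosts $(hostname)
----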

Step 6: Start HBase

$ $HBASE_HOME/bin/start-hbase.sh

If everything goes well, HBase should start successfully.

If not, check the $HBASE_HOME/logs folder for any errors raised while starting.
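
You can also verify that the HBase daemon is up with jps (shipped with the JDK). In standalone mode the Master, a RegionServer, and ZooKeeper all run inside a single JVM, which shows up as HMaster:

----
$ jps
# Expect an HMaster entry in the output, for example:
# 12345 HMaster
----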

Step 7: Start the HBase shell. Once HBase is started, make sure the HBase shell is working by executing the following command:

$ hbase shell
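
From the shell prompt you can run a quick end-to-end test using the standard shell commands (the table name 'test' and column family 'cf' below are just examples):

----
create 'test', 'cf'                    # create a table with one column family
put 'test', 'row1', 'cf:a', 'value1'   # insert a single cell
scan 'test'                            # should show the row just inserted
get 'test', 'row1'                     # read the row back
disable 'test'                         # a table must be disabled before dropping
drop 'test'                            # clean up
----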
