Saturday, November 1, 2014

Using Pig to Load and Store data from HBase

Let's first store data from HDFS into an HBase table. For this we will be using the HBaseStorage class:

org.apache.pig.backend.hadoop.hbase
Class HBaseStorage

public HBaseStorage(String columnList) throws org.apache.commons.cli.ParseException, IOException


Warning: Make sure that your PIG_CLASSPATH includes all the library files of HBase, Hadoop and ZooKeeper. Doing this will save you countless hours of debugging.

Let's create an HBase table named testtable for the data given below.

Make sure that the first column of your data is the row key: when storing, HBaseStorage uses the first field of each tuple as the HBase row key.

1|Krishna|23
2|Madhuri|37
3|Kalyan|54
4|Shobhana|50


Let's create the table for this data in HBase.

>> cd $HBASE_HOME/bin
>> ./hbase shell

This will take you to the HBase shell.

>> create 'testtable','cf'
>> list 'testtable'
>> scan 'testtable'

Now let's fire up the Grunt shell.

Type the following commands in the Grunt shell:
-- Load the test data into relation A
A = LOAD '/home/biadmin/testdata' USING PigStorage('|') AS (id,name,age);
-- Cast every field to chararray before storing
B = FOREACH A GENERATE (chararray)$0,(chararray)$1,(chararray)$2;
-- The column list can be delimited by either spaces or commas.
-- The first field (id) becomes the row key, so only the other columns are listed.
-- STORE is a statement, not an expression, so it cannot be assigned to an alias.
STORE B INTO 'hbase://testtable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:name cf:age');

And ta-da! The rows are now in testtable; running scan 'testtable' in the HBase shell again will show them.
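
Reading the data back into Pig works the same way in reverse. Here is a minimal sketch, assuming the STORE above succeeded and the table still has the cf:name and cf:age columns. The relation name D is just an illustrative choice, and the '-loadKey true' option string (which asks HBaseStorage to return the row key as the first field) is worth checking against the HBaseStorage documentation for your Pig version.

```pig
-- Load the row key plus the two columns stored earlier.
-- D is an illustrative relation name, not from the original script.
D = LOAD 'hbase://testtable'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:name cf:age', '-loadKey true')
    AS (id:chararray, name:chararray, age:chararray);
-- Print the loaded tuples to the console
DUMP D;
```

If everything is wired up correctly, DUMP should print one tuple per row of testtable, with the row key in the first position. Without the '-loadKey true' option only the listed columns come back, so the AS clause would need just the name and age fields.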






