Tuesday, November 11, 2014

MapR Single Node on Ubuntu

Creating a single-node instance of MapR on Ubuntu.


# Navigate to this folder
cd /etc/apt
# Edit the sources.list file and add the MapR repositories to it (needs root)
sudo vi sources.list
# Add the MapR repo entries below
deb http://package.mapr.com/releases/v2.1.2/ubuntu/ mapr optional
deb http://package.mapr.com/releases/ecosystem/ubuntu binary/
# Update the package index
sudo apt-get update
# Install MapR Hadoop (single-node package)
sudo apt-get install mapr-single-node
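One hedged note: on most MapR releases the package install alone doesn't bring the cluster up; you typically point configure.sh at this node for both CLDB and ZooKeeper and then start the services. A minimal sketch (the cluster name below is illustrative):

# Point CLDB (-C) and ZooKeeper (-Z) at this single node; the -N cluster name is illustrative
sudo /opt/mapr/server/configure.sh -C `hostname -f` -Z `hostname -f` -N my.cluster.com
# Start ZooKeeper and the warden, which launches the remaining services
sudo service mapr-zookeeper start
sudo service mapr-warden start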

Saturday, November 1, 2014

Using Pig to Load and Store data from HBase

Let's first store data from HDFS into our HBase table. For this we will be using

org.apache.pig.backend.hadoop.hbase
Class HBaseStorage

public HBaseStorage(String columnList) throws org.apache.commons.cli.ParseException, IOException


Warning: Make sure that your PIG_CLASSPATH includes all the library (jar) files from HBase, Hadoop, and ZooKeeper. Doing this will save you countless hours of debugging.
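As a rough sketch (the exact jar locations vary by version and distribution; HBASE_HOME, HADOOP_HOME, and ZOOKEEPER_HOME are assumed to point at your installs), something like this before launching Pig:

# Illustrative paths only -- adjust to your installation
export PIG_CLASSPATH=$HBASE_HOME/lib/*:$HADOOP_HOME/lib/*:$ZOOKEEPER_HOME/*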

Let's create an HBase table named testtable for the data given below.

Make sure that the first column of your data is the row key when inserting into an HBase table.

1|Krishna|23
2|Madhuri|37
3|Kalyan|54
4|Shobhana|50


Let's create the table for this data in HBase.

>> cd $HBASE_HOME/bin
>> ./hbase shell

This will take you to the HBase shell.

>> create 'testtable', 'cf'   # table with a single column family 'cf'
>> list 'testtable'           # verify the table exists
>> scan 'testtable'           # empty for now

Now let's fire up the Grunt shell (just run pig from the command line).

Type the following commands into the Grunt shell:
-- Load the test data into a relation
A = LOAD '/home/biadmin/testdata' USING PigStorage('|') AS (id, name, age);
-- HBaseStorage works with chararray fields, so cast before storing
B = FOREACH A GENERATE (chararray)$0, (chararray)$1, (chararray)$2;
-- The column list may be delimited by spaces or commas;
-- the first field of B is used as the row key
STORE B INTO 'hbase://testtable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:name cf:age');

And ta-da! A quick scan 'testtable' back in the HBase shell should now show the four rows.
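Since the title promises loading too: reading the table back is the mirror image of the store. A minimal sketch; note the '-loadKey true' option, which exposes the row key as the first field of each tuple:

-- Read rows back out of HBase; -loadKey true returns the row key as the first field
D = LOAD 'hbase://testtable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:name cf:age', '-loadKey true') AS (id:chararray, name:chararray, age:chararray);
dump D;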
Pig Casting and Schema Management

Pig is quite flexible when schemas need to be manipulated.

Consider this data set:

a,1,55,M,IND
b,2,55,M,US
c,3,56,F,GER
d,4,57,F,AUS


Suppose we need to define a schema after some processing; we can cast the columns to their data types:

-- Load without a schema; every field defaults to bytearray
A = LOAD 'input' USING PigStorage(',');
-- $1.. generates every column after the first one
B = FOREACH A GENERATE $1..;
-- Cast each positional field to its proper type
C = FOREACH A GENERATE (chararray)$0, (int)$1, (int)$2, (chararray)$3, (chararray)$4;
dump C;
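To verify the schema that results from the casts, describe comes in handy:

-- Show the schema Pig now associates with C
describe C;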


That's all for today, folks.

Cheers!