Saturday, November 1, 2014

Pig Casting and Schema Management

Pig is quite flexible when a schema needs to be manipulated.

Consider this data set:

a,1,55,M,IND
b,2,55,M,US
c,3,56,F,GER
d,4,57,F,AUS


Suppose we need to define the schema only after some processing. In that case we can cast the columns to their data types:

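-- Load the comma-separated input; with no AS clause, every field defaults to bytearray
A = LOAD 'input' USING PigStorage(',');
-- Range projection: this will generate all columns after the first one
B = FOREACH A GENERATE $1..;
-- Suppose you need to cast the columns of A to their data types
C = FOREACH A GENERATE (chararray)$0, (int)$1, (int)$2, (chararray)$3, (chararray)$4;
DUMP C;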
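The cast above leaves the columns addressable only by position. As a minimal sketch, the same cast can also name each column with AS, and DESCRIBE can then be used to check the resulting schema. The column names used here (name, id, age, gender, country) are assumptions made for illustration; the data set itself does not say what the columns mean.

```pig
-- Load without a schema, as before
A = LOAD 'input' USING PigStorage(',');

-- Cast each positional column and give it a name.
-- The names below are hypothetical; pick ones that match your data.
C = FOREACH A GENERATE
        (chararray)$0 AS name,
        (int)$1       AS id,
        (int)$2       AS age,
        (chararray)$3 AS gender,
        (chararray)$4 AS country;

-- Print the schema of C to confirm the names and types
DESCRIBE C;

-- Named columns can now be used in later statements
D = FILTER C BY age > 55;
DUMP D;
```

If the schema is known up front, it could instead be declared at load time with an AS clause on the LOAD statement. Casting in a FOREACH is the option to reach for when the types are settled only after some processing.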


That's all for today, folks.

Cheers!
