PigStorage can also load and store the complex data types tuple, bag, and map. Their text formats are:

  • Tuple: enclosed by (), items separated by ",". E.g. (item1,item2,item3). An empty tuple is valid: ()
  • Bag: enclosed by {}, tuples separated by ",". E.g. {(tuple1),(tuple2),(tuple3)}. An empty bag is valid: {}
  • Map: enclosed by [], entries separated by ",", with key and value separated by "#". E.g. [key1#value1,key2#value2]. An empty map is valid: []
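The formats above can be read with a small recursive-descent parser. The sketch below is an illustration in Python, not PigStorage's own implementation: the function `parse_field` and its helpers are hypothetical, atoms are kept as raw strings (no schema applied), bags become lists of tuples, maps become dicts, and it assumes values contain no literal delimiter characters (no escaping is handled).

```python
def parse_field(text):
    """Parse one field in the tuple/bag/map text format (hypothetical helper)."""
    value, pos = _parse_value(text, 0, stops="")
    if pos != len(text):
        raise ValueError(f"unexpected text at {pos}: {text[pos:]!r}")
    return value


def _parse_value(s, i, stops):
    """Parse one value starting at s[i]; an atom ends at any char in `stops`."""
    if s.startswith("(", i):                      # tuple -> Python tuple
        items, i = _parse_items(s, i + 1, ")")
        return tuple(items), i
    if s.startswith("{", i):                      # bag -> Python list of tuples
        return _parse_items(s, i + 1, "}")
    if s.startswith("[", i):                      # map -> Python dict
        return _parse_map(s, i + 1)
    j = i                                         # atom: raw text up to a delimiter
    while j < len(s) and s[j] not in stops:
        j += 1
    return s[i:j], j


def _parse_items(s, i, close):
    """Parse comma-separated values up to `close`; return (list, index after close)."""
    items = []
    if s.startswith(close, i):                    # empty tuple () or empty bag {}
        return items, i + 1
    while True:
        value, i = _parse_value(s, i, stops="," + close)
        items.append(value)
        if s.startswith(close, i):
            return items, i + 1
        if not s.startswith(",", i):
            raise ValueError(f"expected ',' or {close!r} at {i}")
        i += 1


def _parse_map(s, i):
    """Parse key#value entries up to ']'; return (dict, index after ']')."""
    result = {}
    if s.startswith("]", i):                      # empty map []
        return result, i + 1
    while True:
        j = i
        while j < len(s) and s[j] not in "#,]":   # scan the key up to '#'
            j += 1
        if not s.startswith("#", j):
            raise ValueError(f"map entry without '#' at {i}")
        key = s[i:j]
        value, i = _parse_value(s, j + 1, stops=",]")
        result[key] = value
        if s.startswith("]", i):
            return result, i + 1
        if not s.startswith(",", i):
            raise ValueError(f"expected ',' or ']' at {i}")
        i += 1


print(parse_field("{([foo#1,bar#2],34.0)}"))
# [({'foo': '1', 'bar': '2'}, '34.0')]
```

Keeping atoms as strings mirrors the idea that typing is a separate step driven by the schema.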

If the load statement specifies a schema, Pig converts the complex data according to that schema. If a conversion fails, the affected item becomes null.
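Schema-driven conversion can be pictured as a recursive walk over the parsed value and the declared type. This is a hedged Python sketch, not Pig's own caster: the function `convert` and the tuple-based schema encoding (e.g. `("bag", ("tuple", [("map", "int"), "double"]))` for `{t:(m:map[int],d:double)}`) are hypothetical, the input is raw strings already split into tuples, lists (bags), and dicts (maps), and numeric casts use Python's `int()`/`float()`, which may differ from Pig's parsing rules in edge cases. Any item that fails to convert becomes `None` (null); as a choice of this sketch, a structural mismatch such as a tuple with the wrong number of fields also yields `None`.

```python
# Hypothetical schema encoding used by this sketch:
#   scalar:  "int" | "long" | "float" | "double" | "chararray"
#   complex: ("tuple", [field_schema, ...]) | ("bag", tuple_schema) | ("map", value_schema)
_SCALAR_CASTS = {"int": int, "long": int, "float": float, "double": float, "chararray": str}


def convert(value, schema):
    """Convert raw parsed text to the declared schema; failed items become None."""
    if value is None:
        return None
    if isinstance(schema, str):
        if not isinstance(value, str):
            return None                         # complex value where a scalar was declared
        try:
            return _SCALAR_CASTS[schema](value)
        except ValueError:
            return None                         # e.g. "badint" as int -> null
    kind, inner = schema
    if kind == "tuple" and isinstance(value, tuple) and len(value) == len(inner):
        return tuple(convert(v, s) for v, s in zip(value, inner))
    if kind == "bag" and isinstance(value, list):
        return [convert(t, inner) for t in value]
    if kind == "map" and isinstance(value, dict):
        return {k: convert(v, inner) for k, v in value.items()}
    return None                                 # structural mismatch -> null


# Schema for: a0:{t:(m:map[int],d:double)}
A0_SCHEMA = ("bag", ("tuple", [("map", "int"), "double"]))

print(convert([({"foo": "badint"}, "baddouble")], A0_SCHEMA))
# [({'foo': None}, None)]
```

Because nulls are produced per item, one bad value does not discard the surrounding tuple or bag.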

Here are more examples, using the load statement below; each following line is an input line of 1.txt and its result:

a = load '1.txt' as (a0:{t:(m:map[int],d:double)});

{([foo#1,bar#2],34.0),([white#3,yellow#4],45.0)} : valid
{([foo#badint],baddouble)} : conversion fails for badint and baddouble, so both become null: {([foo#],)}
{} : valid, empty bag
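Going the other way, storing writes values back in the same text form. The Python sketch below, with a hypothetical `to_pig_text` function, renders tuples, lists (bags), and dicts (maps) and writes a null (`None`) as empty text, which is how the nulls appear in the second example's result. It assumes keys and values contain no delimiter characters and uses Python's `str()` for scalars, so number formatting may not match Pig's output in every case.

```python
def to_pig_text(value):
    """Render a nested Python value in the tuple/bag/map text form (hypothetical helper)."""
    if value is None:
        return ""                                   # null is written as empty text
    if isinstance(value, tuple):
        return "(" + ",".join(to_pig_text(v) for v in value) + ")"
    if isinstance(value, list):                     # bag: a list of tuples
        return "{" + ",".join(to_pig_text(t) for t in value) + "}"
    if isinstance(value, dict):
        return "[" + ",".join(f"{k}#{to_pig_text(v)}" for k, v in value.items()) + "]"
    return str(value)                               # scalar; no escaping is done


print(to_pig_text([({"foo": None}, None)]))
# {([foo#],)}
```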