Pig 0.12.0 Regressions

Automatic reducer estimator is broken

PIG-3512 breaks the automatic reducer estimator. If you rely on Pig to set the number of reducers automatically, you should not use this version. If you specify the number of reducers with "default_parallel" or "parallel", you are not affected. This will be fixed in 0.12.1.
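
As a minimal sketch of the unaffected case, the script below sets the reducer count explicitly, so it does not depend on the estimator. The input path, output path, field names, and reducer counts are hypothetical and chosen only for illustration:

```pig
-- Script-wide default number of reducers for all reduce-side operators
set default_parallel 20;

-- Hypothetical input and schema
A = load 'input/clicks' as (user:chararray, clicks:int);

-- The parallel clause overrides the default for this operator only
B = group A by user parallel 10;
C = foreach B generate group, SUM(A.clicks);
store C into 'output/clicks_by_user';
```

Either setting alone is enough; "parallel" on an operator takes precedence over "default_parallel" for that operator.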

Partition filters are not pushed into the loader in some cases

This is a performance regression. If you have two partition filter statements after a loader, versions before 0.12.0 first combine the two filters and push both into the loader. Pig 0.12.0 pushes only the first filter to the loader. You still get the same result, but performance degrades because the second filter is applied after loading rather than in the loader.
For example:

-- Both ds and state are pushed to HCatLoader
A = load 'sometable' using HCatLoader();
B = filter A by ds=='201301' and state=='CA';
-- Only ds is pushed to HCatLoader; state is not pushed down
A = load 'sometable' using HCatLoader();
B = filter A by ds=='201301';
C = filter B by state=='CA';

To work around this, use a single filter statement for all partition conditions. Alternatively, you can specify:

pig.exec.useOldPartitionFilterOptimizer=true

in conf/pig.properties

This will be fixed in Pig 0.12.1.
