PIG-3512 breaks the automatic reducer estimator. If you rely on Pig to set the number of reducers automatically, you should not use this version. If you use "default_parallel" or "parallel" to specify the number of reducers, you are not affected. This will be fixed in 0.12.1.
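As a rough sketch of setting the reducer count explicitly, the script below uses both "default_parallel" and "parallel". The paths 'input' and 'output', the schema, and the values 20 and 10 are illustrative assumptions, not recommendations.

```pig
-- Script-wide default for reduce-side operators.
-- The value 20 is an arbitrary illustration.
SET default_parallel 20;

-- 'input' and its schema are hypothetical.
A = LOAD 'input' AS (key:chararray, value:int);

-- Per-operator override: PARALLEL applies only to this GROUP.
B = GROUP A BY key PARALLEL 10;

C = FOREACH B GENERATE group, SUM(A.value);

-- 'output' is a hypothetical path.
STORE C INTO 'output';
```

With either setting in place, the script does not depend on the automatic estimate for those operators.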
This is a performance regression. If two partition filter statements follow a loader, versions before 0.12.0 first combine the two filters and push both into the loader. Pig 0.12.0 pushes only the first filter to the loader. You will get the same result, but performance is worse because the second filter is no longer pushed to the loader.
For example:
-- This works: both conditions are pushed to HCatLoader
A = load 'sometable' using HCatLoader();
B = filter A by ds=='201301' and state=='CA';

-- This is not fully optimized: only ds is pushed to HCatLoader
A = load 'sometable' using HCatLoader();
B = filter A by ds=='201301';
C = filter B by state=='CA';
To work around this, use a single filter statement for all partition conditions. Alternatively, you can specify:
pig.exec.useOldPartitionFilterOptimizer=true
in conf/pig.properties
This will be fixed in Pig 0.12.1.