Most of the Pig 0.9 incompatibilities are in the area of syntactic and semantic cleanup. We expect these incompatibilities to have minimal impact on the users. This document contains the details of those changes.

Logical Plan

With Pig 0.9, we have completely transitioned to the new logical plan. As a result, it is no longer possible to fall back to the old logical plan, which had served as a workaround for problems found in the new one. This means that, starting with Pig 0.9, the pig.usenewlogicalplan property has no effect on execution. You can still disable individual optimization rules.

Parser Changes

| Change | Old Way | New Way |
| --- | --- | --- |
| Only single quotes are supported for join modifiers (this also applies to cogroup) | "skewed" or 'skewed', "merge" or 'merge' | Only 'skewed', 'merge' |
| org.apache.pig.impl.logicalLayer.parser.ParseException has been removed. This affects UDFs that call the Utils.getSchemaFromString function. | catch ParseException | catch Exception |
| Meaningless syntax removed | B = (A); is legal | B = (A); is illegal |
| Meaningless support for the as clause in filter removed | C = ( filter B by $0 > 0 ) as (a:bytearray, b:long); is legal | C = ( filter B by $0 > 0 ) as (a:bytearray, b:long); is illegal |
| Meaningless support for the as clause in group removed | D = group A by $0 as b:LONG; is legal | D = group A by $0 as b:LONG; is illegal |
| PARALLEL deprecated on operators that do not start a reducer | C = filter B by $0 > 0 PARALLEL 10; is legal | C = filter B by $0 > 0 PARALLEL 10; is legal but generates a warning; support will be removed in the next release |
| Streaming command options | Each option can be specified multiple times. | Each option can be specified at most once. Violating this rule results in a validation error. |
| Utils.getSchemaFromString | Throws org.apache.pig.impl.logicalLayer.parser.ParseException | Throws org.apache.pig.parser.ParserException |
| Negative numeric constants within parentheses are recognized as tuple constants | (-1) was treated as an integer with value -1, but (1) was treated as a tuple containing a numeric value | Both (1) and (-1) are treated as tuples containing a single numeric field |

Semantic Changes

| Change | Old Way | New Way | JIRA |
| --- | --- | --- | --- |
| Flattening a bag with an unknown schema produces a record with an unknown schema | Schema contains bytearray | Null schema | PIG-1627 |
| Schema and load related changes | Pig could produce a mismatch between schema and data, which sometimes resulted in runtime exceptions | If the load statement specifies a schema, Pig truncates extra fields or pads with nulls so that the loaded data has exactly the number of fields specified in the load statement | PIG-1188, PIG-1112, PIG-749, PIG-435 |
| BinStorage does not cast bytes by default | BinStorage cast bytes, but incorrectly | By default, casting bytes loaded by BinStorage results in an error. Users must pass a caster explicitly to BinStorage if they want to cast bytes | PIG-1745 |
| How the matching UDF implementation class is found for a UDF that takes * as its argument, when the input relation's schema is present | The * argument was not expanded, even though the expanded arguments were passed at runtime | The expanded list of arguments is used to find the matching UDF class. SIZE(*) and COUNT(*) previously behaved like SIZE($0) and COUNT($0); they now fail at type checking | PIG-1938 |
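
The truncate-and-pad rule for loads with a declared schema can be illustrated with a small standalone sketch. This is not Pig's implementation; the class SchemaConformer and the method conform are hypothetical names used only for illustration, and a row is modeled as a plain Java list rather than a Pig Tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the load rule: a row is forced to have
// exactly as many fields as the schema declares.
public class SchemaConformer {

    // Returns a copy of `fields` with exactly `schemaSize` entries:
    // extra trailing fields are dropped, missing fields become null.
    public static List<Object> conform(List<?> fields, int schemaSize) {
        List<Object> out = new ArrayList<>(schemaSize);
        for (int i = 0; i < schemaSize; i++) {
            out.add(i < fields.size() ? fields.get(i) : null);
        }
        return out;
    }

    public static void main(String[] args) {
        // Schema declares two fields; the row has three, so one is truncated.
        System.out.println(conform(Arrays.asList("a", 1L, "extra"), 2)); // prints [a, 1]
        // Schema declares two fields; the row has one, so a null is padded.
        System.out.println(conform(Arrays.asList("a"), 2));              // prints [a, null]
    }
}
```

Under this rule, downstream operators can rely on the declared field count, which is what removes the schema and data mismatch described for earlier releases.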

Interface Changes

| Change | Old Way | New Way | JIRA |
| --- | --- | --- | --- |
| LoadCaster |  | Added bytesToMap(byte[] b, ResourceFieldSchema fieldSchema); bytesToMap(byte[] b) is marked as deprecated | PIG-1876 |

Other Changes

| Change | Impact | JIRA |
| --- | --- | --- |
| The combiner is used for query execution in more cases than before | Many queries might run faster, but some queries might require more memory, especially ones where the algebraic function produces large bags. A distinct in a foreach statement that gets its input from a group-by statement is one such example. You can reduce the memory footprint in such cases by disabling the combiner (-Dpig.exec.nocombiner=true). |  |
