Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

...

Writing GenericUDAFs: A Tutorial

...

User-Defined Aggregation Functions (UDAFs) are an excellent way to integrate advanced data-processing into Hive. Hive allows two varieties of UDAFs: simple and generic. Simple UDAFs, as the name implies, are rather simple to write, but incur performance penalties because of the use of Java Reflection, and do not allow features such as variable-length argument lists. Generic UDAFs allow all these features, but are perhaps not quite as intuitive to write as Simple UDAFs.

...

Old interface org.apache.hadoop.hive.ql.udf.GenericUDAFResolver was deprecated as of the 0.6.0 release. The key difference between GenericUDAFResolver and GenericUDAFResolver2 interface is the fact that the latter allows the evaluator implementation to access extra information regarding the function invocation such as the presence of DISTINCT qualifier or the invocation with the wildcard syntax such as FUNCTION(star)(*). UDAFs that implement the deprecated GenericUDAFResolver interface will not be able to tell the difference between an invocation such as FUNCTION() or FUNCTION(star) (*) since the information regarding specification of the wildcard is not available. Similarly, these implementations will also not be able to tell the difference between FUNCTION(EXPR) vs FUNCTION(DISTINCT EXPR) since the information regarding the presence of the DISTINCT qualifier is also not available.

Note that while resolvers which implement the GenericUDAFResolver2 interface are provided the extra information regarding the presence of DISTINCT qualifier of invocation with the wildcard syntax, they can choose to ignore it completely if it is of no significance to them. The underlying data filtering to compute DISTINCT values is actually done by Hive's core query processor and not by the evaluator or resolver; the information is provided to the resolver only for validation purposes. The AbstractGenericUDAFResolver base class offers an easy way to transition previously written UDAF implementations to migrate to the new resolver interface without having to re-write the implementation since the change from implementing GenericUDAFResolver interface to extending AbstractGenericUDAFResolver class is fairly minimal. (There may be issues with implementations that are part of an inheritance hierarchy since it may not be easy to change the base class.)
a