This query has a very broad set of responsibilities that cannot easily be verified in isolation. On closer inspection, it is in fact composed of at least seven distinct queries. To unit test the process that this query represents effectively, we need an approach that separates and encapsulates each of the subqueries so that they can be tested independently. Possible approaches include VIEWs, sequential execution of components with intermediate (possibly TEMPORARY) tables, and even variable substitution of query fragments. Where HPL/SQL is available, it may offer further ways to modularize, such as stored procedures and functions.

The performance implications of these techniques should also be considered. For example, intermediate tables may generate additional I/O, and splitting a large query into several smaller ones may limit the optimisations available to the query planner.
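As a rough illustration of the VIEW approach, the sketch below splits a hypothetical query into two stages and tests each one on its own. It uses Python's built-in `sqlite3` module as a stand-in for Hive so that it runs without a cluster. In Hive, the corresponding statements would be `CREATE VIEW` or `CREATE TEMPORARY TABLE ... AS SELECT`. The table `orders`, the views `valid_orders` and `customer_totals`, and the filtering and aggregation logic are all invented for this example and are not taken from the query discussed above.

```python
import sqlite3

# Hypothetical decomposition of one large query into two stages, each
# wrapped in a VIEW. SQLite stands in for Hive so the sketch runs locally.
STAGE_1_VALID_ORDERS = """
CREATE VIEW valid_orders AS
SELECT order_id, customer_id, amount
FROM orders
WHERE amount > 0 AND customer_id IS NOT NULL
"""

# Stage 2 reads only the output of stage 1, never the raw table.
STAGE_2_CUSTOMER_TOTALS = """
CREATE VIEW customer_totals AS
SELECT customer_id, SUM(amount) AS total, COUNT(*) AS n_orders
FROM valid_orders
GROUP BY customer_id
"""

ORDER_COLUMNS = "(order_id INTEGER, customer_id TEXT, amount REAL)"


def full_pipeline():
    """In-memory database with the source table and both stages wired up."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE orders {ORDER_COLUMNS}")
    conn.execute(STAGE_1_VALID_ORDERS)
    conn.execute(STAGE_2_CUSTOMER_TOTALS)
    return conn


def test_stage_1_filters_invalid_rows():
    # Non-positive amounts and missing customers should be dropped.
    conn = full_pipeline()
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "a", 10.0), (2, "a", -5.0), (3, None, 7.0), (4, "b", 3.0)],
    )
    rows = conn.execute("SELECT order_id FROM valid_orders ORDER BY order_id")
    assert [r[0] for r in rows] == [1, 4]


def test_stage_2_in_isolation():
    # Replace stage 1 with a fixture table of the same name and columns,
    # so only the aggregation logic of stage 2 is under test.
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE valid_orders {ORDER_COLUMNS}")
    conn.execute(STAGE_2_CUSTOMER_TOTALS)
    conn.executemany(
        "INSERT INTO valid_orders VALUES (?, ?, ?)",
        [(1, "a", 10.0), (2, "a", 5.0), (3, "b", 2.5)],
    )
    rows = conn.execute(
        "SELECT customer_id, total, n_orders FROM customer_totals "
        "ORDER BY customer_id"
    ).fetchall()
    assert rows == [("a", 15.0, 2), ("b", 2.5, 1)]


if __name__ == "__main__":
    test_stage_1_filters_invalid_rows()
    test_stage_2_in_isolation()
    print("all stage tests passed")
```

Because `customer_totals` depends only on `valid_orders`, its test can substitute a hand-written fixture for the upstream stage. This is the kind of isolation the decomposition is meant to provide. Materialising stage 1 into a temporary table instead of a view would support the same tests, though it comes with the extra I/O cost described above.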