...
- Modularise large or complex queries into multiple smaller components. These are easier to comprehend, maintain, and test.
- Use macros or UDFs to encapsulate repeated or complex column expressions.
- Use Hive variables to decouple HQL scripts from specific environments. For example, it might be wise to use `LOCATION ${myTableLocation}` in preference to `LOCATION /hard/coded/path`.
- Keep the scope of tests small. Making coarse assertions on the entire contents of a table is brittle and has a high maintenance requirement.
- Use the `SOURCE` command to combine multiple smaller HQL scripts.
- Test macros and the integration of UDFs by creating simple test tables and applying the functions to columns in those tables.
- Test UDFs by invoking the lifecycle methods directly (`initialize`, `evaluate`, etc.) in a standard testing framework such as JUnit.
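As a rough illustration of how several of these practices might combine, the following HQL is a minimal sketch rather than a definitive layout. The table names (`orders`, `macro_test_input`), the macro `is_large_order`, the variable value, and the script path are all hypothetical and chosen only for the example. The path is quoted as a string literal here, and the value of `myTableLocation` would normally be supplied per environment (for instance with `--hivevar` on the command line) instead of being set inside the script.

```sql
-- Hypothetical value; in practice each environment supplies its own.
SET hivevar:myTableLocation=/tmp/hive_test/orders;

-- The script refers to the variable, never to a hard-coded path.
CREATE EXTERNAL TABLE IF NOT EXISTS orders (id INT, net_pence INT)
LOCATION '${hivevar:myTableLocation}';

-- Hypothetical macro that encapsulates a repeated column expression.
CREATE TEMPORARY MACRO is_large_order(net_pence INT) net_pence >= 10000;

-- Small, purpose-built test table covering just the boundary of interest.
CREATE TABLE macro_test_input (net_pence INT);
INSERT INTO TABLE macro_test_input VALUES (9999), (10000);

-- Apply the macro to a column; a test asserts on these two rows only.
SELECT net_pence, is_large_order(net_pence) AS large
FROM macro_test_input;

-- Larger jobs can be assembled from smaller scripts (hypothetical path).
-- SOURCE is a Hive CLI command and may not be available in other clients.
-- SOURCE /path/to/create_orders.hql;
```

Keeping the test table to two rows around the threshold keeps the assertion narrow, so a change elsewhere in the pipeline is less likely to break it.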
Relevant issues
HIVE-12703: CLI agnostic HQL import command implementation