  Unit Testing Hive SQL

  • Modularise large or complex queries into multiple smaller components. These are easier to comprehend, maintain, and test.
  • Use macros or UDFs to encapsulate repeated or complex column expressions.
  • Use Hive variables to decouple HQL scripts from specific environments. For example, it might be wise to use LOCATION '${myTableLocation}' in preference to LOCATION '/hard/coded/path'.
  • Keep the scope of tests small. Coarse assertions on the entire contents of a table are brittle and costly to maintain.
  • Use the SOURCE command to combine multiple smaller HQL scripts.
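
  As a rough illustration of how several of these practices might fit together, the sketch below uses a temporary macro, a Hive variable, and the SOURCE command in one small script. All names here (normalise_country, events, myTableLocation, and the two .hql file names) are hypothetical and not taken from any particular project. It assumes the script is run through the Hive CLI with the variable supplied on the command line, for example: hive --hivevar myTableLocation=/tmp/test/events -f main.hql

```sql
-- Hypothetical example: every identifier and file name below is illustrative.

-- Encapsulate a repeated column expression in a temporary macro so it
-- can be exercised on its own, independently of any table.
CREATE TEMPORARY MACRO normalise_country(c STRING) UPPER(TRIM(c));

-- Take the table location from a Hive variable rather than hard-coding it,
-- so a test run can point the table at a scratch directory.
CREATE EXTERNAL TABLE IF NOT EXISTS events (
  id      BIGINT,
  country STRING
)
LOCATION '${hivevar:myTableLocation}';

-- Compose the job from smaller scripts, each of which can be tested alone.
-- (Assumes both files exist relative to the directory the CLI is run from.)
SOURCE create_staging_tables.hql;
SOURCE transform_events.hql;

-- A narrowly scoped check of the macro, rather than an assertion
-- on the whole contents of a table.
SELECT normalise_country('  gb ');
```

  Keeping the macro and each sourced script small means a test only needs to set up the handful of rows and variables that the component under test actually reads.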