

NOTE: This Wiki is obsolete as of November 2016 and is retained for reference only.


Unit Testing

To run an individual Hive compatibility test:

sbt/sbt -Phive -Dspark.hive.whitelist="testname.*" "hive/test-only org.apache.spark.sql.hive.execution.HiveCompatibilitySuite"

where testname.* can be a comma-separated list of regex patterns that match the tests you want to run. You can also use the following command to save some typing:

sbt/sbt -Phive "hive/test-only *.HiveCompatibilitySuite -- -z substring"

All tests whose names contain substring will then be executed. The -z option comes from ScalaTest and can be used with any other test suite.
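As a concrete sketch of the whitelist form above with multiple comma-separated patterns (the pattern names here are illustrative, not real Hive test names):

```shell
# Illustrative patterns; substitute the Hive compatibility test names you
# actually want. Each comma-separated entry is a regex matched against
# test names.
sbt/sbt -Phive -Dspark.hive.whitelist="udf_.*,groupby.*" \
  "hive/test-only org.apache.spark.sql.hive.execution.HiveCompatibilitySuite"
```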

Hive Golden Answer files

For some test suites, Hive golden answer files are generated when test cases are executed for the first time. These files cache results generated by Hive, and the Spark SQL testing framework uses them to accelerate test execution. For all test suites that subclass org.apache.spark.sql.hive.execution.HiveComparisonTest, if a test case is added via HiveComparisonTest.createQueryTest, developers should check the corresponding golden answer files and add them to the Git repository. In most cases, developers only need to pay attention to the following two test suites:

  • org.apache.spark.sql.hive.execution.HiveCompatibilitySuite
    • Newly whitelisted test cases (listed in HiveCompatibilitySuite.whiteList)
  • org.apache.spark.sql.hive.execution.HiveQuerySuite
    • Test cases created via createQueryTest
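As a rough sketch, registering a case via createQueryTest in a suite that extends HiveComparisonTest looks like this (the suite name and query below are made up for illustration):

```scala
// Hypothetical example suite; the suite name and query are illustrative only.
// createQueryTest registers a named Hive query; on its first run the framework
// executes the query against Hive and writes a golden answer file, which
// should then be committed to the Git repository.
class MyHiveQuerySuite extends HiveComparisonTest {
  createQueryTest("simple select",
    "SELECT key, value FROM src LIMIT 5")
}
```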

To generate golden answer files based on Hive 0.12, you need to set up your development environment according to the "Other dependencies for developers" section of this README. To generate golden answer files based on Hive 0.13.1, follow these steps:

  1. Download Hive's 0.13.1 release and set HIVE_HOME (HIVE_DEV_HOME is not needed; see SPARK-4119 for details).
  2. Set HADOOP_HOME.
  3. Download all 0.13.1a jars from http://mvnrepository.com/artifact/org.spark-project.hive and replace the corresponding jars in $HIVE_HOME/lib.
  4. Download the kryo 2.21 jar (note: the 2.22 jar does not work) and the javolution 5.5.1 jar (http://mvnrepository.com/artifact/javolution/javolution/5.5.1) into $HIVE_HOME/lib.
  5. You may not need this step, but if a Hive query fails and you find that Hive tries to talk to HDFS, or you see weird runtime NPEs, set the following in your test suite:

  

val testTempDir = Utils.createTempDir()
// We have to use kryo to let Hive correctly serialize some plans.
sql("set hive.plan.serialization.format=kryo")
// Explicitly set fs to local fs.
sql(s"set fs.default.name=file://$testTempDir/")
// Ask Hive to run jobs in-process as a single map and reduce task.
sql("set mapred.job.tracker=local")

 

After running the new test cases for the first time, the generated golden answer files can be found under the sql/hive/src/test/resources/golden folder; check them in together with the new test cases.
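The environment setup in steps 1–4 above can be sketched as a shell session. The local paths below are placeholders, not prescribed locations; adjust them to wherever you downloaded the releases and jars:

```shell
# Sketch of steps 1-4 above; all paths are examples.
export HIVE_HOME=/path/to/apache-hive-0.13.1-bin
export HADOOP_HOME=/path/to/hadoop

# Step 3: replace Hive's own jars with the 0.13.1a jars downloaded from
# http://mvnrepository.com/artifact/org.spark-project.hive
cp ~/Downloads/*0.13.1a*.jar "$HIVE_HOME/lib/"

# Step 4: add kryo 2.21 (the 2.22 jar does not work) and javolution 5.5.1.
cp ~/Downloads/kryo-2.21.jar ~/Downloads/javolution-5.5.1.jar "$HIVE_HOME/lib/"
```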
