This page describes the mechanics of how to contribute software to Apache Hive. For ideas about what you might contribute, please see open tickets in Jira.
First of all, you need the Hive source code. As of April 2015 Hive has moved to git for its repository.
Get the source code on your local drive using git. See Understanding Hive Branches below to understand which branch you should be using.
git clone https://git-wip-us.apache.org/repos/asf/hive.git
Setting Up Eclipse Development Environment (Optional)
This is an optional step. Eclipse has a lot of advanced features for Java development, and it makes life much easier for Hive developers as well.
How do I import into Eclipse?
This checklist tells you how to create accounts and obtain permissions needed by Hive contributors. See the Hive website for additional information.
Before you start, send a message to the Hive developer mailing list, or file a bug report in JIRA. Describe your proposed changes and check that they fit in with what others are doing and have planned for the project. Be patient; it may take folks a while to understand your requirements.
Modify the source code and add some features using your favorite IDE.
Please pay attention to the following points.
An Eclipse formatter is provided in the dev-support folder; it can be used with both Eclipse and IntelliJ. Please consider importing it before editing the source code.
Run mvn checkstyle:checkstyle-aggregate, and then inspect the results in the target/site directory. It is possible to run the checks for a specific module if the mvn command is issued in the root directory of the module.
Run all unit tests with mvn test, or run a specific unit test with the command mvn test -Dtest=<class name without package prefix> (for example: mvn test -Dtest=TestFileSystem).
Hive is a multi-module Maven project. If you are new to Maven, the articles below may be of interest:
Additionally, Hive actually has two projects, "core" and "itests". The reason that itests is not connected to the core reactor is that itests requires the packages to be built.
The actual Maven commands you will need are discussed on the HiveDeveloperFAQ page.
As of June 2015, Hive has two "main lines", master and branch-1.
All new feature work and bug fixes in Hive are contributed to the master branch. As of June 2015, releases from master are numbered 2.x. The 2.x versions are not necessarily backwards compatible with 1.x versions.
branch-1 is used to build stable, backward compatible releases. Releases from this branch are numbered 1.x (where 1.3 will be the first release from it, as 1.2 was released from master prior to the creation of branch-1). Until at least June 2016, all critical bug fixes (crashes, wrong results, security issues) applied to master must also be applied to branch-1. The decision to port a feature from master to branch-1 is at the discretion of the contributor and committer. However, no features that break backwards compatibility will be accepted on branch-1.
In addition to these main lines Hive has two types of branches, release branches and feature branches.
Release branches are made from branch-1 (for 1.x) or master (for 2.x) when the community is preparing a Hive release. Release branches match the number of the release (e.g., branch-1.2 for Hive 1.2). For patch releases the branch is made from the existing release branch (to avoid picking up new features from the main line). For example, if a 1.2.1 release were being made, branch-1.2.1 would be made from the tip of branch-1.2. Once a release branch has been made, inclusion of additional patches on that branch is at the discretion of the release manager. After a release has been made from a branch, additional bug fixes can still be applied to that branch in anticipation of the next patch release. Any bug fix applied to a release branch must first be applied to master (and branch-1 if applicable).
Feature branches are used to develop new features without destabilizing the rest of Hive. The intent of a feature branch is that it will be merged back into master once the feature has stabilized.
For general information about Hive branches, see Hive Versions and Branches.
Hadoop dependencies are handled differently in master and branch-1.
In branch-1 both Hadoop 1.x and 2.x are supported. The Hive build downloads a number of different Hadoop versions via Maven in order to compile "shims" which allow for compatibility with these Hadoop versions. However, the rest of Hive is only built and tested against a single Hadoop version.
The Maven build has two profiles, hadoop-1 for Hadoop 1.x and hadoop-2 for Hadoop 2.x. When building, you must specify which profile you wish to use via Maven's -P command line option (see How to build all source).
Hadoop 1.x is no longer supported in Hive's master branch. There is no need to specify a profile for most Maven commands, as Hadoop 2.x will always be chosen.
On this page we assume you are building from the master branch and do not include the profile in the example Maven commands. If you are building on branch-1 you will need to select the appropriate profile for the version of Hadoop you are building against.
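The profile rule described above can be sketched as a small shell helper. This is only an illustration: the function name hive_profile_flag and its arguments are hypothetical, and it covers just the two main lines named on this page (master and branch-1).

```shell
# Hypothetical helper: print the Maven profile flag for a Hive main line.
# $1 = branch name (master or branch-1)
# $2 = Hadoop major version (1 or 2); only consulted for branch-1
hive_profile_flag() {
  branch="$1"
  hadoop_major="${2:-2}"
  case "$branch" in
    master)
      # master: Hadoop 2.x is always chosen, so no profile is needed.
      printf '%s\n' ""
      ;;
    branch-1)
      case "$hadoop_major" in
        1) printf '%s\n' "-Phadoop-1" ;;
        2) printf '%s\n' "-Phadoop-2" ;;
        *) echo "unsupported Hadoop version: $hadoop_major" >&2; return 1 ;;
      esac
      ;;
    *)
      echo "unknown branch: $branch" >&2
      return 1
      ;;
  esac
}
```

On branch-1 it could be used as mvn clean install -DskipTests $(hive_profile_flag branch-1 2); on master the substitution expands to nothing.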
Please make sure that all unit tests succeed before and after applying your patch and that no new javac compiler warnings are introduced by your patch. Also see the information in the previous section about testing with different Hadoop versions if you want to verify compatibility with something other than the default Hadoop version.
When submitting a patch, it is highly recommended that you locally run any new tests as well as the existing tests you believe your change affects. The full test suite can be executed by Hive PreCommit Patch Testing. Hive Developer FAQ describes how to execute a specific set of tests.
cd hive-trunk
mvn clean install -DskipTests
mvn test -Dtest=SomeTest
After a while, if you see
[INFO] BUILD SUCCESS
all is ok, but if you see
[INFO] BUILD FAILURE
then you should fix things before proceeding.
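If you save the Maven output to a file, the success check described above can be scripted. The sketch below is a minimal example; the function name build_succeeded and the log file name build.log are assumptions made for illustration.

```shell
# Hypothetical helper: succeed only if a saved Maven log reports success.
# $1 = path to a log captured with, for example:
#      mvn test -Dtest=SomeTest 2>&1 | tee build.log
build_succeeded() {
  log="$1"
  # Any failure marker means there is something to fix before proceeding.
  if grep -qF '[INFO] BUILD FAILURE' "$log"; then
    return 1
  fi
  # Otherwise require an explicit success marker.
  grep -qF '[INFO] BUILD SUCCESS' "$log"
}
```

For example, build_succeeded build.log && echo "all is ok".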
Unit tests take a long time (several hours) to run sequentially even on a very fast machine; for information on how to run them in parallel, see Hive PreCommit Patch Testing.
There are two kinds of unit tests that can be added: those that test an entire component of Hive, and those that run a query to test a feature.
To test a particular component of Hive:
Test) in the component's
Run mvn test -Dtest=TestAbc (where TestAbc is the name of the new class), which will be faster than mvn test, which runs all test cases.
If the new feature can be tested using a Hive query in the command line, we just need to add a new
*.q file and a new
If the feature is added in ql (query language):
ql/src/test/queries/clientpositive. (Optionally, add a new XXXXXX.q file for a query that is expected to fail in
mvn test -Dtest=TestCliDriver -Dqfile=XXXXXX.q -Dtest.output.overwrite=true. This will generate a new
To run several query files, pass a comma-separated list such as -Dqfile="X1.q,X2.q". You can also specify a Java regex, for example -Dqfile_regex='join.*'. (Note that it takes a Java regex, so write 'join.*' rather than the shell-style glob 'join*'.) The .q suffix is removed from the file name before the regex is matched, so specifying join*.q will not work.
If the feature is added in
See the FAQ "How do I add a test case?" for more details.
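The -Dqfile_regex behavior described above (strip .q first, then match) can be approximated in shell to preview which files a pattern would select. This is a sketch under stated assumptions, not how the test driver is implemented: the function name select_qfiles is hypothetical, the whole stripped name is assumed to have to match, and grep -E is used in place of Java regex, so it is only faithful for patterns whose syntax the two share.

```shell
# Hypothetical preview of -Dqfile_regex selection.
# $1 = regex; remaining arguments = query file names.
# Prints the file names whose name, minus ".q", matches the regex.
select_qfiles() {
  regex="$1"
  shift
  for f in "$@"; do
    base="${f%.q}"   # the ".q" suffix is removed before matching
    # -x: assume the whole stripped name must match (an assumption).
    if printf '%s\n' "$base" | grep -Eqx -- "$regex"; then
      printf '%s\n' "$f"
    fi
  done
}
```

With the file names join1.q, join_map.q, and union.q, the pattern 'join.*' selects the two join files, while 'join*.q' selects nothing because the suffix is already gone when the match runs.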
Legacy query test drivers (all of them except TestBeeLineDriver) use HiveCli to run the tests. TestBeeLineDriver runs the tests using the Beeline client. It creates a specific database for them, so the tests can run in parallel. When running the tests, you have the following configuration options:
-Dqfile=XXXXXX.q - Runs one or more specific query file tests. For the exact format, check the Query Unit Test paragraph. If not provided, only the query files from the ql/src/test/queries/clientpositive directory that are listed in itests/src/test/resources/testconfiguration.properties will be run.
-Dtest.output.overwrite=true - Rewrites the q.out files in ql/src/test/results/clientpositive/beeline. The default value is false, in which case the current output is checked against the golden files.
-Dtest.beeline.compare.portable - If this parameter is true, the generated and the golden query output files will be filtered before comparing them. This way the existing query tests can be run against different configurations using the same golden output files. The result of the following commands will be filtered out from the output files: EXPLAIN, DESCRIBE, DESCRIBE EXTENDED, DESCRIBE FORMATTED, SHOW TABLES, SHOW FORMATTED INDEXES and SHOW DATABASES.
-Djunit.parallel.threads=1 - The number of parallel threads running the tests. The default is 1. There was some flakiness caused by parallelization.
-Djunit.parallel.timeout=10 - The tests are terminated after the given timeout. The parameter is set in minutes and the default is 10 minutes. (As of HIVE 3.0.0.)
-Dtest.beeline.url - The JDBC URL which should be used to connect to the existing cluster. If not set, a MiniHS2 cluster will be created instead.
-Dtest.beeline.user - The user which should be used to connect to the cluster. If not set, "user" will be used.
-Dtest.beeline.password - The password which should be used to connect to the cluster. If not set, "password" will be used.
-Dtest.data.dir - The test data directory on the cluster. If not set, <HIVEROOT>/data/files will be used.
-Dtest.results.dir - The test results directory to compare against. If not set, the default configuration will be used.
-Dtest.init.script - The test init script. If not set, the default configuration will be used.
-Dtest.beeline.shared.database - If true, the default database will be used; otherwise a test-specific database will be created for every run. The default value is false.
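As an illustration of how the options above combine, the sketch below assembles a command line for TestBeeLineDriver. It only prints the command. The function name beeline_test_cmd is hypothetical, and selecting the driver with -Dtest=TestBeeLineDriver is an assumption based on the TestCliDriver example earlier on this page; the directory to run it from is described in the Hive Developer FAQ.

```shell
# Hypothetical helper: print (not run) a Maven command for TestBeeLineDriver.
# $1 = comma-separated list of .q files
# $2 = optional JDBC URL of an existing cluster; when omitted, the option is
#      left out and a MiniHS2 cluster is created instead
beeline_test_cmd() {
  qfiles="$1"
  url="${2:-}"
  cmd="mvn test -Dtest=TestBeeLineDriver -Dqfile=$qfiles"
  if [ -n "$url" ]; then
    cmd="$cmd -Dtest.beeline.url=$url"
  fi
  printf '%s\n' "$cmd"
}
```

For example, beeline_test_cmd X1.q,X2.q prints a command that runs both query files against a MiniHS2 cluster.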
Please see Debugging Hive code in Development Guide.
After you have committed a change or set of changes to your local repository, you need to create a patch to post on the JIRA. The naming convention for patches is:
<patch-num> is only required if it is not the first patch.
<branch-name> is only required if it is not master. (See Understanding Hive Branches above.)
So the first patch for JIRA HIVE-9999 intended to be applied to master would be named "
The second patch for the same JIRA would be named "
A patch for the same JIRA intended to be applied to the branch-1 branch would be named
The following git command creates a patch:
git diff --no-prefix <commit> > HIVE-1234.1.patch
<commit> is the last commit from Hive (not you) before your commits. Note that if it has been a while since you fetched or pulled from the Hive repository, you may need to do a rebase to get your commit(s) on top in order to create a patch that will cleanly apply.
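One way to find <commit> automatically is git merge-base, which reports the most recent commit shared by your work and the upstream branch. The sketch below assumes your remote is named origin and that you are targeting master; the function name make_hive_patch is hypothetical.

```shell
# Hypothetical helper: write a patch of local work relative to upstream.
# $1 = output file name (for example HIVE-1234.1.patch)
# $2 = upstream ref; origin/master is an assumed default
make_hive_patch() {
  out="$1"
  upstream="${2:-origin/master}"
  # The last upstream commit that your commits are based on.
  base="$(git merge-base HEAD "$upstream")" || return 1
  # Same command as above, with <commit> filled in.
  git diff --no-prefix "$base" > "$out"
}
```

Run git fetch (and rebase if needed) first, so that the merge base is recent and the patch applies cleanly.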
Please do not:
If the name of your patch conforms to the naming convention shown above, the automated testing system will run precommit tests and post the results as a JIRA comment from Hive QA. The results give advisory +1 or -1 votes (SUCCESS or ERROR) based on whether all of the tests executed successfully and, more recently, whether existing tests are modified or new tests are included in the patch to cover the code changes. For examples, see the Hive QA comments on HIVE-9534 and HIVE-11752. Note that sometimes tests fail for reasons unrelated to the patch.
Patches with nonconforming names are ignored by Hive QA. One leading zero can be used in <patch-num>, such as "HIVE-9999.02.patch", but multiple leading zeros are not accepted (see comments on HIVE-12981).
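A quick local check can catch a badly named file before you upload it. The sketch below is hypothetical and deliberately narrow: it only accepts the master-branch forms HIVE-<number>.patch and HIVE-<number>.<patch-num>.patch, allows at most one leading zero in <patch-num>, and does not attempt to validate branch-specific names. The function name valid_patch_name is an assumption, and Hive QA's actual matching rules may differ.

```shell
# Hypothetical check for master-branch patch names.
# Accepts HIVE-<number>.patch and HIVE-<number>.<patch-num>.patch, where
# <patch-num> may carry at most one leading zero (02 is fine, 002 is not).
valid_patch_name() {
  printf '%s\n' "$1" | grep -Eqx 'HIVE-[0-9]+(\.0?[1-9][0-9]*)?\.patch'
}
```

For example, valid_patch_name HIVE-9999.02.patch succeeds, while valid_patch_name HIVE-9999.002.patch fails.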
To retest a patch, you can cancel it and resubmit it. These two status changes are not necessary for a new patch that has a different filename, but if the filename stays the same then Hive QA ignores the patch after its first test unless you cancel and resubmit.
To prevent precommit testing, include the case-sensitive phrase NO PRECOMMIT TESTS in the Description section of the JIRA issue. You can remove it later as needed. For examples, see HIVE-5289, HIVE-7343, and HIVE-7375.
This section only gives the basic procedures for attaching and submitting a patch. See Contributing Your Work below for more information.
For patch updates, our convention is to number them like HIVE-1856.1.patch, HIVE-1856.2.patch, etc. Then click the "Submit Patch" button again when a new one is uploaded; this makes sure it gets back into the review queue.
To apply a patch that you either generated or found from JIRA, you can issue:
patch -p0 < cool_patch.patch
If you prefer to use git to apply the patch, the following command patches the tree and runs git add on the changed files. This is very useful, since it will not miss added or renamed files, and it lets git track any conflicts so that git mergetool can be used to resolve them:
git apply -3 -p0 HIVE-1111.1.patch
If you just want to check whether the patch applies, you can run patch with the --dry-run option:
patch -p0 --dry-run < cool_patch.patch
If you are an Eclipse user, you can apply a patch by:
See Review Board for instructions.
Finally, patches should be attached to an issue report in JIRA via the Attach File link on the issue's JIRA. Please add a comment that asks for a code review. Please note that the attachment should be granted license to ASF for inclusion in ASF works (as per the Apache License).
When you believe that your patch is ready to be committed, select the Submit Patch link on the issue's JIRA. Unit tests will run automatically if the file is named according to the naming standards. See Hive PreCommit Patch Testing. Tests should all pass. If your patch involves performance optimizations, they should be validated by benchmarks that demonstrate an improvement.
If your patch creates an incompatibility with the latest major release, then you must set the Incompatible change flag on the issue's JIRA and fill in the Release Note field with an explanation of the impact of the incompatibility and the necessary steps users must take.
If your patch implements a major feature or improvement, then you must fill in the Release Note field on the issue's JIRA with an explanation of the feature that will be comprehensible by the end user.
The Release Note field can also document changes in the user interface (such as new HiveQL syntax or configuration parameters) prior to inclusion in the wiki documentation.
A committer should evaluate the patch within a few days and either: commit it; or reject it with an explanation.
Please be patient. Committers are busy people too. If no one responds to your patch after a few days, please send a friendly reminder. Please incorporate others' suggestions into your patch if you think they're reasonable. Finally, remember that even a patch that is not committed is useful to the community.
Should your patch receive a "-1", select Resume Progress on the issue's JIRA, upload a new patch with the necessary fixes, and then select the Submit Patch link again.
Committers: for non-trivial changes, it is best to get another committer to review your patches before commit. Use the Submit Patch link like other contributors, and then wait for a "+1" from another committer before committing. Please also try to frequently review things in the patch queue.
If you don't already have a JIRA account, sign up for JIRA.
Please comment on issues in JIRA, making your concerns known. Please also vote for issues that are a high priority for you.
Please refrain from editing descriptions and comments if possible, as edits spam the mailing list and clutter JIRA's "All" display, which is otherwise very useful. Instead, preview descriptions and comments using the preview button (icon below the comment box) before posting them.
Keep descriptions brief and save more elaborate proposals for comments, since descriptions are included in JIRA's automatically sent messages. If you change your mind, note this in a new comment, rather than editing an older comment. The issue should preserve this history of the discussion.
To open a JIRA issue, click the Create button on the top line of the Hive summary page or any Hive JIRA issue.
Please leave Fix Version/s empty when creating the issue – it should not be tagged until an issue is closed, and then, it is tagged by the committer closing it to indicate the earliest version(s) the fix went into. Instead of Fix Version/s, use Target Version/s to request which versions the new issue's patch should go into. (Target Version/s was added to the Create Issue form in November 2015. You can add target versions to issues created before that with the Edit button, which is in the upper left corner.)
When in doubt about how to fill in the Create Issue form, take a look at what was done for other issues. Here are several Hive JIRA issues that you can use as examples:
Many examples of uncommitted issues are available in the "Added recently" list on the issues panel.
Some portions of the Hive code are generated by Thrift. For most Hive changes, you don't need to worry about this, but if you modify any of the Thrift IDL files (e.g. service/if/hive_service.thrift), then you'll also need to regenerate these files and submit their updated versions as part of your patch.
Here are the steps relevant to hive_metastore.thrift. Do not make any changes to hive_metastore.thrift until instructed below.
Use thrift-0.9.3, which you can obtain from http://thrift.apache.org/.
./configure --without-csharp --without-ruby
sudo make install
Verify that which thrift returns the build of Thrift you just installed (typically /usr/local/bin on Linux); if not, edit your PATH and repeat the verification. Also verify that the command 'thrift -version' returns the expected version number of Thrift.
mvn clean install -Pthriftif -DskipTests -Dthrift.home=/usr/local -Phadoop-2
cp /path/to/thrift-0.9.3/contrib/fb303/if/fb303.thrift /usr/local/share/fb303/if/fb303.thrift
svn status for the same. If you can't figure out what is going wrong, ask for help from a committer.
Now make your changes to hive_metastore.thrift, and then run the compiler again, from /path/to/hive-trunk/<hive_metastore.thrift's module>:
mvn clean install -Pthriftif -DskipTests -Dthrift.home=/usr/local -Phadoop-2
Use git status and git diff to verify that the regenerated code corresponds only to the changes you made to hive_metastore.thrift. You may also need git add if new files were generated (and git rm if files have been obsoleted).
ant clean package
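The Thrift version check from the steps above can be scripted before you regenerate any code. This sketch assumes that thrift -version prints a line ending in the version number (for example "Thrift version 0.9.3"); the function name thrift_version_ok is hypothetical.

```shell
# Hypothetical helper: compare the text printed by `thrift -version`
# with the expected version.
# $1 = output of `thrift -version`
# $2 = expected version (defaults to 0.9.3, the version named above)
thrift_version_ok() {
  actual="${1##* }"   # keep only the last space-separated word
  [ "$actual" = "${2:-0.9.3}" ]
}
```

For example: thrift_version_ok "$(thrift -version)" 0.9.3 || echo "unexpected Thrift version; check your PATH".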
Contributors should join the Hive mailing lists. In particular, join the dev list (to take part in discussions of changes) and the user list (to help others).