How to Contribute to Apache Hive

This page describes the mechanics of how to contribute software to Apache Hive. For ideas about what you might contribute, please see the open tickets in JIRA.

Mavenization is complete

Hive now uses Maven as its build tool instead of Ant.

Getting the source code

First of all, you need the Hive source code.

Get the source code on your local drive using SVN. Most development is done on the "trunk":

svn checkout http://svn.apache.org/repos/asf/hive/trunk hive-trunk

You also have the option of using one of the Git mirrors of the SVN repository:

git clone git://git.apache.org/hive.git

or

git clone http://github.com/apache/hive.git

Setting up Eclipse Development Environment (Optional)

This is an optional step. Eclipse has a lot of advanced features for Java development, and it makes life much easier for Hive developers as well.

See "How do I import into eclipse?" on the HiveDeveloperFAQ page.

Making Changes

Before you start, send a message to the Hive developer mailing list, or file a bug report in JIRA. Describe your proposed changes and check that they fit in with what others are doing and have planned for the project. Be patient; it may take folks a while to understand your requirements.

Modify the source code and add some features using your favorite IDE.

Coding Convention

Please observe the following points:

All public classes and methods should have informative Javadoc comments.
- Do not use @author tags.
Code should be formatted according to Sun's conventions, with two exceptions:
- Indent two (2) spaces per level, not four (4).
- Line length limit is 100 chars, instead of 80 chars.
Contributions should not introduce new Checkstyle violations.
- Check for new Checkstyle violations by running ant checkstyle, and then inspect the results in the build/checkstyle directory.
- If you use Eclipse you should install the eclipse-cs Checkstyle plugin. This plugin highlights violations in your code and is also able to automatically correct some types of violations.
Contributions should pass existing unit tests.
New unit tests should be provided to demonstrate bugs and fixes. JUnit is our test framework:
- You must implement a class that extends junit.framework.TestCase and whose class name starts with Test.
- Define methods within your class whose names begin with test, and call JUnit's many assert methods to verify conditions; these methods will be executed when you run mvn test.
- You can run all the unit tests with the command mvn test, or you can run a specific unit test with the command mvn test -Dtest=<class name without package prefix> (for example, mvn test -Dtest=TestFileSystem).

Understanding Maven

Hive is a multi-module Maven project. If you are new to Maven, the articles below may be of interest:

Additionally, Hive actually has two projects, "core" and "itests". The itests project is not connected to the core reactor because it requires the core packages to be built first.

The actual Maven commands you will need are located on the HiveDeveloperFAQ page.

Hadoop Dependencies

The Hive build downloads a number of different Hadoop versions via maven in order to compile "shims" which allow for compatibility with these Hadoop versions. However, by default, the rest of Hive is only built and tested against a single Hadoop version (1.2.1 as of this writing, but check pom.xml for the latest).

The Maven build has two profiles, one for Hadoop 1 (0.20 and 1.X) and one for Hadoop 2 (2.X). The hadoop-1 profile is used by default; to use the hadoop-2 profile, specify -Phadoop-2.
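
As a rough sketch of how the profile choice translates into a build command, the helper below prints the Maven invocation for a given Hadoop line. The function name hive_build_cmd and its argument format are hypothetical and not part of Hive; only the -Phadoop-2 flag and the clean install -DskipTests goals are taken from this page.

```shell
# Hypothetical helper (not part of Hive): print the Maven build command
# for the requested Hadoop profile. The function name and argument format
# are illustrative; only -Phadoop-2 comes from this page.
hive_build_cmd() {
  profile="${1:-hadoop-1}"
  case "$profile" in
    hadoop-1) echo "mvn clean install -DskipTests" ;;             # default profile
    hadoop-2) echo "mvn clean install -DskipTests -Phadoop-2" ;;  # explicit profile
    *) echo "unknown profile: $profile" >&2; return 2 ;;
  esac
}

hive_build_cmd hadoop-2
```

Running the printed command from the top of the source tree should build Hive with the chosen profile.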

Trunk builds of Hive require Hadoop version at least 0.20.1; older versions are no longer supported.

Unit Tests

Please make sure that all unit tests succeed before and after applying your patch and that no new javac compiler warnings are introduced by your patch. Also see the information in the previous section about testing with different Hadoop versions if you want to verify compatibility with something other than the default Hadoop version.

When submitting a patch, it is highly recommended that you locally execute any tests you believe will be impacted, in addition to any new tests. The full test suite can be executed by Hive PreCommit Patch Testing. See the Hive Developer FAQ for how to execute a specific set of tests.

> cd hive-trunk
> mvn clean install -DskipTests
> mvn test -Dtest=SomeTest

After a while, if you see

[INFO] BUILD SUCCESS

all is ok, but if you see

[INFO] BUILD FAILURE

then examine the error messages and fix the problems before proceeding.

Unit tests take a long time (several hours) to run sequentially even on a very fast machine; for information on how to run them in parallel, see Hive PreCommit Patch Testing.

Add a Unit Test

There are two kinds of unit tests in Hive:

Normal unit test: These are used for testing a particular component of Hive.
- We just need to add a new class (name must start with "Test") in the */src/test directory.
- We can run "mvn test -Dtest=TestAbc" where TestAbc is the name of the new class. This will run only the new test case, which is faster than "mvn test", which runs all test cases.
A new query: If the new feature can be tested using the Hive command line, we just need to add a new *.q file and a new *.q.out file:
- If the feature is added in ql
  - Add a new XXXXXX.q file in ql/src/test/queries/clientpositive
  - Run "mvn test -Dtest=TestCliDriver -Dqfile=XXXXXX.q -Dtest.output.overwrite=true". This will generate a new XXXXXX.q.out file in ql/src/test/results/clientpositive.
    - If you want to run multiple .q files in the test run, you can specify a comma-separated list of .q files, for example -Dqfile="X1.q,X2.q". You can also specify a Java regex, for example -Dqfile_regex='join.*'. (Note that it takes a Java regex, i.e. 'join.*' and not 'join*'.) The regex match first removes the .q from the file name before matching, so specifying "join*.q" will not work.
  - If you are using hive-0.11.0 or later, you can specify -Dmodule=ql
- If the feature is added in contrib
  - Do the steps above, replacing "ql" with "contrib", and "TestCliDriver" with "TestContribCliDriver".
  - If you are using hive-0.11.0 or later, you can specify -Dmodule=contrib
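
The regex rule for -Dqfile_regex described above can be approximated in the shell. This is only a sketch: qfile_matches is a hypothetical helper, it assumes the regex must match the whole file name once the .q suffix is removed, and it uses grep -E, whose extended regular expressions agree with Java regexes only for simple patterns such as these.

```shell
# Hypothetical helper (not part of Hive): succeed if the .q file name,
# with its .q suffix stripped, matches the given regex in full.
# Assumptions: whole-name matching; grep -E stands in for Java's regex engine.
qfile_matches() {
  name="${1%.q}"                          # join1.q -> join1
  printf '%s\n' "$name" | grep -Eqx -- "$2"
}

for pattern in 'join.*' 'join*.q'; do
  if qfile_matches join1.q "$pattern"; then
    echo "$pattern matches join1.q"
  else
    echo "$pattern does not match join1.q"
  fi
done
```

Under these assumptions the first pattern selects join1.q and the second does not, because the suffix is already gone by the time the regex is applied.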

Debugging

Please see Debugging Hive code in the Development Guide.

Creating a patch

Check to see what files you have modified with:

svn stat

Add any new files with:

svn add .../MyNewClass.java
svn add .../TestMyNewClass.java
svn add .../XXXXXX.q
svn add .../XXXXXX.q.out

In order to create a patch, type (from the base directory of hive):

svn diff > HIVE-1234.1.patch.txt

This will report all modifications done on Hive sources on your local disk and save them into the HIVE-1234.1.patch.txt file. Read the patch file. Make sure it includes ONLY the modifications required to fix a single issue.

If you are using Git instead of Subversion, it's important that you generate your patch using the following command:

git diff --no-prefix <commit> > HIVE-1234.1.patch.txt

Please do not:

reformat code unrelated to the bug being fixed: formatting changes should be separate patches/commits.
comment out code that is now obsolete: just remove it.
insert comments around each change, marking the change: folks can use subversion to figure out what's changed and by whom.
make things public which are not required by end users.

Please do:

try to adhere to the coding style of files you edit;
comment code whose function or rationale is not obvious;
update documentation (e.g., package.html files, this wiki, etc.)

If you need to rename files in your patch:

Write a shell script that uses 'svn mv' to rename the original files.
Edit files as needed (e.g., to change package names).
Create a patch file with 'svn diff --no-diff-deleted --notice-ancestry'.
Submit both the shell script and the patch file.

This way other developers can preview your change by running the script and then applying the patch.

Updating a patch

For patch updates, our convention is to number them like HIVE-1856.1.patch.txt, HIVE-1856.2.patch.txt, etc. After uploading a new one, click the "Submit Patch" button again; this makes sure it gets back into the review queue. Appending '.txt' to the patch file name makes it easy to quickly view the contents of the patch in a web browser.
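
The numbering convention can be automated with a small shell function. The sketch below is illustrative only: next_patch_name is a hypothetical helper, and it assumes earlier patch files for the issue are kept together in one local directory.

```shell
# Hypothetical helper (not part of Hive): print the next patch file name
# for an issue, following the ISSUE.N.patch.txt convention.
# Arguments: issue key (e.g. HIVE-1856) and the directory holding earlier patches.
next_patch_name() {
  issue="$1"
  dir="${2:-.}"
  max=0
  for f in "$dir"/"$issue".*.patch.txt; do
    [ -e "$f" ] || continue               # glob matched nothing
    n=${f##*/}                            # drop the directory part
    n=${n#"$issue".}                      # drop the "HIVE-1856." prefix
    n=${n%.patch.txt}                     # drop the suffix, leaving N
    case "$n" in
      ''|*[!0-9]*) continue ;;            # skip names where N is not a number
    esac
    if [ "$n" -gt "$max" ]; then max="$n"; fi
  done
  echo "$issue.$((max + 1)).patch.txt"
}

next_patch_name HIVE-1856 .
```

The printed name can then be used as the output file when generating the diff.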

Applying a patch

To apply a patch that you either generated or found on JIRA, run:

patch -p0 < cool_patch.patch

If you just want to check whether the patch applies, run patch with the --dry-run option:

patch -p0 --dry-run < cool_patch.patch
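
The two commands can be combined so that a patch is applied only when the dry run succeeds. This is a minimal sketch: apply_patch_checked is a hypothetical helper name, and it assumes it is run from the base directory of the source tree with a patch created for -p0.

```shell
# Hypothetical helper (not part of Hive): apply a patch only if a dry run
# succeeds, so a patch that does not apply leaves the tree untouched.
apply_patch_checked() {
  patch_file="$1"
  if patch -p0 --dry-run < "$patch_file" > /dev/null 2>&1; then
    patch -p0 < "$patch_file"
  else
    echo "patch does not apply cleanly: $patch_file" >&2
    return 1
  fi
}
```

For example, apply_patch_checked cool_patch.patch either applies the patch or reports the failure and returns a non-zero status.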

If you are an Eclipse user, you can apply a patch as follows:
1. Right-click the project name in Package Explorer.
2. Select Team -> Apply Patch.

Review Process

See Phabricator for instructions.

Use Hadoop's code review checklist as a rough guide when doing reviews
In JIRA, use Submit Patch to get your review request into the queue.
If a committer requests changes, set the issue status to 'Resume Progress', then once you're ready, submit an updated patch with necessary fixes and then request another round of review with 'Submit Patch' again.
Once your patch is accepted, be sure to upload a final version which grants rights to the ASF.

Contributing your work

Finally, patches should be attached to an issue report in JIRA via the Attach File link on the issue's JIRA. Please add a comment that asks for a code review. Please note that the attachment should be granted license to ASF for inclusion in ASF works (as per the Apache License).

When you believe that your patch is ready to be committed, select the Submit Patch link on the issue's JIRA.

Folks should run a clean build and the tests (mvn clean install -DskipTests followed by mvn test) before selecting Submit Patch. Tests should all pass. If your patch involves performance optimizations, they should be validated by benchmarks that demonstrate an improvement.

If your patch creates an incompatibility with the latest major release, then you must set the Incompatible change flag on the issue's JIRA and fill in the Release Note field with an explanation of the impact of the incompatibility and the necessary steps users must take.

If your patch implements a major feature or improvement, then you must fill in the Release Note field on the issue's JIRA with an explanation of the feature that will be comprehensible by the end user.

A committer should evaluate the patch within a few days and either: commit it; or reject it with an explanation.

Please be patient. Committers are busy people too. If no one responds to your patch after a few days, please send a friendly reminder. Please incorporate others' suggestions into your patch if you think they're reasonable. Finally, remember that even a patch that is not committed is useful to the community.

Should your patch receive a "-1", select Resume Progress on the issue's JIRA, upload a new patch with the necessary fixes, and then select the Submit Patch link again.

Committers: for non-trivial changes, it is best to get another committer to review your patches before commit. Use Submit Patch link like other contributors, and then wait for a "+1" from another committer before committing. Please also try to frequently review things in the patch queue.

JIRA Guidelines

Please comment on issues in JIRA, making your concerns known. Please also vote for issues that are a high priority for you.

Please refrain from editing descriptions and comments if possible, as edits spam the mailing list and clutter JIRA's "All" display, which is otherwise very useful. Instead, preview descriptions and comments using the preview button (on the right) before posting them. Keep descriptions brief and save more elaborate proposals for comments, since descriptions are included in JIRA's automatically sent messages. If you change your mind, note this in a new comment, rather than editing an older comment. The issue should preserve this history of the discussion.

Generating Thrift Code

Some portions of the Hive code are generated by thrift. For most Hive changes, you don't need to worry about this, but if you modify any of the Thrift IDL files (e.g. metastore/if/hive_metastore.thrift and service/if/hive_service.thrift), then you'll also need to regenerate these files and submit their updated versions as part of your patch.

Here are the steps relevant to hive_metastore.thrift:

Don't make any changes to hive_metastore.thrift until instructed below.
Use the approved version of thrift. This is currently thrift-0.9.0, which you can obtain from http://thrift.apache.org/.
Build the thrift compiler from its sources, then install it:
cd /path/to/thrift-0.9.0
./configure --without-csharp --without-ruby
make
sudo make install
Before proceeding, verify that which thrift returns the build of thrift you just installed (typically /usr/local/bin on Linux); if not, edit your PATH and repeat the verification. Also verify that the command 'thrift -version' returns the expected version number of Thrift.
Now you can run the ant 'thriftif' target to generate the Thrift code:
cd /path/to/hive-trunk/
ant thriftif -Dthrift.home=/path/to/thrift-0.9.0
If you see an error about fb303.thrift not being found, copy it to the appropriate directory and run the above command again. On CentOS/RHEL:
cp /path/to/thrift-0.9.0/contrib/fb303/if/fb303.thrift /usr/local/share/fb303/if/fb303.thrift
Use svn status to verify that the code generation was a no-op, which should be the case if you have the correct thrift version and everyone has been following these instructions. If you can't figure out what is going wrong, ask for help from a committer.
Now make your changes to hive_metastore.thrift, and then run the compiler again:
ant thriftif
Now use svn status and svn diff to verify that the regenerated code corresponds only to the changes you made to hive_metastore.thrift. You may also need svn add if new files were generated (and svn remove if files have been obsoleted).
cd /path/to/hive-trunk
ant clean package
Verify that hive is still working correctly with both embedded and remote metastore configurations.

MVN:

The Maven equivalent of ant thriftif is:

mvn clean install -Pthriftif -DskipTests -Dthrift.home=/usr/local

Stay involved

Contributors should join the Hive mailing lists, in particular the dev list (to join discussions of changes) and the user list (to help others).
