Thanks for your interest in Apache Impala!
Impala is an Apache-licensed open-source SQL query engine for data stored in Apache Hadoop clusters. We welcome contributions! This document contains some guidelines for contributing to Impala, and suggestions for the kind of contributions you can make.
What should I contribute?
Some of the most useful contributions to Impala are bug reports. If you discover a bug in Impala, please open a new JIRA ticket at the Impala JIRA tracker (after first checking to see if the bug has already been reported).
Bug reports are best when they contain a precise description of the problem, a minimal set of steps to reproduce the issue, and a comprehensive description of your environment (for example, which version of Impala you are running and whether you are using HDFS, S3 or HBase).
Please do not create JIRA tickets for general user or developer questions; direct those to the appropriate mailing list instead: https://impala.apache.org/community.html
As with other open source communities, the best way to engage is to start small, by fixing some bugs or contributing small features. By doing so, you will earn a reputation as a contributor of high-quality patches, you will get to know the Impala community, and you will become familiar with our process for contribution.
If you’re looking for inspiration, Impala has a very active JIRA instance. We have labelled some JIRA tickets ‘newbie’ - these are ideal for someone who is looking to learn the codebase. If you find one you like, feel free to email email@example.com to discuss it, or dig right in. Before you start, though, register on the Apache JIRA system and ask someone on dev@ to assign the ticket to you. You can't assign yourself a JIRA until this is done. That way you don't end up in a race condition with another new contributor!
Impala is under very active development, and all parts of the code are changing quickly as we work to meet Impala’s roadmap and provide the features our customers need. We are therefore very unlikely to be able to accept large, unsolicited patches into the codebase. If you are considering undertaking a substantial piece of work, please be forewarned: we cannot guarantee that we will be able to accept it, or that we will have the time to review it. If you are unsure, please ask before you start work.
We also strongly encourage improvements to our developer documentation: we need better build documentation, getting started guides, style guides and more!
See Documentation for instructions and guidelines on contributing to the Impala user documentation (SQL Reference, performance tuning, and so on).
How do I contribute code?
Get an account on the Apache JIRA.
Subscribe to the dev list firstname.lastname@example.org. You do that by sending mail to email@example.com.
Find a JIRA that you would like to work on (or file one if you have discovered a new issue!). If no-one is working on it, assign it to yourself only if you intend to work on it shortly. Ramp-up JIRAs are a great place to look for your first patch ideas.
JIRA has a whitelist of users who are permitted to modify tickets. Mail the firstname.lastname@example.org list and request access to assign JIRAs to yourself. Please provide your Apache JIRA username in the email.
Except for the very smallest items, it’s a very good idea to discuss your intended approach either on the JIRA or on the email@example.com mailing list. You are much more likely to have your patch reviewed and committed if you’ve already got buy-in from the community before you start.
We may sometimes decide not to accept a proposed feature or bugfix. This does not mean we do not value your contribution, but more likely that we feel that it does not align with the goals of the project or it is not the right time.
Before getting started, you'll want to get your development environment set up. Read bin/bootstrap_development.sh to get going. You can also take a look at Impala Development Environment inside Docker.
Now start coding! As you are writing your patch, please keep the following things in mind:
First, try to follow best practices for coding and design (see Effective Coding Practices for guidelines).
Second, please follow the Impala style guide. We are sticklers for adherence to the guide (with a couple of exceptions), and we won’t accept any patch that needs work in this regard. For Java and Python we do not have formal guides yet (writing one would be an excellent contribution!); please try to follow the standards set by existing code. We will help you with this when your patch is reviewed.
You can use clang-format or clang-format-diff to format the whitespace of the C++ parts of your patch using the file .clang-format included in the git repository, which is tuned to match our current coding style as closely as possible. When you submit a clang-formatted patch, please do not reformat the entire file you are working on, but only the parts of it you are changing.
Third, please include tests with your patch. If your patch does not include tests, it will not be accepted. If you are unsure how to write tests for a particular component, please ask on the impala-dev mailing list for guidance.
Fourth, run all the existing Impala tests to ensure your fix doesn't introduce any new issues. See How to load, run, and create new Impala tests for more details.
Fifth, please keep your patch narrowly targeted to the problem described by the JIRA. It’s better for everyone if we maintain discipline about the scope of each patch. In general, if you find a bug while working on a specific feature, file a JIRA for the bug, check if you can assign it to yourself and fix it independently of the feature. This helps us to differentiate between bug fixes and features and allows us to build stable maintenance releases.
Finally, please write a good, clear commit message, with a short, descriptive title and a message that is exactly long enough to explain what the problem was, and how it was fixed. Each line should have 72 or fewer characters if possible. The first line should begin with the ticket(s) addressed, followed by a colon and a space: "IMPALA-1234: ", and it should have an empty line after it. Docs-only commits should have [DOCS] after the ticket numbers, like "IMPALA-1234: [DOCS] ". Here is an example of a good commit message:
IMPALA-1645 and IMPALA-1632: Verify Cache Directives

When a table is loaded in the catalog, we will now perform a check to
verify that the cache directive ID and cache replication factor are
still valid and the data is current.

If the cache directive no longer exists, we issue an error message
and mark the table / partition as uncached. Furthermore, the replication
factor is updated with the information from the actual cache directive.

In the case of an insert statement there is a special situation, as the
catalog update is happening synchronously and will try to access the
cache directive information that might be stale. Thus in this insert
path, we catch the possible not found exception and reset the caching
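The commit message rules described above can be checked mechanically before you upload a patch. The following is a minimal Python sketch, not part of Impala's tooling: the function name check_commit_message and the accepted separators between multiple tickets (", ", " and ", or a single space) are assumptions made for illustration. It treats the 72-character limit as a problem to report, even though the guideline only asks for it "if possible".

```python
import re

# Assumption: a title starts with one or more tickets such as "IMPALA-1234: "
# or "IMPALA-1645 and IMPALA-1632: ", optionally followed by "[DOCS] ".
# The separators accepted between tickets are a guess, not a project rule.
TITLE_RE = re.compile(r"^IMPALA-\d+((, | and | )IMPALA-\d+)*: (\[DOCS\] )?\S")
MAX_LINE_LENGTH = 72


def check_commit_message(message):
    """Return a list of problems found in a commit message (empty if none)."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing title line"]

    problems = []
    if not TITLE_RE.match(lines[0]):
        problems.append("title should start with the ticket(s), e.g. 'IMPALA-1234: '")
    # The title must be separated from the body by an empty line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("the title should be followed by an empty line")
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append(f"line {number} is longer than {MAX_LINE_LENGTH} characters")
    return problems
```

A hypothetical way to use it is to pass in the output of git log -1 --format=%B and print whatever problems come back before pushing the change for review.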
When you have a patch that you consider ready for submission, submit it to our code review tool called Gerrit (see Using Gerrit to submit and review patches for details). You’ll see an e-mail go out to the firstname.lastname@example.org list.
After that, other community members will review your patch. You will likely need to submit an updated patch with some changes - reply to all the comments in Gerrit at the same time (mark as ‘Done’ those suggestions that you’ve taken without further comment). This process can go back and forth for a while - please don’t be discouraged, you’ll see it happens with all patches!
Once a patch is considered ready to go in, a committer will give it the ‘+2’ mark in Gerrit, and will run the submission testing job.
If the testing job completes successfully, Gerrit will automatically submit the patch to Impala’s GitHub repository. Congratulations! You are now a contributor to Impala!
If the testing job does not complete successfully, you’ll see ‘Jenkins’ mark your patch with a -1, and you’ll need to work with your reviewer to fix the test failures.
Developer mailing list
We maintain a mailing list at email@example.com, which is the right place for Impala development discussions. Questions about using Impala still belong on the Impala user list, which is firstname.lastname@example.org.
The presentation "How to Get Your PostgreSQL Patch Accepted" captures many of our requirements, particularly the focus on starting small, getting agreement, working on the design first, ensuring proper testing and committing to good comments in the code.