Committers

Current Committers

Name	Organization
Michael Armbrust	Databricks
Joseph Bradley	Databricks
Mosharaf Chowdhury	UC Berkeley
Jason Dai	Intel
Tathagata Das	Databricks
Ankur Dave	UC Berkeley
Aaron Davidson	Databricks
Thomas Dudziak	Groupon
Robert Evans	Yahoo!
Joseph Gonzalez	UC Berkeley
Thomas Graves	Yahoo!
Stephen Haberman	Bizo
Mark Hamstra	ClearStory Data
Yin Huai	Databricks
Shane Huang	National University of Singapore
Andy Konwinski	Databricks
Ryan LeCompte	Quantifind
Haoyuan Li	UC Berkeley
Davies Liu	Databricks
Cheng Lian	Databricks
Sean McNamara	Webtrends
Xiangrui Meng	Databricks
Mridul Muralidharan	Yahoo!
Andrew Or	Databricks
Kay Ousterhout	UC Berkeley
Sean Owen	Cloudera
Nick Pentreath	Mxit
Imran Rashid	Cloudera
Charles Reiss	UC Berkeley
Josh Rosen	Databricks
Sandy Ryza	Cloudera
Prashant Sharma	Imaginea, Pramati, Databricks
Ram Sriharsha	Hortonworks
Shivaram Venkataraman	UC Berkeley
Patrick Wendell	Databricks
Andrew Xia	Alibaba
Reynold Xin	Databricks
Matei Zaharia	Databricks, MIT

Review Process and Maintainers

Spark development follows the Apache voting process, where changes to the code are approved through consensus. We use a review-then-commit model: at least one committer other than the patch author must review and approve a patch before it is merged, and any committer may vote against it. For certain modules, changes to the architecture and public API should also be reviewed by a maintainer for that module (who may or may not be the same person as the main reviewer) before being merged. The PMC has designated the following maintainers:

Component	Maintainers
Spark core public API	Josh Rosen, Patrick Wendell, Reynold Xin, Matei Zaharia
Job scheduler	Mark Hamstra, Kay Ousterhout, Patrick Wendell, Matei Zaharia
Shuffle and network	Aaron Davidson, Reynold Xin, Matei Zaharia
Block manager	Aaron Davidson, Reynold Xin
YARN	Thomas Graves, Andrew Or
Python	Josh Rosen, Xiangrui Meng, Matei Zaharia
MLlib	Xiangrui Meng, Shivaram Venkataraman, Matei Zaharia
SQL	Michael Armbrust, Reynold Xin
Streaming	Tathagata Das, Matei Zaharia
GraphX	Ankur Dave, Joseph Gonzalez, Reynold Xin

Note that the maintainers in Spark do not "own" each module – every committer is responsible for the quality of the whole codebase. Instead, maintainers are asked by the PMC to ensure that public APIs and changes to complex components are designed consistently. Any committer may contribute to any module, and any committer may review any code change. If maintainers do not respond to a change within a reasonable amount of time, other committers may also merge it and ask the PMC to add more maintainers for that module.

Becoming a Committer

To get started contributing to Spark, learn how to contribute – anyone can submit patches, documentation and examples to the project.

The PMC regularly adds new committers from the active contributors, based on their contributions to Spark. The qualifications for new committers include:

Sustained contributions to Spark: Committers should have a history of major contributions to Spark. An ideal committer will have contributed broadly throughout the project, and have contributed at least one major component where they have taken an "ownership" role. An ownership role means that existing contributors feel that they should run patches for this component by this person.
Quality of contributions: Committers, more than any other community member, should submit simple, well-tested, and well-designed patches. In addition, they should show sufficient expertise to be able to review patches, including making sure they fit within Spark's engineering practices (testability, documentation, API stability, code style, etc.). The committership is collectively responsible for the software quality and maintainability of Spark.
Community involvement: Committers should have a constructive and friendly attitude in all community interactions. They should also be active on the dev and user list and help mentor newer contributors and users. In design discussions, committers should maintain a professional and diplomatic approach, even in the face of disagreement.

The type and level of contributions considered may vary by project area. For example, we greatly encourage contributors who want to work mainly on the documentation, or mainly on platform support for specific OSes, storage systems, etc.

How to Merge a Pull Request

Changes pushed to the master branch on Apache cannot be removed; that is, we can't force-push to it. So please don't add any test commits or anything like that, only real patches.

All merges should be done using the dev/merge_spark_pr.py script, which squashes the pull request's changes into one commit. To use this script, you will need to add a git remote called "apache" at https://git-wip-us.apache.org/repos/asf/spark.git, as well as one called "apache-github" at git://github.com/apache/spark. For the "apache" repo, you can authenticate using your ASF username and password. Ask Patrick if you have trouble with this or want help doing your first merge.
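As a rough sketch, the two remotes described above could be added as follows. The scratch repository is only there so the snippet can run anywhere; it is an assumption for illustration, and in practice you would run just the two "git remote add" lines from inside your existing Spark clone.

```shell
set -e

# Scratch repository for illustration only (assumption); in practice,
# run the two "git remote add" commands inside your Spark clone.
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Remote the merge script pushes to (authenticate with your ASF credentials).
git remote add apache https://git-wip-us.apache.org/repos/asf/spark.git

# Remote pointing at the GitHub mirror, where pull requests live.
git remote add apache-github git://github.com/apache/spark

# List the configured remotes to confirm both names are present.
git remote -v
```

Adding a remote only records its URL, so nothing is contacted until the merge script fetches or pushes.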

The script is fairly self-explanatory and walks you through steps and options interactively.

If you want to amend a commit before merging (which should be done only for trivial touch-ups), let the script wait at the point where it asks whether you want to push to Apache. Then, in a separate window, modify the code and commit your change. Run "git rebase -i HEAD~2" and "squash" your new commit into the previous one. In the commit message editor that opens next, delete the message of your new commit so that only the original message remains. You can verify that the result is a single change with "git log". Then resume the script in the other window.
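The touch-up flow above can be tried out safely in a throwaway repository before doing it on a real merge. The sketch below is illustrative only: the scratch repository, file name, and commit messages are placeholders, not anything the merge script produces. So that it can run unattended, it sets GIT_SEQUENCE_EDITOR to rewrite the todo list automatically and uses "fixup" rather than "squash"; "fixup" melds the commit in the same way but drops its message, which matches the result of squashing and then deleting your message by hand.

```shell
set -e

# Scratch repository for illustration only (assumption); in practice you
# are on the branch the merge script has prepared.
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git config user.name "Example Committer"
git config user.email "committer@example.invalid"

# A base commit, standing in for the existing history.
echo "base" > file.txt
git add file.txt
git commit -q -m "Base commit"

# Stand-in for the single squashed commit created by the merge script.
echo "change from pull request" >> file.txt
git commit -q -am "Squashed pull request (placeholder message)"

# The trivial touch-up made in a separate window.
echo "trivial touch-up" >> file.txt
git commit -q -am "Touch-up (message to be discarded)"

# Non-interactive stand-in for editing the todo list by hand:
# change the second "pick" line to "fixup".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/fixup/'" GIT_EDITOR=true \
  git rebase -i HEAD~2

# The touch-up is now folded into the pull request commit.
git log --oneline
```

When doing this interactively, plain "git rebase -i HEAD~2" opens the same todo list in your editor, and "git log" afterwards should show one commit for the change.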

Also, please remember to set Assignee on JIRAs where applicable when they are resolved. The script can't do this automatically.
