
...

That is where this document comes in. Its purpose is to help you as developers take the next step in becoming contributing members of the Nutch community. We will give a general overview of the Nutch development process, including the different pieces and how they fit together. We will cover how the community works and interacts, how to use the mailing lists to search for information, and how to ask questions so that they get answered. We will also cover how to go about learning the internals of the Nutch codebase, how to use JIRA for change requests, and how to start developing for Nutch. Finally, we will cover contributing back to the Nutch community. When we are finished you should have a good understanding of how the community works and how you can become a bigger part of it.

Table of Contents

The Nutch Community

Nutch Development Roles

...

JIRA and Issue/Request Tracking

...

The JIRA system is the central repository for all work intended for inclusion in the Nutch source code base. The system tracks issues and feature requests by component, by version, and by status. You can view which requests are assigned to which person, which requests are currently being worked on, and which ones haven't been scheduled. You can search all requests by keyword or by various categories and filters. We will go into detail later on how to use the JIRA system to propose new functionality and submit bug fixes. For now understand this: if you are going to be a developer, you will need to understand how to use the JIRA system, as this is where you will propose new functionality, submit bug fixes, give your input on features other developers may be working on, and coordinate actions with other developers on specific pieces of functionality.

The address to sign up for JIRA was given above. Once you have signed up you will have access to all of the Apache JIRA repository, not just the Nutch project.

Source Code Control through Git

Source code control is very important to open source projects. Nutch uses Git for its source control. See UsingGit for how to access the Nutch source repositories. As a developer you will want to get into the habit of downloading and updating your development environment directly from the repository. Users can download the repository but cannot change it directly; they make changes on their local system and submit those changes through the JIRA system. Committers hold the committer role that we discussed previously. These individuals can change the repository directly and are responsible for taking patches from the JIRA system and applying them, at which point the changes become available to all users.

Wiki and Documentation

The weakest part of most open source projects is their documentation, and Nutch is no exception. Wikis are special web pages, like the one that you are reading, that allow users to directly edit text on the page and to create new pages. The wiki provides various tutorials and documentation for Nutch. Links to view the Nutch wiki and to register for the wiki are provided below.

As a developer, one of the ways you can contribute back to the community is by documenting your hard-won experience on the wiki. You can do this in the form of tutorials, articles, or simple notes and instructions. Anything that you have learned may be of use to other developers. The wiki is also used as a virtual whiteboard to help document general themes and directions for the project.

...

When searching the list for errors you have received, it is good to search both by component, for example fetcher, and by the actual error received. If you are not finding the answers you are looking for on the list, you may want to move to JIRA and search there for answers.

...

Start by checking to see what files you have modified with:

git status

Keep this list for later because you will want to make sure that only code that you have changed is included in your patch.

In order to create a patch, just type:

git diff > yourPatchName.patch
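
The two commands above can be tried end to end in a throwaway repository before you use them on a real checkout. The following is a minimal sketch and not part of the Nutch instructions: the file name Example.txt, the committer identity, and the temporary directory are placeholders chosen for illustration, and the final `git apply` checks are an optional extra step for confirming that the patch contains only your change.

```shell
#!/bin/sh
# Sketch only: builds a scratch repository, makes one change, and turns
# that change into a patch. All names here are hypothetical placeholders.
set -eu

workdir=$(mktemp -d)              # scratch area; delete it when you are done
patch="$workdir/yourPatchName.patch"
mkdir "$workdir/repo"
cd "$workdir/repo"

# Create a tiny repository with one committed file.
git init -q .
printf 'hello\n' > Example.txt
git add Example.txt
git -c user.name="Example Dev" -c user.email="dev@example.invalid" \
    commit -q -m "Initial commit"

# Make the change that the patch should carry.
printf 'hello\nworld\n' > Example.txt

# Step 1: list the files you have modified.
git status --short

# Step 2: write the patch outside the working tree so it does not
# show up as an untracked file.
git diff > "$patch"

# Optional: summarize which files the patch touches ...
git apply --stat "$patch"
# ... and confirm it matches the working tree by checking that it
# would reverse-apply cleanly (nothing is modified by --check).
git apply --check --reverse "$patch"
```

Comparing the file list from `git status --short` with the summary from `git apply --stat` is one way to make sure nothing unintended ends up in the patch.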

...

  • reformat code unrelated to the bug being fixed: formatting changes should be separate patches/commits.
  • comment out code that is now obsolete: just remove it.
  • insert comments around each change, marking the change: folks can use git to figure out what's changed and by whom.
  • make things public which are not required by end users.

Please do:

  • try to adhere to the coding style of files you edit.
  • comment code whose function or rationale is not obvious.
  • update documentation (e.g., package.html files, this wiki).

Finally, patches should be attached to the JIRA issue. You can do this by logging in to JIRA, opening the issue, and clicking the "attach file to this issue" link on the left-hand side of the issue page.

...

So you have developed some very useful functionality and contributed it back to the community. You consistently fix bugs. You answer questions for other users and developers on the mailing lists. All in all you are an asset to the community. At this point you may be invited to become a committer. You would then get an Apache email address and direct access to the Git source code repository, and you would be responsible for helping set the technical direction of the Nutch project.

...