
This page is a quick HOWTO explaining how to easily contribute patches to Nutch. It assumes you are using some form of Unix.


Getting the source code

First of all, you need the Nutch source code.

Change into a directory on your local drive where you want to store the Nutch source code, clone the Nutch Git repository, and cd into the Nutch project folder:

No Format

> cd somewhereOnYourDisk
> git clone https://github.com/apache/nutch.git
> cd nutch


...

Alternatively, you can use Apache's gitbox mirror. Please see UsingGit for further details.

Working time

Now it is time to work.

...

Building a patch

First of all, please perform some minimal non-regression tests by:

  • rebuilding all of the Nutch code
  • running the entire unit test suite.

Building Nutch

No Format

> cd somewhereOnYourDisk/nutch
> ant

After a while, if you see

No Format

BUILD SUCCESSFUL

all is ok, but if you see

No Format

BUILD FAILED

please read the error messages carefully and check your code.

Unit Tests

No Format

> cd somewhereOnYourDisk/nutch
> ant test

After a while, if you see

No Format

BUILD SUCCESSFUL

all is ok, but if you see

No Format

BUILD FAILED

please read the error messages carefully and check your code. Detailed error logs are written to `build/test` for core classes, or to `build/my-plugin/test` for a plugin (here, a plugin named "my-plugin").

You can also run individual unit tests, which is useful during development:

  • run unit tests from a single core test class (use class name without package path):

    No Format
    ant test-core -Dtestcase=TestMimeUtil
    


  • run unit tests for a specific plugin:

    No Format
    ant test-plugin -Dplugin=protocol-okhttp
    


  • or exclude test files by patterns:

    No Format
    ant -Dtest.exclude='TestCrawlDb*.java **/TestNutchServer*' test


See also bin/nutch junit and WritingPluginExample-1.2#Unit_testing.

...

If you are a perfectionist, you can also perform some functional tests by running a small crawl with Nutch. Please refer to the NutchTutorial.

Opening a Pull Request on GitHub

See the README at https://github.com/apache/nutch and the pull request template for how to open a pull request on GitHub.

Creating a patch

Although a pull request on GitHub is the preferred way to contribute, we still accept patches (not all contributors are on GitHub). To create a patch, run the following from the root of the Nutch directory:

No Format

svn diff > myBeautifulPatch.patch
vi myBeautifulPatch.patch

or

No Format

git diff --no-prefix > myBeautifulPatch.patch
vi myBeautifulPatch.patch

if you are generating it from a Git repository.

This records all modifications made to the Nutch sources on your local disk and saves them into the myBeautifulPatch.patch file. Then open the patch file and check that it includes ONLY the modifications you want to add to the Nutch Git repository.

Remember to generate a patch against a live branch, i.e. master for Nutch 1.x and 2.x for Nutch 2.x. The other branches are snapshots of past releases, and the code may have evolved since then.
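A minimal sketch of the two checks above (diff against the live branch, then confirm which files the patch touches). It runs in a throwaway directory so it is safe to try: a local "upstream" repository stands in for https://github.com/apache/nutch, and the file src/java/Example.java and the branch name my-fix are hypothetical. It assumes your Git remote is called origin, which is the default after git clone.

```shell
#!/bin/sh
# Sketch: build a --no-prefix patch against the live branch and list the
# files it touches. Everything runs in a throwaway directory; the
# "upstream" repository, file name and branch name are hypothetical.
set -e

work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the Nutch repository, with a master branch.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
mkdir -p src/java
echo "original" > src/java/Example.java
git add src/java/Example.java
git -c user.name=demo -c user.email=demo@example.org commit -q -m "initial"
cd ..

# Your clone, with a local change committed on a feature branch.
git clone -q upstream nutch
cd nutch
git checkout -q -b my-fix
echo "changed" > src/java/Example.java
git -c user.name=demo -c user.email=demo@example.org commit -q -am "my fix"

# Refresh the live branch, then diff the working tree against it.
git fetch -q origin
git diff --no-prefix origin/master > myBeautifulPatch.patch

# One "+++" line per modified file: check that only intended files appear.
grep '^+++ ' myBeautifulPatch.patch
```

On a real Nutch clone only the last three commands matter; for Nutch 2.x you would diff against origin/2.x instead. If the live branch has moved on since you branched, rebasing first (git rebase origin/master) keeps the patch from also reverting upstream changes.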

...

  1. Report your findings in Jira or on nutch-dev. It is always better to have one more review than to introduce a regression through insufficient testing.

Applying patches

A properly generated patch can be applied to the source tree automatically. The patch utility is one tool for applying patches. Change into the Nutch root folder and run:

No Format

> patch -p0 <path_to.patch

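or, using Git (pass -p0 because patches created with git diff --no-prefix have no leading a/ and b/ directories for git apply to strip):

No Format

> git apply -p0 path_to.patch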

Do not ignore the output of patch; it may indicate errors. Applying a patch may (partially) fail if the source code has changed in the meantime. A good starting point to learn more about patches is the Wikipedia article Patch.
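Because applying a patch can fail partially, it may be worth testing it before touching your tree. The sketch below runs in a throwaway directory with a hypothetical file (src/java/Example.java). It uses patch --dry-run and git apply --check, which only report whether the patch would apply and leave the files unchanged. The -p0 option matches patches created with svn diff or git diff --no-prefix.

```shell
#!/bin/sh
# Sketch: check that a patch applies cleanly before applying it for real.
# Runs in a throwaway directory; src/java/Example.java and its contents
# are hypothetical stand-ins for real Nutch sources.
set -e

work=$(mktemp -d)
cd "$work"

# Create a small --no-prefix patch, the way this page describes.
git init -q nutch
cd nutch
mkdir -p src/java
printf 'line one\nline two\n' > src/java/Example.java
git add src/java/Example.java
git -c user.name=demo -c user.email=demo@example.org commit -q -m "initial"
printf 'line one\nline 2\n' > src/java/Example.java
git diff --no-prefix > ../path_to.patch
# Restore the unpatched file so there is something to apply the patch to.
git checkout -q -- src/java/Example.java

# Dry runs: report whether the patch would apply, without changing files.
# (The patch check is skipped if the patch utility is not installed.)
if command -v patch > /dev/null; then
  patch -p0 --dry-run < ../path_to.patch
fi
git apply -p0 --check ../path_to.patch

# Both checks passed (set -e would have stopped the script otherwise),
# so apply the patch for real.
git apply -p0 ../path_to.patch
```

In a real Nutch tree you would run only the dry-run command matching your tool from the Nutch root folder, read its output, and apply the patch only if it reports no failed hunks.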