This page documents the Nutch 2.x REST API and provides a UML diagram of it.
It explains the logic behind the API and details the REST calls that can be made to it. It can be read in conjunction with the documentation for the bin/nutch nutchserver command.
The responsible class is AdminResource. This endpoint is used to get the server status and to manage the server's state.
{
GET /admin
}
The response contains the server startup date, the available configuration names, the job history and the currently running jobs.
{
{ "startDate":1403716000012, "configuration":[ "default" ], "jobs":[ ], "runningJobs":[ ] }
}
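The following is a minimal Python sketch of reading the status endpoint with the standard library. The base URL http://localhost:8081 is an assumption, so point it at wherever your nutchserver is listening. Treating startDate as milliseconds since the Unix epoch is inferred from the sample value above; this page does not state it.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumption: nutchserver is reachable here; change host/port to match your setup.
BASE_URL = "http://localhost:8081"


def get_admin_status(base_url=BASE_URL):
    """Call GET /admin and decode the JSON response body."""
    with urllib.request.urlopen(f"{base_url}/admin") as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_status(status):
    """Reduce a decoded /admin response to a few commonly needed fields."""
    # Assumption: startDate is milliseconds since the Unix epoch.
    started = datetime.fromtimestamp(status["startDate"] / 1000, tz=timezone.utc)
    return {
        "started": started,
        "configurations": list(status["configuration"]),
        "history_size": len(status["jobs"]),
        "running": len(status["runningJobs"]),
    }
```

Against a live server you would call `summarize_status(get_admin_status())`. Keeping the parsing separate from the HTTP call lets you check it against a saved response.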
A running server can be stopped with /admin/stop. The optional force parameter stops the server even if tasks are still running.
{
GET /admin/stop
GET /admin/stop?force=true
}
Response
{
Stopping in 5 seconds.
}
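The following sketch builds the stop call and adds force=true only when asked. It assumes the server runs at http://localhost:8081, which is a placeholder. The reply is returned as plain text because the example response above is not JSON.

```python
import urllib.request
from urllib.parse import urlencode

# Assumption: nutchserver is reachable here; change host/port to match your setup.
BASE_URL = "http://localhost:8081"


def build_stop_request(force=False, base_url=BASE_URL):
    """Build GET /admin/stop, appending force=true only when requested."""
    url = f"{base_url}/admin/stop"
    if force:
        url += "?" + urlencode({"force": "true"})
    return urllib.request.Request(url, method="GET")


def stop_server(force=False, base_url=BASE_URL):
    """Send the stop request and return the plain-text reply."""
    with urllib.request.urlopen(build_stop_request(force, base_url)) as resp:
        return resp.read().decode("utf-8")
```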
The responsible class is JobResource. This endpoint is used for job management.
{
GET /job
}
The response contains a list of all jobs, both running and historical.
{
[
  { "id":"job-id-5977", "type":"FETCH", "confId":"default", "args":null, "result":null, "state":"FINISHED", "msg":"", "crawlId":"crawl-01" },
  { "id":"job-id-5978", "type":"PARSE", "confId":"default", "args":null, "result":null, "state":"RUNNING", "msg":"", "crawlId":"crawl-01" }
]
}
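Because running and finished jobs come back in one list, a client usually has to split them by the state field. This sketch does that. The base URL is an assumed placeholder, and only the id and state fields shown in the sample response are relied on.

```python
import json
import urllib.request

# Assumption: nutchserver is reachable here; change host/port to match your setup.
BASE_URL = "http://localhost:8081"


def list_jobs(base_url=BASE_URL):
    """Call GET /job and return the decoded list of job descriptions."""
    with urllib.request.urlopen(f"{base_url}/job") as resp:
        return json.loads(resp.read().decode("utf-8"))


def job_ids_by_state(jobs):
    """Group job ids by their state field (e.g. RUNNING, FINISHED)."""
    grouped = {}
    for job in jobs:
        grouped.setdefault(job["state"], []).append(job["id"])
    return grouped
```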
{
GET /job/job-id-5977
}
Response
{
{ "id":"job-id-5977", "type":"FETCH", "confId":"default", "args":null, "result":null, "state":"FINISHED", "msg":"", "crawlId":"crawl-01" }
}
{
GET /job/job-id-5977/stop
}
Response
{
true
}
{
GET /job/job-id-5977/abort
}
Response
{
true
}
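The job info, stop and abort calls share the /job/{id} prefix, so one builder can cover all three. This is a sketch under the assumption of a server at http://localhost:8081. Only the two action names shown above are accepted. The responses are decoded as JSON, which turns the `true` reply into a Python boolean.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: nutchserver is reachable here; change host/port to match your setup.
BASE_URL = "http://localhost:8081"
ACTIONS = ("stop", "abort")  # the two job actions documented on this page


def build_job_request(job_id, action=None, base_url=BASE_URL):
    """Build GET /job/{id}, or GET /job/{id}/{action} for stop/abort."""
    url = f"{base_url}/job/{quote(job_id, safe='')}"
    if action is not None:
        if action not in ACTIONS:
            raise ValueError(f"unsupported action {action!r}; expected one of {ACTIONS}")
        url += f"/{action}"
    return urllib.request.Request(url, method="GET")


def call_job(job_id, action=None, base_url=BASE_URL):
    """Send the request; job info decodes to a dict, stop/abort to a bool."""
    with urllib.request.urlopen(build_job_request(job_id, action, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```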
Creates a job with the given parameters. Specify either the job type (the type field, a JobType value such as FETCH) or jobClassName.
{
POST /job/create
{ "crawlId":"crawl-01", "type":"FETCH", "confId":"default", "args":{"someParam":"someValue"} }

POST /job/create
{ "crawlId":"crawl-01", "jobClassName":"org.apache.nutch.fetcher.FetcherJob", "confId":"default", "args":{"someParam":"someValue"} }
}
The response is the ID of the created job.
{
job-id-43243
}
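The following sketch builds the create request and enforces the "type or jobClassName" rule on the client side. It rejects a call that sets both or neither; that is a design choice of this sketch, not documented server behaviour. The base URL and the `Content-Type: application/json` header are assumptions. The job ID is read back as plain text, as in the example response.

```python
import json
import urllib.request

# Assumption: nutchserver is reachable here; change host/port to match your setup.
BASE_URL = "http://localhost:8081"


def build_create_job_request(crawl_id, job_type=None, job_class_name=None,
                             conf_id="default", args=None, base_url=BASE_URL):
    """Build POST /job/create with exactly one of job_type / job_class_name."""
    if (job_type is None) == (job_class_name is None):
        raise ValueError("specify exactly one of job_type or job_class_name")
    body = {"crawlId": crawl_id, "confId": conf_id, "args": args or {}}
    if job_type is not None:
        body["type"] = job_type
    else:
        body["jobClassName"] = job_class_name
    return urllib.request.Request(
        f"{base_url}/job/create",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def create_job(**kwargs):
    """Send the request and return the new job ID from the plain-text reply."""
    with urllib.request.urlopen(build_create_job_request(**kwargs)) as resp:
        return resp.read().decode("utf-8").strip()
```

For example, `create_job(crawl_id="crawl-01", job_type="FETCH")` mirrors the first request above.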
{
GET /config
}
The response contains the names of the available configurations.
{
["default","custom-config"]
}
{
GET /config/{configuration name}
Examples:
GET /config/default
GET /config/custom-config
}
The response contains the configuration's parameters and their values.
{
{
  "anchorIndexingFilter.deduplicate":"false",
  "crawl.gen.delay":"604800000",
  "db.fetch.interval.default":"2592000",
  "db.fetch.interval.max":"7776000",
  "db.fetch.retry.max":"3",
  ...
}
}
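The two calls above compose naturally: list the names, then fetch each configuration. In this sketch the HTTP fetcher is passed in as a parameter, so the logic can be exercised with canned responses. The base URL is an assumed placeholder. Values come back as strings, as in the example, so numeric conversion is left to the caller.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: nutchserver is reachable here; change host/port to match your setup.
BASE_URL = "http://localhost:8081"


def http_get_json(path, base_url=BASE_URL):
    """GET base_url + path and decode the JSON body."""
    with urllib.request.urlopen(base_url + path) as resp:
        return json.loads(resp.read().decode("utf-8"))


def load_all_configs(fetch_json=http_get_json):
    """List names via GET /config, then fetch each via GET /config/{name}."""
    names = fetch_json("/config")
    return {name: fetch_json(f"/config/{quote(name, safe='')}") for name in names}
```

Typical use is `int(load_all_configs()["default"]["db.fetch.retry.max"])`.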
{
GET /config/{configuration name}/{property}
Examples:
GET /config/default/db.fetch.retry.max
GET /config/custom-config/crawl.gen.delay
}
The response contains the parameter's value as a string.
{
604800000
}
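Fetching a single property only needs the two path segments to be URL-encoded. This sketch assumes the server runs at http://localhost:8081. It returns the raw string reply, matching the "value as a string" behaviour described above.

```python
import urllib.request
from urllib.parse import quote

# Assumption: nutchserver is reachable here; change host/port to match your setup.
BASE_URL = "http://localhost:8081"


def property_url(config_name, prop, base_url=BASE_URL):
    """Build the URL for GET /config/{configuration name}/{property}."""
    return f"{base_url}/config/{quote(config_name, safe='')}/{quote(prop, safe='')}"


def get_property(config_name, prop, base_url=BASE_URL):
    """Return the property's value as the plain string the server sends."""
    with urllib.request.urlopen(property_url(config_name, prop, base_url)) as resp:
        return resp.read().decode("utf-8").strip()
```

For example, `int(get_property("custom-config", "crawl.gen.delay"))` would turn the reply shown above into the integer 604800000.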
Creates a new Nutch configuration with the given parameters. If the force field is true, an already existing configuration will be overwritten; otherwise it will not.
{
POST /config/{configuration name}
Example:
POST /config/new-config
{ "configId":"new-config", "force":"true", "params":{ "anchorIndexingFilter.deduplicate":"false", ... } }
}
The response is the ID of the created configuration.
{
new-config
}
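The following sketch assembles the create-configuration request in the body shape of the example above. The example sends force as the string "true", so this sketch also sends a string. Sending "false" when force is off, the base URL and the JSON content type are all assumptions.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: nutchserver is reachable here; change host/port to match your setup.
BASE_URL = "http://localhost:8081"


def build_create_config_request(config_id, params, force=False, base_url=BASE_URL):
    """Build POST /config/{configuration name} with configId, force and params."""
    body = {
        "configId": config_id,
        # Mirrors the example, which sends force as a string; "false" is assumed.
        "force": "true" if force else "false",
        "params": dict(params),
    }
    return urllib.request.Request(
        f"{base_url}/config/{quote(config_id, safe='')}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )
```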
{
DELETE /config/{configuration name}
Example:
DELETE /config/new-config
}
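Python's `urllib.request.Request` accepts an explicit method, so a DELETE needs no extra library. This sketch assumes a server at http://localhost:8081. It returns only the HTTP status code, because this page does not document a response body for this call.

```python
import urllib.request
from urllib.parse import quote

# Assumption: nutchserver is reachable here; change host/port to match your setup.
BASE_URL = "http://localhost:8081"


def build_delete_config_request(config_name, base_url=BASE_URL):
    """Build DELETE /config/{configuration name}."""
    return urllib.request.Request(
        f"{base_url}/config/{quote(config_name, safe='')}", method="DELETE"
    )


def delete_config(config_name, base_url=BASE_URL):
    """Send the DELETE and return the HTTP status code."""
    with urllib.request.urlopen(build_delete_config_request(config_name, base_url)) as resp:
        return resp.status
```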
{
PUT /config/{property name}/ value={value}
Example:
PUT /config/anchorIndexingFilter.deduplicate value=true
}
The responsible class is DbResource. This endpoint is used to retrieve data from the database.
Examples:
{
POST /db { }
POST /db { "fields": ["headers"] }
POST /db { "batchId": "batch-id" }
POST /db { "startKey":"http://google.com", "endKey":"http://yahoo.com", "isKeysReversed":"false" }
POST /db { "startKey":"com.google", "endKey":"com.yahoo", "isKeysReversed":"true" }
}
The response contains data from the database, limited to the requested fields.
{
{
  "values":[
    {
      "headers":{ },
      "status":0,
      "markers":{ },
      "modifiedTime":0,
      "score":0.0,
      "prevModifiedTime":0,
      "url":"http://google.com",
      "__g__dirty":"\\x00\\x00\\x00\\x00",
      "fetchInterval":0,
      "prevFetchTime":0,
      "inlinks":{ },
      "retriesSinceFetch":0,
      "outlinks":{ },
      "fetchTime":0,
      "metadata":{ }
    }
  ]
}
}
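The example queries above are all subsets of the same set of filter keys. This sketch therefore builds the body from optional arguments and leaves out anything not given, so a call with no arguments produces the empty `{ }` query. Sending isKeysReversed as a string follows the examples. The base URL and JSON content type are assumptions.

```python
import json
import urllib.request

# Assumption: nutchserver is reachable here; change host/port to match your setup.
BASE_URL = "http://localhost:8081"


def build_db_query(fields=None, batch_id=None, start_key=None, end_key=None,
                   keys_reversed=None):
    """Assemble a POST /db body containing only the filters that were given."""
    body = {}
    if fields is not None:
        body["fields"] = list(fields)
    if batch_id is not None:
        body["batchId"] = batch_id
    if start_key is not None:
        body["startKey"] = start_key
    if end_key is not None:
        body["endKey"] = end_key
    if keys_reversed is not None:
        # The examples send this flag as the string "true" or "false".
        body["isKeysReversed"] = "true" if keys_reversed else "false"
    return body


def query_db(body, base_url=BASE_URL):
    """POST the query to /db and return the decoded "values" list."""
    req = urllib.request.Request(
        f"{base_url}/db",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["values"]
```

For example, `[row["url"] for row in query_db(build_db_query(fields=["headers"]))]` would collect the URL of each returned row.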
The Unified Modeling Language (UML) is a general-purpose modeling language in the field of software engineering, which is designed to provide a standard way to visualize the design of a system.
The graphic below displays the REST API architecture and describes the classes, as well as their role and context within the API's operation.
Some comments about the roles of the classes in the Nutch API.