Programmatic Operation of LCF

Some LCF users want to treat LCF as an engine that they can drive from whatever other system they are developing. While LCF is not precisely a document indexing engine per se, it can certainly be controlled programmatically. Right now, there are three principal ways of achieving this control.

Control by Servlet API

LCF provides a servlet-based JSON API that gives you the complete ability to define connections and jobs, and to control job execution. You can read about JSON here. The API can be called using GET, POST, or multipart POST methods. The format of the servlet URL is as follows:

http[s]://<server_and_port>/lcf-api/json/<command>[?object=<json_argument>]
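
As a minimal sketch of the URL format above, the following Python helper assembles a servlet URL and percent-encodes the JSON argument. The function name `build_api_url`, the host `localhost:8080`, and the job identifier `"12345"` are illustrative assumptions, not part of LCF.

```python
import json
from urllib.parse import quote


def build_api_url(server_and_port, command, argument=None, secure=False):
    """Assemble http[s]://<server_and_port>/lcf-api/json/<command>[?object=<json_argument>]."""
    scheme = "https" if secure else "http"
    url = f"{scheme}://{server_and_port}/lcf-api/json/{command}"
    if argument is not None:
        # Serialize compactly, then percent-encode so the JSON survives in a query string.
        encoded = quote(json.dumps(argument, separators=(",", ":")), safe="")
        url += f"?object={encoded}"
    return url


# Hypothetical host and job identifier, for illustration only.
print(build_api_url("localhost:8080", "job/list"))
print(build_api_url("localhost:8080", "job/get", {"job_id": "12345"}))
```

Percent-encoding the whole JSON text keeps braces, quotes, and colons from being misread as URL syntax.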

The servlet returns either an error response code (400 or 500) with an appropriate explanatory message, or a 200 response code and a JSON object. The json_argument parameter can be passed either as part of the URL or in POST data, whichever is most convenient. Bear in mind that, although the HTTP specification sets no hard limit on URL length, many servers and clients cap URLs at a few thousand characters, so for large payloads you will want to use multipart form data rather than encoding arguments on the URL.
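
The response handling described above might be wrapped as follows. This is a sketch only: `LcfApiError` and `interpret_response` are hypothetical names, and the JSON body in the usage lines is a placeholder, since the actual response formats are not documented on this page.

```python
import json


class LcfApiError(Exception):
    """Hypothetical error type for a non-200 answer from the servlet."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def interpret_response(status, body):
    """Return the parsed JSON object for a 200 response; raise otherwise."""
    if status == 200:
        return json.loads(body)
    # The page documents 400 and 500 as error codes with an explanatory message.
    raise LcfApiError(status, body.strip())


# Placeholder bodies, for illustration only.
result = interpret_response(200, '{"example": true}')
try:
    interpret_response(400, "explanatory message\n")
except LcfApiError as err:
    failure = (err.status, err.message)
```

Separating interpretation from transport means the same check can be used whether the request went out as GET, POST, or multipart POST.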

The actual available commands are as follows:

| Command | What it does | Argument format | Response format |
|---|---|---|---|
| outputconnection/list | List all output connections | N/A | |
| outputconnection/get | Get a specific output connection | {"connection_name":<connection_name>} | |
| outputconnection/save | Save or create an output connection | {"outputconnection":<output_connection_object>} | |
| outputconnection/delete | Delete an output connection | {"connection_name":<connection_name>} | |
| outputconnection/checkstatus | Check the status of an output connection | {"connection_name":<connection_name>} | |
| authorityconnection/list | List all authority connections | N/A | |
| authorityconnection/get | Get a specific authority connection | {"connection_name":<connection_name>} | |
| authorityconnection/save | Save or create an authority connection | {"authorityconnection":<authority_connection_object>} | |
| authorityconnection/delete | Delete an authority connection | {"connection_name":<connection_name>} | |
| authorityconnection/checkstatus | Check the status of an authority connection | {"connection_name":<connection_name>} | |
| repositoryconnection/list | List all repository connections | N/A | |
| repositoryconnection/get | Get a specific repository connection | {"connection_name":<connection_name>} | |
| repositoryconnection/save | Save or create a repository connection | {"repositoryconnection":<repository_connection_object>} | |
| repositoryconnection/delete | Delete a repository connection | {"connection_name":<connection_name>} | |
| repositoryconnection/checkstatus | Check the status of a repository connection | {"connection_name":<connection_name>} | |
| job/list | List all job definitions | N/A | |
| job/get | Get a specific job definition | {"job_id":<job_identifier>} | |
| job/save | Save or create a job definition | {"job":<job_object>} | |
| job/delete | Delete a job definition | {"job_id":<job_identifier>} | |
| jobstatus/list | List all jobs and their status | N/A | |
| jobstatus/get | Get a specific job's status | {"job_id":<job_identifier>} | |
| jobstatus/start | Start a specified job manually | {"job_id":<job_identifier>} | |
| jobstatus/abort | Abort a specified job | {"job_id":<job_identifier>} | |
| jobstatus/restart | Stop and start a specified job | {"job_id":<job_identifier>} | |
| jobstatus/pause | Pause a specified job | {"job_id":<job_identifier>} | |
| jobstatus/resume | Resume a specified job | {"job_id":<job_identifier>} | |

Other commands related to reports are planned but have not yet been implemented.
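
The argument formats listed for the commands above follow a regular pattern, which can be captured in a small helper. This is a sketch under assumptions: `build_argument` is a hypothetical name, and the connection name and job identifier in the usage lines are made up for illustration.

```python
CONNECTION_FAMILIES = ("outputconnection", "authorityconnection", "repositoryconnection")
CONNECTION_ACTIONS = ("list", "get", "save", "delete", "checkstatus")
JOB_ACTIONS = ("list", "get", "save", "delete")
JOBSTATUS_ACTIONS = ("list", "get", "start", "abort", "restart", "pause", "resume")


def build_argument(command, value=None):
    """Return the JSON argument (as a dict) for a listed command, or None for N/A."""
    family, _, action = command.partition("/")
    if family in CONNECTION_FAMILIES and action in CONNECTION_ACTIONS:
        key = "connection_name"
    elif family == "job" and action in JOB_ACTIONS:
        key = "job_id"
    elif family == "jobstatus" and action in JOBSTATUS_ACTIONS:
        key = "job_id"
    else:
        raise ValueError(f"unknown command: {command}")
    if action == "list":
        return None  # list commands take no argument
    if action == "save":
        return {family: value}  # e.g. {"job": <job_object>}
    return {key: value}


# Hypothetical connection name and job identifier.
print(build_argument("outputconnection/get", "my_output"))
print(build_argument("jobstatus/start", "12345"))
```

Rejecting unlisted commands early avoids sending a request the servlet would answer with an error.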

Output connection objects

The JSON format of an output connection object is as follows:

TBD

Authority connection objects

The JSON format of an authority connection object is as follows:

TBD

Job objects

The JSON format of a job is as follows:

TBD

Control via Commands

For script writers, LCF currently provides a number of execution commands. These commands mainly cover defining connections and jobs, controlling jobs, and running reports. The following table lists the current suite.

| Command | What it does |
|---|---|
| org.apache.lcf.agents.DefineOutputConnection | Create a new output connection |
| org.apache.lcf.agents.DeleteOutputConnection | Delete an existing output connection |
| org.apache.lcf.authorities.ChangeAuthSpec | Modify an authority's configuration information |
| org.apache.lcf.authorities.CheckAll | Check all authorities to be sure they are functioning |
| org.apache.lcf.authorities.DefineAuthorityConnection | Create a new authority connection |
| org.apache.lcf.authorities.DeleteAuthorityConnection | Delete an existing authority connection |
| org.apache.lcf.crawler.AbortJob | Abort a running job |
| org.apache.lcf.crawler.AddScheduledTime | Add a schedule record to a job |
| org.apache.lcf.crawler.ChangeJobDocSpec | Modify a job's specification information |
| org.apache.lcf.crawler.DefineJob | Create a new job |
| org.apache.lcf.crawler.DefineRepositoryConnection | Create a new repository connection |
| org.apache.lcf.crawler.DeleteJob | Delete an existing job |
| org.apache.lcf.crawler.DeleteRepositoryConnection | Delete an existing repository connection |
| org.apache.lcf.crawler.ExportConfiguration | Write the complete list of all connection definitions and job specifications to a file |
| org.apache.lcf.crawler.FindJob | Locate a job identifier given a job's name |
| org.apache.lcf.crawler.GetJobSchedule | Find a job's schedule given a job's identifier |
| org.apache.lcf.crawler.ImportConfiguration | Import configuration as written by a previous ExportConfiguration command |
| org.apache.lcf.crawler.ListJobStatuses | List the status of all jobs |
| org.apache.lcf.crawler.ListJobs | List the identifiers for all jobs |
| org.apache.lcf.crawler.PauseJob | Given a job identifier, pause the specified job |
| org.apache.lcf.crawler.RestartJob | Given a job identifier, restart the specified job |
| org.apache.lcf.crawler.RunDocumentStatus | Run a document status report |
| org.apache.lcf.crawler.RunMaxActivityHistory | Run a maximum activity report |
| org.apache.lcf.crawler.RunMaxBandwidthHistory | Run a maximum bandwidth report |
| org.apache.lcf.crawler.RunQueueStatus | Run a queue status report |
| org.apache.lcf.crawler.RunResultHistory | Run a result history report |
| org.apache.lcf.crawler.RunSimpleHistory | Run a simple history report |
| org.apache.lcf.crawler.StartJob | Start a job |
| org.apache.lcf.crawler.WaitForJobDeleted | After a job has been deleted, wait until the delete has completed |
| org.apache.lcf.crawler.WaitForJobInactive | After a job has been started or aborted, wait until the job ceases all activity |
| org.apache.lcf.crawler.WaitJobPaused | After a job has been paused, wait for the pause to take effect |
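
A script might assemble invocations of these commands along the following lines. This is a sketch resting on several assumptions that this page does not confirm: that each command is a Java class run through the standard `java -cp <classpath> <class>` launcher, that a job identifier is passed as a positional argument, and that the classpath placeholder is replaced with the real LCF classpath. The helper name `lcf_command` and the job identifier are hypothetical. The sketch only builds the argument list; it does not execute anything.

```python
import shlex


def lcf_command(class_name, *args, classpath="lcf-classpath-placeholder", java="java"):
    """Build an argv list for one LCF command class (assumed invocation style)."""
    return [java, "-cp", classpath, class_name, *args]


# Hypothetical job identifier; the positional argument is an assumption.
argv = lcf_command("org.apache.lcf.crawler.PauseJob", "12345")
print(shlex.join(argv))
```

Building the list first, rather than a single shell string, avoids quoting problems if the list is later handed to a process launcher.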

Control by direct code

Controlling LCF directly from Java code is also quite reasonable. The sources of the above commands should give a clear idea of how to proceed, if that's what you want to do.

Caveats

The existing commands know nothing about the differences between connection types. Instead, they deal with configuration and specification information in the form of XML documents. Normally, these XML documents are hidden from a system integrator unless they happen to look into the database with a tool such as psql. The API commands above, however, will often require such XML documents to be included as part of the command execution.

This has one major consequence. Any application that manipulates connections and jobs directly cannot be connection-type independent; it must know the proper form of XML to submit to the command. It is therefore not possible to use these command APIs to write one's own UI wrapper without sacrificing some of the generality that LCF itself maintains.
