Programmatic Operation of LCF

Some LCF users want to treat LCF as an engine that they can drive from whatever other system they are developing. While LCF is not precisely a document indexing engine per se, it can certainly be controlled programmatically. Right now, there are three principal ways of achieving this control.

Control by Servlet API

LCF provides a servlet-based JSON API that lets you define connections and jobs and control job execution. The API can be called using GET, POST, or multipart POST. The format of the servlet URL is as follows:

http[s]://<server_and_port>/lcf-api/json/<command>[?object=<json_argument>]
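
As a minimal sketch of how a caller might assemble such a URL, the Python helper below percent-encodes the JSON argument into the object query parameter. The host, port, and job identifier used in the example calls are placeholders, not values taken from LCF, and whether job identifiers are strings or numbers is an assumption.

```python
import json
from urllib.parse import quote


def build_api_url(server_and_port, command, argument=None, secure=False):
    """Return an API URL following the pattern shown above."""
    scheme = "https" if secure else "http"
    url = f"{scheme}://{server_and_port}/lcf-api/json/{command}"
    if argument is not None:
        # Serialize compactly, then percent-encode every reserved character
        # so the JSON text survives inside a query string.
        encoded = quote(json.dumps(argument, separators=(",", ":")), safe="")
        url += "?object=" + encoded
    return url


# "localhost:8080" and the job id "12345" are hypothetical.
print(build_api_url("localhost:8080", "job/list"))
print(build_api_url("localhost:8080", "job/get", {"job_id": "12345"}))
```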

The servlet returns either an error response code (400 or 500) with an explanatory message, or a 200 response code and a JSON object. The json_argument parameter can be passed either as part of the URL or in POST data, whichever is more convenient. No specification fixes a maximum URL length, but servers, proxies, and clients commonly impose limits of a few thousand characters, so for large payloads you will want to use multipart form data rather than encoding arguments on the URL.
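
The request and response handling described above could be sketched in Python roughly as follows. Several details are assumptions rather than documented behavior: the POST form field is assumed to be named object (mirroring the query parameter), a plain URL-encoded POST is used instead of multipart for brevity, the error body is treated as plain text, and the 2000-character threshold is an arbitrary conservative choice.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


class LcfApiError(Exception):
    """Raised when the servlet answers with an error status (400 or 500)."""

    def __init__(self, status, message):
        super().__init__(f"LCF API error {status}: {message}")
        self.status = status
        self.message = message


def interpret_response(status, body):
    """Return the parsed JSON object for a 200 response, else raise."""
    if status == 200:
        return json.loads(body)
    raise LcfApiError(status, body)


def call_api(base_url, command, argument=None, max_url_length=2000):
    """Send one command, switching from GET to POST when the URL gets long."""
    url = f"{base_url}/lcf-api/json/{command}"
    data = None
    if argument is not None:
        # Assumption: the POST field is named "object", like the query parameter.
        form = urllib.parse.urlencode({"object": json.dumps(argument)})
        if len(url) + 1 + len(form) <= max_url_length:
            url = f"{url}?{form}"
        else:
            data = form.encode("utf-8")  # supplying data makes urllib send a POST
    request = urllib.request.Request(url, data=data)
    try:
        with urllib.request.urlopen(request) as response:
            return interpret_response(response.status, response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx; convert it to LcfApiError.
        return interpret_response(err.code, err.read().decode("utf-8", errors="replace"))
```

A call such as call_api("http://localhost:8080", "jobstatus/start", {"job_id": "12345"}) would then either return the decoded JSON object or raise LcfApiError; the host and job identifier here are hypothetical.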

The actual available commands are as follows:

Command	What it does	Argument format
outputconnection/list	List all output connections	N/A
outputconnection/get	Get a specific output connection	{"connection_name":<connection_name>}
outputconnection/save	Save or create an output connection	{"outputconnection":<output_connection_object>}
outputconnection/delete	Delete an output connection	{"connection_name":<connection_name>}
outputconnection/checkstatus	Check the status of an output connection	{"connection_name":<connection_name>}
authorityconnection/list	List all authority connections	N/A
authorityconnection/get	Get a specific authority connection	{"connection_name":<connection_name>}
authorityconnection/save	Save or create an authority connection	{"authorityconnection":<authority_connection_object>}
authorityconnection/delete	Delete an authority connection	{"connection_name":<connection_name>}
authorityconnection/checkstatus	Check the status of an authority connection	{"connection_name":<connection_name>}
repositoryconnection/list	List all repository connections	N/A
repositoryconnection/get	Get a specific repository connection	{"connection_name":<connection_name>}
repositoryconnection/save	Save or create a repository connection	{"repositoryconnection":<repository_connection_object>}
repositoryconnection/delete	Delete a repository connection	{"connection_name":<connection_name>}
repositoryconnection/checkstatus	Check the status of a repository connection	{"connection_name":<connection_name>}
job/list	List all job definitions	N/A
job/get	Get a specific job definition	{"job_id":<job_identifier>}
job/save	Save or create a job definition	{"job":<job_object>}
job/delete	Delete a job definition	{"job_id":<job_identifier>}
jobstatus/list	List all jobs and their status	N/A
jobstatus/get	Get a specific job's status	{"job_id":<job_identifier>}
jobstatus/start	Start a specified job manually	{"job_id":<job_identifier>}
jobstatus/abort	Abort a specified job	{"job_id":<job_identifier>}
jobstatus/restart	Stop and start a specified job	{"job_id":<job_identifier>}
jobstatus/pause	Pause a specified job	{"job_id":<job_identifier>}
jobstatus/resume	Resume a specified job	{"job_id":<job_identifier>}
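
To illustrate how the argument formats in the table line up with the commands, here is a small Python sketch that builds the argument object for a given command. The lookup table is transcribed from the table above; the helper names are hypothetical, and the contents of connection and job objects are left to the caller.

```python
def _build_argument_keys():
    """Map each command to the JSON key it expects (None means no argument)."""
    table = {}
    for kind in ("outputconnection", "authorityconnection", "repositoryconnection"):
        table[f"{kind}/list"] = None
        for action in ("get", "delete", "checkstatus"):
            table[f"{kind}/{action}"] = "connection_name"
        # The save command wraps the object under a key named after its kind.
        table[f"{kind}/save"] = kind
    table.update({
        "job/list": None,
        "job/get": "job_id",
        "job/save": "job",
        "job/delete": "job_id",
        "jobstatus/list": None,
    })
    for action in ("get", "start", "abort", "restart", "pause", "resume"):
        table[f"jobstatus/{action}"] = "job_id"
    return table


ARGUMENT_KEYS = _build_argument_keys()


def build_argument(command, value=None):
    """Return the argument object for a command, or None if it takes none."""
    if command not in ARGUMENT_KEYS:
        raise ValueError(f"unknown command: {command}")
    key = ARGUMENT_KEYS[command]
    if key is None:
        if value is not None:
            raise ValueError(f"{command} takes no argument")
        return None
    if value is None:
        raise ValueError(f"{command} requires a value for {key!r}")
    return {key: value}


# The job identifier "12345" is a made-up value.
print(build_argument("jobstatus/start", "12345"))
```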

Other commands having to do with reports are planned but have not yet been implemented.

Output connection objects

The JSON format of an output connection object is as follows:

TBD

Authority connection objects

The JSON format of an authority connection object is as follows:

TBD

Job objects

The JSON format of a job is as follows:

TBD

Control via Commands

For script writers, LCF currently provides a number of execution commands. These commands mainly cover defining connections and jobs, controlling jobs, and running reports. The following table lists the current suite.

Command	What it does
org.apache.lcf.agents.DefineOutputConnection	Create a new output connection
org.apache.lcf.agents.DeleteOutputConnection	Delete an existing output connection
org.apache.lcf.authorities.ChangeAuthSpec	Modify an authority's configuration information
org.apache.lcf.authorities.CheckAll	Check all authorities to be sure they are functioning
org.apache.lcf.authorities.DefineAuthorityConnection	Create a new authority connection
org.apache.lcf.authorities.DeleteAuthorityConnection	Delete an existing authority connection
org.apache.lcf.crawler.AbortJob	Abort a running job
org.apache.lcf.crawler.AddScheduledTime	Add a schedule record to a job
org.apache.lcf.crawler.ChangeJobDocSpec	Modify a job's specification information
org.apache.lcf.crawler.DefineJob	Create a new job
org.apache.lcf.crawler.DefineRepositoryConnection	Create a new repository connection
org.apache.lcf.crawler.DeleteJob	Delete an existing job
org.apache.lcf.crawler.DeleteRepositoryConnection	Delete an existing repository connection
org.apache.lcf.crawler.ExportConfiguration	Write the complete list of all connection definitions and job specifications to a file
org.apache.lcf.crawler.FindJob	Locate a job identifier given a job's name
org.apache.lcf.crawler.GetJobSchedule	Find a job's schedule given a job's identifier
org.apache.lcf.crawler.ImportConfiguration	Import configuration as written by a previous ExportConfiguration command
org.apache.lcf.crawler.ListJobStatuses	List the status of all jobs
org.apache.lcf.crawler.ListJobs	List the identifiers for all jobs
org.apache.lcf.crawler.PauseJob	Given a job identifier, pause the specified job
org.apache.lcf.crawler.RestartJob	Given a job identifier, restart the specified job
org.apache.lcf.crawler.RunDocumentStatus	Run a document status report
org.apache.lcf.crawler.RunMaxActivityHistory	Run a maximum activity report
org.apache.lcf.crawler.RunMaxBandwidthHistory	Run a maximum bandwidth report
org.apache.lcf.crawler.RunQueueStatus	Run a queue status report
org.apache.lcf.crawler.RunResultHistory	Run a result history report
org.apache.lcf.crawler.RunSimpleHistory	Run a simple history report
org.apache.lcf.crawler.StartJob	Start a job
org.apache.lcf.crawler.WaitForJobDeleted	After a job has been deleted, wait until the delete has completed
org.apache.lcf.crawler.WaitForJobInactive	After a job has been started or aborted, wait until the job ceases all activity
org.apache.lcf.crawler.WaitJobPaused	After a job has been paused, wait for the pause to take effect
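
A script might wrap these commands along the following lines. This is only a sketch: it assumes each command is launched as a Java main class in the form java -cp <classpath> <class> [arguments]. The classpath must be supplied by the caller, and the arguments each command accepts are not described on this page, so they are passed through untouched.

```python
import subprocess


def build_command(class_name, classpath, args=(), java="java"):
    """Build the argv list for one LCF command class."""
    if not class_name.startswith("org.apache.lcf."):
        raise ValueError(f"not an LCF command class: {class_name}")
    return [java, "-cp", classpath, class_name, *args]


def run_command(class_name, classpath, args=(), java="java"):
    """Run a command and return its standard output, raising on failure."""
    argv = build_command(class_name, classpath, args, java)
    completed = subprocess.run(argv, capture_output=True, text=True)
    if completed.returncode != 0:
        raise RuntimeError(
            f"{class_name} exited with {completed.returncode}: {completed.stderr.strip()}"
        )
    return completed.stdout


# "path/to/lcf/jars/*" is a placeholder classpath, not a documented location.
print(build_command("org.apache.lcf.crawler.ListJobs", "path/to/lcf/jars/*"))
```

Passing the argument list directly to subprocess.run, rather than a single shell string, avoids quoting problems when an argument contains spaces or XML.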

Control by direct code

Control by direct Java code is quite a reasonable thing to do. The sources of the above commands should give a pretty clear idea of how to proceed, if that's what you want to do.

Caveats

The existing commands know nothing about the differences between connection types. Instead, they deal with configuration and specification information in the form of XML documents. Normally, these XML documents are hidden from a system integrator, unless they happen to look into the database with a tool such as psql. However, the commands above will often require such XML documents to be supplied as part of command execution.

This has one major consequence. Any application that would manipulate connections and jobs directly cannot be connection-type independent - these applications must know the proper form of XML to submit to the command. So, it is not possible to use these command APIs to write one's own UI wrapper, without sacrificing some of the generality that LCF by itself maintains.
