This page collects the documentation for the REST endpoints that the OODT infrastructure exposes.

TODO: add an endpoint summary table here that links to the sections below.

The OODT PCS Pedigree service

JSON output format

Consider a set of product files prod1 through prod6, where prod1 produced the output files prod6, prod3, and prod2, and prod3 in turn produced the output files prod5 and prod4. The Pedigree service reports this lineage as:

{
   "pedigree":{
      "upstream":"prod1",
      "downstream":{
         "prod1":[
            "prod6",
            {
               "prod3":[
                  "prod5",
                  "prod4"
               ]
            },
            "prod2"
         ]
      }
   }
}
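
The nesting in the downstream entry can be walked recursively. The following is a minimal Python sketch, not part of OODT itself, that flattens the example report above into a list of product names. The function name flatten_downstream is hypothetical, and the sketch assumes the report always uses the same shapes as the example: strings for leaf products, single-key objects for products with their own outputs, and arrays for sibling outputs.

```python
def flatten_downstream(node):
    """Yield every product name found in a downstream entry, depth-first."""
    if isinstance(node, str):
        # A leaf product with no recorded outputs.
        yield node
    elif isinstance(node, dict):
        # A product (the key) together with the outputs it produced (the value).
        for parent, children in node.items():
            yield parent
            yield from flatten_downstream(children)
    elif isinstance(node, list):
        # Sibling outputs of the same parent product.
        for child in node:
            yield from flatten_downstream(child)


# The example report from this page, as a Python dict.
report = {
    "pedigree": {
        "upstream": "prod1",
        "downstream": {
            "prod1": ["prod6", {"prod3": ["prod5", "prod4"]}, "prod2"]
        },
    }
}

products = list(flatten_downstream(report["pedigree"]["downstream"]))
print(products)
```

The products are listed in the order they appear in the report, with each parent immediately followed by its own outputs.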

The following REST endpoints generate full and partial pedigree reports.

Full Report

To see the full Pedigree report for the file myfile.txt, access:

http://host/pcs-services/services/pedigree/report/myfile.txt

Just the Upstream lineage

To see just the upstream lineage for the file myfile.txt, access:

http://host/pcs-services/services/pedigree/report/myfile.txt/upstream

Just the Downstream lineage

To see just the downstream lineage for the file myfile.txt, access:

http://host/pcs-services/services/pedigree/report/myfile.txt/downstream

The OODT PCS Health Monitor service

JSON output format

All calls to the Health Monitor REST service return JSON in the following format (the full report is shown):

{
    "report": {
        "crawlerStatus": [
            {
                "crawlerName": "Crawler1", 
                "crawlerPort": "9020", 
                "status": "UP", 
                "url": "localhost"
            }
        ], 
        "daemonStatus": {
            "stubs": [
                {
                    "daemon": "batch stub", 
                    "status": "UP", 
                    "url": "http://localhost:2001"
                }
            ], 
            "fm": {
                "daemon": "File Manager", 
                "status": "UP", 
                "url": "http://localhost:9000"
            }, 
            "rm": {
                "daemon": "Resource Manager", 
                "status": "UP", 
                "url": "http://localhost:9002"
            }, 
            "wm": {
                "daemon": "Workflow Manager", 
                "status": "UP", 
                "url": "http://localhost:9001"
            }
        }, 
        "generated": "2011-02-15T06:57:07.591-0800", 
        "ingestHealth": [
            {
                "avgCrawlTime": 132.78640211640212, 
                "crawler": "Crawler1", 
                "numCrawls": 189
            }
        ], 
        "jobHealth": [
            {
                "numJobs": 0, 
                "state": "QUEUED"
            }, 
            {
                "numJobs": 0, 
                "state": "RSUBMIT"
            }, 
            {
                "numJobs": 0, 
                "state": "BUILDING CONFIG FILE"
            }, 
            {
                "numJobs": 0, 
                "state": "PGE EXEC"
            }, 
            {
                "numJobs": 0, 
                "state": "CRAWLING"
            }, 
            {
                "numJobs": 0, 
                "state": "STAGING INPUT"
            }, 
            {
                "numJobs": 7, 
                "state": "FINISHED"
            }, 
            {
                "numJobs": 0, 
                "state": "STARTED"
            }, 
            {
                "numJobs": 0, 
                "state": "PAUSED"
            }
        ], 
        "latestFiles": {
            "files": [
                {
                    "filepath": "/Users/mattmann/files/foo.bar/foo.bar", 
                    "receivedTime": "2011-01-22T15:19:21.126-08:00"
                }, 
                {
                    "filepath": "/Users/mattmann/files/foo.bar/foo.bar", 
                    "receivedTime": "2011-01-22T15:08:10.198-08:00"
                }, 
                {
                    "filepath": "/Users/mattmann/files/foo.bar/foo.bar", 
                    "receivedTime": "2011-01-22T15:06:03.659-08:00"
                }, 
                {
                    "filepath": "/Users/mattmann/files/blah.txt/blah.txt", 
                    "receivedTime": "2011-01-21T21:56:03.922-08:00"
                }
            ], 
            "topN": 20
        }
    }
}
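
As an illustration of how a client might consume this report, the following Python sketch lists the daemons and crawlers whose status is anything other than "UP". It is an assumption-laden example rather than OODT code: the function name components_not_up is hypothetical, the report is assumed to be already parsed into a dict, and any status value other than "UP" is treated as unhealthy because this page only shows "UP".

```python
def components_not_up(report):
    """Return the names of daemons and crawlers whose status is not "UP"."""
    body = report.get("report", {})

    # "stubs" holds a list of daemons; fm, rm, and wm each hold a single object.
    daemons = []
    for value in body.get("daemonStatus", {}).values():
        daemons.extend(value if isinstance(value, list) else [value])

    not_up = [d["daemon"] for d in daemons if d.get("status") != "UP"]
    not_up += [
        c["crawlerName"]
        for c in body.get("crawlerStatus", [])
        if c.get("status") != "UP"
    ]
    return not_up
```

For the report shown above, where every component is "UP", the function returns an empty list.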


The REST service descriptions below show how to slice out different parts of the report. Every report contains at least the generated attribute, plus some combination of daemonStatus and/or crawlerStatus, latestFiles (if the File Manager is running), jobHealth (if the Workflow Manager is running), and ingestHealth (if the crawlers are running).
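
Because some sections appear only when the corresponding component is running, a client reading a full report may want to check which ones are absent. The sketch below shows one way to do that in Python. The name missing_sections and the mapping are hypothetical, built only from the section names listed in the paragraph above, and the check is only meaningful for a full report, since a sliced report omits sections by design.

```python
# Optional report sections and the component each one depends on,
# as described in the paragraph above.
OPTIONAL_SECTIONS = {
    "latestFiles": "File Manager",
    "jobHealth": "Workflow Manager",
    "ingestHealth": "crawlers",
}


def missing_sections(report):
    """Map each absent optional section to the component it depends on."""
    body = report.get("report", {})
    return {
        section: component
        for section, component in OPTIONAL_SECTIONS.items()
        if section not in body
    }
```

An empty result means all three optional sections were present in the report.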

Full Report

To see the full Health Monitor report, access:

http://host/pcs-services/services/health/report

Just the Daemons

To see the Health Monitor report, focused on just the PCS daemons (including batch stubs), access:

http://host/pcs-services/services/health/report/daemon

Daemons can be further "sliced out" by appending an additional path segment to the URL, e.g.:

To see just the file manager status:

http://host/pcs-services/services/health/report/daemon/fm

To see just the workflow manager status:

http://host/pcs-services/services/health/report/daemon/wm

To see just the resource manager status:

http://host/pcs-services/services/health/report/daemon/rm

To see just the batch stub status:

http://host/pcs-services/services/health/report/daemon/stubs

Just the Crawlers

To see the Health Monitor report, focused on just the PCS ingest crawlers, access:

http://host/pcs-services/services/health/report/crawlers

Crawlers can also be sliced out, similar to daemons, e.g.:

Slice a crawler out by name

To see the status of the crawler named Crawler1, access:

http://host/pcs-services/services/health/report/crawlers/Crawler1

Just Job Processing Status

To see the Health Monitor report, focused on just the PCS workflow (job) processing status, broken down by state, access:

http://host/pcs-services/services/health/report/jobs

Job status may also be sliced out, similar to daemons and crawlers, by appending the job state to the URL, e.g.:

Slice a job processing status by state

The following will display the number of jobs that are in the QUEUED state:

http://host/pcs-services/services/health/report/jobs/QUEUED

Just Ingest Processing Health

To see the Health Monitor report, focused on just the PCS ingest crawler health (with information like number of crawls and average crawl time), broken down by crawler, access:

http://host/pcs-services/services/health/report/ingest

Ingest crawler health status can also be sliced out by crawler name, similar to the daemon, crawler, and job reports, e.g.:

Slice ingest crawler health status by name

To see the ingest health status of the crawler named Crawler1, access:

http://host/pcs-services/services/health/report/ingest/Crawler1
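
The Health Monitor URLs above all follow the same pattern: the report path, an optional section, and an optional name within that section. The following Python sketch builds such URLs. The function health_report_url is hypothetical and not part of OODT. Percent-encoding the name is an assumption made so that state names containing spaces (such as "PGE EXEC" in the full report) produce a valid URL; this page does not say how the service expects such names to be passed.

```python
from urllib.parse import quote

# Section names taken from the endpoint paths documented on this page.
SECTIONS = ("daemon", "crawlers", "jobs", "ingest")


def health_report_url(base_url, section=None, name=None):
    """Build a Health Monitor report URL.

    base_url is the service root, e.g. "http://host/pcs-services/services".
    section is one of SECTIONS, or None for the full report.
    name optionally narrows the section, e.g. "fm", "Crawler1", or "QUEUED".
    """
    url = base_url.rstrip("/") + "/health/report"
    if section is None:
        if name is not None:
            raise ValueError("name requires a section")
        return url
    if section not in SECTIONS:
        raise ValueError(f"unknown section: {section!r}")
    url += "/" + section
    if name is not None:
        # Assumption: percent-encode the name so it is safe inside a URL path.
        url += "/" + quote(name, safe="")
    return url
```

For example, health_report_url("http://host/pcs-services/services", "ingest", "Crawler1") reproduces the URL shown above.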

Slice a job processing status by state


Endpoint:

http://host/pcs-services/services/health/report/jobs/{state}

HTTP Request

Method: GET
Parameters:

| Name | Placement | Type | Description |
|------|-----------|------|-------------|
| state | path | Enum | The basic values are "STARTED", "FINISHED", "PAUSED", "RSUBMIT", and "QUEUED". Extended states may also exist; they are passed in the same manner. |

HTTP Response

Valid

Status: 200 OK

JSON

{
   "report":{
      "generated":"2011-02-15T08:09:24.225-0800",
      "jobHealth":[
         {
            "state":"QUEUED",
            "numJobs":0
         }
      ]
   }
}

Example(s):

To display the number of jobs that are in the QUEUED state:

http://host/pcs-services/services/health/report/jobs/QUEUED
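
A client can read the job count out of the JSON response with a few lines of Python. This is a minimal sketch that assumes the response body has the shape shown in the HTTP Response section; the function name job_count is hypothetical, and the response text is passed in directly so the example does not depend on a running service.

```python
import json


def job_count(response_text, state):
    """Return numJobs for the given state, or None if the state is absent."""
    report = json.loads(response_text)["report"]
    for entry in report.get("jobHealth", []):
        if entry.get("state") == state:
            return entry.get("numJobs")
    return None


# The example response from this page.
response_text = """
{
   "report":{
      "generated":"2011-02-15T08:09:24.225-0800",
      "jobHealth":[
         {
            "state":"QUEUED",
            "numJobs":0
         }
      ]
   }
}
"""

print(job_count(response_text, "QUEUED"))
```

Returning None for a missing state keeps "no entry in the report" distinct from a genuine count of 0.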