1. Defining a Workflow with CAS-PGE
2. Adding Metadata
Workflow Context Metadata
Every workflow instance has the following core metadata keys:
-TaskId
-WorkflowInstId
-JobId
-ProcessingNode
-WorkflowManagerUrl
-QueueName
-TaskLoad
The metadata keys above can be accessed inside a PGE configuration by enclosing the key name in square brackets. For example:
<customMetadata> <metadata key="JobWorkDir" val="[PGE_WORK_DIR]/[JobId]"/> </customMetadata>
In addition to the above keys, you can add metadata to the workflow using the --metaData option while kicking off a workflow event.
The --metaData command line option adds key-value pairs to the Workflow context metadata as seen below:
./wmgr-client --url http://localhost:9001 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1
The key can then be referenced inside a PGE configuration, for example in the augmented metadata (the <customMetadata> element), as shown below:
.. <customMetadata> <metadata key="InputFile" val= "SQL(FORMAT='$Filename') {SELECT Filename FROM GenericFile WHERE RID = '[RunID]' }" /> </customMetadata>
Augmenting Metadata in a PGE
This part is taken from 'CAS-Workflow 2: A User Guide' by Brian Foster.
The element for augmenting metadata is <customMetadata>. Although this element appears at the end of the file, it is not the last to be loaded. In pge-config.xml, <customMetadata> is loaded before every other element except the import element, which is not used in this example. Any number of <metadata> elements are allowed inside <customMetadata>.
- If you want a metadata key to be passed on to the following tasks in a workflow, set the attribute workflowMet='true'. For example: <metadata key='filename' val='data.dat' workflowMet='true'/>
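Putting the pieces above together, a <customMetadata> block might look like the following minimal sketch. RunLabel is a hypothetical key introduced only for illustration; RunID is assumed to have been supplied with --metaData when the event was sent, and PGE_WORK_DIR is assumed to be defined elsewhere in the configuration or environment.

```xml
<customMetadata>
  <!-- Built from a core workflow context key ([JobId]) -->
  <metadata key="JobWorkDir" val="[PGE_WORK_DIR]/[JobId]"/>
  <!-- Hypothetical key: built from RunID (passed via --metaData) and
       passed on to following tasks because workflowMet is 'true' -->
  <metadata key="RunLabel" val="run-[RunID]" workflowMet="true"/>
</customMetadata>
```

Keys without workflowMet='true', such as JobWorkDir here, would stay local to this task.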
Product-Type Metadata
The product-type metadata refers to the metadata for the files that are ingested during the workflow. It is defined in a met file that is specified in the "args" attribute of the 'files' element in PgeConfig.xml:
<files name="FiletoIngest" metFileWriterClass="org.apache.oodt.cas.pge.writers.metlist.MetadataListPcsMetFileWriter" args="PGE_CONFIG_HOME/MetOut_FiletoIngest.xml"/>
MetOut_FiletoIngest.xml should typically look like the following:
<?xml version="1.0" encoding="UTF-8"?>
<metadataList>
  <!-- Any File -->
  <metadata key="ProductName" val="[Filename]"/>
  <metadata key="Filename"/>
  <metadata key="FileLocation"/>
  <metadata key="FileSize"/>
  <metadata key="ProductType"/>
  <!-- Add any element specified in your elements.xml that you want to be written out as metadata for the output file -->
</metadataList>
The met file writer (the class named in metFileWriterClass) creates the metadata (.met) file for each output file that will be ingested by the File Manager.
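For orientation, a generated .met file might resemble the sketch below. This assumes the usual CAS metadata XML layout (a cas:metadata root with keyval entries); the file name, location, size, and product type values are hypothetical and only illustrate how the keys listed in MetOut_FiletoIngest.xml could appear in the output.

```xml
<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <!-- One keyval entry per metadata key written for the output file -->
  <keyval>
    <key>ProductName</key>
    <val>data.dat</val>
  </keyval>
  <keyval>
    <key>Filename</key>
    <val>data.dat</val>
  </keyval>
  <keyval>
    <key>FileLocation</key>
    <val>/hypothetical/output/dir</val>
  </keyval>
  <keyval>
    <key>FileSize</key>
    <val>1024</val>
  </keyval>
  <keyval>
    <key>ProductType</key>
    <val>GenericFile</val>
  </keyval>
</cas:metadata>
```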