1. Defining a Workflow with CAS-PGE
2. Adding Metadata
Workflow Context Metadata
Every workflow instance has the following core metadata keys:
-TaskId
-WorkflowInstId
-JobId
-ProcessingNode
-WorkflowManagerUrl
-QueueName
-TaskLoad
These metadata keys can be referenced inside a PGE configuration by enclosing the key name in square brackets. For example:
<customMetadata> <metadata key="JobWorkDir" val="[PGE_WORK_DIR]/[JobId]"/> </customMetadata>
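As a further sketch, several core keys can be combined in a single value. The key name TaskLogFile and the log-file layout below are made up for illustration; only the bracketed core keys and PGE_WORK_DIR come from the text above.

```xml
<customMetadata>
  <!-- From the example above: one working directory per job -->
  <metadata key="JobWorkDir" val="[PGE_WORK_DIR]/[JobId]"/>
  <!-- Hypothetical key: a log file named after the workflow instance and task -->
  <metadata key="TaskLogFile" val="[PGE_WORK_DIR]/[JobId]/[WorkflowInstId]-[TaskId].log"/>
</customMetadata>
```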
In addition to the above keys, you can add metadata to the workflow using the --metaData option while kicking off a workflow event.
The --metaData command line option adds key-value pairs to the Workflow context metadata as seen below:
./wmgr-client --url http://localhost:9001 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1
The key can then be used inside a PGE config, for instance in the augmented metadata (the <customMetadata> element):
..
<customMetadata>
  <metadata key="InputFile"
            val="SQL(FORMAT='$Filename') {SELECT Filename FROM GenericFile WHERE RID = '[RunID]' }"/>
</customMetadata>
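The same key can also be used in a plain value without a SQL lookup. The sketch below assumes the workflow was started with RunID set to testNumber1, as in the command above, in which case [RunID] would be expected to resolve to testNumber1. The key name RunOutputDir and the directory layout are hypothetical.

```xml
<customMetadata>
  <!-- Hypothetical key: one output directory per run, named after the RunID passed on the command line -->
  <metadata key="RunOutputDir" val="[PGE_WORK_DIR]/[RunID]/output"/>
</customMetadata>
```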
Augmenting Metadata in a PGE
This part is taken from the 'CAS-Workflow 2: A User Guide' by Brian Foster.
The element for augmenting metadata is <customMetadata>. Although this element sits at the end of pge-config.xml, it is not the last to be loaded. It is loaded before every other element except <import> (not used in this example). Any number of <metadata> elements are allowed inside <customMetadata>.
To pass metadata through all tasks in a workflow, you can specify the attribute workflowMet='true'. For example: <metadata key='filename' val='data.dat' workflowMet='true'/>
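In context, the attribute sits alongside ordinary entries in the same <customMetadata> block. This is a minimal sketch: the ScratchDir key is hypothetical, and the comment about it not being passed on is an assumption based on the description above.

```xml
<customMetadata>
  <!-- Hypothetical key without workflowMet: assumed to be used by this task only -->
  <metadata key="ScratchDir" val="[PGE_WORK_DIR]/[JobId]/scratch"/>
  <!-- Passed through to the other tasks in the workflow -->
  <metadata key="filename" val="data.dat" workflowMet="true"/>
</customMetadata>
```

A later task in the same workflow should then be able to reference the value as [filename].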
Metadata elements specified in a different file can be accessed in a PGE using the <import> tag. For example, suppose common-metadata.xml contains the following:
<pgeConfig>
  <customMetadata>
    <metadata key="JobWorkDir" val="[PGE_WORK_DIR]/[JobId]"/>
    <metadata key="JavaHome" val="/usr/bin/java"/>
    <metadata key="RespJar" val="[WORKFLOW_HOME]/lib/somejarfile-0.0.jar"/>
  </customMetadata>
  <!--Add similar common metadata keys-->
</pgeConfig>
The above file can then be imported into a PGE task config, as in this PgeConfig example:
<pgeConfig>
  <import file="common-metadata.xml"/>
  <exe dir="[JobWorkDir]" shellType="/bin/bash">
    <cmd>
      [JavaHome] -cp [RespJar] [LoadClass] [Arguments]
    </cmd>
  </exe>
  <output>
    <dir path="[OutputDir]" createBeforeExe="true">
    </dir>
  </output>
  <customMetadata>
    <metadata key="LoadClass" val="edu.usc.chla.vpicu.vpsdb.SomeClass"/>
    <metadata key="Arguments" val="blah1 blah2"/>
  </customMetadata>
</pgeConfig>
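To make the import concrete, here is roughly what the <exe> element would look like once the keys from both files are substituted. This illustrates the expected replacement and is not output captured from CAS-PGE; [PGE_WORK_DIR], [JobId], and [WORKFLOW_HOME] stay in brackets because their values depend on the deployment.

```xml
<!-- Sketch only: the exe element after key replacement -->
<exe dir="[PGE_WORK_DIR]/[JobId]" shellType="/bin/bash">
  <cmd>
    /usr/bin/java -cp [WORKFLOW_HOME]/lib/somejarfile-0.0.jar edu.usc.chla.vpicu.vpsdb.SomeClass blah1 blah2
  </cmd>
</exe>
```

JavaHome, RespJar, and JobWorkDir come from common-metadata.xml, while LoadClass and Arguments come from the task's own <customMetadata> block.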
Product-Type Metadata
The product-type metadata refers to the metadata for the files that are ingested during the workflow. It is defined in a met file that is named in the args attribute of the <files> element in PgeConfig.xml:
<files name="FiletoIngest" metFileWriterClass="org.apache.oodt.cas.pge.writers.metlist.MetadataListPcsMetFileWriter" args="PGE_CONFIG_HOME/MetOut_FiletoIngest.xml"/>
MetOut_FiletoIngest.xml typically looks like the following:
<?xml version="1.0" encoding="UTF-8"?>
<metadataList>
  <!-- Any File -->
  <metadata key="ProductName" val="[Filename]"/>
  <metadata key="Filename"/>
  <metadata key="FileLocation"/>
  <metadata key="FileSize"/>
  <metadata key="ProductType"/>
  <!--Add any element specified in your elements.xml that you want to be written out as metadata for the output file-->
</metadataList>
The metFileWriters create the metadata (.met) file for the output files that will be ingested by the file manager.
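For orientation, a generated .met file for one ingested output might look roughly like the sketch below. All values are invented, and the layout assumes the usual CAS metadata XML format (keyval entries under a cas:metadata root); check a file produced by your own deployment for the exact form.

```xml
<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <!-- Hypothetical values for an output file named output.txt -->
  <keyval>
    <key>ProductName</key>
    <val>output.txt</val>
  </keyval>
  <keyval>
    <key>Filename</key>
    <val>output.txt</val>
  </keyval>
  <keyval>
    <key>FileLocation</key>
    <val>/tmp/pge/output</val>
  </keyval>
  <keyval>
    <key>FileSize</key>
    <val>1024</val>
  </keyval>
  <keyval>
    <key>ProductType</key>
    <val>GenericFile</val>
  </keyval>
</cas:metadata>
```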