1. Introduction
Historically, data processing systems have been controlled primarily by file-based triggering mechanisms. These systems function like a chain reaction: one file triggers a process, which generates another file, which in turn triggers another process, and so forth. While it is easy to add processes to and remove processes from such systems, the user must thoroughly understand how the processes relate to each other in order to avoid creating unwanted ‘chain reactions’. Recently, efforts have been made to move towards more controlled processing models built on the concept of workflows. A workflow is, more or less, a tightly grouped set of processes. It explicitly tells the processing system which processes should be run and in what order. A workflow runs each process based on the successful completion of the previous processes in its mapping, which makes file generation a criterion for the successful completion of a process rather than the triggering mechanism for the next process. This separates the workflow from the files it may generate, allowing the processing system to perform more tasks than just file processing. In this paper you will learn how to use and configure a workflow processing system, specifically CAS-Workflow2, and understand the design decisions behind it.
2. Workflows
2.1. Structure
Workflows consist of three parts: pre-conditions, a list of tasks (or processes) to perform, and post-conditions.
2.1.1. Pre-conditions
A pre-condition is a task whose purpose is to return a true/false answer to some question. Pre-conditions are requirements that must be met before a workflow can run its tasks. An example of a pre-condition might be checking for the existence of a particular file. After all pre-conditions have been met, a workflow will execute its tasks.
2.1.2. Tasks
A task is an activity or piece of work that needs to be done. Tasks are the atomic level of a workflow. The goal of any workflow is to run its tasks to successful completion. An example of a task might be creating a visual map for a data file. After all tasks have completed, the workflow will then run its post-conditions.
2.1.3. Post-conditions
Post-conditions give the workflow the ability to evaluate whether or not its tasks successfully performed all of their required duties. An example of a post-condition might be checking for the existence of a file that a task was responsible for generating.
2.2. Lifecycle
Each workflow must go through a well-defined set of states, or lifecycle. We can easily deduce a few of the states from what we know already. A workflow starts by evaluating its pre-conditions, so we can call this state PreConditionEval. Then it must execute its tasks; we’ll call this state Executing. Then, of course, we have PostConditionEval. Now, what if any of the three steps fails? We need a failure state, hence the state Failure. And, if everything goes as planned, we have the state Success. Figure 1 further describes this workflow lifecycle. There are other states; however, for simplicity’s sake, these are the only states we will introduce for now. The other states will be introduced later, as more workflow knowledge is required to understand them.
Figure 1
2.2.1. PreConditionEval
Workflow is executing its pre-conditions.
2.2.2. Executing
Workflow is executing its tasks.
2.2.3. PostConditionEval
Workflow is executing its post-conditions.
2.2.4. Success
Workflow has successfully passed all pre-conditions, executed all tasks, and passed all post-conditions.
2.2.5. Failure
At least one of the workflow’s pre-conditions, tasks, or post-conditions has failed.
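The simplified lifecycle above can be sketched in a few lines of Python. This is only an illustration, not CAS-Workflow2 code: the `State` enum, the `run_lifecycle` function, and the idea of modeling each condition or task as a callable returning True or False are all assumptions made for the example.

```python
from enum import Enum
from typing import Callable, List, Sequence


class State(Enum):
    PRE_CONDITION_EVAL = "PreConditionEval"
    EXECUTING = "Executing"
    POST_CONDITION_EVAL = "PostConditionEval"
    SUCCESS = "Success"
    FAILURE = "Failure"


def run_lifecycle(
    pre_conditions: Sequence[Callable[[], bool]],
    tasks: Sequence[Callable[[], bool]],
    post_conditions: Sequence[Callable[[], bool]],
) -> List[State]:
    """Walk the simplified lifecycle and return the states visited, in order."""
    phases = [
        (State.PRE_CONDITION_EVAL, pre_conditions),
        (State.EXECUTING, tasks),
        (State.POST_CONDITION_EVAL, post_conditions),
    ]
    visited: List[State] = []
    for state, steps in phases:
        visited.append(state)
        # all() short-circuits, so nothing after the first failing step runs.
        if not all(step() for step in steps):
            visited.append(State.FAILURE)
            return visited
    visited.append(State.SUCCESS)
    return visited
```

In this sketch every phase is visited even when it has nothing to run, which is a simplification made to keep the example short.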
2.3. Context
Workflows can have context, which acts as their knowledge base. This context is also referred to as metadata. Metadata is a bucket of key/value(s) information that workflows have access to. An example of a metadata field might be: RunDate=’2009-01-20’. At times, tasks need to talk to other tasks, or conditions would like to communicate something to the tasks that run after them. Workflows not only control the flow of conditions and tasks, they also control communication between them. Workflows accomplish this through the use of metadata. Conditions and tasks can also have their own metadata, which they don’t share with anyone else. A workflow has three categories of metadata: 1) static, 2) dynamic, and 3) local.
2.3.1. Static
This is metadata that is the same for every run of a workflow. A task can always assume this metadata will exist.
2.3.2. Dynamic
This is metadata that is passed into the workflow when it is run and/or set by other tasks and conditions when communicating with each other.
2.3.3. Local
This is dynamic metadata that is local to a task or condition.
These categories have a precedence order. By default, dynamic will override local and static, and local will override static. Later I will show how this precedence order can be altered if so desired.
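The default precedence can be pictured as a layered dictionary merge. The sketch below assumes metadata is a mapping from a key to a list of values; the function name `merge_metadata` and the sample keys other than RunDate are hypothetical.

```python
from typing import Dict, List

Metadata = Dict[str, List[str]]


def merge_metadata(static: Metadata, local: Metadata, dynamic: Metadata) -> Metadata:
    """Merge the three categories using the default precedence.

    Later layers win on duplicate keys, so the layers are applied in the
    order static, then local, then dynamic.
    """
    merged: Metadata = {}
    for layer in (static, local, dynamic):
        merged.update(layer)
    return merged


merged = merge_metadata(
    static={"RunDate": ["2009-01-20"], "Site": ["A"]},
    local={"RunDate": ["2009-01-21"], "Tmp": ["x"]},
    dynamic={"RunDate": ["2009-01-22"]},
)
# RunDate comes from dynamic, Tmp from local, Site from static.
```

Altering the precedence order would amount to changing the order of the layers in the loop.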
2.4. Everything is a Workflow
In order to simplify how process control is configured, tasks and conditions were also designed to be workflows. This means that almost anywhere we have used the word workflow up until now, we could have replaced it with the word task, and vice versa. However, there are a few exceptions. A task differs from a workflow in that it wraps an executable class, which performs some activity, and it cannot have any children workflows. Conditions are just specialized tasks, so the same applies to them as well. Yet conditions differ from tasks in that they cannot have pre-conditions or post-conditions, since that would mean you could have a pre-condition for a pre-condition. So, in other words, a workflow is really just a workflow of workflows with pre- and post-condition workflows.
2.5. Listeners
We now know that workflows have three different parts (or buckets) into which other workflows can be placed: pre-conditions, children workflows, and post-conditions. Workflows placed into these buckets are treated like black boxes. A workflow has no idea what types of workflows have been placed into these buckets. The workflow just knows that the workflows in the pre-conditions bucket must pass before it runs the workflows in the children bucket, followed by the workflows in the post-conditions bucket. A workflow knows what is going on with the workflows in its buckets by registering itself as a listener for state changes in those workflows. When a workflow changes state, it notifies its listeners about the change. The listening workflow then adjusts its state depending on which bucket the state change notification came from. Earlier we learned about the lifecycle that each workflow goes through. This lifecycle is followed not only by the top (or root) workflow, but by every workflow in all of the different buckets as well. Workflows change states in their lifecycle when one of the workflows in their buckets changes state. For example, if a workflow has a pre-condition workflow that changes state to Executing, then upon notification it will change its own state to PreConditionEval. This notion of workflow lifecycle changes affecting other workflow lifecycles will be explained in greater detail later.
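The listener relationship can be sketched as a small observer pattern. Everything below is hypothetical (the `Workflow` class, its method names, and the use of plain strings for states); it models only the single rule from the example above, where a pre-condition entering Executing moves its parent to PreConditionEval.

```python
from typing import List, Optional


class Workflow:
    """Hypothetical workflow node that notifies listeners on state changes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: Optional[str] = None  # no state assigned yet
        self.pre_conditions: List["Workflow"] = []
        self._listeners: List["Workflow"] = []

    def add_pre_condition(self, condition: "Workflow") -> None:
        self.pre_conditions.append(condition)
        # The parent registers itself as a listener on the bucket member.
        condition._listeners.append(self)

    def set_state(self, new_state: str) -> None:
        self.state = new_state
        for listener in self._listeners:
            listener.on_state_change(self)

    def on_state_change(self, source: "Workflow") -> None:
        # React according to the bucket the notification came from.
        if source in self.pre_conditions and source.state == "Executing":
            self.set_state("PreConditionEval")


root = Workflow("BuyGroceries")
find_keys = Workflow("FindKeys")
root.add_pre_condition(find_keys)
find_keys.set_state("Executing")  # root is notified and updates itself
```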
2.6. Types
There are two categories of workflows: workflows that control the run order of other workflows, and workflows that track the execution of some process or activity. There are currently two implemented workflows that control the run order of other workflows:
2.6.1. Parallel
A workflow that runs all the workflows in its children bucket at the same time. Its metadata (or context) becomes the merge of all metadata of workflows in its children bucket.
2.6.2. Sequential
A workflow that runs the workflows in its children bucket one at a time, only running the next child workflow after its previous child workflow has finished. Its metadata (or context) is updated after each workflow from its children bucket is run, then passed to the next workflow to run from its children bucket.
We have already been introduced to the second category of workflows, which track the running of some process. These are tasks and conditions:
2.6.3. Task
Tracks some executing activity. Its metadata is synched with this process periodically (see: Tasks).
2.6.4. Condition
Tracks some executing condition activity. Its metadata is synched with this executing condition periodically (see: Pre-conditions and Post-conditions).
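The difference in how parallel and sequential workflows handle metadata can be sketched as follows. This is a simplified model under stated assumptions: each child is a plain function that receives a copy of the context and returns the metadata it produced, the function names are hypothetical, and the rule that a later child wins when parallel children produce the same key is an assumption, not documented behavior.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Metadata = Dict[str, str]
Child = Callable[[Metadata], Metadata]  # takes context, returns produced metadata


def run_sequential(children: List[Child], context: Metadata) -> Metadata:
    """Run children one at a time, passing the updated context along."""
    current = dict(context)
    for child in children:
        current.update(child(dict(current)))
    return current


def run_parallel(children: List[Child], context: Metadata) -> Metadata:
    """Run all children at once, then merge the metadata they produced."""
    with ThreadPoolExecutor(max_workers=max(1, len(children))) as pool:
        # pool.map returns results in the order of the children list.
        results = list(pool.map(lambda child: child(dict(context)), children))
    merged = dict(context)
    for produced in results:  # on key collisions the later child wins (assumption)
        merged.update(produced)
    return merged
```

A child in a sequential workflow can therefore read what an earlier sibling wrote, while children in a parallel workflow only see the context they started with.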
3. Workflows in Workflows
Now that we understand the makeup of a workflow, let’s look at an example. Let’s say we want a workflow that models going to the store to buy groceries. The first step is to make sure we have our keys and wallet. These would be considered pre-conditions, because we can’t drive without our keys, and we can’t buy the groceries without our wallet. However, these pre-conditions can be performed at the same time. I can check if I have my keys while I am checking for my wallet, since checking for my keys does not depend on checking for my wallet. So these pre-conditions would happen in ‘parallel’. After we’ve determined that we have our keys and wallet, we can perform the tasks we have set out to do: drive to the store, buy our groceries, and drive home. Since we can’t do one of these tasks without doing the one before it (that is, we can’t buy our groceries without driving to the store), these tasks are ‘sequential’. So our workflow model graph would look something like:
[id='BuyGroceries' execution='sequential']
  {PreCond:
    [id='FindWalletAndKeys' execution='parallel']
      [id='FindWallet' execution='condition']
      [id='FindKeys' execution='condition']
  }
  [id='DriveToStore' execution='task']
  [id='PurchaseGroceries' execution='task']
  [id='DriveHome' execution='task']
Let’s take this one step further. Let’s say we brought a friend along to help with the shopping and we split up our list, so as to cut the time in half. Now we have two people shopping at the same time:
[id='BuyGroceries' execution='sequential']
  {PreCond:
    [id='FindWalletAndKeys' execution='parallel']
      [id='FindWallet' execution='condition']
      [id='FindKeys' execution='condition']
  }
  [id='DriveToStore' execution='task']
  [id='PurchaseGroceries' execution='parallel']
    [id='YouPurchaseGroceries' execution='task']
    [id='FriendPurchaseGroceries' execution='task']
  [id='DriveHome' execution='task']
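One way to get comfortable with model graphs is to write the grocery example as a nested data structure. The sketch below is purely illustrative: the `Node` class and the `ids_by_execution` helper are hypothetical and are not the format CAS-Workflow2 reads.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    id: str
    execution: str
    pre_conditions: List["Node"] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


buy_groceries = Node(
    "BuyGroceries",
    "sequential",
    pre_conditions=[
        Node(
            "FindWalletAndKeys",
            "parallel",
            children=[Node("FindWallet", "condition"), Node("FindKeys", "condition")],
        )
    ],
    children=[
        Node("DriveToStore", "task"),
        Node(
            "PurchaseGroceries",
            "parallel",
            children=[
                Node("YouPurchaseGroceries", "task"),
                Node("FriendPurchaseGroceries", "task"),
            ],
        ),
        Node("DriveHome", "task"),
    ],
)


def ids_by_execution(node: Node, execution: str) -> List[str]:
    """Collect ids of all nodes with the given execution type, depth first."""
    found = [node.id] if node.execution == execution else []
    for sub in node.pre_conditions + node.children:
        found.extend(ids_by_execution(sub, execution))
    return found
```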
Figure 2 shows the task mapping of this workflow. Usually, when you go to implement a workflow in the system, you will have a task diagram, which you will have to convert to a workflow model graph similar to the grocery store example above. So being able to look at one and realize the other is essential.
Figure 2
The following figures enumerate the recommended thought process to follow when identifying workflows from a task graph.
Figure 3
Figure 4
Figure 5
4. Workflow Patterns
There are many complex workflow patterns out there. However, most patterns should be implementable with careful usage of different combinations of parallel and sequential workflows. In the unusual case where parallel and sequential won’t cut it, custom workflows can be written and plugged in (this is an advanced topic that will be discussed later). Here we will cover how to create the most common workflow patterns. More advanced patterns will be discussed later.
4.1. Parallel Split
4.1.1. Description
The divergence of a branch into two or more parallel branches, each of which executes concurrently.
4.1.2. Diagram
Figure 6
4.1.3. Model Graph
[id='S1' execution='sequential']
  [id='T1' execution='task']
  [id='P1' execution='parallel']
    [id='T2' execution='task']
    [id='T3' execution='task']
4.2. Synchronization
4.2.1. Description
The convergence of two or more branches into a single subsequent branch such that the thread of control is passed to the subsequent branch when all input branches have been enabled.
4.2.2. Diagram
Figure 7
4.2.3. Model Graph
[id='S1' execution='sequential']
  [id='P1' execution='parallel']
    [id='T1' execution='task']
    [id='T2' execution='task']
  [id='T3' execution='task']
4.3. Combination of a Parallel Split into a Synchronization
4.3.1. Description
(See Parallel Split and Synchronization).
4.3.2. Diagram
Figure 8
4.3.3. Model Graph
[id='S1' execution='sequential']
  [id='T1' execution='task']
  [id='P1' execution='parallel']
    [id='T2' execution='task']
    [id='T3' execution='task']
  [id='T4' execution='task']
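To check that a model graph really expresses the intended pattern, it can help to compute which tasks may run together. The sketch below uses a hypothetical tuple representation `(id, execution, children)` and a hypothetical `waves` function. It idealizes parallel children as advancing in lockstep, which is exact for the patterns above, where every parallel child is a single task.

```python
from itertools import zip_longest
from typing import List, Tuple

Node = Tuple[str, str, list]  # (id, execution, children)


def waves(node: Node) -> List[List[str]]:
    """Return groups of task ids; tasks in the same group may run concurrently."""
    node_id, execution, children = node
    if execution == "task":
        return [[node_id]]
    child_waves = [waves(child) for child in children]
    if execution == "sequential":
        # Children run one after another, so their waves are concatenated.
        return [wave for cw in child_waves for wave in cw]
    if execution == "parallel":
        # Children start together, so their i-th waves are merged side by side.
        return [
            [task for wave in step for task in wave]
            for step in zip_longest(*child_waves, fillvalue=[])
        ]
    raise ValueError(f"unknown execution type: {execution}")


combination = (
    "S1",
    "sequential",
    [
        ("T1", "task", []),
        ("P1", "parallel", [("T2", "task", []), ("T3", "task", [])]),
        ("T4", "task", []),
    ],
)
```

For the combination pattern, `waves(combination)` groups T1 alone, then T2 with T3, then T4 alone, which matches the split followed by the synchronization.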
5. Lifecycles in Lifecycles
We learned above how each workflow goes through its own lifecycle, which depends on the lifecycles of its pre-condition, children, and post-condition workflows. Here we will learn how this actually works. First we are going to introduce a few more states: Queued, PreConditionSuccess, Ready, and ExecutionComplete. Figure 9 is an updated lifecycle diagram.
Figure 9
5.1. Queued
Workflow has been put on the main queue (assume this to be the initial state for now).
5.2. PreConditionSuccess
Pre-conditions have all completed successfully.
5.3. Ready
Workflow is ready to run or at least one of its children workflows has reached this state.
5.4. ExecutionComplete
A workflow has completed executing or all workflows in its children bucket have completed successfully.
Let’s bring back the buying groceries example but this time we will add in the states (with everything starting in Queued state):
[id='BuyGroceries' execution='sequential' state='Queued']
  [id='DriveToStore' execution='task' state='Queued']
  [id='PurchaseGroceries' execution='parallel' state='Queued']
    [id='YouPurchaseGroceries' execution='task' state='Queued']
    [id='FriendPurchaseGroceries' execution='task' state='Queued']
  [id='DriveHome' execution='task' state='Queued']
First BuyGroceries will get a chance to run its pre-conditions. Since there are two and they can run in parallel, they will both change states to Ready which will set FindWalletAndKeys to Ready since at least one of its children workflows has changed states to Ready:
[id='BuyGroceries' execution='sequential' state='Queued']
  [id='DriveToStore' execution='task' state='Queued']
  [id='PurchaseGroceries' execution='parallel' state='Queued']
    [id='YouPurchaseGroceries' execution='task' state='Queued']
    [id='FriendPurchaseGroceries' execution='task' state='Queued']
  [id='DriveHome' execution='task' state='Queued']
Once resources become available to run these pre-conditions, they will execute and their states will change to Executing. This will cause FindWalletAndKeys to also change states to Executing and BuyGroceries to change states to PreConditionEval, since it now has running pre-conditions:
[id='BuyGroceries' execution='sequential' state='PreConditionEval']
  [id='DriveToStore' execution='task' state='Queued']
  [id='PurchaseGroceries' execution='parallel' state='Queued']
    [id='YouPurchaseGroceries' execution='task' state='Queued']
    [id='FriendPurchaseGroceries' execution='task' state='Queued']
  [id='DriveHome' execution='task' state='Queued']
Let’s say FindWallet successfully completes. It will change states to Success, but nothing else changes, because FindWalletAndKeys still has a child that is Executing and BuyGroceries still has an Executing pre-condition:
[id='BuyGroceries' execution='sequential' state='PreConditionEval']
  [id='DriveToStore' execution='task' state='Queued']
  [id='PurchaseGroceries' execution='parallel' state='Queued']
    [id='YouPurchaseGroceries' execution='task' state='Queued']
    [id='FriendPurchaseGroceries' execution='task' state='Queued']
  [id='DriveHome' execution='task' state='Queued']
What if FindKeys fails? Then it will change states to Failure, which will cause FindWalletAndKeys to change states to Failure, which will then cause BuyGroceries to also change states to Failure. This means the workflow has failed to run, and is done. Since one of its pre-conditions failed to complete successfully, BuyGroceries can never run the workflows in its children bucket, so it has failed:
[id='BuyGroceries' execution='sequential' state='Failure']
  [id='DriveToStore' execution='task' state='Queued']
  [id='PurchaseGroceries' execution='parallel' state='Queued']
    [id='YouPurchaseGroceries' execution='task' state='Queued']
    [id='FriendPurchaseGroceries' execution='task' state='Queued']
  [id='DriveHome' execution='task' state='Queued']
Now if FindKeys succeeds, then it will change states to Success, which will cause FindWalletAndKeys to change states to Success, since all of its children workflows have completed and, because it is a condition, it has no pre-conditions or post-conditions, so it is done. BuyGroceries will then change states to PreConditionSuccess, since all of its pre-conditions have completed successfully:
[id='BuyGroceries' execution='sequential' state='PreConditionSuccess']
  [id='DriveToStore' execution='task' state='Queued']
  [id='PurchaseGroceries' execution='parallel' state='Queued']
    [id='YouPurchaseGroceries' execution='task' state='Queued']
    [id='FriendPurchaseGroceries' execution='task' state='Queued']
  [id='DriveHome' execution='task' state='Queued']
Since BuyGroceries has finished running its pre-conditions successfully, the workflows in its children bucket can now run. BuyGroceries is a sequential workflow so first DriveToStore will change states to Ready, which will cause BuyGroceries to change states to Ready:
[id='BuyGroceries' execution='sequential' state='Ready']
  [id='DriveToStore' execution='task' state='Ready']
  [id='PurchaseGroceries' execution='parallel' state='Queued']
    [id='YouPurchaseGroceries' execution='task' state='Queued']
    [id='FriendPurchaseGroceries' execution='task' state='Queued']
  [id='DriveHome' execution='task' state='Queued']
When resources become available, DriveToStore will execute and change states to Executing, which will also change BuyGroceries state to Executing, since it now has an executing child workflow:
[id='BuyGroceries' execution='sequential' state='Executing']
  [id='DriveToStore' execution='task' state='Executing']
  [id='PurchaseGroceries' execution='parallel' state='Queued']
    [id='YouPurchaseGroceries' execution='task' state='Queued']
    [id='FriendPurchaseGroceries' execution='task' state='Queued']
  [id='DriveHome' execution='task' state='Queued']
When DriveToStore successfully completes, it will change states to Success. BuyGroceries will stay in the Executing state: even though at the moment it is not actually executing any children workflows, it is still considered to be executing, because otherwise a workflow would have to be able to go backwards in its lifecycle. In order to simplify things, a design decision was made that workflows can only move forward through their lifecycle. The only way a workflow will go backward through its lifecycle is if a user sets the workflow back manually (explained later). So now we have:
[id='BuyGroceries' execution='sequential' state='Executing']
  [id='DriveToStore' execution='task' state='Success']
  [id='PurchaseGroceries' execution='parallel' state='Queued']
    [id='YouPurchaseGroceries' execution='task' state='Queued']
    [id='FriendPurchaseGroceries' execution='task' state='Queued']
  [id='DriveHome' execution='task' state='Queued']
Next to run is PurchaseGroceries, which has two children workflows. PurchaseGroceries is a parallel workflow, so both YouPurchaseGroceries and FriendPurchaseGroceries will be set to Ready and PurchaseGroceries will also change to Ready:
[id='BuyGroceries' execution='sequential' state='Executing']
  [id='DriveToStore' execution='task' state='Success']
  [id='PurchaseGroceries' execution='parallel' state='Ready']
    [id='YouPurchaseGroceries' execution='task' state='Ready']
    [id='FriendPurchaseGroceries' execution='task' state='Ready']
  [id='DriveHome' execution='task' state='Queued']
When both YouPurchaseGroceries and FriendPurchaseGroceries complete successfully, they will change states to Success, causing PurchaseGroceries to also change states to Success. BuyGroceries again will stay in the Executing state, since it still has more children workflows to run:
[id='BuyGroceries' execution='sequential' state='Executing']
  [id='DriveToStore' execution='task' state='Success']
  [id='PurchaseGroceries' execution='parallel' state='Success']
    [id='YouPurchaseGroceries' execution='task' state='Success']
    [id='FriendPurchaseGroceries' execution='task' state='Success']
  [id='DriveHome' execution='task' state='Queued']
Now DriveHome will change states to Ready:
[id='BuyGroceries' execution='sequential' state='Executing']
  [id='DriveToStore' execution='task' state='Success']
  [id='PurchaseGroceries' execution='parallel' state='Success']
    [id='YouPurchaseGroceries' execution='task' state='Success']
    [id='FriendPurchaseGroceries' execution='task' state='Success']
  [id='DriveHome' execution='task' state='Ready']
When resources are available, DriveHome will run, changing its state to Executing:
[id='BuyGroceries' execution='sequential' state='Executing']
  [id='DriveToStore' execution='task' state='Success']
  [id='PurchaseGroceries' execution='parallel' state='Success']
    [id='YouPurchaseGroceries' execution='task' state='Success']
    [id='FriendPurchaseGroceries' execution='task' state='Success']
  [id='DriveHome' execution='task' state='Executing']
On successful completion, DriveHome will change states to Success, and since BuyGroceries has no more children workflows, it will also change states to Success. The workflow is now done and has completed successfully:
[id='BuyGroceries' execution='sequential' state='Success']
  [id='DriveToStore' execution='task' state='Success']
  [id='PurchaseGroceries' execution='parallel' state='Success']
    [id='YouPurchaseGroceries' execution='task' state='Success']
    [id='FriendPurchaseGroceries' execution='task' state='Success']
  [id='DriveHome' execution='task' state='Success']
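The way a parent's state follows its children in the walk-through above can be approximated with a small function. This is a rough sketch, not the engine's actual logic: the `next_parent_state` name and the linear `ORDER` list are assumptions, and the parent is assumed to have no post-conditions, so it moves straight to Success once all children succeed.

```python
from typing import List

# Forward-only ordering of the states used in the walk-through (assumed linear).
ORDER = [
    "Queued",
    "PreConditionEval",
    "PreConditionSuccess",
    "Ready",
    "Executing",
    "ExecutionComplete",
    "PostConditionEval",
    "Success",
]


def next_parent_state(current: str, child_states: List[str]) -> str:
    """Derive a parent's state from its children bucket; never move backwards."""
    if current in ("Success", "Failure"):
        return current  # final states
    if "Failure" in child_states:
        return "Failure"
    if child_states and all(state == "Success" for state in child_states):
        candidate = "Success"
    elif "Executing" in child_states:
        candidate = "Executing"
    elif "Ready" in child_states:
        candidate = "Ready"
    else:
        candidate = current
    # Only accept the candidate if it is further along the lifecycle.
    return candidate if ORDER.index(candidate) > ORDER.index(current) else current
```

For example, with BuyGroceries in Executing and its children in Success, Ready, and Queued, the function keeps BuyGroceries in Executing rather than dropping it back to Ready.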
6. Workflow Processors
The lifecycle state changes shown in the BuyGroceries workflow example are controlled by WorkflowProcessor(s). When a workflow model graph is run in the system, the model is converted into workflow processors. Workflow processors are tied to the execution types: there is a workflow processor for parallel, sequential, task, and condition, and each controls the behavior seen for its type. The custom workflow implementation mentioned before is done by writing a new workflow processor and tying it to an execution type name.
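Tying a processor to an execution type name can be pictured as a simple registry. The sketch below is an assumption about the general shape only: the class names, the `register` decorator, and `create_processor` are hypothetical and do not reflect the actual CAS-Workflow2 API.

```python
from typing import Callable, Dict, Type


class WorkflowProcessor:
    """Hypothetical base class; the real CAS-Workflow2 API may differ."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id


PROCESSORS: Dict[str, Type[WorkflowProcessor]] = {}


def register(
    execution_type: str,
) -> Callable[[Type[WorkflowProcessor]], Type[WorkflowProcessor]]:
    """Class decorator that ties a processor class to an execution type name."""

    def decorator(cls: Type[WorkflowProcessor]) -> Type[WorkflowProcessor]:
        PROCESSORS[execution_type] = cls
        return cls

    return decorator


@register("sequential")
class SequentialProcessor(WorkflowProcessor):
    pass


@register("parallel")
class ParallelProcessor(WorkflowProcessor):
    pass


@register("task")
class TaskProcessor(WorkflowProcessor):
    pass


@register("condition")
class ConditionProcessor(WorkflowProcessor):
    pass


def create_processor(model_id: str, execution_type: str) -> WorkflowProcessor:
    """Look up the processor class for an execution type and instantiate it."""
    try:
        cls = PROCESSORS[execution_type]
    except KeyError:
        raise ValueError(f"no processor for execution type: {execution_type}") from None
    return cls(model_id)
```

Under this model, a custom workflow would be one more class registered under a new execution type name.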
7. Task Instances
7.1. Execution
Task and condition processors have a TaskInstance attached to them. Earlier we mentioned that task and condition processors monitor some external activity; this external activity is the TaskInstance. When a task or condition is given permission to run, it will be asked to create its TaskInstance, which will be sent to the runner. The TaskInstance is what actually performs the activity for which the task or condition processor is responsible, and it will periodically synchronize itself (i.e. its metadata and state) with its processor.
7.2. Metadata
TaskInstance metadata is stored by the engine in a special way. TaskInstance metadata is not only synchronized with its task or condition processor, it is also stored in a queryable metadata repository. The precedence order discussed in the metadata section is important to note here (see Context). Static, dynamic, and local metadata are grouped into one metadata (i.e. instance metadata), using precedence to determine the values of duplicate keys. This is the metadata that becomes queryable for that task instance.
7.2.1. Reserved Keys
- InstanceId : its workflow InstanceId
- ModelId : its ModelId
- JobId : the JobId assigned to this TaskInstance
- State : this task’s (or condition) current state
- Host : the machine this TaskInstance is running on
- CreationDate : the date its task processor was created (in UTC format)
- ReadyDate : the date its task processor was ready to run (in UTC format)
- ExecutionDate : the date its task processor was executed (in UTC format)
- CompletionDate : the date its task processor completed (in UTC format)
8. Task and Condition Result States
When a task or condition executes its instance, it is allowed to return one of three result states to notify its task or condition workflow processor of how execution went.
8.1. ResultsSuccess
8.1.1. Task
Means it successfully completed its process or activity it was assigned to perform.
8.1.2. Condition
Means the condition returned True to the question it was asked to answer.
8.2. ResultsFailure
8.2.1. Task
Means the task failed to complete successfully.
8.2.2. Condition
Condition answered False to its question and even if you run it again it will still be False.
8.3. ResultsBail
8.3.1. Task
Means the task is asking to be run again at a later date, at the moment it was unable to run, but it believes it will be able to succeed if run again later.
8.3.2. Condition
Means the condition answered False to its question, however it is asking to be run again later, because it believes it might be able to answer True at a later date.
9. Workflow States
The remaining workflow states will be introduced here and the rippling state change rules will be laid out. All the states below have a next state of Failure, however it will only be mentioned if there are special circumstances surrounding the state change.
9.1. Null
9.1.1. Description
This is the actual initial state of a workflow; however, it will never be visible to the user. This state is visible only in code; if you ever see a workflow in this state, notify the developers, for there is a bug in the system.
9.1.2. Next State
This state will immediately change states to Loaded after all workflow processors have been created for a given workflow model graph.
9.2. Loaded
9.2.1. Description
This means all workflow processors for a given workflow model graph have been successfully loaded. Again, this is not a visible state to the user.
9.2.2. Next State
This state will change states to Queued after it has been placed on the engine’s queue (See Queue Manager).
9.3. Queued
9.3.1. Description
This state means that the workflow processors have been placed into the engine’s queue (See Queue Manager).
9.3.2. Next State
Will change states to PreConditionEval if the workflow processor has pre-conditions to run; otherwise it will jump to Ready.
9.4. PreConditionEval
9.4.1. Description
A workflow processor is currently executing its pre-conditions.
9.4.2. Next State
If all pre-conditions return ResultsSuccess, then it will change states to PreConditionSuccess. If any pre-condition returns ResultsFailure, then it will change states to Failure. And, if any pre-condition returns ResultsBail, it will change states to Blocked.
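The three outcomes above can be sketched as a small decision function. The `Result` enum and the function name are hypothetical. The source does not say which outcome wins when one pre-condition fails and another bails, so letting Failure take priority over Blocked is an assumption of this sketch.

```python
from enum import Enum
from typing import Iterable


class Result(Enum):
    SUCCESS = "ResultsSuccess"
    FAILURE = "ResultsFailure"
    BAIL = "ResultsBail"


def state_after_pre_conditions(results: Iterable[Result]) -> str:
    """Map the results of all pre-conditions to the workflow's next state."""
    collected = list(results)
    if Result.FAILURE in collected:
        return "Failure"  # assumed to take priority over a bail
    if Result.BAIL in collected:
        return "Blocked"
    return "PreConditionSuccess"
```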
9.5. PreConditionSuccess
9.5.1. Description
All pre-conditions for a workflow processor have completed successfully.
9.5.2. Next State
A workflow processor will change states to Ready once at least one of its children workflow processors has reached Ready state or it is placed on the engine’s runnables queue (See Queue Manager).
9.6. Ready
9.6.1. Description
A workflow processor is in ready state when it has been put on the runnables queue, or when at least one of its child workflow processors has been put on the runnables queue (See Queue Manager).
9.6.2. Next State
Will change states to Executing once the resources become available for it to run.
9.7. Executing
9.7.1. Description
A workflow processor is in this state when either it has been given the resources to run, or one of its children workflow processors has entered this state.
9.7.2. Next State
Will change state to ExecutionComplete after it has successfully executed, or after all of its children workflows have completed successfully (have reached Success state).
9.8. ExecutionComplete
9.8.1. Description
This state is reached when a workflow processor has executed successfully or when all of its children workflow processors have reached Success state.
9.8.2. Next State
If a workflow processor has post-conditions, then the next state will be PostConditionEval; otherwise, the workflow processor will reach Success state.
9.9. PostConditionEval
9.9.1. Description
A workflow processor is executing its post-conditions.
9.9.2. Next State
Next state is Success, if all post-conditions finish successfully.
9.10. Success
9.10.1. Description
A workflow processor has successfully performed all steps in its lifecycle.
9.10.2. Next State
This is a final state, so there is no next state.
9.11. Failure
9.11.1. Description
A workflow processor failed to execute one of the steps in its lifecycle.
9.11.2. Next State
This is a final state, so there is no next state.
9.12. Paused
9.12.1. Description
Workflow processor stops executing its lifecycle.
9.12.2. Next State
Whatever the state was before the workflow processor was Paused (i.e. basically an un-pause).
9.13. Stopped
9.13.1. Description
A user force quit the workflow processor.
9.13.2. Next State
This is a final state, so there is no next state.
9.14. Blocked
9.14.1. Description
A condition has returned ResultsBail, so the workflow will block. You can set the metadata field BlockTimeElapse to the time in minutes that the condition should remain blocked until it allows itself to run again.
9.14.2. Next State
Workflow will wait until block wait time period is reached and then will change its state back to Queued (this might seem like a workflow is going back in the lifecycle, but instead it is a lifecycle restart).
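The block wait can be sketched with simple date arithmetic. The metadata key BlockTimeElapse and its unit (minutes) come from the description above; the function names, the string-valued metadata, and the fallback of one minute when the key is not set are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def unblock_time(
    blocked_at: datetime, metadata: Dict[str, str], default_minutes: int = 1
) -> datetime:
    """Return the moment a blocked workflow may be queued again."""
    # default_minutes is an assumption; the real default is not stated.
    minutes = int(metadata.get("BlockTimeElapse", default_minutes))
    return blocked_at + timedelta(minutes=minutes)


def state_at(now: datetime, blocked_at: datetime, metadata: Dict[str, str]) -> str:
    """Blocked until the wait has elapsed, then back to Queued (a restart)."""
    return "Queued" if now >= unblock_time(blocked_at, metadata) else "Blocked"


blocked_at = datetime(2009, 1, 20, 12, 0, tzinfo=timezone.utc)
metadata = {"BlockTimeElapse": "5"}
```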
9.15. Unknown
9.15.1. Description
A state that a task or condition will go into if an invalid ResultsState is returned by its TaskInstance.
9.15.2. Next State
The next state is whatever the user decides to change the state to.
9.16. OFF
9.16.1. Description
A state signifying that a workflow should be ignored for this particular run.
9.16.2. Next State
This is a final state, so there is no next state.
10. Workflow State Categories
All workflow states have been grouped into categories, or super-states. The use for this will become apparent later when you are introduced to Actions and Events. The categories are listed below (Figure 10 graphically shows the state-to-category mapping):
10.1. Initial
This is the group of states that the user does not see; it is used internally by the engine while a workflow model is being converted into workflow processors. There are two states in this category: Null and Loaded.
10.2. Waiting
This is the group of states which represent a point in the workflow lifecycle that is waiting on some factor outside of the workflow processor’s control. There are three states in this category: Blocked, Queued, and Ready.
10.3. Holding
This is the group of states which means a workflow is waiting for some sort of user intervention before the workflow can continue. There are two states in this category and they are: Unknown and Paused.
10.4. Transition
This is the group of states in which a workflow has completed to a point and is now waiting for the engine to give it permission to evaluate its next move (i.e. determine next task to run). There are two states in this category and they are: PreConditionSuccess and ExecutionComplete.
10.5. Running
This is the group of states in which a workflow is running or has some task or condition running from one of its workflows in its three buckets: pre-conditions, children workflows, or post-conditions. There are three states in this category and they are: Executing, PreConditionEval, and PostConditionEval.
10.6. Done
This is the group of states that workflows are in when they have completed in some manner, whether good or bad; they will not continue to run. There are four states in this category: Success, Failure, Stopped, and OFF.
10.7. Results
This is the category which the TaskInstance ResultsStates are in. This category is for internal engine use.
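The category listing above amounts to a lookup table, which can be written down directly. The state and category names come from the text; the dictionary layout and the helper functions are just one possible way to express it.

```python
from typing import Dict, List

CATEGORIES: Dict[str, List[str]] = {
    "Initial": ["Null", "Loaded"],
    "Waiting": ["Blocked", "Queued", "Ready"],
    "Holding": ["Unknown", "Paused"],
    "Transition": ["PreConditionSuccess", "ExecutionComplete"],
    "Running": ["Executing", "PreConditionEval", "PostConditionEval"],
    "Done": ["Success", "Failure", "Stopped", "OFF"],
    "Results": ["ResultsSuccess", "ResultsFailure", "ResultsBail"],
}

# Invert the table so a state can be looked up directly.
STATE_TO_CATEGORY: Dict[str, str] = {
    state: category for category, states in CATEGORIES.items() for state in states
}


def category_of(state: str) -> str:
    """Return the category (super-state) a state belongs to."""
    return STATE_TO_CATEGORY[state]


def is_done(state: str) -> bool:
    """True when a workflow in this state will not continue to run."""
    return category_of(state) == "Done"
```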
Figure 10
11. Engine
The workflow manager’s main component is its engine. The engine is responsible for handling all workflows put into the system. The engine requires two threads to run, no matter how many workflows are given to it. One of the two threads is used by the engine’s QueueManager. The second is used to submit runnable tasks to the engine’s Runner.
11.1. Queue Manager
The queue manager is responsible for managing workflows given to the engine. When the queue manager is given a workflow, the workflow is first cached (this allows many workflows to be put into the system without overflowing the process’s heap). This caching is not meant to be a speed improvement; if anything, it actually makes the engine slower. The purpose of the cache is to keep in memory only the workflows that are currently needed. So, any time you perform an action on a workflow, the engine’s queue manager first loads the workflow into memory if it is not already there, performs the desired action on it, and then marks it to be removed from the cache once no other component is using it. The caching is thread-safe: multiple clients can perform actions on the same workflow, and the actions will be performed in the order in which they were received. The queue manager is also responsible for figuring out which of the tasks, from all the workflows it has been given, should be the next one to run. It determines this by first asking all the workflows for any tasks they would like to run. These tasks then end up in the queue manager’s runnables queue, which is priority sorted by the PriorityManager. The top task in this queue is the task handed to the engine when the engine asks for the next runnable task.
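The runnables queue described above behaves like a priority queue. The following is a minimal Python sketch of that idea, not the engine’s implementation: the class name RunnablesQueue, its method names, and the tie-breaking rule (arrival order among equal priorities) are all assumptions made for illustration.

```python
import heapq
import itertools


class RunnablesQueue:
    """Toy stand-in for the queue manager's runnables queue (hypothetical)."""

    def __init__(self):
        self._heap = []
        # Assumption: equal priorities are served in arrival order.
        self._arrival = itertools.count()

    def offer(self, task_id, priority):
        """A workflow offers a task it would like to run."""
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._arrival), task_id))

    def next_runnable(self):
        """Return the top task, or None when nothing is waiting."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

The engine’s submit thread would correspond to a loop that repeatedly calls next_runnable and hands the result to the runner.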
11.2. Runner
The engine’s runner is responsible for concurrently running any tasks given to it. Currently there are two implemented EngineRunners: LocalEngineRunner and ResourceEngineRunner. LocalEngineRunner runs the tasks locally in the engine’s JVM. This runner should only be used for beginner setup; it is meant to be an emulation runner, not an operational runner. ResourceEngineRunner submits the jobs to CAS-Resource Manager. When using this runner you must set the metadata key QueueName to one of the supported CAS-Resource Manager queues. ResourceEngineRunner is the preferred operational EngineRunner.
12. Workflow Core
At this point you should understand the complete process which a workflow goes through inside the engine. When you tell the engine to run a workflow, it finds the model for that workflow, then creates the appropriate WorkflowProcessors to execute that model. These WorkflowProcessors will be a combination of ParallelProcessor, SequentialProcessor, TaskProcessor, and ConditionProcessor. We can see from Figure 11 that all of them extend WorkflowProcessor and that ConditionProcessor is a TaskProcessor. This shouldn’t come as a surprise, since we learned before that a condition is just a special-case task (see Everything is a Workflow). When a TaskProcessor (ConditionProcessor included) is given permission to run, it is handed to the engine’s QueueManager, which priority sorts it via its PriorityManager. The engine will then, one at a time, take the next TaskProcessor, have it create its TaskInstance, and send that TaskInstance to its Runner for concurrent execution.
Figure 11
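The inheritance relationships described in this section can be sketched as follows. This is a Python illustration of the hierarchy only; the real classes are part of the engine, and the constructor arguments and the needs_queueing helper are hypothetical.

```python
class WorkflowProcessor:
    """Base type for every processor created from a workflow model."""

    def __init__(self, model_id):
        self.model_id = model_id


class SequentialProcessor(WorkflowProcessor):
    """Runs its children one after another."""


class ParallelProcessor(WorkflowProcessor):
    """Runs its children at the same time."""


class TaskProcessor(WorkflowProcessor):
    """Leaf processor that creates a TaskInstance when it runs."""


class ConditionProcessor(TaskProcessor):
    """A condition is a special-case task, so it extends TaskProcessor."""


def needs_queueing(processor):
    # Only task processors (conditions included) go to the QueueManager.
    return isinstance(processor, TaskProcessor)
```

Because ConditionProcessor extends TaskProcessor, any code path that handles tasks (queueing, priority sorting, running) handles conditions as well.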
13. Workflow Priority
Each workflow has its own priority. There is a reserved metadata key, Priority, whose value can be any double between 1 and 10. Priority only affects a workflow if it is set in that workflow’s static metadata, which lets you give each workflow a different priority. If you want to change the priority of a workflow dynamically, you can override all the priorities in that workflow when you start it (see StartWorkflow), or, after you start it, you can target a specific child workflow processor (see ChangeWorkflowPriority). When a task is assigned a priority (this does not apply to conditions), the engine automatically adds 0.1 to the given priority. The reason for this is to ensure tasks get moved to the top of the queue as soon as possible, because the goal of any workflow is to get its tasks done: a task should run as soon as possible after all its pre-conditions have passed. This means that a task will be moved ahead of a condition with the same priority on the runnables queue (see Queue Manager). So when you start a workflow and override its priority, you are setting all the workflow processors to the priority you gave, but task processors get an additional 0.1 priority boost. Priority sorting is controlled by the PriorityManager of the engine’s QueueManager. You can swap out the PriorityManager to get custom priority handling (this will be discussed later).
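The 0.1 boost can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not the PriorityManager’s code: the function name effective_priority and the string execution types are hypothetical, and the range check simply mirrors the 1 to 10 range given above.

```python
TASK_BOOST = 0.1  # added to tasks only, never to conditions


def effective_priority(priority, execution):
    """Return the priority used for sorting on the runnables queue."""
    if not 1 <= priority <= 10:
        raise ValueError("Priority must be between 1 and 10")
    if execution == "task":
        return priority + TASK_BOOST
    return priority


# A task and a condition that were both given priority 5:
candidates = [("FindKeys", "condition", 5.0), ("Orbit", "task", 5.0)]
ordered = sorted(
    candidates,
    key=lambda c: effective_priority(c[2], c[1]),
    reverse=True,  # highest priority first
)
```

Here ordered places the task Orbit ahead of the condition FindKeys, even though both were given the same priority.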
14. Excused Sub-Processors
There are times when not all the children workflows of a Parallel workflow need to complete successfully (that is, if one of the children workflows fails, this might still be okay). You can allow this by marking workflows as excused. Let’s again bring back the Grocery Store example (see Lifecycles in Lifecycles). Let’s say the other person brought to the store was a girl, so we could improve FindWallet by adding it into a parallel processor FindMoney such that we have two conditions running in parallel: FindWallet and FindPurse. We then set FindPurse as an excused child processor for FindMoney. This now gives us:
[id=’BuyGroceries’ execution=’sequential’]
  {PreCond:
    [id=’FindWalletAndKeys’ execution=’parallel’]
      [id=’FindMoney’ execution=’parallel’ excused=’FindPurse’]
        [id=’FindWallet’ execution=’condition’]
        [id=’FindPurse’ execution=’condition’]
      [id=’FindKeys’ execution=’condition’]
…
…
…
This means that we will be looking for a wallet, a purse, and keys at the same time. However, by making FindPurse an excused child processor, finding the wallet is good enough for FindMoney to complete successfully. It would be nice if FindPurse succeeded, but if it fails, we can still continue. When there are more children, you can specify more than one excused workflow. You can also excuse children of sequential workflow processors; this just means that the excused children are allowed to fail without causing any problems for the children that follow them.
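The success rule for a parent with excused children can be sketched in Python as follows. This is an illustration of the rule as described above, not the engine’s logic; the function name and the dictionary of child results are hypothetical.

```python
def parent_succeeded(child_results, excused=()):
    """Decide whether a parent processor may complete successfully.

    child_results maps each child id to True (succeeded) or False (failed).
    Children listed in excused are allowed to fail.
    """
    return all(
        succeeded
        for child_id, succeeded in child_results.items()
        if child_id not in excused
    )


# FindMoney from the example: the wallet was found, the purse was not.
results = {"FindWallet": True, "FindPurse": False}
find_money_ok = parent_succeeded(results, excused={"FindPurse"})
```

With FindPurse excused, find_money_ok is true; without the excusal the same results would cause FindMoney to fail.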
15. Server/Client Introduction
Workflow Manager 2 uses a popular server/client framework known as XML-RPC. This means that the workflow manager’s engine is placed in a simplified web-server. This server can be sent messages from a client in the form of Actions. A server may also have Events which can be triggered by sending the appropriate Action to the server.
15.1. Actions
Actions are requests to a workflow engine server, which are stored on the client side. Each client can have its own set of actions. An action is made up of a set of calls to the server, which either produce some formatted output or tell the server to perform given tasks.
15.1.1. Pre-conditions
Actions may have pre-conditions attached to them (these are not to be confused with workflow pre-conditions). These are checks performed to ensure the action is runnable at the current moment. The only action pre-condition currently implemented and used is EnsureServerFullyLoaded. This pre-condition checks whether all workflow processors have been reloaded after an engine server restart. If the pre-condition succeeds, a logging message is printed before each action’s own logging:
May 11, 2010 9:38:58 AM gov.nasa.jpl.oodt.cas.workflow.precondition.PreConditionedComponent passesPreConditions
INFO: Successfully passed action precondition 'EnsureServerFullyLoaded'
Otherwise you will get a logging message saying that it failed, and the action will not be run. In that case, you must wait until the server has fully reloaded all its workflow processors. This usually happens when you bring the workflow engine server down with a lot of workflows in the system: when you bring it back up, it must reload the necessary workflow processor information (i.e. InstanceId, ModelId, State, etc.) into memory for each processor, and it also validates the state of each workflow processor so that any task running at the time of shutdown will get a chance to re-execute. You can check on the progress of this by running the action GetPercentLoaded.
15.2. Events
Events are requests to the server, which are stored on the server end. Any post server request processing is performed on the server side. The client executes an action that triggers an event, stored on the server side. Once an event is triggered there is currently no way of cancelling it besides force killing the server.
16. Workflow Engine Server Control
The workflow engine can run in two modes: Normal and Debug. Normal mode brings the engine up with its processing threads running, so workflows are processed as usual. Debug mode brings the engine up without creating the two processing threads discussed in the section about the Engine. This can be considered a query-only mode, for it allows the user to analyze the current state of the engine and its workflows without the engine processing any of its workflows (i.e. everything is frozen while in debug mode).
16.1. Command-Line
Here we will start with some actual command line controls for the workflow manager. All commands should be run from the directory they exist in (i.e. workflow deployment bin directory).
16.1.1. Startup
Launches the workflow engine server:
$ ./engine start
16.1.2. Shutdown
Shuts down the running workflow engine server (NOTE: this command wraps the Shutdown action which performs a safe engine shutdown – if the engine for some reason gets into a weird state, this command may not work, in which case you may have to force kill the engine):
$ ./engine stop
16.1.3. Debug
Debug mode brings up the workflow engine server in a non-processing startup mode (same as startup, except the server will not process any workflows). In other words, this is a query-only mode. The following command launches the workflow engine server in debug mode (NOTE: server must be down):
$ ./engine debug
17. Using Command-Line Client
All command line communication with a running workflow engine server is done using the ‘engine-client’ script. To get this script’s usage on the command-line, use the option -h or --help. Any time ‘engine-client’ is used, the ‘-cfb’ option is required. The argument to this option should be the factory bean which connects to the workflow engine server you wish to communicate with (factory beans will be discussed later; for now, always use the bean WorkflowEngineClientFactory). The ‘-a’ option, followed by the name of an action, performs that action on the given server. At any time, you can get a list of a client’s supported actions by running: $ ./engine-client -psa.
18. Actions Enumerated
18.1. TriggerEvent
18.1.1. Description
Triggers an event on the Workflow Engine Server
18.1.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a TriggerEvent \
-eid <EventId> \
[-m <key> <val> <val> . . . <val>]*
- EventId is any id returned by the action PrintSupportedEvents.
- -m is optional, may be specified 0-n times, and is for passing metadata to the event
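When scripting against the client, the usage pattern above can be assembled programmatically. The Python helper below is a hypothetical convenience, not part of the workflow manager; it only arranges options already described in this guide (-cfb, -a, action-specific flags such as -eid, and repeated -m) into an argument list, which could then be handed to something like subprocess.run.

```python
def build_engine_client_command(
    action,
    options=None,
    metadata=None,
    factory_bean="WorkflowEngineClientFactory",
):
    """Build the argument list for one ./engine-client invocation.

    options  -- dict of action-specific flags, e.g. {"-eid": "RunTest"}
    metadata -- dict mapping a metadata key to a list of values; each key
                becomes its own "-m <key> <val> ... <val>" group
    """
    argv = ["./engine-client", "-cfb", factory_bean, "-a", action]
    for flag, value in (options or {}).items():
        argv += [flag, str(value)]
    for key, values in (metadata or {}).items():
        argv += ["-m", key, *[str(v) for v in values]]
    return argv


# Equivalent to: ./engine-client -cfb WorkflowEngineClientFactory -a TriggerEvent -eid RunTest
command = build_engine_client_command("TriggerEvent", {"-eid": "RunTest"})
```

Building a list rather than a single string avoids shell quoting problems when metadata values contain spaces.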
18.1.3. Example
This example will run the default built-in test workflow, which tests to ensure metadata flows through workflow processors as per design:
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a TriggerEvent \
-eid RunTest
This really just submits the workflow TestWorkflow. The same could be done with the StartWorkflow action:
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a StartWorkflow \
-mid TestWorkflow
The difference between the two is that TriggerEvent submits the workflow on the server side, whereas StartWorkflow is an action performed by the client, telling the server to submit the workflow.
18.1.4. Output
- If the event was successfully triggered expect to see output similar to:
May 11, 2010 9:01:53 AM gov.nasa.jpl.oodt.cas.workflow.server.action.TriggerEvent performAction
INFO: Successfully triggered event 'RunTest' with input metadata '{}'
NOTE: By starting a workflow using an event, you will not receive an InstanceId for the workflow on the client side; it will only print out on the server side. If you need an InstanceId back, you must instead use the StartWorkflow action as shown in the example section above.
- If we submit an event that doesn’t exist, such as RunTest1, expect to see output similar to (Note: Exception messages are concatenated by colons, so usually the last message is the message you will be interested in, in this case: ‘Event RunTest1 not registered with this server’):
May 11, 2010 9:18:12 AM gov.nasa.jpl.oodt.cas.workflow.server.CommandLineClient main
SEVERE: Failed to submit action 'TriggerEvent' to engine server via client factory bean 'WorkflowEngineClientFactory' : org.apache.xmlrpc.XmlRpcException: java.lang.Exception: gov.nasa.jpl.oodt.cas.workflow.exceptions.EngineException: Failed to trigger event RunTest1 : Event RunTest1 not registered with this server
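Since the message of interest is usually the last one in the colon-separated chain, it can be pulled out mechanically. The Python sketch below is an illustration only; the function name is hypothetical, and it assumes the final message is preceded by the separator ' : ' (space, colon, space) as in the sample line above.

```python
def last_message(log_line):
    """Return the final message of a colon-concatenated exception line.

    Assumes the last message is preceded by ' : ', as in the sample output.
    If no such separator is present, the whole line is returned.
    """
    return log_line.rsplit(" : ", 1)[-1].strip()


sample = (
    "SEVERE: Failed to submit action 'TriggerEvent' to engine server via "
    "client factory bean 'WorkflowEngineClientFactory' : "
    "org.apache.xmlrpc.XmlRpcException: java.lang.Exception: "
    "gov.nasa.jpl.oodt.cas.workflow.exceptions.EngineException: "
    "Failed to trigger event RunTest1 : "
    "Event RunTest1 not registered with this server"
)
reason = last_message(sample)
```

For the sample line, reason holds the text ‘Event RunTest1 not registered with this server’.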
18.2. GetWorkflowsByState
18.2.1. Description
Gets a list of workflow processors in the given state – this is based off the top or root workflow processor (i.e. In the grocery store example in section Lifecycles in Lifecycles, this would be the state of BuyGroceries).
18.2.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetWorkflowsByState \
-pn 1 \
-ps 10 \
-st <State>
- State is any state returned by the action GetSupportedStates.
- -pn is the page number you want to view
- -ps is the page size or, in other words, the number of workflow processors you want to view at a time.
18.2.3. Example
This example gives you the first page of workflows in the Success state (Note: last page may not be a full page):
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetWorkflowsByState \
-pn 1 \
-ps 10 \
-st Success
18.2.4. Output
Output from the example given should look something like:
May 11, 2010 9:38:58 AM gov.nasa.jpl.oodt.cas.workflow.precondition.PreConditionedComponent passesPreConditions
INFO: Successfully passed action precondition 'EnsureServerFullyLoaded'
Workflows In State 'Success' (Page: 1/2; Total: 17):
- InstanceId = 'f394642b-a757-4061-8441-984f11dd9bef', ModelId = 'TestWorkflow'
- InstanceId = 'fec29fb3-a44b-42b0-868e-57daa31e7c6e', ModelId = 'TestWorkflow'
- InstanceId = 'd51b1d75-a70d-4e47-9a5e-36629c533937', ModelId = 'urn:acce:DemoWorkflow'
- InstanceId = 'fb56b73c-b405-4750-908d-e3fcd49614fc', ModelId = 'urn:acce:DemoWorkflow'
- InstanceId = 'bade9aec-f72f-43e6-9ce0-219629d06f94', ModelId = 'urn:acce:DemoWorkflow'
- InstanceId = 'abe9715a-1e89-4ea2-8731-99e03b29ddae', ModelId = 'urn:acce:DemoWorkflow'
- InstanceId = '2fcf6041-4aa1-46b8-8923-df1e67094c67', ModelId = 'urn:acce:DemoWorkflow'
- InstanceId = 'f06a6f83-e5de-4c88-8a90-2c959b3806ae', ModelId = 'urn:acce:DemoWorkflow'
- InstanceId = '02ff1981-8153-4ed1-a519-cf140d8fc2e4', ModelId = 'urn:acce:DemoWorkflow'
- InstanceId = '1e466d5f-483b-4d52-95bd-89579d73602e', ModelId = 'TestWorkflow'
- This output gives you a list of InstanceIds and the root workflow processor ModelId for that workflow.
- The line: ‘Workflows In State 'Success' (Page: 1/2; Total: 17):’ shows you that there is a total of two pages of workflows in Success state, and you are viewing the first page. It also tells you that there are a total of 17 workflows in Success state.
- In order to get more information on each workflow listed, take each InstanceId and pass it to the action PrintWorkflow as the -iid.
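The page counts reported by the paging actions follow from simple arithmetic on the total and the page size. The Python sketch below reproduces that arithmetic for illustration; the function names are hypothetical, and the assumption that pages are numbered from 1 matches the -pn 1 used in the examples.

```python
import math


def page_count(total, page_size):
    """Number of pages needed to show `total` items."""
    return math.ceil(total / page_size)


def page_bounds(page_number, page_size, total):
    """Zero-based [start, end) item range for a 1-based page number."""
    start = (page_number - 1) * page_size
    end = min(start + page_size, total)
    return start, end


pages = page_count(17, 10)        # 17 workflows, 10 per page
first, last = page_bounds(2, 10, 17)
```

With 17 workflows and a page size of 10 there are two pages, and the second page holds only the remaining 7 workflows, which is why the last page may not be a full page.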
18.3. GetWorkflowsByCategory
18.3.1. Description
Gets a list of workflow InstanceIds in the given state category
18.3.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetWorkflowsByCategory \
-pn 1 \
-ps 10 \
-cat <Category>
- Category is any category returned by the action GetSupportedStates.
- -pn is the page number you want to view
- -ps is the page size or, in other words, the number of workflow processors you want to view at a time.
18.3.3. Example
This example gives you the first page of workflows in the category Done (Note: last page may not be a full page):
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetWorkflowsByCategory \
-pn 1 \
-ps 10 \
-cat Done
18.3.4. Output
May 11, 2010 2:23:08 PM gov.nasa.jpl.oodt.cas.workflow.precondition.PreConditionedComponent passesPreConditions
INFO: Successfully passed action precondition 'EnsureServerFullyLoaded'
Workflows In Category 'Done' (Page: 1/2; Total: 19):
- InstanceId = 'f394642b-a757-4061-8441-984f11dd9bef', ModelId = 'TestWorkflow', State = 'Success'
- InstanceId = 'fec29fb3-a44b-42b0-868e-57daa31e7c6e', ModelId = 'TestWorkflow', State = 'Success'
- InstanceId = 'd51b1d75-a70d-4e47-9a5e-36629c533937', ModelId = 'urn:acce:DemoWorkflow', State = 'Success'
- InstanceId = '004ec01c-a3f0-4e63-8617-e2e6871df3ce', ModelId = 'urn:acce:DemoWorkflow', State = 'Failure'
- InstanceId = 'fb56b73c-b405-4750-908d-e3fcd49614fc', ModelId = 'urn:acce:DemoWorkflow', State = 'Success'
- InstanceId = 'bade9aec-f72f-43e6-9ce0-219629d06f94', ModelId = 'urn:acce:DemoWorkflow', State = 'Success'
- InstanceId = 'abe9715a-1e89-4ea2-8731-99e03b29ddae', ModelId = 'urn:acce:DemoWorkflow', State = 'Success'
- InstanceId = '2fcf6041-4aa1-46b8-8923-df1e67094c67', ModelId = 'urn:acce:DemoWorkflow', State = 'Success'
- InstanceId = 'f06a6f83-e5de-4c88-8a90-2c959b3806ae', ModelId = 'urn:acce:DemoWorkflow', State = 'Success'
- InstanceId = '02ff1981-8153-4ed1-a519-cf140d8fc2e4', ModelId = 'urn:acce:DemoWorkflow', State = 'Success'
- The line: ‘Workflows In Category 'Done' (Page: 1/2; Total: 19):’ shows you that there is a total of two pages of workflows in the category Done, and you are viewing the first page. It also tells you that there are a total of 19 workflows in the category Done.
- In order to get more information on each workflow listed, take each InstanceId and pass it to the action PrintWorkflow as the -iid.
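To feed the listed InstanceIds into PrintWorkflow from a script, the listing lines can be parsed with a regular expression. The Python sketch below assumes the output format is exactly as shown in the samples in this section (entries beginning with ‘- InstanceId’ and fields written as Name = 'value'); the function names are hypothetical.

```python
import re

# Matches fields such as:  InstanceId = 'f394642b-...'
FIELD = re.compile(r"(\w+) = '([^']*)'")


def parse_entry(line):
    """Turn one listing line into a dict of its fields."""
    return dict(FIELD.findall(line))


def instance_ids(output):
    """Collect the InstanceId of every listing line in the client output."""
    ids = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("- InstanceId"):
            ids.append(parse_entry(line)["InstanceId"])
    return ids


sample_output = """\
INFO: Successfully passed action precondition 'EnsureServerFullyLoaded'
Workflows In Category 'Done' (Page: 1/2; Total: 19):
- InstanceId = 'f394642b-a757-4061-8441-984f11dd9bef', ModelId = 'TestWorkflow', State = 'Success'
- InstanceId = '004ec01c-a3f0-4e63-8617-e2e6871df3ce', ModelId = 'urn:acce:DemoWorkflow', State = 'Failure'
"""
ids = instance_ids(sample_output)
```

Each id in ids can then be passed as the -iid argument of PrintWorkflow.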
18.4. GetWorkflowsByModelId
18.4.1. Description
Returns a page of Workflow Processors of a given ModelId
18.4.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetWorkflowsByModelId \
-pn 1 \
-ps 10 \
-mid <ModelId>
- ModelId is any ModelId returned by the action GetSupportedWorkflows.
- -pn is the page number you want to view
- -ps is the page size or, in other words, the number of workflow processors you want to view at a time.
18.4.3. Example
This example gives you the first page of workflows of ModelId ‘TestWorkflow’ (Note: last page may not be a full page):
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetWorkflowsByModelId \
-pn 1 \
-ps 10 \
-mid TestWorkflow
18.4.4. Output
May 11, 2010 3:02:44 PM gov.nasa.jpl.oodt.cas.workflow.precondition.PreConditionedComponent passesPreConditions
INFO: Successfully passed action precondition 'EnsureServerFullyLoaded'
Workflows (Page: 1/1; Total: 6):
- InstanceId = 'f394642b-a757-4061-8441-984f11dd9bef', ModelId = 'TestWorkflow', State = 'Success'
- InstanceId = 'fec29fb3-a44b-42b0-868e-57daa31e7c6e', ModelId = 'TestWorkflow', State = 'Success'
- InstanceId = '1e466d5f-483b-4d52-95bd-89579d73602e', ModelId = 'TestWorkflow', State = 'Success'
- InstanceId = '8d4d8636-946d-46e2-86ee-7a4ad0b6a305', ModelId = 'TestWorkflow', State = 'Success'
- InstanceId = '8ce19e83-8f27-4786-96e9-28c2896aa524', ModelId = 'TestWorkflow', State = 'Success'
- InstanceId = '1bc063ea-8fe9-4f3e-b11f-0a653c5fdf5a', ModelId = 'TestWorkflow', State = 'Success'
- The line: ‘Workflows (Page: 1/1; Total: 6):’ shows you that there is only one page of workflows of ModelId ‘TestWorkflow’. It also tells you that there are a total of 6 workflows of ModelId ‘TestWorkflow’.
- In order to get more information on each workflow listed, take each InstanceId and pass it to the action PrintWorkflow as the -iid.
18.5. GetWorkflowsByMetadata
18.5.1. Description
Returns a page of Workflow Processors whose cached metadata match the given metadata
18.5.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetWorkflowsByMetadata \
-pn 1 \
-ps 10 \
[-m <key> <val> <val> . . . <val>]+
- -pn is the page number you want to view
- -ps is the page size or, in other words, the number of workflow processors you want to view at a time.
- -m must be specified 1-n times and is a metadata key followed by n number of values treated as ORs. Any number of ‘-m’ options may be used and they are treated as ANDs.
- NOTE: Only the cached metadata keys can be used here.
18.5.3. Example
This example returns Workflows which have workflow metadata CollectionLabel equal to Test.
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetWorkflowsByMetadata \
-pn 1 \
-ps 10 \
-m CollectionLabel Test
If we had instead passed ‘-m CollectionLabel Test Ops -m Group MetOpA’, the action would return a page of workflows whose workflow metadata satisfies (CollectionLabel == Test OR CollectionLabel == Ops) AND (Group == MetOpA):
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetWorkflowsByMetadata \
-pn 1 \
-ps 10 \
-m CollectionLabel Test Ops \
-m Group MetOpA
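The OR-within-a-key, AND-across-keys rule can be expressed compactly. The Python sketch below illustrates the matching rule only; it is not the server’s query code, and the function name and the representation of metadata as a dict of value lists are assumptions.

```python
def matches(workflow_metadata, query):
    """Apply the -m matching rule to one workflow's cached metadata.

    workflow_metadata -- dict mapping a key to the list of values it holds
    query             -- dict mapping a key to the acceptable values;
                         values for one key are ORed, keys are ANDed
    """
    return all(
        any(value in workflow_metadata.get(key, ()) for value in accepted)
        for key, accepted in query.items()
    )


query = {"CollectionLabel": ["Test", "Ops"], "Group": ["MetOpA"]}
ops_workflow = {"CollectionLabel": ["Ops"], "Group": ["MetOpA"]}
other_group = {"CollectionLabel": ["Test"], "Group": ["MetOpB"]}
```

Here ops_workflow matches the query because Ops is one of the accepted CollectionLabel values and Group is MetOpA, while other_group does not match because its Group differs.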
18.6. DeleteWorkflowsByState
18.6.1. Description
Deletes workflows in the given state
18.6.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a DeleteWorkflowsByState \
-st <State>
- State is any state returned by the action GetSupportedStates.
18.6.3. Example
This example deletes all workflows in state Failure:
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a DeleteWorkflowsByState \
-st Failure
18.6.4. Output
May 11, 2010 3:23:22 PM gov.nasa.jpl.oodt.cas.workflow.precondition.PreConditionedComponent passesPreConditions
INFO: Successfully passed action precondition 'EnsureServerFullyLoaded'
Successfully Deleted Workflow '004ec01c-a3f0-4e63-8617-e2e6871df3ce'
Successfully Deleted Workflow '1501d636-762c-4606-82d2-dd301a5a3c31'
- This output gives you a list of InstanceIds for the workflows deleted.
18.7. DeleteWorkflowsByCategory
18.7.1. Description
Deletes workflows in the given category
18.7.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a DeleteWorkflowsByCategory \
-cat <Category>
- Category is any category returned by the action GetSupportedStates.
18.7.3. Example
This example deletes all workflows in category Done:
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a DeleteWorkflowsByCategory \
-cat Done
18.7.4. Output
May 11, 2010 3:23:22 PM gov.nasa.jpl.oodt.cas.workflow.precondition.PreConditionedComponent passesPreConditions
INFO: Successfully passed action precondition 'EnsureServerFullyLoaded'
Successfully Deleted Workflow '004ec01c-a3f0-4e63-8617-e2e6871df3ce'
Successfully Deleted Workflow '1501d636-762c-4606-82d2-dd301a5a3c31'
- This output gives you a list of InstanceIds for the workflows deleted.
18.8. GetSupportedStates
18.8.1. Description
Gets a list of states supported by the server's engine
18.8.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetSupportedStates
18.8.3. Example
Same as usage.
18.8.4. Output
Category 'TRANSITION':
- ExecutionComplete : Execution Completed Successfully
- PreConditionSuccess : All PreCondition Finished Successfully
Category 'HOLDING':
- Paused : Has been manually paused
- Unknown : State is Unknown
Category 'INITIAL':
- Null : Uninitialied State
- Loaded : Loading Complete
Category 'RUNNING':
- Executing : Current being executed
- PostConditionEval : Executing PostConditions
- PreConditionEval : Executing PreConditions
Category 'DONE':
- Failure : Execution Failed
- Off : Turned OFF
- Stopped : Force Killed
- Success : Successfully Completed
Category 'WAITING':
- Blocked : Task Bailed
- Queued : Queued in WorkflowEngine
- Ready : Ready to run
- All States are listed (<state> : <description>), grouped by category, as described in section Workflow State Categories.
18.9. GetSupportedWorkflows
18.9.1. Description
Gets a list of workflow ModelIds supported by the server's engine
18.9.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetSupportedWorkflows
18.9.3. Example
Same as usage.
18.9.4. Output
Supported Workflow ModelIds:
- L0b
- urn:acce:DemoWorkflow
- cond1
- urn:npp:MoaIasiAnalysis
- urn:acce:DemoPGE
- urn:npp:MOA_IASI_L1C_QueryCondition_FileBased
- L0c
- cond2
- cond3
- Orbit
- urn:npp:MoaAnalysis
- L0d
- TestWorkflow
- urn:acce:DemoPGERequiredMetadata
- GeoCal
- Ane
- urn:npp:ECMWF_L1_QueryCondition
- L0a
- L1a
- Geo
- This is a list of workflow ModelIds which can be used for -mid for action StartWorkflow.
18.10. PrintWorkflow
18.10.1. Description
Prints out processor skeleton
18.10.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a PrintWorkflow \
-iid <InstanceId>
- InstanceId is any InstanceId for a workflow put into the server engine (i.e. by using action StartWorkflow).
18.10.3. Example
This example prints out the workflow’s processor skeleton (similar to a workflow’s model graph, except that it is the graph of workflow processors, which additionally contains state and workflow dynamic metadata):
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a PrintWorkflow \
-iid f394642b-a757-4061-8441-984f11dd9bef
18.10.4. Output
Instance 'f394642b-a757-4061-8441-984f11dd9bef'
[id = 'TestWorkflow', name = 'TestWorkflow', execution = 'sequential', state = 'Success']
[id = 'Orbit', name = 'Orbit', execution = 'task', state = 'Success']
[id = 'tester', name = 'tester', execution = 'sequential', state = 'Success']
[id = 'edb88451-5dba-4ad8-bb59-aed04634e0b4', name = 'edb88451-5dba-4ad8-bb59-aed04634e0b4', execution = 'parallel', state = 'Success']
[id = 'L0', name = 'L0', execution = 'sequential', state = 'Success']
[id = 'L0a', name = 'L0a', execution = 'task', state = 'Success']
[id = 'L0b', name = 'L0b', execution = 'task', state = 'Success']
[id = 'L0c', name = 'L0c', execution = 'task', state = 'Success']
[id = 'L0d', name = 'L0d', execution = 'task', state = 'Success']
[id = 'L1a', name = 'L1a', execution = 'sequential', state = 'Success']
[id = 'L1aTask', name = 'L1aTask', execution = 'task', state = 'Success']
[id = '5c9a9a95-0d3c-44d7-8eff-38b3ece5f087', name = '5c9a9a95-0d3c-44d7-8eff-38b3ece5f087', execution = 'sequential', state = 'Success']
[id = 'Ane', name = 'Ane', execution = 'task', state = 'Success']
[id = 'da41aab1-b840-45aa-a983-d66c74def9c5', name = 'da41aab1-b840-45aa-a983-d66c74def9c5', execution = 'parallel', state = 'Success']
[id = 'GeoCal', name = 'GeoCal', execution = 'sequential', state = 'Success']
[id = 'GeoCalTask', name = 'GeoCalTask', execution = 'task', state = 'Success']
[id = 'Geo', name = 'Geo', execution = 'sequential', state = 'Success']
[id = 'GeoTask', name = 'GeoTask', execution = 'task', state = 'Success']
- The Grocery Store example, described in section Lifecycles in Lifecycles, explains how to interpret this output.
18.11. DescribeWorkflow
18.11.1. Description
Prints descriptive information for a single workflow processor, without recursing into its children
18.11.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a DescribeWorkflow \
-iid <InstanceId> \
-mid <ModelId>
- InstanceId is the InstanceId of any workflow in the workflow engine.
- ModelId is any ModelId for any workflow (sub-workflow) in the workflow attached to InstanceId
18.11.3. Example
This action expands on the PrintWorkflow action (see Output). After you print out a workflow, you can then home in on a particular workflow processor in that workflow to find out its metadata, state, etc. In this case, we want to see information on Orbit (TestWorkflow’s first child workflow).
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a DescribeWorkflow \
-iid f394642b-a757-4061-8441-984f11dd9bef \
-mid Orbit
18.11.4. Output
Processor [id = 'Orbit', name = 'Orbit']
- instance = 'f394642b-a757-4061-8441-984f11dd9bef'
- execution = 'task'
- state = 'Success : No PostConditions; successfully completed : Successfully Executed Task 'e1386a57-ce7e-48a2-a4a2-f2f73be0e2bb' : No key name requested to look for'
- priority = 'CUSTOM : 5.1'
- execusedSubProcessors = ''
- static metadata =
+ ImportedKey -> 'ImportedValue'
- dynamic metadata =
+ Orbit_key -> 'f394642b-a757-4061-8441-984f11dd9bef'
- instance: this is the InstanceId of the workflow processor it belongs to.
- execution: this is the execution type (i.e. parallel, sequential, task, or condition).
- state: this is the state and the message attached to it. The message is a concatenation of messages, separated by colons. In this example, Orbit is in state Success, and it reached this state because: 1) ‘No key name requested to look for’ (a message from the TaskInstance itself; the job of TestWorkflow is to verify metadata flow, which its tasks do by looking for metadata keys set by the tasks and conditions that ran before them, and since Orbit is the first task in this workflow, there are no keys for it to look for); 2) ‘Successfully Executed Task 'e1386a57-ce7e-48a2-a4a2-f2f73be0e2bb'’ (execution of its TaskInstance, with JobId = ‘e1386a57-ce7e-48a2-a4a2-f2f73be0e2bb’, was successful); and 3) ‘No PostConditions; successfully completed’ (it completed successfully and had no post-conditions).
- We also see priority (see Workflow Priority), excusedSubProcessors (see Excused Sub-Processors), and static/dynamic metadata (see Context).
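The state field can be split into the state name and its chain of messages. The Python sketch below is illustrative only; the function name is hypothetical, and it assumes the separator is ' : ' and that the most recent message comes first, as the sample output suggests. A message that itself contains ' : ' would be split incorrectly.

```python
def parse_state(state_field):
    """Split a DescribeWorkflow state field into (state, messages).

    Messages are returned oldest first, on the assumption that the engine
    places the newest message at the front of the chain.
    """
    state, *messages = [part.strip() for part in state_field.split(" : ")]
    return state, list(reversed(messages))


field = (
    "Success : No PostConditions; successfully completed : "
    "Successfully Executed Task 'e1386a57-ce7e-48a2-a4a2-f2f73be0e2bb' : "
    "No key name requested to look for"
)
state, history = parse_state(field)
```

For the Orbit example, state is Success and history lists the three messages in the order they are explained above, starting with the TaskInstance’s own message.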
18.12. DeleteWorkflow
18.12.1. Description
Deletes Workflow and Task Instances
18.12.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a DeleteWorkflow \
-iid <InstanceId>
- InstanceId is the InstanceId of any workflow processor in the workflow engine.
18.12.3. Example
This example will delete the workflow in the system with InstanceId equal to ‘004ec01c-a3f0-4e63-8617-e2e6871df3ce’:
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a DeleteWorkflow \
-iid 004ec01c-a3f0-4e63-8617-e2e6871df3ce
18.12.4. Output
May 11, 2010 3:23:22 PM gov.nasa.jpl.oodt.cas.workflow.precondition.PreConditionedComponent passesPreConditions
INFO: Successfully passed action precondition 'EnsureServerFullyLoaded'
Successfully Deleted Workflow '004ec01c-a3f0-4e63-8617-e2e6871df3ce'
18.13. GetRunnablesPage
18.13.1. Description
Returns a page of runnable Workflow Task Processors
18.13.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetRunnablesPage \
-pn <PageNumber> \
-ps <PageSize>
- PageNumber is the page number you want to view
- PageSize is the page size or, in other words, the number of workflow processors you want to view at a time.
18.13.3. Example
This example gets the first page of size 10 of all processor tasks (and conditions) ready to run:
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetRunnablesPage \
-pn 1 \
-ps 10
18.13.4. Output
Workflows (Page: 1/1; Total: 5):
- InstanceId = '9e3be2d4-fb9d-491b-963b-53406c00f4e4', ModelId = 'urn:acce:DemoPGERequiredMetadata', State = 'Ready'
- InstanceId = '4cccd377-4421-419a-8f37-ed5240979e82', ModelId = 'urn:acce:DemoPGERequiredMetadata', State = 'Ready'
- InstanceId = 'adb466ad-ddf3-4013-8fb4-a01bebd1c835', ModelId = 'urn:acce:DemoPGERequiredMetadata', State = 'Ready'
- InstanceId = '8428a38e-6e17-4ad3-80c0-380c0ac57721', ModelId = 'urn:acce:DemoPGERequiredMetadata', State = 'Ready'
- InstanceId = '87e7a583-5e85-41f5-9189-e41799a3448f', ModelId = 'urn:acce:DemoPGERequiredMetadata', State = 'Ready'
- InstanceId is the InstanceId of the workflow it belongs to.
- ModelId is the task (or condition) ModelId
- State is the state the task or condition is in.
18.14. GetExecutingPage
18.14.1. Description
Returns a page of executing Workflow Task Processors
18.14.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetExecutingPage \
-pn <PageNumber> \
-ps <PageSize>
- PageNumber is the page number you want to view
- PageSize is the page size or, in other words, the number of workflow processors you want to view at a time.
18.14.3. Example
This example gets the first page of size 10 of all processor tasks (and conditions) in the executing queue:
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetExecutingPage \
-pn 1 \
-ps 10
18.14.4. Output
Workflows (Page: 1/1; Total: 1):
- InstanceId = '88e92a9a-8f37-45d7-abd5-f6e83ce1e38d', ModelId = 'urn:acce:DemoPGERequiredMetadata', State = 'Executing'
- InstanceId = '87e7a583-5e85-41f5-9189-e41799a3448f', ModelId = 'urn:acce:DemoPGERequiredMetadata', State = 'Ready'
- InstanceId is the InstanceId of the workflow it belongs to.
- ModelId is the task (or condition) ModelId
- State is the state the task or condition is in (tasks in the executing queue can be in Ready
18.15. GetPage
18.15.1. Description
Returns a page of Workflow Processors
18.15.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetPage \
-pn <PageNumber> \
-ps <PageSize>
- PageNumber is the page number you want to view
- PageSize is the page size, in other words the number of entries returned per page
18.15.3. Example
This example gets the first page of size 10 of all workflows in the engine:
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetPage \
-pn 1 \
-ps 10
18.15.4. Output
Workflows (Page: 1/7; Total: 66):
- InstanceId = '818aa23c-d025-4fb8-a66f-888e9a6b6430', ModelId = 'urn:acce:DemoWorkflow', State = 'Success'
- InstanceId = 'f394642b-a757-4061-8441-984f11dd9bef', ModelId = 'TestWorkflow', State = 'Success'
- InstanceId = 'fec29fb3-a44b-42b0-868e-57daa31e7c6e', ModelId = 'TestWorkflow', State = 'Success'
- InstanceId = '2f8dee98-520d-40fe-81f0-e036350a907a', ModelId = 'urn:acce:DemoWorkflow', State = 'Success'
- InstanceId = '419ecc11-9792-4e64-b4d3-888e680483ce', ModelId = 'urn:acce:DemoWorkflow', State = 'Success'
- InstanceId = '70d6a09a-8b8e-43ba-935d-7aad1d005f18', ModelId = 'urn:acce:DemoWorkflow', State = 'Success'
- InstanceId = '72827858-c22f-4b17-adb0-33621c7b5712', ModelId = 'urn:acce:DemoWorkflow', State = 'Success'
- InstanceId = 'a0118f1f-2f4f-478b-a27d-6938cfb6dfe3', ModelId = 'urn:acce:DemoWorkflow', State = 'Success'
- InstanceId = '03173c22-0e75-4df8-9fb4-d9c302bad150', ModelId = 'urn:acce:DemoWorkflow', State = 'Success'
- InstanceId = '41c66961-2638-4cd7-9cb1-15dfd3918aae', ModelId = 'urn:acce:DemoWorkflow', State = 'Success'
- InstanceId is the InstanceId of the workflow.
- ModelId is the ModelId of the root workflow processor.
- State is the state that the root workflow processor is in.
- NOTE: Workflows returned by this action are not sorted or ordered in any way; the action is meant as a quick pager through the workflows in the engine. For sorted results see GetSortedPage.
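The page header in the output, for example 'Workflows (Page: 1/7; Total: 66)', is consistent with plain ceiling division of the total by the page size. The sketch below only illustrates that arithmetic; the function names are hypothetical, and the 1-based page numbering and ceiling division are assumptions drawn from the sample outputs, not from the engine's source.

```python
import math


def page_count(total, page_size):
    """Number of pages needed to show `total` entries, `page_size` per page."""
    return math.ceil(total / page_size)


def page_bounds(page_number, page_size, total):
    """Zero-based [start, end) indices of the entries on a 1-based page."""
    start = (page_number - 1) * page_size
    end = min(start + page_size, total)
    return start, end


# 66 workflows at 10 per page: 7 pages, the last one holding 6 entries.
print(page_count(66, 10))
print(page_bounds(7, 10, 66))
```

Under these assumptions, requesting -pn 7 -ps 10 on an engine holding 66 workflows would return the last six entries.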
18.16. GetSortedPage
18.16.1. Description
Returns a sorted results page of Workflow Processors
18.16.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetSortedPage \
-cmpr <Comparator> \
-pn <PageNumber> \
-ps <PageSize>
- Comparator is one of four values: AliveTime, CreationDate, ExecutionDate, or CompletionDate
- AliveTime : How long it took the root workflow processor of a given workflow to reach a state in the category Done.
- CreationDate : The date the workflow was put into the engine.
- ExecutionDate : The date at which the workflow changed to state Executing.
- CompletionDate : The date at which the workflow reached a state in the category Done.
- PageNumber is the page number you want to view
- PageSize is the page size, in other words the number of entries returned per page
18.16.3. Example
This example gets the first page of size 10 of workflows in the engine sorted by CreationDate in ascending order:
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetSortedPage \
-cmpr CreationDate \
-pn 1 \
-ps 10
18.16.4. Output
May 12, 2010 2:14:48 PM gov.nasa.jpl.oodt.cas.workflow.precondition.PreConditionedComponent passesPreConditions
INFO: Successfully passed action precondition 'EnsureServerFullyLoaded'
Workflows (Page: 1/7; Total: 66):
- InstanceId = 'f394642b-a757-4061-8441-984f11dd9bef', ModelId = 'TestWorkflow', State = 'Success', CreationDate = 'Wed May 05 10:56:51 PDT 2010'
- InstanceId = '8ce19e83-8f27-4786-96e9-28c2896aa524', ModelId = 'TestWorkflow', State = 'Success', CreationDate = 'Wed May 05 12:20:12 PDT 2010'
- InstanceId = '8d4d8636-946d-46e2-86ee-7a4ad0b6a305', ModelId = 'TestWorkflow', State = 'Success', CreationDate = 'Wed May 05 13:13:04 PDT 2010'
- InstanceId = 'fec29fb3-a44b-42b0-868e-57daa31e7c6e', ModelId = 'TestWorkflow', State = 'Success', CreationDate = 'Wed May 05 13:21:31 PDT 2010'
- InstanceId = '1bc063ea-8fe9-4f3e-b11f-0a653c5fdf5a', ModelId = 'TestWorkflow', State = 'Success', CreationDate = 'Wed May 05 13:53:26 PDT 2010'
- InstanceId = '226cd024-3a15-4e9f-90c1-6f3b9952f81a', ModelId = 'urn:acce:DemoWorkflow', State = 'Success', CreationDate = 'Wed May 05 13:59:02 PDT 2010'
- InstanceId = '570ba374-1a59-48a9-bf68-9ec439d281e4', ModelId = 'urn:acce:DemoWorkflow', State = 'Success', CreationDate = 'Wed May 05 14:12:08 PDT 2010'
- InstanceId = 'fb56b73c-b405-4750-908d-e3fcd49614fc', ModelId = 'urn:acce:DemoWorkflow', State = 'Success', CreationDate = 'Wed May 05 14:16:08 PDT 2010'
- InstanceId = 'abe9715a-1e89-4ea2-8731-99e03b29ddae', ModelId = 'urn:acce:DemoWorkflow', State = 'Success', CreationDate = 'Wed May 05 14:24:30 PDT 2010'
- InstanceId = 'f06a6f83-e5de-4c88-8a90-2c959b3806ae', ModelId = 'urn:acce:DemoWorkflow', State = 'Success', CreationDate = 'Wed May 05 14:29:05 PDT 2010'
- InstanceId is the InstanceId of the workflow.
- ModelId is the ModelId of the root workflow processor.
- State is the state that the root workflow processor is in.
- CreationDate is the date at which the workflow was created.
18.17. GetPercentLoaded
18.17.1. Description
Returns the percentage of loaded (cached) Workflow Processors. This is important because some actions/events can’t be performed until all Workflow Processors are cached (caching needs loading time after a server restart with Workflow Processors in the queue).
18.17.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a GetPercentLoaded
18.17.3. Example
Same as usage.
18.17.4. Output
Workflows Loaded (Percent: '100%', Decimal: '1', Faction: '66/66', ETA: '0 mins')
- Percent is the percentage of workflows loaded
- Decimal is the loaded fraction expressed as a decimal (1 means fully loaded)
- Faction is <workflows loaded>/<total number of workflows>
- ETA is the estimated time until fully loaded in minutes
18.18. PagedQuery
18.18.1. Description
Performs a paged query on workflow task instance jobs
18.18.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a PagedQuery \
-q <Query> \
-pn <PageNumber> \
-ps <PageSize>
- Query is a query for TaskInstance metadata repo (see Metadata). It is of the form: <metadata name> == ‘<metadata_value>’. It supports ANDs and ORs and ( ) for precedence.
- PageNumber is the page number you want to view
- PageSize is the page size, in other words the number of entries returned per page
18.18.3. Example
This example will give you all the metadata for each TaskInstance whose ModelId is equal to Orbit:
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a PagedQuery \
-q "ModelId == 'Orbit'" \
-pn 1 \
-ps 10
18.18.4. Output
Task Instance Metadata (Page: 1/1; Total: 2)
Metadata: (ReadyDate='2010-05-12T23:34:19.817Z', Host='spade.jpl.nasa.gov', State='ExecutionComplete', test_test='sdfsdf', ImportedKey='ImportedValue', InstanceId='bbc056ca-7310-4bf9-8445-819f333c95fe', urn:CatalogService:TransactionId='b8a9835e-64d4-4402-9c4a-bc42fe2df877', JobId='b8a9835e-64d4-4402-9c4a-bc42fe2df877', ExecutionDate='2010-05-12T23:34:22.161Z', CompletionDate='2010-05-12T23:34:22.608Z', urn:CatalogService:CatalogIds='urn:PEATE:WorkflowInstancesCatalog', Orbit_key='bbc056ca-7310-4bf9-8445-819f333c95fe', ModelId='Orbit', CreationDate='2010-05-12T23:34:19.694Z')
Metadata: (ReadyDate='2010-05-12T23:34:13.620Z', Host='spade.jpl.nasa.gov', State='ExecutionComplete', test_test='sdfsdf', ImportedKey='ImportedValue', InstanceId='1936500f-e15e-4c32-b7da-77ce320c0bc7', urn:CatalogService:TransactionId='2a052276-44fd-410e-8782-76e456b7469c', JobId='2a052276-44fd-410e-8782-76e456b7469c', ExecutionDate='2010-05-12T23:34:15.274Z', CompletionDate='2010-05-12T23:34:15.418Z', urn:CatalogService:CatalogIds='urn:PEATE:WorkflowInstancesCatalog', Orbit_key='1936500f-e15e-4c32-b7da-77ce320c0bc7', ModelId='Orbit', CreationDate='2010-05-12T23:34:13.587Z')
- ‘Task Instance Metadata (Page: 1/1; Total: 2)’ tells you that your query found a total of 2 TaskInstances and that you are on page 1 of 1.
- Metadata: lines are the metadata found for each TaskInstance (there are 2 of them, one for each TaskInstance).
- If you want to filter the metadata to a limited set of keys, then you should use ReducedPagedQuery.
18.19. ReducedPagedQuery
18.19.1. Description
Performs a reduced paged query on workflow task instance jobs
18.19.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a ReducedPagedQuery \
-q <Query> \
-rt <ReducedTerms> \
-pn <PageNumber> \
-ps <PageSize>
- Query is a query for TaskInstance metadata repo (see Metadata). It is of the form: <metadata name> == ‘<metadata_value>’. It supports ANDs and ORs and ( ) for precedence.
- ReducedTerms is a space-separated list of the metadata keys you would like returned for each TaskInstance found by the Query.
- PageNumber is the page number you want to view
- PageSize is the page size, in other words the number of entries returned per page
18.19.3. Example
This example will give you the InstanceId, JobId, and CompletionDate for each TaskInstance whose ModelId is equal to Orbit:
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a ReducedPagedQuery \
-q "ModelId == 'Orbit'" \
-rt InstanceId JobId CompletionDate \
-pn 1 \
-ps 10
18.19.4. Output
Task Instance Metadata (Page: 1/1; Total: 2)
Metadata: (InstanceId = 'bbc056ca-7310-4bf9-8445-819f333c95fe', JobId = 'b8a9835e-64d4-4402-9c4a-bc42fe2df877', CompletionDate = '2010-05-12T23:34:22.608Z')
Metadata: (InstanceId = '1936500f-e15e-4c32-b7da-77ce320c0bc7', JobId = '2a052276-44fd-410e-8782-76e456b7469c', CompletionDate = '2010-05-12T23:34:15.418Z')
- ‘Task Instance Metadata (Page: 1/1; Total: 2)’ tells you that your query found a total of 2 TaskInstances and that you are on page 1 of 1.
- Metadata: lines are the metadata found for each TaskInstance (there are 2 of them, one for each TaskInstance).
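If you need to post-process the Metadata: lines printed by PagedQuery or ReducedPagedQuery, they can be split into key/value pairs with a small script. The following is a minimal sketch, not part of CAS-Workflow2: the function name is hypothetical, and it assumes that values are always wrapped in single quotes and never contain a single quote themselves, as in the sample output above.

```python
import re

# A key (letters, digits, '_', ':', '.', '-'), optional spaces around '=',
# then a single-quoted value.
PAIR = re.compile(r"([\w:.\-]+)\s*=\s*'([^']*)'")


def parse_metadata_line(line):
    """Turn one 'Metadata: (...)' output line into a dict of key -> value."""
    return dict(PAIR.findall(line))


line = (
    "Metadata: (InstanceId = 'bbc056ca-7310-4bf9-8445-819f333c95fe', "
    "JobId = 'b8a9835e-64d4-4402-9c4a-bc42fe2df877', "
    "CompletionDate = '2010-05-12T23:34:22.608Z')"
)
parsed = parse_metadata_line(line)
print(parsed["JobId"])
```

The same pattern also accepts the compact key='value' form printed by PagedQuery, including keys that contain colons such as urn:CatalogService:CatalogIds.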
18.20. ChangeWorkflowState
18.20.1. Description
Changes a workflow's state
18.20.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a ChangeWorkflowState \
-iid <InstanceId> \
-mid <ModelId> \
-st <State>
- InstanceId is the InstanceId of any workflow in the workflow engine.
- ModelId is any ModelId for any workflow (sub-workflow) in the workflow attached to InstanceId
- State is any state from Workflow States to which you would like to change the workflow processor identified by the given InstanceId and ModelId.
18.20.3. Example
This example will pause the Orbit task processor:
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a ChangeWorkflowState \
-iid 004ec01c-a3f0-4e63-8617-e2e6871df3ce \
-mid Orbit \
-st Paused
18.20.4. Output
This command has no output if it is successful; it will just return. If you want to verify that it executed properly, you can perform a PrintWorkflow, passing it the InstanceId, or a DescribeWorkflow, giving it both the InstanceId and ModelId (in this case, Orbit).
18.21. ChangeWorkflowPriority
18.21.1. Description
Changes a workflow's priority
18.21.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a ChangeWorkflowPriority \
-iid <InstanceId> \
-mid <ModelId> \
-p <Priority>
- InstanceId is the InstanceId of any workflow in the workflow engine.
- ModelId is any ModelId for any workflow (sub-workflow) in the workflow attached to InstanceId
- Priority is any double between the values 1 to 10, with 1 being the lowest priority and 10 the highest. (See Workflow Priority).
18.21.3. Example
This example will set the Orbit task processor’s priority to 8.2:
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a ChangeWorkflowPriority \
-iid 004ec01c-a3f0-4e63-8617-e2e6871df3ce \
-mid Orbit \
-p 8.2
18.21.4. Output
This command has no output if it is successful; it will just return. If you want to verify that it executed properly, you can perform a DescribeWorkflow, giving it both the InstanceId and ModelId (in this case, Orbit).
18.22. PrintSupportedEvents
18.22.1. Description
Prints the events registered with server engine
18.22.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a PrintSupportedEvents
18.22.3. Example
Same as usage.
18.22.4. Output
Events:
Event:
Id: RunTest
Description: Runs TestWorkflow
Event:
Id: GranuleMaps
Description: Generates MetOpA Granule Maps
Event:
Id: DeleteWorkflowsByState
Description: Deletes workflows by state - Requires input metadata field: 'State'
Event:
Id: DeleteWorkflowsByCategory
Description: Delete workflows by category - Requires input metadata field: 'Category'
Event:
Id: GeneratePerformanceReport
Description: Generates Workflow Performance Report
- See Events Enumerated for how to use this information.
18.23. StartWorkflow
18.23.1. Description
Starts a workflow
18.23.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a StartWorkflow \
-mid <ModelId> \
-p <Priority> \
[-m <key> <val> <val> . . . <val>]*
- ModelId is any ModelId returned by the action GetSupportedWorkflows.
- Priority is any double between the values 1 to 10, with 1 being the lowest priority and 10 the highest. (See Workflow Priority).
- -m is optional, may be specified 0-n times, and is for passing metadata to the workflow.
18.23.3. Example
This example will create a workflow processor for the ModelId urn:acce:DemoWorkflow and add it to the engine’s queue to be run. It sets the priority of the workflow, and of all its conditions and child workflows, to 6.0, and sends all TaskInstances to the ‘local’ queue when they are sent to the ResourceRunner (see Runner):
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a StartWorkflow \
-mid urn:acce:DemoWorkflow \
-p 6.0 \
-m QueueName local
18.23.4. Output
May 12, 2010 5:17:25 PM gov.nasa.jpl.oodt.cas.workflow.server.action.StartWorkflow performAction
INFO: Started workflow [ModelId = 'urn:acce:DemoWorkflow',InstanceId = 'a32c2263-3ad2-48a1-b92a-78cf805d2857']
- This tells you that it started ‘urn:acce:DemoWorkflow’ and assigned it an InstanceId of ‘a32c2263-3ad2-48a1-b92a-78cf805d2857’. You can use this InstanceId to track your workflow by using the PrintWorkflow action.
18.24. Shutdown
18.24.1. Description
Shuts down the Workflow Engine Server
18.24.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a Shutdown
18.24.3. Example
Same as usage.
18.24.4. Output
There is no output for this action. When it returns, the server has shut down. If the server is not up, you will get output telling you that the client failed to connect to the server for the given client factory bean ‘WorkflowEngineClientFactory’.
19. Events Enumerated
All events are run using the TriggerEvent action. All events have the same output, which lets you know whether the event was triggered successfully or failed. All event logging goes to the server logs, since events run on the server.
19.1. RunTest
19.1.1. Description
Runs a test workflow to check metadata flow through the system
19.1.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a TriggerEvent \
-eid RunTest
19.1.3. Example
Same as usage.
19.2. DeleteWorkflowsByState
19.2.1. Description
Deletes workflows by state. This is performed entirely on the server side and cannot be cancelled, so make sure you have it right. You must specify the state using the ‘State’ metadata key.
19.2.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a TriggerEvent \
-eid DeleteWorkflowsByState \
-m State <State>
- State is any state returned by the action GetSupportedStates.
19.2.3. Example
This example deletes all workflows in the state Failure.
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a TriggerEvent \
-eid DeleteWorkflowsByState \
-m State Failure
19.3. DeleteWorkflowsByCategory
19.3.1. Description
Deletes workflows by category. This is performed entirely on the server side and cannot be cancelled, so make sure you have it right. You must specify the category using the ‘Category’ metadata key.
19.3.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a TriggerEvent \
-eid DeleteWorkflowsByCategory \
-m Category <Category>
- Category is any category returned by the action GetSupportedStates.
19.3.3. Example
This example deletes all the workflows in category Done.
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a TriggerEvent \
-eid DeleteWorkflowsByCategory \
-m Category Done
19.4. GeneratePerformanceReport
19.4.1. Description
Generates Workflow Performance Report.
19.4.2. Usage
$ ./engine-client \
-cfb WorkflowEngineClientFactory \
-a TriggerEvent \
-eid GeneratePerformanceReport
19.4.3. Example
Same as usage.
19.4.4. Output
Like all other events, this event has no logging output on the client side. However, it generates a file: $PCS_SUPPORT_HOME/performance/report.txt. If you rerun this event, it will overwrite the last file generated, so if you want to keep old reports, make sure to copy the current report file somewhere before running this event. Here is an example of a report.txt file:
Performance Report Generated on: Thu Jan 14 14:28:33 PST 2010
Time Up: 76.056 hours
*** Overall Performance Report ***
- Total Workflows Analyzed: 593
- Workflow Throughput: 7 workflows per hour
- Max Workflow Runtime: 1352 mins
- Min Workflow Runtime: 0 mins
- Average Workflow Runtime: 137 mins
*** 'urn:npp:MoaAnalysis' Performance Report ***
- Total Workflows Analyzed: 592
- Workflow Throughput: 7 workflows per hour
- Max Workflow Runtime: 1352 mins
- Min Workflow Runtime: 2 mins
- Average Workflow Runtime: 140 mins
*** 'urn:npp:GranuleMaps' Performance Report ***
- Total Workflows Analyzed: 1
- Workflow Throughput: 0 workflows per hour
- Max Workflow Runtime: 2 mins
- Min Workflow Runtime: 2 mins
- Average Workflow Runtime: 2 mins
The first line tells you the date on which the report.txt file was generated. Time Up is the number of hours from the earliest CreationDate of a workflow still in the engine to the date at which this event was run. The Overall Performance Report section is the combined reporting of all workflows of all ModelIds in the engine. After this section, there is a section for each ModelId of workflows in the engine. Only workflows which have reached the state Success are used to generate this report. In other words, the report tells you how many workflows the engine runs to successful completion every hour. The performance information is generated at the moment you run this event. The engine does not keep track of this information in real time, so if you delete a workflow in the Success state from the engine, it will not be included in any performance calculations run after its deletion. Each section of this report has the same layout; the only difference is the list of workflows used to generate each section. Here is how each part of a section is calculated:
- Total Workflows Analyzed: This is the number of workflows which were used for the performance calculations for this section.
- Workflow Throughput: This is the number of workflows in this section that have completed successfully on average per hour.
- Equation: Truncate(<Total Workflows Analyzed> / <Time Up>)
- Max Workflow Runtime: This is the longest time it took a workflow in this section to complete (NOTE: This number will go up as the ratio of workflows to resources gets larger).
- Equation: for all workflows find Max(<CompletionDate> - <CreationDate>)
- Min Workflow Runtime: This is the shortest time it took a workflow in this section to complete (NOTE: This number will go up as the ratio of workflows to resources gets larger).
- Equation: for all workflows find Min(<CompletionDate> - <CreationDate>)
- Average Workflow Runtime: This is the average time it took a workflow in this section to complete (NOTE: This number will go up as the ratio of workflows to resources gets larger).
- Equation: <Sum of workflow runtimes>/<Total Workflows Analyzed>
Figure 12
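The equations above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the engine's implementation: the function names and the sample dates are made up, runtimes are kept as fractional minutes because the source does not say how the report rounds them, and each workflow is represented as a (CreationDate, CompletionDate) pair.

```python
import math
from datetime import datetime


def time_up_hours(workflows, report_time):
    """Hours from the earliest CreationDate still in the engine to the report time."""
    earliest = min(created for created, _ in workflows)
    return (report_time - earliest).total_seconds() / 3600.0


def section_stats(workflows, up_hours):
    """Compute one report section from (CreationDate, CompletionDate) pairs."""
    runtimes = [
        (completed - created).total_seconds() / 60.0
        for created, completed in workflows
    ]
    total = len(runtimes)
    return {
        "total": total,
        "throughput": math.trunc(total / up_hours),  # workflows per hour
        "max": max(runtimes),                        # minutes
        "min": min(runtimes),                        # minutes
        "average": sum(runtimes) / total,            # minutes
    }


# Hypothetical sample: five successful workflows, report run at 12:00.
day = lambda h, m: datetime(2010, 1, 14, h, m)
successful = [
    (day(10, 0), day(10, 10)),
    (day(10, 0), day(10, 20)),
    (day(10, 30), day(11, 0)),
    (day(11, 0), day(11, 40)),
    (day(11, 0), day(11, 50)),
]
up = time_up_hours(successful, day(12, 0))
stats = section_stats(successful, up)
print(up, stats)
```

As a cross-check against the sample report, Truncate(593 / 76.056) gives 7, which matches the Workflow Throughput shown in the Overall Performance Report section.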
20. Configuration
This is the part where all the magic parts up until now get explained. You will learn how to write your own workflows and plug them into the engine, plug in your custom WorkflowProcessor, and understand the workflow engine configuration and how to customize it to your needs.
20.1. XML Model Graphs
Workflows are defined using XML. They have a very similar feel to what you have already seen in the Grocery Store example (See Lifecycles in Lifecycles). All of your workflow XML files must begin and end as follows:
<?xml version="1.0" encoding="UTF-8"?>
<cas:workflows
xmlns="http://oodt.jpl.nasa.gov/2.0/cas"
xmlns:cas="http://oodt.jpl.nasa.gov/2.0/cas">
…
…
…
</cas:workflows>
The workflows you define will exist inside of cas:workflows. In case you are unfamiliar with XML, here are a few “XML buzz words” which you will need to understand: Markup, Content, Tag, Element, Attribute, and Namespace.
20.1.1. Markup
All strings which constitute markup either begin with the character ‘<’ and end with a ‘>’, or begin with the character ‘&’ and end with a ‘;’.
20.1.2. Content
Strings of characters which are not markup are content.
20.1.3. Tag
A markup construct that begins with ‘<’ and ends with ‘>’. Tags come in three flavors: start-tags, for example <section>, end-tags, for example </section>, and empty-element tags, for example <line-break/>.
20.1.4. Element
A logical component of an XML document which either begins with a start-tag and ends with a matching end-tag, or consists only of an empty-element tag. The characters between the start- and end-tags, if any, are the element's content, and may contain markup, including other elements, which are called child elements. An example of an element is <Greeting>Hello, world.</Greeting>. The <Greeting> and </Greeting> tags are the markup, and the text “Hello, world.” is the content.
20.1.5. Attribute
A markup construct consisting of a name/value pair that exists within a start-tag or empty-element tag. In the example (below) the element img has two attributes, src and alt: <img src="madonna.jpg" alt='by Raphael'/>. Another example would be <step number="3">Connect A to B.</step> where the name of the attribute is "number" and the value is "3".
20.1.6. Namespace
Used for providing uniquely named elements and attributes in an XML document. If you are familiar with C++ programming, this XML namespacing is very similar. It helps to prevent two elements with the same name, intended for different use, from being confused as being the same element.
Now that you are XML savvy, let’s analyze the workflow XML which we introduced above.
- The first line is required as the first line in all XML files. Don’t worry about understanding this line for now, just make sure you put it there. It is just specifying the XML version and the encoding which your content will be in:
- <?xml version="1.0" encoding="UTF-8"?>
- The next line opens the workflows element, which lives in the cas namespace.
- The xmlns attribute makes the cas namespace the default namespace inside the workflows element. This means that any element inside the workflows element which is not given an explicit namespace prefix will be considered part of the cas namespace.
- The xmlns:cas attribute binds the cas prefix to the cas namespace.
- The last line closes the workflows element.
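To see the effect of the default namespace, you can parse the skeleton with any namespace-aware XML parser. The sketch below uses Python's standard xml.etree.ElementTree purely as an illustration (the engine itself is not involved); the <parallel id="TestWorkflow"/> child is borrowed from the examples in this section as sample content.

```python
import xml.etree.ElementTree as ET

CAS_NS = "http://oodt.jpl.nasa.gov/2.0/cas"

SKELETON = """<?xml version="1.0" encoding="UTF-8"?>
<cas:workflows
    xmlns="http://oodt.jpl.nasa.gov/2.0/cas"
    xmlns:cas="http://oodt.jpl.nasa.gov/2.0/cas">
  <parallel id="TestWorkflow"/>
</cas:workflows>
"""

# Parse from bytes so the encoding declaration is honored by the parser.
root = ET.fromstring(SKELETON.encode("utf-8"))
child = root[0]

# ElementTree reports tags as {namespace-uri}local-name.
print(root.tag)   # the prefixed cas:workflows element
print(child.tag)  # the unprefixed parallel element, resolved via the default namespace
```

Both tags come back with the same namespace URI, which shows that the prefixed workflows element and its unprefixed children end up in the same cas namespace.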
20.1.7. Defining Workflows
There are two ways to define a workflow:
1) <[execution-type] id="[model-id]"></[execution-type]>
- execution-type : the id of the workflow processor you want to use (i.e. parallel, sequential, task, or condition)
- model-id : the id you want to give this workflow
- EXAMPLE:
o <parallel id="TestWorkflow"></parallel>
2) <workflow id="[model-id]" execution="[execution-type]"></workflow>
- execution-type : the id of the workflow processor you want to use (i.e. parallel, sequential, task, or condition)
- model-id : the id you want to give this workflow
- EXAMPLE:
o <workflow id="TestWorkflow" execution="parallel"></workflow>
Option 1 is usually preferred (less typing), so all examples from here on will use option 1. However, any attributes or elements allowed by option 1 also apply to option 2, so besides the reduced typing there is no real benefit of one over the other.
Other attributes supported by workflow elements are:
- name : allows you to give a more readable name if so desired.
- id-ref : allows you to reference an already defined workflow.
- priority : allows you to specify the priority for the workflow.
- excused : allows you to set this workflow’s children workflows as excused (See Excused Sub-Processors).
- entryPoint : allows you to set this workflow as an entry-point.
- alias : when using id-ref, you may have to alias the id to something else if the same workflow id is already used for a sub-processor of an entry-point workflow.
- class : for task and condition workflows, this allows you to specify the TaskInstance class.
Let’s use our Grocery Store example again (See Lifecycles in Lifecycles), and implement its workflow model in XML:
<sequential id="BuyGroceries">
<conditions execution="parallel" type="pre">
<condition id="FindWallet"/>
<condition id="FindKeys"/>
</conditions>
<task id="DriveToStore"/>
<parallel id="PurchaseGroceries">
<task id="YouPurchaseGroceries"/>
<task id="FriendPurchaseGroceries"/>
</parallel>
<task id="DriveHome"/>
</sequential>
We’ve already learned that each workflow can have pre- and post-conditions. These are specified using the <conditions> element with the type attribute set to either pre or post respectively. The execution attribute is optional; if not specified, the conditions will run sequentially by default. The type attribute is also optional: if it is omitted, the first <conditions> element inside the workflow element is taken as the pre-conditions and the second as the post-conditions. If you want just post-conditions, then you must use the type attribute, otherwise they will register as pre-conditions. Let’s say we now want the ability to run BuyGroceries without running its pre-conditions (say we somehow already know we have our wallet and keys, and checking for them would just be a waste of time).
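The rule for deciding which <conditions> element holds pre-conditions and which holds post-conditions can be sketched as follows. This is only an illustration of the rule as described in this section, not the engine's parser: the function names are hypothetical, the condition id GroceriesPutAway is invented for the example, and the handling of models that mix typed and untyped <conditions> elements is an assumption.

```python
import xml.etree.ElementTree as ET


def local_name(elem):
    """Strip any {namespace} part from an element tag."""
    return elem.tag.split("}")[-1]


def classify_conditions(workflow):
    """Map 'pre' and 'post' to the condition ids found in a workflow element."""
    result = {"pre": [], "post": []}
    blocks = [child for child in workflow if local_name(child) == "conditions"]
    for position, block in enumerate(blocks):
        # An explicit type wins; otherwise the first block is pre, the second post.
        kind = block.get("type") or ("pre" if position == 0 else "post")
        result[kind].extend(cond.get("id") for cond in block)
    return result


untyped = ET.fromstring("""
<sequential id="BuyGroceries">
  <conditions execution="parallel">
    <condition id="FindWallet"/>
    <condition id="FindKeys"/>
  </conditions>
  <task id="DriveToStore"/>
  <conditions>
    <condition id="GroceriesPutAway"/>
  </conditions>
</sequential>
""")

post_only = ET.fromstring("""
<sequential id="BuyGroceries">
  <task id="DriveToStore"/>
  <conditions type="post">
    <condition id="GroceriesPutAway"/>
  </conditions>
</sequential>
""")

print(classify_conditions(untyped))
print(classify_conditions(post_only))
```

The second model shows why the type attribute is required when only post-conditions are wanted: without it, the single <conditions> element would be the first one found and would register as pre-conditions.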
20.1.8. Entry Points
We can rework our model so that there is an entry point inside BuyGroceries. First we will rename BuyGroceries to BuyGroceries-DoCheck, then create a wrapper workflow around all the tasks, which we will now call BuyGroceries:
<sequential id="BuyGroceries-DoCheck">
<conditions execution="parallel" type="pre">
<condition id="FindWallet"/>
<condition id="FindKeys"/>
</conditions>
<sequential id="BuyGroceries" entryPoint="true">
<task id="DriveToStore"/>
<parallel id="PurchaseGroceries">
<task id="YouPurchaseGroceries"/>
<task id="FriendPurchaseGroceries"/>
</parallel>
<task id="DriveHome"/>
</sequential>
</sequential>
By setting the new BuyGroceries workflow as an entry point, we can now skip running the pre-conditions by telling the workflow engine to run from BuyGroceries instead of BuyGroceries-DoCheck. So, if we want to run with pre-conditions, we use the StartWorkflow action with -mid BuyGroceries-DoCheck, and if we don’t want the pre-conditions to run, we use -mid BuyGroceries. Many times a workflow is used in more than one workflow model, or the same task or condition is used with slightly different static metadata to make it perform differently. This is accomplished by using references.
20.1.9. Workflow References
As mentioned above in Defining Workflows, we are able to create a workflow reference by using the id-ref attribute. In our Grocery Store example, we could have used the same task for accomplishing YouPurchaseGroceries and FriendPurchaseGroceries, since the action is the same; it is just the person who performs the action that is different. So let’s rearrange our example a bit to use id-ref attributes to accomplish this:
<sequential id="BuyGroceries-DoCheck">
<conditions execution="parallel" type="pre">
<condition id="FindWallet"/>
<condition id="FindKeys"/>
</conditions>
<sequential id="BuyGroceries" entryPoint="true">
<task id="DriveToStore"/>
<parallel id="PurchaseGroceries">
<task id-ref="PersonPurchaseGroceries"
alias="YouPurchaseGroceries">
<configuration>
<property name="Person" value="You"/>
</configuration>
</task>
<task id-ref="PersonPurchaseGroceries"
alias="FriendPurchaseGroceries">
<configuration>
<property name="Person" value="Friend"/>
</configuration>
</task>
</parallel>
<task id="DriveHome"/>
</sequential>
</sequential>
<task id="PersonPurchaseGroceries"/>
We have rearranged this such that we now have a task PersonPurchaseGroceries, which is used in two different ways, just by specifying a different Person via static metadata. We aliased the tasks to their original names, since ids must be unique inside a workflow graph entry point (in this case BuyGroceries-DoCheck and BuyGroceries). It is also worth noting that PersonPurchaseGroceries is now an entry point, since it has no parent. So one could start this workflow on its own and just set the Person metadata field in the dynamic metadata. That means that this definition leaves us with 3 entry points: BuyGroceries-DoCheck, BuyGroceries, and PersonPurchaseGroceries. What if we don’t want PersonPurchaseGroceries to be an entry point, but we still want to be able to use this new id-ref ability? This is done by being a little creative and changing the workflow model as such:
…
…
<parallel id="PurchaseGroceries">
<task id="YouPurchaseGroceries">
<configuration>
<property name="Person" value="You"/>
</configuration>
</task>
<task id-ref="YouPurchaseGroceries"
alias="FriendPurchaseGroceries">
<configuration>
<property name="Person" value="Friend"/>
</configuration>
</task>
</parallel>
…
…
What we did was make the first task the definition of the task, giving it the id YouPurchaseGroceries, and make the second task id-ref the first, alias it to FriendPurchaseGroceries, and override the static metadata field Person. In this example we also dropped PersonPurchaseGroceries, since YouPurchaseGroceries is now the task definition. So this leaves us with just two entry points now: BuyGroceries-DoCheck and BuyGroceries. We learned above that tasks and conditions are actually just monitors (See Types) of their TaskInstance. We will now learn how to attach a TaskInstance to a task or condition.
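The entry-point rule used in this section (a workflow is an entry point if it has no parent workflow or if it sets entryPoint="true") can be checked against a model with a short script. The sketch below is an illustration only, not how the engine discovers entry points; the function name and the set of workflow tag names are assumptions, and the model is the first id-ref variant from this section with its configuration elements left out for brevity.

```python
import xml.etree.ElementTree as ET

# Tag names treated as workflows in this sketch (assumed from the examples above).
WORKFLOW_TAGS = {"workflow", "sequential", "parallel", "task", "condition"}

MODEL_XML = """
<cas:workflows
    xmlns="http://oodt.jpl.nasa.gov/2.0/cas"
    xmlns:cas="http://oodt.jpl.nasa.gov/2.0/cas">
  <sequential id="BuyGroceries-DoCheck">
    <conditions execution="parallel" type="pre">
      <condition id="FindWallet"/>
      <condition id="FindKeys"/>
    </conditions>
    <sequential id="BuyGroceries" entryPoint="true">
      <task id="DriveToStore"/>
      <parallel id="PurchaseGroceries">
        <task id-ref="PersonPurchaseGroceries" alias="YouPurchaseGroceries"/>
        <task id-ref="PersonPurchaseGroceries" alias="FriendPurchaseGroceries"/>
      </parallel>
      <task id="DriveHome"/>
    </sequential>
  </sequential>
  <task id="PersonPurchaseGroceries"/>
</cas:workflows>
"""


def entry_points(root):
    """List ids of workflows that are top-level or flagged entryPoint='true'."""
    names = []

    def visit(elem, is_top_level):
        if elem.tag.split("}")[-1] in WORKFLOW_TAGS:
            if is_top_level or elem.get("entryPoint") == "true":
                names.append(elem.get("alias") or elem.get("id"))
        for child in elem:
            visit(child, False)

    for child in root:
        visit(child, True)
    return names


print(entry_points(ET.fromstring(MODEL_XML)))
```

For this model the script reports the three entry points named in the text; removing the top-level PersonPurchaseGroceries task, as in the second variant, leaves only BuyGroceries-DoCheck and BuyGroceries.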
20.1.10. TaskInstance Class
The TaskInstance is the actual code that runs when a task or condition is given the resources to execute. TaskInstance is an abstract Java class with one abstract method that has the following signature:
protected abstract ResultsState performExecution(ControlMetadata crtlMetadata);
This method allows one to do some activity, then return its ResultsState to let the task or condition know how execution went. Notice that this method is given ControlMetadata; this is a super metadata, so to say, that wraps all three types of metadata: static, dynamic, and local (See Context). It controls metadata precedence, is the place where you would put your local metadata, and gives you the ability to override or set new dynamic metadata. For now we will put off actually implementing a TaskInstance, and instead just assume we have written such a class and skip to plugging it into the workflow XML model. Let’s say our TaskInstance class, which we wrote, is: org.mytasks.BuyGroceriesTaskInstance. Let’s also assume we wrote the TaskInstance for the two conditions: org.myconds.FindWalletTaskInstance and org.myconds.FindKeysTaskInstance, and for our other two tasks: org.mytasks.DriveToStoreTaskInstance and org.mytasks.DriveHomeTaskInstance. So our model will now look like:
<sequential id="BuyGroceries-DoCheck">
<conditions execution="parallel" type="pre">
<condition id="FindWallet"
class="org.myconds.FindWalletTaskInstance"/>
<condition id="FindKeys"
class="org.myconds.FindKeysTaskInstance"/>
</conditions>
<sequential id="BuyGroceries" entryPoint="true">
<task id="DriveToStore"
class="org.mytasks.DriveToStoreTaskInstance"/>
<parallel id="PurchaseGroceries">
<task id="YouPurchaseGroceries"
class="org.mytasks.BuyGroceriesTaskInstance">
<configuration>
<property name="Person" value="You"/>
</configuration>
</task>
<task id-ref="YouPurchaseGroceries"
alias="FriendPurchaseGroceries">
<configuration>
<property name="Person" value="Friend"/>
</configuration>
</task>
</parallel>
<task id="DriveHome"
class="org.mytasks.DriveHomeTaskInstance"/>
</sequential>
</sequential>
Notice that FriendPurchaseGroceries does not use the class attribute. This is because the id-ref is basically saying: copy the referenced task here, give it the new name specified by alias, and replace/add static metadata as given. It inherits the class from the task it references.
20.1.11. Static Metadata Inheritance
There are times when several workflows require the same metadata field and will usually use the same value. For example, let’s look at two of the tasks in the Grocery Store model: DriveToStore and DriveHome. Both of these require a car, and we know that we will be driving the same car to the store that we will be driving home (given that your car doesn’t break down and nothing unexpected happens). Let’s say these two tasks are told which car to use by the metadata field Vehicle. Since we want both to use the same car, we can move the static metadata field up to their parent workflow so they both get the field set to the same car:
…
…
<sequential id="BuyGroceries" entryPoint="true">
<configuration> <property name="Vehicle" value="RX-8"/>
</configuration>
<task id="DriveToStore"
/>
<parallel id="PurchaseGroceries">
<task id="YouPurchaseGroceries"
>
<configuration>
<property name="Person" value="You"/>
</configuration>
</task>
<task id-ref="YouPurchaseGroceries"
alias="FriendPurchaseGroceries">
<configuration>
<property name="Person" value="Friend"/>
</configuration>
</task>
</parallel>
<task id="DriveHome"
/>
</sequential>
…
…
Workflow children inherit their parent’s static metadata, but if they specify the same metadata field, then their value will override the one inherited from their parent. That means DriveToStore and DriveHome will both inherit Vehicle set to RX-8. PurchaseGroceries and all its children will also inherit this metadata field; however, they aren’t looking for such a metadata field, so it will be ignored. If we don’t want PurchaseGroceries to be able to see the Vehicle metadata field, then we can move it into each task:
…
…
<sequential id="BuyGroceries" entryPoint="true">
<task id="DriveToStore"
>
<configuration> <property name="Vehicle" value="RX-8"/>
</configuration>
</task>
<parallel id="PurchaseGroceries">
<task id="YouPurchaseGroceries"
>
<configuration>
<property name="Person" value="You"/>
</configuration>
</task>
<task id-ref="YouPurchaseGroceries"
alias="FriendPurchaseGroceries">
<configuration>
<property name="Person" value="Friend"/>
</configuration>
</task>
</parallel>
<task id="DriveHome"
>
<configuration> <property name="Vehicle" value="RX-8"/>
</configuration>
</task>
</sequential>
…
…
Notice, though, that you now have to change the value in two places if you want to change the static metadata value for Vehicle. There are a few ways to get around this. The first is that maybe the value will never change, and if it does, then setting it in the dynamic metadata when you start the workflow is the proper way of changing it. Two other ways are envReplace and XInclude.
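The inherit-then-override rule described in this section can be pictured with a small Java sketch. MetadataNode is a hypothetical stand-in written for this illustration only; it is not a class from the workflow engine, and the engine's real resolution code may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a workflow or task node; not the engine's real class. */
final class MetadataNode {
    private final MetadataNode parent;                         // null for the top-level workflow
    private final Map<String, String> own = new LinkedHashMap<>();

    MetadataNode(MetadataNode parent) {
        this.parent = parent;
    }

    /** Sets a static metadata field on this node and returns the node for chaining. */
    MetadataNode set(String key, String value) {
        own.put(key, value);
        return this;
    }

    /** Starts from the parent's fields, then lets this node's own fields override them. */
    Map<String, String> effective() {
        Map<String, String> result =
            (parent == null) ? new LinkedHashMap<>() : parent.effective();
        result.putAll(own);
        return result;
    }

    public static void main(String[] args) {
        MetadataNode buyGroceries = new MetadataNode(null).set("Vehicle", "RX-8");
        MetadataNode driveToStore = new MetadataNode(buyGroceries);
        MetadataNode driveHome = new MetadataNode(buyGroceries).set("Vehicle", "Bicycle");
        System.out.println(driveToStore.effective().get("Vehicle")); // inherited from parent
        System.out.println(driveHome.effective().get("Vehicle"));    // overridden by child
    }
}
```

In this sketch a child that sets nothing sees the parent's Vehicle, while a child that sets Vehicle itself sees only its own value.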
20.1.12. Environment Variable Replacement
Environment variables give a global way of changing values in all of your workflow model’s static metadata. Every <property> element supports an envReplace attribute, which if set to true will replace the parts of property values enclosed in brackets with the given environment variable specified inside the brackets:
…
…
<configuration> <property name="Vehicle" value="[VEHICLE]" envReplace="true"/>
</configuration>
…
…
The above configuration will replace [VEHICLE] with the value of the environment variable VEHICLE. If VEHICLE is not set, then [VEHICLE] will be replaced with null.
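As a rough illustration of the replacement rule just described, the following Java sketch substitutes every bracketed name in a property value with the matching environment value. EnvReplaceSketch and its replace method are hypothetical names invented here; the environment is passed in as a map so the behaviour is easy to try out, and the engine's actual implementation is not shown in this paper.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of envReplace; not the workflow manager's own code. */
final class EnvReplaceSketch {
    // Matches [NAME], where NAME contains no brackets.
    private static final Pattern BRACKETED = Pattern.compile("\\[([^\\[\\]]+)\\]");

    static String replace(String value, Map<String, String> env) {
        Matcher matcher = BRACKETED.matcher(value);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // String.valueOf yields the text "null" when the variable is not set.
            String resolved = String.valueOf(env.get(matcher.group(1)));
            matcher.appendReplacement(out, Matcher.quoteReplacement(resolved));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Uses the real process environment; prints "null" if VEHICLE is unset.
        System.out.println(replace("[VEHICLE]", System.getenv()));
    }
}
```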
20.1.13. XInclude
XInclude is an XML standard which allows XML to be imported from one file into another. The XML file from which the import comes does not have to be a complete workflow model itself; it only needs to be well-formed XML (for example, a single <property> element). XInclude basically just says: take what is in this file and copy/paste it here into this file. So going back to our Grocery Store example, where we didn’t want Vehicle visible to every task in PurchaseGroceries, we now have an alternative to environment variables. Here is how XInclude would be used:
properties.xml file contents:
<property name="Vehicle" value="RX-8"/>
Workflow Model file contents:
…
…
<sequential id="BuyGroceries" entryPoint="true">
<task id="DriveToStore"
>
<configuration> <xi:include href="properties.xml"
xmlns:xi="http://www.w3.org/2003/XInclude"/>
</configuration>
</task>
<parallel id="PurchaseGroceries">
<task id="YouPurchaseGroceries"
>
<configuration>
<property name="Person" value="You"/>
</configuration>
</task>
<task id-ref="YouPurchaseGroceries"
alias="FriendPurchaseGroceries">
<configuration>
<property name="Person" value="Friend"/>
</configuration>
</task>
</parallel>
<task id="DriveHome"
>
<configuration> <xi:include href="properties.xml"
xmlns:xi="http://www.w3.org/2003/XInclude"/>
</configuration>
</task>
</sequential>
</sequential>
…
…
The following XML element is performing the “copy/paste”:
<xi:include href="properties.xml" xmlns:xi="http://www.w3.org/2003/XInclude"/>
The above line ends up being replaced with everything inside properties.xml, which is:
<property name="Vehicle" value="RX-8"/>
So what we have is DriveToStore and DriveHome sharing the same Vehicle static metadata field, only one point of replacement, no environment variables needed, and only those two tasks seeing that metadata field.
20.2. XML Processor Map
The XML Processor mapping file is what the workflow engine uses to take a workflow model and convert it to workflow processors. This file looks like:
<?xml version="1.0" encoding="UTF-8"?>
<processors default="sequential">
<!-- don't change ids -->
<processor id="condition" class="gov.nasa.jpl.oodt.cas.workflow.processor.ConditionProcessor"/>
<processor id="task" class="gov.nasa.jpl.oodt.cas.workflow.processor.TaskProcessor"/>
<!-- custom -->
<processor id="sequential" class="gov.nasa.jpl.oodt.cas.workflow.processor.SequentialProcessor"/>
<processor id="parallel" class="gov.nasa.jpl.oodt.cas.workflow.processor.ParallelProcessor"/>
</processors>
The default attribute attached to the processors element specifies which id of the defined processor elements is meant to be the default when no workflow execution is defined. Each processor element is mapping a WorkflowProcessor to an id. This id is what is used in the model to specify the workflow’s execution (See Defining Workflows).
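The lookup that this mapping implies can be sketched in a few lines of Java. ProcessorMapSketch is a hypothetical class written for illustration; only the ids and class names are taken from the mapping file above, and the error handling for unknown ids is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of an id-to-processor mapping with a default id. */
final class ProcessorMapSketch {
    private final Map<String, String> classById = new HashMap<>();
    private final String defaultId;

    ProcessorMapSketch(String defaultId) {
        this.defaultId = defaultId;
    }

    void register(String id, String className) {
        classById.put(id, className);
    }

    /** Returns the class mapped to the execution id, or to the default id when none is given. */
    String resolve(String executionId) {
        String id = (executionId == null) ? defaultId : executionId;
        String className = classById.get(id);
        if (className == null) {
            throw new IllegalArgumentException("No processor mapped to id: " + id);
        }
        return className;
    }

    public static void main(String[] args) {
        ProcessorMapSketch map = new ProcessorMapSketch("sequential");
        map.register("sequential", "gov.nasa.jpl.oodt.cas.workflow.processor.SequentialProcessor");
        map.register("parallel", "gov.nasa.jpl.oodt.cas.workflow.processor.ParallelProcessor");
        System.out.println(map.resolve("parallel")); // explicit execution id
        System.out.println(map.resolve(null));       // falls back to the default id
    }
}
```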
20.3. Workflow XML Policy
This section starts out with an introduction to Spring, Java Beans, and Object Injection. If you are already aware of what they are and how they work, you can skip to the section Invocation of Server/Client.
20.3.1. Spring Framework
The Spring Framework is an open source object injection framework. It allows you to create objects via Java Beans specified in XML files. For more information go to: http://www.springsource.org
20.3.2. Java Beans
A Java Bean is a Java Object which uses setters and getters to change/access member variables. For example:
public class Person {
private String name;
public void setName(String name) {
this.name = name;
}
public String getName() {
return this.name;
}
}
The Person class is a Java Bean which has a property “name”. Java Bean properties are modified and accessed via their corresponding setter and getter methods, in this case: setName and getName.
20.3.3. Object Injection
Spring builds Java Objects via object injection. This is done by calling a Java Bean’s setter method for each property specified. In Spring, if you set the property name on some object, you are really just calling the method setName(String) on that object. Here is an example of Spring creating a Person object with name=Joe:
<bean id="Joe" class="Person">
<property name="name" value="Joe"/>
</bean>
This XML creates an object using the Person class, using the default constructor. After it creates this Person object, it calls its setter method setName with the argument value Joe. This Person object is given the id ‘Joe’ and is put into Spring’s object repository.
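To make the mechanism concrete, here is a minimal Java sketch of what such an injection step amounts to, using plain reflection. It is not Spring's implementation; InjectionSketch and createBean are hypothetical names, and the sketch only handles String properties.

```java
import java.lang.reflect.Method;

/** Hypothetical illustration of setter-based injection; Spring itself does far more. */
final class InjectionSketch {
    /** Same bean as in the Java Beans section, repeated so the sketch is self-contained. */
    public static class Person {
        private String name;
        public void setName(String name) { this.name = name; }
        public String getName() { return this.name; }
    }

    /** Creates the bean with its default constructor, then calls set<Property>(String). */
    static Object createBean(Class<?> beanClass, String property, String value) {
        try {
            Object bean = beanClass.getDeclaredConstructor().newInstance();
            String setter = "set" + Character.toUpperCase(property.charAt(0))
                + property.substring(1);
            Method method = beanClass.getMethod(setter, String.class);
            method.invoke(bean, value);
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not inject property " + property, e);
        }
    }

    public static void main(String[] args) {
        Person joe = (Person) createBean(Person.class, "name", "Joe");
        System.out.println(joe.getName());
    }
}
```

The property name in the XML is turned into a setter name by capitalizing its first letter, which is why a property called name ends up calling setName.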
20.3.4. Invocation of Server/Client
In the command-line tutorial above (See Using Command-Line Client and Actions Enumerated), we used the engine script to start, stop, and debug the engine server, and the engine-client script, which always had the same argument: --cfb WorkflowEngineClientFactory. This was just a simplification for the time being; we will now learn what engine start and engine-client --cfb WorkflowEngineClientFactory really do. When running engine start you are really executing a script you haven’t been introduced to yet: engine-server. engine-server allows you to start up different engine servers. These servers are configured via Java beans; each server is just an engine server bean custom-configured to its needs. Servers are launched via their bean id as follows:
$ ./engine-server --sfb <ServerFactoryBeanId> [-d]
- ServerFactoryBeanId : This is the Spring bean id of the server you would like to launch.
- -d is optional and tells the server to come up in debug mode.
You would first configure your engine server bean in the Spring XML file, and then you can launch it using engine-server. This script also allows you to list the currently configured engine bean ids in your Spring XML file:
$ ./engine-server -pss
- -pss will print out the supported server bean ids
Now that we know there is a bean which brings the engine up, it should make sense that there is a client bean which talks to this server. And that is what --cfb WorkflowEngineClientFactory has been all along. WorkflowEngineClientFactory is just the bean id of the configured client bean which talks to the server bean XmlRpcServerFactory. Now you should be able to look at the engine script and have an idea of what is going on:
#!/bin/csh
# Copyright (c) 2009 California Institute of Technology.
# ALL RIGHTS RESERVED. U.S. Government Sponsorship acknowledged.
set operation
if ( $#argv < 1 ) then
    echo "Usage: $0 <start|debug|stop|restart>"
    exit 1;
else
    set operation = "$1"
endif
if ( $operation == "start" ) then
    ./engine-server -sfb XmlRpcServerFactory
else if ( $operation == "debug" ) then
    ./engine-server -sfb XmlRpcServerFactory -d
else if ( $operation == "stop" ) then
    ./engine-client -cfb WorkflowEngineClientFactory -a Shutdown
else if ( $operation == "restart" ) then
    ./engine-client -cfb WorkflowEngineClientFactory -a Shutdown
    sleep 7
    ./engine-server -sfb XmlRpcServerFactory
else
    echo "Usage: $0 <start|debug|stop|restart>"
    exit 1;
endif
So the start option just launches the server created by the XmlRpcServerFactory bean; debug brings up the same server, but set to debug mode. The stop option uses the client action Shutdown with the client bean WorkflowEngineClientFactory. This means that the server which XmlRpcServerFactory brought up will be brought down by the stop option, since the WorkflowEngineClientFactory bean has been configured to create the client which talks to this server. And restart just performs a stop followed by a start.
20.3.5. Server Configuration
We now know what Spring beans look like and that they have ids, and that these ids are used by the command-line utilities to start/stop servers and communicate with them via clients. Here we learn where these beans are defined and how to change them and/or add new ones. In the workflow policy directory, there are several Spring XML files; these files are where all the beans are defined. Here we are going to take a look at the XmlRpcServerFactory bean and understand what it is doing. This bean is defined in the engine-beans.xml file:
<bean id="XmlRpcServerFactory" lazy-init="true"
class="gov.nasa.jpl.oodt.cas.workflow.server.channel.xmlrpc.XmlRpcCommunicationChannelServerFactory">
<property name="port" value="${…}"/>
<property name="workflowEngineFactory" ref="WorkflowEngineLocalFactory"/>
</bean>
This bean is of class XmlRpcCommunicationChannelServerFactory. A workflow engine can be wrapped inside different servers (e.g. an RMI-based server could be written if so desired). XmlRpcServerFactory is a factory bean; engine-server will load this bean and then call its factory create method, which creates an XmlRpcCommunicationChannelServer, which is then launched. This factory bean has two properties: port and workflowEngineFactory. The port property tells this factory to create a server that comes up on the port given by the ${…} placeholder (for now just assume that the placeholder is somehow replaced with an integer like 9000). The workflowEngineFactory property uses Spring’s ref attribute, which tells Spring to set the workflowEngineFactory property equal to the Spring bean with the id WorkflowEngineLocalFactory. Let’s take a look at WorkflowEngineLocalFactory. This factory bean is the one which will build our workflow engine, which is then placed into the XML-RPC server bean and launched:
<bean id="WorkflowEngineLocalFactory" lazy-init="true"
class="gov.nasa.jpl.oodt.cas.workflow.engine.WorkflowEngineLocalFactory">
<property name="modelRepoFactory" ref="XmlWorkflowModelRepositoryFactory"/>
<property name="processorRepoFactory"
ref="XStreamWorkflowProcessorRepositoryFactory"/>
<property name="instanceRepoFactory"
ref="LocalCatalogServiceInstanceRepositoryFactory"/>
<property name="eventRepoFactory" ref="SpringBasedEngineEventRepositoryFactory"/>
<property name="processorMapFactory" ref="XmlBasedProcessorMapFactory"/>
<property name="priorityManagerFactory" ref="HighestPriorityFirstManagerFactory"/>
<property name="runnerFactory" ref="ResourceRunnerFactory"/>
<property name="communicationChannelClientFactory" ref="XmlRpcClientFactory"/>
<property name="metadataKeysToCache">
<list>
<value>CollectionLabel</value>
</list>
</property>
<property name="debug" value="false"/>
</bean>
As we can see, this factory bean has several properties; let’s go through them one at a time:
modelRepoFactory :
Description: This factory bean will create the Java object which is responsible for parsing the workflow XML models we were introduced to before (See XML Model Graphs). This means that if we wanted to specify our workflow models in a format other than XML, we could write our own model repo and plug it in here (code implementations will be discussed later).
Spring Bean:
<bean id="XmlWorkflowModelRepositoryFactory" lazy-init="true"
class="gov.nasa.jpl.oodt.cas.workflow.model.repo.XmlWorkflowModelRepositoryFactory">
<property name="modelFiles">
<list>
<value>/${…}/core/workflow/policy/workflows/WorkflowModelTestFile.xml</value>
<value>/${…}/core/workflow/policy/workflows/GranuleMaps.xml</value>
</list>
</property>
</bean>
- Properties:
modelFiles: A list of paths to Workflow XML Model files (See XML Model Graphs).
processorRepoFactory :
Description: This bean creates the object responsible for handling the caching of each WorkflowProcessor for the engine’s QueueManager. This has a default implementation which utilizes the open source utility XStream.
Spring Bean:
<bean id="XStreamWorkflowProcessorRepositoryFactory" lazy-init="true"
class="gov.nasa.jpl.oodt.cas.workflow.processor.repo.XStreamWorkflowProcessorRepositoryFactory">
<property name="repoDirectory" value="/${…}/processorRepo"/>
</bean>
- Properties:
repoDirectory: This is the directory where each XStream(ed) WorkflowProcessor will be written out, each in its own directory of files.
instanceRepoFactory: This is the queryable repository used to store TaskInstance metadata. This repository uses another OODT-CAS component called CAS-Catalog, and currently has two bean options: LocalCatalogServiceInstanceRepositoryFactory and ClientCatalogServiceInstanceRepositoryFactory. CAS-Catalog can be brought up in a server of its own; if you want the workflow manager to use that catalog service, pick the client one, otherwise choose the local one (which wraps the catalog service inside the workflow engine server). Here are the two beans:
Spring Beans:
<bean id="LocalCatalogServiceInstanceRepositoryFactory" lazy-init="true"
class="gov.nasa.jpl.oodt.cas.workflow.instance.repo.WorkflowInstanceRepositoryFactory">
<property name="catalogServiceFactory" ref="CatalogServiceLocalFactory"/>
</bean>
<bean id="ClientCatalogServiceInstanceRepositoryFactory" lazy-init="true"
class="gov.nasa.jpl.oodt.cas.workflow.instance.repo.WorkflowInstanceRepositoryFactory">
<property name="catalogServiceFactory" ref="CatalogServiceClientFactory"/>
</bean>
- Properties:
catalogServiceFactory: This is similar to the workflow manager setup: CatalogServiceClientFactory is the client bean for talking with a catalog service server, and CatalogServiceLocalFactory is the actual catalog service which will be wrapped inside this workflow engine.
eventRepoFactory :
Description: This bean creates the object responsible for holding the engine’s events (current one uses Spring).
Spring Bean:
<bean id="SpringBasedEngineEventRepositoryFactory" lazy-init="true"
class="gov.nasa.jpl.oodt.cas.workflow.event.repo.SpringBasedEngineEventRepositoryFactory">
<property name="beanRepo" value="/${…}/core/workflow/policy/event-beans.xml"/>
</bean>
- Properties:
beanRepo: Path to a Spring XML file which defines the engine events.
processorMapFactory :
Description: This bean creates the object responsible for mapping execution ids to WorkflowProcessors.
Spring bean:
<bean id="XmlBasedProcessorMapFactory" lazy-init="true"
class="gov.nasa.jpl.oodt.cas.workflow.processor.map.XmlBasedProcessorMapFactory">
<property name="xmlFile"
value="/${…}/core/workflow/policy/workflows/WorkflowProcessorMapping.xml"/>
</bean>
- Properties:
xmlFile: This is the XML file which specifies this mapping (See XML Processor Map).
priorityManagerFactory :
Description: This bean creates the object responsible for priority sorting all the task and condition WorkflowProcessors on the QueueManager’s runnables queue. There are currently two implemented PriorityManagers (each created by the corresponding factory below): FILOPriorityManager and HighestPriorityFirstManager. FILOPriorityManager sorts by CreationDate (ignoring priorities), and HighestPriorityFirstManager sorts by priority, putting those with the highest priority at the front of the queue.
Spring beans:
<bean id="FILOPriorityManagerFactory" lazy-init="true"
class="gov.nasa.jpl.oodt.cas.workflow.priority.FILOPriorityManagerFactory"/>
<bean id="HighestPriorityFirstManagerFactory" lazy-init="true"
class="gov.nasa.jpl.oodt.cas.workflow.priority.HighestPriorityFirstManagerFactory"/>
- Properties: (none)
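The difference between the two orderings can be sketched with plain Java comparators. Everything here is hypothetical: Entry stands in for a queued WorkflowProcessor, and the sketch assumes that FILO means the most recently created entry comes first, which this paper does not state explicitly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of the two queue orderings; not the engine's PriorityManagers. */
final class PrioritySketch {
    /** Stand-in for a queued processor; the real WorkflowProcessor carries far more state. */
    static final class Entry {
        final String id;
        final double priority;
        final long creationMillis;

        Entry(String id, double priority, long creationMillis) {
            this.id = id;
            this.priority = priority;
            this.creationMillis = creationMillis;
        }
    }

    /** Highest priority at the front of the queue. */
    static final Comparator<Entry> HIGHEST_PRIORITY_FIRST =
        Comparator.comparingDouble((Entry e) -> e.priority).reversed();

    /** Assumption: FILO is read as newest creation date first; priority is ignored. */
    static final Comparator<Entry> NEWEST_FIRST =
        Comparator.comparingLong((Entry e) -> e.creationMillis).reversed();

    /** Returns the ids of the entries in the order the comparator would queue them. */
    static List<String> sortedIds(List<Entry> entries, Comparator<Entry> order) {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(order);
        List<String> ids = new ArrayList<>();
        for (Entry entry : copy) {
            ids.add(entry.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Entry> queue = List.of(
            new Entry("a", 1, 100L), new Entry("b", 9, 200L), new Entry("c", 5, 300L));
        System.out.println(sortedIds(queue, HIGHEST_PRIORITY_FIRST));
        System.out.println(sortedIds(queue, NEWEST_FIRST));
    }
}
```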
runnerFactory :
Description: This bean creates the engine’s runner. There are currently two implemented: LocalRunnerFactory and ResourceRunnerFactory. The former will run TaskInstances in the engine’s JVM (i.e. locally) and the latter will submit jobs to a CAS Resource Manager.
Spring bean:
<bean id="LocalRunnerFactory" lazy-init="true"
class="gov.nasa.jpl.oodt.cas.workflow.engine.runner.LocalEngineRunnerFactory"/>
<bean id="ResourceRunnerFactory" lazy-init="true"
class="gov.nasa.jpl.oodt.cas.workflow.engine.runner.ResourceRunnerFactory">
<property name="resourceManagerUrl" value="${…}"/>
</bean>
- Properties:
resourceManagerUrl: This is used by ResourceRunnerFactory only, and is the URL at which the CAS Resource Manager server can be found.
communicationChannelClientFactory :
Description: This bean creates the client which allows this engine to tell each TaskInstance how to communicate back to the server in which the engine has been wrapped (i.e. so the server can be notified about things like metadata and state changes).
Spring bean:
<bean id="XmlRpcClientFactory" lazy-init="true"
class="gov.nasa.jpl.oodt.cas.workflow.server.channel.xmlrpc.XmlRpcCommunicationChannelClientFactory">
<property name="serverUrl" value="${…}"/>
<property name="requestTimeout" value="20"/>
<property name="connectionTimeout" value="60"/>
<property name="chunkSize" value="1024"/>
</bean>
- Properties:
serverUrl: This is the URL where the engine server is running.
requestTimeout: This is XML-RPC’s request timeout.
connectionTimeout: This is XML-RPC’s connection timeout.
chunkSize: This is XML-RPC’s transfer chunk size.
FYI:
There are several things to note about this XmlRpcClientFactory bean. As you will see later, this is the same bean wrapped by the client bean we used in all of our command-line examples: WorkflowEngineClientFactory. Also, because this XmlRpcClientFactory bean has to be set inside the engine itself, you may notice this could cause a problem when configuring an engine and then just placing it into a server bean (i.e. what if the client doesn’t match the server?). Since the default configuration, which comes with the workflow manager, only utilizes one server, its configuration setup is probably not the best. A better way would be to rearrange the way this property is set in your Spring XML files. The example below uses a form of polymorphism which Spring allows for its bean definitions; how this works is not explained here (visit the Spring website given above to learn more about it). Notice how the communicationChannelClientFactory property was removed from the WorkflowEngineLocalFactory bean and put into the unnamed bean set to the workflowEngineFactory property:
<bean id="XmlRpcServerFactory" lazy-init="true"
class="gov.nasa.jpl.oodt.cas.workflow.server.channel.xmlrpc.XmlRpcCommunicationChannelServerFactory">
<property name="port" value="${…}"/>
<property name="workflowEngineFactory">
<bean parent="WorkflowEngineLocalFactory"
class="gov.nasa.jpl.oodt.cas.workflow.engine.WorkflowEngineLocalFactory">
<property name="communicationChannelClientFactory" ref="XmlRpcClientFactory"/>
</bean>
</property>
</bean>
<bean id="WorkflowEngineLocalFactory" lazy-init="true"
class="gov.nasa.jpl.oodt.cas.workflow.engine.WorkflowEngineLocalFactory">
<property name="modelRepoFactory" ref="XmlWorkflowModelRepositoryFactory"/>
<property name="processorRepoFactory"
ref="XStreamWorkflowProcessorRepositoryFactory"/>
<property name="instanceRepoFactory"
ref="LocalCatalogServiceInstanceRepositoryFactory"/>
<property name="eventRepoFactory" ref="SpringBasedEngineEventRepositoryFactory"/>
<property name="processorMapFactory" ref="XmlBasedProcessorMapFactory"/>
<property name="priorityManagerFactory" ref="HighestPriorityFirstManagerFactory"/>
<property name="runnerFactory" ref="ResourceRunnerFactory"/>
<property name="metadataKeysToCache">
<list>
<value>CollectionLabel</value>
</list>
</property>
<property name="debug" value="false"/>
</bean>
metadataKeysToCache :
Description: This is a list of metadata keys which each WorkflowProcessor should cache so that workflows can be filtered on them via the GetWorkflowsByMetadata action. The bean below is a java.util.List created by Spring’s element tag <list>.
Spring bean:
<list>
<value>CollectionLabel</value>
</list>
- Properties: (none)
debug :
Description: This is a boolean which will put the server in debug mode if set to true. This allows you to create an engine configuration which always comes up in debug mode.
Spring bean: (none)
- Properties: (none)
20.3.6. Client Configuration
Now that we know how an engine is created and assigned a server, we will now learn how to create a client bean which can be used by the command-line utilities to talk to this server. As mentioned above, WorkflowEngineClientFactory uses the same communication client bean which the engine itself uses to allow its TaskInstances to communicate with it via its server. Here is the client bean definition:
<bean id="WorkflowEngineClientFactory" lazy-init="true"
class="gov.nasa.jpl.oodt.cas.workflow.engine.WorkflowEngineClientFactory">
<property name="communicationChannelClientFactory" ref="XmlRpcClientFactory"/>
<property name="autoPagerSize" value="1000"/>
</bean>
communicationChannelClientFactory is similar to WorkflowEngineLocalFactory’s property seen in Server Configuration. This sets the communication factory which will create the client used for server communication. autoPagerSize is used for server-to-client information transfers: for queries which don’t use paging, this is the number of workflows whose information will be transferred across at a time (this keeps the JVM from running out of heap space). This is the bean that all the command-line utilities up until now have been using to communicate with the server. If you were to write your own or create a different configuration, you could use your bean on the command-line as well, by instead using: -cfb <Your-Bean-Id>.
20.3.7. Environment Variables and Java Properties In Spring
Spring does not support environment variable replacement in its configuration by default. However, the workflow manager has added support for piping environment variables into your configuration files via a custom Spring PropertyPlaceholderConfigurer: gov.nasa.jpl.oodt.cas.workflow.util.CasPropertyPlaceholderConfigurer. This class allows you to put environment variables into Java Properties and then use the corresponding Java Property throughout your Spring XML configuration. You have seen these Java Properties used throughout the client/server configuration above (i.e. the ${…} placeholders). The file which allows you to set up these properties is engine-properties.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright 2009 California Institute of Technology. ALL RIGHTS
RESERVED. U.S. Government Sponsorship acknowledged.
$Id$
-->
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:p="http://www.springframework.org/schema/p"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">
<bean class="gov.nasa.jpl.oodt.cas.workflow.util.CasPropertyPlaceholderConfigurer">
<!-- Allow for system-level properties to override all properties below -->
<property name="systemPropertiesMode" value="2"/>
<!-- Properties -->
<property name="properties">
<props>
<prop key="workflowmgr.url">[WORKFLOWMGR_URL];http://localhost:10000</prop>
<prop key="workflowmgr.port">[WORKFLOWMGR_PORT];10000</prop>
<prop key="resourcemgr.url">[RESOURCEMGR_URL];http://localhost:10001</prop>
<prop key="pcs.support.home">[PCS_SUPPORT_HOME];/tmp</prop>
<prop key="pcs.home">[PCS_HOME];/tmp</prop>
<prop key="filemgr.home">[FILEMGR_HOME];/tmp</prop>
</props>
</property>
</bean>
</beans>
The properties property is where you would create your key and map it to an environment variable. However, you do not need to set it to an environment variable; you can just use this as a place for setting single-place-of-replacement variables used throughout your Spring configuration. The format of the values for each key is: [<environment-variable>];<default-value> or just <value> (with no semi-colon) in the case of no environment variable. <default-value> is used when the environment variable given is not set.
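The value format described above can be illustrated with a short Java sketch. PropertyValueSketch and resolve are hypothetical names; this is not the code of CasPropertyPlaceholderConfigurer, only a reading of the format it accepts, with the environment passed in as a map.

```java
import java.util.Map;

/** Hypothetical illustration of the "[ENV_VAR];default" value format. */
final class PropertyValueSketch {
    static String resolve(String raw, Map<String, String> env) {
        int close = raw.indexOf("];");
        if (!raw.startsWith("[") || close < 0) {
            return raw;                                   // plain <value>, used as-is
        }
        String variable = raw.substring(1, close);        // text between '[' and ']'
        String fallback = raw.substring(close + 2);       // text after the ';'
        String fromEnv = env.get(variable);
        return (fromEnv != null) ? fromEnv : fallback;    // default only when unset
    }

    public static void main(String[] args) {
        // Uses the real process environment for the lookup.
        System.out.println(resolve("[WORKFLOWMGR_PORT];10000", System.getenv()));
        System.out.println(resolve("/tmp", System.getenv()));
    }
}
```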
20.3.8. Action Configuration
The workflow manager does not limit you to only the command-line actions which were introduced above (See Actions Enumerated). Those are just the actions which have already been implemented (i.e. freebies). If the current list of actions doesn’t meet your needs, you can write your own and plug them in via Spring. The file where actions are defined is action-beans.xml. This is where you can modify existing actions and/or add your own. The bean ids in this file are what are given as arguments for the --a command-line option (See Actions Enumerated). Notice how the bean ids in this file are the same as the names of the actions defined in the section Actions Enumerated. So, if you were to add a new action to this file, you could invoke it from the command-line as such:
$ ./engine-client --cfb WorkflowEngineClientFactory --a <your-action-id>
In order to create an action you would just extend the class, gov.nasa.jpl.oodt.cas.workflow.server.action.WorkflowEngineServerAction, and implement its one abstract method:
public abstract void performAction(WorkflowEngineClient weClient) throws Exception;
After which, you would just create a bean for it. The argument for this method is the bean created by the factory bean whose id you gave on the command-line via the -cfb flag. You can invoke any methods on this client class to communicate with the server to which it is configured. Let’s say we wrote an action, org.myactions.HelloWorld, which just printed out Hello World. It would look something like this:
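package org.myactions;
public class HelloWorld extends WorkflowEngineServerAction {
    @Override
    public void performAction(WorkflowEngineClient weClient) throws Exception {
        // This action ignores the client and simply prints a fixed message.
        System.out.println("Hello World");
    }
}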
And the Spring XML bean would look like this:
<bean id="HelloWorld" class="org.myactions.HelloWorld">
<property name="description" value="Prints out ‘Hello World’"/>
</bean>
Your action will now automatically appear in the list of a client’s supported actions (See Actions Enumerated) and is now a valid argument for the -a command-line option. Let’s take this one step further and make this action print out a custom message via our bean configuration. The new class will be named org.myactions.PrintMessage and looks something like:
package org.myactions;
public class PrintMessage extends WorkflowEngineServerAction {
private String message;
@Override
public void performAction(WorkflowEngineClient weClient) throws Exception {
System.out.println(this.message);
}
public void setMessage(String message) {
this.message = message;
}
}
This time, in order to demonstrate action reusability, we will use this class to create two actions:
<bean id="HelloWorld" class="org.myactions.PrintMessage">
<property name="description" value="Prints out ‘Hello World’"/>
<property name="message" value="Hello World"/>
</bean>
<bean id="ByeWorld" class="org.myactions.PrintMessage">
<property name="description" value="Prints out ‘Bye World’"/>
<property name="message" value="Bye World"/>
</bean>
As you can see, we used the same class, just named each bean differently and changed the message property. Now if we run HelloWorld it will print ‘Hello World’ and if we run ByeWorld it will print ‘Bye World’. We can take this even further. Even command-line options are configurable in the workflow manager. So we can make this action receive the message it is to print from the command-line when we run it, so that we don’t have to specify the message property in the Spring XML file.
20.3.9. Command-Line Configuration
The configuration for all the existing command-line options is in engine-client-cmd-line-beans.xml and engine-server-cmd-line-beans.xml, for the client and server respectively. For the most part, you will not be changing the server’s command-line options, so let’s focus on the client’s (however, by learning how the client’s command-line is configured you will have the knowledge to change the server’s if so desired). As mentioned above, the client’s configuration is in the workflow policy file engine-client-cmd-line-beans.xml. Let’s create a command-line option for our PrintMessage action in order to understand how this configuration works. The option we are going to create is message. First we will fill in just the basics:
<bean id="message" class="gov.nasa.jpl.oodt.cas.commons.option.CmdLineOption">
<property name="shortOption" value="msg"/>
<property name="longOption" value="message"/>
<property name="description" value="Some Message"/>
<property name="hasArgs" value="true"/>
<property name="optionArgName" value="message"/>
<property name="required" value="true"/>
</bean>
Each option is of class type: gov.nasa.jpl.oodt.cas.commons.option.CmdLineOption. This is a Java bean already written which allows new command-line options to be configured and plugged in. Let’s look at each CmdLineOption property one at a time:
shortOption : this is a short name for an option (you can use the short name by using a single dash ‘-’ in front of the option on the command line; so, -msg would invoke this option).
longOption : this is the long name for an option (you can use the long name by using two dashes ‘--’ in front of the option on the command line; so, --message would invoke this option).
description : This is the description which the -h or --help option will print out next to this command-line option.
hasArgs : This is a boolean property which, if set to true, requires some values to be given after the option on the command line (i.e. -option <values>).
optionArgName : This is the description of the args required by the option.
required : This is a boolean property which if set to true requires this option always to be given to the engine-client script.
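As a rough analogy only, the six properties above can be mirrored with Python's standard argparse module. This is not how CmdLineOption is implemented; it is a sketch of what each property controls, and the helper name build_parser is hypothetical.

```python
import argparse


def build_parser():
    # Hypothetical helper mirroring the "message" CmdLineOption bean above.
    parser = argparse.ArgumentParser(prog="engine-client")
    parser.add_argument(
        "-msg",               # shortOption: used with a single dash
        "--message",          # longOption: used with two dashes
        help="Some Message",  # description: printed by -h / --help
        metavar="message",    # optionArgName: label for the argument values
        nargs="+",            # hasArgs=true: one or more values must follow
        required=True,        # required=true: the option must always be given
    )
    return parser


args = build_parser().parse_args(["-msg", "Hello World"])
print(args.message)
```

Leaving the option off the command line makes argparse exit with a usage error, which corresponds to what required=true asks of the engine-client script.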
Now how do we get the argument value of this option into an action? Let’s take our PrintMessage action, define it only once now, and give it a new name, but this time omit the message property, since we are going to set this property via the command-line:
<bean id="PrintMessage">
<property name="description" value="Prints out a message"/>
</bean>
Here is the modified command-line option, which tells the message action which message to print:
<bean id="message" class="gov.nasa.jpl.oodt.cas.commons.option.CmdLineOption">
<property name="shortOption" value="msg"/>
<property name="longOption" value="message"/>
<property name="description" value="Some Message"/>
<property name="hasArgs" value="true"/>
<property name="optionArgName" value="message"/>
<property name="required" value="true"/>
<property name="handler">
<bean class="gov.nasa.jpl.oodt.cas.commons.option.handler.CmdLineOptionBeanHandler">
<property name="applyToBeans">
<list>
<bean
class="gov.nasa.jpl.oodt.cas.commons.option.handler.BeanInfo"
p:bean-ref="PrintMessage"/>
</list>
</property>
</bean>
</property>
</bean>
Notice that we added the property handler. Each CmdLineOption can be given a handler which allows you to plug in code that will run if this option is specified on the command-line. In this case, we have used an existing handler: CmdLineOptionBeanHandler. This handler allows you to take the command-line arguments of this option and pipe them into another bean’s property via the BeanInfo given to it. The property set by this handler on the given beans is the same as this option’s bean id (i.e. this option’s bean id is message, so PrintMessage will have its message property set). If the bean property which should be set by this option doesn’t match the id of the option bean, BeanInfo has a property methodName which allows you to specify a different method name which it should call. BeanInfo requires its bean-ref property (set in the XML above), which is the id of the bean whose property you want set. (Note: Spring supports setting properties on beans via attributes as well: p:<property-name>="<value>".) So if I run the following on the command-line:
$ ./engine-client -cfb WorkflowEngineClientFactory \
    -a PrintMessage \
    -msg "Hello From Command-Line!"
then “Hello From Command-Line!” will print out. This command-line option isn’t complete yet, though. We don’t want this message option to always be required; it should only be required when we use our action PrintMessage. This is done by using CmdLineOption’s requiredOptions property:
<bean id="message" class="gov.nasa.jpl.oodt.cas.commons.option.CmdLineOption">
<property name="shortOption" value="msg"/>
<property name="longOption" value="message"/>
<property name="description" value="Some Message"/>
<property name="hasArgs" value="true"/>
<property name="optionArgName" value="message"/>
<property name="requiredOptions">
<list>
<bean class="gov.nasa.jpl.oodt.cas.commons.option.required.RequiredOption">
<property name="optionLongName" value="action"/>
<property name="requireAllValues" value="false"/>
<property name="optionValues">
<list>
<value>PrintMessage</value>
</list>
</property>
</bean>
</list>
</property>
<property name="handler">
<bean class="gov.nasa.jpl.oodt.cas.commons.option.handler.CmdLineOptionBeanHandler">
<property name="applyToBeans">
<list>
<bean
class="gov.nasa.jpl.oodt.cas.commons.option.handler.BeanInfo"
p:bean-ref="PrintMessage"/>
</list>
</property>
</bean>
</property>
</bean>
The requiredOptions property takes a list of RequiredOption beans, each of which has three properties. requiredOptions only makes this option required if any of the RequiredOption(s) are satisfied on the command-line. In this case, there is only one RequiredOption given, and it specifies that if option action (the long name for the option -a) is given with the argument PrintMessage, then this option (i.e. message) is required. In other words, this option will only be required if -a PrintMessage is on the command-line. Each option supports one more property: validators. You can give an option a list of CmdLineOptionValidator(s). In this case, we will use an already written validator: gov.nasa.jpl.oodt.cas.commons.option.validator.ArgRegExpCmdLineOptionValidator. This validator restricts an option’s arguments to those matching the regular expressions listed in its allowedArgs property. Here is the message option with a validator that requires all its messages to start with “Hello ”:
<bean id="message" class="gov.nasa.jpl.oodt.cas.commons.option.CmdLineOption">
<property name="shortOption" value="msg"/>
<property name="longOption" value="message"/>
<property name="description" value="Some Message"/>
<property name="hasArgs" value="true"/>
<property name="optionArgName" value="message"/>
<property name="requiredOptions">
<list>
<bean class="gov.nasa.jpl.oodt.cas.commons.option.required.RequiredOption">
<property name="optionLongName" value="action"/>
<property name="requireAllValues" value="false"/>
<property name="optionValues">
<list>
<value>PrintMessage</value>
</list>
</property>
</bean>
</list>
</property>
<property name="handler">
<bean class="gov.nasa.jpl.oodt.cas.commons.option.handler.CmdLineOptionBeanHandler">
<property name="applyToBeans">
<list>
<bean
class="gov.nasa.jpl.oodt.cas.commons.option.handler.BeanInfo"
p:bean-ref="PrintMessage"/>
</list>
</property>
</bean>
</property>
<property name="validators">
<list>
<bean
class="gov.nasa.jpl.oodt.cas.commons.option.validator.ArgRegExpCmdLineOptionValidator">
<property name="allowedArgs">
<list>
<value>Hello\s.+</value>
</list>
</property>
</bean>
</list>
</property>
</bean>
You should now be able to look through both the client and server command-line bean files and understand what is going on.
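The three behaviours configured on the message option (conditional requirement, argument validation, and piping the value into a bean property) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CAS implementation: process_option and the PrintMessage stand-in class are hypothetical, requireAllValues=false is assumed to mean that any one listed value triggers the requirement, and the validator is assumed to require a full match of the regular expression.

```python
import re


class PrintMessage:
    """Stand-in for the PrintMessage action bean (hypothetical)."""
    message = None


def process_option(option_id, args, parsed, target, required_when, allowed_patterns):
    """Apply one option as described in this section (illustrative only).

    option_id        -- bean id of the option, e.g. "message"
    args             -- values given for this option, or None if it is absent
    parsed           -- other options on the command line: long name -> values
    target           -- bean whose property gets set (BeanInfo's bean-ref)
    required_when    -- (option long name, triggering values), like RequiredOption
    allowed_patterns -- regular expressions, like the validator's allowedArgs
    """
    other_name, trigger_values = required_when
    given = parsed.get(other_name, [])
    # Assumption: with requireAllValues=false, any one trigger value suffices.
    required = any(value in given for value in trigger_values)
    if args is None:
        if required:
            raise ValueError(f"--{option_id} is required with --{other_name} {given}")
        return
    # Assumption: each argument must fully match at least one allowed pattern.
    for arg in args:
        if not any(re.fullmatch(pattern, arg) for pattern in allowed_patterns):
            raise ValueError(f"invalid argument for --{option_id}: {arg!r}")
    # Handler step: set the property named after the option id on the target.
    setattr(target, option_id, args[0] if len(args) == 1 else args)


action = PrintMessage()
process_option(
    "message", ["Hello From Command-Line!"], {"action": ["PrintMessage"]},
    action, ("action", ["PrintMessage"]), [r"Hello\s.+"],
)
print(action.message)
```

Calling the same function with args=None raises an error only when the action option carries PrintMessage, which is the conditional requirement described above.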
21. CAS-PGE
Up until now, we have used a Grocery Store example for everything; however, that is not the typical usage of CAS-Workflow. CAS-Workflow is typically used as part of a data processing system, where workflows are responsible for controlling the run order of different Product Generation Executables (PGEs). A PGE is basically some code which, when given input files, generates output file(s). CAS-Workflow accomplishes this via another CAS component, CAS-PGE, which is a PGE-wrapping workflow TaskInstance. CAS-PGE was designed to help accomplish the most common actions required to run these PGEs: finding the input files, executing the PGE, saving the output files, etc. CAS-PGE uses yet another CAS component: CAS-FileManager. The File Manager is the part of the data processing system that manages the data files. It supports metadata-filtering queries across these files to allow for fast file pinpointing. It also supports data file ingestion for placing new data files into the File Manager.
21.1. Structure
PGEs usually need a way to be told how to run, what to run with (i.e. input files), where to place the output files, and what to name them. CAS-PGE accomplishes this by generating a configuration file (or files), which the PGE must read in and which is usually passed to the PGE via the command-line. CAS-PGE allows you to plug in the code for writing these configuration files, which maps workflow metadata to the desired configuration file format. This interface in CAS-PGE is: SciPgeConfigFileWriter. CAS-PGE also allows you to specify the code that controls which metadata should be sent to the File Manager with each output file you configure CAS-PGE to ingest. This interface is: PcsMetFileWriter. CAS-PGE’s base TaskInstance, PGETaskInstance, is an abstract class with all methods protected and implemented. The reason for this is to enforce the idea that PGETaskInstance is not meant to accomplish every action imaginable, but only the baseline of common actions required for wrapping PGEs. That is, it is meant to be an extension point: you inherit a default set of modularized actions which are performed in a default order, and you can override any of them (i.e. Java methods) to customize only where needed while still inheriting all the common actions you still need. If you want to use PGETaskInstance as is, use StdPGETaskInstance, which inherits all of PGETaskInstance’s actions with no changes. CAS-PGE also needs one more piece of information: configuration on how it should run. That is, how many configuration files it should generate, which SciPgeConfigFileWriter(s) to use to create these configuration files, which output files need which PcsMetFileWriter to generate their metadata for File Manager ingestion, how to run the PGE, which File Manager to talk to, etc. This is done via a combination of two mediums: metadata and a PgeConfigBuilder.
There is a set of reserved metadata fields that CAS-PGE expects, which affect the way it runs (e.g. which File Manager to ingest to). PgeConfigBuilder is the plug-in point where one builds up a PgeConfig object, which controls how CAS-PGE runs. PGETaskInstance is not only inheritable; it also has an additional plug-in which allows you to set and/or modify metadata to get a slightly different execution out of a given PGETaskInstance. In other words, you can have different inherited PGETaskInstances that can each be configured to run several different ways by changing out their ConfigFilePropertyAdder (i.e. the PGETaskInstance plug-in). CAS-PGE also has a workflow condition which is responsible for finding the input files for the PGE it will run and adding the files to the workflow manager’s dynamic metadata, so that the PGETaskInstance which runs after it knows which files to give its PGE. This condition is: PGETaskWorkflowCondition. Figure 13 graphically shows the plug-in points of CAS-PGE’s PGETaskWorkflowCondition. Figure 14 graphically shows the plug-in points of CAS-PGE’s PGETaskInstance. It also shows a few of the default implementations of these plug-in points, so CAS-PGE may work out-of-the-box for you (these plug-ins will be explained in detail later).
Figure 13
Figure 14
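The inheritance idea behind PGETaskInstance (implemented steps that run in a default order, any of which can be overridden) can be sketched as follows. This is a Python illustration of the pattern only; the class and method names are hypothetical and do not correspond to the actual Java methods, and the sketch does not enforce that the base class is abstract.

```python
class PGETaskInstanceSketch:
    """Illustrative base: every step is implemented and runs in a fixed order."""

    def run(self, metadata):
        # Default order of the modularized actions (step names are hypothetical).
        return [
            self.create_config_files(metadata),
            self.execute_pge(metadata),
            self.ingest_output_files(metadata),
        ]

    def create_config_files(self, metadata):
        return "wrote config files"

    def execute_pge(self, metadata):
        return "ran " + metadata["PGETask/Name"]

    def ingest_output_files(self, metadata):
        return "ingested output files"


class StdSketch(PGETaskInstanceSketch):
    """Plays the role of StdPGETaskInstance: inherits everything unchanged."""


class DryRunSketch(PGETaskInstanceSketch):
    """Overrides a single step and inherits the rest."""

    def execute_pge(self, metadata):
        return "skipped " + metadata["PGETask/Name"]


print(DryRunSketch().run({"PGETask/Name": "DemoPGE"}))
```

Only the overridden step changes; the order of the steps and the other default actions are inherited as they are.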
21.2. Pre-Condition
The CAS-PGE Workflow Pre-Condition, PGETaskWorkflowCondition, is a customizable File Manager querying task which allows one to query the File Manager, control the way the resulting files are stored into metadata, and validate the results to ensure the expected files exist.
21.2.1. PropertyAdders
PGETaskWorkflowCondition has support for plugging in zero or more WorkflowConditionPropAdder(s), which run before this condition queries the File Manager. These property adders allow you to add or modify metadata before the query is built.
21.2.2. Query Building
PGETaskWorkflowCondition builds its query based on metadata fields in the following format:
<ProductType>/<ElementName>/<Term | Start | End | Start_Incl | End_Incl>.
Where ProductType is any ProductType supported by the File Manager being queried, ElementName is any ElementName which the given ProductType supports, and the last metadata field group is as follows:
Term: Equal to.
Start: Greater than.
End: Less than.
Start_Incl: Greater than or equal to.
End_Incl: Less than or equal to.
Consider this example metadata field:
MOA_IASI_L1C/Filename/Term=‘data.dat’
This field would cause PGETaskWorkflowCondition to query for any file of product type MOA_IASI_L1C whose Filename metadata field is equal to data.dat, but only if MOA_IASI_L1C is listed as a value in the metadata field PGETask/Condition/ProductTypeNames. If multiple metadata fields exist, then they are all combined into a single query, similar to combining SQL WHERE clauses via ANDs.
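The key format and the AND-combination described above can be sketched in Python. This is not the CAS-PGE implementation: build_where_clauses is a hypothetical helper, the output is an SQL-like string chosen for readability, grouping the clauses per product type is an assumption, and the StartDate element in the usage example is made up.

```python
# Maps the last segment of the metadata key to a comparison operator.
OPERATORS = {
    "Term": "=",
    "Start": ">",
    "End": "<",
    "Start_Incl": ">=",
    "End_Incl": "<=",
}


def build_where_clauses(metadata):
    """Turn <ProductType>/<ElementName>/<Op> fields into AND-joined clauses."""
    names = metadata.get("PGETask/Condition/ProductTypeNames", [])
    clauses = {}
    for key, value in metadata.items():
        parts = key.split("/")
        if len(parts) != 3 or parts[2] not in OPERATORS:
            continue  # not a query-building field
        product_type, element, op = parts
        if product_type not in names:
            continue  # product type is not listed, so the field is ignored
        clauses.setdefault(product_type, []).append(
            f"{element} {OPERATORS[op]} '{value}'"
        )
    return {pt: " AND ".join(parts) for pt, parts in clauses.items()}


example = {
    "PGETask/Condition/ProductTypeNames": ["MOA_IASI_L1C"],
    "MOA_IASI_L1C/Filename/Term": "data.dat",
    "MOA_IASI_L1C/StartDate/Start_Incl": "2009-01-01",  # hypothetical element
}
print(build_where_clauses(example))
```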
21.2.3. Post-PropertyAdders
These are the same as PropertyAdders, but they are run after the query and allow one to set additional metadata fields based on the query results or to perform additional checks on the query results before allowing the condition to succeed.
21.2.4. Query Result Validation
PGETaskWorkflowCondition allows query result validation by checking for an expected number of files and by checking gap sizes between files in the query results. This condition can also be given a timeout, which is the amount of time for which the condition will keep trying to find the expected number of files. After the timeout it will look only for the minimum number of files, and if the minimum is not found, the condition fails.
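The validation rules above can be sketched as a small decision function. This is a Python illustration under several assumptions that the text does not settle: files are represented as (start, end) pairs in milliseconds, a gap is measured from one file's end to the next file's start, and "expected number" is treated as "at least that many". The function and parameter names are hypothetical.

```python
def has_gap(files, max_gap_ms):
    """True if two consecutive files are separated by more than max_gap_ms."""
    ordered = sorted(files)
    return any(
        nxt[0] - cur[1] > max_gap_ms for cur, nxt in zip(ordered, ordered[1:])
    )


def condition_passes(files, expected, minimum=0, timeout_s=None,
                     elapsed_s=0.0, max_gap_ms=None):
    """Decide whether the query results satisfy the condition.

    files      -- list of (start_ms, end_ms) pairs, one per query result
    expected   -- number of files wanted before the timeout is reached
    minimum    -- number of files accepted once the timeout is reached
    timeout_s  -- seconds after which `minimum` replaces `expected`
    elapsed_s  -- seconds the condition has been waiting so far
    max_gap_ms -- largest allowed gap between files; None disables the check
    """
    if max_gap_ms is not None and has_gap(files, max_gap_ms):
        return False
    timed_out = timeout_s is not None and elapsed_s >= timeout_s
    needed = minimum if timed_out else expected
    return len(files) >= needed


results = [(0, 10), (10, 20)]
print(condition_passes(results, expected=3))
print(condition_passes(results, expected=3, minimum=2, timeout_s=60, elapsed_s=61))
```

With no timeout set, the function keeps requiring the expected number of files, which matches the note under PGETask/Condition/Timeout that the condition will then never time out.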
21.3. Task
//TODO!!!!
21.4. Reserved Metadata
21.4.1. PGETask/Name
Description:
The name of this CAS-PGE wrapper.
Usage:
Used in the name of the script used to run the PGE.
Required:
Yes
21.4.2. PGETask/PgeConfigBuilder
Description:
The implementing class for PgeConfigBuilder.
Usage:
Used to load your custom PgeConfigBuilder.
Required:
No. Default will be XmlFilePgeConfigBuilder.
21.4.3. PGETask/LogFilePattern
Description:
Java logging pattern (See java.util.logging).
Usage:
Will capture the output of the PGE into log files using the given pattern.
Required:
No, but PGE output streams will not be captured if not specified.
21.4.4. PGETask/PropertyAdders
Description:
Comma separated list of ConfigFilePropertyAdders. These are the property adders discussed in Structure.
Usage:
Will be run by PGETaskInstance so metadata can be augmented by the given property adders.
Required:
No.
21.4.5. PGETask/Runtime
Description:
This field should not be set! It is set by CAS-PGE and can be used in your output metadata. The value of this field is the time it took to execute the PGE, in milliseconds.
Usage:
To store PGE runtime in metadata.
Required:
Should NOT be set!!!
21.4.6. PGETask/DumpMetadata
Description:
A boolean field which, if set to true, will dump out an XML file with the current state of instance metadata in CAS-PGE. This is useful for debugging and when using the XslTransformWriter (discussed later).
Usage:
If set to true, will cause the XML file ‘pgetask-metadata.xml’ to be written in the PGE working directory containing the current instance metadata.
Required:
No. This only has an effect if set to true.
21.4.7. PGETask/Query/FileManagerUrl
Description:
The URL of the File Manager which will be queried for input files.
Usage:
Used by PGETaskWorkflowCondition.
Required:
Yes, if PGETaskWorkflowCondition is used.
21.4.8. PGETask/Query/ClientTransferServiceFactory
Description:
Currently not implemented. The ClientTransferServiceFactory used by the File Manager to transfer files out of the catalog (for file staging; for now, file staging always uses XML-RPC to transfer the file from the File Manager server).
Usage:
Used by PGETaskWorkflowCondition.
Required:
Yes, if PGETaskWorkflowCondition is used.
21.4.9. PGETask/Ingest/FileManagerUrl
Description:
The URL of the File Manager which all output files will be ingested into.
Usage:
Used by CAS-Crawler (used in CAS-PGE) to ingest output files.
Required:
Yes.
21.4.10. PGETask/Ingest/ClientTransferServiceFactory
Description:
The ClientTransferServiceFactory used to transfer output files into the File Manager when ingested.
Usage:
Used by CAS-Crawler (used in CAS-PGE) to transfer output files during ingest.
Required:
Yes.
21.4.11. PGETask/Ingest/ActionRepoFile
Description:
The Spring CAS-Crawler Action repo file (See CAS-Crawler).
Usage:
See CAS-Crawler.
Required:
No.
21.4.12. PGETask/Ingest/ActionIds
Description:
The CAS-Crawler Action ids you run from the action repo file (See PGETask/Ingest/ActionRepoFile).
Usage:
See PGETask/Ingest/ActionRepoFile.
Required:
If PGETask/Ingest/ActionRepoFile is used and you want to run actions specified in the file, then Yes, otherwise No.
21.4.13. PGETask/Ingest/CrawlerCrawlForDirs
Description:
This is a boolean field, which if set to true, tells CAS-Crawler to ingest directories (See CAS-Crawler).
Usage:
See CAS-Crawler.
Required:
No.
21.4.14. PGETask/Ingest/CrawlerRecur
Description:
This is a boolean field, which if set to true, tells CAS-Crawler to recursively crawl to the bottom level directory. If PGETask/Ingest/CrawlerCrawlForDirs is set to true, then the crawl will ingest the lowest level directories in a directory structure tree, otherwise the lowest level files will be ingested.
Usage:
See CAS-Crawler.
Required:
No.
21.4.15. PGETask/Ingest/MetFileExtension
Description:
This is the file extension which CAS-Crawler will look for which contains the File Manager ingest metadata for the output data file with the same name minus this extension in the same directory (See CAS-Crawler).
Usage:
See CAS-Crawler.
Required:
No. Default is: met
21.4.16. PGETask/Ingest/RequiredMetadata
Description:
This is the required metadata CAS-Crawler will check for in each data file it tries to ingest. If any of it doesn’t exist, CAS-Crawler will not ingest the file into the File Manager (See CAS-Crawler).
Usage:
See CAS-Crawler.
Required:
No. See CAS-Crawler for default list of required metadata.
21.4.17. PGETask/Condition/Timeout
Description:
This is the number of seconds from CreationDate at which the condition will switch to evaluating against minimum allowed input file results instead of expected number of input files.
Usage:
Used by PGETaskWorkflowCondition to determine when to switch to evaluating against minimum number of files.
Required:
No, but then the condition will never time out.
21.4.18. PGETask/Condition/PropertyAdders
Description:
These are similar to PGETask/PropertyAdders, except they are for PGETaskWorkflowCondition.
Usage:
Allows metadata augmentation for PGETaskWorkflowCondition.
Required:
No.
21.4.19. PGETask/Condition/PostPropertyAdders
Description:
These are the same as PGETask/Condition/PropertyAdders, except they run after the condition has queried the File Manager.
Usage:
Allows metadata augmentation for PGETaskWorkflowCondition.
Required:
No.
21.4.20. PGETask/Condition/FilterAlgorClass
Description:
This is the implementation of gov.nasa.jpl.oodt.cas.filemgr.structs.query.filter.FilterAlgor you would like to use to further filter query results.
Usage:
Used in complex queries to CAS File Manager.
Required:
No.
21.4.21. PGETask/Condition/StartDateTimeKey
Description:
This is the metadata field in which each input file’s metadata stores its data’s start time in UTC format.
Usage:
Used by both the FilterAlgor specified by PGETask/Condition/FilterAlgorClass and by gap analysis if turned on by setting PGETask/Condition/MaxGap/Size.
Required:
If either PGETask/Condition/FilterAlgorClass or PGETask/Condition/MaxGap/Size is set then Yes, otherwise No – however it is probably good practice to just set it anyway, since you may want to use it in your condition property adders (See PGETask/Condition/PropertyAdders and PGETask/Condition/PostPropertyAdders).
21.4.22. PGETask/Condition/EndDateTimeKey
Description:
Same as PGETask/Condition/StartDateTimeKey, except it is the end time.
Usage:
See PGETask/Condition/StartDateTimeKey
Required:
See PGETask/Condition/StartDateTimeKey
21.4.23. PGETask/Condition/EpsilonInMillis
Description:
Defines the time, in milliseconds, allowed between the files in the query results when filtered through the class defined by PGETask/Condition/FilterAlgorClass.
Usage:
Used by the class set in PGETask/Condition/FilterAlgorClass.
Required:
No. FilterAlgor will use its default.
21.4.24. PGETask/Condition/VersioningKey
Description:
The metadata field for each input file that contains the metadata value which should be used to determine priority of overlapping input files when filtered by the FilterAlgor specified via PGETask/Condition/FilterAlgorClass.
Usage:
Used by the class set in PGETask/Condition/FilterAlgorClass.
Required:
If PGETask/Condition/FilterAlgorClass is set then Yes, otherwise No.
21.4.25. PGETask/Condition/VersionConverter
Description:
This is the class which converts the metadata value of the metadata field specified by PGETask/Condition/VersioningKey into a priority double number (the higher the number, the higher the priority).
Usage:
Used by the class set in PGETask/Condition/FilterAlgorClass.
Required:
No, if the metadata value is convertible via the default VersionConverter, AsciiSortableVersionConverter. Otherwise, if PGETask/Condition/VersioningKey is set, then Yes.
21.4.26. PGETask/Condition/SortByKey
Description:
This is the metadata field which is used to sort the query results – happens post filtering, but does not require filtering to happen.
Usage:
Used in complex queries to CAS File Manager.
Required:
No. Results will be in default CAS File Manager order (sorted by ingest date).
21.4.27. PGETask/Condition/ProductTypeNames
Description:
A list of ProductTypes which this condition will query on.
Usage:
See Query Building.
Required:
Yes.
21.4.28. PGETask/Condition/ExpectedNumOfFiles
Description:
The number of files expected to come back from the File Manager query.
Usage:
To evaluate whether all the expected files are available.
Required:
Yes.
21.4.29. PGETask/Condition/MinNumOfFiles
Description:
The minimum number of files returned from the File Manager query that are allowed after the timeout is reached.
Usage:
Used as the worst case scenario number of files.
Required:
No, but this means that after the timeout a query that returns 0 files is acceptable.
21.4.30. PGETask/Condition/MaxGap/Size
Description:
The maximum allowed time, in milliseconds, between files returned from the File Manager query.
Usage:
Used by condition gap analysis to help ensure there are no missing files.
Required:
No, but then gap analysis will not be used.
21.4.31. PGETask/Condition/MaxGap/StartDateTime
Description:
This sets the start date and time at which the files returned from the File Manager query are expected to start (gap analysis is done between this time and the first query result’s PGETask/Condition/StartDateTimeKey value).
Usage:
Used by condition gap analysis to ensure there are no missing files before the list of results from the File Manager query. Should be in UTC format.
Required:
No, but then gap analysis will not be done for missing files before the returned results.
21.4.32. PGETask/Condition/MaxGap/EndDateTime
Description:
Same as PGETask/Condition/MaxGap/StartDateTime, except this is the end date and time.
Usage:
Same as PGETask/Condition/MaxGap/StartDateTime but gap analysis is done at the end of the results.
Required:
No, but then gap analysis will not be done for missing files after the returned results.
21.4.33. PGETask/Condition/ResultKeyFormats
Description:
Used to help set metadata fields based on query results.
Usage:
This can be set to N number of comma separated:
_
_
Where <metadata-field> is the metadata field you would like to set and <value> is the value you would like to give that metadata field. <value> also supports injecting file metadata into these values. For instance,
_
, will set a multi-valued metadata field _Filenames equal to a list of the Filename metadata fields from each input file result from the File Manager query.
Required:
No.
21.4.34. PGETask/Condition/SqlQueryKey
Description:
This is the metadata field in which you would like the condition to place the SQL version of the File Manager query performed.
Usage:
Used to allow the user to see the query a condition performs.
Required:
No, but then the query will not be stored in workflow metadata.
21.5. PGETaskInstance Default Plug-ins
XmlFilePgeConfigBuilder is the default PgeConfigBuilder which uses XML to generate the PgeConfig object which controls how PGETaskInstance executes. First off, XmlFilePgeConfigBuilder extends FileBasedPgeConfigBuilder and that class supports two metadata fields:
PGETask/FileBasedConfig/ConfigFilePath:
The path to the xml configuration file.
PGETask/FileBasedConfig/StageConfigFile:
A boolean, which if set to true, will cause the configuration file to be staged to the temporary directory on the computer which the task executes on (only if the file path set by ConfigFilePath does not exist), then changes the ConfigFilePath metadata field’s value to the path of the configuration file in the temporary directory.
In order to demonstrate how this works let’s walk through an example. Our example will take a list of files and concatenate them into one file. Let’s start by creating the workflow XML model. This model assumes you have set the environment variable, FILEMGR_URL, to the URL of the File Manager.
<?xml version="1.0" encoding="UTF-8"?>
<cas:workflows
xmlns="http://oodt.jpl.nasa.gov/2.0/cas"
xmlns:cas="http://oodt.jpl.nasa.gov/2.0/cas"
xmlns:p="http://oodt.jpl.nasa.gov/2.0/cas/property">
<sequential id="urn:acce:DemoWorkflow" name="DemoWorkflow">
<configuration>
<!-- PCS properties -->
<property name="PGETask/Query/FileManagerUrl" value="[FILEMGR_URL]" envReplace="true"/>
<property name="PGETask/Ingest/FileManagerUrl" value="[FILEMGR_URL]" envReplace="true"/>
<property name="PGETask/Ingest/ClientTransferServiceFactory"
value="gov.nasa.jpl.oodt.cas.filemgr.datatransfer.RemoteDataTransferFactory"/>
<property name="PGETask/Ingest/MetFileExtension" value="cas"/>
<property name="PGETask/Ingest/CrawlerCrawlForDirs" value="false"/>
<property name="PGETask/Ingest/CrawlerRecur" value="false"/>
<!-- Wait time between block and unblock in minutes -->
<property name="BlockTimeElapse" value="1"/>
</configuration>
<task id-ref="urn:acce:DemoPGE"/>
</sequential>
<task id="urn:acce:DemoPGE" name="DemoPGE" class="gov.nasa.jpl.oodt.cas.pge.StdPGETaskInstance">
<configuration>
<property name="PGETask/Name" value="DemoPGE"/>
<property name="PGETask/FileBasedConfig/StageConfigFile" value="true"/>
<property name="QueueName" value="exe"/>
</configuration>
</task>
</cas:workflows>
Notice that FILEMGR_URL was used to set both the ingest and query File Manager URL. This is because, typically, the File Manager used for querying will be the same File Manager used for ingesting. However, there are cases in which they are different, which is why there are two different metadata fields. The ClientTransferServiceFactory chosen in this example was: gov.nasa.jpl.oodt.cas.filemgr.datatransfer.RemoteDataTransferFactory. There are two currently implemented, and they are part of the CAS-FileManager component. The first is the one used in the example and the second is: gov.nasa.jpl.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory. The difference between the two is that Remote uses XML-RPC, and hence utilizes the File Manager server to transfer file data. In cases where there are a lot of file transfers, the File Manager server can be tied up with data transfers, so use it carefully. This is one of those cases where you might want to make your ingest File Manager different from your query File Manager, so as to split up the data transfers (one File Manager server for ingest transfers and one for query transfers). Sometimes Remote may be your only option: it is needed when your jobs cannot see the directory where the File Manager stores its files, so in order for PGEs to run, these files must be staged to the PGE working directory by remote transfer through the File Manager, and the output files must be transferred back to the File Manager in the same manner. Another thing to note: the File Manager supports letting either its server or its client handle the data transfers. However, if the remote transferer is used and the File Manager and job disk space are not the same and not visible to each other, then you must use client transfer, otherwise your transfers will fail! Only use the remote transferer in cases where the File Manager and job disk space are not the same and not visible to each other.
The Local data transfer can be used when the File Manager’s files are visible to your job (i.e. either they are on the same disk or linked via some remote mounting file system). The MetFileExtension is set to cas, so the crawler (built into CAS-PGE) knows that, for each file it is to ingest, there will be a metadata file with the same name with the added extension of cas. CrawlerCrawlForDirs and CrawlerRecur are both false because this concatenation PGE generates a file. Here is the script used to run this workflow (which requires PCS_HOME to be set):
#!/bin/sh
./engine-client -cfb WorkflowEngineClientFactory -a StartWorkflow \
    -mid urn:acce:DemoWorkflow \
    -m QueueName test \
    -m DemoPGE/RunInDir ${PCS_HOME}/DemoPGE/test-1 \
    -m PGETask/FileBasedConfig/ConfigFilePath ${PCS_HOME}/core/pge/policy/demo/config/pge-config.xml \
    -m DemoPGE/ConfigXsltFile ${PCS_HOME}/core/pge/policy/demo/config/xslt-config.xsl \
    -m DemoPGE/ConfigMetoutFile ${PCS_HOME}/core/pge/policy/demo/config/metout-config.xml \
    -m DemoPGE/DemoJar `ls ${PCS_HOME}/core/pge/lib/acce-pge-*.jar` \
    -m DemoPGE/InputFiles ${PCS_HOME}/core/pge/policy/demo/input/hello.txt ${PCS_HOME}/core/pge/policy/demo/input/john.txt
Let’s step through the metadata set here. The first is QueueName:
-m QueueName test
By setting QueueName on the command-line we are adding it to the workflow’s dynamic metadata, which will override its static metadata. This was done because, for this run, instead of sending the job to the exe queue in the ResourceRunner, it was decided that it should be sent to the test queue. If this line were removed from the script, then the job would be sent to the exe queue (so by adding QueueName to our static metadata, we have basically created a default queue value for this job). The next line sets the PGE’s working directory:
-m DemoPGE/RunInDir ${PCS_HOME}/DemoPGE/test-1
This setup requires us to set the unique directory name ourselves every time the PGE is run. This may be what you want, but there is another way to have this done automatically (it will be discussed later). Then comes the line which sets the path to our CAS-PGE XML config file:
-m PGETask/FileBasedConfig/ConfigFilePath ${PCS_HOME}/core/pge/policy/demo/config/pge-config.xml
The meaning of the other metadata lines will become apparent later; for now, let’s take a look at this pge-config.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright (c) 2009, California Institute of Technology.
ALL RIGHTS RESERVED. U.S. Government sponsorship acknowledged.
$Id$
-->
<pgeConfig>
<dynInputFiles>
<file path="[RunInDir]/ConfigFile.txt" writerClass="[ConfigWriter]"
args="[ConfigXsltFile],[XsltUseCDATA]"/>
</dynInputFiles>
<fileStaging dir="[RunInDir]/input-files">
<stageFiles metadataKey="ConfigXsltFile"/>
<stageFiles metadataKey="ConfigMetoutFile"/>
<stageFiles metadataKey="DemoJar"/>
<stageFiles metadataKey="InputFiles"/>
</fileStaging>
<exe dir="[RunInDir]" shellType="/bin/csh">
<cmd>touch [OutputFile]</cmd>
<cmd>[JAVA_HOME]/bin/java -cp [DemoJar] [DemoPgeClass] ConfigFile.txt > [OutputFile]</cmd>
</exe>
<output>
<dir path="[RunInDir]" createBeforeExe="true">
<files name="[OutputFile]" metFileWriterClass="[MetFileWriter]" args="[ConfigMetoutFile]"/>
</dir>
</output>
<customMetadata>
<metadata key="RunInDir" key-ref="[PGETask/Name]/RunInDir"/>
<metadata key="DemoJar" key-ref="[PGETask/Name]/DemoJar"/>
<metadata key="ConfigMetoutFile" key-ref="[PGETask/Name]/ConfigMetoutFile"/>
<metadata key="ConfigXsltFile" key-ref="[PGETask/Name]/ConfigXsltFile"/>
<metadata key="InputFiles" key-ref="[PGETask/Name]/InputFiles"/>
<metadata key="OutputFile" val="ConcatOutput.txt"/>
<metadata key="ConfigWriter" val="gov.nasa.jpl.oodt.cas.pge.writers.xslt.XslTransformWriter"/>
<metadata key="MetFileWriter"
val="gov.nasa.jpl.oodt.cas.pge.writers.metlist.MetadataListPcsMetFileWriter"/>
<metadata key="DemoPgeClass" val="gov.nasa.jpl.acce.pge.demo.DemoPGE"/>
<metadata key="XsltUseCDATA" val="true"/>
</customMetadata>
</pgeConfig>
This file always begins with a <pgeConfig> element, and this element supports sub-elements which allow you to specify how to build the PGE configuration files, stage its input files, execute the PGE, ingest its output files, augment metadata, and (not shown in this example) import other XML configuration files. Let’s start with the element for augmenting metadata: <customMetadata>. Although this element is at the end of the file, that doesn’t mean it is the last to be loaded. <customMetadata> is actually the first element loaded in this pge-config.xml (the only element that is loaded before it is the import element, which is not in this example file). Inside <customMetadata> any number of <metadata> elements are allowed. <metadata> supports only attributes; it has no sub-elements. The key attribute seen in the example is the name of the metadata field you would like to add to this job’s local metadata. Following the key attribute, you will notice that the example uses two different attributes: key-ref and val. Both of these attributes support environment variable replacement, and also metadata replacement using the same technique as with environment variables. As in the workflow XML model files, you can use environment variables surrounded by brackets to inject values. In CAS-PGE’s XML configuration file, this bracket replacement will also check whether the value enclosed within the brackets is an existing metadata field. There are also pre-defined functions allowed in the brackets. The replacement order is: first check if there is a pre-defined function, and if so, replace; otherwise, check if there is a metadata field set with the enclosed name, and if so, replace; otherwise, check if there is an environment variable set with the enclosed name, and if so, replace; otherwise, replace with null.
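The replacement order just described (pre-defined function, then metadata field, then environment variable, then null) can be sketched in Python. This is an illustration, not the CAS-PGE parser: resolve is a hypothetical helper, pre-defined functions are modelled as a plain name-to-callable table because their real syntax is not covered here, and a failed lookup is assumed to produce the literal text null.

```python
import os
import re


def resolve(text, metadata, functions=None, environ=None):
    """Replace every [name] in text using the documented lookup order."""
    functions = functions or {}
    environ = os.environ if environ is None else environ

    def lookup(match):
        name = match.group(1)
        if name in functions:          # 1) pre-defined function (hypothetical table)
            return str(functions[name]())
        if name in metadata:           # 2) metadata field
            return str(metadata[name])
        if name in environ:            # 3) environment variable
            return environ[name]
        return "null"                  # 4) nothing matched (assumed literal)

    return re.sub(r"\[([^\[\]]+)\]", lookup, text)


print(resolve("[RunInDir]/ConfigFile.txt", {"RunInDir": "/tmp/run"}, environ={}))
```

Because metadata is consulted before the environment, a metadata field with the same name as an environment variable wins in this sketch.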
The key attribute doesn’t support dynamic replacement. If you want dynamic replacement for a key, you can instead use key-gen, which is the same as key but tells the XML parser that you would like dynamic replacement run on this key. key-ref is similar to a C or C++ pointer; that is, you are linking one key to another key. Let’s take a look at the first <metadata> element:
<metadata key="RunInDir" key-ref="[PGETask/Name]/RunInDir"/>
This line creates a linking metadata field RunInDir, which links to the metadata field DemoPGE/RunInDir (i.e. [PGETask/Name] was replaced with DemoPGE, the value of PGETask/Name set in our example workflow XML model file above). The key-ref attribute gives us two important features: 1) it allows a key that isn’t known until runtime to be named and used throughout the XML configuration file (as in the case of RunInDir), and 2) this job’s instance metadata will now have a metadata field RunInDir which can be queried on. The benefit of the latter is that, if all jobs utilize this line, we can query across all jobs just by asking for their RunInDir, instead of having to ask for each job’s sub-grouped RunInDir. If we had several PGEs in this example workflow, each task would have a different metadata field for its RunInDir, and we would have to specify each of them in the reduced terms of our command-line query if we queried across all tasks for this example workflow. You can create a key-ref which links to another key-ref; however, at the bottom of any chain of key-refs there must be an actual metadata field which is linked to. Also, when a key-ref is created, the metadata field which it links to does not have to exist at that moment; the metadata field only needs to exist when the key is dereferenced inside brackets. key-refs can also link to keys specified in imported files (discussed later). The val attribute supports the same dynamic replacements as key-ref, but when val is used a new metadata field is created in the job’s local metadata with the given value of the val attribute. All the <metadata> elements are loaded in the order they are given, so any metadata field created can be used for dynamic replacement in any of the <metadata> elements that follow it. All the metadata fields created in this <customMetadata> section can then be used in dynamic replacement throughout the rest of this XML configuration file.
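The pointer-like behavior of key-ref can be sketched as follows. This is a minimal model under stated assumptions, not CAS-PGE’s code: the class LocalMetadata, its method names, the second link WorkDir, and the circular-link check are all hypothetical and exist only to illustrate that a link may be created before its target exists and is followed only when dereferenced.

```python
class LocalMetadata:
    """Toy model of a job's local metadata with key-ref style links."""

    def __init__(self):
        self.values = {}  # key -> actual value (like val)
        self.links = {}   # key -> key it points to (like key-ref)

    def set_value(self, key, value):
        self.values[key] = value

    def link(self, key, target):
        # The target does not have to exist when the link is created.
        self.links[key] = target

    def get(self, key):
        # Follow the chain of links until an actual metadata field is reached.
        seen = set()
        while key in self.links:
            if key in seen:
                raise ValueError("circular key-ref: " + key)
            seen.add(key)
            key = self.links[key]
        if key not in self.values:
            raise KeyError(key)  # chain does not end in a real field
        return self.values[key]

meta = LocalMetadata()
meta.link("RunInDir", "DemoPGE/RunInDir")  # target not set yet
meta.link("WorkDir", "RunInDir")           # hypothetical link to a link
meta.set_value("DemoPGE/RunInDir",
               "/home/bfoster/PCS/pge_working_dir/DemoPge/test-1")
print(meta.get("WorkDir"))
```

Dereferencing WorkDir before DemoPGE/RunInDir is set would raise an error in this sketch, which mirrors the rule that the linked field must exist by the time the key is used inside brackets.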
The <metadata> element does support other attributes, but for now we will stick to the two we’ve been introduced to. Let’s now move on to the <fileStaging> element, which is the next element loaded in this configuration file. The placement order of the elements inside <pgeConfig> is not fixed; you can put these elements in any order you want. This has just always been the order that seemed to look best to me. Another thing to note is that all the elements inside <pgeConfig> are optional. In this example we stage input files; however, if you do not need to stage input files, the <fileStaging> element can be omitted. In the example, file staging will only occur if the files don’t exist where the job runs. If file staging should always happen, there is a force attribute which, if set to true, will force file staging. <fileStaging> has an attribute dir, which is the directory where the files will be staged, and supports any number of <stageFiles> sub-elements. Each <stageFiles> element has an attribute metadataKey, which is the name of the metadata field whose values are paths to input files. The metadata fields specified by each <stageFiles> will have their values (i.e. file paths) replaced with the new file paths, so if you want to keep the original file paths, make sure you copy them to another key in the <customMetadata> section.
The next element parsed is <dynInputFiles>. This element lets you specify how to create the PGE configuration file(s). In this example only one configuration file is created, which is the usual case; however, <dynInputFiles> supports any number of <file> elements, where each <file> element represents a PGE configuration file. Each <file> element supports three attributes, all of which support dynamic replacement: path, writerClass, and args. The path attribute is the full path of the file you want the PGE configuration file written to.
The writerClass attribute names the implementation of SciPgeConfigFileWriter which writes the configuration file. In this example the writer is gov.nasa.jpl.oodt.cas.pge.writers.xslt.XslTransformWriter, one of CAS-PGE’s out-of-the-box plug-ins. This writer uses XSL Transformations to create the PGE’s configuration file. The last attribute, args, allows arguments to be passed into the writer. Since this PgeConfigBuilder is in XML, the only arguments supported are String values, so any writer plugged in must support receiving its arguments as Strings (it can then convert them to whatever object type is needed). The args attribute supports any number of arguments separated by commas. This is allowed because of SciPgeConfigFileWriter’s createConfigFile method signature:
public File createConfigFile(String sciPgeConfigFilePath,
Metadata inputMetadata, Object... customArgs) throws IOException;
The last argument allows any number of arguments to be passed to it, so each implementation of this class can parse this argument in its own custom way. XslTransformWriter expects two arguments: The first is the path to an XSLT file and the second is a boolean signifying whether or not it should use XML CDATA context values. The XslTransformWriter takes these two values in as Strings and then converts them to a File object and a boolean respectively. Let’s now look back at our script for running this example workflow. The next line, which we stopped at, was:
_-m DemoPGE/ConfigXsltFile $
/core/pge/policy/demo/config/xslt-config.xsl _
In our CAS-PGE XML configuration file we created a key-ref to that metadata field:
<metadata key="ConfigXsltFile" key-ref="[PGETask/Name]/ConfigXsltFile"/>
This reference key is then used as the first argument passed into the args attribute. This file is just standard XSLT. The XSLT files written for this plug-in will receive input of the job’s instance metadata in CAS-Metadata’s XML format and should convert it into the configuration file format expected by the PGE. This xslt-config.xsl file looks like:
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright (c) 2009, California Institute of Technology.
ALL RIGHTS RESERVED. U.S. Government sponsorship acknowledged.
-->
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:saxon="http://icl.com/xslt"
xmlns:cas="http://oodt.jpl.nasa.gov/2.0/cas"
exclude-result-prefixes="saxon cas">
<xsl:output method="text"/>
<xsl:variable name="newline"><xsl:text>&#10;</xsl:text></xsl:variable>
<xsl:template match="/cas:metadata">
<xsl:for-each select="keyval/val[../key='InputFiles']">
<xsl:value-of select="."/><xsl:value-of select="$newline"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
This file is converting the XML metadata format, which looks something like:
<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/2.0/cas">
...
<keyval>
<key>InputFiles</key>
<val>/home/bfoster/PCS/deploy/core/pge/policy/demo/input/hello.txt</val>
<val>/home/bfoster/PCS/deploy/core/pge/policy/demo/input/john.txt</val>
</keyval>
...
</cas:metadata>
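It creates a configuration file which looks like this (the paths differ from the sample input above, presumably because file staging has by then replaced the InputFiles values with the staged file paths):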
/home/bfoster/PCS/pge_working_dir/demo-workflow/input-files/hello.txt
/home/bfoster/PCS/pge_working_dir/demo-workflow/input-files/john.txt
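For readers less familiar with XSLT, the same transformation can be sketched in Python using only the standard library. This is an equivalent written for illustration, not part of CAS-PGE; the function name input_files_config and the SAMPLE document are assumptions, and the sample uses the staged paths shown in the output above.

```python
import xml.etree.ElementTree as ET

CAS_NS = "http://oodt.jpl.nasa.gov/2.0/cas"

# Sample metadata in the format shown above, with staged input paths
SAMPLE = """<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/2.0/cas">
  <keyval>
    <key>InputFiles</key>
    <val>/home/bfoster/PCS/pge_working_dir/demo-workflow/input-files/hello.txt</val>
    <val>/home/bfoster/PCS/pge_working_dir/demo-workflow/input-files/john.txt</val>
  </keyval>
</cas:metadata>"""

def input_files_config(xml_text):
    """Emit every InputFiles value on its own line, like the stylesheet."""
    root = ET.fromstring(xml_text)
    # Same check as match="/cas:metadata" in the stylesheet
    if root.tag != "{%s}metadata" % CAS_NS:
        raise ValueError("unexpected root element: " + root.tag)
    lines = []
    for keyval in root.findall("keyval"):
        if keyval.findtext("key") == "InputFiles":
            lines.extend(val.text for val in keyval.findall("val"))
    return "".join(line + "\n" for line in lines)

config_text = input_files_config(SAMPLE)
print(config_text, end="")
```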
We will not go into any more detail about XSLT here; if you want to know more, there are several good XSLT resources on the web. The next element loaded is <exe>. This is where the execution of the PGE is set up. <exe> has two attributes, both of which support dynamic replacement: dir and shellType. The dir attribute is the working directory for the PGE execution and shellType is the path to the shell you would like to execute with. CAS-PGE runs PGEs by creating a script and then executing it; the <exe> element allows for the building of this script. Each <cmd> element inside it is just a line in the script file. So the script which this example configuration would create would be named sciPgeExeScript_DemoPGE, be written to the execution directory (i.e. the value of dir), and contain the following:
#!/bin/csh
touch ConcatOutput.txt
/usr/java/jdk1.5.0_11/bin/java -cp /home/bfoster/PCS/pge_working_dir/demo-workflow/input-files/acce-pge-1.0.0-dev.jar gov.nasa.jpl.acce.pge.demo.DemoPGE ConfigFile.txt > ConcatOutput.txt
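The way the <exe> element turns into a script can be sketched as below. This is a simplified illustration, not CAS-PGE’s code: the function names are hypothetical, the assumption that the file name suffix is the task name is inferred from sciPgeExeScript_DemoPGE, and the second command is a placeholder rather than the real java invocation.

```python
import tempfile
from pathlib import Path

def build_script(shell_type, commands):
    """Return the script text: a shebang line, then one line per <cmd>."""
    return "\n".join(["#!" + shell_type, *commands]) + "\n"

def write_script(exe_dir, task_name, shell_type, commands):
    """Write the script into the execution directory (the value of dir)."""
    # Assumption: the suffix after sciPgeExeScript_ is the task name.
    script = Path(exe_dir) / ("sciPgeExeScript_" + task_name)
    script.write_text(build_script(shell_type, commands))
    return script

commands = [
    "touch ConcatOutput.txt",
    "echo hello > ConcatOutput.txt",  # placeholder command for illustration
]

# A temporary directory stands in for the execution directory here.
with tempfile.TemporaryDirectory() as exe_dir:
    script = write_script(exe_dir, "DemoPGE", "/bin/csh", commands)
    script_file_name = script.name
    script_text = script.read_text()

print(script_file_name)
print(script_text, end="")
```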
The last element to be loaded is <output>. This is responsible for configuring which files should have metadata files created for them and what metadata should be written in each metadata file. There may be any number of <dir> elements inside <output>; however, you may not put a <dir> element within another <dir> element. Each <dir> element has two attributes, both of which support dynamic replacement: path and createBeforeExe. The path attribute must be an absolute path to the directory which contains output files, and createBeforeExe can be set to true if you would like this directory to be created before the PGE is executed. There may be any number of <files> elements within a <dir> element. Each <files> element supports name, metFileWriterClass, and args attributes. These attributes each support dynamic replacement, and name may be replaced with a regex attribute if you would like to specify a group of files in the given directory via a Java Pattern regular expression. Since regex’s value is a regular expression, dynamic replacement is not supported for this attribute. The name should be the name of the file in the given directory for which the PcsMetFileWriter class specified by metFileWriterClass should write out a metadata file. The args attribute is similar to the args attribute used to configure PGE configuration file creation (i.e. in <dynInputFiles>). PcsMetFileWriter has a method signature similar to SciPgeConfigFileWriter’s:
protected abstract Metadata getSciPgeSpecificMetadata(File sciPgeCreatedDataFile, Metadata inputMetadata, Object... customArgs) throws Exception;
The args attribute’s values are passed into the customArgs method argument. The PcsMetFileWriter used in this example is the out-of-the-box plug-in:
gov.nasa.jpl.oodt.cas.pge.writers.metlist.MetadataListPcsMetFileWriter. This class is configured via an XML file, which enumerates the metadata fields that should be written out for the file specified by name (or the files, if regex is used). If we again look back at our script we see the next line is:
_-m DemoPGE/ConfigMetoutFile $
/core/pge/policy/demo/config/metout-config.xml _
This is the path to the XML file which configures which metadata should be written out for the concatenated output file. metout-config.xml contains the following:
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright (c) 2009, California Institute of Technology.
ALL RIGHTS RESERVED. U.S. Government sponsorship acknowledged.
-->
<metadataList>
<metadata key="ProductName" val="[Filename]"/>
<metadata key="Filename"/>
<metadata key="ProductType" val="GenericFile"/>
<metadata key="ProductStructure" val="Flat"/>
<metadata key="ExecutionDirectory" val="[RunInDir]"/>
</metadataList>
Each <metadata> element supports a key and a val attribute. val supports dynamic replacement. As with the CAS-PGE XML configuration file, use key-gen in cases where you want dynamically replaced keys. Any PcsMetFileWriter will automatically set a few metadata fields each time it runs, but it still has to be told to write them out into the metadata file. Here is a list of these metadata fields:
Filename : This is the name of the file for which a metadata file is being written.
FileLocation : This is the directory which the file is in.
FileSize : This is the size of the file.
A <metadata> element is not required to have a val attribute. Only use this attribute if the metadata field you would like to write out is not set, or if you would like to change the value of a metadata field that is already set. If you just want to write out a metadata field as it currently exists, use key or key-gen without a val attribute, as with the Filename <metadata> element:
<metadata key="Filename"/>
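The selection rules above (write a field as-is when only key is given, set or override it when val is given, and leave out anything not listed) can be sketched as follows. This is an illustration under assumptions, not MetadataListPcsMetFileWriter itself: the function names are hypothetical, bracket replacement is simplified to metadata lookups only, key-gen is not handled, a key with no val and no existing value is simply skipped, and the FileLocation and FileSize values are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

METOUT_CONFIG = """<metadataList>
  <metadata key="ProductName" val="[Filename]"/>
  <metadata key="Filename"/>
  <metadata key="ProductType" val="GenericFile"/>
  <metadata key="ProductStructure" val="Flat"/>
  <metadata key="ExecutionDirectory" val="[RunInDir]"/>
</metadataList>"""

def fill(text, metadata):
    """Simplified bracket replacement: metadata fields only."""
    return re.sub(r"\[([^\[\]]+)\]",
                  lambda m: str(metadata.get(m.group(1), "null")), text)

def select_metadata(config_xml, metadata):
    """Return only the fields the config asks to be written out."""
    written = {}
    for elem in ET.fromstring(config_xml).findall("metadata"):
        key = elem.get("key")
        val = elem.get("val")
        if val is not None:
            written[key] = fill(val, metadata)  # set or override the value
        elif key in metadata:
            written[key] = metadata[key]        # write the field as it exists
    return written

# Fields assumed to be present when the writer runs (values are illustrative)
metadata = {
    "Filename": "ConcatOutput.txt",
    "FileLocation": "/home/bfoster/PCS/pge_working_dir/DemoPge/test-1",
    "FileSize": "42",
    "RunInDir": "/home/bfoster/PCS/pge_working_dir/DemoPge/test-1",
}

written = select_metadata(METOUT_CONFIG, metadata)
for key, value in written.items():
    print(key, "=", value)
```

FileLocation and FileSize are set automatically but do not appear in the result, because the configuration never lists them.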
This metout-config.xml will create a file after the PGE executes with the name [OutputFile].cas, which here is ConcatOutput.txt.cas. The file will look like:
<?xml version="1.0" encoding="UTF-8"?>
<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
<keyval type="vector">
<key>ProductName</key>
<val>ConcatOutput.txt</val>
</keyval>
<keyval type="vector">
<key>ProductType</key>
<val>GenericFile</val>
</keyval>
<keyval type="vector">
<key>ProductStructure</key>
<val>Flat</val>
</keyval>
<keyval type="vector">
<key>Filename</key>
<val>ConcatOutput.txt</val>
</keyval>
<keyval type="vector">
<key>ExecutionDirectory</key>
<val>%2Fhome%2Fbfoster%2FPCS%2Fpge_working_dir%2FDemoPge%2Ftest-1</val>
</keyval>
</cas:metadata>
The ExecutionDirectory value looks a little off. This is because the value is URL-encoded (percent-encoded) when it is written; the file itself is UTF-8, as declared in the XML header:
<?xml version="1.0" encoding="UTF-8"?>
Each ‘/’ in the value has been replaced with its percent-encoded form ‘%2F’. When this XML file is read in by CAS-Crawler and ingested, these values are converted back, so your catalog metadata will not contain the encoded values.
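The round trip can be reproduced with Python’s standard library. This only demonstrates percent-encoding in general; it is not the code path CAS-PGE or CAS-Crawler uses.

```python
from urllib.parse import quote, unquote

path = "/home/bfoster/PCS/pge_working_dir/DemoPge/test-1"

# safe="" means '/' is encoded too; UTF-8 is the default character set
encoded = quote(path, safe="")
decoded = unquote(encoded)

print(encoded)
print(decoded)
```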