Apache Airavata is a framework that supports the execution and management of computational scientific applications and workflows on grid-based systems, remote clusters, and cloud-based systems. Airavata's main focus is on submitting and managing applications and workflows on grid-based systems, but its architecture is extensible to support other underlying resources as well. Scientific applications are traditionally exposed through web portals, called science gateways, where users submit and manage their runs. Science gateway developers can use Airavata as their middleware layer, calling the Airavata API directly to communicate with grid-based systems.
Apache Airavata Architecture
Airavata has several main components:
- Airavata API - the component through which outside users and gateway developers communicate with Airavata.
- Orchestrator - the component that dispatches applications through an AMQP-based worker queue to GFac to act upon.
- Workflow Interpreter - the component that manages submitted workflows.
- Application Factory (GFac) - the component that communicates with remote resources.
- Registry - the data store of Airavata.
- Messaging - the component that publishes notifications about application statuses.
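The components above cooperate in a pipeline: the API layer hands a request to the Orchestrator, which places it on an AMQP-based worker queue, from which GFac picks it up and publishes status notifications. As an illustration only (the class names and in-memory queue here are hypothetical stand-ins, not Airavata's actual classes or broker), the flow can be sketched as:

```python
from queue import Queue

# Hypothetical stand-ins for Airavata components; the real system uses
# an AMQP broker (e.g. RabbitMQ) between the Orchestrator and GFac.
class Orchestrator:
    def __init__(self, worker_queue):
        self.worker_queue = worker_queue

    def submit(self, experiment_id):
        # The real Orchestrator validates the experiment and creates
        # processes/tasks before queueing; here we only enqueue the id.
        self.worker_queue.put(experiment_id)

class GFac:
    def __init__(self, worker_queue, notifications):
        self.worker_queue = worker_queue
        self.notifications = notifications

    def work(self):
        # Pull one queued experiment and "execute" it, publishing
        # status events the way the Messaging component would.
        experiment_id = self.worker_queue.get()
        self.notifications.append((experiment_id, "LAUNCHED"))
        self.notifications.append((experiment_id, "COMPLETED"))

worker_queue = Queue()
notifications = []
Orchestrator(worker_queue).submit("exp-001")
GFac(worker_queue, notifications).work()
print(notifications)  # [('exp-001', 'LAUNCHED'), ('exp-001', 'COMPLETED')]
```

The point of the indirection is that the API layer never talks to a compute resource directly; everything flows through the queue, so workers can be scaled independently.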
The Airavata API is defined using Apache Thrift, which allows Airavata to generate client libraries for many different languages.
Airavata Data Model
Airavata's data models are divided into two main categories: the app-catalog, used for application registration, and the experiment-catalog, which is related to application execution.
AppCatalog Data Model
The Airavata AppCatalog has six main data models:
- ComputeResourceModel - abstraction of a remote compute resource, containing information such as resource queues, host addresses, job submission protocols, and other details of the compute resource.
- ApplicationModule - remote resources contain modules/applications that are already installed on the system; developers can also define their own applications.
- ApplicationInterfaceModel - the interface of a module, defining the inputs and outputs of the application. The same module can have multiple interfaces defined.
- ApplicationDeploymentModel - the model that bridges a compute resource and a module.
- GatewayProfileModel - abstraction of a science gateway.
- GatewayPreferenceModel - the model specifying the compute resource preferences of a given gateway, including information such as login usernames and allocation IDs.
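To make the relationships among these models concrete, here is a minimal Python sketch. The field names and host values are illustrative assumptions, not the exact Thrift definitions, which carry many more fields:

```python
from dataclasses import dataclass, field

# Illustrative, simplified stand-ins for the AppCatalog models.
@dataclass
class ComputeResourceModel:
    host: str
    queues: list = field(default_factory=list)

@dataclass
class ApplicationModule:
    name: str  # an application already installed on the resource

@dataclass
class ApplicationInterfaceModel:
    # Defines the inputs and outputs of a module; the same module
    # can have several interfaces.
    module: ApplicationModule
    inputs: list
    outputs: list

@dataclass
class ApplicationDeploymentModel:
    # Bridges a module to the compute resource it is installed on.
    module: ApplicationModule
    resource: ComputeResourceModel
    executable_path: str

cluster = ComputeResourceModel(host="cluster.example.edu", queues=["normal"])
module = ApplicationModule(name="MyApp")
interface = ApplicationInterfaceModel(module=module,
                                      inputs=["input.dat"], outputs=["output.log"])
deployment = ApplicationDeploymentModel(module=module, resource=cluster,
                                        executable_path="/opt/myapp/bin/myapp")
print(deployment.resource.host)  # cluster.example.edu
```

The deployment model is the glue: the interface stays resource-independent, while each deployment records where and how the module actually runs.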
ExperimentCatalog Data Model
The ExperimentCatalog is mainly used for application execution. To execute an application, a user models it according to the Experiment data model. Airavata internally creates the other data models (process model, task model, job model). If the application is a single-node application, there will be only one process for that experiment; if the experiment is a workflow, it contains a list of processes. For each request, Airavata internally creates tasks such as input staging, job submission, job monitoring, and output staging. Developers can implement their own tasks and update the task chain to customize how an experiment is executed.
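The expansion of an experiment into processes and tasks can be sketched as below. The class and task-type names are illustrative assumptions; the real models are Thrift-defined and richer:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    task_type: str  # e.g. input staging, job submission, monitoring, output staging

@dataclass
class Process:
    tasks: list = field(default_factory=list)

@dataclass
class Experiment:
    name: str
    processes: list = field(default_factory=list)

def expand(experiment):
    # Hypothetical stand-in for Airavata's internal expansion: every
    # process receives the default chain of execution tasks.
    default_chain = ["INPUT_STAGING", "JOB_SUBMISSION", "MONITORING", "OUTPUT_STAGING"]
    for process in experiment.processes:
        process.tasks = [Task(t) for t in default_chain]

# A single-node application expands to one process; a workflow
# experiment would instead hold a list of processes.
exp = Experiment(name="single-node-run", processes=[Process()])
expand(exp)
print([t.task_type for t in exp.processes[0].tasks])
# ['INPUT_STAGING', 'JOB_SUBMISSION', 'MONITORING', 'OUTPUT_STAGING']
```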
What types of jobs are supported?
Apache Airavata mainly supports grid-based application execution at the moment, but its architecture is extensible: a developer can write custom job submission tasks to submit jobs to cloud-based systems or any other remote cluster.
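Because the task chain is pluggable, a custom job submission task for a cloud backend could in principle be swapped in. A hedged sketch of that extension point (the base-class name and method are hypothetical, not Airavata's actual interface):

```python
class JobSubmissionTask:
    """Hypothetical base class for job submission tasks."""
    def execute(self, job):
        raise NotImplementedError

class SSHJobSubmissionTask(JobSubmissionTask):
    # Grid/cluster submission is Airavata's main path today.
    def execute(self, job):
        return f"submitted {job} via SSH to the cluster scheduler"

class CloudJobSubmissionTask(JobSubmissionTask):
    # A developer-written task could target a cloud API instead.
    def execute(self, job):
        return f"submitted {job} to a cloud provisioning API"

# The surrounding pipeline depends only on the base interface, so the
# task chain can be reconfigured without touching the rest of Airavata.
chain = [SSHJobSubmissionTask(), CloudJobSubmissionTask()]
print([task.execute("job-42") for task in chain])
```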
Apache Airavata has a Java-based application called XBaya for creating workflows and submitting and managing multiple applications (this works best with Airavata 0.14). Airavata also has a web-based interface, PHP Gateway, written using the Airavata PHP client library; users can deploy it on their own systems and use it to register applications and to run and monitor them.