STATUS: Draft
This document captures requirements for making AI actions a first-class citizen in OpenWhisk. The main ideas captured below are inspired by the work done in running deep learning inferences on images and videos.
TL;DR
- A new action kind (`python3-ai`) that contains most frameworks used for ML/DL (e.g. scipy, tensorflow, caffe, pytorch, and others)
- GPU support
- `tmpfs` for sequences and compositions
- Higher memory limits (4GB+)
- Longer execution time
- Disk space allocation
New action kind - "python3-ai"
Problem
ML/DL frameworks are large. With the existing `python3` action kind, developers have to create large ZIP packages (1GB+) that hit the maximum action size limit.
Workarounds
Blackbox actions are already supported by OpenWhisk, but given that the Docker images are several GBs in size, cold-start times are heavily impacted. Downloading large Docker images may add latencies of up to several minutes, place a heavy load on the network, and increase the disk space consumed by Docker images on each host.
Proposal
Create a new action kind, `python3-ai`, that includes the following frameworks and libraries: scipy, tensorflow, pytorch, caffe2, and others that the community may ask for.
The image size would be ~6GB, and ideally it would be prewarmed in the OW cluster by running `docker pull ...` on each host.
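Once such an image exists, an operator could sanity-check it with a small smoke test that confirms the bundled frameworks are importable. The sketch below is illustrative, not part of the proposal; the framework list (and the module name `torch` for pytorch) is an assumption, not a final manifest.

```python
import importlib.util

# Illustrative list of frameworks the python3-ai image might bundle.
# Note pytorch is imported as "torch"; the list is an assumption.
FRAMEWORKS = ["scipy", "tensorflow", "torch", "caffe2"]

def missing_frameworks(names):
    """Return the subset of `names` that cannot be imported in this runtime."""
    return [n for n in names if importlib.util.find_spec(n) is None]
```

Running `missing_frameworks(FRAMEWORKS)` inside the image and checking for an empty result would be a cheap gate in the image build pipeline.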
GPU support
Problem
Processing time is greatly reduced when using a GPU.
Workarounds
Use CPUs and parallelize.
Parallelization is actually the sweet spot that makes AI actions on CPUs perform better than sequential processing, even sequential GPU processing. For example, compare 500 parallel actions running on CPUs, each processing one data input in 10 seconds, with 1 GPU action performing the same 500 operations sequentially at 1 second each. The GPU action produces a response in 500s, while the 500 parallel actions produce the response in 10s.
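The arithmetic in the example above can be sketched as two small helpers; the numbers (500 inputs, 1s per GPU op, 10s per CPU op) come from the example, everything else is illustrative.

```python
import math

def sequential_time(n_items, secs_per_item):
    # One action handles every item, one after another.
    return n_items * secs_per_item

def parallel_time(n_items, secs_per_item, n_workers):
    # Each wave of n_workers concurrent actions takes secs_per_item.
    return math.ceil(n_items / n_workers) * secs_per_item

gpu_total = sequential_time(500, 1)      # 500 ops at 1s each -> 500s
cpu_total = parallel_time(500, 10, 500)  # 500 actions at 10s, all in parallel -> 10s
```

The same helpers also show where the trade-off flips: with only 100 concurrent CPU actions available, the parallel path takes 50s, which is still faster here but no longer by 50x.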
Proposal
Allow developers to specify GPU resources when creating actions:

```
wsk action create my-action code.py --gpu [number_of_gpus]
```

Depending on Docker support for GPUs, developers should be able to specify how many GPUs the action needs.
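Inside the container, a Python action could verify that it was actually granted the GPUs it requested. The sketch below assumes NVIDIA GPUs surface as `/dev/nvidia0`, `/dev/nvidia1`, ... device nodes when the container runtime passes them through; that pattern, and the parameter name `gpus`, are assumptions for illustration, not part of the proposal.

```python
import glob

def visible_gpus(dev_pattern="/dev/nvidia[0-9]*"):
    """Return the GPU device nodes visible inside the container.

    The /dev/nvidiaN pattern is an assumption about how the container
    runtime exposes GPUs; the exact mechanism depends on Docker's GPU support.
    """
    return sorted(glob.glob(dev_pattern))

def main(params):
    # A GPU action could fail fast if fewer GPUs are visible than requested.
    requested = params.get("gpus", 1)
    gpus = visible_gpus()
    if len(gpus) < requested:
        return {"error": f"requested {requested} GPUs, found {len(gpus)}"}
    return {"gpus": gpus}
```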
`tmpfs` for sequences and compositions
Problem
Creating a sequence of actions, or a composition, that processes the same asset with a size greater than the max payload limit makes it hard to benefit from OpenWhisk's piping support, which makes the output of one action in a sequence become the input of the next action.
Workarounds
Send Asset by Reference
Developers can use an intermediate blob storage to upload the asset, and pass it to the next action using the URL to the asset as "the reference" to it. The problem with this workaround is that the size of the asset affects performance: the bigger the asset, the bigger the impact. If the asset is 2GB, then uploading and downloading it to a blob storage on each activation may add up to a minute to the execution time.
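The pass-by-reference flow between two actions can be sketched as follows. A real implementation would use an actual blob store; here a module-level dict stands in so the example is self-contained, and all function and key names are illustrative.

```python
# In-memory stand-in for an external blob store (illustrative only).
_BLOB_STORE = {}

def upload(name, data):
    _BLOB_STORE[name] = data
    return f"blob://{name}"          # the "reference" passed between actions

def download(ref):
    return _BLOB_STORE[ref.removeprefix("blob://")]

def action_one(params):
    asset = b"...decoded video frames..."   # pretend this was produced here
    # Only the small reference crosses the activation payload.
    return {"asset_ref": upload("frames", asset)}

def action_two(params):
    # Every downstream action pays the download cost again.
    asset = download(params["asset_ref"])
    return {"size": len(asset)}
```

The repeated `download` call in each downstream action is exactly the cost this section is trying to eliminate.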
Combine multiple actions in one action
Developers can combine the operations that act on the same asset into a single action. This workaround makes it harder to reuse actions in sequences or compositions. It also prevents a polyglot implementation with actions written in multiple languages; for instance, the AI actions could be written in Python and combined with JS actions that download the asset at the beginning of the sequence and upload it at the end, in the last action of the composition.
Proposal
The proposal is to transparently provide developers with a temporary way to store large assets in the OW cluster. This is probably the hardest problem to solve compared to the other ones, because it involves persistence, state, and possibly handling large amounts of data. Below are a few possible options:
Docker volumes
Allow developers to "attach" a persistent disk to an action. The programming model in this case assumes there's always a folder available on the disk at a well-known path. The WSK CLI could look like the example below:

```
wsk action create read-asset read.js --volume my-action-volume
wsk action create action-1 foo.py --volume my-action-volume
wsk action create action-stateless code.js
wsk action create action-2 bar.py --volume my-action-volume
wsk action create write-asset write.js --volume my-action-volume
wsk action create my-sequence --sequence read-asset,action-1,action-stateless,action-2,write-asset
```
When running the sequence `my-sequence`, the OpenWhisk scheduler can use the `volume` information defined by each action to schedule the actions requiring the same volume on the same host. When the `read-asset` action starts, it creates a new volume that should be uniquely identified; for instance, the volume name, the namespace of the invoking subject, and a UUID could be used, i.e. `my-action-volume--guest--<UUID>`. When `action-1` starts, it mounts the volume created previously; the same goes for the `action-2` and `write-asset` actions. Note that the sequence may have other actions that don't need the volume, such as `action-stateless`; those actions can be scheduled anywhere in the cluster, as per the existing scheduling mechanism.
The volume `my-action-volume` can be mounted inside each container at `/mnt/ow/my-action-volume`.
The volume should be removed once the prewarmed actions are destroyed.
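Under this model, an action in the middle of the sequence would read and write files under the well-known mount path instead of moving the asset through the activation payload. A minimal sketch, assuming the `/mnt/ow/<volume-name>` convention above; the file names and the "processing" step are illustrative, and the mount root is a parameter so the function is not tied to one path.

```python
import os

def process_asset(params, volume_root="/mnt/ow/my-action-volume"):
    """Read the asset left by the previous action, transform it,
    and leave the result on the volume for the next action.

    volume_root follows the /mnt/ow/<volume-name> convention from
    the proposal; file names here are illustrative.
    """
    src = os.path.join(volume_root, params.get("input", "asset.bin"))
    dst = os.path.join(volume_root, params.get("output", "asset.out"))
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(data.upper())  # stand-in for real processing
    # Only small metadata crosses the activation payload;
    # the large asset itself never leaves the volume.
    return {"output": os.path.basename(dst), "bytes": len(data)}
```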
Open questions:
Should developers specify a size for the volume, or should a default size be assumed for each volume?
Cluster cache
Provide developers with a caching solution in the cluster. Developers would still pass larger assets by reference between actions, and still write code to upload or download an asset, but would use a cache provided by OpenWhisk inside the cluster. The cache can be exposed through the OpenWhisk SDK via two methods: `write(asset_name, asset_value)` and `read(asset_name)`.
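The two-method surface could look like the sketch below. `ClusterCache` is hypothetical (no such class exists in the OpenWhisk SDK today), and a dict stands in for whichever distributed backend is chosen.

```python
class ClusterCache:
    """Hypothetical SDK cache with the two-method surface described above.

    A dict stands in for the distributed backend (in-memory grid,
    GlusterFS, blob storage, ...); a real implementation would go
    over the network here.
    """

    def __init__(self):
        self._store = {}

    def write(self, asset_name, asset_value):
        self._store[asset_name] = asset_value

    def read(self, asset_name):
        return self._store[asset_name]

# Usage inside two actions sharing an asset by name:
cache = ClusterCache()
cache.write("frames.bin", b"\x00\x01\x02")   # first action
data = cache.read("frames.bin")              # later action in the sequence
```

Keeping the surface to just `write`/`read` lets the backend be swapped without touching action code.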
The implementation could use a distributed in-memory cache, a distributed FS such as GlusterFS, a blob storage, or even an EBS-like volume attached to one machine in the cluster to store the cached items.
The problem with this approach is the network bottleneck: even if an action coincidentally lands on the same host as other actions in the sequence, it would still consume network bandwidth to write or read an asset from the cluster cache, so performance depends on the speed of the network.
Streaming responses between actions
In this option there's no persistence involved, but actions communicate directly, or through a proxy that can stream the response. The OpenWhisk scheduler should be aware of the order of activations and start actions requiring