
This proposal outlines an approach to using OpenWhisk as a source for HTTP APIs that drive UIs in browsers or mobile apps ("UI driven use cases"), compared to use cases where no user is present at the time of activation processing ("Event driven use cases").

Authors

Tyson Norris: tnorris@adobe.com
Dragos Dascalita Haut: ddascal@adobe.com

Feedback

OpenWhisk dev list: dev@openwhisk.apache.org

Original OpenWhisk Behavior

The original behavior is shown in this diagram:

[Diagram: image2017-7-1 10:38:28.png]

In this workflow the execution of an activation is:

queued in a kafka topic associated with a specific invoker (there is some minimal logic here for giving an action some affinity with a particular invoker, spreading actions across available invokers, etc.)
dequeued by the invoker IFF the number of in-flight activations has dropped below some threshold (see ActivationFeed.pipelineFillThreshold)
executed in a container AFTER
- the container is initialized, where initialization is required when:
  - user namespace is different than the last one executed by this container
  - OR the action is different than the last one executed by this container
  - OR some initialization interval has elapsed since the last initialization
- any previous activation is completed
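
The initialization conditions above can be sketched as a small predicate. This is a minimal illustration, not the actual invoker code: the needsInit function, the field names on the container and activation objects, and the maxInitAgeMs parameter are all hypothetical.

```javascript
// Hypothetical sketch: decide whether a container must be (re)initialized
// before it can run the next activation. All field names are assumptions.
function needsInit(container, activation, nowMs, maxInitAgeMs) {
    return container.lastNamespace !== activation.namespace   // different user namespace
        || container.lastAction !== activation.action         // different action
        || (nowMs - container.lastInitMs) > maxInitAgeMs;      // initialization interval elapsed
}

const warm = { lastNamespace: "alice", lastAction: "hello", lastInitMs: 1000 };
console.log(needsInit(warm, { namespace: "alice", action: "hello" }, 2000, 60000)); // false
console.log(needsInit(warm, { namespace: "bob", action: "hello" }, 2000, 60000));   // true
```

Even when this check returns false, the activation still waits for any previous activation on the same container to complete.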

The primary throughput bottleneck in this arrangement is per-activation isolation. Even if the same user submits the same action for invocation (less likely, but not in all cases), container usage is serialized so that only a single activation is in flight in a container at any given time. This is enforced by:

queuing at multiple layers:
- kafka
- ActivationFeed
- ContainerPool/ContainerProxy are singleton actors, where they process exactly 1 message at any time
some action containers (e.g. nodejs6Action) generate errors when another activation arrives while an existing activation is in-flight.

Secondary complications are:

kafka - not as much of a bottleneck, but a complication when used in a blocking or realtime request/response workflow, since scaling throughput via kafka is affected by:
- additional network hops/serdes/etc - the overhead of just transmitting data to a separate system
- scaling out consumption of messages is affected by partitioning scheme, and it will be hard to predict a scheme that will suit dynamically changing http traffic patterns
log processing - the logs are collected and persisted with each activation run; this is not done in blocking fashion so should not affect latency, but does affect throughput

OpenWhisk use cases

Currently OpenWhisk offers both

event driven usage - where a system can "fire and forget" a trigger, either automatically via a schedule, or via an HTTP endpoint where the client does not wait for a response.
UI driven usage - where an action is invoked as part of an HTTP workflow that is driven by a user who is waiting for a response, such as:

- dependent API usage in customer apps
- browser based app usage

These "realtime API usages" are cases where latency fluctuation under concurrent load is not tolerable. In event processing cases, by contrast, an additional n seconds of latency during peak event generation is usually not noticeable, and in many cases the response is never seen by the event producer (sensor data collection, etc.). The specific differences between UI driven and event driven cases are listed below:

	UI driven use case	event driven use case
blocking parameter	always used	sometimes used
desired timeout behavior	504 response - response will never be provided	202 response - response will be provided later
activation concurrency	often concurrent with activations of the same action (to support scaleout independent of container resources)	never concurrent with other activations
action container life cycle	always reused (without re-initialization)	may be reused (only for same action+subject)
comparison to conventional web application life cycle (start once, serve many requests)	same as conventional web application: start web application; process many requests concurrently; stop web application (only for deploy of new action container, or when idle for some period)	more like: start web application; process single request (or a sequence of single requests, for same action+subject); stop web application
effects on container resource requirements	number of containers required is directly bound to the number of unique actions, and indirectly bound to the number of concurrent users (only for horizontal scaling, same as a web server)	number of containers required is directly bound to the number of unique actions, and directly bound to the number of concurrent users
log collection	cannot harvest logs for activations as part of activation processing - log collection must be performed at an aggregate level, and made available to developers via query tools	can harvest logs for storage as part of activation processing - since each run will leave the container in a state where the most recent logs are associated with the most recent activation run

Proposed OpenWhisk Behavior

In general, this proposal presents an option for UI driven activation processing for enabling realtime API consumption cases like:

using OpenWhisk as an extension point for existing APIs that service user facing applications, where these APIs have a significant number of concurrent users
using OpenWhisk as a system for implementing APIs that serve content to browser/mobile UI applications for a significant number of concurrent users

The proposal for using http for activation transport is shown below:

[Diagram: image2017-7-1 10:39:20.png]

Important points:

Enabling http routing (from controller to container) should be OPTIONAL (and is different from the existing blocking=true parameter or --web annotation); it could be based on:
- an annotation on the action (requires logic be added to the action containers to conditionally tolerate concurrent activations)
- a different/unique action type
Multiple activations for the SAME action can be serviced by a single shared container concurrently
Additional logic MAY be added to treat an existing shared container as "at capacity" once a certain number of outstanding concurrent requests is reached (at which point additional containers should launch to share load)
Invoker will advertise the container state(s) to the controller (in addition to the health status, which already happens)
Controller can route activation directly to a container once the action is resolved (skipping kafka and invoker)
Invoker is still responsible for handling all cold-start use cases (where no existing container exists, or not enough to handle load)*

* Extending the Controller LoadBalancer component to leverage a clustering system instead of "a set of Invokers" should also be optional, and should have the same effect on throughput.
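
The routing points above can be sketched as follows. This is only an illustration under stated assumptions, not a proposed implementation: the route function, the MAX_CONCURRENT threshold, the shape of the advertised container state ({ id, action, inFlight }), and the least-loaded selection policy are all hypothetical.

```javascript
// Hypothetical sketch of controller-side routing. All names are illustrative.
const MAX_CONCURRENT = 3; // assumed per-container "at capacity" threshold

// containers: state advertised by invokers, assumed shape { id, action, inFlight }
function route(action, containers) {
    // Prefer a warm container for the SAME action that is below capacity.
    const candidates = containers.filter(
        (c) => c.action === action && c.inFlight < MAX_CONCURRENT
    );
    if (candidates.length === 0) {
        // Cold start, or all warm containers at capacity: defer to the invoker.
        return { via: "invoker" };
    }
    // Pick the least-loaded candidate (an assumed policy).
    const target = candidates.reduce((a, b) => (b.inFlight < a.inFlight ? b : a));
    target.inFlight += 1; // the caller is expected to decrement on completion
    return { via: "http", containerId: target.id };
}

const pool = [
    { id: "c1", action: "hello", inFlight: 3 },
    { id: "c2", action: "hello", inFlight: 1 },
];
console.log(route("hello", pool)); // { via: 'http', containerId: 'c2' }
console.log(route("other", pool)); // { via: 'invoker' }
```

In this sketch the invoker path is the fallback for every case the controller cannot serve from a warm container, which matches the point that the invoker remains responsible for cold starts.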

Benefits:

Increased throughput for traffic patterns that are "UI driven" use cases:
- high volume of unique users (10k+ concurrent users)
- comparatively low volume of unique actions (<1000 unique actions)
Resource requirements based only on number of unique actions*

*Until saturating capacity is reached, based on a similar deployment of the same technology as "a conventional web application wrapped in a container". For example, if I can deploy a conventional nodejs application as a container to service 4000 concurrent users, I should be able to implement an action that services similar traffic using a single container within the OpenWhisk system.

Isolation details

This proposal purposefully decreases isolation for the gain of throughput. It is true that this exposes action developers to risks such as:

leaking "session" data across different activations
code that blocks incorrectly affects many users instead of just one
incorrect estimation of resource usage

However, these are the same risks that web developers take when building conventional web applications, so generally developers should not be averse to these issues.
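
As an illustration of the first risk, the sketch below shows how module-level state can leak between concurrent activations that share one container process. The leakyMain and safeMain functions and their user and delayMs parameters are hypothetical, chosen only to make the interleaving visible.

```javascript
// Module-level state lives as long as the container process does, so with
// concurrent activations it is shared by every in-flight request.
let currentUser; // BUG: shared across activations

function leakyMain(params) {
    currentUser = params.user;
    return new Promise(function (resolve) {
        // Simulate waiting on a downstream API; another activation may
        // overwrite currentUser before this timer fires.
        setTimeout(function () {
            resolve({ greeting: "hello " + currentUser });
        }, params.delayMs);
    });
}

function safeMain(params) {
    const user = params.user; // state scoped to this activation only
    return new Promise(function (resolve) {
        setTimeout(function () {
            resolve({ greeting: "hello " + user });
        }, params.delayMs);
    });
}

// Two concurrent activations in the same process.
Promise.all([
    leakyMain({ user: "alice", delayMs: 20 }),
    leakyMain({ user: "bob", delayMs: 10 }),
]).then(function (results) {
    console.log(results[0].greeting); // "hello bob": alice's activation sees bob's data
});
```

Keeping per-request data in local variables, as safeMain does, is the same discipline a conventional web application already needs.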

Resource Requirements Estimation

The area of "incorrect resource usage estimation" is one where any container-based application is susceptible to starvation ("I thought my app/function would only require 128m"). This does not change with this proposal, except that it is somewhat simpler to simulate a single user for measurements used for estimation. Estimating resource requirements is a challenge regardless of whether the application is built into a custom container or running as actions in OpenWhisk, and whether it is servicing 1 user or 100 users. There are areas where we can help, such as:

Provide multiple pre-warmed container instances - this may be wasteful, but it is a way to provide some guarantees around availability under load, and it is less expensive than the 1-container-per-action requirement of the old scheme under heavy concurrent load
Collect and expose data on OOM killer encounters from docker containers that may be masked by mesos/marathon in general, so that devs can tune the system based on usage over time, even in an automated way in some cases. This is arguably required anyway, but is less likely to be an issue for single-concurrent-user usage of a container.

Sample Data

In a simple prototype, load was generated with the throughput.sh test from https://github.com/markusthoemmes/openwhisk-performance, run locally (default configs except for extended throttling limits). The measured results are listed in the table below. The test used a simple async action that produces a result after 175ms, to simulate waiting for a downstream external API to return, which would be a common scenario for actions that service UI driven use cases:

function main(params) {
    console.log("testing async-noop.js");
    // Resolve after 175ms to simulate waiting on a downstream external API.
    return new Promise(function (resolve, reject) {
        setTimeout(function () {
            resolve({done: true});
        }, 175);
    });
}

	original deployment	http activation processing approach
mean latency	8642.7 ms	915.1 ms
requests per second	11	102
number of action containers	2	1
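
As a rough consistency check on these numbers (not part of the original measurements), the arithmetic below assumes that each container in the original deployment runs one activation at a time and that an activation takes about 175ms. It also applies Little's law (in-flight requests = throughput x mean latency), which assumes steady state; the client concurrency used by the test is not stated here, so the in-flight figures are only estimates.

```javascript
// Back-of-the-envelope check, assuming one activation at a time per container
// and ~175 ms per activation in the original deployment.
const actionMs = 175;
const perContainerRps = 1000 / actionMs;       // ~5.71 activations/s per container
const serializedCeiling = 2 * perContainerRps; // 2 containers: ~11.43 activations/s

// Little's law: in-flight requests = throughput x mean latency (steady state assumed).
const inFlightOriginal = 11 * (8642.7 / 1000); // ~95.1
const inFlightHttp = 102 * (915.1 / 1000);     // ~93.3

console.log(serializedCeiling.toFixed(2)); // 11.43
console.log(inFlightOriginal.toFixed(1));  // 95.1
console.log(inFlightHttp.toFixed(1));      // 93.3
```

Under these assumptions, the measured 11 requests per second sits right at the ceiling for two serialized containers. Both configurations appear to carry a similar number of in-flight requests, so the latency difference would mostly reflect queueing rather than action run time.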