Introduction

One of the major benefits of using Helix to provision containers is that Helix can also schedule other tasks or resources to run inside those containers. Users then don't need to guess how many resources their tasks will consume; Helix will automatically scale containers up or down to manage the load. This document describes the design of such a task execution interface and framework.

Problem Statement

Helix provisioning allows a target provider to dictate how many containers Helix should tell provisioners to have running at any given time. The CPU and memory requirements for these containers are the only constraints a user should need to provide. However, it's not enough just to have allocated containers running; these containers still need to do something meaningful. They may run a long-lived service, or they may support an arbitrary set of tasks started on demand or on a schedule. Users generally don't know how many physical resources to allocate for these tasks, so we'd like to simply schedule the tasks and let Helix optimize the number of running instances that support them. To do this, we need to be able to schedule tasks inside Helix-scheduled containers.
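To make the target provider contract concrete, here is a minimal sketch of how a target container count could be derived from the CPU and memory constraints a user provides. The class name, capacity units, and sizing policy are all illustrative assumptions, not part of the existing Helix code.

```java
// Illustrative only: a hypothetical target provider that derives the desired
// container count from aggregate CPU/memory demand and per-container capacity.
public class SimpleTargetProvider {
    private final int containerCpuMillis; // CPU capacity of one container, in millicores
    private final int containerMemoryMb;  // memory capacity of one container, in MB

    public SimpleTargetProvider(int containerCpuMillis, int containerMemoryMb) {
        this.containerCpuMillis = containerCpuMillis;
        this.containerMemoryMb = containerMemoryMb;
    }

    /** Number of containers needed to fit the aggregate CPU and memory demand. */
    public int targetContainerCount(int totalCpuMillis, int totalMemoryMb) {
        int byCpu = ceilDiv(totalCpuMillis, containerCpuMillis);
        int byMemory = ceilDiv(totalMemoryMb, containerMemoryMb);
        // The binding constraint (CPU or memory) wins; keep at least one container.
        return Math.max(1, Math.max(byCpu, byMemory));
    }

    private static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;
    }
}
```

With 1-core/2 GB containers, a demand of 2.5 cores and 1 GB yields three containers, since CPU is the binding constraint.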

Existing Work

There is currently a task framework checked into Helix trunk that does the following:

  1. Defines a workflow as a collection of tasks that are started according to a DAG
  2. Models each task as a single resource whose partitions are individual instances of the task, each run alongside a partition of a target resource
  3. Updates the overall task status as each task partition completes. When all partitions are complete, the task is marked complete; when all tasks in a workflow are complete, the workflow is complete.
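The completion semantics above can be sketched with a small model: a workflow as a DAG of tasks, each with some number of partitions, where a task completes when all of its partitions finish and the workflow completes when all tasks do. This is an illustrative model only, not the actual Helix task framework API; all names are assumptions.

```java
import java.util.*;

// Illustrative sketch (not the real Helix API): a workflow as a DAG of
// partitioned tasks with completion rolled up from partition -> task -> workflow.
public class WorkflowSketch {
    static class Task {
        final String name;
        final int partitions;
        final Set<String> dependencies; // tasks that must complete before this one starts
        final Set<Integer> donePartitions = new HashSet<>();

        Task(String name, int partitions, String... dependencies) {
            this.name = name;
            this.partitions = partitions;
            this.dependencies = new HashSet<>(Arrays.asList(dependencies));
        }

        /** A task is complete once every partition has finished. */
        boolean isComplete() { return donePartitions.size() == partitions; }
    }

    final Map<String, Task> tasks = new LinkedHashMap<>();

    void addTask(Task t) { tasks.put(t.name, t); }

    /** A task may start only when all of its DAG predecessors have completed. */
    boolean canStart(String name) {
        for (String dep : tasks.get(name).dependencies) {
            if (!tasks.get(dep).isComplete()) return false;
        }
        return true;
    }

    /** Record that one partition of a task finished. */
    void completePartition(String name, int partition) {
        tasks.get(name).donePartitions.add(partition);
    }

    /** The workflow is complete once every task is complete. */
    boolean isWorkflowComplete() {
        for (Task t : tasks.values()) {
            if (!t.isComplete()) return false;
        }
        return true;
    }
}
```

For example, a task B that depends on a two-partition task A cannot start until both of A's partitions report completion.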

Proposed Design

Risks

Dependencies

Performance, Scalability, and Provisioning

Monitoring and Alerting

Roll-Out Plan

...

We can build on the existing monitoring work to let target providers query the monitoring provider for CPU, memory, and load metrics, enabling more sophisticated target providers.
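As a sketch of what such a monitoring-driven target provider might look like, the following adjusts the target container count to keep observed average CPU utilization inside a band. The class name, thresholds, and one-container-at-a-time policy are illustrative assumptions, not part of the existing monitoring work.

```java
// Hypothetical sketch: a load-based target provider that consumes an average
// CPU utilization metric (as reported by a monitoring provider) and nudges the
// target container count to keep utilization within [scaleDown, scaleUp].
public class LoadBasedTargetProvider {
    private final double scaleUpThreshold;   // e.g. scale up above 75% average CPU
    private final double scaleDownThreshold; // e.g. scale down below 25% average CPU

    public LoadBasedTargetProvider(double scaleUpThreshold, double scaleDownThreshold) {
        this.scaleUpThreshold = scaleUpThreshold;
        this.scaleDownThreshold = scaleDownThreshold;
    }

    /** Compute the next target count from the current count and observed load. */
    public int nextTargetCount(int currentCount, double avgCpuUtilization) {
        if (avgCpuUtilization > scaleUpThreshold) {
            return currentCount + 1; // overloaded: add a container
        }
        if (avgCpuUtilization < scaleDownThreshold && currentCount > 1) {
            return currentCount - 1; // underloaded: remove a container, keep at least one
        }
        return currentCount; // within the band: hold steady
    }
}
```

Stepping one container at a time keeps scaling decisions conservative; a real provider would likely also smooth the metric over a window to avoid oscillation.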