THIS IS A WORK IN PROGRESS.

Background

A core part of the Scheduler interface (required to implement a framework) is the concept of resource offers:

class Scheduler
{
public:
  // ...
  /**
   * Invoked when resources have been offered to this framework. A
   * single offer will only contain resources from a single slave.
   * Resources associated with an offer will not be re-offered to
   * _this_ framework until either (a) this framework has rejected
   * those resources (see SchedulerDriver::launchTasks) or (b) those
   * resources have been rescinded (see Scheduler::offerRescinded).
   * Note that resources may be concurrently offered to more than one
   * framework at a time (depending on the allocator being used). In
   * that case, the first framework to launch tasks using those
   * resources will be able to use them while the other frameworks
   * will have those resources rescinded (or if a framework has
   * already launched tasks with those resources then those tasks will
   * fail with a TASK_LOST status and a message saying as much).
   */
  virtual void resourceOffers(SchedulerDriver* driver,
                              const std::vector<Offer>& offers) = 0;
  // ...
};

Mesos sends resource offers to framework Schedulers so that they can launch tasks in the cluster. Resources include CPUs, memory, disk, and so on.

Suppose a Scheduler accepts an offer of 4 CPUs and launches a task that is allocated all 4 CPUs, but the task only ever uses 1 CPU over its lifetime. This is clearly an inefficient use of the cluster, and one that Mesos should strive to ameliorate. First, we must consider why frameworks often under-utilize their allocated resources:

  1. Schedulers don't always accurately know the resource requirements of their tasks.
  2. Schedulers may try to over-provision resources to ensure utilization spikes can be tolerated.
  3. Schedulers may be running tasks that have non-deterministic or variable resource utilization characteristics.
  4. Schedulers may rely on end-users to specify resource requirements, which ultimately ties into 1 and 2.

This is where the concept of revocable offers becomes relevant. One way to improve efficiency across the cluster would be to offer these allocated but unused resources to frameworks as revocable offers. These offers could be revoked at any time, but they allow frameworks to use the under-utilized resources in the cluster for tasks that can tolerate being killed.
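To make the 4-CPU example concrete: the amount that could be offered as revocable is the allocated-but-unused slack, computed per resource. The sketch below is a minimal illustration assuming a hypothetical `ResourceVector` (named scalar amounts) and a hypothetical `computeSlack()` helper; it is not the Mesos `Resources` class, and it clamps at zero so a task that exceeds its allocation never yields negative slack.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a resource set: scalar amounts keyed by name
// ("cpus", "mem", ...). This is NOT the Mesos Resources class.
using ResourceVector = std::map<std::string, double>;

// Slack = allocated - used, clamped at zero, for every allocated resource.
// Resources missing from `used` are treated as completely idle.
ResourceVector computeSlack(const ResourceVector& allocated,
                            const ResourceVector& used) {
  ResourceVector slack;
  for (const auto& entry : allocated) {
    assert(entry.second >= 0.0);  // allocations are never negative
    auto it = used.find(entry.first);
    double usage = (it == used.end()) ? 0.0 : it->second;
    slack[entry.first] = std::max(0.0, entry.second - usage);
  }
  return slack;
}
```

For the example above, with an (invented) 1024 MB memory allocation of which 256 MB is used, `computeSlack` would report 3 CPUs and 768 MB as candidates for revocable offers.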

Revocable Offers

Semantics

A revocable offer will look and feel like a non-revocable offer, with the following key differences:

  • Once accepted by a Framework, the revocable offer can be revoked at any time*.
  • Revocation means that the underlying task / executor may be killed at any time*.

* At any time: when the tasks that were originally allocated the resources "need" the resources currently used by the tasks running on revocable resources. This is intentionally vague, as it will be an algorithmic decision (out of the scope of this design document).
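The precise policy is intentionally left open here. Purely to make the notion of "need" concrete, the following sketch assumes one simple threshold rule for a single scalar resource: revoke when the owner's current usage plus the amount lent out exceeds the owner's allocation. `Allocation`, `ownerNeedsResources()` and `amountToReclaim()` are hypothetical names, not part of Mesos.

```cpp
#include <cassert>

// Hypothetical single-resource view of one allocation on a slave.
struct Allocation {
  double allocated;  // amount allocated to the owning (non-revocable) tasks
  double used;       // amount the owning tasks currently use
  double lent;       // amount currently handed out as revocable resources
};

// One possible definition of "need": the owner's usage has grown into the
// portion that was lent out, so revocable tasks must give some of it back.
bool ownerNeedsResources(const Allocation& a) {
  assert(a.used >= 0.0 && a.lent >= 0.0);
  return a.used + a.lent > a.allocated;
}

// How much has to be reclaimed from revocable tasks under this rule.
double amountToReclaim(const Allocation& a) {
  return ownerNeedsResources(a) ? a.used + a.lent - a.allocated : 0.0;
}
```

Since revocation is defined above in terms of killing tasks / executors, a real implementation would kill whole tasks or executors until at least `amountToReclaim()` is freed.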

API Changes

Exposing Revocable Offers to Schedulers

The key API change required is the inclusion of revocable offers in the Scheduler API.

Option 1: Add a boolean field to the Offer protobuf.
/**
 * Describes some resources available on a slave. An offer only
 * contains resources from a single slave.
 */
message Offer {
  required OfferID id = 1;
  required FrameworkID framework_id = 2;
  required SlaveID slave_id = 3;
  required string hostname = 4;
  repeated Resource resources = 5;
  repeated Attribute attributes = 7;
  repeated ExecutorID executor_ids = 6;
  optional bool revocable = 8;
}

The advantage of this option is that the rest of the existing API remains unchanged. However, it carries a usability risk: Schedulers that are unaware of the new field could unintentionally launch tasks on revocable offers.
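A sketch of what opting in might look like under Option 1. `Offer` below is a plain stand-in for the generated protobuf class and `partitionOffers()` is a hypothetical helper. With real proto2 generated code the accessor would be `offer.revocable()`, which returns false when the field is unset; that default keeps the change backwards compatible, but it is also why a Scheduler that never checks the flag would treat revocable offers as ordinary ones.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the generated Offer protobuf class, reduced to the fields
// this sketch needs. `revocable` defaults to false, like an unset proto2 bool.
struct Offer {
  std::string id;
  double cpus = 0.0;
  bool revocable = false;
};

// A Scheduler that opts in must split the offers itself before deciding
// what to launch where.
void partitionOffers(const std::vector<Offer>& offers,
                     std::vector<Offer>* regularOut,
                     std::vector<Offer>* revocableOut) {
  assert(regularOut != nullptr && revocableOut != nullptr);
  for (const Offer& offer : offers) {
    (offer.revocable ? revocableOut : regularOut)->push_back(offer);
  }
}
```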

Option 2: Add revocable_offers to the resourceOffers() call.
class Scheduler
{
public:
  // ...
  /**
   * Invoked when resources have been offered to this framework. A
   * single offer will only contain resources from a single slave.
   * Resources associated with an offer will not be re-offered to
   * _this_ framework until either (a) this framework has rejected
   * those resources (see SchedulerDriver::launchTasks) or (b) those
   * resources have been rescinded (see Scheduler::offerRescinded).
   * Note that resources may be concurrently offered to more than one
   * framework at a time (depending on the allocator being used). In
   * that case, the first framework to launch tasks using those
   * resources will be able to use them while the other frameworks
   * will have those resources rescinded (or if a framework has
   * already launched tasks with those resources then those tasks will
   * fail with a TASK_LOST status and a message saying as much).
   */
  virtual void resourceOffers(SchedulerDriver* driver,
                              const std::vector<Offer>& offers,
                              const std::vector<Offer>& revocableOffers) = 0;
  // ...
};

The advantage of this option is that Schedulers must be explicit in order to use revocable offers. However, it also forces every Scheduler to change its code in order to compile against the new libraries (in C++ and Java), even if it never uses revocable offers.
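A minimal sketch of a framework written against the Option 2 signature. `Offer` and `SchedulerDriver` are stand-ins for the real Mesos types, and `ExampleScheduler` with its placement policy (regular offers for service work, revocable offers for best-effort work) is hypothetical; it only records which offers it would use instead of calling `launchTasks`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal stand-ins so the sketch compiles on its own; the real Offer and
// SchedulerDriver types come from the Mesos headers.
struct Offer {
  std::string id;
};
class SchedulerDriver {};

// The Option 2 signature as proposed.
class Scheduler {
public:
  virtual ~Scheduler() = default;
  virtual void resourceOffers(SchedulerDriver* driver,
                              const std::vector<Offer>& offers,
                              const std::vector<Offer>& revocableOffers) = 0;
};

// Hypothetical framework: latency-sensitive work only on regular offers,
// best-effort work on revocable offers. Both lists arrive in one call.
class ExampleScheduler : public Scheduler {
public:
  void resourceOffers(SchedulerDriver* driver,
                      const std::vector<Offer>& offers,
                      const std::vector<Offer>& revocableOffers) override {
    assert(driver != nullptr);
    for (const Offer& offer : offers) {
      // A real framework would call driver->launchTasks(...) here.
      serviceOffersUsed.push_back(offer.id);
    }
    for (const Offer& offer : revocableOffers) {
      // Only work that tolerates being killed is placed on these offers.
      batchOffersUsed.push_back(offer.id);
    }
  }

  std::vector<std::string> serviceOffersUsed;
  std::vector<std::string> batchOffersUsed;
};
```

Because both lists arrive in a single callback, the framework can weigh regular and revocable offers against each other without keeping any extra state.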

Option 3: Add a new revocableResourceOffers() call.
class Scheduler
{
public:
  // ...
  /**
   * Invoked when resources have been offered to this framework. A
   * single offer will only contain resources from a single slave.
   * Resources associated with an offer will not be re-offered to
   * _this_ framework until either (a) this framework has rejected
   * those resources (see SchedulerDriver::launchTasks) or (b) those
   * resources have been rescinded (see Scheduler::offerRescinded).
   * Note that resources may be concurrently offered to more than one
   * framework at a time (depending on the allocator being used). In
   * that case, the first framework to launch tasks using those
   * resources will be able to use them while the other frameworks
   * will have those resources rescinded (or if a framework has
   * already launched tasks with those resources then those tasks will
   * fail with a TASK_LOST status and a message saying as much).
   */
  virtual void resourceOffers(SchedulerDriver* driver,
                              const std::vector<Offer>& offers) = 0;
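  /**
   * Invoked when revocable resources have been offered to this
   * framework. Tasks launched using these offers may be killed at any
   * time if the resources are revoked.
   */
  virtual void revocableResourceOffers(SchedulerDriver* driver,
                                       const std::vector<Offer>& offers) = 0;
  // ...
};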

This option has the unfortunate downside that the Scheduler has to make scheduling decisions across two functions, most likely requiring additional state to be stored in the Scheduler implementation.
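To illustrate the extra state, here is a hypothetical `BufferingScheduler` written against the Option 3 callbacks, again with stand-in `Offer` and `SchedulerDriver` types. It buffers offers from both callbacks and makes the placement decision later in a `schedulePending()` step (an assumed name, for example driven by a timer).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimal stand-ins for the real Mesos types.
struct Offer {
  std::string id;
};
class SchedulerDriver {};

// The Option 3 callbacks as proposed.
class Scheduler {
public:
  virtual ~Scheduler() = default;
  virtual void resourceOffers(SchedulerDriver* driver,
                              const std::vector<Offer>& offers) = 0;
  virtual void revocableResourceOffers(SchedulerDriver* driver,
                                       const std::vector<Offer>& offers) = 0;
};

// Each callback only sees half of the picture, so offers are buffered until
// a separate decision step can look at both kinds together.
class BufferingScheduler : public Scheduler {
public:
  void resourceOffers(SchedulerDriver* driver,
                      const std::vector<Offer>& offers) override {
    assert(driver != nullptr);
    pendingRegular.insert(pendingRegular.end(), offers.begin(), offers.end());
  }

  void revocableResourceOffers(SchedulerDriver* driver,
                               const std::vector<Offer>& offers) override {
    assert(driver != nullptr);
    pendingRevocable.insert(pendingRevocable.end(), offers.begin(),
                            offers.end());
  }

  // Hypothetical decision step: consumes every buffered offer and reports
  // how many there were. A real framework would launch or decline here.
  std::size_t schedulePending() {
    std::size_t consumed = pendingRegular.size() + pendingRevocable.size();
    pendingRegular.clear();
    pendingRevocable.clear();
    return consumed;
  }

  std::vector<Offer> pendingRegular;
  std::vector<Offer> pendingRevocable;
};
```

In practice the buffers would also need pruning in `offerRescinded()`, since an offer may be rescinded while it is still buffered.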

Revocation Notification

Revocation of offers that have already been used to launch tasks will be communicated to Schedulers via a TASK_KILLED status update with a message describing the revocation. This allows Schedulers to reuse their existing status update logic rather than implement a new callback for offer / task revocation. Likewise, offerRescinded() can be reused to notify Schedulers when an offer is revoked before being used.
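A sketch of how a Scheduler might reuse its status update handling to recognise a revocation. `TaskState` and `TaskStatus` are trimmed stand-ins for the Mesos types, and the `"Revoked:"` message prefix is a hypothetical convention; the wording of the message is not specified in this document.

```cpp
#include <cassert>
#include <string>

// Trimmed stand-ins for the Mesos TaskState enum and TaskStatus message.
enum TaskState { TASK_RUNNING, TASK_FINISHED, TASK_KILLED, TASK_LOST };

struct TaskStatus {
  std::string taskId;
  TaskState state;
  std::string message;
};

// Hypothetical prefix the Slave would put on the message of a revocation.
const std::string kRevocationPrefix = "Revoked:";

enum class Action { Ignore, Reschedule };

// A revocation is just a TASK_KILLED whose message marks it as such, so the
// existing handler needs one extra check to re-queue the work instead of
// treating it like a kill the framework asked for.
Action onStatusUpdate(const TaskStatus& status) {
  assert(!status.taskId.empty());
  if (status.state == TASK_KILLED &&
      status.message.compare(0, kRevocationPrefix.size(),
                             kRevocationPrefix) == 0) {
    return Action::Reschedule;
  }
  return Action::Ignore;
}
```

Matching on message text is fragile, so a dedicated field in the status update would be a more robust signal if this approach is adopted.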

Executors also need to be notified of revocation. We can either ask an executor to kill all of its tasks, or have the Slave kill the executor and its tasks directly.
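The two alternatives are not mutually exclusive. One common pattern, assumed here purely for illustration, is to ask first and force later: the Slave asks the executor to kill its tasks and, if the executor has not terminated within a grace period, kills it directly. `ExecutorRevocation` and its grace period are hypothetical; the current time is passed in explicitly so the logic can be driven by a timer or a test.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical Slave-side bookkeeping for revoking one executor.
class ExecutorRevocation {
public:
  enum class Step { AskExecutor, Wait, ForceKill, Done };

  explicit ExecutorRevocation(std::chrono::milliseconds gracePeriod)
      : gracePeriod_(gracePeriod) {
    assert(gracePeriod_.count() >= 0);
  }

  // Called periodically; returns what the Slave should do next.
  Step next(Clock::time_point now, bool executorTerminated) {
    if (executorTerminated) {
      return Step::Done;
    }
    if (!asked_) {
      // First call: ask the executor to kill its tasks.
      asked_ = true;
      askedAt_ = now;
      return Step::AskExecutor;
    }
    if (now - askedAt_ >= gracePeriod_) {
      // Grace period expired: the Slave kills the executor itself.
      return Step::ForceKill;
    }
    return Step::Wait;
  }

private:
  std::chrono::milliseconds gracePeriod_;
  bool asked_ = false;
  Clock::time_point askedAt_;
};
```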

Implementation

The initial version of offer revocation will make offer / revocation decisions locally on the Slave. This simplifies the design, at the cost of making decisions that are less globally optimal. (A local decision lacks the global context that might prove useful for determining when to offer / revoke resources.) However, the Slave can use finer-grained data and perform more intensive estimation, since the volume of data it handles is several orders of magnitude smaller. The Master will require an endpoint through which the Slave can announce that revocable resources are available.
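As an example of the finer-grained estimation a Slave can afford, the sketch below keeps a sliding window of usage samples for one resource of one allocation and treats anything above the recent peak, minus a safety margin, as revocable. The `SlackEstimator` class, the window size and the margin are all assumptions for illustration, not a proposed algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical Slave-local estimator for one resource of one allocation.
class SlackEstimator {
public:
  SlackEstimator(double allocated, std::size_t windowSize, double margin)
      : allocated_(allocated), windowSize_(windowSize), margin_(margin) {
    assert(windowSize_ > 0 && margin_ >= 0.0);
  }

  // Record one usage sample (e.g. taken once per second); only the most
  // recent `windowSize` samples are kept.
  void addSample(double used) {
    samples_.push_back(used);
    if (samples_.size() > windowSize_) {
      samples_.pop_front();
    }
  }

  // Revocable amount: allocated - (recent peak usage + margin), never
  // negative. Nothing is offered before the first sample arrives.
  double revocable() const {
    if (samples_.empty()) {
      return 0.0;
    }
    double peak = *std::max_element(samples_.begin(), samples_.end());
    return std::max(0.0, allocated_ - peak - margin_);
  }

private:
  double allocated_;
  std::size_t windowSize_;
  double margin_;
  std::deque<double> samples_;
};
```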


With that in mind, most of the changes required on the Master will reside in the Allocator, to ensure that revocable offers are handled and accounted for correctly. In particular, the Master must ensure that revoked resources are no longer considered available. The revocation decisions themselves are made by the Slave.
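A minimal sketch of that accounting, assuming a single scalar resource per slave and hypothetical names (`RevocableAccounting`, `update()`, `allocate()`, `revoke()`). It is not the Mesos allocator interface; it only shows revocable capacity being tracked separately from regular capacity and leaving the pool once the Slave revokes it.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-slave revocable accounting (one scalar resource).
struct RevocableState {
  double total = 0.0;      // latest revocable amount announced by the slave
  double allocated = 0.0;  // amount currently used by revocable tasks
};

class RevocableAccounting {
public:
  // The slave announced a new revocable estimate.
  void update(const std::string& slaveId, double total) {
    assert(total >= 0.0);
    slaves_[slaveId].total = total;
  }

  // A framework launched tasks on `amount` revocable resources.
  void allocate(const std::string& slaveId, double amount) {
    RevocableState& s = slaves_[slaveId];
    assert(amount <= available(slaveId));
    s.allocated += amount;
  }

  // The slave revoked `amount`: the revocable task no longer holds it, and
  // it must not be offered again, so it leaves the revocable pool as well.
  void revoke(const std::string& slaveId, double amount) {
    RevocableState& s = slaves_[slaveId];
    assert(amount <= s.allocated);
    s.allocated -= amount;
    s.total = std::max(0.0, s.total - amount);
  }

  // What may still be offered as revocable on this slave.
  double available(const std::string& slaveId) const {
    auto it = slaves_.find(slaveId);
    if (it == slaves_.end()) {
      return 0.0;
    }
    return std::max(0.0, it->second.total - it->second.allocated);
  }

private:
  std::map<std::string, RevocableState> slaves_;
};
```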

Changes to the Slave

Likewise, the Slave will need to notify the Master when it determines that revocable resources are available.
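One way the Slave could limit chatter with the Master, sketched under stated assumptions: forward a new revocable estimate immediately when it shrinks (so the Master stops offering resources the owner is reclaiming), but otherwise only when it grows by at least some threshold. `RevocableNotifier` is hypothetical, and its `send` callback stands in for the Master endpoint, which is not yet specified.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical Slave-side component that forwards revocable estimates.
class RevocableNotifier {
public:
  RevocableNotifier(std::function<void(double)> send, double minIncrease)
      : send_(std::move(send)), minIncrease_(minIncrease) {
    assert(send_ && minIncrease_ >= 0.0);
  }

  // Called whenever the local estimator produces a new value.
  void onEstimate(double revocable) {
    bool first = !hasSent_;
    // Any decrease is sent right away so the Master stops over-offering.
    bool shrank = hasSent_ && revocable < lastSent_;
    // Small increases are held back to avoid flooding the Master.
    bool grewEnough = hasSent_ && revocable - lastSent_ >= minIncrease_;
    if (first || shrank || grewEnough) {
      send_(revocable);
      lastSent_ = revocable;
      hasSent_ = true;
    }
  }

private:
  std::function<void(double)> send_;
  double minIncrease_;
  double lastSent_ = 0.0;
  bool hasSent_ = false;
};
```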

TODO: Remaining Sections
