The distributed version is the key next milestone after the single node version, and aims to be a high-performance, elastic query process engine.

It consists of three components:

It also uses Transactional Message Bus (TMB) as the communication layer between the above components, and relies on HDFS (via libhdfs3) as the persistent storage.

It supports multiple instances of DistributedCli connecting to Conductor, and the queries from different clients run concurrently. However, the queries from the same client one by one. Note that Quickstep does not have transaction support yet.

For a SELECT query, it works as following:

  1. DistributedCli sends a query to Conductor.
  2. Conductor optimizes the query, and generates a query execution plan. The execution plan consists of a number of operators, which generates a bag of WorkOrders, each of which corresponds to a Quickstep block.
  3. Foreman in Conductor manages the query execution, and reports the query completion to the main thread of Conductor.
    1. Foreman sends a serialized WorkOrder to an Executor (actually Shiftboss) based on some policy (e.g., the data locality, or the minimal loads).
    2. Shiftboss in Executor deserializes the WorkOrder, and dispatches it to a Worker.
    3. Worker in Executor executes the WorkOrder, including a possible data exchange from a StorageManager peer.
    4. Worker in Executor notifies Shiftboss regarding the completion of the WorkOrder, and Shiftboss in Executor forwards it to Foreman.
    5. Go to a), unless the query completes.
  4. The main thread of Conductor replies to DistributedCli regarding the query completion.
  5. DistributedCli reads the query result, and notifies Conductor regarding the finish of reading the query result.
  6. Conductor deletes the query result.

Currently, it has some issues:


The e2e Performance Evaluation for the Network based TMB implementation

The goal of this task is to evaluate whether TMB is suitable for the distributed Quickstep as a high-performance, scalable communication infrastructure. If not, we may need to replace it with other license-compatible, messaging system middleware.

The root cause of the performance issue mentioned above is the receive API implementation in the network-based TMB, which uses the sleep system call for blocking. We need a better implementation, e.g., event-driven based.

One alternative is to avoid the client side polling, by refactoring the Receive RPC API the server-side blocking implementation, instead of client-side streaming.

10-node-cluster Setup in CloudLab

Set up a 10-node cluster for the distributed Quickstep in CloudLab, and run the SSB benchmark.

The e2e Performance Evaluation for the Data Exchange Operation

The data exchange operation (also called Shuffle in other context) exposes performance impacts in the distributed query execution. And such performance evaluation could help StorageManager, when loading a block, to determine whether to fetch a block from remote or load from the persistent storage.

Check the Semantics Equivalence between the single-node and distributed version

In the distributed version, some operator (e.g., delete, update) is not trivial to implement to ensure the data consistency due to block caches.

Partition Support in (Distributed) Quickstep

Support Dynamic Scale-in / out in Ensemble and Elasticity Evaluation

Study Data Replication / Morph Impacts in the Distributed Query Execution