Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The distributed version is the key next milestone after the single node version, and aims to be a high-performance, elastic query process engine.

Its It consists of three components:

  • Conductor consists of the parser, the optimizer, Foreman, and CatalogCatalog, and BlockLocator.
    • BlockLocator caches the location of each block, and is used for exchanging Quickstep blocks between StorageManagers.
  • Ensemble, which manages a cluster of Executors, and
    • Executor consists of StorageManager and Worker, Shiftboss, and a group of Workers. Shiftboss manages Workers through the TMB in-memory implementation.
  • DistributedCli.

It also uses Transactional Message Bus (TMB) as the communication layer between the above components, and relies on HDFS (via libhdfs3) as the persistent storage.

...

  1. DistributedCli sends a query to Conductor.
  2. Conductor optimizes the query, and generates a query execution plan. The execution plan consists of a number of operators, which generates a bag of WorkOrders, each of which corresponds to a Quickstep block.
  3. Foreman in Conductor manages the query execution, and reports the query completion to the main thread of Conductor.
    1. Foreman sends a serialized WorkOrder to an Executor (actually Shiftboss) based on some policy (e.g., the data locality, or the minimal loads).
    2. Shiftboss in Executor deserializes the WorkOrder, and dispatches it to a Worker.
    3. Worker in Executor executes the WorkOrder, including a possible data exchange from a StorageManager peer.
    4. Worker in Executor notifies Shiftboss The Executor notifies Foreman regarding the completion of the WorkOrderthe WorkOrder, and Shiftboss in Executor forwards it to Foreman.
    5. Go to a), unless the query completes.
  4. The main thread of Conductor replies to DistributedCli regarding the query completion.
  5. DistributedCli reads the query result, and notifies Conductor regarding the finish of reading the query result.
  6. Conductor deletes the query result.

...