ID	IEP-37
Author	Igor Seliverstov
Sponsor
Created	06 Sep 2019
Status	DRAFT

Motivation

Current SQL engine has a number of critical limitations:

No data co-location control, i.e. arbitrary data can be returned silently
Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes.
Limited execution modes: either two-phase execution (default) or "distributed joins" which adds one more phase
Lack of proper planner which will take in count both data distribution and data statistics

Description

The approach to solve the limitations implies more complex execution flow that brings a new abstraction: execution graph.

This graph consists of all execution steps (relational operators) like join, filter, sort, etc. The execution graph may be transformed into another one saving query semantic during query optimisation process using relational algebra and transformation rules. After transformation the graph is split into a set of dependent subgraphs where an output of dependent subgraph is an aggregated input of depending one (in other words we have more than one map-reduce steps).

Optimized query execution graph (or query plan) used to build final execution tasks.

So, the query flow gets a number of additional steps and will consist of next steps:

SQL parsing and validation (output: SQL AST)
Query rewrite (output: rewritten SQL AST) - need to (for example) turn DML queries into selects, returning set of rows to change (queries like 'INSERT INTO table1 SELECT * FROM table 2').
SQL AST to Query Plan transformation (output: query execution graph which word by word represents the original SQL query)
Query execution graph optimization (output: optimized query execution graph with additional nodes representing node-to-node communications - Exchanges)
Query execution graph splitting (output: set of execution tasks, each task represents a unit of work in scope of whole query executed on a single node)
Tasks executing (output: resulting cursor)

Example:

Initial query:


SELECT t1.name, t2.name as projectName FROM Persons t1, Projects t2 where t1.id == t2.responsiblePersonId

Let's assume there is no collocation and the data placed on different nodes.

Initial execution graph:

Project (t1.name name, t2.name projectName)
   Join (t1.id == t2.responsiblePersonId)
      Scan (Persons t1)
      Scan (Projects t2)

Transformed graph:

Exchange (SINGLE) // collecting
   Project (t1.name name, t2.name projectName)
      Join (t1.id == t2.id)
         Exchange (HASH t1.id) // repartitioning
            Project (t1.name name, t1.id id)
               Scan (Persons t1)
         Exchange (HASH t2.id) // repartitioning
            Project (t2.name name, t2.responsiblePersonId id)
               Scan (Projects t2)

Split tasks:

1) Executes on a client node:

   Receive (id = 1)

2) Executes on an aggregator/aggregators node(s):

   Send (targetId = 1)
      Project (t1.name name, t2.name projectName)
         Join (t1.id == t2.responsiblePersonId)
            Receive (id = 2 t1)
            Receive (id = 3 t2)

3) Executes on data nodes:

   Send (targetId = 2)
      Project (t1.name name, t1.id id)
         Scan (Persons t1)

4) Executes on data nodes:

   Send (targetId = 3)
      Project (t2.name name, t2.responsiblePersonId id)
         Scan (Projects t2)

Each node may have several roles, intermediate tasks count is unlimited, there may be any count of subsequent joins or sub-selects.

Apache Calcite library is going to be responsible for execution graph building/optimizing.

There are several example of successful Calcite integrations (Apache Drill, Apache Flink, Hive, etc)

The integration steps:

Ignite logical convention implementing (Relational graph nodes, converter rules), so, Calcite can use Ignite's own operations costs, we have a control on what variant of graph is preferable.
Index Scan rules implementing - Apache Phoenix experience may be reused. Range filters, sorted scans, some projections transform into index scans.
Exchange related rules implementing (affinity aware) - Apache Drill experience may be reused. SINGLETON, RANDOM, HASH and BROADCAST distribution types needed.
Sender/Receiver infrastructure implementing. - Each Exchange rewrites into a pair of Receiver and Sender where Receiver is a relation node and Sender is an infrastructure object which is used to stream target Exchange subgraph result to a particular remote receiver.
Physical convention implementing - as a start point we may use one of provided by Calcite conventions (Bindable, Enumerable, Interpretable) rewriting particular relational nodes and converter/transform rules into our own implementations one by one.

Risks and Assumptions

The main issue is the new Calcite based engine (the engine) is completely different to current one. At first the engine will available via internal API. We need really good test coverage to make sure the feature works as expected in all possible scenarios.

Discussion Links

http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-td43724.html

Reference Links

https://calcite.apache.org/

https://phoenix.apache.org/

https://drill.apache.org/

Page tree

Motivation

Description

Example:

Initial query:

Initial execution graph:

Transformed graph:

Split tasks:

The integration steps:

Risks and Assumptions

Discussion Links

Reference Links

Tickets

Page tree

IEP-37: New query execution engine.

Motivation

Description

Example:

Initial query:

Initial execution graph:

Transformed graph:

Split tasks:

The integration steps:

Risks and Assumptions

Discussion Links

Reference Links

Tickets