| ID | IEP-37 |
| Authors | |
| Sponsor | |
| Created | 06 Sep 2019 |
| Status | DRAFT |
The current SQL engine has a number of critical limitations. For example:

SELECT t1.a, t2.b FROM t1, t2 WHERE t1.id = t2.id

- silently returns wrong results if t1 and t2 are not co-located;

SELECT * FROM person p WHERE p.salary > (SELECT avg(salary) FROM person)

- does not work at all because it cannot be executed in two steps.

Classical database query execution is done in roughly the following steps [1], [2].
The key point of the flow above is to have a relational algebra execution plan that can undergo arbitrary equivalence-preserving transformations under the rules of relational algebra. There is a well-studied optimization approach, used in many production systems [3], for optimizing query execution plans.
The idea of this IEP is to introduce all the missing intermediate steps into the query execution flow and to operate on a query execution graph.
This graph consists of all execution steps (relational operators) such as join, filter, sort, etc. During query optimization the graph may be transformed into an equivalent one that preserves query semantics, using relational algebra and transformation rules. After transformation the graph is split into a set of dependent subgraphs, where the output of one subgraph is the aggregated input of the subgraph that depends on it (in other words, there is more than one map-reduce step).
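As a rough illustration of such a semantics-preserving transformation (this is not Ignite or Calcite code; all class and method names below are hypothetical), a rewrite rule over an operator tree can be modeled like this — here a filter is pushed below a projection, which is valid when the projection only renames or reorders columns:

```java
import java.util.List;

// Minimal operator-tree sketch; all names are illustrative.
abstract class RelOp {
    final List<RelOp> inputs;
    RelOp(RelOp... inputs) { this.inputs = List.of(inputs); }
    String name() { return getClass().getSimpleName(); }
    // Renders the tree as e.g. "Filter(Project(Scan))" for inspection.
    String render() {
        if (inputs.isEmpty()) return name();
        StringBuilder sb = new StringBuilder(name()).append('(');
        for (int i = 0; i < inputs.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(inputs.get(i).render());
        }
        return sb.append(')').toString();
    }
}
class Scan extends RelOp {}
class Project extends RelOp { Project(RelOp in) { super(in); } }
class Filter extends RelOp { Filter(RelOp in) { super(in); } }

public class RewriteDemo {
    // Equivalence rule: Filter(Project(x)) -> Project(Filter(x)).
    // Same semantics when the projection only renames columns, but the
    // filter now runs earlier and reduces the rows the projection sees.
    static RelOp pushFilterBelowProject(RelOp node) {
        if (node instanceof Filter && node.inputs.get(0) instanceof Project) {
            RelOp below = node.inputs.get(0).inputs.get(0);
            return new Project(new Filter(pushFilterBelowProject(below)));
        }
        return node;
    }

    public static void main(String[] args) {
        RelOp plan = new Filter(new Project(new Scan()));
        System.out.println(pushFilterBelowProject(plan).render());
        // prints: Project(Filter(Scan))
    }
}
```

A real optimizer applies a whole library of such rules and uses a cost model to pick among the resulting equivalent plans.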
The optimized query execution graph (or query plan) is used to build the final execution tasks.
SELECT t1.name, t2.name AS projectName FROM Persons t1, Projects t2 WHERE t1.id = t2.responsiblePersonId
Let's assume there is no co-location and the data is placed on different nodes.
The initial logical plan:

Project (t1.name name, t2.name projectName)
  Join (t1.id == t2.responsiblePersonId)
    Scan (Persons t1)
    Scan (Projects t2)
After repartitioning exchanges are injected, the distributed plan becomes:

Exchange (SINGLE) // collecting
  Project (t1.name name, t2.name projectName)
    Join (t1.id == t2.id)
      Exchange (HASH t1.id) // repartitioning
        Project (t1.name name, t1.id id)
          Scan (Persons t1)
      Exchange (HASH t2.id) // repartitioning
        Project (t2.name name, t2.responsiblePersonId id)
          Scan (Projects t2)
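The HASH exchange can be sketched as follows (illustrative code, not the engine's; the class and method names are hypothetical). The property it relies on is that equal join keys always map to the same target node, so after repartitioning the join can run locally on each node:

```java
import java.util.Objects;

// Sketch of HASH repartitioning: every node hashes each row's join key
// to pick a destination node, so rows with equal keys from both tables
// end up on the same node. All names here are illustrative.
public class HashExchangeDemo {
    static int targetNode(Object joinKey, int nodeCount) {
        // Math.floorMod keeps the result non-negative even when
        // hashCode() is negative.
        return Math.floorMod(Objects.hashCode(joinKey), nodeCount);
    }

    public static void main(String[] args) {
        int nodes = 4;
        // A Persons row with id = 42 and a Projects row with
        // responsiblePersonId = 42 are routed to the same node,
        // regardless of which node originally stored them.
        System.out.println(targetNode(42, nodes) == targetNode(42, nodes));
        // prints: true
    }
}
```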
The plan is then split into fragments:

1) Executes on the client node:

Receive (id = 1)

2) Executes on the aggregator node(s):

Send (targetId = 1)
  Project (t1.name name, t2.name projectName)
    Join (t1.id == t2.responsiblePersonId)
      Receive (id = 2, t1)
      Receive (id = 3, t2)

3) Executes on data nodes:

Send (targetId = 2)
  Project (t1.name name, t1.id id)
    Scan (Persons t1)

4) Executes on data nodes:

Send (targetId = 3)
  Project (t2.name name, t2.responsiblePersonId id)
    Scan (Projects t2)
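The Send/Receive pairing in these fragments can be modeled as a mailbox per exchange id: a Send with targetId = N delivers rows to the Receive whose id = N, possibly on another node. The sketch below is a single-process toy with hypothetical names; the real engine streams batches over the network rather than through in-memory queues:

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model of Send/Receive wiring: one queue per exchange id stands in
// for the network channel between fragments. All names are illustrative.
public class ExchangeWiringDemo {
    static final Map<Integer, BlockingQueue<String>> inboxes = new ConcurrentHashMap<>();

    // Executed by the sending fragment (e.g. Send (targetId = 2)).
    static void send(int targetId, String row) {
        inboxes.computeIfAbsent(targetId, k -> new LinkedBlockingQueue<>()).add(row);
    }

    // Executed by the consuming fragment (e.g. Receive (id = 2)).
    static String receive(int id) throws InterruptedException {
        return inboxes.computeIfAbsent(id, k -> new LinkedBlockingQueue<>()).take();
    }

    public static void main(String[] args) throws Exception {
        // Fragment 3 (data node) sends a projected Persons row to
        // Receive (id = 2), which feeds the join in fragment 2.
        send(2, "Person{id=42, name=Alice}");
        System.out.println(receive(2));
        // prints: Person{id=42, name=Alice}
    }
}
```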
Each node may play several roles; the number of intermediate tasks is unbounded, so there may be any number of subsequent joins or sub-selects.
The Apache Calcite library is going to be responsible for building and optimizing the execution graph. There are several examples of successful Calcite integrations (Apache Drill, Apache Flink, Apache Hive, etc.).
For commonly used (co-located) queries the engine has to generate the same execution plan as H2: only two phases, meaning there is no intermediate local task with a Send on top of its execution sub-graph and a Receive at the bottom. The exception is when such behavior is forced by hints; this is helpful for delegating result aggregation to server nodes when the requesting client has little free memory.
This list may grow over time.
The main issue is that the new Calcite-based engine (hereafter, the engine) is completely different from the current one. At first, the engine will be available only via an internal API. We need really good test coverage to make sure the feature works as expected in all possible scenarios.
http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-td43724.html
[1] https://arxiv.org/pdf/1802.10233.pdf
[2] https://www.youtube.com/playlist?list=PLSE8ODhjZXja7K1hjZ01UTVDnGQdx5v5U
[3] https://cs.uwaterloo.ca/~david/cs848/volcano.pdf