ID | IEP-125
Author | Nikolay Izhikov
Sponsor |
Created | 12.07.2024
Status | DRAFT
Currently, Ignite provides several interfaces to query data: SQL queries, Scan queries and Index queries.
Ignite also provides ACID transactions for the Key-Value API. But for querying data there is a known limitation: none of the query interfaces respect changes made inside the current transaction. A simple test illustrates the current behaviour: a query executed inside a transaction does not return entries the same transaction has just written.
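A minimal sketch of the kind of reproducer meant here, assuming the Ignite 2.x Java API (the cache name and value types are illustrative; this requires a running Ignite node and is not a standalone program):

```java
// Sketch only: a transactional cache with SQL indexing enabled.
IgniteCache<Long, Long> cache = ignite.createCache(
    new CacheConfiguration<Long, Long>("TBL")
        .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
        .setIndexedTypes(Long.class, Long.class));

try (Transaction tx = ignite.transactions().txStart()) {
    cache.put(1L, 1L);

    cache.get(1L); // Key-Value API: returns 1, the transaction sees its own write.

    // SQL query inside the same transaction: returns 0 rows,
    // the uncommitted change is invisible to the query engine.
    cache.query(new SqlFieldsQuery("SELECT _KEY, _VAL FROM Long")).getAll();

    tx.commit();
}
```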
Traditional databases like MySQL, PostgreSQL and others respect transaction boundaries while executing queries. psql example:
```
postgres=# CREATE TABLE TBL(ID BIGINT PRIMARY KEY, VAL BIGINT);
CREATE TABLE
postgres=# START TRANSACTION;
START TRANSACTION
postgres=*# SELECT * FROM TBL;
 id | val
----+-----
(0 rows)

postgres=*# INSERT INTO TBL VALUES(1, 1);
INSERT 0 1
postgres=*# SELECT * FROM TBL;
 id | val
----+-----
  1 |   1
(1 row)

postgres=*# COMMIT;
COMMIT
postgres=# SELECT * FROM TBL;
 id | val
----+-----
  1 |   1
(1 row)
```
It is very unexpected for Ignite users to deal with this kind of API inconsistency and lack of interoperability.
The goal of this IEP is to provide consistency and interoperability between the Key-Value, SQL, Scan query and Index query APIs.
Each of the three query APIs has its own advantages, use-cases and details, so the description below is query-API specific. Different isolation levels are described in a separate section.
Transaction state is kept in the JVM heap in IgniteTxState. On each get or put invocation Ignite checks the local state to get or store the tx version of an entry; entry locks are managed by GridCacheMvcc. SQL and other types of queries work on top of data structures (BPlusTree) that store data off-heap. This leads to the current behavior where transactions and queries do not overlap.
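The split described above can be modeled with a toy program (a sketch with illustrative names, not the real Ignite classes): committed data lives in an "off-heap" tree while uncommitted transaction changes live in a heap-resident per-transaction map, so the Key-Value path sees the tx state but the query path does not.

```java
import java.util.*;

public class TxVsQuery {
    static final NavigableMap<Long, Long> tree = new TreeMap<>(); // "off-heap" committed data
    static final Map<Long, Long> txState = new HashMap<>();       // heap-resident tx changes

    // Key-Value get: checks the local tx state first, then the committed tree.
    static Long get(long key) {
        return txState.containsKey(key) ? txState.get(key) : tree.get(key);
    }

    // Key-Value put inside a transaction: only touches the tx state.
    static void put(long key, long val) {
        txState.put(key, val);
    }

    // Scan/SQL-style query: iterates the tree only, so it cannot see txState.
    static List<Long> queryKeys() {
        return new ArrayList<>(tree.keySet());
    }

    public static void main(String[] args) {
        put(1L, 1L);                      // uncommitted write
        System.out.println(get(1L));      // 1  -> Key-Value API sees it
        System.out.println(queryKeys());  // [] -> query does not
    }
}
```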
The described implementation and issue admit two obvious solutions:
1. Queries must analyze the current transaction's data structures during invocation and keep them up to date.
2. The transaction flow must be modified to store changes off-heap. Queries must then filter out other transactions' changes and analyze only committed or self-changed entries. Another possible implementation of this approach is MVCC instead of lock-based concurrency control.
The first approach has well-understood limitations: since locks and changes are stored in the JVM heap, it leads to GC pressure and limited scalability. Moreover, in some cases (SQL SELECT) the transaction state must be sent around the cluster to achieve correct results. However, Ignite's primary use-cases are OLTP workloads, i.e. small and fast transactions that change a couple of entries, so the overhead will be small in most cases.
The second approach is more scalable, but requires a huge and risky refactoring of literally all Ignite subsystems.
For the first two phases of this IEP's implementation it is proposed to go with the first approach.
The following part of the description provides high-level details of the proposed implementation:
The new Calcite-based SQL engine gives Ignite the power to control the execution flow. The proposed design of the execution flow for the different types of queries is:
DML statements are currently executed on the transaction coordinator node, but a separate thread is used to perform the plan with ModifyNode. This means we must do the following: pass the current transaction via BaseQueryContext, and wrap ModifyNode execution in transaction resume/suspend.

To invoke SELECT statements correctly, the execution engine must know the IgniteTxState at the time of the table (index) scan. Ignite must send the IgniteTxState (possibly filtering out some entries) to the node that invokes the query step. This will allow the SQL engine to mix the transaction state with the table (index) data to achieve correct results.
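The SELECT flow above can be sketched as two steps (all names here are illustrative assumptions, not Ignite internals): the coordinator filters the transaction state down to the entries of the scanned cache before shipping it, and the remote node overlays the shipped entries on its local table/index data.

```java
import java.util.*;

public class TxStateShipping {
    // A tx entry: val == null models a removal inside the transaction.
    record TxEntry(String cacheName, long key, Long val) {}

    // Coordinator side: ship only the entries relevant to the scanned cache.
    static List<TxEntry> filterForScan(List<TxEntry> txState, String cacheName) {
        return txState.stream().filter(e -> e.cacheName().equals(cacheName)).toList();
    }

    // Remote side: mix the shipped tx entries with the local committed rows.
    static NavigableMap<Long, Long> scanWithTx(NavigableMap<Long, Long> local,
                                               List<TxEntry> shipped) {
        NavigableMap<Long, Long> res = new TreeMap<>(local);
        for (TxEntry e : shipped) {
            if (e.val() == null)
                res.remove(e.key());      // removed in this tx
            else
                res.put(e.key(), e.val()); // created/updated in this tx
        }
        return res;
    }

    public static void main(String[] args) {
        List<TxEntry> txState = List.of(
            new TxEntry("TBL", 1L, 100L),
            new TxEntry("OTHER", 7L, 700L)); // must not be shipped for a TBL scan

        List<TxEntry> shipped = filterForScan(txState, "TBL");
        NavigableMap<Long, Long> local = new TreeMap<>(Map.of(2L, 20L));

        System.out.println(scanWithTx(local, shipped)); // {1=100, 2=20}
    }
}
```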
No additional locks are required. Queries currently work at this isolation level, but without knowledge of the transaction's changes.
Ignite must acquire a shared lock on each entry that conforms to the query criteria. This can be achieved in two ways: clause-based locks or entry-by-entry locks. The decision must be made during implementation.
Ignite must acquire an exclusive lock on each entry that conforms to the query criteria. This can be achieved with clause-based locks.
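Entry-by-entry locking for both levels can be sketched as follows (a simplified model, not Ignite's lock manager): a read (shared) lock is taken per matching entry for the shared-lock level, a write (exclusive) lock for the exclusive level, and the locked keys are returned so they can be released on commit.

```java
import java.util.*;
import java.util.concurrent.locks.*;
import java.util.function.LongPredicate;

public class QueryLocks {
    // One read-write lock per key (illustrative; real lock managers are striped).
    static final Map<Long, ReadWriteLock> locks = new HashMap<>();

    static ReadWriteLock lockFor(long key) {
        return locks.computeIfAbsent(key, k -> new ReentrantReadWriteLock());
    }

    // Lock every key matching the query criteria; return them for later release.
    static List<Long> lockMatching(Collection<Long> keys, LongPredicate criteria,
                                   boolean exclusive) {
        List<Long> locked = new ArrayList<>();
        for (long key : keys) {
            if (criteria.test(key)) {
                ReadWriteLock l = lockFor(key);
                (exclusive ? l.writeLock() : l.readLock()).lock();
                locked.add(key);
            }
        }
        return locked;
    }

    public static void main(String[] args) {
        // Shared locks on every entry matching "key > 1".
        List<Long> locked = lockMatching(List.of(1L, 2L, 3L), k -> k > 1, false);
        System.out.println(locked); // [2, 3]
    }
}
```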
With SQL and Scan queries at a strict isolation level, it is very easy for the end-user to acquire a lot of locks with a "simple" query. A mechanism like CalciteQueryEngineConfiguration#getQueryMemoryQuota and CalciteQueryEngineConfiguration#getGlobalMemoryQuota must be created (or reused) to prevent the user from harming server nodes.
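By analogy with the memory quotas mentioned above, a per-query lock quota could look like this (the quota class and its behavior are assumptions for illustration, not an existing Ignite API):

```java
public class LockQuota {
    private final long maxLocks;
    private long acquired;

    LockQuota(long maxLocks) { this.maxLocks = maxLocks; }

    // Called for each entry the query is about to lock; rejects the query
    // once it exceeds the configured limit instead of starving the node.
    void onLockAcquired() {
        if (++acquired > maxLocks)
            throw new IllegalStateException(
                "Query exceeded lock quota: " + maxLocks +
                "; rewrite the query or raise the quota");
    }

    public static void main(String[] args) {
        LockQuota quota = new LockQuota(2);
        quota.onLockAcquired();
        quota.onLockAcquired();
        try {
            quota.onLockAcquired(); // third lock exceeds the quota of 2
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```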