ID | IEP-130 |
Author | |
Sponsor | |
Created |
|
Status | IN PROGRESS |
In a dynamic environment, users may occasionally execute queries that were not intended, or where the query logic is faulty (e.g., missing WHERE clauses in an update or delete statement, or incorrect joins causing Cartesian products). The ability to cancel these queries reduces the impact of such mistakes by preventing excessive data processing or modification.
Besides, long-running or resource-intensive queries can degrade overall system performance, especially in environments where multiple users or applications share resources. Query cancellation allows administrators or users to terminate these queries before they fully execute, freeing up valuable resources such as CPU, memory, and I/O for other important tasks, preventing the entire system from crashing.
This document addresses cancellation of query-related resources only, such as Query itself and Transaction, and also Compute, while handling of the rest of the resources (such as Continuous Queries, Client Connections, etc) is a subject for follow up documents. The goal of this document is to provide a uniform approach to the problem of cancellation of long-running processes within the cluster.
A pair of new objects named CancelHandle
and CancellationToken
must be introduced:
/** * A handle which may be used to request the cancellation of execution. */ public interface CancelHandle { /** A factory method to create a handle. */ static CancelHandle create() {} /** * Abruptly terminates an execution of an associated process. * * <p>Control flow will return after the process has been terminated and the resources associated with that process have been freed. */ void cancel(); /** * Abruptly terminates an execution of a associated process. * * @return A future that will be completed after the process has been terminated and the resources associated with that process have * been freed. */ CompletableFuture<Void> cancelAsync(); /** * Flag indicating whether cancellation was requested or not. * * <p>This method will return true even if cancellation has not been completed yet. * * @return {@code true} when cancellation was requested. */ boolean isCancelled(); /** * Issue a token associated with this handle. * * <p>Token is reusable, meaning the same token may be used to link several executions into a single cancellable. */ CancellationToken token(); } public interface CancellationToken { }
Besides, IgniteSql
must have a new set of methods accepting CancellationToken
as parameter:
public interface IgniteSql { <...> ResultSet<SqlRow> execute( @Nullable Transaction transaction, @Nullable CancellationToken cancellationToken, String query, @Nullable Object... arguments ); <...> }
Similar changes must be done in IgniteCompute
as well.
IgniteSql sql = <acquire sql api>; // create cancellation handle CancelHandle handle = CancelHandle.create(); // propagate cancellation token to an execution sql.executeAsync(null, handle.token(), "<query text>"); // propagate token from the same handle to more executions, if needed sql.executeAsync(null, handle.token(), "<query text>"); // cancel everything at once handle.cancel();
JDBC already has idiomatic way to cancel current execution:
public interface Statement extends Wrapper, AutoCloseable { <...> /** * Cancels this <code>Statement</code> object if both the DBMS and * driver support aborting an SQL statement. * This method can be used by one thread to cancel a statement that * is being executed by another thread. * * @exception SQLException if a database access error occurs or * this method is called on a closed <code>Statement</code> * @exception SQLFeatureNotSupportedException if the JDBC driver does not support * this method */ void cancel() throws SQLException; <...> }
Similar to JDBC, ODBC has its own idiomatic way to cancel a statement: https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlcancel-function?view=sql-server-ver16
A new set of commands must be introduced:
KILL QUERY '<query_id>' [ NO WAIT ] KILL TRANSACTION '<tx_id>' [ NO WAIT ] KILL COMPUTE '<job_id>' [ NO WAIT ]
Where
Besides, a new set of system view must be added in order to make it possible to find and identify problematic processes:
TRANSACTIONS -- node system view COORDINATOR_NODE_ID VARCHAR STATE VARCHAR ID VARCHAR START_TIME TIMESTAMP WITH LOCAL TIME ZONE IMPLICIT BOOLEAN TYPE VARCHAR -- possible values are [READ_ONLY|READ_WRITE] PRIORITY VARCHAR LOCKS -- node system view OWNING_NODE_ID VARCHAR TX_ID VARCHAR OBJECT_ID VARCHAR MODE VARCHAR SQL_QUERIES -- node system view INITIATOR_NODE_ID VARCHAR PHASE VARCHAR -- i.e. INITIALIZATION | OPTIMIZATION | EXECUTION | etc QUERY_ID VARCHAR PARENT_QUERY_ID VARCHAR -- reference to a script ORIGINAL_SQL VARCHAR START_TIME TIMESTAMP WITH LOCAL TIME ZONE SCHEMA_NAME VARCHAR TX_ID VARCHAR COMPUTE_JOBS -- node system view ID VARCHAR COORDINATOR_NODE_ID VARCHAR STATUS VARCHAR CREATE_TIME TIMESTAMP WITH LOCAL TIME ZONE START_TIME TIMESTAMP WITH LOCAL TIME ZONE FINISH_TIME TIMESTAMP WITH LOCAL TIME ZONE
Note: LOCKS.OBJECT_ID denotes the top most object of the lock hierarchy (i.e. tableId or indexId). We must ensure proper value is exposed from LockKey
(e.g. there is proper to_string implementation).
A new set of endpoints must be introduced:
GET /management/v1/transactions/ GET /management/v1/transactions/{tx_id} DELETE /management/v1/transactions/{tx_id} GET /management/v1/sql/queries/ GET /management/v1/sql/queries/{query_id} DELETE /management/v1/sql/queries/{query_id} GET /management/v1/compute/jobs/ GET /management/v1/compute/jobs/{job_id} DELETE /management/v1/compute/jobs/{job_id}