IDIEP-130
Author
Sponsor
Created

  

Status

IN PROGRESS


Motivation

In a dynamic environment, users may occasionally execute queries that were not intended, or where the query logic is faulty (e.g., missing WHERE clauses in an update or delete statement, or incorrect joins causing Cartesian products). The ability to cancel these queries reduces the impact of such mistakes by preventing excessive data processing or modification.

Besides, long-running or resource-intensive queries can degrade overall system performance, especially in environments where multiple users or applications share resources. Query cancellation allows administrators or users to terminate these queries before they fully execute, freeing up valuable resources such as CPU, memory, and I/O for other important tasks, preventing the entire system from crashing.

Description

Scope

This document addresses cancellation of query-related resources only, such as Query itself and Transaction, and also Compute, while handling of the rest of the resources (such as Continuous Queries, Client Connections, etc) is a subject for follow up documents. The goal of this document is to provide a uniform approach to the problem of cancellation of long-running processes within the cluster.

Requirements

  1. It must be possible to cancel execution of a query from the public API even before the first page is ready.
  2. It must be possible to cancel a resource from any node in the cluster, regardless of whether the current node holds the resource or not. 
  3. Broadcasting of messages should be avoided, when possible.
  4. There must be an ability to wait until the resource is completely freed, as well as the option to fire-and-forget (resource will be cleared eventually, but the control will be returned to the user immediately).

Proposal

Cancellation In User-Facing API

Native API

A pair of new objects named CancelHandle  and CancellationToken  must be introduced:

/**
 * A handle which may be used to request the cancellation of execution.
 */
public interface CancelHandle {
    /** A factory method to create a handle. */
    static CancelHandle create() {}

    /**
     * Abruptly terminates an execution of an associated process.
     *
     * <p>Control flow will return after the process has been terminated and the resources associated with that process have been freed.
     */
    void cancel();

    /**
     * Abruptly terminates an execution of a associated process.
     *
     * @return A future that will be completed after the process has been terminated and the resources associated with that process have
     *         been freed.
     */
    CompletableFuture<Void> cancelAsync();

    /**
     * Flag indicating whether cancellation was requested or not.
     *
     * <p>This method will return true even if cancellation has not been completed yet.
     *
     * @return {@code true} when cancellation was requested.
     */
    boolean isCancelled();

    /**
     * Issue a token associated with this handle.
     * 
     * <p>Token is reusable, meaning the same token may be used to link several executions into a single cancellable.
     */
    CancellationToken token();

}

public interface CancellationToken { }

Besides, IgniteSql must have a new set of methods accepting CancellationToken as parameter:

public interface IgniteSql {
    <...>
    
    ResultSet<SqlRow> execute(
        @Nullable Transaction transaction,
        @Nullable CancellationToken cancellationToken,
        String query,
        @Nullable Object... arguments
    );
    
    <...>
}

Similar changes must be done in IgniteCompute as well.

Example
IgniteSql sql = <acquire sql api>;

// create cancellation handle
CancelHandle handle = CancelHandle.create();

// propagate cancellation token to an execution
sql.executeAsync(null, handle.token(), "<query text>");

// propagate token from the same handle to more executions, if needed
sql.executeAsync(null, handle.token(), "<query text>");

// cancel everything at once
handle.cancel();

JDBC

JDBC already has idiomatic way to cancel current execution:

public interface Statement extends Wrapper, AutoCloseable {
    <...>

    /**
     * Cancels this <code>Statement</code> object if both the DBMS and
     * driver support aborting an SQL statement.
     * This method can be used by one thread to cancel a statement that
     * is being executed by another thread.
     *
     * @exception SQLException if a database access error occurs or
     * this method is called on a closed <code>Statement</code>
     * @exception SQLFeatureNotSupportedException if the JDBC driver does not support
     * this method
     */
    void cancel() throws SQLException;

    <...>
}

ODBC

Similar to JDBC, ODBC has its own idiomatic way to cancel a statement: https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlcancel-function?view=sql-server-ver16

Cancellation In Management API

SQL

A new set of commands must be introduced:

KILL QUERY '<query_id>' [ NO WAIT ]
KILL TRANSACTION '<tx_id>' [ NO WAIT ]
KILL COMPUTE '<job_id>' [ NO WAIT ] 

Where 

  • query_id – string representation of query identity that should be terminated.
  • tx_id – string representation of transaction identity that should be terminated.
  • job_id – string representation of compute job identity that should be terminated.
  • NO WAIT – optional parameter forcing command to return immediately rather than to wait for the resources to be freed.

Besides, a new set of system view must be added in order to make it possible to find and identify problematic processes:

TRANSACTIONS -- node system view
  COORDINATOR_NODE_ID VARCHAR
  STATE               VARCHAR
  ID                  VARCHAR
  START_TIME          TIMESTAMP WITH LOCAL TIME ZONE
  IMPLICIT            BOOLEAN
  TYPE                VARCHAR -- possible values are [READ_ONLY|READ_WRITE]
  PRIORITY            VARCHAR

LOCKS -- node system view
  OWNING_NODE_ID VARCHAR
  TX_ID          VARCHAR
  OBJECT_ID      VARCHAR
  MODE           VARCHAR

SQL_QUERIES -- node system view
  INITIATOR_NODE_ID VARCHAR
  PHASE             VARCHAR -- i.e. INITIALIZATION | OPTIMIZATION | EXECUTION | etc
  QUERY_ID          VARCHAR
  PARENT_QUERY_ID   VARCHAR -- reference to a script
  ORIGINAL_SQL      VARCHAR
  START_TIME        TIMESTAMP WITH LOCAL TIME ZONE
  SCHEMA_NAME       VARCHAR
  TX_ID             VARCHAR

COMPUTE_JOBS -- node system view
  ID                  VARCHAR
  COORDINATOR_NODE_ID VARCHAR
  STATUS              VARCHAR
  CREATE_TIME         TIMESTAMP WITH LOCAL TIME ZONE
  START_TIME          TIMESTAMP WITH LOCAL TIME ZONE
  FINISH_TIME         TIMESTAMP WITH LOCAL TIME ZONE

Note: LOCKS.OBJECT_ID denotes the top most object of the lock hierarchy (i.e. tableId or indexId). We must ensure proper value is exposed from LockKey (e.g. there is proper to_string implementation).

Rest

A new set of endpoints must be introduced:

GET /management/v1/transactions/
GET /management/v1/transactions/{tx_id}
DELETE /management/v1/transactions/{tx_id}

GET /management/v1/sql/queries/
GET /management/v1/sql/queries/{query_id}
DELETE /management/v1/sql/queries/{query_id}

GET /management/v1/compute/jobs/
GET /management/v1/compute/jobs/{job_id}
DELETE /management/v1/compute/jobs/{job_id}

Tickets

IGNITE-23429 - Getting issue details... STATUS

  • No labels