IDIEP-137
Author
Sponsor
Created

  

StatusDRAFT


Motivation

The optimizer plays a pivotal role in determining the most efficient execution plan for a given SQL query. However, without visibility into how queries are executed, diagnosing performance issues becomes a trial-and-error process, often leading to suboptimal tuning and confusion.

The EXPLAIN command addresses this gap by providing transparency into the query execution plan chosen by the optimizer.

Requirements

EXPLAIN must provide a clear answer as to:

  1. what algorithms are chosen for query execution (choice of JOIN algorithm, HASH vs STREAMING aggregates, etc).
  2. what steps are taken to fulfill the request (whether sorting order was taken into account or the whole dataset was sorted once again, whether predicate was used for index lookup or it was used only for post-filtration).
  3. how data is transferred between nodes and which operators are executed locally on nodes where data reside.
  4. which part of the query is executed on which nodes. 

Description

Public API

The syntax for EXPLAIN command is as follow:

EXPLAIN [ ( PLAN | MAPPING ) [ L_BRACKET option1 [, ...] R_BRACKET ] FOR ] <query_or_dml>

If neither PLAN nor MAPPING is specified, then PLAN is implicit. Additional options may be specified inside brackets. Currently this part should be omitted since at the moment we have no options, but later it will be used to specify format or verbosity:

// Example below shows possibility of extension of
// current syntax with different options
EXPLAIN PLAN (FORMAT JSON, VERBOSITY LEVEL 10) FOR SELECT * FROM my_table

EXPLAIN PLAN command returns a query plan without mapping to a particular topology. That is, it includes a query plan in a tree-like format without splitting on fragments. This representation is suitable to investigate performance issues related to the optimizer, like choice of particular algorithm (requirements 1, 2, and partially 3).

EXPLAIN MAPPING command is suitable for deep understanding of which part of the query is executed on which node. This representation is suitable to investigate issues related to *best-effort colocation.

Best-effort colocation is a colocation of execution time. That is, for tables belonging to different distribution zones sql engine may peek up replicas which reside on the same node to minimise the amount of data transferred between nodes.

Output format

Only text format will be supported from the beginning. More formats (like JSON or DOT) may be supported later on demand.

Every relational operator is described with a name and set of attributes:

NameOfTheRelationalOperator
    attribute1: value1
    attribute2: value2

Let’s take a look at few examples:

TableScan                        // Full table access
    table: PUBLIC.EMP1           // Name of the table in question
    fields: [NAME, SALARY]       // List of columns to return
    est: (rows=1)                // Estimated number of rows returned

IndexScan                        // Index scan
    table: PUBLIC.TEST_TBL       // Name of the table in question
    index: IDX_DESC              // Name of the index in question
    type: SORTED                 // Type of the index
    fields: [C1]                 // List of columns to return
    collation: [C1 DESC]         // Collation of the index aka order of sorting
    est: (rows=1)                // Estimated number of rows returned

Sort
    collation: [C1 DESC NULLS LAST] // Collation to sort input rows
    est: (rows=1)                   // Estimated number of rows returned

Name represents a particular algorithm used to execute relation operators (TableScan vs IndexScan, HashJoin vs MergeJoin vs NestedLoopJoin, etc). The set of attributes depends on the particular relational operator.

A query plan is represented by a tree-like structure which is the composition of nodes described above. This tree describes a data flow, where rows are passed from leaves to a plan root node (root node is the topmost node, it also has no indentation). Let's take a look at few examples:

// simple ordered select where desired order matches collation of existing index
EXPLAIN PLAN FOR SELECT c1 FROM test_tbl ORDER BY c1 DESC NULLS FIRST
Exchange
    distribution: single
    est: (rows=1)
  IndexScan
      table: PUBLIC.TEST_TBL
      index: IDX_DESC
      type: SORTED
      fields: [C1]
      collation: [C1 DESC]
      est: (rows=1)

// similar query, but desired order doesn't match index collation. Mind the additional
// SORT node.
EXPLAIN PLAN FOR SELECT c1 FROM test_tbl ORDER BY c1 DESC NULLS LAST
Exchange
    distribution: single
    est: (rows=1)
  Sort
      collation: [C1 DESC NULLS LAST]
      est: (rows=1)
    TableScan
        table: PUBLIC.TEST_TBL
        fields: [C1]
        est: (rows=1)

A few words about Exchange. Exchange is a special operator which aligns distributions of neighbour nodes. In the example above, the table is distributed within the zone, while the root node represents the user's cursor and must be assigned to a single node (aka query initiator). To make transition from zone-based to a single-node distribution, an artificial operator Exchange is inserted into the plan. Later, this Exchange will serve as a cutting point when the query plan will be split into fragments. You may think of Exchange as a point of potential (it’s potential just because in case of best-effort colocation it will be send-to-itself, but in general this is a transfer over a network) data transfer between nodes.

More examples:

// Mind the "fetch" attribute of Sort node. It denotes TopN sort algorithm,
// implying that only N nodes will be kept in memory. This also implies
// that only N rows from every node will be transferred over Exchange.  
SELECT * FROM test ORDER BY pk FETCH FIRST ? ROWS ONLY
Limit
    fetch: ?0
    est: (rows=1)
  Exchange
      distribution: single
      est: (rows=1)
    Sort
        collation: [PK ASC]
        fetch: ?0
        est: (rows=1)
      TableScan
          table: PUBLIC.TEST
          fields: [PK, COL0]
          est: (rows=1)

// Similar query, but Limit node wasn't pushed down the exchange. This implies
// that the whole dataset will be transferred over an Exchange.
SELECT * FROM test OFFSET ? ROWS FETCH FIRST ? ROWS ONLY 
Limit
    offset: ?0
    fetch: ?1
    est: (rows=1)
  Exchange
      distribution: single
      est: (rows=1)
    TableScan
        table: PUBLIC.TEST
        fields: [PK, COL0]
        est: (rows=1)

// More complex plan
EXPLAIN PLAN FOR SELECT 
    U.UserName, P.ProductName, R.ReviewText, R.Rating
 FROM Users U, Reviews R, Products P
WHERE U.UserID = R.UserID
  AND R.ProductID = P.ProductID
  AND P.ProductName = 'Product_' || ?::varchar;
Project
    fields: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
    exprs: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
    est: (rows=16650)
  HashJoin
      condition: =(USERID0, USERID)
      joinType: inner
      est: (rows=16650)
    HashJoin
        condition: =(PRODUCTID, PRODUCTID0)
        joinType: inner
        est: (rows=16650)
      Exchange
          distribution: single
          est: (rows=50000)
        TableScan
            table: PUBLIC.REVIEWS
            fields: [PRODUCTID, USERID, REVIEWTEXT, RATING]
            est: (rows=50000)
      Exchange
          distribution: single
          est: (rows=1)665
        TableScan
            table: PUBLIC.PRODUCTS
            filters: =(PRODUCTNAME, ||(_UTF-8'Product_', CAST(?0):VARCHAR CHARACTER SET "UTF-8"))
            fields: [PRODUCTID, PRODUCTNAME]
            est: (rows=1665)
    Exchange
        distribution: single
        est: (rows=10000)
      TableScan
          table: PUBLIC.USERS
          fields: [USERID, USERNAME]
          est: (rows=10000)

A result of EXPLAIN MAPPING command includes additional metadata providing insight at how the query is mapped on cluster topology:

EXPLAIN MAPPING FOR SELECT 
    U.UserName, P.ProductName, R.ReviewText, R.Rating
 FROM Users U, Reviews R, Products P
WHERE U.UserID = R.UserID
  AND R.ProductID = P.ProductID
  AND P.ProductName = 'Product_' || ?::varchar;
Fragment#0 root
  executionNodes: [ijot_n_3344]
  plan: 
Project
    fields: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
    exprs: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
    est: (rows=1)
  HashJoin
      condition: =(USERID0, USERID)
      joinType: inner
      est: (rows=1)
    HashJoin
        condition: =(PRODUCTID, PRODUCTID0)
        joinType: inner
        est: (rows=1)
      Receiver
          rowType: RecordType(INTEGER PRODUCTID, INTEGER USERID, VARCHAR(65536) REVIEWTEXT, INTEGER RATING)
          exchangeId: 1
          sourceFragmentId: 1
          est: (rows=1)
      Receiver
          rowType: RecordType(INTEGER PRODUCTID, VARCHAR(100) PRODUCTNAME)
          exchangeId: 2
          sourceFragmentId: 2
          est: (rows=1)
    Receiver
        rowType: RecordType(INTEGER USERID, VARCHAR(100) USERNAME)
        exchangeId: 3
        sourceFragmentId: 3
        est: (rows=1)
Fragment#1
  targetFragment: 0
  executionNodes: [ijot_n_3344, ijot_n_3345, ijot_n_3346]
  plan: 
Sender
    exchangeId: 1
    targetFragmentId: 0
    distribution: single
    est: (rows=50000)
  TableScan
      table: PUBLIC.REVIEWS
      fields: [PRODUCTID, USERID, REVIEWTEXT, RATING]
      sourceId: 6
      est: (rows=50000)
Fragment#2
  targetFragment: 0
  executionNodes: [ijot_n_3344, ijot_n_3345, ijot_n_3346]
  plan:
Sender
    exchangeId: 2
    targetFragmentId: 0
    distribution: single
    est: (rows=1665)
  TableScan
      table: PUBLIC.PRODUCTS
      filters: =(PRODUCTNAME, ||(_UTF-8'Product_', CAST(?0):VARCHAR CHARACTER SET "UTF-8"))
      fields: [PRODUCTID, PRODUCTNAME]
      sourceId: 5
      est: (rows=1665)
Fragment#3
  targetFragment: 0
  executionNodes: [ijot_n_3344, ijot_n_3345, ijot_n_3346]
  plan:
Sender
    exchangeId: 3
    targetFragmentId: 0
    distribution: single
    est: (rows=10000)
  TableScan
      table: PUBLIC.USERS
      fields: [USERID, USERNAME]
      sourceId: 4
      est: (rows=10000)

Tickets

IGNITE-25273 - Getting issue details... STATUS

  • No labels