DUE TO SPAM, SIGN-UP IS DISABLED. Goto Selfserve wiki signup and request an account.
DUE TO SPAM, SIGN-UP IS DISABLED. Goto Selfserve wiki signup and request an account.
| ID | IEP-137 |
| Author | |
| Sponsor | |
| Created |
|
| Status | DRAFT |
The optimizer plays a pivotal role in determining the most efficient execution plan for a given SQL query. However, without visibility into how queries are executed, diagnosing performance issues becomes a trial-and-error process, often leading to suboptimal tuning and confusion.
The EXPLAIN command addresses this gap by providing transparency into the query execution plan chosen by the optimizer.
EXPLAIN must provide a clear answer as to:
JOIN algorithm, HASH vs STREAMING aggregates, etc).The syntax for EXPLAIN command is as follow:
EXPLAIN [ ( PLAN | MAPPING ) [ L_BRACKET option1 [, ...] R_BRACKET ] FOR ] <query_or_dml>
If neither PLAN nor MAPPING is specified, then PLAN is implicit. Additional options may be specified inside brackets. Currently this part should be omitted since at the moment we have no options, but later it will be used to specify format or verbosity:
// Example below shows possibility of extension of // current syntax with different options EXPLAIN PLAN (FORMAT JSON, VERBOSITY LEVEL 10) FOR SELECT * FROM my_table
EXPLAIN PLAN command returns a query plan without mapping to a particular topology. That is, it includes a query plan in a tree-like format without splitting on fragments. This representation is suitable to investigate performance issues related to the optimizer, like choice of particular algorithm (requirements 1, 2, and partially 3).
EXPLAIN MAPPING command is suitable for deep understanding of which part of the query is executed on which node. This representation is suitable to investigate issues related to *best-effort colocation.
Best-effort colocation is a colocation of execution time. That is, for tables belonging to different distribution zones sql engine may peek up replicas which reside on the same node to minimise the amount of data transferred between nodes.
Only text format will be supported from the beginning. More formats (like JSON or DOT) may be supported later on demand.
Every relational operator is described with a name and set of attributes:
NameOfTheRelationalOperator attribute1: value1 attribute2: value2
Let’s take a look at few examples:
TableScan // Full table access table: PUBLIC.EMP1 // Name of the table in question fields: [NAME, SALARY] // List of columns to return est: (rows=1) // Estimated number of rows returned IndexScan // Index scan table: PUBLIC.TEST_TBL // Name of the table in question index: IDX_DESC // Name of the index in question type: SORTED // Type of the index fields: [C1] // List of columns to return collation: [C1 DESC] // Collation of the index aka order of sorting est: (rows=1) // Estimated number of rows returned Sort collation: [C1 DESC NULLS LAST] // Collation to sort input rows est: (rows=1) // Estimated number of rows returned
Name represents a particular algorithm used to execute relation operators (TableScan vs IndexScan, HashJoin vs MergeJoin vs NestedLoopJoin, etc). The set of attributes depends on the particular relational operator.
A query plan is represented by a tree-like structure which is the composition of nodes described above. This tree describes a data flow, where rows are passed from leaves to a plan root node (root node is the topmost node, it also has no indentation). Let's take a look at few examples:
// simple ordered select where desired order matches collation of existing index EXPLAIN PLAN FOR SELECT c1 FROM test_tbl ORDER BY c1 DESC NULLS FIRST Exchange distribution: single est: (rows=1) IndexScan table: PUBLIC.TEST_TBL index: IDX_DESC type: SORTED fields: [C1] collation: [C1 DESC] est: (rows=1) // similar query, but desired order doesn't match index collation. Mind the additional // SORT node. EXPLAIN PLAN FOR SELECT c1 FROM test_tbl ORDER BY c1 DESC NULLS LAST Exchange distribution: single est: (rows=1) Sort collation: [C1 DESC NULLS LAST] est: (rows=1) TableScan table: PUBLIC.TEST_TBL fields: [C1] est: (rows=1)
A few words about Exchange. Exchange is a special operator which aligns distributions of neighbour nodes. In the example above, the table is distributed within the zone, while the root node represents the user's cursor and must be assigned to a single node (aka query initiator). To make transition from zone-based to a single-node distribution, an artificial operator Exchange is inserted into the plan. Later, this Exchange will serve as a cutting point when the query plan will be split into fragments. You may think of Exchange as a point of potential (it’s potential just because in case of best-effort colocation it will be send-to-itself, but in general this is a transfer over a network) data transfer between nodes.
More examples:
// Mind the "fetch" attribute of Sort node. It denotes TopN sort algorithm, // implying that only N nodes will be kept in memory. This also implies // that only N rows from every node will be transferred over Exchange. SELECT * FROM test ORDER BY pk FETCH FIRST ? ROWS ONLY Limit fetch: ?0 est: (rows=1) Exchange distribution: single est: (rows=1) Sort collation: [PK ASC] fetch: ?0 est: (rows=1) TableScan table: PUBLIC.TEST fields: [PK, COL0] est: (rows=1) // Similar query, but Limit node wasn't pushed down the exchange. This implies // that the whole dataset will be transferred over an Exchange. SELECT * FROM test OFFSET ? ROWS FETCH FIRST ? ROWS ONLY Limit offset: ?0 fetch: ?1 est: (rows=1) Exchange distribution: single est: (rows=1) TableScan table: PUBLIC.TEST fields: [PK, COL0] est: (rows=1) // More complex plan EXPLAIN PLAN FOR SELECT U.UserName, P.ProductName, R.ReviewText, R.Rating FROM Users U, Reviews R, Products P WHERE U.UserID = R.UserID AND R.ProductID = P.ProductID AND P.ProductName = 'Product_' || ?::varchar; Project fields: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING] exprs: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING] est: (rows=16650) HashJoin condition: =(USERID0, USERID) joinType: inner est: (rows=16650) HashJoin condition: =(PRODUCTID, PRODUCTID0) joinType: inner est: (rows=16650) Exchange distribution: single est: (rows=50000) TableScan table: PUBLIC.REVIEWS fields: [PRODUCTID, USERID, REVIEWTEXT, RATING] est: (rows=50000) Exchange distribution: single est: (rows=1)665 TableScan table: PUBLIC.PRODUCTS filters: =(PRODUCTNAME, ||(_UTF-8'Product_', CAST(?0):VARCHAR CHARACTER SET "UTF-8")) fields: [PRODUCTID, PRODUCTNAME] est: (rows=1665) Exchange distribution: single est: (rows=10000) TableScan table: PUBLIC.USERS fields: [USERID, USERNAME] est: (rows=10000)
A result of EXPLAIN MAPPING command includes additional metadata providing insight at how the query is mapped on cluster topology:
EXPLAIN MAPPING FOR SELECT U.UserName, P.ProductName, R.ReviewText, R.Rating FROM Users U, Reviews R, Products P WHERE U.UserID = R.UserID AND R.ProductID = P.ProductID AND P.ProductName = 'Product_' || ?::varchar; Fragment#0 root executionNodes: [ijot_n_3344] plan: Project fields: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING] exprs: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING] est: (rows=1) HashJoin condition: =(USERID0, USERID) joinType: inner est: (rows=1) HashJoin condition: =(PRODUCTID, PRODUCTID0) joinType: inner est: (rows=1) Receiver rowType: RecordType(INTEGER PRODUCTID, INTEGER USERID, VARCHAR(65536) REVIEWTEXT, INTEGER RATING) exchangeId: 1 sourceFragmentId: 1 est: (rows=1) Receiver rowType: RecordType(INTEGER PRODUCTID, VARCHAR(100) PRODUCTNAME) exchangeId: 2 sourceFragmentId: 2 est: (rows=1) Receiver rowType: RecordType(INTEGER USERID, VARCHAR(100) USERNAME) exchangeId: 3 sourceFragmentId: 3 est: (rows=1) Fragment#1 targetFragment: 0 executionNodes: [ijot_n_3344, ijot_n_3345, ijot_n_3346] plan: Sender exchangeId: 1 targetFragmentId: 0 distribution: single est: (rows=50000) TableScan table: PUBLIC.REVIEWS fields: [PRODUCTID, USERID, REVIEWTEXT, RATING] sourceId: 6 est: (rows=50000) Fragment#2 targetFragment: 0 executionNodes: [ijot_n_3344, ijot_n_3345, ijot_n_3346] plan: Sender exchangeId: 2 targetFragmentId: 0 distribution: single est: (rows=1665) TableScan table: PUBLIC.PRODUCTS filters: =(PRODUCTNAME, ||(_UTF-8'Product_', CAST(?0):VARCHAR CHARACTER SET "UTF-8")) fields: [PRODUCTID, PRODUCTNAME] sourceId: 5 est: (rows=1665) Fragment#3 targetFragment: 0 executionNodes: [ijot_n_3344, ijot_n_3345, ijot_n_3346] plan: Sender exchangeId: 3 targetFragmentId: 0 distribution: single est: (rows=10000) TableScan table: PUBLIC.USERS fields: [USERID, USERNAME] sourceId: 4 est: (rows=10000)
IGNITE-25273 - Getting issue details... STATUS