In the Derby "cost-based optimization" code you will find that the code often manipulates information about tables by using "table numbers". Table numbers are integers which stand in as surrogates for particular tables during the optimization process.

  • How are Table Numbers assigned? How do views and synonyms affect this process?

Note that by the time we start optimization, constructs like views and synonyms have already been transformed and replaced by their underlying "real" tables. Transformations and table resolution occur during the "binding" and "preprocessing" stages of query compilation--and both of those stages occur before cost-based optimization begins. So at this point a view will be represented by a ProjectRestrictNode whose child is a SelectNode, and a synonym will be represented by whatever FromTable it (the synonym) is actually referring to.

Table numbers are also assigned during binding, so by the time we get to the code involved with costing and plan selection, all FromTables (aka "Optimizables") in the entire query will have an assigned table number (if required--in some cases it's not necessary and thus will be -1). Additionally any column reference which points to one of those FromTables will have the table number for that FromTable stored locally (namely, in ColumnReference.tableNumber).

Note that when a ColumnReference is "remapped" to point to a different FromTable, its local information-including tableNumberis updated accordingly. Note also that a "FromTable" is not required to be a base table-rather, anything that can be specified in the FROM list of a SELECT query will be represented by some instance of FromTable, whether it be a subquery, a base table, a union node, etc. Every FromTable has its own "table number", with the exception of ProjectRestrictNodes. For a PRN, if the PRN's child is itself a FromTable (as opposed to, say, a SelectNode) then the PRN's table number will be -1 and any attempts to "get" the PRN's table number will return the table number of the PRN's child. If the PRN's child is not a FromTable, then the PRN will have it's own table number.

  • If the optimizer is choosing to access an index for a table, rather than accessing the table itself, does the table number change depending on whether it is an index or a base table which is being processed by the ProjectRestrictNode?

The thing to note here is that "table number" is strictly a language-created, compilation time value to allow binding, preprocessing, optimization, and code generation to distinguish between the various FromTables in the original query. A table number is not stored on disk and it is independent of the access path decisions (including whether or not an index is used) made by the optimizer. Furthermore, there is no link between a given table number and the actual on-disk table that it points to. Table number 0 could be for T1 in one query, T2 in another query, and T100 in a third query.

As a simple (but admittedly meaningless) example, take the following query:

select t1.i, x1.j from t1, t1 x1 where t1.i = x1.j;

At bind time Derby will assign every item in the FROM list a table number. So in this case, "T1" gets table number 0 and "T1 X1" gets table number 1. The fact that both FromTables are really pointing to the same base table doesn't matter. For the duration of compilation/optimization, they are represented by two different instances of FromTable and are considered two different "tables", each having its own table number. (For the record, in this particular example the different FromTables will in fact point to the same underlying tableDescriptor field).

Given that, the predicate t1.i = x1.j will have a left ColumnReference pointing to a FromBaseTable representing T1 with table number "0" and a right ColumnReference pointing to a different FromBaseTable representing X1 (i.e. T1 again) with table number "1".

If the optimizer then decides to use an index for T1, the table number doesn't change--the optimizer just decides that for "the FromBaseTable whose table number is 0 we will use an index". In fact, once assigned, the table number for a specific FromTable remains the same for the duration of the compilation of the statement.

  • Under what circumstances will a table number be -1?

The only time a FromTable's table number can be -1 during optimization is if it is in fact a ProjectRestrictNode with a non-FromTable subquery as its child.

There are a number of classes that extend FromTable. For many of these classes, table numbers are always set during binding (and thus will not be -1 when it comes time to optimize).

In some cases, classes which extend FromTable are only instantiated AFTER optimization has completed, and thus even though their table numbers can be -1, that won't affect optimization. Two classes for which this is true are IndexToBaseRowNode and SingleChildResultSetNode.

There are two subclasses of FromTable that can be instantiated during preprocessing but that may not have their table numbers set (these are the ones of interest to the current discussion):

  • ProjectRestrictNode: If created during preprocessing, table number will only be set in FromSubquery.extractSubquery(); in all other cases it will remain -1.
  • NormalizeResultSetNode: Instantiated during bind phase for InsertNode and UpdateNode but table number is not set. So it will be -1. However, a NormalizeResultSetNode never appears in an optimizer's optimizableList, and thus such a node will never actually be optimized.

For details on how table numbers are used in the various phases of Derby optimization, interested readers may want to read the ReferencedTableMaps and/or PredicatePushdown pages of this wiki.

  • No labels