IndexDev
Skip to end of metadata
Go to start of metadata

Indexes

Introduction

This document explains the proposed design for adding index support to Hive (HIVE-417). Indexing is a standard database technique, but with many possible variations. Rather than trying to provide a "one-size-fits-all" index implementation, the approach we are taking is to define indexing in a pluggable manner (related to StorageHandlers) and provide one concrete indexing implementation as a reference, leaving it open for contributors to plug in other indexing schemes as time goes by. No index support will be available until Hive 0.7.

Scope

Only single-table indexes are supported. Others (such as join indexes) may be more appropriately expressed as materialized views once Hive has support for those.

This document currently only covers index creation and maintenance. A follow-on will explain how indexes are used to optimize queries (building on FilterPushdownDev).

CREATE INDEX

For the details of the various clauses such as ROW FORMAT, see Create Table.

By default, index partitioning matches the partitioning of the base table. The PARTITIONED BY clause may be used to specify a subset of the table's partitioning columns (this column list may be empty to indicate that the index spans all partitions of the table). For example, a table may be partitioned by date+region even though the index is partitioned by date alone (each index partition spanning all regions).

Indexes cannot be created on views. We will (eventually) support them on non-native tables (in cases where the corresponding storage handler indicates that it supports them).

Index handlers may require that the base table being indexed have a particular format.

Question: should we allow indexes on EXTERNAL tables? What does this mean for implicit DROP when the table is dropped? Is there a concept of an EXTERNAL index?

If the index handler stores its representation in tabular form, then index_table_name can be used to control the name of the "index table" automatically created for this purpose. The index table storage format can be controlled using STORED AS (e.g. RCFILE or SEQUENCFILE) or STORED BY (e.g. to store the index table in a non-native table such as HBase), although some index handlers may require usage of a specific storage format. Not all index handlers store their representation in tabular form; some may use non-table files, and others may use structures maintained completely outside of Hive (e.g. a persistent key/value store).

Metastore Model

The diagram below shows the new metastore schema with index support:

http://issues.apache.org/jira/secure/attachment/12449601/idx2.png

The new IDXS table in the metastore schema contains one entry per index created. It has two relationships with the TBLS table:

  • ORIG_TBL_ID is a mandatory foreign key referencing the ID of the base table containing the data to be indexed.
  • IDX_TBL_ID is an optional foreign key referencing the ID of a table containing the index representation. It is optional because not all index implementations use a table for storage. For indexes which do use a table for storage, the implicitly created table will have its TBL_TYPE set to INDEX_TABLE.

So, given the following DDL:

The TBLS table in the metastore will have two entries:

  • one for base table t
  • one for the index table, automatically named as default+t_x+

In the IDXS entry for x, ORIG_TBL_ID will reference the TBL_ID of x, and IDX_TBL_ID will reference the TBL_ID of default+t_x+.

To avoid the generated name, a user-specified name such as t_x can be supplied instead:

Note that index names are qualified by the containing base table (like partitions), so the same index name can be used across two different tables. However, names of index tables are in the same namespace as all other tables and views, so they must be unique within the same database.

An index has a storage descriptor which includes the subset of columns from the original table covered by the index. If the index representation is stored in a table, most of the other fields in the index's own storage descriptor (e.g. LOCATION) will be irrelevant.

TBD:

  • change IDX_TYPE to IDX_HANDLER
  • what does LAST_ACCESS_TIME mean? last time the optimizer used this index?
  • need LAST_REBUILD_TIME? how do we track it at partition-level? it should be in the metastore (not just HDFS)
  • in the case where the index partitioning is a subset of the base table partitioning, we need a way to model this in the metastore

    Metastore Upgrades

Here are the MySQL metastore upgrade statements.

REBUILD

For the PARTITION clause syntax, see LanguageManual DDL#Add_Partitions.

If WITH DEFERRED REBUILD is specified on CREATE INDEX, then the newly created index is initially empty (regardless of whether the table contains any data). The ALTER INDEX ... REBUILD command can be used to build the index structure for all partitions or a single partition.

If data in the base table changes, then the REBUILD command must be used to bring the index up to date. This is an atomic operation, so if the table was previously indexed, and a rebuild fails, then the stale index remains untouched.

DROP INDEX

An index can be dropped at any time with DROP INDEX. This will also cascade to the index table (if one exists).

Attempting to drop an index table directly with DROP TABLE will fail.

When an indexed base table is dropped, the DROP implicitly cascades to all indexes (and their corresponding index tables if any).

When an indexed base table has one of its partitions dropped, this implicitly cascades to drop corresponding partitions from all indexes.

Question: what do we do if the index partitioning granularity is not the same as the table partitioning granularity? Probably just ignore the drop, and let the user clean up manually with a new ALTER INDEX DROP PARTITION statement.

Plugin Interface

An index handler has these main responsibilities:

  • During CREATE INDEX, validating the format of the base table and then generating the structure of the index table (if any) and filling any additional information into the index's storage descriptor
  • During REBUILD, producing a plan for reading the base table's data and writing to the index storage and/or index table
  • During DROP, deleting any index-specific storage (index tables are dropped automatically by Hive)
  • During queries, participating in optimization in order to convert operators such as filters into index access plans (this part is out of scope for the moment)

The corresponding Java inerface is defined below, together with a companion abstract base class which handlers should extend.

For CREATE INDEX, Hive first calls usesIndexTable() on the handler to determine whether an index table will be created. If this returns false, the statement fails immediately if the user specified any table storage options for the index. However, if usesIndexTable() returns true, then Hive creates a partial table definition for the index table based on the index definition (such as the covered columns) combined with any table storage options supplied by the user. Next, Hive calls analyzeIndexDefinition (passing either null or the partial index table definition for the indexTable parameter). The handler responds by validating the definitions (throwing an exception if any unsupported combination is detected) and then filling in additional information on the index and indexTable parameters as output. Hive then stores these results in the metastore.

TBD: we will be adding methods for calling the handler when an index is dropped (e.g. to give a cleanup opportunity to a handler which stores the index representation in an external system such as HBase)

Reference Implementation

The reference implementation creates what is referred to as a "compact" index. This means that rather than storing the HDFS location of each occurrence of a particular value, it only stores the addresses of HDFS blocks containing that value. This is optimized for point-lookups in the case where a value typically occurs more than once in nearby rows; the index size is kept small since there are many fewer blocks than rows. The tradeoff is that extra work is required during queries in order to filter out the other rows from the indexed blocks.

The compact index is stored in an index table. The index table columns consist of the indexed columns from the base table followed by a _bucketname string column (indicating the name of the file containing the indexed block) followed by an _offsets array<string> column (indicating the block offsets within the corresponding file). The index table is stored as sorted on the indexed columns (but not on the generated columns).

The reference implementation can be plugged in with

TBD: algorithm for building the index

TBD: mechanism for searching the index

TBD: validation on base table (can be any managed table?)

TBD: validation on index table format (can be any managed table format?)

TBD

  • specs for SHOW/DESCRIBE INDEX (HIVE-1497)
  • ALTER INDEX DROP PARTITION?
  • ALTER INDEX SET IDXPROPERTIES, change tableformat, etc
  • what happens when the structure of a table or partition changes after it has already been indexed
  • automatic indexing as part of INSERT when WITH DEFERRED REBUILD is not specified
  • prevent creation of an index on an index table?
  • metastore upgrade script
  • stats collection for index tables
  • intersection with new Hive concurrency control feature (what locks do we take for various index operations?)

Current Status (JIRA)

Type Key Summary Assignee Reporter Priority Status Resolution Created Updated Due
Bug HIVE-6921 index creation fails with sql std auth turned on Ashutosh Chauhan Ashutosh Chauhan Major Patch Available Unresolved Apr 16, 2014 Apr 17, 2014
Bug HIVE-5902 Cannot create INDEX on TABLE in HIVE 0.12 Unassigned Juraj Volentier Major Open Unresolved Nov 27, 2013 Nov 27, 2013
Bug HIVE-5664 Drop cascade database fails when the db has any tables with indexes Venki Korukanti Venki Korukanti Major Patch Available Unresolved Oct 28, 2013 Oct 31, 2013
Improvement HIVE-4368 Upgrade JavaEWAH dependency to version 0.6.11 Unassigned Daniel Lemire Minor Open Unresolved Apr 16, 2013 Apr 16, 2013
Bug HIVE-3563 Drop database cascade fails when there are indexes on any tables Prasad Mujumdar Prasad Mujumdar Major Closed Fixed Oct 10, 2012 Oct 28, 2013
Bug HIVE-3434 Batch creation of indexes for tables with a very large number of partitions Esteban Gutierrez Esteban Gutierrez Major Open Unresolved Sep 05, 2012 Sep 05, 2012
Bug HIVE-2895 Hive cli silently fails to create index Unassigned Mark Schnegelberger Major Open Unresolved Mar 22, 2012 Apr 30, 2012
New Feature HIVE-2845 Add support for index joins in Hive Unassigned Namit Jain Major Open Unresolved Mar 07, 2012 Aug 19, 2012
Bug HIVE-2752 Index names are case sensitive Navis Philip Tromans Minor Resolved Fixed Jan 26, 2012 Mar 29, 2014
Improvement HIVE-2611 Make index table output of create index command if index is table based Kevin Wilfong Kevin Wilfong Major Closed Fixed Nov 28, 2011 Apr 30, 2012
Improvement HIVE-2535 Use sorted nature of compact indexes Kevin Wilfong Kevin Wilfong Major Closed Fixed Oct 28, 2011 Jun 27, 2012
Improvement HIVE-2458 Group-by query optimization Followup: add flag in conf/hive-default.xml Prajakta Kalmegh Prajakta Kalmegh Major Closed Fixed Sep 20, 2011 Dec 16, 2011
Improvement HIVE-2448 Upgrade JavaEWAH to 0.3 John Sichi John Sichi Major Closed Fixed Sep 15, 2011 Dec 16, 2011
Improvement HIVE-2354 Support automatic rebuilding of indexes when they go stale Syed S. Albiz Syed S. Albiz Major Closed Fixed Aug 06, 2011 Dec 16, 2011
Bug HIVE-2335 Indexes are still automatically queried when out of sync with their source tables Syed S. Albiz Syed S. Albiz Major Closed Fixed Aug 02, 2011 Dec 16, 2011
Bug HIVE-2289 NumberFormatException with respect to _offsets when running a query with index Unassigned siddharth ramanan Major Resolved Invalid Jul 18, 2011 Jul 20, 2011
Bug HIVE-2138 Exception when no splits returned from index Syed S. Albiz Russell Melick Major Closed Fixed May 01, 2011 Dec 16, 2011
Improvement HIVE-2130 Cost based choice for rewrite during Automatic Indexing Unassigned Russell Melick Major Open Unresolved Apr 23, 2011 May 02, 2013
Improvement HIVE-2129 Display indexing information for TableScanOperator in plan Unassigned Russell Melick Major Open Unresolved Apr 23, 2011 May 02, 2013
Improvement HIVE-2128 Automatic Indexing with multiple tables Syed S. Albiz Russell Melick Major Closed Fixed Apr 23, 2011 May 02, 2013
20 more issues

Labels
  • No labels