This Confluence has been LDAP enabled, if you are an ASF Committer, please use your LDAP Credentials to login. Any problems file an INFRA jira ticket please.

Child pages
  • IndexDev

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: fix a few links

...

This document explains the proposed design for adding index support to Hive (http://issues.apache.org/jira/browse/HIVE-417). Indexing is a standard database technique, but with many possible variations. Rather than trying to provide a "one-size-fits-all" index implementation, the approach we are taking is to define indexing in a pluggable manner (related to Hive- StorageHandlers) and provide one concrete indexing implementation as a reference, leaving it open for contributors to plug in other indexing schemes as time goes by. No index support will be available until Hive 0.7.

...

This document currently only covers index creation and maintenance. A follow-on will explain how indexes are used to optimize queries (building on Hive- FilterPushdownDev).

CREATE INDEX

Code Block
CREATE INDEX index_name 
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[
   [ ROW FORMAT ...] STORED AS ...
   | STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
[COMMENT "index comment"]

For the details of the various clauses such as ROW FORMAT, see Hive-LanguageManual-DDL LanguageManual DDL#Create_Table.

By default, index partitioning matches the partitioning of the base table. The PARTITIONED BY clause may be used to specify a subset of the table's partitioning columns (this column list may be empty to indicate that the index spans all partitions of the table). For example, a table may be partitioned by date+region even though the index is partitioned by date alone (each index partition spanning all regions).

...

For the PARTITION clause syntax, see Hive-LanguageManual-DDL LanguageManual DDL#Add_Partitions.

If WITH DEFERRED REBUILD is specified on CREATE INDEX, then the newly created index is initially empty (regardless of whether the table contains any data). The ALTER INDEX ... REBUILD command can be used to build the index structure for all partitions or a single partition.

...

  • specs for SHOW/DESCRIBE INDEX (HIVE-1497 (http://issues.apache.org/jira/browse/HIVE-1497))
  • ALTER INDEX DROP PARTITION?
  • ALTER INDEX SET IDXPROPERTIES, change tableformat, etc
  • what happens when the structure of a table or partition changes after it has already been indexed
  • automatic indexing as part of INSERT when WITH DEFERRED REBUILD is not specified
  • prevent creation of an index on an index table?
  • metastore upgrade script
  • stats collection for index tables
  • intersection with new Hive concurrency control feature (what locks do we take for various index operations?)

...