Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: edits, links to jiras and DEFAULT Keyword doc

...

This document proposes the addition of DEFAULT clause to Hive. DEFAULT clause is a domain constraint which lets user specify a value for domain i.e. column to be used in absence of user specified value i.e. in absence of column reference. Note that this doesn’t does not propose to implement DEFAULT ON NULL like ORACLE which lets user specify DEFAULT value for explicit NULLs.

...

Hive currently let users declare the following constraints:

  • PRIMARY KEY

  • FOREIGN KEY

  • UNIQUE

  • NOT NULL

DEFAULT will be a fifth addition to this list. Note that unlike existing constraints DEFAULT will only support ENABLE/ENFORCED or DISABLE keyword. VALIDATE/NOVALIDATE and RELY/NORELY will not be supported since ENABLING DEFAULT will not change existing data and the optimizer will not make use of the RELY keyword.

Proposed Changes

...

  • LITERAL 

  • DATE TIME VALUE FUNCTION, that is, CURRENT_TIME, CURRENT_DATE 

  • CURRENT_USER() 
  • NULL 
  • CAST (<expression in above category> as PRIMITIVE TYPE) 

INSERT

Anytime user doesn’t specify a value explicitly for a column, its default value will be used if defined e. g For example:

INSERT INTO <tableName>(co1, col3) values(<val1> , <val2>)

Above statement doesn’t specify a value for col2 so system will use the default value for col2 if it is defined.

On the other hand if user specify specifies an explicit value including NULL e.g., for example:

INSERT INTO <tableName>(col1, col2, col3) values (<val1>, <val2>, <val3>)

Default then the default value will not be used.

Above are also valid for all following type of DMLs:

  • INSERT INTO

  • INSERT SELECT

  • MERGE

PARTITION COLUMNS

Since in Hive a query can not be written without referencing partition columns, there could not be a situation where the value for a partition column isn’t explicit. Therefore having DEFAULT for partition columns will not make sense and we propose to not add it.

EXTERNAL TABLE

We plan to disallow DEFAULT for external table since the data isn’t managed by Hive.

ACID/MM TABLE

DEFAULT constraint will be allowed and behavior will be same as non-acid tables.

...

We propose to add column DEFAULT_VALUE to KEY_CONSTRAINTS table in metastore schema to store DEFAULT VALUE.

OTHER DDL

Following The following DDLs will be updated to accommodate DEFAULT:

  • ALTER TABLE CHANGE COLUMN

  • ALTER TABLE DROP CONSTRAINT

...

Restrictions

  • Hive will have strict type rules for defining the default value, i.e., default value will have to be the exact same type as column type.
  • Default Defaults with Complex data types i.e (Array, Struct, Map and Union) are not allowed.

Proposed Design

Currently if an INSERT query is missing a value in INSERT, i.e. if user hasn’t specified a value for a column, Hive uses ‘NULL’ as default. The Hive compiler detects this missing column value at compile time and inserts a NULL. We propose to update this logic to check for default and use that instead of NULL value.

Along with this logic change we foresee the following changes:

  • Metastore code will need to be updated to support the DEFAULT constraint.

    • We propose to store/serialize the default value as string after it is evaluated and constant folded.

    • DEFAULT_VALUE will be added to KEY_CONSTRAINTS table in metastore schema.

  • Hive Parser will need to be updated to allow new DEFAULT keyword with default value.

  • Error handling/Validation logic needs to be added to make sure DEFAULT value conforms to allowed categories during CREATE TABLE.

  • Type check to make sure DEFAULT VALUE type is compatible with corresponding column type.

Further Work

HIVE-19059 adds the keyword DEFAULT to enable users to add DEFAULT values in INSERT and UPDATE statements without specifying the column schema.  See DEFAULT Keyword (HIVE-19059).