This document proposes the addition of DEFAULT clause to Hive. DEFAULT clause is a domain constraint which lets user specify a value for domain i.e. column to be used in absence of user specified value i.e. in absence of column reference. Note that this does not propose to implement DEFAULT ON NULL like ORACLE which lets user specify DEFAULT value for explicit NULLs.
Hive currently let users declare the following constraints:
DEFAULT will be a fifth addition to this list. Note that unlike existing constraints DEFAULT will only support ENABLE/ENFORCED or DISABLE keyword. VALIDATE/NOVALIDATE and RELY/NORELY will not be supported since ENABLING DEFAULT will not change existing data and the optimizer will not make use of the RELY keyword.
CREATE TABLE will be updated to let user specify DEFAULT as follows:
With column definition
CREATE TABLE <tableName> (<columnName> <dataType> DEFAULT <defaultValue>)
With constraint specification CREATE TABLE <tableName> (<columnName> <dataType>, …, CONSTRAINT <constraintName> DEFAULT <defaultValue> (<columnName>)
To be compliant with SQL standards, Hive will only permit default values which fall in one of the following categories:
DATE TIME VALUE FUNCTION, that is, CURRENT_TIME, CURRENT_DATE
- CAST (<expression in above category> as PRIMITIVE TYPE)
Anytime user doesn’t specify a value explicitly for a column, its default value will be used if defined. For example:
INSERT INTO <tableName>(co1, col3) values(<val1> , <val2>)
Above statement doesn’t specify a value for col2 so system will use the default value for col2 if it is defined.
On the other hand if user specifies an explicit value including NULL, for example:
INSERT INTO <tableName>(col1, col2, col3) values (<val1>, <val2>, <val3>)
then the default value will not be used.
Above are also valid for all following type of DMLs:
Since in Hive a query can not be written without referencing partition columns, there could not be a situation where the value for a partition column isn’t explicit. Therefore having DEFAULT for partition columns will not make sense and we propose to not add it.
We plan to disallow DEFAULT for external table since the data isn’t managed by Hive.
DEFAULT constraint will be allowed and behavior will be same as non-acid tables.
We propose to add column DEFAULT_VALUE to KEY_CONSTRAINTS table in metastore schema to store DEFAULT VALUE.
The following DDLs will be updated to accommodate DEFAULT:
ALTER TABLE CHANGE COLUMN
ALTER TABLE DROP CONSTRAINT
- Hive will have strict type rules for defining the default value, i.e., default value will have to be the exact same type as column type.
- Defaults with Complex data types (Array, Struct, Map and Union) are not allowed.
Currently if an INSERT query is missing a value in INSERT, i.e. if user hasn’t specified a value for a column, Hive uses ‘NULL’ as default. The Hive compiler detects this missing column value at compile time and inserts a NULL. We propose to update this logic to check for default and use that instead of NULL value.
Along with this logic change we foresee the following changes:
Metastore code will need to be updated to support the DEFAULT constraint.
We propose to store/serialize the default value as string after it is evaluated and constant folded.
DEFAULT_VALUE will be added to KEY_CONSTRAINTS table in metastore schema.
Hive Parser will need to be updated to allow new DEFAULT keyword with default value.
Error handling/Validation logic needs to be added to make sure DEFAULT value conforms to allowed categories during CREATE TABLE.
Type check to make sure DEFAULT VALUE type is compatible with corresponding column type.