Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The syntax to create a materialized view in Hive is very similar to the CTAS statement syntax, supporting common features such as partition columns, custom storage handler, or passing table properties.

Code Block
sql
sql
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_view_name
  [DISABLE REWRITE]
  [COMMENT materialized_view_comment]
  [
PARTITIONED ON   [ROW FORMAT row_format]
  (col_name, ...)]
  [STOREDCLUSTERED ASON file_format]
      | STORED BY 'storage.(col_name, ...) | DISTRIBUTED ON (col_name, ...) SORTED ON (col_name, ...)]
  [
    [ROW FORMAT row_format]
    [STORED AS file_format]
      | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
AS
<query>;

...

The rewriting algorithm can be enabled and disabled globally using the hive.materializedview.rewriting configuration property and hive.materializedview.rewriting.sql configuration properties (default value is is true). In addition, users can selectively enable/disable materialized views for rewriting. Recall that, by default, materialized views are enabled for rewriting at creation time. To alter that behavior, the following statement can be used:

Code Block
sql
sql
ALTER MATERIALIZED VIEW [db_name.]materialized_view_name ENABLE|DISABLE REWRITE;

...

Hive supports two types of rewriting algorithms:

  • Algebraic: this is part of Apache Calcite and it supports queries

...

  • containing TableScan, Project, Filter, Join, and Aggregate operators. More information about

...

  • this rewriting coverage can be

...

  • found here. In the following, we include a few examples that briefly illustrate different rewritings.

Example 1

Consider the database schema created by the following DDL statements:

...

Code Block
sql
sql
SELECT floor(time to month),
    SUM(c_added)
FROM mv3
GROUP BY floor(time to month);

Materialized view maintenance

  • SQL text: The materialized view definition query text is compared to the incoming query text or it's subquery text. It supports all kind of operators and aggregate functions.

Materialized view maintenance

When data in the source tables used by a materialized view changes, e.g., new data is inserted or When data in the sources tables used by a materialized view changes, e.g., new data is inserted or existing data is modified, we will need to refresh the contents of the materialized view to keep it up-to-date with those changes. Currently, the rebuild operation for a materialized view needs to be triggered by the user. In particular, the user should execute the following statement:

...

Hive supports incremental view maintenance, i.e., only refresh data that was affected by the changes in the original source tables. Incremental view maintenance will decrease the rebuild step execution time. In addition, it will preserve LLAP cache for existing data in the materialized view. To execute incremental view maintenance, the materialized view should only use transactional tables, either micromanaged or ACID.

By default, Hive will attempt to rebuild a materialized view By default, Hive will attempt to rebuild a materialized view incrementally, falling back to full rebuild if it is not possible.

Current  Current implementation only supports incremental rebuild when there were were INSERT operations over the source tables, while UPDATE and DELETE operations will force a full rebuild of the materialized view.

To execute incremental maintenance, following conditions should be met if there were only INSERT operations over the source tables:

  • The materialized view should only use transactional tables (the source tables must be transactional), either micromanaged or ACID or a storage format that supports snapshots (ex. Iceberg)
  • If the materialized view definition contains a Group By clause, the materialized view should be stored in an ACID table or a storage format that supports snapshots (ex. Iceberg v2), since it needs to support MERGE operation. For materialized view definitions consisting of Scan-Project-Filter-Join, this restriction does not exist.
  • If the materialized view definition contains a Group By clause the following aggregate functions are supported: COUNT, SUM, AVG (only if both COUNT and SUM defined for the same column), MIN, MAX

If there were UPDATE and DELETE operations over the source tables:

  • The materialized view should only use transactional tables (the source tables must be transactional), either micromanaged or ACID
  • The materialized view definition must contain a Group By clause and a COUNT(*) function call.
  • The materialized view should be stored in an ACID table or a storage format that supports snapshots (ex. Iceberg v2), since it needs to support MERGE operation.   
  • The following aggregate functions are supported: COUNT, SUM, AVG (only if both COUNT and SUM defined for the same column)

A rebuild operation acquires an exclusive write lock over the materialized view, i.e., for a given materialized view, only one rebuild operation can be executed at a given time.

Materialized view lifecycle

...