Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

This FLIP is another step towards to making tables and materialized tables more consistent. There was already one improvement in that direction like FLIP-542[1] to add DISTRIBUTION and SHOW MATERIALIZED TABLES support. However there were several more things noticed comparing behavior for CREATE and ALTER operations. For instance right now for materialized tables it is impossible to set anything but table constraint while for tables (CREATE TABLE AS) it is possible to provide schema definition since FLIP-463[2], also ALTER operations for TABLE is a way more mature than for MATERIALIZED TABLE. This FLIP is about to decrease the difference by enabling more similar features for materialized tables.

Introducing schema definition support for materialized tables will provide users with greater control and flexibility and also will unify usage of tables and materialized tables.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-542%3A+Make+materialized+table+DDL+consistent+with+regular+tables

[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-463%3A+Schema+Definition+in+CREATE+TABLE+AS+Statement

Public Interfaces

CREATE MATERIALIZED TABLE: support schema definition

Changes will be applied to the CREATE MATERIALIZED TABLE syntax.

New Supported SQL Syntax

So supported syntax for Materialized tabled would look like

CREATE MATERIALIZED TABLE [catalog_name.][db_name.]table_name
[( { <schema_definition> | <column_list> } )]
...
AS <select_statement>
<schema_definition>:
  [ <column_definition>[, ...n] ],
  [ <watermark_definition> ],
  [ <table_constraint>[, ...n] ]
 
<column_list>:
  col_name1 [, column_name2, ...]

Column definition

The CREATE MATERIALIZED TABLE statement will allow the user to define new columns, including watermarks and primary keys (already supported). The columns may include  any supported type, such as physical, metadata, and computed columns similar to what is currently present for CTAS.
Note: physical columns defined in DDL should be used in the  query. If there is such physical column defined in DDL and not used in the query, the error will be thrown

example:

CREATE MATERIALIZED TABLE my_materialized_table (column1)
    PARTITIONED BY (ds)
    DISTRIBUTED INTO 5 BUCKETS
    FRESHNESS = INTERVAL '1' HOUR
    AS SELECT 
        ds
    FROM
     ...
 

The behavior of CREATE MATERIALIZED TABLE when defining columns in the CREATE MATERIALIZED TABLE part same as for CTAS, also have a look at FLIP-463 (FLIP-463: Schema Definition in CREATE TABLE AS Statement - Apache Flink - Apache Software Foundation ) for more details

CREATE TABLE t1(a INT, b INT, c INT);

CREATE MATERIALIZED TABLE mt1(z INT) 
...
AS SELECT * FROM t1;

DESCRIBE mt1;
+-------------+-----------+----------+--------+
| Column Name | Data Type | Nullable | Extras |
+-------------+-----------+----------+--------+
| z          | INT        | NULL     |        |
| a          | INT        | NULL     |        |
| b          | INT        | NULL     |        |
| c          | INT        | NULL     |        |
+-------------+-----------+----------+--------+
 

And same as for CTAS in CREATE MATERIALIZED TABLE there might be column reordering like

CREATE TABLE t1(a INT, b INT, c INT);

CREATE MATERIALIZED TABLE mt1(b, c, a) 
...
AS SELECT * FROM t1;

Overriding resulted schema data types

It is possible to override the data type of a column derived from the query schema by using implicit and/or explicit casting.

Implicit Casting
CREATE TABLE t1(a INT);
CREATE MATERIALIZED TABLE mt1(a DOUBLE) 
...
AS SELECT a FROM t1;
DESCRIBE mt1;
+-------------+-----------+----------+--------+
| Column Name | Data Type | Nullable | Extras |
+-------------+-----------+----------+--------+
| a | DOUBLE | NULL | |
+-------------+-----------+----------+--------+

When implicit casting is used, the CREATE MATERIALIZED TABLE operation will follow schema compatibility rules similar to the INSERT/SELECT  statement. This check ensures that the CREATE MATERIALIZED TABLE query can run without conversion issues.

Example of a failing compatibility check:

CREATE TABLE t1(a STRING);
CREATE MATERIALIZED TABLE mt1(a INT) AS SELECT a FROM t1;
Column types of query result and sink for 'mt1' do not match.
Cause: Incompatible types for sink column 'a' at position 1.
Query schema: [a: STRING]
Sink schema: [a: INT]
Explicit Casting
CREATE TABLE t1(a INT);
CREATE MATERIALIZED TABLE mt1 
...
AS SELECT CAST(a AS DOUBLE) FROM t1;
DESCRIBE s1;
+-------------+-----------+----------+--------+
| Column Name | Data Type | Nullable | Extras |
+-------------+-----------+----------+--------+
| a | DOUBLE | NULL | |
+-------------+-----------+----------+--------+

The same approach as for CTAS: When explicit casting CREATE MATERIALIZED TABLE operation does not perform schema compatibility checks. The resulted schema is derived from the query after the explicit cast is done.


Watermarks

Similar to CTAS watermarks may also be defined in the CREATE MATERIALIZED TABLE  part and may include columns named in the CREATE  part or the SELECT  part.

  1. Watermark using columns from SELECT part.

    CREATE TABLE t1(a INT, b TIMESTAMP(3));
    CREATE MATERIALIZED TABLE mt1 (WATERMARK FOR b AS b - INTERVAL '5' SECOND)
    ...
    AS SELECT * FROM t1;
    
    
  2. Watermark using columns from CREATE  part.

    CREATE TABLE t1(a INT, b INT);
    CREATE MATERIALIZED TABLE mt1 (
    z TIMESTAMP(3),
    WATERMARK FOR z AS z - INTERVAL '5' SECOND
    )
    AS SELECT * FROM t1;
    

Primary keys

Mostly it is already supported for CREATE MATERIALIZED TABLE. The only different thing I noticed so far is forcible change of nullability for primary keys columns in case of tables (https://github.com/apache/flink/blob/80a95ef5a64d2948d58f8108cc7570627eded6ff/flink-table/flink-sql-parser/src/main/java/org/apache/flink/sql/parser/SqlConstraintValidator.java#L55-L88 ) and the FLIP is proposing doing same same for materialized tables

ALTER MATERIALIZED TABLE

Currently for ALTER MATERIALIZED TABLE [ADD|MODIFY|DROP] there is only support for DISTRIBUTION after FLIP-542, however if we look at similar functionality for tables: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/alter/#add there is a lot more.
The current FLIP suggests to add similar syntax for materialized tables like

ALTER MATERIALIZED TABLE ADD

NOTE: ADD/MODIFY/DROP columns is only supported for metadata and computed columns. In case of physical columns there will be a separate validation forbidding such case.

-- add a new column 
ALTER MATERIALIZED TABLE MyTable ADD category_id STRING COMMENT 'identifier of the category';

-- add columns, constraint, and watermark
ALTER MATERIALIZED TABLE MyTable ADD (
    log_ts STRING COMMENT 'log timestamp string' FIRST,
    ts AS TO_TIMESTAMP(log_ts) AFTER log_ts,
    PRIMARY KEY (id) NOT ENFORCED,
    WATERMARK FOR ts AS ts - INTERVAL '3' SECOND
);

ALTER MATERIALIZED TABLE MODIFY

-- modify a column type, comment and position
ALTER MATERIALIZED TABLE MyTable MODIFY measurement double COMMENT 'unit is bytes per second' AFTER `id`;

-- modify definition of column log_ts and ts, primary key, watermark. They must exist in materialized table schema
ALTER MATERIALIZED TABLE MyTable MODIFY (
log_ts STRING COMMENT 'log timestamp string' AFTER `id`, -- reorder columns
ts AS TO_TIMESTAMP(log_ts) AFTER log_ts,
PRIMARY KEY (id) NOT ENFORCED,
WATERMARK FOR ts AS ts -- modify watermark strategy
);

ALTER MATERIALIZED TABLE DROP

-- drop a column
ALTER MATERIALIZED TABLE MyTable DROP measurement;

-- drop columns
ALTER MATERIALIZED TABLE MyTable DROP (col1, col2, col3);

-- drop primary key
ALTER MATERIALIZED TABLE MyTable DROP PRIMARY KEY;

-- drop a watermark
ALTER MATERIALIZED TABLE MyTable DROP WATERMARK; 

ALTER MATERIALIZED TABLE RENAME

Right now there is no way to rename MATERIALIZED TABLE except for dropping and recreating it again. At the same time there is such functionality for TABLEs. The FLIP also suggests to have this feature for MATERIALIZED TABLEs

Test Plan

The plan is to run all existing tests for the existing MATERIALIZED TABLES design (without a schema definition) to verify nothing is broken.

Also, new tests, that include positive and negative tests

  • Define new columns that extend the schema of the MATERIALIZED TABLES statement

  • Define columns that exist in the resulted schema and override the column data type

  • Define watermarks that use columns from the resulted schema and new columns added in the MATERIALIZED TABLES statement

  • Define primary keys that use columns from the resulted schema and new columns added in the MATERIALIZED TABLES statement

  • Define partition keys and table distribution that use columns from the resulted schema and new columns added in the MATERIALIZED TABLES statement

Compatibility, Deprecation, and Migration Plan

  • It is a new feature with no implication for backwards compatibility.