Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
In umbrella task FLINK-10232 we have introduced CREATE/DROP VIEW grammar in our module flink-sql-parser. But we don't support view objects in neither blink planner nor old planner. In this doc, we want to discuss the design of supporting view for blink planner.
In a database system, a VIEW represents the result set of a stored query on the data. Users can query it just as they would in a persistent database collection object. Unlike ordinary base tables in a database, a VIEW is a virtual table which is computed or collected dynamically from data in the DBMS (while the view is requested).
The SQL Standard defines many features about VIEW and database vendors have different implementations. This initial design bases on the following prerequisites:
- Recursive view is not supported
The CREATE RECURSIVE VIEW statement is syntax sugar for a standard recursive query. Since recursive query is not supported, we DO NOT support recursive view.
- Updatable view is not supported
Views defined in Flink are read-only. INSERT, UPDATE and DELETE operations are illegal and may raise an exception.
- Materialized view is not supported
Materialized views are pre-executed and non-virtual which commonly used in data warehousing. Since we don't have a storage layer, we DO NOT support materialized view.
- Interoperability between Flink and Hive is not guaranteed
When using Hive Catalog, Flink access views defined by Hive and Hive access views defined by Flink are not guaranteed.
In simple terms, we want to:
- Add a queryable view support for FLINK SQL
- Make sure the definition of views and objects(columns, functions, tables, views etc) are valid
In this design doc, we will follow the grammar in FLINK-12954 which described as following:
CREATE [TEMPORARY] VIEW [IF NOT EXISTS] name
[ '(' columnName, ... ')' ]
[ COMMENT view_comment ]
DROP [TEMPORARY] VIEW [ IF EXISTS ] name
DESCRIBE VIEW name
The basics of view implementation is easy. And in this design doc, we want to solve the following questions:
- For CREATE VIEW name AS query_expression, uses execute this ddl statement with TableEnvironment::sqlUpdate, and the sql parser parse it as SqlCreateView. The planner convert it to a CatalogView which is wrapped as CreateViewOperation and write it to catalog. In this phase, we will do some validations on objects in the view definition.
- For SELECT ... FROM view-name, the planner detect the view from catalog( by FlinkCalciteCatalogReader), load the view definition and convert it to SqlCatalogViewTable. The SqlToRelConverter will expand it to RelNode and the planner will generate plan on that. (This is already implemented by FLINK-12905)
- For DROP VIEW name, uses execute this ddl statement with TableEnvironment::sqlUpdate, and the sql parser parse it as SqlDropView. The planner get the qualified name which is wrapped as DropViewOperation and dropped from catalog.
- Supporting SHOW VIEWS|DESCRIBE VIEW name in SQL Client and TableEnvironment depends on the discussing in FLIP 69 - Flink SQL DDL Enhancement
View definition in catalog
We store the original text of the view definition and the expanded text of the view definition in catalog. In SQL world, a view definition is supposed to be frozen at the time it is created. For example, for a view that is defined in the context of "default" database with a query "select * from test1" where table test1 resides in database "default" and has two columns "name" and "value", the expanded text should become "select `test1`.`name`, `test1`.`value` from `default`.`test1` as `test1`". Then later users can query the view even if the current DB is lost. And request of "select * from view" only returns two columns "name" and "value" even if a new column is added to the table "test1".
SQL standard introduce CASCADE/RESTRICT keywords to maintain dependencies (objects in view definition such as: columns, functions, tables, other views, etc). For example, if we want to perform an operation on a table, the operation could be prevent or the view changes correspondingly. But not all DBMS completely support this. We can start with a lenient approach, which is DO NOT maintain dependencies and postpone producing error message when anyone tries to use the view.
- Using a simple CRATE VIEW
- Using builtin-functions in a view
- Using a pre-defined view in a view