Status
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Background
In umbrella task FLINK-10232 we have introduced CREATE/DROP VIEW grammar in our module flink-sql-parser. But we don't support view objects in neither blink planner nor old planner. In this doc, we want to discuss the design of supporting view for blink planner.
In a database system, a VIEW represents the result set of a stored query on the data. Users can query it just as they would in a persistent database collection object. Unlike ordinary base tables in a database, a VIEW is a virtual table which is computed or collected dynamically from data in the DBMS (while the view is requested).
The SQL Standard defines many features about VIEW and database vendors have different implementations. This initial design bases on the following prerequisites:
- Recursive view is not supported[1][2][3]
The CREATE RECURSIVE VIEW statement is syntax sugar for a standard recursive query. Since recursive query is not supported, we DO NOT support recursive view.
- Updatable view is not supported
Views defined in Flink are read-only. INSERT, UPDATE and DELETE operations are illegal and may raise an exception.
- Materialized view is not supported
Materialized views are pre-executed and non-virtual which commonly used in data warehousing. Since we don't have a storage layer, we DO NOT support materialized view.
- Interoperability between Flink and Hive is not guaranteed
When using Hive Catalog, Flink access views defined by Hive and Hive access views defined by Flink are not guaranteed.
Scope
In simple terms, we want to:
- Add a queryable view support for FLINK SQL
- Make sure the definition of views and objects(columns, functions, tables, views etc) are valid
The Grammar
CRATE/DROP VIEW grammar has been described in the design doc FLINK SQL DDL, and we have introduced CREATE/DROP VIEW grammar in our module flink-sql-parser in FLINK-12954.
In this design doc, we will follow the grammar in FLINK-12954 which described as following:
createViewStatement:
CREATE [TEMPORARY] VIEW [IF NOT EXISTS] name
[ '(' columnName, ... ')' ][ COMMENT view_comment ]
AS
query_expression
dropViewStatement:
DROP [TEMPORARY] VIEW [ IF EXISTS ] name
showViewsStatement:
SHOW VIEWS
describeViewStatement:
DESCRIBE VIEW name
Proposed Changes
The basics of view implementation is easy. And in this design doc, we want to solve the following questions:
- For CREATE VIEW name AS query_expression, uses execute this ddl statement with TableEnvironment::sqlUpdate, and the sql parser parse it as SqlCreateView. The planner convert it to a CatalogView which is wrapped as CreateViewOperation and write it to catalog. In this phase, we will do some validations on objects in the view definition.
- For SELECT ... FROM view-name, the planner detect the view from catalog( by FlinkCalciteCatalogReader), load the view definition and convert it to SqlCatalogViewTable. The SqlToRelConverter will expand it to RelNode and the planner will generate plan on that. (This is already implemented by FLINK-12905)
- For DROP VIEW name, uses execute this ddl statement with TableEnvironment::sqlUpdate, and the sql parser parse it as SqlDropView. The planner get the qualified name which is wrapped as DropViewOperation and dropped from catalog.
- Supporting SHOW VIEWS|DESCRIBE VIEW name in SQL Client and TableEnvironment depends on the discussing in FLIP 69 - Flink SQL DDL Enhancement
View definition in catalog
We store the original text of the view definition and the expanded text of the view definition in catalog. In SQL world, a view definition is supposed to be frozen at the time it is created. For example, for a view that is defined in the context of "default" database with a query "select * from test1" where table test1 resides in database "default" and has two columns "name" and "value", the expanded text should become "select `test1`.`name`, `test1`.`value` from `default`.`test1` as `test1`". Then later users can query the view even if the current DB is lost. And request of "select * from view" only returns two columns "name" and "value" even if a new column is added to the table "test1".
Dependencies maintaining
SQL standard introduce CASCADE/RESTRICT keywords to maintain dependencies (objects in view definition such as: columns, functions, tables, other views, etc). For example, if we want to perform an operation on a table, the operation could be prevent or the view changes correspondingly. But not all DBMS completely support this[4]. We can start with a lenient approach, which is DO NOT maintain dependencies and postpone producing error message when anyone tries to use the view.
Use Cases
- Using a simple CRATE VIEW
CREATE VIEW IF NOT EXISTS hiredate_view (FirstName, LastName, BusinessEntityID, HireDate) AS SELECT p.FirstName, p.LastName, e.BusinessEntityID, e.HireDate FROM Employee AS e JOIN Person AS p ON e.BusinessEntityID = p.BusinessEntityID
- Using builtin-functions in a view
CREATE VIEW IF NOT EXISTS sales_person_perform_view (SalesPersonID, TotalSales) AS SELECT SalesPersonID, SUM(TotalDue) FROM SalesOrderHeader GROUP BY SalesPersonID
- Using a pre-defined view in a view
CREATE VIEW IF NOT EXISTS max_hiredate_view (BusinessEntityID, MaxHireDate) AS SELECT BusinessEntityID, MAX(HireDate) FROM hiredate_view GROUP BY BusinessEntityID
Reference
[1] http://www.postgresqltutorial.com/postgresql-recursive-view/
[2] http://www.postgresqltutorial.com/postgresql-recursive-query/
[3] https://issues.apache.org/jira/browse/CALCITE-129
Document
https://docs.google.com/document/d/14bx0t8wYH7_o4ChNkDoBFGn-i0T-Q7kUiOFvDd13_Fk