Status

Discussion threadhttps://lists.apache.org/thread/mwvqvfgv2qbmjs5f3zolqy6r5r08zoj0
Vote thread
JIRA

FLINK-17105 - Getting issue details... STATUS

Release1.11


Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Background

In umbrella task FLINK-10232 we have introduced CREATE/DROP VIEW grammar in our module flink-sql-parser. But we don't support view objects in neither blink planner nor old planner. In this doc, we want to discuss the design of supporting view for blink planner.

In a database system, a VIEW represents the result set of a stored query on the data. Users can query it just as they would in a persistent database collection object. Unlike ordinary base tables in a database, a VIEW is a virtual table which is computed or collected dynamically from data in the DBMS (while the view is requested). 

The SQL Standard defines many features about VIEW and database vendors have different implementations. This initial design bases on the following prerequisites:

  • Recursive view is not supported[1][2][3]

          The CREATE RECURSIVE VIEW statement is syntax sugar for a standard recursive query. Since recursive query is not supported, we DO NOT support recursive view.

  • Updatable view is not supported

          Views defined in Flink are read-only. INSERT, UPDATE and DELETE operations are illegal and may raise an exception.

  • Materialized view is not supported

          Materialized views are pre-executed and non-virtual which commonly used in data warehousing. Since we don't have a storage layer, we DO NOT support materialized view.

  • Interoperability between Flink and Hive is not guaranteed

          When using Hive Catalog, Flink access views defined by Hive and Hive access views defined by Flink are not guaranteed.

Scope

In simple terms, we want to:

  • Add a queryable view support for FLINK SQL
  • Make sure the definition of views and objects(columns, functions, tables, views etc) are valid

The Grammar

CRATE/DROP VIEW grammar has been described in the design doc FLINK SQL DDL, and we have introduced CREATE/DROP VIEW grammar in our module flink-sql-parser in FLINK-12954.

In this design doc, we will follow the grammar in FLINK-12954 which described as following:

createViewStatement:
      CREATE [TEMPORARY] VIEW [IF NOT EXISTS] name
      [ '(' columnName, ... ')' ]

      [ COMMENT view_comment ]

      AS

      query_expression

dropViewStatement:

      DROP [TEMPORARY] VIEW [ IF EXISTS ] name

showViewsStatement:

        SHOW VIEWS

describeViewStatement:

        DESCRIBE VIEW name

Proposed Changes

The basics of view implementation is easy. And in this design doc, we want to solve the following questions:

  • For CREATE VIEW name AS query_expression, uses execute this ddl statement with TableEnvironment::sqlUpdate, and  the sql parser parse it as SqlCreateView. The planner convert it to a CatalogView which is wrapped as CreateViewOperation and write it to catalog. In this phase, we will do some validations on objects in the view definition. 
  • For SELECT ... FROM view-name, the planner detect the view from catalog( by FlinkCalciteCatalogReader), load the view definition and convert it to SqlCatalogViewTable. The SqlToRelConverter will expand it to RelNode and the planner will generate plan on that. (This is already implemented by FLINK-12905)
  • For DROP VIEW nameuses execute this ddl statement with TableEnvironment::sqlUpdate, and the sql parser parse it as SqlDropView. The planner get the qualified name which is wrapped as DropViewOperation and dropped from catalog.
  • Supporting SHOW VIEWS|DESCRIBE VIEW name in SQL Client and TableEnvironment depends on the discussing in FLIP 69 - Flink SQL DDL Enhancement


View definition in catalog

We store the original text of the view definition and the expanded text of the view definition in catalog. In SQL world, a view definition is supposed to be frozen at the time it is created. For example, for a view that is defined in the context of "default" database with a query "select * from test1" where table test1 resides in database "default" and has two columns "name" and "value", the expanded text should become "select `test1`.`name`, `test1`.`value` from `default`.`test1` as `test1`".  Then later users can query the view even if the current DB is lost. And request of "select * from view" only returns two columns "name" and "value" even if a new column is added to the table "test1".

Dependencies maintaining

SQL standard introduce CASCADE/RESTRICT keywords to maintain dependencies (objects in view definition such as: columns, functions, tables, other views, etc). For example, if we want to perform an operation on a table, the operation could be prevent or the view changes correspondingly. But not all DBMS completely support this[4].  We can start with a lenient approach, which is DO NOT maintain dependencies and postpone producing error message when anyone tries to use the view.

Use Cases

  • Using a simple CRATE VIEW
CREATE VIEW IF NOT EXISTS hiredate_view (FirstName, LastName, BusinessEntityID, HireDate)
AS 
SELECT p.FirstName, p.LastName, e.BusinessEntityID, e.HireDate 
FROM 
  Employee AS e JOIN Person AS p ON e.BusinessEntityID = p.BusinessEntityID
  • Using builtin-functions in a view
CREATE VIEW IF NOT EXISTS sales_person_perform_view (SalesPersonID, TotalSales) 
AS  
SELECT SalesPersonID, SUM(TotalDue)  
FROM SalesOrderHeader  
GROUP BY SalesPersonID
  • Using a pre-defined view in a view
CREATE VIEW IF NOT EXISTS max_hiredate_view (BusinessEntityID, MaxHireDate)
AS
SELECT BusinessEntityID, MAX(HireDate)
FROM hiredate_view
GROUP BY BusinessEntityID

Reference

[1] http://www.postgresqltutorial.com/postgresql-recursive-view/

[2] http://www.postgresqltutorial.com/postgresql-recursive-query/

[3] https://issues.apache.org/jira/browse/CALCITE-129

[4] https://docs.microsoft.com/en-us/sql/t-sql/statements/create-view-transact-sql?view=sql-server-2017#remarks

Document

https://docs.google.com/document/d/14bx0t8wYH7_o4ChNkDoBFGn-i0T-Q7kUiOFvDd13_Fk