Current state: "Under Discussion"
Discussion thread: https://lists.apache.org/thread.html/0480d17dad32c2df62b3d401385f2140e221b42ee696494a14f73dc5@%3Cdev.flink.apache.org%3E
Released: <Flink Version>
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
The documentation for Table & SQL API has evolved over time. In the meantime, more streaming concepts, a SQL Client, connectors, catalogs etc. were added. Also the vision of a unified API for both batch and streaming is making progress.
It is time to rework the structure in order to:
- attract and target both pure SQL users and programming users
- find features more easily
- have a clear reading flow/order of topics
This structure aims to get to the point quickly. It shows the features early to attract users and uses cross links e.g. to execution or concepts for interested readers.
The top-level pages are:
Overview *we start with a good overview*
Table API *get to the point, the features and operations*
SQL *get to the point, the features and operations*
Setup & Execution *explain the setup and how to execute features*
Connect to External Systems *details*
Built-in Functions *details*
User-defined Extensions *custom details*
Concepts *reference from other pages*
- Table & SQL Ecosystem
Highest level. Next to Application Development because most of the current list siblings are not applicable to this section. The section is also renamed because SQL Client, connectors, and catalogs are not only an API.
What is the Table & SQL Ecosystem?
Might be a big page but is a nice executive summary and informative.
- Main features like schema awareness, abstraction, connectors, catalogs etc.
- How do we achieve unified data processing? Dynamic Tables
- Quickly mention planners
- Advantages/Disadvantages over DataStream API
- E2E example for SQL
- E2E example for Table API in Java/Scala/Python
- E2E example for Table API in Java/Scala/Python with DataStream API
- Short presentation of the SQL Client
- Table API
Goal: Show users the main table features early and link to concepts if necessary.
How to use the API? Intended for users with programming knowledge.
- Short getting started with link to more detailed execution section.
- Explain the most important methods in unified Table Environment
- Present sqlUpdate/sqlQuery etc.
- Querying, Execution, Optimization internals behind the API
- Full Reference
Available operations in the API. This location allows to further split the page in the future if we think an operation needs more space without affecting the top-level structure.
- Present the API operations
- ... more features in the future
Goal: Show users the main features early and link to concepts if necessary.
How to use SQL? Intended for users with SQL knowledge.
Getting started with link to more detailed execution section.
- Full Reference
Available operations in SQL as a table. This location allows to further split the page in the future if we think an operation needs more space without affecting the top-level structure.
- Data Definition
Explain special SQL syntax around DDL.
- Pattern Matching
Make pattern matching more visible.
- ... more features in the future
- Setup & Execution
How to setup a project and submit a job?
- Dependency structure for using the API
- Features and limitations of table environments
- How to setup Python projects?
- SQL Client
How to use the SQL Client? Docs for pure SQL users.
- … Notebooks, JDBC, and more executions in the future
- Connect to External Systems
How to connect to other systems for data or metadata?
- What are available sources and sinks?
- What are catalogs? What can I manage with it?
- Available Connectors
- Available Catalogs
- Built-in Functions
- User-defined Extensions
Use API extension points.
- Sources & Sinks
What are the general concepts (independent of API/SQL Client) any user should know about?
We put this at the end and link from the main pages to pages here if necessary.
What is a Planner? Temporary docs. Removed in the future.
- Blink Planner Features and Limitations
- Flink Planner Features and Limitations
- Data Types
Which data can we process?
- Unbounded Data Processing
Which operation needs special attention when working with unbounded data?
- Dynamic Tables (with all update modes)
- Time Attributes
- Query Configuration
- Joins in Continuous Queries
- Temporal Tables
Explain the concept of a temporal table.
Compatibility, Deprecation, and Migration Plan
We redirect old page links to the new structure.
The first version was more like a walkthrough to guide the reader:
Data Types *we start with available data types*
Connect to External Systems *a program starts with connectors*
SQL *continue with operators*
Built-in Functions *and functions*
Configuration *finally we tweak the execution*
However, the problem here was that a user would read a lot of "boring" concept pages first and the actual list of operations would be nested in the Table API section and SQL section. The API sections should have highest priority and be presented as one of the first items in the list.