Document the state by adding a label to the FLIP page with one of "discussion", "accepted", "released", "rejected".

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

The Apache Flink documentation has evolved organically over many years. While comprehensive, the current structure has several issues that make it harder for users to discover and navigate to the right content for their use case:

  1. Unclear boundaries between introductory sections: The distinction between "Try Flink" (tutorials), "Learn Flink" (conceptual learning), and "Concepts" (foundational concepts) is unclear. Users may not know which section to start with or how these sections relate to each other. The Diátaxis documentation framework categorizes documentation into tutorials (learning-oriented), how-to guides (task-oriented), reference (information-oriented), and explanation (understanding-oriented). The current Flink sections blur these categories, making navigation confusing.

  2. SQL is bundled with Table API: Users interested only in SQL must navigate through programmatic Table API content. SQL has become a first-class citizen in Flink with SQL Client, SQL Gateway, JDBC driver, and extensive SQL support, warranting its own documentation section.

  3. Streaming concepts are buried in Table API: Concepts like Dynamic Tables, Time Attributes, and Versioned Tables are located under Table API, but these concepts apply equally to SQL users. They should be accessible to all users who are interested in relational streaming concepts.

  4. Python documentation duplicates structure: The Python API section replicates the structure of both Table API and DataStream API, creating maintenance burden and making it harder to show Python as an alternative language alongside Java/Scala.

  5. Contributor documentation mixed with user documentation: The "Flink Development" section contains contributor-focused content (building from source, IDE setup, contributing guidelines) that clutters the user-facing documentation navigation. This content belongs in the repository root where contributors naturally look.

  6. Connector discoverability for SQL users: SQL users may not realize they need to look under "Table API Connectors" for information about connecting to external systems like Kafka or databases. The section name doesn't indicate SQL compatibility.

FLIP-60: Restructure Documentation of Table API & SQL was proposed in 2019 to address some of these issues, specifically separating SQL from Table API documentation. While FLIP-60 was discussed, it was never voted on. This FLIP builds upon the ideas from FLIP-60 and expands the scope to address the broader documentation structure, including Python integration and connector documentation organization. It also follows the ideas outlined in FLIP-541: Making PyFlink more Pythonic (Phase-1).

Public Interfaces

N/A

Proposed Changes

An example of how the restructured Flink documentation would look can be found at https://apache-flink-doc-refactoring.netlify.app/. Here are some example screenshots:

Note: the examples don't contain all the changes; they mostly show the new menu structure, moved content, and some new content (such as First Steps). Not everything proposed in this FLIP has been implemented.

This FLIP proposes restructuring the documentation to improve discoverability and reduce duplication by:

  • Creating a dedicated Flink SQL section separate from Table API

  • Moving shared concepts (Relational Streaming concepts like Dynamic Tables and Time Attributes) to the top-level Concepts section

  • Moving shared architecture documentation (Source/Sink APIs) to the Connectors section

  • Integrating Python documentation within the Table API and DataStream API sections where applicable, and moving the rest to the Python API docs (PyDocs)

Proposed Structure


1. Getting Started (renamed from "Try Flink")
   ├── First Steps (download, install, run example)
   ├── Flink SQL Tutorial
   ├── Table API Tutorial
   ├── DataStream API Tutorial
   └── Flink Operations Playground

2. Learn Flink (restructured)
   ├── Overview (API-agnostic: streams, state, time concepts)
   ├── Streaming Analytics (windowing, aggregations conceptually)
   ├── Learn the Table API (NEW)
   ├── Learn the DataStream API
   ├── Data Pipelines & ETL
   ├── Event-Driven Applications
   └── Fault Tolerance (API-agnostic)

3. Concepts (shared, API-agnostic)
   ├── Overview
   ├── Stateful Stream Processing
   ├── Timely Stream Processing
   ├── Relational Streaming (moved from Table API)
   │   ├── Overview
   │   ├── Dynamic Tables
   │   ├── Time Attributes
   │   ├── Versioned Tables
   │   ├── Joins in Continuous Queries
   │   └── Determinism
   ├── Flink Architecture
   └── Glossary

4. Flink SQL (NEW top-level section)
   ├── Overview (NEW)
   │
   ├── SQL Interfaces
   │   ├── SQL Client
   │   ├── SQL Gateway
   │   │   ├── Overview
   │   │   ├── REST Endpoint
   │   │   └── HiveServer2 Endpoint
   │   └── JDBC Driver
   │
   ├── SQL Reference
   │   ├── Data Types (moved from Table API)
   │   ├── Data Definition (DDL)
   │   │   ├── CREATE Statements
   │   │   ├── ALTER Statements
   │   │   └── DROP Statements
   │   ├── Data Manipulation (DML)
   │   │   ├── INSERT
   │   │   ├── UPDATE
   │   │   └── DELETE
   │   ├── Queries
   │   │   ├── Overview
   │   │   ├── SELECT & WHERE
   │   │   ├── Joins
   │   │   ├── Aggregations
   │   │   ├── Window Functions
   │   │   ├── Pattern Matching (MATCH_RECOGNIZE)
   │   │   └── ...
   │   └── Utility Statements
   │       ├── SHOW & DESCRIBE
   │       ├── EXPLAIN
   │       ├── SET & USE
   │       └── ...
   │
   ├── Functions
   │   ├── Built-in Functions (categorized)
   │   └── User-Defined Functions (NEW - links to Table API UDF docs)
   │
   ├── Catalogs
   ├── Time Zone (moved from Table API)
   ├── Materialized Table
   └── Hive Compatibility

5. Flink APIs (renamed from "Application Development")
   ├── Overview (NEW - explains available APIs and their use cases)
   ├── Table API
   │   ├── Overview
   │   ├── Concepts & Common API
   │   ├── TableEnvironment
   │   ├── Table API
   │   ├── Functions
   │   │   ├── Overview
   │   │   ├── User-Defined Functions
   │   │   └── Process Table Functions
   │   ├── DataStream API Integration
   │   ├── Procedures
   │   ├── Modules
   │   ├── OLAP Quickstart
   │   ├── Configuration
   │   └── Performance Tuning
   ├── DataStream API
   │   ├── Overview
   │   ├── Event Time & Watermarks
   │   ├── Operators
   │   ├── Windows
   │   ├── Process Functions
   │   ├── State Management
   │   ├── Fault Tolerance (Checkpointing, State Backends)
   │   ├── Serialization
   │   ├── Execution & Parallelism
   │   └── Testing
   ├── Configuration
   └── DataStream v2 (Experimental)

6. Libraries (unchanged)
   ├── CEP (Complex Event Processing)
   └── State Processor API

7. Connectors (restructured)
   ├── SQL & Table API Connectors (renamed from "Table API Connectors")
   ├── DataStream Connectors
   ├── Data Sources (moved from DataStream API)
   ├── Data Sinks (moved from DataStream API)
   ├── User-defined Sources & Sinks (moved from Table API)
   │   (the three pages above are to be integrated later into a single page under Connectors)
   └── Models

8. Deployment (unchanged structure)
9. Operations (unchanged structure)

10. Internals (unchanged)

Note: "Flink Development" section removed from documentation navigation (content moved to repository root).
  • Update the Flink contribution guide for documentation to explain whether content should go under Learn Flink (when it is primarily "follow these steps to learn X") or under Concepts (when it is primarily "here's how X works under the hood")
  • Update the README accordingly, and introduce a DEVELOPMENT.md file containing the content currently under "Flink Development"


Compatibility, Deprecation, and Migration Plan

  • All existing URLs will continue to work via aliases/redirects configured in Hugo front matter.
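For illustration, a relocated page can keep its old URL working through Hugo's built-in `aliases` front matter field, which generates a redirect stub at each listed old path. The file path and alias below are hypothetical examples, not the final mapping:

```yaml
# Front matter of a relocated page, e.g. docs/content/docs/sql/queries/overview.md
# (path and alias are hypothetical examples of an old-to-new URL mapping)
---
title: "Queries"
aliases:
  - /docs/dev/table/sql/queries/overview/
---
```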

Test Plan

  • Verify all redirects resolve correctly

  • Check that internal links work

  • Validate documentation builds successfully

  • Manual review of navigation structure
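The redirect check above could be partially automated. The sketch below, under the assumption that each relocated page lists its old URLs under `aliases` in its Hugo front matter, flags any old URL that neither survives at its original path nor appears as an alias of a new page; all file contents and paths are hypothetical examples:

```python
# Sketch of a redirect-coverage check for the migration. It assumes each
# relocated page lists its old URLs under "aliases" in Hugo front matter.
# The sample data in any usage is hypothetical.

def parse_aliases(front_matter: str) -> list[str]:
    """Extract old URL paths from a page's YAML front matter."""
    aliases, in_aliases = [], False
    for line in front_matter.splitlines():
        stripped = line.strip()
        if stripped == "aliases:":
            in_aliases = True            # entering the aliases list
        elif in_aliases and stripped.startswith("- "):
            aliases.append(stripped[2:].strip())
        elif in_aliases:
            in_aliases = False           # list ended at a non-item line
    return aliases

def uncovered_urls(old_urls: set[str], pages: dict[str, str]) -> set[str]:
    """Return old URLs that neither survive as-is nor appear as an alias.

    `pages` maps each new page path to that page's front matter text.
    """
    covered = set(pages)                 # new paths that still exist
    for front_matter in pages.values():
        covered.update(parse_aliases(front_matter))
    return old_urls - covered
```

Running such a check over a crawl of the old sitemap and the restructured source tree would surface any URL left without a redirect before the change ships.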

Future Work

The following initiatives are out of scope for this FLIP but are planned as follow-up efforts:

AI-Assisted Documentation Access

Providing documentation content directly to LLMs via MCP servers or similar mechanisms is a valuable direction. This could include:

  • Deploying a docs MCP server

  • Providing "Add to Cursor/Claude/your favorite tool" integration buttons

  • Adding "Copy as Markdown" functionality for easy pasting into AI assistants

This would enable AI-first documentation workflows that are increasingly common. A separate FLIP will address this.

Connectors Section Restructuring

A follow-up FLIP is planned to further restructure the Connectors section:

  • Organize connectors by system (Kafka, JDBC, Elasticsearch, etc.) rather than by API

  • Each connector page would show usage for SQL, Table API, and DataStream API (where supported) using tabs

  • Create a dedicated Formats subsection explaining format configuration

  • Unify the currently fragmented connector documentation experience

This requires coordination with externalized connector repositories.

Rejected Alternatives

Alternative: Keep SQL within Table API Section

Rejected because SQL users have distinct needs from programmatic Table API users. Separating SQL:

  • Creates a cleaner path for SQL-only users (data analysts, SQL developers)

  • Prominently features SQL Client, SQL Gateway, and JDBC Driver

  • Reduces the size of the Table API section, making it easier to navigate