Link: Unresolved issues in storm-sql
Current milestone
Storm SQL Phase II
JIRA link
Remaining works
- STORM-1443
- STORM-1446
- STORM-2073
Not prioritized yet
Expand supporting external components
JIRA link
Done
- Kafka as Input / Output
- Redis as Output
Remaining works
- STORM-2082
- STORM-2102
- STORM-2103
- And more
- Any external module that supports Trident state can be a candidate.
Consideration
- They will need to be rewritten if we replace the backend of Storm SQL with the higher-level core API
- Need to determine which data sources are 'widely used' and provide only those for now
...
Automatic parallelism for input data source with metadata
JIRA link
Things to do (not filed to issues yet)
- Apply this to the Kafka input data source (may be handled by STORM-2147)
- Do we want to add more input data sources? Then they need to be considered as well.
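As a rough illustration of what "automatic parallelism with metadata" could mean for Kafka, the sketch below derives a spout parallelism from a partition count. The class and method names are hypothetical, and treating the partition count as the metadata in question is an assumption, not something this page or STORM-2147 specifies.

```java
// Hypothetical sketch: not part of Storm SQL or storm-kafka.
class ParallelismSketch {
    // Assumption: the input source exposes its partition count as metadata.
    // More executors than partitions would sit idle, so the suggestion is
    // capped by the partition count as well as by a configured upper bound.
    static int suggestParallelism(int partitionCount, int maxParallelism) {
        if (maxParallelism < 1) {
            throw new IllegalArgumentException("maxParallelism must be at least 1");
        }
        // Fall back to a single executor when no partition metadata is available.
        int partitions = Math.max(partitionCount, 1);
        return Math.min(partitions, maxParallelism);
    }
}
```

For example, a topic with 12 partitions and an upper bound of 8 would yield a parallelism of 8, while a topic with 3 partitions would yield 3.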
Schema support on input format and output format
JIRA link
-- Schema Registry?
-- etc?
- <EPIC?>
Things to do (not filed to issues yet)
- CSV
- Avro
- TSV
- Schema Registry
- And more
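To make the CSV / TSV items more concrete, here is a minimal sketch of schema-driven parsing for delimited input. All names are hypothetical and do not come from Storm SQL. Quoting, escaping, and null handling are ignored, so a real input format would need more than this.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names exist in Storm SQL.
class DelimitedSchemaSketch {
    enum FieldType { INT, DOUBLE, STRING }

    // A column is a name plus the type the raw text should be converted to.
    static final class Column {
        final String name;
        final FieldType type;

        Column(String name, FieldType type) {
            this.name = name;
            this.type = type;
        }
    }

    // Splits one line on the delimiter and converts each field per the schema.
    // Quoted fields and escaped delimiters are deliberately not handled.
    static Map<String, Object> parse(String line, char delimiter, List<Column> schema) {
        String[] raw = line.split(Pattern.quote(String.valueOf(delimiter)), -1);
        if (raw.length != schema.size()) {
            throw new IllegalArgumentException(
                "expected " + schema.size() + " fields but got " + raw.length);
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < raw.length; i++) {
            Column col = schema.get(i);
            String text = raw[i].trim();
            switch (col.type) {
                case INT:
                    row.put(col.name, Integer.parseInt(text));
                    break;
                case DOUBLE:
                    row.put(col.name, Double.parseDouble(text));
                    break;
                default:
                    row.put(col.name, text);
            }
        }
        return row;
    }
}
```

The same schema serves both formats: pass `','` for CSV and `'\t'` for TSV. Avro and a schema registry would carry the schema with the data or fetch it remotely, which this sketch does not attempt.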
Support more functions (scalar and aggregate)
JIRA link
None yet
Things to do (not filed to issues yet)
...
- DATE / TIMESTAMP related functions
...
- Calcite is lacking here, so we may need to refer to an RDBMS (MySQL / Oracle / PostgreSQL)
- Functions which SQE supports now
- And more
Projection / Filter pushdown to data source
JIRA link
None yet
Note
- Not sure whether it helps with streaming data sources
- It may help, but we should make sure that the Spout supports projection (maybe only column references) / filter
- It definitely helps with input data sources that accept queries (for example, JDBC)
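For a query-accepting source such as JDBC, pushdown amounts to folding the projected columns and the filter into the query the source runs, so less data leaves the database. The sketch below shows the idea only. The class and method names are hypothetical, and it assumes the planner hands over a column list and a filter expression with `?` placeholders.

```java
import java.util.List;

// Hypothetical sketch: not an existing Storm SQL or storm-jdbc API.
class PushdownSketch {
    // Builds the query a JDBC source could run once the projection (column
    // list) and the filter (WHERE clause) have been pushed down to it.
    // Assumption: the filter uses '?' placeholders so that values are bound
    // through a PreparedStatement rather than concatenated into the SQL text.
    static String buildQuery(String table, List<String> projectedColumns, String filter) {
        // With no projection pushed down, fall back to reading every column.
        String columns = projectedColumns.isEmpty() ? "*" : String.join(", ", projectedColumns);
        StringBuilder sql = new StringBuilder("SELECT ")
            .append(columns)
            .append(" FROM ")
            .append(table);
        if (filter != null && !filter.isEmpty()) {
            sql.append(" WHERE ").append(filter);
        }
        return sql.toString();
    }
}
```

Without pushdown the source would run the equivalent of `SELECT * FROM orders` and leave projection and filtering to the topology. With pushdown the database does both.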
Depends on other works (Future work)
Change backend of SQL to higher-level core API (get rid of Trident)
JIRA link
None yet
- without supporting join, aggregation, sort, and so on
- higher
Precondition
- Apache Storm adopts higher-level core API
- JIRA issue: STORM-1843
- Higher-level core API
...
- supports exactly-once
...
Note
- Storm SQL may fall back to a basic feature set: no aggregation, no join, no sort
- because the higher-level core API cannot support aggregation and join without windows
- the aggregation and join semantics of current Storm SQL are very different from those of Streaming SQL
Support Streaming SQL
JIRA link
None yet
Precondition
- Calcite supports Streaming SQL: https://calcite.apache.org/docs/stream.html
- For now this page states that Streaming SQL features are NOT IMPLEMENTED yet
- This still seems to be in the design phase.
- Julian Hyde has given talks on this at various venues, and in parallel he recently posted to the Calcite dev mailing list about a design doc for joins.
Note
...
- group by window
...
- join between stream and table (without temporal table support)
...
- join between stream and stream
...
- join between stream and table (with support for temporal table or similar)
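The "group by window" item is what makes aggregation well defined on an unbounded stream: each event is assigned to a finite window, and the aggregate is computed per window. A minimal sketch of a tumbling-window count follows. The names are hypothetical, timestamps are assumed to be non-negative epoch milliseconds, and late or out-of-order events are not considered.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: not tied to Calcite or to any Storm API.
class TumblingWindowSketch {
    // Counts events per tumbling (fixed-size, non-overlapping) window.
    // The map key is the window start time in milliseconds.
    static Map<Long, Integer> countPerWindow(long[] eventTimesMs, long windowSizeMs) {
        if (windowSizeMs <= 0) {
            throw new IllegalArgumentException("windowSizeMs must be positive");
        }
        Map<Long, Integer> counts = new TreeMap<>();
        for (long t : eventTimesMs) {
            // Round the timestamp down to the start of its window.
            long windowStart = (t / windowSizeMs) * windowSizeMs;
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }
}
```

This is roughly the grouping that the Calcite streaming page expresses as `GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR)`. Stream-to-stream joins need a similar bound, since each side can only be matched against a finite window of the other.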