...

  • STORM-1443
  • STORM-1446
  • STORM-2073

Others (non-epic)

  • STORM-2125
  • STORM-2099

How will this milestone help users play with Storm SQL?

  • Users will be able to run Storm SQL Runner without copying dependencies to extlib.
    • This is resolved by STORM-2016, but the issue originally came from STORM-1435.
  • Users will be able to use GROUP BY and JOIN clauses in DML statements (see the sketch after this list)
    • Note that these features follow the Trident semantics of aggregation and join
    • Aggregation is done within a batch
      • SQE does stateful aggregation, so Storm SQL might follow later
    • Join is done within each batch
    • That said, respecting the Streaming SQL semantics seems more correct, which means we might remove these features again and wait for Streaming SQL
      • This is what Flink does, but Flink has the Table API, so it can support windowed aggregation and join without Calcite support
  • Users will be able to use user-defined aggregate functions in Trident mode
    • User-defined (scalar) functions are already supported
  • Users will be able to run 'explain' to see the query plan for a DML statement before submitting the Trident topology
  • Users will be able to specify the parallelism of an input data source, which is applied unless repartitioning occurs
  • Users will be able to store rows into Redis
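For flavor only (the table and column names here are made up, and the exact dialect Storm SQL accepts may differ), a DML statement combining JOIN and GROUP BY might look like this:

    INSERT INTO SITE_STATS
    SELECT O.SITE_ID, COUNT(*) AS CNT, SUM(O.UNIT_PRICE * O.QUANTITY) AS TOTAL
    FROM ORDERS AS O
    JOIN SITES AS S ON O.SITE_ID = S.SITE_ID   -- join is evaluated per Trident batch
    GROUP BY O.SITE_ID                         -- aggregation is evaluated per Trident batch

Running Storm SQL Runner in explain mode on such a statement prints the query plan instead of submitting a topology; per the semantics above, both the join and the aggregation apply within a batch.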

How will this milestone improve Storm SQL internally?

  • STORM-2073 will reduce multiple redundant Trident steps into one.
  • STORM-1446 will apply some query optimizations and open the way for follow-up work such as automatic parallelism, pushdown, and so on.
  • STORM-2125 will enable most of the functionality that Calcite supports.
    • Except aggregate functions, but we may find a way to support them

Next milestone

Schema support on input format and output format

JIRA link

STORM-2149

Things to do

  • STORM-2172
  • STORM-2173

Automatic parallelism for input data source with metadata

JIRA link

STORM-2147

Things to do

  • automatic parallelism for Kafka input data source

Others (non-epic)

STORM-2170

STORM-2148

How will this milestone help users play with Storm SQL?

  • Users can also read/write messages from/to Avro and CSV.
    • Previously, only JSON was supported.
  • Users can test their topology with a socket data source (easily with 'nc'), with no need to set up a test Kafka topic. (See the sketch after this list.)
  • Storm SQL will automatically set the parallelism hint when the input source provides partition information.
    • 'PARALLELISM' can still overwrite this value, so a manual parallelism hint is always honored.
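As a rough sketch only (the socket URI, field list, and table property names below are assumptions rather than the exact Storm SQL syntax), a CSV-formatted table over a local socket for quick testing might be declared along these lines, after which lines typed into 'nc localhost 8888' flow straight into the query:

    CREATE EXTERNAL TABLE WORDS (ID INT PRIMARY KEY, WORD VARCHAR)
      LOCATION 'socket://localhost:8888'   -- hypothetical socket URI
      TBLPROPERTIES '{"format": "csv"}'    -- hypothetical property key for the input/output format
      -- a PARALLELISM setting here would override any automatically derived value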

How will this milestone improve Storm SQL internally?

  • Code generation will get rid of the overhead of evaluating code blocks, making operators faster.

Next work

The work items below can be done without waiting on other work. We might want to pick several items from each category and create the next milestone.

When building a milestone, it would be better to clarify the goal: "How will this milestone help users play with Storm SQL?"

Automatic parallelism for input data source with metadata

JIRA link

STORM-2147

...

  • Apply this to the Kafka input data source (maybe handled as part of STORM-2147)
  • Do we want to add more input data sources? If so, they need to be considered as well.

Schema support on input format and output format

JIRA link

STORM-2149

Things to do

...

  • CSV
  • Avro
  • TSV (?)
  • Schema Registry (?)
  • And more

Support more functions (scalar and aggregation)

JIRA link

None yet

Things to do (not filed as issues yet)

Expand support for external components

JIRA link

STORM-2075

Done

  • Kafka as Input / Output
  • Redis as Output

...

Projection / Filter pushdown to data source

JIRA link

None yet 

Note

  • Not sure it helps streaming data sources
    • It may help, but we should make sure the Spout supports projection (maybe only column references) / filtering
  • It definitely helps with input data sources that accept a query (for example, JDBC); see the example below.
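For example (the table and column names are illustrative), in a query like the one below against a JDBC-backed table, both the column projection and the WHERE predicate could be handed down to the data source instead of being evaluated inside the topology:

    SELECT ID, AMOUNT
    FROM JDBC_ORDERS      -- hypothetical JDBC-backed external table
    WHERE AMOUNT > 100    -- candidate for filter pushdown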

...

Change the backend of Storm SQL to the higher-level core API (get rid of Trident)

JIRA link

...

STORM-2406

Precondition

  • Apache Storm adopts the higher-level core API
    • JIRA issue: STORM-1843
  • The higher-level core API supports exactly-once

...

  • Storm SQL may go back to basic features: no aggregation, no join, no sort
    • because the higher-level core API cannot support aggregation and join without windows
    • the current Storm SQL aggregation and join semantics are very different from Streaming SQL

Support Streaming SQL

JIRA link

...

  • STORM-2405
  • and more (support _START & _END, support session, etc.)

...

Precondition

Note

  • add 'rowtime' to each row automatically
  • group by window (see the sketch after this list)
  • join between stream and table (without temporal table support)
  • join between stream and stream
  • join between stream and table (with temporal table support or similar)
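For reference, a windowed aggregation over an automatically attached 'rowtime' column in Calcite-style streaming SQL looks roughly like this (the table and column names are illustrative; Storm SQL does not support this yet):

    SELECT STREAM
      TUMBLE_END(ROWTIME, INTERVAL '1' MINUTE) AS WINDOW_END,
      WORD,
      COUNT(*) AS CNT
    FROM WORDS
    GROUP BY TUMBLE(ROWTIME, INTERVAL '1' MINUTE), WORD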