...

  • STORM-2073 (ASF JIRA) will collapse multiple redundant Trident steps into one
  • STORM-1446 (ASF JIRA) will apply some query optimizations and open the way to follow-up work such as automatic parallelism, pushdown, and so on
  • STORM-2125 (ASF JIRA) will enable most of the functionality that Calcite supports.
    • Except aggregate functions, though we may find a way to support them

Next milestone

Schema support on input format and output format

JIRA link

STORM-2149 (ASF JIRA)

Things to do

  • Avro
  • CSV

Automatic parallelism for input data source with metadata

JIRA link

STORM-2147 (ASF JIRA)

Things to do

  • automatic parallelism for Kafka input data source

Others (non-epic)

STORM-2170 (ASF JIRA)

STORM-2148 (ASF JIRA)

How will this milestone help users play with Storm SQL?

  • Users can also read and write messages in Avro and CSV formats.
    • Previously, only JSON was supported.
  • Users can test their topology with a socket (easily done with 'nc'), with no need to set up a test Kafka topic.
  • Storm SQL will automatically set the parallelism hint when the input source provides partition information.
    • 'PARALLELISM' can override this value, so a manual parallelism hint is always applied.
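
The precedence described in the last bullet can be sketched as follows. This is only an illustration, not Storm SQL's actual code: the class `ParallelismResolver`, its `resolve` method, the fallback value of 1, and the "one task per partition" mapping are all assumptions made for the example.

```java
import java.util.OptionalInt;

/** Hypothetical helper for illustration; not part of Storm SQL's API. */
final class ParallelismResolver {
    /** Assumed fallback when neither a manual hint nor partition metadata exists. */
    static final int DEFAULT_PARALLELISM = 1;

    private ParallelismResolver() {}

    /**
     * @param manualHint     value of the user-supplied 'PARALLELISM' setting, if any
     * @param partitionCount partition count reported by the input source, if any
     * @return the parallelism hint to apply to the input source
     */
    static int resolve(OptionalInt manualHint, OptionalInt partitionCount) {
        if (manualHint.isPresent()) {
            // A manual hint always takes precedence over detected metadata.
            return manualHint.getAsInt();
        }
        if (partitionCount.isPresent()) {
            // Assumption: one task per partition reported by the source.
            return partitionCount.getAsInt();
        }
        return DEFAULT_PARALLELISM;
    }

    public static void main(String[] args) {
        // Source reports 4 partitions, no manual hint.
        System.out.println(resolve(OptionalInt.empty(), OptionalInt.of(4)));
        // Manual hint of 2 overrides the 4 detected partitions.
        System.out.println(resolve(OptionalInt.of(2), OptionalInt.of(4)));
        // No information at all: fall back to the assumed default.
        System.out.println(resolve(OptionalInt.empty(), OptionalInt.empty()));
    }
}
```

Keeping the decision in one small function makes the rule easy to test independently of any particular data source.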

How will this milestone improve Storm SQL internally?

  • Code generation will remove the overhead of evaluating code blocks, making operators faster.

Next works

The work items below can be done without waiting for other work to finish. We might want to pick several items from each category to form the next milestone.

When building a milestone, it would be better to clarify the goal: "How does this milestone help users play with Storm SQL?".

Automatic parallelism for input data source with metadata

JIRA link

STORM-2147 (ASF JIRA)

...

  • Apply this to the Kafka input data source (may be handled in STORM-2147)
  • Do we want to add more input data sources? If so, they need to be considered as well.

Schema support on input format and output format

JIRA link

STORM-2149 (ASF JIRA)

...

Support more functions (scalar and aggregate)

JIRA link

None yet

Things to do (not filed to issues yet)

...

Expand support for external components

JIRA link

STORM-2075 (ASF JIRA)

...

Projection / Filter pushdown to data source

JIRA link

None yet 

Note

  • Not sure whether it helps with streaming data sources
    • It may help, but we should make sure that the Spout supports projection (maybe only column references) / filter
  • It definitely helps with input data sources that accept queries (for example, JDBC)
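
For a query-accepting source such as JDBC, pushdown could amount to folding the projected columns and the filter into the statement sent to the source, so that only the needed rows and columns are read. The sketch below illustrates that idea only. `PushdownQueryBuilder` and `buildQuery` are hypothetical names, not Storm SQL or Calcite APIs, and the filter is passed as a plain string purely for brevity. A real implementation would presumably translate the planner's expression tree and use bound parameters.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not Storm SQL's planner or JDBC connector code. */
final class PushdownQueryBuilder {
    private PushdownQueryBuilder() {}

    /**
     * Builds the query sent to the source with projection and filter pushed down.
     *
     * @param table            source table name
     * @param projectedColumns columns the SQL statement actually references; empty means all
     * @param filter           filter condition to push down, or null if there is none
     */
    static String buildQuery(String table, List<String> projectedColumns, String filter) {
        // Projection pushdown: request only the referenced columns.
        String columns = projectedColumns.isEmpty() ? "*" : String.join(", ", projectedColumns);
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(columns)
                .append(" FROM ")
                .append(table);
        // Filter pushdown: let the source discard non-matching rows.
        if (filter != null && !filter.trim().isEmpty()) {
            sql.append(" WHERE ").append(filter);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Table and column names are made up for the example.
        System.out.println(buildQuery("orders", Arrays.asList("id", "total"), "total > 100"));
        // prints: SELECT id, total FROM orders WHERE total > 100
    }
}
```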

...

Change backend of SQL to higher-level core API (get rid of Trident)

JIRA link

None yet

Precondition

  • Apache Storm adopts higher-level core API
    • JIRA issue: STORM-1843 (ASF JIRA)
  • Higher-level core API supports exactly-once

...

  • Storm SQL may go back to basic features: no aggregation, no join, no sort
    • because the higher-level core API cannot support aggregation and join without windows
    • the current Storm SQL aggregation and join semantics are very different from Streaming SQL

Support Streaming SQL

JIRA link

None yet

Precondition

...