...

  • STORM-1443
  • STORM-1446
  • STORM-2073

Not prioritized yet

Expand supporting external components

...

How will this milestone help users play with Storm SQL?

Done

  • Kafka as Input / Output
  • Redis as Output

Remaining work

  • Users can run 'explain' to see the query plan for a DML statement before submitting the Trident topology.
  • Users can use GROUP BY and JOIN in DML statements.
    • Note that these features follow the Trident semantics of aggregation and join
    • Aggregation is done within a batch
      • SQE does stateful aggregation, so Storm SQL might follow later
    • Join is done within each batch
    • But respecting Streaming SQL semantics seems correct, which means that we could remove these features again and wait for Streaming SQL
      • This is what Flink is doing, but Flink has the Table API, so it can support window aggregation and join without Calcite support
  • Users can run Storm SQL Runner without copying dependencies to extlib.
    • This is resolved by STORM-2016, but the issue originally came from STORM-1435.
  • Users can use user-defined aggregate functions in Trident mode.
  • Users can specify the parallelism of the input data source, which is used unless repartitioning occurs.
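
The batch-scoped aggregation noted in the list above can be pictured with a small sketch. This is plain Python, not the Storm or Trident API: the function names, the row shape, and the sample batches are all hypothetical, and the code only models the difference between aggregating within each batch and carrying state across batches.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (key, value); a hypothetical row shape


def aggregate_per_batch(batches: Iterable[List[Row]]) -> List[Dict[str, int]]:
    """Sum values by key separately for every batch (no state between batches)."""
    results = []
    for batch in batches:
        sums: Counter = Counter()
        for key, value in batch:
            sums[key] += value
        results.append(dict(sums))
    return results


def aggregate_stateful(batches: Iterable[List[Row]]) -> List[Dict[str, int]]:
    """Sum values by key with running state carried across batches."""
    state: Counter = Counter()
    snapshots = []
    for batch in batches:
        for key, value in batch:
            state[key] += value
        # Record the running totals as they stand after this batch.
        snapshots.append(dict(state))
    return snapshots


# Hypothetical input: two batches that share the key "a".
batches = [[("a", 1), ("b", 2), ("a", 3)], [("a", 10), ("c", 5)]]
per_batch = aggregate_per_batch(batches)
stateful = aggregate_stateful(batches)
print(per_batch)
print(stateful)
```

With per-batch aggregation the key "a" yields an independent sum for each batch, whereas the stateful variant keeps accumulating it, which is the behavioral gap the notes above refer to.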

How will this milestone improve Storm SQL internally?

  • STORM-2073 will reduce redundant multiple Trident steps into one.
  • STORM-2103
  • And more
    • Any external modules which support Trident state can be candidates.

Consideration

...

  • STORM-1446 will do some query optimizations and open the way to follow-up work such as automatic parallelism, pushdown, and so on.

Next work

The work items below can be done without waiting on other work. We might want to pick several items from each category to create the next milestone.

When building a milestone, it would be better to clarify the goal: "How does this milestone help users play with Storm SQL?".

...

Automatic parallelism for input data source with metadata

JIRA link

STORM-2147

...

Schema support on input format and output format

JIRA link

STORM-2149

Things to do (not filed to issues yet)

  • CSV
  • Avro
  • TSV (question)
  • Schema Registry (question)
  • And more
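
As one way to picture what schema support on a delimited input format such as CSV or TSV could involve, here is a minimal sketch in plain Python. It is not Storm SQL code: the `Schema` type, the `parse_delimited` helper, and the sample columns are hypothetical and only illustrate turning raw text fields into typed rows according to a declared schema.

```python
import csv
import io
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical schema representation: ordered (column name, converter) pairs.
Schema = List[Tuple[str, Callable[[str], Any]]]


def parse_delimited(text: str, schema: Schema, delimiter: str = ",") -> List[Dict[str, Any]]:
    """Parse delimited text into typed rows, one dict per record."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for record in reader:
        # Reject records whose field count does not match the schema.
        if len(record) != len(schema):
            raise ValueError(f"expected {len(schema)} fields, got {len(record)}")
        rows.append({name: convert(raw) for (name, convert), raw in zip(schema, record)})
    return rows


# Hypothetical table definition and sample input.
orders_schema: Schema = [("id", int), ("unit_price", float), ("product", str)]
csv_rows = parse_delimited("1,9.5,apple\n2,3.0,pear\n", orders_schema)
tsv_rows = parse_delimited("1\t9.5\tapple\n", orders_schema, delimiter="\t")
print(csv_rows)
print(tsv_rows)
```

The same schema drives both formats here, and only the delimiter changes. Formats such as Avro carry their own schema, so they would need a different approach.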

Support more functions (scalar and aggregate)

JIRA link

None yet

Things to do (not filed to issues yet)

Expand supporting external components

JIRA link

STORM-2075

Done

  • Kafka as Input / Output
  • Redis as Output

Remaining work

  • STORM-2082
  • STORM-2102
  • STORM-2103
  • And more
    • Any external modules which support Trident state can be candidates.

Consideration

  • They should be rewritten if we replace the backend of Storm SQL with the higher-level core API
    • Need to determine 'widely used' data sources and provide only those for now

Projection / Filter pushdown to data source

...