Link: Unresolved issues in storm-sql

Current milestone

Storm SQL Phase II

STORM-1433

Remaining work

  • STORM-1443
  • STORM-1446
  • STORM-2073

 

Not prioritized yet

Expand support for external components

JIRA link

STORM-2075

Done

  • Kafka as Input / Output
  • Redis as Output

Remaining work

  • STORM-2082
  • STORM-2102
  • STORM-2103
  • And more
    • Any external modules which support Trident state can be candidates.

Consideration

  • They would need to be rewritten if we replace the backend of Storm SQL with the higher-level core API
    • We need to determine which data sources are 'widely used' and provide only those for now

 

...

Automatic parallelism for input data source with metadata

JIRA link

STORM-2147

Things to do (not yet filed as issues)

  • Apply this to Kafka input data source (maybe handled from STORM-2147)
  • Do we want to add more input data sources? Then they need to be considered as well.
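One possible reading of "automatic parallelism with metadata" for a Kafka source is to derive the number of source tasks from the topic's partition count. The sketch below only illustrates that idea; it is not Storm SQL's implementation, and `suggest_parallelism` and `max_tasks` are hypothetical names introduced here.

```python
def suggest_parallelism(partition_count: int, max_tasks: int = 16) -> int:
    """Pick a source parallelism from data source metadata (hypothetical helper).

    Assumption: one task per partition is the ideal, capped at max_tasks
    so that a topic with many partitions does not exhaust the cluster.
    """
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")
    return min(partition_count, max_tasks)
```

Other input data sources would need their own notion of "partition count" for the same rule to apply.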

Schema support on input format and output format

JIRA link

STORM-2149

Things to do (not yet filed as issues)

  • CSV
  • Avro
  • TSV (open question)
  • Schema Registry (open question)
  • And more
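As a rough illustration of what schema support for an input and output format could mean, the sketch below converts one CSV line into a typed record and back. It is a minimal sketch under stated assumptions, not the Storm SQL design: the `Schema` shape (ordered name and converter pairs) and both function names are hypothetical.

```python
import csv
import io
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical schema representation: ordered (column name, converter) pairs.
Schema = List[Tuple[str, Callable[[str], Any]]]


def parse_csv_line(line: str, schema: Schema) -> Dict[str, Any]:
    """Input format: turn one CSV-encoded line into a typed record."""
    values = next(csv.reader(io.StringIO(line)), [])
    if len(values) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(values)}")
    return {name: convert(value) for (name, convert), value in zip(schema, values)}


def format_csv_line(record: Dict[str, Any], schema: Schema) -> str:
    """Output format: serialize a record back to one CSV line in schema order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="")
    writer.writerow([record[name] for name, _ in schema])
    return buffer.getvalue()
```

TSV could reuse the same approach with a tab delimiter; Avro and a schema registry would need a different mechanism.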

Support more functions (scalar and aggregate)

JIRA link

None yet

Things to do (not yet filed as issues)

...

  • DATE / TIMESTAMP related functions

...

Projection / Filter pushdown to data source

JIRA link

None yet

Note

  • Not sure whether it helps streaming data sources
    • It may help, but we should make sure that the Spout supports projection (maybe only column references) and filter
  • It definitely helps with input data sources that accept queries (for example, JDBC)
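For a query-accepting source such as JDBC, pushdown roughly means turning the projected columns and filter conditions into the query sent to the source, so that unneeded columns and rows never reach the topology. The sketch below illustrates this under assumptions: `build_pushdown_query` and its `(column, operator, value)` filter shape are hypothetical, and it is not how Storm SQL builds queries.

```python
from typing import Any, List, Sequence, Tuple

# Comparison operators this sketch is willing to push down.
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


def build_pushdown_query(
    table: str,
    columns: Sequence[str],
    filters: Sequence[Tuple[str, str, Any]] = (),
) -> Tuple[str, List[Any]]:
    """Build a parameterized SELECT from projected columns and simple filters.

    Filter values become '?' placeholders (JDBC-style parameter markers).
    Identifiers are inserted as-is, so callers must pass trusted names.
    """
    if not columns:
        raise ValueError("at least one column must be projected")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    clauses: List[str] = []
    params: List[Any] = []
    for column, op, value in filters:
        if op not in ALLOWED_OPS:
            raise ValueError(f"operator not supported for pushdown: {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

For example, `build_pushdown_query("orders", ["id", "total"], [("total", ">", 100)])` yields a query that selects only two columns and filters at the source.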

Depends on other work (future work)

Change backend of SQL to higher-level core API (get rid of Trident)

JIRA link

None yet
Precondition

  • Apache Storm adopts higher-level core API
    • JIRA issue: STORM-1843
  • Higher-level core API supports exactly-once

...

Note

  • Storm SQL may go back to basic features: no aggregation, no join, no sort
    • because the higher-level core API cannot support aggregation and join without windows
    • the aggregation and join semantics of current Storm SQL are very different from those of Streaming SQL

Support Streaming SQL

JIRA link

None yet

Precondition

Note

...

  • group by window
  • join between stream and table (without temporal table support)
  • join between stream and stream
  • join between stream and table (with support for temporal table or similar)
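To make "group by window" concrete, the sketch below counts events per key in fixed-size (tumbling) windows. It is only an illustration over a finite list of events, assuming each event carries a millisecond timestamp; `tumbling_window_counts` is a hypothetical name, and a real streaming engine would also have to handle late data and window eviction.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, Hashable]],
    window_ms: int,
) -> Dict[Tuple[int, Hashable], int]:
    """Count events per (window start, key) for non-overlapping windows.

    events: (timestamp_ms, key) pairs; window_ms: window length in ms.
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    counts: Dict[Tuple[int, Hashable], int] = defaultdict(int)
    for timestamp_ms, key in events:
        # Align the timestamp down to the start of its window.
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        counts[(window_start, key)] += 1
    return dict(counts)
```

Bounding each group by a window is what lets an aggregation over an unbounded stream emit a result at all.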