How will this milestone help users play with Storm SQL?

  • Users can run 'explain' to see the query plan for a DML statement before submitting a Trident topology
  • Users can use GROUP BY and JOIN in DML statements
    • Note that these features follow the Trident semantics of aggregation and join
    • Aggregation is done within each batch
      • SQE does stateful aggregation, so Storm SQL might follow later
    • Join is done within each batch
    • However, respecting Streaming SQL semantics seems correct, which means that we could remove these features again and wait for Streaming SQL
      • This is what Flink is doing, but Flink has a Table API, so it can support window aggregation and join without Calcite support
  • Users can run Storm SQL Runner without copying dependencies to extlib
    • This is resolved by STORM-2016, but the issue originally came from STORM-1435
  • Users can use user-defined aggregate functions in Trident mode
    • User-defined functions are already supported
  • Users can specify the parallelism of the input data source, which is used unless repartitioning occurs
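
To make the "aggregation is done within each batch" note concrete, here is a minimal sketch in plain Java. It does not use any Storm or Trident API; the class PerBatchCount and its methods are hypothetical names chosen for illustration, and the batches are hard-coded lists. It only shows that a per-batch GROUP BY style count yields one result per batch instead of a running total across batches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: this is not Storm or Trident API.
class PerBatchCount {
    // Counts occurrences of each key within a single batch.
    static Map<String, Integer> countBatch(List<String> batch) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String key : batch) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Aggregates each batch independently; no state carries over between batches.
    static List<Map<String, Integer>> countPerBatch(List<List<String>> batches) {
        List<Map<String, Integer>> results = new ArrayList<>();
        for (List<String> batch : batches) {
            results.add(countBatch(batch));
        }
        return results;
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
            Arrays.asList("a", "b", "a"),
            Arrays.asList("a", "c"));
        // Key "a" is reported once per batch (2, then 1), not as a global 3.
        System.out.println(countPerBatch(batches));
    }
}
```

A stateful aggregation, as the SQE note describes, would instead carry the counts from one batch into the next.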
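
For the user-defined aggregate function item, the following is a rough sketch of what such a class might look like. It assumes the init/add/result method convention that Calcite uses for reflectively discovered aggregate functions; this page does not spell out the required shape, so treat the method names, the class name ConcatAggregate, and the fold helper as assumptions. How the class is registered with Storm SQL is not shown.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical UDAF sketch; the init/add/result convention is an assumption.
class ConcatAggregate {
    // Produces the initial accumulator value.
    public static String init() {
        return "";
    }

    // Folds one input value into the accumulator.
    public static String add(String accumulator, String value) {
        return accumulator + value;
    }

    // Converts the final accumulator into the aggregate result.
    public static String result(String accumulator) {
        return accumulator;
    }

    // Illustration helper (not part of the assumed convention):
    // applies the three steps to the values of one batch.
    static String fold(List<String> values) {
        String acc = init();
        for (String v : values) {
            acc = add(acc, v);
        }
        return result(acc);
    }

    public static void main(String[] args) {
        System.out.println(fold(Arrays.asList("x", "y", "z")));
    }
}
```

Given the per-batch semantics noted above, the accumulator in this sketch would start again from init() for every batch.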
