Motivation

Add support for SQL-compatible window functions to SQL++

User Model

Window function call ::= function_name(arg1, arg2, ... argN) OVER (frame_var AS)? 
( (PARTITION BY expr1, expr2, …. exprN)? (ORDER BY exprA, exprB, … exprN)? Frame_Spec? )
Frame_Spec ::= ( ROWS | RANGE | GROUPS ) Frame_Extent Frame_Exclusion?
Frame_Extent ::= Frame_Bound | BETWEEN Frame_Bound AND Frame_Bound
Frame_Bound ::= UNBOUNDED PRECEDING | expr PRECEDING | CURRENT ROW | UNBOUNDED FOLLOWING | expr FOLLOWING
Frame_Exclusion ::= EXCLUDE ( CURRENT ROW | GROUP | TIES | NO OTHERS )

Function_name is one of the following:

  • Window function: cume_dist(), dense_rank(), first_value(), lag(), last_value(), lead(), nth_value(), ntile(), rank(), ratio_to_report(), row_number(), percent_rank()
  • SQL aggregate function: count(), sum(), avg(), min(), max(), etc
  • SQL++ aggregate function: array_count(), array_sum(), strict_count(), strict_sum(), etc

frame_var is a variable which is bound to the frame contents and will be in-scope for function arguments. It contains an array (or multi-set if there’s no order by) of objects where each object has the same structure as the GROUP AS variable object (one field for each variable in the current scope).

Example: 

from emp select array_sum((from w select value w.emp.salary)) over w as (partition by emp.dept) 

Design

The following components were added:

SQL++ Expression

 org.apache.asterix.lang.sqlpp.expression.WindowExpression

 This expression is for a single window function call.

 Contents:

  • Function name
  • Function arguments
  • List of ‘partition by’ expressions
  • List of ‘order by’ expressions and order modifiers
  • Frame mode: (enum) RANGE | ROWS | GROUPS
  • Frame boundary start, end : (enum) CURRENT_ROW | UNBOUNDED_PRECEDING |  UNBOUNDED_FOLLOWING | BOUNDED_PRECEDING (+ boundary expression) | BOUNDED_FOLLOWING ( + boundary expression)
  • Frame exclusion kind: (enum) CURRENT_ROW | GROUP | TIES | NO_OTHERS
  • Frame variable and its field list mapping

Algebricks Operator

org.apache.hyracks.algebricks.core.algebra.operators.logical.WindowOperator

This operator can evaluate multiple function calls over the same window definition

Contents:

  • List of ‘partition by’ expressions
  • List of ‘order by’ expressions and order modifiers
  • List of ‘frame start’ expressions (empty if unbounded)
  • List of ‘frame end’ expressions (empty if unbounded)
  • List of ‘frame value’ expressions
  • List of ‘frame exclusion’ expressions (empty if exclusion kind is ‘no others’)
  • List of output variables and function call expressions. These are running aggregates such as row_number(), rank() that operate on whole partitions (frame definition is not applicable to these functions)
  • Nested plan. These are regular SQL++ aggregates that operate on window frames. Each frame is sent into the nested plan through the nested tuple source operator. SQL aggregate functions are rewritten into SQL++ aggregates by SqlppWindowExpressionVisitor

The logic of the operator is a follows:

  1. Split input into partitions as specified by ‘partition by’ expressions
  2. Then order tuples within each partition as specified by ‘order by’ expressions
  3. Then for each partition
    1. For each tuple compute its running aggregates
    2. For each tuple find a window frame and compute regular aggregates by sending that frame into the nested plan

Window frame computation:

For each tuple in the partition we need to find a set of tuples from the same partition that match the frame definition. This is essentially a self-join with the following condition:

‘frame value’ >= ‘frame start’ and ‘frame value’ <= ‘frame end’ and ‘frame value’ != ‘frame exclusion’

Expressions generated for ‘frame value’, ‘frame start’ and ‘frame end’ depend on the frame mode and frame boundary specification:

Frame mode‘frame value’ expressions
RANGE

{  ‘order by’ expr1, expr2, … exprN }
all ‘order by’ expressions

ROWS

{ row_number() }
a single expression which is a result of computing row_number() over the input dataset (this additional window operator is introduced by the compiler)

GROUPS 

{ dense_rank() }
a single expression which is a result of computing dense_rank() over the input dataset (this additional window operator is introduced by the compiler)


Boundary kind

‘frame start’ and ‘frame end’ expressions

CURRENT ROW{ ‘frame value’ expr1, expr2, … exprN }
all ‘frame value’ expressions’ 
BOUNDED PRECEDING

{ ‘frame value’ expression – ‘boundary’ expression }
(only a single ‘frame value’ expression is allowed in this mode)

BOUNDED FOLLOWING

{ ‘frame value’ expression + ‘boundary’ expression }
(only a single ‘frame value’ expression is allowed in this mode)

UNBOUNDED PRECEDING 

{} (empty)

UNBOUNDED FOLLOWING 

{} (empty)


‘frame exclusion’ expressions use the same mechanism as ‘frame value’ expressions and are generated as follows:

Exclusion kind

‘frame exclusion’ expressions

NO OTHERS

{} (empty)

CURRENT ROW { row_number() }
GROUP

{ dense_rank() }

TIES

{ dense_rank(), not row_number() }


Hyracks Runtime

Window operator runtime consists of an abstract class org.apache.hyracks.algebricks.runtime.operators.win.AbstractWindowPushRuntime and its subclasses:

  • WindowSimplePushRuntime – used for row_number(), rank(), dense_rank(), and some others – running aggregates that do not require information about partition length

  • WindowMaterializingPushRuntime – used for cume_dist(), ntile(), percent_rank() – running aggregates that require information about partition length

  • WindowNestedPlansPushRuntime – used for regular aggregates – this implementation supports frame computation.  

    • WindowNestedPlansUnboundedPushRuntime - optimized version used when the frame is equivalent to the whole partition (unbounded preceding to unbounded following)
    • WindowNestedPlansRunningPushRuntime - optimized version used when the frame is unbounded preceding to current row / n following

Open items

  • Optimize performance of WindowNestedPlansPushRuntime: support reverse aggregation steps for sliding frames (x preceding to y following, etc)


  • No labels