Apache Kylin : Analytical Data Warehouse for Big Data
Background
Because Kylin 4 depends heavily on Spark SQL, it helps to have a deeper understanding of Spark SQL.
Definition
Catalyst is an execution-agnostic framework to represent and manipulate a dataflow graph, i.e. trees of relational operators and expressions.
The main abstraction in Catalyst is TreeNode, which is used to build trees of Expressions or QueryPlans.
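To make the tree idea concrete, here is a minimal sketch of a TreeNode-style abstraction with a bottom-up rewrite rule. It is a toy model written in plain Python, not Spark's actual classes: the names children, with_new_children, transform_up, Literal, Add and fold_constants are all assumptions made for this illustration, loosely modelled on the concepts described above.

```python
from dataclasses import dataclass


class TreeNode:
    """Toy stand-in for a Catalyst-style tree node (hypothetical, not Spark's API)."""

    def children(self):
        # Leaf nodes have no children; inner nodes override this.
        return ()

    def with_new_children(self, new_children):
        # Leaf nodes have nothing to replace, so they return themselves.
        return self

    def transform_up(self, rule):
        # Rewrite the children first, then apply the rule to the rebuilt node.
        new_children = tuple(c.transform_up(rule) for c in self.children())
        return rule(self.with_new_children(new_children))


@dataclass(frozen=True)
class Literal(TreeNode):
    value: int


@dataclass(frozen=True)
class Attribute(TreeNode):
    name: str


@dataclass(frozen=True)
class Add(TreeNode):
    left: TreeNode
    right: TreeNode

    def children(self):
        return (self.left, self.right)

    def with_new_children(self, new_children):
        left, right = new_children
        return Add(left, right)


def fold_constants(node):
    """Example rule: replace Add(Literal, Literal) with a single Literal."""
    if (
        isinstance(node, Add)
        and isinstance(node.left, Literal)
        and isinstance(node.right, Literal)
    ):
        return Literal(node.left.value + node.right.value)
    return node


# x + (1 + 2) is rewritten to x + 3; the original tree is left untouched.
tree = Add(Attribute("x"), Add(Literal(1), Literal(2)))
folded = tree.transform_up(fold_constants)
```

The nodes are immutable, so a rule never edits a tree in place. It returns a new tree, which is what makes it safe to chain many small rewrite rules one after another.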
Core Contract
Name | Role | Comment |
---|---|---|
SparkSession | Entry Point to Spark SQL | As a Spark developer, you create a |
Dataset | Structured Query with Data Encoder | Dataset is a strongly-typed data structure in Spark SQL that represents a structured query. |
Catalyst | Tree Manipulation Framework | Catalyst is an execution-agnostic framework to represent and manipulate a dataflow graph, i.e. trees of relational operators and expressions. |
TreeNode | Node in Catalyst Tree | |
Expression | Executable Node in Catalyst Tree | |
QueryPlan | Structured Query Plan | Scala-specific, |
Catalog | Metastore Management Interface | |
Attribute | Base of leaf named expressions | |
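The Expression and Attribute rows can be illustrated with a small sketch of an "executable" tree: every node exposes an eval method that computes a value from an input row. This is a toy model in plain Python, not Spark's implementation. The classes Expression, Literal, Attribute, BoundReference and Add, and the helper bind, are hypothetical names chosen for this example. The sketch also assumes that a named attribute must be resolved to a column position before it can be evaluated.

```python
from dataclasses import dataclass


class Expression:
    """Toy executable tree node (hypothetical): eval(row) yields a value."""

    def eval(self, row):
        raise NotImplementedError


@dataclass(frozen=True)
class Literal(Expression):
    value: int

    def eval(self, row):
        return self.value


@dataclass(frozen=True)
class Attribute(Expression):
    """Leaf named expression: it only knows a column name."""
    name: str

    def eval(self, row):
        # Assumption of this sketch: a bare name cannot be evaluated.
        raise TypeError(f"unbound attribute: {self.name}")


@dataclass(frozen=True)
class BoundReference(Expression):
    """Leaf that reads a value from a fixed position in the row."""
    ordinal: int

    def eval(self, row):
        return row[self.ordinal]


@dataclass(frozen=True)
class Add(Expression):
    left: Expression
    right: Expression

    def eval(self, row):
        return self.left.eval(row) + self.right.eval(row)


def bind(expr, schema):
    """Replace every Attribute with a BoundReference using the schema's column order."""
    if isinstance(expr, Attribute):
        return BoundReference(schema.index(expr.name))
    if isinstance(expr, Add):
        return Add(bind(expr.left, schema), bind(expr.right, schema))
    return expr


schema = ["price", "tax"]
expr = Add(Attribute("price"), Add(Attribute("tax"), Literal(1)))
bound = bind(expr, schema)
result = bound.eval((10, 2))  # 10 + (2 + 1)
```

Separating the named form from the bound form keeps the expression tree independent of any particular row layout until the moment it has to run.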
Core Diagram
Framework UML Diagram
Credit
All rights reserved to jaceklaskowski.