Apache Kylin : Analytical Data Warehouse for Big Data
Background
Since Kylin 4 depends heavily on Spark SQL, it is worth developing a deeper understanding of Spark SQL.
...
Core Components
Name | Description |
---|---|
SQL Parser Framework | The SQL parser framework in Spark SQL uses ANTLR to translate SQL text into a data type, Expression, TableIdentifier or LogicalPlan. |
Catalyst Framework | Catalyst is an execution-agnostic framework to represent and manipulate a dataflow graph, i.e. trees of relational operators and expressions. |
Tungsten Execution Backend | The goal of Project Tungsten is to improve Spark execution by optimizing Spark jobs for CPU and memory efficiency (as opposed to network and disk I/O, which are considered fast enough). Tungsten focuses on the hardware architecture of the platform Spark runs on, including but not limited to the JVM, LLVM, GPUs and NVRAM. It does so through optimization features such as off-heap memory management with a binary in-memory row format, cache-aware computation, and whole-stage code generation. |
Monitor | The SQL tab in the web UI shows SQLMetrics per physical operator in a structured query's physical plan. You can access the SQL tab under …; by default, it displays all SQL query executions. Once a query has been selected, the SQL tab displays the details of that structured query execution. |
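The Catalyst entry describes queries as trees of operators and expressions that are rewritten by rules. The Python sketch below illustrates that idea on a toy expression tree. The classes Expr, Literal, Attr and Add, the transform_up method and the constant_folding rule are all hypothetical stand-ins, loosely modeled on Catalyst's TreeNode.transformUp and its ConstantFolding optimizer rule; they are not Spark's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


class Expr:
    """Toy tree node (hypothetical stand-in for Catalyst's TreeNode)."""

    def children(self) -> list[Expr]:
        return []

    def with_children(self, new_children: list[Expr]) -> Expr:
        # Leaf nodes have no children, so they are returned unchanged.
        return self

    def transform_up(self, rule: Callable[[Expr], Expr]) -> Expr:
        # Post-order rewrite: transform the children first, then apply the
        # rule to the rebuilt node (compare Catalyst's transformUp).
        rebuilt = self.with_children([c.transform_up(rule) for c in self.children()])
        return rule(rebuilt)


@dataclass(frozen=True)
class Literal(Expr):
    value: int


@dataclass(frozen=True)
class Attr(Expr):
    name: str  # column reference by name


@dataclass(frozen=True)
class Add(Expr):
    left: Expr
    right: Expr

    def children(self) -> list[Expr]:
        return [self.left, self.right]

    def with_children(self, new_children: list[Expr]) -> Expr:
        return Add(*new_children)


def constant_folding(e: Expr) -> Expr:
    # Hypothetical rule: collapse Add(Literal, Literal) into a single Literal.
    if isinstance(e, Add) and isinstance(e.left, Literal) and isinstance(e.right, Literal):
        return Literal(e.left.value + e.right.value)
    return e


expr = Add(Attr("price"), Add(Literal(1), Literal(2)))
print(expr.transform_up(constant_folding))
# Add(left=Attr(name='price'), right=Literal(value=3))
```

Because the traversal is bottom-up, a nested sum such as (1 + 2) + 3 folds completely to Literal(6) in a single pass, while subtrees that reference columns are left alone.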
...
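The Tungsten entry refers to a binary in-memory data representation. As a rough illustration only, the sketch below packs a row into a null bitmap followed by one fixed 8-byte slot per field, so any field can be read by computing its offset instead of deserializing the whole row. The layout and the helpers encode_row, get_field and bitmap_size are hypothetical and support only long and double columns; Spark's UnsafeRow follows a similar idea, but its exact format differs (for example, it also has a region for variable-length values).

```python
from __future__ import annotations

import struct

WORD = 8  # bytes per bitmap word and per fixed-width field slot


def bitmap_size(num_fields: int) -> int:
    # One null bit per field, rounded up to whole 8-byte words.
    return ((num_fields + 63) // 64) * WORD


def encode_row(schema: list[str], values: list) -> bytes:
    """Pack a row as [null bitmap][one 8-byte slot per field] (toy layout)."""
    bitmap = 0
    slots = bytearray()
    for i, (kind, v) in enumerate(zip(schema, values)):
        if v is None:
            bitmap |= 1 << i
            slots += bytes(WORD)  # null fields keep a zero-filled slot
        elif kind == "double":
            slots += struct.pack("<d", float(v))
        else:  # "long"
            slots += struct.pack("<q", int(v))
    return bitmap.to_bytes(bitmap_size(len(schema)), "little") + bytes(slots)


def get_field(buf: bytes, schema: list[str], i: int):
    """Read field i from its computed offset without decoding other fields."""
    header = bitmap_size(len(schema))
    bitmap = int.from_bytes(buf[:header], "little")
    if (bitmap >> i) & 1:
        return None
    fmt = "<d" if schema[i] == "double" else "<q"
    return struct.unpack_from(fmt, buf, header + i * WORD)[0]


schema = ["long", "double", "long"]
row = encode_row(schema, [42, 3.5, None])
print(len(row), get_field(row, schema, 1), get_field(row, schema, 2))  # 32 3.5 None
```

Fixed-width slots trade some space (a null field still occupies 8 bytes) for constant-time field access and a compact, GC-free byte layout.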
Core Contract
Name | Contract | Comment |
---|---|---|
SparkSession | Entry Point to Spark SQL | As a Spark developer, you create a SparkSession using SparkSession.builder. |
Dataset | Structured Query with Data Encoder | Dataset is a strongly-typed data structure in Spark SQL that represents a structured query. |
Catalyst | Tree Manipulation Framework | Catalyst is an execution-agnostic framework to represent and manipulate a dataflow graph, i.e. trees of relational operators and expressions. |
TreeNode | Node in Catalyst Tree | |
Expression | Executable Node in Catalyst Tree | |
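QueryPlan | Structured Query Plan | Scala-specific: QueryPlan is an abstract class, the base of LogicalPlan (logical plans) and SparkPlan (physical plans). |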
Catalog | Metastore Management Interface | |
Attribute | Base of leaf named expressions | |
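The contracts above fit together roughly as follows: a query plan is a tree of operators, expressions are executable nodes evaluated against rows, and attributes are leaf named expressions that must be resolved to column positions before execution. The toy Python sketch below mimics that flow with hypothetical classes (Attribute, BoundRef, Literal, GreaterThan, Scan, Filter, Project). Binding an attribute to an ordinal loosely mirrors how Spark replaces attribute references with BoundReference, but none of this is Spark's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterator

Row = tuple


@dataclass(frozen=True)
class BoundRef:
    ordinal: int  # column position, resolved before execution

    def eval(self, row: Row) -> Any:
        return row[self.ordinal]


@dataclass(frozen=True)
class Attribute:
    name: str  # leaf named expression: refers to a column by name

    def bind(self, schema: tuple[str, ...]) -> BoundRef:
        return BoundRef(schema.index(self.name))


@dataclass(frozen=True)
class Literal:
    value: Any

    def bind(self, schema: tuple[str, ...]) -> Literal:
        return self

    def eval(self, row: Row) -> Any:
        return self.value


@dataclass(frozen=True)
class GreaterThan:
    left: Any
    right: Any

    def bind(self, schema: tuple[str, ...]) -> GreaterThan:
        return GreaterThan(self.left.bind(schema), self.right.bind(schema))

    def eval(self, row: Row) -> bool:
        return self.left.eval(row) > self.right.eval(row)


@dataclass(frozen=True)
class Scan:
    output: tuple[str, ...]
    rows: tuple[Row, ...]

    def execute(self) -> Iterator[Row]:
        return iter(self.rows)


@dataclass(frozen=True)
class Filter:
    condition: Any
    child: Any

    @property
    def output(self) -> tuple[str, ...]:
        return self.child.output

    def execute(self) -> Iterator[Row]:
        # Resolve column names to ordinals once, then evaluate per row.
        bound = self.condition.bind(self.child.output)
        return (r for r in self.child.execute() if bound.eval(r))


@dataclass(frozen=True)
class Project:
    columns: tuple[Attribute, ...]
    child: Any

    @property
    def output(self) -> tuple[str, ...]:
        return tuple(c.name for c in self.columns)

    def execute(self) -> Iterator[Row]:
        bound = [c.bind(self.child.output) for c in self.columns]
        return (tuple(b.eval(r) for b in bound) for r in self.child.execute())


# SELECT name FROM people WHERE age > 30, written as a plan tree
plan = Project(
    (Attribute("name"),),
    Filter(GreaterThan(Attribute("age"), Literal(30)),
           Scan(("name", "age"), (("ann", 25), ("bob", 41)))),
)
print(list(plan.execute()))  # [('bob',)]
```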
...
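A Dataset pairs a structured query with an encoder that converts between typed objects and Spark SQL's internal rows. The hypothetical ToyEncoder below sketches that round trip for a Python dataclass: typed objects go in, the query logic works on untyped rows addressed through a schema, and typed objects come back out. Spark's real Encoder targets its internal binary row format rather than plain tuples.

```python
from dataclasses import astuple, dataclass, fields
from typing import Generic, Type, TypeVar

T = TypeVar("T")


class ToyEncoder(Generic[T]):
    """Hypothetical encoder: converts typed objects to untyped rows and back."""

    def __init__(self, cls: Type[T]):
        self.cls = cls
        # Column names come from the dataclass fields, in declaration order.
        self.schema = tuple(f.name for f in fields(cls))

    def to_row(self, obj: T) -> tuple:
        return astuple(obj)

    def from_row(self, row: tuple) -> T:
        return self.cls(*row)


@dataclass
class Person:
    name: str
    age: int


enc = ToyEncoder(Person)
rows = [enc.to_row(p) for p in [Person("ann", 25), Person("bob", 41)]]

# The "engine" works on untyped rows, resolving columns through the schema...
age_idx = enc.schema.index("age")
matches = [r for r in rows if r[age_idx] > 30]

# ...and the encoder turns the results back into typed objects.
print([enc.from_row(r) for r in matches])  # [Person(name='bob', age=41)]
```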