Apache Kylin : Analytical Data Warehouse for Big Data
01 Background
Kylin currently uses a two-level storage structure made up of segments and layouts. When the hardware available for queries is sufficient to provide adequate concurrent computing power, the benefits of precomputation allow query performance to meet expectations. However, this structure has significant drawbacks:
- Too many small index files are distributed across different segments, resulting in suboptimal storage efficiency and read I/O efficiency.
- The foundational (detailed/aggregate) index data files are large, so queries that hit them cannot meet users' performance expectations.
- The storage structure lacks flexibility for customization based on user business scenarios, leaving limited room and few methods for optimization. For example:
  - Point Query Scenario: When users wish to perform point or range queries on high-cardinality columns such as UserID or phone numbers, they must perform a full scan of the relevant layout to obtain results.
  - Aggregate Queries with Filter Conditions: In common customer queries with multiple fields (sometimes dozens) as filter conditions, Kylin needs to hit a layout that includes all the filter dimensions before performing the subsequent aggregation calculations. Unlike traditional materialized views, which can be precomputed for precisely defined filter conditions, Kylin's precomputation is done on the entire data set. As a result, a query that filters on high-cardinality fields hits a large index, or even a foundational detailed index, and its performance fails to meet user expectations in many scenarios.
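
The layout-matching constraint described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Kylin's actual layout-selection code: the `Layout` class, the `select_layout` function, the layout names, and the row counts are all hypothetical. It assumes a query can only be served by a layout whose dimensions cover every filter dimension, and that the smallest such layout is preferred.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Layout:
    """Hypothetical stand-in for a precomputed layout (not a Kylin class)."""
    name: str
    dimensions: frozenset
    rows: int  # illustrative size, used as a proxy for scan cost


def select_layout(layouts: Iterable[Layout], query_dims: Iterable[str]) -> Optional[Layout]:
    """Return the smallest layout covering all query dimensions, or None."""
    needed = frozenset(query_dims)
    candidates = [l for l in layouts if needed <= l.dimensions]
    return min(candidates, key=lambda l: l.rows, default=None)


# Made-up layouts: a small aggregate, a larger aggregate that includes a
# high-cardinality column, and a detail-level layout with every column.
LAYOUTS = [
    Layout("agg_region_day", frozenset({"region", "day"}), 10_000),
    Layout("agg_region_day_user", frozenset({"region", "day", "user_id"}), 50_000_000),
    Layout("base_detail", frozenset({"region", "day", "user_id", "phone"}), 200_000_000),
]

# Low-cardinality filters are served by the small aggregate.
small = select_layout(LAYOUTS, {"region", "day"})

# Adding a high-cardinality filter forces a much larger layout.
large = select_layout(LAYOUTS, {"region", "user_id"})

# A filter on a column present only at detail level falls back to the detail layout.
detail = select_layout(LAYOUTS, {"phone"})

print(small.name, large.name, detail.name)
```

In this model, one extra high-cardinality filter column moves the query from a layout of ten thousand rows to one of tens of millions, which is the effect the two scenarios above describe.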
...