Apache Kylin : Analytical Data Warehouse for Big Data
Welcome to Kylin Wiki.
1. After successfully connecting Kylin node, click the icon in the upper right corner to exit the management mode:
After exiting the management mode, the display interface is as follows, and you can create a dataset:
2. MDX for Kylin will automatically read kylin's metadata information. Here, we choose to create a dataset named
coved_trip_dataset under project
3. Select the cube in kylin and define the cube relationship.
newyork_trip_cube under the same dataset to establish the connection relationship:
The connection relationship between cubes is realized through the common dimension table between two cubes,
newyork_trip_cube all use the same dimension table
4. In the D
efine Semantics step, you can modify the name of atomic indicators, define business indicators, and add hierarchical dimensions, etc.
- First, in order to avoid confusion in the data analysis layer of the count measurement in the two cubes, we can modify the indicator name to a different name:
- According to the business problems mentioned above, you can define MTD, YTD, MOM and YOY for the measure in the cube by adding calculated measure in `MDX for Kylin`:
Before defining these business indicators, we first define the hierarchical for the time dimension in dimension table
lookup_calender. After defining the hierarchical dimension, we can roll up and drill down on this dimension to facilitate business personnel to analyze and observe data from different granularity of the same dimension.
lookup_calender.day_start to a hierarchical dimension in turn:
Similarly, we can also add hierarchical dimension for the regional dimensions in table
- Because we need to calculate MTD and YTD at the time level, we need to correctly set the types of each time dimension. For example, set the type of
lookup_calender.day_start to D
Similarly, set the type of
lookup_calender.month_start to M
onth , the type of
Quarter , and the type of
lookup_calender.year_start to Y
- Then, taking
SUM_PASSENGER_COUNT as an example, MTD, YTD, MOM and YOY can be easily defined through the built-in template expression of
MDX for Kylin :
SUM_PASSENGER_COUNT, create business indicators of MTD, YTD, MOM and YOY for other atomic indicators in
- After the above indicators are added, it is necessary to define
semi-additive measure for two measures
covid19_cube . S
emi-additive measure refer to measures that are not cumulative in the time dimension but can be normally accumulated in dimensions other than time. Since the number of positive cases and the number of death count has accumulated over time, they are
semi-additive measure like commodity inventory or bank balance.
MDX for Kylin supports defining
semi-additive measures through
calculated measure .
COVID_19_ACTIVITY is defined as a
semi-additive measure, and its mdx expression is:
SUM_POSITIVE_CASES is also defined as a semi cumulative measure:
Then define the indicators of MOM and YOY on Semiadditive_DEATH_COUNT, Semiadditive_POSITIVE_CASE, SUM_NEW_DEATH_COUNT and SUM_NEW_POSITIVE_CASES. Since the number of positive cases and the number of deaths themselves are cumulative values, there is no need to define the indicators of MTD and YTD.
- According to the indicator design of the business scenario, in addition to the indicators of YTD, MTD, YOY and MOM, some other indicators need to be defined:
1. Covid19 case fatality rate：Semiadditive_DEATH_COUNT/Semiadditive_POSITIVE_CASES
2. Average distance traveled by Newyork taxi：SUM_TRIP_DISTANCE/ORDER_COUNT
After all the above business indicators are defined, the next step is to add more translations in different languages to the indicator names, which can facilitate analysts from different countries around the world to analyze data. If there is no multilingual requirement, you can skip this step and save the dataset. After the dateset is saved successfully, the next step is to use the BI tool to connect MDX
for Kylin for convenient and fast data analysis.