Apache Kylin : Analytical Data Warehouse for Big Data

Welcome to Kylin Wiki.

1. After successfully connecting Kylin node, click the icon in the upper right corner to exit the management mode:

After exiting the management mode, the display interface is as follows, and you can create a dataset:

2. MDX for Kylin will automatically read kylin's metadata information. Here, we choose to create a dataset named `coved_trip_dataset`

under project `covid_trip_project`

:

3. Select the cube in kylin and define the cube relationship.

we put `covid19_cube`

and `newyork_trip_cube`

under the same dataset to establish the connection relationship:

The connection relationship between cubes is realized through the common dimension table between two cubes, `covid_cube`

and `newyork_trip_cube`

all use the same dimension table `lookup_calendar`

：

4. In the D`efine Semantics`

step, you can modify the name of atomic indicators, define business indicators, and add hierarchical dimensions, etc.

- First, in order to avoid confusion in the data analysis layer of the count measurement in the two cubes, we can modify the indicator name to a different name:

- According to the business problems mentioned above, you can define MTD, YTD, MOM and YOY for the measure in the cube by adding calculated measure in `MDX for Kylin`:

Before defining these business indicators, we first define the hierarchical for the time dimension in dimension table `lookup_calender`

. After defining the hierarchical dimension, we can roll up and drill down on this dimension to facilitate business personnel to analyze and observe data from different granularity of the same dimension.

Add dimensions `lookup_calender.year_start`

, `lookup_calender.quarter_start`

、`lookup_calender.month_start`

and `lookup_calender.day_start`

to a hierarchical dimension in turn:

Similarly, we can also add hierarchical dimension for the regional dimensions in table `COVID_19_ACTIVITY`

:

- Because we need to calculate MTD and YTD at the time level, we need to correctly set the types of each time dimension. For example, set the type of `lookup_calender.day_start`

to D`ay`

:

Similarly, set the type of `lookup_calender.month_start`

to M`onth`

, the type of `lookup_calender.quarter_start `

to `Quarter`

, and the type of `lookup_calender.year_start`

to Y`ear`

.

- Then, taking `SUM_PASSENGER_COUNT`

as an example, MTD, YTD, MOM and YOY can be easily defined through the built-in template expression of `MDX for Kylin`

:

Similar to `SUM_PASSENGER_COUNT,`

create business indicators of MTD, YTD, MOM and YOY for other atomic indicators in `newyork_trip_cube`

.

- After the above indicators are added, it is necessary to define `semi-additive measure`

for two measures `SUM_DEATH_COUNT`

and `SUM_POSITIVE_CASES`

in `covid19_cube`

. S`emi-additive measure`

refer to measures that are not cumulative in the time dimension but can be normally accumulated in dimensions other than time. Since the number of positive cases and the number of death count has accumulated over time, they are `semi-additive measure`

like commodity inventory or bank balance. `MDX for Kylin`

supports defining `semi-additive measures`

through `calculated measure`

.

For example, `SUM_DEATH_COUNT`

in `COVID_19_ACTIVITY`

is defined as a `semi-additive measure`

, and its mdx expression is:

sum([LOOKUP_CALENDAR].[TIME_HIERARCHY-Hierarchy].CurrentMember.LastChild, [Measures].[SUM_DEATH_COUNT])

Similarly, `SUM_POSITIVE_CASES`

is also defined as a semi cumulative measure:

Then define the indicators of MOM and YOY on Semiadditive_DEATH_COUNT, Semiadditive_POSITIVE_CASE, SUM_NEW_DEATH_COUNT and SUM_NEW_POSITIVE_CASES. Since the number of positive cases and the number of deaths themselves are cumulative values, there is no need to define the indicators of MTD and YTD.

- According to the indicator design of the business scenario, in addition to the indicators of YTD, MTD, YOY and MOM, some other indicators need to be defined:

1. Covid19 case fatality rate：Semiadditive_DEATH_COUNT/Semiadditive_POSITIVE_CASES

2. Average distance traveled by Newyork taxi：SUM_TRIP_DISTANCE/ORDER_COUNT

After all the above business indicators are defined, the next step is to add more translations in different languages to the indicator names, which can facilitate analysts from different countries around the world to analyze data. If there is no multilingual requirement, you can skip this step and save the dataset. After the dateset is saved successfully, the next step is to use the BI tool to connect MDX` for Kylin`

for convenient and fast data analysis.