Apache Kylin : Analytical Data Warehouse for Big Data
Welcome to Kylin Wiki.
1. After successfully connecting Kylin node, click the icon in the upper right corner to exit the management mode:
After exiting the management mode, the display interface is as follows, and you can create a dataset:
2. MDX for Kylin will automatically read kylin's metadata information. Here, we choose to create a dataset named coved_trip_dataset
under project covid_trip_project
:
3. Select the cube in kylin and define the cube relationship.
we put covid19_cube
and newyork_trip_cube
under the same dataset to establish the connection relationship:
The connection relationship between cubes is realized through the common dimension table between two cubes, covid_cube
and newyork_trip_cube
all use the same dimension table lookup_calendar
:
4. In the Define Semantics
step, you can modify the name of atomic indicators, define business indicators, and add hierarchical dimensions, etc.
- First, in order to avoid confusion in the data analysis layer of the count measurement in the two cubes, we can modify the indicator name to a different name:
- According to the business problems mentioned above, you can define MTD, YTD, MOM and YOY for the measure in the cube by adding calculated measure in `MDX for Kylin`:
Before defining these business indicators, we first define the hierarchical for the time dimension in dimension table lookup_calender
. After defining the hierarchical dimension, we can roll up and drill down on this dimension to facilitate business personnel to analyze and observe data from different granularity of the same dimension.
Add dimensions lookup_calender.year_start
, lookup_calender.quarter_start
、lookup_calender.month_start
and lookup_calender.day_start
to a hierarchical dimension in turn:
Similarly, we can also add hierarchical dimension for the regional dimensions in table COVID_19_ACTIVITY
:
- Because we need to calculate MTD and YTD at the time level, we need to correctly set the types of each time dimension. For example, set the type of lookup_calender.day_start
to Day
:
Similarly, set the type of lookup_calender.month_start
to Month
, the type of lookup_calender.quarter_start
to Quarter
, and the type of lookup_calender.year_start
to Year
.
- Then, taking SUM_PASSENGER_COUNT
as an example, MTD, YTD, MOM and YOY can be easily defined through the built-in template expression of MDX for Kylin
:
Similar to SUM_PASSENGER_COUNT,
create business indicators of MTD, YTD, MOM and YOY for other atomic indicators in newyork_trip_cube
.
- After the above indicators are added, it is necessary to define semi-additive measure
for two measures SUM_DEATH_COUNT
and SUM_POSITIVE_CASES
in covid19_cube
. Semi-additive measure
refer to measures that are not cumulative in the time dimension but can be normally accumulated in dimensions other than time. Since the number of positive cases and the number of death count has accumulated over time, they are semi-additive measure
like commodity inventory or bank balance. MDX for Kylin
supports defining semi-additive measures
through calculated measure
.
For example, SUM_DEATH_COUNT
in COVID_19_ACTIVITY
is defined as a semi-additive measure
, and its mdx expression is:
sum([LOOKUP_CALENDAR].[TIME_HIERARCHY-Hierarchy].CurrentMember.LastChild, [Measures].[SUM_DEATH_COUNT])
Similarly, SUM_POSITIVE_CASES
is also defined as a semi cumulative measure:
Then define the indicators of MOM and YOY on Semiadditive_DEATH_COUNT, Semiadditive_POSITIVE_CASE, SUM_NEW_DEATH_COUNT and SUM_NEW_POSITIVE_CASES. Since the number of positive cases and the number of deaths themselves are cumulative values, there is no need to define the indicators of MTD and YTD.
- According to the indicator design of the business scenario, in addition to the indicators of YTD, MTD, YOY and MOM, some other indicators need to be defined:
1. Covid19 case fatality rate:Semiadditive_DEATH_COUNT/Semiadditive_POSITIVE_CASES
2. Average distance traveled by Newyork taxi:SUM_TRIP_DISTANCE/ORDER_COUNT
After all the above business indicators are defined, the next step is to add more translations in different languages to the indicator names, which can facilitate analysts from different countries around the world to analyze data. If there is no multilingual requirement, you can skip this step and save the dataset. After the dateset is saved successfully, the next step is to use the BI tool to connect MDX for Kylin
for convenient and fast data analysis.