Apache Kylin : Analytical Data Warehouse for Big Data

Page tree

Versions Compared


  • This line was added.
  • This line was removed.
  • Formatting was changed.

1.  在 MDX 中连接 kylin 节点成功后,点击右上角的图标退出管理模式:

Image Removed

After successfully connecting Kylin node, click the icon in the upper right corner to exit the management mode:

Image Added

After exiting the management mode, the display interface is as follows, and you can create a dataset:退出管理模式后,显示界面如下,可以创建 dataset:

2.  Mdx MDX for kylin 会自动读取 kylin 的元数据信息,在这里我们选择在 covidKylin will automatically read kylin's metadata information. Here, we choose to create a dataset named coved_trip_dataset  under project 下创建名为 covid_trip_dataset 的 dataset:project :

3. Select the cube in kylin and define the cube relationship.

we put  选取 kylin 中的 cube,并定义 cube 关系,这里我们将 covid19_cube   and  newyork_trip_cube 放在同一个 dataset 下建立连接关系:

Image Removed

  under the same dataset to establish the connection relationship:

Image Added

The connection relationship between cubes is realized through the common dimension table between two cubes, covid_cube  and 在 cube 间建立连接关系是通过两个 cube 之间的共同维度表实现的,covid_cube 和 newyork_trip_cube 都使用了同一张维度表   all use the same dimension table lookup_calendar:calendar :

4.  在 Define Semantics 步骤中,进行修改原子指标名称、定义业务指标、增加层级维度等一系列操作。

- 首先为避免两个 cube 中的 count 度量在数据分析层产生混淆,我们可以修改指标名称为不同的名字:

Image Removed

- 根据前文中的业务问题,在 mdx 中可以通过增加 Caculated Measure为 cube 中的 measure 定义月累计、年累计,以及月增速、年增速:

Image Removed

In the Define Semantics  step, you can modify the name of atomic indicators, define business indicators, and add hierarchical dimensions, etc.

- First, in order to avoid confusion in the data analysis layer of the count measurement in the two cubes, we can modify the indicator name to a different name:

Image Added

- According to the business problems mentioned above, you can define MTD, YTD, MOM and YOY for the measure in the cube by adding calculated measure in `MDX for Kylin`:

Image Added

Before defining these business indicators, we first define the hierarchical for the time dimension in dimension table lookup_calender. After defining the hierarchical dimension, we can roll up and drill down on this dimension to facilitate business personnel to analyze and observe data from different granularity of the same dimension.
Add dimensions 在定义这些业务指标之前,我们先为维度表 lookup_calender 中的时间维度定义层级,定义层级维度之后可以在这个维度上进行上卷下钻,方便业务人员从同一维度的不同粒度来分析观察数据。
lookup_calender.year_start、lookupstart , lookup_calender.quarter_start、lookupstart 、lookup_calender.month_start、lookupstart and lookup_calender.day_start 这几个维度依次添加到一个层级维度中:  to a hierarchical dimension in turn:

类似的,还可以为表 Similarly, we can also add hierarchical dimension for the regional dimensions in table COVID_19_ACTIVITY 中的地区维度设置层级: :

-  由于我们要在在时间层级上进行月累计、年累计的计算,所以需要正确设置各个时间维度的类型,比如将 Because we need to calculate MTD and YTD at the time level, we need to correctly set the types of each time dimension. For example, set the type of lookup_calender.day_start 的类型设置为 Day:  to Day :

类似的,将 Similarly, set the type of lookup_calender.month_start 的类型设置为 Month,将   to Month , the type of lookup_calender.quarter_start 的类型设置为 Quarter,将 to Quarter , and the type of lookup_calender.year_start 的类型设置为 Year。  to Year .

-  然后以 Then, taking SUM_PASSENGER_COUNT 为例,通过 MDX 内置的模板表达式可以轻松定义月累计、年累计、月增速和年增速:  as an example, MTD, YTD, MOM and YOY can be easily defined through the built-in template expression of  MDX for Kylin :

Similar to SUM_PASSENGER_COUNT 类似,依次为 , create business indicators of MTD, YTD, MOM and YOY for other atomic indicators in newyork_trip_cube 中的其他原子指标创建月累计、年累计、月增速和年增速的业务指标。 .

-  After the above indicators are added, it is necessary to define semi-additive measure  for two measures - 在上述指标添加完成之后,需要对 covid19_cube 中的两个 measure SUM_DEATH_COUNT   and SUM_POSITIVE_CASES 定义半累加度量,半累加度量指的是在时间维度上不具备累加性而在时间以外的维度可以正常累加的度量,由于确诊人数和死亡人数已经是时间上的累计值,所以他们和商品库存、银行余额此类度量一样,都是半累加度量。MDX for Kylin 支持通过 Caculated Measure 来定义半累加度量。
定义 COVID_19_ACTIVITY 中的 SUM_DEATH_COUNT 为半累加度量,其 MDX 表达式为:  in covid19_cube . Semi-additive measure refer to measures that are not cumulative in the time dimension but can be normally accumulated in dimensions other than time. Since the number of positive cases and the number of death count has accumulated over time, they are semi-additive measure like commodity inventory or bank balance. MDX for Kylin  supports defining semi-additive measures through calculated measure .
For example, SUM_DEATH_COUNT  in COVID_19_ACTIVITY  is defined as a semi-additive measure, and its mdx expression is:

Code Block

同样的,将 Similarly, SUM_POSITIVE_CASES 也定义为半累加度量:  is also defined as a semi cumulative measure:

然后在 Then define the indicators of MOM and YOY on Semiadditive_DEATH_COUNT、SemiadditiveCOUNT, Semiadditive_POSITIVE_CASE、SUMCASE, SUM_NEW_DEATH_COUNT、UMCOUNT and SUM_NEW_POSITIVE_CASES 之上定义月增速和年增速的指标,由于确诊人数和死亡人数本身就已经是累计值,所以不需要在定义年累计和月累计指标。

- 根据前文中的指标设计,除了年累计、月累计、年增速、月增速的指标之外,还需要定义一些其他指标:

. Since the number of positive cases and the number of deaths themselves are cumulative values, there is no need to define the indicators of MTD and YTD.

- According to the indicator design of the business scenario, in addition to the indicators of YTD, MTD, YOY and MOM, some other indicators need to be defined:

1. Covid19 case fatality rate:Semiadditive_DEATH_COUNT/Semiadditive_POSITIVE_CASES1. Covid19 病死率:死亡人数/确诊人数

2. 出租车出行平均里程:出租车出行里程/出租车出行数量

Image Removed

Average distance traveled by Newyork taxi:SUM_TRIP_DISTANCE/ORDER_COUNT

Image Added

After all the above business indicators are defined, the next step is to add more translations in different languages to the indicator names, which can facilitate analysts from different countries around the world to analyze data. If there is no multilingual requirement, you can skip this step and save the dataset. After the dateset is saved successfully, the next step is to use the BI tool to connect MDX for Kylin  for convenient and fast data analysis.以上业务指标全部定义完成之后,进入下一步可以为指标名称增加更多不同语言的翻译,这样可以方便全球各地不同国家的分析师分析数据。没有多语言需求时可以直接跳过此步骤,保存 Dataset。Dateset 保存成功之后,下一步就可以使用 BI 工具连接 MDX for Kylin 进行方便且快速的数据分析。