Apache Kylin : Analytical Data Warehouse for Big Data


Background

In previous Kylin releases, the logs of Kylin's build engine and query engine were collected and stored by the resource manager (e.g., yarn logs -applicationId xxx) or by the HBase Region Server instance.

This made it difficult to find the root cause of a failed job or a slow query. To solve this problem, Kylin 4.0.0 refactored the logging of the build job. In Kylin 4.0.0, these logs are collected and stored under Kylin's working dir (HDFS or S3).

The log4j configuration files for this logging are the following two: ${KYLIN_HOME}/conf/spark-driver-log4j.properties and ${KYLIN_HOME}/conf/spark-executor-log4j.properties.
Kylin also provides default log4j configurations for users who do not want to upload logs to Kylin's working dir (HDFS or S3).

The default log4j configuration files are the following two: ${KYLIN_HOME}/conf/spark-driver-log4j-default.properties and ${KYLIN_HOME}/conf/spark-executor-log4j-default.properties. Without modification to kylin.properties, Kylin starts with the default log4j configuration.
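
For reference, the corresponding entries in kylin.properties look like the following. This is a minimal sketch: the property keys appear elsewhere on this page, but the exact default values are an assumption inferred from the behavior described above.

Code Block
vim ${KYLIN_HOME}/conf/kylin.properties

# Assumed defaults: Kylin falls back to the *-default files when these are unmodified
kylin.spark.driver.log4j.properties=spark-driver-log4j-default.properties
kylin.spark.executor.log4j.properties=spark-executor-log4j-default.properties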

Default logger for Query and Cubing

ConsoleAppender

The default configuration for the spark driver in Query and Cubing is spark-driver-log4j-default.properties:

Code Block
vim ${KYLIN_HOME}/conf/spark-driver-log4j-default.properties

log4j.rootLogger=INFO,stderr


The default configuration for the spark executor in Query and Cubing is spark-executor-log4j-default.properties:

Code Block
vim ${KYLIN_HOME}/conf/spark-executor-log4j-default.properties

log4j.rootLogger=INFO,stderr
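
Both default files route all logging to stderr through a ConsoleAppender. For orientation, a minimal sketch of what such a configuration typically looks like in full; the appender options below are standard log4j 1.x settings chosen to match the log format shown later on this page, not copied verbatim from the shipped files:

Code Block
# Illustrative ConsoleAppender wiring (assumed, not verbatim from Kylin's files)
log4j.rootLogger=INFO,stderr
log4j.appender.stderr=org.apache.log4j.ConsoleAppender
log4j.appender.stderr.Target=System.err
log4j.appender.stderr.layout=org.apache.log4j.PatternLayout
log4j.appender.stderr.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %p [%t] %c{1}:%L : %m%n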


If using default log4j properties, you will see the following messages:

Code Block
2022-06-17 17:50:22,610 INFO [Scheduler 1342122509 Job 6ba9cf9f-18e2-4290-93b9-36674b2cfca8-65] job.NSparkExecutable:451 : Current using default log4j properties for spark driver in using `ConsoleAppender`.Please modify `kylin.spark.driver.log4j.properties` to be `spark-driver-log4j.properties`for uploading log file to hdfs.

2022-06-17 17:50:22,610 INFO [Scheduler 1342122509 Job 6ba9cf9f-18e2-4290-93b9-36674b2cfca8-65] job.NSparkExecutable:457 : Current using default log4j properties for spark executor in using `ConsoleAppender`.Please modify `kylin.spark.executor.log4j.properties` to be `spark-executor-log4j.properties`for uploading log file to hdfs.




Logger for Cubing

Using spark-driver-log4j.properties and spark-executor-log4j.properties

Users can change the default log4j configuration files for Kylin.

Code Block
vim ${KYLIN_HOME}/conf/kylin.properties

kylin.spark.driver.log4j.properties=spark-driver-log4j.properties
kylin.spark.executor.log4j.properties=spark-executor-log4j.properties


Then Kylin will start with spark-driver-log4j.properties and spark-executor-log4j.properties to collect and store logs of Kylin's build engine.
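
A restart is required for the change to take effect. A minimal sequence, assuming the standard kylin.sh control script:

Code Block
# Restart Kylin so the new log4j configuration is picked up
${KYLIN_HOME}/bin/kylin.sh stop
${KYLIN_HOME}/bin/kylin.sh start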



Driver Log

SparkDriverHdfsLogAppender in spark-driver-log4j.properties

spark-driver-log4j.properties is used to configure the output path, appender, layout, etc. of the spark driver log in the build job. By default, the spark driver log of one step of a build job is output to a file in HDFS.

The file path is spliced from kylin.env.hdfs-working-dir, kylin.metadata.url, the project name, the step id, etc. The step id is spliced from the job id and a two-digit counter starting at 00: a build job's first step has step id jobId-00, the second step jobId-01, and so on. The specific path of the log file is `${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs/driver/${step_id}/execute_output.json.timestamp.log`.
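
For example, with hypothetical values (working dir hdfs://nn/kylin, metadata url kylin_metadata, project my_project, job id abc123), the driver log of the second step would land at:

Code Block
# Hypothetical values; the trailing timestamp is epoch milliseconds
hdfs://nn/kylin/kylin_metadata/my_project/spark_logs/driver/abc123-01/execute_output.json.1655459422610.log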




View logs through Kylin WebUI

When SparkDriverHdfsLogAppender is enabled, users can download driver logs from Kylin's Web UI, even when spark.submit.deployMode is cluster (meaning the driver is not located on the same node as the Kylin Job Server).

By default, the Output panel shows only the first and last 100 lines of all logs of this step.

If you need to view all logs, click "download the log file" at the top of the Output window; the complete spark driver log file of this step will then be downloaded locally by the browser.



FileAppender in spark-driver-log4j.properties

If the user does not want to upload the spark driver log to HDFS during the build job, the configuration item in spark-driver-log4j.properties can be changed:

Code Block
vim ${KYLIN_HOME}/conf/spark-driver-log4j.properties

log4j.rootLogger=INFO,logFile

After modifying the configuration, restart Kylin; the spark driver log of one step of a job will then be output to the local file ${KYLIN_HOME}/logs/spark/${step_id}.log.
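
For orientation, a minimal sketch of what the logFile appender behind this setting could look like; the appender class and options below are standard log4j 1.x settings offered as an assumption, not copied from the shipped file:

Code Block
# Illustrative FileAppender wiring (assumed, not verbatim from Kylin's file);
# ${KYLIN_HOME} resolves only if it is passed to the JVM as a system property
log4j.rootLogger=INFO,logFile
log4j.appender.logFile=org.apache.log4j.FileAppender
log4j.appender.logFile.File=${KYLIN_HOME}/logs/spark/driver.log
log4j.appender.logFile.layout=org.apache.log4j.PatternLayout
log4j.appender.logFile.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %p [%t] %c{1}:%L : %m%n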

Executor Log

SparkExecutorHdfsAppender in spark-executor-log4j.properties

spark-executor-log4j.properties is used to configure the output path, appender, layout, etc. of the spark executor log in the build job. Similar to the spark driver log, the spark executor log of one step of a build job is output to a folder in HDFS.

Each file in this folder corresponds to one executor's log. The path is ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs/executor/yyyy-mm-dd/${job_id}/${step_id}/executor-x.log.
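
Continuing the hypothetical example above, the first executor's log of step abc123-01 written on 2022-06-17 would land at:

Code Block
# Hypothetical values as in the driver example
hdfs://nn/kylin/kylin_metadata/my_project/spark_logs/executor/2022-06-17/abc123/abc123-01/executor-1.log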



Logger for Query

Using spark-executor-log4j.properties

Users can change the default log4j configuration files for Kylin.

Code Block
vim ${KYLIN_HOME}/conf/kylin.properties

kylin.spark.executor.log4j.properties=spark-executor-log4j.properties

Then Kylin will start with spark-executor-log4j.properties to collect and store the logs of Kylin's query engine.


Executor Log

SparkExecutorHdfsAppender in spark-executor-log4j.properties

spark-executor-log4j.properties is used to configure the output path, appender, layout, etc. of spark executor log in the query job.

Similar to the spark driver log, the spark executor log of a query job is output to a folder in HDFS. Each file in this folder corresponds to one executor's log. The path is ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/_sparder_logs/yyyy-mm-dd/${job_id}/executor-x.log.
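
These files can also be inspected directly from HDFS. A minimal sketch, using hypothetical concrete values for the working dir, metadata url, and project:

Code Block
# List a day's query executor logs, then read one of them
hadoop fs -ls hdfs://nn/kylin/kylin_metadata/my_project/_sparder_logs/2022-06-17
hadoop fs -cat hdfs://nn/kylin/kylin_metadata/my_project/_sparder_logs/2022-06-17/<job_id>/executor-1.log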



Troubleshooting

When the spark job submitted by Kylin is executed on the YARN cluster, the user who uploads the spark executor log to HDFS may be yarn.

This yarn user may not have write permission to the HDFS directory ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs, which causes the upload of the spark executor log to fail.

In that case, when viewing the task log with "yarn logs -applicationId <Application ID>", you will see an error indicating that the yarn user lacks write permission to this directory.

This error can be solved by the following command:

Code Block
hadoop fs -setfacl -R -m user:yarn:rwx ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs
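
To confirm that the ACL was applied, the directory's ACLs can be listed afterwards:

Code Block
# Verify that user yarn now has rwx on the log directory
hadoop fs -getfacl ${kylin.env.hdfs-working-dir}/${kylin.metadata.url}/${project_name}/spark_logs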
