Apache Kylin : Analytical Data Warehouse for Big Data

...

In the directory tree above, a directory whose entry ends with "managed by tool" is one in which StorageCleanupJob checks for and deletes useless files.

For the directories table_snapshot, dict/global_dict, parquet/{CUBE_NAME}, and parquet/{CUBE_NAME}/{SEGMENT_NAME}, Kylin marks a file as garbage if it is both unreferenced and stale (judged by its last modified time).

For the job_tmp directory, Kylin checks only the last modified time.
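The two rules above can be sketched as a small predicate. This is only an illustration of the described behaviour, not Kylin's implementation: the function name is_garbage and its parameters are hypothetical, and it assumes the same age threshold (168 hours, the default of cleanupThreshold) applies to job_tmp as to the other directories.

```python
from datetime import datetime, timedelta


def is_garbage(directory, referenced, last_modified, now, threshold_hours=168):
    """Return True if a file would count as garbage under the rules above.

    Hypothetical helper for illustration; not part of Kylin.
    """
    # A file is stale when it was last modified before the cutoff.
    stale = last_modified < now - timedelta(hours=threshold_hours)
    if directory == "job_tmp":
        # job_tmp: only the last modified time is considered.
        return stale
    # Other managed directories: the file must be unreferenced AND stale.
    return (not referenced) and stale
```

Under this sketch, a referenced file is never garbage outside job_tmp, and a recently modified file is never garbage anywhere.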

How to use

Option Table 

Option | Data Type | Default Value | Comment
delete | Boolean | false | Whether to actually delete files. The default, false, means a dry run.
cleanupTableSnapshot | Boolean | true | Whether to delete unreferenced snapshot files.
cleanupGlobalDict | Boolean | true | Whether to delete unreferenced global dict files.
cleanupJobTmp | Boolean | false | Whether to delete job tmp files.
cleanupThreshold | Integer | 168 | Age threshold in hours. Only unreferenced storage that has not been modified within this many hours is deleted, so recent files are protected. The default is 168 hours.
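To make the defaults concrete, the options can be modelled as a small data structure. This is a sketch, not Kylin code: the class CleanupOptions and the function describe are hypothetical, and the mapping from each option to a directory name is an assumption based on the option comments. Only the options listed in the table are covered.

```python
from dataclasses import dataclass


@dataclass
class CleanupOptions:
    # Field names and defaults mirror the option table; the class is hypothetical.
    delete: bool = False
    cleanupTableSnapshot: bool = True
    cleanupGlobalDict: bool = True
    cleanupJobTmp: bool = False
    cleanupThreshold: int = 168  # hours


def describe(options):
    """Summarize the mode, target directories, and threshold for a run."""
    targets = []
    if options.cleanupTableSnapshot:
        targets.append("table_snapshot")  # assumed directory for snapshot files
    if options.cleanupGlobalDict:
        targets.append("dict/global_dict")  # assumed directory for global dicts
    if options.cleanupJobTmp:
        targets.append("job_tmp")
    mode = "delete" if options.delete else "dry run"
    return mode, targets, options.cleanupThreshold
```

With every option left at its default, describe(CleanupOptions()) reports a dry run over the snapshot and global dict directories with a 168-hour threshold; job tmp files are included only when cleanupJobTmp is set to true.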

...