...
    cat /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups3/config/clusteringjob.properties
    hoodie.clustering.inline.max.commits=2
2. Schedule clustering
    bin/spark-submit \
      --master local[4] \
      --class org.apache.hudi.utilities.HoodieClusteringJob \
      /Users/liwei/work-space/dla/opensource/incubator-hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.8.0-SNAPSHOT.jar \
      --schedule \
      --base-path /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups3/dest \
      --table-name hudi_table_with_small_filegroups3_schedule_clustering \
      --props /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups3/config/clusteringjob.properties \
      --spark-memory 1g
You can find the scheduled clustering instant time in the Spark logs by searching for the prefix "The schedule instant time is". In this example, the scheduled clustering instant time is 20210122190240.
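If the output of the scheduling job was saved to a file, the instant time can be extracted with standard text tools. The following is a minimal sketch, not part of Hudi itself: the helper name `extract_instant_time`, the file name `schedule_clustering.log`, and the layout of the sample log line are assumptions made for illustration. Only the prefix "The schedule instant time is" is taken from the logs described above.

```shell
# Hypothetical helper: print the last scheduled clustering instant time found in a log file.
# Assumes the Spark driver output was captured to the file passed as $1.
extract_instant_time() {
  grep -o 'The schedule instant time is [0-9][0-9]*' "$1" \
    | tail -n 1 \
    | awk '{ print $NF }'
}

# Illustrative log content only; a real line carries whatever prefix the logging layout adds.
printf '%s\n' 'INFO HoodieClusteringJob: The schedule instant time is 20210122190240' > schedule_clustering.log

extract_instant_time schedule_clustering.log
```

The printed value is what the next step passes to `--instant-time`. Taking the last match with `tail -n 1` is a design choice for logs that contain more than one scheduling run.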
...
3. Use the scheduled instant time "20210122190240" to run clustering
    bin/spark-submit \
      --master local[4] \
      --class org.apache.hudi.utilities.HoodieClusteringJob \
      /Users/liwei/work-space/dla/opensource/incubator-hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.8.0-SNAPSHOT.jar \
      --base-path /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups3/dest \
      --instant-time 20210122190240 \
      --table-name hudi_table_with_small_filegroups3_schedulefilegroups_clustering \
      --props /Users/liwei/work-space/spark/spark-2.4.6-bin-hadoop2.7/hudi_table_with_small_filegroups3/config/clusteringjob.properties \
      --spark-memory 1g
...