Apache Kylin : Analytical Data Warehouse for Big Data
Welcome to Kylin Wiki.
1. Why need Job scheduler?
In the process of building segment, kylin will produce a lot of tasks to be executed.
In order to coordinate the execution process of these tasks and make efficient and reasonable use of resources, job scheduling mechanism is needed.
2. What schedulers are there in kylin?
In the current kylin version (kylin v3.1.0), there are three kinds of Job Scheduler, and their implementation classes are DefaultScheduler, DistributedScheduler and CuratorScheduler.
Users can choose which scheduler to use by configure parameters kylin.job.scheduler.default to different values.
By default kylin.job.scheduler.default = 0, use DefaultScheduler as job scheduler. Use DistributedScheduler when configured as 2, and use CuratorScheduler when configured as 100.
Please refer to http://kylin.apache.org/cn/docs/install/kylin_cluster.html for configuration method.
3. What is the difference between different job schedulers?
3.1 DefaultScheduler
The DefaultScheduler is the job scheduler initially used by kylin, and it is also the default job scheduler.
Its core logic is to initialize two thread pools. One thread pool, ScheduledExecutorService, is used to fetch all job information in kylin, and the other thread pool JobRunner is used to execute specific jobs.
The ScheduledExecutorService will regularly fetch the status information of all tasks. When the status of a task is Ready, it means that the task can be scheduled for execution, and the task will be handed over to the JobRunner for execution.
DefaultScheduler is a stand-alone version of the scheduler, there can only be one job server under the same metadata.
If kylin.server.mode=all or job, when the Kylin Server process starts, it will initialize the DefaultScheduler and lock the job server. The implementation class of the lock is ZookeeperJobLock, which is implemented by using the temporary node of zookeeper.
Once a job server holds the lock, no other job server can obtain the lock until the job server process is finished.
3.2 DistributedScheduler
DistributedScheduler is a distributed scheduler contributed by Meituan, which is supported since kylin version 1.6.1.
Using the DistributedScheduler as the job scheduler, you can have multiple kylin job servers under the same metadata.
Compared with the DefaultScheduler, the DistributedScheduler reduces the lock granularity, from locking the entire job server to locking the segment.
The implementation class is ZookeeperDistributedLock, which is also implemented by using the temporary node of zookeeper. ZookeeperJobLock is a simple delegator to ZookeeperDistributedLock with a default constructor.
When a segment build job is submitted and the job is scheduled for execution, the jobId will be spelled in the temporary node path of zookeeper, and the node will be locked. When the final state of the whole job becomes SUCCEED,ERROR 和DISCARDED, the lock will be released.
Users can also configure the kylin.cube.schedule.assigned.servers to specify the job execution node of a cube.
3.3 CuratorScheduler
Curatorscheduler is a curator based scheduler implemented by Kyligence, which is supported since kylin v3.0.0-alpha.
CuratorScheduler is master-slave mode. It selects a leader from all job nodes to schedule tasks. It relies on curator-recipes, a high-level feature of curator.
There are two kinds of implementation of leader election for curator, which are LeaderSelector and LeaderLatch.
LeaderSelector is that all the surviving clients take turns to be leader, and a leader releases the leadership after executing the takeLeaderShip method.
LeaderLatch is that once the leader is elected, the leader will not be handed over unless there is client hangs up and triggers the election again.
LeaderSelector is used in kylin to conduct leader election.
Kylin implements CuratorLeaderSelector to control the processing logic of leader election from LeaderSelectorListenerAdapter extension, and rewrites takeLeaderShip method to realize task scheduling. The leaderSelector is initialized in the constructor as the core member variable of CuratorLeaderSelector.
When a kylin job server is started, a jobClient will be initialized from this job server. The jobClient calls the start() method to start. Once started, the jobClient will participate in the leader election.
All job servers under the same metadata will participate in the leader election. And only one job server becomes the leader and other job servers become participants.
The job node that becomes the leader will execute the takeLeaderShip method, which calls the initialization method of the DefaultScheduler, and then this method calls an endless loop, that is to say, this node will always be responsible for scheduling and executing kylin tasks.
Once there is an exception, the process will end and the DefaultScheduler will be terminate. The node will hand over the leadership and re-select a leader from all job servers.