...

The rebalance mode used for this resource is FULL-AUTO, so administrators don’t need to maintain a list of controller hosts in the ideal state. The new resource rebalances automatically: once a controller is shut down, it is removed from the lead controller resource. This makes it easy for administrators to add, remove, or swap controller hosts with minimal effort. In addition, the resource uses the CRUSH-ed rebalance strategy, which distributes partitions more evenly so that mastership of the partitions is spread evenly across all the Pinot controllers.

The default delayed rebalance time is set to 5 minutes, which means that if a Pinot controller host goes offline, the Helix cluster waits at most 5 minutes for that controller to recover. If the timeout is reached, the Helix cluster elects a new master. The benefit of the 5-minute delay is that leadership does not switch too often (e.g. when restarting all the controller hosts one by one, a restart won't take 5 minutes to finish). At the same time, since some periodic tasks run frequently, 5 minutes is short enough that the Helix cluster can still elect a new leader in a timely manner.
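To make the delayed-rebalance behaviour concrete, here is a minimal toy model in Python. It is not Helix code and does not use any Helix API; the class name, method names, and the "first candidate wins" election are assumptions made purely for illustration. Only the 5-minute default comes from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional

DELAY_SECONDS = 5 * 60  # default delayed rebalance time described above


@dataclass
class PartitionLeadership:
    """Toy model of one partition's master under delayed rebalance (hypothetical)."""
    master: str
    offline_since: Optional[float] = None  # time at which the master went offline

    def on_master_offline(self, now: float) -> None:
        # Open the delay window instead of electing a new master right away.
        if self.offline_since is None:
            self.offline_since = now

    def on_master_online(self) -> None:
        # The master recovered within the window, so leadership is unchanged.
        self.offline_since = None

    def tick(self, now: float, candidates: List[str]) -> str:
        """Elect a new master only once the full delay has elapsed."""
        expired = (
            self.offline_since is not None
            and now - self.offline_since >= DELAY_SECONDS
        )
        if expired and candidates:
            # Assumption: pick the first candidate; real election logic differs.
            self.master = candidates[0]
            self.offline_since = None
        return self.master
```

In this sketch, a controller that restarts in two minutes keeps its mastership, while one that stays down for the full five minutes is replaced on the next `tick`.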

Controller Modes

There are 3 different modes for running controllers: dual mode, Helix-only, and Pinot-only.

...

  1. Refactor existing controller code to run Helix only controller (https://github.com/apache/incubator-pinot/pull/3864). 
  2. Refactor the existing code so that Pinot controller can have a unique interface for all the periodic workloads (https://github.com/apache/incubator-pinot/pull/3264).
  3. Add logic to the HelixSetupUtils class to create the new resource but leave it disabled. The rebalance mode can be set to FULL-AUTO (https://github.com/apache/incubator-pinot/pull/4047).
  4. Add a controller config on the Pinot controller side to choose which mode to use, i.e. Pinot-only mode, Helix-only mode, or dual mode (the default) (https://github.com/apache/incubator-pinot/pull/4323).
  5. Add logic on the controller side to check whether the new resource is enabled. A Pinot controller caches the partition number once it becomes master of the partition. If the lead controller resource is still disabled, the controller won’t receive any state transition messages (https://github.com/apache/incubator-pinot/pull/4323).
    i. When there’s a state transition from Slave to Master for Partition_X: cache partition number X in the Pinot controller.
    ii. When there’s a state transition from Master to Slave for Partition_X: remove partition number X from the cache in the Pinot controller.
    iii. When a periodic task is run, or real-time segment completion request is received: 

  6. Add logic on the server side to check the new resource, and whether it is enabled, when the server is disconnected from the Helix controller. Currently, the server-side logic caches the previous lead controller. With this new feature, the caching logic stays on, and new checks happen only when the server is disconnected or gets a not_leader message back. Since the Pinot server fetches the external view only once and then caches the new leader information, this doesn't increase ZK reads by much.
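The controller-side bookkeeping in item 5 can be sketched as a small set of partition numbers that is updated on each state transition. This is an illustrative Python sketch rather than the actual Pinot implementation: the class and method names are hypothetical, and how a table is mapped to a partition is not specified in this section, so it is deliberately left out.

```python
from typing import FrozenSet, Set


class LeadPartitionCache:
    """Hypothetical cache of the partitions this controller is master of."""

    def __init__(self) -> None:
        self._partitions: Set[int] = set()

    def on_slave_to_master(self, partition: int) -> None:
        # Slave -> Master for Partition_X: cache partition number X.
        self._partitions.add(partition)

    def on_master_to_slave(self, partition: int) -> None:
        # Master -> Slave for Partition_X: remove X from the cache.
        # discard() keeps the call safe if X was never cached.
        self._partitions.discard(partition)

    def is_master_of(self, partition: int) -> bool:
        # A caller (for example a periodic task) could consult the cache here.
        return partition in self._partitions

    def partitions(self) -> FrozenSet[int]:
        # Read-only snapshot, useful for debugging.
        return frozenset(self._partitions)
```

While the lead controller resource is disabled, no transition messages arrive, so in this sketch the cache simply stays empty.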

 

...

Migration Rollout Plan

The deployment plan consists of 4 steps.

Step 0 

Roll out all the code changes and don’t enable the new resource yet. We won’t make any code changes after this step.  

Step 1 

Add as many new controllers to the cluster as the number of Helix controllers you need. Be sure to include redundancy for failures/upgrades. We suggest three new controllers. Start them in dual mode. Make sure the cluster is up and running before the following steps. At this point all the controllers are in dual mode; they will become Pinot-only controllers once the rollout completes.

Step

...

1 

Enable the new resource. To enable it, change the setting "RESOURCE_ENABLED" from false to true in the ZNode CONFIGS/RESOURCE/leadControllerResource in ZK.
Here is the value of leadControllerResource before the resource is enabled:

{
 "id" : "leadControllerResource",
 "simpleFields" :{ "RESOURCE_ENABLED" : "false" },
 "mapFields" : {},
 "listFields" : {}
}
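
As an illustration of the change itself, the Python sketch below flips the flag in a copy of the record above. It only manipulates the JSON text and does not talk to ZooKeeper; the function name is hypothetical, and reading and writing the ZNode is left to whatever ZK tooling the installation already uses. Note that the value is stored as a string ("true"/"false"), not as a JSON boolean.

```python
import json


def set_resource_enabled(znode_json: str, enabled: bool) -> str:
    """Return the record with RESOURCE_ENABLED set; other fields are untouched."""
    record = json.loads(znode_json)
    # The flag lives under simpleFields and is stored as a string.
    simple_fields = record.setdefault("simpleFields", {})
    simple_fields["RESOURCE_ENABLED"] = "true" if enabled else "false"
    return json.dumps(record, indent=1)
```

Calling the same function with `enabled=False` produces the record used to disable the resource again.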

Once the setting "RESOURCE_ENABLED" is set to true, all the dual-mode controllers are immediately registered as masters/slaves in the new resource, and periodic tasks and real-time segment completion are immediately distributed. Let it bake for several weeks. During this time, we can test the robustness of this feature by disabling and re-enabling the resource, running stress tests such as simulating node connection loss/failures, or bumping up to a compatible Helix version. The following criteria must be met before we move on to the next step. Meeting them could take days, weeks, or months depending on the installation:

  1. All LLC and HLC tables have completed at least one segment and started new ones.
  2. All tables are accounted for in all the periodic tasks (no table is ignored).
  3. At least one round of rolling restart of pinot controllers is done, and criteria 1 and 2 are verified after the restart.

...

  1. If any of these criteria goes wrong, disable the lead controller resource

...

  1. and everything comes back to the original state. 

If you want to keep the Helix controller and the Pinot controller running on the same hardware, you can stop at this step. If you want these two controllers to run on separate hardware, follow the steps below.

Step 2

...

After verifying that everything works fine, we can add 1 or more Helix-only controllers to the cluster, so that they can be candidates for the Helix cluster leadership.

Step 3 

Then, switch all the dual-mode controllers to Pinot-only mode one by one. After doing so, only Helix-only controllers can be the Helix leader, and all the Pinot-only controllers work only on Pinot’s workloads. The rollout is finished.


Rollback Plan

The rollback plan is the reverse of the rollout plan. If anything goes wrong during the rollout, make sure that no later step is still in effect before rolling back the current step.

Step 1 

Restart all the Pinot-only controllers as dual-mode controllers.

...

Disable the lead controller resource. All the controller workload will then be handled by the Helix leader.

API Design

To expose table assignments for easier debugging, the following APIs are needed:

Get the leaders for all the tables in the cluster 

GET /leader/tables 

lead controller resource enabled:

{
  "leadControllerResourceEnabled": true,
  "leadControllerEntryMap": {
    "leadControllerResource_0": {
      "tableNames": [
        "testTable1_OFFLINE"
      ],
      "leadControllerId": "Controller_172.25.124.150_9000"
    },
    ...
    "leadControllerResource_23": {
      "tableNames": ["testTable2_REALTIME"],
      "leadControllerId": "Controller_172.25.124.150_9008"
    }
  }
}

lead controller resource disabled:

{
  "leadControllerResourceEnabled": false,
  "leadControllerEntryMap": {}
}
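
For debugging, a client will often want the inverse view: which controller leads a given table. The Python sketch below flattens a parsed response of the shape shown above into a table-to-leader mapping. The function name is hypothetical, and the sketch assumes the response has already been fetched and parsed into a dict; it returns an empty mapping when the lead controller resource is disabled.

```python
from typing import Dict


def table_to_leader(response: dict) -> Dict[str, str]:
    """Map each table name to its lead controller id (hypothetical helper)."""
    if not response.get("leadControllerResourceEnabled", False):
        # Resource disabled: the entry map is empty, so no table has a lead controller.
        return {}
    mapping: Dict[str, str] = {}
    for entry in response.get("leadControllerEntryMap", {}).values():
        # Each partition entry lists its tables and the controller leading them.
        for table in entry.get("tableNames", []):
            mapping[table] = entry["leadControllerId"]
    return mapping
```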

Given a table name, return whether lead controller resource is enabled, the partition id and lead controller instance id 

...

GET /leader/tables/{tableName} 

lead controller resource enabled:

{
  "leadControllerResourceEnabled": true,
  "leadControllerEntryMap": {
    "leadControllerResource_7": {
      "tableNames": [
        "testTable_OFFLINE"
      ],
      "leadControllerId": "Controller_172.25.124.150_9000"
    }
  }
}

lead controller resource disabled:

{
  "leadControllerResourceEnabled": false,
  "leadControllerEntryMap": {}
}


Test Plans and Schedule

Once the final plan has been adjusted and finalized, we can do the following steps. 

...