CloudStack 4.6 introduces a new way of configuring routers (basic, isolated and VPC). The new code is intended to achieve two goals:

  1. Create a persistent config.
    This means that configuration is maintained across crashes and reboots. In addition, every configuration command sent to the router checks that all configuration items are still correct.
  2. Implement redundancy for VPCs (Virtual Private Clouds).
    This is the main difference between "old style" routers and "new style" VPCs.

Provisioning process

The management server sends two components to the router (via the normal communication channels that are hypervisor dependent).  

  1. A JSON file
  2. A command to execute

The JSON file sent varies according to the command being executed. The command to execute is always update_config.py with a single parameter: the name of the JSON file to process. The JSON file is placed in the directory /var/cache/cloud on the router. The actual processing of the file occurs via merge.py.

For instance, the ipassociation command sends data to either add or remove an IP address on the router. The data in the JSON file is merged with the existing IP addresses on the router, and each address is either added to or removed from the central configuration. The central config files are stored in /etc/cloudstack.
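
The merge step described above can be sketched as follows. This is a minimal illustration, not the actual merge.py code: the function name merge_ip_association and the JSON keys ("ip_address", "device", "address", "add") are assumptions made for the example, and the real file layout may differ.

```python
def merge_ip_association(central, command):
    """Return a new central config with the command's addresses applied.

    central: dict mapping a device name to a list of IP addresses
             (hypothetical layout of the stored configuration).
    command: dict parsed from the JSON file sent with the command
             (hypothetical keys, for illustration only).
    """
    # Copy so the caller's stored configuration is left untouched.
    merged = {device: list(ips) for device, ips in central.items()}
    for entry in command["ip_address"]:
        current = merged.setdefault(entry["device"], [])
        if entry["add"]:
            # Adding an address that is already present is a no-op.
            if entry["address"] not in current:
                current.append(entry["address"])
        elif entry["address"] in current:
            current.remove(entry["address"])
    return merged
```

Because adding an existing address and removing a missing one are both no-ops, sending the same command twice leaves the merged result unchanged.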

When the merge process is complete, configure.py is called. This works through all configuration files in /etc/cloudstack and applies any changes to the server's actual configuration. Thus each command execution causes the machine's configuration to be checked and reapplied if necessary. This ensures that the router's configuration remains consistent.
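
The "check and reapply if necessary" idea can be illustrated with a small sketch that compares the desired state (from the stored configuration) with the state actually found on the machine. The function plan_changes is hypothetical and only shows the principle; it is not taken from configure.py.

```python
def plan_changes(desired, actual):
    """Work out what must change so that actual matches desired.

    Both arguments are iterables of configuration items, for example
    IP addresses. Returns (to_add, to_remove) as sorted lists.
    """
    desired_set, actual_set = set(desired), set(actual)
    to_add = sorted(desired_set - actual_set)      # configured but missing
    to_remove = sorted(actual_set - desired_set)   # present but not configured
    return to_add, to_remove
```

When the machine already matches the stored configuration, both lists are empty, so running the check after every command is harmless.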

Boot Process

During the boot process, the init script cloud-early-config is invoked. This script does many things and is, truthfully, in desperate need of refactoring. We chose not to do that at this stage, as our changes to the provisioning process were already pretty invasive.
During the execution of this script the "cmd_line" sent to the router via the hypervisor is rewritten to a JSON file called cmd_line.json and stored in /var/cache/cloud.
For the machine types router, vpc_router and dhcpsvr the update_config.py script is invoked. This causes two things to happen:
  1. The basic configuration of the router is set up using the information passed to the router in the "cmd_line".
  2. Any previous configuration already present from before the reboot of the router is also applied. This means that routers can now happily survive being rebooted from outside of CloudStack, which was not previously the case.
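
The rewrite of the "cmd_line" into JSON can be sketched as below. The sketch assumes that the "cmd_line" is a whitespace-separated list of key=value tokens; the function names and the sample keys are made up for illustration, and the real structure of cmd_line.json may differ.

```python
import json


def cmdline_to_dict(cmd_line):
    """Parse whitespace-separated key=value tokens into a dict."""
    config = {}
    for token in cmd_line.split():
        # Split on the first "=" only, so values may contain "=".
        key, sep, value = token.partition("=")
        # A bare token without "=" is recorded as a boolean flag.
        config[key] = value if sep else True
    return config


def cmdline_to_json(cmd_line):
    """Serialize the parsed cmd_line; the caller decides where to store it."""
    return json.dumps(cmdline_to_dict(cmd_line), indent=2, sort_keys=True)
```

For example, cmdline_to_dict("name=r-4-VM type=router") yields a dict with the keys "name" and "type".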


The redundant VPC is achieved using the same mechanism as the redundant Virtual Router that existed before: Keepalived in combination with conntrackd.

The configuration is maintained by configure.py and the modules under /opt/cloud/bin/cs, which also deal with the Keepalived calls to promote to master, set backup or set fault (see /opt/cloud/bin/master.py).

There are some differences with the virtual router implementation:

  • Keepalived is left to decide who will be master and who will be backup. We do not interfere with this process but instead simply report it to the management server. This daemon has undergone many years of development to avoid split brains etc.  We do not see a role for CloudStack to try and reinvent this wheel.
  • VRRP traffic goes over the lowest configured guest interface. This means that a VPC router is not redundant until it has at least one guest network configured!
  • Adding a new guest network can cause a master/backup election and can cause disruption to the network traffic for some seconds.
  • Script-based provisioning (think Terraform etc.) goes very fast, and Keepalived takes some time to settle down. When many provisioning commands affecting guest networks are sent one after the other, you may see that it takes 20-30 seconds for the routers to settle down.
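
A provisioning script can allow for this settling time by polling until the reported state stops changing. The helper below is a minimal sketch under stated assumptions: wait_until_stable and its parameters are hypothetical, and get_state stands for whatever callable the script uses to read the router's redundant state.

```python
import time


def wait_until_stable(get_state, stable_checks=3, interval=5.0,
                      timeout=60.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns the same value stable_checks times in a row."""
    deadline = clock() + timeout
    last, streak = None, 0
    while clock() < deadline:
        state = get_state()
        if state == last:
            streak += 1
        else:
            # The state changed (or this is the first reading): restart the count.
            last, streak = state, 1
        if streak >= stable_checks:
            return state
        sleep(interval)
    raise TimeoutError("state did not settle within %.0f seconds" % timeout)
```

The default timeout of 60 seconds is an arbitrary choice that leaves some margin over the 20-30 seconds mentioned above.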
