Bug Reference
CLOUDSTACK-10143
Overview
VM migration is currently performed over an unencrypted TCP connection using the URI scheme qemu+tcp://, which allows a snooping adversary to read the VM's state (memory and other state) and metadata. Now that the new CA framework has been accepted into CloudStack, we can use the framework and its sub-system to enable secured live VM migration across KVM hosts.
Feature Specification
After a KVM host is secured by the CA framework, the following files are created in its /etc/cloudstack/agent directory:
cloud.ca.crt: The CA certificate bundle
cloud.crt: The KVM host certificate
cloud.key: The KVM host private key
cloud.csr: The CSR file
cloud.jks: The Java keystore file (its passphrase is stored in the agent.properties file)
These certificates can in turn be used to configure TLS for libvirtd when a new KVM host is added using the cloudstack-setup-agent script, which already configures libvirtd.conf with the following:
listen_tcp=1
tcp_port="16509"
auth_tcp="none"
listen_tls=0
This additional configuration will be controlled by a (cluster scope) global setting "kvm.libvirtd.tls.enabled". When it is set to true, an additional flag such as "-s" or "--securelibvirtd" is passed to the cloudstack-setup-agent script during host addition, and the script performs the following:
- The /etc/libvirt/libvirtd.conf file (and its equivalent on non-CentOS distros) already assumes default paths for the CA, server/client certificate and key files. To keep changes to the config file minimal, we can create symlinks at those paths if the source files exist in /etc/cloudstack/agent:
ln -s /etc/cloudstack/agent/cloud.ca.crt /etc/pki/CA/cacert.pem
mkdir -p /etc/pki/libvirt/private
ln -s /etc/cloudstack/agent/cloud.crt /etc/pki/libvirt/clientcert.pem
ln -s /etc/cloudstack/agent/cloud.crt /etc/pki/libvirt/servercert.pem
ln -s /etc/cloudstack/agent/cloud.key /etc/pki/libvirt/private/clientkey.pem
ln -s /etc/cloudstack/agent/cloud.key /etc/pki/libvirt/private/serverkey.pem
For reference, the following is seen by default in libvirtd.conf:
#key_file = "/etc/pki/libvirt/private/serverkey.pem"
#cert_file = "/etc/pki/libvirt/servercert.pem"
#ca_file = "/etc/pki/CA/cacert.pem"
- In addition to these changes, remove the "listen_tls=0" line set by cloudstack-setup-agent. TLS certificate-based authentication will then be enabled, so explicit authentication using something like SASL is not necessary, although an admin may configure it out-of-band. By default we'll limit the settings in libvirtd.conf to auth_tcp="none" (the current default) and auth_tls="none" (added if the global setting is true).
- Port 16514 needs to be allowed by the firewall/iptables rules; suitable changes will be made in cloudstack-setup-agent and other relevant files.
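The symlinking step can be sketched in Python, the language cloudstack-setup-agent is written in. This is a minimal illustration, not the script's actual code: the function name link_libvirt_certs and the root parameter (used so the sketch can be tried outside /etc) are hypothetical. It only creates links when all three source files exist, mirroring the condition described above.

```python
import os

# Source files created by the CA framework, mapped to the default paths
# that libvirt expects (taken from the ln -s commands above).
LINKS = {
    "cloud.ca.crt": ["etc/pki/CA/cacert.pem"],
    "cloud.crt": ["etc/pki/libvirt/clientcert.pem",
                  "etc/pki/libvirt/servercert.pem"],
    "cloud.key": ["etc/pki/libvirt/private/clientkey.pem",
                  "etc/pki/libvirt/private/serverkey.pem"],
}


def link_libvirt_certs(agent_dir="/etc/cloudstack/agent", root="/"):
    """Symlink the CA framework files to libvirt's default paths.

    Returns False (and changes nothing) if any source file is missing.
    'root' is a hypothetical parameter that allows a dry run in a
    scratch directory.
    """
    sources = {name: os.path.join(agent_dir, name) for name in LINKS}
    if not all(os.path.isfile(path) for path in sources.values()):
        return False
    for name, targets in LINKS.items():
        for rel in targets:
            dest = os.path.join(root, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            if os.path.islink(dest):
                os.remove(dest)  # replace a stale link so reruns succeed
            os.symlink(sources[name], dest)
    return True
```

An existing regular file at a destination path is deliberately not overwritten here; os.symlink raises FileExistsError in that case, so an admin's hand-placed certificate is never silently replaced.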
Proposed Changes
Changes in (KVM) agent:
- Add a new flag such as "-s" in cloudstack-setup-agent that automates the symlinking and the additional setup of TLS in libvirtd.conf. This requires that the source cert/pem files already exist in /etc/cloudstack/agent.
- Hosts whose libvirtd is secured with TLS should write a config such as libvirtd.tls.enabled=true to the agent.properties file. The agent will use this to tell the management server that it can support qemu+tls:// during migration.
- When certificates are renewed, keystore-cert-import may be passed a flag to automate the above (i.e. configuring libvirtd and reloading/restarting it).
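The libvirtd.conf part of that setup can be sketched as a pure text transformation. This is a rough illustration under stated assumptions: the function name enable_libvirtd_tls is hypothetical, and the sketch only handles the two settings named in this document (dropping listen_tls=0 and ensuring auth_tls="none"); it does not touch the firewall or restart libvirtd.

```python
def enable_libvirtd_tls(conf_text):
    """Return libvirtd.conf text with listen_tls=0 removed and auth_tls="none" set.

    Commented-out lines (starting with '#') are left untouched.
    """
    out = []
    has_auth_tls = False
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        value = value.strip()
        if sep and key == "listen_tls" and value == "0":
            # Dropping the explicit "0" lets libvirtd fall back to its
            # default for listen_tls, as described in this document.
            continue
        if sep and key == "auth_tls":
            line = 'auth_tls="none"'
            has_auth_tls = True
        out.append(line)
    if not has_auth_tls:
        out.append('auth_tls="none"')
    return "\n".join(out) + "\n"
```

Applied to the four lines that cloudstack-setup-agent writes today, this keeps listen_tcp, tcp_port and auth_tcp as they are, removes listen_tls=0, and appends auth_tls="none". Running it a second time leaves the text unchanged, which matters if keystore-cert-import calls it again on certificate renewal.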
Changes in CloudStack management server and Libvirt computing resource:
- Introduce a new global setting "kvm.libvirtd.tls.enabled" with Scope.Cluster to enable TLS for KVM hosts (for live migration etc.); this will be enabled by default. We chose to enable it by default because the CA framework generates certificates when new hosts are added using the addHost API, and cloudstack-setup-agent with the "-s" flag will in any case check that the certificates actually exist before configuring libvirtd.conf. This setting can also be disabled for a cluster or globally. This will be mentioned in the release notes.
- keystore-cert-import will be changed to additionally reload/restart libvirtd on certificate changes (we need to check whether this causes any issues when the agent is already running; this may be done via the agent when its internal task queue is empty).
- When a KVM agent connects to the management server, it will report in its host details, via the StartupRoutingCommand, whether its libvirtd TLS setting is enabled. This information will be persisted in host_details.
- When a VM migration is requested, the MigrateCommand will contain the TLS-enabled setting for the source and destination hosts, which the MigrateCommand handler will use to choose the URI for migration: "qemu+tls://" if both source and destination hosts have libvirtd with TLS enabled, otherwise qemu+tcp:// for backward compatibility (a warning may be logged when qemu+tcp:// is used).
- VM migration can fail (with errors such as CA.pem not found etc.) when one of the hosts is not TLS enabled although its agent.properties says it is (for example after out-of-band changes). Such an error will be logged and may be exposed via the migrateVirtualMachine API.
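The URI selection rule for migration can be illustrated with a short Python sketch. The real handler lives in the Java agent code, so this is only a model of the decision, not the implementation; the function name migration_uri, the logger name, and the "/system" path suffix (the usual libvirt system-instance path) are assumptions.

```python
import logging

LOG = logging.getLogger("migration-sketch")  # hypothetical logger name


def migration_uri(dest_host, src_tls_enabled, dest_tls_enabled):
    """Pick the libvirt URI for a live migration to dest_host.

    TLS is used only when both hosts report libvirtd TLS as enabled;
    otherwise fall back to plain TCP for backward compatibility.
    """
    if src_tls_enabled and dest_tls_enabled:
        return "qemu+tls://%s/system" % dest_host
    LOG.warning("Migrating to %s over unencrypted qemu+tcp://", dest_host)
    return "qemu+tcp://%s/system" % dest_host
```

For example, migration_uri("10.1.1.2", True, False) falls back to the qemu+tcp:// form and logs a warning, since only one side has TLS enabled.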
Upgraded/mixed CloudStack environments:
- After an upgrade, existing KVM hosts may not have been set up by the CA framework or have certificates generated, and will not have kvm.libvirtd.tls.enabled defined; it will be assumed to be false for them.
- Post-upgrade, newly added KVM hosts will have libvirtd configured to use TLS, as the global setting will be true (globally or at the cluster level).
- On-boarding strategy can be:
- Keep the global setting as true to enable tls on kvm/libvirtd for the cluster or globally
- Execute the provisionCertificate API with the host id of each host; this will create the certificates as well as configure libvirtd/TLS (if the global setting is enabled) via the keystore-cert-import script.
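As a rough sketch of the second on-boarding step, the following Python builds one provisionCertificate query string per host. The parameter names (hostid, reconnect, provider) are those listed in the API help quoted in the comments on this page; the function name provision_requests is hypothetical, and API authentication/signing and the actual HTTP call are intentionally left out.

```python
from urllib.parse import urlencode


def provision_requests(host_ids, reconnect=None, provider=None):
    """Build one unsigned provisionCertificate query string per host id.

    Optional parameters are omitted when None so that the configured
    global setting / default CA provider plugin applies.
    """
    queries = []
    for host_id in host_ids:
        params = {"command": "provisionCertificate", "hostid": host_id}
        if reconnect is not None:
            params["reconnect"] = "true" if reconnect else "false"
        if provider is not None:
            params["provider"] = provider
        queries.append(urlencode(params))
    return queries
```

Since the API is asynchronous, a caller would still need to poll each resulting job before relying on TLS migration for that host.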
References
https://wiki.libvirt.org/page/TLSSetup
https://wiki.libvirt.org/page/VNCTLSSetup
8 Comments
Wido den Hollander
Good! However, no daemon should write to /etc; it should write to something like /var/lib/cloudstack/agent.
According to UNIX/Linux rules, /etc holds only configuration that should be touched by an admin, not by a daemon. It should be possible to run a system with /etc mounted RO.
Rohit Yadav
Thanks for commenting Wido den Hollander, when we add KVM hosts using CloudStack's `addHost` it eventually calls `cloudstack-setup-agent` that writes the following to libvirtd.conf:
Ref: https://github.com/apache/cloudstack/blob/master/python/lib/cloud_utils.py#L732
I think the crux of this FS is to allow configuration of TLS using CloudStack's CA framework (Secure Agent Communications), toggled with a global setting. I tested locally and found that we can enable TLS configuration by removing listen_tls=0 or adding listen_tls=1 and configuring auth_tls=none (if an authentication module such as SASL is used, it complicates the design/implementation by introducing an authenticator that would require additional username/password creation and management). I think TLS solves the basic issue of encryption, and the CA framework with the CA (root) certificate (chain) validates client/server certificates for both the source and destination hosts performing live VM migration.
Rafael Weingärtner
This is a great feature. I have some questions though.
I noticed that you are proposing to use ‘auth_tls="none"’; what about mutual authentication using the certificates you are injecting into hosts? Does libvirt support that? I ask this because we already have a “CA” authority that is used to generate these certificates, so it makes sense to authenticate both hosts before we start any communication.
Here go my opinions to answer the questions you raise:
I think it is a good idea to have “kvm.libvirtd.tls.enabled” parameter to toggle use/configuration of TLS in KVM hosts. It seems to be a good idea to provide a way to disable it in case problems or bugs relating this feature appear. This means that we have not just global settings, but also cluster-wide settings to control this feature.
I would follow the “kvm.libvirtd.tls.enabled” settings to control the provisioning of certificates.
I would start developing a process without externalizing configurations; if the need to externalize some configurations appears, we can re-work the design to allow it.
Per my previous explanations, I would keep it.
I have one remark:
As the on-boarding strategy you say:
What does it mean? I did not quite understand it. Is it a feature/API method that was created with the CA plugin?
Rohit Yadav
Rafael Weingärtner - please have a look at my reply against Wido's comment around the `auth_tls=none` issue, also see https://wiki.libvirt.org/page/TLSSetup for more details. In short, yes using CloudStack's CA framework the client/server certificates are mutually validated against the issuer's root/CA certificate (chain) that is trusted and setup during host's addition.
With the introduction of the CA framework, certificates and keystore setup are always performed. However, there is a global setting (CA provider plugin specific) to toggle authentication (two-way vs one-way). Please read Secure Agent Communications for details. However, we can use this setting to add/remove a flag on the cloudstack-setup-agent script, which is called as part of the `addHost` API, that can in turn enable TLS (if the flag is passed).
When the addHost API is used to add KVM hosts, it already configures the libvirtd.conf file. See my reply to Wido's comment for details. I think the global setting can be used as a way to allow an additional choice of TLS vs non-TLS setup using the addHost + cloudstack-setup-agent approach.
Refer to the Secure Agent Communications FS. In short, yes, there is a new API, part of the new CA framework, that can be used to provision (set up certificates and keystores on) KVM hosts added before the 4.11 upgrade:
(provisionCertificate) Issues and propagates client certificate on a connected host/agent using configured CA plugin
This API is asynchronous.
Required params are hostid
Parameters
==========
provider = (string) Name of the CA service provider, otherwise the default configured provider plugin will be used
reconnect = (boolean) Whether to attempt reconnection with host/agent after successful deployment of certificate. When option is not provided, configured global setting is used
hostid = (uuid) The host/agent uuid to which the certificate has to be provisioned (issued and propagated)
The new CA framework's first use-case was to secure agent-management-server connection using certificates (two-way SSL handshake), however only one-way SSL handshake (legacy behaviour) is enforced on upgraded environments. See the FS for details, or admin docs.
Marc-Aurèle Brothier
Good news you made the live migration working (right?) on master. Is it really something we want to control under CS on the agent installation all this libvirt TLS setup? Maybe the installation could write libvirtd configuration file for TLS and non-TLS setup in CS and/or libvirt /etc directory but without overriding the normal one. I have to admit I'm not familiar with how things are usually done in CS for external components.
You can also add to cloudstack configuration the libvirt flags used for the live migration, which should be customizable in some way. On my PR it's in agent.properties, but it could be sent along with the migration command.
I would welcome if you could setup a wiki page that I could edit on the KVM live migration so I could add my remark on my experience and things to config/consider.
On your question: +1 on having the configuration value for TLS or plain tcp.
Rohit Yadav
Thanks for your comments Marc-Aurèle Brothier, please also read my reply above. Live migration is already working to some extent on KVM using qemu+tcp://. Libvirtd setup/configuration is already done by cloudstack-setup-agent when the addHost API is called to add a KVM host using CloudStack, and this FS simply wants to reuse this existing tool/script to enable TLS configuration (based on a global setting, cluster level perhaps) for libvirtd. Nice idea: we can also add a flag in agent.properties to note that the libvirtd TLS setup was done (as a token for future reference).
To add a new FS (design document) say under 4.11, go here and "Create" a page: 4.11 Design Documents. Alternatively, once you're logged in you should be able to create any wiki page. Let me know or email on dev@ if you've any issue.
Marc-Aurèle Brothier
Rohit Yadav Live migrations are working for VMs with remote disks, right? My PR is about live migration with local storage, which uses other flags to tell libvirt to move the data disk too.
I can see the "create" button now on the wiki, so I'll be able to add a page on the documentation for my PR https://github.com/apache/cloudstack/pull/1709 - thanks!
Rohit Yadav
Thanks all for your reviews and comments, I'll amend the wiki by the end of this week and leave it open for further discussions and comments.