

This is a placeholder for the upcoming documentation on the high-availability (HA) facilities and processes that CloudStack uses in its approach.

Below are conversation excerpts I have yet to format:

Excerpt from Hari Kannan on HA:

HA works at two levels. VMs can be marked as HA-enabled. In addition, you can reserve some hosts as “dedicated” HA hosts. Quoting from the manual: the dedicated HA option is set through a special host tag when the host is created. To allow the administrator to dedicate hosts to only HA-enabled VMs, set the global configuration variable ha.tag to the desired tag (for example, "ha_host"), and restart the Management Server. Enter the value in the Host Tags field when adding the host(s) that you want to dedicate to HA-enabled VMs.
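The effect of the ha.tag setting described above can be pictured with a minimal Python sketch. It is not CloudStack's allocator: the function name `candidate_hosts`, the host names, and the dictionary layout are all hypothetical, and the only rule modelled is the one quoted from the manual, that hosts carrying the tag are reserved for HA-enabled VMs.

```python
def candidate_hosts(hosts, vm_ha_enabled, ha_tag="ha_host"):
    """Return the host names a VM may be placed on (illustrative only).

    hosts: dict mapping a host name to the set of host tags entered
           in the Host Tags field when the host was added.
    ha_tag: the value of the ha.tag global setting ("ha_host" is the
            example value from the manual excerpt).
    """
    if vm_ha_enabled:
        # Assumption: HA-enabled VMs may use tagged and untagged hosts alike.
        return sorted(hosts)
    # VMs that are not HA-enabled are kept off the dedicated HA hosts.
    return sorted(name for name, tags in hosts.items() if ha_tag not in tags)


if __name__ == "__main__":
    hosts = {"host-a": set(), "host-b": {"ha_host"}}
    print(candidate_hosts(hosts, vm_ha_enabled=False))
    print(candidate_hosts(hosts, vm_ha_enabled=True))
```

Whether HA-enabled VMs are also allowed on untagged hosts is an assumption of this sketch; the excerpt only states that tagged hosts are dedicated to HA-enabled VMs.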

Excerpt from Nitin Mehta on HA:

CS has its own HA logic and doesn't use the native HA of the hypervisor (HV), so the question of enabling HA for hosts doesn't arise. This is true for XenServer (XS).

Excerpt from Sateesh Chodapuneedi on HA:

For VMware, CloudStack uses native HA provided by VMware.

VMware provides HA at the cluster level.

Excerpt from Kelcey Damage on HA:

It's safe to conclude that HA, while enabled on the host (as in the module), must be available cluster-wide (uniform cluster). This is how VMware and others operate.

Excerpt from Sanjeev Neelarapu on HA:

AFAIK, in CloudStack, if a host crashes, CloudStack detects the host as down after the pingTimeout interval. CloudStack does not reduce the available capacity, because the host's capacity values are not removed from the op_host_capacity table. It assumes the host being down is a temporary issue.

Excerpt from Mice Xia on HA:

Currently, for XenServer/KVM, CloudStack uses a 'storage heartbeat' to detect whether it should start HA: the agent residing on the XenServer/KVM host periodically writes a timestamp to shared storage. If the host network pingTimeOut happens, CloudStack investigates whether the 'storage heartbeat' has also timed out, and if that's the case an HA job is launched for the HA-enabled VMs on the host. This is a simplified description; the HA implementation also involves delta sync, investigators and fencers.
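The two-step check described in this excerpt can be sketched in Python as follows. This is a simplified illustration, not CloudStack code: `HostStatus`, `should_start_ha`, `vms_to_restart`, and the timeout parameters are hypothetical names, and the sketch leaves out the delta sync, investigators and fencers mentioned above.

```python
from dataclasses import dataclass


@dataclass
class HostStatus:
    seconds_since_ping: float       # time since the host last answered over the network
    seconds_since_heartbeat: float  # age of the newest timestamp written to shared storage


def should_start_ha(status, ping_timeout, heartbeat_timeout):
    """Decide whether HA should be triggered for a host (illustrative only)."""
    # Step 1: nothing to investigate until the network ping has timed out.
    if status.seconds_since_ping <= ping_timeout:
        return False
    # Step 2: a fresh storage heartbeat means the host is still alive,
    # so only a stale heartbeat leads to an HA job.
    return status.seconds_since_heartbeat > heartbeat_timeout


def vms_to_restart(vms, status, ping_timeout, heartbeat_timeout):
    """vms: dict mapping a VM name to True if the VM is HA-enabled."""
    if not should_start_ha(status, ping_timeout, heartbeat_timeout):
        return []
    # Only HA-enabled VMs on the failed host are restarted.
    return sorted(name for name, ha_enabled in vms.items() if ha_enabled)
```

With these assumptions, a host that has lost its network connection but still writes heartbeats is not treated as failed, which is the point of consulting the shared storage before launching HA.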

Excerpt from Kelcey Damage on HA:

This is turning out to be a great discussion to have. Now I get that CloudStack HA is handled purely by the management/orchestration engine, and only if the VM is tagged (which I knew). But what is good to find out is that it does not involve the underlying hypervisor-specific HA modules (except perhaps VMware). Incidentally, VMware's HA mechanism is also called storage heartbeat (5.x+), but it uses hypervisor modules as well.

I do agree with Ahmad that it might be worth looking into expanding our HA suite to support hypervisor-specific HA modules as an override to the default CS HA.

There have not been many HA discussions on the mailing list, and by the looks of it we were all under slightly different impressions.

Excerpt from Clayton Weise on HA:

I haven't had a chance to test it myself to any large degree, but from some conversations I have had with CloudPlatform support (the paid product), we had an incident that ran us into a very similar situation. Due to a bug in XenServer 6.0.0 we went through the upgrade process to 6.0.2, and during that process some of the VMs got moved to other hosts and we lost track of things. To my delight, CloudStack picked up on this and updated itself as to the host where each VM was now residing.

What this means to me is that if some _other_ mechanism outside of CS were to move VMs around within a cluster (for example, due to a host failure), CS would eventually query XAPI, find out where the VM resides, and update the CS database. Now, like I said, I haven't tested this, but in theory this means that if you enabled HA on your hypervisor directly and instructed CS not to do HA, then on a host failure your hypervisor would take over the HA process and CS would eventually update itself to recognize the rearranging of VMs in the cluster... in theory.
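The reconciliation described in this excerpt can be sketched in Python, under the same "in theory" caveat. The function `reconcile_vm_hosts` and both dictionaries are hypothetical: `recorded` stands in for what the CS database holds, and `reported` stands in for whatever a query to the hypervisor (XAPI in the XenServer case) would return. No real XAPI or CloudStack calls are made.

```python
def reconcile_vm_hosts(recorded, reported):
    """Bring the recorded VM-to-host mapping in line with the hypervisor's view.

    recorded: dict of VM name -> host name as stored by the orchestrator.
    reported: dict of VM name -> host name as reported by the hypervisor.
    Returns the updated mapping and a list of (vm, old_host, new_host) moves.
    """
    updated = dict(recorded)  # leave the caller's mapping untouched
    moves = []
    for vm, host in reported.items():
        # Only VMs the orchestrator already knows about are reconciled here.
        if vm in updated and updated[vm] != host:
            moves.append((vm, updated[vm], host))
            updated[vm] = host
    return updated, moves


if __name__ == "__main__":
    recorded = {"vm1": "host1", "vm2": "host2"}
    reported = {"vm1": "host3", "vm2": "host2"}
    print(reconcile_vm_hosts(recorded, reported))
```

In this picture the hypervisor's own HA moves vm1 from host1 to host3, and the orchestrator only learns of it on its next sync; how CloudStack actually behaves when its own HA is disabled was not tested by the author of the excerpt.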
