Introduction

Purpose

Currently CS supports volume snapshots, which is an EC2-like public cloud solution.

It addresses problems such as 'what if my volume is lost or broken, or what if my primary storage suffers an unrecoverable failure?' In other words, it is essentially a backup solution, and backup and restore take a considerable amount of time, especially for the large volumes that customers unfortunately tend to favor.

There is a growing need for VM snapshots, similar to what Xenserver and VMware ESXi provide.

It addresses requirements such as 'I want to save everything right now so that I can revert to it in the future, and both operations should complete within seconds.' It is mainly intended for private clouds.

References

Document History

Glossary

Use cases

Feature Specifications

VM Snapshot creation

VM Snapshot limitations

Hypervisor                      Disk-only snapshot     Disk-only snapshot     Memory-Disk
                                (VM running)           (VM stopped)           snapshot
Xenserver Free Edition          Yes                    Yes                    No
Xenserver Enterprise Edition    Yes                    Yes                    Yes
KVM                             No                     Yes                    Yes
VMware                          Yes                    Yes                    Yes
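The limitations table can be read as a lookup from (hypervisor, snapshot kind, VM state) to a yes/no answer. A minimal sketch of that lookup follows; the dictionary keys simply reuse the row labels above and are not CloudStack identifiers, and because the table has a single Memory-Disk column, the sketch assumes that column applies regardless of VM state.

```python
# Capability matrix transcribed from the "VM Snapshot limitations" table.
# Keys are the table's row labels, used here as illustrative names only.
SUPPORT = {
    # hypervisor: (disk-only/VM running, disk-only/VM stopped, memory-disk)
    "Xenserver Free Edition": (True, True, False),
    "Xenserver Enterprise Edition": (True, True, True),
    "KVM": (False, True, True),
    "VMware": (True, True, True),
}


def is_supported(hypervisor: str, with_memory: bool, vm_running: bool) -> bool:
    """Return True if the table allows this kind of VM snapshot."""
    disk_running, disk_stopped, memory_disk = SUPPORT[hypervisor]
    if with_memory:
        # The table has one Memory-Disk column, so VM state is not consulted.
        return memory_disk
    return disk_running if vm_running else disk_stopped
```

For example, `is_supported("KVM", with_memory=False, vm_running=True)` is False, matching the "No" in the KVM row.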

VM Snapshot deletion

VM Snapshot revert

VM Snapshot List

Performance consideration

Event

Capacity

Global config/Limit

Usage

Restrictions on VMs with VM snapshots

Architecture and Design description

API

API                   Parameters                 Response
createVMSnapshot      vmId (required)            vmSnapshot
deleteVMSnapshot      vmSnapshotId (required)    jobid
listVMSnapshot        id (optional)              vmSnapshot[]
                      domainid (optional)
                      state (optional)
                      accountId (optional)
                      vmId (optional)
revertToVMSnapshot    vmSnapshotId (required)    VM
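As an illustration of the table, a request validator only needs to know which parameters each command requires. The sketch below models just the parameters listed above (the real commands may accept others, such as snapshotMemory); `API_PARAMS` and `missing_params` are hypothetical names.

```python
from typing import Dict, List

# True = required, False = optional, as listed in the API table.
API_PARAMS: Dict[str, Dict[str, bool]] = {
    "createVMSnapshot": {"vmId": True},
    "deleteVMSnapshot": {"vmSnapshotId": True},
    "listVMSnapshot": {"id": False, "domainid": False, "state": False,
                       "accountId": False, "vmId": False},
    "revertToVMSnapshot": {"vmSnapshotId": True},
}


def missing_params(command: str, params: Dict[str, str]) -> List[str]:
    """Return the required parameters of `command` that `params` lacks."""
    return [name for name, required in API_PARAMS[command].items()
            if required and name not in params]
```

Unknown parameters are deliberately not rejected, since the table lists only a subset of what each command may accept.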

UI Change

Database Schema

New table: vm_snapshots (only the important columns are listed here)

Column             Comment
id                 primary key, auto-increment
uuid               unique key
name               unique internal name generated by the system, e.g. i-2-58-TEST_VS_20121118140427
display_name       snapshot name provided by the user when creating the VM snapshot
description        short description provided by the user when creating the VM snapshot
account_id         owner
domain_id
vm_snapshot_type   enum {Disk, DiskAndMemory}
state              VM snapshot state
parent             ID of the parent VM snapshot
current            whether this VM snapshot is the current one
vm_id              VM ID
updated
created
removed
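For readers who want to experiment, the columns above can be expressed as DDL. The sketch uses SQLite so it runs without a database server. Column types, constraints, the 'Ready' state value and the example row are assumptions (the real table lives in the CloudStack MySQL database); updated/created/removed are assumed to be timestamps, with removed IS NULL meaning the snapshot has not been deleted.

```python
import sqlite3

# Illustrative DDL for the listed columns; types and constraints are assumed.
DDL = """
CREATE TABLE vm_snapshots (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    uuid             TEXT UNIQUE,
    name             TEXT NOT NULL UNIQUE,
    display_name     TEXT,
    description      TEXT,
    account_id       INTEGER NOT NULL,
    domain_id        INTEGER NOT NULL,
    vm_snapshot_type TEXT CHECK (vm_snapshot_type IN ('Disk', 'DiskAndMemory')),
    state            TEXT NOT NULL,
    parent           INTEGER REFERENCES vm_snapshots(id),
    current          INTEGER NOT NULL DEFAULT 0,
    vm_id            INTEGER NOT NULL,
    updated          TEXT,
    created          TEXT,
    removed          TEXT
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)

# Hypothetical example row, reusing the internal name from the table above.
conn.execute(
    "INSERT INTO vm_snapshots (uuid, name, display_name, account_id, domain_id,"
    " vm_snapshot_type, state, current, vm_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("hypothetical-uuid", "i-2-58-TEST_VS_20121118140427", "TEST_VS",
     2, 1, "Disk", "Ready", 1, 58),
)

# Look up the current, non-removed snapshot of VM 58.
row = conn.execute(
    "SELECT display_name FROM vm_snapshots"
    " WHERE vm_id = ? AND current = 1 AND removed IS NULL",
    (58,),
).fetchone()
```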

High-Level Workflow

VMSnapshot state machine
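The state machine can be modelled as a transition table. In the sketch below, the Creating, Reverting, Expunging and Removed states come from the workflows that follow; the Allocated, Ready and Error states, all event names, and the failure transitions are assumptions for illustration, not taken from this design.

```python
from enum import Enum


class State(Enum):
    ALLOCATED = "Allocated"   # assumed: entry allocated in DB, not yet created
    CREATING = "Creating"
    READY = "Ready"           # assumed name for a usable snapshot
    REVERTING = "Reverting"
    EXPUNGING = "Expunging"
    ERROR = "Error"           # assumed failure state
    REMOVED = "Removed"


class Event(Enum):            # all event names are assumptions
    CREATE_REQUESTED = "CreateRequested"
    REVERT_REQUESTED = "RevertRequested"
    EXPUNGE_REQUESTED = "ExpungeRequested"
    OPERATION_SUCCEEDED = "OperationSucceeded"
    OPERATION_FAILED = "OperationFailed"


# (current state, event) -> next state
TRANSITIONS = {
    (State.ALLOCATED, Event.CREATE_REQUESTED): State.CREATING,
    (State.CREATING, Event.OPERATION_SUCCEEDED): State.READY,
    (State.CREATING, Event.OPERATION_FAILED): State.ERROR,
    (State.READY, Event.REVERT_REQUESTED): State.REVERTING,
    (State.REVERTING, Event.OPERATION_SUCCEEDED): State.READY,
    # assumed: a failed revert leaves the snapshot itself intact
    (State.REVERTING, Event.OPERATION_FAILED): State.READY,
    (State.READY, Event.EXPUNGE_REQUESTED): State.EXPUNGING,
    (State.ERROR, Event.EXPUNGE_REQUESTED): State.EXPUNGING,
    (State.EXPUNGING, Event.OPERATION_SUCCEEDED): State.REMOVED,
    (State.EXPUNGING, Event.OPERATION_FAILED): State.ERROR,
}

# States that VMSnapshotSync would treat as "in progress".
TRANSIENT = {State.CREATING, State.REVERTING, State.EXPUNGING}


def transit(state: State, event: Event) -> State:
    """Return the next state, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.value} on {event.value}") from None
```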

createVMSnapshot:

Common workflow

  1. check authorization, concurrency, and existence
  2. allocate a VM snapshot entry in the DB
  3. transition the VM and VM snapshot states to Snapshotting/Creating
  4. prepare the TO object and the CreateVMSnapshotCommand
  5. send the command to the agent
  6. update the DB (e.g. the current/parent fields or the volumes table), depending on the CreateVMSnapshotAnswer and the TO object
  7. transition the VM and VM snapshot states
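The common workflow can be sketched against in-memory stand-ins for the DB and the agent. `VmSnapshot`, `Answer`, `create_vm_snapshot` and the state strings 'Snapshotting', 'Ready' and 'Error' are hypothetical; the sketch only illustrates the order of the steps and how the parent/current fields might be set, not the actual manager code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class VmSnapshot:
    id: int
    vm_id: int
    state: str = "Allocated"
    parent: Optional[int] = None
    current: bool = False


@dataclass
class Answer:
    """Hypothetical stand-in for CreateVMSnapshotAnswer."""
    result: bool
    volume_paths: Dict[int, str] = field(default_factory=dict)  # volume id -> new path


def create_vm_snapshot(db: dict, vm: dict, send: Callable[[dict], Answer]) -> VmSnapshot:
    # 1. concurrency/existence check (authorization is omitted in this sketch)
    if vm["state"] not in ("Running", "Stopped"):
        raise RuntimeError(f"VM {vm['id']} is busy: {vm['state']}")
    # 2. allocate the entry; its parent is the VM's current snapshot, if any
    current = next((s for s in db["snapshots"].values()
                    if s.vm_id == vm["id"] and s.current), None)
    snap = VmSnapshot(id=max(db["snapshots"], default=0) + 1, vm_id=vm["id"],
                      parent=current.id if current else None)
    db["snapshots"][snap.id] = snap
    # 3. move the VM and the snapshot into their in-progress states
    previous_vm_state = vm["state"]
    vm["state"], snap.state = "Snapshotting", "Creating"
    # 4-5. build the command payload (standing in for the TO object) and send it
    answer = send({"vmId": vm["id"], "vmSnapshotId": snap.id, "parent": snap.parent})
    # 6-7. update the DB from the answer and leave the in-progress states
    vm["state"] = previous_vm_state
    if not answer.result:
        snap.state = "Error"
        return snap
    if current is not None:
        current.current = False
    snap.current, snap.state = True, "Ready"
    db["volumes"].update(answer.volume_paths)  # non-empty only when paths change
    return snap
```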

Xenserver

  1. check whether this VM snapshot already exists; if so, return success
  2. check whether a snapshot task already exists for this VM snapshot; if so, this is a re-entrant call from fullsync, so skip creation and wait for that task
  3. find the target VM, or build a worker VM on the fly if it does not exist
  4. call the corresponding Xenserver API, depending on the snapshot type
  5. Xenserver does not change volume paths after taking a VM snapshot, so there is no need to pack volumeTOs into the answer object
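Steps 1 and 2 make the handler safe to call twice, which is what allows fullsync to re-send a command. A minimal sketch of that check-then-wait pattern, assuming a host object that tracks existing snapshots and in-flight tasks (`FakeHost` and `create_snapshot_idempotent` are hypothetical names):

```python
import threading


class FakeHost:
    """Hypothetical in-memory stand-in for a hypervisor host."""

    def __init__(self):
        self.snapshots = set()       # names of snapshots that already exist
        self.tasks = {}              # snapshot name -> threading.Event while in flight
        self.lock = threading.Lock()
        self.calls = 0               # number of real snapshot operations performed

    def take_snapshot(self, name):
        self.calls += 1
        self.snapshots.add(name)


def create_snapshot_idempotent(host, name, timeout=30.0):
    with host.lock:
        # Step 1: the snapshot already exists, so report success.
        if name in host.snapshots:
            return True
        task = host.tasks.get(name)
        owner = task is None
        if owner:
            # No in-flight task: register one and do the work ourselves.
            task = threading.Event()
            host.tasks[name] = task
    if not owner:
        # Step 2: re-entrant call; wait for the existing task instead of re-creating.
        task.wait(timeout)
        return name in host.snapshots
    try:
        # Steps 3-4 (find/build the VM, call the hypervisor API) collapsed here.
        host.take_snapshot(name)
        return True
    finally:
        with host.lock:
            del host.tasks[name]
        task.set()
```

Step 2 of the VMware create handler follows the same pattern with vm.snapshot tasks.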

KVM

  1. check whether this VM snapshot already exists; if so, return success
  2. find the target VM, or build a worker VM on the fly if it does not exist
  3. re-define the parent snapshots' metadata chain on the fly, based on the VMSnapshotTO object in the command
  4. call the libvirt API to take the snapshot
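Step 3 rebuilds the parent chain before calling libvirt. The sketch below computes that chain root-first from `parent` pointers (as stored in vm_snapshots) and emits abbreviated <domainsnapshot> XML for each snapshot in it. It assumes the chain is redefined root-first so each parent is known before its child, and shows only <name> and <parent>; a real redefinition (libvirt's VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE flag) needs more of the snapshot XML. Function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def parent_chain(snapshot_id, parents):
    """Return snapshot ids from the root down to snapshot_id.

    `parents` maps each snapshot id to its parent id (None for a root).
    """
    chain = []
    current = snapshot_id
    while current is not None:
        chain.append(current)
        current = parents[current]
    return list(reversed(chain))


def snapshot_xml(name, parent_name=None):
    """Build an abbreviated <domainsnapshot> document (name and parent only)."""
    root = ET.Element("domainsnapshot")
    ET.SubElement(root, "name").text = name
    if parent_name is not None:
        parent = ET.SubElement(root, "parent")
        ET.SubElement(parent, "name").text = parent_name
    return ET.tostring(root, encoding="unicode")


def redefine_docs(snapshot_id, parents, names):
    """XML for snapshot_id and all of its ancestors, root first."""
    docs = []
    for sid in parent_chain(snapshot_id, parents):
        parent_id = parents[sid]
        docs.append(snapshot_xml(names[sid],
                                 names[parent_id] if parent_id is not None else None))
    return docs
```

When creating a snapshot, the chain would start from the new snapshot's parent; when reverting, from the target snapshot itself.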

VMware

  1. check whether this VM snapshot already exists; if so, return success
  2. check whether a vm.snapshot task already exists for this VM snapshot; if so, wait for it and skip snapshot creation
  3. call the VMware SDK to take the snapshot
  4. because volume paths change after taking a snapshot, return the new volume paths in the answer

revertToVMSnapshot:

Common workflow

  1. check authorization, concurrency, and existence
  2. call advanceStart or advanceStop first if the revert will change the VM's state; for example, when reverting a stopped VM to a DiskAndMemory snapshot, start the VM first and then revert it
  3. transition the VM and VM snapshot states to Reverting
  4. prepare the TO objects and send the command
  5. update the DB with information from the Answer object
  6. transition the VM and VM snapshot states
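Step 2 comes down to comparing the VM's current power state with the state the snapshot restores. A sketch, assuming the VM state strings 'Running'/'Stopped' and the vm_snapshot_type values from the schema. The stopped-VM/DiskAndMemory case is taken from the text; stopping a running VM before reverting to a Disk snapshot is an assumption (a disk-only snapshot carries no memory, so the reverted VM cannot keep running). `pre_revert_action` is a hypothetical name.

```python
from typing import Optional


def pre_revert_action(vm_state: str, snapshot_type: str) -> Optional[str]:
    """Return the lifecycle call to make before reverting, or None."""
    if snapshot_type == "DiskAndMemory" and vm_state == "Stopped":
        # Case given in step 2: start the VM, then revert it.
        return "advanceStart"
    if snapshot_type == "Disk" and vm_state == "Running":
        # Assumed: a disk-only snapshot restores a powered-off VM.
        return "advanceStop"
    return None
```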

Xenserver

  1. build a worker VM if the target VM does not exist
  2. call the revert plugin
  3. update volumeTO

KVM

  1. find the target VM, or define a worker VM on the fly if it does not exist
  2. re-define the parent snapshots' metadata chain on the fly, based on the VMSnapshotTO object in the command
  3. call the libvirt API to revert

VMware

  1. check whether a revert task already exists for this VM; if so, wait for it
  2. call the VMware SDK to revert
  3. update volumeTO

deleteVMSnapshot:

Unlike VM expunging, VM snapshot deletion is designed as a synchronous operation: there is no daemon thread that scans for and expunges deleted VM snapshots.

The implementation is fairly straightforward:

  1. transition the VM snapshot to the Expunging state
  2. prepare the TO object and send the command
  3. update the snapshot tree
  4. mark the VM snapshot as removed
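Updating the snapshot tree is the only non-trivial step. One way to do it, sketched below, is to re-attach the deleted snapshot's children to its parent and hand the current marker to the parent when needed; both rules are assumptions, since the text only says the tree is updated. The state transition and agent command are omitted, and `Snap`/`delete_snapshot` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Snap:
    id: int
    parent: Optional[int]
    current: bool = False
    removed: bool = False


def delete_snapshot(snaps: Dict[int, Snap], snap_id: int) -> None:
    target = snaps[snap_id]
    # Update the snapshot tree: children of the deleted snapshot are
    # re-attached to its parent so the hierarchy stays connected.
    for s in snaps.values():
        if s.parent == snap_id and not s.removed:
            s.parent = target.parent
    # Assumed: if the deleted snapshot was current, its parent becomes current.
    if target.current and target.parent is not None:
        snaps[target.parent].current = True
    target.current = False
    # Mark as removed (a soft delete) rather than dropping the record.
    target.removed = True
```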

VMSnapshotSync:

  1. Add VM snapshot sync to fullSync and fullHostSync.
  2. It checks whether any VM snapshots are in transient states.
  3. A transient state found during host connection usually means a management server restart or outage, or a hypervisor cluster going down. Because the management server cannot tell whether those tasks succeeded, it re-sends the command in question.
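A sketch of the sync check, assuming Creating, Reverting and Expunging are the transient states and that each maps back to the command that produced it. Re-sending a create is safe because the per-hypervisor create handlers first check whether the snapshot already exists. The command names other than CreateVMSnapshotCommand, and `vm_snapshot_sync`, are assumptions.

```python
from typing import Callable, Iterable, List, Set

# Assumed mapping from a transient state to the command to re-send.
RESEND = {
    "Creating": "CreateVMSnapshotCommand",
    "Reverting": "RevertToVMSnapshotCommand",  # assumed name
    "Expunging": "DeleteVMSnapshotCommand",    # assumed name
}


def vm_snapshot_sync(snapshots: Iterable[dict], host_vm_ids: Set[int],
                     send: Callable[[str, dict], None]) -> List[int]:
    """Re-send commands for snapshots of this host's VMs left in transient states.

    Each snapshot is a dict with 'id', 'vm_id' and 'state'.
    Returns the ids of the snapshots whose command was re-sent.
    """
    resent = []
    for snap in snapshots:
        if snap["vm_id"] in host_vm_ids and snap["state"] in RESEND:
            send(RESEND[snap["state"]], snap)
            resent.append(snap["id"])
    return resent
```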

Enable/disable on a per-hypervisor basis:

Enable or disable VM snapshots through the hypervisor_capabilities table.

Add a new column `vm_snapshot_enabled` to the `hypervisor_capabilities` table, and update the related VO/DAO classes.

Set vm_snapshot_enabled = 1 for VMware/Xenserver.

Check hypervisor_capabilities when handling createVMSnapshot.
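The schema change and the check can be sketched with SQLite. Only the new vm_snapshot_enabled column comes from this design; the simplified hypervisor_type and hypervisor_version columns, the version strings and the `vm_snapshot_enabled` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified hypervisor_capabilities table with only the columns used here.
conn.execute("CREATE TABLE hypervisor_capabilities ("
             " hypervisor_type TEXT, hypervisor_version TEXT)")
# The new column from this design, defaulting to disabled.
conn.execute("ALTER TABLE hypervisor_capabilities"
             " ADD COLUMN vm_snapshot_enabled INTEGER NOT NULL DEFAULT 0")
conn.executemany("INSERT INTO hypervisor_capabilities"
                 " (hypervisor_type, hypervisor_version) VALUES (?, ?)",
                 [("VMware", "5.0"), ("XenServer", "6.0"), ("KVM", "default")])
# Enable VM snapshots for VMware and Xenserver only.
conn.execute("UPDATE hypervisor_capabilities SET vm_snapshot_enabled = 1"
             " WHERE hypervisor_type IN ('VMware', 'XenServer')")


def vm_snapshot_enabled(conn, hypervisor_type, version):
    """Check performed before handling createVMSnapshot; unknown rows mean disabled."""
    row = conn.execute(
        "SELECT vm_snapshot_enabled FROM hypervisor_capabilities"
        " WHERE hypervisor_type = ? AND hypervisor_version = ?",
        (hypervisor_type, version)).fetchone()
    return bool(row and row[0])
```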

Testing

Suggested basic test scenarios include (but are not limited to) the following:

Create one VM snapshot with snapshotMemory (on, off) for (vmware, xenserver, KVM) when the VM is (running, stopped)

Revert to a previous snapshot when the VM is (running, stopped)

Create multiple VM snapshots with snapshotMemory (on, off, mixed) for (vmware, xenserver, KVM) when the VM is (running, stopped); the snapshots should form a tree hierarchy, such as:

      A
     / \
    B   C
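One sequence that produces this tree is: create A, create B, revert to A, create C. The sketch below replays it, assuming a new snapshot's parent is the VM's current snapshot (the parent/current columns of vm_snapshots) and that reverting only moves the current marker. `SnapshotTree` is a hypothetical helper, not part of the design.

```python
class SnapshotTree:
    """Tracks parent/current links for one VM's snapshots."""

    def __init__(self):
        self.parent = {}      # snapshot name -> parent name (None for the root)
        self.current = None   # name of the current snapshot

    def create(self, name):
        # A new snapshot's parent is whichever snapshot is current.
        self.parent[name] = self.current
        self.current = name

    def revert(self, name):
        # Reverting makes the target current without changing the tree.
        if name not in self.parent:
            raise KeyError(name)
        self.current = name

    def children(self, name):
        return sorted(n for n, p in self.parent.items() if p == name)


tree = SnapshotTree()
tree.create("A")
tree.create("B")
tree.revert("A")
tree.create("C")
```

After the replay, A is the root and both B and C are its children.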

Revert to any snapshot in the tree when the VM is (running, stopped)

Delete (current, any, all) VM snapshots for (vmware, xenserver, KVM)

Attach a volume to, or detach a volume from, a VM that has VM snapshots.

Upgrade the VM's serviceOffering when the VM has snapshots with snapshotMemory (on, off)

Take a volume snapshot when the associated VM has VM snapshots
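The first scenario expands into a small test matrix. A sketch that enumerates it with itertools.product and flags the one combination the limitations table rules out (a disk-only snapshot of a running KVM VM); Xenserver editions are not distinguished, and the list and function names are hypothetical.

```python
from itertools import product

HYPERVISORS = ["vmware", "xenserver", "KVM"]
SNAPSHOT_MEMORY = [True, False]   # snapshotMemory on / off
VM_STATES = ["running", "stopped"]


def expected_to_succeed(hypervisor: str, memory: bool, vm_state: str) -> bool:
    # Per the limitations table, KVM cannot take a disk-only snapshot of a
    # running VM; every other combination here is listed as supported.
    return not (hypervisor == "KVM" and not memory and vm_state == "running")


# (hypervisor, snapshotMemory, VM state, expected outcome) for each case.
scenarios = [(h, m, s, expected_to_succeed(h, m, s))
             for h, m, s in product(HYPERVISORS, SNAPSHOT_MEMORY, VM_STATES)]
```

Each tuple can then drive a parameterized test that asserts success or an expected failure.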