Glossary

GPU - Graphics Processing Unit

pGPU - Physical Graphics Processing Unit

vGPU - Virtual Graphics Processing Unit

card - a GRID card, which supports vGPU functionality

GPU group - a group of one or more cards of the same type

Introduction

A GPU (graphics processing unit) is used to accelerate the creation of images in a frame buffer intended for output to a display. GPU-accelerated computing offers unprecedented application performance by offloading compute-intensive portions of an application to the GPU, while the remainder of the code still runs on the CPU. From a user's perspective, applications simply run significantly faster [1]. A GPU can also be used for general-purpose computing (GPGPU, general-purpose computing on graphics processing units), which is of significant importance in machine learning, image processing, linear algebra, statistics, and cloud computing.

Nowadays, physical servers ship with graphics cards that have multiple GPUs, and VMs running in the cloud can leverage their high computation power to meet the demands of graphics-intensive applications such as AutoCAD and Photoshop. There are also cards on the market that support sharing a GPU among multiple VMs by creating a vGPU for each VM; for example, NVIDIA has introduced the vGPU-capable GRID K1 and K2 cards, which allow multiple vGPUs on a single physical GPU.

With vGPU technology, the graphics commands of each virtual machine are passed directly to the underlying dedicated GPU, without translation by the hypervisor. This allows the GPU hardware to be time-sliced and shared across multiple VMs [2].

XenServer has added support for NVIDIA GRID K1 and GRID K2 cards [3]. It allows the VMs on XenServer hosts to use the GPU cards in the following ways:

  • GPU passthrough: allows the hypervisor to assign an entire pGPU to a VM; this is useful for power users.
  • vGPU: allows a VM to share a pGPU with other VMs; this is useful for tier-2 users.

Purpose

This is the functional specification of the feature "GPU and vGPU support for CloudStack Guest VMs", tracked under JIRA ids CS-4760 and CS-4762.

Bug Reference

References

  1. http://www.nvidia.com/object/what-is-gpu-computing.html
  2. http://www.nvidia.com/object/virtual-gpus.html
  3. http://us.download.nvidia.com/Windows/Quadro_Certified/GRID/312.56/312.56-nvidia-quadro-control-panel-quick-start-guide.pdf
  4. http://on-demand.gputechconf.com/gtc/2013/webinar/gtc-express-vgpu.pdf

Feature Specification

Currently, CloudStack is not aware of the GPU cards installed in a host and cannot leverage them. This feature helps users/admins assign a physical GPU (GPU passthrough) or a portion of a physical GPU card (vGPU) to a guest VM at the time of VM deployment, or at any later stage by changing the service offering. It enables running graphical applications on VMs in CloudStack.

  • CloudStack does the following:
    • Add hosts with GPU/vGPU capability provisioned by the admin.
    • Query the hypervisor to obtain information about the cards, the supported vGPU types (in the case of GRID cards), and the capacity of the cards.
    • Create compute offerings with GPU/vGPU capability.
    • Deploy VMs with GPU/vGPU support.
    • Destroy VMs with GPU/vGPU support and release the resources accordingly.
    • Allow the user to add GPU/vGPU support to a VM (deployed without it) by changing the service offering, and vice versa.
    • Migrate VMs (cold migration) with GPU/vGPU support.
    • Manage the capacity of the GPU cards.
  • Failure and recovery scenarios are handled gracefully.
  • CloudStack generates alerts whenever a VM deployment with a GPU/vGPU service offering is requested but no suitable GPU/vGPU capacity is available.
  • CloudStack generates suitable usage events when a VM using GPU/vGPU is deployed, destroyed, or migrated.
  • Global configuration:
    • None
  • In XenServer, there are two algorithms for allocating vGPUs on the given set of pGPUs (see the sketch after this list):
      • DFS (depth-first)
      • BFS (breadth-first)
    • The default allocation algorithm is DFS.
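
For illustration, here is a minimal sketch of how the two strategies could differ, under the common interpretation that depth-first packs vGPUs onto an already-used pGPU before starting a new one, while breadth-first spreads them across pGPUs. The Pgpu model and place_vgpu helper are hypothetical, not XenServer or CloudStack code; the homogeneity rule follows the K1/K2 restriction described under Interoperability below.

    # Illustrative model only; Pgpu and place_vgpu are hypothetical names.
    class Pgpu:
        def __init__(self, max_per_pgpu):
            self.max_per_pgpu = max_per_pgpu  # max vGPUs of a type per pGPU
            self.vgpu_type = None             # vGPU type currently resident, if any
            self.count = 0

        def can_host(self, vgpu_type):
            # Homogeneity rule: a pGPU hosts only one vGPU type at a time.
            if self.vgpu_type not in (None, vgpu_type):
                return False
            return self.count < self.max_per_pgpu

        def attach(self, vgpu_type):
            self.vgpu_type = vgpu_type
            self.count += 1

    def place_vgpu(pgpus, vgpu_type, algorithm="DFS"):
        candidates = [p for p in pgpus if p.can_host(vgpu_type)]
        if not candidates:
            return None  # maps to InsufficientServerCapacity upstream
        if algorithm == "DFS":
            # Depth-first: prefer the most-loaded eligible pGPU (pack densely).
            chosen = max(candidates, key=lambda p: p.count)
        else:
            # Breadth-first: prefer the least-loaded pGPU (spread the load).
            chosen = min(candidates, key=lambda p: p.count)
        chosen.attach(vgpu_type)
        return chosen

    # Example: a GRID K2 card has two pGPUs; K260Q allows 2 vGPUs per pGPU.
    card = [Pgpu(max_per_pgpu=2), Pgpu(max_per_pgpu=2)]
    for _ in range(3):
        place_vgpu(card, "GRID K260Q", algorithm="DFS")
    print([p.count for p in card])  # DFS packs: [2, 1]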

Negative Usage Scenarios

  1. VM deployment with GPU support when no host with enough GPU capacity is available.
  2. Creation of a service offering with unsupported GPU key/value pairs.
  3. Live migration of a GPU-enabled VM (not supported).

General Workflow

  1. Add a host which contains a GPU card:
    1. The admin adds a host with the existing addHost API command.
    2. CloudStack queries the hypervisor and detects whether the given host is GPU enabled.
    3. If the host has a GPU card, CloudStack queries the hypervisor and, as a response, gets the GPU group information, which is stored in the host_gpu_groups table.
    4. If the cards have only GPU-passthrough support, CloudStack stores the vGPU type as "passthrough" in the vgpu_types table.
    5. If the cards also have vGPU support, CloudStack checks for the enabled vGPU types in the hypervisor and stores them in the vgpu_types table.
    6. CloudStack also manages these cards by maintaining the capacity of each card.
  2. Create a compute offering with GPU/vGPU support:
    1. Create a compute offering with one of the following two options:
      1. GPU passthrough: to assign an entire pGPU.
      2. vGPU: to share a pGPU device. The admin also needs to pass the vGPU type. The steps are explained in the Web Services APIs section.
  3. Deploy a VM with GPU/vGPU support:
    1. The admin/user can deploy a VM with GPU/vGPU support by selecting an appropriate service offering.
    2. Once the deployment of a VM with GPU/vGPU support is triggered, the allocator decides which host to choose for the deployment based on the following criteria (see the sketch after this workflow):
      1. The host has GPU cards in it.
      2. In the case of vGPU, the cards support the required vGPU type.
      3. The cards have enough capacity available.
    3. Once the host is finalized, CloudStack goes ahead with the VM deployment.
    4. If the allocator doesn't find an appropriate host, it throws an InsufficientServerCapacity exception.
    5. Once the VM deployment is successful, CloudStack makes a resource call to the hypervisor to get the remaining GPU capacity in the card and updates the DB accordingly.
  4. Deploy a VM without GPU/vGPU support (this scenario exists to protect GPU resources and deploy such VMs on non-GPU hosts):
    1. The admin/user can deploy a VM without GPU/vGPU support by selecting a non-GPU service offering.
    2. Once the deployment of a VM without GPU/vGPU is triggered, the planner and allocator decide which cluster and which host to choose based on the following criteria:
      1. Once the planner gets a list of clusters which have enough capacity to host this VM, it reorders the list to put clusters with GPU-enabled hosts at the bottom. See the next section for more details.
      2. When the planner passes this list of clusters to the allocator, the allocator checks all the hosts in each cluster; if a cluster contains GPU-enabled hosts, the allocator reorders its host list to put the GPU-enabled hosts at the bottom.
    3. Once the host is finalized, CloudStack continues with the VM deployment.
    4. If the allocator doesn't find an appropriate host, it throws an InsufficientServerCapacity exception.
  5. Destroy a VM with GPU/vGPU support:
    1. The admin/user can destroy a VM with GPU/vGPU support through the existing destroyVM flow.
    2. The hypervisor detaches the GPU card from the VM and makes it available for use.
    3. Once the VM is destroyed, CloudStack makes a resource call to the hypervisor to get the remaining GPU capacity in the card and updates the DB accordingly.
  6. Migrate (cold migration) a VM:
    1. Search for hosts that are available for VM migration and satisfy the GPU requirement.
    2. If such a host is available, stop the VM on the current host and perform the VM migration; otherwise the migration fails.
    3. If the VM migration is successful, update the remaining GPU capacity for both hosts accordingly.
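
The host-selection criteria in step 3 can be summarized with a small sketch. This is illustrative Python, not CloudStack's actual allocator; the Host and GpuGroup structures and their field names are hypothetical stand-ins for the host_gpu_groups/vgpu_types data described later in this document.

    from dataclasses import dataclass, field
    from typing import Dict, List, Optional

    @dataclass
    class GpuGroup:
        name: str
        # vGPU type -> remaining capacity (passthrough is modeled as a type too)
        remaining: Dict[str, int] = field(default_factory=dict)

    @dataclass
    class Host:
        name: str
        gpu_groups: List[GpuGroup] = field(default_factory=list)

    def find_gpu_host(hosts: List[Host], vgpu_type: str) -> Optional[Host]:
        """Pick the first host with a GPU group that supports the requested
        vGPU type and still has capacity; None maps to an
        InsufficientServerCapacity exception in CloudStack."""
        for host in hosts:
            for group in host.gpu_groups:                  # criterion 1: host has GPU cards
                if group.remaining.get(vgpu_type, 0) > 0:  # criteria 2 and 3
                    return host
        return None

    hosts = [
        Host("host-1"),  # no GPU: skipped for GPU offerings
        Host("host-2", [GpuGroup("Group of NVIDIA Corporation GK104GL [GRID K2] GPUs",
                                 {"GRID K260Q": 4, "passthrough": 2})]),
    ]
    print(find_gpu_host(hosts, "GRID K260Q").name)  # host-2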
Protecting non-GPU instances from being deployed on GPU capable hosts

The default first-fit planner only looks at the CPU and memory capacity of a cluster to create an ordered cluster list. The (host) allocator then goes over the hosts in a cluster to find a suitable host for deployment. If a suitable host isn't found, the next cluster in the ordered list is searched for a deployment destination. Since the default planner isn't aware of the GPU capabilities of a host, this may lead to a situation where VMs not requiring GPU capabilities use up CPU/memory resources on GPU-capable hosts. To address this issue, the cluster list needs to be ordered such that, for non-GPU VMs, clusters with GPU hosts are towards the end of the list. The solution needs to be generic enough to handle other capabilities in the future.

The scenario is handled as follows (a sketch of the ordering heuristic appears after this list):

  • A concept of implicit tags for hosts has been introduced. All hosts with GPU capabilities will be tagged with the implicit tag ‘GPU’. The default planner will order the cluster list from least to most count of unique implicit tags within the cluster.

  • Implicit tags will be implemented as a comma-separated list of global configuration parameters. The category will be ‘Hidden’. Initially, ‘GPU’ will be the only implicit tag available. The list can grow as support for other features is added in the future.

  • When the management server connects to a host, it will tag the host with the implicit tag ‘GPU’ if the host has GPU capabilities.

  • In addition to the CPU and memory heuristics, the default planner will order the clusters from least to most count of unique implicit tags in a cluster.

  • This will make sure clusters which do not have GPU hosts in them are picked first for deploying non-GPU instances.
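
A minimal sketch of this reordering heuristic, assuming hypothetical Cluster and Host records (this is not the actual planner code; implicit_tags would come from the ‘Hidden’ global configuration described above):

    from dataclasses import dataclass
    from typing import List, Set

    @dataclass
    class Host:
        name: str
        implicit_tags: Set[str]  # e.g. {"GPU"} for GPU-capable hosts

    @dataclass
    class Cluster:
        name: str
        hosts: List[Host]

    def order_for_non_gpu_vm(clusters: List[Cluster]) -> List[Cluster]:
        """Order clusters from least to most count of unique implicit tags,
        so clusters without GPU hosts are tried first for non-GPU VMs."""
        def unique_tag_count(cluster: Cluster) -> int:
            tags = set()
            for host in cluster.hosts:
                tags |= host.implicit_tags
            return len(tags)
        return sorted(clusters, key=unique_tag_count)

    clusters = [
        Cluster("c-gpu", [Host("h1", {"GPU"})]),
        Cluster("c-plain", [Host("h2", set())]),
    ]
    print([c.name for c in order_for_non_gpu_vm(clusters)])  # ['c-plain', 'c-gpu']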

DB Schema changes:

New tables

  • host_gpu_groups (id, group_name, host_id, update_time): This table contains the mapping between a host and the available GPU groups (PCI devices) on that host.

    Field       Type                 NULL  Key  Default  Comments
    id          bigint(20) unsigned  NO    PRI  NULL     auto_increment
    group_name  varchar(40)          NO         NULL     GPU group name
    host_id     bigint(20) unsigned  NO         NULL

  • vgpu_types (id, gpu_group_id, vgpu_type, video_ram, max_heads, max_resolution_x, max_resolution_y, max_gpu_per_pgpu, remaining_capacity, max_capacity): This table contains the record of available vGPU profiles.

    Field               Type                 NULL  Key  Default  Comments
    id                  bigint(20) unsigned  NO    PRI  NULL     auto_increment
    gpu_group_id        bigint(20) unsigned  NO         NULL     "id" in the host_gpu_groups table
    vgpu_type           varchar(40)          NO         NULL     name label of the vGPU type
    video_ram           bigint(20) unsigned             NULL     video RAM for this vGPU type
    max_heads           bigint(20) unsigned             NULL     maximum displays per user
    max_resolution_x    bigint(20) unsigned             NULL     maximum X resolution per display
    max_resolution_y    bigint(20) unsigned             NULL     maximum Y resolution per display
    max_gpu_per_pgpu    bigint(20) unsigned             NULL     maximum no. of vGPUs per GPU (pGPU)
    remaining_capacity  bigint(20) unsigned             NULL     remaining no. of vGPUs that can be created with this vGPU type on the given GPU group
    max_capacity        bigint(20) unsigned             NULL     maximum no. of vGPUs that can be created with this vGPU type on the given GPU group

Web Services APIs:
  • createServiceOffering: This API allows the creation of a service offering. To create a service offering with GPU/vGPU support, the admin needs to pass the additional information mentioned below, using the serviceofferingdetails parameter in key-value pair format (a signed-request sketch follows the examples below).

Supported new keys and their corresponding values:

  • pciDevice:
    • Group of NVIDIA Corporation GK107GL [GRID K1] GPUs
    • Group of NVIDIA Corporation GK104GL [GRID K2] GPUs
    • (or any other GPU group name)
  • vgpuType:
    • GRID K100
    • GRID K120Q
    • GRID K140Q
    • GRID K160Q
    • GRID K180Q
    • GRID K200
    • GRID K220Q
    • GRID K240Q
    • GRID K260Q
    • GRID K280Q
    • passthrough (for GPU passthrough)

         E.g., to create a service offering with GPU passthrough support:
                http://localhost:8080/client/api?command=createServiceOffering&serviceofferingdetails[1].key=pciDevice&serviceofferingdetails[1].value=Group%20of%20NVIDIA%20Corporation%20GK107GL%20[GRID%20K1]%20GPUs&serviceofferingdetails[2].key=vgpuType&serviceofferingdetails[2].value=passthrough&response=json&.........

                To create a service offering with vGPU support (GRID K260Q is a K2 profile, so the K2 GPU group is used):
                http://localhost:8080/client/api?command=createServiceOffering&serviceofferingdetails[1].key=pciDevice&serviceofferingdetails[1].value=Group%20of%20NVIDIA%20Corporation%20GK104GL%20[GRID%20K2]%20GPUs&serviceofferingdetails[2].key=vgpuType&serviceofferingdetails[2].value=GRID%20K260Q&response=json........
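
         For convenience, here is a minimal Python sketch of issuing such a request with the standard CloudStack API signing scheme (sort the parameters, lowercase the URL-encoded query string, compute an HMAC-SHA1 over it with the secret key, and Base64-encode the digest). The endpoint, API key/secret, and offering values below are placeholders:

    import base64, hashlib, hmac, urllib.parse, urllib.request

    API_URL = "http://localhost:8080/client/api"
    API_KEY = "<api-key>"        # placeholder
    SECRET_KEY = "<secret-key>"  # placeholder

    def signed_query(params):
        params = dict(params, apikey=API_KEY, response="json")
        # Canonical string: keys sorted, values URL-encoded (%20 form), all lowercase.
        canonical = "&".join(
            "%s=%s" % (k.lower(), urllib.parse.quote(str(v), safe="").lower())
            for k, v in sorted(params.items())
        )
        digest = hmac.new(SECRET_KEY.encode(), canonical.encode(), hashlib.sha1).digest()
        params["signature"] = base64.b64encode(digest).decode()
        return urllib.parse.urlencode(params)

    params = {
        "command": "createServiceOffering",
        "name": "vGPU-K260Q", "displaytext": "Offering with GRID K260Q vGPU",
        "cpunumber": 2, "cpuspeed": 1000, "memory": 4096,
        "serviceofferingdetails[1].key": "pciDevice",
        "serviceofferingdetails[1].value": "Group of NVIDIA Corporation GK104GL [GRID K2] GPUs",
        "serviceofferingdetails[2].key": "vgpuType",
        "serviceofferingdetails[2].value": "GRID K260Q",
    }
    with urllib.request.urlopen(API_URL + "?" + signed_query(params)) as resp:
        print(resp.read().decode())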

  • listServiceOfferings: If a service offering is GPU/vGPU enabled, this API also lists the GPU card and vGPU type information as key-value pairs:
    • GPU
    • vgpuType
  • listHosts: If the host is GPU enabled, this API also lists the following GPU-related information (see the parsing sketch after this list):
    • The GPU cards (GPU groups) managed by the host.
    • The enabled vGPU types, each with:
      • Video RAM
      • Max heads
      • Max X resolution
      • Max Y resolution
      • Max vGPUs per GPU card (pGPU)
      • Remaining capacity (remaining no. of vGPUs that can be created with this vGPU type in the given GPU group)
      • Maximum capacity
  • listVirtualMachines: If the VM has GPU support, this API also includes the following information in the VM statistics:
    • vGPU type
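
A client could read the GPU information back out of a listHosts JSON response roughly as follows. Note that the GPU-related response field names used here (gpugroup, gpugroupname, vgpu, vgputype, remainingcapacity) are assumptions for illustration; consult the API response of your build for the exact keys. signed_query() is the helper from the createServiceOffering sketch above.

    import json, urllib.request

    url = API_URL + "?" + signed_query({"command": "listHosts", "type": "Routing"})
    with urllib.request.urlopen(url) as resp:
        hosts = json.load(resp)["listhostsresponse"].get("host", [])

    for host in hosts:
        # GPU-related field names below are assumed for illustration.
        for group in host.get("gpugroup", []):
            print(host["name"], group.get("gpugroupname"))
            for vgpu in group.get("vgpu", []):
                print("  ", vgpu.get("vgputype"), "remaining:", vgpu.get("remainingcapacity"))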

Interoperability and compatibility requirements

  • Supports XenServer 6.2 SP1 and later versions [4].
  • GPU/vGPU functionality is supported for the following HVM guest operating systems:
    • Windows 7 (x86 and x64)
    • Windows Server 2008 R2
    • Windows Server 2012
    • Windows 8 (x86 and x64)
    • Windows 8.1 ("Blue") (x86 and x64)
    • Windows Server 2012 R2 (server equivalent of "Blue")
  • CloudStack does not restrict the deployment of GPU-enabled VMs with guest OS types that are not supported by XenServer for GPU/vGPU functionality. The deployment will succeed and a GPU/vGPU will be allocated to the VM, but due to missing guest OS drivers, the VM will not be able to leverage the GPU resources. It is therefore recommended to use GPU-enabled service offerings only with supported guest operating systems.
  • NVIDIA GRID K1 (with 16 GiB of video RAM) and K2 (with 8 GiB of video RAM) cards support homogeneous virtual GPUs, i.e. at any given time, the vGPUs resident on a single physical GPU must all be of the same type. However, this restriction doesn't extend across physical GPUs on the same card. Each physical GPU on a K1 or K2 may host different types of virtual GPU at the same time. For example, a GRID K2 card has two physical GPUs and supports four types of virtual GPU: GRID K200, GRID K220Q, GRID K240Q, and GRID K260Q [3].

    Card          Profile Name  Video RAM per vGPU  Max Resolution  vGPUs per GPU  vGPUs per GRID card

    K1            GRID K100     256 MB              1920 x 1200     8              32
    (4 physical   GRID K120Q    512 MB              2560 x 1600     8              32
    GPUs)         GRID K140Q    1 GB                2560 x 1600     4              16
                  GRID K160Q    2 GB                2560 x 1600     2              8
                  GRID K180Q    4 GB                2560 x 1600     1              4

    K2            GRID K200     256 MB              1920 x 1200     8              16
    (2 physical   GRID K220Q    512 MB              2560 x 1600     8              16
    GPUs)         GRID K240Q    1 GB                2560 x 1600     4              8
                  GRID K260Q    2 GB                2560 x 1600     2              4
                  GRID K280Q    4 GB                2560 x 1600     1              2

Logging and debugging

  • All VM activities involving GPU are logged at the appropriate log levels (INFO, ERROR, DEBUG) in the management server log file.

Operations not supported

  • Live VM migration.
  • VM snapshot with memory.

Current Limitations

  • Reservation of GPU resources: CloudStack does not hold GPU resources for a stopped VM, or to recover a destroyed VM on the same host.
  • XenServer supports only one vGPU per VM, so the admin/user cannot create more than one vGPU for a VM.
  • Host maintenance puts the vGPU VMs on that host into the Stopped state and migrates the non-vGPU VMs to other suitable hosts.

Upgrade Scenario

  • During operation, CloudStack checks for GPU cards in all the hosts that it manages.
  • If CloudStack finds a GPU-enabled host, it persists the corresponding entries in the host_gpu_groups and vgpu_types tables.

UI Flow

  • To create a service offering with vGPU support: go to the "Service Offerings" tab ==>> "Add compute offering"; the user will get two parameter fields, "GPU" and "vGPU Type". As of now, two cards are supported in the UI, i.e. the NVIDIA GRID K1 and K2 cards, but if the user is using any other card, then he/she has to use the API and pass the full GPU card name in the "GPU" parameter field and "passthrough" in the "vGPU Type" field, as mentioned under the Web Services APIs section.

  • To check whether a service offering is GPU/vGPU enabled, go to the service offering detail view and check the "GPU" and "vGPU Type" fields.

  • If the host is GPU enabled, then go to Infrastructure -> Hosts -> Details; there the user will see a new "GPU" tab next to the Statistics tab.

  • If the VM is GPU enabled, then apart from the CPU and memory details on the VM detail view page, the VGPU parameter will also be populated with the vGPU model name.

  • On the Dashboard, "System Capacity" will have one more pie chart to display zone-wide GPU information.
