Purpose

This is the functional specification of the feature "Support Management and Storage traffic over VMware DVS in CloudStack", which has Jira reference CLOUDSTACK-3317.

Currently, in CloudStack deployments over the VMware hypervisor, both management and storage network traffic are supported only over the standard virtual switch.

The goal of this feature is to extend CloudStack's integration with VMware DVS to support traffic types other than guest and public.

Bug Reference

CLOUDSTACK-3317

Branch

CloudStack-4.5.1

Introduction

VMware Distributed Switch is an aggregation of per-host virtual switches presented and controlled as a single distributed switch through vCenter Server at the Datacenter level. vDS abstracts configuration of individual virtual switches and enables centralized provisioning, administration, and monitoring.

Each standard vSwitch represents an independent point of configuration that needs to be managed and monitored. Managing the virtual networks required by instances in the cloud is tedious when those networks have to span a large number of hosts; using a distributed vSwitch (vDS) simplifies the configuration and monitoring. Being standalone implementations, standard vSwitches do not provide any support for virtual machine mobility, so a component is needed to ensure that the network configurations on the source and destination virtual switches are consistent, allowing the VM to operate without breaking connectivity or network policies. In particular, during migration of a VM across hosts, the synchronization among peer vSwitches needs to be taken care of. With a distributed vSwitch, during vMotion the vCenter server updates the vSwitch modules on the hosts in the cluster accordingly.

vDS is an integral component of vCenter. Hence, native vDS support makes sense for wider and larger deployments of CloudStack over vSphere.

Document History

  • Author: Sateesh Chodapuneedi
  • Description: Initial Revision
  • Date: 17 Mar 2015

Glossary

  • dvSwitch / vDS - VMware vNetwork Distributed Virtual Switch.
  • vSwitch - VMware vNetwork Standard Virtual Switch.
  • dvPort - Distributed Virtual Port (member of dvPortGroup).
  • dvPortGroup - Distributed Virtual Port Group.

Use cases

  1. Cloud administrator would like to use dvSwitch as the backend for management traffic.
  2. Cloud administrator would like to use dvSwitch as the backend for storage traffic.
  3. Cloud administrator needs more flexibility in combining and planning traffic types over the available physical NICs. VMware restricts a physical NIC to be used by only a single type of vSwitch at a time, and the current integration of CloudStack with VMware dvSwitch does not support management and storage traffic. Hence, if the admin prefers to carry guest/public traffic over dvSwitch, he/she cannot configure guest or public traffic to use the same physical NIC as management/storage traffic. This means dedicated physical NIC(s) are required for each of management and storage traffic. A cloud admin with infrastructure limitations would like to overcome this need for extra physical NICs in ESXi hosts used in a CloudStack zone with dvSwitch enabled for public/guest traffic types. With this feature, it is possible to share a physical NIC among all traffic types, giving the administrator more choice while planning his/her cloud.
  4. Cloud administrator would like to re-deploy CloudStack frequently over the same infrastructure (for PoCs etc.) with easier cleanup of the virtual networks created in ESXi hosts. Standard vSwitch networks make cleanup tedious in larger environments because they exist at the host level and can be scattered across many hosts in the zone, so the admin needs to clean up the virtual networks on a per-host basis. With dvSwitch, the admin has a single point of cleanup (the dvSwitch itself) within the scope of the zone.

Feature Specifications

This feature enables CloudStack to do the following.

  1. Create and manage management and storage virtual networks over dvSwitch in the VMware data center associated with the CloudStack zone.
  2. Provide a means for the cloud administrator to specify a dvSwitch as the physical network end point to be used for storage and/or management traffic in the cloud.
  3. Configure the designated dvSwitch for management of virtual networks carrying management and/or storage traffic in the cloud.
Configuration of dvSwitch by CloudStack includes the following operations:
  • Create dvPortGroup over designated dvSwitch
  • Modify dvPortGroup over designated dvSwitch
Configuration of dvSwitch by CloudStack does not include the following operations:
  • Create dvSwitch
  • Add host to dvSwitch
  • Dynamic migration of virtual adapters of existing VMs across different types of virtual switches in scenarios where multiple types of virtual switches co-exist. Instead this is left to the administrator to decide.
  • Configuration of PVLAN
  • Configuration of dvPort mirroring
  • Configuration of user defined network resource pools for Network I/O Control (NIOC)
Quality risks
  • Functional
    • Zone configured to use different types of switches for different traffic types
  • Non-functional
    • Large numbers of VMs and isolated networks need to be tested for performance-specific results.
    • Negative usage scenarios - Some admins may want to update traffic labels in an already deployed zone. To accommodate such cases, disable the zone and update the physical network traffic labels with the new dvSwitch. Restart the management server, enable the zone, and stop and start the system VMs to verify that all virtual networks are migrated to the dvSwitch specified by the updated traffic label.
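Traffic label updates such as the negative scenario above are normally issued through the CloudStack API (the updateTrafficType command). As an illustration only, here is a minimal Python sketch of building a signed CloudStack API request using the standard HMAC-SHA1 signing scheme; the id, apikey, and secret values are placeholders, and this is not code from the feature itself:

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlencode

def sign_request(params, secret_key):
    """Build a signed CloudStack API query string (HMAC-SHA1 scheme).

    params: dict of API parameters, including 'command' and 'apikey'.
    Returns the query string with the 'signature' parameter appended.
    """
    # Canonical form: sort by key, URL-encode values, lower-case everything.
    canonical = "&".join(
        f"{k.lower()}={quote(str(v), safe='*').lower()}"
        for k, v in sorted(params.items())
    )
    digest = hmac.new(secret_key.encode(), canonical.encode(), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    return urlencode(params) + "&signature=" + quote(signature, safe="")

# Hypothetical example: point an existing traffic type at a dvSwitch.
# The id, apikey, and secret below are placeholders, not real values.
query = sign_request(
    {
        "command": "updateTrafficType",
        "id": "uuid-of-traffic-type",
        "vmwarenetworklabel": "dvSwitch0,,vmwaredvs",
        "apikey": "APIKEY",
        "response": "json",
    },
    "SECRETKEY",
)
```

The resulting query string would be appended to the management server's API endpoint; as described above, the zone should be disabled beforehand and the system VMs stopped and started afterwards.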
Audit Events
  • Virtual network orchestration events in management server logs
  • Events and tasks on specific VM object in vCenter
  • Events and tasks on designated dvSwitch in vCenter
Graceful failure and recovery scenarios
  • If a non-existent dvSwitch is specified in a traffic label, virtual network creation fails with a log/error statement (in the management-server.log file) stating that the specified dvSwitch is not found in the current VMware data center. Two recovery scenarios exist in such a case, as follows. Once the admin takes corrective action, by editing the traffic label with the right dvSwitch, CloudStack will pick up the right dvSwitch and continue to create the required dvPortGroup(s).
    • [Requires restart of management server] Admin needs to specify the right dvSwitch (one already present in the current VMware data center) in the respective traffic label.
    • [Does not require restart of management server] Create/rename a dvSwitch such that the name matches the dvSwitch name specified in the traffic label.
  • If vCenter goes offline for a brief interval due to network outages while CloudStack is attempting to create a dvPortGroup on a dvSwitch, CloudStack detects the invalid session object and performs a reasonable number of retries before continuing with the job in hand (e.g. deploying the SSVM), rather than failing on the first attempt.
  • If a dvPortGroup, or the dvSwitch itself, is deleted by external entities, then upon a power cycle operation of a VM, CloudStack can restore all the dvPortGroups associated with the virtual networks the VM is part of.
  • If traffic shaping or bandwidth limit values are lost/updated by external entities, the values can be restored simply by a power cycle operation of any CloudStack VM connected to those dvPortGroups.
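The transient-outage handling described above amounts to a bounded retry around the vSphere API call. A minimal, generic Python sketch; this is illustrative only, not the actual VmwareResource code, and the exception name and retry counts are assumptions:

```python
import time

class InvalidSessionError(Exception):
    """Raised when the vCenter session object is found to be invalid."""

def with_retries(operation, retries=3, delay_seconds=2):
    """Run `operation`, retrying on transient vCenter session failures.

    Mirrors the behaviour described above: on an invalid session object,
    retry a bounded number of times (re-establishing the session would
    happen inside `operation`) instead of failing the job in hand, such
    as an SSVM deployment, on the first attempt.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return operation()
        except InvalidSessionError as err:
            last_error = err
            if attempt < retries:
                time.sleep(delay_seconds)  # back off before retrying
    raise last_error
```

For example, an operation that fails twice with an invalid session and then succeeds would complete on the third attempt rather than aborting the job.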
Possible fallback or workaround
  • Connectivity issues observed in VMs connected to dvPortGroups created by CloudStack
    • Could be due to an invalid IP range/subnet specified for the pod. The fallback would be to update the pod IP range with the correct IP range.
    • Could be due to a non-existent VLAN provided in the traffic label. The workaround is to create the VLAN in the upstream switch, if not already created, and also ensure the VLAN is trunked/added to the switch port to which the physical NIC of each ESXi host in the zone is connected. Update the traffic label with the correct VLAN.
    • Could be due to an already-used IP range specified during pod configuration, which results in an IP conflict even though the dvPortGroup settings are correct. The workaround is to update the pod configuration with a correct/unused IP range/subnet.
  • Insufficient resource issues observed (resulting in failure) during creation of a dvPortGroup while CloudStack implements a virtual network
    • Could be a result of hitting the maximum number of dvPortGroups per dvSwitch in large scale environments. The workaround is to clean up any unused dvPortGroups on that dvSwitch.
Sanity check list for support team
  • Check vCenter's network configuration to confirm the dvSwitch specified in the traffic label really exists.
  • Check vCenter's network configuration to confirm the dvSwitch has at least one physical NIC added.
  • Check vCenter's network configuration to confirm the dvSwitch has been added to all ESXi hosts in the zone.
  • Check in vCenter the status of the dvPort and the physical NIC port, whether active or blocked.
  • Check that the vCenter version matches a version this feature supports.
  • Check the upstream switch's port configuration to confirm that the management/storage VLANs (which were specified while creating the physical network carrying the respective traffic in CloudStack) are indeed added/trunked.
  • Verify network connectivity between the management server and vCenter, e.g. with a ping or a telnet to port 443 (vSphere web services) on vCenter. CloudStack fires vSphere API calls to vCenter web services on port 443 to create/update a dvPortGroup on the specified dvSwitch, hence checking connectivity to vCenter is vital when failures are observed while implementing virtual networks.
  • Verify vCenter health upon vSphere API failures even when the network is faultless; if vCenter runs out of memory/swap space, vSphere API calls tend to fail.
  • Check that the pod IP range is unused by any other VM/physical host to avoid future IP conflicts.
  • Check that the storage IP range is unused by any other VM/physical host to avoid future IP conflicts.
  • Verify that the traffic shaping policy configured in the network offering is indeed applied in the dvPortGroup settings.
  • Check whether dvPort level override is enabled in the dvSwitch. dvPort level override means the settings configured by CloudStack at the dvPortGroup level could be overridden and ignored if port level settings are modified by some external entity. The recommendation to admins is not to enable the dvPort level override capability.
  • Check if any upgrade from a previous version of CloudStack has taken place.
  • Note down the switch type of each traffic type and verify from the respective product install/admin guide that the deployment is supported by the specific version of CloudStack.
  • Check whether any traffic label modifications were done by the admin before the moment failures started occurring. The admin may do this from the CloudStack UI/API or by running SQL over the CloudStack database. If the update is done via UI/API, it can be detected by searching api.log/management-server.log for the string "command=updateTrafficType?". This helps in narrowing down the area to investigate.
  • Check for any product upgrade/patching that has taken place recently.
  • Check whether any vCenter/ESXi host upgrade/patching has taken place recently.
  • Look for vCenter alarms/warnings and alerts raised against hosts (in the zone) managed by CloudStack in vCenter.
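The connectivity check suggested above (ping or telnet to port 443) can also be done programmatically. A minimal sketch, assuming a plain TCP connection attempt is a sufficient first-level check:

```python
import socket

def can_reach(host, port=443, timeout_seconds=5):
    """Return True if a TCP connection to host:port succeeds.

    A quick stand-in for 'telnet <vcenter> 443' when verifying that the
    management server can reach the vSphere web services endpoint.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False
```

This only confirms TCP reachability; a TLS handshake or an actual vSphere API call would still be needed to rule out certificate or service-level problems.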

Configuration characteristics
  • Use the existing global configuration parameter "vmware.use.dvswitch" of type Boolean. Possible values are "true" or "false"; the default value is "false". This parameter acts as an umbrella parameter for all types of distributed virtual switches. If this parameter is true, the VMware distributed virtual switch would be used.
  • Global configuration parameter "vmware.ports.per.dvportgroup" of type integer. This is used only in the case of dvSwitch version 4.1 or earlier. It is ignored for vSphere version 5.0 or later, as CloudStack leverages the autoExpand feature available in vSphere from version 5.0 onwards, which enables dynamic up/down scaling of dvPorts in a dvPortGroup. The default value is 256. Each dvPortGroup created by CloudStack would have this number of dvPorts.
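The interplay of these two parameters can be sketched as follows. This is an illustrative Python sketch of the sizing rules described above, not CloudStack code; the function name and return shape are assumptions:

```python
def dvportgroup_sizing(vsphere_version, ports_per_dvportgroup=256):
    """Return dvPortGroup sizing per the rules above.

    On vSphere 5.0 or later, autoExpand is used and the fixed port count
    (vmware.ports.per.dvportgroup) is ignored; on earlier versions each
    dvPortGroup gets a fixed number of dvPorts.
    """
    parts = vsphere_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if (major, minor) >= (5, 0):
        return {"autoExpand": True, "numPorts": 0}  # ports grow/shrink on demand
    return {"autoExpand": False, "numPorts": ports_per_dvportgroup}
```

For example, a 4.1 environment would get fixed-size dvPortGroups of 256 dvPorts (the default), while a 5.0 or later environment relies on autoExpand.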
Deployment requirements
  • One or more Hosts with VMware vSphere 5.0 or above
  • One or more vCenter 5.0 or above installations, with the license required to enable dvSwitch
  • VMware dvSwitch must already be created in the vCenter data center associated with CloudStack zone
  • All the ESXi hosts in a vSphere cluster should be added to dvSwitch before adding the vSphere cluster to a pod in CloudStack zone.
  • Valid/unused set of IP/subnet and VLAN (if applicable)
Interoperability and compatibility requirements
  • Hypervisors - VMware vSphere 5.0 or later
Impact on upgrade/migration
  • An upgrade to the CloudStack version supporting this feature from a previous version does not introduce any migration/upgrade operation in an existing cloud deployment.
  • This feature gives the admin/operator an option to switch to DVS as the network backend from the existing vSwitch backend. Such a switch/migration is purely optional, and the admin/operator needs to decide when to carry it out. The migration operation is out of scope of this feature and needs to be carried out separately.
  • Legacy zones and normal zones are treated the same as far as this feature is concerned.
Performance & Scalability implications
  • In the case of vSphere 4.1, a dvPortGroup needs to be created with a specific number of dvPorts. In large scale deployments, optimal use of dvPorts may not be possible due to this pre-allocation. This issue does not apply to deployments with vSphere version 5.0 or later, where CloudStack leverages the vSphere 5.0 "autoExpand" feature, which automatically increments/decrements the number of dvPorts based on usage/requirement.
  • Network switches (including the vSwitch in an ESXi host) keep a distinct forwarding table for each VLAN; this could lead to increased overhead in packet forwarding when a considerable number of isolated networks, each with a significant number of virtual machines, is configured in a data center.
Target user
  • Cloud administrator

Architecture and Design description

  1. CloudStack relies on the physical network traffic label of a specific traffic type to learn the physical network end point of configuration for virtual networks carrying that traffic. The physical network traffic label of each traffic type supported by CloudStack can be specified while creating the physical network in a CloudStack zone. CloudStack uses the physical network end point of a specific traffic type whenever a virtual network of the respective traffic type has to be implemented.
  2. Hence, to support a specific traffic type over dvSwitch as the network backend, all CloudStack needs to know is the physical network end point of configuration, i.e. the name of the dvSwitch itself. Here is how general network entities map to virtual network entities in a VMware environment:
    1. Physical NIC(s) ~ dvSwitch
    2. Virtual network ~ dvPortGroup
    3. Virtual NIC ~ dvPort
  3. To support management and/or storage traffic over dvSwitch, CloudStack does the following:
    1. Reads the physical network traffic labels to learn the designated dvSwitch(es). VmwareResource does the necessary create or update operation over the dvSwitch.
    2. The traffic label information is passed down to VmwareResource as part of NicTO object while executing following Commands.
      1. StartCommand
      2. PlugNicCommand
      3. PrepareForMigrationCommand
      4. MigrationWithStorageCommand
    3. In VmwareResource, the method getTargetSwitch parses the traffic label from the NicTO object as per the traffic label format specified below.
    4. In VmwareResource, a dvPortGroup (analogous to a CloudStack virtual network) is created over the dvSwitch obtained by parsing the traffic label fetched in the above step.
  4. For management and storage traffic the traffic label format is ["Name of vSwitch/dvSwitch"[,"VLAN ID"[,"vSwitch Type"]]]
    1. In the traffic label all 3 fields are optional.
    2. 1st field - Represents the name of the virtual/distributed virtual switch at vCenter. The default value assumed depends upon the type of virtual switch. Default values are as follows.
      1. "vSwitch0" if the type of virtual switch is "VMware vNetwork Standard virtual switch"
      2. "dvSwitch0" if the type of virtual switch is "VMware vNetwork distributed virtual switch"
    3. 2nd field - VLAN ID to be used for this traffic wherever applicable.
      1. By default an empty string is assumed, which translates to untagged VLAN for that specific traffic type.
    4. 3rd field - Type of virtual switch, specified as a string. Possible valid values are vmwaredvs and vmwaresvs. Each translates as follows.
      1. "vmwaresvs" represents "VMware vNetwork Standard virtual switch"
      2. "vmwaredvs" represents "VMware vNetwork distributed virtual switch"
      3. If nothing is specified (left empty), the standard vSwitch is assumed unless the global configuration parameter "vmware.use.dvswitch" is set to "true". By default this parameter is set to false.
    5. As per the above format, a few possible values for the network label are:
      1. "" (empty string)
      2. dvSwitch0
      3. dvSwitch0,200
      4. dvSwitch0,300,vmwaredvs
      5. dvSwitch0,,vmwaredvs
    6. Port binding would be of type static.
    7. CloudStack creates the dvPortGroup with "autoExpand" enabled, so that the required number of dvPorts are allocated/de-allocated automatically. This ensures that unnecessary dvPorts are not pre-allocated, for better resource utilization.
    8. Packages that encapsulate the code changes required by this feature are:
      1. cloud-core package (VmwareManager, VmwareResource)
      2. cloud-server package
      3. cloud-plugins-hypervisor-vmware-base package (for mo and util packages)
    9. The scenarios to be covered during developer testing are:
      1. Adding a host/compute resource to a pod/cluster.
      2. Creating the necessary CloudStack managed virtual networks on the designated dvSwitch.
      3. Deployment/movement of system VMs.
      4. All system VM life cycle operations which might need orchestration of the management/storage network.
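The traffic label format and defaults described in item 4 can be illustrated with a small parser sketch. This is not the actual getTargetSwitch implementation; it simply encodes the default rules listed above, with use_dvswitch standing in for the global parameter vmware.use.dvswitch:

```python
def parse_traffic_label(label, use_dvswitch=False):
    """Parse a management/storage traffic label of the form
    "switch name[,VLAN ID[,switch type]]" using the defaults above.

    use_dvswitch stands in for the global parameter vmware.use.dvswitch,
    which selects the switch type (and hence the default switch name)
    when the third field is empty.
    """
    fields = label.split(",") if label else []
    name = fields[0].strip() if len(fields) > 0 and fields[0].strip() else None
    vlan = fields[1].strip() if len(fields) > 1 else ""  # empty => untagged
    stype = fields[2].strip() if len(fields) > 2 else ""
    if not stype:
        stype = "vmwaredvs" if use_dvswitch else "vmwaresvs"
    if name is None:
        name = "dvSwitch0" if stype == "vmwaredvs" else "vSwitch0"
    return {"switch": name, "vlan": vlan, "type": stype}
```

For example, "dvSwitch0,300,vmwaredvs" parses to switch dvSwitch0, VLAN 300, type vmwaredvs, while an empty label falls back to vSwitch0 (or dvSwitch0 when vmware.use.dvswitch is true) with an untagged VLAN.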

UI flow

  • While creating a zone using the zone configuration wizard, in the physical network configuration page, the admin specifies the physical network traffic label for management/storage traffic in the format mentioned above (see item (4) in the section "Architecture and Design description").

Usage

No impact.

Security

 No impact.

API Changes

 No changes.

DB Changes

 No changes.

Deployment scenario specific questions

  1. What is the level of support for migration scenarios?
    1. Migration is allowed but not supported implicitly; the admin needs to do it explicitly. All instances/system VMs over the specific network backend being migrated need to be stopped and started. Refer to the sample workflow on the wiki for "Support Management and Storage traffic over VMware DVS in CloudStack".
  2. Does an existing CloudStack deployment over standard vSwitch need to be migrated to dvSwitch?
    1. No
  3. Is a mix of switch types allowed within management and/or storage traffic in a single zone?
    1. No
  4. Is a mix of switch types allowed between different traffic types in a single zone?
    1. Yes

 
