Introduction

A VPC in CloudStack currently provides an isolated section of a zone which can contain one or more isolated networks/tiers with connectivity to each other. While the current VPC model in CloudStack is a natural choice for building multi-tier applications, it has the limitation that all tiers need to be in the same zone, so the VPC is confined to a single zone. An application hosted in a VPC therefore cannot be architected to be highly available in the face of zone failures, as the zone becomes a single point of failure for the application. As VPCs get deployed in multi-zone CloudStack deployments, users will expect a VPC to span multiple zones to enable use cases like the one described below.


Here is an example VPC deployment which a user may want to create on top of a CloudStack deployment with two zones. The user would like to create a VPC with two web tiers, two app tiers and one DB tier, and design the VPC so that there is a web tier and an app tier in each zone for redundancy. The user wishes to use the GSLB service in CloudStack in combination with zone-level LB to distribute traffic across the web tier VMs in both zones, and internal LB in both zones to load balance traffic from the web tier to the app tier. The DB tier stretches across the zones, with master and slave DBs set up in an active-passive failover configuration. In this VPC deployment, even if one of the zones goes down, the application hosted in the VPC can continue to be available.


While a region level VPC with tiers in different zones appears to be a logical extension of the current VPC model, there are challenges that make it non-trivial to achieve. If the VPC VR is to do inter-tier routing, then tiers need to be extended to the zone running the VPC VR. VLAN is currently the predominant isolation type and provides isolation for the VPC tiers as well. Stretching a VLAN across data centers requires a non-trivial network architecture and special solutions [1]. Overlay networks in the context of network virtualization give us completely virtualized networks abstracted from the underlay/physical network [2]. If there is L3 connectivity between the zones and tunnels can be established across zones, then a virtual network that spans zones can easily be built as an overlay network. When VPC tiers are built as overlay networks, it opens up new possibilities and makes it easy to realise use cases like the one above.

Goals

CloudStack has multiple solutions (GRE tunnel overlays with OVS integration, STT overlays with NVP, etc.) to build overlay networks. As network virtualization with overlay networks becomes mainstream, it is important for CloudStack to lead and adopt new network models made possible by advances in SDN and network virtualization. The following are the high-level goals this proposal aims to achieve.

  • introduce the notion of a VPC which can span one or more zones, in other words a region level VPC in contrast to the zone level VPC that exists in CloudStack today
  • a region level VPC where individual tiers can be in different zones in the region
  • a VPC where tiers within the VPC can be stretched across zones
  • stretching a VPC tier fundamentally requires, as a building block, the ability to stretch a guest network across zones. The proposal leverages this fact and adds support for stretching both 'isolated' and 'shared' guest networks in advanced zones across zones.
  • introduce a generic framework where plug-ins that provide overlay network solutions can adopt the new VPC network models
  • extend the current OVS plug-in to support region level VPC and guest networks (isolated and shared) stretched across zones

Deployment Model

In the default deployment model for a region level VPC, a single VPC VR provides inter-tier routing and also enforces network ACLs. Public access is provided by the VPC VR through NAT & LB services.

[Diagram: default deployment model for a region level VPC spanning a region with four zones]

The diagram above elaborates the default deployment model with an example CloudStack deployment which has a region with four zones. The VPC spans all the zones (tier 1 in zone 1, tier 2 has a stretched L2 subnet across zones 2 and 3, and tier 3 in zone 4). Inter-tier traffic has to go through the VPC VR in zone 1 even if the destination tier is not in zone 1. Outbound public traffic also has to go through the VPC VR in zone 1. Likewise, the default source NAT IP for the VPC and any acquired public IPs will be from zone 1 and will be associated with the VPC VR.

While a VPC is fully functional with the default deployment model, there are shortcomings. In this deployment model the VPC VR is a single point of failure, defeating the purpose of a region level VPC, and traffic faces the traffic trombone [3] problem. The section 'Future work' has additional details on planned enhancements that alleviate this problem and aim for an efficient forwarding path for the VPC.

Deployment requirement

For overlay networks that span zones, there must be L3 connectivity across zones so that tunnels can be established between hypervisors in different zones. So it is assumed that, for region level VPC and stretched L2 isolated/shared networks to work, the deployment has zones with L3 connectivity between them.

Purpose & Scope

The purpose of this document is to present the functional specification for the enhancements needed to achieve the above mentioned goals. The scope of this document is restricted to enabling the default deployment model of region level VPC. The OVS plug-in [4] in CloudStack has the ability to create overlay networks with GRE tunnels across Xen and KVM hypervisors. The scope of the proposal is restricted to extending the OVS plug-in in CloudStack to support region level VPC and stretched layer 2 isolated/shared networks across zones.

References

  1. http://www.networkworld.com/news/tech/2010/090310-layer2-data-center-interconnect.html
  2. http://etherealmind.com/introduction-to-how-overlay-networking-and-tunnel-fabrics-work/
  3. http://blog.ipspace.net/2011/02/traffic-trombone-what-it-is-and-how-you.html
  4. OVS Tunnel Manager for CloudStack

Functional Specification

This section details the changes required in the network stack to achieve the stated goals of the proposal.

API/Service layer changes

  • introduce two new 'Connectivity' service capabilities
    • 'stretchedl2subnet': this capability shall indicate that the 'Connectivity' service provider can support a stretched guest network across zones
    • 'regionlevelvpc': this capability shall indicate that the 'Connectivity' service provider can support a region level VPC
  • createNetworkOffering API shall be enhanced to take 'stretchedl2subnet' as a capability for the 'Connectivity' service. The provider specified for the 'Connectivity' service shall be validated against the capabilities declared by the service provider, to ensure the provider supports the 'stretchedl2subnet' capability.
  • createVPCOffering API shall be enhanced to take 'regionlevelvpc' as a capability for the 'Connectivity' service. The provider specified for the 'Connectivity' service shall be validated against the capabilities declared by the service provider, to ensure the provider supports the 'regionlevelvpc' capability (a sketch of this validation appears after the note below).
  • createVpc API shall be enhanced to make zone id an optional parameter. If the VPC offering of the VPC does not support the 'regionlevelvpc' capability, then createVpc shall fail if zone id is not specified
  • createNetwork API shall be enhanced so that the zone id in which the network is being created is not checked against the VPC zone id, if the VPC offering of the VPC has the 'regionlevelvpc' capability
  • deployVirtualMachine API shall permit specifying a network id from a different zone than the zone in which the VM is being launched only if the network offering of the network has the 'stretchedl2subnet' capability.
  • when the deployVirtualMachine API is called with a network id from a zone different from the zone in which the VM is launched (i.e. stretching a network to a new zone), there shall be a check to ensure that the 'Connectivity' service provider of the network is also configured and enabled in the target zone
  • listVPCs API shall return a VpcResponse that includes the details of all the zones the VPC spans
  • listNetworks API shall return a NetworkResponse that includes the details of the zones the network spans
  • listNetworkOfferings API shall return a NetworkOfferingResponse that contains the 'Connectivity' service and 'stretchedl2subnet' capability details of the offering if they are configured
  • listVPCOfferings API shall return a VpcOfferingResponse that contains the details of the 'Connectivity' service and 'regionlevelvpc' capability
  • deleteZone API: requires a check for stretched L2 networks and region level VPCs spanning the zone being deleted

Note:

  • there is no explicit API to stretch a network to a new zone. On a deployVirtualMachine API call with a network in a different zone than the zone in which the VM is being launched, it is implicitly assumed the network is extended to the new zone
  • when createVPC is called without a zone id, the VPC shall be considered region level and no additional API calls are required to create networks in any of the zones in the region
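For illustration, a minimal sketch of the offering-level capability validation described above is shown below. It assumes a hypothetical helper class (ConnectivityCapabilityChecker) and plain string capability names; it indicates the intent of the check behind createNetworkOffering and createVPCOffering rather than the actual CloudStack implementation.

    // Hypothetical sketch of the capability validation described above; class,
    // method and capability names are illustrative, not actual CloudStack code.
    import java.util.Map;
    import java.util.Set;

    public class ConnectivityCapabilityChecker {

        // Capability names introduced by this proposal.
        public static final String STRETCHED_L2_SUBNET = "stretchedl2subnet";
        public static final String REGION_LEVEL_VPC = "regionlevelvpc";

        /**
         * Ensures the provider chosen for the 'Connectivity' service in an offering
         * actually declares the requested capability ('stretchedl2subnet' for
         * createNetworkOffering, 'regionlevelvpc' for createVPCOffering).
         */
        public static void validate(String requestedCapability,
                                    Map<String, Set<String>> declaredCapabilities) {
            Set<String> connectivityCaps = declaredCapabilities.get("Connectivity");
            if (connectivityCaps == null || !connectivityCaps.contains(requestedCapability)) {
                throw new IllegalArgumentException("Connectivity provider does not declare the '"
                        + requestedCapability + "' capability");
            }
        }
    }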

Network Orchestrator workflow changes

  • there shall be no changes required to network and VPC life cycle management by the network orchestrator
  • stretching a network: when a network is stretched to a new zone, it is expected that the 'Connectivity' service provider extends the overlay network to the host in the new zone, as part of the prepare call of the NetworkElement (a sketch follows this list).
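A simplified, hypothetical sketch of the behaviour expected from a 'Connectivity' provider during the prepare phase is shown below; the class name, method signature and bookkeeping are illustrative and do not mirror the exact NetworkElement interface.

    // Illustrative sketch only: when a NIC is prepared for a VM in a zone the
    // overlay does not yet cover, the provider extends the overlay to that zone.
    import java.util.HashSet;
    import java.util.Set;

    public class StretchedNetworkElementSketch {

        // Zones (by id) the overlay network currently spans.
        private final Set<Long> zonesWithOverlay = new HashSet<>();

        // Called while preparing a VM's NIC on a host in destinationZoneId.
        public void prepare(long networkId, long destinationZoneId) {
            if (!zonesWithOverlay.contains(destinationZoneId)) {
                extendOverlayToZone(networkId, destinationZoneId); // provider-specific, e.g. GRE tunnel setup
                zonesWithOverlay.add(destinationZoneId);           // network now spans this zone
            }
        }

        private void extendOverlayToZone(long networkId, long zoneId) {
            // Placeholder for provider-specific tunnel/bridge setup across zones.
        }
    }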

DB schema changes

  • a new column 'span_multiple_zones' shall be added to the 'networks' table. When 'span_multiple_zones' is false, the network is confined to a single zone and 'networks.data_center_id' shall be used to determine the zone corresponding to the network.

                 ALTER TABLE `cloud`.`networks` ADD COLUMN `span_multiple_zones` BOOLEAN DEFAULT FALSE;

  • when a network is created with a network offering that has the 'stretchedl2subnet' capability, the 'span_multiple_zones' flag shall be set for the network
  • a new table 'network_zones_map' shall be added to track the zones a network spans (an illustrative VO sketch for these map tables follows the schema statements below)

                 CREATE TABLE `cloud`.`network_zones_map` (
                       `id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
                       `network_id` BIGINT(20) UNSIGNED NULL,
                       `data_center_id` BIGINT(20) UNSIGNED NOT NULL,
                       PRIMARY KEY (`id`)
                 );

  • a new column 'region_level_vpc' shall be added to the 'vpc' table. When 'region_level_vpc' is false, the VPC is confined to a single zone and 'vpc.zone_id' shall be used to determine the zone corresponding to the VPC.
  • when a VPC is created with an offering that has the 'regionlevelvpc' capability, the 'region_level_vpc' column for the VPC shall be set to true
  • a new table 'vpc_zones_map' shall be added to track the zones a VPC spans

                 CREATE TABLE `cloud`.`vpc_zones_map` (
                       `id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
                       `vpc_id` BIGINT(20) UNSIGNED NULL,
                       `data_center_id` BIGINT(20) UNSIGNED NOT NULL,
                       PRIMARY KEY (`id`)
                 );
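As an illustration, the proposed 'network_zones_map' table could be backed by a VO class in the style CloudStack uses for other tables; the class and accessor names below are assumptions for this sketch, and 'vpc_zones_map' would get an analogous VO.

    // Illustrative VO for the proposed 'network_zones_map' table; names are
    // assumptions made for this sketch, not existing CloudStack classes.
    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.GenerationType;
    import javax.persistence.Id;
    import javax.persistence.Table;

    @Entity
    @Table(name = "network_zones_map")
    public class NetworkZoneMapVO {

        @Id
        @GeneratedValue(strategy = GenerationType.IDENTITY)
        @Column(name = "id")
        private long id;

        // Network that has been stretched across zones.
        @Column(name = "network_id")
        private long networkId;

        // One row per zone the network spans.
        @Column(name = "data_center_id")
        private long dataCenterId;

        protected NetworkZoneMapVO() {
        }

        public NetworkZoneMapVO(long networkId, long dataCenterId) {
            this.networkId = networkId;
            this.dataCenterId = dataCenterId;
        }

        public long getNetworkId() {
            return networkId;
        }

        public long getDataCenterId() {
            return dataCenterId;
        }
    }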

OVS plug-in enhancements

  • add OVS as a provider of the 'Connectivity' service for VPCs.
  • OVS element shall declare 'stretchedl2subnet' as a supported capability for the 'Connectivity' service
  • OVS element shall declare 'regionlevelvpc' as a supported capability for the 'Connectivity' service (a declaration sketch follows below)
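A minimal sketch of how the OVS element could declare the new capabilities for the 'Connectivity' service is given below; the map-of-maps shape follows the usual getCapabilities() convention, but the class name and literal strings are assumptions for this sketch.

    // Illustrative capability declaration for the OVS element; not actual plug-in code.
    import java.util.HashMap;
    import java.util.Map;

    public class OvsConnectivityCapabilities {

        public static Map<String, Map<String, String>> getCapabilities() {
            Map<String, Map<String, String>> capabilities = new HashMap<>();

            Map<String, String> connectivity = new HashMap<>();
            connectivity.put("stretchedl2subnet", "true"); // can stretch a guest network across zones
            connectivity.put("regionlevelvpc", "true");    // can back a VPC spanning zones in a region

            capabilities.put("Connectivity", connectivity);
            return capabilities;
        }
    }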

UI changes

  • ability to create a VPC from the regions tab. This option should be present only if there is a VPC offering with the 'regionlevelvpc' capability.
  • in the deploy VM wizard, if the user has a network created from a network offering with the 'stretchedl2subnet' capability, then that network should be displayed irrespective of which zone the VM is being launched in.
  • the create network offering dialog box shall show a checkbox for selecting the 'stretchedl2subnet' capability

Limitations

There are inefficiencies in the default region level VPC deployment model proposed above.

  • VPC VR is a single point of failure
  • Since the VPC VR does inter-tier routing, all inter-tier traffic has to go through the VPC VR
  • Since the VPC VR is also the network ACL provider, all traffic that enters and leaves a network has to traverse the VPC VR
  • public traffic shall exit from the single zone in which the VPC VR is running, as there is only one VPC VR providing NAT and PF services
  • acquired public IPs, irrespective of tier, shall belong to the zone running the VPC VR, as the public IP will be associated with the VPC VR.

Future work

Future work and possible enhancements can be categorised as follows.

  • reducing cross-zone traffic so that only legitimate traffic crosses the zones
    • distributed routing: in the current VPC model, inter-tier traffic is routed through the VPC VR. The problem with this approach is that even when the source and destination tiers are in the same zone, traffic still needs to go to the VPC VR to get routed, which is undesirable when the VPC VR is in a different zone. The idea is to make the vSwitch in the hypervisor the first and last routing hop, so a packet from a VM in one tier gets routed by its hypervisor directly to the hypervisor running the destination VM.
    • network ACLs to be evaluated at the vSwitch.
    • active-active VPC VR setup: the idea is that if a VPC spans multiple zones, then each zone has its own VPC VR, which can be the provider of source NAT and 1:1 NAT/PF services for the tiers/networks in that zone. This serves two purposes: public traffic can exit from the same zone, and a zone failure does not impact the remaining zones.
  • handling BUM traffic in a region level VPC
    • ARP proxy at the hypervisor to resolve ARP requests locally at the hypervisor level.