Thinking Out Loud: HP Flex-10 Design Considerations

Along with a number of other projects recently, I’ve also been spending time working with HP Virtual Connect Flex-10. You may have seen these (relatively) recent Flex-10 articles:

Using VMware ESX Virtual Switch Tagging with HP Virtual Connect
Using Multiple VLANs with HP Virtual Connect Flex-10
Follow-Up About Multiple VLANs, Virtual Connect, and Flex-10

As I began to work up some documentation for internal use at my employer, I asked myself this question: what are the design considerations for how an architect should configure Flex-10?

Think about it for a moment. In a “traditional” VMware environment, architects will place port groups onto vSwitches (or dvPort groups onto dvSwitches) based on criteria like physical network segregation, number of uplinks, VLAN support, etc. In a Flex-10 environment, those design criteria begin to change:

  • The number of uplinks doesn’t matter anymore, because bandwidth is controlled in the Flex-10 configuration. You want 1.5Gbps for VMotion? Fine, no problem. You want 500Mbps for the Service Console? Fine, no problem. You want 8Gbps for IP-based storage traffic? Fine, no problem. As long as it all adds up to 10Gbps, architects can subdivide the bandwidth however they desire. So the number of uplinks, from a bandwidth perspective, is no longer applicable.
  • Physical network segregation is a non-issue, because all the FlexNICs share the same LOM and will (as far as I know) all share the same uplinks. (In other words, I don’t think that LOM1:a can use one uplink while LOM1:b uses a different uplink.) You’ll need physically distinct NICs in order to handle physically segregated networks. Of course, physically segregated networks present a bit of a challenge for blade environments anyway, but that’s beside the point.
  • VLAN support is a bit different, too, because you can’t map overlapping VLANs to FlexNICs on the same LOM. In addition, because of the way VLANs work within a Virtual Connect environment, I don’t see VLANs being an applicable design consideration anyway; there’s too much flexibility in how VLANs are presented to servers for that to drive how networking should be set up. (The sketch after this list illustrates both the bandwidth and VLAN constraints.)
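
To make those two constraints concrete, here’s a minimal sketch (Python, purely illustrative; the FlexNIC names, speeds, and VLAN IDs are ones I made up, not a recommendation) that validates a per-LOM carve-up: the FlexNIC speeds can’t exceed the LOM’s 10Gbps, and no VLAN can be mapped to more than one FlexNIC on the same LOM.

    # Hypothetical per-LOM plan: FlexNIC name -> (speed in Gbps, VLAN IDs).
    lom1 = {
        "LOM1:a": (0.5, {10}),          # Service Console
        "LOM1:b": (1.5, {20}),          # VMotion
        "LOM1:c": (2.0, {30}),          # IP-based storage
        "LOM1:d": (6.0, {40, 41, 42}),  # VM traffic
    }

    def validate_lom(flexnics, capacity_gbps=10.0):
        """Check the two constraints described in the bullets above."""
        total = sum(speed for speed, _ in flexnics.values())
        if total > capacity_gbps:
            raise ValueError(f"FlexNIC speeds total {total} Gbps, over the {capacity_gbps} Gbps LOM")
        seen = {}
        for name, (_, vlans) in flexnics.items():
            for vlan in vlans:
                if vlan in seen:
                    raise ValueError(f"VLAN {vlan} is on both {seen[vlan]} and {name} (same LOM)")
                seen[vlan] = name
        return total

    print(f"LOM1 carve-up uses {validate_lom(lom1)} of 10 Gbps")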

So what are the design considerations for Flex-10 in VMware environments, then? What would drive an architect to specify multiple FlexNICs per LOM instead of just lumping everything together in a single 10Gbps pipe? Is bandwidth the only real consideration? I’d love to hear what others think. Let me hear your thoughts in the comments—thanks!

  1. Brad Hedlund

    Scott,

    You asked: “What would drive an architect to specify multiple FlexNICs per LOM instead of just lumping everything together in a single 10Gbps pipe?”

    Quality of Service is a key factor. The architect needs to consider what QoS capabilities exist within Flex-10 before making the decision to lump everything together on one LOM.

    If everything were lumped together on one 10Gbps pipe, does Flex-10 have the ability to provide a minimum guaranteed bandwidth to, say, the VMkernel or Service Console?

    If said QoS capabilities do not exist in Flex-10, then there is no choice but to hard-partition bandwidth via rate-limited FlexNICs in order to provide a minimum guaranteed bandwidth to critical services.

    Cheers,
    Brad

  2. slowe

    Brad,

    QoS (or bandwidth concerns) was what I was alluding to when I asked if bandwidth was the only consideration. You’re exactly correct—if I need to guarantee a specific amount of bandwidth in a Flex-10 environment then I do need to partition that off onto one or more FlexNICs. What I’d really like to know, though, is what OTHER considerations may drive the need for additional FlexNICs.

  3. Brad Hedlund

    Scott,

    Some database systems such as Oracle RAC require the use of separate NICs for clustering and heartbeat traffic, so FlexNICs could be provisioned to satisfy that requirement.

    Cheers,
    Brad

  4. Justin

    Brad,
    But that would only apply to a physical Oracle RAC node, because in a VM you could just add a second vNIC to the VM.
    Scott,
    As for your comment about not being able to use separate uplinks, that’s not entirely true. I can have LOM1a going out one link in the back of the VC module and LOM1b going out another link.
    We’ve used this in environments where we are implementing 10GbE iSCSI on a separate physical network and the main network is only 1GbE. Set up two NICs on each LOM (one at 2Gb, one at 8Gb) and have all the VM-related traffic going out the “a” NICs and all iSCSI traffic going out the “b” NICs.
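
    A rough sketch of that layout (the Virtual Connect network names and uplink ports below are hypothetical placeholders; only the 2Gb/8Gb split and the “a”/“b” assignment come from the description above):

        # Hypothetical mapping of FlexNICs to Virtual Connect networks and uplinks;
        # the vNet names and uplink ports are placeholders, not real configuration.
        flexnics = {
            "LOM1:a": {"speed_gbps": 2, "vc_network": "Prod-LAN",    "uplink": "Bay1:X1"},
            "LOM1:b": {"speed_gbps": 8, "vc_network": "iSCSI-10GbE", "uplink": "Bay1:X2"},
            "LOM2:a": {"speed_gbps": 2, "vc_network": "Prod-LAN",    "uplink": "Bay2:X1"},
            "LOM2:b": {"speed_gbps": 8, "vc_network": "iSCSI-10GbE", "uplink": "Bay2:X2"},
        }

        # Group the FlexNICs by network to show the "a" NICs and "b" NICs leaving
        # through different uplinks, and sanity-check that each LOM stays at 10Gbps.
        by_network, per_lom = {}, {}
        for nic, cfg in flexnics.items():
            by_network.setdefault(cfg["vc_network"], []).append((nic, cfg["uplink"]))
            lom = nic.split(":")[0]
            per_lom[lom] = per_lom.get(lom, 0) + cfg["speed_gbps"]

        for network, members in sorted(by_network.items()):
            print(network, "->", members)
        print("Per-LOM totals (Gbps):", per_lom)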

  5. Carl S.

    Scott,

    There is no doubt that Flex-10 enables a new way of LAN thinking with ESX. Having two 10Gb pipes that can be carved into as many as eight NICs provides some creative new ways to deal with VMs, and VMware architects have not had these options before.

    Before even addressing the cool things you can do, we need to keep in mind that Flex-10 supports the current ESX 3.5 version in addition to vSphere, and is totally transparent – no drivers, licensing, or extra support is required in the ESX kernel. A very high percentage of customers using blades (and moving to Flex-10) are existing ESX 3.5 customers who want greater bandwidth and flexibility without having to go through an arduous software upgrade cycle.

    QoS is a nice thing to have, but keeping in mind that the Nexus 1000V and QoS are not available in ESX 3.5, it is rarely a decision point with regard to current Flex-10 implementations.

    In regard to assigning the same VLAN across multiple FlexNICs (on the same LOM), you must realize that a FlexNIC is NIC hardware, not a switch. In order for the same VLAN to be on multiple FlexNICs on the same LOM, the NIC itself would have to be able to handle some switching functions. At the Flex-10 module level it is an Ethernet rule-breaker to have the same packet reflected back up the same port it came in on (can you say Ethernet loop?). By disallowing the same VLAN across FlexNICs on the same LOM, you remove the possibility of creating a loop. If the 802.1 Ethernet standard gets expanded to include VEPA, this would not be an issue in the future, as the new standard would provide for that. At present we are restricted to following all of the current 802.1 rules.

    Since all 10Gb-enabled blades can be leveraged well with Flex-10, you can pretty much carve up to 20Gb of dedicated LAN bandwidth per server – and if you use Fibre Channel you still have dual 8Gb (or 4Gb) Fibre Channel connectivity through the VC-FC modules. The FC modules now come in 8Gb (eight 8Gb uplinks) and 4Gb (four 4Gb uplinks) versions. That is huge bandwidth, and VMotion does not have to be included in the LAN bandwidth allocation if the VM clusters are within a single Virtual Connect domain.

    With respect to what I have seen customers do, generally they create a small pipe for the Service Console and a much larger pipe for VMotion. The pipe created for VM traffic is usually more than 1Gb, and frequently customers will tunnel the tags through on that connection simply because they want the vSwitch to handle all of the tagging.

    Here is the nice part with regard to VMotion and multiple blade chassis. If the customer stacks the Virtual Connect modules ACROSS multiple chassis, then a dedicated high-bandwidth VMotion network can span across as many as 4 chassis, while not requiring any connection to the core network. The high speed VMotion network uses the stacking links to move traffic between the chassis. Then customers can create MULTIPLE VMotion networks if they want different clusters of VMware servers to span multiple chassis. The uplinks to the core then carry ONLY VM and console traffic, not VMotion. Very doable. Stacking also allows any FlexNIC on any blade to connect to any VLAN on any uplink or network within a 4 chassis stack. Very powerful.

    Scott, you commented about IP-based storage using a fat pipe as well. In general practice I have not seen this approach used, but it is viable to dedicate a large amount of bandwidth for an IP storage network on one or more FlexNICs. You could then also create some dedicated 10Gb uplinks from the Flex-10 into the storage core to support IP storage alone. Remember that each Flex-10 module can provide 70Gb of uplinks if you dedicate only one link to stacking. More complex designs across chassis could reduce that to 60Gb, but basically you have a lot of choices for bringing tons of bandwidth in for the network irrespective of VMotion. Because so much bandwidth is available, QoS becomes less important. (There’s a quick sketch of that uplink math at the end of this comment.)

    In the future Flex-10 will continue to incorporate new features, and the features requested most by customers are generally the highest priority.

    Full disclosure – HP employee
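
    A quick back-of-the-envelope sketch of the uplink math mentioned above, assuming eight 10Gb ports per Flex-10 module (that is what the 70Gb/60Gb figures imply; treat the port count as an assumption):

        # Assumed: eight 10Gb ports per Flex-10 module; some are reserved for stacking.
        MODULE_PORTS = 8
        PORT_SPEED_GBPS = 10

        def uplink_budget(stacking_links):
            """Uplink bandwidth left on one module after reserving ports for stacking."""
            return (MODULE_PORTS - stacking_links) * PORT_SPEED_GBPS

        for links in (1, 2):
            print(f"{links} stacking link(s): {uplink_budget(links)} Gbps of uplinks per module")
        # -> 70 Gbps with one stacking link, 60 Gbps with two, matching the figures above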

  6. Casper42

    I agree with what the others said about QoS and sizing being the main consideration.

    If it wasn’t for that, you could simply use a 10GB Pass Through (a little birdy told me these do exist somewhere) and then use VLAN Tagging on Console, VMKernel and Guest to get everyone on the same page.

    While I agree the power that Carl mentioned about stacking 4 chassis does sound nice, we’re still dealing with single 10GB interconnects on our Cisco back end, so having more than 10GB out of the entire stack won’t matter much.
    However on the flip side, the thought of a 20/30/40 Gbps stack chain among 4 chassis to isolate things like vMotion or IP Storage is a great idea and one I had never really considered. I figured 10Gbps uplink means your stack wouldn’t need to exceed 10.

    Ultimately, I’m in the same boat as you: I’m working with Flex-10 currently and will soon be working with Xsigo on a similar product that can also carry your FC SAN on the same cable. Just last week I attended a dog-and-pony show headlined by Cisco where they said they can not only do all of this, FC SAN included, but will soon have a Cisco Nexus 4000 series switch that goes in the back of the chassis; in combination with a CNA (10Gb converged network adapter) dropped into the blade, you get basically the same solution plus the ability to include FC SAN in the mix.

    What would be really nice is to see HP offer an alternate configuration for the BL490c G6 and its brethren that has a CNA onboard in place of a Flex-10. But seeing the cost of the Flex-10 Interconnect, somehow I think that is a dream that will never be realized.

    I’m going to add you to my Google Reader; this seems to be at least the second post I’ve made to your blog, and I look forward to any future updates you have on this subject since I’m going through the same thing.

  7. julianwood

    Although there’s a lot of talk about how much bandwidth you are able to get to your individual blades, I would be surprised to see people actually sending 2 x 10G to each blade. This seems like a massive amount of bandwidth which the hosts would probably not utilise.

    I think the power of the blade architecture is aggregating all this bandwidth into fewer uplinks, so you could have an entire chassis with 2 x 10Gb uplinks.
    In our network profiling we think that is enough bandwidth to drive an entire chassis full of blades, running SC, VM, VMotion and NFS traffic (NetApp) through two uplinks.

    We used to have 8 Gigabit NICs per host (2 x SC, 2 x VMotion, 2 x NFS, 2 x VM). Granted, this was the simple implementation when we did this a few years ago, but times have moved on and I’m sure we could now do the same with 4 x Gigabit NICs per host (2 x SC/VM, 2 x VMotion/NFS – with port groups sending traffic over separate active links).

    Well, add this up: for 16 ESX hosts we originally would have needed 128 Gigabit links; now we could get away with 64 Gigabit links (with 4 per host).

    Using HP blades we can now use 2 x 10Gb uplinks per chassis. That’s a massive reduction in cabling and its associated complexity and cost.

    The problem with aggregating all this traffic, though, is how you guarantee anything. You may allocate 1Gb to your VMs and 1Gb to your VMotion per host, but across 16 hosts that already adds up to 32Gb, more than two 10Gb uplinks can carry even if you’re being clever with your port groups. (There’s a quick sketch of this math at the end of this comment.)

    We’re trying to work out what’s best. Do we just have 2 x 10Gb NICs per host with all traffic going through these links and no Flexing? What do we share? I would like to ensure NFS traffic and VM traffic aren’t compromised, so I could put them in different port groups so they use different uplinks, but where would I put VMotion, as this could be a bandwidth hog?
    I don’t want to add more uplinks just to satisfy VMotion, as I don’t really need more bandwidth normally.

    This also means we don’t really use the FlexNIC part of Virtual Connect: even though we can create multiple networks, they would share the same uplinks, so (as per my original comments) allocating bandwidth per FlexNIC is a little meaningless.

    What I would love, though, is for HP to provide some reference architecture for ESX with Flex-10 and Virtual Connect. There’s just too much confusing information out there, and getting this design correct is critical. With more eggs in one basket, having to change the way you build your basket has a lot more impact.

    HP’s site is also very confusing and clumsy; they have forum posts in at least two separate locations and a sort of separate blade solutions area, but I still haven’t found simple things like the best way to integrate the c7000 with HPSIM.

    HP should really take a leaf out of NetApp’s book; NetApp is excellent at producing Technical Reports, which are incredibly useful in setting up their kit.

    Scott, thanks again for all your work.
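
    To put rough numbers on the guarantee problem mentioned above, here is a minimal oversubscription check (the per-host allocations are the illustrative 1Gb figures from this comment, not advice):

        # Rough oversubscription check for the 16-host example above; the per-host
        # allocations are the illustrative 1Gb figures from the comment.
        hosts = 16
        per_host_gbps = {"VM": 1.0, "VMotion": 1.0}   # minimums you'd like to guarantee
        chassis_uplinks_gbps = 2 * 10                 # two 10Gb uplinks per chassis

        demand = hosts * sum(per_host_gbps.values())
        ratio = demand / chassis_uplinks_gbps
        print(f"Worst-case demand: {demand:.0f} Gbps over {chassis_uplinks_gbps} Gbps of uplinks "
              f"({ratio:.1f}:1 oversubscribed)")
        # -> 32 Gbps of desired guarantees over 20 Gbps of uplinks (1.6:1), so the
        #    guarantees cannot all be honoured at once no matter how the port
        #    groups are arranged.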

  8. bob vance

    @carl

    With one chassis, using normal switch modules, we can keep the VMotion network from leaving the switch into the core, and therefore need no uplink cabling for it. Of course, we have to have a switch module for each NIC to be used, so typically two for the VMotion network.

    Are you saying that this is not true for the VC?
    I.e., that VMotion would require going out to an external switch and back to the other ESX host (absent the extra chassis you described)?

  9. Stuart Thompson

    Another reason for having multiple FlexNICs in your design is the VLAN limit that applies on a per-FlexNIC basis.

    Although the Shared Uplink Set supports 64 VLANs, and within a few weeks this will increase to 128, there appears to be a 28-VLAN limit applied per FlexNIC when using VLAN mapping.

    So in my environment, where we have 35 different VLANs presented to the ESX servers, we need to split these across different FlexNICs, whilst ensuring they are on different LOMs and obviously split across separate VC modules.
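
    A minimal sketch of that split, assuming the 28-VLAN-per-FlexNIC mapping limit described above (the VLAN IDs are made up for the example):

        # Split a VLAN list into groups that each fit under the per-FlexNIC mapping
        # limit described above. The VLAN IDs here are invented for the example.
        VLANS = list(range(100, 135))   # 35 VLANs, as in the environment described
        LIMIT = 28

        def split_vlans(vlans, limit=LIMIT):
            """Greedily chunk the VLAN list so each FlexNIC stays under the limit."""
            return [vlans[i:i + limit] for i in range(0, len(vlans), limit)]

        for i, group in enumerate(split_vlans(VLANS), start=1):
            print(f"FlexNIC {i}: {len(group)} VLANs ({group[0]}-{group[-1]})")
        # -> FlexNIC 1 gets 28 VLANs, FlexNIC 2 gets the remaining 7; in practice
        #    those FlexNICs would also be placed on different LOMs/VC modules, as
        #    noted above.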

  10. Ken Mitchell

    I hope it is possible in vConnect/Flex-10 to use VLAN “tunneling” (L2?) and let the vSwitch and the physical upstream switch handle the tags. But so far in my testing I cannot get NFS working through a vConnect tunnel. The 28-VLAN limit per FlexNIC is not an option in our environment.
    In my old production environment (ESX 3.5) we simply used Ethernet pass-through modules on our blade chassis, and maybe I’m wrong, but I would think that amounts to the same thing as a vConnect tunnel.