Thinking Out Loud: What if vSphere Supported FabricPath?

Since attending Cisco Live 2011 in Las Vegas earlier this year (see my summary list of blog posts), my mind has been swirling with ideas about how various technologies might work (or not work) together. In the first of a series of “Thinking Out Loud” posts, in which I’ll attempt to explore—and spur discussion regarding—how certain networking technologies might integrate with virtualization technologies, I’d like to tackle this question: “What if vSphere had FabricPath support?”

(By the way, if you’re not familiar with the “Thinking Out Loud” posts, the key point is simply to stimulate discussion and encourage knowledge sharing.)

If you’re unfamiliar with FabricPath, you might find my session blog of a FabricPath session at Cisco Live helpful to bring you up to speed. Also, while this post focuses on potential FabricPath integration, I think many of the same points would apply to TRILL and potentially SPB.

When I first started considering this idea, a few things came to mind:

  1. The first thing that came to mind was that the quickest, most “natural” way of bringing FabricPath support to vSphere would be via the Nexus 1000V. While I’m sure the NX-OS code bases of the Nexus 1000V and the Nexus 7000 (where FabricPath is available) differ dramatically, that’s still a shorter path than trying to add FabricPath support directly to vSphere.
  2. The second thought was that, intuitively, FabricPath would bring value to vSphere. After all, FabricPath is about eliminating STP and increasing the effective bandwidth of your network, right?

It’s this second thought I’d like to drill down on in this post.

When I first started considering what benefits, if any, FabricPath support in vSphere might bring, I thought that FabricPath would really bring some value. After all, what does FabricPath bring to the datacenter network? Multiple L2 uplinks between switches, low latency any-to-any switching, and equal cost multipathing, to name a few things. Surely these would be of great benefit to vSphere, wouldn’t they? That’s what I thought…until I created a diagram.

Consider this diagram of a “regular” vSphere network topology:

Non-FP-aware network

This is fairly straightforward stuff, found in many data centers today. What would bringing FabricPath into this network, all the way down to the ESXi hosts, give us? Consider this diagram:

FP-aware network 1

We’ve replaced the upstream switches with FabricPath-aware switches and put in our fictional FP-aware Nexus 1000V, but what does it change? From what I can tell, not much changes. Consider these points:

  • In both cases, each Nexus 1000V has two uplinks and can actively use both of them. The only difference the presence of FabricPath would make, as far as I can tell, is in how the switch selects which uplink to use (see the sketch after this list).
  • In both cases, host-to-host (or VM-to-VM) traffic still has to travel through the upstream switches. The presence of FabricPath awareness on the vSphere hosts doesn’t change this.
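To make that first point a bit more concrete, here’s a minimal Python sketch of the general idea behind hash-based uplink (or ECMP path) selection. It’s purely illustrative; the field choices and the hash are my own assumptions, not the actual Nexus 1000V or FabricPath algorithm.

```python
import hashlib

# Illustrative only: pick an uplink by hashing a few flow fields so that a
# given flow always lands on the same uplink while different flows spread
# across all available uplinks.
def pick_uplink(src_mac, dst_mac, vlan, uplinks):
    key = "{}|{}|{}".format(src_mac, dst_mac, vlan).encode()
    digest = hashlib.sha256(key).digest()
    return uplinks[int.from_bytes(digest[:4], "big") % len(uplinks)]

uplinks = ["uplink0", "uplink1"]
print(pick_uplink("00:50:56:aa:bb:01", "00:50:56:aa:bb:02", 100, uplinks))
print(pick_uplink("00:50:56:aa:bb:03", "00:50:56:aa:bb:02", 100, uplinks))
```

Whether that hash is computed by the virtual switch or by FabricPath ECMP upstream, the end result for the host looks the same: both uplinks carry traffic, and only the selection logic differs.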

That second point, in my mind, deserves some additional thought. FabricPath enables multiple, active L2 links between switches, but in both of the topologies shown above the traffic still has to travel through the upstream switches. In fact, the only way to change those traffic patterns would be to add extra host-to-host links, like this:

FP-aware network 1

OK, if these extra host-to-host links were present, then the presence of FabricPath at the ESXi host layer might make a difference. VM-to-VM traffic could then just hop across without going through the upstream switches. All is good, right?

Not so fast! I’m not a networking guru, but I see some potential problems with this:

  • This approach isn’t scalable. You’d need host-to-host links between every pair of ESXi hosts, which means each of N hosts would need N-1 additional links (see the sketch after this list). That would limit the scalability of “fabric-connected” vSphere hosts, since there just isn’t room to add that many network ports (nor is it very cost effective).
  • Does adding host-to-host links fundamentally change the nature of the virtual switch? The way virtual switches (or edge virtual bridges, if you prefer) operate today is predicated upon certain assumptions; would these assumptions need to change? I’m not sure about this point yet; I’m still processing the possibilities.
  • What does this topology really buy an organization? Most data center switches have pretty low L2 switching latencies (and getting lower all the time). Would a host-to-host link really get us any better performance? I somehow don’t think so.
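To put rough numbers on the scalability point above, here’s a quick back-of-the-envelope sketch (plain Python and simple combinatorics, nothing FabricPath-specific): a full mesh of N hosts needs N-1 extra ports on every host and N(N-1)/2 cables overall.

```python
# Full-mesh host-to-host cabling: N-1 extra ports per host, N*(N-1)/2 links total.
def full_mesh(num_hosts):
    ports_per_host = num_hosts - 1
    total_links = num_hosts * (num_hosts - 1) // 2
    return ports_per_host, total_links

for n in (4, 8, 16, 32):
    ports, links = full_mesh(n)
    print("{:2d} hosts: {:2d} extra ports per host, {:3d} host-to-host links".format(n, ports, links))
```

Even a modest 16-host cluster would need 15 extra ports per host and 120 cables just for the mesh, which is exactly why this approach falls apart.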

In the end, it seems to me that there is (currently) very little value in bringing FabricPath (or TRILL or SPB) support all the way down to the virtualization hosts. However, I’d love to hear what you think. Am I wrong in some of my conclusions? My understanding of FabricPath (and related technologies) is still evolving, so please let me know if something I’ve said is incorrect. Speak up in the comments!


  1. Craig Johnson

    I tend to agree with you – I’m not sure what having FabricPath/TRILL at the host level would buy you. However, the nice thing about FabricPath is that it doesn’t require you to conform to any sort of topology – star, mesh, whatever – FabricPath will shortest-path route it. So you wouldn’t have to scale to every end host if you wanted host-to-host connectivity; it could be done on an arbitrary basis.

    I don’t see how it would change the nature of the vSwitch from a switching/routing perspective. The largest benefit that I see is that you would drastically lower your MAC address counts if you are on a large L2 domain.

    From a software/vSwitch perspective, the CPU penalty of the MAC-in-MAC encapsulation would probably be the biggest impact. That’s all done in hardware at the Nexus 7K level – I wonder how much of a latency hit it would be to do this on the 1000V.

  2. slowe

    Craig, great comments! You’re right about the host-to-host links; you could just add those on an “as needed” basis for your “core” ESXi hosts, almost like creating a core-edge virtualization fabric. (Hmm…that’s an interesting thought.)

    I also agree that the CPU overhead for FabricPath on the ESXi hosts is certainly a major consideration. However, with increasing CPU core counts and increasing CPU efficiencies, this might not be as big of a deal as we initially think. It’s my experience so far that most customers aren’t CPU-bound; they’re memory-bound.
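    Just to put a rough number on the per-frame workload we’re talking about, here’s a quick sketch. The 10 Gbps line rate and the 16-byte encapsulation header are assumptions for illustration (16 bytes is roughly what I understand FabricPath to add), and I’m ignoring preamble and inter-frame gap.

    ```python
    # Back-of-the-envelope: frames per second a host-based encapsulation would
    # have to touch at an assumed 10 Gbps, plus the extra bytes an assumed
    # 16-byte header adds to each frame.
    LINK_GBPS = 10
    ENCAP_HEADER_BYTES = 16  # assumption for illustration

    for frame_bytes in (64, 512, 1500, 9000):
        frames_per_sec = (LINK_GBPS * 1e9 / 8) / frame_bytes
        overhead_pct = 100.0 * ENCAP_HEADER_BYTES / frame_bytes
        print("{:5d}-byte frames: ~{:6.2f}M frames/sec, {:4.1f}% header overhead".format(
            frame_bytes, frames_per_sec / 1e6, overhead_pct))
    ```

    At full-size frames that’s close to a million encapsulations per second per 10 Gb uplink, and far more with small frames, so the per-frame cost matters even if the hosts have CPU to spare.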

    Thanks for taking the time to comment!

  3. Andy Sholomon

    Scott,

    While I agree with your blog/thinking out loud content, I think you need to look a little further “up” for benefits of running FP on the host. Obviously this is mostly a mental exercise at this point, but with FP enabled on the host you can create some benefits for the upstream switched architecture.

    One thing would be the fact that with an FP host you don’t have to trunk every VLAN down to every host. That lowers the number of L2 logical ports that the upstream switch has to support in a PVST environment. This is a good thing because you may have some interesting limitations running MST with a large vSphere environment.

    Having said that, the number of “FP neighbors” your upstream switch would have to support would end up being very high, and I am not sure that is a good thing either, but we are thinking out loud here, right?

    Great and interesting post. My head is now swirling with ideas ;-)

  4. dparrott

    Funny, the first thing I thought of after your article was the return of “Thin Net” and BNCs.

    But what I think would be interesting is some sort of “ring” capability: a 10-40 Gb bus that runs in a serial fashion from host to host. You wouldn’t need as many NICs as servers, but rather a 2- or 4-port adapter wired in series, which VMware could configure as a Host Bus.

    This backplane could then be used to route traffic directly between hosts for things like FT, HA, and vMotion. Network security would dictate whether it could be used for east/west traffic.

    Another thought was about the direction of the push. In some discussions, it seems the network is pushing down and driving virtualization toward a node configuration. Others contend that virtualization is pushing up and consuming the network. It will be interesting, over the next few years, to see how the balance between fabric (converged or not) and virtualization swings and whether it stabilizes.

  5. slowe

    Andy, I can see your concerns about the number of FP neighbors. I don’t suppose that FP topologies would help here, would they? As for decreasing the number of L2 logical ports, that’s something I’ll have to ponder on a bit more…I’m not yet clear on the interaction between FP/TRILL and VLANs.

    By the way, I’m glad I could make your head swirl with ideas…stimulating thought and discussion is always a good thing!

    DParrott, you’re picking up on the idea of a “virtualization fabric,” where Ethernet (via FP or TRILL or some variation) becomes a backplane that vSphere can leverage across multiple hosts. It’s an idea that occurred to me as well, and one that I might explore in a future “Thinking Out Loud” post. :-)

  6. Craig Johnson

    FP/TRILL definitely can enable a kind of fabric (eventually – only 64 switch IDs right now, so that kinda limits us). The logical port count goes down because STP terminates at each FP boundary – there are control plane issues with STP once you get into the tens of thousands of logical ports.

    VLANs are pretty simple in FP – you just decide whether you want a VLAN to use FabricPath or not. If not, it won’t transit the fabric and will be limited to being local.

    Now, admittedly, all this fabric talk could be a nightmare for administrators – it’s hard enough in FP, with all the ECMP and trees and such, to figure out exactly where the traffic is coming from and going to. Adding east-west links between hosts would only complicate that.

  7. Ivan Pepelnjak

    Interesting idea … however, if you do go and expand the vSwitch control plane and introduce additional encapsulation, it would make way more sense to use MAC-over-IP, LISP or MPLS than Fabric Path or TRILL. Remember – whatever you do, as long as you don’t get rid of broadcast and unknown unicast flooding, bridging doesn’t scale.

    Second, you should really talk about TRILL, not Fabric Path ;)

    Last but definitely not least, you don’t just decrease the number of logical L2 ports, you might get rid of them completely. In a FP/TRILL environment, only the edge RBridges need to know about VLANs, the core RBridges don’t really care, they just route TRILL datagrams based on destination RBridge ID.

  8. slowe

    Ivan, thanks for taking the time to comment. I know I should be talking about TRILL, but FP and TRILL are similar enough (to my understanding, at least) that the basic idea is the same. I know there are differences between the two, but I think some of the considerations would be the same.

    As for other encapsulations, I have some additional “Thinking Out Loud” posts in the works… :-)

    Thanks!

  9. Jon -- @the_socialist

    Your powers of future sight are astounding, Scott ;-)

    A few others and I are working on an IETF draft for hypervisor participation in a TRILL fabric. Not a full RBridge, but maybe a vrBridge of sorts.

    Happy to discuss in person at VMworld if we run into each other.

    Oh, and as a side note on the different TRILL implementations:
    FabricPath is TRILL without the TRILL egress frame format, with additional features added on top. Brocade’s VCS (my day job) uses the TRILL frame format, etc., but uses FSPF instead of IS-IS in the control plane and also bolts additional features on top. Now that the TRILL base protocol is published, hopefully we’ll see VCS and FabricPath move closer to the standard for eventual interoperability. Fulcrum, Marvell & Broadcom are now shipping or will be shipping TRILL-aware chips. Hopefully in 10 years STP will be a topic of compsci history ;-)

  10. Michal

    Hi Scott, good to read some interesting explorations around this topic.
    I was also thinking recently about why the 1000V doesn’t directly have any MAC-in-IP type features such as OTV (probably because of the performance degradation). I think it would be a nice option for VLAN extension to be controlled solely at the v-switch (v-router) level, and maybe to bring back fully IP-routed LANs as an alternative LAN architecture.

    Nice blog!
    cheers,
    michal

  11. Chris

    What if the topology wasn’t physically separate ESX hosts but a blade chassis? Would using the backplane of the chassis as an interconnect work? If not, what about blade switches? Could utilizing those ports for cross-chassis or blade-to-blade connectivity work? Just “thinking out loud” ;)

  12. slowe

    Chris, I don’t think that using blades in a blade chassis would really make any difference. Many blade chassis systems have internal “cross links” that almost make it a full mesh for blades in the chassis to talk to other blades in the same chassis. It seems to me—although I’ll be the first to admit I could be wrong—that FabricPath/TRILL wouldn’t really buy you much in that situation, either.

  13. Alex

    Without FP on the virtual switch, you need to map virtual Ethernet ports to VLANs, which are then mapped to FTags on the ingress FP switch. If the virtual switch were FP-aware, it could map the virtual Ethernet ports of VMs directly to FTags. This would definitely reduce some of the network management complexity in FP-only virtualization environments; you can imagine the mess of having both FP and non-FP traffic going through a virtual switch. But this is just a small gain for all the trouble of implementing FP in the virtual switch. The real benefit would be if you could transport FP traffic over L3 IP links; that way you could have multiple DCs connected via L3 links and move VMs from one to the other transparently. That would be great (just thinking aloud). That would be possible only if FP used IP encapsulation like VXLAN.

  14. Guillaume BARROT

    By connecting the ESXi hosts to each other in a sort of loop topology, you’ve just reinvented the switch stack (the 2960S FlexStack is exactly that, in a proprietary way).

    As for the single control point, you already have the dVS and/or Nexus 1000V, so that’s a stack of Nexus 1000Vs – in an SDN way, because your control plane is outside the data plane.

    Imagine combining this with an “L2 anycast” sort of thing, where every L2/L3 switch can be the gateway (same MAC address), and we’d have an even better stack.

    I agree with Ivan on the unicast flooding and broadcast issue, but I don’t see any good solution (maybe using a light TRILL daemon in the kernel of every host as a way to send hello messages in CLNS and kill off ARP/broadcast messages, but that’s sci-fi :D).
