OTV and VXLAN Layer 3 Connectivity Compared

Building large-scale L2 networks, including stretched L2 networks, seems to be all the rage these days, driven in part by virtual machine mobility (aka vMotion in VMware vSphere environments or XenMotion in Citrix XenServer environments). While this isn’t always a good idea—some might say it’s never a good idea—it is still something that many organizations are evaluating.

With the announcement of VXLAN at VMworld 2011, a new question seems to have arisen: can I use VXLAN instead of (insert some other protocol here) to create my stretched L2 networks? In this post, I’d like to compare the use of VXLAN with OTV (Overlay Transport Virtualization) for that very purpose. Of course, since VXLAN hasn’t actually been released, the discussion is partially theoretical.

My primary focus in this post will be how each of these protocols handles traffic patterns in the course of addressing the need for L2 connectivity over routed L3 networks.

First, let’s look at VXLAN. The figure below is taken from my revised post on L3 connectivity with VXLAN, which I encourage you to read for more details.

As you can see, once a VM inside a VXLAN segment is migrated to a host on a different L3 network, traffic “trombones” back and forth across the VXLAN segment, because all routed traffic has to pass through a single vShield Edge (VSE) instance. This brings up a key limitation of VXLAN that I think is important to point out: VXLAN has an innate dependency on VSE for L3 connectivity, and VSE cannot be made redundant. That’s right: there is no VSE-specific failover functionality; instead, you have to rely on vSphere HA, VM Monitoring, and other features. That means failover times in the minutes, not seconds. What do you think that will do to network connections?
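
To make the tromboning concrete, here is a minimal Python sketch that counts how many times a routed flow crosses the data center interconnect. It assumes a hypothetical two-site topology (sites “A” and “B”) in which the only VSE sits in site A, while the migrated VM talks to a VM on another segment that also lives in site B. The names `Hop`, `routed_path`, and `dci_crossings` are purely illustrative, and the model ignores everything except which site each hop is in.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """A device on the forwarding path and the data center ("site") it lives in."""
    name: str
    site: str


def routed_path(src: Hop, gateway: Hop, dst: Hop) -> list[Hop]:
    # Traffic between subnets always goes source -> default gateway -> destination.
    return [src, gateway, dst]


def dci_crossings(path: list[Hop]) -> int:
    # Each pair of consecutive hops in different sites is one trip across the interconnect.
    return sum(1 for a, b in zip(path, path[1:]) if a.site != b.site)


# Hypothetical topology: the only VSE lives in site A; both endpoints now live in site B.
migrated_vm = Hop("migrated VM", "B")
peer_vm = Hop("VM on another VXLAN segment", "B")
single_vse = Hop("vShield Edge", "A")
local_gateway = Hop("gateway in site B", "B")

print(dci_crossings(routed_path(migrated_vm, single_vse, peer_vm)))     # 2 (B -> A -> B)
print(dci_crossings(routed_path(migrated_vm, local_gateway, peer_vm)))  # 0
```

Because every inter-segment packet must visit the single VSE, the flow crosses the interconnect twice even though both endpoints sit in the same site; a gateway local to site B would keep the flow off the interconnect entirely.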

Now, let’s compare VXLAN’s L3 connectivity with OTV. First, here’s a diagram to show connectivity with OTV before a VM is migrated to the second site:

No real surprises here. Note that a typical OTV deployment following “recommended practices” will use redundant Nexus 7000 switches, as shown here. That’s a key advantage OTV has over VXLAN: redundancy is easily built into the solution, with failover times in the seconds (or better).

Now, take a look at the post-migration traffic flows with OTV:

In case you didn’t notice, there is no traffic tromboning here. Here’s how it’s accomplished (as documented in this blog post by Ron Fuller, aka @ccie5851 or VDCBadger to his friends):

  • Each Nexus 7000 pair runs HSRP.
  • The HSRP hello packets are filtered (blocked) from the OTV interfaces. This keeps the HSRP pairs in each data center from knowing about the pair in the other data center.
  • Each HSRP pair uses the same virtual IP address (the default gateway for the 10.1.1.0/24 subnet).

In this configuration, once the VM migrates to the second site, its default gateway (the shared virtual IP) is answered by the HSRP pair at the second site, so that pair won’t need to send traffic across the OTV link to reach the migrated VM. This appears to be a significant advantage for OTV: greater knowledge of the routing topology allows OTV to be more intelligent about how traffic is directed across and around the network.
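
The HSRP steps listed above can be modelled in a few lines of Python. This is a simplified sketch, not NX-OS behavior: the router names, priorities, virtual IP, and the `filter_hellos_on_otv` switch are all hypothetical, the election simply picks the highest priority, and ARP and virtual-MAC handling across OTV are left out. It shows why blocking the hellos matters: with the filter in place, each site elects its own active router for the shared virtual IP, so a VM in site B gets a gateway in site B.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical shared default gateway for the 10.1.1.0/24 subnet.
VIRTUAL_IP = "10.1.1.1"


@dataclass(frozen=True)
class Router:
    name: str
    site: str
    priority: int


# Hypothetical Nexus 7000 pairs, one pair per data center, all configured with VIRTUAL_IP.
ROUTERS = [
    Router("n7k-a1", "A", 110),
    Router("n7k-a2", "A", 100),
    Router("n7k-b1", "B", 105),
    Router("n7k-b2", "B", 95),
]


def active_gateway(vm_site: str, routers: list[Router], filter_hellos_on_otv: bool) -> Router:
    """Return the router that answers for VIRTUAL_IP from the point of view of a VM in vm_site."""
    if filter_hellos_on_otv:
        # Hellos never cross OTV, so each site's pair only sees itself and elects its own active router.
        candidates = [r for r in routers if r.site == vm_site]
    else:
        # Hellos are bridged across OTV, so all four routers form a single HSRP group.
        candidates = list(routers)
    # Simplification: highest priority wins (real HSRP breaks ties on the higher interface IP).
    return max(candidates, key=lambda r: r.priority)


for filtered in (False, True):
    gw = active_gateway("B", ROUTERS, filter_hellos_on_otv=filtered)
    print(f"hellos filtered={filtered}: VM in site B reaches {VIRTUAL_IP} via {gw.name} in site {gw.site}")
```

Without the filter, the single stretched HSRP group elects one active router (here in site A), and the migrated VM’s gateway traffic crosses the interconnect; with the filter, the same virtual IP resolves to a router in the VM’s own site.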

<aside>Of course, this doesn’t address L3 routing concerns from subnets not directly attached to the Nexus 7000 pairs. For that, we’d need something like LISP.</aside>

As I see it—and networking experts are welcome to jump in if I’m mistaken—this gives OTV two key advantages over VXLAN:

  1. OTV, because it runs on physical networking equipment, is more intelligent than VXLAN about how traffic is directed and routed in, around, and across a network. Reduced “traffic tromboning” can mean more efficient use of the data center interconnect.
  2. OTV, because it runs on physical networking equipment, can provide better redundancy and faster failover than VXLAN (which relies on a single VSE instance).

If VXLAN ever makes it into physical network equipment, it’s entirely possible that these advantages of OTV will be nullified.

It’s also important to point out that while OTV and VXLAN overlap in some functionality, they are partly aimed at solving different problems. Both protocols address L2 connectivity across L3 networks, but VXLAN also addresses exhaustion of the VLAN ID space in larger networks (especially service provider networks), an issue OTV does not try to address. However, it seems to me that OTV would co-exist better with a solution like Q-in-Q, which could (as far as I can tell) address the VLAN ID exhaustion issue.
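
The address-space difference comes down to simple arithmetic. The sketch below assumes the standard field sizes: a 12-bit 802.1Q VLAN ID (IDs 0 and 4095 are reserved) and the 24-bit VXLAN Network Identifier (VNI) described in the VXLAN draft (the same header layout was later published as RFC 7348). The Q-in-Q figure is only the count of outer/inner tag combinations, and `vxlan_header` is a hypothetical helper that packs the 8-byte VXLAN header for illustration, not code from any VXLAN implementation.

```python
import struct

VLAN_ID_BITS = 12  # 802.1Q VLAN ID field
VNI_BITS = 24      # VXLAN Network Identifier field

usable_vlans = 2**VLAN_ID_BITS - 2   # IDs 0 and 4095 are reserved
vxlan_segments = 2**VNI_BITS         # every 24-bit value is a distinct segment ID
qinq_pairs = usable_vlans ** 2       # outer (S-tag) x inner (C-tag) combinations


def vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header: flags (I bit = 0x08), 24 reserved bits, 24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < 2**VNI_BITS:
        raise ValueError("VNI must fit in 24 bits")
    # "!B3xI": one flags byte, three zero pad bytes, then a 32-bit word holding VNI << 8.
    return struct.pack("!B3xI", 0x08, vni << 8)


print(usable_vlans, vxlan_segments, qinq_pairs)  # 4094 16777216 16760836
print(vxlan_header(5000).hex())                  # 0800000000138800
```

Q-in-Q reaches a similar order of magnitude by stacking two 12-bit tags, which is why it looks like a plausible companion to OTV for the VLAN ID problem; the count says nothing about other differences, such as how each approach handles duplicate MAC addresses.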

Once again, I encourage network experts to chime in and share their views. If I’ve misstated something, please let me know. Questions, thoughts, and comments are always welcome.

  1. Antonio

    I think OTV and VXLAN have different uses. OTV helps to extend L2 between data centers, and VXLAN extends L2 between VM hosts.

  2. Brad Hedlund

    Hi Scott,
    An interesting thought came to mind after reading this:
    If you decide to implement VXLAN, you preclude yourself from stretching the L2 segments within VXLAN to another DC using a physical network technology like OTV or VPLS, because there is no bridging mechanism available yet between VXLAN and a physical LAN.

    Those who choose to implement VXLAN will have the app stack active in one data center, and use the more proven and robust LB + GSLB method for failing applications over to another data center.

    Why on earth would anyone have an app stack fragmented between two data centers, facilitated by vmotion, and all the L2 interconnect complexities that come with it? That’s a rhetorical question btw, you don’t need to answer that. :-)

    Cheers,
    Brad
    (Dell Force10)

  3. Wade Holmes

    Hi Scott,

    I don’t really think your comparison is accurate, as you are mixing the features of a product available today (VSE) with the unknown future of an unreleased technology (VXLAN), and using that combination as the basis for an architectural comparison with a released product. Unless you are basing the comparison on the capabilities VSE will have when integrated with VXLAN in the future (which are publicly unknown at this time), the VXLAN/OTV comparison is null and void.

  4. slowe

    Antonio, as I pointed out in the post, I recognize that VXLAN and OTV cannot be directly compared, since there are things that VXLAN was designed to do (like extend the VLAN address space) that OTV was not designed to do.

    Brad, you are—of course—correct. Since there is not (yet) termination of VXLAN segments on physical devices, the use of VXLAN and OTV/VPLS simultaneously for the same VLANs isn’t something organizations will be able to accomplish.

    Wade, I respectfully disagree. You’ll note in the article that I explicitly stated this discussion is partially theoretical since VXLAN has not yet been released. That being said, the mechanics of how we expect VXLAN to work are not likely to change between now and actual release, and therefore this discussion *is* valid. Since VMware has not seen fit to discuss any potential future features of VSE (such as redundancy), that information cannot be incorporated into this discussion. When VXLAN is finally released, we can revisit this discussion and see how the situation has changed at that time. Until then, or until more information is available, customers still need to make decisions about technology choices, and discussions like this are necessary. Thanks!

  5. dj

    Great info.

    I would be interested in adding open standards such as VPLS or the upcoming E-VPN (formerly MAC-VPN) to this comparison. I am never really all that thrilled about having to deploy proprietary technologies, especially ones that seem to be supported only on certain models/cards in a vendor’s portfolio.

    Plus, MPLS-based technology provides TE and sub-second failover, which I believe are very nice features.

  6. Ryan B

    Is VSE really a requirement for VXLAN, or can you put any multihomed VM with routing functionality in its place?

  7. Wade Holmes

    I agree that these discussions, and bringing to light technology and design considerations that should be made when evaluating VXLAN are healthy. The only part I had issue with is coming to a conclusion based on theoretical information. Until theory is made practice, no real conclusion can be made.

  8. Duncan

    Nice article Scott. VMware is working on addressing some of the concerns mentioned in your article. I would suggest you communicate these to the appropriate people within our organization to ensure they are aware and can be correctly prioritized.

    Thanks,

  9. slowe

    Excellent. Duncan, it’s good to know that VMware is working to address some of the concerns I mentioned in my article. If you would be so kind as to introduce me to the appropriate contacts within VMware, I’d be happy to discuss these issues with them. Thanks!

  10. Massimo

    Scott, I won’t embark on a “what co-exists better with what” discussion, but I believe one of the limitations of QinQ is that it doesn’t support duplicate MAC addresses… which may be a problem in particular environments where multiple layer 2 segments are created to clone workloads in a test/dev scenario.

    Happy to be corrected if that is not the case but that’s what I remember off the top of my head.

    Cheers.

    Massimo.

  11. slowe

    Massimo, I believe you are correct in that Q-in-Q would not address duplicate MAC addresses—an issue that VXLAN *will* address, if I’m not mistaken. Hence my statement that while there are differences between these protocols, those differences are (in part, at least) due to the fact that they strive to address different sets of problems. Thanks for your comment!

  12. Andre Leibovici

    Scott,

    Nice write-up, as always. I completely get the technical viewpoint; however, I wonder what the licencing costs for both products would look like.

    According to http://etherealmind.com/nexus-7000-discount-otv-license-nxos/ the cost per N7K chassis would be ~USD $40,000, for a total of USD $160,000 across four chassis.

    Could this be a decision point for smaller orgs looking to adopt stretched clusters?

    Andre

  13. slowe

    Andre, there is no question that licensing and acquisition costs will be a factor to consider. Thanks for pointing that out!

  14. Jon Hudson

    Keep in mind that VXLAN is VERY early.

    And while I agree OTV is more mature, for many it’s dead on arrival since it was pulled from the IETF standards process in early 2011 (personally, I’m very bummed by this).

    And while VXLAN has no charter yet (or hadn’t, last I checked), they are at least trying to create a standards-based solution in the IETF.

    I’ll always choose a standards-based A- solution over a proprietary A+ solution.

    Jon

    @the_socialist
    (Brocade pays my bills, they do not however practice mind control)