Building large-scale L2 networks, including stretched L2 networks, seems to be all the rage these days, driven in part by virtual machine mobility (aka vMotion in VMware vSphere environments or XenMotion in Citrix XenServer environments). While this isn’t always a good idea—some might say it’s never a good idea—it is still something that many organizations are evaluating.
With the announcement of VXLAN at VMworld 2011, a new question seems to have arisen: can I use VXLAN instead of (insert some other protocol here) to create my stretched L2 networks? In this post, I’d like to compare the use of VXLAN with OTV (Overlay Transport Virtualization) for that very purpose. Of course, since VXLAN hasn’t actually been released, the discussion is partially theoretical.
My primary focus in this post will be how each of these protocols handles traffic patterns in the course of addressing the need for L2 connectivity over routed L3 networks.
First, let’s look at VXLAN. The figure below is taken from my revised L3 connectivity with VXLAN post, which I encourage you to read for more details.

As you can see, once a VM inside a VXLAN segment is migrated to a new network, the traffic “trombones” back and forth across the VXLAN segment because all routed traffic has to pass through a single vShield Edge (VSE) instance. This brings up a key limitation of VXLAN that I think is important to point out: VXLAN has an innate dependency on VSE, and VSE cannot be made redundant. That’s right: there is no VSE-specific failover functionality; instead, you have to rely on vSphere HA, VM Monitoring, and other features. That means failover times measured in minutes, not seconds. What do you think that will do to network connections?
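To make the tromboning concrete, here is a minimal Python sketch that counts how many times a routed flow crosses the data center interconnect (DCI). The site names, hop names, and the scenario itself (two VMs on different VXLAN segments, both now at site B, with the only VSE still at site A) are hypothetical illustrations, not taken from the figure.

```python
def dci_crossings(path):
    """Count how many hops in a forwarding path move between sites.

    `path` is a list of (hop_name, site) tuples in forwarding order; each
    change of site between consecutive hops is one trip across the DCI.
    """
    return sum(
        1 for (_, site_a), (_, site_b) in zip(path, path[1:]) if site_a != site_b
    )


# Hypothetical scenario: vm1 (segment X) has been migrated to site B and
# talks to vm2 (segment Y), also at site B. Routing between the segments
# has to go through the single VSE instance, which still lives at site A.
via_single_vse = [("vm1", "B"), ("vse", "A"), ("vm2", "B")]

# For comparison: the same flow if a gateway existed locally at site B.
via_local_gateway = [("vm1", "B"), ("gw-b", "B"), ("vm2", "B")]

if __name__ == "__main__":
    print("one-way DCI crossings via VSE at site A:", dci_crossings(via_single_vse))
    print("one-way DCI crossings via local gateway:", dci_crossings(via_local_gateway))
```

Two VMs sitting side by side at site B still cross the interconnect twice per direction, and the reply takes the same detour, so each request/response pair crosses it four times.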
Now, let’s compare VXLAN’s L3 connectivity with OTV. First, here’s a diagram to show connectivity with OTV before a VM is migrated to the second site:

No real surprises here. I’ll just point out that a typical OTV deployment following “recommended practices” will use redundant Nexus 7000 switches, as shown in the diagram. That’s a key advantage OTV has over VXLAN: redundancy is easily built into the solution, with failover times measured in seconds (or better).
Now, take a look at the post-migration traffic flows with OTV:

In case you didn’t notice, there is no traffic tromboning here. Here’s how it’s accomplished (as documented in this blog post by Ron Fuller, aka @ccie5851, or VDCBadger to his friends):
- Each Nexus 7000 pair runs HSRP.
- The HSRP hello packets are filtered (blocked) from the OTV interfaces. This keeps the HSRP pairs in each data center from knowing about the pair in the other data center.
- Each HSRP pair uses the same virtual IP address (the default gateway for the 10.1.1.0/24 subnet).
In this configuration, once the VM migrates to the second site, the HSRP pair at the second site won’t need to send traffic across the OTV link to reach the migrated VM. This appears to be a significant advantage for OTV: greater knowledge of the routing topology allows OTV to be more intelligent about how traffic is directed across and around the network.
<aside>Of course, this doesn’t address L3 routing concerns from subnets not directly attached to the Nexus 7000 pairs. For that, we’d need something like LISP.</aside>
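The HSRP arrangement described above can be modeled in a few lines. Below is a minimal Python sketch, assuming a simplified election in which the router with the highest priority becomes active and ties go to the highest interface IP address; it deliberately ignores standby state, preemption, timers, and virtual MAC handling. The router names, priorities, and addresses are hypothetical. The only thing the hello filter changes in this model is which routers can hear each other.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    name: str
    site: str
    priority: int  # HSRP priority; higher wins
    ip: str        # real interface IP; breaks ties between equal priorities


def elect_active(routers, filter_hellos_on_otv):
    """Return {site: name of the active router serving that site}.

    With hellos filtered on the OTV interfaces, each router only hears
    peers in its own site, so every site runs an independent election.
    Without filtering, all routers share a single election domain.
    """
    if filter_hellos_on_otv:
        domains = {}
        for r in routers:
            domains.setdefault(r.site, []).append(r)
        election_domains = list(domains.values())
    else:
        election_domains = [list(routers)]

    active_by_site = {}
    for domain in election_domains:
        # Highest priority wins; highest IP address breaks ties.
        winner = max(domain, key=lambda r: (r.priority, ipaddress.ip_address(r.ip)))
        for r in domain:
            active_by_site[r.site] = winner.name
    return active_by_site


# Hypothetical Nexus 7000 pairs, both serving virtual IP 10.1.1.1.
ROUTERS = [
    Router("n7k-a1", "A", 110, "10.1.1.2"),
    Router("n7k-a2", "A", 100, "10.1.1.3"),
    Router("n7k-b1", "B", 110, "10.1.1.4"),
    Router("n7k-b2", "B", 100, "10.1.1.5"),
]

if __name__ == "__main__":
    print(elect_active(ROUTERS, filter_hellos_on_otv=True))
    print(elect_active(ROUTERS, filter_hellos_on_otv=False))
```

With filtering, each site ends up with its own active gateway (n7k-a1 and n7k-b1). Without it, a single router (n7k-b1 in this example) is active for both sites, so hosts at site A would have their gateway traffic carried across the OTV link.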
As I see it—and networking experts are welcome to jump in if I’m mistaken—this gives OTV two key advantages over VXLAN:
- OTV, because it is running on physical networking equipment, is more intelligent than VXLAN about how traffic is directed/routed in/around/across a network. The resulting reduction in “traffic tromboning” can lead to more efficient use of a data center interconnect.
- OTV, because it is running on physical networking equipment, can provide better redundancy and faster failover than VXLAN (which relies on single instances of VSE).
It’s entirely possible that these advantages of OTV will be nullified if VXLAN ever makes it into physical network equipment.
It’s also important to point out that while OTV and VXLAN overlap somewhat in functionality, they are partly aimed at solving different problems. While both protocols address L2 connectivity across L3 networks, VXLAN also addresses the exhaustion of the VLAN ID space in larger networks (especially service provider networks), an issue OTV does not try to address. However, it seems to me that OTV would co-exist better with a solution like Q-in-Q, which could (as far as I can tell) address the VLAN ID exhaustion issue.
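The scale of the VLAN ID problem is easy to quantify. Here is a short Python sketch of the arithmetic, assuming the 12-bit 802.1Q VLAN ID field (with IDs 0 and 4095 reserved), the 24-bit segment ID described in the VXLAN draft, and Q-in-Q stacking two 12-bit tags. The Q-in-Q figure counts every outer/inner combination, which is a theoretical upper bound rather than what any given deployment would actually use.

```python
VLAN_ID_BITS = 12    # 802.1Q VLAN ID field
VXLAN_VNI_BITS = 24  # VXLAN segment ID (VNI) field


def usable_vlan_ids():
    """802.1Q VLAN IDs available for use (IDs 0 and 4095 are reserved)."""
    return 2**VLAN_ID_BITS - 2


def vxlan_segment_ids():
    """Size of the 24-bit VXLAN segment ID space."""
    return 2**VXLAN_VNI_BITS


def qinq_tag_pairs():
    """Outer (S-tag) x inner (C-tag) combinations with Q-in-Q."""
    return usable_vlan_ids() ** 2


if __name__ == "__main__":
    print(f"802.1Q VLANs:     {usable_vlan_ids():,}")
    print(f"VXLAN segments:   {vxlan_segment_ids():,}")
    print(f"Q-in-Q tag pairs: {qinq_tag_pairs():,}")
```

Both approaches land at roughly 16 million identifiers; the difference is that VXLAN gets there with a single identifier carried over an L3 network, whereas Q-in-Q stacks two L2 tags.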
Once again, I encourage network experts to chime in and share their views. If I’ve misstated something, please let me know. Questions, thoughts, and comments are always welcome.
Tags: Networking, OTV, Virtualization, VXLAN
-
I think OTV and VXLAN have different uses. OTV helps extend L2 between data centers, and VXLAN extends L2 between VM hosts.
-
Hi Scott,
An interesting thought came to mind after reading this.
If you decide to implement VXLAN, you preclude yourself from stretching the L2 segments within VXLAN to another DC using a physical network technology like OTV or VPLS, because there is no bridging mechanism available yet between VXLAN and a physical LAN. Those who choose to implement VXLAN will have the app stack active in one data center and use the more proven and robust LB + GSLB method for failing applications over to another data center.
Why on earth would anyone have an app stack fragmented between two data centers, facilitated by vmotion, and all the L2 interconnect complexities that come with it? That’s a rhetorical question btw, you don’t need to answer that.
Cheers,
Brad
(Dell Force10)
-
Hi Scott,
I don’t really think your comparison is accurate. You are mixing the features of a product available today (VSE) with the unknown future of an unreleased technology (VXLAN), and basing your architectural comparison on how the two will integrate. Unless you are basing the comparison on the capabilities VSE will have once it is integrated with VXLAN (which are publicly unknown at this time), the VXLAN/OTV comparison is null and void.
-
Great info.
I would be interested in adding open standards such as VPLS or the upcoming E-VPN (formerly MAC-VPN) to this comparison. I am never all that thrilled about having to deploy proprietary standards, especially ones that seem to be supported only on certain models/cards in a vendor’s portfolio.
Plus, MPLS-based technology provides TE and sub-second failover, which I believe are very nice features.
-
Is VSE really a requirement for VXLAN, or can you put any multihomed VM with routing functionality in its place?
-
Scott, I won’t embark on a “what co-exists better with what” discussion, but I believe one of the limitations of QinQ is that it doesn’t support duplicated MAC addresses… which may be a problem in particular environments where multiple layer 2 segments are created to clone workloads in a test/dev scenario.
Happy to be corrected if that is not the case but that’s what I remember off the top of my head.
Cheers.
Massimo.
-
Scott,
Nice write-up, as always. I completely get the technical viewpoint; however, I wonder what the licencing costs for both products would look like.
According to http://etherealmind.com/nexus-7000-discount-otv-license-nxos/ the cost per N7K chassis would be ~USD 40,000, for a total of USD 160,000 across four chassis.
Could this be a deciding factor for smaller orgs considering stretched clusters?
Andre
-
Keep in mind that VXLAN is VERY early.
And while I agree OTV is more mature, for many it’s dead on arrival since being pulled from the IETF standards process in early 2011 (personally, I’m very bummed by this).
And while VXLAN has no charter yet (or didn’t, last I checked), they are at least trying to create a standards-based solution in the IETF.
I’ll always choose a standards-based A- solution over a proprietary A+ solution.
Jon
@the_socialist
(Brocade pays my bills, they do not however practice mind control)



