Examining VXLAN

It’s taken me far too long to write this post, that’s for sure. Since the announcement of VXLAN at VMworld earlier in the year, I’ve been searching for additional information on these questions: “What is VXLAN? How does it fit into the broader networking landscape? Why did we need a new standard?” I talked to Cisco, I attended a VMworld session about networking futures, I talked to some of the authors of the IETF draft on VXLAN, I read (most of) the VXLAN draft, and I studied some existing protocols that one might think could have been put to use. I think I’m finally ready to try to address these questions.

What is VXLAN?

The answer to this question is taken directly from the IETF draft (the emphasis is mine):

This document describes Virtual eXtensible Local Area Network (VXLAN), which is used to address the need for overlay networks within virtualized data centers accommodating multiple tenants.

I think it’s important to keep this purpose in mind. While it’s a bit simplistic to state it this way, VXLAN is—essentially—a proposed standards-based replacement for the proprietary MAC-in-MAC encapsulation that is currently used in vCloud Director. Instead of using MAC-in-MAC encapsulation, VXLAN uses MAC-in-IP encapsulation, with multicast groups to handle MAC learning and unique UDP source ports to help with load balancing across multiple links. Yes, that’s a bit of a simplification, but I think it gets the main point across.
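To make the encapsulation a bit more concrete, here's a minimal sketch of the 8-byte VXLAN header that gets prepended to the inner Ethernet frame. The field layout (a flags byte with the "I" bit set, a 24-bit VNI, and reserved bytes) comes from the VXLAN draft; the function names and the example VNI value are my own, and note that the well-known UDP destination port was assigned by IANA as 4789 (some early implementations used 8472).

```python
import struct

VXLAN_UDP_PORT = 4789   # IANA-assigned destination port (early implementations used 8472)
I_FLAG = 0x08           # flags byte: this bit is set when the VNI field is valid

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header prepended to the inner Ethernet frame."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Layout: 1 byte flags, 3 reserved bytes, 24-bit VNI, 1 reserved byte.
    return struct.pack("!B3xI", I_FLAG, vni << 8)

def vxlan_vni(header: bytes) -> int:
    """Recover the VNI: the top 24 bits of the header's last 4 bytes."""
    return struct.unpack("!I", header[4:8])[0] >> 8

hdr = vxlan_header(5000)
assert len(hdr) == 8 and vxlan_vni(hdr) == 5000
```

The outer UDP *source* port, by contrast, is typically derived from a hash of the inner frame's headers, which is what lets ordinary ECMP in the physical network spread VXLAN traffic across multiple links.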

How does VXLAN fit into the broader networking landscape?

Trying to answer this question is what has occupied the majority of the time it’s taken to write this post. You can’t explain how VXLAN fits into the broader networking landscape without at least a minimal understanding of what the rest of the networking landscape looks like. I had to dig a bit deeper into MPLS, OTV, FabricPath/TRILL, and other standards/emerging standards. I’m sure that I’ve still omitted some technologies that should have been included, and I know that there is still so much more to learn about the technologies I did include.

Based on the information I was able to gather, the answer to this second question really builds on the answer to the first question. VXLAN only really addresses a few fundamental concerns:

  • A shortage of VLAN address space (the 12-bit VLAN ID imposes a theoretical limit of 4094 usable VLANs, and many switches support fewer than that)
  • An inability to support multi-tenancy (both from a scale perspective as well as a separation perspective)
  • Problems with Layer 2 connectivity across disparate virtual data centers

VXLAN addresses these concerns in this way:

  • It adds a 24-bit VXLAN Network Identifier (VNI), expanding the realm of potentially unique identifiers to just shy of 17 million (2^24, or 16,777,216). This addresses the scale-based concerns of multi-tenancy.
  • It wraps Layer 2 frames in Layer 3 packets. This addresses the other part of any multitenancy concerns (VXLAN hides duplicate MAC addresses, duplicate IP addresses, and duplicate VLAN IDs found in separate VNIs). This also addresses the Layer 2 connectivity issues between disparate virtual data centers.
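The scale jump in the first bullet is easy to check for yourself: the 802.1Q VLAN ID is 12 bits (with two values reserved), while the VNI is 24 bits. A quick back-of-the-envelope calculation:

```python
usable_vlans = 2 ** 12 - 2   # VLAN IDs 0 and 4095 are reserved, leaving 4094 usable
vni_count = 2 ** 24          # 24-bit VXLAN Network Identifier

print(usable_vlans)               # 4094
print(vni_count)                  # 16777216
print(vni_count // usable_vlans)  # 4098 -- roughly 4,100x more segments
```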

And that’s really about it. It doesn’t address Layer 2 multipathing/STP, it doesn’t address Layer 2 connectivity in the physical world (Layer 2 connectivity is only preserved at the virtualization level), and it doesn’t address Layer 3 routing issues created by stretched VLANs and VM mobility designs. Which brings us to our third question…

Why did we need a new standard?

This answer builds on the previous two answers. Once you have a clear understanding of what VXLAN was designed to do, and how VXLAN fits into the rest of the networking protocols, then this answer is pretty easy:

  • If you’ve been reading my articles, you know already that VXLAN doesn’t preserve all forms of Layer 2 connectivity. Because it doesn’t, you still need protocols like OTV to address Layer 2/3 connectivity at the physical level.
  • Because you still need protocols like OTV to achieve VM mobility (for the time being, at least), you’re still going to need protocols like LISP to fix funny routing issues being caused by IP addresses from the same subnet existing in multiple locations at the same time.
  • Because VXLAN doesn’t address Layer 2 multipathing concerns, you still need protocols like TRILL and technologies like FabricPath.
  • Because using MPLS—which, by the way, would also address the 3 concerns VXLAN addresses—would require MPLS-enabled/MPLS-aware equipment throughout the data center, that would make an MPLS-based solution difficult for many enterprises to adopt. Using an IP encapsulation scheme means that existing physical networking equipment doesn’t have to change. (Although it might change—to add VXLAN support—at some point in the future.)

I was not a fan of VMware (apparently) driving the creation of an entirely new networking standard. However, as I dug into this, I began to see that while other solutions almost addressed these concerns, none of them were a really good fit. Yes, using MPLS probably would have worked. Using GRE might have worked (take NVGRE, for example, but that’s also a proposed new protocol). To really address the concerns head-on, though, required a solution created expressly for that purpose, and that’s VXLAN. It’s important, though, to really understand what VXLAN is as well as what VXLAN isn’t. Otherwise, you’ll find yourself trying to fit VXLAN to a problem it really wasn’t intended to solve.

Comments, corrections, and clarifications are always welcome!


6 comments

  1. Chris

    Scott,

    Thanks for taking the time to build the background information. All of these competing/overlapping/gap-filling solutions are making our to-do reading lists longer and longer.

    -chris

  2. amit

    not sure why you are concerned about multipathing with vxlan. since vxlan relies on IP routing, ecmp on routing protocols should be sufficient for multipathing and loop detection.

  3. Joe Botchagaloop

    Think, McFly, think! (teasing you) The MAC-in-IP encap is taking place in the hypervisor. You can move the L3 boundary down to the top-of-rack/access layer, so you don’t need to worry about TRILL or FP…that’s one of the biggest benefits of VXLAN.

  4. TJ

    Very nice write-up, thanks!

    One question: Any IPv6 support (MAC-in-IPv6/UDP)?

    For VLAN stretching / migration of hosts across physical sites, I wonder …
    OTTOMH, I wonder, would (virtual?) routers help here?
    e.g. – 2 routers at each of 2 physical instances of the VLAN (2 local for local redundancy and a pair of those at each site), with all of them being “active-active” and participating in local routing protocols at each site. If traffic goes to a router not at the local site, that would still be OK – it is in the “right VLAN” and could talk … am I right?

    /TJ

  5. Pawel

    Thanks for this article – a good start for me to learn about VXLAN.

    Pawel
    pawellakomski.pl

  6. Reshad Rahman

    Hi Scott,

    Thanks for the write-up. I know I’m 1 year late :-)

    Can you explain how/why you’d use VXLAN with TRILL? As another reader commented, since the L2 frames are encapsulated with an L3 header, wouldn’t we benefit from L3 ECMP?

    Regards,
    Reshad.
