OTV


I like to spend time examining the areas where different groups of technologies intersect. Personally, I find this activity fascinating, and perhaps that’s the reason that I find myself pursuing knowledge and experience in virtualization, networking, storage, and other areas simultaneously; it’s an effort to spend more time “on the border” between various technologies.

One border, in particular, is very interesting to me: the border between virtualization and networking. Time spent thinking about the border between networking and virtualization is what has generated posts like this one, this one, or this one. Because I’m not a networking expert (yet), most of the stuff I generate is junk, but at least it keeps me entertained—and it occasionally prods the Really Smart Guys (RSGs) to post something far more intelligent than anything I can create.

Anyway, I’ve been thinking more about some of these networking-virtualization chimeras, and I thought it might be interesting to talk about them, if for no other reason than to encourage the RSGs to correct me and help everyone understand a little better.

<aside>A chimera, by the way, was a mythological fire-breathing creature that was part lion, part goat, and part serpent; more generically, the word refers to any sort of organism that has two groups of genetically distinct cells. In layman’s terms, it’s something that is a mix of two other things.</aside>

Here are some of the networking-virtualization chimeras I’ve concocted:

  • FabricPath/TRILL on the hypervisor: See this blog post for more details. It turns out, at least at first glance, that this particular combination doesn’t seem to buy us much. The push for large L2 domains that seemed to fuel FabricPath and TRILL now seems to be abating in favor of network overlays and L3 routing.

  • MPLS-in-IP on the hypervisor: I also wrote about this strange concoction here. At first, I thought I was being clever and sidestepping some issues by bringing MPLS support into the hypervisor, but in thinking more about this I realize I’m wrong. Sure, we could encapsulate VM-to-VM traffic into MPLS, then encapsulate MPLS in UDP, but how is that any better than just encapsulating VM-to-VM traffic in VXLAN? It isn’t. (Not to mention that Ivan Pepelnjak set the record straight.)

  • LISP on the hypervisor: I thought this was a really good idea; by enabling LISP on the hypervisor and essentially making the hypervisor an ITR/ETR (see here for more LISP info), inter-DC vMotion becomes a snap. Want to use a completely routed access layer? No problem. Of course, that assumes all your WAN and data center equipment are LISP-capable and enabled/configured for LISP. I’m not the only one who thought this idea was cool, either. I’m sure there are additional problems/considerations of which I’m not aware, though—networking gurus, want to chime in and educate me on what I’m missing?

  • OTV on the hypervisor: This one isn’t really very interesting, as it bears great similarity to VXLAN (both OTV and VXLAN, to my knowledge, use very similar frame formats and encapsulation schemes). Is there something else here I’m missing?

  • VXLAN on physical switches: This one is interesting, even necessary according to some experts. Enabling VXLAN VTEP (VXLAN Tunnel End Point) termination on physical switches might also address some of the odd traffic patterns that would result from the use of VXLAN (see here for a simple example). Arista Networks demonstrated this functionality at VMworld 2012 in San Francisco, so this particular networking-virtualization mashup is probably closer to reality than any of the others.

  • OpenFlow on the hypervisor: Open vSwitch (OVS) already supports OpenFlow, so you might say that this mashup already exists. It’s not unreasonable to think Nicira might port OVS to VMware vSphere, which would bring an OpenFlow-compatible virtual switch to a much larger installed base. The missing piece is, of course, an OpenFlow controller. While an interesting mental exercise, I’m keenly interested to know what sort of real-world problems this might help solve, and would love to hear from any OpenFlow experts out there what they think.

  • Virtualizing physical switches: No, I’m not talking about running switch software on the hypervisor (think Nexus 1000V). Instead, I’m thinking more along the lines of FlowVisor, which in effect virtualizes a switch’s control plane so that multiple “slices” of a switch can be independently controlled by an external OpenFlow controller. If you’re familiar with NetApp, think of their “vfiler” construct, or think of the Virtual Device Contexts (VDCs) in a Nexus 7000. However, I’m thinking of something more device-independent than Nexus 7000 VDCs. As more and more switches move to x86 hardware, this seems like it might be something that could really take off. Multi-tenancy support (each “virtual switch instance” being independently managed), traffic isolation, QoS, VLAN isolation…lots of possibilities exist here.

Are there any other groupings that are worth exploring or discussing? Any other “you got your virtualization peanut butter in my networking chocolate” combinations that might help address some of the issues in data centers today? Feel free to speak up in the comments below. Courteous comments are invited and encouraged.


Back in early March I was invited to speak at the South Florida VMUG, and I gave this presentation on vSphere networking challenges and solutions. The idea behind the presentation was to give attendees some visibility into IEEE and IETF efforts at creating new network technologies and protocols. I’m posting it here just in case someone might find it useful or helpful.

As always, your questions, corrections, or clarifications are welcome in the comments below.


Building large-scale L2 networks, including stretched L2 networks, seems to be all the rage these days, driven in part by virtual machine mobility (aka vMotion in VMware vSphere environments or XenMotion in Citrix XenServer environments). While this isn’t always a good idea—some might say it’s never a good idea—it is still something that many organizations are evaluating.

With the announcement of VXLAN at VMworld 2011, a new question seems to have arisen: can I use VXLAN instead of (insert some other protocol here) to create my stretched L2 networks? In this post, I’d like to compare the use of VXLAN with OTV (Overlay Transport Virtualization) for that very purpose. Of course, since VXLAN hasn’t actually been released, the discussion is partially theoretical.

My primary focus in this post will be how each of these protocols handles traffic patterns in the course of addressing the need for L2 connectivity over routed L3 networks.

First, let’s look at VXLAN. The figure below is taken from my revised L3 connectivity with VXLAN post, which I encourage you to read for more details.

As you can see, once a VM inside a VXLAN segment is migrated to a new network, the traffic “trombones” back and forth across the VXLAN segment because all traffic has to pass through a single vShield Edge (VSE) instance. This brings up a key limitation of VXLAN that I think is important to point out: VXLAN has an innate dependency on VSE, and VSE cannot be made redundant. That’s right—you can’t have VSE-specific failover functionality; instead, you have to rely on vSphere HA, VM Monitoring, and other features. That means failover times in the minutes, not seconds. What do you think that will do to network connections?

Now, let’s compare VXLAN’s L3 connectivity with OTV. First, here’s a diagram to show connectivity with OTV before a VM is migrated to the second site:

No real surprises here. I’ll just point out here that a typical OTV deployment following “recommended practices” will use redundant Nexus 7000 switches, as shown here. That’s a key advantage that OTV has over VXLAN—the ability to provide redundancy is there and redundancy is easily built into the solution, with failover times in the seconds (or better).

Now, take a look at the post-migration traffic flows with OTV:

In case you didn’t notice it, let me point out the obvious: note the lack of traffic tromboning here. Here’s how it’s accomplished (and documented in this blog post by Ron Fuller, aka @ccie5851 or VDCBadger to his friends):

  • Each Nexus 7000 pair runs HSRP.
  • The HSRP hello packets are filtered (blocked) from the OTV interfaces. This keeps the HSRP pairs in each data center from knowing about the pair in the other data center.
  • Each HSRP pair runs the same virtual IP (the default gateway for the 10.1.1.0/24 subnet).

In this configuration, once the VM migrates to the second site the HSRP pair at the second site won’t need to send traffic across the OTV link to reach the migrated VM. This appears to be a significant advantage to OTV—a greater knowledge of the routing topology allows OTV to be more intelligent about how traffic should be directed across/around the network.
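
To make the difference concrete, here is a minimal sketch in Python that counts how many times routed traffic from a VM has to cross the data center interconnect. It is a deliberately simplified model and not a description of how OTV or HSRP is implemented: the site names, the `pick_gateway` and `dci_crossings` functions, and the rule that a VM always prefers a gateway in its own site are all assumptions made for illustration. It only models traffic leaving the VM through its default gateway, not ingress routing from remote subnets.

```python
from typing import Iterable


def pick_gateway(vm_site: str, gateway_sites: Iterable[str]) -> str:
    """Pick the site whose gateway answers for the VM's default-gateway IP."""
    sites = sorted(set(gateway_sites))
    if not sites:
        raise ValueError("at least one gateway site is required")
    # A gateway in the VM's own site always wins; otherwise use the first one.
    return vm_site if vm_site in sites else sites[0]


def dci_crossings(vm_site: str, peer_site: str, gateway_sites: Iterable[str]) -> int:
    """Count interconnect crossings for routed traffic VM -> gateway -> peer."""
    gw_site = pick_gateway(vm_site, gateway_sites)
    return int(vm_site != gw_site) + int(gw_site != peer_site)


# The VM has migrated to DC2 and talks to a routed peer that also lives in DC2.
print(dci_crossings("DC2", "DC2", {"DC1"}))         # 2: single gateway left in DC1
print(dci_crossings("DC2", "DC2", {"DC1", "DC2"}))  # 0: same virtual IP in both sites
```

With a single gateway left behind at the original site, the traffic crosses the interconnect twice (the "trombone"); with the same virtual IP active at both sites, it never leaves the second site.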

<aside>Of course, this doesn’t address L3 routing concerns from subnets not directly attached to the Nexus 7000 pairs. For that, we’d need something like LISP.</aside>

As I see it—and networking experts are welcome to jump in if I’m mistaken—this gives OTV two key advantages over VXLAN:

  1. OTV, because it is running on physical networking equipment, is more intelligent than VXLAN about how traffic is directed/routed in/around/across a network. This can yield more efficient utilization of a data center interconnect by reducing “traffic tromboning.”
  2. OTV, because it is running on physical networking equipment, can provide better redundancy and faster failover than VXLAN (which relies on single instances of VSE).

It’s entirely possible that if VXLAN ever makes it into physical network equipment, these advantages of OTV will be nullified.

It’s also important to point out that while OTV and VXLAN have some overlap in functionality they are partially targeted at solving different problems. While both protocols address L2 connectivity across L3 networks, VXLAN also addresses the exhaustion of the VLAN ID space in larger networks (especially service provider networks). This is an issue that OTV does not try to address. However, it seems to me that OTV would co-exist better with a solution like Q-in-Q, which could (as far as I can tell) address the VLAN ID exhaustion issue.

Once again, I encourage network experts to chime in and share their views. If I’ve misstated something, please let me know. Questions, thoughts, and comments are always welcome.


In my earlier post on VXLAN and Layer 3 connectivity, I had a fatal flaw in my thinking and in my diagrams that was corrected for me in the comments to that post. In this post, I want to revisit the idea of Layer 3 connectivity with VXLAN and include the corrected information (and new diagrams).

The “fatal flaw” was that I was working under the impression that we’d have to change network address translation (NAT) mappings on the vShield Edge (VSE) instance that was handling NAT for a particular VXLAN segment. As a result of this incorrect thinking, I stated that VXLAN broke Layer 3 connectivity. As it turns out, I was wrong.

Instead—and this makes perfect sense now that my flawed thinking was pointed out—the VSE instance continues to serve as the default Layer 3 gateway for the workload(s) inside the VXLAN segment.

Consider this diagram, which shows how a workload external to a VXLAN segment communicates with a workload inside a VXLAN segment:

Note that in this diagram, the Linux workload outside the VXLAN segment communicates via the VSE instance handling NAT for that particular VXLAN segment. The VSE instance (VSE 1) passes that communication to the internal workload, and the return traffic follows the same path. Layer 3 connectivity outside of the VXLAN segment is handled via traditional/normal Layer 2/3 methods.

Now consider this diagram, which shows the same communication, but after the Windows-based workload inside the VXLAN segment has now migrated to a different location:

Note that even though the Windows-based workload inside the VXLAN segment now resides on a completely separate VTEP (ESXi 2, in this case), the traffic from the Linux-based workload outside the VXLAN segment continues to move through VSE 1. That’s because VSE 1 is still the Layer 3 default gateway for the IP subnet inside the VXLAN segment. Therefore—and this is where I was wrong earlier—Layer 3 connectivity is not broken, but it does have to “horseshoe” across to the other data center and then back again, as illustrated above. This is the classic traffic pattern that we see with other overlay technologies, like OTV.

For me, while this addresses Layer 3 connectivity after a migration with VXLAN, it does bring up other questions:

  • How does one provide redundancy at the VSE level? Is there VRRP support in VSE, or an equivalent function?
  • Because Layer 3 connectivity is maintained, what now is the role of OTV? Is OTV relegated to handling Layer 2 extensions only for non-virtualized workloads?
  • How do we now propose to handle the “horseshoe” routing issue? It would seem to me that the only way to address this would be to port support for LISP (or an equivalent protocol) into VSE.

Feel free to post any questions, thoughts, or corrections in the comments below. Thanks!


Note: I’ve posted a follow-up to this article with some corrected information. Please read here.

I’ve been doing quite a bit of networking-related reading over the last few weeks, and VXLAN has been a key topic (along with OTV, MPLS, and OpenFlow). Since VXLAN’s announcement at VMworld US 2011, there have been some pretty good articles written and published about VXLAN. Here are a few, for example:

Digging Deeper into VXLAN, Part 1
VXLAN Deep Dive, Part 2: Looking at the Options
Digging Deeper in VXLAN, Pt 3, More FAQs
The Care and Feeding of VXLAN
VXLAN Part Deux
VXLAN Conclusion
Google+ discussions on VXLAN
VXLAN Primer – Part 1, BORGcube Blogs

However, the one thing that I haven’t seen a great discussion about is the impact of VXLAN on Layer 3 connectivity. I personally have fielded a number of questions about whether VXLAN will fix Layer 3 network connectivity problems with stretched clusters. So, I thought I’d take a stab here. Networking gurus (you know who you are), feel free to straighten me out if I’m wrong.

First, let’s start with a few basic things that we know about VXLAN:

  • We know that VXLAN encapsulates Layer 2 frames into Layer 3 packets (using UDP).
  • We know that VXLAN adds a 24-bit VXLAN Network Identifier (VNI) that allows for up to 16 million unique combinations.
  • We know that VXLAN Segments are built between VXLAN Tunnel End Points (VTEPs). In the initial implementation of VXLAN, the VTEP will be the Nexus 1000V VEM on an ESXi host.
  • We know that (for now) VXLAN is not understood by any physical networking devices (the transport that carries the encapsulated frames only needs to be an IP-based network). (VXLAN encapsulation is a subset of OTV encapsulation, so in theory the Nexus 7000 hardware is capable of decoding VXLAN.)
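
To illustrate the first two points, here is a minimal Python sketch that builds and parses the 8-byte VXLAN header. It assumes the header layout later published in RFC 7348 (8 flag bits with the "I" bit set, 24 reserved bits, the 24-bit VNI, 8 reserved bits); the function names are mine, and the outer UDP/IP headers are left out entirely.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid
MAX_VNI = 2**24 - 1          # largest value a 24-bit identifier can hold


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for the given VNI."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError(f"VNI must fit in 24 bits, got {vni}")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from an 8-byte VXLAN header."""
    _flags_word, vni_word = struct.unpack("!II", header[:8])
    return vni_word >> 8


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prefix a Layer 2 frame with a VXLAN header (outer UDP/IP omitted)."""
    return build_vxlan_header(vni) + inner_frame


print(MAX_VNI + 1)                          # 16777216 possible identifiers
print(parse_vni(build_vxlan_header(738)))   # 738
```

The 16,777,216 figure is where the "up to 16 million" number comes from.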

With that information in mind, I’d like to use the following diagram to frame the discussion.

In the diagram, there are two ESXi hosts acting as VTEPs. Between them exist two VXLAN segments with two different VNIs (VNI 738 and VNI 864). Because VXLAN works by encapsulating Layer 2 frames into Layer 3 packets and then routing these packets between VTEPs, VXLAN accomplishes one of its primary goals: it extends Layer 2 connectivity across Layer 3 networks.

But what does that mean, exactly?

Let’s look a bit more closely. The brown shape loosely represents Layer 2 connectivity within VNI 738 (a given VXLAN segment) and its associated VLAN(s). The Windows-based VM on the ESXi host on the left can communicate via Layer 2 with the Linux-based VM on the ESXi host on the right, even though those ESXi hosts reside in completely different broadcast domains separated by a Layer 3 routed network. The key phrase here, in my mind, is that VXLAN extends Layer 2 connectivity within a given VXLAN segment.

This is not, however, the sort of “extending Layer 2 connectivity across Layer 3 networks” that people are expecting.

What people are expecting from this phrase is that you could migrate a VM from the ESXi host on the left to the ESXi host on the right (as indicated in the diagram by the large arrow pointing from left to right) and still have full IP connectivity.

In this case, the VM itself will be able to maintain the same IP address, and other VMs in the same VXLAN segment will continue to communicate with the migrated VM without any issues. But hold on a second…

We know that VXLAN allows for duplicate IP addressing schemes across different VXLAN segments (but not in the same VXLAN segment), duplicate MAC addresses across different VXLAN segments (but not in the same VXLAN segment), and duplicate VLAN IDs across different VXLAN segments (but not in the same VXLAN segment). You could, for example, use the same IP addressing scheme, same MAC addresses, and same VLAN IDs in the brown (VNI 738) and blue (VNI 864) VXLAN segments. VXLAN wouldn’t care, and the VMs inside those VXLAN segments would be unaware of this duplication.
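
One way to picture why duplicates are harmless is to key the forwarding table on the pair (VNI, MAC) instead of on the MAC alone. The Python sketch below does exactly that. It is only an illustration of the idea, not how any particular VTEP stores its state, and the MAC and VTEP addresses are made-up documentation values.

```python
from typing import Dict, Optional, Tuple

# Forwarding table keyed by (VNI, inner MAC) -> IP address of the remote VTEP.
ForwardingTable = Dict[Tuple[int, str], str]


def learn(table: ForwardingTable, vni: int, mac: str, vtep_ip: str) -> None:
    """Record which VTEP hosts a MAC address within one VXLAN segment."""
    table[(vni, mac.lower())] = vtep_ip


def lookup(table: ForwardingTable, vni: int, mac: str) -> Optional[str]:
    """Find the VTEP for a MAC, scoped to a single VXLAN segment."""
    return table.get((vni, mac.lower()))


table: ForwardingTable = {}
# The same MAC address appears in both segments without any conflict.
learn(table, 738, "00:50:56:AA:BB:CC", "192.0.2.10")
learn(table, 864, "00:50:56:AA:BB:CC", "192.0.2.20")
print(lookup(table, 738, "00:50:56:aa:bb:cc"))  # 192.0.2.10
print(lookup(table, 864, "00:50:56:aa:bb:cc"))  # 192.0.2.20
```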

However, what VXLAN doesn’t address is IP translation; that functionality is relegated to a network address translator. In this case, it’s vShield Edge (VSE). So, in the instance where a VM is migrated between different Layer 3 networks, note that the only way to maintain IP connectivity from outside the VXLAN segment is to update the address translation tables and—here’s the kicker—assign the VM a new (and different) externally-accessible IP address. That breaks IP connectivity—even with dynamic DNS updates, clients will still have cached DNS responses pointing them back to the (now inactive) old external IP address. Thus, VXLAN breaks Layer 2/3 connectivity to other systems outside the VXLAN segment.

This issue, by the way, would be why various networking gurus have repeatedly stated that VXLAN does not replace OTV. To fix the issue described above, you’d have to use OTV to stretch the external-to-VXLAN VLANs so that the NAT mappings could remain unchanged and the externally accessible IP address would remain the same.

Before you assume that I’m knocking VXLAN, let me reaffirm that I’m not. I only felt that there hadn’t been a good, solid, understandable explanation of what sorts of connectivity were and were not extended/affected by VXLAN. Hopefully, this message has helped bring some clarity to the topic.

If I have misrepresented anything, presented something incorrectly, or if you have questions/clarifications, please let me know in the comments. Thanks!

UPDATE: As a couple of readers pointed out in the comments (thanks!), the Layer 3 connectivity isn’t quite as dire as what I’ve described. The VM’s address doesn’t have to change due to a change in NAT mappings on a VSE; instead, the VM’s traffic will “trombone” back to the original VSE that acts as the VXLAN segment’s default gateway. Again, thanks for the clarification/correction all!


Welcome to Technology Short Take #15, the latest in my irregular series of posts on various articles and links on networking, servers, storage, and virtualization—everything a growing data center engineer needs!

Networking

My thoughts this time around are pretty heavily focused on VXLAN, which continues to get lots of attention. I talked about posting a dissection of VXLAN, but I have failed miserably; fortunately, other people smarter than me have stepped up to the plate. Here are a few VXLAN-related posts and articles I’ve found over the last couple of weeks:

  • There is a three-part series over at Coding Relic that does a great job of explaining VXLAN, the components of VXLAN, and how it works. Here are the links to the series: part 1, part 2, and part 3. One note of clarification: in part 3 of the series, Denny talks about a VTEP gateway. Right now, the VTEP gateway is the server itself; anytime a packet on a VXLAN-enabled network leaves the physical server to go to a different physical server, it will be VXLAN-encapsulated. It won’t be decapsulated until it hits the destination VTEP (the ESXi server hosting the destination VM). If (when?) VXLAN awareness hits physical switches, then the possibility of a VTEP gateway existing outside the server exists. Personally, it kind of makes sense—to me, at least—to build VTEP gateway functionality into vShield Edge.
  • Some people aren’t quite so enamored with VXLAN; one such individual is Greg Ferro. I respect Greg a great deal, so it was interesting to me to read his article on why VXLAN is “full of fail”. Some of his comments are only slightly related to VXLAN (the rant over IEEE vs. IETF, for example), but Greg’s comment about VMware building a new standard instead of “leveraging the value of networking infrastructure” echoes some of my own thoughts. I understand that VXLAN accomplishes things that existing standards apparently do not, but was a new standard really necessary?
  • Omar Sultan of Cisco took the time to compile some questions and answers about VXLAN. One thing that is made more clear—for me, at least—in Omar’s post is the fact that VXLAN doesn’t address connectivity to the vApps from the “outside” world. While VXLAN provides a logical isolated network segment that can span multiple Layer 3 networks and allow applications to communicate with each other, VXLAN doesn’t address the Layer 3 addressing that must exist outside the VXLAN tunnel. In fact, in my discussions with some of the IETF draft authors at VMworld, they indicated that VXLAN would require a NAT device or a DNS update in order to address changes in externally-accessible applications. This, by the way, is why you’ll still need technologies like OTV and LISP (or their equivalents); see this post for more information on how VXLAN, OTV, and LISP are complementary. If I’m wrong, please feel free to correct me.
  • In case you’re still unclear about the key problem that VXLAN attempts to address, this quote from Ivan Pepelnjak might help (the full article is here):

    VXLAN tries to solve a very specific IaaS infrastructure problem: replace VLANs with something that might scale better. In a massive multi-tenant data center having thousands of customers, each one asking for multiple isolated IP subnets, you quickly run out of VLANs.

  • Finally, you might find this PDF helpful. Ignore the first 13 slides or so; they’re marketing fluff, to be honest. However, the remainder of the slides have some useful information on VXLAN and how it’s expected to be implemented.

Servers

I didn’t really stumble across anything strictly server hardware-related; either I’m just not plugged into the right resources (anyone want to make some recommendations?) or it was just a quiet period. I’ll assume it was the former.

Storage

Virtualization

  • Did you see this post about new network simulation functionality in VMware Workstation 8?
  • Here’s a good walk-through on setting up vMotion across multiple network interfaces.
  • VMware vSphere Design co-author Maish Saidel-Keesing has a post here on how to approximate the functionality of netstat on ESXi.
  • William Lam has a “how to” on installing the VMware VSA with running VMs.
  • Fellow vSpecialist Andre Leibovici did a write-up on a proof of concept that the vSpecialists did for a customer involving Vblock, VPLEX, and VDI. This was a pretty cool use case, in my opinion, and worth having a look if you need to design a highly available environment.
  • Thinking about playing with vShield 5? That’s a good idea, but check here to learn from the mistakes of others first. You’ll thank me later.
  • The question of defragmenting guest OS disks has come up again and again; here’s the latest take from Cormac Hogan of VMware. He makes some great points, but I suspect that this question is still far from settled.

It’s time to wrap up now; I hope that you found something useful. As always, thanks for reading! Feel free to share your views or thoughts in the comments below.


This is BRKDCT-9131, Mobility and Virtualization in the Data Center with LISP and OTV. This is one of the last sessions, if not the last session, at Cisco Live 2011.

The presenter, Victor Moreno, spends a few minutes at first talking about distributed data center clouds, the goals, and the challenges of building them. Some of the challenges or considerations include:

  • L2 domain elasticity (think FabricPath and LAN extensions)
  • Fabric consolidation (think Unified Fabric, VDCs)
  • Storage elasticity (think SAN extensions)
  • IP localization (Route optimization and route portability)
  • VM awareness (think VN-Link)

Victor next discusses why LAN extensions are necessary. From a virtualization perspective, it’s because a live migration requires the IP address of the VM to remain the same. He also discusses some non-routable L2 traffic that certain applications rely on, but to me this seems far less likely/important than maintaining IP addresses.

There are different ways of tackling LAN extensions; the session blog from BRKDCT-3060 has more information on the various methods. This session is clearly slanted toward OTV, as the discussion of L2 VPNs focuses on the complexity of that solution as opposed to the benefits (like extremely fast convergence).

The next section reviews the MAC-in-IP encapsulation mechanism that OTV uses to transport L2 frames across a routed transport. Victor also reviewed how the OTV control plane (which uses IS-IS) proactively advertises MAC reachability information, so that all OTV edge devices already know what MAC addresses are reachable via the overlay.

Regardless of the method used to perform the LAN extension, this interferes with ingress routing. Subnet usually implies location, but with LAN extension this mechanism is obscured—the network doesn’t know which side of the extension hosts the device to which we are communicating. This is the fundamental issue addressed by LISP.

LISP is the Locator/ID Separation Protocol. The idea is to separate the use of the IP address as a means of identifying an endpoint from the use of the IP address as a means of indicating location. We want to “split” these purposes apart. By splitting these apart, we can allow workloads to move yet still make efficient routing decisions.

So how does LISP operate? Consider this walkthrough:

  1. The source endpoint performs a DNS lookup to find the destination.
  2. Traffic is remote, so traffic is sent to the branch router.
  3. The branch router doesn’t know how to get to the destination’s specific address, but it is LISP-enabled so it performs a LISP lookup to find a locator address.
  4. The LISP mapping database informs the branch router how to get to the one (or more) available addresses that can get it to the destination. The LISP mapping database can return priority and weight as part of this lookup, to help with traffic engineering/shaping.
  5. The branch router performs an IP-in-IP encapsulation and transmits the data out the appropriate interface based on standard IP routing decisions.
  6. The receiving LISP-enabled router receives the packet, decapsulates the packet, and forwards the packet to the final destination.
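
Steps 3 and 4 of the walkthrough can be sketched in a few lines of Python. This is a toy model and not a LISP implementation: the `Rloc` class, the dictionary standing in for the mapping database, and all addresses are hypothetical. It also assumes the priority/weight semantics from the LISP specification (RFC 6830), where a lower priority value is preferred and weight splits traffic among locators of equal priority.

```python
import ipaddress
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Rloc:
    address: str   # routing locator (documentation addresses are used below)
    priority: int  # lower value is preferred
    weight: int    # relative traffic share among RLOCs of equal priority


def map_lookup(db: Dict[str, List[Rloc]], eid: str) -> List[Rloc]:
    """Return the RLOC set for the longest EID prefix that covers `eid`."""
    addr = ipaddress.ip_address(eid)
    best = None
    for prefix in db:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, prefix)
    return db[best[1]] if best else []


def choose_rloc(rlocs: List[Rloc], rng: Optional[random.Random] = None) -> Optional[Rloc]:
    """Pick one RLOC: best priority first, then a weighted random choice."""
    if not rlocs:
        return None
    rng = rng or random.Random()
    best_priority = min(r.priority for r in rlocs)
    candidates = [r for r in rlocs if r.priority == best_priority]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]


mapping_db = {
    "10.1.1.0/24": [Rloc("192.0.2.1", 1, 100), Rloc("198.51.100.1", 2, 100)],
    "10.1.1.25/32": [Rloc("203.0.113.1", 1, 100)],  # a host that has moved
}
print(choose_rloc(map_lookup(mapping_db, "10.1.1.10")).address)  # 192.0.2.1
print(choose_rloc(map_lookup(mapping_db, "10.1.1.25")).address)  # 203.0.113.1
```

The more specific /32 entry shows how a single moved host can be steered to a different locator without renumbering the rest of the subnet.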

To support routers that are not LISP-enabled, you use proxy tunnel routers. Proxy tunnel routers will encapsulate or decapsulate traffic from or to routers that are not LISP-enabled.

Some terminology of which you should be aware:

  • EID: End-point identifier (host IP or prefix)
  • RLOC: Routing locator (IP address of routers in the backbone)
  • Tunnel router: Edge devices that perform encapsulation/decapsulation
  • ETR: Egress tunnel router
  • ITR: Ingress tunnel router
  • PxTR: Proxy tunnel router (gateway between LISP backbone and non-LISP-aware routers)
  • EID to RLOC mapping DB: Contains all the EID-to-RLOC mappings, distributed across multiple map servers (MS)

LISP resolution (determining the appropriate ETR by an ITR) is performed by the MS, which forwards requests to ETRs that have registered with the MS as authoritative for a particular prefix. The ITR then caches the mapping for future use.
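
The resolve-then-cache behavior can be sketched as follows. Everything here is hypothetical scaffolding for illustration: `MapCache`, `fake_map_server`, and the addresses are my own names and values, the cache returns the first covering prefix instead of the longest one, and real map-cache entries also expire, which this sketch ignores.

```python
import ipaddress
from typing import Callable, Dict, List, Optional, Tuple

# A resolver stands in for the map server: EID -> (covering prefix, RLOCs), or None.
Resolver = Callable[[str], Optional[Tuple[str, List[str]]]]


class MapCache:
    """Toy ITR map-cache: resolve on first use, answer from cache afterwards."""

    def __init__(self, resolver: Resolver) -> None:
        self._resolver = resolver
        self._entries: Dict[str, List[str]] = {}
        self.requests_sent = 0

    def lookup(self, eid: str) -> List[str]:
        addr = ipaddress.ip_address(eid)
        for prefix, rlocs in self._entries.items():
            if addr in ipaddress.ip_network(prefix):
                return rlocs  # cache hit: no map request needed
        self.requests_sent += 1
        answer = self._resolver(eid)
        if answer is None:
            return []
        prefix, rlocs = answer
        self._entries[prefix] = list(rlocs)
        return self._entries[prefix]


def fake_map_server(eid: str) -> Optional[Tuple[str, List[str]]]:
    """Hypothetical map server that knows a single EID prefix."""
    if ipaddress.ip_address(eid) in ipaddress.ip_network("10.1.1.0/24"):
        return "10.1.1.0/24", ["192.0.2.1"]
    return None


cache = MapCache(fake_map_server)
cache.lookup("10.1.1.10")
cache.lookup("10.1.1.11")
print(cache.requests_sent)  # 1: the second lookup was served from the cache
```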

Future revisions of the LISP standard will add extra security to prefix registrations, to avoid the malicious introduction of prefix registrations (I believe it’s referred to as LISP-SEC).

LISP enables on-demand routing. Rather than requiring that all routers maintain a full routing table, LISP routers will determine LISP prefix mappings on an as-needed basis.

Some potential use cases for LISP:

  • IP portability
  • Ingress traffic engineering without BGP
  • IPv6 transition support (6-over-4, 4-over-6, etc.)
  • Multitenancy and VPNs
  • VM mobility

Victor now walks through some scenarios concerning the use of LISP with VM mobility use cases. One interesting note: using LISP to provide IP portability means that I could use SRM to failover to a DR site but not have to perform IP customization—LISP would handle the IP routing side of the house. That’s a pretty cool combination.

The session is still going, but I have to get back to the show floor and finish tearing down the EMC booth.


This is BRKDCT-3060, Deployment Considerations with Interconnecting Data Centers, presented by Patrice Bellagamba (Distinguished SE) and Max Ardica (Solution Architect). Given that my role in virtualization and storage puts me squarely in the middle of these sorts of designs, I’m really looking forward to this session and gaining some valuable knowledge out of it.

The primary objectives of the session are to identify the business requirements driving DCI (Data Center Interconnect) deployments; to understand the functional components of the Cisco DCI solution; and to gain full knowledge of Cisco LAN extension technologies and associated considerations. The session does NOT consider path optimization (ACE/GSS, LISP), storage extension considerations, or workload mobility application-specific considerations.

So what are some of the business drivers and solutions? There are several:

  • Business continuity (disaster recovery, HA framework)
  • Operational cost containment (DC maintenance or migration)
  • Business resource optimization (disaster avoidance, workload mobility)
  • Cloud services (inter-cloud networking, XaaS)

It is important, when extending VLANs between sites, to preserve STP domain isolation and storm control in order to keep the fault domain as small as reasonably possible. A key part of a DCI model is path optimization, but that is not something that will be covered in this session.

In a VLAN extension environment, there are 5 different types of VLANs you might deploy in your data center:

  • Type T0: Limited to a single access layer device
  • Type T1: Extended within an aggregation block/pod
  • Type T2: Extended between aggregation blocks/pods within a single data center site
  • Type T3: Extended between aggregation blocks/pods as part of “twin DC sites” (usually connected via dark fiber; think metro/synchronous distances)
  • Type T4: Extended between aggregation blocks/pods as part of remote DC sites (think geo/asynchronous distances)

Note that these classifications are just for understanding the different ways VLANs are extended; this is not a configuration class or type.

There are three basic types of VLAN extensions: Ethernet-based, IP-based, or MPLS-based.

The Ethernet-based solution leverages dark fiber/DWDM connections between DCs. In this configuration, make sure to filter BPDUs on one site and enable broadcast storm control. You can leverage technologies like vPC or VSS to avoid loops and take advantage of redundant links between the DC sites. If you use vPC, you will need separate (dedicated) Layer 3 links for inter-DC routing.

There was a question about the distances supported for an Ethernet solution. According to Max, it’s less about distance and more about the dark fiber/DWDM/direct fiber connectivity requirements. Up to 60km is probably a reasonable distance for this solution.

The Ethernet-based solution is a viable solution, but Max prefers OTV.

OTV is Overlay Transport Virtualization and is something I’ve discussed on my site before. The OTV edge device (ED) is the device that performs the OTV encapsulation (MAC-in-IP encapsulation). The internal interfaces are the interfaces on the ED that face the site. The join interface is the interface of the ED that faces the core. The overlay interface is a logical entity that encapsulates the traffic.

Some considerations:

  • Join interfaces and internal interfaces are only supported on M1 modules.
  • The join interface can’t be an SVI or loopback interface, but it can be a port-channel interface or a routed physical interface (or sub-interface).

There were some other considerations but I couldn’t capture them in time.

OTV encapsulation adds 42 bytes to the IP size of each packet. Keep this in mind and make sure the MTU is large enough end-to-end across the transport network. The OTV control plane uses IS-IS to learn MAC address reachability information from remote OTV peers. In the current release of NX-OS, OTV requires multicast support in the transport network connecting the sites. NX-OS 5.2 will add unicast (Adjacency Server mode) support.
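To make the MTU point concrete, here is a minimal Python sketch of the arithmetic. The only figure taken from the session is the 42-byte overhead; the function names and the sample MTU values are my own, purely for illustration.

```python
# Encapsulation overhead quoted in the session.
OTV_OVERHEAD_BYTES = 42


def required_transport_mtu(site_ip_mtu: int) -> int:
    """Smallest transport IP MTU that can carry a site packet after OTV encapsulation."""
    return site_ip_mtu + OTV_OVERHEAD_BYTES


def fits(site_ip_mtu: int, transport_mtu: int) -> bool:
    """True if an encapsulated packet of the given site MTU fits in the transport MTU."""
    return required_transport_mtu(site_ip_mtu) <= transport_mtu


if __name__ == "__main__":
    # Sample site MTUs (hypothetical values, not from the session).
    for site_mtu in (1500, 9000):
        print(site_mtu, "->", required_transport_mtu(site_mtu))
```

In other words, a site running a standard 1500-byte IP MTU needs at least 1542 bytes of IP MTU on every hop between the edge devices.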

The adjacency server (or daemon) is “just an OTV edge device” that advertises the IP of each ED to all other EDs (in something called the OTV Neighbor List, or the oNL). All subsequent communications happen directly between EDs without going through the adjacency server/daemon.
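As I understand it, the adjacency server's role boils down to bookkeeping: collect the IP of every ED and hand each ED the list of its peers. The toy Python model below sketches that idea only. It is not the actual OTV protocol, and the class name, method names, and addresses are all hypothetical.

```python
from typing import List, Set


class AdjacencyServer:
    """Toy model: tracks edge device (ED) IPs and returns each ED's neighbor list."""

    def __init__(self) -> None:
        self._edge_devices: Set[str] = set()

    def register(self, ed_ip: str) -> None:
        """Record an ED that has announced itself to the adjacency server."""
        self._edge_devices.add(ed_ip)

    def neighbor_list(self, ed_ip: str) -> List[str]:
        """Return every known ED except the requester (a stand-in for the oNL)."""
        return sorted(ip for ip in self._edge_devices if ip != ed_ip)


if __name__ == "__main__":
    server = AdjacencyServer()
    # Documentation-range addresses, chosen only for the example.
    for ip in ("192.0.2.1", "192.0.2.2", "192.0.2.3"):
        server.register(ip)
    print(server.neighbor_list("192.0.2.1"))
```

Once an ED holds its neighbor list, it talks to the other EDs directly, which is why the adjacency server never sits in the data path.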

By default OTV will isolate STP domains, which is an important part of any DCI solution. OTV also does not flood unknown unicast frames. Instead, OTV relies on its control plane to ensure that MAC address reachability information is in the MAC address table of each OTV ED.

Some current limitations:

  • Up to 3 sites for OTV
  • Up to 128 extended VLANs
  • 12K MAC addresses

In NX-OS 5.2 these values increase to 6 sites, 256 VLANs, and 16K MAC addresses.
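If you want to sanity-check a design against these numbers, a short Python sketch like the following would do it. The limits are the ones quoted in the session; I'm assuming "12K" and "16K" mean 12,000 and 16,000, and the dictionary keys and function name are my own invention.

```python
from typing import Dict, List

# Scalability limits as quoted in the session ("K" assumed to mean 1,000).
LIMITS: Dict[str, Dict[str, int]] = {
    "current": {"sites": 3, "extended_vlans": 128, "mac_addresses": 12_000},
    "5.2": {"sites": 6, "extended_vlans": 256, "mac_addresses": 16_000},
}


def violations(design: Dict[str, int], release: str) -> List[str]:
    """Return the names of the limits that the design exceeds for the given release."""
    limits = LIMITS[release]
    return sorted(name for name, limit in limits.items() if design.get(name, 0) > limit)


if __name__ == "__main__":
    # Hypothetical design used only for illustration.
    design = {"sites": 4, "extended_vlans": 100, "mac_addresses": 10_000}
    for release in ("current", "5.2"):
        print(release, violations(design, release))
```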

Max now shifts into a discussion of OTV deployment considerations. The first topic is where to place the OTV ED.

One option is to deploy the OTV ED in the core. This is an easy deployment in brownfield scenarios. In this case, the L2-L3 boundary remains at aggregation. You could have core devices perform both L3 and OTV functions, or you could use separate core devices, or you could use Nexus 7000 VDCs (see my VDC session blog for more information).

However, Max prefers deploying OTV at the aggregation layer. This preserves “pure” L3 functionality at the core and is consistent with OTV’s role as an “overlay” functionality.

Traffic on a Nexus 7000 belonging to a given VLAN can either be routed or extended, but not both. You must use dual VDCs to do both routing and OTV extension on the same Nexus 7000 physical device (either at the core or the aggregation layer).

Deploying OTV at the aggregation layer is recommended for greenfield deployments, but it does require Nexus 7000 at the aggregation layer.

A third option is to deploy OTV over dark fiber connections, much like the Ethernet-based solution described earlier. In this scenario you would not need separate L3 links as with the Ethernet-based solution. Because you are doing both routing and OTV extension at the same time, this solution would require dedicated OTV VDCs.

Max next covered the various scenarios of how you can actually connect the OTV VDC and L2/L3 VDCs. There are advantages and disadvantages to the various approaches: the simple appliance model uses fewer links but introduces other challenges; the most resilient model uses more links but provides faster convergence and better traffic flow.

Keep in mind that OTV is a pure IP encapsulation technology that does not include any L4 information (only L3 information wrapped around the original Ethernet frame), so IP hashing technologies (port-channel load balancing or ECMP hashing) won’t spread traffic across multiple links. In a two-site scenario, only one of the two links will ever be used. In a three-site design, two links can be used, depending on the hashing. Future hardware updates will allow flow-based load balancing (F2 or F3 module, perhaps).
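Here is a small Python sketch of why the hashing behaves this way. It assumes the load-balancing hash sees only the outer source and destination IPs (the two EDs), and it uses CRC32 as a stand-in hash. The real hash on the hardware is different, and the function names and addresses are hypothetical.

```python
import zlib
from typing import Iterable, Set


def pick_link(outer_src: str, outer_dst: str, num_links: int) -> int:
    """Pick a link from the outer IP pair only; CRC32 is a stand-in for the real hash."""
    key = f"{outer_src}->{outer_dst}".encode()
    return zlib.crc32(key) % num_links


def links_used(local_ed: str, remote_eds: Iterable[str],
               inner_flows: Iterable[int], num_links: int = 2) -> Set[int]:
    """Collect the links chosen for every inner flow sent to every remote ED."""
    inner_flows = list(inner_flows)
    used: Set[int] = set()
    for remote in remote_eds:
        for _flow in inner_flows:
            # The inner flow never reaches the hash: only the ED addresses do.
            used.add(pick_link(local_ed, remote, num_links))
    return used


if __name__ == "__main__":
    # Two sites: one remote ED, a thousand different VM-to-VM flows.
    print(links_used("192.0.2.1", ["198.51.100.1"], range(1000)))
```

However many VM-to-VM flows you push between two sites, the set of links used has exactly one member. Adding a third site adds a second outer destination, which may or may not hash to the other link.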

Native OTV support on F-series modules is targeted for a future hardware release. Neither F1 nor F2 modules support OTV currently.

In the event of various failure scenarios, only failures that cause an AED (Authoritative Edge Device) re-election cause extended outages for convergence. In NX-OS 5.2, convergence will be between 5 and 30 seconds for an AED re-election. Future releases will bring those values down. Failures that do not cause AED re-election will converge in sub-second timeframes.

Max still recommends configuring storm control (using the storm-control broadcast command) on the internal interface if you are concerned about broadcast storms with OTV.

Now Patrice takes over to discuss MPLS-based solutions (refer to my MPLS session blog for more information). There are three MPLS-based solutions: EoMPLS, A-VPLS, and H-VPLS.

EoMPLS is Ethernet-over-MPLS. To do this, you would use the xconnect command with MPLS encapsulation to create a point-to-point Ethernet over MPLS connection. In fact, a show cdp neighbor will show the remote port over the MPLS network.

This use of MPLS allows us to take advantage of all of MPLS’ features, like traffic engineering, for the traffic moving over this EoMPLS connection.

You can use LACP/vPC with multiple EoMPLS links.

You can also use MPLS over IP (using a GRE tunnel) and then create the EoMPLS pseudowires inside the GRE tunnels. (I think—this part got pretty complicated.) Because you are using GRE tunneling, you can apply an IPsec encryption profile against that tunnel to provide encryption for the EoMPLS traffic.

EoMPLS is point-to-point, so what about point-to-multipoint? VPLS can address this. With multiple devices connected across an MPLS core, each VSI (Virtual Switch Instance) will create multiple EoMPLS pseudowires to create a full mesh between all other VSIs.
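The full mesh is worth quantifying, since it grows quickly. This Python sketch enumerates the pseudowires for a given set of VSIs; the site names and function name are made up for the example.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def full_mesh(vsis: Sequence[str]) -> List[Tuple[str, str]]:
    """Return one pseudowire (as a pair of endpoints) for every pair of VSIs."""
    return list(combinations(vsis, 2))


if __name__ == "__main__":
    # Hypothetical site names.
    sites = ["DC-A", "DC-B", "DC-C", "DC-D"]
    mesh = full_mesh(sites)
    for a, b in mesh:
        print(f"{a} <-> {b}")
    print(len(mesh), "pseudowires in total")
```

For n VSIs that works out to n(n-1)/2 pseudowires overall, with each VSI terminating n-1 of them.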

There are considerations on attaching edge devices in this sort of configuration; best practices would be to make the MPLS-connected devices the STP root bridge (or avoid STP through vPC or equivalent). Patrice walks through a number of different deployment considerations and discussions of where to place the devices, but the focus of the discussion seemed to be more on VSS than A-VPLS. I’ll need to go back and review this information again in order to get a better understanding of the material.

Patrice now moves on to a discussion of H-VPLS.

H-VPLS is more suited for service providers (SPs) or SP-like enterprises. You might see this when interconnecting provider DCs, connecting enterprise DCs to provider DCs, or in high-end enterprise DCI scenarios.

ICCP is the Inter-Chassis Communication Protocol, which is in draft status with the IETF. I’m not clear on how this is related to vPC, vPC+ (since Patrice was talking about FabricPath), or other multi-chassis link aggregation solutions. I think the connection to H-VPLS is that you might use ICCP in connecting Layer 2 devices to the edge devices that will then be participating in the MPLS network.

Some terms:

  • mLACP: Multi-Chassis Link Aggregation Control Protocol
  • MC-LAG: Multi-Chassis Link Aggregation Group
  • DHD: Dual-homed device (customer edge)
  • DHN: Dual-homed network (customer edge)

Ah, here’s the information that helps me decipher the flow of the session. A-VPLS is only for certain devices (Catalyst 6500); H-VPLS is for high-end devices (7600, ASR-9K).

At this point I had to leave for a meeting so I wrapped up the session blog. Patrice was in the summary section of the presentation so no new information was going to be shared.
