Networking

This category contains information on networking and networking-related technologies or vendors.

This blog post kicks off a new series of posts describing my journey to become more knowledgeable about the Nicira Network Virtualization Platform (NVP). NVP is, in my opinion, an awesome platform, but there hasn’t been a great deal of information shared about the product, how it works, how you configure it, etc. That’s something I’m going to try to address in this series of posts. In this first post, I’ll start with a high-level description of the NVP architecture. Don’t worry—more in-depth information will come in future posts.

Before continuing, it might be useful to set some context around NVP and NSX. This series of posts will focus on NVP—a product that is available today and is currently in use in production. The architecture I’m describing here will also be applicable to NSX, which VMware announced in early March. Because NSX will leverage NVP’s architecture, spending some time with NVP now will pay off with NSX later. Make sense?

Let’s start with a figure. The diagram below graphically illustrates the NVP architecture at a high level:

High-level NVP architecture diagram

The key components of the NVP architecture include:

  • A scale-out controller cluster: The NVP controllers handle computing the network topology and providing configuration and flow information to create logical networks. The controllers support a scale-out model for high availability and increased scalability. The controller cluster supplies a northbound REST API that can be consumed by cloud management platforms such as OpenStack or CloudStack, or by home-grown cloud management systems.
  • A programmable virtual switch: NVP leverages Open vSwitch (OVS), an independent open source project with contributors from across the industry, to fill this role. OVS communicates with the NVP controller clusters to receive configuration and flow information.
  • Southbound communications protocols: NVP uses two open communications protocols to communicate southbound to OVS. For configuration information, NVP leverages OVSDB; for flow information, NVP uses OpenFlow. The management (OVSDB) communication between the controller cluster and OVS is encrypted using SSL.
  • Gateways: Gateways provide the “on-ramp” to enter or exit NVP logical networks. Gateways can provide either L2 gateway services (to bridge NVP logical networks onto physical networks) as well as L3 gateway services (to route between NVP logical networks and physical networks). In either case, the gateways are also built using a scale-out model that provides high availability and scalability for the L2 and L3 gateway services they host.
  • Encapsulation protocol: To provide full independence and isolation of logical networks from the underlying physical networks, NVP uses encapsulation protocols for transporting logical network traffic across physical networks. Currently, NVP supports both Generic Routing Encapsulation (GRE) and Stateless Transport Tunneling (STT), with additional encapsulation protocols planned for future releases.
  • Service nodes: To offload the handling of BUM (Broadcast, Unknown Unicast, and Multicast) traffic, NVP can optionally leverage one or more service nodes. Note that service nodes are optional; customers can choose to have BUM traffic handled locally on each hypervisor node. (Note that service nodes are not shown in the diagram above.)

Now that you have an idea of the high-level architecture, let me briefly outline how the rest of this series will be organized. The basic outline of this series will roughly correspond to how NVP would be deployed in a real-world environment.

  1. In the next post (or two), I’ll be focusing on getting the controller cluster built and diving a bit deeper into the controller cluster architecture.
  2. Once the controller cluster is up and running, I’ll take a look at getting NVP Manager up and running. NVP Manager is an application that consumes the northbound REST APIs from the controller cluster in order to view and manage NVP logical networks and NVP components. In most cases, this function is part of a cloud management platform (such as OpenStack or CloudStack), but using NVP Manager here allows me to focus on NVP instead of worrying about the details of the cloud management platform itself.
  3. The next step will be to bring hypervisor nodes into NVP. I’ll focus on using nodes running KVM, but keep in mind that Xen is also supported by NVP. If time (and resources) permit, I may try to look at bringing up Xen-based hypervisor nodes as well. Because NVP leverages OVS as the edge virtual switch, I’ll naturally be discussing some OVS-related tasks and topics as well.
  4. Following the addition of hypervisor nodes into NVP, I’ll look at creating a simple logical network, and we’ll examine how this logical network works with the underlying physical network.
  5. To add more flexibility to our logical network, we need to be able to bring physical resources into NVP logical networks. To enable that functionality, we’ll need to add gateways and gateway services to our configuration, so I’ll discuss gateways and L2 gateway services, how they work, and how we add them to an NVP configuration.
  6. The next step is to enable L3 (routing) functionality within NVP, and that is enabled by L3 gateway services. I’ll spend some time talking about the L3 gateway services, their architecture, adding them to NVP, and including L3 functionality in an NVP logical network. I’ll also explore distributed L3 routing, where the L3 routing is actually distributed across hypervisors in an NVP environment (this is a new feature just added in NVP 3.1).
  7. Now that we have both L2 and L3 gateway services in NVP, I’ll take a look at building more intricate logical networks.

Beyond that, it’s hard to say where the series will go. I’ll likely also take a look at some of NVP’s security features, and examine a few more complex NVP use cases. If there are additional topics you’d like to see beyond what I’ve outlined above, please feel free to speak up in the comments below.

I’m excited about this journey to learn NVP in more detail, and I’m looking forward to taking all of you along with me. Ready? Let’s go!

Tags: , , , , ,

In this post, I want to provide some additional insight on how the use of Open vSwitch (OVS) affects—or doesn’t affect, in some cases—how a Linux host directs traffic through physical interfaces, OVS internal interfaces, and OVS bridges. This is something that I had a hard time understanding as I started exploring more advanced OVS configurations, and hopefully the information I share here will be helpful to others.

To help structure this discussion, I’m going to walk through a few different OVS configurations and scenarios. In these scenarios, I’ll use the following assumptions:

  • The physical host has four interfaces (eth0, eth1, eth2, and eth3)
  • The host is running Linux with KVM, libvirt, and OVS installed

Scenario 1: Simple OVS Configuration

In this first scenario let’s look at a relatively simple OVS configuration, and examine how Linux host and guest domain traffic moves into or out of the network.

Let’s assume that our OVS configuration looks something like this (this is the output from ovs-vsctl show):

bc6b9e64-11d6-415f-a82b-5d8a61ed3fbd
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "eth0"
            Interface "eth0"
    Bridge "br1"
        Port "br1"
            Interface "br1"
                type: internal
        Port "eth1"
            Interface "eth1"
    ovs_version: "1.7.1"

This is a pretty simple configuration; there are two bridges, each with a single physical interface. Let’s further assume, for the purposes of this scenario, that eth2 has an IP address and is working properly to communicate with other hosts on the network. The eth3 interface is shutdown.

So, in this scenario, how does traffic move into or out of the host?

  1. Traffic from a guest domain: Traffic from a guest domain will travel through the OVS bridge to which it is attached (you’d see an additional “vnet0″ port and interface appear on that bridge when you start the guest domain). So, a guest domain attached to br0 would communicate via eth0, and a guest domain attached to br1 would communicate via eth1. No real surprises here.

  2. Traffic from the Linux host: Traffic from the Linux host itself will not communicate over any of the configured OVS bridges, but will instead use its native TCP/IP stack and any configured interfaces. Thus, since eth2 is configured and operational, all traffic to/from the Linux host itself will travel through eth2.

The interesting point (to me, at least) about #2 above is that this includes traffic from the OVS process itself. In other words, if the OVS process(es) need to communicate across the network, they won’t use the bridges—they’ll use whatever interfaces the Linux host uses to communicate. This is one thing that threw me off: because OVS is itself a Linux process, when OVS needs to communicate across the network it will use the Linux network stack to do so. In this scenario, then, OVS would not communicate over any configured bridge, but instead using eth2. (This makes perfect sense now, but I recall that it didn’t earlier. Maybe it’s just me.)

Scenario 2: Adding Bonding

In this second scenario, our OVS configuration changes only slightly:

bc6b9e64-11d6-415f-a82b-5d8a61ed3fbd
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "bond0"
            Interface "eth0"
            Interface "eth1"
    ovs_version: "1.7.1"

In this case, we’re now leveraging a bond that contains two physical interfaces (eth0 and eth1). (By the way, I have a write-up on configuring OVS and bonds, if you need/want more information.) The eth2 interface still has an IP address assigned and is up and communicating properly. The physical eth3 interface is shutdown.

How does this affect the way in which traffic is handled? It doesn’t, really. Traffic from guest domains will still travel across br0 (since this is the only configured OVS bridge), and traffic from the Linux host—including traffic from OVS itself—will still use whatever interfaces are determined by the host’s TCP/IP stack. In this case, that would be eth2.

Scenario 3: The Isolated Bridge

Let’s look at another OVS configuration, the so-called “isolated bridge”. This is a configuration that is commonly found in implementations using NVP, OpenStack, and others, and it’s a configuration that I recently discussed in my post on GRE tunnels and OVS.

Here’s the configuration:

bc6b9e64-11d6-415f-a82b-5d8a61ed3fbd
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "bond0"
            Interface "eth0"
            Interface "eth1"
    Bridge "br-int"
        Port "br-int"
            Interface "br-int"
                type: internal
        Port "gre0"
            Interface "gre0"
                type: gre
                options: {remote_ip="192.168.1.100"}
    ovs_version: "1.7.1"

As with previous configurations, we’ll assume that eth2 is up and operational, and eth3 is shutdown. So how does traffic get directed in this configuration?

  1. Traffic from guest domains attached to br0: This is as before—traffic will go out one of the physical interfaces in the bond, according to the bonding configuration (active-standby, LACP, etc.). Nothing unusual here.

  2. Traffic from the Linux host: As before, traffic from processes on the Linux host will travel out according to the host’s TCP/IP stack. There are no changes from previous configurations.

  3. Traffic from guest domains attached to br-int: Now, this is where it gets interesting. Guest domains attached to br-int (named “br-int” because in this configuration the isolated bridge is often called the “integration bridge”) don’t have any physical interfaces they can use; they can only use the GRE tunnel. Here’s the “gotcha”, so to speak: the GRE tunnel is created and maintained by the OVS process, and therefore it uses the host’s TCP/IP stack to communicate across the network. Thus, traffic from guest domains attached to br-int would hit the GRE tunnel, which would travel through eth2.

I’ll give you a second to let that sink in.

Ready now? Good! The key to understanding #3 is, in my opinion, understanding that the tunnel (a GRE tunnel in this case, but the same would apply to a VXLAN or STT tunnel) is created and maintained by the OVS process. Thus, because it is created and maintained by a process on the Linux host (OVS itself), the traffic for the tunnel is directed according to the host’s TCP/IP stack and IP routing table(s). In this configuration, the tunnels don’t travel through any of the configured OVS bridges.

Scenario 4: Leveraging an OVS Internal Interface

Let’s keep ramping up the complexity. For this scenario, we’ll use an OVS configuration that is the same as in the previous scenario:

bc6b9e64-11d6-415f-a82b-5d8a61ed3fbd
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "bond0"
            Interface "eth0"
            Interface "eth1"
    Bridge "br-int"
        Port "br-int"
            Interface "br-int"
                type: internal
        Port "gre0"
            Interface "gre0"
                type: gre
                options: {remote_ip="192.168.1.100"}
    ovs_version: "1.7.1"

The difference, this time, is that we’ll assume that eth2 and eth3 are both shutdown. Instead, we’ve assigned an IP address to the br0 interface on bridge br0. OVS internal interfaces, like br0, can appear as “physical” interfaces to the Linux host, and therefore can be assigned IP addresses and used for communication. This is the approach I used in describing how to run host management across OVS.

Here’s how this configuration affects traffic flow:

  1. Traffic from guest domains attached to br0: No change here. Traffic from guest domains attached to br0 will continue to travel across the physical interfaces in the bond (eth0 and eth1, in this case).

  2. Traffic from the Linux host: This time, the only interface that the Linux host has is the br0 internal interface. The br0 internal interface is attached to br0, so all traffic from the Linux host will travel across the physical interfaces attached to the bond (again, eth0 and eth1).

  3. Traffic from guest domains attached to br-int: Because Linux host traffic is directed through br0 by virtue of using the br0 internal interface, this means that tunnel traffic is also directed through br0, as dictated by the Linux host’s TCP/IP stack and IP routing table(s).

As you can see, assigning an IP address to an OVS internal interface has a real impact on the way in which the Linux host directs traffic through OVS. This has both positive and negative impacts:

  • One positive impact is that it allows for Linux host traffic (such as management or tunnel traffic) to take advantage of OVS bonds, thus gaining some level of NIC redundancy.
  • A negative impact is that OVS is now “in band,” so upgrades to OVS will be disruptive to all traffic moving through OVS—which could potentially include host management traffic.

Let’s take a look at one final scenario.

Scenario 5: Using Multiple Bridges and Internal Interfaces

In this configuration, we’ll use an OVS configuration that is very similar to the configuration I showed in my post on GRE tunnels with OVS:

bc6b9e64-11d6-415f-a82b-5d8a61ed3fbd
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "mgmt0"
            Interface "mgmt0"
                type: internal
        Port "bond0"
            Interface "eth0"
            Interface "eth1"
    Bridge "br1"
        Port "br1"
            Interface "br1"
                type: internal
        Port "tep0"
            Interface "tep0"
                type: internal
        Port "bond1"
            Interface "eth2"
            Interface "eth3"
    Bridge "br-int"
        Port "br-int"
            Interface "br-int"
                type: internal
        Port "gre0"
            Interface "gre0"
                type: gre
                options: {remote_ip="192.168.1.100"}
    ovs_version: "1.7.1"

In this configuration, we have three bridges. br0 uses a bond that contains eth0 and eth1; br1 uses a bond that contains eth2 and eth3; and br-int is an isolated bridge with no physical interfaces. We have two “custom” internal interfaces, mgmt0 (on br0) and tep0 (on br1), to which IP addresses have been assigned and which are successfully communicating across the network. We’ll assume that mgmt0 and tep0 are on different subnets, and that tep0 is assigned to the 192.168.1.0/24 subnet.

How does traffic flow in this scenario?

  1. Traffic from guest domains attached to br0: The behavior here is as it has been in previous configurations—guest domains attached to br0 will communicate across the physical interfaces in the bond.

  2. Traffic from the Linux host: As it has been in previous scenarios, traffic from the Linux host is driven by the host’s TCP/IP stack and IP routing table(s). Because mgmt0 and tep0 are on different subnets, traffic from the Linux host will go out either br0 (for traffic moving through mgmt0) or br1 (for traffic moving through tep0), and thus will utilize the corresponding physical interfaces in the bonds on those bridges.

  3. Traffic from guest domains attached to br-int: Because the GRE tunnel is on the 192.168.1.0/24 subnet, traffic for the GRE tunnel—which is created and maintained by the OVS process on the Linux host itself—will travel through tep0, which is attached to br1. Thus, the physical interfaces eth2 and eth3 would be leveraged for the GRE tunnel traffic.

Summary

The key takeaway from this post, in my mind, is understanding where traffic originates, and separating the idea of OVS as a switching mechanism (to handle guest domain traffic) as well as a Linux host process itself (to create and maintain tunnels between hosts).

Hopefully this information is helpful. I am, of course, completely open to your comments, questions, and corrections, so feel free to speak up in the comments below. Courteous comments are always welcome!

Tags: , ,

I’m back with another “how to” article on Open vSwitch (OVS), this time taking a look at using GRE (Generic Routing Encapsulation) tunnels with OVS. OVS can use GRE tunnels between hosts as a way of encapsulating traffic and creating an overlay network. OpenStack Quantum can (and does) leverage this functionality, in fact, to help separate different “tenant networks” from one another. In this write-up, I’ll walk you through the process of configuring OVS to build a GRE tunnel to build an overlay network between two hypervisors running KVM.

Naturally, any sort of “how to” such as this always builds upon the work of others. In particular, I found a couple of Brent Salisbury’s articles (here and here) especially useful.

This process has 3 basic steps:

  1. Create an isolated bridge for VM connectivity.
  2. Create a GRE tunnel endpoint on each hypervisor.
  3. Add a GRE interface and establish the GRE tunnel.

These steps assume that you’ve already installed OVS on your Linux distribution of choice. I haven’t explicitly done a write-up on this, but there are numerous posts from a variety of authors (in this regard, Google is your friend).

We’ll start with an overview of the topology, then we’ll jump into the specific configuration steps.

Reviewing the Topology

The graphic below shows the basic topology of what we have going on here:

Topology overview

We have two hypervisors (CentOS 6.3 and KVM, in my case), both running OVS (an older version, version 1.7.1). Each hypervisor has one OVS bridge that has at least one physical interface associated with the bridge (shown as br0 connected to eth0 in the diagram). As part of this process, you’ll create the other internal interfaces (the tep and gre interfaces, as well as the second, isolated bridge to which VMs will connect. You’ll then create a GRE tunnel between the hypervisors and test VM-to-VM connectivity.

Creating an Isolated Bridge

The first step is to create the isolated OVS bridge to which the VMs will connect. I call this an “isolated bridge” because the bridge has no physical interfaces attached. (Side note: this idea of an isolated bridge is fairly common in OpenStack and NVP environments, where it’s usually called the integration bridge. The concept is the same.)

The command is very simple, actually:

ovs-vsctl add-br br2

Yes, that’s it. Feel free to substitute a different name for br2 in the command above, if you like, but just make note of the name as you’ll need it later.

To make things easier for myself, once I’d created the isolated bridge I then created a libvirt network for it so that it was dead-easy to attach VMs to this new isolated bridge.

Configuring the GRE Tunnel Endpoint

The GRE tunnel endpoint is an interface on each hypervisor that will, as the name implies, serve as the endpoint for the GRE tunnel. My purpose in creating a separate GRE tunnel endpoint is to separate hypervisor management traffic from GRE traffic, thus allowing for an architecture that might leverage a separate management network (which is typically considered a recommended practice).

To create the GRE tunnel endpoint, I’m going to use the same technique I described in my post on running host management traffic through OVS. Specifically, we’ll create an internal interface and assign it an IP address.

To create the internal interface, use this command:

ovs-vsctl add-port br0 tep0 -- set interface tep0 type=internal

In your environment, you’ll substitute br2 with the name of the isolated bridge you created earlier. You could also use a different name than tep0. Since this name is essentially for human consumption only, use what makes sense to you. Since this is a tunnel endpoint, tep0 made sense to me.

Once the internal interface is established, assign it with an IP address using ifconfig or ip, whichever you prefer. I’m still getting used to using ip (more on that in a future post, most likely), so I tend to use ifconfig, like this:

ifconfig tep0 192.168.200.20 netmask 255.255.255.0

Obviously, you’ll want to use an IP addressing scheme that makes sense for your environment. One important note: don’t use the same subnet as you’ve assigned to other interfaces on the hypervisor, or else you can’t control that the GRE tunnel will originate (or terminate) on the interface you specify. This is because the Linux routing table on the hypervisor will control how the traffic is routed. (You could use source routing, a topic I plan to discuss in a future post, but that’s beyond the scope of this article.)

Repeat this process on the other hypervisor, and be sure to make note of the IP addresses assigned to the GRE tunnel endpoint on each hypervisor; you’ll need those addresses shortly. Once you’ve established the GRE tunnel endpoint on each hypervisor, test connectivity between the endpoints using ping or a similar tool. If connectivity is good, you’re clear to proceed; if not, you’ll need to resolve that before moving on.

Establishing the GRE Tunnel

By this point, you’ve created the isolated bridge, established the GRE tunnel endpoints, and tested connectivity between those endpoints. You’re now ready to establish the GRE tunnel.

Use this command to add a GRE interface to the isolated bridge on each hypervisor:

ovs-vsctl add-port br2 gre0 -- set interface gre0 type=gre \
options:remote_ip=<GRE tunnel endpoint on other hypervisor>

Substitute the name of the isolated bridge you created earlier here for br2 and feel free to use something other than gre0 for the interface name. I think using gre as the base name for the GRE interfaces makes sense, but run with what makes sense to you.

Once you repeat this command on both hypervisors, the GRE tunnel should be up and running. (Troubleshooting the GRE tunnel is one area where my knowledge is weak; anyone have any suggestions or commands that we can use here?)

Testing VM Connectivity

As part of this process, I spun up an Ubuntu 12.04 server image on each hypervisor (using virt-install as I outlined here), attached each VM to the isolated bridge created earlier on that hypervisor, and assigned each VM an IP address from an entirely different subnet than the physical network was using (in this case, 10.10.10.x).

Here’s the output of the route -n command on the Ubuntu guest, to show that it has no knowledge of the “external” IP subnet—it knows only about its own interfaces:

ubuntu:~ root$ route -n
Kernel IP routing table
Destination  Gateway       Genmask        Flags Metric Ref Use Iface
0.0.0.0      10.10.10.254  0.0.0.0        UG    100    0   0   eth0
10.10.10.0   0.0.0.0       255.255.255.0  U     0      0   0   eth0

Similarly, here’s the output of the route -n command on the CentOS host, showing that it has no knowledge of the guest’s IP subnet:

centos:~ root$ route -n
Kernel IP routing table
Destination  Gateway        Genmask        Flags Metric Ref Use Iface
192.168.2.0  0.0.0.0        255.255.255.0  U     0      0   0   tep0
192.168.1.0  0.0.0.0        255.255.255.0  U     0      0   0   mgmt0
0.0.0.0      192.168.1.254  0.0.0.0        UG    0      0   0   mgmt0

In my case, VM1 (named web01) was given 10.10.10.1; VM2 (named web02) was given 10.10.10.2. Once I went through the steps outlined above, I was able to successfully ping VM2 from VM1, as you can see in this screenshot:

VM-to-VM connectivity over GRE tunnel

(Although it’s not shown here, connectivity from VM2 to VM1 was obviously successful as well.)

“OK, that’s cool, but why do I care?” you might ask.

In this particular context, it’s a bit of a science experiment. However, if you take a step back and begin to look at the bigger picture, then (hopefully) something starts to emerge:

  • We can use an encapsulation protocol (GRE in this case, but it could have just as easily been STT or VXLAN) to isolate VM traffic from the physical network and from other VM traffic. (Think multi-tenancy.)
  • While this process was manual, think about some sort of controller (an OpenFlow controller, perhaps?) that could help automate this process based on its knowledge of the VM topology.
  • Using a virtualized router or virtualized firewall, I could easily provide connectivity into or out of this isolated (encapsulated) private network. (This is probably something I’ll experiment with later.)
  • What if we wrapped some sort of orchestration framework around this, to help deploy VMs, create networks, add routers/firewalls automatically, all based on the customer’s needs? (OpenStack Networking, anyone?)

Anyway, I hope this is helpful to someone. As always, I welcome feedback and suggestions for improvement, so feel free to speak up in the comments below. Vendor disclosures, where appropriate, are greatly appreciated. Thanks!

Tags: , , , , ,

Welcome to Technology Short Take #32, the latest installment in my irregularly-published series of link collections, thoughts, rants, raves, and miscellaneous information. I try to keep the information linked to data center technologies like networking, storage, virtualization, and the like, but occasionally other items slip through. I hope you find something useful.

Networking

  • Ranga Maddipudi (@vCloudNetSec on Twitter) has put together two blog posts on vCloud Networking and Security’s App Firewall (part 1 and part 2). These two posts are detailed, hands-on, step-by-step guides to using the vCNS App firewall—good stuff if you aren’t familiar with the product or haven’t had the opportunity to really use it.
  • The sentiment behind this post isn’t unique to networking (or networking engineers), but that was the original audience so I’m including it in this section. Nick Buraglio climbs on his SDN soapbox to tell networking professionals that changes in the technology field are part of life—but then provides some specific examples of how this has happened in the past. I particularly appreciated the latter part, as it helps people relate to the fact that they have undergone notable technology transitions in the past but probably just don’t realize it. As I said, this doesn’t just apply to networking folks, but to everyone in IT. Good post, Nick.
  • Some good advice here on scaling/sizing VXLAN in VMware deployments (as well as some useful background information to help explain the advice).
  • Jason Edelman goes on a thought journey connecting some dots around network APIs, abstractions, and consumption models. I’ll let you read his post for all the details, but I do agree that it is important for the networking industry to converge on a consistent set of abstractions. Jason and I disagree that OpenStack Networking (formerly Quantum) should be the basis here; he says it shouldn’t be (not well-known in the enterprise), I say it should be (already represents work created collaboratively by multiple vendors and allows for different back-end implementations).
  • Need a reasonable introduction to OpenFlow? This post gives a good introduction to OpenFlow, and the author takes care to define OpenFlow as accurately and precisely as possible.
  • SDN, NFV—what’s the difference? This post does a reasonable job of explaining the differences (and the relationship) between SDN and NFV.

Servers/Hardware

  • Chris Wahl provides a quick overview of the HP Moonshot servers, HP’s new ARM-based offerings. I think that Chris may have accidentally overlooked the fact that these servers are not x86-based; therefore, a hypervisor such as vSphere is not supported. Linux distributions that offer ARM support, though—like Ubuntu, RHEL, and SuSE—are supported, however. The target market for this is massively parallel workloads that will benefit from having many different cores available. It will be interesting to see how the support of a “Tier 1″ hardware vendor like HP affects the adoption of ARM in the enterprise.

Security

  • Ivan Pepelnjak talks about a demonstration of an attack based on VM BPDU spoofing. In vSphere 5.1, VMware addressed this potential issue with a feature called BPDU Filter. Check out how to configure BPDU Filter here.

Cloud Computing/Cloud Management

  • Check out this post for some vCloud Director and RHEL 6.x interoperability issues.
  • Nick Hardiman has a good write-up on the anatomy of an AWS CloudFormation template.
  • If you missed the OpenStack Summit in Portland, Cody Bunch has a reasonable collection of Summit summary posts here (as well as materials for his hands-on workshops here). I was also there, and I have some session live blogs available for your pleasure.
  • We’ve probably all heard the “pets vs. cattle” argument applied to virtual machines in a cloud computing environment, but Josh McKenty of Piston Cloud Computing asks whether it is now time to apply that thinking to the physical hosts as well. Considering that the IT industry still seems to be struggling with applying this line of thinking to virtual systems, I suspect it might be a while before it applies to physical servers. However, Josh’s arguments are valid, and definitely worth considering.
  • I have to give Rob Hirschfeld some credit for—as a member of the OpenStack Board—acknowledging that, in his words, “we’ve created such a love fest for OpenStack that I fear we are drinking our own kool aide.” Open, honest, transparent dealings and self-assessments are critically important for a project like OpenStack to succeed, so kudos to Rob for posting a list of some of the challenges facing the project as adoption, visibility, and development accelerate.

Operating Systems/Applications

Nothing this time around, but I’ll stay alert for items to add next time.

Storage

  • Nigel Poulton tackles the question of whether ASIC (application-specific integrated circuit) use in storage arrays elongates the engineering cycles needed to add new features. This “double edged sword” argument is present in networking as well, but this is the first time I can recall seeing the question asked about modern storage arrays. While Nigel’s article specifically refers to the 3PAR ASIC and its relationship to “flash as cache” functionality, the broader question still stands: at what point do the drawbacks of ASICs begin to outweight the benefits?
  • Quite some time ago I pointed readers to a post about Target Driven Zoning from Erik Smith at EMC. Erik recently announced that TDZ works after a successful test run in a lab. Awesome—here’s hoping the vendors involved will push this into the market.
  • Using iSER (iSCSI Extensions for RDMA) to accelerate iSCSI traffic seems to offer some pretty promising storage improvements (see this article), but I can’t help but feel like this is a really complex solution that may not offer a great deal of value moving forward. Is it just me?

Virtualization

  • Kevin Barrass has a blog post on the VMware Community site that shows you how to create VXLAN segments and then use Wireshark to decode and view the VXLAN traffic, all using VMware Workstation.
  • Andre Leibovici explains how Horizon View Multi-VLAN works and how to configure it.
  • Looking for a good list of virtualization and cloud podcasts? Look no further.
  • Need Visio stencils for VMware? Look no further.
  • It doesn’t look like it has changed much from previous versions, but nevertheless some people might find it useful: a “how to” on virtualization with KVM on CentOS 6.4.
  • Captain KVM (cute name, a take-off of Captain Caveman for those who didn’t catch it) has a couple of posts on maximizing 10Gb Ethernet on KVM and RHEV (the KVM post is here, the RHEV post is here). I’m not sure that I agree with his description of LACP bonds (“2 10GbE links become a single 20GbE link”), since any given flow in a LACP configuration can still only use 1 link out of the bond. It’s more accurate to say that aggregate bandwidth increases, but that’s a relatively minor nit overall.
  • Ben Armstrong has a write-up on how to install Hyper-V’s integration components when the VM is offline.
  • What are the differences between QuickPrep and Sysprep? Jason Boche’s got you covered.

I suppose that’s enough information for now. As always, courteous comments are welcome, so feel free to add your thoughts in the comments below. Thanks for reading!

Tags: , , , , , , , , , , , ,

Is there a difference between network virtualization and Software-Defined Networking (SDN)? If so, what is the relationship between them? Is one a subset of the other? This is a topic that is increasingly being discussed and debated. So, in a similar fashion to my post on network overlays vs. network virtualization, I thought I’d weigh in with some thoughts. This post, like all of my posts, is intended to help spark a conversation, so I encourage you to share your thoughts in the comments after you’ve finished reading.

Last week I had the opportunity to join John Troyer on the VMware Communities podcast. My purpose, as John put it when he invited me, was to “gently introduce” the community to the idea of network virtualization, which is where I now spend most of my time since joining VMware in early February. One of the topics we discussed on the podcast was the relationship between SDN and network virtualization, which sparked this blog post by Keith Townsend, one of the attendees. I encourage you to read Keith’s full blog post, but I think the key point he makes is right here:

How is this different from Software Defined Networking or SDN? I think VMware (who Scott works for) would like you to consider SDN as just the abstraction of the control plane from the physical plane…I believe the industry outside of VMware is defining SDN in a broader sense. (emphasis mine)

I’ll get to the definition of SDN in just a moment, but first let’s look at the definition of network virtualization. In my earlier post on network overlays vs. network virtualization, I provided a definition (taken from a presentation Dan Wendlandt gave at the OpenStack Summit in Portland) for network virtualization: “A faithful and accurate reproduction of the physical network that is fully isolated and provides both location independence and physical network state independence.” Several of the readers of that particular blog post then came up with a definition of a virtualized network: “A virtualized network is one which can be instantiated, operated, and removed without physical asset interaction by the network manager.”

These definitions are, in my humble opinion, reasonably precise and accurate. They are easy to understand and easy to explain to others who aren’t familiar with the concept of network virtualization.

With this definition in hand, let’s compare network virtualization to SDN. To compare network virtualization to SDN, we must first define SDN. So—how do we define SDN?

There are two definitions we can use for SDN:

  1. We can use the original definition when SDN was first coined in 2009.
  2. We can use the definition currently in use within the industry.

Let’s look at the first scenario, and let’s examine SDN and network virtualization in the context of the original definition of SDN. I’ve had some discussions with Martin Casado, the so-called “Father of SDN” (he’s going to hate me for using that term—sorry Martin!), about what SDN meant when it was first coined. According to Martin, the term SDN originally referred to a change in the network architecture to include a) decoupling the distribution model of the control plane from the data plane; and b) generalized rather than fixed function forwarding hardware.

Bruce Davie (Principal Engineer at VMware and long-time networking guru) recently talked with some of the other creators of OpenFlow in preparation for his presentation at ONS 2013. According to the research that Bruce did, the term “SDN” was first coined in 2009 to refer specifically to the work being done on OpenFlow. Their original definition of SDN is consistent with Martin’s definition–it refers to the decoupling of the control plane from the data plane.

Therefore, if we look at the original definition of SDN—not VMware’s definition, but the definition of those involved in the creation of the term in 2009—it becomes fairly clear that SDN is not the same as network virtualization. Instead, SDN is a tool that can be leveraged in order to create network virtualization. SDN is a tool or a technique that can be used in the creation of virtualized networks and in satisfying the definition of network virtualization. This subtle distinction, by the way, is one that Bruce addressed in his recent ONS 2013 presentation.

Now, let’s talk about the second scenario, and let’s examine SDN and network virtualization in the context of the current industry definition of SDN. The problem that we have here is that “SDN,” like “cloud,” can really be made to mean just about anything. Think about it: existing protocols like OSPF and BGP (which run in software) manipulate the forwarding rules in the data plane—does that make them SDN, too? What about NX-OS, JUNOS, or EOS? These are all software—does that make them SDN? What about having APIs on switches? What about virtualized load balancers? Or virtualized firewalls? Are these SDN too? What about products that existed long before the SDN term was coined in 2009? Are they now SDN? No, we generally don’t accept all the vast assortment of software that’s involved in networking today as SDN, because SDN originated as a term to refer to the separation of the control plane from the data plane. I think it’s telling when Martin Casado indicates that he doesn’t even know what SDN means anymore, the term is probably being overused. SDN-washing, anyone?

Therefore, we can’t take the current industry definition of SDN, because SDN currently means all things to all people. If your company is doing “something cool” in networking, it’s probably going to be called SDN. There is no clear, crisp, understandable definition.

This lack of clarity around “what is SDN vs. what isn’t SDN” is part of the reason, I think, that VMware has converged on the use of network virtualization. This isn’t just about differentiating itself from all the other SDN players out there, but it’s also about giving customers something concrete and precise they can understand.

Now, if we want to—as an industry—come up with a clearer and more precise definition of SDN, then we can revisit this discussion. In fact, I encourage you to speak up in the comments below with what you might consider would be a useful, clear, precise definition of SDN. Clear definitions and better understanding enable meaningful communication, which can only be beneficial for everyone.

Tags: , ,

This is a session titled “Networking in the cloud: An SDN primer.” It’s led by Ben Cherian, Chief Strategy Officer for Midokura. Ben indicates he’ll try to remain as impartial as possible while attempting to describe and define SDN.

Ben asserts that the basic driver pushing folks toward SDN is that the current state of networking in the cloud is too complex and too manual. As a result, the first question that people start asking is, how can I automate this? Telecom has gone through this before, and data networking is in a similar state of flux and development.

The presenter next discusses Almon Strowger, who invented the first electromechanical switch. His work–which might have been driven by some level of paranoia–led to automated phone switching and the rotary phone. Almon Strowger wanted to address privacy concerns and intentional human errors, and along the way he also solved unintentional human errors, connection speeds, and lower operational costs.

Ben indicates that the cloud—in particular, networking—is in a similar position. It’s time for the “Almon Strowger” of cloud networking to solve the challenges that keep networking from scaling.

The presenter next covers the concept of a control plane and the data plane. Abstraction is a key tenet of computer science, and when the control plane and the data plane reside on the same physical device (which is typically the case in traditional networks today), there is no abstraction. Abstraction exists in other areas of computing, but not in networking. To bring that abstraction into networking, a simple example is to separate the control plane into a separate controller that exists apart from the data plane. This is basic SDN.

Ben sees three use cases for SDN:

  1. IaaS
  2. Data center fabrics
  3. Carrier/WAN use cases

Looking at these in reverse order, examples of SDN technologies that you could see here would be a “hybrid control plane” leveraging either BGP (a distributed control plane) or OpenFlow (centralized control plane). In the data center fabric space, this involves the use of OpenFlow to manage multiple switches. Finally, in the IaaS space, you see software-based solutions and overlays. Examples of companies in this IaaS space are Midokura, VMware/Nicira, and others.

Some key requirements for IaaS “cloud networking”/SDN:

  • Multi-tenancy
  • L2 isolation
  • L3 isolation
  • Scalable control plane
  • NAT (floating IP)
  • ACLs
  • Stateful (L4) firewall
  • VPN
  • BGP gateway
  • RESTful API
  • Integration with CMS (like OpenStack)

Looking at this list of requirements, Ben feels like you can eliminate the technologies leveraged in the carrier/WAN and data center fabric SDN use cases, because these technologies don’t properly address all the IaaS/cloud networking requirements.

Ben next reviews a networking diagram that shows how these various requirements translate into cloud networks.

The candidate models to address these requirements are:

  • Traditional network
  • Hop=by-hop OpenFlow
  • Edge-to-edge IP overlays

Using traditional network models, VLANs become a constraint (only 4096 VLANs available means only 4096 tenants). This constrains L2 isolation. L3 isolation is constrained by VRFs, which are not fault tolerant, require expensive hardware, and don’t scale well.

If you wanted to use an OpenFlow fabric, the issue is more about the limitations of the physical switch itself. Storing the state in physical switches has issues with scale, not fast enough to update, and no atomicity of updates. There is also the issue of provisioning the physical switches that will then be controlled by OpenFlow. This approach also doesn’t address the other “higher level” concerns of cloud networking.

Edge-to-edge IP overlays are the method that Ben (and his company) prefers. Isolation is provided without VLANs, providing additional scalability. Only IP connectivity is required. You can use a scalable IGP (iBGP, OSPF) to build a reliable multi-path underlay network. This sort of thinking is inspired by a research paper by Microsoft regarding VL2 (no link provided).

Trends that support this sort of solution:

  • Faster packet processing on x86 servers at the edge
  • Clos (fat tree/leaf-spine) networks for the underlay
  • Merchant silicon brings down the cost of efficient IP switches
  • Optical intra-DC networks for plentiful bandwidth

The presenter also alludes to the use of configuration management tools (Chef, Puppet) on merchant silicon-based switches.

Ben next shows a diagram that illustrates an overlay network and an underlay network, and he walks through how traffic flows work (both from the overlay perspective as well as the underlay perspective).

Naturally, Ben believes that overlays are the right approach—but you still need a scalable control plane.

At this point, he opens the session up for questions and answers.

Tags: , ,

Dan Wendlandt said something today in the NVP deep dive session (liveblog of the session here) that really crystallized something for me. I thought perhaps I might be the only one that was seeing a trend, but Dan’s comment leads me to believe there are others seeing this trend as well. Here’s the quote, taken from my liveblog of the session:

It is important to note, as Dan does, that a tunneling protocol alone is not network virtualization.

There’s a lot of buzz in the industry about network virtualization and network overlays, and often those terms are used interchangeably. People talk about the need for multi-tenancy and address space isolation, point to network overlays like VXLAN, NVGRE, and STT as the answer to all our problems, and in so doing they (inadvertently) conflate network overlays with network virtualization. Network overlays and network virtualization aren’t the same thing, and people that use them interchangeably probably don’t fully understand what’s involved.

<aside>By the way, if you’re not familiar with the idea of network overlays, I’d recommend reading this, this, and this to get you started. There’s plenty more out there, but those three articles will at least prime the pump, I think.</aside>

Network overlays are great for address space isolation (for example, isolating duplicate MAC addresses, duplicate IP addresses, or duplicate VLAN IDs). As such, network overlays can be an important part of network virtualization. You need more than a network overlay, though, to have network virtualization; you also need virtualized network services (like NAT, firewalls, ACLs, QoS, routing, and the like) and you need a control plane (else how would you coordinate the various pieces within the network virtualization solution?). The overlay protocol is just one piece of the puzzle, so using “network overlay” interchangeably with “network virtualization” is incorrect.

As always, I welcome the input of those more educated/knowledgeable than I am. If you’re a networking expert (or a virtual networking expert), feel free to speak up in the comments and correct my misunderstanding or misconceptions (please disclose vendor affiliations). I’m always open to deepening my knowledge—and helping others with their understanding and knowledge along the way.

Tags: , ,

This is a “201 level” technical deep dive on Nicira Network Virtualization Platform (NVP). It was supposed to be led by Brad Hedlund, but due to unforeseen circumstances Brad was unable to make it to the Summit. Instead, Dan Wendlandt is leading the session. Dan, if you aren’t familiar, was the former Quantum PTL. I’m already familiar with all of this stuff (naturally), but wanted to sit in this session so that I could liveblog it for others’ benefit.

Dan starts the session with an explanation of network virtualization and why its important. He provides a technical definition of network virtualization as a “faithful reproduction of physical networks’ that is “fully isolated” and provides “location independence” while being “physical network state independent". This is NOT the same as simply running network software inside a VM. (While that is valuable, that’s not the same as network virtualization.)

NVP 1.0 was released in July 2011; the latest version, NVP 3.0, was released recently. The end of April will see NVP 3.1.

The only requirement from the physical network is IP connectivity. Open vSwitch (OVS) is used in all the various components, like Service Nodes and L2/L3 Gateways, and a set of out-of-band controllers manage all the components. Dan contrasts the “non-virtualized” (more physically-oriented) and “virtualized” (more logically oriented) views of NVP and the functionality is presents.

Starting at the “bottom” of the stack:

  • NVP wants to treat your network as a pool of resource capacity to be sliced on-demand for tenants.
  • NVP relies only on commodity features (specifically, L3 forwarding).
  • Configuration is done once (when the equipment is racked).
  • No humans in the loop when provisioning occurs.
  • Have the flexibility to choose/change architecture without it impacting the logical abstractions you’ve created for your workloads.

Next Dan shows that you could use a very scalable leaf-spine L3 network design, but you could also leverage existing network architectures—NVP only requires L3 connectivity.

The session then moves on to discuss Open vSwitch (OVS), which forms the basis for many of the Quantum plugins in OpenStack. It’s important to understand OVS is really like a generic “engine” that can be programmed to provide various levels of functionality, from “dumb” L2 learning to more sophisticated OpenFlow-based and tunnel-oriented networking.

This leads into a discussion of why L2/L3 tunneling as employed by NVP is important in satisfying the definition of network virtualization provided earlier. In order to truly isolate networks, NVP leverages tunneling protocols to encapsulate the traffic and provide the isolation and decoupling properties.

It is important to note, as Dan does, that a tunneling protocol alone is not network virtualization.

Next Dan moves on to describing the NVP controllers, which are x86-based software that manages OVS southbound and uses a northbound API (to expose to Quantum, for example). NVP controllers are NEVER in the data plane. The NVP controllers leverage a clustered/distributed architecture that provides both high availability (HA) and scale-out functionality.

As mentioned earlier, the controllers also provide the NVP API northbound to applications and/or cloud management systems (CMSes, like OpenStack and CloudStack). In an OpenStack environment, the NVP API is consumed by Quantum to create logical networks and logical network services.

Dan next provides an overview of the communication flow between the various components in a Quantum+NVP deployment.

The session now shifts focus to gateways, which provide the mechanism to get into or out of logical networks created by NVP. An L2 gateway allows you to map physical systems or VLANs into a logical switch managed/created by NVP. (Kudos to Dan, who actually goes with a very complex example–mapping in a remote VLAN across the WAN as an adjacent L2 segment–in order to explain L2 gateways.) He also explains the use of L3 gateways, which provide routed access into and out of the NVP logical networks. One of the interesting things about L3 gateways is that they are highly available (using failure zones) and a scale-out architecture (meaning that multiple L3 gateway appliances are supported).

Service nodes are used to handle broadcast and multicast packet replication. Using service nodes allows you to offload these tasks, if you wish, from the hypervisor compute nodes. Like the L3 gateways, the service nodes are built to provide high availability and scale-out functionality.

Dan wraps up the session with a review of the management and operations tool that NVP provides. NVP provides tunnel status, has a “port connectivity” tool that shows VM-to-VM connectivity, and actively inject traffic into flows to test network connectivity.

NVP isn’t just about scale. It’s about:

  • Data plane performance
  • Fast, reliable high availability
  • Rich logical capabilities (ACLs, QoS, statistics)
  • Ability to bring physical workloads into logical networks, onboard remote customers
  • Rich management and operations tools

At this point, Dan opens the session for questions and answers.

Tags: ,

Welcome to Technology Short Take #31, my irregularly published series that takes a look at links, posts, articles, and thoughts from around the web related to core data center technologies. I hope that you find something useful!

Networking

  • Umair Hoodbhoy speculates in this post that the inclusion of Cisco’s ONE Controller in the recently-announced “Daylight” effort could mean the end for Big Switch’s Floodlight. (Umair’s play on words—”in Daylight there is no need for Floodlights”—is cute.)
  • Of course, Big Switch recently moved to “diversify,” if you will, away from just Floodlight with the introduction of Switch Light. As usual, Brent Salisbury has an excellent write-up on Switch Light, so I recommend reading his post. Switch Light seems like a good idea—more competition is always good, isn’t that what people say?—but I wonder how much cooperation Big Switch will get from the major networking vendors with regards to OpenFlow interoperability now that Big Switch is competing even more directly with them via Switch Light.
  • I think I might have mentioned this before (sorry if so), but here’s a good write-up on using the Edge Gateway CLI for monitoring and troubleshooting. Nice.
  • Greg Ferro examines a potential SDN use case (an OpenFlow use case) in the form of enterprise firewall migrations.
  • Just getting started in the networking field? Last year, Brent Salisbury put together a couple of great posts that help “refresh the basics” of networking. Part 1 covers Ethernet, IP, and TCP headers in Wireshark captures; part 2 pulls that together to show how the headers encapsulate in the OSI stack. If you’re not already familiar with this information, this is good reading.

Servers/Hardware

Nothing this time around, but I’ll stay alert for information I can include in the next Technology Short Take!

Security

  • Mounting guest disk images on the host? That’s a no-no from a security perspective—see here to learn why.
  • Mike Foley shared recently that the release candidate of the vSphere 5 Security Hardening Guide has been released. Check it out here.

Cloud Computing/Cloud Management

  • I haven’t had the chance to actually try it out myself, but Blueprint looks interesting. As the website describes it, it’s designed to “reverse engineer” servers so that you can migrate them into a configuration management system like Chef or Puppet.
  • Looking for a decent high-level overview of OpenStack and how it works? Check out this article titled “In a nutshell: How OpenStack works”. (As an aside, I think it’s awesome how Ken Pepple’s diagrams show up in all sorts of places. One day I hope my material proves as useful to folks.)
  • If you use Puppet for configuration management and want to deploy GlusterFS, be sure to check out this Puppet Forge module. I’ve tested it and it works as advertised.
  • This is an older article (published in May of last year), and it’s a bit on the lengthy side, but I like the tack the author uses. He describes cloud as the synthesis of many different forms of innovation within IT, pulling together things like open source, virtualization, distributed programming, NoSQL, DevOps/NoOps, distributed teams, dynamic languages, and Big Data (among others). He then goes on to provide examples of how organizations building or leveraging clouds are synthesizing these various independent technological innovations together. If you have a few minutes (as I said, it’s a bit on the lengthy side), I’d recommend reading it.

Operating Systems/Applications

  • This series is a bit older, but an interesting one nevertheless. Brian McClain, who was one of the presenters in a Cloud Foundry/BOSH session I liveblogged at VMworld 2012, has his own personal blog and posted a series of articles on using BOSH with vSphere. I hadn’t really considered how one might use BOSH for deploying (and managing) multi-VM applications on vSphere, but Brian provides some practical examples. Part 1 of the series is here, followed by part 2, part 3, part 4, and part 5.
  • Like using Markdown on OS X? You might find these handy.
  • Ah, the good old days of DOS…reborn as FreeDOS.
  • Go ahead, read up on YAML. You know you want to. Well, YAML is used in both Hiera (can be used with Puppet) and BOSH, after all.
  • Here’s another interesting tool that I haven’t had the opportunity to actually test myself. Oz looks like it could be quite useful—especially in virtualized/cloud computing environments—but I’m struggling to determine why I should use Oz instead of OS-specific mechanisms (like a kickstart file). If anyone has used Oz and can shed some light on this question, I’d appreciate it.
  • You may have heard that I recently switched from TextMate to BBEdit as my default OS X text editor (and therefore the tool whereby I do most of my content generation). As part of the switch, I found this to be helpful. (I might post a separate entry about the switch, if enough people seem interested in reading about it.)

Storage

Virtualization

That’s it for this time. I have plenty more links I wanted to share, but I figured I’d better not let this post get any longer. As always, courteous comments are welcome, so I invite you to participate in the conversation by adding your thoughts below.

Tags: , , , , , , , , , ,

If you want to read a masterpiece of an article describing how OpenStack Quantum works with Open vSwitch (OVS) and the OVS plugin, this write-up by Florian Otel should be your first stop.

Florian’s treatise—it’s far too in-depth and comprehensive to be referred to as anything else—covers just about everything and anything you want to know about getting OpenStack Quantum and OVS up and running. Florian not only walks readers through the configuration, but then “peels back the covers” to show them what’s happening underneath. For example, the write-up documents the settings for using the Quantum DHCP agent and L3 agent, and then goes on to explain (and demonstrate!) how each of these components leverages different Linux network namespaces to accomplish their tasks.

I highly recommend reading and reviewing this resource, and on behalf of everyone else out there like me who’s searching for more documentation on how the various OpenStack components work, I’d like to thank Florian for his efforts in putting this together. Well done.

Tags: , ,

« Older entries