Examining Open vSwitch Traffic Patterns

In this post, I want to provide some additional insight on how the use of Open vSwitch (OVS) affects—or doesn’t affect, in some cases—how a Linux host directs traffic through physical interfaces, OVS internal interfaces, and OVS bridges. This is something that I had a hard time understanding as I started exploring more advanced OVS configurations, and hopefully the information I share here will be helpful to others.

To help structure this discussion, I’m going to walk through a few different OVS configurations and scenarios. In these scenarios, I’ll use the following assumptions:

  • The physical host has four interfaces (eth0, eth1, eth2, and eth3)
  • The host is running Linux with KVM, libvirt, and OVS installed

Scenario 1: Simple OVS Configuration

In this first scenario let’s look at a relatively simple OVS configuration, and examine how Linux host and guest domain traffic moves into or out of the network.

Let’s assume that our OVS configuration looks something like this (this is the output from ovs-vsctl show):

bc6b9e64-11d6-415f-a82b-5d8a61ed3fbd
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "eth0"
            Interface "eth0"
    Bridge "br1"
        Port "br1"
            Interface "br1"
                type: internal
        Port "eth1"
            Interface "eth1"
    ovs_version: "1.7.1"

This is a pretty simple configuration; there are two bridges, each with a single physical interface. Let’s further assume, for the purposes of this scenario, that eth2 has an IP address and is working properly to communicate with other hosts on the network. The eth3 interface is shut down.
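If you want to reason about a configuration like this programmatically, here is a minimal Python sketch that parses the ovs-vsctl show output above into a nested dictionary and lists the non-internal interfaces on each bridge. The function names (parse_ovs_show, uplinks) are my own, and labeling interfaces that have no type line as "system" is an assumption of this sketch, not something the output states.

```python
SCENARIO_1 = '''\
bc6b9e64-11d6-415f-a82b-5d8a61ed3fbd
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "eth0"
            Interface "eth0"
    Bridge "br1"
        Port "br1"
            Interface "br1"
                type: internal
        Port "eth1"
            Interface "eth1"
    ovs_version: "1.7.1"
'''


def parse_ovs_show(text):
    """Parse `ovs-vsctl show` output into {bridge: {port: {iface: type}}}."""
    bridges = {}
    bridge = port = iface = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Bridge "):
            bridge = line.split(None, 1)[1].strip('"')
            bridges[bridge] = {}
        elif line.startswith("Port "):
            port = line.split(None, 1)[1].strip('"')
            bridges[bridge][port] = {}
        elif line.startswith("Interface "):
            iface = line.split(None, 1)[1].strip('"')
            # Assumption: no `type:` line means an ordinary device.
            bridges[bridge][port][iface] = "system"
        elif line.startswith("type: "):
            bridges[bridge][port][iface] = line.split(": ", 1)[1]
    return bridges


def uplinks(bridges):
    """Return {bridge: [interfaces that are not internal, gre, etc.]}."""
    return {
        name: [
            iface
            for ifaces in ports.values()
            for iface, kind in ifaces.items()
            if kind == "system"
        ]
        for name, ports in bridges.items()
    }


print(uplinks(parse_ovs_show(SCENARIO_1)))
```

For this scenario the result maps br0 to eth0 and br1 to eth1; eth2 does not appear at all, because it is not attached to any OVS bridge.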

So, in this scenario, how does traffic move into or out of the host?

  1. Traffic from a guest domain: Traffic from a guest domain will travel through the OVS bridge to which it is attached (you’d see an additional “vnet0” port and interface appear on that bridge when you start the guest domain). So, a guest domain attached to br0 would communicate via eth0, and a guest domain attached to br1 would communicate via eth1. No real surprises here.

  2. Traffic from the Linux host: Traffic from the Linux host itself will not communicate over any of the configured OVS bridges, but will instead use its native TCP/IP stack and any configured interfaces. Thus, since eth2 is configured and operational, all traffic to/from the Linux host itself will travel through eth2.

The interesting point (to me, at least) about #2 above is that this includes traffic from the OVS process itself. In other words, if the OVS process(es) need to communicate across the network, they won’t use the bridges; they’ll use whatever interfaces the Linux host uses to communicate. This is one thing that threw me off: because OVS is itself a Linux process, when OVS needs to communicate across the network it will use the Linux network stack to do so. In this scenario, then, OVS would not communicate over any configured bridge, but would instead use eth2. (This makes perfect sense now, but I recall that it didn’t earlier. Maybe it’s just me.)
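To make the “host stack decides” idea concrete, here is a small Python sketch of a longest-prefix-match route lookup. It is only a model of the decision, not how the kernel implements it, and the 10.0.0.0/24 addressing on eth2 and the default route are assumptions I made up for illustration. On a real host, ip route get followed by a destination address shows the kernel’s actual choice.

```python
import ipaddress

# Hypothetical routing table for Scenario 1: only eth2 has an IP address.
# The 10.0.0.0/24 subnet and the default route are illustrative assumptions.
ROUTES = [
    ("10.0.0.0/24", "eth2"),  # connected subnet of eth2
    ("0.0.0.0/0", "eth2"),    # default route via a gateway reachable on eth2
]


def egress_interface(dest, routes):
    """Return the interface of the longest-prefix route that matches dest."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(prefix), dev)
        for prefix, dev in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        raise LookupError(f"no route to {dest}")
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Any process on the host, OVS included, gets the same answer.
for dest in ("10.0.0.50", "192.168.1.100", "8.8.8.8"):
    print(dest, "->", egress_interface(dest, ROUTES))
```

Since every route in this table points at eth2, every destination resolves to eth2, no matter how many OVS bridges exist on the host.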

Scenario 2: Adding Bonding

In this second scenario, our OVS configuration changes only slightly:

bc6b9e64-11d6-415f-a82b-5d8a61ed3fbd
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "bond0"
            Interface "eth0"
            Interface "eth1"
    ovs_version: "1.7.1"

In this case, we’re now leveraging a bond that contains two physical interfaces (eth0 and eth1). (By the way, I have a write-up on configuring OVS and bonds, if you need/want more information.) The eth2 interface still has an IP address assigned and is up and communicating properly. The physical eth3 interface is shut down.

How does this affect the way in which traffic is handled? It doesn’t, really. Traffic from guest domains will still travel across br0 (since this is the only configured OVS bridge), and traffic from the Linux host—including traffic from OVS itself—will still use whatever interfaces are determined by the host’s TCP/IP stack. In this case, that would be eth2.

Scenario 3: The Isolated Bridge

Let’s look at another OVS configuration, the so-called “isolated bridge”. This is a configuration that is commonly found in implementations using NVP, OpenStack, and others, and it’s a configuration that I recently discussed in my post on GRE tunnels and OVS.

Here’s the configuration:

bc6b9e64-11d6-415f-a82b-5d8a61ed3fbd
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "bond0"
            Interface "eth0"
            Interface "eth1"
    Bridge "br-int"
        Port "br-int"
            Interface "br-int"
                type: internal
        Port "gre0"
            Interface "gre0"
                type: gre
                options: {remote_ip="192.168.1.100"}
    ovs_version: "1.7.1"

As with previous configurations, we’ll assume that eth2 is up and operational, and eth3 is shut down. So how does traffic get directed in this configuration?

  1. Traffic from guest domains attached to br0: This is as before; traffic will go out one of the physical interfaces in the bond, according to the bonding configuration (active-backup, LACP, etc.). Nothing unusual here.

  2. Traffic from the Linux host: As before, traffic from processes on the Linux host will travel out according to the host’s TCP/IP stack. There are no changes from previous configurations.

  3. Traffic from guest domains attached to br-int: Now, this is where it gets interesting. Guest domains attached to br-int (named “br-int” because in this configuration the isolated bridge is often called the “integration bridge”) don’t have any physical interfaces they can use; they can only use the GRE tunnel. Here’s the “gotcha”, so to speak: the GRE tunnel is created and maintained by the OVS process, and therefore it uses the host’s TCP/IP stack to communicate across the network. Thus, traffic from guest domains attached to br-int would hit the GRE tunnel, which would travel through eth2.

I’ll give you a second to let that sink in.

Ready now? Good! The key to understanding #3 is, in my opinion, understanding that the tunnel (a GRE tunnel in this case, but the same would apply to a VXLAN or STT tunnel) is created and maintained by the OVS process. Thus, because it is created and maintained by a process on the Linux host (OVS itself), the traffic for the tunnel is directed according to the host’s TCP/IP stack and IP routing table(s). In this configuration, the tunnels don’t travel through any of the configured OVS bridges.
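Here is a rough Python sketch of those two steps: the gre0 port wraps the guest’s frame in an outer IP packet addressed to remote_ip, and then the host’s routing table (not any OVS bridge) picks the interface for that outer packet. The class and function names are hypothetical, and I’m assuming eth2 holds 192.168.1.10/24; only the remote_ip of 192.168.1.100 comes from the configuration above.

```python
from dataclasses import dataclass
import ipaddress


@dataclass
class Frame:
    """Ethernet frame sent by a guest attached to br-int."""
    src_mac: str
    dst_mac: str
    payload: str


@dataclass
class GrePacket:
    """Outer IP packet produced at gre0; the guest frame rides inside."""
    outer_src: str
    outer_dst: str
    inner: Frame


# Assumption for illustration: eth2 holds 192.168.1.10/24.
LOCAL_IP = "192.168.1.10"
REMOTE_IP = "192.168.1.100"  # remote_ip from the gre0 interface options
HOST_ROUTES = [("192.168.1.0/24", "eth2"), ("0.0.0.0/0", "eth2")]


def encapsulate(frame):
    """Model of the gre0 port: wrap the frame in a packet to remote_ip."""
    return GrePacket(outer_src=LOCAL_IP, outer_dst=REMOTE_IP, inner=frame)


def host_egress(packet, routes=HOST_ROUTES):
    """Pick the egress interface for the outer packet by longest prefix."""
    dst = ipaddress.ip_address(packet.outer_dst)
    candidates = [(ipaddress.ip_network(p), dev) for p, dev in routes]
    matching = [(net, dev) for net, dev in candidates if dst in net]
    return max(matching, key=lambda match: match[0].prefixlen)[1]


frame = Frame("52:54:00:aa:bb:01", "52:54:00:aa:bb:02", "ping")
packet = encapsulate(frame)
print("outer destination:", packet.outer_dst)
print("leaves the host via:", host_egress(packet))
```

Notice that br-int never appears in the routing step. The bridge only gets the frame as far as gre0; after that, the outer packet is ordinary host traffic.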

Scenario 4: Leveraging an OVS Internal Interface

Let’s keep ramping up the complexity. For this scenario, we’ll use an OVS configuration that is the same as in the previous scenario:

bc6b9e64-11d6-415f-a82b-5d8a61ed3fbd
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "bond0"
            Interface "eth0"
            Interface "eth1"
    Bridge "br-int"
        Port "br-int"
            Interface "br-int"
                type: internal
        Port "gre0"
            Interface "gre0"
                type: gre
                options: {remote_ip="192.168.1.100"}
    ovs_version: "1.7.1"

The difference, this time, is that we’ll assume that eth2 and eth3 are both shut down. Instead, we’ve assigned an IP address to the br0 interface on bridge br0. OVS internal interfaces, like br0, can appear as “physical” interfaces to the Linux host, and therefore can be assigned IP addresses and used for communication. This is the approach I used in describing how to run host management across OVS.

Here’s how this configuration affects traffic flow:

  1. Traffic from guest domains attached to br0: No change here. Traffic from guest domains attached to br0 will continue to travel across the physical interfaces in the bond (eth0 and eth1, in this case).

  2. Traffic from the Linux host: This time, the only interface that the Linux host has is the br0 internal interface. The br0 internal interface is attached to br0, so all traffic from the Linux host will travel across the physical interfaces attached to the bond (again, eth0 and eth1).

  3. Traffic from guest domains attached to br-int: Because Linux host traffic is directed through br0 by virtue of using the br0 internal interface, this means that tunnel traffic is also directed through br0, as dictated by the Linux host’s TCP/IP stack and IP routing table(s).

As you can see, assigning an IP address to an OVS internal interface has a real impact on the way in which the Linux host directs traffic through OVS. This has both positive and negative impacts:

  • One positive impact is that it allows for Linux host traffic (such as management or tunnel traffic) to take advantage of OVS bonds, thus gaining some level of NIC redundancy.
  • A negative impact is that OVS is now “in band,” so upgrades to OVS will be disruptive to all traffic moving through OVS—which could potentially include host management traffic.
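One way to picture the “in band” distinction is the following Python sketch. Given the interface that carries the host’s IP address, it reports whether host traffic passes through OVS and which physical NICs it can end up on. The BRIDGES table is my own hand-written summary of the Scenario 3 and 4 configuration, and host_path is a hypothetical helper, not anything OVS provides.

```python
# Hand-written summary of the Scenario 3/4 configuration: for each bridge,
# its internal interfaces and the physical members of its bond.
BRIDGES = {
    "br0": {"internal": {"br0"}, "uplinks": ["eth0", "eth1"]},
    "br-int": {"internal": {"br-int"}, "uplinks": []},
}


def host_path(ip_interface, bridges=BRIDGES):
    """Return (in_band, candidate_nics) for host traffic on ip_interface.

    in_band is True when the interface is an OVS internal interface, which
    means the traffic is switched by OVS before reaching a physical NIC.
    """
    for bridge in bridges.values():
        if ip_interface in bridge["internal"]:
            return True, list(bridge["uplinks"])
    # Not an OVS interface: the host uses the device directly.
    return False, [ip_interface]


print("Scenario 3, host IP on eth2:", host_path("eth2"))
print("Scenario 4, host IP on br0: ", host_path("br0"))
```

With the host IP on eth2 the traffic stays out of band and uses eth2 alone; with the host IP on br0 it becomes in band and can use either eth0 or eth1, which is exactly the redundancy-versus-disruption trade-off described above.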

Let’s take a look at one final scenario.

Scenario 5: Using Multiple Bridges and Internal Interfaces

In this configuration, we’ll use an OVS configuration that is very similar to the configuration I showed in my post on GRE tunnels with OVS:

bc6b9e64-11d6-415f-a82b-5d8a61ed3fbd
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "mgmt0"
            Interface "mgmt0"
                type: internal
        Port "bond0"
            Interface "eth0"
            Interface "eth1"
    Bridge "br1"
        Port "br1"
            Interface "br1"
                type: internal
        Port "tep0"
            Interface "tep0"
                type: internal
        Port "bond1"
            Interface "eth2"
            Interface "eth3"
    Bridge "br-int"
        Port "br-int"
            Interface "br-int"
                type: internal
        Port "gre0"
            Interface "gre0"
                type: gre
                options: {remote_ip="192.168.1.100"}
    ovs_version: "1.7.1"

In this configuration, we have three bridges. br0 uses a bond that contains eth0 and eth1; br1 uses a bond that contains eth2 and eth3; and br-int is an isolated bridge with no physical interfaces. We have two “custom” internal interfaces, mgmt0 (on br0) and tep0 (on br1), to which IP addresses have been assigned and which are successfully communicating across the network. We’ll assume that mgmt0 and tep0 are on different subnets, and that tep0 is assigned to the 192.168.1.0/24 subnet.

How does traffic flow in this scenario?

  1. Traffic from guest domains attached to br0: The behavior here is as it has been in previous configurations—guest domains attached to br0 will communicate across the physical interfaces in the bond.

  2. Traffic from the Linux host: As it has been in previous scenarios, traffic from the Linux host is driven by the host’s TCP/IP stack and IP routing table(s). Because mgmt0 and tep0 are on different subnets, traffic from the Linux host will go out either br0 (for traffic moving through mgmt0) or br1 (for traffic moving through tep0), and thus will utilize the corresponding physical interfaces in the bonds on those bridges.

  3. Traffic from guest domains attached to br-int: Because the GRE tunnel’s remote endpoint (192.168.1.100) is on the 192.168.1.0/24 subnet, traffic for the GRE tunnel (which is created and maintained by the OVS process on the Linux host itself) will travel through tep0, which is attached to br1. Thus, the physical interfaces eth2 and eth3 would be leveraged for the GRE tunnel traffic.
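Putting the pieces together for this scenario, here is a Python sketch that first does a route lookup to find the internal interface and then follows that interface to its bridge and bond members. The tep0 subnet comes from the scenario; the 10.1.1.0/24 subnet for mgmt0 and the default route through mgmt0 are assumptions I added so the table is complete, and all the names are hypothetical.

```python
import ipaddress

# tep0's subnet is from the scenario. mgmt0's subnet and the default route
# are illustrative assumptions.
ROUTES = [
    ("192.168.1.0/24", "tep0"),
    ("10.1.1.0/24", "mgmt0"),
    ("0.0.0.0/0", "mgmt0"),
]

# Which bridge each internal interface sits on, and each bridge's bond members.
IFACE_TO_BRIDGE = {"mgmt0": "br0", "tep0": "br1"}
BOND_MEMBERS = {"br0": ["eth0", "eth1"], "br1": ["eth2", "eth3"]}


def route_lookup(dest, routes=ROUTES):
    """Return the interface of the longest-prefix route that matches dest."""
    addr = ipaddress.ip_address(dest)
    candidates = [(ipaddress.ip_network(p), dev) for p, dev in routes]
    matching = [(net, dev) for net, dev in candidates if addr in net]
    return max(matching, key=lambda match: match[0].prefixlen)[1]


def physical_path(dest):
    """Return (internal interface, bridge, physical NICs) used to reach dest."""
    dev = route_lookup(dest)
    bridge = IFACE_TO_BRIDGE[dev]
    return dev, bridge, BOND_MEMBERS[bridge]


print("GRE remote_ip:", physical_path("192.168.1.100"))
print("management peer:", physical_path("10.1.1.20"))
```

The GRE endpoint resolves to tep0 on br1 (eth2 and eth3), while the hypothetical management peer resolves to mgmt0 on br0 (eth0 and eth1). Changing which subnet tep0 sits on is what steers the tunnel onto a different pair of NICs.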

Summary

The key takeaway from this post, in my mind, is understanding where traffic originates, and separating the idea of OVS as a switching mechanism (to handle guest domain traffic) from the idea of OVS as a Linux host process (to create and maintain tunnels between hosts).

Hopefully this information is helpful. I am, of course, completely open to your comments, questions, and corrections, so feel free to speak up in the comments below. Courteous comments are always welcome!

  1. Sascha

    Good post, thanks. So to iterate on my question from your previous post, if you have an isolated bridge, it is not really isolated but does eventually hit the host’s IP stack and all of its routing, right? Otherwise gre0 traffic wouldn’t be able to reach the tep0 interface. I don’t see why they call this an “isolated” bridge then.

    Or is it hitting tep0 because that belongs to another OVS instance?

  2. Lennie

    It is very good you are explaining this to people.

    But for the people that found this too easy, maybe you can add some network namespaces, VM-to-VM routing, iptables, firewalling, NAT and VLANs as well. Mix it up a little. :-)

    Or just try these things and see if you come across any strange things you might not have expected and make a quiz out of it. So the commenters can try and figure out what is going on.

    What is on my todo list is to see what is possible in OpenStack, which configurations are supported. And making a choice which one I want to use.

    What I’m hoping to do is use some of the DOVE extensions of VXLAN to prevent broadcasts. The extensions are available in Linux 3.8, like on Ubuntu 13.04, but I don’t know if there are any existing open source components that can use them. I doubt I’ll have time to write any code for that myself.

    Haven’t looked at the security groups support in OpenStack either.

    So much still left to do, so little time, but at least it’s fun stuff. :-)

  3. slowe

    Sascha, I’m the one that called it an isolated bridge, and that’s because it has no physical interfaces associated with it. I apologize if my wording threw you off. The reason the traffic hits the host’s IP stack is because of the tunnel interface. Without the tunnel interface, the bridge would truly be isolated—not able to communicate outside the host at all. The purpose of the tep0 interface is simply to control which NICs the tunnel endpoint uses. Because tep0 is utilized by the host’s IP stack, and because the tunnel interface connects the bridge to the host’s IP stack, that’s what allows the traffic to flow from the isolated bridge through tep0. You could just as easily have used a physical interface for the tunnel endpoint instead of an OVS internal interface.

    Lennie, more “complicated” configurations are on the way. First, though, I need to establish the correct base understanding upon which I can build more in-depth configurations that leverage things like VLANs, network namespaces, source routing, and similar. Patience, my friend…patience. :-)

  4. Lennie

    @Sascha I think this might make it clear:

    Let’s say the guest has a port called vnet0 which is connected to an OVS bridge br-int. And a GRE tunnel is created called gre0 and it is also connected to br-int.

    As you know, a switch and a bridge are pretty much the same thing.

    The OVS bridge uses MAC learning like a normal switch.

    When traffic comes onto the bridge from the guest through vnet0, the bridge will look at the forwarding table and might decide that the MAC-address of the destination is on gre0. In that case the traffic is forwarded to gre0.

    At gre0 it gets encapsulated with a GRE-header. The GRE-tunnel is handled by the host.

    The host just routes the GRE tunnel packets to the remote_ip, where they get unpacked and delivered on another bridge which will hopefully know what to do with them and deliver them at the right port, which is probably connected to a VM.

  5. Sascha

    No need to apologize, Scott. I was just confused, plus I am not a native English speaker :) It makes more sense now to me, thanks. So the gre0 tunnel interface is actually the one connecting to the host’s IP stack, not the bridge per se.

    Exciting stuff. And looking forward to the more complicated stuff :D

    Also, I think it’s time to set up an OpenStack lab.

  6. Sascha

    Thanks Lennie, I got it. :)

  7. Pasquale

    You lied and I am very sad :/

    I am in scenario 4: I have assigned an IP address to br0, which is 192.168.1.2.
    Traffic works fine.

    Now, I created a GRE tunnel for br-int: it is directed to 192.168.1.5, which is another PC on the network on which I have another Open vSwitch.

    Well, mate, if I ping 192.168.1.2 from my VM (with a static IP address of 192.168.1.9, so the same subnet, as you say) it doesn’t work! It fails!

  8. Badiane

    I think your articles are very interesting, especially the ones dealing with containers. I have been using containers in some form or other in Linux since around 2001-2003, starting with Vservers and later on OpenVZ.

    Also, though I’ve known of OVswitch, I’ve never used it.

    I wanted to ask if you’ve ever read the LARTC and investigated all that iproute has to offer and how it works?

    I personally think more work needs to be done on increasing the performance of macvlan devices, which I tend to use instead of bridges for containers. Also, if you are looking for isolation, that may be the better route.
    https://blog.flameeyes.eu/tag/macvlan/ ( some of the articles are a bit dated but quite good)
    http://backreference.org/2014/03/20/some-notes-on-macvlanmacvtap/
    http://130.238.130.111/seminars/workshop-2011-03-31/virtual_router_performance.pdf (This is the area of testing which has led to my “increasing performance” statement. Of course I don’t know if the identified problem has been solved in the kernel since the recent popularity of containers, but since most people tend to use bridges (the old-fashioned way) instead of using the newer mechanisms, I wouldn’t hold on to the notion that it might still be a problem.)
    https://encrypted.google.com/books?id=l8TQcfQGLy8C&pg=PA380&lpg=PA380&dq=%22macvlan%22+%22performance%22&source=bl&ots=Jnf968sCb-&sig=hzH8k8El_YNC3ATnybxQ1RwWwtE&hl=en&sa=X&ei=ZfSBU-zqA9XJsQTp_4CYAw&ved=0CCgQ6AEwADgK#v=onepage&q=%22macvlan%22%20%22performance%22&f=false

    Have you ever looked at the routing tables and rules that OVS creates? I was wondering about that.

    I just noticed that Halchengsun posted something along the lines of this message. I will still continue to post it anyway.

    http://www.bertera.it/index.php/2011/10/04/howto-configure-multiple-mac-address-over-a-single-ethernet-interface/
    http://blog.codeaholics.org/2013/giving-dockerlxc-containers-a-routable-ip-address/

    I’m sure that you’ve had the opportunity to explore many of these.

    Thanks again for your articles.

  9. Dashun Sun

    Good job! This really clarifies several points in my mind. I still have a question though: if the GRE traffic is channeled through the regular Linux TCP/IP stack, then why are veth pairs still necessary? I see them used a lot by OpenStack Neutron.

    Thanks in advance!

  10. slowe

    Dashun, in OpenStack at least, veth pairs are used because Open vSwitch (OVS) doesn’t support applying iptables rules against OVS ports/interfaces. Therefore, to make security groups work, you need to use veth pairs and apply the iptables rules against the veth interface. (That’s my understanding. I’m happy to be corrected if that’s incorrect.)
