A Use Case for Policy Routing with KVM and Open vSwitch

In an earlier post, I provided an introduction to policy routing as implemented in recent versions of Ubuntu Linux (and possibly other distributions as well), and I promised that a future post would cover a practical application. This post is that follow-up: it looks at how (and why) you would use Linux policy routing in an environment running OVS and a Linux hypervisor (I’ll assume KVM for the purposes of this post).

Before I get into the “how,” let’s first discuss the “why.” Let’s assume that you have a KVM+OVS environment and are leveraging tunnels (GRE or otherwise) for some guest domain traffic. Recall from my post on traffic patterns with Open vSwitch that tunnel traffic is generated by the OVS process itself, so the Linux host’s IP routing table determines which interface that tunnel traffic will use. But what if you need the tunnel traffic to be handled differently from the host’s management traffic? What if you need a default route for tunnel traffic that uses one interface, but a different default route for your separate management network that uses its own interface? This is where policy routing comes in. Using source routing (i.e., policy routing based on the source address of the traffic), you can define a separate table for tunnel traffic with its own default route while still allowing management traffic to use the host’s main routing table.

Let’s take a look at how it’s done. In this example, I’ll make the following assumptions:

  • I’ll assume that you’re running host management traffic through OVS, as I outlined here. I’ll use the name mgmt0 to refer to the OVS interface that carries that management traffic. We’ll use the IP address 192.168.100.10 for the mgmt0 interface.
  • I’ll assume that you’re running tunnel traffic through an OVS interface named tep0. (This helps provide some consistency with my walk-through on using GRE tunnels with OVS.) We’ll use the IP address 192.168.200.10 for the tep0 interface.
  • I’ll assume that the default gateway on each subnet uses the .1 address on that subnet.

With these assumptions out of the way, let’s look at how you would set this up.
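
Before making anything persistent, it may help to see the end state in raw iproute2 terms: a rule that matches traffic sourced from the tep0 address, plus a default route in a dedicated table. Run by hand (once the “tunnel” table name has been defined, which is the next step), it would look roughly like this:

# send anything sourced from the tunnel endpoint address to the "tunnel" table
ip rule add from 192.168.200.10/32 table tunnel
# give that table its own default route out the tep0 interface
ip route add default via 192.168.200.1 dev tep0 table tunnel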

First, you’ll create a custom policy routing table, as outlined here. I’ll use the name “tunnel” for my new table:

echo 200 tunnel >> /etc/iproute2/rt_tables
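
The table ID (200 in this case) is arbitrary; anything that doesn’t collide with an existing entry or with the reserved IDs (0, 253, 254, and 255) will do. If you want to sanity-check the result, the file is plain text:

grep tunnel /etc/iproute2/rt_tables
# should print the line added above: 200 tunnel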

Next, you’ll need to modify /etc/network/interfaces for the tep0 interface so that a custom policy routing rule and custom route are installed whenever this interface is brought up. The new configuration stanza would look something like this:

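This is a minimal sketch rather than a drop-in configuration: it assumes that tep0 already exists as an OVS internal port (per the GRE walk-through), that the tep0 subnet is a /24, and that you’re using the stock ifupdown post-up/pre-down hooks. Adjust names, addresses, and netmask to match your environment:

auto tep0
iface tep0 inet static
    address 192.168.200.10
    netmask 255.255.255.0
# send traffic sourced from the tep0 address to the "tunnel" table, then give
# that table a connected route and its own default route
    post-up ip rule add from 192.168.200.10/32 table tunnel
    post-up ip route add 192.168.200.0/24 dev tep0 table tunnel
    post-up ip route add default via 192.168.200.1 dev tep0 table tunnel
# clean up the rule and routes when the interface is taken down
    pre-down ip route del default via 192.168.200.1 dev tep0 table tunnel
    pre-down ip route del 192.168.200.0/24 dev tep0 table tunnel
    pre-down ip rule del from 192.168.200.10/32 table tunnel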

Finally, you’ll want to ensure that mgmt0 is properly configured in /etc/network/interfaces. No special configuration is required there, just the use of the gateway directive to install the default route. Ubuntu will install the default route into the main table automatically, making it a “system-wide” default route that will be used unless a policy routing rule dictates otherwise.
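
For completeness, the mgmt0 stanza is just an ordinary static configuration. A minimal sketch, again assuming a /24, might look like this:

auto mgmt0
iface mgmt0 inet static
    address 192.168.100.10
    netmask 255.255.255.0
    gateway 192.168.100.1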

With this configuration in place, you now have a system that:

  • Can communicate via mgmt0 with other systems in other subnets via the default gateway of 192.168.100.1.
  • Can communicate via tep0 to establish tunnels with other hypervisors in other subnets via the 192.168.200.1 gateway.
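
If you’d like to verify that the split is working, a few standard iproute2 commands will show what’s going on (the destination in the last command is just an example address from the documentation range):

# the "from 192.168.200.10" rule should appear ahead of the main table
ip rule show
# the custom table should contain only the tunnel-specific routes
ip route show table tunnel
# ask the kernel which route it would choose for traffic sourced from tep0
ip route get 203.0.113.10 from 192.168.200.10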

This configuration requires only the initial setup (which could, quite naturally, be automated via a tool like Puppet) and does not require adding more routes as the environment scales to include new subnets for other hypervisors (either for management or tunnel traffic). Thus, organizations can follow recommended practices for building scalable L3 networks with reasonably sized L2 domains without sacrificing connectivity to/from the hypervisors in the environment.

(By the way, this is something that is not easily accomplished in the vSphere world today. ESXi has only a single routing table for all VMkernel interfaces, which means that management traffic, vMotion traffic, VXLAN traffic, etc., are all bound by that single routing table. To achieve full L3 connectivity, you’d have to install specific routes into the VMkernel routing table on each ESXi host. When additional subnets are added for scale, each host would have to be touched to add the additional route.)
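
To make the contrast concrete, each new remote subnet means running something along these lines on every ESXi host (the remote subnet shown here is hypothetical, and the exact syntax varies by ESXi version):

esxcli network ip route ipv4 add --network 192.168.210.0/24 --gateway 192.168.200.1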

Hopefully this gives you an idea of how Linux policy routing could be effectively used in environments leveraging virtualization, OVS, and overlay protocols. Feel free to add your thoughts, ideas, corrections, or questions in the comments below. Courteous comments are always welcome! (Please disclose vendor affiliations where applicable.)


  1. Lennie

    In your Gist and your previous article, you create the rule first and add the routes to the table it points to after that. When you do that, you point the rule at a non-existent table.

    Personally I tend to create things in reverse order of dependency.

    So: first everything that has no dependencies, then the things that logically depend on those, and so on.

    So I would add the entry to rt_tables first, then create the table, and after that the rule that points to it.

    The reason for this is that it might work fine now, but it might not keep working in a newer version, or it might not work on all systems if you have multiple versions deployed.

  2. slowe

    Lennie, I’m not sure I follow. The table is already created (it exists in /etc/iproute2/rt_tables), and it is automatically recreated on reboot as a result. Therefore, the order of adding the rule versus populating the routes shouldn’t matter, as the table already exists. Or am I missing something?