Interoperability

This category contains posts that focus on interoperability between various technologies or products, with an emphasis on the technical details of resolving interoperability issues.

A new startup emerged from stealth today, a company called Platform9. Platform9 was launched by former VMware veterans with the goal of making it easy for companies to consume their existing infrastructure in an agile, cloud-like fashion. Platform9 seeks to accomplish this by offering a cloud management platform that is itself provided as a cloud-based service—hence the name of this post, “cloud-hosted cloud management.”

It’s an interesting approach, and it certainly helps eliminate some of the complexity that organizations face when implementing their own cloud management platform. For now, at least, that is especially true for OpenStack, which can be notoriously difficult for newcomers to deploy and operate. By offering an OpenStack API-compatible service, Platform9 gives organizations that want a more “public cloud-like” experience a way to get it without all the added hassle.

The announcements for Platform9 talk about support for KVM, vSphere, and Docker, though the product will GA with only KVM support (support for vSphere and Docker is on the roadmap). Networking support is also limited; in the initial release, Platform9 will look for Linux bridges with matching names in order to stitch together networks. However, customers will get an easy, non-disruptive setup with a nice set of dashboards to help show how their capacity is being utilized and allocated.

It will be interesting to see how things progress for Platform9. The idea of providing cloud management via a SaaS model (makes me think of “cloud inception”) is an interesting one that does sidestep many adoption hurdles, though questions of security, privacy, confidentiality, etc., may still hinder adoption in some environments.

Thoughts on Platform9? Feel free to speak up in the comments below. All courteous comments are welcome!


In this post, I’ll show you how I got Arista’s vEOS software running under KVM to create a virtualized Arista switch. There are a number of other articles that help provide instructions on how to do this, but none of those that I found included the use of libvirt and/or Open vSwitch (OVS).

In order to run vEOS, you must first obtain a copy of vEOS. I can’t provide you with a copy; you’ll have to register on the Arista Networks site (see here) in order to gain access to the download. The download consists of two parts:

  1. The Aboot ISO, which contains the boot loader
  2. The vEOS disk image, provided as a VMware VMDK

Both of these are necessary; you can’t get away with just one or the other. Further, although the vEOS disk image is provided as a VMware VMDK, KVM/QEMU is perfectly capable of using the VMDK without any conversion required (this is kind of nice).

Once you’ve downloaded these files, you can use the following libvirt domain XML definition to create a VM for running Arista vEOS (you’d use a command like virsh define <filename>).
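
In case the code block doesn’t render for you, here’s a representative sketch of the sort of domain XML I’m describing (memory sizing, file paths, and the libvirt network name are placeholders you’d adjust for your environment; the key points are called out in the list below):

<domain type='kvm'>
  <name>veos01</name>
  <memory unit='MiB'>2048</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
    <!-- Boot from the Aboot ISO first, then the vEOS flash image -->
    <boot dev='cdrom'/>
    <boot dev='hd'/>
  </os>
  <devices>
    <!-- Aboot ISO attached as an IDE CD-ROM (disk format "raw") -->
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/Aboot-veos.iso'/>
      <target dev='hdc' bus='ide'/>
    </disk>
    <!-- vEOS flash image attached as an IDE disk (disk format "vmdk"); no conversion needed -->
    <disk type='file' device='disk'>
      <driver name='qemu' type='vmdk'/>
      <source file='/var/lib/libvirt/images/vEOS.vmdk'/>
      <target dev='hda' bus='ide'/>
    </disk>
    <!-- Three e1000 NICs attached to OVS via a libvirt network and its "trunked" portgroup -->
    <interface type='network'>
      <source network='ovs-network' portgroup='trunked'/>
      <model type='e1000'/>
    </interface>
    <interface type='network'>
      <source network='ovs-network' portgroup='trunked'/>
      <model type='e1000'/>
    </interface>
    <interface type='network'>
      <source network='ovs-network' portgroup='trunked'/>
      <model type='e1000'/>
    </interface>
    <graphics type='vnc' port='-1' autoport='yes'/>
  </devices>
</domain>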

(Click here if you can’t see the code block above.)

There are a few key things to note about this libvirt domain XML:

  • Note the boot order; the VM must boot from the Aboot ISO first.
  • Both the Aboot ISO as well as the vEOS VMDK are attached to the VM as devices, and you must use an IDE bus. Arista vEOS will refuse to boot if you use a SCSI device, so make sure there are no SCSI devices in the configuration. Pay particular attention to the type= parameters that specify the correct disk formats for the ISO (type “raw”) and VMDK (type “vmdk”).
  • For the network interfaces, you’ll want to be sure to use the e1000 model.
  • This example XML definition includes three different network interfaces. (More are supported; up to 7 interfaces on QEMU/KVM.)
  • This XML definition leverages libvirt integration with OVS so that libvirt automatically attaches VMs to OVS and correctly applies VLAN tagging and trunking configurations. In this case, the network interfaces are attaching to a portgroup called “trunked”; this portgroup trunks VLANs up to the guest domain (the vEOS VM, in this case). In theory, this should allow the vEOS VM to support VLAN trunk interfaces, although I had some issues making this work as expected and had to drop back to tagged interfaces.
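
For reference, a trunked portgroup in a libvirt network definition looks roughly like this (a sketch; the VLAN IDs are just examples):

<portgroup name='trunked'>
  <vlan trunk='yes'>
    <tag id='10'/>
    <tag id='20'/>
  </vlan>
</portgroup>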

Once you have the guest domain defined, you can start it by using virsh start <guest domain name>. The first time it boots, it will take a long time to come up. (A really long time—I watched it for a good 10 minutes before finally giving up and walking away to do something else. It was up when I came back.) According to the documentation I’ve found, this is because EOS needs to make a backup copy of the flash partition (which in this case is the VMDK disk image). It might be quicker for you, but be prepared for a long first boot just in case.

Once it’s up and running, use virsh vncdisplay to get the VNC display of the vEOS guest domain, then use a VNC viewer to connect to the guest domain’s console. You won’t be able to SSH in yet, as all the network interfaces are still unconfigured. At the console, set an IP address on the Management1 interface (which will correspond to the first virtual network interface defined in the libvirt domain XML) and then you should have network connectivity to the switch for the purposes of management. Once you create a username and a password, then you’ll be able to SSH into your newly-running Arista vEOS switch. Have fun!
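
The console session for that initial configuration looks something along these lines (a sketch; EOS uses an IOS-like CLI, the default admin account has no password, and the IP address shown is just an example):

localhost login: admin
localhost> enable
localhost# configure terminal
localhost(config)# interface Management1
localhost(config-if-Ma1)# ip address 192.168.100.15/24
localhost(config-if-Ma1)# exit
localhost(config)# username admin secret <password>
localhost(config)# end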

For additional information and context, here are some links to other articles I found on this topic while doing some research:

If you have any questions or need more information, feel free to speak up in the comments below. All courteous comments are welcome!


In April of this year, we started a series of articles at Network Heresy on the topic of policy in the data center. The first of these articles, which I mentioned in this post, focused on the problem of policy in the data center. This was a great introduction to the need for policy and the challenges with the current ways of addressing policy in the data center.

A short while ago, we published the second of our series on policy, titled “On Policy in the Data Center: The solution space”. This post describes the key features/functionality that a policy system must have to address the challenges identified in part 1 of the series. In a nutshell (I highly recommend you go read the full article), these key areas include:

  • The sources from which policy is derived
  • The language(s) used to express policy
  • The way policy systems interact with data center services
  • The actions a policy system can take

I really liked this statement from the article (this is in reference to how a policy system interacts with other services in the data center):

A policy system by itself is useless; to have value, the policy system must interact and integrate with other data center or cloud services.

The relationship between a policy system and the ecosystem of data center services with which it interacts is so critical. Having a policy system is great, but if the policy system can’t be integrated with other data center or cloud services, then it’s not very useful, is it?

Go have a look at the second post in the series on policy in the data center and feel free to join in the conversation. You can leave comments here or at the Network Heresy site.


In an earlier post, I provided an introduction to OpenStack Heat, and provided an example Heat template that launched two instances with a logical network and a logical router. Here I am going to provide another view of a Heat template that does the same thing, but uses YAML and the HOT format instead of JSON and the CFN format.

Here’s the full template (click here if the code box below isn’t showing up):
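
In case the code box doesn’t render for you, here’s a representative version of the template in HOT format (as with the earlier JSON template, the provider network UUID, image, and flavor are placeholders for values from my environment):

heat_template_version: 2013-05-23

description: Two instances on a private logical network behind a logical router

resources:
  app_net:
    type: OS::Neutron::Net
    properties:
      name: app_net

  app_subnet:
    type: OS::Neutron::Subnet
    properties:
      network_id: { get_resource: app_net }
      cidr: 10.10.10.0/24
      dns_nameservers: [ 8.8.8.8 ]
      enable_dhcp: true
      gateway_ip: 10.10.10.254

  app_router:
    type: OS::Neutron::Router

  app_router_gateway:
    type: OS::Neutron::RouterGateway
    properties:
      router_id: { get_resource: app_router }
      network_id: <provider network UUID>

  app_router_interface:
    type: OS::Neutron::RouterInterface
    properties:
      router_id: { get_resource: app_router }
      subnet_id: { get_resource: app_subnet }

  instance0_port:
    type: OS::Neutron::Port
    properties:
      network_id: { get_resource: app_net }
      security_groups: [ default ]

  instance1_port:
    type: OS::Neutron::Port
    properties:
      network_id: { get_resource: app_net }
      security_groups: [ default ]

  instance0:
    type: OS::Nova::Server
    properties:
      flavor: m1.xsmall
      image: <Glance image UUID>
      networks:
        - port: { get_resource: instance0_port }

  instance1:
    type: OS::Nova::Server
    properties:
      flavor: m1.xsmall
      image: <Glance image UUID>
      networks:
        - port: { get_resource: instance1_port }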

I won’t walk through the whole template again, but rather just talk briefly about a couple of the differences between this YAML-encoded template and the earlier JSON-encoded template:

  • You’ll note the syntax is much simpler. JSON can trip you up on commas and such if you’re not careful; YAML is simpler and cleaner.
  • You’ll note the built-in functions are different, as I pointed out in my first Heat post. Instead of using Ref to refer to an object defined elsewhere in the template, HOT uses get_resource.

Aside from these differences, you’ll note that the resource types and properties match between the two; this is because resource types are separate and independent from the template format.

Feel free to post any questions, corrections, or clarifications in the comments below. Thanks for reading!


In this post, I’m going to provide a quick introduction to OpenStack Heat, the orchestration service that allows you to spin up multiple instances, logical networks, and other cloud services in an automated fashion. Note that this is only an introductory post—I’m not an expert on Heat, but I did want to share at least some basic information to help others get started as well.

Let’s start with some terminology, so that there is no confusion about the terms later when we start using them in specific examples:

  • Stack: In Heat parlance, a stack is the collection of objects—or resources—that will be created by Heat. This might include instances (VMs), networks, subnets, routers, ports, router interfaces, security groups, security group rules, auto-scaling rules, etc.
  • Template: Heat uses the idea of a template to define a stack. If you wanted to have a stack that created two instances connected by a private network, then your template would contain the definitions for two instances, a network, a subnet, and two network ports. Since templates are central to how Heat operates, I’ll show you examples of templates in this post.
  • Parameters: A Heat template has three major sections, and one of those sections defines the template’s parameters. These are tidbits of information—like a specific image ID, or a particular network ID—that are passed to the Heat template by the user. This allows users to create more generic templates that could potentially use different resources.
  • Resources: Resources are the specific objects that Heat will create and/or modify as part of its operation, and the second of the three major sections in a Heat template.
  • Output: The third and last major section of a Heat template is the output, which is information that is passed to the user, either via OpenStack Dashboard or via the heat stack-list and heat stack-show commands.
  • HOT: Short for Heat Orchestration Template, HOT is one of two template formats used by Heat. HOT is not backwards-compatible with AWS CloudFormation templates and can only be used with OpenStack. Templates in HOT format are typically—but not necessarily required to be—expressed as YAML (more information on YAML here). (I’ll do my best to avoid saying “HOT template,” as that would be redundant, wouldn’t it?)
  • CFN: Short for AWS CloudFormation, this is the second template format that is supported by Heat. CFN-formatted templates are typically expressed in JSON (see here and see my non-programmer’s introduction to JSON for more information on JSON specifically).

OK, that should be enough to get us going. (BTW, the OpenStack Heat documentation actually has a really good glossary. Please note that this link might break as OpenStack development continues.)

Architecturally, Heat has a few major components:

  • The heat-api component implements an OpenStack-native RESTful API. This component processes API requests by sending them to the Heat engine via AMQP.
  • The heat-api-cfn component provides an API compatible with AWS CloudFormation, and also forwards API requests to the Heat engine over AMQP.
  • The heat-engine component provides the main orchestration functionality.

All of these components would typically be installed on an OpenStack “controller” node that also housed the API servers for Nova, Glance, Neutron, etc. As far as I know, though, there is nothing that requires them to be installed on the same system. Like most of the rest of the OpenStack services, Heat uses a back-end database for maintaining state information.

Now that you have an idea about Heat’s architecture, I’ll walk you through an example template that I created and tested on my own OpenStack implementation (running OpenStack Havana on Ubuntu 12.04 with KVM and VMware NSX). Here’s the full template:
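
In case the embedded code doesn’t render for you, here’s a representative version of the template (the provider network UUID and Glance image ID are placeholders for values from my environment):

{
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "app_net": {
            "Type": "OS::Neutron::Net",
            "Properties": {
                "name": "app_net"
            }
        },
        "app_subnet": {
            "Type": "OS::Neutron::Subnet",
            "Properties": {
                "network_id": { "Ref": "app_net" },
                "cidr": "10.10.10.0/24",
                "dns_nameservers": ["8.8.8.8"],
                "enable_dhcp": true,
                "gateway_ip": "10.10.10.254"
            }
        },
        "app_router": {
            "Type": "OS::Neutron::Router"
        },
        "app_router_gateway": {
            "Type": "OS::Neutron::RouterGateway",
            "Properties": {
                "router_id": { "Ref": "app_router" },
                "network_id": "<provider network UUID>"
            }
        },
        "app_router_interface": {
            "Type": "OS::Neutron::RouterInterface",
            "Properties": {
                "router_id": { "Ref": "app_router" },
                "subnet_id": { "Ref": "app_subnet" }
            }
        },
        "instance0_port": {
            "Type": "OS::Neutron::Port",
            "Properties": {
                "network_id": { "Ref": "app_net" },
                "security_groups": ["default"]
            }
        },
        "instance1_port": {
            "Type": "OS::Neutron::Port",
            "Properties": {
                "network_id": { "Ref": "app_net" },
                "security_groups": ["default"]
            }
        },
        "instance0": {
            "Type": "OS::Nova::Server",
            "Properties": {
                "flavor": "m1.xsmall",
                "image": "<Glance image UUID>",
                "networks": [{ "port": { "Ref": "instance0_port" } }]
            }
        },
        "instance1": {
            "Type": "OS::Nova::Server",
            "Properties": {
                "flavor": "m1.xsmall",
                "image": "<Glance image UUID>",
                "networks": [{ "port": { "Ref": "instance1_port" } }]
            }
        }
    }
}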

(Can’t see the code above? Click here.)

Let’s walk through this template real quick:

  • First, note that I’ve specified the template version as “AWSTemplateFormatVersion”. One thing that confused me at first was the relationship between the template format (CFN vs. HOT) and resource types. It turns out these are independent of one another; you can—as I have done here—use HOT resource types (like OS::Neutron::Net) in a CFN template. Obviously, if you use HOT resources you’re not fully compatible with AWS. Also, as I stated earlier, CFN templates are typically expressed in JSON (as mine is). Heat does support YAML for CFN templates, although again you’d be sacrificing AWS compatibility.
  • You’ll note that my template skips any use of parameters and goes straight to resources. This is perfectly acceptable, although it means that some values (like the shared public provider network to which the logical router uplinks and the security group) have to be hard-coded in the template.
  • One thing that the template format does control is some of the syntax. So, for example, you’ll note the template uses “Resources”, “Type”, and “Properties.” In the HOT format, these keys are lowercase (“resources”, “type”, “properties”).
  • The first resource defined is a logical network, defined as type OS::Neutron::Net.
  • The next resource is a subnet (of type OS::Neutron::Subnet), which is associated with the previously-defined logical network through the use of the Ref built-in function on line 20. Built-in functions are another thing controlled by the template format, so when you want to refer to another object in a CFN template, you’ll use the Ref function as I did here. This associates the “network_id” property of the subnet with the logical network defined just prior. You’ll also note that the subnet resource has a number of properties associated with it—CIDR, DNS name servers, DHCP, and gateway IP address.
  • The third resource defined is a logical router.
  • After the logical router is defined, the template links the logical router to a pre-existing provider network via the OS::Neutron::RouterGateway type. (This was deprecated in Icehouse in favor of an external_gateway_info property on the logical router.) The UUID listed there is the UUID of a pre-existing provider network. Note the use of the Ref function again to link this resource back to the logical router.
  • Next up the template creates an interface on the logical router, using two Ref instances to link this router interface back to the logical router and the subnet created earlier. This means we are adding an interface to the referenced logical router on the specified subnet (and that interface will assume the IP address specified by the “gateway_ip” property on the subnet).
  • Next the template creates two Neutron ports, and links them to the default security group. Note that if you don’t specify a security group when creating the Neutron port, it will have none—and no traffic will pass.
  • Finally, the Heat template creates two instances (type OS::Nova::Server), using the “m1.xsmall” flavor and a hard-coded Glance image ID. These instances are connected to the Neutron ports created earlier using the Ref function once more.

(In case it wasn’t obvious already, you can’t just copy-and-paste this Heat template and use it in your own environment, as it references UUIDs for objects in my environment that won’t be the same.)

If you are going to use JSON (as I have here), then I’d recommend bookmarking a JSON validation site, such as jsonlint.com.

Once you have your Heat template defined, you can then use this template to create a stack, either via the heat CLI client or via the OpenStack Dashboard. I’ll attach a screenshot from a stack that I deployed via the Dashboard so that you can see what it looks like (click the image for a larger version):

A deployed Heat stack in OpenStack Dashboard
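
If you’d rather use the heat CLI than the Dashboard, creating and inspecting the stack looks something like this (Havana-era client syntax; the stack name and template filename are whatever you choose):

heat stack-create mystack -f two-tier-stack.json
heat stack-list
heat stack-show mystack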

Kinda nifty, don’t you think? Anyway, I hope this brief introduction to OpenStack Heat has proven useful. I do plan on covering some additional topics with OpenStack Heat in the near future, so stay tuned. In the meantime, if you have any questions, corrections, or clarifications, I invite you to add them to the comments below.


Reader Brian Markussen—with whom I had the pleasure to speak at the Danish VMUG in Copenhagen earlier this month—brought to my attention an issue between VMware vSphere’s health check feature and Cisco UCS when using Cisco’s VIC cards. His findings, confirmed by VMware support and documented in this KB article, show that the health check feature doesn’t work properly with Cisco UCS and the VIC cards.

Here’s a quote from the KB article:

The distributed switch network health check, including the VLAN, MTU, and teaming policy check can not function properly when there are hardware virtual NICs on the server platform. Examples of this include but are not limited to Broadcom Flex10 systems and Cisco UCS systems.

(Ignore the fact that “UCS systems” is redundant.)

According to Brian, a fix for this issue will be available in a future update to vSphere. In the meantime, there doesn’t appear to be any workaround, so plan accordingly.


Most IT vendors agree that more extensive use of automation and orchestration in today’s data centers is beneficial to customers. The vendors may vary in their approach to providing this automation and orchestration—some may prefer to do it in software (VMware would be one of these, along with other software companies like Microsoft and Red Hat), while others want to do it in hardware. There are advantages and disadvantages to each approach, naturally, and customers need to evaluate the various solutions against their own requirements to find the best fit.

However, the oft-overlooked problem that more extensive use of automation and orchestration creates is one of control—specifically, how customers can control this automation and orchestration according to their own specific policy. A recent post on the Network Heresy site discusses the need for policy in fully automated IT environments:

However, fully automated IT management is a double-edged sword. While having people on the critical path for IT management was time-consuming, it provided an opportunity to ensure that those resources were managed sensibly and in a way that was consistent with how the business said they ought to be managed. In other words, having people on the critical path enabled IT resources to be managed according to business policy. We cannot simply remove those people without also introducing a way of ensuring that IT resources retain the same level of policy compliance.

VMware, along with a number of other companies, has launched an open source effort to address this challenge: finding a way to enable customers to manage their resources according to their business policy, and do so in a cloud-agnostic way. This effort is called Congress, and it has received some attention from those who think it’s a critical project. I’m really excited to be involved in this project, and I’m equally excited to be working with some extremely well-respected individuals across a number of different companies (this is most definitely not a VMware-only project). I believe that creating an open source solution to the policy problem will further the cause of cloud computing and the transformation of our industry. I strongly urge you to read this first post, titled “On Policy in the Data Center: The policy problem”, and stay tuned for future blog posts that will dive into even greater detail. Exciting times are ahead!


It seems as if APIs are popping up everywhere these days. While this isn’t a bad thing, it does mean that IT professionals need to have a better understanding of how to interact with these APIs. In this post, I’m going to discuss how to use the popular command line utility curl to interact with a couple of RESTful APIs—specifically, the OpenStack APIs and the VMware NSX API.

Before I go any further, I want to note that to work with the OpenStack and VMware NSX APIs you’ll be sending and receiving information in JSON (JavaScript Object Notation). If you aren’t familiar with JSON, don’t worry—I have an introductory post on JSON that will help get you up to speed. (Mac users might find this post helpful as well.)

Also, please note that this post is not intended to be a comprehensive reference to the (quite extensive) flexibility of curl. My purpose here is to provide enough of a basic reference to get you started. The rest is up to you!

To make consuming this information easier (I hope), I’ll break this information down into a series of examples. Let’s start with passing some JSON data to a REST API to authenticate.

Example 1: Authenticating to OpenStack

Let’s say you’re working with an OpenStack-based cloud, and you need to authenticate to OpenStack using OpenStack Identity (“Keystone”). Keystone uses the idea of tokens, and to obtain a token you have to pass correct credentials. Here’s how you would perform that task using curl.

You’re going to use a couple of different command-line options:

  • The “-d” option allows us to pass data to the remote server (in this example, the remote server running OpenStack Identity). We can either embed the data in the command or pass the data using a file; I’ll show you both variations.
  • The “-H” option allows you to add an HTTP header to the request.

If you want to embed the authentication credentials into the command line, then your command would look something like this:

curl -d '{"auth":{"passwordCredentials":{"username": "admin",
"password": "secret"},"tenantName": "customer-A"}}'
-H "Content-Type: application/json" http://192.168.100.100:5000/v2.0/tokens

I’ve wrapped the text above for readability, but on the actual command line it would all run together with no breaks. (So don’t try to copy and paste, it probably won’t work.) You’ll naturally want to substitute the correct values for the username, password, tenant, and OpenStack Identity URL.

As you might have surmised by the use of the “-H” header in that command, the authentication data you’re passing via the “-d” parameter is actually JSON. (Run it through python -m json.tool and see.) Because it’s actually JSON, you could just as easily put this information into a file and pass it to the server that way. Let’s say you put this information (which you could format for easier readability) into a file named credentials.json. Then the command would look something like this (you might need to include the full path to the file):

curl -d @credentials.json -H "Content-Type: application/json" http://192.168.100.100:35357/v2.0/tokens
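
For reference, the credentials.json file contains the same JSON data shown earlier, just formatted for readability:

{
    "auth": {
        "passwordCredentials": {
            "username": "admin",
            "password": "secret"
        },
        "tenantName": "customer-A"
    }
}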

What you’ll get back from OpenStack—assuming your command is successful—is a wealth of JSON. I highly recommend piping the output through python -m json.tool as it can be difficult to read otherwise. (Alternately, you could pipe the output into a file.) Of particular usefulness in the returned JSON is a section that gives you a token ID. Using this token ID, you can prove that you’ve authenticated to OpenStack, which allows you to run subsequent commands (like listing tenants, users, etc.).
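
If you want to capture the token ID for reuse in later commands, one way (assuming the Identity v2.0 response structure, where the token ID lives at access.token.id) is to let Python extract it:

TOKEN=$(curl -s -d @credentials.json -H "Content-Type: application/json" \
  http://192.168.100.100:35357/v2.0/tokens | \
  python -c 'import json,sys; print json.load(sys.stdin)["access"]["token"]["id"]')

You can then pass $TOKEN in the X-Auth-Token header in the commands shown in example 3 below.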

Example 2: Authenticating to VMware NSX

Not all RESTful APIs handle authentication in the same way. In the previous example, I showed you how to pass some credentials in JSON-encoded format to authenticate. However, some systems use other methods for authentication. VMware NSX is one example.

In this example, you’ll need to use a different set of curl command-line options:

  • The “--insecure” option tells curl to ignore HTTPS certificate validation. VMware NSX controllers only listen on HTTPS (not HTTP).
  • The “-c” option stores data received from the server (one of the NSX controllers, in this case) into a cookie file. You’ll then re-use this data in subsequent commands with the “-b” option.
  • The “-X” option allows you to specify the HTTP method, which normally defaults to GET. In this case, you’ll use the POST method along with the “-d” parameter you saw earlier to pass authentication data to the NSX controller.

Putting all this together, the command to authenticate to VMware NSX would look something like this (naturally you’d want to substitute the correct username and password where applicable):

curl --insecure -c cookies.txt -X POST -d 'username=admin&password=admin' https://192.168.100.50/ws.v1/login

Example 3: Gathering Information from OpenStack

Once you’ve gotten an authentication token from OpenStack as I showed you in example #1 above, then you can start using API requests to get information from OpenStack.

For example, let’s say you wanted to list the instances for a particular tenant. Once you’ve authenticated, you’d want to get the ID for the tenant in question, so you’d need to ask OpenStack to give you a list of the tenants (you’ll only see the tenants your credentials permit). The command to do that would look something like this:

curl -H "X-Auth-Token: <Token ID>" http://192.168.100.70:5000/v2.0/tenants

The value to be substituted for token ID in the above command is returned by OpenStack when you authenticate (that’s why it’s important to pay attention to the data being returned). In this case, the data returned by the command will be a JSON-encoded list of tenants, tenant IDs, and tenant descriptions. From that data, you can get the ID of the tenant for whom you’d like to list the instances, then use a command like this:

curl -H "X-Auth-Token: <Token ID>" http://192.168.100.70:8774/v2/<Tenant ID>/servers

This will return a stream of JSON-encoded data that includes the list of instances and each instance’s unique ID—which you could then use to get more detailed information about that instance:

curl -H "X-Auth-Token: <Token ID>" http://192.168.100.70:8774/v2/<Tenant ID>/servers/<Server ID>

By and large, the API is reasonably well-documented; you just need to be sure that you are pointing the API call against the right endpoint. For example, authentication has to happen against the server running Keystone, which may or may not be the same server that is running the Nova API services. (In the examples I just provided, Keystone and the Nova API services are running on the same host, which is why the IP address is the same in the command lines.)

Example 4: Creating Objects in VMware NSX

Getting information from VMware NSX using the RESTful API is very much like what you’ve already seen in getting information from OpenStack. Of course, the API can also be used to create objects. To create objects—such as logical switches, logical switch ports, or ACLs—you’ll use a combination of curl options:

  • You’ll use the “-b” option to pass cookie data (stored when you authenticated to NSX) back for authentication.
  • The “-X” option allows you to specify the HTTP method (in this case, POST).
  • The “-d” option lets us transfer JSON-encoded data to form the request for the object we are going to create. We’ll specify a filename here, preceded by the “@” symbol.
  • The “-H” option adds an appropriate “Content-Type: application/json” header to the request, since we are passing JSON-encoded data to the NSX controller.

When you put it all together, it looks something like this (substituting appropriate values where applicable):

curl --insecure -b cookies.txt -d @new-switch.json 
-H "Content-Type: application/json" -X POST https://192.168.100.50/ws.v1/lswitch

As I mentioned earlier, you’re passing JSON-encoded data to the NSX controller; here are the contents of the new-switch.json file referenced in the above command example:
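
In case the code block doesn’t render for you, the file looks something like this (a sketch—the display name is arbitrary, and the transport zone UUID and transport type need to match your NSX environment):

{
    "display_name": "test-lswitch",
    "transport_zones": [
        {
            "zone_uuid": "<transport zone UUID>",
            "transport_type": "stt"
        }
    ]
}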

If you can’t see the code block, please click here.

Once again, I recommend piping the output of this command through python -m json.tool, as what you’ll get back on a successful call is some useful JSON data that includes, among other things, the UUID of the object (logical switch, in this case) that you just created. You can use this UUID in subsequent API calls to list properties, change properties, add logical switch ports, etc.

Clearly, there is much more that can be done with the OpenStack and VMware NSX APIs, but this at least should give you a starting point from which you can continue to explore in more detail. If anyone has any corrections, clarifications, or questions, please feel free to post them in the comments section below. All courteous comments (with vendor disclosure, where applicable) are welcome!


In this post, I’m going to show you how I combined Linux network namespaces, VLANs, Open vSwitch (OVS), and GRE tunnels to do something interesting. Well, I found it interesting, even if no one else does. However, I will provide this disclaimer up front: while I think this is technically interesting, I don’t think it has any real, practical value in a production environment. (I’m happy to be proven wrong, BTW.)

This post builds on information I’ve provided in previous posts:

It may pull pieces from a few other posts, but the bulk of the info is found in these. If you haven’t already read these, you might want to take a few minutes and go do that—it will probably help make this post a bit more digestible.

After working a bit with network namespaces—and knowing that OpenStack Neutron uses network namespaces in certain configurations, especially to support overlapping IP address spaces—I wondered how one might go about integrating multiple network namespaces into a broader configuration using OVS and GRE tunnels. Could I use VLANs to multiplex traffic from multiple namespaces across a single GRE tunnel?

To test my ideas, I came up with the following design:

As you can see in the diagram, my test environment has two KVM hosts. Each KVM host has a network namespace and a running guest domain. Both the network namespace and the guest domain are connected to an OVS bridge; the network namespace via a veth pair and the guest domain via a vnet port. A GRE tunnel between the OVS bridges connects the two hosts.

The idea behind the test environment was that the VM on one host would communicate with the veth interface in the network namespace on the other host, using VLAN-tagged traffic over a GRE tunnel between them.

Let’s walk through how I built this environment to do the testing.

I built KVM Host 1 using Ubuntu 12.04.2, and installed KVM, libvirt, and OVS. On KVM Host 1, I built a guest domain, attached it to OVS via a libvirt network, and configured the VLAN tag for its OVS port with this command:

ovs-vsctl set port vnet0 tag=10

In the guest domain, I configured the OS (also Ubuntu 12.04.2) to use the IP address 10.1.1.2/24.

Also on KVM Host 1, I created the network namespace, created the veth pair, moved one of the veth interfaces, and attached the other to the OVS bridge. This set of commands is what I used:

ip netns add red
ip link add veth0 type veth peer name veth1
ip link set veth1 netns red
ip netns exec red ip addr add 10.1.2.1/24 dev veth1
ip netns exec red ip link set veth1 up
ovs-vsctl add-port br-int veth0
ovs-vsctl set port veth0 tag=20

Most of the commands listed above are taken straight from the network namespaces article I wrote, but let’s break it down anyway just for the sake of full understanding:

  • The first command adds the “red” namespace.
  • The second command creates the veth pair, creatively named veth0 and veth1.
  • The third command moves veth1 into the red namespace.
  • The next two commands add an IP address to veth1 and set the interface to up.
  • The last two commands add the veth0 interface to an OVS bridge named br-int, and then set the VLAN tag for that port to 20.

When I’m done, I’m left with KVM Host 1 running a guest domain on VLAN 10 and a network namespace on VLAN 20. (Do you see how I got there?)

I repeated the process on KVM Host 2, installing Ubuntu 12.04.2 with KVM, libvirt, and OVS. Again, I built a guest domain (also running Ubuntu 12.04.2), configured the operating system to use the IP address 10.1.2.2/24, attached it to OVS via a libvirt network, and configured its OVS port:

ovs-vsctl set port vnet0 tag=20

Similarly, I also created a new network namespace and pair of veth interfaces, but I configured them as a “mirror image” of KVM Host 1, reversing the VLAN assignments for the guest domain (as shown above) and the network namespace:

ip netns add blue
ip link add veth0 type veth peer name veth1
ip link set veth1 netns blue
ip netns exec blue ip addr add 10.1.1.1/24 dev veth1
ip netns exec blue ip link set veth1 up
ovs-vsctl add-port br-int veth0
ovs-vsctl set port veth0 tag=10

That leaves me with KVM Host 2 running a guest domain on VLAN 20 and a network namespace on VLAN 10.

The final step was to create the GRE tunnel between the OVS bridges. Once the GRE tunnel was established, I configured the GRE port to be a VLAN trunk using this command (this command was necessary on both KVM hosts):

ovs-vsctl set port gre0 trunks=10,20,30
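
For completeness, the GRE tunnel itself was created the same way as described in my earlier post on GRE tunnels with OVS: something along these lines on each host, with remote_ip pointing at the other host’s tunnel endpoint.

ovs-vsctl add-port br-int gre0 -- set interface gre0 type=gre \
options:remote_ip=<tunnel endpoint IP of the other host>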

So I now had the environment I’d envisioned for my testing. VLAN 10 had a guest domain on one host and a veth interface on the other; VLAN 20 had a veth interface on one host and a guest domain on the other. Between the two hosts was a GRE tunnel configured to act as a VLAN trunk.

Now came the critical test—would the guest domain be able to ping the veth interface? This screen shot shows the results of my testing; this is the guest domain on KVM Host 1 communicating with the veth1 interface in the separate network namespace on KVM Host 2:

Success! Although not shown here, I also tested all other combinations as well, and they worked. (Note you’d have to use ip netns exec <namespace> ping … to ping from the veth1 interface in the network namespace.) I now had a configuration where I could integrate multiple network namespaces with GRE tunnels and OVS. Unfortunately—and this is where the whole “technically interesting but practically useless” statement comes from—this isn’t really a usable configuration:

  • The VLAN configurations were manually applied to the OVS ports; this means they disappeared if the guest domains were power-cycled. (This could be fixed using libvirt portgroups, as sketched after this list, but I hadn’t bothered with building them in this environment.)
  • The GRE tunnel had to be manually established and configured.
  • Because this solution uses VLAN tags inside the GRE tunnel, you’re still limited to about 4,096 separate networks/network namespaces you could support.
  • The entire process was manual. If I needed to add another VLAN, I’d have to manually create the network namespace and veth pair, manually move one of the veth interfaces into the namespace, manually add the other veth interface to the OVS bridge, and manually update the GRE tunnel to trunk that VLAN. Not very scalable, IMHO.
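
For the curious, a libvirt network definition with VLAN-tagged portgroups for an OVS bridge would look roughly like this (a sketch; the network name and VLAN IDs are examples, and you’d need a libvirt build with OVS support):

<network>
  <name>ovs-net</name>
  <forward mode='bridge'/>
  <bridge name='br-int'/>
  <virtualport type='openvswitch'/>
  <portgroup name='vlan-10'>
    <vlan>
      <tag id='10'/>
    </vlan>
  </portgroup>
  <portgroup name='vlan-20'>
    <vlan>
      <tag id='20'/>
    </vlan>
  </portgroup>
</network>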

However, the experiment was not a total loss. In figuring out how to tie together network namespaces and tunnels, I’ve gotten a better understanding of how all the pieces work. In addition, I have a lead on an even better way of accomplishing the same task: using OpenFlow rules and tunnel keys. This is the next area of exploration, and I’ll be sure to post something when I have more information to share.

In the meantime, feel free to share your thoughts and feedback on this post. What do you think—technically interesting or not? Useful in a real-world scenario or not? All courteous comments (with vendor disclosure, where applicable) are welcome.


I’m back with another “how to” article on Open vSwitch (OVS), this time taking a look at using GRE (Generic Routing Encapsulation) tunnels with OVS. OVS can use GRE tunnels between hosts as a way of encapsulating traffic and creating an overlay network. OpenStack Quantum can (and does) leverage this functionality, in fact, to help separate different “tenant networks” from one another. In this write-up, I’ll walk you through the process of configuring OVS to build a GRE tunnel that creates an overlay network between two hypervisors running KVM.

Naturally, any sort of “how to” such as this always builds upon the work of others. In particular, I found a couple of Brent Salisbury’s articles (here and here) especially useful.

This process has 3 basic steps:

  1. Create an isolated bridge for VM connectivity.
  2. Create a GRE tunnel endpoint on each hypervisor.
  3. Add a GRE interface and establish the GRE tunnel.

These steps assume that you’ve already installed OVS on your Linux distribution of choice. I haven’t explicitly done a write-up on this, but there are numerous posts from a variety of authors (in this regard, Google is your friend).

We’ll start with an overview of the topology, then we’ll jump into the specific configuration steps.

Reviewing the Topology

The graphic below shows the basic topology of what we have going on here:

Topology overview

We have two hypervisors (CentOS 6.3 and KVM, in my case), both running OVS (an older version, version 1.7.1). Each hypervisor has one OVS bridge that has at least one physical interface associated with the bridge (shown as br0 connected to eth0 in the diagram). As part of this process, you’ll create the other internal interfaces (the tep and gre interfaces), as well as the second, isolated bridge to which VMs will connect. You’ll then create a GRE tunnel between the hypervisors and test VM-to-VM connectivity.

Creating an Isolated Bridge

The first step is to create the isolated OVS bridge to which the VMs will connect. I call this an “isolated bridge” because the bridge has no physical interfaces attached. (Side note: this idea of an isolated bridge is fairly common in OpenStack and NVP environments, where it’s usually called the integration bridge. The concept is the same.)

The command is very simple, actually:

ovs-vsctl add-br br2

Yes, that’s it. Feel free to substitute a different name for br2 in the command above, if you like, but just make note of the name as you’ll need it later.

To make things easier for myself, once I’d created the isolated bridge I then created a libvirt network for it so that it was dead-easy to attach VMs to this new isolated bridge.
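
The libvirt network definition for an OVS bridge is pretty simple; something like this would do it (the network name is arbitrary, and this assumes a libvirt build with OVS support):

<network>
  <name>isolated</name>
  <forward mode='bridge'/>
  <bridge name='br2'/>
  <virtualport type='openvswitch'/>
</network>

You’d then load and start it with virsh net-define and virsh net-start, after which the network shows up as an option when attaching VMs.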

Configuring the GRE Tunnel Endpoint

The GRE tunnel endpoint is an interface on each hypervisor that will, as the name implies, serve as the endpoint for the GRE tunnel. My purpose in creating a separate GRE tunnel endpoint is to separate hypervisor management traffic from GRE traffic, thus allowing for an architecture that might leverage a separate management network (which is typically considered a recommended practice).

To create the GRE tunnel endpoint, I’m going to use the same technique I described in my post on running host management traffic through OVS. Specifically, we’ll create an internal interface and assign it an IP address.

To create the internal interface, use this command:

ovs-vsctl add-port br0 tep0 -- set interface tep0 type=internal

Note that tep0 is added to br0 (the bridge with the physical interface) rather than to the isolated bridge, since the GRE traffic must be able to reach the other hypervisor across the physical network. You could also use a different name than tep0. Since this name is essentially for human consumption only, use what makes sense to you. Since this is a tunnel endpoint, tep0 made sense to me.

Once the internal interface is established, assign it with an IP address using ifconfig or ip, whichever you prefer. I’m still getting used to using ip (more on that in a future post, most likely), so I tend to use ifconfig, like this:

ifconfig tep0 192.168.200.20 netmask 255.255.255.0

Obviously, you’ll want to use an IP addressing scheme that makes sense for your environment. One important note: don’t use the same subnet as you’ve assigned to other interfaces on the hypervisor, or else you can’t guarantee that the GRE tunnel will originate (or terminate) on the interface you intend. This is because the Linux routing table on the hypervisor will control how the traffic is routed. (You could use source routing, a topic I plan to discuss in a future post, but that’s beyond the scope of this article.)

Repeat this process on the other hypervisor, and be sure to make note of the IP addresses assigned to the GRE tunnel endpoint on each hypervisor; you’ll need those addresses shortly. Once you’ve established the GRE tunnel endpoint on each hypervisor, test connectivity between the endpoints using ping or a similar tool. If connectivity is good, you’re clear to proceed; if not, you’ll need to resolve that before moving on.

Establishing the GRE Tunnel

By this point, you’ve created the isolated bridge, established the GRE tunnel endpoints, and tested connectivity between those endpoints. You’re now ready to establish the GRE tunnel.

Use this command to add a GRE interface to the isolated bridge on each hypervisor:

ovs-vsctl add-port br2 gre0 -- set interface gre0 type=gre \
options:remote_ip=<GRE tunnel endpoint on other hypervisor>

Substitute the name of the isolated bridge you created earlier here for br2 and feel free to use something other than gre0 for the interface name. I think using gre as the base name for the GRE interfaces makes sense, but run with what makes sense to you.

Once you repeat this command on both hypervisors, the GRE tunnel should be up and running. (Troubleshooting the GRE tunnel is one area where my knowledge is weak; anyone have any suggestions or commands that we can use here?)

Testing VM Connectivity

As part of this process, I spun up an Ubuntu 12.04 server image on each hypervisor (using virt-install as I outlined here), attached each VM to the isolated bridge created earlier on that hypervisor, and assigned each VM an IP address from an entirely different subnet than the physical network was using (in this case, 10.10.10.x).
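
The virt-install invocation was along these lines (a sketch; the disk path, sizing, and ISO location are placeholders, and the --network parameter points at the libvirt network created for the isolated bridge):

virt-install --name web01 --ram 1024 --vcpus 1 \
  --disk path=/var/lib/libvirt/images/web01.img,size=10 \
  --network network=isolated \
  --cdrom /var/lib/libvirt/images/ubuntu-12.04-server-amd64.iso \
  --graphics vnc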

Here’s the output of the route -n command on the Ubuntu guest, to show that it has no knowledge of the “external” IP subnet—it knows only about its own interfaces:

ubuntu:~ root$ route -n
Kernel IP routing table
Destination  Gateway       Genmask        Flags Metric Ref Use Iface
0.0.0.0      10.10.10.254  0.0.0.0        UG    100    0   0   eth0
10.10.10.0   0.0.0.0       255.255.255.0  U     0      0   0   eth0

Similarly, here’s the output of the route -n command on the CentOS host, showing that it has no knowledge of the guest’s IP subnet:

centos:~ root$ route -n
Kernel IP routing table
Destination  Gateway        Genmask        Flags Metric Ref Use Iface
192.168.2.0  0.0.0.0        255.255.255.0  U     0      0   0   tep0
192.168.1.0  0.0.0.0        255.255.255.0  U     0      0   0   mgmt0
0.0.0.0      192.168.1.254  0.0.0.0        UG    0      0   0   mgmt0

In my case, VM1 (named web01) was given 10.10.10.1; VM2 (named web02) was given 10.10.10.2. Once I went through the steps outlined above, I was able to successfully ping VM2 from VM1, as you can see in this screenshot:

VM-to-VM connectivity over GRE tunnel

(Although it’s not shown here, connectivity from VM2 to VM1 was obviously successful as well.)

“OK, that’s cool, but why do I care?” you might ask.

In this particular context, it’s a bit of a science experiment. However, if you take a step back and begin to look at the bigger picture, then (hopefully) something starts to emerge:

  • We can use an encapsulation protocol (GRE in this case, but it could have just as easily been STT or VXLAN) to isolate VM traffic from the physical network and from other VM traffic. (Think multi-tenancy.)
  • While this process was manual, think about some sort of controller (an OpenFlow controller, perhaps?) that could help automate this process based on its knowledge of the VM topology.
  • Using a virtualized router or virtualized firewall, I could easily provide connectivity into or out of this isolated (encapsulated) private network. (This is probably something I’ll experiment with later.)
  • What if we wrapped some sort of orchestration framework around this, to help deploy VMs, create networks, add routers/firewalls automatically, all based on the customer’s needs? (OpenStack Networking, anyone?)

Anyway, I hope this is helpful to someone. As always, I welcome feedback and suggestions for improvement, so feel free to speak up in the comments below. Vendor disclosures, where appropriate, are greatly appreciated. Thanks!

