Networking

This category contains information on networking and networking-related technologies or vendors.

This is a liveblog for session DATS013, on microservers. I was running late to this session (my calendar must have been off—thought I had 15 minutes more), so I wasn’t able to capture the titles or names of the speakers.

The first speaker starts out with a review of exactly what a microserver is; Intel sees microservers as a natural evolution from rack-mounted servers to blades to microservers. Key microserver technologies include: Intel Atom C2000 family of processors; Intel Xeon E5 v2 processor family; and Intel Ethernet Switch FM6000 series. Microservers share some common characteristics, such as highly integrated platforms (like integrated networking) and being designed for high efficiency. Efficiency might be more important than absolute performance.

Disaggregation of resources is a common platform option for microservers. (Once again this comes back to Intel’s rack-scale architecture work.) This leads the speaker to talk about a Technology Delivery Vehicle (TDV) being displayed here at the show; this is essentially a proof-of-concept product that Intel built that incorporates various microserver technologies and design patterns.

Upcoming microserver technologies that Intel has announced or is working on include:

  • The Intel Xeon D, a Xeon-based SoC with integrated 10Gbps Ethernet and running in a 15–45 watt power range
  • The Intel Ethernet Switch FM10000 series (a follow-on from the FM6000 series), which will offer a variety of ports for connectivity—not just Ethernet (it will support 1, 2.5, 10, 25, 40, and 100 Gb Ethernet) but also PCI Express Gen3 and connectivity to embedded Intel Ethernet controllers. In some way (the speaker is unclear) this is also aligned with Intel’s silicon photonics efforts.

A new speaker (Christian?) takes the stage to talk about software-defined infrastructure (SDI), which is the vision that Intel has been talking about this week. He starts the discussion by talking about NFV and SDN, and how these efforts enable agile networks on standard high volume (SHV) servers (such as microservers). Examples of SDN/NFV workloads include wireless BTS, CRAN, MME, DSLAM, BRAS, and core routers. Some of these workloads are well suited for running on Intel platforms.

The speaker transitions to talking about “RouterBricks,” a scalable soft router that was developed with involvement from Scott Shenker. The official term for this is a “switch-route-forward”, or SRF. A traditional SRF architecture can be replicated with COTS hardware using multi-queue NICs and multi-core/multi-socket CPUs. However, the speaker says that a single compute node isn’t capable of replacing a large traditional router, so instead we have to scale compute nodes by treating them as a “linecard” using Intel DPDK. The servers are interconnected using a mesh, ToR, or multi-stage Clos network. Workloads are scheduled across these server/linecards using Valiant Load Balancing (VLB). Of course, there are issues with packet-level load balancing and flow-level load balancing, so tradeoffs must be made one way or another.

An example SRF built using four systems, each with ten 10Gbps interfaces, is capable of sustaining 40Gbps line rate at packet sizes of 64, 128, 256, 512, 1024, and 1500 bytes. Testing latency and jitter using a Spirent shows that an SRF compares very favorably with an edge router, but not so well against a core router (even though everything on the SRF is software-based running on Linux). Out-of-order frames from the SRF were less than 0.04% in all cases.
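
For a sense of what “line rate” means in packets per second, here is a short Python sketch of the standard Ethernet calculation. It assumes the sizes quoted are full Ethernet frame sizes and adds the 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, and inter-frame gap); the function name is my own.

```python
# Per-frame overhead on the wire beyond the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_gbps, frame_bytes):
    """Maximum frames per second on an Ethernet link of the given speed."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_per_frame


for size in (64, 128, 256, 512, 1024, 1500):
    mpps = line_rate_pps(40, size) / 1e6
    print(f"{size:>5} bytes: {mpps:6.2f} Mpps at 40 Gbps")
```

The small-packet cases are the hard ones: at 64 bytes, 40 Gbps works out to roughly 59.5 million packets per second, which is why small-packet performance gets so much attention.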

That SRF was built using Xeon processors, but what about an SRF built using Atom processors? A single Atom core can’t sustain line rate at 64 or 128 bytes per packet, but 2 cores can sustain line rate. Testing latency and jitter showed results at less than 60 microseconds and less than 0.15 microseconds, respectively.

Comparing Xeon to Atom, the speaker shows that a Xeon core can move about 4 times the number of packets compared to an Atom core. A Xeon core will also use far less memory bandwidth than an Atom core due to Xeon’s support for Direct Data I/O, which copies (via DMA) data received by a NIC into the processor’s cache. Atom does not support this feature.

With respect to efficiency, Xeon versus Atom presents very interesting results. Throughput per rack unit is better for the Atom (40 Gbps/RU compared to 13.3Gbps/RU), while raw throughput is far better for the Xeon. Throughput per watt, on the other hand, is slightly better for the Atom (0.46 Gbps/watt versus 0.37 Gbps/watt for Xeon).
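
To put those efficiency numbers side by side, here is a quick Python calculation of the ratios. It uses only the figures quoted in the session; the variable names are mine.

```python
# Figures as quoted in the session (Gbps per rack unit, Gbps per watt).
atom = {"gbps_per_ru": 40.0, "gbps_per_watt": 0.46}
xeon = {"gbps_per_ru": 13.3, "gbps_per_watt": 0.37}

density_ratio = atom["gbps_per_ru"] / xeon["gbps_per_ru"]
efficiency_ratio = atom["gbps_per_watt"] / xeon["gbps_per_watt"]

print(f"Atom vs. Xeon throughput per RU:   {density_ratio:.2f}x")
print(f"Atom vs. Xeon throughput per watt: {efficiency_ratio:.2f}x")
```

In other words, the Atom platform delivers about 3 times the throughput per rack unit, but only about 1.24 times the throughput per watt.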

At this point, the first presenter re-takes the stage and they open up the session for questions.


This is a liveblog of IDF 2014 session DATS002, titled “Virtualizing the Network to Enable a Software-Defined Infrastructure (SDI)”. The speakers are Brian Johnson (Solutions Architect, Intel) and Jim Pinkerton (Windows Server Architect, Microsoft). I attended a similar session last year; I’m hoping for some new information this year.

Pinkerton starts the session with a discussion of why Microsoft is able to speak to network virtualization via their experience with large-scale web properties (Bing, XBox Live, Outlook.com, Office, etc.). To that point, Microsoft has over 100K servers across their cloud properties, with >200K diverse services, first-party applications, and third-party applications. This amounts to $15 billion in data center investments. Naturally, all of this runs on Windows Server and Windows Azure.

So why does networking need to be transformed for the cloud? According to Pinkerton, the goal is to drive agility and flexibility for your business. This is accomplished by pooling and automating network resources, ensuring tenant isolation, maximizing scale/performance, enabling seamless capacity expansion and workload mobility, and minimizing operational complexity.

Johnson takes over here to talk about how Intel is working to address the challenges and needs that Pinkerton just outlined. This breaks down into three core areas that have unique requirements and capabilities: network functions virtualization (NFV), network virtualization overlays (NVO), and software-defined networking (SDN).

Johnson points out that workload optimization is more than just networking; it also involves CPU (E5-2600 v3 CPU family), network connectivity (Intel XL710, now offering support for next-generation Geneve encapsulation), and storage (Intel SSDs). Johnson dives deep on the XL710, which was specifically designed to address some of the needs of cloud networking. Particularly, support for a variety of encapsulation protocols (NVGRE, IPinGRE, MACinUDP, VXLAN, Geneve), support for 40Gbps or 4x10Gbps connectivity in the same card, support for up to 8000 perfect match flow filters stored on die (this is Intel Ethernet Flow Director), and support for SR-IOV and VMDq are all areas where this card helps with NVO and SDN applications.

Next up Johnson walks through some behaviors in traditional networking as compared to network virtualization using an encapsulation protocol. Johnson uses two examples, one with VXLAN and one with NVGRE, but the basics between the two examples are very similar. Johnson also talks about why the stateless offloads in the XL710 (now supporting stateless offloads for both VXLAN and NVGRE, as well as next-generation Geneve) are important; they offload some amount of work from the host CPU. The impact of network overlays on NIC bonding and link aggregation is another consideration; adapters and switches may not be aware of the encapsulation headers and therefore may not fully utilize all the links in a link aggregation group. The Intel X520/X540 had some offloads; the XL710 increases this support.
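
The link aggregation issue is easy to illustrate. The Python sketch below is my own simplified model, not anything shown in the session: a switch picks a member link by hashing only the outer headers. The addresses, the link count, and the use of CRC32 as the hash are all assumptions for illustration. With a fixed outer UDP source port, every inner flow between two tunnel endpoints hashes to the same link; deriving the outer source port from a hash of the inner flow (as the VXLAN specification recommends) gives the switch some entropy to work with.

```python
import zlib

LINKS = 4  # hypothetical number of member links in the aggregation group


def pick_link(*fields):
    """Pick a LAG member link by hashing the given (outer) header fields."""
    key = "|".join(str(f) for f in fields).encode()
    return zlib.crc32(key) % LINKS


def inner_hash_port(inner_flow):
    """Derive a UDP source port in the dynamic range from the inner flow."""
    key = "|".join(str(f) for f in inner_flow).encode()
    return 49152 + zlib.crc32(key) % 16384


# Hypothetical tunnel endpoints and inner flows between the same two hosts.
outer_src, outer_dst, vxlan_port = "192.0.2.1", "192.0.2.2", 4789
inner_flows = [("10.1.1.1", "10.1.1.2", 6, 40000 + i, 443) for i in range(64)]

# Case 1: fixed outer source port, so every flow looks identical to the switch.
fixed = {pick_link(outer_src, outer_dst, 17, 50000, vxlan_port) for _ in inner_flows}

# Case 2: outer source port derived from the inner flow, which adds entropy.
varied = {
    pick_link(outer_src, outer_dst, 17, inner_hash_port(flow), vxlan_port)
    for flow in inner_flows
}

print("links used with a fixed outer source port:   ", len(fixed))
print("links used with a per-flow outer source port:", len(varied))
```

The first case always lands on exactly one link, no matter how many inner flows there are.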

That wraps up the NVO portion, and now Johnson switches gears to talk about NFV. According to Johnson, service function chaining (SFC) is a key component of NFV. There are two options for SFC: Network Services Header (NSH), or Geneve. Johnson points out that Geneve was co-authored by Intel, Microsoft, VMware, and Red Hat, and is considered to be the next-generation encapsulation protocol. This leads Johnson into a live demo of Geneve and the importance of RSS. (Without RSS, bandwidth is constrained on the receiving system.)

One other key area for support of NFV is being able to transmit large numbers of small packets. This is enabled by Intel’s work on the Data Plane Development Kit (DPDK).

Johnson points out that 40Gbps Ethernet will not offer a BASE-T option; to help address 40Gbps connectivity, Intel is introducing new, low-cost optics (both transceivers and cables). Estimated cost for Intel Ethernet MOC (Modular Optical Connectors) is around $400—well down from costs like $1300 today.

Pinkerton now takes over again, talking about VM density and the changes that have to take place to support higher VM density in private cloud environments (although I would contend that highly virtualized data centers are not private clouds). In particular, Pinkerton feels that SMB3 and SMB Direct (RDMA support) are important developments. According to Pinkerton, these protocols address the need for lower network and storage CPU overhead, higher throughput requirements, lower variances in latency and throughput, better fault tolerance, and VM workload isolation.

Pinkerton insists that using file sharing semantics is actually a much better approach for cloud-scale properties than using block-level semantics (basically, SMB3 is better than iSCSI/FC/FCoE). That leads to a discussion of RDMA (Remote Direct Memory Access), and how that helps improve performance. Standardized implementations of RDMA include iWARP (RDMA over TCP/IP) and RoCE (RDMA over Converged Ethernet). InfiniBand also typically leverages RDMA. In the context of private cloud, having the ability to route traffic is important; that’s why Pinkerton believes that iWARP and RoCE v2 (not mentioned on the slide) are important.

That leads to a discussion of some performance results, and Pinkerton calls out incast performance (many nodes sending data to a single node) as an important metric in private cloud environments. In reviewing some performance metrics for RDMA, Pinkerton states that average latency is no longer satisfactory as a metric; organizations should focus on 95th percentile and 99th percentile measurements instead. The metrics Pinkerton is using (based on tests with a Chelsio T580) show latency with SMB3 and RDMA to be very stable up to 90% load, and throughput is near line-rate.
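
Here is a small Python sketch of why the average can be misleading. The latency samples are made up for illustration (they are not Pinkerton’s data), and the percentile function uses the simple nearest-rank definition.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latency samples in microseconds: mostly fast, with a slow tail.
latencies_us = [100] * 94 + [800] * 4 + [3000] * 2

print("mean:", statistics.mean(latencies_us))
print("p95: ", percentile(latencies_us, 95))
print("p99: ", percentile(latencies_us, 99))
```

With these samples the mean is 186 microseconds, which looks healthy, while the 95th and 99th percentiles are 800 and 3000 microseconds. The tail is what a user of a busy service actually notices.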

Johnson takes back over now to announce that iWARP support will be built into the next generation of Intel NIC chipsets as a default for server environments.

At this point the session wraps up.


Welcome to Technology Short Take #44, the latest in my irregularly-published series of articles, links, ideas, and thoughts about various data center-related technologies. Enjoy!

Networking

  • One of the original problems with the VXLAN IETF specification was that it (deliberately) didn’t include any control plane information; as a result, the process of mapping MAC addresses to VTEPs (VXLAN Tunnel Endpoints) wasn’t defined, and the early implementations relied on multicast to handle this issue. To help resolve this issue, Cumulus Networks (and possibly Metacloud, I’m not sure of their involvement yet) has released an open source project called vxfld. As described in this Metacloud blog post, vxfld is designed to “handle VXLAN traffic from any operating system or hardware platform that adheres to the IETF Internet-Draft for VXLAN”.
  • Nir Yechiel recently posted part 1 of a discussion on the need for network overlays. This first post is more of a discussion of why VLANs and VLAN-based derivatives aren’t sufficient, and why we should be looking to routing (layer 3) constructs instead. I’m looking forward to part 2 of the series.
  • One ongoing discussion in the network industry these days—or so it seems—is the discussion about the interaction between network overlays and the underlying transport network. Some argue that tight integration is required; others point to streaming video services and VoIP running across the Internet and insist that no integration or interaction is needed. In this post, Scott Jensen argues in favor of the former—that SDN solutions shouldn’t just manage network overlays, but should also manage the configuration of the physical transport network as well. I’d love to hear from more networking pros (please disclose company affiliations) about their thoughts on this matter.
  • I like the distinction made here between network automation and SDN.
  • Need to get a better grasp on OpenFlow? Check out OpenFlow basics and OpenFlow deep-dive.
  • Here’s a write-up on connecting Docker containers using VXLAN. I think there’s a great deal of promise for OVS in containerized environments, but what’s needed is better/tighter integration between OVS and container solutions like Docker.

Servers/Hardware

  • Is Intel having second thoughts about software-defined infrastructure? That’s the core question in this blog post, which explores the future of Intel in a software-defined world and the increasing interest in non-x86 platforms like ARM.
  • On the flip side, proponents who claim that platforms like ARM and others are necessary in order to move forward with SDN and NFV initiatives should probably read this article on 80 Gbps performance from an off-the-shelf x86 server. Impressive.

Security

  • It’s nice to see that work on OpenStack Barbican is progressing nicely; see this article for a quick overview of the project and an update on the status.

Cloud Computing/Cloud Management

  • SDN Central has a nice write-up on the need for open efforts in the policy space, which includes the Congress project.
  • The use of public cloud offerings as disaster recovery targets is on the rise; note this article from Microsoft on how to migrate on-premises workloads to Azure using Azure Site Recovery. VMware has a similar offering via the VMware vCloud Hybrid Service recovery-as-a-service offering.
  • The folks at eNovance have a write-up on multi-tenant Docker with OpenStack Heat. It’s an interesting write-up, but not for the faint of heart—to make their example work, you’ll need the latest builds of Heat and the Docker plugin (it doesn’t work with the stable branch of Heat).
  • Preston Bannister took a look at cloud application backup in OpenStack. His observations are, I think, rational and fair, and I’m glad to see someone paying attention to this topic (which, thus far, I think has been somewhat ignored).
  • Interested in Docker and Kubernetes on Azure? See here and here for more details.
  • This article takes a look at Heat-Translator, an effort designed to provide some interoperability between TOSCA and OpenStack HOT documents for application deployment and orchestration. The portability of orchestration resources is one of several aspects you’ll want to examine as you progress down the route of fully embracing a cloud computing operational model.

Operating Systems/Applications

  • Looks like we have another convert to Markdown—Anthony Burke recently talked about how he uses Markdown. Regular readers of this site know that I do almost all of my content generation using MultiMarkdown (a variation of Markdown with some expanded syntax options). Here’s a post I recently published on some useful Markdown tools for OS X.
  • Good to see that Ivan Pepelnjak thinks infrastructure as code makes sense. I guess that means the time I’ve spent with Puppet (you can browse Puppet-related posts here) wasn’t a waste.
  • I don’t know if I’ve mentioned this before (sorry if that’s the case), but I’m liking this “NIX4NetEng” series going on over at Nick Buraglio’s site (part 1, part 2, and part 3).
  • Mike Foley has a blog post on how to go from zero to Windows domain controller in only 4 reboots. Handy.

Storage

Virtualization

  • Running Hyper-V with Linux VMs? Ben Armstrong details what versions of Linux support the various Hyper-V features in this post.
  • Here’s a quick write-up on running VMs with VirtualBox 4.3 on a headless Ubuntu 14.04 LTS server.
  • Nested OS X guest on top of nested ESXi on top of VMware Fusion? Must be something William Lam’s tried. Go have a look at his write-up.
  • Here’s a quick update on Nova-Docker, the effort in OpenStack to allow users to deploy Docker containers via Nova. I’m not yet convinced that treating Docker as a hypervisor in Nova is the right path, but we’ll see how things develop.
  • This post is a nice write-up on the different ways to connect a Docker container to a local network.
  • Weren’t able to attend VMworld US in San Francisco last week? No worries. If you have access to the recorded VMworld sessions, check out Jason Boche’s list of the top 10 sessions for a priority list of what recordings to check out. Or need a recap of the week? See here (one of many recap posts, I’m sure).

That’s it this time around; hopefully I was able to include something useful for you. As always, all courteous comments are welcome, so feel free to speak up in the comments. In particular, if there is a technology area that I’m not covering (or not covering well), please let me know—and suggestions for more content sources are certainly welcome!


In this post, I’ll show you how I got Arista’s vEOS software running under KVM to create a virtualized Arista switch. There are a number of other articles that help provide instructions on how to do this, but none of those that I found included the use of libvirt and/or Open vSwitch (OVS).

In order to run vEOS, you must first obtain a copy of vEOS. I can’t provide you with a copy; you’ll have to register on the Arista Networks site (see here) in order to gain access to the download. The download consists of two parts:

  1. The Aboot ISO, which contains the boot loader
  2. The vEOS disk image, provided as a VMware VMDK

Both of these are necessary; you can’t get away with just one or the other. Further, although the vEOS disk image is provided as a VMware VMDK, KVM/QEMU is perfectly capable of using the VMDK without any conversion required (this is kind of nice).

Once you’ve downloaded these files, you can use the following libvirt domain XML definition to create a VM for running Arista vEOS (you’d use a command like virsh define <filename>).


There are a few key things to note about this libvirt domain XML:

  • Note the boot order; the VM must boot from the Aboot ISO first.
  • Both the Aboot ISO as well as the vEOS VMDK are attached to the VM as devices, and you must use an IDE bus. Arista vEOS will refuse to boot if you use a SCSI device, so make sure there are no SCSI devices in the configuration. Pay particular attention to the type= parameters that specify the correct disk formats for the ISO (type “raw”) and VMDK (type “vmdk”).
  • For the network interfaces, you’ll want to be sure to use the e1000 model.
  • This example XML definition includes three different network interfaces. (More are supported; up to 7 interfaces on QEMU/KVM.)
  • This XML definition leverages libvirt integration with OVS so that libvirt automatically attaches VMs to OVS and correctly applies VLAN tagging and trunking configurations. In this case, the network interfaces are attaching to a portgroup called “trunked”; this portgroup trunks VLANs up to the guest domain (the vEOS VM, in this case). In theory, this should allow the vEOS VM to support VLAN trunk interfaces, although I had some issues making this work as expected and had to drop back to tagged interfaces.
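
As a rough illustration of the points above, here is a minimal sketch of a domain definition, wrapped in a short Python script that checks it against those notes. This is a reconstruction under assumptions and not the original definition: the domain name, memory size, file paths, and libvirt network name (ovs-network) are all hypothetical, and only the “trunked” portgroup name comes from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical names, sizes, and paths; adjust to match your own environment.
DOMAIN_XML = """
<domain type='kvm'>
  <name>veos01</name>
  <memory unit='MiB'>2048</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc'>hvm</type>
    <boot dev='cdrom'/>
    <boot dev='hd'/>
  </os>
  <devices>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/Aboot-veos.iso'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='vmdk'/>
      <source file='/var/lib/libvirt/images/vEOS.vmdk'/>
      <target dev='hda' bus='ide'/>
    </disk>
    <interface type='network'>
      <source network='ovs-network' portgroup='trunked'/>
      <model type='e1000'/>
    </interface>
    <interface type='network'>
      <source network='ovs-network' portgroup='trunked'/>
      <model type='e1000'/>
    </interface>
    <interface type='network'>
      <source network='ovs-network' portgroup='trunked'/>
      <model type='e1000'/>
    </interface>
    <graphics type='vnc' port='-1' autoport='yes'/>
  </devices>
</domain>
"""


def check_domain(xml_text):
    """Return a list of problems, based on the notes in this post."""
    root = ET.fromstring(xml_text)
    problems = []
    boot = [b.get("dev") for b in root.findall("./os/boot")]
    if not boot or boot[0] != "cdrom":
        problems.append("the VM must boot from the Aboot ISO (cdrom) first")
    for target in root.findall("./devices/disk/target"):
        if target.get("bus") != "ide":
            problems.append(f"disk {target.get('dev')} is not on an IDE bus")
    if root.findall("./devices/controller[@type='scsi']"):
        problems.append("a SCSI controller is present")
    for model in root.findall("./devices/interface/model"):
        if model.get("type") != "e1000":
            problems.append("a network interface is not using the e1000 model")
    return problems


print(check_domain(DOMAIN_XML) or "no problems found")
```

If you want to try it, save the XML portion to a file and define the guest with virsh define as described earlier.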

Once you have the guest domain defined, you can start it by using virsh start <guest domain name>. The first time it boots, it will take a long time to come up. (A really long time—I watched it for a good 10 minutes before finally giving up and walking away to do something else. It was up when I came back.) According to the documentation I’ve found, this is because EOS needs to make a backup copy of the flash partition (which in this case is the VMDK disk image). It might be quicker for you, but be prepared for a long first boot just in case.

Once it’s up and running, use virsh vncdisplay to get the VNC display of the vEOS guest domain, then use a VNC viewer to connect to the guest domain’s console. You won’t be able to SSH in yet, as all the network interfaces are still unconfigured. At the console, set an IP address on the Management1 interface (which will correspond to the first virtual network interface defined in the libvirt domain XML) and then you should have network connectivity to the switch for the purposes of management. Once you create a username and a password, then you’ll be able to SSH into your newly-running Arista vEOS switch. Have fun!

For additional information and context, here are some links to other articles I found on this topic while doing some research:

If you have any questions or need more information, feel free to speak up in the comments below. All courteous comments are welcome!


Welcome to Technology Short Take #43, another episode in my irregularly-published series of articles, links, and thoughts from around the web, focusing on data center technologies like networking, virtualization, storage, and cloud computing. Here’s hoping you find something useful.

Networking

  • Jason Edelman recently took a look at Docker networking. While Docker is receiving a great deal of attention, I have to say that I feel Docker networking is a key area that hasn’t received the amount of attention that it probably needs. It would be great to see Docker get support for connecting containers directly to Open vSwitch (OVS), which is generally considered the de facto standard for networking on Linux hosts.
  • Ivan Pepelnjak asks the question, “Is OpenFlow the best tool for overlay virtual networks?” While so many folks see OpenFlow as the answer regardless of the question, Ivan takes a solid look at whether there are better ways of building overlay virtual networks. I especially liked one of the last statements in Ivan’s post: “Wouldn’t it be better to keep things simple instead of introducing yet-another less-than-perfect abstraction layer?”
  • Ed Henry tackles the idea of abstraction vs. automation in a fairly recent post. It’s funny—I think Ed’s post might actually be a response to a Twitter discussion that I started about the value of the abstractions that are being implemented in Group-based Policy (GBP) in OpenStack Neutron. Specifically, I was asking if there was value in creating an entirely new set of abstractions when it seemed like automation might be a better approach. Regardless, Ed’s post is a good one—the decision isn’t about one versus the other, but rather recognizing, in Ed’s words, “abstraction will ultimately lead to easier automation.” I’d agree with that, with one change: the right abstraction will lead to easier automation.
  • Jason Horn provides an example of how to script NSX security groups.
  • Interested in setting up overlays using Open vSwitch (OVS)? Then check out this article from the ever-helpful Brent Salisbury on setting up overlays on OVS.
  • Another series on VMware NSX has popped up, this time from Jon Langemak. Only two posts so far (but very thorough posts), one on setting up VMware NSX and another on logical networking with VMware NSX.

Servers/Hardware

Nothing this time around, but I’ll keep my eyes open for more content to include next time.

Security

  • Someone mentioned I should consider using pfctl and its ability to automatically block remote hosts exceeding certain connection rate limits. See here for details.
  • Bromium published some details on an Android security flaw that’s worth reviewing.

Cloud Computing/Cloud Management

  • Want to add some Docker to your vCAC environment? This post provides more details on how it is done. Kind of cool, if you ask me.
  • I am rapidly being pulled “higher” up the stack to look at tools and systems for working with distributed applications across clusters of servers. You can expect to see some content here soon on topics like fleet, Kubernetes, Mesos, and others. Hang on tight, this will be an interesting ride!

Operating Systems/Applications

  • A fact that I think is sometimes overlooked when discussing Docker is access to the Docker daemon (which, by default, is accessible only via UNIX socket—and therefore accessible locally only). This post by Adam Stankiewicz tackles configuring remote TLS access to Docker, which addresses that problem.
  • CoreOS is a pretty cool project that takes a new look at how Linux distributions should be constructed. I’m kind of bullish on CoreOS, though I haven’t had nearly the time I’d like to work with it. There’s a lot of potential, but also some gotchas (especially right now, before a stable product has been released). The fact that CoreOS takes a new approach to things means that you might need to look at things a bit differently than you had in the past; this post tackles one such item (pushing logs to a remote destination).
  • Speaking of CoreOS: here’s how to test drive CoreOS from your Mac.
  • I think I may have mentioned this before; if so, I apologize. It seems like a lot of folks are saying that Docker eliminates the need for configuration management tools like Puppet or Chef. Perhaps (or perhaps not), but in the event you need or want to combine Puppet with Docker, a good place to start is this article by James Turnbull (formerly of Puppet, now with Docker) on building Puppet-based applications inside Docker.
  • Here’s a tutorial for running Docker on CloudSigma.

Storage

  • It’s interesting to watch the storage industry go through the same sort of discussion around what “software-defined” means as the networking industry has gone through (or, depending on your perspective, is still going through). A few articles highlight this discussion: this one by John Griffith (Project Technical Lead [PTL] for OpenStack Cinder), this response by Chad Sakac, this response by the late Jim Ruddy, this reply by Kenneth Hui, and finally John’s response in part 2.

Virtualization

  • The ability to run nested hypervisors is the primary reason I still use VMware Fusion on my laptop instead of switching to VirtualBox. In this post Cody Bunch talks about how to use Vagrant to configure nested KVM on VMware Fusion for using things like DevStack.
  • A few different folks in the VMware space have pointed out the VMware OS Optimization Tool, a tool designed to help optimize Windows 7/8/2008/2012 systems for use with VMware Horizon View. Might be worth checking out.
  • The VMware PowerCLI blog has a nice three part series on working with Customization Specifications in PowerCLI (part 1, part 2, and part 3).
  • Jason Boche has a great collection of information regarding vSphere HA and PDL. Definitely be sure to give this a look.

That’s it for this time around. Feel free to speak up in the comments and share any thoughts, clarifications, corrections, or other ideas. Thanks for reading!


It’s that time again—time for community voting on sessions for the fall OpenStack Summit, being held in Paris this year in early November. I wanted to take a moment and share some of the sessions in which I’m involved and/or that I think might be useful. It would be great if you could take a moment to add your votes for the sessions.

My Sessions

I have a total of four session proposals submitted this year:

Congress Sessions

You may also be aware that I am involved with a project called Congress, which aims to bring an overarching policy service to OpenStack. Here are some sessions pertaining to Congress:

VMware Sessions

Arvind Soni, one of the product managers for OpenStack at VMware, kindly pulled together this list of VMware-related sessions, so feel free to have a look at any of these and vote on what sounds appealing to you.

Other Sessions

There are way too many sessions to list all the interesting ones, but here are a few that caught my eye:

There are a bunch more that looked interesting to me, but I’ll skip listing them all here—just hop over to the OpenStack site and vote for the sessions you want to see.


This is part 15 of my Learning NSX blog series, in which I will spend some time diving a bit deeper into some of the components involved in the logical routing process I described in part 14. Specifically, I’ll be taking a deeper look at gateway appliances, gateway services, and logical routers, and the relationships among these various components.

If you haven’t read any of the prior posts in this series, it would be ideal to read all of them before continuing; you can find links on my Learning NVP/NSX page. In particular, I’d suggest reading part 6 (on adding a gateway appliance), part 9 (on adding a gateway service), and part 14 (on logical routing and logical routers).

Just for the sake of completeness and to reinforce what was introduced in those posts I referenced, let’s start with some terminology:

  • Gateway (or gateway appliance): When I use the terms gateway or gateway appliance, I’m referring to the NSX software gateway that acts as the “on-ramp/off-ramp” to and from logical networks. What makes this confusing is that we also use the term “gateway” (in particular, “IP gateway” or “default gateway”) to refer to a Layer 3 router that acts as the next hop for a system. I’ll do my best to make sure that I’m clearly distinguishing between these ambiguous uses.
  • Gateway service: A gateway service is a logical construct within NSX that allows you to group together multiple gateway appliances. For example, in an L2 gateway service, you can combine two gateway appliances so that you have redundancy in providing L2 bridging functionality between a logical network and a physical network. In an L3 gateway service, you can combine up to 10 gateway appliances together for redundancy and scale-out performance.
  • Logical router: As you might recall from part 14, a logical router is a logical construct within NSX that provides Layer 3 routing functionality, typically (but not always) on a per-tenant basis.

I have a few more terms I’ll introduce in this post, but that should be enough for now.

This diagram contains the bulk of what I’d like to discuss in this post—the relationship between gateway services, gateway appliances, and logical routers:

As I walk you through the details of this diagram, hopefully I’ll clarify the relationships between these components.

  • In this example, there are four gateway appliances combined into a single Layer 3 gateway service. As illustrated in the diagram, gateway services can contain more than one gateway appliance (the minimum recommended is two, for reasons to be explained shortly). Gateway services may be either Layer 2 (bridging/switching) or Layer 3 (routing), but not both.
  • A gateway appliance may be a member of only one gateway service at a time; therefore, a gateway appliance is either L2 or L3, but not both.
  • When adding a gateway appliance to a gateway service, the administrator or operator has the ability to specify a failure zone ID. The idea behind the failure zone ID is to help model fault domains within a single gateway service. For example, if GW Appliance 1 is in a different fault domain—say, a different rack—then the administrator or operator could assign a different failure zone ID to GW Appliance 1, indicating that GW Appliance 1 is in a different fault domain. The significance of this functionality will be made clear in a moment.
  • Note that gateway services, gateway appliances, and failure zone IDs are not visible to tenants. Further, the configuration or management of these entities is handled through NSX (via API or NSX Manager), and isn’t tenant-specific. The CMP—OpenStack, for example—doesn’t get involved here.
  • The example diagram shows four different logical routers spread across three tenants. Each of these logical routers acts as an IP gateway (default gateway/default route) for the associated (or connected) logical network(s). Thus, a logical router is visible to a tenant.
  • Creating, managing, and configuring logical routers is handled by the CMP. With OpenStack, for example, you’d use the OpenStack Dashboard or the Neutron command-line client.
  • For redundancy, you’ll note that each logical router is instantiated on two different gateway appliances within the gateway service (which is why a minimum of two gateway appliances within a gateway service is recommended). This is completely invisible to the tenant and is handled automatically by NSX. If failure zone IDs (indicating different fault domains) are configured on the gateway appliances, then NSX will instantiate the logical router on gateway appliances in different failure zones. This is an attempt to minimize downtime by spreading the logical router across fault domains.

So far, everything I’ve shared with you has been true for centralized logical routers. For distributed logical routers, things are only slightly different. Distributed logical routers are normally instantiated on the hypervisors; a gateway service and its associated gateway appliances get involved only when you set the uplink for the distributed logical router (using the “Set Gateway” button in OpenStack Dashboard, for example). If you never set an uplink for the logical router, it will remain instantiated only on the hypervisors, and not on the gateway service/gateway appliances.

I hope this information helps in understanding the routing aspects of VMware NSX. Feel free to post any questions, clarifications, or thoughts in the comments below. Any input on other topics you’d like to see in the Learning NSX blog series is welcome as well!


Welcome to part 14 of the Learning NSX blog series, in which I discuss the ability of VMware NSX to do Layer 3 routing in logical networks. This post also takes a closer look at a very cool feature within VMware NSX known as distributed logical routing, within the context of an OpenStack environment that’s been integrated with VMware NSX. (Although NSX isn’t necessarily tied to OpenStack, I’ll assume you’re using OpenStack just to simplify the discussion.)

If you’re new to this series, you can find links to all the articles on my Learning NVP/NSX page. Ideally, I’d recommend you read all the articles, but if you’re just interested in some of the high-level concepts you probably don’t need to do that. For those interested in the deep technical details, I’d suggest catching up on the series before proceeding.

Overview of Logical Routing

One of the features of VMware NSX that can be useful, depending on customer requirements, is the ability to create complex network topologies. For example, creating a multi-tier network topology like the one shown below is easily accomplished via VMware NSX:

Sample network topology

Note that this topology has two tenant-specific routing entities—these are logical routers. A logical router is an abstraction created and maintained by VMware NSX on behalf of your cloud management platform (like OpenStack, which I’ll assume you’re using here). These logical entities perform the routing process just like a physical router would (forwarding traffic based on a routing table, changing the source and destination MAC address, maintaining an ARP cache of MAC addresses, decrementing the TTL, etc.). Of course, they are not exactly the same as physical routers; you can’t, for example, connect two logical routers directly to each other.

Logical routers also act as the logical boundary between one or more logical networks and an external network. Logical routers can be connected to multiple logical networks (each logical network with its own logical router interface), but can only be connected to a single external network. Thus, you can’t use a logical router as a transit path between two external networks (two VLANs, for example).

Now that you have a good understanding of logical routing, let’s take a closer look at the various components inside VMware NSX.

Components of Logical Routing

The components are pretty straightforward. In addition to the logical router abstraction that I’ve discussed already, you also have logical router ports (naturally, these are the ports on a logical router that connect it to a logical network or an external network), network address translation (NAT) rules (for handling address translation tasks), and a routing table (for…well, routing).

You can see all of these components in NSX Manager. Once you’re logged into NSX Manager, select Network Components > Logical Layer > Logical Routers, then click on a specific logical router from the list. This will display the screen shown below:

Logical router detail in NSX Manager

A few things to note here:

  • You’ll note that the logical router has a port whose attachment is listed as “L3GW”. This denotes an attachment to a Layer 3 Gateway Service, an entity I described in part 9 of the series. This Layer 3 Gateway Service is itself comprised of two NSX gateway appliances; part 6 in the series discussed how to add a gateway appliance to your installation. The relationship between logical router, Layer 3 Gateway Service, and gateway appliance can be confusing for some; I plan to discuss that in more detail in the next post.
  • This particular logical router is not configured as a distributed logical router. This means that the actual routing function resides on a Layer 3 Gateway Service. The routing functionality is instantiated in a highly available configuration on two different gateway appliances within the Layer 3 Gateway Service.
  • NAT Synchronization is set to on; this refers to keeping NAT state synchronized between the active and standby routing functions instantiated on the gateway appliances.
  • As noted under Replication Mode, this router uses an NSX service node (refer to part 10 for more details on service nodes) for packet replication/BUM traffic.
  • You might notice that one of the logical router ports is assigned the IP address 169.254.169.253 (and you’ll also note a corresponding “no NAT” rule and routing table entries for that same network). Astute readers will recognize this as falling within the range used for Automatic Private IP Addressing (APIPA), also known as IPv4 Link-Local Addresses per RFC 3927 (169.254.0.0/16). This exists to support an OpenStack-specific feature known as the metadata service, and is created automatically by OpenStack. (I’ll talk more about OpenStack later in this post.)

All of these components and settings are accessible via the NSX API, and since NSX Manager is purely an API client (it merely consumes NSX APIs and does not provide standalone functionality outside of some logging features), you could create, modify, and delete any of the logical routing components directly within NSX Manager. (Or, if you were so inclined, you could make the API calls yourself to do these tasks.) Typically, though, these tasks would be handled via integration between NSX and your cloud management platform, like OpenStack.

One key component of NSX’s logical routing functionality that you can’t see in NSX Manager is how the routing is actually implemented in the data plane. As with most features in NSX, the actual data plane implementation is handled via Open vSwitch (OVS) and a set of flow rules pushed down by the NSX controllers. These flow rules control the flow of traffic within and between logical networks (logical switches in NSX). You can see some of the flow rules in OVS using the ovs-dpctl dump-flows command, which will produce output something like what’s shown in this screenshot (note that the addresses are highlighted because I used grep to show only the flows matching a certain IP address):

List of flows in OVS

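For reference, the filtering described above can be sketched as a small shell helper. This is only an illustration: the function name flows_for_ip and the address 10.1.1.5 are placeholders, and the helper simply wraps grep around whatever ovs-dpctl dump-flows prints.

```shell
#!/usr/bin/env bash
# Print only the datapath flows that mention a given IP address.
# Reads "ovs-dpctl dump-flows" output on stdin.
# (flows_for_ip is a hypothetical helper name, not part of OVS or NSX.)
flows_for_ip() {
    local ip="$1"
    # -F: treat the address as a literal string (dots are not wildcards)
    # -w: match whole words only, so 10.1.1.5 does not also match 10.1.1.50
    grep -wF -- "$ip"
}

# Typical use on a hypervisor or gateway appliance (usually requires root);
# 10.1.1.5 is a placeholder address:
#   ovs-dpctl dump-flows | flows_for_ip 10.1.1.5
```

Matching on whole words keeps the output limited to flows that reference exactly the address you care about, which helps when several VMs share an address prefix.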

These flow rules include actions like re-writing source and destination MAC addresses and decrementing the TTL, both tasks carried out by “normal” routers when routing traffic between networks. These flow rules also provide some insight into the differences between a logical router and a distributed logical router. While both are logical entities, the way in which the data plane is implemented is different for each:

  • For a logical router, the flow rules will direct traffic to the appropriate gateway appliance in the Layer 3 Gateway Service. The logical router is actually instantiated on a gateway appliance, so all routed traffic must go to the logical router, get “routed” (routing table consulted, source and destination MAC re-written, TTL decremented, NAT rules applied, etc.), then get sent on to the final destination (which might be a VM on a hypervisor in NSX or might be a physical network outside of NSX).
  • For a distributed logical router, the flow rules will direct traffic either to the appropriate gateway appliance in the Layer 3 Gateway Service or to the destination hypervisor directly. Why the “either/or”? If the traffic is north/south traffic—that is, traffic being routed out of a logical network onto the physical network—then it must go to the gateway appliance (which, as I have mentioned before, is where traffic is unencapsulated and placed onto the physical network). However, if the traffic is east/west traffic—traffic that is moving from one server on a logical network to another server on a logical network—then the traffic is “routed” directly on the source hypervisor and then sent across an encapsulated connection to the hypervisor where the destination VM resides.

In both cases, there is only one logical router. For a non-distributed logical router, the data plane is instantiated on a gateway appliance only. For a distributed logical router, the data plane is instantiated both on the local hypervisors and on a gateway appliance. (This is assuming you’ve set an uplink on the logical router, meaning you have a north/south connection. If you haven’t set an uplink, then the routing functionality is instantiated on the hypervisors only.)

This should provide a good overview of how logical routing is implemented in VMware NSX, but there’s one more aspect I want to cover: logical routers in OpenStack with NSX.

Logical Routers in OpenStack

As you work with OpenStack Networking—Neutron, as it’s commonly called—you’ll find that the abstractions Neutron uses map really well to the abstractions that NSX uses. So, to create a logical router in NSX, you just create a logical router in OpenStack. Attaching an OpenStack logical router to a logical network tells NSX to create the logical switch port, create the logical router port, and connect the two ports together.

In OpenStack, there are a number of different ways to create a logical router:

  • OpenStack Dashboard (Horizon)
  • Command-line interface (CLI)
  • OpenStack Orchestration (Heat) template
  • API calls directly

When using the web-based Dashboard user interface, you can only create centralized logical routers, not distributed logical routers. The Dashboard UI also doesn’t provide any way of knowing if a logical router is distributed or not; for that, you’ll need the CLI (the command is provided shortly).

On a system with the neutron CLI client installed, you can create a logical router like this:

neutron router-create <router name>

This creates a centralized logical router. If you want to create a distributed logical router, it’s as simple as this:

neutron router-create <router name> --distributed True

The neutron router-show command will return output about the specified logical router; that output will tell you if it is a distributed logical router.
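If you only want the distributed flag rather than the full table, one possible approach is to filter the router-show output. The sketch below assumes the usual table-style output with a row named "distributed"; the helper name router_distributed_flag and the router name are made up for illustration.

```shell
#!/usr/bin/env bash
# Extract the value of the "distributed" row from the table printed by
# "neutron router-show" (read on stdin).
# (router_distributed_flag is a hypothetical helper, not a neutron command.)
router_distributed_flag() {
    # Table rows look like: | distributed | True |
    # Splitting on "|" puts the field name in $2 and its value in $3.
    awk -F'|' '$2 ~ /^ *distributed *$/ { gsub(/ /, "", $3); print $3 }'
}

# Typical use ("tenant-router" is a placeholder router name):
#   neutron router-show tenant-router | router_distributed_flag
```

If the row is missing from the output, the helper prints nothing, which is worth checking for if the plugin in use does not expose the attribute.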

The neutron CLI client also offers commands to update a logical router’s routing table (to add or remove static routes, for example), or to connect a logical router to an external network (to set an uplink, in other words).
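As a rough sketch of what those commands might look like together, the script below attaches a router to a subnet, sets its uplink, and adds a static route. The router, subnet, and network names and the route itself are placeholders, and the exact syntax may vary between neutron client versions, so check neutron help for your release. The script runs in dry-run mode by default and only prints the commands.

```shell
#!/usr/bin/env bash
# Dry-run by default: each command is printed rather than executed.
# Set NEUTRON=neutron to run against a real OpenStack environment.
NEUTRON="${NEUTRON:-echo neutron}"

# Placeholder names; substitute your own.
ROUTER="tenant-router"
SUBNET="web-subnet"
EXT_NET="ext-net"

configure_router() {
    # Attach the router to a tenant subnet (creates a logical router port).
    $NEUTRON router-interface-add "$ROUTER" "$SUBNET"

    # Set the uplink, connecting the router to the external network.
    $NEUTRON router-gateway-set "$ROUTER" "$EXT_NET"

    # Add a static route to the router's routing table (example values).
    $NEUTRON router-update "$ROUTER" --routes type=dict list=true \
        destination=172.16.50.0/24,nexthop=10.1.1.254
}

configure_router
```

Note that router-update with --routes sets the whole list of static routes, so any existing routes you want to keep need to be included in the same call.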

If you want to create a logical router as part of a stack created via OpenStack Orchestration (Heat), you could define the router as a resource in a HOT-formatted YAML template and mark it as distributed.
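A minimal sketch of such a template follows, written out from a shell script so it can be handed straight to the heat client. Treat it as an assumption-laden illustration rather than a definitive template: the resource name, router name, file name, and stack name are placeholders, and passing the distributed attribute through value_specs is one plausible approach (newer Heat releases also expose distributed as a first-class router property).

```shell
#!/usr/bin/env bash
# Write a minimal HOT template describing a distributed logical router.
# File, resource, router, and stack names are placeholders.
TEMPLATE="${TEMPLATE:-router.yaml}"

cat > "$TEMPLATE" <<'EOF'
heat_template_version: 2013-05-23

description: Sketch of a distributed logical router (names are placeholders)

resources:
  tenant_router:
    type: OS::Neutron::Router
    properties:
      name: tenant-router
      # value_specs passes extra attributes through to the Neutron API;
      # this assumes the Neutron plugin accepts a "distributed" attribute.
      value_specs:
        distributed: true
EOF

# Dry-run by default: prints the command instead of creating a stack.
# Set HEAT=heat to run against a real OpenStack environment.
HEAT="${HEAT:-echo heat}"
$HEAT stack-create -f "$TEMPLATE" router-stack
```

Keeping the template in its own file also makes it easy to extend later with the router gateway and router interface resources.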

OpenStack Heat also offers resource types for setting the router’s external gateway and creating router interfaces (logical router ports). If you aren’t familiar with OpenStack Heat, you might find this introduction useful.

That wraps up this post on logical routing with VMware NSX. As always, I welcome your courteous feedback, so feel free to speak up in the comments below. In the next post, I’ll spend a bit of time discussing logical routers, gateway services, and gateway appliances. See you next time!


Welcome to Technology Short Take #42, another installment in my ongoing series of irregularly published collections of news, items, thoughts, rants, raves, and tidbits from around the Internet, with a focus on data center-related technologies. Here’s hoping you find something useful!

Networking

  • Anthony Burke’s series on VMware NSX continues with part 5.
  • Aaron Rosen, a Neutron contributor, recently published a post about a Neutron extension called Allowed-Address-Pairs and how you can use it to create high availability instances using VRRP (via keepalived). Very cool stuff, in my opinion.
  • Bob McCouch has a post over at Network Computing (where I’ve recently started blogging as well—see my first post) discussing his view on how software-defined networking (SDN) will trickle down to small and mid-sized businesses. He makes comparisons among server virtualization, 10 Gigabit Ethernet, and SDN, and feels that in order for SDN to really hit this market it needs to be “not a user-facing feature, but rather a means to an end” (his words). I tend to agree—focusing on SDN is focusing on the mechanism, rather than focusing on the problems the mechanism can address.
  • Want or need to use multiple external networks in your OpenStack deployment? Lars Kellogg-Stedman shows you how in this post on multiple external networks with a single L3 agent.

Servers/Hardware

  • There was some noise this past week about Cisco UCS moving into the top x86 blade server spot for North America in Q1 2014. Kevin Houston takes a moment to explore some ideas why Cisco was so successful in this post. I agree that Cisco had some innovative ideas in UCS—integrated management and server profiles come to mind—but my biggest beef with UCS right now is that it is still primarily a north/south (server-to-client) architecture in a world where east/west (server-to-server) traffic is becoming increasingly critical. Can UCS hold on in the face of a fundamental shift like that? I don’t know.

Security

  • Need to scramble some data on a block device? Check out this command. (I love the commandlinefu.com site. It reminds me that I still have so much yet to learn.)

Cloud Computing/Cloud Management

  • Want to play around with OpenDaylight and OpenStack? Brent Salisbury has a write-up on how to run OpenStack Icehouse (via DevStack) together with OpenDaylight.
  • Puppet Labs has released a module that allows users to programmatically (via Puppet) provision and configure Google Cloud Platform (GCP) instances. More details are available in the Puppet Labs blog post.
  • I love how developers come up with these themes around certain projects. Case in point: “Heat” is the name of the project for orchestrating resources in OpenStack, HOT is the name for the format of Heat templates, and Flame is the name of a new project to automatically generate Heat templates.

Operating Systems/Applications

  • I can’t imagine that anyone has been immune to the onslaught of information on Docker, but here’s an article that might be helpful if you’re still looking for a quick and practical introduction.
  • Many of you are probably familiar with Razor, the project that former co-workers Nick Weaver and Tom McSweeney created when they were at EMC. Tom has since moved on to CSC (via the vCHS team at VMware) and has launched a “next-generation” version of Razor called Hanlon. Read more about Hanlon and why this is a new/separate project in Tom’s blog post here.
  • Looking for a bit of clarity around CoreOS and Project Atomic? I found this post by Major Hayden to be extremely helpful and informative. Both of these projects are on my radar, though I’ll probably focus on CoreOS first as the (currently) more mature solution.
  • Linux Journal has a nice multi-page write-up on Docker containers that might be useful if you are still looking to understand Docker’s basic building blocks.
  • I really enjoyed Donnie Berkholz’ piece on microservices and the migrating Unix philosophy. It was a great view into how composability can (and does) shift over time. Good stuff, I highly recommend reading it.
  • cURL is an incredibly useful utility, especially in today’s age of HTTP-based REST APIs. Here’s a list of 9 uses for cURL that are worth knowing. This article on testing REST APIs with cURL is handy, too.
  • And for something entirely different…I know that folks love to beat up AppleScript, but it’s cross-application tasks like this that make it useful.

Storage

  • Someone recently brought the open source Open vStorage project to my attention. Open vStorage compares itself to VMware VSAN, but with support for multiple storage backends and multiple hypervisors. Like a lot of other solutions, it’s implemented as a VM that presents NFS back to the hypervisors. If anyone out there has used it, I’d love to hear your feedback.
  • Erik Smith at EMC has published a series of articles on “virtual storage networks.” There’s some interesting content there—I haven’t finished reading all of the posts yet, as I want to be sure to take the time to digest them properly. If you’re interested, I suggest starting out with his introductory post (which, strangely enough, wasn’t the first post in the series), then moving on to part 1, part 2, and part 3.

Virtualization

  • Did you happen to see this write-up on migrating a VMware Fusion VM to VMware’s vCloud Hybrid Service? For now—I believe there are game-changing technologies out there that will alter this landscape—one of the very tangible benefits of vCHS is its strong interoperability with your existing vSphere (and Fusion!) workloads.
  • Need a listing of the IP addresses in use by the VMs on a given Hyper-V host? Ben Armstrong shares a bit of PowerShell code that produces just such a listing. As Ben points out, this can be pretty handy when you’re trying to track down a particular VM.
  • vCenter Log Insight 2.0 was recently announced; Vladan Seget has a decent write-up. I’m thinking of putting this into my home lab soon for gathering event information from VMware NSX, OpenStack, and the underlying hypervisors. I just need more than 24 hours in a day…
  • William Lam has an article on lldpnetmap, a little-known utility for mapping ESXi interfaces to physical switches. As the name implies, this relies on LLDP, so switches that don’t support LLDP or that don’t have LLDP enabled won’t work correctly. Still, a useful utility to have in your toolbox.
  • Technology previews of the next versions of Fusion (Fusion 7) and Workstation (Workstation 11) are available; see Eric Sloof’s articles (here and here for Fusion and Workstation, respectively) for more details.
  • vSphere 4 (and associated pieces) are no longer under general support. Sad face, but time stops for no man (or product).
  • Having some problems with VMware Fusion’s networking? Cody Bunch channels his inner Chuck Norris to kick VMware Fusion networking in the teeth.
  • Want to preview OS X Yosemite? Check out William Lam’s guide to using Fusion or vSphere to preview the new OS X beta release.

I’d better wrap this up now, or it’s going to turn into one of Chad’s posts. (Just kidding, Chad!) Thanks for taking the time to read this far!


Welcome to Technology Short Take #41, the latest in my series of random thoughts, articles, and links from around the Internet. Here’s hoping you find something useful!

Networking

  • Network Functions Virtualization (NFV) is a networking topic that is starting to get more and more attention (some may equate “attention” with “hype”; I’ll allow you to draw your own conclusion there). In any case, I liked how this article really hit upon what I personally feel is something many people are overlooking in NFV. Many vendors are simply rushing to provide virtualized versions of their solution without addressing the orchestration and automation side of the house. I’m looking forward to part 2 on this topic, in which the author plans to share more technical details.
  • Rob Sherwood, CTO of Big Switch, recently published a reasonably in-depth look at “modern OpenFlow” implementations and how they can leverage multiple tables in hardware. Some good information in here, especially on OpenFlow basics (good for those of you who aren’t familiar with OpenFlow).
  • Connecting Docker containers to Open vSwitch is one thing, but what about using Docker containers to run Open vSwitch in userspace? Read this.
  • Ivan knocks centralized SDN control planes in this post. It sounds like Ivan favors scale-out architectures, not scale-up architectures (which are typically what is seen in centralized control plane deployments).
  • Looking for more VMware NSX content? Anthony Burke has started a new series focusing on VMware NSX in pure vSphere environments. As far as I can tell, Anthony is up to 4 posts in the series so far. Check them out here: part 1, part 2, part 3, and part 4. Enjoy!

Servers/Hardware

  • Good friend Simon Seagrave is back to the online world again with this heads-up on a potential NIC issue with an HP Proliant firmware update. The post also contains a link to a fix for the issue. Glad to see you back again, Simon!
  • Tom Howarth asks, “Is the x86 blade server dead?” (OK, so he didn’t use those words specifically. I’m paraphrasing for dramatic effect.) The basic premise of Tom’s position is that new technologies like server-side caching and VSAN/Ceph/Sanbolic (turning direct-attached storage into shared storage) will dramatically change the landscape of the data center. I would generally agree, although I’m not sure that I agree with Tom’s statement that “complexity is reduced” with these technologies. I think we’re just shifting the complexity to a different place, although it’s a place where I think we can better manage the complexity (and perhaps mask it). What do you think?

Security

Cloud Computing/Cloud Management

  • Juan Manuel Rey has launched a series of blog posts on deploying OpenStack with KVM and VMware NSX. He has three parts published so far; all good stuff. See part 1, part 2, and part 3.
  • Kyle Mestery brought to my attention (via Twitter) this list of the “best newly-available OpenStack guides and how-to’s”. It was good to see a couple of Cody Bunch’s articles on the list; Cody’s been producing some really useful OpenStack content recently.
  • I haven’t had the opportunity to use SaltStack yet, but I’m hearing good things about it. It’s always helpful (to me, at least) to be able to look at products in the context of solving a real-world problem, which is why seeing this post with details on using SaltStack to automate OpenStack deployment was helpful.
  • Here’s a heads-up on a potential issue with the vCAC 6.0.1.1 upgrade—the upgrade apparently changes some configuration files. The linked blog post provides more details on which files get changed. If you’re looking at doing this upgrade, read this to make sure you aren’t adversely affected.
  • Here’s a post with some additional information on OpenStack live migration that you might find useful.

Operating Systems/Applications

  • RHEL7, Docker, and Puppet together? Here’s a post on just such a use case (oh, I forgot to mention OpenStack’s involved, too).
  • Have you ever walked through a spider web because you didn’t see it ahead of time? (Not very fun.) Sometimes I feel that way with certain technologies or projects—like there are connections there with other technologies, projects, trends, etc., that aren’t quite “visible” just yet. That’s where I am right now with the recent hype around containers and how they are going to replace VMs. I’m not so sure I agree with that just yet…but I have more noodling to do on the topic.

Storage

  • “Server SAN” seems to be the name that is emerging to describe various technologies and architectures that create pools of storage from direct-attached storage (DAS). This would include products like VMware VSAN as well as projects like Ceph and others. Stu Miniman has a nice write-up on Server SAN over at Wikibon; if you’re not familiar with some of the architectures involved, that might be a good place to start. Also at Wikibon, David Floyer has a write-up on the rise of Server SAN that goes into a bit more detail on business and technology drivers, friction to adoption, and some recommendations.
  • Red Hat recently announced they were acquiring Inktank, the company behind the open source scale-out Ceph project. Jon Benedict, aka “Captain KVM,” weighs in with his thoughts on the matter. Of course, there’s no shortage of thoughts on the acquisition—a quick web search will prove that—but I find it interesting that none of the “big names” in storage social media had anything to say (not that I could find, anyway). Howard? Stephen? Chris? Martin? Bueller?

Virtualization

  • Doug Youd pulled together a nice summary of some of the issues and facts around routed vMotion (vMotion across layer 3 boundaries, such as across a Clos fabric/leaf-spine topology). It’s definitely worth a read (and not just because I get mentioned in the article, either—although that doesn’t hurt).
  • I’ve talked before—although it’s been a while—about Hyper-V’s choice to rely on host-level NIC teaming in order to provide network link redundancy to virtual machines. Ben Armstrong talks about another option, guest-level NIC teaming, in this post. I’m not so sure that using guest-level teaming is any better than relying on host-level NIC teaming; what’s really needed is a more full-featured virtual networking layer.
  • Want to run nested ESXi on vCHS? Well, it’s not supported…but William Lam shows you how anyway. Gotta love it!
  • Brian Graf shows you how to remove IP pools using PowerCLI.

Well, that’s it for this time around. As always, I welcome all courteous comments, so feel free to share your thoughts, ideas, rants, links, or feedback in the comments below.
