In this post, I’ll describe a technique I found for simplifying the use of multi-machine Vagrantfiles by extracting configuration data into a separate YAML file. This technique is by no means something that I invented or created, so I can’t take any credit whatsoever; this is an idea I first saw here. I wanted to share it here in the hopes that it might prove useful to a larger audience.

If you aren’t familiar with Vagrant and Vagrantfiles, you might start with my quick introduction to Vagrant.

I found this technique after trying to find a way to simplify the creation of multiple machines using Vagrant. Specifically, I was trying to create multiple instances of CoreOS along with an Ubuntu instance for testing things like etcd, fleet, Docker, etc. The Vagrantfile was getting more and more complex, and making changes (to add another CoreOS node, for example) wasn’t as straightforward as I would have liked.

So what’s the fix? As with other DSLs (domain-specific languages) such as Puppet, the fix was found in separating the data from the code. In Puppet, that means parameterizing the module or class, and I needed to use a similar technique here with Vagrant. So, based on the example provided here, I came up with this Vagrantfile:
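The embedded code block from the original post isn't reproduced here, but a Vagrantfile along these lines captures the idea. This is a reconstruction based on the description below, not the original gist; the YAML key names are my own assumptions:

```ruby
# -*- mode: ruby -*-
# vi: set ft=ruby :

# Reconstruction/sketch: all machine-specific data lives in servers.yaml;
# the Vagrantfile just loads that file and iterates over its entries.
# Key names ("name", "box", "ram", "ip") are assumptions for illustration.
require 'yaml'

servers = YAML.load_file(File.join(File.dirname(__FILE__), 'servers.yaml'))

Vagrant.configure('2') do |config|
  servers.each do |server|
    config.vm.define server['name'] do |srv|
      srv.vm.box = server['box']
      srv.vm.network 'private_network', ip: server['ip']
      srv.vm.provider 'virtualbox' do |vb|
        vb.memory = server['ram']
      end
    end
  end
end
```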


You’ll note that there is no machine-specific data in this Vagrantfile; all the pertinent configuration data is found in an external YAML file named servers.yaml (the name is configurable within the Vagrantfile). The Vagrantfile simply retrieves the data from the YAML file and iterates over it.

Here’s a sample servers.yaml you could use with this Vagrantfile:
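The original sample isn't reproduced in this version of the post, but based on the description that follows (three CoreOS VMs, each with 512MB of RAM and its own IP address), a plausible reconstruction looks like this; the key names, box name, and IP addresses are my own assumptions:

```yaml
# Reconstructed sample servers.yaml; key names, box name, and IP
# addresses are illustrative assumptions, not the original contents.
- name: coreos-01
  box: coreos-alpha
  ram: 512
  ip: 192.168.100.101
- name: coreos-02
  box: coreos-alpha
  ram: 512
  ip: 192.168.100.102
- name: coreos-03
  box: coreos-alpha
  ram: 512
  ip: 192.168.100.103
```

Adding a fourth CoreOS node is then just a matter of appending another four-line entry to this file.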


Using the sample servers.yaml file shown above, the Vagrantfile would create three VMs, using the CoreOS Alpha Vagrant box, each with 512MB of RAM and the corresponding IP address. To add more systems to the configuration, simply add lines to the YAML file and you’re done. It’s as simple as that.

Clearly, there’s a lot more that could be done with this sort of approach, but I think this bare-bones example should give you an idea of how flexible something like this can be.

For an even more supercharged approach, check out Oh My Vagrant!. It takes the approach I’ve described here to an entirely new level.


It’s no secret that I’m something of a photography enthusiast. To me, photography is a relaxing puzzle of how to assemble all the various pieces—setting, lighting, exposure, composition, etc.—to create just the right image. I’m not an expert, but that’s OK; I just do this for fun and to relax. If you’d like to see a small sampling of some of the photos I’ve taken, I publish some of them here on 500px.com.

I know that a fair number of folks in the IT industry are also photo enthusiasts, and so I was curious to hear some feedback from fellow enthusiasts about their photography workflows. In particular, I’m curious to know about how others answer these questions:

  • What formats do you use with your photos? (I’ve been shooting in RAW—NEF, specifically, since I’m a Nikon guy—then converting to Adobe DNG for use with Lightroom.)
  • How do you handle long-term storage of your photos? (Once I have the photos in DNG in Lightroom, then I’ve been archiving the RAW files on my Synology NAS.)
  • What pictures do you keep—all of them, or only the best ones? (So far, I’ve been keeping all the RAW files, archiving when necessary, but pruning down the DNGs stored in Lightroom.)
  • How do you keep your files organized on disk? (I’ve been organizing mine by date, since this works well with the date-based filenaming convention I use.)

I think I’ve (finally) settled into a reasonable routine/approach, but it’s always good to hear how others are doing it. You never know when you might learn something new! If you’re a photo enthusiast like me—or even if you consider yourself something more, like a professional photographer—I’d be curious to know more about your workflow. Feel free to speak up in the comments below. All courteous comments are welcome!


This is part 16 of the Learning NSX series, in which I will show you how to configure VMware NSX to route to multiple external VLANs. This configuration will allow you to have logical routers that could be uplinked to any of the external VLANs, providing additional flexibility for consumers of NSX logical networks.

Naturally, this post builds on all the previous entries in this series, so I encourage you to visit the Learning NVP/NSX page for links to previous posts. Because I’ll specifically be discussing NSX gateways and routing, there are some posts that are more applicable than others; specifically, I strongly recommend reviewing part 6, part 9, part 14, and part 15. Additionally, I’ll assume you’re using VMware NSX with OpenStack, so reviewing part 11 and part 12 might also be helpful.

Ready? Let’s start with a very quick review.

Review of NSX Gateway Connectivity

You may recall from part 6 that the NSX gateway appliance is the piece of VMware NSX that handles traffic into or out of logical networks. As such, the NSX gateway appliance is something of a “three-legged” appliance:

  • One “leg” (network interface) provides management connectivity between the gateway appliance and the nodes in the NSX controller cluster
  • One “leg” provides connectivity to the transport network, which carries the encapsulated logical network traffic
  • One “leg” is the uplink and provides connectivity to physical networks

That’s the physical architecture. From a more logical perspective, you may recall from part 15 that NSX gateway appliances are combined into an NSX gateway service, and the NSX gateway service hosts one or more logical routers. Neither the NSX gateway appliance nor the NSX gateway service is visible to the consumers of the environment; they are visible only to the operators and/or administrators. Consumers see only logical routers, which also serve as the default gateway/default route/IP gateway to/from their logical networks.

The configurations I’ve discussed so far have assumed the presence of only a single uplink. NSX is not constrained to a single uplink, nor is it constrained to a single physical network on an uplink. If you need multiple networks on the outside of an NSX gateway appliance, you can either use multiple uplinks, or you can use multiple VLANs on a single uplink. In this post I’ll show you how to use multiple VLANs on the outside. This diagram provides a graphical representation of what the configuration will look like.

Multiple VLANs with NSX Gateways


Setting up this configuration will involve three steps:

  1. Configuring the uplink to carry multiple VLANs.
  2. Verifying the gateway configuration.
  3. Setting up the external networks in OpenStack.

Let’s take a look at each of these steps.

Configuring the Uplink to Carry Multiple VLANs

The process for this step will vary, mostly because it involves configuring your physical network to pass the appropriate VLANs to the NSX gateway appliance. I’ve written a few articles in the past that might be helpful here.

Although the titles of some of these articles seem to imply they are VMware-specific, they aren’t—the physical switch configuration is absolutely applicable here.

Verifying the Gateway Configuration

No special configuration is required on the NSX gateway appliance. As you probably already know, the NSX gateway appliance leverages Open vSwitch (OVS). OVS ports are, by default, trunk ports, and therefore will carry the VLAN tags passed by a properly configured physical switch. Further, the OVS bridge for the external uplink (typically breth1 or breth2) doesn’t need an IP address assigned to it. This is because the IP address(es) for logical routing are assigned to the logical routers, not the NSX gateway appliance’s interface. If you do have IP addresses assigned to the external uplink interface, you can safely remove them. If you prefer to leave them, that’s fine too.
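If you’d like to double-check, you can inspect the OVS configuration on the gateway appliance. This is a hedged sketch; the bridge name (breth1) and port name (eth1) are assumptions, so adjust them for your environment:

```shell
# List the OVS bridges and the ports attached to the uplink bridge.
ovs-vsctl list-br
ovs-vsctl list-ports breth1

# An OVS port with no "tag" set behaves as a trunk port, so "[]" here
# means the port will carry whatever VLAN tags the physical switch passes.
ovs-vsctl get port eth1 tag
```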

As a side note, the NSX gateway appliances do support configuring VLAN sub-interfaces using a command like this:

add network interface <physical interface> vlan <VLAN ID>

Thus far, I haven’t found a need to use VLAN sub-interfaces when using multiple VLANs on the outside of an NSX gateway appliance, but I did want to point out that this functionality does indeed exist.

Setting up the External Networks

This is the only moderately tricky part of the configuration. In this step, you’ll prepare multiple external networks that can be used as uplinks for logical routers.

The command you’ll want to use (yes, you have to use the CLI—this functionality isn’t exposed in the OpenStack Dashboard web interface) looks like this:

neutron net-create <network name> \
--router:external=True --provider:network_type l3_ext \
--provider:segmentation_id <VLAN ID> \
--provider:physical_network=<NSX gateway service UUID> --shared=True

For the most part, this command is pretty straightforward, but let’s break it down nevertheless:

  • The router:external=True attribute tells Neutron this network can be used as the external (uplink) connection on a logical router.
  • The provider:network_type l3_ext is an NSX-specific extension that enables Neutron to work with the layer 3 (routing) functionality of the NSX gateway appliances.
  • The provider:segmentation_id portion provides the VLAN ID that should be associated with this particular external network. This VLAN ID should be one of the VLAN IDs that is trunked across the connection from the physical switch to the NSX gateway appliance.
  • The provider:physical_network portion tells OpenStack which specific NSX gateway service to use. This is important to note: this command references an NSX gateway service, not an NSX gateway appliance. Refer to part 15 if you’re unclear on the difference.

You’d repeat this command for each external network (VLAN) you want connected to NSX and usable inside OpenStack.
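For example, creating external networks for two VLANs might look like this. All of the values shown here (network names, VLAN IDs, and the gateway service UUID) are made-up placeholders, not values from a real environment:

```shell
# Hypothetical example: one external network per uplink VLAN, both
# backed by the same NSX gateway service (UUID is a placeholder).
neutron net-create ext-vlan-100 \
--router:external=True --provider:network_type l3_ext \
--provider:segmentation_id 100 \
--provider:physical_network=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee --shared=True

neutron net-create ext-vlan-200 \
--router:external=True --provider:network_type l3_ext \
--provider:segmentation_id 200 \
--provider:physical_network=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee --shared=True
```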

For each Neutron network, you’ll also need a Neutron subnet. The command to create a subnet on one of these external networks looks like this:

neutron subnet-create <network name> <CIDR> \
--name <subnet name> --enable_dhcp=False \
--allocation-pool start=<starting IP address>,end=<ending IP address>

The range of IP addresses specified in the allocation-pool portion of the command becomes the range of addresses from this particular subnet that can be assigned as floating IPs. It is also the pool of addresses from which logical routers will pull an address when they are connected to this particular external network.

When you’re done creating an external network and subnet for each VLAN on the outside of the NSX gateway appliance, then your users (consumers) can simply create logical routers as usual, and then select from one of the external networks as an uplink for their logical routers. This assumes you included the shared=True portion of the command when creating the network; if desired, you can omit that and instead specify a tenant ID, which would assign the external network to a specific tenant only.

I hope you find this post to be useful. If you have any questions, corrections, or clarifications, please speak up in the comments. All courteous comments are welcome!


Welcome to Technology Short Take #45. As usual, I’ve gathered a collection of links to various articles pertaining to data center-related technologies for your enjoyment. Here’s hoping you find something useful!

Networking

  • Cormac Hogan has a list of a few useful NSX troubleshooting tips.
  • If you’re not really a networking pro and need a “gentle” introduction to VXLAN, this post might be a good place to start.
  • Also along those lines—perhaps you’re a VMware administrator who wants to branch into networking with NSX, or you’re a networking guru who needs to learn more about how this NSX stuff works. vBrownBag has been running a VCP-NV series covering various objectives from the VCP-NV exam. Check them out—objective 1, objective 2, objective 3, and objective 4 have been posted so far.

Servers/Hardware

  • I’m going to go out on a limb and make a prediction: in a few years’ time (let’s say 3–5 years), Intel SGX (Software Guard Extensions) will be regarded as being as important as, if not more important than, the virtualization extensions. What is Intel SGX, you ask? See here, here, and here for a breakdown of the SGX design objectives. Let’s be real—the ability for an application to protect itself (and its data) from rogue software (including a compromised or untrusted operating system) is huge.

Security

  • CloudFlare (disclaimer: I am a CloudFlare customer) recently announced Keyless SSL, a technique for allowing organizations to take advantage of SSL offloading without relinquishing control of their private keys. CloudFlare followed that announcement with a nitty-gritty technical details post that describes how it works. I’d recommend reading the technical post just to get a good education on how encryption and TLS work, even if you’re not a CloudFlare customer.

Cloud Computing/Cloud Management

  • William Lam spent some time working with some “new age” container cluster management tools (specifically, govmomi, govc CLI, and Kubernetes on vSphere) and documented his experience here and here. Excellent stuff!
  • YAKA (Yet Another Kubernetes Article), this time looking at Kubernetes on CoreOS on OpenStack. (How’s that for buzzword bingo?)
  • This analytical evaluation of Kubernetes might be helpful as well.
  • Stampede.io looks interesting; I got a chance to see it live at the recent DigitalOcean-CoreOS meetup in San Francisco. Here’s the Stampede.io announcement post.

Operating Systems/Applications

  • Trying to wrap your head around the concept of “microservices”? Here’s a write-up that attempts to provide an introduction to microservices. An earlier blog post on cloud native software is pretty good, too.
  • Here’s a very nice collection of links about Docker, ranging from how to use Docker to how to use the Docker API and how to containerize your application (just to name a few topics).
  • Here’s a great pair of articles (part 1 and part 2) on microservices and Platform-as-a-Service (PaaS). This is really good stuff, especially if you are trying to expand your boundaries learning about cloud application design patterns.
  • This article by CenturyLink Labs—which has been doing some nice stuff around Docker and containers—talks about how to containerize your legacy applications.
  • Here’s a decent write-up on comparing LXC and Docker. There are also some decent LXC-specific articles on the site as well (see the sidebar).
  • Service registration (and discovery) in a micro-service architecture can be challenging. Jeff Lindsay is attempting to help address some of the challenges with Registrator; more information is available here.
  • Unlike a lot of Docker-related blog posts, this post by RightScale on combining VMs and containers for better cloud portability is a well-written piece. The pros and cons of using containers are discussed fairly, without hype.
  • Single-process containers or multi-process containers? This site presents a convincing argument for multi-process containers; have a look.
  • Tired of hearing about containers yet? Oh, come on, you know you love them! You love them so much you want to run them on your OS X laptop. Well…read this post for all the gory details.

Storage

  • The storage aspect of Docker isn’t typically discussed in a lot of detail, other than perhaps focusing on the need for persistent storage via Docker volumes. However, this article from Red Hat does a great job (in my opinion) of exploring storage options for Docker containers and how these options affect performance and scalability. Looks like OverlayFS is the clear winner; it will be great when OverlayFS is in the upstream kernel and supported by Docker. (Oh, and if you’re interested in more details on the default device mapper backend, see here.)
  • This is a nice write-up on Riverbed SteelFusion, aka “Granite.”

Virtualization

  • Azure Site Recovery (ASR) is similar to vCloud Air’s Disaster Recovery service, though obviously tailored toward Hyper-V and Windows Server (which is perfectly fine for organizations that are using Hyper-V and Windows Server). To help with the setup of ASR, the Azure team has a write-up on the networking infrastructure setup for Microsoft Azure as a DR site.
  • PowerCLI in the vSphere Web Client, eh? Interesting. See Alan Renouf’s post for full details.
  • PernixData recently released version 2.0 of FVP; Frank Denneman has all the details here.

That’s it for this time, but be sure to visit again for future episodes. Until then, feel free to start (or join in) a discussion in the comments below. All courteous comments are welcome!


One of the great things about this site is the interaction I enjoy with readers. It’s always great to get comments from readers about how an article was informative, answered a question, or helped solve a problem. Knowing that what I’ve written here is helpful to others is a very large part of why I’ve been writing here for over 9 years.

Until today, I’ve left comments (and sometimes trackbacks) open on very old blog posts. Just the other day I received a comment on a 4-year-old article in which a reader shared another way to solve the same problem. Unfortunately, that has to change. Comment spam on the site has grown considerably over the last few months, despite the use of a number of plugins to help address the issue. It’s no longer just an annoyance; it’s now a problem.

As a result, starting today, all blog posts more than 3 years old will automatically have their comments and trackbacks closed. I hate to do it—really I do—but I don’t see any other solution to the increasing blog spam.

I hope that this does not adversely impact my readers’ ability to interact with me, but it is a necessary step.

Thanks to all who continue to read this site. I do sincerely appreciate your time and attention, and I hope that I can continue to provide useful and relevant content to help make people’s lives better.


You may have heard of Intel Rack-Scale Architecture (RSA), a new approach to designing data center hardware. This is an idea that was discussed extensively a couple of weeks ago at Intel Developer Forum (IDF) 2014 in San Francisco, which I had the opportunity to attend. (Disclaimer: Intel paid my travel and hotel expenses to attend IDF.)

Of course, IDF 2014 wasn’t the first time I’d heard of Intel RSA; it was also discussed last year. However, this year I had the chance to really dig into what Intel is trying to accomplish through Intel RSA—note that I’ll use “Intel RSA” instead of just “RSA” to avoid any confusion with the security company—and I wanted to share some of my thoughts and conclusions here.

Intel always seems to present Intel RSA as a single entity that is made up of a number of other technologies/efforts; specifically, Intel RSA is typically presented as:

  • Disaggregation of the compute, memory, and storage capacity in a rack
  • Silicon photonics as a low-latency, high-speed rack-scale fabric
  • Some software that combines disaggregated hardware capacity over a rack-scale fabric to create “pooled systems”

When you look at Intel RSA this way—and this is the way that it is typically positioned and described—it just doesn’t seem to make sense. What’s the benefit to consumers? Why should they buy this architecture instead of just buying servers? The Intel folks will talk about right-sizing servers to the workload, but that’s not really a valid argument for the vast majority of workloads that are running virtualized on a hypervisor (or containerized using some form of container technologies). Greg Ferro (of Packet Pushers fame) was also at the show, and we both shared some of these same concerns over Intel RSA.

Determined to dig a bit deeper into this, I went down to the Intel RSA booth on the IDF show floor. The Intel guys were kind enough to spend quite a bit of time with me talking in-depth about Intel RSA. What I came away with from that discussion is that Intel RSA is really about three related but not interdependent initiatives:

  • Disaggregating server hardware
  • Establishing hardware-level APIs
  • Enabling greater software intelligence and flexibility

I think the “related but not interdependent” phrase is really important here. What Intel’s proposing with Intel RSA doesn’t require disaggregated hardware (i.e., shelves of CPUs, memory, and storage); it could be done with standard 1U/2U rack-mount servers. Intel RSA doesn’t require silicon photonics; it could leverage any number of low-latency high-speed fabrics (including PCI Express and 40G/100G Ethernet). Hardware disaggregation and silicon photonics are only potential options for how you might assemble the hardware layer.

The real key to Intel RSA, in my opinion, is the hardware-level APIs. Intel has already teamed up with a couple of vendors to create a draft hardware API specification called Redfish. If Intel is successful in building a standard hardware-level API that is consistent across OEM platforms, then the software running on that hardware (which could be traditional operating systems, hypervisors, or container hosts) can leverage greater visibility into what’s happening in the hardware. This greater visibility, in turn, allows the software to be more intelligent about how, where, and when workloads (VMs, containers, standard processes/threads) get scheduled onto that hardware. You can imagine the sorts of enhanced scheduling that becomes possible as hypervisors, for example, become more aware and more informed about what’s happening inside the hardware.
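To make this idea concrete, here’s a deliberately simplified sketch (my own illustration, not the actual Redfish schema or any Intel API) of how scheduling software might consume a standardized, Redfish-style hardware inventory to make smarter placement decisions:

```python
import json

# Hypothetical inventory, loosely shaped like what a standardized
# hardware-level API might return; the field names are invented for
# illustration and do not match the real Redfish schema.
inventory_json = """
{
  "Systems": [
    {"Id": "node1", "MemoryGiB": 256, "CpuCores": 32, "PowerWatts": 310},
    {"Id": "node2", "MemoryGiB": 128, "CpuCores": 16, "PowerWatts": 140},
    {"Id": "node3", "MemoryGiB": 512, "CpuCores": 64, "PowerWatts": 450}
  ]
}
"""

def pick_node(systems, min_mem_gib, min_cores):
    """Pick the lowest-power node that meets the workload's requirements."""
    candidates = [s for s in systems
                  if s["MemoryGiB"] >= min_mem_gib and s["CpuCores"] >= min_cores]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s["PowerWatts"])["Id"]

systems = json.loads(inventory_json)["Systems"]
print(pick_node(systems, min_mem_gib=200, min_cores=24))  # -> node1 (node3 also fits, but draws more power)
```

The point isn’t the toy scheduler itself; it’s that a consistent, cross-OEM API makes this kind of hardware-aware decision-making possible for any hypervisor or container host.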

Similarly, if Intel is successful in establishing hardware-level APIs that are consistent across OEM platforms, then the idea of assembling a “composable pooled system” from hardware resources—in the form of disaggregated hardware or in the form of resources spread across traditional 1U/2U servers and connected via a low-latency high-speed fabric—now becomes possible as well. This is why I say that the hardware disaggregation piece is only an optional part of Intel RSA: with the widespread adoption of standardized hardware-level APIs, the same thing can be achieved without hardware disaggregation.

In my mind, Intel RSA is more of an example use case than a technology in and of itself. The real technologies here—disaggregated hardware, hardware-level APIs, and high-speed fabrics—can be leveraged/assembled in a variety of ways. Building rack-scale architectures using these technologies is just one use case.

I hope that this helps explain Intel RSA in a way that is a bit more consumable and understandable. If you have any questions, corrections, or clarifications, please feel free to speak up in the comments.


This post will provide a quick introduction to a tool called Vagrant. Unless you’ve been hiding under a rock—or, more likely, been too busy doing real work in your data center to pay attention—you’ve probably heard of Vagrant. Maybe, like me, you had some ideas about what Vagrant is (or isn’t) and what it does (or doesn’t) do. Hopefully I can clear up some of the confusion in this post.

In its simplest form, Vagrant is an automation tool with a domain-specific language (DSL) that is used to automate the creation of VMs and VM environments. The idea is that a user can create a set of instructions, using Vagrant’s DSL, that will set up one or more VMs and possibly configure those VMs. Every time the user uses the precreated set of instructions, the end result will look exactly the same. This can be beneficial for a number of use cases, including developers who want a consistent development environment or folks wanting to share a demo environment with other users.

Vagrant makes this work by using a number of different components:

  • Providers: These are the “back end” of Vagrant. Vagrant itself doesn’t provide any virtualization functionality; it relies on other products to do the heavy lifting. Providers are how Vagrant interacts with the products that will do the actual virtualization work. A provider could be VirtualBox (included by default with Vagrant), VMware Fusion, Hyper-V, vCloud Air, or AWS, just to name a few.
  • Boxes: At the heart of Vagrant are boxes. Boxes are the predefined images that are used by Vagrant to build the environment according to the instructions provided by the user. A box may be a plain OS installation, or it may be an OS installation plus one or more applications installed. Boxes may support only a single provider or may support multiple providers (for example, a box might only work with VirtualBox, or it might support VirtualBox and VMware Fusion). It’s important to note that multi-provider support by a box is really handled by multiple versions of a box (i.e., a version supporting VirtualBox, a version supporting AWS, and a version supporting VMware Fusion). A single box supports a single provider.
  • Vagrantfile: The Vagrantfile contains the instructions from the user, expressed in Vagrant’s DSL, on what the environment should look like—how many VMs, what type of VM, the provider, how they are connected, etc. Vagrantfiles are so named because the actual filename is Vagrantfile. The Vagrant DSL (and therefore Vagrantfiles) are based on Ruby.

One of the first things I thought about as I started digging into Vagrant was that Vagrant would be a tool that would help streamline moving applications/code from development to production. After all, if you had providers for Vagrant that supported both VirtualBox and VMware vCenter, and you had boxes that supported both providers, then you could write a single Vagrantfile that would instantiate the same environment in development and in production. Cool, right? In theory this is possible, but in talking with some others who are much more familiar with Vagrant than I am it seems that in practice this is not necessarily the case. Because support for multiple providers is handled by different versions of a box (as outlined above), the boxes may be slightly different and therefore may not produce the exact same results from a single Vagrantfile. It is possible to write the Vagrantfile in such a way as to recognize different providers and react differently, but this obviously adds complexity.
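As an illustration of that last point, here’s a minimal sketch (box names are hypothetical) of a Vagrantfile that reacts differently depending on the provider in use:

```ruby
# Minimal sketch of provider-aware configuration; box names are
# hypothetical. Each provider block takes effect only when that
# provider is selected (e.g., `vagrant up --provider=vmware_fusion`).
Vagrant.configure('2') do |config|
  config.vm.box = 'example/base'

  config.vm.provider 'virtualbox' do |vb, override|
    vb.memory = 1024
  end

  config.vm.provider 'vmware_fusion' do |vf, override|
    vf.vmx['memsize'] = '2048'
    override.vm.box = 'example/base-vmware'  # a provider-specific box version
  end
end
```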

With that in mind, it seems to me that the most beneficial uses of Vagrant are therefore to speed up the creation of development environments, to enable version control of development environments (via version control of the Vagrantfile), to provide some reasonable level of consistency across multiple developers, and to make it easier to share development environments. (If my conclusions are incorrect, please speak up in the comments and explain why.)

OK, enough of the high-level theory. Let’s take a look at a very simple example of a Vagrantfile:
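The original embedded example isn’t visible in this version of the post; reconstructed from the description that follows, it looks roughly like this (the box URL is a placeholder, since the exact URL from the original isn’t preserved):

```ruby
# Reconstructed simple Vagrantfile; the box_url value is a placeholder
# standing in for Canonical's cloud image repository URL.
Vagrant.configure('2') do |config|
  config.vm.box = 'ubuntu/precise64'
  config.vm.box_url = 'https://cloud-images.ubuntu.com/<path-to-precise64-box>'
  config.vm.synced_folder '.', '/vagrant'
end
```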


This Vagrantfile sets the box (“ubuntu/precise64”), the box URL (retrieved from Canonical’s repository of cloud images), and then sets the “/vagrant” directory in the VM to be shared/synced with the current (“.”) directory on the host—in this case, the current directory is the directory where the Vagrantfile itself is stored.

To have Vagrant then use this set of instructions, run this command from the directory where the Vagrantfile is sitting:

vagrant up

You’ll see a series of things happen; along the way you’ll see a note that the machine is booted and ready, and that shared folders are getting mounted. (If you are using VirtualBox and the box I’m using, you’ll also see a warning about the VirtualBox Guest Additions version not matching the version of VirtualBox.) When it’s all finished, you’ll be deposited back at your prompt. From there, you can easily log in to the newly-created VM using nothing more than vagrant ssh. That’s pretty handy.

Other Vagrant commands include:

  • vagrant halt to shut down the VM(s)
  • vagrant suspend to suspend the VM(s), use vagrant resume to resume
  • vagrant status to display the status of the VM(s)
  • vagrant destroy to destroy (delete) the VM(s)

Clearly, the example I showed you here is extremely simple. For an example of a more complicated Vagrantfile, check out this example from Cody Bunch, which sets up a set of VMs for running OpenStack. Cody and his co-author Kevin Jackson also use Vagrant extensively in their OpenStack Cloud Computing Cookbook, 2nd Edition, which makes it easy for readers to follow along.

I said this would be a quick introduction to Vagrant, so I’ll stop here for now. Feel free to post any questions in the comments, and I’ll do my best to answer them. Likewise, if there are any errors in the post, please let me know in the comments. All courteous comments are welcome!


This is a liveblog for session DATS013, on microservers. I was running late to this session (my calendar must have been off—thought I had 15 minutes more), so I wasn’t able to capture the titles or names of the speakers.

The first speaker starts out with a review of exactly what a microserver is; Intel sees microservers as a natural evolution from rack-mounted servers to blades to microservers. Key microserver technologies include the Intel Atom C2000 family of processors, the Intel Xeon E5 v2 processor family, and the Intel Ethernet Switch FM6000 series. Microservers share some common characteristics, such as highly integrated platforms (integrated networking, for example) and a design emphasis on efficiency; efficiency might be more important than absolute performance.

Disaggregation of resources is a common platform option for microservers. (Once again this comes back to Intel’s rack-scale architecture work.) This leads the speaker to talk about a Technology Delivery Vehicle (TDV) being displayed here at the show; this is essentially a proof-of-concept product that Intel built that incorporates various microserver technologies and design patterns.

Upcoming microserver technologies that Intel has announced or is working on include:

  • The Intel Xeon D, a Xeon-based SoC with integrated 10Gbps Ethernet, running in a 15–45 watt power range
  • The Intel Ethernet Switch FM10000 series (a follow-on from the FM6000 series), which will offer a variety of ports for connectivity—not just Ethernet (it will support 1, 2.5, 10, 25, 40, and 100 Gb Ethernet) but also PCI Express Gen3 and connectivity to embedded Intel Ethernet controllers. In some way (the speaker is unclear) this is also aligned with Intel’s silicon photonics efforts.

A new speaker (Christian?) takes the stage to talk about software-defined infrastructure (SDI), which is the vision that Intel has been talking about this week. He starts the discussion by talking about NFV and SDN, and how these efforts enable agile networks on standard high volume (SHV) servers (such as microservers). Examples of SDN/NFV workloads include wireless BTS, CRAN, MME, DSLAM, BRAS, and core routers. Some of these workloads are well suited for running on Intel platforms.

The speaker transitions to talking about “RouterBricks,” a scalable soft router that was developed with involvement from Scott Shenker. The official term for this is a “switch-route-forward”, or SRF. A traditional SRF architecture can be replicated with COTS hardware using multi-queue NICs and multi-core/multi-socket CPUs. However, the speaker says that a single compute node isn’t capable of replacing a large traditional router, so instead we have to scale compute nodes by treating them as a “linecard” using Intel DPDK. The servers are interconnected using a mesh, ToR, or multi-stage Clos network. Workloads are scheduled across these server/linecards using Valiant Load Balancing (VLB). Of course, there are issues with packet-level load balancing and flow-level load balancing, so tradeoffs must be made one way or another.

An example SRF built using four systems, each with ten 10Gbps interfaces, is capable of sustaining 40Gbps line rate at packet sizes of 64, 128, 256, 512, 1024, and 1500 bytes. Testing latency and jitter using a Spirent test system shows that an SRF compares very favorably with an edge router, but not so well against a core router (even though everything on the SRF is software-based running on Linux). Out-of-order frames from the SRF were less than 0.04% in all cases.

That SRF was built using Xeon processors, but what about an SRF built using Atom processors? A single Atom core can’t sustain line rate at 64 or 128 bytes per packet, but 2 cores can sustain line rate. Testing latency and jitter showed results at less than 60 microseconds and less than 0.15 microseconds, respectively.

Comparing Xeon to Atom, the speaker shows that a Xeon core can move about 4 times the number of packets compared to an Atom core. A Xeon core will also use far less memory bandwidth than an Atom core due to Xeon’s support for Direct Data I/O, which copies (via DMA) data received by a NIC into the processor’s cache. Atom does not support this feature.

With respect to efficiency, Xeon versus Atom presents very interesting results. Throughput per rack unit is better for the Atom (40 Gbps/RU compared to 13.3 Gbps/RU), while raw throughput is far better for the Xeon. Throughput per watt, on the other hand, is slightly better for the Atom (0.46 Gbps/watt versus 0.37 Gbps/watt for the Xeon).
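To make the efficiency comparison concrete, here's a quick sketch of the arithmetic. The rack-unit counts and power draws below are hypothetical values back-calculated to reproduce the quoted ratios, not numbers from the session:

```python
def efficiency(throughput_gbps, rack_units, watts):
    """Return (Gbps per rack unit, Gbps per watt)."""
    return throughput_gbps / rack_units, throughput_gbps / watts

# Hypothetical sizes and power draws chosen only to match the quoted ratios:
atom_per_ru, atom_per_w = efficiency(40.0, 1, 87)   # ~40 Gbps/RU, ~0.46 Gbps/W
xeon_per_ru, xeon_per_w = efficiency(40.0, 3, 108)  # ~13.3 Gbps/RU, ~0.37 Gbps/W
```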

At this point, the first presenter re-takes the stage and they open up the session for questions.


This is a liveblog of IDF 2014 session DATS009, titled “Ceph: Open Source Storage Software Optimizations on Intel Architecture for Cloud Workloads.” (That’s a mouthful.) The speaker is Anjaneya “Reddy” Chagam, a Principal Engineer in the Intel Data Center Group.

Chagam starts by reviewing the agenda, which—as the name of the session implies—is primarily focused on Ceph. He next transitions into a review of the problem with storage in data centers today; specifically, that storage needs “are growing at a rate unsustainable with today’s infrastructure and labor costs.” Another problem, according to Chagam, is that today’s workloads end up using the same sets of data but in very different ways, and those different ways of using the data have very different performance profiles. Other problems with the “traditional” way of doing storage are that storage processing performance doesn’t scale out with capacity and that storage environments are growing increasingly complex (which in turn makes management harder).

Chagam does admit that not all workloads are suited for distributed storage solutions. If you need high availability and high performance (like for databases), then the traditional scale-up model might work better. For “cloud workloads” (no additional context/information provided to qualify what a cloud workload is), distributed storage solutions may be a good fit. This brings Chagam to discussing Ceph, which he describes as the “only” (quotes his) open source virtual block storage option.

The session now transitions to discussing Ceph in more detail. RADOS (short for “Reliable Autonomic Distributed Object Store”) is the storage cluster that operates on the back-end of Ceph. On top of RADOS there are a number of interfaces: Ceph native clients, Ceph block access, Ceph object gateway (providing S3 and Swift APIs), and Ceph file system access. Intel’s focus is on improving block and object performance.

Chagam turns to discussing Ceph block storage. Ceph block storage can be mounted directly as a block device, or it can be used as a boot device for a KVM domain. The storage is shared peer-to-peer via Ethernet; there is no centralized metadata. Ceph storage nodes are responsible for holding (and distributing) the data across the cluster, and it is designed to operate without a single point of failure. Chagam does not provide any detailed information (yet) on how the data is sharded/replicated/distributed across the cluster, so it is unclear how many storage nodes can fail without an outage.

There are both user-mode (for virtual machines) and kernel mode RBD (RADOS block device) drivers for accessing the backend storage cluster itself. Ceph also uses the concept of an Object Store Daemon (OSD); one of these exists for every HDD (or SSD, presumably). SSDs would typically be used for journaling, but can also be used for caching. Using SSDs for journaling would help with write performance.

Chagam does a brief walkthrough of the read path and write path for data being read from or written to a Ceph storage cluster; here is where he points out that Ceph (by default?) stores three copies of the data on different disks, different servers, potentially even different racks or different fault zones. If you are writing multiple copies, you can configure various levels of consistency within Ceph with regard to how the writing of the multiple copies is handled.
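As a rough illustration of spreading copies across failure domains, here is a toy placement function. Ceph actually uses its CRUSH algorithm, which computes placement deterministically from the cluster map; this sketch (with invented rack and host names) only mimics the "distinct racks" property:

```python
import hashlib

def place_replicas(object_id, hosts_by_rack, replicas=3):
    """Toy placement: rank racks by a per-object hash and place one
    copy in each of the top `replicas` racks, guaranteeing the copies
    land in distinct failure domains. A stand-in for CRUSH, which does
    this deterministically (and far more cleverly) from the cluster map."""
    def score(rack):
        return hashlib.sha256(f"{object_id}:{rack}".encode()).hexdigest()
    chosen_racks = sorted(hosts_by_rack, key=score)[:replicas]
    return [hosts_by_rack[rack][0] for rack in chosen_racks]

cluster = {
    "rack-a": ["host-1", "host-2"],
    "rack-b": ["host-3", "host-4"],
    "rack-c": ["host-5"],
    "rack-d": ["host-6"],
}
copies = place_replicas("rbd_obj.0001", cluster)
# Three copies, each in a different rack; losing any one rack
# still leaves two intact copies.
```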

So where is Intel focusing its efforts around Ceph? Chagam points out that Intel is primarily targeting low(er) performance and low(er) capacity block workloads as well as low(er) performance but high(er) capacity object storage workloads. At some point Intel may focus on the high performance workloads, but that is not a current area of focus.

Speaking of performance, Chagam spends a few minutes providing some high-level performance reviews based on tests that Intel has conducted. Most of the measured performance stats were close to the calculated theoretical maximums, except for random 4K writes (which was only 64% of the calculated theoretical maximum for the test cluster). Chagam recommends that you limit VM deployments to the maximum number of IOPS that your Ceph cluster will support (this is pretty standard storage planning).
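That planning rule is easy to sketch: cap the number of VMs you deploy so their aggregate I/O demand stays under the cluster's measured IOPS ceiling. The numbers and headroom factor below are hypothetical, chosen only to illustrate the arithmetic:

```python
def max_vms(cluster_iops, iops_per_vm, headroom=0.8):
    """Cap the VM count so aggregate demand stays under the cluster's
    measured IOPS ceiling, leaving headroom for recovery/rebalance I/O."""
    return int(cluster_iops * headroom // iops_per_vm)

# Hypothetical figures: a cluster measured at 100K random-write IOPS,
# budgeting 500 IOPS per VM, supports at most 160 VMs.
assert max_vms(100_000, 500) == 160
```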

With regard to Ceph deployment, Chagam reviews a number of deployment considerations:

  • Workloads (VMs, archive data)
  • Access type (block, file, object)
  • Performance requirements
  • Reliability requirements
  • Caching (server caching, client caching)
  • Orchestration (OpenStack, VMware)
  • Management
  • Networking and fabric

Next, Chagam talks about Intel’s reference architecture for block workloads on a Ceph storage cluster. Intel recommends 1 SSD per every 5 HDD (size not specified). Management traffic can be 1 Gbps, but storage traffic should run across a 10 Gbps link. Intel recommends 16GB or more of memory for using Ceph, with memory requirements going as high as 64GB for larger Ceph clusters. (Chagam does not talk about why the memory requirements are different.)

Intel also has a reference architecture for object storage; this looks very similar to the block storage reference architecture but includes “object storage proxies” (I would imagine these are conceptually similar to Swift proxies). Chagam does say that Atom CPUs would be sufficient for very low-performance implementations; otherwise, the hardware requirements look very much like the block storage reference architecture.

This brings Chagam to a discussion of where Intel is specifically contributing back to open source storage solutions with Ceph. There isn’t much here; Intel will help optimize Ceph to run best on Intel Architecture platforms, including contributing open source code back to the Ceph project. Intel will also publish Ceph reference architectures, like the ones he shared earlier in the presentation (which have not yet been published). Specific product areas from Intel’s lineup that might be useful for Ceph include Intel SSDs (including NVMe), Intel network interface cards (NICs), software libraries (Intel Storage Acceleration Library, ISA-L), software products (Intel Cache Acceleration Software, or CAS). Some additional open source projects/contributions are planned but haven’t yet happened. Naturally Intel partners closely with Red Hat to help address needs around Ceph development.

ISA-L is interesting, and fortunately Chagam has a slide on ISA-L. ISA-L is a set of algorithms optimized for Intel Architecture platforms. These algorithms, available as machine code or as C code, will help improve performance for tasks like data integrity, security, and encryption. One example provided by Chagam is improving performance for SHA–1, SHA–256, and MD5 hash calculations. Another example of ISA-L is the Erasure Code plug-in that will be merged with the main Ceph release (it currently exists in the development release).
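For illustration, the digests ISA-L accelerates can be computed with Python's standard hashlib; the results are identical, ISA-L's IA-optimized (multi-buffer) implementations just produce them faster on Intel CPUs:

```python
import hashlib

data = b"ceph object payload"
digests = {
    "md5": hashlib.md5(data).hexdigest(),
    "sha1": hashlib.sha1(data).hexdigest(),
    "sha256": hashlib.sha256(data).hexdigest(),
}
# A storage system computes digests like these constantly, for data
# integrity checks and deduplication, which is why an optimized hash
# library translates directly into better storage throughput.
```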

Virtual Storage Manager (VSM) is an open source project that Intel is developing to help address some of the management concerns around Ceph. VSM will primarily focus on configuration management. VSM is anticipated to be available in Q4 of this year.

Intel Cache Acceleration Software (CAS) is another product (not open source) that might help in a Ceph environment. CAS uses SSDs and DRAM to speed up operations. Currently, CAS benefits only read I/O operations.

Finally, Chagam takes a few minutes to talk about some Ceph best practices:

  • You should shoot for one HDD being managed by one OSD. That, in turn, translates to 1GHz of Xeon-class computing power per OSD and about 1GB of RAM per OSD. (Resource requirements are pretty significant.)
  • Jumbo frames are recommended. (No specific MTU size provided.)
  • Use 10x the default queue parameters.
  • Use the deadline scheduler for XFS.
  • Tune read_ahead_kb for sequential reads.
  • Use a queue depth of 64 for sequential workloads, and 8 for random workloads.
  • For small clusters, you can co-locate the Ceph monitor process with compute workloads, but it will take 4–6GB of RAM.
  • Use dedicated nodes for monitoring when you move beyond 100 OSDs.
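The first best practice, together with the 1-SSD-per-5-HDD guideline from the reference architecture earlier, makes node sizing simple arithmetic. A small sketch (the function name and default ratios restate the talk's rules of thumb, not official Ceph documentation):

```python
def osd_node_requirements(hdd_count, ghz_per_osd=1.0, ram_gb_per_osd=1.0,
                          hdds_per_journal_ssd=5):
    """Size a Ceph storage node per the rules of thumb above: one OSD
    per HDD, ~1 GHz of Xeon-class CPU and ~1 GB of RAM per OSD, and
    one journal SSD per five HDDs (Intel's reference-architecture ratio)."""
    osds = hdd_count
    return {
        "osds": osds,
        "cpu_ghz": osds * ghz_per_osd,
        "ram_gb": osds * ram_gb_per_osd,
        "journal_ssds": -(-hdd_count // hdds_per_journal_ssd),  # ceiling division
    }

# A 12-HDD node works out to roughly 12 GHz of CPU, 12 GB of RAM,
# and 3 journal SSDs:
req = osd_node_requirements(12)
```

As the parenthetical in the first bullet notes, these per-OSD resource requirements add up quickly on dense nodes.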

Chagam now summarizes the key points and wraps up the session.


IDF 2014 Day 2 Recap

Following on from my IDF 2014 Day 1 recap, here’s a quick recap of day 2.

Data Center Mega-Session

You can read the liveblog here if you want all the gory details. If we boil it down to the essentials, it’s actually pretty simple. First, deliver more computing power in the hardware, either through the addition of FPGAs to existing CPUs or through the continued march of CPU power (via more cores or faster clock speeds or both). Second, make the hardware programmable, through standard interfaces. Third, expand the use of “big data” and analytics.

Technical Sessions

I attended a couple technical sessions today, but didn’t manage to get any of them liveblogged. Sorry! I did tweet a few things from the sessions, in case you follow me on Twitter.

Expo Floor

I did have an extremely productive conversation regarding Intel’s rack-scale architecture (RSA) efforts. I pushed the Intel folks on the show floor to really dive into what makes up RSA, and finally got some answers that I’ll share in a separate post. I will do my best to get a dedicated RSA piece published just as soon as I possibly can.

Also on the expo floor, I got my hands on some of the Intel optical transceivers and cables. The cables are really nice, and practically indestructible. I think this move by Intel will be good for optics in the data center.

Finally, I was also able to join for an episode of Intel Chip Chat, a podcast that Intel records regularly, including at events like IDF. It was great fun getting to spend some time talking about VMware NSX and network virtualization.

Closing Thoughts

Overall, another solid day at IDF 2014. Lots of good technical information presented (which, unfortunately, I did not do a very good job capturing), and equally good technical information available on the show floor.

I’ll try to do a better job with the liveblogging tomorrow. Thanks for reading!
