One of the great things about this site is the interaction I enjoy with readers. It’s always rewarding to get comments about how an article was informative, answered a question, or helped solve a problem. Knowing that what I’ve written here is helpful to others is a very large part of why I’ve been writing here for over 9 years.

Until today, I’ve left comments (and sometimes trackbacks) open on very old blog posts. Just the other day I received a comment on a 4-year-old article where a reader was sharing another way to solve the same problem. Unfortunately, that has to change. Comment spam on the site has grown considerably over the last few months, despite the use of a number of plugins to help address the issue. It’s no longer just an annoyance; it’s now a problem.

As a result, starting today, all blog posts more than 3 years old will automatically have their comments and trackbacks closed. I hate to do it—really I do—but I don’t see any other solution to the increasing blog spam.

I hope that this does not adversely impact my readers’ ability to interact with me, but it is a necessary step.

Thanks to all who continue to read this site. I do sincerely appreciate your time and attention, and I hope that I can continue to provide useful and relevant content to help make people’s lives better.


You may have heard of Intel Rack-Scale Architecture (RSA), a new approach to designing data center hardware. This is an idea that was discussed extensively a couple of weeks ago at Intel Developer Forum (IDF) 2014 in San Francisco, which I had the opportunity to attend. (Disclaimer: Intel paid my travel and hotel expenses to attend IDF.)

Of course, IDF 2014 wasn’t the first time I’d heard of Intel RSA; it was also discussed last year. However, this year I had the chance to really dig into what Intel is trying to accomplish through Intel RSA—note that I’ll use “Intel RSA” instead of just “RSA” to avoid any confusion with the security company—and I wanted to share some of my thoughts and conclusions here.

Intel always seems to present Intel RSA as a single entity that is made up of a number of other technologies/efforts; specifically, Intel RSA is typically presented as:

  • Disaggregation of the compute, memory, and storage capacity in a rack
  • Silicon photonics as a low-latency, high-speed rack-scale fabric
  • Some software that combines disaggregated hardware capacity over a rack-scale fabric to create “pooled systems”

When you look at Intel RSA this way—and this is the way that it is typically positioned and described—it just doesn’t seem to make sense. What’s the benefit to consumers? Why should they buy this architecture instead of just buying servers? The Intel folks will talk about right-sizing servers to the workload, but that’s not really a valid argument for the vast majority of workloads that are running virtualized on a hypervisor (or containerized using some form of container technologies). Greg Ferro (of Packet Pushers fame) was also at the show, and we both shared some of these same concerns over Intel RSA.

Determined to dig a bit deeper into this, I went down to the Intel RSA booth on the IDF show floor. The Intel guys were kind enough to spend quite a bit of time with me talking in-depth about Intel RSA. What I came away with from that discussion is that Intel RSA is really about three related but not interdependent initiatives:

  • Disaggregating server hardware
  • Establishing hardware-level APIs
  • Enabling greater software intelligence and flexibility

I think the “related but not interdependent” phrase is really important here. What Intel’s proposing with Intel RSA doesn’t require disaggregated hardware (i.e., shelves of CPUs, memory, and storage); it could be done with standard 1U/2U rack-mount servers. Intel RSA doesn’t require silicon photonics; it could leverage any number of low-latency high-speed fabrics (including PCI Express and 40G/100G Ethernet). Hardware disaggregation and silicon photonics are only potential options for how you might assemble the hardware layer.

The real key to Intel RSA, in my opinion, is the hardware-level APIs. Intel has already teamed up with a couple of vendors to create a draft hardware API specification called Redfish. If Intel is successful in building a standard hardware-level API that is consistent across OEM platforms, then the software running on that hardware (which could be traditional operating systems, hypervisors, or container hosts) can leverage greater visibility into what’s happening in the hardware. This greater visibility, in turn, allows the software to be more intelligent about how, where, and when workloads (VMs, containers, standard processes/threads) get scheduled onto that hardware. You can imagine the sorts of enhanced scheduling that become possible as hypervisors, for example, become more aware and more informed about what’s happening inside the hardware.
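
To make this concrete, here’s a rough sketch of what querying a hardware-level API might look like, assuming a Redfish-style RESTful interface over HTTPS. The specification was only a draft at the time, so the host, credentials, and resource path shown here are illustrative assumptions, not the published API:

    # Ask a management controller for its list of systems (hypothetical
    # endpoint; Redfish-style APIs return JSON over HTTPS)
    curl -k -u admin:password https://bmc.example.com/redfish/v1/Systems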

Similarly, if Intel is successful in establishing hardware-level APIs that are consistent across OEM platforms, then the idea of assembling a “composable pooled system” from hardware resources—in the form of disaggregated hardware or in the form of resources spread across traditional 1U/2U servers and connected via a low-latency high-speed fabric—now becomes possible as well. This is why I say that the hardware disaggregation piece is only an optional part of Intel RSA: with the widespread adoption of standardized hardware-level APIs, the same thing can be achieved without hardware disaggregation.

In my mind, Intel RSA is more of an example use case than a technology in and of itself. The real technologies here—disaggregated hardware, hardware-level APIs, and high-speed fabrics—can be leveraged/assembled in a variety of ways. Building rack-scale architectures using these technologies is just one use case.

I hope that this helps explain Intel RSA in a way that is a bit more consumable and understandable. If you have any questions, corrections, or clarifications, please feel free to speak up in the comments.


This post will provide a quick introduction to a tool called Vagrant. Unless you’ve been hiding under a rock—or, more likely, been too busy doing real work in your data center to pay attention—you’ve probably heard of Vagrant. Maybe, like me, you had some ideas about what Vagrant is (or isn’t) and what it does (or doesn’t) do. Hopefully I can clear up some of the confusion in this post.

In its simplest form, Vagrant is an automation tool with a domain-specific language (DSL) that is used to automate the creation of VMs and VM environments. The idea is that a user can create a set of instructions, using Vagrant’s DSL, that will set up one or more VMs and possibly configure those VMs. Every time the user runs that pre-created set of instructions, the end result will look exactly the same. This can be beneficial for a number of use cases, including developers who want a consistent development environment or folks wanting to share a demo environment with other users.

Vagrant makes this work by using a number of different components:

  • Providers: These are the “back end” of Vagrant. Vagrant itself doesn’t provide any virtualization functionality; it relies on other products to do the heavy lifting. Providers are how Vagrant interacts with the products that will do the actual virtualization work. A provider could be VirtualBox (included by default with Vagrant), VMware Fusion, Hyper-V, vCloud Air, or AWS, just to name a few.
  • Boxes: At the heart of Vagrant are boxes. Boxes are the predefined images that are used by Vagrant to build the environment according to the instructions provided by the user. A box may be a plain OS installation, or it may be an OS installation plus one or more applications installed. Boxes may support only a single provider or may support multiple providers (for example, a box might only work with VirtualBox, or it might support VirtualBox and VMware Fusion). It’s important to note that multi-provider support by a box is really handled by multiple versions of a box (i.e., a version supporting VirtualBox, a version supporting AWS, and a version supporting VMware Fusion). A single box supports a single provider.
  • Vagrantfile: The Vagrantfile contains the instructions from the user, expressed in Vagrant’s DSL, on what the environment should look like—how many VMs, what type of VM, the provider, how they are connected, etc. Vagrantfiles are so named because the actual filename is Vagrantfile. The Vagrant DSL (and therefore Vagrantfiles) is based on Ruby.

One of the first things I thought about as I started digging into Vagrant was that Vagrant would be a tool that would help streamline moving applications/code from development to production. After all, if you had providers for Vagrant that supported both VirtualBox and VMware vCenter, and you had boxes that supported both providers, then you could write a single Vagrantfile that would instantiate the same environment in development and in production. Cool, right? In theory this is possible, but in talking with some others who are much more familiar with Vagrant than I am, it seems that in practice this is not necessarily the case. Because support for multiple providers is handled by different versions of a box (as outlined above), the boxes may be slightly different and therefore may not produce the exact same results from a single Vagrantfile. It is possible to write the Vagrantfile in such a way as to recognize different providers and react differently, but this obviously adds complexity.
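
For illustration, here’s a rough sketch of a provider-aware Vagrantfile; Vagrant applies whichever provider block matches the provider in use (the box names and settings here are hypothetical):

    Vagrant.configure("2") do |config|
      config.vm.box = "ubuntu/precise64"

      # Settings applied only when the VirtualBox provider is used
      config.vm.provider "virtualbox" do |vb|
        vb.memory = 1024
      end

      # A different box (and settings) for the VMware Fusion provider
      config.vm.provider "vmware_fusion" do |v, override|
        override.vm.box = "hashicorp/precise64"
        v.vmx["memsize"] = "1024"
      end
    end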

With that in mind, it seems to me that the most beneficial uses of Vagrant are therefore to speed up the creation of development environments, to enable version control of development environments (via version control of the Vagrantfile), to provide some reasonable level of consistency across multiple developers, and to make it easier to share development environments. (If my conclusions are incorrect, please speak up in the comments and explain why.)

OK, enough of the high-level theory. Let’s take a look at a very simple example of a Vagrantfile:
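
A minimal version might look something like this sketch, assuming Vagrant’s version 2 configuration syntax (the box URL is representative of Canonical’s cloud image repository; the exact path may differ):

    Vagrant.configure("2") do |config|
      # Use the Ubuntu 12.04 LTS ("Precise") 64-bit box
      config.vm.box = "ubuntu/precise64"
      # Where to fetch the box if it isn't already present locally
      config.vm.box_url = "http://cloud-images.ubuntu.com/vagrant/precise/current/precise-server-cloudimg-amd64-vagrant-disk1.box"
      # Sync the directory containing the Vagrantfile to /vagrant in the VM
      config.vm.synced_folder ".", "/vagrant"
    end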


This Vagrantfile sets the box (“ubuntu/precise64”), the box URL (retrieved from Canonical’s repository of cloud images), and then sets the “/vagrant” directory in the VM to be shared/synced with the current (“.”) directory on the host—in this case, the current directory is the directory where the Vagrantfile itself is stored.

To have Vagrant then use this set of instructions, run this command from the directory where the Vagrantfile is sitting:

vagrant up

You’ll see a series of things happen; along the way you’ll see a note that the machine is booted and ready, and that shared folders are getting mounted. (If you are using VirtualBox and the box I’m using, you’ll also see a warning about the VirtualBox Guest Additions version not matching the version of VirtualBox.) When it’s all finished, you’ll be deposited back at your prompt. From there, you can easily log in to the newly-created VM using nothing more than vagrant ssh. That’s pretty handy.

Other Vagrant commands include:

  • vagrant halt to shut down the VM(s)
  • vagrant suspend to suspend the VM(s); use vagrant resume to resume
  • vagrant status to display the status of the VM(s)
  • vagrant destroy to destroy (delete) the VM(s)

Clearly, the example I showed you here is extremely simple. For an example of a more complicated Vagrantfile, check out this example from Cody Bunch, which sets up a set of VMs for running OpenStack. Cody and his co-author Kevin Jackson also use Vagrant extensively in their OpenStack Cloud Computing Cookbook, 2nd Edition, which makes it easy for readers to follow along.

I said this would be a quick introduction to Vagrant, so I’ll stop here for now. Feel free to post any questions in the comments, and I’ll do my best to answer them. Likewise, if there are any errors in the post, please let me know in the comments. All courteous comments are welcome!


This is a liveblog for session DATS013, on microservers. I was running late to this session (my calendar must have been off—thought I had 15 minutes more), so I wasn’t able to capture the titles or names of the speakers.

The first speaker starts out with a review of exactly what a microserver is; Intel sees microservers as a natural evolution from rack-mounted servers to blades to microservers. Key microserver technologies include: Intel Atom C2000 family of processors; Intel Xeon E5 v2 processor family; and Intel Ethernet Switch FM6000 series. Microservers share some common characteristics, such as highly integrated platforms (like integrated networking) and being designed for high efficiency. Efficiency might be more important than absolute performance.

Disaggregation of resources is a common platform option for microservers. (Once again this comes back to Intel’s rack-scale architecture work.) This leads the speaker to talk about a Technology Delivery Vehicle (TDV) being displayed here at the show; this is essentially a proof-of-concept product that Intel built that incorporates various microserver technologies and design patterns.

Upcoming microserver technologies that Intel has announced or is working on include:

  • The Intel Xeon D, a Xeon-based SoC with integrated 10Gbps Ethernet and running in a 15–45 watt power range
  • The Intel Ethernet Switch FM10000 series (a follow-on from the FM6000 series), which will offer a variety of ports for connectivity—not just Ethernet (it will support 1, 2.5, 10, 25, 40, and 100 Gb Ethernet) but also PCI Express Gen3 and connectivity to embedded Intel Ethernet controllers. In some way (the speaker is unclear) this is also aligned with Intel’s silicon photonics efforts.

A new speaker (Christian?) takes the stage to talk about software-defined infrastructure (SDI), which is the vision that Intel has been talking about this week. He starts the discussion by talking about NFV and SDN, and how these efforts enable agile networks on standard high volume (SHV) servers (such as microservers). Examples of SDN/NFV workloads include wireless BTS, CRAN, MME, DSLAM, BRAS, and core routers. Some of these workloads are well suited for running on Intel platforms.

The speaker transitions to talking about “RouterBricks,” a scalable soft router that was developed with involvement from Scott Shenker. The official term for this is a “switch-route-forward”, or SRF. A traditional SRF architecture can be replicated with COTS hardware using multi-queue NICs and multi-core/multi-socket CPUs. However, the speaker says that a single compute node isn’t capable of replacing a large traditional router, so instead we have to scale compute nodes by treating them as a “linecard” using Intel DPDK. The servers are interconnected using a mesh, ToR, or multi-stage Clos network. Workloads are scheduled across these server/linecards using Valiant Load Balancing (VLB). Of course, there are issues with packet-level load balancing and flow-level load balancing, so tradeoffs must be made one way or another.

An example SRF built using four systems, each with ten 10Gbps interfaces, is capable of sustaining 40Gbps line rate at packet sizes of 64, 128, 256, 512, 1024, and 1500 bytes. Testing latency and jitter using a Spirent shows that an SRF compares very favorably with an edge router, but not so well against a core router (even though everything on the SRF is software-based running on Linux). Out-of-order frames from the SRF were less than 0.04% in all cases.

That SRF was built using Xeon processors, but what about an SRF built using Atom processors? A single Atom core can’t sustain line rate at 64 or 128 bytes per packet, but 2 cores can sustain line rate. Testing latency and jitter showed results at less than 60 microseconds and less than 0.15 microseconds, respectively.

Comparing Xeon to Atom, the speaker shows that a Xeon core can move about 4 times the number of packets compared to an Atom core. A Xeon core will also use far less memory bandwidth than an Atom core due to Xeon’s support for Direct Data I/O, which copies (via DMA) data received by a NIC into the processor’s cache. Atom does not support this feature.

With respect to efficiency, Xeon versus Atom presents very interesting results. Throughput per rack unit is better for the Atom (40 Gbps/RU compared to 13.3 Gbps/RU), while raw throughput is far better for the Xeon. Throughput per watt, on the other hand, is slightly better for the Atom (0.46 Gbps/watt versus 0.37 Gbps/watt for Xeon).

At this point, the first presenter re-takes the stage and they open up the session for questions.


This is a liveblog of IDF 2014 session DATS009, titled “Ceph: Open Source Storage Software Optimizations on Intel Architecture for Cloud Workloads.” (That’s a mouthful.) The speaker is Anjaneya “Reddy” Chagam, a Principal Engineer in the Intel Data Center Group.

Chagam starts by reviewing the agenda, which—as the name of the session implies—is primarily focused on Ceph. He next transitions into a review of the problem with storage in data centers today; specifically, that storage needs “are growing at a rate unsustainable with today’s infrastructure and labor costs.” Another problem, according to Chagam, is that today’s workloads end up using the same sets of data but in very different ways, and those different ways of using the data have very different performance profiles. Other problems with the “traditional” way of doing storage are that storage processing performance doesn’t scale out with capacity and that storage environments are growing increasingly complex (which in turn makes management harder).

Chagam does admit that not all workloads are suited for distributed storage solutions. If you need high availability and high performance (like for databases), then the traditional scale-up model might work better. For “cloud workloads” (no additional context/information provided to qualify what a cloud workload is), distributed storage solutions may be a good fit. This brings Chagam to discussing Ceph, which he describes as the “only” (quotes his) open source virtual block storage option.

The session now transitions to discussing Ceph in more detail. RADOS (stands for “Reliable, Autonomous, Distributed Object Store”) is the storage cluster that operates on the back-end of Ceph. On top of RADOS there are a number of interfaces: Ceph native clients, Ceph block access, Ceph object gateway (providing S3 and Swift APIs), and Ceph file system access. Intel’s focus is on improving block and object performance.

Chagam turns to discussing Ceph block storage. Ceph block storage can be mounted directly as a block device, or it can be used as a boot device for a KVM domain. The storage is shared peer-to-peer via Ethernet; there is no centralized metadata. Ceph storage nodes are responsible for holding (and distributing) the data across the cluster, and it is designed to operate without a single point of failure. Chagam does not provide any detailed information (yet) on how the data is sharded/replicated/distributed across the cluster, so it is unclear how many storage nodes can fail without an outage.

There are both user-mode (for virtual machines) and kernel-mode RBD (RADOS block device) drivers for accessing the backend storage cluster itself. Ceph also uses the concept of an Object Store Daemon (OSD); one of these exists for every HDD (or SSD, presumably). SSDs would typically be used for journaling, but can also be used for caching. Using SSDs for journaling would help with write performance.

Chagam does a brief walkthrough of the read path and write path for data being read from or written to a Ceph storage cluster; here is where he points out that Ceph (by default?) stores three copies of the data on different disks, different servers, potentially even different racks or different fault zones. If you are writing multiple copies, you can configure various levels of consistency within Ceph with regard to how the multiple copies are written.
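
For instance, the replica count and the number of copies that must be committed before a write is acknowledged are pool-level settings; here’s a quick sketch using the standard ceph CLI (the pool name is illustrative):

    # Keep three copies of every object in the pool
    ceph osd pool set rbd size 3
    # Require at least two copies to be written before acknowledging I/O
    ceph osd pool set rbd min_size 2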

So where is Intel focusing its efforts around Ceph? Chagam points out that Intel is primarily targeting low(er) performance and low(er) capacity block workloads as well as low(er) performance but high(er) capacity object storage workloads. At some point Intel may focus on the high performance workloads, but that is not a current area of focus.

Speaking of performance, Chagam spends a few minutes providing some high-level performance reviews based on tests that Intel has conducted. Most of the measured performance stats were close to the calculated theoretical maximums, except for random 4K writes (which was only 64% of the calculated theoretical maximum for the test cluster). Chagam recommends that you limit VM deployments to the maximum number of IOPS that your Ceph cluster will support (this is pretty standard storage planning).

With regard to Ceph deployment, Chagam reviews a number of deployment considerations:

  • Workloads (VMs, archive data)
  • Access type (block, file, object)
  • Performance requirements
  • Reliability requirements
  • Caching (server caching, client caching)
  • Orchestration (OpenStack, VMware)
  • Management
  • Networking and fabric

Next, Chagam talks about Intel’s reference architecture for block workloads on a Ceph storage cluster. Intel recommends 1 SSD for every 5 HDDs (sizes not specified). Management traffic can be 1 Gbps, but storage traffic should run across a 10 Gbps link. Intel recommends 16GB or more of memory for using Ceph, with memory requirements going as high as 64GB for larger Ceph clusters. (Chagam does not talk about why the memory requirements are different.)

Intel also has a reference architecture for object storage; this looks very similar to the block storage reference architecture but includes “object storage proxies” (I would imagine these are conceptually similar to Swift proxies). Chagam does say that Atom CPUs would be sufficient for very low-performance implementations; otherwise, the hardware requirements look very much like the block storage reference architecture.

This brings Chagam to a discussion of where Intel is specifically contributing back to open source storage solutions with Ceph. There isn’t much here; Intel will help optimize Ceph to run best on Intel Architecture platforms, including contributing open source code back to the Ceph project. Intel will also publish Ceph reference architectures, like the ones he shared earlier in the presentation (which have not yet been published). Specific product areas from Intel’s lineup that might be useful for Ceph include Intel SSDs (including NVMe), Intel network interface cards (NICs), software libraries (Intel Storage Acceleration Library, or ISA-L), and software products (Intel Cache Acceleration Software, or CAS). Some additional open source projects/contributions are planned but haven’t yet happened. Naturally Intel partners closely with Red Hat to help address needs around Ceph development.

ISA-L is interesting, and fortunately Chagam has a slide on ISA-L. ISA-L is a set of algorithms optimized for Intel Architecture platforms. These algorithms, available as machine code or as C code, will help improve performance for tasks like data integrity, security, and encryption. One example provided by Chagam is improving performance for SHA-1, SHA-256, and MD5 hash calculations. Another example of ISA-L is the Erasure Code plug-in that will be merged with the main Ceph release (it currently exists in the development release).

Virtual Storage Manager (VSM) is an open source project that Intel is developing to help address some of the management concerns around Ceph. VSM will primarily focus on configuration management. VSM is anticipated to be available in Q4 of this year.

Intel Cache Acceleration Software (CAS) is another product (not open source) that might help in a Ceph environment. CAS uses SSDs and DRAM to speed up operations. CAS currently really only benefits read I/O operations.

Finally, Chagam takes a few minutes to talk about some Ceph best practices:

  • You should shoot for one HDD being managed by one OSD. That, in turn, translates to 1GHz of Xeon-class computing power per OSD and about 1GB of RAM per OSD. (Resource requirements are pretty significant.)
  • Jumbo frames are recommended. (No specific MTU size provided.)
  • Use 10x the default queue parameters.
  • Use the deadline scheduler for XFS.
  • Tune read_ahead_kb for sequential reads. (A sketch of these two tweaks follows this list.)
  • Use a queue depth of 64 for sequential workloads, and 8 for random workloads.
  • For small clusters, you can co-locate the Ceph monitor processes with compute workloads, but they will take 4–6GB of RAM.
  • Use dedicated nodes for monitoring when you move beyond 100 OSDs.
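
Here’s a sketch of applying the scheduler and read-ahead recommendations to a single OSD data disk (the device name and read-ahead value are illustrative):

    # Use the deadline I/O scheduler for the disk backing an OSD
    echo deadline > /sys/block/sda/queue/scheduler
    # Increase read-ahead to favor sequential read workloads
    echo 2048 > /sys/block/sda/queue/read_ahead_kb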

Chagam now summarizes the key points and wraps up the session.


IDF 2014 Day 2 Recap

Following on from my IDF 2014 Day 1 recap, here’s a quick recap of day 2.

Data Center Mega-Session

You can read the liveblog here if you want all the gory details. If we boil it down to the essentials, it’s actually pretty simple. First, deliver more computing power in the hardware, either through the addition of FPGAs to existing CPUs or through the continued march of CPU power (via more cores or faster clock speeds or both). Second, make the hardware programmable, through standard interfaces. Third, expand the use of “big data” and analytics.

Technical Sessions

I attended a couple of technical sessions today, but didn’t manage to get any of them liveblogged. Sorry! I did tweet a few things from the sessions, in case you follow me on Twitter.

Expo Floor

I did have an extremely productive conversation regarding Intel’s rack-scale architecture (RSA) efforts. I pushed the Intel folks on the show floor to really dive into what makes up RSA, and finally got some answers that I’ll share in a separate post. I will do my best to get a dedicated RSA piece published just as soon as I possibly can.

Also on the expo floor, I got my hands on some of the Intel optical transceivers and cables. The cables are really nice, and practically indestructible. I think this move by Intel will be good for optics in the data center.

Finally, I was also able to sit down for an episode of Intel Chip Chat, a podcast that Intel records regularly, including at events like IDF. It was great fun getting to spend some time talking about VMware NSX and network virtualization.

Closing Thoughts

Overall, another solid day at IDF 2014. Lots of good technical information presented (which, unfortunately, I did not do a very good job capturing), and equally good technical information available on the show floor.

I’ll try to do a better job with the liveblogging tomorrow. Thanks for reading!


IDF 2014: Data Center Mega-Session

This is a liveblog of the Data Center Mega-Session from day 2 of Intel Developer Forum (IDF) 2014 in San Francisco.

Diane Bryant, SVP and GM of the Data Center Group, takes the stage promptly at 9:30am to kick off the data center mega-session. Bryant starts the discussion by setting out the key drivers affecting the data center: new devices (and new volumes of devices) and new services (AWS, Netflix, Twitter, etc.). This is the “digital service economy,” and Bryant insists that today’s data centers aren’t prepared to handle the digital service economy.

Bryant posits that in the not-so-distant future:

  • Systems will be workload optimized
  • Infrastructure will be software defined
  • Analytics will be pervasive

Per Bryant, when you’re operating at scale, efficiency matters, and that will lead organizations to choose platforms selected specifically for the workload. This leads to a discussion of customized offerings, and Bryant talks about an announcement earlier in the summer that combined a Xeon processor and an FPGA (field-programmable gate array) on the same die.

Bryant then introduces Karl Triebes, EVP and CTO of F5 Networks, who takes the stage to talk about FPGAs in F5 and how the joint Xeon/FPGA integrated solution from Intel plays into that role. F5’s products use Intel CPUs, but they also leverage FPGAs to selectively enable certain functions in hardware for improved performance. Triebes talks about how F5 and Intel have been working together for about 10 years, and discusses how F5 uses instruction set changes (they write their own microkernel—is that really sustainable moving forward?), new features, etc., and that includes leveraging the integrated FPGA in Intel’s new product.

The discussion now shifts to low-power system-on-chips (SoCs), such as the 64-bit Intel Atom. Bryant announces the third-generation SoC, named Xeon D and based on the Xeon platform. The Xeon D is sampling now. Bryant brings on stage Patty Kummrow, who is Director of Server SoC Development. Bryant and Kummrow talk about how Intel is addressing the need to customize the platform to address critical workloads: software (storage acceleration library, for example); in-package accelerator (FPGA, for example); SoC (potentially incorporating customer IP); and instruction set architectures (like the AES-NI instructions to enhance cryptographic functions). Kummrow shows off a Xeon D SoC and board.

Bryant shifts the discussion to software-defined infrastructure (SDI). The first area of SDI that Bryant focuses upon is storage, where growth is happening rapidly but storage is still siloed. Per Bryant, Intel believes that software-defined storage will address these concerns, and is doing so in three ways:

  • Intel Storage Acceleration Libraries (ISA-L)
  • Open source investments in Ceph and OpenStack Swift
  • Prototype SDS controller providing separation of the control plane and data plane

Bryant now turns to software-defined networking (SDN) and network functions virtualization (NFV), and—quite naturally—points to the telcos as the prime example of why SDN/NFV are so important. According to Bryant, NFV originated in October 2011, and now (just three years later) there will be commercial deployments by companies like AT&T, Verizon Wireless, SK telecom, and China Mobile. Bryant also talks about Intel’s Network Builders program, and talks about Nokia’s recent announcement (which is based on Intel Xeon).

Shifting now to the compute side, Bryant talks about Intel’s rack-scale architecture (RSA) efforts. RSA attempts to provide disaggregated pools of resources, a standard method of exposing hardware to software, and a composable infrastructure that can be assembled based on application resources.

Core to Intel’s RSA efforts is silicon photonics, which is key to allowing high-speed, low-latency connections between the disaggregated resources within an RSA approach. Silicon photonics will enable 100Gbps at greater than 300 meters, at a low cost and with high reliability. Also important, but sometimes overlooked, is that the silicon photonics cabling will be smaller and thinner.

Bryant introduces Andy Bechtolsheim, Founder and Chief Development Officer and Chairman of Arista Networks. Bryant gives Bechtolsheim the opportunity to talk about Arista’s recent launch of 100Gb networking and why 100Gb networking is important and necessary in modern data centers. Bryant states that she believes silicon photonics will be essential in delivering cost-effective 100Gb solutions, and that leads to a discussion of the CLR4 alliance. CLR4 is focused on delivering 100Gb over even greater distances.

Next, Bryant introduces Das Kamhout to talk about the need for an orchestration system in the data center. Kamhout talks about how advanced telemetry can be exposed to the orchestration system, which can make decisions based on that advanced telemetry. This will eventually lead to predictive actions. It boils down to a “watch, act, learn” feedback loop. The foundation is built on Intel technologies like Cache Acceleration, ISA-L, DPDK, QuickAssist, Cache QoS, and power and thermal awareness.

This “finally” leads into a discussion of pervasive analytics, which is one of the three key attributes of future data centers. Bryant states that pervasive analytics will help improve cities, discover treatments, reduce costs, and improve products—obviously all through data centers powered by Intel products. Intel’s focus is to enable analytics, and is working closely with the Hadoop community (specifically Cloudera).

According to Bryant, the new Intel E5-2600 v3 more than doubles the performance of Cloudera’s Hadoop distribution. Bryant brings out Mike Olson, Co-Founder and Chief Strategy Officer for Cloudera. Olson states that the consumer Internet “discovered” the idea of big data, but this is now taking off in all kinds of industries. Olson gives examples of hospitals instrumenting neonatal care units and cities gathering data on air quality more frequently and more comprehensively. Both Olson and Bryant reinforce the value of open source to “amplify” the effect of certain efforts. Olson again conflates big data and the Internet of Things (IoT), indicating that he believes that the two efforts are naturally coupled and will drive each other. Bryant next gives Olson the opportunity to talk about Cloudera Hadoop 5.2, which is optimized for Intel architectures to provide more performance and more security, which in turn will lead to accelerated adoption of Hadoop. Bryant reinforces the link between IoT/wearables and big data, mentioning again the “A-wear” program discussed yesterday in the keynote.

At this point Bryant wraps up the keynote and the session ends.


IDF 2014 Day 1 Recap

In case you hadn’t noticed, I’m at Intel Developer Forum (IDF) 2014 this week in San Francisco. Here’s a quick recap of day 1 (I should have published this last night—sorry for not getting it out sooner).

Day 1 Keynote

Here’s a liveblog of the IDF 2014 day 1 keynote.

The IDF keynotes are always a bit interesting for me. Intel has a very large consumer presence: PCs, ultrabooks, tablets, phones, 2-in-1/convertibles, all-in-1 devices. Naturally, this is a big part of the keynote. I don’t track or get involved in the consumer space; my focus is on the data center. It is kind of fun to see all the stuff going on in the consumer space, though. There were no major data center-centric announcements yesterday (day 1), but I suspect there will be some today (day 2) in a mega-session with Diane Bryant (SVP and GM of the Data Center Group at Intel). I’ll be liveblogging that mega-session, so stay tuned for details.

Technical Sessions

I was able to hit two technical sessions yesterday and liveblogged both of them:

  • Virtualizing the Network to Enable a Software-Defined Infrastructure (SDI)
  • Bare-Metal, Docker Containers, and Virtualization: The Growing Choices for Cloud Applications

Both were good sessions. The first one, on virtualizing the network, did highlight an important development regarding hardware offloads for Geneve, the next-generation network overlay encapsulation protocol. Intel announced yesterday that the new XL710 network adapters (which are 40Gbps adapters) will support Geneve hardware offloads. This is the first hardware offload for Geneve of which I am aware, and it signals increased support for Geneve. (The XL710 also supports offloads for VXLAN and NVGRE.) That’s cool.

The second session was more of an introductory session than anything else, but was useful nevertheless. I was already familiar with all the concepts discussed regarding Docker and containers and virtualization, but I did pick up a few useful analogies from the speaker, Nick Weaver. Nick didn’t share anything specific to containers with regard to work Intel might be doing, but as I was thinking about this after the session I wondered if Intel might do some work around enabling containers to use the x86 privilege rings/protection rings. This would improve container security and move Linux containers closer to the Bromium “microvisor” architecture. Nick was also bullish on Intel SGX, something I’m going to have to explore in a bit more detail (I don’t know anything about SGX yet).

Coffee Chats

One of the nice things about attending IDF is that the Intel folks do a great job of connecting influencers (bloggers, press, analysts) with key folks within Intel to discuss announcements, trends, etc. This year, this took the form of “coffee chats”—informal discussions that were, sadly, lacking coffee.

In any case, the discussions wandered around a bit (as these sorts of things are wont to do). Here are a few thoughts that I gleaned from the discussions or that resulted from the discussions:

  • Intel does have/is working with very large customers on customized silicon; typically these are tweaks to create a custom SKU (like more cores, higher frequencies, different power envelope, etc.). This is interesting, but obviously applicable only to the largest of customers given the cost involved.
  • Intel is working with a few other companies (Dell, Emerson, and HP) on a hardware API specification; early work on the API can be found here.
  • Intel is pushing forward with the idea of rack-scale architecture (RSA); this is something I blogged about last year (see this post). There’s another RSA-related session on Thursday that I’m hoping to be able to attend so I can provide more information. I’m on the fence about RSA; I still don’t see a compelling reason why users/consumers/operators should switch to RSA instead of buying servers. I may publish something else specific about RSA later; I still need to have some discussions with the Intel engineers on the floor and see if I’m missing something.
  • The networking-focused Fulcrum assets that Intel purchased a few years ago are continuing to be leveraged in a variety of ways, some of which are also related to the rack-scale architecture efforts. Personally, I’m less interested in how Intel is using the Fulcrum stuff in RSA, and more interested in work Intel might be doing around making it easier for Linux vendors to “hook into” Intel-based hardware platforms for the purpose of building disaggregated network operating systems. You may already know that I’m pretty bullish on Cumulus Linux, but Cumulus right now is heavily tied to the Broadcom chipsets, and—according to discussions I’ve had with Cumulus—the effort to port over to Intel’s Fulcrum chips is not insignificant. Any work that Intel can do to make that easier/faster/cheaper is all positive in my book. It would be great to see Intel release a DPDK equivalent that is focused on integration into the switching chipsets in their Open Networking Platform (ONP) switch reference architecture (see this post from last year).

Closing Thoughts

Clearly, there’s a lot going on within Intel, as the company works hard—and is being reasonably successful—to differentiate hardware in an environment where abstraction layers like hypervisors and cloud management platforms are trying to homogenize everything. The work that Intel has done (in conjunction with HyTrust) on geofencing is nice and is, I think, an indicator of ways that Intel can continue to innovate beyond just more cores, more efficiency, and faster clock speeds (not that there’s anything wrong with those!).

Stay tuned for more liveblogs from IDF 2014, and I’ll post a day 2 recap as well. Thanks for reading!


This is a live blog of session DATS004, titled “Bare-Metal, Docker Containers, and Virtualization: The Growing Choices for Cloud Applications.” The speaker is Nicholas Weaver (yes, that Nick Weaver, who now works at Intel).

Weaver starts his presentation by talking about “how we got here”, discussing the various technological shifts that have affected the computing landscape over the years. Weaver includes a discussion of the drivers behind virtualization as well as the pros and cons of virtualization.

That, naturally, leads to a discussion of containers. Containers are not all that new—Solaris Zones is a form of containers that existed back in 2004. Naturally, the recent hype associated with Docker has, according to Weaver, rejuvenated interest in the concept of containers.

Before Weaver gets too far into containers, he first provides a background of some of the core containerization pieces. This includes cgroups (the ability to control resource allocation/utilization), which is built into the Linux kernel. Namespace isolation is also important, which provides full process isolation (so that one process can’t see processes in another namespace). Namespace isolation isn’t just for processes; there’s also isolation for network entities, mounts, and users. LXC is a set of user-space tools that attempted to make using these constructs easier, but it hasn’t (until recently) been easy to really leverage these constructs.
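
As a quick illustration of these kernel constructs (using the unshare tool from a recent util-linux; the cgroup paths assume the cgroup v1 layout and vary by distribution):

    # Start a shell in new PID and mount namespaces; inside it, 'ps aux'
    # shows only the processes in the new namespace
    sudo unshare --fork --pid --mount-proc bash

    # Constrain memory with a cgroup: create a group, set a 256MB limit,
    # and add the current shell to it
    mkdir /sys/fs/cgroup/memory/demo
    echo 268435456 > /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
    echo $$ > /sys/fs/cgroup/memory/demo/tasks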

Weaver next takes this relatively abstract discussion and makes it a bit more concrete with a specific example of how a microservice architecture would look under virtualization (OS instance, microservice libraries, and microservice itself) as well as under containers (shared OS instance and shared libraries plus microservice itself). Weaver talks about the “instant start” attribute of a container, but puts that in the context of the lifetime of the workload that’s running in the container. Start-up times don’t really matter for long-lived workloads, but for temporary, ephemeral workloads start-up times do matter. The pattern of “container on VM” is also mentioned by Weaver as another design pattern that some people use.

Next Weaver provides a quick list of pros and cons of containers:

  • Pros: faster lifecycle vs. virtual machines; visibility into what is running within the OS; ideal for homogeneous application stacks on Linux; almost non-existent overhead
  • Cons: very complex to configure (by itself, absent some sort of orchestration system or operating at scale); currently much weaker security isolation than VMs; applications must run on Linux (because Windows doesn’t have the same container technologies)

Next, Weaver transitions the discussion to focus on Docker specifically. Weaver describes Docker as “an easy button for containers,” making the underlying containerization constructs (cgroups, namespaces, etc.) easier to use. Docker is simpler and easier than LXC (where multiple binaries were involved). Weaver believes that Docker images—which he describes as an ordered set of actions to build a container—are the real game-changer. Weaver’s discussion of Docker images leads to a review of a Dockerfile, which is a DSL (domain specific language) for creating Docker images. Docker images are built on a series of layers; underlying layers could be “just” OS images (like Ubuntu or CentOS), but they could also be customized builds that contain applications and/or data.
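
For example, a minimal Dockerfile might look something like this (the base image and application are arbitrary choices); each instruction contributes a layer to the resulting image:

    # Base layer: a stock Ubuntu OS image
    FROM ubuntu:14.04
    # New layer: install an application
    RUN apt-get update && apt-get install -y nginx
    # New layer: add site content from the build context
    COPY index.html /usr/share/nginx/html/
    # Default command to run when a container starts from this image
    CMD ["nginx", "-g", "daemon off;"]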

Image registries are how users can create images and share images with other users. The public Docker Hub is an example of an image registry.
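
The workflow for sharing an image through a registry looks roughly like this (the image name is hypothetical):

    docker build -t username/webapp .   # build an image from a Dockerfile
    docker push username/webapp         # publish it to the registry
    docker pull username/webapp         # fetch it on another Docker host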

The discussion now transitions into a quick review of the underlying Docker architecture. There is a Docker daemon that runs on Linux; the Docker client can be run elsewhere. The Docker client communicates with the Docker daemon (although you should note that in many cases the daemon listens only on a local Unix socket, which means using a Docker client remotely over the network won’t work unless the daemon is configured to listen on a network socket).

The innovations that Weaver attributes to Docker include: images (like templates for VMs, and the use of copy-on-write makes them behave like code); API and CLI tools for managing container deployments; reduced complexity around deploying and managing containers; and support for namespaces and resource limits.

Weaver provides a more concrete example of how Docker can change a developer’s process for creating code. Here Weaver’s DevOps background really starts to show, as he discusses how Docker and containers would help streamline CI/CD operations.

Next up are the gotchas with containers. Trust is one gotcha; can we trust that one container won’t affect other containers? The answer, according to Weaver, is “it depends.” You still need to follow current recommended practices, such as no root access, host-level patches, auditing, and being aware of the default settings (which might be dangerous, if you aren’t aware). One way to address some of these concerns is to use VMs to provide strong security isolation between containers that need a stronger level of isolation than the standard container mechanisms can provide.

Intel, of course, is working on making containers better:

  • Security (Intel AES-NI, Intel TXT/TCP, Intel SGX)
  • Performance/flexibility (Intel VT-x/VT-d/VT-c)

Weaver wraps up the session with a quick summary of the key points from the session and some Q&A.


This is a liveblog of IDF 2014 session DATS002, titled “Virtualizing the Network to Enable a Software-Defined Infrastructure (SDI)”. The speakers are Brian Johnson (Solutions Architect, Intel) and Jim Pinkerton (Windows Server Architect, Microsoft). I attended a similar session last year; I’m hoping for some new information this year.

Pinkerton starts the session with a discussion of why Microsoft is able to speak to network virtualization via their experience with large-scale web properties (Bing, Xbox Live, Outlook.com, Office, etc.). To that point, Microsoft has over 100K servers across their cloud properties, with >200K diverse services, first-party applications, and third-party applications. This amounts to $15 billion in data center investments. Naturally, all of this runs on Windows Server and Windows Azure.

So why does networking need to be transformed for the cloud? According to Pinkerton, the goal is to drive agility and flexibility for your business. This is accomplished by pooling and automating network resources, ensuring tenant isolation, maximizing scale/performance, enabling seamless capacity expansion and workload mobility, and minimizing operational complexity.

Johnson takes over here to talk about how Intel is working to address the challenges and needs that Pinkerton just outlined. This breaks down into three core areas that have unique requirements and capabilities: network functions virtualization (NFV), network virtualization overlays (NVO), and software-defined networking (SDN).

Johnson points out that workload optimization is more than just networking; it also involves CPU (E5-2600 v3 CPU family), network connectivity (Intel XL710, now offering support for next-generation Geneve encapsulation), and storage (Intel SSDs). Johnson dives deep on the XL710, which was specifically designed to address some of the needs of cloud networking. Particularly, support for a variety of encapsulation protocols (NVGRE, IPinGRE, MACinUDP, VXLAN, Geneve), support for 40Gbps or 4x10Gbps connectivity in the same card, support for up to 8000 perfect match flow filters stored on die (this is Intel Ethernet Flow Director), and support for SR-IOV and VMDq are all areas where this card helps with NVO and SDN applications.

Next up Johnson walks through some behaviors in traditional networking as compared to network virtualization using an encapsulation protocol. Johnson uses two examples, one with VXLAN and one with NVGRE, but the basics between the two examples are very similar. Johnson also talks about why the stateless offloads in the XL710 (now supporting stateless offloads for both VXLAN and NVGRE, as well as next-generation Geneve) are important; this offloads some amount of work from the host CPU. The impact of network overlays on NIC bonding and link aggregation is another consideration; adapters and switches may not be aware of the encapsulation headers and therefore may not fully utilize all the links in a link aggregation group. The Intel X520/X540 had some offloads; the XL710 increases this support.
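
One way to check whether a given adapter and driver expose these encapsulation-aware offloads is via ethtool (the interface name is illustrative, and feature names vary by driver and kernel version):

    # List offload features and filter for tunnel segmentation support
    ethtool -k eth0 | grep tx-udp_tnl-segmentation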

That wraps up the NVO portion, and now Johnson switches gears to talk about NFV. According to Johnson, service function chaining (SFC) is a key component of NFV. There are two options for SFC: Network Services Header (NSH), or Geneve. Johnson points out that Geneve was co-authored by Intel, Microsoft, VMware, and Red Hat, and is considered to be the next-generation encapsulation protocol. This leads Johnson into a live demo of Geneve and the importance of RSS. (Without RSS, bandwidth is constrained on the receiving system.)

One other key area for support of NFV is being able to transmit large numbers of small packets. This is enabled by Intel’s work on the Data Plane Development Kit (DPDK).

Johnson points out that 40Gbps Ethernet will not offer a BASE-T option; to help address 40Gbps connectivity, Intel is introducing new, low-cost optics (both transceivers and cables). Estimated cost for Intel Ethernet MOC (Modular Optical Connectors) is around $400—well down from costs like $1300 today.

Pinkerton now takes over again, talking about VM density and the changes that have to take place to support higher VM density in private cloud environments (although I would contend that highly virtualized data centers are not private clouds). In particular, Pinkerton feels that SMB3 and SMB Direct (RDMA support) are important developments. According to Pinkerton, these protocols address the need for lower network and storage CPU overhead, higher throughput requirements, lower variances in latency and throughput, better fault tolerance, and VM workload isolation.

Pinkerton insists that using file sharing semantics is actually a much better approach for cloud-scale properties than using block-level semantics (basically, SMB3 is better than iSCSI/FC/FCoE). That leads to a discussion of RDMA (Remote Direct Memory Access), and how that helps improve performance. Standardized implementations of RDMA include iWARP (RDMA over TCP/IP) and RoCE (RDMA over Converged Ethernet). InfiniBand also typically leverages RDMA. In the context of private cloud, having the ability to route traffic is important; that’s why Pinkerton believes that iWARP and RoCE v2 (not mentioned on the slide) are important.

That leads to a discussion of some performance results, and Pinkerton calls out incast performance (many nodes sending data to a single node) as an important metric in private cloud environments. In reviewing some performance metrics for using RDMA, Pinkerton states that average latency is no longer satisfactory as a metric—instead, organizations should focus on 95th percentile and 99th percentile measurements instead of average. The metrics Pinkerton is using (based on tests with a Chelsio T580) show latency with SMB3 and RDMA to be very stable up to 90% load, and throughput is near line-rate.

Johnson takes back over now to announce that iWARP support will be built into the next generation of Intel NIC chipsets as a default for server environments.

At this point the session wraps up.

