OpenStack


Welcome to Technology Short Take #33, the latest in my irregularly-published series of articles discussing various data center technology-related links, articles, rants, thoughts, and questions. I hope that you find something useful here. Enjoy!

Networking

  • Tom Nolle asks the question, “Is virtualization reality even more elusive than virtual reality?” It’s a good read; the key thing that I took away from it was that SDN, NFV, and related efforts are great, but what we really need is something that can pull all these together in a way that customers (and providers) reap the benefits.
  • What happens when multiple VXLAN logical networks are mapped to the same multicast group? Venky explains it in this post. Venky also has a great write-up on how the VTEP (VXLAN Tunnel End Point) learns and creates the forwarding table.
  • This post by Ranga Maddipudi shows you how to use App Firewall in conjunction with VXLAN logical networks.
  • Jason Edelman is on a roll with a couple of great blog posts. First up, Jason goes off on a rant about network virtualization, briefly hitting topics like the relationship between overlays and hardware, the role of hardware in network virtualization, the changing roles of data center professionals, and whether overlays are the next logical step in the evolution of the network. I particularly enjoyed the snippet from the post by Bill Koss. Next, Jason dives a bit deeper on the relationship between network overlays and hardware, and shares his thoughts on where it does—and doesn’t—make sense to have hardware terminating overlay tunnels.
  • Another post by Tom Nolle explores the relationship—complicated at times—between SDN, NFV, and the cloud. Given that we define the cloud (sorry to steal your phrase, Joe) as elastic, pooled resources with self-service functionality and ubiquitous access, I can see where Tom states that to discuss SDN or NFV without discussing cloud is silly. On the flip side, though, I have to believe that it’s possible for organizations to make a gradual shift in their computing architectures and processes, so one almost has to discuss these various components individually, because to tie them all together makes it almost impossible. Thoughts?
  • If you haven’t already introduced yourself to VXLAN (one of several draft protocols used as an overlay protocol), Cisco Inferno has a reasonable write-up.
  • I know Steve Jin, and he’s a really smart guy. I must disagree with some of his statements regarding what software-defined networking is and is not and where it fits, written back in April. I talked before about the difference between network virtualization and SDN, so no need to mention that again. Also, the two key flaws that Steve identifies—single point of failure and scalability—aren’t flaws with SDN/network virtualization, but rather flaws in an implementation of said technologies, IMHO.

Servers/Hardware

  • Correction from the last Technology Short Take—I incorrectly stated that the HP Moonshot offerings were ARM-based, and therefore wouldn’t support vSphere. I was wrong. The servers (right now, at least) are running Intel Atom S1260 CPUs, which are x86-based and do offer features like Intel VT-x. Thanks to all who pointed this out, and my apologies for the error!
  • I missed this on the #vBrownBag series: designing HP Virtual Connect for vSphere 5.x.

Security

Cloud Computing/Cloud Management

  • Hyper-V as hypervisor with OpenStack Compute? Sure, see here.
  • Cody Bunch, who has been focusing quite a bit on OpenStack recently, has a nice write-up on using Razor and Chef to automate an OpenStack build. Part 1 is here; part 2 is here. Good stuff—keep it up, Cody!
  • I’ve mentioned in some of my OpenStack presentations (see SpeakerDeck or Slideshare) that a great place to start if you’re just getting started is DevStack. Here, Brent Salisbury has a nice write-up on using DevStack to install OpenStack Grizzly.

Operating Systems/Applications

  • Boxen, a tool created by GitHub to manage their OS X Mountain Lion laptops for developers, looks interesting. Might be a useful tool for other environments, too.
  • If you use TextMate2 (I switched to BBEdit a little while ago after being a long-time TextMate user), you might enjoy this quick post by Colin McNamara on Puppet syntax highlighting using TextMate2.

Storage

  • Anyone have more information on Jeda Networks? They’ve been mentioned a couple of times on GigaOm (here and here), but I haven’t seen anything concrete yet. Hey, Stephen Foskett, if you’re reading: get Jeda Networks to the next Tech Field Day.
  • Tim Patterson shares some code from Luc Dekens that helps check VMFS version and block sizes using PowerCLI. This could come in quite handy in making sure you know how your datastores are configured, especially if you are in the midst of a migration or have inherited an environment from someone else.

Virtualization

  • Interested in using SAML and Horizon Workspace with vCloud Director? Tom Fojta shows you how.
  • If you aren’t using vSphere Host Profiles, this write-up on the VMware SMB blog might convince you why you should and show you how to get started.
  • Michael Webster tackles the question: is now the best time to upgrade to vSphere 5.1? Read the full post to see what Michael has to say about it.
  • Duncan points out an easy error to make when working with vSphere HA heartbeat datastores in this post. Key takeaway: sometimes the fix is a lot simpler than we might think at first. (I know I’m guilty of making things more complicated than they need to be at times. Aren’t we all?)
  • Jon Benedict (aka “Captain KVM”) shares a script he wrote to help provide high availability for RHEV-M.
  • Chris Wahl has a nice write-up on using log shipping to protect your vCenter database. It’s a bit over a year old (surprised I missed it until now), and, as Chris points out, log shipping doesn’t protect the database (primary and secondary copies) against corruption. However, it’s better than nothing (which I suspect is what far too many people are using).

Other

  • If you aspire to be a writer—whether that be a blogger, author, journalist, or other—you might find this article on using the DASH method for writing to be helpful. The six tips at the end of the article are especially helpful, I think.

Time to wrap this up for now; the rest will have to wait until the next Technology Short Take. Until then, feel free to share your thoughts, questions, or rants in the comments below. Courteous comments are always welcome!


Next Monday, May 20, the OpenStack Denver meetup group will gather jointly with the inaugural meeting of the Infracoders Denver meetup group for a talk titled “Infrastructure as Code with Chef and OpenStack.” The joint meeting will be held at Innovation Pavilion in Centennial/Englewood (location information here). The event will start at 7PM.

Giving the presentation will be none other than Joshua Timberman of OpsCode (@jtimberman on Twitter). Joshua will be speaking on Chef, a system integration framework that is commonly used in “infrastructure as code” environments and in a number of OpenStack deployments. Joshua will discuss the basic principles of Chef, the primitives it provides, and how you can use it to drive your infrastructure toward full automation.

For more information, or to RSVP for the meetup event, you can visit either the OpenStack Denver meetup group event page or the Infracoders Denver meetup group event page. We do ask that you RSVP so that we can plan food and drinks for the event, but please only RSVP in one of the two meetup groups (not both).

Also, if you are interested in presenting at the OpenStack Denver meetup group or the Infracoders Denver meetup group, please let me know. We are actively seeking co-organizers as well as speakers/presenters for future events.

If you live in the South Denver metro area and are interested in either OpenStack or infrastructure as code, this is an event you won’t want to miss!


Welcome to Technology Short Take #32, the latest installment in my irregularly-published series of link collections, thoughts, rants, raves, and miscellaneous information. I try to keep the information linked to data center technologies like networking, storage, virtualization, and the like, but occasionally other items slip through. I hope you find something useful.

Networking

  • Ranga Maddipudi (@vCloudNetSec on Twitter) has put together two blog posts on vCloud Networking and Security’s App Firewall (part 1 and part 2). These two posts are detailed, hands-on, step-by-step guides to using the vCNS App firewall—good stuff if you aren’t familiar with the product or haven’t had the opportunity to really use it.
  • The sentiment behind this post isn’t unique to networking (or networking engineers), but that was the original audience so I’m including it in this section. Nick Buraglio climbs on his SDN soapbox to tell networking professionals that changes in the technology field are part of life—but then provides some specific examples of how this has happened in the past. I particularly appreciated the latter part, as it helps people relate to the fact that they have undergone notable technology transitions in the past but probably just don’t realize it. As I said, this doesn’t just apply to networking folks, but to everyone in IT. Good post, Nick.
  • Some good advice here on scaling/sizing VXLAN in VMware deployments (as well as some useful background information to help explain the advice).
  • Jason Edelman goes on a thought journey connecting some dots around network APIs, abstractions, and consumption models. I’ll let you read his post for all the details, but I do agree that it is important for the networking industry to converge on a consistent set of abstractions. Jason and I disagree that OpenStack Networking (formerly Quantum) should be the basis here; he says it shouldn’t be (not well-known in the enterprise), I say it should be (already represents work created collaboratively by multiple vendors and allows for different back-end implementations).
  • Need a reasonable introduction to OpenFlow? This post gives a good introduction to OpenFlow, and the author takes care to define OpenFlow as accurately and precisely as possible.
  • SDN, NFV—what’s the difference? This post does a reasonable job of explaining the differences (and the relationship) between SDN and NFV.

Servers/Hardware

  • Chris Wahl provides a quick overview of the HP Moonshot servers, HP’s new ARM-based offerings. I think that Chris may have accidentally overlooked the fact that these servers are not x86-based; therefore, a hypervisor such as vSphere is not supported. Linux distributions that offer ARM support (like Ubuntu, RHEL, and SuSE) are supported, however. The target market for this is massively parallel workloads that will benefit from having many different cores available. It will be interesting to see how the support of a “Tier 1” hardware vendor like HP affects the adoption of ARM in the enterprise.

Security

  • Ivan Pepelnjak talks about a demonstration of an attack based on VM BPDU spoofing. In vSphere 5.1, VMware addressed this potential issue with a feature called BPDU Filter. Check out how to configure BPDU Filter here.

Cloud Computing/Cloud Management

  • Check out this post for some vCloud Director and RHEL 6.x interoperability issues.
  • Nick Hardiman has a good write-up on the anatomy of an AWS CloudFormation template.
  • If you missed the OpenStack Summit in Portland, Cody Bunch has a reasonable collection of Summit summary posts here (as well as materials for his hands-on workshops here). I was also there, and I have some session live blogs available for your pleasure.
  • We’ve probably all heard the “pets vs. cattle” argument applied to virtual machines in a cloud computing environment, but Josh McKenty of Piston Cloud Computing asks whether it is now time to apply that thinking to the physical hosts as well. Considering that the IT industry still seems to be struggling with applying this line of thinking to virtual systems, I suspect it might be a while before it applies to physical servers. However, Josh’s arguments are valid, and definitely worth considering.
  • I have to give Rob Hirschfeld some credit for—as a member of the OpenStack Board—acknowledging that, in his words, “we’ve created such a love fest for OpenStack that I fear we are drinking our own kool aide.” Open, honest, transparent dealings and self-assessments are critically important for a project like OpenStack to succeed, so kudos to Rob for posting a list of some of the challenges facing the project as adoption, visibility, and development accelerate.

Operating Systems/Applications

Nothing this time around, but I’ll stay alert for items to add next time.

Storage

  • Nigel Poulton tackles the question of whether ASIC (application-specific integrated circuit) use in storage arrays elongates the engineering cycles needed to add new features. This “double edged sword” argument is present in networking as well, but this is the first time I can recall seeing the question asked about modern storage arrays. While Nigel’s article specifically refers to the 3PAR ASIC and its relationship to “flash as cache” functionality, the broader question still stands: at what point do the drawbacks of ASICs begin to outweigh the benefits?
  • Quite some time ago I pointed readers to a post about Target Driven Zoning from Erik Smith at EMC. Erik recently announced that TDZ works after a successful test run in a lab. Awesome—here’s hoping the vendors involved will push this into the market.
  • Using iSER (iSCSI Extensions for RDMA) to accelerate iSCSI traffic seems to offer some pretty promising storage improvements (see this article), but I can’t help but feel like this is a really complex solution that may not offer a great deal of value moving forward. Is it just me?

Virtualization

  • Kevin Barrass has a blog post on the VMware Community site that shows you how to create VXLAN segments and then use Wireshark to decode and view the VXLAN traffic, all using VMware Workstation.
  • Andre Leibovici explains how Horizon View Multi-VLAN works and how to configure it.
  • Looking for a good list of virtualization and cloud podcasts? Look no further.
  • Need Visio stencils for VMware? Look no further.
  • It doesn’t look like it has changed much from previous versions, but nevertheless some people might find it useful: a “how to” on virtualization with KVM on CentOS 6.4.
  • Captain KVM (cute name, a take-off of Captain Caveman for those who didn’t catch it) has a couple of posts on maximizing 10Gb Ethernet on KVM and RHEV (the KVM post is here, the RHEV post is here). I’m not sure that I agree with his description of LACP bonds (“2 10GbE links become a single 20GbE link”), since any given flow in a LACP configuration can still only use 1 link out of the bond. It’s more accurate to say that aggregate bandwidth increases, but that’s a relatively minor nit overall.
  • Ben Armstrong has a write-up on how to install Hyper-V’s integration components when the VM is offline.
  • What are the differences between QuickPrep and Sysprep? Jason Boche’s got you covered.
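
To make the LACP point in the Captain KVM item above concrete, here is a minimal Python sketch of hash-based link selection in a bond. The hash function and the 5-tuple layout are illustrative assumptions, not the algorithm of any particular switch or bonding mode; the only point is that every packet of one flow maps to the same member link.

```python
import zlib

def pick_link(flow, num_links):
    """Map a flow (5-tuple) to one member link of the bond.

    Hypothetical hash policy: real bonds use vendor- or mode-specific
    hashes, but they are deterministic per flow in the same way.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_links

# One flow: (src IP, dst IP, protocol, src port, dst port)
flow = ("10.0.0.1", "10.0.0.2", "tcp", 49152, 3260)

# Every packet of this flow lands on the same link, so the flow is
# capped at one link's bandwidth even on a 2 x 10GbE bond.
links_used = {pick_link(flow, 2) for _packet in range(1000)}
```

Many different flows spread across both links, which is why aggregate bandwidth grows while any single flow stays limited to one link.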

I suppose that’s enough information for now. As always, courteous comments are welcome, so feel free to add your thoughts in the comments below. Thanks for reading!


I had the pleasure of attending the OpenStack Summit in Portland, OR last week. It was my first time at the OpenStack Summit, and it was great to meet lots of folks in the OpenStack community as well as be exposed to some more in-depth and detailed OpenStack information. While I was there I tried to liveblog as many sessions as I was able; here are links to the various session liveblogs that I managed to publish. Enjoy!

Getting From Grizzly to Havana, a DevOps Upgrade Pattern
Nicira NVP Deep Dive
Considerations for Building a Private Cloud, Folsom Update
Building HA OpenStack with Puppet in 20 Minutes
OpenStack Capacity Planning
Networking in the Cloud, an SDN Primer
OpenStack Back to the Enterprise, Keep Calm and Boldly Go On
OpenStack High Availability in Grizzly and Beyond

If anyone has any other liveblog sessions that should be added to this list, drop me a comment and let me know.


This is a session titled “More Reliable, More Resilient, More Redundant: OpenStack High Availability in Grizzly and Beyond.” The presenter is Florian Haas from Hastexo (@hastexo). Florian is one of the founders of Hastexo, which is a services firm that provides OpenStack services, among other things.

There are four things that need to be addressed when discussing high availability:

  • Infrastructure
  • Storage
  • Compute
  • Networking

The presentation starts with a discussion of the infrastructure layer. Changes from Folsom to Grizzly are relatively few. Examples of infrastructure services include the database (typically MySQL, sometimes PostgreSQL), the message queue (AMQP via RabbitMQ, or an alternative such as ZeroMQ), and API services. With regard to infrastructure HA, there are 5 types of infrastructure nodes:

  1. Cloud controller
  2. API node
  3. Network node
  4. Compute node
  5. Storage controller

The cloud controller runs services that underpin OpenStack. It runs a relational database, an AMQP server, registry services, etc. The ability for an implementation to use active/passive or active/active depends largely on the specific back-end applications in use. Largely, cloud controllers will be deployed active/passive (this is due, in part at least, to the persistent data storage found in the relational databases).

API nodes are fundamentally stateless (locally); they interact with the AMQP message bus. For most API services, we can use active/active scale-out approaches.

The network node is, according to the presenter, “interesting.” This node takes care of routing between tenant networks, provider networks, and upstream networks. The network node typically also runs the DHCP agent to provide IP addresses to tenant systems. We’ll come back to the network node shortly.

The compute node is the hypervisor itself, where guest VMs/instances actually execute.

The storage controller depends greatly on what kind of back-end block storage is in use. The block storage server (Cinder server) might have no local storage (which might be the case if you were running with Ceph, for example) or it might have lots of local storage.

All five of these infrastructure node types can use the same high availability stack, but with a few minor differences. The “recommended” high availability stack is Pacemaker. Hastexo has reference configurations for using Pacemaker with all OpenStack infrastructure services. According to the presenter, “it’s not rocket science—you can do it.”

The presenter next shows a diagram of an architecture using Pacemaker and Corosync for high availability of the cloud controller. From there he shows an example of using Pacemaker/Corosync for high availability of the stateless API services.

From here, he moves on to discuss HA for compute in a bit more detail. Grizzly addresses guest HA. He shows a hack where you “override” a couple of parameters to “trick” Nova into restarting VMs on another host after the first host fails. Unfortunately, this hack breaks live migration and is unsafe with Cinder volumes (although the Cinder issue is fixable).

In Grizzly, Nova has a nova evacuate command, but this command actually only works after a host has failed. This evacuation functionality isn’t present in Horizon (the Dashboard).

Another interesting feature is VM Ensembles. This feature allows you to group guests in a resilient fashion. The presenter provides an example of 6 VMs (2 each of three tiers), and the desire to make sure that the redundant nodes aren’t running on the same compute node. (For VMware users, this would be analogous to anti-affinity rules.) Unfortunately, this feature did not make it into the Grizzly release (it should be in Havana). A workaround would be to use a scheduler filter.
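
As a rough illustration of the anti-affinity idea, the following Python sketch filters candidate hosts so that two members of the same group never land on the same compute node. It is a standalone sketch, not Nova's scheduler filter API; the function name, the host names, and the `placements` structure are all hypothetical.

```python
def anti_affinity_hosts(candidate_hosts, placements, group):
    """Return the hosts that do not already run a member of `group`.

    candidate_hosts: list of host names to consider
    placements: dict mapping host name -> set of group names placed there
    group: anti-affinity group of the instance being scheduled
    """
    return [
        host
        for host in candidate_hosts
        if group not in placements.get(host, set())
    ]

# Hypothetical state: one "web" VM on compute1, one "db" VM on compute2.
placements = {"compute1": {"web"}, "compute2": {"db"}, "compute3": set()}

# Scheduling the second "web" VM must avoid compute1.
eligible = anti_affinity_hosts(
    ["compute1", "compute2", "compute3"], placements, "web"
)
```

A real scheduler filter would apply the same test per host while the scheduler picks a placement; the sketch only shows the decision itself.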

Looking in more detail at Networking, the same HA architecture (Pacemaker/Corosync) worked reasonably well in Folsom for Quantum-server and the L3 agent, although it didn’t work so well for the DHCP agent. This active/passive approach also didn’t scale well. Grizzly helps address this by running multiple DHCP agents and multiple L3 agents to provide better scale-out support (via the Quantum scheduler).

Finally, the presenter moves on to discuss storage. Grizzly has dramatically expanded the amount of support within Cinder for block storage options; the presenter highly recommends upgrading to Grizzly if you need expanded Cinder support. He briefly mentions new Cinder drivers for NFS, GlusterFS, 3PAR, LeftHand, EMC, and others. Pacemaker/Corosync is needed for the Cinder volume server. There is a hack required (need to override a host value in the database) in order to provide high availability for the Cinder volume server. It’s possible that this hack will be fixed in Havana (a bug has already been filed to fix it).

A few other tidbits:

  • Libvirt watchdog support in Nova and Glance is coming
  • Heat will bring some additional high availability features (Heat is now an integrated project and will be fully supported with Havana, as will Ceilometer)
  • The RabbitMQ library (Kombu) has gained the ability to have a list of queue hosts to which to connect (instead of just a single host)
  • ZeroMQ (peer-to-peer messaging) is another interesting option
  • MySQL/Galera is firming up to provide write-set replication (which will enable synchronous replication)
  • Some changes are occurring within RBD when you use it for both Glance and Cinder (the presenter did not elaborate)
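
The value of giving a messaging client a list of queue hosts (the Kombu item above) can be sketched in a few lines of Python. This is a generic illustration of trying hosts in order, not Kombu's actual API; `connect_with_failover`, `fake_connect`, and the host names are hypothetical.

```python
def connect_with_failover(hosts, connect):
    """Try each broker host in order; return the first successful connection."""
    failed = []
    for host in hosts:
        try:
            return connect(host)
        except ConnectionError:
            failed.append(host)  # remember the failure and try the next host
    raise ConnectionError(f"all brokers unreachable: {failed}")

# Hypothetical stand-in for a real client library: only "mq2" is up.
def fake_connect(host):
    if host != "mq2":
        raise ConnectionError(f"{host} is down")
    return f"connected to {host}"

result = connect_with_failover(["mq1", "mq2", "mq3"], fake_connect)
```

With only a single configured host, the failure of that one broker would stop every service that depends on the message bus.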

At this point, the presenter wrapped up the session by opening up for questions and answers.


This is a session titled “OpenStack Back to the Enterprise: Keep Calm and Boldly Go.” The session is led by Florian Otel (@florianotel on Twitter), with HP Cloud Services in EMEA. The purpose of this talk is to share some of the “lessons learned” in how to position OpenStack to enterprise customers and overcome their objections.

Florian starts out the presentation with a slide that says, “This is a business, not a science project.” He re-iterates that this session is about making business sense. He also assures us that this presentation won’t be a glitzy marketing session, either—it will be real, nitty gritty, “in the trenches” knowledge learned when positioning OpenStack to enterprise customers. Finally, Florian acknowledges that his presentation will probably be a bit biased toward service provider-type use cases.

The presentation goes on to display a picture of Geoffrey Moore, who wrote a book titled “Crossing the Chasm”. Florian ties this to the adoption curves of various technologies and Moore’s assertion in his book that “we need to be very mindful of the customers in the market”. Specifically, marketing to the early adopters (on the left edge of the bell curve) and the mainstream (the bulge of the bell curve) is very different.

Next, the presenter shows us a picture of Clayton Christensen, who wrote (among other books) “The Innovator’s Dilemma.” The conclusion drawn in the book is that there are two types of innovation: sustaining innovation and disruptive innovation.

Florian ties these two thoughts together in a chart that combines the adoption curve with the adoption/evolution of OpenStack as a disruptive innovation.

So how does one pitch OpenStack to an enterprise organization? Florian shares this quote: “Never try to sell a meteor to a dinosaur. It wastes your time and annoys the dinosaur.”

If that’s not the right way, then what is? Florian makes the “dreaded Linux-OpenStack comparison,” combining it with models and charts from Moore’s “Crossing the Chasm.” Florian posits that a key adoption point is that the underlying platform—be it Linux or OpenStack—must “become irrelevant.” He points to Comcast’s demo (which is powered by OpenStack) and asks, “Did anyone see OpenStack there?”

Florian goes to another quote from Moore stating that applications have an advantage over platforms when it comes to crossing the chasm. Moore believes that “platforms must be garbed in application clothing” in order to cross the chasm. In other words, “mind the gap” between applications and platforms.

The next slide in Florian’s presentation says this: “The more I love the idea, the less money it makes!!” The key point to take away is that as technologists we often “fall in love” with a technology/project/platform, but we need to be able to articulate the value of this technology/project/platform in some way other than “it’s a really cool technology.” This aligns very closely with my own thinking—we need to adopt some practicality if we want to see the technology/project/platform we love so much actually succeed.

Florian now moves from the abstract and theoretical into a more concrete discussion of various use cases for HPCS (HP Cloud Services). These use cases include archival, collaboration, “cloud bursting,” dev/test PaaS, and production applications. He delves in a bit deeper on one particular use case, to which he refers as “Dropbox for the enterprise.”

Next the presenter shares a warning: “All good ideas must die—so that great ideas might live.” Good use cases are going to die and pass away, but new (potentially even better) use cases will emerge. We mustn’t get “tied” to our existing use cases.

There are fundamentally three different areas where a company can focus:

  • Operational excellence
  • Product leadership
  • Customer intimacy

Florian says he believes that one lesson HP learned is that customer intimacy is critically important. He didn’t say, but I suspect that customer intimacy is important at earlier stages of market adoption (going back to the bell curve of market adoption), while other areas of focus might be more important at other stages of adoption.

According to Florian, it’s called bleeding edge for a reason. Be ready to help your customers that hurt themselves. It’s also important to “not get in your own way.” Be willing to admit when you’re wrong, press the Reset button, and press forward with customer needs in the forefront of your vision.

The secret to success is, according to Florian, simple: “Just learn to use OpenStack the way Hendrix uses his guitar.”


This is a session titled “Networking in the cloud: An SDN primer.” It’s led by Ben Cherian, Chief Strategy Officer for Midokura. Ben indicates he’ll try to remain as impartial as possible while attempting to describe and define SDN.

Ben asserts that the basic driver pushing folks toward SDN is that the current state of networking in the cloud is too complex and too manual. As a result, the first question that people start asking is, how can I automate this? Telecom has gone through this before, and data networking is in a similar state of flux and development.

The presenter next discusses Almon Strowger, who invented the first electromechanical switch. His work, which might have been driven by some level of paranoia, led to automated phone switching and the rotary phone. Almon Strowger wanted to address privacy concerns and intentional human errors, and along the way he also reduced unintentional human errors, improved connection speeds, and lowered operational costs.

Ben indicates that the cloud—in particular, networking—is in a similar position. It’s time for the “Almon Strowger” of cloud networking to solve the challenges that keep networking from scaling.

The presenter next covers the concept of a control plane and the data plane. Abstraction is a key tenet of computer science, and when the control plane and the data plane reside on the same physical device (which is typically the case in traditional networks today), there is no abstraction. Abstraction exists in other areas of computing, but not in networking. To bring that abstraction into networking, a simple example is to separate the control plane into a separate controller that exists apart from the data plane. This is basic SDN.
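
A toy Python model can make this split easier to picture. It is a sketch under simplifying assumptions, not OpenFlow or any real controller API: the `Switch` and `Controller` classes, the port numbers, and the prefix are hypothetical, and lookups are exact-match only.

```python
class Switch:
    """Data plane: forwards using only its local flow table."""

    def __init__(self, name):
        self.name = name
        self.flow_table = {}  # destination -> output port

    def forward(self, destination):
        # No decisions are made here; unknown destinations return None.
        return self.flow_table.get(destination)


class Controller:
    """Control plane: decides forwarding state and pushes it to switches."""

    def __init__(self, switches):
        self.switches = switches

    def install_route(self, destination, ports_by_switch):
        for switch in self.switches:
            switch.flow_table[destination] = ports_by_switch[switch.name]


s1, s2 = Switch("s1"), Switch("s2")
controller = Controller([s1, s2])
controller.install_route("10.0.1.0/24", {"s1": 1, "s2": 3})
```

The switches hold no logic beyond a table lookup; all policy lives in the controller, which is the abstraction the presenter describes.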

Ben sees three use cases for SDN:

  1. IaaS
  2. Data center fabrics
  3. Carrier/WAN use cases

Looking at these in reverse order: in the carrier/WAN space, examples of SDN technologies that you could see would be a “hybrid control plane” leveraging either BGP (a distributed control plane) or OpenFlow (a centralized control plane). In the data center fabric space, this involves the use of OpenFlow to manage multiple switches. Finally, in the IaaS space, you see software-based solutions and overlays. Examples of companies in this IaaS space are Midokura, VMware/Nicira, and others.

Some key requirements for IaaS “cloud networking”/SDN:

  • Multi-tenancy
  • L2 isolation
  • L3 isolation
  • Scalable control plane
  • NAT (floating IP)
  • ACLs
  • Stateful (L4) firewall
  • VPN
  • BGP gateway
  • RESTful API
  • Integration with CMS (like OpenStack)

Looking at this list of requirements, Ben feels like you can eliminate the technologies leveraged in the carrier/WAN and data center fabric SDN use cases, because these technologies don’t properly address all the IaaS/cloud networking requirements.

Ben next reviews a networking diagram that shows how these various requirements translate into cloud networks.

The candidate models to address these requirements are:

  • Traditional network
  • Hop-by-hop OpenFlow
  • Edge-to-edge IP overlays

Using traditional network models, VLANs become a constraint (the 12-bit VLAN ID allows at most 4096 VLANs, which means at most 4096 tenants). This constrains L2 isolation. L3 isolation is constrained by VRFs, which are not fault tolerant, require expensive hardware, and don’t scale well.

If you wanted to use an OpenFlow fabric, the issue is more about the limitations of the physical switch itself. Storing the state in physical switches has several issues: it doesn’t scale, updates aren’t fast enough, and updates aren’t atomic. There is also the issue of provisioning the physical switches that will then be controlled by OpenFlow. This approach also doesn’t address the other “higher level” concerns of cloud networking.

Edge-to-edge IP overlays are the method that Ben (and his company) prefers. Isolation is provided without VLANs, providing additional scalability. Only IP connectivity is required. You can use a scalable routing protocol (iBGP, OSPF) to build a reliable multi-path underlay network. This sort of thinking is inspired by a research paper by Microsoft regarding VL2 (no link provided).
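
The scalability difference is simple arithmetic, sketched here in Python. The 12-bit VLAN ID comes from IEEE 802.1Q; the 24-bit overlay identifier is an assumption borrowed from VXLAN for comparison, since the session did not name a specific encapsulation.

```python
VLAN_ID_BITS = 12     # width of the IEEE 802.1Q VLAN ID field
OVERLAY_ID_BITS = 24  # assumption: a VXLAN-style 24-bit segment ID

vlan_ids = 2 ** VLAN_ID_BITS             # 4096 possible values
usable_vlans = vlan_ids - 2              # IDs 0 and 4095 are reserved
overlay_segments = 2 ** OVERLAY_ID_BITS  # 16,777,216 possible segments

# How many times more isolated segments the overlay ID space allows.
ratio = overlay_segments // vlan_ids
```

Under that assumption the overlay identifier space is 4096 times larger than the VLAN ID space, which is why isolation stops being the limiting factor.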

Trends that support this sort of solution:

  • Faster packet processing on x86 servers at the edge
  • Clos (fat tree/leaf-spine) networks for the underlay
  • Merchant silicon brings down the cost of efficient IP switches
  • Optical intra-DC networks for plentiful bandwidth

The presenter also alludes to the use of configuration management tools (Chef, Puppet) on merchant silicon-based switches.

Ben next shows a diagram that illustrates an overlay network and an underlay network, and he walks through how traffic flows work (both from the overlay perspective as well as the underlay perspective).

Naturally, Ben believes that overlays are the right approach—but you still need a scalable control plane.

At this point, he opens the session up for questions and answers.


This is a liveblog of a session titled “OpenStack Capacity Planning.” The presenter starts out with a shout-out to the OpenStack Operations Guide that was recently written.

Here’s how to make capacity planning easy and simple:

  • A blank check backed by limitless funds
  • Unlimited time
  • A well-organized team of geniuses
  • Perfectly clear expectations that never change (up front & in writing)

Don’t have all that? Well, then you have to worry about capacity planning.

To start with capacity planning, the presenter suggests that the absolute best place to start is DevStack. Using DevStack allows you to test various capacity planning scenarios easily and quickly. However, the presenter warns against trying to use DevStack in production.

Next, you need to answer the question: Public cloud or private cloud? The answer to this question will drive a lot of the follow-up questions that you must answer. If the answer to this first question is private cloud, then it’s typically easier to do capacity planning because you’ll generally have a better idea of what sort of applications and workloads will be deployed. The presenter also feels that limited hardware/networking/storage choices in private cloud deployments make capacity planning easier (although I’m not sure I agree). Deployment is also easier and can be a bit more leisurely due to lower growth rates.

For a public cloud, capacity planning is much trickier. You’ll have to design against a generic use case/workload because you won’t know the types of applications and workloads that your customers will actually put on this public cloud. In public cloud environments, you’ll likely be standing up new compute nodes constantly, so the speed and frequency of deployment is a key issue. Tools for fast, reliable provisioning and configuration management are very important.

The presenter mentions a few tools in this space: Crowbar, Puppet, Razor, Cobbler, Chef, CFEngine, and Fuel.

Monitoring is often an afterthought, but it really shouldn’t be—it should be done up front as much as possible. Here you can look at tools like Nagios, but there is a lot of debate in this space within the open source community. One of the reasons monitoring is important is to help establish some trending information, which helps in forecasting capacity needs.
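As a rough illustration of how trending data feeds a forecast, the sketch below fits a straight line to weekly usage samples and estimates how long until capacity runs out. The sample values, the 1024 GB capacity, and the function names are all hypothetical; the presenter didn't describe a specific forecasting method, and real growth is rarely this linear.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def weeks_until_full(samples, capacity):
    """Estimate weeks from the last sample until usage reaches capacity.

    Samples are assumed to be one per week, oldest first. Returns None
    when there are too few samples or usage is flat or shrinking.
    """
    if len(samples) < 2:
        return None
    weeks = list(range(len(samples)))
    slope, intercept = fit_line(weeks, samples)
    if slope <= 0:
        return None
    full_at = (capacity - intercept) / slope
    return max(0.0, full_at - weeks[-1])


# Hypothetical weekly RAM allocation (GB) in a cloud with 1024 GB total.
ram_used_gb = [400, 440, 480, 520, 560]
print(weeks_until_full(ram_used_gb, 1024))
```

The same function works for any metric a monitoring system exports as a time series (disk, vCPUs, floating IPs), which is one reason to get monitoring in place early.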

What about the hypervisor you’re going to use? The presenter feels like this is an important decision. Pick the best hypervisor for your workload. The presenter prefers KVM, but there are others (he notably doesn’t mention vSphere but does mention Hyper-V, which is interesting). He recommends against mixed hypervisor/heterogeneous environments. The hypervisor decision will also drive some storage decisions down the road (among other things).

At this point, the presenter circles back to DevStack, and recommends the use of DevStack to “test the heck out of” your selected hypervisor and anticipated workload. This isn’t necessarily for performance benchmarks, but rather to validate that everything works as expected.

Networking is the next big topic. The presenter recommends being very intentional about network selections, as he says that it is extremely difficult to switch between networking architectures (like from FlatDHCP to Quantum). Also be sure to watch out for bandwidth requirements and design accordingly. Naturally, IP addressing will be another concern you’ll need to consider.

Software-defined networking (SDN) is what the presenter recommends for any sizable deployments. He’s partial to NEC’s OpenFlow solution.

Compute density is a key factor in capacity planning. You’ll need to incorporate physical CPU cores, RAM, oversubscription ratio, and instance storage (ephemeral storage is local or shared). Using shared storage, like NFS or Ceph, means recovering VMs is much easier. Naturally, this means you’ll need to balance IOPS against storage capacity (GB/TB). Based on the speaker’s comments, I’d guess that he heavily favors Ceph.
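The compute density inputs listed above combine into a simple calculation. This is a minimal sketch; the class names, the host specification, and the oversubscription ratios are assumptions chosen for illustration, not figures from the session.

```python
from dataclasses import dataclass


@dataclass
class Host:
    physical_cores: int
    ram_gb: float
    cpu_oversubscription: float  # vCPUs allowed per physical core
    ram_oversubscription: float  # virtual GB allowed per physical GB


@dataclass
class Flavor:
    vcpus: int
    ram_gb: float


def instances_per_host(host: Host, flavor: Flavor) -> int:
    """Instances of one flavor that fit on a host; the scarcer resource wins."""
    by_cpu = int(host.physical_cores * host.cpu_oversubscription // flavor.vcpus)
    by_ram = int(host.ram_gb * host.ram_oversubscription // flavor.ram_gb)
    return min(by_cpu, by_ram)


# Hypothetical host: 12 cores, 96 GB RAM, 4:1 CPU and 1.5:1 RAM oversubscription.
host = Host(physical_cores=12, ram_gb=96, cpu_oversubscription=4.0,
            ram_oversubscription=1.5)
flavor = Flavor(vcpus=2, ram_gb=4)
print(instances_per_host(host, flavor))
```

In this example the host is CPU-bound (24 instances by vCPU versus 36 by RAM), so raising the RAM ratio further would buy nothing. Instance storage and IOPS would add further limits in the same way.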

When it comes to storage, your options for object storage are Swift, Ceph, and Gluster. Bandwidth is a concern (moving data to/from the object storage platform). The presenter stresses the importance of testing to understand the impact of the workloads on the object storage platform. The same goes for persistent block storage.

Finally, the presenter touches on concerns over the scalability of the cloud controller (which hosts the API endpoints and database). At some point you’ll have to consider separating the API endpoints onto dedicated boxes and using a load balancer. The presenter also suggests considering the use of Nova cells to help with growth and partitioning the load on the cloud controllers. (Cells are something that I really need to understand a bit better.)

And that concludes the session.

Overall, I didn’t find this session as useful as I would have liked—I expected this to be about ongoing capacity planning, not implementation design considerations. However, others might have come into the session with different expectations and might have found this session helpful. The key takeaway for me is that, as I saw from a similar session yesterday, it’s just as important when designing an OpenStack implementation to consider the resource demands, workload needs, and similar requirements. Just because it’s “cloud” doesn’t mean that it doesn’t still require some knowledge of how these components work under the covers.


This is a liveblog of a session on using Puppet to build a highly available (HA) installation of OpenStack. The presenter is Boris Renski of Mirantis.

Boris believes that you need to know many things in order to successfully create the build architecture for an OpenStack deployment:

  • Linux
  • Networking
  • Virtualization
  • Python
  • Ruby
  • Puppet
  • Cobbler (or Razor)
  • mCollective/Salt

Boris introduces Fuel, which is an automation library for OpenStack (that is supposedly easy enough for a goat to use—a play on Mirantis’ geographical location and the Borat movies).

Fuel essentially includes the following items:

  • An OpenStack reference architecture with HA
  • Puppet manifests for deploying OpenStack
  • Cobbler-based bare metal provisioning
  • OpenStack packages
  • Support for CentOS, RHEL, and Ubuntu
  • Support for OpenStack Essex, Folsom, and (in May) Grizzly
  • A detailed configuration guide

Fuel supports a number of different deployment configurations: single node (pretty straightforward, much like DevStack); multi-node (including compact Swift and standalone Swift); and multi-node HA (with compact and standalone Swift and Quantum elements). “Swift compact” is for when Swift will be used only as back-end storage for VMs. “Quantum compact” is running Quantum on the controller node, even with high availability.

Fuel was specifically created in the form of a library so that users could easily modify and adopt the scripts to fit their particular OpenStack deployment. This gives users more flexibility when using Fuel to deploy OpenStack.

For the HA architecture:

  • They use HAProxy for most of the OpenStack services
  • For the message queue, they use an active/active RabbitMQ cluster
  • For the database, they use an active/active Galera MySQL cluster (this forces a minimum of four physical nodes)
  • The architecture uses keepalived for VIP (virtual IP) management

The overall process for deploying OpenStack with Fuel goes like this:

  1. Build the Fuel “master node,” which runs Cobbler and Puppet master
  2. Enter hardware info into Cobbler
  3. Cobbler installs the base OS (CentOS, RHEL, Ubuntu)
  4. Puppet picks up the node and installs/configures OpenStack

Next, Boris goes into a more light-hearted section on how he taught a goat to use Fuel. For us humans, this means the “Fuel portal,” which provides step-by-step instructions on using Fuel. They (Mirantis) also created “Fuel Web,” which is an easy GUI for Fuel. A private beta for Fuel Web is starting today.

Boris now turns the stage over to Roman, who shows a live demo of using Fuel to turn up a 2-node OpenStack deployment. Overall, Fuel looks like a very interesting and useful tool.

Looking ahead, what’s on the Fuel roadmap? Roman wants to add screens for the management of disks and NICs, which don’t exist in Fuel Web today. There’s also no support for Cinder in the web UI today, which is another item they’d like to add in future releases. They are also considering some level of monitoring and performance metrics for the OpenStack environments deployed using Fuel. Finally, they want to extend Fuel to help with OpenStack upgrades as well.

Fuel is available for download at http://fuel.mirantis.com.


This is a live blog for the session titled “Considerations for Building a Private Cloud, Folsom Update,” led by Ryan Richard of Rackspace (@rackninja on Twitter). As with other sessions here at the 2013 OpenStack Summit, this session is totally full, with people standing in the back, sitting on the floor along the sides, and seated on the floor across the front.

This session is about design considerations for building a private cloud with OpenStack. The focus will be the Folsom release. This session is based on experience after running Folsom for 6 months. Ryan intends to provide a Grizzly-based version of this talk next time, after running Grizzly for 6 months.

First he tackles the question, “What is a private cloud?”

  • Are you looking for elastic or traditional virtualization? It most likely won’t be both.
  • Multi-tenant (or, more likely, multi-application)
  • Size (this talk will be limited to discussions of up to 100 nodes)
  • Private endpoints (the management endpoints aren’t accessible from the Internet)
  • Limited inbound connectivity
  • Customized for specific workloads

Ryan’s first recommendation is “Build with the end in mind.” He looks at how deploying the “m1.tiny” flavor would create a mismatch between CPU and RAM utilization, in that 48 vCPUs will be utilized but only a fraction of the host’s RAM would be allocated. The “m1.medium” flavor (4GB RAM, 2 vCPUs) creates a more balanced workload, whereas the larger flavors imbalance utilization the other way.
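The flavor mismatch is easy to see with a quick calculation. In this sketch the 48 vCPUs and the m1.medium size come from the talk; the 96 GB of host RAM and the m1.tiny size (1 vCPU, 512 MB) are assumptions, since those figures aren't in these notes.

```python
HOST_VCPUS = 48    # from the talk's example
HOST_RAM_GB = 96   # assumption: host RAM size isn't given in these notes

FLAVORS = {
    "m1.tiny": (1, 0.5),   # (vCPUs, RAM in GB); assumed stock values
    "m1.medium": (2, 4),   # from the talk
}


def utilization(vcpus, ram_gb):
    """Return (instance count, CPU fraction used, RAM fraction used)
    when the host is packed with instances of a single flavor."""
    count = min(HOST_VCPUS // vcpus, int(HOST_RAM_GB // ram_gb))
    return count, count * vcpus / HOST_VCPUS, count * ram_gb / HOST_RAM_GB


for name, (vcpus, ram_gb) in FLAVORS.items():
    count, cpu_frac, ram_frac = utilization(vcpus, ram_gb)
    print(f"{name}: {count} instances, CPU {cpu_frac:.0%}, RAM {ram_frac:.0%}")
```

Under these assumptions, m1.tiny exhausts the vCPUs while allocating only a quarter of the RAM, whereas m1.medium fills both at the same time.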

What this tells me is that capacity planning is just as important with a private cloud deployment as it is with a “traditional virtualization” solution. Ryan’s recommendations around capacity are:

  • Don’t use a disk size of 0.
  • For public cloud offerings, you can limit the number of flavors. For private cloud offerings, you can create “customized” flavors for specific workloads. Find a balance for your organization.
  • Don’t forget about network utilization.

It’s important to remember that it’s easy to add compute nodes, but you can’t change the fixed network (without Quantum) once instances are running. Quantum helps address this. (Note that Ryan indicates it’s possible to create multiple networks using the CLI or API in Folsom without Quantum, but the dashboard doesn’t respect or recognize it.)

This limit on the fixed network means that it is critically important to size the number of addresses available by calculating the number of instances that could be spun up within the cloud.
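Here is a minimal sketch of that sizing exercise using Python's standard `ipaddress` module. The node count, instances per node, base address, and the three reserved addresses (network, broadcast, gateway) are assumptions for illustration; a real deployment may reserve more.

```python
import ipaddress


def fixed_network(base, instances_needed, reserved=3):
    """Smallest IPv4 network at `base` that holds the instances plus
    reserved addresses (assumed here: network, broadcast, gateway)."""
    total = instances_needed + reserved
    host_bits = max(2, (total - 1).bit_length())  # round up to a power of two
    return ipaddress.ip_network(f"{base}/{32 - host_bits}", strict=False)


# Hypothetical cloud: 40 compute nodes at up to 24 instances each.
max_instances = 40 * 24
net = fixed_network("10.0.0.0", max_instances)
print(net, net.num_addresses)
```

Because the fixed network can't be resized later, it makes sense to feed this calculation the maximum number of instances the cloud could ever host, not the number expected at launch.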

It’s easy to add nodes or networks on the host network side, but you can’t change the fixed network once you go into production (not without destroying your instances). It’s also easy to add more floating networks.

Ryan now switches gears to talk about images and storage. (There’s a session tomorrow in C123 at 1:50 PM.) Rackspace is going with virtio, qcow2 disk format, bare container, cloud-init with dynamic partitioning. I’m not 100% familiar with all of these terms, so you might expect to see some posts soon on some of these. There is a small performance hit with qcow2, so be sure to quantify that when building images (or re-using images that someone else has built).

Snapshotting is another sizing consideration that can’t really be adequately predicted (it’s hard to know if your users will be using snapshots or not). Ryan recommends qcow2 for snapshots. To help maximize the use of host caching of images, try to streamline the number of images.

A few more Glance tips:

  • Watch for network utilization. Glance could consume an entire 1Gb NIC.
  • Consider RAID 5 for large sequential reads/writes.
  • Disk bandwidth is more important than disk IOPS.
  • Reduce the number of images to improve host cache functionality.
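To see why Glance can saturate a 1Gb NIC, consider the time it takes to push images at line rate. This is back-of-the-envelope arithmetic only; the 10 GB image size and the 20 simultaneous downloads are hypothetical, and real transfers will be slower than line rate.

```python
def transfer_seconds(image_gb, link_gbps=1.0, efficiency=1.0):
    """Seconds to move an image over a link.

    image_gb uses decimal gigabytes; efficiency is the fraction of the
    link's nominal rate actually achieved (1.0 = ideal line rate).
    """
    bits = image_gb * 8 * 10**9
    return bits / (link_gbps * 10**9 * efficiency)


IMAGE_GB = 10        # hypothetical image size
CONCURRENT_HOSTS = 20  # hypothetical: hosts pulling the image at once

one = transfer_seconds(IMAGE_GB)
print(f"One host:  {one:.0f} s")
print(f"{CONCURRENT_HOSTS} hosts: {one * CONCURRENT_HOSTS:.0f} s (sharing one 1 Gb/s link)")
```

This is also why host-side image caching matters: each cached image is a transfer that never touches the Glance NIC.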

Storage on the compute nodes, on the other hand, requires a different view. Build for random I/O (RAID 10 or SSD or both).

A few architectural examples and thoughts:

  • For 1–20 servers, a single controller, a single API, and a single network (1 to 2 Gbps) are probably sufficient. If you need HA, you’ll need to increase those numbers appropriately.
  • For 20–100 servers, take the same architecture but add load balancing for APIs (for HA and scalability), use Swift (or CloudFiles or S3) for Glance, consider the use of availability zones, and consider dedicated networks for management, Cinder, VM-to-VM traffic, etc. You might also want to consider a dedicated system for gathering compute node metrics.

Some other general performance considerations:

  • Watch random IO and try to get as much random IO off local storage onto Cinder where possible.
  • Review hypervisor best practices.

A few other “lessons learned”:

  • Floating IPs must be associated with the “public_interface”
  • Each piece of OpenStack has its own architecture.
  • Folsom is stable.
  • Migration (live, block) works but scenarios exist where it doesn’t. Try not to rely on these mechanisms where possible, especially if you’re building an “elastic cloud” as opposed to “traditional virtualization.”
  • OpenStack is changing often, so keep up to date with the current state of the projects.
  • Don’t do heterogeneous nodes.

A few other operational updates:

  • There are some new nova hypervisor calls
  • New image types in Glance (including VMDK)
  • The policy.json file
  • Other things coming in Grizzly: cells, Quantum, and better AD/LDAP support

At this point Ryan opens the session up for Q&A.

