October 2009

Stu over at vInternals posted an article a couple of days ago about a problem he encountered with VMware vSphere and Windows Server 2008. Apparently, there is an unexpected behavior with Windows Server 2008 and VM hardware version 7 that is described in this VMware KB article. Stu, however, was seeing the behavior not on upgrading VMs from VM hardware version 4 to VM hardware version 7, but on new virtual machines created from the beginning with VM hardware version 7.

According to an update on Stu’s article, VMware has acknowledged this as a bug and will be investigating a fix to the problem. Until then, follow Stu’s advice and speak to your VMware account team if you are experiencing this problem. If you are getting ready to proceed with a VMware vSphere upgrade and have Windows Server 2008 Enterprise Edition VMs in place, keep this behavior in mind and plan accordingly.

Thanks to Stu for bringing this matter to light!

UPDATE: Stu posted an update with more information and an explanation for the unexpected behavior, so be sure to check it out.

With the release of VMware vSphere 4 earlier this year, VMware officially introduced VMware Fault Tolerance (VMware FT), a new mechanism for providing extremely high levels of availability to virtual machine workloads. As I’ve talked with customers, I’ve noticed that a growing number of them are unaware of the differences between the types of high availability that VMware provides (in the form of VMware HA and VMware FT) and operating system-level clustering (such as Microsoft Windows Failover Clustering). Although both types of technology are intended to increase availability and reduce downtime, they are very different and offer different types of functionality.

Consider these points:

  • While using VMware HA will protect you against the failure of an ESX/ESXi host, VMware HA won’t—by default—protect you against the failure of the guest operating system. An OS-level cluster, on the other hand, does protect against the failure of the guest operating system. +1 for OS-level clustering.
  • VMware clusters that are using VMware HA can choose to use VM Failure Monitoring and gain some level of protection against the failure of the guest operating system, but you still won’t get protection of the specific application within the guest operating system, unlike an OS-level cluster. +1 for OS-level clustering.
  • These same arguments also apply to VMware FT. VMware FT won’t protect you against guest operating system failure—a crash of the OS in the primary VM generally means a crash of the OS in the secondary VM at the same time—and it won’t protect you against application failure. +1 for OS-level clustering.
  • You can’t fail over between systems using VMware HA or VMware FT in order to perform OS upgrades or apply OS patches. +1 for OS-level clustering.
  • Similarly, you can’t fail over between systems using VMware HA or VMware FT in order to do a rolling upgrade of the application itself. +1 for OS-level clustering.
  • Of course, the VMware technologies do have some advantages. Both VMware HA and VMware FT are far, far simpler to enable and configure than an OS-level cluster. +1 for VMware.
  • Neither VMware HA nor VMware FT requires any application support in order to protect the VM and its workloads. +1 for VMware.
  • Neither VMware HA nor VMware FT requires that you license specific editions of the guest operating system or application in order to be able to use their benefits. +1 for VMware.
  • VMware HA can produce higher levels of utilization within a host cluster than using OS-level clustering. +1 for VMware.
  • VMware FT can provide higher levels of availability than what is available in most OS-level clustering solutions today. +1 for VMware.

This is not a knock against any of the technologies listed (VMware HA, VMware FT, or OS-level clustering) but rather an exploration of their advantages, disadvantages, similarities, and differences. Hopefully, this will help readers who might not be as familiar with these products make a more informed decision about which technologies to deploy in their data center. (Hint: You’ll probably need all of them.)

Fibre Channel over Ethernet (FCoE) is receiving a great deal of attention in the media these days. Fortunately, setting up FCoE on a Nexus 5000 series switch from Cisco isn’t too terribly complicated, so don’t be too concerned about deploying FCoE in your datacenter (assuming it makes sense for your organization). Configuring FCoE basically consists of three major steps:

  1. Enable FCoE on the switch.
  2. Map a VSAN for FCoE traffic onto a VLAN.
  3. Create virtual Fibre Channel interfaces to carry the FCoE traffic.

The first step is incredibly easy. To enable FCoE on the switch, just use this command:

switch(config)# feature fcoe

The next part of the FCoE configuration is mapping a VSAN to a VLAN. What VSAN should you use? Well, if you are connecting to an existing Fibre Channel fabric, perhaps on a Cisco MDS switch, you’ll need to make sure that the VSANs between the Nexus and the MDS are appropriately matched. Otherwise, traffic on one VSAN on the Nexus won’t be able to reach devices on another VSAN on the MDS. If there’s enough demand, I’ll post a quick piece on this step as well.

Note that this FCoE VSAN-to-VLAN mapping is a required step; if you don’t do this, the FCoE side of the interfaces won’t come up (as you’ll see later in this post). Assuming the VSAN is already defined, perform these steps to map the VSAN to a VLAN:

switch(config)# vlan XXX
switch(config-vlan)# fcoe vsan YYY
switch(config-vlan)# exit

Obviously, you’ll want to replace XXX and YYY with the correct VLAN and VSAN numbers, respectively.

After you’ve enabled FCoE and mapped FCoE VSANs onto VLANs, then you are ready to create virtual Fibre Channel (vfc) interfaces. Each physical Nexus port that will carry FCoE traffic must have a corresponding vfc interface. Generally, you will want to create the vfc interface with the same number as the physical interface, although as far as I know you are not required to do so. It just makes management of the interfaces easier. The commands to create a vfc interface look like this:

switch(config)# interface vfc ZZ
switch(config-if)# bind interface ethernet 1/ZZ
switch(config-if)# no shutdown
switch(config-if)# exit

At this point the vfc interface is created, but it won’t work yet; you’ll need to place it into a VSAN that is mapped to an FCoE-enabled VLAN. If you don’t, the show interface vfc <number> command will report this:

vfc13 is down (VSAN not mapped to an FCoE enabled VLAN)

As I mentioned earlier, if you haven’t mapped the FCoE VSAN onto a VLAN, you won’t be able to fix this problem. If you have mapped the FCoE VSAN onto a VLAN, then you only need to assign the vfc interface to the appropriate VSAN with these commands:

switch(config)# vsan database
switch(config-vsan-db)# vsan <number> interface vfc <number>
switch(config-vsan-db)# exit

At this point, the vfc interface will report up, and you should be able to see the host’s connection information with the show flogi database command.
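If you have more than a handful of ports to configure, it can help to generate the command sequence rather than type it each time. Here is a minimal Python sketch that simply strings together the commands shown above for a single port. The function name and the VLAN, VSAN, and port numbers are hypothetical placeholders; the sketch only prints text and does not talk to a switch.

```python
from typing import List


def fcoe_config(vlan: int, vsan: int, port: int, slot: int = 1) -> List[str]:
    """Return the commands from this post, in order, for one FCoE port.

    Assumes the VSAN already exists and that the vfc interface uses the
    same number as the physical port (a convention, not a requirement).
    """
    return [
        "feature fcoe",                            # step 1: enable FCoE
        f"vlan {vlan}",                            # step 2: map the VSAN onto a VLAN
        f"fcoe vsan {vsan}",
        "exit",
        f"interface vfc {port}",                   # step 3: create the vfc interface
        f"bind interface ethernet {slot}/{port}",
        "no shutdown",
        "exit",
        "vsan database",                           # place the vfc into the FCoE VSAN
        f"vsan {vsan} interface vfc {port}",
        "exit",
    ]


if __name__ == "__main__":
    # Hypothetical values: VLAN 100, VSAN 100, physical port ethernet 1/13
    print("\n".join(fcoe_config(vlan=100, vsan=100, port=13)))
```

Review the generated lines before pasting them into a config session; the ordering matters, since the vfc interface stays down until its VSAN is mapped to an FCoE-enabled VLAN.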

From this point—assuming that your storage is attached to a traditional Fibre Channel fabric, which is likely to be the case in the near future—you only need to create zones with the WWNs of the FCoE-attached hosts in order to grant them access to the storage. Refer to my posts on creating zones and managing zones on a Cisco MDS for more information on this task.

In my own experience, once FCoE was properly configured on the Nexus 5000 switch, creating zones and zonesets on the Cisco MDS Fibre Channel switch and creating and masking LUNs on the Fibre Channel-attached storage was very straightforward. This, as has been stated on several previous occasions, is one of the strengths of FCoE: its compatibility with existing Fibre Channel installations is outstanding.

Feel free to submit any questions or clarifications in the comments below.

How To Create a MetaLUN

MetaLUNs are a way of expanding a LUN for either additional I/O capacity (using a striped MetaLUN) or additional space (using a concatenated MetaLUN). A striped MetaLUN, as the name implies, stripes data across multiple component LUNs. Each of these component LUNs resides on a different RAID group, so creating a striped MetaLUN allows the MetaLUN to utilize all the spindles in all the RAID groups. A concatenated MetaLUN, on the other hand, fills up one component LUN before moving on to the next component LUN. I/O capacity is essentially unchanged, but storage capacity is expanded.
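To make the difference concrete, here is a toy Python model of where a logical block lands in each layout. This is only an illustration of the striping and concatenation ideas described above, not a description of how FLARE actually lays out data; the function names, the 128-block stripe element, and the 1000-block component size are all made-up values.

```python
def concatenated(block, component_blocks):
    """Fill one component LUN completely before moving on to the next.

    Returns (component index, block offset within that component).
    """
    return block // component_blocks, block % component_blocks


def striped(block, stripe_blocks, components):
    """Rotate across the component LUNs one stripe element at a time.

    Returns (component index, block offset within that component).
    """
    stripe_index, offset = divmod(block, stripe_blocks)
    component = stripe_index % components   # which component LUN gets this element
    row = stripe_index // components        # how many full rotations came before
    return component, row * stripe_blocks + offset


if __name__ == "__main__":
    # Hypothetical layout: 3 component LUNs, 1000 blocks each, 128-block elements
    for block in (0, 128, 256, 384):
        print(block, "striped ->", striped(block, 128, 3),
              "concatenated ->", concatenated(block, 1000))
```

In this model the four sample blocks rotate across all three components in the striped case, while in the concatenated case they all sit on the first component, which is why concatenation adds space but not I/O capacity.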

This article has some good information on MetaLUNs.

While trying to learn more about MetaLUNs, I searched high and low for a “how to” guide on creating MetaLUNs. Surprisingly enough, I didn’t find anything. So, here’s my take on how to create a MetaLUN. EMC experts, feel free to refine my process, correct my errors, and provide general guidance.

The equipment used in my experimentation is a CLARiiON CX4-960 running FLARE 28 (as best I can tell). I performed this task using Navisphere 6 running under Internet Explorer 6 on Windows Server 2003.

OK, ready? Here we go:

  1. Create the necessary RAID groups to house multiple LUNs. Based on my limited experience thus far, it seems like RAID 5 (4+1) is the most common configuration for a RAID group unless you know you need something else.
  2. Create LUNs with the exact same size and settings in each of the RAID groups you created. For maximum performance, ensure that the LUNs are created on separate buses. I used LUNs on three separate RAID groups: one on Bus 0, one on Bus 1, and one on Bus 2.
  3. Right-click on the first LUN you created and select Expand.
  4. Click Next to start the wizard.
  5. Select Striping and click Next.
  6. Select each of the LUNs. Hold down the Control key to select more than one LUN. Click Next when you have selected all your component LUNs.
  7. Make sure Maximum Capacity is selected and click Next.
  8. Click Finish.

When the wizard is done processing, the LUNs you created in the RAID groups will be moved into the Private LUNs container, and a new MetaLUN object will appear in Navisphere (under LUN Folders > MetaLUNs). You can then present this MetaLUN out to one or more servers in the same way you would present any other LUN object.

If I’ve misrepresented something or have provided incorrect information, please let me know by speaking up in the comments. I’d also love to hear any recommendations from EMC experts on the use of MetaLUNs, advantages and disadvantages of using them, etc. Share your knowledge!

Yesterday I published a short post titled “I/O Virtualization and the Double-Edged Sword”. In that post, I discussed how Xsigo was criticizing FCoE for “not going far enough” in the realm of I/O virtualization. Unfortunately, I didn’t do a very good job of really getting my point across, because the discussion rapidly turned into a discussion of the merits of various interconnect technologies and why one might win over the other. While that is a great discussion to have—and I’m thrilled my site can help further that discussion—it wasn’t really the key point behind my article. I/O virtualization was only the catalyst to prompt the original post.

Let me see if I can more clearly articulate what I’m trying to say here. If you are a Twitter user and into virtualization or storage, then you probably are following either Chad Sakac of EMC (@sakacc on Twitter), Vaughn Stewart of NetApp (@vaughn_stewart on Twitter), or both. That being the case, you are probably very familiar with the extensive “discussions” that take place between the two of them. Both of them are very passionate about storage and virtualization, but they have differing viewpoints. Now, before I’m accused by NetApp of being an EMC bigot (which would be ridiculous given the coverage I’ve given NetApp) or accused by EMC of being a NetApp bigot (that, at least, might be understandable as I’m just now starting to learn EMC storage), let me say that I’m not endorsing either product. NetApp’s products and EMC’s products are different; each of them has strengths and weaknesses in different areas.

Now, ask yourself, “Why do these products have different strengths and weaknesses?” Do you know the answer? These products have different strengths and weaknesses because of the technology decisions each company chose to make in the products’ development. NetApp chose one path, EMC chose another. For NetApp, that has created certain efficiencies and certain strengths, along with corresponding weaknesses. Likewise, EMC’s technology decisions have resulted in their products having certain strengths and weaknesses. Neither of these products is perfect. For NetApp to claim that “their way is the right way” is ridiculous; their way is only one of many different ways to accomplish something. The same is true for EMC. And, by extension, the same is true for every other technology vendor on the planet.

You want more examples? Consider the architectural differences between VMware ESX/ESXi and Microsoft Hyper-V. The technology choices made by each company created inherent strengths and weaknesses in each product. VMware claims their choices are the best choices; Microsoft believes their architecture is the best. Clearly, neither product is perfect. Both products have their flaws.

The real key takeaway here is that no technology vendor has the right to throw rocks at another technology vendor. All technology vendors live in glass houses. For VMware to claim that Microsoft’s architecture is all wrong is, well, wrong. For EMC to say that NetApp’s technology choices are stupid would be wrong. For Xsigo to claim that FCoE is the wrong path for I/O virtualization is wrong (although, personally, I don’t consider FCoE an I/O virtualization technology, but that’s a different discussion for a different day). Why? Because every company has to make technology choices, and those technology choices will—by the very nature of technology—automatically create inherent differences, strengths, and weaknesses in the resulting product. And when you accept that truth (and it is a truth, I promise you), then you see why vendors should not engage in negative marketing. When a vendor engages in negative marketing about the competition, that vendor is simply inviting others to pick apart the flaws in their own products.

Of course, I’m not naive enough to believe that vendors will stop negative competitive marketing overnight. Still, I stand firm in the belief that those vendors that focus on the strengths of their products instead of the flaws of others’ products will move ahead. I’m certainly more likely to do business with them.

I’d be interested to hear what others have to say. Voice your position in the comments.

Disclosure: As you probably know, I work for a reseller who represents many different vendors and manufacturers. My words here are not endorsed by my employer, nor do I represent my employer in this area.

I recently came across this blog entry over at Xsigo’s new corporate blog, I/O Unplugged. A key phrase in this blog entry really caught my eye:

The reality is that FCoE solves neither the complexity nor the management problems. It is a minor change to the status quo when a major leap forward comparable to server virtualization is needed for I/O.

At first glance, I’d say they are right. FCoE was designed from the ground up to be completely compatible with Fibre Channel, and that’s one of its key strengths. Yes, Xsigo’s InfiniBand-based solution is a very different architecture, and the set of capabilities provided by the Xsigo I/O Director is very different from the capabilities enabled by an FCoE solution such as the Cisco Nexus 5000 (or the newly-announced Nexus 4000). I wouldn’t necessarily disagree that Xsigo’s solution might offer some benefits over FCoE. I would strongly contend, however, that FCoE does offer some benefits over InfiniBand-based I/O virtualization solutions.

See, every technology decision is a double-edged sword. Xsigo “breaks the mold” by using a new architecture based on InfiniBand, but this decision comes at the cost of compatibility. Cisco chooses to go with a “less innovative” solution, but gains the benefit of broad compatibility with a large installed base. There is no one solution that offers all advantages and no disadvantages. That being said, which is more important to you and your company: innovation or compatibility? These are the sorts of questions you need to ask when evaluating solutions.

What do you think? Feel free to post your thoughts below. Vendors, please be sure to disclose your affiliation. And, in the spirit of full disclosure, keep in mind that my employer is a Cisco partner, but I have worked with both Xsigo and Cisco solutions. The thoughts I post here do not reflect the thoughts or views of my employer.

Toward the end of August 2009, I posted an article on how to configure Cisco MDS zones via the command-line interface (CLI). This article is a follow-up to that article; in this post, I’ll review some commands that are helpful in managing those zones.

As with the first post, this post probably won’t be very helpful to users who are well-versed with the Cisco MDS family of Fibre Channel switches, which is why I’ve tagged it as a “new user’s” post. Similarly, I’m not going into the need for zones, as that is covered amply elsewhere.

First, I find it extremely handy to be able to rename Fibre Channel aliases using the fcalias rename command like this:

switch(config)# fcalias rename <old alias> <new alias> vsan XXX

You can also rename zones:

switch(config)# zone rename <old zone name> <new zone name> vsan XXX

And you can rename zonesets:

switch(config)# zoneset rename <old zoneset name> <new zoneset name> vsan XXX

In my earlier article I talked about the zoneset clone command, but you can also clone aliases and individual zones. I’m not yet convinced of the value of being able to clone an individual alias, and if you are using single initiator/single target zoning I’m not 100% sure how helpful it will be to clone a specific zone. Still, the functionality is there if you need it.

Adding a new alias, zone, or zoneset is similar to modifying an existing alias, zone, or zoneset. For example, to add a new alias to an existing zone, you would use these commands:

switch(config)# zone name existing-zone-name-here vsan XXX
switch(config-zone)# member fcalias new-alias-to-add
switch(config-zone)# exit

Likewise, adding a new zone to an existing zoneset is similar to defining a new zoneset:

switch(config)# zoneset name existing-zoneset-name vsan XXX
switch(config-zoneset)# member new-zone-to-add
switch(config-zoneset)# member another-new-zone
switch(config-zoneset)# exit

Managing zones via the CLI can be a bit daunting; as the number of aliases and zones increases, it becomes more difficult to work with all of them and find only the ones in which you are interested at the moment. Here, using the include keyword can be rather handy. Consider this command:

switch# show zone | include server-name
zone name server-name-storage vsan XXX
  fcalias name server-name vsan XXX
zone name server-name-storage2 vsan XXX
  fcalias name server-name vsan XXX

Every line returned contains the search text, so you can see that the include keyword acts a bit like grep. This makes it much easier to filter out only the zones you want or need to see, instead of having to wade through all the currently defined zones. This is not an MDS-specific trick; it’s also applicable in IOS and NX-OS. And it works not only with zones, but also with zonesets, FC aliases, etc.
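If you have saved the output of show zone to a file and want the same kind of filtering off the switch, the idea is easy to reproduce. The Python sketch below is a simplified stand-in for the include keyword: it does a plain substring match (the switch may accept richer patterns), and the function name, zone names, and VSAN number in the sample text are hypothetical.

```python
from typing import List


def include(output: str, pattern: str) -> List[str]:
    """Keep only the lines that contain pattern, similar to `| include`."""
    return [line for line in output.splitlines() if pattern in line]


# Hypothetical captured output from `show zone`
SAMPLE = """\
zone name server-name-storage vsan 100
  fcalias name server-name vsan 100
zone name other-host-storage vsan 100
  fcalias name other-host vsan 100"""

if __name__ == "__main__":
    for line in include(SAMPLE, "server-name"):
        print(line)
```

As with the switch command, the filter works line by line, so a matching alias line appears without its parent zone line unless that line matches too.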

Cisco MDS experts, feel free to post additional suggestions on managing zones via the CLI in the comments below so that all readers can benefit. Thanks for reading!

By Aaron Delp
Twitter: aarondelp
FriendFeed (Delicious, Twitter, & all my blogs in one spot): aarondelp

I have done a number of VMware Lab Manager white boarding sessions and I want to share a few of my design notes and the reason for each. Most of the items come from my installation experience and the version 4 release notes. Here they are in no particular order.

  • You need an LDAP server to import groups – Yes, you can set up user IDs in Lab Manager but you CAN NOT create groups.  Groups must be imported into the LM Server from an LDAP server.  This is critical if you intend to do any kind of restrictions around lease durations of configurations or storage pools.
  • You need fully qualified name resolution (with functioning reverse look-up) between all clients, ESX/vSphere servers, the LM Server, and the vCenter Server - The clients need the ESX/vSphere servers because if this isn’t in place, remote control of the virtual machines will not function (you get a black screen).  You also need DNS entries for the LM server because if you implement LiveLink functionality, LiveLink is hardcoded to the LM server name.  Lastly, you need the vCenter Server for behind the scenes communication of the LM environment.
  • Workflow and Disk Chains will be the key to success or failure of your project – The VMware documentation does a great job of describing how to do things. But, the documentation falls on its face when it comes to describing WHY you should do things.  The behind the scenes architecture must be planned out very specifically for Lab Manager to perform as you would expect. I will be covering Work Flow and Disk Chains in a future article.
  • Lab Manager version 4 Host Spanning requires an Enterprise Plus subscription because the VMware Distributed Switch technology is required – In the previous version of Lab Manager, VMware HA, DRS, and VMotion were not supported if you set up a fenced (isolated by a NAT router) configuration. The configurations (a configuration is a group of VMs in Lab Manager) were pinned to an ESX server at time of creation and stayed with that server until destruction. LM version 4 gets around this by using the Distributed Switch to span hosts. Some people will want this feature, but in my experience some will not want to pay the extra dollars just to get this one feature. Also, be aware that the Cisco 1000V isn’t supported with Lab Manager.
  • You will need to monitor the number of configurations per server if you do not use Host Spanning – Lab Manager deploys new configurations to the ESX/vSphere servers using round robin. A configuration is removed from a host when it is undeployed. It is possible that the workload in your cluster will become out of balance because certain machines “live” longer than others. Take this example: you have two vSphere servers and ten configurations, each with only one VM to keep it easy. The configurations will be split between the servers for a total of five per server. A user removes the configurations on the first server but leaves the configurations on the second server. Now another ten configurations are deployed. The new ten configurations will again be deployed five each. You now have one server with five and one server with ten. Over time your load will become unbalanced.
  • Lab Manager 4 can’t use vSphere’s thin provisioning for disks – Lab Manager uses the concept of Linked Clones for copies of virtual machines.  The first one in the chain is “thick”, the rest of the machines are delta disks of the first one.  This is a technology that is independent and different from both vSphere and VMware View.
  • I haven’t had a chance to test this one yet, but it appears VMware is now supporting and suggesting that you run the Lab Manager 4 server in a virtual machine. Be careful with this one! Where are you going to install it? Are you going to install it on the same cluster and ESX servers you are managing? This will create a circular dependency that I just don’t like. Same goes for vSphere. Have you ever tried to patch ESX servers using Update Manager when vCenter Server is on that host? I have, by accident, and it doesn’t work! For my lab I have my vCenter and Lab Manager servers in a different cluster (Scott’s vSphere cluster) than my LM vSphere servers. In this configuration the servers are virtual, but they are out of the way. I know with Lab Manager 3 you couldn’t install LM server on a virtual machine if it detected it was hosted on the ESX server you are trying to manage. I’m not sure if version 4 still has this checking feature included at installation.
  • Make sure you back up your Crypto Key that Lab Manager generates during the installation –  See the Manual for more information.  You will need it if you run into big issues and need to reinstall Lab Manager.
  • VMware Snapshots are not supported because Lab Manager has a Snapshot feature built-in – Lab Manager allows the user to take one (and only one!) snapshot of a configuration.
  • Increase the Service Console memory on ESX/vSphere servers from 272MB to 800MB – This will help with the overhead of Lab Manager on the servers.
  • VMware Fault Tolerance (FT) and Distributed Power Management (DPM) are not supported with Lab Manager.
  • Storage VMotion and VMware VCB are not supported with Lab Manager.
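The round-robin imbalance described in the list above is easy to reproduce on paper. Here is a small Python simulation of that exact scenario, assuming (as the example does) that placement simply alternates between hosts and ignores how many configurations each host is already running. The host names are hypothetical, and this models only the counting, not Lab Manager itself.

```python
from itertools import cycle


def simulate():
    """Replay the two-host example: deploy 10, undeploy host 1, deploy 10 more."""
    hosts = {"esx01": 0, "esx02": 0}      # hypothetical host names
    rotation = cycle(list(hosts))         # round robin ignores current load

    def deploy(count):
        for _ in range(count):
            hosts[next(rotation)] += 1

    deploy(10)            # five configurations land on each host
    hosts["esx01"] = 0    # a user undeploys everything on the first host
    deploy(10)            # the next ten are again split five and five
    return hosts


if __name__ == "__main__":
    print(simulate())
```

Running it leaves the first host with five configurations and the second with ten, matching the example, which is why the per-host counts are worth monitoring when Host Spanning is not in use.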

If you have any other suggestions or design considerations, please let me know!

Welcome to Virtualization Short Take #30, my irregularly posted collection of links and thoughts on virtualization. I hope you find something useful here!

  • I believe Jason Boche already mentioned this on his own blog (I couldn’t find a link) and also started this VMware Communities thread discussing the fact that the 8/6 patch breaks FT compatibility between ESX and ESXi hosts in the same cluster. This VMware KB article is now available with more information on the problem. What I’m hearing from VMware is that there is no short-term solution; the workaround is to use only ESX or only ESXi within a single cluster. (I don’t recommend leaving the hosts unpatched until the problem is fixed.)
  • And while we’re talking VMware FT, here’s a good document on VMware FT architecture and performance. (Eric Siebert’s Virtualization Pro blog post about VMware FT is really good, too.)
  • I’m also hearing reports that there are problems mixing ESX and ESXi in the same cluster when using host profiles. Theoretically, you should be able to use an ESX reference host and apply that to ESXi hosts, but in reality it’s not working so well.
  • If you’re using AppSpeed, you’ll need to manually turn off the AppSpeed sensor VMs in order to put ESX/ESXi hosts into Maintenance Mode. The sensor VM won’t VMotion off the host, so this prevents the host from entering Maintenance Mode.
  • Here’s another topic that I think has been mentioned elsewhere (looks like Duncan mentions it here), but SRM 1.0 Update 1 Patch 4 was released a couple of weeks ago and it includes a fix for customizing the IP addresses of Windows Server 2008 guest operating system instances.
  • Toward the end of August, VMware Infrastructure 3 support was added for NetApp MetroCluster (see this VMware KB article). Now, how about some VMware vSphere 4 support?
  • Most of you are aware by now (and if you aren’t aware, go buy a copy of my book so you will be aware) that you can use Storage VMotion to change virtual disks from thin provisioned to thick provisioned. The problem is this: the type of thick provisioned disk created when you do this via Storage VMotion is eagerzeroedthick, not zeroedthick. This means that it is not friendly to storage array thin provisioning!
  • I’m still looking for a valid use case for this little trick, but it’s mentioned by both Duncan and Eric: the ability to present multiple cores per socket to a virtual machine. Duncan’s post is here; Eric’s post is here. As Eric points out, licensing is one potential use. Anyone have any other valid use cases?
  • Eric Sloof has a great post on dvSwitch caveats and best practices that is definitely worth reading.
  • Want to make linked clones work on vSphere? Tom Howarth points out in this post some information made available by William Lam. Both articles are worth a look.
  • Tom also posted some useful information on enabling firewall logging on VMware ESX hosts.
  • This post over on Aaron Sweemer’s blog was actually written by guest author John Blessing (aka @vTrooper on Twitter) and just goes to illustrate how difficult it can be to create a chargeback model.
  • Of course, the “Super iSCSI Friends” recently produced a multi-vendor post on using iSCSI with VMware vSphere, a great follow-up to the original multi-vendor VI3 post. Here’s Chad’s version of the multi-vendor vSphere and iSCSI post.

That wraps it up for this time around. Thanks for reading, and feel free to submit any other useful or interesting links in the comments below.
