March 2010


It’s that time again: time for another Virtualization Short Take! My Virtualization Short Takes are quick glances at various news bytes, announcements, useful blog posts, or other items of interest. (By the way, the “short” in “Short Take” does not imply that my post is going to be short, in case anyone was wondering. I’m still long-winded, and I have a lot of things that I find interesting.)

  • Have I mentioned how useful the weekly VMware KB digests are?
  • Frank Denneman has published a couple of really great articles recently. The first discussed how to remove an orphaned Nexus 1000V distributed virtual switch; the second discusses a complex interaction between HP Continuous Access and LUN balancing scripts. Both articles are worth a read.
  • Similarly, Jeremy Waldrop has had a couple of good posts since he managed to get his hands on a Cisco UCS. The first post describes a “Doh!” moment when Jeremy realizes that adding more vNICs to a VMware ESXi instance with the Cisco VIC (aka “Palo”; sorry, Cisco, you’re not going to be escaping that code name any time soon) is really just a matter of specifying them in the service profile. I can certainly see where that’s not immediately intuitive. The other article describes Jeremy’s experience with using vNIC failover. There’s great information in the comments to that article; in particular, be sure not to enable vNIC failover with VMware vSphere. Bad things happen as a result. (OK, maybe not “bad things,” but network connectivity can be adversely affected. You should let VMware vSphere handle the NIC teaming and failover.)
  • Toni Westbrook has a good article on how to move the COS VMDK in VMware ESX 4.0. Key note: this solution is currently unsupported by VMware, so use at your own risk.
  • I’ve mentioned before how various bloggers often have a “masterpiece” post. This isn’t necessarily their most well-written post, but it’s the post that, for whatever reason, is a defining post for them. For me, it’s the ESX/VLANs/NIC teaming article I wrote in 2006. I think Jason Boche might have just come up with his: an in-depth discussion of the vpxd.cfg configuration options. Great information, Jason!
  • In VDI environments, storage capacity is only one aspect of the overall storage equation. Vijay Swami at vEverything takes a pretty balanced view of how two leading storage vendors—EMC and NetApp—address not only storage capacity, but also IOPS. It’s worth a read and again underscores that there is no “one right way” to do things. Different doesn’t necessarily mean better or worse, just different. It’s all about the technology choices. (Disclosure: I work for EMC Corporation.)
  • VDI on local disks, anyone? It’s an interesting discussion point that has its pros and cons. I guess the value of this sort of design really depends upon the business objectives the VDI implementation is trying to fulfill.
  • Is anyone else amused by the abrupt “about face” that Microsoft performed with Hyper-V’s dynamic memory feature? Wow…even I was caught off-guard by how quickly they went from one end of the spectrum to the other. I would rather hear someone say, “You know, we were wrong, and this is a valuable feature after all” than to just flip 180 degrees and start moving in a whole new direction.
  • Speaking of Microsoft and whole new directions…there was a great deal of coverage about Microsoft’s desktop virtualization announcement. I won’t try to delve into the details here; that’s a particular niche that is better served by those who have the time to dedicate to it. If you haven’t seen the news, my good friend Alessandro has a great write-up and there’s the official press release from Microsoft.
  • If you’re interested in getting more information on RemoteFX (which appears, more than anything else, to simply be a set of LAN-only acceleration features for RDP and not an entirely new protocol), this article has good information. You might also have a look at this post about Service Pack 1 for Windows Server 2008 R2, which will enable both RemoteFX and the aforementioned Dynamic Memory.
  • Continuing along with my little BSD love-fest, I came across this article that describes some strange behavior with CARP that can only be fixed by using link aggregation. The geek in me wants to go test this in a bunch of different scenarios to see if the Nexus 1000V fixes it or something like that, but I doubt that I’ll have the time.
  • This is old news now, but in case you hadn’t heard, VMware is licensing technology from Likewise Software for use with the next version of VMware vSphere. This will tighten vSphere’s integration with Active Directory. This is generally good, except that it will render my articles on ESX integration with Active Directory useless.
  • With VMware vSphere 4.0 Update 1, you can now install EMC PowerPath/VE using vCenter Update Manager. This VMware KB article provides the details on how it’s done.
  • If you’re using ESXi and want to direct logging data elsewhere via syslog, this VMware KB article describes how to configure syslog in ESXi.
  • The ages-old discussion of scale up vs. scale out is revisited in this blog post. I guess the key takeaway for me is the reminder that while VMware HA does restart workloads automatically, there’s still an outage. If you’re running 50 VMs on a host, you’re still going to have an outage across as many as 50 different applications within your organization. That’s not a trivial event. I think a lot of people gloss over that detail. VMware HA helps, but it’s not the ultimate solution to downtime that people sometimes portray it as.
  • PHD Virtual has released esXpress version 4.0 today. I’ve taken a step back from most product announcements simply because they come too quickly to really keep up with them (unless you’re a madman like David Marshall over at VMBlog.com—my hat’s off to you, David!), but the timing worked out for this one. Go have a look at PHD Virtual’s web site for all the details.
  • Last, but most certainly not least, my esteemed colleague Mike Laverick has completed his updated VMware SRM book, now updated for VMware SRM 4.0. Great work, Mike! I would wish you all financial success with the book, but as you’re giving it away for free (an admirable step, by the way) I guess I’ll just have to wish you all other forms of success!
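To put rough numbers on the scale-up versus scale-out item in the list above, here is a minimal Python sketch. Only the 50-VMs-per-host figure comes from that item; the 200-VM estate and the 20-VMs-per-host alternative are made-up values for illustration, and the sketch assumes a single host failure with VMs packed at the stated density.

```python
import math


def hosts_needed(total_vms: int, vms_per_host: int) -> int:
    """Number of hosts required to run total_vms at a given density."""
    return math.ceil(total_vms / vms_per_host)


def vms_hit_by_one_host_failure(total_vms: int, vms_per_host: int) -> int:
    """Worst-case number of VMs that must be restarted when one host fails."""
    return min(total_vms, vms_per_host)


if __name__ == "__main__":
    total = 200  # hypothetical estate size, not from the post
    for density in (50, 20):  # 50 is from the post; 20 is a made-up comparison
        print(
            f"{density} VMs/host: {hosts_needed(total, density)} hosts, "
            f"up to {vms_hit_by_one_host_failure(total, density)} VMs "
            "restarted after a single host failure"
        )
```

Either way the restart is automatic, but the denser design concentrates more simultaneous application outages into a single event.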

That does it for me this time around, folks. Thanks for reading (I appreciate it!), and if you have some good information to add please feel free to speak up in the comments.


Last year, I wrote a piece about multi-hop Fibre Channel over Ethernet (FCoE) and some of the various reasons why—at the time—multi-hop FCoE was not a practical reality. Some things have changed since I last discussed multi-hop FCoE, and today I’d like to take a quick look at the interaction between network interface virtualization (NIV) and FCoE and see how this plays into multi-hop FCoE.

If you haven’t already read my recent article on understanding NIV, go read it now. Likewise, if you haven’t read the multi-hop FCoE article, you should go read it now too. Believe me, the rest of this article will make a lot more sense if you do.

Done now? Good. Let’s get started.

From my article on multi-hop FCoE, I identified two key roadblocks to multi-hop FCoE support:

  1. First, there was a lack of widespread support for FCoE Initialization Protocol (FIP). FIP allows an FCoE initiator to be separated from a Fibre Channel forwarder (FCF) by one or more IEEE DCB-capable switches. This has since been addressed by updates to NX-OS (the code that runs on Cisco’s Nexus 5000 switches) and by Generation 2 converged network adapters (CNAs); both of these components now have FIP support included.
  2. Second, there was no defined standard for creating the FCoE equivalent of trunking E_ports (which I believe are referred to as VE_ports). VE_ports are necessary to link multiple FCFs together, much in the same way you would use an inter-switch link (ISL) in traditional Fibre Channel storage area networks (SANs). To my knowledge, this issue has not yet been addressed.

At first glance, NIV doesn’t seem to help with either of these roadblocks. When you take a deeper look, though, you’ll see that NIV can actually serve as a workaround for both problems. Here’s how.

In NIV, recall that you have both an interface virtualization (IV)-capable bridge (or switch) as well as one or more interface virtualizers (IVs). (Remember that IVs are also referred to as fabric extenders.) Network interface cards (NICs) connect to ports on the IVs, and the IVs uplink to the IV-capable bridge. Even though the IV-capable bridge and the IVs are physically separate devices joined by a connection (typically an Ethernet connection), the combination appears and functions as a single device.

With this in mind, then, I’ll ask this question: what is a multi-hop topology? If a multi-hop topology is multiple physical devices connected over an Ethernet uplink, then multi-hop FCoE is possible today with an FCoE-enabled IV-capable bridge and one or more FCoE-enabled IVs. In fact, this is the topology that Cisco uses in its Unified Computing System (UCS): a pair of FCoE-enabled IV-capable bridges (the UCS 6100XP fabric interconnects) connected to one or more FCoE-enabled IVs (the I/O Modules in the back of the chassis).

Applying this line of thinking to our roadblocks above, we see that the use of NIV allows for greater port densities; greater port density is one of the primary reasons why users would want FCoE initiators separated from an FCF by an IEEE DCB-capable switch. In addition to leveraging FIP (and eventually leveraging the IEEE DCB standards once they are finalized), you can build the same sort of topology using an IV-capable bridge and one or more IVs.

Similarly, using NIV as a way of connecting multiple devices together eliminates the need to chain multiple FCFs together; this, in turn, eliminates the need for the FCoE equivalent of ISLs and the need to create VE_ports between the FCFs. So NIV helps to address the second roadblock as well.

Of course, NIV isn’t the only way the industry is going to address the need for multi-hop FCoE. Further—to my knowledge, at least—NIV is a Cisco-only approach. As FCoE continues to mature and the IEEE DCB standards are finalized and ratified, then organizations can leverage a standards-based approach to building more complex FCoE topologies than are currently possible today.

Courteous comments, corrections, and thoughts (with full vendor disclosure, please) are welcome below.


I’ll start this off with a disclaimer: this post is really more for my own benefit than the benefit of anyone else.

OpenBSD is my OS of choice when it comes to setting up a quick, simple UNIX-based virtual machine (VM). Need a virtual firewall? Use OpenBSD. Need a router? Use OpenBSD. Need a web server or an FTP server? Use OpenBSD. Need to run some network security tools? Use OpenBSD.

The problem is this: once I get an OpenBSD system up and running, it runs so well that I rarely have to go set up another one. Because there is then this length of time between installations, I always find myself forgetting the steps to take when installing an OpenBSD system. Thus, the need for this post and why I say it’s really for my benefit more than anything else. Next time I need to install OpenBSD in a VM for some reason, I can quickly come back and reference my list. (I will say that the installation of OpenBSD in recent versions has gotten much simpler than it was in the past.)

Oh, another disclaimer is probably necessary here, too: this is not to be considered some sort of “best practices” guide, so please don’t hammer the comments with stuff like “You know, you really should…”. This is just a quick and simple setup.

With those disclaimers out of the way, here’s the installation procedure. This was written for use with OpenBSD 4.6:

  1. Boot from the OpenBSD installation ISO image. When prompted, choose “i” to install.
  2. Press Enter for the default keyboard layout (unless you need a different layout, naturally).
  3. Enter the system’s hostname in short form.
  4. Enter the name of the network interface to configure. When installing on VMware Fusion 3.0.2 on my Macintosh, the default interface is em0. On VMware vSphere 4, the default interface is vic0.
  5. Enter the IPv4 address or press Enter to use DHCP.
  6. Enter the IPv6 address or press Enter to not assign an IPv6 address.
  7. Press Enter to complete the configuration of network interfaces.
  8. Press Enter not to perform any manual network configuration.
  9. Enter and confirm the root password.
  10. Press Enter to start sshd by default.
  11. Press Enter not to start ntpd by default.
  12. Enter “no” to indicate that you will not be running the X Window System.
  13. Press Enter not to change the default console to com0.
  14. Press Enter not to create an additional user. (I generally prefer to create an additional user after installation is complete.)
  15. Press Enter to accept the default disk as the root disk. On my Mac running VMware Fusion 3.0.2, the default disk is wd0.
  16. Press Enter to use the whole disk.
  17. Press Enter to use auto layout of partitions on the disk. (I’m not sure what version of OpenBSD added this feature, but it is quite handy for simple installations.)
  18. Press Enter to use the CD to install the sets. The CD in the VM should be mapped to the ISO image of the OpenBSD 4.6 install CD.
  19. Press Enter to use the default CD (which showed up as cd0 on my system).
  20. Press Enter to use the default path to the sets.
  21. Remove the X Window System sets by entering “-x*” and pressing Enter.
  22. Verify that the X Window System sets (xbase46.tgz, xetc46.tgz, xshare46.tgz, xfont46.tgz, and xserv46.tgz) are unselected, then press Enter to complete set selection. OpenBSD will start installing the sets.
  23. Enter the timezone, such as “US/Eastern”.
  24. Enter reboot to reboot your new OpenBSD VM. You should now be ready to perform final configuration of OpenBSD, such as using pkg_add to install packages or editing rc.conf.local to control what daemons are launched at startup. (Of course, those are tasks for an entirely different blog post.)

That’s it. Again, this is not a best practice/ideal installation. It’s just a “drop dead simple” installation in a VM for when you need to get something done quickly.


In late November 2009, I published a post on understanding NPIV (N_Port ID Virtualization) and NPV (N_Port Virtualization); you can read the full post here. In that post, I described a pair of virtualization technologies—NPV and NPIV—for Fibre Channel storage area networks (SANs). This time around, I’d like to discuss a related technology on the Ethernet networking side called network interface virtualization (NIV). This technology, by the way, is currently undergoing IEEE standardization as part of the 802.1Qbh standard under the name “Bridge Port Extension”.

Before I can describe NIV, I need to define a few key terms:

IV-capable bridge: This is a switch that understands and is aware of interface virtualization (IV). Examples of IV-capable bridges include the Nexus 5000 and the UCS 6100XP fabric interconnects.

Interface virtualizer: An interface virtualizer (IV) is a device that simply extends the reach of an IV-capable bridge. To the outside world, an IV-capable bridge and all of its IVs appear as a single bridge. Examples of IVs include the Nexus 2000 fabric extender and the I/O module (IOM) in the Cisco Unified Computing System (UCS). There are also other forms of IVs, as I’ll explain later in this post.

Link-local tag: Frames entering an IV from a network interface card (NIC) have a link-local tag added to them; this link-local tag denotes the source IV port. Similarly, frames entering an IV-capable bridge have a link-local tag added to them that indicates the path through the IV(s) to the destination IV port. In this way, the link-local tag supplants the MAC address as the primary method of determining to which port (or out which port) a frame should be forwarded. The link-local tag is removed from the frame when it exits an IV (headed into a NIC) or when it exits an IV-capable bridge. (This link-local tag is what Cisco refers to as VNTag.)
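To make the link-local tag definition a bit more concrete, here is a minimal Python sketch of the tag lifecycle. Everything in it is hypothetical: the class names, the single reusable `tag` field, and the MAC-learning step inside the bridge are simplifications chosen for illustration, not a description of VNTag or of any real switch implementation. Flooding for unknown destinations is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Frame:
    src_mac: str
    dst_mac: str
    payload: str
    tag: Optional[int] = None  # link-local tag; None means untagged


class InterfaceVirtualizer:
    """Hypothetical IV: tags frames arriving from NICs, untags frames leaving to NICs."""

    def __init__(self, port_count: int) -> None:
        self.delivered: Dict[int, List[Frame]] = {p: [] for p in range(port_count)}

    def from_nic(self, port: int, frame: Frame) -> Frame:
        # Ingress from a NIC: the tag records the source IV port.
        frame.tag = port
        return frame

    def from_bridge(self, frame: Frame) -> int:
        # Egress toward a NIC: the tag alone selects the port, then it is removed.
        port = frame.tag
        frame.tag = None
        self.delivered[port].append(frame)
        return port


class IVCapableBridge:
    """Hypothetical bridge that learns MAC -> IV port from incoming tags."""

    def __init__(self) -> None:
        self.mac_to_port: Dict[str, int] = {}

    def from_iv(self, frame: Frame) -> Optional[Frame]:
        # Learn where the sender lives from the source tag.
        self.mac_to_port[frame.src_mac] = frame.tag
        dest_port = self.mac_to_port.get(frame.dst_mac)
        if dest_port is None:
            return None  # unknown destination; flooding omitted in this sketch
        # Re-tag so the IV can forward purely on the tag, not the MAC address.
        frame.tag = dest_port
        return frame
```

The point of the sketch is that the IV never consults a MAC table: all forwarding intelligence sits in the bridge, and the IV only adds, reads, and strips tags.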

Now that I’ve defined some NIV terminology, I’d like to explain why NIV is useful. To understand the value of NIV, take a look at the progression or development of data center networking:

  1. Before virtualization, bridges (or switches) were connected to a NIC (sometimes multiple NICs, but usually just one NIC) in a physical server. The relationship between physical NICs, physical switch ports, and MAC addresses is static and easy to determine (generally one NIC with one MAC address connected to one switch port). The bridge hierarchy is reasonably simple.
  2. Then hardware manufacturers introduce the server blade form factor. This introduces another layer of bridges (switches) in the blade chassis themselves and creates a more complex bridge hierarchy. This more complex bridge hierarchy also means more management, as each of these bridges represents a point of management. In order to provide redundancy between the various layers of bridges, multiple connections are necessary; this, in turn, necessitates the use of Spanning Tree Protocol (STP) in order to prevent bridging loops. STP creates active/passive connections, meaning the effective bandwidth of the network is reduced.
  3. Along comes virtualization, which introduces the idea of multiple virtual NICs (vNICs) associated with a single physical NIC. This, in turn, introduces another layer of switching and more points of management, and the bridge hierarchy grows more complex. Because multiple MAC addresses are now associated with a single NIC connected to a single switch port, bridges must now deal with “hairpin forwarding,” in which the switch must retransmit a frame out the same port on which it was received. The relationship between NICs, MAC addresses, and switch ports is now much more complex and very dynamic (due to live migration technologies).
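The “hairpin forwarding” problem in the last step can be sketched in a few lines of Python. This is an illustrative model only: the function name, the `hairpin` flag, and the MAC table layout are assumptions made for the example, and the flooding rule is the usual simplified one (send to every port except the ingress port).

```python
from typing import Dict, Iterable, Set


def egress_ports(
    mac_table: Dict[str, int],
    in_port: int,
    dst_mac: str,
    all_ports: Iterable[int],
    hairpin: bool = False,
) -> Set[int]:
    """Return the set of ports a frame should be transmitted on.

    mac_table maps a learned MAC address to the port it was seen on.
    """
    known_port = mac_table.get(dst_mac)
    if known_port is None:
        # Unknown destination: flood to every port except the ingress port.
        return set(all_ports) - {in_port}
    if known_port == in_port and not hairpin:
        # A classic bridge never sends a frame back out the port it arrived on,
        # so two VMs behind the same physical NIC cannot reach each other this way.
        return set()
    return {known_port}
```

With two VM MAC addresses learned on the same switch port, the default call drops the frame, while `hairpin=True` returns the ingress port itself.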

As the proliferation of virtualization continues, this trend toward increased complexity also continues unabated. How, then, are we supposed to address this? NIV is intended to help address this problem. NIV seeks to remove the complexity from the edge—the NICs and vNICs—and drive that complexity toward the bridges. That is a key underlying principle behind NIV. Look back at the definitions: one characteristic of an IV-capable bridge is that the IV-capable bridge and all of its associated IVs appear to the outside world as a single bridge.

For example, consider a Nexus 2000 fabric extender connected to a Nexus 5000 switch. From both a management perspective as well as a networking perspective, the Nexus 5000+Nexus 2000 combination appears as a single device. The Nexus 2000 is simply an extension of the Nexus 5000 (hence the name “fabric extender”). This is why we say that a Nexus 2000 is an example of an interface virtualizer and why a Nexus 5000 is provided as an example of an IV-capable bridge. Similarly, this is why an IOM in the back of a UCS blade chassis functions as an interface virtualizer; it acts as an integrated part of the UCS 6100XP fabric interconnect. Just as the ports on a Nexus 2000 appear to be part of the Nexus 5000, the ports on a UCS IOM appear to be part of the UCS 6100XP fabric interconnects.

By now it should start making a little bit of sense. IVs allow you to scale to larger port densities, but they keep the complexity away from the edge of the network. If you’re astute, though, you’ll note that the examples I’ve provided so far don’t address the growth of vNICs created by virtualization. The Nexus 2000 and the UCS IOM help address the growth of physical ports, but not virtual ports. Can NIV help address vNICs as well?

So far, the examples I’ve provided of IVs have been IVs that embrace the bridge/switch form factor. There’s no reason, though, that an IV must look or feel like existing bridges or switches. Here’s another type of interface virtualizer that will really blow you away: what about the Cisco Virtual Interface Controller (VIC), aka Palo? Think about it: a VIC is really just an IV built into a UCS server blade. It appears to the outside world as additional ports on the IV-capable bridge (the UCS 6100XP fabric interconnect) to which it is connected. This underscores the flexibility of IVs and also underscores the fact that IVs can be “chained,” just as the VIC (an IV built onto the server blade) is “chained” behind the UCS IOM (an IV in the rear of the UCS chassis). Chained IVs appear as part of the upstream IV-capable bridge to which they are connected.
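The “chaining” idea can be modeled as a simple walk up a parent chain. The Python sketch below is purely illustrative: the `Device` class and `owning_bridge` function are hypothetical names, and the model ignores everything except which upstream device each IV is attached to.

```python
from typing import Optional


class Device:
    """Hypothetical model of either an IV or an IV-capable bridge."""

    def __init__(
        self,
        name: str,
        upstream: Optional["Device"] = None,
        is_bridge: bool = False,
    ) -> None:
        self.name = name
        self.upstream = upstream
        self.is_bridge = is_bridge


def owning_bridge(device: Device) -> Device:
    """Follow the chain of IVs upward until an IV-capable bridge is reached."""
    current = device
    while not current.is_bridge:
        if current.upstream is None:
            raise ValueError(f"{device.name} is not attached to an IV-capable bridge")
        current = current.upstream
    return current


if __name__ == "__main__":
    fabric_interconnect = Device("UCS 6100XP", is_bridge=True)
    iom = Device("UCS IOM", upstream=fabric_interconnect)
    vic = Device("VIC", upstream=iom)
    # Ports on both IVs are treated as ports of the same upstream bridge.
    print(owning_bridge(vic).name, owning_bridge(iom).name)
```

However deep the chain, every IV resolves to the same bridge, which is why the whole assembly can be managed as one device.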

By building the IV into the server itself and leveraging hypervisor bypass (VMDirectPath in VMware vSphere), Cisco can address the growth of vNICs. With VIC as an IV on a VMware vSphere host, the ports that connect directly to a VM appear as ports on the upstream IV-capable bridge (the UCS 6100XP in this case); this erases any distinction between physical NICs on physical servers and virtual NICs on virtual servers. Again, a key component of NIV is that an IV-capable bridge and all associated IVs (regardless of form factor) appear as a single bridge.

Related to NIV is the idea of an Ethernet Host Virtualizer (EHV). EHV describes the behavior when an IV-capable bridge and associated IVs appear to the rest of the network as a single host. Because it appears as a single host, all issues with multiple uplinks and STP now go away. This is why, for example, the uplinks on the UCS 6100XP fabric interconnects are always active-active. I’m planning on delving a bit deeper into EHV in the near future.

I hope that this discussion of NIV has been useful. If you are interested in some additional information, I found some documents that were extremely helpful in solidifying my understanding. These documents (here, here, and here) are very technical, but they are good sources of information nevertheless.

In a future post, I’ll discuss the role of the Nexus 1000V in NIV and how it relates to the other components described here.

Feel free to post any questions, comments, or clarifications below.


This is just a quick post about a potential fix for some timeout issues when using EMC Replication Manager (RM). An e-mail sent to an internal distribution list described a situation in which a user was using RM but was getting an error when trying to take a VMware snapshot. The error reported was a fairly generic error:

Cannot create a quiesced snapshot because the create snapshot operation exceeded the time limit for holding off I/O in the frozen virtual machine.

As it turns out, the problem was actually VSS in the Windows Server 2003-based guest. Since RM leverages VSS, an error with VSS was causing the entire process to fail. The fix was to clean up VSS as described in this Microsoft KB article and then reinstall the VMware Tools. After completing both of those steps, the problem was resolved.

If you are using RM and run into this problem, be sure to double-check to ensure that VSS is working as expected.


It’s been a busy couple of weeks! I was in Vienna, Austria, all last week, and I’m on the US West Coast this week. Even though I’ve been on the go, I’ve still been collecting various virtualization-related posts and tidbits. Here they are for you in Virtualization Short Take #36! I hope you find something useful.

  • You might recall that in early 2008 I wrote about how thin provisioned VMDKs on NFS storage tend to inflate. In a recent post, Chad Sakac pointed out that VMware has addressed this problem, which is caused by the use of the eagerzeroedthick VMDK format instead of the zeroedthick format. The fix requires ESX 3.5 Update 5 and VirtualCenter 2.5 Update 6, plus a configuration change that is outlined in this VMware KB article. Kudos to VMware for fixing the underlying issue instead of just forcing customers to upgrade to vSphere.
  • VMware’s Scott Drummonds provides a bit more information on the memory compression technology previewed by Steve Herrod at Partner Exchange 2010 a few weeks ago. In my opinion, anyone who says that the hypervisor is a commodity isn’t paying attention to the fact that VMware is still innovating in this space.
  • If you perform virtualization assessments using VMware’s Capacity Planner tool, you’ll find Gabe’s Capacity Planner troubleshooting tips helpful.
  • Gabe also published a “wish list” for VMware datastores. If you take a deeper look at what Gabe is really trying to address, though, a great deal of the functionality he’s looking for could be achieved through a combination of policy-based storage tiering and greater integration between VMware and the storage array. Would you really need category labels on VMware datastores if the underlying storage was tiering data effectively based on utilization? Probably not. It might still make sense in some cases, but I think the vast majority of cases would be addressed. I think that you are going to see some very cool innovation in this space over the course of this year.
  • Simon Gallagher recently asked this question: with the move to ESXi, is NFS more useful than VMFS? It appears that a large part of Simon’s argument centers around the speed at which files can be transferred into VMFS using ESXi, and it seems to me that VMware needs to do some optimization there. I’m not knocking NFS—I’ve used it extensively in the past and I have and continue to recommend it to customers where it is appropriate—but I’m not sure that you can build an argument for NFS based on ESXi’s file transfer performance. My friend and former colleague Aaron Delp (whose blog was recently added to Planet V12n; congrats!) points out that fixing VMDK alignment using ESXi could be an issue; now that’s a great point. Even third-party utilities like vOptimizer don’t work with ESXi. In my opinion, these points underscore the need for VMware to concentrate very heavily on ESXi if that is indeed going to be the “platform moving forward”.
  • I came across an interesting VMware KB article while browsing the weekly VMware KB digest for the week ending 2/28/10. The article, which discusses a situation in which VMware HA would fail to configure at 90% completion, describes how some network switches—HP ProCurve 1810G switches with automatic denial-of-service protection enabled and Cisco Catalyst 4948 switches with ICMP rate limiting enabled—can drop packets that are necessary for VMware HA to configure and start correctly.
  • Unfortunately, the latest VMware KB weekly digest (for the week ending 3/6/2010) didn’t include links to the actual articles that were published. Bummer! Still, it’s easy enough to simply look up the articles directly.
  • EMC today released a couple of plug-ins for vCenter Server. The Celerra plug-in for VMware Environments brings Celerra NFS provisioning into the vSphere Client. The Celerra Failback Plug-in for SRM automates failback in VMware SRM environments. The official press release is here, which contains links to more information on the individual plug-ins. (Disclaimer: I work for EMC.)
  • Newly-minted VCDX #029 Frank Denneman posted a good article on using reservations on resource pools to bypass slot sizing. As Frank points out, it’s not a recommended practice necessarily, but it might be warranted depending on customer requirements.
  • Duncan’s recent article on the behavior of CPU and memory reservations is also helpful, especially for those who might not be familiar with the differences between the two types of reservations.
  • Similarly, this guest post on Duncan’s site by VCDX Craig Risinger also helps explain how shares on a resource pool work. This is good information to have if you are unfamiliar with the topic.
  • I’m not a security geek, but I did think that the RSA-Intel-VMware announcement at RSAC 2010 (third-party coverage here) was pretty cool. Security experts, I’d love to hear your thoughts on the matter. What was good about the announcement? What was missing?
  • If you will be working with distributed vSwitches, this post by EMC’s Gregg Robertson might help; it underscores the need to ensure that your environment is being consistently and thoroughly patched and maintained. vCenter Update Manager, anyone?

I do have a few other articles in my “things to read list” that I haven’t yet gotten around to reading:

How to Integrate ThinApp with Quest vWorkspace 7.0 (The Official Quest Software Desktop Virtualization Group Blog)
DRS Resource Distribution Chart
HP Flex-10 versus Nexus 5000 & Nexus 1000V with 10GE passthrough

That’s it for now. I hope that you’ve found something useful here, and—as always—I’d love to hear your thoughts in the comments below.


I was browsing through an EMC technical document titled “EMC CLARiiON Integration with VMware ESX Server” (download it here) a little while ago and I came across a phrase in the document that caught my attention:

“VMware ESX/ESXi support both Fibre Channel and iSCSI storage. However, VMware and EMC do not support connecting VMware ESX/ESXi servers to CLARiiON Fibre Channel and iSCSI devices on the same array simultaneously.”

What? No Fibre Channel and iSCSI from the same array to a VMware ESX/ESXi host simultaneously? That piqued my curiosity, so I contacted a few people within EMC to question the veracity of that statement. It turns out that the answer is more complicated than it might seem at first glance.

For those of you who aren’t interested in the deep technical details, here’s the short explanation behind this behavior:

  • VMware fully supports the use of both Fibre Channel and iSCSI from the same array to the same VMware ESX/ESXi host simultaneously.
  • VMware does not support presenting the same LUN via both protocols concurrently to the same host. (I qualified this directly with VMware.)
  • For a Celerra, you can use both Fibre Channel (via the CLARiiON side of the array) and iSCSI (via the Celerra side of the array) simultaneously. This is a fully supported configuration.
  • A CLARiiON array can easily present the same LUN via both Fibre Channel and iSCSI, but then VMware wouldn’t support it (see earlier bullet).
  • With a CLARiiON array, it is possible to present some LUNs via Fibre Channel and some LUNs via iSCSI to the same VMware ESX/ESXi host (i.e., LUN A via Fibre Channel and LUN B via iSCSI), but EMC will only support it if you file an RPQ. Without an RPQ, it’s an unsupported configuration. An RPQ, by the way, is a request to qualify a certain configuration for support.
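The rules in the list above can be written down as a small decision function. This Python sketch only encodes what the bullets say; the function name, the `"celerra"`/`"clariion"` labels, and the returned strings are made up for the example. Treating a single-protocol configuration as supported is an assumption (the bullets only discuss the mixed case), and nothing here should be read as an official support statement.

```python
from typing import Dict, Iterable

BOTH = {"fc", "iscsi"}


def support_status(
    array_type: str,
    lun_protocols: Dict[str, Iterable[str]],
    has_rpq: bool = False,
) -> str:
    """Classify a single host's view of one array.

    lun_protocols maps a LUN name to the protocols used to present it
    to the host, for example {"LUN A": {"fc"}, "LUN B": {"iscsi"}}.
    """
    # Same LUN over both protocols to the same host: not supported by VMware.
    for protocols in lun_protocols.values():
        if BOTH <= set(protocols):
            return "unsupported: same LUN presented via both protocols"

    protocols_in_use = set().union(*lun_protocols.values())
    if not BOTH <= protocols_in_use:
        return "supported"  # assumption: single-protocol setups are fine

    if array_type == "celerra":
        # Fibre Channel via the CLARiiON side, iSCSI via the Celerra side.
        return "supported"
    if array_type == "clariion":
        # Different LUNs over different protocols needs an RPQ from EMC.
        if has_rpq:
            return "supported"
        return "unsupported: mixed protocols without an RPQ"
    return "unknown: array type not covered here"
```

For example, `support_status("clariion", {"LUN A": {"fc"}, "LUN B": {"iscsi"}})` reports the configuration as unsupported, and the same call with `has_rpq=True` reports it as supported.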

I’m confident that some other array vendors out there will be very quick to jump on this post and harp on this limitation until the cows come home. I would just ask this question: is it really as big of a limitation as it seems? I’ll come back to that question in a moment.

With the short explanation in mind, here are the more in-depth details. If you like the longer, more technical explanation, then read on!

From EMC’s side, the root of the restriction about using both Fibre Channel and iSCSI devices on the same array simultaneously stems from the interaction of host registration and storage groups.

Host registration is a requirement in the CLARiiON world. In order to present storage to a host from a CLARiiON array, you must first register the host’s initiators with the array in Navisphere. Once the host has been registered, then you can proceed with presenting storage to that host. In theory the CLARiiON could operate without registering hosts and initiators, but EMC chose to require registration. EMC made this choice in order to help simplify host management.

Requiring host registration is a bit different from some of the other storage arrays on the market. It’s not better or worse, just different. (Remember, every technology decision comes with pros and cons.)

If you’re like me, you’re probably wondering at this point how requiring host registration simplifies anything. Instead of having to manage multiple paths, multiple initiators, and individual hosts every time you want to present storage to a host, you only need to register the host—and all of its initiators—and then you can refer to that same object (the host) over and over again as needed. Yes, host registration does mean a bit more work up front, but the idea is that it will save some work down the road. I guess you can think of host registration kind of like defining aliases in your Fibre Channel zoning configuration: it’s a bit more work up front, but it simplifies things later down the road. If you didn’t create device aliases in your Fibre Channel switch, you’d end up having to re-enter Fibre Channel WWPNs multiple times. You create the aliases so that it’s easier later. The same applies to host registration. Again, it’s a matter of choices.

One might also say that registration is a security measure, albeit a weak one. Rather than allow just any Fibre Channel-attached or iSCSI-attached host to see storage, the array requires that it know about the host (via host registration) in order to present storage to the host. This provides an additional layer of security to ensure that only authorized hosts are presented storage from the array.

Now you have a fairly decent idea of why host registration is necessary. So how does host registration occur? Host registration can occur either manually or automatically. Starting with version 4.0, both VMware ESX and VMware ESXi will automatically register with a CLARiiON array running any recent version of FLARE (ESX 3i version 3.5 also supports this form of push registration). FLARE release 28 and earlier will show these hosts as “Manually registered, unmanaged”; starting with FLARE 29, these hosts are listed as “Manually registered, managed”. In either case, the registration occurs automatically. If the host is Fibre Channel-attached, then the Fibre Channel initiators will be included in the automatic registration. The same goes for iSCSI initiators. Normally, this is a good thing because it saves the administrator the extra steps of registering the host with the storage array. (Also, because VMware ESX/ESXi hosts register automatically, there is no need to install the Navisphere Agent.)

In this case, though, the automatic registration causes a problem. Why? This goes back to the second item I said I needed to discuss: storage groups. Specifically, storage groups have two characteristics that come into play here:

  1. First, any given host—not just VMware ESX/ESXi hosts, but all types of hosts—can only be connected to a single storage group at any given time.
  2. Second, while the CLARiiON can present Fibre Channel LUNs and iSCSI LUNs simultaneously (including presenting the same LUN via both protocols simultaneously), there is no way within a single storage group to specify which LUNs should be accessed via Fibre Channel and which LUNs should be accessed via iSCSI. That kind of per-LUN control is what would be needed here, because VMware won’t support accessing the same LUN via both protocols at the same time (see earlier VMware support statement).

Do you see how all the pieces come together? The only way to control which LUNs should be presented via which protocol is to use multiple storage groups, but a host can only be in a single storage group at a time. With only a single host object for any given VMware ESX/ESXi host, that host can only see either Fibre Channel LUNs (by being in a storage group containing Fibre Channel LUNs) or iSCSI LUNs (by being in a storage group containing iSCSI LUNs), but not both. Hence the statement in the CLARiiON document I referenced at the very beginning of this blog post, which outlines using either Fibre Channel or iSCSI but not both. This behavior is required to enforce the single-protocol LUN access required by VMware.

As with all things, there is a workaround. Because it is a workaround, the RPQ is necessary to get full support.

To work around this problem, you’ll need to ignore the automatic host registration (or disable it) and instead create two manually registered “pseudo-hosts”: one with the Fibre Channel initiators and one with the iSCSI initiators. These “pseudo-hosts” will need fake IP addresses (if they both use the same IP address, Navisphere will treat them as the same host, thus defeating the purpose of the workaround). Put the Fibre Channel initiators into the Fibre Channel storage group(s), and put the iSCSI initiators into the iSCSI storage group(s). Each “pseudo-host” will be able to see the LUNs presented to its storage group, so the physical host will see both Fibre Channel and iSCSI LUNs at the same time. And, as required by VMware, any given LUN will be accessed only via Fibre Channel or iSCSI but not both. Remember that you need to file an RPQ in order to get support on this configuration.
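To make the reasoning concrete, here is a toy model of the storage group rules described above. It is only a sketch: it does not talk to Navisphere or to any EMC API, and every name in it (the host objects, storage groups, and LUN labels) is made up for illustration. It assumes, as described earlier, that a host object belongs to exactly one storage group and that every LUN in that group is reachable over every protocol the host object has initiators for.

```python
def lun_protocols(host_objects, membership, storage_groups):
    """Return {lun: set of protocols} for one physical host.

    host_objects:   {host object name: set of protocols it has initiators for}
    membership:     {host object name: the single storage group it belongs to}
    storage_groups: {storage group name: list of LUNs in that group}
    """
    paths = {}
    for name, protocols in host_objects.items():
        group = membership[name]  # one storage group per host object
        for lun in storage_groups[group]:
            # No per-LUN protocol control inside a storage group
            paths.setdefault(lun, set()).update(protocols)
    return paths


def single_protocol_per_lun(paths):
    """True if every LUN is reached over exactly one protocol."""
    return all(len(protocols) == 1 for protocols in paths.values())


# One automatically registered host object holding both kinds of initiators
auto = lun_protocols(
    {"esx01": {"fc", "iscsi"}},
    {"esx01": "sg_all"},
    {"sg_all": ["LUN_A", "LUN_B"]},
)

# The workaround: two manually registered pseudo-hosts, one per protocol
split = lun_protocols(
    {"esx01-fc": {"fc"}, "esx01-iscsi": {"iscsi"}},
    {"esx01-fc": "sg_fc", "esx01-iscsi": "sg_iscsi"},
    {"sg_fc": ["LUN_A"], "sg_iscsi": ["LUN_B"]},
)

print(single_protocol_per_lun(auto))   # False
print(single_protocol_per_lun(split))  # True
```

In the first case every LUN ends up reachable over both protocols, which is the situation VMware won’t support; in the second, each LUN is tied to one protocol because each pseudo-host sits in its own storage group.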

For VMware ESX/ESXi 4.0 hosts (and ESX 3i version 3.5 hosts), you can disable automatic registration using the Disk.EnableNaviReg advanced configuration option. Setting this value to 0 disables the automatic registration with Navisphere. (Here are screenshots for VMware ESX 3i and VMware ESX/ESXi 4.) If you disable the automatic registration, then you only need to manually register the Fibre Channel and iSCSI initiators as separate “pseudo-hosts” and you’re ready to go.

Let me reiterate that if you are presenting iSCSI LUNs via the Celerra and not the CLARiiON, none of this applies. Presenting Fibre Channel LUNs via the CLARiiON and iSCSI LUNs via the Celerra to the same VMware ESX/ESXi host is fine. This workaround that I’ve described only applies when you want to present some LUNs via Fibre Channel and some LUNs via iSCSI from a CLARiiON to a single VMware ESX/ESXi host.

Earlier you’ll recall that I asked this question: is this really a limitation? There are a couple of viewpoints:

  • One viewpoint states there is no need for both Fibre Channel and iSCSI connectivity to the same array. Since you already have Fibre Channel connectivity to the array, what’s the point in using iSCSI? Conversely, if you already have iSCSI connectivity to an array, why invest in establishing Fibre Channel connectivity? Since you can’t use it for failover (that would violate the VMware support position), running another block protocol against the same array and same sets of disks doesn’t add a great deal of value.
  • A second viewpoint argues that the ability to provide a differentiation of service based on the different performance characteristics of Fibre Channel and iSCSI (and NFS, but we’re focusing on block protocols for this discussion) is valuable, and thus the need to be able to easily present LUNs via either protocol from the same array to the same host is a worthwhile function. There are a number of potential use cases here—test/development environments, Tier 2 applications, varying SLAs, etc. This is especially true if you are using different disk pools (fast Fibre Channel drives or EFDs vs. slower SATA drives) on the same array.

I can see both sides of the coin. Personally, I tend to side more with the second viewpoint and would prefer to see the CLARiiON have the ability to easily present Fibre Channel and iSCSI to the same host, especially when multiple disk pools are involved. I think that CLARiiON engineering is now evaluating this possibility; as more information emerges, I’ll be sure to keep you posted.

Courteous and professional comments, clarifications, or corrections are always welcome!


I recently had the opportunity to work on a proof of concept (PoC) in which we wanted to help a customer streamline the processes needed to deploy new hosts and reduce the amount of time it took overall. One of the tools we used in the PoC for this purpose was PXE booting VMware ESX for an automated installation. Here are the details on how we made this work.

Before I get into the details, I’ll provide this disclaimer: there are probably easier ways of making this work. I specifically didn’t use UDA or similar because I wanted to gain the experience of how to do this the “old fashioned” way. I also wanted to be able to walk the customer through the “old fashioned” way and explain all the various components.

With that in mind, here are the components you’ll need to make this work:

  1. You’ll need a DHCP server to pass down the PXE boot information. In this particular instance, I used an existing Windows-based DHCP server. Any DHCP server should work; feel free to use the Linux ISC DHCP server if you prefer.
  2. You’ll need an FTP server to host the kickstart script and VMware ESX 4.0 Update 1 installation files. In this case, I used a third-party FTP server running on the same Windows-based server as DHCP. Again, feel free to use a Linux-based FTP server if you prefer.
  3. You will need a TFTP server to provide the boot files. The third-party FTP server used in the previous step also provided TFTP functionality. Use whatever TFTP server you prefer.

Make sure that each of these components is working as expected before proceeding. Otherwise, you’ll spend time troubleshooting problems that aren’t immediately apparent.

Preparing for the Automated ESX Installation

First, copy the contents of the VMware ESX 4.0 Update 1 DVD (the files inside the ISO, not the ISO itself) to a directory on the FTP server. Test it to make sure that the files can be accessed via an anonymous FTP user.

Also go ahead and create a simple kickstart script that automates the installation of VMware ESX. I won’t bother to go into detail on this step here; it’s been quite adequately documented elsewhere. You’ll need to put this kickstart script on the FTP server as well.

At this point, you’re ready to proceed with gathering the PXE boot files.

Gathering the PXE Boot Files

The first task you’ll need to complete is gathering the necessary files for a PXE boot environment.

First, copy the vmlinuz and initrd.img files from the VMware ESX 4.0 Update 1 ISO image. Since I use a Mac, for me this was a simple case of mounting the ISO image and copying out the files I needed. For Linux or Windows users, it might be a bit more complicated. These files, by the way, are in the ISOLINUX folder on the DVD image.

Next, you’ll need the PXE boot files. Specifically, you’ll need the menu.c32 and pxelinux.0 files. These files are not on the DVD ISO image; you’ll have to download Syslinux from this web site. Once you download Syslinux, extract the files into a temporary directory. You’ll find menu.c32 in the com32/menu folder; you’ll find pxelinux.0 in the core folder. Copy both of these files, along with vmlinuz and initrd.img, into the root directory of the TFTP server. (If you don’t know the root directory of the TFTP server, double-check its configuration.)
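If you’d rather script the copy than do it by hand, something along these lines should work. This is a minimal sketch, not a tested deployment tool: the function name and its parameters are my own, the three directory arguments are wherever you mounted the ISO, extracted Syslinux, and rooted your TFTP server, and the name of the ISOLINUX folder is passed in because its case can vary depending on how the ISO is mounted.

```python
import shutil
from pathlib import Path


def stage_boot_files(iso_root, syslinux_root, tftp_root, isolinux_dir="isolinux"):
    """Copy the four PXE boot files into the TFTP root; return the copied names."""
    sources = [
        # Kernel and initial RAM disk from the ESX installation media
        Path(iso_root) / isolinux_dir / "vmlinuz",
        Path(iso_root) / isolinux_dir / "initrd.img",
        # Menu module and PXE bootloader from the extracted Syslinux archive
        Path(syslinux_root) / "com32" / "menu" / "menu.c32",
        Path(syslinux_root) / "core" / "pxelinux.0",
    ]

    # Fail before copying anything if a source file is missing
    missing = [str(src) for src in sources if not src.is_file()]
    if missing:
        raise FileNotFoundError("missing boot files: " + ", ".join(missing))

    tftp = Path(tftp_root)
    tftp.mkdir(parents=True, exist_ok=True)

    copied = []
    for src in sources:
        shutil.copy2(src, tftp / src.name)
        copied.append(src.name)
    return copied
```

Checking for all four files up front means a typo in one path won’t leave the TFTP root half populated.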

You’re now ready to configure the PXE boot process.

Configuring the PXE Boot Environment

Once the necessary files have been placed into the root directory of the TFTP server, you’re ready to configure the PXE boot environment. To do this, you’ll need to create a PXE configuration file on the TFTP server.

The file should be placed into a folder named pxelinux.cfg under the root of the TFTP server. The PXE configuration file should be named something like this:

01-<MAC address of network interface on host>

If the MAC address of the host was 01:02:03:04:05:06, the name of the text file in the pxelinux.cfg folder on the TFTP server would be:

01-01-02-03-04-05-06
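If you have a list of MAC addresses to work through, the naming rule is easy to script. Here’s a small sketch; the function name is mine, and it relies on PXELINUX looking for the file name in lowercase hexadecimal with dash separators, prefixed with the Ethernet hardware type (01).

```python
import re


def pxe_config_name(mac: str) -> str:
    """Return the pxelinux.cfg file name for a MAC address."""
    # Accept either colon- or dash-separated octets
    octets = re.split(r"[:-]", mac.strip())
    valid = len(octets) == 6 and all(
        re.fullmatch(r"[0-9A-Fa-f]{2}", octet) for octet in octets
    )
    if not valid:
        raise ValueError(f"not a MAC address: {mac!r}")
    # "01" is the hardware type for Ethernet; the rest is the MAC in lowercase
    return "01-" + "-".join(octet.lower() for octet in octets)


print(pxe_config_name("01:02:03:04:05:06"))  # 01-01-02-03-04-05-06
```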

The PoC in which I was engaged involved Cisco UCS, so we knew in advance what the MAC addresses were going to be (the MAC address is assigned in the UCS service profile).

The contents of this file should look something like this (lines have been wrapped here for readability and are marked by backslashes; don’t insert any line breaks in the actual file):

default menu.c32
menu title Custom PXE Boot Menu Title
timeout 30
 
label scripted
menu label Scripted installation
kernel vmlinuz
append initrd=initrd.img mem=512M ksdevice=vmnic0 \
  ks=ftp://A.B.C.D/ks.cfg
IPAPPEND 1

You’ll want to replace ftp://A.B.C.D/ks.cfg with the correct IP address and path for the kickstart script on the FTP server.
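Since the only thing that changes from one deployment to the next is the kickstart URL, you could generate the file contents rather than hand-edit them. This is just a sketch: the function name and parameters are my own, and the address 192.0.2.10 is a placeholder from the documentation address range, not a real server. Note that it writes the append statement on a single line, as the actual file requires.

```python
def render_pxe_config(ks_url, ks_device="vmnic0", title="Custom PXE Boot Menu Title"):
    """Return the text of a PXE configuration file for a scripted ESX install."""
    lines = [
        "default menu.c32",
        f"menu title {title}",
        "timeout 30",
        "",
        "label scripted",
        "menu label Scripted installation",
        "kernel vmlinuz",
        # One unbroken line: no backslash continuation in the real file
        f"append initrd=initrd.img mem=512M ksdevice={ks_device} ks={ks_url}",
        "IPAPPEND 1",
    ]
    return "\n".join(lines) + "\n"


# Placeholder address; substitute the IP address of your own FTP server
print(render_pxe_config("ftp://192.0.2.10/ks.cfg"))
```

Write the returned text to a file in the pxelinux.cfg folder on the TFTP server, named for the host’s MAC address as described above.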

Only one step remains: configuring the DHCP server.

Configuring the DHCP Server for PXE Boot

As I mentioned earlier, I used the Windows DHCP server as a matter of ease and convenience; feel free to use whatever DHCP server best suits your needs. There are only two options that are necessary for PXE boot:

066 Boot Server Host Name (specify the IP address of the TFTP server)
067 Bootfile Name (specify pxelinux.0)

In this particular example, I created reservations for each MAC address. Because the values were the same for all reservations, I used server-wide DHCP options, but you could use reservation-specific DHCP options if you wanted different boot options on a per-MAC address (i.e., per-reservation) basis.
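I didn’t use the ISC DHCP server in this PoC, but if you do, the rough equivalent of a Windows reservation is a host declaration in dhcpd.conf, with next-server and filename playing the role of the boot server and bootfile settings. The sketch below just renders such a declaration as text. The function name is mine, the host name and addresses are placeholders from the documentation address range, and I have not tested this against the PoC environment, so treat it as a starting point.

```python
def dhcpd_host_stanza(name, mac, ip, tftp_server, bootfile="pxelinux.0"):
    """Return an ISC dhcpd.conf host declaration for one PXE-booting host."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"   # MAC address of the host's boot NIC
        f"  fixed-address {ip};\n"        # the reserved IP address
        f"  next-server {tftp_server};\n" # IP address of the TFTP server
        f'  filename "{bootfile}";\n'     # boot file requested over TFTP
        f"}}\n"
    )


# Placeholder values for illustration only
print(dhcpd_host_stanza("esx01", "01:02:03:04:05:06", "192.0.2.50", "192.0.2.10"))
```

If the boot settings are the same for every host, next-server and filename can instead be set once at the subnet or global level, which mirrors the server-wide options I used on the Windows side.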

The End Result

Recall that this PoC was using Cisco UCS blades. Thus, in this environment, to prepare for a new host coming online we only had to create a PXE configuration file and a matching DHCP reservation. The MAC address would get assigned via the service profile, and when the blade booted, it would automatically proceed with an unattended installation. Combined with Host Profiles in VMware vCenter, this took the process of bringing new ESX/ESXi hosts online down to mere minutes. A definite win for any customer!
