October 2007


Last year, I wrote an article about using NetApp Snapshots and LUN clones to enable the recovery of individual files within a VM.  This time around, I’d like to have a quick look at that same process using NFS instead of block-level storage.

As I mentioned a couple of weeks ago, NFS is getting more and more attention as a key storage enabler for Virtual Infrastructure implementations.  I do still plan to conduct some tests of my own between iSCSI and NFS.  (Since they are both IP-based storage protocols, I figure that makes the playing field as level as possible.)  In any case, with regard to file-level recovery within VMs, NFS does possess at least one advantage.

Using any sort of clones (LUN clones or FlexClones) within VI3 currently requires resignaturing to be enabled, or else the ESX Servers don’t even see the clones.  While enabling resignaturing is not difficult (it can be done via the command line or via VirtualCenter), it is not the default configuration and VMware appears not to recommend it (per the SAN Configuration Guide, pages 112 through 115).  With NFS, it’s only necessary to create a FlexClone and set up a new NFS mount; no other configuration is required.

By the same token, using NFS for file-level recovery within VMs also has one key disadvantage:  LUN clones are free, whereas the use of FlexClone requires a license.

With these advantages and disadvantages in mind, let’s have a look at what the process would look like to recover files inside VMs using NFS for VM storage with NetApp Snapshots.

First, we’d review the list of available Snapshots using the snap list command, as shown below:

filer> snap list nfs_volume1
Volume nfs_volume1
working...

%/used %/total date name
---------- ---------- ------------ --------
0% ( 0%) 0% ( 0%) Oct 08 12:00 hourly.0
0% ( 0%) 0% ( 0%) Oct 08 08:00 hourly.1
0% ( 0%) 0% ( 0%) Oct 08 00:00 nightly.0
0% ( 0%) 0% ( 0%) Oct 07 20:00 hourly.2
0% ( 0%) 0% ( 0%) Oct 07 16:00 hourly.3
0% ( 0%) 0% ( 0%) Oct 07 12:00 hourly.4
0% ( 0%) 0% ( 0%) Oct 07 08:00 hourly.5
0% ( 0%) 0% ( 0%) Oct 07 00:00 nightly.1
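Choosing a Snapshot by date and time from a listing like this can also be scripted.  The following is only a rough Python sketch: the function names are hypothetical, the parsing assumes the line layout shown above (usage columns, then month, day, time, and name), and the year has to be supplied by the caller because the listing doesn’t include one.

```python
import re
from datetime import datetime

# Matches the tail of a snap list row, e.g. "Oct 08 12:00 hourly.0".
# Assumes English month abbreviations, as in the sample output.
SNAP_LINE = re.compile(r"([A-Z][a-z]{2}) +(\d{1,2}) +(\d{2}):(\d{2}) +(\S+)\s*$")


def parse_snap_list(text, year):
    """Return a list of (datetime, snapshot_name) tuples from snap list output."""
    snaps = []
    for line in text.splitlines():
        match = SNAP_LINE.search(line)
        if not match:
            continue  # header, separator, or status line
        month, day, hour, minute, name = match.groups()
        when = datetime.strptime(
            f"{year} {month} {day} {hour}:{minute}", "%Y %b %d %H:%M"
        )
        snaps.append((when, name))
    return snaps


def latest_at_or_before(snaps, cutoff):
    """Name of the newest Snapshot taken at or before cutoff, or None."""
    candidates = [snap for snap in snaps if snap[0] <= cutoff]
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    sample = (
        "0% ( 0%) 0% ( 0%) Oct 08 12:00 hourly.0\n"
        "0% ( 0%) 0% ( 0%) Oct 08 00:00 nightly.0\n"
        "0% ( 0%) 0% ( 0%) Oct 07 20:00 hourly.2\n"
    )
    snaps = parse_snap_list(sample, 2007)
    # The file was known to be good at 3:00 AM on October 8.
    print(latest_at_or_before(snaps, datetime(2007, 10, 8, 3, 0)))
```

The idea is simply to find the newest Snapshot that predates the moment the file was lost or damaged; that name is what gets passed to the clone command.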

Once we identify the Snapshot that contains the data we need to recover (based on the date/time of the Snapshot), we create a FlexClone using that Snapshot as its backing:

vol clone create nfs_volume1_clone -s file -b nfs_volume1 nightly.0

This creates a FlexClone named “nfs_volume1_clone” based on the nightly.0 Snapshot of the volume nfs_volume1.  If you immediately run the exportfs command, you’ll see that the new clone is already shared via NFS, too.

From here, the process is pretty straightforward:

  1. Create a new NFS datastore within VirtualCenter, using the new NFS mount as the destination.  This makes the data inside the FlexClone visible to the existing VMs.
  2. Add one of the VMDKs on the cloned NFS datastore to an existing VM as an additional hard drive.  You should be able to do this on the fly without shutting down the VM.
  3. Extract the files you need and place them back where you want them.

When you’re done recovering files, the clean-up process looks like this:

  1. Remove the VMDK(s) from the VM to which they were added.
  2. Remove the NFS datastore from VirtualCenter.
  3. Destroy the FlexClone using the vol offline and vol destroy commands.
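
For anyone who does this regularly, the filer-side commands from this article can be generated from a volume name and a Snapshot name.  This is just a sketch in Python; the helper names are made up for illustration, and it only builds the command strings used above (it doesn’t connect to the filer or touch VirtualCenter).

```python
def clone_commands(volume, snapshot, clone=None):
    """Filer commands to list Snapshots, create a FlexClone, and check exports."""
    # Naming the clone "<volume>_clone" mirrors the example in this article.
    clone = clone or f"{volume}_clone"
    return [
        f"snap list {volume}",
        f"vol clone create {clone} -s file -b {volume} {snapshot}",
        "exportfs",
    ]


def cleanup_commands(clone):
    """Filer commands to remove the FlexClone once recovery is finished."""
    return [f"vol offline {clone}", f"vol destroy {clone}"]


if __name__ == "__main__":
    for command in clone_commands("nfs_volume1", "nightly.0"):
        print(command)
    for command in cleanup_commands("nfs_volume1_clone"):
        print(command)
```

The datastore and VMDK steps still happen in VirtualCenter between the two sets of commands.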

Overall, this process is rather similar to the technique described using LUN clones, although a bit simpler because resignaturing is not required.


Various Odds and Ends

I was going through my list of flagged headlines in NetNewsWire and realized that I’d built up quite a list of articles that I intended to write something about.  Some of them just don’t merit a full-blown post, though, so I thought I’d just toss a bunch of them in here along with a brief sentence or two about them:

  • VMTN Discussion Forums: vdiskmanager GUI for OSX:  An enterprising Fusion user has written an OS X GUI for vdiskmanager, so that VMDKs on Fusion can be expanded or defragmented, or new virtual disks can be created.  I haven’t tried it yet, but it looks like it could be extremely useful, and it’s nice to see Fusion users creating useful utilities like this.
  • Running ESX 3i Beta in a VM with VMware Fusion:  Still thinking Fusion, this article discusses how a user managed to get ESX Server 3i (the beta version obtained at VMworld 2007) running as a VM under Fusion.  There’s also information on running it under Workstation 6.
  • Tech: How to get the command line in ESX Server 3i beta:  Turns out ESX Server 3i has a command line after all, based on BusyBox.  Richard Garsthagen has more information about ESX 3i available at run-virtual.com.  Also see Eric Sloof’s info on boot options.
  • Storm Worm Botnet Attacks Anti-Spam Firms:  Is this botnet really as massive as everyone says?  I’ve been seeing so many articles about the Storm botnet, but I have yet to see (perhaps I haven’t looked hard enough yet) definitive information that describes the type of traffic these bots generate.  Surely there’s got to be something we can do about this.
  • Microsoft Updates Windows Without User Permission, Apologizes:  Oh, goodness—where do I start with this one?  Let’s just say that I’m glad I’m using Little Snitch, which catches this kind of outbound traffic that so easily slips through the Windows “firewalls” onto the Internet. Otherwise, I might be getting product updates without anyone bothering to tell me so.  (And perhaps it’s just me, but an apology from Microsoft doesn’t make me feel any more trusting of them.)
  • NFS vs iSCSI vs FC:  More information on why we should be interested in running VMware over NFS.

I guess that’s all for now, as it’s getting late and I have to get up in the morning and go to church.  Feel free to share any comments or corrections below.  Thanks for reading!


The Power of Quicksilver

So the other day I was sitting in my office, working on my laptop, when my seven-year-old son came up to me and asked a question.  I don’t recall exactly what I was doing at the time (probably working on a blog post!), but, as I frequently do when working in my office, I was listening to some Christian music with iTunes (and GrowlTunes).  So that I could hear my son ask his question, I quickly pulled up Quicksilver and paused iTunes with only a few keystrokes.

“What did you just do?” my son asked.  Whatever question had been on his mind previously was now gone.

“I paused iTunes so that I could pay attention to your question,” I replied.

“How did you do that?”  His curiosity, naturally high anyway (he is a seven-year-old kid, after all), was really piqued now.

“I used Quicksilver,” I answered.

“What’s Quicksilver?”

I took a moment to show him how it worked.  He was completely hooked, and since that day has been bugging me incessantly to install Quicksilver on the Mac mini downstairs.  I haven’t yet installed it, mostly because I don’t want to give him any excuses to spend more time on the computer than he does already.  Another part of me, though, is intrigued by how naturally the idea behind Quicksilver seemed to come to him.  What is the mysterious attraction behind Quicksilver?  Is it truly so natural, so intuitive, that even young children seem to “get” it?  Or is it just “cool” to a seven-year-old, and that’s why he wants it?  Or is it a little of both?


This article started life as something entirely different. I was reviewing some of the VMworld 2007 slide decks, looking for “nuggets of knowledge,” as I like to call them (these are the small details that are often far more significant than they might seem) when I came across some information on VMware HA isolation response. I was actually looking for something else but as is typically the case when you’re looking for something, you find everything but the one thing for which you’re looking.

In any event, I wanted to take some time to better understand isolation response, so I decided to perform some experiments in my lab with VMware HA and isolation response. For those that aren’t familiar with it, isolation response is the term used to describe what an ESX Server in a DRS/HA cluster will do if it loses connectivity to all the other servers in the cluster, i.e., if it becomes isolated. Isolation response is set on a per-VM basis, and the default (I believe) is to power off. What this means is that when an ESX host becomes isolated, it will power off the VMs that are currently running on that host.

There’s a great deal of debate as to whether this is the right setting or not, which I won’t really delve into right now. In any case, how does a host determine if it is isolated, or if the rest of the cluster is just down? That’s what got me started down this path. The VMware HA agent (which is really the Legato Automated Availability Manager, or AAM, agent—hence the AAMClient stuff in esxcfg-firewall) uses the Service Console’s default gateway as its isolation address. Basically, what this means is that if a host can’t get to any of the other hosts in the cluster and can’t get to the isolation address, then it assumes it is isolated and initiates the isolation response. If it can’t get to other nodes in the cluster but can reach the isolation address, then it is not isolated and should continue operation (perhaps even restarting some VMs locally since this would indicate host failures in the cluster).
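
To make that decision logic a little more concrete, here is a small Python sketch of it.  To be clear, this is only my model of the behavior as described above, not the actual AAM agent logic; the names are hypothetical, and the reachability checks are passed in as plain booleans rather than performed on the network.

```python
from enum import Enum


class HostState(Enum):
    CONNECTED = "connected"
    PEERS_UNREACHABLE = "peers unreachable, not isolated"
    ISOLATED = "isolated"


def classify_host(peers_reachable, isolation_addresses_reachable):
    """Classify a host from reachability results (iterables of booleans)."""
    if any(peers_reachable):
        # At least one other cluster node answers: nothing to do.
        return HostState.CONNECTED
    if any(isolation_addresses_reachable):
        # No peers, but the isolation address answers: the other hosts
        # are assumed to be down, and this host keeps running.
        return HostState.PEERS_UNREACHABLE
    # Neither peers nor the isolation address can be reached.
    return HostState.ISOLATED


if __name__ == "__main__":
    print(classify_host([False, False], [True]).value)
    print(classify_host([False, False], [False]).value)
```

Only the last state would trigger the isolation response.  The isolation addresses are taken as a list so the same check works whether there is one address or several.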

The stuff I found in the VMworld 2007 slides talks of using a second isolation address, which provides the VMware HA agent with another means of verifying isolation before initiating the isolation response. Before I proceeded with setting this second address, however, I wanted to be sure I understood the operation of isolation response in the current configuration. Once I’d tested that and then tested the second isolation address, I was going to write it up here.

To make a long story not quite so long, I found that isolation response was not working as expected. What happened is that other hosts in the cluster would detect the “host failure” (the isolation of my test host) and try to restart the VM before the test host detected isolation and tried to shut down the VM. This was evidenced by these lines in /var/log/vmkernel:

Oct 5 13:13:36 esx02 vmkernel: 38:20:29:34.025 cpu3:1305)WARNING: NFSLock: 1883: disk is being locked by other consumer
Oct 5 13:13:36 esx02 vmkernel: 38:20:29:34.025 cpu3:1305)NFSLock: 2479: failed to get lock on file vswim01-flat.vmdk 0x5a1b6a0 on 192.168.31.51 (192.168.31.51)

(Yes, I’m running my VMs on NFS. Yes, I did try iSCSI to see if the behavior was different. No, I did not try Fibre Channel. Yes, I got the same results in both cases.)

To make things even more interesting, I found that the test host failed to successfully shut down a Linux VM when the isolation response was finally triggered, but was able to successfully power down a Windows guest. Both VMs had the latest version of the VMware Tools installed.

Since that time, I’ve been combing the Internet searching for more information on the VMware HA agent, the AAM ftcli utility, behaviors, workarounds, configuration tweaks, etc. Thus far, it has been an abysmal failure. There are lots of VMware Community threads, but almost every one of those is a “double-check your DNS and /etc/hosts” thread.

So, any VMware gurus out there have some useful information to share? Anyone else having VMware HA problems? Anyone know where I can find some actually useful information on VMware HA and the AAM client? I’d love to get some more detailed information and be able to put this thing to rest (and be able to advise others on how to put it to rest as well).


VI 3.5 Expected in December

Virtualization.info broke the news that VMware is expected to announce next week that version 3.5 of Virtual Infrastructure will be generally available in December.  Of course, that’s not the only news released (new editions and such), but I’ll leave it to others to comment on those items.

I’m not surprised by Alessandro’s announcement, but I am concerned at the speed with which VMware is releasing this new update.  Let’s hope that the software quality doesn’t take a hit.

Clearly, VMware is feeling pressure from its up-and-coming competitors, namely Citrix (via XenSource and the Xen hypervisor) and Microsoft (with Windows Server Virtualization, aka Viridian).  In my opinion, Citrix is closer to being a real threat but the inevitable reorganization that occurs after an acquisition introduces delays that give VMware a bit more time to respond.  So, VMware needs to react quickly, before Citrix/XenSource can get their act together; hence, the (in my mind) accelerated release schedule.

(I know that it’s always been said that the next version would be available before the end of the year, so technically this isn’t really an “accelerated” release schedule.  It’s just that something about this expected announcement gives me the impression that it’s coming earlier than anyone really thought.)

On the other hand, VMware has done a reasonably good job of delivering quality products that actually work (and work well), and with the relatively hefty set of new features slated for inclusion in the next version, I hope that VMware can maintain the software quality.  A misstep in software quality at this point in the game, with Citrix hot on their heels and Microsoft looming on the horizon, could be disastrous.  The last thing VMware needs is a major flaw in VI 3.5 or one of its major new features.  That would do almost irreparable damage to the company’s reputation.

Unfortunately, I’ve been unable to get involved in the beta program, so I have no idea of whether VMware is on track with software quality or not.  (Of course, even if I were on the beta program, I wouldn’t be allowed to comment about it anyway, but that’s another story…)


The Sanrad V-Switch is a useful device that offers a number of features, including (but not limited to) storage virtualization, iSCSI-to-FC connectivity, snapshots, and data migration services.  It’s actually a pretty handy little device.  I had the opportunity to spend the day today with a V-Switch 2000, an entry-level model, connected in front of a small Fibre Channel array.  I thought it might be handy to include some configuration commands here in the event that someone else needed them (to be honest, the Sanrad documentation is horrible).

Overall, the process for configuring a V-Switch looks something like this:

  1. Physically connect the V-Switch to the Ethernet network and the Fibre Channel fabric(s).
  2. Set the IP configuration for the Ethernet interfaces on the V-Switch.
  3. Create an iSCSI portal and an iSCSI target on the V-Switch.
  4. Present FC storage to the V-Switch.
  5. Create volumes on the V-Switch.
  6. Present the volumes as iSCSI LUNs.

And that’s it!  It’s not bad once you get used to the interface (I used the command line interface because…well, I prefer the command line to the GUI).  Let’s walk through those steps real quick.

Physically Connect the V-Switch

The V-Switch gets connected via Gigabit Ethernet and via Fibre Channel.  Physically connecting the V-Switch is as simple as plugging in the Ethernet and Fibre Channel ports to the appropriate switches.  You’ll probably also need to modify your Fibre Channel zones so that the V-Switch can see any Fibre Channel targets on the fabric(s).

Configure IP

The V-Switch comes with two Gigabit Ethernet interfaces.  To assign IP address(es) to the interface(s), use the ip config set command, like this:

ip config set -ip 10.2.3.45 -if eth1 -im 255.255.255.0

Verify that the IP address has been added using the ip config show command, as well as by pinging the new IP address from another system on the same subnet.

By the way, the system comes with a pre-assigned IP address (I believe it is 10.11.12.123) which can’t be removed.  At least, I couldn’t figure out how to remove or change it.

Create iSCSI Portal and iSCSI Target

Next, we create an iSCSI portal (IP address and listening TCP port) on the assigned IP address:

iscsi portal create -ip 10.2.3.45 -p 3260

TCP port 3260 is the default, so you could omit that if desired.  The iscsi portal show command will verify the creation of the iSCSI portal.

Once the iSCSI portal is created, create an iSCSI target with the iscsi target create command, like this (I used a backslash to denote line continuation):

iscsi target create -ta vmtarget \
-tn iqn.2000-04.com.sanrad:vswitch01

We can verify the creation of the iSCSI target using iscsi target show.  At this point, iSCSI initiators will see the iSCSI target, but won’t see any LUNs, because we haven’t connected and configured the back-end storage yet.  That’s next.

Present FC Storage to the V-Switch

The exact procedures and commands here will vary based on the back-end storage array.  Create the volumes, disk groups, LUNs, RAID groups, etc., using the standard tools and commands provided by the back-end storage array.  Present those storage containers to the V-Switch using the V-Switch’s WWNN (which can be displayed using the fc interface show command).

Once that’s done, the V-Switch should discover it automatically.  To show the storage that the V-Switch has discovered, use this command:

storage show

This will show disks and controllers with very unimaginative (and undescriptive) names like “Stor_10”.  To keep things logical and organized, rename the storage to something that more closely identifies what it is.  In this example, we’ll identify the storage with a logical drive identifier and an array identifier:

storage set -s Stor_15 -na LD09-Ar01 -info Log drv 9, Array 1
storage set -s Stor_6 -na LD00-Ar00 -info Log drv 0, Array 0

Create Volumes

With the storage now recognized by the V-Switch and labeled as something recognizable (and traceable back to the storage array itself), we can create a couple of simple volumes:

volume create simple -vol simple00 -d LD00-Ar00
volume create simple -vol simple05 -d LD05-Ar01

Simple volumes are the building blocks for more complex structures like mirrored volumes or striped volumes.  With a couple of simple volumes created, we can create a mirrored volume from these simple volumes:

volume create mirror -vol mirror01 -ch simple00 -ch simple05

Verify the creation of the volumes (simple or mirrored) with the volume show command.

Let’s create a few more simple volumes:

volume create simple -vol simple01 -d LD01-Ar00
volume create simple -vol simple02 -d LD02-Ar00
volume create simple -vol simple06 -d LD06-Ar01
volume create simple -vol simple07 -d LD07-Ar01

Then, using these four simple volumes, let’s create a striped volume:

volume create stripe -vol stripe01 -nbc 4 \
-sus 64 -ch simple01 -ch simple02 -ch simple06 \
-ch simple07

Again, we can use the volume show command to list the volumes that have been created.  However, our job is not yet done.  We have to present these volumes as LUNs to the iSCSI initiators.
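
Since the striped volume command repeats the child names and a count, it’s an easy one to get wrong by hand.  Here’s a rough Python sketch that builds the commands instead.  The function names are hypothetical, and I’m assuming that -nbc is the number of child volumes (which matches the example above, with four children and -nbc 4) and that -sus is the stripe unit size; the sketch only generates command strings in the same form used in this article.

```python
def simple_volume_commands(volume_to_disk):
    """Build 'volume create simple' commands from a {volume: disk} mapping."""
    return [
        f"volume create simple -vol {volume} -d {disk}"
        for volume, disk in volume_to_disk.items()
    ]


def stripe_command(name, children, stripe_unit=64):
    """Build a 'volume create stripe' command for the given child volumes."""
    if len(set(children)) != len(children):
        raise ValueError("each child volume may only be listed once")
    parts = [
        "volume create stripe",
        f"-vol {name}",
        f"-nbc {len(children)}",  # assumption: -nbc is the child count
        f"-sus {stripe_unit}",    # assumption: -sus is the stripe unit size
    ]
    parts += [f"-ch {child}" for child in children]
    return " ".join(parts)


if __name__ == "__main__":
    disks = {
        "simple01": "LD01-Ar00",
        "simple02": "LD02-Ar00",
        "simple06": "LD06-Ar01",
        "simple07": "LD07-Ar01",
    }
    for command in simple_volume_commands(disks):
        print(command)
    print(stripe_command("stripe01", list(disks)))
```

Keeping the volume names as dictionary keys means a name can’t be used for two different disks, and deriving the count from the list keeps it in step with the children.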

Present iSCSI LUNs

The volumes have been created from back-end storage, but now we need to expose the volumes to iSCSI initiators on the iSCSI target created earlier.  We do that with this command:

volume expose -vol mirror01 -ta vmtarget

This exposes the volume named “mirror01” on the iSCSI target named “vmtarget” (created earlier).  That target is accessible on the iSCSI portal also created earlier.

You can verify or show that the volume is exposed as a LUN on the iSCSI target using the lu show command.

Assuming that it is properly exposed, you should now be able to see the exposed LUN via your iSCSI initiator (software or hardware).

The V-Switch also has some other advanced functionality, like subdisks (carving up LUNs from back-end storage into smaller pieces), snapshots, and data replication/migration.  I hope to get the opportunity to discuss those features more in the future.


Exciting New Opportunity

I’m extremely excited about a new opportunity that has just recently opened up for me.  I’ve been offered the opportunity to do some writing for a VMware-focused web site that will be launching shortly.  I can’t disclose the name of the site or anything like that (at least, not as far as I know), but I’ll be sure to let everyone know when the site goes live.

I will say that this site is being launched by a fairly well-known organization that already operates a number of very popular sites, so it’s not like this is some fly-by-night operation.  I’m very honored to have been offered this opportunity, and writing is something that I truly enjoy.  In fact, getting the opportunity to write—even if only for my own benefit—was one of the reasons I started this site almost two and a half years ago.  As it turns out, some of the information I’ve been able to share has been helpful to others as well, and that makes it even better.

I do not anticipate that writing for the new site will negatively impact this weblog.  I plan to continue to do my best to provide content and information centered around virtualization, servers, storage, and life as a systems engineer.  I’m sure that the occasional personal post will show up now and then as well.

So stay tuned, and when the new site goes live I’ll hopefully be able to provide some links to some of my work.

