blog.scottlowe.org

The weblog of an IT pro specializing in virtualization, storage, and servers

Archive for Articles Tagged Snapshots

Storage Array Snapshots with VMware

June 4th, 2008 by slowe

A new article of mine has been published on SearchVMware.com! This article discusses the use of storage array snapshots with VMware, specifically focusing on ensuring that storage array snapshots are consistent and usable:

When used in conjunction with a VMware infrastructure, storage array-based snapshots are touted for their ability to create point-in-time pictures of virtual machines (VMs) for business continuity, disaster recovery and backups. While this can be true, it’s important to understand how virtualization affects storage array snapshot use. Incorrect usage can render storage array snapshots unreliable and generally defunct.

The article provides a few guidelines on making sure that storage array snapshots are usable. Keep in mind, too, that some storage array vendors have applications that are specifically designed to help with this particular issue. NetApp, for example, has SnapManager for Virtual Infrastructure; this product is built to address this problem (among others). I would imagine that other vendors also offer a software solution to this problem, but I’m not particularly familiar with those. I’d love to hear from readers about their experience with, or knowledge of, any such software solutions.

Category: Virtualization, Storage | 10 Comments »

NetApp OSSV with VMware ESX Server

May 29th, 2008 by slowe

A couple of days ago my good friend Nick Triantos (formerly of Storage Foo fame, now writing at the Storage Nuts & Bolts Blog) published a short piece on a new version of Open Systems SnapVault (OSSV) that offers new functionality when used with VMware Infrastructure 3 (VI3).

For those that aren’t familiar with OSSV, it’s basically a way to bring NetApp-style snapshots to non-NetApp storage. For example, it’s pretty common to use OSSV to take snapshots of a server that is not attached to a NetApp SAN and store those snapshots on a NetApp SAN. Think of a remote office/branch office scenario, where the branch server can be easily backed up using incremental snapshots to a NetApp storage system in the main location. It’s pretty handy technology, to be honest.

After reading Nick’s piece on OSSV, one question popped into my mind: is he talking about running OSSV on the guest, or running OSSV on the ESX host? So I contacted Nick, obtained clarification, and wanted to post that clarification here. (BTW, thanks, Nick, for taking the time to answer my questions and allowing me to discuss this here.)

Nick’s article primarily focuses on the use of OSSV at the ESX level, running within the ESX Console OS (or Service Console). In this respect, OSSV will be taking snapshots of the VMDK files via the Service Console and replicating them over to a NetApp storage system. As with running OSSV in other environments, the snapshots are block-level incremental snapshots and thus only the changed blocks will be replicated across the wire when a snapshot is taken and shipped off to the destination storage system.

What’s apparently new here is that NetApp has optimized the OSSV code so that the initial baseline will only capture the utilized portions of a VMDK file. Keep in mind that ESX, by default, uses thick-provisioned disks, so that a 30GB virtual disk takes up 30GB on the physical storage. OSSV 2.6, the new version, understands that if only 10GB of that 30GB VMDK is actually being utilized, it only needs to replicate 10GB of data. That is really important in reducing the overhead required for the initial baseline transfer. Thereafter, OSSV operates as usual by capturing and replicating only changed blocks.
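To make the baseline-versus-incremental distinction concrete, here’s a toy Python model. It is not how OSSV works internally (I have no visibility into that); it simply assumes a virtual disk is a list of fixed-size blocks, where None marks space that the thick-provisioned disk allocated but the guest never wrote. The baseline ships only written blocks, and each later transfer ships only blocks that differ from the previous snapshot. The function names are my own, purely for illustration.

```python
from typing import Dict, List, Optional

# Toy model, not OSSV's real implementation: one list entry per fixed-size
# block; None marks a block the thick disk allocated but the guest never wrote.
Block = Optional[bytes]


def baseline(disk: List[Block]) -> Dict[int, bytes]:
    """Initial transfer: copy only the blocks that actually hold data."""
    return {i: blk for i, blk in enumerate(disk) if blk is not None}


def incremental(prev: List[Block], curr: List[Block]) -> Dict[int, Block]:
    """Later transfers: copy only blocks that differ from the previous snapshot."""
    return {i: c for i, (p, c) in enumerate(zip(prev, curr)) if p != c}


def apply_delta(replica: Dict[int, bytes], delta: Dict[int, Block]) -> None:
    """Update the destination copy; a None entry means the block was freed."""
    for i, blk in delta.items():
        if blk is None:
            replica.pop(i, None)
        else:
            replica[i] = blk


if __name__ == "__main__":
    # 30 blocks with only 10 in use: think of a 30GB VMDK holding 10GB of data.
    disk: List[Block] = [b"data"] * 10 + [None] * 20
    replica = baseline(disk)
    print(len(replica))   # 10

    snap1 = list(disk)
    disk[3] = b"changed"  # overwrite a block that was already in use
    disk[15] = b"new"     # write into previously unused space
    delta = incremental(snap1, disk)
    print(sorted(delta))  # [3, 15]
    apply_delta(replica, delta)
    print(len(replica))   # 11
```

With 30 blocks standing in for the 30GB VMDK and 10 of them in use, the baseline carries 10 blocks rather than 30; after that, only the two touched blocks cross the wire.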

This is a nice addition to OSSV, and it greatly increases the usefulness of OSSV in VI3 environments. But there is a drawback…has anyone figured it out yet?

If you guessed that taking block-level incremental snapshots of the VMDK files via the Service Console means that we lose file-level granularity within the guest file system, you would be correct! What does this mean? Basically, it means that if you have to restore a VMDK from an OSSV snapshot, you have to restore the entire VMDK. You can’t restore individual files within a guest from an OSSV snapshot taken while OSSV is running at the ESX layer.

Before you start knocking OSSV as worthless, however, consider that VCB suffers the same limitation when creating image-level backups. Also keep in mind that file-level backups are only possible with VCB when the guest is running Microsoft Windows, so you’re forced to use image-level backups with any other operating system, and that file-level backups with VCB are slower than image-level backups. With these considerations in mind, it becomes clearer that the limitation is not inherent to OSSV per se, but rather a limitation of technologies operating at the ESX layer.

Of course, there are a number of workarounds to this; one way is to attach the restored VMDK to a different VM, then pull the individual files out that are needed. If you do need file-level granularity from within the guest OS (such as the ability to quickly and easily restore a specific file within a guest), then you can always run OSSV inside the guest and replicate block-level incremental snapshots from the guest over to a target NetApp storage system. Just be sure to keep these distinct procedures in mind as you plan backups of VMs using OSSV.

As always, I encourage you to ask questions, make comments, or add your thoughts below.

UPDATE: Keep in mind that OSSV is VMotion-aware, meaning that block-level incrementals are preserved even after a VMotion operation. This is true even within DRS/HA clusters; you just have to install the OSSV agent on all ESX servers within the cluster. The real question: what effect will Storage VMotion have?

Category: Virtualization, Storage | 1 Comment »

Virtualization Short Take #8

May 26th, 2008 by slowe

It’s that time again, friends, time for another Virtualization Short Take!

  • OpenSolaris on Fusion: As expected, Solaris/OpenSolaris fans are experimenting with OpenSolaris on Fusion. Apparently, it runs rather well.
  • Brian Madden had an interesting thought about combining Thinstall (now ThinApp) with WINE to eliminate Windows. In the end, Brian feels that many companies will just want to deal with the larger vendors and won’t be willing to support this kind of “cobbled together” solution. Running ThinApp packages on WINE on a non-Windows operating system is a pretty cool idea, but it may be a bit ahead of its time.
  • Microsoft Hyper-V made it to RC1, apparently ahead of schedule. I wonder if they will try to make RTM in time for TechEd in Florida in June? In addition, Microsoft also released information about how they are “eating their own dog food” and using Hyper-V for the MSDN and TechNet web sites.
  • Citrix has released XenDesktop 2.0, their VDI solution. Alessandro has a fairly complete breakdown of the components involved in the solution and the various editions under which it will be released. A lot of these components are pre-existing products that are being rebundled into XenDesktop; XenApp (Presentation Server) and Provisioning Server (Ardence) are two examples. VMware came out with a competitive response almost immediately, and Gareth dissected that response on DABCC. Having not actually installed XenDesktop yet, I don’t know how integrated—or not integrated—the various components are, so I’ll reserve judgment until later. I have my beefs with VDM; in particular, I don’t like how it mandates VM provisioning in order to use pools. I hope that Leostream’s removal of their P>V product as reported by Alessandro doesn’t portend dark days for Leostream.
  • According to Tony Asaro at Virtual Iron, Citrix’s release of XenDesktop signals the beginning of a “shift” in focus from server virtualization to desktop virtualization. One must consider this comment in the context of who is providing it; Virtual Iron is, of course, a competitor in the server virtualization market whose product is also based on the Xen hypervisor. Besides, even if that is true, so what? Citrix has built its business around client-side application delivery. Such a shift would be completely logical, in my mind, and would allow Citrix to focus on an area where they are strong instead of competing in a market where they are weak.
  • Lou Springer brings us a method of connecting to a VM’s console using VNC over SSH from Mac OS X. I’d seen references to using this with VMware Server, but didn’t know that it worked with VI3. Thanks, Lou! (Lou’s trick was based on information from this VMware KB article, by the way.)
  • From IPMer, here’s some information on using VMware Converter to assist with VM snapshots. This was picked up by Rich over at VM /ETC and also included in the first-ever VMware Communities Roundtable podcast (which I’ve downloaded but haven’t yet had the opportunity to actually review).

That’s it for today. I hope that everyone has a great Memorial Day. Don’t forget to thank a veteran or active serviceman/servicewoman for your freedom!

Category: Microsoft, Virtualization | No Comments »

Keeping Thin VMDKs Using NetApp SnapRestore

April 9th, 2008 by slowe

A short while ago, I noted that VMDKs on NFS may start out thin provisioned but will lose that thin provisioned status over time. Operations like cloning and Storage VMotion will cause these thin provisioned disks to become thick (fully provisioned) disks, and you lose one of the benefits of running VMware on NFS.

Fortunately, if you’re running a NetApp storage system as your NFS server, you can preserve the thin provisioned status of these VMDKs by leveraging NetApp’s single file SnapRestore functionality. This article describes how that works.

There are a couple of caveats here:

  1. This technique only helps with making new VMs from an existing VM. SnapRestore won’t help preserve thin provisioned status after a Storage VMotion operation.
  2. This isn’t integrated with VirtualCenter, so you won’t be able to take advantage of VirtualCenter’s integration with Sysprep and such.

As a result of #2 above, then, you’ll need to first prepare your source VM by running Sysprep inside the VM (assuming it is a Windows-based VM) and then allowing Sysprep to shut down the VM. Once that’s accomplished, then you can proceed.

The first step is to take a snapshot of the volume containing the already prepared VM:

snap create <vol-name> <snapshot-name>

Next, create a new VM in VirtualCenter, but do not create a virtual disk for the VM. This will create the VM configuration and associated files and the directory on the NFS datastore.

Third, run a SnapRestore operation to restore both the .vmdk file and the -flat.vmdk files. You have to restore both in order for this to work; keep in mind that the .vmdk file is just a header file and the -flat.vmdk is the actual disk file. The commands would look something like this:

snap restore -t file -s <snapshot-name> -r <new filename and path> <original filename and path>

As an example, let’s say you had a VM named template01 and you wanted to clone the disks for template01 to a new VM called newvm01, and these are stored on a volume called nfsvol. After you’ve run Sysprep on template01, taken the Snapshot and called it base_snapshot, and created newvm01 without a virtual disk, you’d run this command:

snap restore -t file -s base_snapshot -r /vol/nfsvol/newvm01/newvm01.vmdk /vol/nfsvol/template01/template01.vmdk

That would restore the .vmdk (header) file; then you’d restore the actual virtual disk file:

snap restore -t file -s base_snapshot -r /vol/nfsvol/newvm01/newvm01-flat.vmdk /vol/nfsvol/template01/template01-flat.vmdk
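Since the two commands differ only in the file suffix, you may find it handy to generate them when cloning several VMs. The following is a hypothetical Python helper (the name snaprestore_commands is mine, not a NetApp tool) that just builds the command strings using the syntax shown above; it doesn’t connect to the storage system. It assumes each VM has a single disk laid out as /vol/<volume>/<vm>/<vm>.vmdk plus <vm>-flat.vmdk, as in the example.

```python
import posixpath
from typing import List


def snaprestore_commands(vol: str, snapshot: str, src_vm: str, new_vm: str) -> List[str]:
    """Return the header (.vmdk) and data (-flat.vmdk) single-file SnapRestore commands.

    Assumes one disk per VM at /vol/<vol>/<vm>/<vm>.vmdk (hypothetical layout
    taken from the example); multi-disk VMs need one pair of commands per disk.
    """
    commands = []
    for suffix in (".vmdk", "-flat.vmdk"):
        src = posixpath.join("/vol", vol, src_vm, src_vm + suffix)
        dst = posixpath.join("/vol", vol, new_vm, new_vm + suffix)
        # Syntax: snap restore -t file -s <snapshot> -r <new path> <original path>
        commands.append(f"snap restore -t file -s {snapshot} -r {dst} {src}")
    return commands


if __name__ == "__main__":
    for cmd in snaprestore_commands("nfsvol", "base_snapshot", "template01", "newvm01"):
        print(cmd)
```

Run with the example values, it prints the same two commands shown above, ready to paste into the storage system’s console.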

Once this process is complete—and it may take some time depending upon the size of the files being restored—you should see both the .vmdk and the -flat.vmdk files listed in the Datastore Browser.

“But wait a minute, Scott,” you say. “The -flat.vmdk no longer looks thin provisioned. You lied! This process doesn’t work.”

Indeed, it will look like it is no longer thin provisioned. Trust me; there’s one more step and then all will make sense. If you log into the ESX Server and open the .vmdk file in vi, you’ll see that it references the old file name of the -flat.vmdk. Edit that to reflect the new, restored file name, save your changes, and go back to the Datastore Browser again. Refresh the display, and all should be well.
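If you’d rather not edit the header in vi by hand, the change can be scripted. The sketch below is my own illustration, not a VMware tool: it assumes the extent line in the text descriptor follows the usual form RW <sectors> VMFS "<name>-flat.vmdk", and that you run it with Python 3 from a machine that can reach the datastore (for example, a Linux box mounting the same NFS export). It keeps a .bak copy of the original header; check the result before adding the disk to the VM.

```python
import re
import shutil
import sys

# Extent lines in a text .vmdk descriptor look like (assumed form):
#   RW 62914560 VMFS "template01-flat.vmdk"
# i.e. access mode, size in 512-byte sectors, extent type, quoted file name.
EXTENT = re.compile(
    r'^([ \t]*(?:RW|RDONLY|NOACCESS)[ \t]+\d+[ \t]+\S+[ \t]+)"([^"]+)"', re.MULTILINE
)


def retarget_descriptor(text: str, old_flat: str, new_flat: str) -> str:
    """Point extent lines that reference old_flat at new_flat; leave all else as-is."""
    def swap(m: "re.Match[str]") -> str:
        name = new_flat if m.group(2) == old_flat else m.group(2)
        return f'{m.group(1)}"{name}"'
    return EXTENT.sub(swap, text)


def retarget_file(path: str, old_flat: str, new_flat: str) -> None:
    """Rewrite a descriptor file in place after saving a .bak copy."""
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    shutil.copy2(path, path + ".bak")  # keep the original header, just in case
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(retarget_descriptor(text, old_flat, new_flat))


if __name__ == "__main__" and len(sys.argv) == 4:
    # e.g. retarget.py /mnt/nfsvol/newvm01/newvm01.vmdk template01-flat.vmdk newvm01-flat.vmdk
    retarget_file(sys.argv[1], sys.argv[2], sys.argv[3])
```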

Why does the Datastore Browser work that way? Beats me. You’ll also find that running an “ls -la” on an NFS datastore from the Service Console will show you -flat.vmdk files that appear to be thick provisioned. The only way I’ve found to see if they are thin provisioned is to use the Datastore Browser. It’s a VMware thing, I suppose.
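For what it’s worth, the discrepancy is the difference between a file’s apparent size (what ls -la prints) and the blocks actually allocated to it (what du counts). Here’s a small Python sketch that prints both for any -flat.vmdk you can reach over NFS from a Linux or Mac OS X machine. It relies on st_blocks, which counts 512-byte units on those platforms; whether that number reflects what the NetApp has really allocated depends on what the NFS server reports, which I haven’t verified.

```python
import os
import sys
from typing import Tuple


def sizes(path: str) -> Tuple[int, int]:
    """Return (apparent_bytes, allocated_bytes) for a file.

    apparent_bytes is what `ls -la` shows; allocated_bytes is what `du` counts.
    st_blocks is in 512-byte units on Linux and Mac OS X (it doesn't exist on Windows).
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    for path in sys.argv[1:]:
        apparent, allocated = sizes(path)
        pct = 100.0 * allocated / apparent if apparent else 0.0
        # A thin (sparse) disk shows allocated well below apparent.
        print(f"{path}: apparent {apparent:,} bytes, allocated {allocated:,} bytes ({pct:.1f}%)")
```

Pass it one or more paths, such as a hypothetical /mnt/nfsvol/newvm01/newvm01-flat.vmdk on a client mounting the datastore.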

The final step is to edit the settings for the VM you created earlier and add the new virtual disk to that VM. Then you can boot up that VM and proceed with whatever customization steps, if any, are needed.

In the near future I plan to test another possible method of preserving the VMDK’s thin provisioned status, a method that is storage agnostic. Look for details of that testing here.

Category: Virtualization, Storage | 7 Comments »

NetApp Snapshot Support

April 1st, 2008 by slowe

For readers that didn’t already know this, Snapshots on NetApp-hosted CIFS shares are automatically and transparently recognized by Windows versions that support the “Previous Versions” tab. I believe this functionality is native on Windows Server 2003 and an add-on for Windows XP, but I’m not 100% certain. Either way, this means that users can easily recover files from NetApp Snapshots from within the Windows GUI.

Here’s a screenshot of the Previous Versions support on an out-of-the-box server running Windows Server 2003 R2:

[Screenshot: NetApp Snapshots as Previous Versions]

As you can see in the screenshot, the Previous Versions tab automatically shows the NetApp Snapshots. No configuration is required; this is after a vanilla installation of Windows Server 2003 R2 and all applicable updates from Windows Update.

To NetApp veterans, this is nothing new, but I thought some new NetApp users out there might find information about this functionality and integration useful.

Category: Microsoft, Storage | 3 Comments »

Virtualization Short Take #4

March 14th, 2008 by slowe

Once again, here’s my take on a few virtualization-related stories that have passed through my computer in the last few days:

  • OK, this first one isn’t technically related to virtualization, but it was too good to pass up. Is there anyone besides me and The Register who thinks NetApp’s new logo is…um…well, not as good as the previous one?
  • A new blog war is brewing between VMware and Citrix, and this time I had nothing to do with it: VMware apparently launched the first volley in discussing the value of ESX Server’s memory overcommitment and page sharing functionality; Citrix’s Roger Klorese then responded and Simon Crosby chimed in as well. I would completely agree with Roger’s and Simon’s comments, except for this one statement in Eric’s original post:

    We created and powered on 512MB Windows XP VMs running a light workload [emphasis mine] and kept adding them until the server couldn’t take any more.

    Since Eric stated the parameters of the test involved lightly loaded workstations, Roger’s comments about heavy workloads don’t apply. Besides, any engineer worth his/her salt isn’t going to overcommit a production workload like that, and this analysis shows that some overcommitment can produce notable financial results.

  • CIO Magazine recently published a list of 10 virtualization risks hiding in your company. It’s a pretty interesting list, although it’s worthwhile to note that this list was produced by a VP of Marketing for Embotics and therefore is heavily slanted toward the risks that his company’s products can help mitigate.
  • This is interesting and novel, but that’s about it. (UPDATE: The creator of the 37migrations VI plugin, Schley Andrew Kutz, wrote me to state that there is no point in 37migrations; it’s just for fun. So stop trying to find a deeper meaning in it, OK?)
  • There’s apparently a problem with using Sysprep in VirtualCenter 2.5 with Windows Server 2003 SP2. A Microsoft hotfix is available.
  • Speaking of NetApp, they’ve been generating some buzz around their SnapManager for Virtual Infrastructure (SMVI) product, yet another unreleased product. I echo Duncan’s thoughts about the VC plugin!
  • Gabe shares some information he’s gathered about VMsafe, the recently announced security APIs from VMware.
  • Alessandro shares his thoughts about Microsoft’s virtualization strategy following the announcement of Microsoft’s purchase of Kidaro. My question is this: was VMware’s announcement of offline VDI functionality at VMworld Europe 2008 because they had an inkling of Microsoft’s moves, or is Microsoft’s purchase a result of VMware’s announcement?

That’s it for today. Join in the discussion by adding your 2 cents in the comments below!

Category: Security, Microsoft, Virtualization, Storage | 9 Comments »

Recent Virtualization Links

January 20th, 2008 by slowe

Over the last few weeks, I’ve been collecting various virtualization-related links in NetNewsWire’s Flagged Items collection, with the intention of blogging about them, bookmarking them, or both.  With time a bit short recently—let’s just say that life is really, really busy right now—I decided to just condense a bunch of them here with a brief commentary, where applicable, for each.  Hopefully some of this information will prove useful to some readers here.

  • ESX Host Currently Has No Management Network Redundancy Error:  This is new to ESX Server 3.5; VMware HA reports a warning when it detects that there is no redundancy for the Service Console network.  Clearly, this is an attempt to prevent situations where isolation response kicks in, and as the author points out, the warning can be mitigated by adding another NIC to the vSwitch where the Service Console port group is located.  I have also found that creating a second Service Console port group on another vSwitch will remove the warning.  Duncan of Yellow Bricks goes into more detail on Service Console redundancy on his blog.
  • ESX “Configuring for HA” errors - What to do?:  VMware HA continues to be a sore spot, as Rick Vanover discusses here.  One useful tidbit of information from this article is the suggestion to go directly to the VPX_EVENT table of the VirtualCenter database to look for troubleshooting information.  Rick’s right—VirtualCenter’s error messages with regards to VMware HA are often totally useless.
  • How to Use the Remote Command-line Interface to Invoke Storage Vmotion in Windows Server or Desktop:  Jack’s off to a great start to his blog at VMware World with a lot of very relevant and very useful information.  This article on using the RCLI to do Storage VMotion can come in handy at times, until you get the hang of it.  On a related note, Duncan hits us up with some information on useful add-ons for Storage VMotion.
  • Virtual Machine High Availability:  Still listed as an “experimental” feature in VI3 version 3.5, if I recall correctly, Virtual Machine HA uses heartbeats from the VMware Tools inside a guest to try to determine if a guest has failed.  Anyone out there doing more than just experimenting with this?
  • Delete all snapshots:  For those end users that don’t work with snapshots, this article is a must read.
  • VMotion Is Disabled After ESX Server Upgrade:  This can be handy if you were wondering why VMotion suddenly stopped working after the upgrade to ESX Server 3.5.
  • Migration will cause the virtual machine’s configuration to be modified:  It’s still not clear exactly why VirtualCenter is making some changes to virtual machines during a live migration.  Duncan’s explanation about virtualized MMU and paravirtualization support in ESX Server 3.5 makes sense, but what about the commenter’s issue with a migration from ESX Server 3.0.1 to ESX Server 3.0.2?  That doesn’t seem to make any sense, especially on identical hardware.

Anyone with additional information on any of these topics is invited to speak up in the comments.

Category: Networking, Virtualization | 1 Comment »

Managing LUN Space Requirements with NetApp Storage

December 5th, 2007 by slowe

If you’ve worked with Network Appliance storage before, you’re probably already familiar with the idea of snap reserve (storage space set aside to accommodate Snapshots) and fractional reserve (used with LUNs).  I’m going to hold the in-depth discussion of why you need snap reserve and fractional reserve for a different day, but I did want to pass on these commands that were shared with me by a colleague.  These Data ONTAP commands, available with Data ONTAP 7.2 or later (some commands are available in Data ONTAP 7.1), will help you manage the space requirements for LUNs on a NetApp storage area network (SAN).

I’ll try to explain the commands along the way, but I would recommend you review the documentation available from the NOW site for more complete information.

vol options <volname> fractional_reserve 0

This command sets the fractional reserve to zero percent, down from the default of 100 percent.  Note that fractional reserve only applies to LUNs, not to NAS storage presented via CIFS or NFS.
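A quick back-of-the-envelope calculation shows why the default hurts. The Python below is a simplified rule of thumb (often described as "2x + delta"), not NetApp’s exact accounting: it assumes a single space-reserved LUN whose blocks have all been written, so that once a Snapshot exists the overwrite reserve equals fractional_reserve percent of the LUN size, and it lumps the space held by retained Snapshots into one hypothetical delta figure.

```python
def lun_volume_size_gb(lun_gb: float, fractional_reserve_pct: float, delta_gb: float) -> float:
    """Rough worst-case volume size for one space-reserved LUN with Snapshots.

    lun_gb                 -- space-reserved LUN size
    fractional_reserve_pct -- the fractional_reserve volume option (default 100)
    delta_gb               -- change held by retained Snapshots (hypothetical estimate)
    """
    overwrite_reserve_gb = lun_gb * fractional_reserve_pct / 100.0
    return lun_gb + overwrite_reserve_gb + delta_gb


if __name__ == "__main__":
    for pct in (100, 0):
        print(f"fractional_reserve {pct}%: {lun_volume_size_gb(100, pct, 20):.0f} GB")
    # fractional_reserve 100%: 220 GB
    # fractional_reserve 0%: 120 GB
```

For a 100GB LUN with 20GB of Snapshot change, the model needs 220GB at the default and 120GB at zero; the autodelete and autosize settings below are what keep the zero setting from biting you.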

snap autodelete <volname> trigger snap_reserve

This sets the trigger at which Data ONTAP will begin deleting Snapshots.  In this case, Snapshots will start getting deleted when the snap reserve for the volume gets nearly full.  The current size of the snap reserve can be viewed for a particular volume with the “snap reserve <volname>” command.

snap autodelete <volname> defer_delete none

This command instructs Data ONTAP not to exhibit any preference in the types of Snapshots that are deleted.  Options for this command include “user_created” (delete user-created Snapshot copies last) or “prefix” (delete Snapshot copies whose names begin with a specified prefix string last).

snap autodelete <volname> target_free_space 10

With this setting in place, Snapshots will be deleted until there is 10% free space in the volume.
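Here’s a rough Python model of that stopping rule. It’s a simplification of my own: it assumes Snapshots are deleted oldest first with no preference between types (matching defer_delete none above), and that each Snapshot frees a known, fixed amount of space, whereas in reality deleting a Snapshot frees only the blocks that no other Snapshot or the active file system still uses. It also ignores the trigger and models only when deletion stops.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snap:
    name: str
    freed_gb: float  # hypothetical: space released if this Snapshot is deleted


def autodelete(snaps_oldest_first: List[Snap], vol_size_gb: float, used_gb: float,
               target_free_pct: float = 10.0) -> List[str]:
    """Names of Snapshots deleted before free space reaches target_free_pct."""
    deleted: List[str] = []
    for snap in snaps_oldest_first:
        free_pct = 100.0 * (vol_size_gb - used_gb) / vol_size_gb
        if free_pct >= target_free_pct:
            break  # target_free_space reached; stop deleting
        used_gb -= snap.freed_gb
        deleted.append(snap.name)
    return deleted


if __name__ == "__main__":
    snaps = [Snap("hourly.5", 2), Snap("hourly.4", 3), Snap("hourly.3", 4), Snap("hourly.2", 5)]
    print(autodelete(snaps, vol_size_gb=100, used_gb=97))  # ['hourly.5', 'hourly.4', 'hourly.3']
```

Starting at 3% free, three deletions bring the volume to 12% free, so hourly.2 survives.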

snap autodelete <volname> on

Now that the Snapshot autodelete options have been configured, this command will actually turn the functionality on.

vol options <volname> try_first snap_delete

When a FlexVol runs into an issue with space, this option tells Data ONTAP to first try to delete Snapshots in order to free up space.  This command works in conjunction with the next command:

vol autosize <volname> on

This enables Data ONTAP to automatically grow the size of a FlexVol if the need arises.  This command works hand-in-hand with the previous command; Data ONTAP will first try to delete Snapshots to free up space, then grow the FlexVol according to the autosize configuration options.  Between these two options (Snapshot autodelete and volume autogrow), you can reduce the fractional reserve from the default of 100 percent and still make sure that you don’t run into problems taking Snapshots of your LUNs.
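The ordering is easier to see in code. The Python toy below is my own model, not Data ONTAP’s logic: it shrinks each Snapshot to a single "space freed" number, ignores snap reserve and fractional reserve, and treats the autosize maximum and increment as plain numbers. With try_first set to snap_delete it deletes Snapshots (oldest first) before growing the volume; with volume_grow, the opposite.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FlexVol:
    """Toy FlexVol; all sizes in GB and all values hypothetical."""
    size_gb: float
    used_gb: float
    snaps_gb: List[float] = field(default_factory=list)  # space freed per Snapshot, oldest first
    autosize_max_gb: float = 0.0
    autosize_increment_gb: float = 0.0

    def free_gb(self) -> float:
        return self.size_gb - self.used_gb

    def delete_snapshots(self, need_gb: float) -> None:
        while self.snaps_gb and self.free_gb() < need_gb:
            self.used_gb -= self.snaps_gb.pop(0)

    def grow(self, need_gb: float) -> None:
        # Grow in fixed increments, never past the autosize maximum.
        while (self.free_gb() < need_gb and self.autosize_increment_gb > 0
               and self.size_gb < self.autosize_max_gb):
            self.size_gb = min(self.size_gb + self.autosize_increment_gb, self.autosize_max_gb)


def make_room(vol: FlexVol, need_gb: float, try_first: str = "snap_delete") -> bool:
    """Try both remedies in the order try_first dictates; True if enough space results."""
    steps = [vol.delete_snapshots, vol.grow]
    if try_first == "volume_grow":
        steps.reverse()
    for step in steps:
        if vol.free_gb() >= need_gb:
            break
        step(need_gb)
    return vol.free_gb() >= need_gb


if __name__ == "__main__":
    vol = FlexVol(size_gb=100, used_gb=95, snaps_gb=[2, 2],
                  autosize_max_gb=120, autosize_increment_gb=5)
    print(make_room(vol, need_gb=10), vol.size_gb, vol.snaps_gb)  # True 105 []
```

In the example, deleting both Snapshots only gets the volume to 9GB free, so autosize then adds one 5GB increment.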

If you have a NOW login, you can get more information on Snapshot autodelete here; more information on volume autogrow is available here.  Be aware that SnapDrive may require different settings in order to accommodate its functionality, as it moves LUN management out of the storage system and onto the host.  Finally, the values presented here are only examples; be sure to use values that are appropriate for your environment.

Credit for compiling this list goes to my colleague Chauncey Willard.  Good work!

Category: Storage | 5 Comments »

Nifty NFS-VMware Trick

September 27th, 2007 by slowe

I can take absolutely zero credit for this idea; it came completely from this article by Nick Triantos.  But the trick is so absolutely cool, so incredibly useful, and yet so obvious (once you read it, you’ll smack yourself in the head and say, “Why didn’t I think of that?”) that I just had to say something about it.

The use of NFS is getting more and more attention (I blogged about it briefly a few days ago) as a primary storage technology for VMware deployments.  Although NFS lacks the raw throughput of Fibre Channel, once you start loading up VMs in a datastore NFS begins to look more and more attractive.  But performance is only part of the allure here, especially when using something like a Network Appliance storage system with its Snapshot functionality.  (Yes, other vendors can do the same kinds of things.  Substitute your favorite vendor or filesystem here, if you so desire.  I would imagine you could do something similar with ZFS.)

The basic gist of the article (I do encourage you to go read it; I’ve already added it to my del.icio.us bookmarks) is to use NetApp Snapshots to gain access to VMware’s VMDK files (even while the VM is running), and Linux with the Linux-NTFS driver to mount virtual machine disk files over NFS for file-level backups of both Windows and Linux guest VMs.  Now that’s something not even VCB can do (VCB file-level backups are limited to Windows guests).  Pretty cool, if you ask me.

Category: Virtualization, Storage | 11 Comments »

LUN Clones vs. FlexClones

May 21st, 2007 by slowe

My recent article on how to provision VMs using FlexClones prompted a reader to ask the question, “What about using LUN clones?”  That’s an excellent question, and one that I myself asked when I first started using some of the advanced functionality of Network Appliance storage systems.  I had expected that this question would come up, and so I’d already begun preparing an article discussing LUN clones vs. FlexClones.  My thanks go to Aaron for prompting the discussion!

LUN clones and FlexClones share a lot of similarities:

  • Both LUN clones and FlexClones are built on top of the Snapshot functionality resident within Data ONTAP, the OS that runs on Network Appliance storage systems.
  • Both LUN clones and FlexClones are space conservative, meaning the clones only take up as much space as required to store changes from the original.
  • Both LUN clones and FlexClones can be created in seconds, and the size of the LUN (or FlexVol) does not significantly impact the time required to create the clone.

The key disadvantage to using LUN clones comes as a result of an interaction between how WAFL (the file system used by Data ONTAP) handles LUNs, and how Snapshots are performed and managed.

From WAFL’s perspective, a LUN is really nothing more than a single file on the file system.  You can see this by browsing via CIFS or NFS to a FlexVol that contains a LUN:

[macosx:/Volumes/vol02$] slowe% ls -la
total 25165856
drwx------   1 slowe  admin        16384 Dec 31  1969 .
drwxrwxrwt   8 root   admin          272 May 21 11:47 ..
-rwx------   1 slowe  admin  12884901888 May 21 11:48 vswex02_vmfs

If I were to enable the .snapshot (or ~snapshot) directory, we’d actually be able to see Snapshots of the LUN within that directory.  In fact, this NOW (NetApp on the Web; login required) article describes mounting LUN snapshots inside the .snapshot (or ~snapshot) directory as a way of recovering files or folders inside a snapshot.  This technique is also applicable to recovering VMs from a LUN snapshot.

“OK,” you may be saying, “so LUNs are implemented and managed as files.  What’s your point?”

My point is that Snapshots are handled per volume, and capture all the data in the active filesystem.  A LUN exists as a file in the filesystem, so a Snapshot will capture that.  When you create a LUN clone, you will then create another file in the active filesystem, which subsequent Snapshots will then capture.  The end result is that you can end up with Snapshots that cannot be deleted because they reference a LUN clone which is, in turn, backed by another Snapshot.  In these cases, you won’t be able to delete Snapshots until you delete the LUN clone and all the Snapshots that reference that LUN clone.  This blog posting discusses this very problem and provides a Data ONTAP command to help track down the dependencies.  (I’m also told that there is a NOW article on this problem as well, but I was unable to locate it.)
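To make the dependency chain concrete, here’s a toy Python model of it. It’s my own simplification, not Data ONTAP’s bookkeeping (and not the command referenced in that posting): each Snapshot is just a name plus the set of files it captured, and a LUN clone’s backing Snapshot is treated as locked while the clone exists in the active file system or in any Snapshot. The file and Snapshot names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model: each Snapshot is (name, set of file names it captured).
Snapshot = Tuple[str, Set[str]]


def busy_snapshots(snapshots: List[Snapshot], active_files: Set[str],
                   clone_backing: Dict[str, str]) -> Set[str]:
    """Return Snapshot names that cannot be deleted because a LUN clone still needs them.

    clone_backing maps a LUN clone's file name to the Snapshot backing it. The
    backing Snapshot stays locked while the clone exists in the active file
    system or inside any Snapshot that captured it.
    """
    busy: Set[str] = set()
    for clone, backing in clone_backing.items():
        referenced = clone in active_files or any(clone in files for _, files in snapshots)
        if referenced:
            busy.add(backing)
    return busy


if __name__ == "__main__":
    snaps = [("nightly.1", {"lun0"}),                   # backs the clone
             ("nightly.0", {"lun0", "lun0_clone"})]     # captured the clone later
    clones = {"lun0_clone": "nightly.1"}
    print(busy_snapshots(snaps, {"lun0", "lun0_clone"}, clones))  # {'nightly.1'}
    # Deleting only the clone is not enough: nightly.0 still contains it.
    print(busy_snapshots(snaps, {"lun0"}, clones))                # {'nightly.1'}
    # Delete the clone *and* the Snapshots that captured it.
    print(busy_snapshots(snaps[:1], {"lun0"}, clones))            # set()
```

The middle case is the trap: removing the LUN clone alone doesn’t release nightly.1, because nightly.0 still holds a copy of the clone.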

For short-term scenarios, LUN cloning works well and is, as some have pointed out, free (FlexClone requires a separate, paid license).  For longer-term storage scenarios, however, LUN clones and the dependencies introduced by subsequent Snapshots of those LUN clones mean that FlexClones are a better solution.  Since FlexClones are entire volumes stored within an aggregate, they aren’t subject to the same problems as LUN clones (which are stored within a volume).

I hope this helps clear up some of the differences between using LUN clones and using FlexClones.  Add your comments or questions below.

Category: Storage | 4 Comments »