Snapshots


Over the last few weeks, I’ve been collecting various virtualization-related links in NetNewsWire’s Flagged Items collection, with the intention of blogging about them, bookmarking them, or both. With time a bit short recently—let’s just say that life is really, really busy right now—I decided to just condense a bunch of them here with a brief commentary, where applicable, for each. Hopefully some of this information will prove useful to some readers here.

  • ESX Host Currently Has No Management Network Redundancy Error: This is new to ESX Server 3.5; VMware HA reports a warning when it detects that there is no redundancy for the Service Console. Clearly, this is an attempt to prevent situations where isolation response kicks in, and as the author points out, the warning can be cleared by adding another NIC to the vSwitch where the Service Console port group is located. I have found that creating a second Service Console port group on another vSwitch will also remove the warning. Duncan of Yellow Bricks goes into more detail on Service Console redundancy on his blog.
  • ESX “Configuring for HA” errors – What to do?: VMware HA continues to be a sore spot, as Rick Vanover discusses here. One useful tidbit of information from this article is the suggestion to go directly to the VPX_EVENT table of the VirtualCenter database to look for troubleshooting information. Rick’s right: VirtualCenter’s error messages with regard to VMware HA are often totally useless.
  • How to Use the Remote Command-line Interface to Invoke Storage Vmotion in Windows Server or Desktop: Jack’s off to a great start to his blog at VMware World with a lot of very relevant and very useful information. This article on using the RCLI to do Storage VMotion can come in handy at times, until you get the hang of it. On a related note, Duncan hits us up with some information on useful add-ons for Storage VMotion.
  • Virtual Machine High Availability: Still listed as an “experimental” feature in VI3 version 3.5, if I recall correctly, Virtual Machine HA uses heartbeats from the VMware Tools inside a guest to try to determine if a guest has failed. Anyone out there doing more than just experimenting with this?
  • Delete all snapshots: For those end users that don’t work with snapshots, this article is a must read.
  • VMotion Is Disabled After ESX Server Upgrade: This can be handy if you were wondering why VMotion suddenly stopped working after the upgrade to ESX Server 3.5.
  • Migration will cause the virtual machine’s configuration to be modified: It’s still not clear exactly why VirtualCenter is making some changes to virtual machines during a live migration. Duncan’s explanation about virtualized MMU and paravirtualization support in ESX Server 3.5 makes sense, but what about the commenter’s issue with a migration from ESX Server 3.0.1 to ESX Server 3.0.2? That doesn’t seem to make any sense, especially on identical hardware.

Anyone with additional information on any of these topics is invited to speak up in the comments.


If you’ve worked with Network Appliance storage before, you’re probably already familiar with the idea of snap reserve (storage space set aside to accommodate Snapshots) and fractional reserve (used with LUNs). I’m going to hold the in-depth discussion of why you need snap reserve and fractional reserve for a different day, but I did want to pass on these commands that were shared with me by a colleague of mine. These Data ONTAP commands, available with Data ONTAP 7.2 or later (some commands are available in Data ONTAP 7.1), will help you manage the space requirements for LUNs on a NetApp storage area network (SAN).

I’ll try to explain the commands along the way, but I would recommend you review the documentation available from the NOW site for more complete information.

vol options <volname> fractional_reserve 0

This command sets the fractional reserve to zero percent, down from the default of 100 percent. Note that fractional reserve only applies to LUNs, not to NAS storage presented via CIFS or NFS.

snap autodelete <volname> trigger snap_reserve

This sets the trigger at which Data ONTAP will begin deleting Snapshots. In this case, Snapshots will start getting deleted when the snap reserve for the volume gets nearly full. The current size of the snap reserve can be viewed for a particular volume with the “snap reserve <volname>” command.

snap autodelete <volname> defer_delete none

This command instructs Data ONTAP not to exhibit any preference in the types of Snapshots that are deleted. Options for this command include “user_created” (delete user-created Snapshot copies last) or “prefix” (Snapshot copies with a specified prefix string).

snap autodelete <volname> target_free_space 10

With this setting in place, Snapshots will be deleted until there is 10% free space in the volume.

snap autodelete <volname> on

Now that the Snapshot autodelete options have been configured, this command will actually turn the functionality on.

vol options <volname> try_first snap_delete

When a FlexVol runs into an issue with space, this option tells Data ONTAP to first try to delete Snapshots in order to free up space. This command works in conjunction with the next command:

vol autosize <volname> on

This enables Data ONTAP to automatically grow the size of a FlexVol if the need arises. This command works hand-in-hand with the previous command; Data ONTAP will first try to delete Snapshots to free up space, then grow the FlexVol according to the autosize configuration options. Between these two options—Snapshot autodelete and volume autogrow—you can reduce the fractional reserve from the default of 100 and still make sure that you don’t run into problems taking Snapshots of your LUNs.
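To keep the whole sequence in one place, here is a minimal Python sketch that simply prints the commands discussed above for a given volume. The function name and the volume name "vmfs_vol01" are my own placeholders, and the value 10 is just the example value used above; nothing here talks to a storage system.

```python
def space_management_commands(volname, target_free_space=10):
    """Return the Data ONTAP commands from this walkthrough for one volume.

    volname and target_free_space are illustrative; pick values that fit
    your own environment.
    """
    return [
        # Drop fractional reserve from the default of 100 percent to 0.
        f"vol options {volname} fractional_reserve 0",
        # Configure Snapshot autodelete, then switch it on.
        f"snap autodelete {volname} trigger snap_reserve",
        f"snap autodelete {volname} defer_delete none",
        f"snap autodelete {volname} target_free_space {target_free_space}",
        f"snap autodelete {volname} on",
        # Delete Snapshots first, then let the FlexVol grow.
        f"vol options {volname} try_first snap_delete",
        f"vol autosize {volname} on",
    ]


if __name__ == "__main__":
    # "vmfs_vol01" is a hypothetical volume name.
    for command in space_management_commands("vmfs_vol01"):
        print(command)
```

The output can be reviewed and pasted into a console session on the storage system; generating text rather than executing anything keeps a human in the loop.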

If you have a NOW login, you can get more information on Snapshot autodelete here; more information on volume autogrow is available here. Be aware that SnapDrive may require different settings in order to accommodate its functionality, as it moves LUN management out of the storage system and onto the host. Finally, the values presented here are only examples; be sure to use values that are appropriate for your environment.

Credit for compiling this list goes to my colleague Chauncey Willard. Good work!


Last year, I wrote an article about using NetApp Snapshots and LUN clones to enable the recovery of individual files within a VM.  This time around, I’d like to have a quick look at that same process, but using NFS instead of block-level storage.

As I mentioned a couple of weeks ago, NFS is getting more and more attention as a key storage enabler for Virtual Infrastructure implementations.  I do still plan to conduct some tests of my own between iSCSI and NFS.  (Since they are both IP-based storage protocols, I figure that makes the playing field as level as possible.)  In any case, with regard to file-level recovery within VMs, NFS does possess at least one advantage.

Using any sort of clones (LUN clones or FlexClones) within VI3 currently requires resignaturing to be enabled, or else the ESX Servers don’t even see the clones.  While enabling resignaturing is not difficult (it can be done via the command line or via VirtualCenter), it is not the default configuration and VMware appears not to recommend it (per the SAN Configuration Guide, pages 112 through 115).  With NFS, it’s only necessary to create a FlexClone and set up a new NFS mount; no other configuration is required.

By the same token, using NFS for file-level recovery within VMs also has one key disadvantage:  LUN clones are free, whereas the use of FlexClone requires a license.

With these advantages and disadvantages in mind, let’s have a look at what the process would look like to recover files inside VMs using NFS for VM storage with NetApp Snapshots.

First, we’d review the list of available Snapshots using the snap list command, as shown below:

filer> snap list nfs_volume1
Volume nfs_volume1
working...
 
%/used %/total date name
---------- ---------- ------------ --------
0% ( 0%) 0% ( 0%) Oct 08 12:00 hourly.0
0% ( 0%) 0% ( 0%) Oct 08 08:00 hourly.1
0% ( 0%) 0% ( 0%) Oct 08 00:00 nightly.0
0% ( 0%) 0% ( 0%) Oct 07 20:00 hourly.2
0% ( 0%) 0% ( 0%) Oct 07 16:00 hourly.3
0% ( 0%) 0% ( 0%) Oct 07 12:00 hourly.4
0% ( 0%) 0% ( 0%) Oct 07 08:00 hourly.5
0% ( 0%) 0% ( 0%) Oct 07 00:00 nightly.1
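If the list is long, picking the right Snapshot by eye gets tedious. Here is a small Python sketch, assuming the output has been saved as text, that pulls the date, time, and name out of each row and lists the Snapshots taken on a given day. The function names are hypothetical, and the parsing assumes the row layout shown above.

```python
import re

# Matches the trailing "Mon DD HH:MM name" part of a `snap list` row.
ROW = re.compile(r"([A-Z][a-z]{2}) +(\d{1,2}) +(\d{2}:\d{2}) +(\S+)\s*$")


def parse_snap_list(output):
    """Return (date, time, name) tuples in the order the filer listed them."""
    rows = []
    for line in output.splitlines():
        match = ROW.search(line)
        if match:  # header, separator, and status lines do not match
            month, day, clock, name = match.groups()
            rows.append((f"{month} {int(day):02d}", clock, name))
    return rows


def snapshots_on(rows, date):
    """Return the names of the Snapshots taken on a date such as 'Oct 07'."""
    return [name for row_date, _clock, name in rows if row_date == date]
```

Feeding it the listing above and asking for "Oct 08" would return hourly.0, hourly.1, and nightly.0, in that order.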

Once we identify the Snapshot that contains the data we need to recover (based on the date/time of the Snapshot), we create a FlexClone using that Snapshot as its backing:

vol clone create nfs_volume1_clone -s file -b nfs_volume1 nightly.0

This creates a FlexClone named “nfs_volume1_clone” based on the nightly.0 Snapshot of the volume nfs_volume1.  If you immediately run the exportfs command, you’ll see that the new clone is already shared via NFS, too.

From here, the process is pretty straightforward:

  1. Create a new NFS datastore within VirtualCenter, using the new NFS mount as the destination.  This makes the data inside the FlexClone visible to the existing VMs.
  2. Add one of the VMDKs on the cloned NFS datastore to an existing VM as an additional hard drive.  You should be able to do this on the fly without shutting down the VM.
  3. Extract the files you need and place them back where you want them.

When you’re done recovering files, the clean-up process looks like this:

  1. Remove the added VMDK(s) from the VM.
  2. Remove the NFS datastore from VirtualCenter.
  3. Destroy the FlexClone using the vol offline and vol destroy commands.
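The storage-side half of this process boils down to three commands. The following Python sketch only builds the command strings so they can be reviewed before being typed on the storage system; the function names are my own, and the volume and Snapshot names are the ones used in the example above.

```python
def flexclone_create_command(clone, parent, snapshot):
    """Build the command that creates a FlexClone backed by a Snapshot."""
    return f"vol clone create {clone} -s file -b {parent} {snapshot}"


def flexclone_cleanup_commands(clone):
    """Build the commands that take the FlexClone offline and destroy it."""
    return [f"vol offline {clone}", f"vol destroy {clone}"]


if __name__ == "__main__":
    print(flexclone_create_command("nfs_volume1_clone", "nfs_volume1", "nightly.0"))
    for command in flexclone_cleanup_commands("nfs_volume1_clone"):
        print(command)
```

Run the cleanup commands only after the VMDKs and the NFS datastore have been removed on the VMware side, in the order given in the list above.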

Overall, this process is rather similar to the technique described using LUN clones, although a bit simpler because resignaturing is not required.


Nifty NFS-VMware Trick

I can take absolutely zero credit for this idea; it came completely from this article by Nick Triantos.  But the trick is so absolutely cool, so incredibly useful, and yet so obvious (once you read it, you’ll smack yourself in the head and say, “Why didn’t I think of that?”) that I just had to say something about it.

The use of NFS is getting more and more attention (I blogged about it briefly a few days ago) as a primary storage technology for VMware deployments.  Although NFS lacks the raw throughput of Fibre Channel, once you start loading up VMs in a datastore NFS begins to look more and more attractive.  But performance is only part of the allure here, especially when using something like a Network Appliance storage system with its Snapshot functionality.  (Yes, other vendors can do the same kinds of things.  Substitute your favorite vendor or filesystem here, if you so desire.  I would imagine you could do something similar with ZFS.)

The basic gist of the article (I do encourage you to go read it; I’ve already added it to my del.icio.us bookmarks) is to use NetApp Snapshots to gain access to VMware’s VMDK files (even while the VM is running), and Linux with the Linux-NTFS driver to mount virtual machine disk files over NFS for file-level backups of both Windows and Linux guest VMs.  Now that’s something not even VCB can do (VCB file-level backups are limited to Windows guests).  Pretty cool, if you ask me.


My recent article on how to provision VMs using FlexClones prompted a reader to ask the question, “What about using LUN clones?”  That’s an excellent question, and one that I myself asked when I first started using some of the advanced functionality of Network Appliance storage systems.  I had expected that this question would come up, and so I’d already begun preparing an article discussing LUN clones vs. FlexClones.  My thanks go to Aaron for prompting the discussion!

LUN clones and FlexClones share a lot of similarities:

  • Both LUN clones and FlexClones are built on top of the Snapshot functionality resident within Data ONTAP, the OS that runs on Network Appliance storage systems.
  • Both LUN clones and FlexClones are space conservative, meaning the clones only take up as much space as required to store changes from the original.
  • Both LUN clones and FlexClones can be created in seconds, and the size of the LUN (or FlexVol) does not significantly impact the time required to create the clone.

The key disadvantage to using LUN clones comes as a result of an interaction between how WAFL (the file system used by Data ONTAP) handles LUNs, and how Snapshots are performed and managed.

From WAFL’s perspective, a LUN is really nothing more than a single file on the file system.  You can see this by browsing via CIFS or NFS to a FlexVol that contains a LUN:

[macosx:/Volumes/vol02$] slowe% ls -la
total 25165856
drwx------   1 slowe  admin        16384 Dec 31  1969 .
drwxrwxrwt   8 root   admin          272 May 21 11:47 ..
-rwx------   1 slowe  admin  12884901888 May 21 11:48 vswex02_vmfs
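As a quick sanity check on that listing, the arithmetic below shows that the file is exactly 12 GiB. The 512-byte block size behind the "total" line is my assumption (it is the usual default for ls on Mac OS X), and the helper names are hypothetical.

```python
def gib(size_in_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return size_in_bytes / 2**30


def blocks(size_in_bytes, block_size=512):
    """Number of whole blocks of block_size bytes (512 is an assumption)."""
    return size_in_bytes // block_size


lun_bytes = 12884901888  # size of vswex02_vmfs in the listing
dir_bytes = 16384        # size of "." in the listing

print(gib(lun_bytes))                # 12.0
print(blocks(lun_bytes + dir_bytes)) # 25165856, the "total" in the listing
```

In other words, the listing is consistent with a single 12 GiB LUN file plus the directory entry, which is all WAFL needs to hold the LUN.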

If I were to enable the .snapshot (or ~snapshot) directory, we’d actually be able to see Snapshots of the LUN within that directory.  In fact, this NOW (NetApp on the Web; login required) article describes mounting LUN snapshots inside the .snapshot (or ~snapshot) directory as a way of recovering files or folders inside a snapshot.  This technique is also applicable to recovering VMs from a LUN snapshot.

“OK,” you may be saying, “so LUNs are implemented and managed as files.  What’s your point?”

My point is that Snapshots are handled per volume, and capture all the data in the active filesystem.  A LUN exists as a file in the filesystem, so a Snapshot will capture that.  When you create a LUN clone, you will then create another file in the active filesystem, which subsequent Snapshots will then capture.  The end result is that you can end up with Snapshots that cannot be deleted because they reference a LUN clone which is, in turn, backed by another Snapshot.  In these cases, you won’t be able to delete Snapshots until you delete the LUN clone and all the Snapshots that reference that LUN clone.  This blog posting discusses this very problem and provides a Data ONTAP command to help track down the dependencies.  (I’m also told that there is a NOW article on this problem as well, but I was unable to locate it.)
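To make the dependency chain concrete, here is a toy Python model of the situation just described. It is purely illustrative: the data structures and the function name are my own invention, it does not query Data ONTAP, and it captures only the rule stated above (delete the LUN clone and every Snapshot that captured it before the backing Snapshot can go).

```python
def blocking_items(backing_snapshot, clones, snapshots):
    """List what must be removed before backing_snapshot can be deleted.

    clones:    dict mapping LUN clone name -> name of its backing Snapshot
    snapshots: dict mapping Snapshot name -> set of file names it captured
    Returns (clones to delete, Snapshots to delete), each sorted by name.
    """
    dependent_clones = sorted(
        clone for clone, backing in clones.items() if backing == backing_snapshot
    )
    dependent_snapshots = sorted(
        name
        for name, files in snapshots.items()
        if name != backing_snapshot and files & set(dependent_clones)
    )
    return dependent_clones, dependent_snapshots


if __name__ == "__main__":
    # Hypothetical volume: one LUN, one clone backed by nightly.1, and two
    # later Snapshots that captured the clone as just another file.
    clones = {"lun0_clone": "nightly.1"}
    snapshots = {
        "nightly.1": {"lun0_vmfs"},
        "hourly.1": {"lun0_vmfs", "lun0_clone"},
        "hourly.0": {"lun0_vmfs", "lun0_clone"},
    }
    print(blocking_items("nightly.1", clones, snapshots))
```

In this example, nightly.1 stays locked until lun0_clone, hourly.0, and hourly.1 are all gone, which is exactly the kind of chain that makes long-lived LUN clones awkward.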

For short-term scenarios, LUN cloning works well and is, as some have pointed out, free (FlexClone requires a separate, paid license).  For longer-term storage scenarios, however, LUN clones and the dependencies introduced by subsequent Snapshots of those LUN clones mean that FlexClones are a better solution.  Since FlexClones are entire volumes stored within an aggregate, they aren’t subject to the same problems as LUN clones (which are stored within a volume).

I hope this helps clear up some of the differences between using LUN clones and using FlexClones.  Add your comments or questions below.


Network Appliance Snapshots—point-in-time copies of a file system that can be created almost instantaneously and which generally require much smaller amounts of storage to keep—are an integral part of NetApp’s value over other storage systems. These snapshots make it far easier and quicker to recover from data loss or corruption than a tape backup system.

But how do we go about recovering individual files from a snapshot when those files are stored in a virtual disk (VMDK) file used by a VM? After all, VMware proponents tout the encapsulation property of virtualization as a benefit: “One file to back up and you get a backup of your entire server!” Fortunately, there’s a way to continue to reap the benefits of encapsulation while still allowing for the ability to recover individual files from a snapshot of the VM’s virtual disk file. Here’s how.

The trick here is to take advantage of LUN cloning, a feature on the NetApp storage systems that allows you to take a snapshot—which is a read-only point-in-time copy of the file system—and create a clone, which is a read-write point-in-time copy of the file system. This clone takes only seconds to create, like the snapshot on which it is based, and requires only enough storage to store the changed blocks, i.e., the “deltas” between the clone and the original. We can then present that clone back to VMware ESX Server to manipulate in whatever way we see fit.

There are three parts to this process. First, we configure ESX Server to recognize snapshot LUNs on the SAN (this is a one-time configuration change). Then, we take the snapshot on the NetApp storage system, create a LUN clone from the snapshot, and present that LUN clone back to the ESX servers. Finally, we manipulate the LUN clone within ESX in order to retrieve the specific data we need.

Enable Resignaturing on ESX Server

In the ESX SAN Configuration Guide (found here on VMware’s site), there is this blurb about resignaturing:

VMFS volume resignaturing allows you to make a hardware snapshot of a VMFS volume and access that snapshot from an ESX Server system.

This is the functionality that allows us to use LUN clones on the NetApp storage system in ESX Server. Without this functionality, the LUN clones aren’t properly recognized by ESX Server and can’t be utilized to allow us to perform data recovery.

To enable VMFS volume resignaturing, set the LVM.EnableResignature option to 1 (on). This option can be set in VirtualCenter using these steps:

  1. Select the ESX Server host for which you want to enable VMFS volume resignaturing.
  2. Go to the Configuration tab for that host, then select Advanced Settings.
  3. Change LVM.EnableResignature to 1 (on). The default is 0 (off).

After this option is set, you’ll be able to present LUN clones (or other hardware snapshots) to ESX Server and it will recognize them as such.
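The same setting can also be changed from the ESX service console. The sketch below builds the two commands I would expect to use there, going by my recollection of the ESX 3.x tools (esxcfg-advcfg to set the option, esxcfg-rescan to rescan an adapter); treat both as assumptions to check against your ESX version. The adapter name "vmhba40" is a placeholder and will differ between hosts, and the function name is hypothetical.

```python
def resignature_commands(enable=True, adapter="vmhba40"):
    """Build service console commands to toggle resignaturing and rescan.

    adapter is a placeholder; check the storage adapter names on your host.
    """
    value = 1 if enable else 0
    return [
        # Set the LVM.EnableResignature advanced option (1 = on, 0 = off).
        f"esxcfg-advcfg -s {value} /LVM/EnableResignature",
        # Rescan the adapter so newly presented LUNs are picked up.
        f"esxcfg-rescan {adapter}",
    ]


if __name__ == "__main__":
    for command in resignature_commands():
        print(command)
```

Calling resignature_commands(enable=False) gives the commands to switch the option back off once the recovery is finished.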

Now we’re ready to move to the NetApp storage system.

Taking a Snapshot and Making a LUN Clone

By default, snapshots are already enabled and scheduled, so unless you’ve modified the configuration, the NetApp storage system is already taking snapshots of the volumes that hold the LUNs where the VMware VMFS partitions (and thus the VMDK virtual disk files) are stored.

We can view the list of snapshots like this:

filer> snap list vol_name
Volume vol_name
working...
 
  %/used       %/total  date          name
----------  ----------  ------------  --------
  0% ( 0%)    0% ( 0%)  Dec 30 08:00  hourly.0
  1% ( 0%)    0% ( 0%)  Dec 30 00:00  nightly.0
  1% ( 0%)    0% ( 0%)  Dec 29 20:00  hourly.1
  1% ( 0%)    0% ( 0%)  Dec 29 16:00  hourly.2
  1% ( 0%)    0% ( 0%)  Dec 29 12:00  hourly.3
  2% ( 1%)    0% ( 0%)  Dec 29 08:00  hourly.4
  2% ( 0%)    0% ( 0%)  Dec 29 00:00  nightly.1
  3% ( 0%)    0% ( 0%)  Dec 28 20:00  hourly.5

Now, we can make a LUN clone from one of these snapshots and map it to an igroup (this would normally all be on a single line, but I’ve wrapped it here for readability):

filer> lun clone create /vol/vol_name/lun0_clone
-b /vol/vol_name/lun0_vmfs nightly.1
filer> lun map /vol/vol_name/lun0_clone igroup_name 0
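Since the clone command above is wrapped, here is a short Python sketch that assembles both commands as single lines from their parts. The function and parameter names are my own; the values in the example are the placeholder names used above.

```python
def lun_clone_commands(volume, parent_lun, clone_lun, snapshot, igroup, lun_id):
    """Build the single-line commands to clone a LUN and map the clone."""
    base = f"/vol/{volume}"
    return [
        # Create a clone of parent_lun backed by the chosen Snapshot.
        f"lun clone create {base}/{clone_lun} -b {base}/{parent_lun} {snapshot}",
        # Present the clone to the initiator group with the given LUN ID.
        f"lun map {base}/{clone_lun} {igroup} {lun_id}",
    ]


if __name__ == "__main__":
    for command in lun_clone_commands(
        "vol_name", "lun0_vmfs", "lun0_clone", "nightly.1", "igroup_name", 0
    ):
        print(command)
```

Printing the commands instead of running them makes it easy to double-check the Snapshot name and LUN ID before anything changes on the storage system.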

The LUN clone has now been created and presented back to the igroup named igroup_name as LUN ID 0. A rescan of the storage adapters in ESX Server (iSCSI was being used in this case) will now show the LUN clone as “snap-00000001-lun0_vmfs” (the number will change depending upon how many snapshot LUNs have been presented to the server farm). Now that we have access to the VMFS, we can do any number of things:

  • We can create a new VM with the same configuration as the original VM and boot it up to recover data from the VM in that manner (be cautious of networking issues, such as duplicate IP addresses). You’ll just need to select the existing VMDK (or VMDKs, if there is more than one) on the snapshot VMFS LUN instead of creating a new virtual disk file when creating the VM.
  • We can attach the VMDK(s) to an existing VM running the same operating system (or an operating system that will read the file system used inside the VMDKs in question) and then browse the file system to retrieve data stored inside the VM.
  • We could shut down the original VM and boot up the clone VM instead. This might be helpful if you needed to recover data but also needed network connectivity, or if the two VMs couldn’t be running at the same time. (In theory, this might work for Microsoft Exchange, if you aren’t using SnapManager for Exchange.)

As you can see, this allows us to take full advantage of encapsulating the server in the VMDK file(s) but also allows us to retrieve individual files or groups of files from a snapshot of the VMDK file(s).

In future articles, I’ll touch on restoring entire VMs using NetApp snapshots, as well as talk about getting consistent snapshots of the VMs.

Other Information

This process was performed on a Network Appliance FAS810 running Data ONTAP 7.1.1.1 and servers running VMware ESX Server (both 3.0.0 and 3.0.1) with the software iSCSI initiator. VMs running Windows Server 2003 R2 were used for testing.

