Storage Array Snapshots with VMware

A new article of mine has been published on SearchVMware.com! This article discusses the use of storage array snapshots with VMware, specifically focusing on ensuring that storage array snapshots are consistent and usable:

When used in conjunction with a VMware infrastructure, storage array-based snapshots are touted for their ability to create point-in-time pictures of virtual machines (VMs) for business continuity, disaster recovery and backups. While this can be true, it’s important to understand how virtualization affects storage array snapshot use. Used incorrectly, storage array snapshots can become unreliable or even unusable.

The article provides a few guidelines on making sure that storage array snapshots are usable. Keep in mind, too, that some storage array vendors have applications that are specifically designed to help with this particular issue. NetApp, for example, has SnapManager for Virtual Infrastructure, which addresses this problem (among others). I would imagine that other vendors also offer software solutions to this problem, but I’m not particularly familiar with them. I’d love to hear about readers’ experience with, or knowledge of, any such software.


This post is very timely for me, as I have just had discussions on this topic with both VMware and NetApp. What is your take on using VMware’s method to quiesce the filesystem? I have seen some issues with the filesystem flush and database-backed apps such as AD, Exchange, SQL, etc.

I agree that cold or warm snapshots probably give the best guarantee on consistent snapshots.

As to the difference between hot snapshots and VMware+array snapshots, in practice we’ve seen little difference in recoverability. If anything, virtual machines that have associated VM snapshots cause extra problems if you FlexClone them and import the cloned VM.

We take a hot (NetApp) snapshot of all our datastores every hour. We used to take VM snapshots of the VMs before taking the NetApp snapshot, but this hurt performance and occasionally even locked up a VM. Now we only take NetApp snapshots.

Wade,

First, see KP’s very informative comment; in his experience, adding VMware snapshots to the mix doesn’t seem to help much. I agree that not all workloads are good candidates for VMware snapshots, and that also rules out most hot-backup technologies, including VCB, for those workloads.

Second, I’d recommend application-aware utilities such as those offered by NetApp (SnapManager for Exchange, SnapManager for SQL, etc.); they can provide the guaranteed consistency and data integrity that these types of applications need, and their use is, as I understand it, fully supported within virtual machines.

Thanks for reading!

KP,

Thanks for your very helpful comment. Your experience in using VMware snapshots along with NetApp snapshots is very relevant. Thanks for reading!

Other than vendor recommendations, does anyone have direct evidence that a snapshotted VMFS3 filesystem fares any worse than one that has gone through a power failure?

VMFS2 had significant issues with crash consistency, but in my experience NTFS on VMFS3 survives fairly well when you pull the plug (or recover the snapshot).

I wouldn’t recommend this for super-critical financial applications, but taking three snapshots a day this way has very rarely failed.

Does anybody know if there is a “SnapManager for Virtual Infrastructure” equivalent coming out from Dell EqualLogic?

I haven’t heard anything along those lines.

Scott,

Great post(s)!

I added my “2 cents” on my blog. It ended up being too much to post in a comment here.

http://vmetc.com/2008/06/07/avoid-hot-vmware-snapshots-when-using-storage-array-snapshots/#more-414

-Rich

The combination of VMsnaps+NetApp snaps has been heavily debated within NetApp the past few months.

Some of us believe SMVI should provide the option to disable VMsnaps, and that initiating a NetApp snapshot with VMsnaps disabled should pop up a warning cautioning the user about the potential ramifications. Bottom line: let users decide the value of their data and make the appropriate decision.

Others believe that even if there’s only a 1% chance of filesystem corruption, we ought to do whatever’s possible to avoid putting customers in that situation, which means using VMsnaps as part of the process.

For those who are not familiar with the process SMVI follows, here’s how it works:

1) Initiate NetApp snapshot request
2) SMVI calls VC over port 443 (HTTPS); VC calls the lgtosync.sys driver inside the guest to quiesce and flush filesystem buffers
3) VMsnapshot
4) NetApp snapshot
5) After the NetApp snapshot completes, remove the VMsnapshot

The whole process takes roughly 5-6 seconds. Because it’s so quick, very little I/O accumulates in the redo log, so deleting the VMsnap is also quick.
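The ordering in the SMVI steps above can be sketched as a short workflow. This is a minimal Python sketch under stated assumptions: `SnapshotBackend` and its methods are hypothetical placeholders, not real VMware or NetApp APIs, and they only record the order of calls. It loops over VMs one at a time, which is an assumption; how SMVI actually batches VMs isn’t described here. What it illustrates is the sequence (quiesce, VM snapshot, array snapshot, then VM snapshot removal) and that the VM snapshots are cleaned up even if the array snapshot fails.

```python
from typing import List


class SnapshotBackend:
    """Hypothetical backend; NOT a real VMware or NetApp API. It only logs calls."""

    def __init__(self, fail_array_snapshot: bool = False) -> None:
        self.calls: List[str] = []
        self.fail_array_snapshot = fail_array_snapshot

    def quiesce_guest(self, vm: str) -> None:
        # Stand-in for VC asking the in-guest driver to quiesce and flush buffers.
        self.calls.append(f"quiesce:{vm}")

    def create_vm_snapshot(self, vm: str) -> str:
        # Stand-in for the VMware snapshot taken right after quiescing.
        self.calls.append(f"vmsnap:{vm}")
        return f"{vm}-vmsnap"

    def create_array_snapshot(self, datastore: str) -> str:
        # Stand-in for the storage array (e.g. NetApp) snapshot of the datastore.
        if self.fail_array_snapshot:
            raise RuntimeError("array snapshot failed")
        self.calls.append(f"arraysnap:{datastore}")
        return f"{datastore}-arraysnap"

    def remove_vm_snapshot(self, snap_id: str) -> None:
        # Deleting the VM snapshot merges its (small) redo log back.
        self.calls.append(f"remove:{snap_id}")


def consistent_datastore_snapshot(
    backend: SnapshotBackend, datastore: str, vms: List[str]
) -> str:
    """Quiesce and VM-snapshot each VM, take the array snapshot, then
    remove the VM snapshots whether or not the array snapshot succeeded."""
    vm_snaps: List[str] = []
    try:
        for vm in vms:
            backend.quiesce_guest(vm)
            vm_snaps.append(backend.create_vm_snapshot(vm))
        return backend.create_array_snapshot(datastore)
    finally:
        for snap in vm_snaps:
            backend.remove_vm_snapshot(snap)


if __name__ == "__main__":
    backend = SnapshotBackend()
    print(consistent_datastore_snapshot(backend, "datastore1", ["vm1", "vm2"]))
    # prints "datastore1-arraysnap"
    print(backend.calls)
```

Removing the VM snapshots in a `finally` block is a choice made for this sketch: the VM snapshot is only meant to exist for the few seconds the array snapshot takes, so it should not be left behind if that step fails.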

Note that SMVI also integrates with SRM. The NetApp SRA (Storage Replication Adapter) will leverage SMVI snapshots and create FlexClones from them for site recovery testing while replication remains intact and ongoing. Furthermore, this works with FC, iSCSI, or a combination of both (i.e., FC at the production site and iSCSI at the recovery site).

Cheers

Great post, as always, Scott. EMC offers a tool that does this (along with application-integrated replication for Exchange 2003/2007, SQL Server 2005/2008, and Oracle 10g/11g).

It’s called Replication Manager. It has supported VMware for a long time (for customers running apps in VMs), but it recently added VMFS integration (for filesystem-flushed, datastore-level point-in-time copies and instant restore of datastores or VMs).

For NetApp folks: it’s equivalent to the SnapManager family (SME, SMS, SnapManager for SharePoint, SMO), but as a single tool.

Dell/EqualLogic will create a similar tool at some point, as they have recently started creating tools that integrate with Exchange (VSS) and SQL Server (VSS/VDI). Only NetApp and EMC have been doing this for a long time.

I’ll be doing a post on this on my blog in a bit with screenshots and a demo, as it seems to be a hot topic.

Chad,

Thanks for the feedback and the information on EMC’s product offerings in this area. I appreciate it!