Reader Kyle Ross shared with me a potential issue with VMware’s new backup product, VMware Data Recovery. Others within the VMware blogging scene have also covered this, but I wanted to mention it as well so that others didn’t run into the problem. Here’s Kyle’s write-up:
I was made aware of a serious (in my opinion) bug with VDR during a call with VMware support that I haven’t seen discussed anywhere. This is an internally known issue that causes snapshots to build up on VMs that are members of VDR backup jobs.
During the backup process, a new snapshot is created and VDR updates the snapshot descriptor file (vm_name-000001.vmdk) to mark the snapshot as un-removable. The bug occurs when the backup process completes: VDR fails to mark the snapshot as removable again, so the snapshots remain.
The tricky part of the problem is that the snapshots are not visible through the vSphere Client, nor are they listed in apps like ‘RVTools’ that use the VMware CLI to gather data. They could potentially be listed in the new datastore views but I didn’t think to look there before I resolved it in my environment. I ran across them by logging into the service console and running the following command to list all the delta files on the datastores attached to the server.
find /vmfs/volumes/ -name \*delta\*
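To make large or unexpected delta files stand out, the same search can also report sizes. This is a minimal sketch, assuming standard `find`, `du`, and `sort` are available in the service console. The function name `list_delta_files` is made up for illustration, and the pattern is narrowed to `*-delta.vmdk` rather than the broader `*delta*` used above.

```shell
#!/bin/sh
# list_delta_files [ROOT]
# Print the size in KB and the path of every snapshot delta file under ROOT
# (default: /vmfs/volumes), largest first.
# The function name is hypothetical; it is not a VMware tool.
list_delta_files() {
    root="${1:-/vmfs/volumes}"
    # du -k prints "<size-in-KB><TAB><path>"; sort -rn puts the biggest first.
    find "$root" -name '*-delta.vmdk' -exec du -k {} \; | sort -rn
}
```

Calling `list_delta_files /vmfs/volumes` prints one line per delta file, so multi-gigabyte files that match no snapshot in the GUI appear at the top.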
In my environment I noticed numerous VMs with multi-gigabyte delta files that I couldn’t account for via snapshots listed in the GUI. Here is the solution I was given by VMware. Via the service console, browse to the location of the VMDK files for the affected VM. Run this command to identify the descriptors that need to be corrected, replacing ‘virtual_machine_name’ with the actual name of the VM.
grep -I ddb.dele *virtual_machine_name*-000???.vmdk
This command will quickly identify the delta files that are marked as non-deletable. The workaround is to edit the affected VMDK descriptor files and change “ddb.deletable” from “false” to “true”. You will probably also need to edit the root VMDK file and change this field as well, otherwise you may be left with one open snapshot. Note that due to a change in how ESX 4 performs file locking, you will probably need to SSH into the host that is currently running the VM to edit these files. Once you have edited all the files, create a new snapshot for the VM either via the GUI or command line. Then issue the “Delete All” snapshots command to force ESX to combine all the files and close all the visible and hidden snapshots.
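The descriptor edit described above can be scripted rather than done by hand. What follows is only a sketch under assumptions, not VMware's procedure: it assumes the descriptor line reads `ddb.deletable = "false"` with straight quotes, the helper name `fix_deletable` is hypothetical, and it keeps a `.bak` copy of each file it changes.

```shell
#!/bin/sh
# fix_deletable FILE...
# For each VMDK descriptor file given, change ddb.deletable from "false" to
# "true". Files that are not flagged as non-deletable are left untouched.
# The function name is hypothetical; pass descriptor files only, never the
# binary -delta.vmdk or -flat.vmdk files.
fix_deletable() {
    for f in "$@"; do
        # Skip descriptors that do not carry the "false" flag.
        grep -q '^ddb\.deletable *= *"false"' "$f" || continue
        # Keep a backup, then rewrite the original in place from the backup.
        cp -p "$f" "$f.bak" || return 1
        sed 's/^\(ddb\.deletable *= *\)"false"/\1"true"/' "$f.bak" > "$f" || return 1
        echo "updated: $f"
    done
}
```

From the VM's directory, a call might look like `fix_deletable virtual_machine_name-000???.vmdk virtual_machine_name.vmdk`. The new snapshot and the “Delete All” step still have to be performed afterwards, as described above.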
As soon as more information is available, I’ll post it here. If any other readers have more information to share, please speak up in the comments.
Tags: Backup, Snapshot, Virtualization, VMware
-
Hi there,
This is good to hear as I had not heard it before. My lab – which I just checked – doesn’t show this issue. Can someone give me more info so I can chase this down in VMware?
Thanks,
Michael
-
Michael,
This also happens when the VDR backup process fails. The VDR app tends to lose connection to the VDR datastore. Several different errors may be logged if that happens.
You can identify the orphaned snapshots by their naming convention. They all end with -00000x.vmdk. Each time it fails it increments the number.
This can eat up your available datastore space really quickly. If you use VDR, I would recommend adding a custom alarm definition with “VM snapshot size” as the trigger to get a warning for VMs with snapshots.
Mirko
-
Thanks for the heads up. Although I’m still not brave enough to touch production with anything with a VSPHERE tag on it just yet.
There are plenty of good PowerShell and other scripts to do your daily health reporting for ESX environments. I run two: one is a PowerShell dump of snapshots visible to the vCenter client; the other is the old weekly ESX shell script that finds all -delta files, which are very occasionally still present after a snapshot is deleted from the vCenter database. I’ve been meaning to find the time to move the shell script into a PowerShell script that does the same thing by browsing datastores.
The combination of the two techniques is very effective with this sort of thing.
-
Scott,
VCB also sometimes leaves snapshots behind. This has caused major issues for us in our environments, so I was forced to find a way to report snapshots that were left behind. We came up with a script that sends email alerts every day at 3pm listing all the current snapshots on the SAN. This has proven to be a major time saver!
I blogged on the script here: http://www.virtualvcp.com/content/view/86/41/
The script is also available for download from my site, if anyone wishes to use it.
-
I had the same problem, but that was before this post. The only solution I saw was to clone the disks of 30 VMs…
At the moment I have exactly the same problem… I’ll try your solution.
-
I have the same problem. The vmdsk disk is full and I have disabled VDR.
-
I actually have the same problem with BackupExec 2010 using the VMware Agent, which uses the vStorage APIs… No vcbmounter -U there anymore, which mostly helped in environments with VCB… so I will try your suggestions.
-
Hi Scott, we met at VMUG a few weeks ago at the EMC booth and talked about the VCDX cert… I have had good luck with VDR to date. However, I just updated vCenter to 4.1 and rebuilt vCenter on W2008 64-bit, and also configured VUM on the same server (hey, it’s a lab!), and now VDR can’t seem to read the snapshot during the backup process; it fails every time. I’ve tried a variety of things to resolve this issue: moved the vCenter VM to a new datastore with more space, since the one it was on was showing usage warnings, rebooted, and looked at logs. I can’t see anything at all that would indicate why VDR can’t mount or read the snapshot to perform the backup. All other VMs are backing up fine. My thought is that it just isn’t ready for vSphere 4.1 with 64-bit vCenter and VUM. Has anyone else experienced this or know what this is about?
-
I ran into this problem as well.
I used VDR for a long time but never noticed this behaviour until my VDR solution crashed. For a long time I experienced problems creating decent backups. For the first few months after implementing it, it ran without any problem, but then without any reason it started to produce failed backups. I tried different things to get it working again but only ever achieved a temporarily working solution. I removed the existing job and created a new job to a different destination. Since I did this I have never been able to get it running decently again.
I am thinking of removing the product completely and setting up a different solution for my backups. Does anybody recommend something?
-
This appears to still be an issue with VDR 1.2.1. Making the above changes to ddb.deletable and running the create/remove snapshot process cleared the issue for me. Thx.
-
Hello,
I have the same problem, but only with a virtual machine which has two disks on two different datastores.
If I use your method, is the fix permanent? Or should I repeat it, for example, each week?
Thx.
-
Wow. It’s 2012 and VMware still hasn’t gotten a handle on this? Been using VDR for a year now and it requires daily administrative attention…



