FlexClones Versus Deduplication with VMware Infrastructure

A number of times over the last few months, I’ve run into situations where NetApp’s FlexClone technology was being heavily pitched to customers interested in deploying, or expanding their deployment of, VMware Infrastructure.

In case you aren’t familiar with the use of NetApp FlexClones in conjunction with VMware Infrastructure, have a look at these earlier articles of mine:

How to Provision VMs Using NetApp FlexClones
NetApp FlexClones with VMware, Part 1
NetApp FlexClones with VMware, Part 2
LUN Clones vs. FlexClones

Now, after you’ve read all those articles (you did read them, didn’t you?), it should be fairly clear that using FlexClones can be very advantageous. However, those advantages come with tradeoffs, most notably the complete lack of integration with VMware Infrastructure itself.

This lack of integration means that users can’t use VirtualCenter templates, because the cloning is taking place at the storage array instead of within VMware Infrastructure. This also means that customers can’t apply customization specifications during the cloning process, so users will need to create their own Sysprep answer files and manually Sysprep the VMs before invoking the FlexClone process. Users are required to create scripts and tools to do simple things like using the VM name for the guest OS name during cloning.

<aside>Lest anyone think I’m picking on NetApp here, let me state that this would apply to any storage vendor that offers pointer-based copies. As long as the use of those pointer-based copies (or even deep copies, for that matter) is not integrated within VirtualCenter, then they will suffer the same problems.</aside>

Deduplication, on the other hand, works seamlessly with VMware Infrastructure. This is primarily because the details of the deduplication are completely hidden; it all occurs “inside the box.” Nothing needs to be configured within VirtualCenter; no VMs need to be modified. The NetApp storage system handles the details of the deduplication process itself, and VMware Infrastructure just consumes the storage.

Looking at these two technologies in that light, one might ask: why use FlexClones at all? If deduplication works seamlessly with VMware Infrastructure and FlexClones don’t, then why bother? To be honest, there are some instances where FlexClones make sense—even with the lack of integration. Consider some of the examples listed below.

  • In instances where a user needs to deploy lots of VMs in a very rapid fashion, FlexClones are much better. If time-to-deployment is the #1 driving factor, then FlexClones are the way to go. This could be particularly applicable and useful in VDI situations, as long as the broker doesn’t mandate handling provisioning itself (like VDM does).
  • In environments where provisioning and re-provisioning occur on a frequent, regular basis, FlexClones make sense. Even if large numbers of VMs aren’t being provisioned at once, the time saved on frequent re-provisioning via FlexClones will be significant.
  • In situations where there isn’t sufficient storage for the VMs before they are deduplicated, FlexClones may be a better option. Deduplication is post-process, meaning that storage will be needed for the full datasets until deduplication runs. Where that much storage isn’t available, FlexClones can provide the same end benefit.
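
To put rough numbers on that last point, here is a minimal Python sketch comparing peak capacity under the two approaches. The function name, the VM count, the image size, and the 5% figure for unique blocks per clone are all illustrative assumptions, not measurements from any NetApp system.

```python
def peak_capacity_gb(vm_count, vm_size_gb, unique_pct):
    """Return (post_process_dedup_peak, clone_peak) in GB.

    unique_pct is an ASSUMED percentage of each VM's blocks that
    differ from the base image; real values depend on the workload.
    """
    # Post-process deduplication: every VM is written in full
    # before the deduplication job has a chance to reclaim space.
    dedup_peak = vm_count * vm_size_gb
    # Pointer-based clones: one full base image, plus only the
    # blocks each clone has changed.
    clone_peak = vm_size_gb + vm_count * vm_size_gb * unique_pct / 100
    return dedup_peak, clone_peak


# Hypothetical example: 100 VMs cloned from a 20 GB base image.
dedup_peak, clone_peak = peak_capacity_gb(vm_count=100, vm_size_gb=20, unique_pct=5)
print(f"Post-process dedup peak: {dedup_peak} GB")
print(f"Clone-based peak:        {clone_peak} GB")
```

Both approaches may settle at a similar footprint once deduplication has run; the difference this sketch highlights is only the space needed up front.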

Personally, I’m of the opinion that an organization that doesn’t meet one of these criteria should look to deduplication instead of FlexClones. Of course, that’s just my personal opinion, and I’m open to hearing what others have to say about the matter. NetApp gurus, feel free to weigh in.


Is the NetApp deduplication on the fly, or post-event? I think we have to remember that if it’s post-event, you need the storage up front anyway… or you have to go through cycles of de-duplication to claim back your disk space.

All the stuff you say screams out VDI to me… And from what I have seen from VMware, it’s likely the storage vendors will have an edge over VMware for some time.

Lastly, the now legendary NetApp YouTube demo, where they create 100 VMs via SnapClone: is that for real? I’ve heard some people say it takes hours to do that, not minutes… What’s your experience been?

Hi Scott,

Nice article. I always enjoy reading your thoughts on NetApp and VMware technologies!

My take on FlexClone and A-SIS is a little different. I think of them as being used for complementary but distinct purposes.

FlexClone is fantastic for DR testing in a VMware environment. Most Storage technologies make you break replication relationships to read/write the destination volume which is required by VMware datastores to bring up VMs. With FlexClone, the administrator can just FlexClone the SnapMirror destination FlexVol and connect the new read/write capable datastore into VMware. This can be over FCP, iSCSI or NFS of course. The FlexClone FlexVol takes a second to create, and can initially take up 0 space…beautiful.

A-SIS Deduplication is awesome at efficiently saving the admin a ton of space, not only on the source/production datastore volumes, but also on SnapMirror replicated destination volumes (at DR for example). I have personally seen an 80% saving on NetApp NFS datastores running a mix of Windows XP and 2003 VMs.

Together these two technologies work great.

1. A-SIS De-duplicate your production Flexvols holding VMs = Large Space Savings!
2. Replicate your entire VMware production environment with SnapMirror to DR (Only de-duplicated data gets sent, saving on time, bandwidth, and storage needs)
3. Use FlexClone to create instantaneous read/write clones of your VM Datastores at DR. No need to break your replication relationship during DR testing!
4. Use DR ESX Farm & VI Client to connect up to FlexClone datastore and test DR Strategy
5. Use VMware’s new SRM (Site Recovery Manager) which does integrate with NetApp FlexClone and SnapMirror to offer “push button” DR testing and failover features. (Hopefully they add NFS support in the next release)

I too wish there was some integration of these tools in the VI Client. VI Plugins do exist, and I use the SVmotion Plugin a good deal already. http://sourceforge.net/projects/vip-svmotion/

Both NetApp and VMware have strong Software Development Kits, which help make an opportunity to add NetApp technologies into VMware’s VI Client. Citrix has done some of this already with their NetApp Adapter for XenServer, which automates some basic storage provisioning and A-SIS features within their GUI.

Dave

Mike,

Good questions! NetApp deduplication is post-process, meaning data is deduplicated after the fact. That’s one of the reasons I pointed out for using FlexClones, in that you don’t need to wait for the deduplication process to get the space back. If total space is a concern, then FlexClones may be a better option even with the added complexity.

Deduplication and FlexClones both really shine in VDI environments, because the scale of the environment makes the impact of these technologies that much more powerful.

As for the legendary YouTube video, here are my thoughts:

http://blog.scottlowe.org/2008/02/28/i-love-it-but-its-not-available/

Last time I checked, this functionality wasn’t available in Data ONTAP yet (although it may be included in the very recently released ONTAP 7.3). So I can’t really say what that’s going to look like. Keep in mind, though, that without integration in VirtualCenter it will suffer from many of the same problems as FlexClones.

Thanks for reading!

Scott,
I have a question relating to the de-dupe/FlexClone talk that is slightly off topic and might be a good idea to address in a post all to itself, but I’ll ask it anyway.

I’m no SAN expert, just Jack-of-all-trades-VMware guy, but de-dupe or flex-clones have absolutely nothing to do with handling more or less IOPS, right? They seem to have everything to do with decreasing the amount of disk space necessary. When designing a SAN solution to accommodate a customer’s VMware storage needs, the first thing I do is look at IOPS and make sure my SAN solution has enough spindles to provide the required IO for the VMs. Secondary in my calculations is total disk space needed. So, I’m guessing then that the only place de-dupe actually pays for itself is in cases where you have larger volumes of inactive data than active data. Would that be a correct assumption? If so, then NetApp vs an Equallogic SAN: I would purchase the same number of disks, it’s just that with NetApp I could get away with cheaper 146 GB disks instead of 300 GB disks. But in an environment with lots of active data, and a lower percentage of inactive data, there would seem to be no benefit to de-dupe.

As always, thanks for your work! I’m just trying to get my head around when NetApp is THE solution for a customer, and when Equallogic/Compellent or random other SAN is good enough.

Sean Clark

Sean,

You are correct: deduplication and FlexClones do not address IOPS requirements, only storage requirements. You should still size the SAN to handle the IOPS requirements with the appropriate number of spindles, storage controllers, target ports, etc. Deduplication isn’t just about active vs. inactive data, though; it’s about duplicate data. Data may be active, but if it’s the same data as exists elsewhere (looking at this from a block level, not just a file level), then why are we storing multiple instances of it?
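
To make the sizing point concrete, here is a minimal Python sketch of the "size for IOPS first, capacity second" calculation. Every number in it is an assumption chosen for illustration (150 IOPS per spindle, a 6,000 IOPS workload, 8 TB of data, 50% deduplication savings), and it ignores RAID overhead, hot spares, and controller limits.

```python
import math


def disks_needed(required_iops, required_gb, iops_per_disk, gb_per_disk):
    """Return (disks_to_buy, disks_for_iops, disks_for_capacity)."""
    by_iops = math.ceil(required_iops / iops_per_disk)
    by_capacity = math.ceil(required_gb / gb_per_disk)
    # Whichever requirement needs more spindles drives the purchase.
    return max(by_iops, by_capacity), by_iops, by_capacity


# Hypothetical workload: 6,000 IOPS and 8,000 GB before deduplication.
without_dedup = disks_needed(6000, 8000, iops_per_disk=150, gb_per_disk=300)

# Same workload with an ASSUMED 50% space saving, on smaller 146 GB disks.
with_dedup = disks_needed(6000, 4000, iops_per_disk=150, gb_per_disk=146)

print("300 GB disks, no dedup:", without_dedup)
print("146 GB disks, dedup:   ", with_dedup)
```

In this made-up case the spindle count is set by IOPS either way, so deduplication changes which disks will do, not how many are needed.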

Hope this helps!