Part 1 in our series on NetApp FlexClones and VMware discussed in greater detail some of the advantages of using FlexClones for VM provisioning. In that article, we saw that using FlexClones can greatly reduce both the storage required for new VMs and the time required to provision them, especially when the storage needed by the VMs is large. Both of these advantages can be very compelling.
However, in order to make an informed decision about whether we should use FlexClones we must also look at the disadvantages of this approach. In this part of the series, we’ll take a look at some of those disadvantages.
Just as there were two key advantages to using FlexClones, there are also two key disadvantages to using FlexClones:
- First, the use of FlexClones (unless properly architected) may cause installations to bump up against the maximum number of LUN IDs available (255), or the maximum number of LUNs that may be opened concurrently by all virtual machines (256). (Both of those numbers were taken from the “Configuration Maximums for VMware Infrastructure 3” white paper, available here.) Unless properly architected, this means that your installation could be limited to about 250 virtual machines. Note that there are ways around this limitation; see below.
- Second, there is currently no integration from either Network Appliance or VMware to help automate the process of using FlexClones for VM provisioning. As shown in my technical article on how to use FlexClones for VM provisioning, there are a number of manual steps currently, and users seeking a greater level of automation must create their own scripts or hope that someone else already has written scripts they can re-use.
While these are the biggest disadvantages (in my opinion) there are also a number of other, smaller disadvantages as well:
- Using FlexClones for VM provisioning blurs the operational responsibilities of the SAN administrators and the server administrators; SAN administrators are now responsible for provisioning new VMs via FlexClones.
- Using FlexClones for VM provisioning adds complexity to the solution, making it harder to troubleshoot and administer over time.
- Making the most of the FlexClone space-conserving functionality requires more specialized VM configuration, such as dedicated VMDKs for pagefiles/swap partitions on separate LUNs.
These limitations can really derail the use of FlexClones in larger deployments. Ironically, it’s the larger deployments where the advantages of FlexClones are the most significant. In larger environments, there are typically dedicated SAN administrators and dedicated VMware/VI3 administrators. Introducing the use of FlexClones in this type of environment raises the question: who’s responsible for provisioning VMs?
While some of these disadvantages are inherent in the solution, there is a workaround for one of them, and that’s the maximum number of LUN IDs/LUNs. Because each FlexClone is a separate LUN with a separate LUN ID, your installation will be limited to about 250 LUNs. If you only place one VM per LUN, then you’ll be limited to about 250 VMs. However, if we instead adopt the practice of placing multiple VMs on each LUN (each of them prepared for cloning as described in my how-to article), then we can easily scale the number of VMs. For example, we could create a master VMFS datastore that had 10 VMs built and prepared for cloning, then deploy a FlexClone of that datastore for each group of 10 VMs that was needed. Using this technique, it would be reasonably easy to scale this solution to support 2,500 VMs (250 VMFS datastores with 10 VMs each). (Of course, 1,500 VMs is the configuration maximum for a single VirtualCenter management server.)
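To make the arithmetic above concrete, here is a minimal Python sketch. The limits are simply the figures quoted in this article (about 250 usable LUNs, and 1,500 VMs per VirtualCenter server); the function name and its parameters are illustrative only and are not part of any NetApp or VMware tool.

```python
# Figures quoted in the article; treat them as assumptions for your own environment.
MAX_LUNS = 250          # "about 250" usable LUNs / LUN IDs
MAX_VMS_PER_VC = 1500   # configuration maximum for one VirtualCenter server


def max_vms(vms_per_datastore, luns=MAX_LUNS, vc_limit=MAX_VMS_PER_VC):
    """Return (storage_ceiling, effective_ceiling) for a given VM density.

    storage_ceiling   -- VMs allowed by the LUN limit alone
    effective_ceiling -- the same figure capped by the VirtualCenter limit
    """
    if vms_per_datastore < 1:
        raise ValueError("need at least one VM per datastore")
    storage_ceiling = luns * vms_per_datastore
    return storage_ceiling, min(storage_ceiling, vc_limit)


if __name__ == "__main__":
    for density in (1, 5, 10):
        storage, effective = max_vms(density)
        print(f"{density:>2} VMs per LUN: LUN limit allows {storage}, "
              f"single VirtualCenter allows {effective}")
```

With these numbers, the VirtualCenter limit rather than the LUN limit becomes the ceiling once you place more than six VMs on each cloned datastore.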
In summary, the use of FlexClones for VMware environments has some very compelling advantages, and some very significant disadvantages. It will be up to each organization to weigh these pros and cons against the specifics of their company to determine which route to take. At least now you have some information upon which to base your decision.
As always, I welcome any comments below.
Tags: NetApp, Storage, Virtualization, VMware
-
Scott,
Surely a flexclone is just a read/writeable snap of the original source LUN and not a clone at all. That would mean that all of the flexclones, although initially space efficient, would effectively be competing for the same disk resource and in many cases the same data blocks. As they diverge over time, the competition for the same blocks may decrease whilst the capacity requirements would increase, although they would still effectively be competing for the same disk resource. The other issue is the fact that this is just a snap, which means it’s completely reliant on the parent LUN’s integrity for its survival in a failure. Assuming the original source LUN fails for some reason, then you also lose all of the flexclones.
Cheers
John -
Scott,
Forgive my ignorance, I’m just trying to get my head round how this would work. It appears the majority of my conclusions were correct, but the snap applies to the volume and not the LUN.
Read / write snaps are nothing new really, neither are space efficient snaps so I’m applying my understanding of these to a flexclone, although I realise there’s no copy on write penalty for wafl.
What I don’t understand is how data blocks will not be shared if only for unchanged data, if the data remains unchanged post snap (frozen) in the parent volume then surely everyone accesses that same frozen data block ? Data’s data, it either consumes additional space or it’s shared as a single instance via a pointer.
If the data block does change then the clones will begin to diverge and consume their own storage space outside of the parent LUN’s blocks?
How easy would it be to predict and manage the additional space consumption given the storage is effectively oversubscribed? I’m assuming this could be easily monitored to give plenty of warning.
Cheers
John -
“As for contending for disk resources, the FlexClone will contend with other workloads on the storage system for the disks in the aggregate”
Hi, so with ONTAP 7.2 we offer FlexShare (no license required; it uses the priority command), where you can set execution priority for each volume relative to other volumes as well as relative to system resources (e.g. SnapMirror, SnapVault).
-
Sorry for the delayed response Scott. In fact I just saw your reply.
Given that FlexShare is not tied to a specific protocol and applies on a per volume basis regardless of the type of the volume (Flexclone or not) and its contents (regular files or LUNs), there’s no protocol preference because it operates in the same manner in all cases.
In a multiprotocol array the ability to provide the peaceful co-existence of various workloads regardless of the protocol is important. After all, you probably don’t want a volume with home dirs to have the same execution priority as a mission critical app in an array approaching full resource utilization.
In fact, I would equate FlexShare to VMware’s Resource Management, where you can cap CPU and memory resources on a per VM basis. Now, if you take Flexshare and Multistore (which provides the ability to partition a physical filer into several virtual filers), what you end up with is capabilities similar to what ESX server is providing, but for the storage side.
An area where Flexshare can be beneficial with Flexclone is Test/Dev. After you clone the primary volume, the clone resides on the same aggregate. So you may want to lower the execution priority of the flexclone in relation to the other volumes using the same disk resources.
Another area where Flexshare can be beneficial is a case where one decides to split the Flexclone from the snapshot and create a fully autonomous volume. When that happens, blocks are copied in the background from the snapshot into the Flexclone’s active space. Depending on the number of disks and back-end loops, this process can be quite intense (north of 350-400 MBps internal reads/writes). So you may want to use Flexshare to limit the intensity of the background copy.
happy holidays to you and your readers.
-
Hi Scott,
Yes, Flexclones are considered equal to a typical volume. Anything you can do with a non-cloned volume can be done with a Flexclone. When you create a Flexclone from a snapshot, you’re actually creating a brand new volume that happens to share some blocks in the snapshot with the “parent” (original) volume. Writes to the Flexclone get allocated new space. Reads can come from the snapshot or from newly written blocks. Over time, as the “parent” volume and the flexclone diverge and the number of common (shared) blocks shrinks, you have the option to split the flexclone from the snapshot, at which point the “shared” blocks get copied out of the snapshot and into the space of the flexclone.
You can do the same thing with LUNs and the lun clone command.
There are more cloning enhancements on the way that server virtualization users will find extremely useful, not only from a provisioning but also from a recovery standpoint.
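The block sharing described in this comment can be illustrated with a toy model. This is only a sketch of the general idea (a clone reads shared blocks from a frozen snapshot, writes land in the clone's own space, and a split copies the remaining shared blocks); the class and method names are hypothetical, and it says nothing about how WAFL or Data ONTAP actually implement it.

```python
class ToyClone:
    """Toy model of a writable clone backed by a frozen snapshot."""

    def __init__(self, snapshot):
        self.snapshot = snapshot  # block number -> data, shared and never modified
        self.own = {}             # blocks written since the clone was created

    def read(self, block):
        # Newly written blocks win; otherwise fall back to the shared snapshot.
        if block in self.own:
            return self.own[block]
        return self.snapshot[block]

    def write(self, block, data):
        # Writes never touch the snapshot; they consume the clone's own space.
        self.own[block] = data

    def space_used(self):
        return len(self.own)

    def split(self):
        # Copy every still-shared block into the clone, then drop the dependency.
        for block, data in self.snapshot.items():
            self.own.setdefault(block, data)
        self.snapshot = {}


if __name__ == "__main__":
    parent = {0: "boot", 1: "os", 2: "apps"}
    snapshot = dict(parent)          # frozen point-in-time copy of the block map
    clone = ToyClone(snapshot)

    clone.write(1, "os-patched")     # the clone diverges from the parent
    parent[0] = "boot-v2"            # later parent changes are not seen by the clone
    print(clone.read(0), clone.read(1), clone.space_used())

    clone.split()                    # clone now holds a full copy of its blocks
    print(clone.space_used())
```

In this model a change made to the parent after the snapshot was taken is invisible to the clone, because the clone only ever reads from the snapshot and from its own writes.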
-
Hi Scott,
Why would you hit the limit of 256 LUNs? If it’s flexclone/netapp, surely it’s running on NFS, not iSCSI/VMFS .. in which case you would hit the NAS datastore limit of 32…?
Cheers
Alex -
Scott, if we were using iSCSI/block storage/VMFS, then how would Netapp be able to do Flexcloning?
I was under the impression that the Netapp must use NFS for flexcloning so it can manipulate the files on the NFS filesystem.
Or can Netapp read VMFS filesystems?
Or does it not need to and by some magic flexcloning works with both iSCSI/VMFS and NAS/NFS?
Cheers
-
Excellent work, Scott. I know it’s late, but my question revolves around updating the master volume: do the flexclones see the updates?
Specifically, if I create a master volume with a golden XP image, can I auto update the master and propagate all changes out to the children in one whack? Will this cause (or potentially cause) inconsistencies in the children’s registry settings, say if a key was previously changed in the child?
-C
-
bummer - I thought you were going to say that… So if the original blocks are modified in the update in the master, the old pre-modified blocks are then in the snapshot the flexclones are based off? Or, does each flexclone get bigger as a result? Where are their pointers pointing?
Seems like there should be a way - kind of like unionfs in linux.
http://www.filesystems.org/project-unionfs.html
-C
-
We need dynamic clones: clones whose underlying base is malleable and can propagate immediately. You’re saying VMware View does essentially this, but this whole VDI thing adds an enormous amount of complexity. So much so that I’m seeking simpler alternatives by trying to leverage the NetApp directly in some way.
Ideally it’s an object oriented environment, where the child ‘class’ has inherited much of itself from its singular (or multiple modular) parent(s).
A spawned virtual environment is instantiated instantly on access, and is simply an extension of the dynamic class(es) it’s based off. ‘It’ is simply the change it brings to its instance.
Perhaps this is what ‘View’ essentially is? I’m not all that versed in it actually.
-C


