Building on my earlier article on setting up NetApp deduplication, I wanted to follow up with some information on using NetApp deduplication with block storage (LUNs presented via Fibre Channel or iSCSI).
For the most part, using NetApp deduplication with block storage is a lot like what I described earlier:
- You (obviously) still need the NearStore and deduplication (A-SIS) licenses installed on the controller(s).
- You will still turn deduplication on using the “sis on” command for the FlexVol containing the LUNs.
- Limitations on the size of the FlexVol still apply.
- You use the “sis status” command to check on the status of deduplication, and the “sis config” command to see the deduplication schedule.
OK, so what’s different? Well, it has to do with how LUNs are provisioned on a NetApp storage system. I’ve blogged before about managing LUN space requirements on a NetApp, and about using LUN clones vs. FlexClones. That second article, in particular, really goes into detail on how LUNs are implemented on top of NetApp’s file system, WAFL. Since LUNs are represented by WAFL as a single file, they are also normally “space reserved,” meaning that the maximum size of the LUN is allocated at the time of creation. If you create a 50GB LUN, then Data ONTAP creates a 50GB file right away. (For readers out there who are well-versed in NetApp storage, I know that’s a bit of a simplification, but bear with me.)
What does this have to do with deduplication? Great question. If the LUN is space reserved—meaning that the maximum size of the LUN is allocated up front and remains allocated to the LUN—then the file that represents the LUN won’t ever decrease in size to reflect deduplication savings, and deduplication therefore does you absolutely no good whatsoever. This is not to say that deduplication doesn’t work, just that it won’t help you at all.
Fortunately, there’s an easy fix for this. When creating the LUN, simply uncheck the box marked “Space Reserved” and allow Data ONTAP to allocate space to the LUN out of the containing FlexVol on an as-needed basis. Because the file that represents the LUN can grow in size, it can also shrink in size, and deduplication will cause the file that represents the LUN to decrease in size. This then allows you to provision additional LUNs from the same FlexVol to take advantage of the space savings resulting from deduplication.
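To make the accounting concrete, here is a rough Python sketch of the simplified model described above. It is not how Data ONTAP actually tracks space: it ignores Snapshot reserve, fractional reserve, and metadata, and the class name, function name, and all the numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Lun:
    size_gb: float        # size presented to the host
    written_gb: float     # data the host has actually written
    dedupe_ratio: float   # assumed fraction of written blocks freed by dedupe (0..1)
    space_reserved: bool  # True = full size allocated up front

    def volume_usage_gb(self) -> float:
        """Space this LUN consumes in the FlexVol under the simplified model."""
        if self.space_reserved:
            # The full size stays allocated, so dedupe savings never show up.
            return self.size_gb
        # Only the blocks that remain after deduplication are charged.
        return self.written_gb * (1.0 - self.dedupe_ratio)


def volume_free_gb(volume_size_gb: float, luns: List[Lun]) -> float:
    """Free space left in the FlexVol for provisioning additional LUNs."""
    return volume_size_gb - sum(lun.volume_usage_gb() for lun in luns)


# Hypothetical example: a 50GB LUN with 40GB written, half of it duplicate data,
# living in a 100GB FlexVol.
reserved = Lun(size_gb=50.0, written_gb=40.0, dedupe_ratio=0.5, space_reserved=True)
thin = Lun(size_gb=50.0, written_gb=40.0, dedupe_ratio=0.5, space_reserved=False)

print(volume_free_gb(100.0, [reserved]))  # 50.0
print(volume_free_gb(100.0, [thin]))      # 80.0
```

In both cases the host still sees a 50GB LUN; only the free space in the containing FlexVol differs.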
I know that seems a bit confusing; I’ll probably post another article with some more in-depth discussions of the details. (Either that, or I’ll encourage my NetApp readers to chime in below in the comments.)
So, in summary, when using NetApp deduplication with block storage:
- you’ll set up and configure deduplication on the FlexVol containing your LUN(s) just as described in my earlier article;
- you’ll uncheck the “Space Reserved” checkbox when creating the LUNs to be deduplicated;
- you won’t see the space savings from the host’s perspective and therefore can’t store more data in that LUN than the size of the LUN; but
- you will be able to provision additional LUNs in that same FlexVol that can be presented back to the host for additional storage.
I hope this helps clarify some of the questions or issues surrounding the use of NetApp deduplication with block storage. Feel free to add information, experiences with deduplication and block storage, or ask additional questions in the comments below.
UPDATE: There are some additional considerations about how to provision LUNs along with NetApp deduplication that warrant a more in-depth discussion. Look for a follow-up post within the next few days.


13 comments
Friday, April 25, 2008 at 12:14 pm
Ausmith1
I just enabled ASIS on one of our filers and one thing I noticed is that you may need to watch your snapshot sizes carefully. I found that because the ASIS process shrank the data so much (93% for 0.5TB of VMs), the used snapshot size blew way past the 20% allocated. If you were close to filling a volume then this could be very problematic. Just got to keep an eye on the data trends I guess…
Friday, April 25, 2008 at 12:23 pm
slowe
Yes, this is one thing that you will need to watch out for. I believe that NetApp recommends deleting existing Snapshots, then turning on deduplication and letting it run, then taking Snapshots. This will help prevent that problem.
Thanks for reading!
Friday, April 25, 2008 at 1:39 pm
Nick Triantos
That’s correct Scott. The proper procedure is to:
1) Deduplicate
2) Snapshot
The Snapshot space grew because there were snapshots in place prior to enabling deduplication. Since there are pointer updates when blocks are freed, from a snapshot perspective these are treated as changed blocks. Thus the snapshot expands, and the duplicate blocks, while they have been freed from the active file system, are still locked in the snapshot.
In order for duplicate blocks not to be locked in the snapshot the above listed procedure needs to be followed.
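As a rough illustration of why the order matters, here is a small Python sketch that treats the active file system and a snapshot as sets of block IDs. This is only a toy model of the behavior described above, not NetApp's implementation; the function name and the block counts are hypothetical.

```python
from typing import Tuple


def simulate(snapshot_first: bool) -> Tuple[int, int]:
    """Return (physical blocks in use, blocks held only by the snapshot)."""
    active = set(range(100))          # hypothetical: 100 blocks in the active file system
    duplicates = set(range(60, 100))  # hypothetical: 40 of them are duplicates

    if snapshot_first:
        # Snapshot taken before dedupe: it still references the duplicate blocks.
        snapshot = frozenset(active)
        active = active - duplicates
    else:
        # Dedupe first, then snapshot: the snapshot never sees the duplicates.
        active = active - duplicates
        snapshot = frozenset(active)

    in_use = active | snapshot               # a block stays allocated while anything references it
    locked_in_snapshot = snapshot - active   # freed from active, but still held by the snapshot
    return len(in_use), len(locked_in_snapshot)


print(simulate(snapshot_first=True))   # (100, 40)
print(simulate(snapshot_first=False))  # (60, 0)
```

With the snapshot taken first, all 100 blocks stay allocated and the 40 duplicates are charged to the snapshot; with deduplication first, only 60 blocks remain in use.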
Friday, April 25, 2008 at 3:01 pm
Ausmith1
In my case the initial dedupe process took long enough that several snapshots happened along the way. If I had temporarily turned off snapshots during the initial dedupe process then this would not have happened.
Saturday, April 26, 2008 at 1:04 am
Robert
I’m wondering about the transparency to the applications. For instance, I used the described setup (not sure how space reservation was configured) and with sis status I saw that deduplication worked very well. Within the LUN I used VMFS for my ESX servers. At the ESX server level I didn’t see any additional free space.
So my understanding is that dedup works on LUNs but is not transparent to the overlying file system. Am I right?
Robert
Saturday, April 26, 2008 at 12:37 pm
slowe
Robert,
The behavior you are describing is absolutely correct–to the host OS managing the LUN’s file system, you won’t see any deduplication effects. The freed blocks resulting from deduplication allow you to either a) provision additional LUNs or b) take/store more Snapshots. Those freed blocks aren’t passed up to the host file system on the LUN, which means that the host OS doesn’t need to have any knowledge whatsoever of the deduplication process.
Hope this helps!
Saturday, April 26, 2008 at 7:48 pm
Nick Triantos
As Scott correctly pointed out, from an ESX Administrator’s point of view, everything will look like it looked prior to deduplication taking place. The reason for that is that there’s a filesystem in the middle, VMFS in this case. There is no SCSI command that can pass this information from the array to the host filesystem or between the host filesystem and the array (the latter case for thin provisioning).
So while the space savings are not realized by the ESX administrator when deduplication is implemented over VMFS (or any other FS), the space savings are evident from a Storage Admin’s point of view as there’s now capacity available for more LUNs or Snapshots.
The behavior is different with NFS. Because there’s no other layer of indirection between the array and the host, the array’s filesystem features are exposed directly to the ESX admin. That means blocks that are freed are immediately available for use by the ESX Admin. From a Storage Admin’s perspective there’s nothing to be done.
Saturday, April 26, 2008 at 7:51 pm
Nick Triantos
BTW…What I mean by “any other FS” is that the behavior would be the same, if I were to use deduplication on an HP-UX host on a LUN layered with JFS.
Wednesday, May 14, 2008 at 4:07 am
Gary
Great article Scott. I am wrestling with VMware and ASIS at the moment and the biggest consideration for me was volume size with regard to ASIS limits (2TB on FAS3040 I think). I am hitting performance issues at the moment with SCSI reservation errors, and it looks like I need to break out my VMs into smaller LUNs (currently 500GB LUN, thin provisioned, with 15 VMs per LUN). I initially kept one LUN per volume in order to keep the management easy, but I think having several smaller LUNs should reduce SCSI reservation errors. I am hoping that moving machines to a new LUN in the same volume will keep the dedupe amount around the same (as I SnapMirror the volume and don’t want to increase the transfer volume too much).
I am looking forward to Ontap 7.3 as it will dedupe at the aggregate level (which should increase my savings significantly) and they were also talking about increasing ASIS limits and SAN performance.
In the future I might move from FC to NFS, what are your experiences with this (if any)?
Wednesday, May 14, 2008 at 7:10 am
slowe
Gary,
I use NFS quite a bit with VMware. I haven’t had the opportunity to do any performance testing, but I can say that subjectively it appears to perform very well. Since Data ONTAP manages the file system (rather than layering a host file system on a LUN), it makes it easy to leverage Snapshots. Check out these articles for more information:
http://blog.scottlowe.org/tag/nfs
Thanks for reading!
Tuesday, May 20, 2008 at 6:54 pm
Larry Freeman
Hi Scott, great information in your blogs. BTW we’ve just written a configuration guide for NetApp dedupe and LUNs. There are actually 5 basic configurations that will drive the “freed” blocks to the LUN overwrite area, the volume free pool, the aggregate free pool, or a combination of the above. I’ve posted the paper on the NetApp dedupe community site:
http://communities.netapp.com/community/our_products_and_solutions/deduplication
Tuesday, May 20, 2008 at 9:55 pm
slowe
Larry,
Excellent document, I just finished reading it. I may have to post an article just on that document!
Wednesday, May 21, 2008 at 2:26 pm
Larry Freeman
Thanks Scott. BTW I worked through 10 revisions of this document before I could get all our experts to agree on the exact behaviour of dedupe within a NetApp LUN! As we move forward with more apps running dedupe on LUNs I hope this doc becomes very useful. We are seeing broad adoption with VMware and I am starting to notice customers experimenting with dedupe on SharePoint, Oracle, Exchange and SQL. Thanks again for your great blog Scott.