Using NetApp Deduplication with Block Storage

Building on my earlier article on setting up NetApp deduplication, I wanted to follow up with some information on using NetApp deduplication with block storage (LUNs presented via Fibre Channel or iSCSI).

For the most part, using NetApp deduplication with block storage is a lot like I described earlier:

  • You (obviously) still need the NearStore and deduplication (A-SIS) licenses installed on the controller(s).
  • You will still turn deduplication on using the “sis on” command for the FlexVol containing the LUNs.
  • Limitations on the size of the FlexVol still apply.
  • You use the “sis status” command to check on the status of deduplication, and the “sis config” command to see the deduplication schedule.
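
As a rough sketch, the console commands mentioned above could be assembled per FlexVol like this. The volume name vm_luns and the helper sis_commands are hypothetical, and the /vol/<name> path form is an assumption based on the usual Data ONTAP 7-mode syntax, so check the command reference for your release before running anything.

```python
def sis_commands(volume: str) -> dict:
    """Build the deduplication console commands for one FlexVol.

    Assumption: volumes are addressed as /vol/<name> on the controller.
    """
    path = f"/vol/{volume}"
    return {
        "enable": f"sis on {path}",        # turn deduplication on
        "status": f"sis status {path}",    # check deduplication status
        "schedule": f"sis config {path}",  # show the deduplication schedule
    }


if __name__ == "__main__":
    # "vm_luns" is a made-up volume name used only for illustration.
    for step, command in sis_commands("vm_luns").items():
        print(f"{step:>8}: {command}")
```

The script only prints the command strings; it does not connect to a controller.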

OK, so what’s different? Well, it has to do with how LUNs are provisioned on a NetApp storage system. I’ve blogged before about managing LUN space requirements on a NetApp, and about using LUN clones vs. FlexClones. That second article, in particular, really goes into detail on how LUNs are implemented on top of NetApp’s file system, WAFL. Since LUNs are represented by WAFL as a single file, they are also normally “space reserved,” meaning that the maximum size of the LUN is allocated at the time of creation. If you create a 50GB LUN, then Data ONTAP creates a 50GB file right away. (For readers out there who are well-versed in NetApp storage, I know that’s a bit of a simplification, but bear with me.)

What does this have to do with deduplication? Great question. If the LUN is space reserved (meaning that the maximum size of the LUN is allocated up front and remains allocated to the LUN), then the file that represents the LUN won’t ever decrease in size to reflect deduplication savings. Deduplication still runs and still works; the space it saves simply never shows up as free space in the FlexVol, so it doesn’t help you.

Fortunately, there’s an easy fix for this. When creating the LUN, simply uncheck the box marked “Space Reserved” and allow Data ONTAP to allocate space to the LUN out of the containing FlexVol on an as-needed basis. Because the file that represents the LUN can grow in size, it can also shrink in size, and deduplication will cause the file that represents the LUN to decrease in size. This then allows you to provision additional LUNs from the same FlexVol to take advantage of the space savings resulting from deduplication.
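
To make the difference concrete, here is a deliberately simplified toy model of the space accounting described above. It is a sketch under stated assumptions, not how Data ONTAP actually accounts for space: it ignores snapshot reserve, fractional reserve, and metadata, and the names Lun, volume_charge_gb, and volume_free_gb are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Lun:
    size_gb: float        # size presented to the host
    written_gb: float     # data the host has written into the LUN
    space_reserved: bool  # True = full size allocated at creation

    def volume_charge_gb(self, dedupe_savings: float) -> float:
        """Space this LUN consumes in the FlexVol under the toy model."""
        if self.space_reserved:
            # Reserved: the full LUN size stays allocated, so
            # deduplication savings never reach the volume.
            return self.size_gb
        # Not reserved: only the blocks left after deduplication count.
        return self.written_gb * (1.0 - dedupe_savings)


def volume_free_gb(volume_gb: float, luns: list, dedupe_savings: float) -> float:
    """Free space left in the FlexVol for additional LUNs."""
    used = sum(lun.volume_charge_gb(dedupe_savings) for lun in luns)
    return volume_gb - used


if __name__ == "__main__":
    # Assumed numbers: 100GB FlexVol, one 50GB LUN holding 40GB of data,
    # and deduplication removing half of the written blocks.
    for reserved in (True, False):
        lun = Lun(size_gb=50, written_gb=40, space_reserved=reserved)
        free = volume_free_gb(100, [lun], dedupe_savings=0.5)
        print(f"space_reserved={reserved}: {free:.1f}GB free in the FlexVol")
```

In both cases the host still sees a 50GB LUN; only the free space left in the containing FlexVol changes.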

I know that seems a bit confusing; I’ll probably post another article with some more in-depth discussions of the details. (Either that, or I’ll encourage my NetApp readers to chime in below in the comments.)

So, in summary, when using NetApp deduplication with block storage:

  • you’ll set up and configure deduplication on the FlexVol containing your LUN(s) just as described in my earlier article;
  • you’ll uncheck the “Space Reserved” checkbox when creating the LUNs to be deduplicated;
  • you won’t see the space savings from the host’s perspective and therefore can’t store more data in that LUN than the size of the LUN; but
  • you will be able to provision additional LUNs in that same FlexVol that can be presented back to the host for additional storage.

I hope this helps clarify some of the questions or issues surrounding the use of NetApp deduplication with block storage. Feel free to add information, experiences with deduplication and block storage, or ask additional questions in the comments below.

UPDATE: There are some additional considerations about how to provision LUNs along with NetApp deduplication that warrant a more in-depth discussion. Look for a follow-up post within the next few days.


  1. Ausmith1

    I just enabled ASIS on one of our filers and one thing I noticed is that you may need to watch your snapshot sizes carefully. Because the ASIS process shrank the data so much (93% for 0.5TB of VMs), the used snapshot size blew way past the 20% allocated. If you were close to filling a volume then this could be very problematic. Just got to keep an eye on the data trends I guess…

  2. slowe

    Yes, this is one thing that you will need to watch out for. I believe that NetApp recommends deleting existing Snapshots, then turning on deduplication and letting it run, then taking Snapshots. This will help prevent that problem.

    Thanks for reading!

  3. Nick Triantos

    That’s correct Scott. The proper procedure is to:

    1) Deduplicate
    2) Snapshot

    The Snapshot space grew because there were snapshots in place prior to enabling deduplication. Freeing blocks involves pointer updates, and from a snapshot perspective these are treated as changed blocks. The snapshot therefore expands, and the duplicate blocks, although freed from the active file system, are still locked in the snapshot.

    In order for duplicate blocks not to be locked in the snapshot the above listed procedure needs to be followed.

  4. Ausmith1

    In my case the initial dedupe process took long enough that several snapshots happened along the way. If I had temporarily turned off snapshots during the initial dedupe process then this would not have happened.

  5. Robert

    I’m wondering about the transparency to the applications. For instance, I used the described setup (not sure how space reservation was configured), and with sis status I saw that deduplication worked very well. Within the LUN I used VMFS for my ESX servers. At the ESX server level I didn’t see any additional free space.

    So my understanding is that dedup works on LUNs but is not transparent to the overlying file system. Am I right?

    Robert

  6. slowe

    Robert,

    The behavior you are describing is absolutely correct–to the host OS managing the LUN’s file system, you won’t see any deduplication effects. The freed blocks resulting from deduplication allow you to either a) provision additional LUNs or b) take/store more Snapshots. Those freed blocks aren’t passed up to the host file system on the LUN, which means that the host OS doesn’t need to have any knowledge whatsoever of the deduplication process.

    Hope this helps!

  7. Nick Triantos

    As Scott correctly pointed out, from an ESX administrator’s point of view, everything will look like it looked prior to deduplication taking place. The reason for that is that there’s a filesystem in the middle, VMFS in this case. There is no SCSI command that can pass this information from the array to the host filesystem or between the host filesystem and the array (the latter case for thin provisioning).

    So while the space savings are not realized by the ESX administrator when deduplication is implemented over VMFS (or any other FS), the space savings are evident from a storage admin’s point of view, as there’s now capacity available for more LUNs or Snapshots.

    The behavior is different with NFS. Because there’s no other layer of indirection between the array and the host, the array’s filesystem features are exposed directly to the ESX admin. That means blocks that are freed are immediately available for use by the ESX admin. From a storage admin’s perspective there’s nothing to be done.

  8. Nick Triantos

    BTW…What I mean by “any other FS” is that the behavior would be the same, if I were to use deduplication on an HP-UX host on a LUN layered with JFS.

  9. Gary

    Great article Scott. I am wrestling with VMware and ASIS at the moment and the biggest consideration for me was volume size with regard to ASIS limits (2TB on FAS3040 I think). I am hitting performance issues at the moment with SCSI reservation errors, and it looks like I need to break out my VMs into smaller LUNs (currently 500GB LUN, thin provisioned, with 15 VMs per LUN). I initially kept one LUN per volume in order to keep the management easy, but I think having several smaller LUNs should reduce SCSI reservation errors. I am hoping that moving machines to a new LUN in the same volume will keep the dedupe amount around the same (as I SnapMirror the volume and don’t want to increase the transfer volume too much).

    I am looking forward to Ontap 7.3 as it will dedupe at the aggregate level (which should increase my savings significantly) and they were also talking about increasing ASIS limits and SAN performance.

    In the future I might move from FC to NFS, what are your experiences with this (if any)?

  10. slowe

    Gary,

    I use NFS quite a bit with VMware. I haven’t had the opportunity to do any performance testing, but I can say that subjectively it appears to perform very well. Since Data ONTAP manages the file system (rather than layering a host file system on a LUN), it makes it easy to leverage Snapshots. Check out these articles for more information:

    http://blog.scottlowe.org/tag/nfs

    Thanks for reading!

  11. Larry Freeman

    Hi Scott, great information in your blogs. BTW we’ve just written a configuration guide for NetApp dedupe and LUNs. There are actually 5 basic configurations that will drive the “freed” blocks to the LUN overwrite area, the volume free pool, the aggregate free pool, or a combination of the above. I’ve posted the paper on the NetApp dedupe community site:
    http://communities.netapp.com/community/our_products_and_solutions/deduplication

  12. slowe

    Larry,

    Excellent document, I just finished reading it. I may have to post an article just on that document!

  13. Larry Freeman

    Thanks Scott. BTW I worked through 10 revisions of this document before I could get all our experts to agree on the exact behaviour of dedupe within a NetApp LUN! As we move forward with more apps running dedupe on LUNs I hope this doc becomes very useful. We are seeing broad adoption with VMware and I am starting to notice customers experimenting with dedupe on SharePoint, Oracle, Exchange and SQL. Thanks again for your great blog Scott.

  14. Drew

    What happened to this White Paper? The link appears to be broken now…

    http://communities.netapp.com/community/our_products_and_solutions/deduplication

    It sounds like the exact Doc that I’m looking for to implement DeDup on our ESX FC LUNs on NetApp…

    Please let me know (anyone)

    Drew

  15. Mar

    Sorry for my stupidity here, but I am not a storage admin.
    I have a question. When you have deduplication enabled and you present a 100GB LUN to the server, the OS sees 100GB no matter what is going on in the background on the filer side. Now say the LUN has 70GB of data: does the filer see the space allocated as 70GB or 100GB? We have recently run into issues with losing our connection to the LUNs, and my storage administrator was seeing the space allocated on the filer as smaller than what the OS was seeing.

  16. slowe

    Mar, it’s not a stupid question—it’s actually a very pertinent question. You’re correct in that the host accessing the LUN will always only see the space allocated to the LUN. It’s on the back end that you need to monitor space usage. This makes using deduplication with block storage a task that can be quite administrator-intensive, or else you’ll run into issues like what you’ve described (and already experienced). The storage administrator has to closely monitor storage utilization on the array (even more so if you are using snapshots with your deduplicated block storage) or else you run the risk of taking LUNs (and potentially applications) offline.

    Good luck!

  17. Peter Just

    This is an old post, but a very good one. We have been running deduplication in our environment for quite a while. It seems to us that when we run a partial deduplication it appears to take longer than a “full” deduplication. This seems counterintuitive. Have you experienced this as well?