Storage Alignment Document

NetApp has recently released TR-3747, Best Practices for File System Alignment in Virtual Environments. This document addresses the situations in which file system alignment is necessary in environments running VMware ESX/ESXi, Microsoft Hyper-V, and Citrix XenServer. The authors are Abhinav Joshi (he delivered the Hyper-V deep dive at Insight last year), Eric Forgette (wrote the Rapid Cloning Utility, I believe), and Peter Learmonth (a well-recognized name from the Toasters mailing list), so you know there’s quite a bit of knowledge and experience baked into this document.

There are a couple of nice tidbits of information in here. For example, I liked the information on using fdisk to set the alignment of a guest VMDK from the ESX Service Console; that’s a pretty handy trick! I also found the tables describing the different levels at which misalignment can occur quite useful. (To be honest, though, it took me a couple of read-throughs of that section to understand what the authors were trying to convey.)

Anyway, if you’re looking for more information on storage alignment, the different levels at which it may occur, and the methods used to fix it at each of these levels, this is an excellent resource that I strongly recommend reading. Does anyone have any pointers to similar documents from other storage vendors?


18 comments

  1. Tom

    This NetApp article is the first I have seen that uses “create partition primary align=32” for MS Windows.

    Every other article I have seen uses align=64.

    Which is better / more appropriate, 64 or 32? Both give starting offsets (32KB and 64KB) that are evenly divisible by 4096 bytes…

    Or is align=32 peculiar to NetApp??

    Thank you, Tom

  2. slowe

    Tom, I’ve seen recommendations to use both 32 and 64. AFAIK, either value is fine. Anyone else have more information to share on this question?

  3. Abhinav Joshi

    Hi Tom,

    Great question.

    I am one of the co-authors of this document. Align=32 is not peculiar to NetApp; it is used here as an example value. We don’t care which value you use as long as the starting partition offset is divisible by 4096 bytes (4KB), so both 32 and 64 are viable options.

    Regards,

    Abhinav Joshi
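To make the rule concrete: diskpart’s align= value is given in kilobytes, so align=32 and align=64 start the partition at 32,768 and 65,536 bytes, both multiples of 4096. The minimal Python sketch below checks that arithmetic. The 63-sector (32,256-byte) starting offset that older Windows versions used by default is included as a counterexample; the helper names are hypothetical.

```python
SECTOR = 512        # bytes per sector, used for the legacy 63-sector example
ARRAY_BLOCK = 4096  # 4KB block size discussed in the TR

def diskpart_align_to_bytes(align_kb: int) -> int:
    """diskpart's align= parameter is expressed in kilobytes."""
    return align_kb * 1024

def is_aligned(offset_bytes: int, block: int = ARRAY_BLOCK) -> bool:
    """True if a partition starting at offset_bytes begins on an array block boundary."""
    return offset_bytes % block == 0

candidates = {
    "align=32": diskpart_align_to_bytes(32),
    "align=64": diskpart_align_to_bytes(64),
    "legacy 63-sector start": 63 * SECTOR,
}

for label, offset in candidates.items():
    print(f"{label:24} offset={offset:6} bytes aligned={is_aligned(offset)}")
```

Both align= values pass; the 63-sector start leaves a 3,584-byte remainder and fails.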

  4. Tom

    Thanks everyone for answering.
    I think we all feel better now.

    Tom
    P.S. Nice avatar too Scott, better than many!!

  5. Fletch

    Hi, can you quantify the impact of misalignment?
    I want to be able to say we improved throughput by X% by fixing this problem. How do we best measure it?

    The section from the doc does not offer an answer for quantifying – can anyone else?

    thanks!

    3.2 IMPACT
    Misalignment may cause an increase in per-operation latency. It requires the storage array to read from or write to more blocks than necessary to perform logical I/O. Figure 3 shows an example of a LUN with and without file system alignment. In the first instance, the LUN with aligned file systems uses four 4KB blocks on the LUN to store four 4KB blocks of data generated by a host. In the second scenario, where there is misalignment, the NetApp storage controller has to use five blocks to store the same 16KB of data. This is an inefficient use of space, and the performance suffers when the storage array has to process five blocks to read or write what should only be four blocks of data. This results in inefficient I/O, because the storage array is doing more work than is actually requested.
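The four-versus-five block example in Figure 3 is simple arithmetic on byte offsets. The minimal Python sketch below counts how many 4KB array blocks a host I/O spans for a given starting byte offset. The 64KB and 63-sector (32,256-byte) partition starts are assumed examples, not values taken from the TR, and `blocks_touched` is a hypothetical helper.

```python
BLOCK = 4096  # array block size from the TR example (4KB)

def blocks_touched(offset: int, length: int, block: int = BLOCK) -> int:
    """Number of array blocks spanned by an I/O of `length` bytes starting at byte `offset`."""
    first = offset // block
    last = (offset + length - 1) // block
    return last - first + 1

io_size = 16 * 1024  # the TR's 16KB of host data (four 4KB blocks)
aligned = blocks_touched(64 * 1024, io_size)   # assumed partition start at 64KB
misaligned = blocks_touched(63 * 512, io_size)  # assumed legacy 63-sector start
print(aligned, misaligned)  # 4 5
```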

  6. Reuv

    Can you change/fix the alignment of a disk already in production, or do you need to create a new properly aligned disk, and migrate the data?

  7. Reuv

    Also – check out:

    How do I diagnose misaligned I/O on Windows hosts? https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb36108

  8. slowe

    Reuv,

    You can fix the alignment of a virtual disk file already in production, although VM downtime is required. See the TR for more information on mbrscan and mbralign, some tools that will help with this process.

  9. Fletch

    I’m trying to get a handle on how important this alignment problem is – for that, numbers would be great! ;)
    Can anyone address the performance impact in terms of real numbers?
    Please correct/forgive me – I am going to apply some long-archived compsci algorithmics – a misaligned I/O looks like it touches n + 1 blocks instead of n, where n is the number of blocks being read/written.
    worst case: n = 1 (1 of the 2 blocks read is wasted, a 50% “performance hit”)
    average case: n = ?? (need to research this – the Easy and Efficient Disk I/O Workload Characterization in VMware ESX Server report states: “Collection of detailed characteristics of disk I/O for workloads is the first step in tuning disk subsystem performance”)

    It strikes me as ironic that NetApp would go to the trouble of identifying a problem with their product and publishing a paper on it without providing the requisite analysis of the underlying business impact.

    E.g., I don’t like the temperature of my NetApp, but the fact that it’s at 70 degrees probably only adds .00000000001% latency in the electrons on the bus, and the impact of cooling it by 20 degrees is negligible compared to, say, adding a PAM (RAM cache) card.

    Request for improvement on TR-3747: add some quantification of the impact, and explain how to measure it before and after the fix.
    thanks :)

  10. slowe

    Fletch, I recall seeing somewhere that at a 4K block size in a misaligned environment, approximately 1/8 of all reads will be affected. This is consistent with the 12% performance impact that is mentioned in VMware’s partition alignment document. You’ll get your biggest return on effort when you fix the alignment of high I/O systems.

  11. Fletch

    Right, in the n+1 analysis (share of blocks read that are the extra one = 1/(n+1)):
    n = 1: 50% (worst case)
    n = 2: 33%
    n = 3: 25%
    n = 4: 20%
    n = 5: 16.7%
    n = 6: 14.3%
    n = 7: 12.5% <-- 1 block in every 8 read is extra if the average read/write is 7 blocks
    n = 8: 11.1%
    n = 9: 10%
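The table above rests on one assumption: every misaligned n-block I/O reads exactly one extra 4KB block. A minimal Python sketch that regenerates it, and that also prints the other common way to state the same overhead (extra blocks relative to the aligned case, 1/n). That difference is why n = 7 can be quoted as either 12.5% or about 14.3%. Function names are hypothetical.

```python
def wasted_share(n: int) -> float:
    """Fraction of the n + 1 blocks read that were not needed: 1 / (n + 1)."""
    return 1 / (n + 1)

def extra_work(n: int) -> float:
    """Extra blocks relative to the aligned n-block case: 1 / n."""
    return 1 / n

for n in range(1, 10):
    print(f"n={n}: wasted={wasted_share(n):.1%} extra_vs_aligned={extra_work(n):.1%}")
```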

    Biggest return will be on high-I/O VMs with _small_ average reads/writes.

    The “priv set diag; stats show lun; priv set admin” commands that TR-3747 mentions for quantifying the performance hit do not apply to NFS datastores…
    Thanks

  12. Fletch

    Wow, here is a great analysis from the overall system performance perspective:

    http://www.princeton.edu/~unix/Solaris/troubleshoot/diskio.html

    I used DTrace to analyze what is probably one of my busiest VMs (a Solaris 10 mail server running sendmail, mailman and spamassassin). bitesize.d from the DTraceToolkit produced this histogram of I/O sizes (size in bytes, then count):

    8192 |@@@@@@@@@@@@@@@@ 385
    16384 |@@@@@@@@@@ 258
    32768 |@@@@@@ 160
    512 |@@@@@@@@@@@@@@@@@@@@@ 154
    65536 |@@@ 73
    1024 |@@@@@@@@ 61
    4096 |@@ 44

    So in the case of this VM the most frequent I/O size is 1-2 4KB blocks (call it 2 blocks) – that means if it’s misaligned on the NetApp NFS datastore, a typical I/O reads 3 blocks instead of 2, so on average 1/3 of the blocks it reads are extra (33%).
    Next question: how much does this actually add to the overall operation?
    Certainly from the _NetApp_ perspective this could be considered an egregious cost in terms of wasted effort – but from the system side I’d like to know how much more elapsed time a misaligned 2-block read takes compared with an aligned 2-block read…
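Weighting the histogram by blocks, rather than by the most common I/O size, gives a different figure, because the larger I/Os dilute the single extra block. A minimal Python sketch under the same n + 1 assumption: every I/O is rounded up to whole 4KB blocks, and misalignment adds exactly one block to each. That assumption overstates the cost for sub-4KB I/Os, which will not always cross a block boundary.

```python
import math

BLOCK = 4096  # NetApp 4KB block

# I/O size in bytes -> count, copied from the bitesize.d histogram above
histogram = {8192: 385, 16384: 258, 32768: 160, 512: 154, 65536: 73, 1024: 61, 4096: 44}

ios = sum(histogram.values())
# Assumption: an aligned I/O occupies ceil(size / 4KB) blocks...
aligned_blocks = sum(math.ceil(size / BLOCK) * count for size, count in histogram.items())
# ...and misalignment adds exactly one block to every I/O (the n + 1 model).
misaligned_blocks = aligned_blocks + ios

print(ios, aligned_blocks, misaligned_blocks)
print(f"wasted share: {ios / misaligned_blocks:.1%}")
```

For this histogram, roughly 20% of the blocks read would be the extra one, compared with 33% for a typical 2-block I/O on its own.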

  13. Chris W.

    What about us NFS-datastore users..?

  14. slowe

    What about NFS users? VMDK alignment is still necessary, but the underlying FlexVol needs no alignment. All the tools work the same way, as far as I know. I personally have used mbrscan/mbralign on NFS datastores, so I know they work.

  15. kyle

    Could someone tell me how to get my hands on the mbralign tools for ESX? I am not a NetApp customer :)

    NetApp – good work. This really makes me want to consider NetApp when the lease on our SAN is up. We love HDS, but they don’t seem to know a lot about VMware. NetApp seems to, though.

  16. Kyle Tucker

    FYI, we are a NetApp shop that just moved our VMs to Sun 7000 series NAS. The mbrscan/mbralign tools work well on both platforms, so I suspect they work with any NFS datastore.

  17. Bug

    What about non-virtualized environments? We have a RHEL box with an iSCSI LUN from a NetApp filer and are seeing very poor performance. Tech support referred us to this document. Is there a non-destructive way to fix alignment on a production iSCSI LUN?

  18. Darren DeHaven

    For clarification:
    - NetApp 4KB Logical Blocks
    - Dell EqualLogic 64KB Strips
    - EMC Symmetrix 64KB Blocks

    So, if you are only a NetApp shop, for now and forever, then your starting offset and formatting block size need to be divisible by 4KB (such as 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, etc.).
    However, if you use other SANs, such as Dell or EMC, then they need to be divisible by 64KB (such as 64KB, 128KB, 256KB, etc.).

    In Windows, I set the starting offset to 1MB and use a 64KB block size.
    For Linux I use a 1MB offset and format with the “largefile” usage type, such as “mkfs.ext3 -T largefile ”
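A minimal Python sketch of this rule: it checks a starting offset against the granularities listed above, and shows why a 1MB offset satisfies all three while a 32KB offset suits only the 4KB case. The sysfs helper assumes the standard Linux layout, where /sys/block/<disk>/<partition>/start reports the partition start in 512-byte sectors; the device names in it are hypothetical. It only detects misalignment; it does not fix it.

```python
from pathlib import Path

KB = 1024
# Array granularities listed in the comment above
GRANULARITY = {
    "NetApp (4KB blocks)": 4 * KB,
    "Dell EqualLogic (64KB strips)": 64 * KB,
    "EMC Symmetrix (64KB blocks)": 64 * KB,
}

def aligned_on(offset_bytes: int) -> dict:
    """Map each array to whether a partition starting at offset_bytes is aligned for it."""
    return {array: offset_bytes % size == 0 for array, size in GRANULARITY.items()}

def linux_partition_offset(disk: str, part: str) -> int:
    """Read a partition's start from sysfs (in 512-byte sectors) and return it in bytes.

    Hypothetical usage: linux_partition_offset("sdb", "sdb1")
    """
    start_sectors = int(Path(f"/sys/block/{disk}/{part}/start").read_text())
    return start_sectors * 512

for label, offset in [("1MB", 1024 * KB), ("32KB", 32 * KB)]:
    print(label, aligned_on(offset))
```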
