vSphere on NFS Design Considerations Presentation

This presentation is one that I gave at the New Mexico, New York City, and Seattle VMUG conferences (this specific deck came from the Seattle conference, as you can tell by the Twitter handle on the first slide). The topic is design considerations for running vSphere on NFS. This isn’t an attempt to bash NFS, but rather to educate users on the things to avoid if you’re going to build a rock-solid NFS infrastructure for your VMware vSphere environment. I hope that someone finds it useful.

My standard closing statement goes here: your questions, thoughts, corrections, or clarifications (always courteous, please!) are welcome in the comments below.


  1. Jason Boche

    Very nice, Scott. For two years I managed a large Tier 1 VMware on NFS shop (500 hosts, 25 vCenter Servers). Designed right, with the proper care and feeding, NFS is both a viable and scalable option. That’s not to say it doesn’t have its own set of unique headaches, which range from design to implementation to troubleshooting.

  2. Ryan Russell

    Just configure a different NAS server IP for each datastore you’d like to talk to. You get N interfaces of aggregate bandwidth that way. It won’t be balanced evenly between them, but it works fine. (I had this set up at my previous job.)

  3. slowe

    Jason, thanks for the feedback! I agree completely—a well-designed NFS infrastructure can be rock-solid, but it takes good design and a knowledge of the potholes to get you there. Sounds like you did a great job!

    Ryan, that will (usually) work fine if you are using link aggregation, which will give you greater aggregate bandwidth but not—as I pointed out in the presentation—greater per-flow/per-datastore bandwidth. I say “usually” because it all depends on whether the IP addresses you’ve selected will hash out to multiple links (they might not). This configuration won’t help for configurations using the default vSwitch/dvSwitch load balancing policy. Thanks for your comment!
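
    As a rough way to check, here’s a small Python sketch, assuming the commonly documented IP-hash calculation (XOR the source and destination IPv4 addresses, then take the result modulo the number of active uplinks); the VMkernel and NAS addresses below are made-up examples:

        # Sketch of the "route based on IP hash" uplink selection as it is
        # commonly described: XOR the source and destination IPv4 addresses
        # and take the result modulo the number of active uplinks.
        import ipaddress

        def ip_hash_uplink(src_ip: str, dst_ip: str, num_uplinks: int) -> int:
            src = int(ipaddress.IPv4Address(src_ip))
            dst = int(ipaddress.IPv4Address(dst_ip))
            return (src ^ dst) % num_uplinks

        vmk_ip = "192.168.10.21"                      # hypothetical VMkernel port
        nas_ips = ["192.168.10.51", "192.168.10.52"]  # hypothetical NAS aliases
        for nas in nas_ips:
            print(nas, "-> uplink", ip_hash_uplink(vmk_ip, nas, num_uplinks=2))
        # If both NAS addresses hash to the same uplink, the second IP buys you
        # nothing for that host; pick addresses that land on different links.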

  4. Dan

    A company I work for wants to start using NFS and asked me to design the needed network connections.

    I am in no way a VMware admin, so I thought it would be an easy task. Thanks for the post; now I know it is not that simple.

  5. Kevin Golding

    Excellent NFS 101, Scott. In reference to Ryan: if your NAS appliance can support sub-interfaces or alias addresses on an aggregated link, connecting to different datastores via different VMkernel ports/subnets works a treat.

    The problem with NFS has always been the lack of load balancing. That is going to change, and as soon as it does I can see iSCSI datastores becoming less common.

  6. slowe

    Dan, it’s not that NFS isn’t simple—it just has its own set of considerations. The same could be said for iSCSI (hardware or software) and Fibre Channel (traditional or FCoE). Good luck!

    Kevin Golding, all block protocols still have one advantage over NFS, and that’s support for RDMs. While many view RDMs as a necessary evil, if there are applications that require RDMs you’ll be thankful for the ability to support those applications. Coming back to the idea of using multiple interfaces on your NAS appliance, keep in mind that using multiple interfaces only increases aggregate throughput to the NAS appliance, not per-datastore bandwidth, and only helps the ESX/ESXi hosts if you are using link aggregation.

  7. roadrunner

    The slides are blocked by EMC IT, sigh…

  8. Jerry

    Scott,

    Very useful and timely presentation. It was great to see it here in Seattle.

    Regarding jumbo frames, if I captured your thoughts accurately during the presentation, you effectively explained:
    - 10Gb CNAs typically mask the performance gains from jumbo frames; the benefits are minimal.
    - Jumbo frames are beneficial mainly to low-IOPS, large-payload applications and instances.
    - Roughly 95% of applications won’t realize a benefit from jumbo frames.

    What are your thoughts on UCS implementations, where:
    - Connectivity is 10Gb.
    - All vmnics funnel through a set of uplink connections to the physical switches, which effectively requires all interfaces to be configured for jumbo frames.
    - SQL Server best practices (per the vBCA video) recommend jumbo frames.

    I am attempting to define a solid Reference Architecture for UCS implementations I am involved in. Your views and input would be greatly appreciated!

    Thanks – Jerry

  9. slowe

    Jerry, my comments regarding jumbo frames are *general* recommendations. Ultimately, the “best practices” for any given environment will depend upon the people, the products, the applications, and the business. If Microsoft recommends the use of jumbo frames for SQL, then use jumbo frames. No problem. My point is that—in general—most applications and most workloads won’t benefit significantly, and therefore it isn’t worth the extra complexity. Every environment is different—do what is best for your environment.
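
    For a sense of why the raw gains are small, here’s a back-of-the-envelope sketch comparing payload efficiency at a standard 1500-byte MTU versus a 9000-byte jumbo MTU, assuming plain IPv4/TCP with no options and ignoring per-packet CPU cost (which the offloads in 10Gb CNAs largely hide anyway):

        # Payload efficiency for standard vs. jumbo frames, assuming
        # IPv4 (20 bytes) + TCP (20 bytes) inside each frame and
        # 18 bytes of Ethernet header/FCS on the wire.
        ETH_OVERHEAD = 18
        IP_TCP_OVERHEAD = 40

        for mtu in (1500, 9000):
            payload = mtu - IP_TCP_OVERHEAD
            frame = mtu + ETH_OVERHEAD
            print(f"MTU {mtu}: {payload / frame:.1%} payload per frame on the wire")
        # MTU 1500: ~96.2%; MTU 9000: ~99.4%. Only a few percent of raw
        # bandwidth, which is why most applications never notice the difference.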

    Thanks for your comment!

  10. Collin C MacMillan

    Scott:

    I’m finishing part 1 in a series on NFS deployment in the SMB space, and your slide deck popped up as a reference a couple of times. Your overview succinctly hits the high points, and since I didn’t hear the preso I can assume you elaborated on a couple of IP-hash-related issues:

    1) hashing related to switches (typically layer-2);
    2) hashing related to NAS (layer-2 to layer-4).

    vSphere has no deterministic influence on return-path choice (other than how it advertises and withdraws its MAC and sources its vmknic IPs), but the elements listed above do, and different switch/NAS vendors use different algorithms to determine how deeply the hashing goes. For instance, low-end switches will likely use (or be limited to) layer-2 hashes, while Linux and Solaris derivatives (NAS) will use layer-3 or layer-4 hashing by default (multiplexing datastore traffic per session/TCP data port).

    Higher-end switches will often use layer-3 hashes, with more advanced ones using layer-4 (hashing beyond layer 4 has dubious value for NFS). I have not seen any studies on the importance, risks, or performance factors related to hash alignment where storage is concerned, although the big ones (out-of-order delivery and latency) can be assumed to be affected to varying degrees. These factors could shift switch vendor and/or model selection for heavy NFS shops and could imply performance issues at the network layer based on NFS volume/loads (i.e., NFS packet rate increasing switch packet-inspection overhead).
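
    To make the hash-depth point concrete, here is a simplified sketch (not any vendor’s actual algorithm) of how the fields fed into the hash change at each layer, and why only a layer-4 hash has a chance of spreading two NFS sessions between one host and one NAS address; the MACs, IPs, and ports below are made up:

        # Simplified illustration of hash depth: with one host talking to one
        # NAS address, an L2 or L3 hash pins every datastore to the same link,
        # while an L4 hash can spread separate NFS TCP sessions.
        from dataclasses import dataclass

        @dataclass
        class Flow:
            src_mac: str
            dst_mac: str
            src_ip: str
            dst_ip: str
            src_port: int
            dst_port: int = 2049  # NFS

        def uplink(flow: Flow, depth: str, num_links: int = 2) -> int:
            if depth == "l2":
                key = (flow.src_mac, flow.dst_mac)
            elif depth == "l3":
                key = (flow.src_ip, flow.dst_ip)
            else:  # "l4"
                key = (flow.src_ip, flow.dst_ip, flow.src_port, flow.dst_port)
            return hash(key) % num_links

        # Two datastore mounts from the same host to the same NAS address,
        # differing only in the client-side TCP port (made-up values).
        flows = [Flow("aa:bb:cc:00:00:01", "aa:bb:cc:00:00:99",
                      "10.0.0.11", "10.0.0.50", src_port=p) for p in (750, 751)]
        for depth in ("l2", "l3", "l4"):
            print(depth, [uplink(f, depth) for f in flows])
        # The l2 and l3 hashes put both flows on the same uplink every time;
        # only the l4 hash can split them across links.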

    Also, the issue of VMware Tools guest OS tuning was absent in the deck, although I came across your valuable input on the topic on Frank Denneman’s blog as well as Jason Boche’s NFS postings, so I have to assume a question or two came up in the preso. This topic is a very important one for my posting, especially where the differences between Windows and Linux treatments are concerned.

    I’ve yet to find a VMware source that explains why Windows timeout adjustments per VMware Tools are still 60 seconds and Linux timeouts are 180. Your recommendation was 125 seconds back in 2009, but have you shifted your guidance to agree with Jason Boche and NetApp at 190 seconds? Likewise, as an EMC’r with closer ties to VMware now, do you have any insight into the discrepancy between Linux and Windows defaults per VMware Tools?
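
    For anyone following along, the values in question live in the guest OS itself: Windows guests keep the SCSI disk timeout in the registry at HKLM\SYSTEM\CurrentControlSet\Services\Disk\TimeOutValue, while Linux guests expose it per device under /sys/block. A minimal sketch for checking the Linux side (hypothetical helper, Linux guests only):

        # List the current SCSI disk timeout (in seconds) for each sd* device
        # on a Linux guest, to compare against the 60/125/180/190-second
        # guidance discussed above.
        from pathlib import Path

        def linux_disk_timeouts():
            for dev in Path("/sys/block").glob("sd*"):
                timeout_file = dev / "device" / "timeout"
                if timeout_file.exists():
                    yield dev.name, int(timeout_file.read_text())

        for name, seconds in linux_disk_timeouts():
            print(f"{name}: {seconds}s")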

    Thanks for all you do in support of the VMware community – it might be a job now, but your passion still shows!
