Optimizing iSCSI Traffic with ESX

A response in this VMTN forums thread by Paul Lalonde got me thinking about iSCSI traffic, network designs, and the software initiator provided with ESX Server.  The statement was this (in response to questions about how ESX uses network links to communicate with an iSCSI storage array):

In a single server environment, 802.3ad would only offer failover. A single ESX box would only ever use one network path for iSCSI traffic.

In my lab, I’ve set up a Network Appliance storage system with a virtual interface (a “VIF” in NetApp parlance), which is essentially 802.3ad link aggregation (in fact, newer versions of Data ONTAP can use LACP to build link aggregates).  On the ESX side, I’ve created Gigabit EtherChannels and configured the vSwitches to use IP hash load balancing, with the thought that this would help improve network utilization.  But after reading that statement (and following up on some other related threads; see these del.icio.us bookmarks), I started wondering if there was a better way to architect the network for iSCSI traffic from ESX Server.
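
As a rough illustration of why one initiator talking to one target IP tends to stay on a single link, here is a minimal Python sketch of an IP-hash style uplink choice. The hash used (XOR of the two 32-bit addresses, modulo the number of uplinks), the function name `pick_uplink`, and the addresses are all assumptions made for illustration; the actual algorithms in ESX Server and in the physical switch may differ.

```python
import ipaddress


def pick_uplink(src_ip: str, dst_ip: str, num_uplinks: int) -> int:
    """Return the index of the uplink a (source, destination) IP pair maps to.

    Assumed hash for illustration: XOR of the two 32-bit addresses,
    modulo the number of uplinks in the team.
    """
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % num_uplinks


# Hypothetical addresses: one ESX VMkernel port and one iSCSI target.
esx_vmk = "192.168.10.11"
target = "192.168.10.50"

# Every frame in this conversation carries the same address pair, so the
# hash (and therefore the chosen uplink) never changes.
choices = {pick_uplink(esx_vmk, target, 2) for _ in range(1000)}
print(choices)
```

Under these assumptions the set always contains exactly one uplink index, which is consistent with the "failover only" behavior described in the quoted statement.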

I have some ideas, and have already started working on implementing and testing those ideas in the lab.  As soon as I have more information, I’ll share it here.  In the meantime, any iSCSI gurus out there care to share their network designs for optimizing ESX-iSCSI traffic?


  1. Adam Sherman

    Hold on, I think I’m missing the point here. IEEE 802.3ad (otherwise known as LACP) provides full-duplex aggregated links using multiple physical links between two devices.

    So, for example, I have a NetApp connected to a ProCurve switch using two GigE copper connections. This provides me with 2 Gbit/s (4 Gbit/s if you count full-duplex operation) of bandwidth between these devices. I do exactly the same thing with my Solaris servers. (Solaris calls them “aggregate interfaces”.)

    Note that the ONTAP “vif” command doesn’t create 802.3ad aggregates by default and that you cannot do this if you are using two switches for redundancy (unless they are really shiny, expensive switches).

    Am I not getting the gist of what you are saying?

    Thanks,

    A.

    P.S. I would appreciate an email if you update this post… :-)

  2. slowe

    Adam,

    Part of this discussion is specific to ESX Server’s iSCSI implementation, which is an open source version of Cisco’s Linux iSCSI initiator. With regard to ESX Server in particular, the software iSCSI initiator will not take advantage of multiple links between the host and the storage system, even if you are using LACP/802.3ad/Gigabit EtherChannel on both the server and the storage system. This article was primarily targeted at seeing if there are some ways to get around this ESX-specific limitation.

    On a broader scale, it is my understanding (and I could be incorrect) that all link aggregates (whether they be PAgP, LACP/802.3ad, or Fast/Gigabit EtherChannel) only serve to enhance traffic flows in a many-to-many/one-to-many/many-to-one scenario. Multiple iSCSI initiators connecting to a single iSCSI target (a NetApp, for example) would benefit from a link aggregate (or a multi-mode VIF, in NetApp terminology) because the switch and the storage system could spread the traffic across all the links in the aggregate. However, any traffic flow between a single server and the storage system could not exceed the bandwidth of a single link in the aggregate. Likewise, a Solaris server providing services to multiple clients would benefit from an aggregate interface because the traffic from all the clients can be spread across the individual links. Each individual client, however, would be limited to the bandwidth of a single member of the aggregate.

    NetApp multi-mode VIFs *can* use 802.3ad/LACP as of Data ONTAP 7.2, if I’m not mistaken.

    I hope this makes sense to you. Thanks!

  3. Don Williams

    One word of caution: if you use IP HASH load balancing and have non-stacked (trunked) switches, or if you have any stand-by pNICs, you’ll be very unhappy. Performance and stability problems have been seen in such configurations.

    The best way I’ve seen to get both paths working is to have multiple volumes on your iSCSI SAN.

    Take care.

    Don

  4. slowe

    Don,

    Even multiple volumes (LUNs) on the iSCSI SAN don’t seem to make any difference, at least not as far as I could tell.

    About the other…is this documented anywhere, even a VMTN forums thread? I’d love to see some additional information on this. Thanks!

  5. Don Williams

    Hello,

    How are you testing the results? Did you change the teaming mode to IP HASH? Customers who have made the change notice an ‘improvement’; I usually see the result in the network stats, which show better utilization of both links.

    It’s not documented anywhere that I know of. I found it by experimentation and observation, then passed it to customers, who reported good results. The one exception was a customer who had trunked switches and stand-by pNICs. Then it fell down. :-(

    Don

  6. slowe

    Don,

    I haven’t created a formal, quantitative testing plan; it’s more “off the cuff.” :)

    I do have the vSwitches configured for IP hash teaming mode, but thus far haven’t seen any real difference in the various configurations I’ve tried. I intend to conduct more testing, more fully documenting each of the test scenarios, and then I’ll post more results here.

  7. Don Williams

    Hello,

    vSwitches? The only one that would likely see improvement is the VMKernel port vSwitch. Non-iSCSI traffic seems to do fine with the default setting. There should be one vSwitch that has the VMKernel port, with teamed pNICs in IP HASH teaming. Then use multiple (two is fine) EQL volumes. You should see good activity on both pNICs.

    Don

  8. Andrew Miller

    Actually, just due to how etherchannel works, a single MAC address’s traffic can only ever go across a single gig port on the etherchannel.

    So….if you have 1 ESX server with a dual gigabit etherchannel going to a NetApp, you have fault tolerance (one gig link can go down with no problems) but no bandwidth increase.

    If you have 2 ESX servers with a dual gigabit etherchannel going to a NetApp, each server can use up to a full gigabit link (so you don’t really get a bandwidth increase per server but do keep the servers from fighting over bandwidth).

  9. Andrew Miller

    And…to provide a useful anecdote…. :-)

    In our scenario, I currently have 3 ESX servers (pretty beefy, 12 or 24 GB RAM, 4 or 8 core). For the iSCSI link, I have a 2 port gig etherchannel setup going to a Cisco 3750 switch. That switch then has a dual gig etherchannel going to a NetApp 3050 clustered head (one dual gig etherchannel to each head actually).

    The Cisco 3750 has an RPS on it to give it dual power supplies (one AC, one DC).

    So far it’s been very stable.

  10. slowe

    Andrew,

    You’re absolutely correct, of course; I pointed that out in my comment earlier (http://blog.scottlowe.org/2007/06/26/optimizing-iscsi-traffic-with-esx/#comment-32537), but probably didn’t state it clearly enough. Thanks for the clarification!

    The other side of this is specific to the ESX iSCSI initiator. To get around the limitation with Gigabit EtherChannel (or NetApp VIFs), the idea would be to create multiple iSCSI targets (with different IP addresses) so that the software iSCSI initiator would more effectively use multiple links. EtherChannel is great, as you’ve pointed out, for helping to keep multiple connections from fighting over bandwidth. I was trying to tackle more effective utilization from a single-server perspective.

    Thanks for your comment!

  11. vmware training

    Hi Scott

    I just wanted to say, I stumbled upon your blog via a google search for vmware. I’d just like to say thanks for creating it and it looks as though you have a real good, helpful community here too.

    I’d also like to say I’ve learned a good few things myself from reading your posts, keep up the good work it’s rather inspiring.

    Kind regards

    Scott

  12. Albert Widjaja

    Hi Scott,

    For your info, I’d like to share my hard to believe experience in configuring my iSCSI SAN with you here:

    http://img38.imageshack.us/img38/1397/deployment.jpg

    The MD3000i is just a small entry-level SAN device which can only use one single cable to access the iSCSI target, so no matter how complex the configuration is, the I/O performance will not be as great as it would be with a managed switch added to perform VLAN trunking.

    According to the following blog:
    http://virtualgeek.typepad.com/virtual_geek/2009/01/a-multivendor-post-to-help-our-mutual-iscsi-customers-using-vmware.html –> the last question #4 is the eye opener

    So by using the deployment diagram that I supplied above, I have to accept that it is not possible to achieve performance greater than a single cable connection :-| due to the limitation of the ESX Software iSCSI initiator. Even with the Intel Pro 1000 TOE-enabled pNIC, it’s the same slow result.

    hope that helps you in the future,

    I feel bad after spending this much money without getting any better performance than my local server’s RAID-5 SATA drives :-|

  13. dan

    I have a similar setup to yours, Scott, with a single ESX host (6 NICs) and a single SAN (4 NICs). I set up the SAN with 2 gigabit NICs using LACP going to a Cisco 3750 with LACP, and my ESX server uses 2 NICs and IP hash to the Cisco switch using LACP. Anyway, ESX is only sending and receiving on the iSCSI network using one of the NICs.

    I suppose I can assign the SAN NICs different IPs and set up multipathing… I don’t know if that will provide a performance increase though…
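
The idea raised in the comments of presenting multiple iSCSI targets on different IP addresses can be sketched in the same spirit. The Python below is only an illustration: the hash (XOR of the two 32-bit addresses, modulo the number of uplinks), the helper names, and the addresses are assumptions, not the documented behavior of ESX Server or of any particular switch.

```python
import ipaddress


def pick_uplink(src_ip: str, dst_ip: str, num_uplinks: int) -> int:
    """Assumed IP-hash: XOR of both 32-bit addresses, modulo the uplink count."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % num_uplinks


def uplinks_used(src_ip: str, target_ips: list[str], num_uplinks: int) -> set[int]:
    """Return the set of uplink indexes reached across all target addresses."""
    return {pick_uplink(src_ip, target, num_uplinks) for target in target_ips}


# Hypothetical addresses for one ESX VMkernel port and the storage targets.
esx_vmk = "192.168.10.11"

one_target = ["192.168.10.50"]
two_targets = ["192.168.10.50", "192.168.10.51"]
unlucky_targets = ["192.168.10.50", "192.168.10.52"]

print(uplinks_used(esx_vmk, one_target, 2))       # a single address pair
print(uplinks_used(esx_vmk, two_targets, 2))      # addresses that hash apart
print(uplinks_used(esx_vmk, unlucky_targets, 2))  # addresses that hash together
```

With this assumed hash, a second target address only helps if it maps to a different uplink than the first, so the choice of addresses matters. Whether the software initiator then actually drives traffic to both targets at once is a separate question that the sketch does not answer.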