A response in this VMTN forums thread by Paul Lalonde got me to thinking about iSCSI traffic, network designs, and the software initiator provided with ESX Server. The statement was this (in response to questions about how ESX uses network links to communicate with an iSCSI storage array):
In a single server environment, 802.3ad would only offer failover. A single ESX box would only ever use one network path for iSCSI traffic.
In my lab, I’ve set up a Network Appliance storage system with a virtual interface (a “VIF” in NetApp parlance), which is essentially 802.3ad link aggregation (in fact, newer versions of Data ONTAP can use LACP to build link aggregates). On the ESX side, I’ve created Gigabit EtherChannels and configured the vSwitches to use IP hash load balancing, with the thought that this would help improve network utilization. But after reading that statement (and following up on some other related threads; see these del.icio.us bookmarks), I started wondering if there was a better way to architect the network for iSCSI traffic from ESX Server.
I have some ideas, and have already started working on implementing and testing those ideas in the lab. As soon as I have more information, I’ll share it here. In the meantime, any iSCSI gurus out there care to share their network designs for optimizing ESX-iSCSI traffic?
Tags: ESX, iSCSI, Networking, Virtualization, VMware
-
Hold on, I think I’m missing the point here. IEEE 802.3ad (otherwise known as LACP) provides full-duplex aggregated links using multiple physical links between two devices.
So, for example, I have a NetApp connected to a ProCurve switch using two GigE copper connections. This provides me with 2 Gbit/s (4 Gbit/s if you count full-duplex operation) of bandwidth between these devices. I do exactly the same thing with my Solaris servers. (Solaris calls them “aggregate interfaces”.)
Note that the ONTAP “vif” command doesn’t create 802.3ad aggregates by default, and that you cannot do this if you are using two switches for redundancy. (Unless they are really shiny expensive switches.)
Am I not getting the gist of what you are saying?
Thanks,
A.
P.S. I would appreciate an email if you update this post…
-
One word of caution. If you use IP HASH load balancing and have non-stacked (trunked) switches, or any stand-by pNICs, you’ll be very unhappy.
Performance and stability problems have been seen in such configurations. The best way I’ve seen to get both paths working is to have multiple volumes on your iSCSI SAN.
Take care.
Don
-
Hello,
How are you testing the results? Did you change the teaming mode to IP HASH? Customers who have made the change notice an ‘improvement’. I usually see the result in the network stats: better utilization of both links.
It’s not documented anywhere that I know of. I found it by experimentation and observation, and passed it to customers, who reported good results. The exception was one who had trunked switches and stand-by pNICs; there it fell down.
Don
-
Hello,
vSwitches? The only one likely to see an improvement is the vSwitch with the VMkernel port. Non-iSCSI traffic seems to do fine with the default setting. There should be one vSwitch that has the VMkernel port, with teamed pNICs in IP HASH teaming. Then use multiple (two is fine) EQL volumes. You should see good activity on both pNICs.
Don
-
Actually, just due to how EtherChannel works, a single MAC address’s traffic can only ever go across a single gig port on the EtherChannel.
So… if you have 1 ESX server with a dual gigabit EtherChannel going to a NetApp, you have fault tolerance (one gig link can go down with no problems) but no bandwidth increase.
If you have 2 ESX servers with a dual gigabit EtherChannel going to a NetApp, each server can use up to a full gigabit link (so you don’t really get a bandwidth increase per server, but you do keep the servers from fighting over bandwidth).
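To make that concrete, here is a rough Python sketch of hash-based link selection. It is a simplified model (XOR the two addresses, then take the remainder by the number of links), not the exact algorithm of any particular switch or ESX release, and the IP addresses and the `pick_link` name are made up for illustration.

```python
import ipaddress

def pick_link(src_ip: str, dst_ip: str, n_links: int) -> int:
    """Simplified hash model (an assumption): XOR the addresses, modulo link count."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % n_links

# Hypothetical addresses, chosen only for this example.
ESX_A = "10.0.0.11"
ESX_B = "10.0.0.12"
NETAPP = "10.0.0.50"
LINKS = 2  # dual gigabit EtherChannel

# The hash has no random or load-dependent input, so one host/target pair
# is pinned to one link no matter how many frames it sends.
links_used_by_a = {pick_link(ESX_A, NETAPP, LINKS) for _ in range(1000)}
print("links used by ESX A:", links_used_by_a)

# A second host hashes independently and may (or may not) get the other link.
print("link used by ESX B:", pick_link(ESX_B, NETAPP, LINKS))
```

With these particular addresses the two hosts happen to land on different links, but nothing guarantees that; two hosts whose hashes collide would share one link while the other sits idle.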
-
And…to provide a useful anecdote….
In our scenario, I currently have 3 ESX servers (pretty beefy, 12 or 24 GB RAM, 4 or 8 core). For the iSCSI link, I have a 2 port gig etherchannel setup going to a Cisco 3750 switch. That switch then has a dual gig etherchannel going to a NetApp 3050 clustered head (one dual gig etherchannel to each head actually).
The Cisco 3750 has an RPS on it to give it dual power supplies (one AC, one DC).
So far it’s been very stable.
-
Hi Scott
I just wanted to say, I stumbled upon your blog via a google search for vmware. I’d just like to say thanks for creating it and it looks as though you have a real good, helpful community here too.
I’d also like to say I’ve learned a good few things myself from reading your posts, keep up the good work it’s rather inspiring.
Kind regards
Scott
-
Hi Scott,
For your info, I’d like to share my hard to believe experience in configuring my iSCSI SAN with you here:
http://img38.imageshack.us/img38/1397/deployment.jpg
The MD3000i is just a small entry-level SAN device which can only use one single cable to access the iSCSI target, so no matter how complex the configuration is, the I/O performance will not be as good as what you would get by adding a managed switch to perform VLAN trunking.
According to the following blog:
http://virtualgeek.typepad.com/virtual_geek/2009/01/a-multivendor-post-to-help-our-mutual-iscsi-customers-using-vmware.html (the last question, #4, is the eye-opener)
So with the deployment diagram that I supplied above, I have to accept that it is not possible to achieve performance greater than a single cable connection, due to the limitation of the ESX software iSCSI initiator. Even with the Intel Pro 1000 TOE-enabled pNIC, it’s the same slow result.
Hope that helps you in the future. I feel bad after spending this much money without getting any better performance than my local server’s RAID-5 SATA drives.
-
I have a similar setup to yours, Scott, with a single ESX host (6 NICs) and a single SAN (4 NICs). I set up the SAN with 2 gigabit NICs using LACP going to a Cisco 3750 with LACP, and my ESX server uses 2 NICs with IP hash to the Cisco switch, also using LACP. Anyways, ESX is only sending and receiving on the iSCSI network using one of the NICs.
I suppose I can assign the SAN NICs different IPs and set up multipathing… I don’t know if that will provide a performance increase, though…
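For anyone wondering why separate target IPs might matter with IP hash teaming, here is a small Python sketch. It uses a simplified XOR-and-modulo model of the hash, which is an assumption rather than the documented behaviour of any specific ESX build, and every address and function name in it is hypothetical. It only models which uplink gets chosen; it says nothing about whether the software initiator will actually drive both paths.

```python
import ipaddress
from collections import defaultdict

def pick_uplink(src_ip: str, dst_ip: str, n_links: int) -> int:
    """Simplified IP-hash model (an assumption): XOR the addresses, modulo uplink count."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % n_links

def spread(src_ip: str, target_ips: list[str], n_links: int) -> dict[int, list[str]]:
    """Group target IPs by the uplink the model would choose for each one."""
    by_link: dict[int, list[str]] = defaultdict(list)
    for target in target_ips:
        by_link[pick_uplink(src_ip, target, n_links)].append(target)
    return dict(by_link)

VMKERNEL_IP = "192.168.50.10"  # hypothetical VMkernel port address

# One target IP: every iSCSI frame hashes to the same uplink.
one_target = spread(VMKERNEL_IP, ["192.168.50.20"], 2)
print(one_target)

# Two target IPs: the model can place them on different uplinks...
two_targets = spread(VMKERNEL_IP, ["192.168.50.20", "192.168.50.21"], 2)
print(two_targets)

# ...but a poorly chosen pair can still collide on one uplink.
colliding = spread(VMKERNEL_IP, ["192.168.50.20", "192.168.50.22"], 2)
print(colliding)
```

In this model the choice of target addresses decides whether the second uplink is ever used, so it would be worth checking the per-NIC network stats after any change rather than assuming the traffic is balanced.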



