More on vSwitch Load Balancing

I had a customer contact me about scaling network throughput when using NFS datastores. Specifically, this customer was interested in knowing if it was possible to utilize more than 1 NIC with IP-based storage. The customer is currently using link aggregation (EtherChannel on a Cisco switch). I pointed the customer to my post on NIC utilization, in which I explain the prerequisites for utilizing more than 1 NIC in this sort of configuration. To refresh your memory, those prerequisites are:

  • The vSwitch must be configured for “Route based on IP hash”
  • The physical NICs connected to the vSwitch as uplinks must all be configured as active in the failover order
  • The physical switch must be configured for link aggregation
  • There must be multiple, unique source-destination IP address pairs involved

The customer responded with a question (which I’m paraphrasing here): “That’s all? It will just automatically use more than one link?”

Well…sort of.

There is one little caveat. Cisco IOS uses a hashing algorithm to determine which link a particular traffic flow between a source and destination will use. This algorithm is controlled by the port-channel load-balance command. Assuming that you’re using source-destination IP hashing, that means the Cisco switch will use a hash of the source IP address and the destination IP address to determine which link it will use. This page has more detailed information.

It’s theoretically possible, based on the number of links in the port channel, that some traffic flows between different pairs of source-destination IP addresses might end up on the same link. That means it’s not necessarily just as simple as setting up multiple NFS exports or iSCSI targets on different IP addresses—you also need to know if the IP addresses you are using will actually result in the traffic being distributed across the links.

How does one tell? Good question, and one I’m glad you asked. You can tell using this command (this command assumes you are using IP-based hashing):

switch# test etherchannel load-balance interface <Port channel interface> ip <Src IP Addr> <Dst IP Addr>

So, let’s say that you have an ESX/ESXi host with a VMkernel interface whose address is 172.16.5.10. Let’s say that you have a storage array (NetApp FAS, EMC Celerra, etc.) that supports NFS, and you want to mount two different NFS exports on two different IP addresses so that traffic from this ESX/ESXi host to the storage array is spread across more than one link. You could use the test etherchannel load-balance command to help you determine which addresses would actually distribute traffic across the links:

switch# test etherchannel load-balance interface Po3 ip 172.16.5.10 172.16.5.100

For more examples of what the output would look like, take a look at this image. This was taken off a Cisco Catalyst 3560G running in my test lab (and yes, the IP addresses have been changed to protect the innocent).

This gives you one way of testing whether your link aggregation configuration would actually use multiple links, or only a single link due to the IP hash calculation. Don’t forget that esxtop can also show you NIC utilization; here’s an example of both uplinks being used in this sort of configuration.
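If you have several candidate target addresses to check, typing the test command for each one gets tedious. Here is a minimal Python sketch that only builds the command strings for you to paste into the switch; it does not talk to the switch. The function name is hypothetical, the address 172.16.5.101 is a made-up second target, and Po3 is simply the port channel from the example above.

```python
def etherchannel_test_commands(port_channel, host_ip, target_ips, reverse=False):
    """Build 'test etherchannel load-balance' command strings.

    reverse=False puts the host first (source) and the target second
    (destination); reverse=True swaps them to check the opposite direction.
    """
    commands = []
    for target in target_ips:
        src, dst = (target, host_ip) if reverse else (host_ip, target)
        commands.append(
            f"test etherchannel load-balance interface {port_channel} ip {src} {dst}"
        )
    return commands


# Hypothetical candidates: the second address is an assumption for illustration.
for line in etherchannel_test_commands(
    "Po3", "172.16.5.10", ["172.16.5.100", "172.16.5.101"]
):
    print(line)
```

Run each generated command on the switch and compare which member interface it reports for each pair.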

Unfortunately, what I can’t tell you right now is what algorithm the vSwitch itself uses to place traffic onto the uplinks. Does it follow the same sort of mechanism as the Cisco switch? I don’t know. If anyone has any information on that, it would be tremendously helpful.

If anyone has any other pertinent information or resources on this topic, please add them to the comments below.

UPDATE: Duncan Epping pointed out that an article by Ken Cline from earlier this year describes the mechanism VMware uses to determine which uplink on a vSwitch will be used. The algorithm performs an XOR operation on the Least Significant Byte (LSB) of the source and destination IP addresses, then takes that result modulo the number of uplinks. Thanks, Duncan and Ken!
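As a rough sketch of the calculation described in the update above, here is a small Python function. It assumes IPv4 addresses, takes "least significant byte" to mean the last octet, and returns a 0-based index. The function name and the idea that the index maps directly onto a particular uplink are illustrative assumptions, not something taken from VMware's code.

```python
import ipaddress


def vswitch_uplink_index(src_ip, dst_ip, num_uplinks):
    """Model of the described hash: XOR the last octets, then take the modulus."""
    if num_uplinks < 1:
        raise ValueError("num_uplinks must be at least 1")
    # .packed is the 4-byte big-endian form, so [-1] is the last octet.
    src_lsb = ipaddress.IPv4Address(src_ip).packed[-1]
    dst_lsb = ipaddress.IPv4Address(dst_ip).packed[-1]
    return (src_lsb ^ dst_lsb) % num_uplinks


# VMkernel address from the example; 172.16.5.101 is a hypothetical second target.
vmk = "172.16.5.10"
print(vswitch_uplink_index(vmk, "172.16.5.100", 2))  # 10 XOR 100 = 110 -> 0
print(vswitch_uplink_index(vmk, "172.16.5.101", 2))  # 10 XOR 101 = 111 -> 1
```

Under this model the two targets land on different uplinks when there are two of them. Whether the physical switch's hash makes the same split for the return traffic is a separate question.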


  1. slowe

    Ah, yes–I see that! Under the “IP Hash Based Load Balancing” section, Ken indicates that the VMware vSwitch uses an XOR operation on the Least Significant Byte (LSB) of both the source and destination IP addresses, then calculates the modulus to determine which uplink should be used. I’ll update the article accordingly.

  2. Niall

    Good article, Scott.

    Just one thing to clarify in your example: you’re showing the src IP (172.16.5.10) as being the ESX/ESXi host’s VMk, so I guess that example looks at how storage traffic is forwarded by the switch to the storage (connected on port-channel3), right?

    If the reader wants to test how traffic might be load shared across the links _to_ the host, then the VMk IP address would be the *destination* parameter to “test etherchannel load-balance”.

    More generally, it’s worth noting that since ESX uses a different decision algorithm than the switch it’s quite possible to get a nice balance of traffic across all NICs going _to_ the storage, but all traffic coming back on a single NIC (or vice-versa). And it gets even more fun when there are multiple bundled connections to the storage, too :-)

    Cheers,
    Niall.

  3. slowe

    Niall,

    Yes, you are correct—you would reverse the IP addresses if you were testing connections from the switch back to the host.

    As for the ESX/ESXi vSwitch using a different algorithm, that is true. Some “back of the napkin” calculations I performed showed the vSwitch’s algorithm produces pretty well-distributed traffic, and the Cisco page to which I linked above shows the traffic distribution across port channels of various sizes. It doesn’t look too terribly likely that you’d end up with the scenario you describe (although it *is* possible).

    Thanks for reading and for your comment!

  4. Nick

    Hi Slowe,

    Great article as usual.

    I’m OK with everything you wrote in your articles, but I would like some clarification on one point, which seems a little confused in all the articles I’ve read so far.

    For example:
    You’ve got 4 pNICs constituting a link aggregate, bound to a vSwitch. Each pNIC uses the same subnet from one storage system.
    What do you bind on the vSwitch for the VMkernel? One VMkernel interface, or four?

    I’ve read the multivendor article on iSCSI, and I’m not really sure what the best solution is.

    For me, one VMkernel interface is enough even for a 4×1 aggregate in that case. I may be wrong; I would just like your advice on this point.

    Cheers,
    Nick.

  5. slowe

    Nick,

    Using multiple pNICs in a link aggregate with a single VMkernel interface won’t do you any good for IP-based storage (NFS or iSCSI) *UNLESS* you have multiple targets (i.e., multiple NFS datastores on different IP addresses, or multiple iSCSI targets). Otherwise, the nature of link aggregation means that you’ll only use a single link out of that link aggregate. The others will go unused!

    I strongly recommend you have a look at this blog post:

    http://blog.scottlowe.org/2008/07/16/understanding-nic-utilization-in-vmware-esx/

    It should help provide some additional information. Thanks!

  6. Nick

    Slowe,

    Thanks for the clarification.

    Actually, I’m on IP-based storage (iSCSI) with 4 targets (different iSCSI targets).

    According to your blog post, “•Each VMkernel NIC will utilize multiple uplinks only if multiple destination IP addresses are involved. Conceivably, you could also use multiple VMkernel NICs with multiple source IP addresses, but I haven’t tested that configuration.”

    I’ll try some kind of benchmark with both configurations (1 vmknic and 4 vmknics), if you haven’t tested that yet, and provide results.

    Cheers,
    Nick.

  7. charlie

    Excellent post as always. The issue for us has always been how to balance traffic across both NICs when going to the same IP. We have 100TB-plus of NFS VM data, and this is our single biggest issue.

  8. Timothy Massey

    I understand why using IP-based storage where the storage only has a single IP address and the host only has a single VMkernel port will only use a single channel in spite of aggregation. What you’ve written as the solution is to use IP-based storage that supports multiple targets.

    But what do you do when you have, e.g., an NFS-based NAS server that only supports a single IP address?