VMware ESX, NIC Teaming, and VLAN Trunking with HP ProCurve

In an earlier article about VMware ESX, NIC teaming, and VLAN trunking, I described what the configuration should look like when using these features with Cisco switch hardware. It’s been quite a popular post, and one I will probably need to update soon.

In this article, I’d like to discuss how to do the same thing, but using HP ProCurve switch hardware. The article is broken into three sections: using VLANs, using link aggregation (NIC teaming), and using both together.

Using VLAN Trunking

To my Cisco-oriented mind, VLANs with ProCurve switches are handled quite differently. With port-based VLANs, individual ports are assigned to one or more VLANs, and a switch port participates in each of those VLANs in either an untagged or a tagged fashion.

The difference here is really simpler than it may seem: the untagged VLAN can be considered the “native VLAN” from the Cisco world, meaning that VLAN tags are not added to packets traversing that port in that VLAN. Putting a port in a single VLAN in untagged mode is essentially equivalent to making that port an access port in the Cisco IOS world. Only one VLAN per port can be marked as untagged, which makes sense if you think about it: without a tag, the switch has no other way to tell which VLAN an incoming frame belongs to.

Any port groups that should receive traffic from the untagged VLAN need to have VLAN ID 0 (no VLAN ID, in other words) assigned.

A tagged VLAN, on the other hand, adds the 802.1q VLAN tags to traffic moving through the port, like a VLAN trunk. If a user wants to use VST (virtual switch tagging) to host multiple VLANs on a single VMware ESX host, then the ProCurve ports need to have those VLANs marked as tagged. This will ensure that the VLAN tags are added to the packets and that VMware ESX can direct the traffic to the correct port group based on those VLAN tags.

In summary:

  • Assign VLAN ID 0 to all port groups that need to receive traffic from the untagged VLAN (remember that a port can only be marked as untagged for a single VLAN). This correlates to the discussion about VMware ESX and the native VLAN, in which I reminded users that port groups intended to receive traffic for the native VLAN should not have a VLAN ID specified.
  • Be sure that ports are marked as tagged for all other VLANs that VMware ESX should see. This will enable the use of VST and multiple port groups, each configured with an appropriate VLAN ID. (By the way, if users are unclear on VST vs. EST vs. VGT, see this article.)
  • VLANs that VMware ESX should not see at all should be marked as “No” in the VLAN configuration of the ProCurve switch for those ports.
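
As a rough sketch of what this summary looks like at the ProCurve CLI, the following assumes the ESX host is plugged into port 5, that VLAN 10 is the untagged VLAN, that VLANs 20 and 30 are the ones ESX should see via VST, and that port 5 had previously been a tagged member of VLAN 40. The port number and all VLAN IDs are made up for illustration, and the exact syntax may vary between ProCurve models and firmware releases, so check it against your switch’s documentation. Lines starting with a semicolon are notes for the reader, not commands to type.

```
; Hypothetical example: port 5 and VLANs 10, 20, 30, 40 are assumptions.
configure
; Untagged VLAN: port groups using it get no VLAN ID on the ESX side.
vlan 10 untagged 5
; Tagged VLANs: each needs a port group with the matching VLAN ID.
vlan 20 tagged 5
vlan 30 tagged 5
; VLAN 40 should not reach ESX, so remove the port from it ("No").
no vlan 40 tagged 5
; Check the resulting port membership of a VLAN.
show vlans 20
```

On the ESX side, the port groups for VLANs 20 and 30 would then carry VLAN IDs 20 and 30, while a port group for VLAN 10 would be left without a VLAN ID.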

Using Link Aggregation

There’s not a whole lot to this part. In the ProCurve configuration, users will mark the ports that should participate in link aggregation as part of a trunk (say, Trk1) and then set the trunk type. Here’s the only real gotcha: the trunk must be configured as type “Trunk” and not type “LACP”.

In this context, LACP refers to dynamic LACP, which allows the switch and the server to dynamically negotiate the links in the bundle. VMware ESX doesn’t take part in LACP negotiation; it supports only a static link aggregate with no negotiation protocol (often loosely called “static LACP”). To get that, users will need to set the trunk type to Trunk.
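
For reference, here is a minimal sketch of the CLI equivalent, assuming the two ESX uplinks are plugged into ports 7 and 8 and the trunk is named Trk1 (both are placeholders). The keyword at the end of the trunk command is the trunk type; treat the exact syntax as something to verify against your model’s documentation. Lines starting with a semicolon are notes for the reader, not commands to type.

```
; Hypothetical example: ports 7-8 and the name Trk1 are assumptions.
configure
; Static trunk with no negotiation protocol: the type ESX needs.
trunk 7-8 Trk1 trunk
; List trunk groups with their member ports and types.
show trunks
```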

Then, as has been discussed elsewhere in great depth, configure the VMware ESX vSwitch’s load balancing policy to “Route based on ip hash”. Once that’s done, everything should work as expected. This blog entry gives the CLI command to set the vSwitch load balancing policy, which would be necessary if configuring vSwitch0. For all other vSwitches, the changes can be made via VirtualCenter.

That’s really all there is to making link aggregation work between an HP ProCurve switch and VMware ESX.

Using VLANs and Link Aggregation Together

This section exists only to point out that when a trunk is created, the VLAN configuration for the members of that trunk disappears, and the trunk must be configured directly for VLAN support. In fact, users will note that the member ports don’t even appear in the list of ports to be configured for VLANs; only the trunks themselves appear.

Key point to remember: apply your VLAN configurations after your trunking configuration, or else you’ll just have to do it all over again.
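
Put together, the order of operations might look like the following sketch, which assumes ports 7 and 8, a trunk named Trk1, an untagged VLAN 10, and tagged VLANs 20 and 30 (all placeholders chosen for illustration). Note that the VLAN commands name the trunk rather than its member ports. Lines starting with a semicolon are notes for the reader, not commands to type.

```
; Hypothetical example: ports, trunk name, and VLAN IDs are assumptions.
configure
; Step 1: create the trunk first.
trunk 7-8 Trk1 trunk
; Step 2: apply the VLAN configuration to the trunk itself.
vlan 10 untagged Trk1
vlan 20 tagged Trk1
vlan 30 tagged Trk1
```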

With this information, users should now be pretty well prepared to configure HP ProCurve switches in a VMware ESX environment. Feel free to post any questions, clarifications, or corrections in the comments below, and thanks for reading!


  1. Wade H.

    Hi Scott,

    Do you enable flow control on the ProCurve switches? I have had some issues with IP storage, ESX, and flow control on HP ProCurves. My research came up with references to some ProCurves using a poor/non-standard implementation of flow control. Do you have any experience with this?

  2. Francois Menard

    Does this work with ESXi as well?

  3. slowe

    Francois,

    I haven’t tested it with ESXi, but I don’t really see any reason why it wouldn’t work on any vSwitch other than vSwitch0. To change vSwitch0, you generally need command-line access, which of course isn’t possible with ESXi. In addition, you can’t use the Remote CLI because of network connectivity.

    Otherwise, it should work fine.

    Good luck!

  4. Stig

    Francois,

    I currently have it running with ESXi and it works just beautifully. You can change the VLAN of vSwitch0 using the configuration menu on the ESXi host.

    Happy hunting.

  5. Mark Masson

    Can you do link aggregation across multiple switches? In my scenario I am looking at an HP server with 8 NICs, 4 for applications and 4 for storage. Using two HP ProCurve 3500yl switches, can I create one team for the application ports and spread them out over the two switches? I know you can with Cisco when the switches are stacked, but I don’t know about doing this with HP.

  6. Josh Finn

    Mark,

    Though I haven’t tried it yet, I think you can “mesh” the two switches. Once that is done, I believe you can perform functions across switches.

  7. andrew young

    Mark,

    You need to buy a Premium License for both 3500yl switches; then you can use VRRP.

  8. Dave Dunn

    We have a Dell R900 with 4 onboard Broadcom NICs. We need to know how to disable flow control on those NICs. I am not a Linux/Unix guy, so please provide a command that would allow this. Thank you in advance.

  9. John Smer

    I just got ESXi set up, bought a Procurve 1800-8G as a result of this post, but the setup directions are still unclear to me.

    I have an ESXi setup with 2 Intel Pro 1000 GT NICs. They’re both set up as VSwitch0, and they appear to be working well.

    I’m still not entirely clear on what I must do on the HP ProCurve 1800-8G side in order to get it set up correctly. I’ve tried turning on Trunk1 and associating it with ports 7 and 8 (where the 2 NICs for the ESXi host are plugged in), and I then lose connectivity with the rest of my network.

    The PC1800-8G setup is back to straight defaults except having jumbo frames turned on. What must be done first? Is it possible to simply post a step-by-step guide?

  10. Jeremy L. Gaddis

    @John Smer

    Just a guess, but check your VLAN configuration on “TrkX” after it is configured.

  11. Jared

    This is a great post, but there is a major GOTCHA which I couldn’t find anywhere else. You MUST add the trunk to EVERY VLAN on which it will receive traffic! So if you want the trunk to receive traffic for VLANs 5, 6, and 7, first set up the trunk, then add it to each VLAN as a tagged member.
    The commands would be (use your own VLAN IDs and trunk names, of course):
    configure
    vlan 5 tagged Trk1
    vlan 6 tagged Trk1
    vlan 7 tagged Trk1

    This article is also extremely helpful
    http://docs.hp.com/en/J4240-90039/apds01.html

    I hope I just saved someone hours of work!

  12. Patrick

    Well, I finally took the time today to really learn about VLANs and trunking, ’cause man, for some reason VLANs and trunking confuse me. But thanks to this post it’s a little more clear. Thanks, guys.

  13. Tom Ranson

    I have commissioned ESX implementations with Cisco switching for a number of years and have become accustomed to that way of working; however, the network which I now manage is almost entirely ProCurve based.

    In configuring pNIC teaming between ESX hosts and ProCurve 3500/5400/8200 family devices, I encountered an odd and frustrating issue with link aggregation (or ‘trunking’ as it is referred to in the ProCurve world).

    The issue was that I would configure a 2 port (static - no negotiation protocol) trunk on a ProCurve 5400, i.e.

    # trunk ethernet B20,D20 Trk25 trunk

    I would then configure the appropriate vlans (all tagged) atop of the trunk interface, Trk25 in this example; with vlan 4001 being the service console/management network. The ESX server is configured to expect the service console/management traffic TAGGED (non-default).

    (config)# vlan 4001
    (config-vlan)# tagged Trk25
    (config)# vlan 4002
    (config-vlan)# tagged Trk25

    These two ports would be connected to an ESX host; immediately I would lose all connectivity to the host. However, if I disabled one of the two links (leaving the trunk config in place on the switch), connectivity returned to normal. I double-checked the load balancing configuration on vSwitch0 of the ESX host: it was correctly set to ‘Route based on ip hash’ (non-default), but no dice…

    The issue lay in overriding load balancing settings within the configuration of the ‘Management Network’ of the ESX host. These settings were the defaults (i.e. route based on originating virtual port ID); however, their tick boxes were set to override the configuration of the associated vSwitch! The Management Network appeared to be set to override a fair number of vSwitch configuration options by default, and this was causing all of my issues. Disabling all of these override options (which is appropriate in our configuration), so that the vSwitch configuration options are the only ones considered, resolved the ‘no connectivity’ issues. Obviously (well, now anyway) the Management Network was attempting to load balance in a way that the ProCurve switch could not handle, i.e. by originating virtual port ID.

    To summarise:

    Switch configuration:

    # trunk ethernet B20,D20 Trk25 trunk
    (config)# vlan 4001 < Management Network vlan
    (config-vlan)# tagged Trk25
    (config)# vlan 4002 < Virtual servers VLAN #1 of x
    (config-vlan)# tagged Trk25

    etc. etc.

    ESX host configuration:

    vSwitch0 configuration (‘NIC Teaming’ tab):

    Load Balancing: Route based on IP hash (non-default setting)
    Network Failover Detection: Link Status Only
    Notify Switches: Yes
    Failback: Yes
    Active Adapters: vmnic0, vmnic1
    Standby Adapters: None
    Unused Adapters: None.

    Management Network configuration:
    VLAN ID: 4001
    IP Address: x.x.x.x/yy

    Load Balancing: UNDEFINED - inherit from associated vSwitch
    Network Failover Detection: UNDEFINED - inherit from associated vSwitch
    Notify Switches: UNDEFINED - inherit from associated vSwitch
    Failback: UNDEFINED - inherit from associated vSwitch
    Override vSwitch Failover Order: UNDEFINED - inherit from associated vSwitch

    The same settings would also need to be made for any ‘VM Networks’ - the critical setting being ‘Load Balancing = Route based on ip hash’ or ensure the networks associated with the vSwitch are set to inherit the properties of the vSwitch.

    A default setting which caused me a good couple of hours head scratching. Hope this info will be of use to others.

    Tom

  14. Dario

    Finally I have a working virtual infrastructure lab with some old servers and a few hp 1800 (web managed) switches. But before I start learning other topics I have some questions on how vlans are used in a virtual scenario.
    I’m still a bit confused on where you actually create the vlan.
    From the post it seems that you do it at layer-2 with the port based tagging method, but then, you would need a port for each ingress connection, the switch would tag it and forward it to each corresponding 802.1Q trunk / tagged link.
    Is this correct or am I missing something? Isn’t it like in the real world where you create the vlan at layer-3 and then use the switching infrastructure for forwarding? I’m asking this because you mention that the switch is tagging the packets that will reach the vSwitch, while to me it seems it just forwards packets based on the vlan id it finds in the tag.

    Since we are talking about port-based VLANs, the only case where my switch can modify a packet traveling the trunk is based on that port (PVID). This is the only way I can do it with my (cheap) equipment, but it also makes me wonder if there is a better way of doing it, maybe a layer-3 switch?

    Hope that somebody can clear this up for me.

  15. MurrayJ

    I have a ProCurve 5412zl and tried everything above and cannot get the other VLANs to work.
    Trk1 - Trunk
    vlan 0 (Default) - untagged
    vlan 106 - tagged
    vlan 110 - tagged
    vlan 111 - tagged
    vSwitches only see vlan 0

    What’s wrong?

  16. GregD

    Murray, have you created the VLAN on the virtual networks you want to see the tagged VLANs on? I think that may be necessary; otherwise the virtual networks will only see the untagged VLAN.

    I could be wrong however.

  17. Bryan

    802.1q trunking on ESX 3.X-4.X requires a bit of uncommon knowledge about the network port group configuration. Setting the network port group VLAN ID to 4095 enables 802.1q trunking for any vmnic attached to that port group. If the VLAN ID is set to none, then it is untagged. And, obviously, if set to any other VLAN ID then only that VLAN ID will pass. NIC teaming has no impact upon trunking.

  18. John

    It’s already mentioned above, but does anybody know how to provide load balancing with 2 pNICs, each connected to a different ProCurve switch? I know the 3500yl can do that, but is there a way to trick the 2910 into doing this?

  19. Damo

    I’m setting up my first ESX 4 server (we’re using 2) using an iSCSI SAN with a ProCurve 2910al layer 3 switch, and I’m having some problems with tagging the trunks (so I think).

    Basically the problem just came up when I tried to assign more than one NIC to the vSwitch in my esx1 server. I lost the network connection when I did that. So I’m thinking either I need to set up the ESX server to route based on IP hash (which I’ve read about but haven’t done), or I have tagged the trunks in the wrong VLANs (still trying to get my head out of the Cisco clouds and into the ProCurve ones).

    So here’s how I set up the ProCurve switch:
    Port 1&2 are the management ports for the SAN which is Vlan 102
    Port 3-6 are the iSCSI ports for the SAN which is Vlan 101
    Port 7-12 are trunked together as Trk1 and tagged in all Vlans (101,102,110 & 10)
    Port 13-18 are trunked together as Trk2 and tagged in all Vlans (101,102,110 & 10)
    I also have a workstation Vlan which is Vlan 110 and a servers Vlan which is Vlan 10
    I have flow control setup on the trunked ports (not sure if this needs to be set or not)
    IP Routing is enabled

    Can someone please tell me what I’m doing wrong? I’m more of a network guy than a virtual guy, and with my little (I have some) virtual knowledge I’m finding it hard to work out if my switch is set up correctly or not. I have another engineer here who is competent in VMware but a little unsure when it comes to the networking side of it.

    Really appreciate any help

  20. Peter F

    Nice article! I have configured one of our ESX 4 hosts with trunking and VLANs as described here and it seems to work.

    However, I would like to verify that I really get more bandwidth because of the trunk. So I set up a VM with a 10Gb vNIC, and a physical test server with two teamed 1Gb NICs with LACP enabled at the physical switch (5412zl) so it runs 2Gb/s in both directions. The VM host has four 1Gb NICs trunked as described on this page.

    Then I copy a huge GB-file from the VM to my physical test server. But I never get more throughput than approx. 70% of 1 Gb, and the HP teaming software at the physical test server shows that it all runs on one NIC.

    So my questions are:
    - Should it actually be possible to use more than one physical NIC’s bandwidth from a VM when using trunked NICs?
    - How can I test that it works when my test as described above doesn’t seem to work?

    Peter :-)

  21. slowe

    Peter,

    See this:

    http://blog.scottlowe.org/2008/07/16/understanding-nic-utilization-in-vmware-esx/

    That post should give you the information you’re seeking. Thanks!

  22. mark

    hi

    i have just set up our esx with four nics and set them to ip hash routing in the networking config in vCenter

    on the hp switch i have added these four cables to trk1. my question is do i need to enable flow control, and do i just set the group to trunk and not use lacp? other than that, is there anything else i need to do, and will i get the benefit of better performance with a 4gb trunk?

    excellent site full of some excellent info thanks

    mark

  23. adam

    I’d be curious what the best practice is for flow control both on the switching side and the vmware esx side as well!

  24. dom_b

    I’ve set up as instructed as far as I can tell, but I’m getting very slow throughput (400Mbps max with 4 Gb NICs trunked on an 1800-24G). Any ideas? I’ve created a lengthy post on the ESXi forum with the problem! If anyone could help I’d much appreciate it.

    http://communities.vmware.com/thread/260302?tstart=0

  25. Eric

    I am looking at replacing an aging pair of Cisco switches with either 3750s or HP ProCurve 3500s. I would prefer HP (we replaced the rest of the Cisco gear except for 2 which are used for our SAN farm with ESX). One concern I have is about NIC teaming. I read the following on another website:

    —————
    As it turns out HP does not support teaming over 2 switches ( yet ).
    This means the ports in Trunk 1 of switch x does not share the MAC adresses with the ports in Trunk 1 of switch y.

    So there are 2 solutions left.

    1. Connect both NICs from the ESX server on 1 HP switch.In VMWare configure the VSwitch with both NICs as active. Setup a trunk on this switch with 2 ports.
    Load balancing works, but no redundancy.
    2. Connect 1 NIC to switch x and 1 NIC to switch y. In VMWare configure the VSwitch with 1 NIC as active and 1 NIC as standby. Setup a trunk in not necessary.
    No Load balancing, but you will have redundancy.

    From http://www.jorink.nl/2010/02/massive-tcp-retransmits-with-esx-3-5-u5-hp-procurve-3500yl-and-equallogic-san/
    ——————–
    The Equallogic SAN keeps the spare ports dark until needed so that isn’t a problem, but I’m worried that my ESX hosts will not have redundancy for iSCSI traffic. Two questions: is the above concern valid, and if so, could I put NICs in standby and attach them to the other switch to get around this?

  26. vmChris

    Scott -

    I love the blog. I get a lot of good info on here. I too am running HP Procurve switches and am configuring some trunks for ESXi hosts. Would you know if LACP is supported with ESXi update 1? Or should I just use the trunk type “trunk”? Thanks.

  27. slowe

    vmChris,

    To my knowledge, LACP is not supported. You should select “trunk” as your trunk type.

    Good luck!

  28. Arnljot

    Hi folks. I know very little about ESX, but I have been working with ProCurve switches for many years. In the old days, before VMware virtual switches and blade switches came along, we networking people did all the networking, and the server people did what they do best. Now, however, I see server people doing a lot of bad networking on their blade switches and virtual switches, and switch people who don’t want to touch this new stuff. The result is bad networking setups in the interface between the heavy server machines and the multi-gig networks.

    I have a problem I need help with in relation to ProCurve and VMware, but first I would like to shed some light on some of the networking issues above, and I’m happy to answer questions about ProCurve if you have any.

    I have worked with teaming on HP servers, but not on VMware. So to illustrate some teaming/trunking/load balancing points I’ll explain about teaming (HP on Windows) and ProCurve, and hope that some of this applies to VMware. I’m assuming two NICs, but that is irrelevant; there can be many.

    HP supported three basic teaming mechanisms: Network Fault Tolerance (NFT), Transmit Load Balancing (TLB), and Switch-assisted Load Balancing (SLB). For big servers, we want fault tolerance all the way, not just on NICs but also on server switches.

    NFT is really simple: one NIC is active and one is passive. They can be connected to one switch, but preferably to two separate switches. No configuration is necessary on the switches either way.

    TLB, on the other hand, can transmit on both NICs but can receive on only one. The reason for this is that the team can have only one primary MAC address. The closest router has a table connecting the NIC’s IP address with its (one) MAC address. The switches have tables that tell each switch on which port each MAC address in the network is located, so a MAC address can be associated with only ONE port. The NICs can be connected to one or two switches, and no configuration is required on the switches.

    SLB requires configuration of the switches. SLB has both NICs connected to two trunk (link-aggregated) ports, so a trunk must be defined on the switch ports as described by others above. This gives you redundancy (though not across switches) and load balancing/double bandwidth in both directions.

    ProCurve did not use to support trunks (link aggregation) going to different switches (Distributed Trunking), but now supports what they call Server-to-Switch Distributed Trunking on their 6600, 3500, 5400 and 8200 series. This is the best of both worlds: full redundancy and load balancing as well. I have a very good HP whitepaper describing how the network works together with server teaming, and also a ProCurve paper comparing Cisco trunking/link aggregation with ProCurve; a very good primer for ex-Cisco folks…

    I see some misconceptions about link aggregation above. Some folks complain about bandwidth issues after trunking two 1-gig ports. The “conversation” between two computers (two MACs (not the Apple type)) will always follow the same path for each packet. The team or switch makes a decision for the first packet on which port to send it, and then all the rest of the packets follow the same path. Between these two computers, the bandwidth will remain the same as before trunking (link aggregation). The next computer sending data may be allocated to the other port, but may as well use the same port as the first computer. The more computers sending data over the trunk, the more evenly the load will be balanced, and the collection of computers will have a near 2-gig connection. We call it statistical load balancing (it’s NOT load based).

    In the HP teaming described above, you can set up MAC hashing or IP hashing to be used for load balancing. If all traffic from servers to clients is routed, you had better choose IP hash. If it is switched, you can choose either. Since all traffic from a server comes from the same MAC, and all routed traffic going to the router’s interface is addressed to the same MAC, this scenario will provide NO load balancing if MAC hashing is used for routed traffic. To my knowledge this applies to most switch manufacturers, as MAC hashing is the method used for link aggregation on switches.

    I see some of you are using 1800 switches for server connectivity. They will provide good throughput, but I consider them baby switches for home use. I recommend some real switches for the data center like the 2910, the 3500 (if you need a full L3/L4 stack) or, even better, the 6600 series, which are expensive data center switches where you can even configure the airflow.

    I’m sorry for the lengthy lecturing, but you see, I used to be a ProCurve instructor ;-) Hope some of this is useful, I just wanted to share…

    OK, if you’re still with me, maybe I can get some help with my problem. A customer has been having trouble that resembles Tom Ranson’s problem above. They have defined a team, but lose all connectivity when connecting both ports in the trunk. They have now disconnected one port in the team, but are still experiencing problems: they use vMotion, and after moving a VM from one ESX host to another, it takes 5 minutes before the network connection is up. Both ESX servers now have only one NIC each connected to the same switch. I have been doing some network sniffing, and I see that if we run a ping (within the same subnet) from the VM that we move, the outgoing ping request moves to the new NIC, but the reply goes to the first NIC for 5 minutes.

    5 minutes is the same as the switch MAC timeout value; that is, if the switch does not “hear” from a MAC address for 5 minutes, the entry is erased from the MAC table. This is really odd, because when the switch sees the ping request from the next ESX host on the new port, it should immediately acknowledge that the MAC has moved, and associate the MAC address with the new port. This is very basic L2 switch functionality, and I don’t think this has to do with the switches. So I suspect the setup of the networking on the server.

    Seeing Eric’s scenario above, I realize that I should use his solution 2. But I would like functionality like HP’s TLB server teaming. Is it possible to have NICs distributed on two switches, and send traffic on both NICs? It has to use a secondary MAC address for the second interface, so as not to confuse the network. If the same MAC address appears on different ports on the same or different switches, traffic will be very intermittent.