Before we get into the details, allow me to give credit where credit is due. First, thanks to Dan Parsons of IT Obsession for an article that jump-started the process with notes on the Cisco IOS configuration. Next, credit goes to the VMTN Forums, especially this thread, in which some extremely useful information was exchanged. I would be remiss if I did not adequately credit these sources for the information that helped make this testing successful.
There are actually two different pieces described in this article. The first is NIC teaming, in which we logically bind together multiple physical NICs for increased throughput and increased fault tolerance. The second is VLAN trunking, in which we configure the physical switch to pass VLAN traffic directly to ESX Server, which will then distribute the traffic according to the port groups and VLAN IDs configured on the server. I wrote about ESX and VLAN trunking a long time ago and ran into some issues then; here I’ll describe how to work around the issues I ran into at that time.
So, let’s have a look at these two pieces. We’ll start with NIC teaming.
Configuring NIC Teaming
There’s a bit of confusion regarding NIC teaming in ESX Server and when switch support is required. You can most certainly create NIC teams (or “bonds”) in ESX Server without any switch support whatsoever. Once those NIC teams have been created, you can configure load balancing and failover policies. However, those policies will affect outbound traffic only. In order to control inbound traffic, we have to get the physical switches involved. This article is written from the perspective of using Cisco Catalyst IOS-based physical switches. (In my testing I used a Catalyst 3560.)
To create a NIC team that will work for both inbound and outbound traffic, we’ll create a port channel using the following commands:
s3(config)#int port-channel1
s3(config-if)#description NIC team for ESX server
s3(config-if)#int gi0/23
s3(config-if)#channel-group 1 mode on
s3(config-if)#int gi0/24
s3(config-if)#channel-group 1 mode on
This creates port-channel1 (you’d need to change this number if you already have port-channel1 defined, perhaps for switch-to-switch trunk aggregation) and assigns GigabitEthernet0/23 and GigabitEthernet0/24 to the team. Now, however, you need to ensure that the load balancing mechanism used by the switch matches the one used by ESX Server. To find out the switch’s current load balancing mechanism, use this command in enable mode:
show etherchannel load-balance
This will report the current load balancing algorithm in use by the switch. On my Catalyst 3560 running IOS 12.2(25), the default load balancing algorithm was set to “Source MAC Address”. On my ESX Server 3.0.1 server, the default load balancing mechanism was set to “Route based on the originating virtual port ID”. The result? The NIC team didn’t work at all—I couldn’t ping any of the VMs on the host, and the VMs couldn’t reach the rest of the physical network. It wasn’t until I matched up the switch/server load balancing algorithms that things started working.
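If you want to script that comparison, here is a rough Python sketch rather than anything I actually used. It parses the output format that a reader quotes in the comments below (“EtherChannel Load-Balancing Operational State (src-ip):”); other IOS releases may print this differently, so treat the pattern as an assumption. The helpers switch_algorithm and check_team are hypothetical. The check encodes this article's finding that a static EtherChannel needs an IP-based switch algorithm on one side and “Route based on ip hash” on the vSwitch side.

```python
import re
from typing import List, Optional

# Pattern based on the sample line quoted in a reader comment on this post:
#   "EtherChannel Load-Balancing Operational State (src-ip):"
# Other IOS releases may word this differently (assumption).
STATE_RE = re.compile(r"Operational State \(([a-z-]+)\)")

# Switch algorithms that hash on IP addresses.
IP_MODES = {"src-ip", "dst-ip", "src-dst-ip"}
ESX_IP_HASH = "Route based on ip hash"


def switch_algorithm(show_output: str) -> Optional[str]:
    """Return the algorithm keyword from 'show etherchannel load-balance' output."""
    match = STATE_RE.search(show_output)
    return match.group(1) if match else None


def check_team(show_output: str, esx_policy: str) -> List[str]:
    """List the mismatches between the switch and vSwitch load-balancing settings."""
    problems = []
    algo = switch_algorithm(show_output)
    if algo is None:
        problems.append("load-balancing state not found in the output")
    elif algo not in IP_MODES:
        problems.append(f"switch hashes on {algo}; use an IP-based mode such as src-dst-ip")
    if esx_policy != ESX_IP_HASH:
        problems.append(f"vSwitch policy is '{esx_policy}'; expected '{ESX_IP_HASH}'")
    return problems


# The combination described above: src-mac on the switch, port-ID on the vSwitch.
failing = "EtherChannel Load-Balancing Operational State (src-mac):"
for issue in check_team(failing, "Route based on the originating virtual port ID"):
    print(issue)
```

You would paste in the real output of show etherchannel load-balance yourself, because the script does not connect to the switch. Given the src-mac/port-ID combination described above, it reports both mismatches. It also accepts src-ip, which a reader below reports working with IP hash.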
To set the switch load-balancing algorithm, use one of the following commands in global configuration mode:
port-channel load-balance src-dst-ip (to enable IP-based load balancing)
port-channel load-balance src-mac (to enable MAC-based load balancing)
There are other options available, but these are the two that seem to match most closely to the ESX Server options. I was unable to make this work at all without switching the configuration to “src-dst-ip” on the switch side and “Route based on ip hash” on the ESX Server side. From what I’ve been able to gather, the “src-dst-ip” option gives you better utilization across the members of the NIC team than some of the other options. (Anyone care to contribute a URL that provides some definitive information on that statement?)
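A toy model shows why an IP-based hash can spread ESX traffic better than src-mac. The hash functions below are simplified stand-ins and the addresses are hypothetical. The src-mac stand-in takes the low byte of the source MAC modulo the link count. The src-dst-ip stand-in takes the XOR of the low bytes of both IP addresses modulo the link count. Neither is the exact bit selection a Catalyst performs in hardware. The point is only that src-mac puts everything from one sender on one link, while src-dst-ip can choose a different link per destination.

```python
import ipaddress
from collections import Counter


def src_mac_link(src_mac: str, links: int) -> int:
    """Toy src-mac hash: low byte of the source MAC, modulo the link count."""
    return int(src_mac.split(":")[-1], 16) % links


def src_dst_ip_link(src_ip: str, dst_ip: str, links: int) -> int:
    """Toy src-dst-ip hash: XOR of the low bytes of both addresses, modulo links."""
    src_low = int(ipaddress.IPv4Address(src_ip)) & 0xFF
    dst_low = int(ipaddress.IPv4Address(dst_ip)) & 0xFF
    return (src_low ^ dst_low) % links


# Hypothetical setup: one server on the LAN sending to eight VMs that sit
# behind a two-link port channel on the ESX host.
server_mac, server_ip = "00:1a:2b:3c:4d:01", "192.168.10.1"
vm_ips = [f"192.168.10.{n}" for n in range(11, 19)]

by_mac = Counter(src_mac_link(server_mac, 2) for _ in vm_ips)
by_ip = Counter(src_dst_ip_link(server_ip, vm, 2) for vm in vm_ips)
print("src-mac:   ", dict(by_mac))
print("src-dst-ip:", dict(by_ip))
```

With src-mac, all eight flows from the single sender land on one link of the two-link channel. With src-dst-ip, they split four and four in this model.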
Creating the NIC team on the ESX Server side is as simple as adding the physical NICs to the vSwitch and setting the load balancing policy to match the switch (in my case, “Route based on ip hash” to go with “src-dst-ip”). At this point, the NIC team should be working.
Configuring VLAN Trunking
In my testing, I set up the NIC team and the VLAN trunk at the same time. When I ran into connectivity issues as a result of the mismatched load balancing policies, I thought they were VLAN-related issues, so I spent a fair amount of time troubleshooting the VLAN side of things. It turns out, of course, that it wasn’t the VLAN configuration at all. (In addition, one of the VMs that I was testing had some issues as well, and that contributed to my initial difficulties.)
To configure the VLAN trunking, use the following commands on the physical switch:
s3(config)#int port-channel1
s3(config-if)#switchport trunk encapsulation dot1q
s3(config-if)#switchport trunk allowed vlan all
s3(config-if)#switchport mode trunk
s3(config-if)#switchport trunk native vlan 4094
This configures the NIC team (port-channel1, as created earlier) as an 802.1Q VLAN trunk. You then need to repeat this process for the member ports in the NIC team:
s3(config)#int gi0/23
s3(config-if)#switchport trunk encapsulation dot1q
s3(config-if)#switchport trunk allowed vlan all
s3(config-if)#switchport mode trunk
s3(config-if)#switchport trunk native vlan 4094
s3(config-if)#int gi0/24
s3(config-if)#switchport trunk encapsulation dot1q
s3(config-if)#switchport trunk allowed vlan all
s3(config-if)#switchport mode trunk
s3(config-if)#switchport trunk native vlan 4094
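The same four commands are repeated for the port channel and for each member port, so generating them can save some typing. The helper below is hypothetical (trunk_commands is not part of any Cisco tool). It simply emits this article's commands for a list of interface names, and you would paste its output in global configuration mode.

```python
def trunk_commands(interfaces, native_vlan=4094):
    """Return IOS lines that apply the trunk settings above to each interface."""
    if not 1 <= native_vlan <= 4094:
        raise ValueError("native VLAN must be between 1 and 4094")
    lines = []
    for name in interfaces:
        lines += [
            f"interface {name}",
            # Encapsulation is set before "switchport mode trunk", as in the article.
            " switchport trunk encapsulation dot1q",
            " switchport trunk allowed vlan all",
            " switchport mode trunk",
            f" switchport trunk native vlan {native_vlan}",
        ]
    return lines


if __name__ == "__main__":
    team = ["Port-channel1", "GigabitEthernet0/23", "GigabitEthernet0/24"]
    print("\n".join(trunk_commands(team)))
```

The helper keeps the article's order and sets the encapsulation before switchport mode trunk. On switches that also support ISL, such as the 3560, IOS generally rejects mode trunk while the trunk encapsulation is still set to negotiate.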
If you haven’t already created VLAN 4094, you’ll need to do that as well:
s3(config)#int vlan 4094
s3(config-if)#no ip address
The “switchport trunk native vlan 4094” command is what fixes the problem I had the last time I worked with ESX Server and VLAN trunks: most switches don’t tag traffic from the native VLAN across a VLAN trunk. The default native VLAN is VLAN 1. By moving the trunk's native VLAN to an otherwise unused VLAN, we ensure that every VLAN actually carrying traffic, including VLAN 1, is tagged across the trunk. This allows ESX Server to handle VMs that are assigned to VLAN 1 (the former native VLAN) as well as other VLANs.
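Here is a minimal Python sketch of what that means on the wire. It assumes plain 802.1Q tagging with the priority bits left at zero. A frame in the trunk's native VLAN leaves untagged. A frame in any other VLAN gets a 4-byte tag: TPID 0x8100 followed by the 12-bit VLAN ID. The name egress_frame is hypothetical and the MAC addresses are made up.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def egress_frame(dst: bytes, src: bytes, ethertype: int, payload: bytes,
                 vlan: int, native_vlan: int) -> bytes:
    """Build a frame as a dot1q trunk port would send it (no FCS).

    The native VLAN goes out untagged; any other VLAN gets a 4-byte tag:
    TPID 0x8100 followed by a TCI holding the 12-bit VLAN ID (priority = 0).
    """
    for vid in (vlan, native_vlan):
        if not 1 <= vid <= 4094:
            raise ValueError(f"invalid VLAN ID {vid}")
    header = dst + src
    if vlan != native_vlan:
        header += struct.pack("!HH", TPID_8021Q, vlan)
    return header + struct.pack("!H", ethertype) + payload


dst = bytes.fromhex("ffffffffffff")  # broadcast, e.g. an ARP request
src = bytes.fromhex("000c29000001")  # made-up VM MAC address
payload = bytes(46)                  # minimum Ethernet payload

default_native = egress_frame(dst, src, 0x0806, payload, vlan=1, native_vlan=1)
moved_native = egress_frame(dst, src, 0x0806, payload, vlan=1, native_vlan=4094)
print(len(default_native), len(moved_native))  # 60 64
print(moved_native[12:16].hex())               # 81000001
```

With VLAN 1 as the native VLAN, VLAN 1 traffic reaches ESX Server untagged. Once the native VLAN moves to 4094, the same frame carries a VLAN 1 tag.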
On the ESX Server side, we just need to edit the vSwitch and create a new port group. In the port group, specify the VLAN ID that matches the VLAN ID from the physical switch. After the new port group has been created, you can place your VMs on it. Assuming you have a router somewhere to route between the VLANs, you should then have full connectivity to your newly segregated virtual machines.
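On the receive side, you can think of the vSwitch as reading the VLAN ID from each tagged frame and handing the frame to the port group configured with that ID. The sketch below is a toy model of that dispatch, not VMware code. The port group names are hypothetical, and VLAN ID 0 stands for a port group with no VLAN ID set.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100

# Hypothetical vSwitch port groups, keyed by VLAN ID (0 = no VLAN ID set).
PORT_GROUPS = {0: "Untagged", 1: "VLAN1-VMs", 20: "DMZ"}


def vlan_of(frame: bytes) -> int:
    """Return the 802.1Q VLAN ID of a frame, or 0 if it carries no tag."""
    (tpid,) = struct.unpack_from("!H", frame, 12)
    if tpid != TPID_8021Q:
        return 0
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF


def port_group_for(frame: bytes) -> Optional[str]:
    """Pick the port group whose VLAN ID matches; None means no port group takes it."""
    return PORT_GROUPS.get(vlan_of(frame))


macs = bytes(12)  # zeroed destination + source MAC, for illustration only
tagged_vlan1 = macs + struct.pack("!HHH", TPID_8021Q, 1, 0x0800)
tagged_vlan30 = macs + struct.pack("!HHH", TPID_8021Q, 30, 0x0800)
untagged = macs + struct.pack("!H", 0x0800)
print(port_group_for(tagged_vlan1), port_group_for(tagged_vlan30), port_group_for(untagged))
```

Once the trunk's native VLAN is moved to 4094, VLAN 1 traffic arrives tagged. In this model it therefore only reaches a port group whose VLAN ID is 1, and a VLAN with no matching port group reaches nothing.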
Final Notes
I did encounter a couple of weird things during the setup of this configuration (I plan to leave the configuration in place for a while to uncover any other problems).
- First, during troubleshooting, I deleted a port group on one vSwitch and then re-created it on another vSwitch. However, the virtual machine didn’t recognize the connection. There was no indication inside the VM that the connection wasn’t live; it just didn’t work. It wasn’t until I edited the VM, set the virtual NIC to a different port group, and then set it back again that it started working as expected. Lesson learned: don’t delete port groups.
- Second, after creating a port group on a vSwitch with no VLAN ID, one of the other port groups on the same vSwitch appeared to “lose” its VLAN ID, at least as far as VirtualCenter was concerned. In other words, the VLAN ID was listed as “*” in VirtualCenter, even though a VLAN ID was indeed configured for that port group. The “esxcfg-vswitch -l” command (that’s a lowercase L) on the host still showed the assigned VLAN ID for that port group, however.
- It was also the “esxcfg-vswitch” command that helped me troubleshoot the problem with the deleted and re-created port group described above. Even after re-creating the port group, esxcfg-vswitch still showed 0 used ports for that port group on that vSwitch, which told me that the virtual machine’s network connection was still somehow askew.
Hopefully this information will prove useful to those of you out there trying to set up NIC teaming and/or VLAN trunking in your environment. I would recommend taking this one step at a time, not all at once like I did; this will make it easier to troubleshoot problems as you progress through the configuration.
Tags: Cisco, ESX, Interoperability, IOS, Networking, Virtualization, VLAN, VMware
-
Hi Scott,
Just a couple of notes based on my limited experience (I’m doing my first tests at work these days). My config is ESX 3.0 (4 NICs) + 2 Cisco 3750s (stacked) + NetApp 3020C.
After some testing (yesterday and today) I succeeded in configuring NIC teaming + VLAN trunking on 2 of the NICs, and I now use this connection to carry network data, iSCSI traffic, and so on. The ESX server is now using a datastore on the NetApp via iSCSI.
Here’s what emerged:
1) On my IOS version 12.2(25)SEB4, ‘show etherchannel load-balance’ reports
EtherChannel Load-Balancing Operational State (src-ip):
so I set “Route based on ip hash” on the NIC Teaming page.
2) I had to leave out the “switchport trunk native vlan xxx” statement, because when I left it in the config I wasn’t able to use any VLAN other than xxx. When I deleted the statement I was able to use all of the VLANs without problems.
3) (iSCSI-related) If you don’t create a service console on the iSCSI network, you won’t be able to scan for or find LUNs on the storage[*]. I think this is because our iSCSI network is not routed, so the ESX server can’t reach it without having a service console “foot” on that network (even though the VMkernel of course has an IP on that network).
[*] This means that if you delete the iSCSI service console, everything works until you try to find another LUN or reboot the ESX server –> BOOM, you can’t see the storage anymore (fortunately I found this before going into production :D)
Ciao,
Ecio
PS: sorry for my English
-
Hey Scott,
I really appreciate all the work you put into this. Thanks. We have just purchased VMware and we are looking to do everything you have talked about. But… I was wondering if you have tried configuring the devices that communicate via iSCSI with either a NetApp or EMC array to use jumbo frames. We are hoping to do this with Cisco 3750 48-port Gigabit switches.
This is really where I want to take it. Because we don’t have all of our equipment yet, I am unable to test it to see if it will work.
Please email me and let me know what you think. Thanks again.
-
Hey Scott,
Are you saying that software iSCSI, like the Microsoft iSCSI initiator I would have installed on a Win2k3 server connected to a SAN via iSCSI, doesn’t support jumbo frames? The NICs in the servers where I have it installed do support jumbo frames. Or are you saying there is something specific to ESX?
I’m trying to find out all the pros and cons, so if you could shed some more light on the situation for me I’d appreciate it.
Thanks.
-
Can I just configure the server for link aggregation without configuring EtherChannel on the switch?
-
It’s true: with LACP, ESX doesn’t balance traffic toward ONE iSCSI target. You need to configure multiple destinations, using virtual iSCSI IPs.
-
Hey Scott,
You mentioned that you have run into some configuration issues, and I was wondering if you would have an idea about the issue that I’m having. We are using HP 685 blade enclosures with ProCurve switches. We have 2 virtual switches with 2 NICs on each, and we are also using VLAN tagging on the switches. The issue is that every once in a while, when a VM migrates to another server, it loses network connectivity. This happens more frequently when we patch the ESX hosts and migrate multiple VMs around. One person on the team thinks it is due to Network Failover Detection being set to Beacon Probing. I was wondering if you have any suggestions. For the life of me, I have been trying to reproduce the problem but cannot.
Any suggestions would be appreciated.
-
Hi…
One request…
How do I find the MAC address of an ESX server?
Thanks & regards,
VJ
-
Question: if my data flows go from ESX to an iSCSI device, am I really “bonding” for a full X Gb of bandwidth? I watch the NICs that are sending data out, for example during a file copy inside a guest. The copy goes from internal storage (where the C: .vmdk is) to the iSCSI device (where the D: .vmdk is). I notice only 1 NIC is getting in on the act. In other words, I’m not filling up all 4 NICs I’ve “bonded” for iSCSI, so I only get 1 Gb of throughput. Shouldn’t I see all 4 bonded NICs pound away at sending that traffic to the iSCSI device?
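A simplified model of why this happens: with “Route based on ip hash”, every frame between the same source and destination IP addresses hashes to the same uplink. One VMkernel address talking to one iSCSI target address therefore stays on one NIC, no matter how many NICs are in the team. The XOR-of-low-bytes hash and the addresses below are assumptions for illustration, not VMware's exact implementation.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, uplinks: int) -> int:
    """Simplified 'Route based on ip hash': XOR of the low bytes, modulo uplinks."""
    src_low = int(ipaddress.IPv4Address(src_ip)) & 0xFF
    dst_low = int(ipaddress.IPv4Address(dst_ip)) & 0xFF
    return (src_low ^ dst_low) % uplinks


vmkernel_ip = "10.0.50.10"  # hypothetical VMkernel port used for iSCSI
single_target = ["10.0.50.100"]
four_targets = ["10.0.50.100", "10.0.50.101", "10.0.50.102", "10.0.50.103"]

used_single = sorted({ip_hash_uplink(vmkernel_ip, t, 4) for t in single_target})
used_four = sorted({ip_hash_uplink(vmkernel_ip, t, 4) for t in four_targets})
print("one target:  ", used_single)
print("four targets:", used_four)
```

With these particular addresses, the four targets happen to land on four different uplinks; other addresses can collide on the same uplink. This is the effect behind the earlier comment about configuring multiple iSCSI target IPs.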
-
Just for anyone else looking: I had some major issues with VLAN tagging on an IBM BladeCenter with a Server Connectivity Module.
The symptom was that VLAN tagging wasn’t working at all; the only VLAN that WAS working was the native VLAN.
Under vSwitch0 Properties on the Network Adapters tab, vmnic0’s observed IP range again showed only the native VLAN (same with the networks under the status box on the right).
THE SOLUTION was to start a web session to the switch module, click Non-Default Virtual LANs, and add the VLANs to your port group.
So simple, but I missed it again and again.
Thanks for your blog Scott, it has been helpful time and time again
-Alex
-
Would you care to write a guide for the HP ProCurve switches? I’m trying to link-aggregate the four 1GbE ports on an ESX server (2 ports, 2 links), but I always lose connectivity if both ports are plugged in.
-
Hi Scott,
This has been bothering me for some time and I can’t seem to find a clear answer, so here goes: we have ESX servers with 4 NICs for the VMs’ vSwitch. 2 of the NICs connect to switch A and the other 2 connect to switch B. Both switches are Cisco Catalyst 6500s. I would like to configure incoming load balancing by creating a port channel on each switch with the 2 related NICs, effectively having 2 port channels connecting to my load-balanced (IP hash) vSwitch. Is this at all possible? If not, why not?
-
Hi Scott,
I believe I might have an answer to this problem: http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1001938&sliceId=1&docTypeID=DT_KB_1_1&dialogID=14912551&stateId=1%200%2014916516
The first line of the article states: ESX Server supports link aggregation only on a single physical switch or stacked switches, never on disparate trunked switches.
So I guess I’m out of luck…
-
Scott - thanks for the blog. Basic question: I’m using ESX 3.5 with a pair of dual-port Gigabit NICs connected to a 6509 and an iSCSI SAN. I was thinking of a single vSwitch with the physical connections teamed and trunked in active/active mode. Each team would pair ports from alternate NICs, for redundancy across the chipsets. There would be port groups for VMs, iSCSI/VMkernel, and the Service Console, each using 2 physical NICs (one channel). Does this make sense for maximizing performance and redundancy? Is there a better method? Thanks!
-
Thanks Scott - I’m using a single physical switch (4 NICs on the ESX box talking to a single 6509, jumping blades for redundancy). I took a look at the link in #29 and at your comment about ESX not doing a great job of utilizing the uplinks. Together, they make me believe I would be best served using 802.3ad aggregation at the switch (alternating port pairs) and leaving the ESX load balancing setting at ‘route based on the originating virtual port ID’. As for the iSCSI SAN, we are stuck with the single VIP.
I have the luxury of time, so I can actually test a few configurations to see which works best before placing the system into production in Sept.
Looking forward to your test results - and thanks again for the response!
-
I too am attempting to connect an ESX server to redundant switches.
My thought was to have a vSwitch with two (or more) NICs, each connected to a different one of a pair of redundant Cisco switches. The Cisco switches have an interconnect as well, creating a physical loop.
My concern is that I might create a bridging loop among the three switches, since the vSwitch does not support Spanning Tree Protocol.
From what I can find, VMware says that STP isn’t necessary only because vSwitches don’t bridge to each other. They don’t seem to address the chaos that can ensue in a Cisco world.
1) Are my concerns substantiated?
2) If so, is there a way to enable STP on a vSwitch?
-
Hi all,
I have a Cisco 6509 switch (9 blades) and ESX 3.5.
My network configuration is:
Cisco: multiple trunked EtherChannels (X). Gigabit ports on separate blades are configured as members of an EtherChannel with LACP and an allowed VLAN list (in short, configured as most examples indicate).
ESX: created a vSwitch with the two physical NICs, and on it port groups with the VLANs tagged.
My results are:
Both NICs are up; when I add a service console I can ping it, so the VLAN mapping works. Shutting down one of the NICs works too (I do see the teaming working).
BUT in my Cisco switch config I see the channel interface as down. When issuing a “show int trunk” I do see both adapters as trunks, but not the port-channel interface.
What I’m used to is that when using an (EtherChannel) team on Cisco, you should see the port-channel interface as up; otherwise it would be the same as configuring no channel at all.
My questions:
- The only difference at the moment is that I do not have src-dst-ip set up on the switch side. Is this causing the channel to be down?
- How do other similar Cisco configs show the EtherChannel status?
- And the trunking status: does it show the EtherChannel or the separate interfaces?
- Any pointers on what I might be doing wrong?
Thanks in advance,
Soul
-
Actually, I answered my own question.
After reading
http://www.vmware.com/files/pdf/virtual_networking_concepts.pdf
I decided to ignore the “etherchannel unconditionally” wording and put the EtherChannel in plain “mode on” (and no protocol definitions, of course). This brought my EtherChannels up. What I do find strange is that others with other settings do have an EtherChannel that is up… or possibly not?
PS: the document in the link also states that LACP should not be used. Again, what I find strange is that across the net I see people using LACP.
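For reference, the channel-group modes only form a bundle in certain pairings. That may explain a port channel that stays down while its member ports still show as trunks. The sketch below encodes the standard IOS pairings. It assumes that the ESX 3.x vSwitch negotiates with neither LACP (as the VMware document linked above states) nor PAgP, so it behaves like a partner set to “on”. The helper bundles is hypothetical.

```python
# Mode pairs that bring up an EtherChannel bundle on Catalyst IOS:
#   on        - bundles unconditionally, sends no LACP or PAgP frames
#   active    - LACP, initiates;   passive - LACP, only responds
#   desirable - PAgP, initiates;   auto    - PAgP, only responds
FORMS_BUNDLE = {
    frozenset({"on"}),
    frozenset({"active"}),
    frozenset({"active", "passive"}),
    frozenset({"desirable"}),
    frozenset({"desirable", "auto"}),
}

# Assumption: an ESX 3.x vSwitch never negotiates, so it behaves like "on".
ESX_VSWITCH = "on"


def bundles(switch_mode: str, partner_mode: str = ESX_VSWITCH) -> bool:
    """True if a switch port in switch_mode forms a channel with the partner."""
    return frozenset({switch_mode, partner_mode}) in FORMS_BUNDLE


for mode in ("on", "active", "passive", "desirable", "auto"):
    print(f"channel-group 1 mode {mode:<9} -> bundles with ESX: {bundles(mode)}")
```

In this model only mode on comes up against the vSwitch, which matches the fix described above. What the member ports do when the bundle does not form (suspended or stand-alone) varies by IOS release.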
-
I skimmed through the replies so if I missed the answer I apologize.
I was able to get trunking and teaming working on 3 interfaces. I configured my management VLAN as the native VLAN, and a handful of other VLANs used for VMs are allowed on the switch; I can tag these at the vSwitch without issue. I am, however, having problems PXE booting when all three ports are connected to my switch. I have a helper entry with the IP of my PXE server. PXE boot works fine on the native VLAN if only the PXE-capable NIC in the bond is connected to the switch. When all three interfaces are connected to the switch, the PXE boot fails with an ARP timeout. I also tried tagging the PXE adapter with another VLAN in the NIC configuration BIOS; that VLAN also has a helper entry for the PXE server… This behaves identically to the native VLAN: it only PXE boots when just the PXE-capable NIC is connected. I believe the team configuration is what breaks my ability to PXE boot when all three ports are active. Has anyone come across a workaround for this? I am guessing I will have to sacrifice inbound load balancing and configure the 3 interfaces as trunks only?
-
My network guy and I looked over this earlier today in an effort to redesign my ESX environment. We ran into two issues. For the NIC teaming, port channels are required. However, you apparently can’t port-channel across core switches (we’ve got two Cisco 4506s linked together). This poses a problem for redundancy, since 4 NICs go to core 1 and the other 4 go to core 2 (so 8 NIC ports for the VMs, plus two others for the SC and VMkernel). The other problem is that the load-balance command you mentioned is a global command and would affect all of the ports, not just the ones that are port-channeled. When we tried to test it, the switch did not recognize the command on that interface. So I assume that you have your ESX boxes on their own switch?
We also looked at the native VLAN options you discussed in your VMotion and VLAN security article. However, in our case you can already route between our VLANs, so VLAN hopping wouldn’t be an issue (or so I’m told). He made the point that you’d have to be inside the building to even get to our private VLANs, at which point we’d have a much bigger problem.
Thoughts and comments are more than welcomed. As I mentioned, I’m in the process of redesigning 8 different sites so that they’re all setup the same way. Thanks!
-
Thanks Scott. Your post really helped me solve our problem getting ESX to work with our Cisco 3750 switches.
I know nothing about ESX servers, and the ESX administrator told me he used LACP…
//Linus, Cisco engineer
-
I’m building a new farm (cloud?) from scratch using ESX 3.5. I’ve configured EtherChannel on my 3750s with src-dst-ip balancing and set my vSwitch to use “Route based on IP hash”.
Now everything seems to be working but the networking page is worrying me. I would expect all NICs in the team to see our entire LAN IP range.
http://vitaredux.files.wordpress.com/2009/01/screenshot026.jpg
Is this expected? If not, any ideas where to look to identify the problem?
-
In the article you said, “You can most certainly create NIC teams (or “bonds”) in ESX Server without any switch support whatsoever. Once those NIC teams have been created, you can configure load balancing and failover policies. However, those policies will affect outbound traffic only. ”
If you don’t use EtherChannel and you balance based on VPID, then each VM will pass traffic out of a single NIC. On the physical switch side, the switch will see that MAC address on that single physical NIC, and inbound traffic will be switched back through the same port the traffic came out of. This effectively gives all virtual machines an equal share of the aggregate bandwidth. Right?
This works well in the many to many case. If you had one IP address talking to many clients, then a single nic would be the bottleneck, right? But this would be rare in an ESX environment.
My real question is when does etherchannel help? I assume it would in the “one to many” case I listed above, but would it help in any other cases? How would it help?
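Here is a toy model of the two cases above. It assumes that port-ID teaming picks an uplink from the virtual port number, and that IP-hash teaming picks one from the XOR of the last octets of the two addresses. Both are simplifications, and the port numbers and addresses are hypothetical.

```python
from collections import Counter

UPLINKS = 2


def port_id_uplink(virtual_port_id: int) -> int:
    # Assumed model of port-ID teaming: the VM's virtual port number picks the uplink.
    return virtual_port_id % UPLINKS


def ip_hash_uplink(src_last_octet: int, dst_last_octet: int) -> int:
    # Assumed model of IP-hash teaming: XOR of the last octets picks the uplink.
    return (src_last_octet ^ dst_last_octet) % UPLINKS


# Many-to-many: six VMs on virtual ports 1-6, each with its own clients.
per_vm = Counter(port_id_uplink(port) for port in range(1, 7))

# One-to-many: one VM (virtual port 3, address .20) serving clients .101-.110.
clients = range(101, 111)
port_id_spread = {port_id_uplink(3) for _ in clients}
ip_hash_spread = Counter(ip_hash_uplink(20, client) for client in clients)

print("port ID, six VMs:", dict(per_vm))
print("port ID, one VM: ", port_id_spread)
print("IP hash, one VM: ", dict(ip_hash_spread))
```

In the many-to-many case, port-ID teaming already spreads the six VMs evenly across two uplinks. In the one-to-many case, port-ID teaming keeps the single VM pinned to one uplink. IP hash, which needs the EtherChannel on the switch, spreads that VM's ten clients five and five.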
-
Thanks for the How-To, we’re going to give it a try tomorrow.
One question, though: I have the SUP720s and all the fancy stuff. Can you give a basic rundown of how to trunk across the switches?
I’m the server guy, not the network guy, but I like to know what I’m asking somebody to do.
Thanks a bunch.



