It’s common in blade deployments to use multiple Ethernet switches in the blade chassis to provide network redundancy (I’ll refer to these as “in-chassis switches” moving forward). For example, in both the IBM BladeCenter H and the HP BladeSystem c-Class, we can provision multiple in-chassis switches so that half of the NICs on the blades connect to one in-chassis switch and the other half connect to the other switch. Within the OS, we load NIC teaming software to provide automatic failover if one of the links goes down. In this scenario, if one of the in-chassis switches fails then traffic will automatically fail over to the other switch.
In cases like this, everything works as advertised. But what about when the in-chassis switch stays up, but the uplink from that switch to the outside world goes down (perhaps the upstream switch went down or the link was unplugged)? In that case, the link from the in-chassis switch to the blade’s NIC is still up, and therefore the NIC teaming software in the OS does not know that a problem has occurred and will not move the traffic to the other link. In situations like this, we need to implement link state tracking.
<aside>Astute readers will recognize that link state tracking is actually applicable in any server deployment—not just a blade server deployment—where the servers connect to a distribution switch and not the core. I’m just going to focus on blade server deployments here, but the configuration would be much the same, if not exactly the same, in non-blade server deployments.</aside>
Link state tracking is pretty easy to configure; you define one or more upstream ports and one or more downstream ports. The upstream port(s) are the ports that uplink to the rest of the network; in a blade server deployment, this would be the ports (or port groups) that connect to the network backbone. The downstream port(s) are the ports that connect back to the servers.
Here’s an example. We have a Cisco in-chassis switch that has a GigabitEtherChannel port group defined as an uplink out to the outside world:
interface Port-Channel1
 description Uplink to network backbone
 switchport trunk encapsulation dot1q
 switchport trunk native vlan 2
 switchport trunk allowed vlan 2-4094
 switchport mode trunk
 link state group 1 upstream
Note the “link state group 1 upstream” command, which marks this port channel as an upstream port. If all the links in this port channel go down (thus making the port channel itself go down), then the switch will notify downstream ports in the same group to mark themselves as down also.
The member ports of this port channel would not have the “link state” command present:
interface GigabitEthernet0/18
 description Port group member for uplink to network
 switchport trunk encapsulation dot1q
 switchport trunk native vlan 2
 switchport trunk allowed vlan 2-4094
 switchport mode trunk
 channel-group 1 mode on
So for the ports on the same in-chassis switch that are connecting to the servers in the chassis, we have this configuration:
interface GigabitEthernet0/10
 description Web server NIC
 switchport access vlan 2
 switchport mode access
 link state group 1 downstream
 spanning-tree portfast
Note the “link state group 1 downstream” command, which marks this port as a downstream port from the Port-Channel1 interface. If Port-Channel1 goes down (because all the member links in Port-Channel1 also went down), then GigabitEthernet0/10 will also go down. Because GigabitEthernet0/10 went down, the NIC teaming software running in the OS on the blade will fail the traffic over to a different NIC, presumably a NIC that connects to the redundant in-chassis switch.
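In a real chassis you would most likely want every server-facing port in the group, not just one. As a rough sketch, the downstream command could be applied to a range of ports in one step. The port range below (GigabitEthernet0/1 through 0/14) is purely an assumption for illustration; substitute whichever ports actually face the blades on your in-chassis switch:

```
! Hypothetical port range; adjust to the ports that connect to your blades
interface range GigabitEthernet0/1 - 14
 link state group 1 downstream
```

Any port left out of the group keeps its link up when the uplink fails, so the NIC teaming software on that blade would not see the failure.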
You’ll also need the global “link state track 1” command to enable link state tracking (thanks for the clarification, Matt!).
This sort of configuration is particularly applicable in blade deployments, but it also applies in other situations (as mentioned earlier). I hope this is useful!
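As a minimal sketch, enabling the group and then checking it might look like the session below. The “Switch” prompt is just the default hostname and is an assumption here, and the group number matches the examples above. The verification command is the one suggested in the comments; its output is omitted because it varies by platform and software release:

```
Switch# configure terminal
Switch(config)# link state track 1
Switch(config)# end
Switch# show link state group detail
```

Until the group is enabled globally, the per-interface “link state group” commands have no effect, so it is worth running the show command to confirm the group is enabled and lists the expected upstream and downstream ports.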
UPDATE: I’ve changed from using “chassis switch” to “in-chassis switch” to help avoid confusion with products like the Cisco Catalyst 6500 series, which are commonly referred to as chassis switches. Thanks, James!
-
Very good info, and a very valuable feature. One comment: you will also need a global config command, “link state track 1”, to enable the link state feature described above. To check the status of the feature (to make sure it is truly enabled and ready), use the command “sh link state group detail”. Again, great blog!
-
I’m interested to know how you achieve the same if you’re using a Virtual Connect module instead of a Cisco chassis switch.
-
Hey, thanks for clearing this up, that’s probably the thing I’m looking for. I also found out that this feature is only available after a firmware upgrade on the VC module. You need to have at least v1.15.
-
Hi,
Is it possible to configure the switch so that if even one of the upstream ports goes down (1 out of 4 external ports), the upstream port channel is treated as down as well, paving the way for failover on the blade servers?
rgds,
Pritz
-
Hi,
I am looking for a solution in an environment with a CIGESM blade switch module that has 2 uplinks trunked to 2 LAN switches. Whenever one of the uplinks is down, I want to shut the downstream ports down. I have done some testing, but I have been unable to achieve what I want. Any ideas?
-
hello,
Regarding the article Link State Tracking in Blade Deployments: we have a new blade infrastructure, and we tried to set up teaming for a blade server on Windows 2003 Enterprise but were unable to accomplish it. Do we need to configure link state tracking to enable teaming on the blade? Please advise.
-
Hello,
this link-state feature is just what I need, but there is a catch: we are only allowed 2 link-state groups per switch. The uplinks on my blade switch connect to a 6500 system that is not managed by me, and therefore I will be given 1 VLAN per uplink (4 distinct VLANs per switch). In a wonderful world I would be able to configure a link-state group for each of these 4 VLANs, but it turns out I can only have 2. Kind of frustrating, I must say. Is there any way to increase this capacity? Or should I be looking into something else? Thank you!
-
Hello,
I am not allowed to have trunks on my uplinks, so that is why I am stuck with 1 VLAN per physical uplink interface! The 6500 is owned by another entity that will ‘own’ the L3 for my BladeCenter. Bad design, but this was not a technical choice. Any ideas? Thank you!
-
Hi,
we had to choose another way to achieve the L2 failover result: we enabled the ‘beacon probing’ facility in our VMware environment. But I have read some bad reviews of it. Can you share any experience you may have had with it? Thank you!
-
Hi Scott,
I have an interesting one for you:
I have a C7000 with 6 x VC switches – 4 x Ethernet and 2 x FC. Yes, I am using the Ethernet Switch Link cables.
I can get the 2 x vertical ones on the left-hand side working, but cannot get the other 2 x switches to play together nicely….everything points to the customer’s 2nd core switch as the issue…
Strangely when I use just the left-hand-side switches I only get 1 x active connection…which is great…ALL Works no issues – But when I enable the 2 x right-hand-side switches as well, it drops my active connection from the left-hand-side and then activates 2 x Active connections causing a Loop so drops all packets…..I believe?
All Hardware is running latest F/W as of today.
We are using a “Shared Uplink Set”….Servers are ESX 3.5.0 Update 2.
Rgds
Mark
-
Rather a late chime-in, but I just experienced major problems with ‘Beacon Probing’. I had four GB NICs set up in a channel group as outlined in your article. However, I was getting reports of very poor performance.
Using iperf I was able to determine that typical network speed through this was ~50Mbps. If I then put the same server in a portgroup with a single NIC it could run up to 700Mbps.
I tried plenty of things before eventually disabling Beacon Probing at which point the network performance returned to the expected 600-700Mbps.
I don’t claim to understand what was happening, but I can do without beacon probing.
-
Very nice, thanks for the info, needed to do some work tonight and this explains a line which I didn’t fully understand 5 minutes ago
-
Hi
I am looking at how to implement link state tracking on Nortel blade centre switches?
Thanks



Trackback link: http://blog.scottlowe.org/2007/06/22/link-state-tracking-in-blade-deployments/trackback/