Link State Tracking in Blade Deployments

It’s common in blade deployments to use multiple Ethernet switches in the blade chassis to provide network redundancy (I’ll refer to these as “in-chassis switches” moving forward). For example, in both the IBM BladeCenter H and the HP BladeSystem c-Class, we can provision multiple in-chassis switches so that half of the NICs on the blades connect to one in-chassis switch and the other half connect to the other switch. Within the OS, we load NIC teaming software to provide automatic failover if one of the links goes down. In this scenario, if one of the in-chassis switches fails, traffic will automatically fail over to the other switch.

In cases like this, everything works as advertised. But what about when the in-chassis switch stays up, but the uplink from that switch to the outside world goes down (perhaps the upstream switch went down or the link was unplugged)? In that case, the link from the in-chassis switch to the blade’s NIC is still up, and therefore the NIC teaming software in the OS does not know that a problem has occurred and will not move the traffic to the other link. In situations like this, we need to implement link state tracking.

<aside>Astute readers will recognize that link state tracking is actually applicable in any server deployment—not just a blade server deployment—where the servers connect to a distribution switch and not the core. I’m just going to focus on blade server deployments here, but the configuration would be much the same, if not exactly the same, in non-blade server deployments.</aside>

Link state tracking is pretty easy to configure; you define one or more upstream ports and one or more downstream ports. The upstream port(s) are the ports that uplink to the rest of the network; in a blade server deployment, this would be the ports (or port groups) that connect to the network backbone. The downstream port(s) are the ports that connect back to the servers.

Here’s an example. We have a Cisco in-chassis switch that has a Gigabit EtherChannel port group defined as an uplink to the outside world:

interface Port-Channel1
 description Uplink to network backbone
 switchport trunk encapsulation dot1q
 switchport trunk native vlan 2
 switchport trunk allowed vlan 2-4094
 switchport mode trunk
 link state group 1 upstream

Note the “link state group 1 upstream” command, which marks this port channel as an upstream port. If all the links in this port channel go down (thus making the port channel itself go down), then the switch will notify downstream ports in the same group to mark themselves as down also.

The member ports of this port channel would not have the “link state” command present:

interface GigabitEthernet0/18
 description Port group member for uplink to network
 switchport trunk encapsulation dot1q
 switchport trunk native vlan 2
 switchport trunk allowed vlan 2-4094
 switchport mode trunk
 channel-group 1 mode on

For the ports on the same in-chassis switch that connect to the servers in the chassis, we have this configuration:

interface GigabitEthernet0/10
 description Web server NIC
 switchport access vlan 2
 switchport mode access
 link state group 1 downstream
 spanning-tree portfast

Note the “link state group 1 downstream” command, which marks this port as a downstream port from the Port-Channel1 interface. If Port-Channel1 goes down (because all the member links in Port-Channel1 also went down), then GigabitEthernet0/10 will also go down. Because GigabitEthernet0/10 went down, the NIC teaming software running in the OS on the blade will fail the traffic over to a different NIC, presumably a NIC that connects to the redundant in-chassis switch.

You’ll also need the global configuration command “link state track 1” to enable link state tracking (thanks for the clarification, Matt!).
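
As a rough sketch, enabling the feature and then checking it with the verification command Matt mentions in the comments might look like the following. It assumes group number 1, as in the interface examples above. Prompts and command output vary by platform and IOS release, so none are shown here.

```
! Enter global configuration mode and enable link state group 1
configure terminal
 link state track 1
 end
! Confirm the group is enabled and see its upstream and downstream ports
show link state group detail
```

If the group is not enabled globally, the per-interface “link state group” commands are present in the configuration but have no effect, so the show command is worth running after any change.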

This sort of configuration is particularly applicable in blade deployments, but as mentioned earlier, it applies in other situations as well. I hope this is useful!

UPDATE: I’ve changed from using “chassis switch” to “in-chassis switch” to help avoid confusion with products like the Cisco Catalyst 6500 series, which are commonly referred to as chassis switches. Thanks, James!

  1. Matt Slavin

    Very good info, and a very valuable feature. One comment: you will also need a global config command, “link state track 1”, to enable the link state feature described above. To check the status of the feature (to make sure it is truly enabled and ready), use the command “sh link state group detail”. Again, great blog! :-)

  2. slowe

    Matt,

    Thanks for the clarification! I’ve updated the article accordingly.

  3. rdeville

    I’m interested to know how you achieve the same if you’re using a Virtual Connect module instead of a Cisco chassis switch.

  4. slowe

    Honestly, I don’t know. But I can try to find out for you! I’ll add a comment back here when I have some additional information.

  5. slowe

    Rdeville,

    VirtualConnect includes a feature called SmartLink that performs the same function. With SmartLink enabled, if a VirtualConnect switch detects that all upstream links have failed, it will mark the blade server links as down as well–thus accomplishing the same thing as link state tracking in the Cisco world.

    There is some additional advanced functionality that is present with VirtualConnect that I haven’t had the opportunity to try out yet that may also help provide fault tolerance and redundancy in the upstream links.

    Hope this helps!

  6. rdeville

    Hey, thanks for clearing this up, that’s probably the thing I’m looking for. I also found out that this feature is only available after a firmware upgrade on the VC module. You need to have at least v1.15.

  7. slowe

    Robin,

    Yes, you’re absolutely correct–I should have mentioned that SmartLink requires firmware revision 1.15 as you stated. Thanks!

  8. Pritz

    Hi,

    How about configuring it so that even if just one of the upstream ports is down (1 out of 4 external ports), the upstream port channel goes down as well, paving the way for failover on the blade servers? Is this possible?

    rgds,
    Pritz

  9. MH

    Hi,

    I am looking for a solution in an environment with a CIGESM blade switch module that has 2 uplink trunks to 2 LAN switches. Whenever one of the uplinks is down, I want to shut the downstream ports down. I have done some testing, but have been unable to achieve what I want. Any ideas?

  10. Ilyas

    Hello,

    Regarding the article Link State Tracking in Blade Deployments: we have a new blade infrastructure. We were trying to set up teaming for the blade servers on Windows 2003 Enterprise, but were unable to accomplish it. Do we need to configure link state tracking to enable teaming on the blades? Please advise.

  11. slowe

    Pritz,

    This should be fairly easy; just set the “link state group X upstream” command on the physical uplink port instead of the port channel. Keep in mind that you can use multiple link state tracking groups. I don’t know what the maximum number of link state tracking groups is, so watch for that.

    MH,

    You should be able to do exactly what you are seeking to do with the suggested configs above. These were tested on an IBM BladeCenter.

    Ilyas,

    Have you tried the configurations in the article yet? That will enable link state tracking and will create a solution in which NIC teaming will work properly.

  12. Sonia

    Hello,

    This link-state feature is just what I need, but there is a catch: we are only allowed 2 link-state groups per switch. The uplinks on my blade switch connect to a 6500 system that is not managed by me, and therefore I will be given 1 VLAN per uplink (4 distinct VLANs per switch). In a wonderful world I would be able to configure a link-state group for each of these 4 VLANs, but it turns out I can only have 2. Kind of frustrating, I may say. Is there any way to increase this capacity? Or should I be looking into something else? Thank you!

  13. slowe

    Sonia,

    Unless I am misunderstanding your question, I don’t see a problem with what you want to do. The VLANs are logical connections, not physical connections; you can run all four of your VLANs across two physical connections that are configured as VLAN trunks. This allows all four VLANs to share the same uplinks to the core switch(es) and allows you to take advantage of link-state tracking.

    Good luck!

  14. Sonia

    Hello,

    I am not allowed to have trunks on my uplinks :) So that is why I am stuck with 1 VLAN per physical uplink interface! The 6500 is owned by another entity that will ‘own’ the L3 for my BladeCenter. Bad design, but this was not a technical choice. Any ideas? Thank you!

  15. slowe

    Sonia,

    In that case, if you are not allowed to configure your uplinks as 802.1q VLAN trunks, then there isn’t a whole lot you can do. You *might* be able to set up some sort of scenario in which you link the chassis switches and each switch has an uplink to a different VLAN, but I’m not 100% sure that would actually work. If you get it to work, please let me know what steps you took so that I can post that information here for everyone’s benefit.

    Thanks, and good luck!

  16. Sonia

    Hi,

    We had to choose another way to achieve the L2 failover result: we enabled the ‘beacon probing’ facility in our VMware environment. But I have read some bad reviews of it. Can you share any experience you may have had with it? Thank you!

  17. slowe

    Sonia,

    I, too, have heard of problems when using beacon probing, but I don’t have any direct experience one way or the other. Hopefully some other readers will be able to share their experience and/or knowledge on the use of beacon probing.

    Anyone care to chime in?

  18. Erik

    Thanks for this great information. It came in very handy on a new BladeCenter H implementation with ESX 3.5

  19. slowe

    Erik,

    I’m glad you found the information helpful. Feel free to spread the word about the site!

  20. Mark Burse

    Hi Scott,

    I have an interesting one for you:
    I have a C7000 with 6 x VC switches: 4 x Ethernet and 2 x FC. Yes, I am using the Ethernet Switch Link cables.
    I can get the 2 x vertical ones on the left-hand side working, but cannot get the other 2 x switches to play together nicely. Everything points to the customer’s 2nd core switch as the issue.
    Strangely, when I use just the left-hand-side switches I only get 1 x active connection, which is great; all works with no issues. But when I enable the 2 x right-hand-side switches as well, it drops my active connection from the left-hand side and then activates 2 x active connections, causing a loop, so it drops all packets, I believe.
    All hardware is running the latest F/W as of today.
    We are using a “Shared Uplink Set”. Servers are ESX 3.5.0 Update 2.
    Rgds
    Mark

  21. slowe

    Mark,

    Interesting issue. Send me an e-mail and let’s take this discussion private. I’ll be happy to help, if I’m able to do so.

  22. James Cape

    Terminology quibble:

    The term “Chassis Switch” refers to something else: a modular switch, e.g. the Cisco 6500 series.

  23. slowe

    James, thanks for the clarification. I’ve modified the post accordingly to help avoid confusion.

  24. Chris

    Rather a late chime-in, but I just experienced major problems with ‘Beacon Probing’. I had four Gigabit NICs set up in a channel group as outlined in your article. However, I was getting reports of very poor performance.

    Using iperf I was able to determine that typical network speed through this was ~50Mbps. If I then put the same server in a portgroup with a single NIC it could run up to 700Mbps.

    I tried plenty of things before eventually disabling Beacon Probing at which point the network performance returned to the expected 600-700Mbps.

    I don’t claim to understand what was happening, but I can do without beacon probing.

  25. William

    Very nice, thanks for the info, needed to do some work tonight and this explains a line which I didn’t fully understand 5 minutes ago :)

  26. A. Mikkelsen

    Thanks for this solution.
    It helped us.

    A. Mikkelsen