Cisco


In my previous article about using NetApp multi-mode VIFs with Cisco switches, I mentioned that you could—at that time—only use 802.3ad static link aggregation:

Be aware that Data ONTAP’s multi-mode VIFs are only compatible with static 802.3ad link aggregation; you can’t use PAgP (Cisco proprietary protocol).  I would assume dynamic LACP is also incompatible.  For this reason we used the “channel-group 1 mode on” statement instead of something like “channel-group 1 mode desirable”.

I recently got some feedback from a NetApp SE in my area; this SE informed me that Link Aggregation Control Protocol (LACP, part of the IEEE 802.3ad specification) is indeed supported with Data ONTAP version 7.2.  This KB article on the NetApp NOW site (login required) indicates that ONTAP 7.2.1 is required in order to use an LACP VIF.

There are a couple of important requirements to note; these are laid out in the referenced KB article:

  1. Dynamic multimode VIFs should use IP address-based load balancing.  This means that the Cisco switch or the channel group must also use IP address-based load balancing.
  2. Dynamic multimode VIFs must be first-level VIFs.  This makes sense; LACP is a Layer 2 protocol, so layering an LACP VIF on top of other VIFs just doesn’t work.
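
For the first requirement, the switch-side setting is the same one used for static multi-mode VIFs.  A minimal sketch, assuming the s3 switch from the examples in this post and a Catalyst platform where load balancing is a switch-wide setting rather than a per-channel one:

```
! Use source/destination IP-based load balancing for all port channels
s3(config)#port-channel load-balance src-dst-ip
s3(config)#end
! Confirm the algorithm now in use
s3#show etherchannel load-balance
```

Because the setting is global on this class of switch, check whether any existing port channels depend on the current algorithm before changing it.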

To create the dynamic multimode VIF on the Data ONTAP side, the command is pretty simple:

vif create lacp <vif name> -b ip {interface list}

On the Cisco side, the commands are very similar:

s3(config)#int port-channel1
s3(config-if)#description LACP multimode VIF for netapp1
s3(config-if)#int gi0/23
s3(config-if)#channel-protocol lacp
s3(config-if)#channel-group 1 mode active

These commands would be repeated for all physical ports that should be included in the LACP bundle.  Note the differences from the earlier commands in the previous article; here we use “channel-group 1 mode active” instead of “channel-group 1 mode on”.  We also added the “channel-protocol lacp” command.
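
As a sketch of that repetition, here is what adding a second member port and checking the result might look like.  The port gi0/24 is an assumption carried over from the earlier static example; substitute your own ports and channel group number:

```
! Second member port (gi0/24 is assumed; use whichever port you are bundling)
s3(config)#int gi0/24
s3(config-if)#channel-protocol lacp
s3(config-if)#channel-group 1 mode active
s3(config-if)#end
! Bundled member ports are flagged (P) and the protocol column should read LACP
s3#show etherchannel 1 summary
! Shows what the switch has learned about the LACP partner on each member port
s3#show lacp 1 neighbor
```

If the member ports do not bundle, the NetApp side is the first place to look: the VIF has to be created as an LACP VIF for the negotiation to complete.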

Together, these commands will establish an LACP-based link aggregate between a NetApp storage system running Data ONTAP version 7.2.1 or higher and a Cisco IOS-based switch.

Thanks to Jeff, our NetApp SE, for providing the updated information.


I just saw this headline from virtualization.info about Cisco being the first to announce a third-party virtual switch for ESX Server. Honestly, I’m not too terribly surprised.

The announcement of John Chambers as a keynote speaker at VMworld 2007, followed by Cisco’s $150 million investment in VMware, surely had to signal that something was going on.  In fact, if I recall correctly, there were rumors of this last year at VMworld.

Even though it’s not really a surprise, I’m excited about the possibility that this is actually the case.  I’m not a Cisco expert, but I’d love to have the ability to run Cisco-based switches on ESX Server (assuming the overhead isn’t too significant).  That should make integration of ESX Server with Cisco-based networks much easier than it has been in the past.  What’s next, SAN switches on ESX?  (I may not be too far off there, given all the rumors and discussions about NPIV support in ESX Server.)

UPDATE:  Ryan Glover over at p2vd.com shares his thoughts on the announcement here.


The following configuration will enable you to authenticate login requests to Cisco equipment running IOS against Active Directory.  This would, for example, allow you to centralize the authentication of your Cisco-based network infrastructure against Active Directory.

Configuring the Cisco Equipment

The equipment I used in this configuration was a Cisco Catalyst 3560G switch running IOS 12.2(25); please note that the commands listed here may be different in different versions of IOS.  The commands should be roughly equivalent, however, across hardware platforms.

First we’ll enable the external authentication mechanisms, and then we’ll specify the external authentication servers we’re going to use:

  1. First, to enable external authentication on the switch, use the following commands in global configuration mode:
    s1(config)#aaa new-model
    s1(config)#aaa authentication login default group radius local

    This enables the authentication of login requests by RADIUS first, then by the local user database if the RADIUS servers don’t respond (just in case network connectivity is down).  We specify “local” as well because this configuration applies to both telnet requests and physical console requests.
  2. Next, we specify the external authentication servers that the switch should use:
    s1(config)#radius-server host 10.1.1.254 auth-port 1645 acct-port 1646 key Password

    Best practices dictate that you should have at least two RADIUS servers for redundancy.  Note that the “auth-port” and “acct-port” parameters are only necessary if you are using nonstandard ports.  Since Microsoft’s IAS (Internet Authentication Service, which provides the RADIUS interface to Active Directory) listens on both sets of standard ports (1645/1812 for authentication and 1646/1813 for accounting), you won’t need to specify these parameters.  The “key” parameter is a shared secret key between the RADIUS client (the switch) and the RADIUS server.  Obviously, you’ll want to use something other than “Password”.
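
To illustrate the redundancy recommendation, a two-server configuration might look like the sketch below.  The second address (10.1.1.253) is made up for the example, and the port parameters are left out on the assumption that the IOS defaults match what IAS listens on:

```
! Primary IAS server from the example above (ports omitted, so the defaults apply)
s1(config)#radius-server host 10.1.1.254 key Password
! Second IAS server for redundancy (10.1.1.253 is a hypothetical address)
s1(config)#radius-server host 10.1.1.253 key Password
```

The switch tries the servers in the order they were configured, so list the one you want used first at the top.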

Now we’re ready to move to configuring the Windows servers that we’ll use for RADIUS authentication.

Configuring Internet Authentication Service (IAS)

Configuring IAS is rather simple.  I’ve discussed the use of IAS before (here in discussing Cisco PIX-AD integration and here regarding WatchGuard Firebox-AD integration), and I’ll refer you back to those articles for some of the basics on setting up and configuring IAS.

Note that these instructions are based on the version of IAS included with Windows Server 2003 R2; different versions may behave slightly differently.

To configure IAS in this instance (once it has been installed and registered with Active Directory), we’ll do the following:

  • Add the Cisco Catalyst switch as a RADIUS client.  We’ll need to be sure to specify the same shared secret as used in the switch configuration above.  You can specify the Cisco switch either by DNS name (if it is registered in DNS) or by IP address.
  • Create a new remote access policy that grants remote access permission.  The conditions on the policy should be “NAS-IP-Address” (set to the IP address of the Cisco equipment) and “Windows-Groups” (set to whatever group should be allowed to authenticate to the switch; I created a group called “Cisco Admins” and used it).
  • Configure the profile to use only PAP authentication and no encryption.

Repeat this process on the second Windows server running IAS (you did configure two for redundancy, didn’t you?).

That’s it!  At this point, you should be able to telnet to the Cisco switch (or whatever IOS-based equipment you’ve configured) and log in with your Active Directory username and password.  Once logged in, you can use your enable or enable secret password to enter privileged exec mode.

Now, before you go any further, add a local account to use in case the network connectivity to the RADIUS server is lost:

s1(config)#username localaccount password password123

(Obviously, you’ll want to use a secure password!)  This will ensure that if you lose network connectivity to the equipment, you can still get in through the serial console connection.  Be warned: without this local account, you can be locked out of the equipment completely if the RADIUS server(s) are inaccessible!

This Cisco document offers some additional information on AAA configurations, so I’ll refer you there for more detailed descriptions of the commands involved.  Enjoy!


It’s common in blade deployments to use multiple Ethernet switches in the blade chassis to provide network redundancy (I’ll refer to these as “chassis switches” moving forward). For example, in both the IBM BladeCenter H and the HP BladeSystem c-Class, we can provision multiple chassis switches so that half of the NICs on the blades connect to one chassis switch and the other half connect to the other switch. Within the OS, we load NIC teaming software to provide automatic failover if one of the links goes down. In this scenario, if one of the chassis switches fails then traffic will automatically fail over to the other switch.

In cases like this, everything works as advertised. But what about when the chassis switch stays up, but the uplink from that switch to the outside world goes down (perhaps the upstream switch went down or the link was unplugged)? In that case, the link from the chassis switch to the blade’s NIC is still up, and therefore the NIC teaming software in the OS does not know that a problem has occurred and will not move the traffic to the other link. In situations like this, we need to implement link state tracking.

<aside>Astute readers will recognize that link state tracking is actually applicable in any server deployment—not just a blade server deployment—where the servers connect to a distribution switch and not the core. I’m just going to focus on blade server deployments here, but the configuration would be much the same, if not exactly the same, in non-blade server deployments.</aside>

Link state tracking is pretty easy to configure; you define one or more upstream ports and one or more downstream ports. The upstream port(s) are the ports that uplink to the rest of the network; in a blade server deployment, this would be the ports (or port groups) that connect to the network backbone. The downstream port(s) are the ports that connect back to the servers.

Here’s an example. We have a Cisco chassis switch that has a GigabitEtherChannel port group defined as an uplink out to the outside world:

interface Port-Channel1
 description Uplink to network backbone
 switchport trunk encapsulation dot1q
 switchport trunk native vlan 2
 switchport trunk allowed vlan 2-4094
 switchport mode trunk
 link state group 1 upstream

Note the “link state group 1 upstream” command, which marks this port channel as an upstream port. If all the links in this port channel go down (thus making the port channel itself go down), then the switch will notify downstream ports in the same group to mark themselves as down also.

The member ports of this port channel would not have the “link state” command present:

interface GigabitEthernet0/18
 description Port group member for uplink to network
 switchport trunk encapsulation dot1q
 switchport trunk native vlan 2
 switchport trunk allowed vlan 2-4094
 switchport mode trunk
 channel-group 1 mode on

So for the ports on the same chassis switch that are connecting to the servers in the chassis, we have this configuration:

interface GigabitEthernet0/10
 description Web server NIC
 switchport access vlan 2
 switchport mode access
 link state group 1 downstream
 spanning-tree portfast

Note the “link state group 1 downstream” command, which marks this port as a downstream port from the Port-Channel1 interface. If Port-Channel1 goes down (because all the member links in Port-Channel1 also went down), then GigabitEthernet0/10 will also go down. Because GigabitEthernet0/10 went down, the NIC teaming software running in the OS on the blade will fail the traffic over to a different NIC, presumably a NIC that connects to the redundant chassis switch.

You’ll also need the “link state track 1” global configuration command to enable link state tracking (thanks for the clarification, Matt!).
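
As a sketch, the global piece and a quick check might look like this.  The group number 1 matches the interface examples above; the show command is included on the assumption that your IOS release supports link state tracking:

```
! Global configuration mode: enable tracking for link state group 1
link state track 1
!
! Privileged exec mode: check the group and its upstream/downstream ports
show link state group detail
```

Without the global command, the “link state group” statements on the interfaces are accepted but the group is not active, so downstream ports will not follow the uplink.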

This sort of configuration is particularly applicable in blade deployments, but it applies in other situations as well (as mentioned earlier). I hope this is useful!


Network Appliance storage systems support the use of virtual interfaces (VIFs) to provide link redundancy and improved network throughput.  Two types of VIFs are available:

  • Single-mode VIFs act like a fault tolerant team and will fail traffic over to a standby link when the active link goes down.
  • Multi-mode VIFs act like a group of links providing aggregate bandwidth as well as link redundancy.

Single-mode VIFs are great for fault tolerance, but the storage system isn’t leveraging all the links.  It’s an “active-passive” arrangement in which only one of the links is passing traffic while the other link is idle.  No switch support is required for this configuration.

Multi-mode VIFs, on the other hand, allow for both greater bandwidth utilization and fault tolerance.  Traffic will be distributed across all the links in the VIF (typically based on IP address), and if one link fails the traffic is redistributed across the remaining links.  However, this configuration requires support on the switch.  In this article, we’re going to look at configuring a Cisco Catalyst 3560 switch to do link aggregation with a NetApp storage system running Data ONTAP 7.1.1.1.

To configure the switch, we’ll use the following commands (these are entered in global configuration mode on the switch):

s3(config)#int port-channel1
s3(config-if)#description Multi-mode VIF for netapp1
s3(config-if)#int gi0/23
s3(config-if)#channel-group 1 mode on
s3(config-if)#int gi0/24
s3(config-if)#channel-group 1 mode on

This creates the port-channel1 interface (you may need to increment that number, e.g., use port-channel2 or port-channel3, if you already have existing link aggregates configured) and adds interfaces GigabitEthernet0/23 and GigabitEthernet0/24 to the link aggregate.  If you do have to use a different link aggregate interface, be sure the number of the interface (“int port-channel4”) matches the number of the channel-group specified on the member interfaces (“channel-group 4 mode on”).  This seems obvious, but it’s worth mentioning nevertheless.
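
To make the numbering point concrete, here is a sketch of the same configuration using channel group 4.  The ports gi0/21 and gi0/22 and the description text are hypothetical and chosen only for illustration:

```
! The port-channel interface number (4) ...
s3(config)#int port-channel4
s3(config-if)#description Second multi-mode VIF
! ... must match the channel-group number on every member port
s3(config-if)#int gi0/21
s3(config-if)#channel-group 4 mode on
s3(config-if)#int gi0/22
s3(config-if)#channel-group 4 mode on
```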

Be aware that Data ONTAP’s multi-mode VIFs are only compatible with static 802.3ad link aggregation; you can’t use PAgP (Cisco proprietary protocol).  I would assume dynamic LACP is also incompatible.  For this reason we used the “channel-group 1 mode on” statement instead of something like “channel-group 1 mode desirable”.

Many Cisco switches default to MAC address-based load balancing across the links, whereas NetApp defaults to IP address-based load balancing.  To see the switch’s current load balancing configuration, use this command in privileged mode:

s3#show etherchannel load-balance

To change the switch’s load balancing algorithm to a mode compatible with NetApp’s, use the following command in global configuration mode (note that changing it affects the entire switch; you can’t change it for a single port-channel individually):

s3(config)#port-channel load-balance src-dst-ip

Once the switch is configured, we can proceed with configuring the NetApp storage system.  The following commands will create the multi-mode VIF (this can also be done via the FilerView GUI):

netapp1>vif create multi vif0 e6d e7d
netapp1>ifconfig vif0 172.31.254.10 netmask 255.255.255.0
netapp1>ifconfig vif0 up

This creates the VIF with interfaces e6d and e7d as members, plumbs it with an IP address, and brings it up.  Running the command “vif status vif0” now will return the following results:

default: transmit ‘IP Load balancing’, VIF Type ‘multi_mode’, fail ‘log’
vif0: 2 links, transmit ‘IP Load balancing’, VIF Type ‘multi-mode’ fail ‘default’
 
VIF Status Up Addr_set
up:
e6d: state up, since 05Oct2001 17:17:15 (05:23:05)
mediatype: auto-1000t-fd-up
flags: enabled
input packets 2000, input bytes 12800
output packets 173, output bytes 1345
up indications 1, broken indications 0
drops (if) 0, drops (link) 0
indication: up at boot
consecutive 3, transitions 1
e7d: state up, since 05Oct2001 17:18:03 (00:10:03)
mediatype: auto-1000t-fd-up
flags: enabled
input packets 134, input bytes 987
output packets 20, output bytes 156
up indications 1, broken indications 0
drops (if) 0, drops (link) 0
indication: broken

Note the ‘IP Load balancing’ algorithm stated in the output; this is why the switch’s load-balancing mechanism should be changed to match.

At this point, the links should be up between the Cisco switch and the NetApp storage system, and traffic should be passing to and from the storage system without any problems.  To test the fault tolerance, we can pull one of the links in the VIF; traffic should continue to flow with very little, if any, interruption.  And while traffic from a single client to the NetApp won’t see a significant increase in throughput, the overall throughput of multiple separate clients to the NetApp should improve with multiple links in the VIF.
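
If pulling a cable isn’t convenient, a rough way to simulate the same failure is to shut down one member port from the switch.  This is only a sketch, assuming the gi0/23 and gi0/24 member ports and channel group 1 used above:

```
! Administratively disable one member link to simulate a failure
s3(config)#int gi0/24
s3(config-if)#shutdown
s3(config-if)#end
! The port channel should stay up with gi0/23 as the only bundled member
s3#show etherchannel 1 summary
! Restore the link when the test is finished
s3#configure terminal
s3(config)#int gi0/24
s3(config-if)#no shutdown
```

Running “vif status vif0” on the storage system during the test should show the corresponding interface as down while the VIF itself remains up.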

More information, including additional Cisco configs, is available here.


For the last few weeks, I’ve been implementing a Squid proxy server (with SquidGuard content filtering) to help control outbound web access from my home network.  Basically, I wanted to make sure that the kids weren’t accessing content that was inappropriate.  So, after making sure that the proxy server was working as expected, last night I locked down the Cisco PIX firewall (OK, I’m a geek—what can I say?) to only allow outbound HTTP/HTTPS traffic from the proxy server itself.  (I suspect that one of my daughters had discovered that she could bypass the proxy, hence the need to lock down the firewall.  She’s got a surprise coming.)

Upon arriving home from work today, I booted up my laptop and launched my suite of applications and sites (Camino, NetNewsWire, Adium, Cocalicious, webmail for the office, a few other key web sites, etc.).  I was greeted with a prompt to enter the password for two MSN accounts that I use for instant messaging with Adium.  I entered the password.  It prompted me again.  Puzzled, I typed the password again, more slowly to make sure that I had every character right.  Still wouldn’t connect.  What was going on?

Now suspicious that someone had compromised my accounts, I logged in to the accounts via an HTTPS connection in a Web browser.  No, the password was right; the accounts were OK.  So why wouldn’t Adium connect?

Next, I reviewed the account settings in Adium while also performing a Google search.  The settings looked right (most notably, the “Connect via HTTP” option was unchecked), and the Google search turned up an Adium Trac ticket about MSN connectivity with a proxy server.  At that point, the light bulb came on—I must need to configure Adium to use the systemwide proxy settings in Mac OS X.  A quick couple of clicks later I was done, but Adium still wouldn’t connect.  Huh?

<aside>I still haven’t determined if configuring Adium to use the systemwide proxy settings failed due to a content violation, an error with the proxy configuration, or a bug in Adium.  Nothing was logged in the blocked sites log on the proxy, though, so I’m leaning toward one of the latter two.</aside>

OK, then, let’s go old-school on this thing.  I logged into my PIX firewall, issued a “show access-list” command to get the current traffic matching statistics, and tried to connect again.  Another “show access-list” command, and it became painfully clear that the outbound rule blocking HTTPS access from everyone except the proxy was causing the problem.

<rant>If an application says that it uses a certain set of protocols or ports, DON’T use an entirely different set of protocols and ports.  It drives people like me crazy!  (Not to mention it flies in the face of trying to establish reasonable security policies.)</rant>

Back in the PIX firewall again, I entered a “debug packet” command to echo HTTPS packets exiting the network, and tried connecting to my MSN account from Adium again.  Yep, there it is—a couple of different IP addresses (65.54.179.228 and 65.54.183.203) showed up in the debug output every time I tried to connect.  After modifying the access-list to allow HTTPS traffic to those specific addresses, Adium connected without any further problems.

Now what would an “ordinary” person (and by that I mean someone who isn’t a network engineer or a geek) have done in this situation?  Would they have known to check their firewall?  Or to issue a command to have the firewall report blocked packets so that they could determine exactly what was being blocked and why?

Granted, ordinary people probably won’t have PIX firewalls at their house, and probably wouldn’t be locking down outbound traffic via access lists.  Still, doesn’t this tell us something?  Shouldn’t applications adhere to the protocols and ports they say they are using?  Shouldn’t applications be intelligent enough to provide an error message that describes the underlying problem?  After all this progress in computer hardware, applications, and networking, are we still unable to accurately diagnose problems without going to multiple sources?  Or am I just expecting too much?


To help make it easier to find the various Active Directory integration articles I’ve written, I’m including links below to the latest version of each article.  As new versions of an article are published, I can simply update the corresponding link to point to the new version.

I’ve grouped the integration articles according to product below.

Linux

Latest version for Windows Server 2008 (“Longhorn”)

Latest version for Windows Server 2003 R2

Latest version for Windows 2000 Server and Windows Server 2003 (pre-R2)

SuSE Linux Enterprise Desktop (SLED)-specific version

Solaris 10

Latest version for Solaris 10 x86

Firewalls

Latest version for Cisco PIX VPN

Latest version for WatchGuard Firebox VPN

VMware ESX Server

Latest version for ESX Server 2.5.x

Latest version for ESX Server 3.0.x

OpenBSD

Latest version for OpenBSD 3.9

Networking Equipment and Protocols

Latest version for 802.1x

Latest version for Cisco IOS

As new articles are published or existing articles are revised with new versions, I’ll update this post accordingly.


The idea behind 802.1x is to provide Layer 2 authentication; that is, to authenticate LAN clients at the Ethernet layer.  (This is before the client gets a DHCP lease or anything of that nature.)  With 802.1x in place, rogue users can’t just tap into a physical connection on the network.  In order to gain network connectivity, the device must authenticate before network traffic is allowed.

The idea here is to configure 802.1x authentication on a network switch in such a way as to leverage the existing authentication infrastructure provided by Active Directory.  Like it or not, Active Directory is a widely deployed directory service and leveraging it where we can will certainly provide an advantage.  This process uses RADIUS to provide an interface between a Cisco Catalyst 3560G switch (the 802.1x authenticator in this scenario) and Active Directory.  I could only test Mac OS X as the client (or 802.1x supplicant), but I’m confident that the configuration will work equally well with Windows XP Professional.

Configuring the Cisco Catalyst 3560G

The Catalyst switch I used in this configuration was running IOS 12.2(25); please note that the commands listed here may be different in different versions of IOS.

To configure the switch for 802.1x authentication, three steps are involved:

  1. Enable 802.1x authentication on the switch (global configuration).
  2. Configure the RADIUS server(s) with which the switch will communicate for authentication requests.
  3. Enable 802.1x authentication on the individual ports.

(This document from the Cisco web site was tremendously helpful in configuring 802.1x.)

First, to enable 802.1x authentication on the switch, use the following commands in global configuration mode:

aaa new-model
aaa authentication dot1x default group radius
aaa authorization network default group radius
dot1x system-auth-control

This enables 802.1x globally on the switch, but none of the interfaces are enabled for 802.1x authentication.  Next, we configure the RADIUS server(s) to which the switch will pass the 802.1x authentication traffic.  That’s handled with these commands in global configuration mode:

radius-server host 10.1.1.254 auth-port 1645 acct-port 1646 key Password

Note that the “auth-port” and “acct-port” parameters are only necessary if you are using nonstandard ports.  Since Microsoft’s IAS (Internet Authentication Service, which provides the RADIUS interface to Active Directory) listens on both sets of standard ports (1645/1812 for authentication and 1646/1813 for accounting), you won’t need to specify these parameters.  The “key” parameter is a shared secret key between the RADIUS client (the switch) and the RADIUS server.  Obviously, you’ll want to use something other than “Password”.

Finally, to enable 802.1x on the applicable interfaces, you’ll use these commands in interface configuration mode (substitute whatever interface you want to configure for gi0/23):

int gi0/23
dot1x port-control auto

That enables 802.1x authentication on that specific port.  Repeat the process for all ports that should use 802.1x authentication.  Note that some ports can’t be enabled for 802.1x authentication; most notably, trunk ports can’t be used for 802.1x.  Refer to the Cisco documentation (or the documentation from your particular vendor) for complete details on the limitations.
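
One limitation worth sketching: on Catalyst switches where ports default to a dynamic switchport mode, the port generally has to be set to static access mode before 802.1x can be enabled on it.  The fragment below assumes that is the case on your switch and reuses gi0/23 as the example port:

```
! gi0/23 is the example port; set it to static access mode first
int gi0/23
 switchport mode access
 dot1x port-control auto
!
! Privileged exec mode: check the 802.1x state of the port
show dot1x interface gigabitethernet0/23
```

Until a supplicant authenticates, the port should show as unauthorized and pass only 802.1x (EAPOL) traffic.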

Now that the switch is configured, we move on to configuring Active Directory.

Configuring Active Directory and IAS

I suppose that saying we need to “configure Active Directory” isn’t entirely accurate, since no configuration changes and no schema extensions are necessary to make this work.  All that really needs to be done is to enable reversible password encryption (which can be done on a per-user basis) and set up Internet Authentication Service (IAS).

First, regarding reversible password encryption:  The configuration described here uses MD5 hashes (passwords) to authenticate clients to the network.  There are other methods, such as digital certificates, to accomplish the same thing, and I’ll probably revisit this configuration at a later date to look at using those.  For now, though, the use of MD5 for authentication means that we have to enable reversible password encryption for every user that will need to authenticate via 802.1x, and those users will need to change their passwords after that change is made.  A pain, yes, and a potential security concern, yes, but necessary at this point.  (I won’t bother going through the details of enabling reversible password encryption here; there are plenty of resources available on the Internet, like this one, that provide that information.)

Configuring IAS is really pretty straightforward.  I’ve discussed the use of IAS before (here in discussing Cisco PIX-AD integration and here regarding WatchGuard Firebox-AD integration), and I’ll refer you back to those articles for some of the basics on setting up and configuring IAS.

To configure IAS in this instance (once it has been installed and registered with Active Directory), we’ll do the following:

  • Add the Cisco Catalyst switch as a RADIUS client.  We’ll need to be sure to specify the same shared secret as used in the switch configuration.
  • We’ll create a new remote access policy.  The conditions on the policy should be “NAS-Port-Type” (set to Ethernet) and “Windows-Groups” (set to whatever group should be allowed to authenticate via 802.1x; I used Domain Users).
  • The profile associated with this policy should be edited to allow only the EAP MD5 authentication type (under “EAP Methods” on the Authentication tab); all other authentication types should be unchecked.  In addition, all encryption types on the Encryption tab should be unchecked except for “No encryption”.

At this point, the IAS configuration should be complete.  Now for the final step:  configuring the client to use 802.1x.

Configuring the Client (Mac OS X)

As mentioned earlier, I didn’t have a physical Windows XP Professional-based machine to test with, but I did do some testing with Mac OS X.  Although the software used to configure the operating system is different, the overall configuration is similar and should work without any major hitches on Windows XP.

To configure Mac OS X, launch the Internet Connect software in the Applications folder and follow these steps:

  1. From the File menu, choose “New 802.1X Connection…”.
  2. Specify a description and choose the appropriate network port (typically “Built-in Ethernet”).
  3. Specify a username and password.
  4. For authentication types, click to enable MD5 and move it to the top of the list.  Uncheck all other authentication types.
  5. Click OK to save the connection.

Once the connection has been defined, you can plug your OS X-based system into one of the 802.1x-enabled ports and click “Connect” in the Internet Connect window.  If everything is configured correctly, you should be connected and be able to pass network traffic without any issues.  If things don’t work, go back and check the switch configuration and the logs on the IAS/RADIUS server.  In particular, the logs may indicate that an incorrect password was used, or you may be able to determine that the switch isn’t even talking to the IAS/RADIUS server (perhaps a typo in the server address?).

By the way, configuring Mac OS X to use 802.1x for wireless connections is equally easy and done the same way (using Internet Connect).  I used to regularly use my MacBook Pro in an environment that used 802.1x and EAP-FAST/LEAP for wireless authentication with no problems.

Future enhancements to this configuration include switching from EAP-MD5 to something like EAP-TLS or PEAP; this will avoid the need to enable reversible password encryption on the domain.


Before we get into the details, allow me to give credit where credit is due. First, thanks to Dan Parsons of IT Obsession for an article that jump-started the process with notes on the Cisco IOS configuration. Next, credit goes to the VMTN Forums, especially this thread, in which some extremely useful information was exchanged. I would be remiss if I did not adequately credit these sources for the information that helped make this testing successful.

There are actually two different pieces described in this article. The first is NIC teaming, in which we logically bind together multiple physical NICs for increased throughput and increased fault tolerance. The second is VLAN trunking, in which we configure the physical switch to pass VLAN traffic directly to ESX Server, which will then distribute the traffic according to the port groups and VLAN IDs configured on the server. I wrote about ESX and VLAN trunking a long time ago and ran into some issues then; here I’ll describe how to work around the issues I ran into at that time.

So, let’s have a look at these two pieces. We’ll start with NIC teaming.

Configuring NIC Teaming

There’s a bit of confusion regarding NIC teaming in ESX Server and when switch support is required. You can most certainly create NIC teams (or “bonds”) in ESX Server without any switch support whatsoever. Once those NIC teams have been created, you can configure load balancing and failover policies. However, those policies will affect outbound traffic only. In order to control inbound traffic, we have to get the physical switches involved. This article is written from the perspective of using Cisco Catalyst IOS-based physical switches. (In my testing I used a Catalyst 3560.)

To create a NIC team that will work for both inbound and outbound traffic, we’ll create a port channel using the following commands:

s3(config)#int port-channel1
s3(config-if)#description NIC team for ESX server
s3(config-if)#int gi0/23
s3(config-if)#channel-group 1 mode on
s3(config-if)#int gi0/24
s3(config-if)#channel-group 1 mode on

This creates port-channel1 (you’d need to change this name if you already have port-channel1 defined, perhaps for switch-to-switch trunk aggregation) and assigns GigabitEthernet0/23 and GigabitEthernet0/24 to the team. Now, however, you need to ensure that the switch and ESX Server use matching load balancing mechanisms. To find out the switch’s current load balancing mechanism, use this command in enable mode:

show etherchannel load-balance

This will report the current load balancing algorithm in use by the switch. On my Catalyst 3560 running IOS 12.2(25), the default load balancing algorithm was set to “Source MAC Address”. On my ESX Server 3.0.1 server, the default load balancing mechanism was set to “Route based on the originating virtual port ID”. The result? The NIC team didn’t work at all—I couldn’t ping any of the VMs on the host, and the VMs couldn’t reach the rest of the physical network. It wasn’t until I matched up the switch/server load balancing algorithms that things started working.

To set the switch load-balancing algorithm, use one of the following commands in global configuration mode:

port-channel load-balance src-dst-ip (to enable IP-based load balancing)
port-channel load-balance src-mac (to enable MAC-based load balancing)

There are other options available, but these are the two that seem to match most closely to the ESX Server options. I was unable to make this work at all without switching the configuration to “src-dst-ip” on the switch side and “Route based on ip hash” on the ESX Server side. From what I’ve been able to gather, the “src-dst-ip” option gives you better utilization across the members of the NIC team than some of the other options. (Anyone care to contribute a URL that provides some definitive information on that statement?)
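
Putting the commands together, a session to change the algorithm and confirm the result might look like the following. This is only a sketch, assuming the same switch and port channel as above; output is omitted because it varies by platform and IOS version.

s3(config)#port-channel load-balance src-dst-ip
s3(config)#end
s3#show etherchannel load-balance
s3#show etherchannel summary

The last command lists each port channel with its member ports and their state flags, which is a quick way to check that both ports actually joined the bundle.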

Creating the NIC team on the ESX Server side is as simple as adding physical NICs to the vSwitch and setting the load balancing policy appropriately. At this point, the NIC team should be working.

Configuring VLAN Trunking

In my testing, I set up the NIC team and the VLAN trunk at the same time. When I ran into connectivity issues as a result of the mismatched load balancing policies, I thought they were VLAN-related issues, so I spent a fair amount of time troubleshooting the VLAN side of things. It turns out, of course, that it wasn’t the VLAN configuration at all. (In addition, one of the VMs that I was testing had some issues as well, and that contributed to my initial difficulties.)

To configure the VLAN trunking, use the following commands on the physical switch:

s3(config)#int port-channel1
s3(config-if)#switchport trunk encapsulation dot1q
s3(config-if)#switchport trunk allowed vlan all
s3(config-if)#switchport mode trunk
s3(config-if)#switchport trunk native vlan 4094

This configures the NIC team (port-channel1, as created earlier) as an 802.1q VLAN trunk. You then need to repeat this process for the member ports in the NIC team:

s3(config)#int gi0/23
s3(config-if)#switchport trunk encapsulation dot1q
s3(config-if)#switchport trunk allowed vlan all
s3(config-if)#switchport mode trunk
s3(config-if)#switchport trunk native vlan 4094
s3(config-if)#int gi0/24
s3(config-if)#switchport trunk encapsulation dot1q
s3(config-if)#switchport trunk allowed vlan all
s3(config-if)#switchport mode trunk
s3(config-if)#switchport trunk native vlan 4094
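
As a shorter alternative, IOS lets you apply the same commands to both member ports at once with an interface range. A sketch, assuming the two ports are numbered consecutively as they are here:

s3(config)#int range gi0/23 - 24
s3(config-if-range)#switchport trunk encapsulation dot1q
s3(config-if-range)#switchport trunk allowed vlan all
s3(config-if-range)#switchport mode trunk
s3(config-if-range)#switchport trunk native vlan 4094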

If you haven’t already created VLAN 4094, you’ll need to do that as well:

s3(config)#int vlan 4094
s3(config-if)#no ip address
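
Note that “int vlan 4094” defines a Layer 3 VLAN interface; the Layer 2 VLAN itself is normally defined with the “vlan” command in global configuration mode. A minimal sketch, assuming the switch’s VTP mode permits extended-range VLAN IDs such as 4094 (the name “unused-native” is just an illustrative label):

s3(config)#vlan 4094
s3(config-vlan)#name unused-native
s3(config-vlan)#end
s3#show vlan id 4094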

The “switchport trunk native vlan 4094” command is what fixes the problem I had last time I worked with ESX Server and VLAN trunks; namely, that most switches don’t tag traffic from the native VLAN across a VLAN trunk. By setting the native VLAN for the trunk to an otherwise unused VLAN instead of VLAN 1 (the default native VLAN), we essentially force the switch to tag traffic for every VLAN actually in use, including VLAN 1, across the trunk. This allows ESX Server to handle VMs that are assigned to VLAN 1 as well as other VLANs.
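
To confirm the trunk settings, the following show commands are one way to check. This is a sketch, assuming the port channel and VLAN numbers used above; output is omitted because it varies by IOS version.

s3#show interfaces trunk
s3#show interfaces port-channel 1 switchport

The first lists each trunking port with its encapsulation, native VLAN, and allowed VLANs; the second shows the operational mode and the trunking native VLAN for the port channel itself.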

On the ESX Server side, we just need to edit the vSwitch and create a new port group. In the port group, specify the VLAN ID that matches the VLAN ID from the physical switch. After the new port group has been assigned, you can place your VMs on that new port group (VLAN) and—assuming you have a router somewhere to route between the VLANs—you should have full connectivity to your newly segregated virtual machines.

Final Notes

I did encounter a couple of weird things during the setup of this configuration (I plan to leave the configuration in place for a while to uncover any other problems).

  • First, during troubleshooting, I deleted a port group on one vSwitch and then re-created it on another vSwitch. However, the virtual machine didn’t recognize the connection. There was no indication inside the VM that the connection wasn’t live; it just didn’t work. It wasn’t until I edited the VM, set the virtual NIC to a different port group, and then set it back again that it started working as expected. Lesson learned: don’t delete port groups.
  • Second, after creating a port group on a vSwitch with no VLAN ID, one of the other port groups on the same vSwitch appeared to “lose” its VLAN ID, at least as far as VirtualCenter was concerned. In other words, the VLAN ID was listed as “*” in VirtualCenter, even though a VLAN ID was indeed configured for that port group. The “esxcfg-vswitch -l” command (that’s a lowercase L) on the host still showed the assigned VLAN ID for that port group, however.
  • It was also the “esxcfg-vswitch” command that helped me troubleshoot the problem with the deleted/recreated port group described above. Even after recreating the port group, esxcfg-vswitch still showed 0 used ports for that port group on that vSwitch, which told me that the virtual machine’s network connection was still somehow askew.

Hopefully this information will prove useful to those of you out there trying to set up NIC teaming and/or VLAN trunking in your environment. I would recommend taking this one step at a time, not all at once like I did; this will make it easier to troubleshoot problems as you progress through the configuration.


Assorted Links

I have a variety of links and articles, mostly security related, that aren’t really substantial enough for a full-blown entry, but I wanted to mention them anyway.

UPDATE:  Apparently, the Wi-Fi hijacking of an Apple MacBook was indeed demonstrated yesterday; see this updated article.
