blog.scottlowe.org

The weblog of an IT pro specializing in virtualization, storage, and servers

Archive for Articles Tagged VLAN

VLANs on ESX with Nortel Switches

June 24th, 2008 by slowe

I recently ran into an issue with a customer who was having problems getting VLANs to work as expected with ESX. The basic scenario was that ESX would refuse to work properly with a VLAN that was marked as the native (or untagged) VLAN. This was causing no end of grief for this customer.

I’ve discussed VLANs extensively—first with this blog post, then again here, and again in this SearchVMware.com tip—so I was confident that I could help the customer resolve this issue. Granted, the customer was using Nortel switches, with which I am completely unfamiliar, but a switch is a switch, right?

Not quite. While the configuration seemed correct in all ways, it turns out there is a checkbox somewhere labeled “untag-default-vlan”. If this box is not checked, then the default VLAN gets tagged. Since the ESX port group wasn’t configured with a VLAN tag, it didn’t see the network traffic. Once that box is checked, the default (or native) VLAN doesn’t get tagged and is properly recognized by an ESX port group without a VLAN tag configured.

So, if you’re using Nortel switches and having problems with VLANs, double-check this setting.

Category: Networking, Virtualization | 3 Comments »

Configuration for Protecting VMotion

March 7th, 2008 by slowe

As a follow-up to my earlier post about using VLANs with VMotion, I wanted to also share a brief configuration snippet (based on a Cisco IOS-based switch) that aligns with the recommendations in that article. This is nothing new to those who are familiar with IOS, but for those readers who are not it may provide some helpful information.

For the purposes of this configuration, I’m making the following assumptions:

  • VLANs 100 and 200 are the VMotion VLAN and some other production VLAN, respectively (perhaps a VLAN carrying virtual machine traffic, or a VLAN carrying Service Console traffic)
  • VLAN 4094 is the “ESX Trunk” VLAN, which isn’t used anywhere else in the network for any purpose

Here’s the recommended port configuration for ports connecting to an ESX Server:

interface g0/1
switchport trunk encapsulation dot1q
switchport trunk native vlan 4094
switchport mode trunk
switchport trunk allowed vlan 100,200,4094
switchport nonegotiate
spanning-tree portfast trunk

Technically, the “switchport nonegotiate” command (often abbreviated “switchport noneg”) won’t change much here; it stops the switch from sending DTP (Dynamic Trunking Protocol) frames on the port, but DTP isn’t supported by ESX Server anyway.

A couple of notes about this configuration:

  • Refer back to my article on the native VLAN with ESX Server; by setting the native VLAN to 4094 (the “ESX Trunk” VLAN), no production VLAN is delivered to the ESX Server untagged. If this configuration is used on a vSwitch that also carries Service Console traffic, it could break automated build scripts, which rely on untagged traffic during installation.
  • If you needed to combine this configuration with EtherChannel, refer back to my original article on ESX Server, NIC teaming, and VLAN trunking.

Keep in mind the other recommendations as well: explicitly control trunking to other devices and explicitly control the transmission of VLANs across trunks to control “VLAN leakage”.
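
As a rough illustration of those recommendations, here is a minimal Python sketch that checks a port's configuration lines against them. The function name audit_trunk_port and the wording of its findings are hypothetical, made up for this example; the sketch only does simple string matching on the commands shown above and is not a substitute for reviewing the running configuration on a real switch.

```python
from typing import Iterable, List, Set


def audit_trunk_port(config_lines: Iterable[str], production_vlans: Set[int]) -> List[str]:
    """Return a list of findings for one ESX-facing trunk port (hypothetical helper)."""
    lines = [line.strip() for line in config_lines]
    findings = []

    if "switchport mode trunk" not in lines:
        findings.append("port is not explicitly set to trunk mode")

    # IOS accepts unambiguous abbreviations, so match on the common prefix.
    if not any(line.startswith("switchport noneg") for line in lines):
        findings.append("DTP negotiation is not disabled")

    native = None
    allowed = None
    for line in lines:
        if line.startswith("switchport trunk native vlan "):
            native = int(line.rsplit(" ", 1)[1])
        elif line.startswith("switchport trunk allowed vlan "):
            allowed = line.rsplit(" ", 1)[1]

    if native is None or native == 1:
        findings.append("native VLAN is left at the default")
    elif native in production_vlans:
        findings.append("native VLAN also carries production traffic")

    if allowed is None or allowed == "all":
        findings.append("allowed VLAN list is not restricted")

    return findings


recommended = [
    "interface g0/1",
    "switchport trunk encapsulation dot1q",
    "switchport trunk native vlan 4094",
    "switchport mode trunk",
    "switchport trunk allowed vlan 100,200,4094",
    "switchport noneg",
]
loose = [
    "interface g0/1",
    "switchport trunk encapsulation dot1q",
    "switchport trunk allowed vlan all",
    "switchport mode trunk",
]

print(audit_trunk_port(recommended, {100, 200}))
print(audit_trunk_port(loose, {100, 200}))
```

The first call checks the configuration from this post; the second checks a looser port that allows all VLANs and never sets a native VLAN.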

CCIEs and other experts, I’d welcome any other suggestions as well as recommended commands to use in the switch configuration to help maximize security and minimize exposure to VLAN-based attacks.

Category: Security, Networking, Virtualization | No Comments »

VMotion and VLAN Security

March 5th, 2008 by slowe

Xensploit, as it’s called, is the recently demonstrated exploit that allows virtual machines (VMs) that are “in flight” during a live migration (XenMotion in Citrix XenServer, VMotion in VMware ESX Server) to be manipulated. If you haven’t yet read the PDF that describes Xensploit, I highly encourage you to take a look at it. It’s very enlightening as to exactly what can be done to an in-flight VM.

Naturally, the best way to protect against this particular problem is to guard the integrity of the live migration network. For simplicity’s sake, I’ll refer to this as the VMotion network from this point on, but keep in mind that it is equally applicable to any network connections on any virtualization platform that uses live migration.

The most surefire way to protect the VMotion network is to place it on its own dedicated, physically separate network, using separate physical NICs plugged into separate physical switches that do not possess any connections to production networks. This will ensure that unauthorized access to the VMotion network is prevented, but comes with disadvantages as well: this configuration requires more physical NICs and more physical switches than other configurations.

In implementations with limited numbers of physical NICs, however, this isn’t really an option. In these cases, the use of Layer 2 VLANs and multiple port groups on a single ESX Server vSwitch to allow VMotion traffic to share the same physical NICs and the same physical switches as other traffic is a very common solution. In fact, it’s a solution that I’ve recommended many, many times. But does this configuration provide enough protection for the VMotion network?

The real question is, does a simple Layer 2 VLAN offer enough protection? That question, in turn, spawns other questions: what kinds of attacks are there against Layer 2 VLANs? Is it possible for traffic to hop across VLAN boundaries?

Armed with those questions, I set out to do some research. You can see some links in my del.icio.us bookmarks that pertain to the research I did. Basically, I found that Layer 2 VLAN attacks boil down to two basic types:

  1. The first type involves a malicious host pretending to be a switch and forming an 802.1Q VLAN trunk with the real switch, which then passes traffic from all VLANs across to the malicious host.
  2. The second type involves double-tagged 802.1Q frames and the native VLAN, whereby traffic can, under specific circumstances, hop from one VLAN to another without any Layer 3 routing involved.

(Network and security gurus out there feel free to elaborate or correct me on this information.)

The best way to address attack vector #1 is to explicitly disable automatic trunk negotiation on all ports that don’t need to be trunks. From my research and my (relatively) limited knowledge of Cisco IOS, this command should do it:

switchport mode access

This explicitly forces the switchport into a state where it will not negotiate an 802.1Q VLAN trunk with another device, hence killing attack vector #1 dead in its tracks. Again, this should only be done on the ports that are not connected to the ESX Servers; otherwise, you’re shooting yourself in the foot. Keep in mind that uplinks to other switches, ports going out to IP phones, etc., may also need to be configured as VLAN trunks. Really, the issue is about controlling the creation of unauthorized VLAN trunks in order to control VLAN leakage. One of the CCIEs at the office mentioned a “switchport noneg” command, but I’m not familiar with that one; anyone have more details?

Addressing attack vector #2 is also relatively straightforward. Since the VLAN hopping exploit takes advantage of the nature of the native VLAN (which I’ve discussed before here), setting the native VLAN on the trunk ports connecting to the ESX Servers to a VLAN that is not used anywhere else in the network will prevent VLAN hopping. For example:

switchport trunk native vlan 4094
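
To make the double-tagging idea a bit more concrete, here is a deliberately simplified Python model of it. The function name and the tag-stack representation are assumptions made for this sketch, and real switches differ in how they treat tagged frames arriving on access ports, so treat it as an illustration of the principle rather than a description of any particular device.

```python
from typing import List, Optional


def vlan_seen_by_next_switch(tags: List[int], access_vlan: int, native_vlan: int) -> Optional[int]:
    """Return the VLAN the second switch places the frame in, or None if not modeled.

    tags        -- 802.1Q tag stack sent by the attacker, outermost tag first
    access_vlan -- VLAN of the access port the attacker is plugged into
    native_vlan -- native VLAN of the trunk between the two switches
    """
    # Only model the case where the outer tag matches the attacker's own VLAN.
    if not tags or tags[0] != access_vlan:
        return None

    # The first switch classifies the frame into access_vlan and removes the outer tag.
    remaining = tags[1:]

    if access_vlan == native_vlan:
        # Native VLAN frames cross the trunk without a tag, exposing the inner tag.
        on_wire = remaining
    else:
        # Any other VLAN is tagged on the trunk, so the inner tag stays buried.
        on_wire = [access_vlan] + remaining

    # The second switch reads only the outermost tag; untagged means native VLAN.
    return on_wire[0] if on_wire else native_vlan


# Attacker sits in VLAN 1 and targets VLAN 100 (the VMotion VLAN in this example).
print(vlan_seen_by_next_switch([1, 100], access_vlan=1, native_vlan=1))
print(vlan_seen_by_next_switch([1, 100], access_vlan=1, native_vlan=4094))
```

In this model the frame only lands in VLAN 100 when the attacker's VLAN is also the trunk's native VLAN; with an unused native VLAN such as 4094, the frame stays in VLAN 1.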

From my NIC teaming and VLAN trunking article (one of the most popular articles on the site, by the way), you’ll see that I recommended at that time the creation of a VLAN that is used only as the native VLAN for 802.1Q trunks. At that time, I didn’t fully understand why; now, after additional research, I understand why I needed to do that and I also recognize that the suggested configuration provides a layer of protection against VLAN hopping attacks.

To summarize:

  • To protect against malicious hosts forming unauthorized 802.1Q trunks, disable automatic trunk negotiation and explicitly/manually create trunks. The key here is to ensure that VLANs don’t inadvertently “leak” beyond where they should (also see note below about specifying the allowed VLANs).
  • To protect against VLAN hopping, create a VLAN that is used only as the native VLAN on the 802.1Q trunks connecting to the ESX Servers. This VLAN must not be used anywhere else in the LAN. Set this VLAN as the native VLAN on the trunks into the ESX Servers.

In addition, my networking mentors also recommended the “switchport trunk allowed vlan” command to specify which VLANs are allowed to cross 802.1Q VLAN trunks. This will help ensure that VMotion traffic is limited to only those switches that absolutely must carry it; again, we’re seeking to control VLAN leakage.

With these configurations in place, a Layer 2 VLAN that carries VMotion traffic on the same physical NICs and the same physical switches as other types of traffic is fairly well protected against malicious interference. While it is not as secure as a physically separate, dedicated network, it is secure enough for most organizations, and the reduction in infrastructure needs generally outweighs the risks.

More information and discussion about Xensploit, and protecting against it, at the following links:

Keeping Your VMotion Traffic Secure
Two vulnerabilities found in VMware virtualization products
‘Live’ VMs at Risk While in Transit
News Flash: If You Don’t Follow Suggested Security Hardening Guidelines, Bad Things Can Happen…

UPDATE: I’ve updated the wording above to more properly reflect the goal behind the use of the “switchport mode access” command, and when it should be used. Colin, thanks for the feedback and clarification!

Category: Security, Networking, Virtualization | 8 Comments »

HP VirtualConnect Clarification

January 30th, 2008 by slowe

SearchVMware.com recently published an article of mine about using HP VirtualConnect with ESX Server and the impact it has on configuring ESX Server networking.  Based on the comments to my original post about the article, it appears that I didn’t adequately explain the interaction between VirtualConnect’s Standard Ethernet Networks and VLAN tagging.  I’d like to thank those readers that asked about this issue and take a moment to clarify.

Standard Ethernet Networks are defined in the HP VirtualConnect Manager and allow you to control the mapping between physical NIC ports on the blades (the “downstream ports”) and the connections from the VirtualConnect switches to the external network infrastructure (the “upstream ports”).  For example, I could define a Standard Ethernet Network called “Production” and specify that Production would uplink either via Slot 2 Port 7 or Slot 2 Port 8.

That part is pretty straightforward.  Here’s how it plays into VLAN tagging with ESX Server.  Most organizations prefer to use VST (Virtual Switch Tagging; described in more detail in this article).  To use VST, the ports on the network infrastructure must be configured as 802.1Q VLAN trunks so that the external physical switches will pass the VLAN tags to the vSwitches inside ESX Server, where ESX Server can then handle the traffic appropriately.

To use VST with VirtualConnect and Standard Ethernet Networks, there is no difference.  The VirtualConnect upstream ports, as defined in the Standard Ethernet Network configuration, must be plugged into physical switch ports that are configured as 802.1Q VLAN trunks.  When you plug the upstream port into an 802.1Q VLAN trunk, the downstream port—the port going to the ESX Server—automatically becomes an 802.1Q VLAN trunk as well.  The VLAN tags will pass all the way to the ESX Server’s vSwitches, where ESX Server can deal with the traffic appropriately.

Likewise, if the VirtualConnect’s upstream ports are plugged into a static access port—a port which is not configured as an 802.1Q VLAN trunk but instead carries traffic only for a single VLAN—then the downstream ports also become static access ports and you are, effectively, using External Switch Tagging (EST).

I stated this in the original article in the third paragraph in the section titled “How Virtual Connect differs in ESX”:

In either way, these Ethernet networks will “pass through” the 802.1Q status of the physical switch port to which it is uplinked. If the physical switch port to which they are connected is configured as an 802.1Q VLAN trunk, then the downstream ports will act as 802.1Q VLAN trunks. Likewise, if the uplink is connected to a switch port that is configured as a static access port, then the downstream ports will act as static access ports.

Shared Uplink Sets, on the other hand, are different.  They don’t behave in the same way as Standard Ethernet Networks.  With Shared Uplink Sets, you are forced to use EST because the VLAN tags are stripped away at the VirtualConnect level.  The associated networks that are defined in VirtualConnect Manager define the different VLANs, and each downstream port is connected to an associated network.  Unlike with Standard Ethernet Networks, no VLAN tags are passed up to the ESX Server with Shared Uplink Sets.

So, to summarize:

  • Standard Ethernet Network uplink connected to external switch port configured as 802.1Q VLAN trunk allows ESX Server to see VLAN tags and supports both VST and Virtual Guest Tagging (VGT)
  • Standard Ethernet Network uplink connected to external switch port configured as static access port only allows EST configuration
  • Shared Uplink Set only allows EST configuration
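
For those who prefer a lookup over prose, the summary above can be written as a small Python sketch. The function name and the string labels ("standard", "shared_uplink_set", "trunk", "access") are made up for this example and are not VirtualConnect terminology or part of any HP tool.

```python
from typing import Set


def supported_tagging(network_type: str, uplink_port_mode: str) -> Set[str]:
    """Return the ESX VLAN tagging modes available for a VirtualConnect setup."""
    if network_type == "shared_uplink_set":
        # VLAN tags are stripped at the VirtualConnect level, whatever the uplink is.
        return {"EST"}
    if network_type == "standard":
        if uplink_port_mode == "trunk":
            # Tags pass through to the vSwitch (VST) or on to the guest (VGT).
            return {"VST", "VGT"}
        if uplink_port_mode == "access":
            return {"EST"}
    raise ValueError("unknown combination: %s / %s" % (network_type, uplink_port_mode))


print(supported_tagging("standard", "trunk"))
print(supported_tagging("standard", "access"))
print(supported_tagging("shared_uplink_set", "trunk"))
```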

I hope this clarifies the interaction between HP VirtualConnect and ESX Server networking.  I welcome any further questions or clarifications in the comments below.  Thanks!

Category: Networking, Virtualization | 2 Comments »

New VLAN Article at SearchVMware.com

December 7th, 2007 by slowe

SearchVMware.com has published another article of mine, this one on the various VLAN configurations within VMware Infrastructure 3 (VI3), the differences between each of them, and when each configuration may be appropriate.

Here’s the obligatory teaser excerpt from the article:

When VMware gurus talk about the use of virtual LANs (VLANs) with VMware Infrastructure 3 (VI3), they are usually referring to the use of VLAN trunks. There are, however, three other types of VLAN configurations VI3 uses: virtual switch tagging (VST), external switch tagging (EST) and virtual guest tagging (VGT).
 
This tip is your guide to VST, EST and VGT, covering what they are and when to use them.

Read the full article here.

Between this latest VLAN article, an earlier VLAN article published on SearchVMware.com, a VLAN article published here on my site, and the latest discussion of the use of the native VLAN, I’m trying to make sure everyone has the information they need to understand and use VLANs in their VI3 implementation.  If there are other networking-related articles you’d like to see, please shoot me an e-mail and let me know, or post your ideas/suggestions in the comments below.  Thanks!

Category: Networking, Virtualization | 1 Comment »

ESX Server and the Native VLAN

November 13th, 2007 by slowe

It was December 2006 when I first published this article on using NIC teams and VLANs with ESX Server.  As you can see in the “Top Posts” section in the sidebar, that article has since claimed the top position as the most popular post here on this blog.  Note that “most popular” does not translate into “most commented”; that distinction falls to one of the Linux-AD integration articles, although I’m not sure which one right at the moment.

In that previous article, I demonstrated the use of a “dummy VLAN” which was set as the native VLAN for the VLAN trunk, like so:

s3(config)#int gi0/23
s3(config-if)#switchport trunk encapsulation dot1q
s3(config-if)#switchport trunk allowed vlan all
s3(config-if)#switchport mode trunk
s3(config-if)#switchport trunk native vlan 4094

The idea behind the dummy VLAN was this:  because ESX Server needed—or so I thought—all the traffic to be tagged as it came across the VLAN trunk, creating a VLAN that is never used and setting it as the native VLAN solves our problem.  Remember that the native VLAN is the VLAN whose traffic is not tagged as it travels across the trunk into ESX Server or another physical switch.

It turns out that I was actually mistaken—sort of.  It’s true that the native VLAN won’t get tagged, yes; however, it’s not true that ESX Server requires all the traffic to be tagged.  What was missing in my configuration was, quite simply, a port group that was intended to receive untagged traffic.

Configuring ESX Server to support VLANs involves the creation of one or more port groups configured with matching VLAN IDs.  If a port group has a VLAN ID, it will essentially only accept traffic tagged with that VLAN ID.  Traffic not tagged with that VLAN ID, or untagged traffic, will be ignored.  So, if you create a series of port groups on a vSwitch for your various VLANs but neglect to create a port group that does not have a VLAN ID specified, untagged traffic will be ignored because there are no port groups configured to receive untagged traffic.

If, on the other hand, you create a series of port groups for your various VLANs and you create a port group that does not have a VLAN ID specified, then both tagged and untagged traffic will be handled correctly:

  • Tagged traffic with a VLAN ID matching one of the configured port groups will be sent to that port group
  • Tagged traffic with a VLAN ID not matching any configured port group will be ignored
  • Untagged traffic will be directed to the port group that does not have a VLAN ID configured
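
The three rules above can be captured in a few lines of Python. This is only a sketch of the behavior as described in this post, not of how ESX Server is actually implemented; the function name and the port group names are hypothetical, and a VLAN ID of 0 stands for a port group with no VLAN ID specified.

```python
from typing import Dict, Optional


def deliver(frame_vlan: Optional[int], port_groups: Dict[str, int]) -> Optional[str]:
    """Return the port group that receives a frame, or None if the frame is ignored.

    frame_vlan  -- 802.1Q VLAN ID on the incoming frame, or None if untagged
    port_groups -- maps a port group name to its VLAN ID (0 = no VLAN ID set)
    """
    wanted = 0 if frame_vlan is None else frame_vlan
    for name, vlan_id in port_groups.items():
        if vlan_id == wanted:
            return name
    return None


port_groups = {"VMotion": 100, "Production": 200, "Service Console": 0}

print(deliver(100, port_groups))   # tagged, matches a port group
print(deliver(300, port_groups))   # tagged, no matching port group
print(deliver(None, port_groups))  # untagged

# Without a port group that has no VLAN ID, untagged traffic has nowhere to go.
print(deliver(None, {"VMotion": 100, "Production": 200}))
```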

Now, it is true that VMware’s best practices documents (sorry, I don’t have a link for them at the moment) recommend that users avoid the use of the native VLAN, and one of the CCIEs in my office indicated that it is also considered a networking best practice to avoid the use of VLAN 1, the default native VLAN on Cisco equipment, for anything other than switch management traffic.  With those things in mind, the lack of a port group for untagged traffic may not be an issue for many deployments.

Except…

…when using automated scripts to build and install ESX Server.  You see, after ESX Server is installed, then specifying a VLAN ID on the Service Console port group is no big deal and it will work just fine, as I described earlier.  Before ESX Server is installed, though, there is no VLAN support and no way to specify a VLAN ID.  Hence, installations that need to download and install from an FTP server or an NFS mount will fail, because the system won’t have any network connectivity.  (Everyone understands why, right?  If you don’t, go back and read the earlier paragraphs again.)

What’s the fix here?  We come back, full circle, to the idea of the default VLAN and untagged traffic.  While the system won’t accept any tagged traffic during the install process, it will happily accept untagged traffic during the installation.  Therefore, if you set the native VLAN to be the VLAN to which the Service Console should be connected once the installation is complete, then everything should work just fine.

Don’t believe me?  From the “Show Me” state?  Perform this quick test yourself:

  1. On a test ESX Server, configure the Service Console port group with a VLAN ID of 0.  The “esxcfg-vswitch” command is handy for this.
  2. Set the switch port to which the Service Console is physically connected to use a native VLAN different than the VLAN the Service Console was previously using.  A VLAN with DHCP present is ideal, as you’ll see with the next step.
  3. Using the “dhclient” command, try to obtain a DHCP lease.  You should get a DHCP lease for whatever subnet matches the default VLAN.
  4. Repeat steps 2 and 3 and you should see the DHCP lease follow the native VLAN configuration, i.e., whatever VLAN is set to native will be the VLAN that issues a DHCP address to your Service Console.

Hopefully, this helps clear up some of the misunderstanding and confusion around the use of VLANs, VLAN trunks, port groups, and the native—or untagged—VLAN.  Feel free to hit me up in the comments if you have any questions!

Category: Networking, Virtualization | 4 Comments »

My First Articles!

November 2nd, 2007 by slowe

Back at the start of October, about a month ago, I wrote about an exciting new opportunity that had recently opened up for me.  I couldn’t really disclose any details at that time, but now that everything has finally materialized into reality I can share exactly what is happening.

So here’s the opportunity that I was talking about—I’m writing as a contributor for SearchVMware.com, a new TechTarget site focused on VMware virtualization.  I am really excited about this!  I’ve written three articles so far, all of which were published yesterday:

VDI on VMware Virtual Infrastructure: Using the three main components

In some aspects, Virtual Desktop Infrastructure (VDI) takes the best of server-based computing and removes many of the drawbacks. Most people understand that the concept of VDI is using virtualization software, typically VMware Virtual Infrastructure 3 (VI3), to host instances of a desktop operating system instead of a server operating system…

Configuring VLANs in VMware VI3 (Virtual Infrastructure 3)

The key to understanding VI3’s support for VLANs lies with the concept of a “VLAN trunk”. A non-trunk port—also called access port—carries traffic for a single VLAN, but a trunk port carries traffic for multiple VLANs simultaneously…

Authentication in a VMware VI3 Implementation

Many organizations that have implemented VMware Virtual Infrastructure 3 (VI3), including both ESX Server and VirtualCenter, but do not have a firm grasp of how these components handle authentication…

More articles are in the works, so be sure to stay tuned to SearchVMware.com!  And, if you have any suggestions for future articles that should be written, please let me know.  Or, better yet, register at SearchVMware.com and let me know there too!

Category: Virtualization | 2 Comments »

VMworld 2007 Partner Day Sessions

September 10th, 2007 by slowe

I had the opportunity to attend three Partner Day sessions today, all three from the “Advanced In-Depth Technical Track.”

The first was this morning, and it was focused on an in-depth technical review of VMware Consolidated Backup, a topic I’ve written a few articles about before.

The presenter was very knowledgeable; in fact, he was the one running the VCB labs at VMworld (and ran the labs at VMworld last year, which I attended).  I got some great information from the session, and when I have more time I’ll compile that information here.  Some of the key points I took away from the session included information on a command-line interface (CLI) for VMware Converter that allows for automated restores of VCB full VM backups using Converter (I’m really excited about looking into that one); good information on the minimal permissions needed for the user that logs into VirtualCenter and whose login information is hard-coded in config.js (this is a big security concern for many customers); and some RDM compatibility mode issues (RDM in virtual compatibility mode versus physical compatibility mode).  It was a great session; I thoroughly enjoyed it.

Unfortunately, the next session (the first session after lunch) was not nearly as useful.  Although included in the same in-depth track, there was a remarkable lack of technical information.  Fortunately, it was only 45 minutes long.

The final session of the day was on VMware’s VDI and ACE solutions, including a first look at VDM (Virtual Desktop Manager) 2.0, which VMware just publicly announced earlier today.  The presenter, Tommy Walker of VMware, was a great presenter and I enjoyed the presentation.  I’m hoping to be able to catch up with Tommy later this week to conduct some in-depth comparisons of VDM with other brokers, such as Leostream (a broker with which I’ve worked fairly extensively—and which, by the way, just released version 5.0 of their broker).

As soon as I have some additional time, I’ll try to post some additional information about these sessions and some of the in-depth technical details presented.

Category: Virtualization | 2 Comments »

VLAN Interfaces with OpenBSD 4.1

August 31st, 2007 by slowe

I’ve been doing some interoperability testing with VMware ESX Server and VLANs (a separate article on that is in the works), and needed a guest OS that supported VLAN interfaces.  From my previous (but limited) experience with OpenBSD, I suspected that VLAN interfaces were indeed supported, and after setting up a quick VM running OpenBSD 4.1 I found that I was indeed correct.  Not only are they supported, they are incredibly easy to set up and configure.

The command to configure a VLAN interface is simply a variation of the standard ifconfig command (note that I’m using a backslash to denote a line continuation, so that I can wrap lines here for readability):

ifconfig <VLAN interface name> vlan <VLAN ID> \
vlandev <physical network device>

So, by example, the command I used to create a VLAN interface for VLAN ID 3 looked like this:

ifconfig vlan3 vlan 3 vlandev pcn0

I did find that I couldn’t name the VLAN interface (“vlan3”, in this case) anything other than vlanX, where X was a number.  I don’t know if this is an OpenBSD limitation, or just an error on my part.  The latter is certainly a distinct possibility.

Once the VLAN interface is created, I just followed the standard OpenBSD way of provisioning an interface: create /etc/hostname.ifname (where ifname is the name of the VLAN interface) for each VLAN interface and that should be that.

The ESX Server configuration to support these VLAN interfaces at the guest level was pretty easy, too.  I just had to create a port group with a VLAN ID of 4095 and attach the OpenBSD guest to that port group.  ESX Server automatically passed the VLAN tags up to the guest and everything worked as expected.  (Again, I’ll have a separate article on that published soon.)

Next, perhaps I’ll try this with Linux or Solaris…

Category: Networking, Unix | 1 Comment »

ESX Server, NIC Teaming, and VLAN Trunking

December 4th, 2006 by slowe

Before we get into the details, allow me to give credit where credit is due.  First, thanks to Dan Parsons of IT Obsession for an article that jump-started the process with notes on the Cisco IOS configuration.  Next, credit goes to the VMTN Forums, especially this thread, in which some extremely useful information was exchanged.  I would be remiss if I did not adequately credit these sources for the information that helped make this testing successful.

There are actually two different pieces described in this article.  The first is NIC teaming, in which we logically bind together multiple physical NICs for increased throughput and increased fault tolerance.  The second is VLAN trunking, in which we configure the physical switch to pass VLAN traffic directly to ESX Server, which will then distribute the traffic according to the port groups and VLAN IDs configured on the server.  I wrote about ESX and VLAN trunking a long time ago and ran into some issues then; here I’ll describe how to work around the issues I ran into at that time.

So, let’s have a look at these two pieces.  We’ll start with NIC teaming.

Configuring NIC Teaming

There’s a bit of confusion regarding NIC teaming in ESX Server and when switch support is required.  You can most certainly create NIC teams (or “bonds”) in ESX Server without any switch support whatsoever.  Once those NIC teams have been created, you can configure load balancing and failover policies.  However, those policies will affect outbound traffic only.  In order to control inbound traffic, we have to get the physical switches involved.  This article is written from the perspective of using Cisco Catalyst IOS-based physical switches.  (In my testing I used a Catalyst 3560.)

To create a NIC team that will work for both inbound and outbound traffic, we’ll create a port channel using the following commands:

s3(config)#int port-channel1
s3(config-if)#description NIC team for ESX server
s3(config-if)#int gi0/23
s3(config-if)#channel-group 1 mode on
s3(config-if)#int gi0/24
s3(config-if)#channel-group 1 mode on

This creates port-channel1 (you’d need to change this name if you already have port-channel1 defined, perhaps for switch-to-switch trunk aggregation) and assigns GigabitEthernet0/23 and GigabitEthernet0/24 to the team.  Now, however, you need to ensure that the load balancing mechanisms used by the switch and by ESX Server match.  To find out the switch’s current load balancing mechanism, use this command in enable mode:

show etherchannel load-balance

This will report the current load balancing algorithm in use by the switch.  On my Catalyst 3560 running IOS 12.2(25), the default load balancing algorithm was set to “Source MAC Address”.  On my ESX Server 3.0.1 server, the default load balancing mechanism was set to “Route based on the originating virtual port ID”.  The result?  The NIC team didn’t work at all—I couldn’t ping any of the VMs on the host, and the VMs couldn’t reach the rest of the physical network.  It wasn’t until I matched up the switch/server load balancing algorithms that things started working.

To set the switch load-balancing algorithm, use one of the following commands in global configuration mode:

port-channel load-balance src-dst-ip (to enable IP-based load balancing)
port-channel load-balance src-mac (to enable MAC-based load balancing)

There are other options available, but these are the two that seem to match most closely to the ESX Server options.  I was unable to make this work at all without switching the configuration to “src-dst-ip” on the switch side and “Route based on ip hash” on the ESX Server side.  From what I’ve been able to gather, the “src-dst-ip” option gives you better utilization across the members of the NIC team than some of the other options.  (Anyone care to contribute a URL that provides some definitive information on that statement?)
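
To show why an IP-based policy tends to spread traffic across the team, here is a small Python sketch of the general idea: combine the source and destination addresses and use the result to pick a link. The XOR-and-remainder formula and the function name are assumptions chosen for illustration; this is not claimed to be the exact algorithm used by either the Catalyst switch or ESX Server.

```python
import ipaddress


def pick_uplink(src_ip: str, dst_ip: str, uplink_count: int) -> int:
    """Pick an uplink index from a source/destination IP pair (illustrative only)."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    # XOR is symmetric, so both directions of a conversation map to the same index.
    return (src ^ dst) % uplink_count


# One VM talking to two different peers over a two-NIC team.
print(pick_uplink("10.0.0.10", "10.0.0.20", 2))
print(pick_uplink("10.0.0.10", "10.0.0.21", 2))
```

A given pair of addresses always lands on the same link, while conversations with different peers can land on different links. A policy keyed only on the source MAC address would pin all of a VM's traffic to a single link.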

Creating the NIC team on the ESX Server side is as simple as adding physical NICs to the vSwitch and setting the load balancing policy appropriately.  At this point, the NIC team should be working.

Configuring VLAN Trunking

In my testing, I set up the NIC team and the VLAN trunk at the same time.  When I ran into connectivity issues as a result of the mismatched load balancing policies, I thought they were VLAN-related issues, so I spent a fair amount of time troubleshooting the VLAN side of things.  It turns out, of course, that it wasn’t the VLAN configuration at all.  (In addition, one of the VMs that I was testing had some issues as well, and that contributed to my initial difficulties.)

To configure the VLAN trunking, use the following commands on the physical switch:

s3(config)#int port-channel1
s3(config-if)#switchport trunk encapsulation dot1q
s3(config-if)#switchport trunk allowed vlan all
s3(config-if)#switchport mode trunk
s3(config-if)#switchport trunk native vlan 4094

This configures the NIC team (port-channel1, as created earlier) as an 802.1Q VLAN trunk.  You then need to repeat this process for the member ports in the NIC team:

s3(config)#int gi0/23
s3(config-if)#switchport trunk encapsulation dot1q
s3(config-if)#switchport trunk allowed vlan all
s3(config-if)#switchport mode trunk
s3(config-if)#switchport trunk native vlan 4094
s3(config-if)#int gi0/24
s3(config-if)#switchport trunk encapsulation dot1q
s3(config-if)#switchport trunk allowed vlan all
s3(config-if)#switchport mode trunk
s3(config-if)#switchport trunk native vlan 4094

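If you haven’t already created VLAN 4094, you’ll need to do that as well.  Note that “int vlan 4094” only creates a Layer 3 VLAN interface; the VLAN itself is created with the “vlan” command in global configuration mode.  (VLAN 4094 is an extended-range VLAN, so depending on the platform and VTP version the switch may need to be in VTP transparent mode first.)

s3(config)#vlan 4094
s3(config-vlan)#exit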

The “switchport trunk native vlan 4094” command is what fixes the problem I had last time I worked with ESX Server and VLAN trunks; namely, that most switches don’t tag traffic from the native VLAN across a VLAN trunk.  By setting the native VLAN for the trunk to an otherwise unused VLAN instead of VLAN 1 (the default native VLAN), we essentially force the switch to tag the traffic for every VLAN that is actually in use.  This allows ESX Server to handle VMs that are assigned to the former native VLAN as well as other VLANs.

On the ESX Server side, we just need to edit the vSwitch and create a new port group.  In the port group, specify the VLAN ID that matches the VLAN ID from the physical switch.  After the new port group has been created, you can place your VMs on that new port group (VLAN) and, assuming you have a router somewhere to route between the VLANs, you should have full connectivity to your newly segregated virtual machines.

Final Notes

I did encounter a couple of weird things during the setup of this configuration (I plan to leave the configuration in place for a while to uncover any other problems).

  • First, during troubleshooting, I deleted a port group on one vSwitch and then re-created it on another vSwitch.  However, the virtual machine didn’t recognize the connection.  There was no indication inside the VM that the connection wasn’t live; it just didn’t work.  It wasn’t until I edited the VM, set the virtual NIC to a different port group, and then set it back again that it started working as expected.  Lesson learned: don’t delete port groups.
  • Second, after creating a port group on a vSwitch with no VLAN ID, one of the other port groups on the same vSwitch appeared to “lose” its VLAN ID, at least as far as VirtualCenter was concerned.  In other words, the VLAN ID was listed as “*” in VirtualCenter, even though a VLAN ID was indeed configured for that port group.  The “esxcfg-vswitch -l” command (that’s a lowercase L) on the host still showed the assigned VLAN ID for that port group, however.
  • It was also the “esxcfg-vswitch” command that helped me troubleshoot the problem with the deleted/recreated port group described above.  Even after recreating the port group, esxcfg-vswitch still showed 0 used ports for that port group on that vSwitch, which told me that the virtual machine’s network connection was still somehow askew.

Hopefully this information will prove useful to those of you out there trying to set up NIC teaming and/or VLAN trunking in your environment.  I would recommend taking this one step at a time, not all at once like I did; this will make it easier to troubleshoot problems as you progress through the configuration.

Category: Networking, Interoperability, Virtualization | 34 Comments »