Xensploit, as it’s called, is the recently demonstrated exploit that allows virtual machines (VMs) that are “in flight” during a live migration (XenMotion in Citrix XenServer, VMotion in VMware ESX Server) to be manipulated. If you haven’t yet read the PDF that describes Xensploit, I highly encourage that you take a look at it. It’s very enlightening as to exactly what can be done to an in-flight VM.
Naturally, the best way to protect against this particular problem is to guard the integrity of the live migration network. For simplicity’s sake, I’ll refer to this as the VMotion network from this point on, but keep in mind that it is equally applicable to any network connections on any virtualization platform that uses live migration.
The most surefire way to protect the VMotion network is to place it on its own dedicated, physically separate network, using separate physical NICs plugged into separate physical switches that do not possess any connections to production networks. This will ensure that unauthorized access to the VMotion network is prevented, but comes with disadvantages as well: this configuration requires more physical NICs and more physical switches than other configurations.
In implementations with limited numbers of physical NICs, however, this isn’t really an option. In these cases, the use of Layer 2 VLANs and multiple port groups on a single ESX Server vSwitch to allow VMotion traffic to share the same physical NICs and the same physical switches as other traffic is a very common solution. In fact, it’s a solution that I’ve recommended many, many times. But does this configuration provide enough protection for the VMotion network?
The real question is, does a simple Layer 2 VLAN offer enough protection? That question, in turn, spawns other questions: what kinds of attacks are there against Layer 2 VLANs? Is it possible for traffic to hop across VLAN boundaries?
Armed with those questions, I set out to do some research. You can see some links in my del.icio.us bookmarks that pertain to the research I did. Basically, I found that Layer 2 VLAN attacks boil down to two basic types:
- The first type involves a malicious host pretending to be a switch and forming a 802.1Q VLAN trunk with the real switch, which then passes traffic from all VLANs across to the malicious host.
- The second type involves double-tagged 802.1Q frames and the native VLAN, whereby traffic can, under specific circumstances, hop from one VLAN to another without any Layer 3 routing involved.
(Network and security gurus out there feel free to elaborate or correct me on this information.)
The best way to address attack vector #1 is to explicitly disable automatic trunk negotiation on all ports that don’t need to be trunks. From my research and my (relatively) limited knowledge of Cisco IOS, this command should do it:
switchport mode access
This explicitly forces the switchport into a state where it will not negotiate an 802.1Q VLAN trunk with another device, hence killing attack vector #1 dead in its tracks. Again, this should only be done on the ports that are not connected to the ESX Servers; otherwise, you’re shooting yourself in the foot. Keep in mind that uplinks to other switches, ports going out to IP phones, etc., may also need to be configured as VLAN trunks. Really, the issue is about controlling the creation of unauthorized VLAN trunks in order to control VLAN leakage. One of the CCIEs at the office mentioned a “switchport noneg” command, but I’m not familiar with that one; anyone have more details?
Addressing attack vector #2 is also relatively straightforward. Since the VLAN hopping exploit takes advantage of the nature of the native VLAN (which I’ve discussed before here), setting the native VLAN on the trunk ports connecting to the ESX Servers to a VLAN that is not used anywhere else in the network will prevent VLAN hopping. For example:
switchport trunk native vlan 4094
From my NIC teaming and VLAN trunking article (one of the most popular articles on the site, by the way), you’ll see that I recommended at that time the creation of a VLAN that is used only as the native VLAN for 802.1Q trunks. At that time, I didn’t fully understand why; now, after additional research, I understand why I needed to do that and I also recognize that the suggested configuration provides a layer of protection against VLAN hopping attacks.
To summarize:
- To protect against malicious hosts forming unauthorized 802.1Q trunks, disable automatic trunk negotiation and explicitly/manually create trunks. The key here is to ensure that VLANs don’t inadvertently “leak” beyond where they should (also see note below about specifying the allowed VLANs).
- To protect against VLAN hopping, create a VLAN that is used only as the native VLAN on the 802.1Q trunks connecting to the ESX Servers. This VLAN must not be used anywhere else in the LAN. Set this VLAN as the native VLAN on the trunks into the ESX Servers.
In addition, my networking mentors also recommended the “switchport trunk allowed vlan” command to specify which VLANs are allowed to cross 802.1Q VLAN trunks. This will help ensure that VMotion traffic is limited to only those switches that absolutely must carry it; again, we’re seeking to control VLAN leakage.
With these configurations in place, using a Layer 2 VLAN to carry VMotion traffic on the same physical NICs and same physical switches as other types of traffic is fairly well-protected against malicious interference. While it is not as secure as a physical separate, dedicated network, it is secure enough for most organizations and the reduction in infrastructure needs generally outweighs the risks.
More information and discussion about Xensploit—and protecting against Xensplot—at the following links:
Keeping Your VMotion Traffic Secure
Two vulnerabilities found in VMware virtualization products
‘Live’ VMs at Risk While in Transit
News Flash: If You Don’t Follow Suggested Security Hardening Guidelines, Bad Things Can Happen…
UPDATE: I’ve updated the wording above to more properly reflect the goal behind the use of the “switchport mode access” command, and when it should be used. Colin, thanks for the feedback and clarification!
Tags: ESX, Networking, Security, Virtualization, VLAN, VMotion, VMware
-
[The most surefire way to protect the VMotion network is to place it on its own dedicated, physically separate network, using separate physical NICs plugged into separate physical switches that do not possess any connections to production networks.]
In Vmware two-node clusters I recommend keeping the vmkernel/vmotion network completely off of a physical switch whether on/off of production nets. Rather use cross-over cables between nodes, which will completely isolate the packet flows.
-
Hey, Scott.
Couldn’t you also setup an empty native vlan, that way you don’t have to worry about vlan hopping, ever?
like this:
no switchport native vlan
switchport mode acces
switchport access vlan ###I’m by no means a Cisco IOS expert, but that’s what’s in our build doc for isolating particular switchports in an IBM BladeCenter.
-
Great post, Scott!
It highlights how a seemingly simple mechanism like VLANs to provide isolation for your migration traffic is not always so simple, which can result in people not following the best practices for security due to factors like configuration and deployment complexity. Guides like this will certainly help alleviate that problem.
Regards,
Jon Oberheide -
Hi,
So, I’m a little confused over your advice on whether to set the Native VLAN to that of the Service Console, as described here:http://blog.scottlowe.org/2007/11/13/esx-server-and-the-native-vlan/
Doesn’t that conflict with your advice in this column about setting the Native VLAN to one that is not used anywhere else in the network?
I’m trying to figure out how to deploy ESX via the network without exposing the network to additional vulnerabilities. Any guidance would be appreciated.
Thanks,
-
Scott,
Thanks for the reply. Unfortunately (hah, never had to say that in this context), security is one of the customer’s greatest concerns. Perhaps eventually the Linux install kernel will figure out how to support VLANs.Our other alternative is to add uplinks to the blade chassis and use the HP Virtual Connect profiles to assign the NICs to the access port uplinks during provisioning and move them to the trunked uplinks afterwards. Annoying for operations, but at least the management can all be accomplished remotely.
Cheers,



8 comments
Comments feed for this article
Trackback link: http://blog.scottlowe.org/2008/03/05/vmotion-and-vlan-security/trackback/