More on Hyper-V and NIC Teaming

My original article on Hyper-V’s issues with NIC teaming has gotten a fair amount of attention.

First, Keith Ward over at Virtualization Review blogged about this issue. In his initial post, Keith basically pointed out the issue and then asked the readers for feedback: is this really as big an issue as it seemed? The readers who responded were split; one blasted Hyper-V and the other wasn’t too concerned.

Keith followed that up with another post in which he provides a response from Microsoft regarding this issue:

NIC Teaming is a capability provided by our hardware partners such as Intel and Broadcom. Microsoft supports our partners who provide this capability. This is true whether the customer is running Windows, Exchange, SQL, Hyper-V, etc. We’ll have a detailed KB article about this coming out soon.

Keith’s second article was then also picked up by DABCC.

While Microsoft is sticking to the “this is a device driver issue” mantra, I’m not so sure I agree. I can see their position to a point. In Keith’s second post, analyst Chris Wolf brings up storage drivers. This is similar in that Microsoft relies upon the storage vendors to provide device-specific modules (DSMs) that provide the multipathing functionality. So, as with NIC teaming, Microsoft is pushing the functionality back to the device drivers and the vendors who write them.

But that’s as far as this comparison can be taken. Microsoft officially supports storage multipathing; they don’t officially support NIC teaming. (See this KB article or this KB article.) In addition, Microsoft provides an official framework in which the storage vendors can operate: the MPIO framework. There is no such framework for network redundancy. In fact, if such a framework existed then much of the dissatisfaction with Microsoft over this issue would be alleviated, in my opinion.

Instead, there is no framework to provide official NIC redundancy for any Microsoft product running on Windows Server, and Windows itself doesn’t provide that functionality. Users are forced to adopt unsupported means to provide NIC redundancy. Why shouldn’t they be upset?

By the way, since publishing the first article I’ve been contacted by one of the presenters of the VIR358 session during which this issue came to light, but he has not yet been able to provide any additional information. As soon as more information is available, I’ll be sure to let everyone know here.


  1. Duncan’s avatar

    Well the biggest problem with doing NIC teaming this way is that you can’t combine several vendors in 1 team. If a driver fails, the complete team fails. So this way doesn’t provide you with optimal redundancy in my opinion.

  2. Andrew’s avatar

    First, I’m glad Scott is bringing light to this issue. It needs crystal clarification and Microsoft needs to pump out a recommended strategy. Clustering and Live Migration are not a replacement.

    I really hope this article is not just Hyper-V bashing for the sake of it and truly helps us find a great low-cost virtualization solution.

    A couple notes:

    The KB articles are about Virtual Server not Hyper-V.

    I’ve used Etherchannel (NIC bandwidth load-balancing and fault-tolerance) for many years. Intel, Broadcom, Cisco etc… are the experts in this arena not Microsoft or VMware.

    Historically, Broadcom does allow you to combine several vendors in 1 team but I prefer keeping 1 vendor in a team. Mixing NIC drivers is too much of a negative to outweigh the positive.

  3. slowe’s avatar

    Andrew,

    I’m not just bashing Hyper-V for the sake of bashing Hyper-V. I’m pointing out what I feel is a significant problem in a vendor’s solution. Note that I’ve done the same to VMware in the past. I’m an equal opportunity basher. :)

    With regards to your notes:

    - One of the KB articles is about Virtual Server, the other is about Windows Server clustering, including Windows Server 2008.

    - No one is disputing that Microsoft is not a networking expert. Again, my biggest beef is that Microsoft has not even provided a framework in which NIC vendors can provide fault tolerance.

    - I was not aware that Broadcom did allow you to mix vendors in a team. I do agree, though, that doing so adds a layer of complexity that would be better avoided if at all possible.

    Finally, with regard to cost: I would just say that you get what you pay for. If you truly need an enterprise-class virtualization solution, then you need VMware, and the cost reflects the functionality. Hyper-V’s not there in that space yet. If you need an entry-level virtualization solution, then Hyper-V will work for you, and its cost reflects that as well.

  4. Andrew’s avatar

    “Get what you pay for” comments are the reason VMware charges the ridiculous sums they do.

    They also see that Application Virtualization is really going to crash the party.

    Is it your feeling that current Etherchannel is not an acceptable solution in general or just specifically Hypervisors?

  5. slowe’s avatar

    Andrew,

    I don’t want to get into an ideological debate with you regarding VMware’s pricing, but I will say that VMware is free to charge whatever they like for their products, just as Microsoft is. Some would say that Microsoft’s prices for Windows Vista are as ridiculous as VMware’s VI3 pricing, if not more so.

    Also, take a look at the per-VM costs for each of the major virtualization solutions…you may be surprised at the outcome.

    With regards to application virtualization, I don’t know that I would say it’s going to “crash the party,” but I will say that it was a smart move for VMware to buy Thinstall. Microsoft is coming on strong in the desktop virtualization space; although they have yet to deliver some concrete products, the technology looks solid. VMware had to be prepared to compete.

    Finally, about EtherChannel: EtherChannel (or LACP) is a good technology, and it works well in the right situations. It’s not the right fit in all cases, virtualized or not. I’ve used it in some VMware deployments, and in some VMware deployments we just use NIC teaming. In network-to-network connections (say, between switches), it makes a lot of sense. From a server to a switch…well, that depends upon what the server is doing, what kinds of traffic it’s generating, how many clients are connecting, etc.

  6. Chris Wolf’s avatar

    Good points, Scott. Sorry for jumping in late, but I was at a conference all of last week, so I’m finally getting caught up on news, etc. My point to Keith Ward was basically that we’ve gotten by without official support for so long, that we’re almost used to it. I’d never deploy a Microsoft cluster without teamed NICs (even though such a config wasn’t supported by MS), and MS and the organization would typically look right past the NIC teaming support issue. That all being said, your points are right on. Multipath is officially supported and NIC teaming should be as well.

  7. slowe’s avatar

    Chris,

    No worries about “jumping in late”–I’ve been loosely following your comments at the Catalyst ‘08 conference.

    I think that all most readers really want is just for Microsoft to acknowledge that Windows needs a NIC teaming framework. Even if Microsoft can’t (or won’t) provide that functionality themselves, they should at least provide a supported framework for third-party vendors to use.

    Thanks for reading, and for adding your thoughts!

  8. GeoTech’s avatar

    I just designed and deployed a clustered Hyper-V system that has Intel NIC teams, no problem. It’s in production and running 12 VMs on the cluster…

  9. slowe’s avatar

    GeoTech,

    Thanks for sharing your experience. I think that everyone agrees it will probably work just fine. A lot of people, though, are just uncomfortable running enterprise workloads with configurations that are “unsupported.”

  10. Nate’s avatar

    This is a late comment to this post, but I just got around to playing with NIC teaming on one of my IBM x series servers. The server has Broadcom NICs, and what I found was that using the BACS software I could use Smart Load Balancing to make a failover team that would work just fine with Hyper-V. When I flipped it to being an active-active team, things hit the wall. Smart Load Balancing wasn’t the only option though; you can also do an 802.3ad team (EtherChannel). After some searching I found a Microsoft presentation that actually suggests using 802.3ad to load balance NICs for Hyper-V. Going the EtherChannel route means you are using a standards-based, well-supported framework to run your team on. Now that Cisco has released VSS, which lets you build an EtherChannel across multiple 6500s (you could do it before on stacked 3750s), I don’t see why you wouldn’t use it. It’s a more stable method than mac-out methods, and should result in better load balancing.

    Also, just because Microsoft isn’t the one who supports something doesn’t mean it is “unsupported.” That’s just a marketing scare tactic from vmware.
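As a rough illustration of why an 802.3ad-style team can spread traffic across links while still surviving a link failure, here is a minimal Python sketch of hash-based link selection. It is an assumption-laden toy, not Broadcom's, Cisco's, or Microsoft's actual algorithm: the `pick_link` function, the CRC32 hash, and the NIC names are all hypothetical, and real implementations choose their own hash inputs (MAC addresses, IP addresses, ports).

```python
import zlib
from typing import Sequence


def pick_link(src_mac: str, dst_mac: str, active_links: Sequence[str]) -> str:
    """Choose one team member for a conversation (hypothetical helper).

    The same source/destination pair always hashes to the same link while
    the set of active links is unchanged, which keeps the frames of one
    conversation in order. Different pairs may land on different links.
    """
    if not active_links:
        raise RuntimeError("no active links left in the team")
    # Illustrative hash input only: a normalized source/destination MAC pair.
    key = f"{src_mac.lower()}-{dst_mac.lower()}".encode("ascii")
    return active_links[zlib.crc32(key) % len(active_links)]


if __name__ == "__main__":
    team = ["nic0", "nic1"]  # hypothetical team members
    vm_mac = "00:15:5d:00:00:01"
    client_mac = "00:1b:21:aa:bb:cc"
    print("both links up:", pick_link(vm_mac, client_mac, team))
    # Simulate losing nic0: the conversation is rehashed onto what remains.
    print("nic0 failed:  ", pick_link(vm_mac, client_mac, ["nic1"]))
```

The point of the sketch is only that load distribution and failover fall out of the same selection step: removing a failed link from the active list moves its conversations to the surviving links.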

  11. slowe’s avatar

    Nate, I don’t know about your experience, but my experience has been that having cross-stack capable switches (like stacked Catalyst 3750s or VSS-enabled Catalyst 6500s) is generally less common than perhaps one might think. With that in mind, the ability to use a third-party NIC tool to create 802.3ad/LACP-capable bonds is less compelling than the ability to team dissimilar NICs in an active/active mode without any switch configuration required.

    Finally, with regards to “unsupported”…customers who are deploying enterprise-grade virtualization solutions generally want support from the vendor. “Unsupported” means “not in my data center” for these types of customers.

  12. Nate’s avatar

    I suppose my thought is that if we are talking about ‘enterprise-grade’ virtualization customers they are also likely ‘enterprise-grade’ networking customers. I’m not sure if other networking vendors already offer solutions to EtherChannel with multiple switches, but if they don’t I’d assume they’ll follow suit soon. I know Juniper has something they call “virtual chassis” for stacking that I’d assume could lead to the same end goal. To me an EtherChannel solution is more ‘enterprise’ than a mac-out solution. My guess is Microsoft will probably implement something within Hyper-V at some point to do NIC bonding for SMB folks using budget switches, but my point is that there is a solution now. Additionally, even if Microsoft implements something similar to what VMware has done, it may still be beneficial to opt for EtherChannel in an enterprise scenario if you can, for better load balancing.

    Again, my beef is with saying ‘unsupported’ in the first place. It’s an incorrect statement. The NIC vendor supports their software, therefore it is supported. Microsoft states the use of third-party NIC teaming is ‘accepted’, just that the support of that third-party software falls to the NIC vendor. It’s not like Microsoft is saying “don’t do it”. Yes, Microsoft says that if you are having an issue they may ask you to disable the team as part of the troubleshooting process. That’s not a big surprise. Even if the teaming software were provided by Microsoft, I’m sure there are scenarios where they would ask you to disable it in troubleshooting to verify whether it is the issue. Heck, I’m sure that VMware does the same with their teaming solution. To me the ‘unsupported’ thing is FUD.

  13. slowe’s avatar

    I suppose we just have different perspectives on the matter. I can see your point, but to me this level of redundancy should have been built into the hypervisor/parent partition.

    Thanks for commenting!

  14. Sean McPartlin’s avatar

    Nate: So you got teaming to work with the IBM x series and Hyper-V? Would love some more info on that.

    I’m trying to get an HS12 in a BladeCenter S to work with Hyper-V in a clustered two-node 2008 system.

    My VM’s can only ping the IP of the Hyper V. Can’t route past it.

    I’m going to try messing with the 802.3ad setup and see if that helps.

    Sean~

  15. Ian’s avatar

    Sean,

    We are experiencing the same issue, only we are using IBM HS21 blades in a BladeCentre H. We have SLB set up on our Broadcom NICs; however, we are somewhat tied to the SLB technology because our NICs are across different physical switches in the back of the BladeCentre H chassis. I don’t believe we can use 802.3ad across multiple switches?

    Did you have any luck?

    Ian

  16. Dan’s avatar

    Further to Ian’s message, does anyone have hands-on experience they can share with a BladeCentre H, HS21s or HS22s, and load balancing? I’ll be implementing a similar system in the coming months and am unsure of any potential issues. Specifically, I’ll have the two blade NICs and two NICs on the expansion card (NIC types the same). There will be a number of Hyper-V blades clustered.

    If anyone can share first hand experience it would be much appreciated. Cheers

    Dan

  17. Amr Nassar’s avatar

    Installation order should be as follows:
    1. Add the Hyper-V role
    2. Install the NIC Teaming software
    3. Configure the team
    4. Configure virtual networking within Hyper-V
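
The ordering above could be captured as a simple checklist for a build runbook. The following Python sketch is only an illustration: the step names come from the list above, while the `next_step` helper is hypothetical and does not call any Hyper-V or vendor teaming tool.

```python
# Installation order as given in the comment above.
INSTALL_ORDER = [
    "Add the Hyper-V role",
    "Install the NIC teaming software",
    "Configure the team",
    "Configure virtual networking within Hyper-V",
]


def next_step(completed):
    """Return the next step to perform, or None when all steps are done.

    Raises ValueError if the completed steps do not follow INSTALL_ORDER,
    for example if virtual networking was configured before the team existed.
    """
    completed = list(completed)
    if completed != INSTALL_ORDER[:len(completed)]:
        raise ValueError("steps were performed out of order: %r" % (completed,))
    if len(completed) == len(INSTALL_ORDER):
        return None
    return INSTALL_ORDER[len(completed)]


if __name__ == "__main__":
    done = []
    step = next_step(done)
    while step is not None:
        print("Next:", step)
        done.append(step)
        step = next_step(done)
```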