I’ve written a couple of times about NetApp virtual interfaces (VIFs), which are Data ONTAP’s name for a link aggregate using either EtherChannel or dynamic LACP. The earlier articles I wrote are:
Cisco Link Aggregation and NetApp VIFs
LACP with Cisco Switches and NetApp VIFs
I came across an issue today of which I was not aware. I’ve been working on a new NetApp deployment with a fellow engineer that called for a number of different VIFs to be created: one for CIFS traffic, one for NFS traffic, and one for SnapMirror traffic. (Yes, I know the SnapMirror VIF won’t really use more than one link because it’s all point-to-point traffic; it’s primarily for redundancy.) There were some really strange network issues going on, like losing connectivity to the default gateway one moment and then network connectivity being restored the next. We were having a hard time troubleshooting the problem until one of the network engineers casually commented that it looked like the static LACP bundles (the aggregated links represented by the VIFs on the NetApp storage array) weren’t really coming up.
That comment led to a deeper inspection of the NetApp VIFs and eventually a case with NetApp. The end result was that we learned that multimode VIFs can’t span built-in NICs and add-in NICs. Since the FAS3000 series has a limited number of built-in NICs, we’d installed two additional quad-port NICs and then, as was customary, created VIFs spanning the built-in NICs and the add-in NICs for maximum redundancy. Well, that doesn’t work!
Once we reconfigured the Cisco switches (these were Cisco Catalyst 3750 switches uplinked via 10 Gigabit Ethernet to Catalyst 6509 switches) so that the link aggregates only contained add-in NICs or built-in NICs but not both, the connections came up fully and the network connectivity issues disappeared.
So, when creating multimode VIFs, be sure to only include NICs from add-in cards or the built-in NICs, but not both.
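As a rough sketch of what that separation could look like in /etc/rc, assuming a filer with onboard ports e0a through e0d and a quad-port add-in card in slot 4 (the VIF names, port choices, and address below are hypothetical and not taken from our deployment):

```
# Hypothetical layout: each multimode VIF draws its ports from one source only
# Onboard NICs only
vif create multi vif-cifs -b ip e0a e0b
# Add-in card in slot 4 only
vif create multi vif-nfs -b ip e4a e4b
# Placeholder address and netmask
ifconfig vif-cifs 192.0.2.10 netmask 255.255.255.0 partner vif-cifs
```

The switch side would then need one matching port channel per VIF, containing only the ports cabled to that VIF's members.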
Tags: NetApp, Networking, ONTAP, Storage
-
Scott – I’m going to call BS on the NetApp support guy. I’m pretty sure that is not accurate at all. Here is the excerpt from the NetApp Network Management Guide – pg 123:
The following guidelines apply to creating and configuring vifs on your storage system:
You can group up to 16 physical Ethernet interfaces on your storage system to obtain a vif. The network interfaces that are part of a vif do not have to be on the same network adapter, but it is best that all network interfaces be full-duplex.
You cannot include a VLAN interface in a vif.
The interfaces that form a vif must have the same Maximum Transmission Unit (MTU) size. You can use the ifconfig command to configure the MTU size on the interfaces of a vif. You need to configure the MTU size only if you are enabling jumbo frames on the interfaces.
You can include any interface, except the e0M management interface that is present on some storage systems. -
What I would suggest was the problem, because I ran into this recently, is that on the 4-port cards it is sometimes confusing which physical port is a and which physical port is d. We had a customer who set up multimode VIFs and read these backwards, and every time we unplugged the onboard port the VIF stopped flapping. It wasn’t an issue with the onboard port itself; the onboard port was a member of one VIF on the filer, but on the switch side those physical cables were in 2 different port-channels.
-don
-
Hi, Thanks for sharing this info. Is there a burt number you can point us to, or more technical details about what’s going wrong, because I’ve implemented this setup a number of times without any (obvious) problems, so I am a little confused right now. In other words: it strikes me as being really weird, so I’d like to get as much additional information as possible. Is it possible it could work under certain conditions ?
-
Great post! I was about to make this mistake myself with a new setup.
-
I’m actually curious if you have any further details you can post as we’ve done this at various places — might you be able to post the /etc/rc lines and/or Cisco switch config?
Just curious to see if there’s anything I’m missing….
Thanks.
-
Would you clarify something for me? The network engineer mentioned LACP bundles not coming up. However, later on, you mention that it was multimode VIFs that don’t work across NICs. Which one was it?
We’ve got a FAS3040A with each head having 2 multimode VIFs; each VIF having members on the on-board and an add-in card. When we tried to convert these to LACP (on a Juniper EX 4200 8PoE stack) the dumb things would never come up. We converted back to multimode and things worked just fine.
I’m betting that the issue described above is the problem we had.
-
The article is completely “false” in stating that VIFs can’t be created between onboard ports and expansion cards.
Could you please provide some additional info. on where you got this from?
I’m sure it’s an honest mistake.
Thanks.
PS: I like your blog.
Sam
-
I would like to rephrase the above, for it comes across way too harsh.
Please feel free to remove it. It should have read:
In my humble opinion, you can have internal and external nics coexisting in a multimode vif. Could you please provide your netapp case number so that I can further examine this?
Sam
-
Can you post the vif create line from the /etc/rc? That shouldn’t contain anything remotely sensitive.
-
Scott – you say “As I understand it, the Cisco switches are using static LACP (”channel-group X mode on” with “channel-protocol lacp”). The NetApp VIFs are configured as multimode, not as LACP. Last time I checked, the configuration remains like that but now that the VIFs are reconfigured to use only add-in or built-in but not both, everything works fine.”
This sounds a bit like you have the configuration noted in kb34818? Can you clarify?
-
We have VIFs that are split between built-in and add-on cards running on Cisco switches without any issues whatsoever. My guess is that the two cards are flipped and your ports were actually in different port groups. I’ve seen that the 3070s are different than the 3020s in this regard, and that the add-in cards are “upside down”, which can cause confusion when creating the groups.
-
Scott is actually correct: in some circumstances, this will not work.
See BURT 295540. I believe the public report is not displaying correctly at the moment for some reason, so here’s the high level view:
The problem will only reveal itself during boot, and has to do with some adapters negotiating 10 Mbps for a very brief time before coming online as GbE. In this case, the switch will disable those members of the VIF. This behavior is set by the hardware manufacturer and cannot be changed or worked around in Data ONTAP.
This applies to both multimode and LACP VIFs. The chances of this occurring are, of course, dependent on what the switch does in this situation, so it will vary depending on the switch model. In some cases, hard coding the switch port to 1000 Mbps/full has prevented the problem.
Also, some flow control negotiation problems can contribute to this; I believe that KB was referenced earlier, though.
Aside from this particular issue, there is no other problem that I know of that will prevent you from mixing ports from different physical adapters in a multimode or LACP VIF. I have seen it hundreds of times if not thousands.
-
I’d say BS on what the NetApp support guy told you as well.
I’ve got (2) NetApp 3040a clusters with vifs set up for CIFS traffic and for NFS traffic. Both of mine span the add-in cards and the onboard NICs. No flapping and all the ports are online. On one of the clusters I’m using (2) Cisco Catalyst 3750s linked together and the other is a Cisco 4507 spanned over multiple high speed blades. My switches are set to IP Load Balance.
Also, NetApp vifs can be either multimode or LACP.
vif create lacp SANAprivate -b ip e0c e0d e4c e4d
I’ve written more about my setup here:
http://universitytechnology.blogspot.com/2009/02/netapp-3040a-clustered-link-aggregation.html
-
An interesting thread of comments. Thanks for sharing.
Just to add to the pot: I have previously experienced an LACP VIF that wouldn’t return from a CF giveback. It was just the one in particular, which happens to span onboard/PCI PIFs and is connected to 3Com switches. Interface status looked OK, but the VIF was unreachable by client hosts. An ifconfig down/up resolved it at the time.
One to watch I guess.
-
Haven’t been able to replicate what I saw with some quick tests just now. Although we’re on a different ONTAP version since I last experienced it.
Hypothetically, if it was a problem with mixed PIFs, then some up-delay logic would prevent it, similar to the behaviour of Linux’s bonding module, which keeps slaves from becoming active before a switch port moves into STP forwarding.
Also, from memory, ONTAP doesn’t allow you to hard-set ports at 1GbE. If only the switch port is hard-set, are you then going to experience issues with duplex and flow control negotiation?
-
Hi gang
I’m one of the NetApp peeps talking to Scott in the background. I’m politely calling “negatory” on this in general. Mostly. However, I looked up 295540 and the issue is a hard-coded-on-the-card negotiation issue on certain cards where the cards start their negotiation at 10Mbps. When the other ports start at 1Gbps, the switch disables the 10Mbps ports. There are workarounds in the public burt. Not sure why the burt isn’t published, but I’ve asked. One workaround is “When including the listed controllers in a mixed-controller VIF, set the speed on all switch ports in the VIF. The link will come up even though the controllers only support autonegotiation at 1000 Mbps.”
I’ve seen best practices, including from VMware, that recommend not mixing cards from different vendors (or even different family or model from the same vendor) within an active-active team/channel/bond. While not mixing cards is one workaround of 295540, there is no hard rule not to mix on-board and PCI slot interfaces in a multi-mode or LACP VIF. Oh, and it’s not about whether you mix on-board and slots, it’s about the Ethernet chips in question, so look at card description in sysconfig. Couple examples:
FAS3070 with an additional dual-port
slot 0: BGE 10/100/1000 Ethernet Controller
slot 2: Dual 10/100/1000 Ethernet Controller G20
FAS3020 with an additional dual-port
slot 0: Dual 10/100/1000 Ethernet Controller VI
slot 1: Dual Gigabit Ethernet Controller VI
The FAS3070 could have this issue (but I’m not seeing it). The FAS3020 has the same chips on-board and in slot, so should never have this problem. (It says dual for the motherboard when it’s actually 4 ports because it’s 2 dual-port chips.)
Another way to fix this problem would be for Cisco not to disable the port right away. Give the NIC a chance to negotiate up to a better speed.
I’m currently working on a RefArch with much the same scenario everyone else seems to be piling on:
Multiple ESX 3.5u2/u3
Pair of stacked c3750
FAS3070HA
On the filer side, it’s two onboard ports and both ports of a dual port (happens to be a G20 supposedly affected by 295540, but I haven’t seen that) in an LACP VIF.
Here’s the pertinent part of /etc/rc:
vif create lacp vif-stor -b ip e0b e0d e2a e2b
ifconfig e0a 10.60.120.189 mediatype auto netmask 255.255.255.0 partner e0a
ifconfig vif-stor 192.168.42.203 mediatype auto netmask 255.255.255.0 partner vif-stor
ifconfig vif-stor alias 192.168.42.204 netmask 255.255.255.0
We’ve tested this by unplugging cables (ESX side and filer side), 1 and 2 at a time, unplugging a switch, and powering off a head and it just doesn’t seem to break.
Share and enjoy!
Peter
-
SLOWE – what was your case number?
-
Was this issue ever resolved? I’m seeing the same behavior on a new FAS2050 install. Onboard NICs work fine together in both multimode and LACP vifs. Add-on NICs work fine together in both multimode and LACP vifs. Put one of each in the vif, and both links refuse to come up.
-
I ran into an issue configuring flow control on my FAS3140 (7.3.3.P5). When attempting to set flow control on a vif, an error is spit out by ONTAP citing an “error 22, invalid argument to SIOCSFLOWCONTROL”. Looking up the KB at NOW there is a statement that says flow control is not supported on vifs, period.
But all the best practices guides say to set flow control on the filer (and in this case, the ESXi hosts) to send on, receive off. The Cisco switch should be send off, receive desired.
SO… if vifs do not support flow control, what is the real “best practice”?
-
Hi Jim,
The “flowcontrol send” has to be set on the individual interface like e0a etc. The vif will acquire this setting automatically from the interfaces.
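As a rough sketch of that in /etc/rc, assuming a VIF built from e0a and e0b (hypothetical names) and assuming the member settings are placed ahead of the vif create line, which is a guess at ordering and not something confirmed in this thread:

```
# Set flow control on each physical member, not on the vif itself
ifconfig e0a flowcontrol send
ifconfig e0b flowcontrol send
# Hypothetical VIF built from those members
vif create lacp vif1 -b ip e0a e0b
```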
Regds,
Murthy



