Gestalt

Articles in this category are republished to the GestaltIT site.

In late November 2009, I published a post on understanding NPIV (N_Port ID Virtualization) and NPV (N_Port Virtualization); you can read the full post here. In that post, I described a pair of virtualization technologies—NPV and NPIV—for Fibre Channel storage area networks (SANs). This time around, I’d like to discuss a related technology on the Ethernet networking side called network interface virtualization (NIV). This technology, by the way, is currently undergoing IEEE standardization as part of the 802.1Qbh standard under the name “Bridge Port Extension”.

Before I can describe NIV, I need to define a few key terms:

IV-capable bridge: This is a switch that understands and is aware of interface virtualization (IV). Examples of IV-capable bridges include the Nexus 5000 and the UCS 6100XP fabric interconnects.

Interface virtualizer: An interface virtualizer (IV) is a device that simply extends the reach of an IV-capable bridge. To the outside world, an IV-capable bridge and all of its IVs appear as a single bridge. Examples of IVs include the Nexus 2000 fabric extender and the I/O module (IOM) in the Cisco Unified Computing System (UCS). There are also other forms of IVs, as I’ll explain later in this post.

Link-local tag: Frames entering an IV from a network interface card (NIC) have a link-local tag added to them; this link-local tag denotes the source IV port. Similarly, frames entering an IV-capable bridge have a link-local tag added to them that indicates the path through the IV(s) to the destination IV port. In this way, the link-local tag supplants the MAC address as the primary method of determining to which port (or out which port) a frame should be forwarded. The link-local tag is removed from the frame when it exits an IV (headed into a NIC) or when it exits an IV-capable bridge. (This link-local tag is what Cisco refers to as VNTag.)
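To make the tagging behavior a bit more concrete, here is a toy Python model of a frame passing from a NIC, through an IV, up to the IV-capable bridge, and back down again. It is only a sketch of the behavior described above and not of any real VNTag format: the class names, the MAC strings, and the idea that the bridge learns a MAC-to-port table from source tags are all my own assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Frame:
    src_mac: str
    dst_mac: str
    tag: Optional[int] = None  # link-local tag; None means untagged


class InterfaceVirtualizer:
    """Toy IV: tags frames coming from a NIC, untags frames going to a NIC."""

    def from_nic(self, port: int, frame: Frame) -> Frame:
        frame.tag = port  # the tag denotes the source IV port
        return frame

    def to_nic(self, frame: Frame) -> Tuple[int, Frame]:
        if frame.tag is None:
            raise ValueError("frame from the bridge must carry a tag")
        port = frame.tag   # the tag selects the egress port; no MAC lookup here
        frame.tag = None   # the tag is removed before the frame reaches the NIC
        return port, frame


class IVCapableBridge:
    """Toy bridge. ASSUMPTION: it learns MAC -> IV port from source tags."""

    def __init__(self) -> None:
        self.mac_to_port: Dict[str, int] = {}

    def forward(self, frame: Frame) -> Frame:
        if frame.tag is None:
            raise ValueError("frame from an IV must carry a source tag")
        self.mac_to_port[frame.src_mac] = frame.tag
        if frame.dst_mac not in self.mac_to_port:
            raise LookupError("unknown destination (a real bridge would flood)")
        frame.tag = self.mac_to_port[frame.dst_mac]  # retag with the destination port
        return frame


iv, bridge = InterfaceVirtualizer(), IVCapableBridge()
bridge.mac_to_port["mac-b"] = 2  # pretend host B was learned earlier on IV port 2

frame = iv.from_nic(1, Frame(src_mac="mac-a", dst_mac="mac-b"))
frame = bridge.forward(frame)
port, frame = iv.to_nic(frame)
print(port, frame.tag)  # prints: 2 None
```

Notice that the IV itself never consults a MAC address table; all of the forwarding intelligence sits in the bridge, which is the whole point of the design.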

Now that I’ve defined some NIV terminology, I’d like to explain why NIV is useful. To understand the value of NIV, take a look at the progression or development of data center networking:

  1. Before virtualization, bridges (or switches) were connected to a NIC (sometimes multiple NICs, but usually just one NIC) in a physical server. The relationship between physical NICs, physical switch ports, and MAC addresses is static and easy to determine (generally one NIC with one MAC address connected to one switch port). The bridge hierarchy is reasonably simple.
  2. Then hardware manufacturers introduce the server blade form factor. This introduces another layer of bridges (switches) in the blade chassis themselves and creates a more complex bridge hierarchy. This more complex bridge hierarchy also means more management, as each of these bridges represents a point of management. In order to provide redundancy between the various layers of bridges, multiple connections are necessary; this, in turn, necessitates the use of Spanning Tree Protocol (STP) in order to prevent bridging loops. STP creates active/passive connections, meaning the effective bandwidth of the network is reduced.
  3. Along comes virtualization, which introduces the idea of multiple virtual NICs (vNICs) associated with a single physical NIC. This, in turn, introduces another layer of switching and more points of management, and the bridge hierarchy grows more complex. Because multiple MAC addresses are now associated with a single NIC connected to a single switch port, bridges must now deal with “hairpin forwarding,” in which the switch must retransmit a frame out the same port on which it was received. The relationship between NICs, MAC addresses, and switch ports is now much more complex and very dynamic (due to live migration technologies).

As the proliferation of virtualization continues, this trend toward increased complexity also continues unabated. How, then, are we supposed to address this? NIV is intended to help address this problem. NIV seeks to remove the complexity from the edge—the NICs and vNICs—and drive that complexity toward the bridges. That is a key underlying principle behind NIV. Look back at the definitions: one characteristic of an IV-capable bridge is that the IV-capable bridge and all of its associated IVs appear to the outside world as a single bridge.

For example, consider a Nexus 2000 fabric extender connected to a Nexus 5000 switch. From both a management perspective and a networking perspective, the Nexus 5000+Nexus 2000 combination appears as a single device. The Nexus 2000 is simply an extension of the Nexus 5000 (hence the name “fabric extender”). This is why we say that a Nexus 2000 is an example of an interface virtualizer and why a Nexus 5000 is provided as an example of an IV-capable bridge. Similarly, this is why an IOM in the back of a UCS blade chassis functions as an interface virtualizer; it acts as an integrated part of the UCS 6100XP fabric interconnect. Just as the ports on a Nexus 2000 appear to be part of the Nexus 5000, the ports on a UCS IOM appear to be part of the UCS 6100XP fabric interconnects.

By now it should start making a little bit of sense. IVs allow you to scale to larger port densities, but they keep the complexity away from the edge of the network. If you’re astute, though, you’ll note that the examples I’ve provided so far don’t address the growth of vNICs created by virtualization. The Nexus 2000 and the UCS IOM help address the growth of physical ports, but not virtual ports. Can NIV help address vNICs as well?

So far, the examples I’ve provided of IVs have been IVs that embrace the bridge/switch form factor. There’s no reason, though, that an IV must look or feel like existing bridges or switches. Here’s another type of interface virtualizer that will really blow you away: what about the Cisco Virtual Interface Controller (VIC), aka Palo? Think about it: a VIC is really just an IV built into a UCS server blade. It appears to the outside world as additional ports on the IV-capable bridge (the UCS 6100XP fabric interconnect) to which it is connected. This underscores the flexibility of IVs and also underscores the fact that IVs can be “chained,” just as the VIC (an IV built onto the server blade) is “chained” behind the UCS IOM (an IV in the rear of the UCS chassis). Chained IVs appear as part of the upstream IV-capable bridge to which they are connected.

By building the IV into the server itself and leveraging hypervisor bypass (VMDirectPath in VMware vSphere), Cisco can address the growth of vNICs. With VIC as an IV on a VMware vSphere host, the ports that connect directly to a VM appear as ports on the upstream IV-capable bridge (the UCS 6100XP in this case); this erases any distinction between physical NICs on physical servers and virtual NICs on virtual servers. Again, a key component of NIV is that an IV-capable bridge and all associated IVs (regardless of form factor) appear as a single bridge.

Related to NIV is the idea of an Ethernet Host Virtualizer (EHV). EHV describes the behavior when an IV-capable bridge and associated IVs appear to the rest of the network as a single host. Because it appears as a single host, all issues with multiple uplinks and STP now go away. This is why, for example, the uplinks on the UCS 6100XP fabric interconnects are always active-active. I’m planning on delving a bit deeper into EHV in the near future.

I hope that this discussion of NIV has been useful. If you are interested in some additional information, I found some documents that were extremely helpful in solidifying my understanding. These documents (here, here, and here) are very technical, but they are good sources of information nevertheless.

In a future post, I’ll discuss the role of the Nexus 1000V in NIV and how it relates to the other components described here.

Feel free to post any questions, comments, or clarifications below.


Two technologies that seem to have come to the fore recently are NPIV (N_Port ID Virtualization) and NPV (N_Port Virtualization). Judging just by the names, you might think that these two technologies are the same thing. While they are related in some aspects and can be used in a complementary way, they are quite different. What I’d like to do in this post is help explain these two technologies, how they are different, and how they can be used. I hope to follow up in future posts with some hands-on examples of configuring these technologies on various types of equipment.

First, though, I need to cover some basics. This is unnecessary for those of you who are Fibre Channel experts, but for the rest of the world it might be useful:

  • N_Port: An N_Port is an end node port on the Fibre Channel fabric. This could be an HBA (Host Bus Adapter) in a server or a target port on a storage array.
  • F_Port: An F_Port is a port on a Fibre Channel switch that is connected to an N_Port. So, the port into which a server’s HBA or a storage array’s target port is connected is an F_Port.
  • E_Port: An E_Port is a port on a Fibre Channel switch that is connected to another Fibre Channel switch. The connection between two E_Ports forms an Inter-Switch Link (ISL).

There are other types of ports as well—NL_Port, FL_Port, G_Port, TE_Port—but for the purposes of this discussion these three will get us started. With these definitions in mind, I’ll start by discussing N_Port ID Virtualization (NPIV).

N_Port ID Virtualization (NPIV)

Normally, an N_Port would have a single N_Port_ID associated with it; this N_Port_ID is a 24-bit address assigned by the Fibre Channel switch during the FLOGI process. The N_Port_ID is not the same as the World Wide Port Name (WWPN), although there is typically a one-to-one relationship between WWPN and N_Port_ID. Thus, for any given physical N_Port, there would be exactly one WWPN and one N_Port_ID associated with it.

What NPIV does is allow a single physical N_Port to have multiple WWPNs, and therefore multiple N_Port_IDs, associated with it. After the normal FLOGI process, an NPIV-enabled physical N_Port can subsequently issue additional commands to register more WWPNs and receive more N_Port_IDs (one for each WWPN). The Fibre Channel switch must also support NPIV, as the F_Port on the other end of the link would “see” multiple WWPNs and multiple N_Port_IDs coming from the host and must know how to handle this behavior.

Once all the applicable WWPNs have been registered, each of these WWPNs can be used for SAN zoning or LUN presentation. There is no distinction between the physical WWPN and the virtual WWPNs; they all behave in exactly the same fashion and you can use them in exactly the same ways.

So why might this functionality be useful? Consider a virtualized environment, where you would like to be able to present a LUN via Fibre Channel to a specific virtual machine only:

  • Without NPIV, it’s not possible because the N_Port on the physical host would have only a single WWPN (and N_Port_ID). Any LUNs would have to be zoned and presented to this single WWPN. Because all VMs would share that one WWPN and N_Port_ID on the single physical N_Port, any LUNs zoned to this WWPN would be visible to all VMs on that host.
  • With NPIV, the physical N_Port can register additional WWPNs (and N_Port_IDs). Each VM can have its own WWPN. When you build SAN zones and present LUNs using the VM-specific WWPN, then the LUNs will only be visible to that VM and not to any other VMs.
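If it helps to see the moving parts, here is a small Python sketch of the idea: an F_Port that hands out one N_Port_ID per registered WWPN, plus a LUN masking lookup keyed on WWPN. This is a toy model under my own assumptions, not how any real switch or array is implemented; the class and function names, the starting address, and the WWPNs are all hypothetical.

```python
from typing import Dict, List, Set


class FabricPort:
    """Toy F_Port that hands out one N_Port_ID per registered WWPN."""

    def __init__(self, npiv_enabled: bool, first_id: int = 0x010100) -> None:
        self.npiv_enabled = npiv_enabled
        self.next_id = first_id            # hypothetical starting 24-bit address
        self.logins: Dict[str, int] = {}   # WWPN -> N_Port_ID

    def login(self, wwpn: str) -> int:
        # Without NPIV support, only the first (physical) WWPN may log in.
        if self.logins and not self.npiv_enabled:
            raise RuntimeError("F_Port does not support NPIV")
        self.logins[wwpn] = self.next_id
        self.next_id += 1
        return self.logins[wwpn]


def visible_luns(wwpn: str, masking: Dict[str, Set[str]]) -> List[str]:
    """Return the LUNs presented to a WWPN (masking maps LUN name -> WWPNs)."""
    return sorted(lun for lun, wwpns in masking.items() if wwpn in wwpns)


f_port = FabricPort(npiv_enabled=True)
physical = "20:00:00:00:00:00:00:01"  # hypothetical WWPNs
vm1 = "20:00:00:00:00:00:00:11"
vm2 = "20:00:00:00:00:00:00:12"
for wwpn in (physical, vm1, vm2):
    print(wwpn, hex(f_port.login(wwpn)))

masking = {"lun_for_vm1": {vm1}, "shared_datastore": {physical}}
print(visible_luns(vm1, masking))  # only the LUN masked to VM 1's WWPN
print(visible_luns(vm2, masking))  # nothing is masked to VM 2
```

With `npiv_enabled=False`, the second `login` call raises an error, which mirrors the requirement that the switch side must support NPIV as well.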

Virtualization is not the only use case for NPIV, although it is certainly one of the easiest to understand.

<aside>As an aside, it’s interesting to me that VMotion works and is supported with NPIV as long as the RDMs and all associated VMDKs are in the same datastore. Looking at how the physical N_Port has the additional WWPNs and N_Port_IDs associated with it, you’d think that VMotion wouldn’t work. I wonder: does the HBA on the destination ESX/ESXi host have to “re-register” the WWPNs and N_Port_IDs on that physical N_Port as part of the VMotion process?</aside>

Now that I’ve discussed NPIV, I’d like to turn the discussion to N_Port Virtualization (NPV).

N_Port Virtualization

While NPIV is primarily a host-based solution, NPV is primarily a switch-based technology. It is designed to reduce switch management and overhead in larger SAN deployments. Consider that every Fibre Channel switch in a fabric needs a different domain ID, and that the total number of domain IDs in a fabric is limited. In some cases, this limit can be fairly low depending upon the devices attached to the fabric. The problem, though, is that you often need to add Fibre Channel switches in order to scale the size of your fabric. There is therefore an inherent conflict between trying to reduce the overall number of switches in order to keep the domain ID count low while also needing to add switches in order to have a sufficiently high port count. NPV is intended to help address this problem.

NPV introduces a new type of Fibre Channel port, the NP_Port. The NP_Port connects to an F_Port and acts as a proxy for other N_Ports on the NPV-enabled switch. Essentially, the NP_Port “looks” like an NPIV-enabled host to the F_Port on the other end. An NPV-enabled switch will register additional WWPNs (and receive additional N_Port_IDs) via NPIV on behalf of the N_Ports connected to it. The physical N_Ports don’t have any knowledge this is occurring and don’t need any support for it; it’s all handled by the NPV-enabled switch.

Obviously, this means that the upstream Fibre Channel switch must support NPIV, since the NP_Port “looks” and “acts” like an NPIV-enabled host to the upstream F_Port. Additionally, because the NPV-enabled switch now looks like an end host, it no longer needs a domain ID to participate in the Fibre Channel fabric. Using NPV, you can add switches and ports to your fabric without adding domain IDs.

So why is this functionality useful? There is the immediate benefit of being able to scale your Fibre Channel fabric without having to add domain IDs, yes, but in what sorts of environments might this be particularly useful? Consider a blade server environment, like an HP c7000 chassis, where there are Fibre Channel switches in the back of the chassis. By using NPV on these switches, you can add them to your fabric without having to assign a domain ID to each and every one of them.
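The arithmetic behind this is simple enough to sketch in a few lines of Python. The function name and the switch counts below are made up for illustration; the only rule taken from the discussion above is that a switch in NPV mode behaves like an end host and so consumes no domain ID.

```python
def domain_ids_needed(core_switches: int, edge_switches: int,
                      edge_in_npv_mode: bool) -> int:
    """Count the domain IDs a fabric consumes.

    Every ordinary switch needs its own domain ID; an edge switch running
    in NPV mode logs in to the fabric like a host and needs none.
    """
    return core_switches + (0 if edge_in_npv_mode else edge_switches)


# Hypothetical fabric: 2 core switches plus 16 blade chassis switches.
print(domain_ids_needed(2, 16, edge_in_npv_mode=False))  # prints: 18
print(domain_ids_needed(2, 16, edge_in_npv_mode=True))   # prints: 2
```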

Here’s another example. Consider an environment where you are mixing different types of Fibre Channel switches and are concerned about interoperability. As long as there is NPIV support, you can enable NPV on one set of switches. The NPV-enabled switches will then act like NPIV-enabled hosts, and you won’t have to worry about connecting E_Ports and creating ISLs between different brands of Fibre Channel switches.

I hope you’ve found this explanation of NPIV and NPV helpful and accurate. In the future, I hope to follow up with some additional posts—including diagrams—that show how these can be used in action. Until then, feel free to post any questions, thoughts, or corrections in the comments below. Your feedback is always welcome!

Disclosure: Some industry contacts at Cisco Systems provided me with information regarding NPV and its operation and behavior, but this post is neither sponsored nor endorsed by anyone.


I’ve been doing a pretty fair amount of work recently with the Cisco Nexus 5000 series of switches, as evidenced by the flurry of Nexus-related articles:

Connecting Nexus 5000 to Older Gigabit Ethernet Switches
Setting Up FCoE on a Nexus 5000
FCoE and VLAN Trunking on Nexus 5000

One thing I hadn’t yet documented was how to enable jumbo frames on a Nexus 5000. Since jumbo frames are now officially supported for VMkernel traffic with VMware vSphere, the combination of jumbo frames and 10Gb Ethernet is an attractive one. I’ve covered the ESX/ESXi side (ordinary vSwitches here and distributed vSwitches here), but here’s the Nexus side.

The commands are pretty straightforward, and I’ve included the commands for both NX-OS 4.0 and NX-OS 4.1 (they are different between versions). Important note: if you enabled jumbo frames under NX-OS 4.0 and then upgraded the switch to version 4.1, you’ll need to re-do your jumbo frame configuration.

For NX-OS 4.1, the commands to enable jumbo frames are:

switch(config)# policy-map type network-qos jumbo
switch(config-pmap-nq)# class type network-qos class-default
switch(config-pmap-c-nq)# mtu 9216
switch(config-pmap-c-nq)# exit
switch(config-pmap-nq)# exit
switch(config)# system qos
switch(config-sys-qos)# service-policy type network-qos jumbo

Now, contrast the commands above with the following commands, which you would have used to enable jumbo frames on NX-OS 4.0:

switch(config)# policy-map jumbo
switch(config-pmap)# class class-default
switch(config-pmap-c)# mtu 9216
switch(config-pmap-c)# exit
switch(config)# system qos
switch(config-system)# service-policy jumbo

The end result of these differences is this: if you upgrade NX-OS from 4.0 to 4.1, then your jumbo frames configuration will go away, and you’ll need to enter the commands for version 4.1 in order to enable jumbo frame support again. This little gotcha caused me quite a headache when my NFS-based datastores suddenly went offline after the NX-OS upgrade.

More information on the necessary commands can be found here for version 4.0 and here for version 4.1.


VMware, Cisco, and EMC made their official announcement of the VCE Coalition and the joint venture Acadia this morning. You can read one of the press releases here via MarketWire.

Acadia is interesting, but it really isn’t the meat of the announcement, in my opinion. The real substance of the matter is the nature of the coalition. There are many interesting questions/thoughts circling in my head right at the moment:

  • What impact will this have on VMware’s relationship(s) with HP, IBM, and Dell? “Throwing their hat in the ring” with Cisco’s UCS, so to speak, may greatly endanger VMware’s much larger (with respect to revenue) relationships with other OEMs. What will happen to VMware if those OEMs “throw their hat in the ring” with Microsoft and Hyper-V? This is not a good place to be.
  • The acrimonious Cisco-HP relationship adds further fuel to the concerns over VMware’s close alliance with Cisco’s computing platform.
  • Does this new coalition signal a move away from the “arms-length” relationship between EMC and VMware, a move that some (competitors, notably) have been talking about for some time? If so, what danger does that put VMware in with regards to storage relationships?
  • It seems to me that VMware has the most to lose here. What does EMC lose if this doesn’t go well? Nothing, really. What about Cisco? Nothing, really. VMware, on the other hand…well, it could be ugly.
  • What does this coalition offer that the three companies couldn’t deliver without the coalition? Why risk important relationships? This is a big question in my mind. Lots of technology companies have delivered validated designs without any sort of formal coalition. Why is one necessary in this case?
  • On the other end of the spectrum—keeping Acadia out of the picture for the moment—is this “new coalition” really anything more than what the three companies have already been doing? Is this really anything more than each of the companies dedicating resources to this effort? I know from my own direct interaction with at least one of these vendors that resources had already been dedicated to the VCE technology intersection before any sort of formal announcement. So, does this formal announcement really mean anything at all?

I don’t have any answers (yet), but you can at least read my thoughts—and contribute back to them via the comments—without having to pay $499 to some analyst firm.

By the way, if you’d like some other viewpoints on this matter, here are a couple from opposing viewpoints:

NetApp – Jay’s Blog: The Importance of Being Open
Chuck’s Blog: Announcing the VCE Coalition

Feel free to speak up in the comments below (courteous comments only, please, and be sure to include full vendor disclosure where appropriate). Thanks!


Fibre Channel over Ethernet (FCoE) is receiving a great deal of attention in the media these days. Fortunately, setting up FCoE on a Nexus 5000 series switch from Cisco isn’t too terribly complicated, so don’t be too concerned about deploying FCoE in your datacenter (assuming it makes sense for your organization). Configuring FCoE basically consists of three major steps:

  1. Enable FCoE on the switch.
  2. Map a VSAN for FCoE traffic onto a VLAN.
  3. Create virtual Fibre Channel interfaces to carry the FCoE traffic.

The first step is incredibly easy. To enable FCoE on the switch, just use this command:

switch(config)# feature fcoe

The next part of the FCoE configuration is mapping a VSAN to a VLAN. What VSAN should you use? Well, if you are connecting to an existing Fibre Channel fabric, perhaps on a Cisco MDS switch, you’ll need to make sure that the VSANs between the Nexus and the MDS are appropriately matched. Otherwise, traffic on one VSAN on the Nexus won’t be able to reach devices on another VSAN on the MDS. If there’s enough demand, I’ll post a quick piece on this step as well.

Note that this FCoE VSAN-to-VLAN mapping is a required step; if you don’t do this, the FCoE side of the interfaces won’t come up (as you’ll see later in this post). Assuming the VSAN is already defined, perform these steps to map the VSAN to a VLAN:

switch(config)# vlan XXX
switch(config-vlan)# fcoe vsan YYY
switch(config-vlan)# exit

Obviously, you’ll want to replace XXX and YYY with the correct VLAN and VSAN numbers, respectively.

After you’ve enabled FCoE and mapped FCoE VSANs onto VLANs, then you are ready to create virtual Fibre Channel (vfc) interfaces. Each physical Nexus port that will carry FCoE traffic must have a corresponding vfc interface. Generally, you will want to create the vfc interface with the same number as the physical interface, although as far as I know you are not required to do so. It just makes management of the interfaces easier. The commands to create a vfc interface look like this:

switch(config)# interface vfc ZZ
switch(config-if)# bind interface ethernet 1/ZZ
switch(config-if)# no shutdown
switch(config-if)# exit

At this point the vfc interface is created, but it won’t work yet; you’ll need to place it into a VSAN that is mapped to an FCoE-enabled VLAN. If you don’t, the show interface vfc <number> command will report this:

vfc13 is down (VSAN not mapped to an FCoE enabled VLAN)

As I mentioned earlier, if you haven’t mapped the FCoE VSAN onto a VLAN, you won’t be able to fix this problem. If you have mapped the FCoE VSAN onto a VLAN, then you only need to assign the vfc interface to the appropriate VSAN with these commands:

switch(config)# vsan database
switch(config-vsan-db)# vsan <number> interface vfc <number>
switch(config-vsan-db)# exit

At this point, the vfc interface will report up, and you should be able to see the host’s connection information with the show flogi database command.
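If you have more than a handful of ports to configure, it can be handy to generate the command sequence rather than type it by hand. Here is a minimal Python sketch that simply strings together the configuration-mode commands shown above for one port. The function name and the example numbers are hypothetical, and it assumes my convention of numbering the vfc interface after the physical port; treat the output as something to review before pasting, not as a tested deployment tool.

```python
from typing import List


def fcoe_commands(vlan: int, vsan: int, port: int, slot: int = 1) -> List[str]:
    """Build the configuration-mode commands for one FCoE port.

    Assumes the VSAN already exists and that the vfc interface takes the
    same number as the physical Ethernet port it is bound to.
    """
    return [
        "feature fcoe",                            # step 1: enable FCoE
        f"vlan {vlan}",                            # step 2: map the VSAN to a VLAN
        f"fcoe vsan {vsan}",
        "exit",
        f"interface vfc {port}",                   # step 3: create the vfc interface
        f"bind interface ethernet {slot}/{port}",
        "no shutdown",
        "exit",
        "vsan database",                           # place the vfc into the VSAN
        f"vsan {vsan} interface vfc {port}",
        "exit",
    ]


# Hypothetical values: VLAN 100, VSAN 100, physical port ethernet 1/13.
for command in fcoe_commands(vlan=100, vsan=100, port=13):
    print(command)
```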

From this point—assuming that your storage is attached to a traditional Fibre Channel fabric, which is likely to be the case in the near future—you only need to create zones with the WWNs of the FCoE-attached hosts in order to grant them access to the storage. Refer to my posts on creating zones and managing zones on a Cisco MDS for more information on this task.

In my own experience, once FCoE was properly configured on the Nexus 5000 switch, creating zones and zonesets on the Cisco MDS Fibre Channel switch and creating and masking LUNs on the Fibre Channel-attached storage was very straightforward. This, as has been stated on several previous occasions, is one of the strengths of FCoE: its compatibility with existing Fibre Channel installations is outstanding.

Feel free to submit any questions or clarifications in the comments below.


Yesterday I published a short post titled “I/O Virtualization and the Double-Edged Sword”. In that post, I discussed how Xsigo was criticizing FCoE for “not going far enough” in the realm of I/O virtualization. Unfortunately, I didn’t do a very good job of really getting my point across, because the discussion rapidly turned into a discussion of the merits of various interconnect technologies and why one might win over the other. While that is a great discussion to have—and I’m thrilled my site can help further that discussion—it wasn’t really the key point behind my article. I/O virtualization was only the catalyst to prompt the original post.

Let me see if I can more clearly articulate what I’m trying to say here. If you are a Twitter user and into virtualization or storage, then you probably are following either Chad Sakac of EMC (@sakacc on Twitter), Vaughn Stewart of NetApp (@vaughn_stewart on Twitter), or both. That being the case, you are probably very familiar with the extensive “discussions” that take place between the two of them. Both of them are very passionate about storage and virtualization, but they have differing viewpoints. Now, before I’m accused by NetApp of being an EMC bigot (which would be ridiculous given the coverage I’ve given NetApp) or accused by EMC of being a NetApp bigot (that, at least, might be understandable as I’m just now starting to learn EMC storage), let me say that I’m not endorsing either product. NetApp’s products and EMC’s products are different; each of them has strengths and weaknesses in different areas.

Now, ask yourself, “Why do these products have different strengths and weaknesses?” Do you know the answer? These products have different strengths and weaknesses because of the technology decisions each company chose to make in the products’ development. NetApp chose one path, EMC chose another. For NetApp, that has created certain efficiencies and certain strengths, along with corresponding weaknesses. Likewise, EMC’s technology decisions have resulted in their products having certain strengths and weaknesses. Neither of these products is perfect. For NetApp to claim that “their way is the right way” is ridiculous; their way is only one of many different ways to accomplish something. The same is true for EMC. And, by extension, the same is true for every other technology vendor on the planet.

You want more examples? Consider the architectural differences between VMware ESX/ESXi and Microsoft Hyper-V. The technology choices made by each company created inherent strengths and weaknesses in each product. VMware claims their choices are the best choices; Microsoft believes their architecture is the best. Clearly, neither product is perfect. Both products have their flaws.

The real key takeaway here is that no technology vendor has the right to throw rocks at another technology vendor. All technology vendors live in glass houses. For VMware to claim that Microsoft’s architecture is all wrong is, well, wrong. For EMC to say that NetApp’s technology choices are stupid would be wrong. For Xsigo to claim that FCoE is the wrong path for I/O virtualization is wrong (although, personally, I don’t consider FCoE an I/O virtualization technology, but that’s a different discussion for a different day). Why? Because every company has to make technology choices, and those technology choices will—by the very nature of technology—automatically create inherent differences, strengths, and weaknesses in the resulting product. And when you accept that truth (and it is a truth, I promise you), then you see why vendors should not engage in negative marketing. When a vendor engages in negative marketing about the competition, that vendor is simply inviting others to pick apart the flaws in their own products.

Of course, I’m not naive enough to believe that vendors will stop negative competitive marketing overnight. Still, I stand firm in the belief that those vendors that focus on the strengths of their products instead of the flaws of others’ products will move ahead. I’m certainly more likely to do business with them.

I’d be interested to hear what others have to say. Voice your position in the comments.

Disclosure: As you probably know, I work for a reseller who represents many different vendors and manufacturers. My words here are not endorsed by my employer, nor do I represent my employer in this area.


I recently came across this blog entry over at Xsigo’s new corporate blog, I/O Unplugged. A key phrase in this blog entry really caught my eye:

The reality is that FCoE solves neither the complexity nor the management problems. It is a minor change to the status quo when a major leap forward comparable to server virtualization is needed for I/O.

At first glance, I’d say they are right. FCoE was designed from the ground up to be completely compatible with Fibre Channel, and that’s one of its key strengths. Yes, Xsigo’s InfiniBand-based solution is a very different architecture, and the set of capabilities provided by the Xsigo I/O Director is very different from the capabilities enabled by an FCoE solution such as the Cisco Nexus 5000 (or the newly-announced Nexus 4000). I wouldn’t necessarily disagree that Xsigo’s solution might offer some benefits over FCoE. I would strongly contend, however, that FCoE does offer some benefits over InfiniBand-based I/O virtualization solutions.

See, every technology decision is a double-edged sword. Xsigo “breaks the mold” by using a new architecture based on InfiniBand, but this decision comes at the cost of compatibility. Cisco chooses to go with a “less innovative” solution, but gains the benefit of broad compatibility with a large installed base. There is no one solution that offers all advantages and no disadvantages. That being said, which is more important to you and your company: innovation or compatibility? These are the sorts of questions you need to ask when evaluating solutions.

What do you think? Feel free to post your thoughts below. Vendors, please be sure to disclose your affiliation. And, in the spirit of full disclosure, keep in mind that my employer is a Cisco partner, but I have worked with both Xsigo and Cisco solutions. The thoughts I post here do not reflect the thoughts or views of my employer.


I had a customer contact me about scaling network throughput when using NFS datastores. Specifically, this customer was interested in knowing if it was possible to utilize more than 1 NIC with IP-based storage. The customer is currently using link aggregation (EtherChannel on a Cisco switch). I pointed the customer to my post on NIC utilization, in which I explain the prerequisites for utilizing more than 1 NIC in this sort of configuration. To refresh your memory, those prerequisites are:

  • The vSwitch must be configured for “Route based on IP hash”
  • The physical NICs connected to the vSwitch as uplinks must all be configured as active in the failover order
  • The physical switch must be configured for link aggregation
  • There must be multiple, unique source-destination IP address pairs involved

The customer responded with a question (which I’m paraphrasing here): “That’s all? It will just automatically use more than one link?”

Well…sort of.

There is one little caveat. Cisco IOS uses a hashing algorithm to determine which link a particular traffic flow between a source and destination will use. This algorithm is controlled by the port-channel load-balance command. Assuming that you’re using source-destination IP hashing, that means the Cisco switch will use a hash of the source IP address and the destination IP address to determine which link it will use. This page has more detailed information.

It’s theoretically possible, based on the number of links in the port channel, that some traffic flows between different pairs of source-destination IP addresses might end up on the same link. That means it’s not necessarily just as simple as setting up multiple NFS exports or iSCSI targets on different IP addresses—you also need to know if the IP addresses you are using will actually result in the traffic being distributed across the links.

How does one tell? Good question, and one I’m glad you asked. You can tell using this command (this command assumes you are using IP-based hashing):

switch# test etherchannel load-balance interface <Port channel interface> ip <Src IP Addr> <Dst IP Addr>

So, let’s say that you have an ESX/ESXi host with a VMkernel interface whose address is 172.16.5.10. Let’s say that you have a storage array (NetApp FAS, EMC Celerra, etc.) that supports NFS and you want to mount two different NFS exports on two different IP addresses so that traffic from this ESX/ESXi host to the storage array is spread across more than one link. You could use the test etherchannel load-balance command to determine which addresses would actually result in traffic being distributed across the links:

switch# test etherchannel load-balance interface Po3 ip 172.16.5.10 172.16.5.100

For more examples of what the output would look like, take a look at this image. This was taken off a Cisco Catalyst 3560G running in my test lab (and yes, the IP addresses have been changed to protect the innocent).

This gives you one way of testing whether your link aggregation configuration will actually use multiple links, or only a single link due to the IP hash calculation. Don’t forget that esxtop can also show you NIC utilization; here’s an example of both uplinks being used in this sort of configuration.
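If you have several candidate storage addresses to check, it may be easier to generate the test commands than to type them by hand. The following is only a small sketch: the function name build_test_commands and the .101 and .102 addresses are hypothetical, and the command syntax is simply the one shown above. It only prints the commands; you would still paste them into the switch and read the results yourself.

```python
def build_test_commands(port_channel, src_ip, dst_ips):
    """Return one 'test etherchannel load-balance' command per destination IP.

    The command syntax is copied from the example in this post; nothing here
    talks to a switch or predicts which link the switch will choose.
    """
    template = "test etherchannel load-balance interface {pc} ip {src} {dst}"
    return [template.format(pc=port_channel, src=src_ip, dst=dst) for dst in dst_ips]


# Hypothetical candidate addresses for the NFS exports on the storage array.
candidates = ["172.16.5.100", "172.16.5.101", "172.16.5.102"]

for command in build_test_commands("Po3", "172.16.5.10", candidates):
    print(command)
```

Run each printed command on the switch and pick a pair of target addresses that the switch reports as landing on different links.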

Unfortunately, what I can’t tell you right now is what algorithm the vSwitch itself uses to place traffic onto the uplinks. Does it follow the same sort of mechanism as the Cisco switch? I don’t know. If anyone has any information on that, it would be tremendously helpful.

If anyone has any other pertinent information or resources on this topic, please add them to the comments below.

UPDATE: Duncan Epping pointed out that an article by Ken Cline from earlier this year describes the mechanism VMware uses to determine which uplink on a vSwitch will be used. The algorithm performs an XOR operation on the Least Significant Byte (LSB) of the source and destination IP addresses, then takes that result modulo the number of uplinks. Thanks, Duncan and Ken!
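To make that calculation concrete, here is a minimal sketch of it as described in the update above. It is not VMware's code: the function name uplink_index is hypothetical, it assumes IPv4 addresses, and it assumes uplinks are numbered starting at zero. The .101 and .102 target addresses are made up for illustration.

```python
import ipaddress


def uplink_index(src_ip, dst_ip, num_uplinks):
    """XOR the last octet of each address, then take the result modulo
    the number of uplinks (assumed here to be numbered from 0)."""
    src_lsb = int(ipaddress.IPv4Address(src_ip)) & 0xFF
    dst_lsb = int(ipaddress.IPv4Address(dst_ip)) & 0xFF
    return (src_lsb ^ dst_lsb) % num_uplinks


vmkernel = "172.16.5.10"
for target in ("172.16.5.100", "172.16.5.101", "172.16.5.102"):
    print(target, "-> uplink", uplink_index(vmkernel, target, 2))
```

With two uplinks, this sketch places 172.16.5.100 and 172.16.5.102 on the same uplink and 172.16.5.101 on the other, which illustrates the earlier point: having two distinct target addresses is not enough, and the specific addresses matter.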

Much ado has been made—some of it by yours truly—about the current lack of ability to create a multi-hop Fibre Channel over Ethernet (FCoE) fabric. After digging in deeper with Cisco during my recent Unified Computing System (UCS) class, I have some additional information to share about the different forms of multi-hop FCoE and why multi-hop FCoE still isn’t available.

Multi-hop FCoE falls into two basic scenarios:

  • FCoE initiators and/or FCoE targets separated from an FCF (Fibre Channel Forwarder) by multiple hops through IEEE DCB-compliant Ethernet switches
  • Multiple FCFs chained together to connect FCoE initiators and FCoE targets

There are additional scenarios, but for now let’s discuss just these two.

In the first scenario, FCoE initiators and FCoE targets might be separated from an FCF by one or more IEEE DCB-compliant Ethernet switches (also known as an “FCoE passthrough”). In this situation, FCoE Initialization Protocol (FIP) would be required in order for the FCoE initiators and FCoE targets to communicate. Now that FIP support is beginning to emerge following the ratification of the FC-BB-5 standard in early June, this sort of scenario becomes more possible.

If you think like me (and if you do, I’m very sorry to hear that!), your next question is, “OK, what is an IEEE DCB-capable switch?” As it turns out, the Nexus 5000 can be an IEEE DCB-capable switch (or an FCoE passthrough). Cisco doesn’t advertise that fact because they don’t feel that building a solution out of a bunch of Nexus 5000 switches is the best approach. OK, fair enough, so the Nexus 5000 isn’t really designed to be used that way. So what other options are there? None of which I’m aware, at this point, so that makes it impossible to build multi-hop FCoE solutions today. When a valid IEEE DCB-capable switch or FCoE forwarder does appear then you’ll be able to build these sorts of designs—assuming that you have FIP support in both the FCoE initiators and the targets. (Note that you could mix pre-FIP components in here, but all such components would have to be connected directly to the FCF, and would only be able to communicate with other components connected directly to the FCF.)

In the second scenario, FCoE initiators and targets are connected directly to an FCF—like a Nexus 5000—but you’ve got multiple FCFs chained together to create a larger fabric. You might consider this analogous to linking multiple MDS 9000 series switches together with inter-switch links (ISLs). In this case, FIP support would still be necessary for initiators to connect to targets on a different FCF, but now there’s another wrinkle. You see, Cisco has the concept of a VSAN (think of it like a VLAN for Fibre Channel—this is a simplistic definition but reasonable enough to use). In the MDS world (keeping in mind that NX-OS, the software running on Nexus switches, has its roots in SAN-OS, the software that runs on MDS Fibre Channel switches), there is the concept of trunking E_ports, where multiple VSANs are carried on a single E_port between two MDS switches. Continuing the VSAN/VLAN analogy, a trunking E_port is analogous to an 802.1q VLAN trunk.

Bear with me, there’s a reason I’m telling you all this.

When you use FCoE on a Nexus 5000, you end up mapping each VSAN to a VLAN. When you need to move from one FCF to another FCF (i.e., from one Nexus 5000 to another Nexus 5000), how should the VSAN information be presented? Should the VSAN information reside in the 802.1q VLAN tag, so that an 802.1q VLAN trunk is considered a trunking E_port with regard to VSANs? Or should the VSAN information remain embedded in the FC frames that are encapsulated by Ethernet? This fundamental question has not yet been answered. There are advantages and disadvantages to each approach, and because the T11 group responsible for FC-BB-5 and other FCoE standards hasn’t yet come to agreement on how to handle this, it’s currently not possible to create the FCoE equivalent of trunking E_ports (I believe these will be referred to as VE_ports). Since you can’t create VE_ports, you can’t connect multiple FCFs together, and you can’t build a multi-hop FCoE fabric composed of multiple FCFs.

As you can see, then, even with FIP present in all components, neither definition of multi-hop FCoE is possible today. Although a Nexus 5000 can function as an FCoE passthrough, Cisco doesn’t recommend that architecture. Without any other IEEE DCB-capable Ethernet switches available to use as an FCoE passthrough, the first scenario is impossible to build. Likewise, the inability to create VE_ports and trunk VSANs across multiple Nexus 5000 switches means that it’s impossible to build the second scenario today. While multi-hop FCoE is the ultimate goal, it’s just not possible right now.

Here’s some food for thought while you digest this information: how would a fabric extender change things? That’s a topic I’ll delve into in a future post, so stay tuned!

Of course, FCoE experts and wizards are encouraged to add your corrections, clarifications, and thoughts in the comments below.

Update: See this follow-up post for more information.

I mentioned yesterday on Twitter that I’d had something of a revelation with regard to Fibre Channel over Ethernet (FCoE). This is probably nothing new to the experienced storage intelligentsia, but I’m just a simple guy so this was a big deal. After a spirited discussion in the Cisco UCS class about how to best leverage “FCoE-capable” storage, I have come to this realization: there is no such thing as an end-to-end FCoE solution.

If you’re impatient and want the short story, here it is: Even if you have an FCoE-capable storage array and you have FCoE converged network adapters (CNAs), you still can’t build an end-to-end FCoE solution. Why? Because you must put a standard Fibre Channel switch into the mix in order to provide fabric services like zoning, etc., because equipment like the UCS 6100 fabric interconnects and the Nexus 5000 don’t provide those services.

Here’s the longer version. We were having a discussion in the Cisco UCS training class revisiting the northbound FCoE connectivity issue that I discussed here. It turns out that the UCS 6100 fabric interconnect runs in NPV (or end-host) mode, so you can’t hook up any sort of storage target, FC or FCoE, directly to the UCS 6100 fabric interconnect. Even if you were to enable the UCS 6100 fabric interconnect to run in switch mode (something that’s not possible today), you still couldn’t hook a storage target, FC or FCoE, to the fabric interconnect because the fabric interconnect doesn’t provide any fabric services. Further, even if you were to leave the UCS 6100 fabric interconnect in NPV mode and add a Nexus 5000 switch to the mix, you can’t hook the UCS 6100 and the Nexus 5000 together because FCoE isn’t multi-hop capable (yet). If I understand correctly, the FC-BB-5 standard includes FIP, which will address this limitation. However, according to the information I’m getting here (and I’m fully open to more information from others who are “in the know”), even that won’t fully address the problem because neither the UCS 6100 nor the Nexus 5000 will offer fabric services. So, you will still need a traditional Fibre Channel switch, like a Cisco MDS 9000 series, to provide fabric services.

The end result is that, today, it’s impossible to build an end-to-end FCoE solution. You will still need a traditional Fibre Channel switch somewhere in the mix, either to connect the FCoE equipment together (for example, to link a UCS 6100 fabric interconnect to a Nexus 5000) and/or to provide fabric services.

<aside>Now, there seems to be some confusion within Cisco, as the UCS resources to which I’ve been speaking are confirming my conclusions, but others (consider this tweet by Brad Hedlund) are saying it’s not true. I don’t know who’s correct—I can only go on what I’m being given.</aside>

As a result, it seems completely futile and useless for storage vendors to offer FCoE support on their storage arrays until these issues are addressed. In my mind, this further cements FCoE as an “edge-only” solution. Adding fabric services to the Nexus 5000 and/or UCS 6100 fabric interconnects would address this problem, and perhaps that’s something that is now enabled and made possible via the FC-BB-5 standard and FIP. If so, I have yet to hear a timeline in which these limitations will be addressed.

Either way, if you’re thinking of deploying FCoE today, be sure to keep this in mind or you could find yourself in for a surprise.

Courteous comments and clarifications are welcome!
