I/O Virtualization and the Double-Edged Sword

I recently came across this blog entry over at Xsigo’s new corporate blog, I/O Unplugged. A key phrase in this blog entry really caught my eye:

The reality is that FCoE solves neither the complexity nor the management problems. It is a minor change to the status quo when a major leap forward comparable to server virtualization is needed for I/O.

At first glance, I’d say they are right. FCoE was designed from the ground up to be completely compatible with Fibre Channel—and that’s one of its key strengths. Yes, Xsigo’s InfiniBand-based solution is a very different architecture, and the set of capabilities provided by the Xsigo I/O Director is very different from the capabilities enabled by an FCoE solution such as the Cisco Nexus 5000 (or the newly-announced Nexus 4000). I wouldn’t necessarily disagree that Xsigo’s solution might offer some benefits over FCoE. I would strongly contend, however, that FCoE does offer some benefits over InfiniBand-based I/O virtualization solutions.

See, every technology decision is a double-edged sword. Xsigo “breaks the mold” by using a new architecture based on InfiniBand, but this decision comes at the cost of compatibility. Cisco chooses to go with a “less innovative” solution, but gains the benefit of broad compatibility with a large installed base. There is no one solution that offers all advantages and no disadvantages. That being said, which is more important to you and your company: innovation or compatibility? These are the sorts of questions you need to ask when evaluating solutions.

What do you think? Feel free to post your thoughts below. Vendors, please be sure to disclose your affiliation. And, in the spirit of full disclosure, keep in mind that my employer is a Cisco partner, but I have worked with both Xsigo and Cisco solutions. The thoughts I post here do not reflect the thoughts or views of my employer.


70 comments

  1. Peter van den Bosch

    Two years ago I/O never was an issue. These days I come across some customers that are reaching the limits of the “old” bandwidths and protocols, so a solution must be found soon! InfiniBand, FCoE, 10GbE, etc. As long as they find a solution that will be broadly supported and available!

  2. Collin C. MacMillan

    As far as compatibility is concerned as a differentiator to Xsigo’s approach, I say FCoE is not really compatible with anything but FCoE… Since you can’t exactly drop-in an Ethernet switch and bridge FCoE traffic, it requires a new bridging fabric – FCoE – which is a new animal.

    To split hairs – since you’re feeling contrary – I’d say Xsigo’s a new spin on existing technology (InfiniBand) that encompasses MANY more interconnect options (low-latency GE, storage, memory interconnect architectures, HPC, etc.) while FCoE presents a bridge solution to the GE/10G/FC interconnect issue only. Xsigo’s “bang-for-buck” proposition is that it encompasses HPC and cloud in the same architecture. Both Xsigo and FCoE require new cabling…

    FCoE’s “strength” is legacy GE and FC – but then again the I/O Director has that too (via I/O modules)… Since new HBAs are a requirement of each, neither technology really wins there. Since InfiniBand is proven in HPC architectures and low-latency 10GE is in its relative infancy there, InfiniBand’s climbing interconnect rates make it more compelling (today). If I were going cloud+HPC – especially within a scalable, flexible/reusable infrastructure – I’d go Xsigo and treat storage as an edge device (FC, 1/10G iSCSI), no questions asked.

    You asked, and that’s what I think…

  3. Brad Hedlund

    Scott,

    The title of this post attracted me to it in hopes that I could high-five and cheer you on, or roll up my sleeves for a healthy debate. After reading this, though, I’m not sure what to make of it.

    At any rate, I’m surprised you did not draw any comparisons to Cisco UCS, as what the Xsigo post claims FCoE is lacking (I/O management) is entirely included in UCS, and then some, with all the advantages of Ethernet and without all of the disadvantages of (cough) Infiniband.

    Cheers,
    Brad

    (Cisco)

  4. Nigel Poulton

    On the FCoE topic: I know that you already know this, but it’s worth mentioning that there is more to a converged Ethernet fabric than FCoE. The beauty is in DCB/CEE and what it offers beyond just doing Fibre Channel over Ethernet. The future of Ethernet really looks exciting to me.

    Another thing that DCB/CEE/FCoE…. has over Xsigo’s InfiniBand offerings is that there is huge competition in the Ethernet space, and competition drives innovation, cost reductions and other good stuff like that.

    My 0.02 penny’s worth.

  5. slowe

    Great comments everyone!

    Peter,

    That’s one of the points behind this post—while InfiniBand might have some technological advantages over DCE/CEE/DCB (and that is not necessarily a fact), Ethernet is much more likely to be able to deliver cost-effective, broadly supported solutions in volume. That’s something that cannot be overlooked, IMHO.

    Collin,

    The “compatibility” is in the fact that you don’t need to “rip-and-replace” to embrace FCoE. True, with the Xsigo I/O Director and its I/O modules, you can do the same. The key difference here is volume—InfiniBand has been around for quite a while, but it is still relegated to specific use cases. Ethernet has shown that it will, generally, triumph over other technologies on the basis of compatibility, scalability, volume, and price. Token Ring was widely regarded as superior to Ethernet, yet where is Token Ring today? The “best technology” doesn’t necessarily always win. If it did, we’d be using OS/2 instead of Windows. ;-)

    Brad,

    The point of this post is that both FCoE/DCE/CEE and InfiniBand have their advantages and disadvantages. Neither technology is perfect. I’m not going to cater to Cisco and say that FCoE/DCE/CEE is the greatest and the best and that it doesn’t have its foibles and flaws. Neither am I going to cater to Xsigo (or any other InfiniBand vendor) and say that InfiniBand is the best thing since sliced bread. Every technology decision has advantages and disadvantages. That goes for the technologies that vendors choose to incorporate into their solutions as well as for the products that customers choose to adopt in their data centers.

    Having said all that, I do believe that FCoE/DCE/CEE will eventually become the preferred high-speed connection based on broad support, multiple vendors, volume, and price. As a result, solutions—like Cisco UCS—that are built on FCoE/DCE/CEE will naturally have an advantage over other solutions.

    Nigel,

    You are correct, and thank you for pointing that out. There ARE other advantages to DCE/CEE/DCB that will bring benefits to other types of traffic as well, and those advantages are NOT addressed using the Xsigo solution with traditional I/O modules.

  6. Massimo Re Ferré

    Scott,

    My take is fairly simple. Both technologies use a sort of “new approach” to reduce complexity (and hence improve management) from the hosts to the core. Both are then capable of “proxying” towards legacy FC / Ethernet environments. In my opinion customers shouldn’t care too much about the implementation details of what you put between your hosts and your “core”, provided both solutions guarantee the “legacy” connectivity at the core level.

    FCoE is going to win not because it is better (maybe it is but that’s not the point). It is going to win for two reasons:
    1) economy of scale (hence lower price and more skills).
    2) Cisco is Cisco. Xsigo is Xsigo (with all respect for Xsigo).

    Massimo.

  7. slowe

    Massimo,

    I agree with you—FCoE might or might not be better technologically than InfiniBand, but that’s an entirely different discussion. In the end, I believe that DCE/CEE/DCB will win for exactly the same reasons you list.

    Thanks for reading and commenting!

  8. Collin C. MacMillan

    I would not argue any of your points – in fact, I’ve argued for some time that one form of DCE will win out over the not-so-long term for the same reason Ethernet -> 1GE -> 10GE has been winning out: ubiquity. Who will win was not the question, was it? Innovation and capability seem to be strongly in Xsigo’s favour vs. DCE.

    My point is that Xsigo’s approach – for the cloud, virtualization and HPC solutions we like to talk about – scales better than DCE today. Xsigo’s approach clusters and promotes HPC capabilities within the same infrastructure today – something still baking in DCE. And the elephant in the room so far is bandwidth: Xsigo’s approach is 20G/port out of the box with DDR (without clumsy LAG). InfiniBand QDR is already baked too, so DCE has some serious catching up to do (albeit Xsigo is not on the QDR wagon yet).

    FCoE capability requires end-to-end DCE, so it’s not really any less rip-and-replace than Xsigo on that front. Ultimately, DCE/CEE/DCB’s appeal is much like that of Ethernet vs. Token Ring, et al. – momentum. The hazards/shortcomings of today’s implementation will go away as more capability is folded into the emerging standards.

    Will we see an order of magnitude more 10GE than DCE? Yep. An order of magnitude more DCE than Xsigo? Probably. Will Infiniband be replaced by DCE? Not any time soon…

  9. Brad Hedlund

    Scott,

    As techies we can debate the merits of Ethernet vs. Infiniband and this protocol vs. that protocol. But the ultimate reality is that the business decision maker doesn’t care about such non-sense anymore. The game has changed from data center connectivity to data center automation & orchestration that only a tight integration of network + compute + virtualization + storage can provide.

    Xsigo saying “a major leap forward was needed for I/O” is partly right, but this misses the bigger picture and the market transition that has already happened in the data center.

    For that reason alone InfiniBand and Xsigo will at best remain a niche, if not doomed altogether. To your point, Ethernet will win, but not because of bits per second or this whiz-bang protocol or that whiz-bang switch. It will be because Ethernet is the more obvious choice for network + compute automation.

    Cheers,
    Brad

  10. Eric Stephenson

    I have heard the expression “better isn’t better; it’s different”.
    The practical compatibility of FCoE with conventional FC can outweigh the potential technical advantages of InfiniBand.

    I suggest Virtual Iron suffered early with their initial InfiniBand solution – it is difficult to require customers to deploy new infrastructure.

  11. Craig Thompson

    We over here at Aprius can be a little torn over debates like this. We’re in the I/O virtualization camp, much like Xsigo, and believe in the benefit of virtualizing, sharing and managing all I/O (ethernet, fibre channel, FCoE, SAS etc) as a ‘pool’ of resources across multiple servers, much like server virtualization.

    We’re also pragmatic and see the inherent advantages in FCoE, especially with the backing of Cisco. That is a horse that I just wouldn’t bet against. But cost can be an issue for some customers, especially when FCoE solves some but not all of the issues associated with server I/O going forward (as Xsigo has recognized).

    We would propose virtualizing I/O, including FCoE, across servers and managing FCoE I/O as a fungible resource pool. We (and others) can do this because we operate within the PCI Express (PCIe) domain and do not introduce another protocol layer, like InfiniBand. By providing a ‘shared’ PCIe fabric, any PCIe resource (including an FCoE CNA or 10GbE card) can be virtualized and dynamically managed across a group of servers.

    It is foreseeable to get both the benefit of Xsigo-like functionality AND FCoE AND maintain your choice of card and switch vendor.

  12. slowe

    Collin,

    I think you and I are in agreement—InfiniBand isn’t going to go away, because there are advantages to using it. At the same time, InfiniBand isn’t going to see the broad adoption that DCE/CEE/DCB will see. And no, who will win wasn’t the original point of the post; the original point of the post was recognizing that all technology decisions have their advantages and disadvantages, and any vendor that claims otherwise is not being honest with you.

    Brad,

    I don’t think InfiniBand is doomed, but—like I said to Collin above—I also don’t think that InfiniBand will see the broad adoption that DCE/CEE/DCB will see. As a result, InfiniBand-based I/O virtualization solutions will also be relegated to a niche market, IMHO.

    Eric,

    Agreed—FCoE’s compatibility with the HUGE installed base of Fibre Channel is both a strength (easing adoption of the new technology) and a weakness (preventing some sort of “huge leap” into a new I/O model). That is really the core of my point: deciding to make FCoE fully compatible with FC created both advantages and disadvantages.

    Craig,

    My flight is getting ready to leave, so I’ll respond to your comment later.

    Thanks to all for reading and commenting!

  13. Rod

    Xsigo’s technology is fabric agnostic, i.e. there is nothing inherently *requiring* InfiniBand. InfiniBand just happens to be the only reasonably priced, shipping, standardized fabric technology that has enabled virtual I/O for a while, thus Xsigo adopted it.

    When DCE/CEE/DCB are fully baked and on par with InfiniBand, Xsigo could adopt that fabric technology and keep all the benefits they have today of being able to swap I/O personalities, dynamically provision I/O, etc.

    The nice thing about Xsigo is that you don’t have to wait for “eventually” to realize all the benefits of virtualized I/O today.

  14. Ariel Cohen

    I like the lively discussion here, but the comments about InfiniBand vs. DCE vs. PCIe miss the point of my post on the Xsigo blog. The post was not about I/O fabric technologies per se. It was about whether FCoE has what it takes to deliver a fundamental improvement in data center I/O to match the improvements that we’re seeing on the server side with the shift to server virtualization and cloud computing. My focus was on why the answer to this question is no. FCoE merely provides convergence of FC and Ethernet networking while sticking to the same old I/O architecture, when what is really needed is a transition from physical I/O to virtual I/O.

    A discussion of I/O fabric technologies is valuable, but it’s a separate topic which I plan to discuss further in our blog. The concept of an I/O Director (or an I/O virtualization platform) and the great value that it brings to a modern data center should not be confused with mere I/O fabric technologies (IB, DCE, PCIe).

    In fact, the unique aspect of our approach at Xsigo from the start has been the strong focus on solving the fundamental problems of data center I/O, which was an area that was lagging behind, didn’t see much change in many years, and was becoming more and more painful and expensive for data center managers. The rigid physical aspect of I/O connectivity, the plethora of adapters, cables and edge switches, the inescapable need to over-provision, the inability to manage it all centrally and flexibly, and the lack of QoS were all problems that we were targeting. Our focus wasn’t on protocols or convergence or some other relatively minor aspect, but on finally really doing what needed to be done to modernize I/O.

    An analysis of the full scope of the I/O challenge led us to an elegant and simple solution. Servers just need a high-performance I/O pipe connected to flexible I/O virtualization devices (the I/O Directors). Beyond the device needed to connect to the I/O pipe, I/O devices do not belong in the servers – having the I/O devices in the servers is what got us into trouble to begin with!

    Instead, the I/O devices are in external I/O Director systems where they can be shared with QoS guarantees, their cost can be amortized, and they can be centrally managed with full freedom to change the I/O identity and connectivity of servers to standard Ethernet and Fibre Channel networks. NICs, HBAs, and potentially other I/O devices now become virtual I/O devices which can be assigned, configured, and migrated between servers and between physical external networks as needed. We had to pull the I/O devices out of the servers and virtualize them to achieve this.
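
    As a rough illustration of this model, here is a minimal sketch in Python of virtual I/O devices living in an external director and being assigned to, or migrated between, servers. The names and structure are purely hypothetical, not Xsigo’s actual object model or management API.

    ```python
    # Minimal sketch of the "external I/O" idea: virtual NICs/HBAs live in a
    # shared director and are assigned to (or migrated between) servers.
    # Hypothetical model only -- not Xsigo's actual object model or API.

    class VirtualDevice:
        def __init__(self, name, kind, uplink):
            self.name = name          # e.g. "vnic1"
            self.kind = kind          # "vNIC" or "vHBA"
            self.uplink = uplink      # external Ethernet/FC port it terminates on
            self.server = None        # which server it is currently presented to

    class IODirector:
        def __init__(self):
            self.devices = {}

        def create(self, name, kind, uplink):
            self.devices[name] = VirtualDevice(name, kind, uplink)

        def assign(self, name, server):
            self.devices[name].server = server   # device appears in that server

        def migrate(self, name, new_server):
            # identity (uplink, QoS, addressing) stays with the virtual device;
            # only the server it is presented to changes
            self.assign(name, new_server)

    director = IODirector()
    director.create("vnic1", "vNIC", uplink="eth-module-1/port-3")
    director.create("vhba1", "vHBA", uplink="fc-module-2/port-1")
    director.assign("vnic1", "server-A")
    director.assign("vhba1", "server-A")
    director.migrate("vnic1", "server-B")    # re-home the vNIC without touching cabling
    print(director.devices["vnic1"].server)  # -> server-B
    ```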

    This architecture led us to the simplicity, flexibility, elegance, and manageability characteristics that we were looking for in the solution that we wanted to build. An architecture where the I/O devices are still in the servers, even if they can be virtualized to some degree within the servers and even if networking and storage are converged, just doesn’t provide those characteristics. You end up with a pile of technologies which can be explained only with incredibly complex diagrams, lack of manageability, and a lack of a unified solution to the problem.

    As I mentioned, I promise to contribute to the I/O fabric discussion (IB vs. DCE vs. PCIe) as well – it deserves a separate post, and the I/O pipe technology between the servers and the I/O Directors is a separate topic from the I/O Director technology itself.

    Ariel Cohen
    CTO, Xsigo Systems

  15. Jim Ensign

    Can someone explain to me why VMware picked Xsigo for all their demos and in all their server racks during VMworld 2009, despite Cisco’s investment in them and all the claims around UCS?

    My guess is that UCS did NOT work, did not scale, could not support 500-1000 VMs in a rack and a whole lot more.

    Customers want things that work and not hype and claims.

  16. Massimo Re Ferré

    BTW.. a bit off topic but somehow related.

    The FCoE advantages are clear in a 2 NICs + 2 HBAs scenario. However, most virtualization deployments require more (6-8-10+ .. I have seen customers with 22 NICs per ESX server – and for good reasons in terms of network architecture/security boundaries).

    Of course being able to reduce 22 NICs + 2 HBAs “all the way down” to 2 CNAs and some 20 standard NICs (for the security reasons above) is laughable.

    Customers that didn’t want to use VLAN technologies to trunk their multiple 1Gbit security zones are most likely not going to trust them on 10Gbit. This is obviously not just a bandwidth issue…..

    Is FCoE (more precisely DCE/CEE) going to do something for this?

    Massimo.

  17. slowe

    More good comments! BTW everyone, recall that I don’t agree with referring to FCoE as an “I/O virtualization technology”:

    http://blog.scottlowe.org/2008/11/17/fcoe-versus-mr-iovhuh/

    Ariel,

    Thanks for chiming in. The discussion of PCIe vs. DCE/CEE/DCB vs. InfiniBand wasn’t the point of my post, either. The real point of my post was that by deciding to make your solution based on InfiniBand, you gained some advantages (low latency, high bandwidth interconnect technology) and some disadvantages (no direct compatibility with existing installed base). Likewise, by deciding to make FCoE directly compatible with the existing FC installed base—which, by the way, is quite significant in size and value—they gained some advantages (compatibility) and some disadvantages (all the same limitations of the current FC model). Every technology decision is about trade-offs. You know the old cliché: “You can’t have your cake and eat it, too.” That’s very applicable here, IMHO.

    Jim,

    I honestly don’t know. That would be a question only the VMware team can answer.

    Massimo,

    Great question! Anyone know the answer? (I’m curious about this answer, too.)

    Craig (of Aprius),

    I promised I’d get back to you. In my mind, this is the “purest” form of I/O virtualization because you are just virtualizing the I/O devices, not encapsulating I/O types on top of other I/O types. While this technology decision has some advantages, it also has some disadvantages, and this is the very point of this post. By the fact that Aprius decided to go down the PCIe virtualization/SR-IOV/MR-IOV route, there were certain features or functions that you weren’t able to provide. At the same time, when the Xsigo team developed their InfiniBand-based solution, the choice to build their product in the way that they did meant that certain features and functionality just weren’t possible. The same goes for Cisco building FCoE-capable switches—some things just weren’t possible. I’m not saying that any one of these approaches is inherently better or worse—as Ariel says, that is really a separate discussion—but that all vendors must accept that their solution cannot be the “perfect solution” for this very reason. As a friend of mine once said, “People are imperfect; therefore, everything we create is imperfect. Only perfection breeds perfection.”

    Thanks for reading and commenting!

  18. Craig Thompson

    Scott – great discussion. You should be flattered by the amount of interest here!

    The conversation may have gotten sidetracked by the transport debate, but the only point I was really trying to make is that FCoE and I/O Virtualization can and should live happy lives together. In fact they do already, albeit in a limited way, via SR-IOV support on CNAs. But it is entirely possible that this can be expanded across many servers (a la Xsigo, Aprius etc) with all the management bells and whistles that companies like Xsigo have pioneered. We can have our cake and eat it too.

  19. Brad Hedlund

    Massimo / Scott,

    Yes, it is possible to have as many as 128 virtual adapters, in any combination of vNICs and vHBAs, on an FCoE-capable *Ethernet* adapter.
    One of several use cases for this adapter would be exactly what you described: a customer with multiple NICs in their server, maintaining that same adapter footprint on a consolidated 10GE adapter.

    I have written about these new Ethernet adapter technologies on my blog and Scott was kind enough to link to it in previous posts.

    (Scott, if I may do so again: http://www.internetworkexpert.org/2009/08/11/cisco-ucs-nexus-1000v-design-palo-virtual-adapter/ )

    The advantage here is that this provides extreme I/O virtualization on Ethernet, and when used in Cisco UCS, you can manage not only the virtual I/O but the entire configuration of the server, including the I/O, as well as the server’s upstream LAN/SAN settings, from a single management input.

    This goes back to my statement earlier that a market transition has already happened for complete integration of networking + compute. So a solution that just addresses I/O management alone is a day late and a dollar short.

    Cheers,
    Brad

  20. slowe

    Brad,

    The Palo adapter (and SR-IOV in general) does allow you to present multiple virtual devices, but it doesn’t change the fact that multiple types of traffic are still running on the same wire. Customers must still rely upon VLAN technologies to segregate the traffic, so your answer doesn’t really answer Massimo’s question, IMHO.

  21. Brad Hedlund

    Scott,

    Not true at all. I describe this in my article. The Palo adapter uses NIV tagging from the adapter to the upstream switch (UCS Fabric Interconnect). This allows you to manage each virtual adapter as if it were a physical adapter from a network perspective.

    For example, with Palo I could have 50 virtual adapters, none of which are doing any VLAN tagging, and they could each be connected to different VLANs on the upstream switch (UCS Fabric Interconnect).
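
    A rough way to picture that arrangement (illustrative Python only, not the actual VN-Tag/NIV implementation): the VLAN membership lives in a table on the upstream switch, keyed by the virtual interface, while the adapter itself sends untagged frames.

    ```python
    # Illustrative sketch of NIV-style placement: the adapter's vNICs send
    # untagged frames; the upstream switch decides VLAN membership per virtual
    # interface. Hypothetical data model, not the real VN-Tag implementation.

    # Virtual interface -> VLAN, configured on the upstream switch (not the host)
    vif_to_vlan = {f"vnic{i}": 100 + i for i in range(1, 51)}   # 50 vNICs, VLANs 101-150

    def classify(vif_id, frame):
        """Switch-side classification: the frame arrives untagged, identified only
        by the virtual interface it came from, and is placed in that VLAN."""
        return {"vlan": vif_to_vlan[vif_id], "frame": frame}

    print(classify("vnic7", b"untagged ethernet frame"))   # -> placed in VLAN 107
    ```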

    Cheers,
    Brad

  22. Omar Sultan

    Scott:

    So, if we are looking at something like:

    Server > Palo > Nexus 5K (FCF)

    are you asking about segregating storage traffic from data w/o VLANs or generally segregating any VIF traffic from any other VIF traffic without VLANs?

    Omar

  23. nate

    My thoughts on FCoE I posted to my own blog a couple months ago:
    http://www.techopsguys.com/2009/08/17/fcoe-hype/

    In a nutshell, I view FCoE as a high-priced scam. What I would like to see is converged Ethernet adapters that have embedded iSCSI and networking on the same adapter and can run multiple VLANs with varying frame sizes on the same port (yes, I want jumbo frames on my Ethernet storage network!). Add in virtualization à la HP Virtual Connect for even nicer connectivity options.

  24. Massimo Re Ferré

    Brad.

    Thanks.

    More than 2 years ago, after VMworld 2007, I posted a similar article on my blog (Scott won’t mind I hope: http://it20.info/blogs/main/archive/2007/10/30/75.aspx). Well that was about IB … we know now that if that is going to happen it will happen on 10Gbit Ethernet but the fundamental architecture is there. It’s not very different from your diagram from a logical perspective (BTW, you are an artist… love those drawings).

    You’ll forgive me, but I haven’t captured 100% from your article how you are going to (channel and explode) all the vNICs inside the Nexus architecture. I have searched for VLANs and found 18 occurrences (and that’s not good :-) ).

    The problem is… if a customer has 22 NICs (let’s assume 11 network security zones x 2 for redundancy), it’s very likely they have 11 disconnected “network devices” they need to connect to. What the “legacy” networking team may be concerned about is shortcutting all those physically segmented zones into a single device (no matter what it is).

    Well, in the end it does matter. If the “mapping” you do inside your architecture between vNICs and the external port of your shortcut device is based on VLANs… this is not going to work for all these customers (otherwise they would have already implemented VLANs and Port Groups at the VMware level, cutting those 22 NICs by 70-80% already).

    If the “mapping” you do inside your architecture is based on some other technology (not based on VLANs), then the networking team might only be worried… but you may have a chance to explain to them why your new method is better / more secure than VLANs.

    I entertained a very similar discussion with some HP folks on the VMware forum re their HP Virtual Connect. See here: http://it20.info/blogs/main/archive/2007/10/30/75.aspx (warning: long thread).

    Massimo.

  25. Brad Hedlund

    Massimo,

    Thank you for reading my post about NIV and the compliments. I appreciate that.

    You raise a very good point about security zones that are currently handled by physically separate networks. When I was writing the NIV post and drawing the diagrams for it, I had your exact scenario in the back of my mind, and I struggled with whether or not I should address it. I decided not to take it that far in order to err on the side of keeping things very *simple* so that the fundamental concepts of NIV were crystal clear.

    Getting back to your scenario, I believe it would be possible to have no dependency on VLANs for attaching the various vNICs to 11 different security zones. This would be enabled by a capability that is in Cisco UCS today called “LAN Pin Groups”.

    To briefly explain, in your scenario, I would have 11 distinct physical interfaces from the UCS Fabric Interconnect linking to their respective 11 different networks. Each one of these 11 links would be defined as a “LAN Pin Group”, creating 11 groups. Each of the 11 vNICs on the servers would be assigned one of the 11 pin groups, which would essentially force all traffic from that vNIC up its associated link with no dependency on or consideration of VLANs. I believe this would qualify, based on your comment, as something the network & security teams would be willing to listen to.
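
    To make the pin-group idea a bit more concrete, here is a minimal sketch (hypothetical Python, not actual UCS Manager objects or syntax) of the mapping described above:

    ```python
    # One uplink per security zone on the fabric interconnect, each defined as a
    # pin group; every vNIC is pinned to exactly one group. Hypothetical names only.
    uplinks = {f"pin-group-{i}": f"fabric-interconnect/eth1/{i}" for i in range(1, 12)}
    vnic_pinning = {f"vnic-zone-{i}": f"pin-group-{i}" for i in range(1, 12)}

    def northbound_port(vnic):
        """All northbound traffic from this vNIC is forced out its pinned uplink,
        independent of any VLAN configuration on the adapter."""
        return uplinks[vnic_pinning[vnic]]

    print(northbound_port("vnic-zone-4"))   # -> fabric-interconnect/eth1/4
    ```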

    Nonetheless, this does warrant a follow up post as another potential use case for NIV. Thanks to you, and Scott, for the inspiration to get blogging again this week!

    Cheers,
    Brad

  26. Paul

    Brad

    I’d be keen to see the concept of these pin groups explained, along with an idea of whether it allows me to securely collapse the cabling (i.e. pNIC) requirement – something VLANs (which were never meant as a security concept, even with pruning, etc.) cannot do.

  27. Jim Ensign

    Great discussion.

    Brad – I have yet to hear from you as to why VMware did not pick Cisco UCS to show all the capabilities that you assert, during VMworld 2009.

    We all get enamored with our own technologies and positions – but ultimately technology is meant to solve real world problems and tasks.

    I hope everyone recognizes that the positions we all take are all about our own center of knowledge and biases/support for one view or the other.

    My measure for success is when deployments happen, proof positive echoes are heard back from customers and momentum occurs in the marketplace. From my remote vantage point – I see Xsigo having a different thinking, tackling the issues of I/O Virtualization uniquely and most importantly by using existing building blocks and by NOT creating entirely new standards. For that I give them credit.

    Ciao

    Jim

  28. Brad Hedlund

    Massimo / Scott,

    Hmm, after thinking about the above 11-vNIC security scenario w/ NIV a little more, there still may be a VLAN component involved in the forwarding decision on the UCS Fabric Interconnect prior to “LAN Pin Group” logic being applied. The question now is: is that a bad thing?

    Just wanted to clarify that while I craft up more diagrams for another blog post dedicated to this scenario. Be sure to grab my RSS feed :)

    Cheers,
    Brad

  29. rodos

    Pin groups! Good spotting, Brad. It had not occurred to me (must be a bit slow) that for a Palo adapter, if you want to separate traffic on different vNICs without a VLAN, you are going to want to take that all the way northbound and have an uplink port for each, using a pin group to direct traffic. I suspect you can also do it the other way, using VLANs north but not south (for whatever use case you had).

    Great discussion everyone!

    Rodos

  30. Brad Hedlund

    Rodos,

    After drawing this out a few times I have realized that separate LAN Pin Groups will not serve as a traffic separation mechanism between two Palo vNICs at the UCS Fabric Interconnect level (contrary to what I said earlier).

    In other words, each Palo vNIC is assigned to a VLAN (just as any physical adapter would be), and the UCS Fabric Interconnect will locally switch unicast traffic between any two vNICs placed in the same VLAN regardless of any LAN PIN group settings. So, per Massimo’s scenario above, placing the 11 vNICs in separate VLANs would be needed to prevent local switching between those vNICs.

    Therefore, for a customer like Massimo describes, with 11 NICs in a server for the purpose of connecting to 11 separate physical switches for security reasons, that customer would need to get over the fear of VLANs to take advantage of the physical adapter consolidation and virtualization that NIV provides. I’m brewing up a post on this very scenario, stay tuned.
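
    A small sketch of the behaviour being described (hypothetical Python, not the actual Fabric Interconnect forwarding logic): pinning only governs northbound traffic, while two vNICs placed in the same VLAN are switched locally.

    ```python
    # Sketch of the behaviour described above: pin groups govern which uplink a
    # vNIC's northbound traffic uses, but vNICs in the same VLAN are switched
    # locally regardless. Hypothetical model, not real product code.
    vnic_vlan = {"vnic-A": 10, "vnic-B": 10, "vnic-C": 20}
    vnic_pin = {"vnic-A": "uplink-1", "vnic-B": "uplink-2", "vnic-C": "uplink-3"}

    def handle(src_vnic, dst_vnic):
        # same VLAN: the interconnect switches the frame locally,
        # regardless of which uplinks the two vNICs are pinned to
        if vnic_vlan[src_vnic] == vnic_vlan[dst_vnic]:
            return "switched locally"
        # different VLANs: no local switching; traffic simply leaves
        # northbound on the sender's pinned uplink
        return "sent northbound via " + vnic_pin[src_vnic]

    print(handle("vnic-A", "vnic-B"))   # same VLAN -> switched locally
    print(handle("vnic-A", "vnic-C"))   # different VLANs -> out uplink-1
    ```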

    Cheers,
    Brad

  31. Massimo Re Ferré

    Brad,

    sorry about that little question that boiled the ocean…..

    If that is the conclusion… then I don’t see what the advantage would be of using these new (cool) technologies vs. standard legacy VLAN and VMware Port Group implementations (should one get past the “VLAN is not a security boundary enforcement” fear).

    BTW my example was a bit of a stretch (real, but admittedly a stretch). Having said this, while many SMB customers would do with just a pair of teamed pNICs where they could collapse everything (COS + VMs + VMotion etc.), most enterprise customers are not so “flexible”. Sure, 22 NICs might be a stretch, but 8/10/12 NICs would be a common scenario, as they want to physically separate things like the management network (where the COS lives, and maybe HW management interfaces) from VMs (maybe multiple networks), from VMotion, etc.

    So this is not such an uncommon scenario among enterprise customers. The point I am trying to make is that there is a lot more to be done (and a lot more to be gained) in allowing customers to collapse 8/10/12+ separated networks onto a single trunk than there is in allowing customers to collapse 1 network segment with the FC storage protocol. After all, you are Cisco… if you don’t do that, who is going to? :-)

    I hope you’ll appreciate that this is not a vendor discussion but just a “geek round table” (yet with an eye on customers’ requirements, buying behaviors and internal politics/policies).

    Massimo.

    P.S. If I ever wondered whether one day I would join the Cisco workforce… well, after asking these questions I guess that is no longer open for discussion… :-D (just kidding).

  32. Brad Hedlund

    Massimo,

    Sorry? Cut that talk out, my friend; these are the kinds of discussions I absolutely *LOVE* — so, thank you! And thank you, Scott, for hosting it.

    I agree with you that many physical adapters can, in many cases, be consolidated to fewer adapters with VLANs. VMware is a perfect example, and customers are doing exactly that today in servers with 2 x 10GE NICs.

    My original point about NIV and the Cisco Palo adapter vNICs was that physical adapters can be consolidated without changing the number of adapters seen by the servers and network switches, and without changing VLAN configurations on any of the adapters. For example, your customer with 8/10/12 NICs can port that exact configuration into Cisco UCS w/ Palo adapters without needing to make a single change to that 8/10/12 NIC design, greatly easing the adoption of 10GE.

    Furthermore, imagine the scenario where each virtual machine gets its own physical adapter with VMDirectPath I/O. If I had an ESX host with 30 VMs and needed to install 30 separate physical adapters in the server to make this work, well, that simply would not work. However, with Cisco UCS + Palo NIV this scenario is a very real possibility: just provision 30 vNICs in the server’s Service Profile and off you go.

    Also consider that not every workload is the same: some can use 2 NICs, some might absolutely require more (Oracle RAC, for example). Running these varying workloads in a server pool where the number of adapters on each server can be dynamically provisioned, such as with Cisco UCS + Palo, allows for tremendous flexibility in where each workload can be located within that server pool. Once the physical hardware configuration of the server no longer matters, the customer can explore the possibilities of stateless computing.

    In summary, NIV and the malleable nature of I/O configurations with the Palo adapters in Cisco UCS afford the customer tremendous *flexibility* in porting existing multi-1GE configurations to 10GE today, and in adopting stateless computing tomorrow.

    Cheers,
    Brad

    P.S. You can join Cisco any time you want. I would be thrilled to have you on the team. Just let me know when you are serious about it :)

  33. Ariel Cohen

    It sounds like the conclusion is that UCS doesn’t enable you to consolidate from multiple 1GE NIC ports on different networks to a single 10GE NIC port without resorting to VLANs (Massimo’s question). I understand that you can transition to 10GE with VLANs, and that may be OK in some cases, but it may be viewed as a complication.

    It boils down to the fact that in the case of UCS the external devices are switches and not I/O virtualization systems. With an IOV system, vNICs and vHBAs can be terminated on different Ethernet and FC uplink ports as needed without imposing VLANs or VSANs, so the networking model doesn’t need to change.

    Having an external IOV system (vs. internal-to-the-server IOV like UCS) has other inherent benefits.

    For example, QoS control at the granularity of a vNIC or a vHBA (not an entire traffic class like FCoE).

    Another advantage is being able to change server I/O controllers without having to open the server. Want to move from 4Gb/s FC to 8Gb/s? No problem. Want to move from FC to iSCSI or from FC to FCoE (or vice versa)? Again, no need to crack open the servers and mess with cabling and edge switches.

    Beyond all this, there are also the central and uniform I/O management advantages of the external IOV model – it’s very hard to manage IOV within the servers due to its distributed nature and the plethora of environments: operating systems and CNAs.

    Ariel

  34. Massimo Re Ferré

    Brad,

    let’s put it this way. I’ll give you the benefit of the last word on this thread ;-)

    Keep up with the nice drawings.

    Massimo.

  35. Massimo Re Ferré

    Ariel,

    I don’t want to get into the UCS/Nexus Vs Xsigo battle.

    While doing what you describe with VLANs will leave many customers with a bad taste in their mouths… don’t take it for granted that trying to do this with a different technology will make it fly in every single account. The networking team (especially if in “security paranoid” mode) will be concerned about attaching and shortcutting their physically separated (by design) networks into a “single device”, no matter what the “device” is.

    Not using VLANs is an advantage (admittedly), but the challenge is also to bypass the “shortcut syndrome” in general.

    I didn’t call it out when I linked to my blog above, but… have a look at the first customer comment in the I/O virtualization post: (http://it20.info/blogs/main/archive/2007/10/30/75.aspx).

    Massimo.

  36. Jim Ensign

    Nice response Ariel.

    It means that isolation and security are inherent benefits of an IOV system like Xsigo’s – just by the mere fact of creating a vNIC or vHBA – and one does not have to jump through hoops to achieve this security… Correct?

    Jim

  37. Ariel Cohen

    Massimo,

    We’re in agreement. However, there are two angles to the VLAN/VSAN issue. You talked about the trust and security angle, but there’s also the angle of complexity and management. The internal IOV approach adds the requirement to manage VLANs and VSANs to achieve isolation. External IOV doesn’t impose this. If you didn’t have to worry about it before because you had separate ports connected to separate networks and SANs, why should it suddenly be forced upon you to try to map that to VLANs/VSANs? Not to mention that it’s not exactly a simple exercise, based on the diagrams that I’ve seen. The dual correlated tasks of IOV configuration on the CNAs within the server and the configuration of the external switches to achieve what you need are a complex undertaking…

    By the way, the comparison I’m making here is between internal IOV (virtualized CNAs connected to switches) and external IOV (channel adapters or bus extensions connected to an external IOV system). It’s not specific to UCS vs. Xsigo – those are just two examples of the different approaches. This question is not vendor-specific; it goes to the core of the underlying architectures. Similarly, my comments about QoS at the virtual I/O adapter level and unified central management are related to the underlying architecture.

    Ariel

  38. Brad Hedlund

    Ariel,

    In the interest of keeping the facts straight…

    Cisco UCS does provide granular QoS at the vNIC & vHBA level: http://bit.ly/4CpheW

    Cisco UCS does provide consolidation of multiple 1GE NICs into 10GE without requiring VLAN configuration changes at the adapter:
    http://www.internetworkexpert.org/2009/10/23/network-interface-virtualization-simple-example/

    Cheers,
    Brad

  39. Brad Hedlund

    Ariel,

    I’m curious: let’s talk a little bit more about Massimo’s security scenario again, if you don’t mind. Your response to Massimo makes it sound as if there is a 1:1 mapping of a vNIC to an external-facing port on the “external IOV”, with no local VLAN switching.

    Therefore, if that is true, and I had 40 servers, each server having 10 vNICs, and each vNIC needing to connect to a different physical network as Massimo described, that would require 400 ports on the “external IOV”, correct?

    Cheers,
    Brad

  40. Ariel Cohen

    Brad,

    Thanks for the details, but these details illustrate my points exactly.

    QoS: UCS simply provides the 8 classes of DCE traffic. While this is better than nothing, it is not a per-vNIC or per-vHBA QoS. For example, if I have 32 vNICs and 16 vHBAs on the server, with UCS I can map them to the 8 DCE classes, but I can’t assign separate individual bandwidth to each one of the 48 virtual adapters that I have on that server, which is something that I can do with the Xsigo I/O Director. In fact, the I/O Director goes even further than that by making it possible to carve out the bandwidth of each Ethernet and FC uplink port on a per vNIC and vHBA basis across the multiple servers that may be utilizing that port.
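
    The distinction being drawn here can be sketched roughly as follows (illustrative Python with made-up numbers, not either vendor’s actual configuration): class-based QoS maps many virtual adapters onto a handful of traffic classes, while per-adapter QoS gives each vNIC/vHBA its own committed and peak rate.

    ```python
    # class-based: 48 virtual adapters share 8 traffic classes (values made up)
    adapter_to_class = {f"vadapter-{i}": f"class-{i % 8}" for i in range(48)}
    sharing = sum(1 for c in adapter_to_class.values() if c == "class-5")

    # per-adapter: each virtual adapter carries its own committed/peak rate
    per_adapter_qos = {
        "vnic-web":     {"committed_gbps": 0.5, "peak_gbps": 2.0},
        "vnic-vmotion": {"committed_gbps": 1.0, "peak_gbps": 4.0},
        "vhba-oracle":  {"committed_gbps": 2.0, "peak_gbps": 8.0},
    }

    print(adapter_to_class["vadapter-13"], "shared by", sharing, "adapters")  # class-5, shared by 6
    print(per_adapter_qos["vhba-oracle"])   # its own guarantee and cap
    ```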

    Isolation without VLANs/vSANs: Based on your diagram, UCS still requires VLANs and VSANs on the fabric for isolation. Between the servers and the switches there is NIV tagging which ultimately associates the packets with VLANs and VSANs configured on the switches. The Xsigo I/O Director (and other external I/O virtualization solutions) does not impose VLANs or VSANs anywhere whatsoever. If you weren’t using them before for isolation because you have separate networks, you don’t need to use them after deploying the solution, and you don’t need to change your model.

    I think this is a good discussion which can serve to clarify the basic differences between the approaches. Thanks for engaging in this conversation!

    Best,

    Ariel

  41. Ariel Cohen

    Jim,

    Yes to isolation and yes to not having to jump through hoops. Those are indeed benefits of the external IOV architecture.

    You can keep your Ethernet and Fibre Channel networks the same as they are now. Separate Ethernet networks and SANs can remain separate – you’ll connect them to different ports on the I/O modules of the I/O Director. Then you will assign vNICs and vHBAs to these ports and to the servers where you want them to appear. It really is as simple as that. No NIV/VLAN/VSAN required for isolation.

    Ariel

  42. Ariel Cohen

    Brad,

    You would need one port for each separate physical network. Your example would require 10 I/O module ports (one for each of the 10 networks). Each port would have 40 vNICs associated with it (one for each server).

    You can assign more than one port per network if you need more bandwidth, of course.

    Ariel

  43. Massimo Re Ferré

    Ariel,

    In a way you are asking us not to see the external IOV equipment as a “device” which is connected southbound to servers and northbound to the legacy worlds (Ethernet and FC). Rather, you are asking us to look at the external IOV equipment as a “shared extension of the server internals”. That is… as opposed to having a dedicated legacy I/O subsystem PER EACH x86 server with 22 NICs and 2 HBAs… you are saying the external IOV equipment is a sort of EXTENDED (via the IB cable) and SHARED I/O subsystem FOR ALL x86 servers which provides the 22 NIC-like Ethernet connections and 2 FC-like connections to the legacies? You then map (at a low level) each virtual NIC / virtual HBA that you create on each server to the specific northbound port to connect to the legacy. This without using VLANs or anything like that.

    Better than using switches with VLANs, but it’s still possible that customers may show symptoms of the “shortcut syndrome”. You may say that if you don’t shortcut those security zones at the Director level, you will have to shortcut them at the server level anyway (after all, those 22 NICs are installed in a single device called an x86 server). I have heard this before and I somewhat agree with that point of view.

    I am focusing on the security aspect just because that’s what you need to convince customers of. They know that from a management perspective there are plenty of better solutions than having 8/10/12/22 NICs per server… they are just not sold on the security implications of using them.

    Good stuff.

    Massimo.

  44. Brad Hedlund

    Ariel,

    Just as I suspected!

    What you have just admitted with your answer is that your “I/O Director” device is in fact a network switch using VLAN forwarding and traffic separation mechanisms. The fact is, your solution handles Massimo’s scenario no differently than Cisco UCS would. Will you be retracting your incorrect statement, “Isolation without VLANs/VSANs”?

    I’m glad we had this discussion about “architecture differences”, because in reality we have learned there are no fundamental differences in the architecture. Your “I/O Director” device and the Cisco UCS Fabric Interconnect provide the same server network switching piece of the architecture.

    I have seen this time and time again: a vendor trying to mislead customers into believing a device is something other than a network switch out of fear the network team will get involved and compare the solution to Cisco. The two most recent examples of this are HP’s Virtual Connect, and now this, the Xsigo “I/O Director”. Once the customer figures out that you have in fact sold them a network switch, after telling them it wasn’t, you may begin to lose credibility.

    Based on this discussion, customers reading this should ask themselves this fundamental question:

    What network switching technology makes the most sense for my data center? Infiniband, or Ethernet?

    Cheers,
    Brad

  45. Massimo Re Ferré

    Brad,

    I thought Ariel was saying Xsigo doesn’t need VLANs to do that. Did I miss a piece?

    BTW I am glad I am becoming (actually “Massimo’s question” is becoming) the benchmark here…. :-)

    Massimo.

  46. Ariel Cohen

    Brad,

    Huh? Where on earth did you see me say that?

    I’m not sure if you truly don’t understand, or what’s going on, but here’s another attempt to explain.

    In the Xsigo solution, the vNICs and vHBAs are implemented OUTSIDE the servers in I/O modules on the I/O Director. IB acts as an I/O channel to connect servers to their external I/O devices just like a PCIe bus connects servers to their local I/O devices (and guess what, PCIe is not Ethernet either!)

    Different physical networks can be connected to different Ethernet and Fibre Channel ports on the I/O Director, and they remain isolated because the I/O Director doesn’t switch between them – it’s not an Ethernet or Fibre Channel switch – it’s a system for sharing I/O controllers. Multiple servers can share the same I/O module port within an I/O Director with full QoS control (committed and peak rates) at the granularity of an individual virtual I/O resource (vNIC and vHBA).

    Let me repeat – no VLAN/VSAN is needed to achieve isolation because networks connected to different I/O module ports on the I/O Director are already isolated due to the nature of the device as an I/O controller system and not an Ethernet or Fibre Channel switch. There is no forwarding between I/O Director Ethernet or FC ports, or between virtual I/O adapters assigned to different ports.
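
    A minimal sketch of the model being described (hypothetical Python, not Xsigo’s software), using Brad’s 40-server / 10-network example: each vNIC is bound to exactly one external port, and there is no forwarding path between ports.

    ```python
    # 10 external ports (one per physical network), 40 servers, one vNIC per
    # network per server; each vNIC is bound to exactly one external port.
    port_of_vnic = {}
    for net in range(1, 11):
        for srv in range(1, 41):
            port_of_vnic[f"srv{srv}-vnic{net}"] = f"io-module/port-{net}"

    def path(src_vnic, destination):
        """Traffic from a vNIC can only exit its own external port; there is no
        forwarding path between vNICs bound to different ports."""
        return f"{src_vnic} -> {port_of_vnic[src_vnic]} -> {destination}"

    print(len(set(port_of_vnic.values())))        # 10 external ports in total
    print(path("srv1-vnic3", "upstream network 3"))
    ```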

    The UCS architecture is the exact inverse. The vNICs and vHBAs are implemented on the CNAs INSIDE the servers. What you have outside the servers are switches (such as Nexus), not I/O Directors. This is why UCS needs to use VLANs/VSANs to limit traffic to specific ports – it’s how you carve out a switch into isolated pipes.

    There are other implications to the basic UCS architecture which I mentioned before: the limitations on what can be done with QoS (mapping vNICs and vHBAs to 8 DCE traffic classes rather than individual QoS per virtual adapter), and also management complexity implications due to the distributed nature of the solution.

    Ariel

  47. Ariel Cohen

    Massimo,

    The isolation security aspect really varies from customer to customer and application to application. Some are ok with VLANs. Some don’t like VLANs, but are ok with the isolation provided by an IOV device. Some are ok with having devices share a local bus within the server, but don’t want them to share an external interconnect (even if isolated as in IOV). Some don’t even like sharing a bus within the server. Then there are those who only trust unvirtualized servers which are shared-nothing (no shared KVM switches either), and put servers in isolated sealed and locked racks with security cameras pointing at the racks. I’ve encountered all of those…

    Regards,

    Ariel

  48. Ariel Cohen

    Massimo,

    One more point. I think your analysis is mostly on-target, but I differ a little bit. Separate physical networks are sometimes used for security, but I’ve seen them used more often for two other reasons.

    One is performance isolation. People are concerned about the performance interference effects of having all types of traffic on the same network.

    The second is that consolidation on the same networks with VLAN and VSAN partitioning throughout the fabrics is complex, with different management approaches taken by different switch vendors. It’s actually easier for people to understand and manage separate networks…

    Since an I/O Director just provides virtualized external I/O controllers, it is agnostic on this topic. If you use VLANs/VSANs, that’s fine. If you don’t, and you have separate physical networks, that’s fine too. If you have a mix, that works too.

    Internal-to-the-server I/O virtualization (such as UCS) doesn’t provide this flexibility. You need either multiple CNAs in each server going to different networks (which defeats the whole purpose), or you need to migrate to VLANs/VSANs, which may require re-architecting your data center (not just the edge server connectivity) if your approach was separate networks.

    Ariel

  49. Brad Hedlund

    Ariel,

    Let’s have a quick networking 101 discussion about my original question concerning 40 servers, each having 10 vNICs for 10 different networks, as this question and your answers get straight to the heart of the matter.

    1) You are claiming that each vNIC is mapped to an external Ethernet port with no local switching between vNICs.

    2) You are claiming that 40 servers, each having 10 vNICs for 10 different networks would only require 10 external ports on the “I/O Director” (one for each network that would be shared among vNICs associated to the same network).

    There is no way that #1 and #2 can be true at the same time. Here’s why…

    Let’s say that vNIC-1 in each server connects to Network-1, and Network-1 is mapped to Port-1 on the I/O Director. Port-1 on the I/O Director is connected to Ethernet-1 on the upstream Ethernet switch. This would be the topology if your claim #2 is true. Fair enough?

    OK, now what if Server-1 needs to send traffic to Server-2 on Network-1? If your claim #1 is true (no local switching), Server-1’s packet to Server-2 will be forwarded up to Ethernet-1 … and guess what … Ethernet-1 will DROP THE PACKET!

    Why does Ethernet-1 drop the packet? Because Ethernet-1 has also learned the MAC address for Server-2 on that same port, and a fundamental rule of Ethernet switching is that a switch will not forward a packet whose destination address was learned on the same port from which the packet was received.

    The only way your claim #2 works is if the Xsigo I/O Director provides local switching for traffic between Server-1 and Server-2.

    The only way your claim #1 works is if the Xsigo I/O Director had 400 external ports for this scenario, a 1:1 mapping of vNIC to external port.

    The fact is, Ariel, you are wrong about either claim #1 or claim #2 — which one is it?

    Let’s assume claim #2 is true and the “I/O Director” does provide local switching between vNICs. For local switching you would need to define in the “I/O Director” which vNICs on each server belong to the same network, and also define an external port mapping to that same network — that is the same concept as a VLAN.
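
    The switching rule invoked above is textbook transparent-bridge behaviour, sketched below in a few lines of Python (a generic learning switch, not any particular product): a frame is never forwarded back out the port it arrived on, so a switch that has learned both servers’ MACs on the same port cannot hairpin traffic between them.

    ```python
    # Minimal MAC-learning switch sketch. Textbook behaviour, not product code.
    class LearningSwitch:
        def __init__(self):
            self.mac_table = {}                   # MAC -> port it was learned on

        def receive(self, in_port, src_mac, dst_mac):
            self.mac_table[src_mac] = in_port     # learn the source
            out_port = self.mac_table.get(dst_mac)
            if out_port == in_port:
                return "filtered (destination lives on the ingress port)"
            if out_port is None:
                return "flooded to all ports except " + in_port
            return "forwarded out " + out_port

    sw = LearningSwitch()
    sw.receive("port-1", "mac-server-1", "mac-upstream")   # learn server-1 on port-1
    sw.receive("port-1", "mac-server-2", "mac-upstream")   # learn server-2 on port-1
    print(sw.receive("port-1", "mac-server-1", "mac-server-2"))
    # -> filtered: the upstream switch will not send it back down port-1
    ```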

    Cheers,
    Brad

