UCS Class Wrap-Up

Last week’s partner boot camp for the Cisco Unified Computing System (UCS) was very helpful; it gave me a much better understanding of the solution, how it works, and its advantages and disadvantages. I’d like to share some random bits of information I gathered during the class here in the hopes that they will serve as a useful add-on to the formal training. I’m sorry the thoughts aren’t better organized.

  • Although the UCS 6100 fabric interconnects are based on Nexus 5000 technologies, they are not the same. It would be best for you not to compare the two, or you’ll find yourself getting confused (I did, at least) because there are some things the Nexus 5000 will do that the fabric interconnects won’t do. Granted, some of these differences are the result of design decisions around the UCS, but they are differences nonetheless.
  • You’ll see the terms “northbound” and “southbound” used extensively throughout UCS documentation. Northbound traffic is traffic headed out of the UCS (out of the UCS 6100 fabric interconnects) to external Ethernet and Fibre Channel networks. Southbound traffic is traffic headed deeper into the UCS (from the UCS 6100 fabric interconnects down to the I/O modules in the chassis). You may also see references to “east-to-west” traffic; this is traffic moving laterally from chassis to chassis within a UCS.
  • For a couple of different reasons (reasons I will expand upon in future posts), there is no northbound FCoE or FC connectivity out of the UCS 6100 fabric interconnects. This means that you cannot hook your storage directly into the UCS 6100 fabric interconnects. This, in turn, means that purchasing a UCS alone is not a complete solution—customers need supporting infrastructure in order to install a UCS. That supporting infrastructure would include a Fibre Channel fabric and 10Gbps Ethernet ports.
  • Continuing the previous thought, this means that—with regard to UCS, at least—my previous assertion that there is no such thing as an end-to-end FCoE solution is true. (Read my correction article and you’ll see that I qualified the presence of end-to-end FCoE solutions as solutions that did not include UCS.)
  • The I/O Modules (IOMs) in the back of each chassis are fabric extenders, not switches. This is analogous to the Nexus 5000-Nexus 2000 relationship. (Again, be careful about the comparisons, though.) You’ll see the IOMs occasionally referred to as fabric extenders, or FEXs. As a result, there is no switching functionality in each chassis—all switching takes place within the UCS 6100 fabric interconnects. Some of the implications of this architecture include:
    1. All east-to-west traffic must travel through the fabric interconnects, even for east-to-west traffic between two blades in the same chassis.
    2. When you use the Cisco “Palo” adapter and start creating multiple virtual NICs and/or virtual HBAs, the requirement that all east-to-west traffic traverse the fabric interconnects applies to each individual vNIC. This means that east-to-west traffic between individual vNIC instances on the same blade must also travel through the fabric interconnects.
    3. This means that in ESX/ESXi environments using hypervisor bypass (VMDirectPath) with Cisco’s “Palo” adapter, inter-VM traffic between VMs on the same host must travel through the fabric interconnects. (This applies only when using hypervisor bypass, not when using a software switch such as the Nexus 1000V.)
  • Each IOM can connect to a single fabric interconnect only. You cannot uplink a single IOM to both fabric interconnects. For full redundancy, then, you must have both fabric interconnects and both IOMs in each and every chassis.
  • Each 10Gbps port on a blade connects to a single IOM. To use both ports on a mezzanine adapter, you must have both IOMs in the chassis; to have both IOMs in the chassis, you must have both fabric interconnects. This makes the initial cost much higher (because you have to buy everything), but incremental cost much lower.
  • If you want to use FCoE, you must purchase the Cisco “Menlo” adapter. This will provide both a virtual NIC (vNIC) and a virtual HBA (vHBA) for each IOM populated in the chassis (i.e., populate the chassis with a single IOM and you get one vNIC and one vHBA, use two IOMs and get two vNICs and two vHBAs).
  • If you use the Cisco “Oplin” adapter, you’ll get 10Gbps Ethernet only. There is no FCoE support; you would have to use a software-based FCoE stack.
  • The Cisco “Palo” adapter offers the ability to use SR-IOV to present multiple, discrete instances of vNICs and vHBAs. The number of instances is based on the number of uplinks from the IOMs to the fabric interconnects. The formula for calculating this number is 15 * (IOM uplinks) – 2. So, for two uplinks, you could create a total of 28 vNICs or vHBAs (any combination of the two, not 28 each); see the short worked example after this list.
  • Blades within a UCS are designed to be completely stateless; the full identity of the system can be assigned dynamically using a service profile. However, to take full advantage of this statelessness, organizations will also have to use boot-from-SAN. This further echoes the need for organizations to dramatically re-architect in order to really exploit the value of UCS.
  • There are Linux kernels embedded everywhere: in the blades’ firmware, in the firmware of the IOMs, in the chassis, and in the fabric interconnects. On the blades, this embedded Linux version is referred to as pnuOS. (At the moment, I can’t recall what it stands for. Sorry.)
  • In order to reconfigure a blade, the UCS Manager boots into pnuOS, reconfigures the blade, and then boots “normally.” While this is kind of cool, it also makes the reconfiguration of a blade take a lot longer than I expected. Frankly, I was a bit disappointed at the time it took to associate or de-associate a service profile to a blade.
  • To monitor the status of a service profile association or de-association, you’ll use the FSM (Finite State Machine) tab within UCS Manager.
  • You’ll need a separate block of IP addresses, presumably on a separate VLAN, with one address for each blade. These addresses are the management addresses for the blades. Cisco folks won’t like this analogy, but consider these the equivalent of Enclosure Bay IP Addressing (EBIPA) in the HP c7000 environment.
  • The UCS Manager software is written in Java. Need I say anything further?
  • UCS Manager uses the idea of a “service profile” to control the entire identity of the server. However, admins must be careful when creating and associating service profiles. A service profile that has two vNICs assigned would require a blade in a chassis with two IOMs connected to two fabric interconnects, and that service profile would fail to associate to a blade in a chassis with only a single IOM. Similarly, a service profile that defines both vNICs and vHBAs (assuming the presence of the “Menlo” or “Palo” adapters) would fail to associate to a blade with an “Oplin” adapter because the “Oplin” adapter doesn’t provide vHBA functionality. The onus is upon the administrator to ensure that the service profile is properly configured for the hardware. Once again, I was disappointed that the system was not more resilient in this regard.
  • Each service profile can be associated to exactly one blade, and each blade may be associated to exactly one service profile. To apply the same type of configuration to multiple blades, you would have to use a service profile template to create multiple, identical service profiles. However, a change to one of those service profiles will not affect any of the other service profiles cloned from the same template.
  • UCS Manager does offer role-based access control (RBAC), which means that different groups within the organization can be assigned different roles: the network group can manage networking, the storage group can manage the SAN aspects, and the server admins can manage the servers. This effectively addresses the concerns of some opponents that UCS places the network team in control.
  • While UCS supports some operating systems on the bare metal, it really was designed with virtualization in mind. ESX 4.0.0 (supposedly) installs out of the box, although I have yet to actually try that myself. The “Palo” adapter is built for VMDirectPath; in fact, Cisco makes a big deal about hypervisor bypass (that’s a topic I’ll address in a future post). With that in mind, some of the drawbacks—such as how long it takes to associate or de-associate a blade—become a little less pertinent.
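
As promised above, here is a quick worked example of the “Palo” vNIC/vHBA math. This is just a back-of-the-napkin sketch in Python of the 15 * (IOM uplinks) – 2 formula quoted in class; the function name and the uplink counts are mine, and the real limits will of course depend on what Cisco actually ships.

    # Back-of-the-napkin check of the "Palo" virtual interface formula from class:
    # combined vNICs + vHBAs available = 15 * (number of IOM uplinks) - 2.
    # The pool is shared between vNICs and vHBAs, not that many of each.
    def palo_virtual_interfaces(iom_uplinks: int) -> int:
        """Return the combined vNIC/vHBA count for a given number of IOM uplinks."""
        return 15 * iom_uplinks - 2

    for uplinks in (1, 2, 4):
        print(f"{uplinks} uplink(s): {palo_virtual_interfaces(uplinks)} virtual interfaces")
    # 1 uplink(s): 13 virtual interfaces
    # 2 uplink(s): 28 virtual interfaces
    # 4 uplink(s): 58 virtual interfaces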

I guess that about does it for now. I’ll update this post with more information as I recall it over the next few days. I also encourage other readers who have attended similar UCS events to share any additional points in the comments below.


  1. Brad Hedlund

    Scott,

    Great notes. I just have one comment to your statement: “The onus is upon the administrator to ensure that the service profile is properly configured for the hardware.”

    You can place some onus on UCS Manager by way of Blade pools and pool qualifications. For example, I can create a Blade pool called “ESX Servers” and have one of the qualifications of a blade being in that pool be that it has a “Menlo” or “Palo” adapter, among many other things. When a new blade is inserted into the UCS chassis it can be automatically inventoried and placed into all qualifying pools. As the administrator I can then create a service profile template called “ESX Server” based on the criteria for the “ESX Servers” pool.

    Granted, I need to make sure the requirements of my service profile template meet the qualifications of the pool I’m associating it to, but at least I only had to think about that once, not every time I need to deploy a new ESX Server in this case.

    Cheers,
    Brad

  2. rodos

    Scott, thanks for writing this up.

    To answer one of your questions, pnuOS is the “Processor Node Utility OS,” but it’s now called the “UCS Utility OS”; I’m not sure if it has an acronym.

    Rodos

  3. slowe

    Rodos, thanks! That does sound familiar now that you mention it.

  4. Dave Alexander

    Scott –

    A few additional points.

    The blade management IP addresses you refer to must be routable on the VLAN that the Fabric Interconnect management (mgmt0) ports are connected to. In practice, this means that they will very likely be in the same subnet as your mgmt0 address, any funky “multiple-subnets in a VLAN” solutions aside. These addresses are used for KVM-over-IP access to the blades, as well as the optional exposure of the blade hardware via IPMI. This happens completely outside of the blades’ “production” VLANs.

    Brad’s comments about using hardware pools are right on. In actual practice, you’re not very likely to deploy a service profile directly to a specific compute node; you’d be using pools. Even if you opted not to create discrete pools for your different functions, you would still create a single pool containing all of your hardware resources and deploy service profiles from that pool. UCS Manager would then select a compute node that meets all of the requirements of the service profile.

    I’ve heard some complaints as well about the amount of time required to associate a service profile with a compute node. Honestly, I’ve not seen any other solution out there that can take bare metal, configure the networking and SAN connectivity, set up virtualized identifiers (WWNs, MAC, UUID, etc), configure BIOS boot orders, and make it available for OS install in 5 minutes like the UCS can. Sure, if it did it in 30 seconds, that would be great – but doesn’t it beat the hours of doing it the “old” way? Even if you can do it in 5 minutes today with your existing solution, it breaks as soon as you move to another chassis… which doesn’t happen in UCS.

    The fact that all blade-to-blade traffic (or VM-to-VM in the case of Hypervisor Bypass) goes through the Fabric Interconnects is a *good* thing. A *very* good thing. The two major reasons for this are uniformity in network policy application and consistent latency between blades.

    By routing all traffic through the Fabric Interconnect, the latency from blade to blade is the same regardless of the chassis in which each resides. This reduces the workload of the server/network administrators in trying to plan for chatty applications to reside in the same chassis as with traditional solutions. If you need to move a service profile from one blade to another, say due to additional hardware requirements or maintenance, you again don’t have to consider the target chassis – the latency between two blades will be the same regardless of chassis. This of course assumes that no chassis is overloading its uplinks to the Fabric Interconnect, but that’s a discussion for another post. :)

    Finally, by ensuring that all traffic flows through the Fabric Interconnect, network administrators regain the ability to enforce network policy on all traffic in the data center – even blade to blade or VM-to-VM in the case of Hypervisor Bypass. This is a huge win for consistency, manageability, and compliance monitoring.

    Just my $0.02, hope it was helpful.

    - Dave Alexander

  5. Stuart Miniman

    Scott,
    Clarification on the adapters – at initial release there are four adapters: the “cost” solution of Intel (Oplin), which as you state does not currently support FCoE; the “compatibility” solutions of Emulex (M71KR-E) and QLogic (M71KR-Q), which support the same FC driver family as the equivalent FC HBAs and provide both FCoE and networking capability; and the “virtualization” solution of Cisco “Palo,” which you describe.

  6. slowe

    Dave,

    I appreciate your viewpoint. I do agree that the ability to completely assign the identity of a server is a great feature—I merely pointed out that I was disappointed in how long it took. Yes, it’s better than the “old” way. It still takes longer than I had expected. Perhaps this is due to the fact that you have to boot a Linux-based OS (pnuOS) every time you want to make that change. I’m sure that as further optimizations are made to the pnuOS that time will decrease.

    I will beg to differ—for now, at least—on whether having all traffic flow through the fabric interconnects is a good thing. That’s a discussion best saved for later, in a post of its own.

    Thanks for your 2 cents!

    Stu,

    At initial release there are 3 adapters, one of which has two flavors:
    - Oplin (“cost”)
    - Menlo (“compatibility”), which comes in an Emulex flavor or a QLogic flavor
    - Palo (“virtualization”)

    Most people I’ve talked to aren’t breaking out the different variants of Menlo, but rather treating them as a single type. I suppose it’s all in how you look at it.

    Thanks for your comment!

  7. Dave Alexander

    At first release, only the Oplin-based and Menlo-based mezzanine cards are available. The Palo chipset card will be coming a bit later… the exact timeframe is still up in the air, as far as I know. The Palo mezzanine cards exist and work… they’re just not being released in the initial batch of hardware.

    I’d love to be part of the “all traffic flows through the Fabric Interconnect” discussion. It’s actually a very similar discussion to the Cisco MDS architecture (whereby all traffic goes through the crossbar) versus the Brocade “switch on a chip” architecture (where some traffic is switched locally per ASIC). It’s a discussion I enjoy greatly. ;)

  8. Carl S.

    Full disclosure – I am an HP employee, but not in any way an official spokesperson.

    I just wanted to add a bit to this discussion based on some of the earlier comments I read.

    “I’ve not seen any other solution out there that can take bare metal, configure the networking and SAN connectivity, set up virtualized identifiers (WWNs, MAC, UUID, etc), configure BIOS boot orders, and make it available for OS install in 5 minutes like the UCS can.”

    HP’s Virtual Connect has had the bare hardware configuration capability for over two years already – including boot from SAN configuration. A blade can be configured from bare metal ready to accept an OS in less than 2 minutes, conservatively speaking. If you create the server profile before the blade even hits the chassis, the actual power-on “provisioning time” can be as short as it takes to insert a blade and power it on – very fast. HP innovation in this area is one of the reasons Virtual Connect has achieved great acceptance from customers in a short time.

    The recent Virtual Connect Flex10 introduction has just sweetened the pot a bit so we can now do 10GbE, pre-provision up to 8 FlexNICs per half-height server, and provide a ton of flexibility within existing core infrastructure – 10Gb or not.

    Scott, please keep your analyses coming – I enjoy reading them.

  9. Brad Hedlund

    Carl,

    To be fair, HP Virtual Connect does not provision BIOS and Adapter firmware or SAN/LAN network settings like UCS. When you move a Virtual Connect “Server Profile” from one blade to another there is no guarantee the new blade is running the same BIOS or Adapter firmware. Furthermore, there is no guarantee the upstream switch port is forwarding the proper VLANs or VSANs to make the migration successful. Since UCS is both the server and upstream switch, UCS is able to ensure this consistency in the provisioning process.

    So, even if HP Virtual Connect does provision in 2 minutes, versus 5 minutes for UCS which is doing a lot more, what difference does that make when you end up having to manually provision other important pieces such as server firmware bundles and upstream network settings?

    Cheers,
    Brad

    (vendor disclosure: Cisco Systems)

  10. Christopher Reed

    “If you want to use FCoE, you must purchase the Cisco “Menlo” adapter. This will provide both a virtual NIC (vNIC) and a virtual HBA (vHBA) for each IOM populated in the chassis (i.e., populate the chassis with a single IOM and you get one vNIC and one vHBA, use two IOMs and get two vNICs and two vHBAs).”

    I promise I am not nit-picking. I enjoy the reading, but the Palo also does FCoE. Keep up the good work. It might keep Steve Kaplin in line!

  11. slowe

    Christopher Reed,

    You are correct; Palo also does FCoE. Of course, I’m not nit-picking, either, but Palo isn’t available to customers yet, so Menlo is the only option customers have. ;-)

  12. Bobby

    Brad, as UCS “comes up to speed with Virtual Connect,” sure, UCS has some new things that I’m sure HP will be addressing (firmware and BIOS synchronizing sounds great!), but I think there is some education needed here: network administrators and other non-HP server customers should know about capabilities that have been out there for well over two years. I think Carl S. makes some great points. I’m reading a lot here, and there seems to be a lot of focus on the capabilities of FCoE, which isn’t a mainstream wing-to-wing solution today by any means. Convergence at the infrastructure (server rack and edge) is being addressed as phase 1, for sure; the phase 2/3/4 implementations, which address end-of-row and core switches, we must all admit won’t be around for at least a couple of years. Also, another “not there yet” technology is the VNTag solution. I think instead of focusing on something that requires a Nexus, Cisco should jump on the protocols that “everyone” will adopt and use: VEPA.

    Disclosure. I’m a happy HP customer.

  13. Bobby

    Scott,

    I just watched this video. I encourage all people looking at UCS to take an hour and watch this …

    http://hpbroadband.com/(S(jm0fp1zfyhx5e255enx0nw3q))/program.aspx?key=EnablingHardwareManagementJuly29

    -Bobby

  14. Tommi Salli

    “Also, another “not there yet” technology is the VNTag solution. I think instead of focusing on something that requires a Nexus, Cisco should jump on the protocols that “everyone” will adopt and use.”

    UCS uses VNTag technology internally, and using it does not require UCS to be connected to a Nexus. In fact, because VNTag is link-local only, it does not matter what you connect UCS to, nor is VNTag ever visible to the user or to the upstream switch.
    The only thing required to connect UCS to any environment is a 10Gb Ethernet port, and in the future you will be able to connect it to 1Gb ports as well.

    Disclosure: Cisco UCS TME

  15. Sal Collora

    For some screen shot videos and other UCS and FCoE/Nexus related material, please visit my website at http://salsdclist.com. Just click on UCS at the top of the screen and feel free to subscribe to RSS as I am constantly posting things.

    Disclosure: I am a happy Cisco employee.

  16. jermat

    I do not want to make this an argument about who has the better product. However, I think blog entries like this tend to turn comments into comparisons, and then eventually into arguments about who does what better.

    As a consumer of Egenera, HP, and Cisco compute platforms, let me just say that, in my opinion, no one other than Egenera is an innovator of these technologies. That said, when I read a blog entry by Pete Manca of Egenera about the launch of the Cisco UCS and saw the same type of conversation occurring, I had to comment. Pete had to post some snarky comments about how Egenera did this first and for many years. Well, at the end of the day it does not matter who does it first but who does it best.

    Bottom line is that many companies will be developing products with features that solve the same pain points for the customer. While they may do it nearly the same way, the result is intended to be identical: the mitigation of the pain point. Now, some companies will do this better than others, as is the case here. The example of service profiles and HP having done this for two years just really does not do complete justice to Cisco and the service profiles.

    The fact is this: in many ways the Cisco UCS service profiles and how they work are more comprehensive and less complex than HP’s stateless operation. Brad’s comment about the firmware is huge, and HP really falls flat by that single comparison. The only thing I have seen close to Cisco UCS and service profiles is the Egenera pServer model, which even falls short to some degree when compared to Cisco’s solution. The quality of the Cisco UCS for a 1.0 product speaks volumes, I think, as Cisco is coming late into an arena dominated by giants like HP and IBM, and from what I have seen it is a very legitimate product able to compete against the HP c-Class, which has become the de facto standard for blade servers.

    While it has its warts, I for one think the UCS is a great start, and frankly, what customer only buys from a single server vendor today? Now they have a real alternative from a company that brings a great name and reputation to the table. Cisco is no Egenera, and to that end I would simply say to happy HP customers, "Keep your eyes and minds open."

  17. Carl S.

    All,

    I’d really like to hear some more detail on your experience around how service profiles work. Some of this may be redundant from other postings, but I had so many questions that I figured I would throw them all out there.

    From what I gather, there is exactly one service profile per blade (similar to HP VC). In addition to the MAC and LAN information, does the profile also include the WWNs (or are those handled differently?), and firmware images or something like that? Are the WWNs part of the profile regardless of the mezzanine card? It seems that with Menlo and Palo there would be different profile capabilities in that regard.

    Is the firmware image part of the service profile? How does option-specific firmware get installed on the blade? What I am trying to understand is how the firmware update mentioned earlier is part of the service profile, and how it gets applied.

    In the case of different blades and different I/O adapters, does there have to be a service profile template for each hardware combination? What if a specific set of blades requires a different firmware level than others? Can UCS Manager handle multiple templates with different firmware versions?

    When firmware needs to be updated for a bunch of blades all at once, does each service profile need to be modified, or is there a “one button” upgrade? This is all a very intriguing concept; I don’t want to assume how anything really works without hearing from a subject matter expert.

  18. gnijs

    It’s great to discuss the technical details (I like that very much), but it all comes down to this: Cisco sells servers; HP sells servers and so much more (server management, deployment, provisioning, and maintenance tools). Cisco is still lacking here. UCS is just a BIOS programming tool. It will not patch my servers, do server software inventory, alert me when a server is running at 100% CPU, warn me when a disk has crashed, or do anything at the OS level. I hope you agree UCS is not really targeted at “small” customers, but more at “larger” customers. Do you think that customers running 100+, 200+, 300+ servers are going to manage each server individually? If Cisco doesn’t have the software themselves, do they provide agents to integrate with other third-party software? Does Cisco have HP Insight agents for Cisco blades? I am sure UCS will be a fantastic solution… within one or two years. Try installing a server with four NICs today – oops, the full-width server with two CNAs isn’t available yet, and the “Palo” CNA isn’t either. So?

  19. Dave Chapman

    gnijs wrote, “If Cisco doesn’t have the software themselves, do they provide agents to integrate with other third-party software? Does Cisco have HP Insight agents for Cisco blades?”

    Cisco knew from the beginning that being all things to all people with regard to management is an inherently unwinnable prospect. UCS includes a comprehensive and open XML API to facilitate the development and integration of third-party tools. Anything you can do in the UCS Manager can also be done via the XML API.

    http://www.cisco.com/en/US/products/ps10281/products_programming_reference_guides_list.html
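
    For anyone curious, here is a rough sketch in Python of what driving that XML API from a script might look like. The aaaLogin and configResolveClass method names are taken from the programming reference linked above, but the UCS Manager hostname, the credentials, and the crude string parsing of the response are placeholders only, so treat this as an illustration rather than working tooling.

        import urllib.request

        UCSM_URL = "https://ucs-manager.example.com/nuova"  # placeholder UCS Manager address

        def xml_call(body: str) -> str:
            # POST a raw XML document to UCS Manager and return the XML response as text.
            # (Certificate validation and error handling are omitted for brevity.)
            request = urllib.request.Request(
                UCSM_URL, data=body.encode("utf-8"), headers={"Content-Type": "text/xml"}
            )
            with urllib.request.urlopen(request) as response:
                return response.read().decode("utf-8")

        # Log in and pull the session cookie out of the response (crude parsing for brevity).
        login_xml = xml_call('<aaaLogin inName="admin" inPassword="password" />')
        cookie = login_xml.split('outCookie="')[1].split('"')[0]

        # Ask UCS Manager for every compute blade it knows about.
        blades_xml = xml_call(
            f'<configResolveClass cookie="{cookie}" classId="computeBlade" inHierarchical="false" />'
        )
        print(blades_xml)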

    BMC has been working with Cisco since the early days of UCS, and BladeLogic for UCS provides automation, OS provisioning, application provisioning, and patch management. My description of BladeLogic’s capabilities is a very short sampling of what it can do.

    There is no need to wait 1-2 years for an enterprise-scalable solution. You can order it today.

  20. Craig

    I think the UCS is targeted at large-scale virtualization. When your guest utilization hits the threshold, the alert will be generated from vCenter automatically. There are improvements required for Cisco to monitor their hosts, but again, that should be the monitoring tool companies’ core business. I do not think every server vendor will want to do everything by themselves; most of the time they allow technology alliance partners to address some of the pieces that are not their core technology.

  21. Martin

    Scott – who was the instructor?

  22. Tejo Prayaga

    Great post, Scott!

    Just wanted to add some clarification for this point

    >>>
    However, a change to one of those service profiles will not affect any of the other service profiles cloned from the same template.
    >>>

    UCS has two variants of Service Profile Templates

    1) Initial Template
    2) Updating Template

    If a service profile is created from an "initial template," it will get all the properties from the template but remain detached from it; i.e., any later changes to the template are not applied to the service profile.

    However, if the service profile is created from an "updating template," it will get all the properties from the template and remain attached to it. Any later changes to the template will apply to all the service profiles that used this template as an "updating template."

    Disclosure: Cisco employee

  23. Martin

    Carl>
    There is really no magic at all. The main thing is that besides all these pools (MAC, pWWN, nWWN, UUID, and also compute nodes, i.e., physical blades) there are also many policies, which can restrict profile assignment and will choose only a suitable compute node. Policies can be based on memory, CPU sockets, firmware versions, and mezzanine card type – or any combination. Such a policy can then also be applied to a service profile. Regarding firmware, there is a firmware store on the UCS fabric interconnect from which you can choose your desired firmware versions; these will be applied during the association process, and the state is applied via the pnuOS described in the article. I must say I was really impressed by such simplicity and efficiency in helping out an administrator.
