FCoE


Vendor Meetings at VMworld 2013

This year at VMworld, I wasn’t in any of the breakout sessions because employees aren’t allowed to register for breakout sessions in advance; we have to wait in the standby line to see if we can get in at the last minute. So, I decided instead to meet with some vendors that seemed interesting and get some additional information on their products. Here’s a write-up on some of the vendor meetings I attended while in San Francisco.

Jeda Networks

I’ve mentioned Jeda Networks before (see here), and I was pretty excited to have the opportunity to sit down with a couple of guys from Jeda to get more details on what they’re doing. Jeda Networks describes themselves as a “software-defined storage networking” company. Given my previous role at EMC (involved in storage) and my current role at VMware focused on network virtualization (which encompasses SDN), I was quite curious.

Basically, what Jeda Networks does is create a software-based FCoE overlay on an existing Ethernet network. Jeda accomplishes this by actually programming the physical Ethernet switches (they have a series of plug-ins for the various vendors and product lines; adding a new switch just means adding a new plug-in). In the future, when OpenFlow or its derivatives become more ubiquitous, I could see using those control plane technologies to accomplish the same task. It’s a fascinating idea, though I question how valuable a software-based FCoE overlay is in a world that seems to be rapidly moving everything to IP. Even so, I’m going to keep an eye on Jeda to see how things progress.

Diablo Technologies

Diablo was a new company to me; I hadn’t heard of them before their PR firm contacted me about a meeting while at VMworld. Diablo has created what they call Memory Channel Storage, which puts NAND flash on a DIMM. Basically, it makes high-capacity flash storage accessible via the CPU’s memory bus. To take advantage of high-capacity flash on the memory bus, Diablo supplies drivers for all the major operating systems (OSes), including ESXi; these drivers modify the way that page swaps are handled. Instead of page swaps moving data from memory to disk—as would be the case in a traditional virtual memory system—the page swaps happen between DRAM on the memory bus and Diablo’s flash on the memory bus. This means that page swaps are extremely fast (on the order of microseconds, not the milliseconds typically seen with disks).

To use the extra capacity, then, administrators must essentially “overcommit” their hosts. Say your hosts had 64GB of (traditional) RAM installed, but 2TB of Diablo’s DIMM-based flash installed. You’d then allocate 2TB of memory to VMs, and the hypervisor would swap pages at extremely high speed between the DRAM and the DIMM-based flash. At that point, the system DRAM almost looks like another level of cache.

This “overcommitment” technique could have some negative effects on existing monitoring systems that are unaware of the underlying hardware configuration. Memory utilization would essentially run at 100% constantly, though the speed of the DIMM-based flash on the memory bus would mean you wouldn’t take a performance hit.

In the future, Diablo is looking for ways to make their DIMM-based flash appear to an OS as addressable memory, so that the OS would just see 3.2TB (or whatever) of RAM, and access it accordingly. There are a number of technical challenges there, not the least of which is ensuring proper latency and performance characteristics. If they can resolve these technical challenges, we could be looking at a very different landscape in the near future. Consider the effects of cost-effective servers with 3TB (or more) of RAM installed. What effect might that have on modern data centers?

HyTrust

HyTrust is a company with whom I’ve been in contact for several years now (since early 2009). Although HyTrust has been profitable for some time now, they recently announced a new round of funding intended to help accelerate their growth (though they’re already on track to quadruple sales this year). I chatted with Eric Chiu, President and founder of HyTrust, and we talked about a number of areas. I was interested to learn that HyTrust had officially productized a proof-of-concept from 2010 leveraging Intel’s TPM/TXT functionality to perform attestation of ESXi hypervisors (this basically means that HyTrust can verify the integrity of the hypervisor as a trusted platform). They also recently introduced “two man” support; that is, support for actions to be approved or denied by a second party. For example, an administrator might try to delete a VM, but that deletion would need to be approved by a second party before it is allowed to proceed. HyTrust also continues to explore other integration points with related technologies, such as OpenStack, NSX, physical networking gear, and converged infrastructure. Be sure to keep an eye on HyTrust—I think they’re going to be doing some pretty cool things in the near future.

Vormetric

Vormetric interested me because they offer a data encryption product, and I was interested to see how—if at all—they integrated with VMware vSphere. It turns out they don’t integrate with vSphere at all, as their product is really more tightly integrated at the OS level. For example, their product runs natively as an agent/daemon/service on various UNIX platforms, various Linux distributions, and all recent versions of Windows Server. This gives them very fine-grained control over data access. Given their focus is on “protecting the data,” this makes sense. Vormetric also offers a few related products, like a key management solution and a certificate management solution.

SimpliVity

SimpliVity is one of a number of vendors touting “hyperconvergence,” which—as far as I can tell—basically means putting storage and compute together on the same node. (If there is a better definition, please let me know.) In that regard, they could be considered similar to Nutanix. I chatted with one of the designers of the SimpliVity OmniCube. SimpliVity uses VM-based storage controllers that leverage VMDirectPath for accelerated access to the underlying hardware, and those controllers present that hardware back to the ESXi nodes as NFS storage. Their file system—developed during the 3 years they spent in stealth mode—abstracts away the hardware so that adding OmniCubes means adding both capacity and I/O (as well as compute). They use inline deduplication not only to reduce capacity consumption, but more importantly to avoid having to write I/Os to the storage in the first place. (Capacity isn’t usually the issue; I/Os are typically the issue.) SimpliVity’s file system enables fast backups and fast clones; although they didn’t elaborate, I would assume they are using a pointer-based system (perhaps even an optimized content-addressed storage [CAS] model) that keeps them from having to copy large amounts of data around the system. This is what enables them to do global deduplication, backups from any system to any other system, and restores from any system to any other system (system here referring to an OmniCube).

In any case, SimpliVity looks very interesting due to its feature set. It will be interesting to see how they develop and mature.

SanDisk FlashSoft

This was probably one of the more fascinating meetings I had at the conference. SanDisk FlashSoft is a flash-based caching product that supports various OSes, including an in-kernel driver for ESXi. What made this meeting interesting was that SanDisk brought out one of the key architects behind the solution, who went through their design philosophy and the decisions they’d made in their architecture in great detail. It was a highly entertaining discussion.

More than just entertaining, though, it was really informative. FlashSoft aims to keep their caching layer as full of dirty data as possible, rather than seeking to flush dirty data right away. The advantage this offers is that if another change to that data comes, FlashSoft can discard the earlier change and only keep the latest change—thus eliminating I/Os to the back-end disks entirely. Further, by keeping as much data in their caching layer as possible, FlashSoft has a better ability to coalesce I/Os to the back-end, further reducing the I/O load. FlashSoft supports both write-through and write-back models, and leverages a cache coherency/consistency model that allows them to support write-back with VM migration without having to leverage the network (and without having to incur the CPU overhead that comes with copying data across the network). I very much enjoyed learning more about FlashSoft’s product and architecture. It’s just a shame that I don’t have any SSDs in my home lab that would benefit from FlashSoft.
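To make the write-coalescing idea a bit more concrete, here is a deliberately simplified sketch of a write-back cache that absorbs repeated writes to the same block and only ever flushes the newest version. It is purely an illustration of the general technique and makes no claim about how FlashSoft actually implements its cache:

    # Toy write-back cache illustrating write coalescing; NOT FlashSoft's implementation
    class WriteBackCache:
        def __init__(self, backend, flush_threshold=1024):
            self.backend = backend        # any object with a write(block_id, data) method
            self.dirty = {}               # block_id -> latest data; older dirty versions are discarded
            self.flush_threshold = flush_threshold

        def write(self, block_id, data):
            # A newer write simply replaces the older dirty copy, so the earlier
            # version never generates an I/O to the back-end disks at all
            self.dirty[block_id] = data
            if len(self.dirty) >= self.flush_threshold:
                self.flush()

        def flush(self):
            # Flushing in sorted order gives the back-end a chance to coalesce
            # adjacent blocks into larger sequential I/Os
            for block_id in sorted(self.dirty):
                self.backend.write(block_id, self.dirty[block_id])
            self.dirty.clear()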

SwiftStack

My last meeting of the week was with a couple folks from SwiftStack. We sat down to chat about Swift, SwiftStack, and object storage, and discussed how they are seeing the adoption of Swift in lots of areas—not just with OpenStack, either. That seems to be a pretty common misconception (that OpenStack is required to use Swift). SwiftStack is working on some nice enhancements to Swift that hopefully will show up soon, including erasure coding support and greater policy support.

Summary and Wrap-Up

I really appreciate the time that each company took to meet with me and share the details of their particular solution. One key takeaway for me was that there is still lots of room for innovation. Very cool stuff is ahead of us—it’s an exciting time to be in technology!


This is session CLDS006, “Exploring New Intel Xeon Processor E5 Based Platform Optimizations for 10 Gb Ethernet Network Infrastructures.” That’s a long title! The speakers are Brian Johnson from Intel and Alex Rodriguez with Expedient.

The session starts with Rodriguez giving a (thankfully) brief overview of Expedient and then getting into the evolution of networking with 10 Gigabit Ethernet (GbE). Rodriguez provides the usual “massive growth” numbers that necessitated Expedient’s relatively recent migration to 10 GbE in their data center. As a provider, Expedient has to balance five core resources: compute, storage (capacity), storage (performance), network I/O, and memory. Expedient found that migrating to 10 GbE actually “unlocked” additional performance headroom in the other resources, which wasn’t expected. Using 10 GbE also matched upgrades in the other resource areas (more CPU power, more RAM through more slots and higher DIMM densities, larger capacity drives, and SSDs).

Rodriguez turns the session over to Brian Johnson, who will focus on some of the specific technologies Intel provides for 10 GbE environments. After briefly discussing various form factors for 10 GbE connectivity, Johnson moves into a discussion of some of the I/O differences between Intel’s 5500/5600 processors and the E5 processors. The E5 processors integrate PCI Express root ports, providing upwards of 200 Gbps of throughput. This is compared to the use of the “Southbridge” with the 5500/5600 series CPUs, which were limited to about 50 Gbps.

Integrated I/O in the E5 CPUs has also allowed Intel to introduce Intel Data Direct I/O (DDIO). DDIO allows PCIe devices to DMA data directly into the CPU cache (instead of main memory), where it can then be fetched by a processor core. This reduces memory transactions and, as a result, improves performance. The end result is that the E5 CPUs can support more throughput on more ports than previous-generation CPUs (up to 150 Gbps across 16 10 GbE ports with an E5-2600 CPU).

Johnson also points out that the use of AES-NI helps with the performance of encryption, and turns the session back over to Rodriguez. Rodriguez shares some details on Expedient’s experience with Intel AES-NI, 10 GbE, and DDIO. In some tests that Expedient performed, throughput increased from 5.3 Gbps at ~91% CPU utilization with a Xeon 5500 (no AES-NI) to 33.3 Gbps at ~79% CPU utilization on an E5-2600 with AES-NI support. (These tests were 256-bit SSL tests with OpenSSL.)
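If you want a rough feel for the AES-NI effect on your own hardware, OpenSSL’s built-in benchmark is an easy (if unscientific) way to compare; results vary widely by CPU and OpenSSL build, so treat the commands below as illustrative only:

    # The legacy code path generally does not use AES-NI; the EVP path does when the
    # CPU and the OpenSSL build support it -- compare the reported throughput of the two
    openssl speed aes-256-cbc
    openssl speed -evp aes-256-cbc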

Rodriguez shares some of the reasons why Expedient felt 10 GbE was the right choice for their data center. Using 1 GbE would have required too many ports, too many cables, and too many switches; 10 GbE offered Expedient a 23% reduction in cables and ports, a 14% reduction in infrastructure costs, and a significant bandwidth improvement (compared to the previous 1 GbE architecture).

Next the presentation shifts focus a little bit to discuss FCoE. Rodriguez goes over the reasons that Expedient is evaluating FCoE for their data center. Expedient is looking to build the first Cat6a-based 10GBase-T FCoE environment leveraging FC-BB-6 and VN2VN standards.

Johnson takes back over again to discuss some of the specific technical items behind Expedient’s FCoE initiative. Johnson shows a great diagram that reviews all the various types of VM-to-VM communications that can exist in modern data centers:

  • VM-to-VM (on the same host) via the software-based virtual switch (could be speeds of 30 to 40 Gbps in this use case)
  • VM-to-VM (on the same host) via a hardware-based virtual switch in an SR-IOV network interface card (NIC)
  • VM-to-VM (on a different host) over a traditional external switch

One scenario that Johnson didn’t cover was VM-to-VM traffic (on different hosts) over a fabric extender (interface virtualizer) environment, such as a Cisco Nexus 2000 connected up to a Nexus 5000 (there are some considerations there; I’ll try to discuss those in a separate post).

Intel VT-c actually provides a couple of different ways to work in virtualized environments. VMDq can provide a hardware assist when the hypervisor softswitch is involved, or you can use hypervisor bypass and SR-IOV to attach VMs directly to VFs (virtual functions). Johnson shows that the E5 processor provides higher throughput at lower CPU usage with VMDq compared to a Xeon 5500 CPU (tests were done using an Intel X520 with VMware ESXi 5.0). Using SR-IOV (support for which is included in vSphere 5.1 as well as in Microsoft Windows Server 2012 and Hyper-V) allows VMware customers to use DirectPath I/O to assign VMs directly to a VF, bypassing the hypervisor. (Note that there are trade-offs as a result.) In this case, the switching is done in hardware in the SR-IOV NIC. The use of SR-IOV shows dramatic improvements in throughput with small packet sizes as well as significant reductions in CPU utilization. Because of the trade-offs associated with SR-IOV (no hypervisor intervention, no vMotion on vSphere, etc.), it’s not a great general-purpose solution. It is, however, very well-suited to workloads that need predictable performance and that work with lots of small packets (firewalls, load balancers, other network devices).
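As an aside, if you want to experiment with SR-IOV on an Intel X520 under vSphere 5.1, the virtual functions are typically enabled through the ixgbe driver’s module parameters. The sketch below is from memory and the values are examples only, so verify the parameter name and format against Intel’s and VMware’s documentation for your specific NIC and build:

    # Hedged example: request 8 VFs on each of two ixgbe-based ports, then reboot the host
    esxcli system module parameters set -m ixgbe -p "max_vfs=8,8"
    esxcli system module parameters list -m ixgbe    # confirm the setting took
    # after the reboot, assign VMs to the resulting VFs via DirectPath I/O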

Going back to the earlier discussion about PCIe root ports being integrated into the E5 CPUs, this leads to a consideration for the placement of PCIe cards. Make sure your high-speed cards aren’t inserted in a slot that runs through the C600 chipset southbridge, make sure that you are using a Gen2 x8 slot, and make sure that the slot is actually wired to support a x8 card (some slots on some systems have a x8 connector but are only wired for x4 throughput). Johnson recommends using either the LoM, slot 2, slot 3, or slot 5 for 10 GbE PCIe NICs; this ensures a direct connection to one of the CPUs rather than to the southbridge chipset.
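If you’re not sure how a particular slot is actually wired, checking the negotiated PCIe link width from the operating system is a reasonable sanity check. Here’s a hedged example using lspci on a generic Linux host (the PCI address is a placeholder, and ESXi’s tooling differs, so treat this purely as an illustration):

    # Compare what the card supports (LnkCap) with what the slot negotiated (LnkSta);
    # a x8 card reporting a x4 link status suggests a down-wired slot
    lspci -vv -s 41:00.0 | grep -E "LnkCap|LnkSta"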

Johnson next transitions into a discussion of VF failover using NIC teaming software. There’s a ton of knowledge disclosed (too much for me to capture; I’ll try to do a separate blog post on it). The key takeaway: don’t use NIC teaming in the guest when using SR-IOV VFs, or else traffic patterns could vary dramatically and create unpredictable results without very careful planning. Johnson also mentions DPDK; see this post for more details.

At this point, Johnson wraps up the session with a summary of key Intel initiatives with regard to networking (optimized drivers and initiators, intelligent use of offloads, everything based on open standards) and then opens the floor to questions.


Welcome to Technology Short Take #23, another collection of links and thoughts related to data center technologies like networking, storage, security, cloud computing, and virtualization. As usual, we have a fairly wide-ranging collection of items this time around. Enjoy!

Networking

  • A couple of days ago I learned that there are a couple open source implementations of LISP (Locator/ID Separation Protocol). There’s OpenLISP, which runs on FreeBSD, and there’s also a project called LISPmob that brings LISP to Linux. From what I can tell, LISPmob appears to be a bit more focused on the endpoint than OpenLISP.
  • In an earlier post on STT, I mentioned that STT’s re-use of the TCP header structure could cause problems with intermediate devices. It looks like someone has figured out how to allow STT through a Cisco ASA firewall; the configuration is here.
  • Jose Barreto posted a nice breakdown of SMB Multichannel, a bandwidth-enhancing feature of SMB 3.0 that will be included in Windows Server 2012. It is, naturally, only supported between two SMB 3.0-capable endpoints (which, at this time, means two Windows Server 2012 hosts). Hopefully additional vendors will adopt SMB 3.0 as a network storage protocol. Just don’t call it CIFS!
  • Reading this article, you might deduce that Ivan really likes overlay/tunneling protocols. I am, of course, far from a networking expert, but I do have to ask: at what point does it become necessary (if ever) to move some of the intelligence “deeper” into the stack? Networking experts everywhere advocate the “complex edge-simple core” design, but does it ever make sense to move certain parts of the edge’s complexity into the core? Do we hamper innovation by insisting that the core always remain simple? As I said, I’m not an expert, so perhaps these are stupid questions.
  • Massimo Re Ferre posted a good article on a typical VXLAN use case. Read this if you’re looking for a more concrete example of how VXLAN could be used in a typical enterprise data center.
  • Bruce Davie of Nicira helps explain the difference between VPNs and network virtualization; this is a nice companion article to his colleague’s post (which Bruce helped to author) on the difference between network virtualization and software-defined networking (SDN).
  • The folks at Nicira also collaborated on this post regarding software overhead of tunneling. The results clearly favor STT (which was designed to take advantage of NIC offloading) over GRE, but the authors do admit that as “GRE awareness” is added to the cards that protocol’s performance will improve.
  • Oh, and while we’re on the topic of SDN…you might have noticed that VMware has taken to using the term “software-defined” to describe many of the services that vSphere (and related products) provide. This includes the use of software-defined networking (SDN) to describe the functionality of vSwitches, distributed vSwitches, vShield, and other features. Personally, I think that the term software-based networking (SBN) is far more applicable than SDN to what VMware does. Is it just me?
  • Brad Hedlund wrote this post a few months ago, but I’m just now getting around to commenting about it. The gist of the article—forgive me if I munge it too much, Brad—is that the use of open source software components might dramatically change the way in which networking protocols and standards are created and utilized. If two components are communicating over the network via open source components, is some sort of networking standard needed to avoid being “proprietary”? It’s an interesting thought, and it goes to show the influence of open source on the IT industry. Great post, Brad.
  • One more mention of OpenFlow/SDN: it’s great technology (and I’m excited about the possibilities that it creates), but it’s not a silver bullet for scalability.

Security

  • I came across this interesting post on a security attack based on VMDKs. It’s quite an interesting read, even if the probability of being able to actually leverage this attack vector is fairly low (as I understand it).

Storage

  • Chris Wahl has a good series on NFS with VMware vSphere. You can catch the start of the series here. One comment on the testing he performs in the “Same Subnet” article: if I’m not mistaken, I believe the VMkernel selection is based upon which VMkernel interface is listed in the first routing table entry for the subnet. This is something about which I wrote back in 2008, but I’m glad to see Chris bringing it to light again.
  • George Crump published this article on using DCB to enhance iSCSI. (Note: The article is quite favorable to Dell, and George discloses an affiliation with Dell at the end of the article.) One thing I did want to point out is that—if I recall correctly—the 802.1Qbb standard for Priority Flow Control only defines a single “no drop” class of service (CoS). Normally that CoS is assigned to FCoE traffic, but in an environment without FCoE you could assign it to iSCSI. In an environment with both, that could be a potential problem, as I see it. Feel free to correct me in the comments if my understanding is incorrect.
  • Microsoft is introducing data deduplication in Windows Server 2012, and here is a good post providing an introduction to Microsoft’s deduplication implementation.
  • SANRAD VXL looks interesting—anyone have any experience with it? Or more detailed technical information?
  • I really enjoyed Scott Drummonds’ recent storage performance analysis post. He goes pretty deep into some storage concepts and provides real-world, relevant information and recommendations. Good stuff.

Cloud Computing/Cloud Management

  • After moving CloudStack to the Apache Software Foundation, Citrix published this discourse on “open washing” and provides a set of questions to determine the “openness” of software projects with which you may become involved. While the article is clearly structured to favor Citrix and CloudStack, the underlying point—to understand exactly what “open source” means to your vendors—is valid and worth consideration.
  • Per the AWS blog, you can now export EC2 instances out of Amazon and into another environment, including VMware, Hyper-V, and Xen environments. I guess this kind of puts a dent in the whole “Hotel California” marketing play that some vendors have been using to describe Amazon.
  • Unless you’ve been hiding under a rock for the past few weeks, you’ve most likely heard about Nick Weaver’s Razor project. (If you haven’t heard about it, here’s Nick’s blog post on it.) To help with the adoption/use of Razor, Nick also recently announced an overview of the Razor API.

Virtualization

  • Frank Denneman continues to do a great job writing solid technical articles. The latest article to catch my eye (and I’m sure that I missed some) was this post on combining affinity rule types.
  • This is an interesting post on a vSphere 5 networking bug affecting iSCSI that was fixed in vSphere 5.0 Update 1.
  • Make a note of this VMware KB article regarding UDP traffic on Linux guests using VMXNET3; the workaround today is using E1000 instead.
  • This post is actually over a year old, but I just came across it: Luc Dekens posted a PowerCLI script that allows a user to find the maximum IOPS values over the last 5 minutes for a number of VMs. That’s handy. (BTW, I have fixed the error that kept me from seeing the post when it was first published—I’ve now subscribed to Luc’s blog.)
  • Want to use a Debian server to provide NFS for your VMware environment? Here is some information that might prove helpful.
  • Jeremy Waldrop of Varrow provides some information on creating a custom installation ISO for ESXi 5, Nexus 1000V, and PowerPath/VE. Cool!
  • Cormac Hogan continues to pump out some very useful storage-focused articles on the official VMware vSphere blog. For example, both the VMFS locking article and the article on extending an EagerZeroedThick disk were great posts. I sincerely hope that Cormac keeps up the great work.
  • Thanks to this Project Kronos page, I’ve been able to successfully set up XCP on Ubuntu Server 12.04 LTS. Here’s hoping it gets easier in future releases.
  • Chris Colotti takes on some vCloud Director “challenges”, mostly surrounding vShield Edge and vCloud Director’s reliance on vShield Edge for specific networking configurations. While I do agree with many of Chris’ points, I personally would disagree that using vSphere HA to protect vShield Edge is an acceptable configuration. I was also unable to find any articles that describe how to use vSphere FT to protect the deployed vShield appliances. Can anyone point out one or more of those articles? (Put them in the comments.)
  • Want to use Puppet to automate the deployment of vCenter Server? See here.

I guess it’s time to wrap up now, lest my “short take” get even longer than it already is! Thanks for reading this far, and I hope that I’ve shared something useful with you. Feel free to speak up in the comments if you have questions, thoughts, or clarifications.


This is a short post, but one that I hope will stir some discussion.

Earlier this evening, I read Maish’s blog post titled My Wish for dvFabric—a dvSwitch for Storage. In that blog post Maish describes a self-configuring storage fabric that would simplify how we provision storage in a vSphere environment. Here’s how Maish describes the dvFabric:

So how do I envision this dvFabric? Essentially the same as a dvSwitch. A logical entity to which I attach my network cards (for iSCSI/NFS) or my HBAs (for FCoE or FC). I define which uplink goes to which storage, what the multi-pathing policy is for this uplink, how many ports should be used, what is the failover policy for which NIC, which NFS volumes to mount, which LUNs to add – I gather you see what I am getting at.

It’s a pretty interesting idea, and one with a great deal of merit. So here’s the “Thinking Out Loud” part: is Target Driven Zoning (or Peer Zoning) the answer to a large part of Maish’s dvFabric?

If you don’t know what Target Driven Zoning (TDZ) or Peer Zoning are, I recommend you go have a look at Erik Smith’s introductory blog post on Target Driven Zoning. Based on Erik’s description of TDZ, it certainly seems like it could be used to help on the block side of the house with Maish’s dvFabric idea.

So what do you think—am I way off here?


I recently read over Stephen Foskett’s Eight Unresolved Questions About FCoE article, in which he outlines eight topics that should be addressed/prioritized in order to make FCoE, in his words, “truly world-class”.

One of these topics was virtual machine mobility. Here’s what Stephen wrote under the heading “How Do You Handle Virtual Machine Mobility?”:

As I described recently, virtual machine mobility is a major technical challenge for existing networks. The VMware proposal, the VXLAN, seems to be gaining traction right now. But this is only a solution for data networking. How will FCoE SANs handle virtual machine mobility? This remains unresolved as far as I can tell, though Ethernet switch vendors have come up with their own answers. Brocade demonstrated just such a solution at Networking Field Day 2, and I know that others have answers as well. But will there be an interoperable industry solution?

My response to Stephen’s question is pretty simple: VM mobility isn’t a problem for FCoE. VM mobility is a problem for all storage, regardless of protocol.

In my view, tying VM mobility to FCoE is simply buzzword hype. Let’s look at this from a technical perspective—what is so different about FCoE compared to FC, iSCSI, or NFS with regard to VM mobility? Nothing! Regardless of storage protocol, when you move a VM from one data center to another data center, the VM’s storage is still going to sit back in the original data center. The protocol whereby we access that data doesn’t change that fact.

Yes, you can route the IP-based protocols (iSCSI and NFS), which might give you some additional flexibility. But the migrated VM’s storage is still going to sit in the original data center—the only thing you’ve accomplished is to introduce some additional latency by adding routers into the mix.

As you can see, VM mobility and storage isn’t an FCoE problem—it’s a storage problem, plain and simple. Protocols can’t fix it. What’s needed to fix it is a fundamental shift in how storage is designed. We need distributed file systems (not just clustered file systems), distributed caching algorithms, greater integration between the block storage and the distributed file systems, greater hypervisor awareness of the underlying storage topology, and more efficient data movement mechanisms. Until then, we can rage all we want about how this protocol or that protocol doesn’t address VM mobility, but we’re simply missing the mark.


One of the features added in vSphere 5 was a software FCoE initiator. This is a feature that has gotten relatively little attention, other than a brief mention here and there. I’m not entirely sure why the software FCoE initiator in vSphere 5 hasn’t gotten more attention, but this past week I had the opportunity to work with it in a bit more detail. In this post I’m going to describe how to set up the software FCoE initiator; in future posts, I hope to be able to provide some performance and troubleshooting information.

Prerequisites

In order to use the software FCoE initiator, you must have a network interface card (NIC) that supports the FCoE offloads. The Intel X520 NICs certainly do; these are the cards that I used in my testing. (Disclaimer: Intel provided me a pair of X520 NICs at no charge to use in this testing.) There might be others, but I don’t know for certain; the VMware HCL doesn’t seem to provide an easy way of identifying which NICs support the FCoE offloads and which don’t.

If you have a NIC that doesn’t support FCoE offloads, you won’t even be able to add a software FCoE initiator:

If, on the other hand, your NIC does support FCoE offloads, you’ll see the option to add a software FCoE initiator, like this:

Obviously, you’ll want to be sure that your NIC does support FCoE offloads before continuing.

Setting Up Networking for Software FCoE

Before you can actually set up a software FCoE initiator, you’ll first need to configure your networking appropriately. There is one important piece of information you’ll want to be sure to have before you continue: the ID of the VLAN for FCoE traffic.

If you’ve read my article on setting up FCoE on a Nexus 5000, then you know that—on a Nexus 5000 series switch, anyway—you must map the FCoE VSAN to a VLAN (using the fcoe vsan XXX command). You’re going to need that VLAN ID, so make sure that you have it. In a dual fabric setup, you’re going to have two different VLAN IDs. You’ll need both.
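As a quick refresher, the relevant snippet of Nexus 5000 configuration looks something like this (the VLAN and VSAN numbers are examples only; use whatever your fabrics actually use, and see my earlier article for the full FCoE setup):

    ! Fabric A switch, illustrative numbers only
    vsan database
      vsan 300
    vlan 300
      fcoe vsan 300    ! maps FCoE VLAN 300 to VSAN 300; 300 is the VLAN ID you need here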

Once you have those VLAN IDs, you can then proceed with the networking setup:

  1. Create a VMkernel port on your ESXi host. When creating the VMkernel port, when prompted for VLAN ID specify the FCoE VLAN on fabric A.
  2. I don’t know why (and maybe this will be fixed in a future release), but you’ll need to assign an IP address to each VMkernel port. The IP address will not be used, so just pick an address on a subnet that you don’t use. (Don’t use a subnet that you are using elsewhere on the ESXi host or you could run into IP routing weirdness—remember that the VMkernel uses the first interface it finds for a particular route.)
  3. After you’ve created the VMkernel port, modify the NIC teaming policy to only use the physical uplink that is physically connected to fabric A. This might require a bit of sleuthing and/or using CDP/LLDP data to ensure that you have the right uplink selected.
  4. Create a second VMkernel port on your ESXi host, this time specifying the FCoE VLAN ID for fabric B. Modify this port group’s NIC teaming policy to only use the physical uplink connected to fabric B, again using physical tracing and/or CDP/LLDP data to verify that you have the right uplink selected.

At this point, you should now have two VMkernel ports, each with separate (unused) IP addresses and configured for separate VLAN IDs and separate physical uplinks. The VLAN IDs and the physical uplinks should correspond to the FCoE fabrics (fabric A and fabric B).
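If you’d rather script this part of the setup, the classic esxcfg-* commands can handle most of it. Here’s a rough sketch for the fabric A side only; the port group name, VLAN ID, and IP address are made up for illustration, and the NIC teaming changes (steps 3 and 4 above) still need to be made through the vSphere Client or PowerCLI:

    # Hedged sketch for fabric A; repeat with the fabric B VLAN and port group
    esxcfg-vswitch -A FCoE-FabricA vSwitch1            # add a port group for fabric A
    esxcfg-vswitch -v 300 -p FCoE-FabricA vSwitch1     # tag it with the fabric A FCoE VLAN
    esxcfg-vmknic -a -i 192.168.254.11 -n 255.255.255.0 FCoE-FabricA   # unused IP, but still required
    # then pin this port group's teaming policy to the uplink cabled to fabric A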

Setting Up the Software FCoE Initiator

With the networking in place, you can actually add the software FCoE initiator using the “Add…” button on the Storage Adapters view of the Configuration tab. This opens up the Add Software Adapter dialog box I showed you earlier, where you can select “Add Software FCoE Adapter” and click OK.

That option opens the following dialog box:

You’ll note that the VLAN ID is 0 and isn’t changeable. I couldn’t find any way to enable this field, and in my testing it proved unnecessary to change it (it worked anyway). Select the appropriate physical uplink and click OK. You’ll do this twice—once for fabric A, and again for fabric B. After you’ve done this twice, you’ll have two software FCoE adapters:

For each of these two software FCoE adapters, you’ll see a node WWN and a port WWN listed. You can use these values in creating the zones and zonesets (see here for more information). First, though, you’ll want to be sure that the software FCoE adapter is actually talking to the fabric correctly; the best way to do that is to use the show flogi data command (on a Nexus 5000; other vendors’ switches will use a slightly different command). The output of the show flogi data command will look something like this:

In this screenshot (taken from an SSH session into a Nexus 5010), you can see that a device has logged into vfc1009 on VSAN 300. If you compare the port name and node name, you’ll see that they match one of the software FCoE adapters shown earlier. This is only one of the two fabrics; a matching result was seen from the other fabric, showing that both software FCoE adapters successfully logged into their respective fabrics.

At this point, the rest of the configuration consists of creating the appropriate device aliases (if desired); defining zones and zonesets; and presenting storage from the storage array(s) to the initiator(s). Since these are topics that are fairly well understood and well documented elsewhere, I won’t rehash them again here.
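One more note before wrapping up: if you’d rather not click through the Add Software Adapter dialog twice, the esxcli fcoe namespace appears to provide an equivalent path. I haven’t exercised it as thoroughly as the GUI workflow, so treat this as a sketch and verify the syntax against your ESXi build:

    # Hedged command-line alternative to the GUI workflow described above
    esxcli fcoe nic list                   # show FCoE-offload-capable NICs
    esxcli fcoe nic discover -n vmnic2     # activate software FCoE on the fabric A uplink
    esxcli fcoe nic discover -n vmnic3     # and again for the fabric B uplink
    esxcli fcoe adapter list               # confirm the resulting adapters and their WWNs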

In future posts, I hope to be able to provide some performance and troubleshooting information. However, it will likely be early 2012 before I have that opportunity. In the meantime, if anyone has any additional information they’d like to share—including any corrections or clarifications—please feel free to add them to the comments below.


Welcome to Technology Short Take #17, another of my irregularly-scheduled collections of various data center technology-related links, thoughts, and comments. Here’s hoping you find something useful!

Networking

  • I think it was J Metz of Cisco that posted this to Twitter, but this is a good reference to the various 10 Gigabit Ethernet modules.
  • I’ve spoken quite a bit about stretched clusters and their potential benefits. For an opposing view—especially regarding the use of stretched clusters as a disaster avoidance solution—check out this article. It’s a nice counterpoint, especially from the perspective of the network.
  • Anyone know anything about sFlow?
  • Here’s a good post on VXLAN that has some useful information. I’d just like to point out that VXLAN is really only intended to address Layer 2 communications “within” a vApp or a collection of VMs (perhaps a single organization’s VMs), and doesn’t do anything to address Layer 3 routing/accessibility for clients (or “consumers”) attempting to connect to those systems. For that, you’ll still need—at least today—technologies like OTV, LISP, and others.
  • A quick thought that I’m still exploring: what’s the impact of OpenFlow on technologies like VXLAN, NVGRE, and others? Does SDN eliminate the need for these technologies? I’d be curious to hear your thoughts.

Servers/Operating Systems

  • If you’ve adopted Mac OS X Lion 10.7, you might have noticed some problems connecting to older servers/NAS devices running AFP (AppleTalk Filing Protocol). This Apple KB article describes a fix. Although I’m running Snow Leopard now, I was running Lion on a new MacBook Pro and I can attest that this fix does work.
  • This Microsoft KB article describes how to extend the Windows Server 2008 evaluation period. I’ve found this useful for Windows Server 2008 instances in the lab that I need for longer than 60 days but that I don’t necessarily want to activate (because they are transient).

Storage

  • Jason Boche blogged about a way to remove stubborn hosts from Unisphere. I’ve personally never seen this problem, but it’s nice to know how to address it should it occur.
  • Who would’ve thought that an HDD could serve as a cache for an SSD? Shouldn’t it be the other way around? Normally, that would probably be the case, but as described here there are certain instances and ways in which using an HDD as a cache for an SSD can improve performance.
  • Scott Drummonds wraps up his 3 part series on flash storage in part 3, which contains information on sizing flash storage. If you haven’t been reading this series, I’d recommend giving it a look.
  • Scott also weighs in on the flash as SSD vs. flash on PCIe discussion. I’d have to agree that interfaces are important, and the ability of the industry to successfully leverage flash on the PCIe bus is (today) fairly limited.
  • Henri updated his VNXe blog series with a new post on EFD and RR performance. No real surprises here, although I do have one question for Henri: is that your car in the blog header?

Virtualization

  • Interested in setting up host-only networking on VMware Fusion 4? Here’s a quick guide.
  • Kenneth Bell offers up some quick guidelines on when to deploy MCS versus PVS in a XenDesktop environment. MCS vs. PVS is a topic of some discussion on the vSpecialist mailing list, as they have very different IOPS requirements and I/O profiles.
  • Speaking of VDI, Andre Leibovici has two articles that I wanted to point out. First, Andre does a deep dive on Video RAM in VMware View 5 with 3D; this has tons of good information that is useful for a VDI architect. (The note about the extra .VSWP overhead, for example, is priceless.) Andre also has a good piece on VDI and Microsoft Outlook that’s worth reading, laying out the various options for Outlook-related storage. If you want to be good at VDI, Andre is definitely a great resource to follow.
  • Running Linux in your VMware vSphere environment? If you haven’t already, check out Bob Plankers’ Linux Virtual Machine Tuning Guide for some useful tips on tuning Linux in a VM.
  • Seen this page?
  • You’ve probably already heard about Nick Weaver’s new “Uber” tool, a new VM alignment tool called UBERAlign. This tool is designed to address VM alignment, a problem with how guest file systems are formatted within a VMDK. For more information, see Nick’s announcement here.
  • Don’t disable DRS when you’re using vCloud Director. It’s as simple as that. (If you want to know why, read Chris Colotti’s post.)
  • Here’s a couple of great diagrams by Hany Michael on vCloud Director management pods (both public cloud and private cloud management).
  • People automatically assume that “virtualization” means consolidating multiple workloads onto a single physical server. However, virtualization is really just a layer of abstraction, and that layer of abstraction can be used in a variety of ways. I spoke about this in early 2010. This article (written back in March of 2011) by Brad Hedlund picks up on that theme to show another way that virtualization—or, as he calls it, “inverse virtualization”—can be applied to today’s data centers and today’s applications.
  • My discussion on the end of the infrastructure engineer generated some conversations, which is good. One of the responses was by Aaron Sweemer in which he discusses the new (but not new) “data layer” and expresses a need for infrastructure engineers to be aware of this data layer. I’d agree with a general need for all infrastructure engineers to be aware of the layers above them in the stack; I’m just not convinced that we all need to become application developers.
  • Here’s a great post by William Lam on the missing piece to creating your own vSEL cloud. I’ll tell you, William blogs some of the coolest stuff…I wish I could dig in as deep as he does in some of this stuff.
  • Here’s a nice look at the use of PowerCLI to help with the automation of DRS rules.
  • One of my projects for the upcoming year is becoming more knowledgeable and conversant with the open source Xen hypervisor and Citrix XenServer. I think that the XenServer Design Handbook is going to be a useful resource for that project.
  • Interested in more information on deploying Oracle databases on vSphere? Michael Webster, aka @vcdxnz001 on Twitter, has a lengthy article with lots of information regarding Oracle on vSphere.
  • This VMware KB article describes how to enable centralized logging for vCloud Director cells. This is particularly important for HA environments, where VMware’s recommended HA strategy involves the use of multiple vCD cells.

I guess I should wrap it up here, before this post gets any longer. Thanks for reading this far, and feel free to speak up in the comments!


Welcome to Technology Short Take #15, the latest in my irregular series of posts on various articles and links on networking, servers, storage, and virtualization—everything a growing data center engineer needs!

Networking

My thoughts this time around are pretty heavily focused on VXLAN, which continues to get lots of attention. I talked about posting a dissection of VXLAN, but I have failed miserably; fortunately, other people smarter than me have stepped up to the plate. Here are a few VXLAN-related posts and articles I’ve found over the last couple of weeks:

  • There is a three-part series over at Coding Relic that does a great job of explaining VXLAN, the components of VXLAN, and how it works. Here are the links to the series: part 1, part 2, and part 3. One note of clarification: in part 3 of the series, Denny talks about a VTEP gateway. Right now, the VTEP gateway is the server itself; anytime a packet on a VXLAN-enabled network leaves the physical server to go to a different physical server, it will be VXLAN-encapsulated. It won’t be decapsulated until it hits the destination VTEP (the ESXi server hosting the destination VM). If (when?) VXLAN awareness hits physical switches, then the possibility of a VTEP gateway existing outside the server exists. Personally, it kind of makes sense—to me, at least—to build VTEP gateway functionality into vShield Edge.
  • Some people aren’t quite so enamored with VXLAN; one such individual is Greg Ferro. I respect Greg a great deal, so it was interesting to me to read his article on why VXLAN is “full of fail”. Some of his comments are only slightly related to VXLAN (the rant over IEEE vs. IETF, for example), but Greg’s comment about VMware building a new standard instead of “leveraging the value of networking infrastructure” echoes some of my own thoughts. I understand that VXLAN accomplishes things that existing standards apparently do not, but was a new standard really necessary?
  • Omar Sultan of Cisco took the time to compile some questions and answers about VXLAN. One thing that is made more clear—for me, at least—in Omar’s post is the fact that VXLAN doesn’t address connectivity to the vApps from the “outside” world. While VXLAN provides a logical isolated network segment that can span multiple Layer 3 networks and allow applications to communicate with each other, VXLAN doesn’t address the Layer 3 addressing that must exist outside the VXLAN tunnel. In fact, in my discussions with some of the IETF draft authors at VMworld, they indicated that VXLAN would require a NAT device or a DNS update in order to address changes in externally-accessible applications. This, by the way, is why you’ll still need technologies like OTV and LISP (or their equivalents); see this post for more information on how VXLAN, OTV, and LISP are complementary. If I’m wrong, please feel free to correct me.
  • In case you’re still unclear about the key problem that VXLAN attempts to address, this quote from Ivan Pepelnjak might help (the full article is here):

    VXLAN tries to solve a very specific IaaS infrastructure problem: replace VLANs with something that might scale better. In a massive multi-tenant data center having thousands of customers, each one asking for multiple isolated IP subnets, you quickly run out of VLANs.

  • Finally, you might find this PDF helpful. Ignore the first 13 slides or so; they’re marketing fluff, to be honest. However, the remainder of the slides have some useful information on VXLAN and how it’s expected to be implemented.

Servers

I didn’t really stumble across anything strictly server hardware-related; either I’m just not plugged into the right resources (anyone want to make some recommendations?) or it was just a quiet period. I’ll assume it was the former.

Storage

Virtualization

  • Did you see this post about new network simulation functionality in VMware Workstation 8?
  • Here’s a good walk-through on setting up vMotion across multiple network interfaces.
  • VMware vSphere Design co-author Maish Saidel-Keesing has a post here on how to approximate the functionality of netstat on ESXi.
  • William Lam has a “how to” on installing the VMware VSA with running VMs.
  • Fellow vSpecialist Andre Leibovici did a write-up on a proof of concept that the vSpecialists did for a customer involving Vblock, VPLEX, and VDI. This was a pretty cool use case, in my opinion, and worth having a look if you need to design a highly available environment.
  • Thinking about playing with vShield 5? That’s a good idea, but check here to learn from the mistakes of others first. You’ll thank me later.
  • The question of defragmenting guest OS disks has come up again and again; here’s the latest take from Cormac Hogan of VMware. He makes some great points, but I suspect that this question is still far from settled.

It’s time to wrap up now; I hope that you found something useful. As always, thanks for reading! Feel free to share your views or thoughts in the comments below.


Welcome to Technology Short Take #14, another collection of links and tidbits of information I’ve gathered over the last few weeks. Let’s dive right in!

Networking

Much of my focus in the networking space recently has been around virtualization-centric initiatives, so the items on this list might seem a bit skewed in that direction. Sorry!

  • I’ve been doing some reading on 802.1Qbg (Edge Virtual Bridging). I still have a long way to go, but I think that I’m starting to better understand this draft and the problem(s) it’s attempting to address. As usual when I’m dealing with networking-related technologies, especially data center-focused networking technologies, I’m finding some of Ivan Pepelnjak’s articles useful. For example, he wrote an article on how EVB should ease VLAN configuration pains; this article is helpful in translating the terms the IEEE uses (like “virtual station interface” and “EVB station”) into terms virtualization-friendly folks like me can understand (like “vNIC” and “Hypervisor supporting EVB”, respectively). Ivan also provides a rough comparison of 802.1Qbh/FEX and 802.1Qbg, which I also found helpful in better understanding both technologies. There is still much that I want/need to understand, such as how 802.1Qbg affects or is affected by VXLAN, the recent darling of the cloud networking space.
  • Speaking of VXLAN, a number of articles have emerged since the announcement of VXLAN last week at VMworld 2011 in Las Vegas. Jon Oltsik of Network World called it “Cloud Network Segmentation at Layer 2.5″, which is catchy but doesn’t really delve into the details of VXLAN and how it plays into/with related data center protocols. Of course, there’s also the obligatory VMware post on the technology, talking about how great it is—naturally—but failing to again provide substantive information on the relationships between VXLAN and other, related data center technologies. If anything, Allwyn’s post made VXLAN seem even more proprietary and linked to vCloud Director and vShield Edge than I’d understood it to be. Fortunately, Ivan weighed in on the proposed new standard and also provided some information on the relationship between VXLAN, OTV, and LISP. I’m still digging into VXLAN myself and I plan to post an article within the next week or so (I’ve been a bit busy with moving halfway across the United States).
  • Ivan also has a post with more details on the Brocade VCS fabric load-balancing behaviors that’s worth having a look.

Servers

  • This article on AES-NI in the newer Intel CPUs is a great look at the benefits of adding symmetric encryption support at the CPU level. Almost makes me want to go out and buy a new MacBook Pro so that I could use FileVault 2 with hardware encryption support…

Storage

  • One cool find recently is this series of “hands-on” posts by Henri Hamalainen (aka @henriwithani) on the EMC VNXe 3300. I had the pleasure of meeting Henri in person at VMworld this year, and he mentioned that he’d started a series of posts on the VNXe 3300. His posts are here: part 1, part 2, part 3, part 4, part 5, and part 6. (Part 7 hasn’t yet been written.)
  • There’s been quite a hubbub going on in the FCoE space, with so many articles flying back and forth from various contributors that I’m still having a hard time deciphering all of it. From what I can tell, it all started with an article by a VP at Juniper about FCoE over TRILL. That sparked Ivan Pepelnjak to coin some new terms: “dense-mode FCoE” (in which FCFs exist at every hop) and “sparse-mode FCoE” (in which LAN switches may forward FCoE frames without any FCoE awareness). That, in turn, sparked an article by Tony Bourke in which he creates more new modes of operation: single hop FCoE (SHFCoE), dense-mode FCoE (DMFCoE), and sparse-mode FCoE (SMFCoE). A fantastic (and very informative) discussion ensued in the comments to that article and the follow-up article. Ivan responded to Tony’s post as well with a post on FCoE network elements classification. I’m not sure that all the contributors ever came to a consensus, but you’ll learn a lot about FCoE and related technologies just following along, that’s for sure.
  • By the way, this transcript of questions and answers from a live FCoE webcast has some great information buried in it as well.
  • This is an older article, but Stephen Foskett does a good job with discussing FCoE vs. iSCSI. Like so many other IT-related decisions, it’s not an “either-or” discussion—it’s about finding the right tool to do the job.
  • This article provides one suggestion for handling zoning with multiple storage arrays, and provides some good information on EMC CLARiiON/VNX arrays in the process.

Virtualization

  • The idea of stretched clusters and interconnecting data centers continues to be an idea many people are interested in exploring. Rawley Burbridge, of IBM, discusses how this might be done using IBM SVC and VMware vSphere in this three-part series (part 1, part 2, and part 3).
  • Kendrick Coleman, in conjunction with a collection of folks from both VMware and VCE, recently published an article on design considerations for vCloud Director on a Vblock. I haven’t yet read the full document, primarily because it appears to require a Facebook login in order to download. (I don’t use Facebook.)
  • Andre Leibovici—who I had the pleasure of meeting in person at VMworld—has an article on how to modify the Windows Registry settings (or apply Group Policy) for the VMware View Client in order to integrate self-service password reset.
  • The VMware vCloud Architecture Toolkit (vCAT) version 2.0 is now available; get it here (via Beaker).
  • Forbes Guthrie—the lead author with whom I worked on VMware vSphere Design, published earlier this year—posted some great 10Gb Ethernet-related information from VMworld session SPO3040.
  • This is a slightly older post from Hany Michael, but a good one nevertheless; he posts information on how to publish the vCloud Director portal on the Internet.

I guess that will wrap things up for this time around. Thanks for reading, and I hope that you found something useful in this varied mix of links. Feel free to share more in the comments below!


Now that I’ve published the Storage Edition of Technology Short Take #12, it’s time for the Networking Edition. Enjoy, and I hope you find something useful!

  • Ron Fuller’s ongoing deep dive series on OTV (Overlay Transport Virtualization) has been great for me. I knew about the basics of OTV, but Ron’s articles really gave me a better understanding of the technology. Check out the first three articles here: part 1, part 2, and part 3.
  • Similarly, Joe Onisick’s two-part (so far) series on inter-fabric traffic on Cisco UCS is very helpful and informative as well. There are definitely some design considerations that come about from deploying VMware vSphere on Cisco UCS. Have a look at Joe’s articles on his site (Part 1 and Part 2).
  • Kurt Bales’ article on innovation vs. standardization is a great read. The key, in my mind, is innovating (releasing “non-standard” stuff) while also working with the broader community to help encourage standardization around that innovation.
  • Here’s another great multi-part series, this time from Brian Feeny on NX-OS (part 1 here, and part 2 here). Brian exposes some pretty interesting stuff in the NX-OS kickstart and system image.
  • I’ve discussed LISP a little bit here and there, but Greg Ferro reminds us that LISP isn’t a “done deal.”
  • J Metz wrote a good article on the interaction (or lack thereof, depending on how you look at it) between FCoE and TRILL.
  • For a non-networking geek like me, some great resources to become more familiar with TRILL include this comparison of 802.1aq and TRILL, this explanation from RFC 5556, this discussion of TRILL-STP integration, and this explanation using north-south/east-west terminology. Brad Hedlund’s TRILL write-up from a year ago is also helpful, in my opinion.
  • And as if understanding TRILL, or the differences between TRILL and FabricPath weren’t enough (see this discussion by Ron Fuller on the topic), then we have 802.1aq Shortest Path Bridging (SPB) thrown in for good measure, too. If it’s hard for networking experts to keep up with all these developments, think about the non-networking folks like me!
  • Ivan Pepelnjak’s examination of vCDNI-based private networks via Wireshark traces exposes some notable scalability limitations. It makes me wonder, as Ivan does, why VMware chose to use this method versus something more widely used and well-proven, like MPLS. And isn’t there an existing standard for MAC-in-MAC encapsulation? Why didn’t VMware use that existing standard? Perhaps it goes back to innovation vs. standardization again?
  • If you’re interested in more details on vCDNI networks, check out this post by Kamau Wanguhu.
  • Omar Sultan of Cisco has a quick post on OpenFlow and Cisco’s participation here.
  • Jake Howering of Cisco (nice guy, met him a few times) has a write-up on an interesting combination of technologies: ACE (load balancing) plus OTV (data center interconnect), with a small dash of VMware vCenter API integration.

I think that’s going to do it for this Networking Edition of Technology Short Take #12. I’d love to hear your thoughts, suggestions, or corrections about anything I’ve mentioned here, so feel free to join the discussion in the comments. Thanks for reading!

