Storage Protocol Performance Whitepaper from NetApp

NetApp recently published a white paper summarizing some tests they ran to compare storage protocol performance in a VMware Infrastructure environment. The white paper, TR-3697, compares the storage performance of Fibre Channel, software iSCSI, and NFS against a couple of different NetApp storage systems.

I won’t go into all the sordid details here—you can read the white paper yourself—but the end results look something like this:

  • Fibre Channel provided the highest throughput and the lowest processor utilization of all the storage protocols.
  • Software iSCSI provided only slightly lower throughput than Fibre Channel (not more than 9% or 10% less than Fibre Channel depending upon the specific tests being run). However, software iSCSI consistently showed the highest CPU utilization on the ESX hosts.
  • NFS showed throughput on the same levels as software iSCSI (again, not more than about 9% or 10% less than Fibre Channel depending upon the tests being run) and had higher CPU utilization than Fibre Channel. However, the CPU utilization was lower than with software iSCSI.

While overall performance was roughly comparable between all three storage protocols, depending upon the tests being run, the host CPU utilization was a different story entirely. In some cases, software iSCSI’s CPU utilization was as much as 80% higher than that of Fibre Channel (that’s right, almost double). In no case was it less than 40% higher than Fibre Channel. Keep in mind these numbers are relative to Fibre Channel. So if Fibre Channel used 200MHz of host CPU power and software iSCSI used 360MHz of host CPU power, that’s an 80% relative increase. We don’t know, unfortunately, how this translates into actual host CPU usage; in my mind, that’s a key piece of information that really should have been included, and I’m puzzled as to why it’s not.
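To make the relative arithmetic concrete, here is a minimal Python sketch of the calculation. The function name is my own, and the 200MHz and 360MHz figures are just the illustrative numbers from the paragraph above, not measurements from the white paper.

```python
def relative_increase_pct(baseline: float, measured: float) -> float:
    """Return how much `measured` exceeds `baseline`, as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (measured - baseline) / baseline * 100.0


# Illustrative figures only (not from TR-3697): host CPU consumed by each protocol.
fc_mhz = 200.0
iscsi_mhz = 360.0

increase = relative_increase_pct(fc_mhz, iscsi_mhz)
print(f"Software iSCSI used {increase:.0f}% more host CPU than Fibre Channel")
```

Note that the result says nothing about how busy the host actually was; the same 80% would come out of 2MHz versus 3.6MHz.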

NFS fared better; at its worst, the tests showed NFS running CPU overhead 40% greater than Fibre Channel. At its best, NFS looked like it was only requiring about 15% more CPU overhead than Fibre Channel (keep in mind the comments made above regarding relative utilization). Of course, NetApp loves to push NFS, and the document adds an extra sell for it:

While NFS does not quite achieve the performance of FC and has a slightly higher CPU utilization, it does have some advantages over FC that should be considered when deciding which protocol to deploy. Running on a standard TCP/IP network, NFS does not require the expensive Fibre Channel switches, host bus adapters, and Fibre Channel cabling that FC requires, making NFS a lower cost alternative of the two protocols. Additionally, operational costs are low with no specialized staffing or training needed in order to maintain the environment. Also, NFS provides further storage efficiencies by allowing on-demand resizing of data stores and increasing storage saving efficiencies gained when using deduplication. Both of these advantages provide additional operational savings as a result of this storage simplification.

I suppose I can’t blame them; NFS is one of their strong points, so they’ll naturally lean that direction.

There are a few key things that I need to say about this document, though:

  1. Benchmark tests can be made to say just about anything. It’s all in the types of tests that you run and the parameters of those tests. I’m not saying that NetApp specifically skewed the tests in any way; what I am saying, though, is that users need to take these types of benchmark tests as a general guideline and not the definitive word.
  2. While NetApp does highlight the “operational savings” of NFS, what they fail to mention is the added complexity of scaling NFS traffic as the environment grows. Fibre Channel multipathing in a VMware environment is very robust, and I expect that the Round Robin pathing policy will move from “experimentally supported” to fully supported rather quickly. This makes it quite easy to scale the FC connection, although to be honest that probably won’t be necessary. However, to scale the NFS connection, you need multiple NFS exports with multiple IP addresses, link aggregation via LACP/802.3ad/EtherChannel and switches that support cross-switch link aggregation, and possibly multiple VMkernel ports on different IP subnets. This is described, by the way, in the latest revision of TR-3428, also from NetApp. (As a side note, I believe that these scaling issues would affect any NFS storage vendor and are not specific to NetApp in any way.)
  3. If you look at VMware’s development, you will see that Fibre Channel gets the goods the earliest. iSCSI and NFS were only added in VMware Infrastructure 3, whereas Fibre Channel support has been around in ESX for much longer. Storage VMotion support went to Fibre Channel first. VCB support went to Fibre Channel first. SRM support went to both iSCSI and Fibre Channel, but not NFS. Fibre Channel multipathing is, as I mentioned already, quite robust; iSCSI multipathing and NFS multipathing aren’t quite so robust. All these things considered, there could be a sound business case to use Fibre Channel in spite of cost savings from iSCSI (especially software iSCSI, given the added CPU overhead) or NFS. That’s something that each individual organization will need to decide for themselves.

By the way, I know the gentleman that wrote this technical report and he’s a straight-up guy. I respect him. So, don’t take any of my comments or thoughts to imply anything beyond the fact that I’m simply presenting my thoughts around the data contained in this document. You should also know that I am a fan of using NFS for VMware, but I don’t necessarily believe that it is the “slam dunk” that it’s often presented to be.

UPDATE: I’ve made some corrections to the interpretations of the CPU utilization numbers in response to some of the comments below.

I wonder how iSCSI would have fared in these tests using iSCSI HBAs instead of the software initiator. Offloading the IP processing and overhead to the HBA instead of the OS and processor would help to even out the CPU utilization between FC and iSCSI. It seems like more of a one-for-one test as well.

Your points make sense Scott. I believe that NFS datastores will be viable (in production) when organizations make the jump to 10Gbit ethernet. The HP C-class enclosures with 10Gbit VirtualConnect with 3-enclosure-area-networks plus 10Gbit on the NetApp side should perform “better” than current 4Gbit FC.

I really think that NFS should be sold based on its benefits over the performance aspect. Things like thin provisioning, easy file system mounting for recovery, snapshot flexibility, etc. are the best parts. Having to double up on datastore volume sizes to take LUN clones when your fractional reserve = 100 can be limiting. You need to have a lot of free space to save space with FlexClones. NFS frees up a lot of that slack space requirement.

Brian,

It’s absolutely true that using iSCSI HBAs would have changed the results dramatically, especially for CPU utilization. I’m not so sure that performance would have changed all that much.

Mlambert,

10Gb Ethernet will change the picture, no doubt; keep in mind, though, that 8Gb Fibre Channel is already on the stage as well.

With regards to the “operational savings” that you also mention here, I think it is pertinent to point out that VMDKs are only thin provisioned at the beginning:

http://blog.scottlowe.org/2008/03/31/only-thin-provisioned-in-the-beginning/

Using Storage VMotion or templates pretty much removes that advantage, but the other features you mentioned are still applicable.

I think going forward the battle is between hardware iSCSI, NFS, and Fibre Channel over Ethernet.

Basically, ethernet is going to be the physical transport…the only question is what storage protocol is going to run over it and whether hosts are going to need hw assistance.

Fibre Channel over Ethernet may win out, but it’s going to take a few years before it’s deployed enough to affect anything.

In the short term, then, that just leaves NFS and hw iscsi.

My thoughts: If you have a NetApp, go NFS. If you don’t have a NetApp, go hw iSCSI. I’ve benchmarked a lot of storage systems and NetApp is really the only one I trust to have enterprise-level NFS performance. Maybe someone else thinks some of the EMC CLARiiON gear would work too, but I have a general aversion to EMC storage after seeing the heavy-handedness of their sales force.

HW ISCSI does seem to be the most cross platform vendor neutral way to get good performance over ethernet at the moment, so that’s what we deployed in our cluster.

We love NFS, and we truly miss the thin provisioning features, but when we selected our san hardware we couldn’t find a netapp that met both our price requirements and general requirements (we would have had to deploy one solution and upgrade to a higher end head unit within 2 years, plus the feature licensing aspects of netapp are painful.) I truly wish netapp would switch to a feature licensing model like equallogic.

Matthew,

I do agree that the likely winner in the long-term is FCoE, but–as you pointed out–it’s likely to be several years before that happens.

Again, I have to go back to the mention of “thin provisioning” on NFS…if you are referring to the VMDKs, see the article I wrote that’s linked above. If you are referring to thin provisioning the FlexVols, keep in mind that we can do the same thing with LUNs. In fact, thin provisioning FlexVols and LUNs is a recommended course of action if you are going to also use deduplication.

Two things I’d like to mention.

First, VMware and NetApp produced this TR jointly and this TR is co-branded by both NetApp and VMware. In order to achieve co-branding VMware did an independent confirmation of our results using the configuration detailed in the TR and they achieved the same results. Their equipment was slightly different, but they followed the same testing methodology that we used.

Also, the CPU numbers are all relative in comparison to Fibre Channel. While I’m not at liberty to give exact numbers (and the numbers following are in no way representative of any of the tests), if FC had a 15% CPU utilization and iSCSI had a 25% CPU utilization then that number would actually show iSCSI as being about 67% higher than FC. The formula breaks down to ((A-B)/B)*100, with A representing the actual iSCSI percentage and B representing the actual FC percent.

Scott - I believe the CPU overhead results were misinterpreted in your comments. The results are relative percentages. For example, if one test has 10% CPU utilization and the second test has 15%, then the relative difference is 50% where the actual difference was 5%.
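The two worked examples in the comments above (15% versus 25%, and 10% versus 15%) can be checked with a short Python sketch. It applies the ((A-B)/B)*100 formula quoted earlier and reports the absolute gap alongside it. The function and variable names are assumptions of mine, and the inputs are the commenters' illustrative numbers, not results from the TR.

```python
def compare_utilization(fc_pct: float, other_pct: float) -> tuple[float, float]:
    """Return (absolute difference in percentage points, relative difference in percent).

    The relative figure follows ((A - B) / B) * 100, with A the other protocol's
    CPU utilization and B the Fibre Channel utilization.
    """
    if fc_pct <= 0:
        raise ValueError("fc_pct must be positive")
    absolute_points = other_pct - fc_pct
    relative_pct = (other_pct - fc_pct) / fc_pct * 100.0
    return absolute_points, relative_pct


# Illustrative utilization pairs taken from the comments, not from the test data.
examples = [(15.0, 25.0), (10.0, 15.0)]

for fc, other in examples:
    points, relative = compare_utilization(fc, other)
    print(f"FC {fc:.0f}% vs {other:.0f}%: "
          f"{points:.0f} points absolute, {relative:.1f}% relative")
```

The same relative figure can hide very different absolute costs, which is why the missing absolute numbers matter.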

Vaughn,

Right, I believe that’s what Jack was clarifying above. Any clarification as to why potentially misleading relative percentages were used for CPU utilization? I can see that it makes sense with regards to throughput, as Fibre Channel is considered the “standard” for storage throughput. But why use relative numbers for CPU utilization?

I personally suspect that relative numbers are used because the actual CPU cost of the storage traffic is rapidly becoming a non-issue. VMworld 2007 also had a session with a PowerPoint that provided relative numbers, not absolute numbers. With small I/Os (8 KB) on iSCSI, I have seen about 1 MHz per IO (that includes both VM CPU + Host CPU) on software iSCSI. For my workloads, I can assume one core’s worth of CPU will be busily doing IO work. Who cares? Cores are like tribbles…

I would choose a protocol based on other factors.
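As a rough back-of-the-envelope sketch of the estimate above, the following Python snippet converts an I/O rate into cores' worth of CPU. It assumes that "1 MHz per IO" means about 1 MHz of CPU per I/O per second, and the 2,600 MHz core clock and the sample I/O rates are hypothetical values chosen only for illustration.

```python
def cores_busy(iops: float, mhz_per_io: float = 1.0, core_mhz: float = 2600.0) -> float:
    """Estimate how many cores' worth of CPU a given I/O rate consumes.

    mhz_per_io: assumed CPU cost per I/O per second (VM plus host), in MHz.
    core_mhz:   hypothetical clock speed of one core, in MHz.
    """
    if core_mhz <= 0:
        raise ValueError("core_mhz must be positive")
    return iops * mhz_per_io / core_mhz


# Hypothetical I/O rates; substitute figures measured on your own hosts.
for iops in (500, 2600, 5000):
    print(f"{iops:>5} IOPS -> {cores_busy(iops):.2f} cores")
```

Under these assumptions, a workload has to sustain a few thousand small I/Os per second before it occupies a single core.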

> Who cares? Cores are like tribbles…

That’s a classic, that I will remember and re-use :-)

This is a very interesting whitepaper. I would also have loved to see how the CPU utilization fluctuates with Fibre Channel set at 1Gb and 4Gb (probably not natively supported by the equipment used for the test).

Has anyone tested the above variables? I too would be interested in seeing that outcome.