NAS


NetApp recently published a white paper summarizing some tests they ran to compare storage protocol performance in a VMware Infrastructure environment. The white paper, TR-3697, compares the storage performance of Fibre Channel, software iSCSI, and NFS using a couple of different NetApp storage systems.

I won’t go into all the sordid details here—you can read the white paper yourself—but the end results look something like this:

  • Fibre Channel provided the highest throughput and the lowest processor utilization of all the storage protocols.
  • Software iSCSI provided only slightly lower throughput than Fibre Channel (not more than 9% or 10% less than Fibre Channel depending upon the specific tests being run). However, software iSCSI consistently showed the highest CPU utilization on the ESX hosts.
  • NFS showed throughput on par with software iSCSI (again, not more than about 9% or 10% less than Fibre Channel depending upon the tests being run) and had higher CPU utilization than Fibre Channel. However, its CPU utilization was lower than that of software iSCSI.

While overall performance was roughly comparable across all three storage protocols, depending upon the tests being run, host CPU utilization was a different story entirely. In some cases, software iSCSI’s CPU utilization was as much as 80% higher than that of Fibre Channel (that’s right, almost double). In no case was it less than 40% higher than Fibre Channel’s. Keep in mind these numbers are relative to Fibre Channel: if Fibre Channel used 200MHz of host CPU power and software iSCSI used 360MHz, that’s an 80% relative increase. Unfortunately, we don’t know how this translates into actual host CPU usage; in my mind, that’s a key piece of information that really should have been included, and I’m puzzled as to why it’s not.
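To make the relative-versus-absolute distinction concrete, here is a small Python sketch. The 200MHz and 360MHz figures are the illustrative ones from the paragraph above; the host capacity (4 cores at 2 GHz) is purely hypothetical, since TR-3697 doesn’t give us the absolute numbers.

```python
def relative_increase(baseline_mhz: float, other_mhz: float) -> float:
    """Percent by which other_mhz exceeds baseline_mhz (a relative figure)."""
    if baseline_mhz <= 0:
        raise ValueError("baseline must be positive")
    return (other_mhz - baseline_mhz) / baseline_mhz * 100.0


def share_of_host(used_mhz: float, host_capacity_mhz: float) -> float:
    """Percent of the host's total CPU capacity consumed (an absolute figure)."""
    return used_mhz / host_capacity_mhz * 100.0


fc_mhz, iscsi_mhz = 200.0, 360.0  # illustrative figures from the text
host_mhz = 4 * 2000.0             # hypothetical host: 4 cores at 2 GHz each

print(f"relative increase:   {relative_increase(fc_mhz, iscsi_mhz):.0f}%")  # 80%
print(f"FC share of host:    {share_of_host(fc_mhz, host_mhz):.1f}%")       # 2.5%
print(f"iSCSI share of host: {share_of_host(iscsi_mhz, host_mhz):.1f}%")    # 4.5%
```

With these made-up host numbers, an 80% relative increase works out to two percentage points of total host CPU. On a smaller or busier host the same ratio could matter a great deal more, which is exactly why the absolute figures are the missing piece.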

NFS fared better; at its worst, the tests showed NFS requiring 40% more CPU than Fibre Channel. At its best, NFS required only about 15% more CPU than Fibre Channel (keep in mind the comments made above regarding relative utilization). Of course, NetApp loves to push NFS; the document adds this extra sell for NFS:

While NFS does not quite achieve the performance of FC and has a slightly higher CPU utilization, it does have some advantages over FC that should be considered when deciding which protocol to deploy. Running on a standard TCP/IP network, NFS does not require the expensive Fibre Channel switches, host bus adapters, and Fibre Channel cabling that FC requires, making NFS a lower cost alternative of the two protocols. Additionally, operational costs are low with no specialized staffing or training needed in order to maintain the environment. Also, NFS provides further storage efficiencies by allowing on-demand resizing of data stores and increasing storage saving efficiencies gained when using deduplication. Both of these advantages provide additional operational savings as a result of this storage simplification.

I suppose I can’t blame them; NFS is one of their strong points, so they’ll naturally lean that direction.

There are a few key things that I need to say about this document, though:

  1. Benchmark tests can be made to say just about anything. It’s all in the types of tests that you run and the parameters of those tests. I’m not saying that NetApp specifically skewed the tests in any way; what I am saying, though, is that users need to take these types of benchmark tests as a general guideline and not the definitive word.
  2. While NetApp does highlight the “operational savings” of NFS, what they fail to mention is the added complexity of scaling NFS traffic as the environment grows. Fibre Channel multipathing in a VMware environment is very robust, and I expect that the Round Robin pathing policy will move from “experimentally supported” to fully supported rather quickly. This makes it quite easy to scale the FC connection, although to be honest that probably won’t be necessary. However, to scale the NFS connection, you need multiple NFS exports with multiple IP addresses, link aggregation via LACP/802.3ad/EtherChannel and switches that support cross-switch link aggregation, and possibly multiple VMkernel ports on different IP subnets. This is described, by the way, in the latest revision of TR-3428, also from NetApp. (As a side note, I believe that these scaling issues would affect any NFS storage vendor and are not specific to NetApp in any way.)
  3. If you look at VMware’s development, you will see that Fibre Channel gets the goods the earliest. iSCSI and NFS were only added in VMware Infrastructure 3, whereas Fibre Channel support has been around in ESX for much longer. Storage VMotion support went to Fibre Channel first. VCB support went to Fibre Channel first. SRM support went to both iSCSI and Fibre Channel, but not NFS. Fibre Channel multipathing is, as I mentioned already, quite robust; iSCSI multipathing and NFS multipathing aren’t quite so robust. All these things considered, there could be a sound business case to use Fibre Channel in spite of cost savings from iSCSI (especially software iSCSI, given the added CPU overhead) or NFS. That’s something that each individual organization will need to decide for themselves.
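To see why a single NFS export tends to pin traffic to one physical link, here is a simplified Python model of IP-hash link selection. It assumes the hash is just the XOR of the source and destination IPv4 addresses modulo the number of uplinks; real switches and ESX’s “route based on IP hash” policy use their own hash inputs, and every address below is hypothetical. The point is only that a fixed source/destination pair always lands on the same link, so spreading traffic requires more export IPs.

```python
import ipaddress


def uplink_for(src_ip: str, dst_ip: str, uplinks: int) -> int:
    """Simplified IP-hash model: XOR the two IPv4 addresses, modulo uplinks.

    Assumption for illustration only; actual hash algorithms vary.
    """
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % uplinks


vmkernel_ip = "10.0.0.10"                  # hypothetical VMkernel port
single_export = ["10.0.0.50"]              # one NFS export IP
multi_export = ["10.0.0.50", "10.0.0.51"]  # exports on two storage IPs

for label, targets in (("single", single_export), ("multi", multi_export)):
    links = {uplink_for(vmkernel_ip, t, 2) for t in targets}
    print(label, "export(s) use uplinks:", sorted(links))
# single export(s) use uplinks: [0]
# multi export(s) use uplinks: [0, 1]
```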

By the way, I know the gentleman that wrote this technical report and he’s a straight-up guy. I respect him. So, don’t take any of my comments or thoughts to imply anything beyond the fact that I’m simply presenting my thoughts around the data contained in this document. You should also know that I am a fan of using NFS for VMware, but I don’t necessarily believe that it is the “slam dunk” that it’s often presented to be.

UPDATE: I’ve made some corrections to the interpretations of the CPU utilization numbers in response to some of the comments below.


NetApp Blog Aggregator?

VMware’s done a great job of rolling together the VMware news and views with their VMware blog aggregator, Planet V12n. Does anyone know of something equivalent for NetApp? Where is “Planet NetApp”?


I’m relatively new to NetApp deduplication (formerly A-SIS), so this article won’t be an advanced treatise on NetApp deduplication or its deep inner workings. Instead, this is intended to be a quick guide to setting up NetApp deduplication for others, like myself, who may be familiar with Data ONTAP but not necessarily deduplication.

Obviously, the first step will be to ensure that your NetApp storage system is licensed for deduplication. As of March 10, NetApp has made the NearStore option, which is a prerequisite for deduplication, free. Yes, you read that right: free. Since NearStore is a prerequisite, you’ll need to be sure to license it first:

license add <Code for NearStore>
license add <Code for Deduplication>

Once deduplication is licensed, then you can enable it on a per-volume basis using the “sis on” command:

sis on /vol/<volname>

Note, however, that the volume cannot exceed a certain size, based on the storage system model, in order for deduplication to work. These volume size limits are laid out in TR-3505. The volume must also never have been bigger than these limits, so you can’t shrink an oversized volume down to the limit and then run deduplication.
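The “must never have been bigger” rule is easy to get wrong, so here is a small Python sketch of the check. Data ONTAP doesn’t hand you a size history in this form; the list is a hypothetical record you would keep yourself, and the limit is a placeholder, not a real figure from TR-3505.

```python
def dedup_eligible(size_history_gb: list[float], model_limit_gb: float) -> bool:
    """True only if the volume has never been larger than the model's limit.

    The check uses the largest size the volume has ever had, not its current
    size, so shrinking an oversized volume does not make it eligible.
    """
    if not size_history_gb:
        raise ValueError("need at least one recorded volume size")
    return max(size_history_gb) <= model_limit_gb


HYPOTHETICAL_LIMIT_GB = 1000.0  # placeholder; look up your model's real limit in TR-3505

print(dedup_eligible([400.0, 750.0], HYPOTHETICAL_LIMIT_GB))          # True
print(dedup_eligible([1200.0, 900.0, 800.0], HYPOTHETICAL_LIMIT_GB))  # False: was once too big
```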

Once it’s running, you can check the status with:

sis status /vol/<volname>

After it’s finished running, you can see your space savings like this:

df -s /vol/<volname>

After running deduplication on a small NFS volume that housed only three VMs, the “df -s” command showed a space savings of 64%. That’s pretty impressive!
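As a rough guide to reading that percentage, here is a Python sketch assuming “df -s” reports %saved as saved / (used + saved), that is, as a share of the logical data before deduplication. The numbers are hypothetical, chosen only to land on the same 64%; they are not the actual output from my volume.

```python
def percent_saved(used_kb: int, saved_kb: int) -> float:
    """Savings as a share of the logical (pre-deduplication) data."""
    logical_kb = used_kb + saved_kb
    return 0.0 if logical_kb == 0 else saved_kb / logical_kb * 100.0


def logical_per_physical(used_kb: int, saved_kb: int) -> float:
    """KB of VM data represented by each KB of disk actually consumed."""
    return (used_kb + saved_kb) / used_kb


used, saved = 9_000_000, 16_000_000  # hypothetical values, not the author's results
print(f"{percent_saved(used, saved):.0f}% saved")                    # 64% saved
print(f"{logical_per_physical(used, saved):.2f}x logical/physical")  # 2.78x
```

Put another way, under that assumption a 64% savings means each block on disk is standing in for almost three blocks of VM data.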

Moving forward, deduplication will run automatically every night at midnight, as you can confirm with this command:

sis config /vol/<volname>
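If you have several volumes to set up, a small Python sketch can generate the console commands from this walkthrough in order. The volume names and license codes below are placeholders. The “sis start -s” line is an addition of mine that, as far as I know, tells Data ONTAP to scan data already on the volume rather than waiting for the nightly run; verify it against the documentation for your Data ONTAP release.

```python
def dedup_commands(volumes: list[str], nearstore_code: str, dedup_code: str) -> list[str]:
    """Build the Data ONTAP console commands from this walkthrough, in order."""
    cmds = [
        f"license add {nearstore_code}",  # NearStore first: it is a prerequisite
        f"license add {dedup_code}",
    ]
    for vol in volumes:
        path = f"/vol/{vol}"
        cmds += [
            f"sis on {path}",        # enable deduplication on this volume
            f"sis start -s {path}",  # assumption: scan data already on the volume
            f"sis status {path}",    # check progress
            f"df -s {path}",         # space savings once it has finished
            f"sis config {path}",    # show the schedule (nightly at midnight)
        ]
    return cmds


for line in dedup_commands(["vmnfs01", "vmnfs02"], "<NearStore code>", "<dedup code>"):
    print(line)
```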

That should be enough to get most everyone started. Feel free to post comments or corrections below.


A niggling doubt about thin provisioned disks was placed in my head when I read Duncan’s article on a Storage VMotion problem; in that article, a statement is made that ESX Server 3.5 no longer supports thin provisioned disks. Intrigued by that comment, I started doing some digging to see if that was actually the case. I was unable to find any concrete statement one way or the other.

Some testing in my lab showed that with ESX Server 3.5, VMDKs are still thin provisioned by default when stored on an NFS datastore. So that put to rest the idea that thin provisioned disks had been abandoned, but now I was curious to follow up on the issue of how ESX Server handled cloning thin provisioned disks, as mentioned in Virtualization Short Take #1.

Additional testing showed that although the VMDKs are indeed thin provisioned at the beginning of their life, they won’t necessarily stay that way:

  • Migrating a VM’s files from one datastore to another, even if the destination datastore is also NFS, will cause the VMDKs to revert to thick (fully allocated) VMDKs.
  • Clones made from the thin provisioned disks have thick provisioned VMDKs.
  • A Storage VMotion operation will cause the disks to become fully allocated instead of thin provisioned.
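One way to check whether a VMDK on an NFS datastore is still thin is to compare its apparent size with the space actually allocated, much like comparing “ls -l” with “du”. This Python sketch assumes you can see the datastore from an NFS client and that the client reports allocated blocks in 512-byte units, as Linux does. The 0.9 threshold is arbitrary, and a thin disk that has been filled will legitimately look thick.

```python
import os


def allocation_ratio(apparent_bytes: int, allocated_bytes: int) -> float:
    """Fraction of the disk's provisioned size that is allocated on storage."""
    return 0.0 if apparent_bytes == 0 else allocated_bytes / apparent_bytes


def vmdk_allocation(path: str) -> tuple[int, int]:
    """Return (apparent size, allocated bytes) for a file, e.g. vm1-flat.vmdk.

    st_blocks counts 512-byte units on Linux and most Unix systems.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def looks_thin(path: str, threshold: float = 0.9) -> bool:
    """Heuristic: thin if noticeably less than the provisioned size is allocated."""
    apparent, allocated = vmdk_allocation(path)
    return allocation_ratio(apparent, allocated) < threshold


# Hypothetical path on an NFS-mounted datastore:
# looks_thin("/mnt/datastore/vm1/vm1-flat.vmdk")
```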

This is a strong counterpoint to the arguments in favor of using NFS for your VMware storage in order to gain thin provisioned disks. In order to really take advantage of thin provisioned disks, every VM must be provisioned from scratch—no cloning within VirtualCenter—and you must give up Storage VMotion or cold migration of the VMDKs.

So far, I have not found any workaround for this behavior. If anyone knows of a workaround, please share it in the comments. (To be honest, I don’t really expect to find one.)


A friend of mine at Network Appliance was one of the presenters last year at VMworld 2007 for the now-famous presentation that showed a solution from Network Appliance where 100 VMs are created in just a couple of minutes. It’s great technology that is extremely useful in exactly those kinds of situations. I love it.

The video is so popular that it’s even been posted to YouTube. (By the way, did you know I’m on YouTube? My kids think that’s the greatest thing in the world, but I’m not so convinced.)

And, according to Manlio, it appears that they are showing off this kind of thing again at VMworld Europe 2008.

But it’s technology that’s not available yet.

Yep, that’s right. It’s not available yet. It’s based on new functionality, related to their existing FlexClone functionality (which I’ve blogged about before), that is due to be released very soon. Combine this new functionality with NFS on a NetApp storage system and you’ll be able to do exactly what NetApp is demonstrating. But not today…not until these new features are made available to the public.

That bothers me. I suppose it shouldn’t; after all, you’ve got all sorts of vendors talking about what their products can do when those products aren’t yet available. Microsoft Hyper-V is one example: it’s not available yet, won’t be until later this year, and yet Microsoft is showing it off. VMware is doing the same thing with the Continuous HA stuff they demoed at VMworld 2007, and VMware has done the same thing this year with offline VDI and scalable virtual image technology.

So, if you’re thinking about a huge VDI deployment and planning on putting that on NetApp storage, that’s fine because there are plenty of other reasons to use Network Appliance—deduplication, anyone? But don’t plan on being able to take advantage of some of this highly touted functionality until it is publicly released.

UPDATE: Another colleague of mine at NetApp wrote me to clarify that the file-level cloning functionality demonstrated in the video is not, technically speaking, related to FlexClone functionality since FlexClone operates on a per-volume basis. I might argue that they both appear to exploit the same underlying functionality in WAFL, but I don’t know that for certain and at that point we’re splitting hairs anyway.


Proving VMware Over NFS

I’m a fan of using NFS for VMware; I’ve mentioned it before. I’m not the only one, either; there have been a number of recent blog entries from various people regarding the use of NFS for VMware:

Virtual Optics: Why VMware over NetApp NFS

Storage Nuts & Bolts: VMware over NetApp NFS: A Customer’s Testimonial

It’s great that this is receiving more attention in the spotlight, but I still have one question: where are the statistics proving NFS’ value in VMware deployments?

If NFS is equally as good as Fibre Channel or iSCSI—and personally I agree with Nick that most deployments would be hard-pressed to tell the difference—then where are the stats that show this? Or is it impossible to demonstrate that a VMware deployment on NFS is “just as good” as one on Fibre Channel or iSCSI? Does the value of NFS come in subjective measurements that can’t be quantified, and perhaps that’s why we haven’t seen any hard proof of the value of NFS when compared to Fibre Channel or iSCSI?

If you have some insight, please share it in the comments. I’d love to hear everyone’s thoughts on the matter.


I guess I’m on a bit of a NetApp kick this week.  After discussing (or perhaps revisiting) the idea of recovering files inside VMs using NetApp Snapshots (first here late last year, then again here), I wanted to take a closer look at full VM recovery using NetApp Snapshots.

First of all, it should go without saying that you should never use any of the procedures I’m describing here without first testing them yourself.  While they worked fine for me, they may not work fine for you.  Don’t just assume they will!  Do the due diligence and test it in your environment first; you’ll be glad you did.

Second, before using NetApp Snapshots to recover VM data (file-level or full VM), be sure you are getting good, consistent Snapshots. The Network Appliance Technical Reports Library has a number of excellent articles on this subject; I’ll refer you there for more information.

I’ll break this article into two sections, one for block-level storage (I’m using iSCSI, but the process should be almost identical for Fibre Channel) and one for NAS/NFS.  Please note that I’m not focusing so much on the specific steps that are required as I am on general concepts and any gotchas that may arise during the process.

Full VM Recovery using Block Storage

To recover a full VM using block-level storage, a number of steps have to be taken:

  1. Create a LUN clone (or a FlexClone) of the original LUN based on a Snapshot. 
  2. Enable resignaturing on the ESX host(s) that will need to see the cloned LUN.
  3. Mount the cloned LUN(s) on the ESX host(s) and copy the appropriate VM files from the clone to the production LUN.

For the first two steps, I’ll refer you back to one of my first articles on VMware data recovery with Snapshots, which has more information on the necessary commands and settings.

For the third step, you’ll need to log in to the Service Console (typically via SSH) and copy the desired VM(s), along with all their files, from the cloned datastore to the production datastore, overwriting whatever is in the destination (you typically wouldn’t need to recover a full VM unless the production VM was hosed, right?). Once the files have been copied back to the production datastore, dismount the cloned datastore and destroy it.
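The copy in the third step boils down to “overwrite one VM folder with its twin from the clone.” Here is that logic as a Python sketch. It assumes Python 3.8 or later (for dirs_exist_ok), which the ESX 3.x Service Console may not provide; there, a plain cp does the same job. The datastore paths are hypothetical. The replace flag reflects a design choice: without it, files that exist only in production (such as logs written after the Snapshot) are left behind.

```python
import shutil
from pathlib import Path


def restore_vm_folder(clone_ds: Path, prod_ds: Path, vm_name: str,
                      replace: bool = False) -> list[str]:
    """Copy one VM's folder from the cloned datastore over the production copy.

    replace=False overwrites matching files but keeps production-only files;
    replace=True deletes the production folder first.
    Returns the names of the files that were restored.
    """
    src, dst = clone_ds / vm_name, prod_ds / vm_name
    if not src.is_dir():
        raise FileNotFoundError(f"{src} not found on the cloned datastore")
    if replace and dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(src, dst, dirs_exist_ok=True)  # dirs_exist_ok needs Python 3.8+
    return sorted(p.name for p in src.iterdir() if p.is_file())


# Hypothetical datastore paths; substitute the names your hosts actually show:
# restore_vm_folder(Path("/vmfs/volumes/cloned-ds"), Path("/vmfs/volumes/prod-ds"), "vm1")
```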

You should now be able to boot up your VM at the state it was in at the time of the Snapshot used to recover it.  Unless the Snapshot was a cold Snapshot (taken while the VM was powered off), the VM will perform a file system check (chkdsk or fsck) when it boots up.

Full VM Recovery using NFS

The procedure for recovering full VMs when using NFS is even easier:

  1. Using an NFS client, mount the NFS export and navigate to the hidden “.snapshot” directory.
  2. In the “.snapshot” directory, find the Snapshot from which you wish to recover the VM.
  3. Copy that VM’s files (the entire folder) out of the “.snapshot” directory into the production filesystem, replacing the current contents (again, this assumes that what’s in the production filesystem is no good, else why would you be recovering a full VM?).
  4. Unmount the NFS export from your NFS client.
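The NFS procedure above can be sketched in Python for a client that has the export mounted. The mount point, VM folder, and Snapshot name (“nightly.0”) are hypothetical examples; list the “.snapshot” directory to see the real Snapshot names. Like the procedure itself, this assumes the production copy is unusable and deletes it before copying.

```python
import shutil
from pathlib import Path


def snapshot_path(mount: Path, snapshot: str, vm_folder: str, cifs: bool = False) -> Path:
    """Location of a VM folder inside a NetApp Snapshot, as seen by a client.

    NFS clients see Snapshots under the hidden ".snapshot" directory; per the
    author, CIFS clients would use "~snapshot" instead.
    """
    return mount / ("~snapshot" if cifs else ".snapshot") / snapshot / vm_folder


def recover_vm(mount: Path, snapshot: str, vm_folder: str) -> Path:
    """Replace the production VM folder with the copy held in the Snapshot."""
    src = snapshot_path(mount, snapshot, vm_folder)
    dst = mount / vm_folder
    if not src.is_dir():
        raise FileNotFoundError(src)
    if dst.exists():
        shutil.rmtree(dst)  # the production copy is assumed to be unusable
    shutil.copytree(src, dst)
    return dst


# Hypothetical usage on a client with the export mounted at /mnt/vmstore:
# recover_vm(Path("/mnt/vmstore"), "nightly.0", "vm1")
```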

The recovered VM should now boot and be back to the point in time at which the Snapshot was taken. Again, unless the Snapshot was a cold Snapshot, the VM will likely perform a file system check upon boot. This is normal.

I suppose you could even do this second procedure from a CIFS client, assuming that CIFS and NFS were both configured on the storage system and an appropriate CIFS share existed.  (Please note that I’ve never tried this, so I can’t tell you what the results might be.)  In that case, use the “~snapshot” directory instead of “.snapshot”.

And that’s it—there you have two ways of recovering entire VMs using Network Appliance Snapshots.  As always, feel free to hit me up in the comments with any questions, thoughts, corrections, or rants (just keep the rants on-topic, please!).  Thanks for reading!


Nifty NFS-VMware Trick

I can take absolutely zero credit for this idea; it came completely from this article by Nick Triantos. But the trick is so absolutely cool, so incredibly useful, and yet so obvious (once you read it, you’ll smack yourself in the head and say, “Why didn’t I think of that?”) that I just had to say something about it.

The use of NFS is getting more and more attention (I blogged about it briefly a few days ago) as a primary storage technology for VMware deployments. Although NFS lacks the raw throughput of Fibre Channel, once you start loading up VMs in a datastore, NFS begins to look more and more attractive. But performance is only part of the allure here, especially when using something like a Network Appliance storage system with its Snapshot functionality. (Yes, other vendors can do the same kinds of things. Substitute your favorite vendor or filesystem here, if you so desire. I would imagine you could do something similar with ZFS.)

The basic gist of the article (I do encourage you to go read it; I’ve already added it to my del.icio.us bookmarks) is to use NetApp Snapshots to gain access to VMware’s VMDK files (even while the VM is running), and Linux with the Linux-NTFS driver to mount virtual machine disk files over NFS for file-level backups of both Windows and Linux guest VMs.  Now that’s something not even VCB can do (VCB file-level backups are limited to Windows guests).  Pretty cool, if you ask me.
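One practical detail when mounting a VMDK on Linux is finding where the guest partition starts inside the disk file. Here is a Python sketch that reads the MBR partition table from the first sector of a flat VMDK extent (the “-flat.vmdk” file) and reports each partition’s byte offset. It assumes an MBR-partitioned guest disk and is my own illustration, not necessarily the method Nick’s article uses. A partition starting at sector 63, common for Windows guests of this era, works out to offset 32256.

```python
import struct

SECTOR = 512


def partition_offsets(mbr: bytes) -> list[tuple[int, int]]:
    """Return (partition type, byte offset) for each used primary MBR entry.

    The table starts at byte 446: four 16-byte entries, with the type at
    byte 4 and the starting LBA (little-endian uint32) at bytes 8-11 of each.
    """
    if len(mbr) < SECTOR or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no valid MBR signature")
    result = []
    for i in range(4):
        entry = mbr[446 + 16 * i: 446 + 16 * (i + 1)]
        ptype = entry[4]
        (start_lba,) = struct.unpack_from("<I", entry, 8)
        if ptype != 0:  # type 0 marks an unused entry
            result.append((ptype, start_lba * SECTOR))
    return result


def offsets_for_flat_vmdk(path: str) -> list[tuple[int, int]]:
    """Read the first sector of a flat VMDK extent, e.g. vm1-flat.vmdk."""
    with open(path, "rb") as f:
        return partition_offsets(f.read(SECTOR))


# Example: an NTFS partition (type 0x07) at offset 32256 could then be mounted
# read-only with something like:
#   mount -t ntfs -o ro,loop,offset=32256 vm1-flat.vmdk /mnt/vm1
```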


NFS for VMware Storage

I thought that I had blogged here before about using NFS for VMware storage, but it appears that I have not.  (I guess that’s one of the downfalls of a fairly long-running weblog—you blog about some things too often and not at all about other topics.)  In any case, following some of the VMworld breakout sessions last week, NFS is getting a lot more attention these days as the storage protocol for VMware.

A couple of recent blog entries on this topic caught my attention:

Eisler’s NFS Blog - VMware over NFS?
Storage - VMware over NFS

Network Appliance seems to be talking the most about NFS for VMware, which kind of makes sense given their history in NFS.  I’m using NFS in our lab (which uses NetApp storage systems) and have had nothing but positive experiences thus far.  I have not yet had the opportunity to conduct any performance tests, but I do plan to try to work up some numbers on NFS vs. software-based iSCSI.  I can’t, unfortunately, compare to Fibre Channel as I have no FC infrastructure in the lab (yet).

I’d love to hear feedback from any readers that might be using NFS for VMware storage.  What have your experiences been?


I just wrapped up two different sessions on IP-based storage, one on iSCSI configuration and one on performance characteristics and comparisons between iSCSI and NFS.

I couldn’t liveblog the first session because it was too crowded (no room to type on my laptop) and I couldn’t get a wireless signal from the VMworld 2007 network. There are, however, a couple of key points from the session that stick out in my mind:

  • ESX Server does not currently support MCS (multiple connections per session) or jumbo frames, two key optimizations that can really help with iSCSI performance.  There is no word yet on when those shortcomings will be addressed; personally, I’m hoping that VMware fixes them in ESX 3.5.
  • There is, apparently, some way of performing manual load balancing of iSCSI LUNs to help improve performance.  The speakers did not go into any great details, and I was unable to speak with one of the presenters, Jon Hall, after the session.  He did, however, invite me to contact him via e-mail, so I’ll post more information on that once I’ve had some communication with him.

Most of the rest of the information presented in that session was pretty straightforward and was information I’d already seen. All in all, it was a decent session, but I didn’t get as much information from it as I had hoped I would.

After lunch, I returned to the Moscone Center for a session titled “NFS and iSCSI - Performance Characterization and Best Practices.”  I was really hoping to get some additional best practices on using NFS and iSCSI and on maximizing performance with these IP-based storage solutions.

The session started with some performance characteristics with ESX Server today vs. ESX Server 3.0.1; basically, it was an update of some performance data presented last year at VMworld 2006.

These updated performance statistics are intended to show the results of optimizations that have been incorporated into ESX Server. These include improved and more accurate CPU accounting (which improves load balancing across VMs), improved PAE support, minimized NUMA overhead, reduced CPU cost per I/O, increased maximum transfer sizes, and the ability to handle more concurrent I/Os.

As a result, software iSCSI sees a range of improvements since ESX Server 3.0.1, as high as 15% for 8K block writes, with reductions in latency across the board and reductions in CPU utilization as well. Read operations will show the greatest improvements.

Hardware iSCSI sees dramatic improvements at smaller block sizes, but the larger block sizes are essentially unchanged. The same goes for latency, and the reductions in CPU utilization for hardware iSCSI share the same characteristics as for software iSCSI (but keep in mind that the absolute change, as opposed to the percentage change, will be greater for software iSCSI). With hardware iSCSI, mixed read-write operations will benefit more than read-only operations.

Differences between VMFS and RDM (raw device mapping) are inconsequential (less than 2.5%); the only significant difference is CPU utilization, where VMFS requires more CPU time than RDM.

Comparisons of hardware iSCSI, software iSCSI, and NFS with regard to throughput show figures that are not entirely unexpected. NFS is slightly slower than both flavors of iSCSI and has greater latency. However, all of the measured latencies were in milliseconds, so the difference is not terribly significant.

Moving into performance best practices, the presenter started with the storage array itself, and provided the typical list of items to consider:  total spindle count, number of spindles allocated for use, RAID level and stripe size, storage processor specifications, read/write cache sizes, and caching policies.  This is all pretty standard information that is applicable in sizing a correct storage solution, independent of a virtualization implementation.  (I would use those same counters to size a Microsoft Exchange storage solution or an Oracle storage solution, for example.)

Since we are talking about IP-based storage, network configuration comes into play here, including such things as the network topology, switches, NICs, and flavor of iSCSI (hardware or software). Similarly, characteristics of the ESX Server host like CPU speed and number of CPU cores, overall system architecture, bus speed, I/O subsystems, and memory configuration all play a part in determining the performance of IP-based storage solutions.

Finally, it’s important to understand the characteristics of the workload(s), such as I/O sizes, read/write patterns, and dependence upon aggregate throughput or latency.
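For what it’s worth, the generic factors above (spindle count, RAID level, read/write mix) are usually tied together with a classic back-of-the-envelope sizing formula: back-end IOPS equal reads plus writes multiplied by a RAID write penalty, divided by per-disk IOPS. Here is that rule of thumb as a Python sketch. The penalties are common textbook values, not vendor figures; arrays with large write caches, or NetApp’s RAID-DP on WAFL, behave differently. The workload and per-disk numbers are hypothetical.

```python
import math

# Rule-of-thumb back-end I/Os per front-end write; a starting assumption only.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}


def spindles_needed(front_end_iops: float, read_fraction: float,
                    raid: str, per_disk_iops: float) -> int:
    """Estimate spindles needed to absorb a workload's back-end I/O."""
    reads = front_end_iops * read_fraction
    writes = front_end_iops * (1.0 - read_fraction)
    back_end = reads + writes * WRITE_PENALTY[raid]
    return math.ceil(back_end / per_disk_iops)


# Hypothetical workload: 2,000 IOPS, 60% reads, disks good for ~180 IOPS each.
for level in ("RAID10", "RAID5", "RAID6"):
    print(level, spindles_needed(2000, 0.6, level, 180))
# RAID10 16
# RAID5 25
# RAID6 34
```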

<aside>OK, can we move past this stuff now?  This is all basic stuff that isn’t necessarily specific in any way to virtualization.  I want to see best practices for using IP-based storage with VMware!</aside>

Using multiple NFS mount points may improve aggregate throughput, at the cost of slightly higher CPU utilization. NFS export options can affect performance as well. (OK, which options? Telling us that without telling us which options is kind of like leaving us hanging.)

iSCSI digests may or may not have an impact on performance: iSCSI header digests have little or no impact, while turning off iSCSI data digests can improve performance.

The presenter went over some additional troubleshooting tips, and the slide briefly mentioned the vsish command.  I hadn’t heard of that command; anyone know of where I can find some additional information on vsish?

Wrapping up the session, the presenter went through a few scenarios involving performance troubleshooting with both iSCSI and NFS.  Overall, I did not find this session to be nearly as helpful as I had hoped it would be, the presenter was not engaging, and the presentation did not provide the kind of detailed information that I felt should have been included.  (Examples: mentioning that some NFS export options affect performance, but failing to mention which options, or stating that there is a VMware knowledge base article about a topic but failing to provide the KB article number or URL).

