NFS

You are currently browsing articles tagged NFS.

By Aaron Delp
Twitter: aarondelp
FriendFeed (Delicious, Twitter, & all my blogs in one spot): aarondelp

This week I’ve had the privilege of attending a Cisco Nexus 5000/7000 class. I have learned a tremendous amount about FCoE this week and after some conversations with Scott about the topic, I wanted to tackle it one more time from a different point of view. I have included a list of some of Scott’s FCoE articles at the bottom for those interested in a more in-depth analysis.

Disclaimer: I am by no means an FCoE expert! My working knowledge of FCoE is about four days old at this point. If I am incorrect in some of my situations below, please let me know (keep it nice and professional, people!) and I will be happy to make adjustments.

If you are an existing VMWare customer today with FC as your storage transport layer, should you be thinking about FCoE? How would you get started? What investments can you make in the near future to prepare you for the next generation?

These are questions I am starting to get from my customers in planning/white board sessions around VMware vSphere and the next generation of virtualization. The upgrade to vSphere is starting to prompt planning and discussions around the storage infrastructure.

Before I tackle the design aspect, let me start out with some hardware and definitions.

Cisco Nexus 5000 series switch: The Nexus 5K is a Layer 2 switch that is capable of FCoE and can provide both Ethernet and FC ports (with an expansion module). In addition to Ethernet switching, the switch also operates as an FC fabric switch providing full fabric services, or it can be set for N_Port Virtualization (NPV) mode. The Nexus 5K can’t be put in FC switch mode and NPV mode at the same time. You must pick one or the other.

N_Port Virtualization (NPV) mode: NPV allows the Nexus 5K to act as a FC “pass thru” or proxy. NPV is great for environments where the existing fabric is not Cisco and merging the fabrics could be ugly. There is a downside to this. In NPV mode, no targets (storage arrays) can be hung off the Nexus 5K. This is true for both FC and FCoE targets.

Converged Network Adapter (CNA): A CNA is single PCI card that contains both FC and Ethernet logic, negating the need for separate cards, separate switches, etc.

Now that the definitions and terminology is out of the way, I see four possible paths if you have FC in your environment today.

1. FCoE with a Nexus 5000 in a non-Cisco MDS environment (merging)

In this scenario, the easiest way to get the Nexus on the existing non-Cisco FC fabric is to put the switch in NPV mode. You could put the switch in interop mode (and all the existing FC switches), but it is a nightmare to get them all talking and you often lose vendor specific features in interop mode. Plus, to configure interop mode, the entire fabric has to be brought down. (You do have redundant fabrics, right?)

With the Nexus in NPV mode, what will it do? Not much. You can’t hang storage off of it. You aren’t taking advantage of Cisco VSANs or any other features that Cisco can provide. You are merely a pass thru. The zoning is handled by your existing switches; your storage is off the existing switches, etc.

Why would you do this? By doing this, you could put CNAs in new servers (leaving the existing servers alone) to talk to the Nexus. This will simplify the server side infrastructure because you will have fewer cables, cards, switch ports, etc. Does the cost of the CNA and new infrastructure offset the cost of just continuing the old environment? That is for you to decide.

2. FCoE with a Nexus 5000 in a non-Cisco MDS environment (non-merging)

Who says you have to put the Nexus into the existing FC fabric? We have many customers that purchase “data centers in a box”. By that I mean a few servers, FC and/or Ethernet switches, storage, and VMware all in one solution. This “box” sits in the data center and the network is merged with the legacy network, but we stand up a Cisco SAN next to the existing non-Cisco SAN and just not let them talk to each other. In this instance, we would use CNAs in the servers, Nexus as the switch, and you pick a storage vendor. This will work just like option 3.

3. FCoE with a Nexus 5000 in a Cisco MDS environment

Now we’re talking. Install the Nexus in FC switch mode, merge it with the MDS fabric, put CNAs in all the servers and install the storage off the Nexus as either FC or FCoE. You’re off to the races!

You potentially gain the same server side savings by replacing FC and Ethernet in new servers with CNAs. You are able to use all of the Cisco sexy features of FCoE. Nice solution if the cost is justified in your environment.

4. Keep the existing environment and use NFS to new servers

What did I just say? Why would I even consider that option?

OK, this last one is a little tongue-in-cheek for customers that are already using FC. The NFS vs. traditional storage for VMWare is a bit of a religious debate. I know you aren’t going to sway me and I know I’m not going to sway you.

I admit I’m thinking NetApp here in a VMWare environment; I’m a big fan so this is a biased opinion. NetApp is my background but other vendors play in this space as well. I bet Chad will be happy to leave a comment to help tell us why (and I hope he does!).

Think of it this way. You’re already moving from FC cards to CNAs. Why not buy regular 10Gb Ethernet cards instead? Why not just use the Nexus 5K as a line-rate, non-blocking 10Gb Ethernet switch? This configuration is very simple compared to FCoE at the Nexus level and management of the NetApp is very easy! Besides, you could always turn up FCoE on the Nexus (and the NetApp) at a future date.

In closing, I really like FCoE but as you can see it isn’t a perfect fit today for all environments. I really see this taking off in 1-2 years and I can’t wait. Until then, use caution and ask all the right questions!

If you are interested in some more in-depth discussion, here are links to some of Scott’s articles on FCoE:

Continuing the FCoE Discussion
Why No Multi-Hop FCoE?
There Might Be an FCoE End to End Solution

Tags: , , , , , , , ,

Author’s Note: This content was first published over at Storage Monkeys, but it appears that it has since disappeared and is no longer available. For that reason, I’m republishing it here (with minor edits). Where applicable, I’ll also be republishing other old content from that site in the coming weeks. Thanks!

In this article, I’m going to tackle what will probably be a sensitive topic for some readers: VMware over NFS. All across the Internet, I run into article after article after article that sings the praises of NFS for VMware. Consider some of the following examples:

That first link looks to be mostly a reprint of this blog post by Nick Triantos. Now, Nick is a solid storage engineer; there is no question in my mind that he knows Fibre Channel, iSCSI, and NFS inside out. Nick is certainly someone who is more than qualified to speak to the validity of using NFS for VMware storage. But…

I am going to have to disagree with some of the statements that are being propagated about NFS for VMware storage. Is NFS for VMware environments a valid choice? Yes, absolutely. However, there are some myths about NFS for VMware storage that need to be addressed.

  1. Myth #1: All VMDKs are thin provisioned by default with NFS, and that saves significant amounts of storage space. That’s true—to a certain point. What I pointed out back in March of 2008, though, was that these VMDKs are only thin provisioned at the beginning. What does that mean? Perform a Storage VMotion operation to move those VMDKs from one NFS datastore to a different NFS datastore, and the VMDK will inflate to become a thick provisioned file. Clone another VM from the VM with the thin provisioned disks, and you’ll find that the cloned VM has thick VMDKs. That’s right—the only way to get those thin provisioned VMDKs is to create all your VMs from scratch. Is that what you really want to do? (Note: VMware vSphere now supports thin provisioned VMDKs on all storage platforms, and corrects the issues with thin provisioned VMDKs inflating due to a Storage VMotion or cloning operation, so this point is somewhat dated.)
  2. Myth #2: NFS uses Ethernet as the transport, so I can just add more network connections to scale the bandwidth. Well, not exactly. Yes, it is possible to add Ethernet links and get more bandwidth. However, you’ll have to deal with a whole list of issues: link aggregation/802.3ad, physical switch redundancy (which is further complicated when you want to use link aggregation/802.3ad), multiple IP addresses on the NFS server(s), multiple VMkernel ports on the VMware ESX servers, and multiple IP subnets. Let’s just say that scaling NFS bandwidth with VMware ESX isn’t as straightforward as it may seem. This article I wrote back in July of 2008 may help shed some light on the particulars that are involved when it comes to ESX and NIC utilization.
  3. Myth #3: Performance over NFS is better than Fibre Channel or iSCSI. Based on this technical report by NetApp—no doubt one of the biggest proponents of NFS for VMware storage—NFS performance trails Fibre Channel, although by less than 10%. So, performance is comparable in almost all cases, and the difference is small enough not to be noticeable. The numbers do not, however, indicate that NFS is better than Fibre Channel. You can read my views on this storage protocol comparison at my site. By the way, also check the comments; you’ll see that the results in the technical report were independently verified by VMware as well. Based on this information, someone could certainly say that NFS performance is perfectly reasonable, but one could not say that NFS performance is better than Fibre Channel.

Now, one might look at this article and say, “Scott, you hate NFS!” No, actually, I like using NFS for VMware Infrastructure implementations, and here’s why:

  • Provisioning is a breeze. It’s dead simple to add NFS datastores.
  • You can easily (depending upon the storage platform) increase or decrease the size of NFS datastores. Try decreasing the size of a VMFS datastore and see what happens!
  • You don’t need to deal with the complexity of a Fibre Channel fabric, switches, WWNs, zones, ISLs, and all that. Now, there is some complexity involved (see Myth #2 above), but it’s generally easier than Fibre Channel. Unless you’re a Fibre Channel expert, of course…

So there are some tangible benefits to using NFS for VMware Infrastructure. But let’s be real about this, and not try to gloss over technical details. While NFS has some real advantages, it also has some real disadvantages, and organizations choosing a storage protocol need to understand both sides of the coin.

Tags: , , ,

In April 2008, I wrote an article on how to use jumbo frames with VMware ESX and IP-based storage (NFS or iSCSI). It’s been a pretty popular post, ranking right up there with the ever-popular article on VMware ESX, NIC teaming, and VLAN trunks.

Since I started working with VMware vSphere (now officially available as of 5/21/2009), I have been evaluating how to replicate the same sort of setup using ESX/ESXi 4.0. For the most part, the configuration of VMkernel ports to use jumbo frames on ESX/ESXi 4.0 is much the same as with previous versions of ESX and ESXi, with one significant exception: the vNetwork Distributed Switch (vDS, what I’ll call a dvSwitch). After a fair amount of testing, I’m pleased to present some instructions on how to configure VMkernel ports for jumbo frames on a dvSwitch.

How I Tested

The lab configuration for this testing was pretty straightforward:

  • For the physical server hardware, I used a group of HP ProLiant DL385 G2 servers with dual-core AMD Opteron processors and a quad-port PCIe Intel Gigabit Ethernet NIC.
  • All the HP ProLiant DL385 G2 servers were running the GA builds of ESX 4.0, managed by a separate physical server running the GA build of vCenter Server.
  • The ESX servers participated in a DRS/HA cluster and a single dvSwitch. The dvSwitch was configured for 4 uplinks. All other settings on the dvSwitch were left at the defaults.
  • For the physical switch infrastructure, I used a Cisco Catalyst 3560G running Cisco IOS version 12.2(25)SEB4.
  • For the storage system, I used an older NetApp FAS940. The FAS940 was running Data ONTAP 7.2.4.

Keep in mind that these procedures or commands may be different in your environment, so plan accordingly.

Physical Network Configuration

Refer back to my first article on jumbo frames to review the Cisco IOS commands for configuring the physical switch to support jumbo frames. Once the physical switch is ready to support jumbo frames, you can proceed with configuring the virtual environment.

Virtual Network Configuration

The virtual network configuration consists of several steps. First, you must configure the dvSwitch to support jumbo frames by increasing the MTU. Second, you must create a distributed virtual port group (dvPort group) on the dvSwitch. Finally, you must create the VMkernel ports with the correct MTU. Each of these steps is explained in more detail below.

Setting the MTU on the dvSwitch

Setting the MTU on the dvSwitch is pretty straightforward:

  1. In the vSphere Client, navigate to the Networking inventory view (select View > Inventory > Networking from the menu).
  2. Right-click on the dvSwitch and select Edit Settings.
  3. From the Properties tab, select Advanced.
  4. Set the MTU to 9000.
  5. Click OK.

That’s it! Now, if only the rest of the process was this easy…

By the way, this same area is also where you can enable Cisco Discovery Protocol support for the dvSwitch, as I pointed out in this recent article.

Creating the dvPort Group

Like setting the MTU on the dvSwitch, this process is pretty straightforward and easily accomplished using the vSphere Client:

  1. In the vSphere Client, navigate to the Networking inventory view (select View > Inventory > Networking from the menu).
  2. Right-click on the dvSwitch and select New Port Group.
  3. Set the name of the new dvPort group.
  4. Set the number of ports for the new dvPort group.
  5. In the vast majority of instances, you’ll want to set VLAN Type to VLAN and then set the VLAN ID accordingly. (This is the same as setting the VLAN ID for a port group on a vSwitch.)
  6. Click Next.
  7. Click Finish.

See? I told you it was pretty straightforward. Now on to the final step which, unfortunately, won’t be quite so straightforward or easy.

Creating a VMkernel Port With Jumbo Frames

Now things get a bit more interesting. As of the GA code, the vSphere Client UI still does not expose an MTU setting for VMkernel ports, so we are still relegated to using the esxcfg-vswitch command (or the vicfg-vswitch command in the vSphere Management Assistant—or vMA—if you are using ESXi). The wrinkle comes in the fact that we want to create a VMkernel port attached to a dvPort ID, which is a bit more complicated than simply creating a VMkernel port attached to a local vSwitch.

Disclaimer: There may be an easier way than the process I describe here. If there is, please feel free to post it in the comments or shoot me an e-mail.

First, you’ll need to prepare yourself. Open the vSphere Client and navigate to the Hosts and Clusters inventory view. At the same time, open an SSH session to one of the hosts you’ll be configuring, and use “su -” to assume root privileges. (You’re not logging in remotely as root, are you?) If you are using ESXi, then obviously you’d want to open a session to your vMA and be prepared to run the commands there. I’ll assume you’re working with ESX.

This is a two-step process. You’ll need to repeat this process for each VMkernel port that you want to create with jumbo frame support.

Here are the steps to create a jumbo frames-enabled VMkernel port:

  1. Select the host and and go the Configuration tab.
  2. Select Networking and change the view to Distributed Virtual Switch.
  3. Click the Manage Virtual Adapters link.
  4. In the Manage Virtual Adapters dialog box, click the Add link.
  5. Select New Virtual Adapter, then click Next.
  6. Select VMkernel, then click Next.
  7. Select the appropriate port group, then click Next.
  8. Provide the appropriate IP addressing information and click Next when you are finished.
  9. Click Finish. This returns you to the Manage Virtual Adapters dialog box.

From this point on you’ll go the rest of the way from the command line. However, leave the Manage Virtual Adapters dialog box open and the vSphere Client running.

To finish the process from the command line:

  1. Type the following command (that’s a lowercase L) to show the current virtual switching configuration:
    esxcfg-vswitch -l
    At the bottom of the listing you will see the dvPort IDs listed. Make a note of the dvPort ID for the VMkernel port you just created using the vSphere Client. It will be a larger number, like 266 or 139.
  2. Delete the VMkernel port you just created:
    esxcfg-vmknic -d <dvPort ID>
  3. Recreate the VMkernel port and attach it to the very same dvPort ID:
    esxcfg-vmknic -a -i <IP addr> -n <Mask> -m 9000 <dvPort ID>
  4. Use the esxcfg-vswitch command again to verify that a new VMkernel port has been created and attached to the same dvPort ID as the original VMkernel port.

At this point, you can go back into the vSphere Client and enable the VMkernel port for VMotion or FT logging. I’ve tested jumbo frames using VMotion and everything is fine; I haven’t tested FT logging with jumbo frames as I don’t have FT-compatible CPUs. (Anyone care to donate some?)

As I mentioned in yesterday’s Twitter post, I haven’t conducted any objective performance tests yet, so don’t ask. I can say that NFS feels faster with jumbo frames than without, but that’s purely subjective.

Let me know if you have any questions or if anyone finds a faster or easier way to accomplish this task.

Tags: , , , , , , , , ,

This session describes NetApp’s MultiStore functionality. MultiStore is the name given to NetApp’s functionality for secure logical partitioning of network and storage resources. The presenters for the session are Roger Weeks, TME with NetApp, and Scott Gelb with Insight Investments.

When using MultiStore, the basic building block is the vFiler. A vFiler is a logical construct within Data ONTAP that contains a lightweight instance of the Data ONTAP multi-protocol server. vFilers provide the ability to securely partition both storage resources and network resources. Storage resources are partitioned at either the FlexVol or Qtree level; it’s recommended to use FlexVols instead of Qtrees. (The presenters did not provide any further information beyond that recommendation. Do any readers have more information?) On the network side, the resources that can be logically partitioned are IP addresses, VLANs, VIFs, and IPspaces (logical routing tables).

Some reasons to use vFilers would include storage consolidation, seamless data migration, simple disaster recovery, or better workload management. MultiStore integrates with SnapMirror to provide some of the functionality needed for some of these use cases.

MultiStore uses vFiler0 to denote the physical hardware, and vFiler0 “owns” all the physical storage resources. You can create up to 64 vFiler instances, and active/active clustered configurations can support up to 130 vFiler instances (128 vFilers plus 2 vFiler0 instances) during a takeover scenario.

Each vFiler stores its configuration in a separate FlexVol (it’s own root vol, if you will). All the major protocols are supported within a vFiler context: NFS, CIFS, iSCSI, HTTP, and NDMP. Fibre Channel is not supported; you can only use Fibre Channel with vFiler0. This is due to the lack of NPIV support within Data ONTAP 7. (It’s theoretically possible, then, that if/when NetApp adds NPIV support to Data ONTAP that Fibre Channel would be supported within vFiler instances.)

Although it is possible to move resources between vFiler0 and a separate vFiler instance, doing so may impact client connections.

Managing vFilers appears to be the current weak spot; you can manage vFiler instances using the Data ONTAP CLI, but vFiler instances don’t have an interactive shell. Therefore, you have to direct commands to vFiler instances via SSH or RSH or using the vFiler context in vFiler0. You access the vFiler context by prepending the “vfiler” keyword to the commands at the CLI in vFiler0. Operations Manager 3.7 and Provisioning Manager can manage vFiler instances; FilerView can start, stop, or delete individual vFiler instances but cannot direct commands to an individual vFiler. If you need to manage CIFS on a vFiler instance, you can use the Computer Management MMC console to connect remotely to that vFiler instance to manage shares and share permissions, just as you can with vFiler0 (assuming CIFS is running within the vFiler, of course).

IPspaces are a logical routing construct that allow each vFiler to have its own routing table. For example, you may have a DMZ vFiler and an internal vFiler, each with their own, separate routing table. Up to 101 IPspaces are supported per controller. You can’t delete the default IPspace, as it’s the routing table for vFiler0. It is recommended to use VLANs and/or VIFs with IPspaces as a best practice.

One of the real advantages of using MultiStore and vFilers is the data migration and disaster recovery functionality that it enables when used in conjunction with SnapMirror. There are two sides to this:

  • “vfiler migrate” allows you to move an entire vFiler instance, including all data and configuration, from one physical storage system to another physical storage system. You can keep the same IP address or change the IP address. All other network identification remains the same: NetBIOS name, host name, etc., so the vFiler should look exactly the same across the network after the migration as it did before the migration.
  • “vfiler dr” is similar to “vfiler migrate” but uses SnapMirror to keep the source and target vFiler instances in sync with each other.

It makes sense, but you can’t use “vfiler dr” or “vfiler migrate” on vFiler0 (the physical storage system). My own personal thought regarding “vfiler dr”: what would this look like in a VMware environment using NFS? There could be some interesting possibilities there.

With regard to security, a Matasano security audit was performed and the results showed that there were no vulnerabilities that would allow “data leakage” between vFiler instances. This means that it’s OK to run a DMZ vFiler and an internal vFiler on the same physical system; the separation is strong enough.

Other points of interest:

  • Each vFiler adds about 400K of system memory, so keep that in mind when creating additional vFiler instances.
  • You can’t put more load on a MultiStore-enabled system than a non-MultiStore-enabled system. The ability to create logical vFilers doesn’t mean the physical storage system can suddenly handle more IOPS or more capacity.
  • You can use FlexShare on a MultiStore-enabled system to adjust priorities for the FlexVols assigned to various vFiler instances.
  • As of Data ONTAP 7.2, SnapMirror relationships created in a vFiler context are preserved during a “vfiler migrate” or “vfiler dr” operation.
  • More enhancements are planned for Data ONTAP 7.3, including deduplication support, SnapDrive 5.0 or higher support for iSCSI with vFiler instances, SnapVault additions, and SnapLock support.

Some of the potential use cases for MultiStore include file services consolidation (allows you to preserve file server identification onto separate vFiler instances), data migration, and disaster recovery. You might also use MultiStore if you needed support for multiple Active Directory domains with CIFS.

UPDATE: Apparently, my recollection of the presenters’ information was incorrect, and FTP is not a protocol supported with vFilers. I’ve updated the article accordingly.

Tags: , , , , , , , , , , , ,

NetApp has recently released TR-3747, Best Practices for File System Alignment in Virtual Environments. This document addresses the situations in which file system alignment is necessary in environments running VMware ESX/ESXi, Microsoft Hyper-V, and Citrix XenServer. The authors are Abhinav Joshi (he delivered the Hyper-V deep dive at Insight last year), Eric Forgette (wrote the Rapid Cloning Utility, I believe), and Peter Learmonth (a well-recognized name from the Toasters mailing list), so you know there’s quite a bit of knowledge and experience baked into this document.

There are a couple of nice tidbits of information in here. For example, I liked the information on using fdisk to set the alignment of a guest VMDK from the ESX Service Console; that’s a pretty handy trick! I also thought the tables which described the different levels at which misalignment could occur were quite useful. (To be honest, though, it took me a couple of times reading through that section to understand what information the authors were trying to deliver.)

Anyway, if you’re looking for more information on storage alignment, the different levels at which it may occur, and the methods used to fix it at each of these levels, this is an excellent resource that I strongly recommend reading. Does anyone have any pointers to similar documents from other storage vendors?

Tags: , , , , , , , , , ,

Welcome to Virtualization Short Take #25, the first edition of this series for 2009! Here I’ve collected a variety of articles and posts that I found interesting or useful. Enjoy!

  • We’ll start off today’s list with some Hyper-V links. First up is this article on how to manually add a VM configuration to Hyper-V. It would be interesting to me to know some of the technical details—i.e., the design decisions that led Microsoft to architect things in this way—that might explain why this process is, in my opinion, so complicated. Was it scalability? Manageability? If anyone knows, please share your information in the comments.
  • It looks like this post by John Howard on how to resolve event ID 4096 with Hyper-V is also closely related.
  • This blog post brings to light a clause in Microsoft’s licensing policy that forces organizations to use Windows Server 2008 CALs when accessing a Windows Server 2003-based VM hosted on Hyper-V. In the spirit of disclosure, it’s important to note that this was written by VMware, but an independent organization apparently verified the licensing requirements. So, while you may get Hyper-V at no additional cost (not free) with Windows Server 2008, you’ll have to pay to upgrade your CALs to Windows Server 2008 in order to access any Windows Server 2003-based VMs on those Hyper-V hosts. Ouch.
  • Wrapping up this edition’s Microsoft virtualization coverage is this post by Ben Armstrong warning Hyper-V users about the use of physical disks with VMs. Apparently, it’s possible to connect a physical disk to both the Hyper-V parent partition as well as a guest VM, and…well, bad things can happen when you do that. The unfortunate part is that Hyper-V doesn’t block users from doing this very thing.
  • Daniel Feller asks the question, “Am I the only one who has trouble understanding Cloud Computing?” No, Daniel, you’re not the only one—I’ve written before about how amorphous and undefined cloud computing is. In this post over at the Citrix Community site, Daniel goes on to indicate that cloud computing’s undefined nature is actually its greatest strength:
     

    As I see it, Cloud Computing is a big white board waiting for organizations to make their requirements known. Do you want a Test/QA environment to do whatever? This is cloud computing. Do you want someone to deliver office productivity applications for you? That is cloud computing. Do you want to have all of your MP3s stored on an Internet storage repository so you can get to it from any device? That is also cloud computing.

    Daniel may be right there, but I still insist that there need to be well-defined and well-understood standards around cloud computing in order for cloud computing to really see broad adoption. Perhaps cloud computing is storing my MP3s on the Internet, but what happens when I want to move to a different MP3 storage provider? Without standards, that becomes quite difficult, perhaps even impossible. I’m not the only one who thinks this way, either; check this post by Geva Perry. Until some substance appears in all these clouds, people are going to hold off.

  • Rodney Haywood shared a useful command to use with VMware HA in this post about blades and VMware HA. He points out that it’s a good idea to spread VMware HA primary nodes across multiple blade chassis so that the failure of a single chassis does not take down all the primary nodes. One note about the using the “ftcli” command is that you’ll need to set the FT_DIR environment variable first using “export FT_DIR=/opt/vmware/aam” (assuming you’re using bash as the shell on VMware ESX). Otherwise, the advice to spread clusters across chassis as well as to ensure that primary agents are spread across chassis is advice that should be followed.
  • Joshua Townsend has a good post at VMtoday.com about using PowerShell and SQL queries to determine the amount of free space within guest VMs. As he states in his post, this can often impact the storage design significantly. It seems to me that there used to be a plug-in for vCenter that added this information, but I must be mistaken as I can no longer find it. Oh, and one of Eric Siebert’s top 10 lists also points out a free utility that will provide this information as well.
  • I don’t have a record of where this information turned up, but this article from NetApp (NOW login required) on troubleshooting NFS performance was quite helpful. In particular, it linked to this VMware KB article that provides in-depth information on how to identify IRQ sharing that’s occurring between the Service Console and the VMkernel. Good stuff.
  • Want more information on scaling a VMware View installation? Greg Lato posts a notice about the VMware View Reference Architecture Kit, available from VMware, that provides more information on some basic “building blocks” in creating a large-scale View implementation. I’ve only had the opportunity to skim through the documents thus far, but I like what I’ve seen thus far. Chad also mentions the Reference Architecture Kit on his site as well.
  • Duncan at Yellow Bricks posts yet another useful “in the trenches” post about VMFS-3 heap size. If your VMware ESX server is handling more than 4TB of open VMDK files, then it’s worth having a look at this VMware KB article.
  • The idea of “virtual routing” is an interesting idea, but I share the thoughts of one of the commenters in that technologies like VMotion/XenMotion/live migration may not be able to respond quickly enough to changing network patterns to be effective. Perhaps it’s just my server-centric view showing itself, but it seems more “costly” (in terms of effort) to move servers around to match traffic flow than to just route the traffic accordingly.
  • CrossBow looks quite cool, but I’m having a hard time understanding the real business value. I am quite confident that my lack of understanding about CrossBow is simply a reflection of the fact that I don’t know enough about Solaris Containers or how Xen handles networking, but can someone help me better understand? What is the huge deal with Crossbow?
  • Jason Boche shares some information with us about how to increase the number of simultaneous VMotion operations per host. That information could be quite handy in some cases.
  • I had high hopes for this document on VMFS best practices, but it fell short of my hopes. I was looking for hard guidelines on when to use isolation vs. consolidation, strong recommendations on VMFS volume sizes and the number of VMs to host in a VMFS volume, etc. Instead, I got an overview of what VMFS is and how it works—not what I needed.
  • Users interested in getting started with PowerShell with VMware Infrastructure should have a look at this article by Scott Herold. It’s an excellent place to start.
  • Here’s a list of some of the basic things you should do on a “golden master” template for Windows Server VMs. I actually disagree with #15, preferring instead to let Windows manage the time at the guest OS level. The only other thing I’d add: be sure your VMDK is aligned to the underlying storage. Otherwise, this is a great checklist to follow.

I think that should just about do it for this post. Comments are welcome!

Tags: , , , , , , , , , , ,

I was working on some documentation this afternoon that centered around the use of esxcfg-advcfg to set some NFS-related parameters in VMware ESX. While verifying the command syntax, I noticed something interesting about the esxcfg-advcfg command: it’s only partially case-sensitive.

OK, this isn’t world-shattering news or a discovery of paramount importance, but I do find this a bit odd given that with the Linux-based Service Console most everything should be entirely case-sensitive. However, I found that with esxcfg-advcfg only part of the command needs to be properly capitalized.

Consider these examples. If I wanted to increase the maximum number of NFS datastores on VMware ESX, I would use this command:

esxcfg-advcfg -s 32 /NFS/MaxVolumes

This command will also work (note the capitalization, or lack thereof):

esxcfg-advcfg -s 32 /NFS/maxvolumes

But this command won’t work; it will report that it can’t find the parameter:

esxcfg-advcfg -s 32 /nfs/maxvolumes

I noticed the same behavior with a couple of other settings, so I’m reasonably confident that it’s not just this one setting. It appears that esxcfg-advcfg only cares if the first portion of the path is properly capitalized, and the rest of the parameter is case insensitive.

Interesting, no?

Tags: , , , ,

Virtualization Short Take #24

There’s lots of good information flowing around the Internet, and it’s becoming increasingly difficult to sort through all the useless stuff to find the valuable gems. Hopefully, some of the links that I have collected here will prove to be more useful than useless!

  • I came across this VMware KB article titled “Dedicating specific NICs to portgroups while maintaining NIC teaming and failover for the vSwitch”. I was hoping it would shed new light on some NIC teaming functionality. Unfortunately, it was only about overriding the default vSwitch failover policy on a per-portgroup basis. I was already well aware of that functionality and use it quite extensively in my VMware designs, but for others that may prove useful.
  • This video about VMware DPM sparked some debate about spin-up/spin-down affecting drive MTBF and decreasing a VMware ESX server’s operational lifecycle. Chad Sakac of EMC shared some findings from EMC regarding spin-up/spin-down in this post and came to the conclusion that using VMware DPM should not materially affect the reliability or lifetime of servers (at least with regards to drive failures). Personally, I tend to agree that this was FUD, most likely from a competitor, but it’s best to get this sort of thing out in the open and debunked.
  • Leo posted a brief snippet of code to upgrade the VMware Tools on VMs without a reboot. It looks like it might come in handy. And Leo’s guide to configuring jumbo frames with an EMC AX4-5i is quite useful, too—it’s a nice counterpoint to my own guide to configuring jumbo frames.
  • Tomas ten Dam has completed his guide to building a complete “SRM in a Box” setup using the NetApp Data ONTAP Simulator. Of course, Chad wants him to use the Celerra VM…
  • Oh, and while we’re talking VMware SRM, be sure to check out Mike Laverick’s book on VMware SRM, “Administering VMware Site Recovery Manager 1.0″. I haven’t read the book yet, but knowing Mike I’m sure it’s good quality stuff. Maybe Santa will give me a copy for Christmas.
  • Sven H. over at VirtualFuture.info posted a good guide on using thin provisioned VMDKs with VMware ESX 3.5 via the vmkfstools command. (I was going to include a trackback to Sven’s post, but his blog theme doesn’t show the trackback URL.) Seems like I saw somewhere that thin provisioned VMDKs in ESX 3.5 are still unsupported, so deploy accordingly.
  • Via Tony Soper, I found that version 2 of Microsoft’s Offline Virtual Machine Servicing Tool is available. I first discussed the Offline Virtual Machine Servicing Tool back in June during Tech-Ed 2008. You can download the tool here.
  • Also from Tony, here’s a great article on how to balance VM I/O with Hyper-V. An interesting tidbit from this: by default, I/O balancing is enabled for storage, but not for networking. I can see it needing to be enabled for storage, but why disabled by default for networking?
  • More information on controlling resource utilization within Hyper-V is provided in this article by Robert Larson. It’s worth having a quick look if you are unsure how to configure it or how it works.
  • Ben Armstrong answers the question, “Why does it take so long to create a fixed size virtual hard disk?” The answer: the disk space is zeroed out in advance. My question is this: is this need to zero out the disk space a result of how NTFS deletes files or is this scenario applicable to VMFS as well?
  • This has probably been mentioned before, but users considering virtualizing their Active Directory domain controllers should keep these considerations in mind.
  • I recently ran into a situation where we need to change the IP address of an NFS datastore. (It’s a long story as to how this came about.) In any case, I told the customer that I couldn’t be sure that changing the IP address wouldn’t cause problems. Fortunately, before the customer tried it, I found this post by Rick Scherer. The short story: it doesn’t work, and you shouldn’t do it. Create a new datastore with the correct IP address and use Storage VMotion instead.
  • For even more information on Storage VMotion, also check out Chad’s post here.
  • VMwarewolf continues his Resolution Path series with common fault issues in VMware Infrastructure. Good stuff.

It’s clearly been too long since I published one of these, as I still have other links collecting dust in my “link bin”:

Third Brigade offers free security for up to 100 virtual machines
Version 4 of the PowerVDI tool
Go Daddy Wildcard Certificate with VI3
New VMware VI network port diagram request for comments
Auditing ESX root logins with email…

Like I said, there’s just so much information! And now that I’m trying to delve deeper into the storage realm, that’s only doubled up on the information I’m trying to manage. Hopefully I’ve picked out a few gems for you this week. Thanks for reading!

Tags: , , , , , , , , ,

Many thanks to Dave Lawrence, aka VMguy, for the heads-up: Update 1 for VMware Site Recovery Manager 1.0 has been released. Missing in this update: NFS support. I mention that only because I heard numerous references to NFS Support in SRM 1.0U1 at NetApp Insight a few weeks ago. Of course, they also mentioned a March 2009 timeframe, so I was a bit surprised when I saw that Update 1 had been released.

However, there are plenty of other goodies besides NFS support available in Update 1 of SRM 1.0:

  • SRM has separated the permission to test a recovery plan and to actually run a recovery plan. This allows more junior admins to test the plan, but reserve actually running the plan for a more senior admin, for example.
  • RDM support, which enables failover for MSCS clusters
  • Batch IP property customization

And more, but for the all the details, see Dave’s post or the Release Notes!

Tags: , , ,

This session provided information on enhancements to NetApp’s cloning functionality. These enhancements are due to be released along with Data ONTAP 7.3.1, which is expected out in December. Of course, that date may shift, but that’s the expected release timeframe.

The key focus of the session was new functionality that allows for Data ONTAP to clone individual files without a backing Snapshot. This new functionality is an extension of NetApp’s deduplication functionality, and is enabled by changes within Data ONTAP that enable block sharing, i.e., the ability for a single block to appear in multiple files or in multiple places in the same file. The number of times these blocks appear is tracked using a reference count. The actual reference count is always 1 less than the number of times the block appears. A block which is not shared has no reference count; a block that is shared in two locations has a reference count of 1. The maximum reference count is 255, so that means a single block is allowed to be shared up to 256 times within a single FlexVol. Unfortunately, there’s no way to view the reference count currently, as it’s stored in the WAFL metadata.

As with other cloning technologies, the only space that is required is for incremental changes from the base. (There is small overhead for metadata as well.) This functionality is going to be incorporated into the FlexClone license and will likely be referred to as “file-level FlexClone”. I suppose that cloning volumes with be referred to as “volume FlexClone” or similar.

This functionality will be command-line driven, but only from advanced mode (must do a “priv set adv” in order to access the commands). The commands are described below.

To clone a file or a LUN (the command is the same in both cases):

clone start <src_path> <dst_path> -n -l

To check the status of a cloning process or stop a cloning process, respectively:

clone status
clone stop

Existing commands for Snapshot-backed clones (”lun clone” or “vol clone”) will remain unchanged.

File-level cloning will integrate with Volume SnapMirror (VSM) without any problems; the destination will be an exact copy of the source, including clones. Not so for Qtree SnapMirror (QSM) and SnapVault, which will re-inflate the clones to full size. Users will need to run deduplication on the destination to try to regain the space. Dump/restores will work like QSM or SnapVault.

Now for the limitations, caveats and the gotchas:

  • Users can’t run single-file SnapRestore and a clone process at the same time.
  • Users can’t clone a file or a LUN that exists only in a Snapshot. The file or LUN must exist in the active file system.
  • ACLs and streams are not cloned.
  • The “clone” command does not work in a vFiler context.
  • Users can’t use synchronous SnapMirror with a volume that contains cloned files or LUNs.
  • Volume SnapRestore cannot run while cloning is in progress.
  • SnapDrive does not currently support this method of cloning. It’s anticipated that SnapManager for Virtual Infrastructure (SMVI) will be the first to leverage this functionality.
  • File-level FlexClone will be available for NFS only at first. Although it’s possible to clone data regions within a LUN, support is needed at the host level that isn’t present today.
  • Because blocks can only be shared 256 times (within a file or across files), it’s possible that some blocks in a clone will be full copies. This is especially true if there are lots of clones. Unfortunately, there is no easy way to monitor or check this. “df -s” can show space savings due to cloning, but that isn’t very granular.
  • There can be a maximum of 16 outstanding clone operations per FlexVol.
  • There is a maximum of 16TB of shared data among all clones. Trying to clone more than that results in full copies.
  • The maximum volume size for being able to use cloning is the same as for deduplication.

Obviously, VMware environments—VDI in particular—are a key use case for this sort of technology. (By the way, in case no one has yet connected the dots, this is the technology that I discussed here). To leverage this functionality, NetApp will update a tool known as the Rapid Cloning Utility (RCU; described in more detail in TR-3705) to take full advantage of file-level FlexCloning after Data ONTAP 7.3.1 is released. Note that the RCU is available today, but it only uses volume-level FlexClone.

Tags: , , , , , , , ,

« Older entries