Windows


This is just a quick post about a potential fix for some timeout issues when using EMC Replication Manager (RM). An e-mail sent to an internal distribution list described a situation in which a user was using RM but was getting an error when trying to take a VMware snapshot. The error reported was fairly generic:

Cannot create a quiesced snapshot because the create snapshot operation exceeded the time limit for holding off I/O in the frozen virtual machine.

As it turns out, the problem was actually VSS in the Windows Server 2003-based guest. Since RM leverages VSS, an error with VSS was causing the entire process to fail. The fix was to clean up VSS as described in this Microsoft KB article and then reinstall the VMware Tools. After completing both of those steps, the problem was resolved.

If you are using RM and run into this problem, be sure to double-check to ensure that VSS is working as expected.
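One way to double-check VSS inside the guest is to look at the output of the vssadmin list writers command and flag any writer that is not stable or that reports an error. The following is only a minimal sketch: the function names are hypothetical, and it assumes the usual "Writer name / State / Last error" layout of the vssadmin output and an elevated prompt inside the Windows guest.

```python
import subprocess


def parse_vss_writers(output):
    """Parse 'vssadmin list writers' text into a list of dicts."""
    writers = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Writer name:"):
            name = line.split(":", 1)[1].strip().strip("'")
            current = {"name": name, "state": "", "last_error": ""}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Last error:"):
            current["last_error"] = line.split(":", 1)[1].strip()
    return writers


def unhealthy_writers(writers):
    """Return the names of writers that are not stable or report an error."""
    return [
        w["name"]
        for w in writers
        if "Stable" not in w["state"] or w["last_error"] != "No error"
    ]


def check_vss():
    """Run vssadmin in the guest (assumption: elevated prompt on Windows)."""
    result = subprocess.run(
        ["vssadmin", "list", "writers"],
        capture_output=True, text=True, check=True,
    )
    return unhealthy_writers(parse_vss_writers(result.stdout))
```

Calling check_vss() in the guest should return an empty list when every writer is healthy; anything it returns is a candidate for the VSS cleanup described above.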


This is a liveblog for VMware Partner Exchange session TECHBC0320, “How VMware Leverages Microsoft Volume Shadow Services for Virtual Machine Snapshots”. The presenter is Paul Vasquez with VMware; he works within the Technical Alliances Organization at VMware with a focus on backups.

The session starts out with an overview of VMware snapshots followed by a quick overview of Microsoft’s Volume Shadow Copy Service (VSS).

Vasquez is careful to distinguish VMware snapshots from array-based snapshots, which is good since that seems to confuse a number of people. VMware snapshots can include the state of memory (optional), settings, and disk. Snapshots are taken at the VM level, and up to 32 snapshots can be taken. Over 20 snapshots can cause performance concerns and, in Vasquez’s words, “can cause undesirable results”.

In general, a snapshot will include all disks although there are ways to exclude disks from a snapshot.

Operations involving VMware snapshots include taking a snapshot (self-explanatory), reverting to a snapshot (reverts the VM to the snapshot state, the delta file remains until the snapshot is deleted), and deleting a snapshot (delta file is removed, VM continues running in the current state).

Some use cases for snapshots include: rollback capability for testing patches or updates; rollback for failed software installation; protection against unwanted results of OS reconfigurations or testing; backups (for creating consistent copies of a VM); and replication.

The delta file grows as-needed; over time, the delta file will grow larger and larger. Vasquez cautions attendees to be sure to plan datastore sizes to account for snapshots for VMs and the delta file growth caused by the changes to those VMs.

A good question was raised about read I/Os and the impact of snapshots (does

The presentation now moves on to a discussion of VSS. VSS involves three kinds of components: requestors, writers, and providers. The requestor asks a provider to create a shadow copy, and the writer (the application-aware piece) supplies the information needed to capture that application’s data consistently. Providers are included with Windows and are responsible for intercepting I/O requests to create and represent volume shadow copies on the file system. There are also third-party providers. In the context of this discussion (VSS integration with VMware snapshots), VMware Tools is the requestor.

There is a wide range of applications that provide VSS support, including Exchange, SQL, SharePoint, Active Directory, BITS, DHCP, and WINS. The vssadmin list providers command will show all the providers. (Note that you won’t see VMware Tools listed when you run this command; it is dynamically loaded only at snapshot time and then unloaded.)

The vssadmin list writers command will show a list of writers.

The general flow of operation with VSS runs like this:

  1. Requestor makes a shadow copy.
  2. The writer is told to freeze all I/O.
  3. The provider creates a shadow copy.
  4. The writer is told to “thaw,” or resume, I/O to the application.
  5. The requestor now has access to the shadow copy.
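The ordering of the steps above can be illustrated with a toy model. This is not the real VSS API (which is COM-based); the Writer and Provider classes and the request_shadow_copy function are hypothetical names used only to show that writers are frozen before the provider creates the shadow copy and thawed afterward.

```python
class Writer:
    """Toy stand-in for an application's VSS writer."""

    def __init__(self, name, log):
        self.name = name
        self.frozen = False
        self.log = log

    def freeze(self):
        self.frozen = True
        self.log.append(f"freeze:{self.name}")

    def thaw(self):
        self.frozen = False
        self.log.append(f"thaw:{self.name}")


class Provider:
    """Toy stand-in for a VSS provider."""

    def __init__(self, log):
        self.log = log

    def create_shadow_copy(self, writers):
        # The copy is only consistent if every writer is holding off I/O.
        if not all(w.frozen for w in writers):
            raise RuntimeError("writers must be frozen first")
        self.log.append("shadow_copy_created")
        return "shadow-copy-1"


def request_shadow_copy(writers, provider, log):
    """Requestor side: freeze writers, create the copy, thaw writers."""
    log.append("request")
    for w in writers:
        w.freeze()
    try:
        copy = provider.create_shadow_copy(writers)
    finally:
        # Always resume application I/O, even if the copy fails.
        for w in writers:
            w.thaw()
    log.append("requestor_has_copy")
    return copy
```

The try/finally is a design choice in this sketch: it guarantees the thaw step runs so that a failed shadow copy does not leave application I/O held off.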

The writer can support multiple enumerations, or different ways of coordinating the creation of the shadow copy. Exchange, for example, supports Full (backs up databases, logs, and checkpoints; truncates logs), Copy (backs up databases, logs, and checkpoints; does not truncate logs), Incremental (backs up and truncates logs), and Differential (backs up logs but does not truncate them). Of these, VMware uses the Copy enumeration when requesting shadow copies. Supposedly, the reason this is the case is to prevent interfering with backup applications that aren’t aware that logs were truncated. In addition, when VMware calls VSS, all writers are engaged, so it’s not possible to selectively choose which VSS writers should be engaged (you can’t engage VSS for Exchange but not SQL within the same VM, for example).

In the future, VMware Tools will offer granular control over which VSS enumeration is used. Granular control over which VSS writers can be engaged is also planned.

Vasquez now moves into a discussion of how VMware snapshots and VSS integrate. VSS integration comes into play when a VMware snapshot is taken. For VSS integration to work, the VM must be powered on (the guest OS must be running in order for VSS to be operational).

Some form of quiescing is always used when a snapshot is taken (unless the VM is powered off). The VMware Sync driver provides a crash-consistent copy of the VM but doesn’t interact with applications. This option is available in vSphere 4.0 and can be used when no VSS support from the application is available. Obviously, there is VSS support (hence this session), and there are pre- and post-quiesce scripts that can be used to create homebrew solutions as well. Both VSS and the Sync driver can be enabled using VMware Tools.

VSS support is enabled in VMware ESX 3.5 Update 2 or higher.

Going back to the VSS flow described earlier, there is one additional step: the VMware snapshot is taken before the writer resumes I/O. After the VMware snapshot is taken, the shadow copy created by the provider is discarded because it is no longer needed. Once again, Vasquez reminds attendees that the VMware Tools requestor only supports the Copy enumeration.

An attendee asked if any plans were in place to do quiescing at the VMFS layer (supposedly to assist with hardware-based snapshots); Vasquez responds that some form of VMFS quiescing would be helpful, but there are challenges with that arrangement that make it currently very difficult to actually achieve.

(Vasquez also commented on the end-of-life policy for the ESX Service Console, but I’ll hold on mentioning what was said until I verify the confidentiality of the statement.)

Some additional things to remember:

  • VMware Tools build must be 110268 or higher.
  • VMware Tools must be running and VSS must be functioning properly.
  • VSS Service must be set to Manual or Automatic.
  • ESX 3.5 Update 2 is required for VSS support.
  • Be sure VSS support is installed with VMware Tools.
  • Try not to keep VMware snapshots around for a long time. Manage snapshots carefully.
  • Sync driver can be used as a fallback in the event VSS support fails.
  • VSS snapshot has a 10-second timeout. Rare cases could cause a failure of getting the VSS shadow copy.
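Several of the items in the list above lend themselves to a simple pre-flight check. The sketch below is hypothetical: the function name and its parameters are my own, and it assumes you have already gathered the VMware Tools build number, the Tools running state, the VSS service start mode, and whether the VSS component was installed with VMware Tools. Only the thresholds come from the list.

```python
MIN_TOOLS_BUILD = 110268                  # minimum build from the list above
ALLOWED_START_MODES = {"manual", "automatic"}


def vss_preflight(tools_build, tools_running, vss_start_mode,
                  vss_component_installed):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tools_build < MIN_TOOLS_BUILD:
        problems.append(
            f"VMware Tools build {tools_build} is older than {MIN_TOOLS_BUILD}"
        )
    if not tools_running:
        problems.append("VMware Tools is not running")
    if vss_start_mode.strip().lower() not in ALLOWED_START_MODES:
        problems.append(
            f"VSS service start mode is '{vss_start_mode}', "
            "expected Manual or Automatic"
        )
    if not vss_component_installed:
        problems.append("VSS support was not installed with VMware Tools")
    return problems
```

Returning a list of problems rather than a single pass/fail value makes it easier to report everything that needs fixing in one pass.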

Most of the information contained in this presentation is found in the current vSphere documents and in Microsoft’s VSS documentation. (I’ll update this post with URLs when possible.)

And that’s it for the session.


Stu over at vInternals posted an article a couple of days ago about a problem he encountered with VMware vSphere and Windows Server 2008. Apparently, there is an unexpected behavior with Windows Server 2008 and VM hardware version 7 that is described in this VMware KB article. Stu, however, was seeing the behavior not on upgrading VMs from VM hardware version 4 to VM hardware version 7, but on new virtual machines created from the beginning with VM hardware version 7.

According to an update on Stu’s article, VMware has acknowledged this as a bug and will be investigating a fix to the problem. Until then, follow Stu’s advice and speak to your VMware account team if you are experiencing this problem. If you are getting ready to proceed with a VMware vSphere upgrade and have Windows Server 2008 Enterprise Edition VMs in place, keep this behavior in mind and plan accordingly.

Thanks to Stu for bringing this matter to light!

UPDATE: Stu posted an update with more information and an explanation for the unexpected behavior, so be sure to check it out.


Upgrading a VMware Infrastructure 3.x environment to VMware vSphere 4 involves more than just upgrading vCenter Server and upgrading your ESX/ESXi hosts (as if that wasn’t enough). You should also plan on upgrading your virtual machines. VMware vSphere introduces a new hardware version (version 7), and vSphere also introduces a new paravirtualized network driver (VMXNET3) as well as a new paravirtualized SCSI driver (PVSCSI). To take advantage of these new drivers as well as other new features, you’ll need to upgrade your virtual machines. The process I describe below has worked well in my testing.

I’d like to thank Erik Bussink, whose posts on Twitter got me started down this path.

Please note that this process will require some downtime. I personally tested this process with both Windows Server 2003 R2 as well as Windows Server 2008; it worked flawlessly with both versions of Windows. (I’ll post a separate article on doing something similar with other operating systems, if it’s even possible.)

  1. Record the current IP configuration of the guest operating system. You’ll end up needing to recreate it.
  2. Upgrade VMware Tools in the guest operating system. You can do this by right-clicking on the virtual machine and selecting Guest > Install/Upgrade VMware Tools. When prompted, choose to perform an automatic tools upgrade. When the VMware Tools upgrade is complete, the virtual machine will reboot.
  3. After the guest operating system reboots and is back up again, shut down the guest operating system. You can do this by right-clicking on the virtual machine and selecting Power > Shutdown Guest.
  4. Upgrade the virtual machine hardware by right-clicking the virtual machine and selecting Upgrade Virtual Hardware.
  5. In the virtual machine properties, add a new network adapter of the type VMXNET3 and attach it to the same port group/dvPort group as the first network adapter.
  6. Remove the first/original network adapter.
  7. Add a new virtual hard disk to the virtual machine. Be sure to attach it to SCSI node 1:x; this will add a second SCSI adapter to the virtual machine. The size of the virtual hard disk is irrelevant.
  8. Change the type of the newly-added second SCSI adapter to VMware Paravirtual.
  9. Click OK to commit the changes you’ve made to the virtual machine.
  10. Power on the virtual machine. When the guest operating system is fully booted, log in and recreate the network configuration you recorded for the guest back in step 1. Windows may report an error that the network configuration is already used by a different adapter, but proceed anyway. Once you’ve finished, shut down the guest operating system again.
  11. Edit the virtual machine to remove the second hard disk you just added.
  12. While still in the virtual machine properties, change the type of the original SCSI controller to VMware Paravirtual. (NOTE: See update below.)
  13. Power on the virtual machine. When the guest operating system is fully booted up, log in.
  14. Create a new system environment variable named DEVMGR_SHOW_NONPRESENT_DEVICES and set the value to 1.
  15. Launch Device Manager and from the View menu select Show Hidden Devices.
  16. Remove the drivers for the old network adapter and old SCSI adapter. Close Device Manager and you’re done!
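Steps 14 and 15 can also be scripted. The sketch below is only an illustration with hypothetical function names: it assumes the setx utility is available in the guest for persisting the variable system-wide, and that launching Device Manager (devmgmt.msc) through mmc.exe with the variable in its environment has the same effect as setting it by hand.

```python
import os
import subprocess

VAR = "DEVMGR_SHOW_NONPRESENT_DEVICES"


def device_manager_env(base_env=None):
    """Return a copy of the environment with the variable from step 14 set."""
    env = dict(os.environ if base_env is None else base_env)
    env[VAR] = "1"
    return env


def persist_command():
    """Command that stores the variable system-wide (assumes setx exists)."""
    return ["setx", VAR, "1", "/M"]


def launch_device_manager():
    """Start Device Manager with the variable set for that process only."""
    return subprocess.Popen(
        ["mmc.exe", "devmgmt.msc"], env=device_manager_env()
    )
```

You still need to select Show Hidden Devices from the View menu (step 15) and remove the old adapters by hand (step 16); the sketch only covers the environment variable.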

If you perform these steps on a template, then you can be assured that all future virtual machines cloned from this template also have the latest paravirtualized drivers installed for maximum performance.

Post any questions or clarifications in the comments. Thanks!

UPDATE: Per this VMware KB article, VMware doesn’t support using the PVSCSI adapter for boot devices. That is not to say that it doesn’t work (it does work), but that it is not supported. Thanks to Eddy for pointing that out in the comments!


Over the 2008-2009 holiday season, I rebuilt my home network. I included the notes and information from my home network rebuild in an article that described the Mac OS X-Ubuntu integration resulting from the rebuild. Since that time, I’ve added a larger hard drive to the home server to make more room for Time Machine backups, movies, music, and other files. Things seemed to be working very well. Until the other day…

My wife made an offhand comment that she couldn’t access the shared music library from her laptop. I tested the connection and, sure enough, every time I clicked the shared library icon it simply disappeared. No error, no warning, no entries in any log files…it just disappeared. I searched the Windows event logs, and I searched the log files on the Ubuntu server downstairs. Neither computer had any entries whatsoever that provided any insight as to why this one computer would not connect to the shared music library.

Being the geeky troubleshooter that I am, I attempted to replicate the problem on some of the other computers on the network. My MacBook Pro worked fine. Three other Windows laptops on the network, running the same version of Windows (Windows XP Professional) and the same Service Pack revision, also worked fine. The problem seemed to be isolated to her computer. Perhaps it was only when she was on the wireless network…nope, the same problem regardless of the network connection.

I upgraded iTunes to the latest version. That didn’t work. I disabled the Windows Firewall on her computer. That didn’t work. I made sure that no traffic was being blocked by the firewall on the Ubuntu server; no traffic was being blocked. In other words, that didn’t work. I was about to give up and just write it off as one of those strange aberrations that couldn’t be resolved and chalk it up to Windows.

Then I stumbled onto this site. I’d already created a daapd.service file for Avahi to use previously, but this site described some additional entries in the daapd.service file that I didn’t have. I made some edits, based on the information on the site, and here’s the daapd.service file I had for Avahi:

<?xml version="1.0" standalone='no'?><!--*-nxml-*-->
<!DOCTYPE service-group SYSTEM "avahi-service.dtd">
<service-group>
  <name replace-wildcards="yes">Home Media Server</name>
  <service>
    <type>_daap._tcp</type>
    <port>3689</port>
    <txt-record>txtvers=1</txt-record>
    <txt-record>iTSh Version=131073</txt-record>
    <txt-record>Version=196610</txt-record>
  </service>
</service-group>

After changing the daapd.service file to the version listed above, I restarted Avahi. Upon the shared media server re-appearing in iTunes, I clicked on it and…drum roll please…it worked! The previous version I had been using did not have the txt-record entries, and I really have no idea why adding the txt-record entries suddenly made my wife’s iTunes connect properly. I suppose it doesn’t matter why it works, it just matters that I FIXED IT! (ePlus engineers who attended our NSM this year will get this joke.)
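If you maintain more than one Avahi service file, generating them from a script makes it harder to forget the txt-record entries. Here is a minimal sketch using Python’s standard library; the build_daap_service function is a hypothetical helper, and the name, port, and txt-record values are simply the ones from the file above.

```python
import xml.etree.ElementTree as ET

# Header lines copied from the daapd.service file shown above.
HEADER = (
    "<?xml version=\"1.0\" standalone='no'?><!--*-nxml-*-->\n"
    "<!DOCTYPE service-group SYSTEM \"avahi-service.dtd\">\n"
)


def build_daap_service(name, port, txt_records):
    """Return the text of an Avahi service file advertising a DAAP share."""
    group = ET.Element("service-group")
    name_el = ET.SubElement(group, "name", {"replace-wildcards": "yes"})
    name_el.text = name
    service = ET.SubElement(group, "service")
    ET.SubElement(service, "type").text = "_daap._tcp"
    ET.SubElement(service, "port").text = str(port)
    for record in txt_records:
        ET.SubElement(service, "txt-record").text = record
    return HEADER + ET.tostring(group, encoding="unicode") + "\n"


SERVICE_FILE = build_daap_service(
    "Home Media Server",
    3689,
    ["txtvers=1", "iTSh Version=131073", "Version=196610"],
)
```

The output is a single line of XML rather than the hand-formatted version above; since both are well-formed XML with the same elements, the difference should only be cosmetic. Avahi still needs to be restarted after the file is written.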

Still, in the event you’re running into the same issue—a Windows installation of iTunes that fails to connect to a shared music library running on Firefly Media Server—then perhaps updating your Avahi configuration will correct the problem.


Sanbolic is continuing to differentiate its clustered file system, Melio FS, in advance of the rudimentary clustered file system Microsoft plans on introducing in Windows Server 2008 R2. In an announcement last week, Sanbolic announced support for fully journaled snapshots. This functionality allows any server accessing the clustered file system to invoke a snapshot. The new snapshot functionality provides support for VSS and “full industry standard APIs,” although I’m not really sure what those “full industry standard APIs” are exactly.

You can download the full press release describing the new functionality here.

Separately, Sanbolic also announced that Melio FS fully supports Microsoft System Center Virtual Machine Manager 2008; more information on that is also available.

Now, if only Sanbolic would port Melio FS to VMware ESX/ESXi, then we could have some really interesting discussions. Snapshot functionality built into the shared file system, anyone?


I first wrote about Marathon Technologies and their everRun VM product last September just prior to the start of VMworld 2008 in Las Vegas, NV. Back at the start of 2009 I also mentioned Marathon’s joint development agreement with Microsoft and the intended plan to bring everRun VM to Hyper-V environments.

Today Marathon announced the availability of their everRun VM Lockstep product, which brings full circle the product announcement from last September. This product, which runs only on Citrix XenServer, puts into place the “three levels” of availability that Marathon has often spoken of:

  • Auto-restart high availability (XenServer HA)
  • Component-level fault tolerance
  • Full system-level fault tolerance

With full system-level fault tolerance, Marathon is able to provide organizations with the ability to protect applications with the highest levels of availability, eliminating downtime due to physical server failure. If a physical server fails, the virtual machine continues running on another physical server without any disruption.

The announcement of everRun VM Lockstep gives Marathon and Citrix a slight edge over competitor VMware, whose similar VMware Fault Tolerance offering has been demonstrated and discussed extensively but has not been officially announced. Given that Marathon expects everRun VM Lockstep to be available within 30 days, they may also have an edge over VMware in getting their product to market as well. Marathon everRun VM Lockstep will run on the free version of Citrix XenServer.

At the same time, Marathon is also announcing everRun 2G, the successor to Marathon’s everRun HA and everRun FT products for Windows Server environments. Marathon everRun 2G combines and extends the functionality of the previous generation of products, allowing organizations to provide high availability to any Windows application without modification or scripting. Like everRun VM, everRun 2G will offer “dialable” availability ranging from automated HA to full system-level fault tolerance.

Like everRun VM Lockstep, everRun 2G is expected to be available within the next 30 days.

Visit the Marathon Technologies web site for more information.


Today Double-Take Software announces their new Workload Optimization Suite. All of Double-Take’s flagship software products are now organized and unified within the idea of workload optimization, and new products are being announced to help provide a more complete set of solutions.

The products in the Workload Optimization Suite include:

  • Double-Take Move
  • Double-Take Flex
  • Double-Take Backup
  • Double-Take Availability

Of these four products, two (Move and Flex) are new products that are being announced, and are available, today. Double-Take Backup and Double-Take Availability are available under existing Double-Take, Livewire, TimeData, and GeoCluster brands. For example, Double-Take for Windows will fold into the Double-Take Availability and Double-Take Backup products later this year with an update.

Double-Take Move is one of the new products that was announced today. Double-Take Move leverages Double-Take’s existing replication technologies to provide a platform-independent X2X migration engine. X2X means any-to-any: physical-to-virtual (P2V), physical-to-physical (P2P), virtual-to-physical (V2P), and virtual-to-virtual (V2V) are all supported. In addition, Double-Take Move will automatically create a VMware ESX or Microsoft Hyper-V virtual machine when the destination is a virtual machine. Pricing is available per-use or for an entire site. Personally, I’ve never been a fan of tools that are licensed on a per-use basis, but with a product like this there are only so many ways to license it. For larger projects, I hope Double-Take’s site licensing won’t be too unattractive.

The second new product, Double-Take Flex, allows organizations to boot systems via an iSCSI SAN without the need for expensive iSCSI hardware initiators. Any ordinary Ethernet NIC within a server or desktop can be used to iSCSI boot and enable diskless operation. In addition, Double-Take Flex contains an iSCSI target for Windows Server, allowing smaller organizations to build their own iSCSI SANs. When using the Double-Take Flex iSCSI target, organizations also gain the ability to share base images, so that the storage requirements are greatly reduced and management of the OS image is simplified. I would love to have seen this sort of support with enterprise iSCSI targets like those provided by NetApp, EMC, HP, and others, but for now the shared image support is limited to Double-Take’s own iSCSI target implementation. Double-Take assured me that APIs are available to allow other vendors to add this support to their systems; only time will tell if anyone actually takes advantage of those APIs.

Both Double-Take Move and Double-Take Flex provide management consoles for ease of administration.

More information on Double-Take Move, Double-Take Flex, and the rest of the Workload Optimization Suite is available from Double-Take’s web site.


A reader contacted me a short while ago to inquire about a problem he was having with his Linux-AD integration efforts. It seems he had recently added a new domain controller (DC) that was intended to be a DC for a disaster recovery (DR) site. When he took this new DR DC offline in order to physically move it to the DR site, some of his AD-integrated Linux systems started failing to authenticate. More specifically, Kerberos continued to work, but LDAP lookups failed. When the reader would bring the DR DC back online, those systems started working again.

There was a clear correlation between the DR DC and the AD-integrated Linux systems, even though the /etc/ldap.conf file specifically pointed to another DC by IP address. There was no reference whatsoever, by IP address or host name, to the DR DC. Yet, every time the DR DC was taken offline, the behavior returned on a subset of Linux hosts. The only difference we could find between the affected and unaffected hosts was that the affected hosts were not on the same VLAN as the production domain controllers.

I theorized that Windows’ netmask ordering feature, which prioritizes the return of DNS lookups to provide clients with addresses that are “closer” to them, was playing a role here. However, the /etc/ldap.conf was using IP addresses, not the domain name or even the fully qualified domain name of a DC. It couldn’t be DNS, at least not as far as I could tell.

Upon further investigation, the reader discovered that the affected Linux servers (those that were on a different VLAN than both the production DCs and the DR DC) were maintaining persistent connections to the DR DC. (He found this via netstat.) When the DR DC went offline, the affected Linux hosts kept trying to communicate with that DC and that DC only. Once the reader was able to get the affected Linux hosts to drop that persistent connection, he was able to take the DR DC offline and the Linux hosts worked as expected.
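A netstat check like this is easy to script if you want to run it across a number of hosts. The sketch below is only an illustration: the established_to function is a hypothetical helper, it assumes the column layout of netstat -tn on Linux, and it assumes the LDAP connections use the standard ports (389 and 636).

```python
def established_to(netstat_output, remote_ip, ports=(389, 636)):
    """Return (local, remote) pairs for ESTABLISHED TCP connections
    from this host to remote_ip on one of the given ports."""
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        host, _, port = remote.rpartition(":")
        if (
            state == "ESTABLISHED"
            and host == remote_ip
            and port.isdigit()
            and int(port) in ports
        ):
            matches.append((local, remote))
    return matches
```

Feeding it the output of netstat -tn from each Linux host, along with the IP address of the DC you plan to take offline, shows which hosts are still holding a connection to that DC.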

So, the real question now becomes: how (or why) did the Linux servers connect to the DR DC instead of the production DC for which they were configured? I think that Active Directory issued an LDAP referral to direct the affected Linux servers to the DR DC as a result of site topology. Perhaps due to an incorrect or incomplete site topology configuration, Active Directory believed the DR DC should handle the VLANs where the affected Linux servers resided. If that is indeed the case, the fix would be to make sure that your AD site topology is correct and that subnets are appropriately associated with sites. Of course, this is just a theory.

Has anyone else seen an issue similar to this? What fix were you able to implement in order to correct it?


BlueStripe Software, a company based in the Raleigh-Durham, NC, area that I’ve written about a couple of times, is launching FactFinder 2.0. I discussed the previous version, FactFinder 1.1, here, and I have talked with the guys from BlueStripe on a number of occasions, including at VMworld 2008. I try not to just post press releases or other news information without also adding a little bit of extra information, analysis, or my own thoughts. I have seen FactFinder in action, and it does represent a different way of approaching performance and availability issues. It’s not necessarily a unique way, though, as there are other companies that also work from the application level. I’m confident they don’t use the same technology, but they are working from the application level.

One of the major new features in this new release of FactFinder is support for Red Hat Enterprise Linux, which allows FactFinder to bring the same application discovery and service-level awareness to the Linux platform. Of course, FactFinder fully supports both physical and virtualized instances of Windows and now Linux, and BlueStripe in fact describes the product as a great way to ensure successful P2V (physical-to-virtual) conversions, not just from a “Did the conversion work?” perspective but, more importantly, from the perspective of “Does the application still perform at the required levels?” It’s the applications people care about, after all.

The full press release is here.

