VMwareSRM

Virtualization Short Take #31

Welcome back to yet another Virtualization Short Take! Here is a collection of virtualization-related items—some recent, some not, but hopefully all interesting and/or useful.

  • Matt Hensley posted a link to this VIOPS document on how to set up VMware SRM 4.0 with an EMC Celerra storage array. I haven’t had the chance to read through it yet.
  • Jason Boche informs us that both Lab Manager 3 and Lab Manager 4 have problems with the VMXNET3 virtual NIC. In this blog post, Jason describes how his attempts to install Lab Manager server into a VM with the VMXNET3 NIC were failing. Fortunately, Jason provides a workaround as well, but you’ll have to read his article to get that information.
  • Bruce Hoard over at Virtualization Review (disclaimer: I write a regular column for the print edition of Virtualization Review) stirred up a bit of controversy with his post about Hyper-V’s three problems. The first problem is indeed a problem, but not an architectural or technological problem; VMware is indeed the market leader and has a quite solid user base. The other two “problems” stem from Microsoft’s architectural decision to embed the hypervisor into Windows Server. Like any other technology decision, this decision has its advantages and disadvantages (these technology decisions are a real double-edged sword). Based on historical data, it would seem that the need to patch Windows Server will impact the uptime of the Windows virtualization solution; however, this is not to say that VMware ESX/ESXi are without their own patches and associated downtime. I guess the key takeaway here is that VMware seems to be doing a much better job of lessening (or even removing) the impact of that downtime through things like VMotion, DRS, HA, maintenance mode, and the like.
  • Apparently there is a problem with the GA release of the Host Update utility that is installed along with the vSphere Client, as outlined here by Barry Coombs. Downloading the latest version and reinstalling seems to fix the issue.
  • And while we are on the subject of ESX upgrades, here’s another one: if the /boot partition is too small, the upgrade to ESX 4.0.0 will fail. This isn’t really anything too new and, as Joep points out, is documented in the vSphere Upgrade Guide. I prefer clean installations of VMware ESX/ESXi anyway.
  • Dave Mishchenko details his adventures (part 1, part 2, and part 3) in managing ESXi without the VI Client or the vCLI. While it’s interesting and contains some useful information, I’m not so sure that the exercise is useful in any way other than academically. First of all, Dave enables SSH access to ESXi, which is unsupported. Second, while he shows that it’s possible to manage ESXi without the VI Client or the vCLI, it doesn’t seem to be very efficient. Still, there is some useful information to be gleaned for those who want to know more about ESXi and its inner workings.
  • I think Simon Seagrave and Jason Boche were collaborating in secret, since they both wrote posts about using vSphere’s power savings/frequency scaling functionality. Simon’s post is dated October 27; Jason’s post is dated November 11. Coincidence? I don’t think so. C’mon, guys, go ahead and admit it.
  • Thinking of using the Shared Recovery Site feature in VMware SRM 4.0? This VMware KB article might come in handy.
  • I’m of the opinion that every blogger has a few “masterpiece” posts. These are posts that are just so good, so relevant, so useful, that they almost transcend the other content on the blogger’s site. Based on traffic patterns, one of my “masterpiece” posts is the one on ESX Server, NIC teaming, and VLAN trunking. It’s not the best-written post I’ve ever published, but it seems to have a lasting impact. Why do I mention this? Because I believe that Chad Sakac’s post on VMware I/O queues, microbursting, and multipathing is one of his “masterpiece” posts. Like Scott Drummonds, I’ve read that post multiple times, and every time I read it I get something else out of it, and I’m reminded of just how much I have yet to learn. Time to get back out of that comfort zone!
  • Oh, and speaking of Chad’s blog…this post is handy, too.

That’s all for now, folks. Stay tuned for the next installment, where I’ll once again share a collection of links about virtualization. Until then, feel free to share your own links in the comments.

Virtualization Short Take #24

There’s lots of good information flowing around the Internet, and it’s becoming increasingly difficult to sort through all the useless stuff to find the valuable gems. Hopefully, some of the links that I have collected here will prove to be more useful than useless!

  • I came across this VMware KB article titled “Dedicating specific NICs to portgroups while maintaining NIC teaming and failover for the vSwitch”. I was hoping it would shed new light on some NIC teaming functionality. Unfortunately, it was only about overriding the default vSwitch failover policy on a per-portgroup basis. I was already well aware of that functionality and use it quite extensively in my VMware designs, but others may find it useful.
  • This video about VMware DPM sparked some debate about spin-up/spin-down affecting drive MTBF and decreasing a VMware ESX server’s operational lifecycle. Chad Sakac of EMC shared some findings from EMC regarding spin-up/spin-down in this post and came to the conclusion that using VMware DPM should not materially affect the reliability or lifetime of servers (at least with regard to drive failures). Personally, I tend to agree that this was FUD, most likely from a competitor, but it’s best to get this sort of thing out in the open and debunked.
  • Leo posted a brief snippet of code to upgrade the VMware Tools on VMs without a reboot. It looks like it might come in handy. And Leo’s guide to configuring jumbo frames with an EMC AX4-5i is quite useful, too—it’s a nice counterpoint to my own guide to configuring jumbo frames.
  • Tomas ten Dam has completed his guide to building a complete “SRM in a Box” setup using the NetApp Data ONTAP Simulator. Of course, Chad wants him to use the Celerra VM…
  • Oh, and while we’re talking VMware SRM, be sure to check out Mike Laverick’s book on VMware SRM, “Administering VMware Site Recovery Manager 1.0”. I haven’t read the book yet, but knowing Mike I’m sure it’s good quality stuff. Maybe Santa will give me a copy for Christmas.
  • Sven H. over at VirtualFuture.info posted a good guide on using thin provisioned VMDKs with VMware ESX 3.5 via the vmkfstools command. (I was going to include a trackback to Sven’s post, but his blog theme doesn’t show the trackback URL.) Seems like I saw somewhere that thin provisioned VMDKs in ESX 3.5 are still unsupported, so deploy accordingly.
  • Via Tony Soper, I found that version 2 of Microsoft’s Offline Virtual Machine Servicing Tool is available. I first discussed the Offline Virtual Machine Servicing Tool back in June during Tech-Ed 2008. You can download the tool here.
  • Also from Tony, here’s a great article on how to balance VM I/O with Hyper-V. An interesting tidbit from this: by default, I/O balancing is enabled for storage, but not for networking. I can see it needing to be enabled for storage, but why disabled by default for networking?
  • More information on controlling resource utilization within Hyper-V is provided in this article by Robert Larson. It’s worth having a quick look if you are unsure how to configure it or how it works.
  • Ben Armstrong answers the question, “Why does it take so long to create a fixed size virtual hard disk?” The answer: the disk space is zeroed out in advance. My question is this: is this need to zero out the disk space a result of how NTFS deletes files or is this scenario applicable to VMFS as well?
  • This has probably been mentioned before, but users considering virtualizing their Active Directory domain controllers should keep these considerations in mind.
  • I recently ran into a situation where we needed to change the IP address of an NFS datastore. (It’s a long story as to how this came about.) In any case, I told the customer that I couldn’t be sure that changing the IP address wouldn’t cause problems. Fortunately, before the customer tried it, I found this post by Rick Scherer. The short story: it doesn’t work, and you shouldn’t do it. Create a new datastore with the correct IP address and use Storage VMotion instead.
  • For even more information on Storage VMotion, also check out Chad’s post here.
  • VMwarewolf continues his Resolution Path series with common fault issues in VMware Infrastructure. Good stuff.

It’s clearly been too long since I published one of these, as I still have other links collecting dust in my “link bin”:

  • Third Brigade offers free security for up to 100 virtual machines
  • Version 4 of the PowerVDI tool
  • Go Daddy Wildcard Certificate with VI3
  • New VMware VI network port diagram request for comments
  • Auditing ESX root logins with email…

Like I said, there’s just so much information! And now that I’m trying to delve deeper into the storage realm, the amount of information I’m trying to manage has only doubled. Hopefully I’ve picked out a few gems for you this week. Thanks for reading!

This session described VMware Site Recovery Manager (SRM) on NetApp storage. The session started out with a review of VMware SRM, its features and functionality, and some of the requirements. I was not aware, for example, that SRM cannot use SQL Server Express like VirtualCenter can; you must use a full-blown instance of SQL Server. Given VMware’s development history, I should not have been surprised to find that Perl 5.8 is required (it’s included in the distribution and installed automatically).

On the NetApp side, it’s important to note that users must first configure SecureAdmin in order for VMware SRM to use HTTPS when communicating with the NetApp storage arrays. If this isn’t done first, then the NetApp Storage Replication Adapter (SRA) will fall back to plain HTTP. The storage controllers must also have licenses for SnapMirror, iSCSI (included with the storage controllers), FCP (where applicable), and FlexClone. Without FlexClone, it’s impossible to do failover testing. NetApp reiterated that they anticipate seeing NFS support in VMware SRM somewhere in the March 2009 timeframe.

Note that there is no support for SnapVault or MetroCluster in SRM, although there are some interesting synergies between MetroCluster and VMware HA that are being explored. It will be interesting to see where, if anywhere, that may lead. NetApp admins may use either Volume SnapMirror (VSM) or Qtree SnapMirror (QSM), although VSM is preferred since it preserves deduplication with replication. QSM does not.

The presenters referred attendees to TR-3671, “VMware Site Recovery Manager in a NetApp Environment,” for more detailed information.

At the Recovery Site, users must configure an additional, non-replicated datastore. This additional datastore does not have to be very large, but it’s required for storing the “shadow VMs” (or “placeholder VMs”) that are created and maintained by VMware SRM.

At present, there is no integration between SnapManager for Virtual Infrastructure (SMVI) and VMware SRM. There are numerous technical questions, and I’m not entirely sure that I fully understand the implications just yet. This will be an area that I will be exploring further so that I can better understand the considerations of using these technologies together. NetApp is working with VMware to try to resolve some of the technical concerns around SMVI-SRM integration, but that will take some time. In other words, don’t hold your breath.

Finally, if you downloaded the NetApp SRA before the last week or so (this was back in the middle of November), download it again. Some issues have been fixed in a more recent release of the SRA. Unfortunately, VMware would not let NetApp increment the version number on the SRA, so it’s a bit difficult to tell which version you are running. If anyone has more information on that (I don’t recall, and don’t have any notes from the session on how to do this), it would be greatly appreciated.

Other miscellaneous notes from the session:

  • There are issues backing up a VMware SRM recovery plan (it’s not currently possible to export it to CSV/XML and then import it back in again)
  • VMware SRM and the NetApp SRA support dissimilar protocols between the Protected and Recovery Sites (e.g., FCP at Protected and iSCSI at Recovery) and dissimilar storage (e.g., FC disks at Protected and SATA disks at Recovery)
  • The appropriate iGroups must exist at the Recovery Site and the VMware ESX servers must be in the correct iGroups, but VMware SRM will handle mapping the LUNs to the iGroups

I think that’s all I have for this session. If any other session attendees have more information, please add it in the comments below.

Many thanks to Dave Lawrence, aka VMguy, for the heads-up: Update 1 for VMware Site Recovery Manager 1.0 has been released. Missing in this update: NFS support. I mention that only because I heard numerous references to NFS support in SRM 1.0U1 at NetApp Insight a few weeks ago. Of course, they also mentioned a March 2009 timeframe, so I was a bit surprised when I saw that Update 1 had been released.

However, there are plenty of other goodies besides NFS support available in Update 1 of SRM 1.0:

  • SRM has separated the permission to test a recovery plan from the permission to actually run a recovery plan. This allows, for example, more junior admins to test the plan while reserving the actual execution of the plan for a more senior admin.
  • RDM support, which enables failover for MSCS clusters
  • Batch IP property customization

And more, but for all the details, see Dave’s post or the Release Notes!

There is no general session this morning at VMworld 2008; instead, a “keynote” will be delivered about automating disaster recovery (DR) using VMware Site Recovery Manager (SRM). This is similar to the way in which other vendors have delivered various “keynotes” throughout the conference instead of all the announcements being crammed into the morning general sessions.

The speaker this morning is Jay Judkowitz, the product manager for VMware SRM. I’ve met Jay before; he’s a good guy. There’s a small technical glitch as the session begins because the slide deck doesn’t come up, but that gets resolved within only a few minutes and Jay begins his presentation.

The presentation begins with yet another overview of the VDC-OS vision; SRM is considered one of the vCenter management vServices. Jay then goes on to address all the various ways in which VMware provides availability for applications hosted on VMware Infrastructure. These include technologies like VMotion, VMware HA, VMware DRS, VMware FT, NIC teaming, storage multipathing, and of course Site Recovery Manager.

The traditional challenges of DR (including complex recovery processes and procedures, hardware dependence, and the inability to test extensively or repeatedly) are all addressed by VMware SRM. More accurately, they are addressed by the products that form a foundation underneath VMware SRM, through features like hardware independence, encapsulation, partitioning and consolidation, and resource pooling. These features have a direct play in a DR environment. It’s funny to see Jay taking this particular approach; it’s almost like he’s using the same slide deck that I’ve used in DR presentations given over the last couple of months.

That finally brings the discussion around to Site Recovery Manager specifically. Jay goes over some of the features of SRM, and discusses some “dos and don’ts” for SRM. For example, SRM isn’t really intended to provide failover for a single VM, although you can architect it to do that (put that VM on a single LUN by itself and create a Protection Group for that LUN and VM, then craft your Recovery Plan).

It’s important to note that SRM is not a replication product, but instead relies upon replication products from supported partners. This is done via the Storage Replication Adapter (SRA), a piece of software written by the storage vendor.

When setting up SRM, there are a number of steps to go through. First, you have to integrate with the storage replication that is already in place (and yes, the storage replication needs to be in place already). Next, you need to map recovery resources; this creates the link between resources used at the Protected Site and resources that will be used at the Recovery Site. Third, you need to create Recovery Plans, which are the automated equivalent of the DR runbook. That is, a Recovery Plan defines which VMs will fail over, and in which order, at the Recovery Site. That’s a somewhat simplistic overview, but it does get the point across.
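To make the mapping and Recovery Plan steps a bit more concrete, here is a minimal Python sketch that models them as plain data. Everything in it (the `ProtectedVM` and `RecoveryPlan` classes, `build_recovery_steps`, and the sample names) is hypothetical and has nothing to do with SRM’s actual API; it assumes storage replication (the first step) is already in place and simply walks a plan in order, looking up each VM’s Recovery Site resources in the mapping tables.

```python
from dataclasses import dataclass, field

# Hypothetical model of SRM setup steps two and three; not SRM's API.
# Step one (storage replication) is assumed to already be in place.

@dataclass
class ProtectedVM:
    name: str
    resource_pool: str  # resource pool at the Protected Site
    port_group: str     # network/port group at the Protected Site

@dataclass
class RecoveryPlan:
    name: str
    order: list = field(default_factory=list)  # VM names, in failover order

def build_recovery_steps(plan, vms, resource_map, network_map):
    """Return (VM, Recovery Site resource pool, Recovery Site port group)
    tuples in plan order. A missing mapping raises KeyError in this sketch."""
    by_name = {vm.name: vm for vm in vms}
    steps = []
    for vm_name in plan.order:
        vm = by_name[vm_name]
        steps.append((vm.name,
                      resource_map[vm.resource_pool],
                      network_map[vm.port_group]))
    return steps

# Example with made-up names
vms = [ProtectedVM("web01", "Prod", "VLAN200"),
       ProtectedVM("db01", "Prod", "VLAN100")]
plan = RecoveryPlan("Tier1", order=["db01", "web01"])
resource_map = {"Prod": "DR-Prod"}
network_map = {"VLAN100": "DR-VLAN100", "VLAN200": "DR-VLAN200"}
for step in build_recovery_steps(plan, vms, resource_map, network_map):
    print(step)
```

Failing loudly on an unmapped resource is just a choice made for this sketch; the idea it illustrates is that each Protected Site resource a VM uses should have a Recovery Site counterpart before the plan runs.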

At this point, I’ve decided that I’m going to try to get into a different session. I’m quite familiar with SRM, a lot of readers are probably familiar with it as well, and it doesn’t look like anything new will be revealed here. If you aren’t familiar with SRM and would like to know more, let me know in the comments. If there’s enough interest, I’ll write something separate after I return from VMworld 2008.

Late again! Man, I need to get on the ball! Fortunately, I only missed the first part of the agenda. Once again, no Wi-Fi access in the session breakout, so I’ll publish this at the first available opportunity.

This is BC1693, Architecting DR Solutions with VMware SRM. The presenters are John Arrasjid and Will Crittenden; these are two solid guys that know DR and know SRM very well.

John starts the session with an overview of the influencing factors that affect a DR solution using SRM. Some of these factors may affect what you may or may not be able to do with SRM.

Clearly, there are different types of disasters. Some of these are true disasters (Hurricane Ike in Galveston, for example, or power outages), and others are “planned” failovers. Each of these needs to be accommodated in the design.

Some key questions to consider for DR design:

  • What applications are mission critical?
  • Is availability or performance more important?
  • How much of my business capacity will run at the remote site and for how long will I be able to sustain that load?
  • What distance is required to protect against geographic disasters?
  • What technologies (hardware and software) will be needed?
  • How often will you test the DR plan?
  • What impact will DR plan testing have on the production site? What impact is acceptable to the business?
  • What is the budget for this DR solution?

The three key network influencing factors are distance, bandwidth, and hop count. Throughput is important, but latency must also be considered. The type of replication being used, synchronous vs. asynchronous, is also important.
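A rough calculation shows why distance and latency matter so much, especially for synchronous replication, where a write isn’t acknowledged until the remote site has it. The Python sketch below assumes light travels through optical fiber at roughly 200,000 km/s (about two-thirds of its speed in a vacuum) and counts only propagation delay; per-hop switching and queuing delay (one reason hop count matters) is ignored, so real-world numbers will be worse.

```python
# Back-of-the-envelope propagation delay for replication links.
# Assumes ~200,000 km/s in fiber (200 km per millisecond) and ignores
# per-hop switching/queuing delay and storage array processing time.
FIBER_KM_PER_MS = 200.0

def round_trip_ms(distance_km: float) -> float:
    """Minimum round-trip propagation delay over fiber, in milliseconds."""
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    return 2 * distance_km / FIBER_KM_PER_MS

def max_serial_sync_writes_per_sec(distance_km: float) -> float:
    """Upper bound for one strictly serialized stream of synchronous writes
    (queue depth 1): each write waits at least one round trip."""
    return 1000.0 / round_trip_ms(distance_km)

for km in (10, 100, 1000):
    print(f"{km:>5} km: RTT >= {round_trip_ms(km):.1f} ms, "
          f"<= {max_serial_sync_writes_per_sec(km):.0f} serialized writes/s")
```

Asynchronous replication takes that round trip out of the write path, which is one reason it is usually chosen for longer distances, at the cost of a non-zero RPO.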

VLANs with SRM can be done in two ways: flat VLANs and disparate VLANs. With flat VLANs, no IP reconfiguration is required; with disparate VLANs, SRM can automate the process of reconfiguring IP addresses on VMs during the failover process.
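For the disparate-VLAN case, one common convention is to keep each VM’s host address and swap only the subnet (10.1.10.25 at the Protected Site becomes 10.2.10.25 at the Recovery Site, say). The Python sketch below illustrates that convention with the standard `ipaddress` module; the subnet pairs and the `recovery_ip` function are invented for illustration, and this is not how SRM itself stores or applies IP customization.

```python
import ipaddress

# Hypothetical subnet pairs: each Protected Site subnet maps to a
# same-sized Recovery Site subnet. Illustrative convention only.
SUBNET_MAP = {
    ipaddress.ip_network("10.1.10.0/24"): ipaddress.ip_network("10.2.10.0/24"),
    ipaddress.ip_network("10.1.20.0/24"): ipaddress.ip_network("10.2.20.0/24"),
}

def recovery_ip(protected_ip: str, subnet_map=SUBNET_MAP) -> str:
    """Translate an IP by keeping its host offset within the mapped subnet."""
    addr = ipaddress.ip_address(protected_ip)
    for src, dst in subnet_map.items():
        if addr in src:
            if src.prefixlen != dst.prefixlen:
                raise ValueError(f"{src} and {dst} differ in size")
            offset = int(addr) - int(src.network_address)
            return str(ipaddress.ip_address(int(dst.network_address) + offset))
    # Not in any mapped subnet (e.g., a flat VLAN spanning both sites): keep it
    return protected_ip

print(recovery_ip("10.1.10.25"))   # 10.2.10.25
print(recovery_ip("192.168.5.7"))  # 192.168.5.7
```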

Compliance guidelines that impact the business also need to be considered and incorporated into the design. Things like manual vs. automatic failover, SLA/RPO/RTO, failback requirements, security and access controls, and which technologies to use are all important. What about requirements to ensure that data is isolated to its own media?

What makes a DR solution successful? First, you need to understand what parts of the business need to be protected. Understanding the applications and their dependencies (upstream and downstream) will help in this area. Ongoing testing of the DR plan is another key factor. The core virtualization infrastructure itself is important: do you have the right version, is it correctly architected, are resources appropriately managed, and so on. And, finally, operational readiness is important as well. Teams need to be trained on the different technologies and need to understand the workflows created within SRM.

When setting up SRM, two VirtualCenters and two SRM servers are required. Back-end SQL servers are necessary as well, in addition to authentication servers, and of course a supported data replication mechanism. SRM should not be used to protect “shared services,” like authentication (Active Directory), although these certainly can be virtual machines.

Inventory mapping is used to map networks/port groups at the Protected Site to the corresponding networks/port groups on the Recovery Site. The SRA (Storage Replication Adapter) handles the matching of LUNs between the Protected Site and the Recovery Site. Empty LUNs (LUNs without a VM) won’t be properly recognized by SRM. Protection Groups form the basis of the Recovery Plan and are centered around LUNs. When a LUN is failed over, all VMs on that LUN must be failed over at the same time. This may require some re-organization of the VMs on the various LUNs to group VMs together for similar service levels/failover requirements.
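Because the LUN is the unit of failover, it can help to check VM placement before building Protection Groups. This Python sketch groups VMs by LUN, shows which VMs get dragged along when one VM fails over, and flags LUNs that mix service levels. The VM names, LUN names, tier labels, and function names are hypothetical, and treating each LUN as its own group is a simplification.

```python
from collections import defaultdict

def group_by_lun(vm_to_lun):
    """Map each LUN to the set of VMs stored on it."""
    groups = defaultdict(set)
    for vm, lun in vm_to_lun.items():
        groups[lun].add(vm)
    return dict(groups)

def failover_set(vm, vm_to_lun):
    """Every VM that fails over along with `vm` (all VMs on the same LUN)."""
    return group_by_lun(vm_to_lun)[vm_to_lun[vm]]

def mixed_service_levels(vm_to_lun, vm_tier):
    """LUNs holding VMs with different failover tiers: candidates for
    re-organizing VMs onto separate LUNs."""
    return sorted(lun for lun, vms in group_by_lun(vm_to_lun).items()
                  if len({vm_tier[vm] for vm in vms}) > 1)

# Hypothetical placement and tiers
placement = {"db01": "LUN-10", "app01": "LUN-10",
             "test01": "LUN-10", "web01": "LUN-20"}
tiers = {"db01": "tier1", "app01": "tier1",
         "test01": "tier3", "web01": "tier1"}

print(sorted(failover_set("db01", placement)))  # ['app01', 'db01', 'test01']
print(mixed_service_levels(placement, tiers))   # ['LUN-10']
```

In this made-up example the tier-3 test VM would be failed over with the tier-1 database simply because they share a LUN, which is exactly the kind of placement the re-organization mentioned above is meant to fix.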

In the Recovery Plan, high priority VMs are started sequentially, in the order defined in the plan; medium priority and low priority VMs are started in parallel. It’s critical that business requirements and dependencies are understood here so that systems can be failed over and restarted in the correct order.
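The startup ordering can be sketched in Python with a thread pool. Here `power_on` is a hypothetical callable standing in for whatever actually powers on a VM and waits for it, the plan is a plain dictionary rather than a real SRM Recovery Plan, and the sketch assumes that all medium priority VMs are started before any low priority VMs, which is one reading of the description above.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def run_recovery(plan, power_on, max_workers=4):
    """Start VMs by priority: high sequentially (in plan order), then the
    medium and low tiers, each tier's VMs in parallel. Assumes power_on
    blocks until the VM is up; any exception it raises propagates."""
    for vm in plan.get("high", []):
        power_on(vm)  # one at a time, in the order defined in the plan
    for tier in ("medium", "low"):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # list() waits for every VM in the tier and re-raises errors
            list(pool.map(power_on, plan.get(tier, [])))

# Example with a fake power_on that just records the start order
started = []
lock = threading.Lock()

def fake_power_on(vm):
    with lock:
        started.append(vm)

plan = {"high": ["db01", "db02"], "medium": ["app01", "app02"], "low": ["web01"]}
run_recovery(plan, fake_power_on)
print(started)
```

Keeping the high priority tier sequential is what lets dependencies (a database before the application servers that use it, for example) be expressed simply through list order.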

John then moved deeper into the design considerations, like server types, network configuration, DNS services, Active Directory services, VirtualCenter infrastructure (two VC servers, one at each site), and ESX hosts (needed at both sites). Of course, SRM servers are needed at both sites. Distributed Power Management (DPM) may also play a role here to help reduce power costs for VMware ESX hosts at the Recovery Site.

Will and John then proceeded to review some sample logical diagrams, a sample recovery plan, and a sample workflow based on that recovery plan, and to discuss these items in more detail.

Overall, the session is very good, but it is much more business-oriented than technology-oriented. That may be due in large part to the nature of SRM; in order to be successful in building a DR solution, a strong business focus is required. If nothing else, it would be important for attendees of this session to at least understand that a successful SRM implementation involves much, much more than just installing and configuring SRM.

I’ve never really discussed VMware Site Recovery Manager (SRM) here; there always seemed to be plenty of coverage elsewhere. Just recently, though, I had the opportunity to spend some time with a very knowledgeable SRM resource within VMware, and gathered some notes about VMware SRM that I thought might be helpful. Some of this stuff may be obvious, so bear with me.

  • Storage array replication is a necessity. Without it, SRM can’t be used. Keep in mind that only certain arrays and certain replication technologies are supported, so be sure to check the SRM compatibility list.
  • The Storage Replication Adapter (SRA) is a critical part of an SRM deployment, but it doesn’t come from VMware. It comes from the storage vendor (assuming that it is a compatible array and compatible replication technology).
  • Two instances of VirtualCenter (VC) are required. One of these will be at the “Protected Site,” the other will be at the “Recovery Site.”
  • Likewise, two instances of SRM are needed, one at each site.
  • The VC servers and SRM servers at each site need to be able to talk with each other, i.e., they need IP-based connectivity. SRM will communicate with the local VC server over TCP ports 443 and 8095. SRM will communicate with the remote VC server over TCP port 443. The local SRM server uses the remote VC server as a proxy to communicate with the remote SRM server instead of communicating with it directly.
  • VC and SRM each require their own database.
  • If the physical hardware is sufficiently equipped, then VC and SRM can be co-located on the same server. Otherwise, VC and SRM should be placed on separate physical servers.
  • SRM does not support failback. Instead, create a Recovery Plan in reverse.
  • The VC and SRM databases do not replicate between the sites. They are maintained separately.
  • Observe the “DNS Rule of Four” for SRM—forward lookup, reverse lookup, short name, and fully-qualified domain name (FQDN). All four of these should work properly.
  • All VMs in a Protection Group will fail over at the same time, so users will want to properly architect the Protection Groups to provide the appropriate DR functionality for the right VMs. Application dependencies are important here—failing over some VMs but not others that provide dependency services won’t do much good, now will it?
  • VMware SRM requires VirtualCenter 2.5, and VirtualCenter 2.5 Update 1 is recommended. Update 2 is not supported.
  • Similarly, VMware ESX 3.5 Update 2 is also not supported (yet).
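The “DNS Rule of Four” is easy to check ahead of time. The Python sketch below uses the standard `socket` module and reads the rule as: the FQDN resolves, the short name resolves, both resolve to the same address (forward), and a reverse lookup of that address returns the FQDN. That interpretation, the `dns_rule_of_four` function, and the host names are assumptions; the resolver functions are parameters so the logic can be tried offline with fakes.

```python
import socket

def dns_rule_of_four(short_name, fqdn,
                     resolve=socket.gethostbyname,
                     reverse=socket.gethostbyaddr):
    """Return a dict of check name -> bool for one host (one reading of
    the rule): fqdn, short_name, forward (same address), reverse."""
    def lookup(name):
        try:
            return resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            return None

    fqdn_ip = lookup(fqdn)
    short_ip = lookup(short_name)
    results = {
        "fqdn": fqdn_ip is not None,
        "short_name": short_ip is not None,
        "forward": fqdn_ip is not None and fqdn_ip == short_ip,
        "reverse": False,
    }
    if fqdn_ip is not None:
        try:
            name, aliases, _ = reverse(fqdn_ip)
            names = {n.lower().rstrip(".") for n in [name, *aliases]}
            results["reverse"] = fqdn.lower().rstrip(".") in names
        except OSError:  # socket.herror is a subclass of OSError
            pass
    return results

# Offline demo with fake resolvers (hypothetical names and addresses)
TABLE = {"srm01": "10.0.0.5", "srm01.example.com": "10.0.0.5"}

def fake_resolve(name):
    if name not in TABLE:
        raise socket.gaierror(name)
    return TABLE[name]

def fake_reverse(ip):
    if ip != "10.0.0.5":
        raise socket.herror(ip)
    return ("srm01.example.com", [], [ip])

print(dns_rule_of_four("srm01", "srm01.example.com", fake_resolve, fake_reverse))
```

Against real DNS, calling `dns_rule_of_four("srm01", "srm01.example.com")` (hypothetical host names) from each VC and SRM server for its peers would use the default `socket` resolvers instead of the fakes.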

I’m confident I’ll have more posts on VMware SRM in the coming months. In the meantime, feel free to add your thoughts in the comments below.

Here’s the latest installment of Virtualization Short Takes, my occasionally-weekly view on various virtualization news, reviews, and other happenings. Hopefully I can share something interesting with you!

  • Via VMblog.com, I saw that Transitive Corporation is supporting the use of QuickTransit within Hyper-V virtual machines. This is interesting because it extends the ability of Hyper-V to help customers consolidate applications. QuickTransit, in case you aren’t aware, allows applications written for Solaris/SPARC environments to run in Linux/x86 environments. It was also the technology behind Apple’s Rosetta, which allowed Mac users to run PowerPC apps on Intel Macs. Does anyone know if QuickTransit is supported within VMware VMs, or is this specific to Hyper-V?
  • This one was quite interesting to me. Question #2 is particularly applicable: why is a reboot required, anyway? (Yes, yes, I know—there is a workaround that does not require a reboot. It’s the principle of the matter.)
  • Via various sources on the Internet, I learned about the release of ESX Manager. This looks like quite an interesting tool, although I have not yet had the opportunity to install or try it. Has anyone out there tried it? If so, please share your feedback.
  • Every now and then, something comes up about Citrix XenServer and Xen and it makes me wonder about the relationship between Citrix and the open source Xen community. The latest thing is what appears to be an offhand comment by Simon Crosby of Citrix where he says, “Because we own the hypervisor, we can do much more integration and development around it” (read it in context here). What does that mean? What does “ownership” of the Xen hypervisor mean? And if the Xen hypervisor is licensed under an open source license (GNU GPL v2, according to this page), how can Citrix make proprietary extensions to the hypervisor without being forced to release those extensions back to the community? I guess I just don’t understand the relationship there and how it works. This is where the murky waters of a commercial entity “owning” an open source project come into play, in my mind.
  • I ran across this very useful tip for creating a vSwitch with a specific number of ports. It looks like Dwight Hubbard, the maintainer of the site, also has some other interesting posts. Might be worth adding his feed to your RSS reader.
  • Nick Triantos discusses NetApp’s Storage Replication Adapter (SRA) and its role with VMware Site Recovery Manager (SRM). Does anyone have links to similar discussions of the SRAs for other storage vendors?
  • John Howard provides a great breakdown of how Hyper-V generates dynamic MAC addresses and how Hyper-V attempts to protect against MAC collisions in some circumstances.
  • The VI3 Security Hardening Guide has been updated, which is good because some people felt it just didn’t go far enough.
  • VMware reiterated their stance on being storage protocol agnostic, and in the article included a very useful table that summarizes the various products and technologies and which are supported with which storage protocols. While the rest of the post is helpful, that summary of supported features is probably the most helpful.
  • Interested in trying out Hyper-V, but don’t have shared storage? Take a look at this blog post. I think you’ll find it helpful.

I’m always on the lookout for other interesting or useful virtualization news, tips, and tricks, so feel free to share with me and other readers in the comments.
