VMwareSRM


There is no general session this morning at VMworld 2008; instead, a “keynote” will be delivered about automating disaster recovery (DR) using VMware Site Recovery Manager (SRM). This is similar to the way in which other vendors have delivered various “keynotes” throughout the conference instead of all the announcements being crammed into the morning general sessions.

The speaker this morning is Jay Judkowitz, the product manager for VMware SRM. I’ve met Jay before; he’s a good guy. There’s a small technical glitch as the session begins because the slide deck doesn’t come up, but that gets resolved within only a few minutes and Jay begins his presentation.

The presentation begins with yet another overview of the VDC-OS vision; SRM is considered one of the vCenter management vServices. Jay then goes on to cover the various ways in which VMware provides availability for applications hosted on VMware Infrastructure: technologies like VMotion, VMware HA, VMware DRS, VMware FT, NIC teaming, storage multipathing, and of course Site Recovery Manager.

The traditional challenges of DR (including complex recovery processes and procedures, hardware dependence, inability to test extensively or repeatedly) are all addressed by VMware SRM. More accurately, they are addressed by the products that form the foundation underneath VMware SRM, through features like hardware independence, encapsulation, partitioning and consolidation, and resource pooling. These features have a direct play in a DR environment. It’s funny to see Jay taking this particular approach; it’s almost like he’s using the same slide deck that I’ve used in DR presentations given over the last couple of months.

That finally brings the discussion around to Site Recovery Manager specifically. Jay goes over some of the features of SRM, and discusses some “dos and don’ts” for SRM. For example, SRM isn’t really intended to provide failover for a single VM, although you can architect it to do that (put that VM on a single LUN by itself and create a Protection Group for that LUN and VM, then craft your Recovery Plan).

It’s important to note that SRM is not a replication product, but instead relies upon replication products from supported partners. This is done via the Storage Replication Adapter (SRA), a piece of software written by the storage vendor.

When setting up SRM, there are a number of steps to go through. First, you have to integrate with the storage replication in place already (and yes, the storage replication needs to be in place already). Next, you need to map recovery resources; this creates the link between resources used in the Protected Site and the resources that will be used in the Recovery Site. Third, you need to create Recovery Plans, which are the automated equivalent of the DR runbook. That is, the Recovery Plan defines which VMs will fail over, and in which order, at the Recovery Site. That’s a bit of a simplistic overview but it does get the point across.

At this point, I’ve decided that I’m going to try to get into a different session. I’m quite familiar with SRM, a lot of readers are probably familiar with it as well, and it doesn’t look like there is anything new that will be revealed here. For those readers that aren’t familiar with SRM, let me know in the comments. If there’s enough interest, I’ll write something separate after my return from VMworld 2008.


Late again! Man, I need to get on the ball! Fortunately, I only missed the first part of the agenda. Once again, no Wi-Fi access in the session breakout, so I’ll publish this at the first available opportunity.

This is BC1693, Architecting DR Solutions with VMware SRM. The presenters are John Arrasjid and Will Crittenden; these are two solid guys that know DR and know SRM very well.

John starts the session with an overview of the influencing factors that affect a DR solution using SRM. Some of these factors may affect what you may or may not be able to do with SRM.

Clearly, there are different types of disasters. Some of these are true disasters (Hurricane Ike in Galveston, for example, or power outages) and others are “planned” failovers. Each of these needs to be accommodated in the design.

Some key questions to consider for DR design:

  • What applications are mission critical?
  • Is availability or performance more important?
  • How much of my business capacity will run at the remote site, and for how long will I be able to sustain that load?
  • What distance is required to protect against geographic disasters?
  • What technologies (hardware and software) will be needed?
  • How often will you test the DR plan?
  • What impact will DR plan testing have on the production site? What impact is acceptable to the business?
  • What is the budget for this DR solution?

The three key network influencing factors are distance, bandwidth, and hop count. Throughput is good, but latency must also be considered. The type of replication, synchronous vs. asynchronous, that is being used is also important.
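To put rough numbers on the latency point, here is a minimal Python sketch that estimates the round-trip time that distance alone adds to a replication link. The figure of 5 microseconds per kilometer is a common rule of thumb for light in optical fiber, and the per-hop delay is a purely hypothetical placeholder; neither comes from the session, so measure your own links before making design decisions.

```python
# Rule-of-thumb propagation delay in optical fiber (an assumption, not from the session).
FIBER_DELAY_MS_PER_KM = 0.005


def estimated_rtt_ms(distance_km: float, hops: int = 0, per_hop_ms: float = 0.0) -> float:
    """Estimate round-trip time from fiber distance plus a per-hop delay.

    per_hop_ms is a hypothetical placeholder for router/switch delay.
    """
    one_way = distance_km * FIBER_DELAY_MS_PER_KM + hops * per_hop_ms
    return 2 * one_way


if __name__ == "__main__":
    for km in (50, 500, 2000):
        print(f"{km:>5} km -> ~{estimated_rtt_ms(km):.1f} ms round trip")
```

With synchronous replication, every write waits at least one round trip for the remote acknowledgement, which is why longer distances tend to push designs toward asynchronous replication.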

VLANs with SRM can be done in two ways: flat VLANs and disparate VLANs. With flat VLANs, no IP reconfiguration is required; with disparate VLANs, SRM can automate the process of reconfiguring IP addresses on VMs during the failover process.
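As an illustration of what the disparate-VLAN case involves, the sketch below maps a VM’s address from a Protected Site subnet to a Recovery Site subnet while keeping the host portion. This is not how SRM itself performs the change; the function name and the example subnets are hypothetical, and it assumes both subnets use the same prefix length.

```python
import ipaddress


def remap_ip(address: str, protected_net: str, recovery_net: str) -> str:
    """Keep the host portion of `address` and swap in the recovery subnet.

    Hypothetical helper for planning purposes; not part of SRM.
    """
    src = ipaddress.ip_network(protected_net)
    dst = ipaddress.ip_network(recovery_net)
    ip = ipaddress.ip_address(address)
    if ip not in src:
        raise ValueError(f"{address} is not in {protected_net}")
    if src.prefixlen != dst.prefixlen:
        raise ValueError("both subnets must use the same prefix length")
    host_offset = int(ip) - int(src.network_address)
    return str(dst.network_address + host_offset)


if __name__ == "__main__":
    # Example subnets are made up for illustration.
    print(remap_ip("10.1.20.15", "10.1.20.0/24", "172.16.20.0/24"))
```

A table built this way is handy for reviewing the planned addresses (and the matching DNS changes) before any failover test.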

Compliance guidelines that impact the business also need to be considered and incorporated into the design. Things like manual vs. automatic, SLA/RPO/RTO, failback requirements, security and access controls, and which technologies to use are all important. What about requirements to ensure that data is isolated to its own media?

What makes a DR solution successful? First, you need to understand which parts of the business need to be protected. Understanding the applications and the dependencies (upstream and downstream) will help in this area. Ongoing testing of the DR plan is another key factor. The core virtualization itself is important: do you have the right version, is it correctly architected, are resources appropriately managed, etc. And, finally, operational readiness is important as well. Teams need to be trained on the different technologies and need to understand the workflows created within SRM.

When setting up SRM, two VirtualCenters and two SRM servers are required. Back-end SQL servers are necessary as well, in addition to authentication servers, and of course a supported data replication mechanism. SRM should not be used to protect “shared services,” like authentication (Active Directory), although these certainly can be virtual machines.

Inventory mapping is used to map networks/port groups at the Protected Site to the corresponding networks/port groups on the Recovery Site. The SRA (Storage Replication Adapter) handles the matching of LUNs between the Protected Site and the Recovery Site. Empty LUNs (LUNs without a VM) won’t be properly recognized by SRM. Protection Groups form the basis of the Recovery Plan and are centered around LUNs. When a LUN is failed over, all VMs on that LUN must be failed over at the same time. This may require some re-organization of the VMs on the various LUNs to group VMs together for similar service levels/failover requirements.
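Because every VM on a LUN fails over together, it can help to check the current VM placement before defining Protection Groups. The following is a minimal planning sketch in Python, not anything SRM provides: the VM names, LUN names, and service tiers are hypothetical inputs you would supply from your own inventory.

```python
from collections import defaultdict


def group_vms_by_lun(vm_to_lun):
    """Return {lun: [vm, ...]}; each LUN is a candidate Protection Group."""
    groups = defaultdict(list)
    for vm, lun in vm_to_lun.items():
        groups[lun].append(vm)
    return {lun: sorted(vms) for lun, vms in groups.items()}


def mixed_tier_luns(vm_to_lun, vm_to_tier):
    """Return LUNs whose VMs span more than one service tier.

    These LUNs are candidates for re-organizing VMs before failover planning.
    """
    tiers = defaultdict(set)
    for vm, lun in vm_to_lun.items():
        tiers[lun].add(vm_to_tier[vm])
    return sorted(lun for lun, seen in tiers.items() if len(seen) > 1)


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    placement = {"sql01": "lun1", "app01": "lun1", "test01": "lun1", "web01": "lun2"}
    tier = {"sql01": "gold", "app01": "gold", "test01": "bronze", "web01": "gold"}
    print(group_vms_by_lun(placement))
    print(mixed_tier_luns(placement, tier))
```

In this made-up inventory, `lun1` mixes gold and bronze VMs, so failing it over would drag the test VM along with the production ones.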

In the Recovery Plan, high priority VMs are started sequentially, in the order defined in the plan; medium priority and low priority VMs are started in parallel. It’s critical that business requirements and dependencies are understood here so that systems can be failed over and restarted in the correct order.
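The startup ordering described above can be sketched in a few lines of Python. This is only a model of the behavior, not SRM’s implementation: `power_on` is a hypothetical callable standing in for whatever actually starts a VM, and the sketch assumes that the medium tier finishes before the low tier begins, which the session did not spell out.

```python
from concurrent.futures import ThreadPoolExecutor


def run_recovery_plan(plan, power_on):
    """Start VMs tier by tier.

    plan: dict with optional 'high', 'medium', 'low' lists of VM names.
    power_on: hypothetical callable that takes a VM name and starts it.
    """
    # High priority: one at a time, in the order listed in the plan.
    for vm in plan.get("high", []):
        power_on(vm)

    # Medium, then low: VMs within each tier are started in parallel.
    for tier in ("medium", "low"):
        vms = plan.get(tier, [])
        if not vms:
            continue
        with ThreadPoolExecutor(max_workers=len(vms)) as pool:
            # list() waits for every start and re-raises any power-on error.
            list(pool.map(power_on, vms))


if __name__ == "__main__":
    # Hypothetical VM names for illustration only.
    plan = {"high": ["dc01", "sql01"], "medium": ["app01", "app02"], "low": ["test01"]}
    run_recovery_plan(plan, lambda vm: print(f"powering on {vm}"))
```

Putting dependencies such as databases in the high-priority tier, in order, is what guarantees they are up before the parallel tiers start.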

John now moves deeper into the design considerations, like server types, network configuration, DNS services, Active Directory services, VirtualCenter infrastructure (two VC servers, one at each site), and ESX hosts (needed at both sites). Of course, SRM servers are needed at both sites. Distributed Power Management (DPM) may also play a role here to help reduce power costs for VMware ESX hosts at the Recovery Site.

Will and John then review some sample logical diagrams, a sample recovery plan, and a sample workflow based on that recovery plan, and discuss these items in more detail.

Overall, the session is very good, but it is much more business-oriented and not technology-oriented. That may be due in large part to the nature of SRM; in order to be successful in building a DR solution, a strong business focus is required. If nothing else, it would be important for attendees of this session to at least understand that a successful SRM implementation involves much, much more than just installing and configuring SRM.


I’ve never really discussed VMware Site Recovery Manager (SRM) here; there always seemed to be plenty of coverage elsewhere. Just recently, though, I had the opportunity to spend some time with a very knowledgeable SRM resource within VMware, and gathered some notes about VMware SRM that I thought might be helpful. Some of this stuff may be obvious, so bear with me.

  • Storage array replication is a necessity. Without it, SRM can’t be used. Keep in mind that only certain arrays and certain replication technologies are supported, so be sure to check the SRM compatibility list.
  • The Storage Replication Adapter (SRA) is a critical part of an SRM deployment, but it doesn’t come from VMware. It comes from the storage vendor (assuming that it is a compatible array and compatible replication technology).
  • Two instances of VirtualCenter (VC) are required. One of these will be at the “Protected Site,” the other will be at the “Recovery Site.”
  • Likewise, two instances of SRM are needed, one at each site.
  • The VC servers and SRM servers at each site need to be able to talk with each other, i.e., they need IP-based connectivity. SRM will communicate with the local VC server over TCP ports 443 and 8095. SRM will communicate with the remote VC server over TCP port 443. The local SRM server uses the remote VC server as a proxy to communicate with the remote SRM server instead of communicating with it directly.
  • VC and SRM each require their own database.
  • If the physical hardware is sufficiently equipped, then VC and SRM can be co-located on the same server. Otherwise, VC and SRM should be placed on separate physical servers.
  • SRM does not support failback. Instead, create a Recovery Plan in reverse.
  • The VC and SRM databases do not replicate between the sites. They are maintained separately.
  • Observe the “DNS Rule of Four” for SRM—forward lookup, reverse lookup, short name, and fully-qualified domain name (FQDN). All four of these should work properly.
  • All VMs in a Protection Group will fail over at the same time, so users will want to properly architect the Protection Groups to provide the appropriate DR functionality for the right VMs. Application dependencies are important here—failing over some VMs but not others that provide dependency services won’t do much good, now will it?
  • VMware SRM requires VirtualCenter 2.5, and VirtualCenter 2.5 Update 1 is recommended. Update 2 is not supported.
  • Similarly, VMware ESX 3.5 Update 2 is also not supported (yet).
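The “DNS Rule of Four” from the list above lends itself to a quick pre-flight check. Here is a minimal Python sketch of one way to interpret it: the short name resolves, the FQDN resolves, both resolve to the same address, and that address maps back to the FQDN. The function name and the host names in the example are hypothetical, and the resolver functions are passed in as parameters so that the check can be exercised without touching real DNS.

```python
import socket


def dns_rule_of_four(short_name, fqdn,
                     resolve=socket.gethostbyname,
                     reverse=socket.gethostbyaddr):
    """Return a dict of four pass/fail checks for one server.

    One interpretation of the rule; the check names are this sketch's own.
    """
    checks = {"short_name": False, "fqdn": False, "forward": False, "reverse": False}

    try:
        short_ip = resolve(short_name)
        checks["short_name"] = True
    except OSError:
        short_ip = None

    try:
        fqdn_ip = resolve(fqdn)
        checks["fqdn"] = True
    except OSError:
        fqdn_ip = None

    # Forward lookups agree: both names point at the same address.
    checks["forward"] = fqdn_ip is not None and short_ip == fqdn_ip

    # Reverse lookup: the address maps back to the FQDN.
    if fqdn_ip is not None:
        try:
            primary, aliases, _ = reverse(fqdn_ip)
            names = {primary.lower()} | {alias.lower() for alias in aliases}
            checks["reverse"] = fqdn.lower() in names
        except OSError:
            pass

    return checks


if __name__ == "__main__":
    # Hypothetical host names; substitute your own VC and SRM servers.
    print(dns_rule_of_four("srm01", "srm01.example.com"))
```

Running this for each VC and SRM server at both sites gives a quick list of which of the four lookups still need fixing.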

I’m confident I’ll have more posts on VMware SRM in the coming months. In the meantime, feel free to add your thoughts in the comments below.


Here’s the latest installment of Virtualization Short Takes, my occasionally-weekly view on various virtualization news, reviews, and other happenings. Hopefully I can share something interesting with you!

  • Via VMblog.com, I saw that Transitive Corporation is supporting the use of QuickTransit within Hyper-V virtual machines. This is interesting because it extends the ability of Hyper-V to help customers consolidate applications. QuickTransit, in case you aren’t aware, allows applications written for Solaris/SPARC environments to run in Linux/x86 environments. It was also the technology behind Apple’s Rosetta, which allowed Mac users to run PowerPC apps on Intel Macs. Does anyone know if QuickTransit is supported within VMware VMs, or is this specific to Hyper-V?
  • This one was quite interesting to me. Question #2 is particularly applicable: why is a reboot required, anyway? (Yes, yes, I know—there is a workaround that does not require a reboot. It’s the principle of the matter.)
  • Via various sources on the Internet, I learned about the release of ESX Manager. This looks like quite an interesting tool, although I have not yet had the opportunity to install or try it. Has anyone out there tried this and have some feedback for us?
  • Every now and then, something comes up about Citrix XenServer and Xen and it makes me wonder about the relationship between Citrix and the open source Xen community. The latest thing is what appears to be an offhand comment by Simon Crosby of Citrix where he says, “Because we own the hypervisor, we can do much more integration and development around it” (read it in context here). What does that mean? What does “ownership” of the Xen hypervisor mean? And if the Xen hypervisor is licensed under an open source license (GNU GPL v2, according to this page), how can Citrix make proprietary extensions to the hypervisor without being forced to release those extensions back to the community? I guess I just don’t understand the relationship there and how it works. This is where the murky waters of a commercial entity “owning” an open source project come into play, in my mind.
  • I ran across this very useful tip for creating a vSwitch with a specific number of ports. It looks like Dwight Hubbard, the maintainer of the site, also has some other interesting posts. Might be worth adding his feed to your RSS reader.
  • Nick Triantos discusses NetApp’s Storage Replication Adapter (SRA) and its role with VMware Site Recovery Manager (SRM). Anyone have any links to similar discussions of the SRAs for other storage vendors?
  • John Howard provides a great breakdown of how Hyper-V generates dynamic MAC addresses and how Hyper-V attempts to protect against MAC collisions in some circumstances.
  • The VI3 Security Hardening Guide has been updated, which is good because some people felt it just didn’t go far enough.
  • VMware re-iterated their stance on being storage protocol agnostic, and in the article included a very useful table that summarizes the various products and technologies and which are supported with which storage protocols. While the rest of the post is helpful, that summary of supported features is probably the most helpful.
  • Interested in trying out Hyper-V, but don’t have shared storage? Take a look at this blog post. I think you’ll find it helpful.

I’m always on the lookout for other interesting or useful virtualization news, tips, and tricks, so feel free to share with me and other readers in the comments.
