VMware HA Failover Capacity Changes

This post continues the discussion of VMware HA failover capacity that started in this article and continued in this follow-up article. It appears that VMware has added the ability to modify the “slot size” used in calculating VMware HA failover capacity as part of ESX Server 3.5 and VirtualCenter 2.5.

Duncan Epping of Yellow Bricks alerted me to this VMware KB article in this posting on his site. The PDF from VMware linked in that KB article discusses a new option for setting the default “slot size”.  To quote from Duncan’s site:

If no VM reservations are set in a cluster, VMware HA assumes cluster-wide average CPU and memory reservation sizes of 256 MHz and 256 MB to use in admission control calculations. Alternative values can be specified instead…
 
Add the das.vmMemoryMinMB = <value> and das.vmCpuMinMHz = <value> option/value pairs to the cluster’s settings where <value> represents the desired values in terms of MB and MHz. Higher values will reserve more space for failovers.
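
To make the effect of those two values concrete, here is a rough Python sketch of a slot-based capacity estimate. Only the 256 MHz / 256 MB defaults and the option names das.vmCpuMinMHz and das.vmMemoryMinMB come from the quoted text. The function names, the host sizes, and the rule that the largest hosts are the ones assumed to fail are illustrative assumptions of mine, not VMware's documented algorithm.

```python
# Defaults quoted from the VMware PDF; they apply when no VM reservations are set.
DEFAULT_SLOT_CPU_MHZ = 256   # overridden by das.vmCpuMinMHz
DEFAULT_SLOT_MEM_MB = 256    # overridden by das.vmMemoryMinMB


def slots_per_host(host_cpu_mhz, host_mem_mb,
                   slot_cpu_mhz=DEFAULT_SLOT_CPU_MHZ,
                   slot_mem_mb=DEFAULT_SLOT_MEM_MB):
    """Whole slots that fit on one host; the scarcer resource is the limit.

    Hypothetical helper: a simplification, not VMware's actual calculation.
    """
    return min(host_cpu_mhz // slot_cpu_mhz, host_mem_mb // slot_mem_mb)


def slots_after_failures(hosts, host_failures,
                         slot_cpu_mhz=DEFAULT_SLOT_CPU_MHZ,
                         slot_mem_mb=DEFAULT_SLOT_MEM_MB):
    """Slots left in the cluster if the largest `host_failures` hosts are lost.

    `hosts` is a list of (cpu_mhz, mem_mb) tuples. Assuming the largest hosts
    fail is a worst-case choice made for this sketch.
    """
    per_host = sorted(
        slots_per_host(cpu, mem, slot_cpu_mhz, slot_mem_mb)
        for cpu, mem in hosts
    )
    # Drop the hosts with the most slots, keep the rest.
    survivors = per_host[:len(per_host) - host_failures]
    return sum(survivors)


if __name__ == "__main__":
    # Three identical, made-up hosts: 8000 MHz of CPU and 16 GB of RAM each.
    cluster = [(8000, 16384)] * 3

    # Default slot size, tolerating one host failure.
    print(slots_after_failures(cluster, 1))

    # Larger slot size, as if das.vmCpuMinMHz = 512 and das.vmMemoryMinMB = 1024.
    print(slots_after_failures(cluster, 1, slot_cpu_mhz=512, slot_mem_mb=1024))
```

Under these assumptions, raising the slot size shrinks the number of VMs the cluster will admit, which matches the quoted note that higher values reserve more space for failovers.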

So this looks like it’s allowing us to specify how VMware HA should calculate the default slot size, but as in so many areas of VMware HA there is precious little documentation.  I like VMware HA; I really do.  But VMware needs to get somebody on the ball to document the exact configuration and operation of VMware HA so that this solution becomes less of the “black box” that it is today.  As it stands currently, many customers are forgoing the benefits of VMware HA because it can’t be reliably and consistently configured and debugged.

<aside>Cases in point: I was at a meeting before Christmas with a customer who is having problems with their VMware HA clusters, and we can’t find anyone, inside or outside of VMware, who can speak definitively about VMware HA, how it should be configured, or how it operates.  Back in October, I blogged about problems with isolation response, and I still haven’t gotten those problems resolved.  C’mon, VMware, don’t drop the ball!</aside>

If anyone can shed some light on these new settings—I plan on testing them in the lab as soon as possible—that would be very useful.  In the meantime, I encourage everyone to check out the PDF linked in the VMware KB article on VMware HA best practices.  And, just for fun, check out this white paper on the VMware HA VM failure monitoring functionality that’s new in ESX Server 3.5.


I will try to test these settings this week if I can find the time. Indeed, it seems that it’s a way to have a more accurate calculation for your cluster. For some reason the documentation is indeed poor; the same goes for VC and ESX, by the way. The in-program documentation seems to be “half-updated”.

Duncan,

Let me know what you find, if anything. Thanks!

The VMware HA agent is actually using EMC’s AutoStart package (just found that out myself, not sure how widely available that fact is). Getting access to those documents would probably prove quite useful.

Jason,

EMC AutoStart: is that the new name for the old Legato Automated Availability Manager (AAM)? That’s what the HA agent used to be based upon, but no one can seem to find any information on that product set, either.

I have a Legato AAM Concepts Guide (from 2001) that I’ve managed to find, if you want it. If so, send me a mail.

Cheers,

Warren