A problem has been identified with VMware ESX 3.5 Update 3 when using VMware HA with VM failure monitoring enabled. The problem results from a delay in the transmission of a heartbeat from a VM to VMware HA; VMware HA interprets the delayed heartbeat as a VM failure and restarts the VM. It appears that this problem affects both VMware ESX and VMware ESXi.
More information on the problem is available in this KB article.
To work around the problem, users have two options:
- Disable virtual machine failure monitoring within the VMware HA cluster.
- Reconfigure the host to change the heartbeat delay.
To reconfigure the host to change the heartbeat delay, follow the steps below:
- Disconnect the host from VC (right-click on the host in the VI Client and select “Disconnect”).
- Log in to the VMware ESX server via SSH and obtain root permissions. Remember that best practices specify not to allow root SSH login, so you’ll need to log in as an ordinary user and then use “su -” to become root.
- Using a text editor such as nano or vi, edit the file “/etc/vmware/hostd/config.xml” and set the value of heartbeatDelayInSecs to 0.
- Restart the management agents on the VMware ESX server (on ESX 3.5 this is typically done with “service mgmt-vmware restart”).
- Reconnect the host in VC (right-click on the host in the VI Client and select “Connect”).
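For reference, the edit to config.xml in the steps above would look roughly like the fragment below. Treat it as a sketch rather than a copy of the KB article: the element name heartbeatDelayInSecs comes from the step itself, but the root element name, the placement of the setting inside a vmsvc section, and the comments standing in for the rest of the file are assumptions. Check the KB article and your own config.xml before changing anything, and keep a backup copy of the file.

```xml
<!-- /etc/vmware/hostd/config.xml (excerpt only; all other content omitted) -->
<config>
  <!-- ... existing settings left unchanged ... -->
  <vmsvc>
    <!-- Assumed location of the setting; verify against the KB article -->
    <heartbeatDelayInSecs>0</heartbeatDelayInSecs>
    <!-- ... any other existing vmsvc settings left unchanged ... -->
  </vmsvc>
</config>
```

The change does not take effect until the management agents have been restarted, which is why the restart step follows the edit.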
No information is yet available on when this issue will be fixed.