<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Troubleshooting VMware HA Isolation Response</title>
	<atom:link href="http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/</link>
	<description>The weblog of an IT pro specializing in virtualization, storage, and servers</description>
	<pubDate>Mon, 01 Dec 2008 20:17:13 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6</generator>
		<item>
		<title>By: slowe</title>
		<link>http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-41882</link>
		<dc:creator>slowe</dc:creator>
		<pubDate>Wed, 08 Oct 2008 19:44:06 +0000</pubDate>
		<guid isPermaLink="false">http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-41882</guid>
		<description>Daniel,

Disabling HA will "uninstall" the HA agent on the VMware ESX servers and theoretically clean up any lingering information. Aside from a host failure occurring while HA is disabled, there's no risk of which I am aware. Even so, prepare appropriately in production environments--just in case.</description>
		<content:encoded><![CDATA[<p>Daniel,</p>
<p>Disabling HA will &#8220;uninstall&#8221; the HA agent on the VMware ESX servers and theoretically clean up any lingering information. Aside from a host failure occurring while HA is disabled, there&#8217;s no risk of which I am aware. Even so, prepare appropriately in production environments&#8211;just in case.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: DCasota</title>
		<link>http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-41881</link>
		<dc:creator>DCasota</dc:creator>
		<pubDate>Wed, 08 Oct 2008 19:36:52 +0000</pubDate>
		<guid isPermaLink="false">http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-41881</guid>
		<description>I think we have the same situation as TomK: working hosts are trying to power on VMs which haven't been powered off. But the network connection has been re-established.
It seems that each ESX server "memorizes" all VMs which have to be restarted in response to the isolation event.

Killing the "wrong" VMs on each ESX server would be a solution, but with more than 60 VMs and 6 ESX servers it's quite a lot of work.

What possibilities do we have to clean up the situation? What will happen if we disable HA? Will it clean up the isolation response so that no VM is restarted unexpectedly?

Daniel</description>
		<content:encoded><![CDATA[<p>I think we have the same situation as TomK: working hosts are trying to power on VMs which haven&#8217;t been powered off. But the network connection has been re-established.<br />
It seems that each ESX server &#8220;memorizes&#8221; all VMs which have to be restarted in response to the isolation event.</p>
<p>Killing the &#8220;wrong&#8221; VMs on each ESX server would be a solution, but with more than 60 VMs and 6 ESX servers it&#8217;s quite a lot of work.</p>
<p>What possibilities do we have to clean up the situation? What will happen if we disable HA? Will it clean up the isolation response so that no VM is restarted unexpectedly?</p>
<p>Daniel</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: TomK</title>
		<link>http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-40567</link>
		<dc:creator>TomK</dc:creator>
		<pubDate>Wed, 13 Aug 2008 10:28:57 +0000</pubDate>
		<guid isPermaLink="false">http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-40567</guid>
		<description>Great post, Scott - exactly the same behaviour I'm seeing (except the vmkernel log, I will need to test that to see if I'm seeing the same there). 

I was wondering if this has been logged as a bug with VMware? Clearly, if the working hosts are trying to power on the VM before it has been powered off, the software has almost created a split-brain scenario without any misconfiguration - surely a bug?

I'm going to do some testing with ESX 3.5 Update 2 (assuming the timebomb fix is stable) - will post results.

Cheers,
TK</description>
		<content:encoded><![CDATA[<p>Great post, Scott - exactly the same behaviour I&#8217;m seeing (except the vmkernel log, I will need to test that to see if I&#8217;m seeing the same there). </p>
<p>I was wondering if this has been logged as a bug with VMware? Clearly, if the working hosts are trying to power on the VM before it has been powered off, the software has almost created a split-brain scenario without any misconfiguration - surely a bug?</p>
<p>I&#8217;m going to do some testing with ESX 3.5 Update 2 (assuming the timebomb fix is stable) - will post results.</p>
<p>Cheers,<br />
TK</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Erik Bussink</title>
		<link>http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-40162</link>
		<dc:creator>Erik Bussink</dc:creator>
		<pubDate>Fri, 25 Jul 2008 14:11:21 +0000</pubDate>
		<guid isPermaLink="false">http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-40162</guid>
		<description>Hiya,

I also created a larger script to help get some additional information from HA.

ha_status.sh

#!/bin/sh
# Erik Bussink (July 2008)
#
# requires the FT_DIR variable
# add the following line to your /root/.bashrc
#  export FT_DIR=/opt/vmware/aam
#


# Monitor AAM Health (Command taken from VMworld 2006 - Effective DRA and HA in production by Nitin Suri)
/usr/bin/perl /opt/vmware/aam/ha/aam_config_util.pl -z -cmd=listnodes -domain=vmware


# List current HA members status
echo
/opt/vmware/aam/bin/ftcli -domain vmware -cmd "la -l"
echo

# List status of AAM Agent on each node
echo
echo "-[ esx11 ]---------------------------------------------------------------------"
/opt/vmware/aam/bin/ftcli -domain vmware -cmd "status esx11"
echo "-[ esx12 ]---------------------------------------------------------------------"
/opt/vmware/aam/bin/ftcli -domain vmware -cmd "status esx12"
echo "-[ esx13 ]---------------------------------------------------------------------"
/opt/vmware/aam/bin/ftcli -domain vmware -cmd "status esx13"
echo "-[ esx14 ]---------------------------------------------------------------------"
/opt/vmware/aam/bin/ftcli -domain vmware -cmd "status esx14"
echo "-[ esx15 ]---------------------------------------------------------------------"
/opt/vmware/aam/bin/ftcli -domain vmware -cmd "status esx15"
echo "-[ esx16 ]---------------------------------------------------------------------"
/opt/vmware/aam/bin/ftcli -domain vmware -cmd "status esx16"

Sure, I have not really optimized the shell lines near the end, but it's easy and it can be useful.
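
As a possible tidy-up of the repeated per-host lines, here is a minimal sketch (not part of the original script) that loops over the node names instead. It reuses only the ftcli status call shown above; the node_status function and the FTCLI and HA_DOMAIN variables are hypothetical names introduced for illustration, and the host list is the same esx11 to esx16 set.

```shell
#!/bin/sh
# Sketch only: loop over the HA nodes instead of repeating one block per host.
# FTCLI and HA_DOMAIN are hypothetical overrides; the defaults match the script above.
FTCLI="${FTCLI:-/opt/vmware/aam/bin/ftcli}"
HA_DOMAIN="${HA_DOMAIN:-vmware}"

# Print a header line and the AAM agent status for every node name given as an argument.
node_status() {
    for node in "$@"; do
        printf '%s\n' "-[ $node ]---------------------------------------------------------------------"
        "$FTCLI" -domain "$HA_DOMAIN" -cmd "status $node"
    done
}

# Only query the cluster when ftcli is actually installed on this host.
if [ -x "$FTCLI" ]; then
    node_status esx11 esx12 esx13 esx14 esx15 esx16
fi
```

Adding or removing a host then means editing one list rather than two lines per host.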

Erik</description>
		<content:encoded><![CDATA[<p>Hiya,</p>
<p>I also created a larger script to help get some additional information from HA.</p>
<p>ha_status.sh</p>
<p>#!/bin/sh<br />
# Erik Bussink (July 2008)<br />
#<br />
# requires the FT_DIR variable<br />
# add the following line to your /root/.bashrc<br />
#  export FT_DIR=/opt/vmware/aam<br />
#</p>
<p># Monitor AAM Health (Command taken from VMworld 2006 - Effective DRA and HA in production by Nitin Suri)<br />
/usr/bin/perl /opt/vmware/aam/ha/aam_config_util.pl -z -cmd=listnodes -domain=vmware</p>
<p># List current HA members status<br />
echo<br />
/opt/vmware/aam/bin/ftcli -domain vmware -cmd "la -l"<br />
echo</p>
<p># List status of AAM Agent on each node<br />
echo<br />
echo "-[ esx11 ]---------------------------------------------------------------------"<br />
/opt/vmware/aam/bin/ftcli -domain vmware -cmd "status esx11"<br />
echo "-[ esx12 ]---------------------------------------------------------------------"<br />
/opt/vmware/aam/bin/ftcli -domain vmware -cmd "status esx12"<br />
echo "-[ esx13 ]---------------------------------------------------------------------"<br />
/opt/vmware/aam/bin/ftcli -domain vmware -cmd "status esx13"<br />
echo "-[ esx14 ]---------------------------------------------------------------------"<br />
/opt/vmware/aam/bin/ftcli -domain vmware -cmd "status esx14"<br />
echo "-[ esx15 ]---------------------------------------------------------------------"<br />
/opt/vmware/aam/bin/ftcli -domain vmware -cmd "status esx15"<br />
echo "-[ esx16 ]---------------------------------------------------------------------"<br />
/opt/vmware/aam/bin/ftcli -domain vmware -cmd "status esx16"</p>
<p>Sure, I have not really optimized the shell lines near the end, but it&#8217;s easy and it can be useful.</p>
<p>Erik</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: slowe</title>
		<link>http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-40022</link>
		<dc:creator>slowe</dc:creator>
		<pubDate>Fri, 11 Jul 2008 13:58:14 +0000</pubDate>
		<guid isPermaLink="false">http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-40022</guid>
		<description>Erik,

Good stuff--thanks for sharing it here!</description>
		<content:encoded><![CDATA[<p>Erik,</p>
<p>Good stuff&#8211;thanks for sharing it here!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Erik Bussink</title>
		<link>http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-40016</link>
		<dc:creator>Erik Bussink</dc:creator>
		<pubDate>Fri, 11 Jul 2008 10:20:01 +0000</pubDate>
		<guid isPermaLink="false">http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-40016</guid>
		<description>And a 2nd one that will continuously monitor the status

ha_watch.sh

#!/bin/sh
# Created by Erik Bussink (July 2008)
#
# requires the FT_DIR variable
# add the following line to your /root/.bashrc
#  export FT_DIR=/opt/vmware/aam
#
watch '/opt/vmware/aam/bin/ftcli -domain vmware -connect localhost -port 8042 -timeout 60 -cmd listnodes'</description>
		<content:encoded><![CDATA[<p>And a 2nd one that will continuously monitor the status</p>
<p>ha_watch.sh</p>
<p>#!/bin/sh<br />
# Created by Erik Bussink (July 2008)<br />
#<br />
# requires the FT_DIR variable<br />
# add the following line to your /root/.bashrc<br />
#  export FT_DIR=/opt/vmware/aam<br />
#<br />
watch '/opt/vmware/aam/bin/ftcli -domain vmware -connect localhost -port 8042 -timeout 60 -cmd listnodes'</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Erik Bussink</title>
		<link>http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-40015</link>
		<dc:creator>Erik Bussink</dc:creator>
		<pubDate>Fri, 11 Jul 2008 10:18:53 +0000</pubDate>
		<guid isPermaLink="false">http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-40015</guid>
		<description>Created a small script on my ESX server to list the HA settings. Might be useful to some.
The file is named ha_list.sh

#!/bin/sh
# Created by Erik Bussink (July 2008)
#
# requires the FT_DIR variable
# add the following line to your /root/.bashrc
#  export FT_DIR=/opt/vmware/aam
#

# List current HA members, their Roles and their Status
/opt/vmware/aam/bin/ftcli -domain vmware -connect localhost -port 8042 -timeout 60 -cmd listnodes

echo

# List current HA members settings
/opt/vmware/aam/bin/ftcli -domain vmware -cmd "la -l"</description>
		<content:encoded><![CDATA[<p>Created a small script on my ESX server to list the HA settings. Might be useful to some.<br />
The file is named ha_list.sh</p>
<p>#!/bin/sh<br />
# Created by Erik Bussink (July 2008)<br />
#<br />
# requires the FT_DIR variable<br />
# add the following line to your /root/.bashrc<br />
#  export FT_DIR=/opt/vmware/aam<br />
#</p>
<p># List current HA members, their Roles and their Status<br />
/opt/vmware/aam/bin/ftcli -domain vmware -connect localhost -port 8042 -timeout 60 -cmd listnodes</p>
<p>echo</p>
<p># List current HA members settings<br />
/opt/vmware/aam/bin/ftcli -domain vmware -cmd "la -l"</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: slowe</title>
		<link>http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-39603</link>
		<dc:creator>slowe</dc:creator>
		<pubDate>Thu, 26 Jun 2008 17:49:40 +0000</pubDate>
		<guid isPermaLink="false">http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-39603</guid>
		<description>Russ,

You don't need to set das.isolationaddress (which defaults to the SC's default gateway) unless that address does not respond to ping, or unless you'd like to use some other device on the network.

For das.isolationaddress2, set it to be any device that responds to ping, is available via that network adapter, and could be used as a way for the host to determine if it is isolated.
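
Before entering a candidate address, it may be worth confirming that it really answers ping. The following is a minimal sketch, assuming the stock ping command is available in the Service Console; the check_candidate function and the PING variable are hypothetical names introduced for illustration, and the addresses to test are passed on the command line.

```shell
#!/bin/sh
# Sketch only: confirm that candidate das.isolationaddress values answer ping.
# PING is a hypothetical override so a different ping binary can be substituted.
PING="${PING:-ping}"

# Send a single echo request and report whether the address replied.
check_candidate() {
    target="$1"
    if "$PING" -c 1 "$target" >/dev/null 2>/dev/null; then
        echo "responds: $target"
    else
        echo "no reply: $target"
    fi
}

# Check every address given on the command line.
if [ "$#" -gt 0 ]; then
    for candidate in "$@"; do
        check_candidate "$candidate"
    done
fi
```

A reply only shows that the address answers ping from the Service Console; it does not show which network adapter carried the request, so that still needs to be checked separately.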

Hope this helps!</description>
		<content:encoded><![CDATA[<p>Russ,</p>
<p>You don&#8217;t need to set das.isolationaddress (which defaults to the SC&#8217;s default gateway) unless that address does not respond to ping, or unless you&#8217;d like to use some other device on the network.</p>
<p>For das.isolationaddress2, set it to be any device that responds to ping, is available via that network adapter, and could be used as a way for the host to determine if it is isolated.</p>
<p>Hope this helps!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Russ</title>
		<link>http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-39600</link>
		<dc:creator>Russ</dc:creator>
		<pubDate>Thu, 26 Jun 2008 16:35:20 +0000</pubDate>
		<guid isPermaLink="false">http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-39600</guid>
		<description>Not quite getting this part..

Once I create a second SC connection, on a different vswitch on a different subnet, I have to: 1) add the das.isolationaddress in Advanced options, and make that the SC gateway; 2) then add das.isolationaddress2 and - here's what I don't get - add what address?  The gateway address for that subnet, or the actual address of the second console?  Or something else entirely?</description>
		<content:encoded><![CDATA[<p>Not quite getting this part..</p>
<p>Once I create a second SC connection, on a different vswitch on a different subnet, I have to: 1) add the das.isolationaddress in Advanced options, and make that the SC gateway; 2) then add das.isolationaddress2 and - here&#8217;s what I don&#8217;t get - add what address?  The gateway address for that subnet, or the actual address of the second console?  Or something else entirely?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: slowe</title>
		<link>http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-39554</link>
		<dc:creator>slowe</dc:creator>
		<pubDate>Tue, 24 Jun 2008 13:17:22 +0000</pubDate>
		<guid isPermaLink="false">http://blog.scottlowe.org/2007/10/05/troubleshooting-vmware-ha-isolation-response/#comment-39554</guid>
		<description>Bernie,

Do you have two VMkernel ports? Looking back at your comments, I see mention of a VMkernel port on vSwitch0 as well as on vSwitch1. Unless you are using IP-based storage on the 10.10.10.x network, you don't need a VMkernel port on vSwitch0, just a Service Console port group. That would leave you with a single VMkernel port on vSwitch1. Or am I misunderstanding your configuration?

Feel free to move this to e-mail if you want. You can get my e-mail address from the About page on this site.</description>
		<content:encoded><![CDATA[<p>Bernie,</p>
<p>Do you have two VMkernel ports? Looking back at your comments, I see mention of a VMkernel port on vSwitch0 as well as on vSwitch1. Unless you are using IP-based storage on the 10.10.10.x network, you don&#8217;t need a VMkernel port on vSwitch0, just a Service Console port group. That would leave you with a single VMkernel port on vSwitch1. Or am I misunderstanding your configuration?</p>
<p>Feel free to move this to e-mail if you want. You can get my e-mail address from the About page on this site.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
