Error Connecting to VM Console

During an upgrade of a server running ESX Server 3.0.0 to ESX Server 3.0.2, we also moved the server to a new server room on a new subnet.  The upgrade itself was uneventful and took only a few minutes (as I had expected), but what happened afterward caught me a little off-guard, as did the eventual solution.

We needed to change the IP address of the service console, so after the upgrade was complete I simply edited the /etc/sysconfig/network-scripts/ifcfg-vswif0 file to include the new IP address, restarted networking, and went on about my way.  Everything seemed fine; the ESX host responded across the network, responded properly within VirtualCenter, powered on the VMs, etc.  In hindsight, I probably should have used the esxcfg-vswif command instead of editing the configuration file directly, but as they say, “Hindsight has 20/20 vision.”

It wasn’t until a few minutes later that we realized we were unable to connect to any VM’s console.  When we tried to open a VM’s console, we received an error message to the effect that the “host had responded incorrectly”.  Strangely enough, this problem only seemed to affect VI client installations; we were able to connect without any problems from the VirtualCenter server itself.

Thinking that perhaps we had run into an ACL on one of the network switches, I tried opening a telnet connection to TCP port 902 on the VirtualCenter server.  That worked just fine, so that eliminated the possibility of a router/switch ACL blocking the traffic, and also eliminated the possibility that a host-based firewall on the VirtualCenter server was causing the problem.  (A second review a couple minutes later verified again that Windows Firewall was not running and therefore could not be the problem.)  It wasn’t DNS name resolution; both the VC server and VI clients were able to resolve the hostnames of all the ESX servers as well as the individual guest VMs.
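For anyone who wants to repeat that port check from a VI client machine without telnet, a few lines of Python will do the same job. This is only a sketch: the function name `can_connect` and the hostname in the usage comment are placeholders of my own, and port 902 is simply the port mentioned above.

```python
import socket


def can_connect(host, port=902, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (hostname is a placeholder):
#   can_connect("vc.example.com", 902)
```

A successful connection only shows that something is listening and reachable on that port; as the rest of this story shows, it says nothing about whether the console session itself will work.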

“Aha!” I thought.  “I need to restart mgmt-vmware because I didn’t restart that service after changing the IP address.”  Alas, that didn’t work either.

Finally, a Google search turned up this thread and this thread from the VMTN Community Forums, both of which referenced the /root/anaconda-ks.cfg file.  An Anaconda kickstart file causing the problem?  It didn’t make any sense to me, but just for kicks I made the following changes:

  • Edited the /root/anaconda-ks.cfg file to show the correct IP addressing information for the Service Console
  • Edited the /etc/sysconfig/network file to have the right gateway IP address (strangely enough, the Service Console seemed to be routing traffic correctly even with an incorrect gateway IP address)
  • Restarted networking and the mgmt-vmware services
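
Since the old address turned out to be lurking in more than one file, a small script that lists every line still mentioning it can save some hunting. The following is a rough sketch, not a VMware tool: the function name `find_stale_address` is made up, the file list is whatever you choose to pass in, and it assumes a Python 3 interpreter, which may not be available on the service console itself (running it against copies of the files works just as well).

```python
import re
from pathlib import Path
from typing import List, Tuple


def find_stale_address(old_ip: str, paths: List[str]) -> List[Tuple[str, int, str]]:
    """Return (path, line_number, line) for each line that still contains old_ip."""
    # Refuse matches that are part of a longer number, so that
    # 192.168.1.5 does not match 192.168.1.50 or 1192.168.1.5.
    pattern = re.compile(r"(?<!\d)" + re.escape(old_ip) + r"(?!\d)")
    hits = []
    for name in paths:
        path = Path(name)
        if not path.is_file():
            continue  # silently skip files that do not exist
        text = path.read_text(errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


# Example usage (the address is a placeholder; the paths are the files named above):
#   find_stale_address("192.168.1.5", [
#       "/etc/sysconfig/network-scripts/ifcfg-vswif0",
#       "/etc/sysconfig/network",
#       "/root/anaconda-ks.cfg",
#   ])
```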

Lo and behold, the VM consoles now worked perfectly.  I’m still not sure which of the changes actually corrected the problem; I hope to be able to try to recreate this problem in the lab and more closely determine what the exact cause and resolution were.  When I have some additional information, I’ll post it here.

Anyone else run into this problem?

Hi Scott,

I agree, anaconda-ks.cfg shouldn’t have anything to do with the issue. This file stores all of the parameters used when installing the host (anaconda is the installation app).

I would be very confident in saying the issue was in your /etc/sysconfig/network file. And yes, your hindsight observation of using ‘esxcfg-vswif’ is dead on (I learned the hard way as well).

Hi Scott,

You just saved me from hours of looking around. Mine was definitely just the anaconda-ks.cfg file. Made the correction and the console now works fine. Now to figure out why HA won’t play nice.

Cheers
Mark

Mark,

Have a look at your DNS settings. I have found that DNS problems will definitely cause problems with HA. In some cases, in fact, I’ve resorted to hard-coding entries in the /etc/hosts file on the ESX servers, so that a DNS outage doesn’t cause problems with HA/DRS.

Hi Scott,

Thanks, got it going. DNS was all fine, except that I had the hosts listed in capitals and DNS was all in lower case. Changing the hosts to all lower case seemed to do the trick. Talk about picky.

Cheers
Mark
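
The upper/lower case mismatch Mark describes is easy to miss by eye. As a rough illustration only, the sketch below compares a list of host names as configured against the names as they appear in DNS and reports pairs that differ only by case. The function name `case_only_mismatches` and the sample names are hypothetical, and both lists have to be gathered by hand.

```python
def case_only_mismatches(configured, dns_names):
    """Return (configured, dns) pairs that are equal ignoring case but not exactly equal."""
    # Index the DNS names by their lower-case form for case-insensitive lookup.
    by_lower = {name.lower(): name for name in dns_names}
    mismatches = []
    for name in configured:
        match = by_lower.get(name.lower())
        if match is not None and match != name:
            mismatches.append((name, match))
    return mismatches


# Example usage with placeholder names:
#   case_only_mismatches(["ESX01.example.com"], ["esx01.example.com"])
```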

I can confirm that anaconda-ks.cfg also needs to be changed. I moved an existing server to a different IP address (same subnet, so no change to GW) using esxcfg-vswif. After a network restart everything was working (ssh, scp, VI client) except the VM console. I changed the server address in anaconda-ks.cfg and rebooted. The VM console then worked normally.

Mike.