What I had hoped to publish today was an article describing how to configure and use ESX’s software iSCSI initiator as a failover path for Fibre Channel, so that if the Fibre Channel fabric failed completely, VM traffic would automatically fail over to software iSCSI. I thought that this would be a great, low-cost way to add another layer of redundancy to your VMware ESX environment.
Unfortunately, I can’t make it work. Here’s the setup I’ve been using for testing:
- A 200GB LUN visible to ESX over both Fibre Channel (FC) and software iSCSI
- A VM, stored on this LUN, running Windows Server 2003 R2
Initial tests led me to believe that it would indeed work. I verified that the FC path and the iSCSI path were listed as separate paths for the same LUN. Without placing any load on the VM, I pulled the FC connection from the back of the server. The VM stayed up, and I was able to browse the local hard drive inside the VM. Network connectivity remained active. And the “Manage Paths” dialog box even showed the FC connection as “Dead” and the iSCSI connection as On/Active. Given that information, it seemed like all was good.
Determined to verify that it was working as I expected, I trotted out a copy of IOmeter and tried to repeat the tests. This time around, though, the tests did not go quite so well. IOmeter showed that disk throughput stopped, and the VI Client locked up. I repeated this set of tests a couple of times, and each time—while IOmeter was running—I ran into issues.
Based on these results, I’m inclined to say that one of two things is true. Either:
- I did something very, very wrong; or
- ESX isn’t quite ready to support automatic failover between FC and software iSCSI.
Has anyone else tried this, or am I the only one? If you have tried it, did it work? If so, what steps did you have to take—if any—to make it work properly?
-
Theoretically, this config should work, although it’s not supported by VMware and, by extension, by the majority of storage vendors, if not all of them.
Although it sounds like a good solution, customers tend to be leery of using different stacks. In fact, we’ve supported this type of config for standalone Windows environments and some UNIX environments (e.g. HP-UX) for some time now.
Try setting the Disk TimeOutValue (default: 10 seconds) in the Windows registry higher than the FC HBA driver timeout. It’s quite possible the Disk Class driver is timing out the requests before the path switch completes, especially if you’re queuing a whole bunch of requests, which you are.
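The timing argument in this comment can be sketched as a toy model. It assumes that every I/O already queued when the path dies has to wait out the full path switch, and it compares that wait with the guest's disk timeout. The function name, the request ages, and the 30-second failover delay are illustrative assumptions, not measured values from this setup.

```python
def requests_timed_out(queued_ages_s, failover_delay_s, disk_timeout_s):
    """Count queued requests whose total wait exceeds the guest disk timeout.

    queued_ages_s: how long each outstanding request had already been
    waiting (in seconds) at the moment the path failed.
    """
    timed_out = 0
    for age in queued_ages_s:
        # Simplifying assumption: nothing completes until the path switch ends.
        total_wait = age + failover_delay_s
        if total_wait > disk_timeout_s:
            timed_out += 1
    return timed_out


# Hypothetical numbers: four requests in flight, a 30 s path switch.
ages = [0.0, 0.5, 1.0, 2.0]
print(requests_timed_out(ages, failover_delay_s=30, disk_timeout_s=10))  # 4
print(requests_timed_out(ages, failover_delay_s=30, disk_timeout_s=60))  # 0
```

Under this model, a timeout shorter than the path switch fails every queued request, while a timeout longer than the switch lets them all stall and then resume.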
-
Scott, what do the logs tell you from that timeframe?
-
I have seen this behaviour during a failover between two fibre paths. Even if the error looks to be in ESX, you can try changing this registry entry (this being W2003):
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\TimeOutValue
BTW, what are the event logs in the W2003 VM saying? If you can try with a Linux VM, you will probably see the root (/) partition going read-only (for example, Red Hat Linux 4 Update 4 needs this rpm http://kb.vmware.com/KB/51306 to prevent it from going read-only in the event of a failover).
Hope this helps, and congratulations on the blog!
Jon
-
Scott-
Just a theoretical here: what about putting in an FC-to-iSCSI bridge, so that you’re not failing over to a new stack? I think that would simplify this situation by making the iSCSI path appear to exist in the FC world. Of course, you’d probably need to build an additional, separate fabric, which means an additional FC card (ugh), but it would technically be supported, right? Really expensive, but then again, we’re talking FC SANs!
-Glenn
-
Yes, it works, even under IOMeter stress load.
But one must add the Disk TimeOutValue to the Windows registry so that the VMs survive the failover delay. I’ve set it to 60 (seconds) and it works. While the paths switch, I/Os stall, but then resume. WOW! (not the Vista kind)
I’m wondering about setting some kind of path cost for iSCSI to force VMware to fail back to FC if available – 4 Gbit >> 1 Gbit. Any ideas?
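A minimal sketch of generating that registry change as a .reg file, using the key path quoted earlier in the comments and the 60-second value from this comment. The function name is hypothetical, and the resulting file would still have to be imported inside the Windows guest by hand.

```python
def render_disk_timeout_reg(timeout_seconds):
    """Build .reg file text that sets the guest's Disk TimeOutValue."""
    if not isinstance(timeout_seconds, int) or not 0 < timeout_seconds <= 0xFFFFFFFF:
        raise ValueError("timeout must be a positive integer that fits in a REG_DWORD")
    key = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk"
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        # .reg files store DWORD values as eight hexadecimal digits.
        f'"TimeOutValue"=dword:{timeout_seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)


# 60 seconds is the value reported to work in this comment.
print(render_disk_timeout_reg(60))
```

Generating the file rather than editing the registry directly keeps the change reviewable before it is applied to each guest.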
-
Hi guys, I know it’s an old post, but as I was testing FC on Nexenta I gave it a shot and it actually works very well. Check out the video on my blog post http://www.hypervisor.fr/?p=3164


