iSCSI Boot with Microsoft MPIO

There’s a small gotcha when using Microsoft’s iSCSI initiator and MPIO driver to do iSCSI multipathing:  the Microsoft initiator and MPIO driver will overwrite the IQN of the iSCSI HBA.  Obviously, this could cause problems where access control to iSCSI LUNs is based on initiator IQN.

As pointed out in this QLogic support document (check the “Additional Notes” section at the bottom of the page), installation of the Microsoft iSCSI initiator will overwrite the IQN of the HBA with a Microsoft-generated IQN, like “iqn.1991-05.com.microsoft:servername.domain.com” or similar.
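To make the naming pattern concrete, here is a minimal Python sketch of what the Microsoft-generated IQN typically looks like for a given server name. It assumes the default scheme of the “iqn.1991-05.com.microsoft” prefix followed by the lowercased fully qualified host name; the function names are my own for illustration and are not part of any Microsoft tool, and the name can be changed by an administrator, so treat the result as a guess to verify in the initiator control panel.

```python
# Sketch only: assumes the Microsoft initiator's default naming scheme.
MICROSOFT_IQN_PREFIX = "iqn.1991-05.com.microsoft"


def microsoft_default_iqn(fqdn: str) -> str:
    """Return the IQN we would expect the Microsoft initiator to generate.

    Hypothetical helper: prefix + ':' + lowercased host name.
    """
    return f"{MICROSOFT_IQN_PREFIX}:{fqdn.strip().lower()}"


def is_microsoft_iqn(iqn: str) -> bool:
    """True if the IQN looks Microsoft-generated rather than HBA-assigned."""
    return iqn.strip().lower().startswith(MICROSOFT_IQN_PREFIX + ":")


if __name__ == "__main__":
    print(microsoft_default_iqn("ServerName.domain.com"))
```

A check like is_microsoft_iqn() is handy when auditing a storage array's initiator list to spot servers whose HBA IQN has been replaced.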

In environments where access to LUNs is controlled in part or in whole by initiator IQN, this is a problem.  One such environment is a NetApp iSCSI SAN, where initiator groups (or “igroups”) control access to LUNs based on the IQNs of the initiators.  To work around this, add both the original IQNs of the HBAs (as they were before the Microsoft iSCSI initiator was installed) and the Microsoft IQN to the igroups for the LUNs that should be visible to that server.  Otherwise, you could lose access to those LUNs after installing the Microsoft initiator.
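As a rough illustration of that workaround, the Python sketch below compares the initiators currently in an igroup against the ones the server will need (the original HBA IQNs plus the Microsoft IQN) and reports which are still missing. The function name and every IQN shown are made-up placeholders, and the sketch does not talk to the storage system; you would gather the igroup membership yourself and add the missing entries with your usual tools.

```python
# Sketch only: pure list comparison, no calls to the storage system.
def missing_initiators(igroup_members, required_iqns):
    """Return required IQNs (lowercased) that are not yet in the igroup.

    The comparison is case-insensitive; input order is preserved and
    duplicates in required_iqns are reported once.
    """
    present = {member.strip().lower() for member in igroup_members}
    missing = []
    for iqn in required_iqns:
        key = iqn.strip().lower()
        if key not in present and key not in missing:
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Placeholder values for illustration only.
    igroup = ["iqn.2000-04.com.qlogic:example-hba-0"]
    required = [
        "iqn.2000-04.com.qlogic:example-hba-0",  # original IQN of HBA 0
        "iqn.2000-04.com.qlogic:example-hba-1",  # original IQN of HBA 1
        "iqn.1991-05.com.microsoft:servername.domain.com",  # Microsoft IQN
    ]
    for iqn in missing_initiators(igroup, required):
        print("add to igroup:", iqn)
```

Running this kind of check before installing the Microsoft initiator lets you add the missing IQNs ahead of time, so the LUNs stay visible whichever name the server ends up presenting.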

(By the way, in case you’re wondering why one would install the Microsoft iSCSI initiator when you’ve already got HBAs, there’s a good reason—to get multipath support.)

  1. Nick

    I have a question for you: Did you get failover to work on a SAN-booted OS drive or just the data drive?

  2. slowe

    Nick,

    The SAN-booted OS drive as well as any data drives. There’s no fault tolerance during the boot process, but once Windows has booted, the OS drive fails over successfully, as do any data drives for that server.

  3. Nick

    I haven’t been able to get failover to work for the OS drive. And MS support says it doesn’t work at all for the OS drive. Would you mind shooting me an email?

  4. Charles

    I too am curious about how you’ve gotten this to work.

    I have two QLogic 4050C cards in an HP server. I’ve installed Windows Server 2003 (64-bit) and the MS iSCSI tools. I’ve tried about every combination I can think of but have yet to find one that works.

    I’m booting from the SAN. I have one QLogic card set to “manual” boot mode with all the right information specified. I can boot from the SAN just fine; there’s just no failover.

    To test failover, I’m disabling a port on my Cisco switch. I wait about 20 seconds and then turn it back on because the server becomes unresponsive.

    My iSCSI SAN vendor is LeftHand.

    Any suggestions on making it work would be great.

    Thanks,
    Charlie

  5. slowe

    Charlie,

    You didn’t mention if you were using blades, but my friend Aaron over at BladeVault.info posted this article a short while ago:

    http://bladevault.info/?p=4

    Perhaps this is affecting you as well?

  6. Charles

    No blades here but I *think* I got it figured. I set the KeepAliveTO to 15 seconds.

    I also went to the “Targets” tab and clicked “Log On”. I made sure to check the “Enable multi-path” and the automatic restore options. I then clicked the “Advanced” button and changed the “Local Adapter” to the first QLogic adapter. I left all other settings at their default. I repeated the process for my second HBA.

    After that, clicking on the “Details” button, I see 4 sessions under the “Sessions” tab. Two for the HBA that appeared just because and two that I just added.

    Under the “Devices” tab, I have 4 devices that are all multi-path. I’m using “Failover” for the “Load Balance Policy”.

    I suspect I could just do all this for the second HBA since the first one is the one I booted from. More testing will tell. I know this works though. I can shut down the active port on my Cisco switch and 15 seconds later, I can use the server again. I then reversed the process and shut down the standby port to make sure there were no adverse effects from that scenario as well.

    I may set the KeepAliveTO to a lower value, but I’m not sure how low I can go before causing instability.

    Charlie

  7. Aaron Delp

    Hey Charlie – I suspect that you can go pretty low on the timeout. When we got it going on the IBM Blades (to NetApp storage) we set the targets the same but specified different initiators for each port on the card. We then put both of these initiators in the allowed port group on the storage.

    To install the OS, we only presented one of the two paths. Once everything was up, we then loaded MPIO. Once the OS was up, we presented the second path.

    Since both would try to “boot” without the MS MPIO drivers, you would get contention and the boot process would go in the weeds. With MPIO it will figure it out and one will go active (it was always the first port for us).

    In our testing we were able to pull a cable on the boot drive or the data drives and we would fail over in about 5 seconds.

    Let me know if you have any more questions!

    Hey Scott – your site form doesn’t like my bladevault.info e-mail address :)

  8. slowe

    Aaron, I’ll have a look at the code for the site form; it may be assuming a “traditional” TLD. Thanks for the heads-up!