iSCSI and ESX Server 3

A few weeks ago I wrote about trying to use NetApp’s ONTAP Simulator as a VM under ESX Server so that I could do some testing with the new NAS and iSCSI functionality in ESX Server 3.0.  I finally got that working, but later had to shut it down when I started working with ESX Server 3.  As it turns out, the ProLiant 6400R servers I was using for my VMware lab (running ESX 2.5.x) are not supported by the final release of ESX 3: the cpqarray.o driver, which supported the Smart Array 3200 and Smart Array 4250 RAID controllers in those two boxes, was dropped from the final release.  No ESX 3 on these boxes.

Thankfully, I was able to locate an unused ProLiant ML350 G4p on which I could run ESX Server 3.  Instead of trying to use the ONTAP Simulator as a VM, then, I rebuilt one of the 6400R servers with Red Hat Linux 9.0 and installed the NetApp ONTAP Simulator directly on it.  After a fair amount of work, I finally had everything set up, and the simulated Filer was running with about 20GB of available storage to serve up via iSCSI, CIFS, NFS, or HTTP.

iSCSI was what I was really interested in, so the available storage from the ONTAP Simulator got carved into a couple of LUNs to be presented via iSCSI, along with the requisite initiator groups.  Once everything looked to be in place on the Simulator, I moved on to configuring ESX Server.  Had I been lucky, I might have been trying to do this after Mike Laverick released his Service Console Guide, but I was a few days too early, so I used the VI Client GUI instead to configure the VMkernel NIC and the software iSCSI client.  The configuration looked correct, but there was no connectivity.
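For anyone wanting to replicate the Simulator side of this, the setup amounted to roughly the commands below, run from the Simulator’s console.  The volume path, LUN size, igroup name, and initiator IQN are placeholders, and the ostype value may need to be linux rather than vmware on older ONTAP releases, so treat this as a sketch rather than a recipe:

iscsi start                                                              # make sure the iSCSI service is running
lun create -s 10g -t vmware /vol/vol1/esxlun0                            # carve a LUN out of an existing volume
igroup create -i -t vmware esx-igroup iqn.1998-01.com.vmware:esx3-host   # iSCSI initiator group for the ESX host
lun map /vol/vol1/esxlun0 esx-igroup 0                                   # present the LUN to the igroup as LUN 0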

Unsure whether the problem was ESX Server or the ONTAP Simulator, I mapped another LUN and created another initiator group, then tested iSCSI connectivity from a Windows Server 2003 VM using Microsoft’s iSCSI initiator.  That worked perfectly, without a single problem.  Clearly, the problem was not with the ONTAP Simulator but with ESX Server.

Reviewing the configuration, I did find a couple of problems:

  • The iSCSI node name I had specified in the igroup configuration on the Simulator was incorrect, so I fixed that.  (Or thought I had; more on that in a moment.)
  • The iSCSI security configuration was incorrect, so I fixed that as well.

ESX Server should have been able to see the storage at that point, but there was still no connectivity.  Finally, I came across a blurb somewhere about the new firewall in ESX Server 3.0 and how it controlled not only inbound traffic but also outbound traffic.  A quick trip to the service console to run this command:

esxcfg-firewall -e swISCSIClient

You would expect that using the VI Client to enable the software iSCSI client would also enable outbound iSCSI traffic support on the ESX firewall, but it didn’t.  As soon as I entered that command, the Simulator’s console (where I was logged in) started showing iSCSI connection requests.  This, in turn, revealed another problem—ESX Server insisted on using its own iSCSI node name instead of the node name I had assigned.  That was easily and quickly corrected, and I was finally able to mount the iSCSI LUN and create a VMFS datastore.
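If you want to verify both of those items from the service console, something like the following should work.  I’m going from memory on the exact flags, and the software iSCSI adapter showed up as vmhba40 on my host (yours may differ), so double-check against the man pages:

esxcfg-firewall -q swISCSIClient     # confirm outbound iSCSI traffic is allowed through the firewall
vmkiscsi-tool -I -l vmhba40          # list the iSCSI node name the software initiator is actually using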

Key points to remember:

  • Apparently, the iSCSI node name you specify in configuring ESX Server will be ignored, so don’t bother.  Just use whatever ESX is already configured with.
  • Be sure to either configure iSCSI from the command line (where outbound iSCSI traffic is allowed through the firewall automatically) or go back and allow outbound iSCSI traffic through the ESX firewall afterward; a rough sketch of the command-line route follows below.
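
For the command-line route, the sequence looks roughly like this.  The target IP address and the vmhba40 adapter name are examples only, and this assumes the VMkernel port and IP address already exist; as always, verify the exact syntax before relying on it:

esxcfg-swiscsi -e                          # enable the software iSCSI initiator (also opens the firewall for iSCSI)
esxcfg-firewall -e swISCSIClient           # explicitly allow outbound iSCSI traffic, just to be safe
vmkiscsi-tool -D -a 192.168.1.10 vmhba40   # add the Filer as a send-targets discovery address
esxcfg-swiscsi -s                          # rescan the software iSCSI adapter for new LUNs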

Having iSCSI-based storage eliminates one potential block to testing and demonstrating VMotion to customers—shared storage—but now I need to work on getting Gigabit Ethernet into the test lab as well.  Too bad there’s not a software workaround for that, too…

UPDATE:  I corrected the command-line above to reflect that the first “I” should be uppercase, not lowercase as was previously noted.  Thanks to Chauncey for catching the error!


13 comments

  1. Dennis

    Great to know it can work… Somehow it doesn’t at our site :)

  2. slowe

    Sorry to hear you are having problems, Dennis…what kinds of issues are you experiencing?

  3. Dennis

    I’m not sure where the problem is, but the effect is that I can’t see the Filer or the LUN on the ESX server.

    I have read your article &
    http://www.netapp.com/library/tr/3393.pdf
    http://www.netapp.com/library/tr/3428.pdf
    http://www.netapp.com/library/tr/3401.pdf

  4. slowe

    Are you seeing iSCSI operations on the Filer? At the console, you’ll see regular status updates showing how many HTTP, CIFS, NFS, and iSCSI operations have occurred. Do you see any iSCSI operations? If not, you may still be blocking outbound iSCSI traffic from the ESX Server. Run “esxcfg-firewall -e swISCSIClient” (if you haven’t already) to make sure that iSCSI traffic is being allowed out from the ESX server.

    In the meantime, I’ll review those NetApp documents and see what additional information I may be able to provide.

  5. Tim Teller

    Scott,

    We have a NetApp Filer that is configured with iSCSI. We haven’t used it yet. We are purchasing VMware VI 3 Enterprise. I am wondering if VMotion works well with iSCSI. We also have an OnStore NAS gateway that supports NFS; could you do VMotion with NFS?

    Thanks in advance,
    Tim Teller

  6. slowe

    Tim,

    iSCSI works reasonably well as a storage mechanism, but that’s only part of the picture when it comes to VMotion. Since the virtual disk file (the VMDK file) doesn’t move during a VMotion operation, the storage is kind of secondary to the whole equation. You will need Gigabit Ethernet connectivity on the same subnet for VMotion to work between ESX servers, and you should be using Gigabit Ethernet for iSCSI anyway. Although I have not tested NFS yet (plan to soon), VMware’s product page indicates full support for the storage of VMDK files and VMotion with NFS. Refer to http://www.vmware.com/products/vi/esx/#interoperability and see the section titled “Storage” for the official party line.
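
    For what it’s worth, once NFS is enabled on the gateway, mounting an NFS export as a datastore from the ESX service console is a one-liner along these lines (the hostname, export path, and datastore label here are made up):

    esxcfg-nas -a -o filer1 -s /vol/vmware nfs_datastore1

    You can also do the same thing through the VI Client’s Add Storage wizard.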

  7. slowe

    Dennis,

    I reviewed the NetApp documents you mentioned in your comment, but none of them pertain to iSCSI configuration or troubleshooting; they are all high-level overviews of using NetApp Filers and Snapshot technology with VMware. I would suggest you step back from that and fall back to troubleshooting the basic iSCSI connectivity: make sure you have correctly mapped the LUNs on the Filer, make sure the initiator groups are correctly configured, and (on the VMware side) make sure you have allowed the software iSCSI traffic through the firewall.
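
    A few quick checks from the Filer console can confirm the NetApp side is in order; command names can vary slightly between ONTAP releases, so adjust as needed:

    iscsi status     # confirm the iSCSI service is running
    lun show -m      # confirm the LUNs are mapped to the right initiator group
    igroup show      # confirm the igroup contains the ESX server’s iSCSI node name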

    Let me know how I can help further.

  8. Jae Ellers

    We’re doing this now; we have 740 GB coming online in 3 LUNs for migrating from ESX 2 to 3. VMotion works great so far with limited usage on IBM x366 systems. We’ll have to see how the iSCSI load impacts the Filer. We’ve already got a new 4-port card to add if the production network becomes overloaded. Can anyone comment on where we should see the bottleneck? Filer CPU or NIC? It’s a FAS3050.

  9. slowe

    Jae,

    Good question. I’d hazard a guess that it’s the NIC rather than the Filer CPU, but I’m not a NetApp guru, so don’t quote me on that. I’ll see if I can get a NetApp guru from my office to provide his perspective.

  10. David Blaisdell

    I just experienced the issue relating to iSCSI and the ESX firewall. The twist I encountered was that after the initial build of 4 ESX hosts everything worked fine: I added 8 LUNs and had a dozen VMs running with no issue. However, when trying to add a 9th LUN, I could not. Somehow the firewall had been enabled for inbound/outbound iSCSI connections but then got turned off?

    Thanks for your coverage of this – it will save many some unnecessary frustration.

    David

  11. slowe

    David,

    Sounds like your issue is not the ESX firewall but instead a default setting in ESX that only lets it see 8 LUNs; anything after 8 is invisible. Look for the Disk.MaxLUN setting and set it higher than the highest LUN ID to be presented to the servers. I’m not in front of ESX right at this moment, so I can’t direct you to the exact place where it is set.
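
    If memory serves, you can also view and change it from the service console with something like the following (verify the exact option path before relying on it):

    esxcfg-advcfg -g /Disk/MaxLUN     # show the current maximum LUN ID ESX will scan
    esxcfg-advcfg -s 64 /Disk/MaxLUN  # raise it above your highest LUN ID, then rescan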

  12. slowe

    Don,

    Glad to hear you’ve found the site helpful.

    Thanks,
    Scott
