Binding the ESX Software iSCSI Initiator to Two (or more) VMkernel Ports

If you are running vSphere in an iSCSI environment, I am sure that you’ve read the iSCSI SAN Configuration Guide (4.1 here and 4.0 here). One of the major functionality improvements made from VI3 to vSphere 4 was the multipathing capability of the software iSCSI initiator on the ESX host.

To get the most throughput out of your ESX software iSCSI initiator, you need to bind it to two separate physical NICs (pNICs). To do this, create two VMkernel port groups, make only one NIC active for each port group, then get on the ESX CLI and bind the software initiator to both VMkernel port groups.

The details are as follows, and assume you already have an iSCSI initiator configured on the host. You will need:

  1. Two pNICs to be used for the iSCSI traffic
  2. An additional IP address in the iSCSI network

To bind the iSCSI software initiator to both pNICs on a classic vSwitch, follow these steps:

  1. Log in to vCenter or the ESX Server with the vSphere Client and go to the networking tab of the host you need to configure
  2. If you don’t already have both pNICs added to your iSCSI vSwitch, add the missing pNIC
  3. Create a new VMkernel portgroup on your vSwitch (VMkernel 2)
  4. On the next screen give the VMkernel an appropriate IP address and finish the wizard
  5. After the VMkernel portgroup has been created, go into the properties of your original VMkernel port group, and go to the NIC Teaming tab
  6. Check the box to override the vSwitch NIC failover order, and move one of the pNICs into the “Unused Adapters” grouping
  7. Now open the properties for the VMkernel 2 port group and change the NIC teaming policy so that the pNIC set to “Unused Adapters” in step 6 is set to “Active Adapters” and the pNIC left in “Active Adapters” in step 6 is set to “Unused Adapters” here on VMkernel 2.
  8. Open a PuTTY session to your host and type the following commands, where vmhba32 is your iSCSI initiator (you can find this in the Storage Adapters section on the Configuration tab of your host):
    1. esxcli swiscsi nic add -n vmk0 -d vmhba32
    2. esxcli swiscsi nic add -n vmk1 -d vmhba32
  9. Verify that both portgroups are configured for the software initiator
    1. esxcli swiscsi nic list -d vmhba32
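
If you have more than two VMkernel ports to bind, steps 8 and 9 can be wrapped in a small loop. This is only a sketch: the function name iscsi_bind_cmds is my own (hypothetical), and it just prints the esxcli commands from the steps above so you can review them before running anything on the host.

```shell
#!/bin/sh
# Hypothetical helper: print the esxcli commands that bind each VMkernel
# port to the software iSCSI adapter, followed by the verification command.
# Usage: iscsi_bind_cmds <adapter> <vmk> [<vmk> ...]
iscsi_bind_cmds() {
    adapter=$1
    shift
    for vmk in "$@"; do
        echo "esxcli swiscsi nic add -n $vmk -d $adapter"
    done
    echo "esxcli swiscsi nic list -d $adapter"
}

# Example: the adapter and port names used in the steps above.
iscsi_bind_cmds vmhba32 vmk0 vmk1
```

Once the printed commands look right, you could pipe the output to sh on the host to execute them.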

If you’re using a vNetwork Distributed Switch, the process is nearly the same, only the steps for changing the pNIC failover policy are in a slightly different place.

  1. Go to the networking section in your vCenter
  2. Go to your iSCSI vDS, and create a new port group for your second VMkernel
  3. You can set the teaming and failover now: right click on each VMkernel port group, go to “Edit Settings”, select the “Teaming and Failover” section, and set one dvUplink as “Unused”, choosing a different dvUplink for each port group
  4. Now you can go to each host and configure a virtual adapter in the VMkernel 2 portgroup
  5. At this point you should have two VMkernel ports, each using only one pNIC, so you can go into the CLI and execute the following commands to bind the software iSCSI initiator to the two VMkernel ports.
    1. esxcli swiscsi nic add -n vmk0 -d vmhba32
    2. esxcli swiscsi nic add -n vmk1 -d vmhba32
  6. You can verify that the settings are in place with
    1. esxcli swiscsi nic list -d vmhba32
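
If you would rather not eyeball the list output on every host, the check in step 6 can be scripted. The sketch below is an assumption on my part, not something from the configuration guide: check_bound is a hypothetical name, and it only assumes that each bound port's name (vmk0, vmk1, ...) shows up as a whole word somewhere in the output of the list command.

```shell
#!/bin/sh
# Hypothetical helper: read `esxcli swiscsi nic list -d <adapter>` output on
# stdin and report any expected VMkernel port that is not mentioned in it.
# Assumption: each bound port's name (e.g. vmk0) appears as a whole word.
# Usage: esxcli swiscsi nic list -d vmhba32 | check_bound vmk0 vmk1
check_bound() {
    listing=$(cat)
    missing=0
    for vmk in "$@"; do
        # -w matches whole words, so vmk1 does not match vmk10
        if ! printf '%s\n' "$listing" | grep -qw "$vmk"; then
            echo "not bound: $vmk"
            missing=1
        fi
    done
    return $missing
}

# Stand-in input for illustration only, not real esxcli output.
printf 'vmk0\nvmk1\n' | check_bound vmk0 vmk1 && echo "all ports bound"
```

The function returns non-zero when a port is missing, so it can gate a larger script.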

That’s it – you will now be load balancing the software iSCSI initiator traffic over two pNICs. You can verify the load by opening a PuTTY session to the host, running esxtop, pressing “n” for the network view, and monitoring your vmk0 and vmk1 traffic.

Kill an Unresponsive VM on ESX

If you’ve been using ESX for a while, you’ve probably come across the situation where you’ve got a VM that’s unresponsive, and you cannot power it down via the vSphere Client. Maybe you’ve selected Power -> Shut Down Guest, and it got stuck at 95%. If it is a critical VM, you’re probably scrambling to get the VM rebooted, but the ESX Server has apparently lost control of it. The following is a set of instructions for powering down a VM from the ESX 4.x console (or SSH session).

NOTE: Follow these steps in order! Do NOT skip to the last option until you’ve done all of the preceding steps.

First Attempt: Use VM-Support to Kill the VM

  1. Determine the WorldID with the command:
    vm-support -x
  2. Kill the virtual machine by using the following command in the home directory of the virtual machine:
    vm-support -X <world_ID>

Second Attempt: Use vmkload_app -k 9

  1. List all running virtual machines to find the vmxCartelID:
    # /usr/lib/vmware/bin/vmdumper -l
  2. Scroll through the list until you see your virtual machine’s name. The output appears similar to:
    vmid=2652 pid=-1 cfgFile="/vmfs/volumes/<vol>/your_vm/your_vm.vmx" uuid="56 4d a6 db 0a e2 e5 3e-a9 2b 31 4b 69 29 15 19" displayName="your_vm" vmxCartelID=####
  3. Run the following command to shut the virtual machine down with the vmxCartelID:
    # /usr/lib/vmware/bin/vmkload_app -k 9 ####
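
On a host with a lot of VMs, scrolling for the right line gets tedious. Here is a rough sketch of pulling the vmxCartelID out for a given display name. The function name cartel_id is hypothetical, it assumes each VM is listed on a single line in the format shown in step 2, and the ID 1234 in the sample is a made-up placeholder standing in for ####.

```shell
#!/bin/sh
# Hypothetical helper: read `vmdumper -l` output on stdin and print the
# vmxCartelID of the VM whose displayName matches the first argument.
# Assumption: each VM is listed on one line in the format shown in step 2.
# Usage: /usr/lib/vmware/bin/vmdumper -l | cartel_id your_vm
cartel_id() {
    # -F treats the display name as a fixed string, not a regex
    grep -F "displayName=\"$1\"" |
        sed -n 's/.*vmxCartelID=\([0-9][0-9]*\).*/\1/p'
}

# Sample line modeled on step 2; 1234 is a placeholder, not a real ID.
sample='vmid=2652 pid=-1 cfgFile="/vmfs/volumes/<vol>/your_vm/your_vm.vmx" uuid="56 4d a6 db 0a e2 e5 3e-a9 2b 31 4b 69 29 15 19" displayName="your_vm" vmxCartelID=1234'
printf '%s\n' "$sample" | cartel_id your_vm   # prints 1234
```

The printed number is the value you would pass to vmkload_app -k 9 in step 3. Check it by eye first, since killing the wrong cartel takes down the wrong VM.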

Third Attempt: Use vmware-cmd to Power Off the virtual machine

  1. Run the following command from the Service Console:
    vmware-cmd <path.vmx> stop
  2. Check to see if the VM was successfully stopped by running the following command:
    vmware-cmd <path.vmx> getstate
  3. If the VM was not successfully stopped, run the same command, this time using the “hard” parameter:
    vmware-cmd <path.vmx> stop hard
  4. To verify that it is stopped, run the command:
    vmware-cmd <path.vmx> getstate
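
The four steps above follow a simple pattern: soft stop, check, hard stop, check. A sketch of that as one function is below. The name stop_vm and the VMWARE_CMD override are hypothetical, and I am assuming that getstate prints a line ending in "= off" once the VM is stopped, so check that against what your host actually prints before relying on it.

```shell
#!/bin/sh
# Hypothetical wrapper around the vmware-cmd steps above: soft stop, check,
# then hard stop only if the VM is still not reported as off.
# Assumption: getstate prints a line ending in "= off" for a stopped VM.
# VMWARE_CMD can be overridden (for example with a stub) for a dry run.
# Usage: stop_vm <path.vmx>
stop_vm() {
    vmx=$1
    cmd=${VMWARE_CMD:-vmware-cmd}
    "$cmd" "$vmx" stop
    if "$cmd" "$vmx" getstate | grep -q '= off'; then
        echo "stopped"
        return 0
    fi
    "$cmd" "$vmx" stop hard
    if "$cmd" "$vmx" getstate | grep -q '= off'; then
        echo "stopped (hard)"
        return 0
    fi
    echo "still running"
    return 1
}
```

A non-zero return means neither stop worked, which is your cue to move on to the fourth attempt.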

Fourth and Final Attempt: Kill the VM Process from ESX Command Line

  1. Determine the virtual machine’s process:
    ps auxwww | grep -i <your_vm>.vmx
  2. The output will show you the process id (PID) of the VM:
    root 8799 0.0 0.2 3148 1240 ? S<s Oct14 0:03 /usr/lib/vmware/bin/vmkload_app --sched.group=host/user/pool9/pool11 /usr/lib/vmware/bin/vmware-vmx -ssched.group=host/user/pool9/pool11 -# name=VMware ESX;version=4.0.0;build number=261974;licensename=VMware ESX Server;licenseversion=4.0 build-261974; -@ pipe=/tmp/vmhsdaemon-0/vmx273ea43ae079ecac; /vmfs/volumes/4c2cda9a-0bc5b815-c25e-001cc47a5f84/your_vm/your_vm.vmx
  3. Type the following to kill the process:
    kill <your_pid>
  4. Wait a minute and check for the process again. If it’s STILL running, you’ll have to be a bit more aggressive still, and type:
    kill -9 <your_pid>
  5. If your VM is still running after all of these procedures have been followed, you’re going to have to reboot the ESX Server.
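
Steps 3 and 4 above (kill, wait, then kill -9 only if needed) can be sketched as one function. The name kill_escalate is hypothetical, and the 60 second default simply mirrors the "wait a minute" in step 4.

```shell
#!/bin/sh
# Hypothetical helper for steps 3 and 4: send SIGTERM, wait, and send SIGKILL
# only if the process is still there. `kill -0` sends no signal; it only
# tests whether the PID still exists.
# Usage: kill_escalate <pid> [<seconds_to_wait>]
kill_escalate() {
    pid=$1
    grace=${2:-60}
    kill "$pid" 2>/dev/null || { echo "no such process: $pid"; return 1; }
    sleep "$grace"
    if kill -0 "$pid" 2>/dev/null; then
        kill -9 "$pid"
        echo "sent SIGKILL to $pid"
    else
        echo "$pid exited after SIGTERM"
    fi
}
```

For example, kill_escalate 8799 would target the PID from the sample ps output in step 2. If the process survives even SIGKILL, you are at step 5 and a host reboot.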