Friday, February 19, 2016

VMware ESXi host build notes: install image, HCL, drivers, firmware - confirming HCL recommendations once you have the hardware

For your purposes the important KBs now are 1031534 and 1027206

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1031534

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1027206

With the hardware powered on and ESXi installed (we will talk about the install image later on), we can open a shell and check the I/O devices. In my case this is ESXi v5.5.

The two commands I like to use to check HBAs and NICs are slight variations on the ones in the KB:

vmkchdev -l | grep vmnic

example output:

~ # vmkchdev -l | grep vmnic
0000:01:00.0 8086:1521 1028:1f60 vmkernel vmnic0
0000:01:00.1 8086:1521 1028:1f60 vmkernel vmnic1
0000:01:00.2 8086:1521 1028:1f60 vmkernel vmnic2
0000:01:00.3 8086:1521 1028:1f60 vmkernel vmnic3
0000:82:00.0 8086:1521 8086:5001 vmkernel vmnic4
0000:82:00.1 8086:1521 8086:5001 vmkernel vmnic5
0000:82:00.2 8086:1521 8086:5001 vmkernel vmnic6
0000:82:00.3 8086:1521 8086:5001 vmkernel vmnic7

vmkchdev -l | grep vmhba

example output:

~ # vmkchdev -l | grep vmhba
0000:00:11.4 8086:8d62 1028:0600 vmkernel vmhba1
0000:00:1f.2 8086:8d02 1028:0600 vmkernel vmhba2
0000:02:00.0 1000:005f 1028:1f4b vmkernel vmhba0
0000:04:00.0 10df:f100 10df:f100 vmkernel vmhba5
0000:04:00.1 10df:f100 10df:f100 vmkernel vmhba6
0000:05:00.0 10df:f100 10df:f100 vmkernel vmhba3
0000:05:00.1 10df:f100 10df:f100 vmkernel vmhba4

The first column is the PCI bus address, the 2nd and 3rd are hardware identifiers, and the last column is the name ESXi gives the device.

From the KB:

For example, to check the compatibility of vmnic0 and vmhba0, note the hardware IDs:

000:003:00.0 14e4:1639 103c:7055 vmkernel vmnic0
000:069:00.0 103c:323a 103c:3243 vmkernel vmhba0


The section in bold (the second and third columns) indicates the device properties in the format VID:DID SVID:SSID, where:

VID = Vendor Id
DID = Device Id
SVID = Sub-Vendor Id
SSID = Sub-Device Id
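
If you have many devices, splitting those fields by hand gets tedious. Here is a minimal sketch of how it could be scripted. The function name hcl_ids is my own (hypothetical), and it assumes the five-column vmkchdev -l layout shown in the examples above.

```shell
# Hypothetical helper: label the four HCL search fields for each device.
# Reads `vmkchdev -l` lines on stdin; assumes the five-column layout
# "PCI-address VID:DID SVID:SSID owner name" seen in the examples above.
hcl_ids() {
    awk '{
        split($2, dev, ":")      # dev[1] = VID,  dev[2] = DID
        split($3, subsys, ":")   # subsys[1] = SVID, subsys[2] = SSID
        printf "%s VID=%s DID=%s SVID=%s SSID=%s\n", $5, dev[1], dev[2], subsys[1], subsys[2]
    }'
}

# On the host (not run here):
#   vmkchdev -l | grep vmnic | hcl_ids
#   vmkchdev -l | grep vmhba | hcl_ids
```

Each output line gives the device name followed by the four values to type into the I/O devices search.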

These numbers matter because they let us search the VMware compatibility list we talked about before for the exact device, instead of browsing by vendor and model name. Let's check the IDs we got from the example.

This is the I/O devices view of the Compatibility Guide. We don't need to select vendors or model anymore - we only need to input the 4 numbers

With those 4 numbers we now know that our vmnic0-vmnic3 are the Intel I350-t NDC (network daughter card), which provides the LAN-on-motherboard ports. It's good that they are listed, but if we click on the ESXi version in the results, we can also see the list of driver and firmware combinations that VMware supports:


So we have the supported combinations. How do we find which driver and firmware are actually running on ESXi?

For network cards

Use this command:

ethtool -i vmnicX

Example:

~ # ethtool -i vmnic0
driver: igb
version: 5.3.1
firmware-version: 1.67, 0x80000d93, 16.5.20
bus-info: 0000:01:00.0

So the driver is called igb, its version is 5.3.1, and the firmware is v1.67. Checking the website, we see that this driver is compatible with any firmware version (if a specific firmware were required it would be listed instead of N/A, as in the case of igb 5.2.7). This means that for this device we are running VMware-approved drivers and firmware.
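
With eight NICs you will not want to read eight ethtool dumps. As a rough sketch, the three values we care about can be condensed to one line per NIC. The helper name nic_summary is hypothetical, and it assumes that the first comma-separated item of firmware-version is the firmware number to compare, as in the igb example above. Other drivers may format that line differently.

```shell
# Hypothetical helper: condense `ethtool -i` output (stdin) into
# "driver version firmware" on a single line.
# Assumption: the firmware number is the first comma-separated item of the
# firmware-version line, as in the igb example above.
nic_summary() {
    awk -F': ' '
        $1 == "driver"           { drv = $2 }
        $1 == "version"          { ver = $2 }
        $1 == "firmware-version" { split($2, fw, ","); fwv = fw[1] }
        END { print drv, ver, fwv }
    '
}

# On the host (not run here), one line per NIC:
#   for nic in $(vmkchdev -l | grep vmnic | awk '{print $5}'); do
#       echo "$nic $(ethtool -i "$nic" | nic_summary)"
#   done
```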


For storage HBAs

The commands for HBAs are not as clean and may vary by your vendor. 

You first find the driver being used. This command lists HBA devices:

esxcfg-scsidevs -a

~ # esxcfg-scsidevs -a
vmhba38 ahci              link-n/a  sata.vmhba38                            (0:0:31.2) Intel Corporation Wellsburg AHCI Controller
vmhba39 ahci              link-n/a  sata.vmhba39                            (0:0:31.2) Intel Corporation Wellsburg AHCI Controller
vmhba0  megaraid_perc9    link-n/a  unknown.vmhba0                          (0:2:0.0) LSI / Symbios Logic Dell PERC H330 Mini Adapter
vmhba1  ahci              link-n/a  sata.vmhba1                             (0:0:17.4) Intel Corporation Wellsburg AHCI Controller
vmhba2  ahci              link-n/a  sata.vmhba2                             (0:0:31.2) Intel Corporation Wellsburg AHCI Controller
vmhba3  lpfc              link-up   fc.20000090XXXXXXXX:10000090XXXXXXXX    (0:5:0.0) Emulex Corporation LPe12000 8Gb Fibre Channel Host Adapter

Or you can also use

esxcli storage core adapter list

In this case I'm interested in double-checking the Emulex 8Gb FC adapters. Now I know my driver is called lpfc. With this information I now run this command:

vmkload_mod -s lpfc | grep Version

Example output:

~ # vmkload_mod -s lpfc |grep Version
 Version: 10.6.126.0-1OEM.550.0.0.1331820
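
If the host has several kinds of HBA, the same two steps can be chained. This is only a sketch: hba_drivers and driver_version are hypothetical helper names, and they assume the driver name is the second column of esxcfg-scsidevs -a and that vmkload_mod -s prints a single Version: line, as in the output above.

```shell
# Hypothetical helper: unique driver names from `esxcfg-scsidevs -a` (stdin).
# Assumes the driver name is the second whitespace-separated column.
hba_drivers() {
    awk '{ print $2 }' | sort -u
}

# Hypothetical helper: print only the value of the "Version:" line
# from `vmkload_mod -s <driver>` output (stdin).
driver_version() {
    awk '$1 == "Version:" { print $2 }'
}

# On the host (not run here), one line per driver in use:
#   for drv in $(esxcfg-scsidevs -a | hba_drivers); do
#       echo "$drv $(vmkload_mod -s "$drv" | driver_version)"
#   done
```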

To determine the firmware on a Fibre Channel HBA we need yet another KB, 1002413


The command for ESXi v5.5 is

/usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval -a


The full output is long, so I find this grep sufficient to reduce it to what we are interested in, and I can paste an excerpt:

# /usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval -a | grep 'FW Version'
FW Version:     2.01A12
...

Notice that all HBAs show the same firmware version (they are all the same model). Always be mindful that if you have different hardware, you have to check each!
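
A quick way to spot a mixed set is to count how many ports report each firmware version. The sketch below uses a hypothetical helper name, fw_versions, and assumes each relevant line looks like the "FW Version:" line in the excerpt above, with the label at the start of the line.

```shell
# Hypothetical helper: count ports per firmware version.
# Reads `vmkmgmt_keyval -a` output on stdin; assumes lines of the form
# "FW Version:     <version>" as in the excerpt above.
fw_versions() {
    awk -F':[ \t]*' '
        /FW Version/ { count[$2]++ }
        END { for (v in count) print count[v], v }
    ' | sort
}

# On the host (not run here):
#   /usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval -a | fw_versions
```

More than one line of output means the ports are not all on the same firmware, and each version has to be checked against the HCL separately.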

Checking this HBA's 4 ID numbers and comparing the driver and firmware versions we see:



Here we can see an interesting situation - we have a new driver but old firmware, and they are incompatible (if you had a problem, VMware would tell you that you aren't running an HCL-approved configuration). In this case, you will need to update the firmware on the HBA card.

In my experience, the vendor tools (the HP SPP or Dell OME ISO) do a good job of upgrading the server BIOS and the firmware on all vendor-provided cards to their latest Linux versions. After updating firmware through these tools, you still have to check that you are in an approved configuration. There are also many posts on the internet on how to run a standalone firmware upgrade, but most depend on a device-specific tool. If you can't get the firmware to the version that the latest driver requires, you might need to downgrade the driver to match your existing firmware. Remember, as long as you are on the HCL, VMware will support you if you run into problems (small caveat below).

A note on "inbox" drivers versus Partner Async
  • Inbox drivers are what VMware provides in its base installer. They are the driver/firmware combinations that VMware certified as working when the release was made, and VMware owns support for them.
  • Partner Async drivers are driver updates provided by vendors that have passed a certification process. However, if a problem is traced to the new version, support lies with the partner, not with VMware. VMware may recommend that you fall back to the inbox drivers.
You have the eternal dilemma of stability versus bug fixes and performance improvements. I always go for Partner Async unless I find trouble. Note that vendors provide VMware ISO installers that include these new drivers by default. The topic of the differences between the VMware provided installation ISO (that only includes the inbox drivers) and the vendor provided ISOs is the next and last post in this series.

In Summary

Now you know how to check the hardware you own against the HCL for firmware and drivers. You would run the numbers for each of the devices you want to check on your hardware. Documenting the configuration that will ultimately go to production will help as you add more hardware to the clusters and will help you keep a homogeneous environment.
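
One possible way to keep that documentation useful is a plain text file of the combinations you confirmed on the HCL, one "driver version firmware" per line, that new hosts can be compared against. This is my own sketch, not something VMware provides: the function is_approved and the file format are hypothetical, and a firmware of N/A is taken to mean "any firmware", mirroring how the Compatibility Guide lists it.

```shell
# Hypothetical helper: succeed if a driver/version/firmware combination is in
# a hand-maintained list of HCL-approved combinations.
#   $1 = driver, $2 = driver version, $3 = firmware, $4 = list file
# List format (an assumption): "driver version firmware" per line, where a
# firmware of N/A accepts any firmware for that driver version.
is_approved() {
    awk -v d="$1" -v v="$2" -v f="$3" '
        # Append "" to force string (not numeric) comparison of versions.
        ($1 "") == (d "") && ($2 "") == (v "") &&
        (($3 "") == (f "") || $3 == "N/A") { ok = 1 }
        END { exit !ok }
    ' "$4"
}

# Example use (the file name approved.txt is hypothetical):
#   is_approved igb 5.3.1 1.67 approved.txt && echo "on the HCL list"
```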
