Monday, April 25, 2016

Working with the Dell-sold Samsung NVMe XS1715 SSD 800GB

TL;DR: you will probably need to do a firmware downgrade and a driver update to comply with the HCL. Don't trust the Dell installer or the OME Linux updater!

We bought some Dell servers that have this NVMe drive. As far as I know, this is the only hot-plug NVMe drive they sell for the R730XD series, but I suspect this will change sometime in the future.

The installer originally used on this server didn't include the NVMe driver, so I wasn't able to see the drive in my storage interface. To make sure the drive is working properly before installing the driver, you can check that it shows up in the storage section of the iDRAC interface, and you can use this command to see whether the PCIe device shows up in ESXi:

~ # lspci -v |grep NVMe
0000:87:00.0 Non-Volatile memory controller Mass storage controller: Samsung, Inc. Express Flash NVMe XS1715 SSD 800GB

With this information, even without the device being mapped to a vmhba yet, I can get the four hardware IDs (vendor ID, device ID, subsystem vendor ID, and subsystem device ID) and look the device up in the HCL:

~ # vmkchdev -l |grep 0000:87:00.0
0000:87:00.0 144d:a820 1028:1f96 vmkernel
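
In that output, the second column is the vendor ID and device ID, and the third is the subsystem vendor ID and subsystem device ID (VID, DID, SVID, and SSID in the HCL search). A minimal sketch for splitting them out, assuming the line layout shown above and that awk is available in your shell; the function name `pci_ids` is made up for illustration:

```shell
# Split a "vmkchdev -l" line into the four IDs the HCL search asks for.
# Assumes the layout shown above: <address> <VID>:<DID> <SVID>:<SSID> <owner> ...
# pci_ids is a hypothetical helper name, not something shipped with ESXi.
pci_ids() {
    echo "$1" | awk '{
        split($2, dev, ":")      # vendor ID : device ID
        split($3, subsys, ":")   # subsystem vendor ID : subsystem device ID
        printf "VID=%s DID=%s SVID=%s SSID=%s\n", dev[1], dev[2], subsys[1], subsys[2]
    }'
}

pci_ids "0000:87:00.0 144d:a820 1028:1f96 vmkernel"
# prints: VID=144d DID=a820 SVID=1028 SSID=1f96
```

On the host itself, the same function could be fed directly with `pci_ids "$(vmkchdev -l | grep 0000:87:00.0)"`.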

In my case, this is being installed with 5.5u3.


The firmware version listed on the HCL is the same for all ESXi releases (IMP0DD3Q), and a quick Google search shows this is the initial firmware release from 2014. This server has already had an OME boot CD run, and I didn't see any updates for this device. Checking in iDRAC, however, shows I am running a different release, IPM0ID3Q.

Checking the Dell page for that firmware release, it seems this is the 3rd release, and the VMware HCL has not caught up to these releases. The release I'm currently running is from June 2015 and improves the wear levelling algorithm, while the 2nd release improves management functions. Tsk tsk, VMware/Dell...

For now, it seems I will have to figure out how to downgrade the firmware to comply with the HCL. I checked the VSAN HCL as well, and there are currently no Samsung NVMe drives certified. Other vExperts have told me that certification is in progress, and I will update this post once I have news on that front. Once I figure out the downgrade, I will update this section.


Installing with the most recent Dell image to date (VMware-VMvisor-Installer-5.5.0.update03-3568722.x86_64-Dell_Customized-A05.iso), I was able to see the driver in use with an "esxcli storage core adapter list":

vmhba7 nvme link-n/a pscsi.vmhba7 (0:135:0.0) Samsung Electronics Co Ltd Dell Express Flash NVMe XS1715 800GB PCIe SSD Controller

And we can check this driver's information with "vmkload_mod -s nvme":

~ # vmkload_mod -s nvme
vmkload_mod module information
 input file: /usr/lib/vmware/vmkmod/nvme
 License: BSD
 Required name-spaces:
  nvme_compl_worlds_num: int
    Total number of NVMe completion worlds/queues.
  nvme_dbg: int
    Driver NVME_DEBUG print level
  io_timeout: int
    IO timeout second for internal checker
  max_scsi_unmap_requests: int
    Maximum number of scsi unmap requests supported
  max_namespaces: int
    Maximum number of namespaces supported
  io_cpl_queue_size: int
    NVMe number of IO completion queue entries
  io_sub_queue_size: int
    NVMe number of IO submission queue entries
  admin_cpl_queue_size: int
    NVMe number of Admin completion queue entries
  admin_sub_queue_size: int
    NVMe number of Admin submission queue entries.
  nvme_log_level: int
    Log level.
        1 - error
        2 - warning
        3 - info (default)
        4 - verbose
        5 - debug
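
The parameters listed in that output can also be inspected, and changed, through `esxcli system module parameters`. A small sketch, assuming the module name `nvme` from the output above; `nvme_param_cmd` is a made-up helper that only prints the command, so nothing changes until you run the printed line yourself:

```shell
# Show the current values of the nvme module parameters (read-only):
#   esxcli system module parameters list -m nvme

# Hypothetical helper: print the command that would set one parameter.
# Module parameter changes normally only take effect once the module is
# reloaded, which in practice usually means a reboot.
nvme_param_cmd() {
    printf 'esxcli system module parameters set -m nvme -p "%s=%s"\n' "$1" "$2"
}

# Example: raise the log level to verbose (4), per the list above.
nvme_param_cmd nvme_log_level 4
```

Printing the command rather than running it is deliberate here, since I would not want a sketch to change driver settings on a production host.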

As seen in the HCL at the beginning of this post, the only driver version on the HCL is 1.0e.0.30-1vmw, so I will also need to download and install a custom driver for this card. For vSphere 6.0 and later, at least, the HCL lists the VMware inbox driver instead.

Download the driver from the VMware site, save it to a datastore or /tmp, and then run the commands to uninstall the old driver and install the new one. I'll paste the commands here soon.
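
In the meantime, here is a rough sketch of what that sequence usually looks like with `esxcli software vib`. Everything specific in it is an assumption: the VIB name `nvme`, the bundle path `/tmp/nvme-offline-bundle.zip`, and the `run` and `swap_driver` helpers are placeholders, so check `esxcli software vib list` for the real VIB name first. With `DRY_RUN=1` (the default here) it only prints the commands.

```shell
# Sketch only: print (or run) the usual esxcli steps for swapping a driver VIB.
# VIB_NAME and BUNDLE are placeholders, not values taken from this server.
VIB_NAME="${VIB_NAME:-nvme}"
BUNDLE="${BUNDLE:-/tmp/nvme-offline-bundle.zip}"   # esxcli expects an absolute path
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

swap_driver() {
    run esxcli system maintenanceMode set --enable true
    run esxcli software vib remove -n "$VIB_NAME"
    run esxcli software vib install -d "$BUNDLE"
    # A reboot is needed afterwards before the new driver module is loaded.
}

swap_driver
```

Depending on the host, a reboot may also be needed between the remove and install steps, so treat the order above as a starting point rather than a tested procedure.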
