ESXi 7.0/6.7 and MegaRaid on Alder Lake Asrock Z690M-ITX/ax

I'm going to upgrade my home server, which runs a Core i5-2500K on an old P8H67-I mainboard with an LSI 9265-8i SAS controller, and I've done some experiments prior to this upgrade on the new hardware: an Asrock Z690M-ITX/ax motherboard with an i5-12600/i5-12600K processor. I generally keep a common platform between my main PC and the server so that I can try things out ahead of a server migration with minimal hassle, and it paid off this time.

IMPORTANT NOTICE: This configuration is not on the HCL and is unsupported; you may incur data loss. Please read this article carefully before making decisions or changing your setup. Always have a backup available off the system!

Part I: MegaRAID cards firmware woes

The experimental setup was an Asrock Z690M-ITX/ax with BIOS 7.02, an i5-12600K (the server will get a plain 12600, though) and an LSI 9260-4i card (LSI 2108 family, Dell PERC H700 compatible) with RAID-1. The server I'm going to upgrade has a 9265-8i, that is LSI 2208 or Dell PERC H710, a different family, but since these share a common firmware code base, most issues would likely apply there as well.

Modern Intel chipsets have no video BIOS for CSM: you can only boot in UEFI mode, or you need an external GPU for CSM boot. With just one PCIe slot that is not an option, so you'll miss any visual output from LSI cards during boot, and the system seems to hang a bit longer in the B4 and A0 POST phases, in some cases for almost a minute. This is not a bug and should be expected.

I started with the 9260-4i card and its RAID-1 out of the abandoned server; the card had 12.12.0-0077 firmware. As soon as the card was initialized, something within the firmware or the UEFI setup triggered volume reconstruction from RAID-1 to RAID-0! It took ages to complete, more than 24 hours with 4TB drives, and it can't be stopped by any means other than deleting the volume. This magic happens on every reboot with any volume other than RAID-0! On top of that, at some point during a reboot the controller triggered fast volume initialization, wiping out the first 100 megabytes of data off the volume. Not good, to say the least.
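If you suspect the same behavior on your card, the reconstruction and initialization can at least be observed from MegaCLI. A quick check, assuming MegaCLI is installed and the card is adapter 0 (adjust -Lx/-ax to your setup):

```shell
# List logical drives and their current RAID level
MegaCli -LDInfo -LALL -aALL

# Show reconstruction progress on logical drive 0, adapter 0
MegaCli -LDRecon -ShowProg -L0 -a0

# Show background initialization progress
MegaCli -LDBI -ShowProg -LALL -aALL
```

If -LDInfo suddenly reports RAID-0 where you created RAID-1, you've hit the same firmware issue.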

However, I was amazed to find that even with this pretty old firmware the LSI controller configuration utility appeared in the motherboard BIOS setup as an item under the Advanced tab. You can create and delete arrays, change array properties and so on; it's pretty comprehensive and has all the features of the old MegaRaid WebBIOS utility. See the images below:



Next, I flashed the latest firmware for the card, 12.15.0-0239. I did it from a VM with MegaCLI and it completed successfully, requesting a system reboot. That reboot never happened, as the motherboard failed to POST regardless of the presence of the controller, presence of a PCIe GPU, CMOS reset and battery disconnect! The motherboard starts up, fans are running, but there's no video anywhere and no signs of boot activity. Um, what a success that was. I had to flash the motherboard BIOS from a USB flash drive via the magic-button recovery mechanism to resurrect the beast. There's a note for some previous firmware releases that reads "after the newest image is installed, servers often will not boot"; maybe that was the case.
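For reference, the flash itself is a one-liner from MegaCLI. The firmware image filename below is just an example; use the package matching your exact card:

```shell
# Flash the controller firmware image on adapter 0
# (mr2108fw.rom is a placeholder name, substitute your downloaded image)
MegaCli -AdpFwFlash -f mr2108fw.rom -a0

# Verify the running firmware package after the reboot
MegaCli -AdpAllInfo -a0 | grep -i 'FW Package'
```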

With the latest firmware all the unexpected reconstruction/initialization issues are gone; I've tested it extensively. However, when you try to create a RAID volume in the UEFI BIOS and click on Select disks, the system appears to hang forever. Although I didn't wait very long, maybe it would come back in a couple of minutes; I haven't tried that, as for me it's easier to create an array from MegaCLI. You can even boot from a RAID volume if you set it up as active with MegaCli -AdpBootDrive -Set -L0 -a0 and install ESXi 6.7 on it! In this case it appears as UEFI OS under the boot selection menu.
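Creating the array from MegaCLI instead of the UEFI utility looks like this; the enclosure:slot IDs (252:0 and 252:1) are examples from my setup, check yours with -PDList first:

```shell
# Find Enclosure Device ID and Slot Number for each physical drive
MegaCli -PDList -a0 | grep -E 'Enclosure Device ID|Slot Number'

# Create a RAID-1 volume from two drives with write-back cache and read-ahead
MegaCli -CfgLdAdd -r1 [252:0,252:1] WB RA Cached -a0

# Mark logical drive 0 as the boot volume on adapter 0
MegaCli -AdpBootDrive -Set -L0 -a0
```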

I haven't tried the 9265-8i yet, but the plan is to flash it with the latest firmware in the old system and then start the migration process; I'll post an update if anything new happens there.

Part II: ESXi 7.0

Processors with E-cores should have all of them disabled in BIOS, otherwise ESXi kernel will PSOD.
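There is also a commonly cited kernel boot option for ESXi 7.0 that relaxes the CPU uniformity check instead of disabling the E-cores. I haven't tested it on this board, so treat the following purely as an assumption to experiment with, not a recommendation:

```shell
# At the ESXi boot loader press Shift+O and append to the boot options:
#   cpuUniformityHardCheckPanic=FALSE

# After installation, the setting can reportedly be made persistent with:
esxcli system settings kernel set -s cpuUniformityHardCheckPanic -v FALSE
```

Note that scheduling across mixed P-/E-cores is still unsupported, so disabling E-cores in BIOS remains the safer route.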

The Asrock Z690M-ITX/ax has two network controllers: Intel I219-V [8086:1a1d] and Realtek RTL8125 2.5G Ethernet [10ec:8125]. For the Realtek there are no drivers for ESXi 7.0; for the Intel, the Community Networking Driver for ESXi fling works as expected when added to a custom boot ISO. VLANs work, and no network issues were discovered.
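Once the host is up, it's easy to confirm the fling driver is present and has claimed the I219-V; a quick check from the ESXi shell (the grep pattern is a guess at the VIB name, adjust if your fling package is named differently):

```shell
# Check that the community networking VIB is installed
esxcli software vib list | grep -i community

# The I219-V should appear as a vmnic with a driver bound to it
esxcli network nic list
```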

Intel VMD should be kept disabled, completely or at least for the devices on the SATA and PCIe buses that you want visible under ESXi: otherwise you would still be able to boot off these devices, but they will be invisible under the ESXi storage tabs. When VMD is disabled, both NVMe and SATA devices appear as disks under ESXi and work as expected.
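A quick way to verify from the ESXi shell that the controllers and disks actually surfaced after toggling VMD:

```shell
# SATA/NVMe controllers should be listed as vmhba adapters
esxcli storage core adapter list

# And the attached disks should appear as devices
esxcli storage core device list | grep -i 'Display Name'
```

If a disk boots fine but is missing from these lists, VMD is the first thing to re-check in BIOS.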

LSI 9260-4i support was removed completely in ESXi 7.0 due to the VMKLinux kernel API deprecation. Adding PCI IDs to lsi_mr3 does not help. However, you can pass these cards through to a Linux VM, in my case Ubuntu 20.04 server, set up an iSCSI target there and export it back to ESXi. I was able to achieve decent performance with this setup, with some 250MB/sec rates on these volumes. You can also pass the Realtek card in there and create a complete NAS VM visible from the outside. MegaCLI works like a charm.
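A minimal sketch of that iSCSI loop, assuming targetcli-fb in the Ubuntu VM and the RAID volume visible there as /dev/sdb; the device name, IQN, adapter name (vmhba64) and IP address are all examples from an assumed setup, substitute your own:

```shell
# --- In the Ubuntu VM: export the MegaRAID block device over iSCSI
targetcli /backstores/block create name=mrvol dev=/dev/sdb
targetcli /iscsi create iqn.2022-01.local.nas:mrvol
targetcli /iscsi/iqn.2022-01.local.nas:mrvol/tpg1/luns create /backstores/block/mrvol
# Open access for demo purposes; set up proper ACLs/CHAP for real use
targetcli /iscsi/iqn.2022-01.local.nas:mrvol/tpg1 set attribute generate_node_acls=1
targetcli saveconfig

# --- On the ESXi host: enable the software initiator and point it at the VM
esxcli iscsi software set --enabled=true
esxcli iscsi adapter discovery sendtarget add -A vmhba64 -a 192.168.1.50:3260
esxcli storage core adapter rescan -A vmhba64
```

The exported LUN then shows up as a regular device that can be formatted as a VMFS datastore.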

LSI 9265-8i should work with ESXi 7.0, as its PCI IDs are listed for the lsi_mr3 driver; in fact, the 9265-8i was added to the lsi_mr3 native driver only with the release of 6.7u3. At the same time, the 9265-8i is on the list of unsupported devices not yet removed from the driver, with removal scheduled for the next release of ESXi. I'll update this in a couple of weeks as I make progress with the main server.
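When I get to it, the check on the 7.0 host should be straightforward; the SAS2208 device ID is 1000:005b, and the adapter/driver binding can be inspected from the ESXi shell (commands below are my assumption of what to look at, not yet verified on this card):

```shell
# Locate the controller by PCI vendor:device ID
vmkchdev -l | grep 1000:005b

# List SCSI adapters with the driver that claimed them; expect lsi_mr3
esxcfg-scsidevs -a
```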

Part III: ESXi 6.7u3

Processors with E-cores should have all of them disabled in BIOS, otherwise ESXi kernel will PSOD. 

Most 6.7 releases, including vanilla 6.7u3, hang after initial module loading, prior to the yellow startup screen, with the dreaded "Shutting down firmware services... Using 'simple offset' UEFI RTS mapping policy" message. If your machine has a PCIe graphics card, this can be avoided by using it; otherwise it appears that the installer shoots the video output in the head, as the installer actually progresses further in headless mode, but no video is displayed. This is fixed in some point release after 6.7u3; in my case ESXi670-202201001-standard.zip works without this problem.

Intel VMD should be disabled completely in BIOS, otherwise the kernel will PSOD during module initialization with a NOT_IMPLEMENTED bora/vmkernel/hardware/pci/pci.c:414 exception.

The Intel I219-V network controller seems to exist in a whole bunch of flavors; the particular one [8086:1a1d] on the Asrock board is not supported by the inbox ne1000 driver, and adding PCI IDs won't help either. There was, however, a significant effort on VMware's behalf to satisfy requests from the Intel NUC community for network drivers for some newer NUC models, traces of which can be found here: ESXi on Intel NUC 10 (Frost Canyon). There are no signs of a driver for 6.7 on that page, and the author ignores necro-comments that request it, just as any blogger would :).

The Internet remembers everything, though, so given the filename and some google-fu skills I was able to retrieve it from the depths of Baidu, and here it is: Intel-NUC-ne1000_0.8.4-3vmw.670.0.0.8169922-offline_bundle-16654787.zip

With the above driver integrated into the ESXi670-202201001-standard.zip profile, the system boots and installs without issues. VLANs work and no network issues were discovered.

There's a Realtek RTL8125 driver for 6.7 available here, and you can find binary modules of it elsewhere on the Net, but it lacks VLAN support.

Both the LSI 9260-4i and the LSI 9265-8i work with ESXi 6.7u3 natively, with the inbox driver from the standard distribution. Performance is within expected limits for cache with BBU, up to 300MB/sec. I have confirmed 6.7u3 can boot off a 9260-4i RAID-1 volume on this platform in UEFI mode with no issues at all.
