VMware VCF on Dell VxRail 4.2 to 4.3 – Tips & Tricks

VMware VCF 4.3 is GA and not a moment too soon! The 4.3 release brings the vRealize product set back up to date and includes a number of patch releases for the core infrastructure. If you are running VCF on VxRail, additional packages are also available to keep firmware current. This is a short page listing my general notes on the upgrade process.

https://docs.vmware.com/en/VMware-Cloud-Foundation/4.3/rn/vmware-cloud-foundation-43-on-dell-emc-vxrail-release-notes.html

** Update: VCF 4.3.1 is now available, so jump straight to that version instead to resolve some serious vCenter vulnerabilities **

Suggested Health Checking

The list of prerequisite tasks for updating VCF lacks a lot of detail:

https://docs.vmware.com/en/VMware-Cloud-Foundation/4.3/com.vmware.vcf.vxrail.admin.doc/GUID-CDDBC489-11A9-4D56-9BC9-DCCADB60AB35.html

I suggest running the following health checks against your environment before you consider starting the update.

  1. VxVerify – A Dell utility which runs on the VxRail manager appliance and checks your ESXi, vCenter and NSX-T environment. https://www.dell.com/support/kbdoc/en-uk/000021527/vxrail-how-to-run-vxverify?lang=en
  2. SDDC SOS – An inbuilt health check utility within the SDDC manager. Details found at https://docs.vmware.com/en/VMware-Cloud-Foundation/4.3/vcf-deploy/GUID-10BA552D-FE25-4DF4-AFCA-F8A520DD881C.html. Run ./sos --health-check --force, but I suggest also running it in verbose vcf mode with both the -d and --vcf switches for a more thorough test.
  3. SDDC Manager pre-checks. By the time you have fixed all the errors found by tools 1 and 2, the pre-checks should come back fairly clean before you push the upgrade button.
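If you want to script the SOS step rather than type it each time, a minimal Python sketch of assembling the command line is below. The flags are only the ones quoted in the list above; the function name and its parameters are my own invention for illustration, and you should confirm the switches against the help output of your own SoS build before relying on them.

```python
import shlex


def sos_health_check_command(force=True, debug=False, vcf=False):
    """Build the argv list for the SoS health check.

    Only the switches mentioned in this post are used; nothing here
    queries or validates them against the installed SoS version.
    """
    argv = ["./sos", "--health-check"]
    if force:
        argv.append("--force")
    if debug:
        argv.append("-d")      # switch as quoted in the post
    if vcf:
        argv.append("--vcf")   # switch as quoted in the post
    return argv


if __name__ == "__main__":
    # Print the two variants so they can be pasted into a shell
    # on the SDDC manager.
    print(shlex.join(sos_health_check_command()))
    print(shlex.join(sos_health_check_command(debug=True, vcf=True)))
```

Returning a list rather than a single string means the same value can be handed straight to subprocess.run without shell quoting problems.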

My general comment on these tests is that the results they return feel a little inconsistent. I found I could run SOS three times back to back and get three different results, so do your own checking and take the results with a pinch of salt.

I got to the point where I simply ran an upgrade to see whether the errors were actually returned, and often the next stages ran through fine.
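One way to separate real failures from flapping ones is to record the result of each check across several runs and list only the checks that disagree with themselves. The sketch below assumes you have already turned each run into a simple mapping of check name to status; the check names and status strings are hypothetical, and nothing here parses actual SOS output.

```python
def unstable_checks(runs):
    """Return {check_name: [status per run]} for every check whose
    status is not identical in all runs.

    A check that is absent from a run is recorded as "MISSING".
    """
    names = set()
    for run in runs:
        names.update(run)

    result = {}
    for name in sorted(names):
        statuses = [run.get(name, "MISSING") for run in runs]
        if len(set(statuses)) > 1:
            result[name] = statuses
    return result


if __name__ == "__main__":
    # Hypothetical results from three back-to-back runs.
    runs = [
        {"ntp": "GREEN", "dns": "GREEN", "certs": "RED"},
        {"ntp": "RED", "dns": "GREEN", "certs": "RED"},
        {"ntp": "GREEN", "dns": "GREEN", "certs": "RED"},
    ]
    for name, statuses in unstable_checks(runs).items():
        print(name, statuses)
```

A check that is red in every run (certs in the example) is not reported, because it is consistent and probably a genuine problem worth fixing first.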

Upgrade Bundles

I work in an air-gapped environment so I use the offline bundle process. This is tedious and relies on you having Dell and VMware accounts with sufficient privileges to access the correct software. The Dell account is required to download a composite ZIP bundle (around 10GB) which contains all the Dell VxRail-specific firmware, BIOS updates etc.

This bundle needs uploading to the SDDC manager separately from the main VCF files, so in actual fact you will need to create 2 sets of markerFiles – one containing just the VxRail software, and another containing all other VCF files. I did not find this clear in the documentation, so beware.

NTP Drift

Despite my environment having a stratum 1 NTP source aligned to all VMware appliances, the various pre-check tools were throwing NTP errors. This is somewhat infuriating, but I found this useful post which allows a little drift in the time; after applying it, all pre-checks came back green.

https://kb.vmware.com/s/article/83831
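Before relaxing any tolerance it is worth knowing how far each appliance really is from the time source. The Python sketch below uses the standard NTP clock offset formula on the four timestamps of a request/response exchange, then flags hosts outside a tolerance you choose. The host names, offsets and the tolerance value are hypothetical, and the tolerance is not the figure used by the pre-checks or the KB article.

```python
def ntp_offset(t0, t1, t2, t3):
    """Standard NTP clock offset in seconds.

    t0: client send time      (client clock)
    t1: server receive time   (server clock)
    t2: server transmit time  (server clock)
    t3: client receive time   (client clock)
    A positive result means the client clock is behind the server.
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


def drifting_hosts(offsets, tolerance_s):
    """Return only the hosts whose absolute offset exceeds tolerance_s."""
    return {host: off for host, off in offsets.items() if abs(off) > tolerance_s}


if __name__ == "__main__":
    # Hypothetical measured offsets in seconds, one per appliance.
    measured = {"sddc-manager": 0.02, "vcenter": -0.01, "nsx-manager": 1.8}
    print(drifting_hosts(measured, tolerance_s=0.5))
```

If nothing is flagged at a sensible tolerance, the NTP errors are more likely down to how strictly the pre-check compares clocks than to a genuine time problem.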

The Bug that destroyed my system

If you are using the NSX-T distributed firewall with layer 7 context rules then you must take note of this prior to upgrading NSX-T.

https://kb.vmware.com/s/article/82043

The moment our workload domain NSX-T upgraded, I lost all connectivity to our infrastructure, as 90% of our firewall rules were running layer 7. My suggestion to anyone would be to disable the DFW or convert rules to layer 4 until all ESXi hosts are upgraded, at which point the bug is resolved and the DFW will continue to process traffic as expected.
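To get a feel for your exposure before upgrading, you can list the rules that use a layer 7 context profile from an export of your DFW policy. The sketch below is in Python and assumes each rule is a dictionary with a display_name and a profiles list, with "ANY" meaning no context profile; that is how I understand the NSX-T Policy API to represent rules, but treat the field names and sample data as assumptions and check them against your own export.

```python
def layer7_rules(rules):
    """Return the names of rules that reference at least one context
    profile other than the "ANY" placeholder.

    Field names ("display_name", "id", "profiles") are assumptions
    about the export format, not confirmed by this post.
    """
    flagged = []
    for rule in rules:
        profiles = [p for p in rule.get("profiles", []) if p != "ANY"]
        if profiles:
            flagged.append(rule.get("display_name", rule.get("id", "<unnamed>")))
    return flagged


if __name__ == "__main__":
    # Hypothetical exported rules.
    exported = [
        {"display_name": "web-https", "profiles": ["/infra/context-profiles/HTTPS"]},
        {"display_name": "ssh-mgmt", "profiles": ["ANY"]},
        {"display_name": "dns"},
    ]
    hits = layer7_rules(exported)
    print(f"{len(hits)} of {len(exported)} rules use layer 7: {hits}")
```

If that ratio is anywhere near the 90% I had, plan the layer 4 conversion or DFW disablement before touching NSX-T.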

**This caused me hours of troubleshooting so I hope it helps someone**
