Citrix MCS NO_HOST_AVAILABLE

I’m currently working on a project where I’m migrating the customer’s users to a newly created Citrix Virtual Apps and Desktops (CVAD) environment based on Windows Server 2019 combined with Nvidia GPUs. The environment is built on Citrix Machine Creation Services (MCS). In the past, the customer decided to run their VMs on local storage, which is a fairly normal choice in a small environment. Throughout the project, we needed to update the machine catalogs with new versions of the master image. Luckily, this is all code-based, so every image is recreated from scratch, but after every change we needed to update the Machine Catalog using MCS. That’s usually not an issue if you use shared storage and have plenty of resources.

As you can guess, this isn’t the case with this customer. They have six hosts with GPUs and, as mentioned before, use local storage. Within Citrix Studio, you create a Hosting Connection where you specify the storage to be used. Because the customer uses local storage, you must select the local storage repository (SR) of every host.

Hosting Connection Citrix Studio

When you update a Machine Catalog with a new version of the master image (snapshot), the MCS process starts a preparation VM once the copy task has completed. This VM is based on the original VM (the snapshot that was just copied), so it has the same memory, GPU, and CPU configuration, but its network is disconnected. This is where the problem starts: because Citrix Studio can’t look at the available resources within a Hosting Connection, it starts the preparation VM on the first available host within the cluster or pool. When that host has no resources available, you get the error NO_HOST_AVAILABLE.

NO_HOST_AVAILABLE
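If you run into this error and want to see which host still has the memory headroom to start the preparation VM, you can check this from the pool console. A minimal sketch, assuming Citrix Hypervisor/XenServer and the standard xe CLI:

# Show each host in the pool with its free memory (in bytes),
# so you can spot which one has room for the preparation VM
xe host-list params=name-label,memory-free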

Solution

As mentioned in this Support Article, Citrix cannot look at the available resources from the hosting connection, so you need to look at other options. In our case, as we have a static environment and use GPUs in passthrough mode, we selected one host that has the resources available to start the preparation VM. To make sure the preparation VM is always started on that host, the only option is to edit the hosting connection and select only the SR of that host. After this, all preparation VMs start on that host, and every Machine Catalog update is successful. The screenshot shows that the host used for the preparation VM is CH4 (Citrix Hypervisor).

Hosting Connection Citrix Studio

Conclusion

Every company likes to use all the resources available within its environment. Of course, they know they need to update the master image and need resources for the preparation VM. Still, it’s a pity that it’s not possible to specify one host within Studio to act as the preparation VM host. Suppose you only have shared storage combined with GPUs and/or limited memory and don’t use the local storage within the hypervisor. Then you’re out of luck: the preparation VM may be started on a host that currently has no resources available, and your Machine Catalog update will fail.

As a feature request for Citrix: as long as they can’t check the available resources for a Hosting Connection, it would be nice if you could specify a host within the hosting connection that will be used to run the preparation VM.

Creating a Memory Dump with Citrix MCS/PVS and the Hypervisor

When you need to debug issues with your current deployment, you’ll probably be asked to create a memory dump. When your deployment is a traditional VM, that’s no issue. If you are using PVS, however, the creation of memory dumps is probably disabled, either by the PVS optimizations or by the Citrix Optimizer, which disables it as well.

Just enabling the creation of a memory dump isn’t enough; you also need to specify a location where the memory dump will be created. Citrix has published separate articles on how to create a memory dump for PVS and MCS: CTX127871 and CTX261722.

PVS

When using PVS, you probably enabled “Cache in device RAM with overflow on hard disk” and added a dedicated Write Cache Disk (WCD) for the vdiskdif.vhdx file. Because all the best practices tell you to move the page file and the log files to a dedicated disk, you probably use the WCD as that alternative location.

When enabling memory dump creation, you don’t want the WCD to fill up because the DedicatedDumpFile is created on that disk. I recommend using a separate disk just for the memory dump; I make that disk 1 GB bigger than the memory assigned to the VM. In environments where we have 40-60 VMs, I don’t assign a memory dump disk to every device, because I don’t have the storage available for that.

After creating the dedicated disk for the memory dump and rebooting your VM, you will notice that the vdiskdif.vhdx is now located on the new disk. To make sure this doesn’t happen again, create a file with the following name and place it in the root of the new disk: {9E9023A4-7674-41be-8D71-B6C9158313EF}.VDESK.VOL.GUID. See CTX218221.
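Creating that marker file can be done from an elevated command prompt inside the image. A minimal sketch, assuming E: is the dedicated memory dump disk used later in this article:

rem Empty marker file in the root of the dump disk, so the PVS write
rem cache (vdiskdif.vhdx) no longer lands on this volume (CTX218221)
type NUL > "E:\{9E9023A4-7674-41be-8D71-B6C9158313EF}.VDESK.VOL.GUID"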

Enabling the Memory Dump

I always set the location for the memory dump to the E: drive; for both PVS and MCS I create a dedicated disk for it, as mentioned earlier.

To enable the creation of the memory dump, add the following registry values under the key below, as explained in the previously mentioned articles:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl

  • AutoReboot
  • CrashDumpEnabled
  • LogEvent
  • Overwrite
  • DumpFileSize
  • IgnorePagefileSize
  • DedicatedDumpFile
  • AlwaysKeepMemoryDump

I attached a TXT file that I always use to set these values correctly; you can find it at the end of this article. Once you have downloaded it, rename it to .reg and you can merge the settings.
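For reference, a minimal sketch of what such a reg file can look like. The values below are assumptions based on CTX127871; adjust the dump type and drive letter to your environment:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl]
; 1 = complete memory dump (2 = kernel, 3 = small)
"CrashDumpEnabled"=dword:00000001
"AutoReboot"=dword:00000001
"LogEvent"=dword:00000001
"Overwrite"=dword:00000001
; 0 = let Windows determine the size of the dump file
"DumpFileSize"=dword:00000000
"IgnorePagefileSize"=dword:00000001
; place the dedicated dump file on the dedicated disk (E: in this article)
"DedicatedDumpFile"="E:\\DedicatedDumpFile.sys"
"AlwaysKeepMemoryDump"=dword:00000001
; depending on your setup, you may also want to redirect DumpFile to this disk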

Creating a Memory Dump

Because we mostly work with Citrix Hypervisor (previously called XenServer), I have written down the steps to create a memory dump on this hypervisor. When I have to create a memory dump on VMware or Hyper-V, I will write those steps down here too.

OK, we now have a situation where you need to create a memory dump of the VM. After identifying the VM, write down or copy its UUID.

Getting the UUID of the VM

Then run the following command in the console of the hypervisor, where <VM_UUID> is the UUID you just wrote down or copied:

list_domains | grep -i <VM_UUID>

This returns a number, which you have to write down, as you can see in the following screenshot.

Receiving the Domain ID (239)

Then run the following command, where <domain_ID> is the number you got in the previous step.

xen-hvmcrash <domain_ID>

For this example that would be: xen-hvmcrash 239

Making the VM crash 🙂

Now the machine will crash, and you can watch the BSOD in the console. After the machine has rebooted, you can find the memory dump on the E: drive. You can copy the memory dump and upload it, or analyze it yourself.
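A quick check from a command prompt inside the VM confirms the dump is there, assuming the dump location on the E: drive described above:

rem List any memory dumps on the dedicated dump drive
dir E:\*.dmp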

When you have questions regarding these steps, please let me know.

Below is the reg file; it’s a txt file within a zip file. You have to rename the txt to reg.

Upgrading Nvidia firmware

During the last couple of days I needed to update the firmware of the Nvidia Tesla T4 cards in our servers. While following the installation steps provided by HPE I ran into some issues, so I decided to create a step-by-step guide on how to update the firmware.

  1. Download the latest firmware from your vendor
  2. Upload the RPM file to /usr/local/bin using WinSCP or your favorite tool
  3. Connect using SSH to the host
    1. Change to that directory: cd /usr/local/bin
    2. Unpack the RPM file using the following command: rpm -ivh ./Tesla_T4_90.04.96.00.01-1-0.x86_64.rpm
      The RPM file name can be different when upgrading a newer version or other Nvidia card.
    3. Go to the folder where the RPM file was extracted; in this case that is the Tesla_T4_90.04.96.00.01 folder: cd /usr/local/bin/Tesla_T4_90.04.96.00.01/
    4. Make the firmware installer executable:
      chmod +x Tesla_T4_90.04.96.00.01.scexe
    5. Make sure all Nvidia kernel modules are unloaded:
      init 3
      rmmod nvidia
    6. When you get the following error:
      ERROR: Module nvidia is in use
      run the following command:
      service xcp-rrdd-gpumon stop
      and then run:
      rmmod nvidia
    7. Now we can upgrade the firmware using the following command:
      ./Tesla_T4_90.04.96.00.01.scexe -f
      The SCEXE file name can be different when upgrading a newer version or other Nvidia card.
      Use -i instead of -f if you want to confirm the upgrade for each card in the host individually.
  4. When all the cards are upgraded, reboot the host and continue with the next host. For reference, the complete sequence is condensed below.
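Here is the whole sequence in one run. This is a sketch based on the steps above; the file names will differ for other cards or firmware versions:

cd /usr/local/bin
rpm -ivh ./Tesla_T4_90.04.96.00.01-1-0.x86_64.rpm
cd Tesla_T4_90.04.96.00.01
chmod +x Tesla_T4_90.04.96.00.01.scexe
init 3                               # switch to runlevel 3
service xcp-rrdd-gpumon stop         # stop the GPU monitor so the module can be unloaded
rmmod nvidia                         # unload the Nvidia kernel module
./Tesla_T4_90.04.96.00.01.scexe -f   # flash all cards; use -i to confirm per card
reboot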

Good luck with upgrading; as you can see, it’s easy.