Scratch location set to /scratch – vSAN node on SD card

At the place I work, we experienced continuous “The ramdisk ‘root’ is full” issues on our vSAN ESXi nodes.

The first thing we did was raise a support call and have VMware check what was filling up the ramdisk.

Support suggested that we limit the size of vsantraces to 200MB, and pointed us to the KB article below:

https://kb.vmware.com/kb/2150320

This puzzled me, as the vsantraces ramdisk was not full; it was the root ramdisk that sat at 100%.

Ramdisk      Size   Used  Available  Use%  Mounted on
root          32M    32M         0B  100%  --
etc           28M     5M        22M   18%  --
opt           32M   368K        31M    1%  --
var           48M   728K        47M    1%  --
tmp          256M   492K       255M    0%  --
iofilters     32M     0B        32M    0%  --
hostdstats  1553M    17M      1535M    1%  --
snmptraps      1M     0B         1M    0%  --
vsantraces   300M   167M       132M   55%  --
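To spot a full ramdisk quickly across several hosts, a small filter over this kind of listing can help. The sketch below is only an illustration: `full_ramdisks` is a hypothetical helper name, and it assumes the listing comes from `vdf -h` in the ESXi shell and keeps the column layout shown above.

```shell
# Hypothetical helper: print the name and Use% of every row whose
# fifth column is a percentage at or above the given threshold.
# Input is read from stdin in the column layout shown above.
full_ramdisks() {
    threshold="${1:-90}"
    awk -v t="$threshold" '
        $5 ~ /^[0-9]+%$/ {       # skip headers and rows without a Use% column
            pct = $5
            sub(/%/, "", pct)    # strip the percent sign before comparing
            if (pct + 0 >= t) print $1, $5
        }'
}

# Assumed usage on a host:
#   vdf -h | full_ramdisks 90
```

Note that it matches any row with a percentage in the fifth column, so other sections of the listing that share that layout are reported too.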

I kept digging and found that scratch on the hosts was not a symlink to /tmp/scratch (scratch -> /tmp/scratch). It was a plain directory on / instead, which would explain why the root ramdisk kept filling up.

drwxr-xr-x 1 root root 512 Sep 30 11:00 scratch
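The same check can be scripted instead of reading the `ls` output by eye. This is a minimal sketch: `check_scratch` is a hypothetical helper name, and it assumes `readlink` is available in the shell you run it from.

```shell
# Hypothetical helper: report whether a path is a symlink and, if so,
# where it points. Defaults to /scratch.
check_scratch() {
    path="${1:-/scratch}"
    if [ -L "$path" ]; then
        echo "$path -> $(readlink "$path")"
    else
        echo "$path is not a symlink"
    fi
}

# Assumed usage on a host:
#   check_scratch /scratch
```

On an affected host this should report that /scratch is not a symlink, matching the directory entry shown above.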

I changed ScratchConfig.ConfiguredScratchLocation under Advanced Settings, but the change did not persist after a reboot.

I raised another call with support and, after escalation to a senior engineer, we were pointed to a new KB article:

KB2151209

It seems that after upgrading the hosts with a custom HPE ESXi 6.5U1 image, we ran into the same issue that the article describes for the Dell EMC custom image.

I checked for the drivers and they were there, even though the card is not in use.

esxcli software vib list | grep elx

elx-esx-libelxima.so   11.2.1238.0-03                        ELX     VMwareCertified  2017-09-04
elxiscsi               11.2.1238.0-1OEM.650.0.0.4598673      EMU     VMwareCertified  2017-09-04
elxnet                 11.2.1149.0-1OEM.650.0.0.4240417      EMU     VMwareCertified  2017-09-04
emulex-esx-elxnetcli   11.1.28.0-1.26.5969303                VMware  VMwareCertified  2017-09-04

The solution to the issue is as follows:

  1. Stop hostd (this disconnects the host from vCenter)
    /etc/init.d/hostd stop
    
    watchdog-hostd: Terminating watchdog process with PID 70699
    hostd stopped.
  2. Remove the below drivers
    esxcli software vib remove -n elxiscsi -n elx-esx-libelxima.so
    
    Removal Result:
     Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
     Reboot Required: true
     VIBs Installed:
     VIBs Removed: ELX_bootbank_elx-esx-libelxima.so_11.2.1238.0-03, EMU_bootbank_elxiscsi_11.2.1238.0-1OEM.650.0.0.4598673
     VIBs Skipped:
  3. Start hostd (the host reconnects to vCenter)
    /etc/init.d/hostd start
    
    hostd started.
  4. Set ScratchConfig.ConfiguredScratchLocation to /tmp/scratch under Advanced System Settings
  5. Reboot the Host
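
Step 4 can also be done from the ESXi shell rather than the client. The sketch below wraps the `vim-cmd hostsvc/advopt` calls in two small functions. The function names and the `VIM_CMD` variable are hypothetical, introduced only for illustration, and this is not taken from the KB article, so treat it as an assumption and check it against your ESXi build before relying on it.

```shell
# Assumption: vim-cmd is on the PATH (true in the ESXi shell).
# VIM_CMD is a hypothetical override so the calls can be dry-run elsewhere.
VIM_CMD="${VIM_CMD:-vim-cmd}"

# Set the configured scratch location (takes effect after a reboot).
set_scratch() {
    location="${1:-/tmp/scratch}"
    "$VIM_CMD" hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string "$location"
}

# Show both the configured and the currently active scratch location.
show_scratch() {
    "$VIM_CMD" hostsvc/advopt/view ScratchConfig.ConfiguredScratchLocation
    "$VIM_CMD" hostsvc/advopt/view ScratchConfig.CurrentScratchLocation
}

# Assumed usage on a host, followed by the reboot in step 5:
#   set_scratch /tmp/scratch
#   show_scratch
```

After the reboot, the configured and current values should agree if the change has persisted.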

 

The issue is resolved, and scratch now persistently points to /tmp/scratch:

lrwxrwxrwx    1 root     root            12 Oct  3 15:29 scratch -> /tmp/scratch

 

Of course, this wouldn’t be an issue if ESXi were installed on magnetic disks, or if scratch were redirected to a datastore on shared storage. But these ESXi nodes form a vSAN cluster where vSAN is the only datastore.

 

 
