Tuesday, May 30, 2017
Inducing Failures on a Virtual SAN cluster
The coolness of python scripts.
One of the many tools available for testing Virtual SAN is a buried python script called vsan.DiskFaultInjection.pyc. Located in the /usr/lib/vmware/vsan/bin directory, this utility can generate permanent or transient errors. Furthermore, it can emulate unplugging of disks.
Using the -h option (for help), an administrator can see the options available for this command. Only to be used pre-production, this script can generate failures to allow the user to understand what happens in such cases.
Below is an example of what happens when a capacity disk is affected by such permanent failure. In the case of a raid5 virtual machine, the virtual machine would continue to run. If enough servers and/or disks are available, the rebuilding of the date would take place immediately. The -p option is used for permanent errors and the -u option to unplug a disk.
Errors would be seen everywhere, notice the capture below.
The -c (clear option) is used to remove the permanent error. If using the -u option, simply rescan the storage via esxcfg-rescan -A.