When a NetScaler VPX begins reporting hard drive errors, there are several possible causes, described below.
You can usually display these hard disk drive errors by running the command ns_hw_err.bash from the bash shell.
This script reports any issues it detects with the hardware.
Because the VPX is a virtual machine, the HDD errors are the only part of this output that matters.
The errors generally look like this:
opctxns02d kernel: g_vfs_done():da0s1e[WRITE(offset=3743465472, length=32768)]error = 5
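NetScaler runs on a FreeBSD-based OS, and the line above appears to be a FreeBSD GEOM g_vfs_done() message: a WRITE request of 32768 bytes at byte offset 3743465472 on partition da0s1e failed with error 5, which is EIO (input/output error) on FreeBSD. When you have collected many of these lines, pulling out those fields makes it easier to see whether the failures cluster on one device or region. The Python sketch below is a minimal parser that assumes the exact line format shown; the names GVfsError, parse_g_vfs_error and describe are hypothetical and are not part of any NetScaler tooling.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed format, based on the sample line above; other releases may differ:
# g_vfs_done():da0s1e[WRITE(offset=3743465472, length=32768)]error = 5
G_VFS_RE = re.compile(
    r"g_vfs_done\(\):(?P<device>\S+?)\["
    r"(?P<op>[A-Z]+)\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]"
    r"\s*error\s*=\s*(?P<error>\d+)"
)

# Only the code seen in the sample is mapped; on FreeBSD, 5 is EIO.
FREEBSD_ERRNO = {5: "EIO"}


@dataclass
class GVfsError:
    device: str   # GEOM provider/partition, e.g. da0s1e
    op: str       # request type, e.g. READ or WRITE
    offset: int   # byte offset of the failed request
    length: int   # request size in bytes
    error: int    # error code reported by the disk layer


def parse_g_vfs_error(line: str) -> Optional[GVfsError]:
    """Return the parsed fields, or None if the line is not a g_vfs_done error."""
    m = G_VFS_RE.search(line)
    if m is None:
        return None
    return GVfsError(m["device"], m["op"], int(m["offset"]),
                     int(m["length"]), int(m["error"]))


def describe(err: GVfsError) -> str:
    """Render a parsed error as a short human-readable sentence."""
    name = FREEBSD_ERRNO.get(err.error, f"errno {err.error}")
    return (f"{err.op} of {err.length} bytes at offset {err.offset} "
            f"on {err.device} failed: {name}")


if __name__ == "__main__":
    sample = ("opctxns02d kernel: g_vfs_done():da0s1e"
              "[WRITE(offset=3743465472, length=32768)]error = 5")
    err = parse_g_vfs_error(sample)
    if err is not None:
        # prints: WRITE of 32768 bytes at offset 3743465472 on da0s1e failed: EIO
        print(describe(err))
```

Lines that do not match the pattern return None, so you can feed a whole log through the parser and keep only the HDD errors.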
When these errors are reported on a physical appliance, they indicate that the HDD needs to be replaced.
When these errors are reported on a virtual appliance, they could be caused by any of the following:
1) The physical storage could be damaged. Are other virtual appliances that use these physical disks also reporting errors?
If you set up a new VPX on these physical disks, does it report the same errors?
2) There could be a timeout between the VPX and the physical storage. If read/write requests between the VPX and the physical storage take too long, the NetScaler reports them as errors.
In this case, check the storage connections in this environment.
3) The VPX instance or its virtual disk could be corrupted. If you create another VPX instance on the same physical disk array, using the same connection methods, and this new appliance does not report hard drive errors, then the physical storage is not the issue.
Instead, this points to corruption in the VPX instance or its virtual disk.
Check the integrity of the virtual disk at the hypervisor level.
If the virtual disk is fragmented or corrupted, the best approach is to create a new instance and move all the settings and licenses to the new VPX.
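The checks in 1) to 3) amount to a small decision procedure. The Python sketch below encodes it as a hypothetical helper (the names Likely and likely_cause are illustrative, not part of NetScaler); its inputs are observations you gather manually, such as whether a test VPX built on the same disks and connection method also reports errors. Causes 1) and 2) are deliberately not separated, because both point at the storage side; telling them apart still means checking the physical disks and the storage connections.

```python
from enum import Enum
from typing import Optional


class Likely(Enum):
    REPLACE_HDD = "Physical appliance: replace the HDD"
    STORAGE_OR_PATH = "Damaged physical storage or storage connection timeouts (causes 1 or 2)"
    VPX_OR_VDISK = "Corruption in the VPX instance or its virtual disk (cause 3)"
    UNDETERMINED = "Not enough information: build a test VPX on the same storage"


def likely_cause(is_virtual: bool,
                 other_vms_report_errors: Optional[bool] = None,
                 test_vpx_reports_errors: Optional[bool] = None) -> Likely:
    """Map manual observations to the most likely cause (None = not checked yet)."""
    # On a physical appliance, HDD errors mean the drive needs replacing.
    if not is_virtual:
        return Likely.REPLACE_HDD
    # Errors on other guests using the same disks, or on a fresh VPX built on
    # the same storage, implicate the storage or the path to it.
    if other_vms_report_errors or test_vpx_reports_errors:
        return Likely.STORAGE_OR_PATH
    # A fresh VPX on the same storage and connection method stays clean,
    # so the original instance or its virtual disk is the suspect.
    if test_vpx_reports_errors is False:
        return Likely.VPX_OR_VDISK
    return Likely.UNDETERMINED


if __name__ == "__main__":
    verdict = likely_cause(is_virtual=True,
                           other_vms_report_errors=False,
                           test_vpx_reports_errors=False)
    print(verdict.value)
```

Leaving an observation as None marks it as not yet checked, so the helper returns UNDETERMINED until a test VPX has been built on the same storage.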