In this blog post we are going to discuss one of the most common situations you may face in a production environment regarding vSAN cluster consistency. This health check validates that the hosts in the cluster, and their corresponding disks, have a configuration consistent with the cluster.
While working on a vSAN production environment, I noticed that the vSAN cluster consistency health check was showing a warning in vSAN health. When I checked the reason behind the inconsistency, I found that the network configuration was out of sync. Given that, only two suspects came to mind: a.) the vSAN VMkernel port, and b.) the unicastagent list.
The screenshot below shows the error message:
In the screenshot above, the error is highlighted and the affected hostname is listed against it.
To resolve this issue, you can directly click the “Remediate inconsistent configuration” icon at the top right. However, before doing so, you should check what is actually misconfigured.
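A quick first check is whether the host is network-partitioned at the vSAN layer. The snippet below is a small sketch that parses the "Sub-Cluster Member Count" field from the output of `esxcli vsan cluster get`; that field name is an assumption based on recent ESXi releases, so verify it against your build.

```shell
# Sketch: extract the number of cluster members this host can see.
# A partitioned host reports fewer members than the cluster size.
member_count() {
  awk -F': ' '/Sub-Cluster Member Count/ {print $2}'
}

# Usage on an ESXi host (not executed here):
#   esxcli vsan cluster get | member_count
```

If the count matches the cluster size, partitioning is unlikely to be the cause and the unicastagent list is the next thing to inspect.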
I initially suspected the vSAN VMkernel port, but if that were the cause the host would have been in a partitioned state, which was not the case here. Next, I listed the unicast agents by running the command below on this host:
[root@ESXI001:~] localcli vsan cluster unicastagent list
NodeUuid                     IsWitness  Supports Unicast  IP Address   Port   Iface Name
-------------------------------------------------------------------------------------------------
53edfd6f-d759-xxxxxxxxxxx    0          true              10.10.10.30  12321
53ee5c46-a821-xxxxxxxxxx     0          true              10.10.10.31  12321
599642d7-a4bc-xxxxxxxxxx     0          true              10.10.10.32  12321
5ad8a937-e78f-xxxxxxxxxxx    0          true              10.10.10.33  12321  vmk0
58e215a5-7243-xxxxxxxxxx     0          true              10.10.10.34  12321  vmk0
In this 5-node vSAN cluster, 5 unicast entries were listed on this host, while the other hosts showed only 4 entries each, which is correct. As you know, the unicast agents form the cluster membership, and each node should hold entries only for its neighbors, never for itself. In this case, the host carried a stale entry for its own local unicast agent, and that extra entry caused the inconsistency in the cluster.
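The stale entry can be spotted mechanically: a host's own node UUID (the "Local Node UUID" reported by `esxcli vsan cluster get`) should never appear in its own unicastagent list. The sketch below assumes that field name and the listing format shown above.

```shell
# Count how many lines of a unicastagent listing begin with the given
# node UUID; anything above 0 means the host lists itself, which is a
# stale entry.
self_entries() {
  grep -c "^$1" || true   # grep -c exits non-zero on 0 matches
}

# Usage on an ESXi host (not executed here):
#   uuid=$(esxcli vsan cluster get | awk -F': ' '/Local Node UUID/ {print $2}')
#   esxcli vsan cluster unicastagent list | self_entries "$uuid"
```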
So this entry needs to be removed, and there are two ways to do it. First, you can manually remove the entry by running the commands below; second, you can directly remediate the cluster from vCenter, which corrects things automatically. In either case, you first need to check the advanced parameter that controls whether the host accepts cluster member list updates pushed by vCenter.
[root@ESXI001:~] esxcfg-advcfg -g /VSAN/IgnoreClusterMemberListUpdates
Value of IgnoreClusterMemberListUpdates is 1

On this host the value was 1, which makes the host ignore member list updates pushed by vCenter.
Run the command below to set it to 0, which allows the configuration to be updated:
[root@ESXi001:~] esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates
Value of IgnoreClusterMemberListUpdates is 0
Now you can remediate the cluster by clicking “Remediate inconsistent configuration”, or by running the following command:
esxcli vsan cluster unicastagent remove -t node -a 10.10.10.34
Note: Make sure the host is in maintenance mode with the “Ensure accessibility” option before performing this activity. Once the activity is complete, change the IgnoreClusterMemberListUpdates value back to 1.
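The revert step from the note above is the mirror of the earlier set command, restoring the value this host had before the change:

```shell
# Restore the previous value on this host after remediation completes
esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates
```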
I hope this blog post has been informative for you. Thank you for reading!!