A few months back, the vSAN 6.7 release brought a much-awaited feature: support for WSFC (Windows Server Failover Cluster) with the vSAN iSCSI Target Service. The iSCSI service itself was introduced in vSAN 6.5 and has been growing in popularity ever since.
As of this release, fully transparent failover of LUNs is possible with the iSCSI service for vSAN when used in conjunction with WSFC. This feature is incredibly powerful because it protects against scenarios in which the host serving a LUN’s I/O fails. The failure might occur for any reason: power, hardware failure or link loss. In these scenarios, the I/O path now transparently fails over to another host with no impact on the application running in the WSFC.
In my earlier blog posts, I have already discussed the vSAN iSCSI target in depth, including failover scenarios. Let’s begin with WSFC on a vSAN datastore.
Let us take a look at my environment first:
- WSFC has been configured with two nodes (wsfcp & wsfcs)
- The cluster is acting as a file server and serving the share \\accesspoint\basic
- vSAN iSCSI Target has been configured on vSAN 6.7 with the details below:
  - Target Name: vsanfailover
  - lun0 & lun1 with the vSAN default storage policy
  - I/O owner is blr3.vhabit.com
- A workstation, “blrclient”, is acting as the client
- Network drive F:\ is mapped to basic (\\accesspoint)
With the above screenshots, I hope you have a clear picture of the environment, which will help in following the tests below.
How does it work?
The procedure is very simple, so let me summarize it in a few steps:
- Create the iSCSI target and LUNs based on the cluster role you will be using (I am using the file server role, which requires at least two LUNs: a 5GB LUN acting as the disk witness and a 10GB LUN available for storage).
- Install the Failover Clustering and MPIO features on each Windows node.
- Enable the iSCSI initiator on both Windows nodes and discover the target portals.
- Create a heartbeat network for internal communication between the nodes.
- Configure MPIO and validate the failover cluster.
- Create the failover cluster and add the file server role.
- Create an SMB share and map it on the workstations.
WSFC depends heavily on the disk witness, and in our environment lun0 serves as the disk witness. As you know, the disk witness is the quorum tiebreaker.
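To make the tiebreaker role concrete, here is a minimal Python sketch of majority-vote quorum. It assumes one vote per node plus one vote for the disk witness. The function name and the vote model are illustrative assumptions only; this is not how WSFC is implemented internally, and it ignores features such as dynamic quorum.

```python
def has_quorum(nodes_up, total_nodes, witness_online, witness_configured=True):
    """Return True if the surviving votes form a strict majority.

    Assumption: every node has one vote and the disk witness (if configured)
    has one vote. This is a simplified model for illustration.
    """
    total_votes = total_nodes + (1 if witness_configured else 0)
    votes_up = nodes_up + (1 if witness_configured and witness_online else 0)
    # Strict majority: more than half of all configured votes must be present.
    return votes_up * 2 > total_votes


# Two-node cluster (wsfcp, wsfcs) with lun0 as the disk witness:
print(has_quorum(nodes_up=1, total_nodes=2, witness_online=True))   # one node + witness
print(has_quorum(nodes_up=1, total_nodes=2, witness_online=False))  # one node, witness lost
```

In this model, a surviving node keeps the cluster running only as long as it can still reach the witness LUN, which is why uninterrupted access to lun0 during a vSAN target failover matters so much.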
Let’s simulate a failover and check how the WSFC nodes react and what happens at the client end. To simulate this, I start a copy on the client computer from one network drive to the shared drive, and then reset the host whose target portal is serving I/O for both LUNs.
This screenshot shows that wsfcp is the owner node for the quorum and storage disks in the Windows cluster. wsfcp is the active node and wsfcs is passive.
The Windows cluster disk management properties are shown below.
The iSCSI initiator has been configured with MPIO, and the MPIO settings are shown below. We have added two target portals, each exposing two LUNs (lun0 & lun1). The initiator has sessions with both target portals and is configured in Fail Over Only mode, so only one session is active at a time. If the active session (target) goes down, the standby session takes over.
Note: One target portal = one session (you can create multiple sessions per target).
The screenshot below shows that the session whose ID ends in 09 is associated with target portal 192.168.2.103 (blr3.vhabit.com).
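The Fail Over Only behaviour described above can be sketched as a toy model in Python: one path is active, the others stay on standby, and a standby is promoted only when the active path fails. The class and method names are hypothetical, and the second portal address (192.168.2.101) is an assumed placeholder, since only blr3's address appears in this post. This is not the actual Microsoft DSM logic.

```python
class FailoverOnlyPaths:
    """Toy model of an MPIO 'Fail Over Only' policy (illustrative only)."""

    def __init__(self, portals):
        if not portals:
            raise ValueError("at least one target portal is required")
        self.portals = list(portals)      # list order = path preference
        self.healthy = set(self.portals)  # portals currently reachable
        self.active = self.portals[0]     # exactly one active path at a time

    def mark_down(self, portal):
        """Record a failed portal; promote a standby if it was the active one."""
        self.healthy.discard(portal)
        if portal == self.active:
            self.active = next(
                (p for p in self.portals if p in self.healthy), None
            )

    def mark_up(self, portal):
        """Record a recovered portal; it becomes a standby (no automatic failback)."""
        if portal in self.portals:
            self.healthy.add(portal)
            if self.active is None:
                self.active = portal

    def send_io(self):
        """Return the portal that would carry the next I/O."""
        if self.active is None:
            raise ConnectionError("no healthy path to the target")
        return self.active


paths = FailoverOnlyPaths(["192.168.2.103", "192.168.2.101"])  # second IP is assumed
print(paths.send_io())           # active path before the failure
paths.mark_down("192.168.2.103")
print(paths.send_io())           # standby promoted after the failure
```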
So far, it is clear that the I/O target is blr3 and that this target has a session with the initiator. This session is active and is serving access to the LUNs. We need to see how the nodes tolerate the failure if target blr3 goes down. Hence, I have started the copy on the client workstation, which will tell us how seamless the failover under the hood really is.
I have introduced a failure on blr3, and vSAN now transfers the target ownership to another host. As you can see below, the copy on the client workstation stalls for a few seconds. This means the transfer is taking place under the hood while access to the disk witness is maintained for the Windows cluster.
In the Web Client, we can see that blr3 is not responding and that target ownership for the LUNs below has been transferred to blr1.vhabit.com, which is now the new target owner.
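If you would rather put a number on that pause than eyeball the copy dialog, a small script on the client can time each write to the mapped drive. The following is a minimal Python sketch under stated assumptions: the function name and the test file name are hypothetical, and the block count and block size are arbitrary values, not tuned ones.

```python
import os
import time


def longest_write_stall(path, blocks=50, block_size=64 * 1024):
    """Write `blocks` chunks to `path` and return the slowest single write in seconds."""
    payload = b"\0" * block_size
    worst = 0.0
    with open(path, "wb") as fh:
        for _ in range(blocks):
            start = time.monotonic()
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # push the block to storage so a stall is visible
            worst = max(worst, time.monotonic() - start)
    return worst
```

For example, running `longest_write_stall(r"F:\stall_test.bin", blocks=2000)` on the client while the I/O owner host is being reset should return a value roughly equal to the length of the pause seen in the copy dialog.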
The copy then continued and finished. Now, let’s look at the errors on the Windows nodes. The Event Viewer screenshots below show that the initiator lost its connection to target blr3 when that host failed and, based on the MPIO configuration, connected through the other target portal by promoting it from standby to active. The request is then forwarded to the new target owner via iSCSI redirect.
I hope this post has been informative for you.
For guidance on designing a SQL WSFC cluster, please check the KB here: https://kb.vmware.com/s/article/54461
Thanks for reading!!