TechDevOps.com



Always On: The availability replica manager is going offline because the local Windows Server Failover Clustering (WSFC) node has lost quorum.
by BF (Principal Consultant; Architecture; Engineering)
2018-02-02









SQL Server Error Log:


Always On: The availability replica manager is going offline because the local Windows Server Failover Clustering (WSFC) node has lost quorum.

Always On Availability Groups: Local Windows Server Failover Clustering node is no longer online. This is an informational message only. No user action is required.

Failed to update Replica status within the local Windows Server Failover Clustering (WSFC) due to exception 41005.

The state of the local availability replica in availability group 'agXYZ' has changed from 'PRIMARY_NORMAL' to 'RESOLVING_NORMAL'. The state changed because the availability group is going offline. The replica is going offline because the associated availability group has been deleted, or the user has taken the associated availability group offline in Windows Server Failover Clustering (WSFC) management console, or the availability group is failing over to another SQL Server instance. For more information, see the SQL Server error log, Windows Server Failover Clustering (WSFC) management console, or WSFC log.

A connection timeout has occurred on a previously established connection to availability replica 'NodXYZ' with id [4EB963EF9EA]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.



Failover Cluster - Cluster Events:

Event ID: 1177
The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

Event ID: 1564
File share witness resource 'File Share Witness' failed to arbitrate for the file share '\\VM01\Quorum'. Please ensure that file share '\\VM01R\Quorum' exists and is accessible by the cluster.

Event ID: 1069
Cluster resource 'File Share Witness' of type 'File Share Witness' in clustered role 'Cluster Group' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Event ID: 1135
Cluster node 'NodXYZ' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
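
Events 1177 and 1564 together suggest why quorum was lost: the node could reach neither its peer nor the file share witness. A minimal sketch of the vote arithmetic follows. It assumes a hypothetical two-node cluster with a file share witness (the node count is not stated in the events above) and a plain static-majority rule that ignores dynamic quorum adjustments. The names has_quorum and TOTAL_VOTES are illustrative only.

```python
# Hypothetical layout: 2 node votes + 1 file share witness vote.
# The real vote count is an assumption; the events above do not state it.
TOTAL_VOTES = 3


def has_quorum(reachable_votes: int, total_votes: int = TOTAL_VOTES) -> bool:
    """Static majority rule: strictly more than half of all votes must be reachable.

    Dynamic quorum and dynamic witness behaviour are deliberately ignored here.
    """
    return reachable_votes > total_votes // 2


# The local node always counts its own vote.
scenarios = {
    "node + peer + witness": 3,
    "node + witness (peer unreachable)": 2,
    "node alone (peer and witness unreachable)": 1,
}

for description, votes in scenarios.items():
    state = "keeps quorum" if has_quorum(votes) else "loses quorum"
    print(f"{description}: {votes}/{TOTAL_VOTES} votes, {state}")
```

Under these assumptions, only the last scenario loses quorum, which would match a node that loses its network path to the other node and to the witness share at the same time.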



Cluster Log:

Note: Cluster log timestamps are in UTC (EST + 5 hours).
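
To line up cluster log entries with the SQL Server error log, the UTC timestamps have to be shifted back to local time. A small sketch of that conversion is below. It assumes a fixed UTC-5 offset as stated in the note above, so it does not handle daylight saving time, and the function name cluster_log_time_to_est is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset, matching the "EST + 5 hours" note. Daylight saving time is not handled.
EST = timezone(timedelta(hours=-5), name="EST")


def cluster_log_time_to_est(stamp: str) -> datetime:
    """Parse a Cluster.log timestamp (recorded in UTC) and return it in EST."""
    utc_time = datetime.strptime(stamp, "%Y/%m/%d-%H:%M:%S.%f").replace(tzinfo=timezone.utc)
    return utc_time.astimezone(EST)


local = cluster_log_time_to_est("2018/02/02-19:24:20.525")
# Trim microseconds to milliseconds to match the log's precision.
print(local.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3], local.tzname())
# prints: 2018-02-02 14:24:20.525 EST
```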

C:\Windows\Cluster\Reports\Cluster.log

Root Cause: Loss of network connectivity between the cluster nodes:

00000db4.000009d4::2018/02/02-10:36:51.678 INFO [IM] got event: LocalEndpoint x.x.x.17:~3343~ has missed two consecutive heartbeats from x.x.x.18:~3343~

00000db4.00000db0::2018/02/02-19:24:20.525 DBG [NETFTAPI] Signaled NetftRemoteUnreachable event, local address x.x.x.17:3343 remote address x.x.x.18:3343

00000db4.000009d4::2018/02/02-19:24:20.525 INFO [IM] got event: Remote endpoint x.x.x.18:~3343~ unreachable from x.x.x.17:~3343~

00000db4.000009d4::2018/02/02-19:24:20.525 INFO [IM] Marking Route from x.x.x.17:~3343~ to x.x.x.18:~3343~ as down
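
Entries like these can be hard to spot in a large Cluster.log. The sketch below shows one way to filter a log for them, assuming every entry follows the "process.thread::timestamp LEVEL [COMPONENT] message" layout seen above. The keyword list and the names LINE_RE, KEYWORDS and connectivity_events are illustrative choices, not part of any cluster tooling.

```python
import re

# Layout assumed from the entries above: pid.tid::timestamp LEVEL [COMPONENT] message
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]+)\.(?P<tid>[0-9a-f]+)::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<component>[^\]]+)\]\s+(?P<message>.*)$"
)

# Phrases taken from the entries above, matched case-insensitively.
KEYWORDS = ("missed two consecutive heartbeats", "unreachable", "as down")


def connectivity_events(lines):
    """Return (timestamp, level, component, message) for connectivity-related entries."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything not in the assumed layout
        message = match.group("message")
        if any(keyword in message.lower() for keyword in KEYWORDS):
            events.append(
                (match.group("ts"), match.group("level"), match.group("component"), message)
            )
    return events


SAMPLE = [
    "00000db4.000009d4::2018/02/02-10:36:51.678 INFO [IM] got event: LocalEndpoint x.x.x.17:~3343~ has missed two consecutive heartbeats from x.x.x.18:~3343~",
    "",
    "00000db4.00000db0::2018/02/02-19:24:20.525 DBG [NETFTAPI] Signaled NetftRemoteUnreachable event, local address x.x.x.17:3343 remote address x.x.x.18:3343",
    "00000db4.000009d4::2018/02/02-19:24:20.525 INFO [IM] got event: Remote endpoint x.x.x.18:~3343~ unreachable from x.x.x.17:~3343~",
    "00000db4.000009d4::2018/02/02-19:24:20.525 INFO [IM] Marking Route from x.x.x.17:~3343~ to x.x.x.18:~3343~ as down",
]

for ts, level, component, message in connectivity_events(SAMPLE):
    print(ts, level, component, message)
```

To run it against a real log, pass an open file handle for C:\Windows\Cluster\Reports\Cluster.log instead of SAMPLE. The file's encoding may need to be specified when opening it.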