Interesting one today:
In our lab environment, one of the SQL Server cluster environments ran into this error.
Error Message:
Clustered role 'Cluster Group' has exceeded its failover threshold. It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state. No additional attempts will be made to bring the role online or fail it over to another node in the cluster. Please check the events associated with the failure. After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period. The Cluster service failed to bring clustered role 'Cluster Group' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
Resolution:
Errors like this are more common in lab environments than in production. If you encounter the same error in a production environment, take extra caution before following these steps.
Possible Root Cause:
In the lab, sometimes as part of some other effort, we inadvertently end up failing over the cluster several times within a short period of time. The cluster counts how many times a role fails over within a configured period and compares that count against a threshold setting.
- If the count exceeds that threshold, the cluster flags the resource group as being in a ‘Failed’ state
- It also logs an entry in the Cluster Events stating that the resource group failed after reaching the threshold (see the error message: Clustered role ‘Cluster Group’ has exceeded its failover threshold)
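To make the threshold behavior concrete, here is a minimal Python sketch of a sliding-window failover counter. It is a simplified model, not the Cluster service's actual implementation; `exceeds_failover_threshold`, `threshold` and `period_hours` are hypothetical names standing in for the role's "Maximum failures in the specified period" and its period setting.

```python
from datetime import datetime, timedelta


def exceeds_failover_threshold(failovers, now, threshold, period_hours):
    """Return True if the role would be left failed under this simplified model.

    failovers    -- list of datetime objects, one per failover of the role
    threshold    -- hypothetical stand-in for "Maximum failures in the specified period"
    period_hours -- hypothetical stand-in for the period (in hours) the count covers
    """
    window_start = now - timedelta(hours=period_hours)
    # Only failovers inside the sliding window count toward the threshold.
    recent = [t for t in failovers if window_start <= t <= now]
    # More failovers inside the window than the threshold allows -> left failed.
    return len(recent) > threshold


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    # Four failovers in the last hour, e.g. from repeated manual testing in the lab.
    history = [now - timedelta(minutes=m) for m in (5, 15, 30, 45)]
    print(exceeds_failover_threshold(history, now, threshold=3, period_hours=6))   # True
    print(exceeds_failover_threshold(history, now, threshold=10, period_hours=6))  # False
```

With four failovers in the last hour, a threshold of 3 leaves the role failed, while a larger threshold (the idea behind the resolution steps) does not. In this model, failovers older than the period fall out of the window and stop counting.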
Resolution Steps:
According to this MSDN post, we can raise that failover count threshold so the resource group can come back online in a healthy state.
Step 1:
Go to Failover Cluster Manager >> Roles >> right-click the resource group and go to Properties:
- Change Maximum failures in the specified period to a larger number to account for the repeated failovers within the recent period.
Step 2:
Go to Failover Cluster Manager >> Roles >>
In the bottom portion of the window, where the individual resources are listed, right-click the resource that is in a failed state and go to Properties:
- Increase the Maximum restarts in the specified period setting to account for the recent restarts.
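The resource-level setting in Step 2 works one layer below the role. Here is a simplified, hypothetical Python model of it (`ResourceModel`, `max_restarts` and `period_s` are illustrative names, not cluster APIs): each failure is handled by an in-place restart until the restart budget for the period is used up, after which the resource is marked failed and, in this model, control passes to the role-level failover policy.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceModel:
    """Hypothetical model of one clustered resource's restart policy.

    max_restarts -- stand-in for "Maximum restarts in the specified period"
    period_s     -- stand-in for the restart period, in seconds
    """
    max_restarts: int
    period_s: float
    restarts: list = field(default_factory=list)

    def on_failure(self, t: float) -> str:
        # Forget restarts that have aged out of the sliding window.
        self.restarts = [r for r in self.restarts if t - r < self.period_s]
        if len(self.restarts) < self.max_restarts:
            # Still within budget: restart the resource in place.
            self.restarts.append(t)
            return "restart"
        # Budget exhausted: the resource is marked failed, which in this
        # model hands over to the role-level failover threshold.
        return "fail"


if __name__ == "__main__":
    r = ResourceModel(max_restarts=1, period_s=900)
    print([r.on_failure(t) for t in (0, 60, 120)])  # ['restart', 'fail', 'fail']
```

With a budget of one restart, the second failure within the period already fails the resource; raising `max_restarts` to 3 lets all three failures be absorbed by in-place restarts.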
NOTE: This is not a standard solution for production environments.
_Sqltimes