How to recover from FWS and DAG member failure in a 2-node DAG

November 21, 2017 at 12:28 pm

Hi Folks

Recently one of our customers was hit by malware, and most of their servers became unusable. The impact left the file share witness server (the FWS, which was a domain controller) and one of the Exchange nodes in their 2-node DAG environment unstable.

After assessing the impact, we decided to do the following:

  • Remove the failed node from the DAG, rebuild it from scratch, and add it back to the DAG.
  • Move the FWS to another server.

Unfortunately, we could not proceed as planned, because the cluster service on the remaining node could not form the cluster. With both the witness and one node down, the surviving node held only one of the three quorum votes. When I opened Failover Cluster Manager, I could not connect to the DAG cluster, because the node could not reach the quorum witness (in our case, the FWS). The command below confirmed this:

  • cluster node
    This shows the failed node as Down and the surviving DAG node in the Joining state.

To get past this, restart the cluster service on the surviving node and force quorum. Run the commands below on the Exchange server:

net stop clussvc

net start clussvc /fq

Boom.. everything returned to normal with Windows clustering on the remaining node (you can verify it with the same command, cluster node). I was able to connect to the DAG cluster from Failover Cluster Manager.
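On servers where cluster.exe is not installed, the FailoverClusters PowerShell module offers an equivalent check. This is a minimal sketch, assuming the module is available on the surviving node and the shell is running elevated:

```powershell
# Load the failover clustering cmdlets (assumes the module is installed on this node).
Import-Module FailoverClusters

# List each cluster node with its state; after forcing quorum the surviving node should report Up.
Get-ClusterNode | Format-Table Name, State -AutoSize

# Show the current quorum configuration, including the witness resource the cluster still points to.
Get-ClusterQuorum | Format-List *
```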

With the cluster restored, I had to move the FWS to another server, so I ran the command below to set the new FWS (Source: https://practical365.com/exchange-server/recovering-a-failed-exchange-2016-database-availability-group-member/)

Set-DatabaseAvailabilityGroup -Identity "DAG-Name" -WitnessDirectory c:\FWS -WitnessServer "New Server Name"
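To confirm that the DAG picked up the new witness, the DAG object can be queried with live status. A small sketch, reusing the "DAG-Name" placeholder from the command above and assuming it is run in the Exchange Management Shell:

```powershell
# -Status makes the cmdlet query real-time information, not just the stored configuration.
# WitnessShareInUse indicates which witness (if any) the DAG is currently using.
Get-DatabaseAvailabilityGroup -Identity "DAG-Name" -Status |
    Format-List Name, WitnessServer, WitnessDirectory, WitnessShareInUse, OperationalServers
```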

Now we were able to proceed with the remaining steps:
– Remove the mailbox database copies from the failed server
– Move the active mailbox databases from the failed server to the surviving server

The commands I used are:

  • Get-MailboxDatabaseCopyStatus -Server "Failed Exchange Server Name" | Remove-MailboxDatabaseCopy -Confirm:$false
  • Move-ActiveMailboxDatabase "Mailbox Database Name" -ActivateOnServer "Exchange Server Name" -SkipHealthChecks -SkipActiveCopyChecks -SkipClientExperienceChecks -SkipLagChecks -MountDialOverride:BESTEFFORT
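After the two commands above, it may be worth checking that every database is mounted on the surviving server before touching the DAG membership. A hedged sketch, where "Exchange Server Name" is the same placeholder for the surviving server:

```powershell
# Show which server hosts the active copy of each database and whether it is mounted.
Get-MailboxDatabase -Status | Format-Table Name, Server, Mounted -AutoSize

# List the database copies on the surviving server with their health and queue length.
Get-MailboxDatabaseCopyStatus -Server "Exchange Server Name" |
    Format-Table Name, Status, CopyQueueLength, ContentIndexState -AutoSize
```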

After that, you can proceed with the remaining steps below.

To remove the failed server from the DAG (the -ConfigurationOnly switch removes it from the DAG configuration in Active Directory without trying to contact the failed server, and it does not evict the node from the cluster):

  • Remove-DatabaseAvailabilityGroupServer -Identity "DAG Name" -MailboxServer "Failed Exchange Server Name" -ConfigurationOnly

Next, you need to evict the failed server from the cluster. To do that:

  • Get-ClusterNode "Failed Exchange Server Name" | Remove-ClusterNode
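A quick way to confirm that the failed server is gone from both the DAG and the cluster is to list the remaining members. This is only a sketch, using the "DAG Name" placeholder from above:

```powershell
# The Servers property should no longer list the failed Exchange server.
Get-DatabaseAvailabilityGroup -Identity "DAG Name" | Format-List Name, Servers

# The evicted node should no longer appear among the cluster nodes.
Import-Module FailoverClusters
Get-ClusterNode | Format-Table Name, State -AutoSize
```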

Once you have completed all the steps, the only thing left is to rejoin the rebuilt Exchange server to the same DAG. (Refer to the article: https://practical365.com/exchange-server/recovering-a-failed-exchange-2016-database-availability-group-member/)
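For orientation, the rejoin usually comes down to the cmdlets below. This is a rough sketch, not the exact procedure from the linked article: it assumes Exchange has already been reinstalled on the rebuilt server, and "Rebuilt Exchange Server Name" and "Mailbox Database Name" are placeholders.

```powershell
# Add the rebuilt server back to the DAG (this also joins it to the underlying cluster).
Add-DatabaseAvailabilityGroupServer -Identity "DAG Name" -MailboxServer "Rebuilt Exchange Server Name"

# Re-create a passive copy of each database on the rebuilt server; repeat per database.
Add-MailboxDatabaseCopy -Identity "Mailbox Database Name" -MailboxServer "Rebuilt Exchange Server Name"

# Watch the new copies seed and reach a healthy state.
Get-MailboxDatabaseCopyStatus -Server "Rebuilt Exchange Server Name"
```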

Hope this will help someone in a similar situation.

Good Luck

Muralee

Entry filed under: Exchange and O365.