Tuesday, May 11, 2010

Exchange 2010 Site Disaster Recovery on a Dime! Part 3: Backup, Restore, Recovery

I began this series by explaining how to build a low-cost site or datacenter disaster recovery solution with the new Database Availability Group (DAG) feature in Exchange 2010. Next, I covered the process of failing over to your other site in case of a disaster. Naturally, I hope you never experience a disaster in your Exchange environment, but if you do, this article will help ensure you are prepared.

In Part 3, I will walk you through the steps of performing a Backup, Restore, and finally a Recovery. While it is important to know how to do a proper backup, it is equally if not more important to be able to use it in case disaster strikes.

The Backup Process
Performing a backup of Exchange 2010 databases is not that difficult; just make sure that your backup software uses VSS technology since the traditional streaming backup API is not available in Exchange 2010.

The built-in Windows Server Backup has this capability, but it lacks many other functions that a real backup solution has. Therefore, I tend to rank Windows Server Backup as a poor man’s backup software. See my earlier post about Windows Server Backup and Exchange 2007 for more details. While the article is written for Exchange 2007, it is also applicable to Exchange 2010.

Other vendors are working on their backup software to ensure its compatibility with Exchange 2010 and some already have it working. An example is Microsoft’s Data Protection Manager 2010.

No matter which backup software you use, the steps for doing a backup are essentially the same. Backup software communicates with VSS, which in turn communicates with the Exchange Writer that is installed during the Exchange installation. During this process, only changed blocks on disk are transferred to the backup software, which is responsible for saving and storing data for later retrieval. Transferring only changed blocks reduces both the backup time and the number of bytes on the wire.

In the example we have been using for this series, we have one server running in the primary datacenter and another server in the Disaster Recovery (DR) datacenter.
The question arises: ‘Where do I do the backup -- on one or both servers?’ The answer is ‘It depends.’ (Don’t you love this answer!)

Your options are:
1. Back up the active database on one server and the database copy on the other server.
2. Back up only one server (but which one?).

For the Exchange admin who has been around Exchange for a while, the question about purging transaction log files always comes up. The beauty of the DAG design in Exchange 2010 can be seen during this process: when you back up a database in a database availability group (DAG), the corresponding transaction log files are automatically purged on all replicas of that database. The server running the database where the backup is performed tells the other servers holding a copy of that database that a backup has been done and that it is now time to purge transaction log files. Which files are purged depends on several factors, such as the checkpoint, the replay lag time, and the truncation lag time. Thus, you should not expect them all to be purged by a normal full backup. With this in mind, make sure to size your transaction log LUN correctly if you use replay lag time and truncation lag time.
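To make the sizing advice a little more concrete, here is a rough back-of-the-envelope sketch in Python. The daily log count, the lag values, and the safety factor are all hypothetical inputs, and the formula (keep logs for the backup interval plus both lag windows) is a simplifying assumption of mine, not an official sizing rule. The only fixed fact is that Exchange 2010 transaction log files are 1 MB each.

```python
LOG_FILE_MB = 1  # Exchange 2010 transaction log files are 1 MB each


def estimate_log_volume_gb(logs_per_day, backup_interval_days,
                           replay_lag_days, truncation_lag_days,
                           safety_factor=1.2):
    """Rough estimate of the log space one database copy may need.

    Assumption: logs stay on disk for the backup interval plus the
    replay lag and truncation lag windows. safety_factor is a
    hypothetical cushion for bursts such as mailbox moves.
    """
    retention_days = (backup_interval_days
                      + replay_lag_days
                      + truncation_lag_days)
    total_mb = logs_per_day * retention_days * LOG_FILE_MB * safety_factor
    return total_mb / 1024


if __name__ == "__main__":
    # Hypothetical database: 20,000 logs per day, daily backup,
    # 3 days of replay lag and 1 day of truncation lag.
    print(round(estimate_log_volume_gb(20000, 1, 3, 1), 1), "GB")
```

Treat the result as a starting point only, and measure the real log generation rate in your own environment before sizing the LUN.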

The Restore Process
The process of restoring a database located in a DAG is pretty much the same as doing it on a mailbox database that is not a member of DAG replication. The decision you must make is whether to use the lagged copy of the database or to perform a traditional restore.

How can we take advantage of the lagged database copy?
Lagged database copies can be used to recover from logical corruption in a mailbox or mailbox database, or to recover individual mails or folders within a mailbox. The recovery process is simple, but you must choose the ReplayLagTime setting carefully so that you can discover a problem in time to use the lagged database copy before the affected transaction log files are replayed into it.
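The timing argument above can be sketched as a small calculation. This is a simplified model, not how Exchange tracks replay internally: it assumes the lagged copy replays each log exactly ReplayLagTime after it was written, and the function names are made up for illustration.

```python
from datetime import datetime, timedelta


def lagged_copy_deadline(corruption_time, replay_lag):
    """Latest moment at which the lagged copy still predates the problem.

    Assumption: logs are replayed into the lagged copy exactly
    replay_lag after they were generated.
    """
    return corruption_time + replay_lag


def can_use_lagged_copy(corruption_time, now, replay_lag):
    """True if the bad change has not yet been replayed into the copy."""
    return now < lagged_copy_deadline(corruption_time, replay_lag)


if __name__ == "__main__":
    corrupted = datetime(2010, 5, 10, 9, 0)
    lag = timedelta(hours=24)  # hypothetical ReplayLagTime of one day
    print(lagged_copy_deadline(corrupted, lag))
    print(can_use_lagged_copy(corrupted, datetime(2010, 5, 10, 17, 0), lag))
```

In other words, a one-day lag only helps if someone notices the problem and suspends the copy within that day.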

Components needed for the recovery include a ‘recovery mailbox database’. The first step is to create a recovery mailbox database:

New-MailboxDatabase -Name RecoveryDB -Verbose -Recovery -EdbFilePath E:\Recovery\RecoveryDB.edb -LogFolderPath E:\Recovery -Server FQDNofServerInRecoverySite

This will create a recovery mailbox database with its paths set to E:\Recovery.

The next step is to get a file copy of the mailbox database you want to extract data from into the E:\Recovery folder. You could use a regular restore from your backup, but it’s often faster to make a copy of the lagged database. Copy as many transaction log files as suits your purpose.

Before doing a file copy, it is best to pause the replication with the Suspend-MailboxDatabaseCopy command:

Suspend-MailboxDatabaseCopy 'database name\FQDNofServerInDRSite' -SuspendComment "Recover data from database" -Verbose

We use VSS to create a shadow copy of the database we want to extract data from. The syntax for the vssadmin.exe command-line tool is “vssadmin create shadow /For=”

As you can see, you can only create a VSS shadow copy of a full volume, meaning the volume is either a disk such as D:\ or a mount point. You probably have database files and transaction log files on separate disks, so you must create shadow copies for both disks.

vssadmin create shadow /For=D:
vssadmin create shadow /For=G:

Pay attention to the result you get:
Shadow Copy Volume Name:
\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2

In previous versions of Windows, you could simply copy from the strange path above to your recovery folder, but this seems to have either been broken or removed in Windows Server 2008 R2. This is how it used to look:

copy "\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\path_to_your_edb_file" E:\Recovery

But I discovered another way of doing it, with explorer.exe:
Right-click the drive you created the shadow copy for (D: in this example), select Properties, and open the Previous Versions tab. Here you should see the newly created shadow copy. Select it and click Open. A new window opens in which you can drill down to wherever your Exchange database files are located and simply copy the edb file and the corresponding transaction log files to the E:\Recovery folder. Which transaction log files you need to copy depends on how far forward you want to replay information into the database. Simply check the timestamps on the files. Warning: In real life, this file copy will take a long time!
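If there are thousands of log files, checking timestamps by eye gets tedious. Here is a minimal Python sketch of the same idea. The function name, the folder, and the cutoff are hypothetical, and it only lists candidates by last-modified time. It does not verify that the log sequence is contiguous, which eseutil needs for a successful replay, so review the list before copying anything.

```python
from datetime import datetime
from pathlib import Path


def logs_up_to(log_dir, prefix, cutoff):
    """List <prefix>*.log files last modified at or before cutoff.

    Results are sorted oldest first. This is only a timestamp filter;
    it does not check log generation numbers for gaps.
    """
    cutoff_ts = cutoff.timestamp()
    candidates = [
        p for p in Path(log_dir).glob(f"{prefix}*.log")
        if p.stat().st_mtime <= cutoff_ts
    ]
    return sorted(candidates, key=lambda p: p.stat().st_mtime)
```

A call such as logs_up_to(folder, "E00", datetime(2010, 5, 10, 9, 30)) would return the E00 log files written up to 09:30 that day, where folder is whatever path you opened in the shadow copy window.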

The Recovery Process

Now we can do a recovery of the database in the E:\Recovery folder. Start by deleting the checkpoint file “xxx.chk”. Next, run eseutil from an elevated command prompt in the E:\Recovery folder:

eseutil /r xxx /a

Where xxx is the transaction log file prefix, such as E00. The output will look similar to this:

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 14.00
Copyright (C) Microsoft Corporation. All Rights Reserved.
Initiating RECOVERY mode...
Logfile base name: e00
Log files:
System files:
Performing soft recovery...
Restore Status (% complete)
0 10 20 30 40 50 60 70 80 90 100
----------------------------------------
..............................

This could take a long time depending on how many transaction log files you have to roll into the database file. The speed of rolling transaction log files is approximately 2 log files per second.
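To get a feel for what that means in practice, here is a small Python sketch that turns a log count into an estimated replay time. The default rate of 2 logs per second is simply the figure quoted above, and the log count in the example is hypothetical. Your own hardware may be faster or slower.

```python
def estimate_replay_seconds(log_count, logs_per_second=2.0):
    """Estimated soft-recovery time at a given replay rate."""
    if logs_per_second <= 0:
        raise ValueError("logs_per_second must be positive")
    return log_count / logs_per_second


def format_duration(seconds):
    """Format a number of seconds as 'Xh MMm SSs'."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    # Hypothetical: one day of logs for a busy database.
    print(format_duration(estimate_replay_seconds(20000)))
```

At this rate, 20,000 log files work out to a little under three hours, which is worth knowing before you promise anyone a restore time.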

The next step is to rename the edb file to RecoveryDB.edb, since that was the name chosen when we created the recovery database.

If everything has gone well, we can simply mount our RecoveryDB:

Mount-MailboxDatabase RecoveryDB

To see which mailboxes are in the RecoveryDB, use:

Get-MailboxStatistics -Database RecoveryDB

To extract data from the Recovery database, use the Restore-Mailbox command:

Restore-Mailbox -RecoveryDatabase RecoveryDB -TargetFolder Recovery -Identity 'target mailbox' -RecoveryMailbox 'mailbox to get data from' -BadItemLimit 999 -Verbose

You could use several more parameters with Restore-Mailbox such as -ExcludeFolders, -SenderKeyWords, -AttachmentFilenames, -ContentKeywords, -AllContentKeywords, and many more. See the documentation on TechNet for full syntax of Restore-Mailbox.

When the Restore-Mailbox command has finished, you will see a folder named ‘Recovery’ inside the ‘target mailbox’, with the extracted data beneath it.

Now it’s Time to Clean Up

Now that you have managed to extract data from the lagged copy, you must start cleaning up. (‘But Mom!’) First, delete the shadow copy; otherwise, it will eventually fill up the shadow storage disk space.

Start by listing your current shadow copies with:

vssadmin list shadows

Look for the shadow copy you made earlier. (The creation timestamp is a good way to identify it.) Then delete the shadow copy with:

vssadmin delete shadows /Shadow=ShadowId

Where ShadowId looks like {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}

When you have finished using the recovery database, simply delete it with:

Remove-MailboxDatabase RecoveryDB

Then delete the files in the E:\Recovery folder.

Don’t forget to un-pause the replication of transaction log files with Resume-MailboxDatabaseCopy:

Resume-MailboxDatabaseCopy -Identity 'Mailbox Database 2036433681\FQDNofServerInDRSite' -ReplicationOnly

-ReplicationOnly is there to stop Active Manager from accidentally activating the database copy in the DR site.

What if you encounter a corrupted file and need to recover the complete server but also want to go back a few hours in time? (and your DeLorean is all out of plutonium…)
You can always use your regular backup, but you could also use the lagged copy. Using the lagged copy in this scenario is even simpler than described above. Suspend replication, then delete the checkpoint file and as many of the newest transaction log files as you need to “go back in time.” Then use eseutil /r to replay the transaction log files left on disk.

The next step is to do a switchover to the recovery site and server. Please see Part 2 of this series for more detail: Exchange 2010 Site Disaster Recovery on a Dime! Part 2: Navigating the Failover Process

Another approach would be to use a dial-tone database together with a recovery database. I will save this discussion for a future article.