![Page 1: Microsoft Exchange Server 2007 High Availability And Disaster Recovery Deep Dive](https://reader035.vdocument.in/reader035/viewer/2022062418/554ebabfb4c905de468b46b5/html5/thumbnails/1.jpg)
High Availability and Disaster Recovery Deep Dive
Agenda
• Solutions for Disaster Recovery
• Mailbox Server High Availability
• CCR and SCR: Better Together
• Why CCR? Why not SCC?
• Continuous Replication Demystified
Solutions for Disaster Recovery
Solutions for Disaster Recovery
• Deleted Item Retention – default 14 days
• Deleted Mailbox Retention – default 30 days
• Mailbox Service and Data Recovery
  • Server recovery: Setup /m:RecoverServer, Setup /recoverCMS
  • Database portability
  • Dial tone portability
  • Continuous replication
  • Backup and restore: legacy streaming ESE backups, Volume Shadow Copy Service (VSS) backups, Recovery Storage Groups, alternate restores
• Edge Transport Server Cloned Configuration
Solutions for Disaster Recovery
Augment built-in solutions with other processes:
• Configuration management: server build standardization, server build documentation
• Change management
• Release management
• Proactive monitoring
• Detailed recovery plans
• Regular integrity checks
• Regular practice drills
Server Recovery
Setup /m:RecoverServer
• All roles except Edge Transport; for Edge Transport, do a fresh install and then run ImportEdgeConfig
• All custom settings on the Client Access server must be recreated
• Restrictions: can't be used for repairing a failed setup, migrating between different operating systems, or recovering or un-clustering a clustered mailbox server
Setup /recoverCMS
• For CCR and SCC only
• Restrictions: can't be used for changing from CCR to SCC or vice versa, migrating between different operating systems, clustering a standalone Mailbox server, or splitting or merging clustered Exchange environments
• Does not trigger transport dumpster redelivery
• Windows 2003 clustering has a dependency on the PDC Emulator
Data Recovery
Switch to a replicated copy (activation)
• Passive copy (LCR/CCR)
• Target copy (SCR)
Restore from backup
• Same server
• Database portability on an alternate server
• Database portability from Windows 2003 to Windows 2008 has an initial performance impact
Dial tone and data merge using RSG
Mailbox Server High Availability
Mailbox Server High Availability
Built-in features for various levels of availability:
• Local Continuous Replication (LCR) – data availability
• Single Copy Cluster (SCC) – service availability
• Cluster Continuous Replication (CCR) – data and service availability
• Standby Continuous Replication (SCR) – disaster recovery and site resilience
Mailbox Server High Availability
Local Continuous Replication (LCR)
Mailbox Server High Availability
Single Copy Cluster (SCC)
Mailbox Server High Availability
Cluster Continuous Replication (CCR)
Standby Continuous Replication
[Diagram: SCR sources (CCR, SCC, and standalone Mailbox servers) replicate to SCR targets (a standalone Mailbox server without LCR, or a standby cluster with a passive Mailbox role)]
CCR and SCR: Better Together
CCR and SCR: Better Together
• CCR provides high availability for Mailbox data and services within the datacenter
• SCR replicates data remotely to provide site resilience for the Mailbox data
[Diagram: Datacenter A and Datacenter B]
CCR across 2 Sites
[Diagram: CCR cluster nodes split between Datacenter A and Datacenter B]
CCR Local / SCR to Remote Site
[Diagram: CCR in Datacenter A, SCR target in Datacenter B]
CCR/SCR vs SCC/Sync – 2 Sites
[Diagram: Datacenter A and Datacenter B, each holding DB and log copies, comparing how physical corruption is handled.
CCR/SCR: log corruption is detected immediately on replication at both targets. After a site failure, run Setup /recoverCMS and play logs forward.
SCC with synchronous replication or VSS clones (Exchange disaster recovery or third-party failover): physical corruption can go undetected, and is still undetected in the VSS clones 1 month later. On full storage or site failure in the primary site, the corruption is detected and you must recover from backup. On site failure, if the corruption was not detected and corrected by a test failover, you must recover from backup.]
Why CCR? Why Not SCC?

| Aspect | CCR | SCC |
|---|---|---|
| Single point of failure | None when stretched across sites or combined with SCR for site resiliency | Data, storage, and site single points of failure. Potential for massive data loss on a single failure: • storage device failures can lose collocated backups • hardware replication can propagate physical errors • storage failure requires activation of the remote copy, if one exists • requires two VSS clones plus a remote copy of data to achieve an RPO equal to CCR |
| Simplicity | Simple setup • no special storage configuration • built-in site resilience • same technology and redundancy model for intra- and inter-site protection | Shared storage • storage configuration before and after forming the cluster • complex storage stack • complex deployment to get the RTO/RPO of 1 CCR cluster |
Why CCR? Why not SCC?

| Aspect | CCR | SCC |
|---|---|---|
| Backups | Backups off the passive copy eliminate or reduce the backup window | Backups must be taken off the active copy |
| TCO | Reduced TCO: cheaper hardware; no special storage expertise required; in-the-box solution; integrated management; single operations team; reduced backup cost | Higher TCO: additional products needed to achieve equivalent combined RTO/RPO; separate management tools for HA operations may be required; higher-end servers and storage required; storage expertise needed |
| Large mailboxes | Great RTO/RPO, simplicity, no maintenance window, and reduced TCO → improved support for larger mailboxes | Higher TCO and long recovery times constrain mailbox size |
Why CCR? Why not SCC?

| Metric | Failure | CCR (stretched CCR or CCR + SCR) | SCC (SCC + SCR/3rd-party replication + 2 VSS clones, to approach the combined RTO/RPO of 1 CCR cluster) |
|---|---|---|---|
| RTO | Server | ~2 minutes | ~2 minutes |
| RTO | Data or LUN | ~2 minutes | 15 min – 1 hour |
| RTO | Full storage | ~2 minutes | ~15 min with synchronous replication; days with VSS clones only |
| RTO | Site | ~2 minutes for stretched CCR; 30-60 minutes for CCR + SCR | ~15 min with synchronous replication; days with VSS clones only |
| RPO | Server | 0 for mail*; appointments, contacts, tasks, and drafts not covered | 0 (uses the same copy of data) |
| RPO | Physical corruption: DB | 0 | Hours to days with synchronous replication; point in time with VSS |
| RPO | Physical corruption: logs | 0 (must reseed passive) | N/A if the log is not needed; same as DB if needed |
| RPO | DB LUN dies | 0 | 0 with synchronous replication; point in time with VSS clones |
| RPO | Log LUN dies | 0 for mail*; appointments, contacts, tasks, and drafts not covered | 0 with synchronous replication; point in time with VSS clones |
| RPO | Full storage | 0 for mail*; appointments, contacts, tasks, and drafts not covered | 0 with synchronous replication; hours to days with VSS clones only |
| RPO | Site | Same as Server for stretched CCR; 1 log** for CCR + SCR | 0 with synchronous replication; hours to days with VSS clones |

Notes: * assumes following best-practice guidance for the transport dumpster; ** assumes replication is keeping up
Why CCR? Why not SCC?
Physical corruption
• SCC: no mechanism to detect database corruption on the copy replicated by third-party solutions
• SCC: no mechanism to detect log corruption on the copy replicated by third-party solutions
• With hardware-based replication, the deeper stack can lead to corruption caused by: HBA driver/firmware, multipath driver, server hardware, FC switch firmware, storage controller firmware/OS, target storage controller firmware/OS
Logical corruption
• Corruption caused by the application
• Logical corruption is replicated by all replication solutions
• SCR with lag replay can mitigate it if it is detected early
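The following minimal Python sketch shows why lag replay helps. It assumes a simplified rule: a log is replayed only once it is older than the lag, and replay never skips a generation. The workload, timestamps, and function names are hypothetical. In Exchange 2007 SP1 the lag is a property of the SCR copy (its ReplayLagTime setting), which this sketch does not touch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogFile:
    generation: int
    closed_at: datetime  # when the source closed this log generation


def replayed_generations(logs, now, replay_lag):
    """Generations old enough to have been replayed into the lagged copy (simplified model)."""
    replayed = []
    for entry in sorted(logs, key=lambda log: log.generation):
        if now - entry.closed_at < replay_lag:
            break  # this log and every later one are still inside the lag window
        replayed.append(entry.generation)
    return replayed


start = datetime(2009, 2, 25, 12, 0)
# Hypothetical workload: one log closed every 10 minutes, generations 1-12.
logs = [LogFile(g, start + timedelta(minutes=10 * g)) for g in range(1, 13)]
bad_generation = 9                        # logical corruption logged at 13:30
detected_at = start + timedelta(hours=2)  # noticed at 14:00
done = replayed_generations(logs, detected_at, replay_lag=timedelta(hours=1))
print(done)                    # [1, 2, 3, 4, 5, 6]
print(bad_generation in done)  # False: the lagged copy has not replayed the bad log yet
```

Corruption is caught in time only if it is detected within the lag window. With a lag of zero, the sketch replays everything, including generation 9.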
Continuous Replication Demystified
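Basic Replication Pipeline
[Diagram: the Store writes to the Source DB and the Source Log Directory; the Log Copier pulls closed logs into the Inspector Directory; the Log Inspector checks them and moves them to the Replica Log Directory; the Log Replayer replays them into the Target DB]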
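The pipeline can be modelled in a few lines of Python. This is a toy sketch, not the Replication service. The class and method names are hypothetical. The validity check stands in for whatever inspection the real Log Inspector performs. A rejected log is simply pulled again on the next copy pass.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ReplicaInstance:
    """Toy model of the copy pipeline for one storage group (hypothetical names)."""
    source_logs: Dict[int, bytes]                                    # Source Log Directory
    inspector_dir: Dict[int, bytes] = field(default_factory=dict)    # Inspector Directory
    replica_log_dir: Dict[int, bytes] = field(default_factory=dict)  # Replica Log Directory
    target_db: List[int] = field(default_factory=list)               # generations replayed into Target DB
    next_gen: int = 1                                                 # next generation the copier pulls

    def copy(self) -> None:
        """Log Copier: pull each closed log, in order, from the source share."""
        while self.next_gen in self.source_logs:
            self.inspector_dir[self.next_gen] = self.source_logs[self.next_gen]
            self.next_gen += 1

    def inspect(self, is_valid: Callable[[bytes], bool]) -> List[int]:
        """Log Inspector: promote good logs; discard bad ones so they are copied again."""
        rejected = []
        for gen in sorted(self.inspector_dir):
            data = self.inspector_dir.pop(gen)
            if is_valid(data):
                self.replica_log_dir[gen] = data
            else:
                rejected.append(gen)
                self.next_gen = min(self.next_gen, gen)  # re-copy from this generation
        return rejected

    def replay(self) -> None:
        """Log Replayer: apply logs strictly in generation order, stopping at the first gap."""
        gen = self.target_db[-1] + 1 if self.target_db else 1
        while gen in self.replica_log_dir:
            self.target_db.append(gen)
            gen += 1


def looks_valid(data: bytes) -> bool:
    return data == b"ok"  # stand-in for real log validation


r = ReplicaInstance({g: b"ok" for g in (1, 2, 3)})
r.copy()
r.inspector_dir[2] = b"damaged in transit"  # simulate a bad network copy of generation 2
print(r.inspect(looks_valid))  # [2]
r.replay()
print(r.target_db)             # [1]  replay stops at the missing generation 2
r.copy()
r.inspect(looks_valid)
r.replay()
print(r.target_db)             # [1, 2, 3]
```

Replay never skips a generation, which matches the later point that ESE cannot skip over logs.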
Continuous Replication Basics
When the current log file is closed, the Replication service copies it to the replication target
The Replication service:
• at the source, creates read-only shares for the log directory
• at the target, reads from the shares and pulls a copy of the log file
• contains a ReplicaInstance for each storage group
Configuration discovered from Active Directory (every 30 sec for LCR/CCR, every 3 min for SCR)
Continuous Replication Basics
Communication is done via logs, the registry, the cluster database, and RPC:
• Logs: replicate database changes and backup status
• Registry: used in LCR and SCR; also used in CCR to checkpoint the current log generation value for loss calculation
• Cluster database: inspect with cluster res "Exchange Information Store Instance (CMSName)" /priv | findstr /i replay
• RPCs: the target Replication service makes RPCs into the Store to coordinate log truncation
Lost Log Resilience (LLR)
Designed to minimize the need to reseed after a lossy failover
• Normally, database changes are written to the log file before the database, and the database can be updated as soon as the change is logged
• LLR modifies this behavior by delaying updates to the database until 1 or more further log generations have been created
• LLR uses a new log stream marker called the waypoint: the minimum log required to prevent database divergence; no modifications after the waypoint have been written to the database
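Here is a minimal Python sketch of the LLR rule as described. llr_depth is a hypothetical parameter. The value 10 is used only because it reproduces the example on the next slide (committed generation 20, waypoint 10). It is not a claim about the value Exchange uses.

```python
def waypoint(committed: int, llr_depth: int) -> int:
    """Highest generation whose changes may already be in the .edb file (modelled)."""
    return max(committed - llr_depth, 0)


def may_write_to_database(change_gen: int, committed: int, llr_depth: int) -> bool:
    """LLR gate: hold a dirty page until the log describing it is llr_depth generations old."""
    return change_gen <= waypoint(committed, llr_depth)


def diverges_if_lost(first_lost_gen: int, committed: int, llr_depth: int) -> bool:
    """Losing logs from first_lost_gen onward is safe only if none of them reached the database."""
    return first_lost_gen <= waypoint(committed, llr_depth)


print(waypoint(20, 10))                   # 10
print(may_write_to_database(11, 20, 10))  # False: generation 11 is still held back
print(diverges_if_lost(15, 20, 10))       # False: losing 15-20 leaves the database consistent
```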
Log Stream Markers
Committed: log generation 20
Checkpoint: log generation 2
Waypoint: log generation 10
What this means:
• Only logs 2-10 are needed
• Logs 11-20 can be discarded
Database header dump (eseutil /mh):
Initiating FILE DUMP mode...
Database: priv1.edb
...
State: Dirty Shutdown
Log Required: 2-10 (0x2-0xA)
Log Committed: 0-20 (0x0-0x14)
...
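The marker arithmetic on this slide can be checked with a short Python sketch (function names are hypothetical). It assumes checkpoint ≤ waypoint ≤ committed. It reproduces the decimal and hex ranges shown in the header dump.

```python
def log_ranges(checkpoint: int, waypoint: int, committed: int):
    """Split the log stream into generations needed for recovery and those that can be discarded."""
    if not checkpoint <= waypoint <= committed:
        raise ValueError("expected checkpoint <= waypoint <= committed")
    required = range(checkpoint, waypoint + 1)        # replay these to reach a consistent database
    discardable = range(waypoint + 1, committed + 1)  # never flushed to the database under LLR
    return required, discardable


def eseutil_range(first: int, last: int) -> str:
    """Format a range the way the header dump prints it, e.g. '2-10 (0x2-0xA)'."""
    return f"{first}-{last} (0x{first:X}-0x{last:X})"


required, discardable = log_ranges(checkpoint=2, waypoint=10, committed=20)
print(eseutil_range(required[0], required[-1]))  # 2-10 (0x2-0xA)
print(list(discardable))                         # [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
print(eseutil_range(0, 20))                      # 0-20 (0x0-0x14)
```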
[Diagram: Lost Log Resilience during a CCR failover. NodeA and NodeB log streams (generations 1-21) with checkpoint and waypoint markers. Sequence:
• Healthy CCR
• NodeA fails and a failover to NodeB occurs
• Validate that the database can mount: the number of lost logs must be within the AutoDatabaseMountDial limit
• Logs are generated on NodeB (beyond generation 21)
• NodeA recovers and performs a divergence check
• NodeA performs an incremental reseed and copies logs
• Healthy CCR]
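Here is a Python sketch of the "validate database can mount" step. The dial names are the AutoDatabaseMountDial values. The thresholds (0, 6, and 12 lost logs, treated as inclusive) are my understanding of the Exchange 2007 documentation and should be verified. The generations used below (old active at 21, passive copy at 17) are a hypothetical example, not values read from the diagram.

```python
# Assumed thresholds (max lost logs for automatic mount); verify against product documentation.
AUTO_DATABASE_MOUNT_DIAL = {
    "Lossless": 0,
    "GoodAvailability": 6,
    "BestAvailability": 12,
}


def lost_logs(active_last_gen: int, passive_last_gen: int) -> int:
    """Generations the old active node wrote that never reached the passive copy."""
    return max(active_last_gen - passive_last_gen, 0)


def can_auto_mount(active_last_gen: int, passive_last_gen: int, dial: str) -> bool:
    """Mount automatically only if the estimated loss is within the dial's limit."""
    return lost_logs(active_last_gen, passive_last_gen) <= AUTO_DATABASE_MOUNT_DIAL[dial]


for dial in AUTO_DATABASE_MOUNT_DIAL:
    print(dial, can_auto_mount(21, 17, dial))
# Lossless False
# GoodAvailability True
# BestAvailability True
```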
When Do I Need A Full Reseed?
Rarely:
• Lost log past the current waypoint, for example:
  • Admin accepted a large amount of loss by running Restore-StorageGroupCopy
  • Automatic mount while LLR was “not honored”
  • Automatic lossy mount with a “stale” loss window calculation
• Log corruption prior to log replay (ESE cannot skip over logs)
• Database files modified outside of the Store or Replication service, e.g., offline defrag, eseutil /r
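The list above can be read as a decision procedure. Here is a minimal Python sketch of it. The helper name and the example generations (waypoint 11, first lost generation 18 or 9) are hypothetical. Exchange makes this decision internally, not through any function like this.

```python
def reseed_needed(first_lost_gen: int, old_active_waypoint: int, *,
                  corrupt_log_before_replay: bool = False,
                  db_modified_outside_store: bool = False) -> str:
    """Return 'full' or 'incremental' for a database copy (hypothetical helper)."""
    if corrupt_log_before_replay:
        return "full"  # ESE cannot skip over a damaged log generation
    if db_modified_outside_store:
        return "full"  # e.g. offline defrag or eseutil /r changed the database files
    if first_lost_gen <= old_active_waypoint:
        return "full"  # a lost log is at or before the waypoint, so its changes may be in the .edb
    return "incremental"  # lost logs were never flushed; discard them and copy the new ones


print(reseed_needed(18, 11))  # incremental
print(reseed_needed(9, 11))   # full
```

The "admin accepted a large amount of loss" and "stale loss window" cases are two ways of ending up in the third branch.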
Transport Dumpster
• Hub Transport servers retain messages that have been delivered to the destination mailbox until a size or time limit is reached
• The transport dumpster is kept per storage group per Hub Transport server, for Hub Transport servers in the same Active Directory site as the storage group
• Transport dumpster statistics:
Get-StorageGroupCopyStatus -DumpsterStatistics
Output:
DumpsterServersNotAvailable: {HUB1}
DumpsterStatistics: {HUB2(2/25/2009 10:20:37 PM; 2 ; 1032KB)}
Transport Dumpster
[Diagram: redelivery after a lossy failover of a CCR clustered mailbox server (MBX1 and MBX2, active and passive) served by two Hub Transport servers, HUB1 and HUB2. Each hub keeps its own dumpster contents per storage group (SG1, SG2). The CMS tracks, per storage group, which hubs still owe a resubmit (initially HUB1 and HUB2 for both SG1 and SG2) and sends "Redeliver SG1, SG2" requests, which return Retry, Timeout, or Success. After HUB2 succeeds, only HUB1 remains listed, and HUB1 is asked again until it returns Success.]
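The redelivery bookkeeping in this diagram can be sketched in Python. Names are hypothetical and the hub responses are scripted. The 30-second retry interval and the 7-day limit come from the MaxDumpsterTime discussion in this deck (07.00:00:00 in the Set-TransportConfig example). The model assumes a hub is dropped from a storage group's list only when it returns Success.

```python
from datetime import timedelta
from typing import Callable, Dict, List, Set


def max_attempts(max_dumpster_time: timedelta, interval: timedelta = timedelta(seconds=30)) -> int:
    """How many retry rounds fit into MaxDumpsterTime."""
    return int(max_dumpster_time / interval)


def redeliver(pending: Dict[str, Set[str]],
              request: Callable[[str, List[str], int], str],
              rounds_allowed: int) -> int:
    """Ask each hub still listed for a storage group to resubmit; return the rounds used.

    pending maps storage group -> hubs that still owe a resubmit. It is fixed when the
    lossy failover is detected, so hubs added later are never asked.
    """
    rounds = 0
    while any(pending.values()) and rounds < rounds_allowed:
        rounds += 1
        for hub in sorted(set().union(*pending.values())):
            groups = sorted(sg for sg, hubs in pending.items() if hub in hubs)
            if request(hub, groups, rounds) == "Success":
                for sg in groups:
                    pending[sg].discard(hub)
            # "Retry" or "Timeout": the hub stays listed and is asked again next round
    return rounds


script = {("HUB1", 1): "Timeout", ("HUB2", 1): "Success",
          ("HUB1", 2): "Retry", ("HUB1", 3): "Success"}
pending = {"SG1": {"HUB1", "HUB2"}, "SG2": {"HUB1", "HUB2"}}
used = redeliver(pending, lambda hub, sgs, rnd: script.get((hub, rnd), "Retry"),
                 rounds_allowed=max_attempts(timedelta(days=7)))
print(used, pending)  # 3 {'SG1': set(), 'SG2': set()}
```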
Transport Dumpster
How much data loss can the transport dumpster mitigate?
• 18 MB dumpster per storage group on 8 Hub Transport servers = 144 MB per storage group
• [20 MB per user / 10 hours] x [100 users per SG] = 200 MB of message traffic in one hour
• Putting the two together: 60 min x 144 / 200 = 43.2 minutes' worth of data
• In 43.2 minutes, 144+ logs are created per SG
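The same arithmetic can be written as a Python function, so that other dumpster sizes, hub counts, or traffic profiles can be plugged in. It assumes 1 MB transaction log files (the Exchange 2007 log size). It reads "144+ logs" as one log per MB of message data, a lower bound. The function name is hypothetical.

```python
def dumpster_coverage(dumpster_mb_per_sg: float, hub_servers: int,
                      mb_per_user: float, hours: float, users_per_sg: int,
                      log_size_mb: float = 1.0):
    """Minutes of mail one storage group's dumpster can hold, and the minimum logs written meanwhile."""
    capacity_mb = dumpster_mb_per_sg * hub_servers            # 18 MB x 8 = 144 MB
    traffic_mb_per_hour = mb_per_user / hours * users_per_sg  # 20 / 10 x 100 = 200 MB/hour
    minutes = 60 * capacity_mb / traffic_mb_per_hour          # 60 x 144 / 200
    min_logs = capacity_mb / log_size_mb                      # at least one log per MB of message data
    return minutes, min_logs


minutes, logs = dumpster_coverage(18, 8, mb_per_user=20, hours=10, users_per_sg=100)
print(f"{minutes:.1f} minutes, {logs:.0f}+ logs")  # 43.2 minutes, 144+ logs
```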
Customize the transport dumpster size/time limit:
Set-TransportConfig -MaxDumpsterSizePerStorageGroup 30MB -MaxDumpsterTime 07.00:00:00
No time window guarantees: if there are no message size limits, a single large message (e.g., 15 MB) will purge all other messages for the destination storage group(s) on a given Hub Transport server
Transport Dumpster
When CCR detects a lossy failover, it:
• Expands the loss window by 12 hours back and 4 hours forward
• Finds all Hub Transport servers in the local Active Directory site
• Requests transport dumpster redelivery from all detected servers
New servers are not added to the redelivery list
Inaccessible servers: CCR retries the same request every 30 seconds until the configured MaxDumpsterTime
If multiple lossy failovers take place, the new loss window is added to the previous one
Restore-StorageGroupCopy on LCR is a one-time request with no retries
Redelivery is not triggered as part of Setup /recoverCMS
There are no other ways to redeliver messages from the transport dumpster
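Here is a minimal Python sketch of the window expansion. The function name and timestamps are hypothetical. How successive windows are combined is an assumption: the slide says only that the new loss window is added to the previous one, which is modelled here as one span covering both.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

Window = Tuple[datetime, datetime]


def redelivery_window(loss_start: datetime, loss_end: datetime,
                      previous: Optional[Window] = None) -> Window:
    """Widen the estimated loss window (12 h back, 4 h forward) before asking hubs to resubmit."""
    start = loss_start - timedelta(hours=12)
    end = loss_end + timedelta(hours=4)
    if previous is not None:
        # Assumption: "added to the previous one" is modelled as a single covering span.
        start, end = min(start, previous[0]), max(end, previous[1])
    return start, end


first = redelivery_window(datetime(2009, 2, 25, 22, 0), datetime(2009, 2, 25, 22, 5))
print(first[0], first[1])    # 2009-02-25 10:00:00 2009-02-26 02:05:00
second = redelivery_window(datetime(2009, 2, 26, 9, 0), datetime(2009, 2, 26, 9, 5), previous=first)
print(second[0], second[1])  # 2009-02-25 10:00:00 2009-02-26 13:05:00
```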
Redundant Networks
Use for log shipping and seeding in CCR
• Log shipping: Enable-ContinuousReplicationHostName
• Seeding: Update-StorageGroupCopy -DataHostNames:Host1,Host2
• Status: Get-ClusteredMailboxServerStatus reports OperationalReplicationHostNames, FailedReplicationHostNames, and InUseReplicationHostNames
• Watch out for a misconfigured hosts file
Circular Logging
One configuration setting with two consumers:
• Store service: the database must be dismounted and re-mounted for the change to take effect
• Replication service: picks up the new setting dynamically
In CCR, switching between on/off/on is no big deal, but in some situations logs are deleted prematurely
Example: turn off circular logging, then enable LCR without dismounting and mounting the database
• ESE is still truncating logs using circular logging logic
• Logs get truncated before they reach the LCR copy
To be safe, follow this recipe: suspend, dismount, change the setting, mount, resume
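The failure mode and the safe recipe can be reproduced with a toy Python model. All names are hypothetical. It simplifies circular logging to "keep only the newest generation". The point is only the ordering: the Store re-reads the setting at mount time, while the copy starts pulling logs as soon as it is enabled.

```python
from typing import List


class StorageGroup:
    """Toy model: one circular-logging setting with two consumers (all names hypothetical)."""

    def __init__(self, circular_logging: bool) -> None:
        self.setting = circular_logging         # configured value; the Replication service sees changes at once
        self.store_circular = circular_logging  # value ESE acts on; refreshed only when the database mounts
        self.mounted = True
        self.copy_enabled = False               # stands in for "LCR enabled"
        self.suspended = False
        self.source_logs: List[int] = []        # generations still on disk at the source
        self.copied: List[int] = []             # generations that reached the copy
        self._next_gen = 1

    def suspend(self) -> None:
        self.suspended = True

    def resume(self) -> None:
        self.suspended = False

    def dismount(self) -> None:
        self.mounted = False

    def mount(self) -> None:
        self.store_circular = self.setting      # the Store re-reads the setting only here
        self.mounted = True

    def write_logs(self, count: int) -> None:
        if not self.mounted:
            raise RuntimeError("database is dismounted")
        for _ in range(count):
            self.source_logs.append(self._next_gen)
            self._next_gen += 1
            if self.store_circular:
                # Simplification: circular logging drops every older generation at once.
                self.source_logs = self.source_logs[-1:]

    def replicate(self) -> None:
        if self.copy_enabled and not self.suspended:
            self.copied += [g for g in self.source_logs if g not in self.copied]


# Unsafe: turn circular logging off and enable the copy without a dismount/mount.
unsafe = StorageGroup(circular_logging=True)
unsafe.setting = False
unsafe.copy_enabled = True
unsafe.write_logs(3)  # the copier has not run yet
unsafe.replicate()
print(unsafe.copied)  # [3]  generations 1 and 2 were truncated before they were copied

# Safe recipe: suspend, dismount, change the setting, mount, resume.
safe = StorageGroup(circular_logging=True)
safe.suspend()
safe.dismount()
safe.setting = False
safe.mount()
safe.copy_enabled = True
safe.resume()
safe.write_logs(3)
safe.replicate()
print(safe.copied)    # [1, 2, 3]
```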
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.