Exchange Server 2013 Site Resilience Scott Schnoll

Upload: citlali-lampen

Post on 15-Dec-2015


Page 1: Exchange Server 2013 Site Resilience Scott Schnoll

Exchange Server 2013 Site Resilience

Scott Schnoll

Page 2: Exchange Server 2013 Site Resilience Scott Schnoll

Agenda

• The Preferred Architecture

• Namespace Planning and Principles

• Datacenter Switchovers and Failovers

• Dynamic Quorum and DAGs

Page 3: Exchange Server 2013 Site Resilience Scott Schnoll

The Preferred Architecture

Page 4: Exchange Server 2013 Site Resilience Scott Schnoll

Site Resilience changes in Exchange 2013

Frontend and backend recovery are independent

Most protocol access in Exchange Server 2013 is HTTP
• DNS resolves to multiple IP addresses
• HTTP clients have built-in IP failover capabilities
• Clients skip past IPs that produce hard TCP failures

Namespace is no longer a single point of failure
• Single or multiple namespace options
• Admins can switch over by removing the VIP from DNS or disabling it
• No dealing with DNS latency
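The client-side failover described on this slide can be sketched roughly as follows. This is a simplified model, not how any particular HTTP client is implemented: the `first_reachable` helper, the injected `can_connect` check, and the two sample addresses are all illustrative assumptions.

```python
from typing import Callable, Iterable, Optional


def first_reachable(ips: Iterable[str],
                    can_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first IP that accepts a connection, skipping hard failures."""
    for ip in ips:
        if can_connect(ip):
            return ip
    return None  # every address in the DNS answer failed


# Hypothetical DNS answer for one namespace: one VIP per datacenter.
resolved = ["192.168.1.50", "10.0.1.50"]
# Pretend the first VIP produces a hard TCP failure.
down = {"192.168.1.50"}

chosen = first_reachable(resolved, lambda ip: ip not in down)
print(chosen)  # prints 10.0.1.50
```

The connectivity check is passed in as a function so the sketch runs without a network; a real client would attempt a TCP connection at that point.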

Page 5: Exchange Server 2013 Site Resilience Scott Schnoll

Preferred Architecture: Namespace Design

For a site-resilient datacenter pair, a single namespace per protocol is deployed across both datacenters

• Autodiscover: autodiscover.contoso.com
• HTTP: mail.contoso.com
• IMAP: imap.contoso.com
• SMTP: smtp.contoso.com

Load balancers are configured without session affinity, with one VIP per datacenter

Round-robin, geo-DNS, or other solutions are used to distribute traffic equally across both datacenters
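As a rough illustration of how round-robin DNS spreads clients evenly, the sketch below rotates the answer order on every query and counts which VIP each client would try first. The VIP labels and the `round_robin_answers` helper are hypothetical; geo-DNS and other distribution methods behave differently.

```python
from collections import Counter, deque


def round_robin_answers(vips, queries):
    """Yield the answer order for each query, rotating the list by one each time."""
    ring = deque(vips)
    for _ in range(queries):
        yield list(ring)
        ring.rotate(-1)  # the next query sees the next VIP first


# Hypothetical labels for the one-VIP-per-datacenter design.
vips = ["vip-datacenter-1", "vip-datacenter-2"]

first_choices = Counter(answer[0] for answer in round_robin_answers(vips, 1000))
print(first_choices)
```

With two VIPs and an even number of queries, each VIP is listed first in exactly half of the answers.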

[Diagram: one mail VIP in each datacenter]

Page 6: Exchange Server 2013 Site Resilience Scott Schnoll

Preferred Architecture: DAG Design

• Each datacenter should be its own Active Directory site

• Deploy the unbound DAG model, spanning each DAG across two datacenters

• Distribute active copies across all servers in the DAG

• Deploy 4 copies, 2 copies in each datacenter

• One copy will be a lagged copy (7 days) with automatic play down enabled

• Native Data Protection is used

• Single network is used for MAPI and replication traffic

• Third datacenter used for Witness server, if possible

• Increase DAG size density before creating new DAGs
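The copy-layout bullets above (4 copies, 2 per datacenter, one 7-day lagged copy) can be expressed as a small consistency check. This is only a sketch: the `DatabaseCopy` record, the `layout_problems` helper, and the server and datacenter names are made up for illustration and are not Exchange objects.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseCopy:
    server: str
    datacenter: str
    lag_days: int = 0  # 0 means a regular, non-lagged copy


def layout_problems(copies):
    """List deviations from the 4-copy, 2-per-datacenter, 1-lagged layout."""
    problems = []
    if len(copies) != 4:
        problems.append(f"expected 4 copies, found {len(copies)}")
    per_dc = Counter(copy.datacenter for copy in copies)
    if len(per_dc) != 2 or any(count != 2 for count in per_dc.values()):
        problems.append(f"expected 2 copies in each of 2 datacenters, found {dict(per_dc)}")
    lagged = [copy for copy in copies if copy.lag_days > 0]
    if len(lagged) != 1 or lagged[0].lag_days != 7:
        problems.append("expected exactly one lagged copy with a 7-day lag")
    return problems


# Hypothetical layout for one database.
db1 = [
    DatabaseCopy("mbx1", "dc-a"),
    DatabaseCopy("mbx2", "dc-a"),
    DatabaseCopy("mbx3", "dc-b"),
    DatabaseCopy("mbx4", "dc-b", lag_days=7),
]
print(layout_problems(db1))  # an empty list means the layout matches
```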

[Diagram: one DAG spanning both datacenters, a mail VIP in each datacenter, and a separate Witness Server]

Page 7: Exchange Server 2013 Site Resilience Scott Schnoll

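Preferred Architecture

[Diagram: regional namespaces na.contoso.com and eur.contoso.com. Selina (somewhere in NA) uses DNS resolution to reach two na VIPs in front of one DAG; Batman (somewhere in Europe) uses DNS resolution to reach two eur VIPs in front of another DAG.]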

Page 8: Exchange Server 2013 Site Resilience Scott Schnoll

Namespace Planning & Principles

Page 9: Exchange Server 2013 Site Resilience Scott Schnoll

Namespace Planning

• No need for the additional namespaces that Exchange 2010 required
• Can still deploy regional namespaces to control traffic
• Can still have specific namespaces for protocols

• Two namespace models
• Bound Model
• Unbound Model

• Leverage split-DNS to minimize namespaces and control connectivity

• Deploy separate namespaces for internal and external Outlook Anywhere host names

Page 10: Exchange Server 2013 Site Resilience Scott Schnoll

Bound Model

[Diagram: Sue and Jane (both somewhere in NA) use DNS resolution. mail.contoso.com points to the mail VIP and mail2.contoso.com points to the mail2 VIP. DAG1 and DAG2 each span both datacenters, with active copies in one datacenter and passive copies in the other.]

Page 11: Exchange Server 2013 Site Resilience Scott Schnoll

Unbound Model

[Diagram: Sue (somewhere in NA) uses DNS resolution for mail.contoso.com, which round-robins between VIP #1 and VIP #2 in front of a single DAG.]

Page 12: Exchange Server 2013 Site Resilience Scott Schnoll

Load Balancing

• Exchange 2013 no longer requires session affinity to be maintained on the load balancer

• For each protocol session, CAS now maintains a 1:1 relationship with the Mailbox server hosting the user’s data

• Load balancer configuration and health probes will factor into namespace design

• Remember to configure health probes to monitor healthcheck.htm; otherwise the load balancer (LB) and Managed Availability (MA) will be out of sync
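To make the health probe bullet concrete, the sketch below builds one healthcheck.htm URL per protocol for a single Client Access server. The virtual directory paths are assumed to be the default IIS names, and the server name `cas1.contoso.com` is hypothetical; verify both against your own deployment before configuring a load balancer.

```python
# Assumed default virtual directory paths for each protocol on a CAS.
PROTOCOL_VDIRS = {
    "OWA": "owa",
    "ECP": "ecp",
    "EWS": "ews",
    "EAS": "Microsoft-Server-ActiveSync",
    "OAB": "oab",
    "MAPI": "mapi",
    "RPC": "rpc",
    "AutoD": "autodiscover",
}


def probe_urls(server_fqdn):
    """Return one healthcheck.htm probe URL per protocol for the given server."""
    return {
        protocol: f"https://{server_fqdn}/{vdir}/healthcheck.htm"
        for protocol, vdir in PROTOCOL_VDIRS.items()
    }


for protocol, url in probe_urls("cas1.contoso.com").items():
    print(f"{protocol:6} {url}")
```

A Layer 7 load balancer could probe each URL separately, while a Layer 4 load balancer typically probes only one of them per VIP.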

Page 13: Exchange Server 2013 Site Resilience Scott Schnoll

Single Namespace / Layer 4

[Diagram: the user connects to mail.contoso.com and autodiscover.contoso.com through a Layer 4 LB, which runs a health check against CAS (OWA, ECP, EWS, EAS, OAB, MAPI, RPC, AutoD).]

Page 14: Exchange Server 2013 Site Resilience Scott Schnoll

Single Namespace / Layer 7

[Diagram: the user connects to mail.contoso.com and autodiscover.contoso.com through a Layer 7 LB in front of CAS (OWA, ECP, EWS, EAS, OAB, MAPI, RPC, AutoD).]

Health check executes against each virtual directory

Page 15: Exchange Server 2013 Site Resilience Scott Schnoll

Multiple Namespaces / Layer 4

[Diagram: the user connects through a Layer 4 LB using one namespace per protocol: mail.contoso.com, ecp.contoso.com, ews.contoso.com, eas.contoso.com, oab.contoso.com, mapi.contoso.com, oa.contoso.com, and autodiscover.contoso.com, in front of CAS (OWA, ECP, EWS, EAS, OAB, MAPI, RPC, AutoD).]

Page 16: Exchange Server 2013 Site Resilience Scott Schnoll

Datacenter Switchovers and Failovers

Page 17: Exchange Server 2013 Site Resilience Scott Schnoll

Witness Server Placement

New Witness Server placement options available
• Choose based on business needs and available options

Third location DAG witness server improves DAG recovery behaviors
• Automatic recovery on datacenter loss
• Third location network infrastructure must have independent failure modes

Deployment scenario: DAG(s) deployed in a single datacenter
Recommendation: Locate witness server in the same datacenter as DAG members; can share one server across DAGs

Deployment scenario: DAG(s) deployed across two datacenters; no additional locations available
Recommendation: Locate witness server in primary datacenter; can share one server across DAGs

Deployment scenario: DAG(s) deployed across two+ datacenters
Recommendation: Locate witness server in third location; can share one server across DAGs
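The placement recommendations above amount to a small decision rule, sketched here. The `witness_recommendation` function and its parameters are hypothetical names; treating a DAG that spans three or more datacenters as always having a third location is an assumption of this sketch.

```python
def witness_recommendation(dag_datacenters: int,
                           third_location_available: bool) -> str:
    """Map a deployment scenario to the witness placement from the slide."""
    if dag_datacenters < 1:
        raise ValueError("a DAG must be deployed in at least one datacenter")
    if dag_datacenters == 1:
        return "same datacenter as the DAG members"
    # Assumption: a DAG spanning 3+ datacenters always has a third location.
    if third_location_available or dag_datacenters >= 3:
        return "third location"
    return "primary datacenter"


print(witness_recommendation(1, False))
print(witness_recommendation(2, False))
print(witness_recommendation(2, True))
```

In every scenario the slide notes that one witness server can be shared across DAGs.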

Page 18: Exchange Server 2013 Site Resilience Scott Schnoll

Site Resilience - CAS

[Diagram: alternate datacenter Portland and primary datacenter Redmond, with cas1 to cas4 behind two VIPs: 192.168.1.50 (marked failed with an X) and 10.0.1.50]

mail.contoso.com: 192.168.1.50, 10.0.1.50

With multiple VIP endpoints sharing the same namespace, if one VIP fails, clients automatically fail over to the alternate VIP!

Removing the failing IP from DNS puts you in control of the in-service time of the VIP

mail.contoso.com: 10.0.1.50

Page 19: Exchange Server 2013 Site Resilience Scott Schnoll

Site Resilience - Mailbox

[Diagram: primary datacenter Redmond, alternate datacenter Portland, and third datacenter Stockholm; DAG members mbx1, mbx2, mbx3, mbx4 and a witness; a failure is marked with an X]

Assuming MBX3 and MBX4 are operating and one of them can lock the witness.log file, automatic failover should occur
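The vote arithmetic behind this slide can be sketched as follows. The placement is an assumption read from the diagram (mbx1 and mbx2 in Redmond, mbx3 and mbx4 in Portland, witness in Stockholm), and the model uses a plain static majority, ignoring dynamic quorum.

```python
def quorum_after_site_loss(vote_locations, lost_site):
    """True if the voters outside the lost site still hold a majority of all votes."""
    surviving = sum(1 for site in vote_locations.values() if site != lost_site)
    return surviving > len(vote_locations) // 2


# Assumed placement: two members per datacenter, witness in a third location.
votes = {
    "mbx1": "Redmond",
    "mbx2": "Redmond",
    "mbx3": "Portland",
    "mbx4": "Portland",
    "witness": "Stockholm",
}

# Losing Redmond leaves 3 of 5 votes (mbx3, mbx4, witness): quorum is kept.
print(quorum_after_site_loss(votes, "Redmond"))  # prints True
```

Because the witness sits outside both datacenters, losing either datacenter leaves three of five votes, which is why failover can be automatic.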

Page 20: Exchange Server 2013 Site Resilience Scott Schnoll

Site Resilience - Mailbox

[Diagram: primary datacenter Redmond and alternate datacenter Portland; witness and DAG members mbx1, mbx2, mbx3, mbx4; three failures are marked (X X X)]

1. Mark the failed servers/site as down: Stop-DatabaseAvailabilityGroup DAG1 -ActiveDirectorySite:Redmond

2. Stop the Cluster service on the remaining DAG members: Stop-Service ClusSvc

3. Activate the DAG members in the second datacenter: Restore-DatabaseAvailabilityGroup DAG1 -ActiveDirectorySite:Portland

Page 21: Exchange Server 2013 Site Resilience Scott Schnoll

Site Resilience - Mailbox

[Diagram: primary datacenter Redmond and alternate datacenter Portland; witness, alternate witness, and DAG members mbx1, mbx2, mbx3, mbx4; a failure is marked with an X]

1. Mark the failed servers/site as down: Stop-DatabaseAvailabilityGroup DAG1 -ActiveDirectorySite:Redmond

2. Stop the Cluster service on the remaining DAG members: Stop-Service ClusSvc

3. Activate the DAG members in the second datacenter: Restore-DatabaseAvailabilityGroup DAG1 -ActiveDirectorySite:Portland
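Why the manual steps are needed can be seen from the vote counts before and after the switchover. The sketch below assumes the witness lives in the primary datacenter with mbx1 and mbx2, and that the restore step leaves a cluster of mbx3, mbx4, and the alternate witness (the stopped members are evicted, and the alternate witness is used because an even number of members remains). The names `majority` and `quorum_held` are illustrative.

```python
def majority(total_votes: int) -> int:
    """Smallest number of votes that is more than half of the total."""
    return total_votes // 2 + 1


def quorum_held(voters: dict) -> bool:
    """voters maps each voter to True (reachable) or False (lost)."""
    return sum(voters.values()) >= majority(len(voters))


# Before the switchover: Redmond is down and it held the witness (assumption).
before = {"mbx1": False, "mbx2": False, "witness": False,
          "mbx3": True, "mbx4": True}

# After the switchover: stopped members evicted, alternate witness in use.
after = {"mbx3": True, "mbx4": True, "alternate witness": True}

print(quorum_held(before))  # prints False: 2 of 5 votes is not a majority
print(quorum_held(after))   # prints True: 3 of 3 votes
```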

Page 22: Exchange Server 2013 Site Resilience Scott Schnoll

Activation Block Comparison

Tool: Suspend-MailboxDatabaseCopy
Parameter: ActivationOnly
Value: N/A
Instance: Per database copy
Usage:
• Keep active off a working but questionable drive

Tool: Set-MailboxServer
Parameter: DatabaseCopyAutoActivationPolicy
Value: “Blocked” or “Unrestricted”
Instance: Per server
Usage:
• Used to control active/passive SR configurations and maintenance
• Can force admin move

Tool: Set-MailboxServer
Parameter: DatabaseCopyActivationDisabledAndMoveNow
Value: $true or $false
Instance: Per server
Usage:
• Used to do faster site failovers and maintain database availability
• Databases are not blocked from failing back
• Continuous move-off operation

Page 23: Exchange Server 2013 Site Resilience Scott Schnoll

DatabaseCopyActivationDisabledAndMoveNow

New server setting to improve site resilience

Get all active databases off the server, FAST!
• Last resort to not move an active!

Proactively continues database move attempts

Server can still be in service
• Databases mounted and mail delivery!

Page 24: Exchange Server 2013 Site Resilience Scott Schnoll

Best Practices

Automate your recovery logic; make it reliable
• Think of it as rack/site maintenance

Exercise it regularly

Recovery times directly dependent on detection & decision times!
• Flip the bit! Don’t ask repair times, “if outage go…”
• Humans are the biggest threat to recovery times

Page 25: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Quorum and DAGs

Page 26: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Quorum

In Windows Server 2008 R2, quorum majority is fixed, based on the initial cluster configuration

In Windows Server 2012 (and later), cluster quorum majority is determined by the set of nodes that are active members of the cluster at a given time

This new feature is called Dynamic Quorum, and it is enabled for all clusters by default

Page 27: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Quorum

Cluster dynamically manages vote assignment to nodes, based on the state of each node
• When a node shuts down or crashes, the node loses its quorum vote
• When a node rejoins the cluster, it regains its quorum vote

By adjusting the assignment of quorum votes, the cluster can dynamically increase or decrease the number of quorum votes required to keep running
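A simplified model of this vote adjustment is sketched below. The `DynamicQuorumCluster` class and the EX1 to EX7 node names are hypothetical, and the model ignores the witness and any tie-breaking the real cluster performs; it only shows how the required majority moves as votes are removed and restored.

```python
class DynamicQuorumCluster:
    """Toy model: each running node holds one vote, stopped nodes hold none."""

    def __init__(self, nodes):
        self.votes = {name: 1 for name in nodes}

    def node_down(self, name):
        self.votes[name] = 0  # the node loses its quorum vote

    def node_up(self, name):
        self.votes[name] = 1  # the node regains its quorum vote

    @property
    def required(self):
        """Votes needed for a majority of the currently assigned votes."""
        return sum(self.votes.values()) // 2 + 1


cluster = DynamicQuorumCluster([f"EX{i}" for i in range(1, 8)])
print(cluster.required)   # prints 4 (majority of 7 votes)

cluster.node_down("EX7")
cluster.node_down("EX6")
print(cluster.required)   # prints 3 (majority of 5 votes)

cluster.node_up("EX7")
print(cluster.required)   # prints 4 (majority of 6 votes)
```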

Page 28: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Quorum

By dynamically adjusting the quorum majority requirement, a cluster can sustain sequential node shutdowns down to a single node
• This is referred to as a “Last Man Standing” scenario

Page 29: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Quorum

Does not allow a cluster to sustain a simultaneous failure of a majority of voting members
• To continue running, the cluster must always maintain quorum after a node shutdown or failure

If you manually remove a node’s vote, the cluster does not dynamically add the vote back
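The difference between sequential and simultaneous failures can be illustrated with a small simulation. This is a sketch under stated assumptions: nodes are hypothetical (n1 to n7), no witness is modeled, and when two nodes remain the cluster is assumed to remove the vote from one of them so that a 1-1 tie cannot occur (here, arbitrarily, the node whose name sorts first).

```python
def cluster_survives(nodes, failure_batches):
    """Each batch is a set of nodes that fail at the same moment."""
    weight = {name: 1 for name in nodes}
    for batch in failure_batches:
        total = sum(weight.values())
        lost = sum(weight[name] for name in batch)
        if (total - lost) * 2 <= total:
            return False  # survivors do not hold a majority of current votes
        for name in batch:
            del weight[name]  # quorum is recalculated without the failed nodes
        if len(weight) == 2:
            # Assumption: with two nodes left, one node's vote is set to 0.
            weight[min(weight)] = 0
    return True


nodes = [f"n{i}" for i in range(1, 8)]

# One node at a time, ending with the non-voting node of the last pair.
sequential = [{"n7"}, {"n6"}, {"n5"}, {"n4"}, {"n3"}, {"n1"}]
# Four of seven voting members at once.
simultaneous = [{"n1", "n2", "n3", "n4"}]

print(cluster_survives(nodes, sequential))    # prints True
print(cluster_survives(nodes, simultaneous))  # prints False
```

In this model the last step only succeeds because the node that stops is the one without a vote; if the voting node of the final pair fails instead, the cluster goes down.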

Page 30: Exchange Server 2013 Site Resilience Scott Schnoll

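Dynamic Quorum

Majority of 7 required

Page 31: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Quorum

[Diagram: three members marked as failed (X)]

Majority of 4 required (previously majority of 7)

Page 32: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Quorum

[Diagram: four members marked as failed (X)]

Majority of 3 required

Page 33: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Quorum

[Diagram: five members marked as failed (X)]

Majority of 2 required

Page 34: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Quorum

[Diagram: five members marked as failed (X)]

Majority of 2 required

Page 35: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Quorum

[Diagram: five members marked as failed (X); the two remaining members are labeled 1 and 0]

Majority of 2 required

Page 36: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Quorum

[Diagram: five members marked as failed (X); the two remaining members are labeled 0 and 1]

Majority of 2 required

Page 37: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Quorum

[Diagram: five members marked as failed (X); the two remaining members are labeled 0 and 1; one additional X mark]

Majority of 2 required

Page 38: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Quorum

[Diagram: five members marked as failed (X); the two remaining members are labeled 0 and 1; two additional X marks]

Majority of 2 required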

Page 39: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Quorum

Use Get-ClusterNode to verify votes
• 0 = does not have quorum vote
• 1 = has quorum vote

Get-ClusterNode <Name> | ft name, *weight, state

Name DynamicWeight NodeWeight State
---- ------------- ---------- -----
EX1  1             1          Up

Page 40: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Quorum

Works with most DAGs
• Third-party replication DAGs not tested

All internal testing has it enabled

Office 365 servers use it

Exchange is not dynamic quorum-aware

Does not change quorum requirements

Page 41: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Quorum

Cluster team guidance:
• Generally increases the availability of the cluster
• Enabled by default, strongly recommended to leave enabled
• Allows the cluster to continue running in failure scenarios that are not possible when this option is disabled

Exchange team guidance:
• Leave it enabled for the majority of DAG members
• In some cases where a Windows 2008 R2 DAG would have lost quorum, a Windows 2012 DAG can maintain quorum
• Don’t factor it into availability plans

Page 42: Exchange Server 2013 Site Resilience Scott Schnoll

Dynamic Witness

Witness Offline
• Witness vote gets removed by the cluster

Witness Online
• If necessary, Witness vote is added back by the cluster

Witness Failure
• Witness vote gets removed by the cluster

Windows Server 2012 R2 and later
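The "if necessary" rule for the witness vote can be sketched as below. The model assumes the cluster gives the witness a vote only when it is online and the number of node votes is even, so that the total stays odd; `witness_vote` is a hypothetical helper, not a cluster API.

```python
def witness_vote(node_votes: int, witness_online: bool) -> int:
    """Return 1 if the witness should hold a vote, otherwise 0."""
    if not witness_online:
        return 0  # offline or failed witness: its vote is removed
    # Assumption: the witness votes only when needed to make the total odd.
    return 1 if node_votes % 2 == 0 else 0


for nodes, online in [(4, True), (3, True), (4, False)]:
    total = nodes + witness_vote(nodes, online)
    print(f"node votes={nodes} witness online={online} total votes={total}")
```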

Page 43: Exchange Server 2013 Site Resilience Scott Schnoll

Questions?