Scott Schnoll, Principal Technical Writer, Microsoft Corporation

Session Code: UNC322

Agenda

Disk Storage Technology in 2010 and Beyond

Large Mailbox Value

Store and ESE Database Innovations

Designing Exchange 2010 Storage Solutions

Disk Storage Technology in 2010 and Beyond

Disk capacity trend predicted to continue

• 2 TB desktop-class SATA disks available (3-4 TB next year)

• 1 TB Nearline/Midline SAS disk available (2 TB end of year)

Sequential I/O throughput increasing linearly based on areal density (2010 SATA = ~250 MB/sec)

Random I/O performance not expected to improve substantially (15K RPM is the ceiling)

Solid State Disks (SSD)/Flash

• High $/GB, low $/IO

• Write performance improving

• Reliability mostly addressed with wear leveling

Random vs. Sequential Disk IO

Random IO

• Disk head has to move to process subsequent IO

• Head movement = High IO latency

• Seek latency limits IOPS

Sequential IO

• Disk head does not move to process subsequent IO

• Stationary head = Low IO latency

• Disk RPM speed limits IOPS

7.2K RPM SATA Disk (20 ms latency)

• Random = 50 IOPS

• Sequential = +300 IOPS!

IOPS = input/output operations (IOs) per second
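The 50 IOPS figure for random IO follows directly from the 20 ms latency if the disk serves one IO at a time. The sketch below shows that arithmetic, plus the rotation time of a 7.2K RPM disk. The function names are illustrative, and the no-queuing model is a simplifying assumption; the slide's sequential figure is not derived here.

```python
def random_iops(latency_ms: float) -> float:
    """IOs per second if every IO takes latency_ms and IOs are served one at a time."""
    return 1000.0 / latency_ms


def rotation_ms(rpm: int) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


print(random_iops(20))               # 50.0
print(round(rotation_ms(7200), 2))   # 8.33
```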

E-mail Trends

E-mail volume is growing

Users expect larger corporate mailboxes

E-mail is business critical

• Time loss after a failure is measured in seconds

• Data loss after a failure needs to be close to zero

Business users report that they currently spend 19% of their work day, or close to 2 hours/day, on e-mail. (Radicati, 2007)

The average corporate user, today, can expect to send and receive about 156 messages a day, and this number is expected to grow to about 233 messages a day by 2012. An increase of 33% over the four-year period. (Radicati, 2008)

Large Mailbox Value

Large Mailbox: 1-10 GB+

Aggregate Mailbox = Primary Mailbox + Archive Mailbox

~1 Year of mail (minimum)

Increased knowledge worker productivity

Reduced mailbox management

Client Accessibility (Outlook/OWA/Mobile)

Eliminate/Reduce PSTs

Time       Items      Mailbox Size (MB)

1 Day      200        15

1 Month    4,000      ~300

1 Year     52,000     ~3,800

4 Years    208,000    ~15,000

Profile: 160 Receive + 40 Send/Day, 75 KB messages, no deletions, 5-day work week
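The table can be reproduced from the stated profile. This is a minimal sketch, assuming 75 KB is the average message size, a month is 4 work weeks, a year is 52 work weeks, and 1 MB = 1024 KB; none of these conventions is stated on the slide, but together they land close to the rounded figures above.

```python
KB_PER_MB = 1024            # assumption: binary megabytes

ITEMS_PER_DAY = 160 + 40    # received + sent, from the profile
MESSAGE_KB = 75             # assumed to be the average message size

# Work days per period, assuming a 5-day week, 4-week month and 52-week year.
WORK_DAYS = {"1 Day": 1, "1 Month": 20, "1 Year": 260, "4 Years": 1040}


def mailbox_growth(work_days: int) -> tuple[int, float]:
    """Return (item count, mailbox size in MB) after the given number of work days."""
    items = ITEMS_PER_DAY * work_days
    size_mb = items * MESSAGE_KB / KB_PER_MB
    return items, size_mb


for period, days in WORK_DAYS.items():
    items, size_mb = mailbox_growth(days)
    print(f"{period:8} {items:>8,} items  {size_mb:>9,.0f} MB")
```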

Large Mailbox Challenges & Solutions: Client Experience

Performance Improvements: Office 2007 SP2 (KB 953195)

Updated OST sizing guidance (10 GB)

Utilize the Archive Mailbox to reduce data cached to OST

Store/ESE changes

Outlook 2007 Performance (Cached Exchange Mode)

Outlook 2007 (Online)/OWA Performance

Items/folder Limitations

View Creation Performance

Client Search Performance

Store/ESE changes

Search Performance Improvements

Real-time result views

2x increase in indexing performance

Store/ESE changes

Large Mailbox Challenges & Solutions: Deployment/Operations

Backup off passive copies

Daily Incremental/Weekly Full backups

DPM Express Full Backups

HA + Hold Policy is your backup

Long Backup Times

Fast Recovery Requirements (RTO)

High Storage Costs

• IOPS (efficiently utilizing low performance/high capacity disks)

• RAID overhead

HA

Store/ESE changes

Move Mailbox Downtime

Online Move Mailbox

Database Maintenance

Online Maintenance Duration (OLD)

DB corruption (-1018) pain point

DB re-seed performance hit on active copy

Store/ESE changes

Exchange 2010 Storage Vision

IO Reduction

Sequential IO

Large, Fast, Low-cost Mailboxes

SATA/Tier 2 Disk Optimization

Storage Design Flexibility

RAID’less Storage (JBOD)

Store Schema Changes: IOPS Reductions

Store Schema: the way the Store organizes data in the ESE Database

One simple theme: move away from doing many, random, small size, disk IOs to doing fewer, sequential, large size, disk IOs

Store Schema Changes: IOPS Reductions

Significant benefits, including fast/efficient…

OWA/Outlook Online Mode

• …end user viewing for “cold” states/first time view creation

• …Calendar operations

• …Search performance

Outlook Cached Mode/Exchange ActiveSync

• OST sync = sequential IO

• EAS sync = sequential IO

Server Management

• …Move mailbox

• …Content index crawls

Store Table Architecture: IOPS Reduction

[Figure: store table layout, E2007 vs. E2010, with sample rows]

E2007

• Per database: Mailbox Table (Jeff’s Mbx, Ann’s Mbx, Joe’s Mbx)

• Per database: Folders Table (Jeff:Inbox, Ann:Drafts, Joe:Unread)

• Per database: Message Table (Joe:Msg10, Jeff:Msg32, Ann:Msg180)

• Per database: Attachments Table (Jeff:Excel.xls, Ann:Pic.bmp, Joe:Help.doc)

• Per folder: Message/Folder Table (MFT) (Joe:Inbox:H3, Joe:Inbox:H2, Joe:Inbox:H1)

• Secondary indexes used for views

E2010

• Per database: Mailbox Table (Jeff’s Mbx, Ann’s Mbx, Joe’s Mbx)

• Per mailbox: Folders Table (Joe:Inbox, Joe:Drafts, Joe:Unread)

• Per mailbox: Message Header Table (Joe:H10, Joe:H302, Joe:H920)

• Per mailbox: Body (Joe:Msg10, Joe:Help.doc, Joe:Msg302)

• Per view: View Tables, e.g. From (Joe:H920, Joe:H302, Joe:H10)

New Store Schema = no more single instance storage within a DB

Store Schema Changes: Physical Contiguity

[Figure: DB pages (page numbers) backing one B+ tree; B+ tree = table]

Ex2007: pages 1078, 92, 4577, 6, 872, 7210, 3278, 21, 9346

• Many, small size, IOs (1 per 8K page)

Ex2010: pages 1078, 1079, 1080, 1081, 1082, 1083, 3456, 3457, 3458

• Fewer, larger size, sequential IOs
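One way to see the difference is to count runs of consecutive page numbers in the two page lists from the diagram. The sketch below assumes, as a simplification, that each run of adjacent pages can be serviced by a single sequential IO; `count_sequential_runs` is an illustrative helper, not part of ESE.

```python
def count_sequential_runs(pages):
    """Count runs of consecutive page numbers, taking the pages in read order."""
    runs = 0
    previous = None
    for page in pages:
        # A new run starts whenever the page does not directly follow the previous one.
        if previous is None or page != previous + 1:
            runs += 1
        previous = page
    return runs


ex2007_pages = [1078, 92, 4577, 6, 872, 7210, 3278, 21, 9346]
ex2010_pages = [1078, 1079, 1080, 1081, 1082, 1083, 3456, 3457, 3458]

print(count_sequential_runs(ex2007_pages))  # 9
print(count_sequential_runs(ex2010_pages))  # 2
```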

Store Schema Changes: Lazy View Updates

[Figure: DB I/O over time for the view “All Unread or Flagged items”. Events: M1 arrives, M2 arrives, M1 flagged, M3 arrives, M2 deleted. The user then uses OWA/Outlook Online and switches to this view.]

Ex2007, “Nickel & Dime” approach: many, random, IOs (1 per update)

Ex2010, “Pay to Play” approach: fewer, sequential, IOs (1 per view)

Reducing IO by deferring view updates

View updates utilize sequential IO
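The two approaches can be contrasted with a toy model that replays the five events from the timeline. This is only a sketch: the classes, the event tuples and the "one IO per view update" accounting are assumptions made for illustration, not the Store's actual data structures.

```python
EVENTS = [("arrive", "M1"), ("arrive", "M2"), ("flag", "M1"),
          ("arrive", "M3"), ("delete", "M2")]


def apply_event(view, event):
    """Apply one event to the set of messages shown in 'All Unread or Flagged items'."""
    kind, msg = event
    if kind == "delete":
        view.discard(msg)
    else:
        view.add(msg)   # a newly arrived (unread) or flagged message belongs in the view


class EagerView:
    """Updates the view on every event (one view IO per update)."""
    def __init__(self):
        self.items, self.io_count = set(), 0

    def on_event(self, event):
        apply_event(self.items, event)
        self.io_count += 1

    def open(self):
        return sorted(self.items)


class LazyView:
    """Queues events and applies them in one batch when the view is opened."""
    def __init__(self):
        self.items, self.pending, self.io_count = set(), [], 0

    def on_event(self, event):
        self.pending.append(event)

    def open(self):
        if self.pending:
            for event in self.pending:
                apply_event(self.items, event)
            self.pending.clear()
            self.io_count += 1   # one batched view update
        return sorted(self.items)


eager, lazy = EagerView(), LazyView()
for event in EVENTS:
    eager.on_event(event)
    lazy.on_event(event)

print(eager.open(), eager.io_count)  # ['M1', 'M3'] 5
print(lazy.open(), lazy.io_count)    # ['M1', 'M3'] 1
```

Both views end up with the same contents; the lazy one pays for a single update, and only if the user actually opens the view.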

ESE Changes: IOPS Reductions

Optimize for new Store Schema

• Space Hints (allocate database space in contiguous manner)

• Re-wrote how database maintenance works (maintain database contiguity)

• Utilize space efficiently (Database compression)

Increase IO Sizes

• DB page size increased from 8 KB to 32 KB

• Improved read/write IO coalescing (Gap coalescing)

• Provide improved async read capability (Pre-read)

Increase Cache Effectiveness

• 100 MB Checkpoint Depth (HA configurations only)

• DB Cache Compression (Dehydration)

• DB Cache Priority (Fast Evict)

Space Management: Allocate space based on contiguity

Database Space Allocation Hints:

• Allocate DB space based on either data compactness or data contiguity (usage pattern)

[Figure: DB cache pages X and Y (Msg Header) and Z (EventHistory) are written to a disk where Page 1 and Page 3 are already used. The Msg Header pages land together on Page 4 and Page 5 (space contiguity); the EventHistory page fills the free Page 2 (space compactness). Labels: Contiguity = Sequential/Bloat; Compactness = Random/Compact.]

Maintain Contiguity: IOPS Reductions

New database maintenance architecture

ESE Function: Cleanup (deleted items/mailboxes)

• Exchange 2007 SP1: Cleanup performed during Online Defrag (OLD), which occurs during the Online Maintenance (OLM) time window

• Exchange 2010: ESE performs cleanup at run time (when store hard delete occurs). Happens during Store dumpster cleanup (OLM); pages are zeroed by default.

ESE Function: Space Compaction

• Exchange 2007 SP1: Database is compacted and space reclaimed during Online Defrag (OLD)

• Exchange 2010: Database is compacted and space reclaimed at run time by OLD2. Auto-throttled.

ESE Function: Maintain Contiguity

• Exchange 2007 SP1: N/A. Contiguity is compromised by space compaction.

• Exchange 2010: Database is analyzed for contiguity and space at run time and is defragmented in the background (B+Tree Defrag/OLD2). Auto-throttled.

ESE Function: Database Checksum

• Exchange 2007 SP1: When configured, ½ of OLD maintenance window reserved for sequential scan (Checksum), manual throttle. Active DB copy only.

• Exchange 2010: Two options (both Active and Passive copies): (1) run DB Checksum in the background 24x7 (default), sequential IO; (2) run DB Checksum during OLM window, sequential IO.

DB Contiguity Results: IOPS Reductions

[Figure: maps of DB page numbers from production database analysis. Blue = contiguous (good), red = fragmented (bad). Ex2007 Message Folder Table (aka MFT): FRAGMENTED. Ex2010 Message Header Table (aka MsgHeader): CONTIGUOUS. Annotation: random deletes at the tail.]

Database Compression: Mitigate DB Space Growth

Store Schema change, Space Hints, B+Tree Defrag & 32 KB page size combine to increase DB file size by 20%.

Growth is 100% mitigated by Database Compression

• 7bit/XPRESS (based on LZ77) compression for message headers and text/HTML bodies (Long Values)

DB File Size Comparison (relative file size)

E2007/RTF    E2010/RTF    E2010/Mix    E2010/HTML

1.00         1.20         1.00         0.88

Test: 1 database, 750 x 250 MB mailboxes. RTF = RTF Compressed; Mix = 77% HTML, 15% RTF, 8% Text. Avg. message size = ~50 KB

DB Page Size Increased to 32 KB: IOPS Reductions

Ex2007 DB read of a 20 KB message (8 KB pages)

• On disk: Page 1 (Msg Header), Page 2 (unrelated), Page 3 (Msg Body), Page 4 (unrelated), Page 5 (Msg Body)

• Pages 1, 3 and 5 are read into the DB cache = 3 read IOs

Ex2010 DB read of a 20 KB message (32 KB pages)

• On disk: Page 1 (32 KB: Msg Header, Msg Body), Page 2 (32 KB: unrelated)

• Page 1 is read into the DB cache = 1 read IO
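The page counts in this comparison are a ceiling division of message size by page size. A minimal sketch, assuming no per-page overhead and one read IO per page when the pages are not adjacent on disk (`pages_for_message` is an illustrative name):

```python
import math


def pages_for_message(message_kb: float, page_kb: float) -> int:
    """Number of database pages needed to hold a message of the given size."""
    return math.ceil(message_kb / page_kb)


print(pages_for_message(20, 8))    # 3
print(pages_for_message(20, 32))   # 1
```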

IO Gap Coalescing: IOPS Reduction (Read Case)

On disk: Page 1 (Msg Header), Page 2 (unrelated), Page 3 (Msg Body), Page 4 (unrelated), Page 5 (Msg Body)

Ex2007 DB read behavior

• Pages 1, 3 and 5 are read into the DB cache separately = 3 read IOs

Ex2010 DB read behavior

• Pages 1 through 5 are read together; Pages 1, 3 and 5 go to the DB cache, Pages 2 and 4 go to a temp buffer = 1 read IO
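The read case can be sketched as merging the wanted pages into ranges while tolerating small gaps of unwanted pages. This is an illustration only: `max_gap` is a hypothetical tuning parameter, and the slide does not say how ESE decides which gaps are worth bridging.

```python
def coalesce_reads(wanted_pages, max_gap):
    """Group wanted pages into (first, last) read ranges.

    Two wanted pages share a range when at most max_gap unwanted pages lie between them.
    """
    ranges = []
    for page in sorted(wanted_pages):
        if ranges and page - ranges[-1][1] - 1 <= max_gap:
            ranges[-1][1] = page          # extend the current range across the gap
        else:
            ranges.append([page, page])   # start a new read IO
    return [tuple(r) for r in ranges]


def discarded_pages(wanted_pages, ranges):
    """Pages that are read only to bridge gaps (the temp-buffer pages)."""
    wanted = set(wanted_pages)
    return [p for first, last in ranges
            for p in range(first, last + 1) if p not in wanted]


wanted = [1, 3, 5]
print(coalesce_reads(wanted, max_gap=0))   # [(1, 1), (3, 3), (5, 5)]
coalesced = coalesce_reads(wanted, max_gap=1)
print(coalesced)                           # [(1, 5)]
print(discarded_pages(wanted, coalesced))  # [2, 4]
```

The trade-off is extra bytes transferred (pages 2 and 4) in exchange for fewer, larger IOs, which suits disks where each IO is expensive and sequential throughput is cheap.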

100 MB Checkpoint Depth (Active Copies): IOPS Reductions

Checkpoint Depth: amount of data waiting to be committed to the database file (edb)

Default Checkpoint Depth Max increased from 20 MB to 100 MB for active mailbox databases configured for HA (non-HA is 20 MB, passive is 5 MB)

[Chart: Database Pages Repeatedly Written/sec and DB Writes/sec (avg) vs. Checkpoint Depth (20 to 100 MB). Loadgen test: 3000 mailboxes, 12 DBs, Outlook 2007 Online Very Heavy Profile.]

100 MB Checkpoint Depth = 40% DB write IO reduction

Deep Checkpoint Benefit = Efficient DB writes (~40% reduction)

Deep Checkpoint Risks: long store shutdown times, long crash recovery times

Risk Mitigation: shut down databases in parallel, fail over on store crash

DB Cache Compression: IOPS Reductions

Problem: New Store Schema + 32 KB pages can reduce the efficiency of the cache (e.g., a page with 8 KB of data consumes 32 KB of memory in the DB Cache)

Solution: Implement DB Cache Compression to shrink partially used cached pages in memory, allowing a more effective cache

1. 32 KB page with only 8 KB of data is read off disk

2. 32 KB page is compressed to an 8 KB in-memory image

[Figure: Page 1 (32 KB, holding 8 KB of data) on disk becomes Page 1 (8 KB) in the DB cache]

Up to 30% more cache/mailbox server

More Cache = Less DB IO!

DB Cache Priority: IOPS Reductions

Problem: Background and recovery DB operations can pollute the cache (e.g., checksumming, OLD2, HA log replay)

Solution: Implement DB Cache Priority to allow lower cache priorities for background/replay operations

[Figure: DB cache timeline (past, now, future) running from cache entry to cache eviction, showing pages from Outlook Message Read, HA Log Replay (Passive) and DB Maintenance]

ESE Caching Algorithm = LRU-K (Least Recently Used)

Optimize for SATA/Tier 2 Disks: DB Write IO “Burstiness”

Problem: Bursty DB writes negatively affect DB read and log write latency, because the more write IOs are issued at a time, the more disk contention there is

[Chart: IO Latency Based on Max DB Write IOs. X axis: Maximum DB Write IOs Issued (2, 4, 8, 16, 32, 64). Y axis: Latency (ms). Series: DB Read IO, Log Write IO. Test: single 7.2k SATA disk, logs/DB on same spindle, Loadgen load generating 250 RPC operations/second, ~50 IOPS.]

Solution: Throttle DB writes based on Checkpoint target (QoS), DB Write Smoothing

DB Write Smoothing Results

Test: 3000 mailboxes, 3 MB DB cache/user, 12 x 7.2k SATA disks (DB/logs on same spindles), Loadgen Outlook 2007 Online Very Heavy Profile

Ex2010 Smooth DB IO Benefit

Metric                    Exchange 2010 Baseline    Exchange 2010 Smooth DB IO

DB Read Latency (ms)      49                        34

Log Write Latency (ms)    3.7                       0.7

RPC Average Latency       10.1                      5.1

50% Reduction!

Ex2007 vs. Ex2010 IOPS Reduction Results

[Chart: DB IOPS Comparison, E2007 vs. E2010. Series: DB Read IO/Sec, DB Write IO/Sec, DB IO/Sec.]

+70% Reduction!

Test: 3000 mailboxes, 3 MB DB cache/user, Loadgen Outlook 2007 Online Very Heavy Profile, 250 MB mailbox size, E2010 Beta

Exchange IOPS Trend

[Chart: DB IOPS/Mailbox for Exchange 2003, Exchange 2007 and Exchange 2010]

+90% Reduction!

JBOD/RAID'less Storage: Now an option!

JBOD: 1 disk = 1 Database/Log

Requires Exchange 2010 HA (3+ DB Copies)

Annual Disk Failure Rate (AFR) = ~5%

JBOD Advantages: Reducing Storage Costs/Complexity

Eliminates unnecessary DB copies: Server and Storage redundancy can be symmetrical

Reduces Disk IO: Eliminates RAID write penalty

Enables Simple Storage Design: 1 disk = 1 database

Enables Simple Storage Failure Recovery

JBOD Challenges: Exchange HA/Storage must replace RAID functionality

Disk Striping performance (e.g. RAID10) cannot be leveraged

Disk Failure = Database Failover (~30 second outage)

Re-enabling Resiliency = Spare disk assignment/partitioning/format/DB re-seed (scriptable)

Soft Disk Errors (bad blocks) must be detected and repaired

JBOD/RAID'less Storage: Single Page Restore (Active)

[Figure: Database Availability Group (DAG) with Mailbox Server Nodes 1, 2 and 3 hosting DB1-Active, DB1-CopyA and DB1-CopyB. Each copy has a database (Page1, Page2, Page3) and a log.]

1. Page corruption detected on Active Copy (e.g. -1018)

2. Active DB places marker in log stream to notify passive copies to ship up-to-date page

3. Passive receives log and replays up to marker, retrieves good page, invokes Replication service callback and ships page

4. Active receives good page, writes page to DB. Page is restored.

5. Subsequent page repair from additional copies ignored
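The five steps can be modelled with a toy example. It is a sketch under heavy assumptions: `DbCopy` and `single_page_restore` are hypothetical names, CRC32 stands in for the real page checksum, and the log marker and replay of steps 2 and 3 are reduced to simply asking each passive copy for its page.

```python
import zlib


class DbCopy:
    """A database copy: page number -> page bytes, plus the checksum each page should have."""
    def __init__(self, name, pages):
        self.name = name
        self.pages = dict(pages)
        self.checksums = {n: zlib.crc32(d) for n, d in pages.items()}

    def is_corrupt(self, page_no):
        return zlib.crc32(self.pages[page_no]) != self.checksums[page_no]


def single_page_restore(active, passives, page_no):
    """Patch one corrupt page on the active copy; return the name of the copy used."""
    if not active.is_corrupt(page_no):                 # step 1: detect corruption
        return None
    repaired_from = None
    for passive in passives:                           # steps 2-3: each passive ships its page
        shipped = passive.pages[page_no]
        good = zlib.crc32(shipped) == active.checksums[page_no]
        if repaired_from is None and good:
            active.pages[page_no] = shipped            # step 4: write the good page
            repaired_from = passive.name
        # step 5: pages shipped after the repair are ignored
    return repaired_from


pages = {1: b"header", 2: b"body", 3: b"more body"}
active = DbCopy("DB1-Active", pages)
copy_a = DbCopy("DB1-CopyA", pages)
copy_b = DbCopy("DB1-CopyB", pages)

active.pages[2] = b"garbage"                           # simulate a bad block on the active disk
print(single_page_restore(active, [copy_a, copy_b], 2))  # DB1-CopyA
print(active.pages[2])                                   # b'body'
```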

Storage Design Flexibility

SAN

• HA = Shared Storage Cluster

• +1.0 IOPS/Mailbox

• 3.5” 15K 146GB FC Disks

• RAID10 for DB & Logs

• Dedicated Spindles

• Multi-path (HBAs, FC Switches, SAN array controllers)

• Backup = Streaming off active

• Fast Recovery = Hardware VSS (Snapshots/Clones)

DAS (SAS)

• HA = CCR

• .33 IOPS/Mailbox

• 2.5” 146GB 10K SAS Disks

• RAID5 for DB

• RAID10 for Logs

• SAS Array Controller (w/ BBU)

• Backup = VSS Snapshot

• Fast Recovery = CCR

DAS (SATA/Tier2)

• HA = DAG (2+ DB copies)

• .11 IOPS/Mailbox

• 3.5” 1TB 7.2K SATA/Tier2 Disks

• RAID10 for DB & Logs

• SAS Array Controller (w/ BBU)

• Backup = VSS Snapshot/Optional

• Fast Recovery = Database Failover

JBOD (SATA/Tier2)

• HA = DAG (3+ DB copies)

• .11 IOPS/Mailbox

• 3.5” 1TB 7.2K SATA/Tier2 Disks

• 1 DB = 1 Disk

• SAS Array Controller (w/ BBU)

• Backup = VSS Snapshot/Optional

• Fast Recovery = Database Failover

More options to reduce storage cost

Storage Design Flexibility

Personal Archive provides mailbox storage flexibility

Exchange 2010 supports DAS, SAN, JBOD*, RAID, SATA class, Enterprise Class, SSD**

Exchange 2010 has been optimized for DAS storage and Tier 2 (SATA) disks

IOPS reductions/SATA optimizations enable lower performing storage!

Maximum number of databases/server = 100

Max recommended DB Size = 2 TB*

Max recommended Folder Item Count = 100,000***

* 2+ HA copies only

** Not recommended for mainstream due to high $/GB

*** Assuming no 3rd party applications

Exchange 2010 Storage Requirements

Storage Guidance columns: Stand Alone | HA (2 copies) | HA (3+ copies)

Storage Type: DAS, SAN (Fibre Channel, iSCSI)

Disk Type: SAS, Fibre Channel, SATA/Tier2, SSD

RAID: RAID recommended | RAID optional

RAID Type: RAID-1/0, RAID-5, RAID-6 | JBOD

DB/Log Isolation: Best Practice | Not required

Windows Disk Type: Basic (recommended), Dynamic (supported)

Partition Type: GPT (recommended), MBR (supported)

Partition Alignment: Windows 2008/R2 Default (1 MB)

File System: NTFS

NTFS Allocation Unit Size: 64 KB for both database and log volumes

Encryption Support: Outlook Protection Rules, BitLocker

See Appendix for full details

HA/JBOD Example: Single Site, 3 Node, 3 Copy DAG

[Figure: Database Availability Group (DAG) of Mbx Server 1, Mbx Server 2 and Mbx Server 3. Each server holds a copy of DB1 through DB30 plus online spare disks. Legend: active copy, passive copy, spare disk.]

10,000 Mailboxes

3,333 Active Mailboxes/Server

2 GB Mailbox Size

120 Messages/day

.11 IOPS/Mailbox

Each server: 8 Cores, 32 GB RAM

JBOD: 30 disks/node

1 TB 7.2k RPM disks (SAS/SATA/Tier2)

Online Spares

Battery Backed Caching Array Controller

3 Nodes, 3 Copies = double disk failure resiliency
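A quick sanity check of this example, using only figures from the slides: mailbox data per database, IOPS per database, and expected disk failures per year at the ~5% AFR quoted earlier. It assumes mailboxes are spread evenly across DB1 through DB30, that the 30 disks per node are the database disks, and that 1 TB = 1000 GB. It counts mailbox data only, with no allowance for logs, content index or database overhead.

```python
MAILBOXES = 10_000
MAILBOX_GB = 2
IOPS_PER_MAILBOX = 0.11
DATABASES = 30          # DB1..DB30 in the layout
NODES = 3
DISKS_PER_NODE = 30     # assumption: database disks, spares not counted
DISK_GB = 1000          # assumption: 1 TB = 1000 GB
AFR = 0.05              # ~5% annual disk failure rate, from the JBOD slide

mailboxes_per_db = MAILBOXES / DATABASES
db_size_gb = mailboxes_per_db * MAILBOX_GB            # mailbox data only
db_iops = mailboxes_per_db * IOPS_PER_MAILBOX
expected_failures_per_year = NODES * DISKS_PER_NODE * AFR

print(f"{mailboxes_per_db:.0f} mailboxes per database")
print(f"{db_size_gb:.0f} GB of mailbox data per {DISK_GB} GB disk")
print(f"{db_iops:.1f} IOPS per database")
print(f"{expected_failures_per_year:.1f} expected disk failures per year")
```

Under these assumptions each database needs roughly 37 IOPS, below the 50 random IOPS quoted earlier for a 7.2K RPM SATA disk, and the group should expect several disk failures a year, which is why spare assignment and re-seed need to be scriptable.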

Key Takeaways

Exchange Server 2010…

• Reduces DB IOPS by +70%...again!

• Optimizes for large mailboxes (10 GB+) and 100,000 item counts

• Improves performance for SATA (Tier 2 class) disks

• Enables JBOD/RAID'less scenarios

• Enables unmatched storage flexibility to reduce costs

Resources

Resources for IT Professionals: http://microsoft.com/technet

Resources for Developers: http://microsoft.com/msdn

Microsoft Certification & Training Resources: www.microsoft.com/learning


© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.