High-Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System

Harendra Kumar, Yuvraj Patel, Ram Kesavan, Sumith Makam

NetApp, Inc., University of Wisconsin-Madison



Example

Customer data center: “Freeing free block” panic

Support checklist
• Start a recovery run (fsck-like tool)
• Seek Engineering help to root-cause the panic

Recovery run?

Example

Parties involved: Engineering (India, USA) and the customer

Open questions
• Scribble bug or logic bug?
• H/W fault or S/W bug?
• When did the corruption happen?
• How long will the recovery run take?

Summary
• Bugs keep coming
  – Hardware faults
  – Software bugs
• Important to protect metadata for correctness
• Need of the hour
  – Simple techniques for strong data integrity
  – No/negligible performance impact (deployable)
  – Diagnostic capability

Our Solution
• Separate solutions for separate problems
  – Deployed in production
    • Incremental checksum for scribble bugs
    • Digest-based transaction auditing for logic bugs
  – In house
    • Page-level protection for diagnostics

Key Results
• Techniques protect metadata
  – Negligible performance impact
  – More than 3x reduction in recovery runs
  – Deployed in > 250K systems worldwide
• Field data (~5 years)
  – 33 systems protected from 8 unique scribble bugs
  – 50 systems protected from 9 unique logic bugs

Outline
• Introduction
• Scribble protection
• Page-level protection
• Digest-based transaction audit
• Evaluation
• Conclusion

Scribble protection
• Aim: prevent scribbles from corrupting metadata
• Rolling incremental checksum maintained on every metadata update

Incremental checksum example

Timeline for one indirect block (entries P Q R S T …)
1. Indirect block loaded in memory: P Q R S T …
   Adler 32-bit checksum of the full block = C; incremental checksum initialized to C
2. Indirect block modified (R → R’): P Q R’ S T …
   Incremental checksum = C’
3. Indirect block modified (Q → Q’): P Q’ R’ S T …
   Incremental checksum = C’’
4. Just before persisting
   • Compute the Adler 32-bit checksum of the full block = C’’
   • Compare the full checksum with the incremental checksum
5. On successful verification: persist to RAID/storage

Notes
• Metadata updates are small in size and frequent
• The cost of the incremental checksum depends on the amount of data modified, and the computation is cache-line friendly
• Concurrent incremental checksum computation is possible without locks
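
The update rule in the example above can be sketched in Python, assuming the standard Adler-32 definition as implemented by zlib.adler32. The helper names adler32_patch and write_tracked and the 4 KiB block size are illustrative assumptions, not WAFL code. Adler-32 is built from two running sums, so changing a few bytes only requires adjusting those sums by the byte differences. The cost therefore scales with the bytes modified rather than with the block size.

```python
import zlib

MOD = 65521  # Adler-32 modulus (largest prime below 2**16)


def adler32_patch(checksum, block_len, offset, old, new):
    """Return the Adler-32 of a block after the bytes starting at `offset`
    change from `old` to `new`, without rereading the rest of the block.
    Hypothetical helper, for illustration only."""
    assert len(old) == len(new)
    a = checksum & 0xFFFF          # 1 + sum of all bytes
    b = (checksum >> 16) & 0xFFFF  # sum of the running values of `a`
    for i, (o, n) in enumerate(zip(old, new)):
        delta = n - o
        a = (a + delta) % MOD
        # A byte at position p is counted (block_len - p) times in `b`.
        b = (b + (block_len - (offset + i)) * delta) % MOD
    return (b << 16) | a


def write_tracked(block, checksum, offset, new):
    """Legitimate update path: modify the block and patch the checksum."""
    old = bytes(block[offset:offset + len(new)])
    block[offset:offset + len(new)] = new
    return adler32_patch(checksum, len(block), offset, old, new)


block = bytearray(range(256)) * 16   # 4 KiB stand-in for an indirect block
incremental = zlib.adler32(block)    # C: full checksum at load time
incremental = write_tracked(block, incremental, 100, b"\xff\x00\x10\x20")  # C'
incremental = write_tracked(block, incremental, 8, b"\x42")                # C''
verified = incremental == zlib.adler32(block)  # check just before persisting

block[2000] ^= 0x01                  # stray write that bypasses write_tracked
scribble_detected = incremental != zlib.adler32(block)
```

Each patch only adds terms modulo 65521, so patches to disjoint byte ranges can be applied in any order. That property is presumably what makes lock-free concurrent updates feasible (for example with atomic additions), although the slides do not say how WAFL implements it.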

Incremental checksum example: scribble bug

Timeline for one indirect block (entries P Q R S T …)
1. Indirect block loaded in memory: P Q R S T …
   Adler 32-bit checksum of the full block = C; incremental checksum = C
2. Indirect block modified (R → R’): P Q R’ S T …
   Incremental checksum = C’
3. Scribble bug corrupts the indirect block (Q → Q’): P Q’ R’ S T …
   Incremental checksum is still C’
4. Just before persisting
   • Compute the Adler 32-bit checksum of the full block = C’’
   • Compare the full checksum with the incremental checksum (C’’ ≠ C’)
5. On verification failure: panic the system, because other metadata may also be corrupted

Without the incremental checksum, this scribble bug can lead to a “Freeing free block” panic.


Page-level protection
• Scribble bugs are caught only at the end of a consistency point (CP)
• Scribble bugs are difficult to root-cause
• Page permissions + Write Protect Enable (WP) bit
  – Keep pages read-only by default
  – Flip the WP bit before and after modification
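
The discipline above (read-only by default, writable only inside an explicit window) can be illustrated with a user-space analogy. This is not how page permissions or the WP bit work inside a kernel. It is a toy Python sketch in which read-only memoryview objects stand in for write-protected pages, and GuardedBlock and write_window are hypothetical names. The point it shows is diagnostic: a stray write fails at the moment it happens, rather than being discovered later at the end of a CP.

```python
from contextlib import contextmanager


class GuardedBlock:
    """Toy stand-in for a metadata page that is read-only by default."""

    def __init__(self, size):
        self._buf = bytearray(size)

    @property
    def view(self):
        # Outside a write window, callers only ever get a read-only view.
        return memoryview(self._buf).toreadonly()

    @contextmanager
    def write_window(self):
        # Analogous to enabling writes just before a legitimate update ...
        writable = memoryview(self._buf)
        try:
            yield writable
        finally:
            # ... and disabling them again right after it.
            writable.release()


page = GuardedBlock(4096)
with page.write_window() as w:
    w[0] = 0x7F                  # legitimate, bracketed modification

try:
    page.view[1] = 0xFF          # stray write outside any window
    stray_write_blocked = False
except TypeError:
    stray_write_blocked = True   # fails immediately, at the faulting write
```

A reference kept past the end of the window fails as well, because the writable view is released when the window closes.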


Digest-based transaction auditing
• Logic bugs and their nature
• Distributed invariants → consistency equations
• Lightweight digest (transaction checksum)
  – Maintain different digests for different invariants

Digest-based transaction auditing

[Figure: persisted state. Inode → A → B (contents XYZ). Bitmap for blocks (A) (B) (C) (D) (E) = 1 1 0 0 0.]

Client modifies inode A
• Adds a new block

[Figure: in-memory state of the inode. Inode → A → B (contents XYZ), plus new contents PQR.]

Digest-based transaction auditing

[Figure: during the CP, the tree Inode → A → B (XYZ), PQR is written out as Inode → D → B (XYZ), C (PQR). A is the freed block; C and D are the allocated blocks. Bitmap for (A) (B) (C) (D) (E) changes from 1 1 0 0 0 to 0 1 1 1 0.]

During CP
• During indirect block updates
  – Maintain blocks-allocated digest D1 = C + D
  – Maintain blocks-freed digest D2 = A
• During bitmap updates
  – Maintain blocks-allocated digest D3 = C + D
  – Maintain blocks-freed digest D4 = A

End of CP: compare digests
1. D1 == D3
2. D2 == D4

Digest-based transaction auditing
• Digests are easy to maintain
• Lightweight: a strong one-to-one audit is avoided
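
The audit in this example can be sketched in Python. The sketch assumes, following the "D1 = C + D" notation on the slides, that a digest is a plain sum of block numbers kept in a 64-bit word; the digest function actually used in WAFL is not given here and may well be stronger. CpAudit, run_cp, the block numbers 1 to 5, and the lost_bitmap_update parameter (which simulates a bitmap update lost to a race) are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

MASK = (1 << 64) - 1  # keep digests in a fixed-width word (assumption)


@dataclass
class CpAudit:
    d1: int = 0  # blocks allocated, as seen by indirect block updates
    d2: int = 0  # blocks freed, as seen by indirect block updates
    d3: int = 0  # blocks allocated, as seen by bitmap updates
    d4: int = 0  # blocks freed, as seen by bitmap updates

    def failed_equations(self):
        """Evaluate the consistency equations at the end of the CP."""
        failed = []
        if self.d1 != self.d3:
            failed.append("D1 == D3")
        if self.d2 != self.d4:
            failed.append("D2 == D4")
        return failed


A, B, C, D, E = 1, 2, 3, 4, 5  # hypothetical block numbers


def run_cp(bitmap, lost_bitmap_update=None):
    audit = CpAudit()
    # Indirect block side: A is rewritten at D, the new data lands in C.
    for blk in (C, D):
        audit.d1 = (audit.d1 + blk) & MASK
    audit.d2 = (audit.d2 + A) & MASK
    # Bitmap side: the same transaction, seen from the allocation bitmap.
    for blk in (C, D):
        if blk == lost_bitmap_update:
            continue  # simulated race: this bit is never set
        bitmap[blk] = 1
        audit.d3 = (audit.d3 + blk) & MASK
    bitmap[A] = 0
    audit.d4 = (audit.d4 + A) & MASK
    return audit.failed_equations()


good = {A: 1, B: 1, C: 0, D: 0, E: 0}
racy = dict(good)
ok_result = run_cp(good)                            # no equation fails
race_result = run_cp(racy, lost_bitmap_update=D)    # D1 == D3 fails
```

The audit costs a handful of additions per update plus two comparisons per CP, which is why it can stay enabled in production. A plain sum can miss errors that happen to cancel out, so treat the digest function here as a placeholder.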

Digest-based transaction auditing

[Figure: during the CP, the tree Inode → A → B (XYZ), PQR is written out as Inode → D → B (XYZ), C (PQR). A is the freed block; C and D are the allocated blocks. Bitmap for (A) (B) (C) (D) (E) changes from 1 1 0 0 0 to 0 1 1 0 0: the bit for D is not set.]

During CP
• During indirect block updates
  – Maintain blocks-allocated digest D1 = C + D
  – Maintain blocks-freed digest D2 = A
• During bitmap updates
  – Maintain blocks-allocated digest D3 = C (D not updated due to a race)
  – Maintain blocks-freed digest D4 = A

End of CP: compare digests
1. D1 != D3
2. D2 == D4

Digest-based transaction auditing
• Without digest-based transaction auditing, this race can lead to a “Freeing free block” panic


Evaluation
• Running on >250K systems for 5+ years
• Negligible regression on file server benchmarks (e.g., SPEC SFS)
• Heavy metadata updates by DB workloads
  – Database/OLTP benchmark (similar to SPC-1) built in-house

Performance Evaluation

[Figure: observed latency (ms, axis from 0 to 30) versus achieved throughput (IOPS, 80K to 128K) for two configurations, "all off" and "all on".]

Incremental checksum + digest-based transaction auditing performance (20+ audit equations)
• Negligible impact on throughput and latency up to 120K ops
• 25% increase in latency thereafter

Platform: high range, 20 cores, 128 GB DRAM, 8 GB NVRAM

Performance evaluation
• Page-level protection
  – 20% performance penalty
  – Used in-house (debug-only kernels)
  – Used only once in the field, to catch a recurring scribble bug

Protection from corruption bugs
• 5 years of data from in-house development
  – Unit test data hard to gather
  – 75 scribble bugs found by page-level protection
  – 32 scribble bugs found by incremental checksum
  – 23 logic bugs found by transaction auditing
• More than 3x reduction in the number of recovery runs from ONTAP 8.0 to 8.3


Conclusion
• Introduced two techniques to enforce data integrity with minimal performance impact
• Disproved the common belief that “strong data integrity requires a high performance penalty”
• End-to-end protection is applicable to databases and distributed applications
• Concentrate more on innovation than on worrying about data integrity

Thank you!

Questions?