Conquest: Preparing for Life After Disks October 2, 2003 An-I Andy Wang


Page 1:

Conquest: Preparing for Life After Disks

October 2, 2003

An-I Andy Wang

Page 2:

Conquest Overview
File systems are optimized for disks
    Performance problem
    Complexity
Now we have tons of inexpensive RAM
    What can we do with that RAM?

Page 3:

Conquest Approach
Combine disk and persistent RAM (e.g., battery-backed RAM) in a novel way
Simplification
    At least 20% smaller code base than ext2, reiserfs, and SGI XFS
Performance (under popular benchmarks)
    24% to 1900% faster than LRU disk caching
    Best performance boost since Berkeley FFS

Page 4:

Performance Problem of Disks

[Figure: accesses per second (log scale; 1 KHz, 1 MHz, 1 GHz) versus year (1990, 1995, 2000) for CPU (50%/yr), memory (50%/yr), and disk (15%/yr). Annotations on the plot: 10^5, 10^6, (1 sec : 6 days), (1 sec : 3 months).]

Genesis • Conquest Design • Performance Evaluation • Conclusion

Page 5:

Inside Pandora’s Box

Disk arm Disk platters

Access time = seek time (disk arm) + rotational delay (disk platter) + transfer time
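The formula above can be worked through with a small sketch. The numbers used here (9 ms seek, 7200 RPM, 4 KB request, 40 MB/s transfer rate) are illustrative assumptions, not figures from the talk, and the average rotational delay is taken to be half a rotation.

```python
def access_time_ms(seek_ms: float, rpm: float,
                   request_bytes: int, transfer_mb_per_s: float) -> float:
    """Access time = seek time + rotational delay + transfer time (in ms)."""
    # Assumption: on average the platter must turn half a rotation.
    rotational_delay_ms = 0.5 * 60_000.0 / rpm
    # Assumption: 1 MB = 1,000,000 bytes for the transfer rate.
    transfer_ms = request_bytes / (transfer_mb_per_s * 1_000_000) * 1000.0
    return seek_ms + rotational_delay_ms + transfer_ms


if __name__ == "__main__":
    # Hypothetical drive parameters, chosen only to illustrate the formula.
    print(round(access_time_ms(9.0, 7200, 4096, 40.0), 3))
```

With these assumed values the mechanical terms (seek and rotation) dominate the transfer term, which is the point the slide makes about the disk arm and platters.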

Page 6:

Disk Optimization Methods
Disk arm scheduling
Group information on disk
Disk readahead
Buffered writes
Disk caching
Data mirroring
Hardware parallelism

Page 7:

Complexity Bytes

synchronization

predictive readahead

cache replacement

elevator algorithm

data clustering

data consistency

asynchronous write

Page 8:

[Caceres et al., 1993; Hillyer et al., 1996; Qualstar 1998; Tanisys 1999; Quantum 2000; Micron Semiconductor Products 2002]


Storage Media Alternatives

[Figure: accesses/sec (log scale) versus $/MB (log scale), with axis ticks between 10^-6 and 10^6. Media shown: persistent RAM, magnetic RAM?, (write once) flash memory, disk, tape, battery-backed DRAM.]

Page 9:

The Genesis of Conquest
Idea: persistent-RAM-only file system
    Improved performance
    Remove disk-related complexity

Page 10:

[Grochowski 2002]

The Genesis of Conquest (2)
Problem: wrong growth curves
    Disk prices dropping faster than RAM prices
    Disks will stay around

[Figure: $/MB (log scale, 10^-2 to 10^2) versus year (1995 to 2005) for 3.5" HDD, 2.5" HDD, 1" HDD, and persistent RAM. Annotation: booming of digital photography.]

Page 11:

[Grochowski 2002]

The Genesis of Conquest (3)
New idea: hybrid system for transition
    Takes advantage of RAM speed
    Still simplifies code

[Figure: $/MB (log scale, 10^-2 to 10^2) versus year (1995 to 2005) for paper/film, 3.5" HDD, 2.5" HDD, 1" HDD, and persistent RAM. Annotations: booming of digital photography; 4 to 10 GB of persistent RAM.]

Page 12:

Conquest Design Questions
How to make effective use of RAM?
    Common usage patterns
    Physical characteristics of RAM storage
Where and how to reduce complexity?
    Data paths
    Data structures and associated management
    Shutdown/boot sequence
How to assure the integrity of file system components that reside in BB-DRAM?

Page 13:

[Ousterhout 1985; Baker et al., 1991; Iram 1993; Douceur and Bolosky 1999; Roselli et al., 2000; Evans and Kuenning 2002]


User Access Patterns
Small files
    Take little space (10%)
    Represent most accesses (90%)
Large files
    Take most space
    Mostly sequential accesses
Not characteristic of database applications

Page 14:

Characteristics of Storage Media
RAM
    Fast random accesses
    Cost-effective in performance
Disk
    Fast sequential accesses
    Cost-effective in storage

Page 15:

The Design of Conquest
Deliver all file system services from memory, with the exception of high-capacity storage
Persistent RAM
    Data content of small files (smaller than 1 MB)
    Metadata (file descriptions for large and small files, directories, and data structures)
Disk
    Data content of large files
Two separate data paths to memory and disk
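The placement rule on this slide can be sketched as a pair of small functions. This is an illustration only, not Conquest's code: the function names are hypothetical, 1 MB is assumed to mean 2^20 bytes, and a file of exactly 1 MB is assumed to stay in RAM because the benchmark slides later describe 1-MB files as in-core and 1.01-MB files as on-disk.

```python
LARGE_FILE_THRESHOLD = 1 << 20  # assumption: "1 MB" means 2**20 bytes


def data_location(file_size_bytes: int) -> str:
    """Where the data content of a file of this size would be stored."""
    if file_size_bytes < 0:
        raise ValueError("file size cannot be negative")
    # Assumption: the boundary case (exactly 1 MB) is treated as a small file.
    if file_size_bytes <= LARGE_FILE_THRESHOLD:
        return "persistent RAM"
    return "disk"


def metadata_location(file_size_bytes: int) -> str:
    """Metadata lives in persistent RAM for both small and large files."""
    return "persistent RAM"


if __name__ == "__main__":
    for size in (1024, 1 << 20, int(1.01 * (1 << 20)), 100 * (1 << 20)):
        print(size, data_location(size), metadata_location(size))
```

The two return values correspond to the two separate data paths: requests for small files and all metadata never touch the disk, and only large-file data goes through the disk path.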

Page 16:

[McKusick et al., 1990; Ganger et al., 2000; Roselli et al., 2000; Seltzer et al., 2000]


Conquest Alternatives
Disk caching
    Assumption of scarce memory
    Use disk as the final storage destination
    Complex mechanisms to maintain consistency
RAM drives and RAM file systems
    Not meant to be persistent
    Use disk-related mechanisms
    Limitations on storage capacity

Page 17:

Simplification of Data Paths

Page 18:

Content of Persistent RAM
Data content of small files (< 1 MB)
    No seek time or rotational delays
    Fast byte-level accesses
    Virtual contiguous allocation
Metadata (e.g., directories, file system states)
    Fast synchronous update
    No dual representations
    For both large and small files

Page 19:

Memory Data Path of Conquest

[Figure: two data paths side by side. Conventional File Systems: storage requests, I/O buffer management, I/O buffer, persistence support, disk management, disk. Conquest Memory Data Path: storage requests, persistence support, battery-backed RAM (small file and metadata storage).]

Page 20:

[Namesys 2002]

Large-File-Only Disk Storage
Only store the data content of large files
Allocate in big chunks
    Lower access overhead
    Reduced management overhead
No fragmentation management
No tricks for small files
    Storing data in metadata
No elaborate data structures
    Wrapping a balanced tree onto disk cylinders

Page 21:

Sequential-Access Large Files
Sequential disk accesses
    Near-raw bandwidth
Well-defined readahead semantics
Read-mostly
    Little synchronization overhead (between memory and disk)

Page 22:

Disk Data Path of Conquest

[Figure: two data paths side by side. Conventional File Systems: storage requests, I/O buffer management, I/O buffer, persistence support, disk management, disk. Conquest Disk Data Path: storage requests, I/O buffer management, I/O buffer, disk management, disk (large-file-only file system), alongside battery-backed RAM (small file and metadata storage).]

Page 23:

[Baker et al., 1991; Vogels 1999; Roselli et al., 2000]

Random-Access Large Files
Random access?
    Common definition: nonsequential access
    A typical movie has 150 scene changes
    MP3 stores the title at the end of the files
Near sequential access?
    Simplifies large-file metadata representation significantly

Page 24:

Simplification of Data Structures

Page 25:

Logical File Representation

File

Name(s) i-node File attributes

Data

Page 26:

Physical File Representation

File

Name(s) i-node File attributes Data locations

Data blocks

Page 27:

Ext2 Data Representation

[Figure: an ext2 i-node (stored on disk) holding 10 data block locations plus index block locations; index blocks point to further index blocks and to data block locations.]

Page 28:

Disadvantages with Ext2 Design
Optimization for small files makes things complex
Designed for disk storage
Random-access data structure for large files that are accessed mostly sequentially
Data access time dependent on the byte position in a file
Maximum file size is limited
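The last two points can be illustrated with a sketch of an ext2-style block map. The parameters are assumptions made for illustration: 10 direct block locations (as drawn on the previous slide), 4 KB blocks, 4-byte block pointers, and up to three levels of index blocks. The function name is hypothetical.

```python
BLOCK_SIZE = 4096        # assumption: 4 KB blocks
POINTER_SIZE = 4         # assumption: 4-byte block pointers
DIRECT_POINTERS = 10     # as drawn on the slide's i-node figure
POINTERS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE


def index_blocks_to_read(byte_offset: int) -> int:
    """Number of index blocks traversed to reach the block holding byte_offset."""
    if byte_offset < 0:
        raise ValueError("offset cannot be negative")
    block = byte_offset // BLOCK_SIZE
    if block < DIRECT_POINTERS:
        return 0  # reachable straight from the i-node
    block -= DIRECT_POINTERS
    reachable = POINTERS_PER_BLOCK
    for level in (1, 2, 3):  # single, double, triple indirection
        if block < reachable:
            return level
        block -= reachable
        reachable *= POINTERS_PER_BLOCK
    raise ValueError("offset exceeds the maximum file size of this layout")


if __name__ == "__main__":
    for offset in (0, 40 * 1024, 5 * 1024 * 1024, 5 * 1024**3):
        print(offset, index_blocks_to_read(offset))
```

Under these assumptions, bytes near the start of a file need no index block, while bytes deeper in the file need one, two, or three, and offsets past the triple-indirect range cannot be represented at all.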

Page 29:

Conquest Representation

[Figure: an i-node (stored in RAM) with an index array location pointing to an array of data block locations.]

Persistent RAM
    Single-level dynamically allocated index
    Fast data access for files stored in RAM

Page 30:

Conquest Representation (2)

[Figure: an i-node (stored in RAM) with a segment list location pointing to a list of (begin block location, end block location) pairs; the blocks they describe are stored on disk.]

Disk
    Worst case: sequential memory search for random disk locations
    Maximum file size limited by physical storage
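A segment-list lookup of the kind described on this slide might look like the sketch below. The representation is an assumption: each segment is a (begin block, end block) pair with both ends inclusive, and `locate_block` is a hypothetical name.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # assumption: (begin_block, end_block), both inclusive


def locate_block(segments: List[Segment], file_block: int) -> int:
    """Map a block number within the file to a disk block number.

    The list is walked from the front, which is the in-memory sequential
    search the slide names as the worst case for random accesses.
    """
    if file_block < 0:
        raise IndexError("file block cannot be negative")
    remaining = file_block
    for begin, end in segments:
        length = end - begin + 1
        if remaining < length:
            return begin + remaining
        remaining -= length
    raise IndexError("file block lies beyond the last segment")


if __name__ == "__main__":
    segments = [(100, 103), (500, 501), (40, 49)]  # hypothetical layout
    print([locate_block(segments, b) for b in (0, 3, 4, 6, 15)])
```

For a mostly sequential reader the search cost is small, because consecutive file blocks fall in the same or the next segment, and the list can grow until physical storage runs out.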

Page 31:

Conquest Directories
Per-directory hash tables stored in memory
Collisions resolved by rehashing
Hard links: multiple names point to same data
Problem:
    Dynamic resizing of directories
    Need to handle the current file position
    Important for rm -fr

Page 32:

The Difficulty With Shrinking rm -fr

[Figure: a directory i-node (stored in RAM) with a hash table location pointing to a hash table. Slots shown: three <empty> (NULL), one <deleted> (NULL), and three entries with i-node locations: 0110 | file1, 1001 | file2, 1000 | dir.]

Page 33:

The Difficulty With Shrinking rm -fr

[Figure: the same hash table after a deletion. Slots shown: two <deleted> (NULL), two <empty> (NULL), and two entries with i-node locations: 0110 | file1, 1001 | file2.]

Page 34:

The Difficulty With Shrinking rm -fr

[Figure: same content as the previous slide. Slots shown: two <deleted> (NULL), two <empty> (NULL), and two entries with i-node locations: 0110 | file1, 1001 | file2.]

Page 35:

The Difficulty With Shrinking rm -fr

[Figure: the hash table after shrinking. Slots shown: two <empty> (NULL) and two entries with i-node locations: 0110 | file1, 1001 | file2.]

Page 36:

The Difficulty With Shrinking rm -fr

[Figure: the shrunken hash table. Slots shown: two <empty> (NULL) and one entry with an i-node location: 0110 | file1.]

Quick fixes
    Never shrink hash tables (for rm -fr)
    No promises for ls while adding files

Page 37:

[Fagin et al., 1979]

Extensible Hash Tables
Use top, not bottom, bits of hash code

[Figure: a directory i-node (stored in RAM) with a hash table location pointing to a hash table. Slots shown: two <empty> (NULL) and two entries with i-node locations: 0110 | file1, 1001 | file2.]

Page 38:

Extensible Hash Tables
Preserve ordering of entries when resizing

[Figure: the hash table after resizing. Slots shown: four <empty> (NULL) and two entries with i-node locations: 0110 | file1, 1001 | file2.]
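Why top bits preserve ordering can be checked with a toy sketch. It is not Conquest's directory code: the 4-bit hash codes for file1 and file2 come from the slides, file3 and its code are hypothetical, and ties inside one slot are broken by the full hash code as a simplifying assumption.

```python
HASH_BITS = 4  # the slides show 4-bit hash codes such as 0110 and 1001


def slot_top_bits(hash_code: int, table_bits: int) -> int:
    """Slot index taken from the most significant bits of the hash code."""
    return hash_code >> (HASH_BITS - table_bits)


def slot_bottom_bits(hash_code: int, table_bits: int) -> int:
    """Slot index taken from the least significant bits (hash mod table size)."""
    return hash_code & ((1 << table_bits) - 1)


def scan_order(entries: dict, table_bits: int, slot_of) -> list:
    """Names in the order a slot-by-slot directory scan would visit them."""
    return sorted(entries,
                  key=lambda name: (slot_of(entries[name], table_bits),
                                    entries[name]))


if __name__ == "__main__":
    # file3 and its hash code are made up for the illustration.
    entries = {"file1": 0b0110, "file2": 0b1001, "file3": 0b0011}
    for bits in (2, 3):  # table of 4 slots, then doubled to 8 slots
        print(bits, scan_order(entries, bits, slot_top_bits),
              scan_order(entries, bits, slot_bottom_bits))
```

With top bits, an entry in slot i of the small table lands in slot 2i or 2i+1 of the doubled table, so a scan visits entries in the same relative order before and after resizing. With bottom bits the order can change, which is what makes a saved directory position unreliable during operations such as rm -fr.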

Page 39:

Additional Engineering Details
Dynamic file positioning
Need to handle collisions
Memory overhead and complexity tradeoffs

Page 40:

Simplification of Metadata Management

Page 41:

Metadata Allocation Requirements

Keep track of usage status of metadata entries

Avoid duplicate allocation with unique IDs

Fast retrieval of metadata with a given ID

ID: 30 | free
ID: 81 | in use
ID: 58 | free
ID: 16 | free
ID: 89 | in use
ID: 88 | free

Page 42:

Existing Memory Allocation Services

Keep track of unallocated memory

No duplicate allocation of physical addresses

Hmm…

ADDR 0xe000000 | free
ADDR 0xe000038 | in use
ADDR 0xe000070 | free
ADDR 0xe0000A8 | free
ADDR 0xe0000E0 | free
ADDR 0xe000118 | in use

Page 43:

Conquest Metadata Management
Metadata = memory allocated by memory manager
Metadata ID = physical address of metadata

ID: 30 | free
ID: 81 | in use
ID: 58 | free
ID: 16 | free
ID: 89 | in use
ID: 88 | free

ADDR 0xe000000 | free
ADDR 0xe000038 | in use
ADDR 0xe000070 | free
ADDR 0xe0000A8 | free
ADDR 0xe0000E0 | free
ADDR 0xe000118 | in use

Usage status

Unique IDs and fast retrieval
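The idea of letting the memory manager's bookkeeping double as metadata bookkeeping can be sketched with a toy pool. Everything here is an illustrative assumption: the class and method names are hypothetical, the base address and the 0x38 slot spacing are read off the addresses on the slide, and a Python list stands in for physical memory.

```python
class ToyMetadataPool:
    """Toy allocator in which a metadata ID is simply the slot's address."""

    BASE = 0xE000000  # first address shown on the slide
    SLOT = 0x38       # spacing between the addresses shown on the slide

    def __init__(self, slots: int) -> None:
        self._entries = [None] * slots
        self._in_use = [False] * slots  # the allocator's own usage status

    def allocate(self, metadata: dict) -> int:
        index = self._in_use.index(False)  # raises ValueError if the pool is full
        self._in_use[index] = True
        self._entries[index] = metadata
        return self.BASE + index * self.SLOT  # the address serves as the ID

    def lookup(self, metadata_id: int) -> dict:
        index, offset = divmod(metadata_id - self.BASE, self.SLOT)
        if offset != 0 or not 0 <= index < len(self._entries) or not self._in_use[index]:
            raise KeyError(hex(metadata_id))
        return self._entries[index]  # direct indexing, no search

    def free(self, metadata_id: int) -> None:
        self.lookup(metadata_id)  # validates the ID
        index = (metadata_id - self.BASE) // self.SLOT
        self._in_use[index] = False
        self._entries[index] = None


if __name__ == "__main__":
    pool = ToyMetadataPool(slots=6)
    file_id = pool.allocate({"name": "file1"})
    print(hex(file_id), pool.lookup(file_id))
```

The three requirements from the earlier slide fall out of the allocator: usage status is the allocator's free/in-use state, uniqueness follows because two live allocations never share an address, and retrieval by ID is a single address computation.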

Page 44:

Simplification of Shutdown/Boot Sequence

Page 45:

Persistence Support
Restore file system states after a reboot
    Data
    Metadata
Memory manager
    Keep track of metadata allocation
    Reinitialized at boot time
    No knowledge of persistently allocated data

Page 46:

Linux Memory Manager
Page allocator maintains individual pages

Page allocator

Page 47:

Linux Memory Manager (2)
Zone allocator allocates memory in power-of-two sizes

Page allocator

Zone allocator

Page 48:

Linux Memory Manager (3)
Slab allocator groups allocations by sizes to reduce internal memory fragmentation

Page allocator

Zone allocator

Slab allocator

Page 49:

Memory Allocation Example
Allocate a 455-byte data structure

Slab allocator
    One page of data structures
Zone allocator
    One page from DMA zone
Page allocator
    Page address 0x0000d000

Page 50:

Linux Memory Manager (4)
Difficult to restore the persistent states
    Three layers of pointer-rich mappings
    Mixing of persistent and temporary allocations

Page allocator

Slab allocator

Zone allocator

Page 51:

Conquest Persistence
Create memory zones with own instantiations of memory managers

Page allocator

Slab allocator

Zone allocator

Page 52:

Conquest Persistence
Reuse existing memory manager code
Encapsulate all pointers within each zone
    Pointers can survive reboots
    No serialization and deserialization
Swapping and paging
    Disabled for Conquest memory zones
    Enabled for non-Conquest zones

Page 53:

[Ng et al., 1996]

Integrity of Content in RAM
User-level program crashes
    Same file system interface as others
    Access control
    Memory protection
Operating system crashes
    1.5% of crashes lead to memory corruption
    Lose about one data block a decade

Page 54:

Other Reliability Mechanisms
Instantaneous metadata commit
Daily backups
Pointer-switch commit semantics

[Figure labels: pointer, pointer]
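The slides do not spell out pointer-switch commit semantics. One common reading, sketched below as an assumption rather than as Conquest's implementation, is that an update builds the new version off to the side and then commits by switching a single pointer. The `Inode` class and function name are hypothetical.

```python
class Inode:
    """Minimal stand-in for an i-node whose data is reached via one pointer."""

    def __init__(self, data: bytes) -> None:
        self.data = data  # the single reference that readers follow


def pointer_switch_update(inode: Inode, offset: int, patch: bytes) -> bytes:
    """Apply patch at offset by building a new copy, then switching the pointer.

    Returns the old version, which stays intact until the caller drops it.
    """
    old = inode.data
    if offset < 0 or offset > len(old):
        raise ValueError("offset outside the file")
    # Build the new version aside; the old bytes are never modified in place.
    new = old[:offset] + patch + old[offset + len(patch):]
    # Commit point: one reference assignment makes the new version visible.
    inode.data = new
    return old


if __name__ == "__main__":
    inode = Inode(b"hello world")
    previous = pointer_switch_update(inode, 6, b"there")
    print(previous, inode.data)
```

Under this reading, a crash before the assignment leaves the old version reachable and a crash after it leaves the new version reachable, so no partially written state is ever visible through the pointer.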

Page 55:

Implementation Status
Kernel module under Linux 2.4.2
Operational and POSIX compliant
Modified memory manager to support Conquest persistence
Need to overcome BIOS limitations for distribution

Page 56:

Performance Evaluation
Architectural simplification
    Feature count
Performance improvement
    Memory-only workloads
    Memory-and-disk workloads

Page 57:

Conventional Data Path
Buffer allocation management
Buffer garbage collection
Data caching
Metadata caching
Predictive readahead
Write behind
Cache replacement
Metadata allocation
Metadata placement
Metadata translation
Disk layout
Fragmentation management

[Figure: Conventional File Systems data path: storage requests, I/O buffer management, I/O buffer, persistence support, disk management, disk.]

Page 58:

Memory Path of Conquest
Buffer allocation management
Buffer garbage collection
Data caching
Metadata caching
Predictive readahead
Write behind
Cache replacement
Metadata allocation
Metadata placement
Metadata translation
Disk layout
Fragmentation management

[Figure: Conquest Memory Data Path: storage requests, persistence support, battery-backed RAM (small file and metadata storage).]

Memory manager encapsulation

Page 59:

Disk Path of Conquest
Buffer allocation management
Buffer garbage collection
Data caching
Metadata caching
Predictive readahead
Write behind
Cache replacement
Metadata allocation
Metadata placement
Metadata translation
Disk layout
Fragmentation management

[Figure: Conquest Disk Data Path: storage requests, I/O buffer management, I/O buffer, disk management, disk (large-file-only file system), alongside battery-backed RAM (small file and metadata storage).]

Page 60:

[Card et al., 1994; Sweeney et al., 1996; Katcher 1997; Namesys 2002]

PostMark Benchmark (1)
ISP workload (emails, web-based transactions)
Conquest is comparable to ramfs
At least 24% faster than the LRU disk cache

[Figure: trans / sec (0 to 9000) versus number of files (5000 to 30000) for SGI XFS, reiserfs, ext2fs, ramfs, and Conquest; 40 to 250 MB working set with 2 GB physical RAM.]

Page 61:

PostMark Benchmark (2)
When both memory and disk components are exercised, Conquest can be several times faster than ext2fs, reiserfs, and SGI XFS

[Figure: trans / sec (0 to 5000) versus percentage of large files (0.0 to 10.0) for SGI XFS, reiserfs, ext2fs, and Conquest; 10,000 files, 80 MB to 3.5 GB working set with 2 GB physical RAM; regions marked "<= RAM" and "> RAM".]

Page 62:

PostMark Benchmark (3)
When working set > RAM, Conquest is 1.4 to 2 times faster than ext2fs, reiserfs, and SGI XFS

[Figure: trans / sec (0 to 120) versus percentage of large files (6.0 to 10.0) for SGI XFS, reiserfs, ext2fs, and Conquest; 10,000 files, 80 MB to 3.5 GB working set with 2 GB physical RAM.]

Page 63:

[Rosenblum and Ousterhout 1991]

Sprite LFS Microbenchmarks
Small-file benchmark
    Operates on 10,000 1-KB files in three phases

[Figure: op / sec (0 to 180000) for the create, read, and delete phases; systems: SGI XFS, reiserfs, ext2fs, ramfs, Conquest.]

Page 64:

Sprite LFS Microbenchmarks (2)
Modified large-file microbenchmark: ten 1-MB files (Conquest in-core files)

[Figure: MB / sec (0 to 700) for seq write, seq read, rand write, rand read, seq read; systems: SGI XFS, reiserfs, ext2fs, ramfs, Conquest.]

Page 65:

Sprite LFS Microbenchmarks (3)
Modified large-file microbenchmark: ten 1.01-MB files (Conquest on-disk files)

[Figure: MB / sec (0 to 700) for seq write, seq read, rand write, rand read, seq read; systems: SGI XFS, reiserfs, ext2fs, ramfs, Conquest.]

Page 66:

Sprite LFS Microbenchmarks (4)
Large-file microbenchmark: forty 100-MB files (Conquest on-disk files)

[Figure: MB / sec (0 to 30) for seq write, seq read, rand write, rand read, seq read; systems: SGI XFS, reiserfs, ext2fs, Conquest.]

Page 67:

History’s Mystery

Puzzling Microbenchmark Numbers…

Geoff Kuenning: “If Conquest is slower than ext2fs, I will toss you off of the balcony…”

Page 68:

With me hanging off a balcony…
Original large-file microbenchmark: one 1-MB file (Conquest in-core file)

[Figure: MB / sec (0 to 700) for seq write, seq read, rand write, rand read, seq read; systems: SGI XFS, reiserfs, ext2fs, ramfs, Conquest.]

Page 69:

Odd Microbenchmark Numbers
Why are random reads slower than sequential reads?

[Figure: MB / sec (0 to 700) for seq write, seq read, rand write, rand read, seq read; systems: SGI XFS, reiserfs, ext2fs, ramfs, Conquest.]

Page 70:

Odd Microbenchmark Numbers
Why are RAM-based file systems slower than disk-based file systems?

[Figure: MB / sec (0 to 700) for seq write, seq read, rand write, rand read, seq read; systems: SGI XFS, reiserfs, ext2fs, ramfs, Conquest.]

Page 71:

[Keshava and Penkovski 1999; Torvalds 2001; Abraham 2002]

A Series of Hypotheses
Warm-up effect?
    Maybe
    Why do RAM-based systems warm up slower?
Bad initial states?
    No
Pentium III streaming I/O option?
    No

Page 72:

Effects of L2 Cache Footprints

[Figure: two panels, "Large L2 cache footprint" and "Small L2 cache footprint". Each panel shows two steps, "write a file sequentially" and "read the same file sequentially", with the labels footprint, file, file end, read, and flush.]
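One way to read this figure is with a toy cache model. The sketch below assumes a fully associative LRU cache measured in cache lines, which is a simplification of a real L2 cache, and the function name is hypothetical. It writes a file sequentially and then reads it back sequentially, reporting the fraction of reads that hit.

```python
from collections import OrderedDict


def sequential_write_then_read(file_lines: int, cache_lines: int) -> float:
    """Hit rate of a sequential read that follows a sequential write."""
    if file_lines <= 0 or cache_lines <= 0:
        raise ValueError("sizes must be positive")
    cache: "OrderedDict[int, bool]" = OrderedDict()

    def touch(line: int) -> bool:
        hit = line in cache
        if hit:
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
        return hit

    for line in range(file_lines):         # write pass leaves the file end cached
        touch(line)
    hits = sum(touch(line) for line in range(file_lines))  # read pass
    return hits / file_lines


if __name__ == "__main__":
    print(sequential_write_then_read(file_lines=8, cache_lines=4))
    print(sequential_write_then_read(file_lines=4, cache_lines=8))
```

In this model, a footprint larger than the cache leaves only the end of the file cached after the write, and reading from the start flushes those lines before they are reused. A footprint that fits in the cache is read back entirely from the cache.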

Page 73:

Sprite LFS Microbenchmarks
Modified large-file microbenchmark: ten 1-MB files (Conquest in-core files)

[Figure: MB / sec (0 to 700) for seq write, seq read, rand write, rand read, seq read; systems: SGI XFS, reiserfs, ext2fs, ramfs, Conquest.]

Page 74:

[Baker et al., 1992; Garcia-Molina and Salem 1992; Wu and Zwaenepoel 1994; Chen et al., 1996; Riedel 1998; Quantum 2000; Miller et al., 2001]


Related Work
Main-Memory Databases
    Memory-based data structures and query mechanisms
File-system applications of persistent RAM
    Write buffers
    Flash-memory-based file systems
    Disk emulators
    Rio file cache
    MRAM enabled storage

Page 75:

[Anderson et al., 2000; Palm 2000; IBM 2002; Microsoft 2002]

Related Work (2)
PDA operating systems
    Designed with severe memory constraints
Slice
    Distributed storage system
    Dedicated servers for metadata, small files, and large files

Page 76:

Lessons Learned
Faster than LRU caching, unexpected
    Heavyweight disk handling
    Severe penalty for accessing memory content
Matching user access patterns to storage media offers considerable simplification and better performance
    Not an automatic result
    Need careful design

Page 77:

More Lessons Learned
Effects of L2 caching become highly visible in memory workloads (modern workloads)
Cannot blindly apply existing disk-based microbenchmarks to measure memory performance of file systems
Need to consider states of L2 cache and memory behaviors at each stage of microbenchmarking

Page 78:

Additional Lessons Learned
Don’t discuss your performance numbers next to a balcony…unless…

Page 79:

Going Beyond Conquest
Matching usage patterns with heterogeneous machines in the distributed domain
    Specialized tasks for machines within a cluster
    Preferably self-organizing and self-evolving
State-rich computing
    Caching of runtime data structures
    Similar to specialized temporary file system

Page 80:

Going Beyond Conquest (2)
Separate storage of metadata from data
    Opportunity for hierarchical replication across devices with different calibers
Benchmarking memory performance of file systems
    Developing new memory benchmarks
Why are modern operating systems so complicated?
    More places to expand Conquest approach

Page 81:

Contributions
Demonstrated the feasibility of disk-memory hybrid file systems
Showed performance does not preclude simplicity
Pinpointed cache-related problems with modern benchmarks
Opened doors to many exciting areas of research

Page 82:

Conclusion
Conquest demonstrates how rethinking changes in underlying assumptions can lead to significant architectural and performance improvements
Radical changes in hardware, applications, and user expectations in the past decade should lead us to rethink other aspects of OS as well.

Page 83:

Questions . . .
Conquest: http://www.cs.fsu.edu/~awang/conquest
Andy Wang: [email protected]