External Memory Data Structures
DESCRIPTION
External Memory Data Structures. Srinivasa Rao Satti. Workshop on Recent Advances in Data Structures, December 20, 2011. Fundamental algorithmic problems. Searching: given a list (sequence) L of elements x1, x2, ..., xn and a query element x, check whether x is present in L.
TRANSCRIPT
External Memory Data Structures
Srinivasa Rao Satti
Workshop on Recent Advances in Data Structures
December 20, 2011
Fundamental Algorithmic Problems
• Searching: Given a list (sequence) L of elements x1, x2, ..., xn and a query element x, check whether x is present in L.
– When L is not sorted, we use linear search – scan the list to check if x is present in it.
– When L is sorted, we use binary search – divide the remaining list to be searched in half with every comparison.
• We also want to insert and delete elements to/from L.
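The two searching strategies above can be sketched as follows (a minimal sketch; the function names are mine):

```python
def linear_search(L, x):
    """Unsorted L: scan the whole list, O(n) comparisons."""
    for item in L:
        if item == x:
            return True
    return False

def binary_search(L, x):
    """Sorted L: halve the remaining range with every comparison, O(log n)."""
    lo, hi = 0, len(L) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if L[mid] == x:
            return True
        if L[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return False
```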
• Sorting: Given a sequence of elements, sort them in increasing (or decreasing) order.
– Insertion sort, bubble sort, quick sort, merge sort
Random Access Machine (RAM) Model
• Standard theoretical model of computation:
– Infinite memory
– Uniform access cost
• Unit-cost RAM model: All the basic operations (reading/writing a location from/to the memory, standard arithmetic and Boolean operations) take one unit of time.
• This simple model has been crucial for the success of the computer industry.
Hierarchical Memory
• Modern machines have complicated memory hierarchy
– Levels get larger and slower further away from CPU
– Data moved between levels using large blocks
[Figure: memory hierarchy with L1 and L2 caches and RAM]
Hard disk drive
Slow I/O
• Disk systems try to amortize the large access time by transferring large contiguous blocks of data (8-16 Kbytes)
• Important to store/access data so as to take advantage of blocks (locality)
• Disk access is ~10^6 times slower than main memory access
[Figure: disk platter showing a track, the magnetic surface, and the read/write arm and head]
“The difference in speed between modern CPU and disk technologies is analogous to the difference in speed in sharpening a pencil using a sharpener on one’s desk or by taking an airplane to the other side of the world and using a sharpener on someone else’s desk.” (D. Comer)
External Memory Model [Aggarwal-Vitter 1988]
N = # of items in the problem instance
B = # of items per disk block
M = # of items that fit in main memory
T = # of items in output
I/O: move a block between memory and disk
Performance measures:
– Space: # of disk blocks used by the structure
– Time: # of I/Os performed by the algorithm (CPU time is “free”)
[Figure: disk D connected to processor P and internal memory M via block I/O]
Scalability Problems: Block Access Matters
• Example: traversing a linked list
– Array size N = 10 elements
– Disk block size B = 2 elements
– Main memory size M = 4 elements (2 blocks)
• Large difference between N and N/B since the block size is large
– Example: N = 256 x 10^6, B = 8000, 1 ms disk access time
– N I/Os take 256 x 10^3 sec = 4266 min = 71 hr
– N/B I/Os take 256/8 sec = 32 sec
[Figure: Algorithm 1 (elements scattered across blocks) uses N = 10 I/Os; Algorithm 2 (elements stored contiguously) uses N/B = 5 I/Os]
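The numbers on this slide can be checked with a quick back-of-envelope computation (1 ms per disk access, as assumed above):

```python
N = 256 * 10**6   # elements
B = 8000          # elements per disk block
access_ms = 1     # assumed disk access time per I/O, in milliseconds

# One I/O per element (unblocked) vs. one I/O per block (blocked/contiguous).
unblocked_sec = N * access_ms / 1000
blocked_sec = (N // B) * access_ms / 1000

print(unblocked_sec, blocked_sec)  # 256000.0 seconds (~71 hours) vs 32.0 seconds
```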
Queues and Stacks
• Queue:
– Maintain separate push and pop blocks in main memory
⇒ O(1/B) I/Os per Push/Pop operation (amortized)
• Stack:
– Maintain push/pop blocks in main memory
⇒ O(1/B) I/Os per Push/Pop operation (amortized)
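The external-memory stack can be sketched as an in-memory simulation (a minimal sketch under the design above; class and attribute names are mine): keep up to 2B elements from the top of the stack in memory, and spill/refill one block of B elements at a time, so N operations cost O(N/B) block transfers in total.

```python
class ExternalStack:
    def __init__(self, B):
        self.B = B
        self.memory = []   # at most 2B elements: the top of the stack
        self.disk = []     # full blocks of B elements, simulating the disk
        self.io_count = 0  # number of block transfers (I/Os) performed

    def push(self, x):
        self.memory.append(x)
        if len(self.memory) == 2 * self.B:          # memory full:
            self.disk.append(self.memory[:self.B])  # write bottom block to disk
            self.memory = self.memory[self.B:]
            self.io_count += 1

    def pop(self):
        if not self.memory and self.disk:           # memory empty:
            self.memory = self.disk.pop()           # read one block from disk
            self.io_count += 1
        return self.memory.pop()
```

Each I/O moves B elements, and an element is moved at most once per direction between consecutive spills, which gives the O(1/B) amortized bound per operation.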
Fundamental Bounds
                 Internal       External
• Scanning:      N              N/B
• Sorting:       N log2 N       (N/B) log_{M/B} (N/B)
• Searching:     log2 N         log_B N
• Note:
– Linear I/O: O(N/B)
– The B factor is VERY important: (N/B) log_{M/B} (N/B) << N log2 N
– Cannot sort optimally with a search tree
Search trees: API
• Given a set S of keys, support the operations:
– search(x) : return TRUE if x is in S, and FALSE otherwise
– insert(x) : insert x into S (error if x is already in S)
– delete(x) : delete x from S (error if x is not in S)
– rangesearch(x,y) : return all the keys z such that x ≤ z ≤ y
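As a reference point for this API, here is a trivial implementation on a sorted Python list (illustrative only; the class name is mine, and a real search tree supports logarithmic-time updates, which a flat list does not):

```python
import bisect

class SortedSet:
    def __init__(self):
        self.keys = []  # kept sorted at all times

    def search(self, x):
        i = bisect.bisect_left(self.keys, x)
        return i < len(self.keys) and self.keys[i] == x

    def insert(self, x):
        if self.search(x):
            raise KeyError("already present")  # error if x is already in S
        bisect.insort(self.keys, x)

    def delete(self, x):
        if not self.search(x):
            raise KeyError("not present")      # error if x is not in S
        self.keys.remove(x)

    def rangesearch(self, x, y):
        lo = bisect.bisect_left(self.keys, x)
        hi = bisect.bisect_right(self.keys, y)
        return self.keys[lo:hi]                # all z with x <= z <= y
```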
Binary Search Trees
• Binary search tree:
– Standard method for search among N elements
– We assume elements are in the leaves
– Search traces a root-to-leaf path
• If nodes are stored arbitrarily on disk:
– Search in O(log2 N) I/Os
– Rangesearch in O(log2 N + T) I/Os
External Search Trees
• BFS blocking:
– Block height Θ(log2 B) ⇒ O(log2 N) / Θ(log2 B) = O(log_B N) blocks on a root-to-leaf path
– Output elements blocked
⇒ Rangesearch in O(log_B N + T/B) I/Os
• Optimal: O(N/B) space and O(log_B N + T/B) query
External Search Trees
• Maintaining BFS blocking during updates?
– Balance is normally maintained in search trees using rotations
– Seems very difficult to maintain BFS blocking during a rotation
– Also need to make sure the output (leaves) is blocked!
[Figure: a rotation at nodes x and y cuts across block boundaries]
B-trees
• BFS blocking naturally corresponds to a tree with fan-out Θ(B)
• B-trees are balanced by allowing the node degree to vary
– Rebalancing performed by splitting and merging nodes
(a,b)-tree
• T is an (a,b)-tree (a ≥ 2 and b ≥ 2a-1)
– All leaves are on the same level and contain between a and b elements
– Except for the root, all nodes have degree between a and b
– Root has degree between 2 and b
• An (a,b)-tree uses linear space and has height O(log_a N)
• Choosing a,b = Θ(B): each node/leaf stored in one disk block
⇒ O(N/B) space and O(log_B N + T/B) query
(a,b)-Tree Insert
• Insert:
Search and insert element in leaf v
DO { if v has b+1 elements/children
        Split v:
          make nodes v’ and v’’ with ⌈(b+1)/2⌉ and ⌊(b+1)/2⌋ elements
          (both between a and b, since b ≥ 2a-1)
          insert element (ref) in parent(v)
          (make new root if necessary)
        v = parent(v) }
• Insert touches O(log_a N) nodes
[Figure: node v with b+1 children split into v’ and v’’]
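The split step above can be sketched as a small helper (the function name is mine); the assertion checks the slide's invariant that both halves of a node with b+1 elements have at least a elements whenever b ≥ 2a-1:

```python
import math

def split(elements, a, b):
    """Split an overfull node of b+1 elements into v' and v''
    with ceil((b+1)/2) and floor((b+1)/2) elements."""
    assert len(elements) == b + 1 and b >= 2 * a - 1
    mid = math.ceil((b + 1) / 2)
    v1, v2 = elements[:mid], elements[mid:]
    # b >= 2a-1 implies b+1 >= 2a, so floor((b+1)/2) >= a:
    assert a <= len(v1) <= b and a <= len(v2) <= b
    return v1, v2
```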
(2,4)-Tree Insert
[Figure: worked example of insertions and splits in a (2,4)-tree]
(a,b)-Tree Delete
• Delete:
Search and delete element from leaf v
DO { if v has a-1 elements/children
        Fuse v with sibling v’:
          move children of v’ to v
          delete element (ref) from parent(v)
          (delete root if necessary)
        If v now has > b (and ≤ a+b-1 < 2b) children, split v
        v = parent(v) }
• Delete touches O(log_a N) nodes
[Figure: node v with a-1 children fused with sibling v’]
(2,4)-Tree Delete
[Figure: worked example of deletions and fusions in a (2,4)-tree]
Summary/Conclusion: B-tree
• B-trees: (a,b)-trees with a,b = Θ(B)
– O(N/B) space
– O(log_B N + T/B) I/Os for search and rangesearch
– O(log_B N) I/Os for insert and delete
• B-trees with elements in the leaves are sometimes called B+-trees
• Construction in O((N/B) log_{M/B} (N/B)) I/Os
– Sort elements and construct leaves
– Build tree level-by-level bottom-up
B-tree Construction
• In internal memory we can sort N elements in O(N log N) time using a balanced search tree:
– Insert all elements one-by-one (construct tree)
– Output in sorted order using an in-order traversal
• The same algorithm using a B-tree uses O(N log_B N) I/Os
– A factor of roughly B log_B (M/B) worse than the optimal O((N/B) log_{M/B} (N/B)) sorting bound
• As discussed, we can instead build the B-tree bottom-up in O((N/B) log_{M/B} (N/B)) I/Os
• In general we would like a dynamic data structure usable inside algorithms at O((1/B) log_{M/B} (N/B)) amortized I/Os per operation
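The bottom-up construction can be sketched as follows (an illustrative in-memory model; the function names are mine — on disk, each node would occupy one block and the initial sort would be an external merge sort):

```python
def build_level(nodes, B):
    """Group a level of nodes into parents with fan-out <= B."""
    return [nodes[i:i + B] for i in range(0, len(nodes), B)]

def build_btree(sorted_keys, B):
    """Build a static B-tree bottom-up over already-sorted keys:
    pack leaves with B keys each, then build parents level-by-level."""
    level = build_level(sorted_keys, B)  # the leaves
    while len(level) > 1:
        level = build_level(level, B)
    return level[0]                      # the root
```

Each key is written once per level, and there are O(log_B N) levels above the leaves, so after the sort the construction itself scans the data a constant number of times per level.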
Flash memory
• Non-volatile memory which can be erased and reprogrammed
• Characteristics (compared to magnetic disks):
– Lighter
– Better shock resistance
– More throughput
– Less power consumption
– Denser (uses less space)
• Commonly used in digital cameras, handheld computers, mobile phones, portable music players, etc.
• Also used in embedded systems and sensor networks, and even replacing magnetic disks in PCs.
HDD vs SSD
[Figure: the disassembled components of a hard disk drive (left) and the PCB and components of a solid-state drive (right)]
Limitations of flash memory
• Memory cells in a flash memory device can be written only a limited number of times
– between 10,000 and 1,000,000, after which they wear out and become unreliable
• The only way to set bits (change their value from 0 to 1) is to erase an entire region of memory. These regions have a fixed size in a given device, typically ranging from several kilobytes to hundreds of kilobytes, and are called erase units.
• Two different types of flash memory: NOR and NAND
– they have slightly different characteristics
Flash memory
• The memory space of the chip is partitioned into blocks called erase blocks. The only way to change a bit from 0 to 1 is to erase the entire unit containing the bit.
• Each block is further partitioned into pages, which usually store 2048 bytes of data and 64 bytes of metadata. Erase blocks typically contain 32 or 64 pages.
• Bits are changed from 1 to 0 by programming (writing) data onto a page. An erased page can be programmed only a small number of times (1 to 3) before it must be erased again.
Flash memory
• Reading data takes tens of microseconds for the first access to a page, plus tens of nanoseconds per byte.
• Writing a page takes hundreds of microseconds, plus tens of nanoseconds per byte.
• Erasing a block takes several milliseconds.
• Each block can sustain only a limited number of erasures.
⇒ Algorithms/data structures designed for the I/O model do not always work well when implemented on flash memory.
Flash memory models (I)
• General flash model:
– The complexity of an algorithm is x + c · y, where x and y are the numbers of read and write I/Os respectively, and c is a penalty factor for writing.
– Typically, we assume that BR < BW << M, and c ≥ 1.
[Figure: flash device with read block size BR, write block size BW, write penalty c, and memory size M]
Flash memory models (II)
• Unit-cost flash model:
– The general flash model augmented with the assumption of an equal access time per element for reading and writing.
– The cost of an algorithm performing x read I/Os and y write I/Os is x·BR + y·BW.
– This simplifies the model considerably, as it becomes easier to adapt external-memory results.
[Figure: flash device with read block size BR, write block size BW, and memory size M]
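The two cost measures can be written out side by side (a minimal sketch; the function names and the example numbers are mine, using the symbols x, y, c, BR, BW from the slides):

```python
def general_flash_cost(x, y, c):
    """General flash model: x reads at unit cost, y writes at penalty c >= 1."""
    return x + c * y

def unit_cost_flash_cost(x, y, B_R, B_W):
    """Unit-cost flash model: equal per-element access time, so an I/O
    costs its block size (B_R for reads, B_W for writes)."""
    return x * B_R + y * B_W
```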
B-trees on flash memory
• An insertion into a B-tree updates a single leaf (unless the leaf splits).
• Since we cannot perform an in-place update in flash memory, we need to create a new copy of the leaf with the new element inserted.
• Since the parent of this leaf has to update its pointer to the leaf, we need to create a new copy of the parent, and so on, up to the root.
• Thus the write performance of the naïve implementation is quite bad.
Flash Translation Layer (FTL)
• Software layer on the flash disk which performs logical-to-physical block mapping.
• Distributes writes uniformly across blocks.
• B-tree with FTL:
– All nodes contain just the logical addresses of other nodes
– Allows any update to write just the target node
• Achieves one erase per update (amortized)
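The remapping idea behind the FTL can be sketched as a toy model (class and attribute names are mine; a real FTL also erases stale blocks lazily and tracks per-block wear): because every write goes to a fresh physical block, a node's logical address never changes, so its parent's pointer stays valid.

```python
import itertools

class FTL:
    def __init__(self):
        self.mapping = {}                   # logical block -> physical block
        self.physical = {}                  # physical block -> contents
        self.next_free = itertools.count()  # next unused physical block

    def write(self, logical, data):
        phys = next(self.next_free)         # always program a fresh block
        self.physical[phys] = data
        self.mapping[logical] = phys        # old block becomes garbage,
                                            # to be erased lazily

    def read(self, logical):
        return self.physical[self.mapping[logical]]
```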
μ-tree
• Minimally Updated tree
• Achieves performance similar to ‘B-tree with FTL’ on raw flash
• Sizes of the nodes decrease exponentially from leaf to root
• Each block corresponds to a leaf-to-root path, and stores the nodes on a prefix of this path
• Works only when log2 B ≥ log_B N
[Kang, Jung, Kang, Kim, 2007]
FD-tree
• Flash Disk aware tree index
• Transforms random writes into sequential writes
• Limits random writes to within a small region
[Li, He, Yang, Luo, Yi, 2010]
FD-tree
• Contains a head tree and a few levels of sorted runs of increasing sizes
• O(log_k N) levels, where k is the size ratio between levels
Other B-tree indexes for flash memory
• BFTL [Wu, Luo, Chang, 2007]
• Lazy Adaptive tree [Agrawal, Ganesan, Sitaraman, Diao, Singh, 2009]
• Lazy Update tree [On, Hu, Li, Xu, 2009]
• In-page Logging approach [Lee, Moon, 2007]
• …
All of these are designed to get better practical performance, and they take different aspects of flash characteristics into consideration
– not easy to compare with each other
Comparison of tree indexes on flash
[Table omitted in transcript; symbols used:]
N – number of elements
BR – read block size
BW – write block size
BU – size of buffer
h – height of the tree
k – parameter
Directions for further research
• The area is still in its infancy.
• Not much work has been done apart from the development of some file systems and tree indexing structures.
• Open problems:
– Efficient tree indexes for flash memory
– Tons of other (practically significant) algorithmic problems
– Better memory models
Thank You