the tiledb array data storage manager - harvard...
![Page 1: The TileDB Array Data Storage Manager - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/... · 2019. 5. 13. · Storage Manager ShiyuHuang. What is the problem? Storage](https://reader036.vdocument.in/reader036/viewer/2022071404/60f8fe13258e7639ea289d2b/html5/thumbnails/1.jpg)
The TileDB Array Data Storage Manager
Shiyu Huang
What is the problem?
Storage manager for multi-dimensional arrays
Why is it important?
Large scientific data naturally represented as multi-dimensional arrays
Why is it hard?
Reads and writes on large arrays
- Array representation
- Compression
- Parallel access
- Performance
Why do existing solutions not work?
HDF5 / Array-oriented DB / Relational databases
- Hard to identify and manage dense arrays
- In-place writes
- Regular dimensional chunks as the atomic unit
- Element indices encoded as extra table columns
Core intuition for the solution?
Data model
- Dimensions
- Coordinates
- Cells
- Attributes
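The four terms above can be tied together in a minimal Python sketch. The class names `Dimension` and `ArraySketch` and the method `set_cell` are hypothetical names chosen for illustration and are not TileDB's API; the sketch only assumes that a cell is addressed by one coordinate per dimension and holds one value per attribute.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Dimension:
    name: str
    domain: Tuple[int, int]  # inclusive (low, high) bounds; an assumption of this sketch


@dataclass
class ArraySketch:
    """Hypothetical in-memory model: cells keyed by coordinates, one value per attribute."""
    dimensions: Tuple[Dimension, ...]
    attributes: Tuple[str, ...]
    cells: Dict[Tuple[int, ...], Dict[str, object]] = field(default_factory=dict)

    def set_cell(self, coords, **values):
        # A cell is identified by one coordinate per dimension.
        if len(coords) != len(self.dimensions):
            raise ValueError("one coordinate per dimension is required")
        for c, dim in zip(coords, self.dimensions):
            lo, hi = dim.domain
            if not lo <= c <= hi:
                raise ValueError(f"coordinate {c} outside domain of {dim.name}")
        # Every cell stores a value for each attribute.
        if set(values) != set(self.attributes):
            raise ValueError("a value is required for every attribute")
        self.cells[tuple(coords)] = dict(values)


arr = ArraySketch(
    dimensions=(Dimension("rows", (1, 4)), Dimension("cols", (1, 4))),
    attributes=("a1", "a2"),
)
arr.set_cell((1, 2), a1=7, a2="xy")
```

Cells that were never set are simply absent from `cells`, which is how this sketch stands in for the empty cells of a sparse array.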
Global cell order
Co-locate cells on disk according to the characteristics of the data
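One way to picture a global cell order is the sketch below. It assumes a 2D dense array with 0-based coordinates, row-major order across tiles, and row-major order inside each tile; the function name `global_order_key` and the 2x2 tile extents are illustrative choices, and other orders (such as column-major) are equally possible.

```python
from itertools import product


def global_order_key(coords, tile_extents):
    """Sort key: tile position first, then position inside the tile (both row-major)."""
    tile = tuple(c // e for c, e in zip(coords, tile_extents))
    within = tuple(c % e for c, e in zip(coords, tile_extents))
    return (tile, within)


# All cells of a 4x4 array, ordered with 2x2 tiles.
cells = list(product(range(4), repeat=2))
ordered = sorted(cells, key=lambda c: global_order_key(c, (2, 2)))
```

With this key, the four cells of each tile end up adjacent in the linear order, so a query that touches one tile reads one contiguous run.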
Data tiles
Dense arrays
- Space tile
- Equi-sized hyper-rectangles
Sparse arrays
- Capacity
- Minimum bounding rectangle
- Each non-empty cell belongs to exactly one data tile
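For the sparse case, the following sketch groups non-empty cells into data tiles of a fixed capacity and computes each tile's minimum bounding rectangle. The function name `make_data_tiles` is hypothetical, and the example assumes the coordinates are already sorted in the global cell order (plain row-major here).

```python
def make_data_tiles(sorted_coords, capacity):
    """Split cells (already in global order) into tiles of at most `capacity` cells."""
    tiles = []
    for start in range(0, len(sorted_coords), capacity):
        chunk = sorted_coords[start:start + capacity]
        ndim = len(chunk[0])
        # Minimum bounding rectangle: per-dimension (min, max) over the tile's cells.
        mbr = tuple(
            (min(c[d] for c in chunk), max(c[d] for c in chunk))
            for d in range(ndim)
        )
        tiles.append({"cells": chunk, "mbr": mbr})
    return tiles


coords = sorted([(1, 1), (1, 2), (2, 4), (3, 1), (4, 3)])  # row-major order
tiles = make_data_tiles(coords, capacity=2)
```

Every non-empty cell lands in exactly one tile, and all tiles except possibly the last hold exactly `capacity` cells.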
Fragments
- Timestamped snapshot of a batch of updates
Physical Organization
Array -> Directory
Fragment -> Sub-directory
Files:
- Array schema
- Fixed-sized attribute 1
- Values of variable-sized attribute 1
- Starting offsets of variable-sized attribute 1
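The split of a variable-sized attribute into a values file and a starting-offsets file can be sketched as follows. The helper names `encode_variable` and `decode_variable` are hypothetical, and UTF-8 strings held in memory stand in for the on-disk files.

```python
def encode_variable(values):
    """Pack strings into one byte buffer plus the starting offset of each value."""
    data = bytearray()
    offsets = []
    for v in values:
        offsets.append(len(data))  # where this cell's value starts
        data.extend(v.encode("utf-8"))
    return bytes(data), offsets


def decode_variable(data, offsets):
    """Recover the values: each one ends where the next starts (or at end of data)."""
    ends = offsets[1:] + [len(data)]
    return [data[s:e].decode("utf-8") for s, e in zip(offsets, ends)]


data, offsets = encode_variable(["a", "bbb", "", "cc"])
```

Because the offsets are fixed-sized, the offsets file can be indexed by cell position just like a fixed-sized attribute file.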
Bookkeeping metadata:
- Minimum bounding rectangle
- Bounding coordinates
Read operation
- Dense array: visit each space tile in global order
- Sparse array: visit each range that starts before the minimum end bounding coordinate
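For the dense case, a read first has to decide which space tiles a query touches. The sketch below assumes 0-based coordinates, inclusive subarray bounds, and a row-major tile order; `overlapping_tiles` is a hypothetical helper, not a TileDB function.

```python
from itertools import product


def overlapping_tiles(subarray, tile_extents):
    """Return the space tiles overlapped by the subarray, in row-major tile order.

    subarray: one inclusive (low, high) pair per dimension.
    """
    ranges = [
        range(lo // extent, hi // extent + 1)
        for (lo, hi), extent in zip(subarray, tile_extents)
    ]
    # itertools.product varies the last dimension fastest, i.e. row-major order.
    return list(product(*ranges))


# Rows 1..2 and columns 1..3 of an array with 2x2 space tiles.
tiles = overlapping_tiles(((1, 2), (1, 3)), (2, 2))
```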
Multiple fragments?
Algorithm!
Sort disjoint ranges on global cell order
- <start coordinate, end coordinate, fragment id>
- Each range appears contiguously on disk
e.g. querying <2,5>: get ranges <2,3>, <4,5>
Priority queue
<1,4,5>
<4,8,2>
<1,4,3>
<8,12,3>
<8,12,4>
Compare on start coordinate (SC)
When tied, give the higher fragment id (fid) priority
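The comparison rule can be sketched with Python's `heapq`, using the five ranges from the slide. The helpers `push_range` and `pop_range` are hypothetical names; the sketch covers only the ordering rule, not the range trimming and splitting that a full multi-fragment read would also need.

```python
import heapq


def push_range(heap, start, end, fid):
    # heapq is a min-heap: the smaller start coordinate comes out first.
    # The fragment id is negated so that, on a tie, the higher fid wins.
    heapq.heappush(heap, (start, -fid, end))


def pop_range(heap):
    start, neg_fid, end = heapq.heappop(heap)
    return (start, end, -neg_fid)


heap = []
for r in [(1, 4, 5), (4, 8, 2), (1, 4, 3), (8, 12, 3), (8, 12, 4)]:
    push_range(heap, *r)

order = [pop_range(heap) for _ in range(len(heap))]
# order == [(1, 4, 5), (1, 4, 3), (4, 8, 2), (8, 12, 4), (8, 12, 3)]
```

Giving the higher fragment id priority means the most recent fragment is considered first whenever two ranges start at the same position.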
Operations in the priority queue
[Figure: successive states of the priority queue over the ranges <1,4,5>, <1,4,3>, <4,8,2>, <8,12,4>, <8,12,3>]
Write – dense fragment
- The user:
  - Populates one buffer per attribute, storing the cell values in the global cell order
- Write function:
  - Appends the values from the buffers to the corresponding attribute files
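The append-only write path can be sketched as below. The directory layout, the `.bin` file extension, and the 32-bit little-endian integer encoding are assumptions made for the example, and `write_dense` and `read_attr` are hypothetical helpers rather than TileDB calls.

```python
import os
import struct
import tempfile


def write_dense(fragment_dir, buffers):
    """Append each attribute buffer (already in global cell order) to its own file."""
    os.makedirs(fragment_dir, exist_ok=True)
    for attr, values in buffers.items():
        path = os.path.join(fragment_dir, attr + ".bin")  # hypothetical file name
        with open(path, "ab") as f:  # append mode: earlier bytes are never rewritten
            f.write(struct.pack(f"<{len(values)}i", *values))


def read_attr(fragment_dir, attr):
    with open(os.path.join(fragment_dir, attr + ".bin"), "rb") as f:
        raw = f.read()
    return list(struct.unpack(f"<{len(raw) // 4}i", raw))


frag = os.path.join(tempfile.mkdtemp(), "fragment_1")
write_dense(frag, {"a1": [0, 1, 2, 3]})  # first batch of cells
write_dense(frag, {"a1": [4, 5, 6, 7]})  # next batch continues in global order
```

Since the user supplies values in the global cell order, the write function never needs to seek or reorder; it only appends.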
Sparse fragment
- Mode:
  - User provides a sorted buffer
  - User provides an unsorted buffer; TileDB sorts it internally
- Buffer:
  - Only non-empty cells
  - Extra buffer with coordinates
  - Extra write state information
- Deletion:
  - Insertions of empty cells
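The unsorted mode and the deletion rule can be sketched together. The example assumes row-major order as the global cell order and uses a hypothetical `DELETED` marker to stand for an empty cell; `prepare_sparse_write` is an illustrative helper, not part of TileDB.

```python
DELETED = None  # hypothetical marker standing in for an "empty cell" value


def prepare_sparse_write(coords, values):
    """Sort the coordinate buffer and the attribute buffer together.

    Assumes the global cell order is plain row-major, so tuple comparison suffices.
    """
    order = sorted(range(len(coords)), key=lambda i: coords[i])
    return [coords[i] for i in order], [values[i] for i in order]


# Unsorted input; the cell at (2, 4) is being deleted by writing an empty value.
coords, values = prepare_sparse_write(
    [(3, 1), (1, 2), (2, 4)],
    [30, 10, DELETED],
)
```

Only the non-empty (or explicitly deleted) cells appear in the buffers, which is why the coordinates have to travel alongside the attribute values.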
Consolidation
Read operation
Write the retrieved cells to a new fragment
Delete the old fragments
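The three consolidation steps can be sketched on an in-memory stand-in for fragments. Here a fragment is assumed to be a `(timestamp, {coordinates: value})` pair and `consolidate` is a hypothetical helper; the real system works on files and tiles rather than dictionaries.

```python
def consolidate(fragments):
    """Merge fragments into one; for a given cell, the newest fragment's value wins."""
    merged = {}
    # Read step: apply fragments from oldest to newest so later updates overwrite.
    for _, cells in sorted(fragments, key=lambda frag: frag[0]):
        merged.update(cells)
    # Write step: the new fragment takes the newest timestamp (an assumption here).
    newest = max(ts for ts, _ in fragments)
    return (newest, merged)


fragments = [
    (2, {(1, 1): "b", (2, 2): "c"}),
    (1, {(1, 1): "a", (3, 3): "d"}),
]
new_fragment = consolidate(fragments)
fragments = [new_fragment]  # delete step: the old fragments are dropped
```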
Parallel Programming
- Concurrent writes: each process/thread creates a separate fragment, so no locking is necessary
- Concurrent reads:
  - Multiple processes: separate bookkeeping data and state, no locking
  - Multiple threads: one shared copy of the bookkeeping data; locking only on it
- Mixed reads and writes: a special file indicates whether the fragment is visible
- Background consolidation:
  - Acquire the lock when deleting the old fragments; release it after the new fragment is visible
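The lock-free write path and the visibility file can be sketched with threads writing to separate sub-directories. The marker name `__visible`, the `a1.bin` file name, and the helpers `write_fragment` and `visible_fragments` are all assumptions made for the example and are not taken from TileDB.

```python
import os
import tempfile
import threading

VISIBLE_MARKER = "__visible"  # hypothetical name for the special visibility file


def write_fragment(array_dir, name, payload):
    """Each writer owns its fragment directory, so writers never contend."""
    frag = os.path.join(array_dir, name)
    os.makedirs(frag)
    with open(os.path.join(frag, "a1.bin"), "wb") as f:
        f.write(payload)
    # Create the marker last: readers ignore the fragment until it exists.
    open(os.path.join(frag, VISIBLE_MARKER), "w").close()


def visible_fragments(array_dir):
    return sorted(
        name for name in os.listdir(array_dir)
        if os.path.exists(os.path.join(array_dir, name, VISIBLE_MARKER))
    )


array_dir = tempfile.mkdtemp()
threads = [
    threading.Thread(target=write_fragment, args=(array_dir, f"frag_{i}", bytes([i])))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# A fragment still being written has no marker yet and stays invisible to readers.
os.makedirs(os.path.join(array_dir, "frag_partial"))
```

No lock appears anywhere in the write path; the only coordination is the order in which a writer creates its own files.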
Does the paper prove its claims?
Does the paper prove its claims?
- Clear description of the physical layout and functions
- Logical justification of the design decisions
- Comprehensive evaluations
Analysis/experiments?
Dense Array
[Figure panels: load and update, each for one core and parallel]
Read subarray
Fragments and consolidation
After consolidation
Sparse array
- Load
[Figure panels: one core and parallel]
Subarray read
[Figure panels: one core and parallel]
Gaps in the logic/proof?
Possible next steps
- Distributed version of the database?
- Versioning?