Aravindan Raghuveer, David H.C. Du · DISC - DTC · 2012. 8. 16.
University of Minnesota Digital Technology Center Intelligent Storage Consortium
Aravindan Raghuveer, David H.C. Du
DISC
Object-based Storage for Exhaustive Search
Introduction
Exhaustive Search
– Examine all objects in a storage system.
– An expensive operation.
Why Exhaustive Search?
– Fuzzy queries:
    Semantic gap in image and video data: hard to annotate.
    Content-based (query-by-example) search.
    Demonstrated in the Diamond project at Intel/CMU.
– Index creation:
    Not effective: curse of dimensionality.
    Too expensive.
    Not always possible: fuzzy queries.
A “necessary evil” feature on all filesystems.
The factors . . .
Four recent developments spur us to rethink the way exhaustive search is implemented today:
Data Characteristics
Disk Technology Trends
Filesystem and Database Design
Concurrent Applications
Factor-1: Data Characteristics
Petabyte-scale data sets are becoming common in data mining and HPC apps.
Video surveillance:
– Terabytes of data (depending on the number of feeds) that need to be searched by content.
For instance, experiments at the Mayo Clinic generate:
– 3 million images per week
– 1 TB per 9 days
– Presented at ISW last year.
Impact on exhaustive search:
– Reduce the search space of exhaustive search as much as possible (for instance, based on metadata).
– Exhaustive search had better be darn efficient!
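The first impact above, pruning by metadata before touching object content, can be sketched roughly as follows. This is only an illustration: the `StoredObject` record, its `camera_feed` and `day` fields, and the `matches_content` predicate are hypothetical names that do not appear in the original slides.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class StoredObject:
    """Hypothetical object record: cheap metadata plus expensive content."""
    object_id: int
    camera_feed: str   # assumed metadata attribute
    day: int           # assumed metadata attribute
    content: bytes


def exhaustive_search(
    objects: Iterable[StoredObject],
    metadata_filter: Callable[[StoredObject], bool],
    matches_content: Callable[[bytes], bool],
) -> Iterator[int]:
    """Yield IDs of matching objects, testing cheap metadata before content."""
    for obj in objects:
        if not metadata_filter(obj):
            continue  # pruned without examining the content
        if matches_content(obj.content):
            yield obj.object_id


store = [
    StoredObject(1, "lobby", 3, b"...red car..."),
    StoredObject(2, "garage", 3, b"...red car..."),
    StoredObject(3, "garage", 4, b"...blue van..."),
]
hits = list(
    exhaustive_search(
        store,
        metadata_filter=lambda o: o.camera_feed == "garage",
        matches_content=lambda data: b"red car" in data,
    )
)
print(hits)
```

Only objects that pass the metadata filter reach the content predicate, so the expensive part of the scan covers a smaller set.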
Factor-2: Disk Technology Trends
Bits per unit area are increasing rapidly.
I/O bandwidth is lagging behind.
Effect on exhaustive search:
– 1 day to sequentially read 10 TB*
– 5 months with 8 KB-chunk random access!
Impact on exhaustive search:
– The exhaustive search algorithm should consciously avoid random disk seeks.
– Try to get as close to sequential performance as we possibly can.
* Dr. Jim Gray’s keynote from FAST’05:
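As a back-of-the-envelope check, the two figures quoted above imply a sustained sequential throughput and a random-access rate. The short calculation below derives both. The unit conventions are assumptions, not statements from the keynote: decimal terabytes, 8 KB taken as 8192 bytes, and 30-day months.

```python
TOTAL_BYTES = 10 * 10**12          # 10 TB, assuming decimal terabytes
CHUNK_BYTES = 8 * 1024             # 8 KB random-access chunk (assumed 8192 bytes)
SECONDS_PER_DAY = 24 * 60 * 60

sequential_seconds = 1 * SECONDS_PER_DAY      # "1 day" from the slide
random_seconds = 5 * 30 * SECONDS_PER_DAY     # "5 months", assuming 30-day months

# Throughput implied by reading 10 TB sequentially in one day.
sequential_mb_per_s = TOTAL_BYTES / sequential_seconds / 10**6

# Number of 8 KB accesses, and the access rate implied by five months.
num_chunks = TOTAL_BYTES / CHUNK_BYTES
random_accesses_per_s = num_chunks / random_seconds
ms_per_access = 1000 / random_accesses_per_s

slowdown = random_seconds / sequential_seconds

print(f"sequential: {sequential_mb_per_s:.0f} MB/s")
print(f"random: {random_accesses_per_s:.0f} accesses/s ({ms_per_access:.1f} ms each)")
print(f"random access is {slowdown:.0f}x slower")
```

Under these assumptions the numbers work out to roughly 116 MB/s sequentially and a little under 100 random accesses per second, which is about 10 ms per access.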
Factor-3: Filesystem and Database Design
Exhaustive search (e.g. grep) runs on top of a filesystem.
The filesystem or database is not even aware that the application is searching exhaustively.
Exhaustive search is not a primary design criterion for today's filesystems.
– Filesystem-level exhaustive search: recursive exploration of directories.
With aged, fragmented filesystems:
– At the disk, an exhaustive search looks more like random access than sequential access.
Databases: not as efficient as filesystems in handling blobs in the presence of fragmentation*.
Impact on exhaustive search: F/S and D/B are not the right place to embed exhaustive search.
* R. Sears, C. van Ingen, “Fragmentation in Large Object Repositories”, CIDR 2007
Factor-4: Concurrent Applications
Exhaustive search: a long-running, I/O-intensive task.
Other filesystem applications run concurrently.
Concurrent execution of both:
– Performance isolation:
    Impact on the response time of other applications should be minimal.
    Impact on the efficiency of exhaustive search should be as low as possible.
Not possible with block storage:
– Cannot distinguish one block request from another.
– No priorities or QoS levels assigned to requests.
Impact on exhaustive search:
– The storage device should be able to provide differentiated services.
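A minimal sketch of differentiated service, assuming the device can see a service class on each request (which plain block storage cannot): real-time requests are dequeued ahead of exhaustive-search reads, and requests within a class stay in arrival order. The class names and the `DifferentiatedQueue` type are hypothetical and are not taken from the slides or from any OSD specification.

```python
import heapq
import itertools
from dataclasses import dataclass, field

REAL_TIME = 0      # served first
BACKGROUND = 1     # exhaustive-search reads


@dataclass(order=True)
class Request:
    priority: int
    seq: int                                   # arrival order breaks ties within a class
    description: str = field(compare=False)


class DifferentiatedQueue:
    """Hypothetical request queue that knows each request's service class."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, priority, description):
        heapq.heappush(self._heap, Request(priority, next(self._counter), description))

    def next_request(self):
        return heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)


queue = DifferentiatedQueue()
queue.submit(BACKGROUND, "scan fragment A")
queue.submit(BACKGROUND, "scan fragment B")
queue.submit(REAL_TIME, "read for interactive app")

served = [queue.next_request().description for _ in range(len(queue))]
print(served)
```

The real-time read is served first even though it arrived last, while the two background scans keep their relative order.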
Summary of what we’ve seen:
Data Characteristics:
– Exhaustive search had better be darn efficient!
Disk Technology Trends:
– The exhaustive search algorithm should consciously avoid random disk seeks.
Filesystem and Database Design:
– F/S and D/B are not the right place to embed exhaustive search.
Concurrent Applications:
– The storage device should be able to provide differentiated services.
What is this work about?
A fresh look at exhaustive search.
Ensure that the storage system is never the performance bottleneck.
Conscious of random disk seeks.
Close-to-sequential performance, always.
Concurrent execution with other filesystem apps.
– Without compromising extensively on response time and efficiency.
An Overview of the Proposed Approach
Layout-aware:
– Search order is based not on the logical filesystem view but on the physical on-disk organization.
– As close to sequential performance as possible.
Suspend-and-resume:
– On a real-time request to the disk:
    Suspend exhaustive search.
    Service the real-time request.
    Resume exhaustive search.
– Modify the search order based on the current disk head position.
Questions to be answered …
Architecture:
– Where to embed the functionality: filesystem or smart object-based disk?
Layout-Aware Search:
– Planning the search?
– Metadata handling and placement?
    Where are object extents located?
    List of objects already scanned.
Suspend-Resume:
– Maintaining search-progress metadata to avoid re-scanning [suspend]
– Computing a new search plan [resume]
Proposed Solution
Architecture: an intelligent storage node (ISN) capable of exhaustive search.
– Exposes an object interface.
– A case for application-awareness at the storage level to improve performance.
Why OSD?
– Filesystems and databases have no knowledge of storage internals and parameters.
– A filesystem can be built on multiple layers of virtualization, disconnected from the physical reality (disk boundaries).
– Filesystem-level search performance degrades with fragmentation.
– Block storage does not differentiate between real-time and exhaustive-search requests.
The Intelligent Storage Node
Storage node with:
– T10-compatible object interface.
– Capable of executing a limited set of exhaustive search queries.
– Called an Intelligent Storage Node (ISN).
Extension to the OSD interface:
– Command OSD_QUERY to trigger an exhaustive search.
– resultSetCollectionID OSD_QUERY(queryType, exampleObjectID)
Search order: object-fragment-level search as opposed to object-level search.
Suspend and resume:
– Static: search order not modified on resume.
– Dynamic: search order adjusted on resume.
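The difference between the static and dynamic resume modes can be sketched as below. Fragments are represented only by their start blocks, and the set of already scanned fragments plays the role of the search-progress metadata. The dynamic policy shown (continue upward from the current head position, then wrap around to the fragments behind it) is just one plausible choice and is an assumption; the slides do not say how the new order is computed.

```python
def remaining(plan, scanned):
    """Fragments of the plan that have not been scanned yet, in plan order."""
    return [start for start in plan if start not in scanned]


def resume_static(plan, scanned):
    """Static resume: continue the original order where the search left off."""
    return remaining(plan, scanned)


def resume_dynamic(plan, scanned, head_position):
    """Dynamic resume (assumed policy): start with fragments at or beyond the
    current head position, then wrap around to the ones behind it."""
    todo = sorted(remaining(plan, scanned))
    ahead = [start for start in todo if start >= head_position]
    behind = [start for start in todo if start < head_position]
    return ahead + behind


plan = [100, 200, 300, 400, 500, 600]   # layout-aware plan: ascending start blocks
scanned = {100, 200}                    # progress recorded at suspend time
head_after_real_time_request = 450      # where the real-time request left the head

static_order = resume_static(plan, scanned)
dynamic_order = resume_dynamic(plan, scanned, head_after_real_time_request)
print("static: ", static_order)
print("dynamic:", dynamic_order)
```

In both modes the scanned set prevents re-scanning; only the order of the remaining fragments differs.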
ISN for Exhaustive Search
A case for application-aware intelligent storage.
Application characteristic:
– The order in which object fragments are scanned is not important.
Storage-device characteristic:
– Sequential performance is 10X better than random-access performance.
Application-aware storage optimization:
– Determine the search order of fragments to obtain close-to-sequential performance.
– Suspend-and-resume support for real-time requests.
Architecture of ISN:
Prototype under development, based on the DISC OSD reference implementation:
– Object filesystem (ext3)
– Fragment Indexer
– Search Planner
Real-time request support: implementation in progress.
Initial results look very promising.
[Figure: ISN architecture diagram with the components OSD Command Interpreter, Object Filesystem, Fragment Indexer, Search Planner, and Block Device.]
Experimental Setup
[Figure: experimental setup diagram with the components Aging tool, Ext3 Filesystem, F/S Search Planner, Layout-Aware Search Planner, and Search executor. Labels: Filesystem search plan, Layout-aware search plan.]
The aging tool synthetically fragments a filesystem through file append, delete, and create operations.
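A rough sketch of what such an aging tool might look like is given below: it applies a random mix of create, append, and delete operations to a scratch directory. This is not the tool used in the experiments. The function name, the operation mix, and the 4 KB chunk size are assumptions, and whether the resulting files actually end up fragmented depends on the underlying filesystem's allocator.

```python
import os
import random
import tempfile


def age_filesystem(root, operations, seed=0, chunk_size=4096):
    """Apply a random mix of create, append, and delete operations under `root`.

    Returns the paths of the files that still exist afterwards.
    """
    rng = random.Random(seed)
    live_files = []
    created = 0
    for _ in range(operations):
        op = rng.choice(["create", "append", "delete"])
        if op == "create" or not live_files:
            # Create a new file (also the fallback when nothing exists yet).
            path = os.path.join(root, f"file_{created:06d}.dat")
            created += 1
            with open(path, "wb") as handle:
                handle.write(os.urandom(chunk_size))
            live_files.append(path)
        elif op == "append":
            # Grow an existing file so that it may need a new extent.
            with open(rng.choice(live_files), "ab") as handle:
                handle.write(os.urandom(chunk_size))
        else:
            # Delete a file, leaving a hole that later writes may fill.
            victim = live_files.pop(rng.randrange(len(live_files)))
            os.remove(victim)
    return live_files


with tempfile.TemporaryDirectory() as scratch:
    survivors = age_filesystem(scratch, operations=200)
    on_disk = sorted(os.listdir(scratch))
    print(len(survivors), "files remain after aging")
```

Interleaving appends with deletes of other files is what tends to scatter a file's blocks, since freed space is reused by later writes.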
Results
Storage age = 5
Filesystem usage: 10G (partition size = 63G)
Time taken for exhaustive search:
– Filesystem: 41 mins
– Layout-aware search: 7 mins
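For a sense of scale, the speedup and the effective scan rates implied by these two timings can be computed directly. The calculation assumes that "10G" means 10 GiB and that each search reads all of the used data once; neither assumption is stated on the slide.

```python
filesystem_minutes = 41
layout_aware_minutes = 7
data_gib = 10   # "10G" from the slide, assumed here to mean GiB


def throughput_mib_per_s(gib, minutes):
    """Average scan rate if `gib` GiB are read once in `minutes` minutes."""
    return gib * 1024 / (minutes * 60)


speedup = filesystem_minutes / layout_aware_minutes
fs_rate = throughput_mib_per_s(data_gib, filesystem_minutes)
layout_rate = throughput_mib_per_s(data_gib, layout_aware_minutes)

print(f"speedup: {speedup:.1f}x")
print(f"filesystem search: {fs_rate:.1f} MiB/s")
print(f"layout-aware search: {layout_rate:.1f} MiB/s")
```

Under these assumptions the layout-aware search is a little under six times faster, at roughly 24 MiB/s against about 4 MiB/s.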
Acknowledgments
DISC Team
– Faculty and Cory Devor
– Students
Member Companies
Thank You!
Questions?
More spindles?
More spindles come at a cost:
– Hardware cost: not too bad.
– Maintenance (backup, scrubbing): expensive.
– Concurrent failure issues.
Our technique can make “more spindles” even better.
More spindles are not a solution for all scenarios:
– Home user
– Video and image search imaginable.
Lazy defragmentation of the filesystem?
Alleviates the issue, but is an expensive operation in itself.
May still not work as well as a storage-level search when there are multiple levels of virtualization.
Investigations to do:
Layout-awareness:
– Two modes of layout-aware search: pre-planned and ad hoc.
    Pre-planned mode is used when the disk stores a small number of objects.
    Ad hoc mode is used when the disk is almost full.
    Pre-planned and ad hoc modes can be used at finer granularities (for example, different modes on different areas of the disk).
Suspend-resume:
– Suspend: search metadata is distributed over the disk, close to the data.
– Resume: based on the number of remaining objects, we shift to either the pre-planned or the ad hoc mode.