An Active and Hybrid Storage System for Data-intensive Applications
DESCRIPTION
As large-scale, data-intensive applications have become widely deployed, demand has grown for high-performance storage systems to support them. Compared with traditional storage systems, next-generation systems will embed dedicated processors to reduce the computational load on host machines and will combine different types of storage devices. We present a new active storage architecture that leverages the computational power of a dedicated processor, and show how it exploits a multi-core processor to offload computation from the host machine. We then address the challenge of making an active storage node cooperate with the other nodes in a cluster by designing a pipeline-parallel processing pattern, and we report the effectiveness of this mechanism. To evaluate the design, we extend an open-source bioinformatics application to use the pipeline-parallel mechanism. We also explore hybrid configurations of storage devices within the active storage node. Flash-memory-based solid state disks have come to play a critical role in revolutionizing storage. However, rather than simply replacing traditional magnetic hard disks with solid state disks, researchers believe that finding a complementary approach that incorporates both is more challenging and attractive. We therefore propose a hybrid combination of different types of disk drives for our active storage system, and design and implement a simulator to verify the new configuration. In summary, this dissertation explores active storage, an emerging storage configuration, in terms of its architecture and design, its parallel processing capability, its cooperation with other machines in a cluster computing environment, and a new disk configuration that combines different types of disk drives.
TRANSCRIPT
04/08/2023
An Active and Hybrid Storage System for Data-intensive Applications
Ph.D. Candidate: Zhiyang Ding
Defense Committee Members: Dr. Xiao Qin, Dr. Kai H. Chang, Dr. David A. Umphress
University Reader: Prof. Wei Wang, Chair of the Art Design Dept.
2
Cluster Computing
• Large-scale Data Processing is everywhere.
3
Motivation
• Traditional Storage Nodes on the Cluster
Client Network switch
Compute Nodes
Storage Node (or Storage Area Network)
Internet
Head Node
4
Motivation
• What’s next?
• More “Active”.
Storage Node
Client Network switch
Compute Nodes
Internet
Head Node
Computation Offload
I/O Request
Raw Data
Pre-processed Data
5
About the Active Storage
pp-mpiBlast: How to deploy Active Storage?
McSD: A Smart Disk Model
Storage Node
HcDD: Hybrid Disk for Active Storage
6
McSD: A Multicore Active Storage Device
• I/O Wall Problem: CPU-I/O Gap
– Limited I/O Bandwidth
– CPU Waiting and Dissipating Power
• How to
– Bridge the CPU-I/O Gap
– Reduce I/O Traffic
7
• “Active”:
– Leveraging the Processing Power of Storage Devices
• Benefits:
– Offloading Data-intensive Computation
– Reducing I/O Traffic
– Pipeline Parallel Programming
Why McSD?
8
• Design a prototype of a multicore active storage
• Design a pre-assembled processing module
• Extend a shared-memory MapReduce system
• Emulate the whole system on a real testbed
Contributions
9
• Traditional Smart/Active Disks
– On-board: Embedding a processor into the hard disk
– Various Research Models
• e.g., active disk, smart disk, IDISK, SmartSTOR, etc.
Background: Active Disks
• However, “active disk” is not adopted by hardware vendors
Improved attachment technologies
I/O Bound Workloads
Cost of the System
Reliability
10
• Multi-core Processors or Multi-processors
– A 45% increase in transistors yields a 20% increase in processing power
• MapReduce: a Parallel Programming Model
– MapReduce by Google
– Hadoop, Mars, Phoenix, etc.
• Multicore and Shared-memory Parallel Processing
Background: Parallel Processing
11
Design: System Overview
Multicore and Shared-memory Parallel Processing
Communication Mechanism
Hybrid Storage Disks
Pipeline Parallel Processing
Design of an Active Storage
12
• Computation Mechanism
– Pre-assembled Processing Model
– smartFAM
• Extend the Shared-Memory MapReduce by Partitioning
Design and Implementation
13
• Pre-assembled Processing Modules
– Meet the nature of embedded services
– Reduce Complexity and Cost
– Provide Services
• E.g., multi-version antivirus service, pre-processing for data-intensive apps, de-duplication, etc.
• How to invoke services?
Pre-assembled Processing Modules
14
• smartFAM = Smart File Alteration Monitor
– Invokes the pre-assembled processing modules or functions by monitoring changes to the system log file.
• Two Components:
– inotify: a Linux file-change notification facility
– a trigger daemon
smartFAM
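The smartFAM idea of watching a log file and triggering a module when a new entry appears can be sketched as follows. This is a minimal illustration rather than the actual smartFAM implementation: it swaps inotify for a simple polling call so that it needs only the Python standard library, and the `MODULES` registry, the `poll_log` function, and the "module-name argument" log line format are all assumptions made for the example.

```python
import os
import tempfile

# Hypothetical registry of pre-assembled modules, keyed by the name written to the log.
MODULES = {
    "wordcount": lambda arg: f"wordcount started on {arg}",
    "dedup": lambda arg: f"dedup started on {arg}",
}


def poll_log(path, offset):
    """Read log entries appended since byte `offset` and invoke matching modules.

    Returns (new_offset, results). Polling stands in for inotify here.
    """
    results = []
    with open(path, "rb") as log:
        log.seek(offset)
        for raw in log.read().splitlines():
            parts = raw.decode("utf-8").split(maxsplit=1)
            # Assumed entry format: "<module-name> <argument>"; other lines are ignored.
            if len(parts) == 2 and parts[0] in MODULES:
                results.append(MODULES[parts[0]](parts[1]))
        new_offset = log.tell()
    return new_offset, results


def demo():
    """Append two requests to a temporary log and process each one exactly once."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "host.log")
        with open(path, "w") as log:
            log.write("wordcount /data/input.txt\n")
        offset, first = poll_log(path, 0)
        with open(path, "a") as log:
            log.write("dedup /data/backup.img\n")
        offset, second = poll_log(path, offset)
    return first, second
```

A trigger daemon would call `poll_log` in a loop, or block on inotify events, and keep the returned offset so that each log entry is handled once.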
15
Design and Implementation
16
Extending Phoenix: A Shared-memory MapReduce Model
• Extend the Phoenix MapReduce Programming Model by partitioning and merging
– New API: partition_input
– New Functions:
• partition (provided by the new API)
• merge (developed by the user)
• Example:
– wordcount [data-file][partition-size][]
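The partition-then-merge pattern can be illustrated with a small word count. This is a sketch in Python, not the Phoenix C API: the `partition_input` function below only borrows the name from the slide, and its signature, the whitespace-aligned splitting rule, and the `word_count` and `merge` helpers are assumptions for illustration.

```python
from collections import Counter


def partition_input(text, partition_size):
    """Yield chunks of roughly `partition_size` characters, cut only at whitespace."""
    if partition_size < 1:
        raise ValueError("partition_size must be positive")
    start = 0
    while start < len(text):
        end = min(start + partition_size, len(text))
        # Move the cut forward to the next whitespace so no word is split in two.
        while end < len(text) and not text[end].isspace():
            end += 1
        yield text[start:end]
        start = end


def word_count(chunk):
    """Count the words of one partition (the per-partition MapReduce job)."""
    return Counter(chunk.split())


def merge(partials):
    """User-supplied merge step: add up the per-partition counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


text = "a bb a ccc bb a"
merged = merge(word_count(chunk) for chunk in partition_input(text, 4))
```

Processing one partition at a time keeps the working set small, which is the motivation for partitioning when the input is large relative to memory.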
17
Pipeline Processing
18
• Testbed
• Benchmarks
– Word Count
– String Match
– Matrix Multiplication
• Individual Node Performance
• System Performance
Evaluation Environment
19
                 Word Count (seconds)     String Match (seconds)
                 1 GB       1.25 GB       1 GB       1.25 GB
w/ Partition     40.60      50.91         17.76      20.61
w/o Partition    85.74      139.54        17.62      21.00
Individual Node Performance
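The effect of partitioning can be read off these timings as a ratio of elapsed times. The short calculation below uses only the numbers reported on this slide; the dictionary layout and the `partition_speedup` helper are illustrative names.

```python
# Elapsed times in seconds, copied from the individual-node measurements.
TIMES = {
    ("Word Count", "1 GB"): {"with": 40.60, "without": 85.74},
    ("Word Count", "1.25 GB"): {"with": 50.91, "without": 139.54},
    ("String Match", "1 GB"): {"with": 17.76, "without": 17.62},
    ("String Match", "1.25 GB"): {"with": 20.61, "without": 21.00},
}


def partition_speedup(entry):
    """Speedup of the partitioned run over the unpartitioned run."""
    return entry["without"] / entry["with"]


SPEEDUPS = {key: round(partition_speedup(entry), 2) for key, entry in TIMES.items()}

for (benchmark, size), speedup in SPEEDUPS.items():
    print(f"{benchmark:12s} {size:8s} {speedup:.2f}x")
```

Word Count gains roughly 2.1x to 2.7x from partitioning, while String Match is essentially unchanged at about 1.0x.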
20
Matrix-Multiplication and Word-Count (Speedups)
Input Data Size    vs Single Machine    vs Single-core Active    vs McSD w/o Partition
500 MB             1.47 X               2.15 X                   0.99 X
750 MB             1.45 X               2.09 X                   1.04 X
1 GB               7.62 X               2.14 X                   6.07 X
1.25 GB            19.01 X              2.50 X                   15.39 X
System Evaluation
21
• It can improve system performance by offloading data-intensive computation
• McSD is a promising active storage model with
– Pre-assembled processing modules
– Parallel data processing
– Better Evaluation Performance
Summary
22
Storage Node
About the Active Storage
pp-mpiBlast: How to deploy Active Storage?
McSD: A Smart Disk Model
HcDD: Hybrid Disk for Active Storage
23
• So far, we have seen the potential of Active Storage
• Challenge: How to coordinate active storage nodes with computing nodes?
• Propose a Pipeline-parallel Processing pattern
Apply Active Storages to a Cluster
24
• Propose a pipeline-parallel processing framework to “connect” an Active Storage node with computing nodes.
• Evaluate the framework using both an analytic model and a real implementation.
• Case Study: Extend an existing bioinformatics application based on the framework.
Contributions
25
Background: Active Storage
SSD
Mass Storage
Active Storage Node
SSD
Memory
Buff Disks
Processor
Computation
Bridge?
27
• BLAST*: Basic Local Alignment Search Tool
– Compares primary biological sequence information
• mpiBLAST** is a freely available, open-source, parallel implementation of NCBI BLAST.
– Format raw data files
– Run a parallel BLAST function
Background: Bioinformatics App
*http://blast.ncbi.nlm.nih.gov/
**http://www.mpiblast.org/
28
• Offload the raw-data formatting task to where the data is stored.
• Intra-application Pipeline-parallel Processing by “partition” and “merge”.
• pp-mpiBlast, a case study.
Pipeline-parallel Design
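The partition, format, search, and merge pipeline can be sketched with two threads and a bounded queue. This is a toy model under stated assumptions, not pp-mpiBlast itself: `format_partition` and `search_partition` are hypothetical stand-ins for the formatting and mpiBLAST steps, one thread plays the active storage node and another plays the computing nodes, and the queue size of 2 is arbitrary.

```python
import queue
import threading


def format_partition(raw):
    """Stand-in for the raw-data formatting step run on the active storage node."""
    return raw.upper()


def search_partition(formatted, query):
    """Stand-in for the search step run on the computing nodes: count query hits."""
    return formatted.count(query.upper())


def run_pipeline(partitions, query):
    handoff = queue.Queue(maxsize=2)  # bounded buffer between the two stages
    sub_outputs = {}

    def storage_stage():
        # Format partition i+1 while the compute stage is still searching partition i.
        for index, raw in enumerate(partitions):
            handoff.put((index, format_partition(raw)))
        handoff.put(None)  # sentinel: no more partitions

    def compute_stage():
        while True:
            item = handoff.get()
            if item is None:
                break
            index, formatted = item
            sub_outputs[index] = search_partition(formatted, query)

    stages = [threading.Thread(target=storage_stage),
              threading.Thread(target=compute_stage)]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    # Merge: combine the sub-outputs in partition order.
    return sum(sub_outputs[index] for index in sorted(sub_outputs))


total_hits = run_pipeline(["acgtac", "ttac", "gg"], "ac")
```

Because the two stages overlap, the elapsed time approaches that of the slower stage once the pipeline is full, rather than the sum of both stages.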
29
Pipelining Workflow
[Figure: pipelining workflow between the Active Storage Node and the Computing Nodes. Stages: Partition, FormatDB, mpiBlast, Merge. Data: raw input file, partitions 1 to n, intermediates 1 to n, sub-outputs 1 to n, output file.]
30
Analytic Model
• Three Critical Measures
31
           Computing Nodes Configuration    Active Storage Configuration
CPU        Intel Xeon X3430                 Intel Core 2 Q9400
Memory     2 GB DDR3 (PC3-10600)
OS         Ubuntu 9.04 Jaunty Jackalope 32bit Version
Kernel     2.6.28-15-generic
Network    Gigabit LAN
Evaluation Environment
Our Testbed              Comparison Testbeds
“Pipeline-parallel”      “12-node Cluster”      “13-node Cluster”
12 Computing Nodes       12 Computing Nodes     13 Computing Nodes
1 Active Storage Node    1 Storage Node         1 Storage Node
32
Pipeline-parallel Design
Results: Compared With 12-node System
Results: Compared With 13-node System
33
Speedup Trends: Partition Size
34
• We proposed a pipeline-parallel processing mechanism to apply an Active Storage Node.
• As a case study, we extended a classic bioinformatics application based on the pipeline-parallel style.
Summary
35
About the Active Storage
pp-mpiBlast: How to deploy Active Storage?
McSD: A Smart Disk Model
Storage Node
HcDD: Hybrid Disk for Active Storage
36
What’s Hybrid?
A Hybrid Combination of a Gas Engine and an Electric Engine
Power Efficiency
37
Hybrid Disk Drives
• A Hybrid Combination of Two Types of Storage Devices: HDD and SSD
– HDD: Magnetic Hard Disk
– SSD: Solid State Disk, built from NAND-based flash memory
What are their roles?
38
Motivation
• However, SSDs suffer from reliability issues.
• In a hybrid storage system, using SSDs as the buffer can boost the performance.
39
• Flash Memory:
– Each block consists of 32, 64, or 128 pages.
– Each page is typically 512, 2,048, or 4,096 bytes.
• “Erase-before-write” at block level.
• Lifespan is 10,000 Program/Erase cycles.
– E.g., *an 80 GB MLC SSD lasts only 106 days if the write rate is 30 MB/s.
Limitations Related to SSDs
• Rethink their roles?
*Based on the SSD lifespan calculator provided by Virident.com
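A back-of-the-envelope version of the lifespan estimate helps show how capacity, Program/Erase cycles, and write rate interact. The function below is an idealized sketch, not the Virident calculator: it assumes perfect wear leveling and decimal units (1 GB = 10^9 bytes), and the `write_amplification` parameter is a hypothetical knob for extra internal writes that the slide does not quantify.

```python
SECONDS_PER_DAY = 86_400


def ideal_lifespan_days(capacity_bytes, pe_cycles, write_rate_bytes_per_s,
                        write_amplification=1.0):
    """Days until every block has used up its Program/Erase budget.

    Assumes writes are spread evenly over all blocks (perfect wear leveling).
    """
    total_writable_bytes = capacity_bytes * pe_cycles
    effective_rate = write_rate_bytes_per_s * write_amplification
    return total_writable_bytes / effective_rate / SECONDS_PER_DAY


# 80 GB drive, 10,000 P/E cycles, sustained 30 MB/s of writes.
ideal_days = ideal_lifespan_days(80e9, 10_000, 30e6)
print(f"idealized lifespan: {ideal_days:.1f} days")
```

Under these idealized assumptions the estimate is about 309 days, longer than the 106 days quoted on the slide, so the calculator presumably accounts for further overheads. For illustration only, a hypothetical write amplification factor of about 2.9 would bring the estimate close to 106 days, but neither the slide nor this sketch establishes that value.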
40
• Hybrid Combination of HDD and SSD disks
• De-duplication Service using HDDs as a Write Buffer
• Internal-parallel Processing in SSD
• Simulation of the Whole System For Evaluation
Contributions
41
Hybrid Disk Configuration
[Figure: hybrid disk configuration. Components: HDD, SSD, Dedicated Processor. Flows: I/O requests, read requests, data of write requests, de-duplication, deduplicated data, pre-processing, pre-processed data.]
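One plausible reading of this configuration is that incoming writes are staged on the HDD, de-duplicated by the dedicated processor, and only previously unseen content is then written to the SSD. The sketch below illustrates that reading and is not the HcDD design itself: the `HybridDisk` class, the SHA-256 content fingerprint, and the in-memory dictionaries standing in for the two devices are all assumptions.

```python
import hashlib


class HybridDisk:
    """Toy model: the HDD acts as a write buffer, the SSD stores each unique block once."""

    def __init__(self):
        self.hdd_buffer = []   # staged writes as (logical address, block bytes)
        self.ssd_blocks = {}   # content fingerprint -> block bytes stored on the SSD
        self.mapping = {}      # logical address -> content fingerprint

    def write(self, address, block):
        """Stage a write on the HDD buffer without touching the SSD."""
        self.hdd_buffer.append((address, block))

    def flush(self):
        """De-duplicate staged writes; return the number of blocks written to the SSD."""
        ssd_writes = 0
        for address, block in self.hdd_buffer:
            fingerprint = hashlib.sha256(block).hexdigest()
            if fingerprint not in self.ssd_blocks:
                self.ssd_blocks[fingerprint] = block
                ssd_writes += 1
            self.mapping[address] = fingerprint
        self.hdd_buffer.clear()
        return ssd_writes

    def read(self, address):
        """Serve the newest staged copy if one exists, otherwise read from the SSD."""
        for staged_address, block in reversed(self.hdd_buffer):
            if staged_address == address:
                return block
        return self.ssd_blocks[self.mapping[address]]


disk = HybridDisk()
disk.write(0, b"alpha")
disk.write(1, b"beta")
disk.write(2, b"alpha")          # duplicate content at a different address
blocks_written = disk.flush()
```

Fewer SSD block writes mean fewer Program/Erase cycles consumed, which is the link between de-duplication and SSD lifespan. The sketch omits garbage collection of blocks that are no longer referenced.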
42
HcDD Architecture
43
Deduplication Design
44
Internal Parallel Processing
45
Evaluation
46
Internal Parallelism Evaluation: Single Node
47
Single Node: Dedup Ratio
48
System Performance Evaluation
49
System Performance Evaluation
50
Summary
51
Conclusion
pp-mpiBlast: How to deploy Active Storage?
McSD: A Smart Disk Model
Storage Node
HcDD: Hybrid Disk for Active Storage
52
Future Work
53
Many Thanks! And Questions?