![Page 1: storage for tiers space for everythingstorageconference.us/2017/Presentations/CampaignStorage-slides.pdf · Campaign Storage storage for tiers space for everything Peter Braam. Co-founder](https://reader036.vdocument.in/reader036/viewer/2022081612/5f12d3925e5d581872174746/html5/thumbnails/1.jpg)
Campaign Storage
storage for tiers, space for everything
Peter Braam
Co-founder & CEO
Campaign Storage, LLC
2017-05
campaignstorage.com
Contents
• Brief overview of the system
• Creating and updating policy databases
• Data management APIs
5/7/17 Campaign Storage LLC 2
Thank you
The reviewers of our paper asked quite a few insightful questions
Thank you.
Campaign Storage
Invented at LANL
Being productized at Campaign Storage
Storage tiers: CPU or GPU packages (CPU cores, High Bandwidth Memory); NVRAM (e.g. XPOINT, PCM, STTRAM); RAM; FLASH; DISK; TAPE. Figures in the order listed on the slide:
• BW cost, $/(GB/s): $10 (CPU included!), $10, $200, $2K, $30K
• Capacity cost, $/GB: $, $8, $0.3, $0.05, $0.01
• Node BW: 1 TB/s, 100 GB/s, 20 GB/s, 5 GB/s
• Cluster BW: 1 PB/s, 100 TB/s, 5 TB/s, 100 GB/s, 10s of GB/s
• Software: Language level; Language level; HDF5 / DAOS; DDN IME, Cray Data Warp; Parallel FS, Campaign Storage; Archive, Campaign
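A back-of-envelope sketch of how the two cost rows trade off. It assumes a hypothetical cost model in which a tier must be bought in enough quantity to meet both a capacity target and a bandwidth target, so its cost is the larger of capacity × $/GB and bandwidth × $/(GB/s). The price pairs are from the slide ($200 per GB/s with $0.3/GB, and $2K per GB/s with $0.05/GB); reading them as the FLASH and DISK columns is an assumption, and the names `tier_cost`, `FLASH_PRICE` and `DISK_PRICE` are illustrative.

```c
#include <assert.h>

/* Price pair for one tier, taken from the slide's cost rows. */
struct tier_price {
    double usd_per_gbps;   /* bandwidth cost, $ per GB/s */
    double usd_per_gb;     /* capacity cost, $ per GB */
};

/* Assumed column mapping (hypothetical): flash and disk. */
static const struct tier_price FLASH_PRICE = { 200.0, 0.3 };
static const struct tier_price DISK_PRICE  = { 2000.0, 0.05 };

/* Hypothetical model: buy enough of the tier to meet whichever target is
 * more expensive; that purchase then also covers the other target. */
static double tier_cost(struct tier_price p, double capacity_gb, double bw_gbps)
{
    assert(capacity_gb >= 0.0 && bw_gbps >= 0.0);
    double for_capacity  = capacity_gb * p.usd_per_gb;
    double for_bandwidth = bw_gbps * p.usd_per_gbps;
    return for_capacity > for_bandwidth ? for_capacity : for_bandwidth;
}
```

Under this model, 10 PB (10^7 GB) at 100 GB/s costs about $0.5M on disk and $3M on flash; raising the target to 1 TB/s makes disk bandwidth-bound at about $2M, while flash stays capacity-bound at $3M.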
Campaign Storage - a new tier
[Figure: Old World vs. New World. Old World: Parallel File System, Archive. New World: Burst Buffer, Parallel File System, Campaign Storage, Archive, Cloud. Annotations: decreasing emphasis; High BW, high $$$; decreasing capacities; TB/sec; 100 GB/sec, large, reliable; 10 GB/sec; archive stage.]
[Figure: Campaign Storage alongside HPC clusters]
• HPC Cluster A: Simulation Cluster, 20 PF; Burst Buffer, 5 PB & 5 TB/s
• HPC Cluster B: HPCD Cluster, 20 PF; Burst Buffer, 5 PB & 5 TB/s
• HPCD & Viz Cluster: HDFS
• Lustre FS: 1 TB/s
• Campaign Storage: Mover Nodes (parallel staging & archiving), Metadata Repository, Object Repository, Search & Data Management
File System Interface
Optional other tools:
• Policy managers (e.g. Robinhood)
• Workflow managers (e.g. iRODS)
(customer infrastructure)
Campaign Storage
It is …
• A file system for staging and archiving
• Built from low-cost HW, but:
   • Industry-standard object stores
   • Existing metadata stores
• High integrity
• High capacity, ultra scalable
• Not the highest BW or lowest latency; bandwidth is:
   • 10-100x higher than archives
   • 10x lower than PFS
It is not …
• A general purpose file system
   • Wait … these don’t exist actually
Using object stores has problems:
• Data mover support takes effort
• We will ease that pain
Implementation - modules
[Figure: module stack]
• OS with VFS and FUSE
• MarFS
• Object Storage; Metadata FS
• Data Movers: HSM (Lustre / DMAPI), MPI, Enterprise NAS, gridftp
• Management: Analytics & Search, Migration, Containers
Campaign Storage - deployment
[Figure: Campaign Storage Mover Nodes, Metadata Repository, Object Repository, and Search & Data Management]
Object Repository
• Disk object stores: commercial & OSS
• Archival object stores: BlackPearl
• Full POSIX objects: stored in the metadata FS
Metadata Repository
• Some nearly POSIX distributed FS with EAs (extended attributes):
   • Lustre / ZFS
   • GPFS
Move & Manage
• Nodes: 1 to 100s, mounting MarFS & other FS
• Mover software: runs on the mover nodes
• Management: search analytics in MarFS, 3rd party movement, containers
Policy Databases
Traditional approach
Database with a record for each file, found in HPSS, Robinhood, DMF, etc.
Used for:
• Understanding what is in the file system: which files are old, recent, big, belong to a group, or reside on a device
• Assisting automatic (“policy”) or manual data management
Typically, histogram ranges are computed from search results.
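A minimal sketch of the traditional workflow: a search selects matching per-file records, and a histogram is then computed over the results, here in power-of-two size ranges. The record layout and `size_histogram_for_uid` are hypothetical, not taken from HPSS, Robinhood or DMF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 65   /* bucket 0 for empty files, then one per power of two */

/* Hypothetical per-file record: one row per file in the policy database. */
struct file_record {
    uint64_t size;    /* bytes */
    uint32_t uid;     /* owner */
    int64_t  atime;   /* last access, seconds since the epoch */
};

/* Bucket b > 0 holds sizes in [2^(b-1), 2^b). */
static unsigned size_bucket(uint64_t size)
{
    unsigned b = 0;
    while (size) {
        size >>= 1;
        b++;
    }
    return b;
}

/* Search step (select records owned by uid), then histogram step. */
static void size_histogram_for_uid(const struct file_record *recs, size_t n,
                                   uint32_t uid, uint64_t hist[NBUCKETS])
{
    for (unsigned b = 0; b < NBUCKETS; b++)
        hist[b] = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].uid != uid)
            continue;
        unsigned b = size_bucket(recs[i].size);
        assert(b < NBUCKETS);
        hist[b]++;
    }
}
```

Every such overview touches every matching record, which is why query time grows with the number of files.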
Challenges
• Performance, of both ingest and queries: queries on a 100M-file database can take minutes
• Scalability: requires significant RAM (e.g. 30% of DB size); handling more than 1B files is very difficult at present
• Never 100% in sync
• Adds load to premium storage
Approaches
• Horizontally scaling key-value store: LANL is exploring this
• A variety of proprietary approaches, e.g. Komprise
• Histogram analytics. Maintaining aggregate data has its own challenges, e.g. how to measure the change in size of a file: very few changelogs record the old size
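Why the old size matters can be shown with a size histogram: to stay exact, a size change must remove the file from its old bucket and add it to the new one, so the update needs both sizes. The `size_change` record below is hypothetical; it models a changelog that does carry the old size.

```c
#include <assert.h>
#include <stdint.h>

#define NBUCKETS 65

/* Hypothetical changelog entry for a size change. Many real changelogs
 * carry only the new size, which is what makes this hard. */
struct size_change {
    uint64_t old_size;
    uint64_t new_size;
};

/* 0 for empty files, else floor(log2(size)) + 1. */
static unsigned size_bucket(uint64_t size)
{
    unsigned b = 0;
    while (size) {
        size >>= 1;
        b++;
    }
    return b;
}

/* Apply one entry: a move between buckets; the file count is unchanged. */
static void apply_size_change(int64_t hist[NBUCKETS], struct size_change c)
{
    unsigned from = size_bucket(c.old_size);
    unsigned to = size_bucket(c.new_size);
    assert(from < NBUCKETS && to < NBUCKETS);
    hist[from]--;
    hist[to]++;
}
```

Without `old_size`, the update cannot tell which bucket to decrement; the fallbacks are a per-file lookup of the previous size, which reintroduces a per-file database, or a rescan.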
Analytics - subtree search
Every directory has a histogram recording properties of its subtree:
• Encode: how many files, and how many bytes, in the subtree have a property?
• Limited granularity, limited relational algebra
• Store perhaps ~100,000 properties per directory
Examples:
• Quota in subtree? User/group database for subtree?
• Which file servers contain the files?
• Geospatial information in file?
• (file type, size, access time) tuples, which allow limited relational algebra
Not a new idea. Can be added to ZFS & Lustre.
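A sketch of one directory's subtree histogram over (file type, size, access time) tuples, assuming a hypothetical bucket layout: 8 types × 65 power-of-two size ranges × 16 power-of-two access-age ranges (in days), i.e. 8,320 buckets, each holding #files and #bytes. The query shows what limited granularity means in practice: conditions can only be stated on bucket boundaries. All names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bucket layout for (file type, size, access age) tuples. */
#define NTYPES 8                      /* e.g. regular, directory, symlink ... */
#define NSIZE  65                     /* power-of-two size ranges */
#define NAGE   16                     /* power-of-two age ranges, in days */
#define NBUCKETS (NTYPES * NSIZE * NAGE)

struct dir_histo {
    uint64_t files[NBUCKETS];         /* #files in the subtree per tuple */
    uint64_t bytes[NBUCKETS];         /* #bytes in the subtree per tuple */
};

/* floor(log2(v)) + 1, 0 for v == 0, clamped to the last bucket. */
static unsigned log2_bucket(uint64_t v, unsigned nb)
{
    unsigned b = 0;
    while (v && b < nb - 1) {
        v >>= 1;
        b++;
    }
    return b;
}

static unsigned tuple_index(unsigned type, uint64_t size, uint64_t age_days)
{
    assert(type < NTYPES);
    return (type * NSIZE + log2_bucket(size, NSIZE)) * NAGE
           + log2_bucket(age_days, NAGE);
}

static void histo_add_file(struct dir_histo *h, unsigned type,
                           uint64_t size, uint64_t age_days)
{
    unsigned i = tuple_index(type, size, age_days);
    h->files[i]++;
    h->bytes[i] += size;
}

/* Limited relational query: files of one type whose size bucket is at
 * least min_sb and whose age bucket is at least min_ab. */
static uint64_t count_cold_large(const struct dir_histo *h, unsigned type,
                                 unsigned min_sb, unsigned min_ab)
{
    uint64_t n = 0;
    for (unsigned s = min_sb; s < NSIZE; s++)
        for (unsigned a = min_ab; a < NAGE; a++)
            n += h->files[(type * NSIZE + s) * NAGE + a];
    return n;
}
```

Answering "large files of type 0 not accessed for two months" then costs a fixed number of bucket reads, independent of how many files the subtree holds.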
Include, e.g., a linked list of subdirectories and a database of the parents of files with link count > 1
Iterate over subdirectories
[Figure: a dir with its HistoDB, pointing to Subdir 1, Subdir 2 and Subdir 3; each subdirectory has its own HistoDB and prv/nxt links to its siblings]
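A minimal sketch of iterating over a directory's subdirectories through prv/nxt links and summing their HistoDBs into the parent's, assuming a hypothetical in-memory node layout and a 4-bucket histogram to keep it short. It relies on each subdirectory's histogram already being current, so one pass over the children suffices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 4   /* tiny histogram to keep the sketch short */

/* Hypothetical directory node: its HistoDB plus sibling links. */
struct dir {
    uint64_t histo[NBUCKETS];   /* HistoDB for the whole subtree */
    uint64_t own[NBUCKETS];     /* contributions of files directly in dir */
    struct dir *first_child;    /* head of the subdirectory list */
    struct dir *prv, *nxt;      /* doubly linked list of siblings */
};

/* Link child at the head of parent's subdirectory list. */
static void add_subdir(struct dir *parent, struct dir *child)
{
    assert(parent != child);
    child->prv = NULL;
    child->nxt = parent->first_child;
    if (parent->first_child)
        parent->first_child->prv = child;
    parent->first_child = child;
}

/* histo(dir) = own(dir) + sum of histo(subdir), walking the nxt links. */
static void refresh_histo(struct dir *d)
{
    for (int b = 0; b < NBUCKETS; b++)
        d->histo[b] = d->own[b];
    for (struct dir *c = d->first_child; c != NULL; c = c->nxt)
        for (int b = 0; b < NBUCKETS; b++)
            d->histo[b] += c->histo[b];
}
```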
Key properties
Generate initially from a scan, then update with changelogs. One can prove mathematically that
$$\mathrm{histo}(\mathrm{changelog} \circ FS_1) = \mathrm{histo\_update}(\mathrm{changelog}) + \mathrm{histo}(FS_1)$$
Additive property: histograms can be added, either by increasing a count or by adding new bars:
$$\mathrm{histo}(\mathrm{dir}) = \sum_{s \in \mathrm{subdirs}(\mathrm{dir})} \mathrm{histo}(s) + \mathrm{contributions}(\text{files in dir})$$
This is the Merkle tree property: subtrees can be grafted with simple addition.
Keep 100% consistent with snapshots.
Space consumption is on par with a policy database, with 100K histogram buckets.
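A toy check of the changelog identity, assuming a hypothetical file system that is just an array of file sizes and a hypothetical changelog whose records carry both old and new sizes. `histo_update` computes a delta from the records alone; applying the log and rescanning must give the old histogram plus that delta.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 65
#define MAXFILES 16

/* Toy file system: size of each inode, or -1 if the inode is unused. */
struct toyfs { int64_t size[MAXFILES]; };

enum op { OP_CREATE, OP_SETSIZE, OP_UNLINK };

/* Hypothetical changelog record carrying old and new sizes. */
struct rec { enum op op; int ino; int64_t old_size, new_size; };

static unsigned bucket(int64_t size)
{
    assert(size >= 0);
    unsigned b = 0;
    for (uint64_t v = (uint64_t)size; v; v >>= 1)
        b++;
    return b;
}

/* histo(FS): a full scan. */
static void histo(const struct toyfs *fs, int64_t h[NBUCKETS])
{
    memset(h, 0, NBUCKETS * sizeof h[0]);
    for (int i = 0; i < MAXFILES; i++)
        if (fs->size[i] >= 0)
            h[bucket(fs->size[i])]++;
}

/* histo_update(changelog): a delta computed from the records alone. */
static void histo_update(const struct rec *log, int n, int64_t d[NBUCKETS])
{
    memset(d, 0, NBUCKETS * sizeof d[0]);
    for (int i = 0; i < n; i++) {
        if (log[i].op != OP_CREATE)
            d[bucket(log[i].old_size)]--;
        if (log[i].op != OP_UNLINK)
            d[bucket(log[i].new_size)]++;
    }
}

/* changelog applied to FS: replay the records on the toy file system. */
static void apply(struct toyfs *fs, const struct rec *log, int n)
{
    for (int i = 0; i < n; i++) {
        assert(log[i].ino >= 0 && log[i].ino < MAXFILES);
        fs->size[log[i].ino] = log[i].op == OP_UNLINK ? -1 : log[i].new_size;
    }
}
```

The identity only holds for a consistent log, i.e. when each record's `old_size` matches the file system it is applied to.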
Inserting subtrees
[Figure: a subtree with its own HistoDB is inserted below /, /a and /a/b, each of which has a HistoDB; the HistoDBs are combined by addition (+ … =)]
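A minimal sketch of grafting, assuming hypothetical nodes with parent pointers and a 4-bucket HistoDB: attaching a subtree adds its HistoDB to that of every ancestor (here /a/b, /a and /), so the cost depends on the depth of the graft point, not on the size of the subtree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 4

/* Hypothetical node: parent pointer plus its subtree HistoDB. */
struct node {
    struct node *parent;
    int64_t histo[NBUCKETS];
};

/* Attach `sub` under `at`, then add its HistoDB to every ancestor up to
 * the root. Cost is O(depth * NBUCKETS). */
static void graft_subtree(struct node *at, struct node *sub)
{
    assert(sub->parent == NULL);
    sub->parent = at;
    for (struct node *n = at; n != NULL; n = n->parent)
        for (int b = 0; b < NBUCKETS; b++)
            n->histo[b] += sub->histo[b];
}
```

Removing a subtree is the same walk with subtraction.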
Evaluation
A single histogram lookup may provide the overview that a policy search provided
But
A histogram approach may have insufficient data for efficient general searches. Adapting histograms can be costly: how common is this?
Missing Storage APIs
Reflect on Storage Software
Since the 1980s, a utility has been added for each file system (“afs”, “bfs”, “cfs” … “zfs”); each implements a set of non-standardized features: file sets, data layout, ACLs.
ACLs and extended attributes became common on POSIX systems in the 2000s (the POSIX.1e draft covering ACLs was withdrawn and never ratified).
Storage software almost always centers around batch data operations:
• caches do this inside the OS
• utilities do this: rsync, zip
• cloud software does this: Dropbox
• containers do this: Docker
Lack of standardized APIs
Unnecessarily complicated software
Not portable, locked in to a platform
Example - data movement across many files
• Objective: store batches of files
• New concept: file-level I/O vectorization
   • Includes server-driven ordering
   • Packing small files into one object
   • Cache flushes
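Packing small files into one object can be sketched as concatenation plus an index of (offset, length) pairs, so one object store PUT carries many files. The `pack_entry` layout and `pack_files` are hypothetical, not MarFS's packed-file format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical index entry: where one small file sits inside the object. */
struct pack_entry {
    size_t offset;
    size_t length;
};

/* Pack n small files back to back into one object buffer `obj` of `cap`
 * bytes, recording each file's (offset, length) in `index`.
 * Returns the bytes used, or SIZE_MAX if the files do not fit. */
static size_t pack_files(const char *const *data, const size_t *len, size_t n,
                         char *obj, size_t cap, struct pack_entry *index)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        if (len[i] > cap - off)
            return SIZE_MAX;
        memcpy(obj + off, data[i], len[i]);
        index[i].offset = off;
        index[i].length = len[i];
        off += len[i];
        assert(off <= cap);
    }
    return off;
}
```

Reading one file back is then a ranged read of `index[i].length` bytes at `index[i].offset` within the object.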
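```c
#include <stddef.h>      /* size_t */
#include <sys/types.h>   /* off_t */

/* One source-to-destination range in a batched copy. */
struct copy_range {
    int    source_fd;
    int    dest_fd;
    off_t  source_offset;
    off_t  dest_offset;
    size_t length;
};

/* Proposed vectorized call: copy `count` ranges in one request.
 * Note: Linux already has a single-range copy_file_range(2) with a
 * different signature. */
int copy_file_range(struct copy_range *r, unsigned int count, int flags);
```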
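A user-space stand-in for the proposed vectorized call, to show its semantics. It is a sketch under stated assumptions: `copy_file_ranges_emul` is a hypothetical name (chosen to avoid clashing with Linux's existing `copy_file_range(2)`), it copies ranges serially in array order with POSIX `pread`/`pwrite`, and it does none of the server-driven reordering a real implementation could do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Same fields as the slide's struct copy_range. */
struct copy_range {
    int source_fd;
    int dest_fd;
    off_t source_offset;
    off_t dest_offset;
    size_t length;
};

/* Hypothetical emulation: returns 0 on success, -1 with errno set. */
static int copy_file_ranges_emul(const struct copy_range *r, unsigned count, int flags)
{
    char buf[1 << 16];
    assert(flags == 0);                       /* no flags defined in this sketch */
    for (unsigned i = 0; i < count; i++) {
        off_t src = r[i].source_offset, dst = r[i].dest_offset;
        size_t left = r[i].length;
        while (left > 0) {
            size_t want = left < sizeof buf ? left : sizeof buf;
            ssize_t got = pread(r[i].source_fd, buf, want, src);
            if (got < 0 && errno == EINTR) continue;
            if (got < 0) return -1;
            if (got == 0) { errno = EIO; return -1; }   /* source too short */
            for (ssize_t done = 0; done < got; ) {
                ssize_t w = pwrite(r[i].dest_fd, buf + done,
                                   (size_t)(got - done), dst + done);
                if (w < 0 && errno == EINTR) continue;
                if (w < 0) return -1;
                done += w;
            }
            src += got;
            dst += got;
            left -= (size_t)got;
        }
    }
    return 0;
}

/* Usage: duplicate the first 8 bytes of a scratch file, 4 bytes per range. */
static int copy_ranges_demo(void)
{
    char path[] = "/tmp/cfr-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                             /* file lives until close() */
    struct copy_range r[2] = {
        { fd, fd, 0, 8, 4 },                  /* "abcd" -> offset 8  */
        { fd, fd, 4, 12, 4 },                 /* "efgh" -> offset 12 */
    };
    char out[17] = {0};
    int ok = pwrite(fd, "abcdefgh", 8, 0) == 8
          && copy_file_ranges_emul(r, 2, 0) == 0
          && pread(fd, out, 16, 0) == 16
          && strcmp(out, "abcdefghabcdefgh") == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

A kernel or server implementation would receive the whole array at once, which is what allows it to reorder, pack and flush across files.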
Extending the API - alternatives
In some areas, concepts must be defined:
• data layout
• sub-sets and subtrees of file systems (very similar to “mount”)
The DB world solved this problem with SQL as a domain-specific language. A file-level data management solution could build on:
• asynchronous data and metadata APIs
• batch / transaction boundaries
• intelligent processing
Possibly a better approach than more API calls; evidence is seen in SQFSCK.
New problems will keep appearing, e.g. doing this in clusters.
Thank you
Metadata Movement
Batch metadata handling
Well studied problem, not easily productized
Several sides to the problem:
1. scale out the server side: data layout
2. bulk communication; in many cases this utilizes replay of operations
3. the tree requires linking subtrees and subsets
Conflicting demands between latency & throughput
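A single-node sketch of bulk communication by replay: the sender logs metadata operations into a batch, and the receiving side replays the whole batch, in order, under a target root with POSIX `mkdirat`/`openat`. The record format and function names are hypothetical; a real system would ship the batch over the network and handle partial failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical batched metadata record, logged on the sending side. */
enum md_op { MD_MKDIR, MD_CREATE };
struct md_rec {
    enum md_op op;
    const char *path;   /* relative to the target root */
    mode_t mode;
};

/* Receiving side: replay the batch in order under dirfd.
 * Order matters: a directory must exist before entries inside it. */
static int replay_batch(int dirfd, const struct md_rec *batch, int n)
{
    for (int i = 0; i < n; i++) {
        assert(batch[i].path[0] != '/');
        if (batch[i].op == MD_MKDIR) {
            if (mkdirat(dirfd, batch[i].path, batch[i].mode) != 0)
                return -1;
        } else {
            int fd = openat(dirfd, batch[i].path,
                            O_CREAT | O_EXCL | O_WRONLY, batch[i].mode);
            if (fd < 0)
                return -1;
            close(fd);
        }
    }
    return 0;
}

/* Usage: replay a three-operation batch into a fresh scratch directory. */
static int replay_demo(void)
{
    char root[] = "/tmp/md-replay-XXXXXX";
    if (mkdtemp(root) == NULL) return -1;
    int dirfd = open(root, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0) return -1;
    const struct md_rec batch[] = {
        { MD_MKDIR,  "run1",       0755 },
        { MD_CREATE, "run1/a.dat", 0644 },
        { MD_CREATE, "run1/b.dat", 0644 },
    };
    struct stat st;
    int ok = replay_batch(dirfd, batch, 3) == 0
          && fstatat(dirfd, "run1/b.dat", &st, 0) == 0
          && S_ISREG(st.st_mode);
    unlinkat(dirfd, "run1/b.dat", 0);         /* clean up, newest first */
    unlinkat(dirfd, "run1/a.dat", 0);
    unlinkat(dirfd, "run1", AT_REMOVEDIR);
    close(dirfd);
    rmdir(root);
    return ok ? 0 : -1;
}
```

Batching favours throughput: one message carries many operations, at the price of latency for any single operation, which is the conflict the slide names.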
Role of containers
It is fundamentally unlikely that different tiers perform data movement at similar granularity.
Containers are a must-have
Example Container Functionality
[Figure: example container functionality on a ZFS Pool. Base layer: ZFS file system, ZFS snapshot. Layer 1: ZFS clone, ZFS snapshot. Container layer: ZFS clone. Serialized differentials, an analytics differential, and container analytics. Columns: Application interface, Implementation, Slower tier interface.]
Containers as distributed namespace
Requires being able to locate the container:
• Location database: a subtree resides on a node
Performance will scale well as long as containers can be large enough:
• Fragmented vs. co-located metadata
• Local node performance × #nodes
Related to STT trees, but not identical. CMU published a series of papers on this.
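A minimal sketch of a location database lookup, assuming a hypothetical table of (subtree, node) entries: a path is served by the node whose subtree is the longest prefix of the path on a component boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical location database entry: a subtree resides on a node. */
struct location {
    const char *subtree;
    int node;
};

/* Return the node holding `path`: the entry with the longest subtree
 * prefix matching on a path-component boundary, or -1 if none. */
static int locate(const struct location *db, size_t n, const char *path)
{
    int node = -1;
    size_t best = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(db[i].subtree);
        assert(len > 0);
        if (strncmp(path, db[i].subtree, len) != 0)
            continue;
        /* "/a" matches "/a" and "/a/x" but not "/ab". */
        if (path[len] != '\0' && path[len] != '/' && db[i].subtree[len - 1] != '/')
            continue;
        if (len > best) {
            best = len;
            node = db[i].node;
        }
    }
    return node;
}
```

A linear scan is enough for a sketch; a larger location database would more likely use a trie or a sorted table, and could be cached on every client since it changes only when containers move.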
Other approaches / key unsolved problems
Other approaches:
• Peer-to-peer metadata protocols: LANL scaled them to 1B file creates/sec (in an experiment)
• Allow conflicts
Distributed namespace consistency:
• An “epoch” approach tracking dependent updates should work
• There is little understanding of fragmented vs. contiguous MD
Conclusions
Campaign Storage: bulk data store, archive – focus on data movement
Massive data handling at the file level is important: Amazon introduced S3FS; Dropbox and Gdrive rule
Search and batch metadata movement are key ingredients
Richer APIs or a DSL could create a better ecosystem
campaignstorage.com
Thank you