enabling rapid and secure metadata search across storage tiers … · 2019. 12. 21. · 2019...

55
2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 1 Enabling Rapid and Secure Metadata Search Across Storage Tiers with GUFI Dominic Manno Los Alamos National Laboratory LA-UR-19-29542

Upload: others

Post on 08-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 1

Enabling Rapid and Secure Metadata Search Across Storage Tiers with GUFI

Dominic MannoLos Alamos National Laboratory

LA-UR-19-29542

Page 2: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 2

Acknowledgments

§ Some slides and content/diagrams provided by LANL colleagues: § David Bonnie, Gary Grider, Jason Lee, Brad Settlemyer

Page 3: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 3

Overview

§ HPC at LANL§ GUFI Description§ Deployment Details§ Road to Performance§ Next Steps

Page 4: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 4

LANL HPC Environment

Page 5: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 5

HPC at LANL

§ Eight decades of weapons computing support to keep the nation safe§ Simulation to determine stability, defects, etc.

§ Cutting edge technology enables large, long-running, multi physics 3D simulations§ Jobs can last months running on 80% of the machine

Page 6: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 6

“Better” Science calls for Better Computers

Roadrunner (2007)1st Petaflop/Accelerator Platform

Cielo (2011)1.7 Petaflop Platform

Trinity (2015)~20 Petaflops, 4 PB Burst Buffer

Page 7: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 7

Storage Tiers

Page 8: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 8

Storage Tiers20.158 PF/smeasured

Page 9: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 9

Scratch – lustre (mostly)

Page 10: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 10

Campaign Storage60PB

Page 11: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 11

Archive

Page 12: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 12

Home and Project Space

Page 13: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 13

Metadata Problem

§ This model depends on users knowing about their data§ Where did it get written?§ Does it need to be backed up? If so, did I already save a

copy?§ Good naming and hierarchy

§ Without explicit management the archive would collect far too much data

§ Need to provide better tools

Page 14: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 14

GUFI Overview

Page 15: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 15

Early Design Discussions

§ Index that could cover all storage systems§ Share index – admins and users, securely!§ Reasonable update times, minimizing impact

to source file systems§ Parallel§ Xattrs§ Leverage existing tech

Page 16: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 16

GUFI Design

§ Re-create source FS tree§ Maintain ownership and permissions on the newly

created tree§ Secure – we already depend on these

permissions on the source§ Use embedded DB in every dir

§ sqlite§ This is where all file information goes

§ Threads!

Page 17: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 17

GUFI Design

Page 18: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 18

GUFI Design

Page 19: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 19

Alternative Approaches

§ Flatten the namespace§ Rename on high in the tree is costly§ Implementing security for users and admins to

share is hard and likely a performance hit§ Why not just write MPI or MPI libcircle jobs to do

this?§ Resources§ Users like find | grep and ls --with-color

§ Storing index in source trees

Page 20: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 20

Development

Page 21: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 21

GUFI_* tools

§ Very powerful queries enabled with base gufi code§ Simple user interface required for those “easier”

more frequent queries§ Try to emulate popular MD tools: ls, find, stat, du, etc

§ Don’t require ton of new code, just basic wrappers

Page 22: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 22

Ingest Tools

§ POSIX tree-walk ingest done§ FS specific optimizations exist – use when

applicable§ Incremental update capability could provide speedup

Page 23: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 23

Hardening

§ Bug fixes§ Stable releases and build env§ Travis – auto builds for RedHat, SUSE, macOS

Page 24: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 24

Reports and Web Interface

*images from qdirstat – windirstatlinux variant

§ Represent commonly used queries as reports§ Provide ability to find “hot-spots” in tree§ Enable intuitive visuals of user data

Page 25: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 25

Reports and Web Interface

Page 26: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 26

Reports and Web Interface

Page 27: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 27

User Interaction Phase I

§ Web – user built and canned reports§ Deploy to allow ease of use for users§ Encourages more use and demonstrates power of

index§ Allow remote queries

Page 28: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 28

Phase I Model

Cluster1 frontend

Cluster2 frontend

Cluster3 frontend

hpc-net

LANL local net

GUFI Search Box

Web-users

Web-admins

gufi_tools

User desktop

ssh

ssh

webssh

Page 29: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 29

Performance Path

Page 30: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 30

Test Setup

§ Single server, Dell R7425§ CPU: AMD Epyc 7401§ Memory: 512 GB§ Kernel 3.10§ Using NVMe SSDs – reported results are only using

1 SSD (unless otherwise specified)§ XFS filesystem

Page 31: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 31

Earliest Performance from Real Trees

§ OK – not what we expected§ Best case ~25x over POSIX tools§ Worst case only ~4x

Page 32: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 32

Opening DBs

Page 33: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 33

Opening DBs

Greater than 10x

Page 34: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 34

Tuning SQLite

§ Sqlite3 has protections § No need for multiple threads to ever access the

same DB at the same time§ VFS: unix-none§ Thread-safe = 0

Page 35: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 35

Opening DBs -- Improved

Page 36: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 36

Improved Query Results

Find all files in NFS Home as uid 12345

Find all files in NFS Home

POSIX GUFI POSIX GUFI

Files 294,188 294,188 13,360,753 13,229,405

Dirs 13,012 13,012 1,633,564 1,622,424

Time 32.1 0.47 2,040 39.2

Files/sec 9,164 625,931 6,549 337,484

Page 37: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 37

Improved Query Results

Find all files in NFS Home as uid 12345

Find all files in NFS Home

POSIX GUFI POSIX GUFI

Files 294,188 294,188 13,360,753 13,229,405

Dirs 13,012 13,012 1,633,564 1,622,424

Time 32.1 0.47 2,040 39.2

Files/sec 9,164 625,931 6,549 337,484

Page 38: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 38

Improved Query Results

Find all files in NFS Home as uid 12345

Find all files in NFS Home

POSIX GUFI POSIX GUFI

Files 294,188 294,188 13,360,753 13,229,405

Dirs 13,012 13,012 1,633,564 1,622,424

Time 32.1 0.47 2,040 39.2

Files/sec 9,164 625,931 6,549 337,484

68x

Page 39: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 39

Improved Query Results

Find all files in NFS Home as uid 12345

Find all files in NFS Home

POSIX GUFI POSIX GUFI

Files 294,188 294,188 13,360,753 13,229,405

Dirs 13,012 13,012 1,633,564 1,622,424

Time 32.1 0.47 2,040 39.2

Files/sec 9,164 625,931 6,549 337,484

51x68x

Page 40: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 40

Improved Query Results

Find all files in scratch1 as uid 67890

Find all files in lustrescratch1

Find all files in scratch1 and NFS home as uid 67890

POSIX GUFI POSIX GUFI POSIX GUFI

Files 22,771,329 22,509,652 119,296,067 118,509,899 - 22,522,140

Dirs 240,736 237,759 5,541,230 5,523,153 - 239,603

Time (s) 531.6 14.5 11,309 134.2 - 14.9

Files/s 42,835 1,553,956 10,548 883,413 - 1,511,553

Page 41: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 41

Improved Query Results

Find all files in scratch1 as uid 67890

Find all files in lustrescratch1

Find all files in scratch1 and NFS home as uid 67890

POSIX GUFI POSIX GUFI POSIX GUFI

Files 22,771,329 22,509,652 119,296,067 118,509,899 - 22,522,140

Dirs 240,736 237,759 5,541,230 5,523,153 - 239,603

Time (s)

531.6 14.5 11,309 134.2 - 14.9

Files/s 42,835 1,553,956 10,548 883,413 - 1,511,553

36x

Page 42: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 42

Improved Query Results

Find all files in scratch1 as uid 67890

Find all files in lustrescratch1

Find all files in scratch1 and NFS home as uid 67890

POSIX GUFI POSIX GUFI POSIX GUFI

Files 22,771,329 22,509,652 119,296,067 118,509,899 - 22,522,140

Dirs 240,736 237,759 5,541,230 5,523,153 - 239,603

Time (s)

531.6 14.5 11,309 134.2 - 14.9

Files/s 42,835 1,553,956 10,548 883,413 - 1,511,553

84x36x

Page 43: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 43

Improved Query Results

Find all files in scratch1 as uid 67890

Find all files in lustrescratch1

Find all files in scratch1 and NFS home as uid 67890

POSIX GUFI POSIX GUFI POSIX GUFI

Files 22,771,329 22,509,652 119,296,067 118,509,899 - 22,522,140

Dirs 240,736 237,759 5,541,230 5,523,153 - 239,603

Time (s)

531.6 14.5 11,309 134.2 - 14.9

Files/s 42,835 1,553,956 10,548 883,413 - 1,511,553

84x36x

Page 44: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 44

Index Creation Results

§ 13,229,405 files from NFS home filesystem into GUFI tree: 38.4s -- 344,515 files/s

§ 118,509,899 files from lustre filesystem into GUFI tree: 148.9s -- 795,902 files/s

§ 154,045,072 files from HPSS archive into GUFI tree: ~175 sec -- 880,257 files/s

Page 45: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 45

jemalloc ImprovementsFind all files in NFS Home

as uid 12345Find all files in NFS Home

POSIX GUFI POSIX GUFI

Files 294,188 294,188 13,360,753 13,229,405

Dirs 13,012 13,012 1,633,564 1,622,424

Time 32.1 0.470.48

2,040 39.222.9

Files/sec 9,164 625,931612,891

6,549 337,484576,947

51x ~89x68x ~68x

Page 46: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 46

jemalloc improvementsFind all files in scratch1

as uid 67890Find all files in lustre

scratch1

Find all files in scratch1 and NFS home as uid 67890

POSIX GUFI POSIX GUFI POSIX GUFI

Files 22,771,329 22,509,652 119,296,067 118,509,899 - 22,522,140

Dirs 240,736 237,759 5,541,230 5,523,153 - 239,603

Time (s)

531.6 14.511.5

11,309 134.2110.7

- 14.911.5

Files/s 42,835 1,553,9561,957,361

10,548 883,4131,070,550

- 1,511,5531,958,446

84x 101x36x 46x

Page 47: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 47

Rollup Optimization§ Find “like” permissions in sub-tree, if allowed, place all

information into single DB higher up in tree. (post-process of created gufi tree)

§ Rollup generalization on NFS home tree:§ Scripts for analysis available. § At Level 1: 1197 rollups possible (2004 dirs, ~60%)

§ Expected speedup of 5-10x § More complex? updates

Page 48: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 48

Rollup Optimization

§ NFS was 68x over normal tools, so expect over 100x, likely much more

§ Lustre was 46x over normal tools, so expect over 80x, likely much more

§ Those are for users, admins queries can be flattened further

(extrapolations from early test runs on dev box)

Page 49: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 49

Next Steps in Performance§ Sharding

§ Likely to see speedup, but need to incorporate into code

§ FS comparison§ User-space?

§ Roll-up integration – how far?§ More sqlite_open optimizations

Page 50: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 50

Next Steps in Performance§ Characterization of cost related to different

categories of operations§ Equations to analyze impact of things like

rollup level choice and other optimizations

Page 51: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 51

Questions?

§ Thank you!§ Contribute, test, file bugs§ Collaborate!

Page 52: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 52

Links§ https://github.com/mar-file-system/GUFI§ Other on-going work and research can be found via Ultrascale

Systems Research Center webpage: https://usrc.lanl.gov/§ Join us in our effort to obtain higher efficiency with the Efficient

Mission Centric Computing Consortium: https://usrc.lanl.gov/emc3.php

Page 53: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 53

Rollup RulesNon exhaustive relatively simple rules to see if rollup is legal:

• If any of my children are not rolled up then NO• When me and all my descendants are o+rx YES• When me and all my descendants are ugo same

and same uid, gid then YES• When me and all my descendants are ug are same

and same uid and gid and top o-rx then YES• When me and all my descendants u same and top go-rx

and same uid then YES• Else NO

Page 54: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 54

Rollup Step 1

d11 d12 d13

d111 d112 d121

d1111

f11.2

f11.1

f13.2

f111.1

f12.1

f13.1

f121.2

f112.1

f121.1

f1111.1

d1L0

L1

L2

L3

d2d0

Parallelize across subtrees

Dirsd1d11d12d13d111d112d121d1111

Filesf11.1f11.2f12.1f13.1f13.2f111.1f112.1f121.1f121.2f1111.1

Walk down and record file/link info and directory info (name/path, uid, gid, mode, level, etc.)

Page 55: Enabling Rapid and Secure Metadata Search Across Storage Tiers … · 2019. 12. 21. · 2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved

2019 Storage Developer Conference. © Los Alamos National Laboratory. All Rights Reserved. 55

Rollup Step 2DirsTop,uid,gid,mode,levelD1,uid,gid,mode,levelD11,….D12,….D13,…D111,….D112,….D121,….D1111,….

d11NO

D12 YES

D13 YES

d111YES

d112YES

d121YES

D1111 YES

D1 NO

L0

L1

L2

L3

d2d0

Parallelize across subtrees

ProcessFrom L3-4 to L0-1 (decr by 1)

Process level by level up the treeDetermine if each dir at the level can

be marked as ”rollable”

Rollup where child is YES and parent is NOD111, D112, D12, D13NOT D1111, D121 (as parent is YES)Not D1, D11 (as child is NO)Totals: total of 8 but 6 rolled into 4 so Dirs to visit went from 8 to 6