hopsfs & epipe - github pages · 2018-10-03 · hopsfs & epipe 1 distributed computing and...

Mahmoud Ismail, KTH. HopsFS & ePipe. Distributed Computing and Analytics Workshop, September 26th 2018


Post on 21-May-2020


TRANSCRIPT

Page 1: HopsFS & ePipe - GitHub Pages · 2018-10-03 · HopsFS & ePipe 1 Distributed Computing and Analytics Workshop, September 26th 2018. 2. 2. ... File7 Metadata File8 Metadata Distributed

HopsFS & ePipe

Mahmoud Ismail, KTH

Distributed Computing and Analytics Workshop, September 26th 2018

Problem: a data layer to store millions of these images and their annotations

At Scale

[Figure: the Open Images Dataset repeated many times over, followed by Dataset X]

Requirements

• Reading/writing millions of images with high throughput

• Attaching annotations to each image, and then searching using these annotations

HDFS

Hadoop Software Stack

HDFS Architecture

Components: HDFS Client, NameNode, DataNodes

HDFS Client (File1) to NameNode: Where can I save the file?

NameNode to HDFS Client: DataNodes Addresses

NameNode: File System Metadata, File Blocks Mappings

File1 Blk1 → DN1, Blk2 → DN5, Blk3 → DN3
File2 Blk1 → DN1, Blk2 → DN4
File3 Blk1 → DN1, Blk2 → DN2, Blk3 → DN3
File4 Blk1 → DN100
File5 Blk1 → DN4, Blk2 → DN2, Blk3 → DN9
… … … …
FileN Blk1 → DN2, Blk2 → DN8
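The exchange on these slides can be sketched in a few lines of Python. This is a toy model, not HDFS code: the class `ToyNameNode`, its method names, and the round-robin placement are assumptions made for illustration. Only the idea that the NameNode keeps a file-to-block-to-DataNode mapping and answers the client with DataNode addresses comes from the slides.

```python
from itertools import cycle


class ToyNameNode:
    """Hypothetical in-memory NameNode: maps each file to its blocks' DataNodes."""

    def __init__(self, datanodes):
        self._next_dn = cycle(datanodes)   # round-robin placement (an assumption)
        self.block_map = {}                # file name -> list of (block, datanode)

    def allocate(self, filename, num_blocks):
        """Answer 'Where can I save the file?' with one DataNode per block."""
        placement = [(f"Blk{i}", next(self._next_dn))
                     for i in range(1, num_blocks + 1)]
        self.block_map[filename] = placement
        return placement

    def locate(self, filename):
        """Return the stored block-to-DataNode mapping for a file."""
        return self.block_map[filename]


namenode = ToyNameNode(["DN1", "DN2", "DN3"])
placement = namenode.allocate("File1", 3)
print(placement)  # [('Blk1', 'DN1'), ('Blk2', 'DN2'), ('Blk3', 'DN3')]
```

Every file adds an entry to `block_map`, so in this toy model the whole namespace has to fit in one process, which is the pressure point the following slides turn to.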

HDFS Performance at Scale

[Figure: NameNode and DataNode]

HDFS Limitations

• Namespace size upper bound: ~500 million files

• At most 70-80 thousand file system operations / sec

HopsFS

HopsFS Architecture

NameNode: File System Metadata, File Blocks Mappings

File1 Blk1 → DN1, Blk2 → DN4, Blk3 → DN5
File2 Blk1 → DN1, Blk2 → DN4
File3 Blk1 → DN1, Blk2 → DN2, Blk3 → DN3
File4 Blk1 → DN100
File5 Blk1 → DN4, Blk2 → DN2, Blk3 → DN9
FileN Blk1 → DN2, Blk2 → DN8

NameNode

Distributed Database (four groups shown):

• File Blocks Mappings: File1 Metadata, File2 Metadata

• File Blocks Mappings: File3 Metadata, File4 Metadata

• File Blocks Mappings: File5 Metadata, File6 Metadata

• File Blocks Mappings: File7 Metadata, File8 Metadata
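A rough sketch of the idea on this slide: file metadata lives in a partitioned database instead of in one NameNode's memory. The partition count, the hash-by-path scheme, and every name below are illustrative assumptions; the slides do not say how HopsFS assigns metadata to partitions.

```python
import zlib

NUM_PARTITIONS = 4  # the slide draws four groups; the real count is not stated


def partition_for(path: str) -> int:
    """Pick a partition by hashing the path (a hypothetical scheme)."""
    return zlib.crc32(path.encode("utf-8")) % NUM_PARTITIONS


class ToyMetadataStore:
    """Stand-in for the distributed database: one dict per partition."""

    def __init__(self):
        self.partitions = [{} for _ in range(NUM_PARTITIONS)]

    def put(self, path, metadata):
        self.partitions[partition_for(path)][path] = metadata

    def get(self, path):
        return self.partitions[partition_for(path)].get(path)


store = ToyMetadataStore()
for i in range(1, 9):
    store.put(f"/File{i}", {"blocks": [f"Blk1 -> DN{i}"]})

print(store.get("/File3"))
print([len(p) for p in store.partitions])  # how the eight files spread out
```

Because any process can compute `partition_for`, a lookup needs no state held in the NameNode itself, which is the property this sketch is meant to show.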

HopsFS Scalability

• 16X-37X the throughput of HDFS

• 37 times more files than HDFS

• 10 times lower latency

Hops: Scale Challenge Winner (2017)

Integration with NVMe

https://cloud.google.com/compute/docs/disks/performance

HDFS (and S3) are designed around large blocks (optimized to overcome slow random I/O on disks), while new NVMe hardware supports fast random disk I/O (and potentially small block sizes)

Small files

[Figure a: File Size Distribution. CDF against file size, 1 KB to 128 GB, for Yahoo HDFS, Spotify HDFS, and LC HopsFS]

At Yahoo! and Spotify, ≈20% of the files are smaller than 4 KB. In Logical Clocks' HopsFS cluster, ≈68% of the files are smaller than 4 KB.

[Figure b: File Operations Distribution. CDF against file size, 1 KB to 128 GB, for Spotify HDFS and LC HopsFS]

At Spotify and Logical Clocks, ≈42% and ≈18% of all file system operations, respectively, are performed on files smaller than 16 KB.
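The percentages quoted for these plots are points on an empirical CDF. A minimal sketch of that calculation follows; the file sizes are made-up sample values, not data from Yahoo!, Spotify, or Logical Clocks, and the function name is hypothetical.

```python
def fraction_below(sizes_bytes, threshold_bytes):
    """Empirical CDF at a threshold: share of files strictly smaller than it."""
    if not sizes_bytes:
        raise ValueError("need at least one file size")
    return sum(1 for s in sizes_bytes if s < threshold_bytes) / len(sizes_bytes)


KB = 1024
# Made-up sample: five file sizes in bytes (the last one is 64 MB).
sizes = [512, 2 * KB, 3 * KB, 10 * KB, 64 * KB * KB]

print(fraction_below(sizes, 4 * KB))   # 0.6
print(fraction_below(sizes, 16 * KB))  # 0.8
```

The operations plot is the same calculation applied to a list with one entry per file system operation, each entry being the size of the file that was touched.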

Size Matters

Small Files performance in HopsFS

Open Images dataset

83.5% of the files in the dataset are ⩽ 64 KB.

Open Images Dataset

Requirements

• Reading/writing millions of images with high throughput

• Attaching annotations to each image, and then searching using these annotations

Attaching Extended Metadata

attach /images/1.jpeg '1 cat and 1 guitar'

Foreign key

Free text search?
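The `attach` command and the "Foreign key" label suggest annotations kept in a table that references the file's metadata row. Below is a minimal sketch of that layout using Python's built-in `sqlite3`. The table and column names are assumptions, and SQLite only stands in for whatever database actually holds the metadata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE inodes (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL)")
conn.execute(
    "CREATE TABLE annotations ("
    " inode_id INTEGER NOT NULL REFERENCES inodes(id) ON DELETE CASCADE,"
    " body TEXT NOT NULL)"
)


def attach(path, body):
    """Attach free-text metadata to an existing file."""
    row = conn.execute("SELECT id FROM inodes WHERE path = ?", (path,)).fetchone()
    if row is None:
        raise FileNotFoundError(path)
    conn.execute("INSERT INTO annotations VALUES (?, ?)", (row[0], body))


def search(word):
    """Substring match; a leading-wildcard LIKE cannot use an ordinary index."""
    rows = conn.execute(
        "SELECT i.path FROM annotations a JOIN inodes i ON i.id = a.inode_id"
        " WHERE a.body LIKE ? ORDER BY i.path",
        (f"%{word}%",),
    )
    return [r[0] for r in rows]


conn.execute("INSERT INTO inodes (path) VALUES ('/images/1.jpeg')")
attach("/images/1.jpeg", "1 cat and 1 guitar")
print(search("guitar"))  # ['/images/1.jpeg']

# Deleting the file removes its annotations through the foreign key.
conn.execute("DELETE FROM inodes WHERE path = '/images/1.jpeg'")
print(search("guitar"))  # []
```

The foreign key keeps annotations consistent with the files they describe, but `search` has to scan the annotation text, which is one way to read the slide's "Free text search?" question.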

HopsFS | ElasticSearch

HopsFS: 1.jpeg, annotated "1 Dog", then "1 Cat and 1 Guitar"

ElasticSearch: dog: [1.jpeg, …]

Get all images that have a dog

?

Store X
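One way to read the "?" on this slide: if the file system and the search index are updated separately, the index can keep answering from an old annotation. The sketch below is a plain-Python stand-in (two dictionaries and hypothetical function names), not the HopsFS or ElasticSearch API.

```python
from collections import defaultdict

annotations = {}                   # stand-in for HopsFS: file -> annotation
inverted_index = defaultdict(set)  # stand-in for the search index: term -> files


def index_file(name):
    """Add every lowercase term of the file's current annotation to the index."""
    for term in annotations[name].lower().split():
        inverted_index[term].add(name)


def search(term):
    """Return the files the index currently lists for a term."""
    return sorted(inverted_index.get(term.lower(), ()))


annotations["1.jpeg"] = "1 Dog"
index_file("1.jpeg")
print(search("dog"))  # ['1.jpeg']

# The annotation changes in the file system, but the index is not told.
annotations["1.jpeg"] = "1 Cat and 1 Guitar"
print(search("dog"))  # ['1.jpeg'], a stale answer
print(search("cat"))  # [], the new annotation is not searchable yet
```

Each additional store kept in sync by hand repeats the same risk, which is where "Store X" fits in this reading.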

ePipe

HopsFS → NDB: Log fs changes

NDB → ePipe: ChangeStream

ePipe → ElasticSearch, Store X, Store Y, App A, App B
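The diagram can be read as change data capture with fan-out. A minimal sketch under that reading: a list stands in for the log of file system changes, and each downstream store or app is a callback. None of these names come from ePipe itself.

```python
class ToyChangeStream:
    """Hypothetical pipe: delivers logged changes, in log order, to every sink."""

    def __init__(self):
        self.log = []     # stand-in for the changes logged in the database
        self.sinks = []   # stand-ins for ElasticSearch, Store X, App A, ...
        self._cursor = 0  # position of the next undelivered change

    def subscribe(self, sink):
        self.sinks.append(sink)

    def record(self, change):
        self.log.append(change)

    def flush(self):
        """Deliver every change logged since the last flush; return the count."""
        pending = self.log[self._cursor:]
        for change in pending:
            for sink in self.sinks:
                sink(change)
        self._cursor = len(self.log)
        return len(pending)


search_index, store_x = [], []
pipe = ToyChangeStream()
pipe.subscribe(search_index.append)
pipe.subscribe(store_x.append)

pipe.record(("create", "/images/1.jpeg"))
pipe.record(("attach", "/images/1.jpeg", "1 cat and 1 guitar"))
print(pipe.flush())             # 2
print(search_index == store_x)  # True
```

The writer only appends to the log; adding another store or app means one more `subscribe` call rather than another write path in the file system.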

ePipe

HopsFS operations: Create f1, Append f1, Create f2, Delete f1, Delete f2

NDB → ePipe:

Epoch1: Create f1, Append f1 (Order?)
Epoch2: Create f2, Delete f1 (Order?)
Epoch3: Delete f2

NDB Ordering Properties

• Property 1: Epochs are totally ordered.

• Property 2: Changes within the same transaction happen in the same epoch.

• Property 3: Changes on files are ordered only if they are in different epochs; that is, no ordering is guaranteed within the same epoch.
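Property 3 is the one that hurts a consumer. The sketch below, built around a made-up replay function, shows why two changes to the same file that land in one epoch (Create f1 and Append f1 in the ePipe example) cannot be applied in arbitrary order.

```python
def apply_changes(changes):
    """Replay (op, file) changes against a toy index; return the final state."""
    files = {}
    for op, name in changes:
        if op == "create":
            files[name] = 0          # number of appends seen so far
        elif op == "append":
            if name not in files:
                raise ValueError(f"append to {name} before it was created")
            files[name] += 1
        elif op == "delete":
            files.pop(name, None)
    return files


epoch1 = [("create", "f1"), ("append", "f1")]

print(apply_changes(epoch1))  # {'f1': 1}

try:
    apply_changes(list(reversed(epoch1)))  # same epoch, other delivery order
except ValueError as err:
    print("replay failed:", err)  # replay failed: append to f1 before it was created
```

Changes to different files in one epoch do not have this problem in the toy model, since each file's state is independent of the others.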


Adding version numbers

HopsFS operations, each with a version number: (Create f1, 1), (Append f1, 2), (Create f2, 1), (Delete f1, 3), (Delete f2, 2)

NDB → ePipe:

Epoch1: (Create f1, 1), (Append f1, 2) (Order?)
Epoch2: (Create f2, 1), (Delete f1, 3) (Order?)
Epoch3: (Delete f2, 2)

ePipe Ordering Properties

• Properties 4 & 5: Version numbers ensure serializability of changes on the same file/directory within epochs.

• Property 6: The order of changes for different files/directories within the same epoch doesn't matter.
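A minimal sketch of how a consumer could combine epochs with per-file version numbers. The tuple layout and function names are assumptions; the operations and version values follow one reading of the "Adding version numbers" slide.

```python
from collections import defaultdict

# (epoch, operation, file, version) in a scrambled arrival order.
arrived = [
    (2, "delete", "f1", 3),
    (1, "append", "f1", 2),
    (3, "delete", "f2", 2),
    (1, "create", "f1", 1),
    (2, "create", "f2", 1),
]


def replay_order(changes):
    """Order by epoch, then by version number within the epoch."""
    return sorted(changes, key=lambda c: (c[0], c[3]))


def per_file_history(changes):
    """Group the replay order into one (operation, version) list per file."""
    history = defaultdict(list)
    for _epoch, op, name, version in replay_order(changes):
        history[name].append((op, version))
    return dict(history)


print(per_file_history(arrived))
# {'f1': [('create', 1), ('append', 2), ('delete', 3)],
#  'f2': [('create', 1), ('delete', 2)]}
```

Sorting by version inside an epoch fixes the order for each file. How changes to different files interleave within the epoch is left to the sort, which is acceptable under Property 6.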


ePipe

• Low replication lag (~100 ms)

• High throughput

Requirements

• Reading/writing millions of images with high throughput

• Attaching annotations to each image, and then searching using these annotations

Questions?