Byzantine fault tolerant cloud storage for storing sensor data

Post on 05-Dec-2014

DESCRIPTION

My presentation for attaining my Master's Degree in Computing Science. Thesis links: http://irs.ub.rug.nl/dbi/522047fa91559 http://www.cs.rug.nl/~aiellom/tesi/vdtil.pdf

TRANSCRIPT

Byzantine fault tolerant Cloud Storage for storing sensor data

Jos van der Til

WHAT WAS THAT FIRST SLIDE?

I KNOW SOME OF THESE WORDS!

SENSOR DATA

VARIES BY MEASUREMENT INTERVAL

VARIES BY DIMENSIONALITY

VARIES BY SIZE

IMPORTANT: IMAGES AND VIDEO ARE ALSO SENSOR DATA!

I KNOW SOME OF THESE WORDS!

CLOUD STORAGE, SENSOR DATA

UNLIMITED STORAGE

ACCESSIBLE FROM ANYWHERE

ACCESSIBLE ANYTIME

PAY FOR WHAT YOU USE!

I KNOW SOME OF THESE WORDS!

FAULT TOLERANT, CLOUD STORAGE, SENSOR DATA

PROCESSES ONLY CRASH… RIGHT?

HOW BAD CAN IT GET?

BYZANTINE

HOW DO PROCESSES FAIL?

Fail stop

Crash

Send Omission

Receive Omission

General Omission

Arbitrary failures with message authentication

Arbitrary (Byzantine) failures

[Architecture diagram. Labels: Sensor Network, Measurements, Sensor server with Storage Lib (Writer), Storage clouds, Processing server with Storage Lib (Reader), Measurements.]

HOW DO PROCESSES FAIL?

READERS ARE PROCESSES

WRITERS ARE PROCESSES

but they are cool.

can fail without causing damage.

are only expected to fail by crashing.

HOW DO PROCESSES FAIL?

CLOUD PROVIDERS ARE PROCESSES

but they are NOT cool.

can leak your data

can corrupt your data

can delete your data

can stop responding to your requests

HAVE FULL CONTROL OVER YOUR DATA BUT CAN BEHAVE BYZANTINE

YOUR DATA IS STORED IN A PROCESS THAT CAN FAIL BYZANTINE

HOW TO ACHIEVE BYZANTINE FAULT TOLERANCE?

DO NOT TRUST A SINGLE CLOUD!

DO TRUST MULTIPLE CLOUDS!

UPLOAD DATA TO ALL THE CLOUDS!

HOW MANY CLOUDS DO WE NEED?

𝑛 ≥ 3𝑓 + 1
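
As a quick check of the bound above, the following sketch computes the smallest number of clouds needed to tolerate f Byzantine providers, and the number of faulty providers a given number of clouds can tolerate. The function names are illustrative only and are not taken from the thesis.

```python
def min_clouds(f: int) -> int:
    """Smallest n that satisfies n >= 3f + 1."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faults(n: int) -> int:
    """Largest f for which n >= 3f + 1 still holds."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


if __name__ == "__main__":
    for f in range(1, 4):
        print(f"f = {f}: at least {min_clouds(f)} clouds")
```

Note that going from 4 to 5 or 6 clouds does not raise the number of tolerated faults; the next step up is at 7 clouds.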

HOW IS DATA STORED?

DATA IS STORED IN A QUORUM OF CLOUD PROVIDERS

QUORUM OF 2f+1 PROVIDERS TO BE EXACT

DATA CANNOT BE RETRIEVED BY FEWER THAN f+1 PROVIDERS

ENCRYPTION

SECRET SHARING
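
To illustrate how a key can be split so that fewer than f+1 providers learn nothing about it, here is a minimal sketch of threshold secret sharing. It assumes Shamir's scheme over a prime field; the slides only say "secret sharing", so the scheme, the field size, and the function names are assumptions made for illustration.

```python
from __future__ import annotations

import secrets

# Assumed field: the Mersenne prime 2**521 - 1, large enough for a 128- or 256-bit key.
PRIME = 2**521 - 1


def split_secret(secret: int, f: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares such that any f + 1 shares reconstruct it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for the field")
    if n < f + 1:
        raise ValueError("need at least f + 1 shares")
    # Random polynomial of degree f whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(f)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0; needs at least f + 1 distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = secrets.randbits(128)
    shares = split_secret(key, f=1, n=4)   # one share per cloud
    print(recover_secret(shares[:2]) == key)  # any f + 1 = 2 shares suffice
```

With f = 1 and n = 4, each cloud would hold one share: any two clouds together can rebuild the key, while a single cloud holds one point on a random line and cannot.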

SHOULD NOT REQUIRE n TIMES THE STORAGE SPACE

ERASURE CODING

SPACE EFFICIENCY

$$\left(\frac{n}{f+1}\right)^{-1} = \frac{f+1}{n}$$

MAXIMUM SPACE EFFICIENCY

$$\lim_{f \to \infty} \frac{f+1}{n} = \lim_{f \to \infty} \frac{f+1}{3f+1} = \frac{1}{3}$$

NOT BAD COMPARED TO

$$\lim_{f \to \infty} \frac{1}{n} = 0$$
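
The space-efficiency figures can be checked numerically. The sketch below assumes the minimal configuration n = 3f + 1 and compares erasure coding, where any f + 1 fragments rebuild the data, against storing a full copy on every cloud. The function names are illustrative.

```python
from fractions import Fraction


def erasure_efficiency(f: int) -> Fraction:
    """Useful data divided by stored data when any f + 1 of n = 3f + 1 fragments suffice."""
    n = 3 * f + 1
    return Fraction(f + 1, n)


def replication_efficiency(f: int) -> Fraction:
    """Useful data divided by stored data when every one of n = 3f + 1 clouds holds a full copy."""
    n = 3 * f + 1
    return Fraction(1, n)


if __name__ == "__main__":
    for f in (1, 2, 10, 1000):
        print(f, erasure_efficiency(f), replication_efficiency(f))
```

For f = 1 the erasure-coded efficiency is 1/2 against 1/4 for full replication; as f grows the first approaches 1/3 while the second approaches 0.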

TRADITIONAL APPROACH

DATA IS A BLOCK!

I WANT TO READ THIS BLOCK

I WANT TO ENCRYPT THIS BLOCK

I WANT TO HASH THIS BLOCK

I WANT TO UPLOAD THIS BLOCK

I WANT TO DOWNLOAD THIS BLOCK

BLOCK DOES NOT FIT IN MEMORY

:(

NOW WHAT?

MY APPROACH

DATA IS A STREAM… of blocks!

Every block should fit into memory

Every block is processed independently

Every block has a checksum (think BitTorrent)
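
The stream-of-blocks idea can be sketched as a generator that reads one block at a time and attaches a checksum to each. The block size and the use of SHA-256 are assumptions for this sketch; the slides do not state which values or hash function the implementation uses.

```python
import hashlib
import io
from typing import BinaryIO, Iterator, Tuple

BLOCK_SIZE = 4 * 1024 * 1024  # assumed block size, chosen so a block fits in memory


def iter_blocks(stream: BinaryIO, block_size: int = BLOCK_SIZE) -> Iterator[Tuple[int, bytes, str]]:
    """Yield (index, block, SHA-256 hex digest) without loading the whole stream.

    Assumes a buffered stream, where read() only returns a short block at end of data.
    """
    index = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break  # end of stream
        yield index, block, hashlib.sha256(block).hexdigest()
        index += 1


if __name__ == "__main__":
    data = io.BytesIO(b"sensor reading " * 1000)
    for index, block, digest in iter_blocks(data, block_size=4096):
        print(index, len(block), digest[:16])
```

Only one block is held in memory at a time, and each block can be encrypted, hashed, and uploaded on its own.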

MY APPROACH

HAS MORE OVERHEAD IN MEMORY AND STORAGE

REQUIRES LESS MEMORY FOR PROCESSING

CAN FAIL FASTER, SAVING BANDWIDTH
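
The "fail faster" point can be illustrated with a download loop that verifies each block as it arrives and stops at the first mismatch, so later blocks are never fetched. This is a sketch under assumptions: SHA-256 as the checksum, and a lazy iterable standing in for the network download. The names are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator


class CorruptBlockError(Exception):
    """Raised as soon as a block does not match its expected checksum."""


def verified_blocks(blocks: Iterable[bytes], expected_digests: Iterable[str]) -> Iterator[bytes]:
    """Yield blocks one by one, aborting at the first checksum mismatch.

    zip() stops at the shorter input; a real implementation would also
    check that the number of blocks matches the number of digests.
    """
    for index, (block, expected) in enumerate(zip(blocks, expected_digests)):
        actual = hashlib.sha256(block).hexdigest()
        if actual != expected:
            raise CorruptBlockError(f"block {index} failed verification")
        yield block


if __name__ == "__main__":
    good = [b"block-0", b"block-1", b"block-2"]
    digests = [hashlib.sha256(b).hexdigest() for b in good]
    tampered = iter([b"block-0", b"tampered", b"block-2"])
    try:
        for block in verified_blocks(tampered, digests):
            print("accepted", block)
    except CorruptBlockError as err:
        print("aborted:", err)
```

Because the source is consumed lazily, the third block is never requested once the second one fails, which is where the bandwidth saving comes from compared with hashing a whole file after downloading it.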

WHEN IS THIS USED?

DATA IS PETABYTE SCALE

…OR AT LEAST A COUPLE OF 100 TERABYTES A YEAR

DATA IS CONTINUOUSLY ADDED

TIME BETWEEN BATCHES IS TOO LONG

KEEPING THOUSANDS OF MACHINES RUNNING IS EXPENSIVE

WHAT IF MY HADOOP CLUSTER IS DESTROYED?

WHEN IS THIS USED?

[Diagram. Labels: New Data, All data, Streaming cluster, Realtime view, Batch cluster, Batch view, Client, Query (to each view).]

WHY NOT HADOOP?

HADOOP STILL HAS ITS PLACE

JUST NOT FOR STORAGE

OK…WHY NOT HADOOP STORAGE?

BATCH LAYER IS AN EXCELLENT PLACE FOR HADOOP PROCESSING

BUT HADOOP STORAGE IS EXPENSIVE!

BUT HADOOP STORAGE REQUIRES MAINTENANCE!

BUT HADOOP STORAGE IS ONLINE EVEN WHEN IDLE!

BUT HADOOP STORAGE CONSUMES LOTS OF ENERGY!

DOES THIS WORK?

PERFORMANCE

Requests done by 16 threads concurrently

8 core virtual machine

At least 4 GB RAM (but often > 16GB)

f = 1, thus n = 4

Two implementations:

Streaming DepSky-A

Streaming DepSky-CA

PERFORMANCE

Throughput downstream (per thread):

Filesize 4MB, 750 KB/second (90th percentile)

Filesize 8MB, 1 MB/second (90th percentile)

Throughput upstream (per thread):

Filesize 4MB, 1.2 MB/second (90th percentile)

Filesize 8MB, 1.7 MB/second (90th percentile)

AVAILABILITY

[Figure: success rate versus log2(Filesize (b)) per HTTP verb (GET, PUT, DELETE, LIST). Panel "Streaming DepSky-A": success-rate axis from 0.9997 to 1.0000. Panel "Streaming DepSky-CA": success-rate axis from 0.994 to 1.000. Both panels cover log2(Filesize) from 0 to 25.]

thanks!

Thesis available at:

http://www.cs.rug.nl/~aiellom/tesi/vdtil.pdf
