
Cloud Storage Challenges

Dr. Dinkar Sitaram

dinkar.sitaram@hp.com

©2011 Hewlett-Packard Company and Vertica Confidential

Overview

– Types of cloud storage

– Building cloud-scale storage

– Challenges: theoretical considerations

– Dealing with the challenges

Based on Moving to the Cloud by Dinkar Sitaram & Geetha Manjunath, to be published by Elsevier

Types of cloud storage

File-based cloud storage

– Allows storage of files in the cloud

– Amazon S3, Windows Azure, …

– Built on top of HTTP

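– Amazon S3 overview

• Create buckets, then objects inside them

• GET http://dinkar.s3.amazonaws.com/project/file.c

• No directories: the namespace is flat, and an object name such as project/file.c is a single key

• Need AWS Access Key and AWS Secret Key

– Region: geographical location in which a bucket is stored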
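
The flat namespace can be illustrated with a minimal in-memory sketch. This is not the S3 API: the `Bucket` class and its methods are hypothetical stand-ins, and the URL format assumes virtual-hosted-style addressing with the bucket name as a host prefix. The point is only that "directories" are emulated by filtering object names on a prefix.

```python
# Hypothetical in-memory stand-in for a bucket; this is NOT the real S3 API.
class Bucket:
    def __init__(self, name):
        self.name = name
        self.objects = {}  # flat map: object name (key) -> bytes

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]

    def list(self, prefix=""):
        # There are no directories; a "folder" is just a shared key prefix.
        return sorted(k for k in self.objects if k.startswith(prefix))

    def url(self, key):
        # Assumed virtual-hosted-style address: bucket name as host prefix.
        return f"http://{self.name}.s3.amazonaws.com/{key}"


bucket = Bucket("dinkar")
bucket.put("project/file.c", b"int main(void) { return 0; }")
bucket.put("project/util.c", b"/* helpers */")
bucket.put("notes.txt", b"todo")
```

Calling `bucket.list("project/")` returns only the two keys that start with that prefix, which is how a client would present a folder view over a flat store.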

Database oriented cloud storage

– Offers a database service

– Examples: Amazon RDS (MySQL), Windows Azure SQL

– RDS examples

• Can administer (e.g., create, replicate) databases using the Amazon RDS APIs

− Db.createDBInstanceAsync(parms) creates a database

• Use JDBC APIs to build applications

− ResultSet rs = stmt.executeQuery("SELECT * FROM Employee");
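
Once the database exists, the application sees an ordinary relational database. The sketch below shows the same query pattern in Python, with the standard-library `sqlite3` module standing in for the hosted MySQL instance; the table columns and the inserted row are assumptions made for illustration.

```python
import sqlite3

# sqlite3 (in memory) stands in for the hosted relational database here.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed schema for the Employee table used in the query above.
cur.execute("CREATE TABLE Employee (name TEXT, title TEXT)")
cur.execute("INSERT INTO Employee VALUES (?, ?)", ("Dinkar", "Architect"))
conn.commit()

# Same query as the JDBC example; fetchall() plays the role of the ResultSet.
rows = cur.execute("SELECT * FROM Employee").fetchall()
```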

Key-value stores

– Database consists of <key, value> pairs

• No schema as in relational databases

• Typically data need not be normalized

• More flexible than RDBMS, scales due to fewer restrictions

• More work in application (e.g., valid values) to guarantee traditional RDBMS qualities

– Examples: Amazon SimpleDB, Google BigTable, Hadoop HBase

– Programming example (SDB)

• Google SimpleJDBC

• String insert = "INSERT INTO employees (name, title) VALUES ('Dinkar', 'Architect')";

• int val = st.executeUpdate(insert);
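
The trade-off described above (no schema, so validation moves into the application) can be sketched with a toy store. Everything here is hypothetical: `KeyValueStore`, `put_employee`, and the `VALID_TITLES` set are illustrative names, not part of SimpleDB or any other product.

```python
class KeyValueStore:
    """Toy schemaless store: each key maps to an arbitrary attribute dict."""

    def __init__(self):
        self._items = {}

    def put(self, key, attributes):
        self._items[key] = dict(attributes)

    def get(self, key):
        return self._items.get(key)


# Hypothetical business rule that an RDBMS would express as a constraint.
VALID_TITLES = {"Architect", "Engineer", "Manager"}


def put_employee(store, name, attributes):
    # The store accepts anything, so the application checks valid values.
    title = attributes.get("title")
    if title is not None and title not in VALID_TITLES:
        raise ValueError(f"invalid title: {title!r}")
    store.put(name, attributes)


store = KeyValueStore()
put_employee(store, "Dinkar", {"title": "Architect"})
# A second item may carry different attributes; no schema forbids it.
put_employee(store, "Geetha", {"title": "Engineer", "location": "Bangalore"})
```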

XML and document databases

– Store XML or JSON documents

– Example: MongoDB

• Stores JSON documents: { "Name": "Dinkar", "Attributes": { "Sex": "M", "Title": "Architect" } }

• Documents can have pointers to other documents

• Index on any attribute (including embedded): db.orders.ensureIndex()

• Searching: db.orders.find()

– These databases are midway between key-value stores and RDBMS

• Explicitly create indices

• More complex structures

• Some, e.g., CouchDB, offer transactions
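
To make "index on any attribute (including embedded)" concrete, here is a minimal in-memory sketch. It is not MongoDB's API: the `Collection` class, its dotted-path convention, and the method names are assumptions chosen to resemble the calls above. Indexed values must be hashable in this sketch.

```python
from collections import defaultdict


class Collection:
    """Toy document collection with explicitly created indexes."""

    def __init__(self):
        self.docs = []
        self.indexes = {}  # dotted path -> {value: [positions in self.docs]}

    @staticmethod
    def _lookup(doc, path):
        # Follow a dotted path such as "Attributes.Title" into nested dicts.
        cur = doc
        for part in path.split("."):
            if not isinstance(cur, dict) or part not in cur:
                return None
            cur = cur[part]
        return cur

    def ensure_index(self, path):
        index = defaultdict(list)
        for pos, doc in enumerate(self.docs):
            index[self._lookup(doc, path)].append(pos)
        self.indexes[path] = index

    def insert(self, doc):
        self.docs.append(doc)
        pos = len(self.docs) - 1
        for path, index in self.indexes.items():
            index[self._lookup(doc, path)].append(pos)

    def find(self, path, value):
        if path in self.indexes:  # indexed lookup
            return [self.docs[p] for p in self.indexes[path].get(value, [])]
        return [d for d in self.docs if self._lookup(d, path) == value]  # scan


people = Collection()
people.insert({"Name": "Dinkar", "Attributes": {"Sex": "M", "Title": "Architect"}})
people.insert({"Name": "Geetha", "Attributes": {"Title": "Researcher"}})
people.ensure_index("Attributes.Title")
people.insert({"Name": "Asha", "Attributes": {"Title": "Architect"}})
architects = [d["Name"] for d in people.find("Attributes.Title", "Architect")]
```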

Building cloud-scale storage

Cloud storage requirements

– Scaling to cloud-scale: partitioning

– Availability: replication

Partitioning strategies

– Similar to methods for partitioning databases

– Round-robin on partitioning attributes

• Loses associativity: finding a record by key means searching every partition

– Hash partitioning

– Range-based

– Directory-based

• Memcached

• Can provide, e.g., geographical partitioning

– Reference: Parallel Database Systems: The Future of High Performance Database Systems, by D. DeWitt and J. Gray, Communications of the ACM, Volume 35 Issue 6, June 1992
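
The four strategies can be compared side by side in a short sketch. The partition count, the range boundaries, and the region directory below are all assumptions picked for illustration; a real system would derive them from its data and deployment.

```python
import bisect
import zlib
from itertools import count

N = 4  # assumed number of partitions

_next = count()


def round_robin(_key):
    # Ignores the key entirely, so the same key can land anywhere.
    return next(_next) % N


def hash_partition(key):
    # crc32 is stable across runs (the built-in hash() of str is not).
    return zlib.crc32(key.encode("utf-8")) % N


# Assumed boundaries on the first letter: a-f, g-m, n-s, t-z.
RANGE_BOUNDS = ["g", "n", "t"]


def range_partition(key):
    return bisect.bisect_right(RANGE_BOUNDS, key[0].lower())


# Assumed directory mapping a region code to a partition.
DIRECTORY = {"IN": 0, "US": 1, "DE": 2}


def directory_partition(region, default=3):
    # An explicit lookup table allows arbitrary placement, e.g. by geography.
    return DIRECTORY.get(region, default)
```

Hash and range partitioning always send a given key to the same partition, so a lookup touches one node. Round-robin balances load well, but because placement does not depend on the key, a lookup has to ask every partition.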

Amazon availability

– Multiple availability zones per region

• Zone failures are isolated from each other

– Data is replicated across 3 availability zones by default

Challenges: Theoretical considerations

CAP theorem

– Fundamental limitation of distributed systems

– No distributed system can guarantee all three of the properties below at the same time

• Consistency: all operations appear to be serialized on a single, non-distributed object

• Availability: every operation returns a result

• Partition-tolerance: the system keeps operating even when an arbitrary number of messages between service nodes are lost

– Conjectured in [Brewer00]; proved in [LynGil02] by considering a two-node cluster

– References

1. [Brewer00] Towards Robust Distributed Systems, by Eric A. Brewer, ACM Symposium on Principles of Distributed Computing, July 16-19 2000, Portland, Oregon

2. [LynGil02] Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services, by Seth Gilbert and Nancy Lynch, ACM SIGACT News, Volume 33 Issue 2 (2002), pg. 51-59

2-node example

1. Servers are replicated for availability

2. If the network partitions, either:

• Allow the servers to operate independently (inconsistent), OR

• Bring the servers down (no availability)
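
The two choices can be simulated in a few lines. This is a toy model, not a proof: the `TwoNodeCluster` class and its "AP"/"CP" mode labels are assumptions used to name the two options above.

```python
class TwoNodeCluster:
    """Two replicas of one value; mode picks the behavior under partition."""

    def __init__(self, mode):
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.values = [None, None]  # one slot per server
        self.partitioned = False

    def write(self, server, value):
        if not self.partitioned:
            self.values = [value, value]  # healthy network: replicate
        elif self.mode == "AP":
            self.values[server] = value   # stay available, replicas diverge
        else:
            raise RuntimeError("unavailable during partition")

    def read(self, server):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable during partition")
        return self.values[server]


# Option 1: keep serving during the partition (available, inconsistent).
ap = TwoNodeCluster("AP")
ap.write(0, "old")
ap.partitioned = True
ap.write(0, "new")
stale = ap.read(1)   # server 1 never saw the second write

# Option 2: refuse service during the partition (consistent, unavailable).
cp = TwoNodeCluster("CP")
cp.write(0, "old")
cp.partitioned = True
```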

Practical example: Netflix

– Netflix: video on demand over the Internet

– Runs on Amazon cloud

– Consider the following scenario

• User at TV updates list of favorites

• Load balancer sends update to server 1

• Set-top box requests favorites list

• Load balancer sends request to server 2

• Is the returned result consistent? Depends on whether the update has reached server 2!

– Reference: Comparing NoSQL Availability Models, by Adrian Cockcroft, http://perfcap.blogspot.com/2010/10/comparing-nosql-availability-models.html

Dealing with inconsistency predicted by CAP theorem

Relaxed consistency

– Consistency can be relaxed

• Weak consistency: the system does not guarantee that reads return consistent results

• Eventual consistency: if there are no further updates, the system eventually becomes consistent. If updates are infrequent, a client can wait for some time to get a consistent value

• Read-your-writes consistency: a client performing a read after a write will always see its own updates

• Session consistency: read-your-writes consistency within a session

– Amazon S3

• US Standard Region: eventual consistency

• US West, EU, Asia Pacific Regions: read-your-writes consistency for new object creation, eventual consistency for overwrites and deletes

– Reference: Eventually Consistent, by Werner Vogels, Communications of the ACM, January 2009
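
The difference between eventual and read-your-writes consistency can be shown with a toy replicated store. All names here are hypothetical, and the session mechanism (remembering the client's own latest write) is only one possible way to provide read-your-writes; it is an assumption of this sketch, not a description of how S3 works.

```python
class ReplicatedStore:
    """Two replicas; a write lands on one and propagates only on sync()."""

    def __init__(self):
        self.replicas = [{}, {}]   # key -> (value, version)
        self.pending = []          # (target replica, key, value, version)
        self.version = 0

    def write(self, replica, key, value):
        self.version += 1
        self.replicas[replica][key] = (value, self.version)
        self.pending.append((1 - replica, key, value, self.version))
        return self.version

    def read(self, replica, key):
        return self.replicas[replica].get(key, (None, 0))

    def sync(self):
        # Background replication: apply each update unless a newer one exists.
        for target, key, value, version in self.pending:
            if version > self.replicas[target].get(key, (None, 0))[1]:
                self.replicas[target][key] = (value, version)
        self.pending.clear()


class Session:
    """Read-your-writes by remembering this client's own latest writes."""

    def __init__(self, store):
        self.store = store
        self.written = {}  # key -> (value, version)

    def write(self, replica, key, value):
        self.written[key] = (value, self.store.write(replica, key, value))

    def read(self, replica, key):
        value, version = self.store.read(replica, key)
        mine = self.written.get(key)
        if mine is not None and mine[1] > version:
            return mine[0]  # replica is stale; serve the session's own write
        return value


store = ReplicatedStore()
session = Session(store)
session.write(0, "favorites", ["Movie A"])
plain_read = store.read(1, "favorites")[0]   # another client, stale replica
own_read = session.read(1, "favorites")      # same session, same replica
store.sync()
after_sync = store.read(1, "favorites")[0]
```

Before `sync()` a different client reading replica 1 sees nothing, while the writing session still sees its own update; after `sync()` both replicas agree, which is the eventual-consistency guarantee.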

Example: Handling inconsistency

– BASE: an alternative to ACID [Brewer00]

• Basically Available

• Soft-state

• Eventually consistent

– Example: online shopping portal

• User table: transactions by user

• Transaction table: transactions used for billing

• How do we update both tables after a purchase?

– Traditional database method

• Begin transaction

• Update User table

• Update Transaction table

• End transaction

– A common cloud method

• Queue update to User table

• Queue update to Transaction table

• The tables could be temporarily inconsistent

• They will eventually become consistent

– Reference: BASE: An ACID Alternative, by D. Pritchett, ACM Queue, June 2008

[Figure: the Application sends queued updates to the User table and the Transaction table]
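
The queued approach can be sketched as follows. The table layouts, the function names, and the single shared queue are assumptions for illustration; in practice the queue would be a durable messaging service and the workers separate processes.

```python
from collections import deque

user_table = {}         # user -> list of transaction ids
transaction_table = {}  # transaction id -> amount (used for billing)
queue = deque()         # pending updates, applied asynchronously


def purchase(user, txn_id, amount):
    # No enclosing transaction: just enqueue one update per table.
    queue.append(("user", user, txn_id, amount))
    queue.append(("transaction", user, txn_id, amount))


def apply_next():
    table, user, txn_id, amount = queue.popleft()
    if table == "user":
        txns = user_table.setdefault(user, [])
        if txn_id not in txns:   # idempotent, so a redelivery is harmless
            txns.append(txn_id)
    else:
        transaction_table[txn_id] = amount


def consistent():
    # Both tables should know about exactly the same transactions.
    listed = {t for txns in user_table.values() for t in txns}
    return listed == set(transaction_table)


purchase("dinkar", "t1", 25)
apply_next()                    # only the User table has been updated
mid_state = consistent()
apply_next()                    # queue drained
final_state = consistent()
```

Between the two `apply_next()` calls the tables disagree; once the queue is drained they agree again. Making each update idempotent is a design choice that lets a worker safely retry after a failure.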

Conclusions

– Many alternatives for building cloud storage exist

– Each requires a careful trade-off between consistency and availability