

Page 1: CS 347: Distributed Databases and Transaction Processingi.stanford.edu/~venetis/cs347/notes/347Notes01.pdf · CS 347 Lecture 1 23 Logistics • TEXTBOOK: No required textbook. Some

CS 347 Lecture 1 1

CS 347: Distributed Databases and Transaction Processing

Notes 01: Introduction

Hector Garcia-Molina

In CS245: Centralized DB system

Software:
• Application
• SQL Front End
• Query Processor
• Transaction Proc.
• File Access

[Figure: processor (P), memory (M), ...]

• Simplifications:
  – single front end
  – one place to keep locks
  – if processor fails, system fails, ...

In CS347

• Multiple processors (+ memories)
• Heterogeneity and autonomy of “components”

Multiple processors

• Opportunity for parallelism
• Opportunity for reliability
• Synchronization issues

➳ To illustrate synchronization problems: Two Generals Problem

The one general problem (Trivial!)

[Figure: one general (G), the troops, and the battlefield]

The two general problem:

[Figure: Blue army (Blue G) <-- messengers --> Red army (Red G); Enemy]

Rules:

• Blue and red army must attack at same time
• Blue and red generals synchronize through messengers
• Messengers can be lost

How Many Messages Do We Need?

assume blue starts...

BG → RG: attack at 9am
RG → BG: ack (red goes at 9am)
BG → RG: got ack

Is this enough??


Stated problem is Impossible!

• Theorem: There is no protocol that uses a finite number of messages that solves the two-generals problem (as stated here)

Alternatives??
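
The intuition behind the theorem can be illustrated (not proved) with a small Python sketch. It assumes a hypothetical family of protocols that is not taken from the notes: blue and red alternate n messages, blue sends first, and each general attacks only if every message addressed to it arrived. The name `run_protocol` and its parameters are made up for this illustration. Whatever n is, losing the last messenger leaves the two generals in disagreement.

```python
def run_protocol(n_messages, lost_index=None):
    """Simulate an n-message alternating handshake started by blue.

    Message i (0-based) goes blue -> red if i is even, red -> blue if i is odd.
    If message `lost_index` is lost, the exchange stops there.
    Returns (blue_attacks, red_attacks).
    """
    received = {"blue": 0, "red": 0}
    for i in range(n_messages):
        if i == lost_index:
            break  # messenger lost; nobody sends anything further
        receiver = "red" if i % 2 == 0 else "blue"
        received[receiver] += 1

    expected_red = (n_messages + 1) // 2  # messages 0, 2, 4, ...
    expected_blue = n_messages // 2       # messages 1, 3, 5, ...
    # Assumed decision rule: attack only if nothing addressed to you is missing.
    return (received["blue"] == expected_blue,
            received["red"] == expected_red)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, run_protocol(n), run_protocol(n, lost_index=n - 1))
```

The sender of the last message cannot tell whether it arrived, so adding one more acknowledgement only moves the problem to that new last message.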

Probabilistic Approach?

• Send as many messages as possible, hope one gets through...

assume blue starts...

BG → RG: attack at 9am
BG → RG: attack at 9am
BG → RG: attack at 9am
BG → RG: attack at 9am
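
A rough way to quantify "hope one gets through" is sketched below. It assumes, beyond what the slide says, that each messenger is lost independently with the same probability `loss_prob`; the function names are illustrative. Under that assumption the chance that at least one of n messengers arrives is 1 - loss_prob^n, which approaches but never reaches 1 when loss_prob > 0.

```python
import math


def p_at_least_one_arrives(loss_prob, n):
    """Probability that at least one of n independent messengers gets through."""
    return 1.0 - loss_prob ** n


def messengers_needed(loss_prob, target):
    """Number of messengers needed to reach probability `target`.

    Requires 0 < loss_prob < 1 and 0 < target < 1. Computed with floating
    point, so results sitting exactly on a boundary may be off by one.
    """
    return math.ceil(math.log(1.0 - target) / math.log(loss_prob))


if __name__ == "__main__":
    print(p_at_least_one_arrives(0.5, 4))   # 0.9375
    print(messengers_needed(0.5, 0.999))    # 10
```

Blue still never learns whether any copy arrived, which is why this remains a probabilistic approach rather than a solution to the stated problem.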

Eventual Commit

• Eventually both sides attack...

assume blue starts...

BG → RG: attack ASAP
RG → BG: on my way!

(lost messages are handled by retransmits)
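
The retransmission idea can be sketched as a small simulation. This is one possible reading of the diagram, not the protocol as specified in the notes: blue resends "attack ASAP" every round until it hears "on my way!", red attacks on the first order it receives and answers every order, and each message is lost independently with probability `loss_prob`. All names are illustrative.

```python
import random


def eventual_commit(loss_prob, rng, max_rounds=1000):
    """Return (round in which red attacks, round in which blue attacks).

    An entry is None if that general never got its message within max_rounds.
    """
    red_round = None
    for rnd in range(1, max_rounds + 1):
        if rng.random() < loss_prob:
            continue              # order lost; blue retransmits next round
        if red_round is None:
            red_round = rnd       # red attacks on the first order it receives
        if rng.random() < loss_prob:
            continue              # reply lost; blue retransmits next round
        return red_round, rnd     # blue has heard "on my way!"
    return red_round, None


if __name__ == "__main__":
    print(eventual_commit(0.5, random.Random(0)))
```

In this sketch the two attacks are not simultaneous: red may start several rounds before blue finds out, which is the price paid for relaxing "attack at the same time" to "eventually both attack".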

2-Phase Eventual Commit

• Eventually both sides attack...

assume blue starts...

phase 1
BG → RG: ready to attack?
RG → BG: yes, at your disposal

phase 2
BG → RG: attack ASAP
RG → BG: ack

(lost messages are handled by retransmits)
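
A minimal sketch of the two phases follows, under assumptions that go beyond the slide: each phase is a request that blue retransmits until the matching reply comes back, and every message is lost independently with probability `loss_prob`. The helper names `request_reply` and `two_phase_eventual_commit` are hypothetical.

```python
import random


def request_reply(loss_prob, rng, max_tries=1000):
    """Retransmit a request until its reply arrives; return the number of tries."""
    for attempt in range(1, max_tries + 1):
        if rng.random() < loss_prob:
            continue  # request lost
        if rng.random() < loss_prob:
            continue  # reply lost
        return attempt
    raise RuntimeError("no reply after max_tries retransmissions")


def two_phase_eventual_commit(loss_prob, rng):
    """Run phase 1, then phase 2; return the tries each phase needed."""
    # Phase 1: "ready to attack?" answered by "yes, at your disposal"
    phase1_tries = request_reply(loss_prob, rng)
    # Phase 2: "attack ASAP" answered by "ack"; starts only after phase 1 ends
    phase2_tries = request_reply(loss_prob, rng)
    return phase1_tries, phase2_tries


if __name__ == "__main__":
    print(two_phase_eventual_commit(0.3, random.Random(1)))
```

The point of the extra phase in this sketch is ordering: the attack order is never sent until red has confirmed that it is ready.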


Commit Protocols

• Will study commit protocols like these...

Heterogeneity

[Figure: a "Select new investments" application over heterogeneous sources: RDBMS, Files, Stock ticker tape; data labels: Portfolio, History of dividends, ratios, ...]

Autonomy

Example: unable to get statistics for query optimization

Example: blue general may have mind of his (or her) own!

• So, in CS347 we study data management with multiple processors and possible autonomy, heterogeneity
  – Impact on:
    • Data organization
    • Query processing
    • Access structures
    • Concurrency control
    • Recovery

• Renewed Interest in Distributed/Parallel Data Processing!
  – Massive web data, manage with many computers
  – How to crawl and search the web?
  – Peer-to-peer systems manage huge amounts of data
  – Data from many sources (e.g., comparison shopping): how to integrate?
  – Sensor Networks: data generated at many sensors/devices, need to analyze
  – Multi-player games (e.g., Second Life): tons of distributed data

It’s the Economy, Stupid!

• Example: Multi-player games

[Figure, shown in two builds: many P nodes, with the labels "Data" and "state"]

Logistics

• LECTURES: Mondays and Wednesdays 12:50pm to 2:05pm, Skilling 193

• INSTRUCTOR: Hector Garcia-Molina; Office: Gates Hall 434; Email: [email protected]; Office Hours: Mondays, Wednesdays 11am to 12 noon.

• ADDITIONAL INSTRUCTOR: Zoltan Gyongyi; Email: [email protected].

• TEACHING ASSISTANT: Petros Venetis; Office: Gates 424; Email: [email protected]; News Group: su.class.cs347; Office Hours: TBD

• SECRETARY: Marianne Siroker; Office: Gates Hall 436; Email: [email protected]; Phone: (650) 723-0872

Logistics

• TEXTBOOK: No required textbook. Some material for the lectures will be drawn from the following book:
  – M. Tamer Ozsu and Patrick Valduriez, "Principles of Distributed Database Systems," Second Edition, Prentice Hall 1999.

• CLASS WEB PAGE: http://www.stanford.edu/class/cs347
  Will contain homework assignments, course news, etc. Be sure to check it periodically.

• ASSIGNMENTS: about 5 homeworks

• GRADING: Homeworks: 20%, Midterm: 30%, Final: 50%.

Tentative Syllabus 2010 (Part I)

DATE                   TOPIC
• Monday March 29      Introduction [N01]
• Wednesday March 31   Data Fragmentation [N02] Z
• Monday April 5       Query processing [N03] Z
• Wednesday April 7    Query processing [N03] Z
• Monday April 12      Query Optimization [N04] Z
• Wednesday April 14   Concurrency Control, Failures [N06]
• Monday April 19      Reliable Data Management [N07]
• Wednesday April 21   Reliable Data Management [N07]
• Monday April 26      Replicated Data Management [N08]
• Wednesday April 28   Midterm

Tentative Syllabus 2010 (Part II)

DATE                   TOPIC
• Monday May 3         Network Partitions [N09]
• Wednesday May 5      Peer to Peer Systems [N05]
• Monday May 10        Peer to Peer Systems [N05]
• Wednesday May 12     Map-Reduce Z
• Monday May 17        Distributed IR Z
• Wednesday May 19     Distributed Entity Resolution [new]
• Monday May 24        Time [N10]
• Wednesday May 26     Heterogeneous Systems, Source Capabilities [N11, 12, 13] Z
• Wednesday June 2     Extra Lecture (have one more this year!)


Concepts you should be familiar with:

• CS245: query plan, cost estimation, join algorithms, recovery, logging,…

• Interconnection networks (bus, mesh, hypercube,…)

• Computer networks (LAN, WAN,…)

Introductory topics

• Database architectures
• Client-server systems
• Distributed vs. parallel DB systems
• Cloud Computing

DB architectures

(1) Shared memory
[Figure: processors P P P ... with a single memory M]

(2) Shared disk
[Figure: several processors P, each with its own memory M, ...]

(3) Shared nothing
[Figure: several P/M pairs, ...]

(4) Hybrid example
[Figure: two groups, each with one M and P P P ...]

(4) Hybrid example 2
[Figure: P/M nodes on LAN #1 and LAN #2, joined through R and R over a WAN]

(4) Hybrid Tandem-like
[Figure: several P/M pairs, ...]

(5) Unusual? Datacycle (Broadcast disks)
[Figure: P P P with M M M; entire DB broadcast]

(5) Unusual: Sorting network
[Figure: Sortnet with P/M nodes, ...]

(5) Unusual: processor per track or processor per disk
[Figure: M, P, and P’ P’ P’ ...; “small” processors + “tiny” memories]

(6) Unusual: sensor networks
[Figure: a data collection node (P’, M) and several nodes each labeled M, P, B; labels: data collection node, sensor, battery]

Issues for selecting architecture

• Reliability
• Scalability
• Geographic distribution of data
• Data “clusters”
• Performance
• Cost

Client-Server Systems (or how to partition software)

• Application
• Front End
• Query Processor
• Transaction Processing
• File Access

[Figure, repeated on three slides: the software layers above divided between client and server]

Transaction Servers

• Clients ship transactions consisting of 1 or more SQL commands

E.g., Open DataBase Connectivity (ODBC) (standard API)

Data Servers

• Client requests pages or records
• Popular for OODB systems

Issues

• Object granularity
• Where is data cached?
• Where is locking done?

Basic Tradeoff

• Offloading work to clients
• Data transmitted

[Figure: two client (C) / server (S) pairs; one request is "Get pages", the other is "Reserve hotel room"]
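
The tradeoff can be made concrete with a back-of-the-envelope sketch. All numbers and names here are assumptions for illustration only (a 4096-byte page, the request and reply sizes, the function names); none come from the notes. A "Get pages" style client does the work itself but pulls whole pages over the network, while a "Reserve hotel room" style client ships a short request and lets the server do the work.

```python
PAGE_SIZE = 4096  # bytes per page; an assumed value


def data_server_bytes(pages_needed, page_size=PAGE_SIZE):
    """Client does the work: every page it needs crosses the network."""
    return pages_needed * page_size


def transaction_server_bytes(request_bytes, reply_bytes):
    """Server does the work: only the request and its reply cross the network."""
    return request_bytes + reply_bytes


if __name__ == "__main__":
    print(data_server_bytes(20))               # 81920
    print(transaction_server_bytes(200, 100))  # 300
```

Shipping pages moves more data but takes load off the server; shipping the request moves little data but concentrates the work on the server.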

Note: Similar issues arise when we partition software/functionality within server

[Figure: a "Reserve hotel room" request and a server made of several P/M nodes, ...]

• Where is data cached?
• Where is locking done?

Parallel or distributed DB system?

• More similarities than differences!

• Typically, parallel DBs:
  – Fast interconnect
  – Homogeneous software
  – High performance is goal
  – Transparency is goal

• Typically, distributed DBs:
  – Geographically distributed
  – Data sharing is goal (may run into heterogeneity, autonomy)
  – Disconnected operation possible

Cloud Computing

• Is CC just a marketing term??
  – utility (like power)
  – data or CPU cycles?
  – many processors, many storage units
  – business model

Is CC a subset, superset, disjoint from, or overlaps with:

• grid computing
• distributed computing
• Web 2.0
• Cluster Computing
• Peer-to-peer computing
• software as a service
• client-server computing
• data center as a computer
• massively parallel computing

[Figure: four options (A), (B), (C), (D), each showing CC in relation to the other term]

Clash of the Clouds (Economist April 4, 2009)

CC Issues

• Customer lock-in
• Privacy
• Standards
• Software licensing


Next

• How to describe distributed data

• Query processing in parallel DBs

• Query processing in distributed DBs

Query processing in parallel DBs:

• Typically: we can distribute/partition/sort... data to make certain DB operations (e.g., Join) fast
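
As one example of partitioning data to speed up a join, here is a minimal sketch of a hash-partitioned equi-join. It is an illustration, not the method the course prescribes: rows are Python dicts, the "nodes" are just list slots in one process, and the names `partition`, `local_hash_join`, and `parallel_join` are made up for this sketch. Because both relations are partitioned on the join key with the same hash function, matching rows always land on the same node and each node can join its share independently.

```python
from collections import defaultdict


def partition(rows, key, n_nodes):
    """Assign each row to a node by hashing its join-key value."""
    parts = [[] for _ in range(n_nodes)]
    for row in rows:
        parts[hash(row[key]) % n_nodes].append(row)
    return parts


def local_hash_join(r_part, s_part, key):
    """Ordinary in-memory hash join of the two partitions held by one node."""
    index = defaultdict(list)
    for r in r_part:
        index[r[key]].append(r)
    return [{**r, **s} for s in s_part for r in index.get(s[key], [])]


def parallel_join(r_rows, s_rows, key, n_nodes):
    """Partition both inputs, then join partition i of R with partition i of S."""
    r_parts = partition(r_rows, key, n_nodes)
    s_parts = partition(s_rows, key, n_nodes)
    out = []
    for node in range(n_nodes):  # each iteration could run on a separate node
        out.extend(local_hash_join(r_parts[node], s_parts[node], key))
    return out


if __name__ == "__main__":
    R = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
    S = [{"id": 1, "b": "p"}, {"id": 1, "b": "q"}, {"id": 3, "b": "r"}]
    print(parallel_join(R, S, "id", 3))
```

If the data is already stored partitioned on the join key, the partitioning step costs nothing at query time, which is the sense in which the layout can be chosen to make the join fast.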

Query processing in distributed DBs:

• Typically: we are given data distribution; we need to find query processing strategy to minimize cost (e.g., communication cost)
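
A minimal sketch of "given the data distribution, pick the cheapest strategy" follows. It makes simplifying assumptions that are not in the notes: cost is measured only as bytes shipped, each site stores one relation, the join runs entirely at one site, and the result size is known in advance. The function name and the site names in the example are hypothetical.

```python
def cheapest_join_site(sizes, result_size, query_site):
    """Pick the site at which to run the join so that the fewest bytes move.

    sizes: dict mapping site name -> bytes of the relation stored there.
    Returns (site, bytes_shipped).
    """
    best = None
    for site in sizes:
        # Ship every relation that is not already stored at `site` ...
        cost = sum(size for other, size in sizes.items() if other != site)
        # ... and ship the result back if the query was issued elsewhere.
        if site != query_site:
            cost += result_size
        if best is None or cost < best[1]:
            best = (site, cost)
    return best


if __name__ == "__main__":
    # A large relation at site "A", a small one at "B"; the query is issued at "B".
    print(cheapest_join_site({"A": 1_000_000, "B": 10_000}, 5_000, "B"))
    # ('A', 15000)
```

In the example it is cheaper to send the small relation to the large one and ship the result back than to move the large relation to where the query was issued.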