CS 347 Lecture 1 1
CS 347: Parallel and Distributed
Data Management
Notes 01: Introduction
Brian Cooper
Based on Material by Hector Garcia-Molina
In CS245: Centralized DB system
Software:
• Application
• SQL Front End
• Query Processor
• Transaction Proc.
• File Access
[Figure: hardware with a single processor (P) and memory (M), ...]
• Simplifications:
  – single front end
  – one place to keep locks
  – if processor fails, system fails, ...
In CS347
• Multiple processors (+ memories)
• Heterogeneity and autonomy of “components”
Multiple processors
• Opportunity for parallelism
• Opportunity for reliability
• Synchronization issues
To illustrate synchronization problems: Two Generals Problem
The one general problem (Trivial!)
[Figure: a single general (G) with troops and the battlefield]
The two general problem:
[Figure: the blue army and the red army, each with its own general (G), exchange messengers; the enemy is also shown]
Rules:
• Blue and red army must attack at same time
• Blue and red generals synchronize through messengers
• Messengers can be lost
How Many Messages Do We Need?
(assume blue starts...)
• BG → RG: attack at 9am. Is this enough??
• RG → BG: ack (red goes at 9am). Is this enough??
• BG → RG: got ack. Is this enough??
Stated problem is Impossible!
• Theorem: There is no protocol that uses a finite number of messages that solves the two-generals problem (as stated here)
Alternatives??
Probabilistic Approach?
• Send as many messages as possible, hope one gets through...
(assume blue starts...)
BG → RG: attack at 9am
BG → RG: attack at 9am
BG → RG: attack at 9am
BG → RG: attack at 9am
Eventual Commit
• Eventually both sides attack...
(assume blue starts...)
BG → RG: attack ASAP (retransmits)
RG → BG: on my way! (retransmits)
Eventual Commit
• One message sent every time unit
• Probability of success of one message is p
• What is the probability C(t) that red commits by time t?
• $C(1) = p$
• $C(2) = p + (1-p)p$
• $C(3) = p + (1-p)p + (1-p)^2 p$
• $C(4) = p + (1-p)p + (1-p)^2 p + (1-p)^3 p$
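The partial sums above form a geometric series, so they collapse to a closed form. A short derivation, assuming (as the slides appear to) that each message gets through independently with probability p:

```latex
\begin{align*}
C(t) &= \sum_{k=0}^{t-1} (1-p)^{k}\, p \\
     &= p \cdot \frac{1-(1-p)^{t}}{1-(1-p)} \\
     &= 1-(1-p)^{t}
\end{align*}
```

For 0 < p < 1, C(t) approaches 1 as t grows but never reaches it, which matches the "eventual" in eventual commit.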
Eventual Commit
[Figure: plot of C(t) versus t, with the value p marked]
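To get a feel for the shape of this curve, the sketch below evaluates C(t) two ways: as the term-by-term sum from the slides and as the equivalent geometric-series form 1 - (1-p)^t. It then checks both against a small simulation. The function names, the value p = 0.3, and the assumption that message losses are independent are illustrative choices, not part of the lecture.

```python
import random


def commit_probability(p: float, t: int) -> float:
    """C(t): probability that at least one of the first t messages arrives."""
    return 1.0 - (1.0 - p) ** t


def commit_probability_by_sum(p: float, t: int) -> float:
    """Same quantity, summed term by term as on the slides."""
    return sum((1.0 - p) ** k * p for k in range(t))


def simulate(p: float, t: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate: fraction of trials in which some message
    gets through within t time units (losses assumed independent)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < p for _ in range(t)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    p = 0.3  # assumed per-message success probability
    for t in (1, 2, 3, 4, 10):
        print(t, round(commit_probability(p, t), 4), round(simulate(p, t), 4))
```

The curve rises quickly at first and then flattens toward 1, so each extra retransmission buys less additional certainty than the one before.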
Eventual Commit
• How expensive is the protocol?
• E = expected number of messages
• Homework: compute E (as a function of p)
2-Phase Eventual Commit
• Eventually both sides attack...
(assume blue starts...)
phase 1 (retransmits):
BG → RG: ready to attack?
RG → BG: yes, at your disposal
phase 2 (retransmits):
BG → RG: attack ASAP
RG → BG: ack
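The two phases can be played out over a lossy channel to see how retransmission behaves. The following is a rough simulation sketch, not the protocol as the course defines it. It assumes that every message is delivered independently with probability p, that the sender retransmits its request once per round until a reply comes back, and that the receiver answers every request it receives. The names `exchange` and `two_phase_eventual_commit` are hypothetical.

```python
import random


def exchange(rng: random.Random, p: float, max_rounds: int = 10_000) -> int:
    """Retransmit a request until its reply arrives; return messages sent.

    Each round the sender transmits the request (delivered with
    probability p). If it arrives, the receiver transmits a reply
    (also delivered with probability p). The loop ends as soon as a
    reply reaches the sender.
    """
    sent = 0
    for _ in range(max_rounds):
        sent += 1  # the request
        if rng.random() < p:
            sent += 1  # the reply
            if rng.random() < p:
                return sent
    raise RuntimeError("no reply within max_rounds")


def two_phase_eventual_commit(p: float, seed: int = 0) -> tuple:
    """Return (messages in phase 1, messages in phase 2)."""
    rng = random.Random(seed)
    phase1 = exchange(rng, p)  # "ready to attack?" / "yes, at your disposal"
    phase2 = exchange(rng, p)  # "attack ASAP" / "ack"
    return phase1, phase2


if __name__ == "__main__":
    print(two_phase_eventual_commit(0.5, seed=1))
```

With a perfect channel (p = 1) each phase costs exactly two messages. As p drops, the message count grows while the outcome stays the same: both sides eventually learn the decision.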
Commit Protocols
• Will study commit protocols like these...
Heterogeneity
[Figure: an application that selects new investments draws on three different sources: an RDBMS (portfolio), files (history of dividends, ratios, ...), and a stock ticker tape]
Autonomy
Example: unable to get statistics for query optimization
Example: blue general may have mind of his (or her) own!
• So, in CS347 we study data management with multiple processors and possible autonomy, heterogeneity
– Impact on:
  • Data organization
  • Query processing
  • Access structures
  • Concurrency control
  • Recovery
• Renewed Interest in Distributed/Parallel Data Processing!
– Massive web data, manage with many computers
– How to crawl and search the web?
– Peer-to-peer systems manage huge amounts of data
– Data from many sources (e.g., comparison shopping): how to integrate?
– Sensor Networks: data generated at many sensors/devices, need to analyze
– Multi-player games (e.g., World of Warcraft): tons of distributed data
It’s the Economy, Stupid!
• Example: Multi-player games
[Figure: many nodes labeled P around a Data store labeled "state"; a second build of the slide adds another "state" label]
Logistics
• LECTURES: Mondays and Wednesdays 12:50pm to 2:05pm, Gates B01
• INSTRUCTOR: Brian Cooper; Email: [email protected]; Office Hours: Mondays, Wednesdays 11:30 to 12:30.
• TEACHING ASSISTANT:
  – Vasilis Verroios, Email: [email protected]
• Piazza forum (tentative): https://piazza.com/class#spring2015/cs347.
• SECRETARY: Marianne Siroker; Office: Gates Hall 435; Email: [email protected]; Phone: (650) 723-0872
Logistics
• TEXTBOOK: No required textbook. You'll be expected to read several research papers.
• CLASS WEB PAGE: http://www.stanford.edu/class/cs347
  Will contain homework assignments, course news, etc. Be sure to check it periodically.
• ASSIGNMENTS: about 5 homeworks
• GRADING: Homeworks: 20%, Midterm: 30%, Final: 50%.
Tentative Syllabus 2014 (Part I)
DATE                  TOPIC
• Monday March 30       Introduction [N01]
• Wednesday April 1     Data Fragmentation [N02]
• Monday April 6        Query processing [N03]
• Wednesday April 8     Query processing & Optimization [N04]
• Monday April 13       Concurrency Control, Failures [N05]
• Wednesday April 15    Reliable Data Management [N06]
• Monday April 20       Reliable Data Management [N06]
• Wednesday April 22    Guest Lecture – Cloudera Impala
• Monday April 27       Replicated Data Management [N07]
• Wednesday April 29    Midterm
Tentative Syllabus 2014 (Part II)
DATE                  TOPIC
• Monday May 4          Partitions, Entity Resolution [N11]
• Wednesday May 6       Peer to Peer Systems [N08]
• Monday May 11         Map-Reduce & Pig [N09]
• Wednesday May 13      Map-Reduce & Pig [N09]
• Monday May 18         Other Open Source Systems [N09b]
• Wednesday May 20      Other Open Source Systems [N09b]
• Wednesday May 27      Distributed IR [N10]
• Monday June 1         Time [N12]
• Wednesday June 3      Publish/Subscribe Systems [N13]
• Monday June 8         FINAL EXAM (8:30 am!!!)
Interesting New Systems
• Storm (from Twitter)
• S4 (from Yahoo)
• Cassandra (key-value store)
• Hive (SQL over Hadoop)
• Pregel (graph execution)
• Kestrel (queues?)
• ZooKeeper (replicated data)
• Sparkl or Spark (Berkeley?)
• H-Base
• HyRacks (UC Irvine)
• MemCache-D
• Pnuts (Yahoo)
• Dynamo (Amazon)
• Spanner (Google)
• Paxos
• G-Store (UC Santa Barbara)
• Elastras (UC Santa Barbara)
• Tao (Facebook)
• Kafka (LinkedIn)
• Impala (Cloudera)
Concepts you should be familiar with:
• CS245: query plan, cost estimation, join algorithms, recovery, logging,…
• Interconnection networks (bus, mesh, hypercube,…)
• Computer networks (LAN, WAN,…)
Introductory topics
• Database architectures
• Client-server systems
• Distributed vs. parallel DB systems
• Cloud Computing
DB architectures
(1) Shared memory
[Figure: processors P, P, P, ... all attached to one shared memory M]
DB architectures
(2) Shared disk
[Figure: processors P, each with its own memory M, all sharing the same disks]
DB architectures
(2B) Shared data storage (disk or file?)
[Figure: processors P, each with its own memory M, all attached to shared storage]
• storage area network (SAN)
• Hadoop/Google file system
DB architectures
(3) Shared nothing
[Figure: independent nodes, each with its own processor P and memory M, ...]
DB architectures
(4) Hybrid example
[Figure: several shared-memory nodes, each with processors P, P, P, ... attached to one memory M]
DB architectures
(4) Hybrid example 2
[Figure: processor/memory (P/M) nodes grouped on LAN #1 and LAN #2; the two LANs are joined through nodes labeled R across a WAN]
DB architectures
(4) Hybrid: Tandem-like; also in Microsoft SQLServer Parallel Data Warehouse
[Figure: four or more processor/memory (P/M) nodes, ...]
DB architectures
(5) Unusual? Datacycle (Broadcast disks)
[Figure: processors P, P, P, each with its own memory M, all receiving the broadcast]
Entire DB broadcast
(5) Unusual: Sorting network
[Figure: processor/memory (P/M) nodes, ..., connected to a sorting network ("Sortnet")]
(5) Unusual: processor per track or processor per disk
[Figure: a processor P with memory M, plus processors P’, P’, P’, ...]
“small” processors + “tiny” memories
Related idea in Oracle Exadata "DB machine"
(6) Unusual: sensor networks
[Figure: sensors, each with a processor P, memory M, and battery B, reporting to a data collection node (P’, M)]
Issues for selecting architecture
• Reliability
• Scalability
• Geographic distribution of data
• Data “clusters”
• Performance
• Cost
Client-Server Systems
(or how to partition software)
• Application
• Front End
• Query Processor
• Transaction Processing
• File Access
[Figure: the layers above are divided between client and server; the slide appears in three builds]
Transaction Servers
• Clients ship transactions consisting of 1 or more SQL commands
E.g., Open DataBase Connectivity (ODBC)
(standard API)
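As an illustration of what "shipping a transaction" can look like from the client side, here is a minimal sketch using Python's standard `sqlite3` module. SQLite is an embedded library, not an ODBC client talking to a remote server, so it is only a stand-in for the call pattern. The `rooms` table and the `reserve_room` helper are hypothetical.

```python
import sqlite3


def reserve_room(conn: sqlite3.Connection, room_id: int, guest: str) -> bool:
    """Run one transaction made of two SQL commands: check, then book."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            row = conn.execute(
                "SELECT guest FROM rooms WHERE id = ?", (room_id,)
            ).fetchone()
            if row is None or row[0] is not None:
                raise ValueError("room unavailable")
            conn.execute(
                "UPDATE rooms SET guest = ? WHERE id = ?", (guest, room_id)
            )
        return True
    except ValueError:
        return False


# Hypothetical schema and data, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (id INTEGER PRIMARY KEY, guest TEXT)")
conn.executemany(
    "INSERT INTO rooms (id, guest) VALUES (?, ?)", [(1, None), (2, None)]
)
conn.commit()

if __name__ == "__main__":
    print(reserve_room(conn, 1, "alice"))  # first booking of room 1
    print(reserve_room(conn, 1, "bob"))    # room 1 is already taken
```

The client never sees pages or records, only SQL commands and their results. All data access and locking stay on the server side of the interface.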
Data Servers
• Client requests pages or records
• Popular for OODB systems
Issues
• Object granularity
• Where is data cached?
• Where is locking done?
Basic Tradeoff
• Offloading work to clients
• Data transmitted
[Figure: two client (C) / server (S) pairs; in one the client sends "Get pages", in the other "Reserve hotel room"]
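A back-of-the-envelope way to compare the two styles is to count bytes on the wire. The sketch below is only an illustration. The page size, the function names, and the example numbers are assumptions, and it ignores protocol overhead, caching, and the CPU work that moves to the client.

```python
PAGE_SIZE = 4096  # bytes per page (assumed)


def data_server_cost(pages_read: int, pages_written: int) -> int:
    """Bytes transmitted when the client fetches pages ("Get pages")
    and ships modified pages back to the server."""
    return (pages_read + pages_written) * PAGE_SIZE


def transaction_server_cost(request_bytes: int, reply_bytes: int) -> int:
    """Bytes transmitted when the client ships a request
    ("Reserve hotel room") and receives a short reply."""
    return request_bytes + reply_bytes


if __name__ == "__main__":
    # Hypothetical reservation touching three pages and dirtying one.
    print(data_server_cost(pages_read=3, pages_written=1))
    print(transaction_server_cost(request_bytes=200, reply_bytes=50))
```

Under these assumed numbers the data server moves far more data. In exchange, the client does the processing, and a client-side cache could avoid fetching the same pages again.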
Note: Similar issues arise when we partition software/functionality within server
[Figure: a "Reserve hotel room" request arriving at a server made up of several processor/memory (P/M) nodes, ...]
• Where is data cached?
• Where is locking done?
Parallel or distributed DB system?
• More similarities than differences!
• Typically, parallel DBs:
– Fast interconnect
– Homogeneous software
– High performance is goal
– Transparency is goal
• Typically, distributed DBs:
– Geographically distributed
– Data sharing is goal (may run into heterogeneity, autonomy)
– Disconnected operation possible
Cloud Computing
• Is CC just a marketing term??
– utility (like power)
– data or CPU cycles?
– many processors, many storage units
– business model
Is CC a subset, superset, disjoint from, or overlaps with:
• grid computing
• distributed computing
• Web 2.0
• Cluster Computing
• Peer-to-peer computing
• software as a service
• client-server computing
• data center as a computer
• massively parallel computing
[Figure: four diagrams, (A) through (D), each showing a different relationship between CC and another set]
Clash of the Clouds (Economist April 4, 2009)
Silver lining (cloud price war) (Economist, August 30, 2014)
CC Issues
• Customer lock-in
• Privacy
• Standards
• Software licensing
Next
• How to describe distributed data
• Query processing in parallel DBs
• Query processing in distributed DBs
Query processing in parallel DBs:
• Typically: we can distribute/partition/sort... the data to make certain DB operations (e.g., Join) fast
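One common way to exploit this freedom is to hash-partition both relations on the join key, so that matching tuples land in the same partition and each partition can be joined independently. The sketch below runs the partitions in a plain loop on one machine; in a parallel DB each iteration would run on a different processor. The function names and the dict-based row format are illustrative assumptions, and this is one partitioning strategy among several, not necessarily the one the course develops.

```python
from collections import defaultdict


def hash_partition(rows, key, n):
    """Split rows into n buckets by the hash of the join key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def local_hash_join(r_part, s_part, r_key, s_key):
    """Join one pair of partitions: build a hash table on R, probe with S."""
    table = defaultdict(list)
    for r in r_part:
        table[r[r_key]].append(r)
    return [(r, s) for s in s_part for r in table.get(s[s_key], [])]


def partitioned_join(r, s, r_key, s_key, n=4):
    """Equi-join R and S by joining matching partitions independently."""
    r_parts = hash_partition(r, r_key, n)
    s_parts = hash_partition(s, s_key, n)
    out = []
    for i in range(n):  # each iteration could run on its own processor
        out.extend(local_hash_join(r_parts[i], s_parts[i], r_key, s_key))
    return out


if __name__ == "__main__":
    R = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
    S = [{"rid": 1, "b": 10}, {"rid": 1, "b": 11}, {"rid": 3, "b": 12}]
    for r, s in partitioned_join(R, S, "id", "rid"):
        print(r, s)
```

Because equal keys always hash to the same partition number, no tuple in partition i of R can match a tuple in partition j of S when i and j differ. That property is what makes the per-partition joins independent.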
Query processing in distributed DBs:
• Typically: we are given data distribution; we need to find query processing strategy to minimize cost
(e.g., communication cost)
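As a toy instance of this kind of optimization, suppose each relation already lives at a fixed site and a join must run at one of those sites. A very simple strategy is to pick the site that requires shipping the fewest bytes. The sketch below does only that. The function name, the inputs, and the cost model (bytes shipped, ignoring the cost of returning the result and any local processing) are simplifying assumptions.

```python
def cheapest_join_site(relation_bytes: dict, site_of: dict) -> tuple:
    """Pick the site at which to run a join so the fewest bytes are shipped.

    relation_bytes maps relation name -> size in bytes.
    site_of maps relation name -> site where it is stored (given, not chosen).
    Returns (best_site, bytes_shipped).
    """
    best_site, best_cost = None, None
    for site in sorted(set(site_of.values())):
        # Every relation not already stored at this site must be shipped to it.
        cost = sum(
            size for rel, size in relation_bytes.items() if site_of[rel] != site
        )
        if best_cost is None or cost < best_cost:
            best_site, best_cost = site, cost
    return best_site, best_cost


if __name__ == "__main__":
    sizes = {"R": 1000, "S": 50}        # hypothetical relation sizes
    placement = {"R": "A", "S": "B"}    # hypothetical data distribution
    print(cheapest_join_site(sizes, placement))
```

Here shipping the small relation S to site A costs 50 bytes, against 1000 bytes for moving R to site B, so the join runs at A.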