cloud computing
![Page 1: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/1.jpg)
Cloud Computing Lecture #1
What is Cloud Computing? (and an intro to parallel/distributed processing)
Jimmy Lin
The iSchool
University of Maryland
Wednesday, September 3, 2008
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details
Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creative Commons Attribution 3.0 License)
![Page 2: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/2.jpg)
Source: http://www.free-pictures-photos.com/
![Page 3: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/3.jpg)
What is Cloud Computing?
1. Web-scale problems
2. Large data centers
3. Different models of computing
4. Highly-interactive Web applications
![Page 4: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/4.jpg)
1. Web-Scale Problems
- Characteristics:
  - Definitely data-intensive
  - May also be processing intensive
- Examples:
  - Crawling, indexing, searching, mining the Web
  - “Post-genomics” life sciences research
  - Other scientific data (physics, astronomy, etc.)
  - Sensor networks
  - Web 2.0 applications
  - …
![Page 5: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/5.jpg)
How much data?
- Wayback Machine has 2 PB + 20 TB/month (2006)
- Google processes 20 PB a day (2008)
- “all words ever spoken by human beings” ~ 5 EB
- NOAA has ~1 PB climate data (2007)
- CERN’s LHC will generate 15 PB a year (2008)

640K ought to be enough for anybody.
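To put the 20 PB a day figure in perspective, here is a rough back-of-the-envelope calculation. The 100 MB/s sequential read rate is an assumed ballpark for a single commodity disk, not a number from the lecture, so treat the output as an order-of-magnitude sketch only.

```python
PB = 10**15  # decimal petabyte, in bytes
MB = 10**6   # decimal megabyte, in bytes

data_per_day = 20 * PB        # from the slide: "Google processes 20 PB a day (2008)"
disk_read_rate = 100 * MB     # ASSUMPTION: bytes per second for one disk
seconds_per_day = 24 * 60 * 60

# Time for one disk to read a single day's worth of data.
single_disk_seconds = data_per_day // disk_read_rate
single_disk_days = single_disk_seconds / seconds_per_day

# Minimum number of disks reading in parallel to finish within one day
# (ceiling division, ignoring every other overhead).
disks_needed = -(-single_disk_seconds // seconds_per_day)

print(f"one disk: {single_disk_days:.0f} days (about {single_disk_days / 365:.1f} years)")
print(f"disks needed to keep up: {disks_needed}")
```

Under that assumption a single disk would need roughly six years to read one day of data, which is why the rest of the lecture turns to many machines working in parallel.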
![Page 6: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/6.jpg)
Maximilien Brice, © CERN
![Page 7: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/7.jpg)
Maximilien Brice, © CERN
![Page 8: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/8.jpg)
There’s nothing like more data!
s/inspiration/data/g;
(Banko and Brill, ACL 2001)
(Brants et al., EMNLP 2007)
![Page 9: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/9.jpg)
What to do with more data?
- Answering factoid questions
  - Pattern matching on the Web
  - Works amazingly well
  - Example: Who shot Abraham Lincoln? → X shot Abraham Lincoln
  - (Brill et al., TREC 2001; Lin, ACM TOIS 2007)
- Learning relations
  - Start with seed instances
  - Search for patterns on the Web
  - Using patterns to find more instances
  - Seed instances: Birthday-of(Mozart, 1756), Birthday-of(Einstein, 1879)
  - Text found: “Wolfgang Amadeus Mozart (1756 - 1791)” and “Einstein was born in 1879”
  - Patterns: “PERSON (DATE –” and “PERSON was born in DATE”
  - (Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … )
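The seed-and-pattern loop on this slide can be sketched in a few lines. This is a toy illustration, not the method of any of the cited papers: the function names, the `PERSON` and `YEAR` regular expressions, and the tiny in-memory "corpus" standing in for the Web are all assumptions made for the example.

```python
import re

# ASSUMPTION: crude stand-ins for real named-entity and date recognizers.
PERSON = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
YEAR = r"(\d{4})"

def learn_patterns(seeds, corpus):
    """Collect the text found between a seed person and a seed year."""
    patterns = set()
    for person, year in seeds:
        for sentence in corpus:
            start = sentence.find(person)
            end = sentence.find(year)
            if start != -1 and end > start:
                patterns.add(sentence[start + len(person):end])
    return patterns

def apply_patterns(patterns, corpus):
    """Use each learned context as PERSON <context> YEAR to find new instances."""
    found = set()
    for middle in patterns:
        regex = re.compile(PERSON + re.escape(middle) + YEAR)
        for sentence in corpus:
            found.update(regex.findall(sentence))
    return found

seeds = [("Mozart", "1756"), ("Einstein", "1879")]
seed_corpus = [
    "Wolfgang Amadeus Mozart (1756 - 1791)",
    "Einstein was born in 1879",
]
patterns = learn_patterns(seeds, seed_corpus)  # {" (", " was born in "}

new_corpus = [
    "Ada Lovelace was born in 1815",
    "Alan Turing (1912 - 1954)",
    "The meeting was held in 2008",
]
facts = apply_patterns(patterns, new_corpus)
print(sorted(facts))
```

The point of "more data" is that even patterns this crude find many correct instances when the corpus is Web-sized, because each fact is stated in many different ways.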
![Page 10: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/10.jpg)
2. Large Data Centers
- Web-scale problems? Throw more machines at it!
- Clear trend: centralization of computing resources in large data centers
  - Necessary ingredients: fiber, juice, and space
  - What do Oregon, Iceland, and abandoned mines have in common?
- Important Issues:
  - Redundancy
  - Efficiency
  - Utilization
  - Management
![Page 11: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/11.jpg)
Source: Harper’s (Feb, 2008)
![Page 12: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/12.jpg)
Maximilien Brice, © CERN
![Page 13: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/13.jpg)
Key Technology: Virtualization
[Diagram: two stacks side by side. Traditional Stack: App, App, App on one Operating System on Hardware. Virtualized Stack: App, App, App on OS, OS, OS on a Hypervisor on Hardware.]
![Page 14: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/14.jpg)
3. Different Computing Models
- Utility computing
  - Why buy machines when you can rent cycles?
  - Examples: Amazon’s EC2, GoGrid, AppNexus
- Platform as a Service (PaaS)
  - Give me a nice API and take care of the implementation
  - Example: Google App Engine
- Software as a Service (SaaS)
  - Just run it for me!
  - Example: Gmail
“Why do it yourself if you can pay someone to do it for you?”
![Page 15: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/15.jpg)
4. Web Applications
- A mistake on top of a hack built on sand held together by duct tape?
- What is the nature of software applications?
  - From the desktop to the browser
  - SaaS == Web-based applications
  - Examples: Google Maps, Facebook
- How do we deliver highly-interactive Web-based applications?
  - AJAX (asynchronous JavaScript and XML)
  - For better, or for worse…
![Page 16: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/16.jpg)
What is the course about?
- MapReduce: the “back-end” of cloud computing
  - Batch-oriented processing of large datasets
- Ajax: the “front-end” of cloud computing
  - Highly-interactive Web-based applications
- Computing “in the clouds”
  - Amazon’s EC2/S3 as an example of utility computing
![Page 17: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/17.jpg)
Amazon Web Services
- Elastic Compute Cloud (EC2)
  - Rent computing resources by the hour
  - Basic unit of accounting = instance-hour
  - Additional costs for bandwidth
- Simple Storage Service (S3)
  - Persistent storage
  - Charge by the GB/month
  - Additional costs for bandwidth
- You’ll be using EC2/S3 for course assignments!
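The accounting model above (instance-hours, GB/months, plus bandwidth) can be written out as a small cost estimate. Every price below is a hypothetical placeholder in US cents, not a real AWS rate, and rounding a partial hour up to a full instance-hour is an assumption made for this sketch.

```python
import math

# HYPOTHETICAL prices in US cents, for illustration only (not real AWS rates).
EC2_CENTS_PER_INSTANCE_HOUR = 10
S3_CENTS_PER_GB_MONTH = 15
BANDWIDTH_CENTS_PER_GB = 17

def ec2_cost_cents(instances, hours_each, gb_transferred=0):
    """Compute cost in cents; the basic unit of accounting is the instance-hour."""
    # ASSUMPTION: a partial hour is billed as a full instance-hour.
    instance_hours = instances * math.ceil(hours_each)
    return (instance_hours * EC2_CENTS_PER_INSTANCE_HOUR
            + gb_transferred * BANDWIDTH_CENTS_PER_GB)

def s3_cost_cents(gb_stored, months, gb_transferred=0):
    """Storage cost in cents, charged per GB per month, plus bandwidth."""
    return (gb_stored * months * S3_CENTS_PER_GB_MONTH
            + gb_transferred * BANDWIDTH_CENTS_PER_GB)

# A 20-instance job that runs for 2.5 hours and moves 10 GB,
# plus 50 GB kept in storage for one month with 10 GB transferred.
job = ec2_cost_cents(instances=20, hours_each=2.5, gb_transferred=10)
storage = s3_cost_cents(gb_stored=50, months=1, gb_transferred=10)
total_cents = job + storage
print(f"job: {job} cents, storage: {storage} cents, total: ${total_cents / 100:.2f}")
```

Working in integer cents avoids floating-point rounding in the totals.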
![Page 18: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/18.jpg)
This course is not for you…
- If you’re not genuinely interested in the topic
- If you’re not ready to do a lot of programming
- If you’re not open to thinking about computing in new ways
- If you can’t cope with uncertainty, unpredictability, poor documentation, and immature software
- If you can’t put in the time

Otherwise, this will be a richly rewarding course!
![Page 19: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/19.jpg)
Source: http://davidzinger.wordpress.com/2007/05/page/2/
![Page 20: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/20.jpg)
Cloud Computing Zen
- Don’t get frustrated (take a deep breath)…
  - This is bleeding edge technology
  - Those W$*#T@F! moments
- Be patient…
  - This is the second first time I’ve taught this course
- Be flexible…
  - There will be unanticipated issues along the way
- Be constructive…
  - Tell me how I can make everyone’s experience better
![Page 21: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/21.jpg)
Source: Wikipedia
![Page 22: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/22.jpg)
Source: Wikipedia
![Page 23: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/23.jpg)
Source: Wikipedia
![Page 24: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/24.jpg)
Source: Wikipedia
![Page 25: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/25.jpg)
Things to go over…
- Course schedule
- Assignments and deliverables
- Amazon EC2/S3
![Page 26: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/26.jpg)
Web-Scale Problems?
- Don’t hold your breath:
  - Biocomputing
  - Nanocomputing
  - Quantum computing
  - …
- It all boils down to…
  - Divide-and-conquer
  - Throwing more hardware at the problem
- Simple to understand… a lifetime to master…
![Page 27: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/27.jpg)
Divide and Conquer
[Diagram: “Work” is partitioned into units w1, w2, w3; each unit goes to a “worker”, which produces a partial result r1, r2, r3; the partial results are combined into the “Result”.]
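The partition, worker, and combine steps of divide and conquer can be sketched with a thread pool. The function names and the sum-of-squares task are hypothetical choices for illustration; threads stand in for whatever kind of worker is actually used.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(work, n):
    """Split `work` into at most n roughly equal units (w1, w2, ...)."""
    size = (len(work) + n - 1) // n  # ceiling division
    return [work[i:i + size] for i in range(0, len(work), size)]

def worker(unit):
    """Process one unit of work and return a partial result (r1, r2, ...)."""
    return sum(x * x for x in unit)

def combine(partials):
    """Merge the partial results into the final result."""
    return sum(partials)

work = list(range(1, 10))        # 1..9
units = partition(work, 3)       # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(worker, units))   # [14, 77, 194]

result = combine(partials)
print(result)  # 285
```

This only works so cleanly because the units are independent and the combine step is a simple sum; the later slides are about what happens when those conditions do not hold.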
![Page 28: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/28.jpg)
Different Workers
- Different threads in the same core
- Different cores in the same CPU
- Different CPUs in a multi-processor system
- Different machines in a distributed system
![Page 29: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/29.jpg)
Choices, Choices, Choices
- Commodity vs. “exotic” hardware
- Number of machines vs. processors vs. cores
- Bandwidth of memory vs. disk vs. network
- Different programming models
![Page 30: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/30.jpg)
Flynn’s Taxonomy

| | Instructions: Single (SI) | Instructions: Multiple (MI) |
|---|---|---|
| Data: Single (SD) | SISD: Single-threaded process | MISD: Pipeline architecture |
| Data: Multiple (MD) | SIMD: Vector processing | MIMD: Multi-threaded programming |
![Page 31: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/31.jpg)
SISD
[Diagram: one processor applies a single instruction stream to a single data stream (D D D D D D D).]
![Page 32: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/32.jpg)
SIMD
[Diagram: one processor applies a single instruction stream to multiple parallel data streams D0, D1, D2, D3, D4, …, Dn.]
![Page 33: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/33.jpg)
MIMD
[Diagram: two processors, each with its own instruction stream and its own data stream (D D D D D D D).]
![Page 34: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/34.jpg)
Memory Typology: Shared
[Diagram: four processors all attached to a single shared memory.]
![Page 35: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/35.jpg)
Memory Typology: Distributed
[Diagram: four processors, each with its own memory, connected by a network.]
![Page 36: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/36.jpg)
Memory Typology: Hybrid
[Diagram: four nodes connected by a network; within each node, two processors share one memory.]
![Page 37: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/37.jpg)
Parallelization Problems
- How do we assign work units to workers?
- What if we have more work units than workers?
- What if workers need to share partial results?
- How do we aggregate partial results?
- How do we know all the workers have finished?
- What if workers die?

What is the common theme of all of these problems?
![Page 38: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/38.jpg)
General Theme?
- Parallelization problems arise from:
  - Communication between workers
  - Access to shared resources (e.g., data)
- Thus, we need a synchronization system!
- This is tricky:
  - Finding bugs is hard
  - Solving bugs is even harder
![Page 39: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/39.jpg)
Managing Multiple Workers
- Difficult because
  - (Often) don’t know the order in which workers run
  - (Often) don’t know where the workers are running
  - (Often) don’t know when workers interrupt each other
- Thus, we need:
  - Semaphores (lock, unlock)
  - Condition variables (wait, notify, broadcast)
  - Barriers
- Still, lots of problems:
  - Deadlock, livelock, race conditions, ...
- Moral of the story: be careful!
  - Even trickier if the workers are on different machines
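Two of the primitives named on this slide can be shown in a short threaded sketch. It uses Python's `threading.Lock` in the lock/unlock role and `threading.Barrier` for the barrier; the worker count, the shared counter, and the variable names are hypothetical choices for the example.

```python
import threading

NUM_WORKERS = 4
INCREMENTS = 10_000

counter = 0
counter_lock = threading.Lock()            # guards the shared counter
barrier = threading.Barrier(NUM_WORKERS)   # all workers must arrive before any continues
seen_after_barrier = []

def worker():
    global counter
    # Phase 1: update shared state. Without the lock, the read-modify-write
    # in `counter += 1` can interleave between threads and lose updates.
    for _ in range(INCREMENTS):
        with counter_lock:
            counter += 1
    # Wait here until every worker has finished phase 1.
    barrier.wait()
    # Phase 2: every worker now observes the complete total.
    with counter_lock:
        seen_after_barrier.append(counter)

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)              # 40000
print(seen_after_barrier)   # [40000, 40000, 40000, 40000]
```

The order in which the workers run is still unknown, but the lock and the barrier make the final result independent of that order.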
![Page 40: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/40.jpg)
Patterns for Parallelism
- Parallel computing has been around for decades
- Here are some “design patterns” …
![Page 41: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/41.jpg)
Master/Slaves
[Diagram: one master connected to several slaves.]
![Page 42: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/42.jpg)
Producer/Consumer Flow
[Diagram: producers (P) passing work to consumers (C).]
![Page 43: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/43.jpg)
Work Queues
[Diagram: producers (P) put work items (W W W W W) on a shared queue; consumers (C) take items from it.]
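A work queue with several producers and consumers can be sketched with Python's thread-safe `queue.Queue`. The producer and consumer counts, the item format, and the `STOP` sentinel used for shutdown are assumptions made for this illustration.

```python
import queue
import threading

NUM_PRODUCERS = 3
NUM_CONSUMERS = 2
ITEMS_PER_PRODUCER = 5
STOP = object()  # sentinel that tells one consumer to shut down

work_queue = queue.Queue()   # the shared queue
results = []
results_lock = threading.Lock()

def producer(producer_id):
    """Put this producer's work items on the shared queue."""
    for i in range(ITEMS_PER_PRODUCER):
        work_queue.put((producer_id, i))

def consumer():
    """Take items off the shared queue until a STOP sentinel arrives."""
    while True:
        item = work_queue.get()
        if item is STOP:
            break
        producer_id, i = item
        with results_lock:
            results.append(producer_id * 100 + i)

producers = [threading.Thread(target=producer, args=(p,)) for p in range(NUM_PRODUCERS)]
consumers = [threading.Thread(target=consumer) for _ in range(NUM_CONSUMERS)]
for t in producers + consumers:
    t.start()
for t in producers:
    t.join()
# All real work is queued; add one sentinel per consumer so each one exits.
for _ in consumers:
    work_queue.put(STOP)
for t in consumers:
    t.join()

print(sorted(results))
```

Which consumer handles which item varies from run to run, which is why the output is sorted before printing.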
![Page 44: Cloud Computing](https://reader030.vdocument.in/reader030/viewer/2022020307/55cb9c1cbb61eb090d8b4589/html5/thumbnails/44.jpg)
Rubber Meets Road
- From patterns to implementation:
  - pthreads, OpenMP for multi-threaded programming
  - MPI for cluster computing
  - …
- The reality:
  - Lots of one-off solutions, custom code
  - Write your own dedicated library, then program with it
  - Burden on the programmer to explicitly manage everything
- MapReduce to the rescue! (for next time)