Distributed Systems: Principles and Paradigms
Maarten van Steen
VU Amsterdam, Dept. Computer Science, Room R4.20, [email protected]
Chapter 12: Distributed Web-Based Systems
Version: December 10, 2012
Distributed Web-Based Systems 12.1 Architecture
Distributed Web-based systems
Essence: The WWW is a huge client-server system with millions of servers, each hosting thousands of hyperlinked documents.
Documents are often represented in text (plain text, HTML, XML).
Alternative types: images, audio, video, applications (PDF, PS).
Documents may contain scripts, executed by client-side software.
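Figure: A client machine runs a browser on top of its OS; the server machine runs a Web server. 1. The browser sends a get-document request (HTTP); 2. the server fetches the document from a local file; 3. the server returns the response.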
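The three steps in the figure can be sketched as a toy request handler. This is a minimal sketch, not a real Web server: there is no networking, only the request line of an HTTP/1.1 GET is parsed, and the "local files" are a hypothetical in-memory dictionary DOCS instead of a file system.

```python
# Hypothetical document store standing in for the server's local files.
DOCS = {
    "/index.html": "<html><body>Hello</body></html>",
}

def handle_request(raw_request: str) -> str:
    """Server side: fetch the requested document (step 2) and build the response (step 3)."""
    request_line = raw_request.split("\r\n", 1)[0]
    method, path, _version = request_line.split(" ")
    if method != "GET":
        return "HTTP/1.1 405 Method Not Allowed\r\nContent-Length: 0\r\n\r\n"
    body = DOCS.get(path)
    if body is None:
        return "HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
    return (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body.encode())}\r\n"
        "\r\n"
        f"{body}"
    )

# Step 1: the browser's get-document request.
print(handle_request("GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"))
```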
Multi-tiered architectures
Observation: Very early on, Web sites were already organized into three tiers.
Figure: A Web server (with an HTTP request handler), a CGI process running the CGI program, and a database server. Steps: 1. get request; 3. start process to fetch document; 4. database interaction; 5. HTML document created; 6. return result.
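The three tiers can be sketched as three functions. This is a minimal sketch under simplifying assumptions: a real CGI setup starts a separate operating-system process per request, whereas here the "CGI program" is an ordinary function call, and the products table, its rows, and the /products path are hypothetical. An in-memory SQLite database stands in for the database server.

```python
import sqlite3

def make_database() -> sqlite3.Connection:
    """Third tier: the database server (here an in-memory SQLite database)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE products (name TEXT, price REAL)")  # hypothetical schema
    db.executemany("INSERT INTO products VALUES (?, ?)", [("pen", 1.5), ("book", 12.0)])
    return db

def cgi_program(db: sqlite3.Connection) -> str:
    """Second tier: started by the Web server (step 3); queries the database (step 4)
    and creates an HTML document (step 5)."""
    rows = db.execute("SELECT name, price FROM products ORDER BY name").fetchall()
    items = "".join(f"<li>{name}: {price:.2f}</li>" for name, price in rows)
    return f"<html><body><ul>{items}</ul></body></html>"

def http_request_handler(path: str, db: sqlite3.Connection) -> str:
    """First tier: the Web server's request handler (request in at step 1, result out at step 6)."""
    if path == "/products":
        return cgi_program(db)
    return "<html><body>Not found</body></html>"

print(http_request_handler("/products", make_database()))
```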
Web services
Observation: At a certain point, people started recognizing that there was more to the Web than just user↔site interaction: sites could offer services to other sites ⇒ standardization is badly needed.
Figure: On the client machine and on the server machine, the application talks to a stub generated from the service description (WSDL); the stubs exchange SOAP messages through each machine's communication subsystem. The server publishes its service with a directory service (UDDI), where the client looks up the service.
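What a stub does with a call can be sketched by hand. In practice the stubs are generated from the WSDL description by tooling; this minimal sketch instead hand-writes a client-side and a server-side stub that marshal and unmarshal an operation call in a SOAP 1.1 envelope, with no HTTP transport. The operation GetPrice and the namespace urn:example:stock are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:stock"  # hypothetical service namespace

def marshal_call(operation: str, **params: str) -> bytes:
    """Client-side stub: wrap an operation call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope)

def unmarshal_call(message: bytes) -> tuple[str, dict[str, str]]:
    """Server-side stub: extract the operation name and parameters from the envelope."""
    envelope = ET.fromstring(message)
    call = envelope.find(f"{{{SOAP_ENV}}}Body")[0]
    operation = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params

msg = marshal_call("GetPrice", symbol="ACME")
print(unmarshal_call(msg))  # ('GetPrice', {'symbol': 'ACME'})
```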
Distributed Web-Based Systems 12.2 Processes
Apache Web server
Observation: More than 52% of all 185 million Web sites run Apache.
The server is internally organized more or less according to the steps needed to process an HTTP request.
Figure: The Apache core takes a request through a sequence of hooks until the response is produced. Modules link some of their functions to hooks; at each hook, the core calls the functions linked to that hook.
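The hook mechanism can be sketched in a few lines. This is a minimal sketch: the hook names, the dictionary-based request, and the two modules are hypothetical, and Apache's actual hooks and calling conventions are richer (for instance, a function may decline to handle a request), whereas here every linked function is simply called in turn.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical hook names, one per processing step, in the order the core visits them.
HOOKS = ["translate", "authorize", "generate", "log"]

Request = dict  # a request is modeled as a plain dict in this sketch
registry: dict[str, list[Callable[[Request], None]]] = defaultdict(list)

def register(hook: str, func: Callable[[Request], None]) -> None:
    """A module links one of its functions to a hook."""
    registry[hook].append(func)

def core_process(request: Request) -> Request:
    """The core walks through the hooks and calls every function linked to each one."""
    for hook in HOOKS:
        for func in registry[hook]:
            func(request)
    return request

# Two hypothetical modules hooking into request processing.
register("translate", lambda r: r.update(file="/var/www" + r["uri"]))
register("generate", lambda r: r.update(response=f"contents of {r['file']}"))
register("log", lambda r: r.setdefault("log", []).append(r["uri"]))

print(core_process({"uri": "/index.html"})["response"])  # contents of /var/www/index.html
```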
Server clusters
Essence: To improve performance and availability, WWW servers are often clustered in a way that is transparent to clients.
Figure: Several Web servers connected by a LAN to a front end; the front end handles all incoming requests and outgoing responses.
Server clusters
Problem: The front end may easily get overloaded, so special measures need to be taken.
Transport-layer switching: The front end simply passes the TCP request to one of the servers, taking some performance metric into account.
Content-aware distribution: The front end reads the content of the HTTP request and then selects the best server.
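The two front-end policies can be contrasted in a short sketch. Both concrete rules are assumptions chosen for illustration: least active connections stands in for "some performance metric", and hashing the URL path to a fixed server stands in for "selects the best server". Mapping each URL to one server means that server's cache keeps serving that document. The server names are hypothetical; hashlib is used because Python's built-in string hash changes between runs.

```python
import hashlib

SERVERS = ["ws1", "ws2", "ws3"]  # hypothetical back-end Web servers

def transport_layer_choice(active_connections: dict[str, int]) -> str:
    """Transport-layer switching: only TCP-level information is available,
    so pick the server with the fewest active connections (assumed metric)."""
    return min(SERVERS, key=lambda s: active_connections.get(s, 0))

def content_aware_choice(http_request: str) -> str:
    """Content-aware distribution: read the HTTP request line and map each URL
    to a fixed server (assumed policy), so the document stays in that server's cache."""
    path = http_request.split(" ")[1]
    digest = hashlib.sha256(path.encode()).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]

print(transport_layer_choice({"ws1": 10, "ws2": 3, "ws3": 7}))  # ws2
req = "GET /video/intro.mp4 HTTP/1.1"
print(content_aware_choice(req) == content_aware_choice(req))  # True: same URL, same server
```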
Server clusters
Question: Why can content-aware distribution be so much better?
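Figure: A client, a switch, two distributors, a dispatcher, and two Web servers. 1. The switch passes the setup request to a distributor; 2. the dispatcher selects a server; 3. the distributor hands off the TCP connection to that server; 4. the distributor informs the switch; 5. the switch forwards the client's other messages to the selected server; 6. the server sends its responses.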
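The bookkeeping behind the TCP handoff can be sketched as a switch with a forwarding table. This is a minimal sketch: the actual transfer of TCP connection state (step 3) is not modeled, round-robin selection of distributors is an assumption, and the dispatcher rule and all names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Switch:
    """Forwards client packets; after a handoff it knows which server owns each connection."""
    distributors: list[str]
    table: dict[tuple[str, int], str] = field(default_factory=dict)
    next_distributor: int = 0

    def route(self, conn: tuple[str, int], is_setup: bool) -> str:
        if is_setup:
            # Step 1: pass the setup request to a distributor (round-robin, assumed).
            d = self.distributors[self.next_distributor % len(self.distributors)]
            self.next_distributor += 1
            return d
        # Step 5: forward other messages directly to the server that took over.
        return self.table[conn]

    def inform(self, conn: tuple[str, int], server: str) -> None:
        # Step 4: the distributor tells the switch where the connection now lives.
        self.table[conn] = server

def dispatcher_select(url: str) -> str:
    """Step 2: the dispatcher picks a server from the request content (hypothetical rule)."""
    return "ws-media" if url.startswith("/media/") else "ws-static"

switch = Switch(distributors=["dist1", "dist2"])
conn = ("192.0.2.10", 51000)  # client address and port identify the connection
print(switch.route(conn, is_setup=True))  # dist1
server = dispatcher_select("/media/clip.mp4")
# Step 3 (TCP handoff to `server`) is not modeled here.
switch.inform(conn, server)
print(switch.route(conn, is_setup=False))  # ws-media
```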
Distributed Web-Based Systems 12.6 Consistency and Replication
Web proxy caching
Basic idea: Sites install a separate proxy server that handles all outgoing requests. Proxies subsequently cache incoming documents. Cache-consistency protocols:
Always verify validity by contacting the server.
Age-based consistency:
$$T_{\text{expire}} = \alpha \cdot (T_{\text{cached}} - T_{\text{last modified}}) + T_{\text{cached}}$$
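The age-based rule above can be evaluated directly: the longer a document went unmodified before it was cached, the longer the proxy trusts its copy. In this minimal sketch the default α = 0.5 and the timestamps (in seconds) are assumptions; the slide gives no value for α.

```python
def expiration_time(t_cached: float, t_last_modified: float, alpha: float = 0.5) -> float:
    """Age-based consistency: T_expire = alpha * (T_cached - T_last_modified) + T_cached."""
    return alpha * (t_cached - t_last_modified) + t_cached

def is_fresh(now: float, t_cached: float, t_last_modified: float, alpha: float = 0.5) -> bool:
    """The proxy serves its cached copy without contacting the server until T_expire."""
    return now < expiration_time(t_cached, t_last_modified, alpha)

# A document last modified at t=500 s and cached at t=1000 s stays valid until t=1250 s.
print(expiration_time(1000.0, 500.0))  # 1250.0
print(is_fresh(1200.0, 1000.0, 500.0), is_fresh(1300.0, 1000.0, 500.0))  # True False
```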
Web proxy caching
Basic idea (cont'd): Cooperative caching, in which a proxy first checks its neighbors on a cache miss.
Figure: Clients send HTTP get requests to their Web proxy, each of which has a cache. 1. The proxy looks in its local cache; 2. on a miss, it asks neighboring proxy caches; 3. if none has the document, it forwards the request to the Web server.
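The three-step lookup can be sketched as a proxy class. This is a minimal sketch under stated assumptions: neighbors are asked only about their local caches (so a miss never bounces between proxies), every document fetched is cached locally, and the origin function and URLs are hypothetical.

```python
from typing import Callable, Optional

class CooperativeProxy:
    def __init__(self, fetch_from_server: Callable[[str], str]) -> None:
        self.cache: dict[str, str] = {}
        self.neighbors: list["CooperativeProxy"] = []
        self.fetch_from_server = fetch_from_server

    def lookup_local(self, url: str) -> Optional[str]:
        return self.cache.get(url)

    def get(self, url: str) -> str:
        # 1. Look in the local cache.
        doc = self.lookup_local(url)
        if doc is not None:
            return doc
        # 2. Ask neighboring proxy caches (local lookups only, to avoid request loops).
        for neighbor in self.neighbors:
            doc = neighbor.lookup_local(url)
            if doc is not None:
                break
        else:
            # 3. No neighbor has it: forward the request to the Web server.
            doc = self.fetch_from_server(url)
        self.cache[url] = doc
        return doc

server_hits: list[str] = []

def origin(url: str) -> str:  # hypothetical Web server
    server_hits.append(url)
    return f"<doc {url}>"

a, b = CooperativeProxy(origin), CooperativeProxy(origin)
a.neighbors, b.neighbors = [b], [a]
a.get("/news")  # miss everywhere: fetched from the server
b.get("/news")  # local miss, but hit at neighbor a
print(server_hits)  # ['/news']
```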
Replication in Web hosting systems
Observation: By and large, Web hosting systems are adopting replication to increase performance. Much research is being done to improve their organization, following the lines of self-managing systems.
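Figure: A feedback control loop around a Web hosting system. Starting from an initial configuration, the system produces an observed output that is also affected by uncontrollable parameters (disturbance/noise). Metric estimation turns it into a measured output; analysis compares this with the reference input and issues adjustment triggers, which lead to corrections in replica placement, consistency enforcement, and request routing.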
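One iteration of such a loop can be sketched as follows. The slide fixes neither the metric nor the control law, so this minimal sketch assumes that the measured output is average latency, that the reference input is a latency target, and that the only correction is adding or removing a replica when the relative error exceeds a hypothetical 10% tolerance. The latency samples are made up for illustration.

```python
def control_step(measured_latency_ms: float, reference_ms: float, replicas: int,
                 tolerance: float = 0.1, min_replicas: int = 1) -> int:
    """Compare the measured output with the reference input and trigger a
    correction (here: a change in the number of replicas)."""
    error = (measured_latency_ms - reference_ms) / reference_ms
    if error > tolerance:  # too slow: place an extra replica
        return replicas + 1
    if error < -tolerance and replicas > min_replicas:  # well under target: remove one
        return replicas - 1
    return replicas  # within tolerance: no adjustment

# Simulated measurements (made-up numbers) against a 200 ms reference.
replicas = 2
for latency in [260.0, 240.0, 205.0, 150.0]:
    replicas = control_step(latency, 200.0, replicas)
print(replicas)  # 3
```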
Handling flash crowds
Observation: We need dynamic adjustment to balance resource usage. Flash crowds introduce a serious problem.
Figure: Four panels, (a) to (d), covering time spans of 2 days, 2 days, 6 days, and 2.5 days, respectively.
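Dynamic adjustment first requires noticing that a flash crowd is under way. This minimal sketch uses one possible detector, which is an assumption and not taken from the slide: it flags a flash crowd when the average request count over a short recent window exceeds a factor times the long-run average. The window of 3 intervals and the factor of 3 are hypothetical.

```python
from collections import deque

class FlashCrowdDetector:
    """Flags a flash crowd when the recent request rate jumps well above the long-run average."""

    def __init__(self, window: int = 3, factor: float = 3.0) -> None:
        self.recent: deque[int] = deque(maxlen=window)  # counts of the last `window` intervals
        self.total = 0
        self.intervals = 0
        self.factor = factor

    def observe(self, requests_in_interval: int) -> bool:
        # Baseline: long-run average over all earlier intervals.
        baseline = self.total / self.intervals if self.intervals else None
        self.recent.append(requests_in_interval)
        self.total += requests_in_interval
        self.intervals += 1
        if not baseline:  # no history yet, or no traffic so far
            return False
        recent_rate = sum(self.recent) / len(self.recent)
        return recent_rate > self.factor * baseline

det = FlashCrowdDetector()
flags = [det.observe(c) for c in [100] * 10 + [1000]]
print(flags[-1])  # True: the jump to 1000 requests is flagged
```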
Server replication
Content Delivery Network: CDNs act as Web hosting services that replicate documents across the Internet, providing their customers with guarantees of high availability and performance (example: Akamai).
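Figure: 1. The client gets the base document from the origin server; 2. it receives a document with references to embedded documents; 3 and 4. DNS lookups, through the regular DNS system and the CDN DNS server, return the IP address of the best server for this client; 5. the client gets the embedded documents from that CDN server; 6. the CDN server gets the embedded documents from the origin server (if not already in its cache); 7. the embedded documents are returned to the client.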
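The CDN DNS server's role in steps 3 and 4 can be sketched as a lookup that returns the address of the "best" server. This minimal sketch assumes that "best" means the lowest measured round-trip time from the client's network; the networks, server names, RTT values, and addresses are all hypothetical, and real CDNs weigh more factors, such as server load.

```python
# Hypothetical round-trip times (ms) from client networks to CDN servers.
RTT_MS = {
    "198.51.100.0/24": {"cdn-ams": 12.0, "cdn-nyc": 85.0, "cdn-sin": 190.0},
    "203.0.113.0/24": {"cdn-ams": 170.0, "cdn-nyc": 210.0, "cdn-sin": 25.0},
}
CDN_SERVER_IPS = {"cdn-ams": "192.0.2.1", "cdn-nyc": "192.0.2.2", "cdn-sin": "192.0.2.3"}

def resolve_for_client(client_net: str) -> str:
    """CDN DNS server: return the IP address of the best server for this client,
    here simply the one with the lowest measured RTT (assumed criterion)."""
    rtts = RTT_MS[client_net]
    best = min(rtts, key=rtts.get)
    return CDN_SERVER_IPS[best]

print(resolve_for_client("198.51.100.0/24"))  # 192.0.2.1
print(resolve_for_client("203.0.113.0/24"))  # 192.0.2.3
```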
Replication of Web applications
Observation: Replication becomes more difficult when dealing with databases and the like. There is no single best solution.
Assumption: Updates are carried out at the origin server and propagated to edge servers.
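The assumption that all updates happen at the origin and flow outward can be sketched as follows. This minimal sketch assumes eager, synchronous push of every update to every edge server; a real system may batch or delay propagation, in which case edge servers can briefly serve stale data. The class names and the key-value data model are hypothetical.

```python
from typing import Optional

class EdgeServer:
    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        self.data[key] = value

    def read(self, key: str) -> Optional[str]:
        return self.data.get(key)

class OriginServer:
    """All updates are carried out here and then pushed to every edge server."""

    def __init__(self, edges: list[EdgeServer]) -> None:
        self.data: dict[str, str] = {}
        self.edges = edges

    def update(self, key: str, value: str) -> None:
        self.data[key] = value
        for edge in self.edges:  # eager, synchronous propagation (assumed)
            edge.apply(key, value)

edges = [EdgeServer(), EdgeServer()]
origin_server = OriginServer(edges)
origin_server.update("price:pen", "1.50")
print([e.read("price:pen") for e in edges])  # ['1.50', '1.50']
```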
Replication of Web applications: normal
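Figure: Edge-server side and origin-server side. On the edge-server side, a client's request goes to a Web server and its application logic, which can use a content-blind cache, a content-aware cache (holding the database schema), or a database copy. On the origin-server side, a Web server and its application logic use the authoritative database and its schema. The two sides interact through queries and responses, full/partial data replication, or full schema replication/query templates.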
Replication of Web applications
Alternative solutions
Full replication: suited to a high read/write ratio, often in combination with complex queries.
Partial replication: suited to a high read/write ratio, but in combination with simple queries.
Content-aware caching: Check for queries at the local database, and subscribe for invalidations at the server. Works well with range queries and complex queries.
Content-blind caching: Simply cache the results of previous queries. Works great with simple queries that address unique results (e.g., no range queries).
Question: What can be said about replication versus performance?
Replication of Web applications: full/partial replication
Figure: The same edge-server/origin-server setup as in the normal case, here for full/partial data replication: the edge server keeps a database copy, filled by full or partial data replication from the authoritative database on the origin-server side.
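Partial replication can be sketched with two in-memory SQLite databases. In this minimal sketch the edge server holds a copy of only some tables (the REPLICATED set, hypothetical) and forwards queries on other tables to the origin; putting every table in REPLICATED turns it into full replication. Table names come from a fixed set, never from user input, which is why they may be formatted into the SQL text.

```python
import sqlite3

def make_db(tables: dict[str, list[tuple[int, str]]]) -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    for name, rows in tables.items():  # table names come from a fixed, trusted set
        db.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, value TEXT)")
        db.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)
    return db

# Hypothetical application data held by the authoritative database.
ALL_TABLES = {"products": [(1, "pen"), (2, "book")], "orders": [(1, "order-17")]}
origin_db = make_db(ALL_TABLES)

# Partial replication: the edge server keeps a copy of the products table only.
REPLICATED = {"products"}
edge_db = make_db({t: ALL_TABLES[t] for t in REPLICATED})

def edge_query(table: str, row_id: int) -> tuple[str, str]:
    """Answer from the local database copy if the table is replicated; otherwise ask the origin."""
    db, where = (edge_db, "edge") if table in REPLICATED else (origin_db, "origin")
    row = db.execute(f"SELECT value FROM {table} WHERE id = ?", (row_id,)).fetchone()
    return row[0], where

print(edge_query("products", 2))  # ('book', 'edge')
print(edge_query("orders", 1))  # ('order-17', 'origin')
```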
Replication of Web applications: content-aware caching
Figure: The same edge-server/origin-server setup, here for content-aware caching: the edge server's content-aware cache holds the database schema (through full schema replication/query templates) and answers the queries it can resolve locally; other queries go to the authoritative database on the origin-server side.
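Content-aware caching means the cache understands the queries it stores. This minimal sketch restricts queries to a hypothetical single-attribute range query ("price between lo and hi"), keeps cached rows in a list rather than a local database, and assumes the origin pushes invalidations naming the affected prices. A new query contained in a cached range is answered locally, which is exactly what a content-blind cache cannot do.

```python
from typing import Callable

Row = tuple[str, float]  # (product name, price)

class ContentAwareCache:
    def __init__(self, origin_query: Callable[[float, float], list[Row]]) -> None:
        self.origin_query = origin_query
        self.ranges: list[tuple[float, float, list[Row]]] = []  # cached (lo, hi, rows)

    def query(self, lo: float, hi: float) -> tuple[list[Row], bool]:
        """Return (rows, answered_locally)."""
        for c_lo, c_hi, rows in self.ranges:
            if c_lo <= lo and hi <= c_hi:  # contained in a cached range: filter locally
                return [r for r in rows if lo <= r[1] <= hi], True
        rows = self.origin_query(lo, hi)
        self.ranges.append((lo, hi, rows))
        return rows, False

    def invalidate(self, *prices: float) -> None:
        """Invalidation from the origin: drop every cached range an updated value falls in."""
        self.ranges = [(c_lo, c_hi, rows) for c_lo, c_hi, rows in self.ranges
                       if not any(c_lo <= p <= c_hi for p in prices)]

PRODUCTS = [("pen", 1.5), ("book", 12.0), ("lamp", 30.0)]  # hypothetical origin data

def origin_range_query(lo: float, hi: float) -> list[Row]:
    return [p for p in PRODUCTS if lo <= p[1] <= hi]

cache = ContentAwareCache(origin_range_query)
print(cache.query(0, 20))  # ([('pen', 1.5), ('book', 12.0)], False)
print(cache.query(10, 15))  # ([('book', 12.0)], True)
PRODUCTS[1] = ("book", 14.0)  # update at the origin: price 12.0 -> 14.0
cache.invalidate(12.0, 14.0)
print(cache.query(10, 15))  # ([('book', 14.0)], False)
```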
Replication of Web applications: content-blind caching
Figure: The same edge-server/origin-server setup, here for content-blind caching: the edge server's content-blind cache stores the responses to previous queries, while other queries go to the authoritative database on the origin-server side.
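A content-blind cache treats each query as an opaque key. In this minimal sketch results are keyed by the exact SQL text plus parameters, so only an identical query can hit; an exact lookup such as "WHERE id = ?" suits it, whereas range queries with different bounds never hit. Clearing the whole cache on any update is an assumed, deliberately coarse policy; finer-grained invalidation is possible but not shown. The products table is hypothetical.

```python
import sqlite3

class ContentBlindCache:
    """Caches query results keyed by the exact query text and parameters,
    without interpreting the query."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self.db = db
        self.results: dict[tuple[str, tuple], list[tuple]] = {}
        self.hits = 0

    def query(self, sql: str, params: tuple = ()) -> list[tuple]:
        key = (sql, params)  # the cache never looks inside the query
        if key in self.results:
            self.hits += 1
            return self.results[key]
        rows = self.db.execute(sql, params).fetchall()  # miss: ask the database
        self.results[key] = rows
        return rows

    def invalidate_all(self) -> None:
        """Not understanding the queries, the cache can only drop everything on an update."""
        self.results.clear()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO products VALUES (?, ?)", [(1, "pen"), (2, "book")])
cache = ContentBlindCache(db)
q = "SELECT name FROM products WHERE id = ?"
cache.query(q, (1,))
cache.query(q, (1,))  # identical query: served from the cache
cache.query(q, (2,))  # different parameter: a miss
print(cache.hits)  # 1
```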