Submitted by: Barak Pinhas 037584042, Gil Fiss 031801731, Laurent Levy 921012076
Post on 21-Dec-2015
Caching by proxy servers
• Since its introduction in 1990, the World-Wide Web has evolved from a simple client-server model into a complex distributed architecture.
• This evolution has been driven largely by the scaling problems associated with exponential growth.
• One of the core infrastructure components employed to meet the demands of this growth is caching by proxy servers.
• A proxy server is an intermediary program which acts as both a server and a client for the purpose of making requests on behalf of other clients.
• The proxy’s cache stores cacheable responses in order to reduce the response time and network bandwidth consumption on future, equivalent requests.
A proxy:
Proxy cache capacity
• The proxy cache must utilize a clean-up algorithm to determine which documents to throw away when its capacity limit is reached.
• When the proxy's free space drops to a certain limit, a number of web pages are cleared from its cache.
Proxy cluster
[Diagram: user, director, Proxy #1, Proxy #2, ..., Proxy #n, web server]
1. The user requests a web page.
2. The director selects a proxy to handle the request.
3. The proxy checks whether it has the requested page.
4. If the requested web page is not in the proxy’s cache, the proxy searches the neighboring proxies for it; if the page is not found there either, the proxy fetches it from the web server.
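The lookup order in steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Proxy` class, `build_cluster`, and the `origin` stand-in for the web server are hypothetical names, cache capacity is ignored, and keeping a local copy after a neighbor hit is an assumption.

```python
from typing import Callable, Dict, List, Tuple


class Proxy:
    """A proxy with an unbounded in-memory cache (capacity is ignored here)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, bytes] = {}
        self.neighbors: List["Proxy"] = []

    def handle(self, url: str, fetch_origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
        # Step 3: check the local cache.
        if url in self.cache:
            return self.cache[url], "local hit"
        # Step 4, first part: search the neighboring proxies.
        for neighbor in self.neighbors:
            if url in neighbor.cache:
                page = neighbor.cache[url]
                self.cache[url] = page  # assumption: keep a local copy
                return page, "cluster hit"
        # Step 4, second part: fall back to the web server.
        page = fetch_origin(url)
        self.cache[url] = page
        return page, "miss"


def build_cluster(n: int) -> List[Proxy]:
    """Create n proxies where every proxy neighbors all the others."""
    proxies = [Proxy(f"proxy-{i + 1}") for i in range(n)]
    for p in proxies:
        p.neighbors = [q for q in proxies if q is not p]
    return proxies


def origin(url: str) -> bytes:
    """Hypothetical stand-in for the web server."""
    return f"<html>{url}</html>".encode()


cluster = build_cluster(3)
print(cluster[0].handle("/index.html", origin)[1])  # miss
print(cluster[1].handle("/index.html", origin)[1])  # cluster hit
print(cluster[1].handle("/index.html", origin)[1])  # local hit
```

Steps 1 and 2 (the director's choice of proxy) are replaced here by picking proxies by hand.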
Proxy cache capacity
Methods to determine which web pages are cleared:
• LFU aging
Least Frequently Used aging. The document which has been requested least frequently is cleared from the cache.
• LRU aging
Least Recently Used aging. The document which has been requested least recently is cleared from the cache.
• Perfect LFU
Perfect Least Frequently Used. As in LFU, the document which has been requested least frequently is cleared from the cache, but the system keeps remembering the usage frequency record of a document after that document was erased.
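The three clean-up methods differ only in how the victim is chosen and in what is remembered afterwards. The sketch below is a simplified illustration under stated assumptions: capacity is counted in documents rather than bytes, one document is cleared at a time, ties in frequency are broken by least recent use, and no aging of the counters is modeled. The `Cache` class and the request trace are hypothetical.

```python
from collections import Counter, OrderedDict


class Cache:
    """Fixed-capacity cache; policy is 'LRU', 'LFU' or 'PLFU'."""

    def __init__(self, capacity: int, policy: str) -> None:
        self.capacity = capacity
        self.policy = policy
        self.docs: "OrderedDict[str, None]" = OrderedDict()  # least recently used first
        self.freq: Counter = Counter()  # request count per document

    def request(self, url: str) -> bool:
        """Record a request; return True on a hit, False on a miss."""
        self.freq[url] += 1
        if url in self.docs:
            self.docs.move_to_end(url)  # mark as most recently used
            return True
        if len(self.docs) >= self.capacity:
            self._evict()
        self.docs[url] = None
        return False

    def _evict(self) -> None:
        if self.policy == "LRU":
            victim = next(iter(self.docs))  # least recently requested
        else:
            # Least frequently requested; min() keeps the first (oldest) on ties.
            victim = min(self.docs, key=lambda u: self.freq[u])
        del self.docs[victim]
        if self.policy == "LFU":
            del self.freq[victim]  # plain LFU forgets the count; PLFU keeps it


# Hypothetical trace: "x" is evicted once and then becomes popular again.
trace = ["x"] * 3 + ["y"] * 4 + ["z", "x", "x", "w", "x"]
for policy in ("LRU", "LFU", "PLFU"):
    cache = Cache(capacity=2, policy=policy)
    hits = sum(cache.request(url) for url in trace)
    print(policy, hits)
```

On this trace the remembered count lets PLFU keep "x" where plain LFU clears it again, which is the difference between the two methods.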
Project’s aims and measurements
Proxy servers that cache web pages can potentially reduce:
• The number of requests that arrive at the web servers (the load on web servers in units of web pages).
• The volume of network traffic resulting from document requests (the load on web servers in units of bytes).
• The latency that an end user experiences in retrieving a document.
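The first two reductions are commonly reported as an object hit ratio (share of requests served from the cache) and a byte hit ratio (share of bytes served from the cache), the terms used in the results. A minimal sketch, assuming a hypothetical request log of `(size_in_bytes, was_hit)` pairs:

```python
from typing import Iterable, Tuple


def hit_ratios(log: Iterable[Tuple[int, bool]]) -> Tuple[float, float]:
    """Return (object hit ratio, byte hit ratio) for a request log."""
    requests = hits = total_bytes = hit_bytes = 0
    for size, was_hit in log:
        requests += 1
        total_bytes += size
        if was_hit:
            hits += 1
            hit_bytes += size
    object_ratio = hits / requests if requests else 0.0
    byte_ratio = hit_bytes / total_bytes if total_bytes else 0.0
    return object_ratio, byte_ratio


# Hypothetical log: two small hits and two misses, one of them large.
log = [(1000, True), (2000, False), (500, True), (500, False)]
print(hit_ratios(log))  # (0.5, 0.375)
```

The two ratios diverge whenever hits and misses differ in size, which is why they are measured separately.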
Selection methods for the Director
1. RRM - Round Robin selection method
When a page request arrives, the director selects the proxy that is after the last selected proxy in a predetermined fixed order.
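A minimal sketch of this selection rule, assuming proxies are identified by indices 0 to n-1 (the `RoundRobinDirector` class is a hypothetical name):

```python
class RoundRobinDirector:
    """Selects proxies in a fixed cyclic order."""

    def __init__(self, n_proxies: int) -> None:
        self.n_proxies = n_proxies
        self.last = -1  # index of the last selected proxy; -1 means none yet

    def select(self) -> int:
        # Move to the proxy after the last selected one, wrapping around.
        self.last = (self.last + 1) % self.n_proxies
        return self.last


director = RoundRobinDirector(3)
print([director.select() for _ in range(7)])  # [0, 1, 2, 0, 1, 2, 0]
```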
2. Hash function
The hash key is the URL: when a new request arrives with a given URL, the request goes to proxy number hash(URL).
[Diagram: the director applies the hash function to the URL www.cs.technion.ac.il, obtains index k, and forwards the request to Proxy #k among Proxy #1 ... Proxy #n]
• The hash function is:
Hash(URL number) = <sum of the URL number's digits> mod <number of proxies>
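The formula can be written directly as a short function. This sketch assumes each URL is represented by a non-negative integer (the "URL number") and that the result, which ranges from 0 to n-1, is used as a zero-based proxy index; `hash_proxy` is a hypothetical name.

```python
def hash_proxy(url_number: int, n_proxies: int) -> int:
    """Sum of the decimal digits of url_number, modulo the number of proxies."""
    digit_sum = sum(int(digit) for digit in str(abs(url_number)))
    return digit_sum % n_proxies


print(hash_proxy(1234, 7))  # 1 + 2 + 3 + 4 = 10, and 10 mod 7 = 3
```

Because the same URL number always maps to the same proxy, a page cached anywhere in the cluster is cached on the proxy that receives the request.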
3. Unfinished jobs (TASKS)
The director evaluates the number of jobs that each proxy is currently handling and gives the request to the proxy with the fewest jobs at the time of the request.
[Diagram: number of tasks over time for Proxy #1, Proxy #2 and Proxy #3]
At the time of the request, proxy #1 has 1 task, proxy #2 has 2 tasks and proxy #3 has 2 tasks, so the director selects proxy #1, the proxy with the fewest tasks.
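The example above can be reproduced with a small sketch. The `select_least_loaded` function is a hypothetical name, and breaking ties in favor of the lowest proxy number is an assumption; the source does not say how ties are resolved.

```python
from typing import Dict


def select_least_loaded(active_tasks: Dict[int, int]) -> int:
    """Return the proxy number with the fewest unfinished tasks.

    Ties go to the lowest proxy number (an assumption).
    """
    return min(active_tasks, key=lambda proxy: (active_tasks[proxy], proxy))


# Task counts at the time of the request, as in the example.
tasks = {1: 1, 2: 2, 3: 2}
chosen = select_least_loaded(tasks)
tasks[chosen] += 1  # the selected proxy now has one more unfinished task
print(chosen)  # 1
```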
4. PLFU – perfect LFU
In this method each proxy has a PLFU value. (In the PLFU method the system remembers the usage frequency record of each request after that request was erased.)
5. BLFU – byte least frequently used
In this method each proxy has a BLFU value.
Expected results
• Object hit ratio:
• We expected the round-robin oriented selection methods (RRM, TASKS and PLFU/BLFU) to get a lower local hit ratio than the hash selection method, mainly because with the hash method a cluster hit always comes with a local hit. That is also why a hash function that is not even might lead to a lower cluster hit rate.
• Among the round-robin oriented selection methods, we expected the “even load distribution” methods like TASKS and PLFU/BLFU to get a better hit ratio than plain RRM, because we expected these methods to give an even distribution and therefore less pruning and a more adaptive cluster (a better cluster hit ratio).
• Byte hit ratio:
• We expected the methods that take request sizes into consideration to get a better byte hit ratio.
• Load distribution:
• We expected the methods that strive to achieve a better distribution (like RRM, TASKS and PLFU/BLFU) to get a better load distribution than the hash method.
• We expected the following ranking of results, from best to worst:
• BLFU/PLFU - because of the “perfect” technique used.
• TASK - because of the load distribution.
• RRM - because of a simple distribution.
• HASH – because the hash keying might not be even.
[Figure: Proxy Requests Distribution. x-axis: Proxy Number (1 to 7); y-axis: Proxy Requests Ratio (0 to 0.8); series: test #1 to test #5]
The results
In this graph we see a significant advantage for the hash selection method. The average hit rate for the hash selection method is about 62%-65%. A disadvantage of this method, however, is that the load distribution is not even. All the other methods produced a hit rate of about 50%.
Comparing the director selection methods
[Figure: Proxy Requests Distribution. x-axis: Proxy Number (1 to 7); y-axis: Proxy Requests Ratio (0 to 0.8); series: test #1 to test #11]
Hit rate
As we can see in the graph, the best hit rate was achieved with hash as the director method and LFU, LRU or PLFU as the proxy pruning method. Next comes the task method for the director with PLFU as the proxy pruning method.
Comparing all the possible director selection and proxy pruning methods
[Figure: Proxy Requests Bytes Distribution. x-axis: Proxy Number (1 to 7); y-axis: Proxy Requests Bytes Ratio (0 to 0.7); series: test #1 to test #11]
Byte hit rate
In this graph we can see again that the hash method for the director got the best results. Compared with the hit rate graph, the hash method's result differs more from the other methods' results in the byte hit rate.
[Figure: Load Distribution. x-axis: Proxy Number (1 to 6); y-axis: Proxy Requests Bytes (16,000,000,000 to 23,000,000,000); series: test #1 to test #11]
Load distribution
In this graph we see that the distribution among the proxies is not even. Proxy #1 got more requests than the other proxies when the hash method was chosen for the director, and proxy #5 got more requests when the tasks method was chosen for the director.
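One way to quantify how uneven a distribution is: compute each proxy's share of the requests and compare the busiest proxy with an even share. This is an illustrative sketch only; the `request_shares` function and the assignment log are hypothetical and do not reproduce the measured data.

```python
from collections import Counter
from typing import Dict, Iterable


def request_shares(assignments: Iterable[int], n_proxies: int) -> Dict[int, float]:
    """Fraction of all requests handled by each proxy (proxies numbered 1..n)."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {p: (counts[p] / total if total else 0.0) for p in range(1, n_proxies + 1)}


# Hypothetical log: the proxy number selected for each of eight requests.
log = [1, 1, 1, 1, 2, 3, 3, 4]
shares = request_shares(log, 4)
print(shares)                    # {1: 0.5, 2: 0.125, 3: 0.25, 4: 0.125}
print(max(shares.values()) * 4)  # 2.0: the busiest proxy carries twice an even share
```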