p-Jigsaw: A Cluster-based Web Server
with Cooperative Caching Supports
Ge Chen, Cho-Li Wang, Francis C.M. Lau (Presented by Cho-Li Wang)
The Systems Research Group, Department of Computer Science and Information Systems
The University of Hong Kong
What's a cluster?
A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone/complete computers cooperatively working together as a single, integrated computing resource – IEEE TFCC.
Rich Man's Cluster: Computational Plant (C-Plant cluster)
- Rank 30 on the TOP500 list (11/2001)
- 1536 Compaq DS10L 1U servers (466 MHz Alpha 21264 (EV6) microprocessor, 256 MB ECC SDRAM)
- Each node contains a 64-bit, 33 MHz Myrinet network interface card (1.28 Gbps) connected to a 64-port Mesh64 switch
- 48 cabinets, each of which contains 32 nodes (48 x 32 = 1536)
Poor Man's Cluster: HKU Linux Cluster
- 32 x 733 MHz Pentium III PCs, 392 MB memory each
- Hierarchical Ethernet-based network: four 24-port Fast Ethernet switches + one 8-port Gigabit Ethernet backbone switch
- Additional 80-port Cisco Catalyst 2980G Fast Ethernet switch
Cluster Computer Architecture (layered diagram)
- High-Speed LAN (Fast/Gigabit Ethernet, SCI, Myrinet)
- Availability Infrastructure
- Single System Image Infrastructure
- Programming Environment (Java, C, MPI, HPF)
- Web and Windows User Interfaces
- Other Subsystems (Database, Web server, OLTP, etc.)
- A set of nodes, each running its own OS
Talk Outline
- Motivation -- The Need for Speed
- Cluster-based Solutions
- System Architecture of p-Jigsaw
- Performance Evaluation
- Conclusion and Future Work
- Other SRG Projects
The Challenges
- Netscape Web site in November 1996: 120 million hits per day
- Microsoft Corp. Web site received more than 100 million hits per day (1,200 hits per second)
- Olympic Winter Games 1998 (Japan): 634.7 million hits in 16 days; peak day 57 million, peak minute 110K
- Wimbledon, July 1999: 942 million hits in 14 days; peak day 125 million, peak minute 430K
- Olympic Games 2000: peak day 502.6 million, peak minute 600K hits (10K hits per second)
The Need for Speed
- The Internet user population is growing very fast. According to the United States Internet Council's report, the number of regular Internet users grew from less than 9 million in 1993 to more than 300 million in the summer of 2000, and is still growing fast.
- Broadband is becoming popular. According to IDG's report, 57% of workers in the U.S. access the Internet via broadband in the office, and the figure will exceed 90% by 2005. Home broadband users will also increase from less than 9 million now to over 55 million by 2005.
- HTTP requests now account for a larger portion of Internet traffic. One study shows that HTTP activity has grown to account for 75%~80% of all Internet traffic.
The Need for Speed
- Growing user numbers
- Faster last-mile connection speeds
- HTTP requests account for an increasing portion of all Internet traffic
Together, these require a more powerful Web server architecture.
Cluster-Based Solution
Cluster -- a low-cost yet efficient parallel computing architecture
Dispatcher-based
A network component of the Web-server system acts as a dispatcher, routing each client request arriving from the Internet to one of the Web servers to achieve load balancing. Each Web server works individually.
- Layer-4 switching with layer-2 address translation: One-IP, IBM eNetwork, WebMux, LVS in DR mode
- Layer-4 switching with layer-3 address translation: Cisco LocalDirector, Alteon ACEDirector, F5 Big/IP, LVS in NAT mode
- Layer-7 switching (content-based): LARD, IBM Web Accelerator, Zeus Load Balancer (ZLB)
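The content-based (layer-7) approach can be sketched in a few lines. This is a minimal illustration of the idea behind LARD-style routing, not code from any of the products named above: the dispatcher hashes the request path so the same URL always goes to the same back-end node, preserving per-node cache locality. All class and node names here are hypothetical.

```java
import java.util.List;

// Minimal sketch of content-based (layer-7) dispatching: route each
// request URL to a fixed back-end node so that per-URL cache locality
// is preserved. Names are illustrative only.
public class UrlDispatcher {
    private final List<String> backends;

    public UrlDispatcher(List<String> backends) {
        this.backends = backends;
    }

    // Hash the request path so the same URL always maps to the same node.
    public String route(String path) {
        return backends.get(Math.floorMod(path.hashCode(), backends.size()));
    }

    public static void main(String[] args) {
        UrlDispatcher d = new UrlDispatcher(
                List.of("node1", "node2", "node3", "node4"));
        // Repeated requests for the same URL go to the same server node.
        System.out.println(d.route("/node4/dir2/pic1.jpg")
                .equals(d.route("/node4/dir2/pic1.jpg"))); // prints "true"
    }
}
```

A real layer-7 switch must also terminate the TCP connection before it can inspect the URL, which is what makes content-based dispatching costlier than layer-4 switching.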
p-Jigsaw -- Goals
- High Efficiency: exploit the aggregate power of cluster resources (CPU, memory, disk, network bandwidth); explore in-memory Web caching on cluster-based Web servers
- High Scalability: maintain a high cache hit rate and high throughput as the cluster size grows; eliminate potential bottlenecks in the overall design
- High Portability: multi-platform support; heterogeneous clusters
Main Features of p-Jigsaw Web Servers
- Global Object Space (GOS)
- Hot Objects Caching
- Cooperative Object Caching
- Distributed Cache Replacement Algorithms
Global Object Space
(All Web objects in the system are visible and accessible to every node through the GOS.)

[Figure: four server nodes, each running p-Jigsaw on a JVM on top of the local OS, connected by a high-speed LAN. The memory caches (Hot Object Caches) of all the nodes together form the Global Object Space.]
Example: handling the incoming request http://p-Jigsaw.org/node4/dir2/pic1.jpg

Each node holds a Hot Object Cache backed by its local hard disk.

Local Object Table (LOT) for Node 1:
Object URL               | AGAC  | LAC | HN
/node3/dir31/fig311.jpg  | 15000 | 100 | 3
/node1/dir12/fig121      | 11000 | 50  | 1
...                      | ...   | ... | ...

Global Object Table (GOT) for Node 1:
Object URL               | HN | AGAC  | CCNN
/node4/dir2              | 4  |       |
/node1/dir12/fig121.bmp  | 1  | 11000 | 2-3-4
...                      | ...| ...   | ...

Global Object Table for Node 4:
Object URL               | HN | AGAC  | CCNN
...                      | ...| ...   | ...
/node4/dir12/pic1.jpg    | 4  | 15000 | 2-3-4

HN: Home Node
AGAC: Approximated Global Access Counter
CCNN: Cache Copy Node Number
LAC: Local Access Counter

1. The request arrives at node 1 and misses in its Hot Object Cache.
2. Node 1 searches its GOT (by hashing).
3. The request is redirected to node 4, the home node of the requested page.
4-5. The cached copy is forwarded from node 2, 3, or 4, depending on the server workload.
6. The object is then cached in node 1.
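The lookup sequence above can be sketched in Java (p-Jigsaw's implementation language). The class and field names here are assumptions for illustration, not the real p-Jigsaw API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the GOS lookup path: try the local Hot Object Cache first,
// then consult the Global Object Table to find a node holding a cached
// copy, falling back to the object's home node (and finally to disk).
public class GosLookup {
    static class GotEntry {
        final int homeNode;     // HN
        final int[] copyNodes;  // CCNN: nodes holding cached copies
        GotEntry(int homeNode, int... copyNodes) {
            this.homeNode = homeNode;
            this.copyNodes = copyNodes;
        }
    }

    static final int DISK = -1;

    final Map<String, byte[]> hotObjectCache = new HashMap<>();
    final Map<String, GotEntry> globalObjectTable = new HashMap<>();

    // Returns the node that should serve the object: this node on a
    // local hit, otherwise a copy-holding node or the home node; DISK
    // means the object is not in the GOS and must come from the file server.
    int resolve(String url, int self) {
        if (hotObjectCache.containsKey(url)) return self;
        GotEntry e = globalObjectTable.get(url);
        if (e == null) return DISK;
        // p-Jigsaw picks among copy holders by server workload; for
        // simplicity this sketch just takes the first one.
        if (e.copyNodes.length > 0) return e.copyNodes[0];
        return e.homeNode;
    }
}
```

The key property is that a miss in the local cache does not immediately mean a disk access: the GOT turns most local misses into remote-memory fetches, which are far cheaper than going to the file server.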
Distributed Cache Replacement
Two LFU-based algorithms are implemented:
- LFU-Aging: AGAC / 2, applied every Δt
- Weighted-LFU: AGAC / (file size)
Global LRU (GLRU) is implemented for comparison.
The aim is to cache the "hottest" objects in the global object space. A cached object's lifetime is set according to its HTTP timestamp, and cache consistency is maintained by an invalidation scheme.
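The two priority computations named above can be written down directly; the surrounding eviction machinery and tie-breaking are omitted, and the method names are assumptions:

```java
// Sketch of the two LFU-based replacement priorities. The cache evicts
// the object whose key is smallest.
public class ReplacementKeys {
    // LFU-Aging: halve every AGAC each aging interval (delta t), so an
    // object that was hot long ago decays toward eviction unless it
    // keeps being accessed.
    static long aged(long agac) {
        return agac / 2;
    }

    // Weighted-LFU: divide the AGAC by the file size, so one large,
    // mildly popular object cannot crowd out many small hot objects.
    static double weightedKey(long agac, long fileSizeBytes) {
        return (double) agac / fileSizeBytes;
    }
}
```

Under Weighted-LFU a 1 MB object needs roughly a thousand times the access count of a 1 KB object to earn the same priority, which biases the cache toward keeping many small hot files.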
Update of Access Counters

GOT for Node 1:
Object URL / Partial URL | AGAC | HN
/node3/dir31/doc3.html   |      | 3
/node4/dir41             |      | 4
/node1/dir11/doc1.html   | 105  | 1
...                      | ...  | ...

LOT for Node 1:
Object URL               | LAC | AGAC | HN
/node3/dir32/pic3.jpg    | 45  | 200  | 3
/node1/dir11/doc1.html   | 24  | 105  | 1
...                      | ... | ...  | ...

GOT for Node 3:
Object URL               | AGAC | HN
/node3/dir31/doc3.html   | 50   | 3
/node3/dir32/pic3.jpg    | 200  | 3
...                      | ...  | ...

[Figure: nodes 1-4, each with its Hot Object Cache (HOC). Node 1 reports its LAC of 45 for /node3/dir32/pic3.jpg to node 3, whose AGAC becomes 45 + 200 = 245.]

The LAC is periodically sent back to each object's HN to maintain an approximate global access counter for every cached object.
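The update in the figure (45 + 200 = 245) amounts to folding each reported LAC into the home node's AGAC. A minimal sketch, with hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the periodic counter update: each node reports the Local
// Access Counter (LAC) it accumulated for an object to that object's
// home node, which adds it to the Approximated Global Access Counter
// (AGAC); the reporter then resets its LAC to zero.
public class AccessCounters {
    final Map<String, Long> agac = new HashMap<>(); // kept at the home node
    final Map<String, Long> lac = new HashMap<>();  // kept at each caching node

    // Count a hit served from this node's Hot Object Cache.
    void recordLocalHit(String url) {
        lac.merge(url, 1L, Long::sum);
    }

    // Called periodically: merge this node's LAC into the home node's AGAC.
    void reportTo(AccessCounters homeNode, String url) {
        long reported = lac.getOrDefault(url, 0L);
        homeNode.agac.merge(url, reported, Long::sum);
        lac.put(url, 0L); // reset after reporting
    }
}
```

Because reports are periodic rather than per-hit, the AGAC is only an approximation of the true global count, which is why the deck calls it "approximated"; the trade-off avoids a network message on every cache hit.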
Experiment Setup
- 32-node PC cluster; each node is a 733 MHz Pentium III PC running Linux 2.2.4.
- The nodes are connected with an 80-port Cisco Catalyst 2980G Fast Ethernet switch.
- An NFS server (2-way SMP) with a Gigabit Ethernet link to the switch.
- 16 nodes act as clients, and the rest as Web servers.
- Each of the server nodes has 392 MB of physical memory installed.
Experiment Results: Effects of Scaling the Cluster Size

[Chart: request throughput (requests/second, 0-600) with 64 MB HOC at each node, for 2, 4, 8, and 16 nodes. Series: Weighted-LFU with CC, LFU-Aging with CC, GLRU with CC, Weighted-LFU without CC, LFU-Aging without CC, and without GOS support. Annotated values: 4.56 and 2.02.]
Experiment Results: Effects of Scaling the Cache Size

[Chart: request throughput (requests/second, 0-600) on 16 nodes versus relative cache size (1.8%, 3.6%, 7.2%, 14.4%). Series: Weighted-LFU with CC, LFU-Aging with CC, GLRU with CC, Weighted-LFU without CC, LFU-Aging without CC.]

Aggregated cache size for 16 nodes = 1.8% (8 MB per node), 3.6%, 7.2%, and 14.4% (64 MB per node) of the size of the data set.
Analysis of Request Handling Patterns
- Local Cache Object (in local memory): the server that receives the request has the requested object in its local hot object cache.
- Peer Node Cache Object (in remote memory): the server that receives the request does not have the requested object in its local hot object cache; the object is fetched from either the home node or another peer node.
- Disk Object (local or remote disk): the requested object is not in the global object space and has to be fetched from the file server. This has the longest serving time.
Analysis of Request Handling Patterns

[Chart: Weighted-LFU with CC, 64 MB HOC at each node; fraction of requests (0-1) served as Local Cache Object, Peer Node Object, and Disk Object for 4, 8, and 16 nodes. Local cache hits are around 60%.]

The LFU-based algorithms show high local cache hit rates. With a 64 MB cache per node, the local cache hit rate is around 60% for both Weighted-LFU and LFU-Aging.
Analysis of Request Handling Patterns

[Chart: Weighted-LFU with CC, 8 MB HOC at each node; fraction of requests served as Local Cache Object, Peer Node Object, and Disk Object for 4, 8, and 16 nodes. Annotated values: ~6.7%, ~35.2%, ~50%, ~25%.]

With a small cache size (8 MB), cooperative caching can improve the global cache hit rate and reduce costly file-server disk accesses, which are a common bottleneck for a Web site.
Conclusions
- Using cluster-wide physical memory as an object cache can improve the performance and scalability of Web server systems.
- With a relatively small amount of memory dedicated to object content caching, we are able to achieve a high hit rate through cooperative caching.
- Favor replicating more hot objects rather than squeezing more distinct objects into the global object space.
Future Work
- The HKU "Hub2World" project: build a giant proxy cache server on a large PC cluster -- HKU's 300-node Gideon cluster -- based on p-Jigsaw.
- Cache hot objects in a 150 GB in-memory cache (0.5 GB x 300) plus 12 terabytes of disk space (40 GB x 300).
- Design of new caching algorithms.
Other SRG Projects
Welcome to download our software packages and test them on your clusters.
URL: http://www.srg.csis.hku.hk/
Current SRG Clusters
JESSICA2 -- A Distributed JVM

[Figure: a multithreaded Java program runs on a single system image spanning multiple JVMs, with thread migration across nodes and a global object space.]
JUMP Software DSM
- Allows programmers to assume a globally shared virtual memory, even when they execute programs on nodes that do not physically share memory.
- The DSM system maintains memory consistency among the different machines. Data faulting, location, and movement are handled by the DSM.

[Figure: processors 1 to N, each with its own memory, connected by a network and presenting a globally shared virtual memory.]
HKU DP-II on Gigabit Ethernet

[Charts: single-trip latency (µs) versus message size (4-1500 bytes) for DP-II back-to-back, DP-II through a switch, and TCP/IP (minimum latency: 16.3 µs); bandwidth (MB/s) versus message size for DP-II and TCP/IP (maximum bandwidth: 79.5 MB/s).]

For comparison:
- RWCP GigaE PM: 48.3 µs round-trip latency and 56.7 MB/s bandwidth on an Essential Gigabit Ethernet NIC with a 400 MHz Pentium II.
- RWCP GigaE PM II: 44.6 µs round-trip time and 98.2 MB/s bandwidth on a Packet Engines G-NIC II connecting Compaq XP-1000s (Alpha 21264 at 500 MHz).
SPARKLE Project
A dynamic software architecture for pervasive computing -- "Computing in Small"

[Figure: an application distributed as a monolithic block won't fit on a small device; our component-based solution decomposes applications into facets.]
SPARKLE Project

[Figure: overview of the proposed software architecture. Clients (Linux + JVM) issue facet queries to intelligent proxies, which handle facet retrieval from facet servers run by service providers, cooperative caching (for user mobility), and delegation/mobile code to execution servers on a computational grid; clients also interact peer-to-peer.]
ClusterProbe: Cluster Monitoring Tool
Q&A
For more information, please visit http://www.csis.hku.hk/~clwang