OCLC Online Computer Library Center
Parallel Text Searching on a Beowulf Cluster using SRW
Ralph LeVan, OCLC Research


Posted on 27-Mar-2015


Goal

Demonstrate 100 searches/second on our 50-million-record WorldCat database residing on a small Beowulf cluster

Beowulf Cluster

24 nodes, each with:
– 2 × 2.8 GHz Xeon CPUs
– 4 GB of memory

80 GB of disk on the 23 application nodes

130 GB of disk on the root node

Database

50 million records

69 partitions (~700,000 records each)
– 3 partitions per application node

Partitioned by popularity

Searched using OCLC Research's open-source Gwen and Pears toolkits
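The partition counts follow from the shape of the cluster. With 23 application nodes holding 3 partitions each, there are 69 partitions. An even split of 50 million records would put about 725,000 records in each partition, which matches the "~700,000" figure. The even split is an assumption for this sketch only: partitioning by popularity may well produce partitions of unequal size.

```python
# Partition layout arithmetic from the Database and Beowulf Cluster slides.
# Assumes an even split of records, which the slides do not state.
TOTAL_RECORDS = 50_000_000
APPLICATION_NODES = 23
PARTITIONS_PER_NODE = 3

partitions = APPLICATION_NODES * PARTITIONS_PER_NODE
records_per_partition = TOTAL_RECORDS / partitions

print(partitions)                    # 69
print(round(records_per_partition))  # 724638
```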

Architecture

1 Tomcat on each application node

3 SRW/U databases configured for each Tomcat

1 client application on the root node
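From the root node's point of view, this layout gives 69 independent SRW/U databases: 3 per Tomcat across 23 Tomcats. The sketch below simply enumerates them. The host names (`node01` to `node23`), port 8080, and the `/SRW/search/dbN` path are hypothetical placeholders, because the slides give no addresses.

```python
from typing import List

# Hypothetical naming: the slides do not give host names, ports, or database
# names, so "nodeNN", port 8080 and "dbN" are placeholders.
APPLICATION_NODES = 23
DATABASES_PER_TOMCAT = 3

def endpoints(nodes: int = APPLICATION_NODES,
              per_node: int = DATABASES_PER_TOMCAT) -> List[str]:
    """Return one SRW/U base URL per partition database."""
    urls = []
    for n in range(1, nodes + 1):
        for d in range(1, per_node + 1):
            urls.append(f"http://node{n:02d}:8080/SRW/search/db{d}")
    return urls

urls = endpoints()
print(len(urls))  # 69
print(urls[0])    # http://node01:8080/SRW/search/db1
```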

Trial #1

SRW client searching 69 databases

Result:

2 searches/second (437 ms/search)

Ganglia Cluster Report shows the root node glowing red and the application nodes a peaceful blue
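In this trial the root-node client sends each query out to all 69 databases and merges the per-database hit counts. The sketch below shows that pattern and makes several assumptions. The slides do not say whether the original client issued the 69 requests concurrently. The `Backend` callables are stand-ins for real SRW requests. `fan_out` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# A backend runs one query against one partition database and returns its
# hit count. In a real client it would wrap an SRW or SRU request.
Backend = Callable[[str], int]

def fan_out(query: str, backends: Sequence[Backend]) -> int:
    """Send one query to every backend concurrently and sum the hit counts."""
    if not backends:
        return 0
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        counts = pool.map(lambda search: search(query), backends)
        return sum(counts)

# Example with 69 fake backends standing in for the 69 partitions.
fake_backends = [lambda q, hits=i: hits for i in range(69)]
print(fan_out("dc.title=beowulf", fake_backends))  # 2346
```

Every request, every response, and the merge step all run on the root node in this arrangement. That is consistent with the Ganglia picture of a busy root node and idle application nodes.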

Trial #2

SRU client with scanned response, searching 69 databases

Result:

25 searches/second (40 ms/search)

Ganglia Cluster Report still shows the root node glowing red and the application nodes a peaceful blue
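Here is one possible version of an "SRU client with scanned response". It assumes that "scanned" means the client reads the hit count directly out of the response text rather than parsing the whole XML document. SRU carries the query as URL parameters on an HTTP GET, so the client neither builds nor parses a SOAP envelope. The parameter names below follow SRU 1.1. Other details are placeholders: the base URL, the CQL index `dc.title`, and the choice of `maximumRecords=0` (asking for a hit count only).

```python
import re
from urllib.parse import urlencode

# Matches <numberOfRecords> with or without a namespace prefix (e.g. srw:).
_NUMBER_OF_RECORDS = re.compile(r"<(?:\w+:)?numberOfRecords>\s*(\d+)\s*</")

def sru_url(base: str, cql_query: str, maximum_records: int = 0) -> str:
    """Build an SRU searchRetrieve GET URL using SRU 1.1 parameter names."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(maximum_records),
    }
    return f"{base}?{urlencode(params)}"

def scan_hit_count(response_text: str) -> int:
    """Pull the hit count out of the raw response text without building a DOM."""
    match = _NUMBER_OF_RECORDS.search(response_text)
    if match is None:
        raise ValueError("numberOfRecords not found in response")
    return int(match.group(1))

print(sru_url("http://node01:8080/SRW/search/db1", "dc.title=beowulf"))
print(scan_hit_count("<srw:numberOfRecords>42</srw:numberOfRecords>"))  # 42
```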

Trial #3

SRW client with hand-built XML and scanned response, searching 69 databases

Result:

21 searches/second (46 ms/search)

Ganglia Cluster Report still shows the root node glowing red and the application nodes a peaceful blue

SRW dropped
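A "hand-built XML" SRW client presumably formats the SOAP request as a string instead of going through a SOAP toolkit. The sketch below takes that approach and assumes SRW 1.1 element names plus the `http://www.loc.gov/zing/srw/` namespace. `srw_request` is a hypothetical helper. The response would still need its hit count extracted, for example by scanning the text.

```python
from xml.sax.saxutils import escape

# SOAP 1.1 envelope with an SRW searchRetrieveRequest body, filled in by
# string formatting instead of a SOAP toolkit. Element names follow SRW 1.1.
_ENVELOPE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
    "<SOAP-ENV:Body>"
    '<srw:searchRetrieveRequest xmlns:srw="http://www.loc.gov/zing/srw/">'
    "<srw:version>1.1</srw:version>"
    "<srw:query>{query}</srw:query>"
    "<srw:maximumRecords>{maximum_records}</srw:maximumRecords>"
    "</srw:searchRetrieveRequest>"
    "</SOAP-ENV:Body>"
    "</SOAP-ENV:Envelope>"
)

def srw_request(cql_query: str, maximum_records: int = 0) -> str:
    """Return a ready-to-POST SRW request body for one CQL query."""
    # escape() handles &, < and > so the query cannot break the XML.
    return _ENVELOPE.format(query=escape(cql_query),
                            maximum_records=int(maximum_records))

print(srw_request('dc.title="a & b"'))
```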

Rearchitecture

Problem: Ganglia reports indicate that the client is the bottleneck

Solution: Put a 3-way federator on each Tomcat (a virtual database for the client) and have the client search 23 databases instead of 69
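To the client, a 3-way federator looks like one more database, but it forwards each query to the three local partitions and sums their hit counts. That moves the merge work for each group of three off the root node. The sketch below uses in-memory stand-ins. `Partition` and `Federator` are hypothetical classes, not Gwen or Pears APIs.

```python
from typing import List, Protocol

class Searchable(Protocol):
    def search(self, query: str) -> int: ...

class Partition:
    """Stand-in for one physical database that always reports `hits` matches."""
    def __init__(self, hits: int) -> None:
        self.hits = hits

    def search(self, query: str) -> int:
        return self.hits

class Federator:
    """Virtual database: forwards a query to each child and sums the hit counts."""
    def __init__(self, children: List[Searchable]) -> None:
        self.children = children

    def search(self, query: str) -> int:
        return sum(child.search(query) for child in self.children)

# 23 Tomcats, each exposing one 3-way federator over its local partitions.
tomcats = [Federator([Partition(1) for _ in range(3)]) for _ in range(23)]

# The root-node client now sends 23 requests per query instead of 69.
print(len(tomcats))                                        # 23
print(sum(t.search("dc.title=beowulf") for t in tomcats))  # 69
```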

Result

SRU client: 71 searches/second (14 ms/search)

Hand-built SRW client: 33 searches/second (30 ms/search)

Original SRW client: 6 searches/second (164 ms/search)

Ganglia cluster report still shows the root node red, but the application nodes are now green and yellow
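For a client that runs one search at a time, searches/second is roughly 1000 divided by the per-search time in milliseconds. Every throughput/latency pair reported so far fits that relation, as this quick check shows. The serial-client reading is an inference from the numbers; the slides do not state it.

```python
# Per-search times (ms) and reported throughputs (searches/second) from the trials.
measurements = {
    "Trial #1, SRW client": (437, 2),
    "Trial #2, SRU client": (40, 25),
    "Trial #3, hand-built SRW client": (46, 21),
    "Federated, SRU client": (14, 71),
    "Federated, hand-built SRW client": (30, 33),
    "Federated, original SRW client": (164, 6),
}

for label, (ms, reported) in measurements.items():
    implied = 1000 / ms  # searches per second if searches run one after another
    print(f"{label}: {implied:.1f}/s implied, {reported}/s reported")
```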

Rearchitecture

Create a virtual 23-way database that federates searches across the 23 virtual 3-way databases

Put one of these on each Tomcat

Create a new client that sends searches on threads to each available 23-way database
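Because every Tomcat now hosts a 23-way virtual database, any one of them can answer a complete search. The client can therefore treat the 23 entry points as interchangeable and keep many searches in flight at once. The sketch below assumes round-robin assignment of queries to entry points; the slides say only that searches go out on threads to each available 23-way database. `run_searches` and the callable entry points are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# An entry point is one 23-way virtual database: calling it runs a complete
# federated search and returns the total hit count. Real code would wrap an
# SRU request to that Tomcat.
EntryPoint = Callable[[str], int]

def run_searches(queries: Sequence[str], entry_points: Sequence[EntryPoint],
                 threads: int = 23) -> List[int]:
    """Run whole queries on `threads` worker threads, spreading them
    round-robin across the available entry points."""
    # Fix the query-to-entry-point assignment up front so workers share no state.
    assigned = [entry_points[i % len(entry_points)] for i in range(len(queries))]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in the same order as `queries`.
        return list(pool.map(lambda entry, query: entry(query), assigned, queries))

# Example: 23 fake entry points that report which one served the query.
fake_entry_points = [lambda q, n=n: n for n in range(23)]
print(run_searches([f"q{i}" for i in range(25)], fake_entry_points)[-3:])  # [22, 0, 1]
```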

Result

With 23 threads: 172 searches/second
– Average response time of 170 ms

The Ganglia report showed all nodes running red