an investigation into implementations of dna sequence pattern matching algorithms peden nichols...

10
An Investigation into Implementations of DNA Sequence Pattern Matching Algorithms Peden Nichols Computer Systems Research April, 2004-2005

Upload: rosalind-berry

Post on 05-Jan-2016

214 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: An Investigation into Implementations of DNA Sequence Pattern Matching Algorithms Peden Nichols Computer Systems Research April, 2004-2005

An Investigation into Implementations of DNA

Sequence Pattern Matching Algorithms

Peden NicholsComputer Systems

ResearchApril, 2004-2005

Page 2: An Investigation into Implementations of DNA Sequence Pattern Matching Algorithms Peden Nichols Computer Systems Research April, 2004-2005

Abstract

Purpose: To investigate the relative practicality of the three

main computational methods (supercomputing, clusters, and grid computing) as applications to the problem of DNA sequence pattern matching

To establish data for local unused processor time and the degree of invasiveness of various backgrounding methods, for possible applications to grid computing implementations of BLAST

To quantify the cost to efficiency when adding processors for various BLAST algorithms.

Page 3: An Investigation into Implementations of DNA Sequence Pattern Matching Algorithms Peden Nichols Computer Systems Research April, 2004-2005

Abstract

Methods Grid Computing

MPI (Message Passing Interface) implementations of BLAST algorithms

Systems Lab Resources: 40+ networked desktop machines, 16 machines currently mpi-enabled

Backgrounding Algorithms run in the background while computer is in

use To what extent do users notice performance change?

Dynamic load balancing Reallocate work as new processors become

available/current processors become unavailable

Page 4: An Investigation into Implementations of DNA Sequence Pattern Matching Algorithms Peden Nichols Computer Systems Research April, 2004-2005

Background

Huge amounts of genetic data Human Genome Project The Institute for Genomic Research (TIGR)

Current computational resources overwhelmedDecoding genome not profitable (yet)Independent/Government organizations don't

have the money

Page 5: An Investigation into Implementations of DNA Sequence Pattern Matching Algorithms Peden Nichols Computer Systems Research April, 2004-2005

Background

Two approaches to solutionMake the algorithms run faster

- mpiBLAST with dynamic load balancing- Port algorithm implementations to clusters,

supercomputersIncrease utilization of current computational

resources- Schools, labs across the country with lots of idle time- Most processors are nowhere near 100% load- Applications for other computationally intensive

problems

Page 6: An Investigation into Implementations of DNA Sequence Pattern Matching Algorithms Peden Nichols Computer Systems Research April, 2004-2005

Procedure

1 – Demonstrate potential for greater utilization of local computational resources- CPU Load Perl Script

- Records one-minute load averages every second- Produces graphable results- Sample output:

Page 7: An Investigation into Implementations of DNA Sequence Pattern Matching Algorithms Peden Nichols Computer Systems Research April, 2004-2005

Procedure

As the graphs show, CPU usage on most processors is close to zero before I start running tests on other students' computers.

These graphs show CPU load averages (a measure of processor use) of computers being used by students during a systems lab class.

Page 8: An Investigation into Implementations of DNA Sequence Pattern Matching Algorithms Peden Nichols Computer Systems Research April, 2004-2005

Procedure

But did students even notice when the processor use on their computers spiked over 100% above normal levels?

No! Seven out of seven students tested reported no noticeable change in performance.

Page 9: An Investigation into Implementations of DNA Sequence Pattern Matching Algorithms Peden Nichols Computer Systems Research April, 2004-2005

Procedure

Conclusions:- Average Systems Lab students use nowhere near their processor's full capabilities, during class time- That processor time/power is effectively going to waste- We haven't even mentioned the time when there is no user logged in to a given computer!

Think how much time is just wasted at the login screen...

Page 10: An Investigation into Implementations of DNA Sequence Pattern Matching Algorithms Peden Nichols Computer Systems Research April, 2004-2005

Where it goes from here...

Taking advantage of unused time-Develop automated backgrounding of BLAST algorithms-Long-term tests of CPU usage-Attempt to realize increase in long-term usage with new applications

Optimizing the algorithms-Test BLAST algorithms against one another-Test algorithms on different machines-Investigate supercomputer, cluster implementations