An Investigation into Implementations of DNA Sequence Pattern Matching Algorithms
Peden Nichols
Computer Systems Research
April, 2004-2005
Abstract
Purpose:
- To investigate the relative practicality of the three main computational methods (supercomputing, clusters, and grid computing) as applied to the problem of DNA sequence pattern matching
- To establish data on local unused processor time and the degree of invasiveness of various backgrounding methods, for possible application to grid computing implementations of BLAST
- To quantify the efficiency cost of adding processors for various BLAST algorithms
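One common way to put a number on the efficiency cost of adding processors is parallel efficiency: speedup divided by processor count. The sketch below assumes that definition; the slides do not say which metric was used, and the timings are hypothetical values for illustration, not measurements.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T1 / Tp: how many times faster the parallel run is."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Parallel efficiency E = S / p; 1.0 would mean perfect scaling."""
    return speedup(t_serial, t_parallel) / processors


# Hypothetical run times in seconds, keyed by processor count.
# These are illustrative only and do not come from the project.
timings = {1: 120.0, 2: 66.0, 4: 40.0, 8: 30.0}

for p, t in timings.items():
    print(f"{p} processors: efficiency {efficiency(timings[1], t, p):.2f}")
```

An efficiency that falls as processors are added is the cost the third goal refers to.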
Methods
Grid Computing
- MPI (Message Passing Interface) implementations of BLAST algorithms
- Systems Lab resources: 40+ networked desktop machines, 16 machines currently MPI-enabled
Backgrounding
- Algorithms run in the background while the computer is in use
- To what extent do users notice a performance change?
Dynamic load balancing
- Reallocate work as new processors become available or current processors become unavailable
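The dynamic load balancing idea can be sketched with a shared work queue: workers pull a chunk whenever they are free, and a worker that drops out returns its chunk for someone else to pick up. This is a minimal illustration using Python threads as stand-ins for machines, not the MPI implementation the project uses. The names `run_pool` and `process` are hypothetical.

```python
import queue
import threading


def run_pool(chunks, worker_names, process):
    """Hand out chunks on demand; return (results in order, chunks done per worker)."""
    work = queue.Queue()
    for item in enumerate(chunks):
        work.put(item)
    results = {}
    done_by = {name: 0 for name in worker_names}
    lock = threading.Lock()

    def worker(name):
        while True:
            with lock:
                if len(results) == len(chunks):
                    return  # every chunk has been processed
            try:
                index, chunk = work.get(timeout=0.05)
            except queue.Empty:
                continue  # a chunk may still be in flight on another worker
            try:
                out = process(name, chunk)
            except Exception:
                # This worker became unavailable: give the chunk back and stop.
                work.put((index, chunk))
                return
            with lock:
                results[index] = out
                done_by[name] += 1

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results.get(i) for i in range(len(chunks))], done_by


def process(name, chunk):
    """Stand-in for real work; the worker named 'flaky' always drops out."""
    if name == "flaky":
        raise RuntimeError("machine reclaimed by its user")
    return chunk.upper()
```

Calling `run_pool(["acgt", "ttga", "ccat"], ["steady", "flaky"], process)` finishes all three chunks on the steady worker. Because work is pulled rather than assigned up front, a newly available worker needs no special handling: it simply starts taking chunks.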
Background
Huge amounts of genetic data
- Human Genome Project
- The Institute for Genomic Research (TIGR)
Current computational resources overwhelmed
- Decoding genome not profitable (yet)
- Independent/government organizations don't have the money
Two approaches to solution
Make the algorithms run faster
- mpiBLAST with dynamic load balancing
- Port algorithm implementations to clusters, supercomputers
Increase utilization of current computational resources
- Schools, labs across the country with lots of idle time
- Most processors are nowhere near 100% load
- Applications for other computationally intensive problems
Procedure
1 – Demonstrate potential for greater utilization of local computational resources
- CPU Load Perl Script
- Records one-minute load averages every second
- Produces graphable results
- Sample output: [graphs of CPU load averages]
These graphs show CPU load averages (a measure of processor use) for computers being used by students during a Systems Lab class.
As the graphs show, CPU usage on most machines was close to zero before I started running tests on other students' computers.
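The load recording behind these graphs can be approximated in a few lines. The project used a Perl script; the sketch below is a Python stand-in, not that script. It assumes a Unix-like system where `os.getloadavg()` is available, and the function names and CSV column names are hypothetical.

```python
import os
import time


def sample_load(samples, interval=1.0, read_load=os.getloadavg, sleep=time.sleep):
    """Return (elapsed_seconds, one_minute_load) pairs, one per sample."""
    rows = []
    start = time.monotonic()
    for i in range(samples):
        one_minute = read_load()[0]  # first element is the 1-minute average
        rows.append((round(time.monotonic() - start, 1), one_minute))
        if i < samples - 1:
            sleep(interval)  # wait before taking the next sample
    return rows


def to_csv(rows):
    """Format the samples as CSV text that a plotting tool can read."""
    lines = ["elapsed_s,load_1min"]
    lines += [f"{elapsed},{load}" for elapsed, load in rows]
    return "\n".join(lines)
```

For example, `print(to_csv(sample_load(60)))` would record one minute of samples at one-second intervals. `read_load` and `sleep` are parameters only so the sampler can be exercised without waiting on a real clock.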
But did students even notice when the processor use on their computers spiked over 100% above normal levels?
No! Seven out of seven students tested reported no noticeable change in performance.
Conclusions:
- Average Systems Lab students use nowhere near their processors' full capabilities during class time
- That processor time/power is effectively going to waste
- We haven't even mentioned the time when there is no user logged in to a given computer!
Think how much time is just wasted at the login screen...
Where it goes from here...
Taking advantage of unused time
- Develop automated backgrounding of BLAST algorithms
- Long-term tests of CPU usage
- Attempt to realize increase in long-term usage with new applications
Optimizing the algorithms
- Test BLAST algorithms against one another
- Test algorithms on different machines
- Investigate supercomputer, cluster implementations
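One possible starting point for automated backgrounding is to launch the job at the lowest scheduling priority, so that interactive work on the machine is served first. The sketch below assumes a Unix-like system and is not the project's method. The helper name `run_in_background` is hypothetical, and the command passed in stands for whatever BLAST command line the lab actually uses.

```python
import os
import subprocess
import sys


def run_in_background(command, niceness=19):
    """Start command as a child process at reduced priority (Unix only)."""
    return subprocess.Popen(
        command,
        # Runs in the child just before exec: raise its niceness value.
        preexec_fn=lambda: os.nice(niceness),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


# Placeholder job: a short Python one-liner standing in for a BLAST run.
job = run_in_background([sys.executable, "-c", "print('working')"])
job.wait()
```

A niceness of 19 is the lowest priority on most Unix systems. Whether that alone keeps the job unnoticeable to users is the kind of question the long-term CPU usage tests would need to answer.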