

CAMPAIGN – Clustering Algorithms in Modular, Parallel, and Accelerated Implementation for GPU Nodes

Kai J. Kohlhoff (1), Marc Sosnick (4), Vijay S. Pande (2,3), and Russ B. Altman (1,3)

Departments of (1) Bioengineering, (2) Chemistry, and (3) Computer Science, Stanford University, CA 94305
(4) Department of Computer Science, San Francisco State University, CA 94132

Introduction

Clustering algorithms are of central interest in computer science and an essential component of many data analysis toolkits. An active research area is the analysis of a wide variety of biological data sets [1-4].

Massively parallel graphics processing units (GPUs) outperform CPUs in raw floating-point throughput by one to two orders of magnitude. They are thus rapidly gaining popularity in computational research and data analysis.

A number of parallel K-means implementations on GPUs have been published [5,6], but an extended set of open-source clustering codes for easy use and benchmarking is still missing. The CAMPAIGN GPU library attempts to fill this gap.

Results

We demonstrate the speedup achieved by a GPU (Nvidia Tesla C1060) over a CPU (Intel Xeon E5420) for K-means and K-centers with the Euclidean distance metric. In addition, we show the performance gain in comparison with K-means from the MATLAB software package. The resulting clusters were identical.

Discussion

The algorithms in the CAMPAIGN library achieve up to two orders of magnitude performance improvement over conventional CPU implementations on a single processor. They greatly reduce the time required for data clustering, making it possible to create new data analysis protocols based on higher throughput, finer sampling, or higher numbers of iterations or repetitions.

Creating optimized GPU-specific code requires more effort than writing conventional CPU code. CAMPAIGN seeks to create a common, open-source platform. It is modular, making it easy to adapt to specific problems, and extensible, resulting in high flexibility. We hope that the library will be adopted and grow through contributions from the scientific community. CAMPAIGN will soon be freely available for download from SimTK.org [10].

Materials and Methods

CAMPAIGN modules

[Figure: flowchart for a typical clustering algorithm, e.g. K-means.]

Different modules for CPU and GPU can be combined for preprocessing, clustering, and distance calculations, according to the user’s needs.
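As a rough illustration of how interchangeable modules might fit together, the following plain C sketch runs the two K-means steps on the CPU with the distance calculation passed in as a function pointer. It is not taken from CAMPAIGN: the names `distance_fn`, `squared_euclidean`, `kmeans_assign`, and `kmeans_update`, the row-major data layout, and the caller-provided scratch buffers are all assumptions made for this example.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical "distance module": distance between two D-dimensional points. */
typedef float (*distance_fn)(const float *a, const float *b, int D);

/* Squared Euclidean distance. It selects the same nearest centroid as the
   Euclidean distance because the square root is monotonic. */
float squared_euclidean(const float *a, const float *b, int D)
{
    float sum = 0.0f;
    for (int d = 0; d < D; ++d) {
        float diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

/* Assignment step: label each of the N points (X is N x D, row-major) with the
   index of its nearest centroid (C is K x D). Returns how many labels changed. */
int kmeans_assign(const float *X, int N, int D,
                  const float *C, int K, int *assign, distance_fn dist)
{
    assert(N >= 0 && D > 0 && K > 0);
    int changed = 0;
    for (int i = 0; i < N; ++i) {
        int best = 0;
        float best_dist = FLT_MAX;
        for (int k = 0; k < K; ++k) {
            float dk = dist(&X[(size_t)i * D], &C[(size_t)k * D], D);
            if (dk < best_dist) { best_dist = dk; best = k; }
        }
        if (assign[i] != best) { assign[i] = best; ++changed; }
    }
    return changed;
}

/* Update step: move each centroid to the mean of its assigned points.
   sums (K x D) and counts (K) are scratch buffers supplied by the caller.
   A cluster with no points keeps its previous centroid. */
void kmeans_update(const float *X, int N, int D, float *C, int K,
                   const int *assign, float *sums, int *counts)
{
    for (int k = 0; k < K; ++k) {
        counts[k] = 0;
        for (int d = 0; d < D; ++d) sums[(size_t)k * D + d] = 0.0f;
    }
    for (int i = 0; i < N; ++i) {
        int k = assign[i];
        ++counts[k];
        for (int d = 0; d < D; ++d)
            sums[(size_t)k * D + d] += X[(size_t)i * D + d];
    }
    for (int k = 0; k < K; ++k) {
        if (counts[k] == 0) continue;
        for (int d = 0; d < D; ++d)
            C[(size_t)k * D + d] = sums[(size_t)k * D + d] / (float)counts[k];
    }
}
```

A caller would alternate `kmeans_assign` and `kmeans_update` until `kmeans_assign` returns 0. Swapping the metric only means passing a different function with the same signature.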

Graphics processing units

Initially, CAMPAIGN supports Nvidia graphics hardware using the ‘C for CUDA’ parallel API.

[Figure: Nvidia graphics hardware.]

GPU vs. CPU raw performance

[Figure: floating-point operations per second, GPU vs. CPU. GPU chips labelled in the plot:]

GT200 = GeForce GTX 280
G92 = GeForce 9800 GTX
G80 = GeForce 8800 GTX
G71 = GeForce 7900 GTX
G70 = GeForce 7800 GTX
NV40 = GeForce 6800 Ultra
NV35 = GeForce FX 5950 Ultra
NV30 = GeForce FX 5800

Clustering algorithms

K-means, K-centers, hierarchical clustering, and self-organizing map implementations (GPU and CPU) are either completed or near completion. In addition, research is under way on novel codes that are optimized specifically for the GPU many-core platform and take memory constraints into account.

Example computing kernel for GPU

    // K-centers: assign cluster center.
    // N = number of elements, ctrID = ID of the current cluster center.
    // ASSIGN = data-to-center assignments; MINDIST / NEWDIST = minimum and current distance.
    __global__ void assignPointToCenter(int N, int ctrID, int *ASSIGN, float *MINDIST, float *NEWDIST)
    {
        // get the global thread number
        int threadID = blockIdx.x * blockDim.x + threadIdx.x;
        if (threadID < N) {
            // if the current cluster center is closer than the previous one, update
            if (NEWDIST[threadID] < MINDIST[threadID]) {
                ASSIGN[threadID]  = ctrID;
                MINDIST[threadID] = NEWDIST[threadID];
            }
        }
    }
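To show where this kernel might sit in the overall algorithm, here is a sequential sketch in plain C. It assumes the common greedy (farthest-first) formulation of K-centers; the poster does not state how centers are selected, so that choice, the squared Euclidean metric, and the names `assign_point_to_center_cpu`, `sq_dist`, and `kcenters_cpu` are assumptions. The inner update loop mirrors the kernel body, with one loop iteration standing in for one GPU thread.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Sequential counterpart of the kernel body: iteration i plays the role of
   the GPU thread with threadID == i. */
void assign_point_to_center_cpu(int N, int ctrID, int *ASSIGN,
                                float *MINDIST, const float *NEWDIST)
{
    for (int i = 0; i < N; ++i) {
        if (NEWDIST[i] < MINDIST[i]) {
            ASSIGN[i] = ctrID;
            MINDIST[i] = NEWDIST[i];
        }
    }
}

/* Squared Euclidean distance (assumed metric for this sketch). */
static float sq_dist(const float *a, const float *b, int D)
{
    float sum = 0.0f;
    for (int d = 0; d < D; ++d) {
        float diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

/* Greedy K-centers (farthest-first traversal, assumed here).
   X is N x D, row-major. first is the index of the initial center.
   centers receives K point indices; ASSIGN, MINDIST, NEWDIST have length N. */
void kcenters_cpu(const float *X, int N, int D, int K, int first,
                  int *centers, int *ASSIGN, float *MINDIST, float *NEWDIST)
{
    assert(N > 0 && D > 0 && K > 0 && K <= N && first >= 0 && first < N);
    for (int i = 0; i < N; ++i) {
        ASSIGN[i] = -1;
        MINDIST[i] = FLT_MAX;
    }
    int ctr = first;
    for (int c = 0; c < K; ++c) {
        centers[c] = ctr;
        /* distance of every point to the newly chosen center */
        for (int i = 0; i < N; ++i)
            NEWDIST[i] = sq_dist(&X[(size_t)i * D], &X[(size_t)ctr * D], D);
        /* the step that the GPU kernel performs in parallel */
        assign_point_to_center_cpu(N, c, ASSIGN, MINDIST, NEWDIST);
        /* next center: the point farthest from its current center */
        ctr = 0;
        for (int i = 1; i < N; ++i)
            if (MINDIST[i] > MINDIST[ctr]) ctr = i;
    }
}
```

In a GPU version, the distance computation and the assignment update are the natural candidates for parallel kernels, while the search for the farthest point is a reduction over `MINDIST`.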

References

[1] Andreopoulos B, An A, Wang X, Schroeder M. “A roadmap of clustering algorithms: finding a match for a biomedical application”. Brief Bioinform, 2009 10(3): 297-314.
[2] Belacel N, Wang Q, Cuperlovic-Culf M. “Clustering methods for microarray gene expression data”. OMICS, 2006 10(4): 507-31.
[3] Chodera JD, Singhal N, Pande VS, Dill KA, Swope WC. “Automatic discovery of metastable states for the construction of Markov models of macromolecular conformational dynamics”. J Chem Phys, 2007 126(15): 155101.
[4] Zemla A, Geisbrecht B, Smith J, Lam M, Kirkpatrick B, Wagner M, Slezak T, Zhou CE. “STRALCP--structure alignment-based clustering of proteins”. Nucleic Acids Res, 2007 35(22): e150.
[5] Shalom SAA, Dash M, Tue M. “Efficient k-means clustering using accelerated graphics processors”. DaWaK 2008, 2008 LNCS 5182: 166-75.
[6] Wu R, Zhang B, Hsu M. “Clustering billions of data points using GPUs”. Proceedings of the combined workshops on UnConventional high performance computing workshop plus memory access workshop, 2009, ACM: Ischia, Italy.
[7] Liang MP, Banatao DR, Klein TE, Brutlag DL, Altman RB. “WebFEATURE: an interactive web tool for identifying and visualizing functional sites on macromolecular structures”. Nucleic Acids Res, 2003 31(13): 3324-7.
[8] Bowman GR, Huang X, Pande VS. “Using generalized ensemble simulations and Markov state models to identify conformational states”. Methods, 2009 49(2): 197-201.
[9] Theobald DL. “Rapid calculation of RMSDs using a quaternion-based characteristic polynomial”. Acta Cryst, 2005 A61: 478-480.
[10] SimTK: https://simtk.org/xml/index.xml

Examples for use of CAMPAIGN

Ex. 1: FEATURE [7]

FEATURE investigates local similarities in a protein structure by probing grid points for biophysical and biochemical features. Clustering the resulting feature vectors helps detect previously unknown binding sites.

Ex. 2: Protein dynamics

By clustering millions of protein conformations from computer simulations, MSM Builder [8] creates Markov State Models for detecting a protein’s functional states.

Speed comparisons

[Figures: performance gain of GPU over CPU, and performance gain of GPU over MATLAB. Plot labels: K=256, K=512.]

K-means and K-centers run more than 35x faster on the GPU than on the CPU for sufficiently large data sets. The two example applications (Ex. 1 and Ex. 2) benefit from this.

CAMPAIGN outperforms MATLAB R2008b in speed by two orders of magnitude. In addition, it was able to handle larger data sets.

Acknowledgements

This work is funded by the National Institutes of Health through the NIH Roadmap for Medical Research Grant U54 GM072970 and NIH Grant LM-05652.