
Unsupervised clustering in mRNA expression profiles

D.K. Tasoulis, V.P. Plagianakos, and M.N. Vrahatis

Computational Intelligence Laboratory (CILAB), Department of Mathematics, University of Patras, GR-26110 Patras, Greece

University of Patras Artificial Intelligence Research Center (UPAIRC), University of Patras, GR-26110 Patras, Greece

Computers in Biology and Medicine, in press (corrected proof), available online 24 October 2005

K-Windows Clustering

• Adaptation of k-means, originally proposed in 2002 by Vrahatis et al.

• Windowing technique improves on the speed and accuracy of k-means

• Tries to place a d-dimensional window (box) around each cluster, containing all patterns that belong to that cluster

K-Windows – Basic Concepts

• Move windows to find cluster centers (fig. a):
  1. Select k points as the centers of d-dimensional windows of size a.
  2. The mean of the patterns inside each window becomes its new center.
  3. Repeat until the stopping criterion, based on the movement of the center, is met.

• Enlarge windows to determine cluster edges (fig. b):
  1. Enlarge the window along one dimension by a specified percentage.
  2. Relocate the window as above.
  3. Keep the enlargement only if the increase in the number of patterns inside the window exceeds a threshold.
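The movement and enlargement phases above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: windows are axis-aligned boxes stored as a center plus per-dimension half-widths (so a window of size a would start with half-widths a/2), the movement phase stops once the center moves less than θv (assumed here to be the role of the variability threshold), and the enlargement step simply widens one dimension at a time by a fraction `grow`. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = List[float]


def points_in_window(points: Sequence[Point], center: Point, half: Point) -> List[Point]:
    """Patterns inside the axis-aligned box |x_i - center_i| <= half_i."""
    return [p for p in points
            if all(abs(x - c) <= h for x, c, h in zip(p, center, half))]


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of patterns."""
    d = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(d)]


def move_window(points, center, half, theta_v=0.02, max_iter=100) -> Point:
    """Movement phase: re-center the window on the mean of the patterns it
    contains until the center moves less than theta_v (Euclidean distance)."""
    center = list(center)
    for _ in range(max_iter):
        inside = points_in_window(points, center, half)
        if not inside:
            break  # empty window: nothing to move towards
        new_center = mean(inside)
        shift = sum((a - b) ** 2 for a, b in zip(new_center, center)) ** 0.5
        center = new_center
        if shift < theta_v:
            break
    return center


def enlarge_window(points, center, half, grow=0.1, theta_e=0.8,
                   theta_v=0.02) -> Tuple[Point, Point]:
    """Enlargement phase (simplified): widen one dimension at a time by the
    fraction `grow`, re-run the movement phase, and keep the change only if
    the relative increase in enclosed patterns exceeds theta_e."""
    center, half = list(center), list(half)
    changed = True
    while changed:
        changed = False
        for i in range(len(half)):
            before = len(points_in_window(points, center, half))
            trial_half = list(half)
            trial_half[i] *= 1 + grow
            trial_center = move_window(points, center, trial_half, theta_v)
            after = len(points_in_window(points, trial_center, trial_half))
            if before and (after - before) / before > theta_e:
                center, half, changed = trial_center, trial_half, True
    return center, half
```

For example, with `pts = [[0.0], [1.0], [2.0], [3.0], [10.0]]`, calling `enlarge_window(pts, [0.0], [0.5], grow=1.0)` widens the window until it covers the four neighbouring patterns and stops before reaching `10.0`, returning center `[1.5]` and half-width `[2.0]`. Each accepted enlargement strictly increases the number of enclosed patterns, so the loop always terminates.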

Unsupervised K-Windows (UKW)

• Start with a sufficiently large number of windows
• Merge windows to determine the number of clusters automatically
• For each pair of overlapping windows, calculate, for each window, the proportion of its patterns that fall in the overlap:
  a) Large overlap: considered the same cluster; W1 is deleted.
  b) Many points in common: considered the same cluster.
  c) Low overlap: considered two different clusters.
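The three overlap cases above can be sketched as a small decision function. This is an illustrative sketch, not the published procedure: each window is a (center, half-widths) pair, the proportions are the fractions of each window's patterns that also lie in the other window, `theta_m` uses the merging threshold of 0.1 from the default parameter list, and `theta_s` (the cutoff for "large overlap") is a hypothetical name with an assumed value, since the slides give no default for it. The order of the checks is also an assumption.

```python
def in_window(p, center, half):
    """True if pattern p lies in the axis-aligned box around center."""
    return all(abs(x - c) <= h for x, c, h in zip(p, center, half))


def overlap_decision(points, w1, w2, theta_s=0.8, theta_m=0.1):
    """Decide what to do with two overlapping windows.

    w1, w2  : (center, half_widths) pairs
    theta_s : hypothetical 'large overlap' threshold (assumed value)
    theta_m : merging threshold
    Returns "delete_w1", "delete_w2", "merge" or "separate".
    """
    in1 = {i for i, p in enumerate(points) if in_window(p, *w1)}
    in2 = {i for i, p in enumerate(points) if in_window(p, *w2)}
    common = in1 & in2
    if not common:
        return "separate"            # no shared patterns at all
    share1 = len(common) / len(in1)  # fraction of W1's patterns also in W2
    share2 = len(common) / len(in2)  # fraction of W2's patterns also in W1
    if share1 >= theta_s:
        return "delete_w1"           # (a) large overlap: W1 is deleted
    if share2 >= theta_s:
        return "delete_w2"           # same case with the roles swapped
    if max(share1, share2) >= theta_m:
        return "merge"               # (b) many points in common
    return "separate"                # (c) low overlap
```

In one dimension with patterns at 0 to 5, a window centered at 1 with half-width 1 lies entirely inside one centered at 1.5 with half-width 2, so the first is deleted; paired with a window centered at 2.5 with half-width 1.5 it shares two of its three patterns, so the two are merged.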

Experimental Setup

• Leukemia dataset (well characterized)
• Default UKW parameters used
• Supervised dimension reduction
  – Two previously published gene subsets and their union
• Unsupervised dimension reduction
  – Biclustering with UKW
  – PCA
  – PCA and UKW hybrid

Supervised Feature Selection

• Use two gene subsets that were selected with supervised techniques in previously published papers.

• All algorithms performed best on the combined set; the results are summarized below.

Unsupervised Feature Selection (Biclustering Technique)

• Apply UKW to cluster the genes, and select the gene closest to each cluster center as that cluster's representative.

• Apply UKW to the samples, using only these representative genes (239 in total).

• UKW accuracy: 93.6% (ALL) and 76% (AML)

• No results reported for other algorithms
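Choosing one representative gene per cluster, as in the first step above, can be sketched as follows. The sketch assumes the gene clustering is already available as one label per gene (UKW itself is not reimplemented here), takes the cluster center to be the mean expression profile of the cluster's genes, and measures closeness with Euclidean distance; the function name is hypothetical.

```python
import math
from collections import defaultdict


def representative_genes(profiles, labels):
    """For each cluster label, return the index of the gene whose expression
    profile is closest (Euclidean) to the cluster's mean profile."""
    members = defaultdict(list)
    for gene, label in enumerate(labels):
        members[label].append(gene)
    reps = {}
    for label, genes in members.items():
        d = len(profiles[genes[0]])
        # Cluster center: component-wise mean of the member profiles.
        center = [sum(profiles[g][j] for g in genes) / len(genes) for j in range(d)]
        reps[label] = min(genes, key=lambda g: math.dist(profiles[g], center))
    return reps
```

The sample-clustering step would then use only the genes listed in `sorted(reps.values())`, one per gene cluster.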

Unsupervised Feature Selection (PCA Techniques)

• PCA and scree plot to reduce features
  – Poor performance

• Hybrid PCA and UKW method
  – Partition genes using UKW
  – Transform each partition using PCA
  – Select representative factors from each cluster
  – UKW accuracy: 97.87% (ALL) and 88% (AML)
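Both PCA variants above can be sketched with NumPy. This is a sketch under stated assumptions, not the authors' procedure: the gene partition is given as one label per gene (standing in for the UKW partition), PCA is computed by SVD of the centered samples-by-genes submatrix of each partition, and a fixed number `n_components` of leading components per partition is kept as the "representative factors"; the original factor-selection rule may differ. `explained_variance_ratio` returns the values a scree plot displays. Function names are hypothetical.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance carried by each principal component
    (the quantity plotted in a scree plot), in decreasing order."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return s**2 / np.sum(s**2)


def pca_ukw_features(X, gene_labels, n_components=1):
    """Run PCA separately inside each gene cluster and concatenate the
    leading component scores of every sample.

    X           : array of shape (n_samples, n_genes)
    gene_labels : cluster label for each gene (column of X)
    """
    X = np.asarray(X, dtype=float)
    gene_labels = np.asarray(gene_labels)
    blocks = []
    for label in np.unique(gene_labels):
        sub = X[:, gene_labels == label]   # samples x genes in this cluster
        sub = sub - sub.mean(axis=0)       # center each gene
        # Rows of vt are the principal axes, ordered by decreasing variance.
        _, _, vt = np.linalg.svd(sub, full_matrices=False)
        k = min(n_components, vt.shape[0])
        blocks.append(sub @ vt[:k].T)      # project samples onto top-k axes
    return np.hstack(blocks)
```

The result has one column per kept factor and one row per sample, so it can be passed directly to a sample-clustering step.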

UKW Results Summary

Feature selection                      ALL accuracy   AML accuracy
Published gene subsets (supervised)    90%            100%
UKW biclustering (unsupervised)        93.6%          76%
PCA (unsupervised)                     N/A            N/A
PCA-UKW hybrid (unsupervised)          97.87%         88%
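The table reports accuracy separately for each class. The slides do not describe the evaluation protocol; one common way to score a clustering against known labels, assumed here, is to label each cluster with the majority class of its members and then report, per class, the fraction of that class's samples that fall in a cluster labeled with it. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def per_class_accuracy(true_labels, cluster_ids):
    """Map each cluster to its majority class, then return, for each class,
    the fraction of its samples assigned to a cluster mapped to that class."""
    by_cluster = defaultdict(list)
    for y, c in zip(true_labels, cluster_ids):
        by_cluster[c].append(y)
    cluster_class = {c: Counter(ys).most_common(1)[0][0]
                     for c, ys in by_cluster.items()}
    totals = Counter(true_labels)
    hits = Counter(y for y, c in zip(true_labels, cluster_ids)
                   if cluster_class[c] == y)
    return {y: hits[y] / totals[y] for y in totals}
```

With made-up labels `["ALL"] * 5 + ["AML"] * 4` and cluster assignments `[0, 0, 0, 0, 1, 1, 1, 1, 0]`, cluster 0 is labeled ALL and cluster 1 AML, giving 0.8 for ALL and 0.75 for AML.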

• Default parameters
  – initial window size a = 5
  – enlargement threshold θe = 0.8
  – merging threshold θm = 0.1
  – coverage threshold θc = 0.2
  – variability threshold θv = 0.02
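When experimenting, the defaults above can be kept in one place, for example as a frozen dataclass so that overrides are explicit. The class and field names are hypothetical; the values are the ones listed above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UKWParams:
    """Default UKW parameters as listed on the slide."""
    a: float = 5.0         # initial window size
    theta_e: float = 0.8   # enlargement threshold
    theta_m: float = 0.1   # merging threshold
    theta_c: float = 0.2   # coverage threshold
    theta_v: float = 0.02  # variability threshold


defaults = UKWParams()
wider = replace(defaults, a=10.0)  # override one value, keep the rest
```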

