The Incompatible Desiderata of Gene Cluster Properties
Rose Hoberman
Carnegie Mellon University
joint work with Dannie Durand
How to detect segmental homology?
Intuitive notions of what gene clusters look like
• Enriched for homologous gene pairs
• Neither gene content nor order is perfectly preserved
How can we define a gene cluster formally?
Definitions will be application-dependent
If the goal is to estimate the number of inversions, then gene order should be preserved
If the goal is to find duplicated segments, allow some disorder
Gene Cluster Definitions
Large-Scale Duplications: Vandepoele et al 02; McLysaght et al 02; Hampson et al 03; Panopoulou et al 03; Guyot & Keller 04; Kellis et al 04; ...
Genome Rearrangements: Bourque et al 05; Pevzner & Tesler 03; Coghlan and Wolfe 02; ...
Functional Associations between Genes: Tamames 01; Wolf et al 01; Chen et al 04; Westover et al 05; ...
Algorithmic and Statistical Communities: Bergeron et al 02; Calabrese et al 03; Heber & Stoye 01; ...
Groups find very different clusters when analyzing the same data
[Figure: bar chart of percent coverage of the rice genome (axis from 0 to 80) for Vandepoele et al, 03; Simillion et al, 04; Wang et al, 05; Guyot et al, 04; Paterson et al, 04; Yu et al, 05]
Cluster locations differ from study to study
Inference of duplication mechanism for individual genes varies greatly
(The Genomes of Oryza sativa: A History of Duplications, Yu et al, PLoS Biology 2005)
Goals:
Characterizing existing definitions
Formal properties form a basis for comparison
Gene cluster desiderata
Outline
• Introduction
• Brief overview of gene cluster identification
• Proposed properties for comparison
• Analysis of data: nested property
Detecting Homologous Chromosomal Segments (a marker-based approach)
1. Find homologous genes
2. Formally define a “gene cluster”
3. Devise an algorithm to identify clusters
4. Statistically verify that clusters indicate common ancestry
Cluster definitions in the literature
Descriptive:
• r-windows
• connected components (Pevzner & Tesler 03)
• common intervals (Uno and Yagiura 00)
• max-gap
• …
Constructive:
• LineUp (Hampson et al 03)
• CloseUp (Hampson et al 05)
• FISH (Calabrese et al 03)
• AdHoRe (Vandepoele et al 02)
• Gene teams (Bergeron et al 02)
• greedy max-gap (Hokamp 01)
• …
Constructive definitions require search algorithms and are harder to reason about formally
I illustrate properties with a few definitions
r-windows
Two windows of size r that share at least m homologous gene pairs
(Cavalcanti et al 03, Durand and Sankoff 03, Friedman & Hughes 01, Raghupathy and Durand 05)
Example: r = 4, m ≥ 2
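The r-window definition can be sketched as a brute-force scan over all pairs of windows. This is only an illustration under stated assumptions: each genome is a list of gene-family labels, two genes are homologous when they carry the same label, and the number of shared labels stands in for the number of homologous pairs. The function name `r_window_hits` and the toy genomes are hypothetical and do not come from the cited papers.

```python
from typing import List, Tuple


def r_window_hits(genome1: List[str], genome2: List[str],
                  r: int, m: int) -> List[Tuple[int, int, int]]:
    """Return (start1, start2, shared) for each pair of length-r windows
    that share at least m gene-family labels (assumed homologous pairs)."""
    hits = []
    for i in range(len(genome1) - r + 1):
        families1 = set(genome1[i:i + r])
        for j in range(len(genome2) - r + 1):
            shared = len(families1 & set(genome2[j:j + r]))
            if shared >= m:
                hits.append((i, j, shared))
    return hits


# Toy genomes (hypothetical): a, b, c have homologs on both genomes.
genome1 = ["a", "x1", "b", "x2", "c", "x3"]
genome2 = ["y1", "b", "a", "y2", "y3", "c"]
hits = r_window_hits(genome1, genome2, r=4, m=2)
print(hits)
```

With r = 4 and m = 2, only the windows containing both a and b qualify; c is too far from them to fall in a shared window.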
max-gap cluster
A set of genes forms a max-gap cluster if the gap between adjacent genes is never greater than g on either genome
Widely used definition in genomic studies
g ≤ 2 g ≤ 3
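A minimal check of the max-gap condition might look like the sketch below. It assumes each homologous pair is identified by one label with a position on each genome, and that a gap is the number of intervening genes between two adjacent cluster members. The names `max_gap_ok`, `is_max_gap_cluster`, `pos1`, and `pos2` are hypothetical, and the sketch tests only the gap condition, not maximality.

```python
from typing import Dict, Iterable


def max_gap_ok(positions: Iterable[int], g: int) -> bool:
    """True if adjacent positions are separated by at most g intervening genes."""
    ordered = sorted(positions)
    return all(b - a - 1 <= g for a, b in zip(ordered, ordered[1:]))


def is_max_gap_cluster(genes: Iterable[str], pos1: Dict[str, int],
                       pos2: Dict[str, int], g: int) -> bool:
    """Check the gap condition on both genomes for the given set of genes."""
    genes = list(genes)
    return (max_gap_ok([pos1[x] for x in genes], g)
            and max_gap_ok([pos2[x] for x in genes], g))


# Toy positions (hypothetical) of three homologous pairs on each genome.
pos1 = {"a": 0, "b": 2, "c": 5}
pos2 = {"a": 1, "b": 0, "c": 4}
print(is_max_gap_cluster("abc", pos1, pos2, g=2))  # gaps of 1 and 2, then 0 and 2
print(is_max_gap_cluster("abc", pos1, pos2, g=1))  # a gap of 2 violates g = 1
```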
Outline
• Introduction
• Brief overview of existing approaches
• Proposed properties for comparison
• Analysis of data: nested property
Proposed Cluster Properties
• Symmetry
• Size
• Density
• Order
• Orientation
• Nestedness
• Disjointness
• Isolation
• Temporal Coherence
Symmetry
[Figure: clusters found =? clusters found, comparing the results when the roles of the two chromosomes are exchanged]
Many existing cluster algorithms are not symmetric with respect to chromosome
Asymmetry: an example
FISH (Calabrese et al, 2003)
Constructive cluster definition: clusters correspond to paths through a dot-plot
Publicly available software
Statistical model
Asymmetry: an example
[Figure: dot-plot of gene order 1 2 3 6 5 9 9 4 7 8 9 on one axis against gene order 1 2 3 4 5 6 7 8 9 on the other]
FISH:
• Euclidean distance between gene pairs is constrained
• Paths in the dot-plot must always move to the right
Switching the axes yields different clusters
[Figure: dot-plot with the axes switched]
FISH:
• Euclidean distance between markers is constrained
• Paths in the dot-plot must always move to the right
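The effect of a "must move to the right" rule can be seen with a toy linking rule. This is explicitly not the FISH algorithm or its statistical model. It assumes that two dots are linked only when the second lies strictly to the right of the first and within Euclidean distance `d`, and that clusters are the connected groups of linked dots. The name `rightward_clusters` and the three dots are hypothetical.

```python
from math import dist
from typing import FrozenSet, Iterable, Set, Tuple

Point = Tuple[int, int]


def rightward_clusters(points: Iterable[Point], d: float) -> Set[FrozenSet[Point]]:
    """Group dots linked by steps that move strictly right (larger x)
    and cover a Euclidean distance of at most d (toy rule, not FISH)."""
    pts = sorted(points)
    parent = {p: p for p in pts}

    def find(p: Point) -> Point:
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p in pts:
        for q in pts:
            if q[0] > p[0] and dist(p, q) <= d:
                parent[find(p)] = find(q)

    groups = {}
    for p in pts:
        groups.setdefault(find(p), set()).add(p)
    return {frozenset(group) for group in groups.values()}


# One gene on the horizontal chromosome (x = 2) matches two genes on the other.
points = [(1, 1), (2, 2), (2, 3)]
original = rightward_clusters(points, d=1.5)
swapped = rightward_clusters([(y, x) for x, y in points], d=1.5)
# Map the swapped result back to the original coordinates for comparison.
swapped_back = {frozenset((x, y) for y, x in cluster) for cluster in swapped}
print(original != swapped_back)
```

In the original orientation the dots (2, 2) and (2, 3) share a column, so no rightward step joins them; after the axes are exchanged they sit side by side and fall into one cluster.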
Ways to regain symmetry
[Figure: dot-plot]
1. Paths in the dot-plot must always move down and to the right
   • misses the inversion
2. Paths can move in any direction
   • statistics become difficult
Regaining symmetry entails some tradeoffs
Cluster Parameters
• size: number of homologous pairs in the cluster
• length: total number of genes in the cluster
• density: proportion of homologous pairs (size/length)
Example: size = 5, length = 12, density = 5/12
max-gap clusters (gap ≤ g)
• cluster grows to its natural size
• cluster of size m may be of length m to g(m-1)+m
• maximal length grows as size grows
r-windows (length ≤ r)
• cluster size is constrained
• cluster of size m may be of length m to r
• maximal length is fixed, regardless of cluster size
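The two length ranges can be tabulated with a few lines of code. This is a small sketch, assuming length is counted on a single genome and a gap is the number of intervening genes; the function names are hypothetical.

```python
from typing import Tuple


def max_gap_length_range(m: int, g: int) -> Tuple[int, int]:
    """Shortest and longest possible length of a max-gap cluster of size m:
    m cluster genes plus at most g intervening genes in each of the m-1 gaps."""
    return m, m + g * (m - 1)


def r_window_length_range(m: int, r: int) -> Tuple[int, int]:
    """Shortest and longest possible length of an r-window cluster of size m."""
    if m > r:
        raise ValueError("a window of length r cannot hold more than r genes")
    return m, r


for m in (2, 5, 10):
    print(m, max_gap_length_range(m, g=3), r_window_length_range(m, r=12))
```

The upper bound for max-gap clusters grows with m, while the r-window bound stays at r, which is the contrast the two lists draw.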
A tradeoff: local vs global density
max-gap
• constrains local density
• only weakly constrains global density (≥ 1/(g+1))
r-window
• constrains global density
• only weakly constrains local density (maximum possible gap ≤ r-m)
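The two bounds quoted here follow from counting genes. The derivation below is a sketch that assumes length is measured on one genome and a gap is the number of intervening genes between adjacent cluster members.

```latex
% Max-gap cluster of size m with gap parameter g:
% m cluster genes plus at most g intervening genes in each of the m-1 gaps.
\[
  m \;\le\; \text{length} \;\le\; m + g(m-1)
\]
\[
  \text{density} \;=\; \frac{m}{\text{length}}
  \;\ge\; \frac{m}{m + g(m-1)}
  \;=\; \frac{m}{m(g+1) - g}
  \;\ge\; \frac{1}{g+1}
\]
% r-window of length r containing m homologous genes:
% the r - m remaining positions could all fall in a single gap.
\[
  \text{density} \;\ge\; \frac{m}{r},
  \qquad
  \text{largest gap} \;\le\; r - m
\]
```

The last inequality in the max-gap chain is strict whenever g ≥ 1, since the denominator m(g+1) - g is then smaller than m(g+1).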
Even when global density is high, a region may not be locally dense
Example: density = 12/18
Size vs Density: An example
Application: all-against-all comparison of human chromosomes to find duplicated blocks

| Study | Maximum Gap | Cluster Size | Post-Processing |
|---|---|---|---|
| McLysaght et al, 2002 | constrained | test statistic | |
| Panopoulou et al, 2003 | test statistic | constrained | merged nearby clusters |
A Tradeoff in Parameter Space
[Figure: parameter space with axes Gap and Size; regions labeled "Large and dense", "Small but dense", and "Large but less dense"]
Panopoulou et al, 2003: Gap ≤ 10, Size ≥ 2
McLysaght et al, 2002: Gap ≤ 30, Size ≥ 6
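The tradeoff can be made concrete by applying the two threshold pairs quoted on this slide to a few candidate clusters. This sketch uses only those gap and size thresholds. It ignores the test statistics and post-processing the two studies also applied, and the `Thresholds` class and the candidate clusters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    max_gap: int   # largest gap allowed between adjacent cluster genes
    min_size: int  # smallest number of homologous pairs required

    def accepts(self, size: int, largest_gap: int) -> bool:
        return size >= self.min_size and largest_gap <= self.max_gap


PANOPOULOU_2003 = Thresholds(max_gap=10, min_size=2)
MCLYSAGHT_2002 = Thresholds(max_gap=30, min_size=6)

# Hypothetical candidates: (size, largest gap inside the cluster).
candidates = {
    "small but dense": (3, 5),
    "large but less dense": (8, 25),
    "large and dense": (8, 5),
}
for name, (size, gap) in candidates.items():
    print(name,
          PANOPOULOU_2003.accepts(size, gap),
          MCLYSAGHT_2002.accepts(size, gap))
```

Under these thresholds alone, each setting accepts a kind of cluster the other rejects, and only clusters that are both large and dense pass both.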
Order and Orientation
[Figure: two example clusters, each with density = 6/8]
Local rearrangements will cause both gene order and orientation to diverge
• Overly stringent order constraints could lead to false negatives
• Partial conservation of order and orientation provides additional evidence of regional homology
Wide Variation in Order Constraints
None (r-windows, max-gap, ...)
Explicit constraints:
• Limited number of order violations (Hampson et al, 03)
• Near-diagonals in the dot-plot (Calabrese et al 03, ...)
• Test statistic (Sankoff and Haque, 05)
Implicit constraints:
• via the search algorithm (Hampson et al 05, ...)
Nestedness
In particular, implicit ordering constraints are imposed by many greedy, agglomerative search algorithms
Formally, such search algorithms will find only nested clusters
A cluster of size m is nested if it contains sub-clusters of size m-1, ..., 1
Greedy Algorithms Impose Order Constraints
g = 2
A greedy, agglomerative algorithm
• initializes a cluster as a single homologous pair
• searches for a gene in proximity on both chromosomes
• either extends the cluster and repeats, or terminates
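The three steps above can be sketched as a simple loop. This is one possible greedy heuristic, not the procedure of any particular published tool. It assumes one label per homologous pair with a position on each genome, measures a gap as the number of intervening genes, and breaks ties by taking the alphabetically first candidate. All names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def max_gap_ok(positions: List[int], g: int) -> bool:
    """True if adjacent positions are separated by at most g intervening genes."""
    ordered = sorted(positions)
    return all(b - a - 1 <= g for a, b in zip(ordered, ordered[1:]))


def is_max_gap_cluster(genes: Iterable[str], pos1: Dict[str, int],
                       pos2: Dict[str, int], g: int) -> bool:
    genes = list(genes)
    return (max_gap_ok([pos1[x] for x in genes], g)
            and max_gap_ok([pos2[x] for x in genes], g))


def greedy_cluster(seed: str, pos1: Dict[str, int],
                   pos2: Dict[str, int], g: int) -> Set[str]:
    """Start from one homologous pair and keep adding any pair that is
    close enough on both genomes; stop when no such pair exists."""
    cluster = {seed}
    while True:
        candidates = sorted(
            x for x in pos1
            if x not in cluster
            and is_max_gap_cluster(cluster | {x}, pos1, pos2, g)
        )
        if not candidates:
            return cluster
        cluster.add(candidates[0])


# Toy positions (hypothetical) of four homologous pairs.
pos1 = {"a": 0, "b": 1, "c": 3, "d": 9}
pos2 = {"a": 2, "b": 0, "c": 1, "d": 3}
print(sorted(greedy_cluster("a", pos1, pos2, g=1)))
```

Every intermediate set built by this loop is itself a max-gap cluster, which is why a search of this kind can only reach clusters that have smaller clusters inside them.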
Greediness: an example (Bergeron et al, 02)
g = 2
A max-gap cluster of size four
No greedy, agglomerative algorithm will find this cluster
There is no max-gap cluster of size 2 (or 3)
In other words, the cluster is not nested
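A brute-force nestedness test makes this kind of example easy to verify. The arrangement below is a hypothetical one constructed for illustration, not the figure from the slide. Positions and helper names are assumptions, a gap is the number of intervening genes, and the test follows the definition literally by asking for some sub-cluster of every smaller size.

```python
from itertools import combinations
from typing import Dict, Iterable, List


def max_gap_ok(positions: List[int], g: int) -> bool:
    """True if adjacent positions are separated by at most g intervening genes."""
    ordered = sorted(positions)
    return all(b - a - 1 <= g for a, b in zip(ordered, ordered[1:]))


def is_max_gap_cluster(genes: Iterable[str], pos1: Dict[str, int],
                       pos2: Dict[str, int], g: int) -> bool:
    genes = list(genes)
    return (max_gap_ok([pos1[x] for x in genes], g)
            and max_gap_ok([pos2[x] for x in genes], g))


def is_nested(genes: Iterable[str], pos1: Dict[str, int],
              pos2: Dict[str, int], g: int) -> bool:
    """A cluster of size m is nested if it contains sub-clusters of
    every size m-1, ..., 1 (exhaustive check, small clusters only)."""
    genes = list(genes)
    if not is_max_gap_cluster(genes, pos1, pos2, g):
        return False
    return all(
        any(is_max_gap_cluster(sub, pos1, pos2, g)
            for sub in combinations(genes, h))
        for h in range(1, len(genes))
    )


# Hypothetical arrangement: genome 1 order a..b..c..d, genome 2 order b..d..a..c,
# with two unrelated genes between consecutive cluster genes.
pos1 = {"a": 0, "b": 3, "c": 6, "d": 9}
pos2 = {"b": 0, "d": 3, "a": 6, "c": 9}
print(is_max_gap_cluster("abcd", pos1, pos2, g=2))  # the four genes form a cluster
print(is_nested("abcd", pos1, pos2, g=2))           # but no pair or triple does
```

In this arrangement every pair that is adjacent on one genome is far apart on the other, so a search that starts from a single pair has nothing to extend to.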
Thus: different results when searching for max-gap clusters
Greedy algorithms
• agglomerative
• find nested max-gap clusters
Gene Teams algorithm (Bergeron et al 02; Beal et al 03, ...)
• divide-and-conquer
• finds all max-gap clusters, nested or not
An example of a greedy search: CloseUp (Hampson et al, Bioinformatics, 2005)
Software tool to find clusters
Goal: statistical detection of chromosomal homology using density alone
Method:
• greedy search for nearby matches
• terminates when density is low
• randomization to statistically verify clusters
A comparative study (Hampson et al, 05)
Is order information necessary or even helpful for cluster detection?
Empirical comparison:
• CloseUp: “density alone”, but greedy
• LineUp and ADHoRe: density + order information
• evaluated accuracy on synthetic data
Result: CloseUp had comparable performance
Their conclusion: order is not particularly helpful
My conclusion: results are actually inconclusive, since CloseUp implicitly constrains order
Gene clusters: islands of homology in a sea of interlopers
How can we formally describe this intuitive notion?
Islands of Homology
Disjoint: A homologous gene pair should be a member of at most one cluster
Isolated: The minimum distance between clusters should be larger than the maximum distance between homologous gene pairs within the cluster
Various types of constraints lead to overlapping (or nearby) clusters that cannot be merged
If we search for clusters with density ≥ ½:
If we search for nested max-gap clusters, g=1:
Temporal coherence
[Figure: time axis running from "before" to "now"]
Divergence times of homologous pairs within a block should agree
Outline
• Introduction
• Brief overview of existing approaches
• Proposed properties for comparison
• My analysis of data: nested property
Many groups use a greedy, agglomerative search to find gene clusters
Does a greedy search have a large effect on the set of clusters identified in real data?
Data
Pairwise genome comparisons:

| Comparison | genes (1) | genes (2) | orthologs |
|---|---|---|---|
| E. coli & B. subtilis | 4,108 | 4,245 | 1,315 |
| Human & Mouse | 22,216 | 25,383 | 14,768 |
| Human & Chicken | 22,216 | 17,709 | 10,338 |

Gene orthology data:
• bacterial: GOLDIE database http://www.intellibiosoft.com/academic.html
• eukaryotes: InParanoid database http://inparanoid.cgb.ki.se
Methods
For each genome comparison and gap size:
• Maximal max-gap clusters: Gene Teams software http://.www-igm.univ-mlv.fr/~raffinot/geneteam.html
• Maximal nested max-gap clusters: simple greedy heuristic (no merging)
Percent of gene teams that are nested
[Figure: percentage nested (axis from 98 to 100) versus g: gap size (axis from 0 to 50) for Human/Chicken, Human/Mouse, and E. coli/B. subtilis]
Number of genes in some gene team of size 7 or greater that are not in any nested cluster of 7 or greater
[Figure: number of genes (axis from 0 to 80) versus g: gap size (axis from 0 to 50); series: k=2 Human/Mouse, k=2 Human/Chicken, k=7 Human/Mouse, k=7 Human/Chicken]
Results
For the datasets analyzed, a nestedness constraint does not appear too conservative
However, we didn’t survey a wide range of evolutionary distances
• expect nestedness to decrease with evolutionary distance
• open question: are there more rearranged datasets for which the proportion of nested clusters is much smaller?
Is nestedness desirable?
A nestedness constraint:
• offers a middle ground between no order constraints and strict order
However, nestedness
• provides no formal description of order constraints
• is restrictive rather than descriptive
We may instead prefer methods that
• allow for parameterization of degree of disorder
• consider order conservation in the statistical tests
Conclusion
Proposed 9 properties to compare and evaluate methods for identifying gene clusters
Illustrated cluster differences due to
• cluster definition
• search algorithm
• statistics
Incompatible Desiderata: these properties are intuitively natural yet many are surprisingly difficult to satisfy with the same definition
Acknowledgements
• David Sankoff
• The Durand Lab
• Barbara Lazarus Women@IT Fellowship
• Sloan Foundation
• NHGRI, Packard Foundation
Discussion
• Are our intuitions about clusters reasonable?
• Which cluster properties are important or desirable?
• How can we quantitatively evaluate cluster definitions?
• What are the tradeoffs between methods?
• How can better definitions be designed?