Page 1: 28-Way vertebrate alignment and conservation track in the UCSC Genome Browser Journal club Dec. 7, 2007

28-Way vertebrate alignment and conservation track in the

UCSC Genome Browser

Journal club Dec. 7, 2007


Vertebrate genome sequencing

the Broad Institute of MIT (Massachusetts Institute of Technology) and Harvard

the Human Genome Sequencing Center at the Baylor College of Medicine

the Genome Sequencing Center at Washington University

the Sanger Center

the Department of Energy’s (DOE’s) Joint Genome Institute

the National Institute of Genetics in Japan


Alignment:

Similarities & differences between genome sequences:

1. functional noncoding regions

2. protein-coding genes

3. non-coding RNA genes


Aims

1. To identify functional elements more reliably via sequence alignment

2. To enhance the effectiveness of disease-model species for experiments

3. To determine the course of evolution and reconstruct the ancestral genome sequence


April 2007: 17 to 28 species

11 old species data

6 updated old species

11 new species


>79%

Heterogeneous mix


Coverage:

2 – >99%

16 – 5.1% ~ 8.5%

10 – ~2x (2x – 87.5%, 5x – 99.4%)

Cloning bias…
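The relation between sequencing depth and the fraction of a genome covered can be illustrated with a simple Poisson (Lander-Waterman style) model. This is only a sketch under that assumption; the function name is hypothetical, and the slide does not say which model produced its 2x and 5x figures. The idealized model gives values close to, but not identical to, the percentages quoted on the slide, and it ignores cloning bias.

```python
import math


def expected_fraction_covered(depth):
    """Expected fraction of bases hit by at least one read.

    Assumes reads land uniformly at random, so the number of reads
    covering a base is Poisson-distributed with mean `depth`.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    for depth in (2, 5):
        fraction = expected_fraction_covered(depth)
        print(f"{depth}x coverage: {fraction:.1%} of bases expected to be covered")
```

Under this model, low-coverage (about 2x) assemblies are expected to miss a noticeable share of the genome even before cloning bias is considered.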


Applications

Application 1: indels in protein-coding regions

Application 2: conservation of start and stop codons

Application 3: phylogenetic extent of alignment of functional regions


Application 1

Have indels accumulated at a uniform rate during evolution?

What are the phenotypic consequences of human-specific protein indels?

Have the positions of potentially disease-associated indels resisted substitution over evolutionary time (interspecies conservation)?


6-bp indel near the start of PRNP

Primate & glires

P G D


Total indels: 209

Number of indels / number per MY

Parametric bootstrap test: the rates differ significantly from the uniform-rate hypothesis

4/MY

2/MY
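A parametric bootstrap of this kind can be sketched as follows. This is an illustration only, not the procedure used in the study: it assumes the null hypothesis is that indels fall on branches in proportion to branch length in MY, it uses a chi-square statistic, and the function names and any branch lengths or counts passed in are hypothetical.

```python
import random


def chi_square(counts, expected):
    """Sum of (observed - expected)^2 / expected over branches."""
    return sum((c - e) ** 2 / e for c, e in zip(counts, expected))


def bootstrap_p_value(observed, branch_my, replicates=1000, seed=0):
    """Estimate how often a uniform-rate model produces counts at least
    as uneven as the observed ones.

    observed  -- indel counts per branch
    branch_my -- branch lengths in millions of years (same order)
    """
    rng = random.Random(seed)
    total = sum(observed)
    total_my = sum(branch_my)
    # Under a uniform rate, expected counts are proportional to branch length.
    expected = [total * t / total_my for t in branch_my]
    observed_stat = chi_square(observed, expected)
    branches = list(range(len(branch_my)))
    hits = 0
    for _ in range(replicates):
        # Simulate one data set under the null model.
        counts = [0] * len(branch_my)
        for b in rng.choices(branches, weights=branch_my, k=total):
            counts[b] += 1
        if chi_square(counts, expected) >= observed_stat:
            hits += 1
    return hits / replicates
```

A small returned value means the observed per-branch counts are hard to reconcile with a single rate; for example, rates of 4/MY on one branch and 2/MY on another would be tested by passing the corresponding counts and branch lengths.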


Human-specific protein indels


SULF1: human-specific 3-bp insertion in exon 11

Replication slippage

1. Fixed in humans

2. Very conserved region (retains 4 Es over 2 billion years)

3. No 3D data available


GFM2: human-specific 6-bp insertion

1. Not a conserved region

2. The insertion occurs in only some human individuals

3. 3D data for a similar protein imply no phenotypic consequence


Human disease-associated amino acid replacement mutations are overabundant and occur predominantly in positions essential to the structure and function of the proteins

Subramanian and Kumar, BMC Genomics 2006, 7:306


Disease-associated deletions

More species considered

Data from PhenCode Locus Variants

PAH

Simplified distance: number of distinct amino acids
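The simplified distance mentioned on this slide (the number of distinct amino acids) can be sketched as a per-column count over a multiple protein alignment. This is a minimal illustration under assumptions not stated on the slide: sequences are already aligned to equal length, gap characters are ignored, and the function names are hypothetical.

```python
def distinct_amino_acids(column, gap_chars="-."):
    """Count the distinct residues in one alignment column, ignoring gaps."""
    return len({aa.upper() for aa in column if aa not in gap_chars})


def column_distances(aligned_sequences):
    """Return the distinct-residue count for every alignment column.

    aligned_sequences -- equal-length strings, one per species
    """
    lengths = {len(seq) for seq in aligned_sequences}
    if len(lengths) > 1:
        raise ValueError("sequences must be aligned to the same length")
    return [distinct_amino_acids(col) for col in zip(*aligned_sequences)]


if __name__ == "__main__":
    # Toy alignment of three species; the sequences are made up.
    print(column_distances(["MKV", "MRV", "M-V"]))
```

A column with a count of 1 is perfectly conserved across the species considered; larger counts indicate positions that have tolerated substitution.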


6


>79%

<

Hard to identify precise gene boundaries based on comparative genomics data

Drift away


Hypothesis 1:

The CpG islands that are common near gene starts are more difficult to sequence


Hypothesis 2:

Selection at the start codon might be more relaxed in genes with multiple promoters (alternate promoters)

4%

1.65%


Hypothesis 3:

The alignment program may not have enough surrounding conserved sequence to reliably align the small initial coding exon around the start codon

similar


Conclusion

A bias against CpG islands in the draft sequence, combined with difficulty in aligning small initial coding exons, explains a great deal of the observed unalignability of start codons compared with stop codons.

Gene models based on multiple genomic alignments must account for the poorer alignability of start codons.


Background: finding functional elements

Conservation in noncoding regions is much more subject to evolutionary turnover than in protein-coding regions.

Evolutionary (conservation) turnover: most studies tacitly equate homology of functional elements with sequence homology. This assumption is violated by the phenomenon of turnover, in which functionally equivalent elements reside at locations that are nonorthologous at the sequence level.

Frith et al., Genome Research 2006

Genomic data from more species gives higher resolution


251000 coding exons of RefSeq genes

481 ultraconserved elements

94000 predicted regulatory regions (PRPs)

3900 putative transcriptional regulatory regions (pTRRs)


Alignability: the fraction that aligns with a designated comparison species
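One way to compute such a fraction for a single element is sketched below. It assumes that "aligns" means a base is covered by at least one alignment block to the comparison species, that coordinates are half-open intervals on the reference genome, and that overlapping blocks are counted once. The function name and the coordinates in the example are hypothetical; the study may instead count whole elements.

```python
def alignability(element, aligned_blocks):
    """Fraction of an element's bases covered by alignment blocks.

    element        -- (start, end) half-open interval on the reference
    aligned_blocks -- iterable of (start, end) half-open intervals that
                      align to the comparison species
    """
    start, end = element
    if end <= start:
        raise ValueError("element must have positive length")
    # Clip each block to the element and drop blocks that miss it.
    clipped = sorted(
        (max(s, start), min(e, end))
        for s, e in aligned_blocks
        if min(e, end) > max(s, start)
    )
    covered = 0
    cursor = start  # right edge of the region already counted
    for s, e in clipped:
        s = max(s, cursor)  # skip bases counted by an earlier block
        if e > s:
            covered += e - s
            cursor = e
    return covered / (end - start)


if __name__ == "__main__":
    # A 100-bp element with three partly overlapping alignment blocks.
    print(alignability((100, 200), [(90, 120), (110, 150), (180, 250)]))
```

Averaging this value over a class of elements (for example coding exons or ultraconserved elements) gives one alignability figure per comparison species.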


Human