Rosalind Elsie Franklin Biophysicist and crystallographer X-ray diffraction images of DNA Tobacco mosaic and polio viruses 1920-1958 (source: wikipedia)

Upload: gwendolyn-warner

Post on 20-Jan-2016



A Structural Split in the Human Genome

Clara S. M. Tang and Richard J. Epstein

PLoS One (2007) 2(7):e603

February 13, 2007

I. Elizabeth Cha


Introduction

PCIs: promoter-associated CpG islands

Mediate methylation-dependent gene silencing

Co-locate to transcriptionally active genes

60% of human genes contain PCIs


CpG Islands

Genomic regions containing high frequency of CG dinucleotides

CpG: cytidine-phosphodiester-guanosine

Formal definition:

At least 200bp

GC percentage >50%

CpG ratio >60%
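The three criteria above can be checked on a single sequence with a short sketch. It assumes that "CpG ratio" means the observed/expected CpG ratio, computed as (number of CpG × length) / (number of C × number of G), which is the common convention; the slide does not spell this out. The function names and default thresholds are illustrative only.

```python
def cpg_island_stats(seq):
    """Return (length, GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CG dinucleotides cannot overlap, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Assumed convention: expected CpG count = (#C * #G) / length
    expected = c * g / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return n, gc_fraction, obs_exp


def meets_island_criteria(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the slide's thresholds: >=200 bp, GC > 50%, CpG ratio > 60%."""
    n, gc_fraction, obs_exp = cpg_island_stats(seq)
    return n >= min_len and gc_fraction > min_gc and obs_exp > min_ratio
```

In practice these criteria are applied in sliding windows along a chromosome; the sketch only scores one candidate region.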


DNA Methylation


Materials and Methods

Sequence data and annotations

Determination of CpG island overlapping transcription start site

Housekeeping genes and paralogs of pseudogenes

Bimodal distribution of GC content

Gene expression data

Evolutionary rate determination

Principal component analysis


Sequence Data and Annotations

UCSC genomic assemblies, RefSeq dataset, Ensembl gene dataset

Human (hg18, 3/2006)

Mouse (mm6, 3/2006)

Fugu (fr1, 8/2002)

Fruit fly (dm2, 4/2004)

Worm (ce2, 3/2004)


Data Preprocessing

RepeatMask – Alu

Discard sequences:

Not commencing with ATG codons

Not terminating with canonical stop codons

Retain the longest genomic sequences containing identical exonic sequences
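A minimal sketch of the filtering steps above, under two assumptions that the slide does not state: the canonical stop codons are TAA, TAG and TGA, and "identical exonic sequences" is interpreted as records sharing exactly the same coding sequence. The record layout `(name, genomic_seq, cds)` and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed set of canonical stop codons


def has_canonical_ends(cds):
    """True if the coding sequence starts with ATG and ends with a stop codon."""
    cds = cds.upper()
    return len(cds) >= 6 and cds.startswith("ATG") and cds[-3:] in STOP_CODONS


def keep_longest_per_exonic_sequence(records):
    """records: iterable of (name, genomic_seq, cds) tuples (hypothetical layout).

    Drops records without canonical start/stop codons, then keeps, for each
    distinct coding sequence, the record with the longest genomic sequence.
    """
    best = {}
    for name, genomic, cds in records:
        if not has_canonical_ends(cds):
            continue
        key = cds.upper()
        if key not in best or len(genomic) > len(best[key][1]):
            best[key] = (name, genomic, cds)
    return list(best.values())
```

Repeat masking is not covered here; it would normally be done upstream with a dedicated tool.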


Determination of CpG Island Overlapping Transcription Start Site

Download CpG islands annotation (cpgIslandExt) from UCSC

Identify CpG islands overlapping with promoter regions

Map with RefGene annotation (promoter region: 200bp upstream to 500bp downstream of the transcription start site)
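The overlap test described above can be sketched as follows. The sketch assumes half-open, 0-based coordinates, that the 200bp/500bp window is measured from the transcription start site, and that "upstream" follows the gene's strand; the function names and the `islands` list of `(start, end)` pairs on the same chromosome are illustrative, not taken from the slides.

```python
def promoter_window(tss, strand, upstream=200, downstream=500):
    """Return the (start, end) promoter window around a TSS, strand-aware."""
    if strand == "+":
        return max(0, tss - upstream), tss + downstream
    # On the minus strand, upstream lies at higher coordinates
    return max(0, tss - downstream), tss + upstream


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def is_pci_positive(tss, strand, islands):
    """True if any CpG island overlaps the promoter window of this gene."""
    start, end = promoter_window(tss, strand)
    return any(overlaps(start, end, i_start, i_end) for i_start, i_end in islands)
```

For example, a plus-strand gene with its TSS at position 1000 has the window 800 to 1500, so an island spanning 700 to 850 marks the gene as PCI+.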


Data and Tools

502 housekeeping genes

1220 pseudogene paralogs

NOCOM program

SAGEmap

Homologue data

XSTAT


Results – PCI+ Genes

Housekeeping genes: higher GC content, lower intron length/number

Pseudogene paralogs: lower GC content, higher intron length/number

Functionally distinguishable


Table 1


Results – PCI- Genes

Higher evolutionary rate

Narrower expression breadth than PCI+ genes

More frequent tissue-specific inactivation


Figure 1 Biphasic GC/AT Distribution of PCI+ Genes

A. Distribution of GC content among different regions of genes

3’ UTR

5’ UTR

coding region

intronic


Figure 1 Biphasic GC/AT Distribution of PCI+ Genes (cont’d)

With ‘start’ CpG islands (CGI+)

Without ‘start’ CpG islands (CGI-)

B&C Proportion of genes among different GC groups.


Figure 2 GC Content of Promoter vs. Non-promoter CpG Island Overlapping Genes

All genes

Genes with medium total intron size (10-50kb)

Intronless genes

Genes with short total intron size (<10kb) and long intron size (>50kb)

PCI+: solid line; PCI-: dashed line


Figure 3 Distribution of Coding GC% of RefGenes with PCIs

pseudogenes House-keeping genes


Figure 4 Quantitative Comparison of Gene Subsets

L: low, GC<40%; H: high, GC>65%;

double dark, <0.001; single dark, <0.01; open, < 0.05


Figure 4 Quantitative Comparison of Gene Subsets (cont’d)

L: low, GC<40%; H: high, GC>65%;

double dark, <0.001; single dark, <0.01; open, < 0.05


Figure 4 Quantitative Comparison of Gene Subsets (cont’d)

L: low, GC<40%; H: high, GC>65%;

double dark, <0.001; single dark, <0.01; open, < 0.05


Figure 6 Model of human genomic evolution


Conclusions

PCIs:

Transcriptional regulators

Evolutionary accelerators to facilitate intron insertion

Effects of methylated PCIs on transcription and chromatin accelerate adaptive evolution towards biological complexity


Conclusions

Adaptive evolution of human genome:

Declining transcription of a subset of PCI+ genes

Predisposing to both CpG→TpA mutation and intron insertion

Biological complexity model:

Environmentally selected gains/losses of PCI methylation (+/-)

Polarizing PCI+ gene structures around a genomic core of ancestral PCI- genes


Discussion

AT-rich PCI+ gene vs. GC-rich PCI+ housekeeping gene:

Lower transcriptional activity

Higher intron number

Higher evolutionary rate

Loss of negative selection pressure


Discussion (cont’d)

PCI- genes vs. PCI+ genes:

Higher evolutionary rate

Lower expression breadth

Intron number relates more directly to PCI positivity


Figure 5 Principal component analysis (PCA)

A. PCA analysis using six variables at either 53% (left) or 59% (right) variance


Figure 5 Principal component analysis (PCA) (cont’d)

B. 2D dot plots C. 3D dot plots

GC-rich, blue; GC-poor, red