Lecture 9 Genomic wide analysis of nucleic acids.

Post on 15-Jan-2016


Page 1: Lecture 9 Genomic wide analysis of nucleic acids

Lecture 9: Genomic wide analysis of nucleic acids.

Page 2: Lecture 9 Genomic wide analysis of nucleic acids

A Genome Revolution in Biology and Medicine

• We are in the midst of a "Golden Era" of biology

• The Human Genome Project has produced a huge storehouse of data that will be used to change every aspect of biological research and medicine

• The revolution is about treating biology as an information science, not about specific biochemical technologies.

Page 3: Lecture 9 Genomic wide analysis of nucleic acids

Historical Milestones

Year      Milestone
1866      Mendel’s discovery of genes
1871      Discovery of nucleic acids
1951      First protein sequence (insulin)
1953      Double helix structure of DNA
1960s     Elucidation of the genetic code
1977      Advent of DNA sequencing
1975-79   First cloning of human genes
1986      Fully automated DNA sequencing
1995      First whole genome (Haemophilus influenzae)
1999      First human chromosome (Chr 22)
2000      Drosophila / Arabidopsis genomes
2001      Human and mouse genomes

Many more genomes since then!

Page 4: Lecture 9 Genomic wide analysis of nucleic acids

• Genomic data
– Whole genome data sets. According to http://www.ebi.ac.uk/genomes/ as of 30 Sept 2004:
• Archaea – 19
• Bacteria – 167
• Eukaryota – 36
• Organelles – 569
• Phages – 137
• Plasmids – 204
• Viroids – 36
• Viruses – 911
• TOTAL: 2079

Page 5: Lecture 9 Genomic wide analysis of nucleic acids

The ….omics

Page 6: Lecture 9 Genomic wide analysis of nucleic acids

Genomics

• The application of high-throughput automated technologies to molecular biology.

• The experimental study of complete genomes.

Page 7: Lecture 9 Genomic wide analysis of nucleic acids

Genomics Technologies

• Automated DNA sequencing
• Automated annotation of sequences
• DNA microarrays
– gene expression (measure RNA levels)
– single nucleotide polymorphisms (SNPs)
• Protein chips (SELDI, etc.)
• Protein-protein interactions

Page 8: Lecture 9 Genomic wide analysis of nucleic acids

New Types of Biological Data

• Microarrays - gene expression

• Multi-level maps: genetic, physical, sequence splicing, expression, function

• Networks of protein-protein interactions

• Cross-species relationships
– homologous genes
– chromosome organization (synteny)
– common regulatory sequences

Page 9: Lecture 9 Genomic wide analysis of nucleic acids

Biological Information

[Figure: kinds of biological information about the cell: genome sequence, mRNA expression, protein 2-D gel, mass spec., protein 3-D structure]

Page 10: Lecture 9 Genomic wide analysis of nucleic acids

What is gene expression?

• The amount of RNA produced from a gene.
• The level of RNA produced from a gene is controlled by:
– Transcription
– Degradation
• Transcriptome - expressed transcripts in a cell under defined experimental conditions.
– mRNA (5-10% of total RNA).
– rRNA, tRNA - make up most of total RNA

Page 11: Lecture 9 Genomic wide analysis of nucleic acids

Analysis of gene expression at the single gene level.

• Northern Blots
– Measure RNA levels by hybridization of a labeled probe to total RNA.
• Reporter Genes
– Use of an enzyme to measure the amount of transcription from a promoter.
• Quantitative RT-PCR.

Page 12: Lecture 9 Genomic wide analysis of nucleic acids

Assaying the regulation of 1000s of genes in a single experiment

• DNA microarrays
– DNA molecules printed at high density, used to determine the level of RNA or DNA in a sample.
– Can be thought of as “reverse Northern blots”
• Other technologies
– SAGE
– Microbeads

Page 13: Lecture 9 Genomic wide analysis of nucleic acids

DNA Microarrays

• Spotted DNA arrays (glass slides)
– Competitive binding of samples
– Fluorescent detection - Cy3 and Cy5
– Small sample sizes (10-30 µl)
– PCR or cDNA arrays
– Long oligonucleotide arrays
• Short oligonucleotide arrays
– e.g. Affymetrix
• DNA spotted onto nylon membranes (macroarrays)

Page 14: Lecture 9 Genomic wide analysis of nucleic acids

Applications of DNA microarrays

• Expression profiling
– Determining the relative levels of RNA in two or more samples.
• DNA/DNA hybridizations
– Investigate gene content between different strains
– Determine gene dosage
– 16S arrays - microbial communities (being developed).
• Identification of protein binding sites
– ChIP-chip: immunoprecipitation of protein/DNA complexes, then assaying those interactions with microarrays.

Page 15: Lecture 9 Genomic wide analysis of nucleic acids

cDNA spotted microarrays

Page 16: Lecture 9 Genomic wide analysis of nucleic acids

Labeling RNA or DNA with Cy3 or Cy5.

• Cy3 and Cy5 are the fluorescent dyes most often used to label samples for microarray analysis.
– Each absorbs light at one wavelength and emits at another.
– The excitation and emission spectra of the two dyes do not overlap significantly, so both can be measured on the same array.
– In array images Cy3 and Cy5 are usually false-colored green (Cy3) and red (Cy5) for ease of visualization.
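A two-color spot is commonly summarized as the log ratio of the two dye intensities. The sketch below assumes that convention (log2 of Cy5 over Cy3 after subtracting a background value); the function name, the `floor` guard and the intensity values are hypothetical and are not taken from the lecture.

```python
import math

def log2_ratio(cy5_intensity, cy3_intensity, background=0.0, floor=1.0):
    """Return log2(Cy5/Cy3) for one spot after background subtraction.

    `floor` is an illustrative guard that keeps both channels positive
    so the logarithm is always defined.
    """
    red = max(cy5_intensity - background, floor)
    green = max(cy3_intensity - background, floor)
    return math.log2(red / green)

# Hypothetical (Cy5, Cy3) intensities for three spots.
spots = {
    "geneA": (4000.0, 1000.0),  # more signal in the Cy5 sample
    "geneB": (500.0, 2000.0),   # more signal in the Cy3 sample
    "geneC": (1500.0, 1500.0),  # equal signal in both samples
}

ratios = {name: log2_ratio(cy5, cy3) for name, (cy5, cy3) in spots.items()}
for name, value in ratios.items():
    print(f"{name}: {value:+.2f}")
```

On this scale a positive value means more signal in the Cy5-labeled sample, a negative value means more in the Cy3-labeled sample, and zero means no difference.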

Page 17: Lecture 9 Genomic wide analysis of nucleic acids
Page 18: Lecture 9 Genomic wide analysis of nucleic acids

Affymetrix Gene Chips

Page 19: Lecture 9 Genomic wide analysis of nucleic acids

Microarray Experiment - labeling, hybridizing, scanning

Page 20: Lecture 9 Genomic wide analysis of nucleic acids
Page 21: Lecture 9 Genomic wide analysis of nucleic acids

Each gene on an Affy chip is represented by a probe set

Affymetrix = Oligonucleotide Microarray

Page 22: Lecture 9 Genomic wide analysis of nucleic acids

Rationale of Affy analysis

- MM (mismatch) probes are used to measure background signals due to non-specific sources and scanner offset.

- Using an MM probe as an estimate of background seems great in theory.

- The expression value for a gene is a combination of the (PM-MM) signals, where PM is the perfect-match probe, for each of the probes in the probe set (i.e. the average)
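The averaging of (PM-MM) differences described above can be sketched in a few lines. This is a minimal illustration of that one idea, not the vendor's actual algorithm; the function name and the probe intensities are hypothetical.

```python
def probe_set_expression(pm_signals, mm_signals):
    """Average the (PM - MM) differences across one probe set.

    PM = perfect-match probe intensity, MM = mismatch probe intensity.
    """
    if len(pm_signals) != len(mm_signals) or not pm_signals:
        raise ValueError("PM and MM lists must be non-empty and equal in length")
    differences = [pm - mm for pm, mm in zip(pm_signals, mm_signals)]
    return sum(differences) / len(differences)

# Hypothetical intensities for a probe set with four probe pairs.
pm = [1200.0, 980.0, 1500.0, 1100.0]
mm = [200.0, 180.0, 300.0, 100.0]

expression = probe_set_expression(pm, mm)
print(expression)  # 1000.0
```

Note that a plain average can go negative when MM exceeds PM for several probes, which is one reason the MM-as-background idea works better in theory than in practice.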

Page 23: Lecture 9 Genomic wide analysis of nucleic acids

Microarray Data Analysis

• Data mining and visualization
• Controls and normalization of results
• Statistical validation
• Linkage between gene expression data and gene sequence/function/metabolic pathways databases
• Clustering and pattern detection
• Discovery of common sequences in co-regulated genes
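As one concrete example of the normalization step listed above, the sketch below median-centers the log ratios of an array so that the typical gene has a value of zero. Median centering is only one of several normalization schemes and is chosen here as an assumption; the values are hypothetical.

```python
from statistics import median

def median_center(log_ratios):
    """Shift all log ratios so that their median becomes zero."""
    center = median(log_ratios)
    return [value - center for value in log_ratios]

# Hypothetical log2 ratios from one array with a global dye bias of +0.5.
raw = [0.5, 1.5, -0.5, 0.5, 2.5]
normalized = median_center(raw)
print(normalized)  # [0.0, 1.0, -1.0, 0.0, 2.0]
```

The underlying assumption is that most genes do not change between the two samples, so a nonzero median reflects a technical bias rather than biology.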

Page 24: Lecture 9 Genomic wide analysis of nucleic acids
Page 25: Lecture 9 Genomic wide analysis of nucleic acids

Regulons and Stimulons

• Operon - group of genes co-expressed on a single transcript.
– One location of the genome
• Regulon - genes that are regulated by a single transcription factor.
– Genes and operons throughout the genome
• Stimulon - collection of genes that are regulated in response to environmental changes.
– Can be multiple regulons affected at once.
• Regulatory network - alternative term for regulon.
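The nesting of these terms can be made explicit with a small data-structure sketch: operons group genes, a regulon collects the operons controlled by one transcription factor, and a stimulon is the union of the regulons that respond to one stimulus. All names below (operons, genes, transcription factors, the stimulus) are hypothetical placeholders.

```python
# Operon -> genes transcribed together on one transcript.
operons = {
    "opA": ["geneA1", "geneA2"],
    "opB": ["geneB1"],
    "opC": ["geneC1", "geneC2"],
}

# Transcription factor -> operons it regulates (its regulon).
regulons = {
    "TF1": ["opA", "opB"],
    "TF2": ["opC"],
}

# Environmental stimulus -> transcription factors that respond to it.
stimulons = {
    "heat_shock": ["TF1", "TF2"],
}

def regulon_genes(tf):
    """All genes controlled by one transcription factor."""
    return {gene for operon in regulons[tf] for gene in operons[operon]}

def stimulon_genes(stimulus):
    """All genes in every regulon affected by one stimulus."""
    return set().union(*(regulon_genes(tf) for tf in stimulons[stimulus]))

print(sorted(regulon_genes("TF1")))
print(sorted(stimulon_genes("heat_shock")))
```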

Page 26: Lecture 9 Genomic wide analysis of nucleic acids
Page 27: Lecture 9 Genomic wide analysis of nucleic acids
Page 28: Lecture 9 Genomic wide analysis of nucleic acids
Page 29: Lecture 9 Genomic wide analysis of nucleic acids
Page 30: Lecture 9 Genomic wide analysis of nucleic acids

Identifying genes whose expression changes at specific stages of the cell cycle

Page 31: Lecture 9 Genomic wide analysis of nucleic acids
Page 32: Lecture 9 Genomic wide analysis of nucleic acids
Page 33: Lecture 9 Genomic wide analysis of nucleic acids

[Figure: Microarray analysis of 150 damage-regulated mRNAs after a single unrepaired HO-induced DSB. Time points: 1, 2, 3, 4 and 6 hr. Labels: HO; genes YCR033W to YCR045C, including FEN3, RRP43, RBK1, PHO87, BUD5, MAT2, MAT1 and TSM1. Annotation on the figure: 4 kb/hr.]

Audrey Gasch, Moreshwar Vaze

Page 34: Lecture 9 Genomic wide analysis of nucleic acids

Circadian Rhythms

Genes whose expression changes during the day in fruit flies

[Figure: expression profiles across three day/night cycles]

Page 35: Lecture 9 Genomic wide analysis of nucleic acids

Cancer can be classified from the transcriptome

Page 36: Lecture 9 Genomic wide analysis of nucleic acids

Bioinformatics

• Genomics produces high-throughput, high-quality data, and bioinformatics provides the analysis and interpretation of these massive data sets.

• It is impossible to separate genomics laboratory technologies from the computational tools required for data analysis.

Page 37: Lecture 9 Genomic wide analysis of nucleic acids
Page 38: Lecture 9 Genomic wide analysis of nucleic acids
Page 39: Lecture 9 Genomic wide analysis of nucleic acids
Page 40: Lecture 9 Genomic wide analysis of nucleic acids
Page 41: Lecture 9 Genomic wide analysis of nucleic acids
Page 42: Lecture 9 Genomic wide analysis of nucleic acids
Page 43: Lecture 9 Genomic wide analysis of nucleic acids
Page 44: Lecture 9 Genomic wide analysis of nucleic acids
Page 45: Lecture 9 Genomic wide analysis of nucleic acids
Page 46: Lecture 9 Genomic wide analysis of nucleic acids
Page 47: Lecture 9 Genomic wide analysis of nucleic acids

What types of data can we use to build a transcriptional network?

-Protein-Protein interaction data

-Expression data

-ChIP data

Page 48: Lecture 9 Genomic wide analysis of nucleic acids

ChIP-on-chip

Page 49: Lecture 9 Genomic wide analysis of nucleic acids
Page 50: Lecture 9 Genomic wide analysis of nucleic acids
Page 51: Lecture 9 Genomic wide analysis of nucleic acids
Page 52: Lecture 9 Genomic wide analysis of nucleic acids

Comparative Genomics

• The assumption that underlies comparative genomics is that the two genomes had a common ancestor and that each organism is the product of that ancestor and the action of evolution.

• Evolution can be broadly thought of as the combination of two processes: mutational forces that generate random mutations in the genome sequence, and selection pressures that

1. eliminate random mutations (negative selection),

2. have no effect on mutations (neutral selection), or

3. increase the frequency of mutant alleles in the population as a result of a gain in fitness (positive selection).

• The combined action of mutation and selection is represented generally by a RATE MATRIX of base-pair changes between the two observed genomes.
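To make the idea of a matrix of base-pair changes concrete, the sketch below tabulates how often each base in one aligned sequence is paired with each base in the other, then normalizes each row to observed frequencies. This is only the empirical starting point: a true rate matrix additionally needs an evolutionary model and a divergence time, which are not covered here. The two short sequences are hypothetical and assumed to be already aligned.

```python
from collections import Counter

BASES = "ACGT"

def substitution_counts(seq_a, seq_b):
    """Count aligned base pairs (a, b), skipping gaps and ambiguous symbols."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in BASES and b in BASES:
            counts[(a, b)] += 1
    return counts

def row_normalized(counts):
    """Convert counts to per-row frequencies: P(base in B | base in A)."""
    matrix = {}
    for a in BASES:
        total = sum(counts[(a, b)] for b in BASES)
        matrix[a] = {b: (counts[(a, b)] / total if total else 0.0) for b in BASES}
    return matrix

# Hypothetical aligned fragments from two genomes.
genome_1 = "ACGTACGTAC"
genome_2 = "ACGTATGTAA"

counts = substitution_counts(genome_1, genome_2)
matrix = row_normalized(counts)
for a in BASES:
    print(a, [round(matrix[a][b], 2) for b in BASES])
```

The diagonal entries measure conservation; off-diagonal entries show which changes were observed between the two sequences.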

Page 53: Lecture 9 Genomic wide analysis of nucleic acids

Comparative Genomics

[Figure: phylogenetic tree of Human, Mouse, Rat and C. elegans]

Evolutionary relationship between metazoans that are sequenced, or due for sequencing. Evolutionary distances are in millions of years.

Page 54: Lecture 9 Genomic wide analysis of nucleic acids

Comparative Genomics

• Comparative genomics may be defined as the derivation of genomic information by comparing the information content of the genome sequences of two or more species.

Page 55: Lecture 9 Genomic wide analysis of nucleic acids

http://www.ornl.gov/TechResources/Human_Genome/graphics/slides/ttmousehuman.html

The similarity is such that human chromosomes can be cut (schematically at least) into about 150 pieces (only about 100 are large enough to appear here), then reassembled into a reasonable approximation of the mouse genome.

Page 56: Lecture 9 Genomic wide analysis of nucleic acids
Page 57: Lecture 9 Genomic wide analysis of nucleic acids

Harnessing the genome to answer real problems

How do we control infectious disease?

How do we slow or stop the effects of cancer?

How can we detect and treat genetic disorders?

Only 2% of human diseases are due to single gene defects; the rest involve networks of gene expression.

Most pharmaceutical drugs act on individual proteins or sets of proteins.

Page 58: Lecture 9 Genomic wide analysis of nucleic acids

Proteomics

The study of the ‘proteome’

While an organism has only one genome, it has many transcriptomes, proteomes and metabolomes

Page 59: Lecture 9 Genomic wide analysis of nucleic acids

mRNA level ≠ expressed protein level, nor does it indicate the nature of the functional protein product

Genomic sequence → (transcriptional control) → mRNA → (translational control) → protein product → (post-translational control) → functional protein product

Page 60: Lecture 9 Genomic wide analysis of nucleic acids

Temporal Changes in mRNA and protein

When you measure expression affects what you find

[Figure: protein level and gene expression plotted against time (t)]

Page 61: Lecture 9 Genomic wide analysis of nucleic acids

Does mRNA level correlate with protein level?

Anderson & Seilhamer, Electrophoresis 1997, 18:533-537

Anderson & Anderson, Electrophoresis 1998, 19:1853-1861

[Figure: two log-log scatter plots of mRNA level against protein level]
– 20 liver proteins and corresponding mRNAs: protein (2D gels) vs. mRNA (EST clones), R = 0.48
– Glutathione-S-transferase in 60 human cell lines (lung, ovarian, CNS, leukemia, renal, melanoma, breast), from Tew et al. 1996: protein (affinity-HPLC) vs. mRNA (Northern), R = 0.43
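An R value of the kind quoted on this slide can be computed as a Pearson correlation. The sketch below applies it to log10-transformed abundances, matching the logarithmic axes of the plots; whether the original authors transformed their data this way is an assumption, and the abundance values are invented purely for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)

# Invented protein and mRNA abundances for five genes (arbitrary units).
protein = [0.5, 2.0, 8.0, 30.0, 90.0]
mrna = [3.0, 1.0, 40.0, 15.0, 600.0]

r = pearson_r([math.log10(v) for v in protein],
              [math.log10(v) for v in mrna])
print(f"R = {r:.2f}")
```

Values of R near 0.4 to 0.5, as on the slide, mean that mRNA level explains only a modest part of the variation in protein level.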

Page 62: Lecture 9 Genomic wide analysis of nucleic acids

Genomics, proteomics era.

-Lots of data (lots of real data and lots of noise!). Needs validation!

-Dangers:
+ Become too descriptive and reductionist
+ Forget about the biological problem

Page 63: Lecture 9 Genomic wide analysis of nucleic acids

Success is the ability to go from failure to failure without losing your enthusiasm.
- Winston Churchill, 1874-1965

Year by year we are becoming better equipped to accomplish the things we are striving for. But what are we actually striving for?
- Bertrand de Jouvenel, 1903-1987