1 ee381v: genomic signal processing lecture #13. 2 the course so far gene finding dna genome...

21
1 EE381V: Genomic Signal Processing EE381V: Genomic Signal Processing Lecture #13 Lecture #13

Upload: candace-wright

Post on 11-Jan-2016

213 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

1

EE381V: Genomic Signal ProcessingEE381V: Genomic Signal Processing

Lecture #13Lecture #13

Page 2: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

2

The Course So FarThe Course So Far

Gene finding

DNA

Genome assembly

Regulatory motif discovery

Comparative genomics

Gene expression analysis Clusterdiscovery

Regulatory networks inference

Emerging network properties

Protein network analysis

SEQUENCES

INTERACTIONS

Sequence alignment

DONE

Page 3: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

Next: Biomolecular Detection Systems

3

ABI Prism ® 310 Genetic Analyzer Affymetrix GeneChip ® Roche LightCycler ®

DNA SequencingGene Expression

Profiling DNA Amplification

DONE

Page 4: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

Fundamentals of Detection

Biological Assay

Transducer/Interface

DetectionCircuitry

Detection

BiochemicalUncertainties

Fabrication andProcess Variation

Electronic NoiseQuantization Noise

Molecular BiologyBiochemistry

Fabrication/Synthesis Processes

CircuitDesign

SignalProcessing

BiologicalSample Data

ADC

DNA Microarrays Photodiode Image Sensor Analog to Digital Conversion

Biomolecular Detection Systems

• Bio-molecular detection systems, in general, have the following sub-blocks:

4

Example:

Page 5: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

* There are different transduction methods for “counting” the binding events, e.g., fluorescence, electrochemical, chemi-luminescence …

1 2 IncubationIncubation 3 DetectionDetectionSample exposureSample exposure

Cap

ture

DN

A p

rob

es

CapturedDNA

Target DNA

Lin

ker

5

Affinity-Based Biosensors

• Exploit the affinity of certain biomolecules for each other to capture and

detect:

Page 6: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

Parallel Affinity-Based Sensing

Biological sample

Capturing sites

Planar solidsurface

Capturingprobe (A)

Capturingprobe (B)

6

• For simultaneous detection of multiple targets, use affinity-based sensors in

parallel:

Page 7: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

DNA Microarrays

DNAProbe (A)

DNAProbe (B)

TargetDNA (B)

TargetDNA (A)

Fluorescent Tag

20-100µm

50-300µm

A

B

10,0

00 s

po

ts

7

• DNA microarrays are massively parallel affinity-based biosensor arrays:

Page 8: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

8

Applications of DNA Microarrays

• Recall central dogma:

• DNA microarrays interrupt the information flow and measure gene

expression levels

• frequently, the task is to measure relative changes in mRNA levels

• this gives information about the cell from which the mRNA is sampled

(e.g., cancer studies)

• Other applications:

• single nucleotide polymorphism (SNP) detection

• simultaneous detection of multiple viruses, biohazard / water testing

Page 9: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

En

erg

y

ΔE

ACGTACGT

TGCATGCA

ACGTACGT

TGCATGCA

GACTGACT

CTGACTGA

GACTGACT

CTGACTGA

GACTGACT

CTGACTGA

GACTGACT

CTGACTGA

Hydrogenbonds

Sensing in DNA Microarrays

9

• When complementary ssDNA molecules get close to each other, electrostatic

interactions (in form of hybridization bonds) may create dsDNA

• Because of thermal energy, the binding is a reversible stochastic process

• Relative stability of the dsDNA structures depend on the sequences

Page 10: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

Stability of dsDNA: Melting Temperature

En

erg

yACGTACGT

TGCATGCA

ACGTACGT

TGCATGCA

Hydrogenbonds

dsDNA50%

ssDNA50%

10

• The melting temperature (Tm) of a DNA fragment: the temperature at which

50% of the molecules form a stable double helix while the other 50% are

separated into single strand molecules

• Melting temperature is a function of DNA length, sequence content, salt

concentration, and DNA concentration

• For sequences shorter than 18 ntds, there is a simple (Wallace) heuristic:

)(4)(2 CGTATm

Page 11: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

There are many different methods to approximate melting temperature:

Wallace method (DNA strands less than 18mers):

)(4)(2 CGTATm

%GC method:

])/625]GC[%4.0]Na[10(log6.165.81 NTm

1. Breslauer, K.J. et al. (1986) Proc. Natl. Acad. Sci. USA. 83: 3746-3750. 2. Rychlik, W. et al. (1990) Nucleic Acids Res. 18: 6409-6412.

Nearest neighbor method [1]:

]Na[10log6.1615.273)4/ln(

kcal4.3

CRSA

HTm

ΔH is the sum of nearest neighbor enthalpy changesA is the initiation constant of -10.8 cal/K° mole for non-self complementary sequences, -12.4 cal/Ko mole for complementary sequencesΔS is the sum of nearest neighbor entropy changesR is the gas constant 1.987 cal/K°moleC is the concentration of DNA (generally fixed at 250 pM) [2]

Computing Melting Temperature

11

Page 12: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

Temperature

Pro

ba

bil

ity

of

ss

DN

A

T1 T2 T3 T4 T5

DN

A (

1)

DN

A (

2)D

NA

(3)

DN

A (

4)D

NA

(5)

Melting temperature is a function of DNA length, sequence content, salt concentration, and DNA concentration

DNA Melting Curve

Computing Melting Temperature

12

Page 13: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

En

erg

y

ΔE1

(Bond stability) 1 > (Bond stability) 2

ACGTTGCA

ACGTTGCA

ACGT

TGCA

ΔE2

AGGTTGCA

TGCA

AGGT

Not a perfectmatch

ΔE1 > ΔE2

13

• Depending on sequences, non-specific binding may also happen:

Nonspecific Binding (Cross-hybridization)

• So, non-specific binding is not as stable as specific binding

Page 14: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

14

Probe (A)Probe (B) Probe (C)

(A) (B) (C)

),,( CBAfSA

Non-specific binding in microarrays

• Interference may lead to erroneous conclusions

• Useful signal is affected by the interfering molecules, false positives

possible

• Non-specific binding (cross-hybridization) manifests as interference:

Page 15: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

The kinetics of reactions depends on:

1. The frequency that the reactive species (e.g., molecules) get close to each

other

2. If in close proximity, can thermal energy “facilitate” microscopic interactions

Ene

rgy

ΔE2

ΔE1

State 2Molecules activated

State 1 Molecules close

State 0Molecules bind

ssDNA (1) ssDNA (2)

ssDNA (1)

ssDNA (2)

dsDNA (1+2)

15

Road to a Model: Underlying Physics

Page 16: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

Arrhenius equation represents the dependence of the rate constant of areaction on temperature :

In its original form the pre-exponential factor and the activation energy are considered to be temperature-independent.

T

RTEaAek /

k

A

aE

)I(

Ene

rgy

ΔE2

ΔE1

kTEeAk /11

2

kTEEeAk /)(1 1

12

16

Road to a Model: Underlying Physics

Page 17: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

The reaction kinetics of two molecules has the following Markov model:

Ene

rgy

ΔE2

ΔE1

2 10

State 2Molecules activated

State 1 Molecules close

State 0Molecules bind

2/1

teAp kTE

/12,0

22

teAp kTEE /)(12,1

122

2/1

2,11,1 1 pp 2,00,0 1 pp

17

Markov Model

Page 18: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

2 10

State 2Molecules activated

State 1 Molecules close

State 0Molecules bind

2/1

teAp kTE

/12,0

22

teAp kTEE /)(12,1

122

2/1

2,11,1 1 pp 2,00,0 1 pp

A more practical model is the two-state Markov model

10

State 1 Molecules close

State 0Molecules bind

tkp 12,0

tkp 12,1tkp 11,1 1tkp 10,0 1

18

Markov Model

Page 19: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

A more practical model is the two-state Markov model

10

State 1 Molecules close

State 0Molecules bind

tkp 12,0

tkp 12,1tkp 11,1 1tkp 10,0 1

19

Markov Model

11

1

11

1

11

11

11

1

11

1

1

1

kk

kkk

k

tktk

tktk

kk

kkk

k

kTEkk

k/

11

1

1Re1

1

kTEkk

k/

11

1

1Re1

1

)II( )III(

Steady-state distribution:

Page 20: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

The Number of Captured Targets: A Random Process

Distribution of Captured Particles

CapturedTargets at

Time (hours)C

aptu

red

An

aly

tes

Cap

ture

d A

na

lyte

s)

(tx)]([ txE

)( 1txxσ2

1t

)]([ txE

xtxE )]([

xtxE )]([

The number of captured molecules forms a continuous-time Markov process

20

Page 21: 1 EE381V: Genomic Signal Processing Lecture #13. 2 The Course So Far Gene finding DNA Genome assembly Regulatory motif discovery Comparative genomics

Array Fabrication2

Sample Preparation1

Incubation3 Detection4

21

The steps involved in an experiment:

Microarrays