bioinformation technology: case studies in bioinformatics and biocomputing with dna chips byoung-tak...

Bioinformation Technology: Case Studies in Bioinformatics and Biocomputing with DNA Chips

Byoung-Tak Zhang

Center for Bioinformation Technology (CBIT)

Seoul National University

[email protected]

http://bi.snu.ac.kr/~btzhang

2

Outline

Bioinformation Technology
Bioinformatics
DNA Chip Data Analysis: IT for BT
DNA Computing: BT for IT
DNA Computing with DNA Chips
Outlook

3

Human Genome Project

Genome Health Implications

A New Disease Encyclopedia

New Genetic Fingerprints

New Diagnostics

New Treatments

Goals:
• Identify the approximately 40,000 genes in human DNA
• Determine the sequences of the 3 billion bases that make up human DNA
• Store this information in databases
• Develop tools for data analysis
• Address the ethical, legal and social issues that arise from genome research

4

Bioinformatics vs. Biocomputing

BT and IT

Bioinformatics (IT for BT)

Biocomputing (BT for IT)

5

Bioinformatics

6

What is Bioinformatics?

Bioinformatics vs. Computational Biology

Bioinformatik (in German): biology-based computer science as well as bioinformatics (in English)

Informatics: computer science
Bio: molecular biology

Bioinformatics – solving problems arising from biology using methodology from computer science.

7

Molecular Biology: Flow of Information

DNA → RNA → Protein → Function

[Fig.] DNA: ACTGGA AGCTTATC; Protein: Phe Cys Lys Cys Asp Cys Arg Ser Ala Leu

8

DNA (Gene) → RNA → Protein

[Fig.] A gene flanked by control statements (TATA and start; termination and stop) is transcribed by RNA polymerase into mRNA (with a ribosome binding site, 5’ UTR and 3’ UTR), which is translated by the ribosome into protein.
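The transcription and translation steps in the figure can be sketched in a few lines of Python. This is a simplified illustration, not a gene model: it assumes the input is the coding strand (so the mRNA is the same sequence with T replaced by U), it ignores introns, UTR boundaries and the ribosome binding site, and it uses only a partial codon table (codons missing from the table are reported as `Xaa`). The function names and the example gene are hypothetical.

```python
# Partial standard codon table: only the codons used in the example below.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "UGU": "Cys", "AAA": "Lys", "GAU": "Asp",
    "CGU": "Arg", "UCU": "Ser", "GCU": "Ala", "CUU": "Leu",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (T -> U; introns ignored)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> list:
    """Translate from the first AUG start codon up to (not including) a stop codon."""
    start = mrna.find("AUG")
    if start < 0:
        return []
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3], "Xaa")  # Xaa: codon not in this partial table
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


# Hypothetical gene encoding Met followed by the peptide shown on the slide.
gene = "ATGTTTTGTAAATGTGATTGTCGTTCTGCTCTTTAA"
print(translate(transcribe(gene)))
```

Starting at the first AUG mirrors the usual start-codon rule; a real gene finder would also have to locate the promoter, splice sites and the correct reading frame.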

9

Nucleotide and Protein Sequence

aacctgcgga aggatcattaccgagtgcgg gtcctttgggcccaacctcc catccgtgtctattgtaccc tgttgcttcggcgggcccgc cgcttgtcggccgccggggg ggcgcctctgccccccgggc ccgtgcccgccggagacccc aacacgaacactgtctgaaa gcgtgcagtctgagttgatt gaatgcaatcagttaaaact ttcaacaatggatctcttgg ttccggctgc tattgtaccc tgttgcttcggcgggcccgc cgcttgtcggccgccggggg ggcgcctctgccccccgggc ccgtgcccgccggagacccc tgttgcttcggcgggcccgc cgcttgtcggccgccggggg cggagacccc

gcgggcccgc cgcttgtcggccgccggggg ggcgcctctgccccccgggc ccgtgcccgcaacctgcgga aggatcattaccgagtgcgg gtcctttgggcccaacctcc catccgtgtctattgtaccc tgttgcttcggcgggcccgc cgcttgtcggagttaaaact ttcaacaatggatctcttgg ttccggctgc tattgtaccc tgttgcttcggcgggcccgc cgcttgtcggccgccggggg ggcgcctctgccccccgggc ccgtgcccgccggagacccc tgttgcttcggcgggcccgc cgcttgtcggccgccggggg cggagacccc gcgggcccgc cgcttgtcggccgccggggg ggcgcctctg

cgcttgtcgg ccgccgggggccccccgggc ccgtgcccgccggagacccc aacacgaacactgtctgaaa gcgtgcagtctgagttgatt gaatgcaatcagttaaaact ttcaacaatggatctcttgg aacctgcggaccgagtgcgg gtcctttgggcccaacctcc catccgtgtctattgtaccc tgttgcttcggcgggcccgc cgcttgtcggccgccggggg ggcgcctctgagttaaaact ttcaacaatggatctcttgg ttccggctgc tattgtaccc tgttgcttcggcgggcccgc cgcttgtcggccgccggggg ggcgcctctgccccccgggc ccgtgcccgccggagacccc tgttgcttcg

SQ sequence 1344 BP; 291 A; C; 401 G; 278 T; 0 other

DNA (Nucleotide) Sequence

CG2B_MARGL Length: 388 April 2, 1997 14:55 Type: P Check: 9613 .. 1 MLNGENVDSR IMGKVATRAS SKGVKSTLGT RGALENISNV ARNNLQAGAK KELVKAKRGM TKSKATSSLQ SVMGLNVEPM EKAKPQSPEP MDMSEINSAL EAFSQNLLEG VEDIDKNDFD NPQLCSEFVN DIYQYMRKLE REFKVRTDYM TIQEITERMR SILIDWLVQV HLRFHLLQET LFLTIQILDR YLEVQPVSKN KLQLVGVTSM LIAAKYEEMY PPEIGDFVYI TDNAYTKAQI RSMECNILRR LDFSLGKPLC IHFLRRNSKA GGVDGQKHTM AKYLMELTLP EYAFVPYDPS EIAAAALCLS SKILEPDMEW GTTLVHYSAY SEDHLMPIVQ KMALVLKNAP TAKFQAVRKK YSSAKFMNVS TISALTSSTV MDLADQMC

Protein (Amino Acid) Sequence

10

Some Facts

10¹⁴ cells in the human body.
3 × 10⁹ letters in the DNA code in every cell in your body.
DNA differs between humans by 0.2% (1 in 500 bases).
Human DNA is 98% identical to that of chimpanzees.
97% of DNA in the human genome has no known function.
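The "0.2% (1 in 500 bases)" figure is easy to turn into an absolute number, and percent identity is the quantity behind statements like the 98% human-chimpanzee figure. The sketch below assumes two already-aligned, equal-length sequences with no gaps; real genome comparisons need an alignment step first, and the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two aligned, gap-free sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def expected_differences(genome_length: float, percent_difference: float) -> float:
    """Number of differing bases implied by a percentage difference."""
    return genome_length * percent_difference / 100.0


# 3 x 10^9 bases differing at 0.2% (1 in 500) gives about 6 x 10^6 differing bases.
print(expected_differences(3e9, 0.2))
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 9 of 10 positions match
```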

11

Topics in Bioinformatics

Structure analysis: protein structure comparison, protein structure prediction, RNA structure modeling

Pathway analysis: metabolic pathways, regulatory networks

Sequence analysis: sequence alignment, structure and function prediction, gene finding

Expression analysis: gene expression analysis, gene clustering

12

Extension of Bioinformatics Concept

Genomics: functional genomics, structural genomics

Proteomics: large scale analysis of the proteins of an organism

Pharmacogenomics: developing new drugs that will target a particular disease

Microarray: DNA chip, protein chip

13

Applications of Bioinformatics

Drug design
Identification of genetic risk factors
Gene therapy
Genetic modification of food crops and animals
Biological warfare, crime, etc.

Personal Medicine? E-Doctor?

14

Bioinformatics as Information Technology

[Fig.] Bioinformatics and related IT components: Database; Information Retrieval; GenBank, SWISS-PROT; Hardware; Supercomputing; Agent (information filtering, monitoring agent); Machine Learning (clustering, rule discovery, pattern recognition); Algorithm (sequence alignment); Biomedical text analysis.

15

Background of Bioinformatics

Biological information infrastructure: biological information management systems, analysis software tools, communication networks for biological research

Massive biological databases: DNA/RNA sequences, protein sequences, genetic map linkage data, biochemical reactions and pathways

Need to integrate these resources to model biological reality and exploit the biological knowledge that is being gathered.

16

Areas and Workflow of Bioinformatics

Structural Genomics

Functional Genomics
Proteomics
Pharmacogenomics

AGCTAGTTCAGTACATGGATCCATAAGGTACTCAGTCATTACTGCAGGTCACTTACGATATCAGTCGATCACTAGCTGACTTACGAGAGT

Microarray (Biochip)

Infrastructure of Bioinformatics

17

DNA Chip Data Analysis: IT for BT

18

cDNA Microarray

[Fig.] cDNA clones (probes) → PCR product amplification and purification → Printing (0.1 nl/spot) → Microarray → Hybridize target (mRNA) to microarray → Excitation (Laser 1, Laser 2) → Emission → Scanning → Analysis: overlay images and normalize
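The final "overlay images and normalize" step is usually done on log ratios of the two channels. The sketch below shows one common, simple choice, not necessarily the one used here: subtract an assumed constant background per channel, take log2(red/green) per spot, and median-center the log ratios so that a typical spot has log ratio 0. Real pipelines often use intensity-dependent methods (e.g. lowess) instead; the function names and intensities are hypothetical.

```python
import math
from statistics import median


def log_ratios(red, green, bg_red=0.0, bg_green=0.0):
    """log2(red/green) per spot after background subtraction; NaN if a channel is <= 0."""
    out = []
    for r, g in zip(red, green):
        r, g = r - bg_red, g - bg_green
        out.append(math.log2(r / g) if r > 0 and g > 0 else float("nan"))
    return out


def median_center(ratios):
    """Global normalization: shift log ratios so that their median is 0 (NaNs are kept)."""
    m = median(x for x in ratios if not math.isnan(x))
    return [x - m for x in ratios]


red = [220.0, 120.0, 420.0, 30.0]
green = [120.0, 120.0, 120.0, 10.0]
print(median_center(log_ratios(red, green, bg_red=20.0, bg_green=20.0)))
```

Median centering assumes most genes are not differentially expressed, which is the usual justification for global normalization.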

19

The Complete Microarray Bioinformatics Solution

Data Management, Databases, Statistical Analysis, Image Processing, Automation, Data Mining, Cluster Analysis

20

DNA Chip Applications

Gene discovery: gene/mutated gene Growth, behavior, homeostasis …

Disease diagnosis Cancer classification

Drug discovery: Pharmacogenomics Toxicological research: Toxicogenomics

21

Disease Diagnosis: Cancer Classification with DNA Microarray

- cDNA microarray data of 6567 gene expression levels [Khan ’01].

- Filter genes that are correlated with the classification of cancer using PCA and ANN learning.

- Hierarchical clustering of the DNA chip samples based on the filtered 96 genes.

- Disease diagnosis based on DNA chip.

[Fig.] Flowchart of the experimental procedure.

22

Disease Diagnosis: Hierarchical Clustering Based on Gene Expression Levels

- Hierarchical clustering of cancer by 96 gene expression levels.

- The relation between gene expression and cancer category.

- Four cancer diagnostic categories

[Fig.] The dendrogram of four cancer clusters and gene expression levels (row: genes, column: samples).
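The hierarchical clustering behind a dendrogram like this one can be sketched as agglomerative clustering with average linkage and a correlation-based distance (1 − Pearson r), a common setup for expression profiles. The slide does not state the linkage or distance used, so both are assumptions here, as are the function names and the toy profiles; the brute-force pair search is meant for small examples only.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    ma, mb = mean(a), mean(b)
    cov = mean((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (pstdev(a) * pstdev(b))


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance).

    Leaves are 0..n-1; the cluster created by the m-th merge gets id n + m.
    """
    def dist(i, j):
        return 1.0 - pearson(profiles[i], profiles[j])

    clusters = {i: [i] for i in range(len(profiles))}
    merges, next_id = [], len(profiles)
    while len(clusters) > 1:
        # Average linkage: mean pairwise distance between members of two clusters.
        (a, b), d = min(
            (((a, b), mean(dist(i, j) for i in clusters[a] for j in clusters[b]))
             for a, b in combinations(clusters, 2)),
            key=lambda item: item[1],
        )
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


# Two rising and two falling hypothetical profiles (rows: genes, columns: samples).
genes = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [8, 6, 4, 2]]
for merge in average_linkage(genes):
    print(merge)
```

Using 1 − r groups genes by the shape of their profiles rather than their absolute level, which is why the two rising genes merge first even though one is twice the other.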

23

AI Methods for DNA Chip Data Analysis

Classification and prediction: ANNs, support vector machines, etc. (application: disease diagnosis)

Cluster analysis: hierarchical clustering, probabilistic clustering, etc. (application: functional genomics)

Genetic network analysis: differential models, relevance networks, Bayesian networks, etc. (applications: functional genomics, drug design, etc.)

24

Cluster Analysis

[DNA microarray dataset]

[Gene Cluster 1]

[Gene Cluster 2]

[Gene Cluster 3]

[Gene Cluster 4]

http://www.gene-chips.com/sample1.jpg




25

Methods for Cluster Analysis

Hierarchical clustering [Eisen ’98]
Self-organizing maps [Tamayo ’99]
Bayesian clustering [Barash ’01]
Probabilistic clustering using latent variables [Shin ’00]
Non-negative matrix factorization [Shin ’00]
Generative topographic mapping [Shin ’00]

26

Clustering of Cell Cycle-regulated Genes in S. cerevisiae (the Yeast)

Identify cell cycle-regulated genes by cluster analysis.
104 genes are already known to be cell cycle-regulated.
The known genes are clustered into 6 clusters.
Cluster the 104 known genes and other genes together.
Genes in the same cluster tend to fall into similar functional categories.

[Fig.] 104 known gene expression levels according to the cell cycle(row: time step, column: gene).

27

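Probabilistic Clustering Using Latent Variables

g_i: ith gene (g_ij: its expression level at the jth time step)
z_k: kth cluster
t_j: jth time step
p(g_i|z_k): generating probability of the ith gene given the kth cluster
v_k = p(t|z_k): prototype of the kth cluster

$$p(z_k \mid g_i) = \frac{p(z_k)\,p(g_i \mid z_k)}{\sum_{k'} p(z_{k'})\,p(g_i \mid z_{k'})}$$

$$f(\mathbf{g}, \mathbf{t}, \mathbf{z}) = \sum_i \sum_j g_{ij} \log \sum_k p(z_k)\,p(g_i \mid z_k)\,p(t_j \mid z_k) \qquad (*)$$

$$\mathrm{similarity}(\mathbf{x}_i, \mathbf{v}_k) = \sum_j x_{ij}\, v_{kj}$$

(*): objective function (maximized by EM)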
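The EM algorithm for the objective (*) can be sketched with NumPy. This is an illustration under our own assumptions, not the authors' code: the expression matrix G (genes × time steps) is treated as non-negative weights g_ij, as the objective requires, parameters are initialized randomly, and a fixed number of iterations is run. The E-step computes p(z_k | g_i, t_j); the M-step re-estimates p(z_k), p(g_i | z_k) and the prototypes v_k = p(t | z_k). Each gene is then assigned to the cluster with the largest p(z_k | g_i). Function and variable names are hypothetical.

```python
import numpy as np


def latent_cluster(G, K, n_iter=100, seed=0, eps=1e-300):
    """EM for f = sum_ij g_ij log sum_k p(z_k) p(g_i|z_k) p(t_j|z_k), with G >= 0."""
    rng = np.random.default_rng(seed)
    n_genes, n_times = G.shape
    pz = np.full(K, 1.0 / K)                       # p(z_k)
    pg_z = rng.random((n_genes, K))
    pg_z /= pg_z.sum(axis=0)                       # p(g_i | z_k)
    pt_z = rng.random((n_times, K))
    pt_z /= pt_z.sum(axis=0)                       # p(t_j | z_k), i.e. prototypes v_k
    history = []
    for _ in range(n_iter):
        # E-step: joint[i, j, k] = p(z_k) p(g_i | z_k) p(t_j | z_k)
        joint = pz[None, None, :] * pg_z[:, None, :] * pt_z[None, :, :]
        mix = joint.sum(axis=2) + eps
        history.append(float((G * np.log(mix)).sum()))   # objective f before this update
        post = joint / mix[:, :, None]                    # p(z_k | g_i, t_j)
        # M-step: expected counts weighted by the data g_ij
        w = G[:, :, None] * post
        totals = w.sum(axis=(0, 1))
        pg_z = w.sum(axis=1) / totals
        pt_z = w.sum(axis=0) / totals
        pz = totals / totals.sum()
    # Cluster membership p(z_k | g_i) is proportional to p(z_k) p(g_i | z_k).
    resp = pz * pg_z
    resp /= resp.sum(axis=1, keepdims=True)
    return resp.argmax(axis=1), pt_z.T, history


# Hypothetical data: 5 rising and 5 falling positive profiles over 6 time steps.
rng = np.random.default_rng(1)
rising = np.linspace(1.0, 9.0, 6)
G = np.vstack([rising + rng.random(6) for _ in range(5)]
              + [rising[::-1] + rng.random(6) for _ in range(5)])
labels, prototypes, history = latent_cluster(G, K=2)
print(labels)
```

The recorded objective should never decrease from one iteration to the next, which is a cheap sanity check on any EM implementation.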

28

Experimental Result: Identify Cell Cycle-Regulated Genes

Clustering result

[Table] Clustering result with α-factor arrest data. Genes with a high probability of being cell cycle-regulated were found in 4 clusters.

29

Experimental Result: Prototype Expression Levels of Found Clusters

[Fig.] Prototype expression levels of genes found to be cell cycle-regulated (4 clusters).

• The genes in the same cluster show similar expression patterns during the cell cycle.
• The genes with similar expression patterns are likely to have correlated functions.

30

Clustering Using Non-negative Matrix Factorization (NMF)

NMF (non-negative matrix factorization)

$$G \approx WH, \qquad G_{i\mu} \approx (WH)_{i\mu} = \sum_{a=1}^{r} W_{ia} H_{a\mu}, \qquad G_{i\mu},\, W_{ia},\, H_{a\mu} \ge 0$$

G: gene expression data matrix
W: basis matrix (prototypes)
H: encoding matrix (in low dimension)

NMF as a latent variable model: the observed variables g1, g2, …, gn are generated from the latent variables h1, h2, …, hr through W, i.e. g = Wh.
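The factorization G ≈ WH can be computed with the Lee-Seung multiplicative update rules for the squared Euclidean error; the slide does not say which NMF algorithm or cost was used, so this choice is an assumption. We also assume each column of G is one gene's non-negative expression profile over time, so the r columns of W act as prototype profiles and column j of H is gene j's low-dimensional encoding; assigning each gene to its largest encoding coefficient is one simple way to read off clusters. Names and data are hypothetical.

```python
import numpy as np


def nmf(G, r, n_iter=500, seed=0, eps=1e-9):
    """Factor non-negative G (m x n) as W (m x r) @ H (r x n) with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = G.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ G) / (W.T @ W @ H + eps)
        W *= (G @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.sum((G - W @ H) ** 2)))
    return W, H, errors


# Hypothetical data: 6 time steps x 8 genes, mixed from two non-negative prototypes.
rng = np.random.default_rng(1)
prototypes = np.array([[5, 4, 3, 2, 1, 0], [0, 1, 2, 3, 4, 5]], dtype=float).T  # 6 x 2
G = prototypes @ rng.random((2, 8))
W, H, errors = nmf(G, r=2)
clusters = H.argmax(axis=0)   # cluster of each gene (column of G)
print(errors[0], errors[-1], clusters)
```

Because the updates only multiply by non-negative ratios, non-negativity never has to be enforced explicitly, which is the main appeal of this scheme.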

31

Experimental Result: Five Clusters Found by NMF

[Fig.] 5 prototype expression levels during the cell cycle (x-axis: time step in cell cycle, 1 to 18; y-axis: expression level).

32

Clustering Using Generative Topographic Mapping (GTM)

• GTM: a nonlinear, parametric mapping y(x;W) from a latent space to a data space.

[Fig.] The mapping y(x;W) takes a grid in the latent space (coordinates x1, x2) into the data space (coordinates t1, t2, t3). Generation runs from the latent space to the data space; visualization runs in the opposite direction.
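The GTM training loop can be sketched with NumPy using the standard EM updates: responsibilities of latent grid points in the E-step, then a regularized least-squares update for W and a closed-form update for the noise precision β. This sketch rests on our own assumptions (a 2-D latent grid on [-1, 1]², Gaussian RBF basis functions plus a bias, random initialization of W instead of the usual PCA-based one, and a small weight regularizer λ) and is not the configuration used in the study. Each data point's posterior-mean latent position can then be plotted for visualization or grouped into clusters; the names are hypothetical.

```python
import numpy as np


def _sq_dist(A, B):
    """Squared Euclidean distances between the rows of A (K x D) and B (N x D)."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)


def _responsibilities(Y, T, beta):
    """R[k, n] = posterior probability that grid point k generated data point n."""
    logr = -0.5 * beta * _sq_dist(Y, T)
    logr -= logr.max(axis=0, keepdims=True)        # numerical stability
    R = np.exp(logr)
    return R / R.sum(axis=0, keepdims=True)


def gtm_fit(T, grid=8, n_rbf=3, lam=1e-3, n_iter=30, seed=0):
    """Fit y(x; W) = W^T phi(x) from a 2-D latent grid to data T (N x D) by EM."""
    rng = np.random.default_rng(seed)
    N, D = T.shape
    axis = np.linspace(-1.0, 1.0, grid)
    X = np.array([(a, b) for a in axis for b in axis])         # latent grid points x_k
    c = np.linspace(-1.0, 1.0, n_rbf)
    centres = np.array([(a, b) for a in c for b in c])         # RBF centres
    width = c[1] - c[0]                                        # assumed basis width
    Phi = np.exp(-_sq_dist(X, centres) / (2.0 * width ** 2))
    Phi = np.hstack([Phi, np.ones((len(X), 1))])               # bias basis function
    W = rng.normal(scale=0.1, size=(Phi.shape[1], D))
    beta = 1.0 / T.var(axis=0).mean()                          # inverse noise variance
    for _ in range(n_iter):
        R = _responsibilities(Phi @ W, T, beta)                # E-step
        Gdiag = np.diag(R.sum(axis=1))
        A = Phi.T @ Gdiag @ Phi + (lam / beta) * np.eye(Phi.shape[1])
        W = np.linalg.solve(A, Phi.T @ R @ T)                  # M-step for W
        beta = N * D / (R * _sq_dist(Phi @ W, T)).sum()        # M-step for beta
    R = _responsibilities(Phi @ W, T, beta)
    # Grid, mapped grid y(x_k; W), responsibilities, posterior-mean latent points.
    return X, Phi @ W, R, R.T @ X


# Hypothetical 3-D data lying near a 1-D curve.
rng = np.random.default_rng(2)
s = rng.uniform(-1.0, 1.0, size=60)
T = np.column_stack([s, s ** 2, np.sin(3 * s)]) + 0.05 * rng.normal(size=(60, 3))
X, Y, R, latent_mean = gtm_fit(T)
print(latent_mean[:5])
```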

33

Experimental Result: Clusters Found by GTM

[Table] Three cell cycle-regulated clusters found by GTM. For each phase: cluster center(s); no. of training data / no. in cluster; correct no. / test data; overall mean expression levels (Cln/b) of known genes.

S/G2: 5 / 1 / 2; (.148 .184 -.367 -.044)
S: center (0.111 -0.333); 5 / 5; 5 / 5 (100%); (1.075 1.482 -.233 -.375)
M/G1 (c1, c2, c3): centers (0.111 0.333), (-0.111 -0.111), (0.323 0.1); 13 / 7 / 2 / 2; 1 / 6, 0 / 6, 0 / 6; (-.171 -.573 .091 .311)
G2/M (c1, c2): centers (0.111 0.333), (0.111 0.111); 10 / 5 / 3; 0 / 5, 3 / 5 (80%); (-.616 -1.01 1.832 1.596)
G1 (c1, c2): centers (-0.111 0.333), (-0.111 0.111); 35 / 18 / 7; 10 / 16 (62%), 0 / 16; (.894 .907 -.766 -.479)

34

Experimental Result: Comparison with Other Methods

[Table] Comparison of prototype expression levels: number of selected genes and mean expression levels by GTM vs. by Spellman.

S/G2: GTM 92 genes (.13 -.06 -.1 .01); Spellman 121 genes (.13 .05 -.16 .03)
S: GTM 25 genes (.84 .81 -.42 -.33); Spellman 71 genes (.46 .47 -.43 -.18)
M/G1: GTM c1 120 genes (.82 .65 -.65 -.38), c2 34 genes (-.04 -.37 -.01 -.11), c3 10 genes (.32 .29 -.3 .05); Spellman 113 genes (-.21 -.61 -.04 .07)
G2/M: GTM c1 33 genes (-.59 -.96 1.34 1.29), c2 60 genes (.08 -.30 .51 .57); Spellman 195 genes (-.32 -.62 .49 .54)
G1: GTM c1 122 genes (.92 .74 -.62 -.33), c2 74 genes (.79 .82 -.48 -.34); Spellman 300 genes (.66 .49 -.55 -.33)
Total: GTM 570 genes; Spellman 800 genes

35

Genetic Network Analysis

- Discover the complex regulatory interactions among genes.

- Disease diagnosis, pharmacogenomics and toxicogenomics

- Boolean networks

- Differential equations

- Relevance networks [Butte ’97]

- Bayesian networks [Friedman ’00] [Hwang ’00]

[Fig.] Basin of attraction of 12-gene Boolean genetic network model [Somogyi ’96].
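The basin-of-attraction picture for Boolean networks can be reproduced by brute force for small networks: update every gene synchronously from Boolean rules, follow each of the 2^n start states until a state repeats, and group start states by the attractor (fixed point or cycle) they fall into. The 3-gene rules below are hypothetical and much smaller than the 12-gene model of [Somogyi ’96]; the function names are also ours.

```python
from itertools import product


def step(state, rules):
    """Synchronous update: every gene applies its rule to the current state."""
    return tuple(int(rule(state)) for rule in rules)


def basins(rules, n):
    """Map each attractor (a tuple of states, rotated to start at its minimum) to its basin."""
    result = {}
    for start in product((0, 1), repeat=n):
        seen, s = [], start
        while s not in seen:
            seen.append(s)
            s = step(s, rules)
        cycle = seen[seen.index(s):]               # the attractor reached from `start`
        k = cycle.index(min(cycle))
        result.setdefault(tuple(cycle[k:] + cycle[:k]), []).append(start)
    return result


# Hypothetical rules: gene0' = gene1, gene1' = gene0, gene2' = gene0 AND gene1
rules = [lambda s: s[1], lambda s: s[0], lambda s: s[0] and s[1]]
for attractor, basin in basins(rules, 3).items():
    print(attractor, len(basin))
```

Exhaustive enumeration scales as 2^n, so it is only practical for small networks like this one; it does, however, give the exact partition of the state space into basins.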

36

Bayesian Networks

Represent the joint probability distribution among random variables efficiently using the concept of conditional independence.

[Fig.] Example network with edges A → C, B → C, B → D and C → E.

$$P(A,B,C,D,E) = P(A)\,P(B \mid A)\,P(C \mid A,B)\,P(D \mid A,B,C)\,P(E \mid A,B,C,D) \quad \text{(by the chain rule)}$$

$$= P(A)\,P(B)\,P(C \mid A,B)\,P(D \mid B)\,P(E \mid C) \quad \text{(by the Bayes net example)}$$

• D is independent of A and C given B.

• A and B are marginally independent, but their common child C makes them dependent once C is observed.

• E is independent of A, B and D given C.

An edge denotes the possibility of a causal relationship between nodes.
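The factorization can be checked numerically. The sketch below builds the joint distribution of the example network (A → C ← B, B → D, C → E) from its five local tables and compares the quantities that the independence statements predict to be equal, plus the pair that becomes dependent once C is observed. All probabilities are made-up illustrative values and the helper names are hypothetical.

```python
from itertools import product

# Made-up conditional probability tables for binary variables (value = P(var = 1 | parents)).
P_A, P_B = 0.3, 0.6
P_C = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.9}   # P(C=1 | A, B)
P_D = {0: 0.2, 1: 0.8}                                      # P(D=1 | B)
P_E = {0: 0.3, 1: 0.6}                                      # P(E=1 | C)


def bern(p1, x):
    return p1 if x == 1 else 1.0 - p1


def joint(a, b, c, d, e):
    """P(A,B,C,D,E) = P(A) P(B) P(C|A,B) P(D|B) P(E|C)."""
    return (bern(P_A, a) * bern(P_B, b) * bern(P_C[(a, b)], c)
            * bern(P_D[b], d) * bern(P_E[c], e))


def prob(**fixed):
    """Marginal probability of the given assignment, e.g. prob(a=1, c=0)."""
    total = 0.0
    for values in product((0, 1), repeat=5):
        assignment = dict(zip("abcde", values))
        if all(assignment[k] == v for k, v in fixed.items()):
            total += joint(*values)
    return total


print(prob())                                                   # total probability
print(prob(d=1, a=1, b=1) / prob(a=1, b=1), prob(d=1, b=1) / prob(b=1))  # P(D|A,B) vs P(D|B)
print(prob(a=1, b=1), prob(a=1) * prob(b=1))                    # P(A,B) vs P(A)P(B)
print(prob(a=1, b=1, c=1) / prob(c=1),                          # P(A,B|C) vs P(A|C)P(B|C)
      prob(a=1, c=1) / prob(c=1) * prob(b=1, c=1) / prob(c=1))
```

The last pair differs, which is the "explaining away" effect of a common child: observing C couples its otherwise independent parents.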

37

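Bayesian Networks Learning

Dependence analysis [Margaritis ’00]: mutual information and χ² test

Score-based search

• D: data, S: Bayesian network structure

$$p(S, D) = p(S)\,p(D \mid S) = p(S) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(N'_{ij})}{\Gamma(N'_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(N'_{ijk} + N_{ijk})}{\Gamma(N'_{ijk})}$$

where n is the number of variables, q_i the number of configurations of the parents of the ith variable, r_i its number of states, N_ijk the number of cases in D in which the ith variable takes its kth state and its parents their jth configuration, N_ij = Σ_k N_ijk, and N'_ijk (with N'_ij = Σ_k N'_ijk) the Dirichlet prior counts.

Finding the best structure is an NP-hard problem: greedy search and heuristics are used to find good structures for massive networks quickly (local-to-global search algorithm).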
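The log of the BD score decomposes into one term per node, which is what makes greedy local search cheap: changing one node's parents only changes that node's term. The sketch assumes discrete data given as a list of dicts, a uniform Dirichlet prior N'_ijk = α for every j and k (α = 1 gives the K2 score; other uninformative priors such as BDeu spread the prior counts differently), and a structure prior log p(S) supplied directly; all names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def log_bd_family(data, child, parents, arity, alpha=1.0):
    """log of the BD factor for one node, with N'_ijk = alpha for every j, k."""
    r = arity[child]
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    score = 0.0
    for j in product(*(range(arity[p]) for p in parents)):   # parent configurations
        n_ijk = [counts[(j, k)] for k in range(r)]
        prior_ij = alpha * r                                  # N'_ij = sum_k N'_ijk
        score += math.lgamma(prior_ij) - math.lgamma(prior_ij + sum(n_ijk))
        score += sum(math.lgamma(alpha + n) - math.lgamma(alpha) for n in n_ijk)
    return score


def log_bd(data, structure, arity, alpha=1.0, log_prior=0.0):
    """log p(S, D) = log p(S) + sum over nodes of the family scores."""
    return log_prior + sum(log_bd_family(data, v, ps, arity, alpha)
                           for v, ps in structure.items())


# Hypothetical data in which Y always copies X.
data = [{"X": 0, "Y": 0}] * 20 + [{"X": 1, "Y": 1}] * 20
arity = {"X": 2, "Y": 2}
print(log_bd(data, {"X": [], "Y": ["X"]}, arity))   # structure with edge X -> Y
print(log_bd(data, {"X": [], "Y": []}, arity))      # structure with no edge
```

A greedy search would repeatedly try adding, removing or reversing one edge and keep the change that most increases this sum.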

38

The Small Bayesian Network for Classification of Cancer

[Fig.] Bayesian network over the nodes Zyxin, Leukemia class, MB-1, C-myb and LTC4S.

[Table] Comparison of the classification performance with other methods [Hwang ’00]:
Bayes nets: training error 0/38, test error 2/34
Neural trees: training error 0/38, test error 1/34
RBF networks: training error 0/38, test error 1.3/34

• The Bayesian network was learned by full search using the BD (Bayesian Dirichlet) score with an uninformative prior [Heckerman ’95] from the DNA microarray data for cancer classification (http://waldo.wi.mit.edu/MPR/).

39

Large-Scale Bayesian Network with 1171 Genes

- Genetic networks for understanding the regulatory interactions among genes and their derivatives

- Pharmacogenomics and Toxicogenomics

[Fig.] The Bayesian network structure constructed from DNA microarray data for cancer classification (partial view).

40

DNA Computing: BT for IT

41

DNA Computing: BioMolecules as Computers

011001101010001 ATGCTCGAAGCT

42

Why DNA Computing?

6.022 × 10²³ molecules/mole: immense, brute-force search of all possibilities

Desktop: 10⁹ operations/sec; supercomputer: 10¹² operations/sec; 1 mol of DNA: 10²⁶ reactions

Favorable energetics: Gibbs free energy ΔG = −8 kcal mol⁻¹, i.e. 1 J for 2 × 10¹⁹ operations

Storage capacity: 1 bit per cubic nanometer
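The "1 J for 2 × 10¹⁹ operations" figure follows from the free energy if we assume that each operation is one reaction releasing |ΔG| = 8 kcal/mol (the slide does not say which reaction); dividing by Avogadro's number gives the energy per molecule:

```python
KCAL_TO_J = 4184.0          # 1 kcal = 4184 J (thermochemical calorie)
AVOGADRO = 6.022e23         # molecules per mole
DELTA_G_KCAL_PER_MOL = 8.0  # |delta G| per operation, assumed as on the slide

energy_per_operation = DELTA_G_KCAL_PER_MOL * KCAL_TO_J / AVOGADRO   # joules per operation
operations_per_joule = 1.0 / energy_per_operation

print(f"{energy_per_operation:.2e} J per operation")
print(f"{operations_per_joule:.1e} operations per joule")   # about 2 x 10^19
```

The result, roughly 1.8 × 10¹⁹ operations per joule, rounds to the slide's 2 × 10¹⁹.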

43

Flow of DNA Computing

[Fig.] HPP (Hamiltonian Path Problem) on a 7-node graph (nodes 0 to 6), solved by Encoding → Ligation → PCR (Polymerase Chain Reaction) → Gel Electrophoresis → Affinity Column → Decoding.

Node 0: ACG   Node 1: CGA   Node 2: GCA   Node 3: TAA
Node 4: ATG   Node 5: TGC   Node 6: CGT

Candidate strands after ligation include ACGCGAGCATAAATGTGCACGCGT and ACGCGAGCATAAATGCGATGCACGCGT; the solution strand ACGCGAGCATAAATGTGCCGT decodes to the path 0 → 1 → 2 → 3 → 4 → 5 → 6.
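The generate-and-filter logic of this flow can be simulated in software. The sketch is a simplification under stated assumptions: node strands are the 3-mers listed above; "ligation" is modeled as random walks along a hypothetical edge list (the figure's edges are not recoverable, and real experiments join node strands with complementary edge linkers); PCR keeps strands that start at node 0 and end at node 6; gel electrophoresis keeps strands of exactly 7 nodes (21 bases); and the affinity column keeps strands containing every node. Function names are hypothetical.

```python
import random

NODE_CODE = {0: "ACG", 1: "CGA", 2: "GCA", 3: "TAA", 4: "ATG", 5: "TGC", 6: "CGT"}
CODE_NODE = {code: node for node, code in NODE_CODE.items()}
# Hypothetical directed edges; the only Hamiltonian path from 0 to 6 is 0-1-2-3-4-5-6.
EDGES = {0: [1, 3], 1: [2, 3], 2: [3], 3: [4, 2], 4: [5, 1], 5: [6, 0], 6: []}


def ligate(rng, n_strands, max_nodes=10, p_stop=0.2):
    """Random walks along the edges, written out as concatenated node codes."""
    pool = []
    for _ in range(n_strands):
        node = rng.randrange(len(NODE_CODE))
        path = [node]
        while len(path) < max_nodes and EDGES[node]:
            node = rng.choice(EDGES[node])
            path.append(node)
            if rng.random() < p_stop:
                break
        pool.append("".join(NODE_CODE[v] for v in path))
    return pool


def decode(strand):
    """Split a strand into 3-mers and map each back to its node."""
    return [CODE_NODE[strand[i:i + 3]] for i in range(0, len(strand), 3)]


def solve(pool, start=0, end=6, n=7):
    pcr = [s for s in pool if s.startswith(NODE_CODE[start]) and s.endswith(NODE_CODE[end])]
    gel = [s for s in pcr if len(s) == 3 * n]                        # size selection
    column = [s for s in gel if set(decode(s)) == set(range(n))]    # every node present
    return sorted({tuple(decode(s)) for s in column})


print(decode("ACGCGAGCATAAATGTGCCGT"))
print(solve(ligate(random.Random(0), 20000)))
```

Decoding in 3-base frames, rather than searching for substrings, avoids false matches that straddle two node codes.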

44

Biointelligence on a Chip?

[Fig.] Biological Computer, Molecular Electronics and Bioinformation Technology, at the intersection of Information Technology and Biotechnology, lead to the Biointelligence Chip.

Computing Models: the limit of conventional computing models

Computing Devices: the limit of silicon semiconductor technology

45

Intelligent Biomolecular Information Processing

Bio-Memory, Biocomputing, Theoretical Models

[Fig.] Molecular components (S, GFP, cytochrome c) and a Bio-Processor consisting of Input A, Controller, Output and Reaction Chamber (Calculating).

46

Evolvable Biomolecular Hardware

Sequence programmable and evolvable molecular systems have been constructed as cell-free chemical systems using biomolecules such as DNA and proteins.

http://www.mrc.uidaho.edu/fpga/circuit.html

47

DNA Computers vs. Conventional Computers

DNA-based computers | Microchip-based computers
slow at individual operations | fast at individual operations
can do billions of operations simultaneously | can do substantially fewer operations simultaneously
can provide huge memory in small space | smaller memory
setting up a problem may involve considerable preparation | setting up only requires keyboard input
DNA is sensitive to chemical deterioration | electronic data are vulnerable but can be backed up easily

48

Molecular Operators for DNA Computing

• Hybridization: complementary pairing of two single-stranded polynucleotides

5’-AGCATCCA-3’ + 3’-TCGTAGGT-5’ → 5’-AGCATCCA-3’
                                  3’-TCGTAGGT-5’

• Ligation: joining two DNA molecules end to end (catalyzed by DNA ligase), e.g. through complementary sticky ends

TGACTACGACTG

ATGCATGCTACG + ATGCATGCTGAC

TACGTACGTGAC

sticky end
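Whether two strands can hybridize in the idealized sense used here (perfect Watson-Crick pairing, antiparallel orientation) comes down to complement checks. A minimal sketch with hypothetical function names; real hybridization also depends on temperature, salt concentration and partial mismatches, which it ignores:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base Watson-Crick complement (A<->T, C<->G), same orientation."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """The partner strand written in the usual 5'->3' direction."""
    return complement(seq)[::-1]


def hybridizes(top_5to3: str, bottom_3to5: str) -> bool:
    """True if the strands pair perfectly when aligned antiparallel."""
    return len(top_5to3) == len(bottom_3to5) and complement(top_5to3) == bottom_3to5


print(hybridizes("AGCATCCA", "TCGTAGGT"))   # perfect duplex from the hybridization example
print(reverse_complement("AGCATCCA"))       # the same partner strand, written 5'->3'
```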

49

Research Groups

MIT, Caltech, Princeton University, Bell Labs
EMCC (European Molecular Computing Consortium), composed of national groups from 11 European countries
BioMIP Institute (BioMolecular Information Processing) at the German National Research Center for Information Technology (GMD)
Molecular Computer Project (MCP) in Japan
Leiden Center for Natural Computation (LCNC)

50

Applications of Biomolecular Computing

Massively parallel problem solving
Combinatorial optimization
Molecular nano-memory with fast associative search
AI problem solving
Medical diagnosis
Cryptography
Drug discovery
Further impact in biology and medicine: wet biological databases, processing of DNA labeled with digital data, sequence comparison, fingerprinting

51

NACST (Nucleotide Acid Computing Simulation Toolkit)

[Fig.] Components: GUI; NACST Engine Controller; DNA Sequence Generator; DNA Sequence Optimizer; Genetic Algorithm; Ligation Unit; PCR Unit; Electrophoresis Unit; Affinity Column Unit; Enzyme Unit.

52

NACST

[Fig.] NACST inputs and outputs.

53

Combinatorial Problem Solver

TSP (Traveling Salesman Problem)

Representations: each city i is encoded by two 4-mers, PiA and PiB:
P1A P1B: AGCT TAGG
P2A P2B: ATGG CATG
P3A P3B: CGAT CGAA

Each weighted edge is encoded by a strand that joins PiB and PjA, e.g.:
W12 (P1B, P2A): ATCC ATCA TACC
W13 (P1B, P3A): ATCC GCCT GCTA

[Fig.] Example weighted graph with the tour 0 1 2 3 4 5 6 0.

54

Combinatorial Problem Solver

Weight representation methods

1. Molecules with high G-C content tend to hybridize easily.

2. Molecules with high G-C content tend to be denatured at higher temperatures.

3. Molecules with a larger population in the tube have a higher probability of hybridizing.

Hybridization/Ligation

PCR/Gel electrophoresis

Affinity chromatography

PCR/Gel electrophoresis

Temperature Gradient Gel Electrophoresis

Graduate PCR
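Points 1 and 2 can be made concrete with G-C content and a rough melting-temperature estimate. The sketch uses the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), a crude approximation valid only for short oligonucleotides; it is our illustration, not necessarily the model used in this work. It also assumes that an edge strand is built as complement(PiB) + weight segment + complement(PjA), which the W12 and W13 sequences in the TSP representation are consistent with; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Fraction of G and C bases."""
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace-rule melting temperature in degrees C for a short oligo."""
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


def edge_strand(p_from_b: str, weight_segment: str, p_to_a: str) -> str:
    """Hypothetical edge encoding: complement(PiB) + weight segment + complement(PjA)."""
    return p_from_b.translate(COMPLEMENT) + weight_segment + p_to_a.translate(COMPLEMENT)


w12 = edge_strand("TAGG", "ATCA", "ATGG")   # P1B -> P2A
w13 = edge_strand("TAGG", "GCCT", "CGAT")   # P1B -> P3A
for name, s in (("W12", w12), ("W13", w13)):
    print(name, s, round(gc_content(s), 2), wallace_tm(s))
```

The GC-richer weight segment in W13 gives it the higher estimated Tm, which is the handle that temperature-based separation methods exploit.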

55

Experimental Results for 4-TSP

Hybridization (37 °C) → Ligation (16 °C, 15 h) → PCR (36 cycles) → Gel electrophoresis (10% polyacrylamide gel)

[Fig.] Gel lanes: 50 bp marker, oligomer mixture, ligation result, final PCR result (140 bp).

56

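Molecular Theorem Prover

Resolution refutation method

Problem under consideration: given P ∧ Q → R, S ∧ T → Q, S, T and P, is R true?

Turn each implication A → B into the clause ¬A ∨ B, and add ¬R (the negated goal):
¬P ∨ ¬Q ∨ R, ¬S ∨ ¬T ∨ Q, S, T, P, ¬R

Resolving ¬R with ¬P ∨ ¬Q ∨ R gives ¬P ∨ ¬Q; with P, ¬Q; with ¬S ∨ ¬T ∨ Q, ¬S ∨ ¬T; with S, ¬T; and with T, nil (the empty clause). Hence R is true!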
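The refutation procedure can be sketched as saturation-based propositional resolution: add the negated goal, resolve pairs of clauses until the empty clause (nil) appears, and report failure if no new clauses can be derived. This is a plain software version for illustration, not the molecular implementation; literals are strings with a leading `~` for negation, and the names are hypothetical.

```python
from itertools import combinations


def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1, c2):
    """All clauses obtained by resolving c1 and c2 on one complementary pair."""
    return [frozenset((c1 - {lit}) | (c2 - {negate(lit)})) for lit in c1 if negate(lit) in c2]


def refutes(clauses, goal):
    """True if clauses + {~goal} derive the empty clause, i.e. the goal follows."""
    known = {frozenset(c) for c in clauses} | {frozenset({negate(goal)})}
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:            # empty clause (nil) derived
                    return True
                new.add(r)
        if new <= known:             # saturated without finding nil
            return False
        known |= new


KB = [{"~P", "~Q", "R"}, {"~S", "~T", "Q"}, {"S"}, {"T"}, {"P"}]
print(refutes(KB, "R"))
```

Saturation always terminates for propositional clauses because only finitely many clauses can be built from a finite set of literals, but it can be expensive; this is the combinatorial search that a molecular implementation aims to parallelize.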

57

Molecular Theorem Prover (Abstract Implementation)

[Fig.] Implementation 1 and Implementation 2: two abstract molecular encodings of the clauses ¬S ∨ ¬T ∨ Q, ¬Q ∨ ¬P ∨ R, P, ¬R, T and S (Implementation 2 shows the individual literals R, ¬Q, Q, ¬P, ¬S, ¬T, ¬R, T, S and P).

58

Molecular Theorem Prover (Experiments for Method 1)

Experimental procedure:
I. Mix the molecules: 100 pmol each, total 20 µl
II. Denaturation (95 °C, 10 min)
III. Annealing: 95 °C for 1 min, then cooling to 15 °C at 1 °C/min
IV. Polyacrylamide gel electrophoresis (20%, PAGE)
V. Detection of the solution: 75 bp dsDNA

Experimental results:
[Fig.] Gel image, lanes 1 to 6 (Mixture, Reaction), with a 20 bp DNA marker (Takara) spanning 20 bp to 200 bp.

59

Solving Logic Problems by Molecular Computing

Satisfiability Problem: find Boolean values for variables that make the given formula true.

3-SAT Problem: 3-SAT is NP-complete, so every problem in NP can be reduced (in polynomial time) to the search for an assignment that simultaneously satisfies a number of logical clauses, each composed of three literals.

Example formulas: conjunctions of three-literal clauses over x1, x2, x3 and over x1, …, x6, such as (x1 or x2 or x3) AND (x4 or x5 or x6), where each literal is a variable or its negation.

DNA Computing with DNA Chips

61

DNA Chips for DNA Computing

I. Make: oligomer synthesis

II. Attach (immobilize): 5’HS-C6-T15-CCTTvvvvvvvvTTCG-3’

III. Mark: hybridization

IV. Destroy: enzyme reaction (e.g., EcoRI)

V. Unmark
* All strands that do not satisfy the problem are removed.

VI. Readout: if a solution remains at the last step of the N cycles, amplify it by PCR and confirm!

62

Variable Sequences and the Encoding Scheme

63

Three-dimensional Plot and Histogram of the Fluorescence

S3: w=0, x=0, y=1, z=1
S7: w=0, x=1, y=1, z=1
S8: w=1, x=0, y=0, z=0
S9: w=1, x=0, y=0, z=1

y=1 satisfies (w ∨ x ∨ y); z=1 satisfies (w ∨ ¬y ∨ z); x=0 or y=1 satisfies (¬x ∨ y); w=0 satisfies (¬w ∨ ¬y).

Four spots with high fluorescence intensity correspond to the four expected solutions.

DNA sequences identified in the readout step via addressed array hybridization.
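The mark-destroy-unmark cycle can be simulated in software for the four-variable instance above. In this sketch each surface strand is one of the 16 assignments (strand index 8w + 4x + 2y + z, matching the S3, S7, S8 and S9 labels), and each cycle keeps only the strands that satisfy one clause. The clause list is our reading of the instance, with the negations implied by the explanation of why the solutions satisfy it: (w ∨ x ∨ y) ∧ (w ∨ ¬y ∨ z) ∧ (¬x ∨ y) ∧ (¬w ∨ ¬y). Treat it, and the function name, as assumptions.

```python
from itertools import product

VARIABLES = "wxyz"
# Each literal is (variable, required value): ("y", 0) stands for the negated literal ¬y.
CLAUSES = [
    [("w", 1), ("x", 1), ("y", 1)],
    [("w", 1), ("y", 0), ("z", 1)],
    [("x", 0), ("y", 1)],
    [("w", 0), ("y", 0)],
]


def surface_sat(clauses):
    # Strand i encodes the assignment whose bits (w, x, y, z) are the binary digits of i.
    surface = {i: dict(zip(VARIABLES, bits))
               for i, bits in enumerate(product((0, 1), repeat=len(VARIABLES)))}
    for clause in clauses:
        # Mark: in this model, strands satisfying the clause are marked by hybridization.
        marked = {i for i, a in surface.items() if any(a[v] == val for v, val in clause)}
        # Destroy unmarked strands; Unmark the survivors for the next cycle.
        surface = {i: a for i, a in surface.items() if i in marked}
    return sorted(surface)   # Readout: indices of the surviving strands


print(surface_sat(CLAUSES))
```

One cycle per clause means the number of laboratory steps grows with the number of clauses rather than with the 2^n candidate assignments, which all sit on the surface in parallel.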

64

Outlook

IT is growing in importance for the advancement of BT: bioinformatics, DNA microarray data mining.

IT can benefit much from BT: biocomputing and biochips, DNA computing (with DNA chips).

Bioinformation technology (BIT) is essential as a next-generation information technology: in silico biology vs. in vivo computing.

65

References

[Barash ’01] Barash, Y. and Friedman, N., Context-specific Bayesian clustering for gene expression data, Proc. of RECOMB’01, 2001.

[Butte ’97] Butte, A.J. et al., Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks, Proc. Natl Acad. Sci. USA, 94, 1997.

[Eisen ’98] Eisen, M.B. et al., Cluster analysis and display of genome-wide expression patterns, Proc. Natl Acad. Sci. USA, 95, 1998.

[Friedman ’00] Friedman, N. et al., Using Bayesian networks to analyze expression data, Proc. of RECOMB’00, 2000.

[Heckerman ’95] Heckerman, D. et al., Learning Bayesian networks: the combination of knowledge and statistical data, Machine Learning, 20(3), 1995.

[Hwang ’00] Hwang, K.-B. et al., Applying machine learning techniques to analysis of gene expression data: cancer diagnosis, CAMDA’00, 2000.

66

[Khan ’01] Khan, J. et al., Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks, Nature Medicine, 7(6), 2001.

[Margaritis ’00] Margaritis, D. and Thrun, S., Bayesian network induction via local neighborhoods, Proc. of NIPS’00, 2000.

[Shin ’00] Shin, H.-J. et al., Probabilistic models for clustering cell cycle-regulated genes in the yeast, CAMDA’00, 2000.

[Somogyi ’96] Somogyi, R. and Sniegoski, C.A., Modeling the complexity of genetic networks: understanding multigenic and pleiotropic regulation, Complexity, 1(6), 1996.

[Tamayo ’99] Tamayo, P. et al., Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation, Proc. Natl Acad. Sci. USA, 96, 1999.

67

More information at http://cbit.snu.ac.kr/ and http://bi.snu.ac.kr/
