Transcript
Page 1: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Introduction to Bioinformatics

CSCI8980: Applied Machine Learning in Computational Biology

Rui Kuang

Department of Computer Science and Engineering

University of Minnesota

[email protected]

Page 2: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

His

tory

of B

ioin

form

atic

sH

isto

ry o

f Bio

info

rmatic

s

Thanks to Luce Skrabanek

Page 3: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

His

tory

of B

ioin

form

atic

sH

isto

ry o

f Bio

info

rmatic

s

Thanks to Luce Skrabanek

Page 4: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

His

tory

of B

ioin

form

atic

sH

isto

ry o

f Bio

info

rmatic

s

Thanks to Luce Skrabanek

Page 5: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Biological Data

DNA

Transcription

Translation

Jacques van Helden, David Gilbert and A.C. Tan, 2003

RNA

Biological Function

Protein

Page 6: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Biological Data

DNA

Transcription

Translation

Genome

RNA

Biological Function

Protein

Genome sequences

Page 7: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Biological Data

DNA

Transcription

Translation

Genome

RNA sequences

RNA

Biological Function

Protein

Genome sequences

Page 8: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Biological Data

DNA

Transcription

Translation

Genome

RNA sequences Gene Expression

(Microarray)

RNA

Biological Function

Protein

Genome sequences

Page 9: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Biological Data

DNA

Transcription

Translation

Genome

RNA sequences Gene Expression

(Microarray)

RNA

Biological Function

Protein

Genome sequences

Protein Sequences and Structures

Page 10: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Biological Data

DNA

Transcription

Translation

Genome

RNA sequences Gene Expression

(Microarray)

Protein Expression

RNA

Biological Function

Protein

Genome sequences

Protein sequences and Structures

Protein Expression

Protein Function Annotation

Protein-protein/Protein-DNA interaction

Page 11: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Other Data

� SNPs

� Organism-specific databases

� Genomes� Genomes

� Molecular pathways

� Scientific literature

� Disease information

� ……

Page 12: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Combinatory Algorithms� Get multiple copies of DNA

segments.

� Alignment the segments to reconstruct the sequence.

Closing the GAP with slow � Closing the GAP with slow and expensive experiments.

� Combinatory algorithms for closing the gap with minimal number of pool tests.

Page 13: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Inferring Gene Regulatory

Network with Bayesian Networks

CSCI8980: Applied Machine Learning in Computational Biology

Network with Bayesian Networks

Rui Kuang

Department of Computer Science and Engineering

University of Minnesota

[email protected]

Page 14: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Cellular Networks

� Complex functions of cells are carried out by the coordinated activity of genes and their products

� Cellular network of interactions of

1000s of genes and their products

Figure: Snyder and Gerstein Labs

1000s of genes and their products

� New high-throughput genomic data,

such as microarray data, enables computational study of cellular

networks genome-widely.

� DNA (genes) � mRNA � proteinsTranscription

Page 15: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Gene Regulatory Networks� Gene regulatory networks: switching on

and off of genes by regulation of transcriptional machinery

� Learning problem: Model gene regulatory behavior using genome-wide data, extract hypotheses for wet lab testingbehavior using genome-wide data, extract hypotheses for wet lab testing

� Descriptive models, such as probabilistic graphical models, linear network models, clustering, are interpretable models totraining data.

� Can check if local components of model reflect known biological mechanisms.

Page 16: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Gene Regulation� Regulatory proteins (transcription factors) bind to

non-coding regulatory sequence (promoter) of a gene to control rate of transcription

bindingsite

regulator

Figure: Griffiths et al. "Modern Genetic

Analysis"

mRNA

transcript

protein

generegulatory

sequence

site

Page 17: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Gene Regulation� Regulatory proteins (transcription factors) bind to

non-coding regulatory sequence (promoter) of a gene to control rate of transcription

bindingsite

regulator

Figure: Griffiths et al. "Modern Genetic

Analysis"

mRNA

transcript

protein

generegulatory

sequence

site

Page 18: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Genome-wide Expression Data

� Microarray (and other high-

throughput) technologies

measure mRNA transcript

expression levels for 1000s of expression levels for 1000s of

genes at once

� Noisy and sparse data

� Snapshot of the cellular system: transcriptome, i.e.

protein expression not observed

� Difficult to infer regulatory relation between genes.

Page 19: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Regulatory Components in yeast� For simple organisms like yeast (S. cerevisiae),

previous studies and data sources the components needed in model:

�Known and putative

transcription factors Gene

Transcriptionfactor

promoter

Signalingmolecule

transcription factors

�Signaling molecules that

activate transcription factors

�Known and putative binding site “motifs” in promoter regions

� In yeast, regulatory sequence = 500 bp upstream region

Genepromoter

Binding Motif

Page 20: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Analyze Gene Expression Data� Clustering

� Groups genes with similar expression patterns

� The gene clusters do not reveal the regulatory structure of the genes

� Boolean Networks� Deterministic models of the logical interactions between genes� Deterministic models of the logical interactions between genes

� Gene is in either on state or off state

� Not feasible to learn from microarray data

� Bayesian Networks� Measure expression level of each gene

� Gene as random variables affecting on others

� Can possibly include other random variables, such as external stimuli, environment parameters, and biological factors

Page 21: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Model Validation of Genetic

Regulatory Networks

� Using Bayesian scoring metric to choose the right

network structure

,loglog

)|(log)(

cp(D|S)p(S)

DSpSoreBayesianSc

++=

=

Hartemink et al. 2001

� Validated on the galactose system in S.

cerevisiae

� Expression data: 52 genomes worth of Affymetrix

GeneChip expression data

S. model on theprior a is andfunction likelihood theis where

,loglog

P(S)p(D|S)

cp(D|S)p(S) ++=

Page 22: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Hypothesis of Galactose System

Gal80p inhibits Gal4p post-translationally

Page 23: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Scoring Possible Structures� Binary quantization of gene expression

into up/down (3 binary random variables)

Page 24: Introduction to Bioinformatics.ppt - Weeblybokibme.weebly.com/uploads/5/4/9/8/5498285/lecture_16.pdf · Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational

Scoring Possible Structures� Binary quantization of gene expression

into up/down (3 binary random variables)


Top Related