William H. Hsu with Haipeng Guo, Rengakrishnan Subramanian, Ben Perry, and Julie A. Thornton

Bioinformatics and Machine Learning: Building Probabilistic Models of Gene Expression from Microarray Data

William H. Hsu with Haipeng Guo, Rengakrishnan Subramanian, Ben Perry, and Julie A. Thornton

Department of Computing and Information Sciences, Kansas State University

Laboratory for Knowledge Discovery in Databases

http://www.kddresearch.org/Groups/Bioinformatics

Post on 04-Jan-2016

TRANSCRIPT

Page 1: William H. Hsu with Haipeng Guo, Rengakrishnan Subramanian, Ben Perry, and Julie A. Thornton

Kansas State University

Department of Computing and Information Sciences

Bioinformatics and Machine Learning: Building Probabilistic Models of Gene Expression from Microarray Data

William H. Hsu

with Haipeng Guo, Rengakrishnan Subramanian,

Ben Perry, and Julie A. Thornton

Department of Computing and Information Sciences

Kansas State University

Laboratory for Knowledge Discovery in Databases

http://www.kddresearch.org/Groups/Bioinformatics

Page 2

Overview

• Computer Science: What We Do

– Software: operating systems, programming languages, software engineering, databases

– Hardware: logic design, organization and architecture

– Theory of Computation: algorithms, complexity, languages

– Artificial Intelligence (AI): learning, reasoning, planning, agents

– Computer Graphics, Geometry, and Vision

– Computational Science and Engineering (CSE)

• Artificial Intelligence (AI) – Fields of Study

– Areas: learning, planning, vision, robotics

– Applications in science, engineering, business, and defense

• Computer Graphics – Some Current Projects and Fun Stuff

– Computer-Aided Design (CAD) and Engineering (CAE)

– Information Visualization

– Computer-Generated Images (CGI) and Animation (CGA)

• High-Performance Computing: Linux and Beowulf

Page 3

Information Retrieval (IR) and Text Mining: Commercial Applications

[Figure: SPIRIX software ThemeScapes (http://www.cartia.com), 6500 news stories from the WWW in 1997]

Page 4

Visual Programming and Software Engineering

Page 5

Stages of Data Mining and Knowledge Discovery in Databases

Page 6

Knowledge Discovery in Databases (KDD) and Fraud Detection

Page 7

Genetic Algorithms for Parameter Tuning in Bayesian Network Structure Learning [1]

[Figure: Genetic Wrapper for Change of Representation and Inductive Bias Control]

• D: Training Data
• eI: Inference Specification
• [1] Genetic Algorithm
• [2] Representation Evaluator for Learning Problems
• Dtrain (Inductive Learning)
• Dval (Inference)
• α: Candidate Representation
• f(α): Representation Fitness
• α̂: Optimized Representation
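
The wrapper loop named on this slide can be illustrated with a minimal sketch. It assumes that a candidate representation α is a bit string (for example, a mask over input variables) and that a caller-supplied function plays the role of the representation evaluator and returns f(α). The function name genetic_search, the tournament selection, one-point crossover, elitism, and all default settings are illustrative assumptions, not details taken from the slides.

```python
import random

def genetic_search(fitness, n_bits, pop_size=20, generations=30,
                   mutation_rate=0.05, seed=0):
    """Search bit-string representations alpha that maximise fitness(alpha).

    Hypothetical sketch: n_bits must be at least 2 for one-point crossover.
    """
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_bits))
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament(population):
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best]  # elitism: the best candidate always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_bits)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best  # plays the role of the optimized representation

# Toy stand-in for the evaluator: count bits that match a hidden target.
target = (1, 0, 1, 1, 0, 0, 1, 0)
toy_fitness = lambda alpha: sum(a == t for a, t in zip(alpha, target))
alpha_hat = genetic_search(toy_fitness, n_bits=len(target))
```

In a real wrapper the toy fitness would be replaced by a function that trains on Dtrain and scores the result on Dval.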

Page 8

Genetic Algorithms for Parameter Tuning in Bayesian Network Structure Learning [2]

[Figure: [2] Representation Evaluator for Input Specifications]

• eI: Evidence Specification
• Dtrain (Model Training)
• Dval (Model Validation by Inference)
• α: Candidate Input Specification
• [A] Inductive Learning (Parameter Estimation from Training Data)
• h: Hypothesis
• [B] Validation (Measurement of Inferential Loss)
• f(α): Specification Fitness (Inferential Loss)
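
The two evaluator stages, [A] parameter estimation on Dtrain and [B] measurement of inferential loss on Dval, can be sketched as a single fitness function. This is only an illustration under stated assumptions: α is taken to be a 0/1 mask over binary input variables, the hypothesis h is a naive Bayes classifier with add-one smoothing (a stand-in chosen for brevity, not the model used by the authors), and the loss is the mean negative log-probability of the true label. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(rows, labels, alpha):
    """[A] Estimate naive Bayes counts using only the inputs selected by alpha."""
    cols = [i for i, keep in enumerate(alpha) if keep]
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)  # (class, column) -> value counts
    for x, y in zip(rows, labels):
        for c in cols:
            feat_counts[(y, c)][x[c]] += 1
    return cols, class_counts, feat_counts

def posterior(model, x, n_values=2):
    """Return P(class | selected inputs of x) with add-one smoothing."""
    cols, class_counts, feat_counts = model
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        logp = math.log(n_y / total)
        for c in cols:
            logp += math.log((feat_counts[(y, c)][x[c]] + 1) / (n_y + n_values))
        scores[y] = logp
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {y: math.exp(s - top) / z for y, s in scores.items()}

def fitness(alpha, d_train, d_val):
    """[B] Negated mean log loss on the validation set (higher is better).

    Assumes every validation label also occurs in the training labels.
    """
    model = train(d_train[0], d_train[1], alpha)
    rows, labels = d_val
    loss = -sum(math.log(posterior(model, x)[y])
                for x, y in zip(rows, labels)) / len(rows)
    return -loss

# Toy data: input 0 equals the label, input 1 is uninformative.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 1, 1]
with_input = fitness((1, 0), (rows, labels), (rows, labels))
without_input = fitness((0, 0), (rows, labels), (rows, labels))
```

On this toy data the mask that keeps the informative input scores higher than the empty mask, which is the signal a genetic algorithm would use to prefer it.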

Page 9

[Figure: Learning Environment]

• D: Microarray Data
• [A] Structure Learning
• G = (V, E): Graph Component of BN
• [B] Parameter Estimation
• B = (V, E, Θ): BN with Probabilities
• Dval (Model Validation by Inference)
• Specification Fitness (Inferential Loss)
• Nodes of the two example graphs: G1, G2, G3, G4, G5
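
As an illustration of the parameter estimation step, the sketch below takes a fixed graph G = (V, E) and discrete data D and estimates the conditional probability tables Θ from counts. It assumes binary variables and add-one (Laplace) smoothing. The function name estimate_cpts and the example edge G1 → G2 are hypothetical, since the edges of the graphs drawn on the slide are not recoverable from the transcript.

```python
from collections import Counter
from itertools import product

def estimate_cpts(parents, data, values=(0, 1), pseudocount=1.0):
    """Return theta[node][parent_values][value] = P(node = value | parents).

    parents maps each node to a tuple of its parent nodes (the edge set E).
    data is a list of dicts mapping node names to observed discrete values.
    """
    theta = {}
    for node, pa in parents.items():
        # Count joint occurrences of (parent configuration, node value).
        counts = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        table = {}
        for pa_vals in product(values, repeat=len(pa)):
            total = (sum(counts[(pa_vals, v)] for v in values)
                     + pseudocount * len(values))
            table[pa_vals] = {v: (counts[(pa_vals, v)] + pseudocount) / total
                              for v in values}
        theta[node] = table
    return theta

# Hypothetical two-node structure: G1 -> G2.
structure = {"G1": (), "G2": ("G1",)}
samples = [
    {"G1": 1, "G2": 1},
    {"G1": 1, "G2": 1},
    {"G1": 0, "G2": 0},
    {"G1": 0, "G2": 1},
]
theta = estimate_cpts(structure, samples)
print(theta["G2"][(1,)][1])  # 0.75
```

The smoothing keeps every probability strictly positive, which matters when the fitted network is later scored by log loss on Dval.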

Page 10

Microarrays

Page 11

A Gene Network for Yeast [Friedman, Nachman, Linial, Pe’er, 2000]

Page 12

Components of A Microarray Experiment: Hybridization

• Publication (e.g., PubMed)
• Source (e.g., Taxonomy)
• Gene (e.g., GenBank)
• Experiment
• Sample, Hybridization, Array
• Normalization/Discretization
• Data
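
The Normalization/Discretization step can be illustrated with a short sketch. It assumes the inputs are positive expression ratios, normalizes them by centering log2 ratios on their median, and maps each value to one of three levels. The function name, the median centering, and the threshold of 0.5 are assumptions made for illustration; the slides do not specify the procedure.

```python
import math
from statistics import median

def discretize(ratios, threshold=0.5):
    """Map positive expression ratios to -1 (under), 0 (baseline), +1 (over)."""
    logs = [math.log2(r) for r in ratios]
    center = median(logs)  # simple normalization: subtract the median log ratio
    levels = []
    for value in logs:
        shifted = value - center
        if shifted > threshold:
            levels.append(1)
        elif shifted < -threshold:
            levels.append(-1)
        else:
            levels.append(0)
    return levels

print(discretize([0.25, 1.0, 1.0, 4.0, 1.1]))  # [-1, 0, 0, 1, 0]
```

Discrete levels of this kind are what a count-based Bayesian network learner would consume as data.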

Page 13

Components of A Microarray Experiment: Computational Gene Expression Modeling

• Computational Workflows (e.g., myGrid)
• Experimental Services & Metadata (MAGE-ML XML)
• Gene Expression Model
• Pathway & Network Learning Specification
• Data Preprocessing Specification
• Parameter Learning Specification
• Model Analysis Specification
• Discretization Use Case
• Data Mining Use Case
• Feature Selection Specification
• Validation (e.g., Bootstrap) Use Case
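
The bootstrap validation mentioned on this slide can be sketched as follows: resample the data with replacement, rerun a structure learner on each replicate, and report how often each edge is recovered. The function name bootstrap_edge_confidence and the learn_edges callable are hypothetical, and the toy learner in the usage example is a deliberately crude stand-in for a real network learning algorithm.

```python
import random
from collections import Counter

def bootstrap_edge_confidence(data, learn_edges, n_boot=100, seed=0):
    """Return, for each edge, the fraction of bootstrap replicates containing it.

    learn_edges is any callable that maps a list of samples to an iterable
    of edges; it stands in for a structure learning procedure.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        # Nonparametric bootstrap: draw len(data) rows with replacement.
        replicate = [rng.choice(data) for _ in data]
        counts.update(set(learn_edges(replicate)))
    return {edge: c / n_boot for edge, c in counts.items()}

def toy_learner(sample):
    """Toy rule: link G1 and G2 when they agree in at least 80% of rows."""
    agree = sum(row["G1"] == row["G2"] for row in sample) / len(sample)
    return {("G1", "G2")} if agree >= 0.8 else set()

data = [{"G1": 1, "G2": 1}] * 9 + [{"G1": 1, "G2": 0}]
confidence = bootstrap_edge_confidence(data, toy_learner, n_boot=200)
```

Edges with a confidence near 1 are recovered in almost every replicate, while edges that depend on a few samples receive lower values.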

Page 14

DESCRIBER: An Experimental Intelligent Filter

• Domain-Specific Repositories
• Experimental Data
• Source Codes and Specifications
• Data Models
• Ontologies
• Models
• DESCRIBER
• Personalized Interface
• Domain-Specific Collaborative Filtering
• New Queries
• Learning and Inference Components
• Historical Use Case & Query Data
• Decision Support Models
• Users of Scientific Document Repository
• Interface(s) to Distributed Repository

Example Queries:
• What experiments have found cell cycle-regulated metabolic pathways in Saccharomyces?
• What codes and microarray data were used, and why?

Page 15

DESCRIBER Overview

• Module 2: Learning & Validation of Bayesian Network Models for Use Cases
• Module 4: Learning & Validation of Bayesian Network Models for MAGE Data & Codes
• Relational Models of MAGE Data
• Module 1: Intelligent Collaborative Filtering Front-End
• Data
• Historical Use Case & Query Data
• Personalized Interface
• Module 5: MAGE Data Model
• User
• Estimation of Constraint Parameters
• Graphical Models of Use Cases
• Module 3
• Constrained Models of Use Cases
• New Queries

Page 16

DESCRIBER Collaborative Filtering Module

• Module 1: Intelligent Collaborative Filtering Front-End
• Personalized Interface
• Relational Models of (Domain-Specific) Data
• Constrained Models of Use Cases
• Relational Probabilistic Model Constraint Selector
• Integrated Reasoning Component: XML Validator and Constraint Checker
• Constraints on Repository Content
• Response to User
• New Query from User