Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Jingshan Huang, Assistant Professor, School of Computer and Information Sciences, University of South Alabama (http://cis.usouthal.edu/~huang/). Dept. of Chemical Pathology @ CUHK, Hong Kong, August 17, 2010.


Page 1: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Dept. of Chemical Pathology @ CUHK Hong Kong August 17, 2010

Ontology-Based Knowledge Discovery and Sharing in

Biological and Medical Research

Jingshan Huang

Assistant Professor
School of Computer and Information Sciences
University of South Alabama
http://cis.usouthal.edu/~huang/

Page 2: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Presentation Outline

• Research Motivation

• Ontologies and Ontological Techniques

• Applying Ontological Techniques to Biological and Medical Research

• Ongoing Research – OMIT Project

Page 3: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Research Motivation – Overview

• Information from heterogeneous sources has different semantics

Example: "Long" (English) is an adjective of length, whereas "Long" (Chinese Pinyin) -> 龙 (龍), meaning dragon

• Knowledge discovery and sharing in biological/medical research is both important and challenging

• Integrating the information from heterogeneous sources must make use of all available clues, including syntax, semantics, context, and pragmatics

• Ontologies are a formal model to encode semantics

• Ontological techniques are critical in knowledge acquisition

Page 4: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Research Motivation – More Details

• In the medical informatics area, the abundance of digital data promises a profound impact on knowledge discovery and innovation

• Worldwide health scientists are producing, accessing, analyzing, integrating, and storing massive amounts of digital medical data daily

• Such data are obtained through observation, experimentation, and simulation

• If we were able to effectively transfer and integrate data from all possible resources, it would be possible to obtain:
① a deeper understanding of all these data sets,
② better exposed knowledge, and
③ appropriate insights and actions

• Unfortunately, in many cases, the data users are not the data producers

• They thus face challenges in harnessing data in unforeseen and unplanned ways

Why???

Page 5: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Research Motivation – An Example Scenario

• The identification and characterization of the important roles that microRNAs (miRNAs) play in human cancer is an increasingly active research area

• In particular, it is very challenging to effectively identify miRNAs’ target genes

• Cancer patients’ prognosis depends largely on their chemosensitivity (sensitivity to chemotherapy)

• Research has discovered that some specific genes increase the permeability of the mitochondrial membrane (mitochondria are a cellular component), which in turn leads to apoptosis (cell death)

• As a result, the patient’s chemosensitivity will increase and the chemotherapy will be more effective

• Certain miRNAs can regulate the aforementioned genes and thus affect cancer patients’ prognosis

• If biologists were able to identify such miRNAs, a breakthrough in cancer treatment could be made

Unfortunately, such identification is very difficult…

Page 6: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Research Motivation – An Example Scenario (cont.)

• Biologists need to extract a large number of candidate target genes from existing miRNA databases

• They also have to manually search resources other than miRNA databases for information related to each of the hundreds of candidate target genes:
① cellular component
② biological process
③ and so on…

• In short, the whole process is time-consuming, error-prone, and subject to biologists’ limited prior knowledge

• In addition, the situation can be even worse
① It is further aggravated by the great complexity and imprecise terminologies that characterize typical biological and biomedical research fields
② A great deal of variety has been identified in the adoption of different biological terms, along with different relationships among these terms
③ Such variety has inhibited effective information acquisition by humans

Page 7: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Research Motivation – Summary

• The biological and medical research area is facing a challenging problem: knowledge discovery and sharing among distributed parties

• To integrate heterogeneous data, and thereby revolutionize traditional medical and biological research, new methodologies are greatly needed

• As a formal knowledge representation model, ontologies play a key role in defining formal semantics in traditional knowledge engineering

Conclusion: It is necessary to apply ontological techniques to biological and medical research

Page 8: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Presentation Outline

• Research Motivation

• Ontologies and Ontological Techniques

• Applying Ontological Techniques to Biological and Medical Research

• Ongoing Research – OMIT Project

Page 9: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Definition of Ontologies

• The simplest definition: an ontology is a computational model (a.k.a. knowledge representation model) of some domain of the world

• It describes the semantics of the terms (a.k.a. concepts) used in the domain

• It is often captured in the form of a DAG (directed acyclic graph)

What is a DAG then?

• Nodes represent ontology concepts while arcs represent their relationships

• May be augmented by rules, constraints, or functions

• In brief, ontologies aim to make explicit the knowledge contained within software applications for a particular domain:

An ontology = a finite set of concepts + properties + relationships

• Such graphical structures are also known as ontology schemas

• Actual data sets contained in these schemas are referred to as instances

• Most real-world ontologies have very few or no instances at all
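The definition above (concepts + properties + relationships, arranged as a DAG) can be sketched in a few lines of Python. This is only an illustrative data structure, not the format used by any ontology tool; the class name `Ontology` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology schema: concepts, their properties, and directed arcs."""
    concepts: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)      # concept -> property names
    relationships: list = field(default_factory=list)   # (source, relation, target)

    def add_concept(self, name, props=()):
        self.concepts.add(name)
        self.properties.setdefault(name, set()).update(props)

    def relate(self, source, relation, target):
        # Both endpoints must already be concepts of this ontology.
        if source not in self.concepts or target not in self.concepts:
            raise KeyError("unknown concept")
        self.relationships.append((source, relation, target))

    def is_acyclic(self):
        """Return True if the arcs form a DAG (depth-first search for a back edge)."""
        graph = {c: [] for c in self.concepts}
        for source, _, target in self.relationships:
            graph[source].append(target)
        WHITE, GREY, BLACK = 0, 1, 2
        colour = {c: WHITE for c in self.concepts}

        def visit(node):
            colour[node] = GREY
            for nxt in graph[node]:
                if colour[nxt] == GREY:
                    return False  # back edge, so there is a cycle
                if colour[nxt] == WHITE and not visit(nxt):
                    return False
            colour[node] = BLACK
            return True

        return all(visit(c) for c in self.concepts if colour[c] == WHITE)


onto = Ontology()
onto.add_concept("organelle", {"name"})
onto.add_concept("mitochondrion", {"name"})
onto.relate("mitochondrion", "is_a", "organelle")
print(onto.is_acyclic())
```

Adding the reverse arc (organelle is_a mitochondrion) would make `is_acyclic()` return False, which is the situation a DAG-shaped ontology rules out.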

Page 10: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Ontology Engineering

• The creation and maintenance of ontologies in the domain of interest

• In other words, it focuses on the methodologies by which to build ontologies

• To create an ontology, three different approaches can be applied
① Top-down approach (knowledge driven)
② Bottom-up approach (data/inference driven)
③ Combination of top-down and bottom-up

• Languages to represent ontologies in computer systems
① OWL (Web Ontology Language) – most popular one
② Open Biological and Biomedical Ontologies (OBO)
③ Knowledge Interchange Format (KIF)
④ Open Knowledge Base Connectivity (OKBC)

• GUI tools for ontology engineering
① Protégé (by Stanford) – most popular one
② CmapTools (by IHMC)
③ OntoEdit (by Ontoprise)

Page 11: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Ontology Engineering (Protégé GUI – Upper Bio Ontology)

Page 12: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Ontology Engineering (Example OWL File – Upper Bio Ontology)

Page 13: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Ontology Heterogeneity

• Heterogeneity is an important, inherent characteristic of ontologies developed by different parties for the same (or similar) domains

• This is due to the fact that ontologies reflect their designers’ different conceptual models for some domain

• The heterogeneous semantics may occur in different ways
① different terms could be used for the same concept;
② an identical term could be adopted for different concepts;
③ properties and relationships could be different

As a result, Ontology Matching has become an increasingly active topic

Page 14: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Ontology Matching

• “Ontology Matching” is short for “Ontology Schema Matching”

• Also known as “Ontology Alignment” or “Ontology Mapping”

• It refers to the process of determining correspondences between concepts from heterogeneous ontologies

• It aims to handle the aforementioned challenge in ontology heterogeneity

• Many different relationships will be involved
① equivalentWith
② subClassOf
③ superClassOf
④ siblings
⑤ and so on…

Page 15: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Current Ontology-Matching Algorithms

Rule-Based Matching
① Consider schema information alone
② Specify a set of rules
③ Apply them to schema information

Learning-Based Matching
① Consider both schema and instances
② Apply different machine learning techniques

Brief Introduction of Machine Learning
① A scientific discipline that is concerned with the design and development of some special algorithms
② These algorithms allow computers to change behavior based on “training data”
③ The major focus is to recognize complex patterns and make intelligent decisions

Page 16: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Pros and Cons for Current Approaches

Rule-Based Matching
① Is relatively fast (pro)
② Ignores instance information (con)
③ Uses ad hoc predefined weights (con) for concept semantics: name + properties + relationships

Learning-Based Matching
① Obtains extra clues from instances (pro)
② Runs longer (con)
③ Has difficulty in getting sufficient instances (con), since most real-world ontologies do not have instances
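To make the rule-based approach concrete, here is a minimal Python sketch that scores two concepts on schema information alone (name + properties + relationships) with predefined weights. The weights, the threshold, and the dictionary layout of a concept are all hypothetical choices for illustration; the string comparison uses `difflib.SequenceMatcher` from the standard library, which is a stand-in and not a technique named in the talk.

```python
from difflib import SequenceMatcher


def jaccard(a, b):
    """Overlap of two collections as sets; 0.0 when both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def concept_similarity(c1, c2, weights=(0.5, 0.3, 0.2)):
    """Weighted score over name, properties, and relationships.

    The weights are ad hoc, hypothetical values (the weakness noted above).
    """
    w_name, w_prop, w_rel = weights
    name_sim = SequenceMatcher(None, c1["name"].lower(), c2["name"].lower()).ratio()
    return (w_name * name_sim
            + w_prop * jaccard(c1["properties"], c2["properties"])
            + w_rel * jaccard(c1["relationships"], c2["relationships"]))


def match(onto1, onto2, threshold=0.6):
    """Pair each concept in onto1 with its best-scoring concept in onto2."""
    if not onto2:
        return []
    pairs = []
    for c1 in onto1:
        best = max(onto2, key=lambda c2: concept_similarity(c1, c2))
        score = concept_similarity(c1, best)
        if score >= threshold:
            pairs.append((c1["name"], best["name"], round(score, 2)))
    return pairs


source = [{"name": "Gene", "properties": {"id", "symbol"}, "relationships": {"is_a:Entity"}}]
target = [
    {"name": "gene", "properties": {"id", "symbol"}, "relationships": {"is_a:Entity"}},
    {"name": "Protein", "properties": {"sequence"}, "relationships": set()},
]
print(match(source, target))
```

No instance data is consulted anywhere, which is why this style of matcher is fast but blind to clues that only instances would reveal.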

Page 17: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Presentation Outline

• Research Motivation

• Ontologies and Ontological Techniques

• Applying Ontological Techniques to Biological and Medical Research

• Ongoing Research – OMIT Project

Page 18: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Ontological Techniques in Bio Research

• Ontological techniques have been widely applied to medical and biological research

• The most successful example is the Gene Ontology (GO) project

• Unified Medical Language System (UMLS) and the National Center for Biomedical Ontology (NCBO) are two other successful examples

• In addition, efforts have been made toward ontology-based data integration in bioinformatics and medical informatics

Page 19: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Why Gene Ontology (GO) Project?

• Biologists have wasted a lot of time and effort in searching for all of the available information about each small area of research

• The search is further hampered by the wide variations in terminology that may be in common usage at any given time

• A simple example: if you were searching for new targets for antibiotics, you might want to find all the gene products that are involved in bacterial protein synthesis

• Suppose that one database describes these molecules as being involved in “translation”, whereas another uses the phrase “protein synthesis”

• It will then be difficult for a human to find functionally equivalent terms, let alone for computer software

• To address the need for consistent descriptions of gene products in different databases, the GO began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD)

• The GO Consortium has grown to include many databases, including several of the world’s major repositories for plant, animal, and microbial genomes

Page 20: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Three Sub-Ontologies in the GO

• Cellular Component, Biological Process, and Molecular Function

• A gene product might:
① be associated with or located in one or more cellular components;
② be active in one or more biological processes;
③ during which it performs one or more molecular functions

Example

The gene product cytochrome c can be described by:
① the molecular function term “oxidoreductase activity”
② the biological process terms “oxidative phosphorylation” and “induction of cell death”
③ the cellular component terms “mitochondrial matrix” and “mitochondrial inner membrane”

Page 21: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

GO Structure

• The GO ontology is essentially a Hierarchy-Like DAG

• In other words, each node is a GO term, and each arc represents a relationship between two GO terms

• Directed feature

For example, a mitochondrion is an organelle, but not vice versa

• Acyclic feature (cycles are not allowed)

For example, it is inappropriate to specify that “A1 is an A2” “A2 is an A3” … “Ai is an A1”

• Hierarchy-Like feature (generalized-specialized relationship plus possibly multiple parents)

For example, the biological process term hexose biosynthetic process has two parents, hexose metabolic process and monosaccharide biosynthetic process (biosynthetic process is a type of metabolic process and a hexose is a type of monosaccharide)
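The multiple-parent structure can be sketched as a mapping from each term to its set of is a parents, with a traversal that collects every ancestor. The two parents of hexose biosynthetic process come from the example above; the shared grandparent, monosaccharide metabolic process, is an assumption added for illustration, suggested by the parenthetical remark.

```python
# Child term -> set of "is a" parents (a DAG, so a term may have several).
IS_A = {
    "hexose biosynthetic process": {
        "hexose metabolic process",
        "monosaccharide biosynthetic process",
    },
    # Assumed for illustration, not stated in the slide text:
    "hexose metabolic process": {"monosaccharide metabolic process"},
    "monosaccharide biosynthetic process": {"monosaccharide metabolic process"},
}


def ancestors(term, parents=IS_A):
    """Collect every term reachable by following parent arcs upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("hexose biosynthetic process")))
```

Because the graph is not a tree, the shared grandparent is reached along two paths; the `seen` set keeps it from being reported twice.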

Page 22: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

An Example GO Diagram

Page 23: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Three Relationships in the GO

• The GO ontology defines three different relationships among terms

① is a, a.k.a. is a subtype of;

② part of; and

③ regulates

Note that regulates includes two sub-relationships, i.e., negatively regulates and positively regulates

Page 24: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

is a Relationship in the GO

• If A is a B, it means that A is a subtype of B
① For example, mitotic cell cycle is a cell cycle
② Another example, lyase activity is a catalytic activity

• The is a relationship differs from “is an instance of” (which refers to a specific example of something), for example:
① A cat is a mammal
② George is an instance of a cat; therefore, the claim that “George is a cat” is incorrect
③ However, it is safe to claim that every instance of a cat is also an instance of a mammal
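As a loose analogy only (Python classes are not GO terms), the difference between the subtype relationship and "is an instance of" mirrors the difference between `issubclass` and `isinstance`. The class names below simply reuse the cat example.

```python
class Mammal:
    """Stands for the concept 'mammal'."""


class Cat(Mammal):
    """Cat is a Mammal: a subtype relationship between two concepts."""


george = Cat()  # George is an instance of Cat, not a subtype of it

print(issubclass(Cat, Mammal))     # subtype relationship between concepts
print(isinstance(george, Cat))     # instance-of relationship
print(isinstance(george, Mammal))  # instances of Cat are also instances of Mammal
print(isinstance(george, type))    # George is not itself a concept (class)
```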

Page 25: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Reasoning over is a Relationship

• The is a relationship is transitive: if A is a B, and B is a C, then A is a C

• Example

Page 26: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

part of Relationship in the GO

• B is part of A, meaning that the presence of B implies the presence of A

• But not vice versa, i.e., given the presence of A, we cannot conclude the presence of B

• In other words
① all B are part of A
② but only some A have part B

• Example

Page 27: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Reasoning over part of Relationship (1)

• The part of relationship is also transitive: if A is part of B, and B is part of C, then A is part of C

• Example

Page 28: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Reasoning over part of Relationship (2)

• part of followed by is a: if A is part of B, and B is a C, then A is part of C

• Example

Page 29: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Reasoning over part of Relationship (3)

• part of following is a: if A is a B, and B is part of C, then A is part of C

• Example

Page 30: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Reasoning over part of Relationship (4)

• The aforementioned logical rules regarding the part of and is a relationships hold no matter how many intervening is a and part of relationships there are

• Example
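The reasoning rules for is a and part of can be summarized as a small composition table and applied along a path of any length. The sketch below assumes the standard GO reading of the two mixed cases (both yield part of), since the slide figures did not survive in this transcript; the names `COMPOSE` and `infer_chain` are hypothetical.

```python
# Result of following one relationship by another.
COMPOSE = {
    ("is_a", "is_a"): "is_a",          # is a is transitive
    ("part_of", "part_of"): "part_of", # part of is transitive
    ("part_of", "is_a"): "part_of",    # part of followed by is a
    ("is_a", "part_of"): "part_of",    # part of following is a
}


def infer_chain(relations):
    """Fold a path of relationships into the single relationship it implies.

    Returns None when some step has no entry in the composition table.
    """
    if not relations:
        raise ValueError("empty path")
    result = relations[0]
    for rel in relations[1:]:
        result = COMPOSE.get((result, rel))
        if result is None:
            return None
    return result


print(infer_chain(["is_a", "is_a", "is_a"]))
print(infer_chain(["is_a", "part_of", "is_a", "part_of"]))
```

A path made only of is a arcs stays is a, while a single part of arc anywhere on the path turns the inferred relationship into part of, however many arcs intervene.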

Page 31: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

regulates Relationship in the GO

• B regulates A, meaning that whenever both A and B are present, B always regulates A

• But not vice versa, i.e., A may not always be regulated by B

• In other words
① all B regulate A
② but only some A are regulated by B

• Example

Page 32: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Reasoning over regulates Relationship (1)

• Both negatively regulates and positively regulates imply regulates

• Example
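The rule that both sub-relationships imply regulates can be sketched as a lookup from each specific relationship to its more general one. Only that single rule is encoded here; the identifiers are hypothetical spellings of the relationship names.

```python
# Specific relationship -> the more general relationship it implies.
SUPER_RELATION = {
    "negatively_regulates": "regulates",
    "positively_regulates": "regulates",
}


def implied_relations(relation):
    """Return the relation followed by every more general relation it implies."""
    implied = [relation]
    while implied[-1] in SUPER_RELATION:
        implied.append(SUPER_RELATION[implied[-1]])
    return implied


def holds(asserted, queried):
    """True if asserting `asserted` lets us conclude `queried`."""
    return queried in implied_relations(asserted)


print(holds("negatively_regulates", "regulates"))
print(holds("regulates", "positively_regulates"))
```

The implication runs in one direction only: knowing that B regulates A says nothing about whether the regulation is positive or negative.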

Page 33: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Reasoning over regulates Relationship (2)

Page 34: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Reasoning over regulates Relationship (3)

Page 35: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Reasoning over regulates Relationship (4)

Page 36: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Reasoning over regulates Relationship (5)

• Example

Page 37: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Reasoning over regulates Relationship (6)

• Example

Page 38: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Reasoning over regulates Relationship (7)

• Example

Page 39: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Presentation Outline

• Research Motivation

• Ontologies and Ontological Techniques

• Applying Ontological Techniques to Biological and Medical Research

• Ongoing Research – OMIT Project

Page 40: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Ongoing Research: OMIT Project

http://omit.cis.usouthal.edu/

Besides Sun Lab at CUHK, there are five other collaborating labs from around the world

Page 41: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Project Overview

• An innovative computing framework based on the Ontology for MicroRNA Target Prediction (OMIT) to handle the aforementioned challenge in predicting miRNAs’ target genes

• The OMIT is a domain-specific ontology upon which it is possible to facilitate knowledge discovery and sharing from existing sources

• The long-term research objective of the OMIT framework is to assist biologists in unraveling important roles of miRNAs in human cancer, and thus to help clinicians in making sound decisions when treating cancer patients

• We aim to synthesize data from existing source miRNA databases into a comprehensive conceptual model that permits an emphasis on data semantics

• Consequently, a more accurate, complete view of miRNAs’ biological functions can be acquired

We thus provide users with a single query engine that takes their needs in a nonprocedural specification format

Page 42: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

System Framework

Page 43: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Five Tasks in the OMIT Project

① To develop a miRNA-domain-specific ontology that contains a set of OMIT concepts, along with the relationships among these concepts

② To align the OMIT with the GO so that gene-related information can be automatically acquired and integrated

③ To annotate source miRNA databases with OMIT concepts for existing databases to be enriched with formal semantics

④ To integrate OMIT-annotated miRNA databases into a centralized RDF data warehouse

⑤ To perform complicated search/query in a unified style so that deep knowledge can be obtained out of a wealth of miRNA data

Page 44: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

An Example Research Scenario

Suppose a cancer biologist is interested in investigating the chemosensitivity of breast cancer cells

• By comparing chemosensitive and chemoresistant cancer cells, it is demonstrated that miR-125b, a specific miRNA, may confer increased chemosensitivity on cancer cells

• After the OMIT system obtains candidate targets for miR-125b, the gene information of these targets will be further acquired, including cellular localization (e.g., in mitochondria) and biological process (e.g., apoptosis)

• The availability of such integrated knowledge will make it much easier for the cancer biologist to deduce the actual targets for miR-125b

• As a result, a breakthrough in breast cancer treatment may be achieved

Page 45: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

A Typical Knowledge Acquisition Cycle

• Steps 1-3: the user initiates a search/query; the recognized miRNA concept is used to query the RDF data warehouse

• Steps 4-5: miRNA targets are retrieved and utilized to acquire more gene information

• Steps 6-8: miRNA targets and their related gene information are returned to the user

Corresponding RDF-based query:

SELECT DISTINCT OMIT:targetGene
FROM OMIT:miRNA, GO-CC:cellComponent, GO-BP:bioProcess
WHERE OMIT:miRNA ID = “miR-125b”
AND OMIT:miRNA targetID = GO-CC:cellComponent geneID
AND OMIT:miRNA targetID = GO-BP:bioProcess geneID
AND GO-CC:cellComponent localization = “mitochondria”
AND GO-CC:cellComponent permeabilityIncrease = “yes”
AND GO-BP:bioProcess apoptosisIncrease = “yes”
USING NAMESPACE
OMIT = <http://omit.cis.usouthal.edu/ontology/OMIT.owl>,
GO-CC = <http://www.geneontology.org/formats/oboInOwl#>,
GO-BP = <http://www.geneontology.org/formats/oboInOwl#>.
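The acquisition cycle behind this query can be imitated with plain dictionaries: look up the candidate targets of a miRNA, then keep only those whose gene annotations match the requested cellular component and biological process. Everything below is a hypothetical stand-in: the gene names (GENE_A, GENE_B, GENE_C) and their annotations are invented placeholders, not real prediction results, and the dictionaries only mimic the role of the RDF data warehouse.

```python
# Hypothetical stand-in for the miRNA part of the RDF data warehouse.
MIRNA_TARGETS = {"miR-125b": ["GENE_A", "GENE_B", "GENE_C"]}

# Hypothetical stand-in for gene annotations obtained through the GO.
GO_ANNOTATIONS = {
    "GENE_A": {"cellular_component": {"mitochondria"}, "biological_process": {"apoptosis"}},
    "GENE_B": {"cellular_component": {"nucleus"}, "biological_process": {"apoptosis"}},
    "GENE_C": {"cellular_component": {"mitochondria"}, "biological_process": {"translation"}},
}


def query_targets(mirna_id, localization, process):
    """Return (gene, annotations) pairs that satisfy both GO conditions."""
    results = []
    # Retrieve candidate targets for the recognized miRNA concept.
    for gene in MIRNA_TARGETS.get(mirna_id, []):
        # Acquire additional gene information for each target.
        annotations = GO_ANNOTATIONS.get(gene, {})
        in_component = localization in annotations.get("cellular_component", ())
        in_process = process in annotations.get("biological_process", ())
        if in_component and in_process:
            results.append((gene, annotations))
    return results


for gene, info in query_targets("miR-125b", "mitochondria", "apoptosis"):
    print(gene, info)
```

The point of the single query engine is that the user states the conditions once, instead of visiting the miRNA database and the gene annotation resources separately for each candidate.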

Page 46: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Top-Level OMIT Concepts

Page 47: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Expanded View of OMIT Concepts (Portion)

Page 48: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Linkage between the OMIT and the GO

• Some OMIT concepts are directly inherited and extended from GO concepts

For example, OMIT concept GeneExpression is designed to describe miRNAs’ regulation of gene expression. This concept is inherited from concept gene expression in the BiologicalProcess ontology. This way, subclasses of gene expression, such as negative regulation of gene expression, are then accessible in the OMIT for describing the negative gene regulation of miRNAs in question

• Some OMIT concepts are equivalent to (or similar to) GO concepts

For example, OMIT concept PathologicalEvent and its subclasses are designed to describe biological processes that are disturbed when a cell becomes cancerous. Although not immediately inherited from any specific GO concepts, these OMIT concepts do match up with certain concepts in the BiologicalProcess ontology. OMIT concepts TargetGene and Protein are two other examples, which correspond to individual genes and individual gene products, respectively, in the GO
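The inheritance style of linkage can be sketched as follows: once an OMIT concept is declared to inherit from a GO concept, every subclass recorded under that GO concept becomes reachable from the OMIT side. The two dictionaries contain only the example named above and are hypothetical structures, not the actual OMIT or GO files.

```python
# GO side: parent term -> direct subclasses (only the example from the text).
GO_SUBCLASSES = {
    "gene expression": ["negative regulation of gene expression"],
}

# OMIT side: concept -> GO concept it is inherited from.
INHERITS_FROM = {"GeneExpression": "gene expression"}


def accessible_go_terms(omit_concept):
    """GO terms reachable from an OMIT concept through inheritance."""
    root = INHERITS_FROM.get(omit_concept)
    if root is None:
        return []  # not linked by inheritance (it may still be matched as equivalent)
    reachable, stack = [], [root]
    while stack:
        term = stack.pop()
        reachable.append(term)
        stack.extend(GO_SUBCLASSES.get(term, []))
    return reachable


print(accessible_go_terms("GeneExpression"))
print(accessible_go_terms("PathologicalEvent"))
```

Concepts of the second kind, such as PathologicalEvent, return nothing here because they are linked by correspondence rather than inheritance, which would call for a separate matching step.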

Page 49: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

OMIT GUI Design

Page 50: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

OMIT Summary

• It is an innovative computing framework based on the miRNA-domain-specific ontology

• It aims to handle the challenge of predicting miRNAs’ target genes

• The OMIT is the very first ontology in the miRNA domain

• It will assist biologists in unraveling important roles of miRNAs in human cancer, and thus help clinicians in making sound decisions when treating cancer patients

• This long-term research goal will be achieved by facilitating knowledge discovery and sharing from existing sources

• The first version of the OMIT ontology has been added to the NCBO BioPortal (http://bioportal.bioontology.org/ontologies/42873)

• Updates are available at the project website: http://omit.cis.usouthal.edu/

Page 51: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Presentation Outline

• Research Motivation

• Ontologies and Ontological Techniques

• Applying Ontological Techniques to Biological and Medical Research

• Ongoing Research – OMIT Project

Page 52: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Summary

• Knowledge discovery and sharing is critical in biological and medical research

• As a formal knowledge representation model, ontologies render great help in defining formal semantics

• Ontological techniques have been widely applied in bioinformatics and medical informatics

• The most successful example is the Gene Ontology (GO) project

• Our ongoing project, OMIT, aims to investigate the challenging issue of miRNA target prediction in human cancer

Page 53: Ontology-Based Knowledge Discovery and Sharing in Biological and Medical Research

Thank you!!!

• Suggestions?

• Comments?

• Questions?