bio ont

Upload: raazia-mir

Post on 07-Apr-2018


TRANSCRIPT

  • 8/6/2019 bio ont

    1/19

    Bio-Ontologies:

    a New Means of Travel for Biological Facts

    DMSR


    Outline

    The role of Bio-Ontologies [BOs] in biological databases

    Four interpretive steps in standardization

    The epistemic status of BO terms: situating concepts

    A new type of theory in biology? Back to Mary Hesse's network view

    Implications: data travel and use across research contexts

    Conclusion: on technology and theory-making


    Biological Ontologies [BOs]

    Context: Fast accumulation of data on model organisms, esp. genomics

    Fragmentation of biology into local epistemic cultures

    Common yearning for integrative understanding of organisms

    Goal: enhance availability and usability of data across research contexts

    Means: formal representations of areas of knowledge in which the essential terms are combined with structuring rules that describe the relationship between the terms. Knowledge that is structured in a bio-ontology can then be linked to the molecular databases (Bard and Rhee 2004)

    Precisely defined terms related through directed acyclic graph (DAG) structures

    Association of terms with datasets
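The structure described on this slide (defined terms, parent links forming a DAG, and datasets attached to terms) can be sketched in a few lines of Python. This is only an illustrative sketch, not how any real bio-ontology is stored: the class names `Term` and `Ontology`, the "placeholder definition" strings, and the three-term chain are assumptions made for the example. Only the G2/M definition and the TSK association are taken from a later slide.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One ontology term: a name, a definition, and links to parent terms."""
    name: str
    definition: str
    parents: list = field(default_factory=list)  # a DAG allows several parents


class Ontology:
    """Hypothetical container: terms plus the datasets annotated to each term."""

    def __init__(self):
        self.terms = {}        # term name -> Term
        self.annotations = {}  # term name -> set of dataset identifiers

    def add_term(self, name, definition, parents=()):
        # Terms are never redefined and parents must already exist,
        # so the graph stays acyclic by construction.
        if name in self.terms:
            raise ValueError(f"term already defined: {name}")
        for parent in parents:
            if parent not in self.terms:
                raise KeyError(f"unknown parent term: {parent}")
        self.terms[name] = Term(name, definition, list(parents))
        self.annotations[name] = set()

    def annotate(self, dataset_id, term_name):
        """Associate a dataset (e.g. a gene product) with a term."""
        if term_name not in self.terms:
            raise KeyError(f"unknown term: {term_name}")
        self.annotations[term_name].add(dataset_id)


# Simplified, assumed hierarchy (real ontologies have many more levels).
onto = Ontology()
onto.add_term("cell cycle", "placeholder definition")
onto.add_term("mitotic cell cycle", "placeholder definition",
              parents=["cell cycle"])
onto.add_term("G2/M transition of mitotic cell cycle",
              "progression from G2 phase to M phase of the mitotic cell cycle",
              parents=["mitotic cell cycle"])
onto.annotate("TSK", "G2/M transition of mitotic cell cycle")
```

Requiring parents to exist before a child is added is one simple way to keep the graph acyclic; it is a design choice of this sketch, not a claim about how curators build ontologies.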


    E.g. Gene Ontology:

    Precise definition, large set of associated data


    Wnt receptor signaling pathway

    Search by GO


    Search returns children

    Sum of MGI data


    Returns set of genes annotated to this term

    Search returns annotations to terms and sub-terms (children)
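The search behaviour noted above (a query on a term also returns annotations to its sub-terms) amounts to a graph traversal followed by a union of annotation sets. The sketch below is a hedged illustration of that idea, not the code of any actual database. The function names, the child terms "sub-term A" and "sub-term B", and the gene identifiers are all hypothetical; only the parent term name comes from the slides.

```python
def descendants_and_self(children, term):
    """Return `term` plus every term reachable by following child links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen


def search(children, annotations, term):
    """Collect genes annotated to `term` or to any of its sub-terms."""
    genes = set()
    for t in descendants_and_self(children, term):
        genes |= annotations.get(t, set())
    return genes


# Hypothetical child terms and gene names, for illustration only.
children = {
    "Wnt receptor signaling pathway": ["sub-term A", "sub-term B"],
    "sub-term A": [],
    "sub-term B": [],
}
annotations = {
    "Wnt receptor signaling pathway": {"gene1"},
    "sub-term A": {"gene2"},
    "sub-term B": {"gene3"},
}

parent_hits = search(children, annotations, "Wnt receptor signaling pathway")
child_hits = search(children, annotations, "sub-term A")
```

Here `parent_hits` contains all three genes while `child_hits` contains only the gene annotated to the sub-term, which mirrors the difference between searching a broad term and a specific one.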


    BO Terms as Standards

    Standard = Coordination device facilitating interdisciplinary research (Berg 2004)

    BO terms as neutral tools for scientific communication and exchange:

    Data are attached to specific BO terms purely for the purposes of retrieval by biologists interested in investigating the phenomenon to which the term refers

    No theoretical interpretation involved: BO terms are broad classificatory concepts conceived to pass on information without distorting or interpreting it

    However: Interpretation in standardisation is unavoidable (Bowker & Star 1999)


    Interpreting to Standardize: 4 Steps

    1. Abstraction processes: Masking, distorting, simplifying or eliminating characteristics of entities to be standardised (data formatting)

    2. De-contextualisation processes: Black-boxing specific interests, methods and goals of producers of data (non-locality: decoupling marks from provenance)

    3. Knowledge-stabilisation processes: Assemble precise definitions for each term and relation so as to mirror (what curators see as) the consensus in contemporary biology

    4. Situating processes: Associate each dataset with a specific term (and thus a specific phenomenon)

    = standardisation processes influence the database users' understanding and use of data


    BO Terms as Situating Concepts

    Unambiguously defined as referring to specific phenomena (knowledge-stabilisation process)

    Through gene annotation, each available dataset is associated with one or more BO terms (situating process). This makes it possible to retrieve data relevant to the phenomena captured by those terms.

    But it also fixes the biological relevance of data as evidence: BO terms determine the range of phenomena to be researched by reference to each dataset

    BO terms are situating concepts = they determine the future applicability of data by fixing the research contexts in which data can be of use

    Vs. unifying or explanatory concepts: BO terms do not aim at explaining phenomena, but rather at describing a phenomenon so that data associated with it can easily be retrieved


    Select data about gene product TSK from publication: Suzuki et al., 2005, Plant Cell Physiol. 46:736-742, "TONSOKU Is Expressed in S Phase of the Cell Cycle and Its Defect Delays Cell Cycle Progression in Arabidopsis"

    Associate with term "G2/M transition of mitotic cell cycle", which is defined as "progression from G2 phase to M phase of the mitotic cell cycle"

    Looking for data on the mitotic cell cycle, researchers find gene product TSK as relevant to the G2/M transition

    Gene product TSK could be relevant to researching other parts of the mitotic cell cycle, but there is no evidence for this, so the database does not report this possibility

    = the biological relevance of dataset Y is restricted to the phenomenon captured by the term "G2/M transition", thus excluding other, possibly relevant phenomena

    Select data from publication or repository

    Associate data with GO term

    GO term refers to phenomenon X

    Data are situated as relevant to phenomenon X and not to other phenomena
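The situating effect in the TSK example can be made concrete with a small sketch: once a dataset is attached to one term, it is retrievable under that term and its ancestors, and invisible under every other term. The code below is an illustration under stated assumptions. The function names are hypothetical, the hierarchy is deliberately simplified to one parent level, and the sibling term "G1/S transition of mitotic cell cycle" is added here only to stand for "other parts of the mitotic cell cycle"; it does not appear in the slides.

```python
def ancestors_and_self(parents, term):
    """Return `term` plus every term reachable by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def retrievable_under(parents, annotated_terms):
    """Terms under which a dataset turns up, given its direct annotations."""
    found = set()
    for term in annotated_terms:
        found |= ancestors_and_self(parents, term)
    return found


# Simplified, assumed hierarchy: two sibling terms under one parent.
parents = {
    "mitotic cell cycle": [],
    "G2/M transition of mitotic cell cycle": ["mitotic cell cycle"],
    "G1/S transition of mitotic cell cycle": ["mitotic cell cycle"],
}

# The curator's situating decision: TSK data attached to one term only.
annotated = {"TSK": {"G2/M transition of mitotic cell cycle"}}

visible = retrievable_under(parents, annotated["TSK"])
hidden = set(parents) - visible  # terms under which TSK never appears
```

In this sketch `hidden` holds the sibling term: a researcher querying it would not see TSK, even though nothing in the data rules out its relevance there.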


    A New Type of Theory in Biology?

    Mary Hesse's three criteria:

    1. Network of concepts: situating concepts rather than unifying or explanatory concepts

    2. Observational and theoretical language: concepts are primarily meant to refer to existing phenomena (a mix of observational and theoretical)

    3. Internal coherence and economy:

    Consistency among terms = terms should not have the same referents (otherwise redundant/obsolete)

    Minimalism = the most useful standards are those that consist of the minimal number of the most informative parameters (Brazma et al 2006, 594)


    Implications

    Data travel made easier: Easy retrieval and comparison

    Easy to check and form new hypotheses

    Relatively simple access skills: IT skills & acquaintance with BOs

    What about data use?

    Easy retrieval of information about data provenance (evidence codes)

    BUT users need to be aware of interpretive processes involved in standardization
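Because each annotation carries an evidence code, a user can inspect or filter provenance before relying on a result. The sketch below shows one way this might look. The record layout, function name, gene names, term name and references are hypothetical; the codes themselves (for instance IDA, inferred from direct assay; IMP, inferred from mutant phenotype; IEA, inferred from electronic annotation) follow the Gene Ontology's published evidence code list.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical annotation record: what was annotated, and on what basis."""
    gene: str
    term: str
    evidence: str   # GO evidence code, e.g. "IDA" or "IEA"
    reference: str  # where the supporting data came from


# GO's experimental evidence codes.
EXPERIMENTAL = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})


def with_evidence(records, allowed):
    """Keep only annotations whose evidence code is in `allowed`."""
    return [r for r in records if r.evidence in allowed]


# Made-up records for illustration only.
records = [
    Annotation("geneA", "term X", "IDA", "hypothetical paper 1"),
    Annotation("geneB", "term X", "IEA", "hypothetical pipeline run"),
    Annotation("geneC", "term X", "IMP", "hypothetical paper 2"),
]

experimental = with_evidence(records, EXPERIMENTAL)
electronic = with_evidence(records, {"IEA"})
```

Filtering of this kind exposes how an annotation was supported, but it does not undo the situating decision itself: the choice of term remains the curator's interpretation.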


    Conclusion: When Technology Makes a Difference to Theory-Making

    Digital technology does not guarantee objectivity:

    Curators estimate the biological relevance of data as evidence for phenomena

    Curators define situating concepts

    Yet, technology efficiently mediates between different (local) expertises:

    Integration of data from various sources

    Opportunity for comparisons and queries

    Differential access to information depending on expertise: layers of complexity and detail reachable through a mouse click

    BOs & bioinformatics: towards integration without unification?


    Abstract

    Bio-ontologies are often presented as a neutral tool for the diffusion of facts about organisms to biologists: that is, as a way to standardise the terminology and relations among terms used to describe biological processes, so that the immense amount of (especially microbiological) data recently accumulated on various aspects of the main model organisms can be brought together and made accessible to the whole biological community. In this paper, I argue that bio-ontologies are not a neutral vehicle for the diffusion of evidence. Rather, they constitute a new type of biological theory, incorporating a specific perspective on biological phenomena, through which data are re-interpreted in order to fit specific research goals. Notably, one of these goals consists of integrating the available knowledge about various aspects of any organism into an overall understanding of its biology. The main issues that I shall address in this paper are thus the following: how well do biological facts circulate through bio-ontologies? How effective is the use of bio-ontologies towards obtaining integration in biology? And what kind of integration is that: is it actually possible to distinguish it from a kind of theoretical unification? In addressing these questions, I focus on the use of one of the bio-ontologies, the so-called Gene Ontology, to structure and display data about Arabidopsis thaliana within The Arabidopsis Information Resource.


    No associated data!


    Opens Browser