bio ont
TRANSCRIPT
-
8/6/2019 bio ont
1/19
Bio-Ontologies:
a New Means of Travel for Biological Facts
DMSR
-
Outline
The role of Bio-Ontologies [BOs] in biological databases
Four interpretive steps in standardization
The epistemic status of BO terms: situating concepts
A new type of theory in biology? Back to Mary Hesse's network view
Implications: data travel and use across research contexts
Conclusion: on technology and theory-making
-
Biological Ontologies [BOs]
Context: Fast accumulation of data on model organisms, esp. genomics
Fragmentation of biology into local epistemic cultures
Common yearning for integrative understanding of organisms
Goal: enhance availability and usability of data across research contexts
Means: formal representations of areas of knowledge in which the essential terms are combined with structuring rules that describe the relationship between the terms. Knowledge that is structured in a bio-ontology can then be linked to the molecular databases (Bard and Rhee 2004)
Precisely defined terms related through directed acyclic graph (DAG) structures
Association of terms with datasets
-
E.g. Gene Ontology:
Precise definition, large set of associated data
-
Wnt receptor signaling pathway
Search by GO
-
Search returns children
Sum of MGI data
-
Returns set of genes annotated to this term
Search returns annotations to terms and sub-terms (children)
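The search behaviour described on this slide can be sketched in a few lines. This is a minimal illustration, not the Gene Ontology's or MGI's actual implementation: the sub-term names, the gene identifiers, and the helper functions `build_children`, `descendants` and `search` are all hypothetical. It shows how a query on one term also collects annotations made to its children in a DAG, where a term may have more than one parent.

```python
from collections import defaultdict

# Hypothetical toy ontology: each term maps to its parent terms.
# Because this is a DAG, a term may have several parents.
PARENTS = {
    "signaling": [],
    "Wnt receptor signaling pathway": ["signaling"],
    "hypothetical Wnt sub-pathway A": ["Wnt receptor signaling pathway"],
    "hypothetical Wnt sub-pathway B": [
        "Wnt receptor signaling pathway",
        "hypothetical Wnt sub-pathway A",
    ],
}

# Hypothetical annotations: term -> gene products annotated directly to it.
ANNOTATIONS = {
    "Wnt receptor signaling pathway": {"gene1"},
    "hypothetical Wnt sub-pathway A": {"gene2"},
    "hypothetical Wnt sub-pathway B": {"gene2", "gene3"},
}


def build_children(parents):
    """Invert the child -> parents map into parent -> children."""
    children = defaultdict(set)
    for term, term_parents in parents.items():
        for parent in term_parents:
            children[parent].add(term)
    return children


def descendants(term, children):
    """Collect every sub-term reachable from `term` (iterative traversal)."""
    seen, stack = set(), [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def search(term):
    """Return genes annotated to `term` or to any of its sub-terms."""
    children = build_children(PARENTS)
    hits = set(ANNOTATIONS.get(term, set()))
    for sub_term in descendants(term, children):
        hits |= ANNOTATIONS.get(sub_term, set())
    return hits


print(sorted(search("Wnt receptor signaling pathway")))
```

Querying the parent term returns the genes attached to the sub-terms as well, which is why the result set for a broad term is a sum over its children.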
-
BO Terms as Standards
Standard = Coordination device facilitating interdisciplinary research (Berg 2004)
BO terms as neutral tools for scientific communication and exchange:
Data are attached to specific BO terms purely for the purposes of retrieval by biologists interested in investigating the phenomenon to which the term refers
No theoretical interpretation involved: BO terms are broad classificatory concepts conceived to pass on information without distorting or interpreting it
However: Interpretation in standardisation is unavoidable (Bowker & Star 1999)
-
Interpreting to Standardize: 4 Steps
1. Abstraction processes: Masking, distorting, simplifying or eliminating characteristics of entities to be standardised (data formatting)
2. De-contextualisation processes: Black-boxing specific interests, methods and goals of producers of data (non-locality: decoupling marks from provenance)
3. Knowledge-stabilisation processes: Assemble precise definitions for each term and relation so as to mirror (what curators see as) the consensus in contemporary biology
4. Situating processes: Associate each dataset with a specific term (and thus a specific phenomenon)
= standardisation processes influence the database users' understanding and use of data
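To make the four steps concrete, here is a rough sketch of how one record might pass through them. Everything in it is an assumption made for illustration: the record fields, the `SituatedDatum` class and the functions `abstract`, `decontextualise` and `situate` are hypothetical, and real curation is done by people exercising judgment rather than by three short functions.

```python
from dataclasses import dataclass

# Step 3 (knowledge stabilisation): curated term -> agreed definition.
# The single entry uses the term and definition quoted in these slides.
TERM_DEFINITIONS = {
    "G2/M transition of mitotic cell cycle":
        "progression from G2 phase to M phase of the mitotic cell cycle",
}


@dataclass(frozen=True)
class SituatedDatum:
    """Hypothetical shape of a datum once it sits in the database."""
    gene_product: str
    term: str
    definition: str


def abstract(record):
    """Step 1: simplify the record to fit the database's format."""
    return {**record, "gene_product": record["gene_product"].strip().upper()}


def decontextualise(record, keep=("gene_product",)):
    """Step 2: black-box the producers' methods and goals by dropping them."""
    return {key: value for key, value in record.items() if key in keep}


def situate(record, term):
    """Step 4: attach the datum to one curator-chosen term."""
    return SituatedDatum(record["gene_product"], term, TERM_DEFINITIONS[term])


# Hypothetical raw record as a producing lab might describe it.
raw = {
    "gene_product": " tsk ",
    "method": "hypothetical assay",
    "goal": "hypothetical lab question",
}

datum = situate(
    decontextualise(abstract(raw)),
    "G2/M transition of mitotic cell cycle",
)
print(datum)
```

The choice of `term` is passed in by hand on purpose: it stands for the curator's interpretive decision, which no step in the pipeline can make automatically.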
-
BO Terms as Situating Concepts
Unambiguously defined as referring to specific phenomena (knowledge-stabilisation process)
Through gene annotation, each available dataset is associated with one or more BO terms (situating process). This makes it possible to retrieve data relevant to the phenomena captured by those terms. But also ...
... it fixes the biological relevance of data as evidence: BO terms determine the range of phenomena to be researched by reference to each dataset
BO terms are situating concepts = they determine the future applicability of data by fixing the research contexts in which data can be of use
Vs. unifying or explanatory concepts: BO terms do not aim at explaining phenomena, but rather at describing a phenomenon so that data associated with it can easily be retrieved
-
Select data about gene product TSK from publication: Suzuki et al., 2005 Plant Cell Physiol. 46:736-742. TONSOKU Is Expressed in S Phase of the Cell Cycle and Its Defect Delays Cell Cycle Progression in Arabidopsis
Associate with term G2/M transition of mitotic cell cycle, which is defined as progression from G2 phase to M phase of the mitotic cell cycle
Looking for data on the mitotic cell cycle, researchers find gene product TSK as relevant to the G2/M transition
Gene product TSK could be relevant to researching other parts of the mitotic cell cycle, but there is no evidence for this, so the database does not report this possibility
= the biological relevance of dataset Y is restricted to the phenomenon captured by the term G2/M transition, thus excluding other, possibly relevant phenomena
Select data from publication or repository
Associate data with GO term
GO term refers to phenomenon X
Data are situated as relevant to phenomenon X and not to other phenomena
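The restriction described on this slide can be seen in a very small lookup. This is only a sketch: the one-row annotation table and the function `genes_for_term` are hypothetical, and the second term queried ("S phase of mitotic cell cycle") is used purely as an example of another part of the cell cycle.

```python
# Hypothetical miniature annotation table with the single association
# described on this slide.
ANNOTATIONS = [
    {
        "gene_product": "TSK",
        "term": "G2/M transition of mitotic cell cycle",
        "source": "Suzuki et al. 2005",
    },
]


def genes_for_term(term):
    """Return gene products annotated to exactly this term."""
    return sorted(
        {row["gene_product"] for row in ANNOTATIONS if row["term"] == term}
    )


# The datum is retrievable under the term it was situated with ...
print(genes_for_term("G2/M transition of mitotic cell cycle"))

# ... but not under other terms, even ones it might be relevant to.
print(genes_for_term("S phase of mitotic cell cycle"))
```

The second query comes back empty because no annotation was made to that term. The empty result records an absence of curated evidence and says nothing about whether TSK matters to that phenomenon.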
-
A New Type of Theory in Biology?
Mary Hesse's three criteria:
1. Network of concepts: Situating concepts rather than unifying or explanatory concepts
2. Observational and theoretical language: Concepts are primarily meant to refer to existing phenomena: mix of observational and theoretical
3. Internal coherence and economy:
Consistency among terms = should not have the same referents (otherwise redundant/obsolete)
Minimalism = the most useful standards are those that consist of the minimal number of the most informative parameters (Brazma et al 2006, 594)
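The consistency criterion (no two terms with the same referent) lends itself to a simple mechanical check. The sketch below rests on a loud assumption: it treats two terms as having the same referent when their definitions are textually identical after normalising case and whitespace, which is only a crude proxy for what a curator would judge. The term names, definitions and the function `redundant_terms` are hypothetical.

```python
from collections import defaultdict


def redundant_terms(definitions):
    """Group terms whose normalised definitions are identical.

    Assumption: identical definition text stands in for "same referent".
    Returns a list of groups, each a sorted list of term names.
    """
    by_definition = defaultdict(list)
    for term, definition in definitions.items():
        normalised = " ".join(definition.lower().split())
        by_definition[normalised].append(term)
    return [sorted(terms) for terms in by_definition.values() if len(terms) > 1]


# Hypothetical terms and definitions, for illustration only.
definitions = {
    "term A": "Progression from phase one to phase two",
    "term B": "progression from  phase one to phase two",
    "term C": "a different process",
}

print(redundant_terms(definitions))
```

Here "term A" and "term B" would be flagged as candidates for merging or obsoleting, while "term C" passes the check.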
-
Implications
Data travel made easier: Easy retrieval and comparison
Easy to check and form new hypotheses
Relatively simple access skills: IT skills & acquaintance with BOs
What about data use?
Easy retrieval of information about data provenance (evidence codes)
BUT users need to be aware of interpretive processes involved in standardization
-
Conclusion: When Technology Makes a Difference to Theory-Making
Digital technology does not guarantee objectivity:
Curators estimate the biological relevance of data as evidence for phenomena
Curators define situating concepts
Yet, technology efficiently mediates between different (local) expertises:
Integration of data from various sources
Opportunity for comparisons and queries
Differential access to information depending on expertise: layers of complexity and detail reachable through a mouse click
BOs & bioinformatics: towards integration without unification?
-
Abstract
Bio-ontologies are often presented as a neutral tool for the diffusion of facts about organisms to biologists: that is, as a way to standardise the terminology and relations among terms used to describe biological processes, so that the immense amount of (especially microbiological) data recently accumulated on various aspects of the main model organisms can be brought together and made accessible to the whole biological community. In this paper, I argue that bio-ontologies are not a neutral vehicle for the diffusion of evidence. Rather, they constitute a new type of biological theory, incorporating a specific perspective on biological phenomena, through which data are re-interpreted in order to fit specific research goals. Notably, one of these goals consists of integrating the available knowledge about various aspects of any organism into an overall understanding of its biology. The main issues that I shall address in this paper are thus the following: how well do biological facts circulate through bio-ontologies? How effective is the use of bio-ontologies towards obtaining integration in biology? And what kind of integration is that: is it actually possible to distinguish it from a kind of theoretical unification? In addressing these questions, I focus on the use of one of the bio-ontologies, the so-called Gene Ontology, to structure and display data about Arabidopsis thaliana within The Arabidopsis Information Resource.
-
No associated data!
-
Opens Browser