the semantic web and knowledge grids
TRANSCRIPT
TECHNOLOGIES
DRUG DISCOVERY
TODAY
The Semantic Web and KnowledgeGridsCarole Goble*, Robert Stevens, Sean BechhoferSchool of Computer Science, The University of Manchester, Oxford Road, Manchester, UK M13 9PL
Drug Discovery Today: Technologies Vol. 2, No. 3 2005
Editors-in-Chief
Kelvin Lam – Pfizer, Inc., USA
Henk Timmerman – Vrije Universiteit, The Netherlands
Knowledge management
The Semantic Web and the Knowledge Grid are
recently proposed technological solutions to distribu-
ted knowledgemanagement. Early experimental appli-
cations from the Life Science community indicate that
the approaches have promise and suggest that this
community be an appropriate nursery for grounding,
developing and hardening the current, rather imma-
ture, machinery needed to deliver on the technological
visions, which thus far have been dominated by tech-
nological curiosity rather than application-led practi-
cality and relevance. Further necessary developments
in theory, infrastructure, tools, and content manage-
ment should and could be steered opportunistically by
the needs and applications of Life Science.
Introduction: what is the Semantic Web?
The Web has served life scientists well. Many data sets and
tools are published and accessed using web protocols and web
browsers. Sharing data repositories and tool libraries is
straightforward. Widespread collaboration is possible by pub-
lishing a simple web page. Thus, the Web enables individual
scientists to answer simple ‘low volume’ questions over large
but relatively simple data sets without needing a profound
knowledge of computer science. However, standard web
technology is now straining to meet the needs of biologists.
A Web-based distributed information infrastructure is a place
where a person performs complex tasks and computers pre-
*Corresponding author: C. Goble ([email protected])URL: http://www.cs.man.ac.uk/�carole/
1740-6749/$ � 2005 Elsevier Ltd. All rights reserved. DOI: 10.1016/j.ddtec.2005.08.005
Section Editor:Manuel Peitsch – Novartis, Basel, Switzerland
sent and fetch web pages. People have to manually search the
Web for content or just know where to go; interpret and process
page content by reading it and interacting with web pages;
infer crosslinks between information in web pages or other
sites; integrate content from multiple resources and consolidate
the heterogeneous information while preserving the under-
standing of its context. Well-known specialist applications like
SRS and Entrez are designed to overcome these difficulties but
they are not general solutions, and the interpretation of the
information is still buried in the application code. Service
providers still publish resources assuming that a person will
be ‘point-clicking’ at a browser and reading the text; without
a programmatic interface, automatic processing is difficult
and fragile.
The Semantic Web could enable automated processing and
effective reuse of information on the Web that would support
intelligent searching and improved interlinking [1,2]. The
Semantic Web is an extension of the Web in which informa-
tion is given well-defined meaning by being associated with
metadata described in common terms [3]. Ontologies repre-
sent the vocabulary terms, and how they inter-relate, for the
concepts shared by a community. This requires that all kind
of web content be marked up with metadata that encodes its
meaning in a way that is machine-interpretable and hence be
processed by agents, search engines and applications to auto-
mate the content discovery and integration tasks that people
currently do manually. The meaning of information is
embedded in the Web, not the applications, so it can be
unambiguously and reliably shared across many applications.
www.drugdiscoverytoday.com 225
Drug Discovery Today: Technologies | Knowledge management Vol. 2, No. 3 2005
Thus, the Web could evolve from documents published for
people to read to knowledge published for computer applica-
tions to process. We can think of it as a way of representing
data on the Web or as a globally linked knowledge base.
In practice there will be many high-quality Semantic Webs
linked together through regular Web links and search mechan-
isms or low-quality metadata. Semantic Webs are costly and
difficult to produce and maintain, so only community ‘web-
lets’ or corporate intranets that gain from the significant added
value will bother with the high quality metadata needed.
Semantic Webs for Life Science communities and organisa-
tions involved in drug discovery are obvious candidates. Why?
They are knowledge driven, fragmented, and have valuable
knowledge assets whose contents need to be combined and
used by many applications. The content is diverse, being
structured (databases, electronic lab books), semistructured
(papers, ExcelTM sheets) and unstructured (PowerpointTM
documents, Web blogs, images). Its scale necessitates that
the processing be done automatically. There are many suppli-
ers and consumers of knowledge and a loose coupling between
suppliers and consumers – information is used in unantici-
pated ways by knowledge workers unknown to those who
deposited it. People naturally form communities of practice,
and there is a culture of sharing and knowledge curation. The
explosion of data coupled with the need to innovate means the
problem is mainstream and urgent.
Knowledge Grids
The Semantic Web aims to facilitate machine support for
distributed knowledge management through the provision of
a semantic infrastructure. The Grid aims to support secure
and flexible coordinated resource sharing through the provi-
sion of a middleware platform for advanced distributing
computing [4]. Grid machinery aims to allow the collection
of all kinds of resources – computing, storage, data sets, digital
libraries, scientific instruments, people, among others – to
easily form Virtual Organisations that cross organisational
boundaries and work together to solve a problem. Computa-
tional-file based Grids are the most mature, harnessing avail-
able compute power to support compute-intensive analysis
applications. Whereas Computational Grids present the illu-
sion of a single virtual computer to an application, Data Grids
present a single virtual data store that is really distributed and
multilocated. Portals provide a way for application develo-
pers to submit their compute job or query. On top of these
‘plumbing’ Grids, Application Grids aim to present the illu-
sion that applications work together when they do not. Grid
Computing is supported by the Global Grid Forum (http://
www.ggf.org) and the Enterprise Grid Alliance (http://
www.gridalliance.org/) through a series of standards, core
services and applications. Standardisation routes include
OASIS and IETF. The Grid has a strong industry influence
and major applications pull chiefly from global, large-scale
226 www.drugdiscoverytoday.com
scientific collaborations. The chief emphasis to date has been
on practical deployment of data and building compute grids.
Over the past four years the Open Grid Service Architecture
Grid initiative has revised Grid computing to adopt the
Service Oriented Architecture paradigm using Web Services,
a distributed computing platform from the Web community
and heavily supported by industry. This revision is still in a
state of flux, having had one false start in the OGSI specifica-
tion (now the WSRF specification). OGSA services are still at
the prototype stage. However, OGSA does bring together the
Web and Grid communities to a common platform.
Knowledge Grids are much less established as a concept, and
the term is still controversial [5]. Cannataro and Talia [6] define
the term as an environment for the design and execution of
geographically distributed high-performance knowledge dis-
covery applications, which is pretty much the purpose of all
Grids. Knowledge Grids use knowledge-based methodologies,
including knowledge engineering tools, discovery and analysis
techniques such as data mining and machine learning, intel-
ligent software agents, mathematical modelling, simulation or
planning. Goble and De Roure [7] distinguish a Knowledge
Grid from a Semantic Grid by suggesting that Knowledge Grids
apply to knowledge associated with Grid domain applications
and resources and Semantic Grids to knowledge associated
with grid middleware and entities. Either way, Knowledge
Grids are intended to provide intelligent guidance for decision
makers, through an agreed knowledge representation and the
provision of homogeneous access over heterogeneous sources
of information, metadata and ontologies.
We interpret a Knowledge Grid as a knowledge base to
support distributed knowledge discovery applications. From
this standpoint, a Semantic Web is a Knowledge Grid with the
emphasis on distributed knowledge representation and inte-
gration, and a Knowledge Grid is a platform for distributed
knowledge processing over a Semantic Web. In combination
they seek to support automation by exposing the implicit and
tacit knowledge used by humans and providing an appro-
priate programming infrastructure.
Key Semantic Web machinery
The Semantic Web has the backing of the World Wide Web
Consortium (W3C) standards organisation (World Wide Web
Consortium Semantic Web Activity, http://www.w3.org/
2001/sw/) and has spawned extensive research, development
and standards activity supported by industry and academia. A
range of technologies and machinery is needed to deliver
vision, and these are currently in various states of maturity,
from established standards and commercial products to
things improbable.
There are four key ingredients necessary to deliver a Seman-
tic Web: (a) a universal model of metadata, knowledge and
assertions over the current web. This forms a web of knowl-
edge layered over conventional web (or Grid, resources); (b)
Vol. 2, No. 3 2005 Drug Discovery Today: Technologies | Knowledge management
semantic content; (c) tools to build and maintain the model
and capture the content; and (d) knowledge-driven applica-
tions that use the semantic infrastructure. To date the com-
puting community has concentrated on (a) and (c). We can
legitimately say the Semantic Web is technology pushed
rather than application pulled. Standardisation activities
are focused on (a); commercial activities by specialist and
major vendors (notably Oracle, IBM and HP) are focused on
(c) and (d).
The Semantic Web model has five main components
(Fig. 1), usually presented as a layered stack. XML is a carrying
syntax for all the languages of the model and for all the
languages of the Web.
Identification
The Universal Resource Identifier (URI) ensures a unique
identity for each Semantic Web entity. The Life Science
Figure 1. The layered architecture of the Semantic Web.
Identifier (LSID) [8] protocol is a domain-specific protocol
that introduces a standardised way of naming data resources,
backed by an OMG standard. LSIDs have been successfully
adopted [9,10] (BioDash http://www.w3.org/2005/04/swls/
BioDash/Demo/), but teething problems include poor sup-
port for versions and some confusion over what data can
change without changing identity. Semantic Web purists
claim that the LSID is unnecessary although it seems not
to have developed applications for life scientists.
Metadata annotation
Resources are annotated by metadata that asserts facts about
and between them and their content in a common, flexible
data model. The Resource Description Framework (RDF;
http://www.w3.org/RDF/) describes objects and relations
between them in a self-describing data model. RDF is a key
to integrated and federated data storage, making this the
www.drugdiscoverytoday.com 227
Drug Discovery Today: Technologies | Knowledge management Vol. 2, No. 3 2005
‘integration’ layer of the Semantic Web infrastructure. We
have webs of metadata (Fig. 1) as well as webs of pages. The
RDF model is based on ‘subject-predicate-object’ statements
(‘triples’), assembled into graphs that assert facts about and
between resources (Fig. 2(1 and 2)). Facts are commonly held
separate from the resource held in triple stores (http://simile.-
mit.edu/reports/stores/). Oracle Corp is incorporating RDF
support into their regular relational databases (http://www.or-
acle.com/technology/tech/semantic_technologies/). RDF sup-
ports statements about statements, crucial for provenance
(‘this fact is asserted by EMBL-EBI’); timestamping (‘this fact
is asserted on 31-05-2005’) and meta-statements (‘this fact is
untested’). Although the RDF standard is established, the
technologies are still relatively immature. There is a confusing
range of RDF query languages although the SPARQL RDF query
language is being standardised (http://www.w3.org/TR/rdf-
sparql-query/). Performance over medium-large data sets is
disappointing. There is poor support for grouping statements
(‘named graphs’) [11]. Representing RDF within an HTML page
is an issue (http://www.cs.vu.nl/�guus/public/carroll-rdf-
html.pdf). The RDF syntax is designed for machines and
should never be revealed to humans; the same is true for
XML, of course, but RDF is particularly distracting.
Figure 2. The Semantic Bus. Suppliers (resources) expose their contents as RD
provide the glue that ties the data together and the knowledge that helps us to i
contents of resources, instruments and data sources in a uniform fashion – for ex
GSK3beta is associated with diabetes type 2. (2) Graph merging based on identit
GSK3beta and diabetes. (3) Ontologies provide the consensus and shared kno
phosphorylates proteins, which is an enzyme catalysis, which is a kind of control in
reference genes and their products. (4) Ontologies and rules provide an infere
GSK3beta might play in the Insulin Signal Transduction pathway. (5) The Semanti
gathered together and that the knowledge tools and inference engines process
228 www.drugdiscoverytoday.com
Knowledge
A shared interpretation of what the metadata means requires
ontologies for describing controlled vocabularies and back-
ground knowledge. Ontologies – consensual, shared models
in an executable form of concepts, relations and their con-
straints tied to a scaffold of taxonomies [12] – are common in
Life Sciences (http://www.sofg.org and http://obo.sourcefor-
ge.net/). We use ontology terms to assert facts about
resources, like web pages, databases, services, a protein, a
compound, a gene, a person, among others (Fig. 2(3 and
4)). Topic Maps (http://www.topicmaps.org/) have been pro-
posed as ‘indexing’ mechanisms for the Web content. How-
ever, the two standardised languages for exchanging and
representing ontologies are: RDF Schema (RDFS) [13], for
simple taxonomies and OWL [14], a family of Web Ontology
Languages extending RDFS (see [15] in this issue). The Seman-
tic Web Rule Language (SWRL) [16] adds rules to OWL knowl-
edge bases, encoding constraints and supporting further
deduction of knowledge. This is undergoing standardisation
with only prototype implementations and no commercial
support. Specialist vendors such as Ontoprise (http://
www.ontoprise.com) and Cerebra (http://cerebra.com) pro-
vide OWL tool suites.
F, allowing Consumers (applications) to make use of the data. Ontologies
nterpret the data. (1) RDF provides a common data model, exposing the
ample glycogen synthase kinase 3 beta (GSK3beta) is a protein kinase, and
ies from URIs provides links at the syntactic level to link resources about
wledge that ties data together semantically – that a protein kinase
teraction or that chemical entities reference compounds and drug targets
nce apparatus for generating implied knowledge, to infer the role that
c Bus is the collective knowledge base that could be physically or virtually
.
Vol. 2, No. 3 2005 Drug Discovery Today: Technologies | Knowledge management
Inference and reasoning
Metadata, ontologies and rules combine to make a distributed
knowledge base. The added value is to use the OWL and
SWRL computational reasoning capabilities to infer new
unasserted facts on or between resources and classify
resources based on their descriptions. The design of OWL
has been strongly influenced by its reasoning procedures.
However, concerns include the benefits, performance and
scalability of reasoning mechanisms and whether they can
cope with the incomplete or inconsistent knowledge found
in reality. One practice is to use reasoning ‘offline’, fix the
results and use these in real-time applications [17]. A few
reasoning engines are available commercially (Cerebra Ser-
verTM, RacerProTM http://www.racer-systems.com/) to be
embedded in applications and middleware.
Trust, proof, policy and context
The separation of assertions from the resource, and the
ability to assert facts about facts, is intended to support a
tangled accumulative web of third party assertions over
resources by those other than the resource creators or own-
ers. The Semantic Web is envisioned as a ‘democracy’ where
everyone can annotate resource or an annotation. The
Semantic Web is changeable, inconsistent and will contain
many dubious, outdated or conflicting statements.
Metadata on the metadata ties assertions (inferred or stated)
to a context such as its origin or creation date. This is
important for intellectual property, provenance tracing,
accountability and security as well as untangling contra-
dictions or weighting support for an assertion. Thus, in
addition to webs of metadata we gain webs of meta-meta-
data (Fig. 1). This is the least well-developed layers of the
technology stack. Very little infrastructure exists, the issues
are poorly understood and research is patchy and imma-
ture.
Semantic Web content
Without content, the Semantic Web has no semantics.
Acquiring the ontologies, rules and metadata is the prime
bottleneck to adoption. The Life Sciences have made great
efforts to develop standard domain ontologies for annotating
data sets (e.g. The Gene Ontology) and indexing documents
(e.g. UMLS). Ontologies have also been developed for describ-
ing services [17]. However, a Semantic Web for Life Sciences
also needs ontologies for publications, resources, experi-
ments, hypothesis, analysis, workflows among others as well
as people and organisations. Concerns arise over the com-
plexity of developing and maintaining ontologies. Knowl-
edge engineering tools like Protege-OWL [18], still tend to be
oriented to knowledge engineers not subject specialists, and
there is a paucity of shared best practices. Machine learning
and language processing to automatically generate ontolo-
gies [19] is in its infancy.
Annotation of resources with metadata splits into (a) high
quality manual annotation using tools like OntoMat Anno-
tizer [20] and (b) automatic (or semiautomatic) annotation
using text mining [21], language processing techniques [22]
or other forms of processing for other types of data. Text
mining, UMLS, MeSH and a culture of manual curation make
these approaches feasible. An alternative is to generate meta-
data in RDF directly from the content generating services, like
content management systems, data mining tools or by service
providers [23]. Semantic content generated by service provi-
ders, publishers and instrument suppliers needs to become
universal and ubiquitous to have an impact. UniProt (http://
www.isb-sib.ch/�ejain/rdf/) exports results in RDF, as do
some publishers like Nature [24] but these are exceptions.
Fig. 1 shows that there will be webs of knowledge in
addition to webs of metadata and web pages. Multiple, pos-
sibly overlapping, ontologies need aligning, merging and
linking, multiple metadata even on the same resource that
can potentially conflict multiple rules and multiple assertions
of trust and context. Just how this will work in practice, the
mechanisms needed to cope and the tools and services to
assist applications are still research subjects. The current
reality is either simple versions of the Semantic Web, or these
problems being passed on to the applications.
Semantic Web applications
Knowledge-driven applications consume the semantic infra-
structure. Applications have been mainly confined to corpo-
rate intranets for managing ring-fenced knowledge assets –
semantic islands in the sea of the Web. Current applications
divide into those that emphasise a Web of Semantics and
those that just emphasis the processing of knowledge. We
take the knowledge-oriented applications first.
Ontology development
The inferencing capabilities of OWL have been shown to aid
the building of large and sophisticated ontologies such as The
Gene Ontology [25] and BioPAX (http://www.biopax.org/).
The concept classification is derived and inconsistencies are
automatically identified. The standardisation of a language
significantly helps the exchange of ontologies. However,
there are some problems with the expressivity of OWL for
Life Science, Chemical and Clinical ontologies.
Advanced metadata modelling
The self-describing nature of RDF and OWL models enables
flexible descriptions for data collections, suiting those whose
schemas might evolve and change or whose data types are
hard to fix, like knowledge bases of scientific hypotheses,
provenance records of in silico experiments [26,27] or pub-
lication collections.
Now consider the applications where metadata is asso-
ciated with distributed resources.
www.drugdiscoverytoday.com 229
Drug Discovery Today: Technologies | Knowledge management Vol. 2, No. 3 2005
Intelligent searching and document discovery
Semantically enabled search engines, like TAP [28] could
support ‘concept-based’ searches over content annotations,
exploiting the structure of the ontology by automatically
narrowing or broadening search terms. The ontology hier-
archy classifies the contents of metadata collections. An
ontology definition for an oncogene including references
to organism, function, locus, and associated diseases anno-
tating a web page about function can be inferred to link to a
paper on locus despite the paper not mentioning it [1].
Semantically powered catalogues and service registries can
find data sets and tools based on ontological descriptions.
Semantic Portals describe the properties, relationships and
classifications of the various information items using an
ontology, for example, PlanetOnto (http://kmi.open.ac.uk/
projects/kmi-planet/) [29] and the SEmantic portAL [30].
Social and knowledge networking
A key knowledge commodity is who knows what and making
connections with similar minded communities. The popular
Friend of a Friend (FOAF) project http://www.foaf-project.org/
creates a Web of machine-readable homepages describing
people, the links between them and the things they create
and do. In addition to providing simple directory services,
FOAF can be used to provide assistance to new entrants in a
community and locate people with similar interests. It is a
simple RDF vocabulary; the power is the links. The big plus
comes when the data is aggregated and can then be explored
and crosslinked. SciFOAF builds a FOAF community mined
from the analysis of authors and publications over PubMed
(http://www.urbigene.com/foaf/).
Advanced content syndication and publication
Syndication is the process by which a web site is able to share
information, such as articles, with other web sites. Scientific
publishers like the Institute of Physics (http://syndication.io-
p.org/) and the International Union of Crystallography
(http://journals.iucr.org/services/rss.html) publish RSS feeds
in RDF using standard RSS, Dublin Core and PRISM (http://
www.prismstandard.org/) RDF vocabularies. Uniprot has an
experimental publication of results in RDF. As with FOAF, the
fun starts when data is aggregated and crosslinked.
Integration, aggregation and crosslinking
Results in RDF graphs yielded from multiple heterogeneous
sources, say, UniProt, Interpro and PubMed, are manipulated
and combined using shared ontological concepts and corre-
sponding URIs as bridges (Figs 2 and 3). The semantic defini-
tions serve as ‘smart glue’ specifying how objects relate to
each other, managing these inferred or explicitly proposed
relations outside any one repository and aggregating geno-
mic, proteomic, cellular, physiological, and chemical data,
even when these data are kept in different databases with
230 www.drugdiscoverytoday.com
different schemas and bridging the structured and unstruc-
tured free texts. Shared ontologies and the common RDF data
model help overcome the data of syntactic and semantic
heterogeneity, forming a global integration ‘Semantic bus’
that knowledge discovery applications build upon. For exam-
ple, YeastHub [31] converts the outputs of a variety of data-
bases into RDF and combines them in a warehouse built over
a native RDF data store. BioDASH (http://www.w3.org/2005/
04/swls/BioDash/Demo/) (Fig. 3) is an experimental Drug
Development Dashboard that uses RDF and OWL to associate
disease, compounds, drug progression stages, molecular biol-
ogy and pathway knowledge for a team of users. Correspon-
dences are not necessarily obvious to detect, requiring
specific rules. BioDASH uses ‘Semantic Lenses’ to filter and
aggregate RDF particularly in bio-orientation – for example, a
pathway lens.
Application interoperability
It is not just web pages that can be annotated; middleware like
the Web and Grid services can also be annotated to make
them accessible to applications [32]. Techniques for plan-
ning, composing, editing, reasoning and analysing about
these descriptions are being investigated and deployed to
resolve semantic interoperability between services within
scalable, open environments. Semantic annotations on appli-
cations and data sets enable them to be intelligently discov-
ered, compared and combined into workflows [33,34] to run
complex applications, integrate data and mine knowledge
(DiscoveryNet http://www.discovery-on-the.net/) [9].
Knowledge Grids and Knowledge mining
The purpose of all this technology is to create a knowledge
substrate for knowledge mining. Examples of Grid-based
Problem Solving Environments that integrate ontology and
workflow approaches for bioinformatics application on the
Grid include Proteus [35], myGrid [9] and DiscoveryNet
(http://www.discovery-on-the.net/).
Concluding remarks
The Semantic Web, and its less well-defined cousin the
Knowledge Grid, promises a paradigm shift for the delivery
of knowledge just as the Web changed how we publish and
find information. Certainly there is a tremendous amount of
activity in the development of standards, machinery and
tools for the semantic infrastructure and research into its
theoretical underpinnings. The technologies being devel-
oped have already demonstrated their value for ontology
development and advanced metadata modelling for general
knowledge-based applications. The first W3C Semantic Web
for Life Science Workshop in 2004 attracted over 100 parti-
cipants with representation from all the major pharmaceu-
tical and drug discovery players. Within a closed enterprise
pharmaceutical environment, such as KSpace in Novartis, or
Vol. 2, No. 3 2005 Drug Discovery Today: Technologies | Knowledge management
Figure 3. The BioDASH Drug Development Dashboard prototype. BioDASH associates disease, compounds, drug progression stages, molecular biology,
and pathway knowledge for a team of users. It is based on the concept of a therapeutic topic model. Information sources are exposed as RDF – common
vocabularies then tie these resources together. Applications can then make use of knowledge services and inference to support user tasks. The figure shows
an investigation into the therapeutic value of glycogen synthase kinase 3 beta (GSK3beta), a regulatory enzyme associated with multiple diseases, including
diabetes type 2. The BioDASH demonstration is built on Haystack [10], which is an extensible Semantic Web Browser developed by MIT. Data is aggregated
and browsed from OMIM and Uniprot by using LSIDs resolved into RDF and automatically linked with a BioPAX pathway, also in RDF, using common LSIDs.
The pathway view combines the protein and interacting compounds views. The BioPAX model is represented in OWL. Current work uses reasoning to
consistency check nutrient-related analyses for essential compounds, missing essential compounds, and reactions, both fired and unfired.
bounded communities, like the Alzheimer’s Forum we are
beginning to see Semantic Webs as they were envisioned.
For a Semantic Web to flourish, the communities it would
serve needs to be knowledge driven, globally distributed and
able and willing to create and maintain the semantic content.
It is this latter point that is crucial. As Gardner [15] suggests in
this issue, the drug discovery related communities are embra-
cing ontologies. The Life Science world has the desire for
collaboration, a culture of annotation, and act as service
providers that might be persuaded to generate RDF or at least
annotated XML. A Semantic Web is expensive to set up and
maintain and, thus, is only probable to work for communities
where the added value is worthwhile and an ‘open source
data’ philosophy prevails.
However, beware of the hype. Most of the technologies are
yet to be established and many key components – trust,
security, context – are missing. Reusing data in unexpected
and uncontrolled ways might not be desirable. There are
sceptics [36] who argue that XML is enough, and for tightly
defined and static problems they could be right. The best
examples of Semantic Web applications have been when the
emphasis has been on the content and connectivity rather
than elaborate reasoning. Simple ontologies and simple inte-
gration approaches, like FOAF and RSS, have had an impact.
Elaborate reasoning has chiefly been used in knowledge
applications or for building large ontologies, rather than
integrating metadata, despite its centre stage position in
the W3C standards activity. Although Semantic Web advo-
cates like to distance themselves from A.I. the Semantic Web
is still sold with far-fetched A.I. scenarios [3]. In reality it is the
Web aspect – the integration layer and a means to simply link
heterogeneous semistructured data – that will win out rather
than the rich semantics, although the research activity, and
hype, has primarily been the other way about [37]. The
important lesson is that a Semantic Web needs Semantics,
however simple and however scruffy.
Up to now there has been more emphasis on technology
push and not enough on application and service provider
www.drugdiscoverytoday.com 231
Drug Discovery Today: Technologies | Knowledge management Vol. 2, No. 3 2005
Links
� The Semantic Web portal http://www.semanticweb.org/
� Semantic Web for Life Sciences http://www.biopathways.org/
semweb/
� W3C Semantic Web Activity http://www.w3.org/2001/sw/
� Semantic Grid portal http://www.semanticgrid.org
� Global Grid Forum http://www.ggf.org
� W3C Semantic Web for Life Science Workshop http://
www.w3.org/2004/07/swls-ws.html
� Standards and Ontologies for Functional Genomics http://
www.sofg.org
� Open Biological Ontologies portal http://obo.sourceforge.net/
� RDF Resource Description Framework http://www.w3.org/RDF/
� RDF Schema http://www.w3.org/TR/rdf-schema
� OWL Web Ontology Language http://www.w3.org/2004/OWL/
� LSID Life Science Identifier http://lsid.sourceforge.net/
� Friend of a Friend, FOAF http://www.foaf-project.org/
pull. While waiting for the ‘killer application’, the machinery
for the platform has been built in a partial vacuum. Some of
this infrastructure is cumbersome and hard for application
developers to engage with – the layering of OWL over RDF
lacks elegance and the number of layers and components can
make debugging tricky. Service providers need migration
tools and application developers need client libraries and
simple APIs, and both need appropriate tooling. In an ideal
world, users should never see the Semantic Web at all.
The Web needed to be incubated by a highly motivated
community with an application and a generous spirit – in this
case, High Energy Physics. The Semantic Web needs the
nursery of another community that would benefit hugely
from its capabilities. It is not e-commerce. It should be drug
discovery.
Related articles
Hendler, J. (2003) Science and the Semantic Web. Science 299, 520–521
Berners-Lee, T. et al. (2001) The Semantic Web. Scientific American,
May
Neumann, E. (2005) A Life Science Semantic Web: are we there yet?
Sci. STKE 2005, 10 May
Outstanding issues
� Making it semantic: content acquisition and content management.
� Making it Web: tackling distribution, heterogeneity and
inconsistency of metadata, ontologies and rules.
� Making it understandable: lowering the barriers of entry for
developers and simplifying the complexity of the components and
language layers.
� Making it safe: security, trust, proof and policy management.
� Making it accessible: providing the tooling and machinery for
applications and content providers.
� Making it work: focus on performance and scalability, not just
language expressivity.
232 www.drugdiscoverytoday.com
References1 Hendler, J. (2003) Science and the Semantic Web. Science 299, 520–521
2 Neumann, E. (2005) A Life Science Semantic Web: are we there yet? Sci.
STKE 283, pe22
3 Berners-Lee, T. et al. (2001) The Semantic Web. Scientific American, May
4 Foster, I. and Kesselman, C. (1999) The Grid: Blueprint for a New Computing
Infrastructure. Morgan Kaufmann
5 Goble, C.A. et al. (2000) Enhancing services and applications with
knowledge and semantics. In The Grid: Blueprint for a New Computing
Infrastructure (2nd edn) (Foster, I. and Kesselman, C., eds), Morgan
Kaufman
6 Cannataro, M. and Talia, D. (2004) Semantic and Knowledge Grids:
building the next-generation grid. IEEE Intell. Syst. (ISSI-0095-1203) –
Special Issue on E-Science 19, 56–63
7 Goble, C. and De Roure, D. (2004) The Semantic Grid: building bridges and
busting myths. 16th European conference on Artificial Intelligence ECAI 2004
including Prestigious Applicants of Intelligent Systems, 22–27 August 2004,
Valencia, Spain, PAIS IOS Press (ISBN 1-58603-452-9)
8 Clark, T. et al. (2004) Globally distributed object identification for
biological knowledgebases. Brief. Bioinform. 5.1, 59–70 (http://
lsid.sourceforge.net/)
9 Stevens, R. et al. (2004) Exploring Williams–Beuren Syndrome using
myGrid. Bioinformatics 20, i303–i310. Proceedings of 12th Intelligent Systems
in Molecular Biology (ISMB), 31st July–4th August, Glasgow, UK (myGrid
http://www.mygrid.org.uk)
10 Quan, D. et al. (2003) Haystack: a platform for authoring end user semantic
web applications. Proceedings of the 2nd Intl. SemanticWeb Conference ISWC
2003, Sanibel, FL
11 Carroll, J.J. et al. (2004) Named graphs, provenance and trust. Proceedings of
the 14th International Conference on World Wide Web, Chiba, Japan. pp.
613–622
12 Stevens, R.D. (2000) Ontology-based knowledge representation for
bioinformatics. Brief. Bioinform. 1, 398–414
13 RDF Vocabulary Description Language 1.0: RDF Schema. W3C
Recommendation, 10th February 2004 (http://www.w3.org/TR/rdf-schema)
14 Horrocks, I. et al. (2003) From SHIQ and RDF to OWL: the making of a web
ontology language. J. Web Semant. Sci. Serv. AgentsWorldWideWeb 1, 7–26
15 Gardner, S.P. Ontologies. Drug Discov. Today Technol. (in press)
16 Horrocks, I. et al. (2005) OWL rules: a proposal and prototype
implementation. J. Web Semant. 3, 23–40
17 Lord, P. et al. (2004) Applying semantic web services to bioinformatics:
experiences gained, lessons learnt. Proceedings of the 3rd International
Semantic Web Conference – ISWC 2004, 9–11 November, Hiroshima, Japan
(Springer LNCS 3298)
18 Knublauch, H. et al. (2004) The Protege-OWL Plugin: an open
development environment for semantic web applications. Third
International SemanticWeb Conference – ISWC 2004, November, Hiroshima,
Japan (http://protege.stanford.edu/plugins/owl/)
19 Sabou, M. et al. (2005) Learning domain ontologies for web
service descriptions: an experiment in bioinformatics. Proceedings
of the 17th International Conference on World Wide Web (WWW2005),
May, Japan
20 Handschuh, S. and Staab, S. (2002) Authoring and annotation of web
pages in CREAM. Proceedings of the 11th International World Wide Web
Conference (WWW2002), 7–11 May, Honolulu, Hawaii, USA
21 Ghanem, M. et al. (2005) A grid infrastructure for mixed bioinformatics
data and text mining. Proceedings of the 3rd ACS/IEEE International
Conference on Computer Systems and Applications, January, Cairo, Egypt,
IEEE Computer Society
22 Ciravegna, F. et al. (2004) Learning to harvest information for the
Semantic Web. Proceedings of the 1st European SemanticWeb Symposium, 10–
12 May, Heraklion, Greece
23 Volz, R. et al. (2003) Unveiling the hidden bride: deep annotation for
mapping and migrating legacy data to the Semantic Web. J. Web Semant.
Sci. Serv. Agents World Wide Web 1, 187–206
24 Lund, B. (2004) Science Publishing and the Semantic Web: RSS, RDF and
Urchin, Position Paper W3C Semantic Web for Life Science Workshop
(http://www.w3.org/2004/07/swls-ws.html)
Vol. 2, No. 3 2005 Drug Discovery Today: Technologies | Knowledge management
25 Wroe, C.J. et al. (2003) A methodology to migrate the gene ontology to a
description logic environment using DAML+OIL. Proceedings of the 8th
Pacific Symposium on Biocomputing (PSB), January, Hawaii
26 Zhao, J. et al. (2004) Using Semantic Web technologies for representing e-
science provenance. Proceedings of the 3rd International Semantic Web
Conference ISWC2004, 9–11 November, Hiroshima, Japan (Springer LNCS
3298)
27 Frey, J. et al. (2004) Less is More: Lightweight Ontologies and User Interfaces for
Smart Labs. The UK e-Science All Hands Meeting 2004
28 Guha, R. and McCool, R. (2003) TAP: a semantic web test-bed. J. Web
Semant. Sci. Serv. Agents World Wide Web 1, 81–87
29 Domingue, J. and Motta, E. (2005) Planet-Onto: from news publishing to
integrated knowledge management support. IEEE Expert Syst. Special Issue
on Knowledge Management and Distribution over the Internet 15, 26–32
30 Maedche, A. et al. (2001) Semantic portal – the SEAL approach. In Creating
the Semantic Web (Fensel, D. et al. eds), MIT Press
31 Cheung, K.H. et al. (2005) YeastHub: a semantic web use case for
integrating data in the life sciences domain. Bioinformatics 21 (Suppl. 1),
i85–i96
32 Nixon, L. and Paslaru, E. State of the Art of Current Semantic Web Services
Initiatives, Deliverable 2.4.ID1 Knowledge Web Network of Excellence
(http://knowledgeweb.semanticweb.org)
33 Lord, P. et al. (2005) Feta: a light-weight architecture for user oriented
semantic service discovery. In European Semantic Web Conference, Lecture
Notes in Computer Science, (Vol. 3532) (Gomez-Perez, A. and Euzenat, J.,
eds) Springer-Verlag
34 Potter, S. and Aitken, S. (2005) A semantic service environment: a case
study in bioinformatics. In European Semantic Web Conference, Lecture
Notes in Computer Science, (Vol. 3532) (Gomez-Perez, A. and Euzenat, J.,
eds) Springer-Verlag
35 Cannataro, M. et al. (2004) Proteus, a grid based problem solving
environment for bioinformatics: architecture and experiments. IEEE
Comput. Intell. Bull. 3, 7–18 (ISSN 1727-5997)
36 Nee, E. (2005) Web Future is Not Semantic, Or Overly Orderly CIO Insight 5
May (http://www.cioinsight.com/article2/0,1540,1815338,00.asp)
37 McBride, B. (2004) Four Steps Towards the Widespread Adoption of the
Semantic Web. Proceedings of the 1st International Semantic Web Conference
ISWC2002, 9–12 June, Sardinia, Italy (Springer LNCS 2342)
www.drugdiscoverytoday.com 233