Searching the news
Using a rich ontology with time-bound roles to search through annotated newspaper archives
Wouter van Atteveldt1, Nel Ruigrok2, Stefan Schlobach1, and Frank van Harmelen1
1 Department of Artificial Intelligence, Free University Amsterdam
De Boelelaan 1071, 1071 HV Amsterdam
{wva,schlobac,Frank.van.Harmelen}@few.vu.nl
2 The Netherlands News Monitor, University of Amsterdam
Kloveniersburgwal 48, 1012 CX [email protected]
Abstract. A frequent motivation for annotating documents using ontologies is to allow more efficient search. For collections of newspaper articles, it is often difficult to find specific articles based on keywords or topics alone. This paper describes a system that uses a formalisation of the content of newspaper articles to answer complex queries. The data for this system is created using Relational Content Analysis, a method used in Communication Sciences in which documents are annotated using a rich annotation scheme based on an ontology that includes political roles with temporal validity. Using custom inferencing over the temporal relations and query translation, our system can be used to search for and browse through newspaper articles and to perform systematic analyses by evaluating queries against all articles in the corpus. This makes the system useful both for the (Social) Scientist and for interested laypersons.
1 Introduction
A number of services exist that offer keyword-based search of news content, such as Google News (footnote 3)
and LexisNexis (footnote 4). Such services have severe limitations, however. The first difficulty is the semantic
gap between keywords and meaning that is always present in keyword-based search. Another
limitation is that it is impossible to look for a relation, such as positive or negative, between
two concepts without specifying all of its possible lexical representations. For example, a keyword-
based query ‘Blair support EU’ will not return documents containing ‘Prime Minister praises new
Commission.’ A third limitation is that search is generally bounded by articles or sentences as the
only possible unit of search, meaning that one cannot search for ‘Two articles within one week in
which a politician’s stance on a topic has changed.’
3 http://news.google.com
4 http://www.lexisnexis.com
Queries such as the ones above require metadata not just about the topic and publishing details
of an article but also about the content of that article. Generating such metadata automatically is a
formidable challenge, but there are large corpora of articles manually annotated by Communication
and Political Scientists that can be used. This paper is based on an analysis of the news coverage
of the 2006 Dutch parliamentary election campaign, which was annotated manually using a rich
annotation schema, summarised as answering: “Who says what about whom/what according to
whom?” The concepts in this annotation are drawn from a detailed ontology of (political) actors
and issues, including time-dependent political function and party membership information.
The system presented here utilizes these annotations for semantically informed search through
the newspaper corpus, inspecting the results quantitatively and visually, and retrieving the original
articles the results are derived from. Based on a formalised version of the annotation of the news-
paper content, the system allows for very sophisticated queries. Moreover, since these annotations
are based on a rich background ontology, it is possible to ask general queries as well as very detailed
ones, bridging the gap between abstract concepts and concrete representation. Finally, we perform
automatic reasoning over the validity of the temporal political functions, making it possible to use
a political function such as a minister in the query, which yields answers from statements about
the various ministers during their respective time in office.
The primary users of this system are Social Scientists investigating communication processes
and effects. In the preparatory phase of such an investigation, a researcher can use the system to
form an understanding of the corpus and to formalise the concepts he or she is interested in. In the
analysis phase, the researcher can use the system to query the corpus using the formalised concepts
and export the results for statistical analysis. In the post-analysis phase, the researcher can use the
system to check the results, retrieve interesting articles for qualitative sense-making, and obtain
quotes and examples for writing about the results.
The system can also be valuable for users outside academia. The annotated material is often highly
relevant to society, covering election campaigns and high-profile issues such as the war in Iraq or
Middle East policy. Such material can be of interest to politicians, civil society groups, NGOs, and
citizens, and the system described in this paper can help them search through the corpus for general
trends or specific patterns.
The contribution of this paper is twofold. Firstly, using the annotations and the ontology sup-
plied by the Social Scientists, we are able to create and test a search system which showcases
the benefits of formalised metadata, such as querying at various levels of abstraction, searching
for potentially highly complex patterns, and reasoning about temporal roles. Secondly, the system
presented here makes the annotations created by the Social Scientists more accessible, making it
easier for the scientist to perform both quantitative and qualitative analysis, and providing other
users with a richer way of searching for news.
1.1 Knowledge Representation and Content Analysis
In this paper we present a system for searching for specific patterns in an annotated newspaper archive.
In our view, there is potential for synergy between Knowledge Representation, especially Semantic
Web techniques, and Content Analysis. Studying the complex relationship between politics,
the media, and the public requires very large data sets. In order to find general statistical patterns,
these data sets often need to span multiple countries and events. Since this data is expensive to
obtain, and often needs detailed knowledge of the subject language and society, it is important that
researchers are able to combine and share data sets. Additionally, since there are many competing
theories and methodologies for analysing news patterns and effects, these data sets should lend
themselves to multiple analyses rather than being specific to one study.
We believe that using Semantic Web techniques can help alleviate these problems. We propose
annotating as close to the text as possible, and using a formal ontology to aggregate the detailed
objects found in the text to the theoretical concepts needed for the analysis. This minimises the
amount of interpretation done by the annotators, and thus the potential for unreliable coding.
Additionally, this makes it possible to combine the analyses of different countries or time
periods in a single scheme, where it is possible to have a concept such as ‘opposition politician’
map to different objects depending on time and place. Moreover, since it is possible to combine
different aggregation schemes in one ontology, this allows for the same data to be used for different
analyses. Finally, it is possible to express interesting variables, such as political frames like ‘strategic
framing’ or ‘internal conflict’, as formal patterns, for example as a SeRQL query or OWL definition.
This makes the process of aggregating and analysing data more transparent, and makes it easier
for researchers to replicate and expand upon studies from other groups.
This is not a one-way street. Content Analysts have been annotating texts for decades,
and many methods have been developed to perform systematic annotation and evaluate annotation
quality (Holsti, 1969; Krippendorff, 2004). As manual annotation is a necessary part of the Semantic
Web vision, either for creating the metadata directly or for bootstrapping and evaluating Machine
Learning tools, these techniques can play an important role. Moreover, richly annotated corpora
such as the one used in this paper can be very useful data for building and testing systems to show
the usefulness of the Semantic Web techniques.
Previously, we have shown how to use hybrid logic to define concepts within annotated media
material (Van Atteveldt and Schlobach, 2005). This can be directly generalised to using OWL
to define such concepts, and also shows the limits of this approach because of the lack of variable
binding in OWL. In Van Atteveldt et al. (2007) we provide an in-depth discussion of the possibilities
and challenges in using RDF to formalise media data. Finally, in Van Atteveldt et al. (2006), we
propose a vocabulary and formalisation standard for converting content analysis data to RDF.
Structure of this paper. The following section will describe the corpus used in this paper and the
Relational Content Analysis that was used for the annotation. Section three will describe the way
the data were coded formally and give an overview of the ontology used for this encoding. The
fourth section will provide more information about the implemented system, and the fifth section
will discuss its usage and usefulness.
2 Domain
In this section we describe the annotated data corpus on which this study is based, a collection of
annotated newspapers and TV news items about the 2006 Dutch election campaign. After a brief
description of the campaign we will describe the method used for annotating the data and some
aspects of the corpus this resulted in.
2.1 Data Collection
The use case presented in this study is based on a content analysis of political news from August
14th (first party manifesto) until November 22nd (Election Day) in six daily newspapers and two
television news programs. Every newspaper article and every television news item
referring to a party, a politician, or an issue was included in the dataset, resulting in
5,707 newspaper articles and 134 news items from the television news programs.
The Dutch elections in the Media
Dutch parliamentary elections took place on 22 November 2006, with the Dutch voters causing a landslide in the political arena of The Hague. Instead of the expected titanic struggle between the leaders of the Christian Democrats (CDA) and the Social Democrats (PvdA), voters deserted established parties, especially the PvdA and the conservative Liberal Party (VVD), in favour of more outspoken parties on both the left and right wing of the political spectrum. As shown by Kleinnijenhuis et al. (2007), these short-term dynamics of voter preferences are highly influenced by preceding news coverage.
After the summer, the news coverage in the Netherlands was filled with the upcoming elections and the campaign preceding the vote. The much-anticipated duel between the incumbent Prime Minister Jan Peter Balkenende and the leader of the PvdA, Wouter Bos, failed to occur. On the contrary, in September the incumbent government presented the 2007 budget full of good news (“After the bitter comes the sweet”, De Telegraaf, 20 September 2006) and the CDA was increasingly presented as a successful party responsible for economic growth, while the PvdA went into a downward spiral with Bos presented as a ‘flip-flop.’ The other incumbent party, the VVD, could not profit from this success as it was weighed down by internal fights. After the internal party election between two candidates representing the liberal and conservative wings of the party, the party could not manage to form a unified front. In the shadow of this battle and the problems of the PvdA, the smaller parties seized the opportunity to present themselves as a more outspoken alternative: the right-wing Party for Freedom (PVV) presented itself as an alternative for the right-wing liberals, while the Socialist Party (SP) portrayed itself as a stronger and more outspoken alternative for the PvdA.
Fig. 1: A short overview of the Dutch 2006 elections
For the content analysis of the articles, annotators used the NET method (Network analysis of
Evaluative Texts; Van Cuilenburg et al., 1986). This method is a Relational Content Analysis method
and an elaboration and generalisation of the evaluative assertion analysis of Osgood et al. (1956). It is
based on the idea that the explicit or manifest content of a text can be depicted as a network
consisting of relations between actors and issues. Specifically, each sentence of a news item is
coded as a 〈subject, predicate, object〉 triple. The subject and object are predefined within an
extensive ontology (see next section). The predicate describes the connection between the subject
and the object and consists of a type and a quality. The type indicates the kind of relationship
between subject and object, such as ‘causal’, ‘action’, and ‘affinitive.’ Quality is a number that
indicates the strength and direction of the relationship between subject and object and ranges from -1
to 1. Moreover, some sentences in a newspaper are quoted or paraphrased. In such cases a source
argument can be added to the coded sentence. As an example, consider the newspaper excerpt
visible in the screenshot in Figure 2. The headline is coded as a negative relation between
the political blocs Left and Right. The first sentence of the lead specifies this as a fight between
the incumbent prime minister Balkenende (CDA) and the challenger Wouter Bos (PvdA). In the
next sentence, Balkenende states that Bos is scaring people, which is coded as Bos acting against
the Dutch citizens with Balkenende as source. The final sentence expresses two relations: according
to Bos, investing more money would be good for Health Care, and Bos wants to invest money
in Health Care, here coded as an affinity (issue position) relation between Bos and Health Care
Investments.
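To make the coding scheme concrete, the following minimal Python sketch records the four statements from the example above as data. The class name NetStatement, its field names, and the specific predicate types and quality values are illustrative assumptions; the text only states whether each relation is negative or positive and that quality lies between -1 and 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NetStatement:
    """One coded sentence: <subject, predicate, object> plus an optional source."""
    subject: str
    predicate_type: str            # e.g. 'causal', 'action', 'affinitive'
    quality: float                 # strength and direction, from -1 to 1
    obj: str
    source: Optional[str] = None   # set when the sentence is quoted or paraphrased

    def __post_init__(self) -> None:
        if not -1.0 <= self.quality <= 1.0:
            raise ValueError("quality must lie between -1 and 1")


# Illustrative coding of the excerpt; types and numeric values are assumed.
statements = [
    NetStatement("Left", "action", -1.0, "Right"),
    NetStatement("Bos", "action", -1.0, "Dutch citizens", source="Balkenende"),
    NetStatement("Health Care Investments", "causal", 1.0, "Health Care", source="Bos"),
    NetStatement("Bos", "affinitive", 1.0, "Health Care Investments"),
]

# Statements that carry a source argument (quoted or paraphrased sentences).
quoted = [s for s in statements if s.source is not None]
```

Keeping the source as an optional field mirrors the description above, in which a source argument is added only for quoted or paraphrased sentences.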
The annotators used the annotation program iNet (footnote 5), which was created specifically for this
research. As shown in Figure 2, it allows annotators to efficiently select subjects and predicates
from the ontology using auto-complete and stores the results immediately in an RDF repository as
described in the next section.
5 http://www.content-analysis.org/inet
Fig. 2: Example annotation in iNet
2.2 Corpus and Ontology
The corpus consists of newspaper articles and transcriptions of television news items. For the newspaper
articles, all newspaper articles published in five Dutch national daily newspapers that contained
one of the major issues or political actors were selected. For the television items, all news items in
two evening news programs that were deemed relevant by the annotators were included. Using the
method described above, the head and lead of each news article and the full television news items
were coded. This resulted in 26,186 statements from newspaper articles and 3,967 statements from
the television news programs.
As described, the subjects and objects in these statements are drawn from a fixed vocabulary.
Originally, this consisted of 1490 concepts (310 politicians, 224 other actors, and 956 issues) in
an informal mixed is-a, has-a and part-of hierarchy. Using this hierarchy presented a number of
difficulties. Because each concept could only have one parent, it was not possible to represent
the fact that a politician has a party membership as well as a political function. Since in the
hierarchy politicians were placed under their function, this made it difficult to do an analysis of
party relations regardless of function. A related problem is that different analyses require different
ways of organising issues, which is very difficult with a simple hierarchy. Finally, the fact that there
is no distinction between has-a, is-a, and part-of makes it difficult to do any inference other than
simple aggregation.
To alleviate these problems, this hierarchy was formalised into an ontology, separating classes
and instances and using type, subclass, and part-of relations where appropriate. The remainder of
this section will describe the two most interesting aspects of this ontology: the issue hierarchy and
the political actors.
The issue hierarchy. The issue hierarchy contains a large number of issues that occur in the news,
ranging from general issues such as ‘Security’ to highly concrete issues such as ‘Biometric Passports.’
In principle, this hierarchy works as an is-a hierarchy, that is, a ‘biometric passport’ is-a security
issue. However, since the more abstract issues also have to be available for annotation, and it is
undesirable to use classes as annotation objects, we decided to formalise the whole issue hierarchy
as instances in a thesaurus-like structure. As argued by van Assem et al. (2006), this is often a
good choice to formalise and combine different hierarchies, providing the needed ‘broader-term’
reasoning without committing to the semantics of subclasses and instances. Thus, we encoded the
hierarchy using a transitive and reflexive k06:subIssueOf predicate which is a subproperty of the
skos:broader term defined for SKOS (Miles and Brickley, 2005).
During the formalisation, we also merged two different categorisations of the issues. For the
election studies for which the vocabulary was initially developed, the categorisation was by political alignment:
leftist issues such as Social Security, rightist issues such as fiscal conservatism, pro-environment, etc.
Afterwards, the same vocabulary was adopted for the research of the Netherlands News Monitor,
which categorised the issues by governmental department: defense, social affairs, health, etc. As
a side benefit, by merging these vocabularies in one subissue ‘multitree’, data can be shared more
easily between these groups, and it made the categorisation of difficult issues clearer. An example
of the latter is a peacekeeping mission, which belongs to the Department of Defense but is seen as
a neo-leftist (ethically progressive) issue.
Keeping these issues as close to the text as possible and using the formal hierarchy for cate-
gorising is vital for reliable coding: if the coders have to decide whether peacekeeping is pro-defense
or neo-leftist, results will diverge and afterwards it is impossible to trace why coders made certain
decisions. As far as possible, coders should simply pick the issue closest to the occurrence in the
text without interpreting the position of that issue. In total, we have 956 issues categorised into a
double hierarchy with five layers, 15 root directions, and 20 root departmental topics.
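The ‘broader-term’ reasoning over this double hierarchy can be sketched in a few lines of Python. The example below computes the reflexive, transitive closure of a sub-issue relation for a single issue. The function name and the edge list are assumptions for illustration: the labels are not the actual k06 identifiers, and ‘Identity Documents’ is a hypothetical intermediate issue added only to show transitivity.

```python
from collections import defaultdict

# (narrower issue, broader issue) pairs; an issue may have several parents,
# one per categorisation, which is what makes the structure a multitree.
SUB_ISSUE_OF = [
    ("Biometric Passports", "Identity Documents"),   # hypothetical intermediate
    ("Identity Documents", "Security"),
    ("Peacekeeping Mission", "Defense"),              # departmental categorisation
    ("Peacekeeping Mission", "Neo-leftist"),          # political direction
]


def broader_issues(issue, edges):
    """Return the issue itself plus every issue reachable via sub-issue edges."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    seen = {issue}          # reflexive: an issue is a sub-issue of itself
    stack = [issue]
    while stack:            # transitive: follow parent links to the roots
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

With this closure, a statement coded on the concrete issue can be counted under either categorisation without the coder having chosen between them.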
Political Actors. The other area of interest is the political actors. For the political actors, the
naïve approach is to classify them as Representatives, Administrators, Senators, etc. Unfortunately,
politicians generally fulfill such functions only for a short period, and especially for longitudinal
studies it cannot be assumed that this hierarchy is static. Therefore, all politicians are simply
classified as ‘Person’, with the real semantics in the roles rather than the class membership: for each
politician we include their party membership and political functions using specialised memberOf
properties. Also, for the parliamentary parties it is recorded whether they are members of the
coalition or opposition block. These memberOf properties are all subproperties of the transitive
and reflexive k06:partOf relation to make it easy to query patterns such as a person being a member
of a part of the coalition block. In total, we have 310 political actors distributed over 16 parties
and 48 functions.
3 Representing Newspaper Content
As described above, the NET method provides a rich annotation by representing texts as networks
of statements between nodes, where the nodes are drawn from an ontology of actors and issues.
In order to effectively search through an archive of material annotated with this method, we have
formalised the NET data representation and the ontology in RDF(S). This section will briefly
describe RDF(S) and discuss some of the issues encountered in this formalisation.
Note that although this discussion focuses on the NET method, it is for a large part applicable
to other Relational Content Analysis methods as well. Van Atteveldt et al. (2006) survey the
representational requirements of a number of (Relational) Content Analysis methods, and the
formalisation presented here can be easily adapted for the other methods as well. That study also
introduces the amcat namespace (footnote 6; standing for Amsterdam Content Analysis Toolkit) that will be
used for general content analysis vocabulary, in addition to the net namespace (footnote 7), which contains
NET-specific vocabulary, and the k06 namespace (footnote 8), which contains the data from the 2006 election campaign.
6 http://www.content-analysis.org/vocabulary/amcat#
7 http://www.content-analysis.org/vocabulary/net#
8 http://www.content-analysis.org/vocabulary/ontologies/k06#
3.1 RDF: The Resource Description Framework
A central element of the Semantic Web is the Resource Description Framework (RDF). This is
a standard representation specified by the World Wide Web Consortium (W3C) for describing
documents and other resources on the Internet, creating an interconnected Semantic Web (Antoniou
and Van Harmelen, 2004). Using a graph as its data model and using XML syntax to describe
and exchange information, RDF allows data to be mixed, exported, and shared across different
applications. Using distinguished vocabulary within RDF, RDF Schema (RDFS) allows for the
definition of ontological relations such as class membership and subclass and subproperty hierarchies
(Brickley and Guha, 2004). Since the ‘data’ and ‘metadata’ are combined in one RDF graph, it is
easy to construct queries over the combined network of media data and background knowledge.
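As a rough illustration of querying data and background knowledge in one graph, the sketch below stores a handful of triples as Python tuples and evaluates a conjunctive triple pattern over them. It is a toy matcher written for this example and not how an RDF store evaluates queries; the ex: identifiers and the particular triples are hypothetical.

```python
# Media data, article metadata, and ontology triples share one graph.
GRAPH = [
    ("ex:annotation1", "net:subject", "k06:Bos"),
    ("ex:annotation1", "net:object", "ont:HealthCare"),
    ("ex:annotation1", "dc:subject", "ex:Article1"),
    ("ex:Article1", "dc:publisher", "ex:Telegraph"),
    ("ont:HealthCare", "amcat:subIssueOf", "k06:Health"),
]


def match(patterns, triples, binding=None):
    """Yield variable bindings (names starting with '?') satisfying all patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, candidate)


# Articles containing an annotation whose object falls under the topic Health.
QUERY = [
    ("?annotation", "net:object", "?issue"),
    ("?issue", "amcat:subIssueOf", "k06:Health"),
    ("?annotation", "dc:subject", "?article"),
]
results = list(match(QUERY, GRAPH))
```

The second pattern touches only ontology triples and the third only metadata, yet the query is a single join because all three kinds of information live in the same graph.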
3.2 Representing media data in RDF
[Figure 3: RDF graph in three panes labelled Metadata, Media Data, and Ontology. Nodes shown: Annotation, Article1, Telegraph, Hester, k06:Bos, k06:PvdA, k06:role-Bos-PvdA, ont:HealthCare, k06:Health, k06:socialist, k06:Politician, k06:Party, ont:Actor, net:Affinity, and net:ArrowType. Edge labels shown: dc:subject, dc:publisher, dc:date, dc:creator, net:subject, net:object, net:quality, amcat:roleSubject, amcat:roleFrom, k06:partyMemberOf, amcat:subIssueOf, k06:topic, k06:direction, rdf:type, and rdfs:subClassOf.]
Fig. 3: The data model used
Figure 3 exemplifies the overall formalisation of the NET method. In the center is the node
representing the annotation. To the left, this node is connected via the Dublin Core subject relation
to the sentence or article that it stems from. The metadata of the article, such as headline, date,
and publisher, is specified on the article using standard Dublin Core Vocabulary. Additionally, the
coder of the annotation is stored in the Dublin Core creator property.
In the middle pane, the quality, predicate, and arrow type of the annotation are specified
using properties. Most importantly, the subject and object of the annotation are given as links to
instances in the ontology, in this case PvdA-leader Bos and the issue Health Care. This issue is
connected via the subissue hierarchy to the topic ‘Health’ and the political direction ‘pro-welfare’.
On the bottom right-hand side, it can be seen that Bos has played the role of PvdA member since 1981.
This role is of type politician, which should be interpreted as meaning a person playing this role is
a politician. The role instance is also connected with the PvdA node, which is of type party. Both
party and politician are subclasses of actor.
Two aspects of this design that merit additional discussion are representing the NET statements
as their own resource, and the way in which temporal roles are represented. These two aspects will
be discussed in the remainder of this section.
3.3 Statements about Statements
Although it seems natural to represent the media data network extracted using the NET method
in a graph-based language, it is important to keep in mind that statements and graphs are not
first-class citizens in RDF: contrary to the subject, predicate, and object they consist of, statements
do not have URIs, and it is not possible to make statements about statements. In other words: RDF
is a language for describing resources using triples, not a language for describing triples. We are
not the first to signal this difficulty: MacGregor and Ko (2003) cite the need for enriching triples
to describe event data, and a number of authors want to use RDF for describing RDF documents,
for example for reasoning about provenance and trust (Carroll et al., 2005).
Within the RDFS specification, it is possible to use RDF Reification for making statements
about statements, but this does not have clear formal semantics and is not supported by most
tools. Another solution that does not require additions to the language is using the N-ary design
pattern described by Noy and Rector (2005). Unfortunately, there is no standard vocabulary for
indicating that something is an N-ary triple and what the status of the different places is, making
the resulting graph difficult to interpret by third party applications. Additional proposals exist in
the literature to deal with this problem. Two proposed solutions are adding a fourth place to a
statement, creating a ‘quad’ rather than a triple (MacGregor and Ko, 2003; Dumbill, 2003), or
creating Named Graphs, assigning a URI to a set of statements (Carroll et al., 2005).
As argued in Van Atteveldt et al. (2007), enriching triples is not as simple as adding a URI to
all statements: it is important to distinguish between transparent enrichment, where the original
statement is available, and opaque enrichment, where the original statement is not available to
agents that do not interpret the enrichment. It might seem that transparent enrichment is always
preferable, but quoted statements or temporally limited relations should not be visible outside their
scope. Moreover, some enrichments add a fourth place to the statement, some give more details
about the predicate (such as the quantification used in the NET method), and some enrichments
add information about the whole triple (such as the Dublin Core metadata about an article).
Current proposals either choose one of these cases, or do not provide for mechanisms to indicate
what the intended semantics of the new information are. Since the NET method uses both opaque
and transparent enrichments and has extra arguments, metadata, and predicate quantification, this
is a serious obstacle.
Given these considerations, it was decided to follow Noy and Rector (2005) and let a NET
annotation be represented by a node rather than an arrow, with net:subject, net:object, and
net:predicate edges to the corresponding objects. This means that the network semantics of the NET
network are lost to third parties, but it also means that we can add arbitrary information about
these annotations, such as quality, quoted source, and a link to the newspaper article the annotation
is part of. Note that this is equivalent to using the N-ary design pattern with a URI representing the
statement rather than a blank one. Also, this is structurally equivalent to using RDF reification:
our encoding can be turned into reification by having net:subject be a subproperty of rdf:subject
and likewise for the other relations.
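The node-based encoding and its relation to reification can be sketched as follows. The Python below builds the triples for one annotation node and then derives the corresponding rdf:subject, rdf:predicate, and rdf:object triples, as a subproperty declaration would entail. The function names, the ex: identifiers, the literal used for net:predicate, and the net:source property are assumptions made for this example; only net:subject, net:object, net:predicate, net:quality, and dc:subject are taken from the text.

```python
def annotation_triples(uri, subject, predicate, obj, quality, article, source=None):
    """Encode one NET statement as a node with outgoing property edges."""
    triples = [
        (uri, "net:subject", subject),
        (uri, "net:predicate", predicate),
        (uri, "net:object", obj),
        (uri, "net:quality", quality),
        (uri, "dc:subject", article),       # link to the article it stems from
    ]
    if source is not None:
        triples.append((uri, "net:source", source))   # hypothetical property name
    return triples


# Subproperty declarations that would turn the encoding into RDF reification.
SUBPROPERTY_OF = {
    "net:subject": "rdf:subject",
    "net:predicate": "rdf:predicate",
    "net:object": "rdf:object",
}


def as_reification(triples):
    """Derive the triples entailed by the subproperty declarations above."""
    return [(s, SUBPROPERTY_OF[p], o) for s, p, o in triples if p in SUBPROPERTY_OF]


node = annotation_triples(
    "ex:annotation1", "k06:Bos", "wants to invest in", "ont:HealthCare",
    1.0, "ex:Article1",
)
reified = as_reification(node)
```

Extra information such as quality, source, and article link stays attached to the annotation node and is unaffected by the mapping, which is the flexibility the node-based design is chosen for.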
3.4 Temporal Roles
There are a number of political roles and functions, such as being president, member of parliament,
or member of a party, that are only fulfilled by politicians for a certain time period. The functions
are social roles in the sense that they are anti-rigid, meaning that the existence of the politician
does not depend on his playing a role, and dynamic (cf. Masolo et al., 2004). As surveyed by
Steimann, treating such roles in frame-based models such as RDF is not trivial and has received
considerable attention in the literature (Steimann, 2000; Sowa, 1988, 2000; Guarino, 1992). The
simplest approach, treating a role as a predicate, does not allow for specifying temporal bounds or
other information for the reasons discussed above. Another possibility is making each occurrence
of a role being played a subproperty of the role property. As argued by Steimann (2000), this leads
to a number of complications and does not really solve the problem.
The approach chosen here is creating an adjunct instance representing one occurrence of the
role (Wong et al., 1997). An example of this is the node k06:role-Bos-PvdA in Figure 3. This
instance is linked to the role player (k06:Bos), and the statements that define the role (such as
the k06:partyMemberOf relation with k06:PvdA) are made with the role instance as subject. In our
system, we specify the From and To dates as properties of the role instance. This allows normal
reasoning to occur on the role instance, leaving the task of determining whether a specific person is
a member of a role for a certain annotation to the inference system (see Section 4.1). This is different
from the approach taken by Mika and Gangemi (2004), who use reification for representing roles,
and Gutierrez et al. (2005), who propose an extension to the RDF model that can be expressed
using a form of reification for reasoning with temporal validity. The reasons for taking a different
approach are that we want to work within existing tools, and want to keep querying as simple as
possible, which is enabled by treating the adjunct instances as a ‘normal’ object playing a role
rather than as a reification pointing to an object and a role.
4 Visualising and Searching Newspaper Content
The previous sections introduced the Relational Content Analysis methodology of modeling a text as
statements between actors and issues, and described a way of formalising these statements and the
background knowledge about the actors and issues. As argued by Sicilia (2006), it is important to
keep in mind that metadata is created to perform a specific function. The purpose the annotations
were originally created for was scientific analysis of newspaper content, and an important use of
the system lies in this function. The other purpose for formalising this knowledge is allowing the
user to search and browse through the collection more efficiently. Since the RDF graph is fairly
complex due to the requirements of making statements about the NET statements and of temporal
social roles, it is not sensible to expose the users directly to this representation for either of these
use cases. This section will describe how user queries are translated into SeRQL queries, and how
the results are visualised and presented to the user.
Since our implementation is based on the Sesame RDF storage and inference system (Broekstra,
2005), we translate user queries into SeRQL rather than the upcoming W3C recommendation
SPARQL (Prud’hommeaux and Seaborne, 2006). However, since SeRQL and SPARQL are rather
similar and we only use fairly straightforward constructs, the results presented here apply equally
to SPARQL.
4.1 Reasoning with roles
As described above, NET statements are represented by annotation resources, which have
net:subject and net:object relations to an actor or issue. Actors are often involved in social roles, such
as being president, which connect them to groups or functions for a specific period. As shown in
Figure 4, this is encoded by making an adjunct instance that represents the specific role, which is
connected to the role player, and about which the To and/or From dates are specified. This adjunct
instance is then treated as the role player playing that role, so the temporal background knowledge
is expressed about the adjunct instance. For example, in Figure 4, it is encoded that Bos has played
the role ‘being a PvdA member’ since 1981; the membership of the PvdA is expressed as a direct
characteristic of the role instance.
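[Figure 4: RDF graph. The role instance k06:role-Bos-PvdA has amcat:roleSubject k06:Bos, k06:memberOf k06:PvdA, and amcat:roleFrom 1981-01-01. The Annotation has dc:date 2006-11-01 and net:subject k06:Bos; a dotted net:subject arc links the Annotation to k06:role-Bos-PvdA.]
Fig. 4: Inference of a Role played by a Politician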
Querying this data model directly would lead to very complex queries, where the date of an
annotation has to be compared to dates of roles, with optional paths for missing To or From dates,
for each subject and object in the query. To make the querying easier, we perform a reasoning step
in the RDFS repository, using the rule shown below, which can be read as: If an annotation has
a ?rel relation to an object, and the object is the subject of a certain role-instance, and the role-
instance is valid at that moment in time, then conclude that the annotation has the ?rel relation
to the role-instance.
IF ?annotation dc:date ?ADate
AND ?annotation ?rel ?object
AND ?roleInstance amcat:roleSubject ?object
[AND ?roleInstance amcat:roleFrom ?FDate ]
[AND ?roleInstance amcat:roleTo ?TDate ]
AND (?FDate is null OR ?FDate <= ?ADate)
AND (?TDate is null OR ?TDate >= ?ADate)
THEN ?annotation ?rel ?roleInstance
For example, in Figure 4 the reasoner would conclude the dotted net:subject arc from the An-
notation to the role-Bos-PvdA instance, since there is no To date specified for the role, and the
date of the annotated article is after the From date. This means that we can simply query for an
annotation with a subject who is a member of the PvdA, without knowing that the original subject
Bos has not always been a PvdA member.
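The rule can be illustrated with a minimal Python sketch that operates on in-memory records rather than on a Sesame repository. The Role and Annotation types, the function names, and the URI ex:annotation1 are assumptions made for this illustration; the dates and the other URIs are those of Figure 4.

```python
from datetime import date
from typing import NamedTuple, Optional


class Role(NamedTuple):
    instance: str           # adjunct instance, e.g. "k06:role-Bos-PvdA"
    subject: str            # role player (amcat:roleSubject)
    start: Optional[date]   # amcat:roleFrom, None if not specified
    end: Optional[date]     # amcat:roleTo, None if not specified


class Annotation(NamedTuple):
    uri: str
    when: date              # dc:date of the annotated article
    rel: str                # e.g. "net:subject" or "net:object"
    obj: str                # the actor or issue the relation points to


def valid_at(role, when):
    """A missing From or To date leaves that side of the interval open."""
    return ((role.start is None or role.start <= when)
            and (role.end is None or role.end >= when))


def infer_role_links(annotations, roles):
    """Derive (annotation, rel, roleInstance) for every role valid at the annotation date."""
    inferred = []
    for ann in annotations:
        for role in roles:
            if role.subject == ann.obj and valid_at(role, ann.when):
                inferred.append((ann.uri, ann.rel, role.instance))
    return inferred


# The situation of Figure 4 (the annotation URI is hypothetical).
roles = [Role("k06:role-Bos-PvdA", "k06:Bos", date(1981, 1, 1), None)]
annotations = [Annotation("ex:annotation1", date(2006, 11, 1), "net:subject", "k06:Bos")]
links = infer_role_links(annotations, roles)
```

Here links contains the single inferred triple from the annotation to k06:role-Bos-PvdA; an annotation dated before 1981 would yield no inferred triple.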
This rule was implemented as a custom inference layer on top of the normal RDFS engine in our
Sesame repository. Sesame computes the closure by exhaustively applying all rules using forward
chaining, deriving all relations between annotations and role instances that were valid on that
date. Since these inferred triples are available to the RDFS inferencer, normal reasoning such as
subclass inference can be used to draw more conclusions from the adjunct instance. For example,
if the role instance is of type Representative, and Representative is a subclass of Politician, we can also
conclude that the subject of our annotation is a politician.9
9 Note that we cannot assert that Bos himself is a politician, since politicians can become societal actors and the other way around.
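As a rough illustration of forward chaining to a fixpoint, the sketch below applies only two RDFS-style rules (type propagation along rdfs:subClassOf and transitivity of rdfs:subClassOf) to a set of triples. It is not the Sesame inferencer; the function name and the URIs ont:Representative, ont:Politician, and k06:role-Bos-Representative are hypothetical.

```python
def forward_chain(triples):
    """Apply the two rules repeatedly until no new triple can be derived."""
    closed = set(triples)
    while True:
        subclass_pairs = [(s, o) for (s, p, o) in closed if p == "rdfs:subClassOf"]
        derived = set()
        for (s, p, o) in closed:
            for (sub, sup) in subclass_pairs:
                if o != sub:
                    continue
                if p == "rdf:type":
                    # instance of a subclass is an instance of the superclass
                    derived.add((s, "rdf:type", sup))
                elif p == "rdfs:subClassOf":
                    # subclass relation is transitive
                    derived.add((s, "rdfs:subClassOf", sup))
        if derived <= closed:
            return closed
        closed |= derived


# Hypothetical triples: one inferred annotation link plus background knowledge.
triples = {
    ("ex:annotation1", "net:subject", "k06:role-Bos-Representative"),
    ("k06:role-Bos-Representative", "rdf:type", "ont:Representative"),
    ("ont:Representative", "rdfs:subClassOf", "ont:Politician"),
}
closure = forward_chain(triples)
```

After the closure is computed, the role instance is also typed as ont:Politician, so a query for annotations whose subject is a politician matches ex:annotation1 without any statement being made about Bos himself.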
4.2 Querying the news
We want to allow users to specify patterns in terms of the relational NET network, i.e. the network of
statements between actors or issues. Since the RDF data is formalised at a lower level of abstraction,
it is necessary to translate the user queries to SeRQL queries performed on the underlying data set.
These actors and issues can be specified either as one specific instance, or in terms of the background
knowledge. For example, a user might want to query for “A PvdA-member of parliament disagreeing
with a Defense plan proposed by a Minister.” Such queries must be translated to SeRQL queries
that are evaluated on the RDF repository.
In our system, we assume that a user specifies two types of information: patterns of relations,
and constraints on the nodes in these relations. Canonically, this consists of triples of the form
(Node, +/-, Node) and (Variable, Relation, Value), respectively. For example, the question above
can be expressed by the user as the query given below:10
?Mem - ?Iss
?Min + ?Iss
?Mem ont:memberOf ont:PvdA
?Min rdf:type ont:minister
?Iss ont:subIssueOf ont:Defense
The first line specifies that we are looking for an annotation where a concept ?Mem is negative about
another concept ?Iss. The second line specifies that ?Min has to be positive about the same issue.
The third line constrains ?Mem to be a member of the PvdA party, while the fourth line constrains
?Min to be of type Minister. The final line constrains the issue to be a subissue of Defense.
These queries are translated into SeRQL as follows: a relation triple is translated into
“{Ann1} net:subject {Su}; net:predicate {Pr}; net:object {Obj}”. If an element of a triple is a URI, it is
inserted literally instead of the placeholder; for variables, the variable name is inserted. For node constraints,
the triple is translated into “{Var} Rel {Val}”, where Var is the variable name, and Rel and Val are
the specified relation and value. Since the constraints and relations use the same name for the
shared variables, these translated statements can simply be conjoined to form the FROM part of
the SeRQL query. Some very useful but less trivial relations, such as ‘maximal part-of’, needed custom
clauses in the WHERE part of the query. Depending on whether the user wants these patterns
to occur in a single sentence, article, week, or other unit of analysis, an additional link is made
from all annotations in the query to the article identifier, date, or week number. Finally, because
we want to return the natural persons rather than the role instances, a link is made using the
(reflexive) roleSubject predicate, “{Var1} amcat:roleSubject {V1}”, and a condition is included in
the WHERE clause to specify that V1 may not be a role instance: “WHERE NOT EXISTS V1
rdf:type amcat:roleInstance”.
10 This notation can be easily enhanced, for example by allowing chaining of triples syntax (eg ?Mem - ?Iss; + ?Iss) or direct specification of properties on the node (eg {.ont:memberOf.Pvda} - ?Iss). The translation of such constructions to the triple form is trivial and does not affect the discussion in this paper. We also included an option to specify the label of a concept rather than the URI, where the concept with the best matching label is picked automatically.
A full translation of the user query given above is:
SELECT DISTINCT ArticleID, Ann1, Ann2, Mem, Min, Iss FROM
  {Ann1} net:subject {VAR_Mem};
         net:quality {QUA_1};
         net:object {VAR_Iss};
         dc:subject {} dc:identifier {ArticleID},
  {Ann2} net:subject {VAR_Min};
         net:quality {QUA_2};
         net:object {VAR_Iss};
         dc:subject {} dc:identifier {ArticleID},
  {VAR_Min} amcat:roleSubject {Min};
            rdf:type {ont:minister},
  {VAR_Iss} amcat:roleSubject {Iss};
            amcat:partOf {ont:defensie},
  {VAR_Mem} amcat:roleSubject {Mem};
            ont:memberOfParty {ont:pvda}
WHERE (QUA_1 < "0"^^xsd:float)
  AND (QUA_2 > "0"^^xsd:float)
  AND (NOT EXISTS (
    SELECT * FROM {Iss} rdf:type {amcat:roleInstance}))
  AND (NOT EXISTS (
    SELECT * FROM {Min} rdf:type {amcat:roleInstance}))
  AND (NOT EXISTS (
    SELECT * FROM {Mem} rdf:type {amcat:roleInstance}))
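The core of this translation can be sketched in a few lines of Python. This is a simplified illustration, not the code of our system: it covers only relation triples with a + or - sign and plain node constraints, and it leaves out the roleSubject link, the unit-of-analysis link, and custom relations such as ‘maximal part-of’. The function names and the choice to select only the annotation variables are assumptions of the sketch.

```python
def node(term):
    """Variables ('?Mem') become SeRQL variable names; URIs are inserted literally."""
    return term[1:] if term.startswith("?") else term


def translate(relations, constraints):
    """relations: (subject, '+' or '-', object); constraints: (variable, relation, value)."""
    from_parts, where_parts, annotations = [], [], []
    for i, (subj, sign, obj) in enumerate(relations, start=1):
        annotations.append(f"Ann{i}")
        from_parts.append(
            f"{{Ann{i}}} net:subject {{{node(subj)}}}; "
            f"net:quality {{QUA_{i}}}; net:object {{{node(obj)}}}"
        )
        operator = {"+": ">", "-": "<"}[sign]  # other signs are not handled here
        where_parts.append(f'(QUA_{i} {operator} "0"^^xsd:float)')
    for var, rel, value in constraints:
        from_parts.append(f"{{{node(var)}}} {rel} {{{node(value)}}}")
    return (
        "SELECT DISTINCT " + ", ".join(annotations) + " FROM\n"
        + ",\n".join(from_parts)
        + "\nWHERE " + " AND ".join(where_parts)
    )


query = translate(
    relations=[("?Mem", "-", "?Iss"), ("?Min", "+", "?Iss")],
    constraints=[
        ("?Mem", "ont:memberOf", "ont:PvdA"),
        ("?Min", "rdf:type", "ont:minister"),
        ("?Iss", "ont:subIssueOf", "ont:Defense"),
    ],
)
```

Because relation triples and constraints share variable names, the translated fragments can simply be joined with commas; the shared name Iss is what ties the two annotations to the same issue.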
4.3 Visualising the network
The result of this query can be used in different ways. This section will discuss these scenarios, and
show how they are supported by the system.
Searching and Browsing The simplest scenario is perhaps that of the non-academic user. The
user enters a query, such as “An article in which prime minister Balkenende is positive about Social
Security”, which translates into [Balkenende + ?I | ?I partof SocialSecurity], relying on the label
lookup to resolve the two concepts. The output of this query is shown in Figure 5(a). Each article
is listed with metadata and the number of hits, and two buttons on the left allow the user to inspect
the statements that were identified as hits, and visualise the article. This second button shows the
text of the article and the matched statements, as well as a visualisation of the full network of
the article, with the matched statements highlighted, as shown in Figure 5(b). This visualisation
offers the option to display the raw statements or aggregate by party line or political function, as
discussed above.
Fig. 5: Searching for Balkenende on Social Security. (a) List of matched articles; (b) Visualisation of the first article
Analyzing The Social Scientist gives rise to a more complex scenario. He or she has a specific
pattern in mind, and needs to know how often each possible instantiation of the pattern occurred
for each unit of analysis. For example, a researcher might be interested in the relation between
Parliament and Government. This translates to the query shown in the query window in the top part of
Figure 6(a). This query looks for a relation between an MP, who is a member of the Parliament
and of a Party, and a Minister, who is part of something that is-a Ministry.
The results shown in Figure 6(a) give a very broad overview of a specific aspect of the campaign.
Often, the researcher also wants to know which party and ministry were involved in the relation.
For this, the user can enter a number of variables into the ‘select’ box, which causes the system to
output results for each combination of values in the selected variables, as shown in Figure 6(b).
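The effect of the ‘select’ box can be thought of as a group-and-count over the query results. The following sketch assumes, purely for illustration, that each result is a dictionary with one key per variable; the keys month, party, and ministry and the example rows are hypothetical and not taken from the corpus.

```python
from collections import Counter


def count_instantiations(rows, unit, select):
    """Count results per unit of analysis and per combination of the selected variables."""
    counts = Counter()
    for row in rows:
        key = (row[unit],) + tuple(row[variable] for variable in select)
        counts[key] += 1
    return counts


# Hypothetical query results, one dictionary per matched pattern.
rows = [
    {"month": "2006-08", "party": "PvdA", "ministry": "Health"},
    {"month": "2006-08", "party": "PvdA", "ministry": "Health"},
    {"month": "2006-08", "party": "VVD", "ministry": "Economy"},
    {"month": "2006-09", "party": "PvdA", "ministry": "Health"},
]

per_month = count_instantiations(rows, unit="month", select=[])
per_combination = count_instantiations(rows, unit="month", select=["party", "ministry"])
```

With an empty selection the counts correspond to a broad overview per unit of analysis; adding variables to the selection splits each count into one row per combination of values.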
Zooming in Additionally, the researcher wants to be able to check and give meaning to results
by going back to the articles. This is supported by the buttons next to each row, which allow the
researcher to ask for the list of articles or statements that are the raw data of the aggregate result.
Clicking on the ‘show articles’ button next to the August 2006 results produces the list in Figure 6(c).
This list gives the source and headline of all articles matching the pattern in August, and provides
a final option to show the actual article as described for the first scenario, allowing the researcher
to check the pattern and/or collect quotes and headlines for use in the report.
Fig. 6: Analyzing the relationship between Government and Parliament. (a) Aggregate results; (b) Frequency per instantiation; (c) List of articles; (d) Visualising the instantiations
Visualising Finally, since the underlying data is a network, it is often helpful to look at a graph
visualisation of the results. The researcher can enter a triple of variables into the ‘visualise’ box,
and the system then creates a graph consisting of the connections between these variables. Figure
6(d) shows a visualisation of the Party and Ministry variables, showcasing the fact that this makes
it possible to visualise at an arbitrary level of aggregation.11
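A graph of this kind can be written out as GraphViz dot code with very little effort. The sketch below is one possible way to do so and is not the generator used by the system; the function name, the label format, and the example edge values are hypothetical.

```python
def to_dot(edges):
    """edges maps (source, target) to (number of statements, average direction)."""
    lines = ["digraph results {"]
    for (source, target), (count, average) in sorted(edges.items()):
        lines.append(f'  "{source}" -> "{target}" [label="{average:+.2f} ({count})"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical aggregate results for the Party and Ministry variables.
edges = {
    ("VVD", "Min Economy"): (3, -0.5),
    ("PvdA", "Min Education"): (1, 1.0),
}
dot_code = to_dot(edges)
```

The resulting text can be saved to a file and rendered with the dot command-line tool. Since the edges are keyed on whatever pair of variables the user chose, the same function serves any level of aggregation.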
5 Using the system: Parties in the news
This section will show how the system described above can be used for systematic analysis in a
real-world setting. Earlier election studies reveal a number of aspects of newspaper coverage that
can be shown to have a short-term effect on voter behaviour (Kleinnijenhuis et al., 2007, 2003,
1998). In the remainder of this section, we will perform a selection of the analyses from these
studies, showing that the system can give useful results for actual queries. In particular, we will
consider news on issue positions, conflict between parties, conflict within parties, and news about
a minister who was forced to step down weeks before the elections.
5.1 Issue ownership
Research about the role of media during political (election) campaigns shows that different forms of
news coverage have an important influence on voters’ decisions (Kleinnijenhuis et al., 2007). First
of all, parties try to generate as much media attention as possible, preferably in association with
issues they ‘own’. According to the issue-ownership theory, parties that gain the most attention in
the media for their own issues will get the most votes. An issue is ‘owned’ by a certain party when
the general public perceives that party to be the best party to handle it. Most issues are linked to
parties due to historical cleavages like class and religion. For example, an issue like social security
reflects the interests of the lower classes and is owned by social-democratic parties. Looking at the
issues parties are associated with, we assume that parties “owning” a certain issue will be more
often related to that issue in the news than other parties.
11 Due to the low quality of graph screenshots, the image included here is actually a postscript rendering of the GraphViz dot code (http://www.graphviz.org) generated by the system.
In order to investigate the amount of attention given to the different issue positions of the
parties, we used the following query, using the maxpartof predicate to retrieve the issue highest in
the hierarchy as described in section 4.2:
?X = ?I
?X partof ?Party
?Party isa Party
?I maxpartof ?Issue
?Issue isa Issue
The results of this query are shown in Table 1, together with the overall attention for that
party (obtained using a simple query [?X = ?Y, ?X partof ?P, ?P isa Party]). The bold cells in
this table represent the issues that the parties are connected with according to a survey held at
the beginning of the campaign period.
Table 1: Parties and Issues
                            Issue statements
Party  Overall attention    Social Security  Values  Financial  Immigration
SP                   5.2                5.9     6.8        3.7          0.6
PvdA                21.8               32.5    14.1       19.0         16.6
CDA                 39.9               33.1    50.5       44.0         32.1
VVD                 31.1               28.0    23.3       32.7         42.8
PVV                  2.0                0.4     5.3        0.6          7.9
The table shows the validity of the issue-ownership theory during this election. All parties have
higher scores on the issues they are related to in the popular mind than their overall attention.
Especially the CDA could profit from the issue-ownership. Apart from ‘their’ issue about norms
and values they profit from the news about the financial situation. As stated above, they were able
to claim the success of a growing economy, supplying numerous new jobs and prosperity, while the
traditional ‘issue owner’, the VVD, was busy fighting an internal battle between the leader and
number two. On the other traditional conservative issue, immigration, the new right wing party
PVV managed to compete with the VVD, receiving four times their average attention for their
position on this issue. Interesting in this respect is that the Socialist Party (SP) did not manage to
generate a lot of attention for their position on Social Security, even though respondents indicated
that they associated them with that issue together with the traditional issue owner, the PvdA.
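The comparison between issue attention and overall attention can be made explicit by dividing the two. The short sketch below does this for the figures in Table 1; the function name is an assumption, and the numbers are copied from the table.

```python
# Overall attention and issue statements per party, as listed in Table 1.
overall = {"SP": 5.2, "PvdA": 21.8, "CDA": 39.9, "VVD": 31.1, "PVV": 2.0}
issues = {
    "Social Security": {"SP": 5.9, "PvdA": 32.5, "CDA": 33.1, "VVD": 28.0, "PVV": 0.4},
    "Values": {"SP": 6.8, "PvdA": 14.1, "CDA": 50.5, "VVD": 23.3, "PVV": 5.3},
    "Financial": {"SP": 3.7, "PvdA": 19.0, "CDA": 44.0, "VVD": 32.7, "PVV": 0.6},
    "Immigration": {"SP": 0.6, "PvdA": 16.6, "CDA": 32.1, "VVD": 42.8, "PVV": 7.9},
}


def attention_ratio(issue, party):
    """Values above 1 mean the party is more visible on this issue than overall."""
    return issues[issue][party] / overall[party]


for issue, party in [("Immigration", "PVV"), ("Values", "CDA"), ("Social Security", "SP")]:
    print(f"{party} on {issue}: {attention_ratio(issue, party):.2f}")
```

For the PVV on immigration the ratio is 7.9 / 2.0, or just under four, which is the ‘four times their average attention’ mentioned above. For the SP on Social Security it is only slightly above one.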
5.2 Parties and conflict
Besides news about issues, conflict news is an important factor in the success or failure of parties
during elections. Conflict can work in two ways, as an opportunity and as a threat. Whereas criticism
of opponents provides the party and its members an opportunity to create a distinct profile for
themselves, internal criticism can be fatal for a party. Another threat for a party is controversial
members receiving too much criticism, not only from the direct opponents but also from societal
actors. First we consider the conflict between the main parties during the election campaign. The
following query shows us the conflict among the different parties as well as the internal atmosphere.
!X - !Y
!X partof ?A
!Y partof ?B
?A isa party
?B isa party
Fig. 7: Conflict between parties (network of PvdA, VVD, CDA, PVV, and SP; each arrow is labelled with the average direction and the number of statements)
Using the visualisation described in the previous section, the results of this query are presented
in Figure 7. The arrows show the direction of the statements from one party to
another or towards itself, the thickness of the arrows represents the number of statements of
praise and criticism, and the color represents the average direction of these statements, ranging
from -1 (maximum disapproval, shown as dark red on the screen) to +1 (maximum approval, shown
as dark green on the screen). The figure shows the heavy battle between the PvdA and the
CDA, which criticise each other most often (PvdA-CDA 270 times and CDA-PvdA 233 times) and
very harshly (-0.7) in the news coverage. Apart from these main opponents, parties that are each
other’s rivals for votes on the far ends of the political spectrum seem to leave each other in peace.
On the left, the SP and the PvdA do not fight much in the news, and on the right wing, the
VVD and the PVV do not criticise each other too harshly. Most striking in this figure, however, is
the enormous amount of criticism within the VVD. We will turn to this aspect in more detail in
the next section.
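The two quantities shown on each arrow, the number of statements and their average direction, can be computed from individual statements as in the sketch below. The function names, the hexadecimal color values standing in for dark red and dark green, and the example statements are assumptions made for this illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_edges(statements):
    """statements: (source party, target party, quality), with quality between -1 and +1."""
    grouped = defaultdict(list)
    for source, target, quality in statements:
        grouped[(source, target)].append(quality)
    return {pair: (len(values), mean(values)) for pair, values in grouped.items()}


def edge_color(average):
    """Dark red for net disapproval, dark green for net approval, gray when neutral."""
    if average < 0:
        return "#8b0000"
    if average > 0:
        return "#006400"
    return "#808080"


# Hypothetical statements; a party can also address itself (internal praise or criticism).
statements = [
    ("PvdA", "CDA", -1.0),
    ("PvdA", "CDA", -0.5),
    ("SP", "SP", 1.0),
]
edges = aggregate_edges(statements)
```

The count can then be mapped to the thickness of the arrow and the average to its color, which separates how much is said from how favourable it is.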
5.3 Internal conflict
As research showed earlier, internal conflict can be fatal for political parties. A party with closed
ranks is regarded more stable and able to manage the country than a party in which party members
are fighting for power. In this section we will look at the internal praise and conflict in the parties.
We used the following query:
!X - !Y
!X partof ?A
!Y partof ?A
?A isa party
Fig. 8: Internal criticism and praise for the major parties (bar chart of Criticism and Praise for CDA, PVV, PvdA, SP, and VVD)
The results of this query, imported into Excel to create a graph, are shown in Figure 8. The
extensive news coverage about the internal conflict within the VVD is clearly seen. Although the
party members tried to praise each other as often as possible this kind of coverage lags far behind
the coverage focusing on the internal fights. The same picture, in somewhat milder form, is seen
for the PvdA, where internal criticism of the leadership exceeds the internal praise in the news.
The opposite tendency is seen for the SP and the right-wing PVV.
5.4 Same place, different guy
If we investigate the source of the criticism of the CDA, we come across the interesting case of
Minister of Justice Donner: after a critical report about a fire in a prison facility at Amsterdam
Airport Schiphol, he and a colleague stepped down as minister, and he was replaced by the veteran
Hirsch Ballin. We would like to know how ‘the minister of justice’ was evaluated before and after
this event.
Fig. 9: Criticism of consecutive Ministers of Justice: Donner and Hirsch Ballin (time is expressed as year and week number, from 200633 to 200645)
Due to the dynamic nature of roles in our system, one query sufficed to obtain the results
we wanted. In Figure 9 we see the average value of the criticism/praise for the Minister
of Justice, ranging from -1 (most negative) to +1 (most positive). The most interesting result
is that until week 39 Donner is returned by the query, while after that the query returns Hirsch
Ballin. This shows that the roles in the system work as expected. Substantively, we see that in the weeks
before resigning, the Minister was highly criticised. Only after he announced his resignation did the news
coverage become positive. With the new minister, the news coverage stayed positive and the controversy
was defused.
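The way a single role query yields different persons over time, and the year-and-week notation used on the time axis, can be mimicked with the sketch below. The role boundary dates are illustrative placeholders chosen so that the changeover falls after week 39; they are not the actual dates of office, and the function names are hypothetical.

```python
from datetime import date


def year_week(day):
    """Year and ISO week number combined into one integer, e.g. 200633."""
    iso = day.isocalendar()
    return iso[0] * 100 + iso[1]


def role_holder(roles, day):
    """Return the person whose role interval contains the given day, or None."""
    for person, start, end in roles:
        if (start is None or start <= day) and (end is None or end >= day):
            return person
    return None


# Illustrative intervals for the role 'Minister of Justice' (placeholder dates).
minister_of_justice = [
    ("Donner", None, date(2006, 10, 1)),
    ("Hirsch Ballin", date(2006, 10, 2), None),
]

for day in [date(2006, 8, 14), date(2006, 9, 27), date(2006, 10, 10)]:
    print(year_week(day), role_holder(minister_of_justice, day))
```

Under these placeholder dates, articles from weeks 200633 and 200639 resolve to Donner and an article from week 200641 resolves to Hirsch Ballin, although the query itself only mentions the role.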
For political communication research such as the example above, the system presented in this paper has
proven to be extremely useful. Not only does a simple query often suffice to obtain the results over
a large corpus, the queries are easy to adapt in order to sharpen the research question. Moreover,
the direct connection with the articles the results stem from is very useful. This way the researcher
can check if the query really searches for the pattern the researcher is interested in, and it also
helps to sharpen the research question and find quotes from the article to be used in writing the
report.
6 Conclusion
Searching through and analyzing news archives is difficult with keyword-based methods. In this
paper, we presented a system that utilizes detailed annotations of the content of articles acquired
by Relational Content Analysis, a manual annotation technique developed in the Social Sciences.
This technique represents an article as a network of statements between actors and issues, drawing
on a fixed vocabulary of political and societal actors and various issues.
To improve the potential for sharing and combining data, we formalised this vocabulary as
an RDFS ontology. For the issues, we created a SKOS-like thesaurus with multiple categorisation
trees. For the (political) actors, we created a number of political roles with temporal validity to
represent the fact that politicians often switch functions.
The system presented here performs custom RDF inferencing to connect time-stamped newspaper
articles with instances representing a politician playing a certain role that was valid at that
time stamp. To shield users from this complexity, they can query the resulting graph using a simple
query language, which is translated into SeRQL and executed against the repository. The results
of a query can be listed in a table on various levels of aggregation, allowing a user to zoom in
on interesting results and refine his or her query. The system can also visualise the results as a
network, either on its own or as highlighted edges in the total network representing the article.
By performing a number of analyses that have proven useful for predicting voter behaviour in
earlier studies from Communication Science, we make it plausible that this system is useful for
actual analyses on real world data. The system simplifies the analysis of annotated content by
allowing easy definition and evaluation of complex patterns and giving the possibility of zooming
in on the texts that are the raw data of an aggregate result. This allows a researcher to define,
measure, visualise, check, and refine his or her definitions within one tool.
Limitations and Future Work In our perspective, the main limitation of this work lies in the
complexity of the representation. Even with our simplified query language, formulating queries is
not trivial and the user needs a good understanding of the ontology to define sensible queries.
This can be alleviated by adding shortcuts to the query language and by improving on the query
interface, for example using a ‘menu’ of standard queries and a way to interact with the ontology
to find concepts and relations.
Another limitation is the need for creating and maintaining a complex shared ontology by
multiple non-expert users. Although tool support exists for ontology editing and versioning, at
the moment there are no tools that support versioning and modularisation, are flexible enough to
handle the temporal roles needed in this domain, and simple enough for a user not from knowledge
management or computer science. Given the fact that the ontology needs to be adapted as new
events unfold, this effectively means that a Knowledge Management expert has to be part of a
team using this system.
Acknowledgements The authors would like to thank Mark van Assem and Laura Hollink for
insightful comments on the first version of this article and during discussions on this topic.
Bibliography
Antoniou, G. and Van Harmelen, F. (2004). A Semantic Web Primer. MIT Press, Cambridge, Ma.
Brickley, D. and Guha, R. (2004). RDF vocabulary description language 1.0: RDF schema. W3C Recommendation (http://www.w3.org/TR/rdf-schema/).
Broekstra, J. (2005). Storage, Querying and Inferencing for Semantic Web Languages. PhD thesis, Free University Amsterdam.
Carroll, J., Bizer, C., Hayes, P., and Stickler, P. (2005). Named graphs, provenance and trust. In Proceedings of the Fourteenth International World Wide Web Conference (WWW2005), Chiba, Japan, volume 14, pages 613–622.
Dumbill, E. (2003). Tracking provenance of RDF data. Technical report, ISO/IEC.
Guarino, N. (1992). Concepts, attributes and arbitrary relations: Some linguistic and ontological criteria for structuring knowledge bases. Data and Knowledge Engineering, 8:249–261.
Gutierrez, C., Hurtado, C., and Vaisman, A. (2005). Temporal RDF. In ESWC 2005, number 3532 in LNCS, pages 93–107, Berlin. Springer.
Holsti, O. (1969). Content Analysis for the Social Sciences and Humanities. Addison-Wesley, Reading MA.
Kleinnijenhuis, J., Oegema, D., de Ridder, J., and Ruigrok, P. (1998). Paarse Polarisatie: De slag om de kiezer in de media. Samson, Alphen a/d Rijn.
Kleinnijenhuis, J., Oegema, D., de Ridder, J., van Hoof, A., and Vliegenthart, R. (2003). De puinhopen in het nieuws, volume 22 of Communicatie Dossier. Kluwer, Alphen aan de Rijn (Netherlands).
Kleinnijenhuis, J., Scholten, O., Van Atteveldt, W., van Hoof, A., Krouwel, A., Oegema, D., de Ridder, J. A., Ruigrok, N., and Takens, J. (2007). Nederland vijfstromenland: De rol van media en stemwijzers bij de verkiezingen van 2006. Bert Bakker, Amsterdam.
Krippendorff, K. (2004). Content Analysis: An Introduction to Its Methodology (second edition). Sage Publications.
MacGregor, R. and Ko, I.-Y. (2003). Representing contextualized data using semantic web tools. In Practical and Scalable Semantic Web Systems (workshop at second ISWC).
Masolo, C., Vieu, L., Bottazzi, E., Catenacci, C., Ferrario, R., Gangemi, A., and Guarino, N. (2004). Social roles and their descriptions. In Dubois, D., Welty, C., and Williams, M., editors, Proceedings of the Ninth International Conference on the Principles of Knowledge Representation and Reasoning (KR2004), pages 267–277, Whistler, Canada.
Mika, P. and Gangemi, A. (2004). Descriptions of Social Relations. In Proceedings of the 1st Workshop on Friend of a Friend, Social Networking and the (Semantic) Web.
Miles, A. and Brickley, D. (2005). SKOS core vocabulary specification. W3C Public Working Draft, World Wide Web Consortium, November 2005, http://www.w3.org/TR/swbp-skos-core-spec/.
Noy, N. and Rector, A. (2005). Defining n-ary relations on the semantic web. Working Draft for the W3C Semantic Web best practices group.
Osgood, C., Saporta, S., and Nunnally, J. (1956). Evaluative assertion analysis. Litera, 3:47–102.
Prud’hommeaux, E. and Seaborne, A. (2006). SPARQL query language for RDF. Working Draft for the W3C RDF Data Access Working Group.
Sicilia, M. (2006). Metadata, semantics, and ontology: providing meaning to information resources. International Journal of Metadata, Semantics and Ontologies, 1(1):83–87.
Sowa, J. (1988). Using a lexicon of canonical graphs in a semantic interpreter. In Evens, M., editor, Relational models of the lexicon. Cambridge University Press, Cambridge UK.
Sowa, J. (2000). Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks/Cole, Pacific Grove, CA.
Steimann, F. (2000). On the representation of roles in object-oriented and conceptual modelling. Data and Knowledge Engineering, 35:83–106.
van Assem, M., Malaise, V., Miles, A., and Schreiber, G. (2006). A method to convert thesauri to SKOS. In Sure, Y. and Domingue, J., editors, Proceedings of the ESWC’06, number 4011 in Lecture Notes in Computer Science, pages 95–109.
Van Atteveldt, W., Kleinnijenhuis, J., and Carley, K. (2006). RCADF: Towards a relational content analysis standard. Presented at the International Communication Association (ICA), Dresden.
Van Atteveldt, W. and Schlobach, S. (2005). A modal view on polder politics. In Proceedings of Methods for Modalities (M4M) 2005 (Berlin, 1-2 December).
Van Atteveldt, W., Schlobach, S., and van Harmelen, F. (2007). Media, politics and the semantic web: An experience report in advanced RDF usage. In Franconi, E., Kifer, M., and May, W., editors, ESWC 2007, number 4519 in LNCS, pages 205–219, Berlin. Springer.
Van Cuilenburg, J. J., Kleinnijenhuis, J., and De Ridder, J. A. (1986). Towards a graph theory of journalistic texts. European Journal of Communication, 1:65–96.
Wong, R., Chau, H., and Lochovsky, F. (1997). A data model and semantics of objects with dynamic roles. In Proceedings of IEEE Data Engineering Conference, pages 402–411.