Loculus: An Ontology-Based Information Management Framework for the Motion
Picture Industry
ARC CENTRE OF EXCELLENCE
FOR CREATIVE INDUSTRIES
AND INNOVATION
Sharmin (Tinni) Choudhury, B.Eng (Hons), B.Com
Thesis
Submitted to the Discipline of Computer Science at the Faculty of Science and Technology
In partial fulfilment of the requirements for the degree of Doctor of Philosophy
At Queensland University of Technology
Brisbane 2010
Dedicated To My Mother and My Grandparents
بسم الله الرحمن الرحيم
STATEMENT OF ORIGINAL AUTHORSHIP
I hereby declare that the work contained in this thesis titled “Loculus: An Ontology-Based
Information Management Framework for the Motion Picture Industry” has not been
previously submitted to meet requirements for an award at this or any other higher education
institution. To the best of my knowledge and belief, the thesis contains no material previously
published or written by another person except where due reference is made.
----------------------------------------------------
Sharmin (Tinni) Choudhury
19 October 2010
SUPERVISORY PANEL
Principal Supervisor
Professor Kerry Raymond
Discipline of Computer Science
Faculty of Science and Technology
Queensland University of Technology
Associate Supervisor
Mr Peter Higgs
ARC Centre of Excellence for Creative Industries and Innovation (CCI)
Creative Industries Faculty
Queensland University of Technology
ABSTRACT
“How do you film a punch?” This question can be posed by actors, make-up artists, directors
and cameramen. Though they can all ask the same question, they are not all seeking the same
answer. Within a given domain, based on the roles they play, agents of the domain have
different perspectives and they want the answers to their question from their perspective. In
this example, an actor wants to know how to act when filming a scene involving a punch. A
make-up artist is interested in how to do the make-up of the actor to show bruises that may
result from the punch. Likewise, a director wants to know how to direct such a scene and a
cameraman is seeking guidance on how best to film such a scene. This role-based difference
in perspective is the underpinning of the Loculus framework for information management for
the Motion Picture Industry.
The Loculus framework exploits the perspective of an agent for information extraction and
classification within a given domain. The framework uses the positioning of the agent’s role
within the domain ontology and its relatedness to other concepts in the ontology to determine
the perspective of the agent. A domain ontology had to be developed for the motion picture
industry as the domain lacked one. A rule-based relatedness score was developed to calculate
the relative relatedness of concepts within the ontology, which was then used in the Loculus
system for information exploitation and classification.
The evaluations undertaken to date have yielded promising results and have indicated that
exploiting perspective can lead to novel methods of information extraction and
classification.
KEYWORDS
Ontology, Motion Picture Industry, Human Factors, Semantic Relatedness, Knowledge
Extraction, Knowledge Classification
ACKNOWLEDGEMENTS
Firstly, I would like to thank my principal supervisor Kerry Raymond for all her patient
guidance and support. Kerry rather graciously took over my supervision when my former
principal supervisor Binh Pham retired. I couldn’t have been the easiest student to handle but
she still put up with me. For that I am very grateful. I would also like to take this opportunity
to thank my former principal supervisor Binh Pham for giving me an opportunity to do my
PhD and also for securing the necessary funding, without which doing a PhD would not have
been very financially viable. My thanks also go to my associate supervisor Peter Higgs for
putting up with me for so long, especially acting as the conduit between myself and the
creative industry.
Stuart Cunningham is logically the next person on my list of people to thank. This PhD
would not have been possible without the support of Stuart and the ARC Centre of
Excellence for Creative Industries and Innovation (CCI). I would also like to thank the
hard-working individuals in the research and higher degree office of the Faculty of Science
and Technology, Agatha Nucifora, Carol Richter, Matt Williams, Jason Weiss and Sara
Thomas, who have helped me with everything from arranging travel to sorting out the
myriad of forms a PhD student has to fill out. I would like to especially thank Ricky Tunny,
who not only helped me while he was at the Faculty as the research co-ordinator but
continued to help me with forms, applications and other things after he changed jobs and
became the Coordinator Scholarships, Admission & Enrolments at the QUT research office.
Just goes to show, there is no escaping me.
I would also like to thank my mother (Neena Choudhury) for all her support and for inspiring
me on my PhD journey. My thanks also go to my grandfather (Sorwar Jan Choudhury) for
always believing in me and for inspiring me to do engineering instead of following both my
parents into chemistry. Also, I would like to thank my grandmother (Meena Choudhury) who
encouraged me wholeheartedly in the whole PhD thing despite having to share the “Dr” title
with me, but at least she can still say “yes” to the all-important question “Is there a Doctor in
the house?” I would also like to thank my brother (Adnan Ali Khan Choudhury) for simply
being there.
I would also like to thank my friends for sticking by me and peppering me with messages of
support. I would especially like to mention Suzanne Little, who, having gone through the PhD
process herself, was always there with a sympathetic ear.
PUBLICATIONS FROM THE RESEARCH
Choudhury, Sharmin (2006) A Metadata-based Framework for the management,
distribution and reuse of digital Motion Picture content. CCI Symposium, October
11 – 12, 2006, Melbourne, Australia.
Type: Symposium Presentation. Refereed: No.

Choudhury, Sharmin and Pham, Binh L. and Smith, Robert and Higgs, Peter L. (2007)
Loculus: a metadata wrapper for digital motion picture. Internet and Multimedia
Systems and Applications, August 20 – 22, 2007, Honolulu, Hawaii, USA.
Type: Conference Paper. Refereed: Yes.

Smith, Robert and Pham, Binh L. and Choudhury, Sharmin (2007) A digital artwork
expression language (DAEL). Internet and Multimedia Systems and Applications,
August 20 – 22, 2007, Honolulu, Hawaii, USA.
Type: Conference Paper. Refereed: Yes.

Choudhury, Sharmin (2007) A metadata-based framework for the management,
distribution and reuse of digital motion pictures. CCI Symposium, October 18 – 19,
2007, Melbourne, Australia.
Type: Symposium Presentation. Refereed: No.

Choudhury, Sharmin and Raymond, Kerry and Higgs, Peter L. (2008) A rule-based
metric for calculating semantic relatedness score for the motion picture industry.
Workshop on Natural Language Processing and Ontology Engineering at the
IEEE/WIC/ACM Web Intelligence Conference, December 9 – 12, 2008, Sydney, Australia.
Type: Conference Paper. Refereed: Yes.

Choudhury, Sharmin (2008) Ontology based perspective determination and its
implications for searching. Third Workshop of the HCSNet Next-Generation Search
Technology Priority Area, November 13, 2008, Melbourne, Australia.
Type: Workshop Presentation. Refereed: No.

Choudhury, Sharmin (2008) Improving human computer interaction. Creating
Value: Between Commerce and Commons - CCI International Conference, June 25
– 27, 2008, Brisbane, Australia.
Type: Conference Presentation. Refereed: No.

Choudhury, Sharmin and Raymond, Kerry and Higgs, Peter L. Ontology-Based
Information Extraction and Classification: Exploiting User Perspective within
the Motion Picture Industry. International Conference on Asian Digital Libraries,
July 21 – 25, 2010, Gold Coast, Australia.
Type: Conference Poster. Refereed: No.
TABLE OF CONTENTS
1 Introduction
1.1 Motion Picture Industry Background
1.1.1 Timelines
1.1.2 Importance of People
1.1.3 The Product of the Industry
1.1.4 Challenges being faced
1.2 Motivation for the Project
1.2.1 Decision Support
1.2.2 Repurpose
1.2.3 Reuse
1.2.4 Driving Questions
1.3 Project Background
1.3.1 Brief introduction to CCI and AFTRS
1.3.2 What AFTRS wanted from the Project
1.3.3 Overview of the Standards and Metadata Project
1.4 The Thesis
1.4.1 Contribution
1.4.2 Broader application of the research
1.4.3 Structure of Thesis
2 Literature Review
2.1 Information Models and Metadata Models
2.1.1 Existing Models
2.1.2 Metadata Schemas and Standards
2.1.3 Summary
2.2 Ontology
2.2.1 The Use of Ontology within Computer Science
2.2.2 Existing Ontologies
2.2.3 Ontologies in other domains
2.2.4 Ontology Implementation Languages
2.2.5 Summary
2.3 Semantic relatedness
2.3.1 What is semantic relatedness
2.3.2 Existing measures of semantic relatedness
2.3.3 Summary
2.4 Information Extraction
2.5 Summary
3 Research Plan
3.1 Research Questions
3.2 Research Methodology
4 Loculus Ontology
4.1 Conceptual Foundation of the Ontology
4.2 Axioms
4.2.1 General Axioms
4.2.1.1 Inclusion Axioms
4.2.1.2 Temporal Context
4.2.1.3 Temporal Axioms
4.2.2 Concept Axioms
4.2.2.1 Inheritance Axioms
4.2.2.2 Agent Context
4.2.2.3 Linkage Axioms
4.2.2.4 Terminology Axiom
4.2.3 Meta-Link Axioms
4.3 Structure of the Ontology
4.3.1 MPI Concepts Ontology
4.3.2 Agent Concepts Ontology
4.3.3 Common Concepts Ontology
4.3.4 The Root Concepts
4.3.5 Lattice Structure
4.3.6 Three Axes
4.4 Ontology Implementation
4.4.1 OWL
4.4.2 Altova SemanticWorks
4.4.3 How Concepts are Represented
4.5 Ontology completeness
4.6 Summary
5 Semantic Relatedness Metric
5.1 Introduction to the Relatedness Metric
5.2 Rules for Calculation
5.2.1 Reach Score
5.2.1.1 Inheritance Axis
5.2.1.2 Linkage Axis
5.2.2 Temporal Score
5.2.2.1 Motion Picture Industry Production Cycle
5.2.2.2 Motion Picture Life Stage
5.2.3 Example Calculations
5.3 Abstraction
5.4 Application of the Metric
5.4.1 Information Extraction
5.4.2 Information Classification
5.5 Summary
6 Loculus System and Loculus Schema
6.1 The System
6.1.1 The High-Level System Architecture
6.1.2 The Loculus System Architecture
6.1.3 Technology Choices
6.2 Ingestation Modules
6.3 Record Management Module
6.3.1 Loculus Wrapper Schema
6.3.2 Implementation Details
6.4 Semantic Module
6.4.1 Ontology Reader Class
6.4.2 Classification Identification Class
6.4.3 Production Cycle Identification Class
6.4.4 Distance Metric Class
6.5 Classification Module
6.5.1 Ingest Class
6.5.2 Record Formulation Class
6.5.3 Classification Class
6.6 Information Extraction Module
6.6.1 Query Class
6.6.2 Result Class
6.6.3 Ranking Class
6.6.4 Disseminate Class
6.7 Dissemination Module
6.8 External Data Services Module
6.9 So, how DO you film a punch?
6.10 Summary
7 Evaluation and Discussion
7.1 Evaluation of the Ontology
7.2 Evaluation of the Metric
7.2.1 Stage 1 – Interview with a Producer
7.2.2 Stage 2 – Web Survey
7.2.3 Statistical Analysis Of Stage 2 Data
7.2.4 Comparison With Rada’s Simple Edge Counting
7.3 System Evaluation against Available Data
7.4 Discussion of the Ontology
7.4.1 Achievements
7.4.2 Limitations and Future Works
7.5 Discussion of the Relatedness Metric
7.5.1 Achievements
7.5.2 Limitations and Future work
7.6 Discussion of the Loculus System
7.7 Generalization of this Research
8 Conclusions
References
Appendix A: Loculus Ontology Availability
Appendix B: Web Survey Results
LIST OF FIGURES
FIGURE 1.1: THE TWO INDUSTRY TIMELINES
FIGURE 1.2: DISTANCE TO EDITING
FIGURE 2.1: ENTITIES AND “PRIMARY” RELATIONSHIPS OF THE FRBR MODEL
FIGURE 2.2: ENTITIES AND “RESPONSIBILITY” RELATIONSHIPS
FIGURE 2.3: ENTITIES AND “SUBJECT” RELATIONSHIPS
FIGURE 2.4: ODRL MODEL
FIGURE 2.5: ABC CLASS HIERARCHY WITH PROPERTIES
FIGURE 2.6: HIERARCHICAL SEMANTIC KNOWLEDGE BASE
FIGURE 4.1: THE TWO TIMELINES OF THE INDUSTRY (REUSING FIGURE 1.1)
FIGURE 4.2: THE CONCEPT OF EDITING WITH VERTICAL AND HORIZONTAL LINKS
FIGURE 4.3: INHERITANCE HIERARCHY OF SOME OF THE CONCEPTS WITHIN THE MOTION PICTURE INDUSTRY
FIGURE 4.4: THE META-LINK HIERARCHY
FIGURE 4.5: ONTOLOGY EXTRACT – EDITING
FIGURE 4.6: ONTOLOGY EXTRACT – THE INHERITANCE OF EDITING
FIGURE 4.7: ONTOLOGY EXTRACT – ACTION
FIGURE 4.8: ONTOLOGY EXTRACT – MOTION PICTURE
FIGURE 4.9: ONTOLOGY EXTRACT – EDITING WITHIN THE THREE AXES
FIGURE 4.10: ONTOLOGY EXTRACT – CATEGORY AND ITS CHILDREN
FIGURE 4.11: THE REPRESENTATION OF EDITOR AT THE XML LEVEL
FIGURE 4.12: INHERITANCE RELATIONSHIP REPRESENTATION
FIGURE 4.13: LINKAGE RELATIONSHIP REPRESENTATION
FIGURE 4.14: XML-LEVEL REPRESENTATION OF THE CONCEPT OF EDITOR
FIGURE 5.1: ONTOLOGY EXTRACT - EDITING
FIGURE 5.2: ONTOLOGY EXTRACT - INHERITANCE LINKS AND LINKAGE LINKS
FIGURE 5.3: ONTOLOGY EXTRACT – APPLICATION OF INHERITANCE AXIS RULE 3
FIGURE 5.4: CREW HIERARCHY AND CLOSE ASSOCIATION BETWEEN EDITOR AND MAKE-UP ARTIST
FIGURE 5.5: THE TWO TIMELINES OF THE INDUSTRY (REUSING FIGURE 1.1)
FIGURE 5.6: THE SCORE CALCULATION OF METHOD ACTING TO ACTOR
FIGURE 5.7: THE SCORE CALCULATION OF MOOD TO CATEGORY
FIGURE 5.8: THE SCORE CALCULATION OF MOOD TO RATING
FIGURE 5.9: THE SCORE CALCULATION OF FILM FESTIVAL TO PRESTIGE
FIGURE 5.10: THE SCORE CALCULATION OF SCORE TO PROP
FIGURE 5.11: THE SCORE CALCULATION OF METHOD ACTING TO CAMERA
FIGURE 5.12: THE SCORE CALCULATION OF METHOD ACTING TO CAMERA THROUGH AGENTS
FIGURE 5.13: AGENT PERSPECTIVE
FIGURE 5.14: EDITOR PERSPECTIVE
FIGURE 5.15: ONTOLOGY EXTRACT – EDITOR AND CROSS-CUTTING
FIGURE 5.16: ONTOLOGY EXTRACT – GENRE, CATEGORY AND TONE
FIGURE 5.17: ONTOLOGY EXTRACT – CROSS-CUTTING
FIGURE 6.1: HIGH-LEVEL SYSTEM ARCHITECTURE
FIGURE 6.2: USER INTERFACE - LOCULUS MENU
FIGURE 6.3: SYSTEM WORK FLOW
FIGURE 6.4: THE LOCULUS SYSTEM
FIGURE 6.5: THE INGESTATION MODULE
FIGURE 6.6: THE LOCULUS RECORD MANAGEMENT MODULE
FIGURE 6.7: THE LIFE STAGE TIMELINE
FIGURE 6.8: THE LOCULUS METADATA WRAPPER SCHEMA
FIGURE 6.9: EXAMPLE LOCULUS RECORD
FIGURE 6.10: THE SEMANTIC MODULE
FIGURE 6.11: THE METHODS OF THE LOCULUS ONTOLOGY READER
FIGURE 6.12: THE DISTANCE METRIC CLASS
FIGURE 6.13: THE CLASSIFICATION MODULE
FIGURE 6.14: THE INFORMATION EXTRACTION MODULE
FIGURE 6.15: USER INTERFACE - DISCOVERY AND DECISION SUPPORT
FIGURE 6.16: THE DISSEMINATION MODULE
FIGURE 6.17: THE EXTERNAL DATA SERVICES MODULE
FIGURE 6.18: CONCEPTS RETURNED
FIGURE 7.1: WEB SURVEY INTERFACE
FIGURE 7.2: THE SCORE CALCULATION OF MOOD TO RATING (REUSING FIGURE 5.8)
FIGURE 7.3: THE SCORE CALCULATION OF MOOD TO GENRE
LIST OF TABLES
TABLE 4.1: LOCULUS ONTOLOGY BASIC STATISTIC ---------------------------------------------------------------------- 89
TABLE 7.1: THE THIRTY PAIRS OF CONCEPTS AND THEIR RELATEDNESS SCORE------------------------------------- 146
TABLE 7.2: SCORE TRANSFORMATION ------------------------------------------------------------------------------------ 148
TABLE 7.3: RESULTS OF THE STAGE 1 EVALUATION --------------------------------------------------------------------- 148
TABLE 7.4: CORRELATION COEFFICIENT FOR INDIVIDUAL RESPONDENTS -------------------------------------------- 152
TABLE 7.5: CORRELATION COEFFICIENT FOR OVERALL RATING ------------------------------------------------------ 153
TABLE 7.6: CORRELATION COEFFICIENT FOR GROUPS ------------------------------------------------------------------ 154
TABLE 7.7: AVERAGE AND MEDIAN CORRELATION COEFFICIENTS FOR HUMAN AGAINST HUMAN --------------- 154
TABLE 7.8: POSSIBLE OUTLIERS ------------------------------------------------------------------------------------------- 154
TABLE 7.9: RECALCULATED GROUP CORRELATION COEFFICIENTS --------------------------------------------------- 155
TABLE 7.10: RECALCULATED AVERAGE AND MEDIAN FOR HUMAN AGAINST HUMAN ----------------------------- 155
TABLE 7.11: RADA’S METRIC BASELINE ---------------------------------------------------------------------------------- 156
TABLE 7.12: CORRELATION COEFFICIENT BETWEEN RADA'S METRIC AND LOCULUS METRIC --------------------- 157
TABLE 7.13: CORRELATION COEFFICIENT FOR INDIVIDUAL RESPONDENTS ------------------------------------------ 158
TABLE 7.14: CORRELATION COEFFICIENT FOR GROUPS EXCLUDING OUTLIERS ------------------------------------- 159
TABLE 7.15: CORRELATION COEFFICIENT FOR DISTANT CONCEPT PAIRS, EXCLUDING OUTLIERS ---------------- 159
TABLE B.1: LEGEND ............................................................................................................................................ 183
TABLE B.2: TRANSLATION OF METRIC GENERATED SCORE TO A SCALE OF 1 TO 5 ............................................... 183
TABLE B.3: EDITORS ........................................................................................................................................... 184
TABLE B.4: PRODUCERS ...................................................................................................................................... 185
TABLE B.5: MISCELLANEOUS CREW ................................................................................................................... 188
TABLE B.6: MULTI-ROLE ..................................................................................................................................... 189
TABLE B.7: CORRELATION FOR PRODUCERS AGAINST OTHER PARTICIPANTS .................................................... 191
TABLE B.8: CORRELATION FOR EDITORS AND MISCELLANEOUS CREW AGAINST ALL OTHER PARTICIPANTS .... 192
TABLE B.9: CORRELATION FOR REMAINING PARTICIPANTS AGAINST ALL OTHER PARTICIPANTS ..................... 193
LIST OF ABBREVIATIONS
ABC – A Boring Core model and ontology
AFTRS – Australian Film Television and Radio School
AI – Artificial Intelligence
ARC – Australian Research Council
BPM – Business Process Modelling
CCI – ARC Centre of Excellence for Creative Industries and Innovation
CIDOC - Committee on Documentation of the International Council of Museums
CRM – Conceptual Reference Model
DAML - DARPA Agent Markup Language
DARPA - Defense Advanced Research Projects Agency (United States of America)
DREL – Digital Rights Expression Language
FPS – Frames Per Second
FRBR - Functional Requirements for Bibliographic Records
FRBRoo - FRBR-object oriented
IFLA - International Federation of Library Associations and Institutions
JDom – An open source Java-based document object model
KM - Knowledge Modelling
METS - Metadata Encoding and Transmission Standard schema
METS AV - Metadata Encoding and Transmission Standard Audio-Visual schema
MPEG - Moving Picture Experts Group
MPEG-7 – Moving Picture Experts Group Standard for multimedia content description.
MPEG-21 – Moving Picture Experts Group Standard that aims at defining an open
framework for multimedia applications
ODRL - Open Digital Rights Language
OIL - Ontology Inference Layer
OWL - Web Ontology Language
QUT – Queensland University of Technology
RDF - Resource Description Framework
REL – Rights Expression Language
UI – User Interface
XML - Extensible Markup Language
XrML - eXtensible rights Markup Language
1 Introduction
“I would like, if I may, to take you on a strange journey.” - The Criminologist, The Rocky
Horror Picture Show
Why do we seek information? What are we really looking for when we start to seek? What
kind of information are we really looking for? Are we looking for any answer to a question?
Or is there a specific type of answer in our mind when we pose a given question?
The information we seek and the manner in which we go about seeking it differs from person
to person based not only on our information needs but on how much knowledge we already
have on the subject. Where the information need is well-defined, information-seeking
becomes a simple information retrieval task [1], e.g. “who directed The Da Vinci Code?”.
However, where the information needs are for more complex mental activities such as
learning and decision making, information retrieval is necessary but not sufficient [1].
At the library of the Australian Film Television and Radio School (AFTRS) [2], students
frequently seek the answer to questions such as “How is a punch filmed?1”. While many
students might ask the same question, the answers they are seeking differ widely. Editing
students are seeking opinions on how best to edit a film sequence containing a punch. The
directing students are seeking opinions on what angles work best when filming a punch and
how many angles are needed to provide the editor enough material to choose from. For acting
students the primary concern would be to pick up tips on how best to act when punching or
being punched in a given scene. The acting students might be interested in gaining some
superficial knowledge of the art of filming a punch from the viewpoint of the editor and the
director, but only to the extent that it would allow them to best perform their role as actors.
This is an example of information-seeking for a complex mental process: the students are
seeking to learn. Their information needs are not precisely defined, but equally they are not
completely open to any answer, and do have a set of criteria by which they will judge the
suitability of the information they discover for their purposes. This is in contrast to someone
who merely wishes to retrieve a particular piece of information they already know exists for a
specific purpose. For example, while making an argument regarding a specific topic, to
strengthen their argument they might wish to retrieve specific pieces of information that they
1 For the purposes of the paper, the examples of user queries are expressed in terms of what the user is thinking and not how they interface with the Loculus system.
already know to exist either from previous information-seeking activity or from prior
knowledge. This is an instance of a simple information-seeking activity that culminates in a
simple information retrieval task. Another simple information-seeking activity is where the
seeker simply wishes to verify what they already know, e.g. they believe that Ron Howard
directed the movie The Da Vinci Code and they simply wish to confirm that it is so. This is
the kind of information-seeking activity that current technologies handle well. However, the
previous “filming the punch” example is markedly different, and existing query
technologies do not handle such queries well.
In this thesis, we propose a new method of information extraction support that is based
around the perspective of the information seeker, where perspective takes into account the
seeker’s relationship to the information being sought. We explore the viability of such an
information extraction support mechanism within the context of the motion picture industry
with the aid of our industry partners AFTRS, under the collaborative research umbrella of
the ARC Centre of Excellence for Creative Industries and Innovation (CCI). However, the
method being proposed is applicable to any domain where perspective of the information
seeker matters.
Before we proceed, we should explain what we mean by information extraction and how it
differs from information retrieval. In information retrieval as performed by a search engine,
usually whole documents are returned. In information extraction, the whole or part of a document
may be returned. In addition, information extraction may include extracting information
from multiple sources and presenting it to the information seeker in a coherent manner. In
this way, the idea of information extraction is similar in nature to that in natural
language processing, where information extraction involves extracting part of a text
corpus for processing [3]. Perspective sensitivity and ill-defined queries come into play
because, without some awareness of perspective, the information cannot be extracted and
presented properly; ill-defined queries are a reflection of the information seeker’s
perspective in terms of their expertise on the topic of the information-seeking activity.
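The distinction between retrieval and perspective-sensitive extraction can be illustrated with a minimal sketch. It is not the Loculus implementation: the `Passage` and `Document` types, the `role` tag, the function names and the sample passages are all hypothetical and invented for illustration, and simple keyword matching stands in for whatever matching a real system would use.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    role: str  # hypothetical perspective tag, e.g. "editor" or "actor"


@dataclass
class Document:
    title: str
    passages: list


def retrieve(docs, keyword):
    """Retrieval: return every whole document that mentions the keyword."""
    return [d for d in docs
            if any(keyword in p.text.lower() for p in d.passages)]


def extract(docs, keyword, role):
    """Extraction: return only the matching passages tagged for the
    seeker's role, gathered across all documents."""
    return [(d.title, p.text)
            for d in docs
            for p in d.passages
            if keyword in p.text.lower() and p.role == role]


# Invented sample data, for illustration only.
docs = [
    Document("Editing Handbook", [
        Passage("Cut on the impact to sell the punch.", "editor"),
        Passage("Keep a log of every take.", "editor"),
    ]),
    Document("Stage Combat Notes", [
        Passage("Sell the punch by reacting a beat after the swing.", "actor"),
        Passage("Shoot the punch from over the shoulder to hide the gap.", "director"),
    ]),
]

whole_docs = retrieve(docs, "punch")          # both documents, in full
actor_view = extract(docs, "punch", "actor")  # one passage, for the actor only
```

Under these assumptions, the same query returns two whole documents through retrieval, but only the single actor-oriented passage through extraction.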
1.1 Motion Picture Industry Background
The motion picture has been described as “THE contemporary art form” [4] that draws
together people from a wide variety of fields to create what is both a consumer product and a
cultural artifact. These artifacts are unique and always have a creative underpinning with
many films aiming for the heights of artistic expression. However, while most other mediums
of artistic expression deal with one aspect of the sensory field, the motion picture touches
multiple sensory fields [4]. Motion pictures share visual space with paintings and sculpture
[4]. Motion pictures share audible space with theatre, poetry and music [4]. Motion pictures
share the space of action with literature and theatre [4]. This speaks to the complex nature of
motion pictures and their production process, in that production is a collaborative effort that brings
together a variety of people to put together a work of multi-sensory experience that is both a
consumer product and a cultural artifact. In this section, we first look at many facets of the
industry that contribute to its complex nature, namely, the timelines of the industry, the
importance of the people as well as the nature of the product. Finally we look at the
challenges being faced by this industry. It is these challenges, combined with the
characteristics of the industry, that make the motion picture industry an interesting domain of
application for perspective-based information management.
1.1.1 Timelines
One of the important characteristics of the motion picture industry (MPI), especially during
its production process, is the importance of time. The industry has two timelines, which are
inherently linked, both to each other and to other activities within the industry. The
relationship between the two is shown in Figure 1.1.
[Figure: two parallel timelines. Production Cycle: pre-production, production, post-production. Life Stage: conception, production, utilisation (distribution, discovery, access, reuse/repurpose, preservation).]
Figure 1.1: The Two Industry Timelines
The first timeline is the ‘production cycle’ timeline, the process by which the motion picture
is created. The production cycle is broken into three phases: pre-production, production and
post-production. It is hard to set precise boundaries on when pre-production starts, as it
usually involves imprecise tasks. It takes time to get the basic concepts of the film to such a
state that it obtains a commitment to fund further development or is ‘green lighted’ as it is
termed in the industry.
In contrast, the production phase has precise borderlines, starting on the first day of shooting
and finishing on the last day of shooting. As soon as production ends, post production starts
in earnest, although some post-production activities, e.g. special effects for scenes already
filmed, might have begun while the bulk of the motion picture was still in the production
phase. Post-production encompasses everything after production. This is
because there is always something to do, whether it be to produce the final cut, to market the
final cut or to digitally re-master the motion picture for a new generation of viewing
technologies or simply to preserve it. As such, there is merit in saying a completed motion
picture is always in post-production, since there really is no event that can be marked as “the
end”.
The second timeline is the motion picture’s life stages. These life stages are conception,
production and utilization. Utilization in turn comprises distribution, discovery, access,
reuse/repurpose and preservation. The life stages do not map exactly onto the production
cycle, nor do all motion pictures reach all life stages. A motion picture is in the conception
stage when it is conceived and is being fleshed out. The latter part of conception would
correlate with pre-production. A motion picture is in production life stage when it is being
produced; so the latter parts of pre-production, all of production and the post-production
activities that end with the creation of the final cut would correlate with this stage. Utilization
spans the remainder of post-production. However, while there is an order in which the life
stages must be reached, the life stages are not a good measure of chronology. This is because
a motion picture can exist in multiple life stages at once and return to a previous life stage
under certain circumstances. Certainly, the discovery and access sub-stages of utilization
happen multiple times.
Therefore, while the production cycle and the motion picture life stages are related they are
not the same. An easy way to distinguish between them is that the production cycle creates
the motion picture, while the life stages are the various stages in the life of the motion
picture.
1.1.2 Importance of People
Another important aspect of the industry that adds complexity to its nature is the diverse
range of crafts or skills involved. As mentioned in the introduction to this section, the motion
picture development process brings together people from a wide variety of fields. Some of
these people are only involved with one phase of the production cycle or only one of the life
stages of the motion picture. Others are involved with multiple phases or life stages and a few
are involved for the entire production cycle and the majority of the life stages.
This mix of agents who are involved short term and agents2 who are involved long term raises
interesting dynamics in terms of how the industry functions. One of the contributing factors
to the stagnation of innovation in the production process of the motion picture has been the
short-term nature of people involved in the production phase of the production cycle and the
production life-stage of the motion picture. Aside from a few notable exceptions, the span of
the production phase is measured in months, even when the span of the entire production
cycle is measured in decades. For example, The Curious Case of Benjamin Button started
pre-production in 1994 when film industry executives were first approached with the
possibility of filming an adaptation of the F. Scott Fitzgerald short story of the same name.
Production, however, did not start until 2007 with the film finally released in 2008 [5]. What
this means is that when people are brought together to work on the filming of the motion
picture, everybody is expected to know their role and to perform it according to the industry
standard. This inevitably means that changing the way the industry defines a role can become
a sluggish process. There is a significant risk from the uncertainty associated with changing
roles and how different roles interact, which is a major reason why some parts of the motion
picture production process have remained unchanged for decades.
Despite this, standard industry practices are slowly changing (which will be discussed in
detail later in the chapter) but what is not changing is how the perspective of people differs,
based on the length of time they are associated with a project. This also speaks to
specialisation and collaboration. The movie-making process is inherently a collaborative
process; however some people’s day-to-day activities involve more collaboration than others,
e.g. the director’s work is more collaborative than that of the costume designer. A director’s
job involves directing the technical and artistic crew and cast members to bring to the screen
the director’s vision of the script, in the process controlling the artistic direction of the film.
The costume designer, on the other hand, works on a small subset of the script visualisation
process and, while they are directed by the director and have to make sure that the costumes
fit the actors, they have their own expertise and their own largely self-contained way of
2 The term “agent” is used, and will be used throughout this thesis, in reference to practitioners of the domain. “Agents” as understood by the motion picture industry will be referred to more specifically, e.g. talent agent.
bringing to life, through their costumes, the era and situation depicted in the script.
Some people’s information needs are mostly limited to those generated by their own
department, e.g. the costume designers do not need too much information from outside their
department. Other people’s information needs extend not only across departments but across
phases and stages of the two timelines, e.g. the producer works across all phases and needs
information from all phases.
Another interesting aspect of the people involved within the motion picture industry is that
they are a mix of the highly technical and the highly creative, as well as people who are both
creative and technical. For example, an electrician who installs the lights is a technical
person, but the lighting technician is not merely a fancy name for an electrician. The key grip,
the head of the grip department and chief rigging technician on the set, is part of the creative
technical crew and has an active role in bringing the magic into the movie. On the other
hand, an actor is a purely creative role.
This mix of short-term and long-term, technical and creative, specialised within a narrow
field but collaborating towards one grand vision, means that the people involved in the motion
picture industry contribute greatly to the complex character of the industry.
1.1.3 The Product of the Industry
The motion picture is a creative product; it is a cultural artifact but, at the same time, it is
generally meant for mass consumption. Even the most formulaic
of motion pictures, e.g. Rocky, Rocky II, Rocky III, Rocky IV, Rocky V, is a creative product
and is therefore subject to the idiosyncrasies of the creative process and the creative mind.
Why did the director feel the need to reshoot a scene forty-two times? Would adding more
car crashes make the movie more appealing to the target demographics? How should the film
be edited to make the story more thrilling? It is not always clear why certain decisions are
made, even to the person making the decisions.
More importantly, both as an artistic product and a consumer product, the motion picture is
subject to taste-based subjective measures that make the business of making motion pictures
very unpredictable and therefore financially very risky. In truth, there is no such thing as a
“sure fire hit”. Unexpected films often find box office and/or critical success, e.g. Slumdog
Millionaire, despite featuring unknown actors, being set in a non-western society with part of
the film being filmed in a foreign language and with the entire story strongly tied to the
history of the city of Mumbai that would be unfamiliar to western audiences [6]. At other
times, what are considered sure fire hits end up being flops, e.g. Lions for Lambs, which
starred Tom Cruise and Meryl Streep, with Robert Redford directing, producing and starring in
the film. In addition, the film expounded the heroism of US soldiers in Iraq and Afghanistan
[7]. Yet, despite combining patriotic themes with leading actors, the movie still failed. This is
a testament both to the fickle nature of motion picture audiences and to the inherent risk
associated with a creative product.
What the people making the motion picture thought was a good idea, the audience can
perceive as being a very bad idea. On the flipside of the coin, audiences often take a liking
to unexpected films. Whether a film succeeds or fails, the reasons behind it are often
complex and unquantifiable. Sometimes the reason is
obvious in hindsight but not necessarily replicable, e.g. The Blair Witch Project was an
unexpected hit due to the marketing campaign employed that had people convinced that it
was in fact a true story, but it is not a trick that will necessarily work twice.
Given the large budgets needed to make motion pictures, studios and investors would like to be able
to answer the questions: what makes a good motion picture good? What makes a bad motion
picture bad? Sometimes it is obvious which category a motion picture belongs to but most of the
time it depends wholly on who is giving the judgement. In addition, being “good” does not
guarantee commercial success, nor does being “bad” preclude commercial success. Many
blockbuster movies are critically considered to be mediocre at best and more often than not
tend to follow a formula, e.g. the James Bond franchise. This, however, does not stop them
from attracting huge audiences.
On the other hand, artistic films, even when critically acclaimed, are often not commercial
successes. Indeed, in recent times the gap between artistic films and so-called commercial films
has been steadily growing; one of the major signs of this gap is that blockbusters rarely win
Academy Awards these days. This was not historically the case. Indeed, historically big
budget blockbusters such as Ben-Hur were more favoured for Oscars than smaller
independent films. These days, however, the Oscar for Best Picture tends to go more to
independent films such as Slumdog Millionaire.
There is, however, another reason for the shift in voting pattern for the Oscars. The motion
picture industry is an industry that is highly risky but at the same time risk averse. The
production cost of most movies is counted in the millions and sometimes hundreds of
millions of dollars; however, the unit price of a movie ticket at the theatre is ~$15 and a unit
price of a DVD rental of a one-year-old movie can be as low as $3. With such low unit
revenues, the industry generally depends on mass consumption to make a profit. Increasingly,
the major studios are going for the safe option of sequels and prequels to past hits, e.g. the
Star Wars films, and adaptations of books, video games and plays that have a built-in audience, e.g.
the Harry Potter films, and therefore a better chance of success in the fickle world of
consumer taste that determines the success or failure of a taste-based product such as a
motion picture. This is one of the main reasons why the voting pattern for the Academy
Awards has shifted from rewarding films from major studios to those being made by smaller
independent studios who often take more creative risks. The Academy is increasingly finding
that the big studios simply do not produce films that fit the Academy’s “arts” self-image.
So far we have discussed the motion picture as an artistic product, a creative product and as a
consumer product. However, at the beginning of this section we also stated that the motion
picture was a cultural artifact; motion pictures tell the story of the period in which they are
produced. Even if they are action movies or popcorn blockbusters, they reflect the state of
mind and values of the audiences they are targeting. However, the cultural significance of a
movie cannot always be judged by the immediate reaction that it garners. For example, the
1946 film It’s a Wonderful Life is today deemed "culturally, historically, or aesthetically
significant" by the United States Library of Congress and has been selected for preservation in
its National Film Registry [8]. However, at the time of its original release the reviews were
almost all negative and the FBI considered the film to be communist propaganda because of
the negative portrayal of a banker [8]. In many parts of the US, audiences walked out of the
film as soon as it appeared that George Bailey, the main character of the film, was
contemplating suicide. Therefore, the movie suffered from poor word of mouth as people did
not realize that it was actually a rather uplifting film, despite its downcast opening.
At the same time, some motion pictures are specifically funded for cultural significance with
little regard to box office potential, e.g. Ten Canoes [9]. Culturally significant movies are
often funded by government bodies, private artistic funds and grants. Such films are almost
certainly not the domain of major studios. Due to toughening economic situations, the major
studios are shying away from the small portion of these kinds of movies they once used to
fund. This speaks directly to the challenges being faced by the industry, which are discussed in
detail in Section 1.1.4.
1.1.4 Challenges being faced
During the 30s and 40s, which are generally considered the golden age of cinema, the motion
picture industry was dominated by the four major studios that exerted control over all aspects
of the production and marketing of films. Their business model was a direct pipeline: the
cinema operators would tell studios they wanted three Clark Gable movies, two Spencer
Tracy movies and one Laurence Olivier movie. The studios would then get to work on these
movies with a guaranteed distribution channel. While not the only game in town, movies
were by far the entertainment of choice for the masses due to the affordability of tickets and
the general novelty of the concept of motion pictures. However, the coming of television
spelt the death of that model, introducing the prime challenge that the industry faces
even today: rising costs but decreasing revenue. That challenge no longer comes just from
television, which is now itself an established player in the greater screen-based entertainment
industry.
The chief competition for the entertainment consumer’s dollar comes from the computer
games industry. The average age of a computer gamer has risen to 35 - in the heartland of
film’s traditional audience age group [10]. While the unit cost of a single game is higher than
the unit cost of a single movie ticket, a game offers many more hours of entertainment. Not
surprisingly many people, with and without families, now prefer to spend their entertainment
dollars on games, achieving more hours of entertainment per dollar than a night out at the
cinema.
Piracy has been a problem for the motion picture industry ever since the advent and
widespread adoption of video cassette recorders and players made it easy to duplicate and
distribute pirated copies. However, the rise of the Internet has taken piracy to a new level.
Sometimes piracy is a direct market response to out-of-date industry practices. For instance,
unmet demand is created when film business models require the staggered release of a
motion picture through a sequence of countries [11], e.g. if a movie is released in the US in
January but does not come to Australia until May, people in Australia who really want to
see the movie will often simply download a pirate copy, not because they are averse to paying
money to watch it but because they do not wish to wait [11]. At other times, piracy is a result of
consumers having limited money for entertainment and needing to stretch it [11].
Although there is a cost associated with the bandwidth needed to download, downloading is
still cheaper than going to the movies and, despite the industry coming down hard on pirates,
the industry is far from winning the war against piracy [12]. However, a lot of the time
piracy is simply an unwillingness to pay for a given motion picture.
However, even as the Internet has cut into the revenue of the motion picture industry, it has
also opened up new opportunities for the motion picture industry. At its most obvious the
Internet is a new distribution channel that allows the industry to reach consumers and
redefine the concept of a niche market. There have always been niche markets in the movie
industry together with niche production companies servicing these markets, e.g. the Mormon
movie industry, horror movies, surfing movies etc. That being said, these niche markets have
often been limited by access, which is often connected with geography. For example,
Mormon movies are easily accessible in Utah, but followers of the Mormon faith (or people
who just happen to like Mormon movies) outside of Utah have a far more difficult time
getting access to these movies. Internet distribution is able to make niche movies more
accessible to a worldwide audience.
Moreover, the Internet has opened the door to capitalise on fan culture. Fan culture is not
very well understood by the industry and is often viewed with suspicion and, under existing
copyright laws, often treated as criminal [13]. Yet, facilitating fan culture and enabling the
fans to legally do what they already do covertly could potentially generate new revenue
streams for the industry. For example, many fans of Star Wars prefer Han Solo to Luke
Skywalker. These fans make montage videos as tributes to Han Solo and post them on
YouTube. They write fanfiction that rewrites the Star Wars story to turn Han Solo into a Jedi
and the saviour of the galaxy. These fans would love to get hold of footage focused on Han Solo
that ended up on the cutting room floor and be able to use it, to remix it [13], and
create something new. Fans are also interested in things that might be considered trivial by
non-fans and even the creators of the movies. Fans of Star Wars would be interested in
reading the different versions of the script. What was the script originally? How did it change
over time? Was one of the other versions of the script far superior to the version that was
filmed? These are not questions that interest non-fans but these are exactly the questions that
fans would find fascinating.
There are two issues that stand in the way of fully realising the value of fan culture and
exploiting the niche markets. Firstly, there is the long-ingrained business practice of the
industry that stops the industry executives from realising the full potential of new
developments in technology and behaviour of fans in response to the technology. Secondly,
there is the issue of hampered ability due to mismanagement of information resources.
Earlier, we mentioned how the different versions of the Star Wars scripts would be of interest
to the fans. However, even if industry executives recognised the benefits of making
these various versions of the script available, they may not have the versions to
release. The information regarding the changes made to the script is meticulously
recorded during production and promptly discarded after the film hits post-production. Even
if the scripts are kept, they are not preserved in a reliable way. There is nothing we can do to
change the business culture of the motion picture industry directly. However, we can develop
tools and techniques that make it easier for the motion picture industry to manage, reuse and
repurpose its information. Perhaps changing the information culture may also alter the
business culture as it becomes more cost effective to support fan culture and grant access to
more consumers to increase the size of a niche market etc.
1.2 Motivation for the Project
The motivation for this project came from the observation of the inefficiencies of the motion
picture industry in relation to information management. The motion picture industry is an
information intensive industry, generating and utilising vast quantities of data each day with
the practitioners within the industry having unquantified stores of tacit knowledge. However,
while the industry is deeply reliant on information, the tools and techniques necessary for
efficient management of such information are lacking. There are different types of databases
with various gatekeepers of information presiding over them, such as casting directors with
databases of actors or location managers with databases of locations. These are effectively
information silos with little communication with other silos and even within the silos the
utilisation of information is limited. Part of the problem with these silos is the use of
proprietary formats that make it difficult to export the information. The other problem with
the silos is the gate-keepers, who either do not understand the value of the data to others or
believe providing access to the data would diminish their influence and job security.
Meanwhile, the industry is steadily moving from analogue to digital processes not only in
terms of using digital tools, e.g. digital cameras and editing suites, but due to increased
economic pressures the industry is at long last moving away from century-old manual
processes and adopting semi-automated workflow systems [14, 15] and other tools and
techniques that are generating more “born digital” information than ever before. However,
unless these new information sources are properly managed then the information will have no
life beyond the immediate and limited use within the production process. Therefore, there is
considerable scope for development of tools and techniques to aid the industry in managing,
enriching and generally adding value to the digital information that the industry is generating
as part of its production process. This information can be a potential new source of revenue
when utilised to support a fan culture. These kinds of fan activities also serve to keep the
movie in question in the public consciousness and fuel further consumption by the general
public, as well as aiding the movie in creating a lasting cultural impact.
Information can be a valuable commodity and, increasingly, the contextual information
generated during the production process of a motion picture is being recognised as having
value all of its own [16]. However, while the motion picture industry has always generated
and consumed a vast quantity of information during any given production of a single motion
picture, the repurposing and repackaging of that information to generate additional value has
been fairly non-existent. Part of the problem has been the form of the information.
Traditionally the information was analogue and often cumbersome; it could not easily be
repackaged, or it was held tacitly inside people’s heads, but this is changing. The other part is that
many aspects of the industry, as explained earlier in terms of the short-term involvement of
many agents, have been stagnant for many decades or have not changed since the
industry came into being about a hundred years ago. As a result, the value of such
contextual information is not recognised and/or practitioners are unsure how to generate
value from the information.
The information can generate value in three broad ways: firstly, by repurposing the
information for uses other than those it was generated for; secondly, by reusing existing
information; and thirdly, by enabling people to make more informed decisions.
1.2.1 Decision Support
The first type of decision support activity that can potentially be supported by allowing more
advanced forms of information manipulation is process improvement. Within the industry,
production information collected about budget over-runs and similar issues can be analysed to
reveal chronic problems and point to possible solutions. This kind of trend analysis requires
sophisticated information retrieval across multiple motion picture productions. If such data
were available in sufficient quantity, then data mining could also be used to find correlations
that might not be immediately obvious. For example, some actors might be associated with
production cost over-runs due to their temperamental behaviour during filming, or the desire
for certain editing techniques might necessitate more cameras during filming. Even simple
analysis could lead to improvements to the production process. Such lessons learnt at the
industry level could flow back to the teaching institutes as comprehensive case studies and
best practice guidelines.
The second type of decision support that can be facilitated through the better management of
data is Learning Support. Even films that are spectacular flops, such as the 2004 film
Catwoman, can generate valuable information during their production. Catwoman, for example,
was the first film that was totally digital, including totally digital editing. Indeed, this was
cited as one of the contributing factors towards its failure, with the editor complaining that,
as the film was totally digital, nothing ended up on the “cutting room floor” so to speak,
leaving too many options open for the final cut. The film’s producers, director and editor had
a hard time locking down the final cut, as they were overwhelmed by the amount of choice
available to them. While this was by no means the only fault with the film, analysis of the
production information of Catwoman might lead to insights and best practice guidelines for
an industry that is rapidly changing, after not changing for a hundred years.
1.2.2 Repurpose
Repurposing can take on two forms: content repurposing and process repurposing.
Information can be repurposed to feed back into the industry or related projects; alternatively,
it can be repurposed for use outside of the industry.
Some forms of repurposing are already taking place. For example, CGI generated for movies
can now be imported into game engines and can thus double as game graphics. CGI from
movies to games is an obvious move but there might be other possible ports of data and
information. Could editing information be ported into preservation software to enable better
preservation of historically important motion pictures? The answer to such a question is
totally unknown simply because it is currently not feasible to share data and information
easily between various processes of motion picture production.
Repurposing can also go beyond the bounds of the industry. For example, stills taken of a real
location (as opposed to a sound stage) that has been used repeatedly can become a historical
record of the changes sustained by that location. With the emergence of the Internet, the
value-added function of remixing has been touted as an additional revenue stream, especially within
the Creative Commons community as part of the hybrid economy [13]. While remixing parts of
the original film is controversial, most of the filmed material actually ends up on the
cutting room floor. With better management of these shots, which are now almost always
digital and therefore don’t have to end up in the bin, the motion picture industry can get a
good foothold within the hybrid economy by repurposing what is today being thrown away as
waste product.
1.2.3 Reuse
Information such as location scouting reports can easily be reused. The gatekeepers of the
location scouting reports are location managers, who not only investigate locations for a
particular movie but also go scouting for interesting locations without having any particular
movie in mind. The location reports are often kept on paper and become outdated easily.
However, if moved to digital formats and integrated with tools that can update the location
information automatically, e.g. Google Maps, heritage-listing databases and photo archives, the location
scouting reports can become more accessible. The location manager, freed from the task of
keeping existing locations updated, can then go and find new locations, as well as
spend more time selecting the right location for each scene of the film. Therefore, through
information re-use, the location manager becomes both more effective and more efficient.
This does not diminish the status of the gatekeeper; it simply makes the gatekeeper more
efficient by removing some of the more tedious aspects of their work. There will still be the
Location Manager, except said Location Manager will no longer have to trawl through their
old locations updating tedious details like new zoning regulations, heritage listings or even
changes in access conditions and other things that can easily be done automatically.
This is not to say that information within the motion picture industry is not reused now. It is just
that the reuse is often a needlessly laborious process, which often deters or at least reduces
reuse.
1.2.4 Driving Questions
The motivating scenarios mentioned above, combined with the observation regarding
perspective, lead us to the driving questions that motivated the work being reported in this
thesis. The central questions are:
• How to model the domain of discourse to reflect and include the different
perspectives of agents?
• Can agent perspective be exploited to better serve the needs of the Motion Picture
Industry?
• What kind of a system could exploit agent perspective to better serve the needs of the
Motion Picture Industry?
1.3 Project Background
The research presented in this thesis is part of a larger project between the ARC Centre of
Excellence for Creative Industries and Innovation (CCI) and the Australian Film Television
and Radio School (AFTRS). The research reported in this thesis is supported by a CCI
scholarship. In this section, we will cover briefly the background from which the thesis
emerged.
1.3.1 Brief introduction to CCI and AFTRS
Established in July 2005, the ARC Centre of Excellence for Creative Industries and
Innovation (CCI) is the first Centre of Excellence funded outside the science, engineering and
technology sectors. The centre’s research interests fall under three broad categories: Creative
Innovation, Innovative Policy and Creative Human Capital. The centre brings together
researchers from the diverse fields of the humanities, law, various social sciences and
information technology. As part of its Creative Innovation research stream, the centre
engaged in a variety of research projects to develop tools, techniques and methodology for
better management of information and information-related services for the Creative
Industries.
One of these projects was the Standards and Metadata project. The primary industry partner for the
Standards and Metadata project was the Australian Film Television and Radio School
(AFTRS), which is also an industry participant in CCI itself.
Created in 1973 by the Australian Government as a key strategy to revive the Australian Film
Industry, AFTRS is the only institution of its kind in the country. AFTRS has an international
reputation for excellence and has had three of its short films (created by their students as part
of their final year project) achieve Oscar nominations. AFTRS has a huge back catalogue of
student films, to which new films are added every year by the graduating students.
Consequently, AFTRS has a growing need to effectively manage information about motion
pictures and related activities, making it a suitable partner for the CCI projects in information
management. In addition, AFTRS has total ownership and reserves all rights to these films.
This meant that an agreement with them would suffice and there would be minimal copyright
related issues, as AFTRS owned both the films and all information pertaining to the
production of the film in most circumstances. Moreover, AFTRS has specific issues and
needs that made them potential candidates for early adoption for technologies developed by
the Standards and Metadata project.
1.3.2 What AFTRS wanted from the Project
Like many organisations, AFTRS has information silos that result in considerable wasted
effort, as both staff and students have to supply the same information multiple times to
different departments, resulting in information discrepancies and general information
redundancy. It also leads to information fatigue, where providers of information become
reluctant to provide quality information because they have had to provide it so many times.
The organisation as a whole also finds itself unable to utilise its information properly, both
because of the information silos and the general difficulties in management of information
sources.
AFTRS’ needs fell into the three broad categories of information use mentioned earlier.
AFTRS wants to reuse the information it collects easily; it wants to repurpose its
information to generate more revenue; and lastly, as a teaching institute, it is keen to
utilise its information for decision support purposes.
An example of reuse was in terms of something as simple as distribution release information
for final year graduate films. At present, AFTRS students have to first fill out an internal
AFTRS form providing details to the AFTRS distribution manager, which is used for
AFTRS’ internal records and subsequently supplied to the AFTRS library for their catalogue
information. The AFTRS library is required under Australian Federal Government
legislation to keep meticulous records of the student graduation films and then make both the
films and the information available to the National Archives of Australia (NAA) for
permanent storage. Previously the students had to fill out the distribution release form only
once. These days, the students face the additional requirement of re-entering the information
provided in the official AFTRS distribution release form online into multiple databases of
distribution clearing houses. This is because AFTRS actively enters their students’ works in
film festivals all over the globe on behalf of their students. These days the majority of the
festivals demand online submission facilitated through online distribution clearing houses.
AFTRS has no means of efficiently or easily porting the data they gather for their internal use
directly into the forms for the online clearing houses. As a result, AFTRS requires its students
to first fill in their internal forms and then provide the same information to three different
online clearing houses. Student participation was problematic when there was only one form.
The cooperation of students has been greatly reduced with so many more forms. AFTRS has
one full-time distribution manager and one part-time assistant; their workload is needlessly
increased when they find themselves forced to track down incomplete forms or identify
inconsistent information that has been provided across the four forms.
Apart from the distribution clearing houses, there are now many websites that are repositories
of information regarding films. Well-known examples are IMDB and Wikipedia. AFTRS
also has a channel on Google Video where it releases student films. The information
provided in the distribution release form could be repurposed for use in websites such as
IMDB and Wikipedia. The information can also be repurposed for the metadata requirements
of Google Video. This would not be a direct reuse of information, as the information would
have to be transformed to “fit” the requirements of IMDB, Wikipedia and Google Video.
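As a rough illustration of what transforming one record to "fit" several targets might involve, the sketch below renames and filters the fields of a single release-form record according to a per-target mapping. Every field name, value and target layout here is a hypothetical placeholder; none of them is taken from the actual AFTRS form or from the real requirements of any website.

```python
# Hypothetical release-form record; the real AFTRS form fields are not given in the text.
RELEASE_FORM = {
    "title": "Example Short",
    "director": "A. Student",
    "year": 2008,
    "runtime_minutes": 12,
    "synopsis": "A placeholder synopsis.",
    "contact_email": "student@example.org",
}

# Assumed per-target mappings: target field name -> source field name.
TARGET_MAPPINGS = {
    "film_database": {"Title": "title", "Directed by": "director", "Year": "year"},
    "encyclopedia": {
        "name": "title",
        "director": "director",
        "released": "year",
        "runtime": "runtime_minutes",
    },
    "video_site": {"video_title": "title", "description": "synopsis"},
}


def repurpose(record, mapping):
    """Return a new record holding only the mapped fields, renamed for the target."""
    return {
        target: record[source]
        for target, source in mapping.items()
        if source in record
    }


VIDEO_METADATA = repurpose(RELEASE_FORM, TARGET_MAPPINGS["video_site"])
ENCYCLOPEDIA_ENTRY = repurpose(RELEASE_FORM, TARGET_MAPPINGS["encyclopedia"])
```

A flat mapping of this kind only covers renaming and selection. Deciding which fields matter to which consumer is the part that needs richer semantics than field names alone.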
Lastly, in terms of decision support, the information of successes and failures of past student
films at various film festivals could be used to streamline the film festival entrance process
and allow the distribution manager to better focus their limited time and resources.
The above are but a few examples of the needs and issues faced by AFTRS and for which they
hoped the Standards and Metadata project would have a solution.
1.3.3 Overview of the Standards and Metadata Project
The idea behind the Standards and Metadata project was to take a leadership role in the
refinement and implementation of cross-sectoral and cross-stage metadata and format
standards. In particular it was to research and demonstrate strategies to incorporate the
effective gathering, use and integration of metadata into the value chain activities.
In essence, the idea was that metadata would be used to enable the reuse and repurposing of data and to
allow data to be used for decision support purposes. However, it quickly became apparent
that metadata alone was not enough and indeed somewhat meaningless. For any system to be
able to properly interpret the metadata it needed a semantic layer that was richer than simple
metadata. It needed an ontology, which in turn became the central focus of this PhD. The link
between the PhD and the project was that the PhD would focus on the development of the
semantic module as part of a larger system that could be deployed within AFTRS. It was
hoped that such a system would aid AFTRS’s efforts of punching holes into its various silos
and improve communication between them. From the point of view of the PhD project the
deployment would be used as a means of testing the semantic module and its effectiveness
in improving information classification and extraction. Unfortunately, a number of
factors (such as personal changes and changes at AFTRS) led to delays with the over-arching
project, which resulted in the work undertaken for this PhD being conducted independently of
the overall project. This also had implications in terms of co-operation with other CCI projects
working with AFTRS in the stream of Creative Innovation, notably the Business Process
Management (BPM) project, which has been introducing workflow management tools into the
production process of AFTRS final year student short film projects. The idea was that the
data collected via the BPM project, which would be in XML form,
would be used by the system being developed by the Standards and Metadata project for
testing purposes and as test data for the research reported in this thesis. While some data
has been obtained from the BPM project, it is not as voluminous as this research requires.
In the meantime, the PhD project has pushed on and developed the semantic module, with
only a skeletal structure of the wider system being developed to support some limited testing
of the semantic module. Overall the system is not yet in a state where it can be deployed as a
fully functional prototype within AFTRS. Therefore, it has not been possible to extensively
test the semantic module, given the lack of the surrounding system and the limited availability
of “real” data. Verification of the semantic module reported in this thesis has resulted from
largely manually constructed test data.
1.4 The Thesis
At the beginning of this chapter we described the scenario of different AFTRS students going
to the AFTRS library and asking the same question and expecting very different results.
Subsequently we described scenarios where the same piece of information, the completed
distribution release form, could be used in various different ways. However, how are these
two things connected? The answer is that the context surrounding the information-seeking
activity, which is composed of the purpose to which the seeker wants to put the information
as well as the information that the seeker already possesses, determines what subset of a given
information repository the agent seeking the information is most interested in.
The context in which information is captured and how it is to be used can be expressed
through the use of metadata tags. However, a set of metadata tags does not provide
sufficient semantics to allow for complex, nuanced and multifaceted uses of information, such as
allowing different users to pose the same question and get answers framed from their
perspective, or allowing the relevant parts of a given document to be isolated and
transformed for use in multiple circumstances based on the needs and perspectives of those
seeking the information. Certainly an ontology for the domain is needed to provide meaning
for the metadata tags; however, is that all the domain ontology is good for? Can it be used to
determine the perspective of the agents involved with the processes of the domain? If so, how
can that perspective be determined and applied?
In short, the central question being addressed by this thesis is: given a domain ontology, how
can context and perspective be determined from that ontology so as to enhance the seeking of
information to satisfy various needs and situations; a domain ontology being a model of the
concepts of a domain and how they relate to each other.
From the context of the students of AFTRS, the reason the director, editor and actor want
different answers to the same question is because they have different perspectives on the
same situation, due to the role they play within that domain of discourse, a difference that
should be evident upon examination of the domain ontology within the context of the user’s
role. All roles of a domain that can be occupied by an agent should rightly be part of that
domain’s ontology. In addition, the links of these roles to concepts and their nearness to
some concepts more than others can be used to infer the type of answers the agents occupying
those roles are seeking. We can infer how the agent will view and categorize information
because the agent’s internal categories and cognition will be influenced by the role they
occupy within the process of the domain of discourse: the motion picture industry in this
case [17].
As such, the roles of director, actor and editor are all part of the motion picture industry and
would therefore be part of the domain ontology for the motion picture industry. Editing,
Acting, Lighting, Camera Angles and indeed the concept of a Shoot would also be part of the
domain ontology for the motion picture industry. From the ontology we know how concepts
are related and whether those relationships are direct or indirect, close or distant. From this
we can determine, for example, that the role of editor is closely associated with editing
technique.
On the other hand, as shown in Figure 1.2, the director is less closely associated with editing
techniques. However, the director still has a closer association with editing techniques than,
say, actors. From this we can infer that when an actor asks the question “How is a punch
filmed?”, they are more likely seeking an answer framed in terms of acting techniques as
opposed to editing techniques.
Figure 1.2: Distance to Editing (node labels: Editing, Editor, Director, Actor)
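The idea of reading closeness off the ontology can be sketched with a toy graph. The example below uses a plain hop count found by breadth-first search as a simple stand-in for relatedness; it is not the score-based metric developed in this thesis. The edges are hypothetical and chosen only so that the ordering matches the discussion above.

```python
from collections import deque

# Hypothetical toy edges; the real Loculus Ontology is far richer than this.
EDGES = [
    ("Editor", "Editing"),
    ("Director", "Shoot"),
    ("Shoot", "Editing"),
    ("Actor", "Acting"),
    ("Acting", "Shoot"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of concept pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distance(graph, start, goal):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


GRAPH = build_graph(EDGES)
DISTANCES = {
    role: distance(GRAPH, role, "Editing")
    for role in ("Editor", "Director", "Actor")
}
```

With these assumed edges the editor is one hop from Editing, the director two and the actor three, which mirrors the ordering described for Figure 1.2.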
Similar reasoning applies to multiple uses to which a given piece of information can be
applied. When the AFTRS Distribution Release Form is used for standard distribution
purposes by the distribution manager, all the data would be of relevance to the manager.
However, when the information contained within the distribution release forms is used by
agents occupying a different role, for example a marketing manager who is using the
information to generate the Wikipedia page or the IMDB entry for the film in question, then
only a subset of the data would be of relevance. The relevance or rather the relative degree of
relevance of specific pieces of information to a given agent should be determinable through
examination of the domain ontology with reference to the role of the agent and the concepts
to which the information pertains.
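A minimal sketch of this role-dependent filtering follows, assuming that each form field has been tagged with the concept it pertains to and that a relatedness score between each role and concept is already available. The roles' scores, the field names and the threshold are all hypothetical illustrations, not values from the Loculus Ontology.

```python
# Hypothetical relatedness scores in [0, 1] between roles and concepts.
RELATEDNESS = {
    ("Distribution Manager", "Distribution"): 1.0,
    ("Distribution Manager", "Credits"): 0.8,
    ("Distribution Manager", "Rights"): 0.9,
    ("Marketing Manager", "Distribution"): 0.3,
    ("Marketing Manager", "Credits"): 0.9,
    ("Marketing Manager", "Rights"): 0.2,
}

# Hypothetical form fields, each tagged with the concept it pertains to.
FORM_FIELDS = {
    "festival_entry_fee": "Distribution",
    "cast_list": "Credits",
    "music_clearance": "Rights",
}


def relevant_fields(role, threshold=0.5):
    """Rank fields by role-to-concept relatedness; keep those at or above the threshold."""
    scored = [
        (RELATEDNESS.get((role, concept), 0.0), field)
        for field, concept in FORM_FIELDS.items()
    ]
    return [field for score, field in sorted(scored, reverse=True) if score >= threshold]


MANAGER_VIEW = relevant_fields("Distribution Manager")
MARKETING_VIEW = relevant_fields("Marketing Manager")
```

Under these assumed scores the distribution manager sees every field, while the marketing manager sees only the subset tied to credits.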
At the beginning of the chapter we said that the information we seek, and the manner in
which we go about seeking it, depend not only on our information needs but also on how
much knowledge we have to begin with on the subject. Where the information needs are for
more complex mental activities such as learning and decision making, information retrieval is
necessary but not sufficient [1]. The thesis delves into the area of information extraction, as
well as information classification, where simple retrieval is necessary but not sufficient. The
novelty of the approach reported in this thesis lies with exploring the importance of user
perspective and how it can be exploited to enhance the process of information extraction and
classification.
1.4.1 Contribution
The first contribution of the research reported in this thesis is in the development of a domain
ontology for the motion picture industry that also models the agents involved in the industry
and their links to the various concepts of the motion picture industry. The ontology also takes
into account the temporal dimension that is so closely associated with the motion picture
industry. This makes it markedly different from most other ontologies currently in existence
as most ontologies do not take the concept of time into account. Most ontologies also do not
explicitly model agent roles; however, both time and people are of paramount importance to
the motion picture industry and therefore need to be incorporated within the ontology. This
makes the ontology atypical.
The second contribution of this thesis project is in the development of a score-based
relatedness metric that is used to determine how closely or distantly two concepts are related.
This work has been published in:
• Choudhury, Sharmin and Raymond, Kerry and Higgs, Peter L. (2008) A rule-based
metric for calculating semantic relatedness score for the motion picture industry.
Workshop on Natural Language Processing and Ontology Engineering at the
IEEE/WIC/ACM Web Intelligence Conference, December 9 – 12, 2008, Sydney, Australia.
The combination of the ontology and the relatedness metric then allowed the thesis to pursue
its third contribution to new knowledge by exploring if and how information retrieval and
classification could be improved if the relative perception of users in regards to the concepts
involved in retrieval and classification is taken into account. The work related to this was
presented at the following conferences:
• Choudhury, Sharmin (2008) Ontology based perspective determination and its
implications for searching. Third Workshop of the HCSNet Next-Generation Search
Technology Priority Area, November 13 2008, Melbourne, Australia
• Choudhury, Sharmin (2008) Improving human computer interaction. Creating Value:
Between Commerce and Commons - CCI International Conference, June 25 – 27,
2008, Brisbane, Australia
A minor contribution of this project is in the form of an XML wrapper for the motion picture
industry that wraps information based on the use to which the information is commonly put.
The work related to the schema was published in:
• Choudhury, Sharmin and Pham, Binh L. and Smith, Robert and Higgs, Peter L.
(2007) Loculus: a metadata wrapper for digital motion picture. Internet and
Multimedia Systems and Applications, August 20 – 22, 2007, Honolulu, Hawaii,
USA.
The wrapper schema was also used by another CCI project and that work is published in:
• Smith, Robert and Pham, Binh L. and Choudhury, Sharmin (2007) A digital artwork
expression language (DAEL). Internet and Multimedia Systems and Applications,
August 20 – 22, 2007, Honolulu, Hawaii, USA.
1.4.2 Broader application of the research
Due to the origins of this research, the primary domain of testing was the motion picture
industry. However, the work being undertaken does have wider application, as many
application domains need to reuse and repurpose information and have agents in different roles
seeking different answers to the same question.
For example, suppose a health professional is executing a search on available health records
to send out information and encourage patients at high risk of a certain disease to make
an appointment with their doctors. Let us say that the high-risk patients are those with high blood
pressure. However, medical records available for such searches are rarely complete. In
Australia, at the federal government level, the only detailed medical record available
electronically is through the pharmaceutical subsidiary scheme. As such, when searching
over such records at the federal level it is beneficial to have the system be able to recognise
that a person who is on medication for high blood pressure, is most likely to have high blood
pressure even though no available medical history record definitively says the patient has
high blood pressure. The medical field is one where a large number of
ontologies already exist, and it is also a field where perspective matters greatly. The perspectives of
specialists and GPs, and the perspectives of patients who have just been diagnosed and
patients who were diagnosed some time ago, differ greatly. If these differences are taken
into account and used to enhance the search experience, the result could be very beneficial. This
application of the work is of interest to the information searching research community in
general, and the novelty of the approach attracted interest when the work was presented to the
community:
• Choudhury, Sharmin (2008) Ontology based perspective determination and its
implications for searching. Third Workshop of the HCSNet Next-Generation Search
Technology Priority Area, November 13 2008, Melbourne, Australia
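The blood pressure example in this section amounts to a simple inference over incomplete records: a prescription stands in for a missing diagnosis. The sketch below shows one way this could look, assuming a lookup table that links medications to the conditions they usually treat. The medication names, patient records and table are invented for illustration and do not come from any real health data set or medical ontology.

```python
# Hypothetical mapping from a medication to the condition it is usually prescribed for.
MEDICATION_IMPLIES = {
    "antihypertensive_x": "high blood pressure",
    "statin_y": "high cholesterol",
}

# Hypothetical, incomplete records: prescriptions are known, diagnoses often are not.
PATIENTS = [
    {"id": 1, "diagnoses": {"high blood pressure"}, "medications": set()},
    {"id": 2, "diagnoses": set(), "medications": {"antihypertensive_x"}},
    {"id": 3, "diagnoses": set(), "medications": {"statin_y"}},
]


def likely_conditions(patient):
    """Combine recorded diagnoses with conditions inferred from medications."""
    inferred = {
        MEDICATION_IMPLIES[name]
        for name in patient["medications"]
        if name in MEDICATION_IMPLIES
    }
    return patient["diagnoses"] | inferred


def find_patients(condition):
    """Return the ids of patients who have, or probably have, the condition."""
    return [p["id"] for p in PATIENTS if condition in likely_conditions(p)]


HIGH_BP_PATIENTS = find_patients("high blood pressure")
```

A search on recorded diagnoses alone would return only patient 1; with the inferred conditions, patient 2 is found as well.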
1.4.3 Structure of Thesis
Chapter 2 of this thesis presents the literature review, where we explore three broad topics.
Firstly, we look at what exists in terms of information management for the motion
picture industry. Under this topic we cover certain information models, metadata
standards and schemas, and existing ontologies, and explore the benefits of using
ontologies for knowledge management. The second topic deals with the idea of
similarity/semantic relatedness, how to measure it, and why the ability to measure semantic
relatedness is important. Lastly, we undertake an analysis of information extraction: what it
means, what work is being conducted in the field, and the groundwork that led to our own
thinking in relation to user-perspective-based information extraction.
Chapter 3 presents the research question and outlines the methodology used in this research.
We also present the workflow undertaken by the project to conduct the work being reported
in the thesis.
In Chapter 4 the Loculus Ontology for the motion picture industry is presented. We begin by
explaining the axioms that govern the ontology, as well as detail the sources upon which we
based the construction of relationships between concepts and the origins of the concepts
themselves. We then present excerpts from the ontology to highlight key features and
construction idioms. In addition, we explain the three axes along which the concepts of the
ontology are positioned. The name Loculus comes from Ancient Rome; it literally means
“little place” and was used in a number of senses, including a satchel that formed part of a
Roman legionary’s luggage. The name was chosen to reflect the idea that the Loculus
Ontology is a little bag in which you keep the concepts of the industry.
In Chapter 5 we present the semantic relatedness metric, the rules and the reasoning behind the
rules. We also provide example calculations, as well as presenting applications of the
metric.
In Chapter 6 we give details of the Loculus System, including discussions on all modules and
the technology used to develop the system. We also present the Loculus schema, which is
used internally within the system for information storage and manipulation.
In Chapter 7 we discuss the evaluation of the ontology, the metric and the system. We also
discuss the results of the evaluation, as well as the findings of the research being reported in
this thesis. In addition, we discuss the applicability of the findings of the research in domains
other than the motion picture industry.
Lastly, we conclude in Chapter 8 by revisiting the research questions first presented in
Chapter 3 and discussing in summary format, how the subsequent chapters addressed the
research questions.
2 Literature Review
“I knew you weren't suited for literature.” – Gonzo, The Muppet Christmas Carol
The literature reviewed during the course of this thesis formed the cornerstone upon which
the work was built. It justified the choice of models, methods and technology, and it served as a
motivator for the research by identifying gaps in knowledge and opening areas for
research and exploration.
One of the first areas we explored in the literature review is what already exists for the
management of information; this is presented in Sections 2.1 and 2.2. In Section 2.1 we will
review the development of information and metadata models that have led to metadata
standards and schemas as semantic containers for information management. In Section 2.2 we
will review information management based on ontologies that have often been deployed in
conjunction with the semantic containers discussed in Section 2.1 to enable higher forms of
semantic manipulation. Another topic we address is similarity and semantic
relatedness: how to measure them and why such measures can be useful. The
literature on semantic relatedness is reviewed in Section 2.3. Lastly, we undertake
an analysis of information exploitation and what potential for contribution exists in this field;
this is presented in Section 2.4.
2.1 Information Models and Metadata Models
As the motivation for our research was to enable better information exploitation for the motion
picture industry, a logical starting point was to examine information models. An information
model is an abstract but formal representation of entities, including their properties,
relationships and the operations that can be performed on them [18]. Metadata is data
describing data. Metadata models are a type of information model that defines metadata for a
given type of information. Metadata schemas define how data is to be marked up using
metadata, in a mark-up language such as XML.
A number of existing information and metadata models, such as MPEG-21 and MPEG-7, have been
put forward for various purposes within the motion picture domain, including archival
purposes, commercial business interaction and rights management. The question motivating our
literature search was: to what extent can information and metadata models aid in the better
management and exploitation of information for the motion picture industry? Therefore, we
began by examining a couple of existing information models and evaluated not just these models
but the capabilities and limitations of information models in general. Metadata schemas often
become standardised for use within domains. We also evaluated a number of these schemas and
standards to gauge the full effectiveness of information and metadata models.
2.1.1 Existing Models
Information models are used extensively for archival activities, and these activities include
the archiving of motion pictures themselves, although the archival community prefers the more
general term “screen content”, which also encompasses screen products such as video games and
new media artwork. The prevalent information model within the archival community is the IFLA
Functional Requirements for Bibliographic Records (FRBR) model [19].
The FRBR is an entity-relationship model. There are three groups of entity relationships. The
entities in the first group, as depicted in Figure 2.1, represent the different aspects of user
interests in the products of intellectual or artistic endeavour. The entities defined as work (a
distinct intellectual or artistic creation) and expression (the intellectual or artistic realization
of a work) reflect intellectual or artistic content. The entities defined as manifestation (the
physical embodiment of an expression of a work) and item (a single exemplar of a
manifestation), on the other hand, reflect physical form [19].
The relationships depicted in Figure 2.1 indicate that a work can be realized through one or
more expressions (hence the double arrow on the line that links work to expression). An
expression, on the other hand, is the realization of one and only one work (hence the single
arrow on the reverse direction of that line linking expression to work). An expression can be
embodied in one or more manifestations; likewise a manifestation can embody one or more
expressions. A manifestation, in turn, can be exemplified by one or more items; but an item
can exemplify one and only one manifestation [19].
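The cardinalities just described can be illustrated with a small sketch. The Python below is only one possible encoding, not part of the FRBR specification [19]: the class names mirror the four Group 1 entities, but the attribute names, the constructor behaviour and the sample film data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Work:
    title: str
    # A work is realized through one or more expressions.
    expressions: List["Expression"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Expression:
    label: str
    work: Work  # an expression realizes one and only one work
    manifestations: List["Manifestation"] = field(default_factory=list, repr=False)

    def __post_init__(self):
        self.work.expressions.append(self)


@dataclass(eq=False)
class Manifestation:
    label: str
    expressions: List[Expression]  # a manifestation embodies one or more expressions
    items: List["Item"] = field(default_factory=list, repr=False)

    def __post_init__(self):
        if not self.expressions:
            raise ValueError("a manifestation must embody at least one expression")
        for expression in self.expressions:
            expression.manifestations.append(self)


@dataclass(eq=False)
class Item:
    identifier: str
    manifestation: Manifestation  # an item exemplifies one and only one manifestation

    def __post_init__(self):
        self.manifestation.items.append(self)


# Hypothetical sample data: one work, two expressions, one shared manifestation.
work = Work("Example Film")
theatrical_cut = Expression("theatrical cut", work)
directors_cut = Expression("director's cut", work)
dvd = Manifestation("two-disc DVD release", [theatrical_cut, directors_cut])
library_copy = Item("library copy 1", dvd)
```

The single-valued fields (`Expression.work`, `Item.manifestation`) encode the "one and only one" constraints, while the lists encode the "one or more" relationships.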
Figure 2.1: Entities and “primary” relationships of the FRBR model [19]
[Diagram: Work is realized through Expression; Expression is embodied in Manifestation;
Manifestation is exemplified by Item.]
The entities in the second group, as depicted in Figure 2.2, represent those responsible for the
intellectual or artistic content, the physical production and dissemination, or the
custodianship of the entities in the first group. The entities in the second group include person
(an individual) and corporate body (an organization or group of individuals and/or
organizations). Figure 2.2 depicts the type of “responsibility” relationships that exist between
entities in the second group and the entities in the first group [19].
Figure 2.2 also indicates that a work can be created by one or more persons and/or one or
more corporate bodies. Conversely, a person or a corporate body can create one or more
works. An expression can be realized by one or more persons and/or corporate bodies; and a
person or corporate body can realize one or more expressions. A manifestation can be
produced by one or more persons or corporate bodies; a person or corporate body can
produce one or more manifestations. An item can be owned by one or more persons and/or
corporate bodies; a person or corporate body can own one or more items [19].
Figure 2.2: Entities and “responsibility” relationships [19]
[Diagram: Work is created by, Expression is realized by, Manifestation is produced by and Item
is owned by Person and/or Corporate Body.]
The entities in the third group, as depicted in Figure 2.3, represent an additional set of entities
that serve as the subjects of works. The group includes concept (an abstract notion or idea),
object (a material thing), event (an action or occurrence), and place (a location). Figure
2.3 depicts the “subject” relationships between entities in the third group and the work entity
in the first group.
Figure 2.3 indicates that a work can have as its subject one or more than one concept, object,
event, and/or place. Conversely, a concept, object, event, and/or place can be the subject of
one or more than one work. The diagram also depicts the “subject” relationships between
work and the entities in the first and second groups. The diagram indicates that a work can
have as its subject one or more than one work, expression, manifestation, item, person, and/or
corporate body. The FRBR model has been described as a conceptual framework for
developing metadata systems suitable for effective indexing [19].
Figure 2.3: Entities and “subject” relationships [19]
[Diagram: Work has as subject Work, Expression, Manifestation and Item; Person and Corporate
Body; Concept, Object, Event and Place.]
The first major implementation of the IFLA FRBR model was the AustLit project [20].
AustLit is a database that provides information on hundreds of thousands of creative and
critical Australian literature works relating to more than 100,000 Australian authors and
literary organisations. Its coverage spans 1780 to the present day [20]. The AustLit project
also used the “A Boring Core” (ABC) model [21]. ABC is a metadata model developed
within the Harmony International digital library project to provide a common conceptual
model to facilitate interoperability between metadata ontologies from different domains; we
will discuss ontologies further in Section 2.2. The AustLit project augmented the FRBR
bibliographic description model with the ABC event model, thus allowing the FRBR entity
work to have a creation event, the FRBR entity expression to have a realization event and the
manifestation entity to have an embodiment event. AustLit also went on to extend the FRBR
model to represent agents, including such things as birth and death events for agents as well
as others [20]. The AustLit project showed that implementing the models presents
significant challenges but is achievable, cost effective, offers many benefits to practitioners
and should be considered by a range of information providers [20].
The models mentioned so far do not have any concept of use, certainly none of context,
which is something that is important in the motion picture industry [16]. They were designed
for a specific purpose and they perform that purpose well; they are great information
containers but do not appear to have any higher level semantics needed for more advanced
forms of information exploitation. For example, the lack of context is a significant flaw.
Nonetheless, the IFLA FRBR model has been incorporated within the Open Digital Rights
Language (ODRL) and in that setting the IFLA FRBR model has some sense of use and
context. The ODRL is a Digital Rights Expression Language (DREL) model that links rights
with artifacts and agents [22]. DRELs, sometimes shortened to RELs (Rights Expression
Languages), are machine-processable languages used for Digital Rights Management [23].
Figure 2.4 shows the ODRL model. As can be seen, the ODRL model uses the IFLA FRBR
model to define the concept of content. In the process, the ODRL model introduces a limited
concept of use and context because of the links content has with Parties (Agents) and Rights.
However, the context is limited to rights and how they pertain to the use of the content by
parties. Even combined with ODRL, IFLA FRBR is not expressive enough to address the
majority of the issues being faced by the motion picture industry.
Figure 2.4: ODRL Model [22]
2.1.2 Metadata Schemas and Standards
There is no metadata schema or standard for the motion picture industry as a whole.
However, there have been many metadata schemas and standards for the product of the
motion picture industry, that is, the motion picture itself. More accurately it should be said
that, when the motion picture is encoded into a digital video file, metadata schemas and
standards exist to wrap metadata around the digital video object. This metadata can then be
used to distribute, discover and utilise the digital video object. Schemas and standards also
exist for the purpose of distribution, archiving and preserving the archived digital objects.
Almost all the schemas related to the motion picture industry were intended for one of these
purposes.
The metadata schemas and standards for digital video objects include the MPEG-7
multimedia content description standard [24, 25], the MPEG-21 Digital Item Declaration
Language (DIDL) [26], the MPEG-21 Rights Expression Language (REL) [27], the
Metadata Encoding and Transmission Standard schema (METS) [28] and the Metadata Encoding
and Transmission Standard Audio-Visual schema (METS AV) [29]. The MPEG-21 DIDL,
METS and METS AV are some of the different standards that exist for object
markup, while some form of REL is used for rights markup. We will explore these standards
in more detail in the remainder of this section.
MPEG-7, formally named "Multimedia Content Description Interface", is a standard for
describing the multimedia content data that supports some degree of interpretation of the
information meaning, which can be passed onto, or accessed by, a device or a computer
program. MPEG-7 is not aimed at any one application in particular; rather, the elements that
MPEG-7 standardizes support as broad a range of applications as possible [24, 25]. That is to
say, MPEG-7 can be used to describe the content and thus provide a metadata wrapper
for previous MPEG standards such as MPEG-1 and MPEG-2 which were used for encoding
multimedia objects [24]. The objectives of MPEG-7 were to facilitate fast and efficient
searching, filtering and content identification, describe the content characteristics including
audiovisual information and most importantly provide independence between description and
the information itself [24]. In a nutshell, it provided a metadata wrapper for a given
multimedia object.
The MPEG-21 DIDL and REL form part of the MPEG-21 Framework, with the MPEG-21
DIDL expressing in XML the concept of a Digital Item and the MPEG-21 REL expressing
the rights associated with that Digital Item in eXtensible Rights Markup Language
(XrML) [27, 30]. XrML is based on XML and describes rights, fees and conditions together
with message integrity and entity authentication information [30].
METS has been developed by the Library of Congress to provide an XML document format
for encoding metadata necessary for both management of digital library objects within a
repository and exchange of such objects between repositories [28]. A METS document
consists of seven major sections [28]:
• METS Header - The METS Header contains metadata describing the METS document
itself, including such information as creator, editor, etc.
• Descriptive metadata - The descriptive metadata section can point to descriptive
metadata external to the METS document or contain internally embedded descriptive
metadata, or both. Multiple instances of both external and internal descriptive metadata
can be included in the descriptive metadata section.
• Administrative metadata - The administrative metadata section provides information
regarding how the files were created and stored, intellectual property rights, metadata
regarding the original source object from which the digital library object derives, and
information regarding the provenance of the files comprising the digital library object
(i.e., master/derivative file relationships, and migration/transformation information). As
with descriptive metadata, administrative metadata can be either external to the METS
document or encoded internally.
• File Section - The file section lists all files containing content which comprise the
electronic versions of the digital object. The <file> elements can be grouped within
<fileGrp> elements, to provide for subdividing the files by object version.
• Structural Map - The structural map is the heart of a METS document. It outlines a
hierarchical structure for the digital library object, and links the elements of that structure
to content files and metadata that pertain to each element.
• Structural Links - The Structural Links section of METS allows METS creators to
record the existence of hyperlinks between nodes in the hierarchy outlined in the
Structural Map. This is of particular value in using METS to archive Websites.
• Behavior³ - The behaviour section can be used to associate executable behaviours with
content in the METS object. Each behaviour within a behaviour section has an interface
definition element that represents an abstract definition of the set of behaviours
represented by a particular behaviour section. Each behaviour also has a mechanism
element which identifies a module of executable code that implements and runs the
behaviours defined abstractly by the interface definition.
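To make the seven-section layout concrete, the following sketch builds an empty METS skeleton with Python's standard xml.etree.ElementTree module. The element names and namespace are those of the METS schema; the helper names (`q`, `build_mets_skeleton`) and the file identifiers are assumptions made for illustration. The result is a structural outline only: it omits required attributes and child elements and would not validate against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# Element names of the seven major METS sections, in document order.
SECTIONS = [
    "metsHdr",      # METS Header
    "dmdSec",       # Descriptive metadata
    "amdSec",       # Administrative metadata
    "fileSec",      # File Section
    "structMap",    # Structural Map
    "structLink",   # Structural Links
    "behaviorSec",  # Behavior
]


def q(name):
    """Return the namespace-qualified tag for a METS element name."""
    return f"{{{METS_NS}}}{name}"


def build_mets_skeleton(file_ids):
    """Build an unvalidated METS outline with one <fileGrp> listing the given files."""
    root = ET.Element(q("mets"))
    sections = {name: ET.SubElement(root, q(name)) for name in SECTIONS}
    file_grp = ET.SubElement(sections["fileSec"], q("fileGrp"))
    for file_id in file_ids:
        ET.SubElement(file_grp, q("file"), {"ID": file_id})
    return root


# Hypothetical file identifiers, used only to populate the file section.
skeleton = build_mets_skeleton(["FILE001", "FILE002"])
section_names = [child.tag.split("}")[1] for child in skeleton]
xml_text = ET.tostring(skeleton, encoding="unicode")
```

Printing `xml_text` shows the nesting of `<mets:file>` inside `<mets:fileGrp>` inside `<mets:fileSec>`, which is the grouping described in the File Section entry above.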
The METS AV extension plugs into the administrative section of a METS document and
allows for audio-visual objects to be described with rich technical detail [29]. In theory, the
METS AV extension can also be used with MPEG-21, thus giving MPEG-21 the
rich technical descriptiveness of METS + METS AV. In addition, both METS and MPEG-21
use Dublin Core (DC) for their descriptive markup [31]. DC is a metadata standard
developed by the Dublin Core Metadata Initiative that allows for the XML-based
descriptive markup of artifacts. In this way, METS and MPEG-21 can share functionality, but
they remain separate and intended for different purposes, namely, METS for archiving and
MPEG-21 for distribution, discovery and access of digital objects.
The METS administrative section can have links to external rights documents, but METS itself
does not provide a vocabulary or syntax for expressing rights [23]. However, METS, like
MPEG-21, can be augmented with external schemas, including MPEG-21 REL and ODRL. In
addition, METS Rights is an extension to METS that can be used to capture minimum rights
information for artifacts [23].
The last XML schema we consider is the Bitstream Syntax Description Language (BSDL) [32]. The
BSDL is designed to specify the document model of a multimedia bitstream, and the
resulting description can be transformed to dynamically adapt multimedia data to
network and terminal capabilities [32]. It is completely focused on the display and use of
the product.
2.1.3 Summary
Ultimately, the problem with these metadata schemas and standards is that while they provide
a semantic layer and a semantic container for information, the layer is not semantic
enough. As a result, many projects aim to combine these standards, or at least parts of the
standards, with ontologies; these projects will be discussed in Section 2.2.2.
³ Although this thesis predominantly uses British spelling, METS uses American spelling for its tag definitions.
2.2 Ontology
The concept of ontology originated in philosophy, where ontology is the study of the kinds
of things that exist [33]. The idea of ontology as it exists within computer science has its
origins there. However, in practice computer scientists use the term more loosely than its
philosophical origins would dictate. In computer science, ontologies are content theories
about the sorts of objects, properties of objects, and relations between objects that are
possible in a specified domain of knowledge [33]. As a philosophical construct, ontologies are
not unknown within the film domain. In 1979 Cavell [34] first spoke of an ontology for film and
recently Wood [4] engaged in a discussion of film ontology in terms of film philosophy. From a
philosophical point of view, ontologies are complex constructs and their use within computing
is inevitably viewed as an attempt at complex artificial intelligence. However, the most
prevalent use of ontology within computer science is simply to provide a semantic layer that is
more sophisticated than the semantic layer that is possible using metadata alone. Indeed, as
mentioned in Section 2.1.3, most projects opt to build upon the semantic layer provided by
metadata by combining metadata with ontologies.
However, it would be erroneous to assume that all subsets of the computer science
community view and use ontologies in similar manners. They do not. As such, in this section,
we first look at the different views that different subsets of the computer science community
hold on ontology. We then explore existing ontologies and their applications, in order to gauge
the state of knowledge that currently exists.
2.2.1 The Use of Ontology within Computer Science
The earliest users of ontology within the computer science discipline were the Artificial
Intelligence (AI) communities. They were the first to adopt the concept of ontology from
philosophy and adapt it to suit their purposes. The difference is that while in philosophy
ontology is about existence, in computer science it is primarily about meaning as well as
existence [35]. To be precise, an ontology can tell what kinds of things exist in the domain of
some system, how these things can be interrelated and what they mean [35]. Building upon
this, in AI, an ontology refers to an engineering artifact, constituted by a specific vocabulary
used to describe a certain reality, plus a set of explicit assumptions regarding the intended
meaning of the vocabulary words [36]. This is because within the AI community ontology
has largely come to mean one of two related things: 1) an ontology is a representation
vocabulary and 2) ontologies are quintessentially content theories [33]. This view, however,
is not shared by all computer science disciplines.
While the AI community was the first to adopt the concept of ontology into computer
science, ontologies have become an important tool within the Knowledge Modelling (KM)
communities. The prevalent view within the KM community is that an ontology adds a layer of
semantics that the computer can readily process. This is why ontologies are being
used for data integration across different communities [37], and sometimes even within a single
community when that community produces heterogeneous structured data. It is the
KM community that closely associates metadata and ontologies, as it views both as
methods of providing a semantically rich container for knowledge content [38]. At the heart of
the KM community's definition is that an ontology is some formal description
of a domain of discourse, intended for sharing among different applications, and expressed in a
language that can be used for reasoning [39].
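A minimal sketch can show what "expressed in a language that can be used for reasoning" means in practice. The Python below is not any particular ontology language or reasoner: it stores a few statements as subject-predicate-object triples and derives facts that were never stated explicitly. The concept names (StuntPunch, Stunt, FilmedAction), the predicate names and the instance scene_12 are hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mini-ontology: two subclass statements and one instance statement.
ONTOLOGY: Set[Triple] = {
    ("StuntPunch", "subClassOf", "Stunt"),
    ("Stunt", "subClassOf", "FilmedAction"),
    ("scene_12", "instanceOf", "StuntPunch"),
}


def superclasses(concept: str, triples: Set[Triple]) -> Set[str]:
    """Return every class reachable from `concept` by following subClassOf links."""
    found: Set[str] = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "subClassOf" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def types_of(instance: str, triples: Set[Triple]) -> Set[str]:
    """Return the stated classes of `instance` plus all inferred superclasses."""
    direct = {o for s, p, o in triples if s == instance and p == "instanceOf"}
    inferred = set(direct)
    for concept in direct:
        inferred |= superclasses(concept, triples)
    return inferred


scene_types = types_of("scene_12", ONTOLOGY)
```

Only one class is stated for scene_12, yet the transitive subclass rule lets a program conclude that it is also a Stunt and a FilmedAction. Metadata alone would record the stated value but not support this kind of inference.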
However, the AI community takes issue with the term “ontology” being applied to activities
like conceptual analysis and domain modelling carried out by means of standard
methodologies [36]. It maintains that true ontologies are not simply domain models or data
models, but that ontological analysis clarifies the structure of knowledge [33] and that a
domain ontology forms the heart of any system of knowledge representation for that domain
[33].
Our view of ontology falls closer to that held by the KM community than to that of the AI
community. This is simply because our chief interest is in modelling the knowledge of the
motion picture industry so as to aid its practitioners in the management
of their information. In this, we share the goals of the KM community, which utilises ontologies
for the management of information and knowledge by enabling a semantic layer to be added
to applications that is richer than the semantic layer provided by metadata alone but that does
not have the more cognitive capabilities of AI.
Ontologies are not without limitations. While much has been made of the potential benefits of
ontologies, with most regarding ontologies as central building blocks of the semantic web
and other semantic systems, the number of ontologies that are actually used is rather limited
[40]. Hepp identifies five reasons that hinder the development of relevant
ontologies [40]:
1) Ontology engineering lag versus conceptual dynamics: how easily can the ontology
evolve?
2) Resource consumption: is the cost of producing the ontology justified by the
benefits of the ontology?
3) Communication between creators and users: is the ontology understood by the domain
practitioners?
4) Incentive conflicts and network externalities: are the agents who invest in the ontology also
benefiting from the ontology?
5) Intellectual property rights: is the creation of the ontology infringing on property rights?
All of Hepp’s reasons are valid, but the reason most relevant to ontology design is
the communication between creators and users [40]. Ontologies are usually created by
computer scientists and conform to the “prettiness” of computer science theories and
principles. However, the resultant ontologies might not be easily comprehended by agents of
the domain [40]. Feeling this disconnect, the users are reluctant to use and embrace the
ontology. There is also another aspect to this problem: the “lost in translation” problem. Is
the ontology that has been created an accurate reflection of the knowledge structure of the
domain? Or is it simply a reflection of the knowledge structure of the creator of the
ontology, assuming the ontology creator has firsthand knowledge of the domain, or of the
knowledge structure as perceived by the creator of the ontology? This is a major problem,
compounded by the fact that domain experts do not have the skills to interpret an ontology as
is, i.e. an ontology represented in a formal ontology language is just gibberish to most domain
experts [40].
Keeping an ontology simple and easily graspable by domain experts also addresses
Hepp’s other criticism of ontologies, that of adaptability. If the understanding of the ontology
comes naturally to the domain practitioners, they should not only be more inclined to use it
but should find it easier to adapt as their needs vary. This in turn addresses Hepp’s
concerns regarding incentivisation. Hepp’s remaining two concerns, regarding intellectual
property rights and cost-benefit analysis, are contingent upon individual
circumstances, not only in regard to the ontology but also the organisation within the domain
that seeks to use the ontology. It is beyond the scope of our research to do a cost-benefit
analysis for an ontology for the motion picture industry, but we do need to be mindful that when
constructing the ontology we do not infringe on the intellectual property rights of any
individual or organisation.
In the discussion about interpretation of ontologies, it must be noted that ontologies are
governed by axioms which are used in order to express other relationships between concepts
and to constrain their intended interpretation [36], e.g. an “is associated with” relationship
denotes a vague and nebulous relationship. These axioms define what concepts can and
cannot be included in the ontology, as well as provide the rules by which the concepts are to
be linked to each other. In addition, the axioms give guidance as to how the rules are to be
interpreted. Axioms can be considered objects as well, and for large-scale ontology
modelling it might be beneficial to do so [41], but they are generally kept
separate from ontologies. Thus, when attempting to understand an ontology, reference must
be made to both the ontology that models the knowledge and the axioms that govern the
modelling. All this makes ontologies difficult to evaluate as a standalone artifact. However,
when that artifact is used within an information system, it is much easier to evaluate the
ontology through application, simply because application of the ontology is much easier for
the domain expert to understand than the ontology itself. Therefore, it would appear that
ontologies can best be evaluated through application. This is something to be mindful of.
Developing an ontology requires a methodology, and the questions to keep in mind
are: what are the boundaries of the ontology? What types of ontology are there? What is the
structure of the ontology [42]? There is also a need to distinguish three main kinds of
information (ontological, quasi-ontological and non-ontological) as well as three types of
ontologies (descriptive, formal and formalized) [42]. In addition, there has been work done in
the past to address the issues of categorisation in relation to ontology capture and methods of
handling ambiguous terms [43]. These studies all address important issues in ontology
design and development. As such, they inform the development of new ontologies.
At this point, it is important to consider the connection between information models,
metadata models and ontologies. As mentioned in Section 2.1, an information model is an
abstract but formal representation of entities including their properties, relationships and the
operations that can be performed on them [18]. Metadata models are a type of information
model that defines metadata for a given type of information. An ontology, as we have seen, is
a formal representation of knowledge used for reasoning. An ontology is a model, but it is not
an information model as such; it is a knowledge model. As such, the difference between an
ontology and information models, which include metadata models, is the difference that
exists between information and knowledge. Bellinger states that information is data (which is
simply defined as symbols) that are processed to be useful and provides answers to "who",
"what", "where", and "when" questions, while knowledge is the application of data and
information and answers "how" questions [44].
In short, an ontology, which often has close ties with an information model, is used to answer
questions of “how” while information models and metadata models, describe the “what”,
“where” and “when”. By enabling computers to process the “how” question, ontologies open
the door to higher order reasoning and information exploitation that goes beyond the
simplistic “what”, “where” and “when” questions, which are all that information models and
their metadata model subset can process.
2.2.2 Existing Ontologies
Just as there is no metadata standard for the motion picture industry as a whole, there is also
no ontology for the motion picture industry as a whole. However, many of the metadata
projects that deal with the production process of the motion picture industry and the motion
picture artifact itself have been extended to include ontologies. The MPEG-7 and MPEG-21
projects [26], discussed in Section 2.1.2, in particular have led to a number of
ontologies being developed for the multimedia object that is a motion picture artifact. Before
we go into the details of these projects, it must be emphasised that these projects treat
the motion picture artifact as a generic multimedia object, just as the metadata schemas they
extend do. They do not identify the motion picture object as a motion picture object, which in
itself leads to a loss of context, one of the problems we are seeking to address given
the increased importance of context [16].
The metadata standard that has been associated most with ontology development is the
MPEG-7 metadata standard. Hunter, who was involved with the development of the standard,
built an MPEG-7 Ontology as part of the foundation of the semantic web [45]. Hunter
argued that, while XML Schema based MPEG-7 has been ideal for expressing the syntax,
structural, cardinality and data-typing constraints required by MPEG-7, XML is not enough
to make MPEG-7 accessible, re-usable and interoperable with other domains [45]. As such
the semantics of the MPEG-7 metadata terms also need to be expressed in an ontology using
a machine-understandable language [45]. This would facilitate the sharing of multimedia
objects over the semantic web. From the point of view of the motion picture industry,
products of the motion picture industry that are described in MPEG-7 can use this ontology
for better distribution. However, this ontology provides no mechanisms for describing the
process by which the motion picture or rather the multimedia object was created.
Hunter was also involved with the Harmony project, which brought together work done on
MPEG-7, among other standards, to define overlapping descriptive vocabularies for
annotating multimedia content [25]. The chief contribution of the Harmony project is the
ABC Data Model and Ontology [21, 46], which was the result of investigation into a general
approach towards metadata interoperability and its particular application in multimedia
digital libraries [47]. In order to facilitate interoperability, the concepts and relationships
described in the ABC model could be used to guide the development of community-specific
vocabularies [47]. The individual communities could then use formalisms such as
RDF to express the possibly complex relationships between the ABC model and their
community-specific vocabularies [47].
The ABC model could be used to describe an event with agents as ‘actors’ for an event [47]
and, in this way, the ABC model could be extended to form a metadata schema and ontology
set for the motion picture domain. However, such an extension can restrict the natural
modelling of the domain, getting us back to the problem identified by Hepp regarding the
dissonance between users and creators of ontology [40]. In addition, the ABC ontology is not
the only extensible core ontology associated with multimedia content. Figure 2.5 illustrates
the ABC top-level class hierarchy with properties.
Figure 2.5: ABC Class Hierarchy with Properties [48]
The CIDOC Conceptual Reference Model (CRM) is a high-level ontology to enable
information integration for cultural heritage data and their correlation with library and archive
information [49]. It is a property-centric ontology that is an international standard for the
controlled exchange of cultural heritage information [49]. It was built as a common
knowledge sharing platform with the specific domain of application being the cultural
institutes such as museums, archives etc. Like the ABC model, the CRM ontologies are
extendable and extension is encouraged. Indeed work has started to integrate the IFLA FRBR
Model mentioned in Section 2.1.1 and the CIDOC CRM model [50] and Hunter combined it
with MPEG-7 to describe multimedia in museums [51]. However, once again, while this
ontology can inform the development of a domain ontology for the motion picture industry,
extending it to make the domain ontology would tie the domain ontology to the CIDOC CRM
for no beneficial reason.
The chief benefit gained from extending either the ABC ontology or the CIDOC CRM to
construct an ontology for the motion picture domain is interoperability. However, this raises
the question: with whom is interoperability facilitated if either of those two ontologies is
extended? The short answer is other users of the ontology, which at present
are mostly archival institutes for the ABC ontology and museums for the CIDOC CRM
ontology. Archiving is one activity far down the timeline that, while it needs to be taken note
of, cannot dictate the development of the ontology as a whole. Therefore, it is better to
construct a free-flowing ontology, unrestricted by the constraints of a core ontology, to best
capture the nuances of the domain, especially since this does not sacrifice interoperability as
such. Ontologies can be mapped from one to another [52, 53] and therefore, should the need
arise, a transition ontology can always be written that maps a domain ontology for the motion
picture industry onto either the ABC ontology or the CIDOC CRM ontology to facilitate
communication between the motion picture industry and a given archival institute that might
be using one of those ontologies.
Other research on combining ontologies with metadata standards includes
Tsinaraki's research in ontology-driven management [54], indexing [55] and user preference
models for MPEG-7/21 [56]. Arndt is another researcher who added formal semantics to
MPEG-7 [57]. There is also OREL, an ontology-based rights expression language
that allows not only users but also machines to handle digital rights at the semantic level [58].
However, while these ontologies add an extra layer of semantics to the metadata standards,
the fact that the metadata standards were focused on generic multimedia objects means that
none of them is a substitute for a domain ontology that models both the process and the
product of the motion picture industry.
There are other ontologies that can be seen to be associated with the motion picture industry,
such as the Internet Movie Database (IMDB) Ontology that has been put forward to vastly
improve the knowledge representation in IMDB [59]. There are also models for specific types
of motion picture products, such as the model for archiving and searching historical
audio-video materials, which supports traditional archival activities as well as some advanced
applications such as video summarisation and speech recognition [60]. There are even models
for describing scalable and interactive TV services [61]. However, a domain ontology is sorely
lacking, thus leaving open a gap in knowledge that needs to be addressed.
2.2.3 Ontologies in other domains
The domain that has seen the most work in terms of ontology is the medical domain, which
has some of the most extensive ontologies defining multiple aspects of the domain. For
example, the Unified Medical Language System (UMLS) is a repository of biomedical
vocabularies developed by the US National Library of Medicine [62]. UMLS integrates over
2 million names for some 900 000 concepts from more than 60 families of biomedical
vocabularies, as well as 12 million relations among these concepts, and is one of many
biomedical resources available to researchers [63]. Other biomedical resources include the
GenBank sequence database, which incorporates publicly available DNA sequences of more than
105 000 different organisms, primarily through direct submission of sequence data from
individual laboratories and large-scale sequencing projects [64]; genome databases like
MGD, the Mouse Genome Database [65]; and integrated resources such as RefSeq and
LocusLink [66]. In addition, SNOMED CT is a standardised healthcare terminology
including comprehensive coverage of diseases, clinical findings, therapies, procedures and
outcomes [67]. It provides the core general terminology for the electronic health record
(EHR) and contains more than 357,000 concepts with unique meanings and formal logic-
based definitions organised into hierarchies [67]. However, the abundance of resources is not
without its issues.
One common denominator for all of these resources is terminology, i.e. the names of genes,
proteins, diseases, molecular functions, etc., in biomedical texts and the corresponding entries
in the various controlled vocabularies and nomenclatures associated with these resources
[63]. However, identifying terminology as a key integrating factor for biomedical
resources does not imply that all resources have adopted standard vocabularies, which,
where they exist, would make these resources interoperable [63]. This has led to projects
such as TAMBIS, which addresses the specific issue of integrating disparate resources for
bioinformatics through a model of domain knowledge [68]. Bodenreider, however,
presented a different approach to information integration through terminology
integration: the Unified Medical Language System (UMLS) [63]. There is a lesson here for
us, in that we must be mindful of the terminology in use within ontologies for the motion
picture domain.
In other types of ontology-related research, Couto explored the benefit of
comparing proteins based on their biological role rather than their sequence, using a
measure that considers all the information in the graph structure of the Gene Ontology rather
than regarding it as a hierarchy [69]. This is noteworthy because it speaks to the reality that
ontologies are graphs and not hierarchies.
2.2.4 Ontology Implementation Languages
The last topic that needs to be covered for ontologies is how they are implemented. There are
a number of languages in which ontologies are implemented, most of which are
designed for use on the web. This is because much of the ontology work that has taken place
in the KM communities has revolved around the semantic web and has built on work that was
conducted around metadata. These knowledge representation languages are often based on
XML, again for ease of use on the web, and include languages like the Resource Description
Framework (RDF) [70], DARPA Agent Markup Language (DAML), which itself is based
on RDF [71], Ontology Inference Layer (OIL) [72] and DAML+OIL [73]. RDF
specifications were originally designed as a metadata data model [70]. DAML had its origins
in attempts to construct machine-readable representations of knowledge for the Web and thus
contributed directly to the emergence of the semantic web [71]. OIL was the forerunner for
an ontology infrastructure for the web [72].
The most prevalent languages of implementation within the KM communities are the Web
Ontology Language (OWL) [74] and DAML+OIL. Chronologically, both DAML and OIL were
superseded by DAML+OIL, which combined the features of both and was the stepping stone
towards the development of OWL [74].
OWL is a set of languages used for knowledge representation in the form of ontology
authoring [74]. OWL is designed for use by applications that need to process the content of
information instead of just presenting information to humans [74]. Recommended by the
World Wide Web Consortium, it comprises three semantically related languages: OWL
Lite, OWL DL and OWL Full. Each language is a syntactic extension of its simpler
predecessor with OWL Lite being the simplest of the languages and OWL Full being the
most comprehensive and supporting all features of OWL [74]. As a result, while all valid
OWL Lite documents are also valid OWL Full documents, not all valid OWL Full documents
are valid OWL Lite documents [74]. OWL is the standard for the Semantic Web.
2.2.5 Summary
The motion picture industry as a whole has never been modelled as an ontology or even
as a metadata schema. An ontology is a richer model type than a metadata schema, which is
why ontologies have been written to complement metadata schemas that deal with generic
multimedia objects; these are among the few types of ontologies that can be
affiliated with the motion picture industry. The extensible ontology models ABC and CRM
can be extended to model the domain, or rather a domain model can be written in the
relationships, vocabularies etc. described by these models. However, this would tie the
domain model of the motion picture industry to the rules and dictates of the ABC or CRM
model, leaving such an ontology vulnerable to cognitive dissonance between
users and creators without gaining any real benefits. There would be no real benefits
because the ABC and CRM models are popular chiefly amongst archivists and within digital
libraries, so extending them only aids interoperability with other ontologies based on
the ABC or CRM model. We, however, are chiefly interested in developing a knowledge-model
type of ontology for the motion picture industry, and such a model would benefit from being
independent of generic extensible ontologies, as this would allow the agents of the domain more
freedom of expression. Moreover, once an ontology is in place it can be mapped into
other ontology formats; the most important thing is to have an ontology that can be exploited
for different purposes. The prevalent language of ontology implementation in web semantics
is OWL.
2.3 Semantic relatedness
In Chapter 1, we discussed at length the variation of perspective of agents within the motion
picture industry due to the time they are involved with a given motion picture and their role
within the motion picture. As illustrated in Section 1.4, the perspective of agents is strongly
related to the semantic relatedness of concepts in a given ontology. This can potentially have
implications for information needs. Therefore, as part of our literature study, we undertook an
investigation of semantic relatedness and how such a thing can be measured.
2.3.1 What is semantic relatedness
The concept of semantic relatedness has its origins in the cognitive theory of similarity and
therefore, to understand semantic relatedness we must first look at similarity. Similarity plays
a fundamental role in theories of knowledge and behaviour [75]. It serves as an organizing
principle by which individuals classify objects, form concepts, and make generalizations [75].
However, how individuals classify objects, form concepts and make generalizations is
contingent upon their developmental experiences, cultural backgrounds etc. [17]. In short,
similarity is context dependent [75]. This idea of context-dependent similarity is further
reinforced by Goodman, who states that similarity is meaningless without a frame of
reference [76]. Therefore, Medin proposes that for similarity to be a useful construct, one
must be able to specify the ways or respects in which two things are similar [77]. As human
beings we are fairly adept at stating how a man and a woman are similar and how they are
different. We can also explain that a dog and a table are only similar in that they have four
legs. The more we know about a subject, the more we are able to articulate the similarities
between concepts within that subject. Medin argues that the similarity comparison process
itself, which is internal to an individual, can serve to provide the frame of reference or the
context under which two concepts are similar [77]. It is in this setting that an ontology and
the notion of similarity intersect.
As explained in Section 2.2.1, an ontology is a model of things that exist and the connections
that exist between them. A domain ontology is a model of things that exist in that domain and
the connections that exist between concepts in that domain. Such a model, or at least the
portion of the model that is most relevant to them, exists within agents of that domain and as
such forms the point of reference for each individual [17]. Put another way, we all have
our own internal ontology, a subset of which is an extract from the domain ontology focused
around our understanding of the domain. Therefore, from the human cognition perspective,
the concept of similarity within ontologies arises in the form of semantic relatedness,
with the added benefit of being able to measure semantic relatedness [78].
However, measures of semantic relatedness have predominantly been developed by the
discipline of computational linguistics in the context of lexical databases, usually WordNet
[79]. WordNet is a lexical database for the English language which groups English words
into sets of synonyms called synsets, provides short, general definitions, and records the
various semantic relations between these synsets [79].
From the linguistics perspective, Blanchard defines semantic relatedness as evaluating the
closeness between two concepts from the whole set of their semantic links [80]. In short, in
linguistics, semantic similarity is not the same as semantic relatedness but they are linked.
Blanchard also states that all pairs of concepts with a high semantic similarity value have a
high semantic relatedness value whereas the inverse is not necessarily true [80]. Blanchard
defines semantic distance distinctly from semantic relatedness and states that semantic
distance evaluates the disaffection between two concepts: it is an inverse notion to the
semantic relatedness [80]. Within linguistics, generally speaking a “semantic similarity”
between two objects is related to their commonalities and sometimes their differences [81].
However, those involved with human cognition can use semantic relatedness and semantic
similarity interchangeably because, within the human cognitive process, two objects can be
semantically related that are not in fact linguistically related. An example of this is red and
blue – which are cognitively related because they are both types of colours but are not
necessarily linguistically related. A more subtle example of cognitively related concepts
occurs when undertaking same-difference categorisation [82]. Hampton spoke of relatedness
when investigating the effects of Good-Bad categorization tasks involving concept pairs that
were linguistically different but similar in other ways [82], e.g. plant-animal,
natural-manmade etc. This is the type of semantic relatedness we are interested in and therefore,
when we use the term semantic relatedness, we use it in the manner of the psychologists
(human cognition) and not the linguists.
That being said, WordNet can be regarded as a type of domain model for the English
language. As such, the work that has
been done in this area is applicable to our research. Indeed, the need to determine the degree
of semantic relatedness between two concepts is a problem that pervades much of
computational linguistics [83] and many of the existing measures for the determination of
semantic similarity/relatedness have their origin in linguistics [84]. Though semantic
relatedness is defined differently in linguistics to the type of relatedness we are interested in,
the measures proposed in computational linguistics are still of interest to us when
WordNet is viewed as more of a model than a lexical database. As such, these measures are
discussed in Section 2.3.2.
2.3.2 Existing measures of semantic relatedness
Semantic relatedness has been employed in the areas of computational linguistics and
artificial intelligence for a variety of purposes, including, but not limited to, word sense
disambiguation [85], detection and correction of word spelling errors, text segmentation,
image retrieval, multimodal document retrieval and automatic hypertext linkage [78]. In
addition, Maedche showed that relatedness can be measured across multiple ontologies by
considering ontologies as a two-layered system consisting of a lexical and conceptual layer
[86], although we are only interested in relatedness within a single ontology.
From their survey of semantic relatedness measures, Blanchard et al. found that most
semantic relatedness measures are based on a given ontology, with some also requiring a
corpus of text [80]. Blanchard et al. also found that semantic relatedness measures were
based on axioms [80]. For example, one axiom that forms the basis of many semantic
relatedness measures is that the shorter the path between two concepts, the more related they
are.
As mentioned before, linguistics is one area where semantic relatedness has been used
extensively. Linguistics and the WordNet lexical taxonomy are the basis for almost all the
existing methods of calculating semantic relatedness. The Hirst-St-Onge measure of semantic
relatedness holds that two lexicalized concepts are semantically close if their synsets within
WordNet are connected by a path that is relatively short and does not change direction often
[87]. The shortest path between two synsets is also relied upon by Leacock-Chodorow [88],
while Resnik [89] brings together ontology and corpus to judge the similarity of two concepts
by the extent to which they share information. In addition, the method proposed by
Jiang-Conrath [90] also uses the notion of information content but in the form of a
conditional probability. Lin [91]
proposed a semantic relatedness measure that is based on the same elements as those of
Jiang-Conrath [90] but arranged differently to give another probability function. These five
measures have been evaluated in the WordNet context by Budanitsky for application in
linguistics [83], who found that the probability-based measure of Jiang-Conrath [90] is
the best when applied to linguistics in a WordNet context [83].
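To make the information-content idea concrete, the following is a minimal sketch of the Resnik, Jiang-Conrath and Lin measures as they are commonly stated in the literature. The toy taxonomy, the corpus counts and all function names are hypothetical and chosen only for illustration; they are not taken from WordNet or from the cited works.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "animal": "entity",
    "person": "entity",
    "dog": "animal",
    "cat": "animal",
    "adult": "person",
}

# Hypothetical corpus counts of each concept's own occurrences.
COUNTS = {"entity": 0, "animal": 2, "person": 4, "dog": 5, "cat": 5, "adult": 4}


def ancestors(concept):
    """Return the concept followed by its ancestors up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def probability(concept):
    """p(c): share of corpus occurrences of c or any concept it subsumes."""
    total = sum(COUNTS.values())
    subsumed = sum(n for c, n in COUNTS.items() if concept in ancestors(c))
    return subsumed / total


def ic(concept):
    """Information content, IC(c) = -log p(c)."""
    return -math.log(probability(concept))


def lcs(a, b):
    """Lowest common subsumer: first ancestor of a that also subsumes b."""
    b_chain = set(ancestors(b))
    return next(c for c in ancestors(a) if c in b_chain)


def resnik(a, b):
    """Similarity as the information content shared by a and b."""
    return ic(lcs(a, b))


def jiang_conrath_distance(a, b):
    """Distance: information not shared by a and b (smaller is closer)."""
    return ic(a) + ic(b) - 2 * ic(lcs(a, b))


def lin(a, b):
    """Ratio of shared information to total information (undefined at the root)."""
    return 2 * ic(lcs(a, b)) / (ic(a) + ic(b))
```

In this sketch, dog and cat share the subsumer animal and therefore score as more similar than dog and adult, whose only common subsumer is the root, which carries no information.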
However, while this is useful for linguistics, there is no indication that these WordNet-based
measures can be easily applied to non-linguistic relatedness calculations, where the relatedness
is based on utility rather than semantic meaning in a linguistic sense. For example, a pencil and
an eraser are related because they are used together but are not linguistically related. Our
position is that while the measures perhaps should not be used as is, the principles behind the
measures probably could.
To this end, even though lexilogical (see footnote 4) in origin, the path-based measures proposed
independently by Hirst [87] and Leacock [88] can inform the development of new measures
of semantic relatedness. Certainly the edge counting method proposed by Rada [92] is
adaptable to any graph or taxonomy. Rada proposed a metric, termed simply distance, to
measure the relatedness of nodes in a semantic net [92]. Distance is defined as the average
minimum path length over all pair-wise combinations of nodes between two subsets of nodes
[92]. Distance has been successfully used to assess the conceptual distance between sets of
concepts when used on a semantic net of hierarchical, i.e. taxonomical, relations [92]. Rada
found that the judgements of the distance metric correlated significantly with the distance
judgements that humans make [92]. Therefore, the conceptual distance between nodes in a
taxonomical model is certainly a viable measure of semantic relatedness.
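The definition of distance given above can be sketched in a few lines. The example below is an illustration under stated assumptions rather than Rada's implementation: the edge list, the concept names and the function names are all hypothetical, and edges are treated as undirected and unweighted.

```python
from collections import deque
from itertools import product

# Hypothetical undirected "is-a" edges; illustrative only.
EDGES = [
    ("entity", "animal"), ("entity", "person"),
    ("animal", "dog"), ("animal", "cat"),
    ("person", "adult"), ("person", "juvenile"),
]


def build_graph(edges):
    """Build an adjacency map from a list of undirected edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search: number of edges on a shortest path."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    raise ValueError(f"no path between {start!r} and {goal!r}")


def distance(graph, set_a, set_b):
    """Average minimum path length over all pairs (a, b), a in set_a, b in set_b."""
    pairs = list(product(set_a, set_b))
    return sum(shortest_path_length(graph, a, b) for a, b in pairs) / len(pairs)
```

For single-concept sets the metric reduces to plain edge counting, which is why it inherits the abstraction problem discussed next.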
However, the method proposed by Rada is simple edge counting, which leads to erroneous
similarity measures due to the problem of abstraction, as identified by Li [78]. Li, who also
worked with a lexilogical taxonomy, observed that certain concepts are more abstract than
others in a hierarchical semantic knowledge base and that, if this abstraction is not taken into
account, concepts that are not closely related appear closely related when a measure of their
relatedness is taken [78].
For example, Li uses the hierarchical semantic knowledge base shown in Figure 2.6 to illustrate
that due to the structure of the hierarchy, many concepts can appear to be closely associated
when only simple edge counting is employed. Referring to the diagram, the taxonomy has
animals and persons at the same hierarchical level, as it does the concepts: adult, male,
female and juvenile. However, the concepts of adult, male, female and juvenile are highly
abstract and share very few properties beyond their parent. Simple edge counting has no
method of reflecting this abstraction.
Footnote 4: Lexilogical is the adjective form of lexilogy, which is the branch of linguistics that deals with the lexical component of language.
Figure 2.6: Hierarchical semantic knowledge base [78]
Li’s proposed solution involves modifying the direct path length method by utilising more
information [78]. Li uses length, depth and local semantic density, where length is the
distance between two nodes (edge counting), depth is the relative position of the two nodes in
the hierarchy and semantic density measures the number of connections a node has [78]. By
taking into account more variables, Li obtains a more precise picture of the abstractness of
concepts. However, this method is not necessarily transferable to an ontology, which might
have more of a graph structure with layers of varying abstraction. Because the
determination of the degree of abstraction is a non-trivial task, it remains an open problem in the
measurement of semantic relatedness.
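The effect of adding depth to path length can be sketched as follows. This is an illustration only: the functional form (exponential decay in length multiplied by a tanh of the depth of the lowest common subsumer), the parameter values alpha and beta, and the toy taxonomy are assumptions made for the example and may not match [78] exactly; local semantic density is omitted.

```python
import math

# Hypothetical taxonomy (child -> parent), loosely echoing the concepts named above.
PARENT = {
    "animal": "entity", "person": "entity",
    "adult": "person", "juvenile": "person",
    "professional": "adult",
    "teacher": "professional", "doctor": "professional",
}


def chain(concept):
    """Return the concept followed by its ancestors up to the root."""
    out = [concept]
    while out[-1] in PARENT:
        out.append(PARENT[out[-1]])
    return out


def length_and_depth(a, b):
    """Return (edges between a and b, depth of their lowest common subsumer)."""
    chain_a, chain_b = chain(a), chain(b)
    subsumer = next(c for c in chain_a if c in chain_b)
    length = chain_a.index(subsumer) + chain_b.index(subsumer)
    depth = len(chain(subsumer)) - 1  # the root has depth 0
    return length, depth


def relatedness(a, b, alpha=0.2, beta=0.6):
    """Assumed form: decays with path length, grows with subsumer depth."""
    length, depth = length_and_depth(a, b)
    return math.exp(-alpha * length) * math.tanh(beta * depth)
```

All three pairs animal/person, adult/juvenile and teacher/doctor are two edges apart, so simple edge counting cannot separate them; with the depth term, the pair that meets deeper in the hierarchy scores as more closely related.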
2.3.3 Summary
Semantic relatedness measures have been developed and used in a variety of ways. While the
current measures are all for linguistic purposes, they have underlying principles that can be
used to develop a relatedness metric for the measurement of perspective and the general
relatedness of concepts in a non-linguistic setting. However, abstraction can lead to errors in
the determination of relatedness and is an issue that must be overcome.
2.4 Information Extraction
The last part of our literature review concentrates on the idea of information extraction itself,
which is the driving motivation behind our research. Information extraction is a form of
information exploitation that goes beyond simple information retrieval. Marchionini and
White have commented “Retrieval is sufficient when the need is well-defined in the
searcher’s mind; however, when searchers are seeking information for learning, decision
making, and other complex mental activities that take place over time, retrieval is necessary
but not sufficient.” [1]. The key phrase in that comment is “well-defined”. Studies have
shown that while a search engine typically treats each search interaction between itself and
the user as an independent transaction, it is not so from the perspective of the user [93]. From
the perspective of the user, it is a dialogue where the user often starts with ill-defined
search criteria in their mind but continuously refines the criteria based on the results they
obtain from the information retrieval tool, such as a search engine [93, 94]. Spink
conducted an extensive study on the pattern of user queries in the setting of search engines to
arrive at this deduction [94], which is supported by Jansen’s study of query
modification patterns during web searching [95].
Another aspect of Spink’s work concerns the concept of relevance [96]. Most importantly
for our purpose, Spink’s work shows that the user’s ability to rank the relevance and irrelevance
of information is based on the user’s familiarity with the subject matter of the information [96].
What is unknown in this field is whether domain ontologies in conjunction with a semantic
relatedness measure can be used to determine a given user’s ability to rank relevance. Also
unknown is whether it is possible to determine the actual information the user is looking to
extract by predicting the use for which they wish to employ the information. However, the
literature does suggest that such uses might be possible.
For example, Tao recently proposed a novel contribution to the field of information retrieval
in the form of an ontology-based knowledge retrieval framework which requires a world
knowledge base describing and specifying the background knowledge possessed by humans
[97]. Tao observes that such a knowledge base does not exist [97]; however, within the
confines of a domain, the domain ontology would act as a world knowledge base, being a
knowledge model of the world of that domain. Note that Tao talks about knowledge retrieval,
not information extraction. The difference between knowledge and information is not simply
a semantic one. Knowledge implies understanding and in order to retrieve knowledge, there
must be some understanding of the user’s personal ontology and the context within which the
user is operating.
Other researchers in information extraction are approaching the problem from widely
different directions. Some are adopting utility-based models from ecology and psychology to
predict human behaviour for specific information-seeking conditions [98]. Others are
exploring the social and collaborative search experience [99, 100]. However, for us, given
the motion picture industry’s strong focus on agents and their roles, the knowledge-based
approach suggested by Tao seems a more promising direction to pursue.
2.5 Summary
From the review of the literature we have identified a number of gaps in existing work.
Firstly, there is no domain ontology that models the entirety of the motion picture industry.
All current ontologies focus on the product, the motion picture itself, and treat it as a generic
multimedia object and the ontologies are chiefly built for archival purposes. Secondly, while
a number of semantic relatedness metrics exist, all of them have been developed for linguistic
purposes and the ones that can be adapted for use in a general setting suffer from problems
with abstraction. That is, they do not adequately account for abstraction within hierarchical
semantic knowledge bases and thus lead to erroneous relatedness results. Lastly, there exists
an opportunity to explore the use of an ontology in combination with semantic relatedness to
better support user perspective sensitivity in information exploitation.
3 Research Plan
“It's the question that drives us, Neo. It's the question that brought you here. You know the
question, just as I did.” - Trinity, The Matrix
3.1 Research Questions
There are three interconnected questions that drive the research presented in this thesis. The
theme that links them is the theme of agents and how they view the domain of discourse.
These main questions incorporate a series of sub-questions that break down the complex
nature of the over-arching questions. These questions are given below and informed
the research methodology discussed in Section 3.2.
1. How to model the domain of discourse to facilitate the different perspectives of agents?
1.1. How to find the concepts of the domain?
1.2. How to model the relationships between the concepts?
1.3. How to model the relationship of the agents of the system to the concepts?
1.4. What other aspects of the domain need to be captured in the model?
2. Can agent perspective be exploited to better serve the needs of the Motion Picture
Industry?
2.1. How can agent perspective be measured?
2.2. How can agent perspective be exploited to better meet the information needs of the
Motion Picture Industry?
3. What kind of a system could exploit agent perspective to better serve the needs of the
Motion Picture Industry?
3.1. How would the various models be used?
3.2. How would the agent perspective be used?
3.2 Research Methodology
The research methodology is derived from the research questions and is presented in Figure
3.1.
[Figure 3.1 (diagram): research methodology steps and their components.
Data Gathering: AFTRS Interviews, Literature Search, Relatedness Survey.
Data Analysis: Axiom analysis, Information Model Development, Test case formulation, System design.
Modeling: Ontology, Metadata Schema, Relatedness Metric.
Implementation: Loculus System.
Evaluation: Relatedness Metric, Loculus System Evaluation.]
Figure 3.1: The Research Methodology Steps
From the research methodology steps the research workflow that is shown in Figure 3.2 was
derived. Most of the tasks of the workflow are qualitative in nature.
The initial tasks were gathering data about the concepts of the motion picture industry
through a review of industry literature and interviews with industry practitioners. The
gathered data was then analysed to derive relationship and modelling
information. The modelling phase, in turn, consisted of two parallel activities. One activity
involved the creation of a metadata schema for later use within the Loculus System. The
other activity consisted of three tasks, the first of which was the formulation of axioms, which
formed the foundation for the development of the ontology (the second task) and the relatedness
metric (the third task).
The Loculus System was then built around the ontology and the relatedness metric, with the
metadata schema supporting the functionality of the system. The system, in short, is used to
demonstrate the uses to which the ontology and the relatedness metric can be put.
The evaluation phase followed the implementation phase and involved two parallel activities.
One activity evaluated the relatedness metric, and through that evaluation, also the ontology.
This was done through a web survey and the gathered data was then analysed for the
evaluation. The other activity evaluated the algorithms of the system through unit testing.
[Figure 3.2 (diagram): workflow phases and tasks.
Data Gathering Phase.
Data Analysis Phase.
Modeling Phase: Formulation of Axioms, Ontology Development, Relatedness Metric, Metadata Schema Development.
Implementation Phase: Loculus System.
Evaluation Phase: Relatedness Metric Evaluation (Relatedness Survey Data Gathering, Analysis of Results), Loculus System Evaluation (Unit Testing).]
Figure 3.2: The Research Workflow
4 Loculus Ontology
“You're late, do you have no concept of time?” - Dr. Emmett Brown, Back to the Future
Our first research question, as presented in Section 3.1, is: how to model the domain of
discourse to facilitate the different perspectives of agents? One of the answers to this question
is that an ontology can model the domain of discourse. If the ontology is constructed
correctly, then it can facilitate the different perspectives of agents. In this chapter we present
the Loculus Ontology as our answer to the first research question.
What is an ontology? To summarise the discussion from the literature review in Section 2.2.1,
in computer science, ontologies are content theories about the sorts of objects, properties of
objects, and relations between objects that are possible in a specified domain of knowledge
[33]. In addition, ontologies are governed by axioms, which are used to express other
relationships between concepts and to constrain their intended interpretation [36]. In
philosophy, ontology is the study of the kinds of things that exist [33].
As elaborated in Section 2.2.1, ontologies as a philosophical construct are not unknown
within the motion picture domain [4, 34]. However, as a computer science construct, there is
no domain ontology for the motion picture industry. It is for this reason that we undertook the
development of the Loculus Ontology for the motion picture industry, which is an ontology
that covers both the product, the motion picture, and the process by which the product is
created. This focus on both the product and process differentiates the Loculus ontology from
existing ontologies. In addition, the Loculus ontology incorporates time and people; two
integral aspects of the motion picture industry that provide the context for many of the
industry specific concepts within the domain.
In this chapter, we first present the conceptual foundation of the Loculus ontology before
presenting and discussing the axioms that govern the ontology. We then explore the three axes
that naturally result from the structure of the ontology, discuss the implementation of the
ontology, and present some extracts from the ontology, which was implemented in OWL
using Altova SemanticWorks.
It is not feasible to present the entire Loculus ontology within this thesis. However, it may be
downloaded from the World Wide Web by following the instructions in Appendix A.
4.1 Conceptual Foundation of the Ontology
In essence, the foundation of the ontology is two-layered. The first layer comprises the
methods by which the concepts of the ontology are revealed and how the relationships
between them are discovered. The second layer is the axioms that govern the ontology.
In terms of the first layer, the motion picture industry concepts are terms that exist in the
natural language of the industry. These are located within the literature of and about the
industry. The relationships between the concepts originate in the literature [101] and were
refined through consultation with industry professionals from AFTRS.
As we are interested in the discourse of the motion picture industry, we sought out the source
that had the widest range of contributors to a folksonomy as a starting point for gathering
concepts of the industry. As such, we decided that Wikipedia would be a good starting point
to get a quick footing in the discourse. Initially, we started with the term “screenplay”, a
term restricted to the motion picture industry, and looked up that term in Wikipedia. That
term led us to the Wikipedia categories film and video terminology, film making, film
production and film techniques. We then used these categories to compile a list of
terminology for the motion picture industry. We cannot claim this process provided complete
coverage of all terms in the motion picture industry; however, it created a substantial basis to
begin the modelling process. We did not model the terminology concepts based on Wikipedia
entries. For the actual modelling we referred to industry practitioners and industry literature.
While we obtained the list of terminology from Wikipedia, for a list of agents we turned to
IMDB, which includes a comprehensive list of credits (the often tedious enumeration of the
people involved in the development of the motion picture). We picked the movie “Stardust”,
which at the time was a recently released feature film from Hollywood; we then referred to
the full cast and crew list for that film to get a list of agents, where agents is a term that
collectively denotes the people involved in the industry. This list of agents was cross-checked
with the Australian movie “Lantana”. Once again, the actual modelling of the agents and
linking them with the terminology concepts was done through consultation with industry
practitioners and from investigation of industry literature, such as the book “On Film-
Making: An Introduction to the craft of the Director” by Alexander Mackendrick, a
screenwriter, storyboard editor and director, as well as the head of the California Institute of
Arts [101].
While investigating the concepts of the industry and relationships that link the concepts, the
importance of two contexts became very clear. It seems that the majority of the concepts in
the industry exist within the context of when the concepts are used and by whom they are
used. In short, all industry specific concepts have a temporal context and an agent context.
The two contexts are related and have a bearing on each other, which reinforces their
importance. Both these contexts were taken into account when developing the axioms that are
the cornerstone of the ontology. The axioms are discussed in detail in Section 4.2.
4.2 Axioms
Ontologies are governed by axioms which are used in order to express other relationships
between concepts and to constrain their intended interpretation [36]. These axioms define
what concepts can and cannot be included in the ontology, as well as provide the rules by
which the concepts are to be linked to each other. In addition, the axioms give guidance as to
how the rules are to be interpreted.
The Loculus ontology is governed by three types of axioms:
• General axioms, which in turn are sub-divided into inclusion axioms and temporal axioms;
• Concept axioms, which in turn are sub-divided into inheritance axioms, linkage axioms and
terminology axioms;
• Meta-link axioms, which do not have any sub-divisions but link directly with the
linkage axioms, as the Meta-link axioms provide names for the links governed by the
linkage axioms.
In this section we present these axioms in detail.
4.2.1 General Axioms
The General Axioms govern what can and cannot be included in the ontology (inclusion
axioms) and provide guidelines for capturing the temporal aspects of the industry
(temporal axioms).
4.2.1.1 Inclusion Axioms
To the extent that it is possible, we aim to capture in this ontology the natural discourse of the
motion picture industry. Therefore, the central tenet of the inclusion axiom is that if a concept
exists in the natural language discourse of the motion picture industry then it should be
considered for inclusion in the Loculus ontology. The first two inclusion axioms codify this
tenet.
[Inclusion 1] When expressed in natural language, concepts are considered to be part of
the discourse of the industry.
The reasoning behind this is that the concepts present within the motion picture industry are
part of the natural language used by the practitioners of the industry. While not made explicit
by the axiom, we are trying to avoid artificial constructs that are often employed in data and
knowledge modelling that are essentially convenient placeholders for a group of objects that
share some sort of a characteristic, e.g. a concept called “Things that have names”. The
reason for this is simply that as a domain ontology it is better to avoid such artificial
modelling constructs to keep the ontology as closely aligned with the domain as possible.
What must be noted here is that we cannot capture concepts that are not articulated. If the
industry has no term for a tacit understanding, then the concept is so implicit that it cannot be
captured. This is one of those situations where an example cannot be given because being
able to give an example means that the concept can be articulated and as such is part of the
discourse, therefore should be captured in the ontology.
It must be noted here that by adhering to the natural language of the industry and not using
artificial constructs, which might make the model more aesthetically pleasing from a software
engineering perspective but have no meaning within the industry, we partially mitigate
the problem identified by Hepp regarding the loss of meaning that is often experienced by
creators and intended users of ontologies [40].
[Inclusion 2] If a concept is used in the motion picture industry and is part of the
established discourse of the industry, then it should appear in at least one of the ontologies.
As the ontology is supposed to be a complete ontology, it must cover all concepts that form
part of the industry’s established discourse; however it should not cover things that are not
part of the established discourse of the industry. For example, concepts such as surgery,
triathlon and Field-programmable gate array should not be part of the ontology.
[Inclusion 3] Concepts that are limited to or that have a specific meaning within the
motion picture industry are considered to be motion picture industry (MPI) concepts.
This covers all terms that are limited to the motion picture industry and hold little or no
meaning outside the context of the industry. For example, Boom (the big microphones used to
record the actors) is a term limited to the motion picture industry and has little or no meaning
outside the context of the industry.
However, on occasion this would also apply to common terms that have special meaning
within the industry, e.g. Treatment - which within the context of the motion picture industry
means a document produced as a pitch document for a new motion picture project.
[Inclusion 4] Concepts that are not limited to or do not have a specific meaning within the
motion picture industry are considered to be “common” concepts.
Common concepts will be modelled in the Loculus ontology only to the extent where it is
necessary to model the concepts and process of the industry and/or where the concepts appear
so frequently that it would be remiss to not model them. That is, the criteria for including a
common concept are questions like, “is this part of a form?”; “is this used during the business
process of the motion picture industry?” etc.
For example, the ontology should cover the concept of lunch as it forms part of the business
process of the industry, e.g. allocation of time for lunch during shooting, catering for lunch
during shoots etc. On the other hand the concept of surgery is not a daily part of the business
of the industry.
[Inclusion 5] Concepts that deal with roles as played by natural or legal entities are defined
to be agent concepts.
For example, Actor, Editor and Production Studio are all agents that are involved in the
process of the motion picture industry and therefore should be part of the Loculus ontology.
On the other hand, submariner, tri-athlete and aerospace engineer should not be part of the
Loculus ontology as they are not agents who are involved in the daily process of the motion
picture industry.
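The inclusion axioms can be read as a small decision procedure. The following Python sketch is illustrative only and is not part of the Loculus implementation: the `Candidate` fields and the `classify` function are hypothetical names introduced here, and the boolean flags stand in for judgements that were actually made through consultation with practitioners and industry literature.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """A candidate concept described by hypothetical yes/no judgements."""
    name: str
    limited_to_industry: bool = False       # Inclusion 3: term exists only in the industry
    special_industry_meaning: bool = False  # Inclusion 3: common term with a special meaning
    used_in_business_process: bool = False  # Inclusion 4: part of forms or daily processes
    is_role: bool = False                   # Inclusion 5: role played by a natural or legal entity


def classify(c: Candidate) -> Optional[Tuple[str, bool]]:
    """Return (kind, is_agent) for an included concept, or None if it is excluded."""
    if c.limited_to_industry or c.special_industry_meaning:
        return ("MPI", c.is_role)
    if c.used_in_business_process:
        return ("common", c.is_role)
    return None  # not part of the established discourse of the industry


# Flag values below are illustrative readings of the examples in the text.
boom = Candidate("Boom", limited_to_industry=True)
treatment = Candidate("Treatment", special_industry_meaning=True)
lunch = Candidate("Lunch", used_in_business_process=True)
surgery = Candidate("Surgery")
submariner = Candidate("Submariner", is_role=True)
```

Under these assumed flags, Boom and Treatment are classified as MPI concepts, Lunch as a common concept, and Surgery and Submariner are excluded.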
4.2.1.2 Temporal Context
Before we present the Temporal Axioms, we must discuss the temporal context which gives
rise to the temporal axioms. As discussed in Section 1.1.1, the concept of time is very
important to the motion picture industry as all processes of the industry exist in the context
of the production cycle and the product itself develops in life stages over time. The timelines
do not have to be continuous, and certainly the early stages of both timelines can be
suspended and resumed.
For example, as mentioned in Section 1.1.2, The Curious Case of Benjamin Button had been in
pre-production since at least 1994, when film industry executives were first approached with
the possibility of filming an adaptation of the F. Scott Fitzgerald short story of the same
name, but production did not start until sometime in 2007 and the film was finally released in
2008 [5]. That is not to say that it was continuously being worked on. Indeed, work was
suspended for long stretches of time as those spearheading the project engaged in other
projects.
This is fairly common within the motion picture industry, where a production process can be
long, with bursts of activity marking critical development points for the project, or can be
over and done within a short, sharp burst of activity. However, the very fact that all motion
picture industry specific concepts exist within the context of a production process timeline
and a product development timeline means that the concept of time is very important in
modelling the concepts of the motion picture industry.
The two timelines, shown in Figure 4.1, are linked but do not overlap. This is because they
represent different things. One represents the process and the other the development of the
product. The timelines are presented in detail below.
Figure 4.1: The two timelines of the industry (reusing Figure 1.1)
[Diagram labels. Production Cycle: pre-production, production, post-production. Life Stage: conception, production, utilisation (distribution, discovery, access, reuse/repurpose, preservation). A further label reads “Production Cycle as a whole”.]
The production cycle is the timeline for the process. It defines the stages in which the
different processes undertaken to make the motion picture take place. The production
cycle is broken into three phases: pre-production, production and post-production. It is hard
to define when pre-production starts, as pre-production usually involves imprecise tasks such
as getting the basic concepts of the film to such a state so that it is given the go-ahead. The
production phase starts on the first day of shooting and ends on the last day of shooting. As
soon as production ends, post-production starts in earnest, although some post-production
activities, e.g. special effects for scenes already filmed, might have begun while the bulk of
the motion picture was still in the production phase. Post-production encompasses everything
after production. This is because there is always something to do, whether it be
to produce the final cut, to market the final cut, to digitally re-master the motion picture for
a new generation or simply to preserve it. As such, there is merit in saying a completed
motion picture is always in post-production, since there really is no event that can be marked
as “the end”.
The life stages timeline charts the progress of the product: the artifact of a motion picture
itself, meaning not necessarily a single instance of the artifact but the concept of the artifact.
These life stages are conception, production and utilization. Utilization in turn comprises
distribution, discovery, access, reuse/repurpose and preservation. The life stages do not map
exactly to the production cycle, nor do all motion pictures reach all life stages. A motion
picture is in the conception stage when it is conceived and is being fleshed out. The latter part
of conception would correlate with pre-production. A motion picture is in the production life
stage when it is being produced; so the latter parts of pre-production, all of production and
the post-production activities that end with the creation of the final cut would correlate with
this stage. Utilization spans the remainder of post-production. However, while there is an
order in which the life stages must be reached, the life stages are not a good measure of
sequence or duration, as a motion picture can exist in multiple life stages at once and return
to a previous life stage under certain circumstances. Certainly discovery and access, two of
the sub-stages of utilization, happen multiple times.
The important question to address at this point is why the two timelines do not overlap
perfectly. The answer lies in the fact that the birth of the product, i.e. the conception stage,
does not necessarily have to involve any formal processes recognised by the industry. It could
merely be a discussion among agents of the industry over a quiet drink that fleshes out the
details of the project, or a conversation held on the set of one project with people who might
want to work on the new project. Until some formal process starts, the production cycle
cannot start. However, that is not to say the product is not being developed, even if it is in just
the heads of the agents involved. This goes towards the multifaceted nature of the motion
picture product.
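As an illustration, the two timelines and their loose correlation could be represented as follows. This Python sketch is an assumption made for exposition, not the OWL representation used in the Loculus ontology; the names `ProductionPhase`, `LifeStage`, `MAY_COINCIDE_WITH` and `possible_life_stages` are hypothetical.

```python
from enum import Enum


class ProductionPhase(Enum):
    """Phases of the production cycle (the process timeline)."""
    PRE_PRODUCTION = "pre-production"
    PRODUCTION = "production"
    POST_PRODUCTION = "post-production"


class LifeStage(Enum):
    """Life stages of the motion picture (the product timeline)."""
    CONCEPTION = "conception"
    PRODUCTION = "production"
    UTILIZATION = "utilization"


# Sub-stages of utilization, as listed in the text.
UTILIZATION_SUBSTAGES = (
    "distribution", "discovery", "access", "reuse/repurpose", "preservation",
)

# Production-cycle phases that each life stage may coincide with. This is a
# loose correlation, not an exact mapping: conception can also take place
# before any formal process, and hence before the production cycle, starts.
MAY_COINCIDE_WITH = {
    LifeStage.CONCEPTION: {ProductionPhase.PRE_PRODUCTION},
    LifeStage.PRODUCTION: set(ProductionPhase),
    LifeStage.UTILIZATION: {ProductionPhase.POST_PRODUCTION},
}


def possible_life_stages(phase: ProductionPhase) -> set:
    """Life stages a motion picture may be in during the given phase."""
    return {stage for stage, phases in MAY_COINCIDE_WITH.items() if phase in phases}
```

Keeping the two timelines as separate types reflects the point that they represent different things and that one cannot be derived exactly from the other.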
4.2.1.3 Temporal Axioms
In light of the temporal contexts detailed in the previous section, the temporal axioms detail
how the concepts, mainly the agent and MPI specific concepts, exist in the context of the two
timelines of the industry. While the axioms only reference the concept of temporal phases
and changes in those phases, when reading the axioms it must be kept in mind that the
temporal phases for the motion picture industry are defined by the temporal aspect of the
motion picture industry and its two timelines.
[Temporal 1] Given the importance of time, all concepts that are not common concepts
must have an explicit link to one or more of the temporal phases or the entire temporal
phase as a whole, either directly or inherited through a parent concept.
This is not to say that common concepts cannot have a temporal aspect, but only the MPI
specific concepts are required to have one. This is because, as mentioned before,
within the industry time matters and almost all concepts have a temporal aspect in practice.
However, common concepts by their very nature are general and, while they may have
temporal aspects attached to them, it cannot be mandated. For example, Lunch is a common
concept that is most frequently associated with the business process during production. As
such Lunch can have a temporal phase association. On the other hand, Address cannot be
associated with any particular temporal phase.
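A check for Temporal 1 might be sketched as below. This is a minimal Python illustration under stated assumptions, not the mechanism used in the Loculus ontology: the `Concept` record and both functions are hypothetical, the hierarchy is assumed to be acyclic, and the phase assigned to Editing is an illustrative guess.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Concept:
    name: str
    common: bool = False                              # common vs. MPI specific
    parents: List[str] = field(default_factory=list)  # inheritance links
    phases: Set[str] = field(default_factory=set)     # direct temporal links


def effective_phases(name: str, concepts: Dict[str, Concept]) -> Set[str]:
    """Direct temporal links plus those inherited from any ancestor."""
    concept = concepts[name]
    found = set(concept.phases)
    for parent in concept.parents:
        found |= effective_phases(parent, concepts)
    return found


def violates_temporal_1(name: str, concepts: Dict[str, Concept]) -> bool:
    """True if a non-common concept has no direct or inherited temporal link."""
    concept = concepts[name]
    return not concept.common and not effective_phases(name, concepts)


concepts = {
    "Action": Concept("Action", common=True),
    "Editing": Concept("Editing", parents=["Action"], phases={"post-production"}),
    "Cross-cutting": Concept("Cross-cutting", parents=["Editing"]),
    "Lunch": Concept("Lunch", common=True, phases={"production"}),
    "Address": Concept("Address", common=True),
    # Hypothetical MPI specific concept with no temporal link anywhere.
    "Unlinked": Concept("Unlinked", parents=["Action"]),
}
```

Here Cross-cutting satisfies the axiom through its parent, Lunch and Address are exempt because they are common concepts, and the hypothetical Unlinked concept is flagged.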
[Temporal 2] Motion picture industry agents must have a link to one or more temporal
phases of the production cycle or to the production cycle as a whole. Common concept
agents do not always have to have a link to the temporal phase of the production cycle.
As mentioned before, the agents in the motion picture industry can be either long term or
short term, where short-term agents are associated with only one phase while long-term
agents are associated with two or more phases of the whole production cycle. For example,
the Producer is involved with a given motion picture during the entire production cycle. On
the other hand, the Boom Swinger is only involved during the production phase of the
production cycle.
Therefore the agent ontology must reflect the short-term or long-term nature of a given agent
role inherent in the nature of the industry.
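The short-term and long-term distinction follows directly from the number of phases an agent is linked to. A minimal Python sketch, assuming a hypothetical helper named `agent_term` and phases represented as plain strings:

```python
PRODUCTION_CYCLE = ("pre-production", "production", "post-production")


def agent_term(phases) -> str:
    """Classify a motion picture industry agent by its linked phases."""
    linked = set(phases)
    unknown = linked - set(PRODUCTION_CYCLE)
    if unknown:
        raise ValueError(f"not production cycle phases: {sorted(unknown)}")
    if not linked:
        # Temporal 2 requires at least one phase for an MPI agent.
        raise ValueError("an MPI agent must be linked to at least one phase")
    return "short-term" if len(linked) == 1 else "long-term"


producer = agent_term(PRODUCTION_CYCLE)    # linked to the cycle as a whole
boom_swinger = agent_term(["production"])  # linked to a single phase
```

Common concept agents are outside the scope of this check, since Temporal 2 does not require them to be linked to a phase.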
[Temporal 3] All concepts that are not agent or common must have a classification in
reference to the life stages of a motion picture.
Within the motion picture industry, common concepts such as Action are not generally
associated with life stages, and neither are agents. The reason agents never have life
stages lies in the discourse of the industry: within the industry, agents are not
associated with life stages; they are only associated with the phases of the production cycle.
4.2.2 Concept Axioms
The Concept Axioms govern the relationship between concepts. As shown in Figure 4.2,
concepts are linked with each other vertically and horizontally. The “vertical” relationships
are governed by the Inheritance Axioms, while the “horizontal”/property-type relationships
are governed by the Linkage Axioms. The Linkage Axioms also capture the Agent context of
the motion picture industry.
Figure 4.2: The concept of editing with vertical and horizontal links
[Diagram labels: editing inherits from action; editing is performed by editor.]
4.2.2.1 Inheritance Axioms
The Inheritance Axioms govern the vertical relationship between concepts and dictate
the level of abstraction of concepts. The defining test for the abstraction of a
concept is the “Get me a…” test. “Get me a crew” is a statement that is too general and would
prompt the question “Which type of crewmember are you referring to?”. On the other hand
the statement “get me an actor” is sufficiently detailed that such a statement could be
complied with. The highest level of abstractness in the ontology is represented by a set of
common concepts that are referred to as the root concepts. The root concepts are identified
in the first inheritance axiom.
[Inheritance 1] All concepts are children of the abstract terms
This is the key inheritance axiom and in the case of the Loculus ontology the abstract terms
or root concepts are: Agent, Artifact, Tool, Technique, Description, Action and Process. All
concepts, whether they are common or MPI specific have their origin in these root concepts.
These root concepts are part of the discourse of the industry but tacitly understood to be the
parents of all the other concepts (tacitly because industry practitioners do not think in terms
of inheritance, abstract and concrete concepts). However, that is not to say that they do not
have an instinctive understanding that certain concepts are better defined than others.
Closely related to this axiom is the next axiom, which reinforces the notion that the root
concepts are the highest level of abstraction as dictated by the “Get me a...” test.
[Inheritance 2] The abstract terms represent the highest level of abstraction
In addition to the previously mentioned points, it must be noted that the root concepts are
parents to distinctly different child concepts, e.g. though very different types of actions, both
Editing and Acting are children of the root concept Action.
Another point to note is how much more concrete Editing and Acting are as concepts
compared to Action. Figure 4.3 shows the inheritance hierarchy featuring all the root concepts
and one of the concepts that inherits from each. As can be seen, the jump in concreteness is
greatest at the first inheritance from the root: Acting is several times more concrete than
the abstract concept of Action, whereas the difference in concreteness/abstractness between
Method Acting and Acting, while Method Acting is more specific, is not as great as that
between Action and Acting.
This understanding of increasing and decreasing abstractness is captured further in axioms 3
and 4, which are given below.
[Inheritance 3] In an inheritance hierarchy, the top level concepts are more abstract than
the bottom level concepts.
As explained before with the example of Action, Acting and Method Acting, each step
vertically down represents a more specific concept. However, the first step down remains the
biggest step from abstract to concrete. This is further reflected in the other examples
presented in Figure 4.3.
[Inheritance 4] Assuming that the immediate parent of a concept is not a root concept, the
concept has a closer link to its immediate parent and by extension its immediate child, than
every other concept in its hierarchy.
As mentioned previously, the root concepts are so abstract that by their very nature they
cannot be considered closely coupled with their children. On the other hand, more concrete
classes would be closely coupled with their children as their children would generally be a
more specific form of the parent concept, e.g. Method Acting is closer to Acting than to Action. The essence
of this axiom is to clarify the degree of closeness as movement is made along the inheritance
hierarchy.
Figure 4.3: Inheritance hierarchy of some of the concepts within the motion picture industry
[Diagram labels, read by column: artifact > motion picture; agent > people > cast; action > acting > method acting; description > category > emotive category; process > rehearsal; technique > montage; tool > camera > digital camera.]
[Inheritance 5] A concept that inherits from a common concept is not by virtue of
inheritance considered to be a common concept as the child concept may be specific to the
motion picture industry.
The root concepts are all common concepts but the children that result from them can be
concepts specific to the motion picture industry. This is an observable fact in the industry and
results because the common concepts came first and were extended into the specific concepts
by the industry as it developed. For example, Acting - a specific concept, is a child of Action
- a common concept.
However, once a common concept is extended into becoming specific, it is impossible for its
children to be anything but a concept specific to the motion picture industry. This is both an
observable fact as well as dictated by the inheritance axioms 3 and 4 that say that concepts
become more concrete as one moves down the inheritance tree. Inheritance axioms 6 and 7
codify this observation.
[Inheritance 6] If a concept is a child of a specific concept then it follows that it too is a
specific concept.
For example, Method Acting is the child concept of Acting, where both are specific and
Method Acting cannot be any less specific than Acting. In short, common concepts can be
parents of specific concepts, but specific concepts cannot be parents of common concepts.
[Inheritance 7] If a concept is descended both from a common concept and a specific
concept then it is considered to be specific to the motion picture industry.
For example, Cut-away is the child of the specific concept Editing and the common concept
Technique. Once again, the mere presence of a specific concept in a case of joint
inheritance means that the resulting child cannot be a common concept.
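Inheritance axioms 1, 5, 6 and 7 can be sketched as two recursive checks over the hierarchy. The Python below is an illustration under assumptions rather than the Loculus implementation: the function names are hypothetical, the hierarchy is assumed to be acyclic, and `declared_specific` stands for the concepts that modellers marked as MPI specific at the point where a common concept is first extended (Inheritance 5).

```python
ROOTS = {"Agent", "Artifact", "Tool", "Technique", "Description", "Action", "Process"}


def descends_from_root(name, parents_of) -> bool:
    """Inheritance 1: every concept traces back to a root concept."""
    if name in ROOTS:
        return True
    return any(descends_from_root(p, parents_of) for p in parents_of.get(name, ()))


def is_specific(name, parents_of, declared_specific) -> bool:
    """Inheritance 6 and 7: a concept is MPI specific if it is declared so
    or if at least one of its ancestors is MPI specific."""
    if name in declared_specific:
        return True
    return any(
        is_specific(p, parents_of, declared_specific)
        for p in parents_of.get(name, ())
    )


# Examples taken from the text; Cut-away has joint inheritance.
parents_of = {
    "Acting": ["Action"],
    "Method Acting": ["Acting"],
    "Editing": ["Action"],
    "Cut-away": ["Editing", "Technique"],
}
declared_specific = {"Acting", "Editing"}
```

With these inputs, Method Acting and Cut-away come out as specific because each has a specific ancestor, while Action and Technique remain common.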
The last Inheritance Axiom is very different from the other axioms presented so far and
relates directly to the Linkage Axioms that are to follow. Basically, the next axiom dictates
how the linkages of the parents are to be inherited by the child.
[Inheritance 8] Child concepts inherit all linkages of their parent, unless the linkages are
specifically overridden for the child.
For example, Cross-cutting inherits “is performed by Editor” through its parent concept of
Editing. This is a common rule usually applied during inheritance. The child is usually
assumed to inherit all its parents’ properties unless explicitly overridden.
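Inheritance 8 behaves like property inheritance with overriding. The sketch below shows one possible reading in Python; `resolve_links` is a hypothetical helper, the hierarchy is assumed to be acyclic, and keying an override on the link name is an assumption made here, since the axiom does not say how an override is identified.

```python
def resolve_links(name, parents_of, own_links):
    """Return the links of a concept: inherited links first, then the
    concept's own links, which override inherited links of the same name."""
    merged = {}
    for parent in parents_of.get(name, ()):
        merged.update(resolve_links(parent, parents_of, own_links))
    merged.update(own_links.get(name, {}))
    return merged


parents_of = {
    "Editing": ["Action"],
    "Cross-cutting": ["Editing"],
    "Overriding Child": ["Editing"],  # hypothetical concept, for illustration
}
own_links = {
    "Editing": {"is performed by": "Editor"},
    # Hypothetical override; not taken from the Loculus ontology.
    "Overriding Child": {"is performed by": "Some Other Agent"},
}
```

Cross-cutting declares no links of its own and so resolves to the link of its parent Editing, whereas the hypothetical overriding child replaces it.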
4.2.2.2 Agent Context
As mentioned before, one of the functions of the Linkage axioms is to govern how the agent
context of the motion picture industry is captured by the Loculus Ontology. However, what is
the agent context of the motion picture industry? In this section, we briefly discuss the agent
context before we move on to presenting the actual Linkage axioms in the next section.
The concentration of motion picture industry specific agents is in the production phase. In the
minds of the audience, these agents of the motion picture industry are divided into two
categories, the cast and the crew, where the cast consists of agents in front of the camera and
the crew consists of agents behind the camera. However, upon closer inspection it becomes
clear that the situation is a lot more complicated than that. For starters, crew can be divided
into creative crew (e.g. the costume designer) and technical crew (e.g. the cameraman), with
some crew agents being both creative and technical (e.g. the director of photography). These different roles
dictate the type of concepts the agents would be most familiar with and the type of concepts
the agents are least familiar with. Highly creative roles would not be aware of the concepts
involved in the highly technical roles and vice-versa. More importantly, within the industry
there is a strong association between the majority of concepts and the agents who are chiefly
associated with those concepts. As such the agent context of the concepts within the motion
picture industry domain of discourse must be an integral part of any domain ontology to
correctly reflect the nature of the industry.
The agent context is not independent of the temporal context; the temporal context applies to
the agents as well. Agents, both creative and technical, can either be involved with only one
phase of the production cycle, making them strictly short-term agents like the editor who is
only involved in post-production, or they can be involved with multiple phases of the
production cycle, making them longer-term agents like the producer who is involved in all
phases of the production cycle. Agents are never associated with the life stage timeline
because the industry never associates them with the life stage timeline.
It must be noted that there are other agents who are involved with the industry, such as
journalists, archivists and financiers, but who are not exclusive to the industry. These
agents are largely involved during pre-production and post-production. However, agents
within the industry do not necessarily consider these other agents to be part of the industry;
rather, they are seen as supporting the industry. These other agents would often identify
themselves as belonging to a different industry, e.g. a journalist to journalism/media.
The basis for inclusion here is ending credits and/or self-identification. If an agent is included
in the ending credits of a motion picture then it can be seen as the industry affirming that the
agent role is part of the industry. However, ending credits alone are not a sufficient marker of
motion picture industry inclusion, as many who are members of the motion picture industry
would not necessarily be mentioned in the credits, e.g. the talent managers of actors. There is
also the situation where the ending credits may credit agents such as personal assistants,
caterers and other common concept agents who do not self-identify as being part of the
motion picture industry. This is why self-identification is an additional basis for inclusion.
Self-identification is the answer agents give when they are asked “are you part of the
motion picture industry?”
4.2.2.3 Linkage Axioms
The Linkage Axioms govern how the concepts are linked “horizontally” through meta-links
or, in other words, the Linkage Axioms govern the relationship between concepts (other than
inheritance relationships). There are two types of links possible in the Loculus Ontology,
weak links and strong links. Weak links connect two concepts in a vague abstract manner,
indicating that while it is known that the two concepts have a direct relationship between
them, the exact nature of the relationship is nebulous and difficult to articulate, varying
greatly between motion picture projects. A strong link on the other hand links concepts
together in a specific and concrete manner, indicating that, not only do the two concepts have
a relationship between them, the relationship is well known and well understood. The first
two Linkage axioms codify this:
[Linkage 1] Two concepts where neither of them are agents can be linked through a weak
link which implies that, while the concepts are linked, the link is abstract and non-specific.
For example, while the concept of Prestige is associated with Film Festivals, the exact
nature of the relationship is not easy to articulate. Moreover, prestige is also very subjective
and there are other factors that dictate whether a given Film Festival is prestigious or not and
sometimes there are issues regarding who is receiving and who is giving the prestige. A new
and unknown film maker will undoubtedly be the one receiving all the prestige should their
film be accepted for screening at the Brisbane International Film Festival (BIFF). On the
other hand, if the BIFF manages to secure a premiere for the new film of an established and
famous film maker for their festival, it is in fact the BIFF which is receiving prestige by
association with the famous film maker, while the film maker is likely not receiving any
prestige simply because the film maker’s base level of prestige is higher than that of BIFF.
As such, while Prestige and Film Festivals are related, the nature of the relationship is vague
and therefore weak.
[Linkage 2] Two concepts where neither of them are agents, can be linked through a
strong link which implies that the link is specific and precise.
A motion picture is screened at a Film Festival; this relationship is readily articulated and
does not vary. As such, the link between motion picture and Film Festival is a strong link.
There is a very important reason why both Linkage axioms 1 and 2 make a distinction
between agents and non-agent concepts. The reason is that the presence of the agent alters the
interpretation of the relationship. When an agent is involved, the connection between the
agent and the non-agent concept gives an indication of the perspective of the agent in
reference to the non-agent. However, when two non-agents are involved, there is no
perspective as only an agent is capable of perspective. In addition, when one of the concepts
being linked is an agent, the ontology is essentially attempting to capture the agent context of
the motion picture industry and, in essence, map the perspective of the agent in relation to the
concept. However, the basis behind the strong and weak relationships remains the same.
[Linkage 3] If an agent has a strong link to a non-agent concept then it is understood that
the agent has a deep involvement in the non-agent concept.
If the link can be precisely defined, i.e. the link is a strong link, it means that the agent should
have a thorough understanding of the concept. Expressed in another way, if the stated
industry practitioner, who has a given agent role, has a clear understanding of a given concept
– they can clearly articulate their relationship in terms of the concept and the relationship
with the concept does not vary when moving from one motion picture to another, then that
agent has a strong relationship to that concept and the relationship is modelled as such. For
example, Actor performs Acting is a precise relationship that can be precisely defined and
articulated. As such, it is a strong link.
[Linkage 4] If an agent has a weak link to a non-agent concept then it is understood that
the agent has some understanding of the non-agent concept but it is not deep and the agent
is not an expert in the concept.
For example, Blocking involves agent Actor. Blocking is a rehearsal technique by which a
scene is finalized (e.g. placement of lights, movement of actors etc) before it is shot.
Precisely what an actor does during blocking is hard to capture and will differ from scene to
scene, actor to actor and motion picture to motion picture. In other words, where different
instances of the agent define their relationship to a concept in an abstract manner, or cannot
define their relationship beyond the fact that they are involved, the relationship is considered
to be weak.
The next Linkage axiom deals with the direction of the relationship, as in whether the
relationship is uni-directional or bi-directional. The reason that strong links and weak links
behave differently in this regard has its origins in the discourse of the industry. Strong links
are so specific that they take one form when they go from concept A to concept B and a
completely different form when they go from concept B to concept A. On the other hand,
weak links are so general and vague that whether they are linking concept A to concept B or
going the other way and linking concept B to concept A, they have the same form.
[Linkage 5] All strong links are uni-directional and occur in pairs, while all weak links are
bi-directional, where direction relates to the vocabulary used.
For example, the pair of strong links that connect Actor and Acting have different meta-types
depending on the direction. That is, Actor performs Acting but Acting is performed by Actor.
On the other hand the weak link between prestige and film festival is the same regardless of
the starting concept; Prestige is associated with Film Festival and Film Festival is associated
with Prestige.
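Linkage axiom 5 can be illustrated with a minimal Python sketch. The `Link` class and the helper functions `strong_link` and `weak_link` are hypothetical names introduced only for this illustration; they are not part of the Loculus implementation. The sketch assumes that a strong link is stored as a pair of uni-directional links with different vocabulary, and that a weak link uses the same vocabulary in both directions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One directed statement: source --meta_link--> target (hypothetical model)."""
    source: str
    meta_link: str
    target: str


def strong_link(a: str, forward: str, b: str, inverse: str) -> list:
    """A strong link is a pair of uni-directional links with different vocabulary."""
    return [Link(a, forward, b), Link(b, inverse, a)]


def weak_link(a: str, b: str, meta_link: str = "is associated with") -> list:
    """A weak link reads the same regardless of the starting concept."""
    return [Link(a, meta_link, b), Link(b, meta_link, a)]


links = strong_link("Actor", "performs", "Acting", "is performed by")
links += weak_link("Prestige", "Film Festival")

for link in links:
    print(link.source, link.meta_link, link.target)
```

The two Actor and Acting statements differ in vocabulary, while the two Prestige and Film Festival statements are identical apart from the order of the concepts.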
The last two Linkage axioms deal with how agents are linked together. Inter-links between
agents have to be identified separately because of how agents are linked together within the
industry. Firstly, certain agents are grouped together into departments because these agents
all share a speciality and work together exclusively on some aspect of the production cycle.
These agents are directly linked together and their link is codified in the next axiom.
[Linkage 6] Agents are linked to other agents through aggregate concepts; all agents that
belong to the same group are deemed to have a strong link to each other.
For example, both the Director and the Producer belong to the Production Office, where the
production office is the aggregate concept that groups together the agents Director and
Producer. These aggregate concepts come from the industry and it is how the industry groups
agents together. The understanding here is that these agents work closely with each other and
often across many concepts.
On the other hand, the second type of agent link is between agents that are in different groups
in the industry classification. These agents are not linked directly to each other by the
industry as such; rather, they are linked through concepts that involve them. The last
Linkage axiom captures this industry characteristic.
[Linkage 7] Agents who belong to separate groups are linked via a non-agent concept.
For example, in the previous Blocking example, the Director (a member of the Production
Office) is linked to Actor (a member of the Cast) through the concept of Blocking. The
question can be asked why Director and Actor cannot be linked directly because Directors
direct Actors. The answer is that the Director does not just direct Actors. They direct Actors
“on set”, during a specific scene. As such, the act of directing by the Director in relation to
the Actor is very context specific. Therefore, it makes sense to only link the two through that
context.
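Linkage axioms 6 and 7 can be sketched as a simple lookup. The dictionaries `GROUPS` and `INVOLVES_AGENT` and the function `agent_link` are hypothetical names used only for this illustration, and the data is limited to the examples given in the text: agents in the same group are linked through the aggregate concept, and agents in different groups are linked only through non-agent concepts that involve both of them.

```python
# Hypothetical extract: aggregate concepts (groups) and their member agents.
GROUPS = {
    "Production Office": {"Director", "Producer"},
    "Cast": {"Actor"},
}

# Hypothetical extract: non-agent concepts and the agents they involve.
INVOLVES_AGENT = {
    "Blocking": {"Director", "Actor"},
}


def group_of(agent):
    """Return the aggregate concept that an agent belongs to, if any."""
    for group, members in GROUPS.items():
        if agent in members:
            return group
    return None


def agent_link(a, b):
    """Describe how two agents are connected under Linkage axioms 6 and 7."""
    group_a, group_b = group_of(a), group_of(b)
    if group_a is not None and group_a == group_b:
        # Linkage 6: same group, strong link through the aggregate concept.
        return ("strong link via group", [group_a])
    # Linkage 7: different groups, linked only through shared non-agent concepts.
    shared = sorted(
        concept
        for concept, agents in INVOLVES_AGENT.items()
        if a in agents and b in agents
    )
    return ("linked via concept", shared)


print(agent_link("Director", "Producer"))
print(agent_link("Director", "Actor"))
```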
4.2.2.4 Terminology Axiom
The last type of Concept Axiom is the Terminology Axiom, which is necessary for natural
language reasons as language contains a number of synonyms or equivalencies. An example
of equivalence is FPS which is equivalent to Frames-Per-Second because one is the
abbreviation of the other. There is only one axiom pertaining to equivalence but a number of
other rules arise because of the nature of the axiom.
[Terminology 1] Synonyms are accounted for by linking synonym concepts as equivalent to
the central concept of which they are synonyms; the choice of the central concept is
determined by the industry and/or arises from natural language.
The above axiom by its nature dictates that the following rules hold.
• There is a central concept to which all other synonyms refer.
• A synonym can only be equivalent to one central concept.
• Only the central concept has properties; synonyms absorb all properties from the central
concept.
• Only the central concept has an inheritance hierarchy (parent concepts); synonyms mirror the
inheritance of the central concept.
• Only the central concept can be linked to a synonym; synonyms cannot be linked to other
synonyms.
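The rules above can be sketched as a small Python class. The class `Terminology`, its method names and the `kind` property are hypothetical and introduced only for this illustration. The sketch also assumes that Frames-Per-Second is the central concept and FPS the synonym; the text states only that the two are equivalent.

```python
class Terminology:
    """Hypothetical synonym table following the Terminology axiom."""

    def __init__(self):
        self.properties = {}   # central concept -> its properties
        self.central_of = {}   # synonym -> central concept

    def add_central(self, name, **properties):
        if name in self.central_of:
            raise ValueError(f"{name} is already a synonym")
        self.properties[name] = dict(properties)

    def add_synonym(self, synonym, central):
        # Synonyms may only be linked to a central concept, never to another synonym.
        if central not in self.properties:
            raise ValueError(f"{central} is not a central concept")
        # A synonym can be equivalent to only one central concept.
        if synonym in self.central_of or synonym in self.properties:
            raise ValueError(f"{synonym} is already defined")
        self.central_of[synonym] = central

    def resolve(self, name):
        """Map a synonym to its central concept; central concepts map to themselves."""
        return self.central_of.get(name, name)

    def properties_of(self, name):
        """Synonyms absorb all properties from the central concept."""
        return self.properties[self.resolve(name)]


terms = Terminology()
terms.add_central("Frames-Per-Second", kind="measurement")  # 'kind' is illustrative
terms.add_synonym("FPS", "Frames-Per-Second")

print(terms.resolve("FPS"))
print(terms.properties_of("FPS"))
```

Attempting `terms.add_synonym("Frame Rate", "FPS")` raises `ValueError`, because FPS is itself a synonym rather than a central concept.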
4.2.3 Meta-Link Axioms
The last type of major axiom class is the Meta-Link axiom. The Meta-Link axioms dictate the
names to assign the weak and strong relationships outlined in the Linkage axioms. In essence,
the Meta-Link axioms define what can be classified as weak links. Everything that is not a
weak link is a strong link. This is because there are more types of strong links than there are
weak links. These weak links are semi-artificial constructs in that the manner of their
expression is allowable in natural language discourse but it is unlikely that industry
practitioners would express it exactly in those terms. At the same time, the hierarchical
relationships between the weak and strong meta-links that the following axioms establish are
something tacitly understood by industry practitioners but not necessarily expressly
acknowledged.
[Meta-Link 1] All links between concepts inherit from the root links “is associated with”,
“involves concept”; both of which are considered to be weak relationships.
These are extremely general relationship descriptors that can cover a wide range of
relationships and at the same time convey nothing specific beyond the fact that two concepts are
linked in some way. As such, these are considered weak relationship links.
[Meta-Link 2] The root link “involves concept” has the child links “involves agent”
and “involves component”, which are considered to be weak relationships but, unlike
their parent, do have a direction.
These relationships are a little more specific in that they link a particular type of concept.
That is, “involves agent” links a non-agent concept with an agent concept, and “involves
component” indicates that the second concept is part of the first or, expressed another way,
that the first concept consists of the “component” concepts linked to it via “involves
component”. These are, however, still vague enough to be considered weak relationships.
[Meta-Link 3] Links can either be strong or weak, if a link is not weak (as defined by the
previous axioms) then it must be strong.
The decision to have a binary relationship of either weak or strong was made because in most
cases it is impossible to articulate the range of relationships that exists between weak and
strong. Therefore, the decision was made that if the industry practitioners or the industry
literature could clearly articulate the exact nature of the relationship between two concepts
then those two concepts would be linked using a strong relationship. If on the other hand the
industry practitioners or the industry literature could not clearly articulate the exact nature of
the relationship between two concepts, then they would be linked using the weak link that
suits the two concepts the best.
The weak links are fixed in quantity, with the next axiom setting down what meta-
relationships can be considered weak.
[Meta-Link 4] The weak links are limited to the root links (“is associated with” and
“involves concept”), “involves agent” and “involves component”.
We have made a concerted effort to exclude artificial constructs but in the case of the weak
relationship, we have had to come close to the borderline of such constructs. Industry
practitioners may not be explicit about a relationship but common use of the concepts together
indicates the existence of some sort of relationship. As such, there are no naturally occurring
phrases from the discourse of the domain that can be used to represent weak relationships.
However, the meta-links chosen for the concept relationships are such that, when articulated
out loud, they sound natural and meaningful. For example, Actor is associated with Blocking.
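Meta-Link axioms 3 and 4 together amount to a closed list of weak meta-links, with every other meta-link treated as strong. A minimal sketch of that rule follows; the names `WEAK_META_LINKS` and `link_strength` are hypothetical and used only for this illustration.

```python
# The four weak meta-links fixed by Meta-Link axiom 4.
WEAK_META_LINKS = frozenset({
    "is associated with",   # root link
    "involves concept",     # root link
    "involves agent",       # child of "involves concept"
    "involves component",   # child of "involves concept"
})


def link_strength(meta_link: str) -> str:
    """Meta-Link axiom 3: a link that is not weak must be strong."""
    return "weak" if meta_link in WEAK_META_LINKS else "strong"


for name in ("is associated with", "involves agent", "is screened at", "is performed by"):
    print(name, "->", link_strength(name))
```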
[Meta-Link 5] Except for “involves component” and “involves agent”, children of root
links are considered to be strong links as they are more specific.
This last axiom sets up the meta-link hierarchy, which is shown below in Figure 4.4. The
strong links are the children of the weak links because the weak links can apply to all
relationships and the strong links make the weak links more specific/precise. The decision to
make the strong links into the children of the weak links was made to clearly indicate the
inter-connected nature of the links.
[Figure content: Meta-Link Hierarchy. Weak links: “is associated with”, “involves concept”. 2nd order weak links: “involves agent”, “involves component”. Examples of strong links: “is performed by”, “is used for”, “is screened at”, “is performed during”, “to see”.]
Figure 4.4: The meta-link hierarchy
4.3 Structure of the Ontology
The ontology is structured in three parts and, because of the axioms presented before, the
ontology exists as a lattice-like structure in three axes or dimensions. The ontology consists
of three sub-ontologies: the MPI concepts ontology, the Agent concepts ontology and the
Common concepts ontology. They are detailed in the following sections.
4.3.1 MPI Concepts Ontology
The motion picture industry (MPI) concepts ontology contains all the concepts unique to the
motion picture industry or that have a special meaning within the industry. For example,
Cross-Cutting is an editing technique that is a concept unique to the motion picture industry
and would therefore be modelled as part of the MPI concepts ontology. Likewise, Treatment
is a concept that, while not unique to the motion picture industry, does have a special meaning
within the industry. A Treatment is a document usually prepared by the
Producer as part of a pitch for a new motion picture project. This differs greatly from the
usage of the concept treatment in other domains of discourse and the natural language
discourse of society at large. Concepts that do not have a special meaning within the
discourse of the industry, e.g. action, description, or are types of agents, either people or
companies, are not modelled as part of the MPI concepts ontology.
4.3.2 Agent Concepts Ontology
The Agent concepts ontology models all agents that are frequently involved with the processes
of the motion picture industry. It is, of course, not possible for all agents involved with the
motion picture industry to be modelled, simply because of the sheer number that become
involved in the later parts of the production cycle and life stage timelines. For example, if the
motion picture is deemed of historical significance, an archivist will become involved.
There is also a question of whether agents such as the onsite nurse need to be modelled in the
Agent ontology. Certainly most motion picture shoots would have an onsite nurse or at least
some sort of first aid provider. However, the question becomes where to draw the line. The
line was drawn to include agents that must be captured to give a good
representative model of the discourse of the motion picture industry, and to exclude agents
who are not as essential to the concept model.
As explained in Section 4.2.2.2, the basis for inclusion is ending credits and/or self-
identification. If an agent is included in the ending credits then it can be seen as an industry
affirmation that the agent’s role is part of the industry. However, the agent in question must
still self-identify as being part of the motion picture industry to affirm that selection of the
industry. In addition, if an agent self-identifies as being part of the industry and their work
primarily does involve the motion picture industry, non-inclusion in ending-credit is not a
basis for exclusion from the agent ontology.
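One possible reading of this inclusion rule is sketched below. The function `include_agent` and its parameter names are hypothetical. The sketch assumes that self-identification is always required, and that either an ending credit or work that primarily involves the motion picture industry then suffices.

```python
def include_agent(in_ending_credits: bool,
                  self_identifies: bool,
                  works_primarily_in_mpi: bool) -> bool:
    """Decide whether an agent belongs in the Agent ontology (one reading of the rule)."""
    if not self_identifies:
        # An ending credit alone is not enough; the agent must affirm the selection.
        return False
    # A missing ending credit is not a basis for exclusion if the agent's work
    # primarily involves the motion picture industry.
    return in_ending_credits or works_primarily_in_mpi


print(include_agent(in_ending_credits=True, self_identifies=True, works_primarily_in_mpi=False))
print(include_agent(in_ending_credits=True, self_identifies=False, works_primarily_in_mpi=False))
print(include_agent(in_ending_credits=False, self_identifies=True, works_primarily_in_mpi=True))
```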
That is not to say that the Agent ontology only models agents specific to the motion picture
industry. Certain common concept agents are necessary to model accurately the domain of
discourse. For example, an electrician is a common concepts agent that needs to be modelled
as an electrician plays such an important role during the production of the motion picture.
Taking into account the variation of the various types of agents, the ontology does identify
agents as creative, technical or a hybrid of the two, with MPI specific agents having an
explicit link to a phase of the production cycle.
4.3.3 Common Concepts Ontology
The common concepts ontology models those common concepts that either act as parents to
motion picture industry specific concepts, such as action, which needs to be modelled to
properly identify concepts such as acting as types of action, or occur
so frequently in the industry that a proper concept model for the industry could not be
developed without modelling them. For example, the concepts of fee and lunch break
would need to be modelled to capture accurately various processes of the motion
picture industry.
The question here, much like that needed for the agent concept, is where the line is to be
drawn. With common concepts the line is easier to draw. Any concept that does not occur
frequently in the motion picture industry can be safely discarded, with concentration being
focused on only those common concepts that are necessary. By “necessary”, we mean which
common concepts are expressed with such high frequency in relation to motion picture
industry specific concepts that not including them would result in the improper modelling of
the motion picture industry specific concepts. In short, the common concept ontology was
only developed to the extent that it was absolutely necessary to do so to model accurately the
MPI concepts ontology and the Agent concepts ontology.
4.3.4 The Root Concepts
For the Loculus ontology we made the conscious decision to capture the discourse of the
domain, as stated in Inclusion Axioms 1 and 2. Our root concepts are concepts that are
naturally the parents of concepts within the motion picture industry and concepts that
industry professionals would intuitively understand and indeed how they would group
together the concepts of their industry. However, it must be emphasised that our root
concepts address the questions of who, what and where, with the when being addressed
separately by the temporal context. Of the root concepts, Agent addresses the who question;
Artifact, Action, Process, Tool and Technique address the what question; the where question
is addressed by the concepts that are the children of Description. However, the root concepts
only apply to the MPI specific concepts contained within the MPI ontology.
The concepts that form part of the common ontology do not necessarily inherit from the root
concepts, but the root concepts are common concepts that have been chosen specifically for
the MPI concepts on the basis that these are the concepts under which the MPI concepts are
naturally grouped per their use in natural discourse. This is not to say that none of the
common concepts inherit from the root concepts. Where the classification is obvious, such as
Address inheriting from Description, the inheritance is noted. Otherwise the common
concepts are left alone, as in the case of Lunch: is it a process, or should it be classified
as a description because that is how the concept is used in the industry?
In addition, it must be noted that the concepts relating to the two timelines, i.e. the temporal
context, do not inherit from anything. They could potentially be grouped under Description
but that is not necessarily a natural grouping for them. Ontological elegance would dictate
that they be grouped under something like Temporal Phase or Temporal Description.
However, either of those two concepts would be wholly artificial and not related to discourse
of the industry. Often in ontology engineering, for the sake of ontological elegance such
artificial concepts are employed. However, as identified by Hepp, these artificial concepts do
not make sense to industry practitioners and often confuse them, thereby discouraging them
from using the ontology [40]. In addition, artificial constructs have the potential to throw off
perspective sensitivity by moving the ontology away from the natural discourse of the
industry. We have therefore opted to sacrifice ontological elegance in order to remain
faithful to the industry discourse as best we can.
In this manner we modelled the concept of Editing, shown in Figure 4.5. Editing inherits
from the root concept of Action and has linkages to the agent Editor and to the temporal
context through the concepts of Post Production and Production, where post production is the
phase of the production cycle in which editing is performed and production is the life stage
under which the industry classifies editing.
[Figure content: editing inherits from action; editing is performed by editor; editing is performed during post production; editing is classified under production.]
Figure 4.5: Ontology Extract – Editing
Editor in turn is descended from the root concept of Agent but the descent is far more
complex, due to the hierarchical structures employed by the industry. As shown in Figure 4.6,
the concept of Editor inherits from the concepts of Technical Crew and Creative Crew, as
well as that of Person. Technical Crew and Creative Crew are both children of the concept of
Crew. However, the industry dictates that both types are necessary as there is a distinction.
For example, a cameraman is only a member of Technical Crew as their function is generally
simply to operate the camera. On the other hand, the Director of Photography (DoP) is a
member of both the Creative and Technical Crew because the DoP has an instrumental
input in the creative direction of the film and works closely with the director to bring the
script and the director’s vision to life. Different from both the Cameraman and the DoP is the
costume designer, who is a member only of the Creative Crew. It is necessary to state explicitly
that the Editor is a Person because non-persons, such as companies and animals, can also be
agents of the industry. However, the pertinent point to note here is that eventually Editor does
descend from the root concept.
[Figure content: nodes editor, person, creative crew, technical crew, postproduction, Editorial Department, people, crew and agent; link labels “inherits from”, “works during”, “is part of”, “is one of” and “type of”.]
Figure 4.6: Ontology Extract – The Inheritance of Editing
The next logical question is how the root concepts are modelled within the ontology. As
shown in Figure 4.7, the root concept of Action is naturally the topmost level of abstraction
and has no parents. It is explicitly linked to the concept of motion picture. As common
concepts, root concepts are not required to have a temporal association; in this case the root
concept of Action does have one, as this makes it explicit that Action concepts are used throughout
the production cycle. In line with industry usage, action type concepts are mostly associated
with the production phase of the motion picture life stage timeline.
[Figure content: action is associated with motion picture; action is performed during production cycle; action is classified under production.]
Figure 4.7: Ontology Extract – Action
4.3.5 Lattice Structure
The lattice-like structure of the ontology is the result of the Concept Axiom. At the heart of
the lattice is the concept of motion picture, which links all the root concepts together as
shown in Figure 4.8. The motion picture concept has been modelled to be a child of the
concept Artifact (a root concept), which is what it is – the artifact that is the product of the
entire production cycle. The other root concepts are the components that make up the artifact
of the motion picture, which in turn has some specific links to the concepts of Film Festival,
Cinema and Audience. The temporal aspect of the motion picture concept is that it is linked to
the entire Production cycle but is chiefly used during the utilisation stage of the life stage
timeline. The latter, as mentioned before, is a link that is made by the industry through
practice. In this case, the practice that associates the artifact of motion picture with the
utilisation stage is motivated by the fact that the artifact is used during the utilisation stage,
while the earlier stages create the artifact.
[Figure content: motion picture inherits from artifact; motion picture involves component description, agent, action, tool, technique, process and artifact; motion picture is linked to film festival, cinema and audience (link labels “is screened at” and “seen by”); motion picture is associated with production cycle and is classified under utilization. Legend: Inheritance, Temporal, Linkage.]
Figure 4.8: Ontology Extract – Motion Picture
4.3.6 Three Axes
As mentioned before, the lattice-like structure of the ontology exists in a space of three axes.
The first axis is the vertical or inheritance axis, the second is the horizontal or linkage
axis and the third is the temporal axis. These axes are the direct result of the axioms and of
the two contexts, temporal and agent, of the motion picture industry. Almost all the concepts
exist in the three axes; the exceptions arise because, per the axioms, common concepts do not
have to have a temporal association and need not inherit from anything. However, all the
concepts of the MPI ontology and the Agent ontology do exist within the bounds of the three
axes. For example, Figure 4.9 shows the previously introduced concept of Editing contained
within the three axes.
[Figure content: the Editing extract placed on three axes. Inheritance Axis: editing inherits from action. Linkage Axis: editing is performed by editor. Temporal Axis: editing is performed during post production; editing is classified under production.]
Figure 4.9: Ontology Extract – Editing within the Three Axes
The Inheritance Axis is the direct result of inheritance axioms, which set up the root concepts
and then dictate how concepts are to inherit from them and other concepts. Those axioms in
essence set up a vertical plane that starts with the root concepts and goes downwards, or from
any given concept goes upwards towards the root.
The Linkage or horizontal axis is the result of the Linkage axioms that serve to connect one
concept to another horizontally through relationships. This of course incorporates the agent
context of the industry as that too is captured through horizontal linkages, through the linkage
axioms. Lastly, the temporal axioms that capture the temporal aspect of the industry are
responsible for the temporal axis.
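The three axes can be illustrated with a small data structure. The `Concept` class, its field names and the `axes` function are hypothetical and introduced only for this illustration; the values are those of the Editing extract shown in Figures 4.5 and 4.9.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical record holding one concept's position on the three axes."""
    name: str
    inherits_from: list = field(default_factory=list)  # inheritance (vertical) axis
    linkage: list = field(default_factory=list)        # linkage (horizontal) axis
    temporal: list = field(default_factory=list)       # temporal axis


editing = Concept(
    name="editing",
    inherits_from=["action"],
    linkage=[("is performed by", "editor")],
    temporal=[
        ("is performed during", "post production"),
        ("is classified under", "production"),
    ],
)


def axes(concept: Concept) -> dict:
    """Report which of the three axes a concept occupies."""
    return {
        "inheritance": bool(concept.inherits_from),
        "linkage": bool(concept.linkage),
        "temporal": bool(concept.temporal),
    }


print(axes(editing))
```

Under this sketch a common concept that has no parent and no temporal association would report `False` on the inheritance and temporal axes, which matches the exception the axioms allow for common concepts.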
4.4 Ontology Implementation
The ontology was implemented in OWL using the Altova SemanticWorks 2008
software. In this section we discuss the motivation behind the choice of
implementation language and tool, as well as highlight how the ontology has been
implemented in practice. The implementation of the ontology was unavoidably subject to the
restraints of the technology used to implement it. The nature and impact of these limitations
are also discussed.
4.4.1 OWL
As first explained in Section 2.2.3, OWL is a set of languages used for knowledge
representation in the form of ontology authoring [74]. OWL is designed for use by
applications that need to process the content of information instead of just presenting
information to humans [74]. Recommended by the World Wide Web Consortium, it
comprises three semantically related languages: OWL Lite, OWL DL and OWL Full. Each
language is a syntactic extension of its simpler predecessor with OWL Lite being the simplest
of the languages and OWL Full being the most comprehensive and supporting all features of
OWL [74]. As a result, while all valid OWL Lite documents are also valid OWL Full
documents, not all valid OWL Full documents are valid OWL Lite documents. For our
purposes we opted to use OWL Full in order to gain access to all OWL functionality. In
hindsight, it was found that the ontology could have been implemented in OWL DL. However, we did
not know this at the outset and thus opted to use OWL Full.
Our decision to use OWL was made on three grounds. Firstly, we wanted to take advantage
of features that were exclusive to OWL; for example, the property ‘disjointWith’ is
exclusive to OWL and is invaluable in expressing the fact that two concepts are
disjoint despite, say, an overlapping inheritance hierarchy. An example of two disjoint
concepts would be Emotive Category and Criteria Based Category. As can be seen from
Figure 4.10, while they both share the common parent of Category, they are disjoint
from each other as one involves the use of the concept of emotion and the other the concept
of criteria.
The second benefit of using OWL is that it is the recommended language of the Semantic
Web. This makes the Loculus ontology ready for application within web services, without
creating any disadvantages in using the ontology in a non-web context.
We implemented the ontology as far as was possible, with our expressive power being
constrained both by what could be modelled and by what could be articulated. There was no
particular function lacking in OWL that could be explicitly flagged as restricting the
expression of the ontology. Rather, the limitations were part of the overall modelling
process: every time a concept was modelled, the question had to be asked, given the axioms
that govern this ontology and the expressive power of OWL, what is the best way to model
this concept? In short, we constructed the ontology as expressively as we could given the
constraints of the semantics of OWL and in the context of the axioms for Loculus.
[Figure content: nodes category, description, criteria based category, emotive category, genre, mood, criteria, emotions, post production, discovery and production cycle; link labels “inherits from”, “involves concept”, “is used during”, “classify under” and “Disjointed With”.]
Figure 4.10: Ontology Extract – Category and its children
4.4.2 Altova SemanticWorks
Altova SemanticWorks is a visual RDF and OWL editor that conforms to the W3C standard
for OWL. The tool is easy to use and has features that prevent the user from creating an
ontology that does not conform to W3C standards. This tool was used to visually construct
the Loculus ontology.
Physically, the Loculus ontology is in two separate files. The first file contains all non-agent
concepts and the second file contains all agent concepts. We used this separation to improve the
tracking of the elements. However, this did introduce an issue, as SemanticWorks does not
have a feature that easily allows concepts modelled in a separate file to be included. As a
result, we had to employ a workaround in which we referenced the concepts in one file
when we were using them in another. A reference to a concept is “static”: the software reads
it as simple “text”. If the concepts are in the same file, the link is dynamic and
updates when, for example, the name of the original concept is changed, whereas with
references we have to go back and manually update the name of the concept. This makes no
difference to the interpretation of the ontology, but does present a noticeably different
notation in XML form. The concepts from the different files have different prefixes.
In our case the two prefixes were “Loculus” and “LoculusAgent”. In the next section, we
discuss in detail how various concepts are modelled in practice using the various properties of
OWL.
4.4.3 How Concepts are Represented
At the XML level, a concept is represented as an rdf:Description with the name of the concept
given in its rdf:about attribute. Figure 4.11 below shows the XML level representation of the
concept of Editing.
<rdf:Description rdf:about="loculus:editing">
  <rdf:type><rdf:Description rdf:about="http://www.w3.org/2002/07/owl#Class"/></rdf:type>
  <rdfs:subClassOf><rdf:Description rdf:about="loculus:action"/></rdfs:subClassOf>
  <rdfs:subClassOf>
    <rdf:Description>
      <owl:hasValue>loculusAgent:editor</owl:hasValue>
      <owl:onProperty><rdf:Description rdf:about="loculus:isPerformedBy"/></owl:onProperty>
      <rdf:type><rdf:Description rdf:about="http://www.w3.org/2002/07/owl#Restriction"/></rdf:type>
    </rdf:Description>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <rdf:Description>
      <owl:hasValue><rdf:Description rdf:about="loculus:postProduction"/></owl:hasValue>
      <owl:onProperty><rdf:Description rdf:about="loculus:isPerformedDuring"/></owl:onProperty>
      <rdf:type><rdf:Description rdf:about="http://www.w3.org/2002/07/owl#Restriction"/></rdf:type>
    </rdf:Description>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <rdf:Description>
      <owl:hasValue><rdf:Description rdf:about="loculus:production"/></owl:hasValue>
      <owl:onProperty><rdf:Description rdf:about="loculus:classifyUnder"/></owl:onProperty>
      <rdf:type><rdf:Description rdf:about="http://www.w3.org/2002/07/owl#Restriction"/></rdf:type>
    </rdf:Description>
  </rdfs:subClassOf>
</rdf:Description>
Figure 4.11: The representation of Editing at the XML level
The key point to note here is that both the inheritance axis and the linkage axis relationships
have to be expressed within <rdfs:subClassOf> tags. However, the expression of the linkage
axis relationship is more complex than the inheritance axis relationship. Figure 4.12 isolates
the inheritance relationship of Editing to the root concept action.
<rdfs:subClassOf><rdf:Description rdf:about="loculus:action"/>
</rdfs:subClassOf>
Figure 4.12: Inheritance Relationship Representation
As can be seen, the relationship depicted in Figure 4.12 is relatively simple when compared
to the more complex expression used for the linkage axis relationships, shown in Figure 4.13.
For the horizontal relationships, the OWL specific tags <owl:hasValue> and
<owl:onProperty> are used in conjunction to express the linkage between two concepts. In
the case of Figure 4.13, the value is set to “loculusAgent:editor” on the property
“loculus:isPerformedBy”, where the “loculusAgent” and “loculus” prefixes denote the ontology
from which the concept and the property come.
<rdfs:subClassOf><rdf:Description>
<owl:hasValue>loculusAgent:editor</owl:hasValue><owl:onProperty>
<rdf:Description rdf:about="loculus:isPerformedBy"/></owl:onProperty><rdf:type>
<rdf:Description rdf:about="http://www.w3.org/2002/07/owl#Restriction"/></rdf:type>
</rdf:Description></rdfs:subClassOf>
Figure 4.13: Linkage Relationship Representation
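As a rough illustration of how a linkage restriction such as the one in Figure 4.13 could be read back programmatically, the following Python sketch extracts (property, value) pairs using only the standard library. The wrapping rdf:RDF root element, the function name linkage_restrictions and the handling of nested values are assumptions made for the example and are not part of the Loculus tooling.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# The fragment from Figure 4.13, wrapped in an rdf:RDF root (an assumption)
# so that the namespace prefixes are declared and the fragment parses alone.
FRAGMENT = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
<rdfs:subClassOf><rdf:Description>
<owl:hasValue>loculusAgent:editor</owl:hasValue><owl:onProperty>
<rdf:Description rdf:about="loculus:isPerformedBy"/></owl:onProperty><rdf:type>
<rdf:Description rdf:about="http://www.w3.org/2002/07/owl#Restriction"/></rdf:type>
</rdf:Description></rdfs:subClassOf>
</rdf:RDF>
"""


def linkage_restrictions(xml_text):
    """Return (property, value) pairs for every hasValue restriction found."""
    root = ET.fromstring(xml_text)
    pairs = []
    for sub in root.iter(f"{{{RDFS}}}subClassOf"):
        desc = sub.find(f"{{{RDF}}}Description")
        if desc is None:
            continue
        value = desc.find(f"{{{OWL}}}hasValue")
        prop = desc.find(f"{{{OWL}}}onProperty/{{{RDF}}}Description")
        if value is None or prop is None:
            continue  # a plain inheritance link, not a linkage restriction
        # The value may be literal text (as in Figure 4.13) or a nested
        # rdf:Description element carrying an rdf:about attribute.
        nested = value.find(f"{{{RDF}}}Description")
        if nested is not None:
            target = nested.get(f"{{{RDF}}}about")
        else:
            target = (value.text or "").strip()
        pairs.append((prop.get(f"{{{RDF}}}about"), target))
    return pairs


print(linkage_restrictions(FRAGMENT))
```

A plain inheritance link such as the one in Figure 4.12 carries no <owl:hasValue> element, so the sketch skips it.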
The representation of an agent concept at the XML level is not very different from the
representation of non-agent concepts. However, as mentioned before, within the industry
agents are grouped together in departments. We have opted to represent this grouping using
the OWL-specific <owl:oneOf>. Figure 4.14 shows how the concept of editor is represented
at the XML level. As can be seen from the highlighted portion, the editor is modeled to be
one of the concepts from the editorial department.
<rdf:Description rdf:about="loculusAgent:editor"><rdf:type>
<rdf:Description rdf:about="http://www.w3.org/2002/07/owl#Class"/></rdf:type><rdfs:subClassOf>
<rdf:Description rdf:about="loculusAgent:person"/></rdfs:subClassOf><rdfs:subClassOf>
<rdf:Description rdf:about="loculusAgent:technicalCrew"/></rdfs:subClassOf><rdfs:subClassOf>
<rdf:Description rdf:about="loculusAgent:creativeCrew"/></rdfs:subClassOf><owl:oneOf rdf:parseType="Collection">
<rdf:Description rdf:about="loculusAgent:editorialDepartment"/></owl:oneOf><rdfs:subClassOf>
<rdf:Description><owl:hasValue>loculus:postproduction</owl:hasValue><owl:onProperty>
<rdf:Description rdf:about="loculus:worksDuring"/></owl:onProperty><rdf:type>
<rdf:Description rdf:about="http://www.w3.org/2002/07/owl#Restriction"/></rdf:type>
</rdf:Description></rdfs:subClassOf>
</rdf:Description>
Figure 4.14: XML-level Representation of the Concept of Editor
4.5 Ontology completeness
In practice, an ontology can never truly be complete. You build the ontology that you can and
not necessarily the one you want, because reality constrains the development of ontologies
[40]. The domain of discourse is always evolving and therefore an ontology needs to
continually evolve to keep up with it. More importantly, to model a complex concept such as
motion picture fully, a large number of connections must be made. Even then the concept may
not be considered completely modelled by some within the industry, and indeed experts can
step forward and point out missing links that were accidentally overlooked or not included
because they did not exist before. Therefore, for the sake of practicality you have to draw a
line in the sand and make an ontology that is complete enough.
In our specific case, we developed what we could given the limitation of time and other
resources. We have developed the Loculus ontology enough to demonstrate how the “full”
ontology is to be constructed and we have developed the Loculus ontology to the extent that
it can be used meaningfully in answering our driving research questions. However, the
development of the Loculus ontology is still an ongoing effort. Table 4.1 gives some basic
statistics on the ontology as it currently stands. Note that “Meta-Links Declared” gives the number of
Meta-Links in use, not the number of relationships between concepts. The reason for this is
that Altova SemanticWorks does not have a tool that outputs the number of connections
within the ontology.
Table 4.1: Loculus Ontology Basic Statistics
                       Agent    MPI    Common
Ontology Concepts      95       312    35
Meta-Links Declared    5        23     5
4.6 Summary
The first research question, as presented in Chapter 3, is: how to model the domain of
discourse to reflect and include the different perspectives of agents. In this chapter, we
answer this question by presenting the Loculus ontology, which is a domain ontology for the
motion picture industry that captures the natural discourse of the industry. We also presented
a set of axioms that govern the ontology and how the ontology captures key contexts of the
motion picture industry, namely the temporal context and the agent context. We have shown
that the ontology exists in three parts: the motion picture industry concepts ontology, the
Agent ontology and the common concepts ontology. Due to the axioms and the contexts of
the industry, most concepts in the ontology exist within a three-axis space. These axes are the
inheritance/vertical axis, the linkage/horizontal axis and the temporal axis.
The Loculus ontology, by modelling the agent along with the domain concepts and by
capturing the temporal context of the industry, models the domain of discourse to reflect and
include the different perspectives of agents, thus addressing the research question presented
in Section 3.1: How to model the domain of discourse to reflect and include the different
perspectives of agents?
Now that the ontology is in place, the question is how we plan to use it. In the next chapter
we address this question as we present a score-based relatedness metric that can be used to
measure the relatedness of the concepts within the Loculus ontology.
5 Semantic Relatedness Metric
“Look, I don't want to get into a semantic argument, I just want the protein” – Marty, Grosse
Pointe Blank
In the last chapter we presented the Loculus Ontology as a domain ontology for the motion
picture industry. Now that we have the ontology the next question is: how can the ontology
be used to explore agent perspective as posed in the research question presented in Section
3.1? Therefore, our goal is to develop a relatedness metric based on analysis of the ontology
that best approximates the human cognitive sense of relatedness of concepts.
The ontology models the domain of discourse, including the agents, as a hierarchical graph
with the nodes of the graph being the concepts of the ontology and the edges of the graph
being either inheritance links or linkage links. There is an added temporal layer that
associates the concepts to the two timelines of the industry. For example, Figure 5.1 shows
the concept of Editing with an inheritance link to the concept of Action and with a linkage
link to the agent Editor and, lastly, with temporal links to the two timelines of the industry.
[Figure 5.1 diagram: editing inherits from action and is performed by editor. Temporal links connect to the Production Cycle timeline (pre-production, production, post-production; Production Cycle as a whole) and the Life Stage timeline (conception, production, utilisation; distribution, discovery, access, preservation, reuse/repurpose).]
Figure 5.1: Ontology Extract - Editing
It is our contention that an agent’s perspective is based on the agent’s position within the
ontology (the model of the domain of discourse) and can be quantified by measuring the
semantic relatedness between the agent and the other concepts within the ontology. For
example, from Figure 5.1, it is clear that the Editor has a close association with the concept of
Editing. The Editor does not, however, have any visible association with the concept of
Acting. That is not to say the concept of Acting cannot be reached from the position in the
ontology the editor occupies but the editor would have to reach the concept of Acting via
many other concepts (e.g. editor -> editing -> action -> acting). The path, in short, between
Editor and Acting is longer than the path between the concept of Editor and Editing. This
reach between concepts within the ontology is the central basis of the rule-based metric for
the calculation of semantic relatedness between concepts in a given ontology and it is this
metric that will be presented in this chapter.
5.1 Introduction to the Relatedness Metric
The relatedness metric is devised as a step towards an answer to the research question posed
in Section 3.1: “Can agent perspective be exploited to better serve the needs of the motion
picture industry?” The ontology captures the domain of discourse, including the agents.
Using the ontology we can determine which concepts are close and which concepts are
further apart. However, how do we measure this distance? This is the first question because, until
we have a means of measuring the distance, we cannot even think about exploiting it.
As mentioned in Section 2.3.2 of the literature review, there are a number of ways of
measuring the semantic relatedness of concepts within an ontology. However, the majority of
these methods arose from activities involving linguistics, often based on ontologies with a
tree structure such as the WordNet taxonomy. In addition, the intended use of these metrics
required the metrics to give a very precise measure of similarity. In contrast, what we need is
a relative measure of similarity, as a way to identify concepts as being closely or distantly
related. As discussed in Section 2.3.2 of the Literature Review, simply counting the edges is
not an effective method of measuring semantic relatedness as it does not take into account
issues of abstraction [78], which arise when concrete concepts that are not actually closely
related appear to be closely related through an abstract concept and thus generate an
erroneous measure of relatedness.
To develop the metric for the semantic relatedness for the motion picture industry, we had to
take into account the nature of the industry as well as the structure of the Loculus ontology. This
led to a metric that is calculated along the three axes introduced in the last chapter. The three
axes, once again, are the:
1. Inheritance Axis
2. Linkage Axis
3. Temporal Axis
Before presenting the calculation of the metric, the question must be asked, “what makes
things closely related?”
A major contribution to similarity is common inheritance. If concepts are part of the same
inheritance tree then it follows that the concepts share properties and are therefore closely
related. For example, Method Acting and Stanislavski's System (both being types of acting
techniques) are more related than Method Acting and Cross-cutting, which is a type of editing
technique. Therefore the Inheritance Axis has the greatest impact on the determination of
relatedness. At the same time, many concepts can share a common inheritance but not all of
them are related to the same degree. If the two concepts only acquire one or two common
properties through their common inheritance, they might not be closely related. On the other
hand, if two concepts share many properties because of their common inheritance, they
can be said to be closely related. For example, Acting and Editing are both types of Action, but
they are very different types of Action. As such even though they both inherit from Action
they are not all that similar. On the other hand, Method Acting and Stanislavski's System have
a much closer relationship as they are both specialised forms of Acting. This relationship is
illustrated in Figure 5.2.
[Figure 5.2 diagram: Method Acting and Stanislavski's System under Acting; Acting under Action; Acting is performed by Actor; Production Cycle timeline (pre-production, production, post-production; Production Cycle as a whole) with a +1 temporal score label.]
Figure 5.2: Ontology Extract - Inheritance Links and Linkage Links
Regardless of inheritance, direct links between concepts can actively bring concepts closer
together even when they are not closely related by inheritance. An Actor is a person; Acting is
a type of action. Following the inheritance tree alone would suggest that the two are
unrelated, but through the linkage axis they have a direct link with each other, as Actor
performs the action of Acting. This relationship is also shown in Figure 5.2.
The Inheritance Axis and the Linkage Axis do not function independently. Rather, they work
together to form the metric of Distance by costing the path between two concepts within the
ontology. Rada was the first to propose this metric of Distance [92]. Rada defined Distance
as the average minimum path length over all pair-wise combinations of nodes between two
subsets of nodes [92]. He also successfully used Distance to assess the conceptual distance
between sets of concepts within a semantic net of hierarchical relations [92]. Rada employed
simple edge counting to demonstrate that edge counting can be used to measure the semantic
relatedness between concepts. In contrast, we employ weighted edge counting. We call the
distance between two concepts the reach from concept A to concept B, which is the cost of
the shortest path from concept A to concept B.
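The notion of reach as a weighted shortest path can be sketched in Python as follows. This is a minimal illustration, assuming the ontology has been flattened into an undirected weighted graph; the helper names reach and undirected are hypothetical, the search uses Dijkstra's algorithm (a standard technique swapped in for the example, not necessarily what the Loculus implementation uses), and the edge weights are taken from the rules given in Section 5.2.

```python
import heapq


def undirected(edges):
    """Build an adjacency map in which every edge can be walked both ways."""
    graph = {}
    for a, b, weight in edges:
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def reach(graph, start, goal):
    """Cost of the cheapest path between two concepts (Dijkstra's algorithm)."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return None  # the two concepts are not connected


# The editor -> editing -> action -> acting path discussed earlier in the chapter.
EXAMPLE = undirected([
    ("editor", "editing", 1),   # strong linkage ("is performed by")
    ("editing", "action", 25),  # concrete concept under an abstract root
    ("action", "acting", 25),   # concrete concept under an abstract root
])

print(reach(EXAMPLE, "editor", "editing"))  # 1
print(reach(EXAMPLE, "editor", "acting"))   # 51
```

Under these weights the Editor sits at a reach of 1 from Editing but 51 from Acting, which matches the intuition that the path between Editor and Acting is much longer.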
Lastly, the temporal context of the industry as explained in Section 4.2.1.2 is accounted for
by the Temporal Axis. Where the reach between two concepts is high, their temporal
proximity has limited bearing on their relatedness. For example, Shooting and Costume
Design happen in the same phase but the reach between them is very high. Therefore, their
temporal proximity is of secondary consideration. On the other hand, where the reach
between two concepts is very low, the temporal proximity can add an important layer of
context. For example, the fact that Shooting and Editing (Shooting produces Film, Film is used
during Editing) happen in two different phases adds the context that one feeds into the other,
with the shooting technique employed having a direct impact on the type and quality of shots
available for editing purposes.
5.2 Rules for Calculation
The rules governing the calculation of the metric are based on a mixture of findings from
literature, the axioms that govern the ontology as well as the nature of the industry. The
scores assigned are relative to the scale chosen. We opted to have a scale from 0 to 100, where
a score of 0 indicates an equivalent concept, e.g. FPS and Frame rates per second would get
a score of 0 to indicate that there has been no movement conceptually, and a score closer to
100 implies that the concepts are only related in that they are both part of the discourse of the
motion picture industry, e.g. Method Acting and Camera would get a score of 66.
The scale in turn dictates the values of the scores for each of the rules, which are
presented in the remainder of Section 5.2. By knowing the scale, we can then assign weights to the edges
to reflect the relative strength and weakness of the links, taking into account the axioms that form
the foundation of our ontology (see Section 4.2). The purpose of the relatedness metric is to
give a relative measure of relatedness on the scale chosen. That is, concept A is more related
to concept B than to concept C. The relative magnitude of the scores has meaning: a
low number on the scale means that the concepts are closely related and a large number on
the scale means that the concepts are not closely related. The actual value of the numbers is
not meaningful independent of the scale.
However, what does the scale mean? It must be made clear that the scale does not uniformly
increase in distance. “Very near” is anything from 1 to 10, “near” is anything from 11 to 20, 20
to 40 is “not near and not far”, 40 to 60 is “far” and anything above 60 is “very far”, that is, only
related on account of being part of the same domain of discourse. It must be
reinforced that this is a relative scale and as such the exact numbers are not as important as
the approximate position on the scale.
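The bands of the scale can be expressed as a small lookup, sketched here in Python. The function name describe_score is hypothetical. The text gives 20 and 40 as the edge of two bands each, so the sketch assumes that a shared boundary value belongs to the nearer band; that choice is an assumption of the example, in keeping with the point that approximate position matters more than exact numbers.

```python
def describe_score(score):
    """Map a relatedness score on the 0-100 scale to its descriptive band."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie on the 0-100 scale")
    if score == 0:
        return "equivalent"
    if score <= 10:
        return "very near"
    if score <= 20:  # assumption: a shared boundary goes to the nearer band
        return "near"
    if score <= 40:
        return "not near and not far"
    if score <= 60:
        return "far"
    return "very far"


print(describe_score(66))  # very far (the Method Acting and Camera example)
```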
As mentioned in Section 5.1, the metric score is calculated along the three axes introduced in
the previous chapter. As such, the rules governing the scores are also divided into three
categories: the rules that govern the calculation along the temporal axis, the rules that govern
the calculation along the inheritance axis and the rules that govern the calculation along the
linkage axis.
However, even though the calculations are done on three axes, they result in only two types
of score. As explained in Section 5.1, we refer to the path that joins two concepts as the
reach between those concepts. In order to be able to reach any concept from any other
concept, we have to traverse both vertically and horizontally, i.e. the inheritance and linkage
axes have to be considered together. This leads to the reach score, which is the
value of the path between Concept A and Concept B. The calculation along the temporal axis
results in the temporal score, which takes into account both the production cycle and the life
stage timelines.
There are fundamental differences between the axes that lead to each having very different
impacts on the relatedness score. The inheritance axis links many concepts together through
shared properties. Sharing properties contributes to the relatedness of concepts, but this is
related to the extent of sharing. Certainly, if two concepts share a handful of properties but differ
in only one property, you can say that the concepts are closely related. However, if they
share one property but differ in many others then they are not as related as two
concepts that share many properties and only differ in a handful. For this reason, the
inheritance axis contributes a range of numbers through its multiple rules. The rules and the
scores that each of them contributes are discussed in more detail in the sections that follow.
5.2.1 Reach Score
In mathematics, the concept of distance states that a distance function on a given set M is a
function d: M × M → R, where R denotes the set of real numbers, that satisfies the following
conditions [102]:
1. d(x,y) ≥ 0, and d(x,y) = 0 if and only if x = y. (Distance is positive between two
different points, and is zero precisely from a point to itself – although in our case this
means that the two concepts are in fact the same)
2. It is symmetric: d(x,y) = d(y,x). (The distance between x and y is the same in either
direction.)
3. It satisfies the triangle inequality: d(x,z) ≤ d(x,y) + d(y,z). (The length of a given side
must be less than or equal to the sum of the other two sides but greater than or equal
to the difference between the two sides).
In addition, in mathematics, a metric or distance function is a function which defines
a distance between elements of a set [103].
For our reach score, we adhere to the principles of algebraic distance functions
when assigning scores to the edges. The edges of the path are weighted according to the rules,
with the vertical paths being weighted according to the rules governing the inheritance axis and
the horizontal paths being weighted according to the rules governing the linkage axis. The
rules that we present conform to the algebraic distance function, as the
relatedness distance that is calculated is always: 1) positive, or precisely zero in the case of
concepts that are the same; 2) symmetric, being the same going from concept A to concept
B as from concept B to concept A; and 3) compliant with the triangle inequality. Therefore, our
relatedness metric can be considered a form of distance metric.
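The three conditions can be checked mechanically on a small extract. The Python sketch below is illustrative only: it assumes the extract is given as an undirected weighted edge list, computes all-pairs reach with the Floyd-Warshall algorithm, and then tests each condition. The names all_pairs_reach and is_distance_function, and the choice of extract, are assumptions of the example. Concepts joined by a zero-weight equivalence edge would be treated as one and the same concept, as noted in condition 1.

```python
from itertools import product


def all_pairs_reach(nodes, edges):
    """Cheapest path cost between every pair of concepts (Floyd-Warshall)."""
    inf = float("inf")
    d = {(a, b): (0 if a == b else inf) for a, b in product(nodes, repeat=2)}
    for a, b, weight in edges:
        d[a, b] = d[b, a] = min(d[a, b], weight)  # edges are undirected
    # product() varies its first element slowest, so k is the outer loop.
    for k, i, j in product(nodes, repeat=3):
        d[i, j] = min(d[i, j], d[i, k] + d[k, j])
    return d


def is_distance_function(nodes, d):
    """Check non-negativity, symmetry and the triangle inequality."""
    pairs = list(product(nodes, repeat=2))
    non_negative = all(d[a, b] >= 0 for a, b in pairs)
    zero_on_self = all(d[a, a] == 0 for a in nodes)
    symmetric = all(d[a, b] == d[b, a] for a, b in pairs)
    triangle = all(
        d[a, c] <= d[a, b] + d[b, c] for a, b, c in product(nodes, repeat=3)
    )
    return non_negative and zero_on_self and symmetric and triangle


NODES = ["Method Acting", "Acting", "Actor", "Action", "Editing"]
EDGES = [
    ("Method Acting", "Acting", 2),  # concrete-to-concrete inheritance
    ("Acting", "Actor", 1),          # strong linkage
    ("Acting", "Action", 25),        # concrete concept under an abstract root
    ("Editing", "Action", 25),       # concrete concept under an abstract root
]

distances = all_pairs_reach(NODES, EDGES)
print(is_distance_function(NODES, distances))  # True
```

Because every edge weight is non-negative and every edge is traversable in both directions, shortest-path costs satisfy the three conditions by construction; the check simply makes that visible on a concrete extract.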
5.2.1.1 Inheritance Axis
Of all the axes the inheritance axis contributes most to the concept of relatedness as by
sharing a common inheritance hierarchy, concepts automatically share properties that relate
them to each other. This is reflected in the rules below as the inheritance axis makes the
largest contributions to the relatedness score.
Rule 1: When an inheritance link exists between two concrete concepts, that link or edge is
weighted at 2
This rule is based on Inheritance Axiom 4 that states, assuming that the immediate parent of a
concept is not a root concept, the concept has a closer link to its immediate parent and by
extension its immediate child, than every other concept in its hierarchy. This is a basic edge
definition with the score of 2 picked based on the scale, where 1-10 is considered near. The
reason 2 was picked instead of 1 was partly to differentiate it from the weight assigned by
Linkage Rule 2, explained in Section 5.2.1.2. The other reason for picking 2 as the weight
was to have a weight that stacks in a manner compliant with the
scale that was selected, i.e. if the scale had been different from 0-100, the weight would have
been different. It was a judgement call made both with the scale in mind, as well as the
knowledge of the ontology and its structure.
Rule 2: When an inheritance link exists between two abstract concepts, that link or edge is
weighted at 5
This rule is also the result of the working of Inheritance Axiom 4 and works with the idea of
sharing properties. Two abstract concepts will not share as many properties as two concrete
concepts. Therefore the weight assigned to the edge connecting them must be higher than
what was assigned to the edge connecting two concrete concepts. However, two abstract
concepts inheriting from each other are different from a concrete concept inheriting from an
abstract concept.
When an abstract concept inherits from another abstract concept, that concept might only add
a handful of properties to what it inherited from its parent. The weight assigned to the edge
must reflect this. Going back to the fact that on the scale 1-10 reflects a near relationship, 5
was chosen as a middle point to reflect the complexities of having two abstract concepts that
form part of the same inheritance hierarchy.
Rule 3: When an inheritance link exists between an abstract concept and a concrete
concept, that link or edge is weighted at 25
This rule has grounding both in the axioms and in the literature. Inheritance Axiom 2 states
clearly that the top-level concepts in the inheritance hierarchy represent the greatest level of
abstraction, and from the literature it is clear that if this abstraction is not properly accounted for
the metric is skewed and results in erroneous findings [78]. For example, Acting is the child
of Action, Editing is also a child of Action, but Acting and Editing are very different types of
actions that are undertaken by very different types of agents for very different purposes.
Figure 5.3 illustrates the difference this rule makes. Without Rule 3, the reach between Acting
and Editing would be 4, signifying that they were closely related concepts. With Rule 3,
however, the more distant nature of the concepts is reflected with a score of 50.
[Figure 5.3 diagram, two panels. Without Rule 3: Acting and Editing each link to Action (abstract) at +2, total 4. With Rule 3: Acting and Editing each link to Action (abstract) at +25, total 50.]
Figure 5.3: Ontology Extract – Application of Inheritance Axis Rule 3
The next question is, why +25? Following the chain of argument started in Rule 2, the
concrete concept adds many more properties to the abstract concept to morph into a concrete
concept. Therefore, the weight on the edge should be sufficiently large. The scale is from 0 to
100 and 20-40 is considered not near and not far. 25 was picked from the not near and not
far range as a number that sufficiently reflects the enormity of the step from the abstract into the
concrete.
The next question then is, what is an abstract concept? Within the Loculus ontology the root
concepts of Agent, Artifact, Tool, Technique, Description, Action and Process are explicitly
identified as abstract as explained in the inheritance axiom (Section 4.2.2.1) and, as such,
they would automatically fall within the domain of rule 3. However, there are other concepts
that can be considered abstract and to which rule 3 must be applied to avoid errors related to
abstraction.
This is related specifically to the agent ontology and involves the concepts of People, Person,
Company, Animal, Crew and the two subtypes of Crew: Creative Crew and Technical Crew.
People, Person, Company and Animal sit at the top of the inheritance hierarchy of the Agent
part of the Loculus ontology but are not root concepts. However, if they are not given the
same weighting as the root concept of Agent, distortions such as those identified by Li [78]
will occur. Likewise, while all agents grouped under the subclass of Cast are at least all
actors, such homogeneity does not exist for Crew or even its two subtypes Creative Crew and
Technical Crew.
The problem here is two-fold. Under the subtype of Technical Crew are all manner of agents
from editors through to make-up artists. Without properly accounting for abstraction these
agents are too closely related with each other, as illustrated in Figure 5.4. The editor is a
member of both the technical crew and the creative crew. The editor shares their technical
crew description with the make-up artist. Neither is particularly aware of the technical
details of the other's profession. They have a vague idea what the other does, but so do
people wholly unconnected with the industry. As such, the fact that the editor and the make-up
artist are both part of the technical crew does not add much to their relatedness,
as they share very few properties through this common ancestor. Unless the abstractness of
the Technical Crew concept is accounted for, make-up artists and editors are erroneously
perceived to be closely associated. As such, higher weighting must be assigned to the edges
to account for the abstraction, thus resulting in the proper distance being accounted for in
the context of the crew hierarchy.
[Figure 5.4 diagram: Crew with subtypes Creative Crew and Technical Crew; Editor and Make-up artist beneath them.]
Figure 5.4: Crew Hierarchy and close association between Editor and Make-up artist
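The three inheritance rules reduce to a lookup on whether each end of an inheritance edge is abstract. The Python sketch below is a minimal illustration: the function name inheritance_weight is hypothetical, and the ABSTRACT set simply collects the root concepts and the Agent-ontology concepts named in this section as needing abstract treatment.

```python
# Concepts this section identifies as abstract: the root concepts plus the
# heterogeneous groupings at the top of the Agent ontology.
ABSTRACT = frozenset({
    "Agent", "Artifact", "Tool", "Technique", "Description", "Action", "Process",
    "People", "Person", "Company", "Animal",
    "Crew", "Creative Crew", "Technical Crew",
})


def inheritance_weight(parent, child, abstract=ABSTRACT):
    """Weight of one inheritance edge under Inheritance Axis Rules 1 to 3."""
    parent_abstract = parent in abstract
    child_abstract = child in abstract
    if parent_abstract and child_abstract:
        return 5   # Rule 2: abstract to abstract
    if parent_abstract or child_abstract:
        return 25  # Rule 3: abstract to concrete
    return 2       # Rule 1: concrete to concrete


# Editor and Make-up artist related only through Technical Crew.
via_technical_crew = (
    inheritance_weight("Technical Crew", "Editor")
    + inheritance_weight("Technical Crew", "Make-up artist")
)
print(via_technical_crew)  # 50
```

With Technical Crew treated as abstract, the path between Editor and Make-up artist through that shared ancestor costs 50 rather than 4, placing the two in the "far" band instead of the "very near" one.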
5.2.1.2 Linkage Axis
The best method of conceptualising the function of the Linkage Axis is to think of it as a
marriage. A marriage between two people usually brings two unrelated individuals into a
close relationship with each other. Therefore, on a scale where 0-10 is considered to reflect a
near relationship, the weights on the edges that connect two concepts along the Linkage Axis
must always be below 10 and largely closer to the lower (number-wise) end of the scale. This
is the pertinent argument to keep in mind when considering the rules given below.
Rule 1: If the two concepts are linked via an equivalent relationship then do not increase
the score, i.e. weight the edge to be 0.
This is because, per the Terminology Axioms (Section 4.2.2.4), equivalent relationships are
simply synonyms. As such, they should not add to the score because they are simply two
expressions of the same concept.
Rule 2: If the two concepts are linked via a strong relationship then weight the edge at 1
A strong relationship is direct, tangible and very concrete. As such the score has to be
very low to reflect the close nature of the relationship of the two concepts. This is why the
score assigned to the strong relationship is 1, which is the lowest possible score, as 0
denotes that two concepts are in fact one and the same. In addition, going back to
Inheritance Rule 1, the reason this edge is weighted at 1 and the concrete-to-concrete inheritance
edge is weighted at 2 is that, going back to the example of marriage, while you are
closely associated with your parents, you have a bigger impact on your spouse’s day-to-day
life. The linkage link that connects two concepts indicates that one is dependent on the other
and therefore must have the lowest possible weighting.
Rule 3: If the two concepts are linked via a weak relationship then the edge is to be
weighted at 5
The nature of the score reflects the fact that there exists a direct and tangible link between
the two concepts. However, as the exact nature of the link is abstract, the weight has to be
sufficiently high to reflect the abstractness. When 0-10 is considered to be the weight of a
near relationship, 5 is a good weight for the abstract sideways edge.
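The linkage weights can be captured in a small table, sketched here in Python. The names LINKAGE_WEIGHTS and linkage_weight are hypothetical, and the sketch assumes each linkage edge has already been classified as equivalent, strong or weak.

```python
# Edge weights for the three kinds of linkage relationship described above.
LINKAGE_WEIGHTS = {
    "equivalent": 0,  # synonyms: two expressions of the same concept
    "strong": 1,      # direct, tangible, concrete relationship
    "weak": 5,        # direct link whose exact nature is abstract
}


def linkage_weight(kind):
    """Weight of one linkage edge, given the kind of relationship."""
    try:
        return LINKAGE_WEIGHTS[kind]
    except KeyError:
        raise ValueError(f"unknown linkage kind: {kind!r}") from None


# Every linkage weight stays within the "very near" band of the scale.
print(all(weight < 10 for weight in LINKAGE_WEIGHTS.values()))  # True
```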
5.2.2 Temporal Score
The temporal score is based on the temporal axioms that were presented in Chapter 4 and the
two timelines of the industry, presented once again in Figure 5.5. The two timelines of the
industry together yield the temporal score, and the rules for the temporal score are divided
according to the two timelines. The basic reasoning behind the temporal score rules is that
concepts within the same temporal phase are more likely to be closely related than concepts
that are in different phases. Also concepts in adjacent temporal phases are more likely to be
somewhat related than concepts in more disjointed temporal phases. This is the basic premise
of the majority of the rules for the calculation of the temporal aspect.
[Figure 5.5 diagram: Production Cycle timeline (pre-production, production, post-production; Production Cycle as a whole) and Life Stage timeline (conception, production, utilisation; distribution, discovery, access, preservation, reuse/repurpose).]
Figure 5.5: The two timelines of the industry (Reusing Figure 1.1)
The rules for each of the timelines are presented in the next two sections. It must be noted here
that when the rules speak of “adjacent” and “non-adjacent” temporal phases, they are
referring to the phases of the timeline as presented in sequence in Figure 5.5.
5.2.2.1 Motion Picture Industry Production Cycle
Rule 1: Concepts that are equivalent do not have a temporal component
For example, FPS is simply the abbreviation of Frame rate per second. They are essentially
the same concept. As such assigning a temporal component to them is meaningless.
Rule 2: If the two concepts are in the same phase then add 1 to the score
Concepts that are temporally proximate are more likely to be reliant on each other. For
example, all concepts involved with the production phase rely on each other in order to
accomplish the task of production. Therefore, the score should be low for concepts in the
same phase.
Rule 3: If the two concepts are in adjacent phases then add 3 to the score
As the phases are linked, concepts in adjacent phases can feed into each other. However, this
is not as certain a relationship as the concepts being in the same phase.
Rule 4: If the two concepts are in non-adjacent phases then add 5 to the score
A concept in post-production is likely to have very little to do with a concept in
pre-production. This is the reasoning behind this rule. 5 was therefore picked as a number
sufficiently high to capture this sense of temporal distance.
Rule 5: If one or both the concepts have a link to the production cycle as a whole then add
1 to the score
This is related to Rule 2. If one of the concepts being compared spans the entire
production cycle then regardless of the temporal position of the other concept, it is as if both
the concepts are in the same temporal phase.
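The five production cycle rules can be sketched as a single Python function. This is an illustration under stated assumptions: each concept is taken to carry exactly one phase label, the marker WHOLE_CYCLE stands for a link to the production cycle as a whole, and the function name production_cycle_score is hypothetical.

```python
PRODUCTION_PHASES = ("pre-production", "production", "post-production")
WHOLE_CYCLE = "whole"  # hypothetical marker for "Production Cycle as a whole"


def production_cycle_score(phase_a, phase_b, equivalent=False):
    """Temporal score contributed by the production cycle timeline."""
    if equivalent:
        return 0  # Rule 1: equivalent concepts have no temporal component
    if WHOLE_CYCLE in (phase_a, phase_b):
        return 1  # Rule 5: spanning the whole cycle counts as the same phase
    gap = abs(PRODUCTION_PHASES.index(phase_a) - PRODUCTION_PHASES.index(phase_b))
    if gap == 0:
        return 1  # Rule 2: same phase
    if gap == 1:
        return 3  # Rule 3: adjacent phases
    return 5      # Rule 4: non-adjacent phases


print(production_cycle_score("pre-production", "post-production"))  # 5
```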
5.2.2.2 Motion Picture Life Stage
Rule 1: Concepts that are equivalent do not have a temporal component
This is a repeat of Rule 1 for the production cycle and simply means that when two concepts
are linked through equivalence they do not receive a temporal score from the Life Stage
timeline.
Rule 2: If the two concepts are in the same stage then add 1 to the score
Once again concepts being part of the same temporal phase means concepts are more likely
to be closely associated.
Rule 3: If the two concepts are in adjacent stages then add 3 to the score
Adjacent temporal phases mean concepts are more likely to feed into each other. Adjacent
phases (as defined by the timeline presented in Figure 5.5) have meaning in that one stage
flows into another, and while a motion picture object can simultaneously be in multiple
utilisation stages, this does not render the temporal proximity, or lack thereof, between activities
any less significant.
Rule 4: If the two concepts are in non-adjacent stages then add 5 to the score
Non-adjacent temporal phases mean concepts are far enough apart temporally to warrant a
bigger score. As in the previous case, while the motion picture object can simultaneously be in
multiple utilisation stages, the activities executed on the object are still affected by their
temporal position and so activities undertaken in non-adjacent Life stages are more distant
than those in adjacent Life stages or the same Life stage.
Rule 5: If one of the concepts being compared is an agent then the score for this axis is set
to 0
This comes from the temporal axioms. Because agents do not have a life stage associated with
them, a life stage temporal score cannot be calculated for them. As such, the life stage
component of the temporal axis must be set to zero if one of the concepts involved is an
agent. Once again, this is because the industry does not associate a life stage phase with
agents but does so for non-agent concepts, where the association is based on when the non-agent
concepts are most used or produced. Why the industry makes this distinction is unclear.
However, it is how the industry behaves and we have merely reflected the behaviour in the
rule.
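The five Life stage rules above can be sketched in Python. This is a minimal illustration and not the thesis implementation: the `Concept` record, the function name and the linear stage ordering (standing in for the timeline of Figure 5.5) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed linear ordering of life stages; the actual timeline is defined in Figure 5.5.
LIFE_STAGES = ["conception", "production", "utilisation"]


@dataclass(frozen=True)
class Concept:
    name: str
    life_stage: Optional[str] = None  # None for agents, which have no life stage
    is_agent: bool = False


def life_stage_score(a: Concept, b: Concept, equivalent: bool = False) -> int:
    """Life stage component of the temporal axis (Rules 1 to 5)."""
    # Rule 5: if either concept is an agent, this component is 0.
    if a.is_agent or b.is_agent:
        return 0
    # Rule 1: equivalent concepts receive no temporal score.
    if equivalent:
        return 0
    gap = abs(LIFE_STAGES.index(a.life_stage) - LIFE_STAGES.index(b.life_stage))
    if gap == 0:
        return 1  # Rule 2: same stage
    if gap == 1:
        return 3  # Rule 3: adjacent stages
    return 5      # Rule 4: non-adjacent stages
```

Checking the agent rule first mirrors the text: an agent has no life stage to look up, so no stage comparison is attempted for it.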
5.2.3 Example Calculations
What is the distance between Method Acting and Actor? More importantly, what should the
distance be? Method acting is a type of acting technique. Acting is what actors do. However,
not all actors can or choose to apply method acting techniques. As shown in Figure 5.6, in the
ontology, Method Acting is a child of Acting, while Acting is linked to Actor through the
concrete relationship of “is performed by”. The temporal calculation for Actor and
Method Acting occurs on only one timeline, because the industry does not associate Actor
with a life stage. Therefore, the temporal score is +1 and the
reach score is 3 (with +2 coming from the inheritance from Acting and the other +1 coming
from the strong link that connects Acting to Actor). This gives a total score of 4 on a scale of
1 to 100, indicating a close relationship.
[Figure 5.6 diagram: Method Acting inherits from Acting (+2 reach score); Acting “is performed by” Actor (+1 reach score); +1 temporal score on the Production Cycle timeline (pre-production, production, post-production); total score: 4.]
Figure 5.6: The score calculation of Method Acting to Actor
The next example of calculation is the path between Mood and Category. As shown in Figure
5.7, Mood is a type of Emotive Category. As such it lies in the inheritance hierarchy of
Category. Following the relationship up the inheritance hierarchy yields a reach score of 4,
with each step-up adding +2 to the total as defined by Rule 1 of the Inheritance Axis. In this
case the temporal score is calculated along both timelines, which together yield a score of 2:
Mood inherits its life stage classification from Category, and both
concepts can be said to be in the same production cycle phase because Category spans the entire
production cycle, so the fact that Mood is limited to post-production is of no consequence.
This yields the total score of 6 from Mood to Category.
[Figure 5.7 diagram: Mood inherits from Emotive Category, which inherits from Category (+2 reach score per step); Emotive Category “involves concept” Emotion; +1 temporal score on each of the Production Cycle and Life Stage timelines; total score: 6.]
Figure 5.7: The score calculation of Mood to Category
The next example of calculation is from Mood to Rating, where Rating refers to the
classification of motion pictures (e.g. PG, M, MA). Like Mood, Rating is also a child of
Category. Unlike Mood, Rating is a criteria based category, and to reach it from Mood you
have to go through Category. This is shown in Figure 5.8. The temporal score does not
change in this case. The only increase occurs in the reach score, which is increased
by 4 because of the extra distance that needs to be travelled between Mood and Rating.
[Figure 5.8 diagram: Mood, Emotive Category, Category, Criteria based Category and Rating, connected by four inheritance steps (+2 reach score each); Emotive Category “involves concept” Emotion and Criteria based Category “involves concept” Criteria; +1 temporal score on each of the Production Cycle and Life Stage timelines; total score: 10.]
Figure 5.8: The score calculation of Mood to Rating
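The arithmetic behind the first three examples can be written out as a short Python sketch. The helper name `total_score` is hypothetical; the per-edge and per-timeline values are those reported in Figures 5.6 to 5.8.

```python
def total_score(reach_steps, temporal_steps):
    """Sum the per-edge reach scores and the per-timeline temporal scores."""
    return sum(reach_steps) + sum(temporal_steps)


# Method Acting -> Actor (Figure 5.6): one inheritance step (+2), one strong link (+1),
# and a temporal score on one timeline only, because Actor is an agent.
method_acting_to_actor = total_score([2, 1], [1])

# Mood -> Category (Figure 5.7): two inheritance steps, +1 on each of the two timelines.
mood_to_category = total_score([2, 2], [1, 1])

# Mood -> Rating (Figure 5.8): four inheritance steps, same temporal score as above.
mood_to_rating = total_score([2, 2, 2, 2], [1, 1])

print(method_acting_to_actor, mood_to_category, mood_to_rating)  # 4 6 10
```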
The next example calculates the distance between Film Festival and Prestige. Prestige can be
acquired from being involved in a film festival but that prestige is dependent on multiple
factors. For a new film maker, acceptance into any film festival with a barrier to entry is a
matter of prestige. For an established film maker even the top film festivals do not confer
prestige upon them; rather, they add to the prestige of the film festival by agreeing to take part
in it. For example, the fact that director Steven Spielberg agreed to premiere his movie at the
Cannes Film Festival reinforces the stature of Cannes as the world’s leading film festival.
Steven Spielberg himself is so famous that no film festival can add to his prestige, though
being refused the right to premiere his new movie at Cannes could possibly lead to questions
and a slight reduction in his prestige as that would imply his latest film might be sub-par. By
the same token, the prestige of Cannes Film Festival and that of Steven Spielberg is such that
by refusing the right to premiere his new movie at Cannes, the Film Festival itself can run the
risk of having its own prestige lessened.
It is a complex relationship but the point is that while film festivals and prestige are linked,
the link is by no means obvious or explicit. However, what is explicit is the link between
Film Festival and Award. Film Festivals give awards to those who participate in said
festivals. The award can take the form of an actual judged award that confers either a
monetary reward or simply a prestigious trophy, such as the Palme d'Or award for Cannes
Film Festival that does not have any monetary reward associated with it, or be implicit simply
through participation. Therefore, the link between Prestige and Film Festivals exists through
the concept of Award and specifically the Prestige Award, as shown in Figure 5.9. The total
score of 16 is reached primarily by reaching Prestige through the two weak links “is
associated with” and “involves concept”. The score reflects the weak but still somewhat close
association between Film Festival and Prestige. The score is high enough to reflect that
Prestige is not indispensably associated with Film Festival but still low enough to reflect that
part of the function of Film Festivals is to confer Prestige.
[Figure 5.9 diagram: Film Festival “is associated with” Award (+5 reach score); Prestige Award inherits from Award (+2 reach score); Prestige Award “involves concept” Prestige (+5 reach score); temporal scores of +3 and +1 from the two timelines (Production Cycle and Life Stage); total score: 16.]
Figure 5.9: The score calculation of Film Festival to Prestige
The next example of calculation demonstrates the linkage of two concepts through an abstract
root concept. In this case the two concepts being linked are Score and Prop, which are linked
through Artifact as illustrated in Figure 5.10. Though they are mostly in the same temporal
phase, the temporal score is largely eclipsed due to the high reach score. The high reach score
is a result of the workings of inheritance axis rule 3 (Section 5.2.1.1) which assigns a weight
of 25 to the edges connecting Score to Artifact and Prop to Artifact due to Artifact being a
root concept and as a result, representing the highest level of abstraction.
This is consistent with observed behaviour in the industry. When the reach between concepts
is small, the temporal score can add a layer of context. However, if the reach score indicates
that the two concepts are very distant then their temporal position becomes largely
meaningless. What does it matter if two concepts are in the same phase when they are so
clearly far apart within the context of the domain ontology?
To reiterate, when the reach is small, the temporal context becomes significant and adds a
layer of meaning. If two concepts are closely associated but temporally distant, the temporal
score may give information on sequencing or other information that adds a dimension of meaning and
context. However, if the concepts are far apart within the domain ontology then the temporal
context has no bearing because the concepts are too unrelated.
[Figure 5.10 diagram: Score and Prop each connect to the root concept Artifact (+25 reach score each); temporal scores of +1 and +3 from the two timelines (Production Cycle and Life Stage); total score: 54.]
Figure 5.10: The score calculation of Score to Prop
The last example calculation is between Method Acting and Camera. In this case the two
concepts have a very high total score, 66. This indicates that these two terms are not related at
all, save that they are both part of the motion picture domain. This is reflected in the fact that
the method of getting to Camera involves going through the central concept of Motion Picture.
[Figure 5.11 diagram: path from Method Acting to Camera via the concepts Acting, Action, Motion Picture, Tool and Hardware, including “is associated with” links; reach scores of +2, +25, +5, +5, +25 and +2; +1 temporal score on each of the Production Cycle and Life Stage timelines; total score: 66.]
Figure 5.11: The score calculation of Method Acting to Camera
Figure 5.11 is, however, not the only way to get to Camera from Method Acting. Another
path exists through the agents, and it demonstrates the need for proper weighting for
abstraction. This path, illustrated in Figure 5.12, involves going through Actor up the Agent
ontology inheritance hierarchy and then coming down through Cast. This leads to a higher
score of 69. The agent part of the ontology groups many different agents together under a
handful of abstract concepts. Through these abstract concepts all agents are linked to each
other and to the concepts to which they in turn are closely linked. However, the jump from the abstract
concept of Technical Crew to an actual technical crew member such as a cameraman is extremely
high, i.e. the number of properties inherited by the Cameraman concept from the Technical
Crew concept is extremely small. As a result, any concept to which the Cameraman concept
is connected through the Technical Crew concept cannot be said to be as related to the
Cameraman concept as a concept to which the Cameraman concept is linked through other
means. Although in this case both paths through the ontology produced similar reach scores,
this will not always be the case. The reach score that will be used to determine relatedness
will always be the reach score determined from the minimum weighted path.
[Figure 5.12 diagram: path from Method Acting to Camera via Acting, Actor (“is performed by”), Cast, People/Agent type, Crew, Technical Crew and Camera Operator (“is operated by”); reach scores of +2, +1, +25, +5, +5, +25, +2 and +2; +1 temporal score on each of the Production Cycle and Life Stage timelines; total score: 69.]
Figure 5.12: The score calculation of Method Acting to Camera through Agents
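Selecting the reach score from the minimum weighted path is a standard shortest-path problem. The Python sketch below uses Dijkstra's algorithm on a small graph holding the two routes from Method Acting to Camera. Several things here are assumptions: the edge weights are the values shown in Figures 5.11 and 5.12, but the assignment of each weight to a particular edge and the order of the intermediate concepts are a guess, edges are treated as undirected, and the thesis does not state which algorithm the system uses. Only the path totals (64 and 67 reach) follow from the figures.

```python
import heapq


def min_reach(graph, source, target):
    """Dijkstra's algorithm: smallest summed edge weight from source to target."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return None  # target not reachable


def undirected(edges):
    """Build an adjacency map, treating every edge as traversable both ways."""
    graph = {}
    for a, b, weight in edges:
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


# Per-edge weight placement is assumed; see the lead-in above.
EDGES = [
    ("Method Acting", "Acting", 2),
    # Route through Motion Picture (Figure 5.11)
    ("Acting", "Action", 25),
    ("Action", "Motion Picture", 5),
    ("Motion Picture", "Tool", 5),
    ("Tool", "Hardware", 25),
    ("Hardware", "Camera", 2),
    # Route through the agents (Figure 5.12)
    ("Acting", "Actor", 1),
    ("Actor", "Cast", 25),
    ("Cast", "People", 5),
    ("People", "Crew", 5),
    ("Crew", "Technical Crew", 2),
    ("Technical Crew", "Camera Operator", 25),
    ("Camera Operator", "Camera", 2),
]

graph = undirected(EDGES)
print(min_reach(graph, "Method Acting", "Camera"))  # 64
```

Adding the two +1 temporal scores to the minimum reach of 64 gives the total of 66 reported for Figure 5.11.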
5.3 Abstraction
We have discussed how concepts share, or do not share, properties along the inheritance
hierarchy and how certain concepts have more properties than other concepts. Concepts with
few properties are considered abstract and, because many concrete concepts can inherit from
a given abstract concept, the problem of abstraction in semantic relatedness is created. As
identified by Li, abstraction is a known problem in the measurement of semantic relatedness
[78]. As explained in Section 2.3.2, Li observed that certain concepts are more abstract than
others [78] in a hierarchical semantic knowledge base and if this abstraction is not taken into
account, concepts that are not actually closely related can appear closely related when a
measure of their relatedness is calculated [78].
Within existing measures of semantic relatedness, abstraction is a problem because of
hierarchical taxonomies where certain concepts are parents of many sub-concepts which
specialise the parent concept to a great extent. As we explained through the concept of
Action, the domain of motion picture has many types of Action. Editing is an Action, Acting is
an Action, Directing is an Action, Producing is an Action etc. However, they are all very
different types of action and, while they are all Actions, they do not share very much beyond
that. This makes Action a very abstract concept that must be accounted for properly.
The ability to account for differences in shared properties along the
inheritance hierarchy is necessary to avoid false assumptions of closeness. We believe that
our score-based measure is better able to reflect semantic relatedness in the presence of
abstraction. The key benefit is that under our system, any concept can be labelled as
abstract at the point of implementation and thus specifically accounted for. It does not rely on
position within the trees or concentration of links, although it can work in conjunction with
any such measure, where such a measure is used to first determine a concept to be abstract
before applying the score. In this way it can help overcome some of the abstraction problems
that many face when calculating semantic relatedness.
What might need to change when our method is applied to other ontologies is the weight
assigned to edges. The weights were chosen based on the Loculus Ontology for the motion
picture industry and took into account the structure and axioms of the Loculus Ontology. For
other ontologies, the weight of connecting edges must reflect the structure and axioms of
that ontology. The rules would still apply, but the weights assigned
by those rules might have to change depending on the ontology to which the rules are
applied.
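One way to keep the rules fixed while letting the weights vary per ontology is to hold the weights in a small configuration object. The Python sketch below is hypothetical: the class, field and function names are not from the thesis, and the default values are simply the weights that appear in the example calculations of Section 5.2.3 (inheritance step +2, strong link +1, weak link +5, inheritance edge to an abstract concept +25).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeWeights:
    inheritance: int = 2            # one step along the inheritance hierarchy
    strong_link: int = 1            # e.g. "is performed by"
    weak_link: int = 5              # e.g. "is associated with", "involves concept"
    abstract_inheritance: int = 25  # inheritance edge to a concept labelled abstract


def edge_weight(kind: str, parent_is_abstract: bool = False,
                weights: EdgeWeights = EdgeWeights()) -> int:
    """Weight of one edge; `kind` is 'inheritance', 'strong_link' or 'weak_link'."""
    if kind == "inheritance" and parent_is_abstract:
        return weights.abstract_inheritance
    return getattr(weights, kind)


# Another ontology keeps the same rules but supplies its own weights.
other_ontology = EdgeWeights(abstract_inheritance=40)
print(edge_weight("inheritance", parent_is_abstract=True, weights=other_ontology))  # 40
```

The abstract flag is passed in explicitly, which matches the point made above that a concept is labelled abstract at implementation time instead of being inferred from its position in the tree.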
5.4 Application of the Metric
In this chapter so far we have presented the relatedness metric and have shown how the
metric has been applied to the ontology to determine the relatedness of pairs of ontological
concepts. In this section we will present how we plan to use the combination of the Loculus
ontology and the metric to answer the research questions posed in Chapter 3, specifically the
question - Can agent perspective be exploited to better serve the needs of the motion picture
Industry?
When an agent is involved, the relatedness metric yields the perspective of the agent. The
relationship can be expressed as: $\mathrm{Perspective}_{agent}(concept) = \mathrm{distance}(agent, concept)$.
As illustrated in Figure 5.13, by measuring the semantic relatedness distance between the
agent and the concept we can get an idea of how familiar the agent is with a
certain concept. From the figure we see that an agent occupying the role of Editor has a very
close perspective of the concept Cross-Cutting, while an Actor has a far more distant
perspective. Cross-Cutting is an editing technique; an editor would know all about it and
exactly how to do it. An actor may only be familiar with the term and have some idea of what
it entails, but their knowledge of the technique is far weaker than that of an editor.
Differences in perspective can have implications in the information needs of different agents.
[Figure 5.13 diagram: the distance from Editor to Cross-cutting is 6; the distance from Actor to Cross-cutting is 58.]
Figure 5.13: Agent Perspective
It must be noted that there is a difference between when the distance is measured between an
agent and a concept and between two non-agent concepts. The difference is that non-agent
concepts cannot have perspectives. As such when the relatedness is measured between two
non-agent concepts, it is not perspective but ontological distance. The ontological distance
between concepts and the perspective of agents need to be taken together to understand the
implications they have on the information-seeking behaviour of the agent involved.
One use of agent perspective is that it can approximate how precisely or imprecisely a
query for information exploitation is worded and whether a substitution is possible.
Figure 5.14 demonstrates the perspective of the Editor to four different concepts. Note the
distance of the Editor to each of the concepts and the distance of the concepts themselves to
each other. Even though Cross Cutting and Cutting on Action have a semantic score similar to
that of Category and Genre, the Editor has a closer perspective of Cross Cutting and
Cutting on Action than of Category and Genre. This implies that the Editor comprehends
small differences between Cross Cutting and Cutting on Action. However, the Editor is less
likely to be overly concerned about the differences between Category and Genre; therefore
substituting one for the other may be appreciated by the Editor.
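[Figure 5.14 diagram: the Editor connected to Category, Genre, Cross-cutting and Cutting on Action, with Category linked to Genre and Cross-cutting linked to Cutting on Action; distance labels as extracted: 6, 52, 56, 6, 4, 8.]
Figure 5.14: Editor Perspective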
The above conjecture leads us to formulate an axiom regarding the application of user driven
perspective for information extraction and classification. The axiom is given below.
[Perspective 1] If the distance between two concepts is sufficiently low, an agent who is
sufficiently distant from both concepts may consider the two concepts to be substitutable.
This axiom is a cornerstone of how the combination of the ontology and the relatedness
metric is to be used to meet the information needs of the motion picture industry.
However, the axiom also raises the question of what it means when the relatedness score of two
concepts is designated to be “sufficiently low” or when an agent is
designated to be “sufficiently distant” from a concept. This is defined by the notion of Tolerance. We define
Tolerance as the maximum score the user is willing to tolerate when concept substitutions
are suggested. The term Tolerance also applies to the minimum distance the user needs to be
from a concept before they can be considered sufficiently distant from it. Tolerance is something
that is implemented at the system level, and we did just that with our system. We will discuss
Tolerance and its use at the system level in Section 6.6.1.
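The [Perspective 1] axiom and the two Tolerance thresholds can be expressed as a single predicate. The Python sketch below is an illustration only: the function and parameter names are hypothetical, and both the distances and the threshold values (10 and 50) are made-up numbers chosen for the example, not values reported in the thesis.

```python
def may_substitute(concept_distance, agent_to_a, agent_to_b,
                   max_concept_distance, min_agent_distance):
    """[Perspective 1]: two concepts may be substitutable for an agent when the
    concepts are close to each other and the agent is distant from both."""
    concepts_close = concept_distance <= max_concept_distance
    agent_far = (agent_to_a >= min_agent_distance
                 and agent_to_b >= min_agent_distance)
    return concepts_close and agent_far


# Illustrative values: an agent far from two closely related concepts.
print(may_substitute(6, 52, 56, max_concept_distance=10, min_agent_distance=50))  # True

# Same concept distance, but the agent is close to both concepts.
print(may_substitute(6, 6, 8, max_concept_distance=10, min_agent_distance=50))    # False
```

Keeping the two thresholds as separate parameters reflects the two senses of Tolerance defined above: the maximum score between the concepts and the minimum distance of the user from each concept.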
5.4.1 Information Extraction
Perspective is mostly evident in the extraction of information based on a given query,
because a user formulates a query based on, per the dictionary definition of perspective, the
state of their ideas and the facts known to them. Similarly the user wants an answer framed
from that perspective as well. To that end, for information extraction, perspective can be used
in three ways: to expand the parameters of the query, to clarify ambiguity in the query
and, lastly, to determine the relevance of a given piece of information.
The expansion of the parameters of the query becomes desirable when the user mentions a
concept in their query from which the user is sufficiently distant so that other concepts can be
considered substitutable for the required concept. For example, an editor submits the
query “Return me examples of Cross-cutting in motion pictures of the genre dark comedy”.
Referring back to Figure 5.14, the Editor has a near view of the concept Cross-cutting, as
illustrated in Figure 5.15, so there is no point in retrieving examples of any other editing
technique. On the other hand, the Editor has a sufficiently distant relationship from Genre so
that some substitution may be possible.
[Figure 5.15 diagram: nodes action, editing, editor, cutaway, cutback and cross-cutting; relationships “inherits from”, “is performed by” and “jointly inherits from” (cross-cutting from cutaway and cutback).]
Figure 5.15: Ontology Extract – Editor and cross-cutting
As illustrated in Figure 5.16, if a motion picture has a Genre of “comedy” and has “dark” in
its Category information, then the Genre and the Category can be taken together and the
motion picture presented as a candidate that might satisfy the editor’s query. The central idea
here is that the editor might not realize that there is no such genre as “dark comedy”, but
there is a Genre “comedy” and Tone “dark” (Tone being a child of the concept Category). On
the other hand, a Distribution Manager would understand this distinction and could
reformulate the query themselves if they really wanted to expand the search.
[Figure 5.16 diagram: nodes genre, rating, tone, criteria based category, emotive category, category, description, criteria, emotions, production cycle, post production and discovery; relationships “inherits from”, “involves concept”, “is used during”, “classify under” and “Disjointed With”.]
Figure 5.16: Ontology Extract – Genre, Category and Tone
In terms of the clarification of ambiguity, a user might present a query in such a manner that
it gives rise to some confusion. The source of the confusion could be the presence of a
concept within the query that is not present in the ontology or simply an awkwardly phrased
query that contains concepts that have joint children. For example, as illustrated in Figure
5.17, Cross-Cutting is a joint child of Cutaway and Cutback. If a query contains the concepts
Cutaway and Cutback, it increases the likelihood that what the user is seeking is
Cross-Cutting. However, if the query is posed by an editor, who has a close association with all the
terms (see Figure 5.15), then the probability is that the user really did mean to express the
concepts separately and would not appreciate being asked “Did you mean Cross-Cutting?”.
This is how perspective comes into play. The ontology and relatedness metric can be applied
to information extraction generically. However, only by acknowledging the user and their
perspective can their true query be divined.
[Figure 5.17 diagram: cross-cutting jointly inherits from cutaway and cutback.]
Figure 5.17: Ontology Extract – Cross-cutting
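The joint-child check described above can be sketched as follows. This is a hypothetical illustration: the lookup table, the function name and the closeness threshold are assumptions, and the distances in the usage example are illustrative. The sketch offers a “Did you mean ...?” prompt only when the query names every parent of a joint child and the user is not close to all of those parents.

```python
# Joint children and the parents they jointly inherit from (Figure 5.17).
JOINT_CHILDREN = {"Cross-Cutting": {"Cutaway", "Cutback"}}


def suggest_joint_children(query_concepts, user_distance, close_threshold):
    """Return joint children worth offering as 'Did you mean ...?' prompts."""
    mentioned = set(query_concepts)
    suggestions = []
    for child, parents in JOINT_CHILDREN.items():
        if not parents <= mentioned:
            continue  # the query does not name every parent
        close_to_all = all(user_distance[p] <= close_threshold for p in parents)
        if not close_to_all:
            # A user who knows the terms well probably meant them separately,
            # so only more distant users are prompted.
            suggestions.append(child)
    return suggestions


editor = {"Cutaway": 6, "Cutback": 6}   # illustrative distances
actor = {"Cutaway": 58, "Cutback": 58}  # illustrative distances
print(suggest_joint_children(["Cutaway", "Cutback"], editor, close_threshold=10))  # []
print(suggest_joint_children(["Cutaway", "Cutback"], actor, close_threshold=10))   # ['Cross-Cutting']
```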
The last application of perspective in terms of extraction of information is identifying the
relevance of specific information and also ranking the information for presentation. For
relevance there are two different perspectives that come into play; firstly the perspective of
the user posing the query, and secondly the perspective of the originator of the information.
In other words, if the originator of the information and the user posing the query have the
same role, then the relevance of the information increases. In addition, the determination of
relevance takes into account the other concepts that are present in the piece of information
and the user’s perspective to the information, i.e. the relevance of a piece of information is
directly proportional to the number of concepts present in that piece of information to which
the user has a close relationship.
For example, when an actor poses the question “How do you film a punch?” the actor is most
likely looking for information pertaining to acting tips and techniques and as such the most
relevant piece of information for the actor would be information generated by another actor
that contains concepts to which the role of the actor has close connections, such as Method
Acting, Shooting, Make-up etc. Likewise, when an editor poses the question “How do you
film a punch?” the editor is most likely looking for information from an editing perspective.
Similarly a director would prefer information from the directing perspective. Therefore
perspective could be used in ranking information to be returned to the user.
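A ranking along these lines might look like the Python sketch below. It is only one possible reading of the description: relevance is taken to be the count of concepts in an item to which the user is close, plus a fixed bonus when the originator shares the user's role. The record layout, the function names, the threshold and the size of the bonus are all assumptions, and the distances other than Method Acting (4) are illustrative.

```python
def relevance(item, user_role, user_distance, close_threshold, same_role_bonus=1):
    """Count the item's concepts that are close to the user; add a bonus
    when the originator of the information has the same role as the user."""
    close_concepts = sum(
        1 for concept in item["concepts"]
        if user_distance.get(concept, float("inf")) <= close_threshold
    )
    bonus = same_role_bonus if item["originator_role"] == user_role else 0
    return close_concepts + bonus


def rank(items, user_role, user_distance, close_threshold):
    """Order items from most to least relevant for this user."""
    return sorted(
        items,
        key=lambda item: relevance(item, user_role, user_distance, close_threshold),
        reverse=True,
    )


items = [
    {"title": "Cutting a fight scene", "originator_role": "Editor",
     "concepts": ["Cross-Cutting"]},
    {"title": "Selling a punch on camera", "originator_role": "Actor",
     "concepts": ["Method Acting", "Make-up"]},
]
actor_distance = {"Method Acting": 4, "Make-up": 8, "Cross-Cutting": 58}
ranked = rank(items, "Actor", actor_distance, close_threshold=10)
print([item["title"] for item in ranked])
# ['Selling a punch on camera', 'Cutting a fight scene']
```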
5.4.2 Information Classification
Perspective can be used in information classification in two distinct ways: 1) correction of
errors and ambiguities; and 2) the removal of redundancies. Firstly, perspective can be used
to make corrections in the data being ingested and classified, to improve data quality.
For example, AFTRS Distribution Release forms have the term “music score” under cast list.
In this case “music score” refers to the composer of the music score and not the music score
artifact. This gives rise to some ambiguity that can be picked up by measuring semantic
distance between concepts preceding the appearance of the term “music score” within the
form and through analysis of the nesting structure (“music score” is embedded inside the
“crew list”) but more importantly by analysing the perspective of the originator of the
information. In this case, it is the Producer (student producer to be precise) who is filling in
the distribution release form. The Producer has a closer relationship to the Composer than to
Music Score, but Music Score and Composer are closely linked. Thus the system can propose
that: 1) “music score” is probably not the correct term to use given the context (embedded
inside “crew list”, preceded by agent concepts); and 2) the better concept would be Composer
given that the Composer is a member of the Crew, has a close relationship to Music Score
and a closer relationship to the Producer than Music Score. The system can then flag the
suggestion to the user, with the proposed concept Composer and let the user make the final
decision regarding the correction. The intervention of the user is advisable in order to ensure
the integrity of the classification; perhaps Music Score was really intended.
Indeed the intervention of the user is necessary for the second type of application of
perspective: redundancy removal. In this application the system combs through data to be
classified to flag potential redundancies. To determine a redundancy the system must take
into account the perspective of the originator of the information. If the originator of the
information is sufficiently removed from two concepts that can be considered equivalent
based on the perspective axiom, and the instance values of both concepts are the same, then
it could be a case of redundancy and should therefore be flagged. For example, if a motion
picture is said to have the Tone “dark” and the Mood “dark”, one of those pieces of information is
potentially redundant as Tone and Mood are used synonymously by many agents in the
motion picture Industry.
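The redundancy check can be sketched in Python as a scan over pairs of concepts in a record. The record layout, function name, thresholds and distances below are hypothetical and chosen only to reproduce the Tone and Mood example; the result is a list of flags for the user to review, not an automatic deletion.

```python
from itertools import combinations


def flag_redundancies(record, concept_distance, originator_distance,
                      max_concept_distance, min_agent_distance):
    """Flag pairs of concepts that hold the same value, are close to each other,
    and are both distant from the originator of the information."""
    flagged = []
    for a, b in combinations(sorted(record), 2):
        if record[a] != record[b]:
            continue  # different instance values cannot be redundant
        distance = concept_distance.get(frozenset((a, b)), float("inf"))
        close = distance <= max_concept_distance
        far = (originator_distance[a] >= min_agent_distance
               and originator_distance[b] >= min_agent_distance)
        if close and far:
            flagged.append((a, b))
    return flagged


record = {"Tone": "dark", "Mood": "dark", "Genre": "comedy"}
concept_distance = {frozenset(("Tone", "Mood")): 4}            # illustrative
originator_distance = {"Tone": 52, "Mood": 54, "Genre": 56}    # illustrative
print(flag_redundancies(record, concept_distance, originator_distance,
                        max_concept_distance=10, min_agent_distance=50))
# [('Mood', 'Tone')]
```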
Given that the literature has shown that humans categorize things based on internal
ontologies and find such categories useful [17], an interesting application of the ontology and
perspective might be in the realm of better form design, or more generally, better interface
design for information capture or even modulated interface for information capture that
presents relevant sections to appropriate agents and compiles the information together for
classification and administrative processing. This is beyond the scope of our research but
certainly an area of knowledge capture and information management where our research
might have some application.
5.5 Summary
In this chapter we presented a score-based relatedness metric that measures the relative
distance between concepts. The metric was developed as an answer to the research question:
how can the ontology be used to explore agent perspective, which was posed in Section 3.1.
The relatedness metric is calculated along three axes: the temporal axis, the
inheritance/vertical axis and the linkage/horizontal axis. These scores are used to measure the
perspective of an agent in relation to concepts within the domain of discourse and the
ontological relatedness between non-agent concepts. We plan to use this metric in
combination with the ontology to implement enhanced information extraction and
classification functionality. In the next chapter we discuss the Loculus System, which uses
the Loculus ontology and the relatedness metric presented in this chapter to implement enhanced
methods of information extraction and classification.
6 Loculus System and Loculus Schema
“The Matrix is a system, Neo.” – Morpheus, The Matrix
In Chapter 4, we introduced the Loculus domain ontology for the Motion Picture Industry. In
Chapter 5, we introduced a method by which we can measure the semantic relatedness of
concepts in the domain and suggested ways in which the metric and the ontology can be
applied for information exploitation. In this chapter we present the proof-of-concept
demonstrator that we built to test our ideas. The system was developed to answer the research
question: What kind of a system could exploit agent perspective to better serve the needs of
the Motion Picture Industry?
Originally, the proof-of-concept demonstrator was to be built under the umbrella of a larger
project which did not proceed as originally planned. Therefore the proof-of-concept
demonstrator was built as part of this thesis project within its more limited resource
constraints, such as time and manpower.
There are two aspects to the Loculus System: the Abstract System and the implementation of
the Abstract System into the concrete Loculus System. In this chapter, we first discuss the
Abstract System, before delving into its concrete manifestation, the Loculus System. In
between, we discuss the Loculus Wrapper Scheme, which is used by the System as a
semantic container for internal information storage. The evaluation of the system will be
discussed in Chapter 7.
6.1 The System
6.1.1 The High-Level System Architecture
The high-level system architecture, shown in Figure 6.1, has seven function modules that are
controlled by a central core module, which also interacts with the User Interface (UI). The
UI makes available to the user a range of options that result in the core invoking one or more
modules. Where more than one module is involved the core assists in the orchestration of the
tasks across modules where necessary. In addition, the Ingestation module, Dissemination
Module and the External Data Services Modules are abstract modules that need to be
implemented through plug-ins before their functionality can be invoked. The Semantic
Module too is extended through plug-ins to adapt it to the ontology to be used, as well as the
specific set of relatedness metric rules to be used. In contrast, the functionality of the other
modules is largely static and is unaffected by the ontology or the relatedness rules that are
going to be used.
[Figure 6.1 diagram: the Core connected to the User Interface and to the Record Management Module, Semantic Module, Information Extraction Module, Ingestation Module, Dissemination Module, External Data Services and Classification Module.]
Figure 6.1: High-level System Architecture
The modules presented in Figure 6.1 correspond with a set of tasks available for invocation
from the UI menu, shown in Figure 6.2. The tasks are:
• The View Loculus Record task interacts with the Record Management Module;
• The Discovery And Decision Support task interacts with the Information Extraction
Module;
• The Ingest task interacts with the Ingestation Module and the Classification Module;
• Lastly the Disseminate task invokes the Dissemination Module and also requires
interaction with the Information Extraction Module.
Note, the home and the help buttons take the user to text-only screens with no computational
functionality. The module with which the user has no direct interaction is the Semantic
Module. The External Data Services are interacted with through the Ingest and Disseminate
tasks.
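The correspondence between UI tasks and modules listed above can be captured as a simple lookup table. The Python sketch below is hypothetical: the dictionary and function names are not part of the Loculus System, and the order of modules within each entry follows the wording of the list, not a documented invocation order.

```python
from typing import List

# UI task -> modules the core interacts with, as listed for Figure 6.2.
# The Semantic Module is omitted because the user has no direct interaction with it.
TASK_MODULES = {
    "View Loculus Record": ["Record Management Module"],
    "Discovery And Decision Support": ["Information Extraction Module"],
    "Ingest": ["Ingestation Module", "Classification Module"],
    "Disseminate": ["Dissemination Module", "Information Extraction Module"],
}


def modules_for(task: str) -> List[str]:
    """Return the modules associated with a UI task."""
    try:
        return TASK_MODULES[task]
    except KeyError:
        raise ValueError(f"unknown task: {task}") from None


print(modules_for("Ingest"))  # ['Ingestation Module', 'Classification Module']
```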
Figure 6.2: User Interface - Loculus Menu
Although the UI does not impose any constraints on the ordering of tasks, there is a workflow
within the system that is illustrated in Figure 6.3. Before any information extraction can take
place the system must contain a repository of information. This information repository is
created through the Ingest task. This task first calls the Ingestation Module to ingest a given
set of data provided by the user or received from an External Data Service, such as IMDB.
The Ingestation Module does nothing but read in the data. The Ingestation Module then
passes the ingested data to the Classification Module. The Classification Module then uses
the Semantic Module to classify the information based on ontological concepts present within
the data. The Classification Module also provides additional value-added functions such as
error and redundancy detection. The classified data is then converted into a semantically rich
container and stored as a record within the system records by the Record Management
Module and the user can access these records through the UI’s View Record function. The
records are XML documents that mark up the stored information using relevant metadata
based on the ontology and Loculus metadata Scheme, which will be discussed in Section
6.3.1, wrapping the record within a semantic layer. The semantic layer creates semantic
containers and facilitates higher order semantic operations. Through repeated ingestation a
repository of records is built within the system and the user can then invoke information
extraction related activities meaningfully.
[Diagram: the User Interface, Core, Ingestation Module, Classification Module, Record Management Module, Information Extraction Module, Dissemination Module, Semantic Module and External Data Services]
Figure 6.3: System Work Flow
The Discovery and Decision Support task and the Disseminate task are both forms of
Information Extraction. For Discovery and Decision Support the information extracted is
displayed on screen for the perusal of the user. The display is ephemeral in nature as the
results of such a search are not stored by default. On the other hand, the Disseminate task
results in the information extracted being repackaged in some form for distribution. The
distribution can either entail the user taking the information and storing it in a format for
themselves to refer back to later or the system may distribute the data to external sources,
such as IMDB, by accessing the External Data Services. This is the general behaviour of the
modules and, as described, none of the functionality is tied in any specific way to the
Motion Picture Industry.
6.1.2 The Loculus System Architecture
The architecture becomes tied to the Motion Picture Industry when it is implemented. Figure
6.4 shows a more detailed version of the system architecture. Notice that the Record
Management Module is now called the Loculus Record Management Module and that the
Semantic Module has the Loculus Ontology Reader. These are the methods by which the
general system architecture is customised for use in a given industry. In our case, we have
developed the Loculus metadata Wrapper (to be discussed in Section 6.3.1) to act as the
semantic container for the ingested information and customised the Semantic Module to work
with the Loculus Ontology presented in Chapter 4 and the rules for relatedness metric
calculation presented in Chapter 5. In addition, we developed specific plug-ins for the
Ingestation Module and the Dissemination Module to suit our purposes. The modules are
discussed in detail in the coming sections.
[Diagram: the User Interface and Core, with the Loculus Record Management Module (Write, Read, Create, Delete), the Semantic Module (Distance Metric, Loculus Ontology Reader, Classification Identification, Production Cycle Identification), the Classification Module (Ingest, Classification, Record Formulation), the Information Extraction Module (Query, Ranking, Result, Disseminate), the Ingestation Module (XML Ingestation, Text Ingestation), the Dissemination Module and the External Data Services; further labels shown: IMDB (twice), Google Video, Industry Database]
Figure 6.4: The Loculus System
It must be noted that, technically, the system architecture can be applied to other domains
simply by changing the Semantic Module to work with an ontology of a different domain and
to follow relatedness rules designed around that ontology. The functionality provided by the
other modules is transferable and can potentially be applied with different incarnations of
the Semantic Module.
6.1.3 Technology Choices
The modules were all implemented in the Java object-oriented programming language [104]
due to its cross-platform operability and the ability of Java to be used as a web service
backend. In addition, there were key libraries available from open source initiatives that also
made Java an attractive choice. The libraries used were JDOM [105] and Saxonica’s Saxon
Java Library [106]. The interface was implemented in JavaServer Pages (JSP) [107], also
because of its cross-platform operability and because it enabled the system to be deployed as
a web service.
JDom is a Java-based document object model for XML. We used JDom, firstly, to develop
an ontology reader for the Loculus Ontology, as OWL can be read as an XML document.
Secondly, the JDom Java Library is used for the implementation of other modules and classes
that involve the manipulation of XML, such as within the Record Management Module.
As the repository of records is to be in XML form, Saxonica’s Saxon Java Library was used
for implementation of the Information Extraction Module. The Saxon library provides an API
for executing XQuery queries directly from a Java application [106]. XQuery is a query and
functional programming language that is designed to query collections of XML data [108].
As a language purpose-built for the task, XQuery is an efficient method of querying
collections of XML data.
6.2 Ingestation Modules
At the architecture level the Ingestation Module, illustrated in Figure 6.5, is an abstract class
that is to be extended at the time of implementation to provide the ingestation of specific
kinds of files and data, e.g. XML documents, text documents, multimedia content etc.
[Diagram: the Ingestation Module, extended by XML Ingestation and Text Ingestation]
Figure 6.5: The Ingestation Module
For the purposes of this thesis, only an XML ingestation module has been developed as an
extension to the Ingestation Module of the abstract system. The XML ingestation module (a
Java class) utilises the JDom Java Library [105] to read in XML documents and pass the
data to the Classification Module, which is discussed in Section 6.5.
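The relationship between the abstract Ingestation Module and its XML extension can be sketched as follows. This is only an illustration under stated assumptions, not the thesis implementation: the method name ingest, the Map return type and the leaf-element rule are hypothetical, and the JDK's built-in DOM parser stands in for JDom so that the example has no external dependencies.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical abstract module: subclasses only read data in, they do not classify it.
abstract class IngestationModule {
    // Returns field name -> raw value pairs to be handed on for classification.
    abstract Map<String, String> ingest(String source);
}

// Hypothetical XML extension. The thesis uses JDom; the JDK DOM parser is an assumption here.
class XmlIngestation extends IngestationModule {
    @Override
    Map<String, String> ingest(String xml) {
        Map<String, String> fields = new LinkedHashMap<>();
        try {
            NodeList all = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Node element = all.item(i);
                // Keep only leaf elements, i.e. those whose single child is a text node.
                if (element.getChildNodes().getLength() == 1
                        && element.getFirstChild().getNodeType() == Node.TEXT_NODE) {
                    fields.put(element.getNodeName(), element.getTextContent().trim());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read XML input", e);
        }
        return fields;
    }
}

class IngestationDemo {
    public static void main(String[] args) {
        String xml = "<form><title>Example Film</title><director>A. Person</director></form>";
        System.out.println(new XmlIngestation().ingest(xml));
    }
}
```

A text or multimedia ingestation plug-in would extend the same abstract class and return data in the same shape, which is what allows the rest of the workflow to stay unchanged.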
6.3 Record Management Module
The Record Management Module stores information in a chosen format and assists in the
manipulation and exploitation of that information. The format is not defined at the high-level
architecture, but what is specified is that the records created by the Record Management
Module must be semantic containers, i.e. they must be marked up using some metadata
schema, with the schema to use being defined at the time of implementation.
In the Loculus System implementation, the semantic container is formed using the Loculus
Wrapper Schema, which is an XML schema designed for use with content from the Motion
Picture Industry. This schema is discussed in Section 6.3.1. With the use of this schema, the
Record Management Module is renamed the Loculus Record Management Module to
differentiate it from its architectural counterpart that does not have a specific semantic
container format. The Loculus Record Management Module is illustrated in Figure 6.6 and
the implementation details of the module will be discussed in Section 6.3.2.
[Diagram: the Loculus Record Management Module with the operations Write, Read, Create and Delete]
Figure 6.6: The Loculus Record Management Module
6.3.1 Loculus Wrapper Schema
As discussed in Section 2.1, metadata schemas are used to create semantic containers for
information. The benefit of a semantic container is that it facilitates higher order semantic
manipulation: it adds a layer of semantics to the information by marking it up in a manner
that references the ontology. This makes the information easier to manipulate subsequently.
Without this semantic container it is more problematic to perform semantic manipulation on
the information using an ontology, because the information lacks the semantic layer and
tasks therefore become more computationally intensive and inefficient.
For the Loculus system to perform higher order information exploitation, it must first ingest
information. During the ingestation process the system must ingest many types of data,
including data that is already encoded in one metadata standard or another. The best way to
deal with this is to create the Loculus Metadata Wrapper Schema that effectively wraps the
ingested data to form a semantic container for the record.
In addition, as discussed in Section 2.1, many metadata schemas already exist in which some
of the information produced by the industry is encoded. When storing the information, the
mark-ups conferred on the information by those metadata schemas must be preserved so as
not to lose the contextual information that surrounded it. Therefore the Loculus
Wrapper Schema is also designed to wrap information in such a manner that it preserves
original mark-ups.
[Diagram: the Life Stage timeline, with the labels conception, production, utilisation, distribution, discovery, access, preservation and reuse/re-purpose]
Figure 6.7: The Life Stage Timeline
The primary foundation and the source of the tags for the schema was the life stage timeline,
which is illustrated in Figure 6.7. This is the classification the industry uses for its
information, with the primary basis being either the life stage in which the information
originated or the life stage in which the information is chiefly used. Therefore the wrapper
schema contains tags corresponding to each life stage of the Motion Picture Industry, so that
ingested information can be classified under one of those tags, thereby creating the semantic
container to encapsulate the ingested information. The wrapper is illustrated in Figure 6.8.
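[Diagram: the root element Loculus with the level 1 elements Discovery, Rights, Production, Agent, Provenance, Utilization, Preservation, Artifact, Distribution and Governance; Governance has the child elements Ingest, Administrative, Security, Algorithm, Dissemination and Users]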
Figure 6.8: The Loculus metadata Wrapper Schema
As can be seen from Figure 6.8, the mandatory elements of Loculus are one level deep,
taking the root element to be at level 0. The level 1 elements or tags come from the life stage
timeline, except the Governance element. The Governance element contains the metadata
about the Loculus object itself and is designed to be used by systems that utilise the Loculus
metadata Wrapper Schema for the governance of the semantic container that the wrapper forms.
The details of the tags are given below.
• Loculus: Root element with the attribute ID
• Production: Level 1 element that is the parent element of all production related elements
and their content that have been ingested, e.g. call sheets, camera sheets, production stills
etc.
• Distribution: Level 1 element that is the parent element of all distribution related
elements and their content that have been ingested, e.g. publicity information, technical
information such as aspect ratio and frame rates that are required for distribution etc.
• Discovery: Level 1 element that is the parent element of all discovery related elements
and their content that have been ingested, e.g. final title, working title, rating information
etc. Discovery is also the parent element under which all description information, usually
described by Dublin Core metadata, is stored. This is because such information usually
aids discovery.
• Utilisation: Level 1 element that is the parent element of all Utilisation related elements
and their content that have been ingested, e.g. where the motion picture is being screened,
information about DVD releases, online releases etc.
• Preservation: Level 1 element that is the parent element of all preservation related
elements and their content that have been ingested, e.g. information regarding artistic
intent so that when preservation efforts are undertaken, the intent of the work is not lost
etc.
• Artifact: Level 1 element that is a parent element with three child elements: technical,
content and associations. The technical child element
holds technical information about an artifact. The content element holds semantic and
syntactic metadata for the content of the artifact. Lastly, the associations child element
indicates the other artifacts that are associated with the artifact in question. In this case,
artifacts are items that result from the motion picture project process, e.g. the motion
picture itself, the motion picture sound track, props, costumes etc.
• Agent: Level 1 element that is the parent element of all people, organisation and their
associated roles in relation to the artifact, e.g. cast and crew names.
• Rights: Level 1 element that is the parent of all rights related information associated with
the artifact that can be exercised by Agents based on their role, e.g. who owns distribution
rights etc.
• Provenance: Level 1 element that is the parent of an ever evolving record of actions
performed on and events affecting the artifact, with reference to specific rights and agents
as required, e.g. who originally distributed the motion picture, who is attempting to
preserve the motion picture etc.
• Governance: Level 1 element that is the parent element of all elements and information
needed for the use and governance of the Loculus object. The child elements of
governance are Administrative, Ingest, Dissemination, Users, Security and Algorithm.
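The element hierarchy listed above can be made concrete with a short sketch that builds an empty wrapper document. The element names are taken from the list; everything else is an assumption made for illustration: the class and method names are hypothetical, the capitalisation of the Artifact child elements is a guess, and the JDK DOM API is used in place of JDom.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper that creates an empty semantic container for one artifact.
class LoculusSkeleton {
    static final String[] LEVEL_1 = {"Production", "Distribution", "Discovery", "Utilisation",
            "Preservation", "Artifact", "Agent", "Rights", "Provenance", "Governance"};
    static final String[] ARTIFACT_CHILDREN = {"Technical", "Content", "Associations"};
    static final String[] GOVERNANCE_CHILDREN = {"Administrative", "Ingest", "Dissemination",
            "Users", "Security", "Algorithm"};

    static Document emptyRecord(String id) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No XML parser available", e);
        }
        // Level 0: the root element carries the ID attribute.
        Element root = doc.createElement("Loculus");
        root.setAttribute("ID", id);
        doc.appendChild(root);
        // Level 1: one element per life stage, plus Artifact, Agent, Rights,
        // Provenance and Governance.
        for (String name : LEVEL_1) {
            Element level1 = doc.createElement(name);
            root.appendChild(level1);
            if (name.equals("Artifact")) addChildren(doc, level1, ARTIFACT_CHILDREN);
            if (name.equals("Governance")) addChildren(doc, level1, GOVERNANCE_CHILDREN);
        }
        return doc;
    }

    private static void addChildren(Document doc, Element parent, String[] names) {
        for (String name : names) {
            parent.appendChild(doc.createElement(name));
        }
    }

    public static void main(String[] args) {
        Document record = emptyRecord("example-001");
        System.out.println(record.getDocumentElement().getChildNodes().getLength());
    }
}
```

Ingested content, including any original mark-up that must be preserved, would then be appended beneath the level 1 element chosen by classification.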
The Loculus metadata wrapper schema should be able to ingest and disseminate a variety of
metadata standards responsible for the capture of information at various stages of the Motion
Picture life stage timeline, as well as metadata standards responsible for capturing
information for various uses and aspects of the artifact, where artifacts are the products that
result from the motion picture production process, e.g. the motion picture itself. It must be
made clear that the idea is to have one semantic container per artifact. For example, for the
Star Trek movie released in 2009, there would be only one semantic container formed using the
Loculus metadata Wrapper, in which almost all ingested information related to the movie
would be hosted. So, one motion picture would only have one semantic information container
formed using the Loculus metadata Wrapper schema.
One of the goals of Loculus is to eliminate information duplication. As such, when ingesting
metadata, Loculus consolidates information under the most suitable parent element
at level 1. For example, both METS and MPEG-21 have sections that contain Dublin Core
metadata. When these are ingested to form one coherent Loculus metadata object, the two
Dublin Core-based description sections of the two documents would be reconciled and stored
in a single place within the Loculus document. When disseminating, along with METS and
MPEG-21 metadata objects, Loculus would also be able to disseminate a Dublin Core-only
metadata object or any other metadata object that the information allows Loculus to produce.
In summary, the Loculus Metadata Wrapper schema was developed by us as a mechanism to
create an aggregated semantic container for use by the Record Management Module of the
Loculus system in which all information about a motion picture is captured.
In addition to this project, the Loculus Metadata schema was used in the Digital Artwork
Expression Language (DAEL) as a metadata wrapper [109], which is an unrelated project that
was developing tools and techniques for the semantic analysis of still images.
6.3.2 Implementation Details
As introduced in Section 6.3, the Loculus Record Management Module wraps the ingested
data using the Loculus Wrapper Schema and produces a Loculus Record for use internally
within the system. One Loculus Record corresponds to one given Motion Picture and spans
all the life stages of said Motion Picture. The same record can deal with multiple instances of
the Motion Picture, e.g. the final cut intended for cinemas, the director’s cut that was part of
the special release DVD or the cut for airline screens. The Loculus Record Management Module
works with classified data from the Classification Module to formulate its records; it does not
deal with raw data directly from the Ingestation Module. Currently, the records are XML flat
files produced by the JDom Java Library. Figure 6.9 shows an example Loculus Record.
Figure 6.9: Example Loculus Record
The Loculus Record Management Module is also invoked by the Dissemination Module to
provide the extracts or, as the case may be, the whole record for dissemination, e.g. to IMDB.
The extracts of information or the record to be disseminated are based on the results of
information extraction dictated by the Information Extraction Module, which handles user
queries. So, the Loculus Record Management Module works in conjunction with the
Information Extraction Module, detailed in Section 6.6, and the relevant incarnation of the
Dissemination Module, detailed in Section 6.7, to provide and facilitate the dissemination of
relevant information. Internally, it uses JDom Java Library to read the records it has in
storage and extracts the relevant subsection of the XML record document and passes it on to
the Dissemination Module for preparation for dissemination. The Loculus Record
Management Module itself does not do any preparation.
Lastly, the Loculus Record Management Module can also be invoked directly by the user
through the UI to return a specific record. The user can then manually modify the record if
they so choose. The user can also delete records from the UI.
6.4 Semantic Module
From the perspective of the research being reported in this thesis, the Semantic Module,
illustrated in Figure 6.10, is the most significant module of the entire Loculus System. Its
significance lies in the fact that it is the module that utilises the Ontology and the semantic
relatedness metric calculations to assist in information exploitation by supporting the
functions of the other modules.
[Diagram: the Semantic Module with the classes Distance Metric, Loculus Ontology Reader, Classification Identification and Production Cycle Identification]
Figure 6.10: The Semantic Module
As mentioned before, at the architecture level the Semantic Module is a non-specific module
not tied to any particular ontology or set of rules for relatedness metric
calculations. At the implementation level we bound the module to the Loculus ontology by
giving it an ontology reader adapted to reading the Loculus Ontology. We also developed a
distance metric calculation class that followed the rules we detailed in Section 5.2. The four
classes shown in Figure 6.10 are discussed in the following subsections.
6.4.1 Ontology Reader Class
This class is responsible for reading the ontology. It does so by parsing the ontology using
JDom, an open source Java-based document object model for XML that can also be applied to
RDF and OWL, and is thus able to find and return concepts within the ontology. It is a helper
class that assists the other classes by handling the ontology, allowing the other classes to be
written in general terms. For example, another class simply asks the ontology reader for the
parent concept of a given concept and is not concerned with how the parent-child relationship
is expressed at the ontology implementation level in OWL. The list of methods for the class is
given in Figure 6.11, with visible methods marked with a green dot and private methods
marked with a red dot.
Figure 6.11: The methods of the Loculus Ontology Reader
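To illustrate how an ontology reader can hide the OWL encoding from its callers, the following sketch reads named subclass relationships from an OWL document treated as XML. It is a simplified stand-in, not the Loculus Ontology Reader: the class and method names are hypothetical (they are not the methods of Figure 6.11), the JDK DOM parser replaces JDom, and only parents given as rdf:resource references on rdfs:subClassOf are handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader: callers ask for parents and never see the OWL syntax.
class SimpleOntologyReader {
    private static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    private static final String RDFS = "http://www.w3.org/2000/01/rdf-schema#";
    private static final String OWL = "http://www.w3.org/2002/07/owl#";

    private final Map<String, List<String>> parents = new HashMap<>();

    SimpleOntologyReader(String owlXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            NodeList classes = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(owlXml)))
                    .getElementsByTagNameNS(OWL, "Class");
            for (int i = 0; i < classes.getLength(); i++) {
                Element cls = (Element) classes.item(i);
                String name = cls.getAttributeNS(RDF, "about");
                if (name.isEmpty()) continue; // skip anonymous classes
                List<String> found = new ArrayList<>();
                NodeList supers = cls.getElementsByTagNameNS(RDFS, "subClassOf");
                for (int j = 0; j < supers.getLength(); j++) {
                    String parent = ((Element) supers.item(j)).getAttributeNS(RDF, "resource");
                    if (!parent.isEmpty()) found.add(parent);
                }
                parents.put(name, found);
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse ontology", e);
        }
    }

    boolean hasConcept(String concept) {
        return parents.containsKey(concept);
    }

    List<String> getParents(String concept) {
        return parents.getOrDefault(concept, List.of());
    }
}
```

With a reader of this shape, a class such as the life stage identifier only ever calls getParents and is unaffected if the way the hierarchy is expressed in OWL changes.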
6.4.2 Classification Identification Class
The Classification Identification class identifies the life stage associated with a given concept
within the ontology. The life stage association of a concept could be inherited through its
parents and this class handles the complexities of discovering the life stage association.
It should be noted that while concepts can inherit from multiple parents, if the parents have
conflicting life stages, the child must have an explicitly defined life stage at the ontology level.
This is because it is impossible to resolve such a conflict at the system level. Should the
system encounter such a situation, it can do nothing but flag to the user that there is a
problem with the ontology and that the system cannot determine the life stage of the given
concept. In an ideal world, the ontology would be well-formed and would not require the
conflict checking mechanisms that have been highlighted here. However, for system safety,
we have made the class able to handle malformed ontologies.
The class is designed to return nothing in situations where a concept does not have a life
stage association, which is the case for agents and some common concepts within the
ontology.
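The inheritance and conflict rules described above can be sketched as a small recursive lookup. The names below are hypothetical and the sketch assumes an acyclic concept hierarchy supplied as plain maps; it shows the three outcomes described in the text: an explicit or inherited life stage, no association at all, and a conflict that can only be reported.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical resolver for the life stage associated with a concept.
class LifeStageResolver {
    private final Map<String, List<String>> parents; // concept -> direct parents
    private final Map<String, String> explicitStage; // concept -> explicitly defined stage

    LifeStageResolver(Map<String, List<String>> parents, Map<String, String> explicitStage) {
        this.parents = parents;
        this.explicitStage = explicitStage;
    }

    // Empty result: the concept has no life stage association.
    // IllegalStateException: the parents conflict, i.e. the ontology is malformed.
    Optional<String> lifeStageOf(String concept) {
        String explicit = explicitStage.get(concept);
        if (explicit != null) {
            return Optional.of(explicit); // an explicit definition overrides inheritance
        }
        Set<String> inherited = new HashSet<>();
        for (String parent : parents.getOrDefault(concept, List.of())) {
            lifeStageOf(parent).ifPresent(inherited::add);
        }
        if (inherited.size() > 1) {
            throw new IllegalStateException(
                    "Conflicting life stages for " + concept + ": " + inherited);
        }
        return inherited.stream().findFirst();
    }
}
```

In the prototype the conflict is flagged to the user; an exception is used here only as a compact way to surface the same condition.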
6.4.3 Production Cycle Identification Class
Similar to the Classification Identification class, the Production Cycle Identification class
identifies the production cycle with which a concept is associated.
Once again, this is a non-trivial task due to inheritance. Like life stage, a concept does not
need to have an explicitly defined production cycle but can inherit one from its parents. While
it is possible for concepts to have multiple parents, any conflict between the production cycles
of those parents must be resolved at the ontology level and the system can only throw an
exception when a conflict is detected. As common concepts do not have to have a Production
Cycle association, the class is designed to return nothing in this situation. As before, in an
ideal world the ontology would be well-formed and would not require the conflict checking
mechanisms that have been highlighted here. However, for system safety, we have made the
class able to handle malformed ontologies.
6.4.4 Distance Metric Class
The methods of the Distance Metric Class are presented in Figure 6.12. It contains two
important public methods. The calculate method is responsible for calculating the distance
score between two concepts as illustrated in Section 5.2.3, following the rules for calculating
the reach score presented in Section 5.2.1 and the temporal score presented in Section 5.2.2.
The method is offered with two alternative sets of parameters.
The class is also able to return a list of linked concepts within a given reach score, through
the visible method getLinkedConceptList. For example, starting from the concept Actor,
getLinkedConceptList will return a list of concepts within a certain reach, say a reach of 5.
In the case of Actor this would return such concepts as Acting, Method Acting, Prop and
Costume. Both of these methods are used by the Classification Module and the Information
Extraction Module.
Figure 6.12: The Distance Metric Class
The Distance Metric Class in the Loculus system prototype computes the pair-wise
relatedness score at runtime. In a production system it would be more computationally
efficient to pre-compute the distances between all pairs of concepts when the ontology is
loaded. However, in practice, the runtime calculation has not proven a problem on the
Loculus Ontology.
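The idea behind getLinkedConceptList can be illustrated with a bounded breadth-first search. This sketch makes a simplifying assumption that should be stated plainly: reach is taken to be the number of links between two concepts, which is not the reach score defined in Section 5.2.1, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: concepts linked to a start concept within a maximum reach.
class LinkedConcepts {
    // links maps each concept to the concepts it is directly linked to.
    // The result maps each linked concept to its hop count from the start concept.
    static Map<String, Integer> within(Map<String, List<String>> links, String start, int maxReach) {
        Map<String, Integer> reach = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        reach.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            int next = reach.get(current) + 1;
            if (next > maxReach) {
                continue; // neighbours of this concept would exceed the limit
            }
            for (String neighbour : links.getOrDefault(current, List.of())) {
                if (!reach.containsKey(neighbour)) {
                    reach.put(neighbour, next);
                    queue.add(neighbour);
                }
            }
        }
        reach.remove(start); // the start concept is not reported as linked to itself
        return reach;
    }
}
```

Running such a search once per concept when the ontology is loaded, and caching the resulting maps, is one way the pre-computation mentioned above could be realised.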
In summary, although not directly called by the user, the Semantic Module is responsible for
understanding the semantic relatedness of concepts based on the ontology and the semantic
relatedness metric. It is used extensively by the Classification Module and the Information
Extraction Module and underpins their functions by providing them with the capability to
perform semantic manipulation.
6.5 Classification Module
The Classification Module, shown in Figure 6.13, is responsible for classifying ingested
information by utilising the Semantic Module to:
• Identify the life stage association of concepts present within ingested
information, as that is how information is filed under the Loculus Metadata Wrapper;
• Perform error correction and detect redundancy.
[Diagram: the Classification Module with the classes Ingest, Classification and Record Formulation]
Figure 6.13: The Classification Module
The bulk of the algorithms that form part of this module are general and are part of the
Abstract System, as they are not specifically tied to the Loculus Ontology and Loculus
Relatedness metric. The three classes shown in Figure 6.13 are discussed in the sub-sections
to follow.
6.5.1 Ingest Class
The Ingest class interacts with the Ingestation Module to receive the ingested data in
preparation for classification. In essence, it passes each piece of ingested information to the
Classification class and obtains the correct classification before passing it on to the Record
Formulation class. It does not do any direct ingesting but is used when the Core Module
orchestrates a workflow between the Classification Module and the Ingestation Module.
6.5.2 Record Formulation Class
The Record Formulation class interacts with the Loculus Record Management Module to
form the Loculus Record for the ingested data. In essence, it holds ingested data as it is being
classified before passing the entire ingested and classified information set to the Loculus Record
Management Module for it to either append the ingested data to an existing record or create a
new record, as appropriate. Once again, the class is used as part of the workflow orchestration
by the Core Module.
6.5.3 Classification Class
The Classification Class classifies the ingested information. Firstly, it searches for key
concepts within the information and uses the Semantic Module to detect if the concepts are
present in the Loculus Ontology. If the concepts are present, it uses the Classification
Identification class of the Semantic Module to determine the life stage of each concept. The
life stage is the system of classification used to organise information within the semantic
container formed by the Loculus metadata Wrapper Schema.
Although information being ingested usually involves many ontology concepts, it is normal to
expect the concepts within a piece of ingested information to be from the same life stage of
the motion picture. For example, it would be unusual for information about camera operation
to be grouped with film festival details. Therefore, the Classification Module cross-checks the
ontological relatedness and life stage classification of all identified ontological concepts in
the ingested data in order to provide warnings if ingested information appears insufficiently
related. Indeed, this cross-checking identified an error in the form design of AFTRS. In the
AFTRS form, the concept Music Score appears surrounded by concepts like Producer, Actor
and Director, within the grouping of crew. Music Score is therefore flagged to the user as a
concept that is out of place. In this case, the intended information was Composer, the agent
most associated with music score.
Lastly, for redundancy checking the module takes each concept that has an instance
value, e.g. the concept Tone with the instance dark, and checks whether any other concepts
that are a very small distance from that concept also have the same instance value, e.g. the
concept Mood with the instance dark. If it detects such concepts, it flags these to the user.
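The redundancy check described above amounts to a pairwise comparison of instance values, filtered by concept distance. A minimal sketch follows, with hypothetical names throughout: the distance function is passed in as a parameter and stands in for the Distance Metric Class, and the threshold for "a very small distance" is left to the caller because the text does not give a value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntBiFunction;

// Hypothetical redundancy check over concept -> instance value pairs.
class RedundancyCheck {
    // Flags pairs of concepts that carry the same instance value and whose
    // distance, as reported by the supplied function, is at most maxDistance.
    static List<String> flag(Map<String, String> instanceValues,
                             ToIntBiFunction<String, String> distance,
                             int maxDistance) {
        List<String> concepts = new ArrayList<>(instanceValues.keySet());
        List<String> flagged = new ArrayList<>();
        for (int i = 0; i < concepts.size(); i++) {
            for (int j = i + 1; j < concepts.size(); j++) {
                String a = concepts.get(i);
                String b = concepts.get(j);
                boolean sameValue = instanceValues.get(a).equalsIgnoreCase(instanceValues.get(b));
                if (sameValue && distance.applyAsInt(a, b) <= maxDistance) {
                    flagged.add(a + " / " + b + " = " + instanceValues.get(a));
                }
            }
        }
        return flagged;
    }
}
```

The result is only a list of candidates; as in the prototype, the decision about whether Tone and Mood really duplicate each other is left to the user.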
Most of the functions of the Classification Module are private and hidden from the user. The
user can only trigger the ingestation of the information. Most of the functions occur internally
with feedback provided to the user when anomalies are detected through cross-checking and
redundancy checking.
In summary, the Classification Module works to classify ingested information and while
engaged in this activity, the module also undertakes error correction and redundancy
checking.
6.6 Information Extraction Module
Illustrated in Figure 6.14, the Information Extraction Module is responsible for extracting
information from the Loculus Records stored within the system and using that information to
either answer user queries or disseminate the required information to external sources through
the Dissemination Module. This is one of the modules with which the user interacts most
directly. The majority of the algorithms of this module are general and are not specifically
tied to the Loculus Ontology and Loculus Relatedness metric.
[Diagram: the Information Extraction Module with the classes Query, Ranking, Result and Disseminate]
Figure 6.14: The Information Extraction Module
The screen with which the user interacts with the module is shown in Figure 6.15. The screen
is called the Discovery and Decision Support Screen and is a JSP page that interacts with the
Information Extraction Module through the Core Module, see Figure 6.3. For the Discovery
and Decision Support Screen, the Perspective field is for listing the role of the user making
the queries and is used to match the user with an agent concept within the ontology. The
Question field is for free-form questions to be entered; this functionality is currently
unimplemented as it relates to the larger project’s ambitions outlined in Section 1.3 and is
outside the scope of the research being reported in this thesis. For the demonstration of the
use of the ontology and the user perspective within this thesis, the more rigid concept-
instance queries were implemented to simplify user-system interactions. The Mode field is
used to designate whether the user is looking to retrieve information or to compute the answer
to a question. Only the information retrieval mode is supported in this prototype. The
compute mode is not supported.
Figure 6.15: User Interface - Discovery and Decision Support
The four classes shown in Figure 6.14 are discussed in the sub-sections to follow.
6.6.1 Query Class
The Query Class handles the user queries. The user query can be a straightforward
information retrieval query but it can also be a complex computational query or a query to
generate an information set, e.g. generating a distribution release form by gathering and
collating all relevant information, for dissemination. This class has been constructed to only
utilise the Semantic Module for query expansion, for both dissemination and retrieval. The
Semantic Module is also used for query disambiguation. Query disambiguation is
implemented only for information retrieval.
For query expansion, the Query class first requests the Semantic Module to calculate the
user’s perspective on each of the concepts present in the query. Where the distance between the
user’s role and a concept is very high, the Query class then requests the Semantic Module to
return a list of linked concepts and offers them to the user as suggested alternative concepts.
The concepts are chosen based on a level of Tolerance, which we introduced in Section 5.4. We
set our tolerance to 5, i.e. if another concept had a relatedness score of 5 or less, then that
concept was returned to the user as a possible alternative concept.
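The query expansion step can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the prototype's code: the function name, the `relatedness` lookup table and the `FAR_THRESHOLD` cut-off for a "very high" distance are hypothetical, and the sketch assumes that alternatives are drawn from the concepts within the tolerance of the distant query concept. Only the tolerance of 5 comes from the text.

```python
# Hypothetical sketch of tolerance-based query expansion.
TOLERANCE = 5        # from the text: suggest concepts with a score of 5 or less
FAR_THRESHOLD = 40   # ASSUMPTION: scores above this count as a "very high" distance


def suggest_alternatives(role, query_concepts, relatedness, tolerance=TOLERANCE):
    """Return {concept: [alternatives]} for query concepts far from the role.

    relatedness maps frozenset({a, b}) -> score on the 0-100 scale,
    where a lower score means the two concepts are more closely related.
    """
    def score(a, b):
        # Unknown pairs are treated as maximally distant (an assumption).
        return 0 if a == b else relatedness.get(frozenset((a, b)), 100)

    known = set()
    for pair in relatedness:
        known |= pair

    suggestions = {}
    for concept in query_concepts:
        if score(role, concept) <= FAR_THRESHOLD:
            continue  # the concept is already close enough to the user's role
        # Linked concepts that fall within the tolerance of the distant concept.
        suggestions[concept] = sorted(
            c for c in known
            if c not in (concept, role) and score(concept, c) <= tolerance
        )
    return suggestions
```

Calling `suggest_alternatives` with a role, the concepts found in the query and a table of relatedness scores returns, for each distant concept, the candidate concepts that would be offered to the user.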
In terms of query disambiguation, firstly, if a concept referred to in the query is not present
in the ontology, the query module tries to deduce which concept is being referred to by looking
for overlaps between the linked concept lists that are returned for the concepts preceding it
and the concepts following it. The system then presents the overlapping concepts to the user
for consideration. The other method by which the system disambiguates concepts applies when
two concepts are present in the query and are joint parents of another concept. For this the
module has to interact directly with the Ontology Reader. The Ontology Reader has a method that
returns all children of a given concept; the Query class invokes this method to get the children
of the concepts present in the query and, where the children overlap, the system presents them
to the user as alternatives.
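Both disambiguation strategies reduce to set intersections, as the following Python sketch suggests. The function names and the dictionary-based `linked` and `children` lookups are hypothetical stand-ins for the Semantic Module and the Ontology Reader; the sketch is not taken from the prototype.

```python
# Hypothetical sketch of the two disambiguation strategies.

def guess_missing_concept(preceding, following, linked):
    """Candidates for an unrecognised term.

    preceding / following: concepts before and after the term in the query.
    linked: {concept: iterable of linked concepts} (stand-in for the Semantic Module).
    Returns the concepts linked to both sides of the unrecognised term.
    """
    before = set().union(*(linked.get(c, ()) for c in preceding))
    after = set().union(*(linked.get(c, ()) for c in following))
    return sorted(before & after)


def shared_children(concept_a, concept_b, children):
    """Concepts that have both query concepts as parents.

    children: {concept: iterable of child concepts} (stand-in for the Ontology Reader).
    """
    return sorted(set(children.get(concept_a, ())) & set(children.get(concept_b, ())))
```

In both cases the resulting list would be presented to the user, who makes the final choice.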
6.6.2 Result Class
The Result Class is used mostly for the information retrieval type queries. This module
interacts with the ranking class to return information to the user. This is also the module
that collates information in the background for dissemination.
6.6.3 Ranking Class
The Ranking Class is used for ranking the results of user queries. Ranking is done by obtaining a
list of linked concepts from the user’s perspective and giving a higher ranking to the pieces of
information that have the most concepts overlapping with that list, i.e. the list of
concepts within a tolerance of 10 from the user’s role.
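A minimal Python sketch of this overlap-based ranking is given below. The function name and the data structures are hypothetical; only the tolerance of 10 is taken from the text, and the alphabetical tie-break is an assumption made so that the ordering is deterministic.

```python
# Hypothetical sketch of perspective-based ranking.
RANK_TOLERANCE = 10  # from the text: concepts within 10 of the user's role


def rank_results(role, items, relatedness, tolerance=RANK_TOLERANCE):
    """Order items by how many of their concepts are close to the user's role.

    items: {item_id: set of concepts attached to that piece of information}
    relatedness: {(role, concept): score}, where a lower score means closer.
    """
    close = {c for (r, c), s in relatedness.items() if r == role and s <= tolerance}
    overlap = {item: len(concepts & close) for item, concepts in items.items()}
    # Highest overlap first; ties are broken alphabetically (an assumption).
    return sorted(items, key=lambda item: (-overlap[item], item))
```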
6.6.4 Disseminate Class
The Disseminate Class works with the Dissemination Module to collate information for
dissemination. In essence, it holds the results of the dissemination query and passes the
information to the Dissemination Module when all of the information has been gathered. This
class is mostly used for compound disseminations where a complex query entered by the user
needs to be broken down internally into smaller queries that have to be executed one after
another and the results are passed to the Dissemination Module as a single unit.
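The compound dissemination behaviour can be sketched in Python as follows. This is an assumption-laden illustration: the constructor arguments `run_query` and `dissemination_module` are hypothetical callables standing in for the query machinery and the Dissemination Module, and the method name `execute` is invented for the example.

```python
# Hypothetical sketch: collect sub-query results, then hand them over as one unit.
class Disseminate:
    def __init__(self, run_query, dissemination_module):
        self.run_query = run_query            # callable: sub-query -> result
        self.module = dissemination_module    # callable: list of results -> output
        self.results = []

    def execute(self, sub_queries):
        """Run the sub-queries one after another and pass on the combined results."""
        self.results = []
        for sub_query in sub_queries:
            self.results.append(self.run_query(sub_query))
        # The Dissemination Module only receives the results once all are gathered.
        return self.module(self.results)
```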
In summary, the Information Extraction Module works to respond to the user’s queries,
which can be either simple retrieval type queries or computational queries, and either displays
the result to the user or disseminates the result to an external source as instructed.
6.7 Dissemination Module
The Dissemination Module, illustrated in Figure 6.16, works like the Ingestation Module in
that it is an abstract class which is extended into more specific classes for specific
dissemination purposes. These extension classes are responsible for disseminating
information to specific external sources and repackaging information in desired forms to
support decision making activities and repurpose information for specific uses.
Figure 6.16: The Dissemination Module (DisseminationModule, extended by the IMDB and GoogleVideo classes)
At the time of the writing of this thesis, only the Google Videos Disseminator had been
implemented. This disseminator was given priority for development as it addressed a key
need of our industry partner AFTRS. AFTRS has a channel on Google Videos and was
beginning to distribute some of the short works of their students through this Google Video
channel. However, the original preparation for the release is largely a manual process with
assistants and interns of AFTRS manually copy-and-pasting the relevant information from
Distribution Release Forms into the XML based metadata format Google Videos demanded.
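The extension pattern described in this section, an abstract module specialised for each destination, can be sketched in Python as below. The class and method names are illustrative, and the XML element names are placeholders rather than the actual metadata schema that Google Videos required, which is not reproduced in this thesis.

```python
from abc import ABC, abstractmethod
from xml.etree import ElementTree as ET


class DisseminationModule(ABC):
    """Abstract base: each subclass targets one external destination."""

    @abstractmethod
    def disseminate(self, record: dict) -> str:
        """Repackage a record into the form the destination expects."""


class GoogleVideoDisseminator(DisseminationModule):
    """Repackages a record as XML (placeholder element names, not the real schema)."""

    def disseminate(self, record: dict) -> str:
        root = ET.Element("video")
        for field, value in record.items():
            # Field names are assumed to be valid XML element names.
            ET.SubElement(root, field).text = str(value)
        return ET.tostring(root, encoding="unicode")
```

A disseminator of this kind would take over the copy-and-paste step: the fields gathered from a Distribution Release Form are passed in as a dictionary and the XML text is produced in one call.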
6.8 External Data Services Module
The External Data Services, illustrated in Figure 6.17, are a sub-group of the Ingestation and
Dissemination Modules. These are instances of the Ingestation and Dissemination Modules
that assist in the communication with an external data service. The Google Disseminator
described in Section 6.7 is an external data service as it communicates with Google Videos.
Figure 6.17: The External Data Services Module (ExternalDataServices, with IMDB and IndustryDatabase)
6.9 So, how DO you film a punch?
Now that we have presented the ontology, the relatedness metric and the system, it is time to
revisit the “how to film a punch” scenario discussed in Chapter 1 and explore how that
question would be addressed by our system. Unfortunately, we can only do this conceptually
as we do not have the dataset available to address this scenario in a live system. However,
this does point to the first step that must be taken before the question, “how to film a punch”,
can be addressed. For the system to work, it needs data and the data must come from existing
stores of data such as AFTRS databases, IMDB and metadata records from libraries and so
forth. Referring back to Section 6.1.1 and Figure 6.3, which shows the workflow of the
Loculus system, the first module that will be invoked is the ingestation module. As per
Section 6.2, the ingestation module is extended to make custom ingestation interfaces that
interact with specific types of data. So, an ingestation module would need to be constructed
that communicates with AFTRS databases or another module for interacting with IMDB, and,
since both of these are external data sources, the External Data Services Module will also
come into play.
The ingested data, which represents the concrete world of instances, will continue along the
workflow, as shown in Figure 6.3, and is converted and stored as XML flat files by the
Loculus Record Management Module (Section 6.3) with the aid of the Classification Module
(Section 6.5) that links the concrete instances to concepts in the ontology and assists in the
creation of the XML flat files that are Loculus Records. Once the Loculus Records are in the
system and are of a critical mass, we can set out answering the question “how do you film a
punch?”.
The question “how do you film a punch?” is the thought bubble over the information seeker’s
head. It is not necessarily how the user would address the system. Indeed, given the influence of
Google on information seekers, it is highly likely that the keywords “punch” and “film” will
be entered in the question field of the interface shown in Figure 6.15. The keyword “punch”
is a term that is not present in the ontology, whereas the keyword “film” is a concept in the
ontology, as is the role of the information seeker (for example, a cameraman). The system would then
run a keyword search for “punch” over the dataset stored as Loculus Records and return, as
shown in Figure 6.18, those pieces of information that have the keyword “punch” and
concepts related to cameraman and film. In addition, the pieces of information that contain a
high density of concepts to which the information seeker, the cameraman, has a close
relationship are ranked higher than those pieces of information that have a lower density of
concepts to which the cameraman has a close relationship.
Figure 6.18: Concepts Returned (the information that is returned is the overlap of: information containing the keyword "punch"; information related to film and associated concepts such as "filming"; and information related to cameramen and associated concepts such as "filming")
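The conceptual walk-through above can be condensed into a small Python sketch. No live dataset was available for this scenario, so the function, its parameters and the density measure (the share of a fragment's concepts that are close to the user's role) are assumptions made purely for illustration.

```python
# Hypothetical sketch of the "punch" scenario: keyword filter, then perspective ranking.

def answer_query(keyword, role_close, concept_close, fragments):
    """Return the texts of matching fragments, densest first.

    keyword: free-text term that is not in the ontology (e.g. "punch").
    role_close: concepts closely related to the user's role (e.g. cameraman).
    concept_close: concepts closely related to the query concept (e.g. film).
    fragments: list of (text, set_of_concepts) pairs.
    """
    hits = []
    for text, concepts in fragments:
        if keyword.lower() not in text.lower():
            continue  # the keyword must appear in the fragment
        if not (concepts & concept_close and concepts & role_close):
            continue  # the fragment must relate to both the concept and the role
        density = len(concepts & role_close) / len(concepts)
        hits.append((density, text))
    hits.sort(key=lambda hit: -hit[0])  # stable sort keeps input order for ties
    return [text for _, text in hits]
```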
If the words “punch” and “film” were to be input into Google, or even if the phrase “how do
you film a punch” were to be input into Google, the topmost links would be films that have
punch in their title, such as “Donkey Punch (film)” and “Sucker Punch (film)”. This is
because Google has no awareness of perspective and does not have any awareness of what
“film” is. However, the Loculus system is aware that, for example, the person who is making
the query has the role cameraman. The Loculus system can also define “film” by the concepts
to which it is connected. As such, the Loculus system can return information that has a high
density of concepts connected to “cameraman” and “film”, and that also contains the keyword
“punch”.
In addition, the Loculus system can return fragments of information from multiple Loculus
Records as it is an information extraction system, not merely an information retrieval system.
To put it another way, the Loculus system does not have to return the whole Loculus Record
but can return only the subset that is of importance to the query.
6.10 Summary
The Loculus System is an implementation of a high-level architecture that specifies the use of
the Loculus Ontology and the Loculus Relatedness Metric calculation rules. The system was
developed as an answer to the research question: What kind of a system could exploit agent
perspective to better serve the needs of the Motion Picture Industry? The Semantic Module is
the chief component of the system that answers this question as it is the module that
contains the methods that use the ontology and the semantic relatedness metric to assist the
Information Extraction Module and the Classification Module to use user perspective and
ontological distance for information management. The evaluation of this system is covered in
Chapter 7.
7 Evaluation and Discussion
“I’ve spent my entire life doing nothing but collecting comic books... and now there's only
time to say... LIFE WELL SPENT!” – Comic Book Guy, The Simpsons Movie
In this thesis so far we have presented the Loculus Ontology, the rule-based metric for a score
based measure of semantic relatedness and the Loculus system that utilised the ontology and
the metric for information extraction and information classification. In this chapter, we will
evaluate these things, as well as our underlying assumption that agent perspective can be
measured and therefore utilised to better serve the needs of said agents. We will also enter
into extensive discussion of what has been achieved, the limitations and what needs to be
done in the future in the latter half of this chapter.
7.1 Evaluation of the Ontology
The quality of an ontology is generally determined through its usability; therefore, we
evaluated the Loculus Ontology by using it for the calculation of relatedness and by using both
the relatedness score and the ontology within the system. The terms of reference for the
evaluation were the magnitudes of the scores returned by the relatedness metric. Where the
magnitude was different from the magnitude expected, the fragment of the ontology was closely
inspected and the modelling of the fragment re-evaluated through consultation with industry
experts and industry literature. This was deemed the most effective method of testing the
ontology to identify errors in modelling, as well as to expose idiosyncrasies that might have
crept into the ontology due to the sources used for the construction of the ontology.
7.2 Evaluation of the Metric
As the goal here is to evaluate the scores that the relatedness metric generates, the key to
evaluation is to determine if the magnitude of the numbers being generated gives the correct
level of the relatedness between two concepts. In this case, “correct” would be determined by
the combination of logic and the industry. In addition, this will also test our key assumption,
which is the idea that different roles have different perspectives on the same concepts and that
we can predict and gauge this perspective using the Loculus ontology and the relatedness
metric.
Therefore, we established a two-stage evaluation of the scores generated by the relatedness
metric. We initially selected twenty-five pairs of concepts for stage 1, then increased this to
thirty pairs of concepts for stage 2, and used our metric to calculate relatedness values
for them. The increase in pairings occurred with the view of having a few more technical
concepts in play going from stage 1 to stage 2. We then calculated Pearson product-moment
correlation coefficients [110] for the stage 2 results. We did not calculate the Pearson
product-moment correlation coefficient for stage 1 as it only involved one person and so the
coefficient would not be informative. As an afterthought, we opted to compare the Loculus
metric against Rada’s simple edge counting for the stage 2 results, as we thought that
comparing the Loculus metric against another metric might be useful [92].
The scores generated by the metric ranged from 0 (Frame Rate per Second and FPS) to 66
(Method Acting and Camera), on the 0 – 100 scale. As stated in Section 5.2, the scale of our
score can be broken down as follows: “very near” is anything from 1 to 10, “near” is anything
from 11 to 20, 20 to 40 is “not near and not far”, 40 to 60 is “far” and anything above 60 is
“very far”, i.e. only related to each other on account of being part of the same domain of
discourse. Once again, this is a relative scale and as such the exact numbers are not as
important as the approximate position on the scale.
The basis of the selection of the pairs of concepts was not random. A truly random selection
of pairs would have statistically produced pairs that were not related which would not have
allowed evaluation of “relatedness”. Rather we picked a series of concepts and paired them
according to what we believed would give us a range of aspects to explore. Therefore some
of the pairings include concepts that are known to be far removed such as Actor and Cross-
cutting (an editing technique) as well as concepts that are known to be closely related such as
Editor and Cross-cutting. We also chose some pairs of concepts that generated mid-range
scores as they are concepts that are related but not directly, such as Script and Prop. Props are
based on what the script dictates but the script rarely goes into detail about the prop. Table 7.1
shows the thirty pairs of concepts and the scores that the metric generated for them, with the
pairs of concepts marked with * being the five additional pairs that were added to
the initial set of twenty-five pairs.
Table 7.1: The Thirty Pairs of Concepts and their Relatedness Score
Concept 1 Concept 2 Score
1 Frame Rate per Second FPS 0
2 Mood Tone 0
3 Treatment Producer 2
4 Method Acting Actor 3
5 Motion Picture Cinema 3
6 Editor Cutting on action 4 *
7 Film Festival Cinema 4
8 Genre Category 6 *
9 Mood Category 6
10 Mood Style 6
11 Cross-cutting Editor 6
12 Camera Lighting 7
13 Cross-cutting Cutting on action 8 *
14 Screening copy Publicity 8
15 Film Festival Award 9
16 Mood Genre 10
17 Mood Rating 10 *
18 Film Festival Prestige 16
19 Screening copy Prestige 19
20 Motion Picture Actor 32
21 Motion Picture Editor 32
22 Film Festival Producer 32
23 Cross-cutting Motion picture 36
24 Editor Category 52 *
25 Script Prop 52
26 Editor Genre 56 *
27 Cross-cutting Actor 58
28 Script Camera 64
29 Camera Prop 64
30 Method Acting Camera 66
The reason thirty pairs of concepts were chosen was that choosing any more would have
led to difficulties in finding humans to rate them. Thirty was already near the upper bound of
what a human could rate in one sitting without the undue influence of fatigue with the task.
For the purposes of the relatedness metric there is a very good spread of the concept pairings
to give a good indication of the abilities and limitation of the scoring system. The details of
the two stages are given below.
7.2.1 Stage 1 – Interview with a Producer
In the first stage of evaluation we approached a producer active within the industry to give a
rating of 1 to 5 for each of the selected pairs of concepts, where 1 represented a close
relationship and 5 a very distant relationship. This is obviously a different scale from the one
that the metric uses. However, it was deemed impractical to ask a person to rate
concepts on a scale of 0 to 100. The transformation of the scale is given in Table 7.2.
It must be emphasised, however, that the use of two different scales is of no consequence
because, due to the nature of the exercise, what matters is the relative position assigned to
the concepts on the scale used. We also instructed the producer to put down the first number
that came to mind
and not to over-think the decision.
Table 7.2: Score Transformation
Score Transformation Meaning
0 – 10: 1 Very Near
11 - 20: 2 Near
20 – 40: 3 Not Near Not Far
40 – 60: 4 Far
60+: 5 Very Far
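The transformation in Table 7.2 can be written as a small Python function. The bands in the table share their boundary values (20, 40 and 60 each appear in two bands); the sketch assumes that a boundary score falls into the lower band, which is an assumption and not something the table settles. The function name is illustrative.

```python
def transform_score(score):
    """Map a 0-100 relatedness score to the 1-5 scale of Table 7.2.

    ASSUMPTION: boundary scores (20, 40, 60) are assigned to the lower band.
    Returns a (rating, meaning) tuple.
    """
    bands = [
        (10, 1, "Very Near"),
        (20, 2, "Near"),
        (40, 3, "Not Near Not Far"),
        (60, 4, "Far"),
    ]
    for upper, rating, meaning in bands:
        if score <= upper:
            return rating, meaning
    return 5, "Very Far"
```

For instance, a metric score of 32 maps to a rating of 3 and a score of 66 maps to a rating of 5.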
As mentioned before, when stage 1 took place, twenty-five pairs of concepts were used.
These concepts, the scores the metric generated for them and the ratings given by the producer
are presented in Table 7.3.
Table 7.3: Results of the Stage 1 Evaluation
Concept 1 Concept 2 Metric Score Transformed Metric Producer Rating
1 Frame Rate per Second FPS 0 1 1
2 Mood Tone 0 1 1
3 Treatment Producer 2 1 1
4 Method Acting Actor 3 1 1
5 Motion Picture Cinema 3 1 1
6 Film Festival Cinema 4 1 2
7 Mood Category 6 1 2
8 Mood Style 6 1 1
9 Cross-cutting Editor 6 1 1
10 Camera Lighting 7 1 2
11 Screening copy Publicity 8 1 2
12 Mood Genre 8 1 1
13 Film Festival Award 9 1 3
14 Mood Rating 10 1 3
15 Screening copy Prestige 13 2 4
16 Film Festival Prestige 16 2 2
17 Motion Picture Actor 32 3 2
18 Motion Picture Editor 32 3 1
19 Film Festival Producer 32 3 2
20 Cross-cutting Motion Picture 36 3 2
21 Script Prop 52 4 4
22 Cross-cutting Actor 58 4 2
23 Script Camera 64 5 3
24 Camera Prop 64 5 3
25 Method Acting Camera 66 5 2
In a follow-up interview, the producer was asked to account for any widely divergent
positioning compared to our metric. For example, our metric placed Method Acting and
Camera at the far extreme of the scale, indicating they were not related at all. The
producer, on the other hand, placed the pair in the middle of the scale.
From the discussion we determined that the differences stemmed from three reasons. Firstly,
as the Loculus ontology is not yet complete, some of the differences were because of links
that were missing from the ontology, which have since been added (illustrating how the
evaluation of the metric also serves to detect deficiencies in the ontology). Secondly, the
producer, who is involved with the entire production cycle, gave more weight to concepts that
belonged in the same stage of the production cycle. This raised an interesting question that
could not be easily answered by simply modifying the rules calculating the temporal aspect of
the relatedness score, because, from the discussions, there seems to be some indication that
certain roles put more emphasis on temporal context than others. This is a development that
requires further investigation, perhaps the use of Bayesian techniques to compute the
weighting of the temporal score relative to the reach score.
Lastly, from the discussion it was found that in some contexts the producer was relating two
concepts using links that were only visible to them as an individual and they were not able to
account for the link. For example, when we asked the producer to account for the difference
with the previously mentioned Method Acting and Camera, the producer replied that they
could not imagine Method Acting taking place without the Camera. However, upon further
discussion the producer acknowledged that Actors who employed Method Acting did so in all
instances, e.g. in stage plays and during rehearsals when the camera was not running and
sometimes not present. The relationship was “instinctively” mid-scale, but after discussion,
the producer felt the concepts were more distant. Why then was the first instinct to put these
two concepts closer together? Could it be because the producer is used to seeing method
acting when the camera is running/present on set? That is, could spatial proximity also
have a bearing on the closeness of concepts? Spatial proximity also implies temporal proximity
and perhaps taking more account of temporal proximity would correct the divergence.
7.2.2 Stage 2 – Web Survey
In the second stage of the evaluation we created a web survey and invited students from the
Queensland University of Technology (QUT) Film School to participate. Similar to the
producer from stage 1, their instruction was to rate the pairs of concepts in terms of
relatedness on a scale of 1 to 5. In the web survey, we also assigned words to give additional
meaning to the scale. The scale went from 1 (closely related), through 2, 3 and 4, to 5 (related
only in that they are both terms used in the industry). In addition, they could choose the option
“I don’t know”. There were circumstances where the participant gave no response; most likely
these were left blank by accident. They were also asked to supply the industry role they were
studying towards. A screenshot of the survey is given in Figure 7.1.
Figure 7.1: Web Survey Interface
We received twenty-two responses; of these nine were producers, three were editors and one
each of director, director of photography (DOP), production designer and documentary
researcher for TV. In addition we had five people who stated they were looking at having
multiple roles within the industry or were unsure of which role they would/could play within
the industry. The response rate was considered good both because of the small cohort size of
the QUT Film School and the general reluctance of the cohort in filling out surveys. The
limitation of this anonymous⁵ online survey was that we could not approach the participants
for clarification or elaboration of their answers and would have to draw our own conclusions
from their answers. Another factor that we needed to be mindful of was that the participants
were students and therefore their perspective based on their role might not be fully
developed.
Briefly, what was discovered from the web survey was a clear indication that agents with
different roles did indeed think differently and there was correlation between agents who held
the same role. The correlations were the strongest when the concepts were closely related to
their roles, which is consistent with our perspective axiom. The full table of results is given in
Appendix B: Web Survey Results and we undertake some statistical analysis in Section 7.2.3.
7.2.3 Statistical Analysis Of Stage 2 Data
After we obtained the data from Stage 2, we calculated Pearson product-moment
correlation coefficients [110] in an effort to determine the extent to which the ratings
generated by our method, using the Loculus relatedness metric on the Loculus ontology,
correlated with the responses provided by the humans. This would give an indication of
how well our method gauged human perception of the relatedness of concepts.
To that end, we first calculated the Pearson product-moment correlation coefficient for each
individual participant. The Pearson product-moment correlation coefficient compares the values
of two sets of variables, X and Y, to measure their correlation, giving a value between +1 and −1
inclusive [110]. A value of −1 represents that X and Y are perfectly negatively correlated,
while a value of +1 represents that X and Y are perfectly positively correlated and 0 means
that X and Y are not correlated at all [110]. The X and Y datasets are symmetric: it does
not matter which set of data is designated X and which set is designated Y, the calculated
correlation coefficient will always be the same [110].
5 A condition of the ethics clearance
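For reference, the coefficient can be computed directly from its standard definition. The Python sketch below is a generic implementation of the textbook formula and is not the tool used to produce the results in this chapter; the function name is illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length of at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

Because the formula treats the two sequences identically, `pearson(xs, ys)` and `pearson(ys, xs)` give the same value, which is the symmetry noted above.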
In our case, we were expecting a positive correlation between X, which is the ratings
generated by the Loculus metric, and Y, the ratings assigned by each human participant. The
desirable result in this case is a positive correlation because that indicates that, overall, the
human in question agreed with the Loculus metric about when a concept pair was closely
related and when a concept pair was distantly related. A negative correlation indicates that the
human in question consistently gave the opposite rating to the one anticipated by the Loculus
metric, i.e. when the Loculus metric stated something was close, the human in question stated
it was far and vice-versa. The results for the individual correlations, based on the correlation
between the ratings given by each human and the raw ratings generated by the Loculus metric,
as well as the Loculus ratings on the transformed 1-5 scale, are given in Table 7.4.
Table 7.4: Correlation Coefficient For Individual Respondents
Respondent Loculus Metric Raw Score Loculus Metric Transformed Score
No specific role -0.46 -0.44
Producer 7 -0.29 -0.29
Producer 5 -0.06 -0.13
Researcher TV 0.11 0.10
Producer 6 0.21 0.15
Producer 1 0.23 0.18
Producer 2 0.32 0.26
Production Designer 0.39 0.34
Editor 1 0.47 0.48
Student 0.49 0.47
Unsure of role 0.50 0.47
Producer 3 0.50 0.47
Editor 2 0.54 0.53
Director 0.55 0.51
Producer 4 0.55 0.51
Producer 8 0.58 0.52
Producer, Director, Actor 0.58 0.55
DOP 0.59 0.57
Editor 3 0.66 0.61
Editor, DOP, Director 0.66 0.63
Producer 9 0.79 0.76
We opted to calculate the individual correlations between both the raw scores generated by
the Loculus metric and the transformed rating for the 1-5 scale in order to ensure the most
accurate result possible but we expect the results generated from both sets of Xs to be
comparable in magnitude, which indeed proved to be the case as seen from Table 7.4.
From the individual results, we can observe that the Loculus metric is highly correlated with
“Producer 9” but strongly negatively correlated with “No Specific Role”. The Loculus metric
also posted a correlation of 0.5 or above in 11 cases, and posted a correlation of close to 0.5 in 2
additional cases. Still it was negatively correlated in 3 cases, barely correlated in the case of
the “Researcher TV” and only marginally correlated in 4 more cases.
We also calculated the Pearson product-moment correlation coefficient for the overall human
rating. So we held the X variable set to be the ratings generated by the Loculus metric and the
Y variable set to be all ratings generated by the human participants of the web survey.
Excluding blank answers and “I don’t know” responses, there were 612 responses across
twenty-one participants for thirty pairs of concepts. This result is presented in Table 7.5.
Table 7.5: Correlation Coefficient For Overall Rating
Overall
Metric Raw: 0.34
Metric Transformed: 0.31
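The pooling used for the overall figure, in which every usable human rating is paired with the metric's rating for the same concept pair, can be sketched in Python as follows. The function names and the use of `None` to mark blank and “I don’t know” answers are assumptions made for the example; the same routine could be applied to a subset of participants to obtain a group figure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation (assumes non-constant inputs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def pooled_correlation(metric_scores, responses):
    """Correlate the metric with all usable ratings from all participants.

    metric_scores: one score per concept pair, in a fixed order.
    responses: {participant: ratings in the same order, None where unusable}.
    Returns (correlation, number of data points used).
    """
    xs, ys = [], []
    for ratings in responses.values():
        for metric, rating in zip(metric_scores, ratings):
            if rating is not None:  # skip blank and "I don't know" answers
                xs.append(metric)
                ys.append(rating)
    return pearson(xs, ys), len(xs)
```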
Once we had an overall correlation coefficient, we divided the participants according to role
groups to test how well our method can gauge the perspective of agents of a specific role. As
we had nine participants who identified as producers, the first group we created was for
Producers. The second group we created was Editors, which consisted of the three
participants who identified themselves as editors. The third group we titled Other Crew and
into this group we placed the participants who listed themselves as director, DOP, production
designer, researcher TV. The remaining participants, including “Student”, “Unsure Of Role”
and “No Specific Role”, were placed in the group Multirole. The reason for this was that as
these individuals had not identified a specific role that they wished to play in the motion
picture industry, they could play any role and were no different from the participants who had
identified multiple roles such as “Producer, Director, Actor” and “Editor, DOP, Director”.
For each group, we held the rating generated by the Loculus metric to be the X variable set
and then placed all ratings generated by all the participants for each group as the Y variable
set. For the Producers group, we had 266 data points. For the Editors group we had 86 data
points. For the Other Crew group we had 117 data points and for the Multirole group we had
142 data points. These results are presented in Table 7.6.
Table 7.6: Correlation Coefficient For Groups
Producers Editors Other Crew Multirole
Loculus Metric Raw: 0.27 0.53 0.39 0.34
Loculus Metric Transformed: 0.23 0.51 0.37 0.32
This information is interesting but we need to put it in context. In this case, the
establishment of context involves determining how well the human participants correlate with
each other because this gives us a number against which to compare the performance of
the Loculus metric. To that end, we calculated the correlation between the ratings provided
by each participant and those of every other participant. These results are given
in Appendix B – Tables B.7 to B.9.
Once we had calculated the correlation for each individual participant against every other
participant, we calculated the average and median to give an indication as to how well the
participants correlate with each other overall. The average and median are presented in Table
7.7.
Table 7.7: Average and Median Correlation Coefficients For Human Against Human
Average: 0.34
Median: 0.42
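The human against human baseline can be sketched in Python as below. Two assumptions are made that the text does not settle: each pair of participants is counted once and self-correlations are excluded, and a pair of ratings is used only where both participants gave a usable answer (marked here by a value other than `None`). The function names are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, median


def pearson(xs, ys):
    """Pearson product-moment correlation (assumes non-constant inputs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def inter_rater_summary(responses):
    """Average and median Pearson coefficient over every pair of participants.

    responses: {participant: ratings in a fixed order, None where unusable}.
    """
    coefficients = []
    for a, b in combinations(responses, 2):
        pairs = [(x, y) for x, y in zip(responses[a], responses[b])
                 if x is not None and y is not None]
        xs, ys = zip(*pairs)
        coefficients.append(pearson(xs, ys))
    return mean(coefficients), median(coefficients)
```

A participant whose coefficients against most other participants are negative would stand out in the list of coefficients computed inside this routine.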
As can be seen, the average of the human against human correlation is comparable to the
overall correlation of the Loculus metric to the human ratings (see Table 7.5). In addition, the
correlation for the Editors group is significantly higher than the overall average, with the
Other Crew correlation being slightly higher than the overall average (see Table 7.6). However,
the correlation for the Producers group is significantly lower, with the Multirole group
being just at par. When we explored the reason for this, we discovered that one producer had
almost uniformly negative correlation not only with other producers but with most of the other
participants as well. We also found that another participant, “No Specific Role”, a member of
the Multirole group, had almost universally negative correlation with the other participants.
Their correlation with other participants is shown in Table 7.8.
Table 7.8: Possible Outliers
Producer 7 No specific role
Producer 1 -0.018902488 -0.352672808
Producer 2 0.00333463 -0.410241548
Producer 3 -0.135894674 -0.296228206
Producer 4 -0.229692174 -0.158643579
Producer 5 -0.099032556 0.082084823
Producer 6 -0.02745981 -0.161490361
Producer 7 1 -0.091894375
Producer 8 -0.191286386 -0.451284304
Producer 9 -0.146765277 -0.395540175
Editor 1 -0.050194059 -0.269910394
Editor 2 -0.035805627 -0.363111581
Editor 3 0.076696499 -0.504761905
Director -0.170345236 -0.58987042
DOP -0.303433042 -0.214861448
Production Designer -0.028887983 -0.250036985
Researcher TV -0.300527046 0.033512546
Producer, Director, Actor 0.291689897 -0.541149573
Editor, DOP, Director -0.264531703 -0.284564875
Unsure of role 0.266675872 -0.359107245
No specific role -0.091894375 1
Student 0.085022497 -0.474258661
The observations presented in Table 7.8 lead us to believe that perhaps these individuals did
not complete the survey properly and may even have lacked the necessary knowledge and
only undertook the survey for the chance to win the $50 gift voucher that was on offer.
Therefore, we believe that “Producer 7” and “No Specific Role” are outliers whose responses
are largely junk data. As such, we opted to recalculate the overall and group correlations for
the Loculus metric excluding the two outliers. We present the recalculated results in Table
7.9.
Table 7.9: Recalculated Group Correlation Coefficients
Overall Producers Editors Other Crew Multirole
Loculus Metric Raw: 0.41 0.33 0.53 0.39 0.55
Loculus Metric Transformed: 0.38 0.29 0.51 0.37 0.52
As can be seen from Table 7.9, the overall correlation and the correlations for the two
groups of which “Producer 7” and “No Specific Role” were members, Producers
and Multirole respectively, have markedly improved. This is unsurprising, given that from
the individual correlation results (see Table 7.4), the Loculus metric has the worst correlation
for “Producer 7” and “No Specific Role”. We also recalculated the human against human
correlation average and median after excluding the outliers and this result is presented in
Table 7.10.
Table 7.10: Recalculated Average and Median For Human Against Human
Average: 0.46
Median: 0.48
The overall correlation is still comparable to the average and the correlation for the
Producers is still lower than the average but both have improved and certainly, the
correlation for the Multirole group has significantly improved and is now higher than the
average.
Part of the low correlation between the Loculus metric and the Producers group can be
attributed to “Producer 5”, and part of the low correlation with the Other Crew group to
“Researcher TV”, both of whom individually have a negative correlation to the Loculus
metric. However, unlike “Producer 7” and “No Specific Role”, “Producer 5” and
“Researcher TV” have a low but positive correlation with other participants. As we were
unable to conduct follow-up interviews with these individuals, their low correlation with the
Loculus metric could not be investigated further.
7.2.4 Comparison With Rada’s Simple Edge Counting
As stated in Section 7.2, as an afterthought, we opted to compare the performance of the
Loculus relatedness distance metric against Rada’s relatedness distance metric [92]. We
chose Rada’s metric because it is simple edge counting, while our method is weighted edge
counting; the two metrics are therefore comparable. Furthermore, Rada’s metric can work over
the graph structure used by the Loculus Ontology, while other existing metrics use only
inheritance relationships. The relatedness measures computed by Rada’s metric are presented
in Table 7.11 and denote the number of edges/concepts that need to be traversed to get from
Concept 1 to Concept 2. The minimum number of edges is 1 and the maximum is 6, which was the
greatest number of edges separating two concepts amongst the thirty pairs of concepts we had
selected; Rada’s metric has no built-in maximum.
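As a minimal sketch of simple edge counting, the function below runs a breadth-first search and returns the smallest number of edges between two concepts. The graph fragment is an assumption made for illustration: it follows the Mood to Rating path that Figure 7.2 depicts (via Emotive Category, Category and Criteria-based Category), it treats every edge as undirected, and it is not the actual Loculus Ontology. The name `rada_distance` is hypothetical.

```python
from collections import deque


def rada_distance(graph, start, goal):
    """Return the minimum number of edges between two concepts, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # the two concepts are not connected


# Illustrative fragment only (assumed edges, treated as undirected).
edges = [
    ("Mood", "Emotive Category"),
    ("Emotive Category", "Category"),
    ("Category", "Criteria-based Category"),
    ("Criteria-based Category", "Rating"),
]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

print(rada_distance(graph, "Mood", "Rating"))    # 4 edges in this fragment
print(rada_distance(graph, "Mood", "Category"))  # 2 edges in this fragment
```

Every edge contributes exactly 1 here, whatever the level of abstraction of the concepts it joins; this is the point on which the weighted Loculus metric differs.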
Table 7.11: Rada’s Metric Baseline
No.  Concept 1              Concept 2          Loculus Metric  Loculus Transformed  Rada’s Metric
1    Frame rate per second  Fps                0               1                    1
2    Mood                   Tone               0               1                    1
3    Treatment              Producer           2               1                    1
4    Motion Picture         Cinema             3               1                    2
5    Method Acting          Actor              4               1                    2
6    Editor                 Cutting on action  4               1                    2
7    Film Festival          Cinema             4               1                    2
8    Genre                  Category           6               1                    2
9    Mood                   Category           6               1                    2
10   Mood                   Style              6               1                    2
11   Cross-cutting          Editor             6               1                    3
12   Camera                 Lighting           7               1                    1
13   Cross-cutting          Cutting on action  8               1                    3
14   Screening copy         Publicity          8               1                    5
15   Film Festival          Award              9               1                    1
16   Mood                   Genre              10              1                    5
17   Mood                   Rating             10              1                    4
18   Film Festival          Prestige           16              2                    3
19   Screening copy         Prestige           19              2                    4
20   Motion Picture         Actor              32              3                    2
21   Motion Picture         Editor             32              3                    2
22   Film Festival          Producer           32              3                    4
23   Cross-cutting          Motion Picture     36              3                    4
24   Editor                 Category           52              4                    4
25   Script                 Prop               52              4                    2
26   Editor                 Genre              56              4                    6
27   Cross-cutting          Actor              58              4                    5
28   Script                 Camera             64              5                    4
29   Camera                 Prop               64              5                    4
30   Method Acting          Camera             66              5                    6
Because its counts range from 1 to 6, Rada’s metric outputs numbers very similar to the
transformed scores of the Loculus metric. This is to be expected: where the distance between
concepts is small (e.g. Treatment and Producer), Rada’s metric and the Loculus metric should
be comparable, as the main issue with Rada’s metric is that it does not properly account for
abstractions (see Section 2.3.2 for more discussion on Rada and Section 5.3 for a discussion
on the problem of abstraction). As it happened, the majority of our thirty pairs of concepts
were on the “near” end of the scale. As a result, when the Loculus metric is designated the X
variable set and Rada’s metric the Y variable set, the Pearson product-moment correlation
coefficient shows a strong positive correlation. This result is presented in Table 7.12.
Table 7.12: Correlation Coefficient Between Rada's Metric and Loculus Metric
                             Rada’s Metric
Loculus Raw Metric           0.63
Loculus Transformed Metric   0.57
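The coefficients in Table 7.12 can be checked directly against the three score columns of Table 7.11. The sketch below is an illustration only, not the tooling used for the thesis: it implements the Pearson product-moment formula in plain Python, and the list names and the `pearson` helper are assumptions introduced here. The data are copied from Table 7.11 in row order.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Columns of Table 7.11, rows 1 to 30.
loculus_raw = [0, 0, 2, 3, 4, 4, 4, 6, 6, 6, 6, 7, 8, 8, 9, 10, 10, 16, 19,
               32, 32, 32, 36, 52, 52, 56, 58, 64, 64, 66]
loculus_transformed = [1] * 17 + [2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5]
rada = [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 1, 3, 5, 1, 5, 4, 3, 4,
        2, 2, 4, 4, 4, 2, 6, 5, 4, 4, 6]

print(round(pearson(loculus_raw, rada), 2))          # 0.63
print(round(pearson(loculus_transformed, rada), 2))  # 0.57
```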
Following the establishment of the Rada baseline, we repeated the Pearson product-moment
correlation calculations from Section 7.2.3. We first compared Rada’s edge count with the
ratings of each individual participant; the results are given in Table 7.13. We have also
included the Loculus metric correlations in Table 7.13 for quick comparison.
Table 7.13: Correlation Coefficient For Individual Respondents
Respondent                 Loculus Raw Metric  Loculus Transformed  Rada’s Metric
No specific role           -0.46               -0.44                -0.37
Producer 7                 -0.29               -0.29                -0.10
Producer 5                 -0.06               -0.13                0.29
Researcher TV              0.11                0.10                 0.23
Producer 6                 0.21                0.15                 0.32
Producer 1                 0.23                0.18                 0.37
Producer 2                 0.32                0.26                 0.35
Production Designer        0.39                0.34                 0.56
Editor 1                   0.47                0.48                 0.43
Student                    0.49                0.47                 0.45
Unsure of role             0.50                0.47                 0.45
Producer 3                 0.50                0.47                 0.48
Editor 2                   0.54                0.53                 0.45
Director                   0.55                0.51                 0.49
Producer 4                 0.55                0.51                 0.54
Producer 8                 0.58                0.52                 0.42
Producer, Director, Actor  0.58                0.55                 0.44
DOP                        0.59                0.57                 0.49
Editor 3                   0.66                0.61                 0.62
Editor, DOP, Director      0.66                0.63                 0.62
Producer 9                 0.79                0.76                 0.45
Once again, “No Specific Role” and “Producer 7” post negative correlations for Rada’s
metric. As a result, we proceeded with the rest of the calculations with datasets that exclude
“No Specific Role” and “Producer 7”, as we did for the Loculus metric.
With the outliers excluded, we calculated correlations between Rada’s metric and the overall
ratings provided by humans, as well as the ratings of the four groups established in Section
7.2.3. Once again, we held Rada’s metric to be the dataset X and the human ratings to be
dataset Y for the Pearson product-moment correlation coefficient calculations. The results
are presented in Table 7.14, where we have also included the results for the Loculus metric
for quick comparison.
Table 7.14: Correlation Coefficient For Groups Excluding Outliers
                             Overall  Producers  Editors  Other Crew  Multirole
Loculus Raw Metric           0.41     0.33       0.53     0.39        0.55
Loculus Transformed Metric   0.38     0.29       0.51     0.37        0.52
Rada Metric                  0.41     0.37       0.48     0.42        0.48
As can be seen, the results for Loculus and Rada are comparable, although Rada does
perform marginally better than the Loculus metric in the Producers group and the Other Crew
group. This is because, individually, Rada has a marginally better correlation with “Producer
5” and “Researcher TV” than the Loculus metric does. Again, without the opportunity
to interview these people, it is difficult to explain the difference.
However, as stated at the beginning of this section, the difference between the Loculus metric
and Rada’s metric lies in dealing with the problem of abstraction. This problem only becomes
evident for concepts that are deemed to be at a large distance from one another.
From Table 7.11, we can see that for a number of concept pairs to which the Loculus metric
assigned a high score, Rada only assigned a low score. This is because the Loculus metric
took abstraction into account and Rada did not. For these concepts, the Loculus metric should
have a better correlation than Rada. Indeed, when we computed the Pearson product-moment
correlation coefficient for pairs of concepts to which the Loculus metric assigned a score of
32 or above, i.e. “somewhat related” to “not related except that they are in the domain”, the
correlation between the Loculus metric and the human ratings was better than that of Rada’s
metric overall and for all groups except Other Crew. These coefficient calculations are
presented in Table 7.15.
Table 7.15: Correlation Coefficient For Distant Concept Pairs, Excluding Outliers
                             Overall  Producer   Editor   Other Crew  Multirole
Loculus Raw Scores           0.40     0.46       0.48     0.32        0.58
Loculus Transformed Scores   0.34     0.37       0.42     0.28        0.52
Rada                         0.38     0.39       0.38     0.39        0.50
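The selection of distant concept pairs used for Table 7.15 can be sketched against rows 18 to 30 of Table 7.11. The threshold of 32 comes from the text; the cut-off of 2 edges used to flag pairs that Rada places close together is an assumption made purely for illustration, as are the variable names.

```python
# (concept 1, concept 2, Loculus raw score, Rada edge count), Table 7.11 rows 18-30.
pairs = [
    ("Film Festival", "Prestige", 16, 3),
    ("Screening copy", "Prestige", 19, 4),
    ("Motion Picture", "Actor", 32, 2),
    ("Motion Picture", "Editor", 32, 2),
    ("Film Festival", "Producer", 32, 4),
    ("Cross-cutting", "Motion Picture", 36, 4),
    ("Editor", "Category", 52, 4),
    ("Script", "Prop", 52, 2),
    ("Editor", "Genre", 56, 6),
    ("Cross-cutting", "Actor", 58, 5),
    ("Script", "Camera", 64, 4),
    ("Camera", "Prop", 64, 4),
    ("Method Acting", "Camera", 66, 6),
]

DISTANT_THRESHOLD = 32  # "somewhat related" and beyond, as stated in the text
RADA_NEAR_CUTOFF = 2    # assumed cut-off for "Rada only assigned a low score"

# Pairs the Loculus metric treats as distant.
distant = [p for p in pairs if p[2] >= DISTANT_THRESHOLD]

# Distant pairs that Rada nevertheless places only one or two edges apart.
disagreements = [(a, b) for a, b, _, rada in distant if rada <= RADA_NEAR_CUTOFF]

print(len(distant))
print(disagreements)
```

Under these assumptions the filter keeps eleven pairs, three of which (Motion Picture and Actor, Motion Picture and Editor, Script and Prop) are only two edges apart by simple edge counting.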
From the results of the statistical analysis presented in Section 7.2.3 and in this section, we
can say that the Loculus relatedness metric has about the same level of correlation with human
ratings as humans do with each other. In addition, from the comparison against Rada’s metric,
we can say that because the Loculus metric takes into account the issue of abstraction, it
performs better when dealing with concept pairs that are at a distance from each other. We
discuss the results further in Section 7.5.
7.3 System Evaluation against Available Data
As explained in Section 1.3.3, this PhD project was intended as part of a larger project which
did not proceed as planned. As a consequence, a large amount of industry data was not
available, and we did not have access to the volume of data needed to properly test the
system. Nor were we able to deploy the system in a setting that would truly test its
capabilities with genuine users. As a result, we could only conduct some unit tests with a
limited set of data we obtained from AFTRS and CCI related projects. The unit tests,
however, only tested the functionality of the algorithms.
The data we obtained consisted of some samples of distribution information for a set of AFTRS
graduate films. However, we did not have any information on any of these films from earlier
in the production cycle; the distribution information came only from the post-production
phase. Therefore, we could not use this data to generate complete Loculus records that
spanned the entire production cycle of a given film. However, we were able to use this set of
data to unit test our ingestion, classification and information extraction algorithms.
We also obtained data from the BPM project (see Section 1.3.3) for the film Rope Burn. This
data was generated by BPM’s workflow system and mimicked the data from an actual film.
Again, this was not the complete set of data for the film but was from the production phase.
Once again, the data was usable for unit testing our algorithms but was not enough to
provide an overall evaluation of the usability of the system in a practical setting. This was
an unavoidable shortcoming, but one that will be addressed as the over-arching project moves
forward.
7.4 Discussion of the Ontology
7.4.1 Achievements
The Loculus ontology is a domain ontology for the motion picture industry that captures the
natural discourse of the industry. The ontology has as its foundation a set of axioms that
govern its behaviour. The ontology captures key contexts of the motion picture industry,
namely the temporal context and the agent context, aspects often neglected in ontology
development. The Loculus ontology exists in three parts: the motion picture industry
concepts ontology, the Agent ontology and the common concepts ontology. Due to the
axioms and the contexts of the industry, our ontology is organized along three axes. These
axes are the inheritance/vertical axis, the linkage/horizontal axis and the temporal axis.
In designing our ontology, we kept in mind Hepp’s criticisms of ontologies regarding the
disconnect between creators and users of ontologies [40]. We designed our ontology to reflect
closely the discourse of the industry and minimised the use of constructs whose sole purpose
is to beautify the ontology in the eyes of computer scientists. We adopted this approach to
address Hepp’s concerns regarding communication between creators and users, as this measure
should make the ontology more easily understandable by domain practitioners. However,
evaluating the ontology in this regard is difficult without a critical mass of adoption of
the ontology. That being said, the accuracy of the modelling of the ontology is something
that can be evaluated by putting the ontology to use. We put the ontology to use by using it
to calculate agent perspective and, more generally, the semantic relatedness of concepts
within the ontology. The results of this evaluation are discussed in Section 7.5.
7.4.2 Limitations and Future Works
The Loculus Ontology faces many of the limitations that are inherent to ontologies. Notably, a
domain ontology can never be pronounced complete. There is always something more to
model and, given that most domains evolve over time, a domain ontology must also evolve to
keep in step with the domain. Then there is the question of adoption. Adoption of an
ontology is contingent not only upon the ontology’s usefulness and its ability to deliver
benefits that far outweigh the costs associated with its adoption. Like any technology, it
also requires social engineering at each and every organisation that can benefit from the
ontology in order for them to adopt a framework in which to conduct their processes and
business. In short, the adoption of ontologies is as much about economics as it is about
technology. Addressing the economic concerns is beyond the scope of this research; all we
have attempted to do is construct an ontology for the motion picture industry that is
complete enough to demonstrate the viability of ontology-based approaches in providing
solutions to the information manipulation needs of the industry.
As discussed in Section 7.2.1, the limitation specific to our ontology is that, as it stands
now, the temporal aspect of the ontology is not being given enough weight. In addition, the
concept of spatial proximity may need to be introduced to better model the domain as a
whole. These are all aspects that need to be looked at as we move onto future works and build
upon the ontology base constructed during this PhD and move towards as comprehensive an
ontology as it is possible to have.
7.5 Discussion of the Relatedness Metric
7.5.1 Achievements
The relatedness metric devised by us is a score based relatedness metric that measures the
relative distance between concepts. The relatedness metric is calculated along the three axes,
the temporal axis, the inheritance/vertical axis and the linkage/horizontal axis. These scores
are used to measure the perspective of an agent in relation to concepts within the domain of
discourse and the ontological relatedness between non-agent concepts. As detailed in Section
7.2.2, we evaluated the scores generated by the metric, as well as the modelling of the
Loculus ontology, by means of a web survey.
Amongst the results of that web survey, the most erratic ratings, and those that diverged most
from the calculated positioning, were observed in those participants who were not certain
which role they would like to play in the industry or who identified with multiple roles. This
is interesting and has implications for the application of perspective to information
extraction and classification, as it goes to the question of how best to address the query of
an agent who might have multiple viewpoints.
An interesting behaviour was observed on the Frame rate per second and FPS pair, where
we expected most participants to give the pair a rating of 1, thus indicating that they were
aware that FPS is an abbreviation of Frame rate per second; the majority of people did
indeed pick this response. Alternatively, we expected the participants to pick the “I don’t
know” response, indicating they were not aware what FPS was. Indeed, two of the
participants, one of whom is a production designer and another who indicated that they were
studying film and music but were unsure of their final role, did pick the “I don’t know”
option. This makes sense, as these two participants are from roles far removed from those
expected to have such technical knowledge. However, what was curious was that four of the
participants assigned a rating other than 1 or “I don’t know”. We do not know why they
did that. We can only assume that they either assigned a different meaning to FPS (e.g. first
person shooter in the video game industry) or, not knowing the meaning of FPS, chose to
guess a rating.
Generally, the observed behaviour was that where the metric predicted a close relationship,
the majority of the participants agreed with the metric’s findings. However, the results were
more mixed as the distance increased. This is consistent with logic as, in general, humans are
more adept at identifying things that are closely related but falter when things are not so
closely related, unless of course they have special knowledge of both concepts involved.
This is the basis of our perspective-based hypothesis of information exploitation, and it does
hold true. As the distance increased, the overall results were more erratic, but where the role
of the participant corresponded to a deeper knowledge of one or more of the concepts, the
participant was more likely to agree with the metric result. This is a good indication that our
key assumption, the idea that different roles have different perspectives on the same concepts
and that we can predict and gauge this perspective using the Loculus ontology and the
relatedness metric, is valid and viable.
Amongst the results there was one clear-cut case where it was obvious that we had modelled
something incorrectly in our ontology. This involved the pair of concepts Mood and Rating.
The metric calculated a score of 10 for this pair, which put it firmly at the closely related
end of the scale. However, not only did the producer from stage 1 give that pair a rating of 3,
putting it in the mid range of the scale, but no one from stage 2 opted for a rating of 1, and
this pair was overwhelmingly given a mid to highly distant rating. So what is going on here? If
we look at the modelling within the ontology, which is illustrated in Figure 7.2, Mood and
Rating are linked through the concept of Category. This is correct: mood is an emotive category
and rating is a criteria-based category. The producer from stage 1 confirmed this, as did the
literature. So what is the issue here?
[Figure 7.2 diagram: Mood is linked to Rating through Emotive Category, Category and Criteria-based Category, with “+2 reach score” and “+1 temporal score” labels on the path and a total score of 10. Emotive Category involves the concept Emotion and Criteria-based Category involves the concept Criteria. Two timelines are shown: Life Stage (conception, production, utilisation, distribution, discovery, access, preservation, reuse/re-purpose) and Production Cycle (pre-production, production, post-production, Production Cycle as a whole).]
Figure 7.2: The Score Calculation of Mood to Rating (Reusing Figure 5.8)
At first glance, the problem seems to be the improper abstraction of the concept of Category,
an error in judgement on our part that is obvious when we look at the children of
Category. While emotive category and criteria-based category are both types of category, they
are completely disjointed and are modelled as such in the ontology, as shown in Figure 7.3.
This suggests that a rule is necessary to reflect the disjointedness of concepts such as
Emotive Category and Criteria-based Category. However, the matter is not quite that simple,
because the predicted score for Mood and Genre did correspond with the ratings selected by the
survey participants: the majority agreed that the relationship between Mood and Genre was
close and gave it a score of 1 on the scale of 1 to 5. This is interesting because Genre is the
same distance away from Mood as Rating is, as shown in Figure 7.3. This adds another dimension
to the issue and might actually be an indication that both the ontology and the calculating
rules are fine and that the anomaly is due to a loss of context, because we simply presented
the word “rating”.
[Figure 7.3 diagram: nodes category, emotive category, criteria based category, mood, genre, rating, emotions, criteria, description, discovery, production cycle and post production; link labels “inherits from”, “involves concept”, “is used during”, “classify under” and “Disjointed With”.]
Figure 7.3: The Score Calculation of Mood to Genre
By rating we meant the criteria-based ratings a movie gets from the censorship board, e.g.
PG, M etc. However, it is likely that the participants thought that the rating referred to
something else: the rating a movie gets from a movie reviewer, for example.
Perhaps the participants thought that rating referred to how a movie is rated based on its
box office performance, e.g. blockbuster, flop etc. Because we were not able to conduct
follow-up interviews with the participants, we will never know for certain what they were
thinking when they assigned the relatedness score they did. However, the indication does
seem to be that the modelling and the calculations were not at fault for this anomaly.
Lastly, the Method Acting and Camera scoring issue from stage 1 was also present in stage 2
but with added twists. All three editors assigned a score of 3 to the pair, whereas the
metric assigned it a score of 5. The producers were clustered around 2 and 3 with two
outliers, one of whom gave it a score of 1 and another a score of 4. The other roles
clustered around 3 and 4. This is interesting in that it seemed to indicate a marked reluctance
by the participants to give concepts a score of 5. Prompted by this idea, we looked at all the
results again, and there did seem to be a clear pattern of reluctance to score concept pairs
high. However, even taking this observation into account, the clustering of scores around the
middle probably does indicate that closer attention needs to be placed on temporal or spatial
proximity.
The last question to discuss here is how effective the Loculus metric is in facilitating the
use of user role-concept distances for perspective-sensitive information extraction. However,
the evaluation we have undertaken so far does not equip us to answer this question. What can
be said, based on the statistical analysis undertaken in Section 7.2.3 and Section 7.2.4, is
that the Loculus metric is no worse correlated to humans than humans are to each other and
that it is better correlated with humans for concept pairs that are more distantly connected.
Overall, these results are promising and encouraging for further investigation.
7.5.2 Limitations and Future work
The obvious limitation that has come through from the evaluation is that we are simply not
putting enough weight on the temporal score. Perhaps the problem is that we are adding the
temporal score to the reach score to get an overall score, and a more accurate method of
measuring relatedness would be to keep the two scores separate and implement algorithms that
use some sort of Bayesian probability to arrive at a relatedness measure based on the two
scores. In addition, the weights we are assigning to the edges should also be more flexible and
adaptive, chosen based on training datasets as opposed to being assigned by the user. In
short, the rules that govern the calculation of the scores appear to be sound, but how the
scores are used and the exact values of the weights assigned to the edges are solid grounds for
future exploration.
There are also grounds for future exploration in developing other methods of
accounting for user perspective, such as a dynamic relatedness distance function that
takes into account the user’s role. In other words, the distance calculations between concepts
would be dynamic and alter based on the user’s roles. Thus, concepts that would be near for
some users would be far distant for others. This type of relatedness calculation would be
very interesting to explore and may lead to better relatedness predictions.
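One possible shape for such a role-dependent distance function is sketched below; it is not something the thesis implements. The sketch assumes that every edge carries a base weight and a "kind", and that each role scales the weight of certain kinds of edge; the distance is then the cheapest weighted path found with Dijkstra's algorithm. The edge kinds, base weights and role factors are all invented for illustration.

```python
import heapq


def role_distance(graph, role_factor, start, goal):
    """Cheapest path cost from start to goal, scaling edge weights per role."""
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight, kind in graph.get(node, ()):
            # Kinds a role knows well get a factor below 1; unknown kinds default to 1.
            new_dist = dist + weight * role_factor.get(kind, 1.0)
            if new_dist < best.get(neighbour, float("inf")):
                best[neighbour] = new_dist
                heapq.heappush(heap, (new_dist, neighbour))
    return None  # no path between the concepts


# Hypothetical graph: node -> [(neighbour, base weight, edge kind)].
graph = {
    "Cross-cutting": [("Cutting on action", 8.0, "editing")],
    "Cutting on action": [("Cross-cutting", 8.0, "editing"), ("Camera", 6.0, "camera")],
    "Camera": [("Cutting on action", 6.0, "camera")],
}

# Hypothetical role factors.
role_factors = {
    "Editor": {"editing": 0.5},
    "Producer": {"editing": 1.5, "camera": 1.5},
}

print(role_distance(graph, role_factors["Editor"], "Cross-cutting", "Camera"))    # 10.0
print(role_distance(graph, role_factors["Producer"], "Cross-cutting", "Camera"))  # 21.0
```

With these invented factors the same pair of concepts is nearer for the Editor than for the Producer, which is the behaviour the paragraph above describes.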
7.6 Discussion of the Loculus System
As explained in Section 7.3, the evaluation of the system was constrained due to the limited
amount of data. Therefore, all that can be said about the system is that the algorithms were
able to use the ontology and the metric to perform the functions for which they were
intended. The real evaluation of the system lies in the realm of future work. As the over-
arching project moves forward, we hope to deploy the system at AFTRS and then conduct an
extensive evaluation of the practical aspects of the system and the methods it employs. All
that can be really said at this point is that the semantic tools being utilised by the system have
yielded promising results.
7.7 Generalization of this Research
Although the work presented in this thesis has been developed for the motion picture
industry, many of the ideas are applicable more generally and could be applied to other
domains which have agents with different perspectives that play a significant role in their
information-seeking behaviour. One such domain is the medical domain.
Within medicine, there are many different types of agents. Health providers range from
specialists to general practitioners to technicians to nurses, and even patients are a type of
agent in the domain of health. Their perspective varies not only depending on their training
but also based on their involvement with patients. Some health care professionals are with the
patient longer term than others. Even patients are not all the same. Some suffer
from chronic illness, for others the recovery takes a long time, and still others are only ill
for a short time. The illnesses themselves differ in intensity. All these differences introduce
multiple perspectives into the medical domain, and agents seeking information are
acutely influenced by their perspective. This, coupled with the fact that there is an increasing
push towards digitisation and sharing of medical information, means that it is another
information-intensive field where the techniques presented in this thesis can potentially be
used for information management.
Firstly, while the Loculus ontology only applies to the motion picture domain, the axioms
underpinning the ontology can be used to develop an ontology for the medical industry that
works as an overarching ontology combining the existing specialised ontologies of the
medical domain. To put it another way, the medical domain already has ontologies that mainly
focus on the representation and (re-)organization of medical terminologies [111]. So the
domain ontology for medicine can be said to be in parts. For these parts to be accessed
seamlessly, an overarching ontology may be necessary. Such an ontology can be built using
the axioms we have outlined in Section 4.2.
Only the temporal axioms (Section 4.2.1.3) would need drastic alteration, because the
timeline of the medical domain is significantly different to those of the motion picture
domain. However, the other axioms would apply across domains. For example, Inclusion
Axiom 1 (Section 4.2.1.1) states that “When expressed in natural language, concepts are
considered to be part of the discourse of the industry”; this axiom is as applicable to the
medical domain as it is to the motion picture domain. The same goes for the other inclusion
axioms.
Moving on to the Inheritance Axioms (Section 4.2.2.1), they too are applicable to the medical
industry, as even within that industry, abstract concepts such as medicine, treatment etc.
occupy high levels of abstraction. In addition, the idea of common concepts being at the root
of specific concepts also applies. The Linkage Axioms, as presented in Section 4.2.2.3, are
also transferable to the medical domain and can be used to link medical concepts together
and agents to relevant medical concepts. Even the rules regarding strong and weak
relationships apply, as within the medical domain there exist relationships that can be
precisely articulated and those that are imprecise and are articulated as such.
Given an overarching ontology that links together the various ontologies of the medical
domain, the relatedness metric can theoretically be used to calculate the relatedness of
concepts within the large domain ontology. Our relatedness metric is not affected by the size
of the ontology and should be applicable to any ontology, however large or small. The
weights that are assigned to the edges may need to be altered, but that decision would be
contingent upon the scale used as well as the size of the ontology. Given the current weights,
a very deep ontology would result in numbers high enough to blow out the scale. This might
not be a desired effect, and so the numerical weights assigned by the rules may have to be
lowered. However, what must remain the same is the relative magnitude of the weights, i.e. the
rule that assigns +25 to the edge that connects a concrete concept to a concept at the highest
level of abstraction can instead assign +10 to that edge, but it cannot assign a number that is
lower than, or even close to, the number assigned to the edge that connects a concrete concept
to another concrete concept.
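The constraint on rescaling can be illustrated with a small check. In the sketch below only the +25 weight (and its possible replacement by +10) comes from the text; the rule names and the value of 2 for an edge between two concrete concepts are assumptions made for the example. Uniform scaling keeps the ordering of the rules, while an arbitrary reassignment may not. The sketch checks ordering only; the requirement that weights not become "close" would need an additional threshold.

```python
def rescale(weights, factor):
    """Scale every rule weight by the same factor."""
    return {rule: weight * factor for rule, weight in weights.items()}


def preserves_order(original, candidate):
    """True if every rule that outweighed another originally still does so."""
    rules = list(original)
    return all(
        candidate[a] < candidate[b]
        for a in rules
        for b in rules
        if original[a] < original[b]
    )


# Assumed rule names; only the value 25 is taken from the text.
weights = {
    "concrete_to_concrete": 2,
    "concrete_to_highest_abstraction": 25,
}

lowered = rescale(weights, 0.4)  # 25 becomes 10, as in the example in the text
flattened = {"concrete_to_concrete": 2, "concrete_to_highest_abstraction": 2}

print(preserves_order(weights, lowered))    # True
print(preserves_order(weights, flattened))  # False
```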
Lastly, the Loculus system is designed to work with any ontology that’s plugged into its
Semantic Module. If the relatedness metric weights were changed – that change would need
to be reflected in the Semantic Module so it could calculate relatedness scores properly. In
addition to the Semantic Module, the Record Module would also have to be changed as the
Loculus Metadata Wrapper cannot be used to wrap medical information. The tags of the
metadata schema would simply not correspond with medical information. Aside from that,
the Classification Module and the Information Extraction Module should still function with
minimal modification within the medical domain.
In summary, we believe the bulk of the work presented in the thesis is general enough to be
applicable to any domain where user perspective matters.
8 Conclusions
“Based on the findings of the report, my conclusion was that this idea was not a practical
deterrent for reasons which at this moment must be all too obvious.” – Dr. Strangelove, Dr.
Strangelove or: How I Learned to Stop Worrying and Love the Bomb
In this thesis we have presented Loculus: an ontology-based information management framework
for the motion picture industry. The work presented was born out of three
main research questions, each of which has a set of accompanying sub-questions. In this
conclusion, we summarise the research reported in this thesis by presenting our findings as
answers to the research questions.
In order to discover answers to these questions, we undertook an extensive literature review
that led us down the path of utilising a domain ontology, coupled with a semantic relatedness
metric, to manage information for the motion picture industry. As the
motion picture industry lacked a domain ontology, we developed one as part of the
contribution of the research reported in this thesis (Chapter 4). The second contribution of
the work reported in this thesis was a method for calculating semantic relatedness through the
utilisation of a rule-based scoring system (Chapter 5). Lastly, the ontology and the rule-based
scoring system for the determination of semantic relatedness were combined in the
Loculus System and applied to information extraction and information classification (Chapter
6). This demonstration of the combined ability of the Loculus Ontology and the semantic
relatedness metric was another contribution of the work reported in this thesis.
1. How to model the domain of discourse to facilitate the different perspectives of agents?
1.1. How to find the concepts of the domain?
1.2. How to model the relationships between the concepts?
1.3. How to model the relationship of the agents of the system to the concepts?
1.4. What other aspects of the domain need to be captured in the model?
As discussed in Chapter 4, an ontology is an effective means by which a domain of
discourse can be modelled. The ontology was constructed from discussion with industry
experts and from a study of industry literature. The foundation for the ontology is a governing
set of axioms that are divided into three types:
• General axioms, which in turn are sub-divided into inclusion axioms and temporal axioms;
• Concept axioms, which in turn are sub-divided into inheritance axioms, linkage axioms and
equivalency axioms;
• Meta-link axioms.
Due to the workings of the axioms the majority of the concepts of the industry are oraniged
along three axes. These axes are the inheritance/vertical axis, the linkage/horizontal axis and
the temporal axis. Concepts are linked to each other through inheritance and linked sideway
through properties. In the inheritance hierarchy some concepts are considered abstract, from
which more concrete concepts inherit. Sideways, concepts are related using either strong or
weak concept links. The name of these concept links are determined by the Meta-link axioms.
The agent perspective is also modelled using the concept links, as the links can be used to
link a concept with the agents that are involved with it. An agent can be
related to a concept through either a strong or a weak link, with the strength of the link signifying the
clarity of the involvement of the agent.
Lastly, it was found that the other important aspect of the industry that needed to be
captured in the ontology was its temporal aspect, which is represented by the
two timelines of the industry: the production cycle and the life stage timeline. These two
timelines were captured as properties of concepts within the ontology.
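The three axes summarised above can be pictured as a small data structure. The following is only a sketch under stated assumptions, not the Loculus Ontology itself (which is expressed in RDF): the class names, field names, link names and stage labels are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Link:
    name: str     # link name; in the thesis these names are governed by the meta-link axioms
    target: str   # name of the linked concept or agent
    strong: bool  # True for a strong link, False for a weak link


@dataclass
class Concept:
    name: str
    parent: Optional[str] = None  # inheritance (vertical) axis
    abstract: bool = False
    concept_links: List[Link] = field(default_factory=list)    # linkage (horizontal) axis
    agent_links: List[Link] = field(default_factory=list)      # agents involved with the concept
    production_cycle: List[str] = field(default_factory=list)  # temporal axis, first timeline
    life_stage: List[str] = field(default_factory=list)        # temporal axis, second timeline


def ancestors(ontology: Dict[str, Concept], name: str) -> List[str]:
    """Walk up the inheritance axis from a concept to its most abstract ancestor."""
    chain: List[str] = []
    current = ontology[name].parent
    while current is not None:
        chain.append(current)
        current = ontology[current].parent
    return chain


def involved_agents(concept: Concept, strong_only: bool = False) -> List[str]:
    """List the agents linked to a concept, optionally keeping strong links only."""
    return [link.target for link in concept.agent_links if link.strong or not strong_only]


# Hypothetical fragment: the hierarchy, link names and stage labels are assumptions.
ontology = {
    "Technique": Concept("Technique", abstract=True),
    "Editing technique": Concept("Editing technique", parent="Technique", abstract=True),
    "Cross-cutting": Concept(
        "Cross-cutting",
        parent="Editing technique",
        concept_links=[Link("relatedTechnique", "Cutting on action", strong=False)],
        agent_links=[
            Link("performedBy", "Editor", strong=True),
            Link("overseenBy", "Director", strong=False),
        ],
        production_cycle=["post-production"],
    ),
}
```

In this sketch, `ancestors(ontology, "Cross-cutting")` follows the vertical axis, while `involved_agents` reads the agent perspective from the strength of the links.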
2. Can agent perspective be exploited to better serve the needs of the motion picture
industry?
2.1. How can agent perspective be measured?
2.2. How can agent perspective be exploited to better meet the information needs of the
motion picture industry?
In Chapter 5, we presented the development of the score-based relatedness metric, through
which we were able to approximate the perspective of agents and then devise methods of
exploiting that perspective in information extraction and classification. The score of the
relatedness metric is determined by a set of rules that work along the three axes of the
Loculus ontology. The rules are general enough to be applicable to any ontology, at least to
determine the reach between concepts. However, the temporal rules are novel and bound to
the two timelines of the Motion Picture Industry.
We apply the perspective in information extraction and classification by
leveraging the perspective axiom, which states: “If the distance between two concepts is
sufficiently low, an agent who is sufficiently distant from both concepts may consider the two
concepts to be substitutable”. In short, we calculate the perspective of users using the
metric, and then use the axiom and the calculated scores for information
extraction and classification.
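The perspective axiom can be read as a simple predicate over three relatedness scores. The sketch below assumes that a lower score means two items are closer; the function name and the two threshold values (`near` and `far`) are hypothetical, since this chapter does not state what counts as "sufficiently low" or "sufficiently distant".

```python
def substitutable(concept_distance: float,
                  agent_to_first: float,
                  agent_to_second: float,
                  near: float = 10.0,
                  far: float = 40.0) -> bool:
    """Return True if an agent may treat two concepts as substitutable.

    concept_distance: relatedness score between the two concepts.
    agent_to_first / agent_to_second: scores between the agent and each concept.
    near, far: assumed thresholds (not taken from the thesis).
    """
    concepts_are_close = concept_distance <= near
    agent_is_distant = agent_to_first >= far and agent_to_second >= far
    return concepts_are_close and agent_is_distant
```

For example, with these assumed thresholds an agent scoring 50 and 60 against two concepts that score 5 against each other would be treated as seeing them as substitutable, whereas an agent scoring 20 against either concept would not.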
Perspective is most evident in the extraction of information based on a given query,
because a user formulates a query based on, per the dictionary definition of perspective, the
state of their ideas and the facts known to them. Similarly, the user wants an answer framed
from that perspective. To that end, for information extraction, perspective can be used
in three ways: to expand the parameters of the query, to clarify ambiguity in the query
and, lastly, to determine the relevance of a given piece of information.
Perspective comes into play in classification in two distinct ways: the correction of errors and
ambiguities, and the removal of redundancies. Firstly, perspective can be used to make
corrections in the data being ingested and classified, so as to free it of organizational
idiosyncrasies that can lead to confusion and a deterioration of data quality. Secondly,
perspective can be used for redundancy removal. In this application the system combs
through the data to be classified to flag potential redundancies. To determine a redundancy the
system must take into account the perspective of the originator of the information. If the
originator of the information is sufficiently removed from two concepts that can be
considered equivalent based on the perspective axiom, and the instance values of both
concepts are the same, then it may be a case of redundancy and should therefore be flagged.
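The redundancy check described above can be sketched as a pass over pairs of fields in an ingested record. This is an illustration under assumptions rather than the Loculus implementation: the `score` callable stands in for the relatedness metric (lower means closer), the thresholds `near` and `far` are hypothetical, and the toy scores are invented for the example.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Score = Callable[[str, str], float]


def flag_redundancies(record: Dict[str, str],
                      originator: str,
                      score: Score,
                      near: float = 10.0,
                      far: float = 40.0) -> List[Tuple[str, str]]:
    """Flag pairs of concepts in a record that may hold redundant values."""
    flagged = []
    for (c1, v1), (c2, v2) in combinations(record.items(), 2):
        same_value = v1 == v2
        # Perspective axiom: the two concepts are close to each other ...
        concepts_close = score(c1, c2) <= near
        # ... and the originator is sufficiently removed from both.
        originator_removed = score(originator, c1) >= far and score(originator, c2) >= far
        if same_value and concepts_close and originator_removed:
            flagged.append((c1, c2))
    return flagged


# Invented scores for illustration only; unknown pairs default to "far apart".
_toy = {
    frozenset(("Mood", "Tone")): 8.0,
    frozenset(("Editor", "Mood")): 45.0,
    frozenset(("Editor", "Tone")): 50.0,
    frozenset(("Production Designer", "Mood")): 5.0,
    frozenset(("Production Designer", "Tone")): 6.0,
}


def toy_score(a: str, b: str) -> float:
    return _toy.get(frozenset((a, b)), 100.0)


record = {"Mood": "sombre", "Tone": "sombre", "Genre": "drama"}
```

With these invented scores, a record originating from the Editor has the Mood and Tone fields flagged, while the same record originating from the Production Designer, who is close to both concepts, is left alone. Flagged pairs are candidates for review, not automatic deletions.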
3. What kind of a system could exploit agent perspective to better serve the needs of the
motion picture industry?
3.1. How would the various models be used?
3.2. How would the agent perspective be used?
In Chapter 6 we presented the design and implementation of the Loculus System, which was
designed to demonstrate the application of the ontology and the metric to information
extraction and classification. The Loculus System is designed with a group of abstract
modules forming the heart of the architecture. This abstract architecture was made concrete
by implementing the abstract modules to work with the Loculus Ontology and the Loculus
Relatedness Metric, with data ingestion and dissemination classes being designed to work
with AFTRS data.
The key module of the system is the Semantic Module, which holds the ontology reader class
and the relatedness metric calculation class. Through the ontology reader, the system
interacts with the Loculus Ontology, which is the model of the domain, and brings it into play
by using the relatedness metric calculator to calculate relatedness scores between two
concepts, as well as to generate lists of concepts related to a given concept within a certain score
tolerance. The scores and the lists are then employed for information exploitation and
information classification.
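Generating a list of related concepts within a score tolerance, and using it to expand a query, can be sketched as follows. The function names, the `score` callable (standing in for the relatedness metric calculator, with lower scores meaning closer concepts) and the toy scores are assumptions made for illustration; the actual Semantic Module classes are not reproduced here.

```python
from typing import Callable, Iterable, List, Tuple

Score = Callable[[str, str], float]


def related_concepts(concept: str,
                     candidates: Iterable[str],
                     score: Score,
                     tolerance: float) -> List[Tuple[str, float]]:
    """Return (concept, score) pairs within the tolerance, closest first."""
    scored = [(other, score(concept, other)) for other in candidates if other != concept]
    within = [(other, s) for other, s in scored if s <= tolerance]
    return sorted(within, key=lambda pair: pair[1])


def expand_query(terms: List[str],
                 candidates: List[str],
                 score: Score,
                 tolerance: float) -> List[str]:
    """Add to the query every concept that lies within the tolerance of a query term."""
    expanded = list(terms)
    for term in terms:
        for other, _ in related_concepts(term, candidates, score, tolerance):
            if other not in expanded:
                expanded.append(other)
    return expanded


# Invented scores for illustration only; unknown pairs default to "far apart".
_toy = {
    frozenset(("Mood", "Tone")): 8.0,
    frozenset(("Mood", "Style")): 15.0,
    frozenset(("Mood", "Rating")): 55.0,
}


def toy_score(a: str, b: str) -> float:
    return _toy.get(frozenset((a, b)), 100.0)


candidates = ["Mood", "Tone", "Style", "Rating"]
```

A tighter tolerance keeps the expanded query narrow, while a looser one trades precision for recall; the same scores could also be used to rank results by relevance.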
For information exploitation, perspective is employed to expand query parameters, to
disambiguate concepts presented in the query and, finally, to rank the relevance and adjust the
display of the query data accordingly. For classification, perspective is employed for
redundancy removal and error correction.
In summary, the role of the practitioners of a domain and their corresponding degree of
familiarity with the concepts of the domain play a significant part in the manner in which
they use and produce information. Therefore, there are benefits to be gained from taking a
perspective-based approach to information management. In this thesis, we have shown that a
perspective-based approach to information management is both feasible and promising.
References 1. Marchionini, G. and R.W. White, Information-seekign support systems, in IEEE
Computer. 2009. p. 30-32. 2. AFTRS. AFTRS Homepage. Available from: http://www.aftrs.edu.au/. 3. Kaplan, I. (2009) Natural Language Processing and Information Extraction.
Volume, 4. Wood, R.E., Toward an Ontology of Film. Film-Philosophy, 2001. 5(24). 5. Wikipedia. The Curious Case of Benjamin Button. 2009 [cited 2009 15-05-2009];
Available from: http://en.wikipedia.org/wiki/The_Curious_Case_of_Benjamin_Button_(film).
6. Wikipedia. Slumdog Millionaire. 2009 [cited 2009 15-05-2009]; Available from: http://en.wikipedia.org/wiki/Slumdog_Millionaire.
7. Wikipedia. Lions for Lambs. 2009 [cited 2009 15-05-2009]; Available from: http://en.wikipedia.org/wiki/Lions_for_Lambs.
8. Wikipedia. It's a Wonderful Life. 2009 [cited 2009 15-05-2009]. 9. Wikipedia. Ten Canoes. 2009 [cited 2009 15-05-2009]; Available from:
http://en.wikipedia.org/wiki/Ten_Canoes. 10. (ESA), E.S.A., ESSENTIAL FACTS ABOUT THE COMPUTER AND VIDEO GAME
INDUSTRY 2008 SALES, DEMOGRAPHIC AND USAGE DATA. 2008. 11. Elías, J. Why do people pirate media outside the USA? 2006 [cited 2009 15-05-
2009]; Available from: http://eliax.com/blog/articles/2006_05_28_why_do_people_pirate_media_outside_the_USA.htm.
12. Kravets, D. Pirate Bay Future Uncertain After Operators Busted. 2008 [cited 2009 15-05-2009]; Available from: http://www.wired.com/threatlevel/2008/01/pirate-bay-futu/#previouspost.
13. Lessig, L., Remix: Making Art and Commerce Thrive in the Hybrid Economy. 2008. 14. Ouyang, C., et al., Camera, Set, Action: Automating Film Production via Business
Process Management, in Creating Value: Between Commerce and Commons - CCI
International Conference. 2008: Brisbane Australia. 15. Ouyang, C., et al., Towards Web-Scale Workflows for Film Production, in QUT
ePrints Archive. 2008. 16. Leigh, A., Context! Context! Context!: Describing Moving Images at the Collection
Level. The Moving Image, 2006. 6(1): p. 33-65 17. Lakoff, G., Women, Fire, and Dangerous Things. 1987: The University of Chicago
Press. 18. Wikipedia. Information model. 2009 [cited 2009 27th February 2009]; Available
from: http://en.wikipedia.org/wiki/Information_model. 19. Group, I.S., Functional Requirements for Bibliographic Records. 1998. 20. Ayres, M.-L., et al., FRBR and AustLit: the Australian Literature Gateway. 2002,
ERIC. 21. Lagoze, C. and J. Hunter, The ABC Ontology and Model. Journal of Digital
Information, 2001. 22. Iannella, R. and P. Higgs, Driving Content Management With Digital Rights
Management. 2003, IPR SYSTEMS. 23. Barlas, C., Digital Rights Expression Languages (DRELs). 2006. 24. Mallorca, P.d. MPEG-7 Overview. 2004 [cited 2009; Available from:
http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm. 25. Hunter, J., MPEG-7 Behind the Scenes, in D-Lib. 1999.
176
26. Burnett, I.S., S.J. Davis, and G.M. Drury, MPEG-21 Digital Item Declaration and
Identification - Principles and Compression. IEEE Transactions on Multimedia, 2005. 7(3).
27. Wang, X., et al., The MPEG-21 Rights Expression Language and Rights Data
Dictionary. IEEE Transactions on Multimedia, 2005. 7(3). 28. Congress, T.L.o. METS: An Overview & Tutorial. 2006 Available from:
http://www.loc.gov/standards/mets/METSOverview.v2.html. 29. Fleischhauer, C. The Library of Congress Digital Audio Preservation Prototyping
Project. in Sound Savings Conference. 2003. Austin, Texas, USA. 30. Trippe, B., XrML and Emerging Models of Content Development and Distribution.
2002. 31. Apps, A. Guidelines for Encoding Bibliographic Citation Information in Dublin Core
Metadata. 2005 Available from: http://dublincore.org/documents/dc-citation-guidelines/.
32. Amielh, M. and S. Devillers. Bitstream Syntax Description Language: Application of
XML-Schema to Multimedia Content Adaptation. in THE ELEVENTH
INTERNATIONAL WORLD WIDE WEB CONFERENCE. 2002. Honolulu, Hawaii, USA.
33. Chandrasekaran, B., J.R. Josephson, and V.R. Benjamins, What Are Ontologies, and
Why Do We Need Them? IEEE Intelligent Systems, 1999. 14(1): p. 20-26. 34. Cavell, S., The World Viewed: Reflections on the Ontology of Film. 1979: Harvard
University Press. 35. Welty, C., Ontology Research, in AI MAGAZINE. 2003. 36. Guarino, N. Formal Ontology and Information Systems. in International Conference
on Formal Ontology in Information Systems 1998. Trento, Italy. 37. Uschold, M. and M. Gruninger, Ontologies and Semantics for Seamless Connectivity.
SIGMOD Record, 2004. 33(4): p. 58-64. 38. Shamsfard, M. and A.A. Barforoush, The state of the art in ontology learning: a
framework for comparison. The knowledge engineering Review, 2003. 18(4): p. 293-316.
39. Noy, N.F., Semantic Integration: A Survey Of Ontology-Based Approaches. SIGMOD Record, 2004. 33(4): p. 65-70.
40. Hepp, M., Possible Ontologies: How Reality Constrains the Development of Relevant
Ontologies. IEEE Internet Computing, 2007. 11(1): p. 90-96. 41. Staab, S. and A. Maedche, Axioms are Objects, too— Ontology Engineering beyond
the Modeling of Concepts and Relations. 2000. 42. Poli, R., Ontological methodology. International Journal of Human-Computer Studies,
2002. 56(6): p. 639–664. 43. Uschold, M. and M. King. Towards a methodology for building ontologies. in
Workshop on Basic Ontology Issues in Knowledge Sharing. 1995. 44. Bellinger, G., D. Castro, and A. Mills (2004) Data, Information, Knowledge, and
Wisdom. Systems Thinking Volume, 45. Hunter, J. Adding Multimedia to the Semantic Web: Building an MPEG-7 Ontology.
in International Semantic Web Working Symposium. 2001. Stanford University, California
46. Hunter, J., Enhancing the Semantic Interoperability of Multimedia Through a Core
Ontology. IEEE Transactions on Cirsuits and Systems for video technology, 2003. 13(1).
177
47. Lagoze, C., J. Hunter, and D. Brickley. An Event-Aware Model for Metadata
Interoperability. in 4th European Conference on Research and Advanced Technology
for Digital Libraries (ECDL). 2000. Lisbon, Portugal. 48. Project, H. (2001) ABC Harmony Data Model Version 2. Volume, 49. Doerr, M., The CIDOC CRM – an Ontological Approach to Semantic Interoperability
of Metadata. AI Magazine, Special Issue on Ontologies, 2002. 50. Doerr, M. and P. LeBoeuf. Modelling Intellectual Processes: The FRBR - CRM
Harmonization. in First International DELOS Conference on Digital Libraries:
Research and Development. 2007. Pisa, Italy. 51. Hunter, J. Combining the CIDOC CRM and MPEG-7 to Describe Multimedia in
Museums. in Museums on the Web 2002. Boston, USA. 52. Kalfoglou, Y. and M. Schorlemmer, Ontology mapping: the state of the art. The
knowledge engineering Review, 2003. 18(1): p. 1-31. 53. Ding, Y. and S. Foo, Ontology research and development. Part 2 - a review of
ontology mapping and evolving Journal of Information Science, 2002. 28(5): p. 375-388.
54. Tsinaraki, C., E. Fatourou, and S. Christodoulakis. An Ontology-Driven Framework
for the Management of Semantic Metadata Describing Audiovisual Information. in International Conference on Advanced Information Systems Engineering. 2003. Velden, Austria.
55. Tsinaraki, C., et al., Ontology-Based Semantic Indexing for MPEG-7 and TV-Anytime
Audiovisual Content. Multimedia Tools and Applications, 2005. 26: p. 299-325. 56. Tsinaraki, C. and S. Christodoulakis. A multimedia user preference model that
supports semantics and its application to MPEG 7/21. in 12th International Multi-
Media Modelling Conference Proceedings. 2006. Beijing, China. 57. Arndt, R., et al., Adding Formal Semantics to MPEG-7: Designing a Well-Founded
Multimedia Ontology for the Web. Arbeitsberichte aus dem Fachbereich Informatik, 2007.
58. Qu, Y., X. Zhang, and H. Li. OREL: An ontology-based Rights Expression Language. in 13th International World Wide Web Conference. 2004. New York USA.
59. Avancha, S., S. Kallurkar, and T. Kamdar, Design of Ontology for The Internet Movie
Database (IMDb). CMSC 771, 2001. 60. Amato, G., D. Castelli, and S. Pisani. A Metadata Model for Historical Documentary
Films. in ECDL. 2000. Lisbon, Portugal. 61. Durand, G., G. Kazai, and M. Lalmas, A Metadata model supporting scalable
interactive TV services. 11th International Multimedia jModelling Conference, 2005. 62. UMLS. 2010, National Library of Medicine. 63. Bodenreider, O., The Unified Medical Language System (UMLS): integrating
biomedical terminology. Nucleic Acids Research, 2004. 32(1): p. 267-270. 64. Benson, D.A., et al., GenBank. Nucleic Acids Research, 2002. 30(1): p. 17-20. 65. Blake, J.A., et al., MGD: the Mouse Genome Database. Nucleic Acids Research,
2003. 31(1): p. 193-195. 66. Pruitt, K.D. and D.R. Maglott, RefSeq and LocusLink: NCBI gene-centered resources.
Nucleic Acids Research, 2001. 29(1): p. 137-140. 67. Centre, F.M.R. Systematized Nomenclature of Medicine – Clinical Terms (SNOMED
CT). 2007 Available from: http://www.fmrc.org.au/snomed/. 68. Stevens, R., et al., TAMBIS: Transparent Access to Multiple Bioinformatics
Information Sources. Bioinformatics, 2000. 16(2): p. 184-186. 69. Couto, F.M., M.J. Silva, and P.M. Coutinho., Measuring semantic similarity between
Gene Ontology terms. Data & Knowledge Engineering, 2007. 61(1): p. 137-152.
178
70. W3C. Resource Description Framework (RDF) Model and Syntax Specification. 1999 Available from: http://www.w3.org/TR/PR-rdf-syntax/.
71. Hendler, J. and D.L. McGuinness, The DARPA Agent Markup Language. 2000. 72. Fensel, D., et al., OIL: An Ontology Infrastructure for the Semantic Web, in IEEE
INTELLIGENT SYSTEMS. 2001. p. 38 - 45. 73. Horrocks, I., DAML+OIL: A Reason-Able Web Ontology Language, in Web Services,
E-Business, and the Semantic Web. 2002. 74. W3C. OWL Web Ontology Language. 2004 Available from:
http://www.w3.org/TR/owl-features/. 75. Tversky, A., Features of similarity. Psychological Review, 1977. 84(4): p. 327-352. 76. Goodman, N., Seven strictures on similarity, in Problems and projects. 1972, Bobbs-
Merrill: New York. p. 437-447. 77. Medin, D.L., R.L. Goldstone, and D. Gentner, Respects for Similarity.
PSYCHOLOGICAL REVIEW -NEW YORK-, 1993. 100(2): p. 254. 78. Li, Y., Z.A. Bandar, and D. McLean, An Approach for Measuring Semantic Similarity
between Words Using Multiple Information Sources. IEEE Transactions on Knowledge and Data Engineering, 2003. 15(4): p. 871-882.
79. Miller, G.A., et al., Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography, 1990. 3(4): p. 235-244.
80. Blanchard, E., et al. A typology of ontology-based semantic measures. in Open
Interop Workshop on Enterprise Modelling and Ontologies for Interoperability 2005. 2005.
81. Blanchard, E., et al., A Tree-Based Similarity for Evaluating Concept Proximities in
an Ontology, in Studies in Classification, Data Analysis, and Knowledge
Organization. 2006. p. 3-11. 82. Hampton, J.A. and P.J. Taylor, Effects of Semantic Relatedness on Same-Different
Decisions in a Good-Bad Categorization Task. Journal of Experimental Pshychology, 1985. 11(1): p. 85-93.
83. Budanitsky, A. and G. Hirst. Semantic distance in WordNet: An experimental,
application-oriented evaluation of five measures. in Workshop on WordNet and Other
Lexical Resources. 2001. Pittsburgh, Pennsylvania, USA. 84. Budanitsky, A., Lexical Semantic Relatedness and Its Application in Natural
Language Processing. 1999, University of Toronto. 85. Mihalcea, R. Using Wikipedia for AutomaticWord Sense Disambiguation. in Human
Language Technologies: The Annual Conference of the North American Chapter of
the Association for Computational Linguistics. 2007. Rochester, New York, USA. 86. Maedche, A. and S. Staab. Measuring Similarity between Ontologies. in 13th
International Conference on Knowledge Engineering and Knowledge Management.
Ontologies and the Semantic Web. 2002. Siguenza, Spain. 87. Hirst, G. and D. St-Onge., Lexical chains as representations of context for the
detection and correction of malapropisms, in Fellbaum 1998. 1998. p. 305–332. 88. Leacock, C. and M. Chodorow, Combining local context and WordNet similarity for
word sense identification, in Fellbaum 1998. 1998. p. 265–283. 89. Resnik, P. Using information content to evaluate semantic similarity. in 14th
International Joint Conference on Artificial Intelligence. 1995. Montreal, Canada. 90. Jiang, J.J. and D.W. Conrath. Semantic similarity based on corpus statistics and
lexical taxonomy. in International Conference on Research in Computational
Linguistics. 1997. Taiwan. 91. Lin, D. An information-theoretic definition of similarity. in 15th International
Conference on Machine Learning. 1998. Madison, Wisconsin, USA.
179
92. Rada, R., et al., Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man and Cybernetics, 1989. 19(1): p. 17-30.
93. Spink, A. and T. Saracevic, Human-computer interaction in information retrieval:
nature and manifestations of feedback. Interacting with Computers, 1998. 10(3): p. 249-267.
94. Spink, A., et al., Searching the web: The public and their queries. Journal of the American Society for Information Science and Technology, 2000. 52(3): p. 226 - 234.
95. Jansen, B.J., A. Spink, and B. Narayan. Query Modification Patterns During Web
Searching. in International Conference on Information Technology. 2007. Las Vegas, Nevada, USA.
96. Spink, A., H. Greisdorf, and J. Bateman, From highly relevant to not relevant:
examining different regions of relevance. Information Processing & Management, 1998. 34(5): p. 599-621.
97. Tao, X., et al. An Ontology-based Framework for Knowledge Retrieval. in IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent
Technology. 2008. Sydney, Australia. 98. Pirolli, P., Powers of 10: Modeling complex information-seeking systems at multiple
scales, in IEEE Computer. 2009. p. 33-40. 99. Golovchinsky, G., P. Qvarfordt, and J. Pickens, Collaborative information seeking, in
IEEE Computer. 2009. p. 47-51. 100. Chi, E.H., Information Seeking Can Be Social, in IEEE Computer. 2009. p. 42-46. 101. Mackendrick, A., On Film-Making, ed. P. Cronin. 2004: Faber and Faber Ltd. 102. Wikipedia. Distance. 2009 [cited 2009; Available from:
http://en.wikipedia.org/wiki/Distance. 103. Wikipedia. Metric (mathematics). 2009 [cited 2009 27-09-2009]; Available from:
http://en.wikipedia.org/wiki/Metric_(mathematics). 104. Sun. Java. [cited 2009; Available from: http://java.sun.com/. 105. Harold, E.R., JDOM, in Processing XML with Java. 2002, Addison-Wesley. 106. Saxonica. Saxon Documentation. [cited 2008; Available from:
http://www.saxonica.com/documentation/index/intro.html. 107. Bergsten, H., JavaServer Pages, ed. 3rd. 2003: McGraw-Hill Osborne Media. 108. W3C. XQuery. [cited 2009; Available from: http://www.w3.org/XML/Query/. 109. Smith, R., B. Pham, and S. Choudhury. A Digital Artwork Expression Language
(DAEL). in 11th IASTED International Conference on Internet and Multimedia
Systems and Applications. 2007. Honolulu, Hawaii, USA. 110. Wasson, J. Pearson Product Moment Correlation Coefficient. 2008 [cited 2010;
Available from: http://www.mnstate.edu/wasson/ed602pearsoncorr.htm. 111. OpenClinical.org. Ontologies. [cited 2010; Available from:
http://www.openclinical.org/ontologies.html.
180
181
Appendix A: Loculus Ontology Availability
The Loculus Ontology is available for viewing on the CCI wiki. The URL of the wiki page
for the ontology is given below:
https://wiki.cci.edu.au/spaces/listattachmentsforspace.action?key=Loculus
As the ontology is in RDF form, reading and following it can be difficult. Altova
SemanticWorks, the software that was used to create the ontology, can render it visually and
thus make it easier to read. A trial version of Altova SemanticWorks with a free 30 day trial
licence can be downloaded from the URL given below:
http://www.altova.com/semanticworks.html
Appendix B: Web Survey Results
This appendix contains the complete results of the web survey discussed in Section 7.2.2.
Students of the QUT Film School were asked to rate the closeness of thirty pairs of concepts
taken from the motion picture industry. The students rated closeness using the scores in Table B.1.
The scores calculated by the metric, and how they translate to the 1-5 scale, are shown in Table
B.2. Tables B.3 to B.6 display the results of the survey, grouped according to the roles the
students identified as those they were aiming to enter within the industry.
The remaining tables, B.7 to B.9, display the Pearson Product-Moment Correlation
Coefficient for the ratings provided by each participant against the other participants. This is
discussed in Section 7.2.3.
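For readers who want to recompute the coefficients, the Pearson Product-Moment Correlation Coefficient between two participants' ratings can be sketched as below. This is a plain textbook implementation, not the script used for the thesis; in particular, this appendix does not state how "I don't know" (6) and "No answer" (7) responses were treated, so any filtering of those values is left to the caller.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length rating lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two ratings")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / sqrt(var_x * var_y)
```

Each cell of the correlation tables would correspond to one call of this function on the thirty ratings of two participants, under whatever convention is chosen for the 6 and 7 responses.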
Table B.1: Legend
Legend Meaning
1: Closely related
2: Somewhat closely related
3: Somewhat related
4: Distantly related
5: In domain
6: I don't know
7: No answer
Table B.2: Translation of metric generated score to a scale of 1 to 5
Score Translation
0 – 10: 1
11 – 20: 2
20 – 40: 3
40 – 60: 4
60+: 5
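The translation in Table B.2 can be sketched as a small lookup. Because the printed bands share their boundary values (20, 40 and 60 each appear in two rows), the sketch has to pick a convention: it assumes each upper bound is inclusive, so a score of exactly 20 maps to 2 and a score of exactly 60 maps to 4. That convention, and the function name, are assumptions rather than something stated in the table.

```python
from bisect import bisect_left

# Upper bounds of scale values 1 to 4, read from Table B.2 (assumed inclusive).
UPPER_BOUNDS = [10, 20, 40, 60]


def to_scale(score: float) -> int:
    """Translate a metric-generated score to the 1 to 5 survey scale."""
    if score < 0:
        raise ValueError("metric scores are assumed to be non-negative")
    # bisect_left counts how many upper bounds lie strictly below the score.
    return bisect_left(UPPER_BOUNDS, score) + 1
```

Under this convention, `to_scale(15)` gives 2 and any score above 60 gives 5; switching to exclusive upper bounds would only require `bisect_right` in place of `bisect_left`.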
Table B.3: Editors
No. | Concept 1 | Concept 2 | Editor 1 | Editor 2 | Editor 3
1 | Frame rate per second | Fps | 2 | 1 | 6
2 | Mood | Tone | 2 | 1 | 1
3 | Treatment | Producer | 3 | 2 | 1
4 | Method Acting | Actor | 1 | 1 | 1
5 | Motion Picture | Cinema | 2 | 1 | 1
6 | Editor | Cutting on action | 1 | 1 | 1
7 | Film Festival | Cinema | 2 | 1 | 1
8 | Genre | Category | 2 | 3 | 3
9 | Mood | Category | 3 | 2 | 1
10 | Mood | Style | 3 | 2 | 1
11 | Cross-cutting | Editor | 1 | 1 | 1
12 | Camera | Lighting | 1 | 1 | 1
13 | Cross-cutting | Cutting on action | 1 | 1 | 2
14 | Screening copy | Publicity | 2 | 3 | 6
15 | Film Festival | Award | 1 | 2 | 1
16 | Mood | Genre | 2 | 1 | 1
17 | Mood | Rating | 4 | 3 | 6
18 | Film Festival | Prestige | 4 | 2 | 1
19 | Screening copy | Prestige | 4 | 4 | 6
20 | Motion Picture | Actor | 1 | 2 | 1
21 | Motion Picture | Editor | 3 | 2 | 1
22 | Film Festival | Producer | 3 | 3 | 3
23 | Cross-cutting | Motion Picture | 2 | 2 | 1
24 | Editor | Category | 3 | 1 | 3
25 | Script | Prop | 2 | 3 | 3
26 | Editor | Genre | 2 | 2 | 4
27 | Cross-cutting | Actor | 5 | 3 | 5
28 | Script | Camera | 4 | 4 | 1
29 | Camera | Prop | 4 | 3 | 3
30 | Method Acting | Camera | 3 | 3 | 3
Table B.4: Producers
No. | Concept 1 | Concept 2 | Producer 1 | Producer 2 | Producer 3 | Producer 4 | Producer 5 | Producer 6 | Producer 7 | Producer 8 | Producer 9
1 | Frame rate per second | Fps | 1 | 1 | 1 | 1 | 1 | 2 | 5 | 1 | 1
2 | Mood | Tone | 3 | 2 | 1 | 1 | 1 | 1 | 3 | 1 | 1
3 | Treatment | Producer | 2 | 1 | 1 | 1 | 1 | 1 | 7 | 2 | 1
4 | Method Acting | Actor | 1 | 1 | 1 | 1 | 4 | 1 | 2 | 1 | 1
5 | Motion Picture | Cinema | 2 | 1 | 2 | 1 | 1 | 1 | 2 | 1 | 1
6 | Editor | Cutting on action | 2 | 1 | 1 | 1 | 1 | 1 | 3 | 1 | 1
7 | Film Festival | Cinema | 1 | 2 | 1 | 1 | 2 | 1 | 4 | 1 | 1
8 | Genre | Category | 4 | 5 | 2 | 2 | 1 | 2 | 3 | 1 | 1
9 | Mood | Category | 4 | 5 | 5 | 2 | 5 | 2 | 3 | 2 | 2
10 | Mood | Style | 2 | 2 | 3 | 2 | 5 | 4 | 3 | 1 | 2
11 | Cross-cutting | Editor | 1 | 1 | 1 | 1 | 3 | 7 | 1 | 1 | 1
12 | Camera | Lighting | 1 | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 2
13 | Cross-cutting | Cutting on action | 2 | 1 | 1 | 1 | 5 | 1 | 3 | 1 | 1
14 | Screening copy | Publicity | 1 | 1 | 3 | 2 | 5 | 1 | 4 | 1 | 1
15 | Film Festival | Award | 1 | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 2
16 | Mood | Genre | 3 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1
17 | Mood | Rating | 5 | 5 | 5 | 3 | 5 | 3 | 3 | 3 | 3
18 | Film Festival | Prestige | 2 | 2 | 6 | 2 | 1 | 1 | 4 | 1 | 1
19 | Screening copy | Prestige | 3 | 2 | 3 | 3 | 6 | 2 | 4 | 1 | 3
20 | Motion Picture | Actor | 2 | 1 | 1 | 2 | 1 | 1 | 2 | 1 | 3
21 | Motion Picture | Editor | 2 | 1 | 1 | 2 | 1 | 1 | 2 | 1 | 2
22 | Film Festival | Producer | 4 | 1 | 2 | 1 | 2 | 1 | 4 | 1 | 2
23 | Cross-cutting | Motion Picture | 1 | 1 | 1 | 2 | 1 | 1 | 2 | 1 | 1
24 | Editor | Category | 3 | 5 | 5 | 3 | 5 | 2 | 3 | 3 | 3
25 | Script | Prop | 2 | 2 | 2 | 1 | 1 | 1 | 3 | 2 | 4
26 | Editor | Genre | 4 | 5 | 3 | 2 | 2 | 3 | 3 | 2 | 5
27 | Cross-cutting | Actor | 5 | 4 | 3 | 4 | 4 | 3 | 2 | 3 | 3
28 | Script | Camera | 1 | 2 | 5 | 3 | 2 | 2 | 1 | 2 | 3
29 | Camera | Prop | 3 | 2 | 5 | 2 | 2 | 1 | 2 | 2 | 5
30 | Method Acting | Camera | 2 | 3 | 3 | 2 | 1 | 2 | 4 | 2 | 3
Table B.5: Miscellaneous Crew
No. | Concept 1 | Concept 2 | Director | DOP | Production Designer | Researcher TV
1 | Frame rate per second | Fps | 1 | 1 | 6 | 1
2 | Mood | Tone | 2 | 1 | 1 | 2
3 | Treatment | Producer | 3 | 1 | 1 | 1
4 | Method Acting | Actor | 1 | 1 | 1 | 5
5 | Motion Picture | Cinema | 2 | 1 | 1 | 3
6 | Editor | Cutting on action | 1 | 1 | 1 | 1
7 | Film Festival | Cinema | 1 | 1 | 1 | 3
8 | Genre | Category | 5 | 1 | 1 | 3
9 | Mood | Category | 5 | 1 | 3 | 2
10 | Mood | Style | 4 | 1 | 2 | 3
11 | Cross-cutting | Editor | 2 | 1 | 1 | 1
12 | Camera | Lighting | 2 | 1 | 1 | 1
13 | Cross-cutting | Cutting on action | 1 | 1 | 1 | 1
14 | Screening copy | Publicity | 2 | 1 | 7 | 1
15 | Film Festival | Award | 1 | 1 | 1 | 1
16 | Mood | Genre | 4 | 1 | 2 | 2
17 | Mood | Rating | 5 | 5 | 5 | 4
18 | Film Festival | Prestige | 2 | 2 | 1 | 1
19 | Screening copy | Prestige | 3 | 2 | 6 | 5
20 | Motion Picture | Actor | 2 | 3 | 1 | 1
21 | Motion Picture | Editor | 2 | 3 | 1 | 1
22 | Film Festival | Producer | 3 | 2 | 2 | 1
23 | Cross-cutting | Motion Picture | 3 | 5 | 3 | 3
24 | Editor | Category | 4 | 5 | 5 | 2
25 | Script | Prop | 5 | 1 | 1 | 1
26 | Editor | Genre | 5 | 3 | 3 | 2
27 | Cross-cutting | Actor | 3 | 3 | 2 | 3
28 | Script | Camera | 5 | 5 | 2 | 5
29 | Camera | Prop | 4 | 1 | 3 | 2
30 | Method Acting | Camera | 5 | 3 | 2 | 2
Table B.6: Multi-role
No. | Concept 1 | Concept 2 | Producer, Director, Actor | Editor, DOP, Director | Unsure of role | No specific role | Student
1 | Frame rate per second | Fps | 1 | 1 | 6 | 4 | 1
2 | Mood | Tone | 1 | 1 | 3 | 5 | 2
3 | Treatment | Producer | 1 | 1 | 2 | 3 | 1
4 | Method Acting | Actor | 1 | 2 | 1 | 5 | 1
5 | Motion Picture | Cinema | 1 | 1 | 2 | 5 | 1
6 | Editor | Cutting on action | 1 | 1 | 1 | 5 | 1
7 | Film Festival | Cinema | 3 | 1 | 1 | 4 | 3
8 | Genre | Category | 2 | 6 | 6 | 3 | 4
9 | Mood | Category | 2 | 6 | 3 | 4 | 2
10 | Mood | Style | 2 | 1 | 3 | 5 | 2
11 | Cross-cutting | Editor | 1 | 1 | 1 | 4 | 1
12 | Camera | Lighting | 1 | 1 | 2 | 5 | 1
13 | Cross-cutting | Cutting on action | 1 | 2 | 1 | 4 | 1
14 | Screening copy | Publicity | 2 | 1 | 3 | 5 | 2
15 | Film Festival | Award | 3 | 1 | 5 | 4 | 2
16 | Mood | Genre | 1 | 1 | 3 | 3 | 2
17 | Mood | Rating | 3 | 5 | 3 | 4 | 2
18 | Film Festival | Prestige | 3 | 2 | 4 | 4 | 3
19 | Screening copy | Prestige | 6 | 4 | 3 | 4 | 2
20 | Motion Picture | Actor | 1 | 3 | 2 | 4 | 2
21 | Motion Picture | Editor | 1 | 2 | 2 | 5 | 2
22 | Film Festival | Producer | 4 | 1 | 5 | 3 | 3
23 | Cross-cutting | Motion Picture | 1 | 2 | 2 | 5 | 2
24 | Editor | Category | 7 | 6 | 6 | 3 | 4
25 | Script | Prop | 5 | 1 | 2 | 3 | 2
26 | Editor | Genre | 4 | 4 | 4 | 3 | 4
27 | Cross-cutting | Actor | 3 | 5 | 5 | 4 | 3
28 | Script | Camera | 2 | 4 | 3 | 3 | 2
29 | Camera | Prop | 3 | 4 | 5 | 4 | 3
30 | Method Acting | Camera | 4 | 4 | 4 | 3 | 2
Table B.7: Correlation for Producers Against Other Participants
Participant | Producer 1 | Producer 2 | Producer 3 | Producer 4 | Producer 5 | Producer 6 | Producer 7 | Producer 8 | Producer 9
Producer 1 | 1.00 | 0.73 | 0.48 | 0.49 | 0.28 | 0.49 | -0.02 | 0.56 | 0.40
Producer 2 | 0.73 | 1.00 | 0.68 | 0.61 | 0.38 | 0.65 | 0.00 | 0.70 | 0.45
Producer 3 | 0.48 | 0.68 | 1.00 | 0.70 | 0.52 | 0.54 | -0.14 | 0.72 | 0.61
Producer 4 | 0.49 | 0.61 | 0.70 | 1.00 | 0.40 | 0.62 | -0.23 | 0.64 | 0.49
Producer 5 | 0.28 | 0.38 | 0.52 | 0.40 | 1.00 | 0.47 | -0.10 | 0.38 | 0.05
Producer 6 | 0.49 | 0.65 | 0.54 | 0.62 | 0.47 | 1.00 | -0.03 | 0.49 | 0.38
Producer 7 | -0.02 | 0.00 | -0.14 | -0.23 | -0.10 | -0.03 | 1.00 | -0.19 | -0.15
Producer 8 | 0.56 | 0.70 | 0.72 | 0.64 | 0.38 | 0.49 | -0.19 | 1.00 | 0.61
Producer 9 | 0.40 | 0.45 | 0.61 | 0.49 | 0.05 | 0.38 | -0.15 | 0.61 | 1.00
Editor 1 | 0.53 | 0.44 | 0.71 | 0.75 | 0.21 | 0.47 | -0.05 | 0.60 | 0.41
Editor 2 | 0.34 | 0.28 | 0.55 | 0.62 | 0.03 | 0.31 | -0.04 | 0.35 | 0.53
Editor 3 | 0.70 | 0.61 | 0.42 | 0.51 | 0.18 | 0.44 | 0.08 | 0.66 | 0.63
Director | 0.56 | 0.69 | 0.70 | 0.49 | 0.15 | 0.53 | -0.17 | 0.61 | 0.59
DOP | 0.22 | 0.39 | 0.46 | 0.69 | 0.13 | 0.33 | -0.30 | 0.55 | 0.38
Production Designer | 0.52 | 0.65 | 0.78 | 0.62 | 0.54 | 0.51 | -0.03 | 0.72 | 0.48
Researcher TV | 0.16 | 0.29 | 0.39 | 0.48 | 0.28 | 0.40 | -0.30 | 0.20 | 0.13
Producer, Director, Actor | 0.37 | 0.47 | 0.45 | 0.26 | 0.01 | 0.28 | 0.29 | 0.50 | 0.65
Editor, DOP, Director | 0.57 | 0.74 | 0.72 | 0.84 | 0.24 | 0.52 | -0.26 | 0.73 | 0.69
Unsure of role | 0.52 | 0.39 | 0.50 | 0.44 | -0.02 | 0.33 | 0.27 | 0.39 | 0.51
No specific role | -0.35 | -0.41 | -0.30 | -0.16 | 0.08 | -0.16 | -0.09 | -0.45 | -0.40
Student | 0.54 | 0.67 | 0.45 | 0.49 | 0.07 | 0.32 | 0.09 | 0.37 | 0.46
Table B.8: Correlation for Editors and Miscellaneous Crew Against All Other Participants
Participant | Editor 1 | Editor 2 | Editor 3 | Director | DOP | Production Designer | Researcher TV
Producer 1 | 0.53 | 0.34 | 0.70 | 0.56 | 0.22 | 0.52 | 0.16
Producer 2 | 0.44 | 0.28 | 0.61 | 0.69 | 0.39 | 0.65 | 0.29
Producer 3 | 0.71 | 0.55 | 0.42 | 0.70 | 0.46 | 0.78 | 0.39
Producer 4 | 0.75 | 0.62 | 0.51 | 0.49 | 0.69 | 0.62 | 0.48
Producer 5 | 0.21 | 0.03 | 0.18 | 0.15 | 0.13 | 0.54 | 0.28
Producer 6 | 0.47 | 0.31 | 0.44 | 0.53 | 0.33 | 0.51 | 0.40
Producer 7 | -0.05 | -0.04 | 0.08 | -0.17 | -0.30 | -0.03 | -0.30
Producer 8 | 0.60 | 0.35 | 0.66 | 0.61 | 0.55 | 0.72 | 0.20
Producer 9 | 0.41 | 0.53 | 0.63 | 0.59 | 0.38 | 0.48 | 0.13
Editor 1 | 1.00 | 0.65 | 0.42 | 0.51 | 0.44 | 0.48 | 0.38
Editor 2 | 0.65 | 1.00 | 0.48 | 0.58 | 0.35 | 0.25 | 0.35
Editor 3 | 0.42 | 0.48 | 1.00 | 0.48 | 0.23 | 0.41 | 0.01
Director | 0.51 | 0.58 | 0.48 | 1.00 | 0.41 | 0.59 | 0.29
DOP | 0.44 | 0.35 | 0.23 | 0.41 | 1.00 | 0.67 | 0.35
Production Designer | 0.48 | 0.25 | 0.41 | 0.59 | 0.67 | 1.00 | 0.31
Researcher TV | 0.38 | 0.35 | 0.01 | 0.29 | 0.35 | 0.31 | 1.00
Producer, Director, Actor | 0.41 | 0.61 | 0.67 | 0.50 | 0.16 | 0.32 | 0.00
Editor, DOP, Director | 0.63 | 0.63 | 0.64 | 0.54 | 0.68 | 0.67 | 0.51
Unsure of role | 0.58 | 0.57 | 0.56 | 0.41 | 0.19 | 0.40 | -0.02
No specific role | -0.27 | -0.36 | -0.50 | -0.59 | -0.21 | -0.25 | 0.03
Student | 0.42 | 0.34 | 0.63 | 0.48 | 0.34 | 0.44 | 0.10
Table B.9: Correlation for Remaining Participants Against All Other Participants
Participant | Producer, Director, Actor | Editor, DOP, Director | Unsure of role | No specific role | Student
Producer 1 | 0.37 | 0.57 | 0.52 | -0.35 | 0.54
Producer 2 | 0.47 | 0.74 | 0.39 | -0.41 | 0.67
Producer 3 | 0.45 | 0.72 | 0.50 | -0.30 | 0.45
Producer 4 | 0.26 | 0.84 | 0.44 | -0.16 | 0.49
Producer 5 | 0.01 | 0.24 | -0.02 | 0.08 | 0.07
Producer 6 | 0.28 | 0.52 | 0.33 | -0.16 | 0.32
Producer 7 | 0.29 | -0.26 | 0.27 | -0.09 | 0.09
Producer 8 | 0.50 | 0.73 | 0.39 | -0.45 | 0.37
Producer 9 | 0.65 | 0.69 | 0.51 | -0.40 | 0.46
Editor 1 | 0.41 | 0.63 | 0.58 | -0.27 | 0.42
Editor 2 | 0.61 | 0.63 | 0.57 | -0.36 | 0.34
Editor 3 | 0.67 | 0.64 | 0.56 | -0.50 | 0.63
Director | 0.50 | 0.54 | 0.41 | -0.59 | 0.48
DOP | 0.16 | 0.68 | 0.19 | -0.21 | 0.34
Production Designer | 0.32 | 0.67 | 0.40 | -0.25 | 0.44
Researcher TV | 0.00 | 0.51 | -0.02 | 0.03 | 0.10
Producer, Director, Actor | 1.00 | 0.36 | 0.59 | -0.54 | 0.62
Editor, DOP, Director | 0.36 | 1.00 | 0.42 | -0.28 | 0.44
Unsure of role | 0.59 | 0.42 | 1.00 | -0.36 | 0.67
No specific role | -0.54 | -0.28 | -0.36 | 1.00 | -0.47
Student | 0.62 | 0.44 | 0.67 | -0.47 | 1.00