Loculus: An Ontology-Based Information Management Framework for the Motion

Picture Industry

ARC CENTRE OF EXCELLENCE FOR CREATIVE INDUSTRIES AND INNOVATION

Sharmin (Tinni) Choudhury, B.Eng (Hons), B.Com

Thesis submitted to the Discipline of Computer Science at the Faculty of

Science and Technology in partial fulfilment of the requirements for the

degree of Doctor of Philosophy

At Queensland University of Technology

Brisbane 2010

Dedicated To My Mother and My Grandparents

بسم الله الرحمن الرحيم

STATEMENT OF ORIGINAL AUTHORSHIP

I hereby declare that the work contained in this thesis titled “Loculus: An Ontology-Based

Information Management Framework for the Motion Picture Industry” has not been

previously submitted to meet requirements for an award at this or any other higher education

institution. To the best of my knowledge and belief, the thesis contains no material previously

published or written by another person except where due reference is made.

----------------------------------------------------

Sharmin (Tinni) Choudhury

19 October 2010

SUPERVISORY PANEL

Principal Supervisor

Professor Kerry Raymond

Discipline of Computer Science

Faculty of Science and Technology

Queensland University of Technology

Associate Supervisor

Mr Peter Higgs

ARC Centre of Excellence for Creative Industries and Innovation (CCI)

Creative Industries Faculty

Queensland University of Technology

ABSTRACT

“How do you film a punch?” This question can be posed by actors, make-up artists, directors

and cameramen. Though they can all ask the same question, they are not all seeking the same

answer. Within a given domain, based on the roles they play, agents of the domain have

different perspectives and they want the answers to their question from their perspective. In

this example, an actor wants to know how to act when filming a scene involving a punch. A

make-up artist is interested in how to do the make-up of the actor to show bruises that may

result from the punch. Likewise, a director wants to know how to direct such a scene and a

cameraman is seeking guidance on how best to film such a scene. This role-based difference

in perspective is the underpinning of the Loculus framework for information management for

the Motion Picture Industry.

The Loculus framework exploits the perspective of agents for information extraction and

classification within a given domain. The framework uses the positioning of the agent’s role

within the domain ontology and its relatedness to other concepts in the ontology to determine

the perspective of the agent. A domain ontology had to be developed for the motion picture

industry as the domain lacked one. A rule-based relatedness score was developed to calculate

the relative relatedness of concepts within the ontology, which was then used in the Loculus

system for information extraction and classification.

The evaluations undertaken to date have yielded promising results and have indicated that

exploiting perspective can lead to novel methods of information extraction and

classification.

KEYWORDS

Ontology, Motion Picture Industry, Human Factors, Semantic Relatedness, Knowledge

Extraction, Knowledge Classification

ACKNOWLEDGEMENTS

Firstly, I would like to thank my principal supervisor Kerry Raymond for all her patient

guidance and support. Kerry rather graciously took over my supervision when my former

principal supervisor Binh Pham retired. I couldn’t have been the easiest student to handle but

she still put up with me. For that I am very grateful. I would also like to take this opportunity

to thank my former principal supervisor Binh Pham for giving me an opportunity to do my

PhD and also for securing the necessary funding, without which doing a PhD would not have

been very financially viable. My thanks also go to my associate supervisor Peter Higgs for

putting up with me for so long, and especially for acting as the conduit between myself and the

creative industry.

Stuart Cunningham is logically the next person on my list of people to thank. This PhD

would not have been possible without the support of Stuart and the ARC Centre of

Excellence for Creative Industries and Innovation (CCI). I would also like to thank the

hard-working individuals in the research and higher degree office of the Faculty of Science

and Technology, Agatha Nucifora, Carol Richter, Matt Williams, Jason Weiss and Sara

Thomas, who have helped me with everything from arranging travel to sorting out the

myriad of forms a PhD student has to fill out. I would like to especially thank Ricky Tunny

who not only helped me while he was at the Faculty as the research co-ordinator but

continued to help me with forms, applications and other things after he changed jobs and

became the Coordinator Scholarships, Admission & Enrolments at the QUT research office.

Just goes to show, there is no escaping me.

I would also like to thank my mother (Neena Choudhury) for all her support and for inspiring

me on my PhD journey. My thanks also go to my grandfather (Sorwar Jan Choudhury) for

always believing in me and for inspiring me to do engineering instead of following both my

parents into chemistry. Also, I would like to thank my grandmother (Meena Choudhury) who

encouraged me wholeheartedly in the whole PhD thing despite having to share the “Dr” title

with me, but at least she can still say “yes” to the all-important question “Is there a Doctor in

the house?” I would also like to thank my brother (Adnan Ali Khan Choudhury) for simply

being there.

I would also like to thank my friends for sticking by me and peppering me with messages of

support. I would especially like to mention Suzanne Little, who, having gone through the PhD

process herself, was always there with a sympathetic ear.

PUBLICATIONS FROM THE RESEARCH

Reference | Type | Refereed
Choudhury, Sharmin (2006) A Metadata-based Framework for the management, distribution and reuse of digital Motion Picture content. CCI Symposium, October 11 – 12, 2006, Melbourne, Australia | Symposium Presentation | No
Choudhury, Sharmin and Pham, Binh L. and Smith, Robert and Higgs, Peter L. (2007) Loculus: a metadata wrapper for digital motion picture. Internet and Multimedia Systems and Applications, August 20 – 22, 2007, Honolulu, Hawaii, USA. | Conference Paper | Yes
Smith, Robert and Pham, Binh L. and Choudhury, Sharmin (2007) A digital artwork expression language (DAEL). Internet and Multimedia Systems and Applications, August 20 – 22, 2007, Honolulu, Hawaii, USA. | |
Choudhury, Sharmin (2007) A metadata-based framework for the management, distribution and reuse of digital motion pictures. CCI Symposium, October 18 – 19, 2007, Melbourne, Australia | Symposium Presentation | No
Choudhury, Sharmin and Raymond, Kerry and Higgs, Peter L. (2008) A rule-based metric for calculating semantic relatedness score for the motion picture industry. Workshop on Natural Language Processing and Ontology Engineering at the IEEE/WIC/ACM Web Intelligence Conference, December 9 – 12, 2008, Sydney, Australia. | |
Choudhury, Sharmin (2008) Ontology based perspective determination and its implications for searching. Third Workshop of the HCSNet Next-Generation Search Technology Priority Area, November 13 2008, Melbourne, Australia | Workshop Presentation | No
Choudhury, Sharmin (2008) Improving human computer interaction. Creating Value: Between Commerce and Commons - CCI International Conference, June 25 – 27, 2008, Brisbane, Australia | Conference Presentation | No
Choudhury, Sharmin and Raymond, Kerry and Higgs, Peter L. Ontology-Based Information Extraction and Classification: Exploiting User Perspective within the Motion Picture Industry. International Conference on Asian Digital Libraries, July 21 – 25, 2010, Gold Coast, Australia | Conference Poster | No

TABLE OF CONTENTS

1 Introduction

1.1 Motion Picture Industry Background

1.1.1 Timelines

1.1.2 Importance of People

1.1.3 The Product of the Industry

1.1.4 Challenges being faced

1.2 Motivation for the Project

1.2.1 Decision Support

1.2.2 Repurpose

1.2.3 Reuse

1.2.4 Driving Questions

1.3 Project Background

1.3.1 Brief introduction to CCI and AFTRS

1.3.2 What AFTRS wanted from the Project

1.3.3 Overview of the Standards and Metadata Project

1.4 The Thesis

1.4.1 Contribution

1.4.2 Broader application of the research

1.4.3 Structure of Thesis

2 Literature Review

2.1 Information Models and Metadata Models

2.1.1 Existing Models

2.1.2 Metadata Schemas and Standards

2.1.3 Summary

2.2 Ontology

2.2.1 The Use of Ontology within Computer Science

2.2.2 Existing Ontologies

2.2.3 Ontologies in other domains

2.2.4 Ontology Implementation Languages

2.2.5 Summary

2.3 Semantic relatedness

2.3.1 What is semantic relatedness

2.3.2 Existing measures of semantic relatedness

2.3.3 Summary

2.4 Information Extraction

2.5 Summary

3 Research Plan

3.1 Research Questions

3.2 Research Methodology

4 Loculus Ontology

4.1 Conceptual Foundation of the Ontology

4.2 Axioms

4.2.1 General Axioms

4.2.1.1 Inclusion Axioms

4.2.1.2 Temporal Context

4.2.1.3 Temporal Axioms

4.2.2 Concept Axioms

4.2.2.1 Inheritance Axioms

4.2.2.2 Agent Context

4.2.2.3 Linkage Axioms

4.2.2.4 Terminology Axiom

4.2.3 Meta-Link Axioms

4.3 Structure of the Ontology

4.3.1 MPI Concepts Ontology

4.3.2 Agent Concepts Ontology

4.3.3 Common Concepts Ontology

4.3.4 The Root Concepts

4.3.5 Lattice Structure

4.3.6 Three Axes

4.4 Ontology Implementation

4.4.1 OWL

4.4.2 Altova SemanticWorks

4.4.3 How Concepts are Represented

4.5 Ontology completeness

4.6 Summary

5 Semantic Relatedness Metric

5.1 Introduction to the Relatedness Metric

5.2 Rules for Calculation

5.2.1 Reach Score

5.2.1.1 Inheritance Axis

5.2.1.2 Linkage Axis

5.2.2 Temporal Score

5.2.2.1 Motion Picture Industry Production Cycle

5.2.2.2 Motion Picture Life Stage

5.2.3 Example Calculations

5.3 Abstraction

5.4 Application of the Metric

5.4.1 Information Extraction

5.4.2 Information Classification

5.5 Summary

6 Loculus System and Loculus Schema

6.1 The System

6.1.1 The High-Level System Architecture

6.1.2 The Loculus System Architecture

6.1.3 Technology Choices

6.2 Ingestation Modules

6.3 Record Management Module

6.3.1 Loculus Wrapper Schema

6.3.2 Implementation Details

6.4 Semantic Module

6.4.1 Ontology Reader Class

6.4.2 Classification Identification Class

6.4.3 Production Cycle Identification Class

6.4.4 Distance Metric Class

6.5 Classification Module

6.5.1 Ingest Class

6.5.2 Record Formulation Class

6.5.3 Classification Class

6.6 Information Extraction Module

6.6.1 Query Class

6.6.2 Result Class

6.6.3 Ranking Class

6.6.4 Disseminate Class

6.7 Dissemination Module

6.8 External Data Services Module

6.9 So, how DO you film a punch?

6.10 Summary

7 Evaluation and Discussion

7.1 Evaluation of the Ontology

7.2 Evaluation of the Metric

7.2.1 Stage 1 – Interview with a Producer

7.2.2 Stage 2 – Web Survey

7.2.3 Statistical Analysis Of Stage 2 Data

7.2.4 Comparison With Rada’s Simple Edge Counting

7.3 System Evaluation against Available Data

7.4 Discussion of the Ontology

7.4.1 Achievements

7.4.2 Limitations and Future Works

7.5 Discussion of the Relatedness Metric

7.5.1 Achievements

7.5.2 Limitations and Future work

7.6 Discussion of the Loculus System

7.7 Generalization of this Research

8 Conclusions

References

Appendix A: Loculus Ontology Availability

Appendix B: Web Survey Results

LIST OF FIGURES

FIGURE 1.1: THE TWO INDUSTRY TIMELINES

FIGURE 1.2: DISTANCE TO EDITING

FIGURE 2.1: ENTITIES AND “PRIMARY” RELATIONSHIPS OF THE FRBR MODEL

FIGURE 2.2: ENTITIES AND “RESPONSIBILITY” RELATIONSHIPS

FIGURE 2.3: ENTITIES AND “SUBJECT” RELATIONSHIPS

FIGURE 2.4: ODRL MODEL

FIGURE 2.5: ABC CLASS HIERARCHY WITH PROPERTIES

FIGURE 2.6: HIERARCHICAL SEMANTIC KNOWLEDGE BASE

FIGURE 4.1: THE TWO TIMELINES OF THE INDUSTRY (REUSING FIGURE 1.1)

FIGURE 4.2: THE CONCEPT OF EDITING WITH VERTICAL AND HORIZONTAL LINKS

FIGURE 4.3: INHERITANCE HIERARCHY OF SOME OF THE CONCEPTS WITHIN THE MOTION PICTURE INDUSTRY

FIGURE 4.4: THE META-LINK HIERARCHY

FIGURE 4.5: ONTOLOGY EXTRACT – EDITING

FIGURE 4.6: ONTOLOGY EXTRACT – THE INHERITANCE OF EDITING

FIGURE 4.7: ONTOLOGY EXTRACT – ACTION

FIGURE 4.8: ONTOLOGY EXTRACT – MOTION PICTURE

FIGURE 4.9: ONTOLOGY EXTRACT – EDITING WITHIN THE THREE AXES

FIGURE 4.10: ONTOLOGY EXTRACT – CATEGORY AND ITS CHILDREN

FIGURE 4.11: THE REPRESENTATION OF EDITOR AT THE XML LEVEL

FIGURE 4.12: INHERITANCE RELATIONSHIP REPRESENTATION

FIGURE 4.13: LINKAGE RELATIONSHIP REPRESENTATION

FIGURE 4.14: XML-LEVEL REPRESENTATION OF THE CONCEPT OF EDITOR

FIGURE 5.1: ONTOLOGY EXTRACT - EDITING

FIGURE 5.2: ONTOLOGY EXTRACT - INHERITANCE LINKS AND LINKAGE LINKS

FIGURE 5.3: ONTOLOGY EXTRACT – APPLICATION OF INHERITANCE AXIS RULE 3

FIGURE 5.4: CREW HIERARCHY AND CLOSE ASSOCIATION BETWEEN EDITOR AND MAKE-UP ARTIST

FIGURE 5.5: THE TWO TIMELINES OF THE INDUSTRY (REUSING FIGURE 1.1)

FIGURE 5.6: THE SCORE CALCULATION OF METHOD ACTING TO ACTOR

FIGURE 5.7: THE SCORE CALCULATION OF MOOD TO CATEGORY

FIGURE 5.8: THE SCORE CALCULATION OF MOOD TO RATING

FIGURE 5.9: THE SCORE CALCULATION OF FILM FESTIVAL TO PRESTIGE

FIGURE 5.10: THE SCORE CALCULATION OF SCORE TO PROP

FIGURE 5.11: THE SCORE CALCULATION OF METHOD ACTING TO CAMERA

FIGURE 5.12: THE SCORE CALCULATION OF METHOD ACTING TO CAMERA THROUGH AGENTS

FIGURE 5.13: AGENT PERSPECTIVE

FIGURE 5.14: EDITOR PERSPECTIVE

FIGURE 5.15: ONTOLOGY EXTRACT – EDITOR AND CROSS-CUTTING

FIGURE 5.16: ONTOLOGY EXTRACT – GENRE, CATEGORY AND TONE

FIGURE 5.17: ONTOLOGY EXTRACT – CROSS-CUTTING

FIGURE 6.1: HIGH-LEVEL SYSTEM ARCHITECTURE

FIGURE 6.2: USER INTERFACE - LOCULUS MENU

FIGURE 6.3: SYSTEM WORK FLOW

FIGURE 6.4: THE LOCULUS SYSTEM

FIGURE 6.5: THE INGESTATION MODULE

FIGURE 6.6: THE LOCULUS RECORD MANAGEMENT MODULE

FIGURE 6.7: THE LIFE STAGE TIMELINE

FIGURE 6.8: THE LOCULUS METADATA WRAPPER SCHEMA

FIGURE 6.9: EXAMPLE LOCULUS RECORD

FIGURE 6.10: THE SEMANTIC MODULE

FIGURE 6.11: THE METHODS OF THE LOCULUS ONTOLOGY READER

FIGURE 6.12: THE DISTANCE METRIC CLASS

FIGURE 6.13: THE CLASSIFICATION MODULE

FIGURE 6.14: THE INFORMATION EXTRACTION MODULE

FIGURE 6.15: USER INTERFACE - DISCOVERY AND DECISION SUPPORT

FIGURE 6.16: THE DISSEMINATION MODULE

FIGURE 6.17: THE EXTERNAL DATA SERVICES MODULE

FIGURE 6.18: CONCEPTS RETURNED

FIGURE 7.1: WEB SURVEY INTERFACE

FIGURE 7.2: THE SCORE CALCULATION OF MOOD TO RATING (REUSING FIGURE 5.8)

FIGURE 7.3: THE SCORE CALCULATION OF MOOD TO GENRE

LIST OF TABLES

TABLE 4.1: LOCULUS ONTOLOGY BASIC STATISTIC

TABLE 7.1: THE THIRTY PAIRS OF CONCEPTS AND THEIR RELATEDNESS SCORE

TABLE 7.2: SCORE TRANSFORMATION

TABLE 7.3: RESULTS OF THE STAGE 1 EVALUATION

TABLE 7.4: CORRELATION COEFFICIENT FOR INDIVIDUAL RESPONDENTS

TABLE 7.5: CORRELATION COEFFICIENT FOR OVERALL RATING

TABLE 7.6: CORRELATION COEFFICIENT FOR GROUPS

TABLE 7.7: AVERAGE AND MEDIAN CORRELATION COEFFICIENTS FOR HUMAN AGAINST HUMAN

TABLE 7.8: POSSIBLE OUTLIERS

TABLE 7.9: RECALCULATED GROUP CORRELATION COEFFICIENTS

TABLE 7.10: RECALCULATED AVERAGE AND MEDIAN FOR HUMAN AGAINST HUMAN

TABLE 7.11: RADA’S METRIC BASELINE

TABLE 7.12: CORRELATION COEFFICIENT BETWEEN RADA’S METRIC AND LOCULUS METRIC

TABLE 7.13: CORRELATION COEFFICIENT FOR INDIVIDUAL RESPONDENTS

TABLE 7.14: CORRELATION COEFFICIENT FOR GROUPS EXCLUDING OUTLIERS

TABLE 7.15: CORRELATION COEFFICIENT FOR DISTANT CONCEPT PAIRS, EXCLUDING OUTLIERS

TABLE B.1: LEGEND

TABLE B.2: TRANSLATION OF METRIC GENERATED SCORE TO A SCALE OF 1 TO 5

TABLE B.3: EDITORS

TABLE B.4: PRODUCERS

TABLE B.5: MISCELLANEOUS CREW

TABLE B.6: MULTI-ROLE

TABLE B.7: CORRELATION FOR PRODUCERS AGAINST OTHER PARTICIPANTS

TABLE B.8: CORRELATION FOR EDITORS AND MISCELLANEOUS CREW AGAINST ALL OTHER PARTICIPANTS

TABLE B.9: CORRELATION FOR REMAINING PARTICIPANTS AGAINST ALL OTHER PARTICIPANTS

LIST OF ABBREVIATIONS

ABC – A Boring Core model and ontology

AFTRS – Australian Film Television and Radio School

AI – Artificial Intelligence

ARC – Australian Research Council

BPM – Business Process Modelling

CCI – ARC Centre of Excellence for Creative Industries and Innovation

CIDOC - Committee on Documentation of the International Council of Museums

CRM – Conceptual Reference Model

DAML - DARPA Agent Markup Language

DARPA - Defense Advanced Research Projects Agency (United States of America)

DREL – Digital Rights Language

FPS – Frames Per Second

FRBR - Functional Requirements for Bibliographic Records

FRBRoo - FRBR-object oriented

IFLA - International Federation of Library Associations and Institutions

JDom – An open source Java-based document object model

KM - Knowledge Modelling

METS - Metadata Encoding and Transmission Standard schema

METS AV - Metadata Encoding and Transmission Standard Audio-Visual schema

MPEG - Moving Picture Experts Group

MPEG-7 – Moving Picture Experts Group Standard for multimedia content description.

MPEG-21 – Moving Picture Experts Group Standard that aims at defining an open framework for multimedia applications

ODRL - Open Digital Rights Language

OIL - Ontology Inference Layer

OWL - Web Ontology Language

QUT – Queensland University of Technology

RDF - Resource Description Framework

REL – Rights Expression Language

UI – User Interface

XML - Extensible Markup Language

XrML - eXtensible rights Markup Language

1 Introduction

“I would like, if I may, to take you on a strange journey.” - The Criminologist, The Rocky Horror Picture Show

Why do we seek information? What are we really looking for when we start to seek? What

kind of information are we really looking for? Are we looking for any answer to a question?

Or is there a specific type of answer in our mind when we pose a given question?

The information we seek and the manner in which we go about seeking it differ from person

to person based not only on our information needs but on how much knowledge we already

have on the subject. Where the information need is well-defined, information-seeking

becomes a simple information retrieval task [1], e.g. “who directed The Da Vinci Code?”.

However, where the information needs are for more complex mental activities such as

learning and decision making, information retrieval is necessary but not sufficient [1].

At the library of the Australian Film Television and Radio School (AFTRS) [2], students

frequently seek the answer to questions such as “How is a punch filmed?1”. While many

students might ask the same question, the answers they are seeking are widely different. Editing

students are seeking opinions on how best to edit a film sequence containing a punch. The

directing students are seeking opinions on what angles work best when filming a punch and

how many angles are needed to provide the editor enough material to choose from. For acting

students the primary concern would be to pick up tips on how best to act when punching or

being punched in a given scene. The acting students might be interested in gaining some

superficial knowledge of the art of filming a punch from the viewpoint of the editor and the

director, but only to the extent that it allows them to best perform their role as actors.

This is an example of information-seeking for a complex mental process: the students are

seeking to learn. Their information needs are not precisely defined, but equally they are not

completely open to any answer, and do have a set of criteria by which they will judge the

suitability of the information they discover for their purposes. This is in contrast to someone

who merely wishes to retrieve a particular piece of information they already know exists for a

specific purpose. For example, while making an argument regarding a specific topic, they

might wish to strengthen it by retrieving specific pieces of information that they

already know to exist, either from previous information-seeking activity or from prior

knowledge. This is an instance of a simple information-seeking activity that culminates in a

simple information retrieval task. Another simple information-seeking activity is where the

seeker simply wishes to verify what they already know, e.g. they believe that Ron Howard

directed the movie The Da Vinci Code and they wish to confirm that it is so. This is

the kind of information-seeking activity that current technologies handle well. However, the

earlier “filming the punch” example is markedly different, and existing query

technologies do not handle such queries very well.

1 For the purposes of this thesis, the examples of user queries are expressed in terms of what the user is thinking and not how they interface with the Loculus system.

In this thesis, we propose a new method of information extraction support that is based

around the perspective of the information seeker, where perspective takes into account the

seeker’s relationship to the information being sought. We explore the viability of such an

information extraction support mechanism within the context of the motion picture industry

with the aid of our industry partners AFTRS, under the collaborative research umbrella of

the ARC Centre of Excellence for Creative Industries and Innovation (CCI). However, the

method being proposed is applicable to any domain where perspective of the information

seeker matters.

Before we proceed, we should explain what we mean by information extraction and how it

differs from information retrieval. In information retrieval as performed by a search engine,

usually whole documents are returned. In information extraction, the whole or a part of a document

may be returned. In addition, information extraction may also include extracting information

from multiple sources and presenting it to the information seeker in a coherent manner. In

this way, the idea of information extraction is similar in nature to that in natural

language processing, where information extraction involves extracting a part of a corpus of

text for processing [3]. Perspective sensitivity and ill-defined queries come into play

because, without some awareness of perspective, the information cannot be extracted and

presented properly, with ill-defined queries being a reflection of the information seeker’s

perspective in terms of their expertise on the topic of the information-seeking activity.
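
The distinction between retrieval and extraction drawn above can be illustrated with a minimal sketch. It is not the Loculus implementation: the `Document` class, the `retrieve` and `extract` functions, the keyword matching and the toy corpus are all assumptions introduced purely for illustration. Retrieval returns whole documents, whereas extraction returns only the relevant parts, gathered across multiple sources.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Document:
    """A hypothetical document made up of a title and a list of paragraphs."""
    title: str
    paragraphs: List[str]


def retrieve(documents: List[Document], keyword: str) -> List[Document]:
    """Information retrieval: return every WHOLE document that mentions the keyword."""
    keyword = keyword.lower()
    return [d for d in documents
            if any(keyword in p.lower() for p in d.paragraphs)]


def extract(documents: List[Document], keyword: str) -> List[Tuple[str, str]]:
    """Information extraction: return only the matching passages,
    gathered across all sources and labelled with their origin."""
    keyword = keyword.lower()
    return [(d.title, p)
            for d in documents
            for p in d.paragraphs
            if keyword in p.lower()]


# Toy corpus (invented for this sketch).
corpus = [
    Document("Editing Handbook",
             ["Cut on the impact of the punch.", "Colour grading basics."]),
    Document("Acting Notes",
             ["Sell the punch with your reaction.", "Voice warm-ups."]),
    Document("Lighting Guide", ["Key light placement."]),
]

whole_documents = retrieve(corpus, "punch")  # two complete documents
passages = extract(corpus, "punch")          # two passages, one per source
```

Simple keyword matching stands in here for whatever matching a real system would use; the sketch only shows the difference in what is returned, not how perspective would influence the selection.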

1.1 Motion Picture Industry Background

The motion picture has been described as “THE contemporary art form” [4] that draws

together people from a wide variety of fields to create what is both a consumer product and a

cultural artifact. These artifacts are unique and always have a creative underpinning with

many films aiming for the heights of artistic expression. However, while most other mediums

of artistic expression deal with one aspect of the sensory field, the motion picture touches

multiple sensory fields [4]. Motion pictures share visual space with paintings and sculpture

[4]. Motion pictures share audible space with theatre, poetry and music [4]. Motion pictures

share the space of action with literature and theatre [4]. This speaks to the complex nature of

motion pictures and their production process: it is a collaborative effort that brings

together a variety of people to create a multi-sensory work that is both a

consumer product and a cultural artifact. In this section, we first look at the facets of the

industry that contribute to its complex nature, namely the timelines of the industry, the

importance of the people and the nature of the product. Finally, we look at the

challenges being faced by the industry. It is these challenges, combined with the

characteristics of the industry, that make the motion picture industry an interesting domain of

application for perspective-based information management.

1.1.1 Timelines

One of the important characteristics of the motion picture industry (MPI), especially during

its production process, is the importance of time. The industry has two timelines, which are

inherently linked to each other and to other activities within the industry. The

relationship between the two is shown in Figure 1.1.

[Figure 1.1: The Two Industry Timelines. Production cycle: pre-production, production, post-production. Life stages: conception, production, utilisation (distribution, discovery, access, reuse/repurpose, preservation).]

The first timeline is the ‘production cycle’ timeline, the process by which the motion picture

is created. The production cycle is broken into three phases: pre-production, production and

post-production. It is hard to set precise boundaries on when pre-production starts, as it

usually involves imprecise tasks. It takes time to get the basic concepts of the film to such a

state that it obtains a commitment to fund further development or is ‘green lighted’ as it is

termed in the industry.

In contrast, the production phase has precise boundaries, starting on the first day of shooting

and finishing on the last day of shooting. As soon as production ends, post-production starts

in earnest, although some post-production activities, e.g. special effects for scenes already

filmed, might have begun while the bulk of the motion picture was still in the production

phase. Post-production encompasses everything after production. This is

because there is always something to do, whether it be to produce the final cut, to market the

final cut, to digitally re-master the motion picture for a new generation of viewing

technologies or simply to preserve it. As such, there is merit in saying a completed motion

picture is always in post-production, since there really is no event that can be marked as “the

end”.

The second timeline is the motion picture’s life stages. These life stages are conception,

production and utilisation. Utilisation in turn comprises distribution, discovery, access,

reuse/repurpose and preservation. The life stages do not map exactly onto the production

cycle, nor do all motion pictures reach all life stages. A motion picture is in the conception

stage when it is conceived and is being fleshed out. The latter part of conception

correlates with pre-production. A motion picture is in the production life stage when it is being

produced, so the latter parts of pre-production, all of production and the post-production

activities that end with the creation of the final cut correlate with this stage. Utilisation

spans the remainder of post-production. However, while there is an order in which the life

stages must be reached, the life stages are not a good measure of chronology. This is because

a motion picture can exist in multiple life stages at once and can return to a previous life stage

under certain circumstances. Certainly, two sub-stages of utilisation, discovery and access,

happen multiple times.

Therefore, while the production cycle and the motion picture life stages are related, they are

not the same. An easy way to distinguish between them is that the production cycle creates

the motion picture, while the life stages are the various stages in the life of the motion

picture.
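
The relationship between the two timelines can be made concrete with a small sketch. This is only an illustrative model under stated assumptions, not part of Loculus: the names `Phase`, `LifeStage`, `OVERLAP` and `life_stages_for` are hypothetical, and the overlap table simply encodes the approximate correspondences described above.

```python
from enum import Enum


class Phase(Enum):
    """The three phases of the production cycle."""
    PRE_PRODUCTION = "pre-production"
    PRODUCTION = "production"
    POST_PRODUCTION = "post-production"


class LifeStage(Enum):
    """The three life stages of a motion picture."""
    CONCEPTION = "conception"
    PRODUCTION = "production"
    UTILISATION = "utilisation"


# Sub-stages of utilisation, as listed in the text.
UTILISATION_SUBSTAGES = (
    "distribution", "discovery", "access", "reuse/repurpose", "preservation",
)

# Approximate overlap between the timelines (an assumption distilled from the
# prose): the mapping is many-to-many, which is why the two are not the same.
OVERLAP = {
    LifeStage.CONCEPTION: {Phase.PRE_PRODUCTION},
    LifeStage.PRODUCTION: {
        Phase.PRE_PRODUCTION, Phase.PRODUCTION, Phase.POST_PRODUCTION,
    },
    LifeStage.UTILISATION: {Phase.POST_PRODUCTION},
}


def life_stages_for(phase: Phase) -> set:
    """Return every life stage that can coincide with the given phase."""
    return {stage for stage, phases in OVERLAP.items() if phase in phases}


# Post-production can coincide with two life stages at once.
stages_in_post = life_stages_for(Phase.POST_PRODUCTION)
```

Representing the answer as a set rather than a single value reflects the observation that a motion picture can be in more than one life stage at once.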

1.1.2 Importance of People

Another important aspect of the industry that adds complexity to its nature is the diverse

range of crafts or skills involved. As mentioned in the introduction to this section, the motion

picture development process brings together people from a wide variety of fields. Some of

these people are only involved with one phase of the production cycle or only one of the life

stages of the motion picture. Others are involved with multiple phases or life stages, and a few

are involved for the entire production cycle and the majority of the life stages.

This mix of agents who are involved short term and agents2 who are involved long term raises

interesting dynamics in terms of how the industry functions. One of the contributing factors

to the stagnation of innovation in the production process of the motion picture has been the

short-term nature of people involved in the production phase of the production cycle and the

production life-stage of the motion picture. Aside from a few notable exceptions, the span of

the production phase is measured in months, even when the span of the entire production

cycle is measured in decades. For example, The Curious Case of Benjamin Button started

pre-production in 1994 when film industry executives were first approached with the

possibility of filming an adaptation of the F. Scott Fitzgerald short story of the same name.

Production, however, did not start until 2007, with the film finally released in 2008 [5]. What

this means is that when people are brought together to work on the filming of the motion

picture, everybody is expected to know their role and to perform it according to the industry

standard. This inevitably means that changing the way the industry defines a role can become

a sluggish process. There is a significant risk from the uncertainty associated with changing

roles and how different roles interact, which is a major reason why some parts of the motion

picture production process have remained unchanged for decades.

Despite this, standard industry practices are slowly changing (which will be discussed in

detail later in the chapter) but what is not changing is how the perspective of people differs,

based on the length of time they are associated with a project. This also speaks to

specialisation and collaboration. The movie-making process is inherently a collaborative

process; however some people’s day-to-day activities involve more collaboration than others,

e.g. the director’s work is more collaborative than that of the costume designer. A director’s

job involves directing the technical and artistic crew and cast members to bring to the screen

the director’s vision of the script, in the process controlling the artistic direction of the film.

The costume designer, on the other hand, works on a small subset of the script visualisation

process and, while they are directed by the director and have to make sure that the costumes

fit the actors, they have their own expertise and their own largely self-contained way of

bringing to life, through their costumes, the era and situation depicted in the script.

Some people’s information needs are mostly limited to those generated by their own

department, e.g. the costume designers do not need much information from outside their

department. Other people’s information needs extend not only across departments but across

phases and stages of the two timelines, e.g. the producer works across all phases and needs

information from all phases.

2 The term “agent” is used, and will be used throughout this thesis, in reference to practitioners of the domain. “Agents” as understood by the motion picture industry will be referred to more specifically, e.g. talent agent.

Another interesting aspect of the people involved in the motion picture industry is that

they range from highly technical to highly creative, and include people who are both

creative and technical. For example, an electrician who installs the lights is a technical

person, but “lighting technician” is not merely a fancy name for an electrician. The key grip,

the head of the grip department and chief rigging technician on the set, is part of the creative

technical crew, who have an active role in bringing the magic into the movie. On the other

hand, an actor is a purely creative role.

This mix of short-term and long-term, technical and creative, specialised within a narrow

field but collaborating towards one grand vision, means that the people involved in the motion

picture industry contribute greatly to the complex character of the industry.

1.1.3 The Product of the Industry

The motion picture is a creative product; it is a cultural artifact but, at the same time, it is

generally meant for mass consumption. Even the most formulaic

of motion pictures, e.g. Rocky, Rocky II, Rocky III, Rocky IV, Rocky V, is a creative product

and is therefore subject to the idiosyncrasies of the creative process and the creative mind.

Why did the director feel the need to reshoot a scene forty-two times? Would adding more

car crashes make the movie more appealing to the target demographics? How should the film

be edited to make the story more thrilling? It is not always clear why certain decisions are

made, even to the person making the decisions.

More importantly, both as an artistic product and a consumer product, the motion picture is

subject to taste-based subjective measures that make the business of making motion pictures

very unpredictable and therefore financially very risky. In truth, there is no such thing as a

“sure fire hit”. Unexpected films often find box office and/or critical success, e.g. Slumdog

Millionaire succeeded despite featuring unknown actors, being set in a non-western society, having part of

the film shot in a foreign language and having the entire story strongly tied to the

history of the city of Mumbai, which would be unfamiliar to western audiences [6]. At other

times, what are considered sure fire hits end up being flops, e.g. Lions for Lambs, which

starred Tom Cruise and Meryl Streep, with Robert Redford directing, producing and starring in

the film. In addition, the film expounded the heroism of US soldiers in Iraq and Afghanistan

[7]. Yet despite combining patriotic themes with leading actors, the movie still failed. This is

a testament both to the fickle nature of motion picture audiences and to the inherent risk

associated with a creative product.

What the people making the motion picture thought was a good idea, the audience can

perceive as being a very bad idea. On the flipside of the coin, audiences often take a liking

to unexpected films. Whether a film succeeds or fails, the reasons behind it are often

complex and unquantifiable. Sometimes the reason is

obvious in hindsight but not necessarily replicable, e.g. The Blair Witch Project was an

unexpected hit due to a marketing campaign that had people convinced that it

was in fact a true story, but it is not a trick that will necessarily work twice.

Given the large budgets needed to make motion pictures, studios and investors would like to be able

to answer the questions: what makes a good motion picture good? What makes a bad motion

picture bad? Sometimes it is obvious which category a motion picture belongs to, but most of the

time it depends wholly on who is giving the judgement. In addition, being “good” does not

guarantee commercial success, nor does being “bad” preclude commercial success. Many

blockbuster movies are critically considered to be mediocre at best and more often than not

tend to follow a formula, e.g. the James Bond franchise. This, however, does not stop them

from attracting huge audiences.

On the other hand, artistic films, even when critically acclaimed, are often not commercial

successes. Indeed, in recent times the gap between artistic films and so-called commercial films

has been steadily growing; one of the major signs of this gap is that blockbusters rarely win

Academy Awards these days. This was not historically the case. Indeed, historically,

big-budget blockbusters such as Ben-Hur were more favoured for Oscars than smaller

independent films. These days, however, the Oscar for Best Picture tends to go more to

independent films such as Slumdog Millionaire.

There is, however, another reason for the shift in voting pattern for the Oscars. The motion

picture industry is an industry that is highly risky but at the same time risk averse. The

production cost of most movies is counted in the millions and sometimes hundreds of

millions of dollars; however, the unit price of a movie ticket at the theatre is ~$15 and the unit

price of a DVD rental of a one-year-old movie can be as low as $3. With such low unit

revenues, the industry generally depends on mass consumption to make a profit. Increasingly,

the major studios are going for the safe option of sequels and prequels to past hits, e.g. the

Star Wars films, and adaptations of books, video games and plays that have a built-in audience, e.g.

the Harry Potter films, and therefore a better chance of success in the fickle world of

consumer taste that determines the success or failure of such a taste-based product as a

motion picture. This is one of the main reasons why the voting pattern for the Academy

Awards has shifted from rewarding films from major studios to rewarding those made by smaller

independent studios, which often take more creative risks. The Academy is increasingly finding

that the big studios simply do not produce films that fit the Academy’s “arts” self-image.

So far we have discussed the motion picture as an artistic product, a creative product and as a

consumer product. However, at the beginning of this section we also stated that the motion

picture was a cultural artifact; motion pictures tell the story of the period in which they are

produced. Even if they are action movies or popcorn blockbusters, they reflect the state of

mind and values of the audiences they are targeting. However, the cultural significance of a

movie cannot always be judged by the immediate reaction that it garners. For example, the

1946 film It’s a Wonderful Life is today deemed "culturally, historically, or aesthetically

significant" by the United States Library of Congress and selected for preservation in

their National Film Registry [8]. However, at the time of its original release the reviews were

almost all negative and the FBI considered the film to be communist propaganda because of

the negative portrayal of a banker [8]. In many parts of the US, audiences walked out of the

film as soon as it appeared that George Bailey, the main character of the film, was

contemplating suicide. Therefore, the movie suffered from poor word of mouth as people did

not realize that it was actually a rather uplifting film, despite its downcast opening.

At the same time, some motion pictures are specifically funded for cultural significance with

little regard to box office potential, e.g. Ten Canoes [9]. Culturally significant movies are

often funded by government bodies, private artistic funds and grants. Such films are almost

certainly not the domain of major studios. Due to toughening economic situations, the major

studios are shying away from the small portion of these kinds of movies they once used to

fund. This speaks directly to the challenges being faced by the industry, which are discussed in

detail in Section 1.1.4.

1.1.4 Challenges being faced

During the 1930s and 1940s, which are generally considered the golden age of cinema, the motion

picture industry was dominated by the four major studios that exerted control over all aspects

of the production and marketing of films. Their business model was a direct pipeline: the

cinema operators would tell studios they wanted three Clark Gable movies, two Spencer

Tracy movies and one Laurence Olivier movie. The studios would then get to work on these

movies with a guaranteed distribution channel. While not the only game in town, movies

were by far the entertainment of choice for the masses due to the affordability of tickets and

the general novelty of the concept of motion pictures. However, the coming of television

spelt the death of that model and introduced the prime challenge that the industry faces

even today: rising costs but decreasing revenue. That challenge no longer comes just from

television, which is now itself an established player in the greater screen-based entertainment

industry.

The chief competition for the entertainment consumer’s dollar comes from the computer

games industry. The average age of a computer gamer has risen to 35 - in the heartland of

film’s traditional audience age group [10]. While the unit cost of a single game is higher than

the unit cost of a single movie ticket, a game offers many more hours of entertainment. Not

surprisingly many people, with and without families, now prefer to spend their entertainment

dollars on games, achieving more hours of entertainment per dollar than a night out in the

cinema.

Piracy has been a problem for the motion picture industry ever since the advent and

widespread adoption of video cassette recorders and players made it easy to duplicate and

distribute pirated copies. However, the rise of the Internet has taken piracy to a new level.

Sometimes piracy is a direct market response to out-of-date industry practices. For instance,

unmet demand is created when film business models require the staggered release of a

motion picture through a sequence of countries [11], e.g. if a movie is released in the US in

January but does not come to Australia until May, people in Australia who really want to

see the movie will often simply download a pirate copy, not because they are averse to paying

money to watch it but because they do not wish to wait [11]. Other times piracy is a result of

consumers having limited money for entertainment and needing to stretch it [11].

Although there is a cost associated with the bandwidth needed to download, downloading is

still cheaper than going to the movies and, despite the industry coming down hard on pirates,

the industry is far from winning the war against piracy [12]. A lot of the time, however,

piracy is simply an unwillingness to pay for a given motion picture.

However, even as the Internet has cut into the revenue of the motion picture industry, it has

also opened up new opportunities for the industry. Most obviously, the

Internet is a new distribution channel that allows the industry to reach consumers and

redefine the concept of a niche market. There have always been niche markets in the movie

industry together with niche production companies servicing these markets, e.g. the Mormon

movie industry, horror movies, surfing movies etc. That being said, these niche markets have

often been limited by access, which is often connected with geography. For example,

Mormon movies are easily accessible in Utah, but followers of the Mormon faith (or people

who just happen to like Mormon movies) outside of Utah have a far more difficult time

getting access to these movies. Internet distribution is able to make niche movies more

accessible to a worldwide audience.

Moreover, the Internet has opened the door to capitalise on fan culture. Fan culture is not

very well understood by the industry and is often viewed with suspicion and, under existing

copyright laws, often treated as criminal [13]. Yet, facilitating fan culture and enabling the

fans to legally do what they already do covertly could potentially generate new revenue

streams for the industry. For example, many fans of Star Wars prefer Han Solo to Luke

Skywalker. These fans make montage videos as tributes to Han Solo and post them on

YouTube. They write fanfiction that rewrites the Star Wars story to turn Han Solo into a Jedi

and the saviour of the galaxy. These fans would love to get hold of footage focused on Han Solo

that ended up on the cutting room floor and be able to use it, to remix it [13], and

create something new. Fans are also interested in things that might be considered trivial by

non-fans and even the creators of the movies. Fans of Star Wars would be interested in

reading the different versions of the script. What was the script originally? How did it change

over time? Was one of the other versions of the script far superior to the version that was

filmed? These are not questions that interest non-fans, but they are exactly the questions that

fans would find fascinating.

There are two issues that stand in the way of fully realising the value of fan culture and

exploiting the niche markets. Firstly, there is the long-ingrained business practice of the

industry, which stops industry executives from realising the full potential of new

developments in technology and of the behaviour of fans in response to that technology. Secondly,

there is the issue of hampered ability due to mismanagement of information resources.

Earlier, we mentioned how the different versions of the Star Wars scripts would be of interest

to the fans. However, even if industry executives recognised the benefits of making

these various versions of the script available, they may no longer have the versions to

release. The information regarding the changes made to the script is meticulously

recorded during production and promptly discarded after the film hits post-production. Even

if the scripts are kept, they are not preserved in a reliable way. There is nothing we can do to

change the business culture of the motion picture industry directly. However, we can develop

tools and techniques that make it easier for the motion picture industry to manage, reuse and

repurpose its information. Perhaps changing the information culture may also alter the

business culture as it becomes more cost effective to support fan culture and to grant access to

more consumers, increasing the size of a niche market.

1.2 Motivation for the Project

The motivation for this project came from the observation of the inefficiencies of the motion

picture industry in relation to information management. The motion picture industry is an

information intensive industry, generating and utilising vast quantities of data each day with

the practitioners within the industry having unquantified stores of tacit knowledge. However,

while the industry is deeply reliant on information, the tools and techniques necessary for

efficient management of such information are lacking. There are different types of databases

with various gatekeepers of information presiding over them, such as casting directors with

databases of actors or location managers with databases of locations. These are effectively

information silos with little communication with other silos and even within the silos the

utilisation of information is limited. Part of the problem with these silos is the use of

proprietary formats that make it difficult to export the information. The other problem with

the silos is the gate-keepers, who either do not understand the value of the data to others or

believe providing access to the data would diminish their influence and job security.

Meanwhile, the industry is steadily moving from analogue to digital processes. It is not only

using digital tools, e.g. digital cameras and editing suites; due to increased economic

pressures, the industry is also at long last moving away from century-old manual

processes and adopting semi-automated workflow systems [14, 15] and other tools and

techniques that are generating more “born digital” information than ever before. However,

unless these new information sources are properly managed then the information will have no

life beyond the immediate and limited use within the production process. Therefore, there is

considerable scope for development of tools and techniques to aid the industry in managing,

enriching and generally adding value to the digital information that the industry is generating

as part of its production process. This information can be a potential new source of revenue

when utilised to support a fan culture. These kinds of fan activities also serve to keep the

movie in question in the public consciousness and fuel further consumption by the general

public, as well as helping the movie create a lasting cultural impact.

Information can be a valuable commodity and, increasingly, the contextual information

generated during the production process of a motion picture is being recognised as having

value all of its own [16]. However, while the motion picture industry has always generated

and consumed a vast quantity of information during any given production of a single motion

picture, the repurposing and repackaging of that information to generate additional value has

been fairly non-existent. Part of the problem has been the form of the information.

Traditionally the information was in analogue form and often cumbersome; it could not easily be

repackaged, or it was held tacitly inside people’s heads, but this is changing. The other is that

many aspects of the industry, as explained earlier in terms of the short-term involvement of

many agents, have been stagnant for many decades now and/or have not changed since the

industry came into being about a hundred years ago and therefore, the value of such

contextual information is not recognised and/or practitioners are unsure how to generate

value from the information.

The information can generate value in three broad ways. Firstly, through repurposing the

information for uses other than what it was generated for. Secondly, by reusing existing

information and, thirdly, by enabling people to make more informed decisions.

1.2.1 Decision Support

The first type of decision support activity that can potentially be supported by allowing more

advanced forms of information manipulation is process improvement. Within the industry,

production information collected about budget over-runs and other things can be analysed to

reveal chronic problems and point to possible solutions. This kind of trend analysis requires

sophisticated information retrieval across multiple motion picture productions. If such data

were available in large quantities, then data mining could also be used to find correlations

that might not be immediately obvious. For example, some actors might be associated with

production cost over-runs due to their temperamental behaviour during filming, or the desire

for certain editing techniques might necessitate more cameras during filming. Even simple

analysis could lead to improvements to the production process. Such lessons learnt at the

industry level could flow back to the teaching institutes as comprehensive case studies and

best practice guidelines.

The second type of decision support that can be facilitated through the better management of

data is Learning Support. Even films that are spectacular flops, such as the 2004 film

Catwoman, can generate valuable information during their production. Catwoman, for example,

was the first film that was totally digital, including totally digital editing. Indeed, this was

cited as one of the contributing factors towards its failure, with the editor complaining that,

as the film was totally digital, nothing ended up on the “cutting room floor” so to speak,

leaving too many options open for the final cut. The film’s producers, director and editor had

a hard time locking down the final cut, as they were overwhelmed by the amount of choice

available to them. While this was by no means the only fault with the film, analysis of the

production information of Catwoman might lead to insights and best practice guidelines for

an industry that is rapidly changing, after not changing for a hundred years.

1.2.2 Repurpose

Repurposing can take on two forms: content repurposing and process repurposing. Information

can be repurposed to feed back into the industry or related projects; alternatively,

information can be repurposed for use outside of the industry.

Some forms of repurposing are already taking place. For example, CGI generated for movies

can now be imported into game engines and can thus double as game graphics. CGI from

movies to games is an obvious move but there might be other possible ports of data and

information. Could editing information be ported into preservation software to enable better

preservation of historically important motion pictures? The answer to such a question is

totally unknown simply because it is currently not feasible to share data and information

easily between various processes of motion picture production.

Repurposing can also go beyond the bounds of the industry. For example, stills taken of a real

location (as opposed to a sound stage) that has been used repeatedly can become a historical

record of the change sustained by a location. With the emergence of the Internet, the

value-added function of remixing has been touted as an additional revenue stream, especially within

the Creative Commons community as part of the hybrid economy [13]. While remixing parts of

the original film is controversial, most of the filmed material actually ends up on the

cutting room floor. With better management of these shots, which are now almost always

digital and therefore don’t have to end up in the bin, the motion picture industry can get a

good foothold within the hybrid economy by repurposing what is today being thrown away as

waste product.

1.2.3 Reuse

Information such as location scouting reports can easily be reused. The gatekeepers of the

location scouting reports are location managers, who not only investigate locations for a

particular movie but also go scouting for interesting locations without having any particular

movie in mind. The location reports are often kept on paper and become outdated easily.

However, if moved to digital formats and integrated with tools that can update the location

info automatically, e.g. Google maps, heritage-listing databases, photo archives, the location

scouting reports can become more accessible, with the location manager freed from the task of

keeping their existing locations updated and therefore free to go and find new locations, as well as

to spend more time selecting the right location for each scene of the film. Therefore, through

information re-use, the location manager becomes both more effective and more efficient.

This does not diminish the status of the gatekeeper; it simply makes the gatekeeper more

efficient by removing some of the more tedious aspects of their work. There will still be the

Location Manager, except said Location Manager will no longer have to trawl through their

old locations updating tedious details like new zoning regulations, heritage listing or even

change in access condition and other things that can easily be done automatically.

It is not to say that information within the motion picture industry is not reused now. It is just

that the reuse is often a needlessly laborious process, which often deters or at least reduces

reuse.

1.2.4 Driving Questions

The motivating scenarios mentioned above, combined with the observation regarding

perspective, lead us to the driving questions that motivated the work being reported in this

thesis. The central questions are:

• How to model the domain of discourse to reflect and include the different

perspectives of agents?

• Can agent perspective be exploited to better serve the needs of the Motion Picture

Industry?

• What kind of a system could exploit agent perspective to better serve the needs of the

Motion Picture Industry?

1.3 Project Background

The research presented in this thesis is part of a larger project between the ARC Centre of

Excellence for Creative Industries and Innovation (CCI) and the Australian Film Television

and Radio School (AFTRS). The research reported in this thesis is supported by a CCI

scholarship. In this section, we will cover briefly the background from which the thesis

emerged.

1.3.1 Brief introduction to CCI and AFTRS

Established in July 2005, the ARC Centre of Excellence for Creative Industries and

Innovation (CCI) is the first Centre of Excellence funded outside the science, engineering and

technology sectors. The centre’s research interests fall under three broad categories: Creative

Innovation, Innovative Policy and Creative Human Capital. The centre brings together

researchers from the divergent fields of humanities, the law, various social sciences and

information technology. As part of its Creative Innovation research stream, the centre

engaged in a variety of research projects to develop tools, techniques and methodology for

better management of information and information-related services for the Creative

Industries.

One of these projects was the Standards and Metadata project. The primary industry partner for the

Standards and Metadata project was the Australian Film Television and Radio School

(AFTRS), which is also an industry participant in CCI itself.

Created in 1973 by the Australian Government as a key strategy to revive the Australian Film

Industry, AFTRS is the only institution of its kind in the country. AFTRS has an international

reputation for excellence and has had three of its short films (created by their students as part

of their final year project) achieve Oscar nominations. AFTRS has a huge back catalogue of

student films, to which new films are added every year by the graduating students.

Consequently, AFTRS has a growing need to effectively manage information about motion

pictures and related activities, making it a suitable partner for the CCI projects in information

management. In addition, AFTRS has total ownership and reserves all rights to these films.

This meant that an agreement with them would suffice and there would be minimal copyright

related issues, as AFTRS owned both the films and all information pertaining to the

production of the film in most circumstances. Moreover, AFTRS has specific issues and

needs that made them potential candidates for early adoption for technologies developed by

the Standards and Metadata project.

1.3.2 What AFTRS wanted from the Project

Like many organisations AFTRS has information silos that result in the considerable wasted

effort of both staff and students in having to supply the same information multiple times to

different departments resulting in information discrepancy and general information

redundancy. It also leads to information fatigue, where providers of information become

reluctant to provide quality information because they have had to provide it so many times.

The organisation as a whole also found itself unable to utilise its information properly both

because of the information silos and the general difficulties in management of information

sources.

AFTRS’ needs fall into the three broad categories of information use mentioned earlier.

AFTRS wants to be able to easily reuse the information it collects; it wants to repurpose its

information to generate more revenue and lastly, as a teaching institute, it would be keen to

utilise its information for decision support purposes.

An example of reuse was in terms of something as simple as distribution release information

for final year graduate films. At present, AFTRS students have to first fill out an internal

AFTRS form providing details to the AFTRS distribution manager, which is used for

AFTRS’ internal records and subsequently supplied to the AFTRS library for their catalogue

information. The AFTRS library is required under the Australian Federal Government

Legislation to keep meticulous records of the student graduation films and then make both the

films and the information available to the National Archives of Australia (NAA) for

permanent storage. Previously the students had to fill out the distribution release form only

once. These days, the students face the additional requirement of re-entering the information

provided in the official AFTRS distribution release form online into multiple databases of

distribution clearing houses. This is because AFTRS actively enters their students’ works in

film festivals all over the globe on behalf of their students. These days the majority of the

festivals demand online submission facilitated through online distribution clearing houses.

AFTRS has no means of efficiently or easily porting the data they gather for their internal use

directly into the forms for the online clearing houses. As a result, AFTRS requires its students

to first fill in their internal forms and then provide the same information to three different

online clearing houses. Student participation was problematic when there was only one form.

The cooperation of students has been greatly reduced with so many more forms. AFTRS has

one full-time distribution manager and one part-time assistant; their workload is needlessly

increased when they find themselves forced to track down incomplete forms or identify

inconsistent information that has been provided across the four forms.

Apart from the distribution clearing houses, there are now many websites that are repositories

of information regarding films. Well-known examples are IMDB and Wikipedia. AFTRS

also has a channel on Google Video where it releases student films. The information

provided in the distribution release form could be repurposed for use in websites such as

IMDB and Wikipedia. The information can also be repurposed for the metadata requirements

for Google Video. This would not be a direct reuse of information as the information would

have to be transformed to “fit” the requirements of IMDB, Wikipedia and Google Video.
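
The transformation described above can be sketched as a simple field mapping. The following is a minimal illustration only: the record fields, the target names and the target field labels are all hypothetical, and do not reflect the actual AFTRS form or the real submission formats of any clearing house or website.

```python
# One internal record, captured once (all values are made-up placeholders).
RELEASE_FORM = {
    "title": "Example Film",
    "director": "A. Student",
    "year": 2008,
    "runtime_minutes": 12,
    "contact_email": "student@example.org",
}

# Hypothetical mappings: internal field name -> field name expected by a target.
TARGET_MAPPINGS = {
    "clearing_house": {
        "title": "FilmTitle",
        "director": "Director",
        "runtime_minutes": "Duration",
        "contact_email": "Contact",
    },
    "encyclopedia_entry": {
        "title": "name",
        "director": "directed_by",
        "year": "released",
    },
}


def repurpose(record, mapping):
    """Rename and select the fields of one record to fit one target."""
    return {target: record[source]
            for source, target in mapping.items()
            if source in record}


if __name__ == "__main__":
    for target_name, mapping in TARGET_MAPPINGS.items():
        print(target_name, repurpose(RELEASE_FORM, mapping))
```

In this sketch the student supplies the information once, and each target receives only the subset it needs under its own field names. A plain renaming table is the simplest possible transformation; it says nothing about how meaning is preserved across targets, which is the gap the semantic layer discussed later is meant to address.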

Lastly, in terms of decision support, the information of successes and failures of past student

films at various film festivals could be used to streamline the film festival entrance process

and allow the distribution manager to better focus their limited time and resources.

The above are but a few examples of the needs and issues faced by AFTRS and for which they

hoped the Standards and Metadata project would have a solution.

1.3.3 Overview of the Standards and Metadata Project

The idea behind the Standards and Metadata project was to take a leadership role in the

refinement and implementation of cross-sectoral and cross-stage metadata and format

standards. In particular it was to research and demonstrate strategies to incorporate the

effective gathering, use and integration of metadata into the value chain activities.

In essence, the idea was that metadata would be used to enable the reuse and repurposing of data and

to allow data to be used for decision support purposes. However, it quickly became apparent

that metadata alone was not enough and indeed somewhat meaningless. For any system to be

able to properly interpret the metadata it needed a semantic layer that was richer than simple

metadata. It needed an ontology, which in turn became the central focus of this PhD. The link

between the PhD and the project was that the PhD would focus on the development of the

semantic module as part of a larger system that could be deployed within AFTRS. It was

hoped that such a system would aid AFTRS’s efforts of punching holes into its various silos

and improve communication between them. From the point of view of the PhD project the

deployment would be used as a means of testing the semantic module and its effectiveness

in improving information classification and extraction. Unfortunately, a number of

factors (such as personal changes and changes at AFTRS) led to delays with the over-arching

project that resulted in the work undertaken for this PhD being conducted independently of

the overall project. This also had implications in terms of co-operation with other CCI projects

working with AFTRS in the stream of Creative Innovation, notably, the Business Process

Management (BPM) project that has been introducing workflow management tools into the

production process of AFTRS final year student short film projects. The idea was that the

data collected via the Business Process Management project, which would be in XML form,

would be used by the system being developed by the Standards and Metadata project for

testing purposes and used as test data for the research reported in this thesis. While some data

has been obtained from the BPM project, it is not as voluminous as this research requires.

In the meantime, the PhD project has pushed on and developed the semantic module, with

only a skeletal structure of the wider system being developed to support some limited testing

of the semantic module. Overall the system is not yet in a state where it can be deployed as a

fully functional prototype within AFTRS. Therefore, it has not been possible to extensively

test the semantic module, given the lack of the surrounding system and the limited availability

of “real” data. Verification of the semantic module reported in this thesis has resulted from

largely manually constructed test data.

1.4 The Thesis

At the beginning of this chapter we described the scenario of different AFTRS students going

to the AFTRS library and asking the same question and expecting very different results.

Subsequently we described scenarios where the same piece of information, the completed

distribution release form, could be used in various different ways. However, how are these

two things connected? The answer is that the context surrounding the information-seeking

activity, which is composed of the purpose to which the seeker wants to put the information

as well as the information that the seeker already possesses, determines what subset of a given

information repository the agent seeking the information is most interested in.

The context in which information is captured and how it is to be used can be expressed

through the use of metadata tags. However, a set of metadata tags does not provide

sufficient semantics to allow for complex and nuanced multifaceted uses of information that

allow different users to pose the same question and get answers framed from their

perspective, or allow the relevant parts of a given document to be isolated and

transformed for use in multiple circumstances based on the needs and perspectives of those

seeking the information. Certainly an ontology for the domain is needed to provide meaning

for the metadata tags. However, is that all the domain ontology is good for? Can it be used to

determine the perspective of the agents involved with the processes of the domain? If so, how

can that perspective be determined and applied?

In short, the central question being addressed by this thesis is: given a domain ontology, how

can context and perspective be determined from that ontology so as to enhance the seeking of

information to satisfy various needs and situations? Here, a domain ontology is a model of the

concepts of a domain and how they relate to each other.

From the context of the students of AFTRS, the reason the director, editor and actor want

different answers to the same question is because they have different perspectives on the

same situation, due to the role they play within that domain of discourse, a difference that

should be evident upon examination of the domain ontology within the context of the user’s

role. All roles of a domain that can be occupied by an agent should rightly be part of that

domain’s ontology. In addition, the links of these roles to concepts and their nearness to

some concepts more than others can be used to infer the type of answers the agents occupying

those roles are seeking. We can infer how the agent will view and categorize information

because the agent’s internal categories and cognition will be influenced by the role they

occupy within the process of the domain of discourse: the motion picture industry in this

case [17].

As such, the roles of director, actor and editor are all part of the motion picture industry and

would therefore be part of the domain ontology for the motion picture industry. Editing,

Acting, Lighting, Camera Angles and indeed the concept of a Shoot would also be part of the

domain ontology for the motion picture industry. From the ontology we know how concepts

are related and whether those relationships are direct or indirect, close or distant. From this

we can determine, for example, that the role of editor is closely associated with editing

technique.

On the other hand, as shown in Figure 1.2, the director is less closely associated with editing

techniques. However, the director still has a closer association with editing techniques than,

say, actors. From this we can infer that when an actor asks the question “How is a punch

filmed?”, they are more likely seeking an answer framed in terms of acting techniques as

opposed to editing techniques.

[Figure: diagram relating the roles Actor, Director and Editor to the concept Editing]

Figure 1.2: Distance to Editing
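
The reasoning illustrated by Figure 1.2 can be sketched in a few lines of code. The sketch below is a simplification under stated assumptions: the links between roles and concepts are a hypothetical toy graph rather than an excerpt of the Loculus Ontology, and a plain count of links (found by breadth-first search) stands in for the rule-based relatedness score developed later in this thesis.

```python
from collections import deque

# Hypothetical toy links between roles and concepts (assumed for illustration;
# not taken from the Loculus Ontology).
EDGES = [
    ("Editor", "Editing"),
    ("Director", "Shoot"),
    ("Shoot", "Editing"),
    ("Actor", "Acting"),
    ("Acting", "Shoot"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hop_distance(graph, start, goal):
    """Return the fewest links between start and goal, or None if unconnected."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def nearest_concept(graph, role, concepts):
    """Pick the candidate concept with the smallest hop distance from the role."""
    scored = [(hop_distance(graph, role, c), c) for c in concepts]
    scored = [(d, c) for d, c in scored if d is not None]
    return min(scored)[1] if scored else None


GRAPH = build_graph(EDGES)

if __name__ == "__main__":
    for role in ("Editor", "Director", "Actor"):
        print(role, "-> Editing:", hop_distance(GRAPH, role, "Editing"))
    print("Actor is nearest to:", nearest_concept(GRAPH, "Actor", ["Editing", "Acting"]))
```

With these assumed links, the editor is one link from Editing, the director two and the actor three, so an answer framed in terms of acting techniques would be preferred for the actor. Counting links treats every relationship as equally strong, which is exactly the limitation that a scored relatedness metric is intended to overcome.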

Similar reasoning applies to multiple uses to which a given piece of information can be

applied. When the AFTRS Distribution Release Form is used for standard distribution

purposes by the distribution manager, all the data would be of relevance to the manager.

However, when the information contained within the distribution release forms is used by

agents occupying a different role, for example a marketing manager who is using the

information to generate the Wikipedia page or the IMDB entry for the film in question, then

only a subset of the data would be of relevance. The relevance or rather the relative degree of

relevance of specific pieces of information to a given agent should be determinable through

examination of the domain ontology with reference to the role of the agent and the concepts

to which the information pertains.
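
As a rough illustration of this idea, the sketch below filters the fields of a form by the relatedness between the agent's role and the concept each field pertains to. Everything in it is assumed for the purpose of illustration: the role names, the concept names, the numeric scores, the threshold and the form fields are hypothetical and are not drawn from the Loculus Ontology or from the actual AFTRS form.

```python
# Hypothetical relatedness scores in [0, 1] between a role and a concept.
RELATEDNESS = {
    ("DistributionManager", "Title"): 1.0,
    ("DistributionManager", "Synopsis"): 0.9,
    ("DistributionManager", "RightsClearance"): 1.0,
    ("DistributionManager", "FestivalEntry"): 1.0,
    ("MarketingManager", "Title"): 1.0,
    ("MarketingManager", "Synopsis"): 0.9,
    ("MarketingManager", "RightsClearance"): 0.2,
    ("MarketingManager", "FestivalEntry"): 0.4,
}

# Form field -> (concept the field pertains to, placeholder value).
FORM = {
    "title": ("Title", "Example Film"),
    "synopsis": ("Synopsis", "A short placeholder synopsis."),
    "rights_clearance": ("RightsClearance", "cleared"),
    "festival_entries": ("FestivalEntry", "none yet"),
}


def relevant_fields(form, role, threshold=0.5):
    """Return (field, value) pairs whose concept is related enough to the role,
    ordered from most to least related (ties broken by field name)."""
    scored = []
    for field, (concept, value) in form.items():
        score = RELATEDNESS.get((role, concept), 0.0)
        if score >= threshold:
            scored.append((score, field, value))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(field, value) for _, field, value in scored]


if __name__ == "__main__":
    print(relevant_fields(FORM, "DistributionManager"))
    print(relevant_fields(FORM, "MarketingManager"))
```

Under these assumed scores the distribution manager sees every field, while the marketing manager sees only the title and synopsis. The threshold is an arbitrary cut-off chosen for the sketch; in practice the relative degree of relevance, rather than a hard cut, may be the more useful output.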

At the beginning of the chapter we said that the information we seek, and the manner in

which we go about seeking it, depend not only on our information needs but also on how

much knowledge we have to begin with on the subject. Where the information needs are for

more complex mental activities such as learning and decision making, information retrieval is

necessary but not sufficient [1]. The thesis delves into the area of information extraction, as

well as information classification, where simple retrieval is necessary but not sufficient. The

novelty of the approach reported in this thesis lies with exploring the importance of user

perspective and how it can be exploited to enhance the process of information extraction and

classification.

1.4.1 Contribution

The first contribution of the research reported in this thesis is in the development of a domain

ontology for the motion picture industry that also models the agents involved in the industry

and their links to the various concepts of the motion picture industry. The ontology also takes

into account the temporal dimension that is so closely associated with the motion picture

industry. This makes it markedly different from most other ontologies currently in existence

as most ontologies do not take the concept of time into account. Most ontologies also do not

explicitly model agent roles; however, both time and people are of paramount importance to

the motion picture industry and therefore need to be incorporated within the ontology. This

makes the ontology atypical.

The second contribution of this thesis project is in the development of a score-based

relatedness metric that is used to determine how closely/distantly two concepts are related.

This work has been published in:

• Choudhury, Sharmin and Raymond, Kerry and Higgs, Peter L. (2008) A rule-based

metric for calculating semantic relatedness score for the motion picture industry.

Workshop on Natural Language Processing and Ontology Engineering at the

IEEE/WIC/ACM Web Intelligence Conference, December 9 – 12, 2008, Sydney, Australia.

The combination of the ontology and the relatedness metric then allowed the thesis to pursue

its third contribution to new knowledge by exploring if and how information retrieval and

classification could be improved if the relative perception of users in regards to the concepts

involved in retrieval and classification is taken into account. The work related to this was

presented at the following conferences:

• Choudhury, Sharmin (2008) Ontology based perspective determination and its

• Choudhury, Sharmin (2008) Improving human computer interaction. Creating Value:

Between Commerce and Commons - CCI International Conference, June 25 – 27,

2008, Brisbane, Australia

A minor contribution of this project is in the form of an XML wrapper for the motion picture

industry that wraps information based on the use to which the information is commonly put.

The work related to the schema was published in:

• Choudhury, Sharmin and Pham, Binh L. and Smith, Robert and Higgs, Peter L.

(2007) Loculus: a metadata wrapper for digital motion picture. Internet and

Multimedia Systems and Applications, August 20 – 22, 2007, Honolulu, Hawaii, USA.

The wrapper schema was also used by another CCI project and that work is published in:

• Smith, Robert and Pham, Binh L. and Choudhury, Sharmin (2007) A digital artwork

expression language (DAEL). Internet and Multimedia Systems and Applications,

August 20 – 22, 2007, Honolulu, Hawaii, USA.

1.4.2 Broader application of the research

Due to the origins of this research the primary domain of testing was the motion picture

industry. However, the work being undertaken does have wider application as many

application domains have a need to reuse and repurpose information, and have agents in different roles

seeking different answers to the same question.

For example, suppose a health professional is executing a search on available health records

to send out information and encourage patients at high risk of a certain disease to make

an appointment with their doctors. Let us say that the high-risk patients are those with high blood

pressure. However, medical records available for such searches are rarely complete. In

Australia, at the federal government level, the only detailed medical record available

electronically is through the Pharmaceutical Benefits Scheme. As such, when searching

over such records at the federal level it is beneficial to have the system be able to recognise

that a person who is on medication for high blood pressure, is most likely to have high blood

pressure even though no available medical history record definitively says the patient has

high blood pressure. The medical field is one field where there already exist a large number of

ontologies and is also a field where perspective matters greatly. The perspectives of

specialists and GPs, and of patients who have just been diagnosed and

patients who were diagnosed some time ago, differ greatly. If the differences are taken

into account and used to enhance the search experience, it could be very beneficial. This

application of the work is of interest to the information searching research community in

general and the novelty of the approach was of interest when the work was presented to the

community:

• Choudhury, Sharmin (2008) Ontology based perspective determination and its
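
The blood pressure example in this section amounts to a simple inference over incomplete records, which can be sketched as follows. This is an illustration under assumptions only: the record layout, the medication class name and the single rule linking a medication to a condition are hypothetical, and stand in for relationships that would in practice come from a medical ontology.

```python
# Hypothetical rule: a class of medication suggests the condition it treats.
MEDICATION_SUGGESTS = {
    "antihypertensive": "high blood pressure",
}


def conditions_for(record):
    """Split a patient's conditions into recorded and inferred sets.

    `record` is a dict with optional "diagnoses" and "medications" lists.
    A condition is inferred only when it is not already recorded.
    """
    recorded = set(record.get("diagnoses", []))
    suggested = {MEDICATION_SUGGESTS[m]
                 for m in record.get("medications", [])
                 if m in MEDICATION_SUGGESTS}
    return recorded, suggested - recorded


def likely_has(record, condition):
    """True if the condition is either recorded or suggested by a medication."""
    recorded, inferred = conditions_for(record)
    return condition in recorded or condition in inferred


if __name__ == "__main__":
    # Made-up record: no diagnosis on file, but a subsidised medication is.
    patient = {"diagnoses": [], "medications": ["antihypertensive"]}
    print(conditions_for(patient))
    print(likely_has(patient, "high blood pressure"))
```

Keeping recorded and inferred conditions separate lets a search report why a patient was matched, so that an inference from a prescription is never presented as a confirmed diagnosis.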

1.4.3 Structure of Thesis

Chapter 2 of this thesis presents the literature review, where we explore three broad topics.

Firstly, we will be looking at what exists in terms of information management for the motion

picture industry. Under this topic we will be covering certain information models, metadata

standards and schemas, and existing ontologies, as well as exploring the benefits of using

ontologies for knowledge management. The second topic deals with the idea of similarity/

semantic relatedness and how to measure them and why the ability to measure semantic

relatedness is important. Lastly, we undertake an analysis of information extraction: what it

means, what work is being conducted in the field, and the groundwork that led to our own

thinking in relation to user-perspective-based information extraction.

Chapter 3 presents the research question and outlines the methodology used in this research.

We also present the workflow undertaken by the project to conduct the work being reported

in the thesis.

In Chapter 4 the Loculus Ontology for the motion picture industry is presented. We begin by

explaining the axioms that govern the ontology, as well as detail the sources upon which we

based the construction of relationships between concepts and the origins of the concepts

themselves. We then present excerpts from the ontology to highlight key features and

construction idioms. In addition, we explain the three axes in which the concepts of the

ontology are positioned. The name Loculus comes from Ancient Rome and literally means

little place and was used in a number of senses, including a satchel that formed part of a

Roman legionary’s luggage. The name was chosen to reflect the idea that the Loculus

Ontology is a little bag in which you keep the concepts of the industry.

In Chapter 5 we present the semantic relatedness metric, its rules and the reasoning behind the

rules. We also provide an example calculation, as well as presenting applications of the

metric.

In Chapter 6 we give details of the Loculus System, including discussions on all modules and

the technology used to develop the system. We also present the Loculus schema, which is

used internally with the system for information storage and manipulation.

In Chapter 7 we discuss the evaluation of the ontology, the metric and the system. We also

discuss the results of the evaluation, as well as the findings of the research being reported in

this thesis. In addition, we discuss the applicability of the findings of the research in domains

other than the motion picture industry.

Lastly, we conclude in Chapter 8 by revisiting the research questions first presented in

Chapter 3 and discussing in summary format, how the subsequent chapters addressed the

research questions.

2 Literature Review

“I knew you weren't suited for literature.” – Gonzo, The Muppet Christmas Carol

The literature reviewed during the course of this thesis formed the cornerstone upon which

the work was built. It justified the choice of models, methods and technology, and it

motivated the research by identifying gaps in knowledge and opening areas for

research and exploration.

One of the first areas we explored in the literature review is what already exists for the

management of information; this is presented in Sections 2.1 and 2.2. In Section 2.1 we will

review the development of information and metadata models that have led to metadata

standards and schemas as semantic containers for information management. In Section 2.2 we

will review information management based on ontologies that have often been deployed in

conjunction with the semantic containers discussed in Section 2.1 to enable higher forms of

semantic manipulation. Another topic that will be dealt with is the idea of similarity and semantic

relatedness: how to measure them and why such a measure can be useful. The review of

literature in relation to semantic relatedness is dealt with in Section 2.3. Lastly, we undertake

an analysis of information exploitation and what potential for contribution exists in this field;

this is presented in Section 2.4.

2.1 Information Models and Metadata Models

As the motivation for our research was to enable better information exploitation for the motion

picture industry, a logical starting point was to examine information models. An information

model is an abstract but formal representation of entities including their properties,

relationships and the operations that can be performed on them [18]. Metadata is data

describing data. Metadata models are a type of information model that defines metadata for a

given type of information. Metadata schemas define how data is to be marked up using

metadata, in a mark-up language such as XML.

A number of existing information and metadata models have been put forward for various

different purposes within the motion picture domain, including archival purposes,

commercial business interaction and rights management, such as MPEG-21 and MPEG-7. The

question motivating our literature search was: to what extent can information and metadata

models aid in the better management and exploitation of information for the motion picture

industry. Therefore, we began by examining a couple of existing information models and

evaluated not just these models but the capabilities and limitations of information models

in general. Metadata schemas often become standardised for use within domains. We also

evaluated a number of these schemas and standards to gauge the full effectiveness of

information and metadata models.

2.1.1 Existing Models

Information models are used extensively for archival activities, and these activities include

the archiving of motion pictures themselves, although the archival community

prefers the more general term “screen content”, which also encompasses screen products such as

video games and new media artwork. The prevalent information model within the

archival community is the IFLA Functional Requirements for Bibliographic Records (FRBR)

model [19].

The FRBR is an entity-relationship model. There are three groups of entity relationships. The

entities in the first group, as depicted in Figure 2.1, represent the different aspects of user

interests in the products of intellectual or artistic endeavour. The entities defined as work (a

distinct intellectual or artistic creation) and expression (the intellectual or artistic realization

of a work) reflect intellectual or artistic content. The entities defined as manifestation (the

physical embodiment of an expression of a work) and item (a single exemplar of a

manifestation), on the other hand, reflect physical form [19].

The relationships depicted in Figure 2.1 indicate that a work can be realized through one or

more expressions (hence the double arrow on the line that links work to expression). An

expression, on the other hand, is the realization of one and only one work (hence the single

arrow on the reverse direction of that line linking expression to work). An expression can be

embodied in one or more manifestations; likewise a manifestation can embody one or more

expressions. A manifestation, in turn, can be exemplified by one or more items; but an item

can exemplify one and only one manifestation [19].

[Figure 2.1 diagram: Work is realized through Expression; Expression is embodied in

Manifestation; Manifestation is exemplified by Item.]

Figure 2.1: Entities and “primary” relationships of the FRBR model [19]
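
The cardinalities of the first FRBR group described above can be made concrete in a few lines of code. The following is a minimal sketch in Python and is not part of the FRBR specification or of the work reported in this thesis; the class and attribute names are assumptions chosen to mirror the entity names in [19], and the example film and its versions are purely hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Work:
    """A distinct intellectual or artistic creation (one or more expressions)."""
    title: str
    expressions: List["Expression"] = field(default_factory=list)


@dataclass(eq=False)
class Expression:
    """The realization of one and only one work."""
    description: str
    work: Work
    manifestations: List["Manifestation"] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Register this expression with the single work it realizes.
        self.work.expressions.append(self)


@dataclass(eq=False)
class Manifestation:
    """A physical embodiment of one or more expressions."""
    carrier: str
    expressions: List[Expression]
    items: List["Item"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.expressions:
            raise ValueError("a manifestation must embody at least one expression")
        for expression in self.expressions:
            expression.manifestations.append(self)


@dataclass(eq=False)
class Item:
    """A single exemplar of one and only one manifestation."""
    identifier: str
    manifestation: Manifestation

    def __post_init__(self) -> None:
        self.manifestation.items.append(self)


# Hypothetical usage: one work, two expressions, one manifestation, one item.
film = Work("Example Film")
theatrical_cut = Expression("theatrical cut", film)
directors_cut = Expression("director's cut", film)
dvd_release = Manifestation("DVD release", [theatrical_cut, directors_cut])
library_copy = Item("library copy 1", dvd_release)
```

The "one and only one" constraints are expressed as single-valued attributes (an expression holds exactly one work, an item exactly one manifestation), while the "one or more" relationships are held as lists.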

The entities in the second group, as depicted in Figure 2.2, represent those responsible for the

intellectual or artistic content, the physical production and dissemination, or the

custodianship of the entities in the first group. The entities in the second group include person

(an individual) and corporate body (an organization or group of individuals and/or

organizations). Figure 2.2 depicts the type of “responsibility” relationships that exist between

entities in the second group and the entities in the first group [19].

Figure 2.2 also indicates that a work can be created by one or more persons and/or one or

more corporate bodies. Conversely, a person or a corporate body can create one or more

works. An expression can be realized by one or more persons and/or corporate bodies; and a

person or corporate body can realize one or more expressions. A manifestation can be

produced by one or more persons or corporate bodies; a person or corporate body can

produce one or more manifestations. An item can be owned by one or more persons and/or

corporate bodies; a person or corporate body can own one or more items [19].

[Figure 2.2 diagram: Work is created by, Expression is realized by, Manifestation is produced by

and Item is owned by Person and/or Corporate Body.]

Figure 2.2: Entities and “responsibility” relationships [19]

The entities in the third group, as depicted in Figure 2.3, represent an additional set of entities

that serve as the subjects of works. The group includes concept (an abstract notion or idea),

object (a material thing), event (an action or occurrence), and place (a location). Figure

2.3 depicts the “subject” relationships between entities in the third group and the work entity

in the first group.

Figure 2.3 indicates that a work can have as its subject one or more than one concept, object,

event, and/or place. Conversely, a concept, object, event, and/or place can be the subject of

one or more than one work. The diagram also depicts the “subject” relationships between

work and the entities in the first and second groups. The diagram indicates that a work can

have as its subject one or more than one work, expression, manifestation, item, person, and/or

corporate body. The FRBR model has been described as a conceptual framework for

developing metadata systems suitable for effective indexing [19].

[Figure 2.3 diagram: Work has as subject Work, Expression, Manifestation, Item, Person,

Corporate Body, Concept, Object, Event and/or Place.]

Figure 2.3: Entities and “subject” relationships [19]

The first major implementation of the IFLA FRBR model was the AustLit project [20].

AustLit is a database that provides information on hundreds of thousands of creative and

critical Australian literature works relating to more than 100,000 Australian authors and

literary organisations. Its coverage spans 1780 to the present day [20]. The AustLit project

also used the “A Boring Core” (ABC) model [21]. ABC is a metadata model developed

within the Harmony International digital library project to provide a common conceptual

model to facilitate interoperability between metadata ontologies from different domains; we

will discuss ontologies further in Section 2.2. The AustLit project augmented the FRBR

bibliographic description model with the ABC event model, thus allowing the FRBR entity

work to have a creation event, the FRBR entity expression to have a realization event and the

manifestation entity to have an embodiment event. AustLit also went on to extend the FRBR

model to represent agents, including such things as birth and death events for agents as well

as others [20]. The AustLit project showed that implementing the models presents

significant challenges but is achievable and cost effective, offers many benefits to practitioners

and should be considered by a range of information providers [20].

The models mentioned so far have no concept of use and certainly none of context,

which is important in the motion picture industry [16]. They were designed

for a specific purpose and they perform that purpose well; they are effective information

containers but do not appear to have the higher-level semantics needed for more advanced

forms of information exploitation. The lack of context, in particular, is a significant flaw.

Nonetheless, the IFLA FRBR model has been incorporated within the Open Digital Rights

Language (ODRL) and in that setting the IFLA FRBR model has some sense of use and

context. The ODRL is a Digital Rights Expression Language (DREL) model that links rights

with artifacts and agents [22]. DRELs, sometimes shortened to RELs (Rights Expression

Languages), are machine-processable languages used for Digital Rights Management [23].

Figure 2.4 shows the ODRL model and, as can be seen, the ODRL model uses the IFLA

model to define the concept of content. In the process, the ODRL model introduces a limited

concept of use and context because of the links content has with Parties (Agents) and Rights.

However, the context is limited to rights and how they pertain to the use of the content by

parties. Even combined with ODRL, IFLA FRBR is not expressive enough to address the

majority of the issues being faced by the motion picture industry.

Figure 2.4: ODRL Model [22]

2.1.2 Metadata Schemas and Standards

There is no metadata schema or standard for the motion picture industry as a whole.

However, there have been many metadata schemas and standards for the product of the

motion picture industry, that is, the motion picture itself. More accurately it should be said

that, when the motion picture is encoded into a digital video file, metadata schemas and

standards exist to wrap metadata around the digital video object. This metadata can then be

used to distribute, discover and utilise the digital video object. Schemas and standards also

exist for the purpose of distribution, archiving and preserving the archived digital objects.

Almost all the schemas related to the motion picture industry were intended for one of these

purposes.

The metadata schemas and standards for digital video objects include the MPEG-7

multimedia content description standard [24, 25], the MPEG-21 Digital Item Declaration

Language (DIDL) [26], the MPEG-21 Rights Expression Language (REL) [27], the

Metadata Encoding and Transmission Standard schema (METS) [28] and the Metadata Encoding

and Transmission Standard Audio-Visual schema (METS AV) [29]. The MPEG-21 DIDL,

METS and METS Audio-Visual are some of the different standards that exist for object

markup, while some form of REL is used for rights markup. We will explore these standards

in more detail in the remainder of this section.

MPEG-7, formally named "Multimedia Content Description Interface", is a standard for

describing the multimedia content data that supports some degree of interpretation of the

information meaning, which can be passed onto, or accessed by, a device or a computer

program. MPEG-7 is not aimed at any one application in particular; rather, the elements that

MPEG-7 standardizes support as broad a range of applications as possible [24, 25]. That is to

say, MPEG-7 could be used to describe the content and thus provide a metadata wrapper

for previous MPEG standards such as MPEG-1 and MPEG-2 which were used for encoding

multimedia objects [24]. The objectives of MPEG-7 were to facilitate fast and efficient

searching, filtering and content identification, describe the content characteristics including

audiovisual information and most importantly provide independence between description and

the information itself [24]. In a nutshell, it provided a metadata wrapper for a given

multimedia object.

The MPEG-21 DIDL and REL form part of the MPEG-21 Framework, with the MPEG-21

DIDL expressing in XML the concept of a Digital Item and the MPEG-21 REL expressing

the rights associated with that Digital Item in eXtensible Rights Markup Language

(XrML) [27, 30]. XrML is based on XML and describes rights, fees and conditions together

with message integrity and entity authentication information [30].

METS has been developed by the Library of Congress to provide an XML document format

for encoding metadata necessary for both management of digital library objects within a

repository and exchange of such objects between repositories [28]. A METS document

consists of seven major sections [28]:

• METS Header - The METS Header contains metadata describing the METS document

itself, including such information as creator, editor, etc.

• Descriptive metadata - The descriptive metadata section can point to descriptive

metadata external to the METS document or contain internally embedded descriptive

metadata, or both. Multiple instances of both external and internal descriptive metadata

can be included in the descriptive metadata section.

• Administrative metadata - The administrative metadata section provides information

regarding how the files were created and stored, intellectual property rights, metadata

regarding the original source object from which the digital library object derives, and

information regarding the provenance of the files comprising the digital library object

(i.e., master/derivative file relationships, and migration/transformation information). As

with descriptive metadata, administrative metadata can be either external to the METS

document or encoded internally.

• File Section - The file section lists all files containing content which comprise the

electronic versions of the digital object. The <file> elements can be grouped within

<fileGrp> elements, to provide for subdividing the files by object version.

• Structural Map - The structural map is the heart of a METS document. It outlines a

hierarchical structure for the digital library object, and links the elements of that structure

to content files and metadata that pertain to each element.

• Structural Links - The Structural Links section of METS allows METS creators to

record the existence of hyperlinks between nodes in the hierarchy outlined in the

Structural Map. This is of particular value in using METS to archive Websites.

• Behavior3 - The behaviour section can be used to associate executable behaviours with

content in the METS object. Each behaviour within a behaviour section has an interface

definition element that represents an abstract definition of the set of behaviours

represented by a particular behaviour section. Each behaviour also has a mechanism

element which identifies a module of executable code that implements and runs the

behaviours defined abstractly by the interface definition.
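
To make the seven-section layout easier to picture, the following is a minimal sketch in Python that builds an empty METS document skeleton with the standard library. The element names and namespace are those of the METS schema, but the function name `build_mets_skeleton` is our own, and the result is deliberately incomplete: it omits the attributes and child elements that the schema requires, so it would not validate as is.

```python
import xml.etree.ElementTree as ET

# Namespace of the METS schema maintained by the Library of Congress.
METS_NS = "http://www.loc.gov/METS/"

# Element names of the seven major sections, in document order.
SECTION_TAGS = [
    "metsHdr",      # METS Header
    "dmdSec",       # Descriptive metadata
    "amdSec",       # Administrative metadata
    "fileSec",      # File Section
    "structMap",    # Structural Map
    "structLink",   # Structural Links
    "behaviorSec",  # Behavior (American spelling in the tag name)
]


def build_mets_skeleton() -> ET.Element:
    """Return a <mets> root holding one empty element per major section."""
    ET.register_namespace("mets", METS_NS)
    root = ET.Element(f"{{{METS_NS}}}mets")
    for tag in SECTION_TAGS:
        ET.SubElement(root, f"{{{METS_NS}}}{tag}")
    return root


skeleton = build_mets_skeleton()
# Strip the "{namespace}" prefix to recover the plain section names.
section_names = [child.tag.split("}")[1] for child in skeleton]
xml_text = ET.tostring(skeleton, encoding="unicode")
```

Printing `xml_text` shows the seven empty sections nested inside a single `mets` root element, which is the container that the descriptive, administrative and structural metadata are then placed into.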

The METS AV extension plugs into the administrative section of a METS document and

allows audio-visual objects to be described with rich technical detail [29]. In theory, the

METS AV extension can also be used with MPEG-21, thus giving MPEG-21 the

rich technical descriptiveness of METS + METS AV. In addition, both METS and MPEG-21

use Dublin Core (DC) for their descriptive markup [31]. DC is a metadata standard

developed by the Dublin Core Metadata Initiative that allows for the XML-based

descriptive markup of artifacts. In this way, METS and MPEG-21 can share functionality but

they remain separate and intended for different purposes, namely, METS for archiving and

MPEG-21 for distribution, discovery and access of digital objects.
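
As an illustration of how Dublin Core descriptive markup can sit inside a METS container, the following Python sketch builds a METS descriptive metadata section that wraps two DC elements. It is a hedged example rather than a description of any particular system: the function name, the section identifier and the title and creator values are hypothetical, while the `dmdSec`, `mdWrap` and `xmlData` elements and the DC element namespace are taken from the respective standards.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)


def descriptive_section(section_id: str, title: str, creator: str) -> ET.Element:
    """Build a METS <dmdSec> that embeds Dublin Core title and creator."""
    dmd = ET.Element(f"{{{METS_NS}}}dmdSec", {"ID": section_id})
    # MDTYPE="DC" declares that the wrapped metadata is Dublin Core.
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "DC"})
    data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    ET.SubElement(data, f"{{{DC_NS}}}title").text = title
    ET.SubElement(data, f"{{{DC_NS}}}creator").text = creator
    return dmd


# Hypothetical values, for illustration only.
dmd_section = descriptive_section("DMD1", "Example Film", "Example Studio")
dc_title = dmd_section.find(f".//{{{DC_NS}}}title")
dmd_xml = ET.tostring(dmd_section, encoding="unicode")
```

The same DC elements could equally be carried by another container, which is the sense in which the descriptive layer is shared while the containers remain separate.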

The METS administrative section can link to external rights documents, but METS itself

does not provide a vocabulary or syntax for expressing rights [23]. However, METS, like MPEG-21, can be

augmented with external schemas, including MPEG-21 REL and ODRL. In addition, METS

Rights is an extension to METS that can be used to capture minimum rights information for

artifacts [23].

The last XML schema is the Bitstream Syntax Description Language (BSDL) [32]. The

BSDL is designed to specify the document model of a multimedia bitstream, and the

resulting description can be transformed to dynamically adapt multimedia data to

the network and terminal capabilities [32]. It is completely focused on the display and

use of the product.

2.1.3 Summary

Ultimately, the problem with these metadata schemas and standards is that, while they provide

a semantic container for information, the semantic layer they offer is not rich

enough. As a result, many projects aim to combine these standards, or at least parts of the

standards, with ontologies; these projects will be discussed in Section 2.2.2.

3 Although this thesis predominantly uses British spelling, METS uses American spelling for its tag definitions.

2.2 Ontology

The concept of ontology originated in philosophy, where ontology

is the study of the kinds of things that exist [33]. In practice, however, computer

scientists use ontologies more loosely than the philosophical origins would dictate. In

computer science, ontologies are content theories about the sorts of objects, properties of

objects, and relations between objects that are possible in a specified domain of knowledge

[33]. As a philosophical construct, ontologies are not unknown within the film domain. In

1979 Cavell [34] first spoke of an ontology for film and recently Wood [4] engaged in a

discussion of film ontology in terms of film philosophy. From a philosophical point of view,

ontologies are complex constructs, and their use within computing is inevitably viewed as

an attempt at complex artificial intelligence. However, the most prevalent use of ontology within computer

science is simply to provide a semantic layer that is more sophisticated than the

semantic layer that is possible from using metadata alone. Indeed, as mentioned in Section

2.1.3, most projects opt to build upon the semantic layer provided by metadata by combining

metadata with ontologies.

However, it would be erroneous to assume that all subsets of the computer science

community view and use ontologies in a similar manner. They do not. As such, in this section,

we first look at the different views that different subsets of the computer science community have

of ontology. We then explore existing ontologies and their applications, in order to gauge

the current state of knowledge.

2.2.1 The Use of Ontology within Computer Science

The earliest users of ontology within the computer science discipline were the Artificial

Intelligence (AI) communities. They were the first to adopt the concept of ontology from

philosophy and adapt it to suit their purposes. The difference is that, while in philosophy an

ontology is about existence, in computer science it is primarily about meaning as well as

existence [35]. To be precise, an ontology can tell what kinds of things exist in the domain of

some system, how these things can be interrelated and what they mean [35]. Building upon

this, in AI, an ontology refers to an engineering artifact, constituted by specific vocabulary

used to describe a certain reality, plus a set of explicit assumptions regarding the intended

meaning of the vocabulary words [36]. This is because within the AI community ontology

has largely come to mean one of two related things: 1) an ontology is a representation

vocabulary and 2) ontologies are quintessentially content theories [33]. This view, however,

is not shared by all computer science disciplines.

While the AI community was the first to adopt the concept of ontology into computer

science, ontologies have become an important tool within the Knowledge Modelling (KM)

community. The prevalent view within the KM community is that an ontology adds a layer of

semantics that the computer can readily process. This is why ontologies are being

used for data integration across different communities [37], and sometimes even within a single

community when that community produces heterogeneous structured data. It is the

KM community that closely associates metadata and ontologies, as it views both as

methods of providing a semantically rich container for knowledge content [38]. At the heart of

the KM community’s definition is that an ontology is some formal description

of a domain of discourse, intended for sharing among different applications, and expressed in a

language that can be used for reasoning [39].
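
The phrase “expressed in a language that can be used for reasoning” can be illustrated with a toy example. The sketch below, in Python, stores a handful of subject-predicate-object statements and infers every broader concept of a given concept by following "is_a" links transitively. It is only an illustration under stated assumptions: the concept names and the `is_a` and `choreographs` relations are hypothetical and are not taken from the Loculus Ontology, and a real system would use an ontology language and a reasoner rather than hand-written traversal.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical statements about a film crew, for illustration only.
TRIPLES: Set[Triple] = {
    ("Stunt Coordinator", "is_a", "Crew Member"),
    ("Make-up Artist", "is_a", "Crew Member"),
    ("Crew Member", "is_a", "Agent"),
    ("Stunt Coordinator", "choreographs", "Fight Scene"),
}


def ancestors(concept: str, triples: Set[Triple]) -> Set[str]:
    """Return every concept reachable from `concept` via "is_a" links."""
    found: Set[str] = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            # Follow only subsumption links, and visit each concept once.
            if subject == current and predicate == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


stunt_ancestors = ancestors("Stunt Coordinator", TRIPLES)
```

Nothing in the stored statements says directly that a stunt coordinator is an agent; that fact is derived, which is the kind of machine-processable semantics that metadata alone does not provide.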

However, the AI community takes issue with the term “ontology” being applied to activities

like conceptual analysis and domain modelling, carried out by means of standard

methodologies [36]. It maintains that true ontologies are not simply domain models or data

models but that ontological analysis clarifies the structure of knowledge [33] and that a

domain ontology forms the heart of any system of knowledge representation for that domain.

Our view of ontology falls more towards the view held by the KM community

and less towards that of the AI community. This is simply because our chief interest is in

modelling the knowledge of the motion picture industry so as to aid it in the management

of its information. In this, we share the goals of the KM community, who utilise ontologies

for the management of information and knowledge by enabling a semantic layer to be added

to applications that is richer than the semantic layer provided by metadata alone but that does

not have the more cognitive capabilities of AI.

Ontologies are not without limitations. While much has been made of the potential benefits of

ontologies, with most regarding ontologies as central building blocks of the semantic web

and other semantic systems, the number of ontologies that are actually used is rather limited

[40]. In his work, Hepp identifies five reasons that hinder the development of relevant

ontologies [40]. The five reasons identified by Hepp are:

1) Ontology engineering lag versus conceptual dynamics: how easily can the ontology

evolve?

2) Resource consumption: is the cost of production of the ontology justified by the

benefits of the ontology?

3) Communication between creators and users: is the ontology understood by the domain

practitioners?

4) Incentive conflicts and network externalities: are the agents who invest in the ontology also

benefiting from the ontology?

5) Intellectual property rights: is the creation of an ontology infringing on property rights?

All of Hepp’s reasons are valid, but the reason most relevant to ontology design is

the communication between creators and users [40]. Ontologies are usually created by

computer scientists and conform to the “prettiness” of computer science theories and

principles. However, the resultant ontologies might not be easily comprehended by agents of

the domain [40]. Feeling this disconnect, the users are reluctant to use and embrace the

ontology. There is also another aspect to this problem: the “lost in translation” problem. Is

the ontology that has been created an accurate reflection of the knowledge structure of the

domain? Or is it simply a reflection of the knowledge structure of the creator of the

ontology, assuming the ontology creator has firsthand knowledge of the domain, or of the

knowledge structure as perceived by the creator of the ontology? This is a major problem,

compounded by the fact that domain experts do not have the skills to interpret an ontology as

is: presented with an ontology represented in a formal ontology language, most domain

experts see just gibberish [40].

Keeping an ontology simple and easily graspable by domain experts also addresses

Hepp’s other criticism of ontologies, that of adaptability. If the understanding of the ontology

comes naturally to the domain practitioners, they should not only be more inclined to use it

but should find it easier to adapt as their needs vary. This in turn addresses Hepp’s

concerns regarding incentivisation. Hepp’s remaining two concerns, regarding intellectual

property rights and cost-benefit analysis, are contingent upon individual

circumstances, not only in regard to the ontology but also the organisation within the domain that

seeks to use the ontology. It is beyond the scope of our research to do a cost-benefit analysis

for an ontology for the motion picture industry, but we do need to be mindful that when

constructing the ontology we do not infringe on the intellectual property rights of any

individual or organisation.

In the discussion about interpretation of ontologies, it must be noted that ontologies are

governed by axioms which are used in order to express other relationships between concepts

and to constrain their intended interpretation [36], e.g. an “is associated with” relationship

denotes a vague and nebulous relationship. These axioms define what concepts can and

cannot be included in the ontology, as well as provide the rules by which the concepts are to

be linked to each other. In addition, the axioms give guidance as to how the rules are to be

interpreted. Axioms can be considered objects as well, and certainly for large-scale

ontology modelling it might be beneficial to consider axioms as objects [41], but they are generally

separate from ontologies. Thus, when attempting to understand an ontology, reference must

be made to both the ontology that models the knowledge and the axioms that govern the

modelling. All this makes ontologies difficult to evaluate as a standalone artifact. However,

when that artifact is used within an information system, it is much easier to evaluate the

ontology through application, simply because application of the ontology is much easier for

the domain expert to understand than the ontology itself. Therefore, it would appear that

ontologies can best be evaluated through application. This is something to be mindful of.

Developing an ontology requires a methodology, and the questions to keep in mind

are: what are the boundaries of the ontology? What types of ontology are there? What is the

structure of the ontology [42]? There is also a need to distinguish three main kinds of

information (ontological, quasi-ontological and non-ontological), as well as three types of

ontologies (descriptive, formal and formalized) [42]. In addition, there has been work done in

the past to address the issues of categorisation in relation to ontology capture and methods of

handling ambiguous terms [43]. These studies all address important issues in ontology

design and development. As such, they inform the development of new ontologies.

At this point, it is important to consider the connection between information models,

metadata models and ontologies. As mentioned in Section 2.1, an information model is an

abstract but formal representation of entities including their properties, relationships and the

operations that can be performed on them [18]. Metadata models are a type of information

model that defines metadata for a given type of information. An ontology, as we have learnt, is

a formal representation of knowledge used for reasoning. An ontology is a model, but it is not

an information model as such but a knowledge model. As such, the difference between an

ontology and information models, which include metadata models, is the difference that

exists between information and knowledge. Bellinger states that information is data (which is

simply defined as symbols) that are processed to be useful and provides answers to "who",

"what", "where", and "when" questions, while knowledge is the application of data and

information and answers "how" questions [44].

In short, an ontology, which often has close ties with an information model, is used to answer

questions of “how”, while information models and metadata models describe the “what”,

“where” and “when”. By enabling computers to process the “how” question, ontologies open

the door to higher order reasoning and information exploitation that goes beyond the

simplistic “what”, “where” and “when” questions, which are all that information models and their metadata

model subset can process.

2.2.2 Existing Ontologies

Just as there is no metadata standard for the motion picture industry as a whole, there is also

no ontology for the motion picture industry as a whole. However, many of the metadata

projects that deal with the production process of the motion picture industry and the motion

picture artifact itself, have been extended to include ontologies. The MPEG-7 and MPEG-21

projects [26], which were discussed in Section 2.1.2, in particular have led to a number of

ontologies being developed for the multimedia object that is a motion picture artifact. Before

we go into the details of these projects, it must be emphasised that these projects treat

the motion picture artifact as a generic multimedia object, just as the metadata schemas they

extend do. They do not identify the motion picture object as a motion picture object, which in

itself leads to a loss of context, one of the problems we are seeking to address given

the increased importance of context [16].

The metadata standard that has been associated most with ontology development is the

MPEG-7 metadata standard. Hunter, who was involved with the development of the standard,

built an MPEG-7 ontology as part of the foundation of the semantic web [45]. Hunter

argued that, while XML Schema based MPEG-7 has been ideal for expressing the syntax,

structural, cardinality and data-typing constraints required by MPEG-7, XML is not enough

to make MPEG-7 accessible, re-usable and interoperable with other domains [45]. As such

the semantics of the MPEG-7 metadata terms also need to be expressed in an ontology using

a machine-understandable language [45]. This would facilitate the sharing of multimedia

objects over the semantic web. From the point of view of the motion picture industry,

products of the motion picture industry that are described in MPEG-7 can use this ontology

for better distribution. However, this ontology provides no mechanisms for describing the

process by which the motion picture or rather the multimedia object was created.

Hunter was also involved with the Harmony project, which brought together work done on

MPEG-7, among other standards, to define overlapping descriptive vocabularies for

annotating multimedia content [25]. The chief contribution of the Harmony project is the

ABC Data Model and Ontology [21, 46], which was the result of investigation into a general

approach towards metadata interoperability and its particular application in multimedia

digital libraries [47]. In order to facilitate interoperability, the concepts and relationships

described in the ABC model could be used to guide the development of community-specific

vocabularies [47]. Individual communities could then use formalisms such as

RDF to express the possibly complex relationships between the ABC model and their

community-specific vocabularies [47].

The ABC model could be used to describe an event with agents as ‘actors’ for an event [47]

and, in this way, the ABC model could be extended to form a metadata schema and ontology

set for the motion picture domain. However, such an extension can restrict the natural

modelling of the domain, getting us back to the problem identified by Hepp regarding the

dissonance between users and creators of ontology [40]. In addition, the ABC ontology is not

the only extensible core ontology associated with multimedia content. Figure 2.5 illustrates

the ABC top-level class hierarchy with properties.

Figure 2.5: ABC Class Hierarchy with Properties [48]

The CIDOC Conceptual Reference Model (CRM) is a high-level ontology to enable

information integration for cultural heritage data and their correlation with library and archive

information [49]. It is a property-centric ontology that is an international standard for the

controlled exchange of cultural heritage information [49]. It was built as a common

knowledge sharing platform with the specific domain of application being the cultural

institutes such as museums, archives etc. Like the ABC model, the CRM ontologies are

extendable and extension is encouraged. Indeed work has started to integrate the IFLA FRBR

Model mentioned in Section 2.1.1 and the CIDOC CRM model [50] and Hunter combined it

with MPEG-7 to describe multimedia in museums [51]. However, once again, while this

ontology can inform the development of a domain ontology for the motion picture industry,

extending it to make the domain ontology would tie the domain ontology to the CIDOC CRM

for no beneficial reason.

The chief benefit gained from extending either the ABC ontology or the CIDOC CRM to

construct an ontology for the motion picture domain is interoperability. However, this raises

the question: with whom is interoperability facilitated if either of those two ontologies is

extended? The short answer is other users of the ontology, which at this current point in time

are mostly archival institutes for the ABC ontology and museums for the CIDOC CRM

ontology. Archiving is one activity far down the timeline that, while it needs to be taken note

of, cannot dictate the development of the ontology as a whole. Therefore, it is better to

construct a free-flowing ontology, unrestricted by the constraints of a core ontology, to best

capture the nuances of the domain, especially since this does not sacrifice interoperability as

such. Ontologies can be mapped from one to another [52, 53] and therefore, should the need

arise, a transition ontology can always be written that maps a domain ontology for the motion

picture industry onto either the ABC ontology or the CIDOC CRM ontology to facilitate

communication between the motion picture industry and a given archival institute that might

be using one of those ontologies.
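
The idea of a transition ontology can be sketched in a few lines. This is a minimal illustration, not part of the Loculus implementation: it assumes that a mapping can be approximated as a lookup from domain classes to core classes, and every class and relation name below is hypothetical rather than taken from the ABC or CIDOC CRM vocabularies.

```python
# Hypothetical class names on both sides; neither column is taken from the
# actual ABC or CIDOC CRM vocabularies.
DOMAIN_TO_CORE = {
    "Director": "Agent",
    "Actor": "Agent",
    "Shoot": "Event",
    "Screenplay": "Work",
}


def translate(triples, mapping):
    """Rewrite (subject_class, relation, object_class) triples into the core
    vocabulary. Returns the translated triples and the set of unmapped classes."""
    translated, unmapped = [], set()
    for subj, rel, obj in triples:
        missing = {cls for cls in (subj, obj) if cls not in mapping}
        if missing:
            # Keep track of domain concepts the core ontology cannot express yet.
            unmapped |= missing
        else:
            translated.append((mapping[subj], rel, mapping[obj]))
    return translated, unmapped
```

Under these assumptions, the domain ontology stays free to name its own concepts, and any concept without a counterpart in the core ontology is reported instead of being forced into an ill-fitting class.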

Other research on combining ontologies with metadata standards includes

Tsinaraki's research in ontology-driven management [54], indexing [55] and user preference

models for MPEG-7/21 [56]. Arndt is another researcher who added formal semantics to

MPEG-7 [57]. Also there is OREL, which is an ontology-based rights expression language

that allows not only users but machines to handle digital rights at semantic level [58].

However, while these ontologies add an extra layer of semantics to the metadata standards,

the fact that the metadata standards were focused on generic multimedia objects means that

none of them are any substitute for a domain ontology that models both the process and the

product of the motion picture industry.

There are other ontologies that can be seen to be associated with the motion picture industry,

such as the Internet Movie Database (IMDB) Ontology that has been put forward to vastly

improve the knowledge representation in IMDB [59]. There are also models for specific types

of motion picture products, such as the model for archiving and searching historical audio-video

materials to support traditional archival activities as well as some advanced applications

like video summary, speech recognition etc [60]. There are even models for describing

scalable and interactive TV services [61]. However, a domain ontology is sorely lacking, thus

leaving open a gap in knowledge that needs to be addressed.

2.2.3 Ontologies in other domains

The domain that has seen the most work in terms of ontology is the medical domain, which

has some of the most extensive ontologies defining multiple aspects of the domain. For

example, the Unified Medical Language System (UMLS) is a repository of biomedical

vocabularies developed by the US National Library of Medicine [62]. UMLS integrates over

2 million names for some 900 000 concepts from more than 60 families of biomedical

vocabularies, as well as 12 million relations among these concepts and is one of many

biomedical resources available to researchers [63]. Other biomedical resources include the

GenBank sequence database, which incorporates publicly available DNA sequences of more than

105 000 different organisms, primarily through direct submission of sequence data from

individual laboratories and large-scale sequencing projects [64]; genome databases like the

Mouse Genome Database (MGD) [65]; and integrated resources such as RefSeq and

LocusLink [66]. In addition, SNOMED CT is a standardised healthcare terminology

including comprehensive coverage of diseases, clinical findings, therapies, procedures and

outcomes [67]. It provides the core general terminology for the electronic health record

(EHR) and contains more than 357,000 concepts with unique meanings and formal logic-

based definitions organised into hierarchies [67]. However, the abundance of resources is not

without its issues.

One common denominator for all of these resources is terminology, i.e. the names of genes,

proteins, diseases, molecular functions, etc., in biomedical texts and the corresponding entries

in the various controlled vocabularies and nomenclatures associated with these resources

[63]. However, having identified terminology as a key integrating factor for biomedical

resources does not imply that all resources have adopted standard vocabularies, which,

where they exist, would make these resources interoperable [63]. This has led to projects

such as TAMBIS, which addresses the specific issue of integrating disparate resources for

bioinformatics through a model of domain knowledge [68]. Bodenreider, however,

presented a different approach to information integration through terminology

integration: the Unified Medical Language System (UMLS) [63]. There is a lesson here for

us, in that we must be mindful of the terminology in use within ontologies for the motion

picture domain.

In other ontology-related research, Couto explored the benefit of

comparing proteins based on their biological role rather than their sequence, using

all the information in the graph structure of the Gene Ontology rather than regarding it as a

hierarchy [69]. This speaks to the reality that ontologies are graphs and

not hierarchies.

2.2.4 Ontology Implementation Languages

The last topic that needs to be covered for ontologies is how they are implemented. There are

a number of languages in which ontologies are implemented. Most of the languages are

designed for use on the web. This is because much of the ontology work that has taken place

in the KM communities has revolved around the semantic web and has built on work that was

conducted around metadata. These knowledge representation languages are often based on

XML, again for ease of use on the web and include languages like the Resource Description

Framework (RDF) [70], DARPA Agent Markup Language (DAML) – which itself is based

on RDF [71], Ontology Inference Layer (OIL) [72] and DAML + OIL [73]. RDF

specifications were originally designed as a metadata data model [70]. DAML had its origins

in attempts to construct machine-readable representations of knowledge for the Web and thus

contributed directly to the emergence of the semantic web [71]. OIL was the forerunner for

an ontology infrastructure for the web [72].

The most prevalent languages of implementation within the KM communities are the Web Ontology

Language (OWL) [74] and DAML+OIL. Indeed chronologically, both DAML and OIL were

superseded by DAML+OIL, which combined the features of both and was the stepping stone

towards the development of OWL [74].

OWL is a set of languages used for knowledge representation in the form of ontology

authoring [74]. OWL is designed for use by applications that need to process the content of

information instead of just presenting information to humans [74]. Recommended by the

World Wide Web Consortium, it comprises three semantically related languages: OWL

Lite, OWL DL and OWL Full. Each language is a syntactic extension of its simpler

predecessor with OWL Lite being the simplest of the languages and OWL Full being the

most comprehensive and supporting all features of OWL [74]. As a result, while all valid

OWL Lite documents are also valid OWL Full documents, not all valid OWL Full documents

are valid OWL Lite documents [74]. OWL is the standard for the Semantic Web.

2.2.5 Summary

The motion picture industry as a whole has never been modelled into an ontology or even

into a metadata schema. An ontology is a richer model type than a metadata schema, which is

why ontologies have been written to complement metadata schemas that deal with generic

multimedia objects. These are among the few types of ontologies that can be

affiliated with the motion picture industry. The extensible ontology models ABC and CRM

can be extended to model the domain, or rather a domain model can be written in the

relationships, vocabularies etc described by these models. However, this would tie the

domain model of the motion picture industry to the rules and dictates of the ABC or CRM

model, leaving such an ontology vulnerable to cognitive dissonance between

users and creators without gaining any real benefit. The benefit would be limited

because the ABC and CRM models are popular mainly amongst archivists and within digital libraries,

so extending them chiefly improves interoperability with other ontologies based on

the ABC or CRM model. We are chiefly interested in developing a knowledge-model

type of ontology for the motion picture industry, and such a model would benefit from being

independent of generic extensible ontologies, as this would allow the agents of the domain more

freedom of expression. Moreover, once an ontology is in place it can be mapped onto

other ontologies. The most important thing is to have an ontology that can be exploited

for different purposes. The prevalent language of ontology implementation in web semantics

is OWL.

2.3 Semantic relatedness

In Chapter 1, we discussed at length the variation of perspective of agents within the motion

picture industry due to the time they are involved with a given motion picture and their role

within the motion picture. As illustrated in Section 1.4, the perspective of agents is strongly

related to the semantic relatedness of concepts in a given ontology. This can potentially have

implications on information needs. Therefore, as part of our literature study we undertook an

investigation of semantic relatedness and how such a thing can be measured.

2.3.1 What is semantic relatedness

The concept of semantic relatedness has its origins in the cognitive theory of similarity and

therefore, to understand semantic relatedness we must first look at similarity. Similarity plays

a fundamental role in theories of knowledge and behaviour [75]. It serves as an organizing

principle by which individuals classify objects, form concepts, and make generalizations [75].

However, how individuals classify objects, form concepts and make generalizations is

contingent upon their developmental experiences, cultural backgrounds etc [17]. In short,

similarity is context dependent [75]. This idea of context-dependent similarity is further

reinforced by Goodman, who states that similarity is meaningless without a frame of

reference [76]. Therefore, Medin proposes that for similarity to be a useful construct, one

must be able to specify the ways or respects in which two things are similar [77]. As human

beings we are fairly adept at stating how a man and a woman are similar and how they are

different. We can also explain that a dog and a table are only similar in that they have four

legs. The more we know about a subject, the more we are able to articulate the similarities

between concepts within that subject. Medin argues that the similarity comparison process

itself, being internal to an individual, can serve to provide the frame of reference or the

context under which two concepts are similar [77]. It is in this setting that an ontology and

the notion of similarity intersect.

As explained in Section 2.2.1 an ontology is a model of things that exist and the connections

that exist between them. A domain ontology is a model of things that exist in that domain and

the connections that exist between concepts in that domain. Such a model, or at least the

portion of the model that is most relevant to them, exists within agents of that domain and as

such forms the point of reference for those individuals [17]. Put another way, we all have

our own internal ontology, a subset of which is an extract from the domain ontology focused

around our understanding of the domain. Therefore, from the human cognition perspective,

the concept of similarity within ontologies arises in the form of semantic relatedness,

with the added benefit that semantic relatedness can be measured [78].

However, measures of semantic relatedness have predominantly been developed by the

discipline of computational linguistics in the context of lexical databases, usually WordNet

[79]. WordNet is a lexical database for the English language, which groups English words

into sets of synonyms called synsets, provides short, general definitions, and records the

various semantic relations between these synsets [79].

From the linguistics perspective, Blanchard defines semantic relatedness as evaluating the

closeness between two concepts from the whole set of their semantic links [80]. In short, in

linguistics, semantic similarity is not the same as semantic relatedness but they are linked.

Blanchard also states that all pairs of concepts with a high semantic similarity value have a

high semantic relatedness value whereas the inverse is not necessarily true [80]. Blanchard

defines semantic distance distinctly from semantic relatedness and states that semantic

distance evaluates the disaffection between two concepts: it is an inverse notion to the

semantic relatedness [80]. Within linguistics, generally speaking a “semantic similarity”

between two objects is related to their commonalities and sometimes their differences [81].

However, those involved with human cognition can use semantic relatedness and semantic

similarity interchangeably because, within the human cognitive process, two objects can be

semantically related that are not in fact linguistically related. An example of this is red and

blue – which are cognitively related because they are both types of colours but are not

necessarily linguistically related. A more subtle example of cognitively related concepts

occurs when undertaking same-difference categorisation [82]. Hampton spoke of relatedness

when investigating the effects of Good-Bad categorization tasks involving concept pairs that

were linguistically different but similar in other ways [82], e.g. plant-animal, natural-

manmade etc. This is the type of semantic relatedness we are interested in and therefore,

when we use the word semantic relatedness we are using it in the manner of the psychologists

(human cognition) and not the linguists.

That being said, WordNet can be said to model the English language; that is,

WordNet is a type of domain model for the English language. As such, the work that has

been done in this area is applicable to our research. Indeed, the need to determine the degree

of semantic relatedness between two concepts is a problem that pervades much of

computational linguistics [83] and many of the existing measures for the determination of

semantic similarity/relatedness have their origin in linguistics [84]. Though semantic

relatedness is defined differently in linguistics to the type of relatedness we are interested in,

the measures proposed in computational linguistics are still of interest to us when

WordNet is viewed as more of a model than a lexical database. As such, these measures are

discussed in Section 2.3.2.

2.3.2 Existing measures of semantic relatedness

Semantic relatedness has been employed in the areas of computational linguistics and

artificial intelligence for a variety of purposes, including, but not limited to, word sense

disambiguation [85], detection and correction of the word spelling errors, text segmentation,

image retrieval, multimodal documents retrieval and automatic hypertext linkage [78]. In

addition, Maedche showed that relatedness can be measured across multiple ontologies by

considering ontologies as a two-layered system consisting of a lexical and conceptual layer

[86], although we are only interested in relatedness within a single ontology.

From their survey of semantic relatedness measures, Blanchard et al. found that most

semantic relatedness measures are based on a given ontology, with some also requiring a

corpus of text [80]. Blanchard et al. also found that semantic relatedness measures are

based on axioms [80]. For example, one axiom that forms the basis of many semantic

relatedness measures is that the shorter the path between two concepts, the more related they

are.

As mentioned before, linguistics is one area where semantic relatedness has been used

extensively. Linguistics and the WordNet lexical taxonomy are the basis for almost all the

existing methods of calculating semantic relatedness. The Hirst-St-Onge measure of semantic

relatedness holds that two lexicalized concepts are semantically close if their synsets within

WordNet are connected by a path that is relatively short and does not change directions often

[87]. The shortest path between two synsets is also relied upon by Leacock-Chodorow [88],

while Resnik [89] brings together ontology and corpus to judge the similarity of two concepts by the extent to

which they share information. In addition, the method proposed by Jiang-Conrath [90] also

uses the notion of information content but in the form of the conditional probability. Lin [91]

proposed a semantic relatedness measure that is based on the same elements as those of

Jiang-Conrath [90] but arranged differently to give another probability function. These five

measures have been evaluated in the WordNet context by Budanitsky for application in

linguistics [83]; that evaluation found that the probability-based measure of Jiang-Conrath [90] is

the best when applied to linguistics in a WordNet context [83].
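
The information-content measures mentioned above can be illustrated with a small sketch. It uses the standard textbook formulations: information content IC(c) = -log p(c), the Resnik similarity as the IC of the lowest common subsumer, the Jiang-Conrath distance as IC(a) + IC(b) - 2 IC(lcs), and the Lin similarity as 2 IC(lcs) / (IC(a) + IC(b)). The taxonomy and the corpus counts are hypothetical toy values, not WordNet data, and the function names are our own.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "table": "artifact",
}
# Assumed corpus frequencies for each concept's own occurrences.
COUNTS = {"entity": 0, "animal": 10, "artifact": 10, "dog": 40, "cat": 30, "table": 10}


def ancestors(concept):
    """Return the concept followed by its ancestors up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def probability(concept):
    """p(c): share of corpus occurrences falling under c or its descendants."""
    total = sum(COUNTS.values())
    under = sum(n for c, n in COUNTS.items() if concept in ancestors(c))
    return under / total


def ic(concept):
    """Information content, IC(c) = -log p(c)."""
    return -math.log(probability(concept))


def lcs(a, b):
    """Lowest common subsumer: the first ancestor of a that also subsumes b."""
    b_chain = set(ancestors(b))
    return next(c for c in ancestors(a) if c in b_chain)


def resnik(a, b):
    """Similarity as the information shared by a and b."""
    return ic(lcs(a, b))


def jiang_conrath_distance(a, b):
    """Distance: information in a and b that is not shared."""
    return ic(a) + ic(b) - 2 * ic(lcs(a, b))


def lin(a, b):
    """Similarity as shared information relative to total information."""
    denom = ic(a) + ic(b)
    return 2 * ic(lcs(a, b)) / denom if denom else 1.0
```

With these toy counts, dog and cat share the subsumer animal, whereas dog and table share only the root, which carries no information. All three measures therefore rank dog and cat as the closer pair.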

However, while this is useful for linguistics, there is no indication that all these WordNet-based

measures can be easily applied to non-linguistic relatedness calculations, where the relatedness is

based on utility rather than semantic meaning in a linguistic sense. For example, a pencil and an eraser

are related because they are used together but are not linguistically related. Our position is that,

while the measures perhaps should not be used as they are, the principles behind them

probably can be.

To this end, even though lexilogical4 in origin, the path-based measures proposed

independently by Hirst [87] and Leacock [88] can inform the development of new measures

for semantic relatedness. Certainly the edge counting method proposed by Rada [92] is

adaptable to any graph or taxonomy. Rada proposed a metric, termed simply distance, to

measure the relatedness of nodes in a semantic net [92]. Distance is defined as the average

minimum path length over all pair-wise combinations of nodes between two subsets of nodes

[92]. Distance has been successfully used to assess the conceptual distance between sets of

concepts when used on a semantic net of hierarchical, i.e. taxonomical, relations [92]. Rada

found that the judgements of the distance metric correlated significantly with the distance

judgements that humans make [92]. Therefore, the conceptual distance between nodes in a

taxonomical model is certainly a viable measure of semantic relatedness.
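
The distance metric described above can be sketched as follows. This is a minimal illustration assuming that is-a links are treated as undirected edges of unit length; the toy taxonomy and the function names are hypothetical and are not drawn from [92].

```python
from collections import deque
from itertools import product

# Hypothetical is-a links, treated as undirected edges for path counting.
EDGES = [
    ("entity", "animal"), ("entity", "plant"),
    ("animal", "dog"), ("animal", "cat"),
    ("plant", "tree"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the number of edges on a shortest path."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    raise ValueError(f"no path between {start!r} and {goal!r}")


def set_distance(graph, set_a, set_b):
    """Average minimum path length over all pairs drawn from the two sets."""
    pairs = list(product(set_a, set_b))
    return sum(shortest_path_length(graph, a, b) for a, b in pairs) / len(pairs)
```

Because the computation relies only on nodes and edges, the same functions apply to any graph or taxonomy, which is the property that makes edge counting attractive outside a lexical setting.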

However, the method proposed by Rada is simple edge counting and that leads to erroneous

similarity measures due to the problem of abstraction, as identified by Li [78]. Li also

worked with a lexilogical taxonomy and observed that certain concepts are more abstract than

others in a hierarchical semantic knowledge base and if this abstraction is not taken into

account, concepts that are not closely related appear closely related when a measure of their

relatedness is taken [78].

For example, Li uses the hierarchical semantic knowledge base shown in Figure 2.6 to illustrate

that due to the structure of the hierarchy, many concepts can appear to be closely associated

when only simple edge counting is employed. Referring to the diagram, the taxonomy has

animals and persons at the same hierarchical level, as it does the concepts: adult, male,

female and juvenile. However, the concepts of adult, male, female and juvenile are highly

abstract and share very few properties beyond their parent. Simple edge counting has no

method of reflecting this abstraction.

4 Lexilogical is the adjective form of Lexilogy, which is the branch of linguistics that deals with the lexical component of language.

Figure 2.6: Hierarchical semantic knowledge base [78]

Li’s proposed solution involves modifying the direct path length method by utilising more

information [78]. Li uses length, depth and local semantic density, where length is the

distance between two nodes (edge counting), depth is the relative position of the two nodes in

the hierarchy and semantic density measures the number of connections a node has [78]. By

taking into account more variables, Li obtains a more precise picture of the abstractness of

concepts. However, this method is not necessarily transferrable to an ontology which might

have more of a graph structure, with layers of varying abstraction. Because

determining the degree of abstraction is a non-trivial task, it remains an open problem in the

measurement of semantic relatedness.
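
The effect of adding depth to simple edge counting can be sketched as follows. This is an illustration under stated assumptions rather than a reproduction of Li's method [78]: it combines an exponential decay in path length with a tanh term in the depth of the common subsumer, omits local semantic density entirely, and uses a hypothetical taxonomy and assumed parameter values.

```python
import math

# Hypothetical taxonomy (child -> parent); only loosely modelled on the
# concepts named in the text, not a copy of the figure in [78].
PARENT = {
    "person": "entity", "animal": "entity",
    "adult": "person", "juvenile": "person",
    "boy": "juvenile", "girl": "juvenile",
}


def chain(concept):
    """Return the concept followed by its ancestors up to the root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Number of edges between the concept and the root."""
    return len(chain(concept)) - 1


def subsumer(a, b):
    """Lowest common subsumer of a and b."""
    b_chain = set(chain(b))
    return next(c for c in chain(a) if c in b_chain)


def path_length(a, b):
    """Simple edge count between a and b through their common subsumer."""
    return depth(a) + depth(b) - 2 * depth(subsumer(a, b))


def depth_weighted_similarity(a, b, alpha=0.2, beta=0.6):
    """exp(-alpha * length) * tanh(beta * depth of the common subsumer).

    alpha and beta are assumed tuning values, not results from this thesis.
    """
    length = path_length(a, b)
    h = depth(subsumer(a, b))
    return math.exp(-alpha * length) * math.tanh(beta * h)
```

In this toy hierarchy the pairs person and animal, adult and juvenile, and boy and girl are all two edges apart, so simple edge counting cannot separate them. The depth term ranks the pair with the most specific common subsumer highest and assigns zero to the pair that meets only at the root.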

2.3.3 Summary

Semantic relatedness measures have been developed and used in a variety of ways. While the

current measures are all for linguistic purposes, they have underlying principles that can be

used to develop a relatedness metric for the measurement of perspective and the general

relatedness of concepts in a non-linguistic setting. However, abstraction can lead to errors in

the determination of relatedness and is an issue that must be overcome.

2.4 Information Extraction

The last part of our literature review concentrates on the idea of information extraction itself,

which is the driving motivation behind our research. Information extraction is a form of

information exploitation that goes beyond simple information retrieval. Marchionini and

White have commented “Retrieval is sufficient when the need is well-defined in the

searcher’s mind; however, when searchers are seeking information for learning, decision

making, and other complex mental activities that take place over time, retrieval is necessary

but not sufficient.” [1]. The key phrase in that comment is “well-defined”. Studies have

shown that while a search engine typically treats each search interaction between itself and

the user as an independent transaction, this is not so from the perspective of the user [93]. From

the perspective of the user, it is a dialogue where the user often starts with ill-defined

search criteria in their mind but continuously refines the criteria based on the results they

obtain from the information retrieval tool, such as a search engine [93, 94]. Spink has

conducted an extensive study on the pattern of user queries in the setting of search engines to

arrive at this conclusion [94], which is supported by Jansen’s study of query

modification patterns during web searching [95].

Another aspect of Spink’s work has been in the concept of relevance [96]. Most importantly

for our purpose, Spink’s work shows that the user’s ability to rank the relevance and irrelevance

of information is based on the user’s familiarity with the subject matter of the information [96].

What is unknown in this field is whether domain ontologies in conjunction with a semantic

relatedness measure can be used to determine a given user’s ability to rank relevance. Also

unknown is whether it is possible to determine the actual information the user is looking to

extract by predicting the use for which they wish to employ the information. However, the

literature does suggest that such uses might be possible.

For example, Tao recently proposed a novel contribution to the field of information retrieval

in the form of an ontology-based knowledge retrieval framework which requires a world

knowledge base describing and specifying the background knowledge possessed by humans

[97]. Tao observes that such a knowledge base does not exist [97]. However, within the

confines of a domain, the domain ontology would act as the world knowledge base, being a

knowledge model of the world of that domain. Note that Tao talks about knowledge retrieval,

not information extraction. The difference between knowledge and information is not simply

a semantic one. Knowledge implies understanding and in order to retrieve knowledge, there

must be some understanding of the user’s personal ontology and the context within which the

user is operating.

Other researchers in information extraction are approaching the problem from widely

different directions. Some are adopting utility-based models from ecology and psychology to

predict human behaviour for specific information-seeking conditions [98]. Others are

exploring the social and collaborative search experience [99, 100]. However, for us, given

the motion picture industry’s strong focus on agents and their roles, the knowledge-based

approach suggested by Tao seems a more promising direction to pursue.

2.5 Summary

From the review of the literature we have identified a number of gaps in existing work.

Firstly, there is no domain ontology that models the entirety of the motion picture industry.

All current ontologies focus on the product, the motion picture itself, and treat it as a generic

multimedia object and the ontologies are chiefly built for archival purposes. Secondly, while

a number of semantic relatedness metrics exist, all of them have been developed for linguistic

purposes and the ones that can be adapted for use in a general setting suffer from problems

with abstraction. That is, they do not adequately account for abstraction within hierarchical

semantic knowledge bases and thus lead to erroneous relatedness results. Lastly, there exists

an opportunity to explore the use of an ontology in combination with semantic relatedness to

better support user perspective sensitivity in information exploitation.

3 Research Plan

“It's the question that drives us, Neo. It's the question that brought you here. You know the

question, just as I did.” - Trinity, The Matrix

3.1 Research Questions

There are three interconnected questions that drive the research presented in this thesis. The

theme that links them is the theme of agents and how they view the domain of discourse.

These main questions incorporate a series of sub-questions that break down the complex

nature of the over-arching question. These questions are given below and informed

the research methodology discussed in Section 3.2.

1. How to model the domain of discourse to facilitate the different perspectives of agents?

1.1. How to find the concepts of the domain?

1.2. How to model the relationships between the concepts?

1.3. How to model the relationship of the agents of the system to the concepts?

1.4. What other aspects of the domain need to be captured in the model?

2. Can agent’s perspective be exploited to better serve the needs of the Motion Picture

Industry?

2.1. How can agent perspective be measured?

2.2. How can agent perspective be exploited to better meet the information needs of the

3. What kind of a system could exploit agent perspective to better serve the needs of the

3.1. How would the various models be used?

3.2. How would the agent perspective be used?

3.2 Research Methodology

The research methodology is derived from the research question and is presented in Figure

3.1.

Figure 3.1: The Research Methodology Steps (Data Gathering: AFTRS interviews, literature search, relatedness survey; Data Analysis: axiom analysis, information model development, test case formulation, system design; Modeling: ontology, metadata schema, relatedness metric; Implementation: Loculus System; Evaluation: relatedness metric, Loculus System evaluation)

From the research methodology steps the research workflow that is shown in Figure 3.2 was

derived. Most of the tasks of the workflow are qualitative in nature.

The initial tasks were gathering data about the concepts of the motion picture industry

through a review of industry literature and interviews with industry practitioners. The

gathered data was then analysed to derive relationship and modelling

information. The modelling phase, in turn, consisted of two parallel activities. One activity

involved the creation of a metadata schema for later use within the Loculus System. The

other activity consisted of three tasks, the first of which was the formulation of axioms which

formed the foundation of the development of the ontology (the second task) and the relatedness

metric (the third task).

The Loculus System was then built around the ontology and the relatedness metric, with the

metadata schema supporting the functionality of the system. The system, in short, is used to

demonstrate the uses to which the ontology and the relatedness metric can be put.

The evaluation phase followed the implementation phase and involved two parallel activities.

One activity evaluated the relatedness metric, and through that evaluation, also the ontology.

This was done through a web survey and the gathered data was then analysed for the

evaluation. The other activity evaluated the algorithms of the system through unit testing.

Figure 3.2: The Research Workflow (Data Gathering Phase; Data Analysis Phase; Modeling Phase: formulation of axioms, ontology development, metadata schema development, relatedness metric; Implementation Phase: Loculus System; Evaluation Phase: relatedness metric evaluation through relatedness survey data gathering and analysis of results, Loculus System evaluation through unit testing)

4 Loculus Ontology

“You're late, do you have no concept of time?” - Dr. Emmett Brown, Back to the Future

Our first research question, as presented in Section 3.1, is: how to model the domain of

discourse to facilitate the different perspectives of agents? One of the answers to this question

is that an ontology can model the domain of discourse. If the ontology is constructed

correctly, then it can facilitate the different perspectives of agents. In this chapter we present

the Loculus Ontology as our answer to the first research question.

What is an ontology? To summarise the discussion from the literature review Section 2.2.1,

in computer science, ontologies are content theories about the sorts of objects, properties of

objects, and relations between objects that are possible in a specified domain of knowledge

[33]. In addition, ontologies are governed by axioms, which are used to express other

relationships between concepts and to constrain their intended interpretation [36]. In

philosophy, ontology is defined as the study of the kinds of things that exist [33].

As elaborated in Section 2.2.1, as a philosophical construct, ontologies are not unknown

within the motion picture domain [4, 34]. However, as a computer science construct, there is

no domain ontology for the motion picture industry. It is for this reason we undertook the

development of the Loculus Ontology for the motion picture industry, which is an ontology

that covers both the product, the motion picture, and the process by which the product is

created. This focus on both the product and process differentiates the Loculus ontology from

existing ontologies. In addition, the Loculus ontology incorporates time and people; two

integral aspects of the motion picture industry that provide the context for many of the

industry specific concepts within the domain.

In this chapter, we first present the conceptual foundation of the Loculus ontology before

presenting and discussing the axioms that govern the ontology. We explore the three axes that

naturally result from the structure of the ontology before discussing the implementation of the

ontology and presenting some extracts from the ontology, which was implemented in OWL,

using Altova SemanticWorks.

It is not feasible to present the entire Loculus ontology within this thesis. However, it may be

downloaded from the World Wide Web by following the instructions in Appendix A.

4.1 Conceptual Foundation of the Ontology

In essence, the foundation of the ontology is two-layered. The first layer comprises the

methods by which the concepts of the ontology are revealed and how the relationships

between them are discovered. The second layer is the axioms that govern the ontology.

In terms of the first layer, the motion picture industry concepts are terms that exist in the

natural language of the industry. These are located within the literature of and about the

industry. The relationships between the concepts originate in the literature [101] and were

refined through consultation with industry professionals from AFTRS.

As we are interested in the discourse of the motion picture industry, we sought out the source

that had the widest range of contributors to a folksonomy as a starting point for gathering

concepts of the industry. As such, we decided that Wikipedia would be a good starting point

to get a quick footing into the discourse. Initially, we started with the term “screenplay”, a

term restricted to the motion picture industry, and looked up that term in Wikipedia. That

term led us to the Wikipedia categories film and video terminology, film making, film

production and film techniques. We then used these categories to compile a list of

terminology for the motion picture industry. We cannot claim this process provided complete

coverage of all terms in the motion picture industry; however, it created a substantial basis to

begin the modelling process. We did not model the terminology concepts based on Wikipedia

entries. For the actual modelling we referred to industry practitioners and industry literature.

While we obtained the list of terminology from Wikipedia, for a list of agents we turned to

IMDB, which includes a comprehensive list of credits (the often tedious enumeration of the

people involved in the development of the motion picture). We picked the movie “Stardust”,

which at the time was a recently released feature film from Hollywood; we then referred to

the full cast and crew list for that film to get a list of agents, where agents is a term that

collectively denotes the people involved in the industry. This list of agents was cross-checked

with the Australian movie “Lantana”. Once again, the actual modelling of the agents and

linking them with the terminology concepts was done through consultation with industry

practitioners and from investigation of industry literature, such as the book “On Film-

Making: An Introduction to the craft of the Director” by Alexander Mackendrick, a

screenwriter, storyboard editor and director, as well as the head of the California Institute of

Arts [101].

While investigating the concepts of the industry and relationships that link the concepts, the

importance of two contexts became very clear. It seems that the majority of the concepts in

the industry exist within the context of when the concepts are used and by whom they are

used. In short, all industry specific concepts have a temporal context and an agent context.

The two contexts are related and have a bearing on each other, which reinforces their importance. Both these

contexts were taken into account when developing the axioms that are the cornerstone of the

ontology. The axioms are discussed in detail in Section 4.2.

4.2 Axioms

Ontologies are governed by axioms which are used in order to express other relationships

between concepts and to constrain their intended interpretation [36]. These axioms define

what concepts can and cannot be included in the ontology, as well as provide the rules by

which the concepts are to be linked to each other. In addition, the axioms give guidance as to

how the rules are to be interpreted.

The Loculus ontology is governed by three types of axioms:

• General axioms, which in turn are sub-divided into inclusion axioms and temporal axioms;

• Concept axioms, which in turn are sub-divided into inheritance axioms, linkage axioms and

terminology axioms;

• Meta-link axioms, which do not have any sub-divisions but link directly with the

linkage axioms, as the Meta-link axioms provide names for the links governed by the

linkage axioms.

In this section we present these axioms in detail.

4.2.1 General Axioms

The General Axioms govern what can and cannot be included in the ontology (inclusion

axiom) as well as provide guidelines for capturing the temporal aspects of the industry

(temporal axiom).

4.2.1.1 Inclusion Axioms

To the extent that it is possible, we aim to capture in this ontology the natural discourse of the

motion picture industry. Therefore, the central tenet of the inclusion axiom is that if a concept

exists in the natural language discourse of the motion picture industry then it should be

considered for inclusion in the Loculus ontology. The first two inclusion axioms codify this

tenet.

[Inclusion 1] When expressed in natural language, concepts are considered to be part of

the discourse of the industry.

The reasoning behind this is that the concepts present within the motion picture industry are

part of the natural language used by the practitioners of the industry. While not made explicit

by the axiom, we are trying to avoid artificial constructs that are often employed in data and

knowledge modelling that are essentially convenient placeholders for a group of objects that

share some sort of a characteristic, e.g. a concept called “Things that have names”. The

reason for this is simply that as a domain ontology it is better to avoid such artificial

modelling constructs to keep the ontology as closely aligned with the domain as possible.

What must be noted here is that we cannot capture concepts that are not articulated. If the

industry has no term for a tacit understanding, then the concept is so implicit that it cannot be

captured. This is one of those situations where an example cannot be given because being

able to give an example means that the concept can be articulated and as such is part of the

discourse, therefore should be captured in the ontology.

It must be noted here that by adhering to the natural language of the industry and not using

artificial constructs that might make the model more aesthetically pleasing from a software

engineering perspective but that have no meaning within the industry, we partially mitigate

the problem identified by Hepp regarding the loss of meaning that is often experienced by

creators and intended users of ontologies [40].

[Inclusion 2] If a concept is used in the motion picture industry and is part of the

established discourse of the industry, then it should appear in at least one of the ontologies.

As the ontology is supposed to be a complete ontology, it must cover all concepts that form

part of the industry’s established discourse; however it should not cover things that are not

part of the established discourse of the industry. For example, concepts such as surgery,

triathlon and Field-programmable gate array should not be part of the ontology.

[Inclusion 3] Concepts that are limited to or that have a specific meaning within the

motion picture industry are considered to be motion picture industry (MPI) concepts.

This covers all terms that are limited to the motion picture industry and hold little or no meaning

outside the context of the industry. For example Boom (the big microphones used to record

the actors) is a term limited to the motion picture industry and has little or no meaning

outside the context of the industry.

However, on occasion this would also apply to common terms that have special meaning

within the industry, e.g. Treatment - which within the context of the motion picture industry

means a document produced as a pitch document for a new motion picture project.

[Inclusion 4] Concepts that are not limited to or do not have a specific meaning within the

motion picture industry are considered to be “common” concepts.

Common concepts will be modelled in the Loculus ontology only to the extent where it is

necessary to model the concepts and process of the industry and/or where the concepts appear

so frequently that it would be remiss to not model them. That is, the criteria for including a

common concept are questions like, “is this part of a form?”; “is this used during the business

process of the motion picture industry?” etc.

For example, the ontology should cover the concept of lunch as it forms part of the business

process of the industry, e.g. allocation of time for lunch during shooting, catering for lunch

during shoots etc. On the other hand the concept of surgery is not a daily part of the business

of the industry.

[Inclusion 5] Concepts that deal with roles as played by natural or legal entities are defined

to be agent concepts.

For example, Actor, Editor and Production Studio are all agents that are involved in the

process of the motion picture industry and therefore should be part of the Loculus ontology.

On the other hand, submariner, triathlete and aerospace engineer should not be part of the

Loculus ontology as they are not agents who are involved in the daily process of the motion

picture industry.
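The five inclusion axioms can be read together as a decision procedure. The following Python sketch is illustrative only and is not part of the Loculus implementation (which was done in OWL). The Candidate record, its field names and the classify function are assumptions introduced here; the judgements stored in the fields would in practice come from industry literature and practitioners.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record of what a modeller has established about a term."""
    name: str
    in_discourse: bool                # Inclusion 1 and 2: articulated in the established discourse
    industry_specific: bool           # Inclusion 3: limited to, or specially meant within, the industry
    is_role: bool = False             # Inclusion 5: a role played by a natural or legal entity
    needed_for_process: bool = False  # Inclusion 4: e.g. part of a form or a business process


def classify(c: Candidate) -> Optional[Set[str]]:
    """Return the labels under which a term is included, or None to exclude it."""
    if not c.in_discourse:
        return None  # not part of the industry's established discourse
    if not c.industry_specific and not c.needed_for_process:
        return None  # a common concept that the industry's processes do not need
    labels = {"mpi" if c.industry_specific else "common"}
    if c.is_role:
        labels.add("agent")
    return labels


# Examples taken from the text above.
boom = classify(Candidate("Boom", in_discourse=True, industry_specific=True))
lunch = classify(Candidate("Lunch", in_discourse=True, industry_specific=False,
                           needed_for_process=True))
surgery = classify(Candidate("Surgery", in_discourse=False, industry_specific=False))
actor = classify(Candidate("Actor", in_discourse=True, industry_specific=True, is_role=True))
```

In this sketch the agent label is independent of the MPI/common distinction, since the axioms allow both MPI agents and common concept agents.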

4.2.1.2 Temporal Context

Before we present the Temporal Axioms, we must discuss the temporal context which gives

rise to the temporal axioms. As discussed in Section 1.1.1, the concept of time is very

important to the motion picture industry as all processes of the industry exist in the context

of the production cycle and the product itself develops in life stages over time. The timelines

do not have to be continuous and certainly the early stages of both timelines can be

suspended and resumed.

For example, as mentioned in Section 1.1.2, The Curious Case of Benjamin Button was in

pre-production since at least 1994 when film industry executives were first approached with

the possibility of filming an adaptation of the F. Scott Fitzgerald short story of the same

name, but production did not start until sometime in 2007 and the film was finally released in

2008 [5]. That is not to say that it was continuously being worked on. Indeed, work was

suspended for long stretches of time as those spearheading the project engaged in other

projects.

This is fairly common within the motion picture industry, which can often have a long

production process with bursts of activity marking critical development points for the project;

alternatively, everything can be over and done within a short sharp burst of activity. However, the very

fact that all motion picture industry specific concepts exist within the context of a production

process timeline and a product development timeline means that the concept of time is very

important in modelling the concepts of the motion picture industry.

The two timelines, shown in Figure 4.1, are linked but do not overlap. This is because they

represent different things. One represents the process and the other the development of the

product. The timelines are presented in detail below.

[Figure 4.1 (diagram). Recoverable labels: Production Cycle; Life Stage; reuse/repurpose.]

Figure 4.1: The two timelines of the industry (reusing Figure 1.1)

The production cycle is the timeline for the process. It defines the stages in which the

different processes undertaken to make the motion picture take place. The production

cycle is broken into three phases: pre-production, production and post-production. It is hard

to define when pre-production starts, as pre-production usually involves imprecise tasks such

as getting the basic concepts of the film to such a state so that it is given the go-ahead. The

production phase starts on the first day of shooting and ends on the last day of shooting. As

soon as production ends, post-production starts in earnest, although some post-production

activities, e.g. special effects for scenes already filmed, might have begun while the bulk of

the motion picture was still in the production phase. Post-production encompasses everything

after production. This is because there is always something to do, whether it be

to produce the final cut, to market the final cut or to digitally re-master the motion picture for

a new generation or simply to preserve it. As such, there is merit in saying a completed

motion picture is always in post-production, since there really is no event that can be marked

as “the end”.

The life stages timeline charts the progress of the product: the artifact of a motion picture

itself, meaning not necessarily a single instance of the artifact but the concept of the artifact. These

life stages are conception, production and utilization. Utilization in turn comprises

distribution, discovery, access, reuse/repurpose and preservation. The life stages do not map

exactly to the production cycle, nor do all motion pictures reach all life stages. A motion

picture is in the conception stage when it is conceived and is being fleshed out. The latter part

of conception would correlate with pre-production. A motion picture is in the production life

stage when it is being produced; so the latter parts of pre-production, all of production and

the post-production activities that end with the creation of the final cut would correlate with

this stage. Utilization spans the remainder of post-production. However, while there is an

order in which the life stages must be reached, the life stages are not a good measure of

sequence or duration, as a motion picture can exist in multiple life stages at once and return

to a previous life stage under certain circumstances. Certainly, discovery and access, two of

the sub-stages of utilization, happen multiple times.

The important question to address at this point is why the two timelines do not overlap

perfectly. The answer lies in the fact that the birth of the product, i.e. the conception stage,

does not necessarily have to involve any formal processes recognised by the industry. It could

merely be a discussion among agents of the industry over a quiet drink that fleshes out the

details of the project, or a conversation held on the set of one project with people who might

want to work on the new project. Until some formal process starts, the production cycle

cannot start. However, that is not to say the product is not being developed, even if it is in just

the heads of the agents involved. This goes towards the multifaceted nature of the motion

picture product.
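The correlation between the two timelines described above can be written down compactly. The following Python sketch is illustrative only; the enum names, the CORRELATES_WITH table and the life_stages_during function are assumptions introduced here. None stands for the period before any formal process has started, when the production cycle has not yet begun.

```python
from enum import Enum
from typing import Optional, Set


class ProductionPhase(Enum):
    PRE_PRODUCTION = "pre-production"
    PRODUCTION = "production"
    POST_PRODUCTION = "post-production"


class LifeStage(Enum):
    CONCEPTION = "conception"
    PRODUCTION = "production"
    UTILIZATION = "utilization"


# Sub-stages of utilization as listed in the text.
UTILIZATION_SUB_STAGES = ("distribution", "discovery", "access",
                          "reuse/repurpose", "preservation")

# Which production cycle phases each life stage can coincide with.
CORRELATES_WITH = {
    # Conception may precede any formal process; its latter part is pre-production.
    LifeStage.CONCEPTION: {None, ProductionPhase.PRE_PRODUCTION},
    # Latter pre-production, all of production, post-production up to the final cut.
    LifeStage.PRODUCTION: {ProductionPhase.PRE_PRODUCTION,
                           ProductionPhase.PRODUCTION,
                           ProductionPhase.POST_PRODUCTION},
    # Utilization spans the remainder of post-production.
    LifeStage.UTILIZATION: {ProductionPhase.POST_PRODUCTION},
}


def life_stages_during(phase: Optional[ProductionPhase]) -> Set[LifeStage]:
    """Life stages a motion picture may be in during the given phase."""
    return {stage for stage, phases in CORRELATES_WITH.items() if phase in phases}
```

The table makes the lack of a one-to-one mapping visible: post-production, for example, can coincide with both the production and the utilization life stages.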

4.2.1.3 Temporal Axioms

In light of the temporal contexts detailed in the previous section, the temporal axioms detail

how the concepts, mainly the agent and MPI specific concepts, exist in the context of the two

timelines of the industry. While the axioms only reference the concept of temporal phases

and changes in said phase, when reading the axioms it must be kept in mind that the temporal

phases for the motion picture industry are defined by the temporal aspect of the motion picture

industry and its two timelines.

[Temporal 1] Given the importance of time, all concepts that are not common concepts

must have an explicit link to one or more of the temporal phases or the entire temporal

phase as a whole, either directly or inherited through a parent concept.

This is not to say that common concepts cannot have a temporal aspect, but only the MPI

specific concepts are required to have one. This is because, as mentioned before,

within the industry time matters and almost all concepts have a temporal aspect in practice.

However, common concepts by their very nature are general and, while they may have

temporal aspects attached to them, it cannot be mandated. For example, Lunch is a common

concept that is most frequently associated with the business process during production. As

such Lunch can have a temporal phase association. On the other hand, Address cannot be

associated with any particular temporal phase.

[Temporal 2] Motion picture industry agents must have a link to one or more temporal

phases of the production cycle or to the production cycle as a whole. Common concept

agents do not always have to have a link to the temporal phase of the production cycle.

As mentioned before, the agents in the motion picture industry can be either long term or

short term, where short-term agents are associated with only one phase while long-term

agents are associated with two or more phases of the whole production cycle. For example,

the Producer is involved with a given motion picture during the entire production cycle. On

the other hand, the Boom Swinger is only involved during production phase of the production

cycle.

Therefore the agent ontology must reflect the short-term or long-term nature of a given agent

role inherent in the nature of the industry.
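As a minimal sketch of the short-term/long-term distinction, the following Python function derives the term of an MPI agent from the phases it is linked to. The Phase enum and the agent_term function are assumptions introduced here for illustration, not part of the Loculus implementation.

```python
from enum import Enum
from typing import Iterable


class Phase(Enum):
    PRE_PRODUCTION = "pre-production"
    PRODUCTION = "production"
    POST_PRODUCTION = "post-production"


def agent_term(phases: Iterable[Phase]) -> str:
    """Classify an MPI agent by the number of production cycle phases it is linked to."""
    linked = set(phases)
    if not linked:
        # Temporal 2: an MPI agent must be linked to at least one phase.
        raise ValueError("an MPI agent needs a link to at least one phase")
    return "short-term" if len(linked) == 1 else "long-term"


boom_swinger = agent_term([Phase.PRODUCTION])  # production phase only
producer = agent_term(Phase)                   # the entire production cycle
```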

[Temporal 3] All concepts that are not agent or common must have a classification in

reference to the life stages of a motion picture.

Within the motion picture industry common concepts such as Action are not generally

associated with life stages and neither are agents. The reason for agents never having life

stages is to do with the discourse of the industry. Within the industry the agents are not

associated with life stages, they are only associated with the phases of the production cycle.
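The three temporal axioms can be checked mechanically once concepts carry their temporal links. The following Python sketch is illustrative only. The Concept record, its field names and the two functions are assumptions introduced here; the sketch also assumes that "temporal phases" in Temporal 1 covers both timelines, and that the life stage classification of Temporal 3 may be inherited in the same way as the links of Temporal 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Concept:
    """Hypothetical concept record carrying its direct temporal links."""
    name: str
    is_common: bool = False
    is_agent: bool = False
    phases: Set[str] = field(default_factory=set)       # production cycle phases
    life_stages: Set[str] = field(default_factory=set)  # life stages
    parent: Optional["Concept"] = None


def effective(concept: Concept, attr: str) -> Set[str]:
    """Collect a temporal attribute from the concept and all of its ancestors."""
    found: Set[str] = set()
    node: Optional[Concept] = concept
    while node is not None:
        found |= getattr(node, attr)
        node = node.parent
    return found


def temporal_violations(c: Concept) -> List[str]:
    """Return the names of the temporal axioms that the concept violates."""
    if c.is_common:
        return []  # temporal links are optional for common concepts
    phases = effective(c, "phases")
    stages = effective(c, "life_stages")
    problems = []
    if not (phases or stages):
        problems.append("Temporal 1")  # no temporal link at all
    if c.is_agent and not phases:
        problems.append("Temporal 2")  # MPI agent without a production cycle phase
    if not c.is_agent and not stages:
        problems.append("Temporal 3")  # non-agent MPI concept without a life stage
    return problems


boom_swinger = Concept("Boom Swinger", is_agent=True, phases={"production"})
address = Concept("Address", is_common=True)
# The links given to Acting here are illustrative values, not taken from the ontology.
acting = Concept("Acting", phases={"production"}, life_stages={"production"})
method_acting = Concept("Method Acting", parent=acting)
unlinked = Concept("Unlinked MPI concept")
```

Method Acting passes the check without any links of its own because it inherits them through its parent, as Temporal 1 permits.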

4.2.2 Concept Axioms

The Concept Axioms govern the relationship between concepts. As shown in Figure 4.2,

concepts are linked with each other vertically and horizontally. The “vertical” relationships

are governed by the Inheritance Axioms, while the “horizontal”/property-type relationships

are governed by the Linkage Axioms. The Linkage Axioms also capture the Agent context of

the motion picture industry.

[Figure 4.2 (diagram). Recoverable labels: editing inherits from action (vertical link); editing is performed by editor (horizontal link).]

Figure 4.2: The concept of editing with vertical and horizontal links

4.2.2.1 Inheritance Axioms

The Inheritance Axioms govern the vertical relationship between concepts, as well as

dictating the level of abstractions of concepts. The defining test for the abstraction of a

concept is the “Get me a…” test. “Get me a crew” is a statement that is too general and would

prompt the question “Which type of crewmember are you referring to?”. On the other hand

the statement “get me an actor” is sufficiently detailed that such a statement could be

complied with. The highest level of abstractness in the ontology is represented by a set of

common concepts that are referred to as the root concepts. The root concepts are identified

in the first inheritance axiom.

[Inheritance 1] All concepts are children of the abstract terms

This is the key inheritance axiom and in the case of the Loculus ontology the abstract terms

or root concepts are: Agent, Artifact, Tool, Technique, Description, Action and Process. All

concepts, whether they are common or MPI specific have their origin in these root concepts.

These root concepts are part of the discourse of the industry but tacitly understood to be the

parents of all the other concepts (tacitly because industry practitioners do not think in terms

of inheritance, abstract and concrete concepts). However, that is not to say that they do not

have an instinctive understanding that certain concepts are better defined than others.

Closely related to this axiom is the next axiom, which reinforces the notion that the root

concepts are the highest level of abstraction as dictated by the “Get me a...” test.

[Inheritance 2] The abstract terms represent the highest level of abstraction

In addition to the previously mentioned points, it must be noted that the root concepts are

parents to distinctly different child concepts, e.g. though very different types of actions, both

Editing and Acting are children of the root concept Action.

Another point to take note of is how much more concrete Editing and Acting are as concepts

compared to Action. Figure 4.3 shows the inheritance hierarchy featuring all the root concepts

and, for each, one of the concepts that inherit from it. As can be seen, the jump in concreteness is

greatest at the first inheritance from the root, e.g. Acting is several times more

concrete than the abstract concept of Action but, while Method Acting is more specific, the difference in

concreteness/abstractness between Method Acting and Acting is not as great as that

between Action and Acting.

This understanding of increasing and decreasing abstractness is captured further in axioms 3

and 4, which are given below.

[Inheritance 3] In an inheritance hierarchy, the top level concepts are more abstract than

the bottom level concepts.

This was explained before with the example of Action, Acting and Method Acting: each step

vertically down represents a more specific concept. However, the first step down remains the

biggest step from abstract to concrete. This is further reflected in the other examples

presented in Figure 4.3.

[Inheritance 4] Assuming that the immediate parent of a concept is not a root concept, the

concept has a closer link to its immediate parent and by extension its immediate child, than

every other concept in its hierarchy.

As mentioned previously, the Root concepts are so abstract that by their very nature they

cannot be considered closely coupled with their children. On the other hand, more concrete

classes would be closely coupled with their children as their children would generally be a

more specific form of the parent concept, e.g. Method Acting is closer to Acting. The essence

of this axiom is to clarify the degree of closeness as movement is made along the inheritance

hierarchy.

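[Figure 4.3 (inheritance hierarchy diagram). Recoverable labels, with each root concept followed by its descendants as far as the layout allows them to be matched: artifact > motionpicture; agent > people > cast; action > acting > methodacting; description > category > emotivecategory; process > rehearsal; technique > montage; tool > camera > digitalcamera.]

Figure 4.3: Inheritance hierarchy of some of the concepts within the motion picture industry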

[Inheritance 5] A concept that inherits from a common concept is not by virtue of

inheritance considered to be a common concept as the child concept may be specific to the

motion picture industry.

The root concepts are all common concepts but the children that result from them can be

concepts specific to the motion picture industry. This is an observable fact in the industry and

results because the common concepts came first and were extended into the specific concepts

by the industry as it developed. For example, Acting - a specific concept, is a child of Action

- a common concept.

However, once a common concept is extended into becoming specific, it is impossible for its

children to be anything but a concept specific to the motion picture industry. This is both an

observable fact as well as dictated by the inheritance axioms 3 and 4 that say that concepts

become more concrete as one moves down the inheritance tree. Inheritance axioms 6 and 7

codify this observation.

[Inheritance 6] If a concept is a child of a specific concept then it follows that it too is a

specific concept.

For example, Method Acting is the child concept of Acting, where both are specific and

Method Acting cannot be any less specific than Acting. In short, common concepts can be

parents of specific concepts, but specific concepts cannot be parents of common concepts.

[Inheritance 7] If a concept is descended both from a common concept and a specific

concept then it is considered to be specific to the motion picture industry.

For example, Cut-away is the child of the specific concept Editing and the common concept

Technique. Once again, just having a specific concept included when it is a case of joint

inheritance, means that the resulting child cannot be a common concept.
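Inheritance axioms 5 to 7 amount to a simple propagation rule: a concept is specific if it is itself identified as specific or if any of its ancestors is. The following Python sketch is illustrative only; the PARENTS and DECLARED_SPECIFIC tables and the is_specific function are assumptions introduced here, populated with the examples from the text, and the hierarchy is assumed to be free of cycles.

```python
# Parent links for a handful of concepts named in the text.
PARENTS = {
    "Action": [],
    "Technique": [],
    "Acting": ["Action"],
    "Editing": ["Action"],
    "Method Acting": ["Acting"],
    "Cut-away": ["Editing", "Technique"],  # joint inheritance
}

# Concepts identified as specific in their own right (Inheritance 5 allows this
# even though their parent, Action, is a common concept).
DECLARED_SPECIFIC = {"Acting", "Editing"}


def is_specific(concept: str) -> bool:
    """True if the concept is specific to the motion picture industry."""
    if concept in DECLARED_SPECIFIC:
        return True
    # Inheritance 6 and 7: one specific ancestor is enough.
    return any(is_specific(parent) for parent in PARENTS[concept])
```

Under this rule Cut-away is specific because of Editing, regardless of its second parent Technique being common.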

The last Inheritance Axiom is very different from the other axioms presented so far and

relates directly to the Linkage Axioms that are to follow. Basically, the next axiom dictates

how the linkages of the parents are to be inherited by the child.

[Inheritance 8] Child concepts inherit all linkages of their parent, unless the linkages are

specifically overridden for the child.

For example, Cross-cutting inherits “is performed by Editor” through its parent concept of

Editing. This is a common rule usually applied during inheritance. The child is usually

assumed to inherit all its parents’ properties unless explicitly overridden.
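Inheritance axiom 8 behaves like attribute lookup with overriding. The following Python sketch is illustrative only; the Concept class and its links method are assumptions introduced here, and for brevity each concept has a single parent even though the ontology permits joint inheritance. The overriding child in the example is hypothetical and is not a concept taken from the ontology.

```python
from typing import Dict, Optional


class Concept:
    """Hypothetical concept that holds its own linkages and a single parent."""

    def __init__(self, name: str, parent: Optional["Concept"] = None,
                 links: Optional[Dict[str, str]] = None) -> None:
        self.name = name
        self.parent = parent
        self.own_links = dict(links or {})

    def links(self) -> Dict[str, str]:
        """All linkages of the concept: inherited ones, then its own overrides."""
        inherited = self.parent.links() if self.parent else {}
        return {**inherited, **self.own_links}


action = Concept("Action")
editing = Concept("Editing", action, {"is performed by": "Editor"})
cross_cutting = Concept("Cross-cutting", editing)

# Hypothetical child that overrides the inherited linkage.
overriding_child = Concept("Hypothetical child", editing,
                           {"is performed by": "Hypothetical agent"})
```

Here cross_cutting.links() yields the linkage declared on Editing, while the overriding child replaces it with its own value.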

4.2.2.2 Agent Context

As mentioned before, one of the functions of the Linkage axioms is to govern how the agent

context of the motion picture industry is captured by the Loculus Ontology. However, what is

the agent context of the motion picture industry? In this section, we briefly discuss the agent

context before we move on to presenting the actual Linkage axioms in the next section.

The concentration of motion picture industry specific agents is in the production phase. In the

minds of the audience, these agents of the motion picture industry are divided into two

categories, the cast and the crew, where the cast consists of agents in front of the camera and

the crew consists of agents behind the camera. However, upon closer inspection it becomes

clear that the situation is a lot more complicated than that. For starters, crew can be divided

into creative crew (the costume designer) and technical crew (cameraman), with some crew

agents being both creative and technical (director of photography). These different roles

dictate the type of concepts the agents would be most familiar with and the type of concepts

the agents are least familiar with. Highly creative roles would not be aware of the concepts

involved in the highly technical roles and vice-versa. More importantly, within the industry

there is a strong association between the majority of concepts and the agents who are chiefly

associated with those concepts. As such the agent context of the concepts within the motion

picture industry domain of discourse must be an integral part of any domain ontology to

correctly reflect the nature of the industry.

The agent context is not independent of the temporal context; the temporal context applies to

the agents as well. Agents, both creative and technical, can either be involved with only one

phase of the production cycle, making them strictly short-term agents like the editor who is

only involved in post-production, or they can be involved with multiple phases of the

production cycle, making them longer-term agents like the producer who is involved in all

phases of the production cycle. Agents are never associated with the life stage timeline

because the industry never associates them with the life stage timeline.

It must be noted that there are other agents who are involved with the industry, such as

journalists, archivists and financiers, but who are not exclusive to the industry. These

agents are largely involved during pre-production and post-production. However, agents

within the industry do not necessarily consider these other agents to be part of the industry;

rather, they are seen as supporting the industry. These other agents would often identify themselves as

belonging to a different industry, e.g. journalists to the journalism/media industry.

The basis for inclusion here is ending credits and/or self-identification. If an agent is included

in the ending credits of a motion picture then it can be seen as the industry affirming that the

agent role is part of the industry. However, ending credits alone are not a sufficient marker of

motion picture industry inclusion as many who are members of the motion picture industry

would not necessarily be mentioned in the credits, e.g. the talent managers of actors. There is

also the situation where the ending credits may credit agents such as personal assistants,

caterers and other common concept agents who do not self-identify as being part of the

motion picture industry. This is why self-identification is an additional basis for inclusion.

Self-identification is the answer agents give when they are asked “are you part of the

motion picture industry?”

4.2.2.3 Linkage Axioms

The Linkage Axioms govern how the concepts are linked “horizontally” through meta-links

or, in other words, the Linkage Axioms govern the relationship between concepts (other than

inheritance relationships). There are two types of links possible in the Loculus Ontology,

weak links and strong links. Weak links connect two concepts in a vague abstract manner,

indicating that while it is known that the two concepts have a direct relationship between

them, the exact nature of the relationship is nebulous and difficult to articulate, varying

greatly between motion picture projects. A strong link on the other hand links concepts

together in a specific and concrete manner, indicating that, not only do the two concepts have

a relationship between them, the relationship is well known and well understood. The first

two Linkage axioms codify this:

[Linkage 1] Two concepts where neither of them are agents can be linked through a weak

link which implies that, while the concepts are linked, the link is abstract and non-specific.

For example, while the concept of Prestige is associated with the Film Festivals, the exact

nature of the relationship is not easy to articulate. Moreover, prestige is also very subjective

and there are other factors that dictate whether a given Film Festival is prestigious or not and

sometimes there are issues regarding who is receiving and who is giving the prestige. A new

and unknown film maker will undoubtedly be the one receiving all the prestige should their

film be accepted for screening at the Brisbane International Film Festival (BIFF). On the

other hand, if the BIFF manages to secure a premiere for the new film of an established and

famous film maker for their festival, it is in fact the BIFF which is receiving prestige by

association with the famous film maker, while the film maker is likely not receiving any

prestige simply because the film maker’s base level of prestige is higher than that of BIFF.

As such, while Prestige and Film Festivals are related, the nature of the relationship is vague

and the link is therefore weak.

[Linkage 2] Two concepts where neither of them are agents, can be linked through a

strong link which implies that the link is specific and precise.

A motion picture is screened at a Film Festival; this relationship is readily articulated and

does not vary. As such the link between motion picture and Film Festival is a strong link.

There is a very important reason why both Linkage axioms 1 and 2 make a distinction

between agents and non-agent concepts. The reason is that the presence of the agent alters the

interpretation of the relationship. When an agent is involved, the connection between the

agent and the non-agent concept gives an indication of the perspective of the agent in

reference to the non-agent. However, when two non-agents are involved, there is no

perspective as only an agent is capable of perspective. In addition, when one of the concepts

being linked is an agent, the ontology is essentially attempting to capture the agent context of

the motion picture industry and, in essence, map the perspective of the agent in relation to the

concept. However, the basis behind the strong and weak relationships remains the same.

[Linkage 3] If an agent has a strong link to a non-agent concept then it is understood that

the agent has a deep involvement in the non-agent concept.

If the link can be precisely defined, i.e. the link is a strong link, it means that the agent should

have a thorough understanding of the concept. Expressed in another way, if the stated

industry practitioner, who has a given agent role, has a clear understanding of a given concept

– they can clearly articulate their relationship in terms of the concept and the relationship

with the concept does not vary when moving from one motion picture to another, then that

agent has a strong relationship to that concept and the relationship is modelled as such. E.g.

Actor performs Acting, this is a precise relationship that can be precisely defined and

articulated. As such, it is a strong link.

[Linkage 4] If an agent has a weak link to a non-agent concept then it is understood that

the agent has some understanding of the non-agent concept but it is not deep and the agent

is not an expert in the concept.

For example, Blocking involves agent Actor. Blocking is a rehearsal technique by which a

scene is finalized (e.g. placement of lights, movement of actors etc) before it is shot.

Precisely what an actor does during blocking is hard to capture and will differ from scene to

scene, actor to actor and motion picture to motion picture. In other words, where different

instances of the agent define their relationship to a concept in an abstract manner or cannot

exactly define their relationship beyond the fact that they are involved, the relationship is considered

to be weak.

The next Linkage axiom deals with the direction of the relationship, as in whether the

relationship is uni-directional or bi-directional. The reason that strong links and weak links

behave differently in this regard has its origins in the discourse of the industry. Strong links

are so specific that they take one form when they go from concept A to concept B and a

completely different form when they go from concept B to concept A. On the other hand,

weak links are so general and vague that whether they are linking concept A to concept B or

going the other way and linking concept B to concept A, they have the same form.

[Linkage 5] All strong links are uni-directional and occur in pairs, while all weak links are

bi-directional, where direction relates to the vocabulary used.

For example, the pair of strong links that connect Actor and Acting have different meta-types

depending on the direction. That is, Actor performs Acting but Acting is performed by Actor.

On the other hand the weak link between prestige and film festival is the same regardless of

the starting concept; Prestige is associated with Film Festival and Film Festival is associated

with Prestige.
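
The weak/strong distinction and the direction rule of Linkage 5 can be illustrated with a minimal Python sketch. The names Link, weak_link and strong_link_pair are hypothetical and are not part of the Loculus implementation; the sketch only assumes that a weak link reuses one phrase in both directions, while a strong link is stored as a pair of differently worded uni-directional links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    label: str
    target: str
    strong: bool


def weak_link(a: str, b: str, label: str = "is associated with") -> list[Link]:
    # A weak link is bi-directional: the same wording applies both ways.
    return [Link(a, label, b, False), Link(b, label, a, False)]


def strong_link_pair(a: str, forward: str, b: str, backward: str) -> list[Link]:
    # Strong links are uni-directional and occur in pairs with different wording.
    return [Link(a, forward, b, True), Link(b, backward, a, True)]


# The two examples used in the text above.
links = strong_link_pair("Actor", "performs", "Acting", "is performed by")
links += weak_link("Prestige", "Film Festival")

for link in links:
    kind = "strong" if link.strong else "weak"
    print(f"{link.source} {link.label} {link.target} ({kind})")
```

Storing both directions explicitly keeps the vocabulary of each direction available when the ontology is read from either concept.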

The last two Linkage axioms deal with how agents are linked together and the reason inter-

links between agents have to be separately identified is because of how agents are linked

together within the industry. Firstly, certain agents are grouped together into departments

because these agents all have a specific speciality and work together on some aspect of the

production cycle exclusively. These agents are directly linked together and their link is

codified in the next Axiom.

[Linkage 6] Agents are linked to other agents through aggregate concepts; all agents that

belong to the same group are deemed to have a strong link to each other.

For example, both the Director and the Producer belong to the Production Office, where the

production office is the aggregate concept that groups together the agents Director and

Producer. These aggregate concepts come from the industry and it is how the industry groups

agents together. The understanding here is that these agents work closely with each other and

often across many concepts.

On the other hand, the second type of agent link is between agents that are in different groups

in the industry classification. These agents are not linked directly to each other by the

industry as such — rather they are linked through concepts that involve them. The last

Linkage axiom captures this industry characteristic.

[Linkage 7] Agents who belong to separate groups are linked via a non-agent concept.

For example, in the previous Blocking example, the Director (a member of the Production

Office) is linked to Actor (a member of the Cast) through the concept of Blocking. The

question can be asked why Director and Actor cannot be linked directly because Directors

direct Actors. The answer is that the Director does not just direct Actors. They direct Actors

“on set”, during a specific scene. As such, the act of directing by the Director in relation to

the Actor is very context specific. Therefore, it makes sense to only link the two through that

context.
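
Linkage 6 and Linkage 7 can be read as a simple lookup, sketched below in Python. The dictionaries GROUP_OF and INVOLVES_AGENT and the function agent_connection are hypothetical names introduced for illustration; the data is limited to the Director, Producer, Actor and Blocking examples given above.

```python
# Aggregate concepts (industry groupings) for the agents named in the text.
GROUP_OF = {
    "Director": "Production Office",
    "Producer": "Production Office",
    "Actor": "Cast",
}

# Non-agent concepts and the agents they involve.
INVOLVES_AGENT = {
    "Blocking": {"Director", "Actor"},
}


def agent_connection(a: str, b: str) -> tuple[str, list[str]]:
    """Return how two agents are connected and through which concept(s)."""
    if GROUP_OF[a] == GROUP_OF[b]:
        # Linkage 6: agents in the same aggregate concept are strongly linked.
        return "same group", [GROUP_OF[a]]
    # Linkage 7: agents in different groups are linked via non-agent concepts.
    shared = sorted(
        concept
        for concept, agents in INVOLVES_AGENT.items()
        if a in agents and b in agents
    )
    return "via concept", shared


print(agent_connection("Director", "Producer"))
print(agent_connection("Director", "Actor"))
```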

4.2.2.4 Terminology Axiom

The last type of Concept Axiom is the Terminology Axiom, which is necessary for natural

language reasons as language contains a number of synonyms or equivalencies. An example

of equivalence is FPS which is equivalent to Frames-Per-Second because one is the

abbreviation of the other. There is only one axiom pertaining to equivalence but a number of

other rules arise because of the nature of the axiom.

[Terminology 1] Synonyms are accounted for by linking synonym concepts as equivalent to

the central concept of which they are synonyms; the choice of the central concept is

determined by the industry and/or arises from natural language.

The above axiom by its nature dictates that the following rules hold.

• There is a central concept to which all of its synonyms refer.

• A synonym can only be equivalent to one central concept.

• Only the central concept has properties; synonyms absorb all properties from the central

concept.

• Only the central concept has an inheritance hierarchy (parent concepts), synonyms mirror the

inheritance of the central concept.

• Only the central concept can be linked to a synonym, synonyms cannot be linked to other

synonyms.
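
The rules that follow from Terminology 1 can be sketched as a small synonym table in Python. The class Terminology and the property measures are hypothetical; in particular, treating Frames-Per-Second (rather than FPS) as the central concept is an assumption made only for this illustration.

```python
class Terminology:
    """Synonym table in which only central concepts carry properties."""

    def __init__(self) -> None:
        self._properties: dict[str, dict[str, str]] = {}  # central -> properties
        self._central_of: dict[str, str] = {}             # synonym -> central

    def add_central(self, name: str, **properties: str) -> None:
        if name in self._central_of:
            raise ValueError(f"{name!r} is already a synonym")
        self._properties[name] = dict(properties)

    def add_synonym(self, synonym: str, central: str) -> None:
        if central not in self._properties:
            # Synonyms may only point at a central concept, never at a synonym.
            raise ValueError(f"{central!r} is not a central concept")
        if synonym in self._central_of or synonym in self._properties:
            # A synonym is equivalent to exactly one central concept.
            raise ValueError(f"{synonym!r} is already defined")
        self._central_of[synonym] = central

    def resolve(self, name: str) -> str:
        return self._central_of.get(name, name)

    def properties(self, name: str) -> dict[str, str]:
        # Synonyms absorb the properties of their central concept.
        return self._properties[self.resolve(name)]


terms = Terminology()
terms.add_central("Frames-Per-Second", measures="frame rate")  # hypothetical property
terms.add_synonym("FPS", "Frames-Per-Second")
print(terms.resolve("FPS"), terms.properties("FPS"))
```

Because every lookup is resolved to the central concept first, properties and inheritance only ever need to be recorded once.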

4.2.3 Meta-Link Axioms

The last type of major axiom class is the Meta-Link axiom. The Meta-Link axioms dictate the

names to assign the weak and strong relationships outlined in the Linkage axioms. In essence,

the Meta-Link axioms define what can be classified as weak links. Everything that is not a

weak link is a strong link. This is because there are more types of strong links than there are

weak links. These weak links are semi-artificial constructs in that the manner of their

expression is allowable in natural language discourse but it is unlikely that industry

practitioners would express it exactly in those terms. At the same time, the hierarchical

relationships between the weak and strong meta-links that the following axioms establish are

something tacitly understood by industry practitioners but not necessarily expressly

acknowledged.

[Meta-Link 1] All links between concepts inherit from the root links “is associated with”,

“involves concept”; both of which are considered to be weak relationships.

These are extremely general relationship descriptors that can cover a wide range of

relationships and at the same time convey nothing specific beyond that two concepts are

linked in some way. As such these are considered weak relationship links.

[Meta-Link 2] The root link “involves concept” has the child concepts “involves agent”

and “involves component”, which are considered to be weak relationships but, unlike their

parents, do have directions.

These relationships are a little more specific in that they link a particular type of concept.

That is, “involves agent” links a non-agent concept with an agent concept and “involves

component” indicates that the second concept is part of the first or expressed in another way,

the first concept consists of the “component” concepts linked to it via “involves

component”. These are however still vague enough to be considered weak relationships.

[Meta-Link 3] Links can either be strong or weak, if a link is not weak (as defined by the

previous axioms) then it must be strong.

The decision to have a binary relationship of either weak or strong was made because in most

cases it is impossible to articulate the range of relationships that exists between weak and

strong. Therefore, the decision was made that if the industry practitioners or the industry

literature could clearly articulate the exact nature of the relationship between two concepts

then those two concepts would be linked using a strong relationship. If on the other hand the

industry practitioners or the industry literature could not clearly articulate the exact nature of

the relationship between two concepts, then they would be linked using the weak link that

suits the two concepts the best.

The weak links are fixed in quantity, with the next axiom setting down what meta-

relationships can be considered weak.

[Meta-Link 4] The weak links are limited to the root links (“is associated with” and

“involves concept”), “involves agent” and “involves component”.

We have made a concerted effort to exclude artificial constructs but in the case of the weak

relationship, we have had to come close to the borderline of such constructs. Industry

practitioners may not be explicit about a relationship, but common use of the concepts together

indicates the existence of some sort of relationship. As such, there are no naturally occurring

phrases from the discourse of the domain that can be used to represent weak relationships.

However, the meta-links chosen for the concept relationship are such that, when articulated

out loud, they sound natural and meaningful. For example, Actor is associated with Blocking.

[Meta-Link 5] Except for “involves component” and “involves agent”, children of root

links are considered to be strong links as they are more specific.

This last axiom sets up the meta-link hierarchy, which is shown below in Figure 4.4. The

strong links are the children of the weak links because the weak links can apply to all

relationships and the strong links make the weak links more specific/precise. The decision to

make the strong links into the children of the weak links was made to clearly indicate the

inter-connected nature of the links.

[Figure 4.4 diagram. Root links: “is associated with”, “involves concept”. 2nd order links: “involves agent”, “involves component”. Examples of strong links: “is performed by”, “is used for”, “is screened at”, “is performed during”, “to see”.]

Figure 4.4: The meta-link hierarchy
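
The Meta-Link axioms amount to a fixed set of weak link names, with every other link name treated as strong. A minimal Python sketch follows; the names ROOT_LINKS, WEAK_LINKS, strength and is_directional are hypothetical, and the direction test follows the reading of Meta-Link 2 that only the two root links are direction-free.

```python
ROOT_LINKS = frozenset({"is associated with", "involves concept"})
WEAK_LINKS = ROOT_LINKS | {"involves agent", "involves component"}


def strength(label: str) -> str:
    # Meta-Link 3 and 4: anything outside the fixed weak set is strong.
    return "weak" if label in WEAK_LINKS else "strong"


def is_directional(label: str) -> bool:
    # Meta-Link 2: the children of the root links have a direction,
    # whereas the root links themselves read the same both ways.
    return label not in ROOT_LINKS


for label in ["is associated with", "involves agent", "is screened at"]:
    print(f"{label}: {strength(label)}, directional={is_directional(label)}")
```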

4.3 Structure of the Ontology

The ontology is structured in three parts and, because of the axioms presented before, the

ontology exists as a lattice-like structure in three axes or dimensions. The ontology consists

of three sub-ontologies: the MPI concepts ontology, the Agent concepts ontology and the

Common concepts ontology. They are detailed in the following sections.

4.3.1 MPI Concepts Ontology

The motion picture industry (MPI) concepts ontology contains all the concepts unique to the

motion picture industry or that have a special meaning within the industry. For example,

Cross-Cutting is an editing technique that is a concept unique to the motion picture industry

and would therefore be modelled as part of the MPI concepts ontology. Likewise, Treatment

is a concept that, while not unique to the motion picture industry, does have a special meaning

within the industry. In the case of Treatment: it is a document usually prepared by the

Producer as part of a pitch for a new motion picture project. This differs greatly from the

usage of the concept treatment in other domains of discourse and the natural language

discourse of society at large. Concepts that do not have a special meaning within the

discourse of the industry, e.g. action, description, or are types of agents, either people or

companies, are not modelled as part of the MPI concepts ontology.

4.3.2 Agent Concepts Ontology

The Agent concept ontology models all agents that are frequently involved with the processes

of the motion picture industry. It is, of course, not possible for all agents involved with the

motion picture industry to be modelled simply because of the sheer number that become

involved in the later parts of the production cycle and life stage timelines. For example, if the

motion picture is deemed of historical significance, an archivist will become involved.

There is also a question of whether agents such as the onsite nurse need to be modelled in the

Agent ontology. Certainly most motion picture shoots would have an onsite nurse or at least

some sort of first aid provider; however, the question becomes where to draw the line. The

line was drawn so as to include agents that are essential to a good

representative model of the discourse of the motion picture industry, and to exclude agents

who are not as essential to the concept model.

As explained in Section 4.2.2.2, the basis for inclusion is ending credits and/or self-

identification. If an agent is included in the ending credit then it can be seen as an industry

affirmation that the agent’s role is part of the industry. However, the agent in question must

still self-identify as being part of the motion picture industry to affirm that selection of the

industry. In addition, if an agent self-identifies as being part of the industry and their work

primarily does involve the motion picture industry, non-inclusion in ending-credit is not a

basis for exclusion from the agent ontology.
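
One possible reading of this inclusion rule is sketched below in Python. The function include_agent and its parameter names are hypothetical, and the treatment of an agent who self-identifies but is neither credited nor primarily working in the industry is an assumption, as the text does not address that case. The sketch also does not cover common concept agents that are modelled for other reasons.

```python
def include_agent(in_credits: bool, self_identifies: bool,
                  works_primarily_in_mpi: bool) -> bool:
    """Decide whether an agent belongs in the Agent concepts ontology."""
    if not self_identifies:
        # Credits alone are not enough: the agent must affirm the selection.
        return False
    # Either the industry affirms the role (credits) or the agent's work
    # primarily involves the motion picture industry.
    return in_credits or works_primarily_in_mpi


print(include_agent(True, True, False))   # credited and self-identifying
print(include_agent(True, False, True))   # credited but not self-identifying
print(include_agent(False, True, True))   # uncredited but works in the industry
```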

That is not to say that Agent ontology only models agents specific to the motion picture

industry. Certain common concept agents are necessary to model accurately the domain of

discourse. For example, an electrician is a common concepts agent that needs to be modelled

as an electrician plays such an important role during the production of the motion picture.

Taking into account the variation of the various types of agents, the ontology does identify

agents as creative, technical or a hybrid of the two, with MPI specific agents having an

explicit link to a phase of the production cycle.

4.3.3 Common Concepts Ontology

The common concepts ontology models those common concepts that either act as parents to

the motion picture industry specific concept, such as action which needs to be modelled to

properly identify concepts such as acting as types of action, or common concepts that occur

so frequently in the industry that a proper concept model for the industry could not be

developed without modelling them. For example, the concepts of fee and lunch break

would need to be modelled to capture accurately the various processes of the motion

picture industry.

The question here, much like that needed for the agent concept, is where the line is to be

drawn. With common concepts the line is easier to draw. Any concept that does not occur

frequently in the motion picture industry can be safely discarded, with concentration being

focused on only those common concepts that are necessary. By “necessary”, we mean which

common concepts are expressed with such high frequency in relation to motion picture

industry specific concepts that not including them would result in the improper modelling of

the motion picture industry specific concepts. In short, the common concept ontology was

only developed to the extent that it was absolutely necessary to do so to model accurately the

MPI concepts ontology and the Agent concepts ontology.

4.3.4 The Root Concepts

For the Loculus ontology we made the conscious decision to capture the discourse of the

domain, as stated in Inclusion Axioms 1 and 2. Our root concepts are concepts that are

naturally the parents of concepts within the motion picture industry and concepts that

industry professionals would intuitively understand and indeed how they would group

together the concepts of their industry. However, it must be emphasised that our root

concepts address the questions of who, what, where, with the when being addressed

separately by the temporal context. Of the root concepts, Agent addresses the who question;

Artifact, Action, Process, Tool and Technique address the what question; the where question

is addressed by the concepts that are the children of Description. However, the root concepts

only apply to the MPI specific concepts contained within the MPI ontology.

The concepts that form part of the common ontology do not necessarily inherit from the root

concepts, but the root concepts are common concepts that have been chosen specifically for

the MPI concepts on the basis that these are the concepts under which the MPI concepts are

naturally grouped per their use in natural discourse. This is not to say that none of the

common concepts inherit from the root concepts. Where the classification is obvious, such as

Address inheriting from Description, the inheritance is noted. Otherwise the common

concepts are left alone, such as in the case of Lunch – is it a process or should it be classified

as a description because that is how the concept is used in the industry?

In addition, it must be noted that the concepts relating to the two timelines, i.e. the temporal

context, do not inherit from anything. They could potentially be grouped under Description

but that is not necessarily a natural grouping for them. Ontological elegance would dictate

that they be grouped under something like Temporal Phase or Temporal Description.

However, either of those two concepts would be wholly artificial and not related to discourse

of the industry. Often in ontology engineering, for the sake of ontological elegance such

artificial concepts are employed. However, as identified by Hepp, these artificial concepts do

not make sense to industry practitioners and often confuse them, thereby discouraging them

from using the ontology [40]. In addition, artificial constructs have the potential to throw off

perspective sensitivity by moving the ontology away from the natural discourse of the

industry. As such, we have opted to sacrifice ontological elegance in order to remain

faithful to the industry discourse as best we can.

In this manner we modelled the concept of Editing, shown in Figure 4.5. Editing inherits

from the root concept of Action and has linkages to the agent Editor and to the temporal

context through the concepts of Post Production and Production, where post production is the

phase of the production cycle in which editing is performed and production is the life stage

under which the industry classifies editing.

[Figure 4.5 diagram. Editing inherits from Action; Editing is performed by Editor; Editing is performed during Post Production; Editing is classified under Production.]

Figure 4.5: Ontology Extract – Editing

Editor in turn is descended from the root concept of Agent but the descent is far more

complex, due to the hierarchical structures employed by the industry. As shown in Figure 4.6,

the concept of Editor inherits from the concepts of Technical Crew and Creative Crew, as

well as that of Person. Technical Crew and Creative Crew are both children of the concept of

Crew. However, the industry dictates that both types are necessary as there is a distinction.

For example, a cameraman is only a member of Technical Crew as their function is generally

simply to operate the camera. On the other hand, the Director of Photography (DoP) is a

member of both the Creative and Technical Crew because the DoP has an instrumental

input in the creative direction of the film and works closely with the director to bring the

script and the director’s vision to life. Different from both the Cameraman and the DoP is the

costume designer who is a member only of the creative crew. Identifying the

Editor as a person is necessary because non-persons, such as companies and animals, can also be

agents of the industry. However, the pertinent point to note here is that eventually Editor does

descend from the root concept.

[Figure 4.6 diagram. Node labels: editor, person, creative crew, technical crew, post production, Editorial Department, people, agent. Edge labels: inherits from, works during, is part of, is one of, type of.]

Figure 4.6: Ontology Extract – The Inheritance of Editing
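
The multiple inheritance described above can be checked with a short traversal, sketched here in Python. The table PARENTS and the function ancestors are hypothetical names; the direct links from Crew and Person to Agent are a simplifying assumption, since the intermediate levels of the figure are not reproduced here.

```python
PARENTS = {
    "Editor": ["Technical Crew", "Creative Crew", "Person"],
    "Cameraman": ["Technical Crew"],
    "Director of Photography": ["Technical Crew", "Creative Crew"],
    "Costume Designer": ["Creative Crew"],
    "Technical Crew": ["Crew"],
    "Creative Crew": ["Crew"],
    "Crew": ["Agent"],    # assumption: intermediate levels omitted
    "Person": ["Agent"],  # assumption: intermediate levels omitted
}


def ancestors(concept: str) -> set[str]:
    """Collect every concept reachable by following 'inherits from' upwards."""
    found: set[str] = set()
    stack = list(PARENTS.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return found


print(sorted(ancestors("Editor")))
print("Creative Crew" in ancestors("Cameraman"))
```

However many parents a concept has, the traversal ends at the root concept, which is the point made in the text.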

The next logical question is how the root concepts are modelled within the ontology. As

shown in Figure 4.7, naturally the root concept of Action is the top most level of abstraction

and has no parents. It is explicitly linked to the concept of motion picture. As common

concepts, root concepts are not required to have a temporal association; in this case the root

concept of Action does have one, as this makes it explicit that Action concepts are used throughout

the production cycle. In line with industry usage, action type concepts are mostly associated

with the production phase of the motion picture life stage timeline.

[Figure 4.7 diagram. Action is associated with Motion Picture; Action is performed during Production Cycle; Action is classified under Production.]

Figure 4.7: Ontology Extract – Action

4.3.5 Lattice Structure

The lattice-like structure of the ontology is the result of the Concept Axiom. At the heart of

the lattice is the concept of motion picture, which links all the root concepts together as

shown in Figure 4.8. The motion picture concept has been modelled to be a child of the

concept Artifact (a root concept), which is what it is – the artifact that is the product of the

entire production cycle. The other root concepts are the components that make up the artifact

of the motion picture, which in turn has some specific links to the concepts of Film Festival,

Cinema and Audience. The temporal aspect of the motion picture concept is that it is linked to

the entire Production cycle but is chiefly used during the utilisation stage of the life stage

timeline. The latter, as mentioned before, is a link that’s made by the industry through

practice. In this case, the practice that associates the artifact of motion picture with the

utilisation stage is motivated by the fact that the artifact is used during the utilisation stage,

while the earlier stages create the artifact.

[Figure 4.8 diagram. Node labels: motion picture, description, action, technique, process, artifact, film festival, cinema, audience, production cycle, utilization. Edge labels: involves component, inherits from, is screened at, seen by, is associated with, classify under. Legend: Inheritance, Temporal, Linkage.]

Figure 4.8: Ontology Extract – Motion Picture

4.3.6 Three Axes

As mentioned before, the lattice-like structure of the ontology exists in a three-axis space.

The first axis is the vertical or inheritance axis, the second is the horizontal or linkage

axis and the third is the temporal axis. These axes are the result of the axioms and the temporal and

agent contexts of the industry. Almost all the concepts exist in the three axes. We say almost all because,

per the axioms, common concepts do not have to have a temporal association and need not

inherit from anything. However, all the concepts of the MPI ontology and the Agent ontology

do exist within the bounds of the three axes. For example, Figure 4.9 shows the previously

introduced concept of Editing contained within the three axes.

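[Figure 4.9 diagram. The Editing extract (Editing inherits from Action, is performed by Editor, is performed during Post Production, is classified under Production) placed within the three axes; the only fully legible axis label is “Linkage Axis”.]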

Figure 4.9: Ontology Extract – Editing within the Three Axes

The Inheritance Axis is the direct result of inheritance axioms, which set up the root concepts

and then dictate how concepts are to inherit from them and other concepts. Those axioms in

essence set up a vertical plane that starts with the root concepts and goes downwards, or from

any given concept goes upwards towards the root.

The Linkage or horizontal axis is the result of the Linkage axioms that serve to connect one

concept to another horizontally through relationships. This of course incorporates the agent

context of the industry as that too is captured through horizontal linkages, through the linkage

axioms. Lastly, the temporal axioms that capture the temporal aspect of the industry are

responsible for the temporal axis.
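
The assignment of relationships to axes can be sketched in Python using the Editing example. The names TEMPORAL_LABELS, axis_of and EDITING are hypothetical, and the rule that the labels “is performed during” and “is classified under” mark temporal edges is an assumption drawn from the Editing extract only.

```python
from collections import defaultdict

# Assumption: these two labels connect a concept to the temporal context.
TEMPORAL_LABELS = {"is performed during", "is classified under"}


def axis_of(label: str) -> str:
    if label == "inherits from":
        return "inheritance"
    if label in TEMPORAL_LABELS:
        return "temporal"
    return "linkage"


# The Editing extract as (source, label, target) triples.
EDITING = [
    ("Editing", "inherits from", "Action"),
    ("Editing", "is performed by", "Editor"),
    ("Editing", "is performed during", "Post Production"),
    ("Editing", "is classified under", "Production"),
]

by_axis: dict[str, list[tuple[str, str]]] = defaultdict(list)
for _source, label, target in EDITING:
    by_axis[axis_of(label)].append((label, target))

for axis, edges in by_axis.items():
    print(axis, edges)
```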

4.4 Ontology Implementation

The ontology was implemented in OWL using the Altova SemanticWorks 2008

software. In this section we discuss the motivation behind the choice of

implementation language and tool, as well as highlight how the ontology has been

implemented in practice. The implementation of the ontology was unavoidably subject to the

constraints of the technology used to implement it. The nature and impact of these limitations

are also discussed.

4.4.1 OWL

As first explained in Section 2.2.3, OWL is a set of languages used for knowledge

representation in the form of ontology authoring [74]. OWL is designed for use by

applications that need to process the content of information instead of just presenting

information to humans [74]. Recommended by the World Wide Web Consortium, it

comprises three semantically related languages: OWL Lite, OWL DL and OWL Full. Each

language is a syntactic extension of its simpler predecessor with OWL Lite being the simplest

of the languages and OWL Full being the most comprehensive and supporting all features of

OWL [74]. As a result, while all valid OWL Lite documents are also valid OWL Full

documents, not all valid OWL Full documents are valid OWL Lite documents. For our

purposes we opted to use OWL Full in order to gain access to all OWL functionality. In

hindsight, it was found that the ontology could have been implemented in OWL DL. However, we did

not know this at the outset and thus opted to use OWL Full.

Our decision to use OWL was made on three grounds. Firstly, we wanted to take advantage

of features that were exclusive to OWL; for example, the property ‘disjointWith’ is

exclusive to OWL and is invaluable in expressing the fact that two concepts are

disjoint despite, say, an overlapping inheritance hierarchy. An example of two disjoint

concepts would be Emotive Category and Criteria Based Category. As can be seen from

Figure 4.10, while they both share the common parent of Category, they are disjoint

from each other as one involves the use of the concept of emotion and the other the concept

of criteria.
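
The effect of a disjointness statement can be illustrated without an OWL reasoner. The plain-Python sketch below is not how Loculus or OWL evaluates the axiom; it only shows the consequence that no concept should descend from both members of a disjoint pair. The names PARENTS, DISJOINT, lineage and disjointness_violations are hypothetical, the placement of Mood and Rating follows the later scoring examples, and Hybrid Category is an invented concept used only to trigger a violation.

```python
PARENTS = {
    "Emotive Category": ["Category"],
    "Criteria Based Category": ["Category"],
    "Mood": ["Emotive Category"],
    "Rating": ["Criteria Based Category"],
}
DISJOINT = [("Emotive Category", "Criteria Based Category")]


def lineage(concept: str, parents: dict[str, list[str]]) -> set[str]:
    """Return the concept together with all of its ancestors."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def disjointness_violations(parents, disjoint_pairs) -> list[str]:
    """List concepts that descend from both members of a disjoint pair."""
    return sorted(
        concept
        for concept in parents
        if any(
            a in lineage(concept, parents) and b in lineage(concept, parents)
            for a, b in disjoint_pairs
        )
    )


print(disjointness_violations(PARENTS, DISJOINT))

# Hypothetical concept that inherits from both disjoint categories.
broken = dict(PARENTS, **{"Hybrid Category": ["Emotive Category", "Criteria Based Category"]})
print(disjointness_violations(broken, DISJOINT))
```

Sharing the parent Category does not cause a violation; only descent from both disjoint concepts does.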

The second benefit of using OWL is that it is the recommended language of the Semantic

Web. This makes the Loculus ontology ready for application within web services, without

creating any disadvantages in using the ontology in a non-web context.

We implemented the ontology to the extent that was possible, with our expressive power being

constrained by both what could be modelled and by what could be articulated. In this case,

there was no particular function lacking in OWL that could be explicitly

flagged as restricting the expression of the ontology. Rather, the limitations were part of the

overall modelling process in that every time a concept was modelled the question had to be

asked, given the axioms that govern this ontology and the expressive power of OWL: what is

the best way to model this concept? In short, we constructed the ontology as expressively as

we could given the constraints of the semantics of OWL and in the context of the axioms for

Loculus.

[Figure 4.10 diagram. Legible labels: Disjointed With, genre, post production, discovery, criteria based category.]

[Figure 5.7 diagram. Node labels: Category, Emotive Category, Emotion, Production Cycle, Life Stage (reuse/re purpose). Edge label: involves concept. Annotations: +2 reach score, +1 temporal score (twice). Total score: 6.]

Figure 5.7: The score calculation of Mood to Category

The next example of calculation is from Mood to Rating, where rating refers to the

classification of motion pictures, e.g. PG, M, MA etc. Like Mood, Rating is also a child of

Category. Unlike Mood, Rating is a criteria based category and to reach it from Mood, you

have to go through Category. This is shown in Figure 5.8. The temporal score does not

change in this case. The only increase occurs in terms of the reach score, which is increased

by 4 because of the extra distance that needs to be travelled between mood and rating.

[Figure 5.8 diagram. Node labels: Category, Emotive Category, Emotion, Criteria based Category, Rating, Criteria, Production Cycle, Life Stage (reuse/re purpose). Edge label: involves concept (twice). Annotations: +2 reach score (three times), +1 temporal score (three times). Total score: 10.]

Figure 5.8: The score calculation of Mood to Rating

The next example calculates the distance between Film Festival and Prestige. Prestige can be

acquired from being involved in a film festival but that prestige is dependent on multiple

factors. For a new film maker, acceptance into any film festival with a barrier to entry is a

matter of prestige. For an established film maker even the top film festivals do not confer

prestige upon them, rather they add to the prestige of the film festival by agreeing to take part

in it. For example, the fact that director Steven Spielberg agreed to premiere his movie at the

Cannes Film Festival reinforces the stature of Cannes as the world’s leading film festival.

Steven Spielberg himself is so famous that no film festival can add to his prestige, though

being refused the right to premiere his new movie at Cannes could possibly lead to questions

and a slight reduction in his prestige as that would imply his latest film might be sub-par. By

the same token, the prestige of Cannes Film Festival and that of Steven Spielberg is such that

by refusing the right to premiere his new movie at Cannes, the Film Festival itself can run the

risk of having its own prestige lessened.

It is a complex relationship but the point is that while film festivals and prestige are linked,

the link is by no means obvious or explicit. However, what is explicit is the link between

Film Festival and Award. Film Festivals give awards to those who participate in said

festivals. The award can take the form of an actual judged award that confers either a

monetary reward or simply a prestigious trophy, such as the Palme d'Or award for Cannes

Film Festival that does not have any monetary reward associated with it, or be implicit simply

through participation. Therefore, the link between Prestige and Film Festivals exists through

the concept of Award and specifically the Prestige Award, as shown in Figure 5.9. The total

score of 16 is reached primarily by reaching Prestige through the two weak links “is

associated with” and “involves concept”. The score reflects the weak but still somewhat close

association between Film Festival and Prestige. The score is high enough to reflect that

Prestige is not indispensably associated with Film Festival but still low enough to reflect that

part of the function of Film Festivals is to confer Prestige.

[Figure 5.9 diagram: Film Festival linked to Prestige via Prestige Award; surviving labels: "involves concept", +5 reach score, +2 reach score, +3 temporal score, +1 temporal score, Production Cycle, Life Stage (reuse/re-purpose); Total Score: 16]

Figure 5.9: The score calculation of Film Festival to Prestige

The next example of calculation demonstrates the linkage of two concepts through an abstract

root concept. In this case the two concepts being linked are Score and Prop, which are linked

through Artifact as illustrated in Figure 5.10. Though they are mostly in the same temporal

phase, the temporal score is largely eclipsed due to the high reach score. The high reach score

is a result of the workings of inheritance axis rule 3 (Section 5.2.1.1) which assigns a weight

of 25 to the edges connecting Score to Artifact and Prop to Artifact due to Artifact being a

root concept and as a result, representing the highest level of abstraction.

This is consistent with observed behaviour in the industry. When the reach between concepts

is small, the temporal score can add a layer of context. However, if the reach score indicates

that the two concepts are very distant then their temporal position becomes largely

meaningless. What does it matter if two concepts are in the same phase when they are so

clearly far apart within the context of the domain ontology?

To reiterate, when the reach is small, the temporal context becomes significant and adds a

layer of meaning. If two concepts are closely associated but temporally distant, the temporal score may

give information on sequencing or other information that adds a dimension of meaning and

context. However, if the concepts are far apart within the domain ontology then the temporal

context has no bearing because the concepts are too unrelated.

[Figure 5.10 diagram: Score and Prop linked through Artifact; surviving labels: +25 Reach Score, +1 temporal score, +3 temporal score, Production Cycle; Total Score: 54]

Figure 5.10: The score calculation of Score to Prop
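
The arithmetic behind Figure 5.10 can be sketched as follows. This is a minimal illustration that assumes the total score is the plain sum of the reach weights on the path and the temporal scores shown in the figure; the function name `total_score` is hypothetical and is not part of the Loculus implementation.

```python
def total_score(reach_weights, temporal_scores):
    """Add the edge weights on a path to the temporal scores attached to it.

    Assumption: the total is a simple sum, which is consistent with the
    labels that survive in Figure 5.10.
    """
    return sum(reach_weights) + sum(temporal_scores)


# Score -> Artifact and Prop -> Artifact each carry the root-concept
# weight of 25 (inheritance axis rule 3); the temporal scores are 1 and 3.
score_to_prop = total_score(reach_weights=[25, 25], temporal_scores=[1, 3])
print(score_to_prop)  # 54
```

The reach component (50) dominates the temporal component (4), which is the eclipsing effect described above.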

The last example calculation is between Method Acting and Camera. In this case the two

concepts have a very high total score, 66. This indicates that these two terms are not related at

all, save that they are both part of the motion picture domain. This is reflected in the fact that

the path to Camera involves going through the central concept of Motion Picture.

[Figure 5.11 diagram: path from Method Acting to Camera; surviving node labels: Method Acting, Acting, Action, Motion Picture, Tool, Hardware, Camera; edge label "is associated with" (twice); reach scores +2, +25, +5, +5, +25, +2; temporal scores +1, +1; Production Cycle; Total Score: 66]

Figure 5.11: The score calculation of Method Acting to Camera

The path in Figure 5.11 is, however, not the only way to get from Method Acting to Camera. Another

path exists through the agents and that demonstrates the need for proper weighting for

abstraction. This path, illustrated in Figure 5.12, involves going through Actor up the Agent

ontology inheritance hierarchy and then coming down through Cast. This leads to a higher

score of 69. The agent part of the ontology groups many different agents together under a

handful of abstract concepts. Through these abstract concepts all agents are linked to each

other and to the concepts to which they in turn are closely related. However, the jump between the abstract

concept of Technical Crew to an actual technical crew member such as a Cameraman is extremely

high, i.e. the number of properties inherited by the Cameraman concept from the Technical

Crew concept is extremely small. As a result, any concept to which the Cameraman concept

is connected through the Technical Crew concept cannot be said to be as related to the

Cameraman concept as concepts to which the Cameraman concept is linked through other

means. Although in this case both paths through the ontology produced similar reach scores,

this will not always be the case. The reach score that will be used to determine relatedness

will always be the reach score determined from the minimum weighted path.
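
A minimal sketch of selecting the reach score from the minimum weighted path is given below, using Dijkstra's algorithm as one standard way to find that path (the thesis does not state which search it uses). The first chain reuses the reach weights that survive in Figure 5.11, but the ordering of the nodes along it is an assumption. The weights on the agent chain are placeholders chosen only to make that path longer; they are not read from Figure 5.12. The helper names are hypothetical.

```python
import heapq


def undirected(edges):
    """Build an adjacency map from (node, node, weight) triples."""
    graph = {}
    for a, b, weight in edges:
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def min_weighted_path(graph, source, target):
    """Dijkstra's algorithm: return (cost, path), or (inf, []) if unreachable."""
    queue = [(0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []


ontology = undirected([
    # Chain through Motion Picture (weights as labelled in Figure 5.11,
    # node order assumed).
    ("Method Acting", "Acting", 2),
    ("Acting", "Action", 25),
    ("Action", "Motion Picture", 5),
    ("Motion Picture", "Tool", 5),
    ("Tool", "Hardware", 25),
    ("Hardware", "Camera", 2),
    # Chain through the agents (placeholder weights, for illustration only).
    ("Acting", "Actor", 1),
    ("Actor", "Agent", 25),
    ("Agent", "Technical Crew", 25),
    ("Technical Crew", "Camera Operator", 25),
    ("Camera Operator", "Camera", 2),
])

reach_score, path = min_weighted_path(ontology, "Method Acting", "Camera")
print(reach_score)  # 64
print(path)
```

With these weights the search settles on the chain through Motion Picture; adding the two temporal scores of 1 gives the total of 66 reported for Figure 5.11.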

[Figure 5.12 diagram: path from Method Acting to Camera through the agent hierarchy; surviving node labels: Method Acting, Acting, People Agenttype, Technical Crew, CameraOperator, Camera; edge labels "is performed by", "is operated by"; reach scores +2, +1, +25, +5, +25, +2; temporal scores +1, +1; Production Cycle; Total Score: 69]

Figure 5.12: The score calculation of Method Acting to Camera through Agents

5.3 Abstraction

We have discussed how concepts share, or do not share, properties along the inheritance

hierarchy and how certain concepts have more properties than other concepts. Concepts with

few properties are considered abstract and, because many concrete concepts can inherit from

a given abstract concept, the problem of abstraction in semantic relatedness is created. As

identified by Li, abstraction is a known problem in the measurement of semantic relatedness

[78]. As explained in Section 2.3.2, Li observed that certain concepts are more abstract than

others [78] in a hierarchical semantic knowledge base and if this abstraction is not taken into

account, concepts that are not actually closely related can appear closely related when a

measure of their relatedness is calculated [78].

Within existing measures of semantic relatedness, abstraction is a problem because of

hierarchical taxonomies where certain concepts are parents of many sub-concepts which

specialise the parent concept to a great extent. As we explained through the concept of

Action, the domain of motion picture has many types of Action. Editing is an Action, Acting is

an Action, Directing is an Action, Producing is an Action etc. However, they are all very

different types of action and, while they are all Actions, they do not share very much beyond

that. This makes Action a very abstract concept that must be accounted for properly.

The ability to account for the difference in shared properties along the

inheritance hierarchy is necessary to avoid false assumptions of closeness. We believe that

our score-based measure is better able to reflect semantic relatedness in the presence of

abstraction. The key benefit is that, under our system, any concept can be labelled as

abstract at the point of implementation and thus specifically accounted for. It does not rely on

position within the trees or concentration of links, although it can work in conjunction with

any such measure, where such a measure is used to first determine a concept to be abstract

before applying the score. In this way it can help overcome some of the abstraction problem

that many face when calculating semantic relatedness.

What might need to change when our method is applied to other ontologies is the weight

assigned to edges. The weights were chosen based on the Loculus Ontology for the motion

picture industry and took into account the structure and axioms of the Loculus Ontology. For

other ontologies, the weight of connecting edges must reflect the structure and axioms of

that ontology. The rules would still apply, but the weights assigned

by those rules might have to change depending on the ontology to which the rules are

applied.
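
One way to picture this separation of rules from weights is the sketch below. It assumes that abstraction is recorded as an explicit label on a concept and that the weights live in a per-ontology profile. The value 25 is the root-concept weight given by inheritance axis rule 3; the value 2 is read from the figure labels and is an assumption here; `WeightProfile`, `inheritance_weight` and the second profile are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightProfile:
    """Weights for one ontology; the rule that selects among them is fixed."""
    abstract_parent: int   # inheritance edge into a concept labelled abstract
    concrete_parent: int   # inheritance edge into an ordinary parent


def inheritance_weight(parent, abstract_concepts, profile):
    """Apply the same rule to any ontology, using that ontology's weights."""
    if parent in abstract_concepts:
        return profile.abstract_parent
    return profile.concrete_parent


# Labelled at implementation time, independent of tree position.
abstract_concepts = {"Action", "Artifact"}

loculus_like = WeightProfile(abstract_parent=25, concrete_parent=2)
other_ontology = WeightProfile(abstract_parent=40, concrete_parent=3)  # hypothetical

# Editing and Acting are siblings under the abstract concept Action, so the
# path between them crosses two heavily weighted edges.
editing_to_acting = 2 * inheritance_weight("Action", abstract_concepts, loculus_like)
print(editing_to_acting)  # 50
print(inheritance_weight("Acting", abstract_concepts, loculus_like))  # 2
print(inheritance_weight("Action", abstract_concepts, other_ontology))  # 40
```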

5.4 Application of the Metric

In this chapter so far we have presented the relatedness metric and have shown how the

metric has been applied to the ontology to determine the relatedness of pairs of ontological

concepts. In this section we will present how we plan to use the combination of the Loculus

ontology and the metric to answer the research questions posed in Chapter 3, specifically the

question - Can agent perspective be exploited to better serve the needs of the motion picture

industry?

When an agent is involved, the relatedness metric yields the perspective of the agent. The

relationship can be expressed as: Perspective_agent(concept) = distance(agent, concept).

As illustrated in Figure 5.13, by measuring the semantic relatedness distance between the

agent and the concept we can get an idea of how familiar or not familiar the agent is with a

certain concept. From the figure we see that an agent occupying the role of Editor has a very

close perspective of the concept Cross-Cutting, while an Actor has a far more distant

perspective. Cross-Cutting is an editing technique; an editor would know all about it and

exactly how to do it. An actor may only be familiar with the term and have some idea of what

it entails but their knowledge of the technique is far weaker than that of an editor.

Differences in perspective can have implications for the information needs of different agents.

[Figure 5.13 diagram: distances from agents to the concept Cross-cutting; surviving labels: Cross-cutting, Editor, 6, 58]

Figure 5.13: Agent Perspective

It must be noted that there is a difference between when the distance is measured between an

agent and a concept and between two non-agent concepts. The difference is that non-agent

concepts cannot have perspectives. As such when the relatedness is measured between two

non-agent concepts, it is not perspective but ontological distance. The ontological distance

between concepts and the perspective of agents need to be taken together to understand the

implications they have on the information-seeking behaviour of the agent involved.

One use of agent perspective is that it can approximate how precisely or imprecisely worded

queries pertaining to information exploitation are, and whether a substitution is possible.

Figure 5.14 demonstrates the perspective of the Editor to four different concepts. Note the

distance of the Editor to each of the concepts and the distance of the concepts themselves to

each other. Even though Cross Cutting and Cutting on Action have a semantic score similar to

that of Category and Genre, the Editor has a closer perspective of Cross Cutting and

Cutting on Action than of Category and Genre. This implies that the Editor comprehends

small differences between Cross Cutting and Cutting on Action. However, the Editor is less

likely to be overly concerned about the differences between Category and Genre; therefore

substituting one for the other may be appreciated by the Editor.

[Figure 5.14 diagram: the Editor's perspective of four concepts; surviving labels: Category, Cross-cutting, Cutting on Action, Editor]

Figure 5.14: Editor Perspective

The above conjecture leads us to formulate an axiom regarding the application of user driven

perspective for information extraction and classification. The axiom is given below.

[Perspective 1] If the distance between two concepts is sufficiently low, an agent who is

sufficiently distant from both concepts may consider the two concepts to be substitutable.

This axiom is a cornerstone of how the combination of the ontology and the relatedness

metric is to be used to meet the information needs of the motion picture industry.

However, the axiom also raises the question of what it means when the relatedness score of two

concepts is designated to be “sufficiently low” or when an agent is

designated to be “sufficiently distant” from a concept. This is defined by the notion of Tolerance. We define

Tolerance as the maximum score the user is willing to tolerate when concept substitutions

are suggested. Also, the term Tolerance applies to the minimum distance the user needs to be from a concept

before they can be considered sufficiently distant from it. Tolerance is something

that is implemented at the system level and we did just that with our system. We will discuss

Tolerance and its use at the system level in Section 6.6.1.
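
The [Perspective 1] axiom and the two Tolerance thresholds can be sketched as a simple predicate. This is an illustration under stated assumptions, not the system described in Section 6.6.1: the function names, the threshold parameters `max_score` and `min_agent_distance`, and every number in the score table are hypothetical.

```python
def make_distance(scores):
    """Build a symmetric lookup from a table of pairwise relatedness scores."""
    def distance(a, b):
        return scores[frozenset((a, b))]
    return distance


def substitutable(agent, first, second, distance, max_score, min_agent_distance):
    """[Perspective 1]: the concepts are close and the agent is far from both."""
    concepts_close = distance(first, second) <= max_score
    agent_distant = min(distance(agent, first),
                        distance(agent, second)) >= min_agent_distance
    return concepts_close and agent_distant


# Hypothetical scores, chosen only to mirror the shape of Figure 5.14.
HYPOTHETICAL_SCORES = {
    frozenset(("Cross Cutting", "Cutting on Action")): 4,
    frozenset(("Category", "Genre")): 4,
    frozenset(("Editor", "Cross Cutting")): 6,
    frozenset(("Editor", "Cutting on Action")): 6,
    frozenset(("Editor", "Category")): 30,
    frozenset(("Editor", "Genre")): 30,
}
distance = make_distance(HYPOTHETICAL_SCORES)

print(substitutable("Editor", "Category", "Genre", distance, 5, 20))  # True
print(substitutable("Editor", "Cross Cutting", "Cutting on Action", distance, 5, 20))  # False
```

Both concept pairs are equally close to each other, yet only the pair the Editor views from a distance is offered for substitution.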

5.4.1 Information Extraction

Perspective is mostly evident in the extraction of information based on a given query,

because a user formulates a query based on, per the dictionary definition of perspective, the

state of their ideas and the facts known to them. Similarly the user wants an answer framed

from that perspective as well. To that end, for information extraction, perspective can be used

in three ways: to expand the parameters of the query, to clarify ambiguity in the query

and, lastly, to determine the relevance of a given piece of information.

The expansion of the parameters of the query becomes desirable when the user mentions a

concept in their query from which the user is sufficiently distant so that other concepts can be

considered to be substitutable for the required concept. For example, an editor submits the

query “Return me examples of Cross-cutting in motion pictures of the genre dark comedy”.

Referring back to Figure 5.14, the Editor has a near view of the concept Cross-cutting, as

illustrated in Figure 5.15, so there is no point in retrieving examples of any other editing

technique. On the other hand, the Editor has a sufficiently distant relationship from Genre so

that some substitution may be possible.

[Figure 5.15 diagram: ontology extract; surviving node labels: action, editing, editor, cross-cutting, cutaway, cutback; edge labels "inherits from", "is performed by", "jointly inherit from"]

Figure 5.15: Ontology Extract – Editor and cross-cutting

As illustrated in Figure 5.16, if a motion picture has a Genre of “comedy” and has “dark” in

its Category information, then the Genre and the Category can be taken together and the

motion picture presented as a candidate that might satisfy the editor’s query. The central idea

here is that the editor might not realize that there is no such genre as “dark comedy”, but

there is a Genre “comedy” and Tone “dark” (Tone being a child of the concept Category). On

the other hand, a Distribution Manager would understand this distinction and could

reformulate the query themselves if they really wanted to expand the search.
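
The "dark comedy" expansion can be sketched as follows. This is a minimal illustration that assumes each motion picture record carries a single Genre value and a set of Category terms, and that the decision on whether the agent is sufficiently distant from Genre has already been made elsewhere. The function name, the record layout and the film records are all hypothetical.

```python
def expand_genre_query(phrase, known_genres, films, agent_is_distant):
    """Match a genre phrase exactly, or split it across Genre and Category."""
    phrase = phrase.lower()
    exact = [film for film in films if film["genre"] == phrase]
    if exact or not agent_is_distant:
        # A close agent is assumed to mean exactly what they asked for.
        return exact
    words = phrase.split()
    genre_words = [word for word in words if word in known_genres]
    other_words = [word for word in words if word not in known_genres]
    return [
        film for film in films
        if film["genre"] in genre_words
        and all(word in film["category"] for word in other_words)
    ]


# Hypothetical records for illustration only.
FILMS = [
    {"title": "Film A", "genre": "comedy", "category": {"dark"}},
    {"title": "Film B", "genre": "comedy", "category": {"light"}},
    {"title": "Film C", "genre": "drama", "category": {"dark"}},
]
KNOWN_GENRES = {"comedy", "drama"}

for_editor = expand_genre_query("dark comedy", KNOWN_GENRES, FILMS, agent_is_distant=True)
for_manager = expand_genre_query("dark comedy", KNOWN_GENRES, FILMS, agent_is_distant=False)
print([film["title"] for film in for_editor])   # ['Film A']
print([film["title"] for film in for_manager])  # []
```

The distant agent receives the combined Genre and Category match, while the close agent receives no expansion and would have to reformulate the query themselves.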

[Diagram residue; surviving labels: Category, Emotive Category, Emotion, Criteria-based Category, Criteria, Rating, genre, "Disjointed With", "involves concept" (twice), +2 reach score (three times), +1 temporal score (three times), Production Cycle, Life Stage (reuse/re-purpose), post production, discovery; total score: 10]

Figure 7.2: The Score Calculation of Mood to Rating (Reusing Figure 5.8)

At first glance, the problem seems to be the improper abstraction of the concept of Category

and this was an error in judgement on our part that is obvious when we look at the children of

the Category. While Emotive Category and Criteria-based Category are both types of Category, they are

completely disjointed and are modelled as such in the ontology as shown in Figure 7.3. This

suggests that a rule is necessary to reflect the disjointedness of concepts, such as Emotive

Category and Criteria-based Category. However, the matter is not quite that simple because

the predicted score for Mood and Genre did correspond with that selected by the survey

participants, in that the majority of the participants agreed that the relationship between Mood

and Genre was close; the majority gave it a score of 1 on the scale of 1 to 5. This is

interesting because Genre is the same distance away from Mood as Rating, as

shown in Figure 7.3. This adds another dimension to this issue and might actually be an

indication that both the ontology and the calculating rules are fine and the anomaly is actually

due to loss of context because we simply presented the word “rating”.

[Diagram residue; surviving labels: "Disjointed With", genre, post production, discovery]
