vineet gandhi, vaishnavi ameya murukutla, arxiv:1508 ... · the prose storyboard language a tool...

18
The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD * , Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, France VINEET GANDHI, International Institute of Information Technology LAURENT BOIRON, Weta Digital VAISHNAVI AMEYA MURUKUTLA, Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, France ACM Reference format: Remi Ronfard, Vineet Gandhi, Laurent Boiron, and Vaishnavi Ameya Murukutla. 2016. e Prose Storyboard Language. 1, 1, Article 1 (January 2016), 18 pages. DOI: 10.1145/nnnnnnn.nnnnnnn 1 INTRODUCTION In movie production, directors oen use a semi-formal idiom of natural language to convey the shots they want to their cinematographer. Similarly, lm scholars use a semi-formal idiom of natural language to describe the visual composition of shots in produced movies to their readers. In order to build intelligent and expressive virtual cinematography and editing systems, we believe the same kind of high-level descriptions need to be agreed upon. In this paper, we propose a formal language that can serve that role. Our primary goal in proposing this language is to build soware cinematography agents that can take such formal descriptions as input, and produce fully realised shots as an output [19, 39]. A secondary goal is to perform in-depth studies of lm style by automatically analysing real movies and their scripts in terms of the proposed language [38, 40]. e prose storyboard language is a formal language for describing shots visually. We leave the description of the soundtrack for future work. e prose storyboard language separately describes the spatial structure of individual movie frames (compositions) and their temporal structure (shots). In lm analysis, there is a frequent confusion between shots and compositions. A medium shot describes a composition, not a shot. If the actor moves towards the camera in the same shot, the composition will change to a close shot and so on. erefore, a general language for describing shots cannot be limited to describing compositions such as medium shot or close shot but should also describe screen events which change the composition during the shot. In the prose storyboard language, each shot is a complete sentence with at least one composition and an arbitrary number of screen events. is oers unprecedented expressive power for describing and directing movies. * is is the corresponding author Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for prot or commercial advantage and that copies bear this notice and the full citation on the rst page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permied. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specic permission and/or a fee. Request permissions from [email protected]. © 2016 ACM. Manuscript submied to ACM Manuscript submied to ACM 1 arXiv:1508.07593v3 [cs.GR] 13 Dec 2019

Upload: others

Post on 22-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

The Prose Storyboard Language

A Tool for Annotating and Directing Movies

REMI RONFARD∗, Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, France

VINEET GANDHI, International Institute of Information Technology

LAURENT BOIRON, Weta Digital

VAISHNAVI AMEYA MURUKUTLA, Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, France

ACM Reference format:Remi Ronfard, Vineet Gandhi, Laurent Boiron, and Vaishnavi Ameya Murukutla. 2016. �e Prose Storyboard Language. 1, 1, Article 1(January 2016), 18 pages.DOI: 10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION

In movie production, directors o�en use a semi-formal idiom of natural language to convey the shots they want to theircinematographer. Similarly, �lm scholars use a semi-formal idiom of natural language to describe the visual compositionof shots in produced movies to their readers. In order to build intelligent and expressive virtual cinematographyand editing systems, we believe the same kind of high-level descriptions need to be agreed upon. In this paper, wepropose a formal language that can serve that role. Our primary goal in proposing this language is to build so�warecinematography agents that can take such formal descriptions as input, and produce fully realised shots as an output[19, 39]. A secondary goal is to perform in-depth studies of �lm style by automatically analysing real movies and theirscripts in terms of the proposed language [38, 40].

�e prose storyboard language is a formal language for describing shots visually. We leave the description of thesoundtrack for future work. �e prose storyboard language separately describes the spatial structure of individualmovie frames (compositions) and their temporal structure (shots).

In �lm analysis, there is a frequent confusion between shots and compositions. Amedium shot describes a composition,not a shot. If the actor moves towards the camera in the same shot, the composition will change to a close shot and soon. �erefore, a general language for describing shots cannot be limited to describing compositions such as medium

shot or close shot but should also describe screen events which change the composition during the shot. In the prosestoryboard language, each shot is a complete sentence with at least one composition and an arbitrary number of screenevents. �is o�ers unprecedented expressive power for describing and directing movies.

∗�is is the corresponding author

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are notmade or distributed for pro�t or commercial advantage and that copies bear this notice and the full citation on the �rst page. Copyrights for componentsof this work owned by others than ACM must be honored. Abstracting with credit is permi�ed. To copy otherwise, or republish, to post on servers or toredistribute to lists, requires prior speci�c permission and/or a fee. Request permissions from [email protected].© 2016 ACM. Manuscript submi�ed to ACM

Manuscript submi�ed to ACM 1

arX

iv:1

508.

0759

3v3

[cs

.GR

] 1

3 D

ec 2

019

Page 2: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

2 R. Ronfard et al.

Our language can be used indi�erently to describe shots in pre-production (when the movie only exists in thescreen-writer and director’s minds), during production (when the camera records a continuous ”shot” between the timeswhen the director calls ”camera” and ”cut”), in post-production (when shots are cut and assembled by the �lm editor) orto describe existing movies. �e description of a entire movie is an ordered list of sentences, one per shot. Exceptionally,a movie with a single shot, such as Rope by Alfred Hitchcock, can be described with a single, long sentence.

In this paper, we assume that all shot descriptions are manually created. We leave for future work the importantissue of automatically generating prose storyboards from existing movies, where a number of existing techniques canbe used [7, 12, 20, 23]. We also leave for future work the di�cult problems of automatically generating movies fromtheir prose storyboards, where existing techniques in virtual camera control can be used [9, 10, 16, 17, 19, 24, 26].

2 PRIOR ART

Our language is loosely based on existing practices in movie-making [45, 46] and previous research in the history of �lmstyle [5, 42]. Our language is also related to the common practice of the graphic storyboard. In a graphic storyboard,each composition is illustrated with a single drawing. �e blocking of the camera and actors can be depicted with aconventional system of arrows within each frame, or with a separate set of �oor plan views, or with titles betweenframes.

In our case, the transitions between compositions use a small vocabulary of screen events including camera actions(pan, dolly, crane, lock, continue) and actor actions (speak, react, move, cross, use, touch). Although the vocabularycould easily be extended, we voluntarily keep it small because our focus in this paper is restricted to the blocking ofactors and cameras, not the high-level semantics of the narrative.

We borrow the term prose storyboard from Proferes [36] who used it as a technique decomposing a �lms script intoa sequence of shots, expressed in natural language �e name catches the intuition that the language should directlytranslate to images. In contrast to Proferes, our prose storyboard language is a formal language, with a well de�nedsyntax and semantics, suitable for future work in intelligent cinematography and editing.

Our proposal is complementary to the Movie Script Markup Language (MSML) [37], which encodes the structure ofa movie script. In MSML, a script is decomposed into dialogue and action blocks, but does not describe how each blockis translated into shots. Our prose storyboard language can be used to describe the blocking of the shots in a movie inrelation to an MSML-encoded movie script.

Our proposal is also related to the Declarative Camera Control Language (DCCL) which describes �lm idioms, not interms of cameras in world coordinates but in terms of shots in screen coordinates [9]. �e DCCL is compiled into a �lmtree, which contains all the possible editings of the input actions, where actions are represented as subject-verb-objecttriples. Our prose storyboard language can be used in coordination with such constructs to guide a more extensive setof shot categories, including complex and composite shots.

Our approach is also related to the work of Jhala and Young who used the movie Rope by Alfred Hitchcock todemonstrate how the story line and the director’s goal should be represented to an automatic editing system [26]. �eyused Crossbow, a partial order causal link planner, to solve for the best editing, according to a variety of strategies,including maintaining tempo and depicting emotion. �ey demonstrated the capability of their solver to present thesame sequence in di�erent editing styles. But their approach does not a�empt to describe the set of possible shots. Ourprose storyboard language a�empts to �ll that gap.Manuscript submi�ed to ACM

Page 3: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

�e Prose Storyboard Language 3

Other previous work in virtual cinematography [15, 16, 19, 25, 27, 31, 35, 44] has been limited to simple shots witheither a static camera or a single uniform camera movement. Our prose storyboard language is complementary to suchprevious work and can be used to de�ne higher-level cinematic strategies, including arbitrarily complex combinationsof camera and actor movements, for most existing virtual cinematography systems.

Other related work includes Ye and Baldwin [50] who describe method for generating storyboards from moviescripts; Marti et al. [32] who describe methods for generating previz animation, also from movie scripts.

Text-to-movie (or text-to-scene) authoring is a general class of methods that have been proposed for automaticallygenerating 3D graphics and animation from natural language text. Good results have been obtained in limited domains,such as generating 3D scenes from natural language accident reports [1] or generating cartoon animation from scripteddialogue scenes [43]. Commercially available text-to-movie systems include NawmalMAKE 1 and Plotagon Studio2. Such systems use marked-up dialogues as input, and generate simple shots with minimal camera movements andediting.

Closer to our approach, Director Notation [49] is a symbolic language intended to express the content of �lm (motionpictures), much as notes provide a language for the writing of music. But DN is a graphical notation whereas PSL is apseudo-natural language, and DN describes the movie production process, whereas PSL describes the movie itself, as ina storyboard. TIMISTO [47] and SLAP [6] are pa�ern languages for creating animation from storyboards. Finally, thevideo language [2, 13] is a special-purpose language for scripting repetitive tasks in video editing wri�en in the Racketprogramming language [14]. �e video language operates at the level of video �les and frames, and does not include adescription of the shot content, which we are targe�ing here.

A previous version of the prose storyboard language was presented at the international workshop on intelligencecinematography and editing (WICED) in 2012. Since then, PSL has been used for generating cinematic replays in seriousgames [19], for generating synthetic complex shots from live video material [21, 22], for staging complex scenes in 3Danimation [29], for directing cinematographic drones [18] and for learning �lm editing pa�erns from examples [48].

3 REQUIREMENTS

�e prose storyboard language is designed to be expressive, i.e. it should describe arbitrarily complex shots, while at thesame being compact and intuitive. Our approach has been to keep the description of simple shots as simple as possible,while at the same time allowing for more complex descriptions when needed. �us, for example, we describe actors in acomposition from le� to right, which is an economical and intuitive way of specifying relative actor positions in mostcases. As a result, our prose storyboard language is very close to natural language (see Fig.21).

It should be easy to parse the language into a non ambiguous semantic representation that can be matched tovideo content, either for the purpose of describing existing content, or for generating novel content that matches thedescription. It should therefore be possible (at least in theory) to translate any sentence in the language into a sketchstoryboard, then to a fully animated sequence.

It should also be possible (at least in theory) to translate existing video content into a prose storyboard. �is putsanother requirement on the language, that it should be possible to describe existing shots just by watching them. �ereshould be no need for contextual information, except for place and character names. As a result, the prose storyboardlanguage can also be used as a tool for annotating complete movies and for logging shots before post-production. Since

1h�ps://www.nawmal.com/2h�ps://plotagon.com/

Manuscript submi�ed to ACM

Page 4: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

4 R. Ronfard et al.

Scene

Shot

MLS � 34right

Composition

cut to

Transition

Shot

low angle FS Plane �ying MS � front running

Composition

Fig. 1. Prose storyboard language description of two iconic shots in Al�red Hitchcock’s North by Northwest

the annotator has no access to the details of the shooting plan, even during post-production [33, 34], we must thereforemake it possible to describe the shots in screen coordinates, without any reference to world coordinates.

4 SYNTAX AND SEMANTICS

�e prose storyboard language is a context-free language, whose terminals include generic and speci�c terms. Genericterminals are used to describe the main categories of screen events including camera actions (pan, dolly, cut, dissolve,etc.) and actor actions (enter, exit, cross, move, speak, react, etc.). Speci�c terminals are the names of characters, placesand objects that compose the image and play a part in the story. Non-terminals of the language include importantcategories of shots (simple, complex, composite), sub-shots and image compositions. �e complete grammar for thelanguage is presented in Fig.16 in extended BNF notation and further illustrated with the AND/OR graph in Fig. 15.

�e semantics of the prose storyboard language is best described in terms of a Timed Petri Net (TPN) where actorsand objects are represented as places describing the composition ; and screen events (such as cuts, pans and dollies) arerepresented as Petri net transitions. TPNs have been proposed for representing the temporal structure of movie scripts[37], character animation [4, 30], game authoring [3], turn-taking in conversation [8] and synchronisation and storagemodels for multimedia systems [28]. Our model di�ers from the Object Composition Petri Net (OCPN) [28] by havingnon instantaneous �ring at transitions.

5 IMAGE COMPOSITION

Image composition is the way to organise visual elements in the motion picture frame in order to deliver a speci�cmessage to the audience. In our work, we propose a formal way to describe image composition in terms of the actorsand objects present on the screen and the spatial and temporal relations between them.

Following �omson and Bowen [46], we de�ne a composition as the relative position and orientation of visualelements called Subject, in screen coordinates.Manuscript submi�ed to ACM

Page 5: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

�e Prose Storyboard Language 5

(a) Shot sizes (b) Pro�le angle

Fig. 2. (a) shows shot sizes in the prose storyboard (reproduced from [41]). (b) shows the profile angle of an actor defines hisorientation relative to the camera. For example, an actor with a le� profile angle is oriented with his le� side facing the camera.

(a) Basis for shot size (b) Screen coordinates

Fig. 3. Shot size is a function of the distance between the camera and actors,as well as the camera focal length,as seen in (a). (b)shows the horizontal placement of actors in a composition is expressed in screen coordinates.

In the simple case of �at staging, all subjects are more or less in the same plane with the same size. �us, we candescribe this �at composition as follows:

<S i z e > on <S u b j e c t > [< P r o f i l e >][< Screen >]{ and <S u b j e c t > [< P r o f i l e >][< S c r ee n ] } ∗

where Size is one of the classical shot size illustrated in Fig.2(a) and Subject is generally an actor name. �e Pro�le

and Screen terms describe respectively the orientation and the position of the subject in screen space. See full grammarin Fig.16 for more details.

Manuscript submi�ed to ACM

Page 6: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

6 R. Ronfard et al.

(a) low angle ECU Girl 34le�

Fig. 4. Single actor composition in Prose Storyboard Language. This frame from Brian De Palma’s Dressed to Kill(1980) shows anextreme close up shot of the Girl from a low angle.

(a) FS Cyd 34right Fred front (b) MS Girl 34backright Ferdinand front

Fig. 5. Two actor composition in Prose Storyboard Language. Compositions from Vincente Minnelli’s 1953 musical, The Band Wagon (a)and Jean-Luc Godard’s 1965 French New Wave film, Pierrot le Fou (b) feature two actors in a frame at di�erent sizes.

In the case of deep staging, di�erent subjects are seen at di�erent sizes, in di�erent planes. We therefore describesuch shot with a stacking of �at compositions as follows:

<F l a t C o m p o s i t i o n > { , <F l a t C o m p o s i t i o n >}∗

Manuscript submi�ed to ACM

Page 7: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

�e Prose Storyboard Language 7

(a) ECU Scissors MCU Marianne front (b) CU Sanchez hands front as hands hold Bomb

Fig. 6. Composition with inanimate objects in Prose Storyboard Language. (a) and (b) show the inclusion of inanimate objects in thecomposition. Both the objects in the compositions, a scissors from Pierrot le Fou and the bomb from Orson Welles’s Touch of Evil, playimportant roles in the movies and justify their inclusion in the composition.

(a) MLS Father 34right screen le� ELS Kane 34le� screen centerMS �atcher Mother 34le� screen right

(b) MCU Eve 34le� MS �ornhill 34right Vandam 34le� Leonard 34backle�

Fig. 7. Complex composition with multiple actors in Prose Storyboard Language. The frames (a) from Orson Welles’s 1941 Citizen Kaneand from (b) Alfred Hitchcock’s 1959 North by Northwest show multiple actor compositions at di�erent distances from the camera.They are described from le� to right with their sizes indicating their depth in the composition.

As a convention, we assume that the subjects are described from le� to right. �is means that the le�-to-rightordering of actors and objects is part of the composition. Indeed, because the le�-to-right ordering of subjects is soimportant in cinematography and �lm editing, we introduce a special keyword cross for the screen event of one actorcrossing over or under another actor.

Shot sizes are used to describe the relative sizes of actors independently of the camera lens as illustrated in Fig.3(a).�e Screen term describe the subject position in screen coordinate. It allows to slightly modify the generic framing

by shi�ing the subject position to the le� or right corner as shown in Fig.3(b). �us, we can describe a more harmoniouscomposition with respect to the head room and look room or the rules of thirds. We can also describe unconventionalframing to create unbalanced artistic composition or to show other visual elements from the scene.

Manuscript submi�ed to ACM

Page 8: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

8 R. Ronfard et al.

6 SHOT DESCRIPTIONS

A shot is a sequence of frames over a continuous time period. For a cohesive and coherent narration the individualshots have to be joined in a manner so as to allow the spectator to mentally recreate the story [11]. Transition describesthe progression of a shot to the subsequent one or the starting and ending of a shot. In our model we include two of themost widely used transition techniques:cut and fade. We use the simplest form of cut transition in which the two shotsare played one a�er another. We also use the same notation to describe other types of cuts, such as cutaways in whichshot A is followed by a intermi�ent shot with a di�erent composition and then returns back to shot A. (North by Northwest example.. In this we describe the example as three di�erent sentences). Fades are used to describe the entry or exitof a shot in which the composition either slowly appears or disappears respectively.

�e key feature of the language is that all shots are self contained entities. Starting from a transition to the �nalcomposition, including camera and script actions, each prose storyboard sentence is independent of the previous or thenext shot. A shot will not be set up by the one the preceding one. �is allows users to work on sections of the movie oron a per shot basis.

Based on the taxonomy of shots proposed by �omson and Bowen [46], our prose storyboarding language distin-guishes three main categories of shots :

• A simple shot is taken with a camera that does not move or turn. Any change in the composition is from themovements of the actors in relation to the camera.

• A complex shot is taken with a camera that has movements around a �xed point such as pan, tilt and zoom.We introduce camera actions pan and zoom to describe such movements. �us the camera can pan le� andright, up and down (as in a tilt) and zoom in and out.

• A composite shot is taken with a moving camera. We introduce two camera actions (dolly and crane) to describethese shots. Pan and zoom are allowed during dolly and crane movements thereby creating interesting visuale�ects.

For each of the three cases we propose a simpli�ed model of a shot which consists of a sequence of compositions,cues and screen events. Screen events can be actions of the camera relative to the actors, or actions of the actors relativeto the camera. Screen events come in two main categories - those which change the composition (events-to-composition)and those which maintain the composition (events-with-composition).

In our model, we can be in only one of four di�erent states:

(1) Camera does not move and composition does not change.(2) Camera does not move and composition changes due to actor movements.(3) Camera moves and composition does not change(4) Camera moves and composition changes.

In case (1), the shot can be described with a single composition. In case (2), we introduce the special verb continue

to indicate that the camera remains static while the actors move, leading to a new composition. In case (3), we usethe constructions pan with, dolly with and crane with to indicate how the camera moves to maintain the composition.In case (4), we use the constructions pan to, dolly to and crane to to indicate how the cameras moves to change thecomposition. �e actor movements that change the composition are stated in the cue.Manuscript submi�ed to ACM

Page 9: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

�e Prose Storyboard Language 9

Shot

Development

Recomposition

CU Mar 34le� Ge le�

Composition

Reframe

continue to

Rename

then as Mar appears screen center

Cue

CU Ge le�

Composition

Fig. 8. Simple shot with actor movement in Back to the future

Shot

Development

Recomposition

MS Mar 34right Ge front Go 34le�

Composition

Reframe

dolly right to

CameraMov

then as Go crosses under Ge

Cue

MS Mar Go 34right Ge 34le�

Composition

Fig. 9. Composite shot in Back to the future

Manuscript submi�ed to ACM

Page 10: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

10 R. Ronfard et al.

Shot

Development

Recomposition

MS � 34backright

Composition

Reframe

continue to

Rename

then as � turns away from camera

Cue

MS � 34 right

Composition

Fig. 10. Simple shot: with actor movement in North by Northwest

Shot

Development

Recomposition

high angle LS Mi 34backle�

Composition

Reframe

pan le� to

CameraMov

then as Mi moves

Cue

Development

Recomposition

high angle LS Mi Pat le�

Composition

Reframe

pan le� to

CameraMov

then as Pat moves

Cue

high angle LS Pat 34backle�

Composition

Fig. 11. Composite shot in Breathless

7 EXPERIMENTAL EVALUATION

As a validation of the proposed language, we have manually annotated extended scenes from existing movies coveringmany di�erent styles and periods. �e cafe scene in Back to the future (1985) focuses on dialogues between 8 characters.Manuscript submi�ed to ACM

Page 11: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

�e Prose Storyboard Language 11

Shot

Development

Recomposition

FS Dan back pedalling

Composition

Reframe

dolly with

CameraMov

then

Cue

FS Dan back pedalling

Composition

Fig. 12. Composite shot: Dolly with actor in The Shining

We are making the prose storyboard for all 29 shots in the entire scene available for future reference. Rope (1948),a single shot movie by Alfred Hitchcock, shows dialogue between 10 characters using precise blocking and cameramovements rather than employing cuts. �is is challenging example for annotation, and we show examples from twoextended sequences fully annotated with PSL. Results are shown in Fig. 21.

We also annotated the crop duster a�ack scene from North by Northwest (1959) to highlight the versatility of thelanguage in describing a scene with non human actors in an outdoor environment. In that scene the intent of thepilot is personi�ed in the movements of the plane. We annotated all 131 shots in this virtuoso scene with their prosestoryboards, to illustrate the variety of shots used in this mostly silent scene. Finally we annotated the long establishingshot from Orson Welles’s Touch of Evil (1958) which shows a wide variety of camera movements interlaced withmeticulously planned choreography for the characters resulting in a rich and dynamic visual composition. Despite thecomplexity of these scenes, we show that the prose storyboard is fairly simple to read and easy to generate.

Our results also illustrate how default values are used to describe simple cases. For instance, default values fora two-shot are supplied by assuming that the two actors are both facing the camera and placed at one-third andtwo-third of the screen. Such default values are stored in a style sheet, which can be used to accommodate di�erentcinematographic styles, i.e. television vs. motion pictures, or musical vs. �lm noir.

Manuscript submi�ed to ACM

Page 12: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

12 R. Ronfard et al.

Shot

Development

Recomposition

as Sh moves to screen top

Cue

high angle ELS Sh back

Composition

Reframe

crane up to

CameraMov

then

Cue

MS Sh le�

Composition

Fig. 13. Composite shot: Crane up in High noon

8 CONCLUSION

We have presented a language for describing the spatial and temporal structure of movies with arbitrarily complex shots.�e language can be extended in many ways, e.g. by taking into account lens choices, depth-of-�eld and lighting. Futurework will be devoted to the dual problems of automatically generating movies from prose storyboards in machinimaenvironments, and automatically describing shots in existing movies. We are also planning to extend our frameworkfor the case of stereoscopic movies, where image composition needs to be extended to include the depth and disparityof subjects in the composition. We believe that the proposed language can be used to extend existing approaches inintelligent cinematography and editing towards more expressive strategies and idioms and bridge the gap between realand virtual movie-making. In future work, it would be interesting to extend the language even further for the case ofpanoramic video and virtual reality movies.

REFERENCES[1] O. Akerberg, H. Svensson, B. Schulz, and P. Nugues. Carsim: An automatic 3d text-to-scene conversion system applied to road accident reports. In

Proceedings of the Tenth Conference on European Chapter of the Association for Computational Linguistics - Volume 2, EACL ’03, pages 191–194, 2003.[2] L. Andersen, S. Chang, and M. Felleisen. Super 8 languages for making movies (functional pearl). Proc. ACM Program. Lang., 1(ICFP):30:1–30:29,

Aug. 2017.[3] D. Balas, C. Brom, A. Abonyi, and J. Gemrot. Hierarchical petri nets for story plots featuring virtual humans. In AIIDE, 2008.[4] L. Blackwell, B. von Konsky, and M. Robey. Petri net script: a visual language for describing action, behaviour and plot. In Australasian conference

on Computer science, ACSC ’01, 2001.[5] D. Bordwell. On the History of Film Style. Harvard University Press, 1998.

Manuscript submi�ed to ACM

Page 13: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

�e Prose Storyboard Language 13

Shot…

Development

Recomposition

MS M right LS Kane 34backle� MS � le�

Composition

Reframe

dolly out to

CameraMov

then as M enters screen le�

Cue

LS Kane playing

Composition

Development

Recomposition

LS F MS M

Composition

Reframe

dolly out to

CameraMov

then as F enters screen le� and M crosses over �

Cue

Development

Recomposition

MLS F 34right ELS Kane 34le� MS � M 34le�

Composition

Reframe

pan down to

CameraMov

then as � F moves to M

Cue

Fig. 14. Composite shot with multiple actors in Citizen KaneManuscript submi�ed to ACM

Page 14: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

14 R. Ronfard et al.

Fig. 15. AND-OR tree representation of the Prose Storyboard Language grammar.

[6] P. H. C. Braga and I. F. Silveira. Slap: Storyboard language for animation programming. IEEE Latin America Transactions, 14(12):4821–4826, Dec2016.

[7] M. Brand. �e ”inverse hollywood problem”: From video to scripts and storyboards via causal analysis. In AAAI/IAAI, pages 132–137, 1997.[8] C. Chao. Timing multimodal turn-taking for human-robot cooperation. In Proceedings of the 14th ACM international conference on Multimodal

interaction, ICMI ’12, pages 309–312, New York, NY, USA, 2012. ACM.[9] D. B. Christianson, S. E. Anderson, L. wei He, D. H. Salesin, D. S. Weld, and M. F. Cohen. Declarative camera control for automatic cinematography.

In AAAI, 1996.[10] M. Christie and P. Olivier. Camera control for computer graphics. In Eurographics State of the Art Reports, Eurographics 2006. Blackwell, 2006.[11] J. E. Cu�ing. Narrative theory and the dynamics of popular movies. In Psychonomic Bulletin & Review, 2016.[12] R. Dony, J. Mateer, and J. Robinson. Techniques for automated reverse storyboarding. IEE Journal of Vision, Image and Signal Processing,

152(4):425–436, 2005.[13] M. Felleisen, R. B. Findler, M. Fla�, S. Krishnamurthi, E. Barzilay, J. McCarthy, and S. Tobin-Hochstadt. A programmable programming language.

Commun. ACM, 61(3):62–71, Feb. 2018.[14] M. Fla�. Creating languages in racket. Commun. ACM, 55(1):48–56, Jan. 2012.[15] D. Friedman and Y. A. Feldman. Automated cinematic reasoning about camera behavior. Expert Syst. Appl., 30(4):694–704, May 2006.[16] Q. Galvane, M. Christie, C. Lino, and R. Ronfard. Camera-on-rails: Automated Computation of Constrained Camera Paths. In ACM SIGGRAPH

Conference on Motion in Games, pages 151–157, Paris, France, Nov. 2015. ACM.[17] Q. Galvane, M. Christie, R. Ronfard, C.-K. Lim, and M.-P. Cani. Steering Behaviors for Autonomous Cameras. In MIG 2013 - ACM SIGGRAPH

conference on Motion in Games, MIG ’13 Proceedings of Motion on Games, pages 93–102, Dublin, Ireland, Nov. 2013. ACM.Manuscript submi�ed to ACM

Page 15: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

�e Prose Storyboard Language 15

Shot = Transit ion ? Composition? ( Development ) ∗Development = Cue? Recomposition ?Recomposition = Reframe? Composition? Cue?Reframe = Rename/CameraMovCue = After / As / ThenAfter = ” then ”? ” a f t e r ” Time? Event ?As = ” then ”? ” as ” Event ( ” and”? Event ) ∗Then = ” then” Event ? ( ” and”? Event ) ∗Composition = Angle ? Figure ( Angle ? Figure ) ∗Figure = S ize ? Subject Prof i l e ? Screen ? S ta te ?Subject = ( Actor / Object / Place ) ( Actor / Object / Place ) ∗Rename = ( ” keep” / ” continue ” ) ” to ”CameraMov = Camera ( ” to ” / ”with ” ) ?Camera = Speed ? ( Pan / Dolly / Crane / Zoom)Pan = ”pan” ( ” l e f t ” / ” r ight ” / ”up” / ”down” ) ?Dolly = ” dolly ” ( ” l e f t ” / ” r ight ” / ” in ” / ” out ” ) ?Crane = ” crane ” ( ” up” / ”down” ) ( ” l e f t ” / ” r ight ” ) ?Zoom = ”zoom” ( ” in ” / ” out ” )Speed = ” slow” / ” quick ” / ( ” following ” Subject )Transit ion = ( ” cut to ” / ” d i s so lve to ” / ” fade in to ” )Enter = ” enters ” Screen ? Place ?Exit = ” ex i t s ” Screen ? Place ?Look = ” looks ” ” at ”? Subject ? Screen ?Move = ”moves” ” to ” ( Screen / Subject )Speak = ( ” speaks ” / ( ” says ” Str ing ) ) ( ” to ” Subject ) ?Use = ” uses ” ObjectCross = ” crosses ” ( ” over ” / ”under ” ) Subject ( Subject ) ?Touch = ” touches ” SubjectReact = ” reac t s to ” SubjectTurn = ” turns ” ( ” l e f t ” / ” r ight ” / ” towards camera ” / ” away from camera ” )Stop = ” stops ” ( ” at ” / ”near ” ) SubjectAngle = ”low angle ” / ”high angle ”S ize = ”ECU” / ”CU” / ”MCU” / ”MS” / ”MLS” / ”FS” / ”LS” / ”ELS”Prof i l e = ” l e f t ” / ” r ight ” / ” front ” / ”back” / ”34 l e f t ” / ”34 r ight ”

/ ”34 backlef t ” / ”34 backright ”Screen = ” screen ” ( ” top ” / ”bottom ” ) ? ( ” l e f t ” / ” center ” / ” r ight ” ) ?Time = ˜ r ”[0 −9] ∗” i ( ” seconds ” / ” s ” )S tr ing = ˜ r ”[A−Z 0−9]∗” iActor = From Scr ip tAction = Look / Move / Speak / Use / Cross / Touch / React

/ Enter / Exit / Turn / StopSta te = ” looking ” / ”moving” / ” speaking ” / ” using ” / ” cross ing ” / ” touching ”

/ ” react ing ” / ” entering ” / ” ex i t ing ” / ” turning ”Event = Subject ActionObject = From Scr ip tPlace = From Scr ip tspace = ˜ r ”\ s ”

= ( space ) ∗

Fig. 16. EBNF grammar of the prose storyboard language.

Manuscript submi�ed to ACM

Page 16: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

16 R. Ronfard et al.

Actor = ” Thornhill ” / ”MBS” / ”CD” / ”TD1” / ”TD2” / ”Farmer” /”BC Driver ” / ”BC Woman” / ”BC Man”

Object = ”Bus” / ”White car ” / ”Limo” / ”Truck” / ”Blue car ” /”Green bus” / ” Bluewhite car ” / ” Oil truck ” / ”Pickup”/ ”Brown car ”

Place = ”Arid plot 1” / ”Arid plot 2” / ”Arid plot 3” /”Corn f i e l d ” / ”Highway” / ” Dirt road ”

Fig. 17. Script elements for North by Northwest.

Actor = ”Brandon” / ” Phi l ip ” / ”Atwater ” / ” Janet ” /”Kentley ” / ”Kenneth” / ”Rupert ” / ”Wilson”

Object = ”Glass ”Place = ” Salon ” / ”Dining room” / ”Kitchen ”

Fig. 18. Script elements for Rope.

Actor = ”Marty” / ”George” / ” Bi f f ” / ”Lou” / ”Goldie ”/ ”Match” / ” Skinhead” / ”hands” / ”3D”

Object = ”Coffee ” / ”Bar” / ”Car”Place = ”Cafe ”

Fig. 19. Script elements for Back to the future.

Actor = ”Kane” / ”Mother” / ”Thatcher ” / ” Father ”Object = ”Papers ” / ”Window”Place = ”Outside ”

Fig. 20. Script elements for Citizen Kane.

[18] Q. Galvane, C. Lino, M. Christie, J. Fleureau, F. Servant, F.-l. Tariolle, and P. Guillotel. Directing cinematographic drones. ACM Trans. Graph.,37(3):34:1–34:18, July 2018.

[19] Q. Galvane, R. Ronfard, M. Christie, and N. Szilas. Narrative-Driven Camera Control for Cinematic Replay of Computer Games. In MIG’14 - 7thInternational Conference on Motion in Games , pages 109–117, Los Angeles, United States, Nov. 2014. ACM.

[20] V. Gandhi and R. Ronfard. Detecting and naming actors in movies using generative appearance models. In CVPR, 2013.[21] V. Gandhi and R. Ronfard. A Computational Framework for Vertical Video Editing. In 4th Workshop on Intelligent Camera Control, Cinematography

and Editing, pages 31–37, Zurich, Switzerland, May 2015. Eurographics, Eurographics Association.[22] V. Gandhi, R. Ronfard, and M. Gleicher. Multi-Clip Video Editing from a Single Viewpoint. In CVMP 2014 - European Conference on Visual Media

Production, page Article No. 9, London, United Kingdom, Nov. 2014. ACM.[23] D. B. Goldman, B. Curless, D. Salesin, and S. M. Seitz. Schematic storyboarding for video visualization and editing. ACM Trans. Graph., 25(3):862–871,

2006.[24] L.-w. He, M. F. Cohen, and D. H. Salesin. �e virtual cinematographer: a paradigm for automatic real-time camera control and directing. In

Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, SIGGRAPH ’96, pages 217–224, New York, NY, USA, 1996.ACM.

[25] A. Jhala and R. M. Young. A discourse planning approach to cinematic camera control for narratives in virtual environments. In AAAI, 2005.[26] A. Jhala and R. M. Young. Representational requirements for a plan based approach to automated camera control. In AIIDE’06, pages 36–41, 2006.[27] M. Leake, A. Davis, A. Truong, and M. Agrawala. Computational video editing for dialogue-driven scenes. ACM Trans. Graph., 36(4):130:1–130:14,

July 2017.Manuscript submi�ed to ACM

Page 17: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

�e Prose Storyboard Language 17

Fig. 21. Prose storyboard language annotations of two extended sequences from the movie Rope.

Manuscript submi�ed to ACM

Page 18: VINEET GANDHI, VAISHNAVI AMEYA MURUKUTLA, arXiv:1508 ... · The Prose Storyboard Language A Tool for Annotating and Directing Movies REMI RONFARD∗, Univ. Grenoble Alpes, Inria,

18 R. Ronfard et al.

Fig. 22. Sketch storyboards for the two sequences in Fig.21 above.

[28] T. D. C. Li�le and A. Ghafoor. Synchronization and storage models for multimedia objects. IEEE Journal on Selected Areas in Communications,8:413–427, 1990.

[29] A. Louarn, M. Christie, and F. Lamarche. Automated staging for virtual cinematography. In Proceedings of the 11th Annual International Conferenceon Motion, Interaction, and Games, MIG 2018, Limassol, Cyprus, November 08-10, 2018, pages 4:1–4:10, 2018.

[30] L. P. Magalhaes, A. B. Raposo, and I. L. Ricarte. Animation modeling with petri nets. Computers and Graphics, 22(6):735 – 743, 1998.[31] D. Markowitz, J. T. K. Jr., A. Shoulson, and N. I. Badler. Intelligent camera control using behavior trees. In MIG, pages 156–167, 2011.[32] M. Marti, J. Vieli, W. Witon, R. Sanghrajka, D. Inversini, D. Wotruba, I. Simo, S. Schriber, M. Kapadia, and M. Gross. Cardinal: Computer assisted

authoring of movie scripts. In 23rd International Conference on Intelligent User Interfaces, IUI ’18, pages 509–519, 2018.[33] W. Murch. In the blink of an eye. Silman-James Press, 1986.[34] M. Ondaatje. �e Conversations: Walter Murch and the Art of Film Editing. Random House, 2004.[35] B. O’Neill, M. O. Riedl, and M. Nitsche. Towards intelligent authoring tools for machinima creation. In CHI Extended Abstracts, pages 4639–4644,

2009.[36] N. Proferes. Film Directing Fundamentals - See your �lm before shooting it. Focal Press, 2008.[37] D. V. Rijsselbergen, B. V. D. Keer, M. Verwaest, E. Mannens, and R. V. de Walle. Movie script markup language. In ACM Symposium on Document

Engineering, pages 161–170, 2009.[38] R. Ronfard. Reading movies: an integrated dvd player for browsing movies and their scripts. In ACM Multimedia, 2004.[39] R. Ronfard. A Review of Film Editing Techniques for Digital Games. In R. M. Y. Arnav Jhala, editor, Workshop on Intelligent Cinematography and

Editing, Raleigh, United States, May 2012. ACM.[40] R. Ronfard and �uong. A framework for aligning and indexing movies with their script. In International Conference on Multimedia and Expo, 2003.[41] B. Salt. Moving Into Pictures. Starword, 2006.[42] B. Salt. Film Style and Technology: History and Analysis (3 ed.). Starword, 2009.[43] L. M. Seversky and L. Yin. Real-time automatic 3d scene generation from natural language voice and text descriptions. In Proceedings of the 14th

ACM International Conference on Multimedia, MM ’06, pages 61–64, 2006.[44] J. Shen, S. Miyazaki, T. Aoki, and H. Yasuda. Intelligent digital �lmmaker dmp. In ICCIMA, 2003.[45] R. �ompson and C. Bowen. Grammar of the Edit. Focal Press, 2009.[46] R. �ompson and C. Bowen. Grammar of the Shot. Focal Press, 2009.[47] J. Vogt, M. Haesen, K. Luyten, K. Coninx, and A. Meier. Timisto: A technique to extract usage sequences from storyboards. In Proceedings of the 5th

ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS ’13, pages 113–118, 2013.[48] H.-Y. Wu, F. Palu, R. Ranon, and M. Christie. �inking like a director: Film editing pa�erns for virtual cinematographic storytelling. ACM Trans.

Multimedia Comput. Commun. Appl., 14(4):81:1–81:22, Oct. 2018.[49] A. Yannopoulos. Directornotation: Artistic and technological system for professional �lm directing. J. Comput. Cult. Herit., 6(1):2:1–2:34, Apr. 2013.[50] P. Ye and T. Baldwin. Towards automatic animated storyboarding. In Proceedings of the 23rd National Conference on Arti�cial Intelligence - Volume 1,

AAAI’08, pages 578–583, 2008.

Manuscript submi�ed to ACM