

Synchronisation of Senses: From Text to Speech… to Movie

Rémi Ronfard, CVAM / ICCV, Oct 23, 2017

Introduction

• IMAGINE team at INRIA on natural interfaces for designing shapes, motions and stories

• Build interactive narrative environments where the user is the director
– Requires an explicit representation of story goals: character actions, events and their causal relations
– Requires a directable film crew of virtual actors, cameramen, lighting technicians, etc.

Scientific challenges

• Natural language and story understanding for script analysis
• Generative audio-visual models
• Procedural models for 3D scene generation
• Behavior-based 3D animation for directing virtual actors
• Virtual cinematography for placing lights and cameras automatically and editing them together to a single string of film

Outline

– Text-to-movie
– Generative audiovisual prosody model for virtual actors
– Eisenstein’s theory of vertical montage
– Continuity editing for 3D animation

Motivation: Text-to-Movie

Hitchcock’s dream of a machine in which he’d “insert the screenplay at one end and the film would emerge at the other end” (Truffaut/Hitchcock, p. 330)

Script → Storyboard → Stage → Editing Room

Video Game / Live Action / 3D Animation

Xtranormal Text-to-Movie ©

• Startup created in 2006 in Montreal
• Mission statement: 3-D animation tools for digital storytelling
• «If you can write, you can make movies»
• Shut down in 2013, re-born in 2015 as «Nawmal»

Create a short movie in four easy steps…

1. Pick template, characters & voices from libraries…
2. Type dialog and insert gestures and effects…
3. View & edit your work…
4. Publish…

…without worrying about cinematography and editing

Text-to-movie: Nawmal Make

Text-to-movie: Nawmal smart cameras

Text-to-speech (TTS)

• A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.

• Allen, Jonathan; Hunnicutt, M. Sharon; Klatt, Dennis (1987). From Text to Speech: The MITalk system. Cambridge University Press.

Parametric text-to-speech (TTS)

• A beginners’ guide to statistical parametric speech synthesis, Simon King, 2010.

In press: IEEE Computer Graphics and Applications, Nov/Dec 2017.

Exercises in style

Emotions and attitudes

• Actors express dramatic attitudes using the coordinated prosody of voice, rhythm, facial expressions and head and gaze motion.

• We propose a method for generating natural speech and animation in various attitudes using neutral speech and animation as input.

Audio prosody

• High-level features: pitch, duration and intensity per syllable
• Low-level features: voice qualities
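As a hedged illustration of the high-level audio features listed above, the sketch below summarizes pitch, duration and intensity for each syllable. It assumes that frame-level F0 and energy tracks and syllable boundaries (for example from a forced aligner) are already available. The function name, the 10 ms frame step and the data layout are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SyllableProsody:
    pitch_hz: float    # mean F0 over voiced frames (0.0 if the syllable is unvoiced)
    duration_s: float  # syllable length in seconds
    intensity: float   # mean frame energy


def syllable_features(f0, energy, boundaries, frame_s=0.01):
    """Summarize frame-level tracks per syllable.

    f0, energy : per-frame lists of equal length (f0 == 0 marks unvoiced frames)
    boundaries : list of (start_s, end_s) pairs, one per syllable (assumed input)
    frame_s    : frame step in seconds (assumed to be 10 ms here)
    """
    feats = []
    for start, end in boundaries:
        lo, hi = round(start / frame_s), round(end / frame_s)
        voiced = [v for v in f0[lo:hi] if v > 0]
        frames = energy[lo:hi]
        feats.append(SyllableProsody(
            pitch_hz=mean(voiced) if voiced else 0.0,
            duration_s=end - start,
            intensity=mean(frames) if frames else 0.0,
        ))
    return feats
```

Each syllable then contributes one (pitch, duration, intensity) triple, which is the granularity the slide describes; voice-quality features would need a separate, lower-level analysis.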

Visual prosody

• High-level features: shoulder, head and eye movements
• Low-level features: facial expressions
• Visual Prosody: Facial Movements Accompanying Speech, Hans Peter Graf, Eric Cosatto, Volker Strom, Fu Jie Huang, Face and Gesture, 2002.

Exercises in style

Generative Audiovisual Prosodic Model

Dramatic attitude: seductive

Generative Audiovisual Prosodic Model

Dramatic attitude: scandalized

Generative Audiovisual Prosodic Model

Dramatic attitude: thinking

Speech-driven animation

• Erika Chuang and Christoph Bregler. 2005. Mood swings: expressive speech animation. ACM Trans. Graph. 2005.

• Stacy Marsella, Yuyu Xu, Margaux Lhommet, Andrew Feng, Stefan Scherer, and Ari Shapiro. 2013. Virtual character performance from speech. Symposium on Computer Animation (SCA '13).

• Tero Karras, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. 2017. Audio-driven facial animation by joint end-to-end learning of pose and emotion. ACM Trans. Graph. 36, 4, July 2017.

Generalized Speech Animation

• Sarah Taylor, Taehwan Kim, Yisong Yue, Moshe Mahler, James Krahe, Anastasio Garcia Rodriguez, Jessica Hodgins, and Iain Matthews. 2017. A deep learning approach for generalized speech animation. ACM Trans. Graph. 36, 4, July 2017.

Text-driven animation

• Irene Albrecht, Jörg Haber, Kolja Kähler, Marc Schröder, and Hans-Peter Seidel. 2002. "May I talk to you? :-)" Facial Animation from Text. Pacific Graphics, 2002.

Expressive conversion

• Joint Gaussian Mixture Models of expression pairs
• Daniel Vlasic, Matthew Brand, Hanspeter Pfister, and Jovan Popovic. 2006. Face transfer with multilinear models. In ACM SIGGRAPH 2006 Courses (SIGGRAPH '06).

Our approach: prosodic contours

• F = voice pitch, H = head motion, G = gaze motion, U = upper-face, L = lower-face, C = rhythm, E = energy

Learning audiovisual prosody

Generating audiovisual prosody

Experimental results

• Thank you for the lovely flowers: thinking, ironic, scandalized

Experimental results

• You’re welcome (fascinated, doubtful, embarrassed)

Subjective evaluation

• CF = comforting, FA = fascinated, TH = thinking, DO = doubtful, CO = confronted, EM = embarrassed

Exercises in style results

Eisenstein, synchronization of senses

MONTAGE defined as:
• Piece A, derived from the elements of the theme being developed
• Piece B, derived from the same source
• in juxtaposition give birth to the image in which the thematic matter is most clearly embodied.

Eisenstein, synchronization of senses

Representation A and representation B must be so selected from all the possible features within the theme that their juxtaposition shall evoke in the perception and feelings of the spectator the most complete image of the theme itself.

Eisenstein, synchronization of senses

– Transition from silent montage to sound-picture, or audio-visual montage, changes nothing in principle. Our conception of montage encompasses equally the montage of the silent film and of the sound-film.

– However, this does not mean that in working with sound-film, we are not faced with new tasks, new difficulties, and even entirely new methods.

– On the contrary!

Eisenstein, synchronization of senses

– That is why it is so necessary for us to make a thorough analysis of the nature of audio-visual phenomena.

– Our first question is: Where shall we look for a secure foundation of experience with which to begin our analysis?

Eisenstein, synchronization of senses

– Man and the relations between his gestures and the intonations of his voice, which arise from the same emotions, are our models in determining audio-visual structures, which grow in an exactly identical way from the governing image.


Eisenstein, synchronization of senses

– To relate image with sound, we find a natural language common to both: movement.

– Movement will reveal all the substrata of inner synchronization that we wish to establish in due course. Movement will display in a concrete form the significance and method of the fusion process.

Eisenstein, synchronization of senses

– Let us examine a number of different approaches to synchronization in logical order.

– The first is a purely factual synchronization: the sound-filming of natural things (a croaking frog, the mournful chords of a broken harp, the rattle of wagon wheels over cobblestone).

Eisenstein, synchronization of senses

– In the more rudimentary forms of expression both elements (the picture and its sound) will be controlled by an identity of rhythm, according to the content of the scene.

– This is the simplest, easiest and most frequent circumstance of audio-visual montage, consisting of shots cut and edited together to the rhythm of the music on the parallel sound-track.

Eisenstein, synchronization of senses

– We can surely find a shot whose movement harmonizes not only with the movement of the rhythmic pattern, but also with the movement of the melodic line.

– (…)

– Synchronization can be natural, metric, rhythmic, melodic and tonal.


Continuity Editing for 3D Animation

Quentin Galvane, Rémi Ronfard, Christophe Lino, Marc Christie

Twenty-Ninth AAAI Conference, 2015

Objectives

➢ Read actions and dialogues from script
➢ Generate speech and animation
➢ Place cameras and lights, generate rushes
➢ Edit the rushes into a movie

Related work

Idiom based solutions

Virtual cinematographer [Christianson et al. 1996]

Scenario: … Goldie speaks to George, George speaks to Goldie, Goldie speaks to George

Related work

Optimization based approach

Dynamic programming [Riedl, M. et al., 2008]

All cameras evaluated over the entire beat

All transitions evaluated at beat changes

Scenario: … Goldie speaks to George, George speaks to Goldie, Goldie speaks to George

Our approach

Evaluate all possible transitions

Rhythm

Scenario: … Goldie speaks to George, George speaks to Goldie, Goldie speaks to George

Outline

➢ Film editing as an optimization problem
▪ Semi-Markov chains

➢ Create an editing graph that evaluates 3 aspects:
▪ Shot quality
▪ Cut quality
▪ Rhythm

Film editing as optimization

➢ Search over semi-Markov chains s = (r_j, d_j) given actions a(t)

➢ Minimize cost function:
▪ Action cost (shot quality)
▪ Transition cost (cut quality)
▪ Rhythm cost (rhythmic quality)

The final editing is given by the shortest path in the editing graph
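The search described on this slide can be sketched as a dynamic program over shots, where each shot is a pair (rush, duration). This is a minimal sketch under stated assumptions, not the authors' implementation: time is discretized into steps, the three cost terms are passed in as placeholder tables or functions, and the names `best_edit`, `action_cost`, `cut_cost`, `rhythm_cost` and `max_len` are hypothetical. Finding the minimum of this recursion is equivalent to a shortest path in a graph whose nodes are (time, rush) pairs.

```python
import math


def best_edit(action_cost, cut_cost, rhythm_cost, max_len):
    """Cheapest sequence of shots covering all time steps.

    action_cost[r][t] : cost of showing rush r at time step t (shot quality)
    cut_cost(p, r)    : cost of cutting from rush p to rush r (cut quality)
    rhythm_cost(d)    : cost of a shot lasting d steps (rhythm)
    max_len           : longest shot allowed, in steps
    Returns (total_cost, [(rush, duration), ...]).
    """
    n_rush, n_steps = len(action_cost), len(action_cost[0])

    # prefix[r][t] = sum of action_cost[r][:t], so interval sums are O(1)
    prefix = [[0.0] * (n_steps + 1) for _ in range(n_rush)]
    for r in range(n_rush):
        for t in range(n_steps):
            prefix[r][t + 1] = prefix[r][t] + action_cost[r][t]

    # best[t][r] = cheapest edit of steps [0, t) whose last shot uses rush r
    best = [[math.inf] * n_rush for _ in range(n_steps + 1)]
    back = [[None] * n_rush for _ in range(n_steps + 1)]

    for t in range(1, n_steps + 1):
        for r in range(n_rush):
            for d in range(1, min(t, max_len) + 1):
                shot = prefix[r][t] - prefix[r][t - d] + rhythm_cost(d)
                if d == t:
                    # First shot of the movie: no incoming cut
                    candidates = [(shot, None)]
                else:
                    candidates = [
                        (best[t - d][p] + cut_cost(p, r) + shot, p)
                        for p in range(n_rush) if p != r
                    ]
                for cost, p in candidates:
                    if cost < best[t][r]:
                        best[t][r], back[t][r] = cost, (p, d)

    last = min(range(n_rush), key=lambda r: best[n_steps][r])
    total = best[n_steps][last]
    if math.isinf(total):
        raise ValueError("no edit satisfies the shot-length limit")

    # Walk the back-pointers from the end to recover the shot list
    shots, t, r = [], n_steps, last
    while t > 0:
        prev, d = back[t][r]
        shots.append((r, d))
        t, r = t - d, prev
    return total, shots[::-1]
```

Because durations are part of the state, the rhythm term can depend on the whole shot length, which is what distinguishes a semi-Markov formulation from a frame-by-frame Markov one. The running time of this sketch grows with the number of steps, the square of the number of rushes and `max_len`.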

Shot Selection

➢ Shot quality:
▪ Hitchcock principle

The size of a character on the screen should be proportional to its narrative importance in the story.

• Narrative importance from script
• Visible area V = S - O for each rush
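One way to turn the Hitchcock principle into a per-shot score is to compare each character's share of visible screen area with its share of narrative importance. The sketch below is only an illustration: reading S as a character's projected screen area and O as its occluded part is an assumption, the absolute-difference measure is a stand-in rather than the measure used in the paper, and the function name and dictionaries are hypothetical.

```python
def hitchcock_cost(importance, screen_area, occluded_area):
    """Mismatch between narrative importance and visible screen area.

    importance    : dict character -> narrative importance (from the script)
    screen_area   : dict character -> projected area S in this rush (assumed meaning)
    occluded_area : dict character -> occluded area O in this rush (assumed meaning)
    Returns 0.0 when visible areas are exactly proportional to importance.
    """
    # Visible area V = S - O, clamped at zero
    visible = {
        c: max(screen_area.get(c, 0.0) - occluded_area.get(c, 0.0), 0.0)
        for c in importance
    }
    total_imp = sum(importance.values())
    total_vis = sum(visible.values())
    if total_imp == 0:
        return 0.0

    cost = 0.0
    for c, imp in importance.items():
        vis_share = visible[c] / total_vis if total_vis > 0 else 0.0
        cost += abs(imp / total_imp - vis_share)
    return cost
```

A rush that frames the important character large and unoccluded scores close to zero, while a rush that hides that character scores high, so the value can serve as a shot-quality term when comparing rushes for the same action.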

Actors and actions

Continuity editing

Results

Limitations & Future work

Limitations

➢ Audio tracks and cameras must be pre-computed
➢ Cannot handle ellipsis or flashbacks
➢ Cannot handle book-ending
▪ Context free grammar

Limitations & Future work

Future work

➢ Optimize over camera positions and movements
➢ Extend to live action video
➢ Learn other editing styles from real movies [Gandhi et al., 2014]

[Galvane et al., 2014]

Computational Video Editing for Dialogue-Driven Scenes. Mackenzie Leake, Abe Davis, Anh Truong, Maneesh Agrawala

What about vertical editing?

• Can we learn statistical models of Eisenstein's style of montage?
• Harder than continuity editing
• Semi-Markov model still relevant
• Vertical relations between sound and picture
• Vertical relations between virtual camera shots

What about vertical editing?

• Semi-Markov models can be useful!
– syllable and sentence durations
– shot and scene durations
– action durations

• Multimodal Semi-Markov models needed!
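To make the term concrete: in a semi-Markov model each state carries its own explicit duration distribution, rather than the implicit geometric durations of a plain Markov chain. The sketch below samples such a sequence. It is an illustrative assumption only; the function name, the transition table and the duration samplers are hypothetical and could stand for syllables, shots or actions alike.

```python
import random


def sample_semi_markov(states, transition, duration, horizon, rng):
    """Sample (state, duration) segments that cover `horizon` time steps.

    states     : list of state labels
    transition : dict state -> dict next_state -> probability (no self-loops)
    duration   : dict state -> callable(rng) returning a positive int
    horizon    : total number of time steps to fill
    rng        : random.Random instance, passed in for reproducibility
    """
    state = rng.choice(states)
    t, segments = 0, []
    while t < horizon:
        # Draw how long we stay in this state, clipping the final segment
        d = min(duration[state](rng), horizon - t)
        segments.append((state, d))
        t += d
        # Only after the duration has elapsed do we choose the next state
        next_states, probs = zip(*transition[state].items())
        state = rng.choices(next_states, weights=probs)[0]
    return segments


# Hypothetical two-shot example: wide shots last longer than close-ups
example = sample_semi_markov(
    states=["wide", "close"],
    transition={"wide": {"close": 1.0}, "close": {"wide": 1.0}},
    duration={
        "wide": lambda rng: rng.randint(2, 5),
        "close": lambda rng: rng.randint(1, 3),
    },
    horizon=20,
    rng=random.Random(0),
)
```

A multimodal version, as called for on the slide, would need several such chains (speech, gesture, camera) with coupled boundaries; that coupling is not modeled here.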

Conclusion

• Generative audiovisual models
– expression of emotions and attitudes with prosody
– expression of narratives with video editing
• Motivated by text to movie conversion
• Also important for movie description

References

• The Prose Storyboard Language: A Tool for Annotating and Directing Movies. Rémi Ronfard, Vineet Gandhi, Laurent Boiron. 2nd Workshop on Intelligent Cinematography and Editing, part of Foundations of Digital Games - FDG 2013.

• Narrative-Driven Camera Control for Cinematic Replay of Computer Games. Quentin Galvane, Rémi Ronfard, Marc Christie, Nicolas Szilas. MIG 2014.

• Beyond Basic Emotions: Expressive Virtual Actors with Social Attitudes. Adela Barbulescu, Rémi Ronfard, Gérard Bailly, Georges Gagneré, Huseyin Cakmak. 7th International ACM SIGGRAPH Conference on Motion in Games 2014.

References

• Continuity Editing for 3D Animation. Quentin Galvane, Rémi Ronfard, Christophe Lino, Marc Christie. AAAI Conference on Artificial Intelligence, Jan 2015.

• Camera-on-rails: Automated Computation of Constrained Camera Paths. Quentin Galvane, Marc Christie, Christophe Lino, Rémi Ronfard. ACM SIGGRAPH Conference on Motion in Games, Nov 2015.

• Implementing Hitchcock - the Role of Focalization and Viewpoint. Quentin Galvane, Rémi Ronfard. Eurographics Workshop on Intelligent Cinematography and Editing, Apr 2017.

References

• Five Challenges for Intelligent Cinematography and Editing. Rémi Ronfard. Eurographics Workshop on Intelligent Cinematography and Editing, Apr 2017.

• Which prosodic features contribute to the recognition of dramatic attitudes? Adela Barbulescu, Rémi Ronfard, Gérard Bailly. Speech Communication, Aug. 2017.

• A Generative Audio-Visual Prosodic Model for Virtual Actors. Adela Barbulescu, Rémi Ronfard, Gérard Bailly. IEEE Computer Graphics and Applications, Nov./Dec. 2017.
