New tools for audio description research: the VIW project
Anna Matamala
Co-author: Marta Villegas
Universitat Autònoma de Barcelona
TransMedia Catalonia research group
anna.matamala@uab.cat
Languages and the Media, Berlin, 2-4 November 2016
FFI2015-62522-ERC, 2014SGR0027, FFI2015-64038-P (MINECO/FEDER, UE)

Overview

• Why this project?
• Previous work on audio description (AD) and corpora
• Project rationale
• Creating the materials
• Processing the materials
• The platform

Why this project?

• Need for corpora to analyse audio description which are
  • Multimodal (audio, video, text)
  • Multilingual
  • Open access
• And allow for increasing research

Why this project?

• Funding for one year: Europa Excelencia call (FFI2015-62522-ERC)
• Main researcher: Anna Matamala
• Postdoc researcher: Marta Villegas

AD and corpora

• TIWO (Salway 2007)
• TRACCE (Jiménez Hurtado et al. 2010)
• MPII Movie Description dataset (Rohrbach et al. 2015)
• Pear Tree Project (Mazur and Kruger 2012), inspired by Chafe (1980)
• Reviers et al. (2015)

Challenges of multimodal corpora

• Knight (2011)
  • Design and infrastructure
  • Size and scope
  • Naturalness
  • Availability and (re)usability
• Valentini (2013)
  • Verbal and audio and visual
  • Segmentation criteria
  • Need to devise a sound methodology

The short film

• Short film commissioned from a film director (guidelines based on literature review)
• “What happens while---”, by Núria Nia, in English.
• Dubbed into Catalan and Spanish in a professional studio

http://pagines.uab.cat/viw/

The audio descriptions

• Audio descriptions by professionals (10 in English, 10 in Catalan, 10 in Spanish): recorded video plus text
• Additionally: audio descriptions by students (volunteers in Spanish and Catalan), text only

The corpus

AUDIO DESCRIPTION    VERSIONS    WORDS
ENGLISH              10          6799
CATALAN              10          6888
SPANISH              10          6191
STUDENTS-CATALAN     7           7354
STUDENTS-SPANISH     10          5185
GLOBAL               47          32417
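As a quick sanity check, the column totals can be recomputed from the per-subcorpus rows. The sketch below is illustrative only: the dictionary layout and the function names are assumptions, and only the figures come from the slide.

```python
# Per-subcorpus figures as listed on the slide: (versions, words).
# The dictionary layout is an assumption made for this sketch.
corpus = {
    "ENGLISH": (10, 6799),
    "CATALAN": (10, 6888),
    "SPANISH": (10, 6191),
    "STUDENTS-CATALAN": (7, 7354),
    "STUDENTS-SPANISH": (10, 5185),
}


def totals(rows):
    """Return (total versions, total words) over all subcorpora."""
    versions = sum(v for v, _ in rows.values())
    words = sum(w for _, w in rows.values())
    return versions, words


def mean_words(rows):
    """Return the average number of words per version for each subcorpus."""
    return {name: words / versions for name, (versions, words) in rows.items()}


if __name__ == "__main__":
    print(totals(corpus))
    for name, avg in mean_words(corpus).items():
        print(f"{name}: {avg:.1f} words per version")
```

The per-version averages make the subcorpora easier to compare, since the student Catalan set has fewer versions than the others.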

The corpus

http://pagines.uab.cat/viw/

Linked to UAB’s open access repository

Processing the materials

[Pipeline diagram. Labels: mp4; txt; eaf file; conll2eaf; linguistically annotated text (three boxes); web app]

Segmenting and processing

• Linguistic tiers: AD-unit tier (sentences, chunks, tokens) and Credits tier
• Token: parts of speech, lemma, and semantic values
• Filmic tiers: scene, shot, sound, character, text.
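The token-level annotation above (part of speech, lemma, semantic value) can be pictured as a small data structure. The following is a minimal sketch, assuming a hypothetical tab-separated, CoNLL-style layout with four columns (form, lemma, PoS, semantic value) and blank lines between sentences. The project's actual column layout and tag set are not stated on the slides, so every name here is illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str
    lemma: str
    pos: str
    semantic: str  # semantic value; "_" when none is assigned (assumed convention)


@dataclass
class Sentence:
    tokens: list = field(default_factory=list)


def parse_conll(text):
    """Parse tab-separated token lines; a blank line closes a sentence."""
    sentences, current = [], Sentence()
    for line in text.splitlines():
        if not line.strip():
            if current.tokens:
                sentences.append(current)
                current = Sentence()
            continue
        form, lemma, pos, semantic = line.split("\t")
        current.tokens.append(Token(form, lemma, pos, semantic))
    if current.tokens:
        sentences.append(current)
    return sentences


# Invented sample data, not taken from the VIW corpus.
SAMPLE = (
    "A\ta\tDET\t_\n"
    "woman\twoman\tNOUN\tperson\n"
    "walks\twalk\tVERB\tmotion\n"
    "\n"
    "She\tshe\tPRON\t_\n"
    "smiles\tsmile\tVERB\tbody\n"
)

if __name__ == "__main__":
    for sentence in parse_conll(SAMPLE):
        print([(t.form, t.lemma, t.pos) for t in sentence.tokens])
```

Chunks and AD units would sit above `Sentence` in the same way; they are left out here to keep the sketch short.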

[Diagram. Labels: Timeline; ShortMovie.mp4; AudioDesc.txt; Filmic annotations; Linguistic annotations; EN (en1, en2, en3, en4, …), ES (es1, es2, es3, es4, …), CA (ca1, ca2, ca3, ca4, …)]

The web app

• Web app using Symfony and a hosting service at UAB
• All code and data are available on GitHub
• Access to source data plus some graphical visualizations

The web app: source data

• Raw material per provider and per subcorpus to import into ELAN and into CQPweb. Also filmic annotations as eaf file.
• Visualizations for pre-established analyses.
• Access from previous page but also directly: http://transmediacatalonia.uab.cat/web/
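Since eaf files are XML, the downloaded annotations can also be read outside ELAN. The sketch below, using only the Python standard library, collects the time-aligned annotations of each tier. The element and attribute names follow the ELAN annotation format, but the tier IDs ("AD-unit", "scene") and the sample content are assumptions made for illustration, not the project's actual files.

```python
import xml.etree.ElementTree as ET

# Invented miniature eaf document; tier IDs and values are hypothetical.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1000"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="4000"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="9000"/>
  </TIME_ORDER>
  <TIER TIER_ID="AD-unit">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>A woman walks into a room.</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="scene">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>scene 1</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_tiers(eaf_xml):
    """Return {tier_id: [(start_ms, end_ms, value), ...]} for aligned annotations."""
    root = ET.fromstring(eaf_xml)
    # Map each time slot ID to its value in milliseconds.
    # Slots without a TIME_VALUE (unaligned) are skipped in this sketch.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots[ann.get("TIME_SLOT_REF1")]
            end = slots[ann.get("TIME_SLOT_REF2")]
            value = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start, end, value))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


if __name__ == "__main__":
    for tier_id, rows in read_tiers(SAMPLE_EAF).items():
        print(tier_id, rows)
```

For a real file, the same function could be fed the text of the downloaded eaf file; reference annotations that depend on a parent tier are ignored here.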

Data and visualisations

• Simple string search
• AD unit, sentence and word counts
• AD distribution in timeline
• Verb distribution in timeline, with selection of verbal semantic class.
• AD similarity (Ted Pedersen’s Text-Similarity module)
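Two of the analyses listed above can be approximated in a few lines. The sketch below is not the Text-Similarity module the web app uses (that is a separate Perl package); as a stand-in it computes a plain Jaccard word-overlap score between two descriptions, and it bins AD units by start time to show their distribution over the timeline. Function names, the bin size, and the sample sentences are all assumptions.

```python
import re


def words(text):
    """Lowercase word tokens; \\w also matches accented letters in Python 3."""
    return re.findall(r"\w+", text.lower())


def jaccard(a, b):
    """Share of distinct words common to both texts (0.0 to 1.0)."""
    set_a, set_b = set(words(a)), set(words(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def timeline_histogram(units, bin_ms, total_ms):
    """Count AD units whose start time falls into each bin of bin_ms milliseconds."""
    n_bins = -(-total_ms // bin_ms)  # ceiling division
    counts = [0] * n_bins
    for start, _end, _text in units:
        counts[min(start // bin_ms, n_bins - 1)] += 1
    return counts


if __name__ == "__main__":
    print(jaccard("A woman walks in", "A woman runs in"))
    units = [(1000, 4000, "x"), (2000, 3000, "y"), (12000, 13000, "z")]
    print(timeline_histogram(units, bin_ms=5000, total_ms=15000))
```

Word-set overlap ignores word order and repetition, so it is only a rough proxy for how similar two audio descriptions are.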

Data and visualisations

• Word frequency by PoS, per provider.
• Semantic tagging for verbs, nouns, adjectives, and adverbs.
• HTML version of each eaf file, with access to visuals.

…and many other features

And the future?

• On-going analysis on
  • Character
  • Text on screen
  • Spatio-temporal settings
  • Professionals versus amateurs
• How to expand it into other languages?
