Companion Proceedings 8th International Conference on Learning Analytics & Knowledge (LAK18)
Creative Commons License, Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)
Multimodal Transcript of Face-to-Face Group-Work Activity Around Interactive Tabletops
Xavier Ochoa, Katherine Chiluiza, Roger Granda, Gabriel Falcones, James Castells and Bruno Guamán
Escuela Superior Politécnica del Litoral, ESPOL
{xavier,kchilui,roger.granda,gabriel.falcones,james.castells,bruno.guaman}@cti.espol.edu.ec
ABSTRACT: This paper describes a multimodal system built around a multi-touch tabletop to collect different data sources for group-work activities. The system collects data from various cameras, microphones and the logs of the activities performed on the multi-touch tabletop. We conducted a pilot study with 27 students in an authentic classroom to explore the feasibility of capturing individual and group interactions of each participant in a collaborative database design activity. From the raw data, we extracted low-level features (e.g. tabletop actions, gaze interaction, verbal interventions, emotions) and generated visualizations as annotated transcripts of what happened in a work session. We evaluated teachers' perceptions of how the automated multimodal transcript could potentially support the understanding of group-work activities. The teachers' perceptions pointed out that the multimodal transcript could become a valuable tool to understand group-work rapport and performance.
Keywords: multimodal transcripts, collaboration, group-work visualizations
1 INTRODUCTION
With the evolution of multi-user tabletop devices, the opportunities to enhance collaboration in several contexts have been extended significantly, especially in collaborative learning contexts. Several authors have studied the effect of introducing this particular technology in collaborative sessions, reporting positive results on enhancing communication skills between participants (Kharrufa et al., 2013; Heslop et al., 2015). The data-capture capabilities of tabletops in learning contexts present new opportunities to better understand the collaboration and learning processes during group-work activities in the classroom. For instance, collaboration interactions gathered from a tabletop setting could help teachers by making group-work orchestration easier (Martinez-Maldonado et al., 2011) or help students to reflect on their collaboration experience. However, using only the data produced by the tabletops provides a narrow picture of those processes, because students interact through a variety of modes (speech, gaze, posture, gestures, etc.) and not all of their actions are perceived or recorded by the tabletop software (Martinez-Maldonado et al., 2017). A common approach to obtain a more holistic view of the collaboration is to complement the data captured by the tabletops with the capture and analysis of other sources such as video recordings (Al-Qaraghuli et al., 2013). However, the visualization of these data, especially if several communication modalities are to be captured, could become cluttered and confusing for teachers who seek to provide instant feedback to students after a group-work activity.
In this sense, an automatic multimodal transcript has proven to be an efficient and comprehensive method to represent and visualize temporal information from several sources (Bezemer & Mavers, 2011). Thus, combining multimodal collaboration features into a time-based visualization could help teachers to make sense of collaboration processes through the observation of students' actions and emotions in the group-work activity (Martinez-Maldonado et al., 2011; Tang et al., 2010). Even though some studies have added new dimensions of collaboration to the data obtained from tabletops, most of the efforts to create automated transcripts from groups' interactions have focused on one or two modalities. For example, Martinez-Maldonado et al. (2013) presented an approach to identify common patterns of collaboration by mining student logs and detected speech. In another work, Adachi et al. (2015) captured and visualized the gaze and talking participation of members in a co-located conversation to provide feedback that in turn would help balance participation. These studies serve as a baseline for our analysis; however, we want to explore the potential of generating an automated transcript by combining multiple modalities to inform teachers about groups' interactions around a tabletop. Besides, in classroom interaction research, the focus has been almost exclusively on teacher talk or teacher-student talk and not on student-student talk in group work, e.g. student-student rapport building in group work (Ädel, 2011). Ultimately, we want to know whether it is possible for teachers to derive further evidence about collaboration, such as group rapport, from the generated transcripts.
2 MULTIMODAL INTERACTIVE TABLETOP SYSTEM
The system is a variation of a prototype presented by Echeverria et al. (2017) that fosters the collaborative design process. Besides the sensors used in the previous version (Kinect V1, coffee table and three tablets), this new version adds three multimodal selfies (Dominguez et al., 2015) with lapel microphones attached for capturing individual speech. The multimodal selfies are located around the tabletop to capture synchronized video and audio for each participant. Additionally, a multimodal selfie was used to capture the video of the entire group session. Figure 1 shows the components of the system and the working prototype.
The software of the system has three different applications: a tabletop application, a management web application, and a recording application. The tabletop application was designed following the design principles described in Wong-Villacres et al. (2015). It allows the participants to develop a database design through the creation and modification of several interactive objects (entities, attributes, relations). The management web application communicates with the tabletop application and with the multimodal selfies to control the execution of the session and start the recordings. It also allows the instructor to view the solution developed by the students (see Echeverria et al., 2017 for details). The recording application was deployed on each multimodal selfie. It controls the synchronization of the recordings through a publish/subscribe solution implemented with the lightweight MQTT connectivity protocol¹. The recordings obtained are further processed to automatically tag them according to the proposed audio and video features (see Section 2.1). All logs and features are stored in a relational database. In addition, raw audio and video data are saved on a NAS server, associated with a code identifying each student.

Figure 1: Components of the multimodal interactive tabletop system
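The publish/subscribe coordination can be sketched as follows. This is an illustrative in-memory stand-in for the pattern, not the system's implementation; a real deployment would use an MQTT client library against a broker, and the topic name, class names and payload fields here are our own assumptions.

```python
import time

# Minimal in-memory publish/subscribe broker illustrating the pattern the
# system implements over MQTT: the management application publishes a single
# "session/start" message and every recorder begins at the same timestamp.
class Broker:
    def __init__(self):
        self.subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, payload):
        for callback in self.subscribers.get(topic, []):
            callback(payload)

class RecordingApp:
    """Runs on each multimodal selfie; starts recording on session/start."""
    def __init__(self, broker, device_id):
        self.device_id = device_id
        self.started_at = None
        broker.subscribe("session/start", self.on_start)

    def on_start(self, payload):
        # Every device tags its recording with the same session timestamp,
        # so the separate streams can later be aligned on a common timeline.
        self.started_at = payload["timestamp"]

broker = Broker()
selfies = [RecordingApp(broker, f"selfie-{i}") for i in range(4)]
broker.publish("session/start", {"timestamp": time.time()})
```

Broadcasting one start message, rather than contacting each device in turn, is what keeps the per-participant recordings synchronized enough to merge into a single transcript.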
2.1 Multimodal Transcript
The multimodal transcript combines a set of automatic features (extracted from video, audio and tabletop action logs) into a timeline where the teacher can observe the moment each interaction took place, and how the group session developed through time. The following sections present the set of features and details of the multimodal transcript.
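To make the combination step concrete, merging independently extracted feature streams into per-second timeline rows might look like the following sketch. The field names and sample values are invented for illustration; they are not the system's actual schema.

```python
# Merge independently extracted feature streams into one per-second timeline,
# one row per second per student, as in the multimodal transcript.
# Field names and sample values are illustrative only.
def build_timeline(gaze, emotion, actions):
    """Each input maps (second, student) -> value; returns sorted rows."""
    seconds = {t for stream in (gaze, emotion, actions) for (t, _) in stream}
    students = {s for stream in (gaze, emotion, actions) for (_, s) in stream}
    rows = []
    for t in sorted(seconds):
        for s in sorted(students):
            rows.append({
                "t": t,
                "student": s,
                "gaze": gaze.get((t, s)),
                "emotion": emotion.get((t, s)),
                "action": actions.get((t, s)),
            })
    return rows

gaze = {(1, "S1"): "right", (1, "S2"): "right", (1, "S3"): "left"}
emotion = {(1, "S1"): "happy"}
actions = {(1, "S2"): "CREATE entity"}
timeline = build_timeline(gaze, emotion, actions)
```

Missing modalities simply appear as empty cells, so a row remains readable even when, say, a student neither spoke nor touched the tabletop during that second.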
2.1.1 Audio Features

From the speech recorded by the system, we used speech-to-text recognition software to obtain an automated transcript of the conversation among the participants in the group. We extracted the speech sections from the recorded individual audio of each participant, and then converted them to text using Google's Cloud Speech API². In this way, we obtained the conversation between the participants along with the time each verbal interaction took place. Google's API results for Spanish are not as accurate as those for English. In spite of that, they are still useful to retrieve sentences with words related to the design problem.
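Since each participant is recorded on a separate microphone, the per-participant transcriptions have to be interleaved by time to reconstruct the conversation. A minimal sketch, where the (start_time, text) tuples stand in for the output of a speech-to-text service and all names are ours:

```python
# Interleave per-participant speech-to-text results into one conversation
# transcript ordered by time. The (start_seconds, text) tuples stand in for
# the output of a speech-to-text service; speaker labels are illustrative.
def merge_conversation(per_participant):
    """per_participant: dict speaker -> list of (start_seconds, text)."""
    utterances = [
        (start, speaker, text)
        for speaker, items in per_participant.items()
        for (start, text) in items
    ]
    utterances.sort()  # chronological order across all speakers
    return [f"[{start:>5.1f}s] {speaker}: {text}"
            for start, speaker, text in utterances]

conversation = merge_conversation({
    "S1": [(3.2, "let's add the attribute here"), (12.0, "agreed")],
    "S2": [(7.5, "the entity needs a key")],
})
```

Keeping the start time on each utterance is what lets the conversation be aligned with the gaze, emotion and tabletop-action streams in the transcript.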
2.1.2 Video Features

Mutual gaze and smiles have been considered non-verbal indicators of rapport in previous work (Harrigan et al., 1985). Thus, we believe that those features could be valuable to depict in the multimodal transcript. Keypoints from the face of each participant recorded by the multimodal selfie were extracted using the OpenPose library (Cao et al., 2016). This library retrieves the coordinates of 20 points (e.g. eyes, nose, ears, etc.). These face keypoints were analyzed on every frame of the recordings, and an algorithm was developed to automatically estimate the moments when a participant is looking towards another. To evaluate the accuracy of this algorithm, we considered five 5-minute videos from the original group sessions (see Section 4 for details) and counted the false positives and false negatives among all detections made by the algorithm. According to our evaluation, this feature has an error rate of 15.1%. In addition, we extracted the emotions each participant demonstrated during the activity. Video frames of individual recordings from the multimodal selfie were processed using the Microsoft Emotion API³. For every second, one frame of the participant's face is sent to the API, which returns an array of scores determining levels of happiness, anger, disgust, among others, with values from 0 to 1. To evaluate the accuracy of the emotion recognition software, videos of three students (approx. 25 min) were randomly selected from all the groups that participated in the pilot study (see Section 4 for details). We selected happiness as the emotion to evaluate because it was the most commonly detected emotion in the recorded sessions. A human evaluated the videos by annotating, for each second, whether the student was happy or not. Since the API returns values between 0 and 1, we used a threshold of 0.5, above which a frame counted as happy. Our evaluation resulted in an average error rate of 1.77%.

¹ http://mqtt.org/
² https://cloud.google.com/speech/
³ https://azure.microsoft.com/en-us/services/cognitive-services/emotion/
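The per-second thresholding and the error-rate computation against the human annotation can be sketched as follows. The function names are ours, and the score and label sequences are invented stand-ins for the API's happiness scores and the annotator's labels.

```python
# Per-second happiness detection: threshold the emotion API's happiness score
# at 0.5 and compare against a human annotator to compute an error rate.
# The score and label sequences below are invented for illustration.
def detect_happy(scores, threshold=0.5):
    """scores: one happiness score in [0, 1] per second of video."""
    return [score > threshold for score in scores]

def error_rate(predicted, annotated):
    """Fraction of seconds where prediction and human annotation disagree."""
    assert len(predicted) == len(annotated)
    disagreements = sum(p != a for p, a in zip(predicted, annotated))
    return disagreements / len(predicted)

api_scores = [0.1, 0.7, 0.9, 0.4, 0.6]          # one score per second
human_labels = [False, True, True, True, True]  # annotator's judgment
rate = error_rate(detect_happy(api_scores), human_labels)
```

In this toy example the prediction and annotation disagree on one of five seconds, giving an error rate of 0.2; the 1.77% reported above was computed the same way over the three annotated videos.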
2.1.3 Interactive Tabletop Logs
All the interactions with the objects on the tabletop were recorded in a database. Each interaction is represented by the following features: type of interaction (CREATE, EDIT, DELETE), student identity, timestamp, and the type of object the student created. The solution proposed by Martinez-Maldonado et al. (2011) was used to differentiate students while interacting with an object on the tabletop. Figure 2 shows an excerpt of a group session captured by the system. As we can see, different features of the group's interaction are represented (e.g. tabletop actions, gaze, etc.) in a vertical timeline. For instance, we can observe that at t=1, student 1 (S1) and student 2 (S2) were looking to the right, student 3 (S3) was looking to the left, and so on.
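A minimal sketch of such a log table, using SQLite for brevity; the paper does not give the exact schema, so the column names and sample rows here are our own. The query at the end shows the kind of per-student summary a teacher might use to see whether work was evenly distributed.

```python
import sqlite3

# Minimal sketch of the tabletop interaction log: one row per interaction,
# with the four features named in the text. Column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE interaction (
        interaction_type TEXT
            CHECK (interaction_type IN ('CREATE', 'EDIT', 'DELETE')),
        student_id  TEXT,
        ts          REAL,  -- seconds since session start
        object_type TEXT   -- e.g. entity, attribute, relation
    )
""")
conn.executemany(
    "INSERT INTO interaction VALUES (?, ?, ?, ?)",
    [("CREATE", "S1", 1.0, "entity"),
     ("EDIT",   "S1", 4.5, "entity"),
     ("CREATE", "S2", 6.2, "attribute")],
)
# Per-student activity counts, e.g. to see who contributed most actions.
counts = dict(conn.execute(
    "SELECT student_id, COUNT(*) FROM interaction GROUP BY student_id"
))
```

Because each row carries both a student identity and a timestamp, the same table feeds both the vertical timeline in the transcript and aggregate views of participation.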
3 PILOT STUDY
The purpose of this study was to validate with teachers the results obtained from the proposed multimodal transcript gathered from group sessions. The study is divided in two parts. In the first part of the study, twenty-four undergraduate students from a Computer Science program (20 males, 4 females, average age: 24 years) enrolled in an introductory Database Systems course were asked to participate in a collaborative session using the proposed system. Eight groups were formed (three students each), grouped by affinity. Each group worked approximately 30 minutes on the design session. All of the multimodal features were recorded while the students were solving the design problem. At the end of the session, the system scored the solution proposed by the group and the teacher gave feedback to the students about their performance. In addition, each student reported the rapport of their group according to their Enjoyable Interaction and Personal Connection (Frisby & Martin, 2010).
In the second part of the study, four teachers (3 males, 1 female, avg. age 31) with previous experience teaching Database Design were invited to participate in the evaluation of the multimodal transcript. First, the teachers watched a video showing how the multimodal tabletop system works. Then, they observed the multimodal transcripts built from the data gathered from two groups, corresponding to the highest and lowest rapport scores from the self-reported data. For simplicity, only relevant fragments and a summary of the transcript were used in the observations. Next, the teachers answered a set of questions about their perception of Enjoyable Interaction and Personal Connection regarding each group. They assigned a score to each group for each variable using a three-point Likert scale (Low: 1, Medium: 2, High: 3). Additionally, we included open questions about how the multimodal transcript could potentially support the teacher in evaluating and recreating the group-work performance.

Figure 2: An excerpt of the multimodal transcript from a group
4 RESULTS AND DISCUSSION
As for Enjoyable Interaction, teachers perceived low interaction in the group with the lowest rapport, reporting an average of 1.25 out of 3; whereas for the group with the highest rapport, they scored an average of 2.75 out of 3. As for Personal Connection, a similar pattern was observed: the group with the lowest rapport was evaluated with an average of 1.25 out of 3, and the one with the highest rapport scored 2.5 out of 3. From these results, it seems that the multimodal transcripts were valuable for the teachers, since they mostly agreed with the rapport reported by the members of the groups. Some positive comments about the support the teachers perceived from the multimodal transcript for assessing enjoyable interaction and personal connection follow. One teacher stated: "the combination of voice and emotions presented in the transcript gives me the idea of how the students felt about the task during the session"; another teacher said: "what I observed gives me evidence about the interaction among students… more specifically, the mix between actions and emotions is the evidence of such interactions". In addition, there were also some critical remarks. For instance, one teacher indicated that the transcript "did not present enough details about the emotions of the participants". Another teacher said that "the emotions presented are not enough to infer the interactions that were present, I think there's the need to evidence the interrelations between the emotions of one participant and the others".
As for the teachers' perception of how the transcript would support them in evaluating group-work performance, all the interviewed participants answered this question positively. Regarding the recreation of student work during the session using the multimodal transcript, three teachers answered positively and one indicated that the transcript would only partially support this task. During the interviews, one teacher stated: "I could observe whether the students were working on the task or debating about the task; moreover, I can observe the actions at the level of the individual. It is easy to identify who works the most or whether the task was equally distributed". Another teacher suggested the following: "It would be nice if the students' comments could be analyzed at the level of emotions as well". One teacher had a slightly reluctant reaction about the recreation of student work: "I think it is still ambiguous what the emotions reflect in the transcript; however, the actions performed using the tabletop could help me in the recreation of the work".
The validation stage of this work points to a promising research path. Teachers were mostly positive about the potential of the multimodal transcript to support group-work evaluation, beyond actions and scores. Teachers valued the fact that emotions were present in the transcript. They thought that the mix of this feature with voice and task would support the inference of interactions between the members of the groups. Nevertheless, this work is an on-going project that needs to further explore how to expand the meaning of emotions in the task, as well as the reactions between the members of the groups after certain enacted emotions.
REFERENCES
Ädel, A. (2011). Rapport building in student group work. Journal of Pragmatics, 43(12), 2932-2947. http://dx.doi.org/10.1016/j.pragma.2011.05.007
Adachi, H., Haruna, A., Myojin, S., & Shimada, N. (2015). ScoringTalk and watching meter: Utterance and gaze visualization for co-located collaboration. SIGGRAPH Asia 2015.
Al-Qaraghuli, A., Zaman, H. B., Ahmad, A., & Raoof, J. (2013, November). Interaction patterns for assessment of learners in tabletop based collaborative learning environment. In Proceedings of the 25th Australian Computer-Human Interaction Conference (pp. 447-450). ACM.
Bezemer, J., & Mavers, D. (2011). Multimodal transcription as academic practice: A social semiotic perspective. International Journal of Social Research Methodology, 14(3), 191-206.
Cao, Z., Simon, T., Wei, S. E., & Sheikh, Y. (2016). Realtime multi-person 2D pose estimation using part affinity fields. arXiv preprint arXiv:1611.08050.
Domínguez, F., Chiluiza, K., Echeverria, V., & Ochoa, X. (2015). Multimodal selfies: Designing a multimodal recording device for students in traditional classrooms. In Proceedings of the 2015 ACM on Intl. Conf. on Multimodal Interaction (pp. 567-574). ACM.
Echeverria, V., Falcones, G., Castells, J., Martinez-Maldonado, R., & Chiluiza, K. (2017). Exploring on-time automated assessment in a co-located collaborative system. Paper presented at the 4th Intl. Conf. on eDemocracy and eGovernment, ICEDEG 2017, 273-276.
Frisby, B. N., & Martin, M. M. (2010). Instructor-student and student-student rapport in the classroom. Communication Education, 59(2), 146-164.
Harrigan, J. A., Oxman, T. E., & Rosenthal, R. (1985). Rapport expressed through nonverbal behavior. Journal of Nonverbal Behavior, 9(2), 95-110.
Heslop, P., Preston, A., Kharrufa, A., Balaam, M., Leat, D., & Olivier, P. (2015). Evaluating digital tabletop collaborative writing in the classroom.
Kharrufa, A., Balaam, M., Heslop, P., Leat, D., Dolan, P., & Olivier, P. (2013). Tables in the wild: Lessons learned from a large-scale multi-tabletop deployment. Conf. on Human Factors in Computing Systems, 1021-1030.
Martinez-Maldonado, R., Collins, A., Kay, J., & Yacef, K. (2011). Who did what? Who said that? Collaid: An environment for capturing traces of collaborative learning at the tabletop. ACM Intl. Conf. on Interactive Tabletops and Surfaces, ITS 2011, 172-181.
Martinez-Maldonado, R., Kay, J., & Yacef, K. (2013). An automatic approach for mining patterns of collaboration around an interactive tabletop. 10.1007/978-3-642-39112-5_11
Martinez-Maldonado, R., Kay, J., Buckingham Shum, S. J., & Yacef, K. (2017). Collocated collaboration analytics: Principles and dilemmas for mining multimodal interaction data. Human-Computer Interaction.
Schneider, B., Sharma, K., Cuendet, S., Zufferey, G., Dillenbourg, P., & Pea, R. (2016). Detecting collaborative dynamics using mobile eye-trackers. Proceedings of International Conference of the Learning Sciences, ICLS, 1, 522-529.
Tang, A., Pahud, M., Carpendale, S., & Buxton, B. (2010). VisTACO: Visualizing tabletop collaboration. In ACM International Conference on Interactive Tabletops and Surfaces (pp. 29-38). ACM.
Wong-Villacres, M., Ortiz, M., Echeverría, V., & Chiluiza, K. (2015). A tabletop system to promote argumentation in computer science students. In Proceedings of the 2015 Intl. Conf. on Interactive Tabletops & Surfaces (pp. 325-330). ACM.