Post on 16-Jan-2016

Page 1: Modelling and Analyzing Multimodal Dyadic Interactions Using Social Networks

Modelling and Analyzing Multimodal Dyadic Interactions

Using Social Networks

Sergio Escalera, Petia Radeva, Jordi Vitrià, Xavier Barò and Bogdan Raducanu

Outline

1. Introduction

2. Audio-visual cue extraction and fusion

3. Social Network extraction and analysis

4. Experimental Results

5. Conclusions and future work

1. Introduction

- Social interactions play a very important role in people’s daily lives.

- Present trend: analysis of human behavior based on electronic communications: SMS, e-mails, chat

- New trend: analysis of human behavior based on nonverbal communication: social signals

- Quantification of social signals represents a powerful cue to characterize human behavior: facial expression, hand and body gestures, focus of attention, voice prosody, etc.

Social Network Analysis (SNA) has been developed as a tool to model social interactions in terms of a graph-based structure:

- ‘Nodes’ represent the ‘actors’: persons, communities, institutions, etc.

- ‘Links’ represent a specific type of interdependency: friendship, familiarity, business transactions, etc.

A common way to characterize the information ‘encoded’ in a social network is to use several centrality measures.

Our contribution:

- In this work, we propose an integrated framework for the extraction and analysis of a social network from multimodal (A/V) dyadic interactions*

- Its advantage is that it relies on entirely non-intrusive technology

- First: we perform speech segmentation through an audio/visual fusion scheme

- In the audio domain, speech is detected by clustering audio features

- In the visual domain, speech is detected through differential-based feature extraction from the segmented mouth region

- The fusion scheme is based on stacked sequential learning

*We used a set of videos from the New York Times’ Blogging Heads opinion blog. Each video shows two persons talking about different subjects in front of a webcam

Block-diagram representation of our integrated framework

- Second: To quantify the dyadic interaction, we used the ‘Influence Model’, whose states encode previously integrated audio-visual data

- Third: The Social Network is extracted based on the estimated influences* and its properties are characterized based on several centrality measures.

* The use of the term ‘influence’ is inspired by the previous work of Choudhury:

T. Choudhury, 2003. “Sensing and Modelling Human Networks”, Ph.D. Thesis, MIT Media Lab

2. Audio-visual cue extraction and fusion

• Audio cue
  – Description
    • First 12 MFCC coefficients
    • Signal energy
    • Temporal cepstral derivatives (Δ and Δ²)
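The temporal derivatives listed above can be sketched as follows. This is a minimal illustration, assuming the common regression-based delta formula with a two-frame context; the slides do not state which formula or context width was used, and the function names are hypothetical. It takes 13 static values per frame (12 MFCC plus energy) and appends Δ and Δ², giving 39 values.

```python
def deltas(frames, n=2):
    """Regression-based deltas over a list of equal-length frames.

    The context width n=2 is an assumption, not a value from the slides.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for k in range(1, n + 1):
                # Clamp indices so edge frames reuse the first/last frame.
                nxt = frames[min(t + k, num_frames - 1)][d]
                prv = frames[max(t - k, 0)][d]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def add_dynamic_features(static):
    """Append delta and delta-delta to each static frame (13 -> 39 values)."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]
```

For a feature that grows by one unit per frame, the delta of an interior frame is 1.0, which is a quick sanity check on the formula.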

• Audio cue
  – Diarization process
    • Segmentation
      – Coarse segmentation according to the Generalized Likelihood Ratio between consecutive windows
    • Clustering
      – Agglomerative hierarchical clustering with a BIC stopping scheme
    • Segment boundaries are adjusted at the end
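The BIC stopping scheme mentioned above can be illustrated with a ΔBIC test between two candidate clusters. This is a sketch under simplifying assumptions: each cluster is modelled by a single one-dimensional Gaussian (the real features are multi-dimensional, where log-determinants of covariance matrices replace the log-variances), and `penalty_weight` is a hypothetical tuning parameter. Agglomerative clustering would keep merging the closest pair and stop once every remaining pair has a positive ΔBIC.

```python
import math


def _log_var(xs):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.log(max(var, 1e-12))


def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """Positive: two Gaussians fit better, keep the clusters apart.

    Zero or negative: one Gaussian is enough, so the clusters may be merged.
    """
    n_a, n_b = len(seg_a), len(seg_b)
    n = n_a + n_b
    merged = list(seg_a) + list(seg_b)
    gain = 0.5 * (n * _log_var(merged)
                  - n_a * _log_var(seg_a)
                  - n_b * _log_var(seg_b))
    # A second 1-D Gaussian adds two parameters (mean and variance).
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    return gain - penalty
```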

• Visual cue
  – Description:
    • Face segmentation based on the Viola-Jones detector
    • Mouth region segmentation
    • Vector of HOG descriptors for the mouth region

• Visual cue
  – Classification:
    • Non-speech class modelling
    • One-class Dynamic Time Warping based on the following dynamic programming equation

• Fusion scheme
  – Stacked sequential learning (suitable for problems characterized by long runs of identical labels)
    • Fusion of audio-visual modalities
    • Determining temporal relations of both feature sets for learning a two-stage classifier (based on AdaBoost)
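The core step of stacked sequential learning is building the extended feature set for the second-stage classifier: each frame's original features are augmented with the first-stage predictions of its temporal neighbours. The sketch below shows only that step, assuming a symmetric window and edge clamping; the window size and the function name are hypothetical, and the two classifiers themselves (AdaBoost in the slides) are not included.

```python
def extend_features(features, first_stage_scores, half_window):
    """Append neighbouring first-stage scores to each frame's features.

    features: one feature list per frame (audio and visual values together).
    first_stage_scores: one first-stage prediction per frame.
    half_window: number of neighbours taken on each side (an assumption).
    """
    extended = []
    last = len(first_stage_scores) - 1
    for t, feats in enumerate(features):
        context = [
            # Clamp at the sequence edges so every frame gets a full window.
            first_stage_scores[min(max(t + k, 0), last)]
            for k in range(-half_window, half_window + 1)
        ]
        extended.append(list(feats) + context)
    return extended
```

Because speech labels come in long runs, the neighbouring predictions give the second stage direct evidence about the label of the current frame.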

3. Social Network extraction and analysis

- The Influence Model (IM) is a tool introduced to quantify interacting processes using a coupled Hidden Markov Model (HMM)

- In the case of social interaction, the states of the IM encode automatically extracted audio-visual features

Influence Model Architecture

parameters represent the ‘influences’
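In the usual formulation of the Influence Model, the next-state distribution of one chain is a convex combination of pairwise transition probabilities, weighted by the influence parameters. The sketch below assumes that formulation and a two-state encoding (0 = silent, 1 = speaking); the variable names and the numbers in the usage note are illustrative, not values from the experiments.

```python
def next_state_distribution(influences_on_i, pairwise_transitions, previous_states):
    """Distribution of chain i's next state under the influence model.

    influences_on_i[j]: influence of chain j on chain i (non-negative, sums to 1).
    pairwise_transitions[j][s_prev]: distribution of chain i's next state
        given that chain j was in state s_prev.
    previous_states[j]: previous state of chain j.
    """
    num_states = len(pairwise_transitions[0][0])
    dist = [0.0] * num_states
    for j, alpha in enumerate(influences_on_i):
        row = pairwise_transitions[j][previous_states[j]]
        for s in range(num_states):
            dist[s] += alpha * row[s]
    return dist
```

For example, with influences `[0.7, 0.3]`, transitions `[[[0.9, 0.1], [0.2, 0.8]], [[0.4, 0.6], [0.5, 0.5]]]` and previous states `[1, 0]`, the result is 0.7 × [0.2, 0.8] + 0.3 × [0.4, 0.6] = [0.26, 0.74].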

- The construction of the Social Network is based on the estimated ‘influence’ values

- A directed link between two nodes A and B (designated by A → B) implies that ‘A has influence over B’

- The SNA is based on several centrality measures:

  - degree centrality (indegree and outdegree): the number of direct connections with other persons

  - closeness centrality: how easily two persons can communicate

  - betweenness centrality: the relevance of a person acting as a ‘bridge’ between two sub-groups of the network

  - eigenvector centrality: the importance of a person in the network
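Two of the measures above, degree and closeness centrality, can be sketched on a small directed graph built from influence values. The rule used here to create a link (influence above a fixed threshold) is an assumption made for illustration, as are the function names; the slides do not state how the influence values are turned into links.

```python
from collections import deque


def build_graph(influences, threshold):
    """influences maps (a, b) to the influence of a over b; link a -> b if above threshold."""
    graph = {}
    for (a, b), value in influences.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if value > threshold:
            graph[a].add(b)
    return graph


def degree_centrality(graph):
    """Return (indegree, outdegree) dictionaries keyed by node."""
    out_deg = {node: len(targets) for node, targets in graph.items()}
    in_deg = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            in_deg[target] += 1
    return in_deg, out_deg


def closeness_centrality(graph, source):
    """Reachable nodes divided by the sum of shortest-path lengths from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0
```

With influences A→B = 0.8, B→A = 0.2, B→C = 0.7, C→B = 0.3 and a threshold of 0.5, the graph is A → B → C: A has outdegree 1 and indegree 0, and its closeness is 2/3.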

4. Experimental results

- We collected a subset of videos from the New York Times’ Blogging Heads opinion blog

- We used 17 videos from 15 persons

- The videos show two persons having a conversation in front of their webcams on different topics (politics, economy, …)

- The conversations are informal, and frequent interruptions sometimes occur

Snapshot from a video

- Audio features
  - The audio stream was analyzed using sliding windows of 25 ms with an overlap factor of 50%
  - Each window is characterized by 13 features (12 MFCC + E), complemented with Δ and Δ²
  - The shortest length of a valid audio segment was set to 2.5 ms

- Video features
  - 32 oriented features (corresponding to the mouth region) were extracted using the HOG descriptor
  - The length of the DTW sequences was set to 18 frames (which corresponds to 1.5 s)

- Fusion process
  - Stacked sequential learning was used to fuse the audio-visual features
  - AdaBoost was chosen as the classifier
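The audio windowing described above (25 ms windows with 50% overlap) can be sketched as a simple framing function. The sampling rate is not given in the slides, so the 16 kHz used in the usage note is an assumption, and the function name is hypothetical.

```python
def frame_signal(samples, sample_rate, window_ms=25.0, overlap=0.5):
    """Split a sample sequence into fixed-length overlapping windows.

    Trailing samples that do not fill a whole window are dropped.
    """
    window = int(round(sample_rate * window_ms / 1000.0))
    hop = max(1, int(round(window * (1.0 - overlap))))
    return [samples[start:start + window]
            for start in range(0, len(samples) - window + 1, hop)]
```

At an assumed 16 kHz, each window holds 400 samples and consecutive windows start 200 samples apart, so a 1000-sample signal yields 4 windows.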

Visual and audio-visual speaker segmentation accuracy

The extracted social network showing participants’ label and influence directions

Centrality measures table

5. Conclusions and future work

- We presented an integrated framework for automatic extraction and analysis of a social network from implicit input (multimodal dyadic interactions), based on the integration of audio/visual features.

- In the future, we plan to extend the current work to study social interactions at a larger scale and in different scenarios

- Starting from the premise that people's lives are more structured than they might seem a priori, we plan to study long-term interactions between persons, with the aim of discovering underlying behavioral patterns present in our day-to-day existence