Presented By: Karan Parikh knparikh@usc.edu — Towards the Automated Social Analysis of Situated Speech Data — Wyatt, Choudhury, Bilmes, Kitts — CS546 Intelligent Embedded Systems


Page 1:

Presented By: Karan Parikh knparikh@usc.edu

Towards the Automated Social Analysis of Situated Speech Data

Wyatt, Choudhury, Bilmes, Kitts

CS546 Intelligent Embedded Systems

Page 2:

Overview

• Objective
• Difficulties faced in achieving the objectives
• Dataset Formation
• Privacy-Sensitive Speech Processing
• Analysis
• Application
• References

Page 3:

Objective

• An automated approach to studying fine-grained details of social interactions and relationships.
• Both the local behavior and the global structure of the group are to be analyzed.
• Conversation characteristics of a group of 24 people over 6 months are analyzed to study the relationship between conversational dynamics and network position.

Page 4:

Difficulties

• Requires more data than pure audio recording alone.

• Risk of capturing audio of uninvolved parties, which is unethical and, in many places, illegal.

• Risk of subjects changing their behavior if they know they are being recorded.

Page 5:

Dataset Formation

• Data was collected from a group of 24 graduate students attending a class.

• Each of them carried an MSB (Multi-Sensor Board) consisting of a tri-axial accelerometer, barometer, microphone, and digital compass.

• The MSB was connected to a PDA carried in a bag.

• Data was collected over a period of 9 months.

• Subjects also submitted monthly reports about their conversations with other subjects.

Page 6:

Privacy-Sensitive Speech Processing

• Speech can be modeled with 2 separate components:
  • the sound source generated by the vocal cords
  • the filter (mouth, nose, tongue)

• Speech sounds are of 2 types:
  • voiced, with a fundamental frequency
  • unvoiced, with no fundamental frequency

• Intonation, stress, and duration are described by changes in the fundamental frequency and in the energy of the speech.

Page 7:

Privacy-Sensitive Speech Processing

• Resonant peaks of the frequency response (formants) carry information about the phonemes, which form the basis of words.

• A minimum of 3 formants is required to reconstruct words. Once the formants are removed (or never recorded), the words cannot be reconstructed.

• Features that are useful for extracting information from the recorded speech:
  • non-initial maximum autocorrelation peak
  • number of autocorrelation peaks
  • spectral relative entropy

• Speech is segmented into frames at 60 Hz, known as voicing frames.

• For each frame, the relative entropy is calculated between the normalized power spectrum of the current voicing frame and a normalized running average of the power spectra of the last 500 voicing frames.

• Accuracy for detecting conversations ranges from 96.1% to 99.2%.
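As a rough illustration of the spectral relative entropy feature, a per-frame computation might look like the sketch below. This is a minimal NumPy example with hypothetical function names; the use of an FFT power spectrum and a KL-divergence-style relative entropy is an assumption for illustration, not the authors' exact implementation.

```python
import numpy as np

def normalized_power_spectrum(frame):
    """Power spectrum of one voicing frame, normalized to sum to 1."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    return spectrum / spectrum.sum()

def spectral_relative_entropy(frame, running_avg):
    """Relative entropy (KL divergence) between the current frame's
    normalized power spectrum and a running average of past spectra."""
    p = normalized_power_spectrum(frame)
    q = running_avg / running_avg.sum()
    eps = 1e-12  # guard against log(0) in empty bins
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

In practice the `running_avg` would be maintained over the last 500 voicing-frame spectra, as the slide describes, so that an abrupt spectral change (e.g. the onset of speech) produces a large entropy value.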

Page 8:

Privacy-Sensitive Speech Processing

• Around any detected speech, a subject is marked active for the preceding 20 seconds and the following 1 minute.

• If a subject is never marked active, that person is removed from the conversation.

• This is only a heuristic, intended to prevent false triggering of conversation recording.
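The activity-window heuristic can be sketched as follows: each detected speech instant marks the subject active for the preceding 20 s and the following 60 s, and overlapping windows are merged. This is a minimal sketch with hypothetical names, assuming speech detections arrive as timestamps in seconds.

```python
def active_intervals(speech_times, pre=20.0, post=60.0):
    """Merge per-utterance activity windows: each detected speech
    instant t marks the subject active on [t - pre, t + post]."""
    intervals = sorted((t - pre, t + post) for t in speech_times)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def is_active(merged, t):
    """True if time t falls inside any merged activity window."""
    return any(s <= t <= e for s, e in merged)
```

A subject whose merged intervals never overlap a candidate conversation would then be excluded from it.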

Page 9:

Analysis

• A network is constructed based on face-to-face interaction.

• An edge is created between 2 subjects when there is a conversation of 20 minutes or more between them.

• A network is thus formed from multiple such edges.

• We then examine whether the resulting network corresponds to the social relations reported in the survey and in the subjects' feedback.

• The average aggregate agreement with the network was 71.3%.
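The edge-creation rule (an edge once two subjects have accumulated 20 minutes or more of conversation) can be sketched as below. Function and variable names are hypothetical, for illustration only.

```python
from collections import defaultdict

def build_network(conversations, threshold_min=20.0):
    """conversations: iterable of (subject_a, subject_b, duration_minutes).
    Returns the set of undirected edges (as frozensets) between pairs
    whose total conversation time meets the threshold."""
    totals = defaultdict(float)
    for a, b, minutes in conversations:
        totals[frozenset((a, b))] += minutes  # undirected: {a,b} == {b,a}
    return {pair for pair, total in totals.items() if total >= threshold_min}
```

The resulting edge set can then be compared against the self-reported social relations from the monthly surveys.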

Page 10:

Analysis

Correlation between speaking style and strength of ties

• Assumption: people's normal behavior is shaped more by their regular interaction partners than by rare partners.

• Hypothesis: people change their way of speaking less when interacting with their strong ties.

• The time spent in conversation by persons i and j is estimated to test the hypothesis.

• 4 features of a subject's speaking style are measured:
  • rate
  • pitch
  • turn frequency
  • turn length

Page 11:

Analysis

• b_(i\j) – mean of i's speech feature while talking to everyone except j

• b_(i→j) – mean of i's speech feature while talking to j

• s_i – standard deviation of i's feature

• d_ij = |b_(i\j) − b_(i→j)| / s_i – the amount, in standard deviations, by which i's speech changes when in conversation with j

• c_ij – time spent by i and j in conversation

• The correlation between c_ij and d_ij can then be measured.

• A negative correlation means that the more two people talk with each other, the less they change their speaking style.

• The table on the next slide supports this hypothesis.
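The d_ij quantity can be sketched for one speech feature as below. This is a minimal illustration with hypothetical names, assuming per-conversation feature values are grouped by conversation partner; it is not the authors' code.

```python
import statistics

def style_change(feature_by_partner, j):
    r"""d_ij = |b_(i\j) - b_(i->j)| / s_i for one of i's speech features.

    feature_by_partner maps each partner of subject i to the list of
    i's per-conversation feature values while talking to that partner."""
    to_j = feature_by_partner[j]
    to_others = [v for k, vals in feature_by_partner.items()
                 if k != j for v in vals]
    s_i = statistics.stdev(to_j + to_others)   # std. dev. of i's feature
    b_without_j = statistics.mean(to_others)   # b_(i\j)
    b_to_j = statistics.mean(to_j)             # b_(i->j)
    return abs(b_without_j - b_to_j) / s_i
```

Computing d_ij for every partner j and correlating it with the conversation times c_ij (e.g. with a Pearson correlation) would then test whether more conversation goes with less style change.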

Page 12:

Analysis

Correlation between change in speech features and tie strength

Page 13:

Analysis

Correlation between speaking style and network centrality

• Assumption: people's normal behavior is shaped more by their regular interaction partners than by rare partners.

• Hypothesis: people change their way of speaking more when speaking with a person who is more central to the network.

• The length of the edge i→j is calculated as (1 − c_ij), where c_ij is the time spent in conversation between i and j.

• A person's centrality is the multiplicative inverse of the mean distance from i to all other nodes.

• With no conversation the edge length is infinite; the longer the conversation, the shorter the edge.

• d_ik is the change in feature for all k who speak with i. The higher this incoming mean, the more people change their style when talking to i.

• Thus the incoming mean of d_ik is correlated with the centrality of the subject.
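The centrality described above (inverse of the mean shortest-path distance, over edges of length 1 − c_ij) can be sketched with a standard Dijkstra search. This is an illustrative sketch with hypothetical names, assuming c_ij is normalized to [0, 1) so that edge lengths are positive; it is not the authors' implementation.

```python
import heapq
from collections import defaultdict

def closeness_centrality(edge_lengths, i):
    """edge_lengths: dict {(a, b): length} with length = 1 - c_ab,
    where c_ab is the normalized conversation time between a and b.
    Centrality of i = 1 / mean shortest-path distance from i."""
    graph = defaultdict(list)
    for (a, b), w in edge_lengths.items():
        graph[a].append((b, w))  # undirected graph: add both directions
        graph[b].append((a, w))
    # Dijkstra's algorithm from node i.
    dist = {i: 0.0}
    heap = [(0.0, i)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    others = [d for node, d in dist.items() if node != i]
    # 1 / mean(distance) == n / sum(distances)
    return len(others) / sum(others) if others else 0.0
```

Nodes with no conversation path to i simply never enter `dist`, which matches the slide's "infinite edge length" convention for pairs who never talk.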

Page 14:

Analysis

Correlation between change in speech features and tie strength

Page 15:

Application

• The applications go beyond sociology.

• Beyond just the identity of a caller, the content of the conversation can be noted and interruptibility can be predicted.

• Can be used in mobile phones to divert incoming calls or messages to voicemail during an important conversation.

• Can be used to study differences in conversational characteristics between men and women, or between people in different geographical regions.

Page 16:

Questions?

Page 17:

References

• [Donovan, 1996]

• http://www.mssociety.org.uk/

• Danny Wyatt, Tanzeem Choudhury, Henry Kautz, "Conversation Detection and Speaker Segmentation in Privacy-Sensitive Situated Speech Data"

• Jeff Bilmes, Danny Wyatt, Tanzeem Choudhury, Henry Kautz