cognitive load measurement using speech/linguistic features

From imagination to impact

Using Information to Drive Decisions

Cognitive Load Measurement using Speech/Linguistic Features

Dr. Fang Chen

NICTA Copyright

2010


Outline

• Background

• Research Applications

• Speech and Language Analyses

• Data Sets:

– Reading Experiment

– Touch-table Collaborative Experiment

– Bushfire Study

– Driving Experiment


Background

• Cognitive load (CL): refers to the mental demand imposed on working memory by a particular task.

• Working Memory: limited capacity for holding information in mind in the context of cognitive activity.

• Cognitive Load Theory: development of instructional methods for effective use of people's limited cognitive processing capacity.

Research Aims

• Overall:

– Identification of potential indices of cognitive load for

• real-time,

• objective,

• non-intrusive

measurement of cognitive load.

• Specific to this research:

– Identification of potential linguistic and grammatical features of cognitive load.

Need for CL Measurement

• Overloading or underloading of cognitive

processing:

– degradation of performance, and/or

– failures of learning and performing, and/or

– source of performance errors.

• CL measurement is crucial for:

– minimising the amount of cognitive effort required,

– maintaining the right level of CL,

– achieving adaptive system response,

– improving user performance.


Cognitive Load Measures

• Subjective measures – e.g. self-reporting – manual, post-task, time-consuming, intrusive.

• Physiological measures – e.g. eyes, brain, skin biosensors – sensitive, signal noise, intrusive, lot of complex equipment

• Performance measures – e.g. error rate, task performance – dual tasks

• Behavioral measures – e.g. speech, pressure mouse – can be automatic, non-intrusive

Research Applications

• Designing intelligent adaptive user interfaces for intensive

working/interaction environments.

– Emergency services e.g. Bushfire Cooperative Research Centre

– Road traffic control services e.g. Roads and Traffic Authority (RTA)

• Other potential areas:

– Call centers

– Air traffic control rooms

– Pilot cockpits

– Online education / e-learning

– … and so on.


Speech and Linguistic Measures

• Why Speech?

– Sensitivity in the speech modality shown by prior art.

– Non-intrusive, easy to collect e.g. phone calls, conversations

– Objective measure, not easily manipulated by the user

– Real-time analysis is possible (for some speech signal features)

– Widely available, in a number of application scenarios

• What measures?

– Pauses and response latency

• Pausing differently under different conditions.

– Language and word usage

• Using particular words and/or phrases at specific sentence and/or paragraph

positions;

– Grammar features and structures

• Using particular types of linguistic/grammatical categories;

• Using a particular type of syntax or grammatical structure i.e. usage of parts of speech and their forms.

Experiment Setup

• A user study with two controlled levels of cognitive load

– Elicit natural speech from users

• A reading and comprehension task

– General knowledge (avoid the expertise effect)

– Reading the extract

– Answer open-ended questions

• Give a short summary of the story in at least five whole sentences.

• What was the most interesting point in this story?

• Describe at least two other points highlighted in this story.

Example extract: The Sun

The Sun has "burned" for more than 4.5 billion years and will continue to do so for several billion more. It is a massive collection of gas, mostly hydrogen and helium. Because it is so massive, it has immense gravity, enough gravitational force to hold all of hydrogen and helium together (and to hold all of the planets in their orbits around the Sun!). The Sun does not "burn" like wood burns – it is a gigantic nuclear reactor….

Story-reading Experiment

• Experimental setup

– Story reading followed by Q&A

– 3 different levels of text difficulty (Lexile Framework for Reading,

www.lexile.com)

– 3 stories in each of the 2 sessions (fixed order)

• 1st session: “Sleep” (900L), “History of Zero” (1350L) & “Milky Way Galaxy” (1400L)

• 2nd session: “Smoke Detectors” (950L), “Hurricanes” (1250L) & “The Sun” (1300L)

• 5 minutes break between sessions

– Dual-task for “Milky Way Galaxy” & “The Sun”

• Counting of background spoken numbers while reading the stories and

answering the questions

Experiment Setup

• Cognitive load level design

– Lexile Framework for Reading (200L 1st grade, 1700L grad)

• Syntactic and semantic complexity, vocabulary

– Text with same difficulty for both conditions

– Aural dual task, counting numbers during reading and answering

• Participants

– 15 native English speakers as subjects (8 females and 7 males)

Task Load Level   Lexile Rating   Dual Task
Low               1300L           No
High              1300L           Yes

Reading Experiment Data – Pause Analysis


Pause Analysis – Results Summary

*p<0.05, n=24.


Touch-table Collaboration Study - Lab Data

• Collaborative tasks using multi-touch tabletop screen.

• Interactive Firefighting tasks.

• 10 groups x 4 members = 40 subjects + (1 Pilot group)

– 30 Commanders + 10 Leaders

– 39 subjects data available (1 leader’s data missing)

• Speech Transcriptions using ELAN.

• Extracted and cleaned for LIWC and other analysis tools.

• Analysis completed:

– Subjective Ratings

– Grammar features - Pronouns

– Word Category Features

– Language Complexity Features


Touch-table Study Design


Lab Data – Some Hypotheses

• Higher subjective ratings under high load task.

• More speech and longer sentences.

• More and longer pauses under high load task.

• More use of:

– Negative emotion words, inclusive words, swear words, cognitive and perceptive phrases, disagreement words etc.

• Less use of:

– Positive emotion words, agreement, certainty, achievement words

• More hesitations and incomplete sentences

• More use of plural pronouns and less use of singular ones.

• More complex sentences under high load task.


Lab Data – Subjective Ratings


Lab Data – Linguistic Analysis (Words)


Lab Data – Linguistic Analysis (Pronouns)

• Singular pronouns decrease

• Plural pronouns increase
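
One way singular and plural pronoun rates could be computed from a cleaned transcript is sketched below. This is a minimal illustration only: the word lists (first-person forms, for brevity) and the function name `pronoun_rates` are assumptions, not the LIWC categories used in the study.

```python
import re

# Illustrative first-person pronoun lists (assumption, not LIWC's dictionaries).
SINGULAR = {"i", "me", "my", "mine", "myself"}
PLURAL = {"we", "us", "our", "ours", "ourselves"}


def pronoun_rates(transcript: str) -> dict:
    """Return singular and plural pronoun use as a percentage of all words."""
    words = re.findall(r"[a-z']+", transcript.lower())
    total = len(words)
    if total == 0:
        return {"singular": 0.0, "plural": 0.0}
    singular = sum(w in SINGULAR for w in words)
    plural = sum(w in PLURAL for w in words)
    return {"singular": 100 * singular / total, "plural": 100 * plural / total}
```

Comparing these two rates between the low-load and high-load transcripts of the same speaker would show whether singular use falls while plural use rises.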


Lab Data – Linguistic Analysis (Pronouns)

• Interaction between Singular and Plural Personal Pronouns


Lab Data – Language Complexity Analysis

• Language complexity measures

• Measured by two major factors:

– Semantic difficulty: observes the use of words, their frequencies, and their lengths (both in syllables and in characters).

– Syntactic complexity: observes primarily the sentence length, which is considered the best indicator of text or language complexity.

• Hypotheses

– Language Complexity increases

– Lexical Density decreases

• Lexical Density (Vocabulary Richness) – expected to decrease

• Hard (Complex) Word Ratio – expected to decrease

• Gunning Fog Index – expected to increase

• Flesch-Kincaid Grade – expected to increase

• SMOG Grade – expected to increase

• Lexile Level – expected to increase

Lexical Density is the estimated measure of content per functional and lexical units or lexemes in total text. In simple words, it is a measure of the ratio of unique words to the total number of words.

Lexical Density = (different words / total words) x 100

A word is considered complex or hard if it has three or more syllables and does not contain a hyphen (-). For example, the word ‘density’ has three syllables.

Complex Word Ratio is the measure of the ratio of complex words to the total number of words.

Gunning Fog Index calculates the syntactic complexity of language using sentence lengths and complex words and implies that short and simple sentences in plain English achieve a better score (lower value) than long sentences in complicated language.

Gunning Fog Index = 0.4 x (ASL + ((SYW / words) x 100))

Where:

ASL = Average sentence length (the number of words divided by the number of sentences)

SYW = Number of words with three or more syllables

Flesch-Kincaid Grade calculates the language difficulty using average sentence lengths and average syllables per word. It estimates the number of years of education required to understand the written or transcribed text.

Flesch-Kincaid Grade = (0.39 x ASL) + (11.8 x ASW) – 15.59

Where:

ASL = Average sentence length (the number of words divided by the number of sentences)

ASW = Average number of syllables per word (the number of syllables divided by the number of words)

The SMOG Grade also estimates the number of education years needed to fully comprehend the text. It uses sentences and complex words to calculate it. The emphasis on full comprehension distinguishes this measurement from other complexity measures.

SMOG Grade = square root of ((SYW / sentences) x 30) + 3

Where:

SYW = Number of words with three or more syllables

Lexile Level also measures the comprehension complexity of any text. A Lexile measure is the numeric representation of a text’s difficulty ranging from 200L for easy to above 1700L for complicated texts. It uses mean sentence lengths and mean log word frequency to calculate it.
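
The formulas above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the tool used in the study: the syllable counter is a rough vowel-group heuristic, sentences are split on `.`, `!` and `?`, and the names `count_syllables` and `complexity_measures` are hypothetical. Lexile Level is omitted because it needs word-frequency data that the slides do not supply.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def complexity_measures(text: str) -> dict:
    """Compute the slide's measures for a non-empty text with at least one sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)
    n_words, n_sentences = len(words), len(sentences)
    syllables = [count_syllables(w) for w in words]
    # SYW: words with three or more syllables that contain no hyphen.
    syw = sum(1 for w, s in zip(words, syllables) if s >= 3 and "-" not in w)
    asl = n_words / n_sentences      # average sentence length
    asw = sum(syllables) / n_words   # average syllables per word
    return {
        "lexical_density": len({w.lower() for w in words}) / n_words * 100,
        "complex_word_ratio": syw / n_words,
        "gunning_fog": 0.4 * (asl + (syw / n_words) * 100),
        "flesch_kincaid": 0.39 * asl + 11.8 * asw - 15.59,
        "smog": math.sqrt((syw / n_sentences) * 30) + 3,
    }
```

Running `complexity_measures` on the low-load and high-load transcripts of the same speaker gives one value per measure and condition, which can then be compared against the expected directions listed above.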


Bushfire Data - Introduction

• Speech and transcription data from Bushfire CRC.

• Training exercises – four states (TAS, VIC, NSW, and QLD).

• Three roles: Incident Controller (IC), Planning, Operations.

• 11 exercises, 33 subjects

• All exercises monitored by bushfire management experts.

• Operators co-located in a control room and trained for roles.

• Data collection, transcription, coding, cleaning, analyses.

• Four different load levels:

– (1) ‘low’: casual conversation, no time pressure;

– (2) ‘medium’: routine tasks;

– (3) ‘high’: challenging tasks, time constraints; and

– (4) ‘very high’: very challenging, lot of unexpected events and breakdowns.

• Combined into low and high.

Bushfire Data – Same Hypotheses

• Higher subjective ratings under high load task.

• More speech and longer sentences.

• More and longer pauses under high load task.

• More use of:

– Negative emotion words, inclusive words, swear words, cognitive and perceptive phrases, disagreement words etc.

• Less use of:

– Positive emotion words, agreement, certainty, achievement words

• More hesitations and incomplete sentences

• More use of plural pronouns and less use of singular ones.

• More complex sentences under high load task.


Bushfire Data – Linguistic Analysis (Words)


Bushfire Data – Linguistic Analysis (Pronouns)

• Singular pronouns decrease

• Plural pronouns increase


Bushfire Data – Linguistic Analysis (Pronouns)

• Interaction between Singular and Plural Personal Pronouns


Bushfire Data – Language Complexity Analysis


Other Linguistic Analysis Possibilities

• N-gram Analysis

• Bi-gram Ratio

• Others:

• Most common N-grams (Bigrams, Trigrams, 4-grams)

• Most common words (Unigrams)

• Most frequent or least frequent N-grams

• More…

• Parse Tree Analysis

– Order of n-grams

• For both – words and parts of speech.

[Chart: Bi-gram Ratio (Percent, 50% to 100%) by Load Level (1 to 4)]

                L1      L2      L3      L4      p
Bi-gram Ratio   93.5%   80.9%   79.4%   72.6%   0.0002

An Abstract CLM Model

• Automatic, Real-time, Non-intrusive


Looking at Data Sets

• Reading Experiment

• Touch-table Collaborative Experiment

• Bushfire Study

• Driving Study


Driving Study Data - Introduction

• Simulated Driving Experiment

• Investigate how the distractions can affect the performance of the user

• Identification of features to measure users’ cognitive load.

• 18 participants (8 females and 10 males)

• Data collected:

– Video (2 cameras, front and rear view)

• Eye gaze movement

– Audio

– Galvanic Skin Response (GSR) or skin resistance


Driving Study Data – Experiment Setup

• Big screen for game

• Front camera

• Simulator frame

• Wireless headset

• Bio-sensor (GSR)

• Speakers at back

• Rear Camera


Future Challenges

• Areas for future work

– Development of larger databases

– Task-dependent and task-independent features

• Need to take lab experiments ‘into the wild’

– Defining, researching and standardising tasks of interest

– Joint modeling of linguistic, speaker and cognitive load/emotion information

Exploring Multimodalities

Exploring Multimodality

• Hypothesis:

– Users are more likely to use complementary multimodal productions as cognitive load increases

– Users will tend to rely on one modality more as cognitive load increases

• Method:

– Wizard of Oz scenario: speech and gesture interface for a series of map-based tasks; tasks increase in difficulty by varying quantity of content and time pressure

– Conditions for Speech Only interaction, Gesture Only interaction and Multimodal

– Videotape participants, record audio, record answers, post-hoc introspection questionnaire

Multimodality and Cognitive Load

• Exploring Multimodal Interface Scenarios

– The recognisers in the interface will capture the user’s input, interpret the information and choose an appropriate response

– Opportunity to capture interaction data implicitly

[Diagram: inputs to Cognitive Load Analysis: User Characteristics, Task Characteristics, Visual Data, Audio Data, Physiological Data, Environmental Data, Other Modalities]

Experiment Design

• Task:

– Incident Management Response

E.g. A major accident on corner of X and Y.

– Operators are required to deploy necessary crews and implement policies and procedures

• Method:

– Elicit speech and free-hand gesture interface for a series of map based tasks;

– Wizard of Oz scenario

– Videotape participants, record audio, record answers, post-hoc introspection questionnaire

• Dependent Variables:

– Biosensor input: GSR and BVP

– Gesture: video footage

– Speech: transcribed manually

– Performance: latency, completion time & error-rates

– Multimodal productions: manual annotation

Examining Multimodal Input Structures


The Task

• There are 36 small tasks, divided into 3 groups of 12.

• Each group of 12 will consist of maps from 4 different cities:

• Each new task will be given to you at the top of the screen:

– e.g. There has been an accident on the corner of Victoria and Liverpool Street.

• The tasks will be carried out using different modes:

– speech + gesture together,

– speech-only and

– gesture-only

The experimenter will tell you which mode you should be using for each task.

• The task will first require some visual search for information.

• There are only three things the system can do:

1. Zooming in and out of maps

2. Selecting map elements

3. Tagging map elements

The Task

[Screenshot of the interface with labelled areas: Toolbox, Task Description, Map, Information/Feedback Area]

Zooming Map Levels

Lower-level map

Contains selectable

elements; can zoom out

to higher level map


Top-level map

No selectable elements:

divided into four quadrants by

a dotted black line

Selectable Elements

• Selected elements will be shown with a blue border.

• Selectable element types:

– School

– Petrol Station

– Library

– Fire Station

– Shopping Centre

– Parking Station

– Intersection

– Hospital

– RTA Branch

– Church

Tagging Map Elements

Tagging is a two-step process:

1. Select map element

2. Tag as Accident, Incident or Event

Accident: e.g. car accident, fire, flooding

→ Green border

Event: e.g. concert, protest march, fun run

→ Red border

Incident: occurrence that might cause a disruption to the traffic, e.g. broken-down car, or a traffic jam in peak hour

→ Yellow border

Info: Information area beneath the map

Clear: Clears all tags for selected element

Special Tag: Notifying

Two parts: The element and the recipient need to be specified.

• Select map element (e.g. Intersection, marked as accident)

• Select NOTIFY action

→ PINK tag appears

• Select the recipient map element (RTA Branch, Fire Station…)

→ AQUA tag appears

Zooming

• 2 zoom levels

• Lower level maps have selectable elements

• Zoom in: 4 quadrants

• Zoom out

Top-level zoomable map (no selectable elements)

Lower-level map with selectable elements

The Modalities

• Speech

– Short and sweet

– No specific words, no specific word order

→ We only give some suggestions

– Speak clearly and loudly

Zooming     Zoom into the top right quadrant
            Top right quadrant
            Zoom in to top right
            Zoom out please

Selecting   Select the Church on Liverpool Street
            Church on Liverpool
            Please highlight the Church

Tagging     Make selected Church an accident (or incident or event) zone
            Selected Church. Accident.
            Accident.

The Modalities (2)

• Hand Gestures

– Pointing

– Hand shapes

Zooming     Point to quadrant and pause to select and zoom in.
            Point to diagonally opposite ends of map, pause to zoom out.

Selecting   Point to the element, pause until beep

Tagging     Very clear hand shape (fist, flat palm, scissors, thumbs-up)
            OR
            Point to button in toolbox, pause to select

The Modalities (3)

• Multimodal

– Speech + gesture

– Any order or combination

– Speech only or gesture only are OK

– Examples:

• “Make this into an accident” + pointing at element

• “Zoom into this quadrant” + pointing at quadrant

• “Zoom out again”

Research Design

Balancing Available Modalities

• The traffic incident management (TIM) domain was used, and subjects were required to update a geographical map with traffic conditions information. Following our requirement, tasks were achievable using the following modalities:

– Gesture:

• Deictic pointing to map locations, items, and function buttons;

• Circling gestures for zoom functions.

– Hand Shapes: Predefined hand shapes for item tagging: fist, open palm, thumbs up etc.

– Speech: street names, actions etc.

• A large overlap was introduced across modal ways of performing actions. However, some tasks required the combination of modalities.

Task Design

• Task Specification

– Task was given in written mode

– Users had freedom of inspection

– The task described a situation, but did not specify activities, e.g. “An incident has occurred: a truck has lost some of its load at Walter Avenue and Lytton Road, near Mowbray Park”

• Task Activities

– Locate point of interest on the map

– Mark with one of 3 tags: accident, incident or event

– Notify relevant authorities, e.g. if casualties exist, notify a hospital.

– 11 different kinds of functionality available

Task Difficulty Level Design

• There were four levels of cognitive load, and three tasks were completed for each level.

• The same visual was used for each level to avoid differences in visual complexity.

• The tasks varied in load through:

– The number of distinct entities in the task description;

– The number of distractors (items not needed for the task);

– The minimum number of actions required for the task.

– Further load was achieved in Level 4 by introducing a time limit.

Level   Entities   Actions   Distractors   Time
1       6          3         2             ∞
2       10         8         2             ∞
3       12         13        4             ∞
4       12         13        4             90 sec.

Available Modalities

• The Modalities

– Aimed to capture natural patterns of speech and gesture combinations

– Speech: natural spoken language ‘recognised’ by an operator

• Avoids bias injected by errors in recognisers

– Gesture: automated hand tracking

• Untethered: no equipment used on the person

• Both tracking of the hand and hand shapes used

• Buttons added to reduce expressivity gap between gesture and speech

– Either or both could be used for each command

Input          Speech       Gesture
Select         “Select”     Point
Zoom           “Zoom”       Circling
Notify         “Notify”     Thumbs up
Tag Accident   “Accident”   Fist
Tag Incident   “Incident”   Open Palm
Tag Event      “Event”      Scissors

Example of Interaction

System Functionality: Example of Interaction

• Selecting a location/item of interest: <Point at location>; or “St Mary’s Church”

• Zooming in or out of a map: <Point at quadrant>; or “Zoom in to the top right quadrant”

• Tagging a location of interest with an ‘accident’, ‘incident’ or ‘event’ marker: <Select location> and: “Incident”; or Scissors shape

• Notifying a recipient (item) of an accident, incident or an event: <Select accident> and “notify”; or fist shape and <Select recipient>

• Starting or ending a task: “End task”; or <Point at End task button>

Wizard of Oz

[Diagram of the Wizard of Oz setup; labelled components: Wizard, Main computer, Firewire camera, two Camcorders, AGR]

Data Captured

• The study generated various streams of data that were captured as follows:

– Speech was orthographically transcribed, including specific tags for disfluencies such as false starts and hesitations. Start and end times were annotated for each utterance;

– Hand motion was captured by the automatic gesture recogniser at the rate of 20 frames per second. Positions are relative to the camera view angle;

– Deictic pointing (pause while pointing, or circling) and hand shapes were annotated at two levels: the video was annotated to mark the start and end time of the overall motion leading to the gesture.

– System feedback to the user such as task change (marked by a beep), item information, or error messages was recorded with its time of occurrence;

– Bio-sensor data was recorded at the rate of 100 points per second. Skin conductance is measured in microsiemens (µS) while blood volume pulse only provides relative measures expressed as a percentage.

Sample of Annotation

Turn                   Construction   Modality   Content
Mark an Incident (A)   Select (a)     Gesture    [point to St Mary’s Church]
                                      Speech     “Select St. Mary’s Church”
                       Tag (a)        Shape      [scissors=Incident]
                                      Speech     “Incident”
Mark an Accident (C)   Select (c)     Speech     “Select Crown Street Library”
                       Tag (c)        Shape      [fist=Accident]
Mark an Event (B)      Select (b)     Speech     “Select”
                                      Gesture    [point to Collingwood School]
                       Tag (b)        Shape      [open_palm=Event]

Results and Analysis

• Users: 15 available

• Total inputs: 1119

• Total turns: 394 (206 MM)

• Total constructions: 644

• Average difficulty rating for levels (subjective)

→ Level 1 (easiest): 2/10

→ Level 2: 4.2/10

→ Level 4 (hardest): 5/10

• Redundancy and Complementarity:

– Each user command in the system requires an action and an object

• Speech and/or

• Gesture-HandShape

• Redundancy

– Doubling up of either action or object information or both

          Action   Object
Speech    √        √
Gesture   √        √

• Complementarity

– Action and object come through different modalities

Action Object

Speech √

Gesture √
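
The redundancy and complementarity definitions above can be expressed as a small classification rule. The sketch below is an illustration only: it assumes each annotated turn has already been reduced to the set of semantic slots (`"action"`, `"object"`) carried by each modality, and the function name `classify_turn` and the "partially redundant" rule are assumptions rather than the study's annotation scheme.

```python
def classify_turn(speech: set, gesture: set) -> str:
    """Classify a turn from the slots ("action", "object") each modality conveys."""
    if not speech or not gesture:
        return "unimodal"              # only one modality was used
    shared = speech & gesture          # slots conveyed by both modalities
    if not shared:
        return "purely complementary"  # each slot arrives through one modality only
    if speech == gesture:
        return "purely redundant"      # every conveyed slot is doubled up
    return "partially redundant"       # some slots doubled up, others not
```

For example, speaking “Select St Mary’s Church” while pointing at the church would be classified as purely redundant if the pointing gesture is annotated as carrying both the select action and the object.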

Rates of Redundancy

• Redundancy:

– Conveying the same information over more than one modality,

– Either would be sufficient on its own

Turn             Const    Modality     Content
Pure Redundant   Select   Gesture      [point to St Mary’s Church]
                          Speech       “Select St. Mary’s Church”
                 Tag      Hand_Shape   [scissors=Incident]

• We found a statistically significant decrease in the number of purely redundant turns, from

– 62.91% in Level 1 to

– 29.9% in Level 4 of all multimodal turns.

[Chart: Proportion of Purely Redundant turns by Level (Level 1, Level 2, Level 4); legend: Min, Q1, Mean, Q3, Max]

[Chart: Redundancy by level (Level 1, Level 2, Level 4); series: Purely redundant, Partially redundant, Purely complementary]

We observed a steady decrease in redundancy as task difficulty increased. A between-users ANOVA across levels shows significant differences between the means (F = 3.88, df = 2; p < 0.05).

Rates of Complementarity

• Complementarity:

– Conveying different information over different modalities, e.g.

Turn              Action   Modality     Content
Pure Complement   Select   Speech       “Select St Mary’s Church”
                  Tag      Hand_Shape   [scissors=Incident]

• We also found trends of increased multimodal complementarity across levels:

– 12.86% in Level 1

– 45.53% in Level 2, and

– 36.02% in Level 4

Cognitive and Working Memory Theories

• Why?

Reduced level of redundancy + increased level of complementarity suggests a specific working memory strategy

• Modal Model of Working Memory [Baddeley, 92]

[Diagram: Central Executive, Phonological Loop, Visual-Spatial Sketchpad]

• Working Memory Strategies:

– Activity is shifted to areas marked exclusively for modal use

– At high load, users try to maximise the usage of modal working memory

– Users channel the required semantic chunks to different modalities, with the least amount of replication possible

Discussion and Challenges

• Results:

– The results of this study give initial evidence for redundancy/complementarity as a behavioural symptom of the cognitive load management employed by users

• Sensitivity and Diagnosticity:

– ‘Ceiling’ values for rates of redundancy or complementarity

– Clearly not suitable for all users

• Automatic cognitive load estimation:

– A compound measure

– Various individual modal measurements for robustness

– Weighting of features on a per-user basis

• more reliable indices will influence a combined measure more strongly
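
The per-user weighting idea above could be sketched as a weighted mean. This is a hypothetical illustration, not a model from the study: it assumes each feature has already been turned into a load estimate between 0 and 1, and the feature names, weights and the function name `combined_load` are invented for the example.

```python
def combined_load(indices: dict, weights: dict) -> float:
    """Weighted mean of per-feature load estimates, each assumed scaled to 0..1."""
    total_weight = sum(weights[name] for name in indices)
    if total_weight <= 0:
        raise ValueError("at least one feature needs a positive weight")
    # Features with larger per-user weights pull the combined value more strongly.
    return sum(indices[name] * weights[name] for name in indices) / total_weight
```

With `indices = {"pause": 0.8, "pronoun": 0.4}` and per-user `weights = {"pause": 3.0, "pronoun": 1.0}`, the pause feature dominates the combined value.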
