brainnet : combining evidence from corpora and from the brain to study conceptual representations

BrainNet: Combining evidence from corpora and from the brain to study conceptual representations Massimo Poesio Uni Essex, Language & Computation Uni Trento, CIMEC/CLIC


Post on 26-Feb-2016



TRANSCRIPT

Page 1

BrainNet: Combining evidence from corpora and from the brain to study conceptual representations

Massimo Poesio

Uni Essex, Language & Computation

Uni Trento, CIMEC/CLIC

Page 2

COLLABORATORS

Essex: HEBA LAKANY (now Strathclyde), FRANCISCO SEPULVEDA

BRIAN MURPHY (CMU, was Trento)

Trento: ANDREW ANDERSON, YUQIAO GU, YUAN TAO, MARCO BARONI, GABRIELE MICELI

Page 3

MOTIVATIONS

Research on conceptual knowledge is carried out in Artificial Intelligence, Computational Linguistics, Neuroscience, and Psychology

But there is limited interchange between AI, CL and the other disciplines studying concepts, except indirectly through the use of WordNet

This line of research: use evidence from neuroscience, from (vector-space) models in CL, and from psychology to rethink the design of lexical repositories such as WordNet

Page 4

THE (LEXICAL) SEMANTICS REVOLUTION IN CL AND AI

The availability of repositories of lexical knowledge such as ConceptNet, Cyc, FrameNet, and especially WordNet has had a dramatic impact on research and development in HLT and AI, leading to the first HLT systems able to do (some form of) lexical semantic interpretation on large amounts of data

This extensive use, however, has also highlighted the limitations of such resources (we focus here on WordNet, as it is the best known)

Page 5

LIMITATIONS OF WORDNET

Already familiar from the CL literature:
• Coverage
• Overly fine-grained distinctions

More fundamental problems:
• Evidence for categorical distinctions
• Assumptions about taxonomic structure
• Lack of information about function / perceptual properties
• Emotional import

Page 6

ENCOUNTERING WORDNET’S LIMITATIONS: A TYPICAL EXAMPLE

Between 2003 and 2006, Abdulrahman Almuhareb and I ran a series of studies on ontology learning from text (Poesio & Almuhareb, 2008)

We used WordNet to identify the categories of interest and to evaluate the results of our system

Page 7

QUANTITATIVE EVALUATION

ATTRIBUTES
• PROBLEM: can’t compare against WordNet
• Precision / recall against hand-annotated datasets
• Human judges (ourselves): we used the classifiers to classify the top 20 features of 21 randomly chosen concepts, and we separately evaluated the results

CATEGORIES
• Clustering of the balanced dataset
• PROBLEM: the WordNet category structure is highly subjective

Page 8

CLUSTERING: ERROR ANALYSIS

ANIMAL: bear, bull, camel, cat, cow, deer, dog, elephant, horse, kitten, lion, monkey, puppy, rat, sheep, tiger, turtle

Page 9

CLUSTERING: ERROR ANALYSIS

ANIMAL: bear, bull, camel, cat, cow, deer, dog, elephant, horse, kitten, lion, monkey, puppy, rat, sheep, tiger, turtle

EDIBLE FRUIT: apple, banana, berry, cherry, fig, grape, kiwi, lemon, lime, mango, melon, olive, orange, peach, pear, pineapple, strawberry, watermelon, (pistachio, oyster)

Page 10

CLUSTERING: ERROR ANALYSIS

ANIMAL: bear, bull, camel, cat, cow, deer, dog, elephant, horse, kitten, lion, monkey, puppy, rat, sheep, tiger, turtle

EDIBLE FRUIT: apple, banana, berry, cherry, fig, grape, kiwi, lemon, lime, mango, melon, olive, orange, peach, pear, pineapple, strawberry, watermelon, (pistachio, oyster)

ILLNESS: acne, anthrax, arthritis, asthma, cancer, cholera, cirrhosis, diabetes, eczema, flu, glaucoma, hepatitis, leukemia, malnutrition, meningitis, plague, rheumatism, smallpox, (superego, lumbago, neuralgia, sciatica, gestation, menopause, quaternary, pain)

IN WORDNET: PAIN

Page 11

EXAMPLE: STATES AND RELATIONS

[Diagram comparing the WORDNET organization of FEELING, STATE and ATTRIBUTE with a PLAUSIBLE ALTERNATIVE?]

Page 12

LIMITS OF THIS TYPE OF EVALUATION

No way of telling how complete / accurate our concept descriptions are, both in terms of relations and in terms of their relative importance

No way of telling whether the category distinctions we get from WordNet are empirically founded

Page 13

EVIDENCE FROM OTHER AREAS OF COGNITIVE SCIENCE

Attributes: evidence from psychology
• Association lists (priming): e.g., use results of association tests to evaluate proximity (Lund et al, 1995; Pado and Lapata, 2008)
• Feature norms: e.g., comparison against feature norms (Schulte im Walde, 2008)

Category distinctions: evidence from neuroscience

Page 14

USING BRAIN DATA TO IDENTIFY CATEGORY DISTINCTIONS

Studies of brain-damaged patients have been shown to provide useful insights into the organization of conceptual knowledge in the brain (Warrington and Shallice 1984; Caramazza & Shelton 1998)

fMRI has been used to identify these distinctions in healthy subjects as well (e.g., Martin & Chao)

See, e.g., Capitani et al 2003 for a survey

Page 15

Magnetic Resonance Imaging Scanner

Page 16

fMRI SETUP

Page 17


Simple Paradigms

Image visualisation

Property elicitation

Silent naming

Concept “simulation”

Page 18

CATEGORY DISTINCTIONS IN THE BRAIN

[Figure: voxel activations for ANIMALS vs. TOOLS]

Page 19

A MORE COMMON CASE

[Figure: activation map; RED: Law, BLUE: Music]

Page 20

MVPA: USING SUPERVISED LEARNING TO CLASSIFY ACTIVATION PATTERNS

Simple experiment: show subjects pictures of different objects (e.g., shoes vs. bottles) on different trials of different runs
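The MVPA idea can be illustrated with a minimal sketch. It assumes a correlation-based nearest-centroid classifier and leave-one-run-out cross-validation, one simple choice among many; the data are synthetic, and all names (`correlation_classify`, `leave_one_run_out`) are hypothetical rather than the tools used in the studies described here.

```python
import numpy as np

def zscore_rows(A):
    """Standardize each row (one activation pattern) to mean 0, std 1."""
    return (A - A.mean(axis=1, keepdims=True)) / A.std(axis=1, keepdims=True)

def correlation_classify(train_X, train_y, test_X):
    """Assign each test pattern to the class whose mean training pattern
    it correlates with most strongly (Pearson r)."""
    classes = np.unique(train_y)
    centroids = np.stack([train_X[train_y == c].mean(axis=0) for c in classes])
    r = zscore_rows(test_X) @ zscore_rows(centroids).T / test_X.shape[1]
    return classes[np.argmax(r, axis=1)]

def leave_one_run_out(X, y, runs):
    """Hold out each run in turn, train on the others, return mean accuracy."""
    accuracies = []
    for run in np.unique(runs):
        test = runs == run
        predicted = correlation_classify(X[~test], y[~test], X[test])
        accuracies.append(np.mean(predicted == y[test]))
    return float(np.mean(accuracies))

# Hypothetical synthetic data: 6 runs x 2 conditions (0 = shoes, 1 = bottles)
# x 5 trials, 200 voxels; each condition has its own noisy spatial pattern.
rng = np.random.default_rng(0)
n_runs, n_trials, n_voxels = 6, 5, 200
patterns = rng.normal(size=(2, n_voxels))
X, y, runs = [], [], []
for run in range(n_runs):
    for label in (0, 1):
        for _ in range(n_trials):
            X.append(patterns[label] + rng.normal(scale=2.0, size=n_voxels))
            y.append(label)
            runs.append(run)
X, y, runs = np.array(X), np.array(y), np.array(runs)

accuracy = leave_one_run_out(X, y, runs)
print(f"leave-one-run-out accuracy: {accuracy:.2f}")
```

Holding out whole runs, rather than random trials, is a common design choice: trials from the same run share slow scanner drifts, so mixing them across training and test sets can inflate accuracy.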

Page 21

FROM WORDNET TO BRAINNET

Neural evidence, unlike the evidence used to compile dictionaries and WordNet, and like the evidence gathered from corpora and from certain behavioral experiments, is entirely objective (although it can be called subjective in the sense that it differs from subject to subject)

The objective of our research is to combine evidence from brain data, from corpora, and from behavioral experiments (all of which are rather noisy) to develop a new architecture for conceptual knowledge: BrainNet

Page 22

FIRST CASE STUDY: ABSTRACT CONCEPTS

Until recently, most work on concepts in CL / neuroscience / psychology focused on concrete concepts

But the type of conceptual knowledge that really challenges traditional assumptions about its organization is ‘abstract concepts’, or, to be more precise, the set of categories of non-concrete concepts:
• Events / actions
• States
• ‘Urabstract’ concepts: LAW, JUSTICE, ART

We are carrying out explorations of abstract knowledge using fMRI

Page 23

THEORIES OF ABSTRACT CONCEPTS IN COGNITIVE NEUROSCIENCE

In CL/AI: TAXONOMIC organization for both abstract and concrete concepts

Best known in Cognitive Neuroscience: Paivio’s DUAL CODE theory (Paivio, 1986)
• CONCRETE: verbal system & visual system
• ABSTRACT: verbal system only

Schwanenflugel & Akin 1994: CONTEXT AVAILABILITY

Barsalou’s SCENARIO-BASED MODEL (Barsalou, 1999): abstract knowledge organized around SCENARIOS

Page 24

THE OBJECTIVES OF OUR EXPERIMENT

Identify the representation in the brain of a variety of WordNet categories exemplifying both concrete and abstract concepts (abstract words chosen by inspecting the words rated as most abstract in the De Rosa et al 2005 norms):
• Really abstract: ATTRIBUTE, COMMUNICATION, EVENT, LOCATION, ‘URABSTRACT’
• A category of concrete objects: TOOLS
• A complex category: SOCIAL-ROLE

Compare two types of classification:
• TAXONOMIC (as in WordNet)
• DOMAIN (cf. Barsalou’s hypothesis that abstract concepts are ‘situated’)

Two domains, LAW and MUSIC, using WordNet Domains

Page 25

STIMULI

CATEGORY | LAW (Italian, English) | MUSIC (Italian, English)
attribute | giurisdizione (jurisdiction) | sonorita' (sonority)
attribute | cittadinanza (citizenship) | ritmo (rhythm)
attribute | impunita' (impunity) | melodia (melody)
attribute | legalita' (legality) | tonalita' (tonality)
attribute | illegalita' (illegality) | intonazione (pitch)
communication | divieto (prohibition) | canzone (song)
communication | verdetto (verdict) | pentagramma (stave)
communication | ordinanza (decree) | ballata (ballad)
communication | addebito (accusation) | ritornello (refrain)
communication | ingiunzione (injunction) | sinfonia (symphony)

Page 26

STIMULI, 2: URABSTRACTS

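CATEGORY | LAW (Italian, English) | MUSIC (Italian, English)
urabstract | giustizia (justice) | musica (music)
urabstract | liberta' (liberty) | blues (blues)
urabstract | legge (law) | jazz (jazz)
urabstract | corruzione (corruption) | canto (singing)
urabstract | refurtiva (loot) | punk (punk)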

Page 27

STIMULI, 3: SOCIAL ROLES

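CATEGORY | LAW (Italian, English) | MUSIC (Italian, English)
social-role | giudice (judge) | musicista (musician)
social-role | ladro (thief) | cantante (singer)
social-role | imputato (defendant) | compositore (composer)
social-role | testimone (witness) | chitarrista (guitarist)
social-role | avvocato (lawyer) | tenore (tenor)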

Page 28

ABSTRACT CONCEPTS: DATA COLLECTION AND ANALYSIS

Participants: 7 right-handed native speakers of Italian

Task: words presented in white on a grey screen for 10 s, with a cross in between for 7 s; subjects had to think of a situation in which the word applied

Scanner: 4T Bruker MedSpec MRI scanner, EPI pulse sequence, TR = 1000 ms, TE = 33 ms, 26° flip angle; voxel dimensions 3 mm × 3 mm × 5 mm

Preprocessing: UCL’s Statistical Parametric Mapping (SPM) software; data corrected for head motion

Classification: single-layer neural network
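The slide specifies only "a single layer NN". One plausible reading, assumed here, is a softmax layer that maps voxel values directly to category scores and is trained by gradient descent on the cross-entropy loss. The sketch runs on synthetic data; the learning rate, epoch count, L2 penalty and function names are illustrative assumptions, not the study's settings.

```python
import numpy as np

def train_softmax_layer(X, y, n_classes, lr=0.1, epochs=500, l2=1e-3):
    """One weight layer mapping voxel values to class scores, trained by
    full-batch gradient descent on the softmax cross-entropy loss."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    targets = np.eye(n_classes)[y]                      # one-hot labels
    for _ in range(epochs):
        scores = X @ W + b
        scores -= scores.max(axis=1, keepdims=True)     # numerical stability
        probs = np.exp(scores)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - targets) / n                    # d(loss)/d(scores)
        W -= lr * (X.T @ grad + l2 * W)
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(W, b, X):
    return np.argmax(X @ W + b, axis=1)

# Hypothetical synthetic data: 3 categories, 40 trials each, 50 voxels.
rng = np.random.default_rng(1)
n_classes, per_class, n_voxels = 3, 40, 50
class_means = rng.normal(size=(n_classes, n_voxels))
X = np.vstack([class_means[c] + rng.normal(size=(per_class, n_voxels))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), per_class)

order = rng.permutation(len(y))
train, test = order[:90], order[90:]
mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
X = (X - mu) / sd        # z-score each voxel using training statistics only

W, b = train_softmax_layer(X[train], y[train], n_classes)
accuracy = np.mean(predict(W, b, X[test]) == y[test])
print(f"held-out accuracy: {accuracy:.2f}")
```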

Page 29

MAIN QUESTIONS

Can the taxonomic and domain classes be distinguished from the fMRI data?

Is there a difference in classification accuracy between taxonomy and domain?

Can the taxonomic and domain classes be predicted across participants?

Page 30

RESULTS WITHIN PARTICIPANTS (CATEGORY DISTINCTIONS)

ALL CATEGORICAL DISTINCTIONS CAN BE PREDICTED ABOVE CHANCE

THERE ARE SIGNIFICANT DIFFERENCES BETWEEN CATEGORIES
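"Above chance" can be checked, for example, with a one-sided exact binomial test on the number of correctly classified test items. This is a hedged sketch: the counts below are hypothetical, and the test assumes independent trials, which cross-validated predictions only approximately satisfy; permutation tests are a common, more conservative alternative.

```python
from math import comb

def binomial_p_value(n_correct, n_trials, chance):
    """One-sided exact binomial test: probability of at least n_correct
    hits in n_trials if every prediction were a guess at rate `chance`."""
    return sum(
        comb(n_trials, k) * chance**k * (1 - chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )

# Hypothetical example: 7 categories (chance = 1/7), 70 test items, 20 correct.
p = binomial_p_value(20, 70, 1 / 7)
print(f"p = {p:.4f}")
```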

Page 31

RESULTS WITHIN PARTICIPANTS (DOMAIN)

Page 32

WITHIN PARTICIPANTS RESULTS SUMMARY

Both taxonomic and domain distinctions can be discriminated with accuracy well above chance

Easiest categories to recognize: TOOL, ATTRIBUTE, LOCATION; then SOCIAL ROLE, COMMUNICATION

Main confusions: COMMUNICATION / EVENT
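Confusions such as the one listed above are usually read off a confusion matrix. The sketch below, with hypothetical helper names, builds one over the seven taxonomic categories of this study and reports the most-confused pair; the labels are invented for illustration and are not the study's results.

```python
import numpy as np

CATEGORIES = ["attribute", "communication", "event", "location",
              "social-role", "tool", "urabstract"]

def confusion_matrix(true_labels, predicted_labels, categories):
    """Rows: true category; columns: predicted category."""
    index = {c: i for i, c in enumerate(categories)}
    M = np.zeros((len(categories), len(categories)), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        M[index[t], index[p]] += 1
    return M

def top_confusion(M, categories):
    """Category pair with the most errors in either direction (M[i, j] + M[j, i])."""
    S = M + M.T
    np.fill_diagonal(S, 0)
    i, j = np.unravel_index(np.argmax(S), S.shape)
    return categories[i], categories[j], int(S[i, j])

# Invented toy labels, for illustration only (not the study's data).
true = ["tool", "tool", "event", "communication", "event", "communication"]
pred = ["tool", "tool", "communication", "event", "event", "communication"]
M = confusion_matrix(true, pred, CATEGORIES)
print(top_confusion(M, CATEGORIES))   # → ('communication', 'event', 2)
```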

Page 33

CATEGORY LOCALIZATION IN THE BRAIN

[Figure: voxel map. Red: Attribute; Blue: Tool; Green: Location. Overlaps: R+G = yellow, G+B = cyan, R+B = pink, R+G+B = white]

Page 34

[Figure, two panels. Panel 1: Red: Social-role; Green: Attribute; Blue: Urabstract. Panel 2: Red: Social-role; Green: Communication; Blue: Event. Overlaps: R+G = yellow, G+B = cyan, R+B = pink, R+G+B = white]

Page 35

CROSS PARTICIPANTS RESULTS SUMMARY

The concrete taxonomic classes TOOL and LOCATION can be predicted across participants; ATTRIBUTE can also be classified significantly, but less concrete classes become conflated with ATTRIBUTE.

In general, DOMAIN can be predicted across participants; however, domain membership is much better classified in the most abstract taxonomic classes (ATTRIBUTE, COMMUNICATION and URABSTRACT).

There are visually apparent differences in activation between regions.

The precuneus appears to contain voxels systematically associated with independent taxonomic/topical categories.

Page 36

CROSS PARTICIPANTS RESULTS SUMMARY

Concrete categories TOOL and LOCATION can be predicted across participants; ATTRIBUTE can also be classified significantly, but less concrete classes become conflated with ATTRIBUTE.

In general, DOMAIN can be predicted across participants; however, domain membership is much better classified in the most abstract taxonomic classes (ATTRIBUTE, COMMUNICATION and URABSTRACT).

Page 37

TAXONOMIC / DOMAIN ORGANIZATION

CATEGORY | LAW (Italian, English) | MUSIC (Italian, English)
attribute | giurisdizione (jurisdiction) | sonorita' (sonority)
attribute | cittadinanza (citizenship) | ritmo (rhythm)
attribute | impunita' (impunity) | melodia (melody)
attribute | legalita' (legality) | tonalita' (tonality)
attribute | illegalita' (illegality) | intonazione (pitch)
communication | divieto (prohibition) | canzone (song)
communication | verdetto (verdict) | pentagramma (stave)
communication | ordinanza (decree) | ballata (ballad)
communication | addebito (accusation) | ritornello (refrain)
communication | ingiunzione (injunction) | sinfonia (symphony)
event | arresto (arrest) | concerto (concert)
event | processo (trial) | recital (recital)
event | reato (crime) | assolo (solo)
event | furto (theft) | festival (festival)
event | assoluzione (acquittal) | spettacolo (show)
social-role | giudice (judge) | musicista (musician)
social-role | ladro (thief) | cantante (singer)
social-role | imputato (defendant) | compositore (composer)
social-role | testimone (witness) | chitarrista (guitarist)
social-role | avvocato (lawyer) | tenore (tenor)
tool | manette (handcuffs) | violino (violin)
tool | toga (robe) | tamburo (drum)
tool | manganello (truncheon) | tromba (trumpet)
tool | cappio (noose) | metronomo (metronome)
tool | grimaldello (skeleton key) | radio (radio)
location | tribunale (court/tribunal) | palco (stage)
location | carcere (prison) | auditorium (auditorium)
location | questura (police station) | discoteca (disco)
location | penitenziario (penitentiary) | conservatorio (conservatory)
location | patibolo (gallows) | teatro (theatre)
urabstract | giustizia (justice) | musica (music)
urabstract | liberta' (liberty) | blues (blues)
urabstract | legge (law) | jazz (jazz)
urabstract | corruzione (corruption) | canto (singing)
urabstract | refurtiva (loot) | punk (punk)

Page 38

WHAT THE DATA SUGGESTS

Page 39

EEG vs fMRI

A question about conceptual organization that can be clearly investigated using neural evidence is: which categories can be distinguished?

But fMRI is too expensive for carrying out systematic investigations (~500 euros per hour)

Alternative: EEG
• Used in BCI for a variety of ‘mind reading’ tasks
• Also used to study semantics with ERPs

Page 40

EEG vs. fMRI

Page 41

USING EEG TO STUDY SEMANTICS: ERP

• Features: signal amplitude and slope at a range of resolutions give a compact representation of the waveform

• N400: violations of person and number in pronoun-verb agreement

• Up to 70% detection on single trials

• Gaussian Naive Bayes, SM Log. Regression, Linear SVM
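The bullet list names amplitude and slope features at several resolutions and a Gaussian Naive Bayes classifier. The sketch below shows one way to compute such features (window means and least-squares slopes over non-overlapping windows of 100, 50 and 25 samples, sizes chosen arbitrarily) together with a minimal Gaussian Naive Bayes; the simulated negative deflection, the window sizes and all names are hypothetical.

```python
import numpy as np

def amplitude_slope_features(epoch, window_sizes=(100, 50, 25)):
    """For each window size, split a single-channel epoch into non-overlapping
    windows and record each window's mean amplitude and least-squares slope."""
    feats = []
    for w in window_sizes:
        t = np.arange(w)
        for start in range(0, len(epoch) - w + 1, w):
            seg = epoch[start:start + w]
            slope, _intercept = np.polyfit(t, seg, 1)
            feats.extend([seg.mean(), slope])
    return np.array(feats)

class GaussianNaiveBayes:
    """One Gaussian per class and feature; features treated as independent."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) for c in self.classes]) + 1e-9
        self.log_prior = np.log([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        diff = X[:, None, :] - self.mu[None, :, :]      # (trials, classes, features)
        log_lik = -0.5 * (np.log(2 * np.pi * self.var)[None]
                          + diff**2 / self.var[None]).sum(axis=2)
        return self.classes[np.argmax(log_lik + self.log_prior, axis=1)]

# Hypothetical single-channel epochs of 200 samples; class 1 carries a
# simulated negative deflection centred on sample 100.
rng = np.random.default_rng(2)
n_samples = 200
dip = -np.exp(-0.5 * ((np.arange(n_samples) - 100) / 15) ** 2)
y = np.array([0, 1] * 60)
X = np.array([amplitude_slope_features(2.0 * label * dip + rng.normal(size=n_samples))
              for label in y])

model = GaussianNaiveBayes().fit(X[:80], y[:80])
accuracy = np.mean(model.predict(X[80:]) == y[80:])
print(f"held-out accuracy: {accuracy:.2f}")
```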

Page 42

EEG Spectral Analysis of Concepts?

Participants are presented with aural or visual concept stimuli

The EEG apparatus records electrical activity on the scalp

Waveforms can be reduced to frequency components
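Reducing a waveform to frequency components can be sketched with a Hann-windowed FFT, averaging spectral power within conventional EEG bands. The band edges, the 500 Hz sampling rate and the function name are assumptions for illustration; the two synthetic "channels" are pure sinusoids, so alpha power dominates on the first and beta power on the second.

```python
import numpy as np

# Conventional EEG frequency bands in Hz (assumed; boundaries vary between studies).
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def band_power(signal, fs, bands=BANDS):
    """signal: (n_channels, n_samples) array sampled at fs Hz.
    Returns {band name: mean spectral power per channel}, computed from a
    Hann-windowed real FFT."""
    n = signal.shape[-1]
    windowed = signal * np.hanning(n)
    power = np.abs(np.fft.rfft(windowed, axis=-1)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return {name: power[..., (freqs >= lo) & (freqs < hi)].mean(axis=-1)
            for name, (lo, hi) in bands.items()}

# Two hypothetical channels, 1 s at 500 Hz: a 10 Hz and a 20 Hz oscillation.
fs = 500
t = np.arange(fs) / fs
eeg = np.vstack([np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 20 * t)])
powers = band_power(eeg, fs)
for name, p in powers.items():
    print(name, np.round(p, 2))
```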

Page 43

EEG pros and cons

Pros:
• Lighter
• Cheaper
• Better temporal resolution (ms)

Cons:
• Coarser spatial resolution (cm)
• Noisy (e.g., very sensitive to skull depth)

Page 44

EEG CAN BE USED TO IDENTIFY MAJOR CATEGORICAL DISTINCTIONS

Murphy et al, Brain and Language 2011:
• 7 Italian subjects
• 30 animals, 30 tools
• Each presented 6 times
• Task: silent naming

Page 45

STIMULI

Page 46

EEG SIGNALS: TIME-FREQUENCY (PER CHANNEL)

Page 47

Data analysis: Classification System Schematic

[Diagram: 64 channels of preprocessed data → filter by time, frequency and electrode → X channels of filtered data → CSSD decomposition → “tool” component and “animal” component → vector transform: var(“tool”), var(“animal”) → feature vector → support vector machine → answer?]
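The spatial-filtering step can be sketched with standard two-class Common Spatial Patterns (CSP), which the results slide mentions; the schematic's CSSD step may differ in detail, so treat this as an approximation. Filters are obtained by whitening the summed class covariances and diagonalizing one class's covariance in the whitened space; the log-variances of the extreme components form the feature vector that a classifier (an SVM in the schematic, not implemented here) would receive. The synthetic data and function names are hypothetical.

```python
import numpy as np

def csp_filters(trials_a, trials_b):
    """Two-class Common Spatial Patterns.
    trials_*: arrays of shape (n_trials, n_channels, n_samples).
    Returns W (n_channels x n_channels) whose rows are spatial filters, sorted
    from most class-A-dominated variance to most class-B-dominated variance."""
    def mean_cov(trials):
        # Trace-normalized spatial covariance, averaged over trials
        return np.mean([t @ t.T / np.trace(t @ t.T) for t in trials], axis=0)

    cov_a, cov_b = mean_cov(trials_a), mean_cov(trials_b)
    # Whitening transform P for the composite covariance: P (Ca + Cb) P^T = I
    evals, evecs = np.linalg.eigh(cov_a + cov_b)
    P = np.diag(evals ** -0.5) @ evecs.T
    # After whitening, class-A eigenvalues d and class-B eigenvalues 1 - d share eigenvectors
    d, B = np.linalg.eigh(P @ cov_a @ P.T)
    order = np.argsort(d)[::-1]
    return B[:, order].T @ P

def log_var_features(W, trial, n_pairs=1):
    """Project a trial onto the first and last n_pairs filters and return
    normalized log-variances, the usual CSP feature vector."""
    rows = list(range(n_pairs)) + list(range(-n_pairs, 0))
    variances = (W[rows] @ trial).var(axis=1)
    return np.log(variances / variances.sum())

# Hypothetical 8-channel trials: source 0 is strong in "tool" trials, source 1 in
# "animal" trials; sources are mixed into channels by a random matrix.
rng = np.random.default_rng(3)
n_channels, n_samples = 8, 200
mixing = rng.normal(size=(n_channels, n_channels))

def make_trial(strong_source):
    sources = rng.normal(size=(n_channels, n_samples))
    sources[strong_source] *= 3.0
    return mixing @ sources

tool = np.array([make_trial(0) for _ in range(30)])
animal = np.array([make_trial(1) for _ in range(30)])
W = csp_filters(tool, animal)
tool_feats = np.array([log_var_features(W, t) for t in tool])
animal_feats = np.array([log_var_features(W, t) for t in animal])
print("mean features, tool:  ", tool_feats.mean(axis=0))
print("mean features, animal:", animal_feats.mean(axis=0))
```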

Page 48


RESULTS

Murphy et al, 2011, Brain and Language

• Time/Freq window optimisation, CSP extraction of class-sensitive sources, 5-fold cross-validated SVM

• With group analysis, 98% accuracy categorising mammals vs tools

Page 49

PRELIMINARY CONCLUSIONS

EEG can be used to decode broad categorical distinctions

fMRI may be needed to study:
• finer-grained distinctions
• cross-language distinctions

Page 50

BRAIN EVIDENCE AND CORPUS EVIDENCE

Can we find ways of combining evidence about the strength of categorical distinctions coming from EEG / fMRI with the evidence coming from corpora?

First question: what is the relation between the conceptual spaces induced from corpora and the conceptual spaces elicited using EEG?
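One common way to compare two conceptual spaces, assumed here rather than taken from this work, is representational similarity analysis (RSA): build a concept-by-concept dissimilarity matrix from corpus vectors and another from brain responses, then rank-correlate their upper triangles. The sketch uses synthetic data with two hypothetical categories of five concepts each, plus an unrelated control space; all names are illustrative.

```python
import numpy as np

def dissimilarity_matrix(vectors):
    """Representational dissimilarity matrix: 1 - Pearson r between rows."""
    return 1.0 - np.corrcoef(vectors)

def ranks(x):
    """Ranks 0..n-1; ties are broken arbitrarily (fine for continuous values)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r

def rsa_spearman(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return float(np.corrcoef(ranks(rdm_a[iu]), ranks(rdm_b[iu]))[0, 1])

# Hypothetical data: 10 concepts from 2 categories (5 + 5), 50-dim corpus vectors.
rng = np.random.default_rng(4)
labels = np.repeat([0, 1], 5)
prototypes = rng.normal(size=(2, 50))
corpus_vectors = prototypes[labels] + 0.5 * rng.normal(size=(10, 50))
# Simulated "brain" patterns: a noisy linear transform of the corpus vectors
brain_patterns = corpus_vectors @ rng.normal(size=(50, 64)) + 4.0 * rng.normal(size=(10, 64))
unrelated = rng.normal(size=(10, 64))

corpus_rdm = dissimilarity_matrix(corpus_vectors)
related = rsa_spearman(corpus_rdm, dissimilarity_matrix(brain_patterns))
control = rsa_spearman(corpus_rdm, dissimilarity_matrix(unrelated))
print(f"related: {related:.2f}, control: {control:.2f}")
```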

Page 51

PREDICTING BRAIN (FMRI) ACTIVATION USING CONCEPT DESCRIPTIONS

T. Mitchell, S. Shinkareva, A. Carlson, K. Chang, V. Malave, R. Mason and M. Just. 2008. Predicting human brain activity associated with the meanings of nouns. Science 320, 1191–1195

Page 52

MITCHELL ET AL 2008: METHODS

Record fMRI activation for 60 nominal concepts and extract the 200 ‘best’ features (VOXELs)

Build conceptual descriptions for these concepts from corpora (the Web):
• 25 features for each concept: 25 verbs expressing typical properties of living things / tools
• Collect the strength of association between these features and each concept

Learn the association between each voxel and the 25 verbal features using 58 concepts

Use the learned model to predict the activation for the 2 held-out concepts (compared using Euclidean distance). Accuracy: 77%
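The procedure above can be sketched as a linear model from semantic features to voxel activations, evaluated by leave-two-out matching. This is a simplified, hypothetical version: the ridge penalty, the synthetic features and voxels, and the function names are assumptions, and the toy data are far easier than real fMRI, so the resulting accuracy says nothing about the 77% reported.

```python
import numpy as np

def fit_voxel_model(F, Y, ridge=1.0):
    """F: (n_concepts, n_features) semantic features; Y: (n_concepts, n_voxels).
    Fits one linear model per voxel (all at once) by ridge-regularized least
    squares and returns weights C with shape (n_features, n_voxels)."""
    n_features = F.shape[1]
    return np.linalg.solve(F.T @ F + ridge * np.eye(n_features), F.T @ Y)

def leave_two_out_match(F, Y, i, j, ridge=1.0):
    """Train without concepts i and j, predict both images, and report whether
    the correct pairing is closer (summed Euclidean distance) than the swapped one."""
    keep = np.setdiff1d(np.arange(len(F)), [i, j])
    C = fit_voxel_model(F[keep], Y[keep], ridge)
    pred_i, pred_j = F[i] @ C, F[j] @ C
    dist = np.linalg.norm
    correct = dist(pred_i - Y[i]) + dist(pred_j - Y[j])
    swapped = dist(pred_i - Y[j]) + dist(pred_j - Y[i])
    return correct < swapped

# Hypothetical data: 60 concepts x 25 standardized verb-association features,
# 200 voxels generated from an unknown linear map plus noise.
rng = np.random.default_rng(5)
n_concepts, n_features, n_voxels = 60, 25, 200
F = rng.normal(size=(n_concepts, n_features))
Y = F @ rng.normal(size=(n_features, n_voxels)) + 2.0 * rng.normal(size=(n_concepts, n_voxels))

pairs = [(i, j) for i in range(n_concepts) for j in range(i + 1, n_concepts)]
accuracy = np.mean([leave_two_out_match(F, Y, i, j) for i, j in pairs])
print(f"leave-two-out matching accuracy over {len(pairs)} pairs: {accuracy:.2f}")
```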

Page 53

MITCHELL ET AL 2008

Page 54

MITCHELL ET AL 2008: VERB FEATURES

Page 55

MITCHELL ET AL: LEARNING ASSOCIATIONS

Page 56

OUR EXPERIMENTS

Replicate the Mitchell et al study using EEG data instead of fMRI
• Different feature selection mechanisms

Compare different methods for building concept descriptions
• In addition to hand-picked features, also a variety of standard corpus models for Italian

B. Murphy, M. Baroni, and M. Poesio, EEG responds to conceptual stimuli and corpus semantics, EMNLP 2009
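Building concept descriptions from a corpus can be sketched with windowed co-occurrence counts between concept nouns and a fixed list of feature verbs, log-scaled and length-normalized. This is one simple model among the many standard corpus models the slide refers to; the window size, weighting, toy English corpus and function name are all hypothetical.

```python
import math
from collections import Counter

def cooccurrence_vectors(sentences, concepts, features, window=5):
    """Count how often each feature word appears within `window` tokens of each
    concept word, log-scale the counts and length-normalize each concept vector."""
    counts = {c: Counter() for c in concepts}
    feature_set = set(features)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token not in counts:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in feature_set:
                    counts[token][tokens[j]] += 1
    vectors = {}
    for c in concepts:
        v = [math.log1p(counts[c][f]) for f in features]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0   # avoid dividing by zero
        vectors[c] = [x / norm for x in v]
    return vectors

# Toy English corpus and feature verbs (hypothetical; the study used Italian corpora).
sentences = [
    "the dog can run and eat",
    "I saw the dog run",
    "use the hammer to push the nail",
    "lift the hammer",
]
vectors = cooccurrence_vectors(sentences, ["dog", "hammer"], ["run", "eat", "push", "lift"])
for concept, vec in vectors.items():
    print(concept, [round(x, 3) for x in vec])
```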

Page 57

RESULTS USING THE HAND-PICKED FEATURES

Page 58

RESULTS USING AUTOMATICALLY SELECTED FEATURES

[Chart: results for MITCHELL ET AL vs. AA-MP]

Page 59

INTERIM SUMMARY

It is possible to establish systematic links between knowledge about concepts acquired from corpora and knowledge extracted from brain data

These links may be used, for instance, to compare ontology learning methods (although the investigation of the categorical distinctions discussed above first needs to be extended)

Page 60

APPLICATIONS

‘Mind-reading’ techniques can be used for a variety of other studies of interest to CL researchers

DEEP RELATIONS: fMRI can be used to extract information about POLARITY, which can be used for sentiment analysis in text

ADAM: being able to distinguish between ANIMALS and TOOLS using EEG can be used as an early predictor of certain classes of semantic dementia

Page 61

CONCLUSIONS

Evidence from neuroscience, combined with evidence from corpora and from behavioral studies, may be used to put our theories of the lexicon on a firmer empirical footing

The resulting resources may be more useful both for HLT and for other applications

Page 62

THANKS!