REACTION Workshop 2011.01.06
Task 1 – Progress Report & Plans
Lisbon, PT and Austin, TX
Mário J. Silva
University of Lisbon, Portugal
Grants (paid by Reaction)
Sílvio Moreira (BI: Oct 1, 2010 – March 31, 2011)
João Ramalho (BIC: Jan 1, 2011 – April 30, 2011)
Mining resources
Development of robust linguistic resources to process different types and genres of text:
knowledge resources about media personalities: recognizing and resolving references to named entities;
sentiment lexicons and grammars: detecting the polarity of opinions about media personalities;
annotated corpora: training different text classifiers and evaluating classification procedures.
Mining resources
POWER - Political Ontology for Web Entity Retrieval
SentiLex-PT01 – Sentiment Lexicon for Portuguese
SentiCorpus-PT09 – Sentiment-annotated corpus of user comments on political debates
POWER
POWER is an ontology that formalizes the domain knowledge defining a political landscape, i.e., the political actors, their roles in the political scene, and their relationships and interactions. The ontology focuses on describing:
Politicians
Political Institutions with different levels of authority (International, National, Regional, ...)
Political Associations
Political Affiliations and Endorsements
Elections
Mandates
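To make the relationships above concrete, here is a minimal sketch of how three of the POWER concepts (politicians, institutions with a level of authority, and mandates linking the two) could be modelled and queried. The class names, fields and placeholder data are hypothetical illustrations, not the ontology's actual vocabulary; in POWER itself these would be ontology classes and properties rather than Python objects.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, simplified stand-ins for three POWER concepts; the actual
# ontology's class and property names are not reproduced here.
@dataclass(frozen=True)
class Politician:
    name: str

@dataclass(frozen=True)
class PoliticalInstitution:
    name: str
    level: str  # level of authority, e.g. "National" or "Regional"

@dataclass(frozen=True)
class Mandate:
    holder: Politician
    institution: PoliticalInstitution
    role: str
    start_year: int
    end_year: Optional[int] = None  # None: the mandate is still running

def mandates_of(politician: Politician, mandates: List[Mandate]) -> List[Mandate]:
    """Return the mandates held by one politician, oldest first."""
    held = [m for m in mandates if m.holder == politician]
    return sorted(held, key=lambda m: m.start_year)

def active_in(year: int, mandates: List[Mandate]) -> List[Mandate]:
    """Return the mandates in force at some point during the given year."""
    return [m for m in mandates
            if m.start_year <= year and (m.end_year is None or year <= m.end_year)]

# Placeholder data, not taken from the ontology.
a = Politician("Politician A")
parliament = PoliticalInstitution("Parliament", "National")
assembly = PoliticalInstitution("Regional Assembly", "Regional")
mandates = [
    Mandate(a, parliament, "Member", 2005, 2009),
    Mandate(a, assembly, "President", 2001, 2005),
]
print([m.role for m in mandates_of(a, mandates)])  # ['President', 'Member']
print(len(active_in(2007, mandates)))              # 1
```

Treating a mandate as its own entity, rather than as a direct politician-to-institution link, is what lets the same person hold several roles over time and lets queries filter by period.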
POWER
Currently, the ontology describes the following, from the Portuguese political scene:
587 political actors
17 editions of political institutions
16 political associations
900 mandates
1 election
6 candidate lists
SentiLex-PT01
SentiLex-PT01 is a sentiment lexicon for Portuguese made up of 6,321 adjective lemmas and 25,406 inflected forms.
The sentiment entries are predicate adjectives that apply to humans.
The sentiment attributes described in SentiLex-PT01 are:
the predicate polarity;
the target of the sentiment;
the polarity assignment method (manual, or automatic by JALC).
SentiLex-lem-PT01
6,321 lemmas
abatido.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN
abelhudo.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN
abençoado.PoS=Adj;TG=HUM;POL=1;ANOT=JALC
atrevido.PoS=Adj;TG=HUM;POL=0;ANOT=MAN
bem-educado.PoS=Adj;TG=HUM;POL=1;ANOT=MAN
brega.PoS=Adj;TG=HUM;POL=-1;ANOT=JALC
violento.PoS=Adj;TG=HUM;POL=-1;ANOT=JALC
Recently made publicly available at: http://xldb.fc.ul.pt/wiki/SentiLex-PT01
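Each lemma entry shown above is a lemma, a `.`, and a `;`-separated list of `KEY=VALUE` attributes (PoS, target TG, polarity POL, annotation method ANOT). The sketch below parses that layout into a dictionary; it assumes lemmas never contain a `.`, and the function name is hypothetical rather than part of any released SentiLex tooling.

```python
from typing import Dict, Union

def parse_lemma_entry(line: str) -> Dict[str, Union[str, int]]:
    """Parse a line such as 'abatido.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN'.

    Assumes the lemma itself contains no '.', so the first '.' separates the
    lemma from its ';'-separated KEY=VALUE attributes.
    """
    lemma, sep, attrs = line.strip().partition(".")
    if not sep:
        raise ValueError(f"no '.' separator in {line!r}")
    entry: Dict[str, Union[str, int]] = {"lemma": lemma}
    for pair in attrs.split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        key, _, value = pair.partition("=")
        # POL is numeric (-1, 0, 1); other attributes stay as strings.
        entry[key] = int(value) if key == "POL" else value
    return entry

lexicon = [parse_lemma_entry(line) for line in (
    "abatido.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN",
    "abençoado.PoS=Adj;TG=HUM;POL=1;ANOT=JALC",
    "bem-educado.PoS=Adj;TG=HUM;POL=1;ANOT=MAN",
)]
# Lemmas whose polarity was assigned manually (ANOT=MAN) rather than by JALC.
manual = [e["lemma"] for e in lexicon if e["ANOT"] == "MAN"]
print(manual)  # ['abatido', 'bem-educado']
```

Filtering on ANOT in this way separates hand-checked entries from automatically classified ones, for example when evaluating an automatic classifier against the manual subset.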
SentiLex-flex-PT01
25,406 inflected forms
abatida,abatido.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=MAN
abatidas,abatido.PoS=Adj;GN=fp;TG=HUM;POL=-1;ANOT=MAN
abatido,abatido.PoS=Adj;GN=ms;TG=HUM;POL=-1;ANOT=MAN
abatidos,abatido.PoS=Adj;GN=mp;TG=HUM;POL=-1;ANOT=MAN
bem-educada,bem-educado.PoS=Adj;GN=fs;TG=HUM;POL=1;ANOT=MAN
bem-educadas,bem-educado.PoS=Adj;GN=fp;TG=HUM;POL=1;ANOT=MAN
bem-educado,bem-educado.PoS=Adj;GN=ms;TG=HUM;POL=1;ANOT=MAN
bem-educados,bem-educado.PoS=Adj;GN=mp;TG=HUM;POL=1;ANOT=MAN
brega,brega.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=JALC
brega,brega.PoS=Adj;GN=ms;TG=HUM;POL=-1;ANOT=JALC
bregas,brega.PoS=Adj;GN=mp;TG=HUM;POL=-1;ANOT=JALC
bregas,brega.PoS=Adj;GN=fp;TG=HUM;POL=-1;ANOT=JALC
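The inflected-form file prefixes each entry with the surface form (`form,lemma.attributes`) and adds gender/number (GN), so it can be used directly as a form-to-polarity lookup. The sketch below builds that index and applies a naive sum-of-polarities score to a tokenized sentence. This scoring is only an illustrative baseline chosen for the example, not the classification method used in the project; the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

def parse_flex_entry(line: str) -> Tuple[str, str, Dict[str, str]]:
    """Split 'abatida,abatido.PoS=Adj;GN=fs;...' into (form, lemma, attributes).

    Assumes neither the form nor the lemma contains '.' or ','.
    """
    head, _, attrs = line.strip().partition(".")
    form, _, lemma = head.partition(",")
    fields = dict(pair.split("=", 1) for pair in attrs.split(";") if pair)
    return form, lemma, fields

def build_polarity_index(lines: Iterable[str]) -> Dict[str, int]:
    """Map every inflected form to its POL value (-1, 0 or 1)."""
    index: Dict[str, int] = {}
    for line in lines:
        form, _, fields = parse_flex_entry(line)
        index[form] = int(fields["POL"])
    return index

def naive_polarity(tokens: List[str], index: Dict[str, int]) -> int:
    """Sum the polarities of lexicon forms found among the tokens.

    A deliberately simple baseline: it ignores negation, irony and
    which person the adjective is about.
    """
    return sum(index.get(token.lower(), 0) for token in tokens)

index = build_polarity_index([
    "abatida,abatido.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=MAN",
    "bem-educados,bem-educado.PoS=Adj;GN=mp;TG=HUM;POL=1;ANOT=MAN",
    "brega,brega.PoS=Adj;GN=ms;TG=HUM;POL=-1;ANOT=JALC",
])
print(naive_polarity(["Ele", "é", "brega"], index))  # -1
```

Because the file lists every gender/number variant, no lemmatizer is needed for the lookup; the cost is that forms shared by several entries (such as "brega" for fs and ms) simply map to the same key.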
SentiCorpus-PT09
SentiCorpus-PT09 is a collection of comments posted by readers of the Público newspaper on a series of 10 news articles, each covering a televised face-to-face debate between the main candidates in the 2009 Portuguese parliamentary elections.
The collection is made up of 2,795 comments (~8,000 sentences).
3,537 sentences, from 736 comments (27% of the corpus), were manually labeled with sentiment information.
Sentiment annotation covers several relevant dimensions, such as polarity, opinion target, target mention and verbal irony.
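One way to picture a per-sentence annotation with the dimensions listed above (polarity, opinion target, target mention, verbal irony) is as a small record, from which corpus statistics such as the share of positive opinions can be computed. The sketch below assumes a -1/0/1 polarity encoding and free-text mention labels; the field names, helper functions and toy sentences are hypothetical and do not reflect the corpus's actual file format.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SentenceAnnotation:
    text: str
    polarity: int                  # assumed encoding: -1, 0 or 1
    target: Optional[str] = None   # opinion target (who the opinion is about)
    mention: Optional[str] = None  # how the target is mentioned in the text
    ironic: bool = False           # verbal irony flag

def share_positive(annotations: List[SentenceAnnotation]) -> float:
    """Fraction of opinionated (non-neutral) sentences that are positive."""
    opinionated = [a for a in annotations if a.polarity != 0]
    if not opinionated:
        return 0.0
    return sum(a.polarity > 0 for a in opinionated) / len(opinionated)

def share_ironic(annotations: List[SentenceAnnotation], polarity: int) -> float:
    """Fraction of sentences with the given polarity that are flagged as ironic."""
    selected = [a for a in annotations if a.polarity == polarity]
    if not selected:
        return 0.0
    return sum(a.ironic for a in selected) / len(selected)

# Toy sentences, not drawn from SentiCorpus-PT09.
sample = [
    SentenceAnnotation("s1", -1, "Candidate A", "name"),
    SentenceAnnotation("s2", -1, "Candidate B", "pronoun"),
    SentenceAnnotation("s3", 1, "Candidate A", "nickname", ironic=True),
    SentenceAnnotation("s4", 1, "Candidate B", "name"),
    SentenceAnnotation("s5", 0),
]
print(share_positive(sample))   # 0.5
print(share_ironic(sample, 1))  # 0.5
```

Keeping irony as a separate flag, instead of folding it into the polarity value, preserves the information needed to measure how often each polarity is expressed ironically.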
Main findings
The real challenge in performing opinion mining on user-generated content is correctly identifying positive opinions:
positive opinions are less frequent than negative opinions (20%);
positive opinions are particularly exposed to verbal irony (11%).
Other opinion mining challenges relate to the entity recognition and co-reference resolution sub-tasks: mentions of human targets are frequently made through pronouns, definite descriptions and nicknames. The most frequent type of mention is the person name, but it covers only 36% of the analyzed cases.
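The finding that person names are the most frequent mention type yet cover only about a third of cases can be checked with a simple frequency count over annotated mention types. The sketch below uses hypothetical labels and toy data (not corpus figures) to show the calculation, and how the complement estimates the share of mentions that an entity recognizer matching names alone would miss.

```python
from collections import Counter
from typing import Dict, List

def mention_distribution(mention_types: List[str]) -> Dict[str, float]:
    """Relative frequency of each mention type, most frequent first."""
    counts = Counter(mention_types)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.most_common()}

# Hypothetical labels for five target mentions; not corpus data.
sample = ["name", "pronoun", "name", "definite_description", "nickname"]
dist = mention_distribution(sample)
print(dist["name"])  # 0.4
# Share of mentions a name-only entity recognizer would miss:
print(round(1 - dist.get("name", 0.0), 2))  # 0.6
```

With the reported 36% for person names, the same complement suggests that roughly two thirds of target mentions require co-reference resolution or nickname handling rather than name matching.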
Next steps
April 2011:
POWER
Populating the ontology using text-mining approaches
Internal release
SentiLex-PT01
Exploring other methods and algorithms (SVM, Active Learning) for automatic polarity classification
Enlarging the sentiment lexicon (verbs, predicate nouns, idiomatic expressions)
Next steps
August 2011:
POWER
First release to the general public via a SPARQL endpoint and a web user interface
SentiCorpus-PT09
Public release
Analysis and (semi-automated) annotation of a collection of documents from industrial and social media over a period of 6 months