task question

25
Task Question Is it possible to monitor news media from regions all over the world over extended periods of time, extracting low-level events from them, and piece them together to automatically track and predict conflict in all the regions of the world?

Upload: zagiri

Post on 17-Mar-2016

31 views

Category:

Documents


0 download

DESCRIPTION

Task Question. Is it possible to monitor news media from regions all over the world over extended periods of time, extracting low-level events from them, and piece them together to automatically track and predict conflict in all the regions of the world?. The Ares project. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Task Question

Task Question

• Is it possible to monitor news media from regions all over the world over extended periods of time, extracting low-level events from them, and piece them together to automatically track and predict conflict in all the regions of the world?

Page 2: Task Question

The Ares project

OnlineInformationSources

RiceEventData

Extractor

Singularitydetection

Hubs &Authorities

Models

AP, AFP,BBC, Reuters,…

Over 1 millionarticles on theMiddle East from1979 to 2005 (filtered automatically)

http://ares.cs.rice.edu

Page 3: Task Question

Analysis of wire stories

Date Actor Target Weis Code Wies event Goldstein scale790415 ARB ISR 223 (MIL ENGAGEMENT) -10790415 EGY AFD 194 (HALT NEGOTIATION) -3.8790415 PALPL ISR 223 (MIL ENGAGEMENT) -10790415 UNK ISR 223 (MIL ENGAGEMENT) -10790415 ISR EGY 31 (MEET) 1790415 EGY ISR 31 (MEET) 1790415 ISRMIL PAL 223 (MIL ENGAGEMENT) -10790415 PALPL JOR 223 (MIL ENGAGEMENT) -10790415 EGY AFD 193 (CUT AID) -5.6790415 IRQ EGY 31 (MEET) 1790415 EGY IRQ 31 (MEET) 1790415 ARB CHR 223 (MIL ENGAGEMENT) -10790415 JOR AUS 32 (VISIT) 1.9790415 UGA CHR 32 (VISIT) 1.9790415 ISRGOV ISRSET 54 (ASSURE) 2.8

Singularity detectionon aggregated eventsdata

Hubs and authoritiesanalysis of eventsdata

Relevance filter

Page 4: Task Question

Embedded learner design• Representation

– Identify relevant stories, extract event data from them, build time series models and graph-theoretic models.

• Learning– Identifying regime shifts in events data, tracking

evolution of militarized interstate disputes (MIDs) by hubs/authorities analysis of events data

• Decision-making– Issuing early warnings of outbreak of MIDs

Page 5: Task Question

Identifying relevant stories

– Only about 20% of stories contain events that are to be extracted.

• The rest are interpretations, (e.g., op-eds), or are events not about conflict (e.g., sports)

– We have trained Naïve Bayes (precision 86% and recall 81%), SVM classifiers (precision 92% and recall 89%) & Okapi classifiers (precision 93% and recall 87%) using a labeled set of 180,000 stories from Reuters.

– Surprisingly difficult problem!• Lack of large labeled data sets; • Poor transfer to other sources (AP/BBC)• The category of “event containing stories” is not well-separated

from others, and changes with time

Lee, Tran, Singer, Subramanian, 2006

Page 6: Task Question

Okapi classifier• Reuters data set: relevant

categories are GVIO, GDIP, G13; irrelevant categories: 1POL, 2ECO, 3SPO, ECAT, G12, G131, GDEF, GPOL

Irr

Rel New article

Decision rule: sum of top N Okapi scores in Rel set > sum of top N Okapi scores in Irr set then classify as rel; else irr

Okapi measure takestwo articles and givesthe similarity between them.

Page 7: Task Question

Event extraction

Page 8: Task Question

Parse sentence

Klein and Manning parser

Page 9: Task Question

Pronoun de-referencing

Page 10: Task Question

Sentence fragmentation

Correlative conjunctions

Extract embedded sentences (SBAR)

Page 11: Task Question

Conditional random fields

We extract who (actor) did what (event) to whom (target)

Not exactly the same as NER

Page 12: Task Question

Results

200 Reuters sentences; hand-labeled with actor, target,and event codes (22 and 02).

TABARIis stateof the artcoderin politicalscience

Stepinksi, Stoll, Subramanian 2006

Page 13: Task Question

Events data

177,336 events from April 1979 to October 2003 in Levantdata set (KEDS).

Date Actor Target Weis Code Wies event Goldstein scale790415 ARB ISR 223 (MIL ENGAGEMENT) -10790415 EGY AFD 194 (HALT NEGOTIATION) -3.8790415 PALPL ISR 223 (MIL ENGAGEMENT) -10790415 UNK ISR 223 (MIL ENGAGEMENT) -10790415 ISR EGY 31 (MEET) 1790415 EGY ISR 31 (MEET) 1790415 ISRMIL PAL 223 (MIL ENGAGEMENT) -10790415 PALPL JOR 223 (MIL ENGAGEMENT) -10790415 EGY AFD 193 (CUT AID) -5.6790415 IRQ EGY 31 (MEET) 1790415 EGY IRQ 31 (MEET) 1790415 ARB CHR 223 (MIL ENGAGEMENT) -10790415 JOR AUS 32 (VISIT) 1.9790415 UGA CHR 32 (VISIT) 1.9790415 ISRGOV ISRSET 54 (ASSURE) 2.8

Page 14: Task Question

What can be predicted?

Page 15: Task Question

Singularity detection

Stoll and Subramanian, 2004, 2006

Page 16: Task Question

Singularities = MID start/end

biweek

Date range

event17-35 11/79 to

8/80Start of Iran/Iraq war

105-111 4/83 to 7/83 Beirut suicide attack, end of Iran/Iraq war

244 1/91 to 2/91 Desert Storm

413-425 1/95 to 7/95 Rabin assassination/start of Intifada

483-518 10/97 to 2/99

US/Iraq confrontation via Richard Butler/arms inspectors

522-539 4/99 to 11/99

Second intifada Israel/Palestine

Page 17: Task Question

Interaction graphs

• Model interactions between countries in a directed graph.

Date Actor Target Weis Code Wies event Goldstein scale790415 ARB ISR 223 (MIL ENGAGEMENT) -10790415 EGY AFD 194 (HALT NEGOTIATION) -3.8790415 PALPL ISR 223 (MIL ENGAGEMENT) -10790415 UNK ISR 223 (MIL ENGAGEMENT) -10790415 ISR EGY 31 (MEET) 1

ARB ISR

EGY UNK

AFD PALPL

Page 18: Task Question

Hubs and authorities for events data

• A hub node is an important initiator of events.• An authority node is an important target of

events.• Hypothesis:

– Identifying hubs and authorities over a particular temporal chunk of events data tells us who the key actors and targets are.

– Changes in the number and size of connected components in the interaction graph signal potential outbreak of conflict.

Page 19: Task Question

Hubs/Authorities picture of Iran Iraq war

Page 20: Task Question

2 weeks prior to Desert Storm

Page 21: Task Question

Validation using MID data• Number of bi-weeks with MIDS in Levant data: 41 out of

589.• Result 1: Hubs and Authorities correctly identify actors

and targets in impending conflict.• Result 2: Simple regression model on change in hubs

and authorities scores, change in number of connected components, change in size of largest component 4 weeks before MID, predicts MID onset.

• Problem: false alarm rate of 16% can be reduced by adding political knowledge of conflict.

Stoll and Subramanian, 2006

Page 22: Task Question
Page 23: Task Question

Current work

• Extracting economic events along with political events to improve accuracy of prediction of both economic and political events.

Page 24: Task Question

Publications

• An OKAPI-based approach for article filtering, Lee, Than, Stoll, Subramanian, 2006 Rice University Technical Report.

• Hubs, authorities and networks: predicting conflict using events data, R. Stoll and D. Subramanian, International Studies Association, 2006 (invited paper).

• Events, patterns and analysis, D. Subramanian and R. Stoll, in Programming for Peace: Computer-aided methods for international conflict resolution and prevention, 2006, Springer Verlag, R. Trappl (ed).

• Four Way Street? Saudi Arabia's Behavior among the superpowers, 1966-1999, R. Stoll and D. Subramanian, James A Baker III Institute for Public Policy Series, 2004.

• Events, patterns and analysis: forecasting conflict in the 21st century, R. Stoll and D. Subramanian, Proceedings of the National Conference on Digital Government Research, 2004.

• Forecasting international conflict in the 21st century, D. Subramanian and R. Stoll, in Proc. of the Symposium on Computer-aided methods for international conflict resolution, 2002.

Page 25: Task Question

The research team