exploring temporal information for...

1
TUTA1 at the NTCIR-11 Temporalia Task Exploring Temporal Information for TQIC Hai-Tao Yu ¶* & Xin Kang †‡* & Fuji Ren Electronics and Information, Tongji University Faculty of Engineering, The University of Tokushima Faculty of Library, Information and Media Science, University of Tsukuba [email protected], [email protected], [email protected], [email protected] Abstract For NTCIR-11 Temporalia subtask Temporal Query Intent Classification (TQIC), we carefully study tempo- ral information in the dry-run search queries, explore time gap, verb tense, lemma and named entity as temporal features, and build supervised and semi-supervised linear classifiers. We report the Precision and over Precision scores through RUN-1 to RUN-3 as well as a baseline RUN-4, compare the performance with respect to different parameter and learning algorithm configurations, and analyze the TQIC errors. We find the time gap and verb tense features with a supervised classifier are effective in separating the Past and Future queries, while the lemma and named entity feature could help predicting the Recent and Atemporal queries with a semi-supervised classifier. Introduction The TUTA1 group at The University of Tokushima participated in two subtasks, Temporal Query In- tent Classification (TQIC) and Temporal Information Retrieval (TIR), of the new pilot task Temporal Information Access[1] (Temporalia) at NTCIR-11. The TQIC subtask focuses on the identification of user’s temporal intent given the query string and submission date, across four temporal categories Past, Recent 1 , Future, and Atemporal. Challenges 1. What are effective temporal features in search queries? 2. How to explore temporal information in the background? Temporal Feature Extraction Because query strings are usually very short (4.2 words in dry-run on average), to find useful temporal features in queries and to explore the background information seem to be prominent in this subtask. AOL 500K User Session Collection 2 [2] is employed to expand our knowledge of temporal features, through a semi-supervised learning model. Class Query String Submit Date Temporal Feature Future June 2013 movie releases May 28, 2013 2013-06 DIFF future Future 2013 winter weather forecast Oct 28, 2013 2013-WI DIFF future Future weather for tomorrow Oct 28, 2013 P1D DIFF future Future comet coming in 2013 Oct 28, 2013 2013 DIFF same year Future comet coming in 2013 Oct 28, 2013 VBG UVT VGB VGB come Past when did hawaii become a state Feb 28, 2013 VBD VB UVT VBD VBD do VB become Atemp New York Times Feb 28, 2013 ORGAN New York Times Past Yuri Gagarin Cause of Death Feb 28, 2013 PERSON Yuri Gagarin Recent Boston Bruins Scores Oct 13, 2013 ORGAN Boston Bruins Table 1: Extracting the time gap features. Time Gap The ideal temporal features should indicate the gap between the intended time point in a search query and the query submission time, in which case the Temporal Query Intent Classification problem falls back to evaluating this time gap. We employ the SUTIME library in Stanford CoreNLP pipeline to recognize and normalize the temporal expressions in search queries. ROOT SBARQ WHADVP WRB when SQ VBD did NP NNS hawaii VP VB become NP DT a NN state Verb Tense Another important temporal feature in search query is the verb tense, which include the past tense (VBD), the singular present tense (VBZ/VBP for 3rd person/non- 3rd person), the present participle tense (VBG), the past participle tense (VBN), and the base tense (VB). The verb tense features are represented by the combination of POS tags and verb lemmas. In case of multiple verbs in a query string, we use the Uppermost Verb Tense UVT VB* to rep- resent a user’s temporal intent, in which VB* is the tense of the main predicate. Verb tense and the main predicate in a search query are obtained through Stanford POS tag- ger library and Stanford Parser library in Stanford CoreNLP pipeline, which follows Penn Treebank tag set for POS tag- ging. Figure 1: Temporal feature ratios. Lemma and Named Entity Not all query strings contain time gap and verb tense features. We investigate the word lemmas and named entities in queries, which are still sparse but count the majority of query contents, to indicate the tem- poral intents. Stanford Named Entity Recognizer in Stan- ford CoreNLP pipeline is employed to recognize named en- tities in search queries. Temporal Intent Classifiers Supervised classifier: Logistic Regression Classifier in scikit-learn 0.15.0. Semi-supervised classifier: Linear SVM Classifier in SVMlin[3]. Experiments Experiment Setup The supervised classifier is trained with the 80 labeled examples in dry-run dataset, while the semi-supervised classifier is trained with extra 3.1M unlabeled examples in the AOL dataset. Model parameters are selected through a 5-fold cross validation on the training dataset, based on the overall Precision. All models are tested on the formal-run dataset, which contains 300 labeled examples. RUN-1-to-3 are submitted, and RUN-4 serves as a baseline. Run Feature Dataset Classifier Hyper Parameter 1 All temporal features Dry-run LGR C = 30, penalty = l 1 2 All temporal features Dry-run LGR C = 300, penalty = l 1 3 All temporal features Dry-run, AOL SVMlin A =2, W =0.03, U =3, R =0.03 4 Lemma & named entity Dry-run LGR C =3, penalty = l 2 Table 2: TQIC runs. Experiment Result TQIC results are evaluated on the formal-run dataset, based on the classification Precision for each temporal class τ P (τ )= correct(τ ) total (τ ) , (1) and the overall Precision ¯ P = τ correct(τ ) τ total (τ ) . (2) Run Past Recent Future Atemp Overall 1 0.8533 0.4800 0.8533 0.7733 0.7400 2 0.8533 0.4667 1 0.8267 1 0.7600 0.7267 3 0.8667 1,2 0.5867 1,2 0.8400 1,2 0.5333 1,2 0.7067 4 0.6933 1,2,3 0.4800 1,2,3 0.8133 1,2,3 0.6400 1,2,3 0.6567 Table 3: Precision scores. Wilcoxon signed-rank test with p< 0.05 is em- ployed for statistical significance test: superscripts 1, 2, 3 indicate statistically significant differences to RUN-1, RUN-2, and RUN-3 respectively. RUN-1 achieves the highest over- all Precision and Precision for Fu- ture and Atemporal, while RUN- 3 yields the highest Precision for Past and Recent. Results suggest that time gap and verb tense are effective in separating Past, Fu- ture, and even Atemporal, and the background information helps our semi-supervised classifier to fur- ther improve on Recent and Past. Figure 2: Confusion matrices for 4 Runs. Error Analysis Recent and Atemporal seem more difficult to predict than Past and Future. In each sub- plot, the cell located at row i and column j corre- sponds to the number of observations known to be in class i while predicted as class j . For all runs, the mis-prediction of Recent to Future counts the largest number of errors, while the mis-predictions of Atem- poral and Future to Recent count a significant part of classification errors. Time gap features DIFF same * turn to be less indicative, since they cannot suggest a useful gap. 11 mis-classifications of Future on weather queries indicate an over-fitting problem, since 5/6 weather queries in dry-run have a Future label. 5 mis-classifications of Future on tonight queries, which are all labeled as Recent in formal-run, may reflect ei- ther an incorrectly learned feature or a vague boundary for Recent examples in feature space (the same query string “bruins game tonight time” of id 078 in dry-run and id 194 in formal-run, although submitted in different dates, was labeled as Future and Recent separately). The failure of understanding named entities, e.g. “belmont stakes 2013” and “voice 2013”, is also responsible for some of the errors. Conclusions Three temporal features were extracted for temporal information representation in search queries. A semi-supervised classifier was developed to expand the temporal feature on an unlabeled dataset. Recent-Future, Atemporal-Recent, and Future-Recent counted a big part of the mis-classification. Forthcoming Research Our future work will focus on investigating temporal information in lemmas and named entities. Meanwhile, methods for preventing the learning algorithms from over-fitting will also be employed. References [1] H. Joho, A. Jatowt, and R. Blanco. Ntcir temporalia: a test collection for temporal information access research. In Proceedings of the companion publication of the 23rd international confer- ence on World wide web companion, pages 845–850. International World Wide Web Conferences Steering Committee, 2014. [2]G. Pass, A. Chowdhury, and C. Torgeson. A picture of search. In InfoScale, volume 152, page 1. Citeseer, 2006. [3]V. Sindhwani and S. S. Keerthi. Large scale semi-supervised linear svms. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 477–484. ACM, 2006. 1 According to task description, the Recent category corresponds to the “very near past or at present time” temporal intents in search queries 2 http://www.gregsadetsky.com/aol-data/

Upload: others

Post on 07-Jun-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Exploring Temporal Information for TQICresearch.nii.ac.jp/ntcir/workshop/OnlineProceedings11/pdf/NTCIR/... · Exploring Temporal Information for TQIC Hai-Tao Yu{& Xin Kangyz & Fuji

TUTA1 at the NTCIR-11 Temporalia TaskExploring Temporal Information for TQIC

Hai-Tao Yu¶∗ & Xin Kang†‡∗ & Fuji Ren‡†Electronics and Information, Tongji University‡Faculty of Engineering, The University of Tokushima¶Faculty of Library, Information and Media Science, University of Tsukuba

[email protected], [email protected],[email protected], [email protected]

AbstractFor NTCIR-11 Temporalia subtask Temporal Query Intent Classification (TQIC), we carefully study tempo-

ral information in the dry-run search queries, explore time gap, verb tense, lemma and named entity as temporalfeatures, and build supervised and semi-supervised linear classifiers. We report the Precision and over Precisionscores through RUN-1 to RUN-3 as well as a baseline RUN-4, compare the performance with respect to differentparameter and learning algorithm configurations, and analyze the TQIC errors. We find the time gap and verb tensefeatures with a supervised classifier are effective in separating the Past and Future queries, while the lemma andnamed entity feature could help predicting the Recent and Atemporal queries with a semi-supervised classifier.

IntroductionThe TUTA1 group at The University of Tokushima participated in two subtasks, Temporal Query In-tent Classification (TQIC) and Temporal Information Retrieval (TIR), of the new pilot task TemporalInformation Access[1] (Temporalia) at NTCIR-11. The TQIC subtask focuses on the identificationof user’s temporal intent given the query string and submission date, across four temporal categoriesPast, Recent1, Future, and Atemporal.

Challenges1. What are effective temporal features in search queries?

2. How to explore temporal information in the background?

Temporal Feature ExtractionBecause query strings are usually very short (4.2 words in dry-run on average), to find useful temporalfeatures in queries and to explore the background information seem to be prominent in this subtask.AOL 500K User Session Collection2[2] is employed to expand our knowledge of temporal features,through a semi-supervised learning model.

Class Query String Submit Date Temporal Feature

Future June 2013 movie releases May 28, 2013 2013-06 DIFF futureFuture 2013 winter weather forecast Oct 28, 2013 2013-WI DIFF futureFuture weather for tomorrow Oct 28, 2013 P1D DIFF futureFuture comet coming in 2013 Oct 28, 2013 2013 DIFF same year

Future comet coming in 2013 Oct 28, 2013 VBGUVT VGBVGB come

Past when did hawaii become a state Feb 28, 2013 VBD VBUVT VBDVBD do VB become

Atemp New York Times Feb 28, 2013 ORGAN New York TimesPast Yuri Gagarin Cause of Death Feb 28, 2013 PERSON Yuri GagarinRecent Boston Bruins Scores Oct 13, 2013 ORGAN Boston Bruins

Table 1: Extracting the time gap features.

Time Gap The ideal temporal features should indicate the gap between the intended time point in asearch query and the query submission time, in which case the Temporal Query Intent Classificationproblem falls back to evaluating this time gap. We employ the SUTIME library in Stanford CoreNLPpipeline to recognize and normalize the temporal expressions in search queries.

ROOT

SBARQ

WHADVP

WRB

when

SQ

VBD

did

NP

NNS

hawaii

VP

VB

become

NP

DT

a

NN

state

Verb Tense Another important temporal feature in searchquery is the verb tense, which include the past tense (VBD),the singular present tense (VBZ/VBP for 3rd person/non-3rd person), the present participle tense (VBG), the pastparticiple tense (VBN), and the base tense (VB). The verbtense features are represented by the combination of POStags and verb lemmas. In case of multiple verbs in a querystring, we use the Uppermost Verb Tense UVT VB* to rep-resent a user’s temporal intent, in which VB* is the tenseof the main predicate. Verb tense and the main predicatein a search query are obtained through Stanford POS tag-ger library and Stanford Parser library in Stanford CoreNLPpipeline, which follows Penn Treebank tag set for POS tag-ging.

Figure 1: Temporal feature ratios.

Lemma and Named Entity Not all query strings containtime gap and verb tense features. We investigate the wordlemmas and named entities in queries, which are still sparsebut count the majority of query contents, to indicate the tem-poral intents. Stanford Named Entity Recognizer in Stan-ford CoreNLP pipeline is employed to recognize named en-tities in search queries.

Temporal Intent Classifiers• Supervised classifier: Logistic Regression Classifier in

scikit-learn 0.15.0.

• Semi-supervised classifier: Linear SVM Classifier inSVMlin[3].

ExperimentsExperiment Setup The supervised classifier is trained with the 80 labeled examples in dry-rundataset, while the semi-supervised classifier is trained with extra 3.1M unlabeled examples in theAOL dataset. Model parameters are selected through a 5-fold cross validation on the training dataset,based on the overall Precision. All models are tested on the formal-run dataset, which contains 300labeled examples. RUN-1-to-3 are submitted, and RUN-4 serves as a baseline.

Run Feature Dataset Classifier Hyper Parameter

1 All temporal features Dry-run LGR C = 30, penalty = l12 All temporal features Dry-run LGR C = 300, penalty = l13 All temporal features Dry-run, AOL SVMlin A = 2, W = 0.03, U = 3, R = 0.034 Lemma & named entity Dry-run LGR C = 3, penalty = l2

Table 2: TQIC runs.

Experiment Result TQIC results are evaluated on the formal-run dataset, based on the classificationPrecision for each temporal class τ

P (τ ) =correct(τ )

total(τ ), (1)

and the overall PrecisionP̄ =

∑τ correct(τ )∑τ total(τ )

. (2)

Run Past Recent Future Atemp Overall

1 0.8533 0.4800 0.8533 0.7733 0.74002 0.8533 0.46671 0.82671 0.7600 0.72673 0.86671,2 0.58671,2 0.84001,2 0.53331,2 0.70674 0.69331,2,3 0.48001,2,3 0.81331,2,3 0.64001,2,3 0.6567

Table 3: Precision scores. Wilcoxon signed-rank test with p < 0.05 is em-ployed for statistical significance test: superscripts 1, 2, 3 indicate statisticallysignificant differences to RUN-1, RUN-2, and RUN-3 respectively.

RUN-1 achieves the highest over-all Precision and Precision for Fu-ture and Atemporal, while RUN-3 yields the highest Precision forPast and Recent. Results suggestthat time gap and verb tense areeffective in separating Past, Fu-ture, and even Atemporal, and thebackground information helps oursemi-supervised classifier to fur-ther improve on Recent and Past.

P R F A

P

R

F

A

64

0

0

2

1

36

9

9

5

29

64

6

5

10

2

58

P R F A

64

0

0

2

1

35

11

9

6

31

62

7

4

9

2

57

P

R

F

A

65

0

0

5

3

44

11

21

5

30

63

9

2

1

1

40

52

3

1

11

1

36

10

10

12

29

61

6

10

7

3

480

20

40

60

80

Figure 2: Confusion matrices for 4 Runs.

Error Analysis Recent and Atemporal seem moredifficult to predict than Past and Future. In each sub-plot, the cell located at row i and column j corre-sponds to the number of observations known to be inclass i while predicted as class j. For all runs, themis-prediction of Recent to Future counts the largestnumber of errors, while the mis-predictions of Atem-poral and Future to Recent count a significant part ofclassification errors. Time gap features DIFF same *turn to be less indicative, since they cannot suggesta useful gap. 11 mis-classifications of Future onweather queries indicate an over-fitting problem, since5/6 weather queries in dry-run have a Future label. 5mis-classifications of Future on tonight queries, whichare all labeled as Recent in formal-run, may reflect ei-ther an incorrectly learned feature or a vague boundaryfor Recent examples in feature space (the same querystring “bruins game tonight time” of id 078 in dry-run and id 194 in formal-run, although submittedin different dates, was labeled as Future and Recent separately). The failure of understanding namedentities, e.g. “belmont stakes 2013” and “voice 2013”, is also responsible for some of the errors.

Conclusions• Three temporal features were extracted for temporal information representation in search queries.

•A semi-supervised classifier was developed to expand the temporal feature on an unlabeled dataset.

• Recent-Future, Atemporal-Recent, and Future-Recent counted a big part of the mis-classification.

Forthcoming ResearchOur future work will focus on investigating temporal information in lemmas and named entities.Meanwhile, methods for preventing the learning algorithms from over-fitting will also be employed.

References[1] H. Joho, A. Jatowt, and R. Blanco. Ntcir temporalia: a test collection for temporal information

access research. In Proceedings of the companion publication of the 23rd international confer-ence on World wide web companion, pages 845–850. International World Wide Web ConferencesSteering Committee, 2014.

[2] G. Pass, A. Chowdhury, and C. Torgeson. A picture of search. In InfoScale, volume 152, page 1.Citeseer, 2006.

[3] V. Sindhwani and S. S. Keerthi. Large scale semi-supervised linear svms. In Proceedings of the29th annual international ACM SIGIR conference on Research and development in informationretrieval, pages 477–484. ACM, 2006.

1According to task description, the Recent category corresponds to the “very near past or at present time” temporal intents in search queries2http://www.gregsadetsky.com/aol-data/