extracting information for context-aware meeting preparation

Extracting Information for Context-aware Meeting Preparation

Simon Scerri, Behrang Q. Zadeh, Maciej Dabrowski, Ismael Rivera

26.05.2014 LREC 2014. Reykjavik, Iceland.

General Objectives

LREC 2014, Wednesday, 28 May 2014

Information Extraction Targets

Information Items & their attributes: (Semi)Structured
• Email Messages
• Instant Messages
• Documents
• Calendar Events
• Folders

Item Titles, Descriptions & Content: Complex/Unstructured
• Keywords
• Action Items: Information Request & Task Request


Architecture


Keyword Extraction - Method

Keyword Extraction

General Text Processing Indexing and Storage

Keyword Extraction - Method


• Generic term extraction architecture
• Based on the assumption that similar terms appear in similar contexts
• Use the context of previously known terms to identify new terms

• Random Indexing for the construction of a vector space model (VSM) at reduced dimension
• Create a training set using the previously known terms
• Use a linear least-squares support vector machine (SVM)
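The Random Indexing step can be illustrated with a minimal sketch. The slides name only the technique, so the dimensionality, the number of non-zero entries, the window size, and all function names below are assumptions made for illustration. The SVM training step is not shown.

```python
import random
from collections import defaultdict

DIM = 64      # reduced dimensionality (assumed value, not from the slides)
NONZERO = 4   # number of +1/-1 entries per index vector (assumed value)


def index_vector(word, dim=DIM, nonzero=NONZERO):
    """Return a sparse ternary index vector for `word` as {position: +1 or -1}."""
    rng = random.Random(word)  # seeded by the word, so the vector is reproducible
    positions = rng.sample(range(dim), nonzero)
    return {pos: rng.choice((-1, 1)) for pos in positions}


def context_vectors(sentences, window=2, dim=DIM):
    """For every word, sum the index vectors of the words within `window` of it."""
    vectors = defaultdict(lambda: [0] * dim)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    for pos, sign in index_vector(tokens[j], dim).items():
                        vectors[word][pos] += sign
    return vectors


def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    corpus = [["we", "use", "svm", "models"], ["we", "use", "crf", "models"]]
    vecs = context_vectors(corpus)
    # "svm" and "crf" share the same neighbours, so their context vectors coincide.
    print(vecs["svm"] == vecs["crf"])
```

Words that occur in the same contexts accumulate the same index vectors, which is the property the "similar terms appear in similar contexts" assumption relies on. Context vectors of known terms could then serve as positive training examples for a classifier.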

MLTagger
• Mining technical terms (Expert Vocabulary) in a semi-supervised manner (minimum user intervention)
• Train or Use Pre-Trained Models
• Input: a Sentence
• Tagger based on Liblinear SVMs
• Includes POS tags, Dependency Structures
• Includes user feedback to identify relevant terms

• Output: Set of weighted terms

technology term / 1.3071887518221268
term tagger / 0.859136213710545
technology term tagger / 0.75647809808033
technology related terms / 0.38733521155619
terms / 0.3856395759054531
Dependency / 0.24820541872752222
identification of technology related terms / 0.22234662115108667
technology / 0.2218680207043609
technology / 0.20526909576693653
features including POS tags / 0.169229802088223
Dependency Structures / 0.1408195803257369
features including Part / 0.12821844123781564
Part of Speech tags / 0.10986616318102964

Keyword Extraction - Evaluation


Evaluation over corpora of scientific papers
• Section A of ACL Anthology Reference Corpus
• Semantic Web Dog Food corpus

Evaluated datasets are available here: http://parsie.deri.ie/datasets/TTI/

Precision-Recall estimation

GATE Pipeline (English)
• Conditional Corpus

ANNIE IE System
• Tokeniser/NE Transducer/POS Tagger

Gazetteer Lookup
• Verbs (Actions, Activities, Modal verbs)
• Grammatical Person

JAPE Hand-coded Rules
• 62 rules in 16 phases
• Grammatical Person
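The combination of gazetteer lookup and hand-coded rules can be sketched in a few lines. This is not GATE or JAPE code. The word lists, the three rules, and the function names are hypothetical stand-ins for the actual gazetteers and the 62 rules, which the slides do not reproduce.

```python
import re

# Hypothetical mini-gazetteers; the real lists are not given in the slides.
MODAL_VERBS = {"can", "could", "will", "would", "should", "must"}
ACTION_VERBS = {"send", "review", "prepare", "schedule", "update", "share"}
SECOND_PERSON = {"you", "your"}


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def classify(sentence):
    """Return 'task_request', 'information_request', or None."""
    tokens = tokenize(sentence)
    if not tokens:
        return None
    has_second_person = any(t in SECOND_PERSON for t in tokens)
    has_action = any(t in ACTION_VERBS for t in tokens)
    # Rule 1: modal verb first, second person, and an action verb ("Could you send ...")
    if tokens[0] in MODAL_VERBS and has_second_person and has_action:
        return "task_request"
    # Rule 2: imperative of the form "please <action verb> ..."
    if tokens[0] == "please" and len(tokens) > 1 and tokens[1] in ACTION_VERBS:
        return "task_request"
    # Rule 3: a question addressed to the second person that names no action verb
    if sentence.strip().endswith("?") and has_second_person and not has_action:
        return "information_request"
    return None


if __name__ == "__main__":
    for text in ("Could you send me the slides?",
                 "Do you know when the meeting starts?",
                 "The meeting went well."):
        print(text, "->", classify(text))
```

A phased rule set like the one on the slide would apply many such patterns in sequence over annotated tokens. The sketch only shows how verb lists and grammatical person can jointly separate task requests from information requests.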

Action Item Extraction - Method



Action Item Extraction - Evaluation


Human vs Automatic Annotation
• > 100 email messages
• > 240 chat turns
• Confirmation of Extracted Action Items
• Marking False positives & False negatives (Missed Items)

Results
• F2-measure: 0.69
• Email only: 0.71
• IM only: 0.64
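For reference, the F2-measure is the F-beta score with beta = 2, which weights recall more heavily than precision. The sketch below shows how it follows from counts of confirmed items, false positives, and missed items. The function names are illustrative, and the counts in the usage example are made up, not taken from the evaluation.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision, recall, beta=2.0):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


if __name__ == "__main__":
    # Illustrative counts only: 8 confirmed items, 4 false positives, 2 missed items.
    p, r = precision_recall(tp=8, fp=4, fn=2)
    print(round(f_beta(p, r, beta=2.0), 3))  # 10/13, i.e. 0.769
```

With beta = 2, a missed action item costs more than a spurious one, which suits a setting where users confirm or reject the extracted items.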

Extracted Items: Unified Representation


Future Work


Action Item Extraction
• Separation of pipelines
• Email & IM
• IM Pipeline: Abbreviation/TxtSpk replacement service

Keyword Extraction
• Iterative Learning Procedure (App Validation)
• Active Learning – k-nearest-neighbour Regression instead of SVM
• Chat-email histories to Extract Background Knowledge
• Application of Association Measures for Filtering Candidate terms
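The slides do not say which association measures are intended. As one common example, the sketch below scores adjacent word pairs with pointwise mutual information (PMI) and keeps the pairs above a threshold. The choice of PMI, the threshold value, and the function names are all assumptions.

```python
import math
from collections import Counter


def pmi_scores(sentences):
    """PMI (base 2) for every adjacent word pair in the tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def filter_candidates(sentences, threshold=1.0):
    """Keep the word pairs whose PMI is at least `threshold` (assumed cut-off)."""
    return {pair for pair, score in pmi_scores(sentences).items() if score >= threshold}


if __name__ == "__main__":
    corpus = [["term", "tagger", "works"], ["term", "tagger", "helps"]]
    for pair, score in sorted(pmi_scores(corpus).items()):
        print(pair, round(score, 3))
```

PMI tends to overrate rare pairs, so in practice it is often combined with a minimum frequency cut-off or replaced by another measure.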
