Extracting Information for Context-aware Meeting Preparation
DESCRIPTION
People working in an office environment have to manage and access large volumes of information. The problem is frequently due to machines being unable to recognise the many implicit relationships between office artefacts, and to their lack of awareness of the context surrounding these artefacts. To expose these relationships and enrich artefact context, text analytics can be employed over semi-structured and unstructured content, including free text. In this paper, we explain how this strategy is applied to a specific use case: supporting the attendees of a calendar event in preparing for the meeting.
TRANSCRIPT
Extracting Information for Context-aware Meeting Preparation
Simon Scerri, Behrang Q. Zadeh, Maciej Dabrowski, Ismael Rivera
26.05.2014 LREC 2014. Reykjavik, Iceland.
General Objectives
LREC 2014, Wednesday, 28 May 2014
Information Extraction Targets
Information Items & their attributes: (Semi)Structured
• Email Messages
• Instant Messages
• Documents
• Calendar Events
• Folders
Item Titles, Descriptions & Content: Complex/Unstructured
• Keywords
• Action Items: Information Request & Task Request
Architecture
Keyword Extraction - Method
Figure (components): Keyword Extraction; General Text Processing; Indexing and Storage
• Generic term extraction architecture
• Based on the assumption that similar terms appear in similar contexts
• Uses the context of previously known terms to identify new terms
• Random Indexing to construct a vector space model (VSM) of reduced dimensionality
• Creates a training set from the previously known terms
• Uses a linear least-squares support vector machine (SVM)
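The Random Indexing step above can be illustrated with a minimal sketch. It is not the authors' implementation: the dimensionality, the number of non-zero entries, the window size and the whitespace tokenisation are all assumed values. Each context word receives a fixed sparse random index vector of +1/-1 entries, and a term's context vector is the running sum of its neighbours' index vectors, so terms that occur in similar contexts end up with similar vectors.

```python
import random
from collections import defaultdict


def index_vector(dim, nonzero, rng):
    """Sparse ternary random index vector: `nonzero` positions set to +1 or -1."""
    vec = [0.0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((1.0, -1.0))
    return vec


def random_indexing(sentences, dim=512, nonzero=8, window=2, seed=0):
    """Build one context vector per token by summing its neighbours' index vectors."""
    rng = random.Random(seed)
    index = {}                                   # word -> fixed random index vector
    context = defaultdict(lambda: [0.0] * dim)   # term -> accumulated context vector
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                neighbour = tokens[j]
                if neighbour not in index:
                    index[neighbour] = index_vector(dim, nonzero, rng)
                ctx = context[target]
                for k, value in enumerate(index[neighbour]):
                    if value:
                        ctx[k] += value
    return dict(context)


def cosine(a, b):
    """Cosine similarity (0.0 when either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Because random index vectors are nearly orthogonal, these context vectors approximate a full word-by-word co-occurrence matrix at a fraction of its width; the vectors of previously known terms would then serve as positive training examples for the classifier.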
MLTagger
• Mines technical terms (expert vocabulary) in a semi-supervised manner (minimal user intervention)
• Trains new models or uses pre-trained ones
• Input: a sentence
• Tagger based on Liblinear SVMs
• Features include POS tags and dependency structures
• Incorporates user feedback to identify relevant terms
• Output: a set of weighted terms
Example output (term / weight):
technology term / 1.3071887518221268
term tagger / 0.859136213710545
technology term tagger / 0.75647809808033
technology related terms / 0.38733521155619
terms / 0.3856395759054531
Dependency / 0.24820541872752222
identification of technology related terms / 0.22234662115108667
technology / 0.2218680207043609
technology / 0.20526909576693653
features including POS tags / 0.169229802088223
Dependency Structures / 0.1408195803257369
features including Part / 0.12821844123781564
Part of Speech tags / 0.10986616318102964
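The tagger is built on Liblinear SVMs; to keep the example dependency-free, the sketch below swaps in a linear least-squares SVM solved in its primal form. With the constraints y_i(w·x_i + b) = 1 - e_i and y_i = ±1, the objective ½‖w‖² + (γ/2)Σe_i² is exactly ridge regression on ±1 labels with an unregularised bias. The classifier scores are then turned into a set of weighted terms like the one above. The two-dimensional feature vectors (for instance a noun-phrase flag and a similarity to known terms) and γ = 10 are hypothetical; the real tagger uses POS, dependency and context features.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train_linear_ls_svm(features, labels, gamma=10.0):
    """Primal linear LS-SVM: ridge regression on +1/-1 labels with an unregularised bias."""
    rows = [list(f) + [1.0] for f in features]  # last column models the bias b
    d = len(rows[0])
    # Normal equations: (X^T X + (1/gamma) * D) theta = X^T y, D = identity minus the bias entry
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    for i in range(d - 1):
        xtx[i][i] += 1.0 / gamma
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(d)]
    return solve(xtx, xty)  # [w_1, ..., w_k, b]


def weighted_terms(theta, candidates):
    """Score each candidate's feature vector; keep positive scores, highest first."""
    scored = []
    for term, feats in candidates.items():
        score = sum(w * x for w, x in zip(theta, list(feats) + [1.0]))
        if score > 0:
            scored.append((term, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```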
Keyword Extraction - Evaluation
Evaluation over corpora of scientific papers:
• Section A of the ACL Anthology Reference Corpus
• Semantic Web Dog Food corpus
Evaluated datasets are available at: http://parsie.deri.ie/datasets/TTI/
Precision-Recall estimation
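Because the extractor returns weighted terms, precision and recall can be estimated at each cut-off of the ranked list. The sketch below assumes a hypothetical gold-standard term list and exact, case-insensitive matching; the slide does not describe the actual evaluation protocol.

```python
def pr_curve(ranked_terms, gold_terms):
    """Precision and recall after each cut-off k of a ranked list of extracted terms.

    Matching is exact and case-insensitive (an assumption); duplicate terms in the
    ranking are counted once, at their first (highest-weighted) position.
    """
    gold = {t.lower() for t in gold_terms}
    ranked = list(dict.fromkeys(t.lower() for t in ranked_terms))  # dedupe, keep order
    points, hits = [], 0
    for k, term in enumerate(ranked, start=1):
        if term in gold:
            hits += 1
        recall = hits / len(gold) if gold else 0.0
        points.append((k, hits / k, recall))
    return points
```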
GATE Pipeline (English)
• Conditional Corpus
ANNIE IE System
• Tokeniser / NE Transducer / POS Tagger
Gazetteer Lookup
• Verbs (Actions, Activities, Modal verbs)
• Grammatical Person
JAPE Hand-coded Rules
• 62 rules in 16 phases
• Grammatical Person
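To make the gazetteer-plus-rules idea concrete, the following sketch shows how verb and grammatical-person lookups can be combined by hand-written rules to label a sentence as a task request or an information request. It is a hypothetical stand-in: it is not JAPE, the word lists and the three rules are invented for illustration, and it does not reproduce the 62 rules of the actual grammar.

```python
import re

# Hypothetical mini-gazetteers standing in for GATE gazetteer lists.
MODAL_VERBS = {"can", "could", "would", "will", "should", "must"}
ACTION_VERBS = {"send", "review", "prepare", "check", "book", "update", "share"}
SECOND_PERSON = {"you", "your"}
WH_WORDS = {"what", "when", "where", "who", "which", "how"}


def classify_action_item(sentence):
    """Label one sentence as a task request, an information request, or None."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    is_question = sentence.rstrip().endswith("?")
    addressed_to_reader = any(t in SECOND_PERSON for t in tokens)

    # Rule 1: modal + second person + action verb ("Could you send ...").
    for i in range(len(tokens) - 2):
        if (tokens[i] in MODAL_VERBS and tokens[i + 1] in SECOND_PERSON
                and tokens[i + 2] in ACTION_VERBS):
            return "task request"

    # Rule 2: imperative opening with an action verb, optionally after "please".
    body = tokens[1:] if tokens[:1] == ["please"] else tokens
    if body and body[0] in ACTION_VERBS:
        return "task request"

    # Rule 3: a question that is a wh-question or is addressed to the reader.
    if is_question and ((tokens and tokens[0] in WH_WORDS) or addressed_to_reader):
        return "information request"
    return None
```

Rule order matters here much as JAPE phases do: the more specific task-request patterns are tried before the generic question rule, so "Could you send the agenda?" is labelled a task request rather than an information request.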
Action Item Extraction - Method
Action Item Extraction - Evaluation
Human vs Automatic Annotation
• > 100 email messages
• > 240 chat turns
• Confirmation of extracted action items
• Marking false positives & false negatives (missed items)
Results
• F2-measure: 0.69
• Email only: 0.71
• IM only: 0.64
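The results above are reported as an F2-measure. For reference, the standard F-beta score can be computed from true positives (TP), false positives (FP) and false negatives (FN) as sketched below; the counts used to exercise it are made up and are not the study's annotation data.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta from counts: (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP).

    With beta = 2, recall is weighted more heavily than precision,
    so missed action items (false negatives) cost more than false positives.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0
```

For example, with 6 true positives, 4 false positives and no misses, precision is 0.6 and recall is 1.0, giving F2 of about 0.88 against an F1 of 0.75.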
Extracted Items: Unified Representation
Future Work
Action Item Extraction
• Separate pipelines for email and IM
• IM pipeline: abbreviation/TxtSpk replacement service
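A dictionary-based replacement service of the kind listed above could look like the following sketch. The abbreviation table is hypothetical and deliberately tiny; the idea is that normalising chat text first would let downstream gazetteer lookups (for example, matching the pronoun "you" for grammatical person) fire on IM messages as they do on email.

```python
import re

# Hypothetical lookup table; a real service would use a much larger, curated list.
TXTSPK = {
    "u": "you",
    "pls": "please",
    "plz": "please",
    "thx": "thanks",
    "tmrw": "tomorrow",
    "2day": "today",
    "b4": "before",
}


def normalise_txtspk(message, table=TXTSPK):
    """Replace whole-word abbreviations, leaving other tokens and punctuation intact."""
    def repl(match):
        word = match.group(0)
        return table.get(word.lower(), word)  # unknown words are kept unchanged
    return re.sub(r"[A-Za-z0-9']+", repl, message)
```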
Keyword Extraction
• Iterative Learning Procedure (App Validation)
• Active Learning: k-nearest-neighbour regression instead of SVM
• Chat and email histories to extract background knowledge
• Application of association measures for filtering candidate terms
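One plausible reason for replacing the SVM with k-nearest-neighbour regression in an iterative, active-learning loop is that a newly validated term can simply be appended to the labelled set without retraining a model. The sketch below is a hypothetical illustration: the vectors could be, for instance, Random Indexing context vectors, relevance scores are assumed to lie in [0, 1], and k and the similarity weighting are illustrative choices.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_relevance(candidate, labelled, k=3):
    """Similarity-weighted mean relevance of the k labelled vectors closest to `candidate`.

    `labelled` is a list of (vector, relevance) pairs, e.g. terms validated by users.
    """
    if not labelled:
        return 0.0
    neighbours = sorted(labelled, key=lambda item: cosine(candidate, item[0]), reverse=True)[:k]
    weights = [max(cosine(candidate, vec), 0.0) for vec, _ in neighbours]
    total = sum(weights)
    if total == 0.0:  # no positively similar neighbour: fall back to a plain mean
        return sum(rel for _, rel in neighbours) / len(neighbours)
    return sum(w * rel for w, (_, rel) in zip(weights, neighbours)) / total


def add_feedback(labelled, vector, relevance):
    """Active-learning step: store a user-validated term; no retraining is needed."""
    labelled.append((vector, relevance))
```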