
IGT DETECTION

A Machine Learning Approach to Sequence Labeling Task

Sabrina Burleigh
Ankit Srivastava

LING 572 Presentation | March 8, 2007

March 8, 2007 | Ling 572 - Sabrina and Ankit

OVERVIEW

• Label Set
• Feature Templates
• Beam Search
• System Modification
• Final System
• Conclusion

[Figure: example document lines, each labeled IGT or NOT IGT]


3 LABELS

• First line of IGT: IGTB
• Remaining lines of IGT: IGTR
• Lines not IGT: NON_IGT
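The label scheme above can be illustrated with a minimal sketch that turns annotated IGT spans into one label per line. The function name, the 0-based inclusive span format, and the example span are assumptions made for illustration; the slides do not show how the training labels were produced.

```python
from typing import List, Tuple

def spans_to_labels(n_lines: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Label each line as IGTB, IGTR, or NON_IGT.

    `spans` holds (start, end) line indices, 0-based and inclusive.
    This span format is an assumption, not taken from the slides.
    """
    labels = ["NON_IGT"] * n_lines
    for start, end in spans:
        labels[start] = "IGTB"              # first line of the IGT
        for i in range(start + 1, end + 1):
            labels[i] = "IGTR"              # remaining lines of the IGT
    return labels

# A 6-line document with one 3-line IGT on lines 1..3.
example = spans_to_labels(6, [(1, 3)])
print(example)
```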


BINARY FEATURES

• Presence of number or letter at beg. of line (e.g. "(1)" or "a.")
• Presence of quote at beg. of line
• Presence of >= 6 spaces at beg. of line
• 2 features testing for gloss line (e.g. "bring-1sg") (e.g. "cook-NM-DAT")

[Figure: example IGT with lines labeled IGTB, IGTR, IGTR, IGTR]
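A rough sketch of how the binary features listed above might be computed for a single line. The regular expressions and feature names are assumptions chosen to match the slide's examples ("(1)", "a.", "bring-1sg", "cook-NM-DAT"); the actual feature templates used in the system are not shown in the slides.

```python
import re
from typing import Dict

# All patterns below are illustrative guesses, not the authors' templates.
NUM_OR_LETTER = re.compile(r"^\s*\(?(?:\d+|[A-Za-z])[.)]")       # "(1)" or "a."
LEADING_QUOTE = re.compile(r"^\s*[\"'`\u2018\u201c]")             # quote at line start
GLOSS_PERSON = re.compile(r"\b\w+-[123](?:sg|pl)\b", re.IGNORECASE)  # "bring-1sg"
GLOSS_UPPER = re.compile(r"\b\w+(?:-[A-Z]{2,})+\b")               # "cook-NM-DAT"

def extract_features(line: str, min_spaces: int = 6) -> Dict[str, bool]:
    """Return the binary features for one line of text."""
    leading_spaces = len(line) - len(line.lstrip(" "))
    return {
        "starts_with_number_or_letter": bool(NUM_OR_LETTER.search(line)),
        "starts_with_quote": bool(LEADING_QUOTE.search(line)),
        "many_leading_spaces": leading_spaces >= min_spaces,
        "gloss_person_number": bool(GLOSS_PERSON.search(line)),
        "gloss_uppercase_gram": bool(GLOSS_UPPER.search(line)),
    }

print(extract_features("        bring-1sg cook-NM-DAT"))
```

The `min_spaces` parameter is exposed because the threshold is a tunable choice rather than a fixed property of the task.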


BASELINE

NB        PREC    RECALL   F-MEAS
EXACT      5.24     5.82     5.51
ALMOST    91.13    83.45    87.12

MAXENT    PREC    RECALL   F-MEAS
EXACT     16.19    26.62    20.14
ALMOST    74.69    93.29    82.96
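For reference, the F-MEAS column in the baseline rows agrees, up to rounding, with the usual balanced F-measure (the harmonic mean of precision and recall). The helper below is a small sketch of that formula; that the authors' scoring script computes it this way is an assumption.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Baseline rows (precision, recall) from the slide, in percent.
baseline = {
    "NB exact": (5.24, 5.82),
    "NB almost": (91.13, 83.45),
    "MaxEnt exact": (16.19, 26.62),
    "MaxEnt almost": (74.69, 93.29),
}
for name, (p, r) in baseline.items():
    print(f"{name}: F = {f_measure(p, r):.2f}")
```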


BEAM SEARCH

• No pruning
• Using topN: 0 (default), 3, 5, 10
• All other parameters set to default

MaxEnt    PREC    RECALL   F-MEAS
EXACT      0.57     3.80     0.99
ALMOST    21.90   100      35.93
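To make the topN setting concrete, here is a generic beam search over label sequences. It is only a sketch: the slides do not show the decoder that was used, `log_prob` is a hypothetical stand-in for the classifier's score of a label given the line index and the previous label, and treating `top_n=0` as "keep every hypothesis" is an assumption about what the default means.

```python
import math
from typing import Callable, List, Tuple

LABELS = ("IGTB", "IGTR", "NON_IGT")

def beam_search(n_lines: int,
                log_prob: Callable[[int, str, str], float],
                top_n: int = 0) -> List[str]:
    """Return the best-scoring label sequence for n_lines lines.

    top_n > 0 keeps only the top_n partial sequences after each line;
    top_n == 0 keeps all of them (assumed meaning of the default).
    """
    beam: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    for i in range(n_lines):
        expanded = []
        for seq, score in beam:
            prev = seq[-1] if seq else "BOS"   # BOS marks the start of the document
            for label in LABELS:
                expanded.append((seq + (label,), score + log_prob(i, prev, label)))
        expanded.sort(key=lambda item: item[1], reverse=True)
        beam = expanded[:top_n] if top_n > 0 else expanded
    return list(beam[0][0])

def toy_log_prob(i: int, prev: str, label: str) -> float:
    """Toy scorer: prefers NON_IGT, IGTB, IGTR on lines 0, 1, 2."""
    preferred = ("NON_IGT", "IGTB", "IGTR")[i]
    return math.log(0.8) if label == preferred else math.log(0.1)

print(beam_search(3, toy_log_prob, top_n=3))
```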


MODIFICATION – 1.1

• Better mapping from classification results to IGT spans

[Figure: predicted label sequence NON_IGT, IGTR, IGTR, IGTB]


MODIFICATION – 1.2

• Better mapping from classification results to IGT spans

[Figure: predicted label sequence NON_IGT, IGTB, IGTR, IGTR]


MODIFICATION – 1.3

• Was not successful in span improvement
• Led us to implement gram detection

[Figure: predicted label sequence IGTB, NON_IGT, IGTR, IGTR]


MODIFICATION – 1.4

• Don't count single IGT line

[Figure: predicted label sequence IGTB, NON_IGT, NON_IGT, NON_IGT]
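One way to picture the post-processing in Modification 1 is a function that maps a predicted label sequence to line spans and discards single-line spans. This is a sketch under assumptions, not the authors' span-file code: in particular, letting an IGTR that follows a NON_IGT line open a new span is a guess at what a "better mapping" could mean, and only the single-line rule comes directly from 1.4.

```python
from typing import List, Tuple

def labels_to_spans(labels: List[str], min_len: int = 2) -> List[Tuple[int, int]]:
    """Convert per-line labels into (start, end) spans, 0-based inclusive.

    Spans shorter than min_len lines are dropped, which with the
    default of 2 means a lone IGT line is not counted.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, label in enumerate(labels):
        if label == "NON_IGT":
            if start is not None:
                spans.append((start, i - 1))   # close the open span
                start = None
        elif label == "IGTB":
            if start is not None:
                spans.append((start, i - 1))   # a new IGT begins; close the old one
            start = i
        elif start is None:
            start = i                          # assumption: orphan IGTR opens a span
    if start is not None:
        spans.append((start, len(labels) - 1))
    return [(s, e) for s, e in spans if e - s + 1 >= min_len]

print(labels_to_spans(["NON_IGT", "IGTB", "IGTR", "IGTR"]))
print(labels_to_spans(["IGTB", "NON_IGT", "NON_IGT", "NON_IGT"]))
```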


MODIFICATION – 1

• Results: 28.42 % change on exact (F-measure 20.14 to 48.56 against the MaxEnt baseline)
  -9.12 % change on almost (F-measure 82.96 to 73.84, due to 1.4)

          PREC    RECALL   F-MEAS
EXACT     62.77    39.60    48.56
ALMOST    97.88    59.78    73.84


MODIFICATION - 2

• Edited the Feature Template
• Gram detection in the gloss line
  - Regexes for gram detection unreliable
  - Used grams file
  - Retain only 1 regex
• Added feature for presence of end-quote
• Changed # beg. spaces from 6 to 8 to prevent firing on NON_IGT lines

[Figure: example line labeled IGTR]
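The switch from regexes to a grams file can be sketched as a simple set lookup over the morpheme-like pieces of a line. The file format (one gram per line), the function names, and the small example gram set are assumptions for illustration; the contents of the actual grams file are not given in the slides.

```python
import re
from typing import Set

# Hypothetical sample of grams; the real grams file is not shown in the slides.
EXAMPLE_GRAMS: Set[str] = {"1sg", "2sg", "3sg", "nm", "dat", "acc", "nom", "pl"}

def load_grams(path: str) -> Set[str]:
    """Read one gram per line (assumed file format), lowercased."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}

def count_grams(line: str, grams: Set[str]) -> int:
    """Count pieces of the line that appear in the gram set.

    The line is split on whitespace and on the separators commonly
    found in gloss lines (hyphen, period, equals sign).
    """
    pieces = re.split(r"[\s\-.=]+", line.strip().lower())
    return sum(1 for piece in pieces if piece in grams)

def has_gram(line: str, grams: Set[str]) -> bool:
    """Binary feature: does the line contain at least one known gram?"""
    return count_grams(line, grams) > 0

print(count_grams("bring-1sg cook-NM-DAT", EXAMPLE_GRAMS))
```

A lookup like this avoids the false matches a loose regex produces on ordinary hyphenated words, at the cost of missing any gram absent from the file.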


MODIFICATION - 3

• Used more training data
• Approx. half the files from $hw5Dir/more_training_data
  - Training time increased with addition of features


MODIFICATION

• Results of implementing all 3 modifications: 24.51 % change on exact (F-measure 20.14 to 44.65 against the MaxEnt baseline)
  -14.17 % change on almost (F-measure 82.96 to 68.79, due to 1.4)

          PREC    RECALL   F-MEAS
EXACT     53.61    38.26    44.65
ALMOST    81.15    58.51    68.79


BEST SYSTEM

• Modification – 1 (Post-processing)

          PREC    RECALL   F-MEAS
EXACT     62.77    39.60    48.56
ALMOST    97.88    59.78    73.84


MORE TEST DATA

• Results consistent with first test set

          PREC    RECALL   F-MEAS
EXACT     50.29    40.82    45.06
ALMOST    75.15    57.77    63.80


CONCLUDING REMARKS

• What worked
  - Gram detection
  - Quote detection
  - Post-processing (span file)
  - More training data ??
• What did not work
  - Spaces at beg. of line
  - Gram regexes
  - Beam Search
• Difficulties
  - Yes