IGT DETECTION
Sabrina Burleigh
Ankit Srivastava
LING 572 Presentation | March 8, 2007
A Machine Learning Approach to Sequence Labeling Task
March 8, 2007 Ling 572 - Sabrina and Ankit
OVERVIEW
• Label Set
• Feature Templates
• Beam Search
• System Modification
• Final System
• Conclusion
[Figure: example document lines labeled IGT or NOT]
3 LABELS
• First Line of IGT: IGTB
• Remaining Lines of IGT: IGTR
• Lines not IGT: NON_IGT
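The three-label scheme resembles begin/inside/outside sequence tagging. A minimal sketch of how annotated IGT spans could be turned into per-line labels, assuming spans are given as inclusive (start, end) line indices; the function name and span format are illustrative assumptions, not taken from the slides:

```python
def spans_to_labels(num_lines, spans):
    """Assign IGTB / IGTR / NON_IGT to each line of a document.

    spans: list of (start, end) line indices, inclusive (assumed format).
    """
    labels = ["NON_IGT"] * num_lines
    for start, end in spans:
        labels[start] = "IGTB"               # first line of the IGT instance
        for i in range(start + 1, end + 1):
            labels[i] = "IGTR"               # remaining lines of the instance
    return labels


print(spans_to_labels(6, [(1, 3)]))
```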
BINARY FEATURES
• Presence of number or letter at beg. of line (e.g. “(1)” or “a.”)
• Presence of quote at beg. of line
• Presence of >= 6 spaces at beg. of line
• 2 features testing for gloss line (e.g. “bring-1sg”) (e.g. “cook-NM-DAT”)
[Figure: example IGT instance with lines labeled IGTB, IGTR, IGTR, IGTR]
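The binary features listed above could be computed per line roughly as follows. This is a sketch only: the slides do not give the actual patterns, so every regular expression, the quote-character set, and the feature names here are hypothetical.

```python
import re

# Hypothetical patterns; the slides do not show the real expressions.
NUM_OR_LETTER = re.compile(r"^\s*(\(\d+\)|[a-z]\.)")   # "(1)" or "a."
GLOSS_PERSON = re.compile(r"\w+-[123](sg|pl)\b")       # e.g. bring-1sg
GLOSS_UPPER = re.compile(r"\w+(-[A-Z]{2,})+\b")        # e.g. cook-NM-DAT
QUOTE_CHARS = ('"', "'", "`", "\u201c", "\u2018")      # assumed quote set


def line_features(line, min_spaces=6):
    """Return the binary features for one line of text."""
    leading = len(line) - len(line.lstrip(" "))
    return {
        "starts_with_num_or_letter": bool(NUM_OR_LETTER.search(line)),
        "starts_with_quote": line.lstrip().startswith(QUOTE_CHARS),
        "leading_spaces": leading >= min_spaces,
        "gloss_person_number": bool(GLOSS_PERSON.search(line)),
        "gloss_upper": bool(GLOSS_UPPER.search(line)),
    }


print(line_features("(1)  cook-NM-DAT bring-1sg"))
```

Keeping the space threshold as a parameter makes it easy to try other values without touching the rest of the template.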
BASELINE
NB       PREC    RECALL  F-MEAS
EXACT    5.24    5.82    5.51
ALMOST   91.13   83.45   87.12

MAXENT   PREC    RECALL  F-MEAS
EXACT    16.19   26.62   20.14
ALMOST   74.69   93.29   82.96
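The F-measure column appears to be the balanced harmonic mean of precision and recall (F1). A short check under that assumption, using the NB rows of the baseline table:

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f_measure(91.13, 83.45), 2))  # NB, ALMOST row: 87.12
print(round(f_measure(5.24, 5.82), 2))    # NB, EXACT row: 5.51
```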
BEAM SEARCH
• No Pruning
• Using topN: 0 (default), 3, 5, 10
• All other parameters set to default

MaxEnt   PREC    RECALL  F-MEAS
EXACT    0.57    3.80    0.99
ALMOST   21.90   100     35.93
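For readers unfamiliar with the technique, here is a generic beam search over label sequences. It is not the course tool used for the experiments: the `score_fn` interface, the `"BOS"` start marker, and the reading of `top_n=0` as "keep every hypothesis" are all assumptions made for illustration.

```python
import math


def beam_search(lines, score_fn, top_n=0):
    """Label lines left to right, keeping the top_n best partial sequences.

    score_fn(line, prev_label) -> {label: probability} (hypothetical interface).
    top_n=0 is treated here as "no pruning" (an assumption about the default).
    """
    beam = [(0.0, [])]  # (log-probability, label sequence so far)
    for line in lines:
        candidates = []
        for logp, seq in beam:
            prev = seq[-1] if seq else "BOS"
            for label, p in score_fn(line, prev).items():
                if p > 0:
                    candidates.append((logp + math.log(p), seq + [label]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:top_n] if top_n > 0 else candidates
    return beam[0][1] if beam else []


def toy_scores(line, prev_label):
    """Toy stand-in for a trained classifier's label distribution."""
    if line.startswith("("):
        return {"IGTB": 0.8, "IGTR": 0.1, "NON_IGT": 0.1}
    if prev_label in ("IGTB", "IGTR") and line.startswith(" "):
        return {"IGTB": 0.1, "IGTR": 0.7, "NON_IGT": 0.2}
    return {"IGTB": 0.1, "IGTR": 0.1, "NON_IGT": 0.8}


doc = ["Some prose.", "(1) ich koche", "    I cook", "More prose."]
print(beam_search(doc, toy_scores, top_n=3))
```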
MODIFICATION – 1.1
• Better mapping from classification results to IGT spans
[Figure: example lines labeled NON_IGT, IGTR, IGTR, IGTB]
MODIFICATION – 1.2
• Better mapping from classification results to IGT spans
[Figure: example lines labeled NON_IGT, IGTB, IGTR, IGTR]
MODIFICATION – 1.3
• Was not successful in span improvement
• Led us to implement gram detection
[Figure: example lines labeled IGTB, NON_IGT, IGTR, IGTR]
MODIFICATION – 1.4
• Don’t count single IGT line
[Figure: example lines labeled IGTB, NON_IGT, NON_IGT, NON_IGT]
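A sketch of post-processing that maps per-line labels to IGT spans and drops single-line spans as in 1.4. The slides do not spell out the mapping rules of 1.1 to 1.3, so the rules in the docstring are assumptions chosen for illustration, and `min_len` is a hypothetical parameter.

```python
def labels_to_spans(labels, min_len=2):
    """Convert per-line labels to inclusive (start, end) spans.

    Assumed rules (illustrative, not taken from the slides):
    - IGTB always opens a new span, closing any open one;
    - IGTR extends the open span, or opens one if none is open;
    - NON_IGT closes the open span;
    - spans shorter than min_len lines are dropped (cf. 1.4).
    """
    spans, start = [], None
    for i, label in enumerate(labels):
        if label == "IGTB":
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif label == "IGTR":
            if start is None:
                start = i
        else:  # NON_IGT
            if start is not None:
                spans.append((start, i - 1))
                start = None
    if start is not None:
        spans.append((start, len(labels) - 1))
    return [(s, e) for s, e in spans if e - s + 1 >= min_len]


print(labels_to_spans(["NON_IGT", "IGTB", "IGTR", "IGTR"]))
print(labels_to_spans(["IGTB", "NON_IGT", "NON_IGT", "NON_IGT"]))
```

Dropping short spans trades recall for precision, since a genuine one-line instance is discarded along with the spurious ones.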
MODIFICATION – 1
• Results (change in F-measure relative to the MaxEnt baseline):
  - 28.42 % change on exact
  - -9.12 % change on almost (due to 1.4)

         PREC    RECALL  F-MEAS
EXACT    62.77   39.60   48.56
ALMOST   97.88   59.78   73.84
MODIFICATION - 2
• Edited the Feature Template
• Gram Detection in the gloss line
  - Regexes for gram detection unreliable
  - Used grams file
  - Retain only 1 regex
• Added feature for presence of end-quote
• Changed # beg. spaces from 6 to 8 to prevent firing on NON_IGT lines
[Figure: example line labeled IGTR]
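Gram detection by lookup in a grams file might look like the sketch below. The file format (one gram per line), the case-insensitive matching, and the choice to split tokens on hyphens and periods are assumptions; the slides only say that a grams file replaced most of the regexes.

```python
import re


def load_grams(path):
    """Read one gram per line (assumed file format), e.g. 1sg, NM, DAT."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def has_gram(line, grams):
    """True if a non-initial hyphen- or period-separated piece is a known gram."""
    for token in line.split():
        pieces = re.split(r"[-.]", token)
        # Skip the first piece, which is taken to be the stem.
        if any(p.lower() in grams for p in pieces[1:]):
            return True
    return False


grams = {"1sg", "nm", "dat"}  # stand-in for load_grams("grams.txt")
print(has_gram("cook-NM-DAT", grams))
print(has_gram("I cook for him.", grams))
```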
MODIFICATION - 3
• Used more training data
• Approx. half the files from $hw5Dir/more_training_data
  - Training time increased with addition of features
MODIFICATION
• Results of implementing all 3 modifications (change in F-measure relative to the MaxEnt baseline):
  - 24.51 % change on exact
  - -14.17 % change on almost (due to 1.4)

         PREC    RECALL  F-MEAS
EXACT    53.61   38.26   44.65
ALMOST   81.15   58.51   68.79
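The reported changes match the absolute difference in F-measure between each modified system and the MaxEnt baseline. A quick check using the figures from the slides (the dictionary layout and function name are illustrative):

```python
baseline = {"exact": 20.14, "almost": 82.96}  # MaxEnt baseline F-measure
systems = {
    "Modification 1": {"exact": 48.56, "almost": 73.84},
    "All 3 modifications": {"exact": 44.65, "almost": 68.79},
}


def f_change(system, base):
    """Absolute change in F-measure against the baseline, per match type."""
    return {k: round(system[k] - base[k], 2) for k in base}


for name, scores in systems.items():
    print(name, f_change(scores, baseline))
```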
BEST SYSTEM
• Modification – 1 (Post-processing)

         PREC    RECALL  F-MEAS
EXACT    62.77   39.60   48.56
ALMOST   97.88   59.78   73.84
MORE TEST DATA
• Results consistent with first test set

         PREC    RECALL  F-MEAS
EXACT    50.29   40.82   45.06
ALMOST   75.15   57.77   63.80
CONCLUDING REMARKS
• What worked
  - Gram detection
  - Quote detection
  - Post-processing (span file)
  - More training data ??
• What did not work
  - Spaces at beg. of line
  - Gram regexes
  - Beam Search
• Difficulties
  - Yes