Part-Of-Speech Tagging using Neural Networks
Part-Of-Speech Tagging
using Neural Networks

Ankur Parikh, LTRC
IIIT Hyderabad
[email protected]
Outline

1. Introduction
2. Background and Motivation
3. Experimental Setup
4. Preprocessing
5. Representation
6. Single-neuro tagger
7. Experiments
8. Multi-neuro tagger
9. Results
10. Discussion
11. Future Work
Introduction

POS-Tagging:
The process of assigning a part-of-speech tag to each word of natural-language text, based on both the word's definition and its context.

Uses: parsing of sentences, machine translation, information retrieval, word sense disambiguation, speech synthesis, etc.

Methods:
1. Statistical approach
2. Rule-based approach
Background: Previous Approaches

Much work has been done for Hindi using machine-learning tools such as TNT and CRF.
Trade-off: performance versus training time.
- Lower precision affects later stages of the pipeline.
- For a new domain or a new corpus, parameter tuning is a non-trivial task.
Background: Previous Approaches & Motivation

- Empirically chosen context.
- Effective handling of corpus-based features.
- Need of the hour:
  - Good performance
  - Less training time
  - Multiple contexts
  - Effective exploitation of corpus-based features
- Two approaches and their comparison with TNT and CRF.
- Word-level tagging.
Experimental Setup: Corpus Statistics

Tag set of 25 tags.

| Corpus | Size (in words) | Unseen words (%) |
| --- | --- | --- |
| Training | 187,095 | - |
| Development | 23,565 | 5.33% |
| Testing | 23,281 | 8.15% |
Experimental Setup: Tools and Resources

Tools:
- CRF++
- TNT
- Morfessor Categories-MAP

Resources:
- Universal Word - Hindi Dictionary
- Hindi WordNet
- Morph Analyzer
Preprocessing

- The XC tag is removed (Gadde et al., 2008).
- Lexicon:
  - For each unique word w of the training corpus => ENTRY(t1, …, t24)
  - where tj = c(posj, w) / c(w)
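The lexicon construction above can be sketched as follows. This is a minimal illustration, assuming the tagged corpus is available as (word, tag) pairs; the example words and tags are hypothetical:

```python
from collections import Counter, defaultdict

def build_lexicon(tagged_corpus, tagset):
    """For each unique word w, build ENTRY(t1, ..., tn) with tj = c(posj, w) / c(w)."""
    word_counts = Counter()                 # c(w)
    pair_counts = defaultdict(Counter)      # c(posj, w)
    for word, tag in tagged_corpus:
        word_counts[word] += 1
        pair_counts[word][tag] += 1
    return {word: [pair_counts[word][t] / c_w for t in tagset]
            for word, c_w in word_counts.items()}

# hypothetical toy corpus
corpus = [("ghar", "NN"), ("gaya", "VM"), ("ghar", "NN"), ("ghar", "PSP")]
tagset = ["NN", "VM", "PSP"]
lex = build_lexicon(corpus, tagset)
# lex["ghar"] → [2/3, 0.0, 1/3]
```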
Representation: Encoding & Decoding

- Each word w is encoded as an n-element vector INPUT(t1, t2, …, tn), where n = size of the tag set.
- INPUT(t1, t2, …, tn) comes from the lexicon if the training corpus contains w.
- If w is not in the training corpus:
  - N(w) = number of possible POS tags for w
  - tj = 1/N(w) if posj is a candidate
       = 0 otherwise
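A sketch of this encoding, assuming a `lexicon` dict as described above and a `candidate_tags` map (e.g. derived from a morph analyzer) giving the possible tags of unseen words; both names are illustrative:

```python
def encode_word(word, lexicon, candidate_tags, tagset):
    """Return INPUT(t1, ..., tn) for word w."""
    if word in lexicon:                            # seen in training: lexicon entry
        return lexicon[word]
    cands = candidate_tags.get(word, set(tagset))  # fallback: all tags possible (assumption)
    n_w = len(cands)                               # N(w)
    return [1.0 / n_w if t in cands else 0.0 for t in tagset]

tagset = ["NN", "VM", "PSP"]
vec = encode_word("naya", {}, {"naya": {"NN", "VM"}}, tagset)
# vec → [0.5, 0.5, 0.0]
```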
Representation: Encoding & Decoding

- For each word w, the desired output is encoded as D = (d1, d2, …, dn).
  - dj = 1 if posj is the desired output
       = 0 otherwise
- In testing, for each word w, an n-element vector OUTPUT(o1, …, on) is returned.
  - Result = posj, if oj = max(OUTPUT)
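Both directions can be sketched directly; a minimal illustration with a hypothetical three-tag set:

```python
def desired_output(tag, tagset):
    """D = (d1, ..., dn): dj = 1 for the desired tag, 0 otherwise."""
    return [1.0 if t == tag else 0.0 for t in tagset]

def decode(output, tagset):
    """Result = posj where oj = max(OUTPUT)."""
    j = max(range(len(output)), key=lambda i: output[i])
    return tagset[j]

tagset = ["NN", "VM", "PSP"]
d = desired_output("VM", tagset)   # → [0.0, 1.0, 0.0]
tag = decode([0.1, 0.7, 0.2], tagset)  # → "VM"
```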
Single-neuro tagger: Structure
Single-neuro tagger: Training & Tagging

- Error back-propagation learning algorithm
- Weights are initialized with random values
- Sequential mode
- Momentum term
- Eta = 0.4 and Alpha = 0.1
- In tagging, it can give multiple outputs or a sorted list of all tags.
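A minimal sketch of one sequential-mode weight update with a momentum term, using the slide's Eta (learning rate) and Alpha (momentum); the single weight matrix here is illustrative, not the authors' exact network:

```python
import random

def init_weights(n_in, n_out, scale=0.1):
    """Random initialization, as on the slide (range is an assumption)."""
    return [[random.uniform(-scale, scale) for _ in range(n_in)]
            for _ in range(n_out)]

def momentum_step(w, grad, velocity, eta=0.4, alpha=0.1):
    """One sequential-mode update: v <- alpha*v - eta*grad; w <- w + v."""
    for i in range(len(w)):
        for j in range(len(w[i])):
            velocity[i][j] = alpha * velocity[i][j] - eta * grad[i][j]
            w[i][j] += velocity[i][j]

w, v = [[1.0]], [[0.0]]
momentum_step(w, [[0.5]], v)
# w[0][0] → 0.8  (1.0 - 0.4 * 0.5, since the velocity starts at zero)
```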
Experiments: Development Data

| Features | Precision |
| --- | --- |
| Corpus-based and contextual | 93.19% |
| Root of the word | 93.38% |
| Length of the word | 94.04% |
| Handling of unseen words: Root -> Dictionary -> WordNet -> Morfessor, with tj = (c(posj, s) + c(posj, p)) / (c(s) + c(p)) | 95.62% |
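The unseen-word formula in the last row pools tag counts over two units; a sketch, assuming s and p denote two segments of the word (e.g. a suffix and a prefix from the Morfessor segmentation, which is my reading, not stated on the slide), with a uniform fallback added here when both are unseen:

```python
def unseen_entry(s_counts, p_counts, tagset):
    """tj = (c(posj, s) + c(posj, p)) / (c(s) + c(p)) for an unseen word."""
    c_s = sum(s_counts.values())   # c(s): total occurrences of segment s
    c_p = sum(p_counts.values())   # c(p): total occurrences of segment p
    denom = c_s + c_p
    if denom == 0:
        # uniform fallback when neither segment was seen (assumption)
        return [1.0 / len(tagset)] * len(tagset)
    return [(s_counts.get(t, 0) + p_counts.get(t, 0)) / denom for t in tagset]

entry = unseen_entry({"NN": 2}, {"NN": 1, "VM": 1}, ["NN", "VM"])
# entry → [0.75, 0.25]
```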
Development of the system

Multi-neuro tagger: Structure

Multi-neuro tagger: Training

Multi-neuro tagger: Learning curves
Multi-neuro tagger: Results

| Structure | Context | Development | Test |
| --- | --- | --- | --- |
| 97-48-24 | 3 | 95.44% | 91.87% |
| 121-48-24 | 4_prev | 95.64% | 92.05% |
| 121-48-24 | 4_next | 95.66% | 91.95% |
| 145-72-24 | 5 | 95.55% | 92.15% |
| 169-72-24 | 6_prev | 95.56% | 92.14% |
| 169-72-24 | 6_next | 95.54% | 92.14% |
| 193-96-24 | 7 | 95.46% | 92.07% |
Multi-neuro tagger: Comparison

Precision after voting: 92.19%

| Tagger | Development | Test | Training Time |
| --- | --- | --- | --- |
| TNT | 95.18% | 91.58% | 1-2 seconds |
| Multi-neuro tagger | 95.78% | 92.19% | 13-14 minutes |
| CRF | 96.05% | 92.92% | 2-2.5 hours |
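The slides report precision after voting without specifying the scheme; a simple majority vote over the component taggers' outputs, with ties going to the first tagger, might look like:

```python
from collections import Counter

def majority_vote(per_tagger_tags):
    """Combine several taggers' outputs word by word by simple majority."""
    voted = []
    for tags in zip(*per_tagger_tags):   # tags for one word, one per tagger
        counts = Counter(tags)
        top = max(counts.values())
        # ties resolved in favour of the earliest tagger's choice
        voted.append(next(t for t in tags if counts[t] == top))
    return voted

# three hypothetical taggers, two words
result = majority_vote([["NN", "VM"], ["NN", "NN"], ["JJ", "NN"]])
# result → ["NN", "NN"]
```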
Conclusion

- Single versus multi-neuro tagger
- Multi-neuro tagger versus TNT and CRF
- Corpus- and dictionary-based features
- More parameters need to be tuned
- 24^5 = 7,962,624 n-grams, versus only 250,560 weights
- Well suited for Indian languages
Future Work

- Better voting schemes (confidence-point based)
- Finding the right context (probability based)
- Various structures and algorithms:
  - Sequential neural network
  - Convolutional neural network
  - Combination with SVM
Thank You!!
Queries???