Ling 570: Day 8
Classification, Mallet
Roadmap
Open questions?
Quick review of classification
Feature templates
Classification Problem Steps
Input processing:
  Split data into training/dev/test
  Convert data into a feature representation (aka Attribute Value Matrix)
Training
Testing
Evaluation
Feature templates
Problem: predict the POS tag distribution of an unknown word
  Input: “unfrobulate”
  Input: “turduckenly”
Features might include:
  Last three characters are “ate”
  Last two characters are “ly”
Feature templates generate features given an input
  Template: Last three characters == XXX
  Plug in XXX to get a binary valued feature
  Templates generate many features

word         w[-3..-1]  w[-2..-1]  w[-3..-1]==ate  w[-3..-1]==nly  w[-2..-1]==te  w[-2..-1]==ly
unfrobulate  ate        te         1               0               1              0
turduckenly  nly        ly         0               1               0              1
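A minimal Python sketch of the suffix template, for illustration only. The function name `suffix_features` and the feature-name strings are assumptions made here (they copy the table's column notation); they are not part of Mallet or of the course code.

```python
def suffix_features(word, lengths=(2, 3)):
    """Instantiate the template 'last n characters == XXX' for each n."""
    features = {}
    for n in lengths:
        if len(word) >= n:
            # Plug the observed suffix into the template to get one binary feature.
            features[f"w[-{n}..-1]=={word[-n:]}"] = 1
    return features


for word in ("unfrobulate", "turduckenly"):
    print(word, suffix_features(word))
```

Only the features that fire are stored; every other instantiation of the template (for example `w[-3..-1]==nly` for “unfrobulate”) is implicitly 0, which is why one template can generate many features cheaply.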
Machine learning
Classifiers
Wide variety
Differ on several dimensions:
  Supervision
  Learning Function
  Input Features
Supervision in Classifiers
Supervised:
  True label/class of each training instance is provided to the learner at training time
  Naïve Bayes, MaxEnt, Decision Trees, Neural nets, etc.
Unsupervised:
  No true labels are provided for examples during training
  Clustering: k-means; Min-cut algorithms
Semi-supervised (bootstrapping):
  True labels are provided for only a subset of examples
  Co-training, semi-supervised SVM/CRF, etc.
Inductive Bias
What form of function is learned?
  Function that separates members of different classes
  Linear separator
  Higher order functions
  Voronoi diagrams, etc.
Graphically, decision boundary
[Figure: + examples and - examples on either side of a decision boundary]
Machine Learning Functions
Problem: Can the representation effectively model the class to be learned?
Motivates selection of learning algorithm
[Figure: scatter plot of + examples and - examples]
For this function:
  Linear discriminant is GREAT!
  Rectangular boundaries (e.g. ID trees) TERRIBLE!
Pick the right representation!
Machine Learning Features
Inputs:
  E.g. words, acoustic measurements, parts-of-speech, syntactic structures, semantic classes, ...
Vectors of features:
  E.g. word: letters
    ‘cat’: L1 = c; L2 = a; L3 = t
  Parts of syntax trees?
Machine Learning Features
Questions: Which features and values should be used? How should they relate to each other?
Issue 1: What values should they take?
  Binary features – don’t do anything!
  Real valued features *may* need to be normalized
    Can force the values to have 0 mean and unit variance
    Compute the mean and variance on the training set for each real valued feature
    Replace original value x with (x - mean) / standard deviation
  Can also bin them or binarize them – often this works better
Issue 2: Which ones are important?
  Feature selection is sometimes important
  Current approach
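A short Python sketch of the normalization step, assuming the population variance (dividing by N) is used; the slide does not say which estimator is intended. The helper names `fit_zscore` and `apply_zscore` are hypothetical.

```python
import math


def fit_zscore(train_values):
    """Compute mean and standard deviation on the training set only."""
    mean = sum(train_values) / len(train_values)
    # Assumption: population variance (divide by N, not N - 1).
    variance = sum((x - mean) ** 2 for x in train_values) / len(train_values)
    return mean, math.sqrt(variance)


def apply_zscore(value, mean, std):
    """Replace a raw value with (value - mean) / std."""
    # A constant feature has std == 0; map it to 0.0 instead of dividing by zero.
    return 0.0 if std == 0 else (value - mean) / std


train = [2.0, 4.0, 6.0, 8.0]
mean, std = fit_zscore(train)
normalized = [apply_zscore(x, mean, std) for x in train]
print(normalized)
```

The same `mean` and `std` from the training set would be reused for dev and test values, so that no statistics leak from the held-out data.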
Machine Learning Toolkits
Many learners, many tools/implementations
Some broad tool sets
  weka
    Java, lots of classifiers, pedagogically oriented
  mallet
    Java, classifiers, sequence learners
    More heavy duty
Mallet: intro and data prep
Mallet
Machine learning toolkit
  Developed at UMass Amherst by Andrew McCallum
Java implementation, open source
Large collection of machine learning algorithms
  Targeted to language processing
  Naïve Bayes, MaxEnt, Decision Trees, Winnow, Boosting
  Also, clustering, topic models, sequence learners
Widely used, but
  Research software: some bugs/gaps; odd documentation
Installation
Installed on patas
  /NLP_TOOLS/tool_sets/mallet/latest/
Directories:
  bin/: script files
  src/: java source code
  class/: java classes
  lib/: jar files
  sample-data/: wikipedia docs for language id, etc.
Environment
Should be set up on patas
$PATH should include
  /NLP_TOOLS/tool_sets/mallet/latest/bin
$CLASSPATH should include
  /NLP_TOOLS/tool_sets/mallet/latest/lib/mallet-deps.jar
  /NLP_TOOLS/tool_sets/mallet/latest/lib/mallet.jar
Check:
  which text2vectors
  /NLP_TOOLS/tool_sets/mallet/latest/bin
Mallet Commands
Mallet command types:
  Data preparation
  Data/model inspection
  Training
  Classification
Command line scripts
  Shell scripts
    Set up java environment
    Invoke java programs
  --help lists command line parameters for scripts
Mallet Data
Mallet data instances:
  Instance_id label f1 v1 f2 v2 …..
Stored in internal binary format: “vectors”
  Binary format used by learners, decoders
Need to convert text files to binary format
Data Preparation
Built-in data importers
  One class per directory, one instance per file
    bin/mallet import-dir --input IF --output OF
    Label is directory name
    (Also text2vectors)
  One instance per line
    bin/mallet import-file --input IF --output OF
    Line: instance label text …..
    (Also csv2vectors)
Create binary representation of text feature counts
Data Preparation
bin/mallet import-svmlight --input IF --output OF
  Allows import of user constructed feature value pairs
  Format:
    label f1:v1 f2:v2 ….. fn:vn
    Features can be strings or indexes
  (Also bin/svmlight2vectors)
If building test data separately from original:
  bin/mallet import-svmlight --input IF --output OF --use-pipe-from previously_built.vectors
  Ensures consistent feature representation
Note: can’t mix svmlight models with others
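A hedged Python sketch of producing lines in the `label f1:v1 f2:v2 ... fn:vn` format from your own feature dictionaries. The helper `to_svmlight_line`, the labels `VB`/`RB`, and the feature names are all made up for illustration. The check that names contain no colon or whitespace is an assumption about what keeps the format unambiguous, not a documented Mallet rule.

```python
def to_svmlight_line(label, features):
    """Format one instance as 'label f1:v1 f2:v2 ... fn:vn'."""
    pairs = []
    for name, value in sorted(features.items()):
        # Assumption: ':' and whitespace would make the line ambiguous to parse.
        if ":" in name or any(ch.isspace() for ch in name):
            raise ValueError(f"unsafe feature name: {name!r}")
        pairs.append(f"{name}:{value}")
    return f"{label} " + " ".join(pairs)


# Hypothetical instances: (label, feature dictionary)
instances = [
    ("VB", {"suf3=ate": 1, "suf2=te": 1}),
    ("RB", {"suf3=nly": 1, "suf2=ly": 1}),
]
lines = [to_svmlight_line(label, feats) for label, feats in instances]
print("\n".join(lines))
```

Writing these lines to a text file gives something that could be passed as the `--input` of the import command; sorting the features just makes the output reproducible.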
Accessing Binary Formats
vectors2info --input IF
  --print-labels TRUE
    Prints list of category labels in data set
  --print-matrix sic
    Prints all features and values by string and number
    Returns original text feature-value list
    Possibly out of order
vectors2vectors --input IF --training-file TNF --testing-file TTF --training-portion pct
  Creates random training/test splits in some ratio
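To make the idea of a random split by training portion concrete, here is a small Python sketch. It only illustrates the concept; it is not how Mallet implements the split, and `random_split` and its `seed` parameter are hypothetical.

```python
import random


def random_split(instances, training_portion, seed=0):
    """Shuffle a copy of the data and cut it at the requested portion."""
    shuffled = list(instances)
    # A fixed seed makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(training_portion * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


train, test = random_split(range(10), 0.9)
print(len(train), len(test))
```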
Building & Accessing Models
bin/mallet train-classifier --input vector_data_file --trainer classifiertype --training-portion 0.9 --output-classifier OF
  Builds classifier model
  Can also store model, produce scores, confusion matrix, etc.
  --trainer: MaxEnt, DecisionTree, NaiveBayes, etc.
  --report: train:accuracy, test:f1:en
Can also use pre-split training & testing files
  e.g. output of vectors2vectors
  --training-file, --testing-file
Confusion Matrix, row=true, column=predicted  accuracy=1.0
label   0   1  |total
  0 de  1   .  |1
  1 en  .   1  |1
Summary. train accuracy mean = 1.0 stddev = 0 stderr = 0
Summary. test accuracy mean = 1.0 stddev = 0 stderr = 0
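For reference, a Python sketch of how accuracy and a per-class F1 (as in `test:f1:en`) can be read off a confusion matrix with rows as true labels and columns as predicted labels. This is the standard textbook computation, written out here under that row/column convention; it is not taken from Mallet's source.

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal (true label == predicted label)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def f1_for_class(matrix, k):
    """F1 for class index k; rows are true labels, columns are predictions."""
    true_positives = matrix[k][k]
    predicted_k = sum(row[k] for row in matrix)  # column sum
    actual_k = sum(matrix[k])                    # row sum
    precision = true_positives / predicted_k if predicted_k else 0.0
    recall = true_positives / actual_k if actual_k else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


labels = ["de", "en"]
matrix = [[1, 0],
          [0, 1]]
print(accuracy(matrix), f1_for_class(matrix, labels.index("en")))
```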
Accessing Classifiers
classifier2info --classifier maxent.model
  Prints out contents of model file
FEATURES FOR CLASS en
  <default> -0.036953801963395115
  book 0.004605219133228236
  the 0.24270652500835088
  i 0.004605219133228236
Mallet: testing
Testing
Use new data to test a previously built classifier
  bin/mallet classify-svmlight --input testfile --output outputfile --classifier maxent.model
  Also instance file, directories: classify-file, classify-dir
Prints class,score matrix
  Inst_id  class1 score1 class2 score2
  array:0  en 0.995 de 0.0046
  array:1  en 0.970 de 0.0294
  array:2  en 0.064 de 0.935
  array:3  en 0.094 de 0.905
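A small Python sketch for turning class,score lines like those above into a single predicted label per instance. It assumes each line is whitespace-separated as `inst_id class1 score1 class2 score2 ...`, which matches the slide; check the layout of your own output file before relying on it. `parse_prediction` is a hypothetical helper.

```python
def parse_prediction(line):
    """Return (instance id, best class, all scores) for one output line."""
    fields = line.split()
    inst_id, rest = fields[0], fields[1:]
    # Remaining fields alternate: class name, then its score.
    scores = {rest[i]: float(rest[i + 1]) for i in range(0, len(rest), 2)}
    best = max(scores, key=scores.get)
    return inst_id, best, scores


output = """array:0 en 0.995 de 0.0046
array:2 en 0.064 de 0.935"""
for line in output.splitlines():
    inst_id, best, _ = parse_prediction(line)
    print(inst_id, best)
```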
General Use
bin/mallet import-svmlight --input svmltrain.vectors.txt --output svmltrain.vectors
  Builds binary representation from feature:value pairs
bin/mallet train-classifier --input svmltrain.vectors --trainer MaxEnt --output-classifier svml.model
  Trains MaxEnt classifier and stores model
bin/mallet classify-svmlight --input svmltest.vectors.txt --output - --classifier svml.model
  Tests on the new data
Other Information
Website:
  Download and documentation (such as it is)
  http://mallet.cs.umass.edu
API tutorial:
  http://mallet.cs.umass.edu/mallet-tutorial.pdf
Text Categorization
Task:
  Given a document, assign it to one of a finite set of classes
  What are the classes?
  What are the features?
Text 1
Several hundred protesters, some wearing goggles and gas masks, marched past authorities in a downtown street Sunday, hours after riot police forced Occupy Portland demonstrators out of a pair of weeks-old encampments in nearby parks.
Police moved in shortly before noon and drove protesters into the street after dozens remained in the camp in defiance of city officials. Mayor Sam Adams had ordered that the camp shut down Saturday at midnight, citing unhealthy conditions and the encampment’s attraction of drug users and thieves.
Anti-Wall Street protesters and their supporters flooded a city park area in Portland early Sunday in defiance of an eviction order, and authorities elsewhere stepped up pressure against the demonstrators, arresting nearly two dozen. (Nov. 13)
More than 50 protesters were arrested in the police action, but officers did not use tear gas, rubber bullets or other so-called non-lethal weapons, police said.
Washington Post, online 11/13/2011
Text 2
George Washington coach Mike Lonergan looked at the stat sheet, tried to muster a smile then clicked off the reasons why the Colonials lost to No. 24 California on Sunday night.
A piercing 21-0 run by the Golden Bears at the end of the first half was at the top of the list.
Not even a second straight 20-point effort from Tony Taylor was enough to dig George Washington out of the early hole, and the Colonials spent the rest of the night in a futile game of catch-up.
“I’ve never really been involved with a run quite like that,” Lonergan said after Cal’s 81-54 win over George Washington. “I tried calling a couple timeouts. It was very disappointing that we just never really got our composure back the rest of that half. To end it that way and not even score any points, that was basically the game right there.”
Washington Post, online 11/13/2011
![Page 66: Ling 570: Day 8 Classification, Mallet 1. Roadmap Open questions? Quick review of classification Feature templates 2](https://reader035.vdocument.in/reader035/viewer/2022062421/56649ced5503460f949b9d6f/html5/thumbnails/66.jpg)
66
Text 3
‘Jersey Boys’ at the National Theatre
By Jane Horwitz, Sunday, November 13, 5:29 PM
“Jersey Boys” is irresistible, and the touring company now at the National Theatre gets it almost entirely right. This Broadway hit (it has been running since fall 2005 and has played Washington before as well) rises well above the so-called jukebox show genre. Subtitled “The Story of Frankie Valli & the Four Seasons,” the musical tells a tale that transcends show business gossip to become a close character study of four talented but very different blue-collar guys from New Jersey — who just happen to have sung some of the best close-harmony rock/pop tunes of the late 1950s, the 1960s and into the 1970s.
Washington Post, online 11/13/2011
![Page 67: Ling 570: Day 8 Classification, Mallet 1. Roadmap Open questions? Quick review of classification Feature templates 2](https://reader035.vdocument.in/reader035/viewer/2022062421/56649ced5503460f949b9d6f/html5/thumbnails/67.jpg)
67
What categories?
What features?
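One possible answer, sketched in Python: treat each text as a bag of lowercased word counts and let the category be a topic label. The labels `news`, `sports`, and `arts` are assumptions made for illustration (the slide leaves the categories open), and the helper name `bag_of_words` and its tokenization regex are hypothetical choices, not part of the course materials.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Map a document to a {word: count} feature dictionary."""
    # Lowercase, then keep alphabetic runs only; digits and punctuation are dropped.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

# Short snippets taken from Texts 1-3; the labels are illustrative guesses.
docs = [
    ("news", "riot police forced Occupy Portland demonstrators out of a pair of weeks-old encampments"),
    ("sports", "the Colonials lost to No. 24 California on Sunday night"),
    ("arts", "the touring company now at the National Theatre gets it almost entirely right"),
]

for label, text in docs:
    features = bag_of_words(text)
    # Show a few (word, count) pairs per document.
    print(label, sorted(features.items())[:5])
```

Words such as "police", "Colonials", and "Theatre" are the kind of features that could separate these three texts; other choices (bigrams, capitalization, source section) would be equally reasonable.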
![Page 73: Ling 570: Day 8 Classification, Mallet 1. Roadmap Open questions? Quick review of classification Feature templates 2](https://reader035.vdocument.in/reader035/viewer/2022062421/56649ced5503460f949b9d6f/html5/thumbnails/73.jpg)
73
Example: Coreference
Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. Logue, a renowned speech therapist, was summoned to help the King overcome his speech impediment...
Can be viewed as a classification problem
What are the inputs?
What are the categories?
What features would be useful?
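One common way to answer these questions is a mention-pair formulation: the input is a pair of mentions, the category is coreferent or not, and the features describe how the two mentions relate. The sketch below assumes that formulation; the mention list is hand-picked from the example, and `pair_features`, `PRONOUNS`, and the feature names are hypothetical, not taken from the slides.

```python
from itertools import combinations

# Mentions in document order as (text, sentence_index); hand-picked for illustration.
mentions = [
    ("Queen Elizabeth", 0),
    ("her", 0),
    ("her husband", 0),
    ("King George VI", 0),
    ("Logue", 1),
    ("a renowned speech therapist", 1),
    ("the King", 1),
    ("his", 1),
]

PRONOUNS = {"he", "she", "his", "her", "him", "it", "its", "they", "their"}
DETERMINERS = {"the", "a", "an"}

def pair_features(antecedent, anaphor):
    """Features for one candidate (earlier mention, later mention) pair."""
    a_text, a_sent = antecedent
    b_text, b_sent = anaphor
    a_words = {w.lower() for w in a_text.split()}
    b_words = {w.lower() for w in b_text.split()}
    # Shared words, ignoring determiners so "the" alone does not count as a match.
    shared = (a_words & b_words) - DETERMINERS
    return {
        "exact_match": int(a_text.lower() == b_text.lower()),
        "shared_word": int(bool(shared)),
        "anaphor_is_pronoun": int(b_text.lower() in PRONOUNS),
        "sentence_distance": b_sent - a_sent,
    }

# Each pair is one classification instance; the label would be coreferent / not.
pairs = list(combinations(mentions, 2))
for antecedent, anaphor in pairs:
    print(antecedent[0], "|", anaphor[0], "|", pair_features(antecedent, anaphor))
```

For the pair ("King George VI", "the King"), the shared word "king" fires `shared_word`, which is the kind of evidence a classifier could use to link the two mentions. Features such as gender and number agreement would also be natural additions.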
![Page 74: Ling 570: Day 8 Classification, Mallet 1. Roadmap Open questions? Quick review of classification Feature templates 2](https://reader035.vdocument.in/reader035/viewer/2022062421/56649ced5503460f949b9d6f/html5/thumbnails/74.jpg)
74
Example: NER
Named Entity tagging:
John visited New York last Friday
[person John] visited [location New York] [time last Friday]
As a classification problem:
John/PER-B visited/O New/LOC-B York/LOC-I last/TIME-B Friday/TIME-I
Input? Features? Classes?
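A minimal Python sketch of this reduction, under the assumption that the input is one token at a time, the classes are the per-token tags shown above, and the features look at the token and its neighbors. The span offsets, the helper names `spans_to_tags` and `token_features`, and the feature names are hypothetical.

```python
def spans_to_tags(tokens, spans):
    """Turn (start, end_exclusive, type) entity spans into one tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"{etype}-B"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"{etype}-I"          # remaining tokens inside the entity
    return tags

def token_features(tokens, i):
    """A few illustrative features for classifying token i."""
    word = tokens[i]
    return {
        "word": word.lower(),
        "is_capitalized": int(word[0].isupper()),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }

tokens = "John visited New York last Friday".split()
spans = [(0, 1, "PER"), (2, 4, "LOC"), (4, 6, "TIME")]
tags = spans_to_tags(tokens, spans)

print(" ".join(f"{w}/{t}" for w, t in zip(tokens, tags)))
# prints: John/PER-B visited/O New/LOC-B York/LOC-I last/TIME-B Friday/TIME-I

for i in range(len(tokens)):
    print(tags[i], token_features(tokens, i))
```

Each token then becomes one training instance: its feature dictionary is the input and its tag is the class label.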