8/12/2019 Hands on Classification
Hands on Classification with Learning Based Java
Gourab Kundu
Adapted from a talk by Vivek Srikumar
-
Goals of this tutorial
At the end of these lectures, you will be able to:
1. Get started with Learning Based Java
2. Use a generic, black box text classifier for different applications, and write your own text classifier if needed
3. Understand how features can impact classifier performance, and add features to improve your application
4. Build a badge classifier based on character features
-
A Quick Recap
Given: Examples (x, f(x)) of some unknown function f
Find: A good approximation of f
x provides some representation of the input
  The process of mapping a domain element into a representation is called Feature Extraction. (Hard; ill-understood; important)
  $x \in \{0,1\}^n$ or $x \in \mathbb{R}^n$
The target function (label)
  $f(x) \in \{-1,+1\}$: Binary Classification
  $f(x) \in \{1,2,3,\ldots,k-1\}$: Multi-class classification
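As a rough illustration of feature extraction into a binary vector, the sketch below maps a text to one 0/1 entry per vocabulary word. The class name `BinaryBagOfWords`, the tiny vocabulary, and the whitespace tokenization are assumptions made for this example; they are not part of LBJ or of the original slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: maps a document to a binary vector x in {0,1}^n. */
class BinaryBagOfWords {
    // Index i of the feature vector corresponds to vocabulary.get(i).
    private final List<String> vocabulary;

    BinaryBagOfWords(List<String> vocabulary) {
        this.vocabulary = new ArrayList<>(vocabulary);
    }

    /** x[i] = 1 if the i-th vocabulary word occurs in the text, else 0. */
    int[] extract(String text) {
        // Assumption: lowercase and split on whitespace; real tokenizers do more.
        List<String> tokens = Arrays.asList(text.toLowerCase().split("\\s+"));
        int[] x = new int[vocabulary.size()];
        for (int i = 0; i < x.length; i++) {
            x[i] = tokens.contains(vocabulary.get(i)) ? 1 : 0;
        }
        return x;
    }

    public static void main(String[] args) {
        BinaryBagOfWords extractor =
            new BinaryBagOfWords(Arrays.asList("save", "software", "meeting"));
        // Prints [1, 1, 0]
        System.out.println(Arrays.toString(
            extractor.extract("Save over 70 % on name brand software")));
    }
}
```

Here n is the vocabulary size (3 in the toy example). A real-valued representation would store counts or weights in place of the 0/1 indicators.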
-
What is text classification?
A document -> A classifier (black box) -> Some labels
-
Several applications fit this framework
Spam detection
Sentiment classification
What else could you do if you had such a black box system that can classify text?
Try to spend 30 seconds brainstorming
-
Outline of this session
Getting started with LBJ
Writing our first classifier: Spam/Ham
Playing with features
Looking inside the black box classifier for feature weights
-
LEARNING BASED JAVA
Writing classifiers
-
What is Learning Based Java?
A modeling language for learning and inference
Supports
  Programming using learned models
  High level specification of features and constraints between classifiers
  Inference with constraints
  Different learning algorithms
The learning operator
  Classifiers are functions defined in terms of data
  Learning happens at compile time
-
What does LBJ do for you?
Abstracts away the feature representation, learning and inference
Allows you to write learning based programs
Application developers can reason about the application at hand
-
Demo
A learning based program
First, we will write an application that assumes the existence of
a black box classifier
-
SPAM DETECTION
-
Spam detection
Which of these (if any) are email spam?
Subject: save over 70 % on name brand software
ppharmacy devote fink tungstate brown lexicon pawnshop crescent railroad distaff cytosine barium cain application elegy donnelly hydrochloride common embargo shakespearean bassett trustee nucleolus chicano narbonne telltale tagging swirly lank delphinus bragging bravery cornea asiatic susanne
Subject: please keep in touch
just like to say that it has been great meeting and working with you all . i will be leaving enron effective july 5 th to do investment banking in hong kong . i will initially be based in new york and will be moving to hong kong after a few months . do contact me when you are in the vicinity .
How do you know?
-
What do we need to build a classifier?
1. Annotated documents*
2. A feature representation of the documents
3. A learning algorithm
* Here we are dealing with supervised learning
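To make the three ingredients concrete, here is a minimal sketch that puts them together in plain Java: a handful of annotated examples, a binary feature representation, and a learning algorithm. The perceptron is chosen here only because it is short; the slides do not say which algorithm the demo uses, and this is not LBJ's implementation. The class name `PerceptronSketch` and the toy data are invented for illustration.

```java
/** Illustrative perceptron for binary labels in {-1, +1}; not LBJ's implementation. */
class PerceptronSketch {
    final double[] w;   // one weight per feature
    double bias = 0.0;

    PerceptronSketch(int numFeatures) {
        w = new double[numFeatures];
    }

    /** Predicts +1 if the weighted sum of active features is positive, else -1. */
    int predict(int[] x) {
        double score = bias;
        for (int i = 0; i < w.length; i++) {
            score += w[i] * x[i];
        }
        return score > 0 ? +1 : -1;
    }

    /** Mistake-driven update: on an error, move the weights toward the correct label. */
    void train(int[][] xs, int[] ys, int epochs) {
        for (int e = 0; e < epochs; e++) {
            for (int n = 0; n < xs.length; n++) {
                if (predict(xs[n]) != ys[n]) {
                    for (int i = 0; i < w.length; i++) {
                        w[i] += ys[n] * xs[n][i];
                    }
                    bias += ys[n];
                }
            }
        }
    }

    public static void main(String[] args) {
        // Toy annotated data (invented). Features, in order:
        // contains "save", contains "software", contains "meeting".
        int[][] xs = { {1, 1, 0}, {0, 0, 1}, {1, 0, 0}, {0, 1, 1} };
        int[] ys = { +1, -1, +1, -1 }; // +1 = spam, -1 = ham

        PerceptronSketch p = new PerceptronSketch(3);
        p.train(xs, ys, 10);
        System.out.println(p.predict(new int[] {1, 1, 0})); // prints 1 (spam)
    }
}
```

After training, the array `w` can be read directly, which is one simple way to look inside the classifier: a large positive weight marks a feature that pushes toward spam, a negative weight one that pushes toward ham.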
-
Our first LBJ program
/** A learned text classifier; its definition comes from data. */
discrete TextClassifier(Document d)
-
Demo
Let's build a spam detector
How to train?
How do different learning algorithms perform?
Does this choice matter much?
-
Features
Our current spam detector uses words as features
Can we do better?
Let's try it out
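One common way to go beyond single words is to add pairs of adjacent words (bigrams) as extra features. The sketch below is only an illustration of that idea in plain Java; the class name `NgramFeatures`, the `w:` and `bg:` prefixes, and the whitespace tokenization are assumptions, and the slides do not say which additional features the demo actually tries.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical feature extractor: single words plus adjacent word pairs. */
class NgramFeatures {
    /** Returns unigram ("w:") and bigram ("bg:") features in document order. */
    static List<String> extract(String text) {
        List<String> features = new ArrayList<>();
        String trimmed = text.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return features; // no tokens, no features
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            features.add("w:" + tokens[i]);
            if (i + 1 < tokens.length) {
                features.add("bg:" + tokens[i] + "_" + tokens[i + 1]);
            }
        }
        return features;
    }

    public static void main(String[] args) {
        // Prints [w:please, bg:please_keep, w:keep, bg:keep_in, w:in, bg:in_touch, w:touch]
        System.out.println(extract("please keep in touch"));
    }
}
```

Bigrams capture short phrases that single words miss, at the cost of a much larger and sparser feature space, so whether they help is an empirical question for each data set.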
-
MORE TEXT CLASSIFICATION
-
Sentiment classification
Which of these product reviews is positive?
I recently made the switch from PC to Mac, and I can say that I'm not sure why I waited so long. Considering that I have only had my computer a few weeks I can't say much about the durability and longevity of the hardware, but I can say that the operating system (mine shipped with Lion) and software is top notch.
I've been an Apple user for a long time, but my most recent MacBook Pro purchase has convinced me to reconsider. I've had several hardware issues, including a failed keyboard, battery failure, and a bad DVD drive. Now, the backlight on the display fails to turn on when waking from sleep
How do you know?
-
Classifying news groups
Which mailing list should this message be posted to?
I am looking for Quick C or Microsoft C code for image decoding from file for VGA viewing and saving images from/to GIF, TIFF, PCX, or JPEG format. I have scoured the Internet, but its like trying to find a Dr. Seuss spell checker TSR. It must be out there, and there's no need to reinvent the wheel.
How do you know?
alt.atheism, comp.graphics, comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware
comp.windows.x, misc.forsale, rec.autos, rec.motorcycles, rec.sport.baseball
rec.sport.hockey, sci.crypt, sci.electronics, sci.med, sci.space
soc.religion.christian, talk.politics.guns, talk.politics.mideast, talk.politics.misc, talk.religion.misc
-
Demo
Converting our spam classifier into
  A sentiment classifier
  A newsgroup classifier
Note: How different are these at the implementation level?
-
Most of the engineering lies in the features
A document -> A classifier (black box) -> Some labels
-
Summary
What is LBJ? How do we use it?
Writing a simple spam detector
Playing with features
How much do we need to change to move to a different application?
-
Assignment before Next Class (Not Graded)
Download the code & data (http://l2r.cs.uiuc.edu/~danr/Teaching/CS446-12/handsonclassification.html) for this class and play with it
Try to solve the Badges game puzzle with LBJ
  Think about what features are needed
  Write a parser for reading the data
  Write a classifier for solving the puzzle
-
Next Class
We will solve the Badges Game puzzle by Machine Learning
We will look at more text classification examples
We will think about a famous people classifier
Questions
-
Badge Classifier
Brainstorm the possible features
  Characters in entire name
  Two consecutive characters
  Character as vowel, character as consonant
  ...
Feature engineering is important (especially if labeled data is small)
What is the baseline? 70 +, 24 -
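The brainstormed character features and the baseline question can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the course solution: the class name `BadgeFeatures` and the feature name prefixes are invented, and the baseline computation assumes that "70 +, 24 -" counts positive and negative examples, so that always predicting the majority label is the reference point to beat.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical character-level feature extractor for the Badges game. */
class BadgeFeatures {
    static boolean isVowel(char c) {
        return "aeiou".indexOf(Character.toLowerCase(c)) >= 0;
    }

    /** Emits character, vowel/consonant-by-position, and character-bigram features. */
    static List<String> extract(String name) {
        List<String> features = new ArrayList<>();
        String s = name.toLowerCase();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isLetter(c)) {
                continue; // skip spaces and punctuation
            }
            features.add("char:" + c);
            features.add("pos" + i + ":" + (isVowel(c) ? "vowel" : "consonant"));
            if (i + 1 < s.length() && Character.isLetter(s.charAt(i + 1))) {
                features.add("bigram:" + c + s.charAt(i + 1));
            }
        }
        return features;
    }

    /** Accuracy obtained by always predicting the more frequent label. */
    static double majorityBaseline(int positives, int negatives) {
        return (double) Math.max(positives, negatives) / (positives + negatives);
    }

    public static void main(String[] args) {
        // Prints [char:a, pos0:vowel, bigram:al, char:l, pos1:consonant]
        System.out.println(extract("Al"));
        // 70 / 94, roughly 0.745
        System.out.println(majorityBaseline(70, 24));
    }
}
```

Under that reading of the counts, a classifier has to exceed roughly 74.5% accuracy before its features can be said to add anything.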
-
THE FAMOUS PEOPLE CLASSIFIER
-
The Famous People Classifier
f( ) = Politician
f( ) = Athlete
f( ) = Corporate Mogul
-
The NLP version of the fame classifier
All sentences in the news in which the string "Barack Obama" occurs
All sentences in the news in which the string "Roger Federer" occurs
All sentences in the news in which the string "Bill Gates" occurs
Represented by
-
Our goal
Find famous athletes, corporate moguls and politicians
Athlete
Michael Schumacher, Michael Jordan
Politician
Bill Clinton, George W. Bush
Corporate Mogul
Warren Buffett, Larry Ellison
-
Let's brainstorm
How do we build a fame classifier?
Remember, we start off with just raw text from a news website
-
One solution
Let us label entities using features defined on mentions
  Identify mentions using the named entity recognizer
  Define features based on the words, parts of speech and dependency trees
  Train a classifier
All sentences in the news in which the string "Barack Obama" occurs
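A heavily simplified sketch of the first two steps is shown below. It substitutes exact string matching for a named entity recognizer and uses only context words as features, leaving out parts of speech and dependency trees, so it illustrates the data flow and not the approach described on the slide. The class name `MentionFeatures`, the `ctx:` prefix, and the example sentences are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical mention-based feature extractor using plain string matching. */
class MentionFeatures {
    /** Keeps only the sentences in which the entity string occurs. */
    static List<String> sentencesMentioning(String entity, List<String> sentences) {
        List<String> hits = new ArrayList<>();
        for (String s : sentences) {
            if (s.contains(entity)) {
                hits.add(s);
            }
        }
        return hits;
    }

    /** Word features from the text around each mention; the entity string itself is dropped. */
    static List<String> contextWords(String entity, List<String> sentences) {
        List<String> features = new ArrayList<>();
        for (String s : sentencesMentioning(entity, sentences)) {
            String context = s.replace(entity, " ").toLowerCase();
            for (String token : context.split("[^a-z]+")) {
                if (!token.isEmpty()) {
                    features.add("ctx:" + token);
                }
            }
        }
        return features;
    }

    public static void main(String[] args) {
        // Invented sentences standing in for raw news text.
        List<String> news = Arrays.asList(
            "Roger Federer won the final.",
            "Markets fell on Monday.");
        // Prints [ctx:won, ctx:the, ctx:final]
        System.out.println(contextWords("Roger Federer", news));
    }
}
```

The features collected over all of an entity's mentions would then form one example for the classifier, with labels such as Athlete, Politician, or Corporate Mogul.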
-
Summary
1. Get started with Learning Based Java
2. Use a generic, black box text classifier for different applications, and write your own text classifier if needed
3. Understand how features can impact classifier performance, and add features to improve your application
4. Build a badge classifier based on character features
Questions