Dataless Classification
Paper Presentation (CS 6370)

Ameet Deshpande, Vedant Somani

April 16, 2018

Outline

1. Building Blocks
   - Bag of Words
   - Explicit Semantic Analysis
   - Naive Bayes Classifier

2. Dataless Classification
   - Motivation
   - Label Expansion
   - On the Fly Classification
   - Leveraging Unlabeled Data
   - Domain Adaptation

Bag of Words

Bag of words (BOW) is a naive way of representing documents.

It simply counts the number of occurrences of each word and pays no heed to word order.

It serves as a baseline in this work.

As might be apparent, BOW vector representations are useful only when the exact words to be retrieved are present in the document. More on this later.

Bag of Words: Vector Representations

A document D is represented by a vector of size |V|, where V is the vocabulary under consideration.

Consider two documents

D1: I am Raju

D2: I love Kaju

The following are the vector representations of the two documents.

Doc   I   am   love   Raju   Kaju
D1    1   1    0      1      0
D2    1   0    1      0      1
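A minimal sketch of how these vectors could be computed (lower-casing and whitespace tokenization are simplifying assumptions):

```python
from collections import Counter

def bow_vector(document, vocabulary):
    """Count occurrences of each vocabulary word; word order is ignored."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]

vocab = ["i", "am", "love", "raju", "kaju"]
print(bow_vector("I am Raju", vocab))    # [1, 1, 0, 1, 0]
print(bow_vector("I love Kaju", vocab))  # [1, 0, 1, 0, 1]
```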

Explicit Semantic Analysis

Humans interpret a word in a very large context of background knowledge.

Explicit Semantic Analysis (ESA) is a method which represents text (words) in a high-dimensional space of Wikipedia-derived concepts.

For each word, an inverted index of the concepts in which the word appears is stored.

A TF-IDF matrix is generated.

The concepts associated with the word are then scored based on the TF-IDF vector for the input word and the relevance of the concept to the word.
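A minimal sketch of this construction over a toy "Wikipedia" of three concept articles; the tiny corpus and raw-count term frequency are simplifying assumptions, not part of the original slides:

```python
import math
from collections import Counter

# Toy stand-ins for Wikipedia concept articles (real ESA uses the full corpus).
concepts = {
    "Planet": "mars is a planet orbiting the sun",
    "Solar System": "the solar system contains the sun mars and jupiter",
    "Cricket": "sachin tendulkar is a cricket batsman",
}

def esa_vectors(concepts):
    """TF-IDF-score every word against every concept article.

    The ESA representation of a word is its row: one score per concept.
    """
    article_counts = {c: Counter(text.split()) for c, text in concepts.items()}
    n = len(article_counts)
    vocab = {w for counts in article_counts.values() for w in counts}
    df = {w: sum(w in counts for counts in article_counts.values()) for w in vocab}
    return {
        w: {c: counts[w] * math.log(n / df[w]) for c, counts in article_counts.items()}
        for w in vocab
    }

vectors = esa_vectors(concepts)
print(vectors["mars"])  # nonzero for Planet and Solar System, 0.0 for Cricket
```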

Explicit Semantic Analysis

The following is an example of ESA word representations.

Word       Concept 1    Score 1   Concept 2      Score 2   Concept 3   Score 3   ...
Mars       planet       0.9       Solar System   0.85      Jupiter     0.3       ...
explorer   adventurer   0.89      pioneer        0.7       vehicle     0.2       ...

Explicit Semantic Analysis

Every word will have some score associated with each concept of Wikipedia. Therefore, while computing the relatedness of two words, a cosine similarity measure is used over the word representations.

However, not all the scores associated with concepts are significant. For example, the score of the concept "Sachin Tendulkar" for the word neutron is likely to be very low.

Thus, in practice, scores for all the concepts are not stored; instead, only the scores associated with the k (say 100) most related concepts are kept.
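A minimal sketch of the top-k pruning and the cosine relatedness computation, assuming word vectors are sparse concept-to-score mappings as in the earlier sketch:

```python
import math

def top_k(vector, k):
    """Prune an ESA vector, keeping only the k highest-scoring concepts."""
    return dict(sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:k])

def cosine(u, v):
    """Cosine similarity between two sparse concept -> score vectors."""
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

mars = top_k({"Planet": 0.9, "Solar System": 0.85, "Jupiter": 0.3,
              "Sachin Tendulkar": 0.001}, k=3)
moon = top_k({"Planet": 0.7, "Solar System": 0.9, "Apollo 11": 0.4,
              "Sachin Tendulkar": 0.002}, k=3)
print(cosine(mars, moon))  # high: the two words share their strongest concepts
```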

Naive Bayes Classifier

This serves as a baseline in this work.

$$P(\mathrm{Class} \mid D) = \frac{P(d_1 \mid \mathrm{Class}) \cdots P(d_n \mid \mathrm{Class}) \times P(\mathrm{Class})}{P(D)}$$

Usually $d_1, d_2, \ldots, d_n \in \mathrm{Words}(\mathrm{Document})$.

But let's convince ourselves that the $d_i$'s can be the individual elements of any vector representation of the document.

Usually the features of the vector representation are words. But we could use the ESA representation instead (and it could perhaps be more useful).
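A minimal sketch of Naive Bayes over generic discrete features (words, or concept identifiers taken from an ESA vector); the add-one smoothing is a standard choice, not something the slides specify:

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs):
    """labeled_docs: list of (features, label) pairs; features is a list
    of discrete items such as words or active ESA concepts."""
    class_counts = Counter(label for _, label in labeled_docs)
    feature_counts = defaultdict(Counter)
    for features, label in labeled_docs:
        feature_counts[label].update(features)
    vocab = {f for features, _ in labeled_docs for f in features}
    return class_counts, feature_counts, vocab

def predict_nb(features, class_counts, feature_counts, vocab):
    """Return the class maximizing log P(Class) + sum_i log P(d_i | Class)."""
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        denominator = sum(feature_counts[label].values()) + len(vocab)
        score = math.log(n_label / total)
        for f in features:
            # Add-one smoothing keeps unseen features from zeroing the product.
            score += math.log((feature_counts[label][f] + 1) / denominator)
        if score > best_score:
            best, best_score = label, score
    return best
```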

Naive Bayes Classifier

Here is how ESA representations can be used to classify documents.

Consider two documents

D1: Tesla launches Falcon

D2: The Eagle has landed

The following could be the ESA representations of the two documents.

Doc   Space   Cars   Birds   Musk   Armstrong
D1    0.02    0.02   0.01    0.10   0.00
D2    0.02    0.00   0.01    0.00   0.10

10-100 examples are used to train a supervised classifier, and the results are compared against the dataless classifier.

Disclaimer

No Data Used?

It is important to remember that a large amount of data has already been used to train the model: Wikipedia articles and ESA are used to obtain the vector representations. The approach is not dataless in that sense.

On the Fly Classification

But after the training procedure, the classifier can categorize documents into labels it has never been trained on before. It is definitely dataless in that sense.

Example

We will now see a demonstration to get a feel of how this procedure works.

Motivation

Say we are posed with the task of classifying documents into two categories: {Alien Invasion, Rocket Launch}.

Consider the following article:

UFO sightings have been reported throughout recorded history and in various parts of the world, raising questions about life on other planets and whether extraterrestrials have visited Earth.

How is it possible for us to classify the article with ease?

We humans seem to use the semantics of the labels representing the classes. We “know” what the label means.

But say we were training a supervised classifier for this task. Usual machine learning approaches just treat the labels/classes as 0/1.

Can we use an algorithm which does not throw away the meaning in the labels? This could be one way of injecting world knowledge into the system.

Semantic Representation of Categories

Now that we have established that a rich semantic representation of a category helps, let's see how we can get such a representation.

Say an oracle provides, for each category $i$, a document $w_i$.

If the representation of the document to be classified, $\varphi(d)$, is closer to the correct class $w_i$ than to a wrong class $w_j$, we can classify it successfully:

$$\|w_i - \varphi(d)\| \le \|w_j - \varphi(d)\| - \gamma, \quad \forall j \neq i$$

Semantic Representation of Categories

Since an oracle is imaginary and, sadly, we have to do all the work, we assume that the label name $l_i$ is a good enough approximation of the oracle document $w_i$.

But it can be proved that the approximation has room for some error: we can still get a good classifier even when the approximation is a little off.

$$\|\varphi(d) - \varphi(\{l_i\})\| \le \|\varphi(d) - \varphi(\{l_j\})\| - \gamma + 2\eta, \quad \forall j \neq i$$

where $\eta$ is the error made in approximating the oracle document.

Label Expansion

To reduce the error made in approximating the oracle document, we need to make sure the label names are representative of the category.

A procedure called label expansion is used to ensure this. Instead of using a single word to represent a category, a handful of representative words are used.

Expanded Labels for Newsgroup Categories

talk.politics.guns       → {politics, guns}
soc.religion.christian   → {society, religion, christianity, christian}
comp.sys.mac.hardware    → {computer, systems, mac, apple, hardware}
sci.crypt                → {science, cryptography}
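A minimal sketch of how the expanded labels might be turned into category representations; the esa_vector lookup is an assumed helper (e.g., built as in the earlier ESA sketch), not something defined in the slides:

```python
expanded_labels = {
    "talk.politics.guns": ["politics", "guns"],
    "sci.crypt": ["science", "cryptography"],
}

def category_vector(words, esa_vector):
    """Represent a category by summing the ESA vectors of its expanded label words."""
    combined = {}
    for word in words:
        for concept, score in esa_vector(word).items():
            combined[concept] = combined.get(concept, 0.0) + score
    return combined

# label_reps = {cat: category_vector(ws, esa_vector)
#               for cat, ws in expanded_labels.items()}
```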

On the Fly Classification

This is a technique that does not use any data after the training procedure.

Nearest-neighbor classification is used to predict the correct category.

Consider the label set $\{l_1, l_2, \ldots, l_k\}$. Given a document's representation, the label whose representation is closest to it is predicted:

$$\arg\min_i \|\varphi(l_i) - \varphi(d)\|$$

Depending on whether the BOW representation or the ESA representation is used, the classifier is called NN-BOW or NN-ESA.

The NN-BOW classifier can categorize a document successfully only if the label and the document have words in common; NN-ESA has no such restriction.
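A minimal sketch of this nearest-neighbor step over sparse vectors, using Euclidean distance to match the formula above (cosine similarity would be a common alternative):

```python
import math

def classify_on_the_fly(doc_vector, label_vectors):
    """Predict the label whose representation is closest to the document's."""
    def distance(u, v):
        keys = u.keys() | v.keys()
        return math.sqrt(sum((u.get(c, 0.0) - v.get(c, 0.0)) ** 2 for c in keys))
    return min(label_vectors, key=lambda lbl: distance(label_vectors[lbl], doc_vector))

# predicted = classify_on_the_fly(phi_of_document, {label: phi_of_label, ...})
```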

Leveraging Unlabeled Data

There is often a lot of unlabeled data available. Though it is expensive to get labeled data, unlabeled data can be retrieved easily (e.g., by scraping).

Is it possible to harness this pool of documents starting with just the labels?

Bootstrapping.

Bootstrapping Algorithm

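The algorithm itself appeared as a figure that did not survive extraction. What follows is a minimal sketch of a bootstrapping loop consistent with the surrounding discussion: seed pseudo-labels with the dataless nearest-neighbor classifier, then repeatedly retrain a supervised classifier on its own confident predictions. All function names and the threshold are assumptions.

```python
def bootstrap(unlabeled_docs, labels, classify_dataless, train,
              predict_with_confidence, threshold=0.9, iterations=5):
    """Grow a training set from unlabeled documents, starting from labels alone."""
    # Seed: pseudo-label every document with the dataless (NN) classifier.
    training_set = [(doc, classify_dataless(doc, labels)) for doc in unlabeled_docs]
    for _ in range(iterations):
        model = train(training_set)
        training_set = []
        for doc in unlabeled_docs:
            label, confidence = predict_with_confidence(model, doc)
            if confidence >= threshold:  # keep only confident pseudo-labels
                training_set.append((doc, label))
    return train(training_set)
```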
Co-training

An influential paper [3] suggested that if there are two independent views of the data, each sufficient for classification by itself, combining them could give better results in labeling the documents.

How can this possibly be beneficial?

Let $X_1, X_2$ represent the independent views of the data $D$, and let $f_1(X_1), f_2(X_2)$ represent the classifiers built on them.

Clearly $f_1, f_2$ depend on the documents they have been trained on. If there is a document on which the predicted labels do not agree, what should be done?

Push the document for later, or ignore it. Let's see what the algorithm looks like.

Co-training Algorithm

We will look at this algorithm in detail again; for now, let's focus on line 9, where documents are added to the respective training sets only if both classifiers output the same label.

Co-training Algorithm

The training procedure in the original paper [3] was slightly different.

In this work, a document is added to the training sets only if both classifiers label it with high confidence.

In the original work, each classifier chooses the p positive and n negative examples it is most confident about (binary classification).

The advantage of this is that the classifiers transfer knowledge to each other. How?

Instead of ignoring a document which is not classified with high confidence by both, if it is classified confidently by one of them (say C1), we can be fairly sure that the classification is legitimate.

The document can then be used in the next iteration to train C2, whose confidence on such documents will hopefully increase.
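A minimal sketch of the agreement-based variant used in this work: two views, and a document joins the training sets only when both classifiers agree with high confidence. All helpers are assumptions, and a fuller version would remove accepted documents from the unlabeled pool.

```python
def co_train(unlabeled_docs, view1, view2, train, predict_with_confidence,
             seed1, seed2, threshold=0.9, iterations=5):
    """Train one classifier per view; confident agreement adds a document."""
    train1, train2 = list(seed1), list(seed2)
    for _ in range(iterations):
        c1, c2 = train(train1), train(train2)
        for doc in unlabeled_docs:
            label1, conf1 = predict_with_confidence(c1, view1(doc))
            label2, conf2 = predict_with_confidence(c2, view2(doc))
            if label1 == label2 and min(conf1, conf2) >= threshold:
                train1.append((view1(doc), label1))
                train2.append((view2(doc), label1))
    return train(train1), train(train2)
```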

Co-training Algorithm

Catch: BOW and ESA representations are not really independent. Nevertheless, co-training was found to improve the classifier.

Domain Adaptation

Different domains tend to use different vocabularies to express the same meaning.

Documents from Different Domains

D1: Manchester United floors Manchester City in yesterday's football match.

D2: LA Galaxy wins derby 4-3 in Major League Soccer.

The difference in vocabularies may prevent a supervised classifier trained on one domain from being used in another. But in the ESA representation, the words football and soccer may already be close to each other, and this could help generalization.

Results for Domain Adaptation

Let's stare at a few results and deduce what is affecting domain adaptation.

Summary

We have looked at:

- Dataless Classification
- Less Data Classification

References

[1] Chang, Ming-Wei, et al. "Importance of Semantic Representation: Dataless Classification." AAAI, 2008.

[2] Gabrilovich, Evgeniy, and Shaul Markovitch. "Computing Semantic Relatedness Using Wikipedia-Based Explicit Semantic Analysis." IJCAI, 2007.

[3] Blum, Avrim, and Tom Mitchell. "Combining Labeled and Unlabeled Data with Co-Training." Proceedings of the Eleventh Annual Conference on Computational Learning Theory, ACM, 1998.
