Mapping Between Taxonomies Elena Eneva 27 Sep 2001 Advanced IR Seminar

Page 1: Mapping Between Taxonomies

Mapping Between Taxonomies

Elena Eneva

27 Sep 2001

Advanced IR Seminar

Page 2: Mapping Between Taxonomies

Taxonomies

Formal systems of orderly classification of knowledge, which are designed for a specific purpose

Change of purpose, change of taxonomies

Businesses often need and keep the information in several structures

Important to be able to automatically map between taxonomies

Page 3: Mapping Between Taxonomies

Useful Mappings

Companies organizing information in various ways (e.g. one for marketing, another for product development)

Personal online bookmark classification

Search engines (e.g. Google <-> Yahoo)

EU Committee for Standardization: “detailed overview of the existing taxonomies officially used in the EU, in order to derive general concepts such as: information organisation, properties, multilinguality, keywords, etc. and, last but not least, the mapping between.”

Page 4: Mapping Between Taxonomies

Approach

[Figure: two taxonomies, "By country" (German, French) and "By industry" (Textile, Automobile)]

Page 8: Mapping Between Taxonomies

Approach

[Figure: the "By industry" taxonomy (Textile, Automobile)]

Page 9: Mapping Between Taxonomies

Approach

[Figure: the "By industry" taxonomy (Textile, Automobile), with four rows of "abc" placeholder labels]

Page 12: Mapping Between Taxonomies

Approach

[Figure: two taxonomies, "By country" (German, French) and "By industry" (Textile, Automobile), with one row of "abc" placeholder labels]

Page 14: Mapping Between Taxonomies

Approach

[Figure: two taxonomies, "By country" (German, French) and "By industry" (Textile, Automobile), with two rows of "abc" placeholder labels]

Page 15: Mapping Between Taxonomies

Learning Algorithms

Two separate learners for the documents:

Old document category -> new document category

Document contents -> new category

Combining their outputs:

Weighted average based on confidence

Final result determined by a decision tree
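The weighted-average option can be sketched as follows. This is a minimal illustration, not the method used in the talk: the function names are hypothetical, and it assumes each learner outputs a probability distribution over the new categories and that a learner's confidence is the probability of its top class.

```python
from typing import Dict


def combine_by_confidence(p_category: Dict[str, float],
                          p_content: Dict[str, float]) -> Dict[str, float]:
    """Confidence-weighted average of two learners' class distributions.

    p_category: output of the (old category -> new category) learner.
    p_content:  output of the (document contents -> new category) learner.
    Assumption: confidence = probability each learner gives its top class.
    """
    w_category = max(p_category.values())
    w_content = max(p_content.values())
    total = w_category + w_content
    if total == 0:
        raise ValueError("both learners report zero confidence")
    labels = set(p_category) | set(p_content)
    return {
        label: (w_category * p_category.get(label, 0.0)
                + w_content * p_content.get(label, 0.0)) / total
        for label in labels
    }


def predict_new_category(p_category: Dict[str, float],
                         p_content: Dict[str, float]) -> str:
    """Return the new-taxonomy label with the highest combined score."""
    combined = combine_by_confidence(p_category, p_content)
    return max(combined, key=combined.get)
```

For example, if the category-based learner gives Textile 0.9 and Automobile 0.1 while the content-based learner gives Textile 0.4 and Automobile 0.6, the more confident category-based learner dominates and the combined prediction is Textile.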

One combined learner: uses both old category and contents as features

Use the unlabeled data for bootstrapping (e.g. top 1%)
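One way to read the bootstrapping idea is as self-training: classify the unlabeled documents, move the most confidently classified fraction into the training set, and retrain. The sketch below assumes that reading. The tiny Naive Bayes classifier is only an illustrative stand-in for the learners named in the talk, and the names `train_naive_bayes`, `self_train`, `top_fraction`, and `rounds` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

Doc = Sequence[str]                           # a tokenized document
Model = Callable[[Doc], Tuple[str, float]]    # returns (label, confidence)


def train_naive_bayes(docs: List[Doc], labels: List[str]) -> Model:
    """Tiny multinomial Naive Bayes with add-one smoothing (stand-in learner)."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
        vocab.update(doc)
    n_docs = len(labels)

    def classify(doc: Doc) -> Tuple[str, float]:
        log_scores = {}
        for label, n in class_counts.items():
            denom = sum(word_counts[label].values()) + len(vocab)
            score = math.log(n / n_docs)
            for word in doc:
                score += math.log((word_counts[label][word] + 1) / denom)
            log_scores[label] = score
        # Normalize in a numerically stable way to get a confidence in [0, 1].
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        best = max(exp, key=exp.get)
        return best, exp[best] / sum(exp.values())

    return classify


def self_train(docs: List[Doc], labels: List[str], unlabeled: List[Doc],
               top_fraction: float = 0.01, rounds: int = 3) -> Model:
    """Repeatedly add the most confidently classified unlabeled docs and retrain."""
    docs, labels, pool = list(docs), list(labels), list(unlabeled)
    model = train_naive_bayes(docs, labels)
    for _ in range(rounds):
        if not pool:
            break
        scored = sorted(((model(d), d) for d in pool),
                        key=lambda item: item[0][1], reverse=True)
        k = max(1, int(len(scored) * top_fraction))   # at least one doc per round
        for (label, _confidence), doc in scored[:k]:
            docs.append(doc)
            labels.append(label)                      # trust the predicted label
        pool = [doc for _, doc in scored[k:]]
        model = train_naive_bayes(docs, labels)
    return model
```

A small fraction such as 1% keeps label noise low, since only the predictions the model is most sure of are treated as training data; the cost is that more rounds are needed to absorb the unlabeled pool.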

Page 16: Mapping Between Taxonomies

Learners

Decision Tree (C4.5)

Naïve Bayes Classifier (Rainbow)

Support Vector Machine (SVM-Light)

KNN (from Yiming)

Page 17: Mapping Between Taxonomies

Datasets

Two classification schemes:

Reuters 2001: Topics; Industry categories

Hoovers-255 and Hoovers-28: 255 industry categories; 28 industry categories

Web pages from Google and Yahoo

Page 18: Mapping Between Taxonomies

Related Literature

Reconciling Schemas of Disparate Data Sources: A Machine Learning Approach, A. Doan, P. Domingos, and A. Halevy. Proceedings of the ACM SIGMOD Conf. on Management of Data (SIGMOD-2001)

Learning Source Descriptions for Data Integration, A. Doan, P. Domingos, and A. Levy. Proceedings of the Third International Workshop on the Web and Databases (WebDB-2000), pages 81-86, 2000. Dallas, TX: ACM SIGMOD.

Learning Mappings between Data Schemas, A. Doan, P. Domingos, and A. Levy. Proceedings of the AAAI-2000 Workshop on Learning Statistical Models from Relational Data, 2000, Austin, TX.

Page 19: Mapping Between Taxonomies

Questions and Ideas

Other possible datasets?

Other learners?

Other papers?

The end.