Mapping Between Taxonomies

Elena Eneva

27 Sep 2001

Advanced IR Seminar

Taxonomies

Formal systems of orderly classification of knowledge, which are designed for a specific purpose

Change of purpose, change of taxonomies

Businesses often need and keep the information in several structures

Important to be able to automatically map between taxonomies

Useful Mappings

Companies organizing information in various ways

(e.g. one for marketing, another for product development)

Personal online bookmark classification

Search engines (e.g. Google <-> Yahoo)

EU Committee for Standardization: “detailed overview of the existing taxonomies officially used in the EU, in order to derive general concepts such as: information organisation, properties, multilinguality, keywords, etc. and, last but not least, the mapping between.”

Approach

[Diagram slides. Recoverable labels: two taxonomies, "By country" (German, French) and "By industry" (Textile, Automobile), with documents shown as "abc".]

Learning Algorithms

Two separate learners for the documents:

Old document category -> new document category

Document contents -> new category

Weighted average based on confidence

Final result determined by a decision tree

One combined learner, using both the old category and the contents as features

Use the unlabeled data for bootstrapping (e.g. top 1%)
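
A minimal sketch of how the two-learner option might fit together, not the implementation used in the talk. It assumes that each learner outputs a probability distribution over new categories and that a learner's confidence is the probability of its top class; both are assumptions. `CategoryMapper`, `combine`, and `most_confident` are hypothetical names, and the content-based distribution is hard-coded in place of a real text classifier.

```python
from collections import Counter, defaultdict


class CategoryMapper:
    """Learner 1: old category -> distribution over new categories (from counts)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, old_labels, new_labels):
        for old, new in zip(old_labels, new_labels):
            self.counts[old][new] += 1
        return self

    def predict_proba(self, old):
        seen = self.counts.get(old)
        if not seen:
            return {}  # unseen old category: no opinion
        total = sum(seen.values())
        return {new: n / total for new, n in seen.items()}


def confidence(dist):
    """Assumed confidence measure: probability of the top class (0 if no opinion)."""
    return max(dist.values(), default=0.0)


def combine(dist_a, dist_b):
    """Average two distributions, weighting each one by its own confidence."""
    w_a, w_b = confidence(dist_a), confidence(dist_b)
    if w_a + w_b == 0:
        return {}
    labels = set(dist_a) | set(dist_b)
    return {
        label: (w_a * dist_a.get(label, 0.0) + w_b * dist_b.get(label, 0.0)) / (w_a + w_b)
        for label in labels
    }


def most_confident(predictions, fraction=0.01):
    """Keep the top `fraction` of (doc_id, dist) pairs by confidence, for bootstrapping."""
    ranked = sorted(predictions, key=lambda p: confidence(p[1]), reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Toy data reusing the labels from the diagrams (country -> industry).
mapper = CategoryMapper().fit(
    ["German", "German", "French"], ["Textile", "Automobile", "Textile"]
)
by_category = mapper.predict_proba("German")
by_content = {"Textile": 0.9, "Automobile": 0.1}  # hypothetical content-learner output
combined = combine(by_category, by_content)
best = max(combined, key=combined.get)
```

In this sketch the confident content-based prediction outweighs the evenly split category-based one. Documents returned by `most_confident` could then be added to the training set with their predicted labels and the learners retrained, which is one common reading of bootstrapping from unlabeled data.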

Learners

Decision Tree (C4.5)

Naïve Bayes Classifier (Rainbow)

Support Vector Machine (SVM-Light)

KNN (from Yiming)
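
To illustrate the combined-learner option with one of the learner families listed above, here is a small nearest-neighbour sketch in plain Python. It is not the KNN implementation named on the slide. The Jaccard similarity, the `OLD_CATEGORY=` marker feature, and the toy documents are all assumptions made for the example.

```python
from collections import Counter


def features(old_category, text):
    """One feature set per document: content words plus a marker for the old category."""
    return set(text.lower().split()) | {"OLD_CATEGORY=" + old_category}


def jaccard(a, b):
    """Set-overlap similarity in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_predict(train, query, k=3):
    """train: list of (feature_set, new_category); majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: jaccard(item[0], query), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training documents labeled in both taxonomies.
train = [
    (features("German", "cotton fabric weaving mill"), "Textile"),
    (features("French", "silk fabric fashion house"), "Textile"),
    (features("German", "engine car assembly plant"), "Automobile"),
    (features("French", "car tyre engine supplier"), "Automobile"),
]
prediction = knn_predict(train, features("German", "wool fabric mill"), k=3)
```

Because the old category is just one more feature, a single learner can weigh it against the content words, with no separate combination step.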

Datasets

Two classification schemes:

Reuters 2001: Topics, Industry categories

Hoovers-255 and Hoovers-28: 255 industry categories, 28 industry categories

Web pages from Google and Yahoo

Related Literature

Reconciling Schemas of Disparate Data Sources: A Machine Learning Approach, A. Doan, P. Domingos, and A. Halevy. Proceedings of the ACM SIGMOD Conf. on Management of Data (SIGMOD-2001)

Learning Source Descriptions for Data Integration, A. Doan, P. Domingos, and A. Levy. Proceedings of the Third International Workshop on the Web and Databases (WebDB-2000), pages 81-86, 2000. Dallas, TX: ACM SIGMOD.

Learning Mappings between Data Schemas, A. Doan, P. Domingos, and A. Levy. Proceedings of the AAAI-2000 Workshop on Learning Statistical Models from Relational Data, 2000, Austin, TX.

Questions and Ideas

Other possible datasets?

Other learners?

Other papers?

The end.