Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology. Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection. Presenter: Cheng-Feng Weng. Authors: Tommy W. S. Chow, M. K. M. Rahman. 2009/10/12. TNN.18 (2009)

Page 1: Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Intelligent Database Systems Lab

國立雲林科技大學 National Yunlin University of Science and Technology

Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Presenter: Cheng-Feng Weng

Authors: Tommy W. S. Chow, M. K. M. Rahman

2009/10/12

TNN.18 (2009)

Page 2: Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

N.Y.U.S.T.

I. M.

Outline

Motivation

Objective

Method

Experiments

Conclusion

Comments

Page 3: Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Motivation

Document Retrieval: Term-Frequency Problem

Two documents with similar term frequencies may differ in context when the spatial distribution of their terms is very different.

Plagiarism Detection: Paraphrasing Problem

SOM…project……..

SOM…be mapped into……..

Science…….Computer…….School……..

School of Computer Science……..
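The term-frequency problem can be illustrated with a small sketch. The two sentences below are made-up examples (not taken from the paper) that share exactly the same words in a different order, so a bag-of-words comparison cannot tell them apart while a position-aware comparison can.

```python
from collections import Counter

def term_frequencies(text):
    """Bag-of-words term counts (lower-cased, whitespace tokenised)."""
    return Counter(text.lower().split())

def term_positions(text):
    """Map each term to the list of token positions where it occurs."""
    positions = {}
    for index, token in enumerate(text.lower().split()):
        positions.setdefault(token, []).append(index)
    return positions

# Hypothetical example sentences: same terms, different spatial distribution.
doc_a = "school of computer science offers courses"
doc_b = "science courses school offers computer of"

same_counts = term_frequencies(doc_a) == term_frequencies(doc_b)
same_layout = term_positions(doc_a) == term_positions(doc_b)
print(same_counts, same_layout)  # True False
```

Global term counts alone treat the two texts as identical, which is the gap a representation with local (page and paragraph) information is meant to close.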

Page 4: Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Objective

The paper proposes a tree-structured document model with MLSOM for document retrieval (DR) and plagiarism detection (PD).

[Diagram labels: Document, Tree-Structured Model, MLSOM, Global View, Local View, DR, PD]

Page 5: Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Structured Representation of DF

A document is partitioned into pages that are further partitioned into paragraphs.

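Rendered text: I am a web page. First line. Second line. The wordless third line.

HTML source: <HTML> <HEAD></HEAD> <BODY> I am a web page <br> <p> First line </p> <p> Second line </p> The wordless third line </BODY></HTML>

Extracted text blocks:

I am a web page

First line

Second line

The wordless third line

Page

I am a web page

First line

Paragraph

I am a web page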
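The document → page → paragraph partitioning can be sketched as a small tree builder. This is a minimal illustration under stated assumptions: the `Node` class, the `build_tree` helper, the fixed `paragraphs_per_page` rule, and the use of raw term counts as node features are all hypothetical choices made here, not the paper's actual segmentation or feature extraction. The input is an English rendering of the slide's four-line example.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Node:
    level: str                 # "document", "page", or "paragraph"
    features: Counter          # term counts for the text under this node
    children: list = field(default_factory=list)

def build_tree(paragraphs, paragraphs_per_page=2):
    """Group paragraphs into pages, then pages into one document node.

    Assumption: a "page" is simply a fixed number of consecutive paragraphs.
    """
    pages = []
    for start in range(0, len(paragraphs), paragraphs_per_page):
        chunk = paragraphs[start:start + paragraphs_per_page]
        leaves = [Node("paragraph", Counter(p.lower().split())) for p in chunk]
        page_counts = sum((leaf.features for leaf in leaves), Counter())
        pages.append(Node("page", page_counts, leaves))
    doc_counts = sum((page.features for page in pages), Counter())
    return Node("document", doc_counts, pages)

paragraphs = ["I am a web page", "First line", "Second line",
              "The wordless third line"]
tree = build_tree(paragraphs)
print(len(tree.children), [len(p.children) for p in tree.children])  # 2 [2, 2]
```

The root node carries the global view of the document, while page and paragraph nodes keep the local views.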

Page 6: Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Structured Representation of DF (cont.)

Page 7: Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Multilayer SOM

MLSOM was developed for handling tree-structured data.
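One way a layered SOM can consume tree-structured data is bottom-up: child nodes are first mapped onto a lower-layer SOM, and the positions of their winning neurons are appended to the parent's own features before the parent is mapped on the next layer. The sketch below assumes that scheme; the slides do not spell it out, and the function names, the position encoding, and the toy weights are all hypothetical.

```python
import math

def best_matching_unit(weights, vector):
    """Return (row, col) of the neuron whose weight vector is closest to `vector`."""
    best, best_dist = None, math.inf
    for row, neurons in enumerate(weights):
        for col, w in enumerate(neurons):
            dist = math.dist(w, vector)
            if dist < best_dist:
                best, best_dist = (row, col), dist
    return best

def parent_input(own_features, child_vectors, child_som, max_children):
    """Concatenate a parent's features with its children's normalised BMU positions.

    Assumption: positions are scaled into (0, 1] and 0.0 pads missing children.
    """
    rows, cols = len(child_som), len(child_som[0])
    used = child_vectors[:max_children]
    encoded = []
    for vec in used:
        r, c = best_matching_unit(child_som, vec)
        encoded += [(r + 1) / rows, (c + 1) / cols]
    encoded += [0.0] * (2 * (max_children - len(used)))
    return list(own_features) + encoded

# Hypothetical 2x2 paragraph-layer SOM with 2-D weights (toy values, not trained).
paragraph_som = [[[0.0, 0.0], [0.0, 1.0]],
                 [[1.0, 0.0], [1.0, 1.0]]]
page_vector = parent_input([0.5, 0.5], [[0.1, 0.9], [0.9, 0.2]],
                           paragraph_som, max_children=3)
print(page_vector)  # [0.5, 0.5, 0.5, 1.0, 1.0, 0.5, 0.0, 0.0]
```

Padding to a fixed `max_children` keeps the parent vector length constant, so one SOM per layer can handle nodes with different numbers of children.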

Page 8: Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Multilayer SOM (cont.)

Similarity:

Page 9: Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

MLSOM Retrieval

[Flow diagram: Document → extract to tree structure and project with PCA matrix → trained MLSOM → related docs.]
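The retrieval flow can be sketched in a few lines. This is a deliberately simplified illustration: it uses only a single document-level feature vector, a hand-picked matrix standing in for the PCA projection, and a toy neuron-to-document index standing in for a trained MLSOM. The names `project`, `retrieve`, and `neuron_to_docs` are hypothetical, and the paper's actual ranking procedure is not shown in the slides.

```python
import math

def project(matrix, vector):
    """Multiply a (k x n) projection matrix by an n-dimensional feature vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def best_matching_unit(weights, vector):
    """Index of the neuron (flat list of weight vectors) closest to `vector`."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], vector))

def retrieve(query_features, pca_matrix, som_weights, neuron_to_docs):
    """Project the query, find its winning neuron, return the documents stored there."""
    reduced = project(pca_matrix, query_features)
    winner = best_matching_unit(som_weights, reduced)
    return neuron_to_docs.get(winner, [])

# Toy values standing in for a PCA matrix and a trained top-layer SOM.
pca_matrix = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
som_weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neuron_to_docs = {0: ["D5"], 1: ["D1", "D2"], 2: ["D3", "D4"]}

related = retrieve([0.9, 0.1, 0.1], pca_matrix, som_weights, neuron_to_docs)
print(related)  # ['D1', 'D2']
```

Because the query is compared against neurons rather than against every stored document, the lookup cost depends on the map size and not on the size of the collection.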

Page 10: Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Plagiarism Detection

Plagiarism Detection using Local Association (PDLA)

[Diagram: neurons of the Layer 3 SOM hold document lists (D1, D2, …; D3, D4, ….; D2, D6, …), which lead to the related docs.]
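A local-association lookup of this kind can be sketched as a voting scheme: each paragraph of the query document is mapped to its winning neuron on the paragraph-level SOM, and every document already associated with that neuron receives a vote. This is an assumption-laden illustration, not the paper's PDLA algorithm; the function `candidate_sources`, the toy weights, and the closed document lists are hypothetical.

```python
import math
from collections import Counter

def best_matching_unit(weights, vector):
    """Index of the neuron (flat list of weight vectors) closest to `vector`."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], vector))

def candidate_sources(query_paragraphs, som_weights, neuron_to_docs):
    """Rank stored documents by how many query paragraphs share a neuron with them."""
    votes = Counter()
    for vec in query_paragraphs:
        winner = best_matching_unit(som_weights, vec)
        votes.update(set(neuron_to_docs.get(winner, [])))
    return votes.most_common()

# Toy paragraph-layer SOM and neuron-to-document lists (illustrative values only).
som_weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neuron_to_docs = {0: ["D1", "D2"], 1: ["D3", "D4"], 2: ["D2", "D6"]}
query = [[0.1, 0.1], [0.1, 0.9], [0.2, 0.8]]

ranking = candidate_sources(query, som_weights, neuron_to_docs)
print(ranking[0])  # ('D2', 3)
```

Documents with many votes share many locally similar paragraphs with the query, so they are natural candidates for a closer plagiarism check.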

Page 11: Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Experiments

Document Retrieval:

Page 12: Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Experiments (cont.)

Plagiarism Detection:

Page 13: Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Conclusions

A new approach to DR and PD using a tree-structured document representation and MLSOM is proposed.

It is shown that the tree-structured representation enhances retrieval accuracy by combining local characteristics with traditional global characteristics.

Computational issue: the MLSOM serves as an efficient computational solution for practical implementation.

Page 14: Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Comments

Advantage: practical; simple but efficient and effective.

Drawback: the failure rate of plagiarism detection is still high.

Application: …