Intelligent Database Systems Lab
國立雲林科技大學 (National Yunlin University of Science and Technology)
Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection
Presenter: Cheng-Feng Weng
Authors: Tommy W. S. Chow, M. K. M. Rahman
2009/10/12
TNN.18 (2009)
N.Y.U.S.T.
I. M.
Outline
Motivation
Objective
Method
Experiments
Conclusion
Comments
Motivation
Document Retrieval: Term-Frequency Problem
Two documents containing similar term frequencies may be contextually different when the spatial distribution of their terms is very different.
Plagiarism Detection: Paraphrasing Problem
SOM…project……..
SOM…be mapped into……..
Science…….Computer…….School……..
School of Computer Science……..
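The term-frequency problem can be seen in a few lines of Python. This is only an illustrative sketch: the two sample strings and the helper `term_frequencies` are made up for this example and do not come from the paper. Both strings produce the same bag-of-words counts even though their word order, and therefore their meaning, differs.

```python
from collections import Counter


def term_frequencies(text):
    """Bag-of-words term counts using lowercase, whitespace tokenization."""
    return Counter(text.lower().split())


# Hypothetical sample texts: same words, different spatial distribution.
doc_a = "school of computer science"
doc_b = "science of school computer"

same_counts = term_frequencies(doc_a) == term_frequencies(doc_b)
same_order = doc_a.split() == doc_b.split()

# A pure term-frequency model cannot tell the two texts apart.
print("identical term frequencies:", same_counts)
print("identical word order:", same_order)
```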
Objective
The paper proposes a tree-structured document model with a multilayer SOM (MLSOM) for document retrieval (DR) and plagiarism detection (PD).
(Diagram labels: Document; Tree-Structured Model; Global View; Local View; MLSOM; DR; PD)
Structured Representation of DF
A document is partitioned into pages that are further partitioned into paragraphs.
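I am a web page First line Second line The silent third line
<HTML> <HEAD></HEAD> <BODY> I am a web page <br> <p> First line </p> <p> Second line </p> The silent third line </BODY></HTML>
I am a web page
First line
Second line
The silent third line
Page
I am a web page
First line
Paragraph
I am a web page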
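The partitioning on this slide can be sketched with Python's standard `html.parser`. This is a rough illustration, not the authors' procedure: the set of break tags (`BLOCK_TAGS`), the fixed `paragraphs_per_page` grouping, and the function name `build_tree` are all assumptions made here, and the sample string is an English stand-in for the page shown on the slide.

```python
from html.parser import HTMLParser

# Assumed set of tags that end a text block; the slides do not list them.
BLOCK_TAGS = {"p", "br", "div", "h1", "h2", "h3", "li", "tr"}


class BlockSplitter(HTMLParser):
    """Collects the text between block-level tags as candidate paragraphs."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []

    def _flush(self):
        text = " ".join(" ".join(self._buf).split())  # collapse whitespace
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buf.append(data)

    def close(self):
        super().close()
        self._flush()  # keep trailing text that follows the last block tag


def build_tree(html, paragraphs_per_page=2):
    """Document -> pages -> paragraphs, with a hypothetical fixed page size."""
    splitter = BlockSplitter()
    splitter.feed(html)
    splitter.close()
    paras = splitter.blocks
    pages = [paras[i:i + paragraphs_per_page]
             for i in range(0, len(paras), paragraphs_per_page)]
    return {"document": " ".join(paras),
            "pages": [{"page": " ".join(p), "paragraphs": p} for p in pages]}


sample_html = ("<HTML> <HEAD></HEAD> <BODY> I am a web page <br> "
               "<p> First line </p> <p> Second line </p> "
               "The silent third line </BODY></HTML>")
tree = build_tree(sample_html)
for page in tree["pages"]:
    print(page["paragraphs"])
```

With two paragraphs per page, the first page holds the first two text blocks, which matches the Page and Paragraph boxes on the slide. A real system would then attach a feature vector to each node of the tree.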
Structured Representation of DF (cont.)
Multilayer SOM
MLSOM was developed for handling tree-structured data.
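The slide does not show how the layers interact, so the following is only one plausible reading, stated as an assumption: each tree level gets its own SOM, trained bottom-up, and a parent node's input vector is its own features concatenated with the grid positions of its children's best-matching units (BMUs) on the lower-layer SOM. The class `SOM`, the helper `parent_input`, the toy feature vectors, and the zero padding for missing children are all hypothetical.

```python
import math
import random


class SOM:
    """Tiny rectangular SOM in pure Python (illustrative, not optimized)."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        self.weights = [[rng.random() for _ in range(dim)]
                        for _ in range(rows * cols)]

    def bmu(self, x):
        """Index of the neuron with the smallest squared Euclidean distance."""
        return min(range(len(self.weights)),
                   key=lambda i: sum((w - v) ** 2
                                     for w, v in zip(self.weights[i], x)))

    def position(self, index):
        """Grid position of a neuron, normalized to [0, 1] on each axis."""
        r, c = divmod(index, self.cols)
        return (r / max(self.rows - 1, 1), c / max(self.cols - 1, 1))

    def train(self, data, epochs=20, lr0=0.5, sigma0=1.0):
        """Online training with linearly decaying rate and neighborhood."""
        for epoch in range(epochs):
            frac = 1.0 - epoch / epochs
            lr, sigma = lr0 * frac, max(sigma0 * frac, 0.1)
            for x in data:
                br, bc = divmod(self.bmu(x), self.cols)
                for i, w in enumerate(self.weights):
                    r, c = divmod(i, self.cols)
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2)
                                 / (2 * sigma ** 2))
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])


def parent_input(own_features, child_features, child_som, max_children):
    """Parent vector = own features + BMU positions of up to max_children."""
    vec = list(own_features)
    used = child_features[:max_children]
    for child in used:
        vec.extend(child_som.position(child_som.bmu(child)))
    # Simplification: empty child slots are padded with zeros, which is
    # indistinguishable from a child mapped to grid position (0, 0).
    vec.extend([0.0] * (2 * (max_children - len(used))))
    return vec


# Toy paragraph-level features (hypothetical values).
paragraphs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
paragraph_som = SOM(2, 2, dim=2)
paragraph_som.train(paragraphs)

# A page with two child paragraphs and room for three.
page_vec = parent_input([0.5, 0.5], paragraphs[:2], paragraph_som,
                        max_children=3)
print(len(page_vec), page_vec)
```

The page vectors built this way would train the next SOM up, and the same step would repeat for the document level.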
Multilayer SOM (cont.)
Similarity:
MLSOM Retrieval
Document
Trained MLSOM
Extract the tree structure and project the features with the PCA matrix
Related Docs.
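The retrieval flow can be sketched as a two-step lookup: map the query onto the trained map, then rank only the documents attached to the closest neurons. This is a sketch under stated assumptions, not the paper's algorithm. The query vector is taken to be already extracted and PCA-projected, the names `retrieve`, `neuron_docs`, and `doc_vecs` are hypothetical, and cosine similarity is a stand-in because the similarity formula did not survive in the slides.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity, used here only as a stand-in similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, neuron_weights, neuron_docs, doc_vecs,
             n_neurons=2, top_k=3):
    """Rank only the documents attached to the neurons nearest the query."""
    nearest = sorted(range(len(neuron_weights)),
                     key=lambda i: sq_dist(neuron_weights[i], query_vec))
    candidates = {d for i in nearest[:n_neurons]
                  for d in neuron_docs.get(i, [])}
    ranked = sorted(candidates,
                    key=lambda d: cosine(doc_vecs[d], query_vec),
                    reverse=True)
    return ranked[:top_k]


# Toy top-layer map: three neurons and the documents mapped to each one.
neuron_weights = [[1.0, 0.0], [0.7, 0.3], [0.0, 1.0]]
neuron_docs = {0: ["D1", "D2"], 1: ["D3"], 2: ["D4"]}
doc_vecs = {"D1": [0.9, 0.1], "D2": [1.0, 0.0],
            "D3": [0.6, 0.4], "D4": [0.1, 0.9]}

results = retrieve([0.95, 0.05], neuron_weights, neuron_docs, doc_vecs)
print(results)
```

The point of the design is that the query is compared against a handful of neurons and their attached documents rather than against the whole collection.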
Plagiarism Detection
Plagiarism Detection using Local Association (PDLA)
Layer 3 SOM
D1, D2, …
D3, D4, ….
D2, D6, …
…
Related Docs.
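One way to read the diagram is that each Layer 3 (paragraph-level) neuron keeps a list of the documents whose paragraphs were mapped to it, and a query document collects candidates through the neurons its own paragraphs hit. The sketch below follows that reading. The voting rule, the function `local_association`, and the toy weights are assumptions for illustration, and the document lists only mirror the labels on the slide.

```python
from collections import Counter


def bmu(neuron_weights, x):
    """Index of the neuron closest to x (squared Euclidean distance)."""
    return min(range(len(neuron_weights)),
               key=lambda i: sum((w - v) ** 2
                                 for w, v in zip(neuron_weights[i], x)))


def local_association(query_paragraphs, neuron_weights, neuron_docs):
    """Count, per document, how many query paragraphs hit a neuron it shares."""
    votes = Counter()
    for vec in query_paragraphs:
        hit = bmu(neuron_weights, vec)
        for doc in set(neuron_docs.get(hit, [])):
            votes[doc] += 1
    return votes.most_common()


# Toy Layer 3 map; the document lists echo the slide's example labels.
layer3_weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
layer3_docs = {0: ["D1", "D2"], 1: ["D3", "D4"], 2: ["D2", "D6"]}

# Two paragraph vectors from a hypothetical query document.
suspects = local_association([[0.9, 0.1], [0.1, 0.9]],
                             layer3_weights, layer3_docs)
print(suspects)
```

Documents that share neurons with several query paragraphs rise to the top of the list and would be the ones passed on for closer comparison.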
Experiments
Document Retrieval:
Experiments (cont.)
Plagiarism Detection:
Conclusions
A new approach to DR and PD using a tree-structured document representation and MLSOM is proposed.
The results show that the tree-structured representation enhances retrieval accuracy by combining local characteristics with traditional global characteristics.
Computational issue: the MLSOM serves as an efficient computational solution for practical implementation.
Comments
Advantage: practical; simple but efficient and effective
Drawback: the failure rate of plagiarism detection is still high
Application: …