Natural Language Processing Lab, National Taiwan University
The Splog Detection Task and A Solution Based on Temporal and Link Properties
Yu-Ru Lin et al., NEC America
TREC 2006 (Blog session)
Presenter: Chun-Yuan Teng

Splog characteristics
• Machine-generated content
• No value addition
  – No unique information for their readers
• Hidden agenda, usually an economic goal
  – Commercial intention

Uniqueness of splogs
• Dynamic content
  – Unlike web spam, a splog generates fresh content to drive traffic
• Non-endorsement link
  – A hyperlink is normally an endorsement of another page
  – Spammers can create hyperlinks in normal blogs, so a link in a blog is not necessarily an endorsement

Features to detect splogs
• Traditional features
  – Tokenized URL, blog and post titles, homepage content, and post content
• Temporal regularity
  – Temporal content regularity / temporal structural regularity
• Link regularity
  – Consistency in target website
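
The "tokenized URL" feature above can be illustrated with a minimal sketch. The function name `tokenize_url` and the splitting rule (break on any non-alphanumeric run) are assumptions made for illustration; the slides do not say how the authors tokenized URLs.

```python
import re


def tokenize_url(url: str) -> list[str]:
    """Split a URL into lowercase alphanumeric tokens (hypothetical rule)."""
    # Any run of characters other than a-z or 0-9 is treated as a separator.
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


if __name__ == "__main__":
    print(tokenize_url("http://Cheap-Loans.example.com/a_b"))
```

Tokens produced this way could be fed to a bag-of-words classifier alongside tokens from titles and post content.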

Temporal Content Regularity
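
The figure for this slide is not preserved in the transcript. As a rough illustration of the idea only, the sketch below measures how similar a blog's posts are to the posts that follow them. Jaccard similarity over word sets and the names `jaccard` and `content_regularity` are assumptions, not necessarily the estimator used in the paper.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def content_regularity(posts: list[str], lag: int = 1) -> float:
    """Mean similarity between posts that are `lag` positions apart."""
    token_sets = [set(p.lower().split()) for p in posts]
    pairs = list(zip(token_sets, token_sets[lag:]))
    if not pairs:
        return 0.0  # fewer than lag + 1 posts: nothing to compare
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Under this sketch, a blog that republishes near-identical text scores close to 1, while a blog with varied posts scores closer to 0.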

Temporal Structural Regularity
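
The figure for this slide is also missing. One simple way to quantify regular posting times, offered here as an assumption rather than as the authors' method, is the entropy of the gaps between consecutive posts: machine-scheduled posting concentrates the gaps in few bins and gives low entropy. The name `interval_entropy` and the one-hour bin width are hypothetical.

```python
import math
from collections import Counter


def interval_entropy(timestamps: list[float], bin_seconds: float = 3600.0) -> float:
    """Shannon entropy (in bits) of binned gaps between consecutive posts."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return 0.0  # fewer than two posts: no gaps to measure
    counts = Counter(int(gap // bin_seconds) for gap in gaps)
    total = len(gaps)
    # Sum p * log2(1/p) over the occupied bins.
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

A blog that posts exactly once an hour yields an entropy of 0 under this sketch; more varied gaps yield higher values.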

Link Regularity estimation
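
The estimation procedure shown on this slide is not preserved. As a simple proxy for "consistency in target website", the sketch below computes the share of a blog's outgoing links that point to its single most common host. The name `target_concentration` and this particular measure are assumptions for illustration, not the estimator from the paper.

```python
from collections import Counter
from urllib.parse import urlparse


def target_concentration(links: list[str]) -> float:
    """Fraction of outgoing links that point to the most frequent host."""
    hosts = [urlparse(link).netloc.lower() for link in links]
    hosts = [h for h in hosts if h]  # drop links with no host part
    if not hosts:
        return 0.0
    most_common_count = Counter(hosts).most_common(1)[0][1]
    return most_common_count / len(hosts)
```

A value near 1 means nearly all links lead to the same site, which matches the intuition that a splog consistently promotes a fixed target.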

Two kinds of spam detection
• Offline detection
  – Traditional measurement
• Online detection
  – Detect spam online

Experimental Result (Offline)

Experimental results (Offline)

Online indexing in blog search engine

Online test

Online test in this paper

Experimental results

Conclusion and contributions
• Modeling the splog problem
  – The uniqueness of splogs
• Regularity-based detection
  – Content and post time
• Evaluation
  – Online evaluation