Advisor: Dr. Koh Jia-Ling
Speaker: Che-Wei Liang
Date: 2007.11.20
Truth Discovery with Multiple Conflicting Information Providers on the Web
Xiaoxin Yin, Jiawei Han, Philip S. Yu
Industrial and Government Track short paper
Outline
• Introduction
• Problem Definitions
• Computational Model
  – Web Site Trustworthiness and Fact Confidence
  – Iterative Computation
• Empirical Study
• Conclusions
Introduction
• World Wide Web
  – A necessary part of our lives.
  – Ex: Amazon.com, ShopZilla.com.
• Is the World Wide Web always trustworthy?
  – There is no guarantee that information on the web is correct.
Introduction
• Example 1: Authors of books
  – For the same book, some web sites give an incomplete author list and others give incorrect authors.
Introduction
• Ranking web pages
  – By authority, computed from hyperlinks.
  – Ex: Authority-Hub analysis, PageRank, and more general link-based analysis.
• Does the authority or popularity of a web site lead to accurate information?
Introduction
• Veracity problem
  – Given conflicting facts about objects provided by multiple web sites, discover the true fact about each object.
Problem Definitions
• Definition 1: Confidence of facts.
  – The probability of a fact f being correct, denoted by s(f).
• Definition 2: Trustworthiness of web sites.
  – The expected confidence of the facts provided by a web site w, denoted by t(w).
Problem Definitions
• Facts may conflict with or support each other.
  – Ex: “Jennifer Widom” and “J. Widom” support each other.
• Concept of implication
  – imp(f1 → f2): f1’s influence on f2’s confidence; positive if f1 supports f2, negative if f1 conflicts with f2.
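To make the implication idea concrete, here is a toy sketch for author-name facts. The similarity rule, the `base_sim` offset of 0.5, and all function names are hypothetical choices for illustration, not the authors' implication function: two names count as fully similar when their last names and first initials match, and subtracting `base_sim` makes imp positive for supporting facts and negative for conflicting ones.

```python
def _parse(name: str) -> tuple[str, str]:
    """Split 'Jennifer Widom' or 'J. Widom' into (first initial, last name)."""
    parts = name.replace(".", " ").split()
    return parts[0][0].lower(), parts[-1].lower()


def name_similarity(a: str, b: str) -> float:
    """Hypothetical similarity in [0, 1]: 1.0 if last names and first initials
    match, 0.5 if only last names match, 0.0 otherwise."""
    ia, la = _parse(a)
    ib, lb = _parse(b)
    if la != lb:
        return 0.0
    return 1.0 if ia == ib else 0.5


def imp(f1: str, f2: str, base_sim: float = 0.5) -> float:
    """Hypothetical imp(f1 -> f2): positive when f1 supports f2,
    negative when the two facts conflict."""
    return name_similarity(f1, f2) - base_sim


print(imp("Jennifer Widom", "J. Widom"))        # supporting: 0.5
print(imp("Jennifer Widom", "Jeffrey Ullman"))  # conflicting: -0.5
```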
Basic Heuristics
1. Usually there is only one true fact for a property of an object.
2. This true fact appears to be the same or similar on different web sites.
Basic Heuristics (cont.)
3. False facts on different web sites are less likely to be the same or similar.
4. In a certain domain, a web site that provides mostly true facts for many objects will likely provide true facts for other objects.
Web Site Trustworthiness and Fact Confidence
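• Trustworthiness t(w): the average confidence of the facts provided by w,
$$t(w) = \frac{\sum_{f \in F(w)} s(f)}{|F(w)|}$$
where F(w) is the set of facts provided by w.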
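A minimal sketch of the trustworthiness definition, assuming facts are keyed by hypothetical string IDs and their confidences s(f) are already known: t(w) is the mean of s(f) over the facts w provides.

```python
def trustworthiness(facts_of_site: list[str], confidence: dict[str, float]) -> float:
    """t(w): the average confidence s(f) over F(w), the facts provided by site w."""
    if not facts_of_site:
        raise ValueError("a site must provide at least one fact")
    return sum(confidence[f] for f in facts_of_site) / len(facts_of_site)


# Hypothetical confidences for two facts provided by the same site.
s = {"f1": 0.9, "f2": 0.6}
print(trustworthiness(["f1", "f2"], s))  # about 0.75
```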
Web Site Trustworthiness and Fact Confidence
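• It is more difficult to estimate the confidence of a fact, because it depends both on the web sites providing it and on the other facts about the same object.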
Web Site Trustworthiness and Fact Confidence
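• Simple case
  – f1 is the only fact about object o1, and it is provided by web sites w1 and w2.
  – Assume w1 and w2 are independent.
• Confidence s(f): f is wrong only if every web site providing it is wrong, so
$$s(f) = 1 - \prod_{w \in W(f)} \bigl(1 - t(w)\bigr)$$
where W(f) is the set of web sites providing f.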
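The simple-case reasoning can be sketched directly: under the independence assumption, f is wrong only if every site in W(f) is wrong, so s(f) = 1 - Π(1 - t(w)). The trust values below are made-up inputs.

```python
import math


def simple_confidence(site_trust: list[float]) -> float:
    """s(f) = 1 - prod(1 - t(w)) over the sites in W(f).

    Assumes the sites are independent, so the probabilities that each
    site is wrong can be multiplied together.
    """
    return 1.0 - math.prod(1.0 - t for t in site_trust)


# Two hypothetical sites with t(w1) = 0.9 and t(w2) = 0.8 provide f.
print(simple_confidence([0.9, 0.8]))  # 1 - 0.1 * 0.2, about 0.98
```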
Web Site Trustworthiness and Fact Confidence
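• Trustworthiness score of a web site
$$\tau(w) = -\ln\bigl(1 - t(w)\bigr)$$
• τ(w) ranges from 0 to +∞ and better characterizes how accurate w is.
  – Ex: t(w1) = 0.9, t(w2) = 0.99
    t(w2) = 1.1 × t(w1), but
    τ(w2) = 2 × τ(w1): w2's error rate (0.01) is one tenth of w1's (0.1).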
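A quick numeric check of the example, assuming τ(w) = -ln(1 - t(w)), which is consistent with the numbers above (t(w) must be below 1, otherwise the logarithm is undefined). It reproduces the 1.1× ratio for t and the 2× ratio for τ.

```python
import math


def trust_score(t: float) -> float:
    """tau(w) = -ln(1 - t(w)); maps t in [0, 1) onto [0, +inf)."""
    return -math.log(1.0 - t)


t1, t2 = 0.9, 0.99
print(t2 / t1)                            # about 1.1
print(trust_score(t2) / trust_score(t1))  # about 2.0
```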
Web Site Trustworthiness and Fact Confidence
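• Confidence score of a fact
$$\sigma(f) = -\ln\bigl(1 - s(f)\bigr)$$
  – Property: in the simple case,
$$\sigma(f) = \sum_{w \in W(f)} \tau(w)$$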
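The additive property can be checked numerically. This sketch assumes σ(f) = -ln(1 - s(f)) and τ(w) = -ln(1 - t(w)), with made-up trust values: computing s(f) with the simple-case product formula and converting it to a score gives the same result as summing the τ(w) values.

```python
import math


def trust_score(t: float) -> float:
    """tau(w) = -ln(1 - t(w))."""
    return -math.log(1.0 - t)


def confidence_score(s: float) -> float:
    """sigma(f) = -ln(1 - s(f))."""
    return -math.log(1.0 - s)


site_trust = [0.9, 0.8, 0.5]  # hypothetical t(w) for each site in W(f)

# Route 1: simple-case probability s(f) = 1 - prod(1 - t(w)), then convert.
s = 1.0 - math.prod(1.0 - t for t in site_trust)
via_probability = confidence_score(s)

# Route 2: add up the sites' trustworthiness scores.
via_sum = sum(trust_score(t) for t in site_trust)

print(math.isclose(via_probability, via_sum))  # True
```

Working in log space turns a product of error probabilities into a sum of scores, which is what makes the later adjustment with related facts a simple weighted sum.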
Web Site Trustworthiness and Fact Confidence
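• Adjusted confidence score of a fact f, which also accounts for the other facts f′ about the same object:
$$\sigma^*(f) = \sigma(f) + \rho \cdot \sum_{o(f') = o(f),\ f' \neq f} \sigma(f') \cdot \mathrm{imp}(f' \to f)$$
where o(f) is the object f is about and ρ (0 ≤ ρ ≤ 1) controls the influence of related facts.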
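A sketch of the adjusted score, assuming σ*(f) = σ(f) + ρ · Σ σ(f′) · imp(f′ → f) over the other facts f′ about the same object, with ρ = 0.5 as in the experiments. The σ values and the implication table are hypothetical; missing table entries are treated as unrelated (imp = 0).

```python
def adjusted_scores(sigma: dict[str, float],
                    implication: dict[tuple[str, str], float],
                    rho: float = 0.5) -> dict[str, float]:
    """sigma*(f) = sigma(f) + rho * sum of sigma(f') * imp(f' -> f)
    over the other facts f'. All facts in `sigma` are assumed to describe
    the same object; missing (f', f) pairs count as imp = 0."""
    return {
        f: sigma[f] + rho * sum(sigma[g] * implication.get((g, f), 0.0)
                                for g in sigma if g != f)
        for f in sigma
    }


# Hypothetical facts about one book's author: two compatible spellings, one conflict.
sigma = {"Jennifer Widom": 2.0, "J. Widom": 1.0, "Jeffrey Ullman": 1.5}
implication = {
    ("Jennifer Widom", "J. Widom"): 0.5, ("J. Widom", "Jennifer Widom"): 0.5,
    ("Jennifer Widom", "Jeffrey Ullman"): -0.5, ("Jeffrey Ullman", "Jennifer Widom"): -0.5,
    ("J. Widom", "Jeffrey Ullman"): -0.5, ("Jeffrey Ullman", "J. Widom"): -0.5,
}
print(adjusted_scores(sigma, implication))
# {'Jennifer Widom': 1.875, 'J. Widom': 1.125, 'Jeffrey Ullman': 0.75}
```

The conflicting fact's score halves (1.5 to 0.75), while "J. Widom" gains from its compatible spelling (1.0 to 1.125).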
Web Site Trustworthiness and Fact Confidence
• Compute the confidence of f from σ*(f) in the same way as from σ(f):
$$s^*(f) = 1 - e^{-\sigma^*(f)}$$
• This assumes that different web sites are independent, which is incorrect! Add a dampening factor γ, 0 < γ < 1:
$$s^*(f) = 1 - e^{-\gamma \cdot \sigma^*(f)}$$
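A small sketch comparing the undamped and damped formulas, assuming s*(f) = 1 - e^(-γ·σ*(f)); setting γ = 1 recovers the undamped version, and γ = 0.3 matches the experimental setting. The σ*(f) value of 3.0 is arbitrary.

```python
import math


def dampened_confidence(sigma_star: float, gamma: float = 0.3) -> float:
    """s*(f) = 1 - exp(-gamma * sigma*(f)); gamma = 1 gives the undamped formula."""
    return 1.0 - math.exp(-gamma * sigma_star)


print(dampened_confidence(3.0, gamma=1.0))  # undamped: about 0.950
print(dampened_confidence(3.0, gamma=0.3))  # damped:   about 0.593
```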
Web Site Trustworthiness and Fact Confidence
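• Negative-confidence problem
  – A fact f that conflicts with facts provided by trustworthy web sites can get σ*(f) < 0 and therefore s*(f) < 0, which is unreasonable for a probability!
• Fix: use the logistic function instead:
$$s(f) = \frac{1}{1 + e^{-\gamma \cdot \sigma^*(f)}}$$
  – If γ·σ*(f) ≫ 0, s(f) is very close to s*(f).
  – If γ·σ*(f) ≪ 0, s(f) is close to zero but still positive.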
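The two formulas can be compared side by side. This sketch assumes s*(f) = 1 - e^(-γ·σ*(f)) and the logistic s(f) = 1 / (1 + e^(-γ·σ*(f))) with γ = 0.3; the σ*(f) inputs are arbitrary.

```python
import math


def dampened_confidence(sigma_star: float, gamma: float = 0.3) -> float:
    """s*(f) = 1 - exp(-gamma * sigma*(f)); negative whenever sigma*(f) < 0."""
    return 1.0 - math.exp(-gamma * sigma_star)


def logistic_confidence(sigma_star: float, gamma: float = 0.3) -> float:
    """s(f) = 1 / (1 + exp(-gamma * sigma*(f))); always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-gamma * sigma_star))


for sigma_star in (-20.0, 20.0):
    print(sigma_star, dampened_confidence(sigma_star), logistic_confidence(sigma_star))
```

For σ*(f) = -20 the dampened formula gives a large negative number while the logistic version stays just above zero; for σ*(f) = 20 the two differ by less than 1e-5.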
Iterative Computation
• TRUTHFINDER: an iterative method
  – Initially, TruthFinder has little information about the web sites and the facts.
  – In each iteration, it improves its knowledge of web site trustworthiness and fact confidence.
  – It stops when the computation reaches a stable state.
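A compact sketch of the iterative computation, assuming these update rules: τ(w) = -ln(1 - t(w)); σ(f) = sum of τ(w) over the sites providing f; σ*(f) adds ρ · σ(f′) · imp(f′ → f) for other facts about the same object; s(f) = 1 / (1 + e^(-γ·σ*(f))); t(w) = average s(f) over the facts w provides. The initial trustworthiness t0 = 0.9, the stopping rule (no t(w) changes by more than `delta`), the iteration cap, and the (site, object, value) claim format are assumptions of this sketch, not details given on the slides; the implication function is supplied by the caller.

```python
import math
from collections import defaultdict
from typing import Callable

Fact = tuple[str, str]  # (object, value)


def truth_finder(claims: list[tuple[str, str, str]],
                 imp: Callable[[str, str], float],
                 rho: float = 0.5, gamma: float = 0.3, t0: float = 0.9,
                 delta: float = 1e-6, max_iter: int = 100):
    """Alternate between site trustworthiness t(w) and fact confidence s(f).

    claims: (site, object, value) triples; imp(v1, v2) gives imp(f1 -> f2)
    for two values of the same object. t0, delta, max_iter are assumed.
    """
    facts_of = defaultdict(set)     # F(w): facts provided by each site
    sites_of = defaultdict(set)     # W(f): sites providing each fact
    facts_about = defaultdict(set)  # all facts about each object
    for site, obj, value in claims:
        fact = (obj, value)
        facts_of[site].add(fact)
        sites_of[fact].add(site)
        facts_about[obj].add(fact)

    trust = {w: t0 for w in facts_of}
    conf: dict[Fact, float] = {}
    for _ in range(max_iter):
        # tau(w) = -ln(1 - t(w)); the clamp guards against t(w) rounding to 1.0.
        tau = {w: -math.log(max(1.0 - t, 1e-12)) for w, t in trust.items()}
        # sigma(f) = sum of tau(w) over the sites providing f.
        sigma = {f: sum(tau[w] for w in ws) for f, ws in sites_of.items()}
        # sigma*(f) = sigma(f) + rho * sum of sigma(f') * imp(f' -> f), same object.
        sigma_star = {
            f: sigma[f] + rho * sum(sigma[g] * imp(g[1], f[1])
                                    for g in facts_about[f[0]] if g != f)
            for f in sigma
        }
        # s(f) = 1 / (1 + exp(-gamma * sigma*(f))), always in (0, 1).
        conf = {f: 1.0 / (1.0 + math.exp(-gamma * x)) for f, x in sigma_star.items()}
        # t(w) = average confidence of the facts w provides.
        new_trust = {w: sum(conf[f] for f in fs) / len(fs)
                     for w, fs in facts_of.items()}
        change = max(abs(new_trust[w] - trust[w]) for w in trust)
        trust = new_trust
        if change < delta:  # assumed stopping rule: trust has stabilized
            break
    return trust, conf


def conflicting(v1: str, v2: str) -> float:
    """Hypothetical implication: any two different values for one object conflict."""
    return -0.5


claims = [
    ("siteA", "book1", "Jennifer Widom"),
    ("siteB", "book1", "Jennifer Widom"),
    ("siteC", "book1", "Jeffrey Ullman"),
    ("siteA", "book2", "Hector Garcia-Molina"),
    ("siteC", "book2", "Jeffrey Ullman"),
]
trust, conf = truth_finder(claims, conflicting)
print(trust)
print(conf[("book1", "Jennifer Widom")], conf[("book1", "Jeffrey Ullman")])
```

In this toy data the fact backed by two sites ends up with higher confidence, and the site that provided it gains trustworthiness, which then feeds back into that site's other facts.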
Empirical Study
• Compare with VOTING, which chooses the fact provided by the most web sites.
• Platform: Intel PC with a 1.66 GHz dual-core processor, 1 GB memory, Windows XP Professional.
• Parameters: ρ = 0.5 and γ = 0.3.
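For comparison, a minimal sketch of the VOTING baseline as described: for each object, pick the value claimed by the most web sites. The (site, object, value) claim format is an assumption for illustration, each triple is assumed to appear only once, and ties go to whichever value was seen first (the documented ordering of `collections.Counter.most_common`).

```python
from collections import Counter, defaultdict


def voting(claims: list[tuple[str, str, str]]) -> dict[str, str]:
    """VOTING baseline: for each object, choose the value provided by the most sites."""
    votes: dict[str, Counter] = defaultdict(Counter)
    for _site, obj, value in claims:
        votes[obj][value] += 1
    # most_common(1) returns [(value, count)]; equal counts keep first-seen order.
    return {obj: counts.most_common(1)[0][0] for obj, counts in votes.items()}


claims = [
    ("siteA", "book1", "Jennifer Widom"),
    ("siteB", "book1", "Jennifer Widom"),
    ("siteC", "book1", "Jeffrey Ullman"),
]
print(voting(claims))  # {'book1': 'Jennifer Widom'}
```

Unlike TruthFinder, VOTING treats every site as equally reliable and ignores implications between similar facts.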
Conclusions
• Introduce and formulate the Veracity problem
  – Resolving conflicting facts from multiple web sites.
  – Finding the true facts among them.
• Propose TRUTHFINDER
  – Utilizes web site trustworthiness and fact confidence to find trustworthy web sites and true facts.
• Experiments show that TRUTHFINDER achieves high accuracy.