© 2013 Columbia University
E6885 Network Science Lecture 11: Knowledge Graphs
E 6885 Topics in Signal Processing -- Network Science
Ching-Yung Lin, Dept. of Electrical Engineering, Columbia University
November 25th, 2013
Course Structure
Class Date Lecture Topics Covered
09/09/13 1 Overview of Network Science
09/16/13 2 Network Representation and Feature Extraction
09/23/13 3 Network Partitioning, Clustering and Visualization
09/30/13 4 Network Analysis Use Case
10/07/13 5 Network Sampling, Estimation, and Modeling
10/14/13 6 Network Topology Inference
10/21/13 7 Network Information Flow
10/28/13 8 Dynamic & Probabilistic Networks and Graph Database
11/11/13 9 Final Project Proposal Presentation
11/18/13 10 Graph Databases II
11/25/13 11 Knowledge Graphs
12/02/13 12 Large-Scale Network Processing System
12/09/13 13 Final Project Presentation – I
12/16/13 14 Final Project Presentation – II
Relational Term-Suggestion
Q: What keywords should I put in the search box to get the information I really want?
Term Suggestion and Query Expansion
Document-based: influenced by test collection characteristics
Log-based: click logs are biased in favor of top ranks; query logs fail for rare queries; logs are not publicly available
Ontology-based (WordNet, Wikipedia): simple concept links only; limited semantic relatedness; difficult to update
Network/community-based (multi-partite network analytics): extracting the human factor; incorporating expertise
Document-based
Influenced by test collection characteristics
No consideration of key terms that are highly semantically related but do not frequently co-occur.
Example: apple juice, apple tree, apple store, apple TV
Kim, M. AND Choi, K. A. 1999. Comparison of collocation-based similarity measures in query expansion. Information Processing and Management 35 (1999), 19-30.
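To make the co-occurrence limitation concrete, here is a minimal sketch of document-based suggestion, assuming naive whitespace tokenization and raw document co-occurrence counts (a simplification, not the collocation measures compared by Kim and Choi). The function name and the mini-collection are hypothetical.

```python
from collections import Counter

def cooccurrence_suggestions(docs, query, top_k=3):
    """Suggest the terms that share the most documents with `query`."""
    counts = Counter()
    for doc in docs:
        terms = set(doc.lower().split())   # naive whitespace tokenization
        if query in terms:
            counts.update(terms - {query})  # count every co-occurring term once per document
    return [term for term, _ in counts.most_common(top_k)]

# Hypothetical mini-collection.
docs = [
    "apple juice recipe",
    "apple tree pruning",
    "apple store opening hours",
    "apple tv remote",
    "pear juice recipe",
]
print(cooccurrence_suggestions(docs, "apple", top_k=10))
```

Here "pear" is never suggested for "apple", even though both are fruits, because the two never appear in the same document.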
Log-based
Cluster queries with similar clicked URLs
Identify the mapping between queries and clicked URLs
Example: "pet food" and "dog food"
BAEZA-YATES, R., AND TIBERI, A. 2007. Extracting Semantic Relations from Query Logs. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007), 76-85.
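A minimal sketch of clustering queries by their clicked URLs, assuming the click log is a list of (query, clicked URL) pairs and that two queries are related when the cosine similarity of their click-count vectors reaches a threshold. The threshold, the function names, and the URLs are hypothetical; this is an illustration of the idea, not the exact method of Baeza-Yates and Tiberi.

```python
import math
from collections import defaultdict

def click_vectors(click_log):
    """Map each query to a {url: click_count} vector built from (query, url) pairs."""
    vectors = defaultdict(lambda: defaultdict(int))
    for query, url in click_log:
        vectors[query][url] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as dicts."""
    dot = sum(count * v.get(url, 0) for url, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def related_queries(click_log, query, threshold=0.5):
    """Queries whose click vectors have cosine similarity >= threshold with `query`."""
    vecs = click_vectors(click_log)
    if query not in vecs:
        return []
    return sorted(other for other in vecs
                  if other != query and cosine(vecs[query], vecs[other]) >= threshold)

# Hypothetical click log; the URLs are made up.
log = [
    ("pet food", "shop-a.example/kibble"), ("pet food", "shop-b.example/dog"),
    ("dog food", "shop-a.example/kibble"), ("dog food", "shop-b.example/dog"),
    ("dog food", "brand-c.example"), ("cat toys", "shop-d.example/toys"),
]
print(related_queries(log, "pet food"))  # ['dog food']
```

The sketch also shows the weakness listed above: a rare query with few or no clicks has an empty or tiny vector and gets no reliable neighbors.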
WordNet as Ontology
Manually constructed system based on individual words, so its benefit is limited
System is not easily updated
Pedersen, T, Patwardhan, S and Michelizzi, J. "WordNet::Similarity - Measuring the Relatedness of Concepts" 2004 In Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-2004) pp. 1024-1025.
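The "simple concept links" limitation can be seen in path-based relatedness, where two concepts are related only through the hand-built is-a hierarchy. The sketch below computes 1 / (1 + number of edges on the shortest is-a path), in the spirit of the path measure offered by WordNet::Similarity; the toy hierarchy is hypothetical and stands in for WordNet's hypernym links.

```python
from collections import deque

# Hypothetical toy is-a hierarchy (child -> parent), standing in for WordNet hypernyms.
HYPERNYM = {
    "dog": "canine", "wolf": "canine", "canine": "carnivore",
    "cat": "feline", "feline": "carnivore", "carnivore": "animal",
    "apple": "fruit", "fruit": "food", "food": "entity", "animal": "entity",
}

def neighbors(node):
    """Parent and children of a node; is-a links are treated as undirected edges."""
    result = [HYPERNYM[node]] if node in HYPERNYM else []
    result += [child for child, parent in HYPERNYM.items() if parent == node]
    return result

def path_similarity(a, b):
    """1 / (1 + edges on the shortest is-a path between a and b), or 0 if unconnected."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return 1.0 / (1 + dist)
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0

print(path_similarity("dog", "wolf"), path_similarity("dog", "apple"))
```

Any relation not encoded as an is-a link (for example, "dog" and "leash") is invisible to this kind of measure, and adding it requires editing the hierarchy by hand.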
Wikipedia as Ontology
Wikipedia is a web-based free encyclopedia that anyone can edit.
The English Wikipedia edition has 2.4 million articles and 1 billion words.
Wikipedia relies on the power of collective intelligence, through peer review rather than the authority of individuals, resulting in content that is high quality and almost noise free.
Wikipedia as Ontology
Previous Approaches
Used merely as an online dictionary, i.e., only as a structured knowledge database
Using associated hyperlinks
MILNE, D., WITTEN, I. H., AND NICHOLS, D. 2007. A Knowledge-Based Search Engine Powered by Wikipedia. In Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM 2007), 445-454.
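One simple way to use associated hyperlinks is to score two articles by how many linking articles they share. The sketch below uses the Jaccard overlap of incoming-link sets; this measure and the link data are hypothetical illustrations, not the measure of the cited paper.

```python
def link_relatedness(inlinks, a, b):
    """Jaccard overlap of the sets of articles that link to `a` and to `b`."""
    A, B = inlinks.get(a, set()), inlinks.get(b, set())
    union = A | B
    return len(A & B) / len(union) if union else 0.0

# Hypothetical incoming links (article -> set of articles linking to it).
inlinks = {
    "Apple Inc.": {"iPhone", "Steve Jobs", "Mac OS", "Apple TV"},
    "Steve Wozniak": {"Steve Jobs", "Apple II", "Mac OS"},
    "Apple (fruit)": {"Orchard", "Cider", "Apple pie"},
}
print(link_relatedness(inlinks, "Apple Inc.", "Steve Wozniak"))
```

Such link overlap only sees the link structure; it says nothing about who wrote the articles, which is the human factor the multi-partite approach adds.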
Our Challenge
Term Suggestion and Query Expansion
Document-based: influenced by test collection characteristics
Log-based: click logs are biased in favor of top ranks; query logs fail for rare queries; logs are not publicly available
Ontology-based (WordNet, Wikipedia): simple concept links only; limited semantic relatedness; difficult to update
Multi-partite network analytics: crawling is resource-intensive; human factor modeling; semantic relatedness is difficult to evaluate
Wikipedia as Ontology
Query
Contributor Expertise Analysis
Optimization
Relative Importance Ranking
Visualization Interface
Evaluation Interface
Ontology Data Sampling
Semantic Relatedness Weighting
[Figure: multi-partite network built layer by layer around a key term; node types are C: contributors, T: terms, L: categories]
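The layer-by-layer construction around a key term can be sketched as a breadth-first expansion with a layer cap. This assumes the multi-partite network is given as undirected edges whose node names carry a type prefix (T: term, C: contributor, L: category); the edges, the cap, and the function names are hypothetical, and the system's actual sampling strategy is not specified on the slide.

```python
from collections import defaultdict, deque

def build_adjacency(edges):
    """Undirected adjacency sets from (node, node) edges."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def expand_layers(adjacency, key_term, max_layers=2):
    """Breadth-first expansion from the key term; returns {node: layer index}."""
    layer = {key_term: 0}
    queue = deque([key_term])
    while queue:
        node = queue.popleft()
        if layer[node] == max_layers:
            continue  # do not expand beyond the last layer
        for nxt in adjacency.get(node, ()):
            if nxt not in layer:
                layer[nxt] = layer[node] + 1
                queue.append(nxt)
    return layer

# Hypothetical edges; prefixes mark node type (T: term, C: contributor, L: category).
edges = [
    ("T:jaguar", "C:alice"), ("T:jaguar", "L:Big cats"),
    ("C:alice", "T:leopard"), ("L:Big cats", "T:cheetah"),
    ("T:leopard", "C:bob"),
]
layers = expand_layers(build_adjacency(edges), "T:jaguar", max_layers=2)
print(sorted(n for n, d in layers.items() if n.startswith("T:") and d > 0))
# ['T:cheetah', 'T:leopard']
```

Terms reached through a contributor ("leopard" via "alice") or through a category ("cheetah" via "Big cats") become candidates even if they never co-occur with the key term in a document.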
Contributor Expertise Factor
Expertise inference
Relations: contributor to contributor, contributor to categories, term to categories, term to term
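The slide names the relation types but not the weighting formula, so the following is a hypothetical sketch of how contributor expertise could weight a term-to-term link. It assumes each term belongs to one category, defines a contributor's expertise in a category as the share of that contributor's edited terms in the category (contributor to categories), and sums, over contributors who edited both terms (contributor to term), their mean expertise in the two terms' categories.

```python
def category_expertise(edits, term_category):
    """expertise[c][L] = share of contributor c's edited terms that belong to category L."""
    expertise = {}
    for contributor, terms in edits.items():
        counts = {}
        for term in terms:
            cat = term_category[term]
            counts[cat] = counts.get(cat, 0) + 1
        expertise[contributor] = {cat: n / len(terms) for cat, n in counts.items()}
    return expertise

def term_weight(edits, term_category, t1, t2):
    """Sum, over contributors who edited both terms, of their mean expertise
    in the two terms' categories (a hypothetical weighting)."""
    expertise = category_expertise(edits, term_category)
    weight = 0.0
    for contributor, terms in edits.items():
        if t1 in terms and t2 in terms:
            e = expertise[contributor]
            weight += (e[term_category[t1]] + e[term_category[t2]]) / 2
    return weight

term_category = {"jaguar": "Big cats", "leopard": "Big cats", "cheetah": "Big cats",
                 "sedan": "Cars", "coupe": "Cars"}
edits = {"alice": {"jaguar", "leopard", "cheetah"},   # focused on Big cats
         "bob": {"leopard", "sedan", "coupe"}}         # mostly Cars
print(term_weight(edits, term_category, "jaguar", "leopard"))  # 1.0
print(term_weight(edits, term_category, "leopard", "sedan"))
```

A link co-edited by a focused contributor ("alice") gets a higher weight than one co-edited by a contributor whose edits are spread across categories.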
High Semantic Relatedness Term Suggestion from Our System
Word-completion Term Suggestion
Method          P@1     P@5     S@5     S@20    MRR
Simple link     0.3736  0.3039  0.6017  0.6231  0.4023
+Contributor    0.6151  0.3917  0.8031  0.8116  0.4125
+Expertise      0.6693  0.4412  0.8297  0.9620  0.5919
Performance comparison for different relationship levels, using the BibSonomy dataset
Experiment I
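For reference, the metrics in the table can be computed as below. This sketch uses the standard definitions and assumes S@k means "success at k" (the fraction of queries with at least one relevant suggestion in the top k); P@k divides by k even when fewer than k suggestions are returned. The example runs are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k suggestions that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def success_at_k(ranked, relevant, k):
    """1 if any of the top-k suggestions is relevant, else 0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant suggestion, or 0 if none is relevant."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def evaluate(runs, k_values=(1, 5)):
    """runs: list of (ranked_suggestions, relevant_set) pairs, one per query."""
    n = len(runs)
    scores = {}
    for k in k_values:
        scores[f"P@{k}"] = sum(precision_at_k(r, rel, k) for r, rel in runs) / n
        scores[f"S@{k}"] = sum(success_at_k(r, rel, k) for r, rel in runs) / n
    scores["MRR"] = sum(reciprocal_rank(r, rel) for r, rel in runs) / n
    return scores

# Two hypothetical queries: relevant item ranked first, and ranked third.
runs = [(["a", "b", "c"], {"a"}), (["x", "y", "z"], {"z"})]
print(evaluate(runs, k_values=(1, 3)))
```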
Category                      WordNet       Bag of words   Our algorithm
Literature                    62.0% ± 5%    62.7% ± 4%     76.8% ± 6%
Natural science               60.7% ± 4%    65.6% ± 6%     73.3% ± 3%
Sociology                     72.1% ± 5%    62.9% ± 5%     72.5% ± 7%
Business                      60.4% ± 6%    58.5% ± 8%     67.1% ± 7%
Law                           52.2% ± 9%    50.4% ± 8%     66.3% ± 6%
Engineering                   54.0% ± 6%    68.3% ± 5%     66.2% ± 4%
Electrical & Computer Eng.    77.0% ± 4%    68.0% ± 3%     82.3% ± 3%
Life Science                  73.1% ± 6%    70.9% ± 6%     81.4% ± 7%
Agriculture                   72.6% ± 5%    65.1% ± 6%     72.3% ± 5%
Medical                       63.0% ± 8%    65.6% ± 7%     61.6% ± 8%
On average, ODP-based precision increases by 12.5%.
Experiment II – Accuracy on different categories
Method          Synonyms   Hyponymy   Antonyms   Paraphrase
Zhao et al.     -          -          -          0.7444
Our approach    0.2197     0.3665     0.2313     -
Precision Comparison with Paraphrase Detection System
82% of the suggested terms are reported as related: synonyms (22%), hyponyms (37%), or antonyms (23%).
References
Jyh-Ren Shieh, Ching-Yung Lin, Shun-Xuan Wang, Ja-Ling Wu, “Relational Term-Suggestion Graphs Incorporating Multi-Partite Concept and Expertise Networks,” ACM Transactions on Intelligent Systems and Technology (2012).
Jyh-Ren Shieh, Ching-Yung Lin, Shun-Xuan Wang, Ja-Ling Wu, “Building Multi-Modal Relational Graphs for Multimedia Retrieval,” International Journal of Multimedia Data Engineering and Management (IJMDEM): pp. 19-41 (2011). Best paper award nomination.
Jyh-Ren Shieh, Yung-Huan Hsieh, Yang-Ting Yeh, Tse-Chung Su, Ching-Yung Lin, Ja-Ling Wu, “Building term suggestion relational graphs from collective intelligence,” World Wide Web Conference (WWW 2009) pp. 1091-1092 (2009).
Jyh-Ren Shieh, Yang-Ting Yeh, Chih-Hung Lin, Ching-Yung Lin and Ja-Ling Wu, “Using Semantic Graphs for Image Search,” IEEE International Conference on Multimedia & Expo (ICME 2008), pp. 105-108 (2008).
Part-based Object Detection by Learning Random Attributed Graphs
Ref: DQ Zhang and SF Chang, “Detecting image near-duplicate by stochastic attributed relational graph matching with learning”, ACM MM 2004.
Problem 1 : Object Detection and Part Identification
a. Does the input image contain the specified object?
b. Where are the object parts?
Problem 2 : Learning Part-based Object Model
Automatically learn the structure and parameters
Minimum supervision: no annotation of object location or part locations
Prior Work on Part-based Object Detection
Model without spatial structure: AdaBoost [Viola & Jones '01]
Model with hand-built structure: MRF model [Li '94]; Pictorial structure [Felzenszwalb & Huttenlocher '98]; Elastic Bunch Graph [Wiskott et al. '97]
Model with learned structure and part statistics: Constellation Model [Burl, Weber, Fergus, Perona; Caltech, Oxford '98-'04]
This new model: graph-based representation; can handle multi-view object detection
Part-based Representation of Visual Scene
Visual scenes are considered as compositions of parts with certain spatial/attribute relations, modeled as an Attributed Relational Graph (ARG): vertices represent parts, and edges represent part relations.
Image near-duplicate (IND) detection is cast as computing the similarity between two ARGs.
ARG based on Interest Point Detection
Region-based representations performed very poorly, so interest points are used instead.
Interest point detector: SUSAN (Smallest Univalue Segment Assimilating Nucleus) corner detector
Local features at vertices: spatial location, color, Gabor filter coefficients
Part relational features at edges: spatial coordinate difference
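A minimal sketch of the ARG data structure described above. It assumes the interest points and their local features (for example color and Gabor coefficients) have already been computed, since the SUSAN detector itself is not implemented here, and it connects every pair of vertices, which is an assumption; the class and function names are hypothetical.

```python
import itertools
from dataclasses import dataclass, field

@dataclass
class ARG:
    vertex_attrs: list                               # one attribute tuple per interest point
    edge_attrs: dict = field(default_factory=dict)   # (i, j) -> relational attribute

def build_arg(points):
    """points: list of (x, y, local_features) tuples from an interest-point detector."""
    # Vertex attributes: spatial location followed by the local feature values.
    vertex_attrs = [(x, y, *feats) for x, y, feats in points]
    edge_attrs = {}
    for i, j in itertools.combinations(range(len(points)), 2):
        (xi, yi, _), (xj, yj, _) = points[i], points[j]
        edge_attrs[(i, j)] = (xj - xi, yj - yi)      # spatial coordinate difference
    return ARG(vertex_attrs, edge_attrs)

# Hypothetical interest points: (x, y, (color, gabor)).
points = [(10, 20, (0.5, 0.1)), (30, 25, (0.4, 0.2)), (12, 40, (0.9, 0.3))]
arg = build_arg(points)
print(arg.edge_attrs[(0, 1)])  # (20, 5)
```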
Stochastic Framework for ARG Similarity
Hypotheses H: H = 1, Graph t is similar to Graph s; H = 0, Graph t is not similar to Graph s
The stochastic process that transforms ARG s (attributes $Y^s$) into ARG t (attributes $Y^t$) consists of two steps: vertex correspondence and attribute transformation.
ARG similarity is the likelihood, or the likelihood ratio, of the stochastic process that transforms the source ARG into the target ARG.
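In symbols, with $Y^s$ and $Y^t$ the attribute sets of the source and target ARGs, the likelihood-ratio form of the similarity can be written as follows (a sketch; the symbol $S(s,t)$ is introduced here for illustration):

$$
S(s,t) = \frac{p(Y^t \mid Y^s, H=1)}{p(Y^t \mid Y^s, H=0)}
$$

A ratio well above 1 favors H = 1 (the graphs are similar); using the numerator alone gives the plain likelihood.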
Non-linear Scene Transformation
Scene changes: object movement, occlusion, etc.
Camera changes: viewpoint change, panning, etc.
Photometric changes: lighting, etc.
Digitization changes: resolution, gray scale, etc.
Vertex correspondence models occlusion and addition of objects.
Attribute transformation (Graph s, $Y^s$ to Graph t, $Y^t$) models object appearance change, object movement, and photometric change.
Generative Model of the Stochastic Transformation Process
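H: hypothesis; H = 1: the two graphs (Graph s and Graph t) are similar
X: correspondence matrix, $X = \{x_{11}, x_{12}, \ldots, x_{32}\}$, where $x_{iu} = 1$ if vertex $i$ of Graph s corresponds to vertex $u$ of Graph t
[Figure: Graph s (vertices 1, 2, 3), Graph t (vertices 1, 2), and their product graph with correspondence nodes such as $x_{11}$ and $x_{32}$]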
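Transformation Likelihood
Prior MRF for constraints: pairwise potentials on the product-graph correspondence nodes enforce one-to-one matching, e.g.
$\psi_{11,12}(x_{11}, x_{12}) = 0$ for $x_{11} = x_{12} = 1$ (vertex 1 of Graph s cannot correspond to both vertex 1 and vertex 2 of Graph t)
$\psi_{11,22}(x_{11}, x_{22}) = 1$ for $x_{11} = x_{22} = 1$ (compatible correspondences)
Conditional density for attribute transformation
[Figure: Graph s (vertices 1, 2, 3), Graph t (vertices 1, 2), and product-graph correspondence nodes such as $x_{11}$ and $x_{32}$]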
Transformation likelihood is:
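One standard way to write it, assuming the two factors above combine by marginalizing over the correspondence matrix $X$:

$$
p(Y^t \mid Y^s, H=1) = \sum_{X} p(Y^t \mid Y^s, X, H=1)\, p(X \mid H=1)
$$

For graphs as small as the 3-vertex/2-vertex example, the sum can be enumerated directly. The toy sketch below makes several assumptions that are not from the slides: only vertex attributes are used (edge attributes are ignored); a matched target attribute follows an isotropic Gaussian around its source attribute; an unmatched target vertex (an "added" part) gets a flat `background` density; the prior is uniform over correspondences satisfying the one-to-one constraint; and under H = 0 every target vertex is drawn from the background. Exhaustive enumeration is only practical at toy sizes.

```python
import itertools
import math

def gaussian(diff, sigma):
    """Isotropic Gaussian density of an attribute-difference vector."""
    d = len(diff)
    sq = sum(v * v for v in diff)
    return math.exp(-sq / (2 * sigma ** 2)) / ((2 * math.pi * sigma ** 2) ** (d / 2))

def one_to_one_matchings(n_s, n_t):
    """Yield correspondences {target u: source i}; each vertex is used at most once."""
    for size in range(min(n_s, n_t) + 1):
        for targets in itertools.combinations(range(n_t), size):
            for sources in itertools.permutations(range(n_s), size):
                yield dict(zip(targets, sources))

def transformation_likelihood(ys, yt, sigma=1.0, background=0.01):
    """Sum over X of p(Y^t | Y^s, X) p(X), with a uniform prior over valid X."""
    matchings = list(one_to_one_matchings(len(ys), len(yt)))
    prior = 1.0 / len(matchings)
    total = 0.0
    for match in matchings:
        lik = 1.0
        for u, attr_t in enumerate(yt):
            if u in match:
                attr_s = ys[match[u]]
                lik *= gaussian([a - b for a, b in zip(attr_t, attr_s)], sigma)
            else:
                lik *= background  # unmatched target vertex
        total += lik * prior
    return total

def likelihood_ratio(ys, yt, sigma=1.0, background=0.01):
    """H=1 likelihood divided by an H=0 model that draws every target vertex from background."""
    return transformation_likelihood(ys, yt, sigma, background) / background ** len(yt)

# Hypothetical 2-D vertex attributes: Graph s has 3 vertices, Graph t has 2.
ys = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]
yt_similar = [(0.1, 0.0), (5.0, 5.2)]
yt_different = [(20.0, 20.0), (-15.0, 3.0)]
print(likelihood_ratio(ys, yt_similar), likelihood_ratio(ys, yt_different))
```

For a 3-vertex source and a 2-vertex target there are 13 valid partial one-to-one correspondences, so direct enumeration is cheap here but grows combinatorially with graph size.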
Learning to Match ARGs
Feature point level learning: label every feature point pair (vertex-level annotation)
Image level learning: label duplicate pairs (positive samples) and non-duplicate pairs (negative samples); use variational Expectation-Maximization (E-M)
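With vertex-level annotation, parameters of the attribute-transformation density can be fit directly from the labeled pairs. The sketch below assumes the toy model of a zero-mean isotropic Gaussian on attribute differences and computes the maximum-likelihood estimate of its scale σ; the function name and data are hypothetical, and image-level learning with variational E-M is not shown.

```python
import math

def estimate_sigma(labeled_pairs):
    """MLE of sigma for zero-mean isotropic Gaussian attribute differences.

    labeled_pairs: list of (source_attr, target_attr) for vertices annotated
    as corresponding. Every attribute dimension counts as one sample.
    """
    sq, count = 0.0, 0
    for attr_s, attr_t in labeled_pairs:
        for a, b in zip(attr_t, attr_s):
            sq += (a - b) ** 2
            count += 1
    return math.sqrt(sq / count)

# Hypothetical annotated correspondences (2-D attributes).
pairs = [((0, 0), (1, 0)), ((2, 2), (2, 1))]
print(estimate_sigma(pairs))
```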
Experiments and Results
Data set: images picked from TREC-VID 2003 video frames (partly based on TDT2 topic detection ground truth); 150 duplicate pairs, 300 non-duplicate images
Learning training set: 30 duplicate pairs, 60 non-duplicate images
Feature point level learning: 5 duplicate pairs, 10 non-duplicate images
Image level learning: 25 duplicate pairs, 50 non-duplicate images
[Figure: parameters progressing from initial parameters, through feature point level learning and image level learning, to final parameters]
Compare with other similarity measures
(CH) HSV color histogram
(LED) Local Edge Descriptor
(AFDIP) Average feature distance of interest points
(GRAPH) ARG matching with learning
(GRAPH-M) ARG matching with manual parameter adjustment
[Figure: precision-recall curves for the five similarity measures]
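Precision-recall curves like these are obtained by sweeping a threshold over the similarity scores of image pairs. A minimal sketch, assuming each pair has a similarity score and a ground-truth near-duplicate label (the scores below are hypothetical):

```python
def precision_recall_curve(scores, labels):
    """scores: similarity per image pair; labels: True if the pair is a near-duplicate.

    Returns (threshold, precision, recall) for each distinct score used as a threshold.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one near-duplicate pair")
    curve = []
    for thr in sorted(set(scores), reverse=True):
        predicted = [lab for s, lab in zip(scores, labels) if s >= thr]  # pairs declared duplicates
        tp = sum(predicted)
        curve.append((thr, tp / len(predicted), tp / total_pos))
    return curve

for point in precision_recall_curve([0.9, 0.8, 0.4, 0.3], [True, False, True, False]):
    print(point)
```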
Summary of Part-based ARG Visual Modeling Algorithm
● The statistical part-based similarity measure performs much better than the global color histogram and the grid-based edge map.
● Learning-based ARG matching not only saves human effort but may also give better performance.
Questions?