Creating Semantic Fingerprints for Web Documents
Post on 22-Jan-2018
100.00.2009 OVGU Präsentation
/22Katrin Krieger – Creating Semantic Fingerprints for Web Documents 1
Creating Semantic Fingerprints for Web Resources
Katrin Krieger, Jens Schneider, Christian Nywelt, Dietmar Rösner
Otto-von-Guericke Universität Magdeburg (Germany)
Motivation
• Automatically extracting information and generating formal
semantic descriptions are important aspects of Semantic Web
research
[Figure labels: query, compare, combine]
Semantic Fingerprints (SF)
• Semantic signatures of Web documents
• Representing concepts to be found in documents as well as
relationships between these concepts
• Graph structures with concepts as nodes and relationships as
edges
• Can be used to compute semantic relatedness, e.g. in e-learning
scenarios
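A minimal sketch of the graph structure described above, with concepts as nodes and labeled relationships as edges. The class name, the example concepts, and the Jaccard overlap used as a relatedness measure are assumptions for illustration; the slides do not specify how relatedness is computed.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticFingerprint:
    """Undirected graph: concepts are nodes, relationships are labeled edges."""
    concepts: set = field(default_factory=set)
    # Each edge is stored as (frozenset({a, b}), relation_label).
    relations: set = field(default_factory=set)

    def add_relation(self, a: str, b: str, label: str) -> None:
        self.concepts.update((a, b))
        self.relations.add((frozenset((a, b)), label))


def jaccard_relatedness(x: SemanticFingerprint, y: SemanticFingerprint) -> float:
    """Assumed measure: shared concepts divided by all concepts of both graphs."""
    union = x.concepts | y.concepts
    return len(x.concepts & y.concepts) / len(union) if union else 0.0


# Hypothetical fingerprints of two e-learning documents.
doc_a = SemanticFingerprint()
doc_a.add_relation("Haskell", "Functional programming", "paradigm")
doc_a.add_relation("Fold", "Higher-order function", "type")

doc_b = SemanticFingerprint()
doc_b.add_relation("Haskell", "Functional programming", "paradigm")
doc_b.add_relation("Haskell", "Monad", "feature")

relatedness = jaccard_relatedness(doc_a, doc_b)  # 2 shared of 5 concepts
```

A measure that also compares edges would make use of the relationship information; the concept-only overlap is simply the shortest variant to show.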
Desired Properties of Semantic Fingerprints
P1 Concepts are distinct and unambiguous
P2 Concepts are connected through relationships
P3 Documents with similar content will
yield similar SF
P4 A SF covers all essential concepts
belonging to a document
General Idea
• Hypothesis: semantically related concepts of a domain are
connected through relationships
• This information is inherent in LOD datasets which we can exploit
to disambiguate concepts
• This information is sufficient to build semantic fingerprints
How to automatically obtain a Semantic Fingerprint
1. Extract keywords from Web document
2. Create nodes by mapping keywords to semantic concepts
3. Add edges by finding relations
4. Remove irrelevant nodes and edges
5. Identify all connected subgraphs
6. Choose semantic fingerprint from connected subgraphs
Extracting Keywords and Mapping to Concepts
• Use Natural Language Processing (NLP) tools to extract nouns and
noun phrases
• Query dataset to find concepts whose labels correspond with
keywords
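The label lookup can be sketched as follows, assuming the keywords have already been extracted by an NLP tool. The in-memory `LABELS` table and its `ex:` identifiers are hypothetical stand-ins for a LOD dataset that would normally be queried remotely.

```python
# Hypothetical toy dataset: concept identifier -> human-readable labels.
LABELS = {
    "ex:Fold_(higher-order_function)": ["fold", "reduce"],
    "ex:Fold_(geology)": ["fold"],
    "ex:Haskell_(programming_language)": ["haskell"],
    "ex:Haskell_Curry": ["haskell curry", "haskell"],
}


def normalize(text):
    """Lower-case and collapse whitespace so labels compare reliably."""
    return " ".join(text.lower().split())


def candidate_concepts(keywords, labels=LABELS):
    """Map each keyword to every concept whose label matches it exactly."""
    result = {}
    for keyword in keywords:
        key = normalize(keyword)
        result[keyword] = sorted(
            concept
            for concept, names in labels.items()
            if key in (normalize(name) for name in names)
        )
    return result


candidates = candidate_concepts(["Haskell", "fold", "prove"])
```

An ambiguous keyword such as "fold" maps to several candidate concepts here, and a keyword without a matching label maps to none. Disambiguation is left to the later steps.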
Result of Step #1
• Disconnected graph with n concepts per keyword
Find relationships
• Expand each node and search for neighboring concepts to “grow”
the graph (BFS) up to a certain path length n
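A sketch of the breadth-first expansion, assuming a hypothetical `NEIGHBORS` table in place of dataset queries. It starts from all mapped concepts at once and stops expanding a node once it is `max_depth` hops away from its nearest seed; whether the original system bounds the search in exactly this way is an assumption.

```python
from collections import deque

# Hypothetical neighborhood information, normally obtained from the dataset.
NEIGHBORS = {
    "Haskell": ["Functional programming", "Haskell Curry"],
    "Functional programming": ["Haskell", "Higher-order function"],
    "Higher-order function": ["Functional programming", "Fold"],
    "Fold": ["Higher-order function"],
    "Haskell Curry": ["Haskell", "Logic"],
    "Logic": ["Haskell Curry"],
}


def expand(seeds, neighbors, max_depth):
    """Grow the graph from the seeds by BFS, at most max_depth hops per seed."""
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    edges = set()
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # frontier node: keep it, but do not expand further
        for nxt in neighbors.get(node, []):
            edges.add(frozenset((node, nxt)))
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth, edges


depth1, edges1 = expand(["Haskell", "Fold"], NEIGHBORS, max_depth=1)
depth2, edges2 = expand(["Haskell", "Fold"], NEIGHBORS, max_depth=2)
```

With `max_depth=1` the two seeds stay in separate subgraphs; with `max_depth=2` the edge between "Functional programming" and "Higher-order function" is found and connects them.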
Result of Step #2
• Graph with connected subgraphs
Removing irrelevant nodes and edges
Which nodes and edges are really relevant for the semantic
fingerprint?
Heuristics:
• Path length
• Number of connecting paths
• Occurrences in paths
• Number of corresponding keywords
• Interconnection property
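The slides name the heuristics without defining them, so the following is only one simple pruning rule in the same spirit, not the authors' procedure: it repeatedly removes expansion nodes that have at most one neighbor, since such dead ends cannot lie on any path connecting two mapped concepts. The function name and the example graph are assumptions.

```python
def prune_dead_ends(adjacency, seeds):
    """Repeatedly drop non-seed nodes that have at most one neighbor."""
    graph = {node: set(nbrs) for node, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in list(graph):
            if node not in seeds and len(graph[node]) <= 1:
                # Detach the node from its remaining neighbor, if any.
                for nbr in graph.pop(node):
                    graph[nbr].discard(node)
                changed = True
    return graph


# Hypothetical expanded graph (symmetric adjacency sets).
expanded = {
    "Haskell": {"Functional programming", "Haskell Curry"},
    "Functional programming": {"Haskell", "Higher-order function"},
    "Higher-order function": {"Functional programming", "Fold"},
    "Fold": {"Higher-order function"},
    "Haskell Curry": {"Haskell", "Logic"},
    "Logic": {"Haskell Curry"},
}

pruned = prune_dead_ends(expanded, seeds={"Haskell", "Fold"})
```

In this example "Logic" is removed first, which turns "Haskell Curry" into a dead end that is removed in the next pass, leaving only the chain between the two seed concepts.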
Identifying subgraphs and picking the SF
• Identify subgraphs by performing BFS
• Determine which of the subgraphs is the semantic fingerprint
• Cover as many keywords as possible
• Number of concepts in the subgraph
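The two steps above can be sketched as a BFS-based search for connected subgraphs followed by a ranking. Ranking by keyword coverage first and by number of concepts second (larger preferred) is an assumption, as are all names and the example data; the slides only list the two criteria.

```python
from collections import deque


def connected_subgraphs(adjacency):
    """Split an undirected graph into connected components using BFS."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components


def pick_fingerprint(components, concept_keywords):
    """Prefer the subgraph covering most keywords, then the larger one."""
    def score(component):
        covered = {concept_keywords[c] for c in component if c in concept_keywords}
        return (len(covered), len(component))
    return max(components, key=score)


# Hypothetical pruned graph with two connected subgraphs.
graph = {
    "Haskell": {"Functional programming"},
    "Functional programming": {"Haskell", "Higher-order function"},
    "Higher-order function": {"Functional programming", "Fold"},
    "Fold": {"Higher-order function"},
    "Fold (geology)": {"Rock"},
    "Rock": {"Fold (geology)"},
}
# Which keyword each mapped concept originated from.
concept_keywords = {"Haskell": "haskell", "Fold": "fold", "Fold (geology)": "fold"}

components = connected_subgraphs(graph)
fingerprint = pick_fingerprint(components, concept_keywords)
```

The geology sense of "fold" ends up in a small subgraph that covers only one keyword, so the programming-related subgraph is chosen, which is how the selection also disambiguates concepts.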
Evaluation
P1 Concepts are distinct and unambiguous
P2 Concepts are connected through relationships
P3 Documents with similar content will
yield similar SF
P4 A SF covers all essential concepts
belonging to a document
Quantitative Evaluation
• P3: Documents with similar content will yield similar SF
• Extraction of 11 different KW lists from real world e-learning
documents
• Generation of SF for all KW lists
• Generation of SF for all (|KW_i| − k)-tuple subsets of each KW_i, with |KW_i|
denoting the number of keywords in KW_i and k varied from 1 to 4
• Comparison of SF of original KW lists with varied KW lists
• Number of contained concepts
• Number of common concepts
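The construction of the varied keyword lists can be sketched with `itertools.combinations`. The helper names are hypothetical, skipping empty keyword lists is an assumption, and the fingerprints being compared are represented simply as sets of concepts.

```python
from itertools import combinations


def reduced_keyword_lists(keywords, max_removed=4):
    """Yield every subset obtained by leaving out k = 1..max_removed keywords."""
    for k in range(1, max_removed + 1):
        if k >= len(keywords):
            break  # assumption: do not generate empty keyword lists
        yield from combinations(keywords, len(keywords) - k)


def common_concept_ratio(original_sf, varied_sf):
    """Share of the original fingerprint's concepts also found in the varied one."""
    if not original_sf:
        return 0.0
    return len(original_sf & varied_sf) / len(original_sf)


keywords = ["haskell", "fold", "higher order function", "prove"]
variants = list(reduced_keyword_lists(keywords))

# Hypothetical concept sets of an original and a varied fingerprint.
ratio = common_concept_ratio({"A", "B", "C", "D"}, {"A", "B", "C"})
```

For a list of four keywords this yields 4 + 6 + 4 = 14 varied lists, each of which would be passed through the fingerprint generation and compared with the fingerprint of the full list.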
Quantitative Evaluation (2)
● Number of concepts in 1992 SF varies from 0 to 22
● SF with 14-16 concepts make up 8.3%
● SF with 10-13 concepts make up 20.8%
● Grouping into bins
● Majority of SF with one KW less still have ≥90% KW in common with original SF
Qualitative Evaluation
• P1: Concepts are distinct and unambiguous
• P4: A SF covers all essential concepts belonging to a document
• Evaluation with human reviewers:
• The reviewers rated the behavior of our algorithm as comprehensible and
the fingerprints as suitable for the keyword lists
• The reviewers also found that some concepts seem to be more important
than others.
Conclusion
• New method to create a formal semantic description of
a document
• Exploits inherent properties and structures in LOD
datasets
• No need for other methods such as statistics
Open Issues
• Runtime is rather high, and the method is expensive in computing
resources
• Not all semantic relations found in the documents are present
in the dataset
• Scalability
Outlook
• Exploit DOM structure of the document
• Add weights to keywords
• Investigate other data structures and adapted expansion
algorithms
• Study other methods to capture semantic relationships from text
Thank you for your attention.
What are your questions?
img src: https://flic.kr/p/6DBVxb
katrin.krieger@ovgu.de
[Figure: SF for KW = {“haskell”, “fold”, “higher order function”, “prove”}]
In use
[Architecture diagram. Components shown: REST-based Web service (codename: Guinan); SlideshareConnector; StackOverflowConnector; FreebaseConnector; DBpediaConnector; LectureSlideConnector; educational metadata]