Inferring Web Citations using SPARQL Rules and Social Data – LUPAS2010

Inferring Web Citations using Social Data and SPARQL Rules

Matthew Rowe

Organisations, Information and Knowledge Group

University of Sheffield

Outline

• Problem Setting
  – Personal Information Dissemination
• SPARQL Rules: Identifying Web Citations
  – Generating Seed Data
  – Gathering Possible Web Citations
  – Inferring Web Citations
• Evaluation
• Conclusions
• Future Work

Personal Information on the Web

• Personal information on the Web is disseminated:
  – Voluntarily
  – Involuntarily
• Increase in personal information:
  – Identity Theft
  – Lateral Surveillance
• Web users must discover their identity web references
  – 2 stage process
    • Find possible references
    • Identify definite references

Ambiguity!

Matthew Rowe: Composer

Matthew Rowe: Cyclist

Matthew Rowe: Gardener

Matthew Rowe: Song Writer

Matthew Rowe: PhD Student

Problem Setting

• Performing identification manually:
  – Time consuming
  – Laborious
• Handle masses of information
  – Repeated often
• The Web keeps changing
• Solution = automated techniques
  – Alleviate the need for humans
  – Need background knowledge
• Who am I searching for?
• What makes them unique?

SPARQL Rules: Identifying Web Citations

Generating Seed Data

• Profiles on Social Web are leveraged as seed data
• To generate seed data:
  1. Export Social Graphs
     • Interface with the platform’s API
     • Convert proprietary response into RDF
       – Biographical Information
       – Social Network Information
  2. Enrich Graphs with URIs
  3. Interlink graphs
     • Detect equivalent foaf:Person instances
     • Builds a single social graph

http://www.dcs.shef.ac.uk/~mrowe/foafgenerator.html

Detecting equivalent foaf:Person instances:

1. Blocking Step
2. Compare values of Inverse Functional Properties
3. Compare Geo URIs
4. Compare Geo data
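The first two interlinking steps (blocking, then comparing inverse functional property values) might be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation behind the slides: people are represented as plain dictionaries rather than RDF, the blocking key (first letter of the name) is hypothetical, and the Geo URI and Geo data comparisons are left out. The listed FOAF properties are ones the FOAF vocabulary declares as inverse functional.

```python
from collections import defaultdict

# FOAF properties treated as inverse functional in this sketch.
INVERSE_FUNCTIONAL = {"foaf:mbox", "foaf:mbox_sha1sum", "foaf:homepage"}


def block_key(person):
    """Blocking step (hypothetical key): first letter of the lower-cased name."""
    return person.get("foaf:name", "").strip().lower()[:1]


def same_person(a, b):
    """True if both records share a value for any inverse functional property."""
    return any(
        prop in a and prop in b and a[prop] == b[prop]
        for prop in INVERSE_FUNCTIONAL
    )


def interlink(graph_a, graph_b):
    """Return (uri_a, uri_b) pairs judged to denote the same person."""
    # Group the second graph by blocking key so only likely pairs are compared.
    blocks = defaultdict(list)
    for uri, person in graph_b.items():
        blocks[block_key(person)].append((uri, person))

    matches = []
    for uri_a, person_a in graph_a.items():
        for uri_b, person_b in blocks.get(block_key(person_a), []):
            if same_person(person_a, person_b):
                matches.append((uri_a, uri_b))
    return matches
```

Each matched pair would then be merged so that the exported graphs collapse into a single social graph; the merge itself is not shown here.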

Gathering Possible Web Citations

• Search WWW and Semantic Web for possible citations
• Web resources come in many flavours:
  – Data Models, HTML documents, XHTML documents
• Convert into RDF
  – XHTML Documents:
    • Use GRDDL
    • Automated RDF model lifting
  – HTML Documents:
    • Apply person name gazetteer: identify person information
    • Apply Hidden Markov Model to extract information
    • Build RDF model from information

M Rowe. Data.dcs: Converting Legacy Data into Linked Data. In proceedings of Linked Data on the Web Workshop, WWW 2010. Raleigh, USA. (2010)
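As a rough illustration of the gazetteer step for HTML documents, the Python sketch below flags a known first name followed by a capitalised token as a candidate person name. The gazetteer entries and the function name are hypothetical, and the Hidden Markov Model extraction and RDF construction that follow in the pipeline are not covered.

```python
import re

# Hypothetical gazetteer; a realistic one would hold many thousands of names.
FIRST_NAMES = {"matthew", "fabio", "alice"}


def find_person_names(text, first_names=FIRST_NAMES):
    """Return 'First Last' candidates whose first token is in the gazetteer."""
    tokens = re.findall(r"[A-Za-z]+", text)
    names = []
    for first, last in zip(tokens, tokens[1:]):
        # Require a gazetteer hit followed by a capitalised token.
        if first.lower() in first_names and last[:1].isupper():
            names.append(f"{first} {last}")
    return names
```

Because the tokeniser ignores punctuation, this sketch can pair tokens across sentence boundaries; a fuller version would segment sentences first.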

Inferring Web Citations using SPARQL Rules

• Seed data = solitary example to build rules
  – State of the art rule induction strategies are limited
    • E.g. FOIL and C4.5
  – Build rules from RDF instances!

1. Extract instances from Seed Data
2. For each instance, build a rule:
   – Build a skeleton rule
   – Add triples to the rule
   – Create a new rule if a triple’s predicate is Inverse Functional
3. Apply the rules to the web resources

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
WHERE {
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
  ?url foaf:topic ?p .
  ?p foaf:name ?n
}

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
WHERE {
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
  ?url foaf:topic ?p .
  ?p foaf:name ?n .
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:knows ?q .
  ?q foaf:name ?m .
  ?url foaf:topic ?r .
  ?r foaf:name ?m
}

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
WHERE {
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
  ?url foaf:topic ?p .
  ?p foaf:name ?n .
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:knows ?q .
  ?q foaf:homepage ?h .
  ?url foaf:topic ?r .
  ?r foaf:homepage ?h
}
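The way the rules above grow from a skeleton could be sketched in Python as a small string-templating routine. This is a simplified reading of the rule-building steps, not the author's implementation: the function names are hypothetical, the variable ?v stands in for ?m or ?h, and one extra rule is emitted per acquaintance property rather than being induced from actual RDF instances.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def skeleton_patterns(me):
    """Skeleton rule: the page has a topic that shares my name."""
    return [
        f"<{me}> foaf:name ?n",
        "?url foaf:topic ?p",
        "?p foaf:name ?n",
    ]


def acquaintance_patterns(me, prop):
    """Extra patterns: the page also mentions someone I know, matched on prop."""
    return [
        f"<{me}> foaf:knows ?q",
        f"?q {prop} ?v",
        "?url foaf:topic ?r",
        f"?r {prop} ?v",
    ]


def build_rule(me, patterns):
    """Wrap triple patterns in a CONSTRUCT query asserting foaf:page."""
    body = " .\n  ".join(patterns)
    return (
        f"PREFIX foaf: <{FOAF}>\n"
        f"CONSTRUCT {{ <{me}> foaf:page ?url }}\n"
        f"WHERE {{\n  {body}\n}}"
    )


def build_rules(me, acquaintance_props):
    """Return the skeleton rule plus one extended rule per property."""
    rules = [build_rule(me, skeleton_patterns(me))]
    for prop in acquaintance_props:
        patterns = skeleton_patterns(me) + acquaintance_patterns(me, prop)
        rules.append(build_rule(me, patterns))
    return rules
```

Calling build_rules with the seed URI and ["foaf:name", "foaf:homepage"] yields three queries with the same shape as the three rules shown above; each would then be run against the RDF model of a candidate web resource.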

Evaluation

• Measures:
  – Precision, Recall, F-Measure
• Dataset
  – 50 participants from the Semantic Web and Web 2.0 communities
  – Seed data collected from Facebook and Twitter
  – ~17300 web resources: 346 web resources for each participant
• Baselines
  – Baseline 1: Person name as positive classification
    • Skeleton SPARQL Rule
  – Baseline 2: Human Processing

Results

                 Precision  Recall  F-Measure
Inference Rules  0.955      0.436   0.553
Baseline 1       0.191      0.998   0.294
Baseline 2       0.765      0.725   0.719

• High precision
  – Better than humans
  – Triple Patterns
• Low recall
  – Rules are strict
    • No room for variability
  – Hard to generalise
    • No learning from disambiguation decisions
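For reference, the three measures can be computed for a single participant as in the Python sketch below. The function name and the set-of-URLs representation are assumptions made for illustration. The reported F-Measure values are not the harmonic mean of the reported precision and recall (for the inference rules that would be roughly 0.599), which would be consistent with scores being averaged per participant; the slides do not state this, so treat it as a guess.

```python
def precision_recall_f1(predicted, actual):
    """Score one participant.

    predicted: set of URLs the method labelled as citing the person.
    actual: set of URLs that truly cite the person.
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

A strict rule set that returns two pages, both correct, out of four true citations scores precision 1.0 and recall 0.5, the same high-precision, low-recall pattern as in the table.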

Conclusions

• SPARQL Rules are precise
  – Poor generalisation however
  – Outperform humans at low web presence levels
    • “Needle in a haystack problem”
• User profiles provide seed data
  – Inexpensively
  – Capturing:
    • Biographical information
    • Social networking information
• Inability to learn from identifications
  – Plan for future work
  – Overcome poor seed data feature coverage

Questions?

Twitter: @mattroweshow
Web: http://www.dcs.shef.ac.uk/~mrowe
Email: m.rowe@dcs.shef.ac.uk

For more information:

M Rowe and F Ciravegna. Disambiguating Identity Web References using Web 2.0 Data and Semantics. In Press for special issue on "Web 2.0" in the Journal of Web Semantics. (2010)