Publishing Technology Online Forum - Engineering the semantic web

DESCRIPTION

Priya Parvatikar of Publishing Technology demonstrates how the company is implementing semantic web technologies in the online publishing website of new publisher GSE Research.

TRANSCRIPT

Page 1: Publishing Technology Online Forum - Engineering the semantic web

innovation. quality. service

“Enabling clients to realize the full potential of their content and increase efficiency throughout their enterprise.”

Engineering technology to deliver the revolution

Presentation to Online Publishers’ forum

November 29, 2011

Priya Parvatikar, Technical Architect

Page 2: Publishing Technology Online Forum - Engineering the semantic web

About this talk

• Features of the GSE Research website

• Overview of how the features have been achieved

• ‘Under the hood’ look at the technology

Page 3: Publishing Technology Online Forum - Engineering the semantic web

Improved search - Enhancing auto-suggest

Page 4: Publishing Technology Online Forum - Engineering the semantic web

Using taxonomy information for “did you mean”
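One way taxonomy terms could drive a "did you mean" suggestion is fuzzy matching of the query against the list of concept labels. The sketch below is only an illustration using Python's standard `difflib`; it is not the mechanism used on the GSE Research site, and the term list, the function name `did_you_mean` and the 0.8 cutoff are assumptions.

```python
import difflib
from typing import List, Optional

# Concept labels taken from the taxonomy example in this talk (lowercased).
TAXONOMY_TERMS: List[str] = [
    "water pollution",
    "air pollution",
    "marine pollution",
    "estuarine pollution",
    "oil pollution of water",
]


def did_you_mean(query: str,
                 terms: List[str] = TAXONOMY_TERMS,
                 cutoff: float = 0.8) -> Optional[str]:
    """Return the closest taxonomy term, or None if no suggestion is needed.

    The cutoff of 0.8 is an arbitrary assumption chosen for illustration.
    """
    normalized = query.strip().lower()
    if normalized in terms:
        return None  # exact match: nothing to suggest
    matches = difflib.get_close_matches(normalized, terms, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(did_you_mean("water polution"))
```

Restricting suggestions to taxonomy labels means every suggestion is a known concept, which matches the slide's idea of using taxonomy information rather than a general dictionary.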

Page 5: Publishing Technology Online Forum - Engineering the semantic web

Boosting relevant results

Page 6: Publishing Technology Online Forum - Engineering the semantic web

Guiding the user through facets

Page 7: Publishing Technology Online Forum - Engineering the semantic web

Guiding the user through suggestions

Page 8: Publishing Technology Online Forum - Engineering the semantic web

Concept homepages

Page 9: Publishing Technology Online Forum - Engineering the semantic web

Showing concepts on item homepages

Page 10: Publishing Technology Online Forum - Engineering the semantic web

Suggest related items

Page 11: Publishing Technology Online Forum - Engineering the semantic web

GSE Research – How?

• Built using the pub2web platform

• MetaStore used for metadata storage

• Apache Solr used for search indexing

• Semantic enrichment of content

• Apache UIMA used for entity extraction

Page 12: Publishing Technology Online Forum - Engineering the semantic web

MetaStore

• RDF triplestore for storing metadata

• Agnostic to the type of data being stored

• Able to store rich and very granular data

• Flexible to cater for future data enhancements

For the GSE Research site:

• Content

• Authors

• Taxonomy concepts and relations

• Federation of data from external datasets
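To make the triplestore idea concrete, here is a minimal in-memory sketch of storing content, authors and taxonomy relations as subject-predicate-object triples. It is not MetaStore's API: the class `TinyTripleStore`, the identifiers such as `article:42`, and the choice of Dublin Core and SKOS-style predicate names are all assumptions made for illustration.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TinyTripleStore:
    """In-memory stand-in for an RDF triplestore (illustrative only)."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self,
              subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for s, p, o in sorted(self._triples):
            if (subject in (None, s)
                    and predicate in (None, p)
                    and obj in (None, o)):
                yield (s, p, o)


# Hypothetical identifiers and predicates, not taken from the GSE site.
store = TinyTripleStore()
store.add("article:42", "dc:title", "Estuary clean-up strategies")
store.add("article:42", "dc:creator", "author:7")
store.add("article:42", "dc:subject", "concept:estuarine-pollution")
store.add("concept:estuarine-pollution", "skos:broader",
          "concept:water-pollution")

if __name__ == "__main__":
    # Which items are tagged with a given concept?
    for triple in store.match(predicate="dc:subject",
                              obj="concept:estuarine-pollution"):
        print(triple)
```

Because every fact is just another triple, new kinds of data (for example links to an external dataset) can be added without changing a schema, which is the flexibility the slide refers to.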

Page 13: Publishing Technology Online Forum - Engineering the semantic web

Search

• Uses enterprise-grade Apache Solr

• Inbuilt support for rich features

• Faceted searching

• Synonyms

• Stemming

• Boosting

• ‘More like this’

• ‘Did you mean’
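As a rough illustration of how several of these features are requested together, the sketch below builds a Solr select URL with faceting, spellcheck ("did you mean") and a boost query. The request parameters (`q`, `defType`, `facet`, `facet.field`, `spellcheck`, `bq`, `wt`) are standard Solr parameters, but the core URL, the field names `concept` and `author`, and the helper `build_search_url` are assumptions. Faceting and spellcheck also depend on the schema and request handler being configured for them.

```python
from typing import List, Optional
from urllib.parse import urlencode


def build_search_url(base_url: str,
                     text: str,
                     facet_fields: List[str],
                     boost_query: Optional[str] = None) -> str:
    """Build a Solr /select URL; base_url and field names are hypothetical."""
    params = [
        ("q", text),
        ("defType", "edismax"),   # query parser that supports bq boosts
        ("facet", "true"),        # faceted searching
        ("spellcheck", "true"),   # 'did you mean', if configured in Solr
        ("wt", "json"),
    ]
    # One facet.field parameter per facet to display.
    params += [("facet.field", name) for name in facet_fields]
    if boost_query:
        params.append(("bq", boost_query))  # boost matching documents
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url(
        "http://localhost:8983/solr/gse",   # hypothetical core URL
        "water pollution",
        ["concept", "author"],
        boost_query='concept:"water pollution"^2',
    ))
```

Synonyms and stemming do not appear in the request because Solr applies them through the field's analysis chain rather than through query parameters.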

Page 14: Publishing Technology Online Forum - Engineering the semantic web

Content for GSE Research website

Provided by GSE

• Content XML

• Taxonomy prepared by GSE

Taxonomy enhancement

• Concepts mapped to Library of Congress classifications

• Taxonomy automatically enhanced with terms from this classification

Page 15: Publishing Technology Online Forum - Engineering the semantic web

GSE Research taxonomy - example

For example, the GSE taxonomy contains

Climate change, pollution & environmental impacts

Water pollution

Air pollution

After enhancing with Library of Congress classification

Climate change, pollution & environmental impacts

Water pollution – variants: aquatic pollution, water contamination

Marine pollution – variants: ocean pollution, sea pollution

Oil pollution of water – variants: petroleum pollution of water

Estuarine pollution – variants: estuary pollution

Air pollution
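The variants above lend themselves to search synonyms. As a sketch only, the code below turns the term-to-variants mapping from this slide into lines in the comma-separated format Solr accepts in a synonyms file, and adds a reverse lookup from a variant to its preferred term. Whether the GSE Research site feeds variants into Solr this way is an assumption, and the names `VARIANTS`, `to_synonym_lines` and `preferred_term` are hypothetical.

```python
from typing import Dict, List, Optional

# Preferred terms and variants exactly as listed on the slide.
VARIANTS: Dict[str, List[str]] = {
    "Water pollution": ["aquatic pollution", "water contamination"],
    "Marine pollution": ["ocean pollution", "sea pollution"],
    "Oil pollution of water": ["petroleum pollution of water"],
    "Estuarine pollution": ["estuary pollution"],
}


def to_synonym_lines(variants: Dict[str, List[str]]) -> List[str]:
    """Return one comma-separated synonym group per preferred term."""
    return [
        ", ".join([term.lower()] + [v.lower() for v in alternatives])
        for term, alternatives in variants.items()
    ]


def preferred_term(label: str,
                   variants: Dict[str, List[str]] = VARIANTS) -> Optional[str]:
    """Map a variant (or the term itself) back to its preferred term."""
    wanted = label.strip().lower()
    for term, alternatives in variants.items():
        if wanted == term.lower() or wanted in (v.lower() for v in alternatives):
            return term
    return None


if __name__ == "__main__":
    for line in to_synonym_lines(VARIANTS):
        print(line)
    print(preferred_term("sea pollution"))
```

With a mapping like this, a search for "sea pollution" can reach content tagged "Marine pollution" even though the publisher's original taxonomy never used that phrase.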

Page 16: Publishing Technology Online Forum - Engineering the semantic web

Content workflow in GSE Research

[Workflow diagram. Labelled components: Content, Images, Tables, Authors, Concepts, Additional concepts, External datasets, Text mining pipelines, MetaStore Loader, MetaStore, Search Index]

Page 17: Publishing Technology Online Forum - Engineering the semantic web

Entity extraction for GSE Research content

Apache UIMA

• Architectural framework to manage unstructured data

• Open-source project under the Apache License

• OASIS standard

Provides

• Framework

• Annotators – multiple annotators can be applied in a pipeline

• Ability to plug in external text-mining services as annotators
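The pipeline idea can be sketched in a few lines: each annotator reads the document and adds typed annotations, and annotators run one after another over the same document. This is a conceptual Python illustration only. Apache UIMA itself is a Java (and C++) framework with its own data model, and none of the names below (`Document`, `Annotation`, `concept_annotator`, `year_annotator`, `run_pipeline`) are UIMA APIs.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    begin: int   # start offset in the document text
    end: int     # end offset (exclusive)
    label: str   # annotation type, e.g. "Concept"
    text: str    # normalized value of the annotation


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


Annotator = Callable[[Document], None]


def concept_annotator(terms: List[str]) -> Annotator:
    """Build a dictionary-based annotator for the given taxonomy terms."""
    patterns = [
        (term, re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE))
        for term in terms
    ]

    def annotate(doc: Document) -> None:
        for term, pattern in patterns:
            for m in pattern.finditer(doc.text):
                doc.annotations.append(
                    Annotation(m.start(), m.end(), "Concept", term))

    return annotate


def year_annotator(doc: Document) -> None:
    """Second, independent annotator: marks four-digit years."""
    for m in re.finditer(r"\b(?:19|20)\d{2}\b", doc.text):
        doc.annotations.append(
            Annotation(m.start(), m.end(), "Year", m.group()))


def run_pipeline(doc: Document, annotators: List[Annotator]) -> Document:
    """Apply each annotator in order, then sort annotations by position."""
    for annotate in annotators:
        annotate(doc)
    doc.annotations.sort(key=lambda a: a.begin)
    return doc


if __name__ == "__main__":
    doc = run_pipeline(
        Document("Marine pollution rose after the 2010 oil spill."),
        [concept_annotator(["marine pollution", "air pollution"]),
         year_annotator],
    )
    for a in doc.annotations:
        print(a.label, a.text, a.begin, a.end)
```

An external text-mining service would fit the same shape: a function that sends `doc.text` to the service and appends the returned entities as annotations.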

Page 18: Publishing Technology Online Forum - Engineering the semantic web

Example of entity extraction

Page 19: Publishing Technology Online Forum - Engineering the semantic web

Editorial curation

Page 20: Publishing Technology Online Forum - Engineering the semantic web

Future possibilities for GSE Research

• Extraction of geographical concepts

• Federation of data from other external datasets, e.g. government datasets

• Semantic analysis of search queries to deliver better results

Page 21: Publishing Technology Online Forum - Engineering the semantic web

Summary

• Tagging drives discovery

• Provide multiple routes to content

• Provide external context to content

• Start simple and experiment

• Flexibility of underlying systems is key

Page 22: Publishing Technology Online Forum - Engineering the semantic web

Thank you!