Semantic Web 0 (0) 1
IOS Press

Similarity-based Knowledge Graph Queries for Recommendation Retrieval

Lisa Wenige *, Johannes Ruhland
Chair of Business Information Systems, Friedrich-Schiller-Universität Jena, Germany
E-mail: lisa.wenige@uni-jena.com

Abstract. This paper investigates how similarity-based retrieval strategies can be combined with graph queries to enable users or system providers to explore repositories in the Linked Open Data (LOD) cloud more thoroughly. For this purpose, we developed a content-based recommender system (RS). It relies on concept annotations of Simple Knowledge Organization System (SKOS) vocabularies and a SPARQL-based query language that facilitates advanced and personalized requests for openly available and interlinked datasets. We have comprehensively evaluated the novel search strategies in several test cases and example application domains (i.e., travel search and multimedia retrieval). The results of the web-based online experiments showed that our approaches increase the recall and diversity of recommendations or at least provide a competitive alternative strategy of resource access when conventional methods do not provide helpful suggestions. The findings may be of use for Linked Data-enabled recommender systems (LDRS) as well as for semantic search engines that can consume LOD resources.

Keywords: Recommender Systems, Linked Open Data, Information Retrieval, SPARQL, Semantic Search, SKOS

1. Introduction

A recommender system (RS) component is usually one of the key search features in online portals. It helps users to discover items that reflect their interests [3]. Personalized recommendations are based on a user profile that contains either implicit (e.g., access statistics or click behavior) or explicit preference information (e.g., ratings for items). Common RS techniques are collaborative filtering (CF) or content-based (CB) algorithms. CF approaches derive suggestions from users with similar tastes, whereas CB methods are based on similar items according to metadata descriptions [26]. Descriptions in CB engines are often structured in the table-like format of attribute-value pairs. The flat data structure tremendously reduces the multidimensionality of preferences as well as item characteristics and can therefore produce weak recommendations.

This is why the complex data structure of RDF graphs in the Linked Open Data (LOD) cloud can help to improve the representation of user tastes in CB engines.

* Corresponding author. E-mail: lisa.wenige@uni-jena.com.

Consider the following example for illustration: Suppose a user has stated that he likes a particular movie director and would like to receive suggestions for other interesting filmmakers. A conventional RS would determine similar directors from these metadata. However, the director descriptions may predominantly consist of irrelevant information, e.g., the nationality of the filmmakers or the prizes they have won.¹ Such metadata would not yield useful results in a purely content-based system and only achieve a few random hits. On the other hand, a request that is issued against the LOD collection DBpedia could explore the semantic network that surrounds the director. By this means, all movies that were shot by the favored director are identified. Additional interesting movies that were not directed by the filmmaker but share certain characteristics with his movies (e.g., the same genres or main actors) could enhance the metadata descriptions and user profile information further. The data web can answer such requests because the LOD technology stack provides the SPARQL Protocol and RDF Query Language (SPARQL) and suitable query engines that enable fast retrieval [55].

¹ http://dbpedia.org

1570-0844/0-1900/$35.00 © 0 – IOS Press and the authors. All rights reserved

L. Wenige et al. / Similarity-based Graph Queries

However, graph-based queries alone are not yet sufficient to generate personalized suggestions, since they only return results for graph patterns that exist in the repository. Consider the following example: Consumers may like to use recommendation engines that work on multiple domains simultaneously. For instance, a user who has stated that he likes certain items from one domain may like to receive suggestions from another domain as well (i.e., for cross-domain retrieval). In case a user profile contains feedback information for books, it would be desirable if the system could generate movie suggestions based on this data. While the approach of querying RDF graphs can be useful for this recommendation scenario (since it can identify matching objects based on entity type declarations or typed link information), it may also be the case that there are no direct links from a preferred book to a suitable movie in the RDF graph.

Therefore, RS designers should not rely too heavily on graph-based queries, but also explore possibilities of similarity-based retrieval. A possible solution would be to identify books that are similar to the ones in the user profile. In a subsequent processing step and through a graph-based query, the engine could explore whether the similar books have any connections to movie items in the repository. This approach can reveal implicit connections in the data and might help to return fewer empty result sets to the user. It requires the system to process the data online, because one processing step (identification of suitable movies through graph pattern matching) relies on the outcome of a preceding one (computation of similar book items). For this purpose, it is important that recommendations can be quickly generated on-the-fly without further preprocessing. Only then will a system be able to switch between processing of graph- and similarity-based query parts in a flexible manner.

Almost none of the existing
Linked Data recommender systems (LDRS) address these issues, which prevents efficient usage of available knowledge sources for retrieval tasks. Instead, current LDRS rely on a fixed set of item features, which have to be extracted from the LOD cloud before suggestions can be generated [12, 13, 22, 24, 29, 50, 64]. However, the ad-hoc retrieval and processing of item features for similarity computation in combination with graph-based queries can be facilitated by concepts from Simple Knowledge Organization System (SKOS) vocabularies [38]. SKOS annotations occur in more than half of the repositories in the LOD cloud and often describe real-world entities [53]. For instance, in the DBpedia repository, an LOD resource representing a music act is characterized by SKOS-based subject descriptors (e.g., stating the genre or the label of the music act). The descriptors are part of the DBpedia category graph [11]. A recommendation framework that utilizes SKOS annotations for item-to-item similarity calculation is applicable to a wide range of data collections and domains. Additionally, SKOS subjects are uniquely identified URI resources. Therefore, they can be conveniently processed for ad-hoc retrieval without further preprocessing [38].

This paper presents a SKOS-based recommendation engine called SKOSRecommender (SKOSRec) that can flexibly combine on-the-fly similarity calculation and SPARQL-like techniques to leverage the full potential of LOD repositories. Its effectiveness has been demonstrated in a series of user studies. In summary, the main contributions of our paper are:

- Comprehensive literature review of the state of the art in LOD-enabled information retrieval (IR) and recommender systems (Sect. 2).

- Development of the recommendation framework SKOSRecommender, which is applicable to numerous LOD collections and works in an open system architecture. It provides novel retrieval approaches through:

  • Syntax and processing units for a SPARQL-based recommendation query language (Sect. 3).

  • The ability to flexibly switch between similar resource retrieval and graph pattern matching at different stages of the recommendation workflow (e.g., by being able to apply graph-based filter conditions on recommendation results or by utilizing recommendations as part of a SPARQL-based subquery) (Subsects. 3.2-3.5).

- Extensive evaluation of the SKOSRec system in a series of web-based user experiments, which have demonstrated the effectiveness of the novel retrieval approaches, especially in terms of diversity and recall (Sect. 4).

2. Related Work

Effective answering of retrieval requests has been investigated by researchers for many years. Even before the evolution of the LOD cloud, scientists developed strategies for extracting relevant information from text data. The most prevalent IR systems perform relevance rankings for unstructured resources and offer means to state filter conditions that complement queries in a faceted search interface. The approaches can combine query constraints with similarity-based retrieval. In these systems, user requests are quickly answered due to extensive preprocessing and indexing of resources before runtime execution [58]. However, low latency of search results comes at the cost of a hard-wired data model, which prevents flexible customizations and causes limitations in terms of ad-hoc retrieval [18].

Other researchers have developed query-based RS for table data. For instance, Adomavicius et al. propose the REQUEST query language, which generates suggestions from a relational database that contains information on both user profiles and items. Thus, personalized recommendation queries can be formulated [2]. Another interesting approach in the category of query-based RS is the FlexRecs system by Koutrika et al. The key advantage of their course RS is that it enables highly individual requests in which different query parts (i.e., profile generation, filtering of recommendations or like-minded peers) can be flexibly combined into so-called recommendation workflows [32].

Regardless of the strengths and weaknesses of the discussed approaches, it has to be taken into account that they are not designed to consume LOD. LOD-enabled systems, on the other hand, use RDF data for retrieval. There exist many index-based Linked Data IR systems [4, 9, 10, 20, 25, 40, 46, 47, 61]. However, they index RDF resources before runtime retrieval and can therefore neither facilitate on-the-fly similarity calculation nor provide extensive SPARQL-based query options.

Another interesting engine category comprises on-the-fly Linked Data IR systems. These systems process
information from the LOD cloud through spreading activation algorithms. While they enable fast retrieval, the downside of these approaches is that graph-based query constraints are only possible to a limited extent due to the applied probabilistic path traversal techniques [16, 17, 28, 33, 35, 56].

Existing LDRS, on the other hand, mostly perform offline computation of item-to-item similarities from LOD metadata descriptions. While this approach offers more control over the retrieval process, it prevents individual customizations and runtime responses [12, 13, 22, 24, 29, 50, 64].

To date, there are only a few query-based LDRS [5, 43, 49]. Most of these systems rely on the assumption that user preferences and item metadata reside in the same repository. But this goes against the notion of the LOD cloud as a decentralized and openly accessible data space, which can be queried through API-like interfaces. Additionally, while query-based LDRS provide SPARQL-based query options to filter recommendation results, they are not capable of flexibly combining similar resource retrieval with graph filters (e.g., subquerying with recommendation results).

In summary, existing non-Linked Data as well as Linked Data-enabled retrieval and RS engines already provide useful features for personalized resource access. Nevertheless, the full potential of LOD for recommendation and search tasks has yet to be realized (see Tab. 1). The novel query strategies of the SKOSRec engine are attempts to tackle the previously outlined research gaps, to improve existing RS, and to offer new search strategies for LOD repositories that are guided by personal preferences.

3. The SKOS Recommender

3.1. System Overview

The SKOSRecommender is designed to consume recommendation requests and was developed with the help of the Apache Jena framework.² A SKOSRec query triggers a recommendation workflow in which similarity-based retrieval steps can be joined with SPARQL-like filter expressions. Listing 1 shows the syntax of the SKOSRec query language that needs to be applied to formulate recommendation requests.

In the given grammar, underlined parts represent SPARQL syntax elements that have been directly taken from the latest W3C specification [21]. Words in capital letters denote query keywords, which are also SPARQL expressions, except for the ones listed in the language parts SimProjection and ServiceIntegration. Like a regular SPARQL SELECT query, a SKOSRec query always generates a table-like result set that contains at least a mapping to a single variable. A detailed explanation of each part of the SKOSRec grammar can be found in Appendix A of this paper.

Figure 1 depicts the overall architecture of the system

² https://jena.apache.org


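Table 1. Summary of existing retrieval and recommendation approaches (✓ = supported, (✓) = partly supported, ✗ = not supported)

| System type                        | LOD-enabled | A: SPARQL-based queries | B: On-the-fly similarity | C: Flexible combination of A and B |
|------------------------------------|-------------|-------------------------|--------------------------|------------------------------------|
| IR systems                         | ✗           | ✗                       | ✗                        | ✗                                  |
| Query-based RS                     | ✗           | ✗                       | ✓                        | (✓)                                |
| Index-based Linked Data IR systems | ✓           | ✗                       | ✗                        | ✗                                  |
| On-the-fly Linked Data IR systems  | ✓           | ✗                       | ✓                        | ✗                                  |
| LDRS                               | ✓           | ✗                       | ✗                        | ✗                                  |
| Query-based LDRS                   | ✓           | ✓                       | ✗                        | ✗                                  |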

Listing 1. Grammar of the SKOSRec query language

    SKOSRecQuery                ::= Prologue SelectPart? SimProjection PREF ItemPart+ RecWhereClause?
    SelectPart                  ::= SELECT ( DISTINCT | REDUCED )? Var+ RecWhereClause LimitClause? Aggregation?
    Aggregation                 ::= AGG IRIref Var ( SUM | MAX | AVG )
    SimProjection               ::= RECOMMEND Var TOP INTEGER ServiceIntegration?
    ServiceIntegration          ::= FROM SERVICE IRIREF
    ItemPart                    ::= ( VarPart | IRI ) Sim?
    VarPart                     ::= [ Var RecWhereClause ]
    Sim                         ::= SIM Relation DECIMAL
    Relation                    ::= ( > | >= | = )
    RecWhereClause              ::= WHERE RecGroupGraphPattern
    RecGroupGraphPattern        ::= '{' RecGroupGraphPatternSub '}'
    RecGroupGraphPatternSub     ::= TriplesBlock? ( RecGraphPatternNotTriples '.'? TriplesBlock? )*
    RecGraphPatternNotTriples   ::= RecGroupOrUnionGraphPattern | RecOptionalGraphPattern |
                                    RecMinusGraphPattern | Filter | Bind | InlineData
    RecGroupOrUnionGraphPattern ::= RecGroupGraphPattern ( 'UNION' RecGroupGraphPattern )*
    RecOptionalGraphPattern     ::= 'OPTIONAL' RecGroupGraphPattern
    RecMinusGraphPattern        ::= 'MINUS' RecGroupGraphPattern

with all its components. The most important parts are the SKOSRec query processing units (i.e., the parser and the compiler). They check the syntax of recommendation requests and regulate the correct execution of critical operations of the engine (e.g., graph-based filtering, result set joins, similarity calculation). The architecture does not dictate a particular SPARQL server implementation for LOD repositories, since the system is decoupled from the endpoint that is accessed over HTTP during retrieval. However, the prototypical implementation of the SKOSRec engine utilizes the OpenLink Virtuoso server.³

Fig. 1 also shows the sequential order of a typical recommendation workflow for a complex request that consists of the following retrieval steps:

1. Issuing a SKOSRec query
2. Parsing the SKOSRec query
3. Loading information on the LOD repository/SPARQL endpoint to be queried from the configuration
4. Retrieval of preferred items and setting up of the user profile (see Subsect. 3.3, Preference Querying)
5. Optimization of the retrieval process through workload reduction
6. Determination of similar LOD resources based on the user profile (see Subsect. 3.2, On-the-Fly Recommendations)
7. Ranking of LOD resources
8. Sending of recommendation results to the compiler and joining them with postfilter statements
9. Retrieval of the final result set (see Subsect. 3.5, Postfiltering & Combinations)
10. Output to the user

³ https://virtuoso.openlinksw.com


Fig. 1. Architecture and workflow of the SKOSRec engine

3.2. On-the-Fly Recommendations

A simple on-the-fly request is the most basic form of a SKOSRec query. It is based on the engine's ability to generate ad-hoc recommendations from a user profile containing preference statements for LOD resources. Listing 2 depicts an example SKOSRec on-the-fly request that obtains suggestions based on a user's preference for the western movie "They Call Me Trinity". In the given query, the preferred movie is represented by the DBpedia resource dbr:They_Call_Me_Trinity. The query also contains a prefix with a namespace declaration and a specification regarding the number of suggestions the engine should generate.

Listing 2. A simple recommendation query for the movie domain (Q1)

    PREFIX dbr: <http://dbpedia.org/resource/>

    RECOMMEND ?movie TOP 5
    PREF dbr:They_Call_Me_Trinity
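To make the structure of such a basic request concrete, the following Python sketch extracts the recommendation variable, the limit and the preferred IRIs from a filter-free query such as Q1. It is only an illustration: the function name `parse_simple_query` and the regular expressions are assumptions made here, not part of the SKOSRec implementation (which builds on Apache Jena), and the sketch deliberately ignores the SELECT, SIM, FROM SERVICE and WHERE parts of the grammar.

```python
import re

# Hypothetical helper: covers only PREFIX lines, RECOMMEND ?var TOP k,
# and PREF followed by one or more prefixed names.
PREFIX_DECL = re.compile(r"PREFIX\s+(\w*):\s*<([^>]*)>")
SIMPLE_QUERY = re.compile(
    r"RECOMMEND\s+\?(?P<var>\w+)\s+TOP\s+(?P<k>\d+)\s+PREF\s+(?P<items>.+)",
    re.DOTALL,
)


def parse_simple_query(text):
    """Return (variable, k, preferred IRIs) for a filter-free query."""
    prefixes = dict(PREFIX_DECL.findall(text))
    match = SIMPLE_QUERY.search(text)
    if match is None:
        raise ValueError("not a simple on-the-fly query")
    iris = []
    for token in match.group("items").split():
        prefix, _, local = token.partition(":")
        iris.append(prefixes[prefix] + local)  # expand prefixed name to full IRI
    return match.group("var"), int(match.group("k")), iris


Q1 = """
PREFIX dbr: <http://dbpedia.org/resource/>
RECOMMEND ?movie TOP 5
PREF dbr:They_Call_Me_Trinity
"""

variable, k, profile = parse_simple_query(Q1)
```

A full parser would instead follow the grammar in Listing 1 rule by rule; the regular expressions above only serve to show which three pieces of information a basic request carries.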

In this basic form, the query contains neither pre- or postfilter conditions nor a preference query statement. Thus, the engine simply identifies all movies that are similar to the preference in the profile. When executed over DBpedia, the query returns the solution set that is depicted in Table 2.

Table 2. Result set for Q1

movie
dbr:God_Forgives'_I_Don't
dbr:Ace_High_(1968_film)
dbr:Boot_Hill_(film)
dbr:Trinity_Is_Still_My_Name
dbr:Troublemakers_(1994_film)

For determining recommendations in an ad-hoc fashion, the system identifies SKOS annotations by matching a suitable property (i.e., annotation property) in an RDF dataset (Definition 1).

Definition 1 (Annotation property). An annotation property is an IRI (Listing 3) that is defined in the Dublin Core Metadata Initiative (DCMI) specification for subject annotations. It can occur in conjunction with one of the two namespaces which are stated by the standard [15].

Listing 3. Annotation property

    <ANNOTPROP> ::= dc:subject | dct:subject


These properties help to retrieve SKOS annotation triples for items preferred by a user (Definition 2).

Definition 2 (SKOS annotation triple). A SKOS annotation triple (st) is a triple that consists of an IRI in the subject position, an annotation property (<ANNOTPROP>) as predicate and a concept c in the object position. The concept is part of a knowledge organization system S in SKOS format ($C = \{c \mid c \in S\}$) (Eq. 1).

$$st \in I \times \langle\mathrm{ANNOTPROP}\rangle \times C \qquad (1)$$

The identification of SKOS annotation triples facilitates a fast retrieval of potentially similar items. To this end, the engine evaluates shared features between resources from the SKOS annotation dataset (Definition 3).

Definition 3 (SKOS annotation dataset). A SKOS annotation dataset (AD) is a subset of an RDF dataset that contains SKOS annotation triples (Eq. 2).

$$AD \subset I \times \langle\mathrm{ANNOTPROP}\rangle \times C \qquad (2)$$

Before the similarity calculation process starts, the engine determines SKOS annotations for each LOD resource (r) from the user profile by issuing a subsequent SPARQL query to the LOD repository from which the user intends to receive suggestions. The notion of SKOS annotations is specified in Definition 4.

Definition 4 (SKOS annotations). In the annotation dataset, LOD resources directly link to concepts of a SKOS system via a predefined annotation property. SKOS annotations for an input resource r are defined as in Equation 3.

$$Annot(r) = \{c \in S \mid \exists\, (r, \langle\mathrm{ANNOTPROP}\rangle, c) \in AD\} \qquad (3)$$
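The SPARQL request that retrieves Annot(r) for a profile resource could look roughly as follows. The sketch is an assumption about the query shape, not the query the engine actually issues: it builds a plain SPARQL 1.1 SELECT string in Python and uses a property path alternative to cover both Dublin Core namespaces from Listing 3. The function name `annotation_query` is hypothetical.

```python
# The two Dublin Core namespaces in which a "subject" property is defined.
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


def annotation_query(resource_iri, properties=(DC_SUBJECT, DCT_SUBJECT)):
    """Build a SPARQL SELECT query that returns the concepts c of Annot(r)."""
    if ">" in resource_iri or " " in resource_iri:
        raise ValueError("not a valid IRI: %r" % resource_iri)
    # A property path alternative (p1 | p2) matches either annotation property.
    alternatives = " | ".join("<%s>" % p for p in properties)
    return "SELECT DISTINCT ?c WHERE { <%s> %s ?c }" % (resource_iri, alternatives)


query = annotation_query("http://dbpedia.org/resource/They_Call_Me_Trinity")
```

The resulting string can be sent to any SPARQL 1.1 endpoint over HTTP, which matches the decoupled architecture described in Subsect. 3.1.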

Afterwards, the engine identifies which of the other resources in the dataset share annotations with r. Similarities only have to be computed for items with matching concepts [13, 52]. Triples can be efficiently joined along item features, since native RDF storage systems usually index triples rather than columns [41]. Thus, the quick identification of mutual annotations through SPARQL requests facilitates ad-hoc similarity calculation. Definition 5 specifies the notion of relevant resources and their annotations that are needed to identify similar items. The applied syntax follows Pérez et al. [42].

Definition 5 (Relevant resources and their annotations). The conjunctive graph pattern $P_r$ matches all items and subjects that are potentially relevant for recommendation retrieval (Eq. 4).

$$P_r = (r, \langle\mathrm{ANNOTPROP}\rangle, c) \ \mathrm{AND}\ (q, \langle\mathrm{ANNOTPROP}\rangle, c) \qquad (4)$$

The mapping $\Omega_r$ of relevant resources and their SKOS annotations is obtained by retrieving all resources q that share at least one SKOS concept c with resource r from AD (Eq. 5). The corresponding result set contains mappings for all variables (i.e., c and q) in the triple statements of $P_r$.

$$\Omega_r = [\![P_r]\!]_{AD} \qquad (5)$$
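The evaluation of $P_r$ can be illustrated on a small in-memory annotation dataset. The Python sketch below is not the engine's implementation (which delegates this step to a SPARQL endpoint); the triples with the `ex:` prefix are invented for illustration, and excluding r itself from the result is an assumption made here, since Eq. 4 as written would also bind q to r.

```python
def annotations(triples, resource, prop="dct:subject"):
    """Annot(r): concepts that `resource` links to via the annotation property."""
    return {o for s, p, o in triples if s == resource and p == prop}


def relevant_mappings(triples, resource, prop="dct:subject"):
    """Omega_r: one mapping {q, c} per resource q sharing concept c with r."""
    concepts = annotations(triples, resource, prop)
    return [
        {"q": s, "c": o}
        for s, p, o in sorted(triples)  # sorted only to get a stable order
        if p == prop and o in concepts and s != resource
    ]


# Invented annotation dataset AD (illustration only).
AD = {
    ("ex:r", "dct:subject", "ex:Western"),
    ("ex:r", "dct:subject", "ex:Comedy"),
    ("ex:q1", "dct:subject", "ex:Western"),
    ("ex:q2", "dct:subject", "ex:Comedy"),
    ("ex:q2", "dct:subject", "ex:Western"),
    ("ex:q3", "dct:subject", "ex:Drama"),
}

omega_r = relevant_mappings(AD, "ex:r")
```

Here `ex:q3` does not appear in `omega_r` because it shares no concept with `ex:r`, so no similarity has to be computed for it.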

Upon extraction of relevant resources and annotations, the process of similarity calculation can start. The engine determines the information content (IC) of the shared SKOS annotations of two resources. This approach is based on the hybrid similarity metric proposed by Meymandpour and Davis for LOD-enabled RS [37]. However, while their measure considers all types of IRI resources for the retrieval, our similarity metric purely relies on SKOS annotations (Definition 6).

Definition 6 (SKOS similarity). Let Annot(r) be the set of SKOS features of resource r, Annot(q) the set of SKOS features of resource q, and $q \in \{\mu(q) \mid \mu \in \Omega_r\}$. Then their similarity can be derived from the IC of their shared concepts $C_{shared} = Annot(r) \cap Annot(q)$:

$$sim(r, q) = IC(C_{shared}) \qquad (6)$$

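The IC of a set of SKOS concepts is measured by the negative sum of the logarithms of each concept's relative frequency, i.e., its frequency in relation to the maximum frequency among relevant resources (Definition 7). Rare shared concepts therefore contribute more to the similarity than frequent ones.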

Definition 7 (SKOS Information Content). The SKOS IC is defined as an aggregation of the individual IC values of each concept that is contained in the set of shared features ($c \in C_{shared}$), where freq(c) is the frequency of c among all relevant resources and n is the maximum frequency among these resources. The final similarity score of two resources r and q is determined by summing the IC values of each concept of the shared feature set.

$$IC(C_{shared}) = -\sum_{c \in C_{shared}} \log\left(\frac{freq(c)}{n}\right) \qquad (7)$$
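Equations 6 and 7 translate into a few lines of Python. The sketch rests on two assumptions that the definitions leave open: freq(c) is counted as the number of relevant resources q annotated with c (the profile resource r is not counted), and n is the largest such count over all concepts. The natural logarithm is used; the data with the `ex:` prefix is invented.

```python
import math
from collections import Counter


def concept_frequencies(annotations_by_resource):
    """freq(c): number of relevant resources annotated with concept c."""
    return Counter(
        c for concepts in annotations_by_resource.values() for c in set(concepts)
    )


def skos_similarity(r_concepts, q_concepts, freq):
    """sim(r, q) = IC(C_shared) = -sum over shared c of log(freq(c) / n)."""
    n = max(freq.values())  # maximum frequency among the relevant resources
    shared = set(r_concepts) & set(q_concepts)
    return -sum(math.log(freq[c] / n) for c in shared)


# Invented annotations of three relevant resources and of the input resource r.
relevant = {
    "ex:q1": {"ex:Western"},
    "ex:q2": {"ex:Western", "ex:Comedy"},
    "ex:q3": {"ex:Western", "ex:Spaghetti"},
}
r_concepts = {"ex:Western", "ex:Comedy", "ex:Spaghetti"}
freq = concept_frequencies(relevant)
```

In this example `ex:Western` occurs in every relevant resource (freq = n = 3) and therefore carries no information, so `ex:q1` receives a similarity of zero, whereas `ex:q2` additionally shares the rare concept `ex:Comedy` and scores log(3).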

After having obtained resource annotations as well as IC values, similarity scores between the input resource r and each potentially relevant resource q can be calculated. When a user profile contains more than a single item, similarity values are aggregated through summation to determine the final recommendation score (Definition 8).

Definition 8 (Recommendation score). The recommendation score of a potentially relevant resource q is quantified by the sum of similarities with each resource r that can be found in the profile (Pr).

$$score(Pr, q) = \sum_{r \in Pr} sim(r, q) \qquad (8)$$

Upon calculation of similarity values for each potentially relevant item q, the engine ranks the retrieved resources and returns them up to a given limit (k) that is specified by the user in the SKOSRec query.

With the presented methods of on-the-fly retrieval, recommendations can be generated from LOD repositories through an API without requiring local metadata or excessive preprocessing operations other than the mapping of profile items with LOD resources.⁴
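The aggregation of Eq. 8 and the subsequent top-k ranking can be sketched as follows. The helper names are hypothetical, `sim` stands for any similarity function such as the IC-based one of Definition 6, and breaking ties alphabetically by IRI is an assumption made here for determinism; the text does not say how the engine orders equal scores.

```python
import heapq


def recommendation_scores(profile, candidates, sim):
    """score(Pr, q): sum of sim(r, q) over all profile resources r."""
    return {q: sum(sim(r, q) for r in profile) for q in candidates}


def top_k(scores, k):
    """Return the k highest-scoring resources, best first; ties broken by IRI."""
    return heapq.nsmallest(k, scores, key=lambda q: (-scores[q], q))


# Invented pairwise similarities for a two-item profile (illustration only).
pairwise = {
    ("ex:r1", "ex:a"): 1.0, ("ex:r2", "ex:a"): 0.5,
    ("ex:r1", "ex:b"): 2.0, ("ex:r2", "ex:b"): 0.0,
    ("ex:r1", "ex:c"): 0.5, ("ex:r2", "ex:c"): 0.5,
}
scores = recommendation_scores(
    ["ex:r1", "ex:r2"], ["ex:a", "ex:b", "ex:c"], lambda r, q: pairwise[(r, q)]
)
ranking = top_k(scores, 2)
```

With k = 2 (the TOP value of the query), the sketch returns `ex:b` before `ex:a` and drops `ex:c`.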

3.3. Preference Querying

In the previous subsection, it was assumed that users express their tastes as preference statements for a certain item, in the form of an IRI referring to a LOD resource. However, it may also be the case that likings are only vaguely known, for instance, when users prefer products with specific features but are not able or willing to specify the actual items. The engine obtains such a profile by issuing a subsequent SPARQL query that contains the preference graph pattern ($P_{pref}$) in the WHERE condition. Suppose a user is interested in Quentin Tarantino movies (dbr:Quentin_Tarantino) but does not specify the precise items he likes. He could send a preference query to the SKOSRec engine. The graph pattern in the request would match all movies that are connected to dbr:Quentin_Tarantino with the properties dbo:director or dbo:producer (see Listing 4).

⁴ Di Noia et al. describe how a mapping procedure can be conducted for a real-world implementation. For their LDRS, they matched movie titles with the corresponding LOD resources in DBpedia. The authors extracted IRIs, labels and publication dates for all movie resources. They applied the Levenshtein distance on these resources to identify matching movie items [13].

Listing 4. SKOSRec query to generate recommendations based on a preference query in the movie domain (Q2)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    RECOMMEND ?movie TOP 5
    PREF [ ?prefMovie
      WHERE {
        { ?prefMovie dbo:director dbr:Quentin_Tarantino }
        UNION
        { ?prefMovie dbo:producer dbr:Quentin_Tarantino }
      } ]

Table 3⁵ shows the corresponding solution set to the preference query, containing films from DBpedia that are similar to Quentin Tarantino movies.

Table 3. Result set for Q2

movie
dbr:The_Godfather_Part_II
dbr:The_Prestige_(film)
dbr:The_Dark_Knight
dbr:Planet_Terror
dbr:Clerks

The notion of a queried profile is specified in Definition 9.

Definition 9 (Queried profile). A queried profile is a user profile that has been compiled by retrieving all LOD resources r that are part of an annotation graph AD and are contained in the solution mappings resulting from the evaluation of a specified graph pattern ($P_{pref}$) over a dataset D (Eq. 9).

$$Pr = \{\mu(r) \mid \mu \in [\![P_{pref}]\!]_D,\ r \in dom(\mu),\ r \in AD\} \qquad (9)$$
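Equation 9 amounts to a filter over the solution mappings of the preference pattern. In the Python sketch below, the solution mappings are assumed to have been retrieved already (for instance from a SPARQL endpoint) and are represented as dictionaries; "part of AD" is read here as "occurs as the subject of at least one annotation triple", which is an interpretation, and all `ex:` identifiers are invented.

```python
def queried_profile(pattern_solutions, variable, annotation_triples):
    """Pr: bindings of `variable` that also occur as subjects in AD."""
    annotated = {s for s, _p, _o in annotation_triples}
    return {
        mu[variable]
        for mu in pattern_solutions
        if variable in mu and mu[variable] in annotated  # r in dom(mu), r in AD
    }


# Invented solution mappings of a preference pattern and a tiny AD.
solutions = [
    {"prefMovie": "ex:film1"},
    {"prefMovie": "ex:film2"},  # has no SKOS annotation, so it is dropped
    {"other": "ex:film3"},      # does not bind the preference variable
]
AD = {("ex:film1", "dct:subject", "ex:Crime_films")}

profile = queried_profile(solutions, "prefMovie", AD)
```

Resources without annotations are left out of the profile because they could not contribute to any similarity value in the subsequent steps.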


3.4. Prefiltering

To facilitate exploratory search in LOD repositories, users should also have the option to formulate conditions in their queries. One way to enable filters during retrieval is to apply them before similarity calculation. Thus, recommendation lists can be adjusted to individual needs.

Due to the expressiveness of RDF data, filter conditions can not only be applied on attributes that are directly connected to potentially relevant items, but can also be declared as a graph pattern. By this means, it is possible to retrieve LOD resources that are both similar to the user profile and fulfill an advanced condition.

The strengths of the approach will be illustrated with a travel recommendation request that was executed over DBpedia. Suppose a customer expressed a preference for the Lake Baikal region (e.g., as indicated by a positive rating on a travel community site). With simple on-the-fly retrieval, he would receive a recommendation list that primarily consists of travel destinations that are located in the proximity of this geographical area. A graph-based filter, on the other hand, facilitates advanced retrieval options. In the given example, it enables the user to obtain suggestions for Southeast Asian travel destinations that have similar features as the Lake Baikal region. This works even though dbr:Lake_Baikal is not directly connected to any destination with the annotation dbc:Southeast_Asia; such destinations can only be identified by exploring the surrounding graph structure of the filter attribute.

The advanced constraint ensures that the generated result set is not empty. A simple attribute-level filter, which is commonly utilized in faceted search systems, would not generate any search hits. By these means, graph-based filter patterns can help users to better explore knowledge graphs, thereby overcoming data quality issues in LOD repositories. The corresponding SKOSRec query to this example is shown in Listing 5.

Listing 5. SKOSRec query with an expressive prefilter condition to retrieve Southeast Asian destinations based on a preference for the Lake Baikal region (Q3)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbc: <http://dbpedia.org/resource/Category:>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    RECOMMEND ?destination TOP 5
    PREF dbr:Lake_Baikal
    [ WHERE {
        ?destination dbo:country ?country .
        ?country dct:subject ?countrySubject .
        ?countrySubject skos:broader dbc:Southeast_Asia .
        ?destination rdf:type dbo:Place .
      }
    ]

Table 4 shows the solution to this query. It contains travel destinations and regions of interest that are both similar to the previously liked destination dbr:Lake_Baikal and located in Southeast Asia. Ecoregions around lakes, rivers, or mountains from Southeast Asian countries, such as Cambodia, Vietnam, or Malaysia, are presented to the user.

Table 4. Result set of Q3

destination
dbr:Tonle_Sap
dbr:Mekong
dbr:Mount_Kinabalu
dbr:Laguna_de_Bay
dbr:Lake_Toba

The technical details for this type of query are given in the following definitions. Before the extraction of relevant resources and annotations, the recommendation engine identifies the resources that satisfy the filter (Definition 10).

Definition 10 (Prefiltered resources). The mapping of filtered resources $\Omega_{prefilter}$ is obtained by matching a specified graph pattern ($P$) to the RDF graph of a dataset $D$ (Eq. 10).

$$\Omega_{prefilter} = \{ \mu_{|q} \mid \mu \in [\![P]\!]_D \} \qquad (10)$$

After preselection, resources are brought together with user profile data to retrieve the final set of recommendations (Definition 11).

Definition 11 (Annotations and relevant resources (prefiltered retrieval)). The solution mappings of annotations and relevant resources in prefiltered retrieval mode are obtained by joining the set of prefiltered resources with the mappings $\Omega_r$ of all relevant resources and annotations as determined from the user profile.

$$\Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_r \qquad (11)$$

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and the respective SKOS similarities are quantified according to the user filter (Definitions 12 and 13). By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources.

Definition 12 (Conditional SKOS similarity). Let $Annot(r)$ be the set of SKOS features of $r$ and $Annot(q)$ the set of SKOS features of resource $q$, which is contained in the filtered set of annotations and relevant resources ($q \in \{\mu(q) \mid \mu \in \Omega_{pre}\}$). Then the similarity can be derived from the IC of their shared concepts ($C_{cond} = Annot(r) \cap Annot(q)$) (Eq. 12).

$$sim_{cond}(r, q) = IC(C_{cond}) \qquad (12)$$

Definition 13 (Conditional SKOS Information Content). The conditional IC of a set of SKOS annotations is defined by the sum of the IC of each concept $c \in \{\mu(c) \mid \mu \in \Omega_{pre}\}$, where $freq_{cond}(c)$ is the frequency of $c$ among all filtered resources and $n$ is the maximum frequency among these resources (Eq. 13).

$$IC(C_{cond}) = -\sum_{c \in C_{cond}} \log\left(\frac{freq_{cond}(c)}{n}\right) \qquad (13)$$
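To make Definitions 12 and 13 more concrete, the following minimal Python sketch computes conditional IC values and similarities for a small, invented set of prefiltered resources. All resource and concept names (the `ex:` identifiers) are hypothetical, and the sketch assumes that $n$ is the highest concept frequency within the filtered set, which is one possible reading of Definition 13. It is not the SKOSRec implementation.

```python
import math
from collections import Counter


def conditional_ic(shared_concepts, freq_cond, n):
    """Eq. 13: negative sum of log(freq_cond(c) / n) over the shared concepts."""
    return -sum(math.log(freq_cond[c] / n) for c in shared_concepts)


def conditional_similarity(annot_r, annot_q, freq_cond, n):
    """Eq. 12: IC of the concepts shared by profile item r and candidate q."""
    return conditional_ic(annot_r & annot_q, freq_cond, n)


# Hypothetical SKOS annotations of the resources that passed the prefilter.
filtered = {
    "ex:A": {"ex:Lakes", "ex:Ecoregions", "ex:Asia"},
    "ex:B": {"ex:Rivers", "ex:Asia"},
    "ex:C": {"ex:Lakes", "ex:Asia"},
    "ex:D": {"ex:Volcanoes", "ex:Asia"},
}

# freq_cond(c): number of filtered resources annotated with concept c.
freq_cond = Counter(c for concepts in filtered.values() for c in concepts)
# Assumption: n is the highest concept frequency within the filtered set.
n = max(freq_cond.values())

# Hypothetical annotations of the preferred item r in the user profile.
profile_annotations = {"ex:Lakes", "ex:Ecoregions", "ex:Asia"}

scores = {
    q: conditional_similarity(profile_annotations, annotations, freq_cond, n)
    for q, annotations in filtered.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy data, the concept `ex:Asia` annotates every filtered resource and therefore contributes nothing to the score, whereas rarer shared concepts raise the similarity. This illustrates why the frequencies are counted over the filtered set rather than over the whole dataset.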

The system carries out the calculations (i.e., the determination of similarity values and the ranking of LOD resources) in the same way as in non-filtering mode.

3.5. Postfiltering & Combinations

The SKOSRec engine cannot only combine similarity calculation and graph pattern matching to pre-filter relevant LOD resources; it can also apply user constraints after similar items have already been determined. By this means, recommendations are filtered ex post, i.e., after the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Listing 6. SKOSRec query to generate simple post-filtered recommendations in the movie domain (Q4)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE {
    ?movie dbo:director ?director .
    ?director rdf:type yago:Director110014939 .
} LIMIT 50
RECOMMEND ?director TOP 5
PREF dbr:Quentin_Tarantino

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity.

Table 5. Result set of Q4

director              movie
dbr:Kirk_Douglas      dbr:Scalawag_(film)
dbr:Kirk_Douglas      dbr:Posse_(1975_film)
dbr:Kevin_Costner     dbr:The_Postman_(film)
dbr:Kevin_Costner     dbr:Dances_with_Wolves
...                   ...
dbr:Gene_Kelly        dbr:Hello,_Dolly!_(film)
dbr:Gene_Kelly        dbr:Invitation_to_the_Dance_(film)
...                   ...
dbr:Alan_Alda         dbr:Goodbye,_Farewell_and_Amen
dbr:Alan_Alda         dbr:Margaret's_Engagement
...                   ...
dbr:Steve_Buscemi     dbr:Interview_(2007_film)
dbr:Steve_Buscemi     dbr:Animal_Factory

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.


Another weakness is the biased semantics of the query. Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality, or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies he/she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to several LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).

Aggregation-based requests can be combined well with other retrieval patterns, such as preference queries. In the case of the movie recommendation scenario, a user could state that he enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors that shot similar films. Typically, this kind of question can only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate post-filtered recommendations based on a preference query in the movie domain (Q5)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE {
    ?movie dbo:director ?director .
    ?director rdf:type yago:Director110014939 .
} LIMIT 50
AGG dbr:Quentin_Tarantino ?director SUM
RECOMMEND ?movie TOP 100
PREF [ ?prefMovie
    WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino . } ]

Table 6. Result set of Q5

director                    movie
dbr:Steven_Spielberg        dbr:Indiana_Jones
dbr:Steven_Spielberg        dbr:Jurrasic_Park_(film)
...                         ...
dbr:Francis_Ford_Coppola    dbr:The_Godfather_Part_II
dbr:Francis_Ford_Coppola    dbr:The_Godfather_Part_III
...                         ...
dbr:Christopher_Nolan       dbr:The_Prestige_(film)
dbr:Christopher_Nolan       dbr:The_Dark_Knight
...                         ...
dbr:Robert_Rodriguez        dbr:Planet_Terror
dbr:Robert_Rodriguez        dbr:Machete_(film)
...                         ...
dbr:Kevin_Smith             dbr:Clerks
dbr:Kevin_Smith             dbr:Clerks_II

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they differ entirely. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style⁵), Q4 provided a completely different output. It shows US-American actor-directors whose movies range from musical films (dbr:Hello,_Dolly!_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Tab. 5) seem rather arbitrary and not exceptionally helpful concerning the initial user request.

The combined aggregation-based query can also be described as a roll-up request, since suggestions are based on preferences for sublevel entities. Another example of a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POIs) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

⁵ https://en.wikipedia.org/wiki/Quentin_Tarantino


Listing 8. SKOSRec query to generate post-filtered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dul: <http://www.ontologydesignpatterns.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?city
WHERE {
    ?poi ?property ?city .
    ?city rdf:type dbo:City .
    ?property rdfs:subPropertyOf dul:hasLocation .
} LIMIT 3
AGG dbr:London ?city SUM
RECOMMEND ?poi TOP 100
PREF
    dbr:Oxford_Street
    dbr:Notting_Hill
    dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7. Result set of Q6

city
dbr:Oxford
dbr:Abingdon-on-Thames
dbr:Wallingford_Oxford

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.

Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?participated ?musicalAct
WHERE {
    ?movie rdf:type dbo:Film .
    VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
    { ?musicalAct ?participated ?movie . }
    UNION
    { ?movie ?participated ?musicalAct . }
} LIMIT 10
AGG dbr:The_Jackson_5 ?movie MAX
RECOMMEND ?musicalAct TOP 100
PREF dbr:The_Jackson_5
WHERE {
    VALUES ?actType { dbo:MusicalArtist dbo:Band }
    ?musicalAct rdf:type ?actType .
}

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are summarized with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tabs. 8 and 9). The SKOSRec query produces a more comprehensive solution than a regular SPARQL query.

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?musicalAct
WHERE {
    ?movie rdf:type dbo:Film .
    VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
    { dbr:The_Jackson_5 ?participated ?movie . }
    UNION
    { ?movie ?participated dbr:The_Jackson_5 . }
} LIMIT 5

Table 8. Result set of Q7

movie                           musicalAct
dbr:Moonwalker                  dbr:Michael_Jackson
...                             ...
dbr:The_Wiz_(film)              dbr:Ashford_&_Simpson
dbr:Garfield_Gets_a_Life        dbr:The_Temptations
dbr:Pipe_Dreams_(1976_films)    dbr:Gladys_Knight_&_the_Pips
dbr:The_Jacksons                dbr:Jermanine_Jackson

Table 9. Result set of Q8

movie               musicalAct
dbr:The_Jacksons    dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 14 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 14 (SKOSRec recommendations). SKOSRec recommendations ($R(P_r)$) are the set of suggestions generated by processing the solution mappings ($\Omega$) obtained from recommendation retrieval. They can be the result of simple on-the-fly retrieval, of prefiltering, or of a combination of these techniques. $P_r$ represents the input profile that contains at least one preference for an item. The cardinality of $R(P_r)$ is the intended number of recommendations ($k$) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 14).

$$R(P_r) = \{ x \mid x \in \{ q \mid score(P_r, q) > \max^{k}_{q \in \Omega} score(P_r, q),\ q \notin P_r \} \} \qquad (14)$$
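The selection step described in Definition 14 can be sketched in a few lines of Python. The sketch follows the prose of the definition (the k highest-scoring resources that are not part of the profile). The function name, the `ex:` identifiers, the scores, and the alphabetical tie-breaking rule are assumptions made for illustration only.

```python
def top_k_recommendations(scores, profile, k):
    """Return the k highest-scoring resources that are not in the profile.

    scores  -- dict mapping a resource q to score(P_r, q)
    profile -- set of resources the user already stated as preferences
    k       -- intended number of recommendations
    """
    candidates = [(q, s) for q, s in scores.items() if q not in profile]
    # Sort by descending score; ties are broken by name (an assumption).
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [q for q, _ in candidates[:k]]


# Hypothetical recommendation scores for four resources.
scores = {"ex:a": 0.9, "ex:b": 0.7, "ex:c": 0.7, "ex:d": 0.2}
recommended = top_k_recommendations(scores, profile={"ex:a"}, k=2)
```

Here `ex:a` is skipped despite its high score because it is already contained in the profile.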

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions $R(P_r)$ can be joined with any graph pattern $P_{post}$ subsequent to the process of similarity calculation (Definition 15). It does not make a difference how the engine obtained these recommendations.

Definition 15 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern ($\Omega_{post}$) (Eqs. 15 and 16).

$$\Omega_{postfilter} = \{ \mu \mid \mu \in [\![P_{post}]\!]_D,\ x \in dom(\mu) \} \qquad (15)$$

$$\Omega_{post} = \{ \mu_{|x} \mid x \in R(P_r) \} \bowtie \Omega_{postfilter} \qquad (16)$$

Like the prefilter, $P_{post}$ can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently from a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item ($a$) and similar sublevel entities. Additionally, the user has to specify the variable in his profile (e.g., ?y) that represents the items within the postfilter graph pattern $P_{post}$. By this means, aggregation-based recommendations can be generated (Definition 16).

Definition 16 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item $a$ in the profile (Eq. 17).

$$R_{agg}(a) = \{ \mu(y) \mid \mu \in \Omega_{post},\ y \in dom(\mu),\ a \notin \mu(y) \} \qquad (17)$$

For each potential aggregation-based suggestion ($y$), similarity scores of connected entities ($x$) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 17 (Aggregation-based ranking score). The ranking score of an LOD resource $y$ that is connected to a set of recommended items is determined by aggregating scores of connected entities, either by the maximum (Eq. 18), the sum (Eq. 19), or the average (Eq. 20) of individual scores.

$$score_{aggMAX}(y) = \max_{x \in \mu(x, y)} score(P_r, x) \qquad (18)$$

$$score_{aggSUM}(y) = \sum_{x \in \mu(x, y)} score(P_r, x) \qquad (19)$$

$$score_{aggAVG}(y) = \frac{\sum_{x \in \mu(x, y)} score(P_r, x)}{|\{x \in \mu(x, y)\}|} \qquad (20)$$
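The three aggregation variants of Definition 17 can be illustrated with a short Python sketch. It assumes that the postfilter result is available as pairs (x, y) of a recommended item x and a connected resource y, and it simplifies the exclusion rule of Definition 16 to dropping pairs whose y equals the superordinate item a. All identifiers and scores below are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Aggregation modes corresponding to Eqs. 18 (MAX), 19 (SUM) and 20 (AVG).
AGGREGATORS = {"MAX": max, "SUM": sum, "AVG": mean}


def aggregate_scores(pairs, scores, excluded, mode="SUM"):
    """Aggregate the scores of recommended items x per connected resource y.

    pairs    -- iterable of (x, y) tuples taken from the postfilter mappings
    scores   -- dict mapping a recommended item x to score(P_r, x)
    excluded -- superordinate item a from the profile (simplified exclusion)
    mode     -- one of "MAX", "SUM", "AVG"
    """
    grouped = defaultdict(list)
    for x, y in pairs:
        if y != excluded and x in scores:
            grouped[y].append(scores[x])
    aggregate = AGGREGATORS[mode]
    return {y: aggregate(values) for y, values in grouped.items()}


# Hypothetical movies (x) with similarity scores and their directors (y).
scores = {"ex:m1": 0.5, "ex:m2": 0.25, "ex:m3": 0.6, "ex:m4": 0.9}
pairs = [
    ("ex:m1", "ex:d1"),
    ("ex:m2", "ex:d1"),
    ("ex:m3", "ex:d2"),
    ("ex:m4", "ex:a"),  # linked to the preferred item a, hence skipped
]

by_sum = aggregate_scores(pairs, scores, excluded="ex:a", mode="SUM")
by_max = aggregate_scores(pairs, scores, excluded="ex:a", mode="MAX")
by_avg = aggregate_scores(pairs, scores, excluded="ex:a", mode="AVG")
```

With sum-based aggregation, `ex:d1` outranks `ex:d2` because it is connected to two similar movies, whereas maximum-based aggregation favors `ex:d2`. This mirrors the trade-off between the aggregation modes discussed above.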

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology.⁶ ⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users went through the web interface step by step and provided their feedback. Because of their insights, we slightly modified the interface (e.g., by rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

⁶ http://tomcat.apache.org
⁷ http://www.oracle.com/technetwork/java/javaee/servlet/index.html

Subjects were recruited through the clickworker.com platform.⁸ Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just clicked through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.⁹

It enabled users to type in keywords (e.g., the name of a music act or the author of a book) for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.¹⁰ The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

⁸ https://www.clickworker.de
⁹ http://api.jquery.com/jquery.ajax/
¹⁰ http://lucene.apache.org/solr/

Fig. 2. User profile generation, TC1 (music domain)

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels, or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance; outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale would allow. The score, called mean relevance ($mrs$), was measured with Equation 21.

$$mrs = \frac{\sum_{i=1}^{k} score_i}{k} \qquad (21)$$

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator of recall. However, recommendation list size can only be utilized for this purpose when $mrs$ scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, in the case of an RS only suggesting popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user ($|nov|$) to the total number of relevant items ($|tp|$) (Eq. 22).

$$nv = \frac{|nov|}{|tp|} \qquad (22)$$
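As a minimal sketch of how Eqs. 21 and 22 could be computed from the logged ratings, consider the following Python functions. They assume the relevance threshold of 50 slider points that this section states for relevant items. The function names, the example ratings, and the choice to return `None` when no item is relevant are illustrative assumptions, not details taken from the experiment's backend.

```python
def mean_relevance(slider_scores):
    """Eq. 21: average slider score (0 to 100) over the k recommendations."""
    return sum(slider_scores) / len(slider_scores)


def novelty(slider_scores, is_new, threshold=50):
    """Eq. 22: share of relevant items that the participant marked as new."""
    relevant_flags = [
        new for score, new in zip(slider_scores, is_new) if score >= threshold
    ]
    if not relevant_flags:
        return None  # assumption: novelty is undefined without relevant items
    return sum(relevant_flags) / len(relevant_flags)


# Hypothetical ratings of one participant for a list of five recommendations.
slider_scores = [80, 40, 60, 50, 20]
is_new = [True, True, False, True, False]

mrs = mean_relevance(slider_scores)
nv = novelty(slider_scores, is_new)
```

In this example, three items reach the threshold and two of them are new to the participant, so the novelty value is two thirds, while the item rated 40 does not count even though it was marked as new.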

An item was considered relevant when it achieved an mrs score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 65]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse is the result set [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 23) and averaged for the recommendation list (Eq. 24).

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \qquad (23)$$

$$div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \qquad (24)$$
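Eqs. 23 and 24 can be sketched as follows. The sketch assumes that each recommended item is represented by a numeric feature vector (e.g., binary indicators for its annotations). This representation and the example vectors are assumptions for illustration, since the section does not specify how item vectors were built.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equally long feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def content_diversity(item_vectors):
    """Eq. 24: average pairwise cosine distance (Eq. 23) within one list."""
    k = len(item_vectors)
    if k < 2:
        return 0.0  # assumption: a list with fewer than two items has no diversity
    total = sum(1 - cosine(u, v) for u, v in combinations(item_vectors, 2))
    return 2 * total / (k * (k - 1))


# Hypothetical binary feature vectors of three recommended items.
item_vectors = [[1, 0], [0, 1], [1, 0]]
div_c = content_diversity(item_vectors)
```

The first and third item are identical (distance 0) and each is orthogonal to the second (distance 1), so the average over the three pairs is two thirds.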

The content-based diversity scores ($div_C$) were calculated in the backend of the web application and saved in a log file for each recommendation list generated by the SKOSRec engine at runtime. In addition to the measurement of $div_C$ scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al., two Likert items were posed for this purpose [44]:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al., which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions [44]. The statements were adapted to the specificities of each usage scenario. Figure 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations.

For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filter and constraint-based recommendations were based on the same profile for each user in TC2.

Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Figure 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movies, music, and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain, where the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information, or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine also generated suggestions with the help of an expressive filter. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, if a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city, or dbo:state) while still applying the specified user constraint to the set of LOD resources. Upon execution of constraint-based recommendation requests, users

¹¹ http://dbpedia.org


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain).

received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated the result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement “The filter has improved the recommendation results of the list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar roll-up query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination (i.e., a city, region, or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced roll-up requests in TC3.
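The difference between the simple and the expressive subject filter compared in TC2 can be illustrated with a minimal sketch on an in-memory toy graph. This is not the SKOSRec implementation: the identifiers (`cat:...`, `res:...`), the dictionaries `BROADER` and `SUBJECTS`, and the function names are hypothetical, and the traversal depth of two hops is an assumption made for illustration.

```python
from collections import deque

# Toy category graph: category -> its broader categories (skos:broader).
# All identifiers are hypothetical and only serve the illustration.
BROADER = {
    "cat:Nature_reserves_in_Bavaria": ["cat:Nature_reserves_in_Germany"],
    "cat:Nature_reserves_in_Germany": ["cat:Nature_reserves"],
}

# Item -> directly assigned subject categories.
SUBJECTS = {
    "res:A": ["cat:Nature_reserves"],
    "res:B": ["cat:Nature_reserves_in_Germany"],
    "res:C": ["cat:Nature_reserves_in_Bavaria"],
    "res:D": ["cat:Castles"],
}


def broader_within(category, max_hops):
    """Return the category plus all broader categories within max_hops steps."""
    seen = {category}
    queue = deque([(category, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for parent in BROADER.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return seen


def simple_filter(constraint):
    """Keep items that carry the constraint as a direct subject."""
    return sorted(item for item, subs in SUBJECTS.items() if constraint in subs)


def expressive_filter(constraint, max_hops=2):
    """Keep items whose subjects reach the constraint via broader links."""
    return sorted(
        item
        for item, subs in SUBJECTS.items()
        if any(constraint in broader_within(s, max_hops) for s in subs)
    )
```

In this toy setting, the expressive filter returns every item of the simple filter plus items annotated with narrower categories, which mirrors the reported effect of larger result sets under the same user constraint.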

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at

a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on roll-up retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants specified either their favorite director/actor (movie domain), music act (music domain), or author (book domain), and the SKOSRec engine generated recommendations based on the features of the works created by the favored artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced roll-up request were compared with those from the on-the-fly query without letting participants know which approach they were assessing and with random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act, or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.

Fig. 5. Web form for a constraint-based recommendation request (music domain).

Fig. 6. Travel - User profile generation, TC3.


Fig. 7. Music - User profile generation, TC4 (page 9).

The web form assisted subjects in formulating a request that obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the back end of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states a favorite movie that was written by a particular author. If the same author has also written books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared a preference in his/her profile for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack, and the music acts that were involved in creating it, just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.
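The idea of filling user parameters into the placeholders of a query pattern can be sketched as follows. The template below is a plain SPARQL query in the spirit of the single-resource baseline described above; it is not one of the query templates used in the experiments, and the names `CROSS_DOMAIN_TEMPLATE` and `build_cross_domain_query` are hypothetical. The pattern matches any property that links the source resource to an item of the target type, in either direction.

```python
from string import Template

# Hypothetical template: any property connecting the source resource
# with an item of the requested target type (in either direction).
CROSS_DOMAIN_TEMPLATE = Template("""\
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?item WHERE {
  { <$source> ?p ?item . } UNION { ?item ?p <$source> . }
  ?item a dbo:$target_type .
}
LIMIT $limit
""")


def build_cross_domain_query(source_iri, target_type, limit=10):
    """Fill the placeholders with user parameters after basic validation."""
    # Reject characters that must not occur inside an IRI reference.
    if any(ch in source_iri for ch in '<>"{}|^`\\ '):
        raise ValueError("source_iri contains characters not allowed in an IRI")
    # The target type is used as a local name, e.g. Film or Book.
    if not target_type.isidentifier():
        raise ValueError("target_type must be a simple local name")
    return CROSS_DOMAIN_TEMPLATE.substitute(
        source=source_iri, target_type=target_type, limit=int(limit)
    )
```

A call such as `build_cross_domain_query("http://dbpedia.org/resource/Daft_Punk", "Film")` would yield a query for films that are directly linked to the given music act. Validating the parameters before substitution is a design choice of this sketch to keep user input from altering the query structure.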

4.2. Results

The samples gathered from the experiments contained answers from both genders, with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
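For readers who wish to reproduce the reliability check, the following minimal sketch computes Cronbach's α from raw Likert responses using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of the sum scores). The function name and the example data are hypothetical and do not stem from the experiments.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of participants' item ratings.

    responses: one list per participant, one rating per Likert item.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    items = list(zip(*responses))  # one tuple of ratings per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings of five participants on three Likert items.
example = [[4, 5, 4], [3, 3, 3], [5, 5, 4], [2, 3, 2], [4, 4, 5]]
alpha = cronbach_alpha(example)
```

With the threshold used in this paper, a construct would be treated as sufficiently reliable if the returned value exceeds 0.7.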

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. To avoid leaving valuable user assessments unused, agreement with the statement “The recommendations of list [...] are diverse” was taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M)¹², standard deviation (SD), and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be better than mediocre. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty, and Diversity received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points, which means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the

¹² In the case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


Table 10. Test Cases in the web experiments

Test Case   Evaluation Purpose                        Query Type   Travel   Movie   Music   Book
TC1         On-the-fly Recommendations (Baseline)     Q1           X        X       X       X
TC2         Constraint-based Recommendations          Q3           X        X       X       X
TC3         Advanced Queries (Rollup)                 Q5, Q6       X        X       X       X
TC4         Advanced Queries (Cross-Domain)           Q7           x        X       X       X

Note: the lowercase x marks the travel scenario, for which TC4 was not conducted (the travel experiment ended with TC3).

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain   Cronbach's α
Travel   0.7002
Movie    0.8579
Music    0.8515
Book     0.8949

baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 (“neutral”) in the multimedia domains and at 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement “The filter has improved the recommendation results of the list”. Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 (“strongly agree”), 4 (“agree”), or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists the response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]
  Domain    M       SD      N

Accuracy (size) [10]
  Travel    10.00   0.00    103
  Movie     10.00   0.00    50
  Music     10.00   0.00    53
  Book      9.80    1.40    51

Accuracy (mrs) [1]
  Travel    56.49   0.00    103
  Movie     62.25   19.15   50
  Music     55.70   0.10    53
  Book      57.47   15.55   50

Novelty (nv) [1]
  Travel    0.60    0.34    101
  Movie     0.36    0.31    50
  Music     0.36    0.39    52
  Book      0.61    0.29    47

Diversity (divU) [5]
  Travel    4       -       102
  Movie     3       -       50
  Music     3       -       53
  Book      3       -       50

Diversity (divC) [1]
  Travel    0.80    0.16    103
  Movie     0.87    0.12    50
  Music     0.86    0.10    50
  Book      0.92    0.08    50

Usefulness [5]
  Travel    3.34    0.85    103
  Movie     3.55    0.73    50
  Music     3.18    0.85    53
  Book      3.43    0.81    50

[Figure 8: chart titled “The filter has improved the recommendations of the list”; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5).]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2).

Table 13. Response Rates (TC2), in %

Domain   Constr.-based (Regular)   Constr.-based (Expanded)   N
Travel   23                        57                         103
Movie    86                        88                         50
Music    72                        75                         53
Book     73                        76                         51
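The response rate reported in Table 13 is defined as the ratio of non-zero result sets among all result sets. A minimal sketch of this computation is given below; the function name and the input format (one list of recommended items per user session) are assumptions made for illustration.

```python
def response_rate(result_lists):
    """Share of sessions in which the engine returned at least one item."""
    if not result_lists:
        raise ValueError("no result lists given")
    non_empty = sum(1 for items in result_lists if len(items) > 0)
    return non_empty / len(result_lists)


# Hypothetical sessions: two of four requests returned recommendations.
sessions = [["res:A"], [], ["res:B", "res:C"], []]
rate = response_rate(sessions)
```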

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. Where statistical tests identified no meaningful variations, the results of the better-performing approach are underlined. The α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
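The text does not state which FDR procedure was used. As an illustration, the following sketch implements the widely used Benjamini-Hochberg step-up procedure, which is an assumption on our part; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if the hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, ascending p-values)
    with p_(k) <= k / m * alpha and reject all hypotheses up to rank k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from four pairwise comparisons.
decisions = benjamini_hochberg([0.001, 0.04, 0.03, 0.2])
```

In the example, only the first comparison remains significant after adjustment, although three of the four unadjusted p-values lie below 0.05.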

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small

sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, neither approach was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because the statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially at the cost of a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify recommendation lists and increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The small number of differences may be explained by the high Jaccard indices (JI) of the recommendation lists (see Tab. 15). Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
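The Jaccard index used to compare two recommendation lists is the size of their intersection divided by the size of their union. The following minimal sketch illustrates the computation; the function name, the example items, and the handling of two empty lists are assumptions, since the text does not specify them.

```python
def jaccard_index(list_a, list_b):
    """Overlap of two recommendation lists, ignoring rank and duplicates."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        # Convention chosen for this sketch; the paper does not state one.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical lists: the expressive list extends the simple list.
simple = ["res:A", "res:B", "res:C"]
expressive = ["res:A", "res:B", "res:C", "res:D"]
ji = jaccard_index(simple, expressive)
```

In the example, the expressive list contains all items of the simple list plus one additional item, which yields a high index of 0.75 and corresponds to the interpretation of an expanded constraint as an enhanced version of the simple result list.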


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]
  Domain    Simple            Expressive        N
            M       SD        M       SD

Accuracy (size) [10]
  Travel    1.17    2.70      4.52    4.62      103
  Movie     6.72    4.08      7.20    3.92      50
  Music     5.96    4.64      6.36    4.51      53
  Book      4.59    4.50      5.00    4.51      51

Accuracy (mrs) [1]
  Travel    66.73   20.51     65.34   22.36     23
  Movie     60.95   22.98     60.29   22.79     43
  Music     63.77   16.05     62.96   16.44     34
  Book      56.17   18.18     56.81   16.81     37

Novelty (nv) [1]
  Travel    0.48    0.41      0.61    0.49      21
  Movie     0.44    0.36      0.55    0.80      42
  Music     0.55    0.39      0.52    0.37      38
  Book      0.77    0.34      0.76    0.33      33

Diversity (divU) [5]
  Travel    3       -         3.5     -         24
  Movie     4       -         4       -         43
  Music     4       -         4       -         38
  Book      3       -         3       -         37

Diversity (divC) [1]
  Travel    0.68    0.18      0.72    0.21      19
  Movie     0.70    0.24      0.73    0.23      41
  Music     0.86    0.13      0.87    0.12      34
  Book      0.84    0.24      0.95    0.63      29

Usefulness [5]
  Travel    3.33    0.93      3.50    0.77      24
  Movie     3.32    0.85      3.31    0.86      43
  Music     3.55    0.81      3.64    0.78      38
  Book      3.39    0.83      3.62    1.21      37

Table 15. Jaccard Indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Figure 9 depicts participants' general agreement with the Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels that lay slightly above neutral

Table 16. Response Rates (TC3), in %

Domain   Regular   Aggr.-based   N
Travel   99        100           103
Movie    96        92            50
Music    100       100           53
Book     96        100           51

agreement. A subsequent t-test did not reveal any significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness differed between experiments. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001), and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach


[Figure 9: chart titled “Perceived Usefulness (Regular vs. Roll-up Queries)”; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up).]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3).

explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, whether a particular approach is better suited for a given retrieval task depends on the underlying information need, the respective domain, and the available data in the LOD repository. This conclusion is especially notable given the low mean Jaccard indices for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 in any domain, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries add value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the

domains. Participants almost always received a recommendation for this query type, whereas a regular SPARQL query often did not return a single suggestion.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC, and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. However, the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.
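The reported degrees of freedom (N − 1) are consistent with a paired (within-subjects) t-test on the list sizes of the two methods per user session; this reading is an assumption, as the exact test variant is not spelled out. The following sketch computes the paired t statistic with the Python standard library. The function name and the example list sizes are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) of a paired t-test for two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("t is undefined when all differences are equal")
    n = len(diffs)
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Hypothetical list sizes per session: baseline vs. cross-domain request.
baseline_sizes = [0, 1, 0, 2, 1]
cross_domain_sizes = [9, 10, 8, 10, 10]
t_value, df = paired_t(baseline_sizes, cross_domain_sizes)
```

A negative t value, as in the reported results, indicates that the first method (here the baseline) produced shorter lists than the second one.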

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results, which happened in more than half of


Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]
  Domain    Regular           Aggr.-based       N
            M       SD        M       SD

Accuracy (size) [10]
  Travel    9.90    0.99      9.42    2.07      103
  Movie     2.88    0.59      2.76    0.82      50
  Music     10.00   0.00      10.00   0.00      53
  Book      2.88    0.59      3.00    0.00      51

Accuracy (mrs) [1]
  Travel    48.21   24.98     44.93   25.59     102
  Movie     54.58   22.88     56.69   25.42     44
  Music     53.95   18.31     58.24   16.19     52
  Book      51.53   23.90     49.25   22.30     49

Novelty (nv) [1]
  Travel    0.43    0.39      0.38    0.40      98
  Movie     0.35    0.38      0.28    0.41      41
  Music     0.49    0.36      0.46    0.39      50
  Book      0.61    0.42      0.62    0.44      47

Diversity (divU) [5]
  Travel    3.5     -         3       -         100
  Movie     4       -         3       -         43
  Music     4       -         3       -         50
  Book      3       -         3       -         48

Diversity (divC) [1]
  Travel    0.79    0.17      0.89    0.13      99
  Movie     0.75    0.20      0.66    0.27      44
  Music     0.83    0.17      0.93    0.07      53
  Book      0.74    0.27      0.93    0.06      48

Usefulness [5]
  Travel    3.36    0.77      3.22    0.74      101
  Movie     3.20    0.92      3.26    0.94      44
  Music     3.45    0.84      3.44    0.85      51
  Book      2.95    0.87      3.03    0.96      48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4), in %

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and

[Figure 10: chart titled “Recommendation List Size”; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend: Method (Regular, Cross-Domain).]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4).

pattern matching can facilitate an improved exploration of LOD repositories.

Table 21 also shows that the items in the two sets often did not match, as indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]
  Domain    Regular (SPARQL)    Cross-Domain        N
            M        SD         M        SD

Accuracy (size) [10]
  Movie     0.82     2.09       8.66     2.73       50
  Music     0.89     1.59       9.30     2.03       53
  Book      2.90     3.98       10.00    0.00       51

Accuracy (mrs) [1]
  Movie     67.41    27.62      53.65    14.71      16
  Music     62.36    31.76      48.32    22.49      20
  Book      66.42    21.28      55.59    16.84      27

Novelty (nv) [1]
  Movie     0.59     0.49       0.41     0.34       16
  Music     0.72     0.44       0.59     0.40       19
  Book      0.54     0.41       0.63     0.30       27

Diversity (divU) [5]
  Movie     3        -          4        -          16
  Music     3        -          3        -          20
  Book      3        -          4        -          27

Diversity (divC) [1]
  Movie     (0.64)   (0.35)     (0.99)   (0.02)     (6)
  Music     (0.90)   (0.13)     (0.98)   (0.03)     (10)
  Book      (0.84)   (0.12)     (0.87)   (0.1)      (8)

Usefulness [5]
  Movie     3.57     0.83       3.35     1.57       15
  Music     3.53     1.17       3.11     0.98       20
  Book      3.41     1.04       3.41     0.94       27

Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists when users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While the novel recommendation approaches of the SKOSRec engine are not superior in each domain and test case, it is safe to say that they considerably extend common retrieval methods and at least provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the

increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a back-end application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions for designing such a user interface (UI). For instance, the selection of


LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is also specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be compared with the threshold value. In line with the operators of the grammar (>, >=, =), users can state whether scores should be "larger than", "larger than or equal to" or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be differently combined according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be either expressed as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow the retrieval of LOD resources other than from the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user was able to formulate filter conditions that referred to different RDF graphs and endpoints accordingly, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRec query, users are enabled to specify additional graph patterns which might extend the query solution, but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
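The aggregation-based postfiltering described under Aggregation can be illustrated with a minimal Python sketch. It is not the engine's implementation: the function name `roll_up`, the `parent_of` mapping (which stands in for the link between sublevel entities and the resource they are rolled up to) and the toy scores are assumptions made purely for illustration. Only the three aggregation keywords SUM, MAX and AVG are taken from the grammar.

```python
from collections import defaultdict
from statistics import mean

# Aggregation functions keyed by the AGG keywords of the grammar.
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def roll_up(scores, parent_of, how="SUM"):
    """Aggregate similarity scores of sublevel entities per parent resource.

    scores    -- dict: sublevel IRI -> similarity score (hypothetical input)
    parent_of -- dict: sublevel IRI -> parent IRI (hypothetical input)
    how       -- one of "SUM", "MAX", "AVG"
    """
    grouped = defaultdict(list)
    for item, score in scores.items():
        parent = parent_of.get(item)
        if parent is not None:  # ignore items without a known parent
            grouped[parent].append(score)
    aggregate = AGGREGATORS[how]
    rolled = {parent: aggregate(values) for parent, values in grouped.items()}
    # Highest aggregated score first.
    return sorted(rolled.items(), key=lambda pair: pair[1], reverse=True)


# Toy data: three items that belong to two parent resources.
toy_scores = {"ex:item1": 0.5, "ex:item2": 0.25, "ex:item3": 0.5}
toy_parents = {"ex:item1": "ex:parentA", "ex:item2": "ex:parentA",
               "ex:item3": "ex:parentB"}
```

With the toy data, SUM ranks `ex:parentA` first because it accumulates two scores, whereas AVG ranks `ex:parentB` first. The choice of aggregation function can therefore change the order of the rolled-up recommendations.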

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, article 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Discussion & Future Work
  • Appendix A. The SKOSRecommender Query Syntax
  • References

(SPARQL) and suitable query engines that enable fast retrieval [55].

However, graph-based queries alone are not yet sufficient to generate personalized suggestions, since they only return results for graph patterns that exist in the repository. Consider the following example: Consumers may like to use recommendation engines that work on multiple domains simultaneously. For instance, a user who has stated that he likes certain items from one domain may like to receive suggestions from another domain as well (i.e., for cross-domain retrieval). In case a user profile contains feedback information for books, it would be desirable if the system could generate movie suggestions based on this data. While the approach of querying RDF graphs can be useful for this recommendation scenario (since it can identify matching objects based on entity type declarations or typed link information), it may also be the case that there are no direct links from a preferred book to a suitable movie in the RDF graph.

Therefore, RS designers should not rely too heavily on graph-based queries, but also explore possibilities of similarity-based retrieval. A possible solution would be to identify books that are similar to the ones in the user profile. In a subsequent processing step and through a graph-based query, the engine could explore whether the similar books have any connections to movie items in the repository. This approach can reveal implicit connections in the data and might help to return fewer empty result sets to the user. Therefore, the system should process the data online. This is because one processing step (identification of suitable movies through graph pattern matching) relies on the outcome of a preceding one (computation of similar book items). For this purpose, it is important that recommendations can be quickly generated on-the-fly without further preprocessing. Only then will a system be able to switch between processing of graph- and similarity-based query parts in a flexible manner.

Almost none of the existing Linked Data recommender systems (LDRS) addresses these issues, which prevents efficient usage of available knowledge sources for retrieval tasks. Instead, current LDRS rely on a fixed set of item features, which have to be extracted from the LOD cloud before suggestions can be generated [12, 13, 22, 24, 29, 50, 64]. However, the ad-hoc retrieval and processing of item features for similarity computation in combination with graph-based queries can be facilitated by concepts from Simple Knowledge Organization System (SKOS) vocabularies [38]. SKOS annotations occur in more than half of the repositories in the LOD cloud and often describe real-world entities [53]. For instance, in the DBpedia repository, an LOD resource representing a music act is characterized by SKOS-based subject descriptors (e.g., stating the genre or the label of the music act). The descriptors are part of the DBpedia category graph [11]. A recommendation framework that utilizes SKOS annotations for item-to-item similarity calculation is applicable to a wide range of data collections and domains. Additionally, SKOS subjects are uniquely identified URI resources. Therefore, they can be conveniently processed for ad-hoc retrieval without further preprocessing [38].

This paper presents a SKOS-based recommendation engine called SKOSRecommender (SKOSRec) that can flexibly combine on-the-fly similarity calculation and SPARQL-like techniques to leverage the full potential of LOD repositories. Its effectiveness has been proven in a series of user studies. In summary, the main contributions of our paper are:

– Comprehensive literature review of the state-of-the-art in LOD-enabled information retrieval (IR) and recommender systems (Sect. 2)

– Development of the recommendation framework SKOSRecommender that is applicable to numerous LOD collections and works in an open system architecture. It provides novel retrieval approaches through:

  • Syntax and processing units for a SPARQL-based recommendation query language (Sect. 3)

  • The ability to flexibly switch between similar resource retrieval and graph pattern matching at different stages of the recommendation workflow (e.g., by being able to apply graph-based filter conditions on recommendation results or by utilizing recommendations as part of a SPARQL-based subquery) (Subsects. 3.2–3.5)

– Extensive evaluation of the SKOSRec system in a series of web-based user experiments, which have proven the effectiveness of the novel retrieval approaches, especially in terms of diversity and recall (Sect. 4)

2. Related Work

Effective answering of retrieval requests has been investigated by researchers for many years. Even before the evolution of the LOD cloud, scientists developed strategies for extracting relevant information from text data. The most prevalent IR systems perform relevance rankings for unstructured resources and offer means to state filter conditions that complement queries in a faceted search interface. The approaches can combine query constraints with similarity-based retrieval. In these systems, user requests are quickly answered due to extensive preprocessing and indexing of resources before runtime execution [58]. However, low latency of search results comes at the cost of a hard-wired data model, which prevents flexible customizations and causes limitations in terms of ad-hoc retrieval [18].

Other researchers have developed query-based RS for table data. For instance, Adomavicius et al. propose the REQUEST query language that generates suggestions from a relational database, which contains information on both user profiles and items. Thus, personalized recommendation queries can be formulated [2]. Another interesting approach in the category of query-based RS is the FlexRecs system by Koutrika et al. The key advantage of their course RS is that it enables highly individual requests in which different query parts (i.e., profile generation, filtering of recommendations or like-minded peers) can be flexibly combined into so-called recommendation workflows. Hence, highly individual recommendation queries can be formulated with FlexRecs [32].

Despite the strengths and weaknesses of the discussed approaches, it has to be taken into account that they are not designed to consume LOD. LOD-enabled systems, on the other hand, use RDF data for retrieval. There exist many index-based Linked Data IR systems [4, 9, 10, 20, 25, 40, 46, 47, 61]. However, they index RDF resources before runtime retrieval and can therefore neither facilitate on-the-fly similarity calculation nor provide extensive SPARQL-based query options.

Another interesting engine category is that of on-the-fly Linked Data IR systems. These systems process information from the LOD cloud through spreading activation algorithms. While they enable fast retrieval, the downside of these approaches is that graph-based query constraints are only possible to a limited extent due to the applied probabilistic path traversal techniques [16, 17, 28, 33, 35, 56].

Existing LDRS, on the other hand, mostly perform offline computation of item-to-item similarities from LOD metadata descriptions. While this approach offers more control over the retrieval process, it prevents individual customizations and runtime responses [12, 13, 22, 24, 29, 50, 64].

To date, there are only a few query-based LDRS [5, 43, 49]. Most of these systems rely on the assumption that user preferences and item metadata reside in the same repository. But this goes against the notion of the LOD cloud as a decentralized and openly accessible data space, which can be queried through API-like interfaces. Additionally, while query-based LDRS provide SPARQL-based query options to filter recommendation results, they are not capable of flexibly combining similar resource retrieval with graph filters (e.g., subquerying with recommendation results).

In summary, existing non-Linked Data as well as Linked Data-enabled retrieval and RS engines already provide useful features for personalized resource access. Nevertheless, the full potential of LOD for recommendation and search tasks has yet to be realized (see Tab. 1). The novel query strategies of the SKOSRec engine are attempts to tackle the previously outlined research gaps, to improve existing RS as well as to offer new search strategies for LOD repositories that are guided by personal preferences.

3. The SKOS Recommender

3.1. System Overview

The SKOSRecommender is designed to consume recommendation requests and was developed with the help of the Apache Jena framework.² A SKOSRec query triggers a recommendation workflow in which similarity-based retrieval steps can be joined with SPARQL-like filter expressions. Listing 1 shows the syntax of the SKOSRec query language that needs to be applied to formulate recommendation requests.

In the given grammar, underlined parts represent SPARQL syntax elements that have been directly taken from the latest W3C specification [21]. Words in capital letters denote query keywords, which are also SPARQL expressions, except for the ones listed in the language parts SimProjection and ServiceIntegration. Like a regular SPARQL SELECT query, a SKOSRec query always generates a table-like result set that contains at least a mapping to a single variable. A detailed explanation of each part of the SKOSRec grammar can be found in Appendix A of this paper.

² https://jena.apache.org

Table 1. Summary of existing retrieval and recommendation approaches

System type                          LOD-enabled   (A) SPARQL-based   (B) On-the-fly   (C) Flexible combination
                                                   queries            similarity       of A and B
IR systems                           no            no                 no               no
Query-based RS                       no            no                 yes              (yes)
Index-based Linked Data IR systems   yes           no                 no               no
On-the-fly Linked Data IR systems    yes           no                 yes              no
LDRS                                 yes           no                 no               no
Query-based LDRS                     yes           yes                no               no

Listing 1. Grammar of the SKOSRec query language

    SKOSRecQuery                = Prologue SelectPart? SimProjection PREF ItemPart+ RecWhereClause?
    SelectPart                  = SELECT ( DISTINCT | REDUCED )? Var+ RecWhereClause LimitClause? Aggregation?
    Aggregation                 = AGG IRIref Var ( SUM | MAX | AVG )
    SimProjection               = RECOMMEND Var TOP INTEGER ServiceIntegration?
    ServiceIntegration          = FROM SERVICE IRIREF
    ItemPart                    = ( VarPart | IRI ) Sim?
    VarPart                     = [ Var RecWhereClause ]
    Sim                         = SIM Relation DECIMAL
    Relation                    = ( > | >= | = )
    RecWhereClause              = WHERE RecGroupGraphPattern
    RecGroupGraphPattern        = '{' RecGroupGraphPatternSub '}'
    RecGroupGraphPatternSub     = TriplesBlock? ( RecGraphPatternNotTriples '.'? TriplesBlock? )*
    RecGraphPatternNotTriples   = RecGroupOrUnionGraphPattern | RecOptionalGraphPattern | RecMinusGraphPattern | Filter | Bind | InlineData
    RecGroupOrUnionGraphPattern = RecGroupGraphPattern ( 'UNION' RecGroupGraphPattern )*
    RecOptionalGraphPattern     = 'OPTIONAL' RecGroupGraphPattern
    RecMinusGraphPattern        = 'MINUS' RecGroupGraphPattern

Figure 1 depicts the overall architecture of the system with all its components. The most important parts are the SKOSRec query processing units (i.e., the parser and the compiler). They check the syntax of recommendation requests and regulate the correct execution of critical operations of the engine (e.g., graph-based filtering, result set joins, similarity calculation). The architecture does not dictate a particular SPARQL server implementation for LOD repositories, since the system is decoupled from the endpoint that is accessed over HTTP during retrieval. However, the prototypical implementation of the SKOSRec engine utilizes the OpenLink Virtuoso server.³

³ https://virtuoso.openlinksw.com

Fig. 1 also shows the sequential order of a typical recommendation workflow for a complex request that consists of the following retrieval steps:

1. Issuing a SKOSRec query
2. Parsing the SKOSRec query
3. Loading information on the LOD repository/SPARQL endpoint to be queried from the configuration
4. Retrieval of preferred items and setting up of the user profile (see Subsect. 3.3 Preference Querying)
5. Optimization of the retrieval process through workload reduction
6. Determination of similar LOD resources based on the user profile (see Subsect. 3.2 On-the-Fly Recommendations)
7. Ranking of LOD resources
8. Sending of recommendation results to the compiler and joining them with postfilter statements
9. Retrieval of the final result set (see Subsect. 3.5 Postfiltering & Combinations)
10. Output to the user
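The last steps of this workflow (ranking, joining with postfilter results and truncating to the requested number of suggestions) can be pictured with a small Python sketch. It is a toy illustration under stated assumptions and not the engine's code: the function names `rank` and `recommend`, the placeholder IRIs and the decision to apply the postfilter before truncation are all hypothetical, since the order of these two operations is not specified in this section.

```python
def rank(scores):
    """Step 7: order candidate IRIs by descending similarity score.

    Ties are broken alphabetically so that the output is deterministic.
    """
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


def recommend(scores, top, postfilter_matches=None):
    """Steps 7 to 9: rank, optionally join with postfilter matches, truncate.

    scores             -- dict: candidate IRI -> similarity score (toy input)
    top                -- number of suggestions, as in 'TOP n'
    postfilter_matches -- set of IRIs that satisfy the postfilter graph
                          pattern, or None if no postfilter was stated
    """
    ranked = rank(scores)
    if postfilter_matches is not None:
        # Assumption: the join keeps only candidates that match the filter.
        ranked = [(iri, s) for iri, s in ranked if iri in postfilter_matches]
    return ranked[:top]


# Toy similarity scores for three candidate resources.
toy = {"ex:a": 0.9, "ex:b": 0.4, "ex:c": 0.7}
```

Calling `recommend(toy, 2)` keeps the two best-scored candidates, while `recommend(toy, 2, {"ex:b", "ex:c"})` first discards `ex:a` because it does not match the hypothetical postfilter.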

Fig. 1. Architecture and workflow of the SKOSRec engine

3.2. On-the-Fly Recommendations

A simple on-the-fly request is the most basic form of a SKOSRec query. It is based on the engine's ability to generate ad-hoc recommendations from a user profile containing preference statements for LOD resources. Listing 2 depicts an example SKOSRec on-the-fly request that obtains suggestions based on a user's preference for the western movie "They Call Me Trinity". In the given query, the preferred movie is represented by the DBpedia resource (dbr:They_Call_Me_Trinity). The shown query also contains a prefix with a namespace declaration and a specification regarding the number of suggestions the engine should generate.

Listing 2. A simple recommendation query for the movie domain (Q1)

PREFIX dbr: <http://dbpedia.org/resource/>

RECOMMEND ?movie TOP 5
PREF dbr:They_Call_Me_Trinity

In this basic form, the query contains neither pre- or postfilter conditions nor a preference query statement. Thus, the engine simply identifies all movies that are similar to the preference in the profile. When executed over DBpedia, the query returns the solution set that is depicted in Table 2.

Table 2. Result set for Q1

movie
dbr:God_Forgives'_I_Don't
dbr:Ace_High_(1968_film)
dbr:Boot_Hill_(film)
dbr:Trinity_Is_Still_My_Name
dbr:Troublemakers_(1994_film)

For determining recommendations in an ad-hoc fashion, the system identifies SKOS annotations by matching a suitable property (i.e., annotation property) in an RDF dataset (Definition 1).

Definition 1 (Annotation property). An annotation property is an IRI (Listing 3) that is defined in the Dublin Core Metadata Initiative (DCMI) specification for subject annotations. It can occur in conjunction with one of the two namespaces which are stated by the standard [15].

Listing 3. Annotation property

<ANNOTPROP> = dc:subject | dct:subject


These properties help to retrieve SKOS annotation triples for items preferred by a user (Definition 2).

Definition 2 (SKOS annotation triple). A SKOS annotation triple ($st$) is a triple that is comprised of an IRI in the subject position, an annotation property (ANNOTPROP) as predicate and a concept $c$ in the object position. The concept is part of a knowledge organization system $S$ in SKOS format ($C = \{c \mid c \in S\}$) (Eq. 1).

\[ st \in I \times \texttt{<ANNOTPROP>} \times C \tag{1} \]

The identification of SKOS annotation triples facilitates a fast retrieval of potentially similar items. To this end, the engine evaluates shared features between resources from the SKOS annotation dataset (Definition 3).

Definition 3 (SKOS annotation dataset). A SKOS annotation dataset ($AD$) is a subset of an RDF dataset that contains SKOS annotation triples (Eq. 2).

\[ AD \subset I \times \texttt{<ANNOTPROP>} \times C \tag{2} \]

Before the similarity calculation process starts, the engine determines SKOS annotations for each LOD resource ($r$) from the user profile by issuing a subsequent SPARQL query to the LOD repository from which the user intends to receive suggestions. The notion of SKOS annotations is specified in Definition 4.

Definition 4 (SKOS annotations). In the annotation dataset, LOD resources directly link to concepts of a SKOS system via a predefined annotation property. SKOS annotations for an input resource $r$ are defined as in Equation 3.

\[ Annot(r) = \{ c \in S \mid \exists\, (r, \texttt{<ANNOTPROP>}, c) \in AD \} \tag{3} \]

Afterwards, the engine identifies which of the other resources in the dataset share annotations with it. Similarities only have to be computed for items with matching concepts [13, 52]. Therefore, triples can be efficiently joined along item features, since native RDF storage systems usually index triples rather than columns [41]. Thus, the quick identification of mutual annotations through SPARQL requests facilitates ad-hoc similarity calculation. Definition 5 specifies the notion of relevant resources and their annotations that are needed to identify similar items. The applied syntax follows Pérez et al. [42].

Definition 5 (Relevant resources and their annotations). The conjunctive graph pattern $P_r$ matches all items and subjects that are potentially relevant for recommendation retrieval (Eq. 4).

\[ P_r = (r, \texttt{<ANNOTPROP>}, c) \ \text{AND} \ (q, \texttt{<ANNOTPROP>}, c) \tag{4} \]

The mapping $\Omega_r$ of relevant resources and their SKOS annotations is obtained by retrieving all resources $q$ that share at least one SKOS concept $c$ with resource $r$ from $AD$ (Eq. 5). The corresponding result set contains mappings for all variables (i.e., $c$ and $q$) in the triple statements of $P_r$.

\[ \Omega_r = [\![P_r]\!]_{AD} \tag{5} \]
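The graph pattern of Eq. 4 can be illustrated with a minimal sketch that builds a corresponding SPARQL request. The function name, the default choice of dct:subject as annotation property and the FILTER that drops the input resource itself are assumptions made for illustration; they are not taken from the SKOSRec implementation.

```python
def build_relevant_resources_query(
    resource_iri: str,
    annotation_property: str = "http://purl.org/dc/terms/subject",
) -> str:
    """Build a SPARQL query for the pattern P_r of Eq. 4.

    The query returns every resource ?q that shares at least one
    concept ?c with the input resource r (hypothetical helper).
    """
    return (
        "SELECT ?q ?c WHERE {\n"
        f"  <{resource_iri}> <{annotation_property}> ?c .\n"
        f"  ?q <{annotation_property}> ?c .\n"
        # Assumption: the input resource should not recommend itself.
        f"  FILTER (?q != <{resource_iri}>)\n"
        "}"
    )


query = build_relevant_resources_query(
    "http://dbpedia.org/resource/They_Call_Me_Trinity"
)
print(query)
```

Each row of the result corresponds to one solution mapping in $\Omega_r$, i.e., one pair of a candidate resource and a shared concept.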

Upon extraction of relevant resources and annotations, the process of similarity calculation can start. The engine determines the information content (IC) of the shared SKOS annotations of two resources. This approach is based on the hybrid similarity metric proposed by Meymandpour and Davis for LOD-enabled RS [37]. However, while their measure considers all types of IRI resources for the retrieval, our similarity metric purely relies on SKOS annotations (Definition 6).

Definition 6 (SKOS similarity). Let $Annot(r)$ be the set of SKOS features of resource $r$ and $Annot(q)$ the set of SKOS features of resource $q$, and $q \in \{\mu(q) \mid \mu \in \Omega_r\}$; then their similarity can be derived from the IC of their shared concepts $C_{shared} = Annot(r) \cap Annot(q)$.

\[ sim(r, q) = IC(C_{shared}) \tag{6} \]

The IC of a set of SKOS concepts is measured by the sum of the negative logarithms of each concept's frequency in relation to the maximum frequency among relevant resources (Definition 7).

Definition 7 (SKOS Information Content). The SKOS IC is defined as an aggregation of the individual IC values of each concept that is contained in the set of shared features ($c \in C_{shared}$), where $freq(c)$ is the frequency of $c$ among all relevant resources and $n$ is the maximum frequency among these resources. The final similarity score of two resources $r$ and $q$ is determined by summing the IC values of each concept of the shared feature set.

\[ IC(C_{shared}) = -\sum_{c \in C_{shared}} \log\left(\frac{freq(c)}{n}\right) \tag{7} \]
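Definitions 6 and 7 can be made concrete with a small sketch. It assumes the natural logarithm (the base is not stated above), counts $freq(c)$ over the shared concepts of the candidate resources in $\Omega_r$, and reads $n$ as the highest of these counts. The function name and the toy annotations are hypothetical.

```python
import math
from collections import Counter


def skos_similarities(annot_r, candidates):
    """Return sim(r, q) = IC(C_shared) for every candidate q (Eqs. 6 and 7).

    annot_r:    set of concepts annotated to the profile resource r
    candidates: dict mapping each candidate q to its set of concepts
    """
    # Only resources sharing at least one concept with r are relevant.
    shared = {q: concepts & annot_r
              for q, concepts in candidates.items()
              if concepts & annot_r}
    if not shared:
        return {}
    # freq(c): number of relevant resources annotated with c.
    freq = Counter(c for concepts in shared.values() for c in concepts)
    n = max(freq.values())  # assumed reading of "maximum frequency"
    return {q: -sum(math.log(freq[c] / n) for c in concepts)
            for q, concepts in shared.items()}


sims = skos_similarities(
    {"A", "B"},
    {"q1": {"A", "B"}, "q2": {"A"}, "q3": {"A", "C"}, "q4": {"D"}},
)
print(sims)
```

In this toy example concept A occurs in all three relevant resources and therefore carries no information, whereas the rarer concept B gives q1 a positive score of log(3).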

After having obtained resource annotations as well as IC values, similarity scores between the input resource $r$ and each potentially relevant resource $q$ can be calculated. When a user profile contains more than a single item, similarity values are aggregated through summation to determine the final recommendation score (Definition 8).

Definition 8 (Recommendation score). The recommendation score of a potentially relevant resource $q$ is quantified by the sum of similarities with each resource $r$ that can be found in the profile ($Pr$).

\[ score(Pr, q) = \sum_{r \in Pr} sim(r, q) \tag{8} \]

Upon calculation of similarity values for each potentially relevant item $q$, the engine ranks the retrieved resources and cuts the list off at a given limit ($k$) that is specified by the user in the SKOSRec query.

With the presented methods of on-the-fly retrieval, recommendations can be generated from LOD repositories through an API without requiring local metadata or excessive preprocessing operations other than the mapping of profile items with LOD resources.4
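The scoring and ranking step of Definition 8 can be sketched as follows. The function name, the input layout and the alphabetical tie-breaking are assumptions for illustration; the exclusion of profile items follows the statement in Definition 14 that recommended resources are not contained in the profile.

```python
from collections import defaultdict


def recommend(profile_sims, profile, k):
    """Sum sim(r, q) over all profile items r (Eq. 8) and return the top k.

    profile_sims: dict {r: {q: sim(r, q)}} with one entry per profile item
    profile:      set of profile items, which are never recommended
    k:            number of recommendations requested in the query
    """
    scores = defaultdict(float)
    for sims in profile_sims.values():
        for q, sim in sims.items():
            if q not in profile:
                scores[q] += sim
    # Highest score first; ties are broken alphabetically (assumption).
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


top = recommend(
    {"r1": {"q1": 1.0, "q2": 0.5, "r2": 2.0},
     "r2": {"q1": 0.25, "q3": 2.0}},
    profile={"r1", "r2"},
    k=2,
)
print(top)
```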

3.3. Preference Querying

In the previous subsection, it was assumed that users express their tastes as preference statements for a certain item, each given as an IRI referring to a LOD resource. However, it may also be the case that likings are only vaguely known, for instance, when users prefer products with specific features but are not able or willing to specify the actual items. The engine obtains such a profile by issuing a subsequent SPARQL query that contains the preference graph pattern ($P_{pref}$) in the WHERE condition. Suppose a user is interested in Quentin Tarantino movies (dbr:Quentin_Tarantino) but does not specify the precise items he likes. He could send a preference query to the SKOSRec engine. The graph pattern in the request would match all movies that are connected to dbr:Quentin_Tarantino with the properties dbo:director or dbo:producer (see Listing 4).

4 Di Noia et al. describe how a mapping procedure can be conducted for a real-world implementation. For their LDRS, they matched movie titles with the corresponding LOD resources in DBpedia. The authors extracted IRIs, labels and publication dates for all movie resources. They applied the Levenshtein distance on these resources to identify matching movie items [13].

Listing 4. SKOSRec query to generate recommendations based on a preference query in the movie domain (Q2)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>

RECOMMEND ?movie TOP 5
PREF [ ?prefMovie
WHERE { { ?prefMovie dbo:director dbr:Quentin_Tarantino }
        UNION
        { ?prefMovie dbo:producer dbr:Quentin_Tarantino } } ]

Table 3 shows the corresponding solution set to the preference query, containing films from DBpedia that are similar to Quentin Tarantino movies.

Table 3. Result set for Q2

movie
dbr:The_Godfather_Part_II
dbr:The_Prestige_(film)
dbr:The_Dark_Knight
dbr:Planet_Terror
dbr:Clerks

The notion of a queried profile is specified in Definition 9.

Definition 9 (Queried profile). A queried profile is a user profile that has been compiled by retrieving all LOD resources $r$ that are part of an annotation graph $AD$ and are contained in the solution mappings resulting from the evaluation of a specified graph pattern ($P_{pref}$) over a dataset $D$ (Eq. 9).

\[ Pr = \{ \mu(r) \mid \mu \in [\![P_{pref}]\!]_D,\ r \in dom(\mu),\ r \in AD \} \tag{9} \]


3.4. Prefiltering

To facilitate exploratory search in LOD repositories, users should also have the option to formulate conditions in their queries. One way to enable filters during retrieval is to apply them before similarity calculation. Thus, recommendation lists can be adjusted to individual needs.

Due to the expressiveness of RDF data, filter conditions can not only be applied on attributes that are directly connected to potentially relevant items, but can also be declared as a graph pattern. By this means, it is possible to retrieve LOD resources that are both similar to the user profile and fulfill an advanced condition.

The strengths of the approach will be illustrated with a travel recommendation request that was executed over DBpedia. Suppose a customer expressed a preference for the Lake Baikal region (e.g., as indicated by a positive rating on a travel community site). With simple on-the-fly retrieval, he would receive a recommendation list that primarily consists of travel destinations that are located in the proximity of this geographical area. A graph-based filter, on the other hand, facilitates advanced retrieval options. In the given example, it enables the user to obtain suggestions for Southeast Asian travel destinations that have similar features as the Lake Baikal region, even though the resource dbr:Lake_Baikal is not directly connected to any destination with the annotation dbc:Southeast_Asia, but can only be identified by exploring the surrounding graph structure of the filter attribute.

The advanced constraint ensures that the generated result set is not empty. A simple attribute-level filter, which is commonly utilized in faceted search systems, would not generate any search hits. By these means, graph-based filter patterns can help users to better explore knowledge graphs, thereby overcoming data quality issues in LOD repositories. The corresponding SKOSRec query to this example is shown in Listing 5.

Listing 5. SKOSRec query with an expressive prefilter condition to retrieve Southeast Asian destinations based on a preference for the Lake Baikal region (Q3)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

RECOMMEND ?destination TOP 5
PREF dbr:Lake_Baikal
[ WHERE { ?destination dbo:country ?country .
          ?country dct:subject ?countrySubject .
          ?countrySubject skos:broader dbc:Southeast_Asia .
          ?destination rdf:type dbo:Place }
]

Table 4 shows the solution to this query. It contains travel destinations and regions of interest that are both similar to the previously liked destination dbr:Lake_Baikal and located in Southeast Asia. Ecoregions around lakes, rivers or mountains from Southeast Asian countries, such as Cambodia, Vietnam or Malaysia, are presented to the user.

Table 4. Result set of Q3

destination
dbr:Tonle_Sap
dbr:Mekong
dbr:Mount_Kinabalu
dbr:Laguna_de_Bay
dbr:Lake_Toba

The technical details for this type of query are given in the following definitions. Before the extraction of relevant resources and annotations, the recommendation engine identifies the resources that satisfy the filter (Definition 10).

Definition 10 (Prefiltered resources). The mapping of filtered resources $\Omega_{prefilter}$ is obtained by matching a specified graph pattern ($P$) to the RDF graph of a dataset $D$ (Eq. 10).

\[ \Omega_{prefilter} = \{ \mu_{|q} \mid \mu \in [\![P]\!]_D \} \tag{10} \]

After preselection, resources are brought together with user profile data to retrieve the final set of recommendations (Definition 11).

Definition 11 (Annotations and relevant resources (prefiltered retrieval)). The solution mappings of annotations and relevant resources in prefiltered retrieval mode are obtained by joining the set of prefiltered resources with the mappings $\Omega_r$ of all relevant resources and annotations as determined from the user profile.

\[ \Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_r \tag{11} \]

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and respective SKOS similarities are quantified according to the user filter (Definitions 12 and 13). By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources.

Definition 12 (Conditional SKOS similarity). Let $Annot(r)$ be the set of SKOS features of $r$ and $Annot(q)$ the set of SKOS features of resource $q$, which is contained in the filtered set of annotations and relevant resources ($q \in \{\mu(q) \mid \mu \in \Omega_{pre}\}$); then the similarity can be derived from the IC of their shared concepts ($C_{cond} = Annot(r) \cap Annot(q)$) (Eq. 12).

\[ sim_{cond}(r, q) = IC(C_{cond}) \tag{12} \]

Definition 13 (Conditional SKOS Information Content). The conditional IC of a set of SKOS annotations is defined by the sum of the IC of each concept $c \in \{\mu(c) \mid \mu \in \Omega_{pre}\}$, where $freq_{cond}(c)$ is the frequency of $c$ among all filtered resources and $n$ is the maximum frequency among these resources (Eq. 13).

\[ IC(C_{cond}) = -\sum_{c \in C_{cond}} \log\left(\frac{freq_{cond}(c)}{n}\right) \tag{13} \]

The system carries out the calculations (i.e., determination of similarity values, the ranking of LOD resources) in the same way as in non-filtering mode.
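The effect of the prefilter on the IC values can be illustrated with a minimal sketch of Definitions 11 to 13. It assumes the natural logarithm, models the join of Eq. 11 as a membership test against the set of prefiltered resources, and counts frequencies only over the resources that survive the join. All names and the toy data are hypothetical.

```python
import math
from collections import Counter


def conditional_similarities(annot_r, candidates, prefiltered):
    """Return sim_cond(r, q) for candidates q that pass the prefilter.

    annot_r:     set of concepts annotated to the profile resource r
    candidates:  dict mapping each candidate q to its set of concepts
    prefiltered: set of resources matched by the prefilter pattern
    """
    # Join step (Eq. 11): keep candidates that share a concept with r
    # and are also contained in the prefiltered set.
    filtered = {q: concepts & annot_r
                for q, concepts in candidates.items()
                if q in prefiltered and concepts & annot_r}
    if not filtered:
        return {}
    # freq_cond(c) is counted over the filtered resources only (Eq. 13).
    freq = Counter(c for concepts in filtered.values() for c in concepts)
    n = max(freq.values())
    return {q: -sum(math.log(freq[c] / n) for c in concepts)
            for q, concepts in filtered.items()}


sims = conditional_similarities(
    {"A", "B"},
    {"q1": {"A", "B"}, "q2": {"A"}, "q3": {"A"}, "q4": {"B"}},
    prefiltered={"q1", "q4", "q9"},
)
print(sims)
```

Without the filter, concept A would be the most frequent one (three resources). After filtering, A occurs only once and B twice, so A becomes the more informative concept and q1 receives a score of log(2).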

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources, it can also apply user constraints after similar items have already been determined. By this means, recommendations are filtered ex post to the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5. Result set of Q4

director              movie
dbr:Kirk_Douglas      dbr:Scalawag_(film)
dbr:Kirk_Douglas      dbr:Posse_(1975_film)
dbr:Kevin_Costner     dbr:The_Postman_(film)
dbr:Kevin_Costner     dbr:Dances_with_Wolves
dbr:Gene_Kelly        dbr:Hello_Dolly_(film)
dbr:Gene_Kelly        dbr:Invitation_to_the_Dance_(film)
dbr:Alan_Alda         dbr:Goodbye_Farewell_and_Amen
dbr:Alan_Alda         dbr:Margaret's_Engagement
dbr:Steve_Buscemi     dbr:Interview_(2007_film)
dbr:Steve_Buscemi     dbr:Animal_Factory

Listing 6. SKOSRec query to generate simple postfiltered recommendations in the movie domain (Q4)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE { ?movie dbo:director ?director .
        ?director rdf:type yago:Director110014939 }
LIMIT 50
RECOMMEND ?director TOP 5
PREF dbr:Quentin_Tarantino

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.


Another weakness is the biased semantics of the query. Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies he/she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to several LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).

Aggregation-based requests can be well combined with other retrieval patterns, such as preference queries. In the case of the movie recommendation scenario, a user could state that he enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors that shot similar films (see Listing 7). Typically, this kind of question can only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate postfiltered recommendations based on a preference query in the movie domain (Q5)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE { ?movie dbo:director ?director .
        ?director rdf:type yago:Director110014939 }
LIMIT 50
AGG dbr:Quentin_Tarantino ?director SUM
RECOMMEND ?movie TOP 100
PREF [ ?prefMovie
WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino } ]

Table 6. Result set of Q5

director                      movie
dbr:Steven_Spielberg          dbr:Indiana_Jones
dbr:Steven_Spielberg          dbr:Jurrasic_Park_(film)
dbr:Francis_Ford_Coppola      dbr:The_Godfather_Part_II
dbr:Francis_Ford_Coppola      dbr:The_Godfather_Part_III
dbr:Christopher_Nolan         dbr:The_Prestige_(film)
dbr:Christopher_Nolan         dbr:The_Dark_Knight
dbr:Robert_Rodriguez          dbr:Planet_Terror
dbr:Robert_Rodriguez          dbr:Machete_(film)
dbr:Kevin_Smith               dbr:Clerks
dbr:Kevin_Smith               dbr:Clerks_II

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they differ completely. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style5), Q4 provided a different kind of output. It shows US American actor-directors whose movies range from musical films (dbr:Hello_Dolly_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Table 5) seem rather arbitrary and not exceptionally helpful concerning the initial user request.

The combined aggregation-based query can also be described as a roll-up request, since suggestions are based on preferences for sublevel entities. Another example for a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

5 https://en.wikipedia.org/wiki/Quentin_Tarantino


Listing 8. SKOSRec query to generate postfiltered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dul: <http://www.ontologydesignpatterns.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?city
WHERE { ?poi ?property ?city .
        ?city rdf:type dbo:City .
        ?property rdfs:subPropertyOf dul:hasLocation }
LIMIT 3
AGG dbr:London ?city SUM
RECOMMEND ?poi TOP 100
PREF
dbr:Oxford_Street
dbr:Notting_Hill
dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7. Result set of Q6

city
dbr:Oxford
dbr:Abingdon-on-Thames
dbr:Wallingford_Oxford

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere rather than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.

Besides the above presented roll-up retrieval patterns, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?participated ?musicalAct
WHERE { ?movie rdf:type dbo:Film .
        VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
        { ?musicalAct ?participated ?movie }
        UNION
        { ?movie ?participated ?musicalAct } }
LIMIT 10
AGG dbr:The_Jackson_5 ?movie MAX
RECOMMEND ?musicalAct TOP 100
PREF dbr:The_Jackson_5
WHERE { VALUES ?actType { dbo:MusicalArtist dbo:Band }
        ?musicalAct rdf:type ?actType }

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are summarized with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tables 8 and 9). The SKOSRec query produces a more comprehensive solution than a regular SPARQL query.

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?musicalAct
WHERE { ?movie rdf:type dbo:Film .
        VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
        { dbr:The_Jackson_5 ?participated ?movie }
        UNION
        { ?movie ?participated dbr:The_Jackson_5 } }
LIMIT 5

Table 8. Result set of Q7

movie                            musicalAct
dbr:Moonwalker                   dbr:Michael_Jackson
dbr:The_Wiz_(film)               dbr:Ashford_&_Simpson
dbr:Garfield_Gets_a_Life         dbr:The_Temptations
dbr:Pipe_Dreams_(1976_films)     dbr:Gladys_Knight_&_the_Pips
dbr:The_Jacksons                 dbr:Jermanine_Jackson

Table 9. Result set of Q8

movie                musicalAct
dbr:The_Jacksons     dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 14 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 14 (SKOSRec recommendations). SKOSRec recommendations ($R(Pr)$) are the set of suggestions generated by processing the solution mappings ($\Omega$) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, prefiltering or a combination of these techniques. $Pr$ represents the input profile that contains at least one preference for an item. The cardinality of $R(Pr)$ is the intended number of recommendations ($k$) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 14).

\[ R(Pr) = \{ x \mid x \in \{ q \mid score(Pr, q) > \operatorname{kmax}_{q \in \Omega} score(Pr, q),\ q \notin Pr \} \} \tag{14} \]

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions $R(Pr)$ can be joined with any graph pattern $P_{post}$ subsequent to the process of similarity calculation (Definition 15). It does not make a difference how the engine obtained these recommendations.

Definition 15 (Postfiltered recommendation mapping). The postfiltered recommendation mapping ($\Omega_{post}$) is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern (Eqs. 15 and 16).

\[ \Omega_{postfilter} = \{ \mu \mid \mu \in [\![P_{post}]\!]_D,\ x \in dom(\mu) \} \tag{15} \]

\[ \Omega_{post} = \{ \mu_{|x} \mid x \in R(Pr) \} \bowtie \Omega_{postfilter} \tag{16} \]
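The join of Eq. 16 can be sketched with solution mappings represented as plain dictionaries. This is only an illustration of the join semantics under that assumed representation; the function name, the variable name x and the sample bindings are hypothetical.

```python
def postfilter_join(recommendations, postfilter_mappings, var="x"):
    """Keep the postfilter mappings whose binding for `var` is recommended.

    recommendations:     iterable of recommended resources, i.e. R(Pr)
    postfilter_mappings: list of dicts, one per solution mapping of P_post
    var:                 name of the variable that holds the recommendation
    """
    recommended = set(recommendations)
    return [mapping for mapping in postfilter_mappings
            if mapping.get(var) in recommended]


joined = postfilter_join(
    ["dbr:Director_A", "dbr:Director_B"],
    [
        {"x": "dbr:Director_A", "movie": "dbr:Movie_1"},
        {"x": "dbr:Director_C", "movie": "dbr:Movie_2"},
        {"x": "dbr:Director_B", "movie": "dbr:Movie_3"},
    ],
)
print(joined)
```

Mappings that do not bind the recommendation variable, or that bind it to a resource outside $R(Pr)$, are dropped.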

As with the prefilter, $P_{post}$ can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item ($a$) and similar sublevel entities. Additionally, the user has to specify the variable in his profile (e.g., $y$) that represents the items within the postfilter graph pattern $P_{post}$. By this means, aggregation-based recommendations can be generated (Definition 16).

Definition 16 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item $a$ in the profile (Eq. 17).

\[ R_{agg}(a) = \{ \mu(y) \mid \mu \in \Omega_{post},\ y \in dom(\mu),\ a \notin \mu(y) \} \tag{17} \]

For each potential aggregation-based suggestion ($y$), similarity scores of connected entities ($x$) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 17 (Aggregation-based ranking score). The ranking score of an LOD resource $y$ that is connected to a set of recommended items is determined by aggregating scores of connected entities, either by the maximum (Eq. 18), the sum (Eq. 19) or the average (Eq. 20) of individual scores.

\[ score_{aggMAX}(y) = \max_{x \in \mu(x, y)} score(Pr, x) \tag{18} \]

\[ score_{aggSUM}(y) = \sum_{x \in \mu(x, y)} score(Pr, x) \tag{19} \]

\[ score_{aggAVG}(y) = \frac{\sum_{x \in \mu(x, y)} score(Pr, x)}{|\{x \in \mu(x, y)\}|} \tag{20} \]
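The three aggregation variants of Eqs. 18 to 20 can be sketched as follows. The representation of $\Omega_{post}$ as (x, y) pairs, the function name and the simplification that only the superordinate item itself is excluded (rather than every resource linked to it, as in Definition 16) are assumptions made for illustration.

```python
from collections import defaultdict

# Aggregation functions corresponding to Eqs. 18 (MAX), 19 (SUM) and 20 (AVG).
AGGREGATORS = {
    "MAX": max,
    "SUM": sum,
    "AVG": lambda values: sum(values) / len(values),
}


def aggregate_scores(pairs, scores, superordinate, mode="SUM"):
    """Rank resources y by the aggregated scores of their connected items x.

    pairs:         iterable of (x, y) bindings from the postfilter mapping
    scores:        dict {x: score(Pr, x)} of the recommended items
    superordinate: preferred item a, which is excluded from the output
    mode:          "MAX", "SUM" or "AVG"
    """
    grouped = defaultdict(list)
    for x, y in pairs:
        if y != superordinate and x in scores:
            grouped[y].append(scores[x])
    aggregate = AGGREGATORS[mode]
    ranked = ((y, aggregate(values)) for y, values in grouped.items())
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


pairs = [("m1", "d1"), ("m2", "d1"), ("m3", "d2"), ("m4", "a")]
scores = {"m1": 2.0, "m2": 1.0, "m3": 2.5, "m4": 4.0}
for mode in ("MAX", "SUM", "AVG"):
    print(mode, aggregate_scores(pairs, scores, "a", mode))
```

In the toy data, d1 ranks first under sum-based aggregation because two of its items match the profile, while d2 ranks first under maximum- and average-based aggregation because of its single strong match.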

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in the context of web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach out to more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might have caused biased results otherwise. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology.6,7

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users ran step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., through rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

6 http://tomcat.apache.org
7 http://www.oracle.com/technetwork/java/javaee/servlet/index.html

Subjects were recruited through the clickworker.com platform.8 Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just run through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis. The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits.

After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). It was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.9

It enabled users to type in keywords (e.g., the name of a music act or the author of a book) for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.10 The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

Aside from the AJAX feature, the appeal of the interface

8 https://www.clickworker.de
9 http://api.jquery.com/jquery.ajax
10 http://lucene.apache.org/solr

14 L Wenige et al Similarity-based Graph Queries

Fig. 2. User profile generation TC1 (music domain)

was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far ahead they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply a screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance, outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale. The score, called mean relevance (mrs), was measured with Equation 21.

\[ mrs = \frac{\sum_{i=1}^{k} score_i}{k} \tag{21} \]

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, in the case of an RS only suggesting popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators such as novelty were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 22).

\[ nv = \frac{|nov|}{|tp|} \tag{22} \]
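The two scores from Eq. 21 and Eq. 22 can be illustrated with a minimal Python sketch. It assumes that each rated item is available as a pair of slider score and "new to me" flag and that an item counts as relevant from a score of 50 upwards, as stated in the text. The function names and the example data are hypothetical and not taken from the SKOSRec implementation.

```python
RELEVANCE_THRESHOLD = 50  # slider score from which an item counts as relevant


def mean_relevance(scores):
    """Eq. 21: mean of the slider scores (0 to 100) of one recommendation list."""
    if not scores:
        raise ValueError("mrs is undefined for an empty recommendation list")
    return sum(scores) / len(scores)


def novelty(ratings):
    """Eq. 22: share of relevant items that were new to the user.

    ratings: list of (score, is_new) pairs for one recommendation list.
    Returns None when no item is relevant, because |tp| would be zero.
    """
    relevant = [is_new for score, is_new in ratings if score >= RELEVANCE_THRESHOLD]
    if not relevant:
        return None
    return sum(1 for is_new in relevant if is_new) / len(relevant)


# Hypothetical ratings for a list of four recommendations
ratings = [(80, True), (65, False), (30, True), (50, True)]
mrs = mean_relevance([score for score, _ in ratings])
nv = novelty(ratings)
```

In this sketch, lists without any relevant item yield no novelty score at all, which is one plausible way to handle the undefined ratio; the source does not say how such cases were treated.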

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval TC1 (music domain)

An item was considered relevant when it achieved an mrs score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 65]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse is the result set [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 23) and averaged for the recommendation list (Eq. 24).

\[ d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{23} \]

\[ div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{24} \]
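As a rough illustration of Eq. 23 and Eq. 24, the following Python sketch averages the pairwise cosine distances of a recommendation list. It assumes that every item is represented as a sparse vector of annotation weights (a dict from feature to weight); this representation and the function names are assumptions for the example, not the engine's actual data model.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {feature: weight} dicts."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # items without annotations share nothing with other items
    return dot / (norm_a * norm_b)


def content_diversity(items):
    """Eq. 24: mean pairwise distance d = 1 - cos (Eq. 23) over all item pairs."""
    k = len(items)
    if k < 2:
        raise ValueError("divC needs at least two items")
    total = sum(1.0 - cosine(a, b) for a, b in combinations(items, 2))
    return 2.0 * total / (k * (k - 1))


# Hypothetical list: two items share one annotation, the third is unrelated
items = [{"Category:A": 1.0}, {"Category:A": 1.0}, {"Category:B": 1.0}]
div_c = content_diversity(items)
```

Here the two identical items contribute a distance of 0 and the two unrelated pairs a distance of 1 each, so the list score is 2/3.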

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each generated recommendation list of the SKOSRec engine during runtime execution. In addition to the measurement of divC scores, we asked participants about their opinions on recommendation list diversity directly. Following the RS evaluation framework by Pu et al., two Likert items were posed for this purpose [44]:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al., which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement to positive statements regarding the relevance of the suggestions [44]. The statements were adapted to the specificities of each usage scenario. Figure 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out if the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to prove whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filter and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Figure 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.
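The statistical advantage of the within-subjects design can be made concrete with a paired t statistic, which is computed on per-participant differences so that stable differences between participants cancel out. The sketch below is a plain-Python illustration under the assumption that both approaches were rated by the same participants in the same order; the function name and the scores are hypothetical, and it does not reproduce the authors' analysis scripts.

```python
import math


def paired_t_statistic(first, second):
    """Paired t statistic and degrees of freedom for two matched score lists.

    first[i] and second[i] must stem from the same participant.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var_diff == 0.0:
        raise ValueError("t is undefined when all differences are identical")
    return mean_diff / math.sqrt(var_diff / n), n - 1


# Hypothetical list sizes for four participants under two approaches
regular = [1, 2, 3, 4]
filtered = [2, 4, 5, 4]
t_value, df = paired_t_statistic(regular, filtered)
```

With this argument order, a negative t value means that the second approach scored higher on average.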

Afterwards, the engine applied this filter condition on the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources which represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.11
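To illustrate what a subject filter could look like at the query level, the following Python sketch builds a SPARQL pattern for a direct subject constraint and, as a hypothetical variant, one that additionally follows up to a given number of skos:broader links, in the spirit of the expressive filters discussed for TC2. Several points are assumptions of this example rather than details from the paper: the function name, the category IRI, the use of the prefix dct: for the Dublin Core terms namespace, and the encoding of a bounded path as repeated optional steps (`skos:broader?`), which is standard SPARQL 1.1 property path syntax. The SKOSRec query language itself is not reproduced here.

```python
PREFIXES = (
    "PREFIX dct: <http://purl.org/dc/terms/>\n"
    "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
)


def subject_filter_query(category_iri, max_hops=0):
    """Build a SPARQL SELECT for items annotated with a category.

    max_hops = 0 gives the simple filter (direct subject only); a positive
    value also matches items whose category reaches category_iri within
    max_hops skos:broader steps.
    """
    if max_hops < 0:
        raise ValueError("max_hops must not be negative")
    # Each 'skos:broader?' is a zero-or-one step, so n repetitions cover 0..n hops.
    path = "dct:subject" + "/skos:broader?" * max_hops
    return (
        PREFIXES
        + "SELECT DISTINCT ?item WHERE {\n"
        + f"  ?item {path} <{category_iri}> .\n"
        + "}"
    )


# Illustrative category IRI (assumption, chosen to match the nature reserve example)
category = "http://dbpedia.org/resource/Category:Nature_reserves"
simple_query = subject_filter_query(category)
expanded_query = subject_filter_query(category, max_hops=2)
```

The resulting item set would then serve as the candidate pool on which similarity calculation runs, matching the order of steps described above.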

The second fundamental research issue of TC2 concerned the question whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state), while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users

11 http://dbpedia.org


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval TC1 (music domain)

received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement “The filter has improved the recommendation results of list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar rollup query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during the stay in another destination (i.e., a city, region or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

Fig. 5. Web form for a constraint-based recommendation request (music domain)

Fig. 6. Travel - User profile generation TC3

When users had selected an entity type and stated their favorite travel destination and three POIs belonging to it, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act (music domain) or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.


Fig. 7. Music - User profile generation TC4 (page 9)

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared in his/her profile a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (e.g., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The gathered samples from the experiments contained answers from both genders with a slight overrepresentation of male participants (53.7%). Most study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
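For readers who want to retrace the reliability check, the following Python sketch computes Cronbach's α from the standard formula α = k/(k-1) · (1 - Σ item variances / variance of the sum score). It assumes a complete response matrix with one row per participant and one column per Likert item; the function names and the toy data are hypothetical.

```python
def sample_variance(values):
    """Unbiased sample variance (denominator n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores (one row per participant)."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    item_variances = [sample_variance([row[j] for row in responses]) for j in range(k)]
    total_variance = sample_variance([sum(row) for row in responses])
    if total_variance == 0.0:
        raise ValueError("alpha is undefined when all sum scores are identical")
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical answers of three participants to two Likert items
consistent = [[1, 1], [2, 2], [3, 3]]
mixed = [[1, 2], [2, 1], [3, 3]]
alpha_consistent = cronbach_alpha(consistent)
alpha_mixed = cronbach_alpha(mixed)
```

Perfectly consistent answers give α = 1, while the partly contradictory answers in the second example give α = 2/3, which would fall just below the 0.7 threshold used in the text.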

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. In order to avoid leaving valuable user assessments unused, agreements to the statement “The recommendations of list [] are diverse” were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M),12 standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be above mediocrity. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the

12 In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


Table 10. Test Cases in the web experiments (X = test case conducted, x = not conducted)

Test Case  Evaluation Purpose                      Query Type  Travel  Movie  Music  Book
TC1        On-the-fly Recommendations (Baseline)   Q1          X       X      X      X
TC2        Constraint-based Recommendations        Q3          X       X      X      X
TC3        Advanced Queries (Rollup)               Q5, Q6      X       X      X      X
TC4        Advanced Queries (Cross-Domain)         Q7          x       X      X      X

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain  Cronbach's α
Travel  0.7002
Movie   0.8579
Music   0.8515
Book    0.8949

baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement to the Diversity-related Likert item stating that recommendations are diverse lay at 3 (“neutral”) in the multimedia domains and at 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreements with the statement “The filter has improved the recommendation results of the list”. Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants either selected 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]  Domain  M      SD     N
Accuracy (size) [10]             Travel  10.00  0.00   103
                                 Movie   10.00  0.00   50
                                 Music   10.00  0.00   53
                                 Book    9.80   1.40   51
Accuracy (mrs) [1]               Travel  56.49  0.00   103
                                 Movie   62.25  19.15  50
                                 Music   55.70  0.10   53
                                 Book    57.47  15.55  50
Novelty (nv) [1]                 Travel  0.60   0.34   101
                                 Movie   0.36   0.31   50
                                 Music   0.36   0.39   52
                                 Book    0.61   0.29   47
Diversity (divU) [5]             Travel  4      -      102
                                 Movie   3      -      50
                                 Music   3      -      53
                                 Book    3      -      50
Diversity (divC) [1]             Travel  0.80   0.16   103
                                 Movie   0.87   0.12   50
                                 Music   0.86   0.10   50
                                 Book    0.92   0.08   50
Usefulness [5]                   Travel  3.34   0.85   103
                                 Movie   3.55   0.73   50
                                 Music   3.18   0.85   53
                                 Book    3.43   0.81   50

[Figure 8: chart titled “The filter has improved the recommendations of the list”; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response Rates (TC2)

Domain  Constr.-based (Regular) (%)  Constr.-based (Expanded) (%)  N
Travel  23                           57                            103
Movie   86                           88                            50
Music   72                           75                            53
Book    73                           76                            51

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
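The text does not name the concrete FDR procedure. A common choice is the Benjamini-Hochberg step-up method, and the following Python sketch shows it purely as an assumed illustration of how p-values from multiple tests can be adjusted; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: this is the FDR procedure meant; the source only says "FDR".
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value and enforce monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values from four tests
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adjusted]
```

In this example only the first test stays below 0.05 after adjustment, although three of the four raw p-values were below that level.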

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = -7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]  Domain  Simple M  Simple SD  Expressive M  Expressive SD  N
Accuracy (size) [10]             Travel  1.17      2.70       4.52          4.62           103
                                 Movie   6.72      4.08       7.20          3.92           50
                                 Music   5.96      4.64       6.36          4.51           53
                                 Book    4.59      4.50       5.00          4.51           51
Accuracy (mrs) [1]               Travel  66.73     20.51      65.34         22.36          23
                                 Movie   60.95     22.98      60.29         22.79          43
                                 Music   63.77     16.05      62.96         16.44          34
                                 Book    56.17     18.18      56.81         16.81          37
Novelty (nv) [1]                 Travel  0.48      0.41       0.61          0.49           21
                                 Movie   0.44      0.36       0.55          0.80           42
                                 Music   0.55      0.39       0.52          0.37           38
                                 Book    0.77      0.34       0.76          0.33           33
Diversity (divU) [5]             Travel  3         -          3.5           -              24
                                 Movie   4         -          4             -              43
                                 Music   4         -          4             -              38
                                 Book    3         -          3             -              37
Diversity (divC) [1]             Travel  0.68      0.18       0.72          0.21           19
                                 Movie   0.70      0.24       0.73          0.23           41
                                 Music   0.86      0.13       0.87          0.12           34
                                 Book    0.84      0.24       0.95          0.63           29
Usefulness [5]                   Travel  3.33      0.93       3.50          0.77           24
                                 Movie   3.32      0.85       3.31          0.86           43
                                 Music   3.55      0.81       3.64          0.78           38
                                 Book    3.39      0.83       3.62          1.21           37

Table 15. Jaccard Indices (TC2)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.57                     0.47  25
Movie   0.71                     0.38  43
Music   0.82                     0.35  37
Book    0.92                     0.26  35
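The overlap measure behind Table 15 can be sketched in a few lines of Python. The example assumes that recommendation lists are compared as sets of item identifiers and that pairs in which both lists are empty are left out (returned as None), which is an assumption of this sketch; the item names are made up.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two recommendation lists.

    Returns None when both lists are empty, since the ratio is undefined.
    """
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        return None
    return len(set_a & set_b) / len(union)


# Hypothetical result lists of a simple and an expanded filter request
simple_list = ["item1", "item2", "item3"]
expanded_list = ["item1", "item2", "item3", "item4"]
overlap = jaccard_index(simple_list, expanded_list)
```

An expanded list that contains every item of the simple list plus one additional suggestion yields an index of 0.75, which illustrates how high scores can coincide with larger list sizes.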

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Table 16. Response Rates (TC3)

Domain  Regular (%)  Aggr.-based (%)  N
Travel  99           100              103
Movie   96           92               50
Music   100          100              53
Book    96           100              51

Figure 9 depicts participants' general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = -5.50, p < 0.001), the music (t(52) = -4.51, p < 0.001) and the book domain (t(47) = -4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach


[Figure 9: chart titled “Perceived Usefulness (Regular vs. Roll-up Queries)”; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up)]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3)

explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially noteworthy given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries provide added value, as they yield additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas a regular SPARQL query often did not return a single suggestion.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).
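The text does not state which FDR procedure was applied. As a hedged illustration, the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment and applies it to a list of raw p-values from several pairwise tests. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: this is the FDR procedure meant by "FDR-adjusted";
    the paper does not name the method.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down and enforce monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values of four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
print([round(p, 4) for p in benjamini_hochberg(raw)])
```

A comparison would then be reported as significant only if its adjusted value stays below the chosen level (0.05 in the results above).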

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results in more than the other half of the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular           Aggr.-based       N
                                           M       SD        M       SD
Accuracy (size) [10]              Travel    9.90    0.99      9.42    2.07    103
                                  Movie     2.88    0.59      2.76    0.82     50
                                  Music    10.00    0.00     10.00    0.00     53
                                  Book      2.88    0.59      3.00    0.00     51
Accuracy (mrs) [1]                Travel   48.21   24.98     44.93   25.59    102
                                  Movie    54.58   22.88     56.69   25.42     44
                                  Music    53.95   18.31     58.24   16.19     52
                                  Book     51.53   23.90     49.25   22.30     49
Novelty (nv) [1]                  Travel    0.43    0.39      0.38    0.40     98
                                  Movie     0.35    0.38      0.28    0.41     41
                                  Music     0.49    0.36      0.46    0.39     50
                                  Book      0.61    0.42      0.62    0.44     47
Diversity (divU) [5]              Travel    3.5     -         3       -       100
                                  Movie     4       -         3       -        43
                                  Music     4       -         3       -        50
                                  Book      3       -         3       -        48
Diversity (divC) [1]              Travel    0.79    0.17      0.89    0.13     99
                                  Movie     0.75    0.20      0.66    0.27     44
                                  Music     0.83    0.17      0.93    0.07     53
                                  Book      0.74    0.27      0.93    0.06     48
Usefulness [5]                    Travel    3.36    0.77      3.22    0.74    101
                                  Movie     3.20    0.92      3.26    0.94     44
                                  Music     3.45    0.84      3.44    0.85     51
                                  Book      2.95    0.87      3.03    0.96     48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07    44
Music    0.02                      0.05    53
Book     0.02                      0.06    50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                  96       50
Music    62                  98       53
Book     53                 100       51

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). [Figure: Recommendation List Size; x-axis: Domain (Movie, Music, Book); y-axis: No. of items; legend: Method (Regular, Cross-Domain)]

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.

Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular (SPARQL)    Cross-Domain        N
                                           M        SD         M        SD
Accuracy (size) [10]              Movie     0.82     2.09       8.66     2.73      50
                                  Music     0.89     1.59       9.30     2.03      53
                                  Book      2.90     3.98      10.00     0.00      51
Accuracy (mrs) [1]                Movie    67.41    27.62      53.65    14.71      16
                                  Music    62.36    31.76      48.32    22.49      20
                                  Book     66.42    21.28      55.59    16.84      27
Novelty (nv) [1]                  Movie     0.59     0.49       0.41     0.34      16
                                  Music     0.72     0.44       0.59     0.40      19
                                  Book      0.54     0.41       0.63     0.30      27
Diversity (divU) [5]              Movie     3        -          4        -         16
                                  Music     3        -          3        -         20
                                  Book      3        -          4        -         27
Diversity (divC) [1]              Movie    (0.64)   (0.35)     (0.99)   (0.02)     (6)
                                  Music    (0.90)   (0.13)     (0.98)   (0.03)    (10)
                                  Book     (0.84)   (0.12)     (0.87)   (0.1)      (8)
Usefulness [5]                    Movie     3.57     0.83       3.35     1.57      15
                                  Music     3.53     1.17       3.11     0.98      20
                                  Book      3.41     1.04       3.41     0.94      27

Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine, but a back-end application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
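The aggregation step described for the Aggregation part can be sketched as follows. This is an illustrative approximation rather than the SKOSRec implementation: it assumes that similarity scores of sublevel entities (e.g., movies) are already available as a dictionary and that each entity is linked to exactly one higher-level resource (e.g., a director). The names `roll_up`, `scores` and `parent_of` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# The three aggregation modes named in the grammar.
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def roll_up(scores, parent_of, mode="SUM", top=10):
    """Aggregate sublevel similarity scores per higher-level resource."""
    if mode not in AGGREGATORS:
        raise ValueError(f"unknown aggregation mode: {mode}")
    grouped = defaultdict(list)
    for item, score in scores.items():
        parent = parent_of.get(item)
        if parent is not None:  # skip items without a higher-level resource
            grouped[parent].append(score)
    aggregated = {p: AGGREGATORS[mode](values) for p, values in grouped.items()}
    # Highest aggregated score first; ties are broken by identifier.
    ranked = sorted(aggregated.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top]


# Hypothetical similarity scores and movie-to-director links.
scores = {"movie1": 0.8, "movie2": 0.6, "movie3": 0.9}
parent_of = {"movie1": "directorA", "movie2": "directorA", "movie3": "directorB"}
for mode in ("SUM", "MAX", "AVG"):
    print(mode, roll_up(scores, parent_of, mode))
```

In this toy example, SUM ranks the director with two moderately similar movies first, whereas MAX and AVG favor the director with the single most similar movie, which shows why the choice of aggregation mode is left to the query author.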

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be compared with the threshold. Users can state whether scores should be "larger than", "larger than or equal to", or "equal to" the threshold value, which corresponds to the operators >, >= and = in Listing 1.
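How a Sim threshold and a Relation could jointly select the concepts used for expansion is sketched below. The sketch assumes that concept-to-concept similarity scores have already been computed; the function name `expand_concepts` and the concept identifiers are hypothetical, and the actual flexible similarity detection is described in [63], not here.

```python
import operator

# Comparison operators that the Relation part of the grammar allows.
RELATIONS = {">": operator.gt, ">=": operator.ge, "=": operator.eq}


def expand_concepts(concept_scores, relation, threshold):
    """Return the concepts whose score satisfies `score <relation> threshold`."""
    if relation not in RELATIONS:
        raise ValueError(f"unsupported relation: {relation}")
    compare = RELATIONS[relation]
    return sorted(c for c, score in concept_scores.items() if compare(score, threshold))


# Hypothetical similarity scores between a profile concept and other concepts.
concept_scores = {"skos:A": 0.9, "skos:B": 0.5, "skos:C": 0.3}
print(expand_concepts(concept_scores, ">", 0.5))
print(expand_concepts(concept_scores, ">=", 0.5))
```

Because the threshold is a plain decimal, the same filter also works for similarity metrics whose values fall outside the range between 0 and 1.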

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
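The variable check described for the RecWhereClause can be approximated with a simple text-based sketch. This is an assumption-laden illustration: a real compiler would inspect the parsed syntax tree, whereas the code below only scans the clause as a string and ignores edge cases such as braces or question marks inside IRIs and literals. The function names are hypothetical.

```python
import re

# SPARQL variables start with "?" or "$".
VAR_PATTERN = re.compile(r"[?$]([A-Za-z_][A-Za-z0-9_]*)")
MINUS_START = re.compile(r"\bMINUS\s*\{", re.IGNORECASE)


def strip_minus_blocks(where_clause):
    """Remove every MINUS { ... } group, including nested braces."""
    parts = []
    position = 0
    while True:
        match = MINUS_START.search(where_clause, position)
        if match is None:
            parts.append(where_clause[position:])
            break
        parts.append(where_clause[position:match.start()])
        depth = 1
        cursor = match.end()
        # Advance until the brace that opened the MINUS group is closed.
        while cursor < len(where_clause) and depth > 0:
            if where_clause[cursor] == "{":
                depth += 1
            elif where_clause[cursor] == "}":
                depth -= 1
            cursor += 1
        position = cursor
    return "".join(parts)


def variable_is_usable(var_name, where_clause):
    """True if the variable occurs somewhere outside of MINUS groups."""
    outside = strip_minus_blocks(where_clause)
    return var_name in {m.group(1) for m in VAR_PATTERN.finditer(outside)}


ok = "WHERE { ?rec dbo:genre dbr:Drama . MINUS { ?rec dbo:country dbr:France } }"
bad = "WHERE { ?x a dbo:Film . MINUS { ?rec dbo:country dbr:France } }"
print(variable_is_usable("rec", ok), variable_is_usable("rec", bad))
```

The second clause fails the check because `?rec` is only mentioned inside the MINUS group, so no resources would ever be bound to it.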

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can either be expressed as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user was able to formulate filter conditions that referred to different RDF graphs and endpoints accordingly, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRec query, users can specify additional graph patterns which might extend the query solution, but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, art. 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.



fore the evolution of the LOD cloud, scientists developed strategies for extracting relevant information from text data. The most prevalent IR systems perform relevance rankings for unstructured resources and offer means to state filter conditions that complement queries in a faceted search interface. The approaches can combine query constraints with similarity-based retrieval. In these systems, user requests are quickly answered due to extensive preprocessing and indexing of resources before runtime execution [58]. However, low latency of search results comes at the cost of a hard-wired data model, which prevents flexible customizations and causes limitations in terms of ad-hoc retrieval [18].

Other researchers have developed query-based RS for table data. For instance, Adomavicius et al. propose the REQUEST query language that generates suggestions from a relational database which contains information on both user profiles and items. Thus, personalized recommendation queries can be formulated [2]. Another interesting approach in the category of query-based RS is the FlexRecs system by Koutrika et al. The key advantage of their course RS is that it enables highly individual requests in which different query parts (i.e., profile generation, filtering of recommendations or like-minded peers) can be flexibly combined into so-called recommendation workflows [32].

Despite the strengths and weaknesses of the discussed approaches, it has to be taken into account that they are not designed to consume LOD. LOD-enabled systems, on the other hand, use RDF data for retrieval. There exist many index-based Linked Data IR systems [4, 9, 10, 20, 25, 40, 46, 47, 61]. However, they index RDF resources before runtime retrieval and can therefore neither facilitate on-the-fly similarity calculation nor provide extensive SPARQL-based query options.

Another interesting engine category are on-the-fly Linked Data IR systems. These systems process information from the LOD cloud through spreading activation algorithms. While they enable fast retrieval, the downside of these approaches is that graph-based query constraints are only possible to a limited extent due to the applied probabilistic path traversal techniques [16, 17, 28, 33, 35, 56].

Existing LDRS, on the other hand, mostly perform offline computation of item-to-item similarities from LOD metadata descriptions. While this approach offers more control over the retrieval process, it prevents individual customizations and runtime responses [12, 13, 22, 24, 29, 50, 64].

To this date, there are only a few query-based LDRS [5, 43, 49]. Most of these systems rely on the assumption that user preferences and item metadata reside in the same repository. But this goes against the notion of the LOD cloud as a decentralized and openly accessible data space which can be queried through API-like interfaces. Additionally, while query-based LDRS provide SPARQL-based query options to filter recommendation results, they are not capable of flexibly combining similar resource retrieval with graph filters (e.g., subquerying with recommendation results).

In summary, existing non-Linked Data as well as Linked Data-enabled retrieval and RS engines already provide useful features for personalized resource access. Nevertheless, the full potential of LOD for recommendation and search tasks has yet to be realized (see Tab. 1). The novel query strategies of the SKOSRec engine are attempts to tackle the previously outlined research gaps to improve existing RS as well as to offer new search strategies for LOD repositories that are guided by personal preferences.

3. The SKOS Recommender

3.1. System Overview

The SKOSRecommender is designed to consume recommendation requests and was developed with the help of the Apache Jena framework (footnote 2). A SKOSRec query triggers a recommendation workflow in which similarity-based retrieval steps can be joined with SPARQL-like filter expressions. Listing 1 shows the syntax of the SKOSRec query language that needs to be applied to formulate recommendation requests.

In the given grammar, underlined parts represent SPARQL syntax elements that have been directly taken from the latest W3C specification [21]. Words in capital letters denote query keywords, which are also SPARQL expressions, except for the ones listed in the language parts SimProjection and ServiceIntegration. Like a regular SPARQL SELECT query, a SKOSRec query always generates a table-like result set that contains at least a mapping to a single variable. A detailed explanation of each part of the SKOSRec grammar can be found in Appendix A of this paper.
Figure 1 depicts the overall architecture of the system

2 https://jena.apache.org

4 L Wenige et al Similarity-based Graph Queries

Table 1. Summary of existing retrieval and recommendation approaches

| System type | LOD-enabled | (A) SPARQL-based queries | (B) On-the-fly similarity | (C) Flexible combination of A and B |
|---|---|---|---|---|
| IR systems | no | no | no | no |
| Query-based RS | no | no | yes | (yes) |
| Index-based Linked Data IR systems | yes | no | no | no |
| On-the-fly Linked Data IR systems | yes | no | yes | no |
| LDRS | yes | no | no | no |
| Query-based LDRS | yes | yes | no | no |

Listing 1. Grammar of the SKOSRec query language

    SKOSRecQuery ::= Prologue SelectPart? SimProjection PREF ItemPart+ RecWhereClause?
    SelectPart ::= SELECT ( DISTINCT | REDUCED )? Var+ RecWhereClause LimitClause? Aggregation?
    Aggregation ::= AGG IRIref Var ( SUM | MAX | AVG )
    SimProjection ::= RECOMMEND Var TOP INTEGER ServiceIntegration?
    ServiceIntegration ::= FROM SERVICE IRIREF
    ItemPart ::= ( VarPart | IRI ) Sim?
    VarPart ::= [ Var RecWhereClause ]
    Sim ::= SIM Relation DECIMAL
    Relation ::= ( > | >= | = )
    RecWhereClause ::= WHERE RecGroupGraphPattern
    RecGroupGraphPattern ::= '{' RecGroupGraphPatternSub '}'
    RecGroupGraphPatternSub ::= TriplesBlock? ( RecGraphPatternNotTriples '.'? TriplesBlock? )*
    RecGraphPatternNotTriples ::= RecGroupOrUnionGraphPattern | RecOptionalGraphPattern | RecMinusGraphPattern | Filter | Bind | InlineData
    RecGroupOrUnionGraphPattern ::= RecGroupGraphPattern ( 'UNION' RecGroupGraphPattern )*
    RecOptionalGraphPattern ::= 'OPTIONAL' RecGroupGraphPattern
    RecMinusGraphPattern ::= 'MINUS' RecGroupGraphPattern

with all its components. The most important parts are the SKOSRec query processing units (i.e., the parser and the compiler). They check the syntax of recommendation requests and regulate the correct execution of critical operations of the engine (e.g., graph-based filtering, result set joins, similarity calculation). The architecture does not dictate a particular SPARQL server implementation for LOD repositories, since the system is decoupled from the endpoint that is accessed over HTTP during retrieval. However, the prototypical implementation of the SKOSRec engine utilizes the OpenLink Virtuoso server (footnote 3).

Fig. 1 also shows the sequential order of a typical recommendation workflow for a complex request that consists of the following retrieval steps:

1. Issuing a SKOSRec query
2. Parsing the SKOSRec query
3. Loading information on the LOD repository/SPARQL endpoint to be queried from the configuration
4. Retrieval of preferred items and setting up of the user profile (see Subsect. 3.3 Preference Querying)
5. Optimization of the retrieval process through workload reduction
6. Determination of similar LOD resources based on the user profile (see Subsect. 3.2 On-the-Fly Recommendations)
7. Ranking of LOD resources
8. Sending of recommendation results to the compiler and joining them with postfilter statements
9. Retrieval of the final result set (see Subsect. 3.5 Postfiltering & Combinations)
10. Output to the user

3 https://virtuoso.openlinksw.com

Fig. 1. Architecture and workflow of the SKOSRec engine

3.2. On-the-Fly Recommendations

A simple on-the-fly request is the most basic form of a SKOSRec query. It is based on the engine's ability to generate ad-hoc recommendations from a user profile containing preference statements for LOD resources. Listing 2 depicts an example SKOSRec on-the-fly request that obtains suggestions based on a user's preference for the western movie "They Call Me Trinity". In the given query, the preferred movie is represented by the DBpedia resource (dbr:They_Call_Me_Trinity). The shown query also contains a prefix with a namespace declaration and a specification regarding the number of suggestions the engine should generate.

Listing 2. A simple recommendation query for the movie domain (Q1)

    PREFIX dbr: <http://dbpedia.org/resource/>

    RECOMMEND ?movie TOP 5
    PREF dbr:They_Call_Me_Trinity

In this basic form, the query contains neither pre- or postfilter conditions nor a preference query statement. Thus, the engine simply identifies all movies that are similar to the preference in the profile. When executed over DBpedia, the query returns the solution set that is depicted in Table 2.

Table 2. Result set for Q1

| movie |
|---|
| dbr:God_Forgives’_I_Don’t |
| dbr:Ace_High_(1968_film) |
| dbr:Boot_Hill_(film) |
| dbr:Trinity_Is_Still_My_Name |
| dbr:Troublemakers_(1994_film) |

For determining recommendations in an ad-hoc fashion, the system identifies SKOS annotations by matching a suitable property (i.e., annotation property) in an RDF dataset (Definition 1).

Definition 1 (Annotation property). An annotation property is an IRI (Listing 3) that is defined in the Dublin Core Metadata Initiative (DCMI) specification for subject annotations. It can occur in conjunction with one of the two namespaces which are stated by the standard [15].

Listing 3. Annotation property

    <ANNOTPROP> ::= dc:subject | dct:subject

These properties help to retrieve SKOS annotation triples for items preferred by a user (Definition 2).

Definition 2 (SKOS annotation triple). A SKOS annotation triple ($st$) is a triple that is comprised of an IRI in the subject position, an annotation property (ANNOTPROP) as predicate and a concept $c$ in the object position. The concept is part of a knowledge organization system $S$ in SKOS format ($C = \{c \mid c \in S\}$) (Eq. 1).

$$st \in I \times \text{<ANNOTPROP>} \times C \qquad (1)$$

The identification of SKOS annotation triples facilitates a fast retrieval of potential similar items. Therefore, the engine evaluates shared features between resources from the SKOS annotation dataset (Definition 3).

Definition 3 (SKOS annotation dataset). A SKOS annotation dataset ($AD$) is a subset of an RDF dataset that contains SKOS annotation triples (Eq. 2).

$$AD \subset I \times \text{<ANNOTPROP>} \times C \qquad (2)$$

Before the similarity calculation process starts, the engine determines SKOS annotations for each LOD resource ($r$) from the user profile by issuing a subsequent SPARQL query to the LOD repository from which the user intends to receive suggestions. The notion of SKOS annotations is specified in Definition 4.

Definition 4 (SKOS annotations). In the annotation dataset, LOD resources directly link to concepts of a SKOS system via a predefined annotation property. SKOS annotations for an input resource $r$ are defined as in Equation 3.

$$Annot(r) = \{c \in S \mid \exists\, (r, \text{<ANNOTPROP>}, c) \in AD\} \qquad (3)$$

Afterwards, the engine identifies which of the other resources in the dataset share annotations with it. Similarities only have to be computed for items with matching concepts [13, 52]. Triples can be efficiently joined along item features, since native RDF storage systems usually index triples rather than columns [41]. Thus, the quick identification of mutual annotations through SPARQL requests facilitates ad-hoc similarity calculation. Definition 5 specifies the notion of relevant resources and their annotations that are needed to identify similar items. The applied syntax follows Pérez et al. [42].

Definition 5 (Relevant resources and their annotations). The conjunctive graph pattern $P_r$ matches all items and subjects that are potentially relevant for recommendation retrieval (Eq. 4).

$$P_r = (r, \text{<ANNOTPROP>}, c) \;\text{AND}\; (q, \text{<ANNOTPROP>}, c) \qquad (4)$$

The mapping $\Omega_r$ of relevant resources and their SKOS annotations is obtained by retrieving all resources $q$ that share at least one SKOS concept $c$ with resource $r$ from $AD$ (Eq. 5). The corresponding result set contains mappings for all variables (i.e., $c$ and $q$) in the triple statements of $P_r$.

$$\Omega_r = [[P_r]]_{AD} \qquad (5)$$
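The pattern of Definition 5 can be sent to an endpoint as a plain SPARQL query. The following Python sketch only builds the query string. The function name `relevant_resources_query` and the use of a property-path alternative to cover both annotation properties are illustrative assumptions, not the engine's actual implementation.

```python
# Hypothetical helper (not part of the SKOSRec code base): builds a SPARQL
# query for the pattern (r, <ANNOTPROP>, c) AND (q, <ANNOTPROP>, c).
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


def relevant_resources_query(resource_iri, annot_props=(DC_SUBJECT, DCT_SUBJECT)):
    """Return a SELECT query for all ?q sharing a concept ?c with resource_iri."""
    # A SPARQL 1.1 property-path alternative covers both annotation properties.
    path = "|".join(f"<{p}>" for p in annot_props)
    return (
        "SELECT ?q ?c WHERE {\n"
        f"  <{resource_iri}> {path} ?c .\n"
        f"  ?q {path} ?c .\n"
        "}"
    )


query = relevant_resources_query("http://dbpedia.org/resource/They_Call_Me_Trinity")
print(query)
```

The resulting string could be submitted to any SPARQL 1.1 endpoint; each returned row corresponds to one mapping in the result set of Eq. 5.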

Upon extraction of relevant resources and annotations, the process of similarity calculation can start. The engine determines the information content (IC) of the shared SKOS annotations of two resources. This approach is based on the hybrid similarity metric proposed by Meymandpour and Davis for LOD-enabled RS [37]. However, while their measure considers all types of IRI resources for the retrieval, our similarity metric purely relies on SKOS annotations (Definition 6).

Definition 6 (SKOS similarity). Let $Annot(r)$ be the set of SKOS features of resource $r$ and $Annot(q)$ the set of SKOS features of resource $q$, with $q \in \{\mu(q) \mid \mu \in \Omega_r\}$. Then their similarity can be derived from the IC of their shared concepts $C_{shared} = Annot(r) \cap Annot(q)$.

$$sim(r, q) = IC(C_{shared}) \qquad (6)$$

The IC of a set of SKOS concepts is measured by the sum of the negative logarithms of each concept's frequency relative to the maximum frequency among relevant resources (Definition 7).

Definition 7 (SKOS Information Content). The SKOS IC is defined as an aggregation of the individual IC values of each concept that is contained in the set of shared features ($c \in C_{shared}$), where $freq(c)$ is the frequency of $c$ among all relevant resources and $n$ is the maximum frequency among these resources. The final similarity score of two resources $r$ and $q$ is determined by summing the IC values of each concept of the shared feature set.

$$IC(C_{shared}) = -\sum_{c \in C_{shared}} \log\left(\frac{freq(c)}{n}\right) \qquad (7)$$
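Definitions 6 and 7 can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: annotations are already available as in-memory sets, the logarithm is the natural logarithm (the base is not fixed by the definition), and frequencies are counted over all resources in the toy data, including the input resource. All names and data are invented for illustration.

```python
import math
from collections import Counter


def concept_frequencies(annotations):
    """Count in how many relevant resources each concept occurs."""
    return Counter(c for concepts in annotations.values() for c in concepts)


def information_content(shared, freq):
    """Eq. 7: negative sum of log(freq(c) / n) over the shared concepts."""
    n = max(freq.values())  # maximum frequency among relevant resources
    return -sum(math.log(freq[c] / n) for c in shared)


def skos_similarity(r, q, annotations, freq):
    """Eq. 6: IC of the concepts shared by r and q."""
    return information_content(annotations[r] & annotations[q], freq)


# Toy annotation sets (invented data, not taken from DBpedia).
annotations = {
    "r": {"Spaghetti_Western", "1970_films"},
    "q1": {"Spaghetti_Western", "1970_films"},
    "q2": {"1970_films"},
}
freq = concept_frequencies(annotations)
sim_q1 = skos_similarity("r", "q1", annotations, freq)
sim_q2 = skos_similarity("r", "q2", annotations, freq)
```

In this toy data, the most frequent concept contributes nothing to the score, so a resource sharing only that concept receives a similarity of zero, while sharing the rarer concept raises the score.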

After having obtained resource annotations as well as IC values, similarity scores between the input resource $r$ and each potentially relevant resource $q$ can be calculated. When a user profile contains more than a single item, similarity values are aggregated through summation to determine the final recommendation score (Definition 8).

Definition 8 (Recommendation score). The recommendation score of a potentially relevant resource $q$ is quantified by the sum of similarities with each resource $r$ that can be found in the profile ($\mathit{Pr}$).

$$score(\mathit{Pr}, q) = \sum_{r \in \mathit{Pr}} sim(r, q) \qquad (8)$$

Upon calculation of similarity values for each potentially relevant item q, the engine ranks the retrieved resources and returns the top k, where the limit k is specified by the user in the SKOSRec query.
With the presented methods of on-the-fly retrieval, recommendations can be generated from LOD repositories through an API interface without requiring local metadata or excessive preprocessing operations other than the mapping of profile items with LOD resources (footnote 4).
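The scoring and ranking step can be sketched as follows. This is a minimal illustration of Definition 8 followed by a top-k selection; the function `recommend`, the similarity lookup table and all resource names are hypothetical, and the pairwise similarities are assumed to have been computed beforehand.

```python
import heapq


def recommend(profile, candidates, sim, k):
    """Sum sim(r, q) over the profile (Eq. 8) and keep the k best non-profile items."""
    scores = {
        q: sum(sim(r, q) for r in profile)
        for q in candidates
        if q not in profile
    }
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Invented pairwise similarities for a two-item profile.
toy_sim = {
    ("r1", "a"): 1.0, ("r2", "a"): 0.5,
    ("r1", "b"): 0.25, ("r2", "b"): 2.0,
    ("r1", "c"): 0.125, ("r2", "c"): 0.125,
}
top = recommend(["r1", "r2"], ["a", "b", "c", "r1"], lambda r, q: toy_sim[(r, q)], k=2)
```

Profile items are skipped before scoring, so the candidate `r1` never appears in the output even though it is part of the candidate list.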

3.3. Preference Querying

In the previous subsection, it was assumed that users express their tastes as preference statements for certain items, each given as an IRI referring to a LOD resource. However, it may also be the case that likings are only vaguely known, for instance, when users prefer products with specific features but are not able or willing to specify the actual items. The engine obtains such a profile by issuing a subsequent SPARQL query that contains the preference graph pattern ($P_{pref}$) in the WHERE condition. Suppose a user is interested in Quentin Tarantino movies (dbr:Quentin_Tarantino), but does not specify the precise items he likes. He could send a preference query to the SKOSRec engine. The graph pattern in the request would match all movies that are connected to dbr:Quentin_Tarantino with the properties dbo:director or dbo:producer (see Listing 4).

4 Di Noia et al. describe how a mapping procedure can be conducted for a real-world implementation. For their LDRS, they matched movie titles with the corresponding LOD resources in DBpedia. The authors extracted IRIs, labels and publication dates for all movie resources. They applied the Levenshtein distance on these resources to identify matching movie items [13].

Listing 4. SKOSRec query to generate recommendations based on a preference query in the movie domain (Q2)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    RECOMMEND ?movie TOP 5
    PREF [ ?prefMovie
    WHERE { { ?prefMovie dbo:director dbr:Quentin_Tarantino . }
            UNION
            { ?prefMovie dbo:producer dbr:Quentin_Tarantino . } } ]

Table 3 shows the corresponding solution set to the preference query, containing films from DBpedia that are similar to Quentin Tarantino movies.

Table 3. Result set for Q2

| movie |
|---|
| dbr:The_Godfather_Part_II |
| dbr:The_Prestige_(film) |
| dbr:The_Dark_Knight |
| dbr:Planet_Terror |
| dbr:Clerks |

The notion of a queried profile is specified in Definition 9.

Definition 9 (Queried profile). A queried profile is a user profile that has been compiled by retrieving all LOD resources $r$ that are part of an annotation graph $AD$ and are contained in the solution mappings resulting from the evaluation of a specified graph pattern ($P_{pref}$) over a dataset $D$ (Eq. 9).

$$\mathit{Pr} = \{\mu(r) \mid \mu \in [[P_{pref}]]_D,\ r \in dom(\mu),\ r \in AD\} \qquad (9)$$

3.4. Prefiltering

To facilitate exploratory search in LOD repositories, users should also have the option to formulate conditions in their queries. One way to enable such filters is to apply them before similarity calculation. Thus, recommendation lists can be adjusted to individual needs.
Due to the expressiveness of RDF data, filter conditions can not only be applied to attributes that are directly connected to potentially relevant items, but can also be declared as a graph pattern. By this means, it is possible to retrieve LOD resources that are both similar to the user profile and fulfill an advanced condition.
The strengths of the approach will be illustrated with a travel recommendation request that was executed over DBpedia. Suppose a customer expressed a preference for the Lake Baikal region (e.g., as indicated by a positive rating on a travel community site). With simple on-the-fly retrieval, he would receive a recommendation list that primarily consists of travel destinations that are located in the proximity of this geographical area. A graph-based filter, on the other hand, facilitates advanced retrieval options. In the given example, it enables the user to obtain suggestions for Southeast Asian travel destinations that have similar features to the Lake Baikal region, even though the resource dbr:Lake_Baikal is not directly connected to any destination with the annotation dbc:Southeast_Asia. Such destinations can only be identified by exploring the surrounding graph structure of the filter attribute.
The advanced constraint ensures that the generated result set is not empty. A simple attribute-level filter, which is commonly utilized in faceted search systems, would not generate any search hits. By these means, graph-based filter patterns can help users to better explore knowledge graphs, thereby overcoming data quality issues in LOD repositories. The corresponding SKOSRec query to this example is shown in Listing 5.

Listing 5. SKOSRec query with an expressive prefilter condition to retrieve Southeast Asian destinations based on a preference for the Lake Baikal region (Q3)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbc: <http://dbpedia.org/resource/Category:>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    RECOMMEND ?destination TOP 5
    PREF dbr:Lake_Baikal
    [ WHERE { ?destination dbo:country ?country .
              ?country dct:subject ?countrySubject .
              ?countrySubject skos:broader dbc:Southeast_Asia .
              ?destination rdf:type dbo:Place . }
    ]

Table 4 shows the solution to this query. It contains travel destinations and regions of interest that are both similar to the previously liked destination dbr:Lake_Baikal and are located in Southeast Asia. Ecoregions around lakes, rivers or mountains from Southeast Asian countries such as Cambodia, Vietnam or Malaysia are presented to the user.

Table 4. Result set of Q3

| destination |
|---|
| dbr:Tonle_Sap |
| dbr:Mekong |
| dbr:Mount_Kinabalu |
| dbr:Laguna_de_Bay |
| dbr:Lake_Toba |

The technical details for this type of query are given in the following definitions. Before the extraction of relevant resources and annotations, the recommendation engine identifies the resources that satisfy the filter (Definition 10).

Definition 10 (Prefiltered resources). The mapping of filtered resources $\Omega_{prefilter}$ is obtained by matching a specified graph pattern ($P$) to the RDF graph of a dataset $D$ (Eq. 10).

$$\Omega_{prefilter} = \{\mu_{|q} \mid \mu \in [[P]]_D\} \qquad (10)$$

After preselection, resources are brought together with user profile data to retrieve the final set of recommendations (Definition 11).

Definition 11 (Annotations and relevant resources (prefiltered retrieval)). The solution mappings of annotations and relevant resources in prefiltered retrieval mode are obtained by joining the set of prefiltered resources with the mappings $\Omega_r$ of all relevant resources and annotations, as determined from the user profile.

$$\Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_r \qquad (11)$$

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and respective SKOS similarities are quantified according to the user filter (Definitions 12 and 13). By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources.

Definition 12 (Conditional SKOS similarity). Let $Annot(r)$ be the set of SKOS features of $r$ and $Annot(q)$ the set of SKOS features of resource $q$, which is contained in the filtered set of annotations and relevant resources ($q \in \{\mu(q) \mid \mu \in \Omega_{pre}\}$). Then the similarity can be derived from the IC of their shared concepts ($C_{cond} = Annot(r) \cap Annot(q)$) (Eq. 12).

$$sim_{cond}(r, q) = IC(C_{cond}) \qquad (12)$$

Definition 13 (Conditional SKOS Information Content). The conditional IC of a set of SKOS annotations is defined by the sum of the IC of each concept $c \in \{\mu(c) \mid \mu \in \Omega_{pre}\}$, where $freq_{cond}(c)$ is the frequency of $c$ among all filtered resources and $n$ is the maximum frequency among these resources (Eq. 13).

$$IC(C_{cond}) = -\sum_{c \in C_{cond}} \log\left(\frac{freq_{cond}(c)}{n}\right) \qquad (13)$$

The system carries out the calculations (i.e., determination of similarity values, the ranking of LOD resources) in the same way as in non-filtering mode.
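The prefiltered retrieval mode of Definitions 10 to 13 can be sketched in Python as follows. The sketch assumes a single profile resource, in-memory annotation sets, the natural logarithm, and frequencies counted over the filtered candidates only. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def conditional_similarities(profile_concepts, candidate_annotations, prefiltered):
    """Join candidates with the prefilter, then score them with the conditional IC."""
    # Join step (Eq. 11): keep candidates that pass the filter and share a concept.
    joined = {}
    for q, concepts in candidate_annotations.items():
        shared = concepts & profile_concepts
        if q in prefiltered and shared:
            joined[q] = shared
    if not joined:
        return {}
    # Frequencies are counted among the filtered resources only (Definition 13).
    freq = Counter(c for shared in joined.values() for c in shared)
    n = max(freq.values())
    return {
        q: -sum(math.log(freq[c] / n) for c in shared)
        for q, shared in joined.items()
    }


scores = conditional_similarities(
    profile_concepts={"Lakes", "Ecoregions"},
    candidate_annotations={
        "A": {"Lakes", "Ecoregions"},
        "B": {"Lakes"},
        "C": {"Lakes", "Ecoregions"},  # similar, but fails the prefilter
    },
    prefiltered={"A", "B"},
)
```

Because candidate `C` is removed before frequencies are counted, it influences neither the result list nor the IC values of the remaining concepts.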

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources; it can also apply user constraints after similar items have already been determined. By this means, recommendations are filtered ex post, after the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5. Result set of Q4

| director | movie |
|---|---|
| dbr:Kirk_Douglas | dbr:Scalawag_(film) |
| dbr:Kirk_Douglas | dbr:Posse_(1975_film) |
| dbr:Kevin_Costner | dbr:The_Postman_(film) |
| dbr:Kevin_Costner | dbr:Dances_with_Wolves |
| ... | ... |
| dbr:Gene_Kelly | dbr:Hello_Dolly_(film) |
| dbr:Gene_Kelly | dbr:Invitation_to_the_Dance_(film) |
| ... | ... |
| dbr:Alan_Alda | dbr:Goodbye_Farewell_and_Amen |
| dbr:Alan_Alda | dbr:Margaret's_Engagement |
| ... | ... |
| dbr:Steve_Buscemi | dbr:Interview_(2007_film) |
| dbr:Steve_Buscemi | dbr:Animal_Factory |

Listing 6. SKOSRec query to generate simple post-filtered recommendations in the movie domain (Q4)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE { ?movie dbo:director ?director .
            ?director rdf:type yago:Director110014939 . }
    LIMIT 50
    RECOMMEND ?director TOP 5
    PREF dbr:Quentin_Tarantino

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.
Another weakness is the biased semantics of the query. Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies he/she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to several LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).
Aggregation-based requests can be combined well with other retrieval patterns such as preference queries. In the case of the movie recommendation scenario, a user could state that he enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors that shot similar films. Typically, this kind of question can only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate post-filtered recommendations based on a preference query in the movie domain (Q5)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE { ?movie dbo:director ?director .
            ?director rdf:type yago:Director110014939 . }
    LIMIT 50
    AGG dbr:Quentin_Tarantino ?director SUM
    RECOMMEND ?movie TOP 100
    PREF [ ?prefMovie
    WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino . } ]

Table 6. Result set of Q5

| director | movie |
|---|---|
| dbr:Steven_Spielberg | dbr:Indiana_Jones |
| dbr:Steven_Spielberg | dbr:Jurassic_Park_(film) |
| ... | ... |
| dbr:Francis_Ford_Coppola | dbr:The_Godfather_Part_II |
| dbr:Francis_Ford_Coppola | dbr:The_Godfather_Part_III |
| ... | ... |
| dbr:Christopher_Nolan | dbr:The_Prestige_(film) |
| dbr:Christopher_Nolan | dbr:The_Dark_Knight |
| ... | ... |
| dbr:Robert_Rodriguez | dbr:Planet_Terror |
| dbr:Robert_Rodriguez | dbr:Machete_(film) |
| ... | ... |
| dbr:Kevin_Smith | dbr:Clerks |
| dbr:Kevin_Smith | dbr:Clerks_II |

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they are totally different. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style, see footnote 5), Q4 provided a completely different output. It shows US American actor-directors whose movies range from musical films (dbr:Hello_Dolly_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Tab. 5) seem rather arbitrary and not exceptionally helpful concerning the initial user request.

The combined aggregation-based query can also be described as a roll-up request, since suggestions are based on preferences for sublevel entities. Another example of a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

5 https://en.wikipedia.org/wiki/Quentin_Tarantino

Listing 8. SKOSRec query to generate post-filtered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dul: <http://www.ontologydesignpatterns.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?city
    WHERE { ?poi ?property ?city .
            ?city rdf:type dbo:City .
            ?property rdfs:subPropertyOf dul:hasLocation . }
    LIMIT 3
    AGG dbr:London ?city SUM
    RECOMMEND ?poi TOP 100
    PREF
    dbr:Oxford_Street
    dbr:Notting_Hill
    dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7. Result set of Q6

| city |
|---|
| dbr:Oxford |
| dbr:Abingdon-on-Thames |
| dbr:Wallingford_Oxford |

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere rather than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.
Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?participated ?musicalAct
    WHERE { ?movie rdf:type dbo:Film .
            VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
            { ?musicalAct ?participated ?movie . }
            UNION
            { ?movie ?participated ?musicalAct . } }
    LIMIT 10
    AGG dbr:The_Jackson_5 ?movie MAX
    RECOMMEND ?musicalAct TOP 100
    PREF dbr:The_Jackson_5
    WHERE { VALUES ?actType { dbo:MusicalArtist dbo:Band }
            ?musicalAct rdf:type ?actType . }

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are summarized with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.
As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tabs. 8 and 9). The SKOSRec query produces a more comprehensive solution than a regular SPARQL query.

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?musicalAct
    WHERE { ?movie rdf:type dbo:Film .
            VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
            { dbr:The_Jackson_5 ?participated ?movie . }
            UNION
            { ?movie ?participated dbr:The_Jackson_5 . } }
    LIMIT 5

Table 8. Result set of Q7

| movie | musicalAct |
|---|---|
| dbr:Moonwalker | dbr:Michael_Jackson |
| ... | ... |
| dbr:The_Wiz_(film) | dbr:Ashford_&_Simpson |
| dbr:Garfield_Gets_a_Life | dbr:The_Temptations |
| dbr:Pipe_Dreams_(1976_film) | dbr:Gladys_Knight_&_the_Pips |
| dbr:The_Jacksons | dbr:Jermaine_Jackson |

Table 9. Result set of Q8

| movie | musicalAct |
|---|---|
| dbr:The_Jacksons | dbr:The_Jackson_5 |

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 14 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 14 (SKOSRec recommendations). SKOSRec recommendations ($R(\mathit{Pr})$) are the set of suggestions generated by processing the solution mappings ($\Omega$) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, prefiltering, or a combination of these techniques. $\mathit{Pr}$ represents the input profile that contains at least one preference for an item. The cardinality of $R(\mathit{Pr})$ is the intended number of recommendations ($k$) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 14).

$$R(\mathit{Pr}) = \{x \mid x \in \{q \mid score(\mathit{Pr}, q) > \operatorname{kmax}_{q \in \Omega}\, score(\mathit{Pr}, q),\ q \notin \mathit{Pr}\}\} \qquad (14)$$

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions $R(\mathit{Pr})$ can be joined with any graph pattern $P_{post}$ after the process of similarity calculation (Definition 15). It does not make a difference how the engine obtained these recommendations.

Definition 15 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern ($\Omega_{post}$) (Eqs. 15 and 16).

$$\Omega_{postfilter} = \{\mu \mid \mu \in [[P_{post}]]_D,\ x \in dom(\mu)\} \qquad (15)$$

$$\Omega_{post} = \{\mu_{|x} \mid x \in R(\mathit{Pr})\} \bowtie \Omega_{postfilter} \qquad (16)$$
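The join of Definition 15 can be illustrated with a small Python sketch. Solution mappings are modelled as dictionaries from variable names to resources. The function `postfilter`, the variable name `x` and the example resources are hypothetical and only serve to show the join semantics.

```python
def postfilter(recommendations, pattern_mappings, var="x"):
    """Keep the pattern mappings whose value for `var` is a recommended resource."""
    recommended = set(recommendations)
    return [mu for mu in pattern_mappings if mu.get(var) in recommended]


# Invented mappings of a postfilter pattern (x = director, plus a movie variable).
mappings = [
    {"x": "dbr:Director_A", "movie": "dbr:Movie_1"},
    {"x": "dbr:Director_B", "movie": "dbr:Movie_2"},
    {"x": "dbr:Director_A", "movie": "dbr:Movie_3"},
]
joined = postfilter(["dbr:Director_A"], mappings)
```

Only the mappings that bind `x` to a recommended resource survive, which mirrors how a recommendation list acts as a subquery result within the postfilter pattern.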

Like the prefilter, $P_{post}$ can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item ($a$) and similar sublevel entities. Additionally, the user has to specify the variable in his profile (e.g., $y$) that represents the items within the postfilter graph pattern $P_{post}$. By this means, aggregation-based recommendations can be generated (Definition 16).

Definition 16 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item $a$ in the profile (Eq. 17).

$$R_{agg}(a) = \{\mu(y) \mid \mu \in \Omega_{post},\ y \in dom(\mu),\ a \notin \mu(y)\} \qquad (17)$$

For each potential aggregation-based suggestion ($y$), similarity scores of connected entities ($x$) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. When both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 17 (Aggregation-based ranking score). The ranking score of an LOD resource $y$ that is connected to a set of recommended items is determined by aggregating scores of connected entities, either by the maximum (Eq. 18), the sum (Eq. 19), or the average (Eq. 20) of individual scores.

$$score_{aggMAX}(y) = \max_{x \in \mu(x,y)} score(P_r, x) \tag{18}$$

$$score_{aggSUM}(y) = \sum_{x \in \mu(x,y)} score(P_r, x) \tag{19}$$

$$score_{aggAVG}(y) = \frac{\sum_{x \in \mu(x,y)} score(P_r, x)}{|\{x \in \mu(x,y)\}|} \tag{20}$$
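The three aggregation variants of Definition 17 can be sketched in Python as follows. The sketch assumes that the links between sublevel items ($x$) and aggregation candidates ($y$) are given as pairs; the names `aggregate_scores`, `links` and `item_scores` are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# Aggregation functions corresponding to Eqs. 18 (max), 19 (sum) and 20 (avg).
AGGREGATORS = {"max": max, "sum": sum, "avg": mean}


def aggregate_scores(links: Iterable[Tuple[str, str]],
                     item_scores: Dict[str, float],
                     mode: str = "sum") -> Dict[str, float]:
    """Aggregate the scores of connected items x for every candidate y."""
    grouped = defaultdict(list)
    for x, y in links:
        if x in item_scores:  # only recommended items carry a score
            grouped[y].append(item_scores[x])
    aggregate = AGGREGATORS[mode]
    return {y: aggregate(values) for y, values in grouped.items()}


# Hypothetical example: two directors linked to three scored films.
links = [("film1", "dirA"), ("film2", "dirA"), ("film3", "dirB")]
item_scores = {"film1": 0.5, "film2": 0.25, "film3": 0.5}
for mode in ("max", "sum", "avg"):
    print(mode, aggregate_scores(links, item_scores, mode))
```

In this toy example the sum favors `dirA`, whereas the maximum ranks both directors equally, which reflects the point made above about choosing the aggregation by the granularity of the entities involved.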

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with Java servlet technology.^6 ^7

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls such as misleading navigation. Two test users went step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., by rephrasing user instructions) to increase the chances of successful completion. If the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

6 http://tomcat.apache.org
7 http://www.oracle.com/technetwork/java/javaee/servlet/index.html

Subjects were recruited through the clickworker.com platform.^8 Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not simply clicked through the evaluation screen. Whenever the log files showed irregularities (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis. The survey template started with a consent form that informed participants about the context and the procedure of the study. After filling in the consent form, participants were navigated to the first part of the experiment, which collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.^9

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.^10 The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. When the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

Aside from the AJAX feature, the appeal of the interface

8 https://www.clickworker.de
9 http://api.jquery.com/jquery.ajax/
10 http://lucene.apache.org/solr/


Fig. 2. User profile generation, TC1 (music domain)

was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects can usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. By showing how close participants are to completing the experiment, progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users on a separate screen. Suggestions were displayed with a sufficient amount of information (e.g., thumbnails, labels, or a short description) to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance, outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale would allow. The score, called mean relevance (mrs), was measured with Equation 21.

$$mrs = \frac{\sum_{i=1}^{k} score_i}{k} \tag{21}$$

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, if an RS only suggests popular products, it is likely that the user already knows them [3, 23]. A purely accuracy-based evaluation disregards these aspects, which is why other performance indicators such as novelty were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user ($|nov|$) among the total number of relevant items ($|tp|$) (Eq. 22).

$$nv = \frac{|nov|}{|tp|} \tag{22}$$
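A minimal Python sketch of the two metrics (Eqs. 21 and 22) is given below. It assumes slider scores on the 0 to 100 scale and treats an item as relevant at a score of 50 or higher, which is the threshold used in the experiments. The function names and the sample ratings are hypothetical, and returning `None` for lists without relevant items is an assumption of this sketch.

```python
from typing import Optional, Sequence


def mean_relevance(scores: Sequence[float]) -> float:
    """Mean relevance score (mrs) of a recommendation list, Eq. 21."""
    return sum(scores) / len(scores)


def novelty(scores: Sequence[float],
            is_new: Sequence[bool],
            threshold: float = 50.0) -> Optional[float]:
    """Share of relevant items that were new to the user, Eq. 22."""
    # Collect the novelty flags of all items that count as relevant (tp).
    relevant_flags = [new for s, new in zip(scores, is_new) if s >= threshold]
    if not relevant_flags:
        return None  # undefined when the list has no relevant item
    return sum(relevant_flags) / len(relevant_flags)


# Hypothetical ratings for a list of four recommendations.
slider_scores = [80, 60, 40, 20]
new_flags = [True, False, True, False]
print(mean_relevance(slider_scores))
print(novelty(slider_scores, new_flags))
```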

An item was considered relevant when it achieved an mrs score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 65]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more alike the items are, the less diverse the result set is [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 23) and averaged for the recommendation list (Eq. 24).

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{23}$$

$$div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{24}$$
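The computation in Eqs. 23 and 24 can be sketched in Python as follows. The sketch assumes that each item is described by a sparse feature vector (e.g., weights of SKOS annotations) stored as a dictionary; this representation and the function names are assumptions made for illustration only.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List

Vector = Dict[str, float]  # feature (e.g., SKOS concept) -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse feature vectors."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: items without features share nothing
    return dot / (norm_a * norm_b)


def content_diversity(items: List[Vector]) -> float:
    """Average pairwise cosine distance of a recommendation list (Eq. 24)."""
    k = len(items)
    if k < 2:
        raise ValueError("diversity needs at least two items")
    total = sum(1.0 - cosine(a, b) for a, b in combinations(items, 2))
    return 2.0 * total / (k * (k - 1))


# Items without shared annotations are maximally diverse.
disjoint = [{"c1": 1.0}, {"c2": 1.0}, {"c3": 1.0}]
print(content_diversity(disjoint))
```

Since `combinations` enumerates each unordered pair once, the factor $2/(k(k-1))$ turns the sum into the mean pairwise distance.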

The content-based diversity scores ($div_C$) were calculated in the backend of the web application and saved in a log file for each recommendation list that the SKOSRec engine generated at runtime. In addition to measuring $div_C$ scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al., two Likert items were posed for this purpose [44]:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al., which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions [44]. The statements were adapted to the specificities of each usage scenario. Figure 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to prove whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Figure 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain, where the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.^11

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, when a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint to the set of LOD resources. Upon execution of constraint-based recommendation requests, users

11 http://dbpedia.org


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

received two recommendation lists (resulting from simple and expressive filtering, respectively) on a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement "The filter has improved the recommendation results of list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar rollup query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination (i.e., a city, region or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three POIs belonging to it, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act (music domain) or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). The SKOSRec engine generated recommendations based on the creative works of the favored artist. Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.

Fig. 5. Web form for a constraint-based recommendation request (music domain)

Fig. 6. Travel - User profile generation, TC3


Fig. 7. Music - User profile generation, TC4 (page 9)

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states a favorite movie that was written by a particular author. If the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared in his/her profile a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The samples gathered from the experiments contained answers from both genders, with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either "manageable" (38.5%) or even as "easy" or "very easy" (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity ($div_U$) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
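For illustration, Cronbach's α for a set of Likert items can be computed as in the following Python sketch. It uses the standard formula based on item variances and the variance of the summed scale, with sample variances throughout. The response matrix is invented for demonstration and does not stem from the experiments.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for rows of participants and columns of Likert items."""
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of every single item across participants.
    item_vars = [variance([row[i] for row in responses])
                 for i in range(n_items)]
    # Variance of the summed scale score per participant.
    total_var = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Invented answers of three participants to two perfectly consistent items.
answers = [[1, 1], [2, 2], [3, 3]]
print(cronbach_alpha(answers))
```

The sketch is undefined when all participants have the same summed score, because the total variance is then zero.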

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. To avoid leaving valuable user assessments unused, agreements with the statement "The recommendations of list [] are diverse" were taken into account as an ordinal variable ($div_U$).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M),^12 standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be above mediocrity. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the

12 In case of the user-based diversity score ($div_U$), M denotes the median because of the ordinal scale of the variable.


Table 10. Test cases in the web experiments

| Test Case | Evaluation Purpose | Query Type | Travel | Movie | Music | Book |
|---|---|---|---|---|---|---|
| TC1 | On-the-fly Recommendations (Baseline) | Q1 | X | X | X | X |
| TC2 | Constraint-based Recommendations | Q3 | X | X | X | X |
| TC3 | Advanced Queries (Rollup) | Q5, Q6 | X | X | X | X |
| TC4 | Advanced Queries (Cross-Domain) | Q7 | – | X | X | X |

Table 11. Cronbach's α values for the concept of Perceived Usefulness

| Domain | Cronbach's α |
|---|---|
| Travel | 0.7002 |
| Movie | 0.8579 |
| Music | 0.8515 |
| Book | 0.8949 |

baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores ($div_C$) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity ($div_U$) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 ("neutral") in the multimedia domains and at 4 ("agree") in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement "The filter has improved the recommendation results of the list". Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 ("strongly agree"), 4 ("agree") or 3 ("neutral"). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | M | SD | N |
|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 10.00 | 0.00 | 103 |
| | Movie | 10.00 | 0.00 | 50 |
| | Music | 10.00 | 0.00 | 53 |
| | Book | 9.80 | 1.40 | 51 |
| Accuracy (mrs) [1] | Travel | 56.49 | 0.00 | 103 |
| | Movie | 62.25 | 19.15 | 50 |
| | Music | 55.70 | 0.10 | 53 |
| | Book | 57.47 | 15.55 | 50 |
| Novelty (nv) [1] | Travel | 0.60 | 0.34 | 101 |
| | Movie | 0.36 | 0.31 | 50 |
| | Music | 0.36 | 0.39 | 52 |
| | Book | 0.61 | 0.29 | 47 |
| Diversity (divU) [5] | Travel | 4 | - | 102 |
| | Movie | 3 | - | 50 |
| | Music | 3 | - | 53 |
| | Book | 3 | - | 50 |
| Diversity (divC) [1] | Travel | 0.80 | 0.16 | 103 |
| | Movie | 0.87 | 0.12 | 50 |
| | Music | 0.86 | 0.10 | 50 |
| | Book | 0.92 | 0.08 | 50 |
| Usefulness [5] | Travel | 3.34 | 0.85 | 103 |
| | Movie | 3.55 | 0.73 | 50 |
| | Music | 3.18 | 0.85 | 53 |
| | Book | 3.43 | 0.81 | 50 |

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2). [Chart title: "The filter has improved the recommendations of the list"; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5)]

Table 13. Response rates (TC2)

| Domain | Constr.-based (Regular) (%) | Constr.-based (Expanded) (%) | N |
|---|---|---|---|
| Travel | 23 | 57 | 103 |
| Movie | 86 | 88 | 50 |
| Music | 72 | 75 | 53 |
| Book | 73 | 76 | 51 |

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. Where statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
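An FDR-based adjustment can be sketched in Python as follows. The sketch assumes the Benjamini-Hochberg step-up procedure, which is the most common FDR method; the text does not state which procedure was actually applied, and the p-values below are invented.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float],
                       alpha: float = 0.05) -> List[bool]:
    """Return, per test, whether it is rejected under FDR control at alpha."""
    m = len(p_values)
    # Indices of the tests, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value stays below its threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject all hypotheses up to and including that rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Invented p-values from four comparisons.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

In the invented example, only the smallest p-value survives the adjustment, although three of the four values lie below the unadjusted level of 0.05.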

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores ($div_U$ in the travel domain and $div_C$ in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify recommendation lists and increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.


Table 14. Performance results in TC2 (M = mean, SD = standard deviation, N = no. of participants)

Dimension (Metric) [Max Score]   Domain   Simple M   Simple SD   Expressive M   Expressive SD   N
Accuracy (size) [10]             Travel   1.17       2.70        4.52           4.62            103
                                 Movie    6.72       4.08        7.20           3.92            50
                                 Music    5.96       4.64        6.36           4.51            53
                                 Book     4.59       4.50        5.00           4.51            51
Accuracy (mrs) [1]               Travel   66.73      20.51       65.34          22.36           23
                                 Movie    60.95      22.98       60.29          22.79           43
                                 Music    63.77      16.05       62.96          16.44           34
                                 Book     56.17      18.18       56.81          16.81           37
Novelty (nv) [1]                 Travel   0.48       0.41        0.61           0.49            21
                                 Movie    0.44       0.36        0.55           0.80            42
                                 Music    0.55       0.39        0.52           0.37            38
                                 Book     0.77       0.34        0.76           0.33            33
Diversity (divU) [5]             Travel   3          -           3.5            -               24
                                 Movie    4          -           4              -               43
                                 Music    4          -           4              -               38
                                 Book     3          -           3              -               37
Diversity (divC) [1]             Travel   0.68       0.18        0.72           0.21            19
                                 Movie    0.70       0.24        0.73           0.23            41
                                 Music    0.86       0.13        0.87           0.12            34
                                 Book     0.84       0.24        0.95           0.63            29
Usefulness [5]                   Travel   3.33       0.93        3.50           0.77            24
                                 Movie    3.32       0.85        3.31           0.86            43
                                 Music    3.55       0.81        3.64           0.78            38
                                 Book     3.39       0.83        3.62           1.21            37

Table 15. Jaccard Indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35
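
The Jaccard index used to compare the result sets of two retrieval methods can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name, the example IRIs, and the convention of returning 0.0 for two empty lists are assumptions.

```python
def jaccard_index(list_a, list_b):
    """Return |A ∩ B| / |A ∪ B| for two recommendation lists."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty lists have no measurable overlap.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical result lists for one participant (simple vs. expressive filter).
simple = ["ex:Rome", "ex:Paris", "ex:Vienna"]
expressive = ["ex:Rome", "ex:Paris", "ex:Vienna", "ex:Prague"]
print(jaccard_index(simple, expressive))
```

A value close to 1 means the expressive list largely contains the simple list, which matches the interpretation of an "enhanced version" of the simple result set.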

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Table 16. Response rates in % (TC3)

Domain   Regular   Aggr.-based   N
Travel   99        100           103
Movie    96        92            50
Music    100       100           53
Book     96        100           51

Figure 9 depicts participants' general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3). [Bar chart "Perceived Usefulness (Regular vs. Roll-up Queries)"; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement; bars grouped by Method (Regular, Roll-up).]

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001), and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain, and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially notable given the low mean Jaccard indices for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries add value because they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.
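
The within-subjects comparisons reported as t(df) values can be illustrated with a paired-samples t statistic. The sketch below uses only the Python standard library; the function name and the score values are hypothetical, and it computes the test statistic and degrees of freedom only, not the p-value.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Return (t, df) of a paired-samples t-test on two equal-length score lists."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical divC scores of four participants (regular vs. aggregation-based).
regular = [0.70, 0.80, 0.75, 0.90]
aggregated = [0.85, 0.90, 0.95, 0.95]
t_value, df = paired_t(regular, aggregated)
print(f"t({df}) = {t_value:.2f}")
```

A negative t value, as in the reported results, means that the second method (here the aggregation-based one) scored higher on average.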

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-empty result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas a regular SPARQL query often did not return a single suggestion.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC, and Perceived Usefulness because the low response rates of the SPARQL-based approach led to a low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, in the remaining sessions (more than half), in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories.

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.

Table 17. Performance results in TC3 (M = mean, SD = standard deviation, N = no. of participants)

Dimension (Metric) [Max Score]   Domain   Regular M   Regular SD   Aggr.-based M   Aggr.-based SD   N
Accuracy (size) [10]             Travel   9.90        0.99         9.42            2.07             103
                                 Movie    2.88        0.59         2.76            0.82             50
                                 Music    10.00       0.00         10.00           0.00             53
                                 Book     2.88        0.59         3.00            0.00             51
Accuracy (mrs) [1]               Travel   48.21       24.98        44.93           25.59            102
                                 Movie    54.58       22.88        56.69           25.42            44
                                 Music    53.95       18.31        58.24           16.19            52
                                 Book     51.53       23.90        49.25           22.30            49
Novelty (nv) [1]                 Travel   0.43        0.39         0.38            0.40             98
                                 Movie    0.35        0.38         0.28            0.41             41
                                 Music    0.49        0.36         0.46            0.39             50
                                 Book     0.61        0.42         0.62            0.44             47
Diversity (divU) [5]             Travel   3.5         -            3               -                100
                                 Movie    4           -            3               -                43
                                 Music    4           -            3               -                50
                                 Book     3           -            3               -                48
Diversity (divC) [1]             Travel   0.79        0.17         0.89            0.13             99
                                 Movie    0.75        0.20         0.66            0.27             44
                                 Music    0.83        0.17         0.93            0.07             53
                                 Book     0.74        0.27         0.93            0.06             48
Usefulness [5]                   Travel   3.36        0.77         3.22            0.74             101
                                 Movie    3.20        0.92         3.26            0.94             44
                                 Music    3.45        0.84         3.44            0.85             51
                                 Book     2.95        0.87         3.03            0.96             48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates in % (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). [Bar chart "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items; bars grouped by Method (Regular, Cross-Domain).]


Table 20. Performance results in TC4 (M = mean, SD = standard deviation, N = no. of participants)

Dimension (Metric) [Max Score]   Domain   Regular (SPARQL) M   Regular (SPARQL) SD   Cross-Domain M   Cross-Domain SD   N
Accuracy (size) [10]             Movie    0.82                 2.09                  8.66             2.73              50
                                 Music    0.89                 1.59                  9.30             2.03              53
                                 Book     2.90                 3.98                  10.00            0.00              51
Accuracy (mrs) [1]               Movie    67.41                27.62                 53.65            14.71             16
                                 Music    62.36                31.76                 48.32            22.49             20
                                 Book     66.42                21.28                 55.59            16.84             27
Novelty (nv) [1]                 Movie    0.59                 0.49                  0.41             0.34              16
                                 Music    0.72                 0.44                  0.59             0.40              19
                                 Book     0.54                 0.41                  0.63             0.30              27
Diversity (divU) [5]             Movie    3                    -                     4                -                 16
                                 Music    3                    -                     3                -                 20
                                 Book     3                    -                     4                -                 27
Diversity (divC) [1]             Movie    (0.64)               (0.35)                (0.99)           (0.02)            (6)
                                 Music    (0.90)               (0.13)                (0.98)           (0.03)            (10)
                                 Book     (0.84)               (0.12)                (0.87)           (0.1)             (8)
Usefulness [5]                   Movie    3.57                 0.83                  3.35             1.57              15
                                 Music    3.53                 1.17                  3.11             0.98              20
                                 Book     3.41                 1.04                  3.41             0.94              27

Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33
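
The FDR adjustment applied to the pairwise Wilcoxon tests in TC4 can be sketched as follows. The text does not name the exact procedure; the sketch assumes the common Benjamini-Hochberg step-up method, and the function name and p-values are hypothetical.

```python
def fdr_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values of three pairwise tests (one per domain).
raw = [0.01, 0.04, 0.03]
print(fdr_adjust(raw))
```

A difference would then be reported as significant only if its adjusted p-value stays below the chosen significance level.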

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine, which facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While the novel recommendation approaches of the SKOSRec engine are not superior in each domain and test case, it is at least safe to say that they considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a back-end application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions for designing such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.
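
The suggested order of retrieval options (simple on-the-fly recommendations first, then constraint-based filtering, then advanced patterns) could be wired into an interface roughly as sketched below. All names are hypothetical, and the stub functions merely stand in for real SKOSRec requests; the paper does not prescribe this control flow.

```python
def retrieve_with_fallback(strategies, profile, min_results=1):
    """Try retrieval strategies in order; return the first sufficiently large result."""
    for name, run_query in strategies:
        results = run_query(profile)
        if len(results) >= min_results:
            return name, results
    return None, []


# Hypothetical stand-ins for engine calls; each returns a list of item IRIs.
def regular_request(profile):
    return []  # e.g. the regular approach found nothing helpful


def expressive_filter_request(profile):
    return []


def roll_up_request(profile):
    return ["ex:directorA", "ex:directorB"]


strategies = [
    ("regular", regular_request),
    ("expressive filter", expressive_filter_request),
    ("roll-up", roll_up_request),
]
used, items = retrieve_with_fallback(strategies, profile=["ex:movie1"])
print(used, items)
```

In an actual UI the later strategies would more likely be offered as user-selectable refinements than triggered automatically; the sketch only shows the ordering.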

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX, or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
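
The three aggregation modes can be illustrated with a small roll-up over similarity scores. This is a minimal sketch under stated assumptions: the function name, the `parent_of` mapping from sublevel items to the resource they are summarized under, and the example IRIs are hypothetical and do not reproduce the engine's internals.

```python
from collections import defaultdict

# Aggregation modes named in the grammar (SUM | MAX | AVG).
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda scores: sum(scores) / len(scores),
}


def roll_up(similarities, parent_of, mode="SUM"):
    """Aggregate sublevel similarity scores per parent resource, best first."""
    aggregate = AGGREGATORS[mode]
    grouped = defaultdict(list)
    for item, score in similarities.items():
        parent = parent_of.get(item)
        if parent is not None:  # items without a parent are skipped
            grouped[parent].append(score)
    ranked = {parent: aggregate(scores) for parent, scores in grouped.items()}
    return sorted(ranked.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical movie similarities rolled up to their directors.
similarities = {"ex:movie1": 0.5, "ex:movie2": 0.25, "ex:movie3": 0.5}
parent_of = {
    "ex:movie1": "ex:directorA",
    "ex:movie2": "ex:directorA",
    "ex:movie3": "ex:directorB",
}
print(roll_up(similarities, parent_of, "SUM"))
print(roll_up(similarities, parent_of, "AVG"))
```

The choice of mode changes the ranking: SUM favors parents with many similar sublevel items, whereas AVG favors parents whose items are consistently similar.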

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores are compared with the threshold. In line with the operators of the grammar (>, >=, =), users can state whether scores should be "larger than", "larger than or equal to", or "equal to" the threshold value.
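
How a Sim condition (a Relation plus a Decimal threshold) might select concepts for expansion can be sketched as follows. The function name, the score dictionary, and the example concepts are hypothetical; only the three relation symbols are taken from the grammar.

```python
import operator

# Relation symbols of the grammar mapped to comparison functions.
RELATIONS = {">": operator.gt, ">=": operator.ge, "=": operator.eq}


def expand_concepts(concept_scores, relation, threshold):
    """Keep concepts whose concept-to-concept score satisfies the Sim condition."""
    compare = RELATIONS[relation]
    return [c for c, score in concept_scores.items() if compare(score, threshold)]


# Hypothetical similarity scores of candidate concepts to a profile concept.
scores = {"ex:Thriller": 0.8, "ex:Drama": 0.5, "ex:Comedy": 0.2}
print(expand_concepts(scores, ">=", 0.5))
print(expand_concepts(scores, ">", 0.5))
```

Exact equality on floating-point scores is fragile in practice, so an implementation would probably compare with a tolerance for the "=" relation.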

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart), or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference, or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
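
The variable check described for the RecWhereClause can be approximated as below. This is a simplified sketch, not the engine's compiler: it assumes the clause has already been split into positive patterns and MINUS patterns (both given as plain strings), and it recognizes only variables written with a leading question mark.

```python
import re


def variables(pattern_text):
    """Collect ?-prefixed variable names from a graph pattern string."""
    return set(re.findall(r"\?\w+", pattern_text))


def variable_is_bound(var, positive_patterns, minus_patterns=()):
    """True if var occurs outside MINUS; occurrences only inside MINUS do not count."""
    # minus_patterns is accepted for clarity but deliberately ignored:
    # a variable that appears only there cannot bind any LOD resource.
    return any(var in variables(p) for p in positive_patterns)


# Hypothetical postfilter clause for the recommendation variable ?movie.
positive = ["?movie dct:subject ?category ."]
minus = ["?movie a ex:ShortFilm ."]
print(variable_is_bound("?movie", positive, minus))
print(variable_is_bound("?movie", ["?x a ex:Film ."], minus))
```

In the second call the variable appears only in the MINUS part, so the check fails and the query would have to be rejected.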

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases in which users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language. In W3C Recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama, In IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. In ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, art. 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data. In 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.



Table 1. Summary of existing retrieval and recommendation approaches

System type | LOD-enabled | A: SPARQL-based queries | B: On-the-fly similarity | C: Flexible combination of A and B
IR systems | x | x | x | x
Query-based RS | x | x | X | (X)
Index-based Linked Data IR systems | X | x | x | x
On-the-fly Linked Data IR systems | X | x | X | x
LDRS | X | x | x | x
Query-based LDRS | X | X | x | x

Listing 1. Grammar of the SKOSRec query language

    SKOSRecQuery = Prologue SelectPart? SimProjection PREF ItemPart+ RecWhereClause?
    SelectPart = SELECT (DISTINCT | REDUCED)? Var+ RecWhereClause LimitClause? Aggregation?
    Aggregation = AGG IRIref Var (SUM | MAX | AVG)
    SimProjection = RECOMMEND Var TOP INTEGER ServiceIntegration?
    ServiceIntegration = FROM SERVICE IRIREF
    ItemPart = (VarPart | IRI) Sim?
    VarPart = [ Var RecWhereClause ]
    Sim = SIM Relation DECIMAL
    Relation = ( > | >= | = )
    RecWhereClause = WHERE RecGroupGraphPattern
    RecGroupGraphPattern = '{' RecGroupGraphPatternSub '}'
    RecGroupGraphPatternSub = TriplesBlock? (RecGraphPatternNotTriples '.'? TriplesBlock?)*
    RecGraphPatternNotTriples = RecGroupOrUnionGraphPattern | RecOptionalGraphPattern | RecMinusGraphPattern | Filter | Bind | InlineData
    RecGroupOrUnionGraphPattern = RecGroupGraphPattern ('UNION' RecGroupGraphPattern)*
    RecOptionalGraphPattern = 'OPTIONAL' RecGroupGraphPattern
    RecMinusGraphPattern = 'MINUS' RecGroupGraphPattern

with all its components. The most important parts are the SKOSRec query processing units (i.e., the parser and the compiler). They check the syntax of recommendation requests and regulate the correct execution of critical operations of the engine (e.g., graph-based filtering, result set joins, similarity calculation). The architecture does not dictate a particular SPARQL server implementation for LOD repositories, since the system is decoupled from the endpoint that is accessed over HTTP during retrieval. However, the prototypical implementation of the SKOSRec engine utilizes the OpenLink Virtuoso server.3

Fig. 1 also shows the sequential order of a typical recommendation workflow for a complex request that consists of the following retrieval steps:

1. Issuing a SKOSRec query
2. Parsing the SKOSRec query
3. Loading information on the LOD repository/SPARQL endpoint to be queried from the configuration
4. Retrieval of preferred items and setting up of the user profile (see Subsect. 3.3 Preference Querying)
5. Optimization of the retrieval process through workload reduction
6. Determination of similar LOD resources based on the user profile (see Subsect. 3.2 On-the-Fly Recommendations)
7. Ranking of LOD resources
8. Sending of recommendation results to the compiler and joining them with postfilter statements
9. Retrieval of the final result set (see Subsect. 3.5 Postfiltering & Combinations)
10. Output to the user

3 https://virtuoso.openlinksw.com

Fig. 1. Architecture and workflow of the SKOSRec engine

3.2. On-the-Fly Recommendations

A simple on-the-fly request is the most basic form of a SKOSRec query. It is based on the engine's ability to generate ad-hoc recommendations from a user profile containing preference statements for LOD resources. Listing 2 depicts an example SKOSRec on-the-fly request that obtains suggestions based on a user's preference for the western movie "They Call Me Trinity". In the given query, the preferred movie is represented by the DBpedia resource dbr:They_Call_Me_Trinity. The shown query also contains a prefix with a namespace declaration and a specification regarding the number of suggestions the engine should generate.

Listing 2. A simple recommendation query for the movie domain (Q1)

    PREFIX dbr: <http://dbpedia.org/resource/>

    RECOMMEND ?movie TOP 5
    PREF dbr:They_Call_Me_Trinity

In this basic form, the query contains neither pre- or postfilter conditions nor a preference query statement. Thus, the engine simply identifies all movies that are similar to the preference in the profile. When executed over DBpedia, the query returns the solution set that is depicted in Table 2.

Table 2. Result set for Q1

movie
dbr:God_Forgives'_I_Don't
dbr:Ace_High_(1968_film)
dbr:Boot_Hill_(film)
dbr:Trinity_Is_Still_My_Name
dbr:Troublemakers_(1994_film)

For determining recommendations in an ad-hoc fashion, the system identifies SKOS annotations by matching a suitable property (i.e., annotation property) in an RDF dataset (Definition 1).

Definition 1 (Annotation property). An annotation property is an IRI (Listing 3) that is defined in the Dublin Core Metadata Initiative (DCMI) specification for subject annotations. It can occur in conjunction with one of the two namespaces which are stated by the standard [15].

Listing 3. Annotation property

    <ANNOTPROP> = dc:subject | dct:subject


These properties help to retrieve SKOS annotation triples for items preferred by a user (Definition 2).

Definition 2 (SKOS annotation triple). A SKOS annotation triple ($st$) is a triple that is comprised of an IRI in the subject position, an annotation property (ANNOTPROP) as predicate, and a concept $c$ in the object position. The concept is part of a knowledge organization system $S$ in SKOS format ($C = \{c \mid c \in S\}$) (Eq. 1).

\[ st \in I \times \langle\mathrm{ANNOTPROP}\rangle \times C \tag{1} \]

The identification of SKOS annotation triples facilitates a fast retrieval of potentially similar items. Therefore, the engine evaluates shared features between resources from the SKOS annotation dataset (Definition 3).

Definition 3 (SKOS annotation dataset). A SKOS annotation dataset ($AD$) is a subset of an RDF dataset that contains SKOS annotation triples (Eq. 2).

\[ AD \subset I \times \langle\mathrm{ANNOTPROP}\rangle \times C \tag{2} \]

Before the similarity calculation process starts, the engine determines SKOS annotations for each LOD resource ($r$) from the user profile by issuing a subsequent SPARQL query to the LOD repository from which the user intends to receive suggestions. The notion of SKOS annotations is specified in Definition 4.

Definition 4 (SKOS annotations). In the annotation dataset, LOD resources directly link to concepts of a SKOS system via a predefined annotation property. SKOS annotations for an input resource $r$ are defined as in Equation 3.

\[ Annot(r) = \{ c \in S \mid \exists\, (r, \langle\mathrm{ANNOTPROP}\rangle, c) \in AD \} \tag{3} \]

Afterwards, the engine identifies which of the other resources in the dataset share annotations with it. Similarities only have to be computed for items with matching concepts [13, 52]. Triples can be efficiently joined along item features, since native RDF storage systems usually index triples rather than columns [41]. Thus, the quick identification of mutual annotations through SPARQL requests facilitates ad-hoc similarity calculation. Definition 5 specifies the notion of relevant resources and their annotations that are needed to identify similar items. The applied syntax follows Pérez et al. [42].

Definition 5 (Relevant resources and their annotations). The conjunctive graph pattern $P_r$ matches all items and subjects that are potentially relevant for recommendation retrieval (Eq. 4).

\[ P_r = (r, \langle\mathrm{ANNOTPROP}\rangle, c) \ \mathrm{AND}\ (q, \langle\mathrm{ANNOTPROP}\rangle, c) \tag{4} \]

The mapping $\Omega_r$ of relevant resources and their SKOS annotations is obtained by retrieving all resources $q$ that share at least one SKOS concept $c$ with resource $r$ from $AD$ (Eq. 5). The corresponding result set contains mappings for all variables (i.e., $c$ and $q$) in the triple statements of $P_r$.

\[ \Omega_r = [\![ P_r ]\!]_{AD} \tag{5} \]
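As a rough illustration of Definitions 4 and 5, the following Python sketch derives Annot(r) and the shared-concept mapping from a small in-memory annotation dataset. It is not the SKOSRec implementation, which issues SPARQL requests against an endpoint. The (resource, concept) tuples, the `ex:` and `cat:` identifiers, and the function names are assumptions made for this example, and the input resource r itself is left out of the result.

```python
from collections import defaultdict

# Toy annotation dataset AD as (resource, concept) pairs. The annotation
# property (dc:subject / dct:subject) is implicit. All identifiers are
# hypothetical and only serve the illustration.
AD = {
    ("ex:Trinity", "cat:Spaghetti_Western"),
    ("ex:Trinity", "cat:Italian_films"),
    ("ex:Ace_High", "cat:Spaghetti_Western"),
    ("ex:Ace_High", "cat:Italian_films"),
    ("ex:Boot_Hill", "cat:Spaghetti_Western"),
    ("ex:Amelie", "cat:French_films"),
}


def annotations(resource, ad):
    """Annot(r): concepts that the resource links to directly (Eq. 3)."""
    return {concept for (subject, concept) in ad if subject == resource}


def relevant_resources(r, ad):
    """Map every resource q != r to the concepts it shares with r (Eqs. 4-5)."""
    annot_r = annotations(r, ad)
    shared = defaultdict(set)
    for q, concept in ad:
        if q != r and concept in annot_r:
            shared[q].add(concept)
    return dict(shared)


omega_r = relevant_resources("ex:Trinity", AD)
```

Resources without any shared concept (here `ex:Amelie`) never enter the mapping, which is why similarities only have to be computed for a small candidate set.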

Upon extraction of relevant resources and annotations, the process of similarity calculation can start. The engine determines the information content (IC) of the shared SKOS annotations of two resources. This approach is based on the hybrid similarity metric proposed by Meymandpour and Davis for LOD-enabled RS [37]. However, while their measure considers all types of IRI resources for the retrieval, our similarity metric purely relies on SKOS annotations (Definition 6).

Definition 6 (SKOS similarity). Let $Annot(r)$ be the set of SKOS features of resource $r$ and $Annot(q)$ the set of SKOS features of resource $q$, with $q \in \{\mu(q) \mid \mu \in \Omega_r\}$. Then their similarity can be derived from the IC of their shared concepts $C_{shared} = Annot(r) \cap Annot(q)$.

\[ sim(r, q) = IC(C_{shared}) \tag{6} \]

The IC of a set of SKOS concepts is measured by the sum of the negative logarithms of each concept's frequency in relation to the maximum frequency among relevant resources (Definition 7).

Definition 7 (SKOS Information Content). The SKOS IC is defined as an aggregation of the individual IC values of each concept that is contained in the set of shared features ($c \in C_{shared}$), where $freq(c)$ is the frequency of $c$ among all relevant resources and $n$ is the maximum frequency among these resources. The final similarity score of two resources $r$ and $q$ is determined by summing the IC values of each concept of the shared feature set.

\[ IC(C_{shared}) = -\sum_{c \in C_{shared}} \log\left(\frac{freq(c)}{n}\right) \tag{7} \]
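The IC-based similarity of Definitions 6 and 7 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the engine's code: freq(c) is taken to be the number of relevant resources q annotated with c, the natural logarithm is used, and the function name and the toy input are hypothetical.

```python
import math
from collections import Counter


def skos_similarities(shared_by_q):
    """Return sim(r, q) for every q, given the concepts each q shares with r.

    freq(c) counts the relevant resources annotated with c, and n is the
    largest such count, as described in Definition 7.
    """
    freq = Counter(c for concepts in shared_by_q.values() for c in concepts)
    if not freq:
        return {}
    n = max(freq.values())
    return {
        q: sum(-math.log(freq[c] / n) for c in concepts)
        for q, concepts in shared_by_q.items()
    }


# Hypothetical shared-concept sets for three candidate resources.
shared = {"q1": {"a", "b"}, "q2": {"a"}, "q3": {"a", "c"}}
sims = skos_similarities(shared)
```

In this toy input the concept `a` is shared by all candidates and therefore contributes nothing to any score, whereas the rarer concepts `b` and `c` raise the scores of `q1` and `q3`.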

After having obtained resource annotations as well as IC values, similarity scores between the input resource $r$ and each potentially relevant resource $q$ can be calculated. When a user profile contains more than a single item, similarity values are aggregated through summation to determine the final recommendation score (Definition 8).

Definition 8 (Recommendation score). The recommendation score of a potentially relevant resource $q$ is quantified by the sum of similarities with each resource $r$ that can be found in the profile ($Pr$).

\[ score(Pr, q) = \sum_{r \in Pr} sim(r, q) \tag{8} \]

Upon calculation of similarity values for each potentially relevant item $q$, the engine ranks the retrieved resources and cuts the list off at the limit ($k$) that the user specifies in the SKOSRec query.

With the presented methods of on-the-fly retrieval, recommendations can be generated from LOD repositories through an API without requiring local metadata or excessive preprocessing operations other than the mapping of profile items with LOD resources.4
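The scoring and ranking step of Definition 8 might look as follows in Python. The sketch assumes that similarity values have already been computed per profile item; the function name, the dictionary layout, the tie-breaking rule, and the numbers are illustrative assumptions, not part of the SKOSRec engine.

```python
def recommend(similarities, k):
    """Rank candidates by summed similarity to all profile items (Eq. 8).

    similarities maps each profile item r to a dict {q: sim(r, q)}.
    Profile items themselves are never recommended.
    """
    profile = set(similarities)
    scores = {}
    for sims in similarities.values():
        for q, value in sims.items():
            if q not in profile:
                scores[q] = scores.get(q, 0.0) + value
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical similarity values for a profile with two items.
similarities = {
    "r1": {"q1": 1.0, "q2": 0.5, "r2": 2.0},
    "r2": {"q1": 0.25, "q3": 0.75},
}
top = recommend(similarities, k=2)
```

Here `q1` is similar to both profile items, so its two similarity values add up and it is ranked first.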

3.3. Preference Querying

In the previous subsection, it was assumed that users express their tastes as preference statements for a certain item, given as an IRI referring to a LOD resource. However, it may also be the case that likings are only vaguely known, for instance, when users prefer products with specific features but are not able or willing to specify the actual items. The engine obtains such a profile by issuing a subsequent SPARQL query that contains the preference graph pattern ($P_{pref}$) in the WHERE condition. Suppose a user is interested in Quentin Tarantino movies (dbr:Quentin_Tarantino), but does not specify the precise items he likes. He could send a preference query to the SKOSRec engine. The graph pattern in the request would match all movies that are connected to dbr:Quentin_Tarantino with the properties dbo:director or dbo:producer (see Listing 4).

4 Di Noia et al. describe how a mapping procedure can be conducted for a real-world implementation. For their LDRS, they matched movie titles with the corresponding LOD resources in DBpedia. The authors extracted IRIs, labels, and publication dates for all movie resources. They applied the Levenshtein distance on these resources to identify matching movie items [13].

Listing 4. SKOSRec query to generate recommendations based on a preference query in the movie domain (Q2)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    RECOMMEND ?movie TOP 5
    PREF [ ?prefMovie
      WHERE { { ?prefMovie dbo:director dbr:Quentin_Tarantino }
              UNION
              { ?prefMovie dbo:producer dbr:Quentin_Tarantino } } ]

Table 3 shows the corresponding solution set to the preference query, containing films from DBpedia that are similar to Quentin Tarantino movies.

Table 3. Result set for Q2

movie
dbr:The_Godfather_Part_II
dbr:The_Prestige_(film)
dbr:The_Dark_Knight
dbr:Planet_Terror
dbr:Clerks

The notion of a queried profile is specified in Definition 9.

Definition 9 (Queried profile). A queried profile is a user profile that has been compiled by retrieving all LOD resources $r$ that are part of an annotation graph $AD$ and are contained in the solution mappings resulting from the evaluation of a specified graph pattern ($P_{pref}$) over a dataset $D$ (Eq. 9).

\[ Pr = \{ \mu(r) \mid \mu \in [\![ P_{pref} ]\!]_D,\ r \in dom(\mu),\ r \in AD \} \tag{9} \]
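To make Definition 9 more tangible, the Python sketch below compiles a queried profile from an in-memory list of triples. It hard-codes a simple pattern of the kind used in Q2 (one target resource reached via a union of properties) instead of evaluating arbitrary SPARQL; the triples, the `ex:` identifiers, and the function name are hypothetical.

```python
def queried_profile(triples, properties, target, annotated):
    """Collect subjects linked to target via any given property (a UNION),
    keeping only those that also occur in the annotation dataset."""
    matches = {s for (s, p, o) in triples if p in properties and o == target}
    return matches & annotated


# Hypothetical triples; the first three match the preference pattern.
triples = [
    ("ex:Film_A", "dbo:director", "ex:Director_X"),
    ("ex:Film_B", "dbo:producer", "ex:Director_X"),
    ("ex:Film_C", "dbo:director", "ex:Director_X"),
    ("ex:Film_D", "dbo:director", "ex:Director_Y"),
]
# ex:Film_C has no SKOS annotations and therefore cannot enter the profile.
annotated = {"ex:Film_A", "ex:Film_B", "ex:Film_D"}
profile = queried_profile(
    triples, {"dbo:director", "dbo:producer"}, "ex:Director_X", annotated
)
```

The intersection with the annotated resources mirrors the condition that profile items have to be part of the annotation dataset.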

3.4. Prefiltering

To facilitate exploratory search in LOD repositories, users should also have the option to formulate conditions in their queries. One way to enable such filters is to apply them before similarity calculation. Thus, recommendation lists can be adjusted to individual needs.

Due to the expressiveness of RDF data, filter conditions can not only be applied on attributes that are directly connected to potentially relevant items, but can also be declared as a graph pattern. By this means, it is possible to retrieve LOD resources that are both similar to the user profile and fulfill an advanced condition.

The strengths of the approach will be illustrated with a travel recommendation request that was executed over DBpedia. Suppose a customer expressed a preference for the Lake Baikal region (e.g., as indicated by a positive rating on a travel community site). With simple on-the-fly retrieval, he would receive a recommendation list that primarily consists of travel destinations that are located in the proximity of this geographical area. A graph-based filter, on the other hand, facilitates advanced retrieval options. In the given example, it enables the user to obtain suggestions for Southeast Asian travel destinations that have similar features as the Lake Baikal region, even though the resource dbr:Lake_Baikal is not directly connected to any destination with the annotation dbc:Southeast_Asia. Such destinations can only be identified by exploring the surrounding graph structure of the filter attribute.

The advanced constraint ensures that the generated result set is not empty. A simple attribute-level filter, which is commonly utilized in faceted search systems, would not generate any search hits. By these means, graph-based filter patterns can help users to better explore knowledge graphs, thereby overcoming data quality issues in LOD repositories. The corresponding SKOSRec query to this example is shown in Listing 5.

Listing 5. SKOSRec query with an expressive prefilter condition to retrieve Southeast Asian destinations based on a preference for the Lake Baikal region (Q3)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbc: <http://dbpedia.org/resource/Category:>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    RECOMMEND ?destination TOP 5
    PREF dbr:Lake_Baikal
    [ WHERE { ?destination dbo:country ?country .
              ?country dct:subject ?countrySubject .
              ?countrySubject skos:broader dbc:Southeast_Asia .
              ?destination rdf:type dbo:Place .
    } ]

Table 4 shows the solution to this query. It contains travel destinations and regions of interest that are both similar to the previously liked destination dbr:Lake_Baikal and located in Southeast Asia. Ecoregions around lakes, rivers, or mountains from Southeast Asian countries, such as Cambodia, Vietnam, or Malaysia, are presented to the user.

Table 4. Result set of Q3

destination
dbr:Tonle_Sap
dbr:Mekong
dbr:Mount_Kinabalu
dbr:Laguna_de_Bay
dbr:Lake_Toba

The technical details for this type of query are given in the following definitions. Before the extraction of relevant resources and annotations, the recommendation engine identifies the resources that satisfy the filter (Definition 10).

Definition 10 (Prefiltered resources). The mapping of filtered resources $\Omega_{prefilter}$ is obtained by matching a specified graph pattern ($P$) to the RDF graph of a dataset $D$ (Eq. 10).

\[ \Omega_{prefilter} = \{ \mu_{|q} \mid \mu \in [\![ P ]\!]_D \} \tag{10} \]

After preselection, resources are brought together with user profile data to retrieve the final set of recommendations (Definition 11).

Definition 11 (Annotations and relevant resources (prefiltered retrieval)). The solution mappings of annotations and relevant resources in prefiltered retrieval mode are obtained by joining the set of prefiltered resources with the mappings $\Omega_r$ of all relevant resources and annotations as determined from the user profile.

\[ \Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_r \tag{11} \]

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and respective SKOS similarities are quantified according to the user filter (Definitions 12 and 13). By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources.

Definition 12 (Conditional SKOS similarity). Let $Annot(r)$ be the set of SKOS features of $r$ and $Annot(q)$ the set of SKOS features of resource $q$, which is contained in the filtered set of annotations and relevant resources ($q \in \{\mu(q) \mid \mu \in \Omega_{pre}\}$). Then the similarity can be derived from the IC of their shared concepts ($C_{cond} = Annot(r) \cap Annot(q)$) (Eq. 12).

\[ sim_{cond}(r, q) = IC(C_{cond}) \tag{12} \]

Definition 13 (Conditional SKOS Information Content). The conditional IC of a set of SKOS annotations is defined by the sum of the IC of each concept $c \in \{\mu(c) \mid \mu \in \Omega_{pre}\}$, where $freq_{cond}(c)$ is the frequency of $c$ among all filtered resources and $n$ is the maximum frequency among these resources (Eq. 13).

\[ IC(C_{cond}) = -\sum_{c \in C_{cond}} \log\left(\frac{freq_{cond}(c)}{n}\right) \tag{13} \]

The system carries out the calculations (i.e., determination of similarity values, the ranking of LOD resources) in the same way as in non-filtering mode.
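The effect of prefiltering on the IC values (Definitions 11 to 13) can be illustrated with the following Python sketch. It assumes that the join of Eq. 11 amounts to keeping only those candidates that also satisfy the filter, and that frequencies are recounted on this reduced set. Function and variable names as well as the toy data are hypothetical.

```python
import math
from collections import Counter


def conditional_similarities(shared_by_q, prefiltered):
    """sim_cond(r, q) for candidates that pass the prefilter."""
    # Join step: keep only candidates that satisfy the filter pattern.
    omega_pre = {q: cs for q, cs in shared_by_q.items() if q in prefiltered}
    # Frequencies are recounted on the filtered set only.
    freq_cond = Counter(c for cs in omega_pre.values() for c in cs)
    if not freq_cond:
        return {}
    n = max(freq_cond.values())
    return {
        q: sum(-math.log(freq_cond[c] / n) for c in cs)
        for q, cs in omega_pre.items()
    }


shared = {"q1": {"a", "b"}, "q2": {"a"}, "q3": {"a", "c"}}
prefiltered = {"q1", "q3", "q9"}  # q9 passes the filter but shares nothing with r
sims_cond = conditional_similarities(shared, prefiltered)
```

Because `q2` is filtered out, the concept `a` now occurs only twice, so the scores of `q1` and `q3` are computed against a maximum frequency of 2 rather than 3.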

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources; it can also apply user constraints after similar items have already been determined. By this means, recommendations are filtered ex post, after the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5. Result set of Q4

director | movie
dbr:Kirk_Douglas | dbr:Scalawag_(film)
dbr:Kirk_Douglas | dbr:Posse_(1975_film)
dbr:Kevin_Costner | dbr:The_Postman_(film)
dbr:Kevin_Costner | dbr:Dances_with_Wolves
... | ...
dbr:Gene_Kelly | dbr:Hello_Dolly_(film)
dbr:Gene_Kelly | dbr:Invitation_to_the_Dance_(film)
... | ...
dbr:Alan_Alda | dbr:Goodbye_Farewell_and_Amen
dbr:Alan_Alda | dbr:Margaret'_s_Engagement
... | ...
dbr:Steve_Buscemi | dbr:Interview_(2007_film)
dbr:Steve_Buscemi | dbr:Animal_Factory

Listing 6. SKOSRec query to generate simple postfiltered recommendations in the movie domain (Q4)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE { ?movie dbo:director ?director .
            ?director rdf:type yago:Director110014939 . }
    LIMIT 50
    RECOMMEND ?director TOP 5
    PREF dbr:Quentin_Tarantino

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity reasons.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.

Another weakness is the biased semantics of the query. Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality, or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies he/she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to several LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).

Aggregation-based requests can be well combined with other retrieval patterns, such as preference queries. In the case of the movie recommendation scenario, a user could state that he enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors that shot similar films. Usually, this kind of question can only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate postfiltered recommendations based on a preference query in the movie domain (Q5)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE { ?movie dbo:director ?director .
            ?director rdf:type yago:Director110014939 . }
    LIMIT 50
    AGG dbr:Quentin_Tarantino ?director SUM
    RECOMMEND ?movie TOP 100
    PREF [ ?prefMovie
      WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino } ]

Table 6. Result set of Q5

director | movie
dbr:Steven_Spielberg | dbr:Indiana_Jones
dbr:Steven_Spielberg | dbr:Jurassic_Park_(film)
... | ...
dbr:Francis_Ford_Coppola | dbr:The_Godfather_Part_II
dbr:Francis_Ford_Coppola | dbr:The_Godfather_Part_III
... | ...
dbr:Christopher_Nolan | dbr:The_Prestige_(film)
dbr:Christopher_Nolan | dbr:The_Dark_Knight
... | ...
dbr:Robert_Rodriguez | dbr:Planet_Terror
dbr:Robert_Rodriguez | dbr:Machete_(film)
... | ...
dbr:Kevin_Smith | dbr:Clerks
dbr:Kevin_Smith | dbr:Clerks_II

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they are entirely different. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style5), Q4 provided a completely different output. It shows US American actor-directors whose movies range from musical films (dbr:Hello_Dolly_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Tab. 5) seem rather arbitrary and not exceptionally helpful concerning the initial user request.

The combined aggregation-based query can also be described as a roll-up request, since suggestions are based on preferences for sublevel entities. Another example for a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

5 https://en.wikipedia.org/wiki/Quentin_Tarantino

Listing 8. SKOSRec query to generate postfiltered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dul: <http://www.ontologydesignpatterns.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?city
    WHERE { ?poi ?property ?city .
            ?city rdf:type dbo:City .
            ?property rdfs:subPropertyOf dul:hasLocation . }
    LIMIT 3
    AGG dbr:London ?city SUM
    RECOMMEND ?poi TOP 100
    PREF dbr:Oxford_Street
         dbr:Notting_Hill
         dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7. Result set of Q6

city
dbr:Oxford
dbr:Abingdon-on-Thames
dbr:Wallingford_Oxford

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere rather than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.

Besides the above presented roll-up retrieval patterns, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?participated ?musicalAct
    WHERE { ?movie rdf:type dbo:Film .
            VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
            { ?musicalAct ?participated ?movie }
            UNION
            { ?movie ?participated ?musicalAct } }
    LIMIT 10
    AGG dbr:The_Jackson_5 ?movie MAX
    RECOMMEND ?musicalAct TOP 100
    PREF dbr:The_Jackson_5
    WHERE { VALUES ?actType { dbo:MusicalArtist dbo:Band }
            ?musicalAct rdf:type ?actType . }

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are combined with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tabs. 8 and 9). The SKOSRec query produces a more comprehensive solution than a regular SPARQL query.

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?musicalAct
    WHERE { ?movie rdf:type dbo:Film .
            VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
            { dbr:The_Jackson_5 ?participated ?movie }
            UNION
            { ?movie ?participated dbr:The_Jackson_5 } }
    LIMIT 5

Table 8. Result set of Q7

movie | musicalAct
dbr:Moonwalker | dbr:Michael_Jackson
... | ...
dbr:The_Wiz_(film) | dbr:Ashford_&_Simpson
dbr:Garfield_Gets_a_Life | dbr:The_Temptations
dbr:Pipe_Dreams_(1976_film) | dbr:Gladys_Knight_&_the_Pips
dbr:The_Jacksons | dbr:Jermaine_Jackson

Table 9. Result set of Q8

movie | musicalAct
dbr:The_Jacksons | dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 14 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 14 (SKOSRec recommendations). SKOSRec recommendations ($R(Pr)$) are the set of suggestions generated by processing the solution mappings ($\Omega$) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, prefiltering, or a combination of these techniques. $Pr$ represents the input profile that contains at least one preference for an item. The cardinality of $R(Pr)$ is the intended number of recommendations ($k$) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 14).

\[ R(Pr) = \{ x \mid x \in \{ q \mid score(Pr, q) > \operatorname{kmax}_{q \in \Omega} score(Pr, q) \},\ q \notin Pr \} \tag{14} \]

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions $R(Pr)$ can be joined with any graph pattern $P_{post}$ subsequent to the process of similarity calculation (Definition 15). It does not make a difference how the engine obtained these recommendations.

Definition 15 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern ($\Omega_{post}$) (Eqs. 15 and 16).

\[ \Omega_{postfilter} = \{ \mu \mid \mu \in [\![ P_{post} ]\!]_D,\ x \in dom(\mu) \} \tag{15} \]

\[ \Omega_{post} = \{ \mu_{|x} \mid x \in R(Pr) \} \bowtie \Omega_{postfilter} \tag{16} \]
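The join of Definition 15 can be pictured with a short Python sketch. Solution mappings are modelled as plain dictionaries from variable names to values, and the join is reduced to keeping the postfilter mappings whose binding for x is a recommended item. This representation, the function name, and the sample data are assumptions for illustration only.

```python
def postfilter(recommended, post_mappings, var="x"):
    """Keep postfilter mappings whose binding for var is a recommended item."""
    return [mu for mu in post_mappings if mu.get(var) in recommended]


# Hypothetical data: x binds a recommended director, y one of their movies.
recommended = {"ex:Director_A", "ex:Director_B"}
post_mappings = [
    {"x": "ex:Director_A", "y": "ex:Movie_1"},
    {"x": "ex:Director_C", "y": "ex:Movie_2"},
    {"x": "ex:Director_B", "y": "ex:Movie_3"},
]
omega_post = postfilter(recommended, post_mappings)
```

Only the mappings for the two recommended directors survive, regardless of how the recommendations were obtained beforehand.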

As the prefilter Ppost can be comprised of triple state-ments that are eligible in the RecGroupGraphPattern part ofthe SKOSRec grammar Aggregation-based queries have tobe treated differently than a regular postfilter request Apartfrom excluding sublevel items of the profile from the setof similar resources the engine should also omit any con-nections between the preferred superordinate item (a) andsimilar sublevel entities Additionally the user has to spec-ify the variable in his profile (eg y) that represents theitems within the postfilter graph pattern Ppost By this meansaggregation-based recommendations can be generated (Def-inition 16)

Definition 16 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item a in the profile (Eq. 17).

$$R_{agg}(a) = \{\, \mu(y) \mid \mu \in \Omega_{post},\; y \in \mathrm{dom}(\mu),\; a \notin \mu(y) \,\} \qquad (17)$$
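To make Definitions 15 and 16 more tangible, the following Python sketch models solution mappings as plain dictionaries from variable names to resource identifiers. It is a simplified illustration under stated assumptions: the helper names `postfilter_join` and `aggregation_resources` are hypothetical, the join is reduced to a membership test on the variable x, and the exclusion in Eq. 17 is read as "the binding of y must differ from the superordinate item a".

```python
from typing import Dict, List, Set

Mapping = Dict[str, str]  # variable name -> resource identifier


def postfilter_join(recommended: Set[str], postfilter: List[Mapping], var: str = "x") -> List[Mapping]:
    """Keep postfilter mappings whose binding for `var` is a recommended resource."""
    return [mu for mu in postfilter if var in mu and mu[var] in recommended]


def aggregation_resources(omega_post: List[Mapping], superordinate: str, var: str = "y") -> Set[str]:
    """Collect the bindings of `var`, leaving out the superordinate profile item."""
    return {mu[var] for mu in omega_post if var in mu and mu[var] != superordinate}


# Hypothetical postfilter matches: film x was directed by y.
postfilter_matches = [
    {"x": "ex:Film1", "y": "ex:DirectorA"},
    {"x": "ex:Film2", "y": "ex:DirectorB"},
    {"x": "ex:Film3", "y": "ex:DirectorC"},
]
omega_post = postfilter_join({"ex:Film1", "ex:Film2"}, postfilter_matches)
print(aggregation_resources(omega_post, superordinate="ex:DirectorA"))
```

In this toy data, the film that is not among the recommendations drops out in the join, and the director already stated in the profile is removed from the candidate set.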

For each potential aggregation-based suggestion (y), similarity scores of connected entities (x) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. When both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 17 (Aggregation-based ranking score). The ranking score of an LOD resource y that is connected to a set of recommended items is determined by aggregating scores of connected entities, either by the maximum (Eq. 18), the sum (Eq. 19) or the average (Eq. 20) of individual scores.

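$$\mathrm{score}_{aggMAX}(y) = \max_{x \in \mu(x,y)} \mathrm{score}(P_r, x) \qquad (18)$$

$$\mathrm{score}_{aggSUM}(y) = \sum_{x \in \mu(x,y)} \mathrm{score}(P_r, x) \qquad (19)$$

$$\mathrm{score}_{aggAVG}(y) = \frac{\sum_{x \in \mu(x,y)} \mathrm{score}(P_r, x)}{|\{x \in \mu(x,y)\}|} \qquad (20)$$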
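The three aggregation variants of Eqs. 18 to 20 can be illustrated with a short Python sketch. The function name `aggregate_scores`, the (x, y) pair representation and the sample identifiers are assumptions chosen for the example; the engine's actual data structures are not described in this section.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_scores(pairs: Iterable[Tuple[str, str]],
                     item_scores: Dict[str, float],
                     mode: str = "sum") -> Dict[str, float]:
    """Aggregate the scores of sublevel items x for each connected resource y."""
    grouped = defaultdict(list)
    for x, y in pairs:
        grouped[y].append(item_scores[x])
    if mode == "max":      # Eq. 18
        reducer = max
    elif mode == "sum":    # Eq. 19
        reducer = sum
    elif mode == "avg":    # Eq. 20
        reducer = lambda values: sum(values) / len(values)
    else:
        raise ValueError(f"unknown aggregation mode: {mode}")
    return {y: reducer(values) for y, values in grouped.items()}


# Hypothetical data: two similar films by director B, one by director C.
film_director = [("ex:F1", "ex:DirB"), ("ex:F2", "ex:DirB"), ("ex:F3", "ex:DirC")]
film_scores = {"ex:F1": 0.5, "ex:F2": 0.25, "ex:F3": 0.75}
for mode in ("max", "sum", "avg"):
    print(mode, aggregate_scores(film_director, film_scores, mode))
```

The example shows why the choice matters: summation rewards a director with several moderately similar films, whereas the maximum and the average favor a director with a single highly similar film.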

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with Java servlet technology.⁶ ⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls such as misleading navigation. Two test users went step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., by rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

⁶ http://tomcat.apache.org
⁷ http://www.oracle.com/technetwork/java/javaee/servlet/index.html

Subjects were recruited through the clickworker.com platform.⁸ Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just clicked through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.⁹

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.¹⁰ The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

Aside from the AJAX feature, the appeal of the interface

⁸ https://www.clickworker.de
⁹ http://api.jquery.com/jquery.ajax
¹⁰ http://lucene.apache.org/solr


Fig. 2. User profile generation, TC1 (music domain)

was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users on a separate screen. Suggestions were displayed with a sufficient amount of information (e.g., thumbnails, labels or a short description) to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on the 0 to 100 scale of the relevance slider (outer left side: lowest relevance, outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale would allow. The score, called mean relevance (mrs), was measured with Equation 21.

$$mrs = \frac{\sum_{i=1}^{k} \mathrm{score}_i}{k} \qquad (21)$$
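As a small worked example of Eq. 21, the mean relevance of one recommendation list can be computed as follows. The function name `mean_relevance` and the slider values are assumptions made for illustration only.

```python
from typing import Sequence


def mean_relevance(slider_scores: Sequence[float]) -> float:
    """Mean relevance (mrs) of one recommendation list, on the 0 to 100 slider scale."""
    if not slider_scores:
        raise ValueError("mrs is undefined for an empty recommendation list")
    return sum(slider_scores) / len(slider_scores)


# Hypothetical slider ratings for a list of four recommendations.
print(mean_relevance([80, 60, 40, 20]))  # 50.0
```

The explicit error for empty lists reflects that mrs can only be calculated when the engine returned at least one suggestion.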

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, if an RS only suggests popular products, it is likely that the user already knows them [3, 23]. A purely accuracy-based evaluation disregards these aspects, which is why other performance indicators such as novelty were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 22).

$$nv = \frac{|nov|}{|tp|} \qquad (22)$$
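The novelty ratio of Eq. 22 can be sketched in Python as shown below. The representation of ratings as (score, is_new) tuples and the function name `novelty` are assumptions; the relevance threshold of 50 follows the paper's definition of a relevant item, and returning `None` for lists without relevant items is a design choice of this sketch.

```python
from typing import List, Optional, Tuple


def novelty(ratings: List[Tuple[float, bool]], threshold: float = 50) -> Optional[float]:
    """Share of relevant items (score >= threshold) that the user marked as new."""
    relevant = [is_new for score, is_new in ratings if score >= threshold]
    if not relevant:
        return None  # no relevant items, so the ratio is undefined
    return sum(1 for is_new in relevant if is_new) / len(relevant)


# Hypothetical ratings: (relevance slider score, item was new to the user).
sample_ratings = [(80, True), (60, False), (30, True), (50, True)]
print(novelty(sample_ratings))
```

In the sample, three items pass the threshold and two of them are new, so the ratio is two thirds; the irrelevant but new item does not count.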

An item was considered relevant when it achieved a relevance (mrs) score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 65]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more alike the items are, the less diverse the result set is [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 23) and averaged for the recommendation list (Eq. 24).

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \qquad (23)$$

$$div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \qquad (24)$$
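Eqs. 23 and 24 amount to the average pairwise cosine distance within a recommendation list. The sketch below assumes, purely for illustration, that each item is represented by the set of its SKOS concept annotations, i.e., a binary vector, for which the cosine similarity reduces to the overlap divided by the geometric mean of the set sizes. The function names and the concept identifiers are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import List, Set


def cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity of two binary annotation vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def content_diversity(items: List[Set[str]]) -> float:
    """Average pairwise cosine distance of a recommendation list (Eqs. 23 and 24)."""
    k = len(items)
    if k < 2:
        raise ValueError("diversity needs at least two items")
    total = sum(1.0 - cosine(a, b) for a, b in combinations(items, 2))
    return 2.0 * total / (k * (k - 1))


# Hypothetical annotation sets for a list of three recommendations.
annotated_items = [{"c1", "c2"}, {"c1", "c2"}, {"c3"}]
print(content_diversity(annotated_items))
```

Because `combinations` enumerates each unordered pair once, the factor 2 / (k (k - 1)) turns the sum into a mean over all pairs, as in Eq. 24.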

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each recommendation list that the SKOSRec engine generated at runtime. In addition to the measurement of divC scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al., two Likert items were posed for this purpose [44]:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al., which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions [44]. The statements were adapted to the specificities of each usage scenario. Figure 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting can cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Figure 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint to the set of LOD resources. Upon execution of constraint-based recommendation requests, users

¹¹ http://dbpedia.org


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement "The filter has improved the recommendation results of list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar rollup query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination (i.e., a city, region or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three POIs belonging to it, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act (music domain) or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). The SKOSRec engine generated recommendations based on the creative works of the favored artist. Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

Fig. 5. Web form for a constraint-based recommendation request (music domain)

Fig. 6. Travel - User profile generation, TC3

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domains, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.


Fig. 7. Music - User profile generation, TC4 (page 9)

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states his/her favorite movie, which was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared a preference in his/her profile for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The gathered samples from the experiments contained answers from both genders with a slight overrepresentation of male participants (53.7%). Most study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either "manageable" (38.5%) or even as "easy" or "very easy" (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
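For readers unfamiliar with the reliability check, the following Python sketch computes Cronbach's α with the standard formula α = k/(k-1) · (1 - Σ item variances / variance of the sum scores). The function name `cronbach_alpha` and the response matrix are hypothetical and do not reproduce the study data.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(responses: List[Sequence[float]]) -> float:
    """Cronbach's alpha for a matrix with one row per participant, one column per Likert item."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of the participants' sum scores.
    total_var = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of four participants to three Likert items (1 to 5).
likert_answers = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 4, 3]]
print(round(cronbach_alpha(likert_answers), 4))
```

Values close to 1 indicate that the items vary together and can reasonably be averaged into one construct score; the sketch raises a `ZeroDivisionError` if all participants have identical sum scores.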

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. In order to avoid leaving valuable user assessments unused, agreements with the statement "The recommendations of list [] are diverse" were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M),¹² standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be above average. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, each domain's mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


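Table 10. Test cases in the web experiments (X = conducted, - = not conducted)

| Test Case | Evaluation Purpose | Query Type | Travel | Movie | Music | Book |
|---|---|---|---|---|---|---|
| TC1 | On-the-fly Recommendations (Baseline) | Q1 | X | X | X | X |
| TC2 | Constraint-based Recommendations | Q3 | X | X | X | X |
| TC3 | Advanced Queries (Rollup) | Q5, Q6 | X | X | X | X |
| TC4 | Advanced Queries (Cross-Domain) | Q7 | - | X | X | X |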

Table 11. Cronbach's α values for the concept of Perceived Usefulness

| Domain | Cronbach's α |
|---|---|
| Travel | 0.7002 |
| Movie | 0.8579 |
| Music | 0.8515 |
| Book | 0.8949 |

baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 ("neutral") in the multimedia domains and at 4 ("agree") in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement "The filter has improved the recommendation results of the list". Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 ("strongly agree"), 4 ("agree") or 3 ("neutral"). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-empty result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | M | SD | N |
|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 10.00 | 0.00 | 103 |
| | Movie | 10.00 | 0.00 | 50 |
| | Music | 10.00 | 0.00 | 53 |
| | Book | 9.80 | 1.40 | 51 |
| Accuracy (mrs) [1] | Travel | 56.49 | 0.00 | 103 |
| | Movie | 62.25 | 19.15 | 50 |
| | Music | 55.70 | 0.10 | 53 |
| | Book | 57.47 | 15.55 | 50 |
| Novelty (nv) [1] | Travel | 0.60 | 0.34 | 101 |
| | Movie | 0.36 | 0.31 | 50 |
| | Music | 0.36 | 0.39 | 52 |
| | Book | 0.61 | 0.29 | 47 |
| Diversity (divU) [5] | Travel | 4 | - | 102 |
| | Movie | 3 | - | 50 |
| | Music | 3 | - | 53 |
| | Book | 3 | - | 50 |
| Diversity (divC) [1] | Travel | 0.80 | 0.16 | 103 |
| | Movie | 0.87 | 0.12 | 50 |
| | Music | 0.86 | 0.10 | 50 |
| | Book | 0.92 | 0.08 | 50 |
| Usefulness [5] | Travel | 3.34 | 0.85 | 103 |
| | Movie | 3.55 | 0.73 | 50 |
| | Music | 3.18 | 0.85 | 53 |
| | Book | 3.43 | 0.81 | 50 |

[Figure 8. Panel title: "The filter has improved the recommendations of the list"; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

Table 13. Response rates (TC2)

| Domain | Constr.-based (Regular) | Constr.-based (Expanded) | N |
|---|---|---|---|
| Travel | 23% | 57% | 103 |
| Movie | 86% | 88% | 50 |
| Music | 72% | 75% | 53 |
| Book | 73% | 76% | 51 |

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
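The text does not name the specific FDR procedure. A common choice is the Benjamini-Hochberg step-up procedure, which is sketched below as an assumption rather than as the method the authors necessarily used. The function name `benjamini_hochberg` and the p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Return, per hypothesis, whether it is rejected at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value lies below its rank-specific threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            max_rank = rank
    # Reject all hypotheses up to and including that rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from four comparisons.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

In the example, the p-values 0.03 and 0.04 would pass an unadjusted 0.05 level but fail their rank-specific thresholds of 0.025 and 0.0375, so only the first comparison remains significant.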

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = -7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify recommendation lists and increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
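The Jaccard index used to compare the two result lists is the size of their intersection divided by the size of their union. The sketch below illustrates the calculation; the function name `jaccard_index`, the sample lists and the convention of returning 1.0 for two empty lists are assumptions of this example.

```python
from typing import Iterable


def jaccard_index(list_a: Iterable[str], list_b: Iterable[str]) -> float:
    """Overlap of two recommendation lists: |A ∩ B| / |A ∪ B|."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        return 1.0  # assumed convention: two empty lists count as identical
    return len(a & b) / len(a | b)


# Hypothetical result lists from a simple and an expanded filter.
simple_list = ["ex:R1", "ex:R2"]
expanded_list = ["ex:R1", "ex:R2", "ex:R3", "ex:R4"]
print(jaccard_index(simple_list, expanded_list))  # 0.5
```

The example also shows a property that matters for interpreting Table 15: when the expanded list is a strict superset of the simple list, the index drops below 1 even though no suggestion was lost.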


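Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Simple M   Simple SD   Expressive M   Expressive SD   N
Accuracy (size) [10]              Travel   1.17       2.70        4.52           4.62            103
Accuracy (size) [10]              Movie    6.72       4.08        7.20           3.92            50
Accuracy (size) [10]              Music    5.96       4.64        6.36           4.51            53
Accuracy (size) [10]              Book     4.59       4.50        5.00           4.51            51
Accuracy (mrs) [1]                Travel   66.73      20.51       65.34          22.36           23
Accuracy (mrs) [1]                Movie    60.95      22.98       60.29          22.79           43
Accuracy (mrs) [1]                Music    63.77      16.05       62.96          16.44           34
Accuracy (mrs) [1]                Book     56.17      18.18       56.81          16.81           37
Novelty (nv) [1]                  Travel   0.48       0.41        0.61           0.49            21
Novelty (nv) [1]                  Movie    0.44       0.36        0.55           0.80            42
Novelty (nv) [1]                  Music    0.55       0.39        0.52           0.37            38
Novelty (nv) [1]                  Book     0.77       0.34        0.76           0.33            33
Diversity (divU) [5]              Travel   3          -           3.5            -               24
Diversity (divU) [5]              Movie    4          -           4              -               43
Diversity (divU) [5]              Music    4          -           4              -               38
Diversity (divU) [5]              Book     3          -           3              -               37
Diversity (divC) [1]              Travel   0.68       0.18        0.72           0.21            19
Diversity (divC) [1]              Movie    0.70       0.24        0.73           0.23            41
Diversity (divC) [1]              Music    0.86       0.13        0.87           0.12            34
Diversity (divC) [1]              Book     0.84       0.24        0.95           0.63            29
Usefulness [5]                    Travel   3.33       0.93        3.50           0.77            24
Usefulness [5]                    Movie    3.32       0.85        3.31           0.86            43
Usefulness [5]                    Music    3.55       0.81        3.64           0.78            38
Usefulness [5]                    Book     3.39       0.83        3.62           1.21            37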

Table 15. Jaccard Indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35
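The Jaccard index used to compare the recommendation lists of two retrieval methods can be sketched as follows. This is a minimal illustration that treats each list as a set of item identifiers; the function name, the example IRIs and the handling of two empty lists (returning None) are assumptions made for this sketch and are not taken from the evaluation setup.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two recommendation lists.

    Both lists are treated as sets of item identifiers, so ranking
    and duplicates are ignored.
    """
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumption: the index is left undefined when both lists are empty.
        return None
    return len(set_a & set_b) / len(union)


# Hypothetical result lists of a simple and an expressive filter request.
simple = ["dbr:A", "dbr:B", "dbr:C"]
expressive = ["dbr:A", "dbr:B", "dbr:C", "dbr:D"]
print(jaccard_index(simple, expressive))  # 0.75
```

A high value, as in this example, corresponds to the case described above in which the expressive list is essentially an extended version of the simple list.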

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.
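As a small illustration of how a response rate of this kind can be derived, the sketch below counts the share of user sessions in which a request returned a non-empty recommendation list. The function name and the example sessions are hypothetical; the sketch only assumes that a response rate is the percentage of sessions with at least one suggestion.

```python
def response_rate(result_lists):
    """Percentage of sessions whose recommendation list is non-empty."""
    if not result_lists:
        raise ValueError("at least one session is required")
    answered = sum(1 for items in result_lists if items)
    return 100.0 * answered / len(result_lists)


# Hypothetical sessions: one of four requests returned no suggestion.
sessions = [["dbr:A"], [], ["dbr:B", "dbr:C"], ["dbr:D"]]
print(response_rate(sessions))  # 75.0
```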

Figure 9 depicts participants' general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Table 16. Response Rates (TC3)

Domain   Regular (%)   Aggr.-based (%)   N
Travel   99            100               103
Movie    96            92                50
Music    100           100               53
Book     96            100               51

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.

[Figure 9: bar chart "Perceived Usefulness (Regular vs. Roll-up Queries)"; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement; Method: Regular, Roll-up]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3).

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially impressive given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-empty result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).
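To illustrate what an FDR adjustment of several pairwise test results involves, the following sketch implements the Benjamini-Hochberg step-up procedure in plain Python. The text does not state which FDR procedure was applied, so the choice of Benjamini-Hochberg is an assumption, as are the function name and the example p-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the adjusted
    # values stay monotone (an adjusted value never exceeds the next larger one).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(significant)  # [True, False, False, False]
```

In this hypothetical example only the first comparison remains significant at the 0.05 level after adjustment, although three raw p-values lie below 0.05.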

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.
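The degrees of freedom reported above (N − 1 for each domain) are consistent with a within-subjects comparison. A paired t statistic of this kind can be sketched as follows; the function name and the list sizes are hypothetical, and the sign convention (regular minus cross-domain) is an assumption chosen so that larger cross-domain lists yield a negative t value.

```python
import math


def paired_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for a paired t-test on a - b."""
    if len(sample_a) != len(sample_b) or len(sample_a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Unbiased sample variance of the pairwise differences.
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var_diff == 0:
        raise ValueError("differences have zero variance")
    t = mean_diff / math.sqrt(var_diff / n)
    return t, n - 1


# Hypothetical list sizes per participant (regular SPARQL vs. cross-domain).
regular_sizes = [0, 1, 0, 2, 1]
cross_domain_sizes = [9, 10, 8, 10, 9]
t_value, dof = paired_t_statistic(regular_sizes, cross_domain_sizes)
print(t_value < 0, dof)  # True 4
```

The corresponding p-value would be read from a t distribution with the returned degrees of freedom, which is left out here to keep the sketch free of external dependencies.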

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results in more than the other half of the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular M   Regular SD   Aggr.-based M   Aggr.-based SD   N
Accuracy (size) [10]              Travel   9.90        0.99         9.42            2.07             103
Accuracy (size) [10]              Movie    2.88        0.59         2.76            0.82             50
Accuracy (size) [10]              Music    10.00       0.00         10.00           0.00             53
Accuracy (size) [10]              Book     2.88        0.59         3.00            0.00             51
Accuracy (mrs) [1]                Travel   48.21       24.98        44.93           25.59            102
Accuracy (mrs) [1]                Movie    54.58       22.88        56.69           25.42            44
Accuracy (mrs) [1]                Music    53.95       18.31        58.24           16.19            52
Accuracy (mrs) [1]                Book     51.53       23.90        49.25           22.30            49
Novelty (nv) [1]                  Travel   0.43        0.39         0.38            0.40             98
Novelty (nv) [1]                  Movie    0.35        0.38         0.28            0.41             41
Novelty (nv) [1]                  Music    0.49        0.36         0.46            0.39             50
Novelty (nv) [1]                  Book     0.61        0.42         0.62            0.44             47
Diversity (divU) [5]              Travel   3.5         -            3               -                100
Diversity (divU) [5]              Movie    4           -            3               -                43
Diversity (divU) [5]              Music    4           -            3               -                50
Diversity (divU) [5]              Book     3           -            3               -                48
Diversity (divC) [1]              Travel   0.79        0.17         0.89            0.13             99
Diversity (divC) [1]              Movie    0.75        0.20         0.66            0.27             44
Diversity (divC) [1]              Music    0.83        0.17         0.93            0.07             53
Diversity (divC) [1]              Book     0.74        0.27         0.93            0.06             48
Usefulness [5]                    Travel   3.36        0.77         3.22            0.74             101
Usefulness [5]                    Movie    3.20        0.92         3.26            0.94             44
Usefulness [5]                    Music    3.45        0.84         3.44            0.85             51
Usefulness [5]                    Book     2.95        0.87         3.03            0.96             48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL) (%)   SKOSRec (%)   N
Movie    32                     96            50
Music    62                     98            53
Book     53                     100           51

[Figure 10: bar chart "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items; Method: Regular, Cross-Domain]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4).

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular (SPARQL) M   Regular (SPARQL) SD   Cross-Domain M   Cross-Domain SD   N
Accuracy (size) [10]              Movie    0.82                 2.09                  8.66             2.73              50
Accuracy (size) [10]              Music    0.89                 1.59                  9.30             2.03              53
Accuracy (size) [10]              Book     2.90                 3.98                  10.00            0.00              51
Accuracy (mrs) [1]                Movie    67.41                27.62                 53.65            14.71             16
Accuracy (mrs) [1]                Music    62.36                31.76                 48.32            22.49             20
Accuracy (mrs) [1]                Book     66.42                21.28                 55.59            16.84             27
Novelty (nv) [1]                  Movie    0.59                 0.49                  0.41             0.34              16
Novelty (nv) [1]                  Music    0.72                 0.44                  0.59             0.40              19
Novelty (nv) [1]                  Book     0.54                 0.41                  0.63             0.30              27
Diversity (divU) [5]              Movie    3                    -                     4                -                 16
Diversity (divU) [5]              Music    3                    -                     3                -                 20
Diversity (divU) [5]              Book     3                    -                     4                -                 27
Diversity (divC) [1]              Movie    (0.64)               (0.35)                (0.99)           (0.02)            (6)
Diversity (divC) [1]              Music    (0.90)               (0.13)                (0.98)           (0.03)            (10)
Diversity (divC) [1]              Book     (0.84)               (0.12)                (0.87)           (0.1)             (8)
Usefulness [5]                    Movie    3.57                 0.83                  3.35             1.57              15
Usefulness [5]                    Music    3.53                 1.17                  3.11             0.98              20
Usefulness [5]                    Book     3.41                 1.04                  3.41             0.94              27

Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine, which facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a back-end application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.
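The default retrieval options suggested in this section can be read as an ordered chain of strategies. The sketch below shows one possible way an interface could step through them. It is not part of the SKOSRec engine: all function names are hypothetical stand-ins for calls to the engine, and an empty result list is used as a simple proxy for "no helpful recommendations", whereas in practice the user would decide when to switch to another query type.

```python
def retrieve_with_fallback(strategies, profile):
    """Run the strategies in order and return the first non-empty result.

    `strategies` is a list of (name, callable) pairs; each callable takes
    the user profile and returns a list of recommended resources.
    """
    for name, run_query in strategies:
        results = run_query(profile)
        if results:
            return name, results
    return None, []


# Hypothetical stand-ins for requests issued against the engine.
def on_the_fly(profile):
    return []


def expressive_filter(profile):
    return []


def roll_up(profile):
    return ["dbr:Some_Director"]


def cross_domain(profile):
    return ["dbr:Some_Album"]


STRATEGIES = [
    ("on-the-fly", on_the_fly),
    ("expressive filter", expressive_filter),
    ("roll-up", roll_up),
    ("cross-domain", cross_domain),
]

chosen, items = retrieve_with_fallback(STRATEGIES, ["dbr:They_Call_Me_Trinity"])
print(chosen, items)  # roll-up ['dbr:Some_Director']
```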

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is also specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRec query, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
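The Aggregation element described above summarizes the similarity scores of sublevel entities for a parent resource using SUM, MAX or AVG. A minimal sketch of this roll-up step is given below. It is an illustration only, not the engine's implementation: the function name, the dictionary-based data layout and the example identifiers (movies rolled up to directors) are assumptions made for this sketch.

```python
from collections import defaultdict

# The three aggregation modes named in the query syntax.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda scores: sum(scores) / len(scores),
}


def roll_up(scored_items, parents_of, mode="SUM", top_n=None):
    """Aggregate similarity scores of sublevel items per parent resource.

    scored_items: dict mapping an item to its similarity score.
    parents_of:   dict mapping an item to the parent resources it belongs to.
    Returns (parent, aggregated score) pairs, best first.
    """
    grouped = defaultdict(list)
    for item, score in scored_items.items():
        for parent in parents_of.get(item, ()):
            grouped[parent].append(score)
    aggregate = AGGREGATORS[mode]
    ranked = sorted(
        ((parent, aggregate(scores)) for parent, scores in grouped.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical example: movie scores rolled up to their directors.
scores = {"movie1": 0.8, "movie2": 0.6, "movie3": 0.9}
directors = {"movie1": ["director1"], "movie2": ["director1"], "movie3": ["director2"]}

print([parent for parent, _ in roll_up(scores, directors, "SUM")])
print(roll_up(scores, directors, "MAX", top_n=1))
```

With SUM, a parent with many moderately similar sublevel items can outrank a parent with a single highly similar item, whereas MAX favors the latter. This is why the choice of aggregation mode is left to the query author.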

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama, In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, article 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Discussion & Future Work
  • Appendix A. The SKOSRecommender Query Syntax
  • References

Fig. 1. Architecture and workflow of the SKOSRec engine

3.2. On-the-Fly Recommendations

A simple on-the-fly request is the most basic form of a SKOSRec query. It is based on the engine's ability to generate ad-hoc recommendations from a user profile containing preference statements for LOD resources. Listing 2 depicts an example SKOSRec on-the-fly request that obtains suggestions based on a user's preference for the western movie "They Call Me Trinity". In the given query, the preferred movie is represented by the DBpedia resource dbr:They_Call_Me_Trinity. The shown query also contains a prefix with a namespace declaration and a specification of the number of suggestions the engine should generate.

Listing 2. A simple recommendation query for the movie domain (Q1)

PREFIX dbr: <http://dbpedia.org/resource/>

RECOMMEND ?movie TOP 5
PREF dbr:They_Call_Me_Trinity

In this basic form, the query contains neither prefilter or postfilter conditions nor a preference query statement. Thus, the engine simply identifies all movies that are similar to the preference in the profile. When executed over DBpedia, the query returns the solution set that is depicted in Table 2.

Table 2
Result set for Q1

movie
dbr:God_Forgives’_I_Don’t
dbr:Ace_High_(1968_film)
dbr:Boot_Hill_(film)
dbr:Trinity_Is_Still_My_Name
dbr:Troublemakers_(1994_film)

For determining recommendations in an ad-hoc fashion, the system identifies SKOS annotations by matching a suitable property (i.e., an annotation property) in an RDF dataset (Definition 1).

Definition 1 (Annotation property). An annotation property is an IRI (Listing 3) that is defined in the Dublin Core Metadata Initiative (DCMI) specification for subject annotations. It can occur in conjunction with one of the two namespaces which are stated by the standard [15].

Listing 3. Annotation property

<ANNOTPROP> = dc:subject | dct:subject


These properties help to retrieve SKOS annotation triples for items preferred by a user (Definition 2).

Definition 2 (SKOS annotation triple). A SKOS annotation triple (st) is a triple that is comprised of an IRI in the subject position, an annotation property (<ANNOTPROP>) as predicate, and a concept c in the object position. The concept is part of a knowledge organization system S in SKOS format ($C = \{c \mid c \in S\}$) (Eq. 1).

$$st \in I \times \langle\text{ANNOTPROP}\rangle \times C \tag{1}$$

The identification of SKOS annotation triples facilitates a fast retrieval of potentially similar items. Therefore, the engine evaluates shared features between resources from the SKOS annotation dataset (Definition 3).

Definition 3 (SKOS annotation dataset). A SKOS annotation dataset (AD) is a subset of an RDF dataset that contains SKOS annotation triples (Eq. 2).

$$AD \subset I \times \langle\text{ANNOTPROP}\rangle \times C \tag{2}$$

Before the similarity calculation process starts, the engine determines SKOS annotations for each LOD resource (r) from the user profile by issuing a subsequent SPARQL query to the LOD repository from which the user intends to receive suggestions. The notion of SKOS annotations is specified in Definition 4.

Definition 4 (SKOS annotations). In the annotation dataset, LOD resources directly link to concepts of a SKOS system via a predefined annotation property. SKOS annotations for an input resource r are defined as in Equation 3.

$$\operatorname{Annot}(r) = \{c \in S \mid \exists\, (r, \langle\text{ANNOTPROP}\rangle, c) \in AD\} \tag{3}$$

Afterwards, the engine identifies which of the other resources in the dataset share annotations with it. Similarities only have to be computed for items with matching concepts [13, 52]. Triples can be joined efficiently along item features, since native RDF storage systems usually index triples rather than columns [41]. Thus, the quick identification of mutual annotations through SPARQL requests facilitates ad-hoc similarity calculation. Definition 5 specifies the notion of relevant resources and their annotations that are needed to identify similar items. The applied syntax follows Pérez et al. [42].

Definition 5 (Relevant resources and their annotations). The conjunctive graph pattern $P_r$ matches all items and subjects that are potentially relevant for recommendation retrieval (Eq. 4).

$$P_r = (r, \langle\text{ANNOTPROP}\rangle, c)\ \text{AND}\ (q, \langle\text{ANNOTPROP}\rangle, c) \tag{4}$$

The mapping $\Omega_r$ of relevant resources and their SKOS annotations is obtained by retrieving all resources q that share at least one SKOS concept c with resource r from AD (Eq. 5). The corresponding result set contains mappings for all variables (i.e., c and q) in the triple statements of $P_r$.

$$\Omega_r = [[P_r]]_{AD} \tag{5}$$
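As an illustration of Definitions 4 and 5, the following Python sketch evaluates the annotation lookup and the shared-concept pattern over a toy in-memory dataset. It is not the SKOSRec implementation, which issues SPARQL requests against a triple store. The `ex:` identifiers, the triples, and the function names `annot` and `relevant_mappings` are assumptions made for this example, as is the choice to drop the input resource r itself from the result.

```python
# Toy annotation dataset AD: (resource, annotation property, concept) triples.
# All identifiers are made up for illustration.
AD = {
    ("ex:MovieA", "dct:subject", "ex:Western"),
    ("ex:MovieA", "dct:subject", "ex:Comedy"),
    ("ex:MovieB", "dct:subject", "ex:Western"),
    ("ex:MovieC", "dct:subject", "ex:Comedy"),
    ("ex:MovieC", "dct:subject", "ex:Western"),
    ("ex:MovieD", "dct:subject", "ex:Drama"),
}
ANNOT_PROPS = {"dc:subject", "dct:subject"}


def annot(r, ad=AD):
    """Annot(r): concepts linked to r via an annotation property (Eq. 3)."""
    return {c for (s, p, c) in ad if s == r and p in ANNOT_PROPS}


def relevant_mappings(r, ad=AD):
    """Omega_r: one mapping per (q, c) where q shares concept c with r (Eqs. 4-5).

    Assumption: the input resource r itself is not returned as a candidate.
    """
    shared = annot(r, ad)
    return [
        {"q": s, "c": c}
        for (s, p, c) in sorted(ad)
        if p in ANNOT_PROPS and c in shared and s != r
    ]


omega = relevant_mappings("ex:MovieA")
candidates = {mu["q"] for mu in omega}  # ex:MovieB and ex:MovieC
```

Only resources that share at least one concept with the input resource appear in the mappings, so `ex:MovieD` never enters the similarity computation.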

Upon extraction of relevant resources and annotations, the process of similarity calculation can start. The engine determines the information content (IC) of the shared SKOS annotations of two resources. This approach is based on the hybrid similarity metric proposed by Meymandpour and Davis for LOD-enabled RS [37]. However, while their measure considers all types of IRI resources for the retrieval, our similarity metric relies purely on SKOS annotations (Definition 6).

Definition 6 (SKOS similarity). Let $\operatorname{Annot}(r)$ be the set of SKOS features of resource r and $\operatorname{Annot}(q)$ the set of SKOS features of resource q, with $q \in \{\mu(q) \mid \mu \in \Omega_r\}$. Then their similarity can be derived from the IC of their shared concepts $C_{\mathit{shared}} = \operatorname{Annot}(r) \cap \operatorname{Annot}(q)$.

$$\operatorname{sim}(r, q) = \operatorname{IC}(C_{\mathit{shared}}) \tag{6}$$

The IC of a set of SKOS concepts is measured by the sum of the negative logarithms of each concept's frequency in relation to the maximum frequency among relevant resources (Definition 7).

Definition 7 (SKOS Information Content). The SKOS IC is defined as an aggregation of the individual IC values of each concept that is contained in the set of shared features ($c \in C_{\mathit{shared}}$), where $\operatorname{freq}(c)$ is the frequency of c among all relevant resources and n is the maximum frequency among these resources. The final similarity score of two resources r and q is determined by summing the IC values of each concept of the shared feature set (Eq. 7).

$$\operatorname{IC}(C_{\mathit{shared}}) = -\sum_{c \in C_{\mathit{shared}}} \log\left(\frac{\operatorname{freq}(c)}{n}\right) \tag{7}$$
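A small numerical sketch may help to read Equations 6 and 7. The Python code below is an illustration under stated assumptions rather than the engine's code: it interprets freq(c) as the number of relevant resources annotated with c, takes n as the largest of these counts, and uses the natural logarithm. The data and the names `information_content` and `sim` are hypothetical.

```python
import math
from collections import Counter


def information_content(shared, annotations):
    """IC(C_shared) = -sum(log(freq(c) / n)) over the shared concepts (Eq. 7).

    annotations maps each relevant resource to its set of concepts.
    Assumption: freq(c) counts the relevant resources annotated with c,
    and n is the largest such count.
    """
    freq = Counter(c for concepts in annotations.values() for c in concepts)
    n = max(freq.values())
    return -sum(math.log(freq[c] / n) for c in shared)


def sim(r_concepts, q_concepts, annotations):
    """sim(r, q) = IC(Annot(r) intersected with Annot(q)) (Eq. 6)."""
    return information_content(r_concepts & q_concepts, annotations)


# Relevant resources and their annotations (made-up example data).
annotations = {
    "ex:MovieB": {"ex:Western", "ex:Comedy"},
    "ex:MovieC": {"ex:Western"},
    "ex:MovieD": {"ex:Western", "ex:Spaghetti"},
    "ex:MovieE": {"ex:Western", "ex:Spaghetti", "ex:Comedy"},
}
profile_item = {"ex:Western", "ex:Comedy", "ex:Spaghetti"}

scores = {q: sim(profile_item, concepts, annotations)
          for q, concepts in annotations.items()}
```

In this toy data the concept `ex:Western` occurs for every relevant resource, so it contributes an IC of zero, whereas each of the rarer concepts contributes log 2. A resource sharing only the most frequent concept therefore receives a similarity of zero.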

After having obtained resource annotations as well as IC values, similarity scores between the input resource r and each potentially relevant resource q can be calculated. When a user profile contains more than a single item, similarity values are aggregated through summation to determine the final recommendation score (Definition 8).

Definition 8 (Recommendation score). The recommendation score of a potentially relevant resource q is quantified by the sum of similarities with each resource r that can be found in the profile ($\mathit{Pr}$).

$$\operatorname{score}(\mathit{Pr}, q) = \sum_{r \in \mathit{Pr}} \operatorname{sim}(r, q) \tag{8}$$

Upon calculation of similarity values for each potentially relevant item q, the engine ranks the retrieved resources and returns the top k, where the limit k is specified by the user in the SKOSRec query.

With the presented methods of on-the-fly retrieval, recommendations can be generated from LOD repositories through an API without requiring local metadata or excessive preprocessing operations other than the mapping of profile items to LOD resources.⁴
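The summation of Definition 8 and the subsequent top-k selection can be sketched as follows. This is a minimal Python illustration with made-up similarity values. The function name `recommend`, the dictionary layout, and the alphabetical tie-breaking are assumptions and are not taken from the SKOSRec engine.

```python
import heapq


def recommend(profile, sim_scores, k):
    """Rank candidates by score(Pr, q) = sum of sim(r, q) over r in Pr (Eq. 8).

    sim_scores maps (r, q) pairs to precomputed similarity values.
    Returns the k best candidates that are not part of the profile.
    """
    totals = {}
    for (r, q), value in sim_scores.items():
        if r in profile and q not in profile:
            totals[q] = totals.get(q, 0.0) + value
    # Assumption: ties are broken alphabetically to keep the output deterministic.
    return heapq.nsmallest(k, totals.items(), key=lambda item: (-item[1], item[0]))


profile = {"ex:A", "ex:B"}
sim_scores = {
    ("ex:A", "ex:X"): 1.0,
    ("ex:B", "ex:X"): 0.5,
    ("ex:A", "ex:Y"): 1.2,
    ("ex:B", "ex:Z"): 0.3,
    ("ex:A", "ex:B"): 2.0,  # ignored: ex:B is itself a profile item
}
top2 = recommend(profile, sim_scores, k=2)
```

Here `ex:X` is ranked first because it is similar to both profile items, even though `ex:Y` has the highest single similarity value.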

3.3. Preference Querying

In the previous subsection, it was assumed that users express their tastes as preference statements for certain items, each given as an IRI referring to a LOD resource. However, it may also be the case that likings are only vaguely known, for instance, when users prefer products with specific features but are not able or willing to specify the actual items. The engine obtains such a profile by issuing a subsequent SPARQL query that contains the preference graph pattern ($P_{\mathit{pref}}$) in the WHERE condition. Suppose a user is interested in Quentin Tarantino movies (dbr:Quentin_Tarantino) but does not specify the precise items he likes. He could send a preference query to the SKOSRec engine. The graph pattern in the request would match all movies that are connected to dbr:Quentin_Tarantino via the properties dbo:director or dbo:producer (see Listing 4).

⁴ Di Noia et al. describe how a mapping procedure can be conducted for a real-world implementation. For their LDRS, they matched movie titles with the corresponding LOD resources in DBpedia. The authors extracted IRIs, labels, and publication dates for all movie resources. They applied the Levenshtein distance to these resources to identify matching movie items [13].

Listing 4. SKOSRec query to generate recommendations based on a preference query in the movie domain (Q2)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>

RECOMMEND ?movie TOP 5
PREF [ ?prefMovie
WHERE { { ?prefMovie dbo:director dbr:Quentin_Tarantino . } UNION
        { ?prefMovie dbo:producer dbr:Quentin_Tarantino . } } ]

Table 3 shows the corresponding solution set for the preference query. It contains films from DBpedia that are similar to Quentin Tarantino movies.

Table 3
Result set for Q2

movie
dbr:The_Godfather_Part_II
dbr:The_Prestige_(film)
dbr:The_Dark_Knight
dbr:Planet_Terror
dbr:Clerks

The notion of a queried profile is specified in Definition 9.

Definition 9 (Queried profile). A queried profile is a user profile that has been compiled by retrieving all LOD resources r that are part of an annotation graph AD and are contained in the solution mappings resulting from the evaluation of a specified graph pattern ($P_{\mathit{pref}}$) over a dataset D (Eq. 9).

$$\mathit{Pr} = \{\mu(r) \mid \mu \in [[P_{\mathit{pref}}]]_D,\ r \in \operatorname{dom}(\mu),\ r \in AD\} \tag{9}$$
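The construction of a queried profile can be mimicked with a few lines of Python. The sketch below is a simplification under stated assumptions: it supports only single triple patterns combined by UNION, works on an in-memory list of triples instead of a SPARQL endpoint, and represents the annotation dataset by the set of resources that carry annotations. The names `match` and `queried_profile` and all `ex:` identifiers are hypothetical.

```python
def match(pattern, triples):
    """Evaluate one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        mu = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if mu.get(term, value) != value:
                    break  # same variable already bound to another value
                mu[term] = value
            elif term != value:
                break  # constant term does not match
        else:
            results.append(mu)
    return results


def queried_profile(patterns, var, dataset, annotated):
    """Pr = { mu(var) | mu in [[P_pref]]_D, var in dom(mu), mu(var) in AD } (Eq. 9).

    patterns are alternative triple patterns combined by UNION.
    annotated is the set of resources that occur as subjects in AD.
    """
    mappings = [mu for p in patterns for mu in match(p, dataset)]
    return {mu[var] for mu in mappings if var in mu and mu[var] in annotated}


D = [
    ("ex:Film1", "dbo:director", "ex:DirectorX"),
    ("ex:Film2", "dbo:producer", "ex:DirectorX"),
    ("ex:Film3", "dbo:director", "ex:DirectorY"),
    ("ex:Film4", "dbo:director", "ex:DirectorX"),
]
annotated = {"ex:Film1", "ex:Film2", "ex:Film3"}  # ex:Film4 has no SKOS annotations
p_pref = [
    ("?prefMovie", "dbo:director", "ex:DirectorX"),
    ("?prefMovie", "dbo:producer", "ex:DirectorX"),
]
profile = queried_profile(p_pref, "?prefMovie", D, annotated)
```

The resulting profile contains `ex:Film1` and `ex:Film2`. `ex:Film4` matches the pattern but is dropped because it is not part of the annotation dataset and thus could not contribute to similarity calculation.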


3.4. Prefiltering

To facilitate exploratory search in LOD repositories, users should also have the option to formulate conditions in their queries. One way to enable filters during retrieval is to apply them before similarity calculation. Thus, recommendation lists can be adjusted to individual needs.

Due to the expressiveness of RDF data, filter conditions can not only be applied to attributes that are directly connected to potentially relevant items but can also be declared as a graph pattern. By this means, it is possible to retrieve LOD resources that are both similar to the user profile and fulfill an advanced condition.

The strengths of the approach will be illustrated with a travel recommendation request that was executed over DBpedia. Suppose a customer expressed a preference for the Lake Baikal region (e.g., as indicated by a positive rating on a travel community site). With simple on-the-fly retrieval, he would receive a recommendation list that primarily consists of travel destinations located in the proximity of this geographical area. A graph-based filter, on the other hand, facilitates advanced retrieval options. In the given example, it enables the user to obtain suggestions for Southeast Asian travel destinations that have features similar to those of the Lake Baikal region. This works even though the resource dbr:Lake_Baikal is not directly connected to any destination with the annotation dbc:Southeast_Asia; such destinations can only be identified by exploring the graph structure surrounding the filter attribute.

The advanced constraint ensures that the generated result set is not empty. A simple attribute-level filter, which is commonly utilized in faceted search systems, would not generate any search hits. By these means, graph-based filter patterns can help users to better explore knowledge graphs, thereby overcoming data quality issues in LOD repositories. The corresponding SKOSRec query for this example is shown in Listing 5.

Listing 5. SKOSRec query with an expressive prefilter condition to retrieve Southeast Asian destinations based on a preference for the Lake Baikal region (Q3)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

RECOMMEND ?destination TOP 5
PREF dbr:Lake_Baikal
[ WHERE { ?destination dbo:country ?country .
    ?country dct:subject ?countrySubject .
    ?countrySubject skos:broader dbc:Southeast_Asia .
    ?destination rdf:type dbo:Place . }
]

Table 4 shows the solution to this query. It contains travel destinations and regions of interest that are both similar to the previously liked destination dbr:Lake_Baikal and located in Southeast Asia. Ecoregions around lakes, rivers, or mountains from Southeast Asian countries such as Cambodia, Vietnam, or Malaysia are presented to the user.

Table 4
Result set of Q3

destination
dbr:Tonle_Sap
dbr:Mekong
dbr:Mount_Kinabalu
dbr:Laguna_de_Bay
dbr:Lake_Toba

The technical details for this type of query are given in the following definitions. Before the extraction of relevant resources and annotations, the recommendation engine identifies the resources that satisfy the filter (Definition 10).

Definition 10 (Prefiltered resources). The mapping of filtered resources $\Omega_{\mathit{prefilter}}$ is obtained by matching a specified graph pattern (P) to the RDF graph of a dataset D (Eq. 10).

$$\Omega_{\mathit{prefilter}} = \{\mu_{|q} \mid \mu \in [[P]]_D\} \tag{10}$$

After preselection, resources are brought together with user profile data to retrieve the final set of recommendations (Definition 11).

Definition 11 (Annotations and relevant resources (prefiltered retrieval)). The solution mappings of annotations and relevant resources in prefiltered retrieval mode are obtained by joining the set of prefiltered resources with the mappings $\Omega_r$ of all relevant resources and annotations as determined from the user profile.

$$\Omega_{\mathit{pre}} = \Omega_{\mathit{prefilter}} \bowtie \Omega_r \tag{11}$$

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and the respective SKOS similarities are quantified according to the user filter (Definitions 12 and 13). By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources.

Definition 12 (Conditional SKOS similarity). Let $\operatorname{Annot}(r)$ be the set of SKOS features of r and $\operatorname{Annot}(q)$ the set of SKOS features of resource q, which is contained in the filtered set of annotations and relevant resources ($q \in \{\mu(q) \mid \mu \in \Omega_{\mathit{pre}}\}$). Then the similarity can be derived from the IC of their shared concepts ($C_{\mathit{cond}} = \operatorname{Annot}(r) \cap \operatorname{Annot}(q)$) (Eq. 12).

$$\operatorname{sim}_{\mathit{cond}}(r, q) = \operatorname{IC}(C_{\mathit{cond}}) \tag{12}$$

Definition 13 (Conditional SKOS Information Content). The conditional IC of a set of SKOS annotations is defined by the sum of the IC of each concept $c \in \{\mu(c) \mid \mu \in \Omega_{\mathit{pre}}\}$, where $\operatorname{freq}_{\mathit{cond}}(c)$ is the frequency of c among all filtered resources and n is the maximum frequency among these resources (Eq. 13).

$$\operatorname{IC}(C_{\mathit{cond}}) = -\sum_{c \in C_{\mathit{cond}}} \log\left(\frac{\operatorname{freq}_{\mathit{cond}}(c)}{n}\right) \tag{13}$$
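The effect of the prefilter on the IC values can be illustrated with a short Python sketch. It joins a set of prefiltered resources with the mappings of relevant resources and then counts concept frequencies only over the joined result. The code is an assumption-laden illustration: mappings are plain dictionaries, the natural logarithm is used, and the names `join` and `conditional_ic` as well as the `ex:` data are made up.

```python
import math
from collections import Counter


def join(left, right):
    """Natural join of two lists of solution mappings (dicts)."""
    joined = []
    for mu1 in left:
        for mu2 in right:
            # Two mappings are compatible if they agree on all shared variables.
            if all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys()):
                joined.append({**mu1, **mu2})
    return joined


def conditional_ic(shared, omega_pre):
    """IC(C_cond) with frequencies counted over the filtered mappings only (Eq. 13)."""
    freq = Counter(mu["c"] for mu in omega_pre)
    n = max(freq.values())
    return -sum(math.log(freq[c] / n) for c in shared)


# Omega_r: resources q sharing a concept c with the profile item (made-up data).
omega_r = [
    {"q": "ex:LakeA", "c": "ex:Lakes"},
    {"q": "ex:LakeA", "c": "ex:Ecoregions"},
    {"q": "ex:RiverB", "c": "ex:Ecoregions"},
    {"q": "ex:LakeC", "c": "ex:Lakes"},
    {"q": "ex:LakeC", "c": "ex:Ecoregions"},
]
# Omega_prefilter: resources that satisfy the user's graph pattern.
omega_prefilter = [{"q": "ex:LakeA"}, {"q": "ex:RiverB"}]

omega_pre = join(omega_prefilter, omega_r)  # Eq. 11
sim_cond = conditional_ic({"ex:Lakes", "ex:Ecoregions"}, omega_pre)
```

After filtering, `ex:Lakes` occurs once and `ex:Ecoregions` twice, so the conditional similarity between the profile item and `ex:LakeA` equals log 2. Without the filter, the frequencies would be 2 and 3 and the same pair would obtain a different score, which is why the ranking is recomputed for the restricted set.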

The system carries out the calculations (i.e., the determination of similarity values and the ranking of LOD resources) in the same way as in non-filtering mode.

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources; it can also apply user constraints after similar items have been determined. By this means, recommendations are filtered ex post, i.e., after the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5
Result set of Q4

director              movie
dbr:Kirk_Douglas      dbr:Scalawag_(film)
dbr:Kirk_Douglas      dbr:Posse_(1975_film)
dbr:Kevin_Costner     dbr:The_Postman_(film)
dbr:Kevin_Costner     dbr:Dances_with_Wolves
...
dbr:Gene_Kelly        dbr:Hello_Dolly_(film)
dbr:Gene_Kelly        dbr:Invitation_to_the_Dance_(film)
...
dbr:Alan_Alda         dbr:Goodbye_Farewell_and_Amen
dbr:Alan_Alda         dbr:Margaret’s_Engagement
...
dbr:Steve_Buscemi     dbr:Interview_(2007_film)
dbr:Steve_Buscemi     dbr:Animal_Factory

Listing 6. SKOSRec query to generate simple postfiltered recommendations in the movie domain (Q4)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE { ?movie dbo:director ?director .
    ?director rdf:type yago:Director110014939 . } LIMIT 50
RECOMMEND ?director TOP 5
PREF dbr:Quentin_Tarantino

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.

Another weakness is the biased semantics of the query. Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality, or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies he or she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to several LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).

Aggregation-based requests can be combined well with other retrieval patterns, such as preference queries. In the case of the movie recommendation scenario, a user could state that he enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors that shot similar films. Typically, this kind of question can only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate postfiltered recommendations based on a preference query in the movie domain (Q5)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE { ?movie dbo:director ?director .
    ?director rdf:type yago:Director110014939 . } LIMIT 50
AGG dbr:Quentin_Tarantino ?director SUM
RECOMMEND ?movie TOP 100
PREF [ ?prefMovie
WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino . } ]

Table 6
Result set of Q5

director                     movie
dbr:Steven_Spielberg         dbr:Indiana_Jones
dbr:Steven_Spielberg         dbr:Jurassic_Park_(film)
...
dbr:Francis_Ford_Coppola     dbr:The_Godfather_Part_II
dbr:Francis_Ford_Coppola     dbr:The_Godfather_Part_III
...
dbr:Christopher_Nolan        dbr:The_Prestige_(film)
dbr:Christopher_Nolan        dbr:The_Dark_Knight
...
dbr:Robert_Rodriguez         dbr:Planet_Terror
dbr:Robert_Rodriguez         dbr:Machete_(film)
...
dbr:Kevin_Smith              dbr:Clerks
dbr:Kevin_Smith              dbr:Clerks_II

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they differ completely. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style⁵), Q4 shows US American actor-directors whose movies range from musical films (dbr:Hello_Dolly_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Tab. 5) seem rather arbitrary and not particularly helpful concerning the initial user request.

The combined aggregation-based query can also be described as a roll-up request, since suggestions are based on preferences for sublevel entities. Another example of a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

⁵ https://en.wikipedia.org/wiki/Quentin_Tarantino

Listing 8. SKOSRec query to generate postfiltered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dul: <http://www.ontologydesignpatterns.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?city
WHERE { ?poi ?property ?city .
    ?city rdf:type dbo:City .
    ?property rdfs:subPropertyOf dul:hasLocation . } LIMIT 3
AGG dbr:London ?city SUM
RECOMMEND ?poi TOP 100
PREF
dbr:Oxford_Street
dbr:Notting_Hill
dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7
Result set of Q6

city
dbr:Oxford
dbr:Abingdon-on-Thames
dbr:Wallingford_Oxford

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere rather than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.

Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?participated ?musicalAct
WHERE { ?movie rdf:type dbo:Film .
    VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
    { ?musicalAct ?participated ?movie . } UNION
    { ?movie ?participated ?musicalAct . } } LIMIT 10
AGG dbr:The_Jackson_5 ?movie MAX
RECOMMEND ?musicalAct TOP 100
PREF dbr:The_Jackson_5
WHERE { VALUES ?actType { dbo:MusicalArtist dbo:Band }
    ?musicalAct rdf:type ?actType . }

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are aggregated with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tabs. 8 and 9). The SKOSRec query produces a more comprehensive solution than a regular SPARQL query.

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?musicalAct
WHERE { ?movie rdf:type dbo:Film .
    VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
    { dbr:The_Jackson_5 ?participated ?movie . } UNION
    { ?movie ?participated dbr:The_Jackson_5 . } } LIMIT 5

Table 8
Result set of Q7

movie                           musicalAct
dbr:Moonwalker                  dbr:Michael_Jackson
...
dbr:The_Wiz_(film)              dbr:Ashford_&_Simpson
dbr:Garfield_Gets_a_Life        dbr:The_Temptations
dbr:Pipe_Dreams_(1976_film)     dbr:Gladys_Knight_&_the_Pips
dbr:The_Jacksons                dbr:Jermaine_Jackson

Table 9
Result set of Q8

movie               musicalAct
dbr:The_Jacksons    dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 14 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 14 (SKOSRec recommendations). SKOSRec recommendations ($R(\mathit{Pr})$) are the set of suggestions generated by processing the solution mappings ($\Omega$) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, of prefiltering, or of a combination of these techniques. $\mathit{Pr}$ represents the input profile that contains at least one preference for an item. The cardinality of $R(\mathit{Pr})$ is the intended number of recommendations (k) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 14).

$$R(\mathit{Pr}) = \{x \mid x \in \{q \mid \operatorname{score}(\mathit{Pr}, q) > \underset{q \in \Omega}{k\max}\ \operatorname{score}(\mathit{Pr}, q),\ q \notin \mathit{Pr}\}\} \tag{14}$$

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions $R(\mathit{Pr})$ can be joined with any graph pattern $P_{\mathit{post}}$ subsequent to the process of similarity calculation (Definition 15). It does not make a difference how the engine obtained these recommendations.

Definition 15 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern ($\Omega_{\mathit{post}}$) (Eqs. 15 and 16).

$$\Omega_{\mathit{postfilter}} = \{\mu \mid \mu \in [[P_{\mathit{post}}]]_D,\ x \in \operatorname{dom}(\mu)\} \tag{15}$$

$$\Omega_{\mathit{post}} = \{\mu_{|x} \mid x \in R(\mathit{Pr})\} \bowtie \Omega_{\mathit{postfilter}} \tag{16}$$

Like the prefilter, $P_{\mathit{post}}$ can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item (a) and similar sublevel entities. Additionally, the user has to specify the variable in his profile (e.g., y) that represents the items within the postfilter graph pattern $P_{\mathit{post}}$. By this means, aggregation-based recommendations can be generated (Definition 16).

Definition 16 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item a in the profile (Eq. 17).

$$R_{\mathit{agg}}(a) = \{\mu(y) \mid \mu \in \Omega_{\mathit{post}},\ y \in \operatorname{dom}(\mu),\ a \notin \mu(y)\} \tag{17}$$

For each potential aggregation-based suggestion (y), similarity scores of connected entities (x) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 17 (Aggregation-based ranking score). The ranking score of an LOD resource y that is connected to a set of recommended items is determined by aggregating scores of connected entities, either by the maximum (Eq. 18), the sum (Eq. 19), or the average (Eq. 20) of individual scores.

$$\operatorname{score}_{\mathit{aggMAX}}(y) = \max_{x \in \mu(x, y)} \operatorname{score}(\mathit{Pr}, x) \tag{18}$$

$$\operatorname{score}_{\mathit{aggSUM}}(y) = \sum_{x \in \mu(x, y)} \operatorname{score}(\mathit{Pr}, x) \tag{19}$$

$$\operatorname{score}_{\mathit{aggAVG}}(y) = \frac{\sum_{x \in \mu(x, y)} \operatorname{score}(\mathit{Pr}, x)}{|x \in \mu(x, y)|} \tag{20}$$
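The three aggregation modes can be compared on a toy example. The Python sketch below is an illustration under assumptions, not the engine's code: postfiltered mappings are dictionaries with the keys "x" (recommended sublevel item) and "y" (superordinate resource), the exclusion of the superordinate profile item a is implemented by dropping all mappings with y equal to a, and the function name `aggregate_scores` and all identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

AGGREGATORS = {"MAX": max, "SUM": sum, "AVG": mean}


def aggregate_scores(omega_post, scores, excluded, how="SUM"):
    """Rank superordinate resources y by aggregating scores of linked items x.

    omega_post: postfiltered mappings, each with the keys "x" and "y".
    scores:     score(Pr, x) for every recommended item x.
    excluded:   superordinate item a from the AGG clause (dropped, cf. Eq. 17).
    """
    linked = defaultdict(list)
    for mu in omega_post:
        if mu["y"] != excluded and mu["x"] in scores:
            linked[mu["y"]].append(scores[mu["x"]])
    aggregate = AGGREGATORS[how]
    ranking = {y: aggregate(values) for y, values in linked.items()}
    # Highest score first; ties are broken alphabetically (an assumption).
    return sorted(ranking.items(), key=lambda item: (-item[1], item[0]))


scores = {"ex:Film1": 3.0, "ex:Film2": 1.0, "ex:Film3": 2.5, "ex:Film4": 4.0}
omega_post = [
    {"x": "ex:Film1", "y": "ex:DirectorA"},
    {"x": "ex:Film2", "y": "ex:DirectorA"},
    {"x": "ex:Film3", "y": "ex:DirectorB"},
    {"x": "ex:Film4", "y": "ex:DirectorT"},  # linked to the preferred director
]

by_sum = aggregate_scores(omega_post, scores, "ex:DirectorT", how="SUM")
by_max = aggregate_scores(omega_post, scores, "ex:DirectorT", how="MAX")
by_avg = aggregate_scores(omega_post, scores, "ex:DirectorT", how="AVG")
```

With sum-based and maximum-based aggregation, `ex:DirectorA` is ranked first, whereas the average favors `ex:DirectorB`, whose single film is a better match than the mean of the two films by `ex:DirectorA`. This illustrates why the choice of aggregation function should follow the granularity considerations described above.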

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach more participants with diverse demographic backgrounds than would have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology.⁶ ⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users went through the web interface step by step and provided their feedback. Because of their insights, we slightly modified the interface (e.g., by rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

6 http://tomcat.apache.org
7 http://www.oracle.com/technetwork/java/javaee/servlet/index.html

Subjects were recruited through the clickworker.com platform. 8 Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not merely clicked through the evaluation screen. Whenever the log files showed irregularities (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form, which informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment, which collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology. 9

It enabled users to type in keywords (e.g., the name of a music act or the author of a book) for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index. 10 The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

Aside from the AJAX feature, the appeal of the interface

8 https://www.clickworker.de
9 http://api.jquery.com/jquery.ajax
10 http://lucene.apache.org/solr


Fig. 2. User profile generation, TC1 (music domain)

was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. By showing how close participants are to completing the experiment, progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels, or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance; outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale would allow. The score, called mean relevance (mrs), was measured with Equation 21.

$$mrs = \frac{\sum_{i=1}^{k} \mathrm{score}_i}{k} \qquad (21)$$
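As a minimal sketch of Eq. 21, the mean relevance score of one recommendation list could be computed as follows. The function name and the validation of the 0 to 100 slider range are illustrative assumptions, not part of the original web application.

```python
def mean_relevance_score(slider_scores):
    """Mean relevance score (mrs) of a recommendation list of size k (Eq. 21).

    slider_scores: one relevance-slider value (0 to 100) per recommended item.
    """
    if not slider_scores:
        raise ValueError("the recommendation list must not be empty")
    if any(not 0 <= s <= 100 for s in slider_scores):
        raise ValueError("slider scores are expected on a 0 to 100 scale")
    return sum(slider_scores) / len(slider_scores)


# Example: a list of three recommendations rated by one participant
print(mean_relevance_score([80, 40, 60]))
```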

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator of recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, if an RS only suggests popular products, it is likely that the user already knows them [3, 23]. A purely accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 22).

$$nv = \frac{|nov|}{|tp|} \qquad (22)$$
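The novelty ratio of Eq. 22 can be sketched in a few lines of Python. The relevance threshold of 50 is taken from the text; the input format (pairs of slider score and "new to me" flag) and the decision to return `None` for lists without relevant items are assumptions for illustration.

```python
def novelty(ratings, threshold=50):
    """Novelty score nv = |nov| / |tp| for one recommendation list (Eq. 22).

    ratings: list of (relevance_score, is_new) pairs, where is_new reflects
    the radio button ticked by the participant (hypothetical input format).
    Items with a relevance score of at least `threshold` count as relevant.
    """
    relevant_flags = [is_new for score, is_new in ratings if score >= threshold]
    if not relevant_flags:
        return None  # nv is undefined without relevant items (assumption)
    return sum(relevant_flags) / len(relevant_flags)


# Example: three relevant items, two of which are new to the user
ratings = [(80, True), (60, False), (30, True), (50, True)]
print(novelty(ratings))
```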

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

An item was considered relevant when it achieved a relevance score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 65]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more alike the items are, the less diverse the result set is [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 23) and averaged over the recommendation list (Eq. 24).

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \qquad (23)$$

$$div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \qquad (24)$$
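Eqs. 23 and 24 can be sketched as follows. The representation of each item as a sparse feature vector (a dict mapping annotation identifiers to weights) is an assumption for illustration; the paper does not specify the data structure used in the backend.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {feature: weight} dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector shares nothing with any other (assumption)
    return dot / (norm_a * norm_b)


def content_diversity(items):
    """Average pairwise distance 1 - cos over all item pairs l < m (Eqs. 23, 24)."""
    k = len(items)
    if k < 2:
        raise ValueError("diversity needs at least two items")
    total = sum(1.0 - cosine(a, b) for a, b in combinations(items, 2))
    return 2.0 * total / (k * (k - 1))


# Example with hypothetical annotation vectors: two identical items, one unrelated
items = [{"cat:A": 1.0}, {"cat:A": 1.0}, {"cat:B": 1.0}]
print(content_diversity(items))
```

Since `combinations` enumerates each unordered pair exactly once, the factor 2/(k(k-1)) turns the sum into the mean distance over all k(k-1)/2 pairs.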

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each recommendation list that the SKOSRec engine generated at runtime. In addition to measuring divC scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al., two Likert items were posed for this purpose [44]:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al., which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions [44]. The statements were adapted to the specificities of each usage scenario. Figure 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparison of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting can cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Figure 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music, and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain, where the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information, or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located. 11

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city, or dbo:state) while still applying the specified user constraint to the set of LOD resources. Upon execution of constraint-based recommendation requests, users

11 http://dbpedia.org


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement "The filter has improved the recommendation results of list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar rollup query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination (i.e., a city, region, or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants specified either their favorite director/actor (movie domain), music act (music domain), or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). The SKOSRec engine thus generated recommendations based on the creative works of the favored artist. Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with those from the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domains, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act, or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.

Fig. 5. Web form for a constraint-based recommendation request (music domain)

Fig. 6. Travel - User profile generation, TC3


Fig. 7. Music - User profile generation, TC4 (page 9)

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states his/her favorite movie and that movie was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose that a consumer has declared a preference in his/her profile for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (e.g., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The samples gathered from the experiments contained answers from both genders, with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceive the search in their domain of interest as either "manageable" (38.5%) or even as "easy" or "very easy" (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
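For readers unfamiliar with the reliability check, the following minimal sketch computes Cronbach's α with the standard formula α = k/(k-1) · (1 - Σ item variances / variance of the summed scale). The function name and the input layout (one list of responses per Likert item) are assumptions for illustration; the paper does not state which software was used.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for k Likert items answered by the same n respondents.

    item_scores: list of k lists, each holding the n responses to one item
    (hypothetical layout). Sample variances are used throughout.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(item_scores[0])
    if n < 2 or any(len(col) != n for col in item_scores):
        raise ValueError("each item needs the same number (>= 2) of responses")
    item_variance_sum = sum(variance(col) for col in item_scores)
    totals = [sum(values) for values in zip(*item_scores)]  # per-respondent sums
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("alpha is undefined when the total scores do not vary")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Example with made-up responses of four participants to two items
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
```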

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. To make sure that valuable user assessments did not remain unused, agreements with the statement "The recommendations of list [] are diverse" were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M), 12 standard deviation (SD), and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance to be better than mediocre in almost every quality dimension. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty, and Diversity received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the

12 In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.

20 L Wenige et al Similarity-based Graph Queries

Table 10. Test cases in the web experiments (X = conducted, - = not conducted)

Test Case   Evaluation Purpose                          Query Type   Travel   Movie   Music   Book
TC1         On-the-fly Recommendations (Baseline)       Q1           X        X       X       X
TC2         Constraint-based Recommendations            Q3           X        X       X       X
TC3         Advanced Queries (Rollup)                   Q5, Q6       X        X       X       X
TC4         Advanced Queries (Cross-Domain)             Q7           -        X       X       X

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain   Cronbach's α
Travel   0.7002
Movie    0.8579
Music    0.8515
Book     0.8949

baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse was 3 ("neutral") in the multimedia domains and 4 ("agree") in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement "The filter has improved the recommendation results of the list". Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 ("strongly agree"), 4 ("agree"), or 3 ("neutral"). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists the response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   M       SD      N
Accuracy (size) [10]             Travel   10.00   0.00    103
                                 Movie    10.00   0.00    50
                                 Music    10.00   0.00    53
                                 Book     9.80    1.40    51
Accuracy (mrs) [1]               Travel   56.49   0.00    103
                                 Movie    62.25   19.15   50
                                 Music    55.70   0.10    53
                                 Book     57.47   15.55   50
Novelty (nv) [1]                 Travel   0.60    0.34    101
                                 Movie    0.36    0.31    50
                                 Music    0.36    0.39    52
                                 Book     0.61    0.29    47
Diversity (divU) [5]             Travel   4       -       102
                                 Movie    3       -       50
                                 Music    3       -       53
                                 Book     3       -       50
Diversity (divC) [1]             Travel   0.80    0.16    103
                                 Movie    0.87    0.12    50
                                 Music    0.86    0.10    50
                                 Book     0.92    0.08    50
Usefulness [5]                   Travel   3.34    0.85    103
                                 Movie    3.55    0.73    50
                                 Music    3.18    0.85    53
                                 Book     3.43    0.81    50

[Figure 8: bar chart titled "The filter has improved the recommendations of the list". x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5).]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

Table 13. Response Rates (TC2)

Domain   Constr.-based (Regular) [%]   Constr.-based (Expanded) [%]   N
Travel   23                            57                             103
Movie    86                            88                             50
Music    72                            75                             53
Book     73                            76                             51

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
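The text does not name the specific FDR procedure. As an illustration only, the following sketch assumes the widely used Benjamini-Hochberg step-up procedure; the function name, the default α of 0.05, and the example p-values are hypothetical and not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per test: True if the null hypothesis is rejected.

    Benjamini-Hochberg step-up procedure (assumed here, not confirmed by the
    source): find the largest rank i with p_(i) <= (i / m) * alpha and reject
    all hypotheses with rank <= i.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Example with made-up p-values from four tests
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.2]))
```

Compared with a Bonferroni correction, this procedure controls the expected share of false discoveries rather than the probability of any single false positive, which keeps more statistical power when many metrics and domains are tested.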

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = -7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because the statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify recommendation lists and increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of the recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
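The Jaccard index used to compare the two result lists is the size of their intersection divided by the size of their union. A minimal sketch follows; the function name, the item identifiers, and the handling of two empty lists are illustrative assumptions.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two recommendation lists."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        return None  # undefined for two empty lists (assumption)
    return len(a & b) / len(a | b)


# Example with hypothetical item identifiers: the expanded list keeps two of
# the three items from the simple list and adds one new item
simple_list = ["item1", "item2", "item3"]
expanded_list = ["item2", "item3", "item4"]
print(jaccard_index(simple_list, expanded_list))
```

A value close to 1 means that the expanded filter mostly reproduces the simple list, which matches the interpretation given above; a value close to 0 means the two strategies surface largely different items.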


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

                                          Simple           Expressive
Dimension (Metric) [Max Score]   Domain   M       SD       M       SD      N
Accuracy (size) [10]             Travel   1.17    2.70     4.52    4.62    103
                                 Movie    6.72    4.08     7.20    3.92    50
                                 Music    5.96    4.64     6.36    4.51    53
                                 Book     4.59    4.50     5.00    4.51    51
Accuracy (mrs) [1]               Travel   66.73   20.51    65.34   22.36   23
                                 Movie    60.95   22.98    60.29   22.79   43
                                 Music    63.77   16.05    62.96   16.44   34
                                 Book     56.17   18.18    56.81   16.81   37
Novelty (nv) [1]                 Travel   0.48    0.41     0.61    0.49    21
                                 Movie    0.44    0.36     0.55    0.80    42
                                 Music    0.55    0.39     0.52    0.37    38
                                 Book     0.77    0.34     0.76    0.33    33
Diversity (divU) [5]             Travel   3       -        3.5     -       24
                                 Movie    4       -        4       -       43
                                 Music    4       -        4       -       38
                                 Book     3       -        3       -       37
Diversity (divC) [1]             Travel   0.68    0.18     0.72    0.21    19
                                 Movie    0.70    0.24     0.73    0.23    41
                                 Music    0.86    0.13     0.87    0.12    34
                                 Book     0.84    0.24     0.95    0.63    29
Usefulness [5]                   Travel   3.33    0.93     3.50    0.77    24
                                 Movie    3.32    0.85     3.31    0.86    43
                                 Music    3.55    0.81     3.64    0.78    38
                                 Book     3.39    0.83     3.62    1.21    37

Table 15. Jaccard Indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35

In TC3 (i.e., the evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiments, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Figure 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels that lay slightly above neutral

Table 16. Response Rates (TC3)

Domain   Regular [%]   Aggr.-based [%]   N
Travel   99            100               103
Movie    96            92                50
Music    100           100               53
Book     96            100               51

agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness differed between experiments. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric withsystematic differences Here scores went up for advancedretrieval patterns in the travel (t(98) = minus550 p lt 0001)the music (t(52) = minus451 p lt 0001) and the book domain(t(47) = minus490 p lt 0001) (see Table 17) This findingis not surprising given that the aggregation-based approach


[Figure 9: Perceived Usefulness (Regular vs. Roll-up Queries). Bar chart of agreement (y-axis, 0 to 3) by domain (x-axis: Travel, Movie, Music, Book) for the methods Regular and Roll-up.]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3)

explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 in any domain, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.
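The Jaccard index used to compare the result sets of the two retrieval methods can be illustrated with a short sketch. The Python function below is only an assumption about how such a score might be computed for a single participant; the function name, the example IRIs and the treatment of two empty lists are hypothetical and are not taken from the SKOSRec implementation.

```python
def jaccard_index(list_a, list_b):
    """Return |A ∩ B| / |A ∪ B| for two recommendation lists (order is ignored)."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty lists are treated as having no overlap.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical result lists of one participant (IRIs are made up for illustration).
regular = ["ex:item1", "ex:item2", "ex:item3"]
roll_up = ["ex:item3", "ex:item4", "ex:item5"]

# One shared item out of five distinct items.
print(jaccard_index(regular, roll_up))
```

Averaging this score over all participants of a domain would yield a mean Jaccard index of the kind reported in the tables; a value close to 0 means that the two methods returned almost entirely different items.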

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-empty result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness because the low response rates of the SPARQL-based approach led to a low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).
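To make the FDR adjustment of the pairwise tests more concrete, the following sketch adjusts a list of p-values. The text does not state which FDR procedure was applied; the sketch assumes the widely used Benjamini-Hochberg step-up procedure, and the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the adjusted
    # values stay monotone (a smaller raw p never gets a larger adjusted p).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values of four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:.2f}, adjusted p = {q:.4f}")
```

A comparison would then be reported as significant only if its adjusted value stays below the chosen level (e.g., 0.05).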

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results in more than the other half of


Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular          Aggr.-based      N
                                          M       SD       M       SD
Accuracy (size) [10]             Travel   9.90    0.99     9.42    2.07     103
                                 Movie    2.88    0.59     2.76    0.82     50
                                 Music    10.00   0.00     10.00   0.00     53
                                 Book     2.88    0.59     3.00    0.00     51
Accuracy (mrs) [1]               Travel   4821    2498     4493    2559     102
                                 Movie    5458    2288     5669    2542     44
                                 Music    5395    1831     5824    1619     52
                                 Book     5153    2390     4925    2230     49
Novelty (nv) [1]                 Travel   0.43    0.39     0.38    0.40     98
                                 Movie    0.35    0.38     0.28    0.41     41
                                 Music    0.49    0.36     0.46    0.39     50
                                 Book     0.61    0.42     0.62    0.44     47
Diversity (divU) [5]             Travel   3.5     -        3       -        100
                                 Movie    4       -        3       -        43
                                 Music    4       -        3       -        50
                                 Book     3       -        3       -        48
Diversity (divC) [1]             Travel   0.79    0.17     0.89    0.13     99
                                 Movie    0.75    0.20     0.66    0.27     44
                                 Music    0.83    0.17     0.93    0.07     53
                                 Book     0.74    0.27     0.93    0.06     48
Usefulness [5]                   Travel   3.36    0.77     3.22    0.74     101
                                 Movie    3.20    0.92     3.26    0.94     44
                                 Music    3.45    0.84     3.44    0.85     51
                                 Book     2.95    0.87     3.03    0.96     48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and

[Figure 10: Recommendation List Size. Bar chart of the no. of items (y-axis, 0.0 to 10.0) by domain (x-axis: Movie, Music, Book) for the methods Regular and Cross-Domain.]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)

pattern matching can facilitate an improved exploration of LOD repositories.

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular (SPARQL)    Cross-Domain        N
                                          M        SD         M        SD
Accuracy (size) [10]             Movie    0.82     2.09       8.66     2.73       50
                                 Music    0.89     1.59       9.30     2.03       53
                                 Book     2.90     3.98       10.00    0.00       51
Accuracy (mrs) [1]               Movie    6741     2762       5365     1471       16
                                 Music    6236     3176       4832     2249       20
                                 Book     6642     2128       5559     1684       27
Novelty (nv) [1]                 Movie    0.59     0.49       0.41     0.34       16
                                 Music    0.72     0.44       0.59     0.40       19
                                 Book     0.54     0.41       0.63     0.30       27
Diversity (divU) [5]             Movie    3        -          4        -          16
                                 Music    3        -          3        -          20
                                 Book     3        -          4        -          27
Diversity (divC) [1]             Movie    (0.64)   (0.35)     (0.99)   (0.02)     (6)
                                 Music    (0.90)   (0.13)     (0.98)   (0.03)     (10)
                                 Book     (0.84)   (0.12)     (0.87)   (0.1)      (8)
Usefulness [5]                   Movie    3.57     0.83       3.35     1.57       15
                                 Music    3.53     1.17       3.11     0.98       20
                                 Book     3.41     1.04       3.41     0.94       27

Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine, but a back-end application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource IRI (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be differently combined according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be either expressed as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow the retrieval of LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user was able to formulate filter conditions that referred to different RDF graphs and endpoints accordingly, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users are enabled to specify additional graph patterns, which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


These properties help to retrieve SKOS annotation triples for items preferred by a user (Definition 2).

Definition 2 (SKOS annotation triple). A SKOS annotation triple (st) is a triple that is comprised of an IRI in the subject position, an annotation property (ANNOTPROP) as predicate and a concept c in the object position. The concept is part of a knowledge organization system S in SKOS format ($C = \{c \mid c \in S\}$) (Eq. 1).

$$st \in I \times \langle\text{ANNOTPROP}\rangle \times C \tag{1}$$

The identification of SKOS annotation triples facilitates a fast retrieval of potentially similar items. Therefore, the engine evaluates shared features between resources from the SKOS annotation dataset (Definition 3).

Definition 3 (SKOS annotation dataset). A SKOS annotation dataset (AD) is a subset of an RDF dataset that contains SKOS annotation triples (Eq. 2).

$$AD \subset I \times \langle\text{ANNOTPROP}\rangle \times C \tag{2}$$

Before the similarity calculation process starts, the engine determines SKOS annotations for each LOD resource (r) from the user profile by issuing a subsequent SPARQL query to the LOD repository from which the user intends to receive suggestions. The notion of SKOS annotations is specified in Definition 4.

Definition 4 (SKOS annotations). In the annotation dataset, LOD resources directly link to concepts of a SKOS system via a predefined annotation property. SKOS annotations for an input resource r are defined as in Equation 3.

$$Annot(r) = \{\, c \in S \mid \exists\, (r, \langle\mathrm{ANNOTPROP}\rangle, c) \in AD \,\} \tag{3}$$

Afterwards, the engine identifies which of the other resources in the dataset share annotations with r. Similarities only have to be computed for items with matching concepts [13, 52]. Therefore, triples can be efficiently joined along item features, since native RDF storage systems usually index triples rather than columns [41]. Thus, the quick identification of mutual annotations through SPARQL requests facilitates ad-hoc similarity calculation. Definition 5 specifies the notion of relevant resources and their annotations that are needed to identify similar items. The applied syntax follows Pérez et al. [42].

Definition 5 (Relevant resources and their annotations). The conjunctive graph pattern $P_r$ matches all items and subjects that are potentially relevant for recommendation retrieval (Eq. 4).

$$P_r = (r, \langle\mathrm{ANNOTPROP}\rangle, c) \;\mathrm{AND}\; (q, \langle\mathrm{ANNOTPROP}\rangle, c) \tag{4}$$

The mapping $\Omega_r$ of relevant resources and their SKOS annotations is obtained by retrieving all resources q that share at least one SKOS concept c with resource r from AD (Eq. 5). The corresponding result set contains mappings for all variables (i.e., c and q) in the triple statements of $P_r$.

$$\Omega_r = [\![P_r]\!]_{AD} \tag{5}$$
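Definitions 4 and 5 can be illustrated with a minimal in-memory sketch. It does not reproduce the engine's SPARQL-based retrieval; the annotation dataset is modelled as a set of (resource, concept) pairs, all identifiers are hypothetical, and the exclusion of r itself from the result is an assumption derived from the phrase "other resources" above.

```python
# Hypothetical annotation dataset AD: each pair (resource, concept) stands for
# a triple (resource, <ANNOTPROP>, concept). All identifiers are made up.
AD = {
    ("r", "c1"), ("r", "c2"),
    ("q1", "c1"),
    ("q2", "c1"), ("q2", "c2"),
    ("q3", "c3"),
}


def annot(resource, ad):
    """Annot(r): all concepts the resource is annotated with in AD (Eq. 3)."""
    return {c for (s, c) in ad if s == resource}


def relevant_resources(resource, ad):
    """Omega_r: pairs (q, c) for every q that shares a concept c with r (Eqs. 4-5).

    Assumption: r itself is left out, because only other resources can
    become recommendations.
    """
    concepts = annot(resource, ad)
    return {(q, c) for (q, c) in ad if c in concepts and q != resource}
```

In this toy dataset, q3 shares no concept with r and is therefore never considered for similarity calculation, which mirrors the efficiency argument made in the text.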

Upon extraction of relevant resources and annotations, the process of similarity calculation can start. The engine determines the information content (IC) of the shared SKOS annotations of two resources. This approach is based on the hybrid similarity metric proposed by Meymandpour and Davis for LOD-enabled RS [37]. However, while their measure considers all types of IRI resources for the retrieval, our similarity metric purely relies on SKOS annotations (Definition 6).

Definition 6 (SKOS similarity). Let Annot(r) be the set of SKOS features of resource r and Annot(q) the set of SKOS features of resource q, with $q \in \{\mu(q) \mid \mu \in \Omega_r\}$. Then their similarity can be derived from the IC of their shared concepts $C_{shared} = Annot(r) \cap Annot(q)$.

$$sim(r, q) = IC(C_{shared}) \tag{6}$$

The IC of a set of SKOS concepts is measured by the sum of the negative logarithms of each concept's frequency in relation to the maximum frequency among relevant resources (Definition 7).

Definition 7 (SKOS Information Content). The SKOS IC is defined as an aggregation of the individual IC values of each concept that is contained in the set of shared features ($c \in C_{shared}$), where $freq(c)$ is the frequency of c among all relevant resources and n is the maximum frequency among these resources. The final similarity score of two resources r and q is determined by summing the IC values of each concept of the shared feature set.

$$IC(C_{shared}) = -\sum_{c \in C_{shared}} \log\left(\frac{freq(c)}{n}\right) \tag{7}$$

After having obtained resource annotations as well as IC values, similarity scores between the input resource r and each potentially relevant resource q can be calculated. When a user profile contains more than a single item, similarity values are aggregated through summation to determine the final recommendation score (Definition 8).

Definition 8 (Recommendation score). The recommendation score of a potentially relevant resource q is quantified by the sum of similarities with each resource r that can be found in the profile ($Pr$).

$$score(Pr, q) = \sum_{r \in Pr} sim(r, q) \tag{8}$$
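The following sketch works through Equations 6 to 8 on toy data. It is an illustration under stated assumptions rather than the engine's implementation: the mappings are modelled as (resource, concept) pairs, the natural logarithm is assumed because the text does not fix a base, and all identifiers are hypothetical.

```python
import math
from collections import Counter


def concept_frequencies(omega):
    """freq(c): number of relevant resources that carry concept c."""
    return Counter(c for (_q, c) in omega)


def ic(shared, freq):
    """Eq. 7: negative sum of log(freq(c) / n), n being the maximum frequency."""
    if not shared:
        return 0.0
    n = max(freq.values())
    return -sum(math.log(freq[c] / n) for c in shared)


def sim(q, omega):
    """Eq. 6: similarity of r and q, where omega holds the mappings for r."""
    shared = {c for (res, c) in omega if res == q}
    return ic(shared, concept_frequencies(omega))


def score(profile_omegas, q):
    """Eq. 8: sum of sim(r, q) over all profile resources r."""
    return sum(sim(q, omega) for omega in profile_omegas.values())


# Hypothetical mappings for one profile resource r: concept c1 occurs twice
# (the maximum frequency n = 2), concept c2 occurs once.
OMEGA_R = {("q1", "c1"), ("q2", "c1"), ("q2", "c2")}
```

With these mappings, q1 only shares the most frequent concept and receives a similarity of 0, whereas q2 additionally shares the rarer concept c2 and receives log 2. Rare shared concepts therefore contribute most to the score.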

Upon calculation of similarity values for each potentially relevant item q, the engine ranks the retrieved resources and returns them up to a given limit (k) that is specified by the user in the SKOSRec query.

With the presented methods of on-the-fly retrieval, recommendations can be generated from LOD repositories through an API without requiring local metadata or excessive preprocessing operations other than the mapping of profile items to LOD resources.⁴

3.3. Preference Querying

In the previous subsection, it was assumed that users express their tastes as preference statements for a certain item, given as an IRI that refers to a LOD resource. However, it may also be the case that likings are only vaguely known, for instance, when users prefer products with specific features but are not able or willing to specify the actual items. The engine obtains such a profile by issuing a subsequent SPARQL query that contains the preference graph pattern ($P_{pref}$) in the WHERE condition. Suppose a user is interested in Quentin Tarantino movies (dbr:Quentin_Tarantino) but does not specify the precise items he likes. He could send a preference query to the SKOSRec engine. The graph pattern in the request would match all movies that are connected to dbr:Quentin_Tarantino with the properties dbo:director or dbo:producer (see Listing 4).

⁴ Di Noia et al. describe how a mapping procedure can be conducted for a real-world implementation. For their LDRS, they matched movie titles with the corresponding LOD resources in DBpedia. The authors extracted IRIs, labels, and publication dates for all movie resources. They applied the Levenshtein distance to these resources to identify matching movie items [13].

Listing 4. SKOSRec query to generate recommendations based on a preference query in the movie domain (Q2)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    RECOMMEND ?movie TOP 5
    PREF [ ?prefMovie
    WHERE {
      { ?prefMovie dbo:director dbr:Quentin_Tarantino . }
      UNION
      { ?prefMovie dbo:producer dbr:Quentin_Tarantino . }
    } ]

Table 3 shows the corresponding solution set to the preference query, containing films from DBpedia that are similar to Quentin Tarantino movies.

Table 3. Result set for Q2

    movie
    dbr:The_Godfather_Part_II
    dbr:The_Prestige_(film)
    dbr:The_Dark_Knight
    dbr:Planet_Terror
    dbr:Clerks

The notion of a queried profile is specified in Definition 9.

Definition 9 (Queried profile). A queried profile is a user profile that has been compiled by retrieving all LOD resources r that are part of an annotation graph AD and are contained in the solution mappings resulting from the evaluation of a specified graph pattern ($P_{pref}$) over a dataset D (Eq. 9).

$$Pr = \{\, \mu(r) \mid \mu \in [\![P_{pref}]\!]_D,\; r \in dom(\mu),\; r \in AD \,\} \tag{9}$$
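A queried profile in the sense of Equation 9 can be sketched as follows. The snippet is a simplified illustration: triple patterns are restricted to the form (variable, predicate, object), the union of several patterns stands in for the UNION clause of a preference query, and the dataset as well as all `ex:` identifiers are hypothetical.

```python
# Hypothetical dataset D as (subject, predicate, object) triples.
D = {
    ("ex:Movie1", "dbo:director", "ex:DirectorA"),
    ("ex:Movie2", "dbo:producer", "ex:DirectorA"),
    ("ex:Movie3", "dbo:director", "ex:DirectorB"),
    ("ex:Movie4", "dbo:director", "ex:DirectorA"),
}

# Resources that occur in the annotation dataset AD (ex:Movie4 has no
# SKOS annotations in this toy example).
ANNOTATED = {"ex:Movie1", "ex:Movie2", "ex:Movie3"}


def match(pattern, triples):
    """Return all subjects that match one pattern (variable, predicate, object)."""
    _var, pred, obj = pattern
    return {s for (s, p, o) in triples if p == pred and o == obj}


def queried_profile(patterns, triples, annotated):
    """Eq. 9: union of the pattern matches, restricted to resources in AD."""
    matched = set().union(*(match(p, triples) for p in patterns))
    return {r for r in matched if r in annotated}


P_PREF = [
    ("?prefMovie", "dbo:director", "ex:DirectorA"),
    ("?prefMovie", "dbo:producer", "ex:DirectorA"),
]
```

Here `queried_profile(P_PREF, D, ANNOTATED)` contains ex:Movie1 and ex:Movie2. ex:Movie4 matches the pattern but is dropped because it lacks annotations, so it could not contribute to any similarity value.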


3.4. Prefiltering

To facilitate exploratory search in LOD repositories, users should also have the option to formulate conditions in their queries. One way to enable such filters is to apply them before similarity calculation. Thus, recommendation lists can be adjusted to individual needs.

Due to the expressiveness of RDF data, filter conditions can not only be applied to attributes that are directly connected to potentially relevant items, but can also be declared as a graph pattern. By this means, it is possible to retrieve LOD resources that are both similar to the user profile and fulfill an advanced condition.

The strengths of the approach will be illustrated with a travel recommendation request that was executed over DBpedia. Suppose a customer expressed a preference for the Lake Baikal region (e.g., as indicated by a positive rating on a travel community site). With simple on-the-fly retrieval, he would receive a recommendation list that primarily consists of travel destinations located in the proximity of this geographical area. A graph-based filter, on the other hand, facilitates advanced retrieval options. In the given example, it enables the user to obtain suggestions for Southeast Asian travel destinations that have similar features to the Lake Baikal region, even though the resource dbr:Lake_Baikal is not directly connected to any destination with the annotation dbc:Southeast_Asia. Such destinations can only be identified by exploring the surrounding graph structure of the filter attribute.

The advanced constraint ensures that the generated result set is not empty. A simple attribute-level filter, which is commonly utilized in faceted search systems, would not generate any search hits. By these means, graph-based filter patterns can help users to better explore knowledge graphs, thereby overcoming data quality issues in LOD repositories. The corresponding SKOSRec query for this example is shown in Listing 5.

Listing 5. SKOSRec query with an expressive prefilter condition to retrieve Southeast Asian destinations based on a preference for the Lake Baikal region (Q3)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbc: <http://dbpedia.org/resource/Category:>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    RECOMMEND ?destination TOP 5
    PREF dbr:Lake_Baikal
    [ WHERE {
        ?destination dbo:country ?country .
        ?country dct:subject ?countrySubject .
        ?countrySubject skos:broader dbc:Southeast_Asia .
        ?destination rdf:type dbo:Place .
    } ]

Table 4 shows the solution to this query. It contains travel destinations and regions of interest that are both similar to the previously liked destination dbr:Lake_Baikal and located in Southeast Asia. Ecoregions around lakes, rivers, or mountains from Southeast Asian countries, such as Cambodia, Vietnam, or Malaysia, are presented to the user.

Table 4. Result set of Q3

    destination
    dbr:Tonle_Sap
    dbr:Mekong
    dbr:Mount_Kinabalu
    dbr:Laguna_de_Bay
    dbr:Lake_Toba

The technical details for this type of query are given in the following definitions. Before the extraction of relevant resources and annotations, the recommendation engine identifies the resources that satisfy the filter (Definition 10).

Definition 10 (Prefiltered resources). The mapping of filtered resources $\Omega_{prefilter}$ is obtained by matching a specified graph pattern (P) to the RDF graph of a dataset D (Eq. 10).

$$\Omega_{prefilter} = \{\, \mu_{|q} \mid \mu \in [\![P]\!]_D \,\} \tag{10}$$

After preselection, resources are brought together with user profile data to retrieve the final set of recommendations (Definition 11).

Definition 11 (Annotations and relevant resources (prefiltered retrieval)). The solution mappings of annotations and relevant resources in prefiltered retrieval mode are obtained by joining the set of prefiltered resources with the mappings $\Omega_r$ of all relevant resources and annotations as determined from the user profile.

$$\Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_r \tag{11}$$

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and respective SKOS similarities are quantified according to the user filter (Definitions 12 and 13). By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources.

Definition 12 (Conditional SKOS similarity). Let Annot(r) be the set of SKOS features of r and Annot(q) the set of SKOS features of resource q, which is contained in the filtered set of annotations and relevant resources ($q \in \{\mu(q) \mid \mu \in \Omega_{pre}\}$). Then the similarity can be derived from the IC of their shared concepts ($C_{cond} = Annot(r) \cap Annot(q)$) (Eq. 12).

$$sim_{cond}(r, q) = IC(C_{cond}) \tag{12}$$

Definition 13 (Conditional SKOS Information Content). The conditional IC of a set of SKOS annotations is defined by the sum of the IC of each concept $c \in \{\mu(c) \mid \mu \in \Omega_{pre}\}$, where $freq_{cond}(c)$ is the frequency of c among all filtered resources and n is the maximum frequency among these resources (Eq. 13).

$$IC(C_{cond}) = -\sum_{c \in C_{cond}} \log\left(\frac{freq_{cond}(c)}{n}\right) \tag{13}$$
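The effect of the prefilter on the scores (Eqs. 11 to 13) can be made concrete with a small sketch. It assumes the same simplified representation as before, i.e., mappings as (resource, concept) pairs and the natural logarithm, and models the join of Equation 11 as a restriction to the resources that satisfy the filter pattern. All identifiers are hypothetical.

```python
import math
from collections import Counter


def prefilter_join(omega_r, allowed):
    """Eq. 11: keep the mappings (q, c) whose resource q satisfies the filter."""
    return {(q, c) for (q, c) in omega_r if q in allowed}


def sim_cond(q, omega_pre):
    """Eqs. 12-13: IC of shared concepts, counted over filtered mappings only."""
    freq = Counter(c for (_res, c) in omega_pre)
    shared = {c for (res, c) in omega_pre if res == q}
    if not shared:
        return 0.0
    n = max(freq.values())
    return -sum(math.log(freq[c] / n) for c in shared)


# Unfiltered mappings: both concepts occur three times.
OMEGA_R = {
    ("q1", "c1"), ("q2", "c1"), ("q3", "c1"),
    ("q2", "c2"), ("q3", "c2"), ("q4", "c2"),
}
# Hypothetical filter result: only q1 and q2 satisfy the prefilter pattern.
OMEGA_PRE = prefilter_join(OMEGA_R, {"q1", "q2"})
```

Without the filter, c1 and c2 are equally frequent, so q2 scores 0. After filtering, c2 is rarer than c1 among the remaining resources and q2 scores log 2, which shows how the frequencies, and hence the ranking, adapt to the restricted candidate set.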

The system carries out the calculations (i.e., determination of similarity values, the ranking of LOD resources) in the same way as in non-filtering mode.

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources, it can also apply user constraints after similar items have already been determined. By this means, recommendations are filtered ex post, i.e., after the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5. Result set of Q4

    director              movie
    dbr:Kirk_Douglas      dbr:Scalawag_(film)
    dbr:Kirk_Douglas      dbr:Posse_(1975_film)
    dbr:Kevin_Costner     dbr:The_Postman_(film)
    dbr:Kevin_Costner     dbr:Dances_with_Wolves
    ...                   ...
    dbr:Gene_Kelly        dbr:Hello,_Dolly!_(film)
    dbr:Gene_Kelly        dbr:Invitation_to_the_Dance_(film)
    ...                   ...
    dbr:Alan_Alda         dbr:Goodbye,_Farewell_and_Amen
    dbr:Alan_Alda         dbr:Margaret's_Engagement
    ...                   ...
    dbr:Steve_Buscemi     dbr:Interview_(2007_film)
    dbr:Steve_Buscemi     dbr:Animal_Factory

Listing 6. SKOSRec query to generate simple postfiltered recommendations in the movie domain (Q4)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE {
      ?movie dbo:director ?director .
      ?director rdf:type yago:Director110014939 .
    } LIMIT 50
    RECOMMEND ?director TOP 5
    PREF dbr:Quentin_Tarantino

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.

Another weakness is the biased semantics of the query. Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality, or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies he or she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to several LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).

Aggregation-based requests can be combined well with other retrieval patterns, such as preference queries. In the case of the movie recommendation scenario, a user could state that he enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors that shot similar films. Usually, this kind of question can only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate postfiltered recommendations based on a preference query in the movie domain (Q5)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE {
      ?movie dbo:director ?director .
      ?director rdf:type yago:Director110014939 .
    } LIMIT 50
    AGG dbr:Quentin_Tarantino ?director SUM
    RECOMMEND ?movie TOP 100
    PREF [ ?prefMovie
    WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino . } ]

Table 6. Result set of Q5

    director                    movie
    dbr:Steven_Spielberg        dbr:Indiana_Jones
    dbr:Steven_Spielberg        dbr:Jurassic_Park_(film)
    ...                         ...
    dbr:Francis_Ford_Coppola    dbr:The_Godfather_Part_II
    dbr:Francis_Ford_Coppola    dbr:The_Godfather_Part_III
    ...                         ...
    dbr:Christopher_Nolan       dbr:The_Prestige_(film)
    dbr:Christopher_Nolan       dbr:The_Dark_Knight
    ...                         ...
    dbr:Robert_Rodriguez        dbr:Planet_Terror
    dbr:Robert_Rodriguez        dbr:Machete_(film)
    ...                         ...
    dbr:Kevin_Smith             dbr:Clerks
    dbr:Kevin_Smith             dbr:Clerks_II

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they differ completely. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style⁵), Q4 shows U.S. American actor-directors whose movies range from musical films (dbr:Hello,_Dolly!_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Tab. 5) seem rather arbitrary and not exceptionally helpful concerning the initial user request.

The combined aggregation-based query can also be described as a roll-up request, since suggestions are based on preferences for sublevel entities. Another example of a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

⁵ https://en.wikipedia.org/wiki/Quentin_Tarantino


Listing 8. SKOSRec query to generate postfiltered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dul: <http://www.ontologydesignpatterns.org>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?city
    WHERE {
      ?poi ?property ?city .
      ?city rdf:type dbo:City .
      ?property rdfs:subPropertyOf dul:hasLocation .
    } LIMIT 3
    AGG dbr:London ?city SUM
    RECOMMEND ?poi TOP 100
    PREF
      dbr:Oxford_Street
      dbr:Notting_Hill
      dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7. Result set of Q6

    city
    dbr:Oxford
    dbr:Abingdon-on-Thames
    dbr:Wallingford_Oxford

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere rather than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.

Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?participated ?musicalAct
    WHERE {
      ?movie rdf:type dbo:Film .
      VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
      { ?musicalAct ?participated ?movie . }
      UNION
      { ?movie ?participated ?musicalAct . }
    } LIMIT 10
    AGG dbr:The_Jackson_5 ?movie MAX
    RECOMMEND ?musicalAct TOP 100
    PREF dbr:The_Jackson_5
    WHERE {
      VALUES ?actType { dbo:MusicalArtist dbo:Band }
      ?musicalAct rdf:type ?actType .
    }

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are combined with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tabs. 8 and 9). The SKOSRec query produces a more comprehensive solution than a regular SPARQL query.

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?musicalAct
    WHERE {
      ?movie rdf:type dbo:Film .
      VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
      { dbr:The_Jackson_5 ?participated ?movie . }
      UNION
      { ?movie ?participated dbr:The_Jackson_5 . }
    } LIMIT 5

Table 8. Result set of Q7

    movie                          musicalAct
    dbr:Moonwalker                 dbr:Michael_Jackson
    ...                            ...
    dbr:The_Wiz_(film)             dbr:Ashford_&_Simpson
    dbr:Garfield_Gets_a_Life       dbr:The_Temptations
    dbr:Pipe_Dreams_(1976_film)    dbr:Gladys_Knight_&_the_Pips
    dbr:The_Jacksons               dbr:Jermaine_Jackson

Table 9. Result set of Q8

    movie               musicalAct
    dbr:The_Jacksons    dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 14 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 14 (SKOSRec recommendations). SKOSRec recommendations ($R(Pr)$) are the set of suggestions generated by processing the solution mappings ($\Omega$) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, prefiltering, or a combination of these techniques. $Pr$ represents the input profile that contains at least one preference for an item. The cardinality of $R(Pr)$ is the intended number of recommendations (k) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 14).

$$R(Pr) = \{\, x \mid x \in \{\, q \mid score(Pr, q) > k\text{-}\max_{q \in \Omega} score(Pr, q),\; q \notin Pr \,\} \,\} \tag{14}$$
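The selection step described in Definition 14 can be sketched as a plain top-k ranking. The function below follows the verbal definition (the k highest-scoring resources that are not contained in the profile); the tie-breaking rule by resource name is an assumption added only to make the output deterministic, and all identifiers are hypothetical.

```python
def recommend(scores, profile, k):
    """Return the k highest-scoring resources that are not part of the profile.

    scores:  dict mapping each candidate resource q to score(Pr, q)
    profile: set of resources the user already stated as preferences
    """
    candidates = [(q, s) for q, s in scores.items() if q not in profile]
    # Sort by descending score; ties are broken alphabetically (assumption).
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _s in candidates[:k]]


# Hypothetical scores; "p" is a profile item and must not be recommended.
SCORES = {"a": 0.9, "b": 0.5, "c": 0.7, "p": 1.2}
```

For example, `recommend(SCORES, {"p"}, 2)` yields `["a", "c"]`: the profile item p is skipped although it has the highest score.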

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions $R(Pr)$ can be joined with any graph pattern $P_{post}$ after the process of similarity calculation (Definition 15). It does not make a difference how the engine obtained these recommendations.

Definition 15 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern ($\Omega_{postfilter}$) (Eqs. 15 and 16).

$$\Omega_{postfilter} = \{\, \mu \mid \mu \in [\![P_{post}]\!]_D,\; x \in dom(\mu) \,\} \tag{15}$$

$$\Omega_{post} = \{\, \mu_{|x} \mid x \in R(Pr) \,\} \bowtie \Omega_{postfilter} \tag{16}$$

Like the prefilter, $P_{post}$ can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently from a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item (a) and similar sublevel entities. Additionally, the user has to specify the variable (e.g., y) in his profile that represents the items within the postfilter graph pattern $P_{post}$. By this means, aggregation-based recommendations can be generated (Definition 16).

Definition 16 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item a in the profile (Eq. 17).

$$R_{agg}(a) = \{\, \mu(y) \mid \mu \in \Omega_{post},\; y \in dom(\mu),\; a \notin \mu(y) \,\} \tag{17}$$

For each potential aggregation-based suggestion (y), similarity scores of connected entities (x) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 17 (Aggregation-based ranking score). The ranking score of an LOD resource y that is connected to a set of recommended items is determined by aggregating scores of connected entities either by the maximum (Eq. 18), the sum (Eq. 19), or the average (Eq. 20) of individual scores.

$$score_{aggMAX}(y) = \max_{x \in \mu(x, y)} score(Pr, x) \tag{18}$$

$$score_{aggSUM}(y) = \sum_{x \in \mu(x, y)} score(Pr, x) \tag{19}$$

$$score_{aggAVG}(y) = \frac{\sum_{x \in \mu(x, y)} score(Pr, x)}{|\{x \in \mu(x, y)\}|} \tag{20}$$
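The three aggregation modes of Definition 17, together with the exclusion rule of Definition 16, can be sketched as follows. The representation of the postfiltered mappings as (x, y) pairs, the reading of the exclusion as "y must differ from a", and the toy identifiers (movies m1 to m4, directors d1 and d2) are assumptions for illustration.

```python
def aggregate(omega_post, scores, excluded, mode="SUM"):
    """Aggregate scores of recommended items x per connected resource y.

    omega_post: iterable of (x, y) pairs from the postfilter join (Eq. 16)
    scores:     dict mapping each recommended item x to score(Pr, x)
    excluded:   the superordinate profile item a that is left out (Eq. 17)
    mode:       "MAX" (Eq. 18), "SUM" (Eq. 19) or "AVG" (Eq. 20)
    """
    grouped = {}
    for x, y in set(omega_post):
        if y != excluded:
            grouped.setdefault(y, []).append(scores[x])
    if mode == "MAX":
        return {y: max(v) for y, v in grouped.items()}
    if mode == "SUM":
        return {y: sum(v) for y, v in grouped.items()}
    if mode == "AVG":
        return {y: sum(v) / len(v) for y, v in grouped.items()}
    raise ValueError(f"unknown aggregation mode: {mode}")


# Hypothetical data: m4 belongs to the preferred director "a" and is ignored.
OMEGA_POST = [("m1", "d1"), ("m2", "d1"), ("m3", "d2"), ("m4", "a")]
SCORES = {"m1": 1.0, "m2": 3.0, "m3": 2.5, "m4": 9.0}
```

With sum-based aggregation, d1 (4.0) outranks d2 (2.5) because two of its movies were recommended, whereas average-based aggregation ranks d2 (2.5) above d1 (2.0). The choice of mode therefore determines whether the number of matching sublevel entities is rewarded.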

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in the context of web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach out to more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might have caused biased results otherwise. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology.⁶ ⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users went step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., through rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

⁶ http://tomcat.apache.org
⁷ http://www.oracle.com/technetwork/java/javaee/servlet/index.html

Subjects were recruited through the clickworker.com platform.⁸ Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just clicked through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.⁹

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.¹⁰ The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

Fig. 2. User profile generation, TC1 (music domain).

⁸ https://www.clickworker.de
⁹ http://api.jquery.com/jquery.ajax
¹⁰ http://lucene.apache.org/solr

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels, or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance; outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale. The score, called mean relevance (mrs), was measured with Equation 21.

$$ \mathit{mrs} = \frac{\sum_{i=1}^{k} \mathit{score}_i}{k} \tag{21} $$
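As a minimal sketch of Eq. 21, the following Python function averages the slider scores of one recommendation list. The function name and the range check are illustrative assumptions and are not taken from the SKOSRec implementation.

```python
def mean_relevance(scores):
    """Mean relevance score (mrs) of one recommendation list (Eq. 21).

    `scores` holds the slider values (0 to 100) that a participant
    assigned to the k items of the list.
    """
    if not scores:
        raise ValueError("mrs is undefined for an empty recommendation list")
    if any(not 0 <= s <= 100 for s in scores):
        raise ValueError("slider scores are expected on a 0 to 100 scale")
    return sum(scores) / len(scores)
```

For instance, a list of three items rated 80, 60 and 40 yields an mrs of 60.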

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.
While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and non-obvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, in the case of an RS only suggesting popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 22).

$$ \mathit{nv} = \frac{|\mathit{nov}|}{|\mathit{tp}|} \tag{22} $$
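Eq. 22 can be illustrated with a short Python sketch. It assumes that each rated item is available as a pair of slider score and novelty flag, and it uses the relevance threshold of 50 points stated in the text. The function name and the decision to return `None` when no item is relevant are assumptions made for this example.

```python
def novelty(ratings, threshold=50):
    """Novelty (nv) of one recommendation list (Eq. 22).

    `ratings` is a list of (score, is_new) pairs: the slider score of an
    item and whether the participant marked the item as new.
    Returns None when no item is relevant, because |tp| would be zero.
    """
    # |tp|: items whose score reaches the relevance threshold
    relevant = [is_new for score, is_new in ratings if score >= threshold]
    if not relevant:
        return None
    # |nov|: relevant items that were unfamiliar to the participant
    return sum(1 for is_new in relevant if is_new) / len(relevant)
```

With two relevant items, one of which is new, the function returns 0.5; items below the threshold do not enter the ratio.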

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval TC1 (music domain)

An item was considered relevant when it achieved a relevance score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 65]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse is the result set [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 23) and averaged for the recommendation list (Eq. 24).

$$ d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{23} $$

$$ \mathit{div}_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{24} $$
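The two equations can be sketched in Python as follows. The sketch assumes that each item is represented by the set of its concept annotations, i.e., a binary feature vector. This representation is an assumption for illustration; the engine may weight annotations differently.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two items given as sets of annotations
    (equivalent to the cosine of two binary feature vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def content_diversity(items):
    """Content-based diversity (divC) of a recommendation list.

    Averages the cosine distance 1 - cos(i_l, i_m) (Eq. 23) over all
    unordered item pairs l < m of the list (Eq. 24).
    """
    k = len(items)
    if k < 2:
        raise ValueError("divC requires at least two items")
    total = sum(1 - cosine(a, b) for a, b in combinations(items, 2))
    return 2 * total / (k * (k - 1))
```

A list in which no two items share an annotation reaches the maximum value of 1, whereas a list of identically annotated items yields 0.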

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each generated recommendation list of the SKOSRec engine during runtime execution. In addition to the measurement of divC scores, we asked participants about their opinions on recommendation list diversity directly. Following the RS evaluation framework by Pu et al., two Likert items were posed for this purpose [44]:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the above-listed algorithmic performance metrics, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al., which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement to positive statements regarding the relevance of the suggestions [44]. The statements were adapted to the specificities of each usage scenario. Figure 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine’s performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out if the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to prove whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Figure 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition on the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movies, music and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources which represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.
In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹
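The order of the two processing steps described above (filter first, similarity calculation second) can be sketched as follows. The data layout, the function name and the overlap-based score are hypothetical placeholders chosen for this example; they do not reproduce the similarity measure or the query processing of the SKOSRec engine.

```python
def recommend_with_filter(profile, candidates, prop, value, k=10):
    """Apply a simple property filter before ranking candidates.

    profile    -- set of concept annotations describing the user profile
    candidates -- dict: item URI -> {"props": {property: set of values},
                                     "annotations": set of concepts}
    prop/value -- the user constraint, e.g. ("dbo:genre", "dbr:Jazz")
    """
    # Step 1 (filter): keep only candidates that satisfy the constraint.
    kept = {
        uri: item
        for uri, item in candidates.items()
        if value in item["props"].get(prop, set())
    }
    # Step 2 (similarity): placeholder score = number of shared annotations.
    scored = [(len(profile & item["annotations"]), uri) for uri, item in kept.items()]
    # Drop items without overlap; sort by score (descending), then by URI.
    ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: (-p[0], p[1]))
    return [uri for _, uri in ranked[:k]]
```

Because the constraint is evaluated first, similarity scores only have to be computed for candidates that can appear in the filtered result list.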

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users

11 http://dbpedia.org


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval TC1 (music domain)

received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement “The filter has improved the recommendation results of list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.
After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar rollup query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during the stay in another destination (i.e., a city, region or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations with which the suggestions resulting from the advanced request were compared at

a later stage of the experiment. The simple query only contained the user’s favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.
The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act (music domain) or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). The SKOSRec engine generated recommendations based on the creative works of the favored artist. Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist’s characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.
After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.

Fig. 5. Web form for a constraint-based recommendation request (music domain)

Fig. 6. Travel - User profile generation TC3


Fig. 7. Music - User profile generation TC4 (page 9)

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states his/her favorite movie that is written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose in his/her profile a consumer has declared a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (e.g., the book edition of a certain movie).
Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The gathered samples from the experiments contained answers from both genders with a slight overrepresentation of male participants (53.7%). Most study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceive the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).
Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach’s α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach’s α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach’s α values for this concept did not reach acceptable levels. In order to avoid leaving valuable user assessments unused, agreements to the statement “The recommendations of list [] are diverse” were taken into account as an ordinal variable (divU).
Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M),¹² standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine’s performance in almost every quality dimension to be above mediocrity. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the

12 In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


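Table 10
Test cases in the web experiments (X = conducted, x = not conducted)

| Test Case | Evaluation Purpose | Query Type | Travel | Movie | Music | Book |
|---|---|---|---|---|---|---|
| TC1 | On-the-fly Recommendations (Baseline) | Q1 | X | X | X | X |
| TC2 | Constraint-based Recommendations | Q3 | X | X | X | X |
| TC3 | Advanced Queries (Rollup) | Q5, Q6 | X | X | X | X |
| TC4 | Advanced Queries (Cross-Domain) | Q7 | x | X | X | X |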

Table 11
Cronbach’s α values for the concept of Perceived Usefulness

| Domain | Cronbach’s α |
|---|---|
| Travel | 0.7002 |
| Movie | 0.8579 |
| Music | 0.8515 |
| Book | 0.8949 |

baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement to the Diversity-related Likert item stating that recommendations are diverse lay at 3 (“neutral”) in the multimedia domains and at 4 (“agree”) in the travel domain.
In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user’s search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants’ agreement with the statement “The filter has improved the recommendation results of the list”. Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12
Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | M | SD | N |
|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 10.00 | 0.00 | 103 |
| | Movie | 10.00 | 0.00 | 50 |
| | Music | 10.00 | 0.00 | 53 |
| | Book | 9.80 | 1.40 | 51 |
| Accuracy (mrs) [1] | Travel | 56.49 | 0.00 | 103 |
| | Movie | 62.25 | 19.15 | 50 |
| | Music | 55.70 | 0.10 | 53 |
| | Book | 57.47 | 15.55 | 50 |
| Novelty (nv) [1] | Travel | 0.60 | 0.34 | 101 |
| | Movie | 0.36 | 0.31 | 50 |
| | Music | 0.36 | 0.39 | 52 |
| | Book | 0.61 | 0.29 | 47 |
| Diversity (divU) [5] | Travel | 4 | - | 102 |
| | Movie | 3 | - | 50 |
| | Music | 3 | - | 53 |
| | Book | 3 | - | 50 |
| Diversity (divC) [1] | Travel | 0.80 | 0.16 | 103 |
| | Movie | 0.87 | 0.12 | 50 |
| | Music | 0.86 | 0.10 | 50 |
| | Book | 0.92 | 0.08 | 50 |
| Usefulness [5] | Travel | 3.34 | 0.85 | 103 |
| | Movie | 3.55 | 0.73 | 50 |
| | Music | 3.18 | 0.85 | 53 |
| | Book | 3.43 | 0.81 | 50 |

[Figure 8: bar chart titled “The filter has improved the recommendations of the list”. x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5).]
Fig. 8. Participants’ agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

Table 13
Response rates in % (TC2)

| Domain | Constr.-based (Regular) | Constr.-based (Expanded) | N |
|---|---|---|---|
| Travel | 23 | 57 | 103 |
| Movie | 86 | 88 | 50 |
| Music | 72 | 75 | 53 |
| Book | 73 | 76 | 51 |

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
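The text does not name the concrete FDR procedure. A common choice is the Benjamini-Hochberg step-up procedure, which is sketched below purely as an assumption: the p-values are sorted, and all hypotheses up to the largest rank r with p(r) ≤ (r/m) · q are rejected.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (assumed FDR method).

    Returns a list of booleans in the input order; True means the
    corresponding null hypothesis is rejected at FDR level q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank whose p-value stays below its rank-dependent threshold
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

In contrast to a Bonferroni correction, the threshold grows with the rank of the p-value, so the procedure loses less statistical power when many metrics and domains are tested at once.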

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.
Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.
In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
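The Jaccard index used to compare the two recommendation lists is the size of their intersection divided by the size of their union. A minimal sketch follows; returning `None` when both lists are empty is an assumption made here, since the source does not say how such cases were handled.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index (JI) of two recommendation lists.

    Items are compared by identity (e.g., their URIs); list order and
    duplicates are ignored. Returns None if both lists are empty.
    """
    a, b = set(list_a), set(list_b)
    if not a and not b:
        return None
    return len(a & b) / len(a | b)
```

Two lists that share two of four distinct items, for example, obtain a JI of 0.5, while identical lists obtain 1.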


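Table 14
Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | Simple M | Simple SD | Expressive M | Expressive SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 1.17 | 2.70 | 4.52 | 4.62 | 103 |
| | Movie | 6.72 | 4.08 | 7.20 | 3.92 | 50 |
| | Music | 5.96 | 4.64 | 6.36 | 4.51 | 53 |
| | Book | 4.59 | 4.50 | 5.00 | 4.51 | 51 |
| Accuracy (mrs) [1] | Travel | 66.73 | 20.51 | 65.34 | 22.36 | 23 |
| | Movie | 60.95 | 22.98 | 60.29 | 22.79 | 43 |
| | Music | 63.77 | 16.05 | 62.96 | 16.44 | 34 |
| | Book | 56.17 | 18.18 | 56.81 | 16.81 | 37 |
| Novelty (nv) [1] | Travel | 0.48 | 0.41 | 0.61 | 0.49 | 21 |
| | Movie | 0.44 | 0.36 | 0.55 | 0.80 | 42 |
| | Music | 0.55 | 0.39 | 0.52 | 0.37 | 38 |
| | Book | 0.77 | 0.34 | 0.76 | 0.33 | 33 |
| Diversity (divU) [5] | Travel | 3 | - | 3.5 | - | 24 |
| | Movie | 4 | - | 4 | - | 43 |
| | Music | 4 | - | 4 | - | 38 |
| | Book | 3 | - | 3 | - | 37 |
| Diversity (divC) [1] | Travel | 0.68 | 0.18 | 0.72 | 0.21 | 19 |
| | Movie | 0.70 | 0.24 | 0.73 | 0.23 | 41 |
| | Music | 0.86 | 0.13 | 0.87 | 0.12 | 34 |
| | Book | 0.84 | 0.24 | 0.95 | 0.63 | 29 |
| Usefulness [5] | Travel | 3.33 | 0.93 | 3.50 | 0.77 | 24 |
| | Movie | 3.32 | 0.85 | 3.31 | 0.86 | 43 |
| | Music | 3.55 | 0.81 | 3.64 | 0.78 | 38 |
| | Book | 3.39 | 0.83 | 3.62 | 1.21 | 37 |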

Table 15
Jaccard Indices (TC2)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.57 | 0.47 | 25 |
| Movie | 0.71 | 0.38 | 43 |
| Music | 0.82 | 0.35 | 37 |
| Book | 0.92 | 0.26 | 35 |

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Table 16
Response rates in % (TC3)

| Domain | Regular | Aggr.-based | N |
|---|---|---|---|
| Travel | 99 | 100 | 103 |
| Movie | 96 | 92 | 50 |
| Music | 100 | 100 | 53 |
| Book | 96 | 100 | 51 |

Figure 9 depicts participants’ general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test revealed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.

[Figure 9: bar chart titled “Perceived Usefulness (Regular vs. Roll-up Queries)”. x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up).]
Fig. 9. Participants’ overall satisfaction with regular and roll-up recommendation retrieval (TC3)

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas in case a regular SPARQL query was executed, they often did not receive a single suggestion.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.
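The reported degrees of freedom equal N − 1 in each domain, which is consistent with a paired (within-subjects) t-test on the list sizes. The following sketch shows how such a statistic is obtained from per-participant list sizes; the function name and the argument order (regular minus cross-domain, which yields negative values when cross-domain lists are longer) are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(regular, advanced):
    """Paired t statistic and degrees of freedom for two measurements
    per participant (e.g., list sizes of two retrieval methods)."""
    if len(regular) != len(advanced) or len(regular) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    # Per-participant differences: regular minus advanced
    diffs = [r - a for r, a in zip(regular, advanced)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("t is undefined when all differences are equal")
    n = len(diffs)
    return mean(diffs) / (sd / sqrt(n)), n - 1
```

The p-value then follows from the t distribution with n − 1 degrees of freedom, which a statistics package would supply.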

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results in more than the other half of


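Table 17
Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | Regular M | Regular SD | Aggr.-based M | Aggr.-based SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 9.90 | 0.99 | 9.42 | 2.07 | 103 |
| | Movie | 2.88 | 0.59 | 2.76 | 0.82 | 50 |
| | Music | 10.00 | 0.00 | 10.00 | 0.00 | 53 |
| | Book | 2.88 | 0.59 | 3.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Travel | 48.21 | 24.98 | 44.93 | 25.59 | 102 |
| | Movie | 54.58 | 22.88 | 56.69 | 25.42 | 44 |
| | Music | 53.95 | 18.31 | 58.24 | 16.19 | 52 |
| | Book | 51.53 | 23.90 | 49.25 | 22.30 | 49 |
| Novelty (nv) [1] | Travel | 0.43 | 0.39 | 0.38 | 0.40 | 98 |
| | Movie | 0.35 | 0.38 | 0.28 | 0.41 | 41 |
| | Music | 0.49 | 0.36 | 0.46 | 0.39 | 50 |
| | Book | 0.61 | 0.42 | 0.62 | 0.44 | 47 |
| Diversity (divU) [5] | Travel | 3.5 | - | 3 | - | 100 |
| | Movie | 4 | - | 3 | - | 43 |
| | Music | 4 | - | 3 | - | 50 |
| | Book | 3 | - | 3 | - | 48 |
| Diversity (divC) [1] | Travel | 0.79 | 0.17 | 0.89 | 0.13 | 99 |
| | Movie | 0.75 | 0.20 | 0.66 | 0.27 | 44 |
| | Music | 0.83 | 0.17 | 0.93 | 0.07 | 53 |
| | Book | 0.74 | 0.27 | 0.93 | 0.06 | 48 |
| Usefulness [5] | Travel | 3.36 | 0.77 | 3.22 | 0.74 | 101 |
| | Movie | 3.20 | 0.92 | 3.26 | 0.94 | 44 |
| | Music | 3.45 | 0.84 | 3.44 | 0.85 | 51 |
| | Book | 2.95 | 0.87 | 3.03 | 0.96 | 48 |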

Table 18. Jaccard Indices (TC3)

Domain    Mean Jaccard Index (JI)    SD      N
Travel    0.07                       0.11    101
Movie     0.03                       0.07     44
Music     0.02                       0.05     53
Book      0.02                       0.06     50

Table 19. Response rates (TC4)

Domain    Regular (SPARQL) (%)    SKOSRec (%)    N
Movie     32                       96            50
Music     62                       98            53
Book      53                      100            51

the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). [Bar chart "Recommendation List Size": x-axis Domain (Movie, Music, Book); y-axis No. of items (0.0 to 10.0); Method: Regular vs. Cross-Domain.]

pattern matching can facilitate an improved exploration of LOD repositories.

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.
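For illustration, the Jaccard index of two recommendation lists is the size of their intersection divided by the size of their union. The following sketch uses hypothetical result lists. Returning 0.0 for two empty lists is a convention chosen here, since the text does not say how such sessions were treated.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two recommendation lists."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Both lists are empty; the index is undefined, so 0.0 is returned by convention here.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical result lists of a regular SPARQL query and a cross-domain request.
regular = ["dbr:Pulp_Fiction", "dbr:Kill_Bill"]
cross_domain = ["dbr:Kill_Bill", "dbr:Sin_City", "dbr:Planet_Terror", "dbr:Clerks"]

overlap = jaccard_index(regular, cross_domain)  # 1 shared item out of 5 distinct items
```

Values close to zero, as in Table 21, mean that the two retrieval methods returned almost disjoint sets of items.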


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain    Regular (SPARQL)    Cross-Domain        N
                                           M        SD         M        SD
Accuracy (size) [10]             Movie      0.82     2.09       8.66     2.73      50
                                 Music      0.89     1.59       9.30     2.03      53
                                 Book       2.90     3.98      10.00     0.00      51
Accuracy (mrs) [1]               Movie     67.41    27.62      53.65    14.71      16
                                 Music     62.36    31.76      48.32    22.49      20
                                 Book      66.42    21.28      55.59    16.84      27
Novelty (nv) [1]                 Movie      0.59     0.49       0.41     0.34      16
                                 Music      0.72     0.44       0.59     0.40      19
                                 Book       0.54     0.41       0.63     0.30      27
Diversity (divU) [5]             Movie      3        -          4        -         16
                                 Music      3        -          3        -         20
                                 Book       3        -          4        -         27
Diversity (divC) [1]             Movie     (0.64)   (0.35)     (0.99)   (0.02)     (6)
                                 Music     (0.90)   (0.13)     (0.98)   (0.03)    (10)
                                 Book      (0.84)   (0.12)     (0.87)   (0.1)      (8)
Usefulness [5]                   Movie      3.57     0.83       3.35     1.57      15
                                 Music      3.53     1.17       3.11     0.98      20
                                 Book       3.41     1.04       3.41     0.94      27

Table 21. Jaccard Indices (TC4)

Domain    Mean Jaccard Index (JI)    SD      N
Movie     0.03                       0.07    44
Music     0.01                       0.02    21
Book      0.16                       0.27    33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine, which facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research should investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a back-end application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than" or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is slightly more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases in which users want to omit certain triple statements from the solution.
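The aggregation modes named under Aggregation (SUM, MAX, AVG) can be illustrated with a short sketch. It assumes that similarity scores have already been computed for sublevel entities and that each entity is linked to one parent resource. The function name, the data layout and the example IRIs and scores are hypothetical and do not reproduce the engine's implementation.

```python
from collections import defaultdict


def aggregate_scores(scored_items, parent_of, mode="SUM"):
    """Roll up similarity scores of sublevel entities to their parent resource.

    scored_items: mapping from entity IRI to similarity score
    parent_of:    mapping from entity IRI to the parent IRI used for grouping
    mode:         one of "SUM", "MAX" or "AVG"
    """
    reducers = {
        "SUM": sum,
        "MAX": max,
        "AVG": lambda values: sum(values) / len(values),
    }
    if mode not in reducers:
        raise ValueError(f"unsupported aggregation mode: {mode}")
    reducer = reducers[mode]

    # Group the scores of all entities that share the same parent.
    groups = defaultdict(list)
    for item, score in scored_items.items():
        if item in parent_of:
            groups[parent_of[item]].append(score)

    aggregated = {parent: reducer(scores) for parent, scores in groups.items()}
    # Highest aggregated score first.
    return sorted(aggregated.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical scores of city resources rolled up to their countries.
scores = {"dbr:Hanoi": 0.5, "dbr:Hue": 0.25, "dbr:Phnom_Penh": 0.5}
country = {"dbr:Hanoi": "dbr:Vietnam", "dbr:Hue": "dbr:Vietnam", "dbr:Phnom_Penh": "dbr:Cambodia"}

ranking = aggregate_scores(scores, country, mode="SUM")
```

The choice of mode changes the ranking: SUM favors parents with many similar sublevel entities, whereas AVG and MAX are insensitive to the number of entities per parent.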

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.
[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.
[3] C. Aggarwal, Recommender systems. Springer, 2016.
[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.
[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.
[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In: User Modeling and User-Adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.
[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.
[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.
[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.
[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.
[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.
[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.
[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.
[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.
[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.
[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.
[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action. Cherry Hill: Manning, 2014.
[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.
[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.
[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language. In: W3C Recommendation, 2013.
[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.
[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.
[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.
[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.
[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction. Cambridge University Press, 2010.
[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.
[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.
[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.
[30] P. Kirk, Experimental design. John Wiley & Sons, 1982.
[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.
[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.
[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.
[34] J. Lehmann, L. Bühmann, AutoSPARQL: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.
[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.
[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.
[37] R. Meymandpour, J. Davis, Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.
[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.
[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.
[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.
[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.
[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.
[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.
[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, article 21, 2017.
[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.
[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.
[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.
[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.
[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.
[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.
[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.
[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.
[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval. Cambridge University Press, 2008.
[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.
[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.
[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.
[58] D. Tunkelang, Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.
[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.
[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.
[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.
[62] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.
[63] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.
[64] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.
[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.
[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


mined by summing the IC values of each concept of the shared feature set:

\[
IC(C_{shared}) = -\sum_{c \in C_{shared}} \log\left(\frac{freq(c)}{n}\right) \tag{7}
\]
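Equation 7 can be traced with a few lines of code. This is a minimal sketch that assumes the natural logarithm (the base is not restated here) and uses hypothetical annotation frequencies. The function and variable names are illustrative only.

```python
import math


def information_content(shared_concepts, freq, n):
    """IC of a shared feature set: -sum(log(freq(c) / n)) over the shared concepts (Eq. 7).

    freq: mapping from concept to the number of resources annotated with it
    n:    total number of resources in the repository
    """
    return -sum(math.log(freq[c] / n) for c in shared_concepts)


# Hypothetical annotation frequencies in a repository of 1,000 resources.
freq = {"dbc:Lakes_of_Russia": 10, "dbc:World_Heritage_Sites": 100}
ic = information_content(freq.keys(), freq, n=1000)
```

Rare concepts contribute more to the sum than frequent ones, so two resources that share an uncommon annotation receive a higher value than two resources that share a ubiquitous one.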

After having obtained resource annotations as well as IC values, similarity scores between the input resource r and each potentially relevant resource q can be calculated. When a user profile contains more than a single item, similarity values are aggregated through summation to determine the final recommendation score (Definition 8).

Definition 8 (Recommendation score). The recommendation score of a potentially relevant resource q is quantified by the sum of similarities with each resource r that can be found in the profile ($P_r$).

\[
score(P_r, q) = \sum_{r \in P_r} sim(r, q) \tag{8}
\]

Upon calculation of similarity values for each potentially relevant item q, the engine ranks the retrieved resources and returns the top k of them, where the limit k is specified by the user in the SKOSRec query.

With the presented methods of on-the-fly retrieval, recommendations can be generated from LOD repositories through an API without requiring local metadata or excessive preprocessing operations other than the mapping of profile items to LOD resources (see footnote 4).
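The scoring and ranking step of Definition 8 can be sketched as follows. The pairwise similarity values are hypothetical placeholders. In the engine they would be derived from shared SKOS annotations, and the function names are assumptions made for this example.

```python
def recommend(profile, candidates, sim, k):
    """Rank candidates by their summed similarity to all profile items (Eq. 8) and keep the top k."""
    scores = {q: sum(sim(r, q) for r in profile) for q in candidates}
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical pairwise similarities between profile items and candidates.
similarity = {
    ("dbr:Pulp_Fiction", "dbr:Clerks"): 2.0,
    ("dbr:Kill_Bill", "dbr:Clerks"): 1.5,
    ("dbr:Pulp_Fiction", "dbr:Planet_Terror"): 3.0,
    ("dbr:Kill_Bill", "dbr:Planet_Terror"): 2.5,
    ("dbr:Pulp_Fiction", "dbr:Titanic"): 0.5,
}


def sim(r, q):
    # Pairs without shared annotations contribute nothing to the score.
    return similarity.get((r, q), 0.0)


top = recommend(
    profile=["dbr:Pulp_Fiction", "dbr:Kill_Bill"],
    candidates=["dbr:Clerks", "dbr:Planet_Terror", "dbr:Titanic"],
    sim=sim,
    k=2,
)
```

Because the score is a plain sum, a candidate that is moderately similar to many profile items can outrank one that is highly similar to a single item.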

3.3. Preference Querying

In the previous subsection, it was assumed that users express their tastes as preference statements for a certain item, in the form of an IRI referring to a LOD resource. However, it may also be the case that likings are only vaguely known, for instance when users prefer products with specific features but are not able or willing to specify the actual items. The engine obtains such a profile by issuing a subsequent SPARQL query that contains the preference graph pattern ($P_{pref}$) in the WHERE condition. Suppose a user is interested in Quentin Tarantino movies (dbr:Quentin_Tarantino) but does not specify the precise items he likes. He could send a preference query to the SKOSRec engine. The graph pattern in the request would match all movies that are connected to dbr:Quentin_Tarantino via the properties dbo:director or dbo:producer (see Listing 4).

Footnote 4: Di Noia et al. describe how a mapping procedure can be conducted for a real-world implementation. For their LDRS, they matched movie titles with the corresponding LOD resources in DBpedia. The authors extracted IRIs, labels and publication dates for all movie resources. They applied the Levenshtein distance to these resources to identify matching movie items [13].

Listing 4. SKOSRec query to generate recommendations based on a preference query in the movie domain (Q2)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    RECOMMEND ?movie TOP 5
    PREF [ ?prefMovie
      WHERE {
        { ?prefMovie dbo:director dbr:Quentin_Tarantino }
        UNION
        { ?prefMovie dbo:producer dbr:Quentin_Tarantino }
      } ]

Table 3 shows the solution set for this preference query: films from DBpedia that are similar to Quentin Tarantino movies.

Table 3. Result set for Q2

movie
dbr:The_Godfather_Part_II
dbr:The_Prestige_(film)
dbr:The_Dark_Knight
dbr:Planet_Terror
dbr:Clerks

The notion of a queried profile is specified in Definition 9.

Definition 9 (Queried profile). A queried profile is a user profile that has been compiled by retrieving all LOD resources r that are part of an annotation graph $A_D$ and are contained in the solution mappings resulting from the evaluation of a specified graph pattern ($P_{pref}$) over a dataset D (Eq. 9).

\[
P_r = \{\, \mu(r) \mid \mu \in [\![P_{pref}]\!]_D,\ r \in dom(\mu),\ r \in A_D \,\} \tag{9}
\]
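Definition 9 can be read as a filter over solution mappings. In the sketch below, solution mappings are modeled as dictionaries from variable names to IRIs, and the condition on the annotation graph is interpreted as a membership test on the bound resource. Both the modeling and the example data are assumptions made for illustration.

```python
def queried_profile(solution_mappings, variable, annotated_resources):
    """Collect the bindings of `variable` that also occur in the annotation graph (cf. Eq. 9)."""
    profile = set()
    for mapping in solution_mappings:
        # A mapping only contributes if the preference variable is in its domain.
        if variable in mapping and mapping[variable] in annotated_resources:
            profile.add(mapping[variable])
    return profile


# Hypothetical solution mappings of the preference pattern in Listing 4.
mappings = [
    {"prefMovie": "dbr:Pulp_Fiction"},
    {"prefMovie": "dbr:Kill_Bill"},
    {"prefMovie": "dbr:Unannotated_Film"},
    {"other": "dbr:Quentin_Tarantino"},
]
# Hypothetical set of resources that carry SKOS annotations.
annotated = {"dbr:Pulp_Fiction", "dbr:Kill_Bill", "dbr:Clerks"}

profile = queried_profile(mappings, "prefMovie", annotated)
```

Resources without annotations are dropped from the profile because no similarity values could be computed for them in the subsequent steps.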


3.4. Prefiltering

To facilitate exploratory search in LOD repositories, users should also have the option to formulate conditions in their queries. One way to enable filters during retrieval is to apply them before similarity calculation. Thus, recommendation lists can be adjusted to individual needs.

Due to the expressiveness of RDF data, filter conditions can not only be applied to attributes that are directly connected to potentially relevant items, but can also be declared as a graph pattern. By this means, it is possible to retrieve LOD resources that are both similar to the user profile and fulfill an advanced condition.

The strengths of the approach will be illustrated with a travel recommendation request that was executed over DBpedia. Suppose a customer expressed a preference for the Lake Baikal region (e.g., as indicated by a positive rating on a travel community site). With simple on-the-fly retrieval, he would receive a recommendation list that primarily consists of travel destinations located in the proximity of this geographical area. A graph-based filter, on the other hand, facilitates advanced retrieval options. In the given example, it enables the user to obtain suggestions for Southeast Asian travel destinations that have similar features as the Lake Baikal region, even though the resource dbr:Lake_Baikal is not directly connected to any destination annotated with dbc:Southeast_Asia. Such destinations can only be identified by exploring the graph structure that surrounds the filter attribute.

The advanced constraint ensures that the generated result set is not empty. A simple attribute-level filter, which is commonly utilized in faceted search systems, would not generate any search hits. By these means, graph-based filter patterns can help users to better explore knowledge graphs, thereby overcoming data quality issues in LOD repositories. The corresponding SKOSRec query for this example is shown in Listing 5.

Listing 5. SKOSRec query with an expressive prefilter condition to retrieve Southeast Asian destinations based on a preference for the Lake Baikal region (Q3)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

RECOMMEND ?destination TOP 5
PREF dbr:Lake_Baikal
[ WHERE {
    ?destination dbo:country ?country .
    ?country dct:subject ?countrySubject .
    ?countrySubject skos:broader dbc:Southeast_Asia .
    ?destination rdf:type dbo:Place .
  }
]

Table 4 shows the solution to this query. It contains travel destinations and regions of interest that are both similar to the previously liked destination dbr:Lake_Baikal and located in Southeast Asia. Ecoregions around lakes, rivers or mountains from Southeast Asian countries, such as Cambodia, Vietnam or Malaysia, are presented to the user.

Table 4. Result set of Q3

destination
dbr:Tonle_Sap
dbr:Mekong
dbr:Mount_Kinabalu
dbr:Laguna_de_Bay
dbr:Lake_Toba

The technical details for this type of query are given in the following definitions. Before the extraction of relevant resources and annotations, the recommendation engine identifies the resources that satisfy the filter (Definition 10).

Definition 10 (Prefiltered resources). The mapping of filtered resources $\Omega_{prefilter}$ is obtained by matching a specified graph pattern ($P$) to the RDF graph of a dataset $D$ (Eq. 10).

$\Omega_{prefilter} = \{\mu_{|q} \mid \mu \in [[P]]_D\}$ (10)

After preselection, resources are brought together with user profile data to retrieve the final set of recommendations (Definition 11).

Definition 11 (Annotations and relevant resources (prefiltered retrieval)). The solution mappings of annotations and relevant resources in prefiltered retrieval mode are obtained by joining the set of prefiltered resources with the mappings $\Omega_r$ of all relevant resources and annotations as determined from the user profile.

$\Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_r$ (11)
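The join in Eq. 11 can be illustrated with a minimal sketch. It assumes that solution mappings are represented as Python dictionaries from variable names to resources, and that the join is the usual natural join on shared variables. The function name `join_mappings` and all `ex:` resources are hypothetical and only serve as illustration; they are not part of the SKOSRec engine.

```python
def join_mappings(left, right):
    """Natural join of two lists of solution mappings (dicts: variable -> value).

    Two mappings are compatible when they agree on every shared variable.
    """
    joined = []
    for m1 in left:
        for m2 in right:
            shared = m1.keys() & m2.keys()
            if all(m1[v] == m2[v] for v in shared):
                joined.append({**m1, **m2})
    return joined


# Resources that satisfy the prefilter pattern (Eq. 10), restricted to variable q.
omega_prefilter = [{"q": "dbr:Tonle_Sap"}, {"q": "dbr:Mekong"}]

# Relevant resources and their annotations derived from the profile (hypothetical data).
omega_r = [
    {"q": "dbr:Tonle_Sap", "c": "ex:Lakes"},
    {"q": "ex:Some_Siberian_Lake", "c": "ex:Lakes"},
    {"q": "dbr:Mekong", "c": "ex:Rivers"},
]

# Eq. 11: only resources that pass the filter keep their annotations.
omega_pre = join_mappings(omega_prefilter, omega_r)
```

In this toy data, the hypothetical Siberian lake shares an annotation with the profile but is dropped because it does not satisfy the filter.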

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and the respective SKOS similarities are quantified according to the user filter (Definitions 12 and 13). By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources.

Definition 12 (Conditional SKOS similarity). Let $Annot(r)$ be the set of SKOS features of $r$ and $Annot(q)$ the set of SKOS features of resource $q$, which is contained in the filtered set of annotations and relevant resources ($q \in \{\mu(q) \mid \mu \in \Omega_{pre}\}$). Then the similarity can be derived from the IC of their shared concepts ($C_{cond} = Annot(r) \cap Annot(q)$) (Eq. 12).

$sim_{cond}(r, q) = IC(C_{cond})$ (12)

Definition 13 (Conditional SKOS Information Content). The conditional IC of a set of SKOS annotations is defined by the sum of the IC of each concept $c \in \{\mu(c) \mid \mu \in \Omega_{pre}\}$, where $freq_{cond}(c)$ is the frequency of $c$ among all filtered resources and $n$ is the maximum frequency among these resources (Eq. 13).

$IC(C_{cond}) = -\sum_{c \in C_{cond}} \log\left(\frac{freq_{cond}(c)}{n}\right)$ (13)
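Definitions 12 and 13 can be traced with a small worked sketch. It assumes natural logarithms (the base is not stated here) and uses hypothetical `ex:` concepts with made-up frequencies; the function names are illustrative and not taken from the SKOSRec implementation.

```python
import math


def conditional_ic(shared_concepts, freq_cond, n):
    """Eq. 13: negative sum of log relative frequencies of the shared concepts."""
    return -sum(math.log(freq_cond[c] / n) for c in shared_concepts)


def conditional_similarity(annot_r, annot_q, freq_cond, n):
    """Eq. 12: similarity is the conditional IC of the shared annotations."""
    return conditional_ic(annot_r & annot_q, freq_cond, n)


# Hypothetical concept frequencies among the filtered resources.
freq_cond = {"ex:Lakes": 2, "ex:Ramsar_sites": 1, "ex:Rivers": 4}
n = max(freq_cond.values())  # maximum frequency among the filtered resources

annot_profile_item = {"ex:Lakes", "ex:Ramsar_sites"}
annot_candidate = {"ex:Lakes", "ex:Rivers"}

# Only ex:Lakes is shared, so the score is -log(2/4) = log(2).
sim = conditional_similarity(annot_profile_item, annot_candidate, freq_cond, n)
```

A concept that occurs with the maximum frequency contributes nothing to the score, while rare shared concepts contribute most, which is the intended effect of weighting by information content.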

The system carries out the calculations (i.e., the determination of similarity values and the ranking of LOD resources) in the same way as in non-filtering mode.

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources; it can also apply user constraints after similar items have already been determined. By this means, recommendations are filtered ex post to the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5. Result set of Q4

director              movie
dbr:Kirk_Douglas      dbr:Scalawag_(film)
dbr:Kirk_Douglas      dbr:Posse_(1975_film)
dbr:Kevin_Costner     dbr:The_Postman_(film)
dbr:Kevin_Costner     dbr:Dances_with_Wolves
...
dbr:Gene_Kelly        dbr:Hello_Dolly_(film)
dbr:Gene_Kelly        dbr:Invitation_to_the_Dance_(film)
...
dbr:Alan_Alda         dbr:Goodbye_Farewell_and_Amen
dbr:Alan_Alda         dbr:Margaret's_Engagement
...
dbr:Steve_Buscemi     dbr:Interview_(2007_film)
dbr:Steve_Buscemi     dbr:Animal_Factory

Listing 6. SKOSRec query to generate simple postfiltered recommendations in the movie domain (Q4)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE {
  ?movie dbo:director ?director .
  ?director rdf:type yago:Director110014939 .
} LIMIT 50
RECOMMEND ?director TOP 5
PREF dbr:Quentin_Tarantino

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.

Another weakness is the biased semantics of the query. Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies he/she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to a couple of LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).

Aggregation-based requests can be well combined with other retrieval patterns, such as preference queries. In the case of the movie recommendation scenario, a user could state that he enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors that shot similar films. This kind of question can typically only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate postfiltered recommendations based on a preference query in the movie domain (Q5)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE {
  ?movie dbo:director ?director .
  ?director rdf:type yago:Director110014939 .
} LIMIT 50
AGG dbr:Quentin_Tarantino ?director SUM
RECOMMEND ?movie TOP 100
PREF [ ?prefMovie
  WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino . } ]

Table 6. Result set of Q5

director                     movie
dbr:Steven_Spielberg         dbr:Indiana_Jones
dbr:Steven_Spielberg         dbr:Jurrasic_Park_(film)
...
dbr:Francis_Ford_Coppola     dbr:The_Godfather_Part_II
dbr:Francis_Ford_Coppola     dbr:The_Godfather_Part_III
...
dbr:Christopher_Nolan        dbr:The_Prestige_(film)
dbr:Christopher_Nolan        dbr:The_Dark_Knight
...
dbr:Robert_Rodriguez         dbr:Planet_Terror
dbr:Robert_Rodriguez         dbr:Machete_(film)
...
dbr:Kevin_Smith              dbr:Clerks
dbr:Kevin_Smith              dbr:Clerks_II

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they differ completely. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style⁵), Q4 provided a completely different output. It shows US American actor-directors whose movies range from musical films (dbr:Hello_Dolly_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Tab. 5) seem rather arbitrary and not exceptionally helpful concerning the initial user request.

The combined aggregation-based query can also be described as a roll-up request, since suggestions are based on preferences for sublevel entities. Another example of a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

5 https://en.wikipedia.org/wiki/Quentin_Tarantino


Listing 8. SKOSRec query to generate postfiltered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dul: <http://www.ontologydesignpatterns.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?city
WHERE {
  ?poi ?property ?city .
  ?city rdf:type dbo:City .
  ?property rdfs:subPropertyOf dul:hasLocation .
} LIMIT 3
AGG dbr:London ?city SUM
RECOMMEND ?poi TOP 100
PREF dbr:Oxford_Street, dbr:Notting_Hill, dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7. Result set of Q6

city
dbr:Oxford
dbr:Abingdon-on-Thames
dbr:Wallingford_Oxford

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere rather than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.

Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?participated ?musicalAct
WHERE {
  ?movie rdf:type dbo:Film .
  VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
  { ?musicalAct ?participated ?movie . }
  UNION
  { ?movie ?participated ?musicalAct . }
} LIMIT 10
AGG dbr:The_Jackson_5 ?movie MAX
RECOMMEND ?musicalAct TOP 100
PREF dbr:The_Jackson_5
WHERE {
  VALUES ?actType { dbo:MusicalArtist dbo:Band }
  ?musicalAct rdf:type ?actType .
}

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are summarized with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tabs. 8 and 9). The SKOSRec query produces a more comprehensive solution than a regular SPARQL query.

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?musicalAct
WHERE {
  ?movie rdf:type dbo:Film .
  VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
  { dbr:The_Jackson_5 ?participated ?movie . }
  UNION
  { ?movie ?participated dbr:The_Jackson_5 . }
} LIMIT 5

Table 8. Result set of Q7

movie                          musicalAct
dbr:Moonwalker                 dbr:Michael_Jackson
...
dbr:The_Wiz_(film)             dbr:Ashford_&_Simpson
dbr:Garfield_Gets_a_Life       dbr:The_Temptations
dbr:Pipe_Dreams_(1976_films)   dbr:Gladys_Knight_&_the_Pips
dbr:The_Jacksons               dbr:Jermanine_Jackson

Table 9. Result set of Q8

movie                          musicalAct
dbr:The_Jacksons               dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 14 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 14 (SKOSRec recommendations). SKOSRec recommendations ($R(P_r)$) are the set of suggestions generated by processing the solution mappings ($\Omega$) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, prefiltering, or a combination of these techniques. $P_r$ represents the input profile that contains at least one preference for an item. The cardinality of $R(P_r)$ is the intended number of recommendations ($k$) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 14).

$R(P_r) = \{x \mid x \in \{q \mid score(P_r, q) > \operatorname{kmax}_{q \in \Omega} score(P_r, q) \wedge q \notin P_r\}\}$ (14)
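The selection step of Definition 14 can be sketched as a plain top-k selection. This is an illustrative reading of Eq. 14, not the engine's code: it assumes that scores are already available as a dictionary, it breaks ties alphabetically (an assumption), and the function name `recommend` as well as the example identifiers are hypothetical.

```python
def recommend(scores, profile, k):
    """Return the k highest-scoring resources that are not part of the profile.

    scores  -- dict mapping a resource to its recommendation score
    profile -- set of resources the user has already stated as preferences
    k       -- intended number of recommendations
    """
    candidates = {q: s for q, s in scores.items() if q not in profile}
    # Sort by descending score; ties are broken by resource name (assumption).
    ranked = sorted(candidates, key=lambda q: (-candidates[q], q))
    return ranked[:k]


example_scores = {"ex:a": 0.9, "ex:b": 0.5, "ex:c": 0.7, "ex:p": 1.0}
example_profile = {"ex:p"}
top_two = recommend(example_scores, example_profile, k=2)
```

Here `ex:p` has the highest score but is excluded because it is already contained in the profile.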

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions $R(P_r)$ can be joined with any graph pattern $P_{post}$ after the process of similarity calculation (Definition 15). It does not make a difference how the engine obtained these recommendations.

Definition 15 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern ($\Omega_{post}$) (Eqs. 15 and 16).

$\Omega_{postfilter} = \{\mu \mid \mu \in [[P_{post}]]_D \wedge x \in dom(\mu)\}$ (15)

$\Omega_{post} = \{\mu_{|x} \mid x \in R(P_r)\} \bowtie \Omega_{postfilter}$ (16)

Like the prefilter, $P_{post}$ can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item ($a$) and similar sublevel entities. Additionally, the user has to specify the variable in his profile (e.g., $y$) that represents the items within the postfilter graph pattern $P_{post}$. By this means, aggregation-based recommendations can be generated (Definition 16).

Definition 16 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item $a$ in the profile (Eq. 17).

$R_{agg}(a) = \{\mu(y) \mid \mu \in \Omega_{post} \wedge y \in dom(\mu) \wedge a \notin \mu(y)\}$ (17)
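Definitions 15 and 16 can be read as two filtering steps over solution mappings. The sketch below is one possible interpretation under stated assumptions: mappings are dictionaries with the variables `x` (recommended item) and `y` (aggregation target), and the exclusion of the superordinate item `a` is implemented by dropping every mapping in which `a` occurs as a value. All function names and `ex:` identifiers are hypothetical.

```python
def postfilter(recommended, omega_postfilter, x_var="x"):
    """Eq. 16 (sketch): keep postfilter mappings whose x value is a recommendation."""
    return [m for m in omega_postfilter if m.get(x_var) in recommended]


def aggregation_resources(omega_post, a, y_var="y"):
    """Eq. 17 (sketch): collect y values, skipping mappings that involve item a."""
    return {m[y_var] for m in omega_post if y_var in m and a not in m.values()}


# Hypothetical movie scenario: x = similar movie, y = its director.
recommended_movies = {"ex:movie1", "ex:movie2"}
omega_postfilter = [
    {"x": "ex:movie1", "y": "ex:directorA"},
    {"x": "ex:movie2", "y": "ex:preferredDirector"},
    {"x": "ex:movie3", "y": "ex:directorB"},
]

omega_post = postfilter(recommended_movies, omega_postfilter)
candidates = aggregation_resources(omega_post, a="ex:preferredDirector")
```

In the example, `ex:movie3` is dropped because it was not recommended, and the preferred director is not suggested back to the user.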

For each potential aggregation-based suggestion ($y$), similarity scores of connected entities ($x$) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases where both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 17 (Aggregation-based ranking score). The ranking score of an LOD resource $y$ that is connected to a set of recommended items is determined by aggregating scores of connected entities either by the maximum (Eq. 18), the sum (Eq. 19) or the average (Eq. 20) of individual scores.

$score_{aggMAX}(y) = \max_{x \in \mu(x,y)} score(P_r, x)$ (18)

$score_{aggSUM}(y) = \sum_{x \in \mu(x,y)} score(P_r, x)$ (19)

$score_{aggAVG}(y) = \frac{\sum_{x \in \mu(x,y)} score(P_r, x)}{|\{x \in \mu(x,y)\}|}$ (20)
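The three aggregation modes of Eqs. 18 to 20 can be compared in a short sketch. It assumes that the postfiltered mappings are dictionaries with the variables `x` and `y` and that item scores are given as a dictionary; the function name `aggregate` and the example identifiers are hypothetical.

```python
from collections import defaultdict


def aggregate(omega_post, scores, mode="SUM", x_var="x", y_var="y"):
    """Aggregate scores of connected items x into one ranking score per resource y."""
    connected = defaultdict(set)
    for m in omega_post:
        connected[m[y_var]].add(m[x_var])

    result = {}
    for y, xs in connected.items():
        values = [scores[x] for x in xs]
        if mode == "MAX":      # Eq. 18
            result[y] = max(values)
        elif mode == "SUM":    # Eq. 19
            result[y] = sum(values)
        elif mode == "AVG":    # Eq. 20
            result[y] = sum(values) / len(values)
        else:
            raise ValueError(f"unknown aggregation mode: {mode}")
    return result


example_mappings = [
    {"x": "ex:m1", "y": "ex:dA"},
    {"x": "ex:m2", "y": "ex:dA"},
    {"x": "ex:m3", "y": "ex:dB"},
]
example_item_scores = {"ex:m1": 0.5, "ex:m2": 0.25, "ex:m3": 0.6}

by_sum = aggregate(example_mappings, example_item_scores, "SUM")
by_max = aggregate(example_mappings, example_item_scores, "MAX")
by_avg = aggregate(example_mappings, example_item_scores, "AVG")
```

With these toy numbers, sum-based aggregation ranks `ex:dA` first (two matching items), whereas maximum- and average-based aggregation rank `ex:dB` first, which illustrates why the choice of mode depends on the granularity of the entities involved.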

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in the context of web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach out to more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might have caused biased results otherwise. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology.⁶ ⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users ran step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., through rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

6 http://tomcat.apache.org
7 http://www.oracle.com/technetwork/java/javaee/servlet/index.html

Subjects were recruited through the clickworker.com platform.⁸ Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just run through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). It was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.⁹

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.¹⁰ The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

8 https://www.clickworker.de
9 http://api.jquery.com/jquery.ajax
10 http://lucene.apache.org/solr

Fig. 2. User profile generation, TC1 (music domain)

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far ahead they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance; outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale. The score, called mean relevance (mrs), was measured with Equation 21.

$mrs = \frac{\sum_{i=1}^{k} score_i}{k}$ (21)

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can tremendously profit from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, in the case of an RS only suggesting popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user ($|nov|$) among the total number of relevant items ($|tp|$) (Eq. 22).

$nv = \frac{|nov|}{|tp|}$ (22)
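Both accuracy-related measures (Eqs. 21 and 22) can be computed from the slider ratings of a single recommendation list. The following sketch assumes that ratings are points on the 0 to 100 scale and that an item counts as relevant from a rating of 50 upwards, as stated in the text; the function names and the example ratings are hypothetical.

```python
def mean_relevance(slider_scores):
    """Eq. 21: average slider rating of the k recommendations in a list."""
    return sum(slider_scores) / len(slider_scores)


def novelty(slider_scores, is_new, threshold=50):
    """Eq. 22: share of relevant items that the user did not know before.

    Returns None when no item reaches the relevance threshold, because the
    ratio is undefined in that case (a design choice of this sketch).
    """
    relevant_flags = [new for s, new in zip(slider_scores, is_new) if s >= threshold]
    if not relevant_flags:
        return None
    return sum(relevant_flags) / len(relevant_flags)


ratings = [80, 40, 60, 100]             # one rating per recommended item
unknown = [True, True, False, True]     # radio button: item was new to the user

mrs = mean_relevance(ratings)           # (80 + 40 + 60 + 100) / 4
nv = novelty(ratings, unknown)          # 2 of the 3 relevant items were new
```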

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

An item was considered relevant when it achieved an mrs score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 65]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse is the result set [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 23) and averaged for the recommendation list (Eq. 24).

$d(i_l, i_m) = 1 - \cos(i_l, i_m)$ (23)

$div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m)$ (24)
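Eqs. 23 and 24 amount to the average pairwise cosine distance of a recommendation list. The sketch below assumes that each item is represented by a numeric feature vector (for example, indicator values over annotations); this representation and the function names are assumptions made for illustration only.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equally long feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def diversity(item_vectors):
    """Eq. 24: mean pairwise distance (Eq. 23) over all item pairs l < m."""
    k = len(item_vectors)
    if k < 2:
        return 0.0
    total = sum(1 - cosine(u, v) for u, v in combinations(item_vectors, 2))
    return 2 * total / (k * (k - 1))


# Three items: the first and third are identical, the second is orthogonal to both.
example_items = [[1, 0], [0, 1], [1, 0]]
div_c = diversity(example_items)  # pair distances 1, 0, 1 -> mean 2/3
```

A list of identical items yields a diversity of 0, and a list of mutually orthogonal items yields 1.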

The content-based diversity scores ($div_C$) were calculated in the backend of the web application and saved in a log file for each generated recommendation list of the SKOSRec engine during runtime execution. In addition to the measurement of $div_C$ scores, we asked participants about their opinions on recommendation list diversity directly. Following the RS evaluation framework by Pu et al., two Likert items were posed for this purpose [44]:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al., which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions [44]. The statements were adapted to the specificities of each usage scenario. Figure 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out if the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to prove whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filter and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Figure 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition on the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movies, music and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources which represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state), while still applying the specified user constraint to the set of LOD resources. Upon execution of constraint-based recommendation requests, users

11 http://dbpedia.org


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain).

received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement “The filter has improved the recommendation results of list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar rollup query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination (i.e., a city, region or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three POIs belonging to it, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants specified their favorite director/actor (movie domain), music act (music domain) or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with those from the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

Fig. 5. Web form for a constraint-based recommendation request (music domain).

Fig. 6. Travel - User profile generation, TC3.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.


Fig. 7. Music - User profile generation, TC4 (page 9).

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. User parameters were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matchings can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared in his/her profile a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The gathered samples from the experiments contained answers from both genders, with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that most participants already perceived the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
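As a rough illustration of this reliability check, Cronbach's α for k Likert items can be computed from the individual item variances and the variance of the summed scores, α = k/(k-1) · (1 - Σ s_i² / s_total²). The following Python sketch applies this standard formula; the function name and the sample ratings are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` is a list of k score lists, one per Likert item; position i in
    every list belongs to the same respondent (hypothetical data layout).
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(items[0])
    if n < 2 or any(len(scores) != n for scores in items):
        raise ValueError("all items need the same number (>= 2) of respondents")

    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(scores) for scores in items)

    # Sample variance of the per-respondent total scores.
    totals = [sum(scores[i] for scores in items) for i in range(n)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")

    return k / (k - 1) * (1 - item_variance_sum / total_variance)


if __name__ == "__main__":
    # Two made-up Likert items answered by four respondents.
    ratings = [[2, 4, 4, 5], [3, 4, 5, 5]]
    alpha = cronbach_alpha(ratings)
    print(f"alpha = {alpha:.3f}, acceptable (> 0.7): {alpha > 0.7}")
```

A construct would then pass the check described above whenever the returned value exceeds the 0.7 threshold.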

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. To avoid leaving valuable user assessments unused, agreements with the statement “The recommendations of list [] are diverse” were taken into account as an ordinal variable (divU). Based on these results, subsequent tests could be carried out.

The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M)12, standard deviation (SD) and number of participants (N) for each performance dimension and metric. Users assessed the engine's performance in almost every quality dimension to be better than mediocre. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points, which means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the

12 In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


Table 10. Test cases in the web experiments (X = conducted, - = not conducted).

Test Case  Evaluation Purpose                     Query Type  Travel  Movie  Music  Book
TC1        On-the-fly Recommendations (Baseline)  Q1          X       X      X      X
TC2        Constraint-based Recommendations       Q3          X       X      X      X
TC3        Advanced Queries (Rollup)              Q5, Q6      X       X      X      X
TC4        Advanced Queries (Cross-Domain)        Q7          -       X      X      X

Table 11. Cronbach's α values for the concept of Perceived Usefulness.

Domain  Cronbach's α
Travel  0.7002
Movie   0.8579
Music   0.8515
Book    0.8949

baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 (“neutral”) in the multimedia domains and at 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement “The filter has improved the recommendation results of the list”. Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants).

Dimension (Metric) [Max Score]  Domain  M      SD     N
Accuracy (size) [10]            Travel  10.00  0.00   103
                                Movie   10.00  0.00   50
                                Music   10.00  0.00   53
                                Book    9.80   1.40   51
Accuracy (mrs) [1]              Travel  56.49  0.00   103
                                Movie   62.25  19.15  50
                                Music   55.70  0.10   53
                                Book    57.47  15.55  50
Novelty (nv) [1]                Travel  0.60   0.34   101
                                Movie   0.36   0.31   50
                                Music   0.36   0.39   52
                                Book    0.61   0.29   47
Diversity (divU) [5]            Travel  4      -      102
                                Movie   3      -      50
                                Music   3      -      53
                                Book    3      -      50
Diversity (divC) [1]            Travel  0.80   0.16   103
                                Movie   0.87   0.12   50
                                Music   0.86   0.10   50
                                Book    0.92   0.08   50
Usefulness [5]                  Travel  3.34   0.85   103
                                Movie   3.55   0.73   50
                                Music   3.18   0.85   53
                                Book    3.43   0.81   50

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2). [Chart “The filter has improved the recommendations of the list”: percentage of agreement per degree of agreement (1 to 5) by domain (Digital Library, Travel, Movie, Music, Book).]

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-empty result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response rates (TC2).

Domain  Constr.-based (Regular)  Constr.-based (Expanded)  N
Travel  23%                      57%                       103
Movie   86%                      88%                       50
Music   72%                      75%                       53
Book    73%                      76%                       51
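The response rate used in Table 13, i.e. the share of non-empty result sets among all result sets, can be sketched in a few lines of Python. The function name and the example sessions are hypothetical and only illustrate the definition given in the text.

```python
def response_rate(result_sets):
    """Share of non-empty result sets among all result sets.

    `result_sets` holds one recommendation list per user session
    (hypothetical data layout); an empty list means the engine
    returned no suggestion for that session.
    """
    if not result_sets:
        raise ValueError("at least one result set is required")
    non_empty = sum(1 for result in result_sets if len(result) > 0)
    return non_empty / len(result_sets)


if __name__ == "__main__":
    # Four made-up sessions, two of which produced recommendations.
    sessions = [["dbr:Rome"], [], ["dbr:Paris", "dbr:Vienna"], []]
    print(f"response rate: {response_rate(sessions):.0%}")
```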

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. Where statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) to decrease the probability of making a type I error due to multiple testing.
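The text does not state which FDR procedure was applied. Assuming the widely used Benjamini-Hochberg step-up procedure, the adjustment can be sketched as follows: the p-values are sorted, and all hypotheses up to the largest rank k with p_(k) ≤ (k/m)·α are rejected. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if the test stays significant.

    Assumption: Benjamini-Hochberg step-up procedure; the paper only
    says that alpha levels were FDR-adjusted.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values in ascending order of their value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k (1-based) whose p-value is below its threshold k/m * alpha.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            largest_rank = rank

    # Reject every hypothesis ranked at or below that position.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[index] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values from eight pairwise comparisons.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
    print(benjamini_hochberg(p))
```

Because the procedure is a step-up method, a p-value that misses its own threshold can still be rejected when a larger p-value further down the ranking meets its threshold.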

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = -7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, neither approach was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The small number of differences may be explained by the high Jaccard indices (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants).

                                        Simple         Expressive
Dimension (Metric) [Max Score]  Domain  M      SD      M      SD      N
Accuracy (size) [10]            Travel  1.17   2.70    4.52   4.62    103
                                Movie   6.72   4.08    7.20   3.92    50
                                Music   5.96   4.64    6.36   4.51    53
                                Book    4.59   4.50    5.00   4.51    51
Accuracy (mrs) [1]              Travel  66.73  20.51   65.34  22.36   23
                                Movie   60.95  22.98   60.29  22.79   43
                                Music   63.77  16.05   62.96  16.44   34
                                Book    56.17  18.18   56.81  16.81   37
Novelty (nv) [1]                Travel  0.48   0.41    0.61   0.49    21
                                Movie   0.44   0.36    0.55   0.80    42
                                Music   0.55   0.39    0.52   0.37    38
                                Book    0.77   0.34    0.76   0.33    33
Diversity (divU) [5]            Travel  3      -       3.5    -       24
                                Movie   4      -       4      -       43
                                Music   4      -       4      -       38
                                Book    3      -       3      -       37
Diversity (divC) [1]            Travel  0.68   0.18    0.72   0.21    19
                                Movie   0.70   0.24    0.73   0.23    41
                                Music   0.86   0.13    0.87   0.12    34
                                Book    0.84   0.24    0.95   0.63    29
Usefulness [5]                  Travel  3.33   0.93    3.50   0.77    24
                                Movie   3.32   0.85    3.31   0.86    43
                                Music   3.55   0.81    3.64   0.78    38
                                Book    3.39   0.83    3.62   1.21    37

Table 15. Jaccard indices (TC2).

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.57                     0.47  25
Movie   0.71                     0.38  43
Music   0.82                     0.35  37
Book    0.92                     0.26  35
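The Jaccard index reported in Table 15 measures the overlap of two recommendation lists as the size of their intersection divided by the size of their union. A minimal Python sketch is given below; the function name and the sample lists are hypothetical, and treating two empty lists as undefined is an assumption rather than a documented choice of the study.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two recommendation lists."""
    a, b = set(list_a), set(list_b)
    union = a | b
    if not union:
        # Assumption: the index is left undefined when both lists are empty.
        raise ValueError("Jaccard index is undefined for two empty lists")
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Made-up result lists from a simple and an expressive filter.
    simple = ["dbr:A", "dbr:B", "dbr:C"]
    expressive = ["dbr:B", "dbr:C", "dbr:D"]
    print(f"JI = {jaccard_index(simple, expressive):.2f}")
```

A value close to 1 means that both methods returned nearly the same items, whereas a value close to 0 indicates largely disjoint lists.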

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Figure 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels that lay slightly above neutral

Table 16. Response rates (TC3).

Domain  Regular  Aggr.-based  N
Travel  99%      100%         103
Movie   96%      92%          50
Music   100%     100%         53
Book    96%      100%         51

agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness differed between experiments. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = -5.50, p < 0.001), the music (t(52) = -4.51, p < 0.001) and the book domain (t(47) = -4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach


Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3). [Chart “Perceived Usefulness (Regular vs. Roll-up Queries)”: agreement by domain (Travel, Movie, Music, Book) for the methods Regular and Roll-up.]

explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially notable given the low mean Jaccard indices for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries add value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.
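To make the aggregation step behind roll-up queries more tangible, the following Python sketch groups similarity scores of sublevel entities (e.g., POIs) by their parent resource (e.g., a city) and aggregates them with SUM, MAX or AVG, the three options named in Appendix A. It is only an illustration under assumed data structures: the function name, the dictionaries and the tie-breaking by name are hypothetical and do not reproduce the SKOSRec implementation.

```python
from collections import defaultdict


def rollup_scores(similarities, parent_of, how="SUM"):
    """Aggregate sublevel similarity scores per parent resource.

    similarities: dict mapping a sublevel entity to its similarity score.
    parent_of:    dict mapping a sublevel entity to its parent resource.
    how:          "SUM", "MAX" or "AVG".
    Returns (parent, score) pairs, best score first.
    """
    aggregators = {
        "SUM": sum,
        "MAX": max,
        "AVG": lambda scores: sum(scores) / len(scores),
    }
    if how not in aggregators:
        raise ValueError(f"unknown aggregation: {how}")

    # Collect the scores of all sublevel entities under each parent.
    grouped = defaultdict(list)
    for entity, score in similarities.items():
        parent = parent_of.get(entity)
        if parent is not None:
            grouped[parent].append(score)

    aggregate = aggregators[how]
    ranked = [(parent, aggregate(scores)) for parent, scores in grouped.items()]
    # Sort by descending score; ties are broken alphabetically (assumption).
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Made-up similarity scores of three POIs located in two cities.
    sims = {"poi1": 0.5, "poi2": 0.25, "poi3": 0.75}
    parents = {"poi1": "CityA", "poi2": "CityA", "poi3": "CityB"}
    for mode in ("SUM", "MAX", "AVG"):
        print(mode, rollup_scores(sims, parents, how=mode))
```

Summation favors parents with many moderately similar sublevel entities, whereas MAX and AVG favor parents with few but highly similar ones, which is one reason why roll-up lists can differ strongly from regular recommendation lists.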

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-empty result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = -2.69, p < 0.05) and the book domain (Z = -2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = -17.92, p < 0.001; music: t(52) = -25.73, p < 0.001; book: t(50) = -12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.
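The reported t-values stem from within-subjects comparisons, where each participant contributes one value per method. Assuming a standard paired t-test, the statistic is the mean of the per-participant differences divided by its standard error, with n - 1 degrees of freedom. The Python sketch below computes only the statistic and the degrees of freedom; the function name and the sample list sizes are hypothetical, and the sign convention (baseline minus advanced method) is an assumption that matches the negative values in the text.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(baseline, advanced):
    """Return (t, degrees of freedom) for paired observations.

    Differences are taken as baseline - advanced, so a negative t means
    that the advanced method scored higher on average (assumed convention).
    """
    if len(baseline) != len(advanced):
        raise ValueError("both samples need the same number of observations")
    n = len(baseline)
    if n < 2:
        raise ValueError("at least two pairs are required")

    differences = [b - a for b, a in zip(baseline, advanced)]
    spread = stdev(differences)
    if spread == 0:
        raise ValueError("t is undefined when all differences are identical")

    standard_error = spread / sqrt(n)
    return mean(differences) / standard_error, n - 1


if __name__ == "__main__":
    # Made-up list sizes of four participants for both query types.
    sparql_sizes = [1, 2, 3, 4]
    skosrec_sizes = [2, 4, 5, 4]
    t, df = paired_t_statistic(sparql_sizes, skosrec_sizes)
    print(f"t({df}) = {t:.2f}")
```

The p-value would then be looked up from the t-distribution with the returned degrees of freedom, for example with a statistics package.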

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results, i.e., in more than half of


Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants).

                                        Regular        Aggr.-based
Dimension (Metric) [Max Score]  Domain  M      SD      M      SD      N
Accuracy (size) [10]            Travel  9.90   0.99    9.42   2.07    103
                                Movie   2.88   0.59    2.76   0.82    50
                                Music   10.00  0.00    10.00  0.00    53
                                Book    2.88   0.59    3.00   0.00    51
Accuracy (mrs) [1]              Travel  48.21  24.98   44.93  25.59   102
                                Movie   54.58  22.88   56.69  25.42   44
                                Music   53.95  18.31   58.24  16.19   52
                                Book    51.53  23.90   49.25  22.30   49
Novelty (nv) [1]                Travel  0.43   0.39    0.38   0.40    98
                                Movie   0.35   0.38    0.28   0.41    41
                                Music   0.49   0.36    0.46   0.39    50
                                Book    0.61   0.42    0.62   0.44    47
Diversity (divU) [5]            Travel  3.5    -       3      -       100
                                Movie   4      -       3      -       43
                                Music   4      -       3      -       50
                                Book    3      -       3      -       48
Diversity (divC) [1]            Travel  0.79   0.17    0.89   0.13    99
                                Movie   0.75   0.20    0.66   0.27    44
                                Music   0.83   0.17    0.93   0.07    53
                                Book    0.74   0.27    0.93   0.06    48
Usefulness [5]                  Travel  3.36   0.77    3.22   0.74    101
                                Movie   3.20   0.92    3.26   0.94    44
                                Music   3.45   0.84    3.44   0.85    51
                                Book    2.95   0.87    3.03   0.96    48

Table 18. Jaccard indices (TC3).

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.07                     0.11  101
Movie   0.03                     0.07  44
Music   0.02                     0.05  53
Book    0.02                     0.06  50

Table 19. Response rates (TC4).

Domain  Regular (SPARQL)  SKOSRec  N
Movie   32%               96%      50
Music   62%               98%      53
Book    53%               100%     51

the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). [Chart “Recommendation List Size”: no. of items by domain (Movie, Music, Book) for the methods Regular and Cross-Domain.]

pattern matching can facilitate an improved exploration of LOD repositories.

Table 21 also shows that the items in the two sets often did not match, as indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants).

                                        Regular (SPARQL)  Cross-Domain
Dimension (Metric) [Max Score]  Domain  M       SD        M       SD      N
Accuracy (size) [10]            Movie   0.82    2.09      8.66    2.73    50
                                Music   0.89    1.59      9.30    2.03    53
                                Book    2.90    3.98      10.00   0.00    51
Accuracy (mrs) [1]              Movie   67.41   27.62     53.65   14.71   16
                                Music   62.36   31.76     48.32   22.49   20
                                Book    66.42   21.28     55.59   16.84   27
Novelty (nv) [1]                Movie   0.59    0.49      0.41    0.34    16
                                Music   0.72    0.44      0.59    0.40    19
                                Book    0.54    0.41      0.63    0.30    27
Diversity (divU) [5]            Movie   3       -         4       -       16
                                Music   3       -         3       -       20
                                Book    3       -         4       -       27
Diversity (divC) [1]            Movie   (0.64)  (0.35)    (0.99)  (0.02)  (6)
                                Music   (0.90)  (0.13)    (0.98)  (0.03)  (10)
                                Book    (0.84)  (0.12)    (0.87)  (0.1)   (8)
Usefulness [5]                  Movie   3.57    0.83      3.35    1.57    15
                                Music   3.53    1.17      3.11    0.98    20
                                Book    3.41    1.04      3.41    0.94    27

Table 21. Jaccard indices (TC4).

Domain  Mean Jaccard Index (JI)  SD    N
Movie   0.03                     0.07  44
Music   0.01                     0.02  21
Book    0.16                     0.27  33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine, which facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While the novel recommendation approaches of the SKOSRec engine are not superior in each domain and test case, it is safe to say that they considerably extend common retrieval methods and at least provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions for designing such a user interface (UI). For instance, the selection of


LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources in the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than" or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
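The variable check described for the RecWhereClause can be illustrated with a minimal Python sketch. The GroupPattern class, the tuple-based triple representation and the function names are hypothetical simplifications introduced here for illustration; they are not taken from the SKOSRec compiler.

```python
from dataclasses import dataclass, field


@dataclass
class GroupPattern:
    """Hypothetical, simplified stand-in for a parsed RecWhereClause."""
    triples: list                               # (subject, predicate, object) tuples
    minus: list = field(default_factory=list)   # triples inside a MINUS group


def variables(triples):
    """Collect every term that starts with '?' from a list of triples."""
    return {term for triple in triples for term in triple if term.startswith("?")}


def bound_outside_minus(pattern, var):
    """True if `var` occurs in the positive part of the pattern, not only in MINUS."""
    return var in variables(pattern.triples)


# Toy pattern: ?genre appears only inside the MINUS group.
pattern = GroupPattern(
    triples=[("?movie", "dbo:director", "?director")],
    minus=[("?movie", "dbo:genre", "?genre")],
)
```

Under these assumptions, a recommendation variable such as ?movie passes the check, whereas ?genre would be rejected because it is only contained in the MINUS part.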

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be differently combined according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be either expressed as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from anywhere other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user was able to formulate filter conditions that referred to different RDF graphs and endpoints accordingly, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRec query, users can specify additional graph patterns, which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.



3.4. Prefiltering

To facilitate exploratory search in LOD repositories, users should also have the option to formulate conditions in their queries. One way to enable filters during retrieval is to apply them before similarity calculation. Thus, recommendation lists can be adjusted to individual needs.

Due to the expressiveness of RDF data, filter conditions can not only be applied on attributes that are directly connected to potentially relevant items, but can also be declared as a graph pattern. By this means, it is possible to retrieve LOD resources that are both similar to the user profile and fulfill an advanced condition.

The strengths of the approach will be illustrated with a travel recommendation request that was executed over DBpedia. Suppose a customer expressed a preference for the Lake Baikal region (e.g., as indicated by a positive rating on a travel community site). With simple on-the-fly retrieval, he would receive a recommendation list that primarily consists of travel destinations that are located in the proximity of this geographical area. A graph-based filter, on the other hand, facilitates advanced retrieval options. In the given example, it enables the user to obtain suggestions for Southeast Asian travel destinations that have similar features as the Lake Baikal region, even though the resource dbr:Lake_Baikal is not directly connected to any destination with the annotation dbc:Southeast_Asia. Such destinations can only be identified by exploring the surrounding graph structure of the filter attribute.

The advanced constraint ensures that the generated result set is not empty. A simple attribute-level filter, which is commonly utilized in faceted search systems, would not generate any search hits. By these means, graph-based filter patterns can help users to better explore knowledge graphs, thereby overcoming data quality issues in LOD repositories. The corresponding SKOSRec query to this example is shown in Listing 5.

Listing 5: SKOSRec query with an expressive pre-filter condition to retrieve Southeast Asian destinations based on a preference for the Lake Baikal region (Q3)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

RECOMMEND ?destination TOP 5
PREF dbr:Lake_Baikal
[ WHERE {
    ?destination dbo:country ?country .
    ?country dct:subject ?countrySubject .
    ?countrySubject skos:broader dbc:Southeast_Asia .
    ?destination rdf:type dbo:Place .
  }
]

Table 4 shows the solution to this query. It contains travel destinations and regions of interest that are both similar to the previously liked destination dbr:Lake_Baikal and are located in Southeast Asia. Ecoregions around lakes, rivers or mountains from Southeast Asian countries, such as Cambodia, Vietnam or Malaysia, are presented to the user.

Table 4. Result set of Q3

destination
dbr:Tonle_Sap
dbr:Mekong
dbr:Mount_Kinabalu
dbr:Laguna_de_Bay
dbr:Lake_Toba

The technical details for this type of query are given in the following definitions. Before the extraction of relevant resources and annotations, the recommendation engine identifies the resources that satisfy the filter (Definition 10).

Definition 10 (Prefiltered resources). The mapping of filtered resources $\Omega_{prefilter}$ is obtained by matching a specified graph pattern ($P$) to the RDF graph of a dataset $D$ (Eq. 10).

$$\Omega_{prefilter} = \{\mu_{|q} \mid \mu \in [\![P]\!]_D\} \qquad (10)$$

After preselection, resources are brought together with user profile data to retrieve the final set of recommendations (Definition 11).

Definition 11 (Annotations and relevant resources (prefiltered retrieval)). The solution mappings of annotations and relevant resources in prefiltered retrieval mode are obtained by joining the set of prefiltered resources with the mappings $\Omega_r$ of all relevant resources and annotations as determined from the user profile.

$$\Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_r \qquad (11)$$
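The join in Eq. 11 can be read as the usual SPARQL-style join of solution mappings, in which two mappings are merged whenever they agree on all shared variables. The following Python sketch illustrates this reading under the assumption that mappings are plain dictionaries. The function names, the variable names ?q and ?c, and the toy data (including the placeholder dbc:Example_Category) are invented for illustration and are not taken from the SKOSRec implementation or its evaluation data.

```python
def compatible(mu1, mu2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(mu2[var] == term for var, term in mu1.items() if var in mu2)


def join(omega1, omega2):
    """Join two sets of solution mappings (lists of dicts) on compatible pairs."""
    return [
        {**mu1, **mu2}
        for mu1 in omega1
        for mu2 in omega2
        if compatible(mu1, mu2)
    ]


# Toy data (assumption): resources that satisfy the prefilter ...
omega_prefilter = [{"?q": "dbr:Tonle_Sap"}, {"?q": "dbr:Mekong"}]
# ... and relevant resources with annotations derived from the user profile.
omega_r = [
    {"?q": "dbr:Tonle_Sap", "?c": "dbc:Example_Category"},
    {"?q": "dbr:Lake_Onega", "?c": "dbc:Example_Category"},
]

# Only resources that occur in both sets survive the join.
omega_pre = join(omega_prefilter, omega_r)
```

In this toy setting, only the mapping for dbr:Tonle_Sap remains, since it is the only resource that both satisfies the filter and shares an annotation with the profile.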

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and respective SKOS similarities are quantified according to the user filter (Definitions 12 and 13). By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources.

Definition 12 (Conditional SKOS similarity). Let $Annot(r)$ be the set of SKOS features of $r$ and $Annot(q)$ the set of SKOS features of resource $q$, which is contained in the filtered set of annotations and relevant resources ($q \in \{\mu(q) \mid \mu \in \Omega_{pre}\}$). Then the similarity can be derived from the IC of their shared concepts ($C_{cond} = Annot(r) \cap Annot(q)$) (Eq. 12).

$$sim_{cond}(r, q) = IC(C_{cond}) \qquad (12)$$

Definition 13 (Conditional SKOS Information Content). The conditional IC of a set of SKOS annotations is defined by the sum of the IC of each concept $c \in \{\mu(c) \mid \mu \in \Omega_{pre}\}$, where $freq_{cond}(c)$ is the frequency of $c$ among all filtered resources and $n$ is the maximum frequency among these resources (Eq. 13).

$$IC(C_{cond}) = -\sum_{c \in C_{cond}} \log\left(\frac{freq_{cond}(c)}{n}\right) \qquad (13)$$

The system carries out the calculations (i.e., determination of similarity values, the ranking of LOD resources) in the same way as in non-filtering mode.
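To make Definitions 12 and 13 more tangible, the following Python sketch computes conditional IC and similarity values for a small invented example. It is only an illustration under stated assumptions: the natural logarithm is used because the base is not specified here, $n$ is read as the highest concept frequency among the filtered resources, and all identifiers (q1, q2, q3, concepts a, b, c, and the function names) are hypothetical.

```python
import math
from collections import Counter


def conditional_frequencies(filtered_annotations):
    """Count how often each concept annotates the filtered resources.

    Returns the frequency table and n, read here (assumption) as the
    maximum concept frequency among the filtered resources.
    """
    freq = Counter(c for concepts in filtered_annotations.values() for c in concepts)
    return freq, max(freq.values())


def conditional_ic(concepts, freq, n):
    """Eq. 13: negative sum of log relative frequencies of the given concepts."""
    return -sum(math.log(freq[c] / n) for c in concepts)


def conditional_similarity(annot_r, annot_q, freq, n):
    """Eq. 12: IC of the concepts shared by profile resource r and candidate q."""
    return conditional_ic(annot_r & annot_q, freq, n)


# Toy annotations of three filtered candidate resources (assumption).
filtered = {"q1": {"a", "b"}, "q2": {"a", "c"}, "q3": {"a", "b"}}
freq, n = conditional_frequencies(filtered)

# Annotations of the profile resource r (assumption).
profile = {"a", "b"}
scores = {
    q: conditional_similarity(profile, annot, freq, n)
    for q, annot in filtered.items()
}
```

In this example, concept a annotates every filtered resource and therefore contributes no information, so q2 (which shares only a with the profile) receives a score of zero, while q1 and q3 are ranked higher because they also share the rarer concept b.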

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources, it can also apply user constraints after similar items have already been determined. By this means, recommendations are filtered ex-post to the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5. Result set of Q4

director             movie
dbr:Kirk_Douglas     dbr:Scalawag_(film)
dbr:Kirk_Douglas     dbr:Posse_(1975_film)
dbr:Kevin_Costner    dbr:The_Postman_(film)
dbr:Kevin_Costner    dbr:Dances_with_Wolves
...                  ...
dbr:Gene_Kelly       dbr:Hello_Dolly_(film)
dbr:Gene_Kelly       dbr:Invitation_to_the_Dance_(film)
...                  ...
dbr:Alan_Alda        dbr:Goodbye_Farewell_and_Amen
dbr:Alan_Alda        dbr:Margaret'_s_Engagement
...                  ...
dbr:Steve_Buscemi    dbr:Interview_(2007_film)
dbr:Steve_Buscemi    dbr:Animal_Factory

Listing 6: SKOSRec query to generate simple post-filtered recommendations in the movie domain (Q4)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE {
  ?movie dbo:director ?director .
  ?director rdf:type yago:Director110014939 .
} LIMIT 50
RECOMMEND ?director TOP 5
PREF dbr:Quentin_Tarantino

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.

Another weakness is the biased semantics of the query. Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies he/she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to several LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).

Aggregation-based requests can be well combined with other retrieval patterns, such as preference queries. In the case of the movie recommendation scenario, a user could state that he enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors that shot similar films. Usually, this kind of question can only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7: SKOSRec query to generate post-filtered recommendations based on a preference query in the movie domain (Q5)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE {
  ?movie dbo:director ?director .
  ?director rdf:type yago:Director110014939 .
} LIMIT 50
AGG dbr:Quentin_Tarantino ?director SUM
RECOMMEND ?movie TOP 100
PREF [ ?prefMovie
  WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino . } ]

Table 6. Result set of Q5

director                    movie
dbr:Steven_Spielberg        dbr:Indiana_Jones
dbr:Steven_Spielberg        dbr:Jurassic_Park_(film)
...                         ...
dbr:Francis_Ford_Coppola    dbr:The_Godfather_Part_II
dbr:Francis_Ford_Coppola    dbr:The_Godfather_Part_III
...                         ...
dbr:Christopher_Nolan       dbr:The_Prestige_(film)
dbr:Christopher_Nolan       dbr:The_Dark_Knight
...                         ...
dbr:Robert_Rodriguez        dbr:Planet_Terror
dbr:Robert_Rodriguez        dbr:Machete_(film)
...                         ...
dbr:Kevin_Smith             dbr:Clerks
dbr:Kevin_Smith             dbr:Clerks_II

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they are totally different. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style⁵), Q4 provided a completely different output. It shows US American actor-directors whose movies range from musical films (dbr:Hello_Dolly_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Tab. 5) seem rather arbitrary and not exceptionally helpful concerning the initial user request.

The combined aggregation-based query can also be described as a roll-up request, since suggestions are based on preferences for sublevel entities. Another example for a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

⁵ https://en.wikipedia.org/wiki/Quentin_Tarantino


Listing 8: SKOSRec query to generate post-filtered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dul: <http://www.ontologydesignpatterns.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?city
WHERE {
  ?poi ?property ?city .
  ?city rdf:type dbo:City .
  ?property rdfs:subPropertyOf dul:hasLocation .
} LIMIT 3
AGG dbr:London ?city SUM
RECOMMEND ?poi TOP 100
PREF
  dbr:Oxford_Street
  dbr:Notting_Hill
  dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7. Result set of Q6

city
dbr:Oxford
dbr:Abingdon-on-Thames
dbr:Wallingford_Oxford
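The roll-up step behind such aggregation-based requests can be sketched in a few lines of Python. This is a simplified illustration, not the engine's implementation: the function roll_up, its parameters and all resource names (poi:A, city:X, and so on) and scores are hypothetical, and the handling of the resource named in the AGG clause follows the description above, i.e., sublevel entities of that resource are left out of the result.

```python
from collections import defaultdict

# Aggregation functions corresponding to the SUM, MAX and AVG keywords.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda values: sum(values) / len(values),
}


def roll_up(scores, parents_of, excluded, how="SUM", top=3):
    """Aggregate similarity scores of sublevel items per parent entity.

    scores:     similarity score per recommended sublevel item (e.g., a POI)
    parents_of: parent entities per sublevel item (e.g., the city of a POI)
    excluded:   parent entity named in the AGG clause, left out of the result
    """
    grouped = defaultdict(list)
    for item, score in scores.items():
        for parent in parents_of.get(item, ()):
            if parent != excluded:
                grouped[parent].append(score)
    aggregate = AGGREGATORS[how]
    ranked = sorted(
        ((parent, aggregate(values)) for parent, values in grouped.items()),
        key=lambda pair: (-pair[1], pair[0]),  # best score first, ties by name
    )
    return ranked[:top]


# Toy similarity scores of recommended POIs and their cities (assumption).
scores = {"poi:A": 0.9, "poi:B": 0.5, "poi:C": 0.7, "poi:D": 0.8}
parents_of = {
    "poi:A": ["city:X"],
    "poi:B": ["city:X"],
    "poi:C": ["city:Y"],
    "poi:D": ["city:London"],
}

by_sum = roll_up(scores, parents_of, excluded="city:London", how="SUM")
by_max = roll_up(scores, parents_of, excluded="city:London", how="MAX")
```

With sum-based aggregation, a city benefits from every similar POI it contains, whereas maximum-based aggregation ranks a city by its single most similar POI. Which strategy is preferable depends on the usage scenario.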

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere rather than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.

Besides the above presented roll-up retrieval patterns, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?participated ?musicalAct
WHERE {
  ?movie rdf:type dbo:Film .
  VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
  { ?musicalAct ?participated ?movie . }
  UNION
  { ?movie ?participated ?musicalAct . }
}
LIMIT 10
AGG dbr:The_Jackson_5 ?movie MAX
RECOMMEND ?musicalAct TOP 100
PREF dbr:The_Jackson_5
WHERE {
  VALUES ?actType { dbo:MusicalArtist dbo:Band }
  ?musicalAct rdf:type ?actType .
}

The query contains a prefilter condition that triggers the retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are combined through maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tabs. 8 and 9). The SKOSRec query produces a more comprehensive solution than the regular SPARQL query.

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?musicalAct
WHERE {
  ?movie rdf:type dbo:Film .
  VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
  { dbr:The_Jackson_5 ?participated ?movie . }
  UNION
  { ?movie ?participated dbr:The_Jackson_5 . }
}
LIMIT 5

Table 8. Result set of Q7

movie                          musicalAct
dbr:Moonwalker                 dbr:Michael_Jackson
dbr:The_Wiz_(film)             dbr:Ashford_&_Simpson
dbr:Garfield_Gets_a_Life       dbr:The_Temptations
dbr:Pipe_Dreams_(1976_film)    dbr:Gladys_Knight_&_the_Pips
dbr:The_Jacksons               dbr:Jermaine_Jackson

Table 9. Result set of Q8

movie                          musicalAct
dbr:The_Jacksons               dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 14 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 14 (SKOSRec recommendations). SKOSRec recommendations $R(P_r)$ are the set of suggestions generated by processing the solution mappings ($\Omega$) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, of prefiltering, or of a combination of these techniques. $P_r$ represents the input profile that contains at least one preference for an item. The cardinality of $R(P_r)$ is the intended number of recommendations ($k$) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 14).

$$R(P_r) = \{\, x \mid x \in q \mid \mathrm{score}(P_r, q) \geq \mathrm{kmax}_{q \in \Omega}\, \mathrm{score}(P_r, q),\ q \notin P_r \,\} \tag{14}$$
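One possible reading of Eq. 14 is a plain top-k selection over scored candidates that skips profile items. The following Python sketch assumes that the scores are already available as a dictionary; the function name `recommend` and the toy scores are illustrative and not part of the SKOSRec engine.

```python
def recommend(scores, profile, k):
    """Return the k highest-scoring candidates that are not in the profile."""
    # Items the user already stated as preferences are never recommended.
    candidates = {item: s for item, s in scores.items() if item not in profile}
    # Sort by descending score; ties are broken by item identifier.
    ranked = sorted(candidates, key=lambda item: (-candidates[item], item))
    return ranked[:k]


# Hypothetical recommendation scores score(Pr, q) for four resources.
scores = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.3}
top = recommend(scores, profile={"a"}, k=2)
```

Here the profile item `a` is dropped before ranking, so the list is filled with the next-best resources.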

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions $R(P_r)$ can be joined with any graph pattern $P_{post}$ subsequent to the process of similarity calculation (Definition 15). It does not make a difference how the engine obtained these recommendations.

Definition 15 (Postfiltered recommendation mapping). The postfiltered recommendation mapping ($\Omega_{post}$) is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern ($\Omega_{postfilter}$) (Eqs. 15 and 16).

$$\Omega_{postfilter} = \{\, \mu \mid \mu \in [\![P_{post}]\!]_D,\ x \in \mathrm{dom}(\mu) \,\} \tag{15}$$

$$\Omega_{post} = \{\, \mu|_{x} \mid x \in R(P_r) \,\} \bowtie \Omega_{postfilter} \tag{16}$$
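The join in Eq. 16 can be sketched in Python by representing solution mappings as dictionaries from variable names to resources. This is a simplified illustration under the assumption that the join variable is called `x`; the helper `postfilter_join` and all identifiers in the example are hypothetical.

```python
def postfilter_join(recommendations, postfilter_mappings, var="x"):
    """Keep postfilter mappings whose binding for `var` is a recommended item."""
    recommended = set(recommendations)
    return [
        mu for mu in postfilter_mappings
        if var in mu and mu[var] in recommended  # var must be in dom(mu)
    ]


# Toy recommendations R(Pr) and toy matches of the postfilter pattern.
recommendations = ["poi:A", "poi:C"]
postfilter_mappings = [
    {"x": "poi:A", "y": "dbr:Oxford"},
    {"x": "poi:B", "y": "dbr:Oxford"},
    {"x": "poi:C", "y": "dbr:Abingdon-on-Thames"},
    {"y": "dbr:Reading"},  # no binding for x, so it cannot join
]
omega_post = postfilter_join(recommendations, postfilter_mappings)
```

Only mappings that bind `x` to a recommended resource survive, regardless of how the recommendations were produced.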

Like the prefilter, $P_{post}$ can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently from a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item ($a$) and similar sublevel entities. Additionally, the user has to specify the variable in the profile (e.g., $y$) that represents the items within the postfilter graph pattern $P_{post}$. By this means, aggregation-based recommendations can be generated (Definition 16).

Definition 16 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item $a$ in the profile (Eq. 17).

$$R_{agg}(a) = \{\, \mu(y) \mid \mu \in \Omega_{post},\ y \in \mathrm{dom}(\mu),\ a \notin \mu(y) \,\} \tag{17}$$
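As a rough illustration of Eq. 17, the sketch below collects the bindings of the aggregation variable from a list of postfiltered mappings and drops the superordinate profile item. It assumes that "linked to $a$" can be approximated by an equality check on the binding of `y`; the function name and the data are hypothetical.

```python
def aggregation_resources(omega_post, excluded_item, var="y"):
    """Collect distinct bindings of `var`, leaving out the excluded item."""
    resources = []
    for mu in omega_post:
        value = mu.get(var)
        if value is None or value == excluded_item:
            continue  # var unbound, or binding is the superordinate item a
        if value not in resources:
            resources.append(value)  # keep first-seen order, no duplicates
    return resources


# Toy postfiltered mappings (x: recommended POI, y: city of the POI).
omega_post = [
    {"x": "poi:A", "y": "dbr:Oxford"},
    {"x": "poi:B", "y": "dbr:London"},
    {"x": "poi:C", "y": "dbr:Oxford"},
    {"x": "poi:D", "y": "dbr:Abingdon-on-Thames"},
]
r_agg = aggregation_resources(omega_post, "dbr:London")
```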

For each potential aggregation-based suggestion ($y$), the similarity scores of connected entities ($x$) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. When both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on the similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 17 (Aggregation-based ranking score). The ranking score of an LOD resource $y$ that is connected to a set of recommended items is determined by aggregating the scores of connected entities either by the maximum (Eq. 18), the sum (Eq. 19), or the average (Eq. 20) of individual scores.

$$\mathrm{score}_{aggMAX}(y) = \max_{x \in \mu(x,y)} \mathrm{score}(P_r, x) \tag{18}$$

$$\mathrm{score}_{aggSUM}(y) = \sum_{x \in \mu(x,y)} \mathrm{score}(P_r, x) \tag{19}$$

$$\mathrm{score}_{aggAVG}(y) = \frac{\sum_{x \in \mu(x,y)} \mathrm{score}(P_r, x)}{|\{x \in \mu(x,y)\}|} \tag{20}$$
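The three aggregation variants of Eqs. 18 to 20 can be written down compactly in Python. The sketch assumes that the connections between recommended items $x$ and aggregation targets $y$ are given as pairs and that the item scores are stored in a dictionary; `aggregate_scores` and the toy values are illustrative names and numbers, not taken from the engine.

```python
from collections import defaultdict


def aggregate_scores(pairs, scores, mode="SUM"):
    """Aggregate score(Pr, x) over all x connected to each y.

    pairs:  iterable of (x, y) connections
    scores: dict mapping x to its similarity score
    mode:   "MAX" (Eq. 18), "SUM" (Eq. 19), or "AVG" (Eq. 20)
    """
    grouped = defaultdict(list)
    for x, y in pairs:
        grouped[y].append(scores[x])
    if mode == "MAX":
        aggregate = max
    elif mode == "SUM":
        aggregate = sum
    elif mode == "AVG":
        aggregate = lambda values: sum(values) / len(values)
    else:
        raise ValueError(f"unknown aggregation mode: {mode}")
    return {y: aggregate(values) for y, values in grouped.items()}


# Toy data: y1 is connected to two recommended items, y2 to one.
pairs = [("x1", "y1"), ("x2", "y1"), ("x3", "y2")]
scores = {"x1": 0.5, "x2": 0.25, "x3": 0.75}
by_max = aggregate_scores(pairs, scores, "MAX")
by_sum = aggregate_scores(pairs, scores, "SUM")
by_avg = aggregate_scores(pairs, scores, "AVG")
```

In the toy data, `y2` outranks `y1` under MAX and AVG, while both tie under SUM, which shows how the choice of aggregation changes the ranking of resources with many weakly similar sublevel entities.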

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach more participants with diverse demographic backgrounds than would have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology.⁶ ⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users went step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., through rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

⁶ http://tomcat.apache.org
⁷ http://www.oracle.com/technetwork/java/javaee/servlet/index.html

Subjects were recruited through the clickworker.com platform.⁸ Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just clicked through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.⁹

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.¹⁰ The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

⁸ https://www.clickworker.de
⁹ http://api.jquery.com/jquery.ajax
¹⁰ http://lucene.apache.org/solr

Fig. 2. User profile generation TC1 (music domain)

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information (e.g., thumbnails, labels, or a short description) to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance; outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale would allow. The score, called mean relevance score (mrs), was measured with Equation 21.

$$mrs = \frac{\sum_{i=1}^{k} score_i}{k} \tag{21}$$
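Eq. 21 is the arithmetic mean of the slider scores of one recommendation list. A minimal Python sketch, with a hypothetical helper name and made-up slider values on the 0 to 100 scale, could look as follows:

```python
def mean_relevance(slider_scores):
    """Mean relevance score (mrs) of one recommendation list (Eq. 21)."""
    if not slider_scores:
        raise ValueError("mrs is undefined for an empty recommendation list")
    return sum(slider_scores) / len(slider_scores)


# Hypothetical slider ratings for a list of k = 4 recommendations.
mrs = mean_relevance([80, 55, 30, 65])
```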

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, in the case of an RS only suggesting popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 22).

$$nv = \frac{|nov|}{|tp|} \tag{22}$$
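The novelty ratio of Eq. 22 can be sketched in Python as follows. The example assumes that each rated item is stored as a pair of slider score and novelty flag and that an item counts as relevant from a slider score of 50 upwards; the function name, the data layout, and the handling of lists without relevant items are assumptions made for illustration.

```python
def novelty(ratings, threshold=50):
    """Share of relevant items that were new to the user (Eq. 22).

    ratings: list of (slider_score, is_new) pairs for one recommendation list
    Returns None when no item reaches the relevance threshold.
    """
    relevant_flags = [is_new for score, is_new in ratings if score >= threshold]
    if not relevant_flags:
        return None  # |tp| = 0, the ratio is undefined
    return sum(relevant_flags) / len(relevant_flags)


# Hypothetical ratings: three relevant items, two of them unfamiliar.
nv = novelty([(80, True), (55, False), (30, True), (65, True)])
```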

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval TC1 (music domain)

An item was considered relevant when it achieved a relevance score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 65]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse is the result set [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 23) and averaged for the recommendation list (Eq. 24).

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{23}$$

$$div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{24}$$
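Eqs. 23 and 24 amount to the average pairwise cosine distance within a recommendation list. The Python sketch below assumes that each item is represented as a sparse vector of concept weights (a dictionary from concept identifier to weight); this representation, the function names, and the toy vectors are assumptions for illustration only.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as dicts."""
    dot = sum(weight * b.get(concept, 0.0) for concept, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an item without annotations shares nothing
    return dot / (norm_a * norm_b)


def content_diversity(items):
    """Average pairwise distance 1 - cos over all item pairs (Eqs. 23, 24)."""
    k = len(items)
    if k < 2:
        raise ValueError("diversity needs at least two items")
    total = sum(1.0 - cosine(a, b) for a, b in combinations(items, 2))
    return 2.0 * total / (k * (k - 1))


# Toy list: two items share the same concept, the third is unrelated.
items = [{"c1": 1.0}, {"c1": 1.0}, {"c2": 1.0}]
div_c = content_diversity(items)
```

With three items there are three pairs; one pair is identical (distance 0) and two pairs are disjoint (distance 1), so two thirds of the maximum diversity is reached.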

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each generated recommendation list of the SKOSRec engine during runtime execution. In addition to the measurement of divC scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al., two Likert items were posed for this purpose [44]:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al., which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions [44]. The statements were adapted to the specificities of each usage scenario. Figure 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparison of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31].

Beel et al. have shown that even small variations in an RS test setting can cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Figure 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movies, music, and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain, where the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information, or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹
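The constraint-based workflow described above, in which the filter narrows the candidate set before any similarity is calculated, can be sketched in Python. The helper `prefilter_then_rank`, the candidate identifiers, the attribute values, and the scores are all hypothetical; the sketch only shows the order of the two steps, not the engine's actual similarity computation.

```python
def prefilter_then_rank(candidates, constraint, similarity_fn, k):
    """Apply a (property, value) constraint first, then rank by similarity.

    candidates:    dict mapping item to a set of (property, value) pairs
    constraint:    the (property, value) pair chosen by the user
    similarity_fn: callable returning the similarity of an item to the profile
    """
    # Step 1: only items that carry the requested attribute stay candidates.
    allowed = [item for item, facts in candidates.items() if constraint in facts]
    # Step 2: similarity is computed for the remaining candidates only.
    scored = [(similarity_fn(item), item) for item in allowed]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:k]]


# Toy music acts with hypothetical attribute values and scores.
candidates = {
    "act:A": {("dbo:genre", "genre:Soul")},
    "act:B": {("dbo:genre", "genre:Rock")},
    "act:C": {("dbo:genre", "genre:Soul"), ("dc:subject", "category:Example")},
}
toy_scores = {"act:A": 0.4, "act:B": 0.9, "act:C": 0.7}
filtered = prefilter_then_rank(
    candidates, ("dbo:genre", "genre:Soul"), toy_scores.__getitem__, k=10
)
```

Although `act:B` has the highest toy score, it never enters the ranking because it does not satisfy the genre constraint.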

The second fundamental research issue of TC2 concerned the question whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city, or dbo:state) while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement "The filter has improved the recommendation results of list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

¹¹ http://dbpedia.org

Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval TC1 (music domain)

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar rollup query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination (i.e., a city, region, or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act (music domain), or author (book domain) and received recommendations based on the features of the creative works of the favored artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with those of the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

Fig. 5. Web form for a constraint-based recommendation request (music domain)

Fig. 6. Travel - User profile generation TC3

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domains, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act, or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.


Fig. 7. Music - User profile generation TC4 (page 9)

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states his/her favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared in his/her profile a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (e.g., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The gathered samples from the experiments contained answers from both genders, with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either "manageable" (38.5%) or even as "easy" or "very easy" (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
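For readers unfamiliar with the reliability check, the following Python sketch computes Cronbach's α from a matrix of Likert responses using the standard formula based on item variances and the variance of the sum scores. The function name and the response data are made up for illustration and do not reproduce the study's data or tooling.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of Likert scores (one row per participant)."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    # Sample variance of each item across participants.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of the participants' sum scores.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("sum scores do not vary, alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five participants to three usefulness items.
responses = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3], [1, 2, 2]]
alpha = cronbach_alpha(responses)
```

In this toy example, the three items move closely together across participants, so α comes out well above the 0.7 level mentioned in the text.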

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. In order to avoid leaving valuable user assessments unused, agreements with the statement "The recommendations of list [] are diverse" were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M),¹² standard deviation (SD), and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be above mediocrity. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects, such as Accuracy, Novelty, and Diversity, received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems to be justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast to that, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 ("neutral") in the multimedia domains and at 4 ("agree") in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.

Table 10. Test cases in the web experiments (X = test case applied, - = not applied)

Test Case   Evaluation Purpose                        Query Type   Travel   Movie   Music   Book
TC1         On-the-fly Recommendations (Baseline)     Q1           X        X       X       X
TC2         Constraint-based Recommendations          Q3           X        X       X       X
TC3         Advanced Queries (Rollup)                 Q5, Q6       X        X       X       X
TC4         Advanced Queries (Cross-Domain)           Q7           -        X       X       X

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain   Cronbach's α
Travel   0.7002
Movie    0.8579
Music    0.8515
Book     0.8949

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement "The filter has improved the recommendation results of the list". Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected 5 ("strongly agree"), 4 ("agree") or 3 ("neutral"). Hence, a filter often had a positive impact on results.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  M      SD     N
Accuracy (size) [10]            Travel  10.00  0.00   103
                                Movie   10.00  0.00   50
                                Music   10.00  0.00   53
                                Book    9.80   1.40   51
Accuracy (mrs) [1]              Travel  56.49  0.00   103
                                Movie   62.25  19.15  50
                                Music   55.70  0.10   53
                                Book    57.47  15.55  50
Novelty (nv) [1]                Travel  0.60   0.34   101
                                Movie   0.36   0.31   50
                                Music   0.36   0.39   52
                                Book    0.61   0.29   47
Diversity (divU) [5]            Travel  4      -      102
                                Movie   3      -      50
                                Music   3      -      53
                                Book    3      -      50
Diversity (divC) [1]            Travel  0.80   0.16   103
                                Movie   0.87   0.12   50
                                Music   0.86   0.10   50
                                Book    0.92   0.08   50
Usefulness [5]                  Travel  3.34   0.85   103
                                Movie   3.55   0.73   50
                                Music   3.18   0.85   53
                                Book    3.43   0.81   50

[Figure: "The filter has improved the recommendations of the list"; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement; legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response Rates (TC2)

Domain  Constr.-based (Regular)  Constr.-based (Expanded)  N
Travel  23                       57                        103
Movie   86                       88                        50
Music   72                       75                        53
Book    73                       76                        51
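The response rate is defined in the text as the ratio of non-zero result sets among all result sets. A minimal sketch of that definition follows; the function name `response_rate` and the sample sessions are hypothetical and not taken from the experiments.

```python
def response_rate(result_lists):
    """Share of user sessions whose recommendation list is non-empty."""
    if not result_lists:
        raise ValueError("at least one result list is required")
    non_empty = sum(1 for results in result_lists if len(results) > 0)
    return non_empty / len(result_lists)


# Hypothetical example: two of four sessions returned recommendations.
sessions = [["item_a"], [], ["item_b", "item_c"], []]
rate = response_rate(sessions)  # 0.5
```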

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
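The text names the false discovery rate but not the concrete adjustment procedure. As an illustration only, the sketch below uses the Benjamini-Hochberg step-up procedure, which is one common way to control the FDR; this choice, the function name `benjamini_hochberg` and the p-values are assumptions, not details from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if the hypothesis is rejected.

    Benjamini-Hochberg step-up procedure (assumed here for illustration).
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis whose rank is at most max_rank.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[index] = True
    return rejected


# Hypothetical p-values from eight pairwise comparisons.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
decisions = benjamini_hochberg(p)
```

With these made-up values, only the two smallest p-values stay significant after adjustment, although five of them are below the unadjusted 0.05 level.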

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best, because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify recommendation lists and increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Simple M  Simple SD  Expressive M  Expressive SD  N
Accuracy (size) [10]            Travel  1.17      2.70       4.52          4.62           103
                                Movie   6.72      4.08       7.20          3.92           50
                                Music   5.96      4.64       6.36          4.51           53
                                Book    4.59      4.50       5.00          4.51           51
Accuracy (mrs) [1]              Travel  66.73     20.51      65.34         22.36          23
                                Movie   60.95     22.98      60.29         22.79          43
                                Music   63.77     16.05      62.96         16.44          34
                                Book    56.17     18.18      56.81         16.81          37
Novelty (nv) [1]                Travel  0.48      0.41       0.61          0.49           21
                                Movie   0.44      0.36       0.55          0.80           42
                                Music   0.55      0.39       0.52          0.37           38
                                Book    0.77      0.34       0.76          0.33           33
Diversity (divU) [5]            Travel  3         -          3.5           -              24
                                Movie   4         -          4             -              43
                                Music   4         -          4             -              38
                                Book    3         -          3             -              37
Diversity (divC) [1]            Travel  0.68      0.18       0.72          0.21           19
                                Movie   0.70      0.24       0.73          0.23           41
                                Music   0.86      0.13       0.87          0.12           34
                                Book    0.84      0.24       0.95          0.63           29
Usefulness [5]                  Travel  3.33      0.93       3.50          0.77           24
                                Movie   3.32      0.85       3.31          0.86           43
                                Music   3.55      0.81       3.64          0.78           38
                                Book    3.39      0.83       3.62          1.21           37

Table 15. Jaccard Indices (TC2)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.57                     0.47  25
Movie   0.71                     0.38  43
Music   0.82                     0.35  37
Book    0.92                     0.26  35
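The Jaccard index compares two recommendation lists by dividing the number of shared items by the number of distinct items in both lists. The sketch below illustrates the measure for a single pair of lists; the function name `jaccard_index`, the convention for two empty lists and the sample items are assumptions made for this example.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two recommendation lists."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty lists have no measurable overlap.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical lists from a simple and an expanded filter request.
simple = ["item_a", "item_b", "item_c"]
expanded = ["item_b", "item_c", "item_d"]
overlap = jaccard_index(simple, expanded)  # 2 shared / 4 distinct = 0.5
```

A mean index per domain, as reported in the table, would then be the average of this value over all user sessions with comparable lists.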

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Table 16. Response Rates (TC3)

Domain  Regular  Aggr.-based  N
Travel  99       100          103
Movie   96       92           50
Music   100      100          53
Book    96       100          51

Figure 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.

[Figure: "Perceived Usefulness (Regular vs. Roll-up Queries)"; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement; legend: Method (Regular, Roll-up)]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3)

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially noteworthy given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.

[Figure: "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items; legend: Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Regular M  Regular SD  Aggr.-based M  Aggr.-based SD  N
Accuracy (size) [10]            Travel  9.90       0.99        9.42           2.07            103
                                Movie   2.88       0.59        2.76           0.82            50
                                Music   10.00      0.00        10.00          0.00            53
                                Book    2.88       0.59        3.00           0.00            51
Accuracy (mrs) [1]              Travel  48.21      24.98       44.93          25.59           102
                                Movie   54.58      22.88       56.69          25.42           44
                                Music   53.95      18.31       58.24          16.19           52
                                Book    51.53      23.90       49.25          22.30           49
Novelty (nv) [1]                Travel  0.43       0.39        0.38           0.40            98
                                Movie   0.35       0.38        0.28           0.41            41
                                Music   0.49       0.36        0.46           0.39            50
                                Book    0.61       0.42        0.62           0.44            47
Diversity (divU) [5]            Travel  3.5        -           3              -               100
                                Movie   4          -           3              -               43
                                Music   4          -           3              -               50
                                Book    3          -           3              -               48
Diversity (divC) [1]            Travel  0.79       0.17        0.89           0.13            99
                                Movie   0.75       0.20        0.66           0.27            44
                                Music   0.83       0.17        0.93           0.07            53
                                Book    0.74       0.27        0.93           0.06            48
Usefulness [5]                  Travel  3.36       0.77        3.22           0.74            101
                                Movie   3.20       0.92        3.26           0.94            44
                                Music   3.45       0.84        3.44           0.85            51
                                Book    2.95       0.87        3.03           0.96            48

Table 18. Jaccard Indices (TC3)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.07                     0.11  101
Movie   0.03                     0.07  44
Music   0.02                     0.05  53
Book    0.02                     0.06  50

Table 19. Response rates (TC4)

Domain  Regular (SPARQL)  SKOSRec  N
Movie   32                96       50
Music   62                98       53
Book    53                100      51

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, in the remaining test cases, in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories.

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Regular (SPARQL) M  Regular (SPARQL) SD  Cross-Domain M  Cross-Domain SD  N
Accuracy (size) [10]            Movie   0.82                2.09                 8.66            2.73             50
                                Music   0.89                1.59                 9.30            2.03             53
                                Book    2.90                3.98                 10.00           0.00             51
Accuracy (mrs) [1]              Movie   67.41               27.62                53.65           14.71            16
                                Music   62.36               31.76                48.32           22.49            20
                                Book    66.42               21.28                55.59           16.84            27
Novelty (nv) [1]                Movie   0.59                0.49                 0.41            0.34             16
                                Music   0.72                0.44                 0.59            0.40             19
                                Book    0.54                0.41                 0.63            0.30             27
Diversity (divU) [5]            Movie   3                   -                    4               -                16
                                Music   3                   -                    3               -                20
                                Book    3                   -                    4               -                27
Diversity (divC) [1]            Movie   (0.64)              (0.35)               (0.99)          (0.02)           (6)
                                Music   (0.90)              (0.13)               (0.98)          (0.03)           (10)
                                Book    (0.84)              (0.12)               (0.87)          (0.1)            (8)
Usefulness [5]                  Movie   3.57                0.83                 3.35            1.57             15
                                Music   3.53                1.17                 3.11            0.98             20
                                Book    3.41                1.04                 3.41            0.94             27

Table 21. Jaccard Indices (TC4)

Domain  Mean Jaccard Index (JI)  SD    N
Movie   0.03                     0.07  44
Music   0.01                     0.02  21
Book    0.16                     0.27  33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While the novel recommendation approaches of the SKOSRec engine are not superior in each domain and test case, it is safe to say that they considerably extend common retrieval methods and at least provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a back-end application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
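The aggregation step can be pictured as grouping the similarity scores of sublevel entities by their superordinate resource and summarizing them with SUM, MAX or AVG. The following is only a sketch of that idea, not the SKOSRec implementation; the function `roll_up`, the `parent_of` mapping and the sample identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Aggregation modes named in the query syntax.
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def roll_up(similarities, parent_of, how="SUM"):
    """Aggregate sublevel similarity scores per superordinate resource.

    similarities: sublevel entity -> similarity score (hypothetical input)
    parent_of:    sublevel entity -> superordinate resource (hypothetical)
    Returns (resource, aggregated score) pairs, best score first.
    """
    grouped = defaultdict(list)
    for entity, score in similarities.items():
        parent = parent_of.get(entity)
        if parent is not None:
            grouped[parent].append(score)
    aggregate = AGGREGATORS[how]
    scores = {parent: aggregate(values) for parent, values in grouped.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical example: movies rolled up to their directors.
movie_scores = {"movie_1": 0.5, "movie_2": 0.25, "movie_3": 0.5}
director_of = {"movie_1": "director_a", "movie_2": "director_a",
               "movie_3": "director_b"}
by_sum = roll_up(movie_scores, director_of, "SUM")
by_avg = roll_up(movie_scores, director_of, "AVG")
```

In this made-up example the ranking changes with the aggregation mode: SUM favors the director with more similar movies, whereas AVG favors the director whose movies are more similar on average.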

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than" or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
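The variable check described for the RecWhereClause can be illustrated with a small sketch. The data layout below, a list of (kind, triples) groups with the kinds "BASIC", "OPTIONAL" and "MINUS", is a hypothetical simplification chosen for this example and is not the parser structure of the SKOSRec compiler. Treating an occurrence in an OPTIONAL group as sufficient is likewise an assumption.

```python
def variable_is_bound(variable, groups):
    """Check that a variable occurs in at least one non-MINUS graph pattern.

    groups: list of (kind, triples) pairs, where kind is "BASIC",
    "OPTIONAL" or "MINUS" and triples is a list of
    (subject, predicate, object) string tuples (hypothetical layout).
    """
    return any(
        kind != "MINUS" and variable in triple
        for kind, triples in groups
        for triple in triples
    )


# Hypothetical clause: ?rec appears only inside the MINUS part.
where_clause = [
    ("BASIC", [("?movie", "dbo:director", "?director")]),
    ("MINUS", [("?rec", "rdf:type", "dbo:Film")]),
]
movie_ok = variable_is_bound("?movie", where_clause)  # True
rec_ok = variable_is_bound("?rec", where_clause)      # False
```

A variable that is found only in a MINUS group fails the check, which mirrors the rule that an occurrence in the MINUS part alone is not enough.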

RecGroupGraphPattern: A RecGroupGraphPattern can contain one or more basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is slightly more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow the retrieval of LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. This would not be possible if a user were able to formulate filter conditions that referred to different RDF graphs and endpoints.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases in which users want to omit certain triple statements from the solution.

References

[1] G Adomavicius A Tuzhilin Toward the next generation ofrecommender systems A survey of the state-of-the-art and pos-sible extensions In IEEE Transactions on Knowledge and DataEngineering vol 17 no 6 pp 734ndash749 2005

[2] G Adomavicius A Tuzhilin R Zheng REQUEST A querylanguage for customizing recommendations In InformationSystems Research vol 22 no 1 pp 99ndash117 2011

[3] C Aggarwal Recommender systems Springer 2016[4] M Arenas B Cuenca Grau E Kharlamov S Marciuska D

Zheleznyakov Faceted search over ontology-enhanced RDFdata In Proceedings of the 23rd ACM International Confer-ence on Information and Knowledge Management pp 939ndash9482014

[5] V Ayala M Przyjaciel-Zablocki T Hornung A Schaumltzle GLausen Extending SPARQL for recommendations In Seman-tic Web Information Management 2004

[6] J Beel C Breitinger S Langer A Lommatzsch B GippTowards reproducibility in recommender systems research InUser Modeling and User-adapted Interaction vol 26 no 1 pp69ndash101 2016

28 L Wenige et al Similarity-based Graph Queries

[7] C Boutilier R Brafman C Domshlak H Hoos D Poole CP-nets A tool for representing and reasoning with conditional ce-teris paribus preference statements In Journal of Artificial In-telligence Research (JAIR) vol 21 pp 135ndash191 2004

[8] P Castells S Vargas J Wang Novelty and diversity metrics forrecommender systems choice discovery and relevance 2011

[9] G Cheng W Ge Y Qu Falcons searching and browsing en-tities on the Semantic Web In Proceedings of the 17th ACMInternational Conference on World Wide Web pp 1101ndash11022008

[10] J Davies R Weeks QuizRDF Search technology for the Se-mantic Web In Proceedings of the 37th Annual Hawaii Inter-national Conference on System Sciences 2004

[11] DBpedia URL httpswikidbpediaorg 2017[12] T Di Noia R Mirizz V Ostuni D Romito Exploiting the

web of data in model-based recommender systems In Proceed-ings of the 6th ACM Conference on Recommender Systems (Rec-Sys) pp 253ndash256 2012

[13] T Di Noia R Mirizzi V Ostuni D Romito M ZankerLinked Open Data to support content-based recommender sys-tems In Proceedings of the 8th International Conference onSemantic Systems 2012

[14] D Dillman D Robert D Bowker Principles for construct-ing web surveys In Joint Meetings of the American StatisticalAssociation 1998

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms/, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language. In W3C Recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. In ACM Transactions on Intelligent Systems and Technology (TIST), 8(2), 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data. In 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Discussion & Future Work
  • Appendix A: The SKOSRecommender Query Syntax
  • References

sources with the mappings $\Omega_{r}$ of all relevant resources and annotations, as determined from the user profile.

$$ \Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_{r} \tag{11} $$

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and the respective SKOS similarities are quantified according to the user filter (Definitions 12 and 13). By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources.

Definition 12 (Conditional SKOS similarity). Let Annot(r) be the set of SKOS features of r and Annot(q) the set of SKOS features of resource q, which is contained in the filtered set of annotations and relevant resources ($q \in \{\mu(q) \mid \mu \in \Omega_{pre}\}$). Then the similarity can be derived from the IC of their shared concepts ($C_{cond} = Annot(r) \cap Annot(q)$) (Eq. 12).

$$ sim_{cond}(r, q) = IC(C_{cond}) \tag{12} $$

Definition 13 (Conditional SKOS Information Content). The conditional IC of a set of SKOS annotations is defined by the sum of the IC of each concept $c \in \{\mu(c) \mid \mu \in \Omega_{pre}\}$, where $freq_{cond}(c)$ is the frequency of c among all filtered resources and n is the maximum frequency among these resources (Eq. 13).

$$ IC(C_{cond}) = -\sum_{c \in C_{cond}} \log\left(\frac{freq_{cond}(c)}{n}\right) \tag{13} $$

The system carries out the calculations (i.e., the determination of similarity values and the ranking of LOD resources) in the same way as in non-filtering mode.
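To make Definitions 12 and 13 more tangible, the following Python sketch computes conditional IC values and similarities for a toy set of filtered resources. It is an illustration only and not the SKOSRec implementation: the resource and concept names are hypothetical, the natural logarithm is assumed because the base is not stated, and n is read here as the highest concept frequency within the filtered set.

```python
import math
from collections import Counter


def conditional_ic(shared_concepts, freq_cond, n):
    """Eq. 13: negative sum of log(freq_cond(c) / n) over the shared concepts."""
    return -sum(math.log(freq_cond[c] / n) for c in shared_concepts)


def conditional_similarity(annot_r, annot_q, freq_cond, n):
    """Eq. 12: IC of the concepts that r and q have in common."""
    return conditional_ic(annot_r & annot_q, freq_cond, n)


# Hypothetical filtered resources and their SKOS annotations.
filtered = {
    "q1": {"c_film", "c_crime"},
    "q2": {"c_film", "c_western"},
    "q3": {"c_film", "c_crime", "c_neo_noir"},
}

# Concept frequencies are counted only among the filtered resources.
freq_cond = Counter(c for concepts in filtered.values() for c in concepts)
n = max(freq_cond.values())  # assumption: n is the highest concept frequency

# Annotations of a (hypothetical) preferred item from the user profile.
profile_item = {"c_film", "c_crime", "c_neo_noir"}

scores = {
    q: conditional_similarity(profile_item, annotations, freq_cond, n)
    for q, annotations in filtered.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy setting, a concept shared by every filtered resource contributes nothing to the score, whereas rare shared concepts contribute the most.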

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources; it can also apply user constraints after similar items have already been determined. By this means, recommendations are filtered ex post to the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5
Result set of Q4

director             movie
dbr:Kirk_Douglas     dbr:Scalawag_(film)
dbr:Kirk_Douglas     dbr:Posse_(1975_film)
dbr:Kevin_Costner    dbr:The_Postman_(film)
dbr:Kevin_Costner    dbr:Dances_with_Wolves
dbr:Gene_Kelly       dbr:Hello_Dolly_(film)
dbr:Gene_Kelly       dbr:Invitation_to_the_Dance_(film)
dbr:Alan_Alda        dbr:Goodbye_Farewell_and_Amen
dbr:Alan_Alda        dbr:Margaret's_Engagement
dbr:Steve_Buscemi    dbr:Interview_(2007_film)
dbr:Steve_Buscemi    dbr:Animal_Factory

Listing 6. SKOSRec query to generate simple postfiltered recommendations in the movie domain (Q4)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE {
  ?movie dbo:director ?director .
  ?director rdf:type yago:Director110014939 .
} LIMIT 50
RECOMMEND ?director TOP 5
PREF dbr:Quentin_Tarantino

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.


Another weakness is the biased semantics of the query. Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality, or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies he or she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to a couple of LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).

Aggregation-based requests can be combined well with other retrieval patterns, such as preference queries. In the case of the movie recommendation scenario, a user could state that they enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors who shot similar films. Usually, this kind of question can only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate postfiltered recommendations based on a preference query in the movie domain (Q5)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE {
  ?movie dbo:director ?director .
  ?director rdf:type yago:Director110014939 .
} LIMIT 50
AGG dbr:Quentin_Tarantino ?director SUM
RECOMMEND ?movie TOP 100
PREF [ ?prefMovie
WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino . } ]

Table 6
Result set of Q5

director                    movie
dbr:Steven_Spielberg        dbr:Indiana_Jones
dbr:Steven_Spielberg        dbr:Jurassic_Park_(film)
dbr:Francis_Ford_Coppola    dbr:The_Godfather_Part_II
dbr:Francis_Ford_Coppola    dbr:The_Godfather_Part_III
dbr:Christopher_Nolan       dbr:The_Prestige_(film)
dbr:Christopher_Nolan       dbr:The_Dark_Knight
dbr:Robert_Rodriguez        dbr:Planet_Terror
dbr:Robert_Rodriguez        dbr:Machete_(film)
dbr:Kevin_Smith             dbr:Clerks
dbr:Kevin_Smith             dbr:Clerks_II

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they are entirely different. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style⁵), Q4 provided a completely different output. It shows US-American actor-directors whose movies range from musical films (dbr:Hello_Dolly_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Tab. 5) seem rather arbitrary and not particularly helpful concerning the initial user request.

The combined aggregation-based query can also be described as a roll-up request, since suggestions are based on preferences for sublevel entities. Another example of a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

5 https://en.wikipedia.org/wiki/Quentin_Tarantino


Listing 8. SKOSRec query to generate postfiltered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dul: <http://www.ontologydesignpatterns.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?city
WHERE {
  ?poi ?property ?city .
  ?city rdf:type dbo:City .
  ?property rdfs:subPropertyOf dul:hasLocation .
} LIMIT 3
AGG dbr:London ?city SUM
RECOMMEND ?poi TOP 100
PREF
dbr:Oxford_Street
dbr:Notting_Hill
dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7
Result set of Q6

city
dbr:Oxford
dbr:Abingdon-on-Thames
dbr:Wallingford_Oxford

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere rather than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.

Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?participated ?musicalAct
WHERE {
  ?movie rdf:type dbo:Film .
  VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
  { ?musicalAct ?participated ?movie . }
  UNION
  { ?movie ?participated ?musicalAct . }
} LIMIT 10
AGG dbr:The_Jackson_5 ?movie MAX
RECOMMEND ?musicalAct TOP 100
PREF dbr:The_Jackson_5
WHERE {
  VALUES ?actType { dbo:MusicalArtist dbo:Band }
  ?musicalAct rdf:type ?actType .
}

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are combined with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tabs. 8 and 9). The SKOSRec query produces a more comprehensive solution than a regular SPARQL query.

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?musicalAct
WHERE {
  ?movie rdf:type dbo:Film .
  VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
  { dbr:The_Jackson_5 ?participated ?movie . }
  UNION
  { ?movie ?participated dbr:The_Jackson_5 . }
} LIMIT 5

Table 8
Result set of Q7

movie                          musicalAct
dbr:Moonwalker                 dbr:Michael_Jackson
dbr:The_Wiz_(film)             dbr:Ashford_&_Simpson
dbr:Garfield_Gets_a_Life       dbr:The_Temptations
dbr:Pipe_Dreams_(1976_film)    dbr:Gladys_Knight_&_the_Pips
dbr:The_Jacksons               dbr:Jermaine_Jackson

Table 9
Result set of Q8

movie               musicalAct
dbr:The_Jacksons    dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 14 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 14 (SKOSRec recommendations). SKOSRec recommendations ($R(P_r)$) are the set of suggestions generated by processing the solution mappings ($\Omega$) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, of prefiltering, or of a combination of these techniques. $P_r$ represents the input profile that contains at least one preference for an item. The cardinality of $R(P_r)$ is the intended number of recommendations (k) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 14).

$$ R(P_r) = \{\, x \mid x \in \{\, q \mid score(P_r, q) > \operatorname{kmax}_{q \in \Omega} score(P_r, q) \,\},\ q \notin P_r \,\} \tag{14} $$
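The selection step behind Definition 14 can be sketched in a few lines of Python. This is a simplified reading of Eq. 14, not the engine's code: it drops profile items and returns the k candidates with the highest scores. The function name, the resource identifiers, and the score values are hypothetical.

```python
import heapq


def top_k_recommendations(scores, profile, k):
    """Return the k best-scoring resources that are not part of the profile."""
    candidates = {q: s for q, s in scores.items() if q not in profile}
    return heapq.nlargest(k, candidates, key=candidates.get)


# Hypothetical recommendation scores score(P_r, q) for four resources.
scores = {"item_a": 0.9, "item_b": 0.8, "item_c": 0.5, "item_d": 0.7}
profile = {"item_a"}  # preferred items are never recommended back

recommendations = top_k_recommendations(scores, profile, k=2)
```

Filtering before selecting the top k keeps the cardinality of the result at k whenever enough non-profile candidates exist.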

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions $R(P_r)$ can be joined with any graph pattern $P_{post}$ subsequent to the process of similarity calculation (Definition 15). It does not make a difference how the engine obtained these recommendations.

Definition 15 (Postfiltered recommendation mapping). The postfiltered recommendation mapping ($\Omega_{post}$) is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern (Eqs. 15 and 16).

$$ \Omega_{postfilter} = \{\, \mu \mid \mu \in [[P_{post}]]_{D},\ x \in dom(\mu) \,\} \tag{15} $$

$$ \Omega_{post} = \{\, \mu_{|x} \mid x \in R(P_r) \,\} \bowtie \Omega_{postfilter} \tag{16} $$
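As a rough illustration of Definition 15, the join can be pictured as keeping only those solution mappings of the postfilter pattern whose binding for the recommendation variable is one of the recommended resources. The Python sketch below assumes that solution mappings are represented as plain dictionaries from variable names to resources. The function name and the third example row are hypothetical, while the first two rows reuse entries from Table 5.

```python
def postfilter(recommended, pattern_mappings, var):
    """Keep the mappings whose binding for `var` is a recommended resource."""
    recommended = set(recommended)
    return [
        mu for mu in pattern_mappings
        if var in mu and mu[var] in recommended
    ]


# Solution mappings of a postfilter graph pattern (variable -> resource).
pattern_mappings = [
    {"director": "dbr:Kirk_Douglas", "movie": "dbr:Scalawag_(film)"},
    {"director": "dbr:Kevin_Costner", "movie": "dbr:The_Postman_(film)"},
    {"director": "dbr:Some_Other_Director", "movie": "dbr:Some_Other_Movie"},
]

# Directors assumed to have been returned by the similarity calculation.
recommended_directors = ["dbr:Kirk_Douglas", "dbr:Kevin_Costner"]

omega_post = postfilter(recommended_directors, pattern_mappings, "director")
```

Because the filter only inspects the bindings, it is independent of how the recommendations were produced, which mirrors the remark that the retrieval mode does not matter for postfiltering.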

Like the prefilter, $P_{post}$ can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item (a) and similar sublevel entities. Additionally, the user has to specify the variable in the profile (e.g., y) that represents the items within the postfilter graph pattern $P_{post}$. By this means, aggregation-based recommendations can be generated (Definition 16).

Definition 16 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item a in the profile (Eq. 17).

$$ R_{agg}(a) = \{\, \mu(y) \mid \mu \in \Omega_{post},\ y \in dom(\mu),\ a \notin \mu(y) \,\} \tag{17} $$

For each potential aggregation-based suggestion (y), similarity scores of connected entities (x) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 17 (Aggregation-based ranking score). The ranking score of an LOD resource y that is connected to a set of recommended items is determined by aggregating scores of connected entities, either by the maximum (Eq. 18), the sum (Eq. 19), or the average (Eq. 20) of individual scores.

$$ score_{aggMAX}(y) = \max_{x \in \mu(x, y)} score(P_r, x) \tag{18} $$

$$ score_{aggSUM}(y) = \sum_{x \in \mu(x, y)} score(P_r, x) \tag{19} $$

$$ score_{aggAVG}(y) = \frac{\sum_{x \in \mu(x, y)} score(P_r, x)}{|\{x \in \mu(x, y)\}|} \tag{20} $$
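The three aggregation variants of Eqs. 18 to 20 can be compared with a small Python sketch. It is meant as an illustration under stated assumptions rather than as the engine's implementation: links between sublevel items x and superordinate items y are given as plain pairs, the excluded argument stands in for the preferred superordinate item a, and all identifiers and score values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Aggregation functions for Eq. 18 (MAX), Eq. 19 (SUM), and Eq. 20 (AVG).
AGGREGATORS = {"MAX": max, "SUM": sum, "AVG": mean}


def aggregate_scores(links, item_scores, mode="SUM", excluded=None):
    """Aggregate scores of recommended items x per connected resource y."""
    grouped = defaultdict(list)
    for x, y in links:
        # Skip the preferred superordinate item and unscored items.
        if y != excluded and x in item_scores:
            grouped[y].append(item_scores[x])
    aggregate = AGGREGATORS[mode]
    return {y: aggregate(values) for y, values in grouped.items()}


# Hypothetical similarity scores of recommended movies.
item_scores = {"movie_a": 0.4, "movie_b": 0.3, "movie_c": 0.6}

# Hypothetical (movie, director) links from the postfilter pattern.
links = [
    ("movie_a", "director_1"),
    ("movie_b", "director_1"),
    ("movie_c", "director_2"),
    ("movie_c", "preferred_director"),
]

by_sum = aggregate_scores(links, item_scores, "SUM", excluded="preferred_director")
by_max = aggregate_scores(links, item_scores, "MAX", excluded="preferred_director")
by_avg = aggregate_scores(links, item_scores, "AVG", excluded="preferred_director")
```

In this toy example, sum-based aggregation ranks director_1 first because two moderately similar movies add up, whereas maximum-based aggregation favors director_2, which reflects the trade-off discussed above.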

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in the context of web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach out to more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might have caused biased results otherwise. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology.⁶ ⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users ran step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., through rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

6 http://tomcat.apache.org/
7 http://www.oracle.com/technetwork/java/javaee/servlet/index.html

Subjects were recruited through the clickworker.com platform.⁸ Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just clicked through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.⁹

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.¹⁰ The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

8 https://www.clickworker.de
9 http://api.jquery.com/jquery.ajax/
10 http://lucene.apache.org/solr/

Fig. 2. User profile generation, TC1 (music domain)

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply a screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels, or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance, outer right side: highest relevance, see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale. The score, called mean relevance score (mrs), was measured with Equation 21.

$$ mrs = \frac{\sum_{i=1}^{k} score_{i}}{k} \tag{21} $$

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, in the case of an RS only suggesting popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 22).

$$ nv = \frac{|nov|}{|tp|} \tag{22} $$
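Equations 21 and 22 translate directly into code. The Python sketch below is a hedged illustration with hypothetical slider ratings: the field names score and new are assumptions, and the relevance threshold of 50 follows the paper's criterion for relevant items. Returning None for lists without relevant items is a design choice of this sketch, since Eq. 22 is undefined in that case.

```python
def mean_relevance(scores):
    """Eq. 21: mean of the slider scores (0 to 100) of a recommendation list."""
    return sum(scores) / len(scores)


def novelty(ratings, threshold=50):
    """Eq. 22: share of relevant items that were new to the user."""
    relevant = [r for r in ratings if r["score"] >= threshold]
    if not relevant:
        return None  # undefined when no item is relevant
    return sum(1 for r in relevant if r["new"]) / len(relevant)


# Hypothetical ratings of one participant for a list of k = 4 suggestions.
ratings = [
    {"score": 80, "new": True},
    {"score": 60, "new": False},
    {"score": 30, "new": True},
    {"score": 50, "new": True},
]

mrs = mean_relevance([r["score"] for r in ratings])
nv = novelty(ratings)
```

Note that the third item is new to the user but does not count toward novelty, because it falls below the relevance threshold.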

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

An item was considered relevant when it achieved an mrs score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 65]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse the result set is [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 23) and averaged for the recommendation list (Eq. 24).

$$ d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{23} $$

$$ div_{C} = \frac{2}{k(k - 1)} \sum_{l < m} d(i_l, i_m) \tag{24} $$
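A minimal Python sketch of Eqs. 23 and 24 is given below. It assumes that each recommended item is represented by a numeric feature vector. How these vectors are built is not specified here, so the example vectors are purely hypothetical, and treating zero vectors as having a similarity of 0 is an assumption of this sketch.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equally long feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # assumption: zero vectors yield 0


def diversity(vectors):
    """Eq. 24: average pairwise cosine distance (Eq. 23) of a list of items."""
    k = len(vectors)
    if k < 2:
        return 0.0  # a single item has no pairs to compare
    total = sum(1 - cosine(u, v) for u, v in combinations(vectors, 2))
    return 2 * total / (k * (k - 1))


# Hypothetical feature vectors of three recommended items.
items = [[1, 0], [0, 1], [1, 0]]
div_c = diversity(items)
```

Since combinations enumerates each unordered pair exactly once, the factor 2 / (k(k - 1)) turns the sum into the mean over all pairs with l < m.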

The content-based diversity scores ($div_{C}$) were calculated in the backend of the web application and saved in a log file for each generated recommendation list of the SKOSRec engine during runtime execution. In addition to the measurement of $div_{C}$ scores, we asked participants about their opinions on recommendation list diversity directly. Following the RS evaluation framework by Pu et al., two Likert items were posed for this purpose [44]:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al., which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions [44]. The statements were adapted to the specificities of each usage scenario. Figure 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to prove whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filter and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Figure 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition on the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹
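The order of operations described for constraint-based retrieval (apply the user constraint to the candidate set first, then compute similarities) can be sketched in Python. This is a toy illustration and not the SKOSRec implementation: the in-memory `catalog`, the set-overlap similarity `overlap` and the function `recommend` are hypothetical stand-ins for the LOD repository, the engine's similarity measure and its query workflow.

```python
def recommend(profile, candidates, constraint, similarity, k=10):
    """Filter candidates by the user constraint, then rank by similarity."""
    # Step 1: the filter condition narrows the set of potential results.
    filtered = [item for item in candidates if constraint(item)]
    # Step 2: similarity is only computed for items that passed the filter.
    ranked = sorted(filtered,
                    key=lambda item: similarity(profile, item),
                    reverse=True)
    return ranked[:k]


def overlap(profile, item):
    """Toy similarity: number of subject annotations shared with the profile."""
    return len(profile & item["subjects"])


# Hypothetical music catalog with a genre attribute and subject annotations.
catalog = [
    {"name": "A", "genre": "Jazz", "subjects": {"s1", "s2"}},
    {"name": "B", "genre": "Rock", "subjects": {"s1", "s2", "s3"}},
    {"name": "C", "genre": "Jazz", "subjects": {"s3"}},
]
profile = {"s1", "s2", "s3"}

jazz_only = recommend(profile, catalog, lambda i: i["genre"] == "Jazz", overlap)
print([item["name"] for item in jazz_only])
```

In this sketch item B is the best match for the profile, but it never reaches the ranking step because it violates the genre constraint.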

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users

¹¹ http://dbpedia.org


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement "The filter has improved the recommendation results of list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar rollup query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during the stay in another destination (i.e., a city, region or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three POIs belonging to it, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act (music domain) or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). The SKOSRec engine thus generated recommendations based on the creative works of the favored artist. Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

Fig. 5. Web form for a constraint-based recommendation request (music domain)

Fig. 6. Travel - User profile generation TC3

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.


Fig. 7. Music - User profile generation TC4 (page 9)

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matchings can occur when a user states his/her favorite movie that is written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose that, in his/her profile, a consumer has declared a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The gathered samples from the experiments contained answers from both genders, with a slight overrepresentation of male participants (53.7 %). The majority of study subjects were between 20 and 40 years of age (65.8 %). The evaluation of the demographics section also revealed that the majority of participants already perceive the search in their domain of interest as either "manageable" (38.5 %) or even as "easy" or "very easy" (44.0 %).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
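For readers unfamiliar with the reliability check, the following Python sketch computes Cronbach's α with the standard textbook formula α = k/(k−1) · (1 − Σ item variances / variance of the summed scale). The paper does not show its computation, so the function `cronbach_alpha` and the sample ratings are illustrative assumptions.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k Likert items.

    items: list of k lists; items[j][p] is participant p's answer to item j.
    """
    k = len(items)
    # Total scale score per participant (sum over all items).
    totals = [sum(answers) for answers in zip(*items)]
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical answers of four participants to three usefulness statements.
ratings = [
    [4, 3, 5, 2],
    [5, 3, 4, 2],
    [4, 2, 5, 3],
]
print(round(cronbach_alpha(ratings), 4))
```

Values above the 0.7 threshold mentioned in the text indicate that the items vary together closely enough to be combined into a single scale score.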

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. To ensure that valuable user assessments did not remain unused, agreements to the statement "The recommendations of list [] are diverse" were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M),¹² standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be above mediocrity. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


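Table 10. Test cases in the web experiments (X = conducted, - = not conducted)

| Test Case | Evaluation Purpose | Query Type | Travel | Movie | Music | Book |
|---|---|---|---|---|---|---|
| TC1 | On-the-fly Recommendations (Baseline) | Q1 | X | X | X | X |
| TC2 | Constraint-based Recommendations | Q3 | X | X | X | X |
| TC3 | Advanced Queries (Rollup) | Q5, Q6 | X | X | X | X |
| TC4 | Advanced Queries (Cross-Domain) | Q7 | - | X | X | X |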

Table 11. Cronbach's α values for the concept of Perceived Usefulness

| Domain | Cronbach's α |
|---|---|
| Travel | 0.7002 |
| Movie | 0.8579 |
| Music | 0.8515 |
| Book | 0.8949 |

baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse was 3 ("neutral") in the multimedia domains and 4 ("agree") in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement "The filter has improved the recommendation results of the list". Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 ("strongly agree"), 4 ("agree") or 3 ("neutral"). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | M | SD | N |
|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 10.00 | 0.00 | 103 |
| | Movie | 10.00 | 0.00 | 50 |
| | Music | 10.00 | 0.00 | 53 |
| | Book | 9.80 | 1.40 | 51 |
| Accuracy (mrs) [1] | Travel | 56.49 | 0.00 | 103 |
| | Movie | 62.25 | 19.15 | 50 |
| | Music | 55.70 | 0.10 | 53 |
| | Book | 57.47 | 15.55 | 50 |
| Novelty (nv) [1] | Travel | 0.60 | 0.34 | 101 |
| | Movie | 0.36 | 0.31 | 50 |
| | Music | 0.36 | 0.39 | 52 |
| | Book | 0.61 | 0.29 | 47 |
| Diversity (divU) [5] | Travel | 4 | - | 102 |
| | Movie | 3 | - | 50 |
| | Music | 3 | - | 53 |
| | Book | 3 | - | 50 |
| Diversity (divC) [1] | Travel | 0.80 | 0.16 | 103 |
| | Movie | 0.87 | 0.12 | 50 |
| | Music | 0.86 | 0.10 | 50 |
| | Book | 0.92 | 0.08 | 50 |
| Usefulness [5] | Travel | 3.34 | 0.85 | 103 |
| | Movie | 3.55 | 0.73 | 50 |
| | Music | 3.18 | 0.85 | 53 |
| | Book | 3.43 | 0.81 | 50 |

[Figure 8: chart titled "The filter has improved the recommendations of the list"; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

Table 13. Response rates in % (TC2)

| Domain | Constr.-based (Regular) | Constr.-based (Expanded) | N |
|---|---|---|---|
| Travel | 23 | 57 | 103 |
| Movie | 86 | 88 | 50 |
| Music | 72 | 75 | 53 |
| Book | 73 | 76 | 51 |
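The response rate reported in Table 13 is defined in the text as the ratio of non-zero result sets among all result sets. A minimal Python sketch of this definition follows; the function name `response_rate` and the sample sessions are hypothetical.

```python
def response_rate(result_lists):
    """Share of requests that returned at least one recommendation."""
    if not result_lists:
        return 0.0  # convention of this sketch for an empty log
    non_empty = sum(1 for result in result_lists if len(result) > 0)
    return non_empty / len(result_lists)


# Four hypothetical user sessions, two of which produced no suggestions.
sessions = [["item1"], [], ["item2", "item3"], []]
print(response_rate(sessions))
```

Multiplying the result by 100 gives the percentage form used in the table.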

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
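The text does not name the FDR procedure that was used. As an assumption, the following Python sketch shows the widely used Benjamini-Hochberg step-up procedure, which is one common way to control the false discovery rate across several tests. The function `benjamini_hochberg` and the p-values are illustrative only.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Assumed procedure: Benjamini-Hochberg step-up at FDR level q.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank whose p-value stays below its rank-dependent threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    # Reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from four metric comparisons.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.20]))
```

Compared with a Bonferroni correction, this procedure is less conservative, which matters when many metrics are tested on small samples.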

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, neither approach was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.


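Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | Simple M | Simple SD | Expressive M | Expressive SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 1.17 | 2.70 | 4.52 | 4.62 | 103 |
| | Movie | 6.72 | 4.08 | 7.20 | 3.92 | 50 |
| | Music | 5.96 | 4.64 | 6.36 | 4.51 | 53 |
| | Book | 4.59 | 4.50 | 5.00 | 4.51 | 51 |
| Accuracy (mrs) [1] | Travel | 66.73 | 20.51 | 65.34 | 22.36 | 23 |
| | Movie | 60.95 | 22.98 | 60.29 | 22.79 | 43 |
| | Music | 63.77 | 16.05 | 62.96 | 16.44 | 34 |
| | Book | 56.17 | 18.18 | 56.81 | 16.81 | 37 |
| Novelty (nv) [1] | Travel | 0.48 | 0.41 | 0.61 | 0.49 | 21 |
| | Movie | 0.44 | 0.36 | 0.55 | 0.80 | 42 |
| | Music | 0.55 | 0.39 | 0.52 | 0.37 | 38 |
| | Book | 0.77 | 0.34 | 0.76 | 0.33 | 33 |
| Diversity (divU) [5] | Travel | 3 | - | 3.5 | - | 24 |
| | Movie | 4 | - | 4 | - | 43 |
| | Music | 4 | - | 4 | - | 38 |
| | Book | 3 | - | 3 | - | 37 |
| Diversity (divC) [1] | Travel | 0.68 | 0.18 | 0.72 | 0.21 | 19 |
| | Movie | 0.70 | 0.24 | 0.73 | 0.23 | 41 |
| | Music | 0.86 | 0.13 | 0.87 | 0.12 | 34 |
| | Book | 0.84 | 0.24 | 0.95 | 0.63 | 29 |
| Usefulness [5] | Travel | 3.33 | 0.93 | 3.50 | 0.77 | 24 |
| | Movie | 3.32 | 0.85 | 3.31 | 0.86 | 43 |
| | Music | 3.55 | 0.81 | 3.64 | 0.78 | 38 |
| | Book | 3.39 | 0.83 | 3.62 | 1.21 | 37 |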

Table 15. Jaccard indices (TC2)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.57 | 0.47 | 25 |
| Movie | 0.71 | 0.38 | 43 |
| Music | 0.82 | 0.35 | 37 |
| Book | 0.92 | 0.26 | 35 |
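The Jaccard index used to compare two recommendation lists is not spelled out in the text. Assuming the standard set-based definition (size of the intersection divided by size of the union), it can be sketched in Python as follows. The function name `jaccard_index` and the item identifiers are hypothetical.

```python
def jaccard_index(list_a, list_b):
    """Overlap of two recommendation lists, ignoring rank positions."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        return 0.0  # convention of this sketch when both lists are empty
    return len(a & b) / len(a | b)


# Two hypothetical result lists sharing two of four distinct items.
simple_filter = ["x", "y", "z"]
expressive_filter = ["y", "z", "w"]
print(jaccard_index(simple_filter, expressive_filter))
```

A value close to 1 means that both methods returned nearly the same items, whereas a value close to 0 means that the lists barely overlap.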

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Figure 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels that lay slightly above neutral

Table 16. Response rates in % (TC3)

| Domain | Regular | Aggr.-based | N |
|---|---|---|---|
| Travel | 99 | 100 | 103 |
| Movie | 96 | 92 | 50 |
| Music | 100 | 100 | 53 |
| Book | 96 | 100 | 51 |

agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach


[Figure 9: chart titled "Perceived Usefulness (Regular vs. Roll-up Queries)"; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up)]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3)

explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas in case a regular SPARQL query was executed, they often did not receive a single suggestion.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences, except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.
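The degrees of freedom reported above equal the number of participants minus one, which is what a paired (within-subjects) t-test yields. As an illustration, the test statistic can be computed in Python with the standard library. The function `paired_t` and the list sizes below are hypothetical, and the sign convention (first condition minus second) is an assumption of this sketch.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two matched samples."""
    # Per-participant difference between the two conditions.
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical list sizes of four participants (baseline vs. cross-domain).
baseline_sizes = [1, 2, 3, 4]
cross_domain_sizes = [2, 4, 5, 8]
t, df = paired_t(baseline_sizes, cross_domain_sizes)
print(f"t({df}) = {t:.2f}")
```

With this convention, a negative t value means that the second condition produced larger values than the first, which matches the direction of the results reported for list sizes.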

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results in more than the other half of


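Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | Regular M | Regular SD | Aggr.-based M | Aggr.-based SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 9.90 | 0.99 | 9.42 | 2.07 | 103 |
| | Movie | 2.88 | 0.59 | 2.76 | 0.82 | 50 |
| | Music | 10.00 | 0.00 | 10.00 | 0.00 | 53 |
| | Book | 2.88 | 0.59 | 3.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Travel | 48.21 | 24.98 | 44.93 | 25.59 | 102 |
| | Movie | 54.58 | 22.88 | 56.69 | 25.42 | 44 |
| | Music | 53.95 | 18.31 | 58.24 | 16.19 | 52 |
| | Book | 51.53 | 23.90 | 49.25 | 22.30 | 49 |
| Novelty (nv) [1] | Travel | 0.43 | 0.39 | 0.38 | 0.40 | 98 |
| | Movie | 0.35 | 0.38 | 0.28 | 0.41 | 41 |
| | Music | 0.49 | 0.36 | 0.46 | 0.39 | 50 |
| | Book | 0.61 | 0.42 | 0.62 | 0.44 | 47 |
| Diversity (divU) [5] | Travel | 3.5 | - | 3 | - | 100 |
| | Movie | 4 | - | 3 | - | 43 |
| | Music | 4 | - | 3 | - | 50 |
| | Book | 3 | - | 3 | - | 48 |
| Diversity (divC) [1] | Travel | 0.79 | 0.17 | 0.89 | 0.13 | 99 |
| | Movie | 0.75 | 0.20 | 0.66 | 0.27 | 44 |
| | Music | 0.83 | 0.17 | 0.93 | 0.07 | 53 |
| | Book | 0.74 | 0.27 | 0.93 | 0.06 | 48 |
| Usefulness [5] | Travel | 3.36 | 0.77 | 3.22 | 0.74 | 101 |
| | Movie | 3.20 | 0.92 | 3.26 | 0.94 | 44 |
| | Music | 3.45 | 0.84 | 3.44 | 0.85 | 51 |
| | Book | 2.95 | 0.87 | 3.03 | 0.96 | 48 |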

Table 18. Jaccard indices (TC3)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.07 | 0.11 | 101 |
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.02 | 0.05 | 53 |
| Book | 0.02 | 0.06 | 50 |

Table 19. Response rates in % (TC4)

| Domain | Regular (SPARQL) | SKOSRec | N |
|---|---|---|---|
| Movie | 32 | 96 | 50 |
| Music | 62 | 98 | 53 |
| Book | 53 | 100 | 51 |

the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and

[Figure 10: chart titled "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend: Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)

pattern matching can facilitate an improved exploration of LOD repositories.

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


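Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | Regular (SPARQL) M | Regular (SPARQL) SD | Cross-Domain M | Cross-Domain SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Movie | 0.82 | 2.09 | 8.66 | 2.73 | 50 |
| | Music | 0.89 | 1.59 | 9.30 | 2.03 | 53 |
| | Book | 2.90 | 3.98 | 10.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Movie | 67.41 | 27.62 | 53.65 | 14.71 | 16 |
| | Music | 62.36 | 31.76 | 48.32 | 22.49 | 20 |
| | Book | 66.42 | 21.28 | 55.59 | 16.84 | 27 |
| Novelty (nv) [1] | Movie | 0.59 | 0.49 | 0.41 | 0.34 | 16 |
| | Music | 0.72 | 0.44 | 0.59 | 0.40 | 19 |
| | Book | 0.54 | 0.41 | 0.63 | 0.30 | 27 |
| Diversity (divU) [5] | Movie | 3 | - | 4 | - | 16 |
| | Music | 3 | - | 3 | - | 20 |
| | Book | 3 | - | 4 | - | 27 |
| Diversity (divC) [1] | Movie | (0.64) | (0.35) | (0.99) | (0.02) | (6) |
| | Music | (0.90) | (0.13) | (0.98) | (0.03) | (10) |
| | Book | (0.84) | (0.12) | (0.87) | (0.1) | (8) |
| Usefulness [5] | Movie | 3.57 | 0.83 | 3.35 | 1.57 | 15 |
| | Music | 3.53 | 1.17 | 3.11 | 0.98 | 20 |
| | Book | 3.41 | 1.04 | 3.41 | 0.94 | 27 |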

Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33
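The Jaccard index reported in these tables measures the overlap between two recommendation lists. The following is a minimal sketch, assuming that each list is treated as a set of resource identifiers; the function name and the example resources are hypothetical and not taken from the experiments.

```python
def jaccard_index(list_a, list_b):
    """Return |A ∩ B| / |A ∪ B| for two recommendation lists treated as sets."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty lists are treated as having no overlap.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical result lists of a regular and an advanced request.
regular = ["dbr:A", "dbr:B", "dbr:C"]
advanced = ["dbr:C", "dbr:D", "dbr:E"]

# One shared item out of five distinct items.
print(jaccard_index(regular, advanced))  # 0.2
```

A value close to 0, as in the tables, indicates that the two retrieval strategies return largely different items.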

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research should investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine, but a back-end application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain a couple of preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than" or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from any graph other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints accordingly, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution, but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
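The compiler check described for the RecWhereClause (a variable must occur at least once, and not only inside a MINUS group) can be illustrated as follows. This is a simplified sketch and not the SKOSRec compiler itself: it works on the query text with regular expressions and assumes that MINUS groups are not nested, whereas a real compiler would inspect the parse tree. The function name and the example clause are hypothetical.

```python
import re


def occurs_outside_minus(variable, where_clause):
    """Check that a variable such as '?movie' occurs outside of MINUS groups.

    Assumption: MINUS groups contain no nested braces.
    """
    # Drop every 'MINUS { ... }' group before searching for the variable.
    without_minus = re.sub(r"MINUS\s*\{[^{}]*\}", " ", where_clause)
    # Match the variable as a whole token, so '?mov' does not match '?movie'.
    token = re.escape(variable) + r"(?![A-Za-z0-9_])"
    return re.search(token, without_minus) is not None


clause = "{ ?movie rdf:type dbo:Film . MINUS { ?movie dbo:director ?d . } }"

print(occurs_outside_minus("?movie", clause))  # True
print(occurs_outside_minus("?d", clause))      # False
```

In this example, `?d` would be rejected as a recommendation variable because it is only bound inside the MINUS part of the group.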

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In User Modeling and User-Adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In W3C Recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In User Modeling and User-Adapted Interaction, vol. 22, no. 4–5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5–6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


Another weakness is the biased semantics of the query. Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies he/she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to a couple of LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).

Aggregation-based requests can be combined well with other retrieval patterns, such as preference queries. In the case of the movie recommendation scenario, a user could state that he enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors who shot similar films. Usually, this kind of question can only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate post-filtered recommendations based on a preference query in the movie domain (Q5)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE {
      ?movie dbo:director ?director .
      ?director rdf:type yago:Director110014939 .
    } LIMIT 50
    AGG dbr:Quentin_Tarantino ?director SUM
    RECOMMEND ?movie TOP 100
    PREF [ ?prefMovie
    WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino . } ]

Table 6. Result set of Q5

director                   movie
dbr:Steven_Spielberg       dbr:Indiana_Jones
dbr:Steven_Spielberg       dbr:Jurrasic_Park_(film)
dbr:Francis_Ford_Coppola   dbr:The_Godfather_Part_II
dbr:Francis_Ford_Coppola   dbr:The_Godfather_Part_III
dbr:Christopher_Nolan      dbr:The_Prestige_(film)
dbr:Christopher_Nolan      dbr:The_Dark_Knight
dbr:Robert_Rodriguez       dbr:Planet_Terror
dbr:Robert_Rodriguez       dbr:Machete_(film)
dbr:Kevin_Smith            dbr:Clerks
dbr:Kevin_Smith            dbr:Clerks_II

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they differ completely. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style⁵), Q4 provided a different output. It shows US-American actor-directors whose movies range from musical films (dbr:Hello_Dolly_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Tab. 5) seem rather arbitrary and not exceptionally helpful concerning the initial user request.

The combined aggregation-based query can also be described as a roll-up request, since suggestions are based on preferences for sublevel entities. Another example of a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

⁵ https://en.wikipedia.org/wiki/Quentin_Tarantino


Listing 8. SKOSRec query to generate post-filtered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dul: <http://www.ontologydesignpatterns.org>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?city
    WHERE {
      ?poi ?property ?city .
      ?city rdf:type dbo:City .
      ?property rdfs:subPropertyOf dul:hasLocation .
    } LIMIT 3
    AGG dbr:London ?city SUM
    RECOMMEND ?poi TOP 100
    PREF
      dbr:Oxford_Street
      dbr:Notting_Hill
      dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7. Result set of Q6

city
dbr:Oxford
dbr:Abingdon-on-Thames
dbr:Wallingford_Oxford
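The roll-up step behind Q6 can be sketched as follows. This is only an illustration under stated assumptions and not the SKOSRec implementation: the function `roll_up`, the POI identifiers and the similarity scores are hypothetical, and the engine is assumed to have already computed one similarity score per recommended POI. The sketch aggregates these scores per parent entity (SUM, MAX or AVG), skips the resource named in the AGG clause and returns the highest-ranked parents.

```python
from collections import defaultdict

# Aggregation modes mirroring the SUM, MAX and AVG options of the AGG clause.
REDUCERS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda values: sum(values) / len(values),
}


def roll_up(item_scores, parent_of, mode="SUM", exclude=None, top=3):
    """Aggregate sublevel similarity scores per parent and rank the parents."""
    grouped = defaultdict(list)
    for item, score in item_scores.items():
        parent = parent_of.get(item)
        if parent is None or parent == exclude:
            continue  # unlinked items and the excluded AGG resource are skipped
        grouped[parent].append(score)
    totals = {parent: REDUCERS[mode](scores) for parent, scores in grouped.items()}
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)[:top]


# Hypothetical similarity scores of recommended POIs (not taken from the paper).
poi_scores = {"poi:a": 0.5, "poi:b": 0.25, "poi:c": 0.5, "poi:d": 1.0}
poi_city = {
    "poi:a": "dbr:Oxford",
    "poi:b": "dbr:Oxford",
    "poi:c": "dbr:Abingdon-on-Thames",
    "poi:d": "dbr:London",
}

print(roll_up(poi_scores, poi_city, mode="SUM", exclude="dbr:London"))
# [('dbr:Oxford', 0.75), ('dbr:Abingdon-on-Thames', 0.5)]
```

With sum-based aggregation, a city that is linked to many moderately similar POIs can outrank a city with a single highly similar POI; maximum-based aggregation would rank by the best single match instead.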

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere rather than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.

Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?participated ?musicalAct
    WHERE {
      ?movie rdf:type dbo:Film .
      VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
      { ?musicalAct ?participated ?movie . }
      UNION
      { ?movie ?participated ?musicalAct . }
    } LIMIT 10
    AGG dbr:The_Jackson_5 ?movie MAX
    RECOMMEND ?musicalAct TOP 100
    PREF dbr:The_Jackson_5
    WHERE {
      VALUES ?actType { dbo:MusicalArtist dbo:Band }
      ?musicalAct rdf:type ?actType .
    }

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are summarized with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tabs. 8 and 9). The SKOSRec query produces a more comprehensive solution than a regular SPARQL query.

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?musicalAct
WHERE {
  ?movie rdf:type dbo:Film .
  VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
  { dbr:The_Jackson_5 ?participated ?movie . }
  UNION
  { ?movie ?participated dbr:The_Jackson_5 . }
}
LIMIT 5

Table 8. Result set of Q7

movie                         musicalAct
dbr:Moonwalker                dbr:Michael_Jackson
dbr:The_Wiz_(film)            dbr:Ashford_&_Simpson
dbr:Garfield_Gets_a_Life      dbr:The_Temptations
dbr:Pipe_Dreams_(1976_films)  dbr:Gladys_Knight_&_the_Pips
dbr:The_Jacksons              dbr:Jermanine_Jackson

Table 9. Result set of Q8

movie             musicalAct
dbr:The_Jacksons  dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 14 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 14 (SKOSRec recommendations). SKOSRec recommendations (R(Pr)) are the set of suggestions generated by processing the solution mappings (Ω) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, of prefiltering, or of a combination of these techniques. Pr represents the input profile that contains at least one preference for an item. The cardinality of R(Pr) is the intended number of recommendations (k) that is specified by the user. Based on this number, the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 14).

$$R(P_r) = \{\, x \mid x \in \{\, q \mid \operatorname{score}(P_r, q) > k\text{-}\max_{q \in \Omega} \operatorname{score}(P_r, q),\ q \notin P_r \,\} \,\} \tag{14}$$
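The selection step of Definition 14 can be illustrated with a minimal Python sketch. It assumes that similarity scores have already been computed and are available as a dictionary; the function name `top_k_recommendations` and the example resources are hypothetical and not part of the SKOSRec engine.

```python
from typing import Dict, List, Set


def top_k_recommendations(scores: Dict[str, float], profile: Set[str], k: int) -> List[str]:
    """Return the k highest-scoring resources that are not part of the profile."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Drop resources that the user already stated as preferences.
    candidates = [(res, s) for res, s in scores.items() if res not in profile]
    # Highest score first; ties are broken alphabetically to keep the output stable.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [res for res, _ in candidates[:k]]


# Hypothetical scores, for illustration only.
scores = {"dbr:A": 0.9, "dbr:B": 0.4, "dbr:C": 0.7, "dbr:D": 0.1}
recommended = top_k_recommendations(scores, profile={"dbr:A"}, k=2)
print(recommended)
```

In this toy example the profile item dbr:A is skipped although it has the highest score, which mirrors the requirement that recommended resources must not be contained in the profile.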

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions R(Pr) can be joined with any graph pattern Ppost after the process of similarity calculation (Definition 15). It makes no difference how the engine obtained these recommendations.

Definition 15 (Postfiltered recommendation mapping). The postfiltered recommendation mapping (Ωpost) is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern (Eqs. 15 and 16).

$$\Omega_{postfilter} = \{\, \mu \mid \mu \in [\![P_{post}]\!]_D,\ x \in \operatorname{dom}(\mu) \,\} \tag{15}$$

$$\Omega_{post} = \{\, \mu|_{x} \mid x \in R(P_r) \,\} \bowtie \Omega_{postfilter} \tag{16}$$
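The join of Eqs. 15 and 16 can be sketched in Python as follows. The sketch assumes that solution mappings are plain dictionaries from variable names to resources and that the recommended item is bound to a variable called `x`; both the representation and the names are illustrative assumptions, not the engine's internal data model.

```python
from typing import Dict, List

Mapping = Dict[str, str]  # variable name -> bound LOD resource


def postfilter_recommendations(recommended: List[str],
                               postfilter_mappings: List[Mapping],
                               var: str = "x") -> List[Mapping]:
    """Keep postfilter mappings whose binding of `var` is a recommended item."""
    allowed = set(recommended)
    # Mappings that leave `var` unbound cannot be joined and are dropped.
    return [mu for mu in postfilter_mappings if mu.get(var) in allowed]


# Toy data: two recommended music acts and four postfilter matches.
recommended = ["ex:act1", "ex:act2"]
postfilter_mappings = [
    {"x": "ex:act1", "y": "ex:movieA"},
    {"x": "ex:act2", "y": "ex:movieB"},
    {"x": "ex:act9", "y": "ex:movieC"},  # act9 was not recommended
    {"y": "ex:movieD"},                  # x is unbound
]
joined = postfilter_recommendations(recommended, postfilter_mappings)
print(joined)
```

Because the join only inspects the binding of `x`, it is independent of how the recommendations were produced, which corresponds to the remark that the retrieval mode does not matter.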

Like the prefilter, Ppost can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently from a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item (a) and similar sublevel entities. Additionally, the user has to specify the variable (e.g., y) that represents the items within the postfilter graph pattern Ppost. By this means, aggregation-based recommendations can be generated (Definition 16).

Definition 16 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item a in the profile (Eq. 17).

$$R_{agg}(a) = \{\, \mu(y) \mid \mu \in \Omega_{post},\ y \in \operatorname{dom}(\mu),\ a \notin \mu(y) \,\} \tag{17}$$

For each potential aggregation-based suggestion (y), similarity scores of connected entities (x) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. When both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 17 (Aggregation-based ranking score). The ranking score of an LOD resource y that is connected to a set of recommended items is determined by aggregating scores of connected entities either by the maximum (Eq. 18), the sum (Eq. 19), or the average (Eq. 20) of individual scores.

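$$\operatorname{score}_{aggMAX}(y) = \max_{x \in \mu(x,y)} \operatorname{score}(P_r, x) \tag{18}$$

$$\operatorname{score}_{aggSUM}(y) = \sum_{x \in \mu(x,y)} \operatorname{score}(P_r, x) \tag{19}$$

$$\operatorname{score}_{aggAVG}(y) = \frac{\sum_{x \in \mu(x,y)} \operatorname{score}(P_r, x)}{|\{x \in \mu(x,y)\}|} \tag{20}$$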
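The three aggregation variants of Eqs. 18 to 20 can be compared in a short Python sketch. It assumes that the postfiltered mappings are given as (x, y) pairs and that the similarity score of every x is known; the function name `aggregate_scores` and the toy values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_scores(pairs: Iterable[Tuple[str, str]],
                     score: Dict[str, float],
                     mode: str = "max") -> Dict[str, float]:
    """Aggregate the scores of similar items x per connected resource y."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for x, y in pairs:
        grouped[y].append(score[x])
    if mode == "max":   # Eq. 18
        return {y: max(values) for y, values in grouped.items()}
    if mode == "sum":   # Eq. 19
        return {y: sum(values) for y, values in grouped.items()}
    if mode == "avg":   # Eq. 20
        return {y: sum(values) / len(values) for y, values in grouped.items()}
    raise ValueError(f"unknown aggregation mode: {mode}")


# Toy data: movieA is connected to two similar acts, movieB to one.
pairs = [("act1", "movieA"), ("act2", "movieA"), ("act3", "movieB")]
score = {"act1": 0.25, "act2": 0.75, "act3": 0.5}
by_max = aggregate_scores(pairs, score, "max")
by_sum = aggregate_scores(pairs, score, "sum")
by_avg = aggregate_scores(pairs, score, "avg")
print(by_max, by_sum, by_avg)
```

With these values, movieA outranks movieB under maximum and sum aggregation, while both receive the same score under average aggregation. This illustrates why the choice of aggregation function depends on the granularity of the entities involved.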

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have caused biased results. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology (see footnotes 6 and 7).

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls such as misleading navigation. Two test users went step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., by rephrasing user instructions) to increase the chances of successful completion. If the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

6 http://tomcat.apache.org
7 http://www.oracle.com/technetwork/java/javaee/servlet/index.html

Subjects were recruited through the clickworker.com platform (see footnote 8). Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. A total of 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just clicked through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology (see footnote 9).

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index (see footnote 10). The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

8 https://www.clickworker.de
9 http://api.jquery.com/jquery.ajax/
10 http://lucene.apache.org/solr/

Fig. 2. User profile generation, TC1 (music domain)

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels, or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance; outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale. The score, called mean relevance score (mrs), was measured with Equation 21.

$$mrs = \frac{\sum_{i=1}^{k} score_i}{k} \tag{21}$$

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, if an RS only suggests popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 22).

$$nv = \frac{|nov|}{|tp|} \tag{22}$$
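The two list-level measures of Eqs. 21 and 22 can be computed as in the following Python sketch. It assumes slider ratings on the 0 to 100 scale and one boolean novelty flag per item, and it uses the relevance threshold of 50 that is stated in the text; the function names and sample ratings are hypothetical.

```python
from typing import List, Optional


def mean_relevance(ratings: List[float]) -> float:
    """Mean relevance score (mrs) of one recommendation list (Eq. 21)."""
    if not ratings:
        raise ValueError("the recommendation list is empty")
    return sum(ratings) / len(ratings)


def novelty(ratings: List[float], is_new: List[bool], threshold: float = 50.0) -> Optional[float]:
    """Share of relevant items that were unknown to the user (Eq. 22)."""
    # An item counts as relevant when its rating reaches the threshold.
    relevant_flags = [new for rating, new in zip(ratings, is_new) if rating >= threshold]
    if not relevant_flags:
        return None  # undefined when the list contains no relevant item
    return sum(relevant_flags) / len(relevant_flags)


# Hypothetical ratings of a four-item list.
ratings = [80, 40, 60, 50]
is_new = [True, True, False, True]
mrs = mean_relevance(ratings)
nv = novelty(ratings, is_new)
print(mrs, nv)
```

In the example, three of the four items are relevant and two of those are new to the user, so the novelty score is 2/3. Returning `None` for lists without relevant items is a design choice of this sketch; the text does not specify how such lists were handled.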

An item was considered relevant when it achieved a relevance score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 65]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more alike the items are, the less diverse the result set is [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 23) and averaged for the recommendation list (Eq. 24).

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{23}$$

$$div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{24}$$
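A minimal Python sketch of Eqs. 23 and 24 is given below. It assumes that every item is represented by the set of its SKOS annotations, so that the cosine similarity reduces to the overlap of two binary vectors. This representation and the concept names are assumptions made for illustration; the engine may weight annotations differently.

```python
import math
from itertools import combinations
from typing import List, Set


def cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity of two binary annotation vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def div_c(items: List[Set[str]]) -> float:
    """Average pairwise distance of a recommendation list (Eqs. 23 and 24)."""
    k = len(items)
    if k < 2:
        raise ValueError("diversity needs at least two items")
    total = sum(1.0 - cosine(a, b) for a, b in combinations(items, 2))
    return 2.0 * total / (k * (k - 1))


# Toy list: two items share all annotations, the third shares none.
items = [{"c:rock", "c:pop"}, {"c:rock", "c:pop"}, {"c:jazz"}]
diversity = div_c(items)
print(diversity)
```

Of the three item pairs, one is identical (distance 0) and two are disjoint (distance 1), which yields a diversity of 2/3. High values therefore indicate that items in a list share few annotations.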

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each generated recommendation list of the SKOSRec engine during runtime execution. In addition to the measurement of divC scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al., two Likert items were posed for this purpose [44]:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al., which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions [44]. The statements were adapted to the specificities of each usage scenario. Figure 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations.

For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting can cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Figure 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movies, music, and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain, where the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information, or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located (see footnote 11).

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city, or dbo:state) while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement "The filter has improved the recommendation results of list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar rollup query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination (i.e., a city, region, or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

11 http://dbpedia.org

Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act (music domain), or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). The SKOSRec engine generated recommendations based on the creative works of the favored artist. Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act, or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.

Fig. 5. Web form for a constraint-based recommendation request (music domain)

Fig. 6. Travel - User profile generation, TC3


Fig. 7. Music - User profile generation, TC4 (page 9)

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared a preference in his/her profile for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (e.g., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The gathered samples from the experiments contained answers from both genders, with a slight overrepresentation of male participants (53.7%). Most study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either "manageable" (38.5%) or even as "easy" or "very easy" (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
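For readers unfamiliar with the reliability check, the following Python sketch computes Cronbach's α from raw Likert responses with the standard formula α = k/(k-1) · (1 - Σ item variances / variance of the sum score). The response matrix is invented for illustration and does not stem from the experiments; the function name `cronbach_alpha` is hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of Likert answers."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(n_items)]
    # Sample variance of the respondents' sum scores.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("sum scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Invented answers of three respondents to three items (1 to 5 scale).
consistent = [[1, 1, 1], [3, 3, 3], [5, 5, 5]]
mixed = [[1, 2], [2, 1], [3, 3]]
alpha_consistent = cronbach_alpha(consistent)
alpha_mixed = cronbach_alpha(mixed)
print(alpha_consistent, alpha_mixed)
```

Perfectly consistent answers yield α = 1, whereas partly contradictory answers lower the value. Under the threshold used in the text, only constructs with α above 0.7 would be treated as interval-scaled.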

Table 10. Test cases in the web experiments (X = applied, - = not applied)

Test Case  Evaluation Purpose                       Query Type  Travel  Movie  Music  Book
TC1        On-the-fly Recommendations (Baseline)    Q1          X       X      X      X
TC2        Constraint-based Recommendations         Q3          X       X      X      X
TC3        Advanced Queries (Rollup)                Q5, Q6      X       X      X      X
TC4        Advanced Queries (Cross-Domain)          Q7          -       X      X      X

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain  Cronbach's α
Travel  0.7002
Movie   0.8579
Music   0.8515
Book    0.8949

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. So that valuable user assessments did not remain unused, agreements with the statement "The recommendations of list [] are diverse" were taken into account as an ordinal variable (divU). Based on these results, subsequent tests could be carried out.

The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M) (see footnote 12), standard deviation (SD), and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be above the scale midpoint. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty, and Diversity received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 ("neutral") in the multimedia domains and at 4 ("agree") in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

12 In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement "The filter has improved the recommendation results of the list". Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 ("strongly agree"), 4 ("agree"), or 3 ("neutral"). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-empty result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]  Domain  M      SD     N
Accuracy (size) [10]             Travel  10.00  0.00   103
                                 Movie   10.00  0.00   50
                                 Music   10.00  0.00   53
                                 Book    9.80   1.40   51
Accuracy (mrs) [1]               Travel  56.49  0.00   103
                                 Movie   62.25  19.15  50
                                 Music   55.70  0.10   53
                                 Book    57.47  15.55  50
Novelty (nv) [1]                 Travel  0.60   0.34   101
                                 Movie   0.36   0.31   50
                                 Music   0.36   0.39   52
                                 Book    0.61   0.29   47
Diversity (divU) [5]             Travel  4      -      102
                                 Movie   3      -      50
                                 Music   3      -      53
                                 Book    3      -      50
Diversity (divC) [1]             Travel  0.80   0.16   103
                                 Movie   0.87   0.12   50
                                 Music   0.86   0.10   50
                                 Book    0.92   0.08   50
Usefulness [5]                   Travel  3.34   0.85   103
                                 Movie   3.55   0.73   50
                                 Music   3.18   0.85   53
                                 Book    3.43   0.81   50

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2). [Plot: x-axis "Domain" (Digital Library, Travel, Movie, Music, Book); y-axis "Percentage of Agreement" (0.00 to 1.00); legend "Degree of Agreement" (1 to 5); title "The filter has improved the recommendations of the list"]

Table 13. Response Rates (TC2)

| Domain | Constr.-based (Regular) | Constr.-based (Expanded) | N |
|---|---|---|---|
| Travel | 23 | 57 | 103 |
| Movie | 86 | 88 | 50 |
| Music | 72 | 75 | 53 |
| Book | 73 | 76 | 51 |
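The response rate, defined in the text as the ratio of non-zero result sets among all result sets, can be sketched as follows. This is only a minimal illustration: the function name `response_rate` and the sample sessions are hypothetical and do not come from the study data.

```python
def response_rate(result_sets):
    """Return the share of result sets that contain at least one item."""
    if not result_sets:
        raise ValueError("at least one result set is required")
    non_empty = sum(1 for items in result_sets if len(items) > 0)
    return non_empty / len(result_sets)


# Hypothetical sessions: each inner list is one participant's recommendation list.
sessions = [["Rome", "Lisbon"], [], ["Vienna"], []]
print(response_rate(sessions))  # 0.5
```

Multiplying the returned ratio by 100 gives a percentage.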

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. Where statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
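The text does not state which FDR procedure was applied. A common choice is the Benjamini-Hochberg step-up procedure, so the sketch below assumes that procedure. The function name and the example p-values are purely illustrative and are not figures from the experiments.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null hypothesis is rejected.

    Assumption: Benjamini-Hochberg step-up procedure at FDR level alpha.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from four pairwise comparisons.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.20]))
```

With several metrics tested per domain, a correction of this kind keeps the expected share of false discoveries among the rejected hypotheses at or below the chosen level.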

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.


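Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | Simple M | Simple SD | Expressive M | Expressive SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 1.17 | 2.70 | 4.52 | 4.62 | 103 |
| | Movie | 6.72 | 4.08 | 7.20 | 3.92 | 50 |
| | Music | 5.96 | 4.64 | 6.36 | 4.51 | 53 |
| | Book | 4.59 | 4.50 | 5.00 | 4.51 | 51 |
| Accuracy (mrs) [1] | Travel | 66.73 | 20.51 | 65.34 | 22.36 | 23 |
| | Movie | 60.95 | 22.98 | 60.29 | 22.79 | 43 |
| | Music | 63.77 | 16.05 | 62.96 | 16.44 | 34 |
| | Book | 56.17 | 18.18 | 56.81 | 16.81 | 37 |
| Novelty (nv) [1] | Travel | 0.48 | 0.41 | 0.61 | 0.49 | 21 |
| | Movie | 0.44 | 0.36 | 0.55 | 0.80 | 42 |
| | Music | 0.55 | 0.39 | 0.52 | 0.37 | 38 |
| | Book | 0.77 | 0.34 | 0.76 | 0.33 | 33 |
| Diversity (divU) [5] | Travel | 3 | - | 3.5 | - | 24 |
| | Movie | 4 | - | 4 | - | 43 |
| | Music | 4 | - | 4 | - | 38 |
| | Book | 3 | - | 3 | - | 37 |
| Diversity (divC) [1] | Travel | 0.68 | 0.18 | 0.72 | 0.21 | 19 |
| | Movie | 0.70 | 0.24 | 0.73 | 0.23 | 41 |
| | Music | 0.86 | 0.13 | 0.87 | 0.12 | 34 |
| | Book | 0.84 | 0.24 | 0.95 | 0.63 | 29 |
| Usefulness [5] | Travel | 3.33 | 0.93 | 3.50 | 0.77 | 24 |
| | Movie | 3.32 | 0.85 | 3.31 | 0.86 | 43 |
| | Music | 3.55 | 0.81 | 3.64 | 0.78 | 38 |
| | Book | 3.39 | 0.83 | 3.62 | 1.21 | 37 |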

Table 15. Jaccard Indices (TC2)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.57 | 0.47 | 25 |
| Movie | 0.71 | 0.38 | 43 |
| Music | 0.82 | 0.35 | 37 |
| Book | 0.92 | 0.26 | 35 |
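For readers unfamiliar with the measure, the Jaccard index of two recommendation lists is the size of their intersection divided by the size of their union. The sketch below illustrates the standard definition; the function name, the handling of two empty lists and the sample item identifiers are assumptions made for this example only.

```python
def jaccard_index(list_a, list_b):
    """Return |A intersect B| / |A union B| for two recommendation lists."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch (assumption): two empty lists score 0.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical result lists from a simple and an expanded filter request.
simple = ["item1", "item2", "item3"]
expanded = ["item2", "item3", "item4"]
print(jaccard_index(simple, expanded))  # 0.5
```

A value close to 1 means the two lists contain largely the same items, while a value close to 0 means they barely overlap.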

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Figure 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Table 16. Response Rates (TC3)

| Domain | Regular | Aggr.-based | N |
|---|---|---|---|
| Travel | 99 | 100 | 103 |
| Movie | 96 | 92 | 50 |
| Music | 100 | 100 | 53 |
| Book | 96 | 100 | 51 |

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3). [Chart "Perceived Usefulness (Regular vs. Roll-up Queries)": agreement per domain (Travel, Movie, Music, Book), by method (Regular, Roll-up).]

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain and the available data in the LOD repository whether or not a particular approach is better suited for a given retrieval task. This conclusion is especially impressive given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. But the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.
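The degrees of freedom reported for the list-size comparison (one less than the number of participants per domain) are consistent with a paired, within-subjects t-test. As a hedged illustration of how such a statistic is obtained, the sketch below computes the paired t-value from two score lists. The function name and the sample list sizes are hypothetical, and the sketch stops at the test statistic rather than deriving a p-value.

```python
import math


def paired_t_statistic(first, second):
    """Return (t, degrees_of_freedom) for paired samples, computed on first - second."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two samples of equal length with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Unbiased sample variance of the pairwise differences.
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff / math.sqrt(variance / n), n - 1


# Hypothetical list sizes per participant: regular request vs. cross-domain request.
regular_sizes = [1, 2, 3, 4]
cross_domain_sizes = [2, 4, 5, 4]
t_value, dof = paired_t_statistic(regular_sizes, cross_domain_sizes)
print(round(t_value, 2), dof)
```

A negative t-value, as in the reported results, indicates that the second condition produced larger values than the first.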

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results in more than the other half of the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | Regular M | Regular SD | Aggr.-based M | Aggr.-based SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 9.90 | 0.99 | 9.42 | 2.07 | 103 |
| | Movie | 2.88 | 0.59 | 2.76 | 0.82 | 50 |
| | Music | 10.00 | 0.00 | 10.00 | 0.00 | 53 |
| | Book | 2.88 | 0.59 | 3.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Travel | 48.21 | 24.98 | 44.93 | 25.59 | 102 |
| | Movie | 54.58 | 22.88 | 56.69 | 25.42 | 44 |
| | Music | 53.95 | 18.31 | 58.24 | 16.19 | 52 |
| | Book | 51.53 | 23.90 | 49.25 | 22.30 | 49 |
| Novelty (nv) [1] | Travel | 0.43 | 0.39 | 0.38 | 0.40 | 98 |
| | Movie | 0.35 | 0.38 | 0.28 | 0.41 | 41 |
| | Music | 0.49 | 0.36 | 0.46 | 0.39 | 50 |
| | Book | 0.61 | 0.42 | 0.62 | 0.44 | 47 |
| Diversity (divU) [5] | Travel | 3.5 | - | 3 | - | 100 |
| | Movie | 4 | - | 3 | - | 43 |
| | Music | 4 | - | 3 | - | 50 |
| | Book | 3 | - | 3 | - | 48 |
| Diversity (divC) [1] | Travel | 0.79 | 0.17 | 0.89 | 0.13 | 99 |
| | Movie | 0.75 | 0.20 | 0.66 | 0.27 | 44 |
| | Music | 0.83 | 0.17 | 0.93 | 0.07 | 53 |
| | Book | 0.74 | 0.27 | 0.93 | 0.06 | 48 |
| Usefulness [5] | Travel | 3.36 | 0.77 | 3.22 | 0.74 | 101 |
| | Movie | 3.20 | 0.92 | 3.26 | 0.94 | 44 |
| | Music | 3.45 | 0.84 | 3.44 | 0.85 | 51 |
| | Book | 2.95 | 0.87 | 3.03 | 0.96 | 48 |

Table 18. Jaccard Indices (TC3)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.07 | 0.11 | 101 |
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.02 | 0.05 | 53 |
| Book | 0.02 | 0.06 | 50 |

Table 19. Response rates (TC4)

| Domain | Regular (SPARQL) | SKOSRec | N |
|---|---|---|---|
| Movie | 32 | 96 | 50 |
| Music | 62 | 98 | 53 |
| Book | 53 | 100 | 51 |

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). [Chart "Recommendation List Size": no. of items (0.0 to 10.0) per domain (Movie, Music, Book), by method (Regular, Cross-Domain).]

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


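Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | Regular (SPARQL) M | Regular (SPARQL) SD | Cross-Domain M | Cross-Domain SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Movie | 0.82 | 2.09 | 8.66 | 2.73 | 50 |
| | Music | 0.89 | 1.59 | 9.30 | 2.03 | 53 |
| | Book | 2.90 | 3.98 | 10.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Movie | 67.41 | 27.62 | 53.65 | 14.71 | 16 |
| | Music | 62.36 | 31.76 | 48.32 | 22.49 | 20 |
| | Book | 66.42 | 21.28 | 55.59 | 16.84 | 27 |
| Novelty (nv) [1] | Movie | 0.59 | 0.49 | 0.41 | 0.34 | 16 |
| | Music | 0.72 | 0.44 | 0.59 | 0.40 | 19 |
| | Book | 0.54 | 0.41 | 0.63 | 0.30 | 27 |
| Diversity (divU) [5] | Movie | 3 | - | 4 | - | 16 |
| | Music | 3 | - | 3 | - | 20 |
| | Book | 3 | - | 4 | - | 27 |
| Diversity (divC) [1] | Movie | (0.64) | (0.35) | (0.99) | (0.02) | (6) |
| | Music | (0.90) | (0.13) | (0.98) | (0.03) | (10) |
| | Book | (0.84) | (0.12) | (0.87) | (0.1) | (8) |
| Usefulness [5] | Movie | 3.57 | 0.83 | 3.35 | 1.57 | 15 |
| | Music | 3.53 | 1.17 | 3.11 | 0.98 | 20 |
| | Book | 3.41 | 1.04 | 3.41 | 0.94 | 27 |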

Table 21. Jaccard Indices (TC4)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.01 | 0.02 | 21 |
| Book | 0.16 | 0.27 | 33 |

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While the novel recommendation approaches of the SKOSRec engine are not superior in each domain and test case, it is safe to say that they considerably extend common retrieval methods and at least provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine, but a back-end application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple as well as the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than" or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be differently combined according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieval of LOD resources from any graph other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
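To illustrate the idea behind the Aggregation element described above, the following sketch rolls up similarity scores of sublevel entities (here: movies) to a parent resource (here: directors) with SUM, MAX or AVG. It is not the SKOSRec implementation: the function `roll_up`, the dictionaries and all identifiers are hypothetical and serve only to show how the three aggregation modes can change a ranking.

```python
from collections import defaultdict


def roll_up(scores, parent_of, mode="SUM"):
    """Aggregate item-level similarity scores per parent resource.

    scores: hypothetical mapping from sublevel item to similarity score.
    parent_of: hypothetical mapping from sublevel item to its parent resource.
    mode: one of "SUM", "MAX" or "AVG", as named in the Aggregation element.
    """
    aggregators = {
        "SUM": sum,
        "MAX": max,
        "AVG": lambda values: sum(values) / len(values),
    }
    if mode not in aggregators:
        raise ValueError("mode must be SUM, MAX or AVG")
    grouped = defaultdict(list)
    for item, score in scores.items():
        parent = parent_of.get(item)
        if parent is not None:  # items without a known parent are skipped
            grouped[parent].append(score)
    aggregate = aggregators[mode]
    ranked = {parent: aggregate(values) for parent, values in grouped.items()}
    # Highest aggregated score first.
    return sorted(ranked.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical movie-level scores rolled up to directors.
movie_scores = {"movie1": 0.5, "movie2": 0.25, "movie3": 0.625}
director_of = {"movie1": "directorA", "movie2": "directorA", "movie3": "directorB"}
print(roll_up(movie_scores, director_of, "SUM"))
print(roll_up(movie_scores, director_of, "MAX"))
```

In this toy data, SUM favors the director with more similar movies, whereas MAX and AVG favor the director with the single best match, which is why the choice of aggregation function is left to the query author.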

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations, In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems, Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data, In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations, In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research, In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements, In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web, In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web, In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems, In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems, In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys, In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation, In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach, In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in action, Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems, In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search, In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language, In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems, In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems, In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB, In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web, In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction, Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama, In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest, In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity, In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design, John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems, In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations, In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web, In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base, In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search, In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems, In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data, In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference, In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web, In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web, In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data, In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL, In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities, In: AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. ACM Transactions on Intelligent Systems and Technology (TIST), 8(2), 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. 10th International Semantic Web Conference (ISWC’11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Discussion & Future Work
  • Appendix A: The SKOSRecommender Query Syntax
  • References

Listing 8. SKOSRec query to generate post-filtered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dul: <http://www.ontologydesignpatterns.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?city
WHERE {
  ?poi ?property ?city .
  ?city rdf:type dbo:City .
  ?property rdfs:subPropertyOf dul:hasLocation .
} LIMIT 3
AGG dbr:London ?city SUM
RECOMMEND ?poi TOP 100
PREF
  dbr:Oxford_Street
  dbr:Notting_Hill
  dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user’s preferred POIs. It specifies how the similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7. Result set of Q6

city
dbr:Oxford
dbr:Abingdon-on-Thames
dbr:Wallingford_Oxford

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere than through a simple on-the-fly request that solely contains a preference statement for a city in the user profile section.

Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?participated ?musicalAct
WHERE {
  ?movie rdf:type dbo:Film .
  VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
  { ?musicalAct ?participated ?movie } UNION { ?movie ?participated ?musicalAct }
} LIMIT 10
AGG dbr:The_Jackson_5 ?movie MAX
RECOMMEND ?musicalAct TOP 100
PREF dbr:The_Jackson_5
WHERE {
  VALUES ?actType { dbo:MusicalArtist dbo:Band }
  ?musicalAct rdf:type ?actType .
}

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are combined with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?musicalAct
WHERE {
  ?movie rdf:type dbo:Film .
  VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
  { dbr:The_Jackson_5 ?participated ?movie } UNION { ?movie ?participated dbr:The_Jackson_5 }
} LIMIT 5

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tabs. 8 and 9). The SKOSRec query produces a more comprehensive solution than the regular SPARQL query.

Table 8. Result set of Q7

movie                          musicalAct
dbr:Moonwalker                 dbr:Michael_Jackson
dbr:The_Wiz_(film)             dbr:Ashford_&_Simpson
dbr:Garfield_Gets_a_Life       dbr:The_Temptations
dbr:Pipe_Dreams_(1976_films)   dbr:Gladys_Knight_&_the_Pips
dbr:The_Jacksons               dbr:Jermaine_Jackson

Table 9. Result set of Q8

movie                          musicalAct
dbr:The_Jacksons               dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 14 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 14 (SKOSRec recommendations). SKOSRec recommendations (R(P_r)) are the set of suggestions generated by processing the solution mappings (Ω) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, prefiltering, or a combination of these techniques. P_r represents the input profile that contains at least one preference for an item. The cardinality of R(P_r) is the intended number of recommendations (k) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 14).

\[ R(P_r) = \{\, x \mid x \in \{\, q \mid \mathrm{score}(P_r, q) > \operatorname{kmax}_{q \in \Omega} \mathrm{score}(P_r, q) \,\},\; q \notin P_r \,\} \tag{14} \]
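The selection step of Definition 14 can be illustrated with a minimal Python sketch. It assumes that similarity scores have already been computed and are available as a dictionary; the function name `top_k_recommendations`, the dictionary representation, and the alphabetical tie-breaking rule are illustrative assumptions and not part of the SKOSRec engine.

```python
from typing import Dict, Iterable, List


def top_k_recommendations(scores: Dict[str, float],
                          profile: Iterable[str],
                          k: int) -> List[str]:
    """Return the k highest-scoring candidates that are not in the profile.

    `scores` maps a candidate resource to its recommendation score
    (hypothetical representation of score(P_r, q)).
    """
    excluded = set(profile)
    candidates = [(item, s) for item, s in scores.items() if item not in excluded]
    # Sort by descending score; ties are broken by name (an assumption
    # made here only to keep the output deterministic).
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in candidates[:k]]
```

Profile items are removed before ranking, so a preferred resource can never displace a genuine suggestion from the top-k list.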

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions R(P_r) can be joined with any graph pattern P_post after the process of similarity calculation (Definition 15). It does not make a difference how the engine obtained these recommendations.

Definition 15 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern (Ω_post) (Eqs. 15 and 16).

\[ \Omega_{postfilter} = \{\, \mu \mid \mu \in [\![P_{post}]\!]_D,\; x \in \mathrm{dom}(\mu) \,\} \tag{15} \]

\[ \Omega_{post} = \{\, \mu_{|x} \mid x \in R(P_r) \,\} \bowtie \Omega_{postfilter} \tag{16} \]
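The join in Definition 15 can be sketched in a few lines of Python. The sketch assumes that solution mappings are plain dictionaries from variable names to resources and that the recommended resources are bound to a single variable (`x` by default); the function name `postfilter_join` and this data layout are hypothetical and only serve as illustration.

```python
from typing import Dict, Iterable, List

Mapping = Dict[str, str]  # variable name -> bound resource (assumed layout)


def postfilter_join(recommendations: Iterable[str],
                    postfilter_mappings: Iterable[Mapping],
                    rec_var: str = "x") -> List[Mapping]:
    """Keep only those postfilter mappings whose binding for `rec_var`
    is one of the previously generated recommendations."""
    recommended = set(recommendations)
    return [mu for mu in postfilter_mappings
            if rec_var in mu and mu[rec_var] in recommended]
```

Mappings that do not bind the recommendation variable at all are dropped, which mirrors the condition that x must be in the domain of the mapping.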

The postfilter P_post can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently from a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item (a) and similar sublevel entities. Additionally, the user has to specify the variable in the profile (e.g., y) that represents the items within the postfilter graph pattern P_post. By this means, aggregation-based recommendations can be generated (Definition 16).

Definition 16 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item a in the profile (Eq. 17).

\[ R_{agg}(a) = \{\, \mu(y) \mid \mu \in \Omega_{post},\; y \in \mathrm{dom}(\mu),\; a \notin \mu(y) \,\} \tag{17} \]

For each potential aggregation-based suggestion (y), similarity scores of connected entities (x) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 17 (Aggregation-based ranking score). The ranking score of an LOD resource y that is connected to a set of recommended items is determined by aggregating scores of connected entities either by the maximum (Eq. 18), the sum (Eq. 19), or the average (Eq. 20) of individual scores.

\[ \mathrm{score}_{aggMAX}(y) = \max_{x \in \mu(x, y)} \mathrm{score}(P_r, x) \tag{18} \]

\[ \mathrm{score}_{aggSUM}(y) = \sum_{x \in \mu(x, y)} \mathrm{score}(P_r, x) \tag{19} \]

\[ \mathrm{score}_{aggAVG}(y) = \frac{\sum_{x \in \mu(x, y)} \mathrm{score}(P_r, x)}{|\{x \in \mu(x, y)\}|} \tag{20} \]
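The three aggregation variants of Definition 17 can be illustrated with a short Python sketch. It assumes that the connections between recommended items x and aggregation targets y are given as (x, y) pairs and that the similarity scores are stored in a dictionary; the function name `aggregate_scores` and both data structures are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_scores(pairs: Iterable[Tuple[str, str]],
                     scores: Dict[str, float],
                     mode: str = "MAX") -> Dict[str, float]:
    """Aggregate the scores of recommended items x per connected resource y.

    mode is one of "MAX" (Eq. 18), "SUM" (Eq. 19) or "AVG" (Eq. 20).
    """
    aggregators = {
        "MAX": max,
        "SUM": sum,
        "AVG": lambda values: sum(values) / len(values),
    }
    if mode not in aggregators:
        raise ValueError(f"unknown aggregation mode: {mode}")

    grouped: Dict[str, List[float]] = defaultdict(list)
    for x, y in set(pairs):  # each (x, y) connection is counted once
        grouped[y].append(scores[x])

    aggregate = aggregators[mode]
    return {y: aggregate(values) for y, values in grouped.items()}
```

Sorting the returned dictionary by value in descending order and cutting it off at the query limit would then yield a ranked result list such as the one in Table 7.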

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology.⁶ ⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users went step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., by rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

6 http://tomcat.apache.org
7 http://www.oracle.com/technetwork/java/javaee/servlet/index.html

Subjects were recruited through the clickworker.com platform.⁸ Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, a total of 154 subjects evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just clicked through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects’ demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.⁹

It enabled users to type in keywords (e.g., the name of a music act or the author of a book) for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.¹⁰ The index was built with item metadata that had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

8 https://www.clickworker.de
9 http://api.jquery.com/jquery.ajax
10 http://lucene.apache.org/solr

Fig. 2. User profile generation, TC1 (music domain)

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars help to prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels, or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance; outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale. The score, called mean relevance (mrs), was measured with Equation 21.

\[ mrs = \frac{\sum_{i=1}^{k} \mathrm{score}_i}{k} \tag{21} \]
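As a small worked illustration of Equation 21, the following Python sketch averages the slider scores of one recommendation list. The function name `mean_relevance` and the explicit range check are assumptions added for illustration; the original web application was implemented with Java servlets.

```python
from typing import Sequence


def mean_relevance(slider_scores: Sequence[float]) -> float:
    """Mean relevance score (mrs) of one recommendation list (Eq. 21).

    Each entry is the slider position (0 to 100) a participant assigned
    to one of the k recommendations.
    """
    if not slider_scores:
        raise ValueError("mrs is undefined for an empty recommendation list")
    if any(s < 0 or s > 100 for s in slider_scores):
        raise ValueError("slider scores must lie between 0 and 100")
    return sum(slider_scores) / len(slider_scores)
```

For a list of three recommendations rated 80, 40 and 60, the sketch returns a mean relevance of 60.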

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, in the case of an RS only suggesting popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 22).

\[ nv = \frac{|nov|}{|tp|} \tag{22} \]
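The novelty ratio of Equation 22 can be sketched as follows. The sketch assumes that each rated recommendation is available as a pair of slider score and a flag indicating whether the participant marked the item as new, and it uses the relevance threshold of 50 that the text states for this study. The function name `novelty`, the pair layout, and the return value of 0.0 for lists without relevant items are assumptions for illustration.

```python
from typing import Iterable, Tuple

RELEVANCE_THRESHOLD = 50  # items rated 50 or higher count as relevant


def novelty(ratings: Iterable[Tuple[float, bool]]) -> float:
    """Share of relevant items that were new to the user (Eq. 22).

    `ratings` holds one (slider_score, is_new) pair per recommendation.
    """
    relevant_flags = [is_new for score, is_new in ratings
                      if score >= RELEVANCE_THRESHOLD]
    if not relevant_flags:
        # Assumption: novelty is reported as 0.0 when no item is relevant.
        return 0.0
    return sum(relevant_flags) / len(relevant_flags)
```

Items below the threshold do not enter the ratio at all, so an unknown but irrelevant suggestion does not raise the novelty score.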

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

An item was considered relevant when it achieved a relevance score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 65]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse the result set is [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 23) and averaged for the recommendation list (Eq. 24).

\[ d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{23} \]

\[ div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{24} \]
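Equations 23 and 24 can be illustrated with a short Python sketch. It assumes that each recommended item is represented by a numeric feature vector of equal length (for example, weights of annotated concepts); this representation, the function names, and the treatment of zero vectors are assumptions for illustration and do not describe the engine's internal item model.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equally long feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def diversity(items: Sequence[Sequence[float]]) -> float:
    """Average pairwise cosine distance of a recommendation list (Eq. 24)."""
    k = len(items)
    if k < 2:
        raise ValueError("diversity needs at least two items")
    total = sum(1.0 - cosine(u, v) for u, v in combinations(items, 2))
    return 2.0 * total / (k * (k - 1))
```

Because `combinations` enumerates every unordered pair exactly once, the factor 2 / (k(k - 1)) turns the sum into the mean distance over all item pairs.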

The content-based diversity scores (div_C) were calculated in the backend of the web application and saved in a log file for each recommendation list generated by the SKOSRec engine at runtime. In addition to the measurement of div_C scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al., two Likert items were posed for this purpose [44]:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, we gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al., which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions [44]. The statements were adapted to the specificities of each usage scenario. Figure 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file, and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine’s performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparison of non-filtered and filtered suggestions, we applied a within-subjects design, in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting can cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Figure 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition on the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movies, music, and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information, or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city, or dbo:state) while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement “The filter has improved the recommendation results of list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

11 http://dbpedia.org

Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar rollup query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination (i.e., a city, region, or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.
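The random assignment of list positions mentioned for TC2, which is used again in the later test cases, could be realized along the following lines. This is a minimal sketch under the assumption that the recommendation lists of one session are kept in a dictionary keyed by method name; the function name `randomized_presentation` and the method labels in the usage note are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple


def randomized_presentation(
    lists_by_method: Dict[str, List[str]],
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, List[str]]]:
    """Return the recommendation lists of one session in random order.

    The method name is kept alongside each list for logging only; it
    would not be shown to the participant.
    """
    rng = rng or random.Random()
    methods = list(lists_by_method)
    rng.shuffle(methods)  # in-place shuffle of the presentation order
    return [(method, lists_by_method[method]) for method in methods]
```

A call such as `randomized_presentation({"simple": [...], "expressive": [...]})` yields both lists in an order that varies between sessions, so that neither method is systematically shown first.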

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user’s favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants specified either their favorite director/actor (movie domain), music act (music domain), or author (book domain) and received recommendations based on the features of the creative works of the favored artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist’s characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with those from the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

Fig. 5. Web form for a constraint-based recommendation request (music domain)

Fig. 6. Travel - User profile generation TC3

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act, or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.


Fig. 7. Music - User profile generation TC4 (page 9)

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matchings can occur when a user states his/her favorite movie that is written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared a preference in his/her profile for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (e.g., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The gathered samples from the experiments contained answers from both genders with a slight overrepresentation of male participants (53.7%). The majority of study subjects was between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach’s α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach’s α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
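The reliability check described above can be illustrated with a minimal sketch of the standard Cronbach's α formula, α = k/(k−1) · (1 − Σ item variances / variance of the summed score). The function name and the input layout (one list of Likert answers per respondent) are assumptions made for illustration; the paper does not state which software was used for this step.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a multi-item scale.

    item_scores: list of respondents, each a list of k Likert answers
    (hypothetical layout, chosen for this sketch).
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the summed scale score per respondent.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Two perfectly consistent items yield alpha = 1.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
```

A value above the 0.7 threshold mentioned in the text would justify averaging the items into a single construct score.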

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach’s α values for this concept did not reach acceptable levels. To ensure that valuable user assessments did not remain unused, agreements to the statement “The recommendations of list [...] are diverse” were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M)¹², standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine’s performance in almost every quality dimension to be above mediocrity. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, the mean relevance score (mrs) of each domain was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


Table 10
Test cases in the web experiments (X = conducted, - = not conducted)

Test Case  Evaluation Purpose                      Query Type  Travel  Movie  Music  Book
TC1        On-the-fly Recommendations (Baseline)   Q1          X       X      X      X
TC2        Constraint-based Recommendations        Q3          X       X      X      X
TC3        Advanced Queries (Rollup)               Q5, Q6      X       X      X      X
TC4        Advanced Queries (Cross-Domain)         Q7          -       X      X      X

Table 11
Cronbach’s α values for the concept of Perceived Usefulness

Domain  Cronbach’s α
Travel  0.7002
Movie   0.8579
Music   0.8515
Book    0.8949

baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list (nv = 0.36) was unfamiliar to users. In contrast to that, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement to the Diversity-related Likert item stating that recommendations are diverse lay at 3 (“neutral”) in the multimedia domains and at 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user’s search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants’ agreements with the statement “The filter has improved the recommendation results of the list”. Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants either selected 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12
Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  M      SD     N
Accuracy (size) [10]            Travel  10.00  0.00   103
                                Movie   10.00  0.00   50
                                Music   10.00  0.00   53
                                Book    9.80   1.40   51
Accuracy (mrs) [1]              Travel  56.49  0.00   103
                                Movie   62.25  19.15  50
                                Music   55.70  0.10   53
                                Book    57.47  15.55  50
Novelty (nv) [1]                Travel  0.60   0.34   101
                                Movie   0.36   0.31   50
                                Music   0.36   0.39   52
                                Book    0.61   0.29   47
Diversity (divU) [5]            Travel  4      -      102
                                Movie   3      -      50
                                Music   3      -      53
                                Book    3      -      50
Diversity (divC) [1]            Travel  0.80   0.16   103
                                Movie   0.87   0.12   50
                                Music   0.86   0.10   50
                                Book    0.92   0.08   50
Usefulness [5]                  Travel  3.34   0.85   103
                                Movie   3.55   0.73   50
                                Music   3.18   0.85   53
                                Book    3.43   0.81   50

[Figure 8 (plot). Title: “The filter has improved the recommendations of the list”. x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants’ agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

Table 13
Response rates (TC2)

Domain  Constr.-based (Regular)  Constr.-based (Expanded)  N
Travel  23%                      57%                       103
Movie   86%                      88%                       50
Music   72%                      75%                       53
Book    73%                      76%                       51
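The response rate reported in Table 13 is defined in the text as the ratio of non-zero result sets among all result sets. A minimal sketch of that definition follows; the function name and the input (one recommendation list size per user session) are hypothetical and only serve the illustration.

```python
def response_rate(list_sizes):
    """Share of user sessions whose recommendation list is non-empty.

    list_sizes: one integer per session, the number of returned items
    (hypothetical input format for this sketch).
    """
    if not list_sizes:
        raise ValueError("no sessions given")
    non_empty = sum(1 for size in list_sizes if size > 0)
    return non_empty / len(list_sizes)


# Four sessions, two of which returned nothing.
print(response_rate([0, 3, 10, 0]))
```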

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
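To illustrate what an FDR correction for multiple testing involves, the following sketch implements the Benjamini-Hochberg step-up adjustment. This is an assumption: the text only states that the false discovery rate was controlled and does not name the procedure, so the code shows the most common variant rather than the authors' exact computation. The function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

A comparison would then be reported as significant if its adjusted p-value falls below the chosen level (e.g., 0.05).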

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = -7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
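The Jaccard index used to compare two recommendation lists is the size of their intersection divided by the size of their union. The sketch below shows this computation; the function name is hypothetical, and the handling of two empty lists is an assumption, since the text does not say how such sessions were treated.

```python
def jaccard_index(list_a, list_b):
    """Overlap of two recommendation lists, ignoring item order."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        # Assumption of this sketch: two empty lists count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Two lists that share two of four distinct items.
print(jaccard_index(["a", "b", "c"], ["b", "c", "d"]))
```

A score close to 1 means that both methods returned nearly the same items, whereas a score close to 0 means that the lists hardly overlap.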


Table 14
Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Simple         Expressive     N
                                        M      SD      M      SD
Accuracy (size) [10]            Travel  1.17   2.70    4.52   4.62    103
                                Movie   6.72   4.08    7.20   3.92    50
                                Music   5.96   4.64    6.36   4.51    53
                                Book    4.59   4.50    5.00   4.51    51
Accuracy (mrs) [1]              Travel  66.73  20.51   65.34  22.36   23
                                Movie   60.95  22.98   60.29  22.79   43
                                Music   63.77  16.05   62.96  16.44   34
                                Book    56.17  18.18   56.81  16.81   37
Novelty (nv) [1]                Travel  0.48   0.41    0.61   0.49    21
                                Movie   0.44   0.36    0.55   0.80    42
                                Music   0.55   0.39    0.52   0.37    38
                                Book    0.77   0.34    0.76   0.33    33
Diversity (divU) [5]            Travel  3      -       3.5    -       24
                                Movie   4      -       4      -       43
                                Music   4      -       4      -       38
                                Book    3      -       3      -       37
Diversity (divC) [1]            Travel  0.68   0.18    0.72   0.21    19
                                Movie   0.70   0.24    0.73   0.23    41
                                Music   0.86   0.13    0.87   0.12    34
                                Book    0.84   0.24    0.95   0.63    29
Usefulness [5]                  Travel  3.33   0.93    3.50   0.77    24
                                Movie   3.32   0.85    3.31   0.86    43
                                Music   3.55   0.81    3.64   0.78    38
                                Book    3.39   0.83    3.62   1.21    37

Table 15
Jaccard indices (TC2)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.57                     0.47  25
Movie   0.71                     0.38  43
Music   0.82                     0.35  37
Book    0.92                     0.26  35

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Figure 9 depicts participants’ general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral

Table 16
Response rates (TC3)

Domain  Regular  Aggr.-based  N
Travel  99%      100%         103
Movie   96%      92%          50
Music   100%     100%         53
Book    96%      100%         51

agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = -5.50, p < 0.001), the music (t(52) = -4.51, p < 0.001) and the book domain (t(47) = -4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach


[Figure 9 (plot). Title: “Perceived Usefulness (Regular vs. Roll-up Queries)”. x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement; legend: Method (Regular, Roll-up)]

Fig. 9. Participants’ overall satisfaction with regular and roll-up recommendation retrieval (TC3)

explores the wider semantic network of each item in the profile.
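The reported t statistics come with N − 1 degrees of freedom, which is consistent with a paired (within-subjects) t-test on per-participant score differences. Under that assumption, the test statistic can be sketched as follows. The function name and the sample values are hypothetical; the sign convention (regular minus advanced) is chosen so that higher scores for the advanced method yield a negative t, as in the reported results.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(regular, advanced):
    """Paired t statistic and degrees of freedom for two score lists.

    Both lists hold one score per participant, in the same order.
    """
    diffs = [r - a for r, a in zip(regular, advanced)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical divC scores of four participants for both methods.
t_value, dof = paired_t([1, 2, 3, 4], [2, 4, 5, 4])
print(t_value, dof)
```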

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. It depends on the underlying information need, the respective domain and the available data in the LOD repository whether or not a particular approach is better suited for a given retrieval task, since neither approach outperformed the other one. This conclusion is especially impressive given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains. It indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = -2.69, p < 0.05) and the book domain (Z = -2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. However, the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = -17.92, p < 0.001; music: t(52) = -25.73, p < 0.001; book: t(50) = -12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results in more than the other half of


Table 17
Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Regular        Aggr.-based    N
                                        M      SD      M      SD
Accuracy (size) [10]            Travel  9.90   0.99    9.42   2.07    103
                                Movie   2.88   0.59    2.76   0.82    50
                                Music   10.00  0.00    10.00  0.00    53
                                Book    2.88   0.59    3.00   0.00    51
Accuracy (mrs) [1]              Travel  48.21  24.98   44.93  25.59   102
                                Movie   54.58  22.88   56.69  25.42   44
                                Music   53.95  18.31   58.24  16.19   52
                                Book    51.53  23.90   49.25  22.30   49
Novelty (nv) [1]                Travel  0.43   0.39    0.38   0.40    98
                                Movie   0.35   0.38    0.28   0.41    41
                                Music   0.49   0.36    0.46   0.39    50
                                Book    0.61   0.42    0.62   0.44    47
Diversity (divU) [5]            Travel  3.5    -       3      -       100
                                Movie   4      -       3      -       43
                                Music   4      -       3      -       50
                                Book    3      -       3      -       48
Diversity (divC) [1]            Travel  0.79   0.17    0.89   0.13    99
                                Movie   0.75   0.20    0.66   0.27    44
                                Music   0.83   0.17    0.93   0.07    53
                                Book    0.74   0.27    0.93   0.06    48
Usefulness [5]                  Travel  3.36   0.77    3.22   0.74    101
                                Movie   3.20   0.92    3.26   0.94    44
                                Music   3.45   0.84    3.44   0.85    51
                                Book    2.95   0.87    3.03   0.96    48

Table 18
Jaccard indices (TC3)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.07                     0.11  101
Movie   0.03                     0.07  44
Music   0.02                     0.05  53
Book    0.02                     0.06  50

Table 19
Response rates (TC4)

Domain  Regular (SPARQL)  SKOSRec  N
Movie   32%               96%      50
Music   62%               98%      53
Book    53%               100%     51

the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and

[Figure 10 (plot). Title: “Recommendation List Size”. x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend: Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)

pattern matching can facilitate an improved exploration of LOD repositories.

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20
Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Regular (SPARQL)  Cross-Domain     N
                                        M       SD        M       SD
Accuracy (size) [10]            Movie   0.82    2.09      8.66    2.73     50
                                Music   0.89    1.59      9.30    2.03     53
                                Book    2.90    3.98      10.00   0.00     51
Accuracy (mrs) [1]              Movie   67.41   27.62     53.65   14.71    16
                                Music   62.36   31.76     48.32   22.49    20
                                Book    66.42   21.28     55.59   16.84    27
Novelty (nv) [1]                Movie   0.59    0.49      0.41    0.34     16
                                Music   0.72    0.44      0.59    0.40     19
                                Book    0.54    0.41      0.63    0.30     27
Diversity (divU) [5]            Movie   3       -         4       -        16
                                Music   3       -         3       -        20
                                Book    3       -         4       -        27
Diversity (divC) [1]            Movie   (0.64)  (0.35)    (0.99)  (0.02)   (6)
                                Music   (0.90)  (0.13)    (0.98)  (0.03)   (10)
                                Book    (0.84)  (0.12)    (0.87)  (0.1)    (8)
Usefulness [5]                  Movie   3.57    0.83      3.35    1.57     15
                                Music   3.53    1.17      3.11    0.98     20
                                Book    3.41    1.04      3.41    0.94     27

Table 21
Jaccard indices (TC4)

Domain  Mean Jaccard Index (JI)  SD    N
Movie   0.03                     0.07  44
Music   0.01                     0.02  21
Book    0.16                     0.27  33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While they are not superior in each domain and test case, it is safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and at least provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research is required to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine, but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of


LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine’s features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple as well as the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
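The aggregation of sublevel similarity scores can be pictured with the following minimal sketch. It is not the engine's implementation: the function name, the dictionary-based inputs and the example entities (POIs rolled up to cities) are assumptions made for illustration; only the three aggregation modes SUM, MAX and AVG are taken from the syntax description.

```python
from collections import defaultdict


def roll_up(scores, parent_of, how="SUM"):
    """Aggregate similarity scores of sublevel entities per parent entity.

    scores: {sublevel entity: similarity score}
    parent_of: {sublevel entity: parent entity}
    how: one of "SUM", "MAX", "AVG"
    """
    grouped = defaultdict(list)
    for entity, score in scores.items():
        grouped[parent_of[entity]].append(score)
    reducers = {
        "SUM": sum,
        "MAX": max,
        "AVG": lambda values: sum(values) / len(values),
    }
    reduce_scores = reducers[how]
    return {parent: reduce_scores(values) for parent, values in grouped.items()}


# Hypothetical POI scores that are rolled up to their cities.
poi_scores = {"poi1": 0.5, "poi2": 0.25, "poi3": 0.75}
poi_city = {"poi1": "cityA", "poi2": "cityA", "poi3": "cityB"}
print(roll_up(poi_scores, poi_city, "SUM"))
```

With summation, a destination that has many moderately similar POIs can rank as high as one with a single very similar POI, which is one reason why the aggregation mode is left to the query author.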

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
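The variable check described for the RecWhereClause can be approximated at string level as in the sketch below. This is only an illustration under strong assumptions: the actual compiler presumably works on a parse tree, whereas this code uses regular expressions and brace counting, ignores braces inside string literals, and uses hypothetical function names and example patterns.

```python
import re


def strip_minus_groups(pattern):
    """Remove every `MINUS { ... }` group, including nested braces."""
    minus = re.compile(r"MINUS\s*\{", re.IGNORECASE)
    kept, pos = [], 0
    while True:
        match = minus.search(pattern, pos)
        if match is None:
            kept.append(pattern[pos:])
            return "".join(kept)
        kept.append(pattern[pos:match.start()])
        depth, cursor = 1, match.end()
        # Skip ahead to the brace that closes this MINUS group.
        while cursor < len(pattern) and depth > 0:
            depth += {"{": 1, "}": -1}.get(pattern[cursor], 0)
            cursor += 1
        pos = cursor


def variable_is_bound(var, where_clause):
    """True if `var` (e.g. "?rec") occurs outside of all MINUS groups."""
    positive_part = strip_minus_groups(where_clause)
    return re.search(re.escape(var) + r"\b", positive_part) is not None


ok = "{ ?rec dbo:genre ?g . MINUS { ?rec dbo:country dbr:France } }"
bad = "{ ?x dbo:genre ?g . MINUS { ?rec dbo:country dbr:France } }"
print(variable_is_bound("?rec", ok), variable_is_bound("?rec", bad))
```

A clause like the second example would be rejected, because a variable that only appears inside MINUS is never bound to any LOD resource.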

RecGroupGraphPattern. A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub. This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples. This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow the retrieval of LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern. This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern. In this part of a SKOSRec query, users can specify additional graph patterns, which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern. This section handles the exclusion of LOD resources for cases in which users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, article 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?musicalAct
WHERE {
  ?movie rdf:type dbo:Film .
  VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
  { dbr:The_Jackson_5 ?participated ?movie }
  UNION
  { ?movie ?participated dbr:The_Jackson_5 }
}
LIMIT 5

Table 8. Result set of Q7

movie                          musicalAct
dbr:Moonwalker                 dbr:Michael_Jackson
dbr:The_Wiz_(film)             dbr:Ashford_&_Simpson
dbr:Garfield_Gets_a_Life       dbr:The_Temptations
dbr:Pipe_Dreams_(1976_films)   dbr:Gladys_Knight_&_the_Pips
dbr:The_Jacksons               dbr:Jermanine_Jackson

Table 9. Result set of Q8

movie                          musicalAct
dbr:The_Jacksons               dbr:The_Jackson_5

fies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 14 (SKOSRec recommendations). SKOSRec recommendations (R(Pr)) are the set of suggestions generated by processing the solution mappings (Ω) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, of prefiltering, or of a combination of these techniques. Pr represents the input profile that contains at least one preference for an item. The cardinality of R(Pr) is the intended number of recommendations (k) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 14).

\[
R(P_r) = \Big\{\, x \;\Big|\; x \in q,\ \mathrm{score}(P_r, q) > \operatorname*{k\text{-}max}_{q \in \Omega} \mathrm{score}(P_r, q),\ q \notin P_r \Big\} \tag{14}
\]
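The selection step of Definition 14 can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not the engine's implementation: recommendation scores are assumed to be available as a dictionary, the k best candidates are taken directly instead of comparing against a threshold, and the function name `skosrec_recommendations` as well as the example resources are hypothetical.

```python
import heapq


def skosrec_recommendations(scores, profile, k):
    """Return the k highest-scoring resources that are not in the profile."""
    # Resources already contained in the profile are never recommended.
    candidates = {res: s for res, s in scores.items() if res not in profile}
    # nlargest returns the keys ordered by descending score.
    return heapq.nlargest(k, candidates, key=candidates.get)


# Hypothetical recommendation scores for four LOD resources.
scores = {"dbr:A": 0.9, "dbr:B": 0.4, "dbr:C": 0.7, "dbr:D": 0.8}
top = skosrec_recommendations(scores, profile={"dbr:A"}, k=2)
```

Although `dbr:A` has the highest score in this example, it is skipped because it is part of the profile, so the two suggestions are `dbr:D` and `dbr:C`.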

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions R(Pr) can be joined with any graph pattern Ppost after the process of similarity calculation (Definition 15). It does not make a difference how the engine obtained these recommendations.

Definition 15 (Postfiltered recommendation mapping). The postfiltered recommendation mapping (Ωpost) is obtained by joining the solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern (Ωpostfilter) (Eqs. 15 and 16).

\[
\Omega_{postfilter} = \{\, \mu \mid \mu \in [\![P_{post}]\!]_D,\ x \in \mathrm{dom}(\mu) \,\} \tag{15}
\]

\[
\Omega_{post} = \{\, \mu|_{x} \mid x \in R(P_r) \,\} \bowtie \Omega_{postfilter} \tag{16}
\]
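The join of Definition 15 can be pictured with a short Python sketch. It assumes that solution mappings are represented as dictionaries from variable names to resources and that the join variable is called `x`; both the representation and the function name `postfilter` are illustrative assumptions rather than parts of the SKOSRec engine.

```python
def postfilter(recommendations, postfilter_mappings, var="x"):
    """Keep the postfilter mappings whose `var` binding is a recommendation."""
    recommended = set(recommendations)
    return [mu for mu in postfilter_mappings
            if var in mu and mu[var] in recommended]


# Hypothetical recommendations and mappings of a postfilter graph pattern.
recs = ["dbr:D", "dbr:C"]
mappings = [
    {"x": "dbr:D", "y": "dbr:Y1"},  # x is a recommendation: kept
    {"x": "dbr:E", "y": "dbr:Y2"},  # x is not a recommendation: dropped
    {"y": "dbr:Y3"},                # x is not in dom(mu): dropped
]
omega_post = postfilter(recs, mappings)
```

Only the first mapping survives the join, because it is the only one that binds `x` to a recommended resource.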

Like the prefilter, Ppost can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently from a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item (a) and similar sublevel entities. Additionally, the user has to specify the variable in the profile (e.g., y) that represents the items within the postfilter graph pattern Ppost. By this means, aggregation-based recommendations can be generated (Definition 16).

Definition 16 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item a in the profile (Eq. 17).

\[
R_{agg}(a) = \{\, \mu(y) \mid \mu \in \Omega_{post},\ y \in \mathrm{dom}(\mu),\ a \notin \mu(y) \,\} \tag{17}
\]

For each potential aggregation-based suggestion (y), similarity scores of connected entities (x) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 17 (Aggregation-based ranking score). The ranking score of an LOD resource y that is connected to a set of recommended items is determined by aggregating the scores of connected entities either by the maximum (Eq. 18), the sum (Eq. 19) or the average (Eq. 20) of individual scores.

\[
\mathrm{score}_{aggMAX}(y) = \max_{x \in \mu(x,y)} \mathrm{score}(P_r, x) \tag{18}
\]

\[
\mathrm{score}_{aggSUM}(y) = \sum_{x \in \mu(x,y)} \mathrm{score}(P_r, x) \tag{19}
\]

\[
\mathrm{score}_{aggAVG}(y) = \frac{\sum_{x \in \mu(x,y)} \mathrm{score}(P_r, x)}{|\{x \in \mu(x,y)\}|} \tag{20}
\]
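The three aggregation variants of Eqs. 18 to 20 can be compared with a small Python sketch. The input format (pairs of a recommended item x and a connected resource y, plus a dictionary of scores for x) and the function name `aggregate_scores` are assumptions chosen for illustration; the numbers are made up.

```python
from collections import defaultdict


def aggregate_scores(pairs, scores, mode="max"):
    """Aggregate the scores of recommended items x per connected resource y."""
    grouped = defaultdict(list)
    for x, y in pairs:
        grouped[y].append(scores[x])
    if mode == "max":
        agg = max                               # Eq. 18
    elif mode == "sum":
        agg = sum                               # Eq. 19
    elif mode == "avg":
        agg = lambda v: sum(v) / len(v)         # Eq. 20
    else:
        raise ValueError(f"unknown aggregation mode: {mode}")
    return {y: agg(values) for y, values in grouped.items()}


# Hypothetical example: two recommended movies link to d1, one to d2.
pairs = [("m1", "d1"), ("m2", "d1"), ("m3", "d2")]
scores = {"m1": 0.5, "m2": 0.25, "m3": 0.75}

by_max = aggregate_scores(pairs, scores, "max")
by_sum = aggregate_scores(pairs, scores, "sum")
by_avg = aggregate_scores(pairs, scores, "avg")
```

With these made-up values, d2 ranks first under the maximum and the average, whereas the sum puts d1 and d2 on a par, which illustrates why the choice of aggregation function depends on the granularity of the connected entities.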

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach more participants with diverse demographic backgrounds than would have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology.⁶ ⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls such as misleading navigation. Two test users went step by step through the web interface and provided feedback. Based on their insights, we slightly modified the interface (e.g., by rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

⁶ http://tomcat.apache.org
⁷ http://www.oracle.com/technetwork/java/javaee/servlet/index.html

Subjects were recruited through the clickworker.com platform.⁸ Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not simply clicked through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.⁹

The autocomplete field enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.¹⁰ The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

⁸ https://www.clickworker.de
⁹ http://api.jquery.com/jquery.ajax
¹⁰ http://lucene.apache.org/solr

Fig. 2. User profile generation, TC1 (music domain).

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. By showing how close participants are to completing the experiment, progress bars help to prevent users from tiring of the questionnaire and withdrawing from the study altogether [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users on a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance, outer right side: highest relevance, see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale would allow. The score, called mean relevance (mrs), was measured with Equation 21.

\[
mrs = \frac{\sum_{i=1}^{k} score_i}{k} \tag{21}
\]
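As a worked example of Eq. 21, the following Python sketch computes the mean relevance score for one recommendation list. The function name `mean_relevance` and the slider values are hypothetical and only serve to illustrate the arithmetic.

```python
def mean_relevance(slider_scores):
    """Mean relevance score (mrs) of a list of 0-100 slider ratings (Eq. 21)."""
    if not slider_scores:
        raise ValueError("mrs is undefined for an empty recommendation list")
    return sum(slider_scores) / len(slider_scores)


# Hypothetical ratings for a list of k = 4 recommendations.
mrs = mean_relevance([80, 40, 60, 20])
```

For the four made-up ratings the sum is 200, so the mrs of the list is 50.0.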

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, if an RS only suggests popular products, it is likely that the user already knows them [3, 23]. A purely accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new to them. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user (|nov|) to the total number of relevant items (|tp|) (Eq. 22).

\[
nv = \frac{|nov|}{|tp|} \tag{22}
\]
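The novelty ratio of Eq. 22 can be sketched in Python as follows. The relevance threshold of 50 is the one stated in the text; the input format (pairs of slider score and novelty flag), the function name `novelty` and the convention of returning 0.0 when no item is relevant are assumptions made for this illustration.

```python
def novelty(ratings, threshold=50):
    """Share of relevant items that were new to the user (Eq. 22).

    `ratings` is a list of (slider_score, is_new) pairs.
    """
    relevant = [is_new for score, is_new in ratings if score >= threshold]
    if not relevant:
        return 0.0  # assumed convention: no relevant items, no novelty
    return sum(relevant) / len(relevant)


# Hypothetical ratings: three relevant items, two of them unknown to the user.
nv = novelty([(80, True), (60, False), (30, True), (50, True)])
```

In this example the item rated 30 is ignored because it is not relevant, so two of the three relevant items count as novel.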

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain).

An item was considered relevant when it achieved a relevance score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 65]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more alike the items are, the less diverse is the result set [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 23) and averaged for the recommendation list (Eq. 24).

\[
d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{23}
\]

\[
div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{24}
\]
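Eqs. 23 and 24 can be traced with a short Python sketch. Items are assumed to be given as numeric feature vectors of equal length; how the engine actually represents items for this purpose is not specified here, and the function names `cosine` and `content_diversity` as well as the treatment of lists with fewer than two items are illustrative choices.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equally long numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def content_diversity(items):
    """Average pairwise cosine distance of a recommendation list (Eq. 24)."""
    k = len(items)
    if k < 2:
        return 0.0  # assumed convention: a single item has no diversity
    total = sum(1.0 - cosine(u, v) for u, v in combinations(items, 2))
    return 2.0 * total / (k * (k - 1))


# Hypothetical list: items 1 and 3 are identical, item 2 is orthogonal.
div_c = content_diversity([[1, 0], [0, 1], [1, 0]])
```

Of the three item pairs, two have a distance of 1 and one has a distance of 0, which yields a diversity score of 2/3 for this list.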

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each recommendation list generated by the SKOSRec engine during runtime execution. In addition to the measurement of divC scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al., two Likert items were posed for this purpose [44]:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al., which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions [44]. The statements were adapted to the specificities of each usage scenario. Figure 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparison of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting can cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Figure 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movies, music and books) it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources which represent books were not assigned a genre. In the movie domain, a genre property is missing entirely and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint to the set of LOD resources. Upon execution of constraint-based recommendation requests, users

11 http://dbpedia.org

L Wenige et al Similarity-based Graph Queries 17

Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement "The filter has improved the recommendation results of list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar rollup query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination (i.e., a city, region or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.
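A rollup request of this kind can be pictured as aggregating the similarity scores of sublevel entities (here: POIs) up to their parent entity (here: a destination) and then applying a postfilter. The following minimal Python sketch only illustrates that idea; the function name `rollup`, the dictionaries and the toy identifiers are hypothetical and are not taken from the SKOSRec implementation.

```python
from collections import defaultdict

# Aggregation functions named after the options SUM, MAX and AVG.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda values: sum(values) / len(values),
}

def rollup(similar_items, parent_of, allowed_parents=None, mode="SUM", top_n=10):
    """Aggregate similarity scores of sublevel entities per parent entity.

    similar_items   -- dict: sublevel entity -> similarity score (assumed input)
    parent_of       -- dict: sublevel entity -> parent entity
    allowed_parents -- optional set acting as a postfilter on parent entities
    """
    scores = defaultdict(list)
    for item, score in similar_items.items():
        parent = parent_of.get(item)
        if parent is None:
            continue  # entity has no known parent, so it cannot be rolled up
        if allowed_parents is not None and parent not in allowed_parents:
            continue  # postfilter: parent does not satisfy the constraint
        scores[parent].append(score)
    aggregate = AGGREGATORS[mode]
    ranked = sorted(
        ((parent, aggregate(values)) for parent, values in scores.items()),
        key=lambda pair: (-pair[1], pair[0]),  # best score first, ties by name
    )
    return ranked[:top_n]

# Toy data: three POIs that are similar to the POIs in a user profile.
similar_pois = {"poi_a": 0.5, "poi_b": 0.25, "poi_c": 0.5}
city_of = {"poi_a": "city_x", "poi_b": "city_x", "poi_c": "city_y"}
print(rollup(similar_pois, city_of))
```

With summation, a destination that contains several moderately similar POIs can outrank a destination with a single highly similar POI, which is one plausible reason why rollup lists may differ from regular lists.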

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants specified their favorite director/actor (movie domain), music act (music domain) or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). The SKOSRec engine thus generated recommendations based on the creative works of the favored artist. Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with those of the on-the-fly query without letting participants know which approach they were assessing and with random assignment of result set positions.

Fig. 5. Web form for a constraint-based recommendation request (music domain)

Fig. 6. Travel: user profile generation, TC3

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.


Fig. 7. Music: user profile generation, TC4 (page 9)

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. User parameters were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared in his/her profile a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.
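The linking idea behind the cross-domain templates (any property or short path that connects a source-domain item with a target-domain item) can be sketched with a toy triple list. This is an illustrative Python approximation, not the query template used in the experiments; the function name, the two-hop limit and all identifiers are assumptions made for the example.

```python
from collections import defaultdict

def cross_domain_candidates(triples, profile_items, item_type, target_type, max_hops=2):
    """Collect target-domain items reachable from the profile within max_hops links.

    triples       -- iterable of (subject, property, object) tuples
    profile_items -- source-domain items the user has stated a preference for
    item_type     -- dict: resource -> domain label (e.g. "movie", "book")
    """
    # Treat links as undirected: any property may connect the two domains.
    neighbours = defaultdict(set)
    for subject, _prop, obj in triples:
        neighbours[subject].add(obj)
        neighbours[obj].add(subject)

    seen = set(profile_items)
    frontier = set(profile_items)
    found = set()
    for _ in range(max_hops):
        # Expand one hop and drop resources that were already visited.
        frontier = {n for node in frontier for n in neighbours[node]} - seen
        seen |= frontier
        found |= {n for n in frontier if item_type.get(n) == target_type}
    return found

# Toy graph: a movie written by an author who also wrote a book,
# and a movie that links directly to its soundtrack.
triples = [
    ("movie_m", "writer", "author_a"),
    ("book_b", "author", "author_a"),
    ("movie_m", "soundtrack", "album_s"),
]
types = {"movie_m": "movie", "book_b": "book", "album_s": "music"}
print(cross_domain_candidates(triples, ["movie_m"], types, "book"))
```

Passing several profile items instead of one widens the search space in the way the text attributes to the SKOSRec request, whereas a single-resource lookup corresponds to the SPARQL baseline.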

4.2. Results

The gathered samples from the experiments contained answers from both genders, with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceive the search in their domain of interest as either "manageable" (38.5%) or even as "easy" or "very easy" (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
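For readers unfamiliar with the reliability measure, the standard formula is α = k/(k-1) · (1 - Σ var(item_i) / var(total score)), where k is the number of Likert items. The following Python sketch computes it for a small response matrix; the data layout (one row per participant, one column per item) and the toy values are assumptions for illustration only and do not reproduce the study data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (participants) with k item scores each."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item (column) across participants.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of the per-participant sum score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Toy example: two perfectly consistent Likert items.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
```

Values close to 1 indicate that the items measure the same construct consistently; the text uses 0.7 as the threshold for acceptability.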

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. So that valuable user assessments would not remain unused, agreement with the statement "The recommendations of list [...] are diverse" was taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M),12 standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be above the mediocre level. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the

12 In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


Table 10. Test Cases in the web experiments

Test Case  Evaluation Purpose                      Query Type  Travel  Movie  Music  Book
TC1        On-the-fly Recommendations (Baseline)   Q1          X       X      X      X
TC2        Constraint-based Recommendations        Q3          X       X      X      X
TC3        Advanced Queries (Rollup)               Q5, Q6      X       X      X      X
TC4        Advanced Queries (Cross-Domain)         Q7          -       X      X      X

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain  Cronbach's α
Travel  0.7002
Movie   0.8579
Music   0.8515
Book    0.8949

baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 ("neutral") in the multimedia domains and at 4 ("agree") in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement "The filter has improved the recommendation results of the list". Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 ("strongly agree"), 4 ("agree") or 3 ("neutral"). Hence, a filter often had a positive impact on results.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  M      SD     N
Accuracy (size) [10]            Travel  10.00  0.00   103
                                Movie   10.00  0.00   50
                                Music   10.00  0.00   53
                                Book    9.80   1.40   51
Accuracy (mrs) [1]              Travel  56.49  0.00   103
                                Movie   62.25  19.15  50
                                Music   55.70  0.10   53
                                Book    57.47  15.55  50
Novelty (nv) [1]                Travel  0.60   0.34   101
                                Movie   0.36   0.31   50
                                Music   0.36   0.39   52
                                Book    0.61   0.29   47
Diversity (divU) [5]            Travel  4      -      102
                                Movie   3      -      50
                                Music   3      -      53
                                Book    3      -      50
Diversity (divC) [1]            Travel  0.80   0.16   103
                                Movie   0.87   0.12   50
                                Music   0.86   0.10   50
                                Book    0.92   0.08   50
Usefulness [5]                  Travel  3.34   0.85   103
                                Movie   3.55   0.73   50
                                Music   3.18   0.85   53
                                Book    3.43   0.81   50

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2). Plot title: "The filter has improved the recommendations of the list"; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement; legend: Degree of Agreement (1 to 5).

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists the response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response Rates (TC2)

Domain  Constr.-based (Regular)  Constr.-based (Expanded)  N
Travel  23%                      57%                       103
Movie   86%                      88%                       50
Music   72%                      75%                       53
Book    73%                      76%                       51
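The response rate is defined in the text as the ratio of non-zero result sets among all result sets. A minimal Python sketch of that definition follows; the function name and the toy session data are hypothetical.

```python
def response_rate(result_sets):
    """Share of result sets that contain at least one recommendation."""
    if not result_sets:
        raise ValueError("at least one result set is required")
    non_empty = sum(1 for results in result_sets if len(results) > 0)
    return non_empty / len(result_sets)

# Toy example: four user sessions, two of which returned no recommendation.
sessions = [["item_1"], [], ["item_2", "item_3"], []]
print(f"{response_rate(sessions):.0%}")
```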

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
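The text does not state which FDR procedure was applied. A common choice is the Benjamini-Hochberg step-up procedure, and the following Python sketch shows how adjusted p-values are obtained under that assumption; the function name and the example p-values are illustrative only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: this is the standard BH step-up procedure; the paper only
    says that the false discovery rate (FDR) was used.
    """
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Toy example with four hypothetical test results.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A test is then regarded as significant if its adjusted p-value falls below the chosen level (e.g., 0.05).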

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = -7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, neither approach was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify recommendation lists and increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
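Why an expanded subject filter returns a superset of the simple filter's results can be illustrated with a toy category graph. The Python sketch below is an approximation under stated assumptions: the category names are invented, the `broader` dictionary stands in for skos:broader links, and the limit of two hops is only one possible reading of the property path example given for TC2.

```python
def reaches(subject, target, broader, max_hops):
    """True if target is reachable from subject via at most max_hops broader links."""
    frontier = {subject}
    for _ in range(max_hops + 1):
        if target in frontier:
            return True
        # Move one step up the category hierarchy.
        frontier = {parent for cat in frontier for parent in broader.get(cat, ())}
    return False

def filter_items(item_subjects, target, broader, max_hops=0):
    """Items with at least one subject that matches the filter category.

    max_hops=0 corresponds to a simple filter (direct attribute only);
    max_hops>0 corresponds to an expanded (expressive) filter.
    """
    return sorted(
        item
        for item, subjects in item_subjects.items()
        if any(reaches(s, target, broader, max_hops) for s in subjects)
    )

# Hypothetical category hierarchy and item annotations.
broader = {
    "Alpine_parks": {"National_parks"},
    "National_parks": {"Nature_reserves"},
}
items = {
    "place_a": {"Nature_reserves"},
    "place_b": {"Alpine_parks"},
    "place_c": {"Beaches"},
}
print(filter_items(items, "Nature_reserves", broader, max_hops=0))
print(filter_items(items, "Nature_reserves", broader, max_hops=2))
```

Every item accepted by the simple filter is also accepted by the expanded one, which is consistent with the observation that the expanded lists look like enhanced versions of the simple lists.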


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Simple M  Simple SD  Expressive M  Expressive SD  N
Accuracy (size) [10]            Travel  1.17      2.70       4.52          4.62           103
                                Movie   6.72      4.08       7.20          3.92           50
                                Music   5.96      4.64       6.36          4.51           53
                                Book    4.59      4.50       5.00          4.51           51
Accuracy (mrs) [1]              Travel  66.73     20.51      65.34         22.36          23
                                Movie   60.95     22.98      60.29         22.79          43
                                Music   63.77     16.05      62.96         16.44          34
                                Book    56.17     18.18      56.81         16.81          37
Novelty (nv) [1]                Travel  0.48      0.41       0.61          0.49           21
                                Movie   0.44      0.36       0.55          0.80           42
                                Music   0.55      0.39       0.52          0.37           38
                                Book    0.77      0.34       0.76          0.33           33
Diversity (divU) [5]            Travel  3         -          3.5           -              24
                                Movie   4         -          4             -              43
                                Music   4         -          4             -              38
                                Book    3         -          3             -              37
Diversity (divC) [1]            Travel  0.68      0.18       0.72          0.21           19
                                Movie   0.70      0.24       0.73          0.23           41
                                Music   0.86      0.13       0.87          0.12           34
                                Book    0.84      0.24       0.95          0.63           29
Usefulness [5]                  Travel  3.33      0.93       3.50          0.77           24
                                Movie   3.32      0.85       3.31          0.86           43
                                Music   3.55      0.81       3.64          0.78           38
                                Book    3.39      0.83       3.62          1.21           37

Table 15. Jaccard Indices (TC2)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.57                     0.47  25
Movie   0.71                     0.38  43
Music   0.82                     0.35  37
Book    0.92                     0.26  35
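The Jaccard index of two recommendation lists is the size of their intersection divided by the size of their union. A short Python sketch follows; returning `None` when both lists are empty is an assumption made for the example (the text does not say how such pairs were treated), and the item names are invented.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two recommendation lists."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        return None  # undefined for two empty lists (assumed convention)
    return len(a & b) / len(a | b)

# Toy example: two lists sharing two of four distinct items.
simple_list = ["item_1", "item_2", "item_3"]
expanded_list = ["item_2", "item_3", "item_4"]
print(jaccard_index(simple_list, expanded_list))
```

A value near 1 means the two methods return almost the same items, whereas a value near 0 means the lists barely overlap.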

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Figure 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels that lay slightly above neutral

Table 16. Response Rates (TC3)

Domain  Regular  Aggr.-based  N
Travel  99%      100%         103
Movie   96%      92%          50
Music   100%     100%         53
Book    96%      100%         51

agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness varied across experiments. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = -5.50, p < 0.001), the music (t(52) = -4.51, p < 0.001) and the book domain (t(47) = -4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach


Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3). Plot title: Perceived Usefulness (Regular vs. Roll-up Queries); x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement; legend: Method (Regular, Roll-up).

explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 in any domain, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries provide added value, as they yield additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas when a regular SPARQL query was executed, they often did not receive a single suggestion.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = -2.69, p < 0.05) and the book domain (Z = -2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. But the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = -17.92, p < 0.001; music: t(52) = -25.73, p < 0.001; book: t(50) = -12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.
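The reported degrees of freedom (N - 1 for each domain) are consistent with a paired, within-subjects t-test on list sizes. Assuming that reading, the test statistic can be sketched in Python as follows; the function name and the toy list sizes are hypothetical and do not reproduce the study data.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(first, second):
    """Paired t statistic and degrees of freedom for two matched samples.

    Assumption: each participant contributes one value per method, so the
    test operates on the per-participant differences (first - second).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two matched samples with at least two pairs")
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1

# Toy list sizes per participant: regular request vs. cross-domain request.
regular_sizes = [1, 2, 3, 4]
cross_domain_sizes = [2, 4, 5, 8]
t_value, df = paired_t_statistic(regular_sizes, cross_domain_sizes)
print(f"t({df}) = {t_value:.2f}")
```

A negative t value, as in the reported results, arises when the second method (here the cross-domain request) yields larger values than the first.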

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results in more than the other half of


Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Regular M  Regular SD  Aggr.-based M  Aggr.-based SD  N
Accuracy (size) [10]            Travel  9.90       0.99        9.42           2.07            103
                                Movie   2.88       0.59        2.76           0.82            50
                                Music   10.00      0.00        10.00          0.00            53
                                Book    2.88       0.59        3.00           0.00            51
Accuracy (mrs) [1]              Travel  48.21      24.98       44.93          25.59           102
                                Movie   54.58      22.88       56.69          25.42           44
                                Music   53.95      18.31       58.24          16.19           52
                                Book    51.53      23.90       49.25          22.30           49
Novelty (nv) [1]                Travel  0.43       0.39        0.38           0.40            98
                                Movie   0.35       0.38        0.28           0.41            41
                                Music   0.49       0.36        0.46           0.39            50
                                Book    0.61       0.42        0.62           0.44            47
Diversity (divU) [5]            Travel  3.5        -           3              -               100
                                Movie   4          -           3              -               43
                                Music   4          -           3              -               50
                                Book    3          -           3              -               48
Diversity (divC) [1]            Travel  0.79       0.17        0.89           0.13            99
                                Movie   0.75       0.20        0.66           0.27            44
                                Music   0.83       0.17        0.93           0.07            53
                                Book    0.74       0.27        0.93           0.06            48
Usefulness [5]                  Travel  3.36       0.77        3.22           0.74            101
                                Movie   3.20       0.92        3.26           0.94            44
                                Music   3.45       0.84        3.44           0.85            51
                                Book    2.95       0.87        3.03           0.96            48

Table 18. Jaccard Indices (TC3)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.07                     0.11  101
Movie   0.03                     0.07  44
Music   0.02                     0.05  53
Book    0.02                     0.06  50

Table 19. Response rates (TC4)

Domain  Regular (SPARQL)  SKOSRec  N
Movie   32%               96%      50
Music   62%               98%      53
Book    53%               100%     51

the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). Plot title: Recommendation List Size; x-axis: Domain (Movie, Music, Book); y-axis: No. of items; legend: Method (Regular, Cross-Domain).

pattern matching can facilitate an improved exploration of LOD repositories.

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Regular (SPARQL) M  Regular (SPARQL) SD  Cross-Domain M  Cross-Domain SD  N
Accuracy (size) [10]            Movie   0.82                2.09                 8.66            2.73             50
                                Music   0.89                1.59                 9.30            2.03             53
                                Book    2.90                3.98                 10.00           0.00             51
Accuracy (mrs) [1]              Movie   67.41               27.62                53.65           14.71            16
                                Music   62.36               31.76                48.32           22.49            20
                                Book    66.42               21.28                55.59           16.84            27
Novelty (nv) [1]                Movie   0.59                0.49                 0.41            0.34             16
                                Music   0.72                0.44                 0.59            0.40             19
                                Book    0.54                0.41                 0.63            0.30             27
Diversity (divU) [5]            Movie   3                   -                    4               -                16
                                Music   3                   -                    3               -                20
                                Book    3                   -                    4               -                27
Diversity (divC) [1]            Movie   (0.64)              (0.35)               (0.99)          (0.02)           (6)
                                Music   (0.90)              (0.13)               (0.98)          (0.03)           (10)
                                Book    (0.84)              (0.12)               (0.87)          (0.1)            (8)
Usefulness [5]                  Movie   3.57                0.83                 3.35            1.57             15
                                Music   3.53                1.17                 3.11            0.98             20
                                Book    3.41                1.04                 3.41            0.94             27

Table 21. Jaccard Indices (TC4)

Domain  Mean Jaccard Index (JI)  SD    N
Movie   0.03                     0.07  44
Music   0.01                     0.02  21
Book    0.16                     0.27  33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research is required to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions for designing such a user interface (UI). For instance, the selection of


LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause), with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource IRI (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a SPARQL endpoint other than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than" or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
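The variable check described for the RecWhereClause can be illustrated with a minimal Python sketch. It is not the SKOSRec compiler: the class `RecWhereClause`, its fields and the function `check_variable` are hypothetical names, and the parsed clause is assumed to be reduced to two lists of triple patterns (regular patterns and patterns inside a MINUS group).

```python
from dataclasses import dataclass, field


@dataclass
class RecWhereClause:
    """Hypothetical, simplified stand-in for a parsed RecWhereClause.

    Each pattern is a (subject, predicate, object) tuple of strings;
    variables are the terms that start with "?".
    """
    patterns: list = field(default_factory=list)        # regular graph patterns
    minus_patterns: list = field(default_factory=list)  # patterns in a MINUS group


def variables(patterns):
    """Collect all variable names used in a list of triple patterns."""
    return {term for triple in patterns for term in triple if term.startswith("?")}


def check_variable(clause, var):
    """Classify how `var` occurs in the clause.

    Returns "ok" if the variable is bound outside of MINUS,
    "only in MINUS" if it appears solely in the MINUS part,
    and "missing" if it does not occur at all.
    """
    if var in variables(clause.patterns):
        return "ok"
    if var in variables(clause.minus_patterns):
        return "only in MINUS"
    return "missing"


clause = RecWhereClause(
    patterns=[("?rec", "dc:subject", "?category")],
    minus_patterns=[("?rec", "dbo:genre", "?excluded")],
)
print(check_variable(clause, "?rec"))       # bound in the regular part
print(check_variable(clause, "?excluded"))  # occurs only inside MINUS
```

Under these assumptions, only the first outcome would allow the subsequent steps of the recommendation workflow to proceed.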

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow LOD resources to be retrieved from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRec query, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases in which users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, art. 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.



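connected entities either by the maximum (Eq. 18), the sum (Eq. 19) or the average (Eq. 20) of individual scores.

\[ \mathrm{score}_{\mathrm{aggMAX}}(y) = \max_{x \in \mu(x,y)} \mathrm{score}(\mathit{Pr}, x) \tag{18} \]

\[ \mathrm{score}_{\mathrm{aggSUM}}(y) = \sum_{x \in \mu(x,y)} \mathrm{score}(\mathit{Pr}, x) \tag{19} \]

\[ \mathrm{score}_{\mathrm{aggAVG}}(y) = \frac{\sum_{x \in \mu(x,y)} \mathrm{score}(\mathit{Pr}, x)}{|x \in \mu(x,y)|} \tag{20} \]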
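The three aggregation modes of Eqs. 18 to 20 can be sketched in a few lines of Python. This is an illustrative sketch rather than the engine's implementation: the function name `aggregate_scores` is hypothetical, the individual similarity scores score(Pr, x) are assumed to be given as a dictionary, and the mapping μ(x, y) is assumed to be available as a list of (x, y) pairs that link each sublevel entity x to a connected entity y.

```python
from collections import defaultdict


def aggregate_scores(scores, links, mode="MAX"):
    """Aggregate the scores of sublevel entities x per connected entity y.

    scores: dict mapping x to its individual score, i.e. score(Pr, x)
    links:  iterable of (x, y) pairs, standing in for the mapping mu(x, y)
    mode:   "MAX" (Eq. 18), "SUM" (Eq. 19) or "AVG" (Eq. 20)
    """
    grouped = defaultdict(list)
    for x, y in links:
        if x in scores:  # ignore links to entities without a score
            grouped[y].append(scores[x])

    if mode == "MAX":
        return {y: max(values) for y, values in grouped.items()}
    if mode == "SUM":
        return {y: sum(values) for y, values in grouped.items()}
    if mode == "AVG":
        return {y: sum(values) / len(values) for y, values in grouped.items()}
    raise ValueError(f"unknown aggregation mode: {mode}")


# Hypothetical example: two movies linked to one director, one to another.
scores = {"movie_a": 0.5, "movie_b": 0.25, "movie_c": 1.0}
links = [("movie_a", "director_1"), ("movie_b", "director_1"),
         ("movie_c", "director_2")]
for mode in ("MAX", "SUM", "AVG"):
    print(mode, aggregate_scores(scores, links, mode))
```

The choice of mode changes the ranking: SUM favors entities with many scored sublevel entities, whereas MAX and AVG do not depend on how many there are.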

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in the context of web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach out to more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology. 6 7

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users ran step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., through rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

6 http://tomcat.apache.org
7 http://www.oracle.com/technetwork/java/javaee/servlet/index.html

Subjects were recruited through the clickworker.com platform. 8 Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, a total of 154 subjects evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just run through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology. 9

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index. 10 The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

Aside from the AJAX feature, the appeal of the interface

8 https://www.clickworker.de
9 http://api.jquery.com/jquery.ajax
10 http://lucene.apache.org/solr


Fig. 2. User profile generation, TC1 (music domain)

was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far ahead they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance, outer right side: highest relevance, see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale. The score, called mean relevance (mrs), was measured with Equation 21.

\[ \mathit{mrs} = \frac{\sum_{i=1}^{k} \mathrm{score}_i}{k} \tag{21} \]
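As a small worked example of Eq. 21, the following Python sketch averages the slider scores of one recommendation list. The function name `mean_relevance` and the sample ratings are hypothetical; the only assumption taken from the text is that each of the k recommendations carries a score between 0 and 100.

```python
def mean_relevance(scores):
    """Mean relevance score (mrs) of a recommendation list, as in Eq. 21.

    scores: slider ratings (0 to 100), one per recommended item.
    """
    if not scores:
        raise ValueError("mrs is undefined for an empty recommendation list")
    return sum(scores) / len(scores)


# Hypothetical ratings for a list of k = 4 recommendations.
ratings = [80, 40, 60, 100]
print(mean_relevance(ratings))  # (80 + 40 + 60 + 100) / 4 = 70.0
```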

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, in the case of an RS only suggesting popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 22).

\[ \mathit{nv} = \frac{|\mathit{nov}|}{|\mathit{tp}|} \tag{22} \]
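The novelty ratio of Eq. 22 can be sketched as follows. The function name `novelty` and the input format, a list of (score, is_new) pairs per recommendation, are assumptions made for illustration; the relevance threshold of 50 follows the definition of a relevant item used in this evaluation. Returning `None` for lists without any relevant item is a design choice of this sketch, not something the paper specifies.

```python
def novelty(ratings, threshold=50):
    """Novelty (nv) as in Eq. 22: share of relevant items that are new to the user.

    ratings:   list of (score, is_new) pairs, one per recommended item
    threshold: minimum score for an item to count as relevant
    """
    relevant = [is_new for score, is_new in ratings if score >= threshold]
    if not relevant:
        return None  # |tp| = 0, so the ratio is undefined
    return sum(relevant) / len(relevant)


# Hypothetical list: three relevant items (scores >= 50), two of them new.
ratings = [(80, True), (60, False), (30, True), (50, True)]
print(novelty(ratings))
```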

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

An item was considered relevant when it achieved a relevance score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 65]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse is the result set [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 23) and averaged for the recommendation list (Eq. 24).

\[ d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{23} \]

\[ \mathit{divC} = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{24} \]
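Eqs. 23 and 24 translate into a short Python sketch. How items are turned into feature vectors is not specified at this point, so the sketch simply assumes that every item is given as a numeric vector of equal length; the function names `cosine` and `content_diversity` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equally long numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention of this sketch for all-zero vectors
    return dot / (norm_u * norm_v)


def content_diversity(vectors):
    """Content-based diversity (divC) of a recommendation list, Eqs. 23 and 24.

    Averages the cosine distance d = 1 - cos over all item pairs l < m.
    """
    k = len(vectors)
    if k < 2:
        raise ValueError("divC needs at least two items")
    total = sum(1 - cosine(u, v) for u, v in combinations(vectors, 2))
    return 2 * total / (k * (k - 1))


# Hypothetical list of three items; the first and third are identical.
items = [[1, 0], [0, 1], [1, 0]]
print(content_diversity(items))
```

For k items there are k(k - 1)/2 pairs, so the factor 2 / (k(k - 1)) turns the sum of pairwise distances into their mean.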

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each generated recommendation list of the SKOSRec engine during runtime execution. In addition to the measurement of divC scores, we asked participants about their opinions on recommendation list diversity directly. Following the RS evaluation framework by Pu et al., two Likert items were posed for this purpose [44]:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al., which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions [44]. The statements were adapted to the specificities of each usage scenario. Figure 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out if the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to prove whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filter and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Figure 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movies, music and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources which represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located. 11
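The prefiltering step described above, in which the user constraint is applied to the candidate set before similarity calculation starts, can be sketched as follows. The sketch works on an in-memory dictionary instead of a SPARQL endpoint; the function name `prefilter`, the data layout and the example resources (`dbr:Act_A` and so on) are hypothetical and only serve as illustration.

```python
def prefilter(candidates, prop, value):
    """Keep only candidates whose property `prop` contains `value`.

    candidates: dict mapping a resource IRI to a dict of property -> set of values
    prop:       filter property, e.g. "dbo:genre" or "dc:subject"
    value:      the constraint stated by the user
    """
    return [iri for iri, props in candidates.items()
            if value in props.get(prop, set())]


# Hypothetical candidate set for the music domain.
candidates = {
    "dbr:Act_A": {"dbo:genre": {"dbr:Jazz"}},
    "dbr:Act_B": {"dbo:genre": {"dbr:Rock", "dbr:Jazz"}},
    "dbr:Act_C": {"dbo:genre": {"dbr:Rock"}},
    "dbr:Act_D": {},  # no genre assigned, as is often the case for books
}

# Only the remaining resources would be passed on to similarity calculation.
print(prefilter(candidates, "dbo:genre", "dbr:Jazz"))
```

Resources without the filter property drop out entirely, which illustrates why a sparsely populated property such as the genre of books makes for a poor filter dimension.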

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine also generated suggestions with the help of an expressive filter. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, if a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint to the set of LOD resources. Upon execution of constraint-based recommendation requests, users

11 http://dbpedia.org


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain).

received two recommendation lists (resulting from simple and expressive filtering, respectively) on a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement “The filter has improved the recommendation results of list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were directed to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar rollup query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination (i.e., a city, region or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants specified their favorite director/actor (movie domain), music act (music domain) or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). The SKOSRec engine thus generated recommendations based on the creative works of the favored artist. Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with those from the on-the-fly query without letting participants know which approach they were assessing and with random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.

Fig. 5. Web form for a constraint-based recommendation request (music domain).

Fig. 6. Travel - User profile generation, TC3.


Fig. 7. Music - User profile generation, TC4 (page 9).

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. User-supplied parameters were inserted into the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states a favorite movie that was written by a particular author. If the same author has also written books, they might be of interest to the user as well. Another example stems from the movie domain: suppose a consumer has declared in his/her profile a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The gathered samples from the experiments contained answers from both genders, with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
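The reliability check can be illustrated with a minimal Python sketch of the standard Cronbach's α formula, α = k/(k−1) · (1 − Σ item variances / variance of sum scores). The function name and the sample answers are hypothetical and serve only as an illustration; they are not the study's data or tooling.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for k Likert items answered by the same respondents.

    item_scores: list of k lists; item_scores[j][i] is respondent i's answer to item j.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(item_scores[0])
    if any(len(item) != n for item in item_scores):
        raise ValueError("every item needs one answer per respondent")
    # Sum score of each respondent across all items.
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("sum scores have zero variance; alpha is undefined")
    # Population variances are used in numerator and denominator alike,
    # so the ratio equals the one obtained with sample variances.
    item_var_sum = sum(pvariance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical answers of five respondents to three usefulness items (1-5 scale).
answers = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 4],
]
alpha = cronbach_alpha(answers)
```

A value above the 0.7 threshold mentioned above would justify averaging the items into a single usefulness score.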

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. To avoid leaving valuable user assessments unused, agreement with the statement “The recommendations of list [...] are diverse” was taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M; see footnote 12), standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be better than mediocre. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, the mean relevance score (mrs) in each domain was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the

12 In the case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


Table 10. Test cases in the web experiments (X = conducted, - = not conducted)

Test Case   Evaluation Purpose                        Query Type   Travel   Movie   Music   Book
TC1         On-the-fly Recommendations (Baseline)     Q1           X        X       X       X
TC2         Constraint-based Recommendations          Q3           X        X       X       X
TC3         Advanced Queries (Rollup)                 Q5, Q6       X        X       X       X
TC4         Advanced Queries (Cross-Domain)           Q7           -        X       X       X

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain   Cronbach's α
Travel   0.7002
Movie    0.8579
Music    0.8515
Book     0.8949

baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse was 3 (“neutral”) in the multimedia domains and 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement “The filter has improved the recommendation results of the list”. Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists the response rates of the two methods. The rate measures the ratio of non-empty result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   M       SD      N
Accuracy (size) [10]             Travel   10.00   0.00    103
                                 Movie    10.00   0.00    50
                                 Music    10.00   0.00    53
                                 Book     9.80    1.40    51
Accuracy (mrs) [1]               Travel   56.49   0.00    103
                                 Movie    62.25   19.15   50
                                 Music    55.70   0.10    53
                                 Book     57.47   15.55   50
Novelty (nv) [1]                 Travel   0.60    0.34    101
                                 Movie    0.36    0.31    50
                                 Music    0.36    0.39    52
                                 Book     0.61    0.29    47
Diversity (divU) [5]             Travel   4       -       102
                                 Movie    3       -       50
                                 Music    3       -       53
                                 Book     3       -       50
Diversity (divC) [1]             Travel   0.80    0.16    103
                                 Movie    0.87    0.12    50
                                 Music    0.86    0.10    50
                                 Book     0.92    0.08    50
Usefulness [5]                   Travel   3.34    0.85    103
                                 Movie    3.55    0.73    50
                                 Music    3.18    0.85    53
                                 Book     3.43    0.81    50

[Figure 8: "The filter has improved the recommendations of the list"; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2).

Table 13. Response rates (TC2)

Domain   Constr.-based (Regular)   Constr.-based (Expanded)   N
Travel   23%                       57%                        103
Movie    86%                       88%                        50
Music    72%                       75%                        53
Book     73%                       76%                        51

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. Where statistical tests identified no meaningful variations, the results of the better performing approach are underlined. Alpha levels (α) were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
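The text does not name the concrete FDR procedure. As an illustration only, the following Python sketch assumes the widely used Benjamini-Hochberg step-up procedure; the function name, the level q = 0.05 and the example p-values are hypothetical and not taken from the study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p-value: True if the test is rejected at FDR level q.

    Assumption: Benjamini-Hochberg step-up procedure (the paper only says "FDR").
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k / m * q.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    # Reject every hypothesis whose rank is at most max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from four pairwise comparisons.
decisions = benjamini_hochberg([0.041, 0.001, 0.205, 0.008])
```

Because the procedure is step-up, a p-value that misses its own rank threshold is still rejected if a larger p-value meets a later threshold.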

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, neither approach was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because the statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially at the cost of a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify recommendation lists and increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
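The Jaccard index used to compare two recommendation lists is the size of their intersection divided by the size of their union. A minimal Python sketch follows; the function name and item identifiers are hypothetical, and returning 0.0 for two empty lists is an assumption, since the text does not state how such cases were treated.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index of two recommendation lists, ignoring rank and duplicates."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty lists are scored 0.0 (not specified in the text).
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical result lists: the expanded filter returns the simple list plus two more items.
simple_list = ["item_a", "item_b"]
expanded_list = ["item_a", "item_b", "item_c", "item_d"]
overlap = jaccard_index(simple_list, expanded_list)
```

When the expanded list is a superset of the simple list, as in this example, the index equals the ratio of the two list sizes, which matches the reading of the expanded result as an enhanced version of the simple one.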


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

                                          Simple           Expressive
Dimension (Metric) [Max Score]   Domain   M       SD       M       SD       N
Accuracy (size) [10]             Travel   1.17    2.70     4.52    4.62     103
                                 Movie    6.72    4.08     7.20    3.92     50
                                 Music    5.96    4.64     6.36    4.51     53
                                 Book     4.59    4.50     5.00    4.51     51
Accuracy (mrs) [1]               Travel   66.73   20.51    65.34   22.36    23
                                 Movie    60.95   22.98    60.29   22.79    43
                                 Music    63.77   16.05    62.96   16.44    34
                                 Book     56.17   18.18    56.81   16.81    37
Novelty (nv) [1]                 Travel   0.48    0.41     0.61    0.49     21
                                 Movie    0.44    0.36     0.55    0.80     42
                                 Music    0.55    0.39     0.52    0.37     38
                                 Book     0.77    0.34     0.76    0.33     33
Diversity (divU) [5]             Travel   3       -        3.5     -        24
                                 Movie    4       -        4       -        43
                                 Music    4       -        4       -        38
                                 Book     3       -        3       -        37
Diversity (divC) [1]             Travel   0.68    0.18     0.72    0.21     19
                                 Movie    0.70    0.24     0.73    0.23     41
                                 Music    0.86    0.13     0.87    0.12     34
                                 Book     0.84    0.24     0.95    0.63     29
Usefulness [5]                   Travel   3.33    0.93     3.50    0.77     24
                                 Movie    3.32    0.85     3.31    0.86     43
                                 Music    3.55    0.81     3.64    0.78     38
                                 Book     3.39    0.83     3.62    1.21     37

Table 15. Jaccard indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Figure 9 depicts participants' general agreement with the Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels slightly above neutral agreement. A subsequent t-test revealed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness differed between experiments. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Table 16. Response rates (TC3)

Domain   Regular   Aggr.-based   N
Travel   99%       100%          103
Movie    96%       92%           50
Music    100%      100%          53
Book     96%       100%          51

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.

[Figure 9: "Perceived Usefulness (Regular vs. Roll-up Queries)"; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up)]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3).

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries provide added value, as they yield additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-empty result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas when a regular SPARQL query was executed, they often did not receive a single suggestion.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost the same quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results, which happened in more than half of


Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

                                          Regular          Aggr.-based
Dimension (Metric) [Max Score]   Domain   M       SD       M       SD       N
Accuracy (size) [10]             Travel   9.90    0.99     9.42    2.07     103
                                 Movie    2.88    0.59     2.76    0.82     50
                                 Music    10.00   0.00     10.00   0.00     53
                                 Book     2.88    0.59     3.00    0.00     51
Accuracy (mrs) [1]               Travel   48.21   24.98    44.93   25.59    102
                                 Movie    54.58   22.88    56.69   25.42    44
                                 Music    53.95   18.31    58.24   16.19    52
                                 Book     51.53   23.90    49.25   22.30    49
Novelty (nv) [1]                 Travel   0.43    0.39     0.38    0.40     98
                                 Movie    0.35    0.38     0.28    0.41     41
                                 Music    0.49    0.36     0.46    0.39     50
                                 Book     0.61    0.42     0.62    0.44     47
Diversity (divU) [5]             Travel   3.5     -        3       -        100
                                 Movie    4       -        3       -        43
                                 Music    4       -        3       -        50
                                 Book     3       -        3       -        48
Diversity (divC) [1]             Travel   0.79    0.17     0.89    0.13     99
                                 Movie    0.75    0.20     0.66    0.27     44
                                 Music    0.83    0.17     0.93    0.07     53
                                 Book     0.74    0.27     0.93    0.06     48
Usefulness [5]                   Travel   3.36    0.77     3.22    0.74     101
                                 Movie    3.20    0.92     3.26    0.94     44
                                 Music    3.45    0.84     3.44    0.85     51
                                 Book     2.95    0.87     3.03    0.96     48

Table 18. Jaccard indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32%                96%       50
Music    62%                98%       53
Book     53%                100%      51

the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories.

[Figure 10: "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend: Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4).

Table 21 also shows that the items in the two sets often did not match, as indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

                                          Regular (SPARQL)   Cross-Domain
Dimension (Metric) [Max Score]   Domain   M        SD        M        SD       N
Accuracy (size) [10]             Movie    0.82     2.09      8.66     2.73     50
                                 Music    0.89     1.59      9.30     2.03     53
                                 Book     2.90     3.98      10.00    0.00     51
Accuracy (mrs) [1]               Movie    67.41    27.62     53.65    14.71    16
                                 Music    62.36    31.76     48.32    22.49    20
                                 Book     66.42    21.28     55.59    16.84    27
Novelty (nv) [1]                 Movie    0.59     0.49      0.41     0.34     16
                                 Music    0.72     0.44      0.59     0.40     19
                                 Book     0.54     0.41      0.63     0.30     27
Diversity (divU) [5]             Movie    3        -         4        -        16
                                 Music    3        -         3        -        20
                                 Book     3        -         4        -        27
Diversity (divC) [1]             Movie    (0.64)   (0.35)    (0.99)   (0.02)   (6)
                                 Music    (0.90)   (0.13)    (0.98)   (0.03)   (10)
                                 Book     (0.84)   (0.12)    (0.87)   (0.1)    (8)
Usefulness [5]                   Movie    3.57     0.83      3.35     1.57     15
                                 Music    3.53     1.17      3.11     0.98     20
                                 Book     3.41     1.04      3.41     0.94     27

Table 21. Jaccard indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine, which facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research is required to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX, or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
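The aggregation step described for the Aggregation part can be pictured with a small sketch. It assumes that similarity scores are already available as a mapping from sublevel entities to numbers and that each sublevel entity has one parent resource; the function name `roll_up`, the dictionary layout, and the example identifiers are hypothetical and not part of the SKOSRec implementation.

```python
from collections import defaultdict
from statistics import mean

# The three aggregation modes named in the appendix.
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def roll_up(similarities, parent_of, how="SUM"):
    """Aggregate similarity scores of sublevel entities per parent resource.

    similarities: dict mapping a sublevel entity to its similarity score
    parent_of:    dict mapping a sublevel entity to its parent resource
    how:          one of "SUM", "MAX", "AVG"
    """
    grouped = defaultdict(list)
    for entity, score in similarities.items():
        parent = parent_of.get(entity)
        if parent is not None:  # entities without a parent are skipped
            grouped[parent].append(score)
    aggregate = AGGREGATORS[how]
    return {parent: aggregate(scores) for parent, scores in grouped.items()}


# Toy data (hypothetical identifiers): three POIs belonging to two cities.
scores = {"poi:a": 0.5, "poi:b": 0.25, "poi:c": 0.75}
parents = {"poi:a": "city:x", "poi:b": "city:x", "poi:c": "city:y"}
rolled_sum = roll_up(scores, parents, "SUM")
```

With SUM, a parent with many moderately similar sublevel entities can outrank one with a single highly similar entity, whereas MAX and AVG are insensitive to the number of sublevel entities.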

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than", or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart), or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference, or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
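The variable check described for the RecWhereClause can be illustrated with a minimal sketch. It assumes a parsed clause is available as a list of `(kind, triples)` pairs; this layout, the `kind` labels, and the function name `variable_is_bound` are hypothetical and only serve to show the rule that an occurrence inside a MINUS group alone is not sufficient.

```python
def variable_is_bound(var, patterns):
    """Return True if `var` occurs in at least one pattern that is not MINUS.

    patterns: list of (kind, triples) pairs, where kind is a label such as
              "BASIC" or "MINUS" and triples is a list of
              (subject, predicate, object) tuples. Hypothetical layout.
    """
    for kind, triples in patterns:
        if kind == "MINUS":
            # Occurrences in the excluded part do not bind the variable.
            continue
        if any(var in triple for triple in triples):
            return True
    return False


# Accepted: ?rec is matched in a basic pattern and additionally in MINUS.
accepted = [
    ("BASIC", [("?rec", "dbo:genre", "dbr:Jazz")]),
    ("MINUS", [("?rec", "dbo:country", "dbr:France")]),
]

# Rejected: ?rec only appears inside the MINUS group.
rejected = [
    ("BASIC", [("?x", "dbo:genre", "dbr:Jazz")]),
    ("MINUS", [("?rec", "dbo:country", "dbr:France")]),
]
```

A compiler following this rule would accept the first clause and report an error for the second, since no LOD resource could ever be bound to `?rec` there.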

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from anywhere other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user was able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRec query, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), 8(2), 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Discussion & Future Work
  • Appendix A. The SKOSRecommender Query Syntax
  • References


Fig. 2. User profile generation, TC1 (music domain)

was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels, or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance; outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale. The score, called mean relevance (mrs), was measured with Equation 21.

$$\mathit{mrs} = \frac{\sum_{i=1}^{k} \mathit{score}_i}{k} \tag{21}$$
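As a worked illustration of Equation 21, the following minimal sketch averages the slider scores of one recommendation list. The function name `mean_relevance` and the example scores are illustrative assumptions, not values from the study.

```python
def mean_relevance(scores):
    """Mean relevance score (mrs): the average of the k slider scores (0 to 100)."""
    if not scores:
        raise ValueError("mrs is undefined for an empty recommendation list")
    return sum(scores) / len(scores)


# Hypothetical slider ratings for a list of k = 4 recommendations.
example_scores = [80, 60, 40, 20]
mrs = mean_relevance(example_scores)  # (80 + 60 + 40 + 20) / 4 = 50.0
```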

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, if an RS only suggests popular products, it is likely that the user already knows them [3, 23]. A purely accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 22).

$$\mathit{nv} = \frac{|\mathit{nov}|}{|\mathit{tp}|} \tag{22}$$
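Equation 22 can be sketched as follows, assuming each rated item is available as a pair of slider score and a flag from the "new item" radio button, and that an item counts as relevant from a score of 50 upwards. The function name `novelty`, the pair layout, and the example ratings are hypothetical.

```python
def novelty(ratings, threshold=50):
    """Share of relevant items that were new to the user (nv = |nov| / |tp|).

    ratings:   list of (score, is_new) pairs, score on the 0 to 100 slider
    threshold: minimum score for an item to count as relevant
    Returns None when no item is relevant, since the ratio is then undefined.
    """
    relevant_flags = [is_new for score, is_new in ratings if score >= threshold]
    if not relevant_flags:
        return None
    return sum(relevant_flags) / len(relevant_flags)


# Hypothetical ratings: three relevant items, two of which were unknown.
example_ratings = [(80, True), (60, False), (40, True), (90, True)]
nv = novelty(example_ratings)  # 2 of 3 relevant items are new
```

Returning `None` for lists without relevant items is a design choice of this sketch; the source does not state how such lists were handled.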

An item was considered relevant when it achieved an mrs score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 65]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse the result set is [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 23) and averaged for the recommendation list (Eq. 24).

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{23}$$

$$\mathit{div}_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{24}$$
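Equations 23 and 24 can be sketched as below. The sketch assumes that every item is represented as a numeric feature vector of equal length; how the engine actually builds item vectors is not specified here, so the toy vectors and the function names are assumptions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if one is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def diversity(vectors):
    """Content-based diversity divC: mean pairwise cosine distance of a list."""
    k = len(vectors)
    if k < 2:
        raise ValueError("divC needs at least two items")
    # Sum the distance d = 1 - cos over all unordered pairs with l < m (Eq. 23).
    total = sum(1.0 - cosine(u, v) for u, v in combinations(vectors, 2))
    # Normalize by the number of pairs k(k-1)/2 (Eq. 24).
    return 2.0 * total / (k * (k - 1))


# Toy item vectors: items 0 and 2 are identical, item 1 is orthogonal to both.
items = [[1, 0], [0, 1], [1, 0]]
div_c = diversity(items)  # distances 1, 0, 1 over three pairs
```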

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each generated recommendation list of the SKOSRec engine during runtime execution. In addition to the measurement of divC scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al., two Likert items were posed for this purpose [44]:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al., which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions [44]. The statements were adapted to the specificities of each usage scenario. Figure 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filter and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Figure 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the author sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music, and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources which represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information, or content information), the author decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.11

11 http://dbpedia.org

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city, or dbo:state) while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement "The filter has improved the recommendation results of list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar rollup query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination (i.e., a city, region, or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)
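The difference between the simple and the expressive subject filter used in TC2 can be pictured with a small sketch. It assumes the category graph is given as a dictionary of broader links and reads the property path as "follow skos:broader for at most two hops"; this reading, the category names, and the function names are assumptions made for illustration only.

```python
def broader_within(concept, broader, max_hops=2):
    """Concepts reachable from `concept` over broader links in at most max_hops steps.

    broader: dict mapping a concept to a list of its directly broader concepts.
    The starting concept itself is always part of the result.
    """
    seen = {concept}
    frontier = {concept}
    for _ in range(max_hops):
        frontier = {b for c in frontier for b in broader.get(c, ())} - seen
        seen |= frontier
    return seen


def matches_filter(item_subjects, wanted, broader, max_hops=2):
    """True if any subject of the item equals `wanted` or has it as a broader concept."""
    return any(wanted in broader_within(s, broader, max_hops) for s in item_subjects)


# Hypothetical category hierarchy (not taken from DBpedia).
toy_broader = {
    "cat:Bebop": ["cat:Jazz_genres"],
    "cat:Jazz_genres": ["cat:Jazz"],
    "cat:Jazz": ["cat:Music"],
}

simple_hit = matches_filter(["cat:Bebop"], "cat:Jazz", toy_broader, max_hops=0)
expressive_hit = matches_filter(["cat:Bebop"], "cat:Jazz", toy_broader, max_hops=2)
```

In this toy graph the simple filter (zero hops) misses an item annotated only with the narrower category, while the expressive variant includes it, which mirrors the intended growth of the result set.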

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants specified their favorite director/actor (movie domain), music act (music domain), or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). The SKOSRec engine generated recommendations based on the creative works of the favored artist. Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act, or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.

Fig. 5. Web form for a constraint-based recommendation request (music domain)

Fig. 6. Travel - User profile generation, TC3


Fig. 7. Music - User profile generation TC4 (page 9)

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matchings can occur when a user states his favorite movie that is written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared in his/her profile a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The gathered samples from the experiments contained answers from both genders, with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceive the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
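To make the reliability check concrete, the following is a minimal sketch of how Cronbach's α can be computed from Likert responses. It is not the code used in the study; the function name `cronbach_alpha` and the input layout (one list of scores per Likert item, aligned by respondent) are assumptions made for illustration. The sketch uses the standard formula α = k/(k-1) · (1 - Σ item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a multi-item scale.

    item_scores: list of k lists; item_scores[i][r] is the answer of
    respondent r to Likert item i (hypothetical input layout).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(item_scores[0])
    if n < 2 or any(len(col) != n for col in item_scores):
        raise ValueError("every item needs the same number (>= 2) of answers")

    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(col) for col in item_scores)

    # Variance of each respondent's total score over all items.
    totals = [sum(col[r] for col in item_scores) for r in range(n)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)
```

Under the threshold named in the text, a construct would be treated as reliable when the returned value exceeds 0.7.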

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. In order to avoid leaving valuable user assessments unused, agreements to the statement “The recommendations of list [...] are diverse” were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M),¹² standard deviation (SD), and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be above mediocrity. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty, and Diversity received good assessments as well. For instance, each domain's mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS, only one third of the recommendation list was unfamiliar to users. In contrast to that, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement to the Diversity-related Likert item stating that recommendations are diverse lay at 3 (“neutral”) in the multimedia domains and at 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.

Table 10. Test Cases in the web experiments

Test Case   Evaluation Purpose                        Query Type   Travel   Movie   Music   Book
TC1         On-the-fly Recommendations (Baseline)     Q1           X        X       X       X
TC2         Constraint-based Recommendations          Q3           X        X       X       X
TC3         Advanced Queries (Rollup)                 Q5, Q6       X        X       X       X
TC4         Advanced Queries (Cross-Domain)           Q7           -        X       X       X

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain   Cronbach's α
Travel   0.7002
Movie    0.8579
Music    0.8515
Book     0.8949

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreements with the statement “The filter has improved the recommendation results of the list”. Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 (“strongly agree”), 4 (“agree”), or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   M       SD      N
Accuracy (size) [10]             Travel   10.00   0.00    103
                                 Movie    10.00   0.00    50
                                 Music    10.00   0.00    53
                                 Book     9.80    1.40    51
Accuracy (mrs) [1]               Travel   56.49   0.00    103
                                 Movie    62.25   19.15   50
                                 Music    55.70   0.10    53
                                 Book     57.47   15.55   50
Novelty (nv) [1]                 Travel   0.60    0.34    101
                                 Movie    0.36    0.31    50
                                 Music    0.36    0.39    52
                                 Book     0.61    0.29    47
Diversity (divU) [5]             Travel   4       -       102
                                 Movie    3       -       50
                                 Music    3       -       53
                                 Book     3       -       50
Diversity (divC) [1]             Travel   0.80    0.16    103
                                 Movie    0.87    0.12    50
                                 Music    0.86    0.10    50
                                 Book     0.92    0.08    50
Usefulness [5]                   Travel   3.34    0.85    103
                                 Movie    3.55    0.73    50
                                 Music    3.18    0.85    53
                                 Book     3.43    0.81    50

[Figure 8: “The filter has improved the recommendations of the list”. x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

Table 13. Response Rates (TC2)

Domain   Constr.-based (Regular)   Constr.-based (Expanded)   N
Travel   23%                       57%                        103
Movie    86%                       88%                        50
Music    72%                       75%                        53
Book     73%                       76%                        51
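The response rate used in Table 13 is defined in the text as the ratio of non-zero result sets among all result sets. A minimal sketch of that definition follows; the function name `response_rate` and the representation of a result set as a Python list are assumptions made for illustration only.

```python
def response_rate(result_sets):
    """Share of result sets that contain at least one recommendation.

    result_sets: one list of recommended items per user session
    (hypothetical input layout).
    """
    if not result_sets:
        raise ValueError("need at least one result set")
    non_empty = sum(1 for items in result_sets if len(items) > 0)
    return non_empty / len(result_sets)
```

Multiplying the returned ratio by 100 gives a percentage as reported per domain and method.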

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
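The text does not state which FDR procedure was applied. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure and returns adjusted p-values that can be compared against the nominal α-level; the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the FDR adjustment mentioned in the text corresponds to
    this procedure; the paper itself does not name the method.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down and enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test would then count as significant when its adjusted p-value stays below the chosen level (e.g., 0.05).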

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = -7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Simple           Expressive       N
                                          M       SD       M       SD
Accuracy (size) [10]             Travel   1.17    2.70     4.52    4.62     103
                                 Movie    6.72    4.08     7.20    3.92     50
                                 Music    5.96    4.64     6.36    4.51     53
                                 Book     4.59    4.50     5.00    4.51     51
Accuracy (mrs) [1]               Travel   66.73   20.51    65.34   22.36    23
                                 Movie    60.95   22.98    60.29   22.79    43
                                 Music    63.77   16.05    62.96   16.44    34
                                 Book     56.17   18.18    56.81   16.81    37
Novelty (nv) [1]                 Travel   0.48    0.41     0.61    0.49     21
                                 Movie    0.44    0.36     0.55    0.80     42
                                 Music    0.55    0.39     0.52    0.37     38
                                 Book     0.77    0.34     0.76    0.33     33
Diversity (divU) [5]             Travel   3       -        3.5     -        24
                                 Movie    4       -        4       -        43
                                 Music    4       -        4       -        38
                                 Book     3       -        3       -        37
Diversity (divC) [1]             Travel   0.68    0.18     0.72    0.21     19
                                 Movie    0.70    0.24     0.73    0.23     41
                                 Music    0.86    0.13     0.87    0.12     34
                                 Book     0.84    0.24     0.95    0.63     29
Usefulness [5]                   Travel   3.33    0.93     3.50    0.77     24
                                 Movie    3.32    0.85     3.31    0.86     43
                                 Music    3.55    0.81     3.64    0.78     38
                                 Book     3.39    0.83     3.62    1.21     37

Table 15. Jaccard Indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35
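For readers unfamiliar with the measure, the Jaccard index of two recommendation lists is the size of their intersection divided by the size of their union. The sketch below illustrates this; the function name `jaccard_index` is hypothetical, and returning `None` for two empty lists is an assumption, since the text does not say how such pairs were treated.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index of two recommendation lists, treated as sets.

    Returns None when both lists are empty (assumed convention; the
    index is undefined in that case).
    """
    a, b = set(list_a), set(list_b)
    union = a | b
    if not union:
        return None
    return len(a & b) / len(union)
```

A value close to 1 means that the two methods returned nearly the same items, while a value close to 0 means that the lists barely overlap.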

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain, the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Figure 9 depicts participants' general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Table 16. Response Rates (TC3)

Domain   Regular   Aggr.-based   N
Travel   99%       100%          103
Movie    96%       92%           50
Music    100%      100%          53
Book     96%       100%          51

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = -5.50, p < 0.001), the music (t(52) = -4.51, p < 0.001), and the book domain (t(47) = -4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.

[Figure 9: “Perceived Usefulness (Regular vs. Roll-up Queries)”. x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up)]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3)

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain, and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially noteworthy given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just enough to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC, and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = -2.69, p < 0.05) and the book domain (Z = -2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = -17.92, p < 0.001; music: t(52) = -25.73, p < 0.001; book: t(50) = -12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.
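The degrees of freedom reported above (N - 1 per domain) are consistent with a within-subjects comparison. As a hedged illustration, and assuming that a paired t-test was used, the statistic can be sketched as follows; the function name `paired_t` and the input layout (one list size per participant and method) are hypothetical, and the computation of the p-value is omitted.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """t statistic and degrees of freedom of a paired t-test.

    scores_a[i] and scores_b[i] belong to the same participant
    (assumed design; e.g., list sizes for the two query types).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long samples with >= 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences do not vary; t is undefined")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1
```

A negative t value, as in the reported results, means that the first method produced smaller values than the second one on average.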

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results in more than the other half of the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular          Aggr.-based      N
                                          M       SD       M       SD
Accuracy (size) [10]             Travel   9.90    0.99     9.42    2.07     103
                                 Movie    2.88    0.59     2.76    0.82     50
                                 Music    10.00   0.00     10.00   0.00     53
                                 Book     2.88    0.59     3.00    0.00     51
Accuracy (mrs) [1]               Travel   48.21   24.98    44.93   25.59    102
                                 Movie    54.58   22.88    56.69   25.42    44
                                 Music    53.95   18.31    58.24   16.19    52
                                 Book     51.53   23.90    49.25   22.30    49
Novelty (nv) [1]                 Travel   0.43    0.39     0.38    0.40     98
                                 Movie    0.35    0.38     0.28    0.41     41
                                 Music    0.49    0.36     0.46    0.39     50
                                 Book     0.61    0.42     0.62    0.44     47
Diversity (divU) [5]             Travel   3.5     -        3       -        100
                                 Movie    4       -        3       -        43
                                 Music    4       -        3       -        50
                                 Book     3       -        3       -        48
Diversity (divC) [1]             Travel   0.79    0.17     0.89    0.13     99
                                 Movie    0.75    0.20     0.66    0.27     44
                                 Music    0.83    0.17     0.93    0.07     53
                                 Book     0.74    0.27     0.93    0.06     48
Usefulness [5]                   Travel   3.36    0.77     3.22    0.74     101
                                 Movie    3.20    0.92     3.26    0.94     44
                                 Music    3.45    0.84     3.44    0.85     51
                                 Book     2.95    0.87     3.03    0.96     48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32%                96%       50
Music    62%                98%       53
Book     53%                100%      51

[Figure 10: “Recommendation List Size”. x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0 to 10); legend: Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular (SPARQL)   Cross-Domain       N
                                          M        SD        M        SD
Accuracy (size) [10]             Movie    0.82     2.09      8.66     2.73      50
                                 Music    0.89     1.59      9.30     2.03      53
                                 Book     2.90     3.98      10.00    0.00      51
Accuracy (mrs) [1]               Movie    67.41    27.62     53.65    14.71     16
                                 Music    62.36    31.76     48.32    22.49     20
                                 Book     66.42    21.28     55.59    16.84     27
Novelty (nv) [1]                 Movie    0.59     0.49      0.41     0.34      16
                                 Music    0.72     0.44      0.59     0.40      19
                                 Book     0.54     0.41      0.63     0.30      27
Diversity (divU) [5]             Movie    3        -         4        -         16
                                 Music    3        -         3        -         20
                                 Book     3        -         4        -         27
Diversity (divC) [1]             Movie    (0.64)   (0.35)    (0.99)   (0.02)    (6)
                                 Music    (0.90)   (0.13)    (0.98)   (0.03)    (10)
                                 Book     (0.84)   (0.12)    (0.87)   (0.1)     (8)
Usefulness [5]                   Movie    3.57     0.83      3.35     1.57      15
                                 Music    3.53     1.17      3.11     0.98      20
                                 Book     3.41     1.04      3.41     0.94      27

Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause), with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource IRI (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX, or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
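The roll-up step described for the Aggregation part can be pictured with a small sketch. It is not taken from the SKOSRec implementation: the function name `roll_up`, the dictionary-based inputs, and the explicit `parent_of` mapping (which stands in for the graph pattern that links sublevel entities to the resources to be recommended) are assumptions made for illustration.

```python
from collections import defaultdict

# The three aggregation modes named in the query syntax.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda scores: sum(scores) / len(scores),
}


def roll_up(sublevel_scores, parent_of, mode="SUM"):
    """Aggregate similarity scores of sublevel entities per parent.

    sublevel_scores: dict mapping a sublevel entity (e.g., a POI) to its
        similarity score (hypothetical input).
    parent_of: dict mapping a sublevel entity to the resource it rolls
        up to (e.g., a city); stands in for the postfilter graph pattern.
    Returns (parent, aggregated score) pairs, best score first.
    """
    grouped = defaultdict(list)
    for entity, score in sublevel_scores.items():
        parent = parent_of.get(entity)
        if parent is not None:  # entities without a parent are skipped
            grouped[parent].append(score)
    aggregate = AGGREGATORS[mode]
    ranked = [(parent, aggregate(scores)) for parent, scores in grouped.items()]
    # Sort by descending score; ties are broken by name for determinism.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))
```

With summation, a parent resource that has many moderately similar sublevel entities can outrank one with a single highly similar entity, whereas MAX and AVG reward individual or typical similarity instead.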

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is also specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.
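The variable check mentioned for the VarPart can be illustrated with a deliberately simplified sketch. The actual compiler presumably inspects the parsed query; matching on the raw clause text, as done here, is an assumption made for brevity, and the function name `variable_occurs` is hypothetical.

```python
import re


def variable_occurs(variable, where_clause):
    """Check whether a SPARQL-style variable appears in a clause text.

    Simplification: works on the raw text instead of a parse tree, so
    occurrences inside string literals or comments would also count.
    """
    name = variable.lstrip("?$")
    # A variable starts with '?' or '$'; \b prevents prefix matches
    # such as '?dir' matching '?director'.
    pattern = r"[?$]" + re.escape(name) + r"\b"
    return re.search(pattern, where_clause) is not None
```

For example, with the clause `{ ?film dbo:director ?director . }` the check succeeds for `?director` and fails for `?dir`, in which case a compiler following the description above would reject the query.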

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than”, or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart), or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference, or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRec query, users can specify additional graph patterns, which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases in which users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. ACM Transactions on Intelligent Systems and Technology (TIST) 8(2), 21 (2017).

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Discussion & Future Work
  • Appendix A: The SKOSRecommender Query Syntax
  • References

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain).

ilarity measure. Scores are then converted to distance values (Eq. 23) and averaged for the recommendation list (Eq. 24).

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{23}$$

$$\mathit{div}_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{24}$$
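Eq. 23 and Eq. 24 can be sketched in a few lines of Python. The representation of items as numeric annotation vectors and the function names are assumptions made for illustration; the SKOSRec backend may compute the scores differently.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equally long numeric vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def div_c(items):
    """Average pairwise distance 1 - cos over all item pairs l < m (Eq. 23 and 24)."""
    k = len(items)
    if k < 2:
        return 0.0
    total = sum(1.0 - cosine(a, b) for a, b in combinations(items, 2))
    return 2.0 * total / (k * (k - 1))


# Toy list of k = 3 items, each described by two hypothetical annotation features.
toy_list = [[1, 0], [0, 1], [1, 1]]
score = div_c(toy_list)
```

Because `combinations` enumerates each unordered pair exactly once, the factor 2 / (k(k - 1)) turns the sum into the mean over all k(k - 1) / 2 pairs.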

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each generated recommendation list of the SKOSRec engine during runtime execution. In addition to the measurement of divC scores, we asked participants about their opinions on recommendation list diversity directly. Following the RS evaluation framework by Pu et al., two Likert items were posed for this purpose [44]:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al., which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions [44]. The statements were adapted to the specificities of each usage scenario. Figure 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to prove whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filter and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Figure 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.
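The statistical argument for the within-subjects design can be made concrete with a small sketch. The ratings below are invented toy values, and the paired and Welch t statistics are standard textbook formulas that serve only as an illustration; they are not claimed to be the tests applied in the experiments.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """t statistic on per-participant differences (within-subjects view)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def welch_t(a, b):
    """t statistic that treats both samples as independent groups (between-subjects view)."""
    return (mean(a) - mean(b)) / sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))


# Hypothetical usefulness ratings of five participants for two result lists.
filtered = [4, 3, 5, 2, 4]
regular = [3, 2, 4, 1, 4]

t_within = paired_t(filtered, regular)   # differences cancel out participant-level offsets
t_between = welch_t(filtered, regular)   # the same numbers, ignoring the pairing
```

In this toy data the participants differ strongly in their general rating level, but almost everyone rates the filtered list one point higher. The paired statistic only sees these consistent differences and is therefore much larger than the statistic that ignores the pairing.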

Afterwards, the engine applied this filter condition on the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources which represent books were not assigned a genre. In the movie domain, a genre property is missing entirely and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

The second fundamental research issue of TC2 concerned the question whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement "The filter has improved the recommendation results of list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

¹¹ http://dbpedia.org

Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain).

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar rollup query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during the stay in another destination (i.e., a city, region or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.
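The difference between the simple and the expressive subject filter of TC2 can be sketched on a toy category graph. Everything in this snippet is hypothetical: the category names, the dictionaries and the reading of the property path as "at most two skos:broader steps from the item's category up to the requested category" are assumptions for illustration and do not reproduce the SKOSRec query.

```python
def broader_closure(category, broader, max_hops=2):
    """Categories reachable from `category` via at most `max_hops` broader links."""
    seen = {category}
    frontier = {category}
    for _ in range(max_hops):
        frontier = {b for c in frontier for b in broader.get(c, ())} - seen
        seen |= frontier
    return seen


def filter_items(items, wanted, broader, max_hops=0):
    """Keep items whose subject categories, expanded upwards, contain `wanted`."""
    return sorted(
        item
        for item, subjects in items.items()
        if any(wanted in broader_closure(s, broader, max_hops) for s in subjects)
    )


# Toy category hierarchy (narrower category -> broader categories).
broader = {"Jazz_pianists": ["Jazz_musicians"], "Jazz_musicians": ["Jazz"]}
# Toy items with their subject categories.
items = {
    "A": ["Jazz"],
    "B": ["Jazz_musicians"],
    "C": ["Jazz_pianists"],
    "D": ["Rock"],
}

simple = filter_items(items, "Jazz", broader, max_hops=0)      # direct subject only
expressive = filter_items(items, "Jazz", broader, max_hops=2)  # expanded via the hierarchy
```

With the simple filter only item A matches, whereas the expanded filter also returns B and C, which illustrates why an expressive filter can yield non-empty result sets where a direct attribute match finds little or nothing.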

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act (music domain) or author (book domain) and received recommendations that the SKOSRec engine generated from the features of the creative works of the favored artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

Fig. 5. Web form for a constraint-based recommendation request (music domain).

Fig. 6. Travel - User profile generation, TC3.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.
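The aggregation step of the rollup requests in TC3 (summation of similarity scores of sublevel entities) can be sketched as follows. The data structures, names and scores are hypothetical, and the mapping from similar sublevel entities to their parent entity stands in for the postfilter join; the actual SKOSRec query semantics may differ.

```python
from collections import defaultdict


def rollup(similar_by_sublevel, parent_of, top_n=3):
    """Sum similarity scores of sublevel entities per parent entity and rank parents.

    similar_by_sublevel: profile entity -> {similar entity: similarity score}
    parent_of:           similar entity -> parent entity (e.g. POI -> city)
    """
    scores = defaultdict(float)
    for similar in similar_by_sublevel.values():
        for candidate, score in similar.items():
            parent = parent_of.get(candidate)
            if parent is not None:  # candidates without a known parent are skipped
                scores[parent] += score
    # Highest aggregated score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy profile: two POIs the user liked, with invented similarity scores.
similar_by_sublevel = {
    "Louvre": {"Prado": 0.6, "Uffizi": 0.5},
    "Eiffel_Tower": {"Tokyo_Tower": 0.4, "Prado": 0.1},
}
parent_of = {"Prado": "Madrid", "Uffizi": "Florence", "Tokyo_Tower": "Tokyo"}

ranking = rollup(similar_by_sublevel, parent_of)
```

In the toy data, Madrid is ranked first because two profile POIs contribute to its aggregated score, which is the effect the rollup pattern aims for.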


Fig. 7. Music - User profile generation, TC4 (page 9).

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matchings can occur when a user states a favorite movie that is written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared a preference in his/her profile for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The gathered samples from the experiments contained answers from both genders with a slight overrepresentation of male participants (53.7 %). The majority of study subjects was between 20 and 40 years of age (65.8 %). The evaluation of the demographics section also revealed that the majority of participants already perceives the search in their domain of interest as either "manageable" (38.5 %) or even as "easy" or "very easy" (44.0 %).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
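For readers unfamiliar with the reliability check, the following sketch shows how Cronbach's α is commonly computed from Likert responses. It uses the standard textbook formula with sample variances; the response matrix is invented, and it is an assumption that the analysis in this paper used exactly this variant.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per participant, k item scores each)."""
    k = len(responses[0])
    # Variance of each individual item across participants.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Variance of the summed scale score per participant.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of four participants to three usefulness items (scale 1 to 5).
toy_responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
]
alpha = cronbach_alpha(toy_responses)
reliable = alpha > 0.7  # threshold used in the text above
```

Items that rise and fall together across participants make the variance of the summed score large relative to the item variances, which pushes α towards 1.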

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. In order to avoid that valuable user assessments remained unused, agreements with the statement "The recommendations of list [] are diverse" were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M)¹², standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be above mediocrity. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects, such as Accuracy, Novelty and Diversity, received good assessments as well. For instance, each domain's mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only one third of the recommendation list was unfamiliar to users. In contrast to that, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 ("neutral") in the multimedia domains and at 4 ("agree") in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.

Table 10. Test cases in the web experiments

Test Case  Evaluation Purpose                      Query Type  Travel  Movie  Music  Book
TC1        On-the-fly Recommendations (Baseline)   Q1          X       X      X      X
TC2        Constraint-based Recommendations        Q3          X       X      X      X
TC3        Advanced Queries (Rollup)               Q5, Q6      X       X      X      X
TC4        Advanced Queries (Cross-Domain)         Q7          -       X      X      X

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain  Cronbach's α
Travel  0.7002
Movie   0.8579
Music   0.8515
Book    0.8949

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement "The filter has improved the recommendation results of the list". Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 ("strongly agree"), 4 ("agree") or 3 ("neutral"). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  M      SD     N
Accuracy (size) [10]            Travel  10.00  0.00   103
                                Movie   10.00  0.00   50
                                Music   10.00  0.00   53
                                Book    9.80   1.40   51
Accuracy (mrs) [1]              Travel  56.49  0.00   103
                                Movie   62.25  19.15  50
                                Music   55.70  0.10   53
                                Book    57.47  15.55  50
Novelty (nv) [1]                Travel  0.60   0.34   101
                                Movie   0.36   0.31   50
                                Music   0.36   0.39   52
                                Book    0.61   0.29   47
Diversity (divU) [5]            Travel  4      -      102
                                Movie   3      -      50
                                Music   3      -      53
                                Book    3      -      50
Diversity (divC) [1]            Travel  0.80   0.16   103
                                Movie   0.87   0.12   50
                                Music   0.86   0.10   50
                                Book    0.92   0.08   50
Usefulness [5]                  Travel  3.34   0.85   103
                                Movie   3.55   0.73   50
                                Music   3.18   0.85   53
                                Book    3.43   0.81   50

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2). [Chart titled "The filter has improved the recommendations of the list"; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5).]

Table 13. Response rates (TC2)

Domain   Constr.-based (Regular)   Constr.-based (Expanded)   N
Travel   23                        57                         103
Movie    86                        88                         50
Music    72                        75                         53
Book     73                        76                         51
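The response rate defined above (the ratio of non-zero result sets among all result sets) can be illustrated with a minimal sketch. The function name `response_rate` and the example sessions are hypothetical and not taken from the study data.

```python
from typing import Sequence


def response_rate(result_sets: Sequence[Sequence[str]]) -> float:
    """Share of result sets that contain at least one recommendation."""
    if not result_sets:
        raise ValueError("need at least one result set")
    non_empty = sum(1 for items in result_sets if len(items) > 0)
    return non_empty / len(result_sets)


# Hypothetical user sessions: two queries returned items, two came back empty.
sessions = [["dbr:Rome", "dbr:Florence"], [], ["dbr:Lisbon"], []]
print(f"response rate: {response_rate(sessions):.0%}")
```

Under this reading, a method that returns long lists for a few users and nothing for the rest still has a low response rate, which is why the measure is reported separately from the mean list size.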

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. Alpha levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
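The text does not name the concrete FDR procedure. As an illustration only, the following sketch assumes the widely used Benjamini-Hochberg step-up procedure; the function name and the p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, per p-value, whether it is rejected at FDR level alpha.

    Assumption: Benjamini-Hochberg step-up; the paper only says "FDR".
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= k/m * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from four metric comparisons in one domain.
p = [0.001, 0.04, 0.03, 0.20]
print(benjamini_hochberg(p))  # [True, False, False, False]
```

In this made-up example, 0.03 and 0.04 would pass an unadjusted 0.05 threshold but are no longer rejected after the adjustment, which is the effect the correction is meant to have.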

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, higher scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
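The Jaccard index used to compare the two recommendation lists is the size of their intersection divided by the size of their union. A minimal sketch follows; the item identifiers are hypothetical, and raising an error for two empty lists is an assumption, since the text does not say how such cases were handled.

```python
from typing import Iterable


def jaccard_index(list_a: Iterable[str], list_b: Iterable[str]) -> float:
    """|A ∩ B| / |A ∪ B| for two recommendation lists, ignoring rank order."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumption: the index is treated as undefined for two empty lists.
        raise ValueError("Jaccard index is undefined for two empty lists")
    return len(set_a & set_b) / len(union)


# Hypothetical lists: the expanded filter returns the simple list plus one item.
simple_list = ["dbr:Alien", "dbr:Blade_Runner", "dbr:Gladiator"]
expanded_list = simple_list + ["dbr:The_Martian"]
print(jaccard_index(simple_list, expanded_list))  # 0.75
```

A high index combined with a longer expanded list corresponds to the pattern described above: the expanded list largely contains the simple list and adds further items.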


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Simple           Expressive       N
                                          M       SD       M       SD
Accuracy (size) [10]             Travel   1.17    2.70     4.52    4.62     103
                                 Movie    6.72    4.08     7.20    3.92     50
                                 Music    5.96    4.64     6.36    4.51     53
                                 Book     4.59    4.50     5.00    4.51     51
Accuracy (mrs) [1]               Travel   66.73   20.51    65.34   22.36    23
                                 Movie    60.95   22.98    60.29   22.79    43
                                 Music    63.77   16.05    62.96   16.44    34
                                 Book     56.17   18.18    56.81   16.81    37
Novelty (nv) [1]                 Travel   0.48    0.41     0.61    0.49     21
                                 Movie    0.44    0.36     0.55    0.80     42
                                 Music    0.55    0.39     0.52    0.37     38
                                 Book     0.77    0.34     0.76    0.33     33
Diversity (divU) [5]             Travel   3       -        3.5     -        24
                                 Movie    4       -        4       -        43
                                 Music    4       -        4       -        38
                                 Book     3       -        3       -        37
Diversity (divC) [1]             Travel   0.68    0.18     0.72    0.21     19
                                 Movie    0.70    0.24     0.73    0.23     41
                                 Music    0.86    0.13     0.87    0.12     34
                                 Book     0.84    0.24     0.95    0.63     29
Usefulness [5]                   Travel   3.33    0.93     3.50    0.77     24
                                 Movie    3.32    0.85     3.31    0.86     43
                                 Music    3.55    0.81     3.64    0.78     38
                                 Book     3.39    0.83     3.62    1.21     37

Table 15. Jaccard indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Figure 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels that lay slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Table 16. Response rates (TC3)

Domain   Regular   Aggr.-based   N
Travel   99        100           103
Movie    96        92            50
Music    100       100           53
Book     96        100           51

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3). [Figure: Perceived Usefulness (agreement) per domain (Travel, Movie, Music, Book) for regular vs. roll-up queries.]

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.
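Appendix A describes the aggregation step of roll-up queries as summarizing the similarity scores of sublevel entities by SUM, MAX or AVG. The following is a minimal sketch of such a roll-up and not the SKOSRec implementation: the function `roll_up`, the mapping from sublevel entities to parent resources and all scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# The three aggregation modes named in Appendix A.
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def roll_up(scores: Dict[str, float],
            parent_of: Dict[str, str],
            how: str = "SUM") -> List[Tuple[str, float]]:
    """Aggregate similarity scores of sublevel entities per parent resource."""
    grouped = defaultdict(list)
    for entity, score in scores.items():
        parent = parent_of.get(entity)
        if parent is not None:  # entities without a parent are skipped
            grouped[parent].append(score)
    aggregate = AGGREGATORS[how]
    ranked = {parent: aggregate(vals) for parent, vals in grouped.items()}
    return sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical example: movies are scored, directors are recommended.
scores = {"movie:A": 0.5, "movie:B": 0.25, "movie:C": 0.5}
parent_of = {"movie:A": "dir:X", "movie:B": "dir:X", "movie:C": "dir:Y"}
print(roll_up(scores, parent_of, "SUM"))  # [('dir:X', 0.75), ('dir:Y', 0.5)]
print(roll_up(scores, parent_of, "AVG"))  # [('dir:Y', 0.5), ('dir:X', 0.375)]
```

The example also shows that the choice of aggregation function can change the ranking: SUM favours parents with many moderately similar sublevel entities, whereas AVG favours parents whose sublevel entities are consistently similar.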

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.
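The degrees of freedom reported above (N − 1 per domain) are consistent with a paired, within-subjects t-test on list sizes. The following sketch computes such a statistic; the list sizes are invented for illustration, and the sign convention (regular minus cross-domain) is an assumption inferred from the negative t-values.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(first: Sequence[float], second: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for first minus second."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical list sizes per participant (not taken from Table 20).
regular_sizes = [0, 1, 0, 3, 2]
cross_domain_sizes = [9, 10, 8, 10, 9]
t, df = paired_t(regular_sizes, cross_domain_sizes)
print(f"t({df}) = {t:.2f}")  # t(4) = -17.89
```

A strongly negative value under this convention means that the second method (here, the cross-domain request) returned longer lists for nearly every participant.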

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, in the remaining test cases, in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular          Aggr.-based      N
                                          M       SD       M       SD
Accuracy (size) [10]             Travel   9.90    0.99     9.42    2.07     103
                                 Movie    2.88    0.59     2.76    0.82     50
                                 Music    10.00   0.00     10.00   0.00     53
                                 Book     2.88    0.59     3.00    0.00     51
Accuracy (mrs) [1]               Travel   48.21   24.98    44.93   25.59    102
                                 Movie    54.58   22.88    56.69   25.42    44
                                 Music    53.95   18.31    58.24   16.19    52
                                 Book     51.53   23.90    49.25   22.30    49
Novelty (nv) [1]                 Travel   0.43    0.39     0.38    0.40     98
                                 Movie    0.35    0.38     0.28    0.41     41
                                 Music    0.49    0.36     0.46    0.39     50
                                 Book     0.61    0.42     0.62    0.44     47
Diversity (divU) [5]             Travel   3.5     -        3       -        100
                                 Movie    4       -        3       -        43
                                 Music    4       -        3       -        50
                                 Book     3       -        3       -        48
Diversity (divC) [1]             Travel   0.79    0.17     0.89    0.13     99
                                 Movie    0.75    0.20     0.66    0.27     44
                                 Music    0.83    0.17     0.93    0.07     53
                                 Book     0.74    0.27     0.93    0.06     48
Usefulness [5]                   Travel   3.36    0.77     3.22    0.74     101
                                 Movie    3.20    0.92     3.26    0.94     44
                                 Music    3.45    0.84     3.44    0.85     51
                                 Book     2.95    0.87     3.03    0.96     48

Table 18. Jaccard indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). [Figure: recommendation list size (no. of items) per domain (Movie, Music, Book) for regular vs. cross-domain queries.]

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular (SPARQL)   Cross-Domain       N
                                          M        SD        M        SD
Accuracy (size) [10]             Movie    0.82     2.09      8.66     2.73      50
                                 Music    0.89     1.59      9.30     2.03      53
                                 Book     2.90     3.98      10.00    0.00      51
Accuracy (mrs) [1]               Movie    67.41    27.62     53.65    14.71     16
                                 Music    62.36    31.76     48.32    22.49     20
                                 Book     66.42    21.28     55.59    16.84     27
Novelty (nv) [1]                 Movie    0.59     0.49      0.41     0.34      16
                                 Music    0.72     0.44      0.59     0.40      19
                                 Book     0.54     0.41      0.63     0.30      27
Diversity (divU) [5]             Movie    3        -         4        -         16
                                 Music    3        -         3        -         20
                                 Book     3        -         4        -         27
Diversity (divC) [1]             Movie    (0.64)   (0.35)    (0.99)   (0.02)    (6)
                                 Music    (0.90)   (0.13)    (0.98)   (0.03)    (10)
                                 Book     (0.84)   (0.12)    (0.87)   (0.1)     (8)
Usefulness [5]                   Movie    3.57     0.83      3.35     1.57      15
                                 Music    3.53     1.17      3.11     0.98      20
                                 Book     3.41     1.04      3.41     0.94      27

Table 21. Jaccard indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research is required to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a back-end application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is also specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than" or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieval of LOD resources from any graph other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases in which users want to omit certain triple statements from the solution.
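The consistency checks that the descriptions above attribute to the compiler can be sketched as a small validation routine. This is a minimal illustration and not the SKOSRec compiler: the classes `RecWhereClause` and `SkosRecQuery`, their fields and the function `validate` are hypothetical simplifications, and the concrete query syntax is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class RecWhereClause:
    """Variables of a filter clause, split by where they occur (hypothetical)."""
    pattern_vars: Set[str]                             # variables outside MINUS
    minus_vars: Set[str] = field(default_factory=set)  # variables inside MINUS

    def binds(self, var: str) -> bool:
        # A variable that occurs only inside MINUS does not count as present.
        return var in self.pattern_vars


@dataclass
class SkosRecQuery:
    rec_var: str      # SimProjection: recommendation variable
    limit: int        # SimProjection: number of recommendations
    items: List[str]  # ItemPart: preferences given as IRIs
    prefilter: Optional[RecWhereClause] = None
    postfilter: Optional[RecWhereClause] = None


def validate(query: SkosRecQuery) -> List[str]:
    """Collect violations of the obligatory parts and the variable rules."""
    errors: List[str] = []
    if not query.items:
        errors.append("ItemPart is obligatory: state at least one preference")
    if query.limit < 1:
        errors.append("SimProjection needs a positive number of recommendations")
    for name, clause in (("prefilter", query.prefilter),
                         ("postfilter", query.postfilter)):
        if clause is not None and not clause.binds(query.rec_var):
            errors.append(f"{name}: ?{query.rec_var} must occur outside MINUS")
    return errors


ok = SkosRecQuery("rec", 10, ["dbr:Ridley_Scott"],
                  prefilter=RecWhereClause({"rec", "genre"}))
bad = SkosRecQuery("rec", 10, ["dbr:Ridley_Scott"],
                   postfilter=RecWhereClause({"genre"}, minus_vars={"rec"}))
print(validate(ok))   # []
print(validate(bad))  # ['postfilter: ?rec must occur outside MINUS']
```

Rejecting a variable that appears only in the MINUS part mirrors the rule stated for the RecWhereClause: such a variable is never bound to a LOD resource, so later workflow steps would have nothing to join on.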

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In W3C Recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. ACM Transactions on Intelligent Systems and Technology (TIST) 8(2), 21 (2017).

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Discussion & Future Work
  • Appendix A: The SKOSRecommender Query Syntax
  • References


struct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions [44]. The statements were adapted to the specificities of each usage scenario. Figure 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting can cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Figure 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music and books) it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain, where the genre can be retrieved through the property dbo:genre. However, in the movie and book domains DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint to the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement “The filter has improved the recommendation results of list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

¹¹ http://dbpedia.org

Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar rollup query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during their stay in another destination (i.e., a city, region or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three POIs belonging to it, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants specified their favorite director/actor (movie domain), music act (music domain) or author (book domain), and the SKOSRec engine generated recommendations based on the features of the creative works of the favored artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with those of the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

Fig. 5. Web form for a constraint-based recommendation request (music domain)

Fig. 6. Travel - User profile generation TC3

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.


Fig. 7. Music - User profile generation TC4 (page 9)

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matchings can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose that in his/her profile a consumer has declared a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The gathered samples from the experiments contained answers from both genders with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceive the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
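As a minimal sketch of the reliability check described above, Cronbach's α can be computed from per-participant Likert responses as follows. The function name and the toy data are illustrative assumptions and do not stem from the SKOSRec evaluation; the sketch assumes the standard formula α = k/(k−1) · (1 − Σ item variances / variance of the sum scores) with sample variances.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per participant,
    one column per Likert item). Uses sample variances throughout."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    # Variance of each item (column) across participants.
    item_variances = [variance(column) for column in zip(*responses)]
    # Variance of the participants' sum scores.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of three participants to two Likert items.
toy_responses = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 4))
```

Under this reading, a construct would be treated as reliable when the returned value exceeds the 0.7 threshold mentioned in the text.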

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. So that valuable user assessments did not remain unused, agreements with the statement “The recommendations of list [] are diverse” were taken into account as an ordinal variable (divU). Based on these results, subsequent tests could be carried out.

The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M)¹², standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be above average. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects, such as Accuracy, Novelty and Diversity, received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse was 3 (“neutral”) in the multimedia domains and 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.

Table 10. Test cases in the web experiments (X = conducted, - = not conducted)

Test Case  Evaluation Purpose                      Query Type  Travel  Movie  Music  Book
TC1        On-the-fly Recommendations (Baseline)   Q1          X       X      X      X
TC2        Constraint-based Recommendations        Q3          X       X      X      X
TC3        Advanced Queries (Rollup)               Q5, Q6      X       X      X      X
TC4        Advanced Queries (Cross-Domain)         Q7          -       X      X      X

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain  Cronbach's α
Travel  0.7002
Movie   0.8579
Music   0.8515
Book    0.8949

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement “The filter has improved the recommendation results of the list”. Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  M      SD     N
Accuracy (size) [10]            Travel  10.00  0.00   103
                                Movie   10.00  0.00   50
                                Music   10.00  0.00   53
                                Book    9.80   1.40   51
Accuracy (mrs) [1]              Travel  56.49  0.00   103
                                Movie   62.25  19.15  50
                                Music   55.70  0.10   53
                                Book    57.47  15.55  50
Novelty (nv) [1]                Travel  0.60   0.34   101
                                Movie   0.36   0.31   50
                                Music   0.36   0.39   52
                                Book    0.61   0.29   47
Diversity (divU) [5]            Travel  4      -      102
                                Movie   3      -      50
                                Music   3      -      53
                                Book    3      -      50
Diversity (divC) [1]            Travel  0.80   0.16   103
                                Movie   0.87   0.12   50
                                Music   0.86   0.10   50
                                Book    0.92   0.08   50
Usefulness [5]                  Travel  3.34   0.85   103
                                Movie   3.55   0.73   50
                                Music   3.18   0.85   53
                                Book    3.43   0.81   50

[Figure 8: chart titled “The filter has improved the recommendations of the list”; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-empty result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.
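The response rate used in Table 13 can be illustrated with a short sketch. It assumes that each user session yields one recommendation list per method and that an empty list counts as "no response"; the function name and the sample sessions are hypothetical.

```python
def response_rate(result_lists):
    """Share of non-empty result sets among all result sets (0.0 to 1.0)."""
    if not result_lists:
        raise ValueError("at least one result set is required")
    non_empty = sum(1 for results in result_lists if len(results) > 0)
    return non_empty / len(result_lists)


# Hypothetical sessions: two of four filtered requests returned suggestions.
sessions = [["dbr:Item_A"], [], ["dbr:Item_B", "dbr:Item_C"], []]
print(response_rate(sessions))  # 0.5
```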

Table 13. Response rates (TC2)

Domain  Constr.-based (Regular)  Constr.-based (Expanded)  N
Travel  23%                      57%                       103
Movie   86%                      88%                       50
Music   72%                      75%                       53
Book    73%                      76%                       51

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. In case statistical tests identified no meaningful variations, the results of the better-performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
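The text does not name the concrete FDR procedure. As a hedged illustration, the following sketch assumes the widely used Benjamini-Hochberg step-up procedure; the function name, the default level of 0.05 and the example p-values are assumptions and are not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    under the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject all hypotheses up to and including that rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from eight pairwise comparisons.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p))
```

With these example values only the two smallest p-values remain significant, although five of them lie below an unadjusted level of 0.05.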

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, neither of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify recommendation lists and increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
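The overlap measure behind the Jaccard scores can be sketched as follows. The function name and the example lists are hypothetical, and treating two empty lists as identical (JI = 1.0) is an assumption of this sketch, since the text does not state how empty result sets were handled.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index of two recommendation lists, ignoring item order."""
    set_a, set_b = set(list_a), set(list_b)
    if not set_a and not set_b:
        # Assumption: two empty lists are considered identical.
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical result lists of a simple and an expanded filter request.
simple = ["dbr:A", "dbr:B", "dbr:C"]
expanded = ["dbr:A", "dbr:B", "dbr:C", "dbr:D"]
print(jaccard_index(simple, expanded))  # 0.75
```

In this toy case the expanded list contains every item of the simple list plus one more, which mirrors the interpretation that an expanded constraint yields an enhanced version of the simple result list.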


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

                                        Simple         Expressive
Dimension (Metric) [Max Score]  Domain  M      SD      M      SD      N
Accuracy (size) [10]            Travel  1.17   2.70    4.52   4.62    103
                                Movie   6.72   4.08    7.20   3.92    50
                                Music   5.96   4.64    6.36   4.51    53
                                Book    4.59   4.50    5.00   4.51    51
Accuracy (mrs) [1]              Travel  66.73  20.51   65.34  22.36   23
                                Movie   60.95  22.98   60.29  22.79   43
                                Music   63.77  16.05   62.96  16.44   34
                                Book    56.17  18.18   56.81  16.81   37
Novelty (nv) [1]                Travel  0.48   0.41    0.61   0.49    21
                                Movie   0.44   0.36    0.55   0.80    42
                                Music   0.55   0.39    0.52   0.37    38
                                Book    0.77   0.34    0.76   0.33    33
Diversity (divU) [5]            Travel  3      -       3.5    -       24
                                Movie   4      -       4      -       43
                                Music   4      -       4      -       38
                                Book    3      -       3      -       37
Diversity (divC) [1]            Travel  0.68   0.18    0.72   0.21    19
                                Movie   0.70   0.24    0.73   0.23    41
                                Music   0.86   0.13    0.87   0.12    34
                                Book    0.84   0.24    0.95   0.63    29
Usefulness [5]                  Travel  3.33   0.93    3.50   0.77    24
                                Movie   3.32   0.85    3.31   0.86    43
                                Music   3.55   0.81    3.64   0.78    38
                                Book    3.39   0.83    3.62   1.21    37

Table 15. Jaccard indices (TC2)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.57                     0.47  25
Movie   0.71                     0.38  43
Music   0.82                     0.35  37
Book    0.92                     0.26  35

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Table 16. Response rates (TC3)

Domain  Regular  Aggr.-based  N
Travel  99%      100%         103
Movie   96%      92%          50
Music   100%     100%         53
Book    96%      100%         51

Figure 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels that lay slightly above neutral agreement. A subsequent t-test revealed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness varied across the experiments. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.

[Figure 9: chart titled “Perceived Usefulness (Regular vs. Roll-up Queries)”; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up)]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3)

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas a regular SPARQL query often did not return a single suggestion.
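As a small illustration of the response rate, the following sketch assumes that it is the percentage of user sessions in which a request returned a non-zero result set. The session data are hypothetical.

```python
def response_rate(result_sets):
    """Percentage of sessions whose recommendation list is non-empty."""
    if not result_sets:
        raise ValueError("at least one session is required")
    non_empty = sum(1 for items in result_sets if items)
    return 100.0 * non_empty / len(result_sets)


# Hypothetical result sets of five user sessions (empty list = no suggestion).
sessions = [["m1", "m2"], [], [], ["m7"], []]

print(response_rate(sessions))
```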

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness because the low response rates of the SPARQL-based approach led to a low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).
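The text does not state which FDR procedure was used. The sketch below assumes the common Benjamini-Hochberg adjustment and applies it to hypothetical raw p-values; it is meant to show the idea of the correction, not to reproduce the reported analysis.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the running minimum
    # keeps the adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values of four pairwise tests.
raw = [0.01, 0.04, 0.03, 0.20]

print(benjamini_hochberg(raw))
```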

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.
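The degrees of freedom reported above are consistent with a within-subjects comparison (e.g., 50 participants and t(49) in the movie domain). As a minimal sketch under that assumption, a paired t statistic for list sizes can be computed as follows; the list sizes are hypothetical.

```python
import math
import statistics


def paired_t(x, y):
    """Return (t, degrees of freedom) for paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    if sd == 0:
        raise ValueError("all differences are identical")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical list sizes per session: SPARQL baseline vs. cross-domain query.
sparql_sizes = [0, 0, 2, 0, 1, 0]
skosrec_sizes = [10, 9, 10, 8, 10, 10]

t_value, df = paired_t(sparql_sizes, skosrec_sizes)
print(f"t({df}) = {t_value:.2f}")
```

A negative t value, as in the reported results, arises when the baseline is entered first and returns the shorter lists.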

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results in more than the other half of


Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular M   Regular SD   Aggr.-based M   Aggr.-based SD   N
Accuracy (size) [10]             Travel   9.90        0.99         9.42            2.07             103
                                 Movie    2.88        0.59         2.76            0.82             50
                                 Music    10.00       0.00         10.00           0.00             53
                                 Book     2.88        0.59         3.00            0.00             51
Accuracy (mrs) [1]               Travel   48.21       24.98        44.93           25.59            102
                                 Movie    54.58       22.88        56.69           25.42            44
                                 Music    53.95       18.31        58.24           16.19            52
                                 Book     51.53       23.90        49.25           22.30            49
Novelty (nv) [1]                 Travel   0.43        0.39         0.38            0.40             98
                                 Movie    0.35        0.38         0.28            0.41             41
                                 Music    0.49        0.36         0.46            0.39             50
                                 Book     0.61        0.42         0.62            0.44             47
Diversity (divU) [5]             Travel   3.5         -            3               -                100
                                 Movie    4           -            3               -                43
                                 Music    4           -            3               -                50
                                 Book     3           -            3               -                48
Diversity (divC) [1]             Travel   0.79        0.17         0.89            0.13             99
                                 Movie    0.75        0.20         0.66            0.27             44
                                 Music    0.83        0.17         0.93            0.07             53
                                 Book     0.74        0.27         0.93            0.06             48
Usefulness [5]                   Travel   3.36        0.77         3.22            0.74             101
                                 Movie    3.20        0.92         3.26            0.94             44
                                 Music    3.45        0.84         3.44            0.85             51
                                 Book     2.95        0.87         3.03            0.96             48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and

[Figure 10: bar chart "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); bars grouped by Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)

pattern matching can facilitate an improved exploration of LOD repositories.

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular (SPARQL) M   Regular (SPARQL) SD   Cross-Domain M   Cross-Domain SD   N
Accuracy (size) [10]             Movie    0.82                 2.09                  8.66             2.73              50
                                 Music    0.89                 1.59                  9.30             2.03              53
                                 Book     2.90                 3.98                  10.00            0.00              51
Accuracy (mrs) [1]               Movie    67.41                27.62                 53.65            14.71             16
                                 Music    62.36                31.76                 48.32            22.49             20
                                 Book     66.42                21.28                 55.59            16.84             27
Novelty (nv) [1]                 Movie    0.59                 0.49                  0.41             0.34              16
                                 Music    0.72                 0.44                  0.59             0.40              19
                                 Book     0.54                 0.41                  0.63             0.30              27
Diversity (divU) [5]             Movie    3                    -                     4                -                 16
                                 Music    3                    -                     3                -                 20
                                 Book     3                    -                     4                -                 27
Diversity (divC) [1]             Movie    (0.64)               (0.35)                (0.99)           (0.02)            (6)
                                 Music    (0.90)               (0.13)                (0.98)           (0.03)            (10)
                                 Book     (0.84)               (0.12)                (0.87)           (0.1)             (8)
Usefulness [5]                   Movie    3.57                 0.83                  3.35             1.57              15
                                 Music    3.53                 1.17                  3.11             0.98              20
                                 Book     3.41                 1.04                  3.41             0.94              27

Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a back-end application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of


LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than" or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints accordingly, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
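The compiler check described for the RecWhereClause (a variable has to occur at least once and must not be contained only in the MINUS part) can be sketched as follows. This is a strongly simplified, string-based illustration under the assumption that MINUS groups are not nested; the function name and the example clauses are hypothetical and do not reflect the actual SKOSRec compiler, which operates on the parsed query.

```python
import re

# Matches a non-nested "MINUS { ... }" group (simplifying assumption).
MINUS_BLOCK = re.compile(r"MINUS\s*\{[^{}]*\}", re.IGNORECASE)


def variable_is_usable(where_clause, variable):
    """True if the variable occurs somewhere outside of MINUS groups."""
    name = variable.lstrip("?$")
    pattern = re.compile(r"[?$]" + re.escape(name) + r"\b")
    # Blank out all MINUS groups, then look for the variable in the rest.
    outside_minus = MINUS_BLOCK.sub(" ", where_clause)
    return bool(pattern.search(outside_minus))


# Hypothetical clauses for a recommendation variable ?rec.
ok = "{ ?rec dbo:genre dbr:Jazz . MINUS { ?rec dbo:country dbr:France } }"
only_in_minus = "{ ?x dbo:genre dbr:Jazz . MINUS { ?rec dbo:country dbr:France } }"
missing = "{ ?record a dbo:Film }"

print(variable_is_usable(ok, "?rec"))
print(variable_is_usable(only_in_minus, "?rec"))
print(variable_is_usable(missing, "?rec"))
```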

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. ACM Transactions on Intelligent Systems and Technology (TIST) 8(2), 21 (2017).

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

received two recommendation lists (resulting from simple and expressive filtering accordingly) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement "The filter has improved the recommendation results of list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three similar roll-up query patterns (similar to Q6, Sect. 3). Users could obtain recommendations for travel destinations based on POIs they had visited during the stay in another destination (i.e., a city, region or country). Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Figure 6 shows the web form from the travel experiment that facilitated the formulation of advanced roll-up requests in TC3.
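The roll-up idea behind these requests can be sketched as follows: similarity scores of sublevel entities (here POIs) are aggregated per parent entity (here destinations) with SUM, MAX or AVG. The function name, the data structures and all identifiers and scores are hypothetical; the sketch illustrates the aggregation step only and is not the SKOSRec implementation.

```python
from collections import defaultdict

# Aggregation modes named in the query syntax (SUM, MAX, AVG).
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda values: sum(values) / len(values),
}


def roll_up(sublevel_scores, parent_of, mode="SUM"):
    """Aggregate sublevel similarity scores per parent and rank the parents."""
    grouped = defaultdict(list)
    for entity, score in sublevel_scores.items():
        parent = parent_of.get(entity)
        if parent is not None:
            grouped[parent].append(score)
    aggregate = AGGREGATORS[mode]
    ranked = [(parent, aggregate(scores)) for parent, scores in grouped.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical similarity scores of POIs and their parent destinations.
poi_scores = {"poi:a": 0.5, "poi:b": 0.25, "poi:c": 0.5}
destination_of = {"poi:a": "city:X", "poi:b": "city:X", "poi:c": "city:Y"}

print(roll_up(poi_scores, destination_of, "SUM"))
print(roll_up(poi_scores, destination_of, "AVG"))
```

With summation, a destination with many moderately similar POIs can outrank one with a single highly similar POI, whereas averaging reverses the order in this toy example.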

When users had selected an entity type, stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on roll-up retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act (music domain) or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced roll-up request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

Fig. 5. Web form for a constraint-based recommendation request (music domain)

Fig. 6. Travel - User profile generation TC3

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.


Fig. 7. Music - User profile generation TC4 (page 9)

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matchings can occur when a user states his/her favorite movie that is written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose that, in his/her profile, a consumer has declared a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The gathered samples from the experiments contained answers from both genders with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceive the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach’s α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach’s α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
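As an illustration of the reliability check, the following minimal sketch computes Cronbach's α from a participants-by-items matrix of Likert answers using the standard textbook formula. The function name and the sample answers are hypothetical and are not taken from the study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per participant)."""
    k = len(responses[0])  # number of Likert items in the construct
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sum of the sample variances of the individual items.
    item_variances = sum(variance(column) for column in zip(*responses))
    # Sample variance of each participant's total score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Made-up answers of four participants to two Likert items.
answers = [[4, 5], [3, 3], [5, 4], [2, 2]]
alpha = cronbach_alpha(answers)
print(round(alpha, 4))
```

With the threshold mentioned above, a construct would be treated as reliable when the returned value exceeds 0.7.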

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach’s α values for this concept did not reach acceptable levels. So that valuable user assessments did not remain unused, agreements to the statement “The recommendations of list [] are diverse” were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M, see footnote 12), standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine’s performance in almost every quality dimension to be better than mediocre. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, the mean relevance score (mrs) of each domain was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the

12 In the case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


Table 10. Test cases in the web experiments

Test Case   Evaluation Purpose                       Query Type   Travel   Movie   Music   Book
TC1         On-the-fly Recommendations (Baseline)    Q1           X        X       X       X
TC2         Constraint-based Recommendations         Q3           X        X       X       X
TC3         Advanced Queries (Rollup)                Q5, Q6       X        X       X       X
TC4         Advanced Queries (Cross-Domain)          Q7           -        X       X       X

Table 11. Cronbach’s α values for the concept of Perceived Usefulness

Domain   Cronbach’s α
Travel   0.7002
Movie    0.8579
Music    0.8515
Book     0.8949

baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement to the Diversity-related Likert item stating that recommendations are diverse lay at 3 (“neutral”) in the multimedia domains and at 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user’s search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants’ agreements with the statement “The filter has improved the recommendation results of the list”. Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   M       SD      N
Accuracy (size) [10]             Travel   10.00   0.00    103
                                 Movie    10.00   0.00    50
                                 Music    10.00   0.00    53
                                 Book     9.80    1.40    51
Accuracy (mrs) [1]               Travel   56.49   0.00    103
                                 Movie    62.25   19.15   50
                                 Music    55.70   0.10    53
                                 Book     57.47   15.55   50
Novelty (nv) [1]                 Travel   0.60    0.34    101
                                 Movie    0.36    0.31    50
                                 Music    0.36    0.39    52
                                 Book     0.61    0.29    47
Diversity (divU) [5]             Travel   4       -       102
                                 Movie    3       -       50
                                 Music    3       -       53
                                 Book     3       -       50
Diversity (divC) [1]             Travel   0.80    0.16    103
                                 Movie    0.87    0.12    50
                                 Music    0.86    0.10    50
                                 Book     0.92    0.08    50
Usefulness [5]                   Travel   3.34    0.85    103
                                 Movie    3.55    0.73    50
                                 Music    3.18    0.85    53
                                 Book     3.43    0.81    50

[Figure 8: bar chart titled “The filter has improved the recommendations of the list”; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants’ agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

Table 13. Response rates (TC2)

Domain   Constr.-based (Regular) (%)   Constr.-based (Expanded) (%)   N
Travel   23                            57                             103
Movie    86                            88                             50
Music    72                            75                             53
Book     73                            76                             51
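The response rate, i.e., the ratio of non-zero result sets among all result sets, can be sketched as follows. The function name and the list sizes are made up for illustration and do not stem from the experiment logs.

```python
def response_rate(result_list_sizes):
    """Share of user sessions in which the engine returned at least one item."""
    if not result_list_sizes:
        raise ValueError("at least one user session is required")
    non_empty = sum(1 for size in result_list_sizes if size > 0)
    return non_empty / len(result_list_sizes)


# Hypothetical recommendation list sizes of five user sessions.
sizes = [0, 3, 10, 0, 7]
print(f"{response_rate(sizes):.0%}")
```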

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
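The text does not state which FDR procedure was used. A common choice is the Benjamini-Hochberg step-up procedure, which is assumed in the following sketch; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the FDR control mentioned in the text corresponds to this
    procedure; the paper itself does not name the exact method.
    """
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value and enforce monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 3) for p in benjamini_hochberg(raw)])
```

A comparison would then be regarded as significant if its adjusted p-value stays below the chosen significance level.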

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = -7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Simple            Expressive        N
                                          M       SD        M       SD
Accuracy (size) [10]             Travel   1.17    2.70      4.52    4.62      103
                                 Movie    6.72    4.08      7.20    3.92      50
                                 Music    5.96    4.64      6.36    4.51      53
                                 Book     4.59    4.50      5.00    4.51      51
Accuracy (mrs) [1]               Travel   66.73   20.51     65.34   22.36     23
                                 Movie    60.95   22.98     60.29   22.79     43
                                 Music    63.77   16.05     62.96   16.44     34
                                 Book     56.17   18.18     56.81   16.81     37
Novelty (nv) [1]                 Travel   0.48    0.41      0.61    0.49      21
                                 Movie    0.44    0.36      0.55    0.80      42
                                 Music    0.55    0.39      0.52    0.37      38
                                 Book     0.77    0.34      0.76    0.33      33
Diversity (divU) [5]             Travel   3       -         3.5     -         24
                                 Movie    4       -         4       -         43
                                 Music    4       -         4       -         38
                                 Book     3       -         3       -         37
Diversity (divC) [1]             Travel   0.68    0.18      0.72    0.21      19
                                 Movie    0.70    0.24      0.73    0.23      41
                                 Music    0.86    0.13      0.87    0.12      34
                                 Book     0.84    0.24      0.95    0.63      29
Usefulness [5]                   Travel   3.33    0.93      3.50    0.77      24
                                 Movie    3.32    0.85      3.31    0.86      43
                                 Music    3.55    0.81      3.64    0.78      38
                                 Book     3.39    0.83      3.62    1.21      37

Table 15. Jaccard indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35
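The Jaccard index used to compare two recommendation lists is the size of their intersection divided by the size of their union. The sketch below illustrates the computation; the function name, the item identifiers and the handling of two empty lists (returning None) are assumptions made for this example.

```python
def jaccard_index(list_a, list_b):
    """Overlap of two recommendation lists: |A ∩ B| / |A ∪ B|."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        # Assumption: the index is left undefined when both lists are empty.
        return None
    return len(a & b) / len(a | b)


# Hypothetical result lists of a simple and an expressive filter request.
simple = ["dbr:Item_A", "dbr:Item_B", "dbr:Item_C"]
expressive = ["dbr:Item_B", "dbr:Item_C", "dbr:Item_D"]
print(jaccard_index(simple, expressive))
```

A value close to 1 indicates nearly identical lists, whereas a value close to 0 indicates that the two methods retrieved largely different items.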

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Figure 9 depicts participants’ general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels slightly above neutral agreement. A subsequent t-test found no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Table 16. Response rates (TC3)

Domain   Regular (%)   Aggr.-based (%)   N
Travel   99            100               103
Movie    96            92                50
Music    100           100               53
Book     96            100               51

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = -5.50, p < 0.001), the music (t(52) = -4.51, p < 0.001) and the book domain (t(47) = -4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.

[Figure 9: bar chart titled “Perceived Usefulness (Regular vs. Roll-up Queries)”; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement; legend: Method (Regular, Roll-up)]

Fig. 9. Participants’ overall satisfaction with regular and roll-up recommendation retrieval (TC3)

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other one, it depends on the underlying information need, the respective domain and the available data in the LOD repository whether or not a particular approach is better suited for a given retrieval task. This conclusion is especially impressive given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = -2.69, p < 0.05) and the book domain (Z = -2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = -17.92, p < 0.001; music: t(52) = -25.73, p < 0.001; book: t(50) = -12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.
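The reported degrees of freedom (N - 1) are consistent with paired (within-subjects) t-tests on the list sizes of the two methods. The following sketch shows how such a test statistic can be computed; the function name and the list sizes are invented for illustration, and the p-value lookup is omitted.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(baseline, advanced):
    """t statistic and degrees of freedom for paired observations.

    The differences are computed as baseline minus advanced, so a negative
    t value means that the advanced method produced larger values.
    """
    if len(baseline) != len(advanced) or len(baseline) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    differences = [b - a for b, a in zip(baseline, advanced)]
    n = len(differences)
    t = mean(differences) / (stdev(differences) / sqrt(n))
    return t, n - 1


# Made-up list sizes of four sessions (regular vs. cross-domain request).
regular_sizes = [1, 2, 3, 4]
cross_domain_sizes = [2, 4, 5, 8]
t_value, df = paired_t_statistic(regular_sizes, cross_domain_sizes)
print(f"t({df}) = {t_value:.2f}")
```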

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, in the other half of the test cases, in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular           Aggr.-based       N
                                          M       SD        M       SD
Accuracy (size) [10]             Travel   9.90    0.99      9.42    2.07      103
                                 Movie    2.88    0.59      2.76    0.82      50
                                 Music    10.00   0.00      10.00   0.00      53
                                 Book     2.88    0.59      3.00    0.00      51
Accuracy (mrs) [1]               Travel   48.21   24.98     44.93   25.59     102
                                 Movie    54.58   22.88     56.69   25.42     44
                                 Music    53.95   18.31     58.24   16.19     52
                                 Book     51.53   23.90     49.25   22.30     49
Novelty (nv) [1]                 Travel   0.43    0.39      0.38    0.40      98
                                 Movie    0.35    0.38      0.28    0.41      41
                                 Music    0.49    0.36      0.46    0.39      50
                                 Book     0.61    0.42      0.62    0.44      47
Diversity (divU) [5]             Travel   3.5     -         3       -         100
                                 Movie    4       -         3       -         43
                                 Music    4       -         3       -         50
                                 Book     3       -         3       -         48
Diversity (divC) [1]             Travel   0.79    0.17      0.89    0.13      99
                                 Movie    0.75    0.20      0.66    0.27      44
                                 Music    0.83    0.17      0.93    0.07      53
                                 Book     0.74    0.27      0.93    0.06      48
Usefulness [5]                   Travel   3.36    0.77      3.22    0.74      101
                                 Movie    3.20    0.92      3.26    0.94      44
                                 Music    3.45    0.84      3.44    0.85      51
                                 Book     2.95    0.87      3.03    0.96      48

Table 18. Jaccard indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL) (%)   SKOSRec (%)   N
Movie    32                     96            50
Music    62                     98            53
Book     53                     100           51

[Figure 10: bar chart titled “Recommendation List Size”; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend: Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular (SPARQL)    Cross-Domain        N
                                          M        SD         M        SD
Accuracy (size) [10]             Movie    0.82     2.09       8.66     2.73       50
                                 Music    0.89     1.59       9.30     2.03       53
                                 Book     2.90     3.98       10.00    0.00       51
Accuracy (mrs) [1]               Movie    67.41    27.62      53.65    14.71      16
                                 Music    62.36    31.76      48.32    22.49      20
                                 Book     66.42    21.28      55.59    16.84      27
Novelty (nv) [1]                 Movie    0.59     0.49       0.41     0.34       16
                                 Music    0.72     0.44       0.59     0.40       19
                                 Book     0.54     0.41       0.63     0.30       27
Diversity (divU) [5]             Movie    3        -          4        -          16
                                 Music    3        -          3        -          20
                                 Book     3        -          4        -          27
Diversity (divC) [1]             Movie    (0.64)   (0.35)     (0.99)   (0.02)     (6)
                                 Music    (0.90)   (0.13)     (0.98)   (0.03)     (10)
                                 Book     (0.84)   (0.12)     (0.87)   (0.1)      (8)
Usefulness [5]                   Movie    3.57     0.83       3.35     1.57       15
                                 Music    3.53     1.17       3.11     0.98       20
                                 Book     3.41     1.04       3.41     0.94       27

Table 21. Jaccard indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While the novel recommendation approaches of the SKOSRec engine were not superior in each domain and test case, it is at least safe to say that they considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research is required to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine, but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine’s features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
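The aggregation of sublevel similarity scores can be pictured with the following sketch. It is not the SKOSRec implementation: the function `roll_up`, the dictionaries and the resource names are hypothetical, and only the three aggregation modes SUM, MAX and AVG are taken from the description above.

```python
from collections import defaultdict


def roll_up(sub_scores, parent_of, how="SUM"):
    """Aggregate similarity scores of sublevel entities per parent resource.

    sub_scores: maps a sublevel entity (e.g., a POI) to its similarity score.
    parent_of:  maps a sublevel entity to its parent (e.g., a city).
    how:        one of "SUM", "MAX" or "AVG".
    """
    aggregators = {
        "SUM": sum,
        "MAX": max,
        "AVG": lambda values: sum(values) / len(values),
    }
    if how not in aggregators:
        raise ValueError(f"unsupported aggregation: {how}")
    grouped = defaultdict(list)
    for entity, score in sub_scores.items():
        parent = parent_of.get(entity)
        if parent is not None:  # skip entities without a known parent
            grouped[parent].append(score)
    return {parent: aggregators[how](scores) for parent, scores in grouped.items()}


# Made-up similarity scores of three POIs located in two cities.
scores = {"poi1": 0.5, "poi2": 0.25, "poi3": 0.75}
located_in = {"poi1": "cityA", "poi2": "cityA", "poi3": "cityB"}
print(roll_up(scores, located_in, "SUM"))
```

Ranking the parents by the aggregated value then yields the roll-up recommendation list, to which postfilter conditions could be applied.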

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
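The variable check described for the RecWhereClause can be approximated as follows. This is a simplified, string-based sketch and not the SKOSRec compiler, which presumably operates on a parsed syntax tree; the function name is hypothetical and nested MINUS groups are not handled.

```python
import re


def variable_is_bound(where_clause, variable):
    """Check that ?variable occurs in the clause outside of MINUS groups.

    Simplification: MINUS blocks are assumed to be non-nested, i.e., they
    contain no further curly braces.
    """
    # Remove every "MINUS { ... }" block before searching for the variable.
    outside_minus = re.sub(
        r"MINUS\s*\{[^{}]*\}", " ", where_clause, flags=re.IGNORECASE
    )
    pattern = re.compile(r"\?" + re.escape(variable) + r"\b")
    return bool(pattern.search(outside_minus))


# Hypothetical filter clause with a recommendation variable ?rec.
clause = "{ ?rec a dbo:Film . MINUS { ?rec dbo:genre ?g } }"
print(variable_is_bound(clause, "rec"))  # ?rec also occurs outside MINUS
print(variable_is_bound(clause, "g"))    # ?g only occurs inside MINUS
```

A query whose recommendation, preference or postfilter variable fails this check would be rejected, since the variable could not be joined with existing LOD resources.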

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow LOD resources to be retrieved from anywhere other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRec query, users can specify additional graph patterns, which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
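The variable check described for the RecWhereClause can be illustrated with a minimal sketch. It is not the SKOSRec compiler: the `Pattern` class, its `kind` labels and the function `variable_is_usable` are hypothetical names introduced here, and the sketch assumes that a parsed clause is available as a flat list of patterns with their variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    # Hypothetical labels: "triples", "optional", "union" or "minus".
    kind: str
    # Names of the variables used in the pattern, without the leading "?".
    variables: frozenset


def variable_is_usable(var, patterns):
    """Return True if `var` occurs in at least one pattern outside MINUS."""
    return any(var in p.variables for p in patterns if p.kind != "minus")


# Illustrative clause: one basic graph pattern and one MINUS pattern.
clause = [
    Pattern("triples", frozenset({"movie", "director"})),
    Pattern("minus", frozenset({"movie", "genre"})),
]
print(variable_is_usable("movie", clause))  # occurs in the triples block
print(variable_is_usable("genre", clause))  # occurs only inside MINUS
```

A variable that appears solely in a MINUS pattern is rejected, because it would never be bound to an LOD resource that later workflow steps could use.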

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In: User Modeling and User-Adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O’Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud’hommeaux, SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI’06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, art. 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC’11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.



Fig. 5. Web form for a constraint-based recommendation request (music domain)

Fig. 6. Travel - User profile generation TC3

ting participants know which approach they were assessing and through random assignment of result set positions.
After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Figure 7 depicts the cross-domain web form of the music experiment.


Fig. 7. Music - User profile generation TC4 (page 9)

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matchings can occur when a user states his/her favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain: Suppose a consumer has declared in his/her profile a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (e.g., the book edition of a certain movie).
Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The gathered samples from the experiments contained answers from both genders with a slight overrepresentation of male participants (53.7%). Most study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).
Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach’s α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach’s α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
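The reliability check can be sketched with the standard formula for Cronbach’s α, α = k/(k−1) · (1 − Σ s²ᵢ / s²ₜ), where k is the number of Likert items, s²ᵢ the variance of item i and s²ₜ the variance of the respondents’ total scores. The function name and the toy ratings below are illustrative assumptions, not data from the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for k items.

    item_scores: list of k lists, one per Likert item, each holding one
    rating per respondent (same respondent order in every list).
    """
    k = len(item_scores)
    # Total score of each respondent across all items.
    totals = [sum(ratings) for ratings in zip(*item_scores)]
    sum_item_var = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


# Toy data (assumed): three items answered by five respondents.
items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3), alpha > 0.7)
```

Under the threshold used in the text, a construct would be treated as reliable when the returned value exceeds 0.7.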

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach’s α values for this concept did not reach acceptable levels. To avoid leaving valuable user assessments unused, agreements with the statement “The recommendations of list [] are diverse” were taken into account as an ordinal variable (divU).
Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M)¹², standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine’s performance in almost every quality dimension to be above mediocrity. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects, such as Accuracy, Novelty and Diversity, received good assessments as well. For instance, the mean relevance score (mrs) of each domain was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Tab. 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


Table 10. Test Cases in the web experiments

| Test Case | Evaluation Purpose | Query Type | Travel | Movie | Music | Book |
|---|---|---|---|---|---|---|
| TC1 | On-the-fly Recommendations (Baseline) | Q1 | X | X | X | X |
| TC2 | Constraint-based Recommendations | Q3 | X | X | X | X |
| TC3 | Advanced Queries (Rollup) | Q5, Q6 | X | X | X | X |
| TC4 | Advanced Queries (Cross-Domain) | Q7 | x | X | X | X |

Table 11. Cronbach’s α values for the concept of Perceived Usefulness

| Domain | Cronbach’s α |
|---|---|
| Travel | 0.7002 |
| Movie | 0.8579 |
| Music | 0.8515 |
| Book | 0.8949 |

baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 (“neutral”) in the multimedia domains and at 4 (“agree”) in the travel domain.
In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user’s search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants’ agreements with the statement “The filter has improved the recommendation results of the list”. Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max. Score] | Domain | M | SD | N |
|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 10.00 | 0.00 | 103 |
| | Movie | 10.00 | 0.00 | 50 |
| | Music | 10.00 | 0.00 | 53 |
| | Book | 9.80 | 1.40 | 51 |
| Accuracy (mrs) [1] | Travel | 56.49 | 0.00 | 103 |
| | Movie | 62.25 | 19.15 | 50 |
| | Music | 55.70 | 0.10 | 53 |
| | Book | 57.47 | 15.55 | 50 |
| Novelty (nv) [1] | Travel | 0.60 | 0.34 | 101 |
| | Movie | 0.36 | 0.31 | 50 |
| | Music | 0.36 | 0.39 | 52 |
| | Book | 0.61 | 0.29 | 47 |
| Diversity (divU) [5] | Travel | 4 | - | 102 |
| | Movie | 3 | - | 50 |
| | Music | 3 | - | 53 |
| | Book | 3 | - | 50 |
| Diversity (divC) [1] | Travel | 0.80 | 0.16 | 103 |
| | Movie | 0.87 | 0.12 | 50 |
| | Music | 0.86 | 0.10 | 50 |
| | Book | 0.92 | 0.08 | 50 |
| Usefulness [5] | Travel | 3.34 | 0.85 | 103 |
| | Movie | 3.55 | 0.73 | 50 |
| | Music | 3.18 | 0.85 | 53 |
| | Book | 3.43 | 0.81 | 50 |

Fig. 8. Participants’ agreement with the statement that the filter has improved the result list (domain-wise) (TC2). [Figure not reproduced. Panel title: “The filter has improved the recommendations of the list”; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5).]

Table 13. Response Rates (TC2)

| Domain | Constr.-based (Regular) | Constr.-based (Expanded) | N |
|---|---|---|---|
| Travel | 23% | 57% | 103 |
| Movie | 86% | 88% | 50 |
| Music | 72% | 75% | 53 |
| Book | 73% | 76% | 51 |
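The response rate, defined in the text as the ratio of non-zero result sets among all result sets, can be computed as in the following minimal sketch. The function name and the example lists are illustrative assumptions.

```python
def response_rate(result_sets):
    """Share of requests that returned at least one recommendation."""
    non_empty = sum(1 for results in result_sets if len(results) > 0)
    return non_empty / len(result_sets)


# Toy example (assumed): four requests, two of which returned nothing.
sessions = [["item_a", "item_b"], [], ["item_c"], []]
print(response_rate(sessions))
```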

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
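The text does not state which FDR procedure was used. As a minimal sketch, the widely used Benjamini-Hochberg step-up procedure is assumed here: with m tests, the p-values are sorted in ascending order, and all hypotheses up to the largest rank k with p₍ₖ₎ ≤ (k/m)·α are rejected. The function name and the p-values are illustrative.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Assumption: Benjamini-Hochberg step-up procedure; the source only
    says that alpha levels were FDR-adjusted.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is below its threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Toy p-values (assumed), one per compared metric.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.205]))
```

Because the procedure is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the ranking meets its threshold.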

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.
Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best, because statistical tests were not significant.
In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.


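Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max. Score] | Domain | Simple M | Simple SD | Expressive M | Expressive SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 1.17 | 2.70 | 4.52 | 4.62 | 103 |
| | Movie | 6.72 | 4.08 | 7.20 | 3.92 | 50 |
| | Music | 5.96 | 4.64 | 6.36 | 4.51 | 53 |
| | Book | 4.59 | 4.50 | 5.00 | 4.51 | 51 |
| Accuracy (mrs) [1] | Travel | 66.73 | 20.51 | 65.34 | 22.36 | 23 |
| | Movie | 60.95 | 22.98 | 60.29 | 22.79 | 43 |
| | Music | 63.77 | 16.05 | 62.96 | 16.44 | 34 |
| | Book | 56.17 | 18.18 | 56.81 | 16.81 | 37 |
| Novelty (nv) [1] | Travel | 0.48 | 0.41 | 0.61 | 0.49 | 21 |
| | Movie | 0.44 | 0.36 | 0.55 | 0.80 | 42 |
| | Music | 0.55 | 0.39 | 0.52 | 0.37 | 38 |
| | Book | 0.77 | 0.34 | 0.76 | 0.33 | 33 |
| Diversity (divU) [5] | Travel | 3 | - | 3.5 | - | 24 |
| | Movie | 4 | - | 4 | - | 43 |
| | Music | 4 | - | 4 | - | 38 |
| | Book | 3 | - | 3 | - | 37 |
| Diversity (divC) [1] | Travel | 0.68 | 0.18 | 0.72 | 0.21 | 19 |
| | Movie | 0.70 | 0.24 | 0.73 | 0.23 | 41 |
| | Music | 0.86 | 0.13 | 0.87 | 0.12 | 34 |
| | Book | 0.84 | 0.24 | 0.95 | 0.63 | 29 |
| Usefulness [5] | Travel | 3.33 | 0.93 | 3.50 | 0.77 | 24 |
| | Movie | 3.32 | 0.85 | 3.31 | 0.86 | 43 |
| | Music | 3.55 | 0.81 | 3.64 | 0.78 | 38 |
| | Book | 3.39 | 0.83 | 3.62 | 1.21 | 37 |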

Table 15. Jaccard Indices (TC2)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.57 | 0.47 | 25 |
| Movie | 0.71 | 0.38 | 43 |
| Music | 0.82 | 0.35 | 37 |
| Book | 0.92 | 0.26 | 35 |
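The Jaccard index of two recommendation lists is the size of their intersection divided by the size of their union. The following minimal sketch shows the computation for one pair of lists; the function name, the item identifiers and the convention of returning 1.0 for two empty lists are assumptions made for illustration.

```python
def jaccard_index(list_a, list_b):
    """Overlap of two recommendation lists, ignoring rank order."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        # Assumed convention: two empty lists count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Toy example (assumed): two lists sharing two of four distinct items.
simple = ["item_1", "item_2", "item_3"]
expressive = ["item_2", "item_3", "item_4"]
print(jaccard_index(simple, expressive))
```

Averaging this value over all user sessions of a domain yields a mean index of the kind reported in the table above.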

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Figure 9 depicts participants’ general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels that lay slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Table 16. Response Rates (TC3)

| Domain | Regular | Aggr.-based | N |
|---|---|---|---|
| Travel | 99% | 100% | 103 |
| Movie | 96% | 92% | 50 |
| Music | 100% | 100% | 53 |
| Book | 96% | 100% | 51 |

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach


Fig. 9. Participants’ overall satisfaction with regular and roll-up recommendation retrieval (TC3). [Figure not reproduced. Panel title: “Perceived Usefulness (Regular vs. Roll-up Queries)”; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up).]

explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially noteworthy given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. However, the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = -17.92, p < 0.001; music: t(52) = -25.73, p < 0.001; book: t(50) = -12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.
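The reported degrees of freedom equal the number of participants minus one in each domain, which is consistent with a paired (within-subjects) t-test on the list sizes. Under this assumption, the test statistic can be sketched as follows; the function name and the list sizes are hypothetical.

```python
import math
import statistics


def paired_t(sample_a, sample_b):
    """Return (t, degrees of freedom) of a paired t-test for two samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    # Sample standard deviation of the differences (n - 1 in the denominator).
    # Note: identical differences for all pairs would lead to a division by zero.
    sd_diff = statistics.stdev(diffs)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical list sizes per participant (regular vs. cross-domain request).
regular_sizes = [0, 1, 0, 2, 1]
cross_sizes = [9, 10, 8, 10, 10]

t_value, dof = paired_t(regular_sizes, cross_sizes)
print(t_value, dof)
```

A negative t value indicates that the first method (here: the regular request) produced shorter lists than the second one.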

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results in more than the other half of


Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular M   Regular SD   Aggr.-based M   Aggr.-based SD   N
Accuracy (size) [10]             Travel   9.90        0.99         9.42            2.07             103
                                 Movie    2.88        0.59         2.76            0.82             50
                                 Music    10.00       0.00         10.00           0.00             53
                                 Book     2.88        0.59         3.00            0.00             51
Accuracy (mrs) [1]               Travel   48.21       24.98        44.93           25.59            102
                                 Movie    54.58       22.88        56.69           25.42            44
                                 Music    53.95       18.31        58.24           16.19            52
                                 Book     51.53       23.90        49.25           22.30            49
Novelty (nv) [1]                 Travel   0.43        0.39         0.38            0.40             98
                                 Movie    0.35        0.38         0.28            0.41             41
                                 Music    0.49        0.36         0.46            0.39             50
                                 Book     0.61        0.42         0.62            0.44             47
Diversity (divU) [5]             Travel   3.5         -            3               -                100
                                 Movie    4           -            3               -                43
                                 Music    4           -            3               -                50
                                 Book     3           -            3               -                48
Diversity (divC) [1]             Travel   0.79        0.17         0.89            0.13             99
                                 Movie    0.75        0.20         0.66            0.27             44
                                 Music    0.83        0.17         0.93            0.07             53
                                 Book     0.74        0.27         0.93            0.06             48
Usefulness [5]                   Travel   3.36        0.77         3.22            0.74             101
                                 Movie    3.20        0.92         3.26            0.94             44
                                 Music    3.45        0.84         3.44            0.85             51
                                 Book     2.95        0.87         3.03            0.96             48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). [Bar chart "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); Method: Regular vs. Cross-Domain]

pattern matching can facilitate an improved exploration of LOD repositories.

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular (SPARQL) M   Regular (SPARQL) SD   Cross-Domain M   Cross-Domain SD   N
Accuracy (size) [10]             Movie    0.82                 2.09                  8.66             2.73              50
                                 Music    0.89                 1.59                  9.30             2.03              53
                                 Book     2.90                 3.98                  10.00            0.00              51
Accuracy (mrs) [1]               Movie    67.41                27.62                 53.65            14.71             16
                                 Music    62.36                31.76                 48.32            22.49             20
                                 Book     66.42                21.28                 55.59            16.84             27
Novelty (nv) [1]                 Movie    0.59                 0.49                  0.41             0.34              16
                                 Music    0.72                 0.44                  0.59             0.40              19
                                 Book     0.54                 0.41                  0.63             0.30              27
Diversity (divU) [5]             Movie    3                    -                     4                -                 16
                                 Music    3                    -                     3                -                 20
                                 Book     3                    -                     4                -                 27
Diversity (divC) [1]             Movie    (0.64)               (0.35)                (0.99)           (0.02)            (6)
                                 Music    (0.90)               (0.13)                (0.98)           (0.03)            (10)
                                 Book     (0.84)               (0.12)                (0.87)           (0.1)             (8)
Usefulness [5]                   Movie    3.57                 0.83                  3.35             1.57              15
                                 Music    3.53                 1.17                  3.11             0.98              20
                                 Book     3.41                 1.04                  3.41             0.94              27

Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a back-end application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of


LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX, or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is also specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than", or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart), or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference, or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow the retrieval of LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user was able to formulate filter conditions that referred to different RDF graphs and endpoints accordingly, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users are enabled to specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
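The Aggregation part described above summarizes similarity scores of sublevel entities with SUM, MAX, or AVG. The following sketch illustrates this idea outside of the SKOSRec engine. It is not the engine's implementation; the function name, the data structures, and the example identifiers (movies as sublevel entities, directors as aggregation targets) are assumptions made for illustration.

```python
from collections import defaultdict

# Aggregation modes named in the query syntax.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda scores: sum(scores) / len(scores),
}


def roll_up(item_scores, parent_of, mode="SUM", top_n=3):
    """Aggregate similarity scores of sublevel items per parent resource.

    item_scores: dict mapping a sublevel item to its similarity score
    parent_of:   dict mapping a sublevel item to its parent resource
    Returns the top_n (parent, aggregated score) pairs, best first.
    """
    grouped = defaultdict(list)
    for item, score in item_scores.items():
        parent = parent_of.get(item)
        if parent is not None:  # items without a parent are skipped
            grouped[parent].append(score)
    aggregate = AGGREGATORS[mode]
    totals = {parent: aggregate(scores) for parent, scores in grouped.items()}
    # Sort by descending score; ties are broken by the parent identifier.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical similarity scores of movies and their directors.
item_scores = {"m1": 0.9, "m2": 0.4, "m3": 0.4, "m4": 0.3}
parent_of = {"m1": "dirA", "m2": "dirB", "m3": "dirB", "m4": "dirB"}

print(roll_up(item_scores, parent_of, mode="SUM"))
print(roll_up(item_scores, parent_of, mode="MAX"))
```

In this example, SUM favors the director with many moderately similar movies, whereas MAX and AVG favor the director with a single highly similar movie. The choice of the aggregation mode can therefore change the ranking of the recommended resources.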

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In W3C Recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, art. 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.



Fig. 7. Music - User profile generation TC4 (page 9)

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the back-end of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matchings can occur when a user states a favorite movie that is written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared in his/her profile a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as he/she likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

The gathered samples from the experiments contained answers from both genders with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceive the search in their domain of interest as either "manageable" (38.5%) or even as "easy" or "very easy" (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.
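For illustration, the reliability check can be sketched with the standard formula for Cronbach's α, i.e., α = k / (k - 1) · (1 - Σ var(item) / var(total)) for k Likert items. The function name and the ratings below are hypothetical and do not stem from the experiments.

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items.

    item_scores: one list per Likert item, each holding the ratings of the
    same respondents in the same order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each individual item.
    item_variances = [statistics.variance(item) for item in item_scores]
    # Sample variance of the summed score per respondent.
    totals = [sum(ratings) for ratings in zip(*item_scores)]
    total_variance = statistics.variance(totals)
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings of five respondents on three usefulness items.
usefulness_items = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 3, 4],
]

print(cronbach_alpha(usefulness_items))
```

Values above the conventional threshold of 0.7 indicate that the items measure the same construct consistently, which is the criterion applied to the Perceived Usefulness construct above.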

Besides usefulness user-based diversity scores were alsomeasured by multiple Likert items However Cronbachrsquos αvalues for this concept did not reach acceptable levels In or-der to avoid that valuable user assessments remained unusedagreements to the statement bdquoThe recommendations of list[] are diverse ldquo were taken into account as an ordinal vari-able (divU)Based on these results subsequent tests could be carried outThe first part of the analysis concerned the performance ofthe on-the-fly retrieval method (baseline) of the SKOSRecengine in TC1 The outcome of this analysis confirms thatthe on-the-fly approach represents a viable recommendationstrategy Table 12 shows the mean (M)12 standard deviation(S D) and number of participants (N) for each performancedimension and metric It can be seen that users assessed theenginersquos performance in almost each quality dimension to beabove mediocrity Aside from the positive user evaluationswith regard to Perceived Uselfulness related quality aspectssuch as Accuracy Novelty and Diversity received good as-sessments as well For instance each domain mean relevancescore (mrs) was higher than 50 score points It means thatparticipants were often convinced of the relevance of the re-sults Given these positive user assessments with regard toitem relevance it seems to be justified to consider the meanof result list sizes as an indicator of recall The engine wasable to generate the required number of recommendations inalmost each test case (see Tab 12) The on-the-fly retrievalapproach was also tested with regard to the Novelty dimen-sion Table 12 shows the mean novelty scores (nv) for the

¹² In the case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


Table 10. Test cases in the web experiments

| Test Case | Evaluation Purpose | Query Type | Travel | Movie | Music | Book |
|---|---|---|---|---|---|---|
| TC1 | On-the-fly Recommendations (Baseline) | Q1 | X | X | X | X |
| TC2 | Constraint-based Recommendations | Q3 | X | X | X | X |
| TC3 | Advanced Queries (Rollup) | Q5, Q6 | X | X | X | X |
| TC4 | Advanced Queries (Cross-Domain) | Q7 | x | X | X | X |

Table 11. Cronbach’s α values for the concept of Perceived Usefulness

| Domain | Cronbach’s α |
|---|---|
| Travel | 0.7002 |
| Movie | 0.8579 |
| Music | 0.8515 |
| Book | 0.8949 |

baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse was 3 (“neutral”) in the multimedia domains and 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user’s search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants’ agreement with the statement “The filter has improved the recommendation results of the list”. Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists the response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = mean, SD = standard deviation, N = number of participants)

| Dimension (Metric) [Max Score] | Domain | M | SD | N |
|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 10.00 | 0.00 | 103 |
| | Movie | 10.00 | 0.00 | 50 |
| | Music | 10.00 | 0.00 | 53 |
| | Book | 9.80 | 1.40 | 51 |
| Accuracy (mrs) [1] | Travel | 56.49 | 0.00 | 103 |
| | Movie | 62.25 | 19.15 | 50 |
| | Music | 55.70 | 0.10 | 53 |
| | Book | 57.47 | 15.55 | 50 |
| Novelty (nv) [1] | Travel | 0.60 | 0.34 | 101 |
| | Movie | 0.36 | 0.31 | 50 |
| | Music | 0.36 | 0.39 | 52 |
| | Book | 0.61 | 0.29 | 47 |
| Diversity (divU) [5] | Travel | 4 | - | 102 |
| | Movie | 3 | - | 50 |
| | Music | 3 | - | 53 |
| | Book | 3 | - | 50 |
| Diversity (divC) [1] | Travel | 0.80 | 0.16 | 103 |
| | Movie | 0.87 | 0.12 | 50 |
| | Music | 0.86 | 0.10 | 50 |
| | Book | 0.92 | 0.08 | 50 |
| Usefulness [5] | Travel | 3.34 | 0.85 | 103 |
| | Movie | 3.55 | 0.73 | 50 |
| | Music | 3.18 | 0.85 | 53 |
| | Book | 3.43 | 0.81 | 50 |

Fig. 8. Participants’ agreement with the statement that the filter has improved the result list (domain-wise) (TC2). [Stacked bar chart titled “The filter has improved the recommendations of the list”; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5).]

Table 13. Response rates (TC2)

| Domain | Constr.-based (Regular) (%) | Constr.-based (Expanded) (%) | N |
|---|---|---|---|
| Travel | 23 | 57 | 103 |
| Movie | 86 | 88 | 50 |
| Music | 72 | 75 | 53 |
| Book | 73 | 76 | 51 |
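The response rate used in this comparison is defined in the text as the ratio of non-zero result sets among all result sets. A minimal sketch of that definition follows; the function name and the example sessions are hypothetical.

```python
def response_rate(result_sets):
    """Share of user sessions whose recommendation list is non-empty."""
    if not result_sets:
        raise ValueError("no result sets given")
    non_empty = sum(1 for results in result_sets if len(results) > 0)
    return non_empty / len(result_sets)


# Hypothetical sessions: two of the four requests returned no items.
sessions = [["item_a"], [], ["item_b", "item_c"], []]
print(response_rate(sessions))
```

Multiplying the returned ratio by 100 gives a percentage of the kind reported per domain and method.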

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. Where statistical tests identified no meaningful variations, the results of the better-performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
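The text does not state which FDR procedure was applied. As an illustration only, the following sketch assumes the widely used Benjamini-Hochberg step-up procedure and returns adjusted p-values, which can be compared against the unadjusted α-level; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down and enforce monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

A comparison would then count as significant if its adjusted p-value stays below the chosen α-level (e.g., 0.05).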

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify recommendation lists and increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.


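Table 14. Performance results in TC2 (M = mean, SD = standard deviation, N = number of participants)

| Dimension (Metric) [Max Score] | Domain | Simple M | Simple SD | Expressive M | Expressive SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 1.17 | 2.70 | 4.52 | 4.62 | 103 |
| | Movie | 6.72 | 4.08 | 7.20 | 3.92 | 50 |
| | Music | 5.96 | 4.64 | 6.36 | 4.51 | 53 |
| | Book | 4.59 | 4.50 | 5.00 | 4.51 | 51 |
| Accuracy (mrs) [1] | Travel | 66.73 | 20.51 | 65.34 | 22.36 | 23 |
| | Movie | 60.95 | 22.98 | 60.29 | 22.79 | 43 |
| | Music | 63.77 | 16.05 | 62.96 | 16.44 | 34 |
| | Book | 56.17 | 18.18 | 56.81 | 16.81 | 37 |
| Novelty (nv) [1] | Travel | 0.48 | 0.41 | 0.61 | 0.49 | 21 |
| | Movie | 0.44 | 0.36 | 0.55 | 0.80 | 42 |
| | Music | 0.55 | 0.39 | 0.52 | 0.37 | 38 |
| | Book | 0.77 | 0.34 | 0.76 | 0.33 | 33 |
| Diversity (divU) [5] | Travel | 3 | - | 3.5 | - | 24 |
| | Movie | 4 | - | 4 | - | 43 |
| | Music | 4 | - | 4 | - | 38 |
| | Book | 3 | - | 3 | - | 37 |
| Diversity (divC) [1] | Travel | 0.68 | 0.18 | 0.72 | 0.21 | 19 |
| | Movie | 0.70 | 0.24 | 0.73 | 0.23 | 41 |
| | Music | 0.86 | 0.13 | 0.87 | 0.12 | 34 |
| | Book | 0.84 | 0.24 | 0.95 | 0.63 | 29 |
| Usefulness [5] | Travel | 3.33 | 0.93 | 3.50 | 0.77 | 24 |
| | Movie | 3.32 | 0.85 | 3.31 | 0.86 | 43 |
| | Music | 3.55 | 0.81 | 3.64 | 0.78 | 38 |
| | Book | 3.39 | 0.83 | 3.62 | 1.21 | 37 |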

Table 15. Jaccard indices (TC2)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.57 | 0.47 | 25 |
| Movie | 0.71 | 0.38 | 43 |
| Music | 0.82 | 0.35 | 37 |
| Book | 0.92 | 0.26 | 35 |
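The Jaccard index compares two recommendation lists as sets: the size of their intersection divided by the size of their union. The sketch below uses this standard definition. How sessions with two empty lists were handled in the study is not stated, so returning `None` in that case is an assumption, as are the function name and the example items.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index of two recommendation lists, treated as sets."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumption: the overlap of two empty lists is left undefined.
        return None
    return len(set_a & set_b) / len(union)


# Hypothetical result lists of a simple and an expanded filter request.
simple = ["hotel_a", "hotel_b", "hotel_c"]
expanded = ["hotel_b", "hotel_c", "hotel_d"]
print(jaccard_index(simple, expanded))
```

A value close to 1 means that both requests returned nearly the same items, whereas a value close to 0 means that the lists barely overlap.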

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Figure 9 depicts participants’ general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Table 16. Response rates (TC3)

| Domain | Regular (%) | Aggr.-based (%) | N |
|---|---|---|---|
| Travel | 99 | 100 | 103 |
| Movie | 96 | 92 | 50 |
| Music | 100 | 100 | 53 |
| Book | 96 | 100 | 51 |

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Fig. 9. Participants’ overall satisfaction with regular and roll-up recommendation retrieval (TC3). [Bar chart titled “Perceived Usefulness (Regular vs. Roll-up Queries)”; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up).]

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, whether a particular approach is better suited for a given retrieval task depends on the underlying information need, the respective domain and the available data in the LOD repository. This conclusion is especially noteworthy given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness because the low response rates of the SPARQL-based approach led to a low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.
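The reported degrees of freedom (e.g., t(49) for 50 participants) are consistent with a paired-samples t-test on per-participant list sizes, although the exact procedure is not spelled out in the text. Under that assumption, the test statistic can be sketched as follows; the function name and the toy list sizes are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for a paired-samples t-test.

    first and second hold one measurement per participant, in the
    same participant order; differences are taken as first - second.
    """
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    if n < 2:
        raise ValueError("at least two paired observations are required")
    spread = stdev(differences)
    if spread == 0:
        raise ValueError("differences have zero variance")
    t = mean(differences) / (spread / sqrt(n))
    return t, n - 1


# Hypothetical list sizes per participant (regular vs. cross-domain).
regular_sizes = [1, 2, 3, 4]
cross_domain_sizes = [2, 4, 5, 8]
print(paired_t_statistic(regular_sizes, cross_domain_sizes))
```

With differences computed as regular minus cross-domain, a negative t value indicates larger lists for the cross-domain request, which matches the sign of the statistics reported above.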

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results, which happened in more than half of


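Table 17. Performance results in TC3 (M = mean, SD = standard deviation, N = number of participants)

| Dimension (Metric) [Max Score] | Domain | Regular M | Regular SD | Aggr.-based M | Aggr.-based SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 9.90 | 0.99 | 9.42 | 2.07 | 103 |
| | Movie | 2.88 | 0.59 | 2.76 | 0.82 | 50 |
| | Music | 10.00 | 0.00 | 10.00 | 0.00 | 53 |
| | Book | 2.88 | 0.59 | 3.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Travel | 48.21 | 24.98 | 44.93 | 25.59 | 102 |
| | Movie | 54.58 | 22.88 | 56.69 | 25.42 | 44 |
| | Music | 53.95 | 18.31 | 58.24 | 16.19 | 52 |
| | Book | 51.53 | 23.90 | 49.25 | 22.30 | 49 |
| Novelty (nv) [1] | Travel | 0.43 | 0.39 | 0.38 | 0.40 | 98 |
| | Movie | 0.35 | 0.38 | 0.28 | 0.41 | 41 |
| | Music | 0.49 | 0.36 | 0.46 | 0.39 | 50 |
| | Book | 0.61 | 0.42 | 0.62 | 0.44 | 47 |
| Diversity (divU) [5] | Travel | 3.5 | - | 3 | - | 100 |
| | Movie | 4 | - | 3 | - | 43 |
| | Music | 4 | - | 3 | - | 50 |
| | Book | 3 | - | 3 | - | 48 |
| Diversity (divC) [1] | Travel | 0.79 | 0.17 | 0.89 | 0.13 | 99 |
| | Movie | 0.75 | 0.20 | 0.66 | 0.27 | 44 |
| | Music | 0.83 | 0.17 | 0.93 | 0.07 | 53 |
| | Book | 0.74 | 0.27 | 0.93 | 0.06 | 48 |
| Usefulness [5] | Travel | 3.36 | 0.77 | 3.22 | 0.74 | 101 |
| | Movie | 3.20 | 0.92 | 3.26 | 0.94 | 44 |
| | Music | 3.45 | 0.84 | 3.44 | 0.85 | 51 |
| | Book | 2.95 | 0.87 | 3.03 | 0.96 | 48 |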

Table 18. Jaccard indices (TC3)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.07 | 0.11 | 101 |
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.02 | 0.05 | 53 |
| Book | 0.02 | 0.06 | 50 |

Table 19. Response rates (TC4)

| Domain | Regular (SPARQL) (%) | SKOSRec (%) | N |
|---|---|---|---|
| Movie | 32 | 96 | 50 |
| Music | 62 | 98 | 53 |
| Book | 53 | 100 | 51 |

the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). [Bar chart titled “Recommendation List Size”; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend: Method (Regular, Cross-Domain).]

pattern matching can facilitate an improved exploration of LOD repositories.

Table 21 also shows that the items in the two sets often did not match, as indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


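Table 20. Performance results in TC4 (M = mean, SD = standard deviation, N = number of participants)

| Dimension (Metric) [Max Score] | Domain | Regular (SPARQL) M | Regular (SPARQL) SD | Cross-Domain M | Cross-Domain SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Movie | 0.82 | 2.09 | 8.66 | 2.73 | 50 |
| | Music | 0.89 | 1.59 | 9.30 | 2.03 | 53 |
| | Book | 2.90 | 3.98 | 10.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Movie | 67.41 | 27.62 | 53.65 | 14.71 | 16 |
| | Music | 62.36 | 31.76 | 48.32 | 22.49 | 20 |
| | Book | 66.42 | 21.28 | 55.59 | 16.84 | 27 |
| Novelty (nv) [1] | Movie | 0.59 | 0.49 | 0.41 | 0.34 | 16 |
| | Music | 0.72 | 0.44 | 0.59 | 0.40 | 19 |
| | Book | 0.54 | 0.41 | 0.63 | 0.30 | 27 |
| Diversity (divU) [5] | Movie | 3 | - | 4 | - | 16 |
| | Music | 3 | - | 3 | - | 20 |
| | Book | 3 | - | 4 | - | 27 |
| Diversity (divC) [1] | Movie | (0.64) | (0.35) | (0.99) | (0.02) | (6) |
| | Music | (0.90) | (0.13) | (0.98) | (0.03) | (10) |
| | Book | (0.84) | (0.12) | (0.87) | (0.1) | (8) |
| Usefulness [5] | Movie | 3.57 | 0.83 | 3.35 | 1.57 | 15 |
| | Music | 3.53 | 1.17 | 3.11 | 0.98 | 20 |
| | Book | 3.41 | 1.04 | 3.41 | 0.94 | 27 |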

Table 21. Jaccard indices (TC4)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.01 | 0.02 | 21 |
| Book | 0.16 | 0.27 | 33 |

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While the novel recommendation approaches of the SKOSRec engine are not superior in each domain and test case, it is at least safe to say that they considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a back-end application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of


LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine’s features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow the retrieval of LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
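The Aggregation element described above summarizes the similarity scores of sublevel entities per parent resource with SUM, MAX or AVG. The following sketch only illustrates that idea on plain Python data. It is not the SKOSRec implementation, and the function name, the data layout and the example IRIs and scores are hypothetical.

```python
from statistics import mean

# Aggregation functions named after the options in the query syntax.
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def roll_up(scores_by_parent, how="SUM", limit=10):
    """Rank parent resources by the aggregated scores of their sublevel items.

    scores_by_parent maps a parent IRI to the similarity scores of the
    sublevel entities (e.g., a director to the scores of his or her movies).
    """
    aggregate = AGGREGATORS[how]
    ranked = sorted(
        (
            (parent, aggregate(scores))
            for parent, scores in scores_by_parent.items()
            if scores  # parents without scored sublevel items are skipped
        ),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:limit]


# Hypothetical similarity scores of movies, grouped by director.
example = {
    "ex:director_1": [0.2, 0.5],
    "ex:director_2": [0.6],
}
print([parent for parent, _ in roll_up(example, how="SUM")])
print([parent for parent, _ in roll_up(example, how="MAX")])
```

The choice of aggregation function changes the ranking: SUM favors parents with many moderately similar sublevel items, whereas MAX and AVG favor parents with a few highly similar ones.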

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations, In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems, Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data, In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations, In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research, In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In: W3C Recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. AutoSPARQL: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, article 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Discussion & Future Work
  • Appendix A: The SKOSRecommender Query Syntax
          • References


Table 10. Test cases in the web experiments

Test Case   Evaluation Purpose                       Query Type   Travel   Movie   Music   Book
TC1         On-the-fly Recommendations (Baseline)    Q1           X        X       X       X
TC2         Constraint-based Recommendations         Q3           X        X       X       X
TC3         Advanced Queries (Rollup)                Q5, Q6       X        X       X       X
TC4         Advanced Queries (Cross-Domain)          Q7           -        X       X       X

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain   Cronbach's α
Travel   0.7002
Movie    0.8579
Music    0.8515
Book     0.8949
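The α values in Table 11 summarize how consistently the Likert items of the Perceived Usefulness construct were answered. As an illustration only, the following sketch computes Cronbach's α with the standard formula α = k/(k−1) · (1 − Σ item variances / variance of the summed scores). The function name and the input layout (one list of item scores per participant) are assumptions made for this example and are not taken from the study's analysis scripts.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item score lists.

    Assumption: every participant answered the same k >= 2 items.
    """
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need at least two items and equally long rows")
    # Sample variance of each item (column) across participants.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each participant's summed score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("summed scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of three participants to two Likert items.
example = [[1, 1], [2, 2], [3, 3]]
alpha = cronbach_alpha(example)
```

In this toy input both items move in lockstep, so the items are perfectly consistent; real survey data would yield values below that ceiling, such as those reported in Table 11.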

baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 ("neutral") in the multimedia domains and at 4 ("agree") in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement "The filter has improved the recommendation results of the list". Figure 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 ("strongly agree"), 4 ("agree") or 3 ("neutral"). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   M       SD      N
Accuracy (size) [10]              Travel   10.00   0.00    103
                                  Movie    10.00   0.00    50
                                  Music    10.00   0.00    53
                                  Book     9.80    1.40    51
Accuracy (mrs) [1]                Travel   56.49   0.00    103
                                  Movie    62.25   19.15   50
                                  Music    55.70   0.10    53
                                  Book     57.47   15.55   50
Novelty (nv) [1]                  Travel   0.60    0.34    101
                                  Movie    0.36    0.31    50
                                  Music    0.36    0.39    52
                                  Book     0.61    0.29    47
Diversity (divU) [5]              Travel   4       -       102
                                  Movie    3       -       50
                                  Music    3       -       53
                                  Book     3       -       50
Diversity (divC) [1]              Travel   0.80    0.16    103
                                  Movie    0.87    0.12    50
                                  Music    0.86    0.10    50
                                  Book     0.92    0.08    50
Usefulness [5]                    Travel   3.34    0.85    103
                                  Movie    3.55    0.73    50
                                  Music    3.18    0.85    53
                                  Book     3.43    0.81    50

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2). [Figure: "The filter has improved the recommendations of the list"; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement; legend: Degree of Agreement (1 to 5).]

Table 13. Response rates in % (TC2)

Domain   Constr.-based (Regular)   Constr.-based (Expanded)   N
Travel   23                        57                         103
Movie    86                        88                         50
Music    72                        75                         53
Book     73                        76                         51
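The response rate used in Table 13 is defined in the text as the ratio of non-zero result sets among all result sets. A minimal sketch of that definition follows; the function name and the sample sessions are hypothetical and only serve to make the computation concrete.

```python
def response_rate(result_sets):
    """Share of sessions whose recommendation list is non-empty."""
    if not result_sets:
        raise ValueError("at least one result set is required")
    non_empty = sum(1 for items in result_sets if len(items) > 0)
    return non_empty / len(result_sets)


# Hypothetical sessions: two of four requests returned recommendations.
sessions = [["dbr:Berlin"], [], ["dbr:Rome", "dbr:Paris"], []]
rate = response_rate(sessions)
```

Multiplying the returned ratio by 100 gives a percentage in the form shown in the table.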

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
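The text does not state which FDR procedure was applied. As a sketch under the assumption that the common Benjamini-Hochberg step-up procedure is meant, the following function converts raw p-values into FDR-adjusted p-values; the function name and the example p-values are illustrative only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
```

A comparison would then be reported as significant if its adjusted p-value falls below the chosen α-level.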

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Simple M   Simple SD   Expressive M   Expressive SD   N
Accuracy (size) [10]              Travel   1.17       2.70        4.52           4.62            103
                                  Movie    6.72       4.08        7.20           3.92            50
                                  Music    5.96       4.64        6.36           4.51            53
                                  Book     4.59       4.50        5.00           4.51            51
Accuracy (mrs) [1]                Travel   66.73      20.51       65.34          22.36           23
                                  Movie    60.95      22.98       60.29          22.79           43
                                  Music    63.77      16.05       62.96          16.44           34
                                  Book     56.17      18.18       56.81          16.81           37
Novelty (nv) [1]                  Travel   0.48       0.41        0.61           0.49            21
                                  Movie    0.44       0.36        0.55           0.80            42
                                  Music    0.55       0.39        0.52           0.37            38
                                  Book     0.77       0.34        0.76           0.33            33
Diversity (divU) [5]              Travel   3          -           3.5            -               24
                                  Movie    4          -           4              -               43
                                  Music    4          -           4              -               38
                                  Book     3          -           3              -               37
Diversity (divC) [1]              Travel   0.68       0.18        0.72           0.21            19
                                  Movie    0.70       0.24        0.73           0.23            41
                                  Music    0.86       0.13        0.87           0.12            34
                                  Book     0.84       0.24        0.95           0.63            29
Usefulness [5]                    Travel   3.33       0.93        3.50           0.77            24
                                  Movie    3.32       0.85        3.31           0.86            43
                                  Music    3.55       0.81        3.64           0.78            38
                                  Book     3.39       0.83        3.62           1.21            37

Table 15. Jaccard indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35
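The Jaccard index in Table 15 expresses how strongly two recommendation lists overlap: the number of shared items divided by the number of distinct items in both lists. The sketch below illustrates this for a single session; the function name, the treatment of two empty lists (returned as 0.0) and the sample IRIs are assumptions for illustration, not details given in the paper.

```python
def jaccard_index(list_a, list_b):
    """Overlap of two recommendation lists as |A ∩ B| / |A ∪ B|."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        # Assumption: two empty lists are treated as having no overlap.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical result lists of a simple and an expanded filter request.
simple = ["dbr:Item_A", "dbr:Item_B", "dbr:Item_C"]
expanded = ["dbr:Item_B", "dbr:Item_C", "dbr:Item_D"]
ji = jaccard_index(simple, expanded)
```

Averaging this value over all sessions of a domain gives a mean index comparable to the figures in the table.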

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Table 16. Response rates in % (TC3)

Domain   Regular   Aggr.-based   N
Travel   99        100           103
Movie    96        92            50
Music    100       100           53
Book     96        100           51

Figure 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3). [Figure: "Perceived Usefulness (Regular vs. Roll-up Queries)"; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement; legend: Method (Regular, Roll-up).]

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results in more than the other half of the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular M   Regular SD   Aggr.-based M   Aggr.-based SD   N
Accuracy (size) [10]              Travel   9.90        0.99         9.42            2.07             103
                                  Movie    2.88        0.59         2.76            0.82             50
                                  Music    10.00       0.00         10.00           0.00             53
                                  Book     2.88        0.59         3.00            0.00             51
Accuracy (mrs) [1]                Travel   48.21       24.98        44.93           25.59            102
                                  Movie    54.58       22.88        56.69           25.42            44
                                  Music    53.95       18.31        58.24           16.19            52
                                  Book     51.53       23.90        49.25           22.30            49
Novelty (nv) [1]                  Travel   0.43        0.39         0.38            0.40             98
                                  Movie    0.35        0.38         0.28            0.41             41
                                  Music    0.49        0.36         0.46            0.39             50
                                  Book     0.61        0.42         0.62            0.44             47
Diversity (divU) [5]              Travel   3.5         -            3               -                100
                                  Movie    4           -            3               -                43
                                  Music    4           -            3               -                50
                                  Book     3           -            3               -                48
Diversity (divC) [1]              Travel   0.79        0.17         0.89            0.13             99
                                  Movie    0.75        0.20         0.66            0.27             44
                                  Music    0.83        0.17         0.93            0.07             53
                                  Book     0.74        0.27         0.93            0.06             48
Usefulness [5]                    Travel   3.36        0.77         3.22            0.74             101
                                  Movie    3.20        0.92         3.26            0.94             44
                                  Music    3.45        0.84         3.44            0.85             51
                                  Book     2.95        0.87         3.03            0.96             48

Table 18. Jaccard indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates in % (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). [Figure: "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items; legend: Method (Regular, Cross-Domain).]

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular (SPARQL) M   Regular (SPARQL) SD   Cross-Domain M   Cross-Domain SD   N
Accuracy (size) [10]              Movie    0.82                 2.09                  8.66             2.73              50
                                  Music    0.89                 1.59                  9.30             2.03              53
                                  Book     2.90                 3.98                  10.00            0.00              51
Accuracy (mrs) [1]                Movie    67.41                27.62                 53.65            14.71             16
                                  Music    62.36                31.76                 48.32            22.49             20
                                  Book     66.42                21.28                 55.59            16.84             27
Novelty (nv) [1]                  Movie    0.59                 0.49                  0.41             0.34              16
                                  Music    0.72                 0.44                  0.59             0.40              19
                                  Book     0.54                 0.41                  0.63             0.30              27
Diversity (divU) [5]              Movie    3                    -                     4                -                 16
                                  Music    3                    -                     3                -                 20
                                  Book     3                    -                     4                -                 27
Diversity (divC) [1]              Movie    (0.64)               (0.35)                (0.99)           (0.02)            (6)
                                  Music    (0.90)               (0.13)                (0.98)           (0.03)            (10)
                                  Book     (0.84)               (0.12)                (0.87)           (0.1)             (8)
Usefulness [5]                    Movie    3.57                 0.83                  3.35             1.57              15
                                  Music    3.53                 1.17                  3.11             0.98              20
                                  Book     3.41                 1.04                  3.41             0.94              27

Table 21. Jaccard indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine, which facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a back-end application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause), with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
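The roll-up idea behind the Aggregation part can be illustrated outside the query language. The following sketch summarizes similarity scores of sublevel entities (for example, movies) per superordinate resource (for example, directors) with SUM, MAX or AVG. All names, the dictionary-based data layout and the sample scores are hypothetical; the sketch does not reproduce the SKOSRec engine's implementation or its query syntax.

```python
from collections import defaultdict

# Aggregation functions named after the options listed in the text.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda scores: sum(scores) / len(scores),
}


def roll_up(sub_scores, parent_of, how="SUM"):
    """Aggregate sublevel similarity scores per parent resource.

    sub_scores: maps a sublevel entity to its similarity score.
    parent_of:  maps a sublevel entity to its superordinate resource.
    Returns (parent, aggregated score) pairs, best score first.
    """
    aggregate = AGGREGATORS[how]
    grouped = defaultdict(list)
    for entity, score in sub_scores.items():
        parent = parent_of.get(entity)
        if parent is not None:
            grouped[parent].append(score)
    ranked = [(parent, aggregate(scores)) for parent, scores in grouped.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical similarity scores of three movies and their directors.
scores = {"movie_1": 0.5, "movie_2": 0.25, "movie_3": 0.625}
directors = {"movie_1": "director_A", "movie_2": "director_A", "movie_3": "director_B"}
by_sum = roll_up(scores, directors, "SUM")
by_max = roll_up(scores, directors, "MAX")
```

With SUM, a resource with many moderately similar sublevel entities can outrank one with a single highly similar entity, whereas MAX favors the latter; this is one reason the choice of aggregation function is left to the query author.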

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than", or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart), or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference, or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRec query, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
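The compiler check described for the RecWhereClause (a variable must occur in the clause, and not only in its MINUS part) can be illustrated with a minimal sketch. It is not the SKOSRec implementation: the representation of a parsed clause as lists of string triples, the function names, and the example resource `dbr:Some_Director` are assumptions made purely for illustration.

```python
def variables_in(patterns):
    """Collect SPARQL-style variables (terms starting with '?') from triple patterns."""
    return {term for triple in patterns for term in triple if term.startswith("?")}


def is_usable_variable(var, positive_patterns, minus_patterns=()):
    """Check that `var` occurs in the clause and not only inside its MINUS part.

    Both pattern arguments are hypothetical stand-ins for a parsed
    RecWhereClause: lists of (subject, predicate, object) string triples.
    """
    in_positive = var in variables_in(positive_patterns)
    in_minus = var in variables_in(minus_patterns)
    if in_minus and not in_positive:
        return False  # the variable is only excluded, never matched
    return in_positive


# Hypothetical clause: movies by some director, minus those with any subject.
where = [("?movie", "dbo:director", "dbr:Some_Director")]
minus = [("?movie", "dct:subject", "?topic")]

print(is_usable_variable("?movie", where, minus))  # True
print(is_usable_variable("?topic", where, minus))  # False
```

A variable that passes this check is bound by at least one matching pattern, which is the precondition the appendix names for the later join and similarity steps.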

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. AutoSPARQL: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, article 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Discussion & Future Work
  • Appendix A. The SKOSRecommender Query Syntax
  • References

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2). [Figure not reproduced. Statement shown: "The filter has improved the recommendations of the list"; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5).]

mains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response rates in % (TC2)

Domain   Constr.-based (Regular)   Constr.-based (Expanded)   N
Travel   23                        57                         103
Movie    86                        88                         50
Music    72                        75                         53
Book     73                        76                         51

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
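The FDR adjustment mentioned above can be sketched as follows. The text does not state which FDR procedure was used; the sketch assumes the common Benjamini-Hochberg step-up method, and the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the FDR correction in the text corresponds to this procedure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

A comparison would then be called significant when its adjusted p-value falls below the chosen α-level, which is equivalent to adjusting the α-levels themselves.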

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = -7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Tab. 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Tab. 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, we assume that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
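The Jaccard index used to compare two recommendation lists can be computed as in the following minimal sketch. The function name and the item identifiers are hypothetical, and returning `None` when both lists are empty is an assumption, since the text does not say how such sessions were treated.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two recommendation lists."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        return None  # undefined for two empty lists (assumed to be skipped)
    return len(a & b) / len(a | b)


# Hypothetical lists: the expanded filter keeps the simple results and adds two.
simple = ["item1", "item2"]
expanded = ["item1", "item2", "item3", "item4"]
print(jaccard_index(simple, expanded))  # 0.5
```

In this example the expanded list is a superset of the simple one, which is the situation the paragraph above describes as an enhanced version of the same list.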


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Simple M   Simple SD   Expressive M   Expressive SD   N
Accuracy (size) [10]             Travel   1.17       2.70        4.52           4.62            103
                                 Movie    6.72       4.08        7.20           3.92            50
                                 Music    5.96       4.64        6.36           4.51            53
                                 Book     4.59       4.50        5.00           4.51            51
Accuracy (mrs) [1]               Travel   66.73      20.51       65.34          22.36           23
                                 Movie    60.95      22.98       60.29          22.79           43
                                 Music    63.77      16.05       62.96          16.44           34
                                 Book     56.17      18.18       56.81          16.81           37
Novelty (nv) [1]                 Travel   0.48       0.41        0.61           0.49            21
                                 Movie    0.44       0.36        0.55           0.80            42
                                 Music    0.55       0.39        0.52           0.37            38
                                 Book     0.77       0.34        0.76           0.33            33
Diversity (divU) [5]             Travel   3          -           3.5            -               24
                                 Movie    4          -           4              -               43
                                 Music    4          -           4              -               38
                                 Book     3          -           3              -               37
Diversity (divC) [1]             Travel   0.68       0.18        0.72           0.21            19
                                 Movie    0.70       0.24        0.73           0.23            41
                                 Music    0.86       0.13        0.87           0.12            34
                                 Book     0.84       0.24        0.95           0.63            29
Usefulness [5]                   Travel   3.33       0.93        3.50           0.77            24
                                 Movie    3.32       0.85        3.31           0.86            43
                                 Music    3.55       0.81        3.64           0.78            38
                                 Book     3.39       0.83        3.62           1.21            37

Table 15. Jaccard indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Figure 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Table 16. Response rates in % (TC3)

Domain   Regular   Aggr.-based   N
Travel   99        100           103
Movie    96        92            50
Music    100       100           53
Book     96        100           51

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3). [Figure not reproduced. Panel title: Perceived Usefulness (Regular vs. Roll-up Queries); x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up).]

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = -5.50, p < 0.001), the music (t(52) = -4.51, p < 0.001), and the book domain (t(47) = -4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.
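The aggregation step behind roll-up queries (Appendix A names SUM, MAX, and AVG over the similarity scores of sublevel entities) can be sketched as follows. This is an illustration under stated assumptions, not the SKOSRec implementation: the function name, the dictionary-based inputs, and the `ex:` identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def roll_up(sublevel_scores, parent_of, mode="SUM"):
    """Aggregate similarity scores of sublevel entities per parent resource.

    sublevel_scores: {entity: similarity score}
    parent_of:       {entity: parent resource}, e.g. movie -> director
    Returns (parent, aggregated score) pairs, best first.
    """
    if mode not in AGGREGATORS:
        raise ValueError(f"unsupported aggregation mode: {mode}")
    grouped = defaultdict(list)
    for entity, score in sublevel_scores.items():
        parent = parent_of.get(entity)
        if parent is not None:  # entities without a parent are ignored
            grouped[parent].append(score)
    aggregate = AGGREGATORS[mode]
    ranked = [(parent, aggregate(scores)) for parent, scores in grouped.items()]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Hypothetical data: similar movies rolled up to their directors.
scores = {"ex:movieA": 0.8, "ex:movieB": 0.4, "ex:movieC": 0.5}
parents = {"ex:movieA": "ex:director1", "ex:movieB": "ex:director1",
           "ex:movieC": "ex:director2"}
print(roll_up(scores, parents, mode="MAX"))
```

The choice of aggregation function changes the ranking: SUM favours parents with many similar sublevel entities, whereas MAX and AVG favour parents with individually strong matches.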

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain, and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, we assume that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-empty result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC, and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = -2.69, p < 0.05) and the book domain (Z = -2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. But the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = -17.92, p < 0.001; music: t(52) = -25.73, p < 0.001; book: t(50) = -12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.
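The list-size comparison above can be illustrated with a paired-samples t statistic. The reported degrees of freedom (e.g., t(49) for N = 50) are consistent with such a within-subjects test, but the text does not name the exact variant, so this is an assumption; the function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for x[i] - y[i]."""
    if len(x) != len(y):
        raise ValueError("both samples need one value per participant")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("t is undefined when all differences are equal")
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Hypothetical list sizes per participant: regular SPARQL vs. cross-domain.
regular = [1, 0, 2, 1]
cross_domain = [9, 8, 10, 10]
t_value, df = paired_t(regular, cross_domain)
print(t_value, df)  # -33.0 3
```

With the regular condition as the first argument, a negative t value means that the cross-domain lists were longer, which matches the sign of the statistics reported above.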

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results in more than the other half of the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular M   Regular SD   Aggr.-based M   Aggr.-based SD   N
Accuracy (size) [10]             Travel   9.90        0.99         9.42            2.07             103
                                 Movie    2.88        0.59         2.76            0.82             50
                                 Music    10.00       0.00         10.00           0.00             53
                                 Book     2.88        0.59         3.00            0.00             51
Accuracy (mrs) [1]               Travel   48.21       24.98        44.93           25.59            102
                                 Movie    54.58       22.88        56.69           25.42            44
                                 Music    53.95       18.31        58.24           16.19            52
                                 Book     51.53       23.90        49.25           22.30            49
Novelty (nv) [1]                 Travel   0.43        0.39         0.38            0.40             98
                                 Movie    0.35        0.38         0.28            0.41             41
                                 Music    0.49        0.36         0.46            0.39             50
                                 Book     0.61        0.42         0.62            0.44             47
Diversity (divU) [5]             Travel   3.5         -            3               -                100
                                 Movie    4           -            3               -                43
                                 Music    4           -            3               -                50
                                 Book     3           -            3               -                48
Diversity (divC) [1]             Travel   0.79        0.17         0.89            0.13             99
                                 Movie    0.75        0.20         0.66            0.27             44
                                 Music    0.83        0.17         0.93            0.07             53
                                 Book     0.74        0.27         0.93            0.06             48
Usefulness [5]                   Travel   3.36        0.77         3.22            0.74             101
                                 Movie    3.20        0.92         3.26            0.94             44
                                 Music    3.45        0.84         3.44            0.85             51
                                 Book     2.95        0.87         3.03            0.96             48

Table 18. Jaccard indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates in % (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). [Figure not reproduced. Panel title: Recommendation List Size; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend: Method (Regular, Cross-Domain).]

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular (SPARQL) M   Regular (SPARQL) SD   Cross-Domain M   Cross-Domain SD   N
Accuracy (size) [10]             Movie    0.82                 2.09                  8.66             2.73              50
                                 Music    0.89                 1.59                  9.30             2.03              53
                                 Book     2.90                 3.98                  10.00            0.00              51
Accuracy (mrs) [1]               Movie    67.41                27.62                 53.65            14.71             16
                                 Music    62.36                31.76                 48.32            22.49             20
                                 Book     66.42                21.28                 55.59            16.84             27
Novelty (nv) [1]                 Movie    0.59                 0.49                  0.41             0.34              16
                                 Music    0.72                 0.44                  0.59             0.40              19
                                 Book     0.54                 0.41                  0.63             0.30              27
Diversity (divU) [5]             Movie    3                    -                     4                -                 16
                                 Music    3                    -                     3                -                 20
                                 Book     3                    -                     4                -                 27
Diversity (divC) [1]             Movie    (0.64)               (0.35)                (0.99)           (0.02)            (6)
                                 Music    (0.90)               (0.13)                (0.98)           (0.03)            (10)
                                 Book     (0.84)               (0.12)                (0.87)           (0.1)             (8)
Usefulness [5]                   Movie    3.57                 0.83                  3.35             1.57              15
                                 Music    3.53                 1.17                  3.11             0.98              20
                                 Book     3.41                 1.04                  3.41             0.94              27

Table 21. Jaccard indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While they are not superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a back-end application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an
optional selection field to en-able the formulation of cross-domain requests Evaluationsfor this query type suggest that users might receive interest-ing results (ie diversified and comprehensive recommenda-tion lists) from such a requestThe prototypical implementation and the evaluation of theSKOSRec engine in different usage scenarios have demon-strated that the freely available knowledge sources on theLOD cloud can be successfully applied to advance LOD-enabled retrieval approaches The novel SKOS-based recom-mendation methods proposed by this work namely expres-sive graph-based filters and advanced queries are promisingalternatives to existing retrieval strategies for semantic net-works
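The interface suggestions above (simple recommendations first, then constraint-based filters, then advanced patterns) could be wired together roughly as follows. This is only a sketch under stated assumptions: the `Strategy` type, the function `recommend_with_fallback`, and the stub strategies are hypothetical names that do not come from the SKOSRec engine, and "non-empty result" stands in for "the user found the results relevant".

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical signature: a strategy maps a user profile (resource IRIs)
# to a ranked list of recommended resource IRIs.
Strategy = Callable[[Sequence[str]], List[str]]


def recommend_with_fallback(
    profile: Sequence[str],
    strategies: Sequence[Tuple[str, Strategy]],
) -> Tuple[Optional[str], List[str]]:
    """Run the strategies in order and return the first non-empty result."""
    for name, strategy in strategies:
        results = strategy(profile)
        if results:
            return name, results
    return None, []


# Stub strategies standing in for real engine calls (illustrative only).
def simple_stub(profile: Sequence[str]) -> List[str]:
    return []  # pretend the regular request found nothing


def constrained_stub(profile: Sequence[str]) -> List[str]:
    return []  # pretend the filtered request found nothing either


def roll_up_stub(profile: Sequence[str]) -> List[str]:
    return ["http://example.org/resource/A"]


chain = [
    ("simple", simple_stub),
    ("constraint-based", constrained_stub),
    ("roll-up", roll_up_stub),
]
print(recommend_with_fallback(["http://example.org/resource/P"], chain))
```

In an actual interface the switch between strategies would more likely be triggered by the user than by an empty result list; the ordered chain merely mirrors the order of options described above.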

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause), with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
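The roll-up idea behind the Aggregation part can be illustrated with a small sketch. It assumes that each sublevel entity already has a similarity score and is linked to exactly one parent resource; the names `roll_up`, `scores` and `parent_of` are hypothetical and not part of the SKOSRec engine.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# The three aggregation modes named in the query syntax.
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def roll_up(
    scores: Dict[str, float],
    parent_of: Dict[str, str],
    how: str = "SUM",
) -> List[Tuple[str, float]]:
    """Aggregate sublevel similarity scores per parent and rank the parents."""
    grouped = defaultdict(list)
    for item, score in scores.items():
        parent = parent_of.get(item)
        if parent is not None:  # items without a parent are skipped
            grouped[parent].append(score)
    aggregate = AGGREGATORS[how]
    ranked = [(parent, aggregate(values)) for parent, values in grouped.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Illustrative data: three movies with similarity scores and their directors.
scores = {"movie1": 0.5, "movie2": 0.25, "movie3": 0.5}
parent_of = {"movie1": "director1", "movie2": "director1", "movie3": "director2"}
print(roll_up(scores, parent_of, "SUM"))
print(roll_up(scores, parent_of, "AVG"))
```

The choice of aggregation function changes the ranking: SUM favors parents with many similar sublevel entities, whereas AVG and MAX favor parents with a few highly similar ones.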

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain a couple of preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
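The variable check described for the RecWhereClause can be sketched as follows. The data model is a deliberately simplified assumption: `WhereClause`, `variables` and `check_variable` are hypothetical names, triples are plain string tuples, and nested groups are ignored. The actual compiler works on a parsed query and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


@dataclass
class WhereClause:
    """Simplified, hypothetical stand-in for a parsed RecWhereClause."""
    patterns: List[Triple] = field(default_factory=list)        # regular triple patterns
    minus_patterns: List[Triple] = field(default_factory=list)  # triples inside MINUS


def variables(patterns: Iterable[Triple]) -> Set[str]:
    """Collect all terms that look like SPARQL variables (leading '?')."""
    return {term for triple in patterns for term in triple if term.startswith("?")}


def check_variable(var: str, clause: WhereClause) -> None:
    """Raise ValueError unless `var` occurs outside the MINUS group."""
    if var in variables(clause.patterns):
        return
    if var in variables(clause.minus_patterns):
        raise ValueError(f"{var} occurs only inside MINUS")
    raise ValueError(f"{var} does not occur in the WHERE clause")


# A clause that binds ?movie in a regular pattern passes the check.
valid = WhereClause(patterns=[("?movie", "dbo:director", "dbr:Some_Director")])
check_variable("?movie", valid)

# A clause that mentions ?movie only inside MINUS is rejected.
minus_only = WhereClause(
    patterns=[("?other", "rdf:type", "dbo:Film")],
    minus_patterns=[("?movie", "dbo:director", "dbr:Some_Director")],
)
try:
    check_variable("?movie", minus_only)
except ValueError as error:
    print(error)
```

Rejecting variables that occur only inside MINUS matters because MINUS removes solutions but never binds a variable, so no LOD resources would be available for the subsequent steps of the workflow.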

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be differently combined according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be either expressed as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources other than from the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user was able to formulate filter conditions that referred to different RDF graphs and endpoints accordingly, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users are enabled to specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations, In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems, Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data, In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations, In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research, In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements, In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web, In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web, In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems, In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems, In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys, In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation, In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O’Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach, In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in action, Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems, In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search, In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud’hommeaux, SPARQL 1.1 query language, In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems, In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems, In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB, In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web, In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction, Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama, In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest, In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity, In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design, John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems, In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations, In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web, In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base, In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search, In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems, In: CHI’06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data, In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference, In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web, In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web, In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data, In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL, In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities, In: AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems, In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs, In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data, In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs, In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data, In: 10th International Semantic Web Conference (ISWC’11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data, In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data, In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback, In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains, In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval, Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia, In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems, In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search, In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data, In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent?, In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data, In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search, In: International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data, In: 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification, In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach, In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]  Domain  Simple M  Simple SD  Expressive M  Expressive SD  N
Accuracy (size) [10]             Travel  1.17      2.70       4.52          4.62           103
                                 Movie   6.72      4.08       7.20          3.92           50
                                 Music   5.96      4.64       6.36          4.51           53
                                 Book    4.59      4.50       5.00          4.51           51
Accuracy (mrs) [1]               Travel  66.73     20.51      65.34         22.36          23
                                 Movie   60.95     22.98      60.29         22.79          43
                                 Music   63.77     16.05      62.96         16.44          34
                                 Book    56.17     18.18      56.81         16.81          37
Novelty (nv) [1]                 Travel  0.48      0.41       0.61          0.49           21
                                 Movie   0.44      0.36       0.55          0.80           42
                                 Music   0.55      0.39       0.52          0.37           38
                                 Book    0.77      0.34       0.76          0.33           33
Diversity (divU) [5]             Travel  3         -          3.5           -              24
                                 Movie   4         -          4             -              43
                                 Music   4         -          4             -              38
                                 Book    3         -          3             -              37
Diversity (divC) [1]             Travel  0.68      0.18       0.72          0.21           19
                                 Movie   0.70      0.24       0.73          0.23           41
                                 Music   0.86      0.13       0.87          0.12           34
                                 Book    0.84      0.24       0.95          0.63           29
Usefulness [5]                   Travel  3.33      0.93       3.50          0.77           24
                                 Movie   3.32      0.85       3.31          0.86           43
                                 Music   3.55      0.81       3.64          0.78           38
                                 Book    3.39      0.83       3.62          1.21           37

Table 15. Jaccard Indices (TC2)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.57                     0.47  25
Movie   0.71                     0.38  43
Music   0.82                     0.35  37
Book    0.92                     0.26  35

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Tab. 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Table 16. Response Rates (TC3)

Domain  Regular (%)  Aggr.-based (%)  N
Travel  99           100              103
Movie   96           92               50
Music   100          100              53
Book    96           100              51

Figure 9 depicts participants’ general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.

[Figure 9: bar chart “Perceived Usefulness (Regular vs. Roll-up Queries)”; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement; methods: Regular, Roll-up]

Fig. 9. Participants’ overall satisfaction with regular and roll-up recommendation retrieval (TC3)

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. It depends on the underlying information need, the respective domain and the available data in the LOD repository whether or not a particular approach is better suited for a given retrieval task, since neither approach outperformed the other one. This conclusion is especially impressive given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains. It indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.
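For readers who want to reproduce the overlap measure, the Jaccard index of two recommendation lists is the size of their intersection divided by the size of their union. The sketch below is a minimal illustration; the function name `jaccard` and the convention of returning 0.0 for two empty lists are assumptions, since the text does not state how empty lists were handled.

```python
from typing import Iterable


def jaccard(list_a: Iterable[str], list_b: Iterable[str]) -> float:
    """Return |A ∩ B| / |A ∪ B| for two recommendation lists."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumed convention for two empty lists
    return len(set_a & set_b) / len(union)


# Illustrative lists: two shared items out of four distinct items overall.
regular = ["item1", "item2", "item3"]
roll_up = ["item2", "item3", "item4"]
print(jaccard(regular, roll_up))  # 0.5
```

A mean value below 0.1, as reported for TC3, therefore means that on average fewer than one in ten of the distinct items appeared in both lists.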

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the amount of completed user sessions is only just enough to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas in case a regular SPARQL query was executed, often they did not receive a single suggestion at all.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a low number of comparable data points accordingly. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. But the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).
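The text does not say which FDR procedure was applied to the pairwise tests. As an illustration only, the following sketch implements the Benjamini-Hochberg adjustment, which is one common choice; the function name `benjamini_hochberg` and the example p-values are hypothetical and unrelated to the reported results.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity can be enforced.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up p-values from four hypothetical pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

A comparison counts as significant at the 5% level if its adjusted p-value stays below 0.05, which controls the expected share of false discoveries among all rejected hypotheses.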

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.
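The degrees of freedom reported above (N − 1) are consistent with a paired, within-subjects comparison of list sizes. Assuming that design, the test statistic can be sketched as follows; the function name `paired_t` and the sample data are illustrative and not taken from the experiments.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(first: Sequence[float], second: Sequence[float]) -> Tuple[float, int]:
    """Return the paired t statistic and its degrees of freedom."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    # Standard error of the mean difference (sample standard deviation).
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1


# Made-up list sizes for four participants under two retrieval methods.
regular_sizes = [1, 2, 3, 4]
cross_domain_sizes = [2, 4, 5, 8]
t_value, df = paired_t(regular_sizes, cross_domain_sizes)
print(round(t_value, 2), df)
```

A negative t value, as in the reported results, indicates that the second method (here the cross-domain request) produced the larger lists.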

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results in more than the other half of the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]  Domain  Regular M  Regular SD  Aggr.-based M  Aggr.-based SD  N
Accuracy (size) [10]             Travel  9.90       0.99        9.42           2.07            103
                                 Movie   2.88       0.59        2.76           0.82            50
                                 Music   10.00      0.00        10.00          0.00            53
                                 Book    2.88       0.59        3.00           0.00            51
Accuracy (mrs) [1]               Travel  48.21      24.98       44.93          25.59           102
                                 Movie   54.58      22.88       56.69          25.42           44
                                 Music   53.95      18.31       58.24          16.19           52
                                 Book    51.53      23.90       49.25          22.30           49
Novelty (nv) [1]                 Travel  0.43       0.39        0.38           0.40            98
                                 Movie   0.35       0.38        0.28           0.41            41
                                 Music   0.49       0.36        0.46           0.39            50
                                 Book    0.61       0.42        0.62           0.44            47
Diversity (divU) [5]             Travel  3.5        -           3              -               100
                                 Movie   4          -           3              -               43
                                 Music   4          -           3              -               50
                                 Book    3          -           3              -               48
Diversity (divC) [1]             Travel  0.79       0.17        0.89           0.13            99
                                 Movie   0.75       0.20        0.66           0.27            44
                                 Music   0.83       0.17        0.93           0.07            53
                                 Book    0.74       0.27        0.93           0.06            48
Usefulness [5]                   Travel  3.36       0.77        3.22           0.74            101
                                 Movie   3.20       0.92        3.26           0.94            44
                                 Music   3.45       0.84        3.44           0.85            51
                                 Book    2.95       0.87        3.03           0.96            48

Table 18. Jaccard Indices (TC3)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.07                     0.11  101
Movie   0.03                     0.07  44
Music   0.02                     0.05  53
Book    0.02                     0.06  50

Table 19. Response rates (TC4)

Domain  Regular (SPARQL) (%)  SKOSRec (%)  N
Movie   32                    96           50
Music   62                    98           53
Book    53                    100          51

[Figure 10: bar chart “Recommendation List Size”; x-axis: Domain (Movie, Music, Book); y-axis: No. of items; methods: Regular, Cross-Domain]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]  Domain  Regular (SPARQL) M  Regular (SPARQL) SD  Cross-Domain M  Cross-Domain SD  N
Accuracy (size) [10]             Movie   0.82                2.09                 8.66            2.73             50
                                 Music   0.89                1.59                 9.30            2.03             53
                                 Book    2.90                3.98                 10.00           0.00             51
Accuracy (mrs) [1]               Movie   67.41               27.62                53.65           14.71            16
                                 Music   62.36               31.76                48.32           22.49            20
                                 Book    66.42               21.28                55.59           16.84            27
Novelty (nv) [1]                 Movie   0.59                0.49                 0.41            0.34             16
                                 Music   0.72                0.44                 0.59            0.40             19
                                 Book    0.54                0.41                 0.63            0.30             27
Diversity (divU) [5]             Movie   3                   -                    4               -                16
                                 Music   3                   -                    3               -                20
                                 Book    3                   -                    4               -                27
Diversity (divC) [1]             Movie   (0.64)              (0.35)               (0.99)          (0.02)           (6)
                                 Music   (0.90)              (0.13)               (0.98)          (0.03)           (10)
                                 Book    (0.84)              (0.12)               (0.87)          (0.1)            (8)
Usefulness [5]                   Movie   3.57                0.83                 3.35            1.57             15
                                 Music   3.53                1.17                 3.11            0.98             20
                                 Book    3.41                1.04                 3.41            0.94             27

Table 21. Jaccard Indices (TC4)

Domain  Mean Jaccard Index (JI)  SD    N
Movie   0.03                     0.07  44
Music   0.01                     0.02  21
Book    0.16                     0.27  33

5 Discussion amp Future Work

This paper has presented the query-based SKOSRec en-gine that facilitates new types of recommendation requestsfor LOD repositories The evaluations have shown that theSKOSRec system can often generate relevant and useful sug-gestions when certain query templates are utilized In manycases the application of expressive or advanced query pat-terns (ie in the context of constraint-based retrieval orcross-domain requests) helped to significantly improve recallvalues while still providing the same level of quality in theremaining performance dimensions Additionally advancedrecommendation requests can be used to generate diversifiedrecommendation lists in case users are not satisfied with theresults of regular recommendation requests (ie roll-up orcross-domain retrieval vs regular requests) While not beingsuperior in each domain and test case is at least safe to saythat the novel recommendation approaches of the SKOSRecengine considerably extend common retrieval methods andat least provide an alternative search strategy when conven-tional methods fail to provide useful results Additionally the

increase in diversity and recall is in line with findings fromprevious research on LOD-enabled RS [45]It is a strength of the developed approaches that they facil-itate combinations of graph-based and similarity-based re-trieval at different stages of the recommendation workflowThis feature extends general search capabilities for semanticnetworks It requires future research to investigate whetherin addition to SKOS further RDF vocabularies can be usedfor ad-hoc item-to-item similarity computation to increasethe range of potential usage scenarios It this context it alsoneeds to be determined whether the representation of the userprofile in the SKOSRec query language can be substitutedwith free-text expressions to enable search-like functionali-ties LOD researchers have already developed engines thatcan process natural language queries over RDF data [59][34] [66] It will have to be investigated how the SKOSRecengine profitably fits into this landscape of existing retrievaltoolsAnother open research question concerns the aspect of in-terfaces to other applications In its standard configurationthe system is not an end-user retrieval engine but a back-end application with which an administrator who has do-main knowledge (eg of RDF vocabularies and data mod-els of a particular LOD repository and usage scenario) cre-ates the respective query patterns Although the websites ofthe online experiments represent possible implementationsof suitable end-user applications for the SKOSRec enginethere are still many other possibilities and extensions to de-sign such a user interface (UI) For instance the selection of

26 L Wenige et al Similarity-based Graph Queries

LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
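The aggregation step described above can be illustrated with a minimal Python sketch. It is not the SKOSRec implementation: the function name `roll_up`, the `parent_of` mapping (a stand-in for the graph pattern that links sublevel entities to the aggregated resource) and all identifiers and scores are assumptions made for illustration only.

```python
from collections import defaultdict

# The three aggregation modes named in the syntax description.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda scores: sum(scores) / len(scores),
}


def roll_up(sublevel_scores, parent_of, mode="SUM"):
    """Aggregate similarity scores of sublevel entities per parent resource."""
    grouped = defaultdict(list)
    for entity, score in sublevel_scores.items():
        parent = parent_of.get(entity)
        if parent is not None:
            grouped[parent].append(score)
    aggregate = AGGREGATORS[mode]
    ranked = {parent: aggregate(scores) for parent, scores in grouped.items()}
    # Highest aggregated score first.
    return sorted(ranked.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical similarity scores of movies, rolled up to their directors.
scores = {"ex:movieA": 0.8, "ex:movieB": 0.4, "ex:movieC": 0.9}
parent_of = {
    "ex:movieA": "ex:director1",
    "ex:movieB": "ex:director1",
    "ex:movieC": "ex:director2",
}
for mode in ("SUM", "MAX", "AVG"):
    print(mode, roll_up(scores, parent_of, mode))
```

In this toy data the ranking of the two parent resources changes with the aggregation mode, which is why the mode is part of the request rather than fixed.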

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is also specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than" or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from any graph other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user was able to formulate filter conditions that referred to different RDF graphs and endpoints accordingly, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution, but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
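The compiler check described for the RecWhereClause (a recommendation, preference or postfilter variable must occur at least once, and not only inside a MINUS part) can be sketched as follows. This is a simplified illustration under stated assumptions, not the engine's compiler: the clause is modelled as a hypothetical list of `(kind, triples)` groups, the names `variables_in` and `variable_is_usable` are invented here, and any occurrence outside a MINUS group is treated as sufficient.

```python
def variables_in(triples):
    """Collect all terms that look like SPARQL variables (leading '?')."""
    return {term for triple in triples for term in triple if term.startswith("?")}


def variable_is_usable(where_clause, var):
    """True if var occurs in at least one group that is not a MINUS group."""
    return any(
        var in variables_in(triples)
        for kind, triples in where_clause
        if kind != "MINUS"
    )


# Hypothetical clause: ?movie is bound in a basic pattern and also excluded via MINUS.
clause_ok = [
    ("BASIC", [("?movie", "ex:director", "ex:someDirector")]),
    ("MINUS", [("?movie", "ex:genre", "ex:someGenre")]),
]
# Hypothetical clause: ?movie appears only inside the MINUS part.
clause_bad = [
    ("BASIC", [("?s", "ex:p", "?o")]),
    ("MINUS", [("?movie", "ex:genre", "ex:someGenre")]),
]
print(variable_is_usable(clause_ok, "?movie"))
print(variable_is_usable(clause_bad, "?movie"))
```

A variable that appears only in a MINUS group is never bound to a resource, so later similarity and join steps would have nothing to operate on; the check rejects such clauses early.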

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Discussion & Future Work
  • Appendix A. The SKOSRecommender Query Syntax
  • References


[Figure 9: bar chart "Perceived Usefulness (Regular vs. Roll-up Queries)"; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up)]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3).

explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Since neither approach outperformed the other, it depends on the underlying information need, the respective domain and the available data in the LOD repository whether a particular approach is better suited for a given retrieval task. This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Tab. 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.
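For readers unfamiliar with the overlap measure used here, the Jaccard index of two recommendation lists can be computed as in the following minimal Python sketch. The function name and the item identifiers are hypothetical, and returning 0.0 for two empty lists is a convention chosen for this example rather than something taken from the evaluation.

```python
def jaccard_index(list_a, list_b):
    """Return |A intersect B| / |A union B| for two recommendation lists (order ignored)."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Both lists are empty: the index is undefined; 0.0 is a convention used here.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical item identifiers, for illustration only.
regular = ["item1", "item2", "item3", "item4", "item5"]
roll_up_list = ["item5", "item6", "item7", "item8", "item9"]
print(jaccard_index(regular, roll_up_list))
```

Two lists of five items that share a single item already yield a value of 1/9, so mean scores below 0.1 correspond to lists that are almost disjoint.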

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas when a regular SPARQL query was executed, they often did not receive a single suggestion at all.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = -2.69, p < 0.05) and the book domain (Z = -2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Tab. 20).
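The text reports FDR-adjusted p-values without naming the adjustment procedure. As an illustration only, the following Python sketch shows the widely used Benjamini-Hochberg adjustment; whether this specific procedure was applied in the experiments is an assumption, and the raw p-values below are invented for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values monotone, so a comparison is only reported as significant if it survives the correction for multiple testing.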

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = -17.92, p < 0.001; music: t(52) = -25.73, p < 0.001; book: t(50) = -12.75, p < 0.001) (see Tab. 20). Figure 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Tab. 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Tab. 20, Usefulness). Hence, when SPARQL querying did not produce any results in more than the other half of


Table 17. Performance results in TC3 (M = mean, SD = standard deviation, N = no. of participants)

Dimension (Metric) [Max Score]   Domain   Regular M   Regular SD   Aggr.-based M   Aggr.-based SD   N
Accuracy (size) [10]             Travel    9.90        0.99         9.42            2.07            103
                                 Movie     2.88        0.59         2.76            0.82             50
                                 Music    10.00        0.00        10.00            0.00             53
                                 Book      2.88        0.59         3.00            0.00             51
Accuracy (mrs) [1]               Travel   48.21       24.98        44.93           25.59            102
                                 Movie    54.58       22.88        56.69           25.42             44
                                 Music    53.95       18.31        58.24           16.19             52
                                 Book     51.53       23.90        49.25           22.30             49
Novelty (nv) [1]                 Travel    0.43        0.39         0.38            0.40             98
                                 Movie     0.35        0.38         0.28            0.41             41
                                 Music     0.49        0.36         0.46            0.39             50
                                 Book      0.61        0.42         0.62            0.44             47
Diversity (divU) [5]             Travel    3.5         -            3               -               100
                                 Movie     4           -            3               -                43
                                 Music     4           -            3               -                50
                                 Book      3           -            3               -                48
Diversity (divC) [1]             Travel    0.79        0.17         0.89            0.13             99
                                 Movie     0.75        0.20         0.66            0.27             44
                                 Music     0.83        0.17         0.93            0.07             53
                                 Book      0.74        0.27         0.93            0.06             48
Usefulness [5]                   Travel    3.36        0.77         3.22            0.74            101
                                 Movie     3.20        0.92         3.26            0.94             44
                                 Music     3.45        0.84         3.44            0.85             51
                                 Book      2.95        0.87         3.03            0.96             48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07    44
Music    0.02                      0.05    53
Book     0.02                      0.06    50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                  96       50
Music    62                  98       53
Book     53                 100       51

the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories.

[Figure 10: bar chart "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend: Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4).

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.


Table 20. Performance results in TC4 (M = mean, SD = standard deviation, N = no. of participants)

Dimension (Metric) [Max Score]   Domain   Regular (SPARQL) M   Regular (SPARQL) SD   Cross-Domain M   Cross-Domain SD   N
Accuracy (size) [10]             Movie     0.82                 2.09                  8.66             2.73             50
                                 Music     0.89                 1.59                  9.30             2.03             53
                                 Book      2.90                 3.98                 10.00             0.00             51
Accuracy (mrs) [1]               Movie    67.41                27.62                 53.65            14.71             16
                                 Music    62.36                31.76                 48.32            22.49             20
                                 Book     66.42                21.28                 55.59            16.84             27
Novelty (nv) [1]                 Movie     0.59                 0.49                  0.41             0.34             16
                                 Music     0.72                 0.44                  0.59             0.40             19
                                 Book      0.54                 0.41                  0.63             0.30             27
Diversity (divU) [5]             Movie     3                    -                     4                -                16
                                 Music     3                    -                     3                -                20
                                 Book      3                    -                     4                -                27
Diversity (divC) [1]             Movie    (0.64)               (0.35)                (0.99)           (0.02)            (6)
                                 Music    (0.90)               (0.13)                (0.98)           (0.03)            (10)
                                 Book     (0.84)               (0.12)                (0.87)           (0.1)             (8)
Usefulness [5]                   Movie     3.57                 0.83                  3.35             1.57             15
                                 Music     3.53                 1.17                  3.11             0.98             20
                                 Book      3.41                 1.04                  3.41             0.94             27

Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5 Discussion amp Future Work

This paper has presented the query-based SKOSRec en-gine that facilitates new types of recommendation requestsfor LOD repositories The evaluations have shown that theSKOSRec system can often generate relevant and useful sug-gestions when certain query templates are utilized In manycases the application of expressive or advanced query pat-terns (ie in the context of constraint-based retrieval orcross-domain requests) helped to significantly improve recallvalues while still providing the same level of quality in theremaining performance dimensions Additionally advancedrecommendation requests can be used to generate diversifiedrecommendation lists in case users are not satisfied with theresults of regular recommendation requests (ie roll-up orcross-domain retrieval vs regular requests) While not beingsuperior in each domain and test case is at least safe to saythat the novel recommendation approaches of the SKOSRecengine considerably extend common retrieval methods andat least provide an alternative search strategy when conven-tional methods fail to provide useful results Additionally the

increase in diversity and recall is in line with findings fromprevious research on LOD-enabled RS [45]It is a strength of the developed approaches that they facil-itate combinations of graph-based and similarity-based re-trieval at different stages of the recommendation workflowThis feature extends general search capabilities for semanticnetworks It requires future research to investigate whetherin addition to SKOS further RDF vocabularies can be usedfor ad-hoc item-to-item similarity computation to increasethe range of potential usage scenarios It this context it alsoneeds to be determined whether the representation of the userprofile in the SKOSRec query language can be substitutedwith free-text expressions to enable search-like functionali-ties LOD researchers have already developed engines thatcan process natural language queries over RDF data [59][34] [66] It will have to be investigated how the SKOSRecengine profitably fits into this landscape of existing retrievaltoolsAnother open research question concerns the aspect of in-terfaces to other applications In its standard configurationthe system is not an end-user retrieval engine but a back-end application with which an administrator who has do-main knowledge (eg of RDF vocabularies and data mod-els of a particular LOD repository and usage scenario) cre-ates the respective query patterns Although the websites ofthe online experiments represent possible implementationsof suitable end-user applications for the SKOSRec enginethere are still many other possibilities and extensions to de-sign such a user interface (UI) For instance the selection of

26 L Wenige et al Similarity-based Graph Queries

LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple as well as the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can consist of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause), with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
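The roll-up step described for the Aggregation part can be illustrated with a minimal Python sketch. It is not the engine's implementation: the function name `aggregate_scores`, the `parent_of` mapping and the example IRIs are assumptions made for illustration only. The sketch groups similarity scores of sublevel entities (e.g., movies) by a parent resource (e.g., a director) and summarizes them with SUM, MAX or AVG.

```python
from collections import defaultdict
from statistics import mean

# The three aggregation modes named in the Aggregation part.
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def aggregate_scores(scores, parent_of, how="SUM"):
    """Roll up similarity scores of sublevel entities to their parents.

    scores:    {sublevel IRI: similarity score}
    parent_of: {sublevel IRI: parent IRI}  (hypothetical mapping)
    Returns (parent IRI, aggregated score) pairs, best first.
    """
    if how not in AGGREGATORS:
        raise ValueError(f"unknown aggregation mode: {how}")
    grouped = defaultdict(list)
    for entity, score in scores.items():
        parent = parent_of.get(entity)
        if parent is not None:  # entities without a parent are skipped
            grouped[parent].append(score)
    aggregate = AGGREGATORS[how]
    rolled_up = [(parent, aggregate(values)) for parent, values in grouped.items()]
    return sorted(rolled_up, key=lambda pair: pair[1], reverse=True)


# Hypothetical example data: three movies, two directors.
scores = {"ex:movieA": 0.5, "ex:movieB": 0.25, "ex:movieC": 0.5}
parent_of = {
    "ex:movieA": "ex:director1",
    "ex:movieB": "ex:director1",
    "ex:movieC": "ex:director2",
}

for mode in ("SUM", "MAX", "AVG"):
    print(mode, aggregate_scores(scores, parent_of, mode))
```

The choice of mode changes the ranking: SUM favors parents with many similar sublevel entities, whereas AVG favors parents whose sublevel entities are similar on average.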

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.
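The two-step flow of a cross-repository request (extract annotations at the source endpoint, then query the target endpoint) can be sketched as follows. This is a simplified illustration under several assumptions: the annotation property `dct:subject`, the helper names, the endpoint URLs, and the ranking by number of shared concepts are all hypothetical stand-ins, and the ranking in particular is not the similarity computation of the SKOSRec engine. Query execution is passed in as a callable so that the sketch runs without network access.

```python
# Assumed annotation property linking items to SKOS concepts.
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


def annotation_query(item_iri):
    """SPARQL query for the concepts that annotate one item (source endpoint)."""
    return f"SELECT DISTINCT ?concept WHERE {{ <{item_iri}> <{DCT_SUBJECT}> ?concept }}"


def candidate_query(concepts, limit):
    """SPARQL query for target items, ranked by the number of shared concepts."""
    values = " ".join(f"<{c}>" for c in concepts)
    return (
        "SELECT ?item (COUNT(DISTINCT ?concept) AS ?shared) "
        f"WHERE {{ VALUES ?concept {{ {values} }} ?item <{DCT_SUBJECT}> ?concept }} "
        f"GROUP BY ?item ORDER BY DESC(?shared) LIMIT {int(limit)}"
    )


def cross_repository_request(item_iri, source, target, limit, run):
    """Step 1: fetch annotations from `source`. Step 2: query `target`.

    `run(endpoint, query)` is a caller-supplied function returning a list of IRIs.
    """
    concepts = run(source, annotation_query(item_iri))
    if not concepts:
        return []  # nothing to match against on the target endpoint
    return run(target, candidate_query(concepts, limit))


# Stand-in for a real SPARQL client; records which endpoints were contacted.
calls = []


def fake_run(endpoint, query):
    calls.append(endpoint)
    if endpoint == "http://source.example/sparql":
        return ["http://example.org/concept/A"]
    return ["http://example.org/item/1"]


result = cross_repository_request(
    "http://example.org/item/0",
    "http://source.example/sparql",
    "http://target.example/sparql",
    5,
    fake_run,
)
print(result, calls)
```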

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than" or "equal to" the threshold value.
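How a threshold (Sim) and a comparison (Relation) could drive concept expansion is sketched below in Python. The operator symbols, the function name `expand_concepts` and the example scores are assumptions for illustration; the actual procedure of flexible similarity detection is described in [63] and is not reproduced here.

```python
import operator

# Assumed symbols for the comparisons against the threshold.
RELATIONS = {">": operator.gt, ">=": operator.ge, "=": operator.eq}


def expand_concepts(concept, similarities, threshold, relation=">="):
    """Return the concept plus all concepts whose score satisfies the relation.

    similarities: {concept: {other concept: concept-to-concept score}}
    """
    compare = RELATIONS[relation]
    related = similarities.get(concept, {})
    return {concept} | {
        other for other, score in related.items() if compare(score, threshold)
    }


# Hypothetical concept-to-concept scores.
similarities = {
    "ex:Thriller": {"ex:Crime": 0.8, "ex:Noir": 0.5, "ex:Comedy": 0.2},
}

print(expand_concepts("ex:Thriller", similarities, 0.5, ">="))
print(expand_concepts("ex:Thriller", similarities, 0.5, ">"))
```

A lower threshold widens the set of concepts that contribute to item similarity, which tends to trade precision for recall.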

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
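The variable check described for the RecWhereClause can be sketched in a few lines of Python. The data layout (a list of `(kind, triples)` groups) and the function names are hypothetical simplifications of a parsed clause, not the compiler's actual data structures.

```python
def variables(triples):
    """Collect all variables (terms starting with '?') in a list of triples."""
    return {term for triple in triples for term in triple if term.startswith("?")}


def variable_is_usable(var, groups):
    """Check that `var` occurs at least once outside of MINUS groups.

    groups: list of (kind, triples) pairs, where kind is one of
            "triples", "optional", "union" or "minus" (assumed labels).
    """
    for kind, triples in groups:
        if kind != "minus" and var in variables(triples):
            return True
    return False


# Hypothetical parsed clauses.
valid_clause = [
    ("triples", [("?rec", "dct:subject", "?concept")]),
    ("minus", [("?rec", "rdf:type", "ex:ShortFilm")]),
]
minus_only_clause = [
    ("triples", [("?other", "dct:subject", "?concept")]),
    ("minus", [("?rec", "rdf:type", "ex:ShortFilm")]),
]

print(variable_is_usable("?rec", valid_clause))       # variable bound outside MINUS
print(variable_is_usable("?rec", minus_only_clause))  # variable only inside MINUS
```

A variable that appears only inside a MINUS group is never bound to any resource, so later steps such as similarity calculation would have nothing to operate on.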

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be either expressed as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources other than from the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints accordingly, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRec query, users are enabled to specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations, In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems, Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data, In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations, In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research, In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements, In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web, In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web, In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems, In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems, In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys, In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation, In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach, In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action, Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems, In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search, In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language, In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems, In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems, In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB, In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web, In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction, Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama, In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest, In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity, In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design, John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems, In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations, In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web, In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base, In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search, In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems, In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data, In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference, In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web, In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web, In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data, In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL, In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities, In: AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems, In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs, In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, article 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data, In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs, In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data, In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data, In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data, In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback, In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains, In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval, Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia, In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems, In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search, In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data, In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent?, In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data, In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search, In: International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data, In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification, In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach, In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.



Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular             Aggr.-based         N
                                           M        SD         M        SD
Accuracy (size) [10]              Travel   9.90     0.99       9.42     2.07       103
                                  Movie    2.88     0.59       2.76     0.82       50
                                  Music    10.00    0.00       10.00    0.00       53
                                  Book     2.88     0.59       3.00     0.00       51
Accuracy (mrs) [1]                Travel   48.21%   24.98%     44.93%   25.59%     102
                                  Movie    54.58%   22.88%     56.69%   25.42%     44
                                  Music    53.95%   18.31%     58.24%   16.19%     52
                                  Book     51.53%   23.90%     49.25%   22.30%     49
Novelty (nv) [1]                  Travel   0.43     0.39       0.38     0.40       98
                                  Movie    0.35     0.38       0.28     0.41       41
                                  Music    0.49     0.36       0.46     0.39       50
                                  Book     0.61     0.42       0.62     0.44       47
Diversity (divU) [5]              Travel   3.5      -          3        -          100
                                  Movie    4        -          3        -          43
                                  Music    4        -          3        -          50
                                  Book     3        -          3        -          48
Diversity (divC) [1]              Travel   0.79     0.17       0.89     0.13       99
                                  Movie    0.75     0.20       0.66     0.27       44
                                  Music    0.83     0.17       0.93     0.07       53
                                  Book     0.74     0.27       0.93     0.06       48
Usefulness [5]                    Travel   3.36     0.77       3.22     0.74       101
                                  Movie    3.20     0.92       3.26     0.94       44
                                  Music    3.45     0.84       3.44     0.85       51
                                  Book     2.95     0.87       3.03     0.96       48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32%                96%       50
Music    62%                98%       53
Book     53%                100%      51

the test cases, the SKOSRec approach may still have provided useful suggestions. Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories.

[Figure: bar chart "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0 to 10); Method: Regular, Cross-Domain]
Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)

Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.
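For readers unfamiliar with the measure, the Jaccard index of two recommendation lists is the size of their intersection divided by the size of their union. The following Python sketch shows the computation for a single participant; the function name, the example lists and the convention of returning 0.0 for two empty lists are assumptions for illustration, not details taken from the evaluation setup.

```python
def jaccard_index(list_a, list_b):
    """Overlap of two recommendation lists: |A ∩ B| / |A ∪ B|."""
    a, b = set(list_a), set(list_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty lists
    return len(a & b) / len(union)


# Hypothetical result lists of a regular and a cross-domain request.
regular = ["ex:item1", "ex:item2", "ex:item3"]
cross_domain = ["ex:item3", "ex:item4"]

print(jaccard_index(regular, cross_domain))  # one shared item out of four distinct items
```

Values close to 0, as reported in the Jaccard tables, mean that the two retrieval strategies return almost disjoint sets of items.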


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular (SPARQL)    Cross-Domain        N
                                           M        SD         M        SD
Accuracy (size) [10]              Movie    0.82     2.09       8.66     2.73       50
                                  Music    0.89     1.59       9.30     2.03       53
                                  Book     2.90     3.98       10.00    0.00       51
Accuracy (mrs) [1]                Movie    67.41%   27.62%     53.65%   14.71%     16
                                  Music    62.36%   31.76%     48.32%   22.49%     20
                                  Book     66.42%   21.28%     55.59%   16.84%     27
Novelty (nv) [1]                  Movie    0.59     0.49       0.41     0.34       16
                                  Music    0.72     0.44       0.59     0.40       19
                                  Book     0.54     0.41       0.63     0.30       27
Diversity (divU) [5]              Movie    3        -          4        -          16
                                  Music    3        -          3        -          20
                                  Book     3        -          4        -          27
Diversity (divC) [1]              Movie    (0.64)   (0.35)     (0.99)   (0.02)     (6)
                                  Music    (0.90)   (0.13)     (0.98)   (0.03)     (10)
                                  Book     (0.84)   (0.12)     (0.87)   (0.1)      (8)
Usefulness [5]                    Movie    3.57     0.83       3.35     1.57       15
                                  Music    3.53     1.17       3.11     0.98       20
                                  Book     3.41     1.04       3.41     0.94       27

Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Discussion & Future Work

This paper has presented the query-based SKOSRec en-gine that facilitates new types of recommendation requestsfor LOD repositories The evaluations have shown that theSKOSRec system can often generate relevant and useful sug-gestions when certain query templates are utilized In manycases the application of expressive or advanced query pat-terns (ie in the context of constraint-based retrieval orcross-domain requests) helped to significantly improve recallvalues while still providing the same level of quality in theremaining performance dimensions Additionally advancedrecommendation requests can be used to generate diversifiedrecommendation lists in case users are not satisfied with theresults of regular recommendation requests (ie roll-up orcross-domain retrieval vs regular requests) While not beingsuperior in each domain and test case is at least safe to saythat the novel recommendation approaches of the SKOSRecengine considerably extend common retrieval methods andat least provide an alternative search strategy when conven-tional methods fail to provide useful results Additionally the

increase in diversity and recall is in line with findings fromprevious research on LOD-enabled RS [45]It is a strength of the developed approaches that they facil-itate combinations of graph-based and similarity-based re-trieval at different stages of the recommendation workflowThis feature extends general search capabilities for semanticnetworks It requires future research to investigate whetherin addition to SKOS further RDF vocabularies can be usedfor ad-hoc item-to-item similarity computation to increasethe range of potential usage scenarios It this context it alsoneeds to be determined whether the representation of the userprofile in the SKOSRec query language can be substitutedwith free-text expressions to enable search-like functionali-ties LOD researchers have already developed engines thatcan process natural language queries over RDF data [59][34] [66] It will have to be investigated how the SKOSRecengine profitably fits into this landscape of existing retrievaltoolsAnother open research question concerns the aspect of in-terfaces to other applications In its standard configurationthe system is not an end-user retrieval engine but a back-end application with which an administrator who has do-main knowledge (eg of RDF vocabularies and data mod-els of a particular LOD repository and usage scenario) cre-ates the respective query patterns Although the websites ofthe online experiments represent possible implementationsof suitable end-user applications for the SKOSRec enginethere are still many other possibilities and extensions to de-sign such a user interface (UI) For instance the selection of

26 L Wenige et al Similarity-based Graph Queries

LOD items for the profile was made possible by previouslystoring the data in an index that could then be accessed andsearched before issuing a recommendation query Howeverthis runs against the idea of ad-hoc retrieval In this contextfuture research will have to determine whether it is feasibleto generate an on-the-fly mapping from items to the respec-tive LOD resources In the case of a commercial applicationthe engine needs to match items with the corresponding en-tries in the product catalog Even if it is not yet clear whichUI components can make the best possible use of the SKOS-Rec enginersquos features the results from the user experimentscan give at least a few suggestionsThe following retrieval options should be the default settingin an interface that contains a LOD-enabled RS componentto assist users in finding interesting items Since the reg-ular approach of on-the-fly retrieval achieved good resultsthroughout the domains the display of simple recommen-dations generated from previously stated user preferencesrepresents a suitable starting point for retrieval On-the-flyqueries could be refined by providing filter options Sincethe evaluations have shown that expressive constraints of-ten lead to increased recall values the application of sucha filter might be the best option for assisted retrieval Asin the preparations for the experiments appropriate graph-based query patterns for the usage scenario in question wouldalso have to be determined by a domain expert before settingup the application The same applies to advanced retrievalpatterns which should be available to the end user to refinethe query when both the simple as well as the constraint-based approach have failed to provide relevant items Thissuggestion is made because roll-up queries were competi-tive with regular recommendations in the web-based experi-ments Therefore they represent a viable alternative retrievalstrategy For some domains (eg for multimedia retrieval)it is also possible to use an 
optional selection field to en-able the formulation of cross-domain requests Evaluationsfor this query type suggest that users might receive interest-ing results (ie diversified and comprehensive recommenda-tion lists) from such a requestThe prototypical implementation and the evaluation of theSKOSRec engine in different usage scenarios have demon-strated that the freely available knowledge sources on theLOD cloud can be successfully applied to advance LOD-enabled retrieval approaches The novel SKOS-based recom-mendation methods proposed by this work namely expres-sive graph-based filters and advanced queries are promisingalternatives to existing retrieval strategies for semantic net-works

Appendix A The SKOSRecommender QuerySyntax

SKOSRecQuery A SKOSRec query can be comprised ofup to six elements of which the parts SimProjection

and ItemPart are obligatory Hence users need to stateat least their preferences and how many recommen-dations they would like to receive Advanced retrievalpatterns can be formulated as well For instance sim-ilar resource retrieval can be combined with prefilter-ing (RecWhereClause) andor postfiltering (SelectPart)of LOD resources As in SPARQL the query can startwith a prologue that abbreviates IRIs with namespacedeclarations [21]

SelectPart The SelectPart part enables subquerying withrecommendation results The surrounding query isa SELECT query with a slightly simplified WHEREclause (ie RecWhereClause) with which postfilterconditions can be formulated In this section users havethe option to formulate aggregation-based retrieval re-quests (Aggregation)

Aggregation This query part handles aggregation-basedpostfiltering Users specify the IRI (IRIref ) resourcefor which similarity scores of sublevel entities are sum-marized Additionally it is stated how the data shouldbe aggregated (ie SUM MAX or AVG) as well aswhich variable (Var) represents the aggregation-basedrecommendations in the postfilter section The com-piler makes sure that this variable is actually containedin the RecWhereClause of the SelectPart

SimProjection This part of a SKOSRec query refers to thelist of generated suggestions It defines the recommen-dation variable to which all similar LOD resources aremapped In case pre- and postfilter conditions are setin the other sections the compiler checks whether thevariable (Var) appears in these sections as well Other-wise the required join operations cannot be executed Itis specified how many recommendations should be dis-played (Integer) and whether the request will be issuedagainst a default repository or as a cross-repository re-quest (ServiceIntegration)

ServiceIntegration This section indicates that a user in-tends to receive recommendations from a differentSPARQL endpoint than the default endpoint that isstated in the standard configuration of the SKOS-Rec engine It specifies the target SPARQL endpoint(IRIref ) from which a user wants to receive recom-mendations Upon extraction of SKOS annotationsfrom the default source endpoint a subsequent requestis sent to the specified target endpoint

ItemPart This section of a SKOSRec query represents a sin-gle preference in the user profile However recommen-dation queries can contain a couple of preference state-ments A preference is either expressed as a LOD re-source (IRIref ) or as a variable which is bound to apreference query (VarPart) In the latter case the re-sources are obtained by matching the graph pattern inthe VarPart with the triple statements in the reposi-tory The ItemPart statement can be additionally sup-plemented with a concept-to-concept similarity thresh-

L Wenige et al Similarity-based Graph Queries 27

old (Sim) that triggers concept expansion with the ap-proach of flexible similarity detection which is exten-sively described in one of our previous works [63]

VarPart The construct initiates preference querying Indi-vidual likings are expressed as a graph pattern in aRecWhereClause The variable (Var) is a placeholderfor the LOD resources that will be used to set up theuser profile The compiler has to make sure that thespecified variable occurs in the RecWhereClause aswell Otherwise the extraction of LOD resources can-not be carried out correctly

Sim The Sim part of a SKOSRec query denotes the thresholdof concept-to-concept scores for flexible similarity de-tection Even though knowledge-based similarity val-ues usually range between 0 and 1 the specification al-lows a broader range of digits in the Decimal datatypeto enable application of other similarity metrics in casethey are needed

Relation This section specifies how concept-to-conceptsimilarity scores should be determined Users can statewhether scores should be ldquolargerrdquo ldquolarger thanrdquo orldquoequal tordquo the threshold value

RecWhereClause This clause can occur in three differentsections of a SKOSRec query ie either in the pre-filter (ItemPart) the postfilter (SelectPart) or the pref-erence filter section (VarPart) The RecWhereClauseclosely resembles the ldquoWhereClauserdquo of the SPARQL11 specification with all its subsequent parts [21]Hence it enables different combinations of graph pat-tern matching However the RecWhereClause of theSKOSRec language allows fewer pattern matchingexpressions than the ldquoWhereClauserdquo of the regularSPARQL syntax specification [21] For instance it isneither permitted to formulate subqueries nor to directfilter requests to more than one repository (see Rec-GraphPatternNotTriples for further explanations) Thismodification has been made to simplify similarity cal-culation After parsing this section the compiler makessure that recommendation preference or postfilter vari-ables occur at least once in the RecWhereClause toguarantee that subsequent steps of the recommendationworkflow can be applied on existing LOD resourcesIn case the RecWhereClause comprises a RecMinus-GraphPattern it also has to be checked that the vari-able is not only contained in the MINUS part of thegroup

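The compiler check described for the RecWhereClause can be sketched in Python as follows. This is only an illustration under strong simplifying assumptions: a parsed clause is represented as a list of `(kind, triples)` pairs, and variables are detected with a regular expression rather than a real parser. The function names and the data layout are hypothetical and do not reflect the actual SKOSRec compiler.

```python
import re

# Matches SPARQL-style variables such as ?movie or $movie.
VAR_PATTERN = re.compile(r"[?$](\w+)")

def variables(triples):
    """Collect the variable names used in a list of triple strings."""
    return {name for triple in triples for name in VAR_PATTERN.findall(triple)}

def is_bound_outside_minus(var, groups):
    """True if var occurs in at least one group that is not a MINUS group."""
    name = var.lstrip("?$")
    return any(name in variables(triples)
               for kind, triples in groups if kind != "minus")

# Hypothetical parsed clause: one basic pattern and one MINUS pattern.
clause = [
    ("basic", ["?movie dct:subject ?category"]),
    ("minus", ["?rec dbo:director ex:SomeDirector"]),
]
print(is_bound_outside_minus("?movie", clause))  # True
print(is_bound_outside_minus("?rec", clause))    # False: only used inside MINUS
```

A query for which the second check fails would be rejected, since the variable would never be bound to an existing LOD resource.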
RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is slightly more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow the retrieval of LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If users were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRec query, users can specify additional graph patterns which might extend the query solution, but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases in which users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations, In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems, Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data, In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations, In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research, In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements, In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web, In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web, In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems, In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems, In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys, In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation, In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O’Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach, In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in action, Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems, In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search, In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud’hommeaux, SPARQL 1.1 query language, In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems, In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems, In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB, In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web, In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction, Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama, In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest, In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity, In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design, John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems, In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations, In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web, In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base, In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search, In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems, In: CHI’06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data, In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference, In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web, In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web, In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data, In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL, In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities, In: AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems, In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs, In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, article 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data, In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs, In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data, In: 10th International Semantic Web Conference (ISWC’11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data, In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data, In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback, In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains, In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval, Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia, In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems, In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search, In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data, In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent?, In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data, In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search, In: International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data, In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification, In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach, In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Discussion & Future Work
  • Appendix A. The SKOSRecommender Query Syntax
  • References


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Regular (SPARQL)      Cross-Domain          N
                                        M          SD         M          SD
Accuracy (size) [10]            Movie   0.82       2.09       8.66       2.73       50
                                Music   0.89       1.59       9.30       2.03       53
                                Book    2.90       3.98       10.00      0.00       51
Accuracy (mrs) [1]              Movie   67.41 %    27.62 %    53.65 %    14.71 %    16
                                Music   62.36 %    31.76 %    48.32 %    22.49 %    20
                                Book    66.42 %    21.28 %    55.59 %    16.84 %    27
Novelty (nv) [1]                Movie   0.59       0.49       0.41       0.34       16
                                Music   0.72       0.44       0.59       0.40       19
                                Book    0.54       0.41       0.63       0.30       27
Diversity (divU) [5]            Movie   3          -          4          -          16
                                Music   3          -          3          -          20
                                Book    3          -          4          -          27
Diversity (divC) [1]            Movie   (0.64)     (0.35)     (0.99)     (0.02)     (6)
                                Music   (0.90)     (0.13)     (0.98)     (0.03)     (10)
                                Book    (0.84)     (0.12)     (0.87)     (0.1)      (8)
Usefulness [5]                  Movie   3.57       0.83       3.35       1.57       15
                                Music   3.53       1.17       3.11       0.98       20
                                Book    3.41       1.04       3.41       0.94       27

Table 21. Jaccard Indices (TC4)

Domain  Mean Jaccard Index (JI)  SD    N
Movie   0.03                     0.07  44
Music   0.01                     0.02  21
Book    0.16                     0.27  33

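For readers unfamiliar with the measure in Table 21, the following Python sketch shows how a Jaccard index between two recommendation lists can be computed. It assumes that the index compares the result lists of two request types (for instance a regular and a cross-domain request); the function name and the example identifiers are hypothetical and not taken from the SKOSRec implementation or the experiment data.

```python
def jaccard_index(list_a, list_b):
    """Return |A ∩ B| / |A ∪ B| for two recommendation lists."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty lists have no overlap.
        return 0.0
    return len(set_a & set_b) / len(union)

# Made-up result lists of two request types for the same user profile.
regular = ["ex:item1", "ex:item2", "ex:item3"]
cross_domain = ["ex:item2", "ex:item4", "ex:item5"]

# One shared item out of five distinct items.
print(jaccard_index(regular, cross_domain))  # 0.2
```

Values close to 0, as reported in Table 21, indicate that the two compared lists share hardly any items.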
5. Discussion & Future Work

This paper has presented the query-based SKOSRec engine, which facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). Although the novel recommendation approaches of the SKOSRec engine are not superior in every domain and test case, it is safe to say that they considerably extend common retrieval methods and at least provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research should investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios. In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [66]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine, but a back-end application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine’s features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

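The effect of the three aggregation modes can be illustrated with a short Python sketch. It assumes that item-level similarity scores and an item-to-parent mapping (e.g., movies to their directors) have already been retrieved; the function `aggregate_scores`, its parameters and the example identifiers are hypothetical, and the sketch does not claim to reproduce how the SKOSRec engine evaluates the Aggregation part internally.

```python
from collections import defaultdict

# The three aggregation modes named in the Aggregation part.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda scores: sum(scores) / len(scores),
}

def aggregate_scores(scored_items, parent_of, mode="SUM", limit=10):
    """Roll up item-level similarity scores to their parent resources."""
    grouped = defaultdict(list)
    for item, score in scored_items.items():
        parent = parent_of.get(item)
        if parent is not None:  # items without a known parent are skipped
            grouped[parent].append(score)
    aggregate = AGGREGATORS[mode]
    ranked = sorted(((parent, aggregate(scores)) for parent, scores in grouped.items()),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:limit]

# Made-up similarity scores of sublevel entities and their parent resources.
scores = {"ex:movie1": 0.5, "ex:movie2": 0.25, "ex:movie3": 0.5}
parents = {"ex:movie1": "ex:directorA", "ex:movie2": "ex:directorA",
           "ex:movie3": "ex:directorB"}

print(aggregate_scores(scores, parents, mode="SUM"))
# [('ex:directorA', 0.75), ('ex:directorB', 0.5)]
print(aggregate_scores(scores, parents, mode="AVG"))
# [('ex:directorB', 0.5), ('ex:directorA', 0.375)]
```

The example shows that the choice of the aggregation mode can change the ranking: SUM favors parents with many similar sublevel entities, whereas AVG favors parents whose entities are similar on average.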
SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well; otherwise, the required join operations cannot be executed. It is also specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].



L Wenige et al Similarity-based Graph Queries 29

[44] Pu P Chen L Hu R A user-centric evaluation frameworkfor recommender systems In Proceedings of the 5th ACM Con-ference on Recommender Systems 2011

[45] S Oramas VC Ostuni T Di Noia X Serra E Di Scias-cio Sound and music recommendation with knowledge graphsACM Transactions on Intelligent Systems and Technology(TIST) 8(2) 21 (2017)

[46] E Oren R Delbru S Decker Extending faceted navigationfor RDF data In 5th International Semantic Web Conferencepp 559ndash572 2006

[47] G Pirro Explaining and suggesting relatedness in knowledgegraphs In International Semantic Web Conference pp 622ndash639 2015

[48] G Rizzo R Troncy NERD A framework for evaluatingnamed entity recognition tools in the Web of data 10th Inter-national Semantic Web Conference (ISWCrsquo11) Demo SessionBonn Germany 2011

[49] J Rosati T Di Noia T Lukasiewicz R De Leone A Mau-rino Preference queries with Ceteris Paribus semantics forLinked Data In OTM Confederated International Conferencespp 423ndash442 2015

[50] T Ruotsalo K Haav A Stoyanov S Roche E Fani R DeliaiE Maumlkelauml T Kauppinen E Hyvoumlnen SMARTMUSEUM Amobile recommender system for the Web of Data In Web Se-mantics Science Services and Agents on the World Wide Webvol 20 pp 50ndash67 2013

[51] I Ruthven M Lalmas K van Rijsbergen Incorporating usersearch behavior into relevance feedback In Journal of the As-sociation for Information Science and Technology vol 54 no6 pp 529ndash549 2003

[52] B Sarwar G Karypis J Konstan J Riedl J Item-based col-laborative filtering recommendation algorithms In Proceedingsof the 10th International Conference on World Wide Web 2001

[53] M Schmachtenberg C Bizer H Paulheim Adoption of theLinked Data best practices in different topical domains In In-ternational Semantic Web Conference (ISWC) pp 245ndash2602014

[54] H Schuumltze C Manning P Raghavan Introduction to informa-tion retrieval Cambridge University Press 2008

[55] SparqlImplementations URL httpswwww3orgwikiSparqlImplementations 2016

[56] M Stankovic W Breitfuss P Laublet Discovering rel-evant topics using DBPedia In Proceedings of the 2011IEEEWICACM International Conferences on Web Intelligenceand Intelligent Agent Technology pp 219ndash222 2011

[57] K Swearingen R Sinha Beyond algorithms An HCI perspec-tive on recommender systems In ACM SIGIR 2001 Workshopon Recommender Systems vol 13 no 5-6 2001

[58] D Tunkelang Faceted search In Synthesis Lectures on Infor-mation Concepts Retrieval and Services pp 1ndash80 2009

[59] C Unger L Buumlhmann J Lehmann A Ngonga Ngomo DGerber P Cimiano Template-based question answering overRDF data In Proceedings of the 21st ACM International Con-ference on World Wide Web 2012

[60] C Varnhagen M Gushta J Daniels T Peters N Parmar DLaw T Johnson How informed is online informed consent InEthics amp Behavior vol 15 no 1 pp 37ndash48 2005

[61] J Waitelonis H Sack H Towards exploratory video searchusing Linked Data In 11th IEEE International Symposium onMultimedia pp 540ndash545 2009

[62] L Wenige Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations

[63] L Wenige J Ruhland Retrieval by recommendation usingLOD technologies to improve digital library search In Interna-tional Journal on Digital Libraries 2017

[64] F Zarrinkalam M Kahani A multi-criteria hybrid citation rec-ommendation system based on Linked Data In 2nd Interna-tional EConference on Computer and Knowledge Engineering(ICCKE) pp 283ndash288 2012

[65] C Ziegler S McNee J Konstan G Lausen Improving rec-ommendation lists through topic diversification In Proceedingsof the 14th International Conference on World Wide Web pp22ndash32 2005

[66] L Zou R Huang H Wang J Yu W He D Zhao Natu-ral language question answering over RDF a graph data drivenapproach In Proceedings of the 2014 ACM SIGMOD Interna-tional Conference on Management of Data pp 313ndash324 2014


LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance LOD-enabled retrieval approaches. The novel SKOS-based recommendation methods proposed by this work, namely expressive graph-based filters and advanced queries, are promising alternatives to existing retrieval strategies for semantic networks.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].
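The obligatory and optional parts of a query can be pictured with a small data structure. The following Python sketch is illustrative only: the class and field names are hypothetical, the optional parts are kept as raw strings, and the check that at least one recommendation is requested is an assumption made here, not a rule taken from the SKOSRec grammar.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SimProjection:
    variable: str  # recommendation variable, e.g. "?rec" (hypothetical notation)
    limit: int     # number of recommendations the user wants to receive


@dataclass
class ItemPart:
    preference: str  # IRI of a liked LOD resource, or a variable name


@dataclass
class SKOSRecQuery:
    # Obligatory parts according to the description above.
    sim_projection: Optional[SimProjection] = None
    item_parts: List[ItemPart] = field(default_factory=list)
    # Optional parts, represented as raw strings in this sketch.
    prologue: Optional[str] = None
    prefilter: Optional[str] = None   # a RecWhereClause
    postfilter: Optional[str] = None  # a SelectPart


def validate(query: SKOSRecQuery) -> List[str]:
    """Return a list of problems; an empty list means both obligatory parts are present."""
    problems = []
    if query.sim_projection is None:
        problems.append("SimProjection is obligatory")
    elif query.sim_projection.limit < 1:
        # Assumption of this sketch: a request for zero items is rejected.
        problems.append("at least one recommendation must be requested")
    if not query.item_parts:
        problems.append("at least one ItemPart (preference) is obligatory")
    return problems
```

A query with one preference and a projection passes the check, whereas an empty query is reported as missing both obligatory parts.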

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
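The three aggregation modes can be illustrated with a short Python sketch. It assumes, purely for illustration, that each scored sublevel entity is already paired with the higher-level resource it belongs to; the function name and the input layout are hypothetical and do not reflect how the SKOSRec engine represents its results internally.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# The three aggregation modes named in the Aggregation part.
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def aggregate_scores(
    scored_children: Iterable[Tuple[str, str, float]], mode: str
) -> Dict[str, float]:
    """Summarize the similarity scores of sublevel entities per parent resource.

    scored_children holds (parent, child, score) tuples; mode is SUM, MAX or AVG.
    """
    if mode not in AGGREGATORS:
        raise ValueError(f"unsupported aggregation mode: {mode}")
    grouped = defaultdict(list)
    for parent, _child, score in scored_children:
        grouped[parent].append(score)
    return {parent: AGGREGATORS[mode](scores) for parent, scores in grouped.items()}
```

With this layout, a director could for instance be ranked by the summed, maximum or average similarity of his or her movies to the user profile.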

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [63].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be compared with the threshold. Users can state whether scores should be "larger", "larger than", or "equal to" the threshold value.
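How Sim and Relation could interact during concept expansion is sketched below in Python. The comparison symbols used as dictionary keys are placeholders chosen for this sketch, since the concrete tokens of the SKOSRec grammar are not listed in this section; the function name and the input dictionary are likewise hypothetical.

```python
import operator
from decimal import Decimal
from typing import Dict, List

# Placeholder symbols for the relation; the real SKOSRec tokens may differ.
RELATIONS = {">": operator.gt, ">=": operator.ge, "=": operator.eq}


def expand_concepts(
    similarities: Dict[str, Decimal], threshold: Decimal, relation: str
) -> List[str]:
    """Return the concepts whose similarity score satisfies the relation to the threshold."""
    compare = RELATIONS[relation]
    return sorted(
        concept for concept, score in similarities.items() if compare(score, threshold)
    )
```

Decimal is used here because the description above states that the threshold is given in the Decimal datatype, so values outside the interval from 0 to 1 are handled in the same way.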

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
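The variable check described for the RecWhereClause can be sketched as follows. This Python example is a simplification under stated assumptions: a clause is reduced to two flat lists of triple patterns (one outside and one inside the MINUS part), and the function name and return values are hypothetical rather than taken from the SKOSRec compiler.

```python
from typing import Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) with variables like "?rec"


def check_variable(
    var: str, patterns: Sequence[Triple], minus_patterns: Sequence[Triple] = ()
) -> str:
    """Classify how a variable occurs in a simplified RecWhereClause.

    Returns "ok" if the variable occurs outside the MINUS part,
    "only in MINUS" if it occurs solely inside the MINUS part,
    and "missing" if it does not occur at all.
    """
    in_positive = any(var in triple for triple in patterns)
    in_minus = any(var in triple for triple in minus_patterns)
    if in_positive:
        return "ok"
    if in_minus:
        return "only in MINUS"
    return "missing"
```

Only the first outcome would allow the later steps of the recommendation workflow to proceed; the other two correspond to the cases the compiler has to reject.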

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources other than from the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and, accordingly, endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In User Modeling and User-Adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In W3C Recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, article 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering amp Combinations
      • Evaluation of the SKOS Recommender
        • Experimental Setup
        • Results
          • Discussion amp Future Work
          • Appendix A The SKOSRecommender Query Syntax
          • References
Page 27: Semantic Web 0 (0) 1 IOS Press Similarity-based Knowledge ...semantic-web-journal.net/system/files/swj2031.pdf · ommendation and search tasks has yet to be realized (see Tab. 1)

L Wenige et al Similarity-based Graph Queries 27

old (Sim) that triggers concept expansion with the ap-proach of flexible similarity detection which is exten-sively described in one of our previous works [63]

VarPart The construct initiates preference querying Indi-vidual likings are expressed as a graph pattern in aRecWhereClause The variable (Var) is a placeholderfor the LOD resources that will be used to set up theuser profile The compiler has to make sure that thespecified variable occurs in the RecWhereClause aswell Otherwise the extraction of LOD resources can-not be carried out correctly

Sim The Sim part of a SKOSRec query denotes the thresholdof concept-to-concept scores for flexible similarity de-tection Even though knowledge-based similarity val-ues usually range between 0 and 1 the specification al-lows a broader range of digits in the Decimal datatypeto enable application of other similarity metrics in casethey are needed

Relation This section specifies how concept-to-conceptsimilarity scores should be determined Users can statewhether scores should be ldquolargerrdquo ldquolarger thanrdquo orldquoequal tordquo the threshold value

RecWhereClause This clause can occur in three differentsections of a SKOSRec query ie either in the pre-filter (ItemPart) the postfilter (SelectPart) or the pref-erence filter section (VarPart) The RecWhereClauseclosely resembles the ldquoWhereClauserdquo of the SPARQL11 specification with all its subsequent parts [21]Hence it enables different combinations of graph pat-tern matching However the RecWhereClause of theSKOSRec language allows fewer pattern matchingexpressions than the ldquoWhereClauserdquo of the regularSPARQL syntax specification [21] For instance it isneither permitted to formulate subqueries nor to directfilter requests to more than one repository (see Rec-GraphPatternNotTriples for further explanations) Thismodification has been made to simplify similarity cal-culation After parsing this section the compiler makessure that recommendation preference or postfilter vari-ables occur at least once in the RecWhereClause toguarantee that subsequent steps of the recommendationworkflow can be applied on existing LOD resourcesIn case the RecWhereClause comprises a RecMinus-GraphPattern it also has to be checked that the vari-able is not only contained in the MINUS part of thegroup

RecGroupGraphPattern A RecGroupGraphPattern cancontain one to many basic graph patterns which canbe differently combined according to RecGroupGraph-PatternSub

RecGroupGraphPatternSub This section defines the graphpatterns that are applied to identify suitable LOD re-sources User filters can be either expressed as basicgraph patterns (TriplesBlock) or as combinations of

them (RecGraphPatternNotTriples) The sequence ofdifferent kinds of patterns can be flexibly specified

RecGraphPatternNotTriples This query part closely re-sembles the similar named ldquoGraphPatternNotTriplesrdquoof the SPARQL 11 specification [21] The only dif-ference to SPARQL is that the RecGraphPatternNot-Triples section is a little more restricted in terms ofadmissible graph patterns than the ldquoGraphPatternNot-Triplesrdquo section of the regular SPARQL specificationFor instance it does not allow to retrieve LOD re-sources other than from the default graph that is speci-fied in the configuration of the SKOSRec engine sincefilter patterns have to be applied to datasets that containSKOS annotations Hence they need to be specified be-fore runtime execution to ensure that similarity calcu-lation can be carried out correctly If a user was ableto formulate filter conditions that referred to differentRDF graphs and endpoints accordingly this would notbe possible

RecGroupOrUnionGraphPattern This query section marksan alternative graph pattern in the style of the ldquoGroup-OrUnionGraphPatternrdquo of the SPARQL syntax speci-fication [21] However it neither allows SPARQL-likesubqueries nor federated queries for filtered resourcesbecause this would complicate similarity calculation

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns, which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases in which users want to omit certain triple statements from the solution.
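To make the difference between optional and excluded patterns concrete, the following toy sketch applies the standard SPARQL 1.1 notions of OPTIONAL (left join) and MINUS to lists of variable bindings. It illustrates the general semantics only, assuming that SKOSRec follows SPARQL here; the function names and the sample bindings are invented for the example, and filters inside OPTIONAL are ignored.

```python
from typing import Dict, List

Binding = Dict[str, str]


def compatible(a: Binding, b: Binding) -> bool:
    """Two solutions are compatible if they agree on all shared variables."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def optional(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Left join: extend a solution where possible, otherwise keep it as is."""
    result = []
    for l in left:
        extended = [{**l, **r} for r in right if compatible(l, r)]
        result.extend(extended or [l])
    return result


def minus(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Drop solutions that are compatible with, and share a variable with,
    some solution of the excluded pattern."""
    return [
        l for l in left
        if not any(compatible(l, r) and (l.keys() & r.keys()) for r in right)
    ]


films = [{"film": "A"}, {"film": "B"}]
labels = [{"film": "A", "label": "Alpha"}]
print(optional(films, labels))        # film A gains a label, film B is kept
print(minus(films, [{"film": "B"}]))  # only film A remains
```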

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.


[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.


[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, art. 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[63] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[64] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[65] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[66] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.
