

Surprise Me If You Can: Serendipity in Health Information

Xi Niu (1), Fakhri Abbas (1), Mary Lou Maher (1), Kazjon Grace (2)

(1) College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
{xniu2, fabbas1, M.Maher}@uncc.edu

(2) Design Lab, University of Sydney, Sydney, NSW, Australia
[email protected]

ABSTRACT

Our natural tendency to be curious is increasingly important now that we are exposed to vast amounts of information. We often cope with this overload by focusing on the familiar: information that matches our expectations. In this paper we present a framework for interactive serendipitous information discovery based on a computational model of surprise. This framework delivers information that users were not actively looking for, but which will be valuable to their unexpressed needs. We hypothesize that users will be surprised when presented with information that violates the expectations predicted by our model of them. This surprise model is balanced by a value component which ensures that the information is relevant to the user. Within this framework we have implemented two surprise models, one based on association mining and the other on topic modeling approaches. We evaluate these two models with thirty users in the context of online health news recommendation. Positive user feedback was obtained for both computational models of surprise compared to a baseline random method. This research contributes to the understanding of serendipity and of how to "engineer" serendipity that is favored by users.

Author Keywords
serendipity, computational models, surprise, value, health news, information retrieval systems

ACM Classification Keywords
H.3.3. Information Storage and Retrieval: Information Search and Retrieval.

INTRODUCTION

Serendipity has been defined as unexpected and valuable discoveries made by accident [7]. With rapid advances in search and information retrieval, query-based interaction has become the norm for interfacing with the overwhelming volumes of information that suffuse society. Today's information delivery systems have been criticized for providing "reinforcement of the same, relatively limited set of information" rather than promoting unexpected exploration and discovery [13]. Similarly, Facebook has admitted that its algorithms form "echo chambers" [6]. The notions of "filter bubbles" [40] and "blind spots" [36] are among the prime motivations behind serendipity research: bursting the bubble and reducing information blind spots through thoughtful design. However, serendipity is recognized as being very challenging to actively stimulate, because it is by definition unpredictable for the user [33].

In this paper we propose a conceptual framework for serendipity, implement it in the context of health news recommendation using two different computational models of surprise, and then evaluate it with a user study. Our contribution is to advance knowledge of how to model serendipity both conceptually and computationally. We also provide a user-centered approach to evaluating serendipity.

RELATED WORK

This research brings together two threads from the literature: how serendipity is distinct from diversity and novelty, and how serendipity can be operationalized in online information retrieval systems.

Serendipity versus Diversity and Novelty

First coined by Horace Walpole in 1754 [7], the word "serendipity" describes the process of making discoveries by accident. It received little attention until the mid-1900s, when it was used as a descriptor of accidental or unplanned discovery in the scientific context [7]. Although there is some disagreement as to the precise nature of serendipity, most accounts agree that two aspects are central: an unexpected encounter and a valuable discovery. In this paper, we operationalize these two core aspects, surprise and value, in order to predict serendipitous experiences.

Serendipity, diversity, and novelty are all "beyond-accuracy" evaluation metrics proposed for recommender systems in recent years [26]. It is worth distinguishing between them, since there is significant overlap and potential for confusion. Diversity has been studied in the IR field since 1998, when Carbonell and Goldstein investigated the relationship between diversity and retrieval accuracy [15]. Before that, quantifying diversity was studied in business research for managing investment risk, as diversification of a stock portfolio reduces risk [31]. In the past decade it has been argued that ranking retrieved items by only their retrieval accuracy (or relevance) increases the risk of producing a limited, overly similar, and unsatisfying set of results [3, 16, 46]. Diversifying the retrieval results may solve this problem by including different items that may be tangential to the query but potentially match interests missing from the query. There is a growing consensus that such diversification improves user satisfaction and engagement, even at the cost of some retrieval accuracy [42, 45, 49]. We believe diversity increases the chance of serendipity, but not every diversified result is serendipitous; equally, not all serendipitous discoveries arise from diverse results.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. CHI 2018, April 21–26, 2018, Montréal, QC, Canada. © 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-5620-6/18/04…$15.00. https://doi.org/10.1145/3173574.3173597

A related concept is novelty, a measure of how different an item is from a set of other items [20]. This set can be the whole database (objective novelty) or an estimate of the user's knowledge (subjective novelty). In the recommender system context, novelty means an item is unknown to the user [23]. By contrast, surprise (on which we base our serendipity recommender) is a measure of how strongly an item violates expectations. While both surprise and novelty can be objective or subjective, they do not measure the same quantity, as expectation violation is more specific than simple difference. Surprising items are novel, but not all novel items are surprising [20]. The proposed relationship among diversity, novelty, and serendipity is presented in Figure 1. We will validate this hypothesis later with our user study.

Figure 1. The proposed relationship between diversity, novelty, and serendipity

Serendipity in Information Retrieval

Serendipity arises when something is both surprising and valuable. The natural human information-seeking process is full of such chance encounters. Miksa [35] described the acquisition of intellectual knowledge as a relatively unfocused sense of inquiry, and said that information retrieval is "better conceived as an exploratory and game-like mechanism rather than a precise response mechanism". Gup [22] expressed concern in 1997 about the "end of serendipity" in the digital world, recalling with fondness his childhood experiences of coming across interesting tidbits of information while flipping pages of an encyclopedia. Gup's concerns are echoed in more recent studies [34, 41, 43]. The sense that the online environment is increasingly deterministic and predictable promotes a widespread feeling that serendipity is threatened.

However, researchers are experimenting with new tools that support serendipity technologically [33]. Earlier attempts include the LyricTime system [28], a music recommender that accommodated serendipitous access by occasionally adding randomly picked songs to the user's playlist. Campos and Figueiredo [14] developed a software agent called Max that used knowledge of concept associations to find information that imperfectly matches a user's profile. The idea was inspired by De Bono's "lateral thinking" [17], which encourages accepting the accidental aspects of thinking over adopting a sequential process. André, Teevan, and Dumais [5] identified a set of partially relevant, but highly interesting, results returned by a search engine and concluded that this set has the highest potential for being serendipitous. Iaquinta et al. [24] described a content-based recommender with items represented by text descriptions; a machine learning method predicted user ratings of unseen items, and items whose predicted rating was uncertain between positive and negative were considered potentially serendipitous. Onuma, Tong, and Faloutsos [39] introduced a metric called the "bridging score" for item nodes in a graph: nodes connecting disparate subgraphs receive high bridging scores, and therefore higher serendipity. Zhang et al. [48] used topic modeling to represent artists as distributions over latent user clusters; specifically, they used the Latent Dirichlet Allocation (LDA) model to represent each artist, allowing similarity calculations and clustering among artists. Their recommender generates potentially serendipitous recommendations by promoting artists the user has not explicitly preferred.

There is a recent trend in the visual analytics research community to support serendipitous discovery through interactive visualization techniques; examples include the Bohemian Bookshelf [4] and Serendip [44]. These studies, though their approaches and implementations vary, all imply that carefully designed retrieval systems may have a role in increasing serendipitous discoveries. Although they generated positive user feedback, these systems tended to rely on diversity, imperfect matches, and randomness to approximate serendipity. Without a direct focus on the core elements of serendipity, the results tend to be highly variable, or "hit or miss" [30]. By contrast, we directly predict surprise, with the aim of inducing it in a systematic and controllable way.

Surprise, the response to unexpected stimuli, has been suggested as a critical element in triggering and evaluating serendipity [2]. The common approach to generating surprise in the recommender systems research community is to compare recommendations against a primitive baseline system. Murakami, Mori, and Orihara [37] assume that primitive methods generate expected recommendations and that anything deviating from them is unexpected. Following Murakami's idea, Adamopoulos and Tuzhilin [1] derived a set of unexpected recommendations by subtracting the items recommended by a primitive predictor from the items generated by a serendipity recommender. The usefulness of recommendations was left to user judgement. A limitation of this comparative approach to serendipity measurement is its sensitivity to the choice of baseline system. Our approaches to modelling surprise are independent of any baseline, as they are based on models of expectation. We compare two such models: one based on topic co-occurrence and one on the distribution of themes. An article that contains topics with a lower likelihood of co-occurring, or that has a larger divergence from a typical theme distribution, is rated as more surprising.

CONCEPTUAL FRAMEWORK FOR SERENDIPITY

The proposed conceptual framework for serendipity consists of two components, Surprise and Value, as shown in Figure 2. The Surprise component constructs an expectation model to capture what the user would expect to see. In the field of artificial intelligence, models of surprise are based on the observation of unexpected events. For example, Grace and Maher [19, 21] define surprise as the violation of a confidently held expectation and present a probabilistic model of surprise based on the co-occurrence of features. Building on this work, we propose that the expectation for an information object is based on the expected likelihood of a user seeing such an object; a violation of that expectation would be a surprise. This differs in its details from the Bayesian surprise proposed by Itti and Baldi [25], where surprise is defined as the difference between prior and posterior expectation using Bayes' theorem.

Figure 2. Framework for serendipity

In information science, the value of an information object is based on the subjective, affective, or emotional aspects of a user's assessment of it [11, 18]. Recognition of the affective aspect of value for information stems from the observation that value is invariably determined by subjective, user-specific functions. This makes value fundamentally distinct from whatever objective "stuff" a retrieved artifact holds. Specifically, in this study we consider a value construct of Willingness-to-Pay (WTP) and Experienced Utility (EU), developed in [29], as a measure of the subjective value of information. WTP is suggested to reflect the instrumental-rational value that an information object has in problem-solving tasks, while EU reflects the aesthetic-emotional value that an information object has in its own right, as the user engages with the object. Thus, WTP and EU represent two parts of a value construct that includes both rational and emotional aspects. Our Value component reflects both.

STUMBLEON: IMPLEMENTING OUR FRAMEWORK

We have implemented the framework in Figure 2 in the domain of online health information, resulting in a recommender prototype called StumbleOn, an extension of our previous work [38]. The architecture of StumbleOn is shown in Figure 3. Health topics usually have strong but hidden associations that provide a rich information ground for inducing serendipity. Our framework is domain independent; this implementation is one example.

Figure 3. Architecture of StumbleOn

StumbleOn implements the Surprise component as the Computational Surprise Module, and the Value component as the Personalized Value Module. The two modules work together, using the output of the Content Analyzer, and both keep the user in the loop via feedback. The Content Analyzer uses a series of text mining and machine learning techniques to re-represent the information in text documents as parsable structures. These structures are used to construct user expectations, as detailed below. Our extension of the previous work [38] is the specification of two computational approaches for the Computational Surprise Module, as well as the addition of the Content Analyzer component for constructing document representations.

Implementation of the Computational Surprise Module and Personalized Value Module

The Computational Surprise Module constructs an expectation model, a violation of which is deemed a surprise. Expectation is strongly domain dependent. Since this work is about information objects (news), we model expectation as an extension of the established view of text-based information as a "bag of words". We propose that expectation about a news article is based on "a bag of co-occurring constituents". Depending on the granularity, the constituents could be coarse-grained topics or fine-grained themes. In our first approach, we view each news article as "a bag of co-occurring topics", where topics are the labels assigned to an article by experts. The expectation of seeing an article in the corpus is modeled as the expected likelihood of a particular bag of co-occurring topics (an article) in the corpus, represented as the product of the individual likelihoods of each topic contained in the article, i.e. p(t1)*p(t2)*...*p(tn). The actual or observed likelihood for an article given a user-selected topic tu, however, is the joint likelihood of the topic combination conditioned on the selected topic tu, i.e. p(t1, t2, ..., tn | tu). The difference between the observed and the expected likelihood reflects the amount of surprise, represented as the negative of the log ratio to discount very large differences, as in Equation 1:

s_1 = -\log_2 \frac{p(t_1, t_2, \ldots, t_n \mid t_u)}{p(t_1)\,p(t_2)\cdots p(t_n)} = -\log_2 \frac{p(t_1, t_2, \ldots, t_n, t_u)}{p(t_1)\,p(t_2)\cdots p(t_n)\,p(t_u)}    (1)

In this model, the amount of surprise is represented by the divergence of the actual joint distribution (the numerator of the log ratio) from the expected independent distribution (the denominator). The larger the ratio, the more likely that combination of topics is in the corpus; a smaller ratio indicates a rare topic combination, and therefore a higher surprise score. Our surprise calculation is a variation of Mutual Information (MI), an established metric in the text mining field [12]. MI measures how much information several random variables share, or how much the uncertainty (entropy) of one variable is reduced by knowing the others. Here the random variables are the topics t1, t2, ..., tn. We label this approach MI.
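As an illustration of Equation 1 (a minimal sketch, not the authors' released code), s1 can be estimated from topic-label co-occurrence counts; the toy corpus and topic names below are hypothetical:

```python
import math
from collections import Counter

def surprise_s1(article_topics, user_topic, corpus):
    """Equation 1 sketch: s1 is the negative log2 ratio of the joint
    likelihood of the article's topics (together with the user-selected
    topic) to the product of their marginal likelihoods."""
    n = len(corpus)
    marginal = Counter()
    for labels in corpus:
        marginal.update(set(labels))
    topics = set(article_topics) | {user_topic}
    # Joint likelihood: fraction of articles labeled with all these topics.
    joint = sum(1 for labels in corpus if topics <= set(labels)) / n
    if joint == 0:
        return float("inf")  # the combination never occurs: maximally surprising
    expected = math.prod(marginal[t] / n for t in topics)
    return -math.log2(joint / expected)

# Hypothetical toy corpus: each article is a tuple of expert-assigned topic labels.
corpus = [("diabetes", "obesity")] * 5 + [("diabetes", "hiv")] + \
         [("obesity", "hiv")] * 2 + [("nutrition",)] * 2

# A rare pairing (diabetes + hiv) scores as more surprising than a common one.
rare = surprise_s1(("hiv",), "diabetes", corpus)
common = surprise_s1(("obesity",), "diabetes", corpus)
```

Here the rare combination occurs less often than independence would predict, so the log ratio is negative and s1 is positive; the common combination yields a negative s1, matching the signs in Table 2.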

In our second approach to modeling surprise (s2), we view each article as "a bag of co-occurring themes". Probabilistic topic modeling [9] is a set of algorithms that discover the themes running through a collection of documents and how those themes are connected. LDA (Latent Dirichlet Allocation) is a popular example of such a model [10]. According to LDA, each article is generated by probabilistically choosing latent themes z_i and then, for each latent theme, probabilistically choosing words w_i, represented as:

d = \prod_{i=1}^{k} p(z_i)\, p(w_i \mid z_i)    (2)

where p(z_i) is the probability of latent theme z_i in the article and p(w_i \mid z_i) is the probability of word w_i under theme z_i. The resulting quantity is the likelihood of observing such an article with k latent themes. We apply this model such that the expected likelihood of observing a typical article, given a user-preferred topic, is the likelihood of observing an "average" article, obtained by averaging the likelihoods of all articles in the preferred topic's sub-corpus. Each individual article's divergence from this expectation is that article's degree of surprise. We use the Kullback-Leibler (KL) divergence as the divergence measure, since it is a common way to evaluate the divergence between two probability distributions. The surprise score is calculated as in Equation 3, where p is the distribution of latent themes in an article, q is the theme distribution of a typical article, and i indexes the latent themes. We label this approach KL.

s_2 = \mathrm{KL}(p, q) = \sum_{i=1}^{k} p_i \log_2 \frac{p_i}{q_i}    (3)
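Equation 3 is straightforward to compute once each article's theme distribution is available. A minimal sketch (the epsilon smoothing guards against zero-probability themes, an implementation detail we add and that the paper does not specify; the distributions are hypothetical):

```python
import math

def surprise_s2(p, q, eps=1e-12):
    """Equation 3 sketch: base-2 KL divergence of an article's latent-theme
    distribution p from the centroid ("typical article") distribution q."""
    return sum(pi * math.log2((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

centroid = [0.30, 0.25, 0.20, 0.15, 0.10]  # hypothetical typical-article distribution
typical  = [0.28, 0.27, 0.19, 0.16, 0.10]  # close to the centroid: low surprise
unusual  = [0.02, 0.05, 0.08, 0.05, 0.80]  # far from the centroid: high surprise
```

An article whose theme distribution matches the centroid scores near zero, while one dominated by an atypical theme scores much higher, mirroring the ordering in Table 4.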

For the Personalized Value Module, we use an interactive process to gather data about the user. Previous research has been ambiguous on the meaning of value: some researchers use "interest" [5], while others use "usefulness" [1]. With the WTP-EU construct in mind, we asked participants two questions: usefulness, representing the WTP (rational) aspect, and interestingness, representing the EU (emotional) aspect. Specifically, participants rated each article on two 5-point Likert scales to indicate how useful and how interesting they think it is. The sum of the two ratings serves as the value rating.
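The value rating described above amounts to a simple sum of the two Likert responses; a sketch (the function name and the bounds check are ours):

```python
def value_rating(usefulness, interestingness):
    """Personalized Value Module sketch: combine the two 5-point Likert
    ratings, usefulness (WTP, rational) and interestingness (EU, emotional)."""
    for rating in (usefulness, interestingness):
        if not 1 <= rating <= 5:
            raise ValueError("Likert ratings must be between 1 and 5")
    return usefulness + interestingness  # value rating ranges from 2 to 10
```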

Health News Corpus

We scraped health news articles from Medical News Today (MNT), from its launch in 2003 to the present. MNT is a leading website providing quality, up-to-date health news for general readers in the U.S. The corpus contains 268,850 articles, classified by health professionals working with MNT into 135 health topics, such as diabetes, heart disease, anxiety, and women's/men's health. Most articles have multiple topic labels, as summarized in Table 1 below. After reviewing the 135 topics, we removed topics that are too broad (e.g., primary care, surgery, medical innovation), too narrow (e.g., statin), or too research-focused (e.g., stem cell research). The final list contained 100 topics.

Total no. of articles: 268,850
Total no. of topic labels: 135
Articles with 1 topic label: 86,364
Articles with 2 topic labels: 83,060
Articles with 3 topic labels: 60,912
Articles with 4 topic labels: 38,514

Table 1. Summary of the health news corpus

For the MI approach, these topic labels were used as the topics t_i in Equation 1. The value of s1 for each article was calculated using Python's math and scikit-learn (sklearn) packages. For the KL approach to calculating s2, the LDA topic modeling technique was applied to each of the 100 sub-corpora (one sub-corpus per topic) to generate the latent themes. Gensim, a high-efficiency Python topic modeling library, was used for the LDA analysis and for calculating s2 for each article.

To demonstrate the computational results of the surprise scores, take the topic diabetes as an example. Figure 4 presents the cumulative distributions of the Z-scores of s1 and s2, respectively, for all 20,458 articles involving the topic diabetes. For the s1 distribution in Figure 4a, a Z-score on the right side (above the mean of 0) indicates a potentially surprising topic combination with diabetes, one that co-occurs less often than expected by chance. One such example is the combination of bipolar disorder, depression, and diabetes. For the KL-divergence distribution in Figure 4b, a Z-score on the right side indicates a larger divergence from an expected article. This suggests a potentially surprising latent theme distribution, such as an article on diabetes that also talks about the kidney, the brain, and sleeping disorders. Both curves roughly follow a normal distribution, suggesting most articles are centered around the average level of surprise. Although surprise is a desirable feature in any news, this health news corpus is grounded heavily in conventional knowledge (the grey area) and only occasionally reaches the highly surprising zone (greater than the 90th percentile).
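The Z-scores plotted in Figure 4 standardize the raw s1 and s2 values so the two models can be read on a common scale (mean 0, standard deviation 1); a minimal sketch:

```python
import math

def z_scores(values):
    """Standardize a list of raw surprise scores to mean 0 and standard
    deviation 1, as plotted in Figure 4."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]
```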

Table 2 lists examples of the most and least surprising articles based on s1. As the topic label column shows, the most surprising articles (highlighted in pink) have rare topic combinations, like diabetes together with infectious disease and public health, whereas the least surprising articles (highlighted in grey) have more common combinations like diabetes, nutrition, and obesity.

Figure 4. MI (a) and KL-Divergence (b) distributions for all the articles involving the topic diabetes

Example Article Title | Topic Labels | s1 | Z-score
Diabetes: Mining The 'Wisdom of Crowds' To Attack Disease | Diabetes, Infectious Diseases, Public Health | 5.40 | 2.38
Increased Diabetes Risk In HIV-Positive Children | Diabetes, HIV, Pediatrics | 5.02 | 2.26
Dogs Sniff Out Diabetes | Diabetes, Psychology, Public Health | 4.90 | 2.22
Exploring Diabetes' Link to Eating Disorders | Diabetes, Nutrition, Obesity, Eating Disorder | -12.54 | -3.55
Stored fat fights against the body's attempts to lose weight | Diabetes, Obesity, Eating Disorder, Heart Disease | -12.85 | -3.65
New research exposes the health risks of fructose and sugary drinks | Diabetes, Obesity, Heart Disease, Gout | -14.68 | -4.26

*The pink area indicates surprising articles, whereas the grey area indicates non-surprising ones.

Table 2. Article examples based on s1

To give a sense of the latent themes generated by the LDA analysis, Table 3 lists the five highest-weighted themes generated from the diabetes sub-corpus (the collection of articles with the topic diabetes). Since LDA treats each latent theme as "a bag of words", the five words with the highest probability in each theme are listed to represent that theme's semantic meaning, which could be, from left to right: patient, diabetes study, insulin & glucose, treatment, and research experiment.

Theme 1  | Theme 2    | Theme 3 | Theme 4   | Theme 5
diabetes | diabetes   | insulin | patient   | cell
patient  | type       | mouse   | metformin | beta
people   | study      | fat     | treatment | university
risk     | cell       | glucose | control   | pancreas
blood    | researcher | protein | sugar     | new

Table 3. Five most highly weighted themes

The corpus centroid distribution over these five themes,

which represents a typical diabetes article, is presented as

the red line in Figure 5. The other five lines in Figure 5 are

the five example articles as detailed in Table 4. In Figure 5,

the articles of “Oklahoma student”, “carb quality”, and

“reduce cavities” are highly divergent from the centroid,

suggesting high levels of surprise; whereas “aging”, “post-

term delivery”, and “Schizophrenia” are closer to the

centroid, implying lower levels of surprise.

Figure 5. Centroid and example article distributions over the top five latent themes
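The divergence from the centroid pictured in Figure 5 is the basis of the KL-based surprise score s2. A minimal sketch of the divergence calculation; the theme distributions below are illustrative, not the paper's fitted values:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete theme distributions.

    A larger divergence from the corpus centroid marks a more
    atypical article, which the KL approach scores as more surprising.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

centroid = [0.30, 0.25, 0.20, 0.15, 0.10]   # a "typical" diabetes article
atypical = [0.05, 0.05, 0.05, 0.05, 0.80]   # mass on one peripheral theme
typical  = [0.28, 0.27, 0.18, 0.16, 0.11]   # close to the centroid
```

Articles like "Oklahoma student" would behave like `atypical` here, while "Schizophrenia" would behave like `typical`.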

Article shorthand | Example Article Titles | s2 | Z-score
Oklahoma student | Diabetes Management in Schools Act Will Help Ensure Oklahoma Students with Diabetes Are Safe at School | 2.57 | 6.41
Carb quality | Carb quality matters - reduces diabetes risk | 2.17 | 5.06
Reduce Cavities | What reduces cavities, food or fluoride? | 2.09 | 4.76
Aging | Diabetes, arthritis, cancer and Alzheimer's disease: strategy proposed for preventing diseases of aging | 0.20 | -1.75
Post-term delivery | Post-term delivery raises risk of complications and illness for newborns | 0.19 | -1.77
Schizophrenia | Dissecting the Relationship Between Schizophrenia and An Increased Risk of Type 2 Diabetes | 0.18 | -1.83

*the pink area indicates surprising articles whereas the grey area indicates non-surprising ones

Table 4. Article examples based on s2

How StumbleOn Works

StumbleOn recommends health news to users on a session

basis. Each user is initially asked to choose five to ten

topics from the list of 100 topics to indicate their

preferences. Then a recommender session contains five to

ten articles with one article corresponding to one preferred


CHI 2018 Paper CHI 2018, April 21–26, 2018, Montréal, QC, Canada

Paper 23 Page 5


topic. Since the study does not evaluate the impact of the specific placement of articles in a list, the rank of the articles is randomized for each session. Within each

session, users are able to view and click on article titles,

open a different window to browse the article content, and

provide ratings at the side of the window. They are also

encouraged to save their favorite articles to a shopping cart

for later reading. A screenshot of a session is presented in

Figure 6.

Figure 6. Screenshots of StumbleOn

EVALUATION STUDY

After the implementation of StumbleOn, a user study was

conducted to evaluate which computational approach

produces results that best match what users perceive as

simultaneously surprising and valuable.

We adopted a two-way factorial design to present the

different approaches to participants. One dimension in the

design is the computational approach: MI or KL, for which

the order of presentation was counterbalanced. The other

dimension is the computational score level: high and low. High means articles whose surprise score (s1 or s2) falls in the top 33rd percentile within that preferred topic sub-corpus, while low means the bottom 33rd percentile. In order to

better explain how StumbleOn recommends session-based

articles, let us assume one participant chooses six preferred

topics: anxiety, diabetes, depression, sleep disorder, breast

cancer, and hypertension. Each recommended session with

six articles is shown in Figure 7 below.

In addition to those four groups in Figure 7, to investigate if

the computational approaches beat blind luck via

randomness, a baseline group was created to randomly pick

articles from a preferred topic sub-corpus to recommend to

participants. The study is a within-subject design, meaning

each participant experienced all five groups.

Figure 7. Examples of recommended sessions
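The high/low manipulation amounts to a percentile split over each topic sub-corpus's surprise scores. A sketch of that split, not the authors' code; the scores reuse the s1 values from Table 2 plus one filler value:

```python
def percentile_split(scores, pct=33):
    """Return (low, high): indices of articles whose surprise score
    falls in the bottom and top `pct` percent of the sub-corpus."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(1, len(scores) * pct // 100)
    return ranked[:k], ranked[-k:]

scores = [5.02, 4.90, -12.54, -12.85, -14.68, 0.30]
low_idx, high_idx = percentile_split(scores)
```

StumbleOn would then sample the high group for a High session and the low group for a Low session.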

Thirty graduate students in the College of Computing and

Informatics were recruited, all of whom were enrolled in

computing programs and none of whom had completed any

formal medical education. After the introduction and after consent was obtained, an entry questionnaire was

administered to collect demographic information and

preferred health topics they would like to read about. Each

participant then experienced ten recommended sessions

with articles corresponding to his/her preferred topics, each

two consecutive sessions representing one of the five

manipulation groups. Participants were encouraged to click

on whatever articles they would like to read. For each

clicked article, they were required to provide their ratings

on a 5-point Likert-scale on four questions: whether the

article content is novel, surprising, useful, and interesting.

The first question represents the concept of novelty, and

was asked to explore the relationship between novelty and

serendipity. The second question represents the surprise

component while the third and the fourth questions

represent the two aspects of the value component. In

addition to these four Likert-scale questions, participants

also needed to answer yes or no on the fifth question:

whether the article sparks a new area of interest. This last

question is inspired by the idea that serendipity is the result

of a divergent process [14]. This model suggests that it

produces a divergent path toward an unexpected new area

that the user was not previously aware of.

EVALUATION RESULTS

On average, the thirty participants had 11.4 years of experience seeking information online and 5.6 years of seeking online health information. Twenty-seven of them listed one of the major search engines (Google, Yahoo, etc.) as their primary source for health information. Very few of them mentioned specialist medical websites, such as WebMD and MedlinePlus. Out of the 100 health topics, the participants chose 62 as their preferred topics, a reasonable coverage of what was available. Among the chosen topics,

nutrition, mental health, sleep disorder, depression,

diabetes, and back pain were the most frequently chosen.


User Clicks and Number of Saved Articles

User clicks and saves have been regarded as implicit indicators of potential interest in search engine studies [27]. They can also serve as indicators of the user’s

potential interest in the recommended articles. We tracked

the number of clicked and saved articles for the five groups

in each session. The result in Figure 8 shows that on

average participants clicked 1.4 articles and saved 0.7 per

session, suggesting they were very selective in what they

clicked or saved. For the number of clicks, a one-way

repeated measure ANOVA test shows there was no

significant difference among the different approach groups

(F (4, 116) = 2.016, p = 0.097), suggesting these groups

seemed equally appealing in attracting user clicks.

However, for the number of saved articles, the difference

was significant (F (4, 116) = 3.129, p = 0.015). Tukey’s

post-hoc test shows that the KL-Low group had a significantly higher number of saves than all the other groups and the KL-High group a significantly lower number than all the other groups. On the surface, this

suggests that a higher surprise score (at least as measured

by the KL approach) would attract fewer clicks and saves,

but the picture is more interesting when value is included.

Figure 8. The average and standard deviation of the number

of clicks and saves

Surprise Ratings for each Approach

Participants’ ratings of surprise are summarized in Figure 9.

The MI-High session received the highest average rating whereas the MI-Low session received the lowest. A one-way repeated measures ANOVA shows a marginally significant difference among these groups (F(4, 463) = 2.208, p = 0.067). Tukey’s post-hoc method identifies two

tiers with significant difference. Tier 1 with significantly

higher surprise ratings is the Baseline and MI-High groups

and Tier 2 with significantly lower surprise ratings is the

MI-Low, KL-Low, and KL-High sessions. It is interesting

to note that for the KL approach, the higher level of

computational surprise in fact resulted in lower level of

user-perceived surprise. For the Baseline session (randomly

selected articles), the user-perceived surprise was

unexpectedly high, very close to the MI-High group.

Figure 9. The average and standard deviation of the surprise

ratings

Serendipity Rating for each Approach

Serendipity is proposed to arise from the discovery of

information that is surprising and valuable. In this study,

the concept of value is further decomposed into being

useful and being interesting. We evaluate the serendipity

rating of an article as the aggregate of the three ratings of

surprise, usefulness, and interestingness, as in Equation 4

with the assumption that surprise and value carry the same

weight for serendipity; usefulness and interestingness have

the same weight for value:

serendipity = surprise + (1/2) usefulness + (1/2) interestingness    (4)
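Equation 4 can be transcribed directly as code:

```python
def serendipity(surprise, usefulness, interestingness):
    """Equation 4: value = (usefulness + interestingness) / 2, and
    surprise and value carry equal weight in serendipity."""
    return surprise + 0.5 * usefulness + 0.5 * interestingness

# Example: an article rated 4 on surprise, 3 on usefulness, 5 on interest.
score = serendipity(4, 3, 5)
```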

Figure 10. The average and standard deviation of the

serendipity ratings

Using this formula, we calculate the average serendipity

ratings for each approach. Shown in Figure 10, the average

serendipity rating is the highest for the MI-High approach

and the lowest for the Baseline. A one-way repeated

measure ANOVA test shows significant difference among

the five groups (F(4,463) = 3.405, p = 0.009). Tukey’s post-

hoc method reveals three tiers with significant difference.

The high tier is the MI-High and KL-Low groups, the

middle tier is the MI-Low and KL-High groups, and the

Base group is in the low tier. Similar to the surprise ratings in Figure 9, the serendipity rating was unexpectedly higher

for the KL-Low group than the KL-High group, suggesting

that the computational model did not align well with what

the participants perceived. In a departure from the surprise

ratings, user-perceived serendipity was the lowest for the

Baseline group, suggesting the surprise identified via

randomness was not valued as much as that via

computational approaches. Random surprise did not lead to


serendipitous discovery at the same rate as the approaches

based on computational surprise measures.

Previous literature reports that most users value the

usefulness of health information more than the enjoyment

[47]. However, participants’ comments on their surprise

touched on both aspects of interest, with comments like

“enjoy learning a nice variety of stuff” being more

aesthetic-emotional, whereas “relevant to my father’s

condition” was more instrumental-rational. We find a

strong correlation between the ratings of usefulness and

interestingness (r = 0.73, p < 0.0001, as shown in Table 5),

suggesting that when an article is useful, it is very likely to

be interesting as well. This suggests some users did not

express a clear-cut distinction between the two concepts.

When investigating their correlations with surprise,

however, we find a stronger correlation between being

interesting and being surprising (r = 0.63, p < 0.0001) than

that between being useful and being surprising (r = 0.54, p

< 0.0001). This means that while surprising health

information was quite likely to be both interesting and

useful, it was significantly more likely to be interesting than

it was to be useful.

             surprise  interest  usefulness  novelty
surprise
interest      0.63*
usefulness    0.54*     0.73*
novelty       0.72*     0.59*     0.48
spark         0.49      0.59*     0.52*      0.45

* indicates significance at the 0.05 level

Table 5. Pearson correlation matrix (n = 497)
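Each cell of Table 5 is a Pearson correlation between two rating dimensions across the 497 rated articles. A self-contained sketch of the coefficient itself; the two rating vectors below are invented for illustration, not the study's data:

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two rating vectors."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

# Illustrative ratings for five articles on two Table 5 dimensions.
surprise = [5, 4, 2, 1, 3]
interest = [4, 5, 2, 1, 3]
r = pearson_r(surprise, interest)
```

Repeating this over every pair of dimensions (surprise, interest, usefulness, novelty, spark) yields the lower-triangular matrix of Table 5.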

Surprise, Novelty, and Inspiration

Since novelty is a well-studied research topic in the

recommender research community, we consider the

relationship between novelty and surprise in our results. We

claim that a novel item is not necessarily serendipitous, but all serendipity is novel. To support this claim, we

examined the correlation between the ratings of surprise

and novelty and identified a strong correlation between the

two concepts (r = 0.72, p < 0.0001, as in Table 5), meaning

being surprising is also highly likely to be novel. If we

combine the participant ratings 4 and 5 as positive, 1 and 2

as negative, and ignore the neutral ratings, the cross

distribution of positives and negatives for the five groups is

listed in Table 6. As we can see, most articles were both

surprising and novel, or neither. We still find several cells

with zeros, indicating that for both the MI and KL

approaches, there was no article that was surprising but not

novel, backing up our claim that surprise is a necessary

component of novelty.
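The positive/negative collapse behind Tables 6 and 7 can be sketched as:

```python
def polarity(rating):
    """Collapse a 5-point Likert rating: 4-5 positive, 1-2 negative,
    3 (neutral) is dropped from the cross distribution."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return None
```

Applying `polarity` to each article's surprise and novelty ratings, then counting the (surprise, novelty) pairs per group, produces the cell counts in Table 6.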

As mentioned before, surprises that spark a new area of

interest are highly desirable. A moderate (Pearson)

correlation was found between surprising and sparking new

interest, but it was not significant (r = 0.49, p = 0.23). The

breakdown distribution by approach indicates that there is a

non-trivial number of articles that were surprising but did

not spark a new area of interest or vice versa.

                   Base      MI-Low    MI-High   KL-Low    KL-High
Novel               Y    N    Y    N    Y    N    Y    N    Y    N
Surprising (Yes)   39    2   41    0   48    0   48    0   36    0
Surprising (No)     3    3    3    8    3    7    2    8    4   13

Table 6. Cross distribution of surprising and novel articles

                   Base      MI-Low    MI-High   KL-Low    KL-High
Sparking            Y    N    Y    N    Y    N    Y    N    Y    N
Surprising (Yes)   35   11   27   16   39   10   40   10   33    9
Surprising (No)     2    8    3   10    2   15    5   16    2   22

Table 7. Cross distribution of surprising and sparking articles

Exit Questionnaire: What Participants Felt about Each Approach

Participants were asked to fill out a 10-item questionnaire

shown in Table 8, representing five latent dimensions of the

concept of serendipity, adapted from the instrument

developed by McCay-Peet and Toms [32]. These five

dimensions are: enabled connections (Q1, Q6, and Q10),

introduced the unexpected (Q2 and Q7), presented variety

(Q3, Q4, and Q8), triggered divergences (Q5), and induced

curiosity (Q9). The ratings on these ten questions are

summarized in Figure 11. Since the questionnaire was

administered immediately after experiencing each

computational approach (MI, KL, or Baseline), the ratings

were aggregated across the levels of computational surprise.

As shown in Figure 11, the ratings across all the approaches were

generally higher for Q1, Q2, Q8, and Q9; but lower for Q4,

Q5, and Q7, implying the system’s strength at presenting

variety and making connections, and its weakness at triggering divergence. The MI approach

usually received higher ratings than KL and Baseline.

Follow-Up Interview: What Participants Say About Surprise and Value

Upon finishing the last recommended session, participants

were asked about whether they encountered surprising

articles, whether they liked those surprising pieces, and

whether they thought the surprising news was useful,

interesting, or both.

Whether the users encountered surprise: All participants

indicated that they came across surprising articles. While

some participants found some articles as a whole were

surprising, others found only a paragraph or two had

surprising content. When asked to give examples of

surprising articles, one participant said “it is a surprise to

me that an article talked about how poor dental hygiene

may lead to breast cancer”. Another participant stated “I

read an article saying how obesity is viewed differently

among Muslims and Christians. That was completely

surprising because I didn’t think religion plays a role.”

Another participant cited a piece of news about a head

transplant surgery as very “shocking” to her. One



participant stated that he expected that kids who were

bullied would be more likely to develop depression.

However, to his surprise, an article said that kids who

bullied others were also more likely to develop depression.

Figure 11. The average and standard deviation of the

participant ratings on the exit questionnaire

Q1 I was able to examine a variety of health topics

Q2 I explored many topics that normally I do not examine

Q3 Unexpected titles caught my eye and made me click on

Q4 Unexpected words and phrases in the snippet caught

my eye and made me click on the title

Q5 Unexpected words and phrases in the article contents sparked my thinking

Q6 The system enables me to make connections between

different health topics

Q7 I stumbled upon unexpected health topics

Q8 The system encouraged me to browse and explore

Q9 I found myself pausing to look at things more closely

Q10 I could return to topics that I had explored earlier

Table 8. Ten questions in the exit questionnaire

Whether the users like the surprise: Almost all participants

indicated that they liked at least some of the surprising

articles they encountered. One participant stated, “the

surprising ones are actually some of my favorites.” These

participants voiced positive opinions, saying they liked the

articles because they learned something they had not

previously known. Their comments can be summarized as either presenting new information on a general level or presenting specific new ideas. Some other participants

also gave some negative examples of surprising news. One

participant mentioned that he selected flu as his preferred

topic and several articles on swine flu were completely

surprising and irrelevant to his need, even though he

understood the association between flu and swine flu. One

participant mentioned that smoking was his selected topic

but those articles on school policies on student smoking or

law enforcement were unexpected and not what he was

looking for. One participant mentioned that articles

involving mice eating habits or the way cats respond to

different medications, although surprising, were trivial and

irrelevant to her life. Another participant said she

questioned some of the surprising articles, such as an article

that introduced a newly discovered anti-aging hormone and

a new diet supplement product. She suspected that the

author might work with some pharmacy company that

aimed at promoting the sales of this product. Several

participants stated they were not interested in research-

oriented articles. They favored “easy reads” that were for

“people without any medical background”.

None of the participants mentioned surprise explicitly as a

reason for clicking on an article. They used words and

phrases like “catchy”, “caught my attention”, “curious”,

“seemed interesting”, “looked like what I was looking for”,

or “the headline was informative”. One participant just said

“just want to check it out what it is.” Responses were

similarly diverse when participants were asked why they

saved some of the articles into their cart. Some participants

said “for future in-depth reading” because they did not have

enough time to read it thoroughly in the lab setting of the

study. Others mentioned “wanting to email them to others”.

One participant said that he did not save any articles

because he knew he would be able to access any of these

articles through Google.

Whether the users thought surprising results were

interesting or useful: Participants mentioned interestingness

more than usefulness when describing the value of

surprising articles, although they did touch on both. One

participant said an article about marijuana causing vomiting

disorder was interesting to know but was not immediately useful to him since he was not a marijuana user.

Another participant found surprising an article on how to

read below the eyes instead of into the eyes to read people.

She said that the article described an interesting new kind of interpersonal skill, although she was not likely to use the technique herself. One mentioned an article about a

research finding on a new mechanism for memory

formation. He said he would have never learned this

without the opportunity to participate in our study. He

favored the chance to gain such esoteric knowledge, even if

it was unlikely to be directly relevant to his daily life.

Another participant mentioned that she “got into” an

analysis saying that cigarette manufacturers had increased

the level of nicotine in cigarettes by 11% over the recent

seven-year period. She enjoyed reading the story and being

aware of that fact, even though she did not smoke. Very

few participants said those surprising documents were

useful. The few useful and surprising documents included

one suggesting the best time to take a nap during work, as

well as one revealing a link between a sleeping aid and

blood sugar level (useful to the participant’s father).

There were some surprising articles that the participants felt

could be useful in the future, but not at this moment. One

example was saying that young children living in big cities

are at increased risk for brain inflammation and

neurodegenerative changes: this participant had relatives

with children and planned to have his own in the future.

One noteworthy point is that several participants did not clearly distinguish the two concepts of usefulness and interestingness. Once asked, they started to think about

them separately. One participant said “most of the articles

that were interesting were actually useful to me. Because


they were useful they were interesting”. To these

participants, usefulness made information interesting.

DISCUSSION AND CONCLUSION

We have explored whether our computational measures of

surprise correlate with user-perceived surprise. We

presented high and low levels of surprise to users in our

study of computational serendipity, based on each of our

approaches to modelling surprise: MI and KL. We found

that for the MI approach, as expected, the higher

computational scores attracted more clicks, more saved

articles, and received both higher surprise and higher

serendipity ratings. Our interpretation of these results is that

topic co-occurrence likelihood serves as a good predictor of

the level of surprise in health-related documents. Rarer

topic combinations are more likely to contain surprising

content, such as the co-occurrence example of dental

hygiene and breast cancer given by one participant.
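The intuition that rare topic combinations signal surprise can be sketched with pointwise mutual information over article topic labels. This illustrates the co-occurrence idea behind the MI approach rather than reproducing the paper's exact s1, and the toy corpus is invented:

```python
import math

def pmi(a, b, articles):
    """Pointwise mutual information of two topic labels.

    articles: list of per-article topic-label sets. A low (negative)
    PMI means the pair co-occurs more rarely than chance predicts,
    which the MI approach treats as more surprising.
    """
    n = len(articles)
    pa = sum(a in t for t in articles) / n
    pb = sum(b in t for t in articles) / n
    pab = sum(a in t and b in t for t in articles) / n
    if pab == 0:
        return float("-inf")
    return math.log(pab / (pa * pb))

# Toy corpus: Diabetes+Obesity is a common pair, Diabetes+HIV a rare one.
corpus = [{"Diabetes", "Obesity"}] * 8 + [{"Diabetes", "HIV"}] + [{"HIV"}]
```

Under this sketch, an article tagged Diabetes and HIV (a rare pairing, like dental hygiene and breast cancer) scores as more surprising than one tagged Diabetes and Obesity.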

However, for the KL approach, the lower surprise scores

consistently outperformed the higher ones in participants’

ratings of surprise. The essence of the KL approach is to

use the KL-divergence to measure how far the content of an

article is from a typical article in a corpus. When we manually checked the content of those “atypical” articles, we found a number of articles on policy, insurance,

conference announcements, management, etc. Such atypical

content captured peripheral themes in the sub-corpus, which

turned out not to be a good proxy for user surprise. Many

participants showed indifference to these themes. “Typical” articles closer to the centroid, on the other hand, contained content more likely to elicit feelings of surprise and value.

The user-perceived surprise ratings were moderate for the

Baseline (random) group. A possible explanation is that the

nature of any news story, including health news, has some

surprising element at least to some people. Our dataset is

quite diverse, meaning that randomly picked articles may

possibly surprise users. In addition, participants felt surprised about why some articles were recommended given their selected topic preference, such as articles on swine flu when they chose flu, or news about herbal tea when they selected diabetes. Their surprise was caused by a topic that did not match their preference, or by disagreement with a label assigned to an article, but not by the

content of the article. When considering whether the

surprising article was also valuable, reflected by the

serendipity ratings, the Baseline group dropped to the

bottom of the five groups. This means that the articles

presented based on computational models of expectations

and surprise were much more positively received than those

based on random selection. The power of computational

surprise does not lie in a metric of unexpectedness alone,

but instead in finding valued surprises: serendipity.

Surprising content can be unknown to a user, such as the

example of obesity perspectives in Christians and Muslims.

It could also come from unconventional or bold ideas, such

as the head transplant surgery. Surprising content also

confronts participants’ common sense, such as the article

about bullies and depression. From the user’s perspective

surprise stemmed from novelty, boldness, and contradiction.

From the corpus perspective, surprise arose from rare co-

occurrences, such as obesity and religion, as well as

atypical content, such as head transplants and the

depression of bullies.

Our study asked users whether recommended articles

sparked a new area of interest. Participants were quite

conservative in giving a yes to this question, when

compared to the questions on surprise and value. During the

interview, some participants mentioned that they clicked on

articles out of curiosity or they felt curious to learn more

after reading an article. This feeling of curiosity is desirable

in information-seeking behavior since it plays an essential

role in exploration [8] and the divergent process described

by Campos and Figueiredo [14]. By contrast, some short-

term surprise may be less desirable if it does not lead to

open-ended discovery.

Surprise modeling, as a component of serendipity

prediction, heavily relies on natural language understanding

technologies, which are an active research topic and

ongoing challenge in the text mining field. Both of our

approaches, MI and KL, are fundamentally similar in

that they reduce textual information to a vector-based

topic/theme distribution. This representation provides several benefits: a more structured form for algorithmic processing, such as likelihood and similarity calculations, and text understanding at a high level of

abstraction. These representations work well for co-

occurrence and association based surprise, but not as well

for content-based surprise. For the content-based surprise,

more advanced natural language processing techniques are

needed before deeper content-based expectations can be

modeled. In addition, a coarse topic, such as ti in Equation 1, does not always capture the nuanced content that may generate the feeling of surprise. Finer-grained themes, such as zi in Equation 2, may surface peripheral latent themes that lose the mainstream information of a corpus. How to

capture expectations across these different levels of

granularity is a question for future research.

This study presents a framework that models the concept of

serendipity as a combination of surprise and value. The

framework was implemented using two computational

approaches to predicting user surprise, which were then

evaluated in a user study. Our results show that the MI approach, based on topic co-occurrence, outperformed the

KL approach and our random baseline in predicting when

users would rate a document as surprising and serendipitous.

As to the broader impact, this work addresses a core problem of accuracy-oriented search and recommender systems. Serendipitous retrieval has the potential to transform the way digital systems deliver information by shifting from reinforcing similar information to facilitating unexpected discoveries, providing users with expanded access to information that is surprising yet beneficial, across a variety of domains that can benefit from such a model.


REFERENCES

1. Panagiotis Adamopoulos and Alexander Tuzhilin. 2015.

On unexpectedness in recommender systems: Or how to

better expect the unexpected. ACM Transactions on

Intelligent Systems and Technology (TIST) 5, 4

(2015),54.

2. Naresh Kumar Agarwal. 2015. Towards a definition of

serendipity in information behaviour. Information

Research: An International Electronic Journal 20,

3(2015), n3.

3. Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson,

and Samuel Ieong. 2009. Diversifying search results. In

Proceedings of the 2nd ACM International Conference

on Web Search and Data Mining. ACM, 5–14.

4. Eric Alexander, Joe Kohlmann, Robin Valenza, Michael

Witmore, and Michael Gleicher. 2014. Serendip: Topic

model-driven visual exploration of text corpora. In IEEE

Conference on Visual Analytics Science and Technology

(VAST),173-182.

5. Paul André, Jaime Teevan, and Susan T Dumais. 2009.

From x-rays to silly putty via Uranus: serendipity and its

role in web search. In Proceedings of the SIGCHI

Conference on Human Factors in Computing Systems.

ACM, 2033–2036.

6. Eytan Bakshy, Solomon Messing, and Lada A Adamic.

2015. Exposure to ideologically diverse news and

opinion on Facebook. Science 348, 6239 (2015), 1130–

1132.

7. Elinor Barber and Robert K Merton. 2004. The travels

and adventures of serendipity. A Study in Sociological

Semantics and the Sociology of Science. Princeton

(2004).

8. Daniel E Berlyne. 1966. Curiosity and exploration.

Science 153, 3731 (1966), 25–33.

9. David M Blei. 2012. Probabilistic topic models.

Communications of the ACM 55, 4 (2012), 77–84.

10. David M Blei, Andrew Y Ng, and Michael I Jordan.

2003. Latent dirichlet allocation. Journal of Machine

Learning Research 3, Jan (2003), 993–1022.

11. Pia Borlund. 2003. The concept of relevance in IR.

Journal of the Association for Information Science and

Technology 54, 10 (2003), 913–925.

12. Gerlof Bouma. 2009. Normalized (pointwise) mutual

information in collocation extraction. In Proceedings of

the Conference of the German Society for

Computational Linguistics and Language Technology.

31–40.

13. Jacquelyn Burkell. 2012. Encountering New Music: Are

Recommender Systems a Help or a Hindrance? In

Proceedings of Serendipity, Chance, and the

Opportunistic Discovery of Information Research

(SCORE) Workshop. April 28- May 1, Montreal,

Canada.

14. José Campos and Antonio Dias de Figueiredo. 2002.

Programming for serendipity. In Proceedings of the

AAAI Fall Symposium on Chance Discovery–The

Discovery and Management of Chance Events. AAAI,

48-60.

15. Jaime Carbonell and Jade Goldstein. 1998. The use of

MMR, diversity-based re-ranking for reordering

documents and producing summaries. In Proceedings of

the 21st Annual International ACM SIGIR Conference

on Research and Development in Information Retrieval.

ACM, 335–336.

16. Charles LA Clarke, Maheedhar Kolla, Gordon

V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan

Büttcher, and Ian MacKinnon. 2008. Novelty and

diversity in information retrieval evaluation. In

Proceedings of the 31st Annual International ACM

SIGIR Conference on Research and Development in

Information Retrieval. ACM, 659–666.

17. Edward De Bono. 1990. Lateral thinking for managers. Penguin Books.

18. Brenda Dervin and Michael Nilan. 1986. Information needs and uses. Annual Review of Information Science and Technology 21 (1986), 3–33.

19. Kazjon Grace and Mary Lou Maher. 2016. Surprise-Triggered Reformulation of Design Goals. In Proceedings of AAAI. 3726–3732.

20. Kazjon Grace, Mary Lou Maher, Douglas Fisher, and Katherine Brady. 2015. A data-intensive approach to predicting creative designs based on novelty, value and surprise. International Journal of Design Creativity and Innovation 3, 3-4 (2015), 125–147.

21. Kazjon Grace, Mary Lou Maher, David Wilson, and Nadia Najjar. 2016. Personalised Specific Curiosity for Computational Design Systems. In Proceedings of Design Computing and Cognition. Springer, 593–610.

22. Ted Gup. 1998. Technology and the end of serendipity. The Education Digest 63, 7 (1998), 48.

23. Jonathan L Herlocker, Joseph A Konstan, Loren G Terveen, and John T Riedl. 2004. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (TOIS) 22, 1 (2004), 5–53.

24. Leo Iaquinta, Marco De Gemmis, Pasquale Lops, Giovanni Semeraro, Michele Filannino, and Piero Molino. 2008. Introducing serendipity in a content-based recommender system. In Proceedings of the 8th IEEE International Conference on Hybrid Intelligent Systems (HIS). IEEE, 168–173.

25. Laurent Itti and Pierre F. Baldi. 2006. Bayesian surprise attracts human attention. In Advances in Neural Information Processing Systems. 547–554.

26. Marius Kaminskas and Derek Bridge. 2016. Diversity, Serendipity, Novelty, and Coverage: A Survey and Empirical Analysis of Beyond-Accuracy Objectives in Recommender Systems. ACM Transactions on Interactive Intelligent Systems (TiiS) 7, 1 (2016), 2.

27. Diane Kelly and Jaime Teevan. 2003. Implicit feedback for inferring user preference: a bibliography. In ACM SIGIR Forum, Vol. 37(2). ACM, 18–28.

28. Shoshana Loeb. 1992. Architecting personalized delivery of multimedia information. Communications of the ACM 35, 12 (1992), 39–47.

29. Irene Lopatovska and Hartmut B Mokros. 2008. Willingness to pay and experienced utility as measures of affective value of information objects: Users’ accounts. Information Processing and Management 44, 1 (2008), 92–104.

30. S Makri, EG Toms, L McCay-Peet, and A Blandford. 2011. Encouraging serendipity in interactive systems: an introduction. In Proceedings of the 13th IFIP TC13 International Conference on Human-Computer Interaction. Springer-Verlag, Lisbon, 728–729.

31. Harry Markowitz. 1952. Portfolio selection. The Journal of Finance 7, 1 (1952), 77–91.

32. Lori McCay-Peet and Elaine Toms. 2011. Measuring the dimensions of serendipity in digital environments. Information Research: An International Electronic Journal 16, 3 (2011), n3.

33. Lori McCay-Peet, Elaine G Toms, and Anabel Quan-Haase. 2016. SEADE Workshop Proposal–The Serendipity Factor: Evaluating the Affordances of Digital Environments. In Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval. ACM, 341–343.

34. William Mckeen. 2006. The endangered joy of serendipity. The modern world makes it harder to discover what you didn’t know you were looking for. St. Petersburg Times, March 26 (2006). Retrieved July 02, 2016 from http://www.sptimes.com/2006/03/26/news_pf/Perspective/The_endangered_joy_of.html

35. Francis L Miksa. 1992. Library and information science: two paradigms. In Conceptions of Library and Information Science: Historical, Empirical and Theoretical Perspectives. London, Los Angeles: Taylor Graham (1992), 229–252.

36. Javed Mostafa, Snehasis Mukhopadhyay, and Mathew Palakal. 2003. Simulation studies of different dimensions of users’ interests and their impact on user modeling and information filtering. Information Retrieval 6, 2 (2003), 199–223.

37. Tomoko Murakami, Koichiro Mori, and Ryohei Orihara. 2007. Metrics for evaluating the serendipity of recommendation lists. In Annual Conference of the Japanese Society for Artificial Intelligence. Springer, 40–46.

38. Xi Niu and Fakhri Abbas. 2017. A Framework for Computational Serendipity. In Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization. ACM, 360–363.

39. Kensuke Onuma, Hanghang Tong, and Christos Faloutsos. 2009. TANGENT: a novel, ’Surprise me’, recommendation algorithm. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 657–666.

40. Eli Pariser. 2011. The filter bubble: what the internet is hiding from you. London: Viking (2011).

41. Victoria L Rubin, Jacquelyn Burkell, and Anabel Quan-Haase. 2011. Facets of serendipity in everyday chance encounters: a grounded theory approach to blog analysis. Information Research 16, 3 (2011).

42. Yue Shi, Xiaoxue Zhao, Jun Wang, Martha Larson, and Alan Hanjalic. 2012. Adaptive diversification of recommendation results via latent factor portfolio. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 175–184.

43. Jennifer Thom-Santelli. 2007. Mobile social software: Facilitating serendipity or encouraging homogeneity? IEEE Pervasive Computing 6, 3 (2007), 46–51.

44. Alice Thudt, Uta Hinrichs, and Sheelagh Carpendale. 2012. The Bohemian Bookshelf: supporting serendipitous book discoveries through information visualization. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1461–1470.

45. Saúl Vargas and Pablo Castells. 2014. Improving sales diversity by recommending users to items. In Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 145–152.

46. Jun Wang and Jianhan Zhu. 2009. Portfolio theory of information retrieval. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 115–122.

47. Martin Wiesner and Daniel Pfeifer. 2014. Health recommender systems: concepts, requirements, technical basics and challenges. International Journal of Environmental Research and Public Health 11, 3 (2014), 2580–2607.

48. Yuan Cao Zhang, Diarmuid Ó Séaghdha, Daniele Quercia, and Tamas Jambor. 2012. Auralist: introducing serendipity into music recommendation. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining. ACM, 13–22.

49. Cai-Nicolas Ziegler, Sean M McNee, Joseph A Konstan, and Georg Lausen. 2005. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web. ACM, 22–32.

CHI 2018 Paper CHI 2018, April 21–26, 2018, Montréal, QC, Canada

Paper 23 Page 12