developers’ responses to app review feedback a study of ... · providing responses to reviews has...
TRANSCRIPT
Appears in the proceedings of COIN 2017 workshop held in conjunction with AAMAS 2017
Developers’ responses to app review feedback – A study
of communication norms in app development
Bastin Tony Roy Savarimuthua, Sherlock A. Licorish
a, Manjula Devananda
a, Georgia
Greenhelda, Virginia Dignum
b, Frank Dignum
c
a - Department of Information Science, University of Otago, Dunedin, New Zealand
b - Faculty of Technology, Policy and Management, TU Delft, The Netherlands
c - Department of Information and Computing Sciences, Utrecht University, The Netherlands
{tony.savarimuthu, sherlock.licorish, manjula.devananda,
georgia.greenheld}@otago.ac.nz, [email protected],
Abstract. Norms are general expectations of behavior in societies. Huge amount of
computer-mediated interaction data available in the social media domain provides an
opportunity to extract and study communication norms, both to understand their prev-
alence and to make informed decisions about adopting them. While interactions in
social media platforms such as Twitter and Facebook have been widely studied, only
recently researchers have started examining app reviews provided by users and the
responses provided by developers in the domain of app development. In this vein, a
lot of attention has been devoted to study the nature of user reviews, however, little is
known about developer responses to such reviews. Additionally, no other prior work
has scrutinized the nature of communication norms in this domain. Towards address-
ing these gaps, this work pursues three objectives using a dataset comprising user
reviews and developer responses from Google’s top-20 apps used to track running
with a total of 24,407 reviews and 2,668 responses. First, based on prior literature in
computer-mediated interactions, the study identifies 12 norms in responses provided
by developers in three categories (obligation norms, prohibition norms and domain-
specific response norms). Second, it scrutinizes the awareness and adoption of these
norms. Third, based on the results obtained, this study identifies the need for creating
a response recommendation system that generates responses to user reviews either
automatically, or with some help from the developers. The proposed response rec-
ommendation system is a normative system that will generate responses that abide by
the norms identified in this work, and will also monitor potential norm violations (if
the responses were to be modified by the developers). Development of such a system
forms the focus of future work.
Keywords: app reviews, norms, mining, developer responses, communication norms
1 Introduction
App reviews contain valuable information that can inform developers and users about
issues that need to be fixed and enhancements and new features required [1]. The
availability of ‘big-data’ comprising users’ reviews and developers’ responses has
provided an opportunity to study the type of communication norms (patterns of ex-
cepted social behavior in communication exchanges, also known as communication
etiquettes [2, 3]) among involved parties (users and developers), to understand their
prevalence and to make informed decisions about adopting them. While several stud-
ies have investigated the nature of app reviews submitted by users [1, 4, 7], only a few
have investigated the nature of responses provided to these reviews by app develop-
ment firms [5, 6]. Additionally, these works have not investigated response provision
from a normative perspective. This work aims to bridge this gap by investigating the
patterns of developers’ responses to users’ reviews.
Fig. 1. Sample reviews from Fitbit app
Providing responses to reviews has only been a recent development, with Google Play
enabling this service from 2013 [5, 6] (see examples of reviews and responses in Fig-
ure 1). This new functionality provides opportunities for researchers exploring the
nature of responses provided, particularly from a norms perspective. It is important to
study the presence of norms and their adoption levels in the new domain of app re-
views for two reasons. First, insights on prevalent communication norms between
users and developers can be inferred. In this work, we evaluate these aspects using
two metrics: norm awareness and norm adoption (discussed in the next section). Sec-
ond, the study helps to identify requirements for response recommendation systems,
proposing responses to reviews. Such systems would solve the problem of developers
having to respond to voluminous reviews [7]. Some apps attract tens of thousands of
reviews each day (e.g., Pokemon Go received 40,000 reviews per day between its
release and January 2017), and it is cumbersome and expensive for humans to re-
spond to each of these reviews. One solution is to develop a response recommenda-
tion system that can generate appropriate responses based on historical data (i.e., re-
views and responses already available). The development of such a system needs the
knowledge about prevailing communication norms. Once the system is aware of the
prevailing communication norms it could respond appropriately considering such
norms (i.e., a normative system).
This paper thus addresses three objectives: (1) it identifies a set of communication
norms that can be identified in developer responses to user reviews, (2) it investigates
norm awareness and adoption, and (3) it discusses the implications of the results for
developing a normative response recommendation system. This paper is structured as
follows. Section 2 provides background and related work considering communication
norms. This section also introduces the domain of app development and how prior
work has mined norms from large datasets including legal texts and business con-
tracts. Section 3 presents the norms investigated in this study. Section 4 provides an
overview of the adopted methodology, and Section 5 presents the results. Section 6
discusses the utility of the results in the development of a normative response recom-
mendation system. Finally, Section 7 concludes the paper.
2 Background and related work
In communication studies, there has been a huge body of work focusing on the widely
used email communication platforms [2, 3, 8-10] . From a norms perspective, re-
searchers have focused on extracting patterns of common behaviors in email respons-
es [3, 9, 10]. For example, the work of Kooti et al. [9] mined over 16 billion email
interactions and noted that about 90% of emails are answered within a day. Research
reported in [3, 10] focuses on different forms of greeting and closing norms in email
messages. Beyond the email domain, researchers have focused on communication
patterns in social-media platforms such as Twitter [11] and Facebook [12]. Kooti et
al. [11] investigated the emergence and adoption of the retweet norm (i.e., the use of
the ‘RT’ symbol for retweeting) when compared to the other proposals for retweeting.
The work of Pérez-Sabater [12] investigated language conventions (greeting, closing,
etc.) used by English and non-English speakers in Facebook posts.
In the context of software development, the domain of study of this work, there has
been research investigating email discussions. For example, the work of Squire pre-
sents an overview of research work that uses email archives for various purposes in-
cluding decision-making in software development [13]. Beyond emails, researchers
have also investigated how users expressed their opinion on Twitter [14] about soft-
ware they use (both desktop and mobile apps). Only recently, researchers have started
investigating app reviews from a content-analysis viewpoint [1, 4]; we examine this
body of work closely in the following sub-section.
2.1 User reviews and developer responses in app development
The app development domain has been popularized due to the rapid adoption of smart
phones. After an app is released, users provide reviews that contain valuable infor-
mation for other users and the developer(s) of the app. While prospective users read
reviews to evaluate the suitability of apps for their needs, developers use these re-
views for enhancing their app. The mechanism allowing users to provide reviews has
been long implemented, however, the feature allowing developers to respond to re-
views has only been enabled in app stores such as Google Play from 2013 (and this
feature will be available for iOS only in the later part of 20171). Since the provision-
ing of responses to reviews is relatively new, we believe norms in this domain might
be in their formative stages, and hence, it is important to identify those norms so as to
provide appropriate responses that conform to these norms.
Researchers have investigated the nature of reviews provided by users [1, 4, 7].
Maalej et al. [1] for instance have developed an approach that identifies whether re-
views contain a bug report, a new feature request or a praise. The work of Chen et al.
[4] proposes an approach for identifying informative reviews from noisy data and
clustering reviews into relevant groups. Panichella et al. [7] have identified categories
of users’ requests (e.g., information giving, seeking) from reviews, thus, extending the
work in [1]. However, developers’ responses to users’ reviews have not received a lot
of attention. Developers’ responses to users’ reviews are important from a customer
relationship management point of view since they: (a) indicate to the users that their
feedback is listened to, appreciated, and appropriate action is taken to address their
concerns, (b) provide solutions to problems, (c) help avoid losing customers [15]
(reducing churn) which is wide-spread in the app community due to a large number of
choices available, and (d) increase the reputation of brands and attract more customers
(e.g., in the hotel industry, up to 60% increase in room sales as a result of providing
responses (when compared to no responses) has been reported [16]). A few studies
have investigated responses to app reviews [5, 6]. The work of McIlroy [5] noted that
providing responses had a positive impact on user rating (about 38% attracting im-
proved rating). The work of Bailey [6] investigated how developers accommodated
user reviews (i.e., what actions developers undertook). However, the actions of devel-
opers weren’t explicitly communicated to the users as the work investigated apps in
iOS store which did not have the response feature for direct communication. Never-
theless, this study showed the usefulness of reviews for informing developers about
their product deficiencies. We note these works have not scrutinized communication
norms in users-developers interactions, as we do in this paper.
2.2 Mining norms in normative multi-agent systems
The perspective of norms that we adopt in this work has been borrowed from the
normative multi-agent systems’ research work [17], where norms are treated as prohi-
1 https://techcrunch.com/2017/01/24/apple-will-finally-let-developers-respond-to-app-store-reviews/
bitions, obligations and permissions. Researchers have identified applications of these
types of norms in many different domains [18]. For example, Gao and Singh have
investigated a corpus of business contracts [19] that contain explicit specifications of
prohibitions and obligations. Hashmi has proposed a methodology for extracting legal
norms from regulatory documents [20]. Sadiq and Governatori [21] have discussed
how various works have investigated norm compliance by investigating the extent to
which business processes followed in organizations confirmed to norms (contractual
obligations or legal requirements). Norms from textual documents in the abovemen-
tioned works (e.g., contracts [19]) are mined by the presence of explicit deontic mo-
dalities such as must and must not in sentences. For example, the phrase “One must
pay his/her bills on time” would indicate the presence of an obligation. Researchers
have also explored mining norms of software development from open source software
repositories [22-24], applying and extending techniques previously developed for use
in the multi-agent systems domain. For example, the work of Avery [22] scrutinizes
the presence of different types of norms (prohibitions and obligations) in bug reports
submitted by users and developers. Despite these developments, there is a gap in ex-
tracting and studying communication norms between users and developers in the
software development context, a gap we address in this study. In the next section, we
present the norms investigated in this work.
3 Norms investigated
We investigate three types of norms in this work: (1) domain-specific response norms
(norms 1-4 in Table 1), (2) obligation norms (norms 5-11 in Table 1), and (3) prohibi-
tion norms (norm 12 in Table 1). A brief description of all the 12 norms investigated
in this work is provided in Table 1.
Domain-specific response norms - The domain-specific response norms quantify the
aggregate patterns in the domain of investigation (i.e., for all apps). These include
response rate to reviews, timeliness of responses, response length and review modifi-
cation rate (after the response is provided by the developer).
Obligation norms - These norms describe communication patterns that are expected
to be followed. For example, a response is expected to contain a greeting [3]. We
divide obligation norms into three parts – (1) the social etiquette norms, (2) direct
help norms, and (3) extended help norms. These have been divided into three parts
based on prior work on social etiquette norms in communications [2, 3, 9, 10], and
content-analysis based works that have scrutinized the nature of responses provided to
customers (direct help and extended help in other domains such as hotel reviews [15,
25, 26] and film reviews [27]). We describe these three aspects below.
1) Social etiquette norms correspond to social aspects of communication, including
the use of greeting words such as ‘Hi’ and ‘Dear’ [3, 10], the use of customer name in
responses (personalization) [3, 10], appreciating feedback provided [28], apologizing
for a complaint about the app [29], and sign-off (or closing) using developer’s name
or organization’s name [3, 10]. These are norms 5-9 in Table 1. Figure 1 shows two
sample responses. The response on the right conforms to the five norms mentioned
above.
Table 1. Norms investigated in this work and their descriptions
2) Direct help norms correspond to the help offered towards solving an issue faced by
the customer (norms 10a-c), or informing the customer that a solution will be availa-
ID Norm Description (in the question form)
Category 1 - Domain specific response norms
1 Response rate What proportion of reviews received a response?
2 Timeliness of responses How long (in days) does it take developers to respond to
reviews?
3 Response length What is the length of the reviews (in characters)?
4 Response modification rate What proportions of reviews are modified after receiving
a response?
Category 2 - Obligation norms
(5-9) Social Etiquette norms
5 Salutation (opening) Do responses contain appropriate salutation (e.g., Hi and
Dear)?
6 Personalization Do responses include user’s name?
7 Appreciation Do responses exhibit appreciation for feedback provided?
8 Apology Do responses contain apologies for the inconveniences
experienced by the users?
9 Personalized sign-off Do responses have a personalized sign-off (e.g., developer
name or organization name)?
10 Direct help norms
a) Provides complete or partial
solution
Do responses provide solutions to the users’ problems
with the app?
b) Indicates that solution will be
available in the future or
provides no solution
Do responses indicate that a solution will be available in
the future or that a solution won’t be available (for what-
ever reason)?
c) Provides additional infor-
mation/asks question/provides
advise
Do responses contain additional information for the user,
or offer advice or asks questions and provide answers?
11 Extended help norms
a) Details in a webpage (URL) Does the review provide a URL of a web page for seeking
additional help?
b) Email Does the review contain an email address for seeking
additional help?
c) Phone Does the review contain a phone number for seeking
additional help?
Category 3 - Prohibition norm
12 Use of canned responses
(i.e., same response to reviews)
Do developers use canned responses?
ble in the future. A direct help norm addresses a problem in full or part (norm 10a), or
provide information about whether the solution will be available in the future (norm
10b) and/or offers helpful suggestions for the user (norm 10c). These norms are in-
spired by the findings of researchers in other domains (e.g., hotel reviews [15, 25, 26]
and film reviews [27]) who have studied the nature of reviews. The responses in Fig-
ure 1 provide direct help to the users.
3) Extended help norms (norms 11a-c) offer additional help to the customer. This help
may be in conjunction with the direct help provided, or may be the only type of help
offered. Literature on customer support channels notes there are numerous ways to
listen to the customer such as phone, email, discussion boards and online chat. In fact,
social-media platforms such as Twitter [30] also provide an opportunity to receive
users’ comments and facilitate service providers’ responses. Content analysis of re-
sponses revealed the use of three channels for obtaining further help. These are: (1) a
URL to a web page that can be used to communicate with the developers such as
filling a web-based form (norm 11a), (2) an email address to write to the developers
to obtain further help (norm 11b), and (3) a phone number to speak to a support per-
son (norm 11c). The responses in Figure 1 contain URLs to the support web pages.
Prohibition norms – These proscribe certain actions or events. The prohibition norm
that we investigate is the norm against using canned responses2 that are perceived to
be too impersonal (norm 12).
While some of the identified norms have been shown to hold in other relevant com-
puter-mediated communication domains (e.g., social etiquette norms in email, and
direct help norms in hotels’ responses), we investigate whether these norms hold in
the app review domain involving interactions between users and developers. We scru-
tinize this using two metrics – norm awareness and norm adoption. Norm awareness
identifies whether the developers of an app are aware of a particular norm. If at least
one of the responses from an app shows that the developer is aware of a norm, then
the app is considered to be norm aware. Norm adoption is the extent to which a norm
is adopted in the responses provided, taking into account the number of times the
norm is adhered to in a situation where it is expected to hold. For example, if there are
20 responses provided by an app, and if four of these contain greeting words, then the
app developers are assumed to be aware of the greeting norm (norm 1), and the norm
adoption is 20% (4 out of 20).
4 Methodology
This study investigated norms in one particular domain of apps – the top-20 apps on
Google Play that tracked running (i.e., running apps). The apps considered in this
2 Examples of prohibitions of canned responses are listed in the guidelines for reviews by Amazon
and TripAdvisor (refer to: https://www.amazon.com/gp/community-help/official-comment-
guidelines and http://www.reviewtrackers.com/ultimate-guide-responding-tripadvisor-reviews/)
study are shown in Table 2. A Java program was implemented to obtain the reviews
and responses for these apps as ranked by Google on 20th
November 2016. We ob-
tained a total of 24,408 reviews of which 2,668 had responses. The number of reviews
and responses for each app are shown in columns 3 and 4 of Table 2. We analyzed the
2,668 responses to study the nature of response norms across the 20-apps. To identify
response norms, we initially automated the identification process for norms 1-9 and
12 using customized SQL queries, utilizing keywords identified by the first author (in
consultation with the second and third authors), based on previous research [2, 3, 8-
10, 26], and also based on content analysis pursued on the dataset (use of hi, hey,
dear, hello etc. for the greeting norms). While norms 1-9 and 12 could be fully auto-
mated using a keyword based approach, norms 10 and 11 required a manual interpre-
tation due to the nature of the arguments presented in natural language.
In the process of manually interpreting the presence of norms 10 and 11, the presence
of norms 5-9 were also validated. Note that the other norms (norms 1-4 and 12) do not
require manual verification since the automation captures the metrics for these norms
accurately (i.e., norms 1-4 report aggregated values and norm 12 checks for duplicate
responses). To facilitate the verification process, we developed a tool which was used
by two evaluators to indicate whether a review contains a particular norm (presence or
absence of a norm). The first and the third author of this study divided the data into
two parts to manually label the data for the presence of norms 5-11. For data analysis,
we followed the interpretation guidelines for qualitative data [31]. The first author
developed guidelines for coding the data, which was reviewed by the second author.
After discussions between the two coders on how to interpret the presence of these
norms, a sample of 20 developers’ responses was coded and the results were dis-
cussed and adjustments were made to the interpretation guidelines. To formally eval-
uate the coding strength, a sample of 50 developers’ responses was then coded by
both investigators. The inter-rater reliability was computed using Cohen’s kappa sta-
tistic measure which yielded a value of 0.94 for the two raters, which shows an almost
perfect agreement (a value between 0.81 and 1 is considered to be almost perfect)
[32]. The next section presents the results of our analysis of developers’ responses.
5 Results
This section presents results for the three types of norms. The discussion of these
results is presented in the subsequent section.
5.1 Domain specific response norms
The results for the domain specific response norms are given in Table 2.
Response rate (norm 1) - The overall response rate across the apps is 11%. This figure
in closer to the previous finding of 13.8% by other researchers [5]. It can be observed
that the response rate varies across the top-20 apps (min=0, max=62, S.D=22), as
shown in column 5 of Table 2.
Timeliness (norm 2) – We observed that 81% of responses across all apps were pro-
vided within 7 days (see column 6 of Table 2) and 75% responses were provided
within 4 days. However, only 21% of the responses are provided within the first day.
This is in contrast to email communication norms where 90% of the responses were
provided within the first day [9]. App
ID
App Name No. of
re-
views
No. of
respons-
es
Re-
sponse
rate
Timeli-
ness (<7
days)
Response
length
Review
modifica-
tion rate
1 Adidas train &
run
805 252 31% 83% 293 12%
2 C25K® - 5K
Running Trainer
1816 0 0% 0% 0 0%
3 Couch to 5K 732 4 1% 83% 118 0%
4 Endomondo -
Running & Walk-
ing
1858 200 11% 84% 240 15.8%
5 FITAPP - Run-
ning Walking
Fitness
196 121 62% 70% 151 0.6%
6 Fitbit 1161 339 29% 88% 278 12.4%
7 Fitso Running &
Fitness App
389 83 21% 61% 240 11.4%
8 Garmin Connect
Mobile
1263 150 12% 53% 217 39.8%
9 Google Fit -
Fitness Tracking
1473 45 3% 82% 337 17.6%
10 Nike+ Run Club 1402 0 0% 0% 0 0%
11 Run With Map
My Run
2079 0 0% 0% 0 0%
12 Runkeeper - GPS
Track Run Walk
2006 8 0% 44% 204 55.6%
13 Running Distance
Tracker
1150 709 62% 95% 55 2.3%
14 Running For Weight Loss
709 58 8% 90% 215 3%
15 Runtastic Run-
ning & Fitness
3146 28 1% 100% 132 0%
16 S Health 886 607 69% 89% 239 11.2%
17 Sportractive GPS Running App
94 10 11% 100% 122 10.5%
18 Sports Tracker
Running Cycling
1274 13 1% 68% 143 0%
19 Strava Running
and Cycling GPS
1727 0 0% 0% 0 0%
20 Zombies, Run!
(Free)
241 41 17% 96% 185 4.4%
Overall 24407 2668 11% 81% 159 12%
Table 1. Results of the domain specific response norms for the top-20 running apps
Response length (norm 3) – The average response length across the apps is 159 char-
acters (min=55, max=337, S.D=105), which is less than half of the allowed response
length of 350 characters. It can be observed from column 7 of Table 2 that some apps
(app 1 and app 9) use the response length allowed more effectively (e.g., the reviews
contained detailed information) than others (with average response lengths of 293 and
337 respectively).
Review modification (norm 4) – After the responses are received users have modified
12% of the original reviews they had provided (min=0, max=56, S.D=15). This modi-
fication included changes to the reviews and the review rating. We noticed that the
average rating for reviews that were modified was higher (average=3.12) than the
ones that weren’t modified (average=2.98), pointing towards the utility of the re-
sponses in satisfying users and enhancing their feelings about apps, ultimately result-
ing in apps attracting higher ratings.
5.2 Obligation norms
The results for the social etiquette norms, direct help norms and extended help norms
(norms 5-11) are presented below.
Fig. 2. Results for norm awareness vs. adoption in 20 apps
Social etiquette norms (norms 5-9) - We present the results based on two metrics –
norm awareness and norm adoption. Results for norm awareness show that overall,
developers of 70% of the apps are aware of the greeting norm (norm 5) and develop-
ers of 65% of the apps are aware of the personalization norm (norm 6). Eighty percent
(80%) of the developers’ responses towards reviews for the apps showed evidence for
the awareness of the appreciation norm (norm 7). Awareness of apology norm was
seen in 65% of the apps responses (norm 8). However, only 30% of the apps respons-
es showed evidence for sign-off norm (norm 11). The results for norm adoption show
that overall, 51% of responses had greetings (norm 5). The use of customers’ name
was present in 58% of responses (norm 6). In addition, 72% of responses contained
appreciation (norm 7). However, only 35% of responses to reviews that had com-
plaints contained apology (norm 8). Further, only 25% of the responses had a person-
alized sign-off (norm 11). The comparison for norm awareness versus adoption for
the norms is shown in Figure 2 (including results for direct and extended help norms).
The results for norms 5-11 show that while the apps developers are aware of these
norms (average of 65%), their adoption rate is low (average of 49%).
Direct help norms - Figure 3 shows the results for the nature of direct help messages
(norm 9) across apps for three categories – (1) solution available, (2) solution una-
vailable or will be available in the future, and (3) other helpful information. It can be
observed that out of 16 apps that provided responses (excluding apps 2, 10, 11 and 19
that had zero responses), 13 of them (81%) had a higher proportion of responses that
contained solution (category 1) than the other two categories (categories 2 and 3).
Only in two apps (apps 5 and 7) there were more ‘other’ help information than the
other two categories. App 9’s responses did not contain any direct help to the user.
These results show that responses in general have pointed to the solutions for issues
faced by the users. It is interesting to note that some apps have provided help messag-
es to the fullest extent possible – i.e., 100% (apps 12 and 18), although, they had re-
sponded only to a small number of reviews (8 and 13 respectively). Overall, norm
awareness for direct help messages was 75%, and norm adoption was 55% (results
shown in Figure 2).
Fig. 3. Proportion of three types of direct help norms in responses across 20 apps.
Extended help norms - Figure 4 shows the results for the provision of extended help
messages by developers across the apps (norm 11). Messages contained extended help
information to contact support staff through a web interface, an email or phone refer-
ence. It can be observed that 15 out of 16 apps that provided responses (94%) used
one of these three mechanisms for providing extended help, with 6 of them (38%)
using two forms. Web interfaces and email addresses were used by 10 apps (63%),
and the phone reference was least popular among the three options with only two apps
using the option. However, the extent to which these extended help mechanisms were
used varied. While apps 6 and 9 provided the URL for the users in more than 85% of
the messages, apps 7 and 8 provided email addresses for a similar proportion of mes-
sages. These results show that the responses in general contained helpdesk details in
an attempt to resolve the issues faced by users. Overall, norm awareness for providing
extended help messages was 75% and norm adoption was 47% (shown in Figure 2).
Fig. 4. Proportion of three types of extended help norms in responses across 20 apps.
5.3 Prohibition norms
Canned responses (norm 12) - There were about 30% (810 out of 2,668) of canned
responses (reuse of the same responses). These messages did not have any personali-
zation. Eight out of 16 apps used some form of canned responses (i.e., norm violated
in 50% of the apps), with some apps using them excessively. For example, 75% or
more of posts from apps 8 and 13 contained canned responses and 100% of app 9
responses (45 responses) were the same canned response.
6 Discussion
In this section we provide a discussion of the results presented in Section 5, particu-
larly discussing the implications of the results presented in the previous section for
developing a recommendation system that provides responses to reviews which con-
forms to norms.
a) Domain specific response norms – The results for the response rate norm (norm 1)
show that only 11% of the reviews are responded to by developers. Prior research
suggests that one of the key advantages of social media platforms for businesses is
their ability to facilitate dialogue with their customers (e.g., keeping them informed
about actions taken based on their feedback). The low response rate presents an op-
portunity to develop a response recommendation system that generates responses to
new reviews, thus improving response rates. The recommendation system can directly
post responses to reviews in straightforward cases and also provide a template of a
response for a human to complete in complex cases. While some app developers have
indeed realized the importance of responding to reviews (apps 5, 13 and 16) and have
responded to more than 60% of reviews, certain other apps have completely ignored
responding to reviews (apps 2, 10, 11 and 19). Research has shown that there is ten-
dency for apps receiving a lot of reviews [5] to ignore responding to issues raised due
to the voluminous nature of reviews. This again points towards the opportunity to
develop a response recommendation system. The result of the timeliness norm (norm
2) shows that 96% of reviews that receive responses attract responses within 7 days.
However, only 21% are responded to within a day (unlike 90% of emails responded to
within a day [9]). The delay in providing response could be because of reasons such
as the time required to reproduce and then resolve a reported issue, and the availabil-
ity of personnel. Having said that, research has shown that customer feedback needs
to be responded to at the earliest since users’ satisfaction and future repurchase inten-
tions are directly related to the time taken to respond [33]. Additionally, in the app
domain, users can easily try competitors’ apps, hence, the timeliness of response
needs improvement. A response recommendation system may be beneficial to provide
responses in a timely fashion. The average response length for a message (norm 3)
was 159 characters as opposed to 350 characters allowed. This also offers an oppor-
tunity to provide more detailed responses, which can be incorporated in generated
responses (e.g., providing extended help as a part of every message and incorporating
social etiquette norms). Our results for reviews’ modification show that only 12% of
reviews are modified after reading a response. Other researchers have noted that a
significant proportion of users increased the ratings after reading a response, up to
some 38% [5]. Hence, improving response provision rates through the proposed re-
sponse generation system is likely to improve ratings.
b) Obligation norms – The results for the social etiquette norms (norms 5-9) indicate
that norm adoption lags behind norm awareness. While app developers are aware of
these norms, they do not adopt these norms. Independent of the reasons for the failure
to adopt norms, norm adoption can be improved using a response recommendation
system that considers the social etiquettes (greeting, use of customer names, thanking,
apologizing, and signing-off). A recommendation system (or a software agent), can be
easily programmed to adhere to these norms. Direct help norms (norms 10a-c) had the
highest proportion of adoption (55%), among all norms. This is understandable given
that the main goal of providing a response is to help users to overcome the issues they
face. However, the proportion of responses that contain helpful information can be
increased further. Two approaches may be beneficial towards this end. The first is to
use appropriate machine learning approaches for identifying responses provided by
the developers in the past for issues that are similar to the ones reported in the new
review (i.e., review for which a response needs to be generated) similar to the work in
[34], that recommends auto completion phrases for creating new reviews. If issues
reported in a review match a previous review for which a response is available (e.g.,
based on a pre-determined threshold for matching features), the response generated
can be directly posted to the user. The second approach involves the creation of tem-
plate messages by developers for new issues that the users face (e.g., a solution de-
scribing a fix for a new bug that has not been reported by users before). A response
recommendation system needs to consider both of these approaches (as presented in
the basic workflow of the normative response recommendation system in Figure 5).
Extended help norms (norms 10a-c) are easy to automate since they provide generic
helpdesk information. However, these need to be customized based on the type of
support (web, email or phone) the app development firm wishes to provide.
c) Prohibition norm – While the adoption of obligation norms can be ensured during
the implementation of a response system, care should be taken to avoid violating pro-
hibition norms. It is expected that developers will be presented with a template of a
response which they can choose to modify. As a part of modifying the response, users
may violate prohibitions. When these are violated, the system should warn the devel-
oper about potential issues. For example, if the developer replaces the response sug-
gested by the system with an impersonalized canned response (norm 12), a warning
can be generated. Note that 30% of the responses were canned responses. These re-
sponses do not use any personalization (i.e., without the use of greeting, customer
name and developer sign-off). However, these elements can be easily incorporated in
a response recommendation system (i.e., through the adherence of obligation norms
5-9). Additionally, warning messages can be generated if these norms are not adhered
to when the developer posts the message. Thus, the response recommendation system
should have a module that checks for violation of both types of norms (obligations
and prohibitions) and provide appropriate warning messages to the developers.
Start
Is the similarity between the new and a old review (with a
response) greater than a certain threshold?
Fetch a new review
Create (or modify) a message template using social etiquette
normsNo
Construct a message adhering to norms(social etiquette, direct and extended help norms)
Post the response message to the user
Present the message to the developer to modify accordingly (i.e. provide
normative information – direct and/or extended help)
Is the message norm compliant?
YesEnd
NoYes
Fig. 5. Basic workflow of the normative response recommendation system
Further considerations on the normative response recommendation system – In
addition to ensuring that the messages are norm compliant as shown in Figure 5 the
system will also need to prioritize which reviews should be responded to first by the
developers (e.g., based on urgency inferred through rating, and also whether other
users face the same problem). For example if 10 users had rated the app as being poor
and 8 of them complained about a new bug and two about a known bug (for which the
template solution exists) then the system should present the response addressing the
new bug issue to the developer to make a decision due to its potential impact on mul-
tiple users. The recommendation system should also be provided with some autonomy
to post responses to certain messages without the intervention of the developers (e.g.,
all reviews that have 5 star ratings that do not have any suggestions for improvement
or questions can be thanked automatically), considering the volume of reviews posted
for the most popular apps. Additionally, if the responses generated do not meet the
threshold for automatic post, they can be presented to the developers in the form of a
ranked-ordered (priority) list for their action and approval, taking into account con-
siderations such as urgency. If a similar response has to be provided to multiple users,
rephrasing responses can be considered using existing services3 to avoid the issue of
canned responses. Although, personalization of messages using appropriate
3 http://www.gingersoftware.com/
usernames and sign-offs is likely to somewhat reduce the canned response problem.
Implementing the outlined solution forms the focus of our subsequent work.
7 Conclusion
In the domain of app development, communication norms between users and devel-
opers have rarely been investigated. This work addresses this gap by first abstracting
three categories of response norms for app reviews comprising 12 norms (4 domain
specific response norms, 7 obligation norms and 1 prohibition norm). It scrutinizes
awareness and adoption of these norms from a dataset of user reviews and developer
responses in Google Play’s top-20 running apps. The work demonstrated that there is
a gap between norm awareness and adoption across apps. App developers, despite
being aware of these norms, do not adopt the norms effectively. To improve the pro-
vision of responses to users this work proposes a normative response recommendation
system that also has the potential to improve user satisfaction, increase user ratings
and potentially reduce developers’ workload in responding to reviews.
8 References
1. Maalej, W., Nabil, H.: Bug report, feature request, or simply praise? on automatically classi-
fying app reviews. In RE, 2015 conference, pp. 116-125. (2015)
2. Lewin-Jones, J., Mason, V.: Understanding style, language and etiquette in email communi-
cation in higher education: a survey. Research in PC Education 19, 75-90 (2014)
3. Waldvogel, J.: Greetings and closings in workplace email. Journal of Computer‐Mediated
Communication 12, 456-477 (2007)
4. Chen, N., Lin, J., Hoi, S.C., Xiao, X., Zhang, B.: AR-miner: mining informative reviews for
developers from mobile app marketplace. In ICSE conference, pp. 767-778. ACM, (2014)
5. McIlroy, S., Shang, W., Ali, N., Hassan, A.: Is it worth responding to reviews? a case study
of the top free apps in the Google Play store. IEEE Software (2015)
6. Bailey, K.: Out of the Mouths of Users: Examining User-Developer Feedback Loops Facili-
tated by App Stores. (2015)
7. Panichella, S., Di Sorbo, A., Guzman, E., Visaggio, C.A., Canfora, G., Gall, H.C.: How can I
improve my app? Classifying user reviews for software maintenance and evolution. In ICSME
conference, pp. 281-290 (2015)
8. Markus, M.L.: Finding a happy medium: Explaining the negative effects of electronic com-
munication on social life at work. ACM TOIS 12, 119-149 (1994)
9. Kooti, F., Aiello, L.M., Grbovic, M., Lerman, K., Mantrach, A.: Evolution of conversations
in the age of email overload. In WWW conference, pp. 603-613 (2015)
10. Bjørge, A.K.: Power distance in English lingua franca email communication1. International
Journal of Applied Linguistics 17, 60-80 (2007)
11. Kooti, F., Yang, H., Cha, M., Gummadi, P.K., Mason, W.A.: The Emergence of Conven-
tions in Online Social Networks. In ICWSM conference (2012)
12. Pérez-Sabater, C.: The linguistics of social networking: A study of writing conventions on
facebook. Linguistik online 56, (2013)
13. Squire, M.: How the FLOSS research community uses email archives. International Journal
of Open Source Software and Processes (IJOSSP) 4, 37-59 (2012)
14. Guzman, E., Alkadhi, R., Seyff, N.: A Needle in a Haystack: What Do Twitter Users Say
about Software? In RE conference, pp. 96-105 (2016)
15. Chan, N.L., Guillet, B.D.: Investigation of social media marketing: how does the hotel
industry in Hong Kong perform in marketing on social media websites? Journal of Travel &
Tourism Marketing 28, 345-368 (2011)
16. Ye, Q., Gu, B., Chen, W., Law, R.: Measuring the value of managerial responses to online
reviews-a natural experiment of two online travel agencies. In ICIS conference (2008)
17. Andrighetto, G., Governatori, G., Noriega, P., van der Torre, L.W.: Normative multi-agent
systems. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2013)
18. Singh, M.P., Arrott, M., Balke, T., Chopra, A.K., Christiaanse, R., Cranefield, S., Dignum,
F., Eynard, D., Farcas, E., Fornara, N.: The uses of norms. Schloss Dagstuhl LZI (2013)
19. Gao, X., Singh, M.P.: Extracting normative relationships from business contracts. In
AAMAS conference, pp. 101-108. (2014)
20. Hashmi, M.: A methodology for extracting legal norms from regulatory documents. In
EDOCW workshop, pp. 41-50 (2015)
21. Sadiq, S., Governatori, G.: Managing regulatory compliance in business processes. Hand-
book on Business Process Management 2, pp. 159-175. Springer (2010)
22. Avery, D., Dam, H.K., Savarimuthu, B.T.R., Ghose, A.: Externalization of software behav-
ior by the mining of norms. In MSR conference, pp. 223-234 (2016)
23. Dam, H.K., Savarimuthu, B.T.R., Avery, D., Ghose, A.: Mining software repositories for
social norms. In ICSE conference, pp. 627-630 (2016)
24. Savarimuthu, B.T.R., Dam, H.K.: Towards mining norms in open source software reposito-
ries. In ADMI workshop, pp. 26-39 (2013)
25. Ye, Q., Law, R., Gu, B.: The impact of online user reviews on hotel room sales. Interna-
tional Journal of Hospitality Management 28, 180-182 (2009)
26. Sparks, B.A., So, K.K.F., Bradley, G.L.: Responding to negative online reviews: The ef-
fects of hotel responses on customer inferences of trust and concern. Tourism Management 53,
74-85 (2016)
27. Basuroy, S., Chatterjee, S., Ravid, S.A.: How critical are critical reviews? The box office
effects of film critics, star power, and budgets. Journal of marketing 67, 103-117 (2003)
28. Khoo-Lattimore, C., Ekiz, E.H.: Power in praise: Exploring online compliments on luxury
hotels in Malaysia. Tourism and Hospitality Research 14, 152-159 (2014)
29. Matzat, U., Snijders, C.: Rebuilding Trust in Online Shops on Consumer Review Sites:
Sellers' Responses to User‐Generated Complaints. Journal of CMC 18, 62-79 (2012)
30. Malthouse, E.C., Haenlein, M., Skiera, B., Wege, E., Zhang, M.: Managing customer rela-
tionships in the social media era: introducing the social CRM house. Journal of Interactive
Marketing 27, 270-280 (2013)
31. Merriam, S.B., Tisdell, E.J.: Qualitative research: A guide to design and implementation.
John Wiley & Sons (2015)
32. Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data.
Biometrics 159-174 (1977)
33. Miller, R.B.: Response time in man-computer conversational transactions. In FJCC confer-
ence, pp. 267-277 (1968)
34. Arnold, K.C., Gajos, K.Z., Kalai, A.T.: On Suggesting Phrases vs. Predicting Words for
Mobile Text Composition. In UIST symposium, pp. 603-608 (2016)