version 1.5, 03/09/2014
DELIVERABLE
Project Acronym: TV-RING
Grant Agreement number: 325209
Project Title: Television Ring - Testbeds for Connected TV services using HbbTV
D4.1.1 Evaluation Plan
Revision: 1.5
Authors:
Sven Glaser (RBB)
Daniel Giribet (TVC)
Jordi Payo (TVC)
Franz Baumann (IRT)
Jeroen Vanattenhoven (KUL)
Project co-funded by the European Commission within the ICT Policy Support Programme
Dissemination Level
P Public x
C Confidential, only for members of the consortium and the Commission Services
Abstract: This document gathers the overall planning for the evaluation of the TV Ring pilots, including the parameters chosen, the methods to be used and the schedule to carry out these activities.
Revision History
Revision Date Author Organisation Description
0.0 28/05/2014 Sven Glaser RBB ToC improved
0.1 10/07/2014 Sven Glaser RBB Incorporated all partner input
0.2 11/07/2014 Sven Glaser RBB Incorporated additional partner input
0.2.1 14/7/2014 Daniel Giribet, Jordi Payo TVC Incorporated additional TVC input
0.3 14/7/2014 Franz Baumann IRT Added tables and description to 6.2
0.4 16/7/2014 David Pujals RTV chapter 8 introduction
0.5 16/7/2014 Jennifer Müller RBB Incorporated additional partner input
0.7 17/7/2014 Jennifer Müller RBB Integration of calendars and proposal for pilot evaluation timeline 8.7.2
0.8 17/7/2014 Jennifer Müller RBB Integration of KUL input
0.9 22/7/2014 Marc Aguilar I2CAT Added ch. 6 intro and contents for Spanish pilot user evaluation
1.0 25/7/2014 Jeroen Vanattenhoven KUL Added Dutch recommender pilot scenario evaluation methodology (input NPO & KUL), and Spanish pilot UX evaluation descriptions (input I2CAT)
1.1 05/08/2014 Annette Wilson RBB Added Executive summary, Introduction and conclusion. Language check.
1.2 05/08/2014 Sven Glaser RBB Final check. Removing comments and other metadata. Re-structuring (tightening) of chapters 8.5 – 8.8.
1.3 18/08/2014 Pau Pamplona I2CAT Format revision.
1.4.1 21/08/2014 Sven Glaser RBB Final revision.
1.4.2 24/08/2014 Sergi Fernández I2CAT Final revision.
1.5 03/09/2014 Sven Glaser RBB Last revision.
Statement of originality:
This deliverable contains original unpublished work except where clearly indicated otherwise. Acknowledgement of previously published material and of the work of others has been made through appropriate citation, quotation or both.
Disclaimer
The information, documentation and figures available in this deliverable are written by the TV-RING (Testbeds for Connected TV services using HbbTV) project consortium under EC grant agreement ICT PSP-325209 and do not necessarily reflect the views of the European Commission. The European Commission is not liable for any use that may be made of the information contained herein.
1. Executive Summary
This deliverable documents the work TV-RING partners have conducted to date in Task 4.1. The objective of this task is to create an evaluation plan, including objective and subjective measurements of the TV-RING services, and to evaluate the real interest of end-users in connected TV services through Next Generation Access (NGA) networks – specifically networks providing downlink speeds of 20 Mbit/s and more.
TV-RING covers three pilot countries, in each of which different services are tested. The Dutch pilot will test three applications: a streaming application requiring DRM (Digital Rights Management), a recommender application and a second-screen application. The German pilot evaluation will be focussing on an application with the working title “Abenteuer Liebe” (“Adventure Love”) which will accompany the TV series with the same title. Tests with the app will investigate UHD (ultra-high definition) content and social interaction. A further aspect of the German TV pilot is the TVAppGallery – a portal and directory service developed to offer better market access for developers/providers of connected TV applications. The Spanish pilot will evaluate high quality video transmission over a CDN (on a controlled network and on the internet) and advanced interactivity using MPEG-DASH (letting users select multiple points of view).
For each of these services, a set of objectives was defined. The approach is, on the one hand, to test the user experience (UX) of the apps involved. These tests will involve smaller groups of dedicated testers. On the other hand, the aim is to conduct an analysis and optimisation of the platforms’ technical parameters, which will involve a larger number of users.
Under the guidance of KU Leuven, a suitable methodology for scientifically rigorous user experience evaluations was developed, based on the latest literature and past experience. The UX evaluation comprises three main periods: before the deployment of the application, during the deployment of the application, and afterwards. The methodology used for the technical evaluation is the one mainly used for web applications and streaming video content.
Each pilot site coordinator has provided a schedule for deploying and conducting the pilot. Reporting procedures have been specified and the overall timing has been defined. Calendars are available for each pilot, enabling a quick overview of the scheduled activities.
The results of the evaluation will be reported in Deliverable D4.3 once activities in the pilots have been concluded.
2. Contributors
First Name Last Name Company e-Mail
Marc Aguilar I2CAT [email protected]
Franz Baumann IRT [email protected]
Daniel Giribet TVC [email protected]
Sven Glaser RBB [email protected]
Jennifer Müller RBB [email protected]
Jordi Payo TVC [email protected]
David Pujals RTV [email protected]
Jeroen Vanattenhoven KUL [email protected]
Aylin Vogl IRT [email protected]
Annette Wilson RBB [email protected]
Ralf Neudel IRT [email protected]
Content
1. Executive Summary ................................................................................................................ 3
2. Contributors ........................................................................................................................... 4
3. Introduction ........................................................................................................................... 9
4. Action Log ............................................................................................................................ 10
5. Objectives ............................................................................................................................ 11
5.1. General evaluation planning objectives ...................................................................... 11
5.1.1. Dutch pilot ........................................................................................................... 11
5.1.2. German pilot ....................................................................................................... 12
5.1.3. Spanish pilot ........................................................................................................ 13
5.2. Service elements ......................................................................................................... 13
5.2.1. Dutch pilot ........................................................................................................... 14
5.2.2. German pilot ....................................................................................................... 17
5.2.3. Spanish pilot ........................................................................................................ 20
6. Approach .............................................................................................................................. 24
6.1. User evaluation of applications ................................................................................... 24
6.1.1. End user evaluation ............................................................................................. 24
6.1.2. Professional user evaluation ............................................................................... 26
6.2. Technical evaluation of platform ................................................................................ 26
6.2.1. Technical measurements .................................................................................... 28
7. Evaluation Methodology ...................................................................................................... 34
7.1. Applications ................................................................................................................. 34
7.1.1. Methodology developed for TV-RING UX Evaluation ......................................... 34
7.1.2. The TV-RING UX Evaluation Methodology .......................................................... 36
7.2. Platform ....................................................................................................................... 46
7.2.1. Data collection ..................................................................................................... 46
7.2.2. Data storage and processing ............................................................................... 46
7.2.3. Data analysis ........................................................................................................ 47
8. Pilot evaluation planning ..................................................................................................... 51
8.1. Scope and objective of the pilots ................................................................................ 51
8.2. Participating users, locations and duration ................................................................ 51
8.3. Support and communication plan for the pilot........................................................... 51
8.4. Known risks and contingency plans ............................................................................ 53
8.5. Schedule for deploying and conducting the pilot ....................................................... 53
8.5.1. Dutch pilot ........................................................................................................... 53
8.5.2. German pilot ....................................................................................................... 54
8.5.3. Spanish pilot ....................................................................................................... 56
8.6. Evaluation reporting .................................................................................................... 58
8.7. Pilot evaluation calendar ............................................................................................. 58
8.7.1. Dutch pilot ........................................................................................................... 59
8.7.2. German pilot ....................................................................................................... 60
8.7.3. Spanish pilot ........................................................................................................ 61
8.7.4. Common TV-RING evaluation calendar ............................................................... 62
9. Conclusions .......................................................................................................................... 63
10. Bibliography & References .............................................................................................. 64
11. Annex............................................................................................................................... 66
11.1. Pilot UX Evaluation Template .................................................................................. 66
11.2. UX Measures Table .................................................................................................. 70
11.3. UX Methods Overview ............................................................................................ 72
11.4. TV-RING Complete UX Evaluation Methodology .................................................... 73
11.5. General Calendar – Printable version ..................................................................... 77
Table of Figures
Image 1: TV-RING pilots evaluation approach ............................................................................ 24
Image 2: Values in Action: overall UX framework [6] ................................................................. 35
Image 3: Measurement of end user location .............................................................................. 47
Image 4: Measurement of element visits ................................................................................... 48
Image 5: Measurement of returning visits ................................................................................. 48
Image 6: Measurement of visits per visit duration ..................................................................... 48
Image 7: Measurement of user actions ...................................................................................... 49
Image 8: Device analysis ............................................................................................................. 50
Image 9: Support and communication plan ................................................................................ 52
Image 10: Dutch pilot evaluation calendar ................................................................................. 59
Image 11: German pilot evaluation calendar .............................................................................. 60
Image 12: Spanish pilot evaluation calendar .............................................................................. 61
Image 13: Evaluation methodology for the Dutch pilot .............................................................. 74
Image 14: Evaluation methodology for the German pilot .......................................................... 75
Image 15: Evaluation methodology for the Spanish pilot ........................................................... 76
Table 1: UX factors to be used for the service elements ............................................................ 25
Table 2: Evaluation methods to be used for the service elements ............................................. 27
Table 3: Location parameters...................................................................................................... 28
Table 4: Engagement parameters ............................................................................................... 29
Table 5: Actions parameters ....................................................................................................... 30
Table 6: Devices parameters ....................................................................................................... 31
Table 7: Traffic parameters (without MPEG DASH) .................................................................... 32
Table 8: Traffic parameters for MPEG DASH ............................................................................... 33
Table 9: UX evaluation methods for Spanish pilot Multicam Live .............................................. 37
Table 10: UX evaluation methods for Spanish pilot Multicam VoD ............................................ 38
Table 11: UX evaluation methods for German pilot Abenteuer Liebe ........................................ 40
Table 12: UX evaluation methods for German pilot TVAppGallery ............................................ 41
Table 13: UX evaluation methods for Dutch pilot DRM .............................................................. 42
Table 14: UX evaluation methods for Dutch pilot Recommender .............................................. 43
Table 15: UX evaluation methods for Dutch pilot 2nd Screen ................................................... 45
3. Introduction
TV-RING will execute large-scale pilots in three European countries: the Netherlands, Germany and Spain. Since the project aims to ensure that all developments are in line with user needs, we are following a user-oriented, iterative approach. Hence, user tests are set up and conducted, and the results are subsequently evaluated according to proven concepts and methods.
The purpose of this document is to provide a concrete plan for the evaluation of the activities carried out in the three TV-RING pilots. This is the result of work conducted in Task 4.1. The plan includes measurements that can be used to clearly show the benefits of the TV-RING service and evaluate real interest of end-users in connected TV services through Next Generation Access (NGA) networks.
The document starts by outlining the overall objectives of the project and then the more detailed objectives of each individual pilot. As the services tested in each pilot are different, the pilot countries have broken down their planned services into individual service elements. The idea here was to allow as much comparison as possible despite the differences in the overall services.
The approach to the evaluation for both end-user and professional user testing is explained in addition to the technical measurements that will be used to help evaluate the services. The methodology, developed under the guidance of the usability experts at project partner KU Leuven, is explained including how it was developed. Section 8 details the evaluation planning including calendars of events for each pilot.
4. Action Log
09/05/2014 – Consortium Meeting Berlin. All partners
16/06/2014 – Conference Call. RBB, i2CAT, IRT, RTV, NPO, PPG, KUL, TVC. D4.0.1 Kick-Off Meeting
27/06/2014 – Conference Call. i2CAT, KUL, IRT, TVC, RBB, NPO. D4.0.1 Follow-up Meeting
11/07/2014 – Conference Call. i2CAT, IRT, TVC, RBB, PPG, RTV. D4.0.1 Follow-up Meeting
29/07/2014 – Conference Call. i2CAT, IRT, TVC, RBB, RTV. D4.0.1 Follow-up Meeting
5. Objectives
5.1. General evaluation planning objectives
The overall evaluation objectives of the pilots in TV-RING are to determine the criteria for user acceptance tests in order to evaluate the suitability, acceptance and feasibility of the envisaged services, and to define metrics that will be used to measure this.
A further objective is to define the reporting methods for each pilot and finally to plan the evaluation actions for each pilot.
5.1.1. Dutch pilot
In the first scenario, the Dutch pilot partners want to investigate whether it is possible to differentiate stream rate quality by using Digital Rights Management (DRM) techniques. Will it lead to a simplified and more cost-efficient encoding environment for broadcasters, and to new business models for DRM delivery and companies?
The partners here want to investigate the following questions:
- Can we simplify the encoding process and differentiate the quality of content based on one key with different statuses (basic, premium and gold)?
- Test user perception of service (objective and subjective). Are people willing to pay more for high quality content?
Willingness to pay will be based on a literature study and the Dutch pilot will make a proof of concept for the encoding process with different quality of content.
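The core idea of this scenario – one encoding ladder for all users, with the DRM key status alone deciding which quality tiers a client may play – can be sketched as follows. This is a minimal illustrative sketch, not the pilot's implementation; the bitrate figures and the cap per status are assumptions, while the tier names follow the "basic, premium and gold" statuses mentioned above.

```python
# One common encoding ladder for everyone; the DRM key status only
# caps which renditions a given client is entitled to decrypt.
# Bitrate values are illustrative assumptions, not pilot figures.
ENCODING_LADDER = [500, 1500, 3000, 8000]  # kbit/s, lowest to highest

# Maximum rendition each key status unlocks (assumed mapping).
STATUS_CAP = {"basic": 1500, "premium": 3000, "gold": 8000}

def entitled_renditions(key_status: str) -> list[int]:
    """Return the subset of the single ladder a key status may play."""
    cap = STATUS_CAP[key_status]
    return [r for r in ENCODING_LADDER if r <= cap]
```

The point of the sketch is the simplification the pilot wants to test: content is encoded only once, and differentiation happens purely at the key level rather than through separate encoding pipelines per tier.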
The goal of the recommendation pilot scenario is to scan household video content consumption by a family and present recommendations for individual persons on the central HbbTV set. Pilot partners want to develop an intelligent recommendation engine data entry that presents personal recommendations, using variables such as time of day, device status and historical data.
The questions the partners here want to answer are as follows:
- How can we measure all NPO on-demand usage in a household and how can an HbbTV app make recommendations with this information, making the TV even smarter?
- Can we develop a recommendation engine that suggests, on a personal basis, programs based on the information gathered and that are interesting for that particular person?
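A scoring rule combining the variables named above (time of day, device status, historical data) could look like the following sketch. All function names, weights and dayparts are assumptions for illustration only; they do not describe the pilot's actual engine.

```python
# Illustrative per-person recommendation scoring using the three
# variable groups mentioned in the text: historical data, time of
# day, and device status. Weights are arbitrary assumptions.
from datetime import datetime

def score(program: dict, history: dict, now: datetime, device_on: bool) -> float:
    """Score a candidate programme for one person; higher is better."""
    s = 0.0
    # Historical data: how often this genre was watched before.
    s += 2.0 * history.get(program["genre"], 0)
    # Time of day: prefer programmes tagged for the current daypart.
    daypart = "evening" if now.hour >= 18 else "daytime"
    if program.get("daypart") == daypart:
        s += 1.0
    # Device status: only recommend when the central TV set is on.
    if not device_on:
        s = 0.0
    return s

def recommend(candidates, history, now, device_on, k=3):
    """Return the k best-scoring candidate programmes."""
    return sorted(candidates, key=lambda p: -score(p, history, now, device_on))[:k]
```

In a real engine the weights would be learned from the gathered usage data rather than fixed, which is precisely what the questions above set out to investigate.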
In scenario 3, second screen competition, the Dutch pilot wants to investigate how an HbbTV app can act as central interface for group second screen play-along, in a home network.
Questions partners want answered:
- How to pair all (2nd screen) devices in a household and by social media to one ‘master’ app and how can we synchronize these results with HbbTV?
- How do we keep it scalable (Cloud)?
- How do we manage and encourage in-house “real” social interaction and create and encourage a competition model?
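The pairing and synchronization idea behind these questions can be sketched as a minimal in-memory model: several second-screen devices join one household 'master' session, whose aggregated state is then pushed to the HbbTV set. Class and method names are illustrative assumptions, not the pilot's design.

```python
# Minimal sketch of the 'master' app concept: second-screen devices
# in one household pair to a shared session, submit quiz answers,
# and the HbbTV app reads a synchronized snapshot of the results.
class MasterSession:
    def __init__(self, household_id: str):
        self.household_id = household_id
        self.devices: dict[str, dict] = {}  # device_id -> latest answers

    def pair(self, device_id: str) -> None:
        """Register a second-screen device with this household session."""
        self.devices.setdefault(device_id, {})

    def submit(self, device_id: str, question: str, answer: str) -> None:
        """Record one device's answer to one quiz question."""
        self.devices[device_id][question] = answer

    def snapshot(self) -> dict:
        """State pushed to the HbbTV app to keep the main screen in sync."""
        return {"household": self.household_id, "devices": dict(self.devices)}
```

Keeping such sessions in the cloud, one per household, is one plausible answer to the scalability question above; the sketch deliberately says nothing about the transport (e.g. WebSockets or polling), which is part of what the pilot will explore.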
For both scenarios 2 and 3, the Dutch pilot partners will conduct several tests, interviews and observations with a test group of 15-20 different families.
5.1.2. German pilot
The German pilot will tackle three core challenges, namely streaming of ultra-high quality content, developing a highly interactive transmedia TV format and opening HbbTV to third parties.
In recent years, media streaming has been the main driver for bandwidth demand; as we have moved from standard definition to high definition (HD) and now to ultra-high definition (UHD) formats, this demand will further increase. For the first part of the German pilot we plan to deploy and evaluate a selection of adaptive streaming offerings that exploit the full range of typically available bandwidth figures: high bandwidths for very high quality on one end, and support for low to medium bandwidth connections on the other. This content will be available via the HbbTV app “Abenteuer Liebe” (second part of the pilot).
In the pilot the partners want to
- conduct technical measurements to measure the use of this content,
- assess bandwidth requirements and technical parameters,
- investigate the perceived difference for the user,
- determine user demand for this content.
With the second part of the German pilot, an interactive TV format, TV-RING wants to investigate how an HbbTV-based service accompanying a TV show should be shaped. Many users are already familiar with accessing additional media and information from the internet on a different device in parallel. The “Abenteuer Liebe” HbbTV application will offer users ample interaction opportunities on the main screen.
The German partners want to investigate the following questions:
- Is the service usable by first-time HbbTV users and experienced HbbTV users?
- Do users perceive the TV show and HbbTV app as a seamless service or do they feel distracted from the TV show?
- Do users feel continuously motivated to use the service?
- Do users feel involved in the show?
- Do users feel the presence of other users?
- Do users enjoy the service?
For the German pilot part as described above, the partners will conduct several qualitative tests, interviews and observations with a test group of approximately 40 different users. As the application will be openly available on-air via German free TV, a much larger number of users are expected to use the application and to implicitly provide data for the technical measurements.
The third part of the German pilot is about opening the HbbTV market to third parties, thus allowing developers and SMEs to freely offer apps directly to the general public. So far, HbbTV applications are mostly tied to broadcast programmes and can be accessed through the “red button concept” from within the TV programme. The current HbbTV standard does not provide any specific technology to give access to third-party applications. However, the TVAppGallery developed by IRT can open the HbbTV application market to non-broadcasting companies. Initial studies have shown that people are generally excited about the opportunities such an open application directory service could offer and that they, as end-users, would also benefit from a portal for HbbTV applications. The main challenges for the concept are the legal implications such a service brings and the competition it faces with vertical
approaches. To tackle these challenges, the activities around the TVAppGallery pilot follow two complementary approaches.
Firstly, before this portal can be made publicly available, a reliable, independent partner to run the backend needs to be found. To this end, the portal will be published and presented to potential partners and will also be demonstrated at trade fairs and mentioned in HbbTV presentations. The portal concept will be discussed with many stakeholders and feedback will be gathered.
Secondly, to further promote the concept and advantages of the TVAppGallery, its design is being improved and a selection of attractive applications from the project will be promoted on the first page. To gain more feedback, further user evaluation is foreseen, specifically addressing the following questions:
- Do users feel the need for such an application portal?
- Do users recognize the advantages of an open portal?
- Do users understand the portal structure?
- What additional functionalities do users expect from such a portal?
For this evaluation it is planned to conduct interviews and/or questionnaires with a test group of about 15 people. Apart from this, no deeper technical evaluation is planned, as this would first require a mass-market deployment, which has not yet taken place.
5.1.3. Spanish pilot
The Spanish pilot will basically evaluate two main scenarios: high quality video transmission over a Content Delivery Network (CDN) (on a controlled network and on the internet) and advanced interactivity using MPEG-DASH (letting users select multiple points of view). The main questions to be tested are:
- If we provided multiple-view on-demand content, would people watch more of it?
- If people watched an on-demand show that has multiple views, would they enjoy it more?
- Would people watch the show again if extra views were made available on-demand?
- Would people be less likely to switch to another channel if live content had multiple views?
- If people watched a live show that has multiple views, would they enjoy it more?
In addition to that, high-quality video transmission over CDN will be tested answering the following questions:
- Do the users appreciate the qualitative signal improvement?
- Do the users perceive the signal adaptability as positive?
This pilot will investigate these questions in a qualitative and quantitative fashion. Qualitatively, in the controlled CDN environment, with a group of 20 to 40 test users (using interviews, questionnaires, etc.). In the universal CDN environment, no interviews have been scheduled; the evaluation in this case will be done by collecting quantitative information from video marking and from user-facing feedback mechanisms (such as a "Like" button, present in each video).
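One way multiple points of view can be exposed with MPEG-DASH is as separate video AdaptationSets in the MPD, distinguished by a Viewpoint descriptor, from which the player offers the user a choice. The following sketch shows a player-side helper listing the available views; the MPD fragment, the scheme URI and the view names are illustrative assumptions, not the pilot's actual manifest.

```python
# Sketch: enumerate the camera views advertised in a (hypothetical)
# DASH MPD, where each view is its own AdaptationSet carrying a
# Viewpoint descriptor. A player would then fetch segments only
# from the AdaptationSet the user selected.
import xml.etree.ElementTree as ET

MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Viewpoint schemeIdUri="urn:example:viewpoint" value="main"/>
    </AdaptationSet>
    <AdaptationSet mimeType="video/mp4">
      <Viewpoint schemeIdUri="urn:example:viewpoint" value="goal-cam"/>
    </AdaptationSet>
  </Period>
</MPD>"""

NS = {"d": "urn:mpeg:dash:schema:mpd:2011"}

def list_views(mpd_xml: str) -> list[str]:
    """Return the Viewpoint values of all video AdaptationSets."""
    root = ET.fromstring(mpd_xml)
    return [vp.get("value")
            for vp in root.findall(".//d:AdaptationSet/d:Viewpoint", NS)]
```

Because each view is an independent AdaptationSet, switching views is just a matter of switching which set the player pulls segments from, while adaptive bitrate selection continues within the chosen set.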
5.2. Service elements
In this section, a detailed description of all elements that will be under evaluation is given. The objective of providing this granularity is to structure all contents and help to better understand
what information will be obtained and from what. The diversity of the pilots makes a direct comparison difficult; atomizing the parts contained in each pilot, however, makes it easier to define and describe the whole evaluation.
Each element has been defined by the partners involved in the pilots as their smallest element with value in itself. In other words, if such an element were split into smaller pieces, it would no longer bring any value for its developer. This means that different criteria have been used, but the evaluated results are more relevant and targeted. They are also more exhaustive and can feed into a more general pilot evaluation. Otherwise, the evaluation would have been more complex and unstructured (or at least more difficult to structure).
5.2.1. Dutch pilot
Element Name: Content differentiation by using DRM keys
Deploying pilot: Quality differentiation by using DRM
Developer: PPG & Infostrada Delivery date: September 2014
Evaluator(s): NPO & KUL
Description:
Investigate technical possibilities of using DRM techniques for quality and archive depth differentiation. This can lead to a more hybrid production and distribution facility that can support both free and non-free video services. The consumer willingness-to-pay research adds a key variable to possible future DRM business models. The app will be composed of the following features: Content (different quality and type), DRM key value (basic, SD or HD), Archive depth.
Objectives (for its evaluation):
Simplify the encoding process and differentiate quality and content, based on one DRM access key with different statuses
Investigate if a certain DRM key plays the right content
Test user perception of service (objective and subjective). Are people willing to pay more for differentiated content?
Investigate the maximum used bandwidth
Investigate new DRM business models
Parameters under evaluation:
Engagement: stream starts, page views, archive depth
Traffic: maximum served bitrate played
Methodology and data gathering:
Browse stats and archive depth based on Google Analytics
Analysing log files
Literature study on willingness to pay
Involved KPIs: Daily network consumption1
Others: End-user evaluation
1 The network activity will be monitored to identify consumption of this module and of users.
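The "analysing log files" step for the traffic parameter above (maximum served bitrate played) could be implemented along these lines. The log format used here, a CSV with a session id and a served bitrate per delivered segment, is an assumption for illustration; the actual streaming-server logs will differ.

```python
# Hedged sketch: derive the maximum served bitrate per session from
# (hypothetical) streaming-server logs. The column names and sample
# data are assumptions, not the pilot's real log schema.
import csv
from collections import defaultdict
from io import StringIO

LOG = """session,bitrate_kbps
a1,1500
a1,3000
b2,500
b2,1500
a1,8000
"""

def max_bitrate_per_session(log_text: str) -> dict[str, int]:
    """Peak bitrate actually served to each playback session."""
    peaks: dict[str, int] = defaultdict(int)
    for row in csv.DictReader(StringIO(log_text)):
        sid, kbps = row["session"], int(row["bitrate_kbps"])
        peaks[sid] = max(peaks[sid], kbps)
    return dict(peaks)
```

Aggregating such per-session peaks over a day would then feed the "daily network consumption" KPI named for this element.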
Element Name:
Engagement by Quiz- second screen
Deploying pilot:
HbbTV as a central interface for second screen competition
Developer: PPG, Angry bytes, NPO Delivery date: January 2015 + possibly April 2015 (EU Song festival contest)
Evaluator(s): NPO & KUL
Description:
How can an HbbTV app act as a central interface for group second screen play-along in a closed network?
Objectives (for its evaluation):
pair many (2nd screen) devices in a household or other closed network to one ‘master’ app and synchronize the results with HbbTV
Investigate how we can keep the technology scalable
Create and encourage real social interaction
Parameters under evaluation:
Location: region
Engagement: duration of visits
Actions: number of unique visitors, entry page, exit page
Devices: type of second screen device, HbbTV, number of devices
Rating (engagement, usability)
Methodology and data gathering:
Google Analytics
Comscore
Questionnaires and (online) survey
Interviews
Involved KPIs: Web based usage indicators
Number of third party apps included in the pilot
Others:
Element Name: Content differentiation by Recommendations
Deploying pilot:
In-house recommendations for HbbTV and Cable TV apps
Developer: PPG, NPO ICT Delivery date: October 2014
Evaluator(s): NPO & KUL
Description:
Scan the complete household video content consumption on a selected video-on-demand (VOD) service and present recommendations for individual persons and groups on the central HbbTV set. Develop an intelligent recommendation-engine data entry that presents both personal and group recommendations, using variables such as time of day, device status and historical data.
Objectives (for its evaluation):
Investigate how people watch television content in a household
Investigate what influence the variables mood, time of day, device and family composition at that time of day have on viewing habits
Develop a recommendation engine data entry that presents recommendations for individual persons and groups on the central HbbTV set
Investigate how the outcome can be integrated in existing recommendation models and tools
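To make the combination of these variables concrete, a context-aware recommendation score can be sketched as a simple weighted sum. The feature names, weights and data layout below are purely illustrative assumptions for this sketch; they are not the pilot's actual recommendation engine.

```python
# Illustrative sketch only: a toy context-aware recommendation score combining
# the variables named above (time of day, device, historical data). Feature
# names and weights are assumptions, not the pilot's engine.
def score(item, context, history):
    s = 0.0
    if item["slot"] == context["time_of_day"]:
        s += 0.5                                 # fits the current daypart
    if item["device"] in context["devices"]:
        s += 0.2                                 # playable on an available device
    s += 0.3 * history.get(item["genre"], 0.0)   # historical genre affinity (0..1)
    return s

def recommend(catalogue, context, history, k=2):
    """Return the k highest-scoring items for this household context."""
    return sorted(catalogue, key=lambda i: score(i, context, history), reverse=True)[:k]
```

In a real engine the weights would be learned from the logged viewing data rather than fixed by hand.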
Parameters under evaluation:
Engagement: number of unique visits, duration of visit, stream starts, click-through rate; Actions: page views, entry page, exit page; Used devices: HbbTV and second screen, static PC and laptop; Traffic: VOD absolute and average, stream size absolute and average; Rating: recommendation accuracy, usability
Methodology and data gathering:
Google Analytics; comScore; questionnaires and (online) surveys & rating; analysing log files
Involved KPIs: average periodic network activity; web-based usage indicators
Others:
5.2.2. German pilot
Element Name: Abenteuer Liebe (AL) App with UHD content
Deploying pilot: German pilot
Developer: RBB, IRT Delivery date: November 2014
Evaluator(s): RBB
Description:
The “Abenteuer Liebe” app (AL app) is an HbbTV application that accompanies a 20-part TV series during and after the broadcasts. It allows for transmedia storytelling, browsing TV and additional non-TV content (UHD video, images, texts), commenting on events happening in the show (input via second screen: texts, pictures and video clips) and taking part in playful voting, ratings and quizzes. It is a multi-canvas application integrating:
- video player (scaled down and full-screen)
- image galleries
- text areas
- blog (via ScribbleLive API, also voting and rating)
- TV box (broadcast picture, either scaled down or full-screen)
For the first round of testing, the focus will be on the use of UHD content.
Objectives (for its evaluation):
UX-wise, to find out:
- Do the users perceive any difference between UHD and other content?
- Do users enjoy the content?
- Do users feel continuously motivated to use the content?
- Is the service usable by first-time HbbTV users and experienced HbbTV users?
- Do users perceive the TV show and HbbTV app as a seamless service, or do they feel distracted from the TV show?
Technically, to find out:
- How many users used the content?
- What were the bandwidth requirements?
Parameters under evaluation:
UX-wise: accessibility, overall usability, aesthetics/appeal/attractiveness, enjoyment/pleasure, engagement, hedonic quality, flow/immersion, empowerment, sociability, participation, reciprocity, social presence
Technically, for video streams: requests per programme, video requests for pilot duration, total bandwidth, video stream size, accumulated traffic, measured traffic per programme
Methodology and data gathering:
UX methods:
- Interviews
- Meetings
- Questionnaires
For technical parameters:
- PIWIK
- Google Analytics
- Akamai LunaControl
- Web server measurements
Involved KPIs: daily network consumption; web-based usage indicator
Element Name: AL App with social interaction
Deploying pilot: German pilot
Developer: RBB, IRT Delivery date: Summer 2015 (TBC)
Evaluator(s): RBB
Description:
The “Abenteuer Liebe” app (AL app) is an HbbTV application that accompanies a 20-part TV series during and after the broadcasts. It allows for transmedia storytelling, browsing TV and additional non-TV content (video, images, texts), commenting on events happening in the show (input via second screen: texts, pictures and video clips) and taking part in playful voting, ratings and quizzes. It is a multi-canvas application integrating:
- video player (scaled down and full-screen)
- image galleries
- text areas
- blog (via ScribbleLive API, also voting and rating)
- TV box (broadcast picture, either scaled down or full-screen)
For the second round of testing, the focus will be on the interactive format.
Objectives (for its evaluation):
UX-wise, to find out:
- Is the service usable by first-time HbbTV users and experienced HbbTV users?
- Do users enjoy the service?
- Do users feel continuously motivated to use the service?
- Do users perceive the TV show and app as a seamless service, or do they feel distracted from the TV show?
- Do users feel involved in the TV show?
- Do users feel the presence of other users?
Technically, to find out:
- How many users used the app?
- How long did users stay in the app?
- What are the most used parts of the app?
- What is the most used content?
Parameters under evaluation:
UX-wise: accessibility, effectiveness, overall usability, aesthetics/appeal/attractiveness, enjoyment/pleasure, perceived usefulness, engagement, hedonic quality, flow/immersion, distraction/helpfulness, empowerment, sociability, participation, reciprocity, social presence
Technically, for the application: clicks per visit, duration, average duration for returning visitors, page views, average generation time of the site, average time on page
Technically, for video streams: video requests per programme, video requests for pilot duration, total bandwidth, video stream size, accumulated traffic, measured traffic per programme
Methodology and data gathering:
UX methods:
- Interviews
- Meetings
- Questionnaires
For technical parameters:
- PIWIK platform
- Google Analytics
- Akamai LunaControl
- Web server measurements
Involved KPIs: Daily network consumption, web-based usage indicator
Element Name: TVAppGallery
Deploying pilot: German pilot
Developer: IRT Delivery date: M20
Evaluator(s): IRT
Description:
The TVAppGallery is a system that provides an open marketplace for HbbTV applications. HbbTV applications are mostly tied to a broadcast programme and accessed through the “red button” concept; the current HbbTV standard does not provide a specific technology for giving access to third-party applications. The TVAppGallery opens the HbbTV application market to developers and SMEs who otherwise have to buy into proprietary app portals offered by some device manufacturers, making equal opportunities and efficient access to SmartTV devices possible for all parties.
Objectives (for its evaluation):
UX-wise, to find out:
- Do users need this service?
- Are they willing to use the portal?
- Do users feel comfortable with the menu structure?
- Could they handle certain configuration steps?
- What do users think about the portal idea?
Parameters under evaluation:
UX-wise:
- Accessibility
- Effectiveness
- Overall usability
- Aesthetics/appeal/attractiveness
- Usefulness
Methodology and data gathering:
The data will be gathered by interviews, meetings or questionnaires.
Involved KPIs: Number of 3rd party apps included in the pilots
Others:
5.2.3. Spanish pilot
Element Name: MPEG DASH encoder
Deploying pilot: Spanish pilot
Developer: I2CAT Delivery date: M13
Evaluator(s): TVC & i2CAT technicians
Description:
An MPEG-DASH encoder and segmenter developed by i2CAT. This software is capable of receiving live RTP H.264/AAC streams, decoding them, re-encoding them in different qualities with different parameters, and then encapsulating them as different DASH tracks. It is run as Software-as-a-Service (SaaS) with a simple RESTful API and a corresponding test web application.
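The shape of a request to such a RESTful transcoding service can be sketched as follows. The field names, the example RTP URL and the idea of POSTing a JSON job description are illustrative assumptions for this sketch; the actual i2CAT API may differ.

```python
import json

# Hypothetical sketch of a transcoding-job request for a RESTful DASH encoder
# service: one live RTP H.264/AAC input, re-encoded into several qualities and
# encapsulated as separate DASH tracks. Field names and the URL are assumptions.
def build_transcode_job(rtp_input_url, bitrates_kbps):
    """Describe one live input and the DASH tracks to produce from it."""
    return {
        "input": {"url": rtp_input_url, "video": "h264", "audio": "aac"},
        "tracks": [
            {"id": i, "bitrate_kbps": b, "container": "dash"}
            for i, b in enumerate(sorted(bitrates_kbps, reverse=True))
        ],
    }

# The resulting JSON document would then be POSTed to the service's job endpoint.
job = build_transcode_job("rtp://encoder.example:5004", [3000, 6000, 10000])
print(json.dumps(job, indent=2))
```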
Objectives (for its evaluation):
The solution delivered by i2CAT aims to be a competitive alternative to current software-based commercial solutions such as Wowza. The encoder will therefore be evaluated in order to assess its performance and strong points.
Parameters under evaluation:
Performance metrics to be measured:
- Resource usage: CPU, memory, etc.
- Maximum number of live video tracks per DASH stream
- Maximum transcoding quality
Methodology and data gathering:
Quantitative analysis using specific metrics. A fixed test video set will be used to feed the DASH encoder in the lab, and the desired parameters will be measured. At least three different bitrates will be considered (10 Mbps, 6 Mbps and 3 Mbps).
Involved KPIs: Daily network consumption Web-based usage indicator
Others:
Element Name: Local Managed CDN
Deploying pilot: Spanish pilot
Developer: i2CAT Delivery date: M13
Evaluator(s): I2CAT
Description:
Over the managed network of the pilot, i2CAT will deploy proxy caches in order to set up a delivery network for a local area. The idea is to provide congestion control techniques such as HTTP caching as close as possible to the end-user.
Objectives (for its evaluation):
It is of particular interest to demonstrate the effectiveness of a simple local CDN solution for distributing media content in scenarios like those defined by the Spanish pilot in TV-RING. Its performance will therefore be evaluated to determine whether it is sufficient to support a potential new business model. In addition, the evaluation process will contribute to improving and optimising the whole system's performance and to studying its scalability.
Parameters under evaluation:
Performance metrics to be measured:
- Bandwidth consumption from the origin server
- Bandwidth consumption from the proxy cache
- Bandwidth savings
- Latency
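A possible way to derive the "bandwidth savings" parameter from the first two traffic measurements is sketched below. The exact definition used in the pilot may differ; this sketch assumes that origin traffic corresponds to cache misses and proxy-cache traffic to cache hits.

```python
# Minimal sketch, assuming origin traffic = cache misses and proxy traffic =
# cache hits: the fewer bytes the origin server has to send, the more the
# proxy cache is saving.
def bandwidth_savings(origin_bytes, cache_bytes):
    """Fraction of the total delivered traffic served by the proxy cache.

    origin_bytes: bytes fetched from the origin server (cache misses)
    cache_bytes: bytes served to clients directly from the proxy cache (hits)
    """
    total = origin_bytes + cache_bytes
    return cache_bytes / total if total else 0.0

# e.g. 20 GB from the origin and 80 GB from the cache gives 80% savings
```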
Methodology and data gathering:
The measurements will first be simulated and tested in a lab in order to refine the data recording and collection procedure that will later be automated in the field. Once tested and evaluated in the lab, the evaluation tools will be deployed in the field to collect real data and validate the expected results.
Involved KPIs: Daily network consumption Web-based usage indicator
Others:
Element Name: Global CDN
Deploying pilot: Spanish pilot
Developer: Retevision Delivery date: M13
Evaluator(s): Retevision
Description:
Retevision will provide the usage and performance measures for the CDN, taken from a set of CDN reporting tools and reported in a separate, dedicated web portal. A spectrum of technical parameters will be measured and reported in order to give a clear idea of the pilot's content usage and consumption.
Objectives (for its evaluation):
In contrast to the local managed CDN, the external CDN is a worldwide network that will be used in the Spanish pilot to deliver HbbTV applications and MPEG-DASH content. Retevision is developing its own web portal illustrating the CDN's performance and usage measures. These will be available for the project reports and for comparison with the measures taken from the local CDN, which will help to optimise and dimension the local network.
Parameters under evaluation:
Usage and performance metrics to be measured:
Consumption (usage):
- GB per month
- Total requests
- Origin volume
Throughput (performance):
- Peak req/sec
- Avg req/sec
- Bandwidth at 95% (Mbps)
- Cache efficiency (%)
- Peak Mbps
- Avg Mbps
- Peak origin Mbps
- Avg origin Mbps
Methodology and data gathering:
Measures will be taken from the specific network tools used during the pilot. Some measures are predefined and will be gathered automatically from these tools; others will require some manual work to generate. Retevision is developing a tool (a PHP + Java web tool) to make it easier to gather these measures from the CDN log files. This tool will ease gathering consumption and performance information directly from the CDN's automated data generator, and will help to better understand what is measured and how. This will enable operators to easily produce the required reports.
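The kind of aggregation such a log-gathering tool performs can be sketched as follows. The actual tool is a PHP + Java application and real CDN log formats differ; the (timestamp_sec, bytes) record format below is an assumption used only to illustrate how total volume and peak/average requests per second are derived.

```python
from collections import defaultdict

# Illustrative sketch (Python for brevity; the real tool is PHP + Java) of
# aggregating CDN log records into the consumption/throughput metrics listed
# above. Record format (timestamp_sec, bytes) is an assumption.
def aggregate(records):
    per_second = defaultdict(int)   # requests observed in each second
    total_bytes = 0
    for ts, nbytes in records:
        per_second[ts] += 1
        total_bytes += nbytes
    seconds = len(per_second)
    return {
        "total_gb": total_bytes / 1e9,
        "peak_req_per_sec": max(per_second.values(), default=0),
        "avg_req_per_sec": (sum(per_second.values()) / seconds) if seconds else 0.0,
    }
```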
Involved KPIs: Daily network consumption Web-based usage indicator
Others:
Element Name: TV3 A LA CARTA Multicamera service on HbbTV
Deploying pilot: Spain
Developer: TVC Delivery date: October 2014
Evaluator(s): TVC
Description:
Add-on for the existing HbbTV IPTV service at TVC (TV3 A LA CARTA) to support multiple points of view of an on-demand or live video. Content is to be delivered using the two network elements (local and global CDN).
Objectives (for its evaluation):
- Multiple-view on-demand content leads to more content being watched
- Greater user enjoyment of on-demand content that has multiple views
- Repeated on-demand consumption of content previously watched live, due to the availability of multiple views
- Less channel zapping when live content has multiple views
- Greater user enjoyment of live content that has multiple views
Parameters under evaluation:
Engagement: duration of visits; Actions: number of unique visitors, entry page, exit page; Devices: HbbTV, number of devices; Rating: engagement, usability
Methodology and data gathering:
Adobe Omniture; questionnaires and (online) surveys; interviews
Involved KPIs: Web-based usage indicators
Others:
6. Approach
The TV-RING pilots take a comprehensive approach to the evaluation and validation of the deployed technologies and contents. The pilot evaluation therefore includes a thorough assessment of the user experience across several UX metrics, and a series of performance tests to measure key technical parameters at regular intervals.
Image 1: TV-RING pilots evaluation approach
Under the user evaluation, data generating actions are envisioned for both end-users and professional users, thus covering the whole spectrum of pilot stakeholders.
6.1. User evaluation of applications
6.1.1. End user evaluation
In this section we present the User Experience (UX) factors that will be measured in each pilot. Most of these have been gathered from the state-of-the-art UX literature. Some factors are new because no suitable constructs exist yet; these will have to be created and introduced in this project.
Pilot region Spain Germany The Netherlands
Service element User experience factors
Multicam Live
Multicam VoD
Abenteuer Liebe
TVAG
DRM
Recommender
2nd Screen
aesthetics/ appeal
X X X X
distraction X
empowerment X
engagement X X X X X
enjoyment X X X X X
expectations X X X X X X X
flow X X
habits X X X X X
motivation X X
overall UX X X X X X X
participation X
reciprocity X
sociability X X
social image X
social presence X X
usability X X X X X X X
willingness-to-pay X
Table 1: UX factors to be used for the service elements
6.1.2. Professional user evaluation
For the Dutch pilots, interviews and online questionnaires are foreseen to gather the opinions of professional users about the outcomes of all three scenarios.
In the German pilots, RBB will concentrate on the end-user app, which will be developed in conjunction with professional users and the type of content they can and will provide. As no professional Content Management System (CMS) is being developed to create the app, there will be no dedicated technical professional-user tests. Feedback on the general concept and experience will be gathered from an editorial perspective, before the start of the pilot, during it, and after it has concluded.
In the Spanish pilot, validation of the TV-RING technologies from the point of view of professional users will be achieved through two sets of research actions:
- For the global CDN element, a series of in-depth interviews with the professionals from TVC and RTV who are involved in the pilot. These interviews will be conducted towards the end of pilot phase 1 (around November and December), and will be focused on confirming that professionals feel comfortable working with the pilot's technologies, and on eliciting suggestions for small tweaks and improvements.
- For the local controlled CDN element, professional users from I2CAT and project collaborator Guifinet will be involved in the four focus groups planned at the beginning and the end of each pilot phase.
6.2. Technical evaluation of platform
The following table gathers the elements to be evaluated (contents, applications and backend platform services) that have been developed in TV-RING. Depending on the specific element, different approaches will be followed.
Element / End User / Professional User / Usage (Metrics) / Performance (Metrics)
Local Managed CDN X X X
Global CDN X ? X
Multi Camera Service X X
Multi Camera Content / Oh Happy Day X X X
MPEG-DASH Transcoder X ? X
AL app X X X X
TVAG X X X
Recommendations X X X X
DRM X X X X
Quiz - second screen X X X X
Table 2: Evaluation methods to be used for the service elements
6.2.1. Technical measurements
The technical measurements for the performance evaluation of the pilots will be classified in the following categories:
- Location - Engagement - Actions - Devices - Traffic (with and without MPEG-Dash)
The following tables give an overview of the measurement parameters. There is some risk that measurements will not provide useful data, because most web analytics tools are developed for normal (X)HTML applications and will have to be modified for HbbTV. Another reason some measurements may fail is that MPEG-DASH is a very new standard: implementation of the MPEG-DASH-related features is at a very early stage and hardly tested. The risks are described for each category in the tables.
Category: Location
Definition: The physical location of the user who used the respective TV-RING-Apps/Services.
Sources: IP-Address, Geo-location databases
Tools: PIWIK, Google Analytics, Akamai LunaControl, Adobe Analytics (formerly Omniture SiteCatalyst), Other internal
Risks: Precision may be low if there are legal issues regarding the storage of IP addresses
Some regions and countries do not yield high precision
Spanish Pilot / Dutch Pilot / German Pilot
Parameters: Country x x x
Region x x
City x
Responsible: TVC NPO/PPG RBB
Periodicity2: m m w
Table 3: Location parameters
2 Periodicity values: annually (y), monthly (m), weekly (w), daily (d)
Category: Engagement
Definition: Measurement of the duration of the visits
Sources: Included JavaScript, Tracking Pixel
Tools: PIWIK, Google Analytics, Akamai LunaControl, Adobe Analytics (formerly Omniture SiteCatalyst), Other internal
Risks: If cookies are deleted, it is not possible to detect returning visitors
Legal issues regarding user-engagement tracking
Spanish Pilot / Dutch Pilot / German Pilot
Parameters: Visits per visit duration [min:sec] x x x
Average duration for returning visitors [min:sec] x x x
Responsible: TVC NPO/PPG RBB
Periodicity: m m w
Table 4: Engagement parameters
Category: Actions
Definition: Basic behaviour of the user when using the app/service
Sources: Included JavaScript, Tracking Pixel
Tools: PIWIK, Google Analytics, Akamai LunaControl, Adobe Analytics (formerly Omniture SiteCatalyst), Other internal
Risks: The JavaScript engine may not be fully implemented on older set-top boxes, so possibly not all parameters can be measured
This measurement only captures behaviour across the different "pages" within the HbbTV app
Dynamically generated content in the frontend ("AJAX") may not be fully captured by the tools
It is difficult to determine the reason for broken streams (e.g. application failure, problems with the provider, ...)
Spanish Pilot / Dutch Pilot / German Pilot
Parameters: Pageviews [number] x x x
Average generation time of the site [sec] (x) tbc x x
Average time on page [sec] x x x
Entry Page [URL] x x x
Exit Page / Exit Rate [URL][%] x x x
Responsible: TVC NPO/PPG RBB
Periodicity: m m w
Table 5: Actions parameters
Category: Devices
Definition: Devices used by the end user
Sources: Included JavaScript, Tracking Pixel
Tools: PIWIK, Google Analytics, Akamai LunaControl, Adobe Analytics (formerly Omniture SiteCatalyst), Other internal
Risks: The JavaScript engine may not be fully implemented on older set-top boxes, so possibly not all parameters can be measured
Some devices will not be in the USER-AGENT database at the beginning of the measurements
Spanish Pilot / Dutch Pilot / German Pilot
Parameters: Used device [set-top box] x x x
Manufacturer [brand] x x x
Model/Firmware version [modelNr][firmwareVersion] x x x
Used browsers [browserFamily] x x x
Resolution of the device [width|height] (x) tbc x
Responsible: TVC NPO/PPG RBB
Periodicity: m y w
Table 6: Devices parameters
Category: Traffic (without MPEG DASH)
Definition: Measurements regarding the traffic of the pilot applications
Sources: Internal databases, Logfiles
Tools: PIWIK, Google Analytics, Akamai LunaControl, Adobe Analytics (formerly Omniture SiteCatalyst), Other internal
Risks: It is difficult to determine the reason for broken streams (e.g. application failure, problems with the provider, ...)
Spanish Pilot / Dutch Pilot / German Pilot
Parameters: Video requests per programme [count] x x
Video requests during pilot duration [count] x x
Total bandwidth [Mbit/s] x x
Video stream size [byte] x x
Accumulated traffic [Mbit/s] x
Measured traffic per programme [Mbit/s] x
Responsible: TVC NPO/PPG RBB
Periodicity: m w
Table 7: Traffic parameters (without MPEG DASH)
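Since log files are among the stated sources for the traffic measurements, the derivation of two of the parameters above (video requests per programme and video stream size) can be sketched as a simple log aggregation. The (programme_id, bytes_sent) entry format is an assumption for illustration; real web-server log formats differ.

```python
# Sketch of deriving per-programme traffic parameters from web-server log
# entries. Each entry is assumed to be (programme_id, bytes_sent); real log
# formats would first need to be parsed into this shape.
def per_programme_stats(log_entries):
    stats = {}
    for programme, nbytes in log_entries:
        requests, total = stats.get(programme, (0, 0))
        stats[programme] = (requests + 1, total + nbytes)
    return stats  # programme -> (request count, total bytes streamed)
```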
Category: Traffic (only MPEG-DASH)
Definition: Measurements regarding the traffic of the pilot applications
Sources: Internal databases, Logfiles
Tools: PIWIK, Google Analytics, Akamai LunaControl, Adobe Analytics (formerly Omniture SiteCatalyst), Other internal
Risks:
It is difficult to determine the reason for broken streams (e.g. application failure, problems with the provider, ...)
The end-user devices choose the video quality, not the broadcaster's server. This depends on each manufacturer's DASH implementation, which all work slightly differently and not always perfectly.
Spanish Pilot / Dutch Pilot / German Pilot
Parameters: Video requests per programme [count] (x) tbc x
Video requests during pilot duration [count] (x) tbc x
Total bandwidth [Mbit/s] (x) tbc *
Video stream size [byte] (x) tbc *
Accumulated traffic [Mbit/s] (x) tbc *
Measured traffic per programme [Mbit/s] (x) tbc *
Responsible: I2CAT/RTV IRT
Periodicity: w (tbc) **
Table 8: Traffic parameters for MPEG DASH
*German pilot: The MPEG-DASH service is not yet implemented on the streaming platform used (Akamai), so these values cannot currently be assessed for the pilot. If the service becomes supported during the pilot phase, it will be possible to provide them.
**German pilot: There is no experience with these values for MPEG-DASH streaming, so it is not yet clear which measurement periods would support an appropriate analysis.
7. Evaluation Methodology
7.1. Applications
7.1.1. Methodology developed for TV-RING UX Evaluation
This section describes how we developed the UX evaluation methodology to be applied in the TV-RING pilots. This methodology is based on the latest insights from the literature on how to conduct scientifically rigorous user experience evaluations. In several critical reviews of past UX evaluation practices, a number of concerns were formulated:
There is still a lack of systematic research on how to evaluate and measure UX [28].
When looking at when user experience is evaluated, it was observed that most evaluations only occur after use. Studies of actual use, or before use (expectations for example), are rare [28].
There are no truly longitudinal studies of UX [1].
Scientific quality is a big concern in studies incorporating UX before use: non-validated metrics are used, and self-created metrics are neither documented nor clearly defined [1][28].
(Too) many UX constructs exist; the relation between these constructs is rarely clarified or investigated [1].
To address these issues we made sure
to collect relevant UX constructs together with existing validated instruments for measuring them,
to set up measurements before pilot deployment, during pilot deployment and afterwards,
to make use of qualitative and quantitative data collection methods,
to obtain objective and subjective information,
to use correct procedures for the creation of novel UX measures that have to be created [7],
to combine retrospective UX methods with longitudinal approaches.
Based on these insights we created a series of steps by which we aim to address these issues. The procedure will allow the TV-RING project partners to:
perform proper quantifiable modelling, and have generalizable findings
gain sufficient qualitative insight into the how and why
predict which UX qualities (criteria) a user is very likely to experience with an interactive entity of interest, given integration and interaction of specific UX factors (predictors) [23]
determine the likelihood a user will purchase or adopt a system/product/service (criterion), based on a specific set of user experiences (predictors) [23]
The following procedure sketches how the TV-RING UX evaluation methodology was created:
1. Based on a review of the literature, Pilot UX Evaluation Templates were created (see Annex 11.1), in which the responsible partners could fill in their pilot evaluation details. Given that the partners responsible for the pilots knew their applications and test environment best, and that KU Leuven is expert in User Experience Evaluation, these documents were completed in a number of iterations in which feedback was exchanged.
a. The first section contained a brief description of the pilot, the application, and the number of users.
b. The next section in this document concerned the focus of the pilot evaluation: the responsible partners were to formulate the research questions in their own words (“What do you want to learn from this pilot”?).
c. The section after this linked the Research Questions of each pilot scenario to related UX measures in one table. To help the partners with this, a separate document was created containing relevant UX measures (see Annex 11.2). These measures were collected from literature. We also made sure these covered the six categories of UX formulated in [6]: emotional value, interpersonal value, epistemic value, functional value, and conditional value (see Figure 2).
d. The final section contained a table to link the methods to the chosen measures. There were three methods tables: one for carrying out a measurement before the start of the pilot (expectations etc.), one for measuring during the deployment of the applications, and one for a final evaluation after deployment. For this final section we also included a table containing an overview and description of established methods for UX evaluation (see Annex 11.3). In addition, a number of existing UX evaluation procedures from the literature (exemplars such as [20]) were distributed for inspiration.
2. After the UX templates were completed, KU Leuven gathered all the input, reviewed and processed the information to create the UX evaluation plan. This plan makes sure that specific UX measures can be targeted in the different application scenarios and that at the same time, a number of measures can give some cross-pilot insights.
Image 2: Values in Action: overall UX framework [6]
7.1.2. The TV-RING UX Evaluation Methodology
In this section we explain the methodology in detail. An overview was created in Excel; however, because of its size and complexity, it is included in the Annex (see Annex 11.4). A number of important points across pilots have to be made first. The following table gives a high-level overview of the UX evaluation methodology:
Larger-scale measurements:
- Before pilot deployment: baseline measurements (expectations, current use situations) via questionnaires
- During an episode of use: mainly logging, which can occur without disturbing the participant
- Right after an episode of use: short questionnaires
- After pilot deployment: one final survey, inquiring about the entire, overall experience
In-depth investigation:
- Before pilot deployment: baseline measurements (expectations, current use situations) via interviews and focus groups
- Right before an episode of use: interviews about what people are expecting for the coming episode of use
- During an episode of use: observation (probably via installation of cameras in households)
- Right after an episode of use: interviews about what happened, about observations, about the experience that just happened
- After pilot deployment: one final, overall assessment, via focus groups or UX curve interviews; the latter charts the retrospective user experience
The UX evaluation comprises three main periods: before the deployment of the application, during the deployment, and afterwards. Before deployment we can gain insight into the way participants currently use similar applications, and what their expectations are. During deployment we foresee three periods: right before an episode of use, during an episode of use, and right after an episode of use. By an episode of use we mean the period in which participants are actually using an application; in some pilots, such as the Multicam Live and 2nd Screen pilot applications, this refers to the broadcast time of the respective shows, during which certain functionality can be used. The entire period of pilot deployment will contain many episodes of use, with periods of non-use in between. In all pilots except the DRM pilot, we foresee more lightweight measurements (mainly quantitative) that can be conducted with a larger number of users, and more intensive evaluations (mainly qualitative) that can only be conducted with a more limited number of users. For each measurement period we have defined the UX factors to be measured, and the methods by which they will be evaluated.
7.1.2.1. Spain - Multicam Live Application Scenario
This section contains the UX evaluation methodology for the Multicam Live application scenario in Spain. For this scenario, a focus group session will be carried out before and after the deployment of each of two pilot phases, in which end users and project professionals will work together to understand and co-create aspects of the application which need further elucidation and polishing.
Also, right after each episode of use of the application, user panel members will answer a short online satisfaction questionnaire, to obtain a quantitative measure of satisfaction and a benchmark with which to assess the application's progress towards an optimal solution from the user's point of view.
However, the bulk of the user experience evaluation data is expected to come from video observations and in-depth in situ interviews with the user panel households. These users will record with their own devices their reactions and interactions while they watch the show live on Saturday night, and send the recordings to the project researchers for further analysis. The post-event semi-structured interview will serve as a participant debriefing of the whole experience, and clarify any aspect that requires a deeper understanding:
Table 9: UX evaluation methods for Spanish pilot Multicam Live
7.1.2.2. Spain - Multicam VoD Application Scenario
This section contains the UX evaluation methodology for the Multicam VoD application scenario in Spain. This scenario will be evaluated alongside the Multicam Live application scenario, using the same methods and in the same set of research actions.
Table 10: UX evaluation methods for Spanish pilot Multicam VoD
7.1.2.3. Germany - Abenteuer Liebe
This section contains the UX evaluation methodology for the “Abenteuer Liebe” application scenario in Germany. For this scenario, RBB will organise an introductory meeting for all users from the dedicated test panel. At the beginning of the home-use phase, RBB will carry out interviews with each user from the test panel about their expectations. During the pilot phase, each test user will fill out online questionnaires, weekly or at least every second week. At the end of the pilot, RBB will carry out closing interviews with all users from the test panel, followed by a closing event for participants. This process will apply to both phases of the pilot.
The editor(s) responsible for the app during the pilot will also be interviewed before the pilot starts and after it ends. Their feedback will additionally be documented regularly through interviews or questionnaires, weekly or every second week.
As the application will be openly available on public TV in Germany, we expect to gather a larger amount of quantitative data about the general usage patterns and technical measurements:
Table 11: UX evaluation methods for German pilot Abenteuer Liebe
7.1.2.4. Germany – TVAppGallery
This section contains the UX evaluation methodology for the TVAppGallery. For this element an in-depth evaluation is planned, starting with an introduction, followed by the evaluation itself, and ending with a personal interview. For the evaluation, the user panel members will be asked to answer a comprehensive questionnaire to gather information about the accessibility, effectiveness, usability and attractiveness of the TVAppGallery:
Table 12: UX evaluation methods for German pilot TVAppGallery
7.1.2.5. The Netherlands - DRM
This section contains the UX evaluation methodology for the DRM application scenario. For the larger scale evaluation we will use questionnaires to investigate people’s expectations, motivations and experience concerning the DRM solutions. During use, several technical logs will be kept to investigate how people use the service. After pilot deployment, a more extensive questionnaire will be launched targeting attractiveness, engagement and motivation for the users, overall UX, and usability.
Table 13: UX evaluation methods for Dutch pilot DRM
7.1.2.6. The Netherlands - Recommender
This section contains the UX evaluation methodology for the Recommender application scenario in The Netherlands. For the larger scale evaluation we will use questionnaires to investigate people’s expectations, motivations, and habits concerning recommender systems and choosing what to watch. During the use of the recommender, several technical logs will be kept to investigate how people use it. Furthermore, we will introduce very short questionnaires concerning the relevance of the recommendations, their timing (an item might be suitable on Saturday morning, but not during prime time), and their suitability for the viewers (an item may be a good recommendation for one person when the whole family is watching). After pilot deployment, a more extensive questionnaire will be launched targeting relevance, timing, suitability for the group, overall UX, and usability. For the in-depth evaluation we will conduct Skype or telephone interviews to inquire about how participants used the recommender and why.
Table 14: UX evaluation methods for Dutch pilot Recommender
7.1.2.7. The Netherlands – 2nd Screen
This section contains the UX evaluation methodology for the 2nd Screen application scenario in The Netherlands. This pilot focuses on a 2nd screen app for the Dutch TV programme “De Rijdende Rechter”, and possibly also the Eurosong contest in 2015. This application will augment the TV programme by allowing people to interact with the show by voting, and to play against other members of the household. 10-50 households will take part in the larger scale evaluation, in which the main method of inquiry is a survey covering many UX factors. In the beginning we focus on what people expect from the show and the app, how they currently watch this show, and why. During the show, technical logging data can inform us about the participants’ engagement. Right after the show, participants will receive an online survey with a number of UX measures. One exception will be “flow”, measured via the Flow State Scale, a validated questionnaire for measuring flow consisting of 36 questions. Because of its size, it is not feasible to include it with the other UX measures. Thus, to measure flow, one or two episodes of the show will be used where only flow is measured; all the other episodes will then include the other UX measures and not the Flow State Scale questions. Finally, after the pilot deployment, a final assessment will be carried out, inquiring into how people evaluate the whole pilot period. Next to the larger scale evaluation, we will conduct an in-depth UX evaluation, consisting of in-house visits with pre-episode interviews, observations via camera installations in the home, and post-episode interviews. Given that these methods require a substantial effort, between 5 and 10 households will be recruited. After the pilot deployment, we will also conduct an evaluation into how the households evaluate the whole pilot period.
This will happen via the UX Curve method, an advanced UX evaluation method that allows researchers to chart the evolution of several UX aspects over time, retrospectively.
Table 15: UX evaluation methods for Dutch pilot 2nd Screen
7.2. Platform
The technical evaluation methodology of the services will be the same as used for web applications. At the moment there are no specific HbbTV service evaluation methods or tools. The evaluation methodology consists primarily of three main tasks:
- data collection
- data storage and processing
- analysis of the data
7.2.1. Data collection
Data collection is the most important aspect, but it should run in the background if possible, so that users are not interrupted while using the service. There are two main approaches to collecting user information. The first is to collect data via log files generated by the server. Every request sent from the browser is registered in the log file, usually a text file that is generated anew every day. Typical data that can be gathered from log files includes:
- Date and time of request
- URL
- IP address
- User agent
- Referrer
- Status
- Cookie
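As an illustration, a server log line in the widely used combined log format can be parsed into the fields listed above with a short Python sketch; the sample line and field names below are invented for illustration and do not come from any actual pilot server:

```python
import re

# Combined Log Format (a common Apache/nginx default): IP, timestamp,
# request line, status, response size, referrer and user agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line: str) -> dict:
    """Extract the fields listed above from one access-log line."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else {}

# Invented sample line in the format above.
sample = ('192.0.2.1 - - [03/Sep/2014:20:10:00 +0200] '
          '"GET /hbbtv/index.html HTTP/1.1" 200 5120 '
          '"http://example.org/start" "HbbTV/1.1.1 (;;;;;)"')
fields = parse_line(sample)
```

Analysis tools such as PIWIK or AWStats apply essentially this kind of parsing to each line of the daily log file.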
The second, and most common, way to gather user information is page tagging. This method is applied on the client side: the user agent of the client records the user behaviour. By means of JavaScript and a 1x1-pixel image, it is possible to read the user agent configuration and some of the client's actions. Typical information that can be gathered by page tagging includes:
- Mouse actions: clicks and positions
- Keyboard input such as form content
- Screen resolution
- Installed plugins such as Flash, Java or QuickTime
- Language
- Additional functionalities such as cookies or Java
- Duration or interruption of multimedia files such as videos
- Etc.
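A minimal sketch of how such a tracking pixel works: the client-side script (JavaScript on a real HbbTV device) encodes the measured values into the query string of the 1x1-pixel image request, and the server decodes that query string into one analytics record. Both ends are shown here in Python for brevity; the endpoint URL and parameter names are invented:

```python
from urllib.parse import urlencode, parse_qs

# 1x1 transparent GIF a real endpoint would return so the <img> loads.
PIXEL_GIF = (b'GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00'
             b'!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01'
             b'\x00\x00\x02\x02D\x01\x00;')

def build_pixel_url(base: str, **client_info: str) -> str:
    """Client side: encode the measured values into the pixel request URL."""
    return base + '?' + urlencode(client_info)

def record_hit(query_string: str) -> dict:
    """Server side: decode the query string into one analytics record."""
    return {k: v[0] for k, v in parse_qs(query_string).items()}

# Hypothetical stats endpoint and measurements.
url = build_pixel_url('http://stats.example.org/px.gif',
                      resolution='1280x720', lang='de',
                      action='video_start')
record = record_hit(url.split('?', 1)[1])
```

Because the script runs inside the application, it can report events (video starts, clicks, screen resolution) that never appear in the server access log.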
Compared with the log file output, the data gained with this method is much more extensive; all the information gathered by server log files is also available to client-side methods.
7.2.2. Data storage and processing
Data storage should be considered carefully. A large amount of data can be collected in a short time, so the backend needs sufficient server capacity and performance and should be fail-safe (including a backup mechanism). Two solutions are possible: hosting one's own infrastructure, or using the infrastructure and software of a third party (Software as a Service). Both variants have their advantages and drawbacks; cost, time and privacy issues are the main criteria to be considered. There are several products on both sides:
- Internal storage: PIWIK, Open Web Analytics, AWStats, Webalizer, etc.
- Software as a Service: Google Analytics, Adobe Analytics (SiteCatalyst), Yahoo Web Analytics, etc.
7.2.3. Data analysis
The last task of the evaluation phase is data analysis. Following the table of technical parameters (see chapter 6.2), it is recommended to use the analysis reports introduced below. Each analysis will generally be generated per pilot element, but in some cases it is not possible to obtain the values, e.g. for video parameters when the application contains no video, or for legal reasons, if the institution is not allowed to measure them.
7.2.3.1. Location
Listing of the countries, regions and cities from which users visited the application. The number of users per location will be measured.
Image 3: Measurement of end user location
7.2.3.2. Engagement
A diagram will be created showing visits and returning visits per element over the pilot duration. Depending on the kind of pilot, different active application phases are possible. Some applications are bound to TV shows and are therefore only available to end users during the broadcasting time of the show. In such cases a time scale covering the whole pilot duration makes no sense; each element should use its own appropriate time scale for these diagrams.
Image 4: Measurement of element visits
Image 5: Measurement of returning visits
The next image shows an example of how PIWIK illustrates visit duration. In this case most users stayed between 0 and 10 seconds; it gives just a simple overview.
Image 6: Measurement of visits per visit duration
7.2.3.3. Actions
User interactions in the HbbTV application can also be represented in tables or appropriate diagrams. The values and how they are obtained are defined below.
Page views: The number of times this page was visited.
Unique page views: The number of visits that included this page. If a page was viewed multiple times during one visit, it is only counted once.
Bounce rate: The percentage of visits that started on this page and left the website straight away.
Average time on page: The average amount of time visitors spent on this page (only the page, not the entire website).
Exit rate: The percentage of visits that left the website after viewing this page.
Average generation time: The average time it took to generate the page. This metric includes the time it took the server to generate the web page, plus the time it took for the visitor to download the response from the server. A lower “Avg. generation time” means a faster website for the visitors.
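As a sketch of how these metrics relate to each other, the following Python fragment computes page views, bounce rate and exit rate from a list of visits, each given as the ordered sequence of pages viewed; the page names are invented examples:

```python
from collections import Counter

def page_metrics(visits: list) -> dict:
    """Compute per-page views, bounce rate and exit rate.

    Each visit is the ordered list of pages viewed during that visit."""
    views = Counter()     # total page views per page
    entries = Counter()   # visits that started on the page
    bounces = Counter()   # single-page visits that started on the page
    exits = Counter()     # visits that ended on the page
    for pages in visits:
        views.update(pages)
        entries[pages[0]] += 1
        exits[pages[-1]] += 1
        if len(pages) == 1:
            bounces[pages[0]] += 1
    return {page: {
        'page_views': views[page],
        # bounce rate: share of entries that left immediately
        'bounce_rate': bounces[page] / entries[page] if entries[page] else 0.0,
        # exit rate: share of views of this page that ended the visit
        'exit_rate': exits[page] / views[page],
    } for page in views}

stats = page_metrics([
    ['/start'],                      # a bounce on /start
    ['/start', '/video', '/start'],  # exits on /start
    ['/video', '/start'],
])
```

Tools such as PIWIK or Google Analytics report these same quantities; the sketch only makes the definitions above concrete.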
Image 7: Measurement of user actions
7.2.3.4. Devices
Gathering this information will also be very interesting for the distribution of the different HbbTV versions. The main problem for this measurement is the wide and fast-growing range of end devices. Most tools do not have an up-to-date database of current set-top boxes and connected TVs, so the number of unknown devices is very often high.
Image 8: Device analysis
7.2.3.5. Video traffic
To evaluate the multimedia traffic, some parameters were suggested previously in section 6.2. This part includes every video or audio encoded file delivered through the various HbbTV applications. The intention is to get a meaningful impression of how well the content was received by users and how much traffic it caused. Six parameters were chosen to record and document the traffic; they are explained in more detail here:
Video requests per programme: The video requests for each programme (episode) will be counted periodically. For this count it does not matter whether the users are unique or returning visitors.
Video requests over the pilot duration: This parameter will also be assessed periodically. It represents the sum of all video requests of the application; the values are cumulative over the time periods.
Total bandwidth: This parameter shows the total bandwidth consumption for each period.
Accumulated traffic: The value of this parameter represents the total traffic caused by the application; these values are also cumulative over the time periods.
Measured traffic per programme: The traffic for each programme (episode) will be measured periodically. Again, it does not matter whether the users are unique or returning visitors.
Video stream size: The size of the provided multimedia files will be documented here.
To illustrate these values, it is recommended to use suitable diagrams, depending on the possibilities the analysis tools provide.
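Assuming the raw measurements are available as one record per video request, the periodic and cumulative parameters above could be aggregated roughly as follows; the period labels, programme identifiers and byte counts are invented examples:

```python
from collections import defaultdict
from itertools import groupby

def aggregate_video_traffic(records):
    """Aggregate (period, programme, bytes) records, one per video request,
    into per-programme request/traffic counts and cumulative period totals."""
    per_programme_requests = defaultdict(int)  # video requests per programme
    per_programme_traffic = defaultdict(int)   # measured traffic per programme
    cumulative = []   # (period, total requests so far, total bytes so far)
    total_requests = total_bytes = 0
    for period, group in groupby(sorted(records), key=lambda r: r[0]):
        for _, programme, nbytes in group:
            per_programme_requests[programme] += 1
            per_programme_traffic[programme] += nbytes
            total_requests += 1
            total_bytes += nbytes
        # accumulated traffic and requests over the pilot duration
        cumulative.append((period, total_requests, total_bytes))
    return per_programme_requests, per_programme_traffic, cumulative

reqs, traffic, cum = aggregate_video_traffic([
    ('2014-10-W1', 'ep01', 500_000_000),
    ('2014-10-W1', 'ep01', 300_000_000),
    ('2014-10-W2', 'ep02', 400_000_000),
])
```

In practice these records would come from the CDN or web server logs, and the resulting series would be fed into the weekly diagrams mentioned above.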
8. Pilot evaluation planning
The three pilots will last up to a maximum of 12 months. Following this, a final report with the results of all pilots will be delivered as Deliverable D4.3 “Evaluation results”. This document will contain the same basic information for all pilots, although more information can be provided about each pilot if its partners consider it useful or necessary for their specific purpose.
The main details that need to be defined before starting the pilots include:
- Scope and objectives of the pilots
- Participating users, locations and duration
- Support and Communication plan for the pilot
- Known risks and contingency plans
- Schedule for deploying and conducting the pilot
- Planning of evaluation reporting
8.1. Scope and objective of the pilots
Before commencing each pilot, it is necessary to clearly identify the main purposes for running it and what is expected from its execution. The answers to these questions will guide the reporting. These objectives have been identified by the partners and are listed in section 5 of this document.
8.2. Participating users, locations and duration
The second point to be defined, after answering the first, is how to carry out each pilot in order to achieve those objectives.
We will need to define how many participants we count on, their location, and the duration of the pilot or sub-pilot (testing).
We also need to check that the number of participants is consistent with their location and the duration of the tests; if not, these parameters will need to be adjusted to create a scenario in which all users can easily contribute.
Important points to be taken into consideration are:
- All users have to be able to access the same TV and video content.
- The duration should not be too long, as this decreases end users’ interest in participating in the tests.
- Internet access must not be an obstacle or an excuse for not participating in the tests (for example, in some rural areas).
- The sample of pilot end users must be sufficiently representative of the population.
8.3. Support and communication plan for the pilot
The pilots have to assume that any test may occasionally need to be modified, whether by mistake or for other reasons, for example due to end-user behaviour or to improve a question or template.
We need to clearly define how these changes are requested, submitted, approved, tested, and implemented.
In case an end user has an issue, or simply a question, we need to define where they can post it. An email account or a telephone line can be very useful here, although a dedicated website for reporting issues, or a FAQ, could also be considered.
When the pilot consortium is aware of an issue, they will need to study and review the situation, prioritize, take decisions and finally fix that issue or just answer the questions.
A process is defined to reach and notify the appropriate personnel:
1. The issue is communicated or detected by pilot operators/administrators.
2. The issue is analysed by pilot operators/administrators.
3. The issue is communicated to the pilot consortium.
4. If applicable, the issue is forwarded to the project consortium.
5. Decisions or actions are agreed upon.
6. Pilot operators execute these actions and communicate with end users.
Image 9: Support and communication plan
8.4. Known risks and contingency plans
As the pilots are complex, several potential risks can be identified in their implementation. Some known risks are listed here, but others that have not been considered could occur.
- Costs above expectations/limits
- HbbTV receivers, like any devices, can crash
- End users Internet connection can fail.
The first two cases will imply an increase in the budget. This should be discussed within the project consortium and decisions taken, depending on how much the budget needs to be increased.
For the third case, we can check the Internet connection and try to fix the problem, or contact the Internet Service Provider (ISP) directly; but in case no solution is found, backup participants should be identified from the beginning of the pilot.
8.5. Schedule for deploying and conducting the pilot
Each pilot will have its own deployment schedule, but all pilots will start in M13 of the project and finish in month M24 at the latest.
8.5.1. Dutch pilot
The Dutch pilot is scheduled from M13 until M23. The first scenario will be the Recommendations, which will begin in M13 (September 2014) and finish in M19 (March 2015). Secondly, the DRM scenario will start in M14 (October 2014) and end in M20 (April 2015). The last scenario to start will be the Quiz second screen, from M17 (January 2015) until M23 (July 2015).
8.5.1.1. Application testing
The Dutch pilot partners want to invite current HbbTV users who already use the NPO ‘Uitzending gemist’ platform on HbbTV. These users already have access to HbbTV via satellite (Canal Digital), Digitenne, fibre operators (Glashart), or cable via the providers CAIWay, SKV Veendam, Cable Nord or Delta. This gives a reach of around 450,000 unique monthly visitors. Pilot partners will create a banner on the existing platform to invite users to join the pilots. Via a QR code or URL they can reach a Google form where they can find more information and leave their e-mail address and contact information.
For the pilots, people must be in possession of a Samsung 2013 or 2014 model that is HbbTV version 0.5 compliant. PPG can detect whether the TV has HbbTV and will only show the banner on these TVs. To check whether the TVs are 0.5 compliant, the owners will have to send their model details, which will be asked for on the Google form.
The Dutch pilot will ask its local partners (CAI, Cable Nord, Delta, SKV and Glashart) to actively approach their users (by digital newsletter or e-mail) to explain the project and invite them to participate. Pilot partners can also use social media to invite people to participate.
If the Dutch pilot does not get enough responses, or there are not enough 0.5-compliant TVs, they can hire a recruitment office to select people to participate.
The pilot needs around 30-40 test users willing to participate. To make participation attractive, NPO can offer them a free account for NL Ziet (and NPO Plus) during the pilot. NL Ziet is the paid (monthly fee) on-demand service from RTL, SBS and NPO. Test users must sign a form consenting to access to their IP data and data logging for a certain period. They must also be willing to complete questionnaires (before, during and after the pilots) and participate in interviews.
Scenario 1 (DRM)
A test group of around 20 individual users, varying in age, gender and type of TV-viewer.
Scenario 2 (Recommendations)
A test group of around 15-20 families of different nature and composition.
Scenario 3 (Quiz- second screen)
10 or more families or groups living or spending time together, of different nature and composition (with and without children, student houses, sport clubs, bars), who are already used to HbbTV and second screens.
8.5.1.2. Platform testing
The acquisition and recruitment of the test group will be planned in August and September 2014. We will recruit test users through the existing HbbTV catch-up TV platform, via the local partners who offer the infrastructure and access for HbbTV in the Netherlands, and via social media. We will organise an introduction meeting for all users from the dedicated test panel, or contact them online before the actual start of the pilots.
In September we will start with interviews on how people watch hybrid TV (the combination of linear and on-demand content) and what their moods are, and organise observations of how user groups watch television. We will repeat these sessions in January, March and April 2015. In addition, we will carry out questionnaires about user perception in the same periods.
To evaluate engagement, professional users will also be interviewed and questioned about their expectations and perceptions during and after the pilots, between September 2014 and April 2015.
Actual usage will be evaluated by analysing log files throughout the whole pilot period, from September 2014 until July 2015. Finally, we will use Comscore and Google Analytics to analyse stream starts, click-through rates, and the number and duration of visits during the whole period, and create monthly reports from September 2014 until August 2015.
8.5.2. German pilot
The initial German pilot is scheduled for project months 15 and 16. The AL App will be on air from the beginning of November 2014 until the end of December 2014. The related TV show will be on air from 17.10.2014 until 18.12.2014, on weekdays at 20:10. The whole series will be re-broadcast in 2015; at the time of writing the provisional planning is early summer, but this is a programme decision and cannot be confirmed until nearer the date.
8.5.2.1. Application testing
For the German pilot, existing HbbTV users will be recruited. These testers will use the service for the time the “Abenteuer Liebe” TV series is on-air. This is expected to be the case for approximately two months. Users will be asked to share their experiences via user experience questionnaires, interviews and a focus group discussion at the end of the trial.
The user acquisition will be planned in September 2014; it will then be decided which channels, for example the RBB Facebook page or the “Kinderkanal” (KiKA channel) website, will be used to recruit users from the target group. Promotional texts and images will be prepared. The recruitment process will start at the end of September 2014 and is expected to finish by the end of October. RBB will organise an introductory meeting for all users from the dedicated test panel. This meeting could be held online if some users live far away. In the first two weeks of November 2014, i.e. at the beginning of the home-use phase, RBB will carry out interviews with each user from the test panel about their expectations (e.g. via phone). In the pilot phase in November and December, each test user will fill out online questionnaires, weekly or at least every second week. At the end of the pilot in January 2015, RBB will carry out closing interviews with all users from the test panel. After this, around February, there will be a closing event for all participants, provided most users do not live too far away.
The editor(s) responsible for the app during the pilot will also be interviewed before the pilot starts at the end of October and after the pilot ends in January 2015. Their feedback will additionally be documented regularly through interviews or questionnaires, weekly or every second week.
This process will be repeated for the second phase of testing. The planned time schedule is pictured in the calendar in section 8.7.2
Preconditions for the users are DVB reception, a fast ADSL or even VDSL internet connection, an HbbTV-enabled TV or set-top box, a smartphone and, if needed, parental consent.
For the evaluation of the MPEG-DASH material used, the German pilot expects to reach the subset of MPEG-DASH-enabled devices (HbbTV v0.5), as only these devices support DASH. For users, however, DASH support is not a precondition for active participation in the pilot. The piloted HbbTV application will make use of a browser detection feature that can differentiate between devices that are MPEG-DASH-enabled and those that are not.
The German pilot aims at recruiting in total 40 test users for pilot participation. A team for managing the contacts and all issues of their involvement during the pilot and evaluation will be set up at RBB.
For the TVAppGallery evaluation, IRT will recruit a group of test users; the marketing team as well as IRT's HbbTV experts can offer contacts for possible test users. The test group will include about 15 persons. One option is to invite the test users to IRT for one day for a collective evaluation. If this is not possible because some interested users live too far away, the evaluation can also be carried out remotely. To participate in the evaluation, a regular HbbTV set-top box or TV with satellite and broadband connection is sufficient. The portal will be offered to the users through a test channel from RBB, or will be integrated by IRT directly into the HbbTV devices.
It is planned to carry out the evaluation in the middle of 2015, so the recruiting of possible test users will start at the beginning of 2015.
8.5.2.2. Platform testing
The application used in the German pilot will be on air and freely available; it is expected to be advertised from the end of October through several RBB channels, for example a radio interview or integration into the TV trailer. This means that, in addition to the dedicated UX evaluation test panel, a much larger number of users is expected to use the application. For the technical platform, all user interaction will be measured with the help of technical tools throughout the pilot phase. Data regarding, for example, clicks per visit, duration and video stream size will be collected through tools like PIWIK, Google Analytics, Akamai LunaControl and web server measurements. Exactly what information is measured will be subject to RBB data protection guidelines.
After the follow-up interviews and the closing event in February 2015, RBB will take care of collecting and statistically processing the data. The data will be prepared and analysed to generate recommendations and ideas for new applications by May 2015.
This process will be repeated for the second phase of testing. The planned time schedule is pictured in the calendar in section 8.7.2
For the TVAppGallery evaluation, the recruiting of test users is planned for the beginning of 2015. As explained for the previous user evaluation, it is not yet decided whether it will take the form of a local or a remote evaluation. Introduction information and instructions will be available for the users. The evaluation will be supported by interviews and questionnaires; another option is to use online questionnaires or telephone interviews. The TVAppGallery will be evaluated in one cycle: the technology behind it is already finished and was sufficiently tested in preceding projects, so there will be no major development work during the pilot phase. The frontend of the portal will be adapted following the evaluation outcomes to improve its attractiveness to potential partners. Nevertheless, one evaluation process is seen as sufficient. After the evaluation, IRT will collect all data and work out a summary.
8.5.3. Spanish pilot
The Spanish pilot will run in two phases from M13 (September 2014) to M23 (July 2015). Pilot phase I has been confirmed and is linked to the second season of a specific TV show, “Oh Happy Day!”. This phase will run from M13 (September 2014) to M17 (January 2015). At the core of this phase there is the period in which the aforementioned show will be on air, from the 1st week of October 2014 to the 2nd week of January 2015.
A second pilot phase has been proposed for M18 (February 2015) to M23 (July 2015). The planning for this pilot phase II is contingent on finding and securing adequate contents, which cannot be known at the present moment, because the selection of next year’s programmes is not yet available. The exact schedule and contents for this pilot phase are to be confirmed in M15 (November 2014), three months before the planned start of phase II.
8.5.3.1. Application testing
In the Spanish pilot, users for the evaluation of the HbbTV application will be reached in two ways, depending on whether they experience the service during the pilot period through the managed or the non-managed CDN.
For the managed CDN side, the committed user panel selected in T3.2 in preparation of the pilot will be mobilised in a series of co-creation and user experience evaluation actions. This user panel will be composed of a group of 15 to 20 households, encompassing between 20 to
50 individual users of mixed demographic profiles (young couples, senior citizens, families with kids, and single-person households). To assist in the recruiting and running of the pilot actions regarding the user panel, a partnership has been formalised with Guifinet, a local network provider with a strong presence and reputation in the area selected for the pilot (Gurb, a town in central Catalonia). The user panel households will receive a TV set with HbbTV 0.5 support, to be used for the pilot activities. Upon successful completion of the pilot actions, user panel households will be ceded the TV sets as compensation for their participation.
The user panel will be involved in a series of data-generating activities, which will be focused on the evaluation of the user experience with the HbbTV applications and contents offered throughout the pilot’s duration. These will be:
- A total of 4 co-creation workshop sessions, at the beginning and end of each of the two pilot phases, in which end users and project professionals will work together to understand and co-create aspects of the application which need further elucidation and polishing.
- After each use of the application, user panel members will be asked to answer a short online questionnaire of satisfaction with the application. This questionnaire has two main purposes. First, to obtain a quantitative measure of satisfaction which will allow the project’s researchers to triangulate these non-technical data with technical metrics of quality of service. And second, to have a satisfaction benchmark with which to assess the progress of the application towards an optimal solution from the user’s point of view.
- Ethnographic methods are one of the main planned sources of knowledge in the pilot. For up to two times in total per user, the researchers of the project will perform a participant observation in a household to watch TV with the volunteer, assess first-hand experience of the user, and identify areas for improvement. To overcome logistic challenges, a participative two-step approach to ethnography will be followed. First, the selected household will be assisted to record with their own devices their reactions and interactions while they watch the show live on Saturday night, and send the recordings to the project researchers for further analysis. And second, the project researchers will visit the household to talk with the users and observe in situ their consumption of on-demand contents. A post-event semi-structured interview will serve as a participant debriefing of the whole experience.
For the non-managed CDN side, strong dissemination efforts are expected to yield an organic growth of the HbbTV market. These users will be enticed to use the TVC application with a series of promotion actions. Adequate audience monitoring and user feedback mechanisms will be in place to ensure that high-quality data on usage and satisfaction is generated.
8.5.3.2. Platform testing
The platform technical evaluation for the controlled CDN scenario will start in September 2014. The evaluation will begin with laboratory measurements in order to refine the data acquisition procedure. The pilot TV show for the first phase of the pilot (Oh Happy Day) will be on air in early October, so during September the data acquisition and its processing will be specified in detail. As it is a weekly TV show, the evaluation metrics will be gathered weekly until the end of the first phase in January 2015.
The platform evaluation from the professional users' perspective will be achieved, as stated in 6.0.2, by performing in-depth interviews with the professionals from TVC and RTV for the global CDN scenario around November/December 2014. For the controlled scenario, the Guifinet and i2CAT professionals will be involved in the four focus groups planned at the beginning and the end of each pilot phase.
8.6. Evaluation reporting
Final execution reporting will be described in deliverable D4.2 "Pilot Execution Report", and at the end of the project the conclusions will be described in deliverable D4.3 "Evaluation Results". While the pilots are running (12 months), reporting will be carried out regularly: every one or two months, consumption and performance data will be reported to the pilot consortium and then to the project consortium. Several end-user tests are identified in each pilot during pilot execution; their outcomes will be reported to the consortium once the data are cleaned and ready to be presented.
8.7. Pilot evaluation calendar
The TV-RING pilots can be described as successions of coordinated actions, scheduled differently in each region. This is mostly due to the interrelation between the applications to be deployed and the associated content. Since the evaluation process is complex, a calendar of activities has been defined per pilot. It will then be possible to merge all pilots and define common milestones that will facilitate the final evaluation and a better organisation of the related tasks.
The following sections give a detailed schedule of all actions linked to the pilot evaluation, aligned with the pilot execution. This information makes it much clearer how the evaluation process is expected to be conducted during the piloting stage. It is important to highlight that this calendar covers only evaluation tasks, not the other actions that are part of each pilot.
8.7.1. Dutch pilot
Image 10: Dutch pilot evaluation calendar
8.7.2. German pilot
Image 11: German pilot evaluation calendar
[Gantt chart, project months 12–30 (August 2014 – February 2016), covering the nominal project schedule (Pilot Germany, T4.2; Final Evaluation of Pilots, T4.5), the pilot execution schedule (AL app and TV show on air) and the evaluation plan schedule: preparation of user acquisition and user panel recruitment; end-user panel evaluation (introductory meeting if feasible, expectation interviews at the beginning of the home-use phase, regular online questionnaires, follow-up interviews at the end of the home-use phase, closing event if feasible); on-air user evaluation of test panel and viewers (advertising of the TV show and application via Facebook and on-air trailers, documented user-support feedback, continuous and regular technical measurement per 8.8.2, in-app questionnaire evaluation); professional user panel evaluation (expectation interview at the beginning of the pilot, editor training, regular interviews or questionnaires, follow-up interview); results aggregation and analysis (collection and statistical processing of data, evaluation and preparation of data, recommendations and ideas for new apps); and TVAppGallery user evaluation (user panel recruitment, evaluation, processing of the results).]
8.7.3. Spanish pilot
Image 12: Spanish pilot evaluation calendar
8.7.4. Common TV-RING evaluation calendar
[Gantt chart combining the three pilot evaluation calendars against the nominal project schedule (Pilots Execution T4.2/T4.3/T4.4; Final Evaluation of Pilots T4.5). Dutch pilot: module deployment (Recommender, DRM, Quiz) alongside the call for a user panel, a literature study on willingness to pay, interviews and observation on how people and user groups watch TV and their moods, questionnaires on viewer perception, and analysis of log files (user engagement), Comscore (stream starts, click-through) and Google Analytics (number and duration of visits). Spanish pilot: planned schedule with call for user panel members and recruitment, "Oh Happy Day" 2nd season on-demand and live-finale multicamera programme tests with a results retrospective, overall results aggregation and analysis, user panel focus groups and video-based observation with post-test interviews (phases I and II), in-app marking evaluation and qualitative non-managed ("like") evaluation; potential schedule with a second on-demand/live test, "Oh Happy Day" 3rd season tests, and La Marató 2014/2015 charity one-day specials with retrospectives; infrastructure work covering RTE CDN interconnection testing and planned/potential i2CAT interconnection. German pilot: as in Image 11 (pilot execution with AL app and TV show on air; user acquisition and panel recruitment; end-user, on-air and professional user panel evaluation; results aggregation and analysis; TVAppGallery user evaluation).]
9. Conclusions
In this document, each TV-RING pilot site has provided a comprehensive evaluation plan for the services it intends to test. A fair part of the work was based on the know-how and expertise of the TV-RING partners KU Leuven and IRT, who suggested approaches and provided templates with parameters and methodologies to be considered for evaluating both the user experience and the technical performance of all pilot services in the TV-RING pilot areas. Substantial work has gone into harmonising the evaluation plan while coping with the varied nature of the services being tested, as well as with the implications of running pilots in multiple locations and involving users in different ways. Thanks to the cooperation and collaboration among partners, the project is following a cohesive, coordinated approach to testing and evaluation. Based on this approach, we are optimistic that the TV-RING pilots will produce reliable and meaningful results that will in turn be of use to the whole HbbTV and connected TV ecosystem. The evaluation results will be documented in Deliverable D4.3, due in project month 30.
10. Bibliography & References
1. Bargas-Avila, J.A. and Hornbæk, K. Old wine in new bottles or novel challenges: a critical analysis of empirical studies of user experience. Proceedings of the 2011 annual conference on Human factors in computing systems, ACM (2011), 2689–2698.
2. Brooke, J. SUS-A quick and dirty usability scale. Usability evaluation in industry 189, (1996), 194.
3. Chuttur, M. Overview of the Technology Acceptance Model: Origins, Developments and Future Directions. 2009. http://sprouts.aisnet.org/9-37/.
4. Desmet, P. Measuring emotion: Development and application of an instrument to measure emotional responses to products. In Funology. Springer, 2005, 111–123.
5. Desmet, P.M., Hekkert, P., and Jacobs, J.J. When a Car Makes You Smile: Development and Application of an Instrument to Measure Product Emotions. Advances in consumer research 27, 1 (2000).
6. Fuchsberger, V., Moser, C., and Tscheligi, M. Values in Action (ViA): Combining Usability, User Experience and User Acceptance. CHI ’12 Extended Abstracts on Human Factors in Computing Systems, ACM (2012), 1793–1798.
7. Green, W., Dunn, G., and Hoonhout, J. Developing the scale adoption framework for evaluation (SAFE). International Workshop on, Citeseer (2008), 49.
8. Hassenzahl, M. The Interplay of Beauty, Goodness, and Usability in Interactive Products. Hum.-Comput. Interact. 19, 4 (2008), 319–349.
9. Hassenzahl, M., Burmester, M., and Koller, F. AttrakDiff: Ein Fragebogen zur Messung wahrgenommener hedonischer und pragmatischer Qualität. In Mensch & Computer 2003. Springer, 2003, 187–196.
10. Izard, C.E., Libero, D.Z., Putnam, P., and Maurice, O. Stability of emotion experiences and their relations to traits of personality. Journal of Personality and Social Psychology 64, 5 (1993), 847–860.
11. Jackson, S.A. and Eklund, R.C. Assessing Flow in Physical Activity: The Flow State Scale-2 and Dispositional Flow Scale-2. Human Kinetics Journals, 2010.
12. Jackson, S.A. and Marsh, H.W. Development and Validation of a Scale to Measure Optimal Experience: The Flow State Scale. Human Kinetics Journals, 2010.
13. Jain, J. and Boyce, S. Case study: longitudinal comparative analysis for analyzing user behavior. Proceedings of the 2012 ACM annual conference extended abstracts on Human Factors in Computing Systems Extended Abstracts, ACM (2012), 793–800.
14. Kahneman, D. (2010). The riddle of memory vs. experience. http://www.ted.com/talks/daniel_kahneman_the_riddle_of_experience_vs_memory
15. Kahneman, D., Krueger, A.B., Schkade, D.A., Schwarz, N., and Stone, A.A. The Day Reconstruction Method (DRM): Instrument Documentation. (2004). Retrieved April 3, 2005.
16. Karapanos, E., Zimmerman, J., Forlizzi, J., and Martens, J.-B. User experience over time: an initial framework. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM (2009), 729–738.
17. Karapanos, E., Martens, J.-B., and Hassenzahl, M. Reconstructing Experiences through Sketching. arXiv:0912.5343 [cs], (2009).
18. Knijnenburg, B.P., Willemsen, M.C., Gantner, Z., Soncu, H., and Newell, C. Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction 22, 4-5 (2012), 441–504.
19. Kujala, S. and Miron-Shatz, T. Emotions, Experiences and Usability in Real-life Mobile Phone Use. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM (2013), 1061–1070.
20. Kujala, S., Roto, V., Väänänen-Vainio-Mattila, K., Karapanos, E., and Sinnelä, A. UX Curve: A method for evaluating long-term user experience. Interacting with Computers 23, 5 (2011), 473–483.
21. Lang, P.J. Behavioral treatment and bio-behavioral assessment: Computer applications. (1980).
22. Lavie, T. and Tractinsky, N. Assessing dimensions of perceived visual aesthetics of web sites. International Journal of Human-Computer Studies 60, 3 (2004), 269–298.
23. Law, E.L.-C. The measurability and predictability of user experience. Proceedings of the 3rd ACM SIGCHI symposium on Engineering interactive computing systems, ACM (2011), 1–10.
24. Lewis, J.R. IBM computer usability satisfaction questionnaires: Psychometric evaluation and instructions for use. International Journal of Human-Computer Interaction 7, 1 (1995), 57–78.
25. McAuley, E., Duncan, T., and Tammen, V.V. Psychometric Properties of the Intrinsic Motivation Inventory in a Competitive Sport Setting: A Confirmatory Factor Analysis. Research Quarterly for Exercise and Sport 60, 1 (1989), 48–58.
26. Sheth, J.N., Newman, B.I., and Gross, B.L. Why we buy what we buy: A theory of consumption values. Journal of Business Research 22, 2 (1991), 159–170.
27. Snillito, M.L. and de Marie, D. Value: its measurement, design and management. (1992).
28. Vermeeren, A.P.O.S., Law, E.L.-C., Roto, V., Obrist, M., Hoonhout, J., and Väänänen-Vainio-Mattila, K. User experience evaluation methods: current state and development needs. Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries, ACM (2010), 521–530.
11. Annex
11.1. Pilot UX Evaluation Template
Pilot scenario description
In this section, please provide a short description of the application scenario in each pilot. For example, the Dutch pilot has three application scenarios; each one requires a separate template document.
Table 0. Pilot scenario description information
Pilot scenario
Short scenario description (1 paragraph)
Country / region
Target group (number of people, ages, individuals-households, type of viewer…)
Target device(s)
Total scenario deployment time period (month X1 – month X2)
Research question
Try to formulate what you want to learn from your pilot. What is your focus? You don’t have to complete 4 research questions; 3 is OK – 6 is also OK. This is a first iteration. When we (KU Leuven) receive your input, we will look at it, and provide feedback if necessary.
Table 2. Overview of research questions
Research Question 1
Research Question 2
Research Question 3
Research Question 4
Selected Evaluation Measures
Indicate a number of measures you believe might be interesting for your pilot – measures you believe will provide some insight into your research questions. The measures are available in a separate document (D4.1 UX Measures.xlsx). You can include your own measures if you cannot find them in our list.
Table 3. UX Measures and related research question
UX Measure Related Research Question(s) Number(s)
Preferred methods
Describe which methods you would like to use to answer your research questions. A selection of methods can be found in a separate document (D4.1 UX Methods.xlsx). Even more UX methods can be found at http://www.allaboutux.org. Finally, you can also use your own methods if that is required for your pilot scenario.
Make sure you include:
- What-people-say methods AND what-people-do methods
- Qualitative AND quantitative data gathering methods
If you are not certain about the right method, don't worry. We will help every partner set up the proper methodology based on your input.
Before start of scenario deployment
Method Related Measure
During scenario deployment
Method Related Measure
After scenario deployment
Method Related Measure
11.2. UX Measures Table
11.3. UX Methods Overview
11.4. TV-RING Complete UX Evaluation Methodology
Image 13: Evaluation methodology for the Dutch pilot
[Matrix: for each Dutch scenario (2nd screen, Recommender, DRM), UX measures and methods are mapped to the phases before pilot deployment, before/during/after each episode of use, and after pilot deployment. The larger-scale evaluation relies on questionnaires (expectations, current habits, motivation, aesthetics/appeal, distraction, engagement, enjoyment, sociability, social image, social presence, overall UX and, for DRM, willingness-to-pay), technical logging of engagement and usage, the IBM ASQ and SUS usability instruments, and the Flow State Scale. The in-depth UX evaluation covers the same measures through interviews, observation, and the UX Curve / interview; the Recommender scenario additionally uses TV questionnaires and Skype or telephone interviews on usability, relevance, timing, suitability for the group and overall UX.]
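Several of the larger-scale instruments named in the methodology (e.g. SUS) have fixed, published scoring rules. As a minimal illustrative sketch, not project code, the standard SUS formula from Brooke (1996) can be expressed as:

```python
# Standard SUS scoring (Brooke, 1996): ten items rated 1-5.
# Odd-numbered items contribute (rating - 1), even-numbered items
# contribute (5 - rating); the sum multiplied by 2.5 gives a 0-100 score.
def sus_score(ratings):
    if len(ratings) != 10:
        raise ValueError("SUS requires exactly 10 item ratings")
    total = sum((r - 1) if i % 2 == 0 else (5 - r)  # i = 0 is item 1 (odd)
                for i, r in enumerate(ratings))
    return total * 2.5
```

For example, a respondent who answers 5 on every odd-numbered item and 1 on every even-numbered item obtains the maximum score of 100.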
Image 14: Evaluation methodology for the German pilot
[Matrix: for the German scenario (Abenteuer Liebe), the larger-scale evaluation combines pre-deployment questionnaires on expectations and current habits, technical logging of engagement during use, post-episode and post-deployment questionnaires on distraction, empowerment, engagement, enjoyment, participation, reciprocity, sociability, social presence and overall UX, the IBM ASQ and SUS usability instruments, and the Flow State Scale. The in-depth UX evaluation covers the same measures, plus usability, through interviews, observation, and the UX Curve / interview.]
Image 15: Evaluation methodology for the Spanish pilot
[Matrix: for both Spanish scenarios (Multicam Live and Multicam VoD), the larger-scale evaluation uses pre-deployment questionnaires on expectations and current habits, technical logging of engagement and usage during use, post-episode and post-deployment questionnaires on aesthetics/appeal, engagement, enjoyment and overall UX, and the IBM ASQ and SUS usability instruments. The in-depth UX evaluation uses focus groups and interviews before deployment, observation and interviews around each episode of use, and focus group / co-creation sessions after deployment, covering aesthetics/appeal, engagement, enjoyment, usability and overall UX.]
11.5. General Calendar – Printable version