version 1.5, 03/09/2014
DELIVERABLE
Project Acronym: TV-RING
Grant Agreement number: 325209
Project Title: Television Ring - Testbeds for Connected TV services using HbbTV
D4.1.1 Evaluation Plan
Revision: 1.5
Authors:
Sven Glaser (RBB)
Daniel Giribet (TVC)
Jordi Payo (TVC)
Franz Baumann (IRT)
Jeroen Vanattenhoven (KUL)
Project co-funded by the European Commission within the ICT Policy Support Programme
Dissemination Level
P Public x
C Confidential, only for members of the consortium and the Commission Services
Abstract: This document gathers the overall planning for the evaluation of the TV Ring pilots, including the parameters chosen, the methods to be used and the schedule to carry out these activities.
Revision History
Revision Date Author Organisation Description
0.0 28/05/2014 Sven Glaser RBB ToC improved
0.1 10/07/2014 Sven Glaser RBB Incorporated all partner input
0.2 11/07/2014 Sven Glaser RBB Incorporated additional partner input
0.2.1 14/7/2014 Daniel Giribet, Jordi Payo TVC Incorporated additional TVC input
0.3 14/7/2014 Franz Baumann IRT Added tables and description to 6.2
0.4 16/7/2014 David Pujals RTV chapter 8 introduction
0.5 16/7/2014 Jennifer Müller RBB Incorporated additional partner input
0.7 17/7/2014 Jennifer Müller RBB Integration of calendars and proposal for pilot evaluation timeline 8.7.2
0.8 17/7/2014 Jennifer Müller RBB Integration of KUL input
0.9 22/7/2014 Marc Aguilar I2CAT Added ch. 6 intro and contents for Spanish pilot user evaluation
1.0 25/7/2014 Jeroen Vanattenhoven KUL Added Dutch recommender pilot scenario evaluation methodology (input NPO & KUL), and Spanish pilot UX evaluation descriptions (input I2CAT)
1.1 05/08/2014 Annette Wilson RBB Added Executive summary, Introduction and conclusion. Language check.
1.2 05/08/2014 Sven Glaser RBB Final check. Removing comments and other metadata. Re-structuring (tightening) of chapters 8.5 – 8.8.
1.3 18/08/2014 Pau Pamplona I2CAT Format revision.
1.4.1 21/08/2014 Sven Glaser RBB Final revision.
1.4.2 24/08/2014 Sergi Fernández I2CAT Final revision.
1.5 03/09/2014 Sven Glaser RBB Last revision.
Statement of originality:
This deliverable contains original unpublished work except where clearly indicated otherwise. Acknowledgement of previously published material and of the work of others has been made through appropriate citation, quotation or both.
Disclaimer
The information, documentation and figures available in this deliverable are written by the TV-RING (Testbeds for Connected TV services using HbbTV) project consortium under EC grant agreement ICT PSP-325209 and do not necessarily reflect the views of the European Commission. The European Commission is not liable for any use that may be made of the information contained herein.
1. Executive Summary
This deliverable documents the work TV-RING partners have conducted to date in Task 4.1. The objective of this task is to create an evaluation plan, including objective and subjective measurements of the TV-RING services, and to evaluate the real interest of end-users in connected TV services through Next Generation Access (NGA) networks – specifically networks providing downlink speeds of 20 Mbit/s and more.
TV-RING covers three pilot countries, in each of which different services are tested. The Dutch pilot will test three applications: a streaming application requiring DRM (Digital Rights Management), a recommender application and a second-screen application. The German pilot evaluation will be focussing on an application with the working title “Abenteuer Liebe” (“Adventure Love”) which will accompany the TV series with the same title. Tests with the app will investigate UHD (ultra-high definition) content and social interaction. A further aspect of the German TV pilot is the TVAppGallery – a portal and directory service developed to offer better market access for developers/providers of connected TV applications. The Spanish pilot will evaluate high quality video transmission over a CDN (on a controlled network and on the internet) and advanced interactivity using MPEG-DASH (letting users select multiple points of view).
For each of these services, a set of objectives was defined. The approach is, on the one hand, to test the user experience (UX) of the apps involved. These tests will involve smaller groups of dedicated testers. On the other hand, the aim is to conduct an analysis and optimisation of the platforms’ technical parameters, which will involve a larger number of users.
Under the guidance of KU Leuven, a suitable methodology for scientifically rigorous user experience evaluations was developed, based on the latest literature and past experience. The UX evaluation comprises three main periods: before the deployment of the application, during the deployment of the application, and afterwards. The methodology used for the technical evaluation is the one mainly used for web applications and streaming video content.
Each pilot site coordinator has provided a schedule for deploying and conducting the pilot. Reporting procedures have been specified and the overall timing has been defined. Calendars are available for each pilot, enabling a quick overview of the scheduled activities.
The results of the evaluation will be reported in Deliverable D4.3 once activities in the pilots have been concluded.
2. Contributors
First Name Last Name Company e-Mail
Marc Aguilar I2CAT [email protected]
Franz Baumann IRT [email protected]
Daniel Giribet TVC [email protected]
Sven Glaser RBB [email protected]
Jennifer Müller RBB [email protected]
Jordi Payo TVC [email protected]
David Pujals RTV [email protected]
Jeroen Vanattenhoven KUL [email protected]
Aylin Vogl IRT [email protected]
Annette Wilson RBB [email protected]
Ralf Neudel IRT [email protected]
Content
1. Executive Summary ................................................................................................................ 3
2. Contributors ........................................................................................................................... 4
3. Introduction ........................................................................................................................... 9
4. Action Log ............................................................................................................................ 10
5. Objectives ............................................................................................................................ 11
5.1. General evaluation planning objectives ...................................................................... 11
5.1.1. Dutch pilot ........................................................................................................... 11
5.1.2. German pilot ....................................................................................................... 12
5.1.3. Spanish pilot ........................................................................................................ 13
5.2. Service elements ......................................................................................................... 13
5.2.1. Dutch pilot ........................................................................................................... 14
5.2.2. German pilot ....................................................................................................... 17
5.2.3. Spanish pilot ........................................................................................................ 20
6. Approach .............................................................................................................................. 24
6.1. User evaluation of applications ................................................................................... 24
6.1.1. End user evaluation ............................................................................................. 24
6.1.2. Professional user evaluation ............................................................................... 26
6.2. Technical evaluation of platform ................................................................................ 26
6.2.1. Technical measurements .................................................................................... 28
7. Evaluation Methodology ...................................................................................................... 34
7.1. Applications ................................................................................................................. 34
7.1.1. Methodology developed for TV-RING UX Evaluation ......................................... 34
7.1.2. The TV-RING UX Evaluation Methodology .......................................................... 36
7.2. Platform ....................................................................................................................... 46
7.2.1. Data collection ..................................................................................................... 46
7.2.2. Data storage and processing ............................................................................... 46
7.2.3. Data analysis ........................................................................................................ 47
8. Pilot evaluation planning ..................................................................................................... 51
8.1. Scope and objective of the pilots ................................................................................ 51
8.2. Participating users, locations and duration ................................................................ 51
8.3. Support and communication plan for the pilot........................................................... 51
8.4. Known risks and contingency plans ............................................................................ 53
8.5. Schedule for deploying and conducting the pilot ....................................................... 53
8.5.1. Dutch pilot ........................................................................................................... 53
8.5.2. German pilot ....................................................................................................... 54
8.5.3. Spanish pilot ....................................................................................................... 56
8.6. Evaluation reporting .................................................................................................... 58
8.7. Pilot evaluation calendar ............................................................................................. 58
8.7.1. Dutch pilot ........................................................................................................... 59
8.7.2. German pilot ....................................................................................................... 60
8.7.3. Spanish pilot ........................................................................................................ 61
8.7.4. Common TV-RING evaluation calendar ............................................................... 62
9. Conclusions .......................................................................................................................... 63
10. Bibliography & References .............................................................................................. 64
11. Annex............................................................................................................................... 66
11.1. Pilot UX Evaluation Template .................................................................................. 66
11.2. UX Measures Table .................................................................................................. 70
11.3. UX Methods Overview ............................................................................................ 72
11.4. TV-RING Complete UX Evaluation Methodology .................................................... 73
11.5. General Calendar – Printable version ..................................................................... 77
Table of Figures
Image 1: TV-RING pilots evaluation approach ............................................................................ 24
Image 2: Values in Action: overall UX framework [6] ................................................................. 35
Image 3: Measurement of end user location .............................................................................. 47
Image 4: Measurement of element visits ................................................................................... 48
Image 5: Measurement of returning visits ................................................................................. 48
Image 6: Measurement of visits per visit duration ..................................................................... 48
Image 7: Measurement of user actions ...................................................................................... 49
Image 8: Device analysis ............................................................................................................. 50
Image 9: Support and communication plan ................................................................................ 52
Image 10: Dutch pilot evaluation calendar ................................................................................. 59
Image 11: German pilot evaluation calendar .............................................................................. 60
Image 12: Spanish pilot evaluation calendar .............................................................................. 61
Image 13: Evaluation methodology for the Dutch pilot .............................................................. 74
Image 14: Evaluation methodology for the German pilot .......................................................... 75
Image 15: Evaluation methodology for the Spanish pilot ........................................................... 76
Table 1: UX factors to be used for the service elements ............................................................ 25
Table 2: Evaluation methods to be used for the service elements ............................................. 27
Table 3: Location parameters...................................................................................................... 28
Table 4: Engagement parameters ............................................................................................... 29
Table 5: Actions parameters ....................................................................................................... 30
Table 6: Devices parameters ....................................................................................................... 31
Table 7: Traffic parameters (without MPEG DASH) .................................................................... 32
Table 8: Traffic parameters for MPEG DASH ............................................................................... 33
Table 9: UX evaluation methods for Spanish pilot Multicam Live .............................................. 37
Table 10: UX evaluation methods for Spanish pilot Multicam VoD ............................................ 38
Table 11: UX evaluation methods for German pilot Abenteuer Liebe ........................................ 40
Table 12: UX evaluation methods for German pilot TVAppGallery ............................................ 41
Table 13: UX evaluation methods for Dutch pilot DRM .............................................................. 42
Table 14: UX evaluation methods for Dutch pilot Recommender .............................................. 43
Table 15: UX evaluation methods for Dutch pilot 2nd Screen ................................................... 45
3. Introduction
TV-RING will execute large-scale pilots in three European countries: the Netherlands, Germany and Spain. Since the project aims to ensure that all developments are in line with user needs, we are following a user-oriented, iterative approach. Hence, user tests are set up and conducted, and the results are subsequently evaluated according to proven concepts and methods.
The purpose of this document is to provide a concrete plan for the evaluation of the activities carried out in the three TV-RING pilots. This is the result of work conducted in Task 4.1. The plan includes measurements that can be used to clearly show the benefits of the TV-RING service and evaluate real interest of end-users in connected TV services through Next Generation Access (NGA) networks.
The document starts by outlining the overall objectives of the project and then the more detailed objectives of each individual pilot. As the services tested in each pilot are different, the pilot countries have broken down their planned services into individual service elements. The idea here was to allow as much comparison as possible despite the differences in the overall services.
The approach to the evaluation for both end-user and professional user testing is explained in addition to the technical measurements that will be used to help evaluate the services. The methodology, developed under the guidance of the usability experts at project partner KU Leuven, is explained including how it was developed. Section 8 details the evaluation planning including calendars of events for each pilot.
4. Action Log
09/05/2014 – Consortium Meeting Berlin. All partners
16/06/2014 – Conference Call. RBB, i2CAT, IRT, RTV, NPO, PPG, KUL, TVC. D4.0.1 Kick-Off Meeting
27/06/2014 – Conference Call. i2CAT, KUL, IRT, TVC, RBB, NPO. D4.0.1 Follow-up Meeting
11/07/2014 – Conference Call. i2CAT, IRT, TVC, RBB, PPG, RTV. D4.0.1 Follow-up Meeting
29/07/2014 – Conference Call. i2CAT, IRT, TVC, RBB, RTV. D4.0.1 Follow-up Meeting
5. Objectives
5.1. General evaluation planning objectives
The overall evaluation objectives of the pilots in TV-RING are to determine the criteria for user acceptance tests in order to evaluate the suitability, acceptance and feasibility of the envisaged services, and to define metrics that will be used to measure this.
A further objective is to define the reporting methods for each pilot and finally to plan the evaluation actions for each pilot.
5.1.1. Dutch pilot
In the first scenario, the Dutch pilot partners want to investigate whether it is possible to differentiate stream rate quality by using Digital Rights Management (DRM) techniques. Will it lead to a simplified and more cost-efficient encoding environment for broadcasters, and to new business models for DRM delivery and companies?
The partners here want to investigate the following questions:
- Can we simplify the encoding process and differentiate the quality of content based on one key with different statuses (basic, premium and gold)?
- Test user perception of service (objective and subjective). Are people willing to pay more for high quality content?
Willingness to pay will be based on a literature study and the Dutch pilot will make a proof of concept for the encoding process with different quality of content.
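The core idea of this scenario – one encoding ladder for all users, with the DRM key status alone deciding which quality tiers a client may play – can be sketched as follows. This is a minimal illustrative sketch, not the pilot's implementation; the bitrate figures and the cap per status are assumptions, while the tier names follow the "basic, premium and gold" statuses mentioned above.

```python
# One common encoding ladder for everyone; the DRM key status only
# caps which renditions a given client is entitled to decrypt.
# Bitrate values are illustrative assumptions, not pilot figures.
ENCODING_LADDER = [500, 1500, 3000, 8000]  # kbit/s, lowest to highest

# Maximum rendition each key status unlocks (assumed mapping).
STATUS_CAP = {"basic": 1500, "premium": 3000, "gold": 8000}

def entitled_renditions(key_status: str) -> list[int]:
    """Return the subset of the single ladder a key status may play."""
    cap = STATUS_CAP[key_status]
    return [r for r in ENCODING_LADDER if r <= cap]
```

The point of the sketch is the simplification the pilot wants to test: content is encoded only once, and differentiation happens purely at the key level rather than through separate encoding pipelines per tier.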
The goal of the recommendation pilot scenario is to scan household video content consumption by a family and present recommendations for individual persons on the central HbbTV set. Pilot partners want to develop an intelligent recommendation engine data entry that presents personal recommendations, using variables such as time of day, device status and historical data.
The questions the partners here want to answer are as follows:
- How can we measure all NPO on-demand usage in a household and how can an HbbTV app make recommendations with this information, making the TV even smarter?
- Can we develop a recommendation engine that suggests, on a personal basis, programs based on the information gathered and that are interesting for that particular person?
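A scoring rule combining the variables named above (time of day, device status, historical data) could look like the following sketch. All function names, weights and dayparts are assumptions for illustration only; they do not describe the pilot's actual engine.

```python
# Illustrative per-person recommendation scoring using the three
# variable groups mentioned in the text: historical data, time of
# day, and device status. Weights are arbitrary assumptions.
from datetime import datetime

def score(program: dict, history: dict, now: datetime, device_on: bool) -> float:
    """Score a candidate programme for one person; higher is better."""
    s = 0.0
    # Historical data: how often this genre was watched before.
    s += 2.0 * history.get(program["genre"], 0)
    # Time of day: prefer programmes tagged for the current daypart.
    daypart = "evening" if now.hour >= 18 else "daytime"
    if program.get("daypart") == daypart:
        s += 1.0
    # Device status: only recommend when the central TV set is on.
    if not device_on:
        s = 0.0
    return s

def recommend(candidates, history, now, device_on, k=3):
    """Return the k best-scoring candidate programmes."""
    return sorted(candidates, key=lambda p: -score(p, history, now, device_on))[:k]
```

In a real engine the weights would be learned from the gathered usage data rather than fixed, which is precisely what the questions above set out to investigate.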
In scenario 3, second screen competition, the Dutch pilot wants to investigate how an HbbTV app can act as central interface for group second screen play-along, in a home network.
Questions partners want answered:
- How to pair all (2nd screen) devices in a household and by social media to one ‘master’ app and how can we synchronize these results with HbbTV?
- How do we keep it scalable (Cloud)?
- How do we manage and encourage in-house “real” social interaction and create and encourage a competition model?
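The pairing and synchronization idea behind these questions can be sketched as a minimal in-memory model: several second-screen devices join one household 'master' session, whose aggregated state is then pushed to the HbbTV set. Class and method names are illustrative assumptions, not the pilot's design.

```python
# Minimal sketch of the 'master' app concept: second-screen devices
# in one household pair to a shared session, submit quiz answers,
# and the HbbTV app reads a synchronized snapshot of the results.
class MasterSession:
    def __init__(self, household_id: str):
        self.household_id = household_id
        self.devices: dict[str, dict] = {}  # device_id -> latest answers

    def pair(self, device_id: str) -> None:
        """Register a second-screen device with this household session."""
        self.devices.setdefault(device_id, {})

    def submit(self, device_id: str, question: str, answer: str) -> None:
        """Record one device's answer to one quiz question."""
        self.devices[device_id][question] = answer

    def snapshot(self) -> dict:
        """State pushed to the HbbTV app to keep the main screen in sync."""
        return {"household": self.household_id, "devices": dict(self.devices)}
```

Keeping such sessions in the cloud, one per household, is one plausible answer to the scalability question above; the sketch deliberately says nothing about the transport (e.g. WebSockets or polling), which is part of what the pilot will explore.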
For both scenarios 2 and 3, the Dutch pilot partners will conduct several tests, interviews and observations with a test group of 15-20 different families.
5.1.2. German pilot
The German pilot will tackle three core challenges, namely streaming of ultra-high quality content, developing a highly interactive transmedia TV format and opening HbbTV to third parties.
In recent years, media streaming has been the main driver for bandwidth demand; as we have moved from standard definition to high definition (HD) and now to ultra-high definition (UHD) formats, this demand will further increase. For the first part of the German pilot we plan to deploy and evaluate a selection of adaptive streaming offerings that exploit the full range of typically available bandwidth figures: high bandwidths for very high quality on one end, and support for low to medium bandwidth connections on the other. This content will be available via the HbbTV app “Abenteuer Liebe” (second part of the pilot).
In the pilot the partners want to
- conduct technical measurements to measure the use of this content,
- assess bandwidth requirements and technical parameters,
- investigate the perceived difference for the user,
- determine user demand for this content.
With the second part of the German pilot, an interactive TV format, TV-RING wants to investigate how an HbbTV-based service accompanying a TV show should be shaped. Many users are already familiar with accessing additional media and information from the internet on a different device in parallel. The “Abenteuer Liebe” HbbTV application will offer users ample interaction opportunities on the main screen.
The German partners want to investigate the following questions:
- Is the service usable by first-time HbbTV users and experienced HbbTV users?
- Do users perceive the TV show and HbbTV app as a seamless service or do they feel distracted from the TV show?
- Do users feel continuously motivated to use the service?
- Do users feel involved in the show?
- Do users feel the presence of other users?
- Do users enjoy the service?
For the German pilot part as described above, the partners will conduct several qualitative tests, interviews and observations with a test group of approximately 40 different users. As the application will be openly available on-air via German free TV, a much larger number of users are expected to use the application and to implicitly provide data for the technical measurements.
The third part of the German pilot is about opening the HbbTV market to third parties, thus allowing developers and SMEs to freely offer apps directly to the general public. So far, HbbTV applications are mostly tied to broadcast programmes and can be accessed through the “red button concept” from within the TV programme. The current HbbTV standard does not provide any specific technology to give access to third-party applications. However, the TVAppGallery developed by IRT can open the HbbTV application market to non-broadcasting companies. Initial studies have shown that people are generally excited about the opportunities such an open application directory service could offer and that they, as end-users, would also benefit from a portal for HbbTV applications. The main challenges for the concept are the legal implications such a service brings and the competition it faces with vertical
approaches. To tackle these challenges, the activities around the TVAppGallery pilot follow two complementary approaches.
Firstly, before this portal can be made publicly available, a reliable, independent partner to run the backend needs to be found. To this end, the portal will be published and presented to potential partners and will also be demonstrated at trade fairs and mentioned in HbbTV presentations. The portal concept will be discussed with many stakeholders and feedback will be gathered.
Secondly, to further promote the concept and advantages of the TVAppGallery, its design is being improved and a selection of attractive applications from the project will be promoted on the first page. To gain more feedback, further user evaluation is foreseen, specifically addressing the following questions:
- Do users feel the need for such an application portal?
- Do users recognize the advantages of an open portal?
- Do users understand the portal structure?
- What additional functionalities do users expect from such a portal?
For this evaluation it is planned to conduct interviews and/or questionnaires with a test group of about 15 people. Apart from this, no deeper technical evaluation is planned, as this would first require a mass-market deployment, which has not yet taken place.
5.1.3. Spanish pilot
The Spanish pilot will basically evaluate two main scenarios: high quality video transmission over a Content Delivery Network (CDN) (on a controlled network and on the internet) and advanced interactivity using MPEG-DASH (letting users select multiple points of view). The main questions to be tested are:
- If we provided multiple-view on-demand content, would people watch more of it?
- If people watched an on-demand show that has multiple views, would they enjoy it more?
- Would people watch the show again if extra views were made available on-demand?
- Would people be less likely to switch to another channel if live content had multiple views?
- If people watched a live show that has multiple views, would they enjoy it more?
In addition to that, high-quality video transmission over CDN will be tested answering the following questions:
- Do the users appreciate the qualitative signal improvement?
- Do the users perceive the signal adaptability as positive?
This pilot will investigate these questions in a qualitative and quantitative fashion. Qualitatively, in the controlled CDN environment, with a group of 20 to 40 test users (using interviews, questionnaires, etc.). In the universal CDN environment, no interviews have been scheduled; the evaluation in this case will be done by collecting quantitative information from video marking and from user-facing feedback mechanisms (such as a "Like" button, present in each video).
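One way multiple points of view can be exposed with MPEG-DASH is as separate video AdaptationSets in the MPD, distinguished by a Viewpoint descriptor, from which the player offers the user a choice. The following sketch shows a player-side helper listing the available views; the MPD fragment, the scheme URI and the view names are illustrative assumptions, not the pilot's actual manifest.

```python
# Sketch: enumerate the camera views advertised in a (hypothetical)
# DASH MPD, where each view is its own AdaptationSet carrying a
# Viewpoint descriptor. A player would then fetch segments only
# from the AdaptationSet the user selected.
import xml.etree.ElementTree as ET

MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Viewpoint schemeIdUri="urn:example:viewpoint" value="main"/>
    </AdaptationSet>
    <AdaptationSet mimeType="video/mp4">
      <Viewpoint schemeIdUri="urn:example:viewpoint" value="goal-cam"/>
    </AdaptationSet>
  </Period>
</MPD>"""

NS = {"d": "urn:mpeg:dash:schema:mpd:2011"}

def list_views(mpd_xml: str) -> list[str]:
    """Return the Viewpoint values of all video AdaptationSets."""
    root = ET.fromstring(mpd_xml)
    return [vp.get("value")
            for vp in root.findall(".//d:AdaptationSet/d:Viewpoint", NS)]
```

Because each view is an independent AdaptationSet, switching views is just a matter of switching which set the player pulls segments from, while adaptive bitrate selection continues within the chosen set.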
5.2. Service elements
In this section, a detailed description of all elements that will be under evaluation is given. The objective of providing this granularity is to structure all contents and help to better understand
what information will be obtained and from what. The diversity of the pilots makes a direct comparison difficult; atomizing the parts contained in each pilot, however, makes it easier to define and describe the whole evaluation.
Each element has been defined by the partners involved in the pilots as their smallest element with value in itself. In other words, if such an element were split into smaller pieces, it would no longer bring any value for its developer. This means that different criteria have been used, but the evaluated results are more relevant and targeted. They are also more exhaustive and can feed into a more general pilot evaluation. Otherwise, the evaluation would have been more complex and unstructured (or at least more difficult to structure).
5.2.1. Dutch pilot
Element Name: Content differentiation by using DRM keys
Deploying pilot: Quality differentiation by using DRM
Developer: PPG & Infostrada Delivery date: September 2014
Evaluator(s): NPO & KUL
Description:
Investigate technical possibilities of using DRM techniques for quality and archive depth differentiation. This can lead to a more hybrid production and distribution facility that can support both free and non-free video services. The consumer willingness-to-pay research adds a key variable to possible future DRM business models. The app will be composed of the following features: Content (different quality and type), DRM key value (basic, SD or HD), Archive depth.
Objectives (for its evaluation):
Simplify the encoding process and differentiate quality and content, based on one DRM access key with different statuses
Investigate if a certain DRM key plays the right content
Test user perception of service (objective and subjective). Are people willing to pay more for differentiated content?
Investigate the maximum used bandwidth
Investigate new DRM business models
Parameters under evaluation:
Engagement: stream starts, page views, archive depth
Traffic: maximum served bitrate played
Methodology and data gathering:
Browse stats and archive depth based on Google Analytics
Analysing log files
Literature study on willingness to pay
Involved KPIs: Daily network consumption1
Others: End-user evaluation
1 The network activity will be monitored to identify consumption of this module and of users.
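The "analysing log files" step for the traffic parameter above (maximum served bitrate played) could be implemented along these lines. The log format used here, a CSV with a session id and a served bitrate per delivered segment, is an assumption for illustration; the actual streaming-server logs will differ.

```python
# Hedged sketch: derive the maximum served bitrate per session from
# (hypothetical) streaming-server logs. The column names and sample
# data are assumptions, not the pilot's real log schema.
import csv
from collections import defaultdict
from io import StringIO

LOG = """session,bitrate_kbps
a1,1500
a1,3000
b2,500
b2,1500
a1,8000
"""

def max_bitrate_per_session(log_text: str) -> dict[str, int]:
    """Peak bitrate actually served to each playback session."""
    peaks: dict[str, int] = defaultdict(int)
    for row in csv.DictReader(StringIO(log_text)):
        sid, kbps = row["session"], int(row["bitrate_kbps"])
        peaks[sid] = max(peaks[sid], kbps)
    return dict(peaks)
```

Aggregating such per-session peaks over a day would then feed the "daily network consumption" KPI named for this element.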
Element Name:
Engagement by Quiz- second screen
Deploying pilot:
HbbTV as a central interface for second screen competition
Developer: PPG, Angry bytes, NPO Delivery date: January 2015 + possibly April 2015 (EU Song festival contest)
Evaluator(s): NPO & KUL
Description:
How can an HbbTV app act as a central interface for group second screen play-along in a closed network?
Objectives (for its evaluation):
pair many (2nd screen) devices in a household or other closed network to one ‘master’ app and synchronize the results with HbbTV
Investigate how we can keep the technology scalable
Create and encourage real social interaction
Parameters under evaluation:
Location: region
Engagement: duration of visits
Actions: number of unique visitors, entry page, exit page
Devices: type of second screen device, HbbTV, number of devices
Rating (engagement, usability)
Methodology and data gathering:
Google Analytics
Comscore
Questionnaires and (online) survey
Interviews
Involved KPIs: Web based usage indicators
Number of third party apps included in the pilot
Others:
Element Name: Content differentiation by Recommendations
Deploying pilot:
In-house recommendations for HbbTV and Cable TV apps
Developer: PPG, NPO ICT Delivery date: October 2014
Evaluator(s): NPO & KUL
Description:
Scan the complete household video content consumption on a selected video-on-demand (VOD) service and present recommendations for individual persons and groups on the central HbbTV set. Develop an intelligent recommendation-engine data entry that presents both personal and group recommendations, using variables such as time of day, device status and historical data.
Objectives (for its evaluation):
Investigate how people watch television content in a household
Investigate what influence the variables mood, time of day, device and family composition at that time of day have on viewing habits
Develop a recommendation engine data entry that presents recommendations for individual persons and groups on the central HbbTV set
Investigate how the outcome can be integrated in existing recommendation models and tools
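To make the combination of these variables concrete, a context-aware recommendation score can be sketched as a simple weighted sum. The feature names, weights and data layout below are purely illustrative assumptions for this sketch; they are not the pilot's actual recommendation engine.

```python
# Illustrative sketch only: a toy context-aware recommendation score combining
# the variables named above (time of day, device, historical data). Feature
# names and weights are assumptions, not the pilot's engine.
def score(item, context, history):
    s = 0.0
    if item["slot"] == context["time_of_day"]:
        s += 0.5                                 # fits the current daypart
    if item["device"] in context["devices"]:
        s += 0.2                                 # playable on an available device
    s += 0.3 * history.get(item["genre"], 0.0)   # historical genre affinity (0..1)
    return s

def recommend(catalogue, context, history, k=2):
    """Return the k highest-scoring items for this household context."""
    return sorted(catalogue, key=lambda i: score(i, context, history), reverse=True)[:k]
```

In a real engine the weights would be learned from the logged viewing data rather than fixed by hand.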
Parameters under evaluation:
Engagement: number of unique visits, duration of visit, stream starts, click-through rate; Actions: page views, entry page, exit page; Used devices: HbbTV and second screen, static PC and laptop; Traffic: VOD absolute and average, stream size absolute and average; Rating: recommendation accuracy, usability
Methodology and data gathering:
Google Analytics; comScore; questionnaires and (online) surveys & rating; analysing log files
Involved KPIs: average periodic network activity; web-based usage indicators
Others:
5.2.2. German pilot
Element Name: Abenteuer Liebe (AL) App with UHD content
Deploying pilot: German pilot
Developer: RBB, IRT Delivery date: November 2014
Evaluator(s): RBB
Description:
The “Abenteuer Liebe” app (AL app) is an HbbTV application that accompanies a 20-part TV series during and after the broadcasts. It allows for transmedia storytelling, browsing TV and additional non-TV content (UHD video, images, texts), commenting on events happening in the show (input via second screen: texts, pictures and video clips) and taking part in playful voting, ratings and quizzes. It is a multi-canvas application integrating:
- video player (scaled down and full-screen)
- image galleries
- text areas
- blog (via ScribbleLive API, also voting and rating)
- TV box (broadcast picture, either scaled down or full-screen)
For the first round of testing, the focus will be on the use of UHD content.
Objectives (for its evaluation):
UX-wise, to find out:
- Do the users perceive any difference between UHD and other content?
- Do users enjoy the content?
- Do users feel continuously motivated to use the content?
- Is the service usable by first-time HbbTV users and experienced HbbTV users?
- Do users perceive the TV show and HbbTV app as a seamless service, or do they feel distracted from the TV show?
Technically, to find out:
- How many users used the content?
- What were the bandwidth requirements?
Parameters under evaluation:
UX-wise: accessibility, overall usability, aesthetics/appeal/attractiveness, enjoyment/pleasure, engagement, hedonic quality, flow/immersion, empowerment, sociability, participation, reciprocity, social presence
Technically, for video streams: requests per programme, video requests for pilot duration, total bandwidth, video stream size, accumulated traffic, measured traffic per programme
Methodology and data gathering:
UX methods:
- Interviews
- Meetings
- Questionnaires
For technical parameters:
- PIWIK
- Google Analytics
- Akamai LunaControl
- Web server measurements
Involved KPIs: daily network consumption; web-based usage indicator
Element Name: AL App with social interaction
Deploying pilot: German pilot
Developer: RBB, IRT Delivery date: Summer 2015 (TBC)
Evaluator(s): RBB
Description:
The “Abenteuer Liebe” app (AL app) is an HbbTV application that accompanies a 20-part TV series during and after the broadcasts. It allows for transmedia storytelling, browsing TV and additional non-TV content (video, images, texts), commenting on events happening in the show (input via second screen: texts, pictures and video clips) and taking part in playful voting, ratings and quizzes. It is a multi-canvas application integrating:
- video player (scaled down and full-screen)
- image galleries
- text areas
- blog (via ScribbleLive API, also voting and rating)
- TV box (broadcast picture, either scaled down or full-screen)
For the second round of testing, the focus will be on the interactive format.
Objectives (for its evaluation):
UX-wise, to find out:
- Is the service usable by first-time HbbTV users and experienced HbbTV users?
- Do users enjoy the service?
- Do users feel continuously motivated to use the service?
- Do users perceive the TV show and app as a seamless service, or do they feel distracted from the TV show?
- Do users feel involved in the TV show?
- Do users feel the presence of other users?
Technically, to find out:
- How many users used the app?
- How long did users stay in the app?
- What are the most used parts of the app?
- What is the most used content?
Parameters under evaluation:
UX-wise: accessibility, effectiveness, overall usability, aesthetics/appeal/attractiveness, enjoyment/pleasure, perceived usefulness, engagement, hedonic quality, flow/immersion, distraction/helpfulness, empowerment, sociability, participation, reciprocity, social presence
Technically, for the application: clicks per visit, duration, average duration for returning visitors, page views, average generation time of the site, average time on page
Technically, for video streams: video requests per programme, video requests for pilot duration, total bandwidth, video stream size, accumulated traffic, measured traffic per programme
Methodology and data gathering:
UX methods:
- Interviews
- Meetings
- Questionnaires
For technical parameters:
- PIWIK platform
- Google Analytics
- Akamai LunaControl
- Web server measurements
Involved KPIs: Daily network consumption, web-based usage indicator
Element Name: TVAppGallery
Deploying pilot: German pilot
Developer: IRT Delivery date: M20
Evaluator(s): IRT
Description:
The TVAppGallery is a system that provides an open marketplace for HbbTV applications. HbbTV applications are mostly tied to a broadcast programme and accessed through the “red button” concept; the current HbbTV standard does not provide a specific technology for giving access to third-party applications. The TVAppGallery opens the HbbTV application market to developers and SMEs who otherwise have to buy into proprietary app portals offered by some device manufacturers, making equal opportunities and efficient access to SmartTV devices possible for all parties.
Objectives (for its evaluation):
UX-wise, to find out:
- Do users need this service?
- Are they willing to use the portal?
- Do users feel comfortable with the menu structure?
- Could they handle certain configuration steps?
- What do users think about the portal idea?
Parameters under evaluation:
UX-wise:
- Accessibility
- Effectiveness
- Overall usability
- Aesthetics/appeal/attractiveness
- Usefulness
Methodology and data gathering:
The data will be gathered by interviews, meetings or questionnaires.
Involved KPIs: Number of 3rd party apps included in the pilots
Others:
5.2.3. Spanish pilot
Element Name: MPEG DASH encoder
Deploying pilot: Spanish pilot
Developer: I2CAT Delivery date: M13
Evaluator(s): TVC & i2CAT technicians
Description:
An MPEG-DASH encoder and segmenter developed by i2CAT. This software is capable of receiving live RTP H.264/AAC streams, decoding them, re-encoding them in different qualities with different parameters, and then encapsulating them as different DASH tracks. It is run as Software-as-a-Service (SaaS) with a simple RESTful API and a corresponding test web application.
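The shape of a request to such a RESTful transcoding service can be sketched as follows. The field names, the example RTP URL and the idea of POSTing a JSON job description are illustrative assumptions for this sketch; the actual i2CAT API may differ.

```python
import json

# Hypothetical sketch of a transcoding-job request for a RESTful DASH encoder
# service: one live RTP H.264/AAC input, re-encoded into several qualities and
# encapsulated as separate DASH tracks. Field names and the URL are assumptions.
def build_transcode_job(rtp_input_url, bitrates_kbps):
    """Describe one live input and the DASH tracks to produce from it."""
    return {
        "input": {"url": rtp_input_url, "video": "h264", "audio": "aac"},
        "tracks": [
            {"id": i, "bitrate_kbps": b, "container": "dash"}
            for i, b in enumerate(sorted(bitrates_kbps, reverse=True))
        ],
    }

# The resulting JSON document would then be POSTed to the service's job endpoint.
job = build_transcode_job("rtp://encoder.example:5004", [3000, 6000, 10000])
print(json.dumps(job, indent=2))
```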
Objectives (for its evaluation):
The solution delivered by i2CAT aims to be a competitive alternative to current software-based commercial solutions such as Wowza. The encoder will therefore be evaluated in order to assess its performance and strong points.
Parameters under evaluation:
Performance metrics to be measured:
- Resource usage: CPU, memory, etc.
- Maximum number of live video tracks per DASH stream
- Maximum transcoding quality
Methodology and data gathering:
Quantitative analysis using specific metrics. A fixed test video set will be used to feed the DASH encoder in the lab, and the desired parameters will be measured. At least three different bitrates will be considered (10 Mbps, 6 Mbps and 3 Mbps).
Involved KPIs: Daily network consumption Web-based usage indicator
Others:
Element Name: Local Managed CDN
Deploying pilot: Spanish pilot
Developer: i2CAT Delivery date: M13
Evaluator(s): I2CAT
Description:
Over the managed network of the pilot, i2CAT will deploy proxy caches in order to set up a delivery network for a local area. The idea is to provide congestion control techniques such as HTTP caching as close as possible to the end-user.
Objectives (for its evaluation):
It is of particular interest to demonstrate the effectiveness of a simple local CDN solution for distributing media content in scenarios like those defined by the Spanish pilot in TV-RING. Its performance will therefore be evaluated to determine whether it is sufficient to support a potential new business model. In addition, the evaluation process will contribute to improving and optimising the whole system's performance and to studying its scalability.
Parameters under evaluation:
Performance metrics to be measured:
- Bandwidth consumption from the origin server
- Bandwidth consumption from the proxy cache
- Bandwidth savings
- Latency
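A possible way to derive the "bandwidth savings" parameter from the first two traffic measurements is sketched below. The exact definition used in the pilot may differ; this sketch assumes that origin traffic corresponds to cache misses and proxy-cache traffic to cache hits.

```python
# Minimal sketch, assuming origin traffic = cache misses and proxy traffic =
# cache hits: the fewer bytes the origin server has to send, the more the
# proxy cache is saving.
def bandwidth_savings(origin_bytes, cache_bytes):
    """Fraction of the total delivered traffic served by the proxy cache.

    origin_bytes: bytes fetched from the origin server (cache misses)
    cache_bytes: bytes served to clients directly from the proxy cache (hits)
    """
    total = origin_bytes + cache_bytes
    return cache_bytes / total if total else 0.0

# e.g. 20 GB from the origin and 80 GB from the cache gives 80% savings
```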
Methodology and data gathering:
The measurements will first be simulated and tested in a lab in order to refine the data recording and collection procedure that will later be automated in the field. Once tested and evaluated in the lab, the evaluation tools will be deployed in the field to collect real data and validate the expected results.
Involved KPIs: Daily network consumption Web-based usage indicator
Others:
Element Name: Global CDN
Deploying pilot: Spanish pilot
Developer: Retevision Delivery date: M13
Evaluator(s): Retevision
Description:
Retevision will provide the usage and performance measures for the CDN, taken from a set of CDN reporting tools and reported in a separate, dedicated web portal. A spectrum of technical parameters will be measured and reported in order to give a clear idea of the pilot's content usage and consumption.
Objectives (for its evaluation):
In contrast to the local managed CDN, the external CDN is a worldwide network that will be used in the Spanish pilot to deliver HbbTV applications and MPEG-DASH content. Retevision is developing its own web portal illustrating the CDN's performance and usage measures. These will be available for the project reports and for comparison with the measures taken from the local CDN, which will help to optimise and dimension the local network.
Parameters under evaluation:
Usage and performance metrics to be measured:
Consumption (usage):
- GB per month
- Total requests
- Origin volume
Throughput (performance):
- Peak req/sec
- Avg req/sec
- Bandwidth at 95% (Mbps)
- Cache efficiency (%)
- Peak Mbps
- Avg Mbps
- Peak origin Mbps
- Avg origin Mbps
Methodology and data gathering:
Measures will be taken from the specific network tools used during the pilot. Some measures are predefined and will be gathered automatically from these tools; others will require some manual work to generate. Retevision is developing a tool (a PHP + Java web tool) to make it easier to gather these measures from the CDN log files. This tool will ease gathering consumption and performance information directly from the CDN's automated data generator, and will help to better understand what is measured and how. This will enable operators to easily produce the required reports.
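The kind of aggregation such a log-gathering tool performs can be sketched as follows. The actual tool is a PHP + Java application and real CDN log formats differ; the (timestamp_sec, bytes) record format below is an assumption used only to illustrate how total volume and peak/average requests per second are derived.

```python
from collections import defaultdict

# Illustrative sketch (Python for brevity; the real tool is PHP + Java) of
# aggregating CDN log records into the consumption/throughput metrics listed
# above. Record format (timestamp_sec, bytes) is an assumption.
def aggregate(records):
    per_second = defaultdict(int)   # requests observed in each second
    total_bytes = 0
    for ts, nbytes in records:
        per_second[ts] += 1
        total_bytes += nbytes
    seconds = len(per_second)
    return {
        "total_gb": total_bytes / 1e9,
        "peak_req_per_sec": max(per_second.values(), default=0),
        "avg_req_per_sec": (sum(per_second.values()) / seconds) if seconds else 0.0,
    }
```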
Involved KPIs: Daily network consumption Web-based usage indicator
Others:
Element Name: TV3 A LA CARTA Multicamera service on HbbTV
Deploying pilot: Spain
Developer: TVC Delivery date: October 2014
Evaluator(s): TVC
Description:
Add-on for the existing HbbTV IPTV service at TVC (TV3 A LA CARTA) to support multiple points of view of an on-demand or live video. Content is to be delivered using the two network elements (local and global CDN).
Objectives (for its evaluation):
- Multiple-view on-demand content leads to more content being watched
- Greater user enjoyment of on-demand content that has multiple views
- Repeated on-demand consumption of content previously watched live, due to the availability of multiple views
- Less channel zapping when live content has multiple views
- Greater user enjoyment of live content that has multiple views
Parameters under evaluation:
Engagement: duration of visits; Actions: number of unique visitors, entry page, exit page; Devices: HbbTV, number of devices; Rating: engagement, usability
Methodology and data gathering:
Adobe Omniture; questionnaires and (online) surveys; interviews
Involved KPIs: Web-based usage indicators
Others:
6. Approach
The TV-RING pilots take a comprehensive approach to the evaluation and validation of the deployed technologies and contents. The pilot evaluation therefore includes a thorough assessment of the user experience across several UX metrics, and a series of performance tests to measure key technical parameters at regular intervals.
Image 1: TV-RING pilots evaluation approach
Under the user evaluation, data generating actions are envisioned for both end-users and professional users, thus covering the whole spectrum of pilot stakeholders.
6.1. User evaluation of applications
6.1.1. End user evaluation
In this section we present the User Experience (UX) factors that will be measured in each pilot. Most of these have been gathered from the state-of-the-art UX literature. Some factors are new because no suitable constructs exist yet; these will have to be created and introduced in this project.
Pilot region Spain Germany The Netherlands
Service element User experience factors
Multicam Live
Multicam VoD
Abenteuer Liebe
TVAG
DRM
Recommender
2nd Screen
aesthetics/ appeal
X X X X
distraction X
empowerment X
engagement X X X X X
enjoyment X X X X X
expectations X X X X X X X
flow X X
habits X X X X X
motivation X X
overall UX X X X X X X
participation X
reciprocity X
sociability X X
social image X
social presence X X
usability X X X X X X X
willingness-to-pay X
Table 1: UX factors to be used for the service elements
6.1.2. Professional user evaluation
For the Dutch pilots, interviews and online questionnaires are foreseen to gather the opinions of professional users about the outcomes of all three scenarios.
In the German pilots, RBB will concentrate on the end-user app, which will be developed in conjunction with professional users and the type of content they can and will provide. As no professional Content Management System (CMS) is being developed to create the app, there will be no dedicated technical professional-user tests. Feedback on the general concept and experience will be gathered from an editorial perspective, before the start of the pilot, during it, and after it has concluded.
In the Spanish pilot, validation of the TV-RING technologies from the point of view of professional users will be achieved through two sets of research actions:
- For the global CDN element, a series of in-depth interviews with the professionals from TVC and RTV who are involved in the pilot. These interviews will be conducted towards the end of pilot phase 1 (around November and December), and will be focused on confirming that professionals feel comfortable working with the pilot's technologies, and on eliciting suggestions for small tweaks and improvements.
- For the local controlled CDN element, professional users from I2CAT and project collaborator Guifinet will be involved in the four focus groups planned at the beginning and the end of each pilot phase.
6.2. Technical evaluation of platform
The following table gathers the elements to be evaluated (contents, applications and backend platform services) that have been developed in TV-RING. Depending on the specific element, different approaches will be followed.
Element / End User / Professional User / Usage (Metrics) / Performance (Metrics)
Local Managed CDN X X X
Global CDN X ? X
Multi Camera Service X X
Multi Camera Content / Oh Happy Day X X X
MPEG-DASH Transcoder X ? X
AL app X X X X
TVAG X X X
Recommendations X X X X
DRM X X X X
Quiz - second screen X X X X
Table 2: Evaluation methods to be used for the service elements
6.2.1. Technical measurements
The technical measurements for the performance evaluation of the pilots will be classified in the following categories:
- Location - Engagement - Actions - Devices - Traffic (with and without MPEG-Dash)
The following tables give an overview of the measurement parameters. There is some risk that measurements will not provide useful data, because most web analytics tools are developed for normal (X)HTML applications and will have to be modified for HbbTV. Another reason some measurements may fail is that MPEG-DASH is a very new standard: implementation of the MPEG-DASH-related features is at a very early stage and hardly tested. The risks are described for each category in the tables.
Category: Location
Definition: The physical location of the user who used the respective TV-RING-Apps/Services.
Sources: IP-Address, Geo-location databases
Tools: PIWIK, Google Analytics, Akamai LunaControl, Adobe Analytics (formerly Omniture SiteCatalyst), Other internal
Risks: Precision may be low if there are legal issues regarding the storage of IP addresses
Some regions and countries do not yield high precision
Spanish Pilot / Dutch Pilot / German Pilot
Parameters: Country x x x
Region x x
City x
Responsible: TVC NPO/PPG RBB
Periodicity2: m m w
Table 3: Location parameters
2 Periodicity values: annually (y), monthly (m), weekly (w), daily (d)
Category: Engagement
Definition: Measurement of the duration of the visits
Sources: Included JavaScript, Tracking Pixel
Tools: PIWIK, Google Analytics, Akamai LunaControl, Adobe Analytics (formerly Omniture SiteCatalyst), Other internal
Risks: If cookies are deleted, it is not possible to detect returning visitors
Legal issues regarding user-engagement tracking
Spanish Pilot / Dutch Pilot / German Pilot
Parameters: Visits per visit duration [min:sec] x x x
Average duration for returning visitors [min:sec] x x x
Responsible: TVC NPO/PPG RBB
Periodicity: m m w
Table 4: Engagement parameters
Category: Actions
Definition: Basic behaviour of the user when using the app/service
Sources: Included JavaScript, Tracking Pixel
Tools: PIWIK, Google Analytics, Akamai LunaControl, Adobe Analytics (formerly Omniture SiteCatalyst), Other internal
Risks: The JavaScript engine may not be fully implemented on older set-top boxes, so possibly not all parameters can be measured
This measurement only captures behaviour across the different "pages" within the HbbTV app
Dynamically generated content in the frontend ("AJAX") may not be fully captured by the tools
It is difficult to determine the reason for broken streams (e.g. application failure, problems with the provider, ...)
Spanish Pilot / Dutch Pilot / German Pilot
Parameters: Pageviews [number] x x x
Average generation time of the site [sec] (x) tbc x x
Average time on page [sec] x x x
Entry Page [URL] x x x
Exit Page / Exit Rate [URL][%] x x x
Responsible: TVC NPO/PPG RBB
Periodicity: m m w
Table 5: Actions parameters
Category: Devices
Definition: Devices used by the end user
Sources: Included JavaScript, Tracking Pixel
Tools: PIWIK, Google Analytics, Akamai LunaControl, Adobe Analytics (formerly Omniture SiteCatalyst), Other internal
Risks: The JavaScript engine may not be fully implemented on older set-top boxes, so possibly not all parameters can be measured
Some devices will not be in the USER-AGENT database at the beginning of the measurements
Spanish Pilot / Dutch Pilot / German Pilot
Parameters: Used device [set-top box] x x x
Manufacturer [brand] x x x
Model/Firmware version [modelNr][firmwareVersion] x x x
Used browsers [browserFamily] x x x
Resolution of the device [width|height] (x) tbc x
Responsible: TVC NPO/PPG RBB
Periodicity: m y w
Table 6: Devices parameters
Category: Traffic (without MPEG DASH)
Definition: Measurements regarding the traffic of the pilot applications
Sources: Internal databases, Logfiles
Tools: PIWIK, Google Analytics, Akamai LunaControl, Adobe Analytics (formerly Omniture SiteCatalyst), Other internal
Risks: It is difficult to determine the reason for broken streams (e.g. application failure, problems with the provider, ...)
Spanish Pilot / Dutch Pilot / German Pilot
Parameters: Video requests per programme [count] x x
Video requests during pilot duration [count] x x
Total bandwidth [Mbit/s] x x
Video stream size [byte] x x
Accumulated traffic [Mbit/s] x
Measured traffic per programme [Mbit/s] x
Responsible: TVC NPO/PPG RBB
Periodicity: m w
Table 7: Traffic parameters (without MPEG DASH)
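Since log files are among the stated sources for the traffic measurements, the derivation of two of the parameters above (video requests per programme and video stream size) can be sketched as a simple log aggregation. The (programme_id, bytes_sent) entry format is an assumption for illustration; real web-server log formats differ.

```python
# Sketch of deriving per-programme traffic parameters from web-server log
# entries. Each entry is assumed to be (programme_id, bytes_sent); real log
# formats would first need to be parsed into this shape.
def per_programme_stats(log_entries):
    stats = {}
    for programme, nbytes in log_entries:
        requests, total = stats.get(programme, (0, 0))
        stats[programme] = (requests + 1, total + nbytes)
    return stats  # programme -> (request count, total bytes streamed)
```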
Category: Traffic (only MPEG-DASH)
Definition: Measurements regarding the traffic of the pilot applications
Sources: Internal databases, Logfiles
Tools: PIWIK, Google Analytics, Akamai LunaControl, Adobe Analytics (formerly Omniture SiteCatalyst), Other internal
Risks:
It is difficult to determine the reason for broken streams (e.g. application failure, problems with the provider, ...)
The end-user devices choose the video quality, not the broadcaster's server. This depends on each manufacturer's DASH implementation, which all work slightly differently and not always perfectly.
Spanish Pilot / Dutch Pilot / German Pilot
Parameters: Video requests per programme [count] (x) tbc x
Video requests during pilot duration [count] (x) tbc x
Total bandwidth [Mbit/s] (x) tbc *
Video stream size [byte] (x) tbc *
Accumulated traffic [Mbit/s] (x) tbc *
Measured traffic per programme [Mbit/s] (x) tbc *
Responsible: I2CAT/RTV IRT
Periodicity: w (tbc) **
Table 8: Traffic parameters for MPEG DASH
*German pilot: The MPEG-DASH service is not yet implemented on the streaming platform used (Akamai), so these values cannot currently be assessed for the pilot. If the service becomes supported during the pilot phase, it will be possible to provide them.
**German pilot: There is no experience with these values for MPEG-DASH streaming, so it is not yet clear which measurement periods would support an appropriate analysis.
7. Evaluation Methodology
7.1. Applications
7.1.1. Methodology developed for TV-RING UX Evaluation
This section describes how we developed the UX evaluation methodology to be applied in the TV-RING pilots. This methodology is based on the latest insights from the literature on how to conduct scientifically rigorous user experience evaluations. In several critical reviews of past UX evaluation practices, a number of concerns were formulated:
There is still a lack of systematic research on how to evaluate and measure UX [28].
When looking at when user experience is evaluated, it was observed that most evaluations only occur after use. Studies of actual use, or before use (expectations for example), are rare [28].
There are no truly longitudinal studies of UX [1].
Scientific quality is a big concern in studies incorporating UX before use: non-validated metrics are used, and self-created metrics are neither documented nor clearly defined [1][28].
(Too) many UX constructs exist; the relation between these constructs is rarely clarified or investigated [1].
To address these issues we made sure
to collect relevant UX constructs together with existing validated instruments for measuring them,
to set up measurements before pilot deployment, during pilot deployment and afterwards,
to make use of qualitative and quantitative data collection methods,
to obtain objective and subjective information,
to use correct procedures for the creation of novel UX measures that have to be created [7],
to combine retrospective UX methods with longitudinal approaches.
Based on these insights we created a series of steps by which we aim to address these issues. The procedure will allow the TV-RING project partners to:
perform proper quantifiable modelling, and have generalizable findings
gain sufficient qualitative insight into the how and why
predict which UX qualities (criteria) a user is very likely to experience with an interactive entity of interest, given integration and interaction of specific UX factors (predictors) [23]
determine the likelihood a user will purchase or adopt a system/product/service (criterion), based on a specific set of user experiences (predictors) [23]
The following procedure sketches how the TV-RING UX evaluation methodology was created:
1. Based on a review of the literature, Pilot UX Evaluation Templates were created (see Annex 11.1), in which the responsible partners could fill in their pilot evaluation details. Given that the partners responsible for the pilots knew their applications and test environment best, and that KU Leuven is expert in User Experience Evaluation, these documents were completed in a number of iterations in which feedback was exchanged.
a. The first section contained a brief description of the pilot, the application, and the number of users.
b. The next section in this document concerned the focus of the pilot evaluation: the responsible partners were to formulate the research questions in their own words (“What do you want to learn from this pilot”?).
c. The section after this linked the Research Questions of each pilot scenario to related UX measures in one table. To help the partners with this, a separate document was created containing relevant UX measures (see Annex 11.2). These measures were collected from literature. We also made sure these covered the six categories of UX formulated in [6]: emotional value, interpersonal value, epistemic value, functional value, and conditional value (see Figure 2).
d. The final section contained a table to link the methods to the chosen measures. There were three methods tables: one for carrying out a measurement before the start of the pilot (expectations etc.), one for measuring during the deployment of the applications, and one for a final evaluation after deployment. For this final section we also included a table containing an overview and description of established methods for UX evaluation (see Annex 11.3). In addition, a number of existing UX evaluation procedures from the literature (exemplars such as [20]) were distributed for inspiration.
2. After the UX templates were completed, KU Leuven gathered all the input, reviewed and processed the information to create the UX evaluation plan. This plan makes sure that specific UX measures can be targeted in the different application scenarios and that at the same time, a number of measures can give some cross-pilot insights.
Image 2: Values in Action: overall UX framework [6]
7.1.2. The TV-RING UX Evaluation Methodology
In this section we explain the methodology in detail. An overview was created in Excel; however, because of its size and complexity, it is included in the Annex (see Annex 11.4). A number of important points across pilots have to be made first. The following table gives a high-level overview of the UX evaluation methodology:
Larger-scale measurements:
- Before pilot deployment: baseline measurements (expectations, current use situations) via questionnaires
- During an episode of use: mainly logging, which can occur without disturbing the participant
- Right after an episode of use: short questionnaires
- After pilot deployment: one final survey, inquiring about the entire, overall experience
In-depth investigation:
- Before pilot deployment: baseline measurements (expectations, current use situations) via interviews and focus groups
- Right before an episode of use: interviews about what people are expecting for the coming episode of use
- During an episode of use: observation (probably via installation of cameras in households)
- Right after an episode of use: interviews about what happened, about observations, about the experience that just happened
- After pilot deployment: one final, overall assessment, via focus groups or UX curve interviews; the latter charts the retrospective user experience
The UX evaluation comprises three main periods: before the deployment of the application, during the deployment, and afterwards. Before deployment we can gain insight into the way participants currently use similar applications, and what their expectations are. During deployment we foresee three periods: right before an episode of use, during an episode of use, and right after an episode of use. By an episode of use we mean the period in which participants are actually using an application; in some pilots, such as the Multicam Live and 2nd Screen pilot applications, this refers to the broadcast time of the respective shows, during which certain functionality can be used. The entire period of pilot deployment will contain many episodes of use, with periods of non-use in between. In all pilots except the DRM pilot, we foresee more lightweight measurements (mainly quantitative) that can be conducted with a larger number of users, and more intensive evaluations (mainly qualitative) that can only be conducted with a more limited number of users. For each measurement period we have defined the UX factors to be measured, and the methods by which they will be evaluated.
7.1.2.1. Spain - Multicam Live Application Scenario
This section contains the UX evaluation methodology for the Multicam Live application scenario in Spain. For this scenario, a focus group session will be carried out before and after the deployment of each of two pilot phases, in which end users and project professionals will work together to understand and co-create aspects of the application which need further elucidation and polishing.
Also, right after each episode of use of the application, user panel members will answer a short online satisfaction questionnaire, to obtain a quantitative measure of satisfaction and a benchmark with which to assess the application's progress towards an optimal solution from the user's point of view.
However, the bulk of the user experience evaluation data is expected to come from video observations and in-depth in situ interviews with the user panel households. These users will record with their own devices their reactions and interactions while they watch the show live on Saturday night, and send the recordings to the project researchers for further analysis. The post-event semi-structured interview will serve as a participant debriefing of the whole experience, and clarify any aspect that requires a deeper understanding:
Table 9: UX evaluation methods for Spanish pilot Multicam Live
7.1.2.2. Spain - Multicam VoD Application Scenario
This section contains the UX evaluation methodology for the Multicam VoD application scenario in Spain. This scenario will be evaluated alongside the Multicam Live application scenario, using the same methods and in the same set of research actions.
Table 10: UX evaluation methods for Spanish pilot Multicam VoD
7.1.2.3. Germany - Abenteuer Liebe
This section contains the UX evaluation methodology for the “Abenteuer Liebe” application scenario in Germany. For this scenario, RBB will organise an introductory meeting for all users from the dedicated test panel. At the beginning of the home-use phase, RBB will carry out interviews with each user from the test panel about their expectations. During the pilot phase, each test user will fill out online questionnaires, weekly or at least every second week. At the end of the pilot, RBB will carry out closing interviews with all users from the test panel, followed by a closing event for participants. This process will apply to both phases of the pilot.
The editor(s) responsible for the app during the pilot will also be interviewed before the pilot starts and after it ends. Their feedback will additionally be documented regularly through interviews or questionnaires, weekly or every second week.
As the application will be openly available on public TV in Germany, we expect to gather a larger amount of quantitative data about the general usage patterns and technical measurements:
Table 11: UX evaluation methods for German pilot Abenteuer Liebe
7.1.2.4. Germany – TVAppGallery
This section contains the UX evaluation methodology for the TVAppGallery. For this element an in-depth evaluation is planned, starting with an introduction, followed by the evaluation itself, and ending with a personal interview. For the evaluation, the user panel members will be asked to answer a comprehensive questionnaire to gather information about the accessibility, effectiveness, usability and attractiveness of the TVAppGallery:
Table 12: UX evaluation methods for German pilot TVAppGallery
7.1.2.5. The Netherlands - DRM
This section contains the UX evaluation methodology for the DRM application scenario. For the larger scale evaluation we will use questionnaires to investigate people’s expectations, motivations and experience concerning the DRM solutions. During use, several technical logs will be kept to investigate how people use the service. After pilot deployment, a more extensive questionnaire will be launched targeting attractiveness, engagement and motivation for the users, overall UX, and usability.
Table 13: UX evaluation methods for Dutch pilot DRM
7.1.2.6. The Netherlands - Recommender
This section contains the UX evaluation methodology for the Recommender application scenario in The Netherlands. For the larger scale evaluation we will use questionnaires to investigate people’s expectations, motivations, and habits concerning recommender systems and choosing what to watch. During the use of the recommender, several technical logs will be kept to investigate how people use it. Furthermore, we will introduce very short questionnaires concerning the relevance of the recommendations, their timing (an item might be suitable on Saturday morning, but not during prime time), and their suitability for the viewers (an item may be a good recommendation for one person when the whole family is watching). After pilot deployment, a more extensive questionnaire will be launched targeting relevance, timing, suitability for the group, overall UX, and usability. For the in-depth evaluation we will conduct Skype or telephone interviews to inquire about how participants used the recommender and why.
Table 14: UX evaluation methods for Dutch pilot Recommender
7.1.2.7. The Netherlands – 2nd Screen
This section contains the UX evaluation methodology for the 2nd Screen application scenario in The Netherlands. This pilot focuses on a 2nd screen app for the Dutch TV programme “De Rijdende Rechter”, and possibly also the Eurosong contest in 2015. This application will augment the TV programme by allowing people to interact with the show by voting, and to play against other members of the household. 10-50 households will take part in the larger scale evaluation, in which the main method of inquiry is a survey covering many UX factors. In the beginning we focus on what people expect from the show and the app, how they currently watch this show, and why. During the show, technical logging data can inform us about the participants’ engagement. Right after the show, participants will receive an online survey with a number of UX measures. One exception will be “flow”, measured via the Flow State Scale, a validated questionnaire for measuring flow consisting of 36 questions. Because of its size, it is not feasible to include it with the other UX measures. Thus, to measure flow, one or two episodes of the show will be used where only flow is measured; all the other episodes will then include the other UX measures and not the Flow State Scale questions. Finally, after the pilot deployment, a final assessment will be carried out, inquiring into how people evaluate the whole pilot period. Next to the larger scale evaluation, we will conduct an in-depth UX evaluation, consisting of in-house visits with pre-episode interviews, observations via camera installations in the home, and post-episode interviews. Given that these methods require a substantial effort, between 5 and 10 households will be recruited. After the pilot deployment, we will also conduct an evaluation into how the households evaluate the whole pilot period.
This will happen via the UX Curve method, an advanced UX evaluation method that allows researchers to chart the evolution of several UX aspects over time, retrospectively.
Table 15: UX evaluation methods for Dutch pilot 2nd Screen
7.2. Platform
The technical evaluation methodology of the services will be the same as used for web applications. At the moment there are no specific HbbTV service evaluation methods or tools. The evaluation methodology consists primarily of three main tasks:
- data collection
- data storage and processing
- analysis of the data
7.2.1. Data collection
Data collection is the most important aspect, but it should run in the background if possible, so that users are not interrupted while using the service. There are two main approaches to collecting user information. The first is to collect data via log files generated by the server. Every request sent from the browser is registered in the log file, usually a text file that is generated anew every day. Typical data that can be gathered from log files includes:
- Date and time of request
- URL
- IP address
- User agent
- Referrer
- Status
- Cookie
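As an illustration, a server log line in the widely used combined log format can be parsed into the fields listed above with a short Python sketch; the sample line and field names below are invented for illustration and do not come from any actual pilot server:

```python
import re

# Combined Log Format (a common Apache/nginx default): IP, timestamp,
# request line, status, response size, referrer and user agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line: str) -> dict:
    """Extract the fields listed above from one access-log line."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else {}

# Invented sample line in the format above.
sample = ('192.0.2.1 - - [03/Sep/2014:20:10:00 +0200] '
          '"GET /hbbtv/index.html HTTP/1.1" 200 5120 '
          '"http://example.org/start" "HbbTV/1.1.1 (;;;;;)"')
fields = parse_line(sample)
```

Analysis tools such as PIWIK or AWStats apply essentially this kind of parsing to each line of the daily log file.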
The second, and most common, way to gather user information is page tagging. This method is applied on the client side: the user agent of the client records the user behaviour. By means of JavaScript and a 1x1-pixel image, it is possible to read the user agent configuration and some of the client's actions. Typical information that can be gathered by page tagging includes:
- Mouse actions: clicks and positions
- Keyboard input such as form content
- Screen resolution
- Installed plugins such as Flash, Java or QuickTime
- Language
- Additional functionalities such as cookies or Java
- Duration or interruption of multimedia files such as videos
- Etc.
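A minimal sketch of how such a tracking pixel works: the client-side script (JavaScript on a real HbbTV device) encodes the measured values into the query string of the 1x1-pixel image request, and the server decodes that query string into one analytics record. Both ends are shown here in Python for brevity; the endpoint URL and parameter names are invented:

```python
from urllib.parse import urlencode, parse_qs

# 1x1 transparent GIF a real endpoint would return so the <img> loads.
PIXEL_GIF = (b'GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00'
             b'!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01'
             b'\x00\x00\x02\x02D\x01\x00;')

def build_pixel_url(base: str, **client_info: str) -> str:
    """Client side: encode the measured values into the pixel request URL."""
    return base + '?' + urlencode(client_info)

def record_hit(query_string: str) -> dict:
    """Server side: decode the query string into one analytics record."""
    return {k: v[0] for k, v in parse_qs(query_string).items()}

# Hypothetical stats endpoint and measurements.
url = build_pixel_url('http://stats.example.org/px.gif',
                      resolution='1280x720', lang='de',
                      action='video_start')
record = record_hit(url.split('?', 1)[1])
```

Because the script runs inside the application, it can report events (video starts, clicks, screen resolution) that never appear in the server access log.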
Compared with the log file output, the data gained with this method is much more extensive; all the information gathered by server log files is also available to client-side methods.
7.2.2. Data storage and processing
Data storage should be considered carefully. A large amount of data can be collected in a short time, so the backend needs sufficient server capacity and performance and should be fail-safe (including a backup mechanism). Two solutions are possible: hosting one's own infrastructure, or using the infrastructure and software of a third party (Software as a Service). Both variants have their advantages and drawbacks; cost, time and privacy issues are the main criteria to be considered. There are several products on both sides:
- Internal storage: PIWIK, Open Web Analytics, AWStats, Webalizer, etc.
- Software as a Service: Google Analytics, Adobe Analytics (SiteCatalyst), Yahoo Web Analytics, etc.
7.2.3. Data analysis
The last task of the evaluation phase is data analysis. Following the table of technical parameters (see chapter 6.2), it is recommended to use the analysis reports introduced below. Each analysis will generally be generated per pilot element, but in some cases it is not possible to obtain the values, e.g. for video parameters when the application contains no video, or for legal reasons, if the institution is not allowed to measure them.
7.2.3.1. Location
Listing of the countries, regions and cities from which users visited the application. The number of users per location will be measured.
Image 3: Measurement of end user location
7.2.3.2. Engagement
A diagram will be created showing visits and returning visits per element over the pilot duration. Depending on the kind of pilot, different active application phases are possible. Some applications are bound to TV shows and are therefore only available to end users during the broadcasting time of the show. In such cases a time scale covering the whole pilot duration makes no sense; each element should use its own appropriate time scale for these diagrams.
Image 4: Measurement of element visits
Image 5: Measurement of returning visits
The next image shows an example of how PIWIK illustrates visit duration. In this case most users stayed between 0 and 10 seconds; it gives just a simple overview.
Image 6: Measurement of visits per visit duration
7.2.3.3. Actions
User interactions in the HbbTV application can also be represented in tables or appropriate diagrams. The values and how they are obtained are defined below.
Page views: The number of times this page was visited.
Unique page views: The number of visits that included this page. If a page was viewed multiple times during one visit, it is only counted once.
Bounce rate: The percentage of visits that started on this page and left the website straight away.
Average time on page: The average amount of time visitors spent on this page (only the page, not the entire website).
Exit rate: The percentage of visits that left the website after viewing this page.
Average generation time: The average time it took to generate the page. This metric includes the time it took the server to generate the web page, plus the time it took for the visitor to download the response from the server. A lower “Avg. generation time” means a faster website for the visitors.
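As a sketch of how these metrics relate to each other, the following Python fragment computes page views, bounce rate and exit rate from a list of visits, each given as the ordered sequence of pages viewed; the page names are invented examples:

```python
from collections import Counter

def page_metrics(visits: list) -> dict:
    """Compute per-page views, bounce rate and exit rate.

    Each visit is the ordered list of pages viewed during that visit."""
    views = Counter()     # total page views per page
    entries = Counter()   # visits that started on the page
    bounces = Counter()   # single-page visits that started on the page
    exits = Counter()     # visits that ended on the page
    for pages in visits:
        views.update(pages)
        entries[pages[0]] += 1
        exits[pages[-1]] += 1
        if len(pages) == 1:
            bounces[pages[0]] += 1
    return {page: {
        'page_views': views[page],
        # bounce rate: share of entries that left immediately
        'bounce_rate': bounces[page] / entries[page] if entries[page] else 0.0,
        # exit rate: share of views of this page that ended the visit
        'exit_rate': exits[page] / views[page],
    } for page in views}

stats = page_metrics([
    ['/start'],                      # a bounce on /start
    ['/start', '/video', '/start'],  # exits on /start
    ['/video', '/start'],
])
```

Tools such as PIWIK or Google Analytics report these same quantities; the sketch only makes the definitions above concrete.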
Image 7: Measurement of user actions
7.2.3.4. Devices
Gathering this information will also be very interesting for the distribution of the different HbbTV versions. The main problem for this measurement is the wide and fast-growing range of end devices. Most tools do not have an up-to-date database of current set-top boxes and connected TVs, so the number of unknown devices is very often high.
Image 8: Device analysis
7.2.3.5. Video traffic
To evaluate the multimedia traffic, some parameters were suggested previously in section 6.2. This part includes every video or audio encoded file delivered through the various HbbTV applications. The intention is to get a meaningful impression of how well the content was received by users and how much traffic it caused. Six parameters were chosen to record and document the traffic; they are explained in more detail here:
Video requests per programme: The video requests for each programme (episode) will be counted periodically. For this count it does not matter whether the users are unique or returning visitors.
Video requests over the pilot duration: This parameter will also be assessed periodically. It represents the sum of all video requests of the application; the values are cumulative over the time periods.
Total bandwidth: This parameter shows the total bandwidth consumption for each period.
Accumulated traffic: The value of this parameter represents the total traffic caused by the application; these values are also cumulative over the time periods.
Measured traffic per programme: The traffic for each programme (episode) will be measured periodically. Again, it does not matter whether the users are unique or returning visitors.
Video stream size: The size of the provided multimedia files will be documented here.
To illustrate these values, it is recommended to use suitable diagrams, depending on the possibilities the analysis tools provide.
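Assuming the raw measurements are available as one record per video request, the periodic and cumulative parameters above could be aggregated roughly as follows; the period labels, programme identifiers and byte counts are invented examples:

```python
from collections import defaultdict
from itertools import groupby

def aggregate_video_traffic(records):
    """Aggregate (period, programme, bytes) records, one per video request,
    into per-programme request/traffic counts and cumulative period totals."""
    per_programme_requests = defaultdict(int)  # video requests per programme
    per_programme_traffic = defaultdict(int)   # measured traffic per programme
    cumulative = []   # (period, total requests so far, total bytes so far)
    total_requests = total_bytes = 0
    for period, group in groupby(sorted(records), key=lambda r: r[0]):
        for _, programme, nbytes in group:
            per_programme_requests[programme] += 1
            per_programme_traffic[programme] += nbytes
            total_requests += 1
            total_bytes += nbytes
        # accumulated traffic and requests over the pilot duration
        cumulative.append((period, total_requests, total_bytes))
    return per_programme_requests, per_programme_traffic, cumulative

reqs, traffic, cum = aggregate_video_traffic([
    ('2014-10-W1', 'ep01', 500_000_000),
    ('2014-10-W1', 'ep01', 300_000_000),
    ('2014-10-W2', 'ep02', 400_000_000),
])
```

In practice these records would come from the CDN or web server logs, and the resulting series would be fed into the weekly diagrams mentioned above.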
8. Pilot evaluation planning
The three pilots will last up to a maximum of 12 months. Following this, a final report with the results of all pilots will be delivered as Deliverable D4.3 “Evaluation results”. This document will contain the same basic information for all pilots, although more information can be provided about each pilot if its partners consider it useful or necessary for their specific purpose.
The main details that need to be defined before starting the pilots include:
- Scope and objectives of the pilots
- Participating users, locations and duration
- Support and Communication plan for the pilot
- Known risks and contingency plans
- Schedule for deploying and conducting the pilot
- Planning of evaluation reporting
8.1. Scope and objective of the pilots
Before commencing each pilot, it is necessary to clearly identify the main purposes for running it and what is expected from its execution. The answers to these questions will guide the reporting. These objectives have been identified by the partners and are listed in section 5 of this document.
8.2. Participating users, locations and duration
The second point to be defined, after answering the first, is how to carry out each pilot in order to achieve those objectives.
We will need to define how many participants we count on, their location, and the duration of the pilot or sub-pilot (testing).
We also need to check that the number of participants is consistent with their location and the duration of the tests; if not, these parameters will need to be adjusted to create a scenario in which all users can easily contribute.
Important points to be taken into consideration are:
- All users have to be able to access the same TV and video content.
- The duration should not be too long, as this decreases end users’ interest in participating in the tests.
- Internet access must not be an obstacle or an excuse for not participating in the tests (for example, in some rural areas).
- The sample of pilot end users must be sufficiently representative of the population.
8.3. Support and communication plan for the pilot
The pilots have to assume that any test may occasionally need to be modified, whether by mistake or for other reasons, for example due to end-user behaviour or to improve a question or template.
We need to clearly define how these changes are requested, submitted, approved, tested, and implemented.
In case an end user has an issue, or simply a question, we need to define where they can post it. An email account or a telephone line can be very useful here, although a dedicated website for reporting issues, or a FAQ, could also be considered.
When the pilot consortium is aware of an issue, they will need to study and review the situation, prioritize, take decisions and finally fix that issue or just answer the questions.
A process is defined to reach and notify the appropriate personnel:
1. The issue is communicated or detected by pilot operators/administrators.
2. The issue is analysed by pilot operators/administrators.
3. The issue is communicated to the pilot consortium.
4. If applicable, the issue is forwarded to the project consortium.
5. Decisions or actions are agreed upon.
6. Pilot operators execute these actions and communicate with end users.
Image 9: Support and communication plan
8.4. Known risks and contingency plans
As the pilots are complex, several potential risks can be identified in their implementation. Some known risks are listed here, but others that have not been considered could occur.
- Costs above expectations/limits
- HbbTV receivers, like any devices, can crash
- End users Internet connection can fail.
The first two cases will imply an increase in the budget. This should be discussed within the project consortium and decisions taken, depending on how much the budget needs to be increased.
For the third case, we can check the Internet connection and try to fix the problem, or contact the Internet Service Provider (ISP) directly; but in case no solution is found, backup participants should be identified from the beginning of the pilot.
8.5. Schedule for deploying and conducting the pilot
Each pilot will have its own deployment schedule, but all pilots will start in M13 of the project and finish in month M24 at the latest.
8.5.1. Dutch pilot
The Dutch pilot is scheduled from M13 until M23. The first scenario will be the Recommendations, which will begin in M13 (September 2014) and finish in M19 (March 2015). Secondly, the DRM scenario will start in M14 (October 2014) and end in M20 (April 2015). The last scenario to start will be the Quiz second screen, from M17 (January 2015) until M23 (July 2015).
8.5.1.1. Application testing
The Dutch pilot partners want to invite current HbbTV users who already use the NPO ‘Uitzending gemist’ platform on HbbTV. These users already have access to HbbTV via satellite (Canal Digital), Digitenne, fibre operators (Glashart), or cable via the providers CAIWay, SKV Veendam, Cable Nord or Delta. This gives a reach of around 450,000 unique monthly visitors. Pilot partners will create a banner on the existing platform to invite users to join the pilots. Via a QR code or URL they can reach a Google form where they can find more information and leave their e-mail address and contact information.
For the pilots, people must be in possession of a Samsung 2013 or 2014 model that is HbbTV version 0.5 compliant. PPG can detect whether the TV has HbbTV and will only show the banner on these TVs. To check whether the TVs are 0.5 compliant, the owners will have to send their model details, which will be asked for on the Google form.
The Dutch pilot will ask its local partners (CAI, Cable Nord, Delta, SKV and Glashart) to actively approach their users (by digital newsletter or e-mail) to explain the project and invite them to participate. Pilot partners can also use social media to invite people to participate.
If the Dutch pilot does not get enough responses, or there are not enough 0.5-compliant TVs, they can hire a recruitment office to select people to participate.
The pilot needs around 30-40 test users willing to participate. To make participation attractive, NPO can offer them a free account for NL Ziet (and NPO Plus) during the pilot. NL Ziet is the paid (monthly fee) on-demand service from RTL, SBS and NPO. Test users must sign a form consenting to access to their IP data and data logging for a certain period. They must also be willing to complete questionnaires (before, during and after the pilots) and participate in interviews.
Scenario 1 (DRM)
A test group of around 20 individual users, varying in age, gender and type of TV-viewer.
Scenario 2 (Recommendations)
A test group of around 15-20 families of different nature and composition.
Scenario 3 (Quiz- second screen)
10 or more families or groups living or spending time together, of different nature and composition (with and without children, student houses, sport clubs, bars), who are already used to HbbTV and second screens.
8.5.1.2. Platform testing
The acquisition and recruitment of the test group will be planned in August and September 2014. We will recruit test users through the existing HbbTV catch-up TV platform, via the local partners who offer the infrastructure and access for HbbTV in the Netherlands, and via social media. We will organise an introduction meeting for all users from the dedicated test panel, or contact them online before the actual start of the pilots.
In September we will start with interviews on how people watch hybrid TV (the combination of linear and on-demand content) and what their moods are, and organise observations of how user groups watch television. We will repeat these sessions in January, March and April 2015. In addition, we will carry out questionnaires about user perception in the same periods.
To evaluate engagement, professional users will also be interviewed and questioned about their expectations and perceptions during and after the pilots, between September 2014 and April 2015.
Actual usage will be evaluated by analysing log files throughout the whole pilot period, from September 2014 until July 2015. Finally, we will use Comscore and Google Analytics to analyse stream starts, click-through rates, and the number and duration of visits during the whole period, and create monthly reports from September 2014 until August 2015.
8.5.2. German pilot
The initial German pilot is scheduled for project months 15 and 16. The AL App will be on air from the beginning of November 2014 until the end of December 2014. The related TV show will be on air from 17.10.2014 until 18.12.2014, on weekdays at 20:10. The whole series will be re-broadcast in 2015; at the time of writing the provisional planning is early summer, but this is a programme decision and cannot be confirmed until nearer the date.
8.5.2.1. Application testing
For the German pilot, existing HbbTV users will be recruited. These testers will use the service for the time the “Abenteuer Liebe” TV series is on-air. This is expected to be the case for approximately two months. Users will be asked to share their experiences via user experience questionnaires, interviews and a focus group discussion at the end of the trial.
The user acquisition will be planned in September 2014; it will then be decided which channels, for example the RBB Facebook page or the “Kinderkanal” (KiKA channel) website, will be used to recruit users from the target group. Promotional texts and images will be prepared. The recruitment process will start at the end of September 2014 and is expected to finish by the end of October. RBB will organise an introductory meeting for all users from the dedicated test panel. This meeting could be held online if some users live far away. In the first two weeks of November 2014, i.e. at the beginning of the home-use phase, RBB will carry out interviews with each user from the test panel about their expectations (e.g. via phone). In the pilot phase in November and December, each test user will fill out online questionnaires, weekly or at least every second week. At the end of the pilot in January 2015, RBB will carry out closing interviews with all users from the test panel. After this, around February, there will be a closing event for all participants, provided most users do not live too far away.
The editor(s) responsible for the app during the pilot will also be interviewed before the pilot starts at the end of October and after the pilot ends in January 2015. Their feedback will additionally be documented regularly through interviews or questionnaires, weekly or every second week.
This process will be repeated for the second phase of testing. The planned time schedule is pictured in the calendar in section 8.7.2
Preconditions for the users are DVB reception, a fast ADSL or even VDSL internet connection, an HbbTV-enabled TV or set-top box, a smartphone and, if needed, parental consent.
For the evaluation of the MPEG-DASH material used, the German pilot expects to reach the subset of MPEG-DASH-enabled devices (HbbTV v0.5), as only these devices support DASH. For users, however, DASH support is not a precondition for active participation in the pilot. The piloted HbbTV application will make use of a browser detection feature that can differentiate between devices that are MPEG-DASH-enabled and those that are not.
The German pilot aims at recruiting in total 40 test users for pilot participation. A team for managing the contacts and all issues of their involvement during the pilot and evaluation will be set up at RBB.
For the TVAppGallery evaluation, IRT will recruit a group of test users; the marketing team as well as IRT's HbbTV experts can offer contacts for possible test users. The test group will include about 15 persons. One option is to invite the test users to IRT for one day for a collective evaluation. If this is not possible because some interested users live too far away, the evaluation can also be carried out remotely. To participate in the evaluation, a regular HbbTV set-top box or TV with satellite and broadband connection is sufficient. The portal will be offered to the users through a test channel from RBB, or will be integrated by IRT directly into the HbbTV devices.
It is planned to carry out the evaluation in the middle of 2015, so the recruiting of possible test users will start at the beginning of 2015.
8.5.2.2. Platform testing
The application used in the German pilot will be on air and freely available; it is expected to be advertised from the end of October through several RBB channels, for example a radio interview or integration into the TV trailer. This means that, in addition to the dedicated UX evaluation test panel, a much larger number of users is expected to use the application. For the technical platform, all user interaction will be measured with the help of technical tools throughout the pilot phase. Data regarding, for example, clicks per visit, duration and video stream size will be collected through tools like PIWIK, Google Analytics, Akamai LunaControl and web server measurements. Exactly what information is measured will be subject to RBB data protection guidelines.
After the follow-up interviews and the closing event in February 2015, RBB will take care of collecting and statistically processing the data. The data will be prepared and analysed to generate recommendations and ideas for new applications by May 2015.
This process will be repeated for the second phase of testing. The planned time schedule is pictured in the calendar in section 8.7.2
For the TVAppGallery evaluation, the recruiting of test users is planned for the beginning of 2015. As explained for the previous user evaluation, it is not yet decided whether it will take the form of a local or a remote evaluation. Introduction information and instructions will be available for the users. The evaluation will be supported by interviews and questionnaires; another option is to use online questionnaires or telephone interviews. The TVAppGallery will be evaluated in one cycle: the technology behind it is already finished and was sufficiently tested in preceding projects, so there will be no major development work during the pilot phase. The frontend of the portal will be adapted following the evaluation outcomes to improve its attractiveness to potential partners. Nevertheless, one evaluation process is seen as sufficient. After the evaluation, IRT will collect all data and work out a summary.
8.5.3. Spanish pilot
The Spanish pilot will run in two phases from M13 (September 2014) to M23 (July 2015). Pilot phase I has been confirmed and is linked to the second season of a specific TV show, “Oh Happy Day!”. This phase will run from M13 (September 2014) to M17 (January 2015). At the core of this phase there is the period in which the aforementioned show will be on air, from the 1st week of October 2014 to the 2nd week of January 2015.
A second pilot phase has been proposed for M18 (February 2015) to M23 (July 2015). The planning for this pilot phase II is contingent on finding and securing adequate contents, which cannot be known at the present moment, because the selection of next year’s programmes is not yet available. The exact schedule and contents for this pilot phase are to be confirmed in M15 (November 2014), three months before the planned start of phase II.
8.5.3.1. Application testing
In the Spanish pilot, users for the evaluation of the HbbTV application will be reached in two ways, depending on whether they experience the service during the pilot period through the managed or the non-managed CDN.
For the managed CDN side, the committed user panel selected in T3.2 in preparation of the pilot will be mobilised in a series of co-creation and user experience evaluation actions. This user panel will be composed of a group of 15 to 20 households, encompassing between 20 to
50 individual users of mixed demographic profiles (young couples, senior citizens, families with kids, and single-person households). To assist in the recruiting and running of the pilot actions regarding the user panel, a partnership has been formalised with Guifinet, a local network provider with a strong presence and reputation in the area selected for the pilot (Gurb, a town in central Catalonia). The user panel households will receive a TV set with HbbTV 0.5 support, to be used for the pilot activities. Upon successful completion of the pilot actions, user panel households will be ceded the TV sets as compensation for their participation.
The user panel will be involved in a series of data-generating activities, which will be focused on the evaluation of the user experience with the HbbTV applications and contents offered throughout the pilot’s duration. These will be:
- A total of 4 co-creation workshop sessions, at the beginning and end of each of the two pilot phases, in which end users and project professionals will work together to understand and co-create aspects of the application which need further elucidation and polishing.
- After each use of the application, user panel members will be asked to answer a short online questionnaire of satisfaction with the application. This questionnaire has two main purposes. First, to obtain a quantitative measure of satisfaction which will allow the project’s researchers to triangulate these non-technical data with technical metrics of quality of service. And second, to have a satisfaction benchmark with which to assess the progress of the application towards an optimal solution from the user’s point of view.
- Ethnographic methods are one of the main planned sources of knowledge in the pilot. For up to two times in total per user, the researchers of the project will perform a participant observation in a household to watch TV with the volunteer, assess first-hand experience of the user, and identify areas for improvement. To overcome logistic challenges, a participative two-step approach to ethnography will be followed. First, the selected household will be assisted to record with their own devices their reactions and interactions while they watch the show live on Saturday night, and send the recordings to the project researchers for further analysis. And second, the project researchers will visit the household to talk with the users and observe in situ their consumption of on-demand contents. A post-event semi-structured interview will serve as a participant debriefing of the whole experience.
For the non-managed CDN side, strong dissemination efforts are expected to yield an organic growth of the HbbTV market. These users will be enticed to use the TVC application with a series of promotion actions. Adequate audience monitoring and user feedback mechanisms will be in place to ensure that high-quality data on usage and satisfaction is generated.
8.5.3.2. Platform testing
The platform technical evaluation for the controlled CDN scenario will start in September 2014. The evaluation will begin with laboratory measurements in order to refine the data acquisition procedure. The pilot TV show for the first phase of the pilot (Oh Happy Day) will be on air in early October, so during September the data acquisition and its processing will be specified in detail. As it is a weekly TV show, the evaluation metrics will be gathered weekly until the end of the first phase in January 2015.
The platform evaluation from the professional users' perspective will be achieved, as stated in 6.0.2, by performing in-depth interviews with the professionals from TVC and RTV for the global CDN scenario around November/December 2014. For the controlled scenario, the Guifinet and i2CAT professionals will be involved in the four focus groups planned at the beginning and the end of each pilot phase.
8.6. Evaluation reporting
Final execution reporting will be described in deliverable D4.2 "Pilot Execution Report", and at the end of the project the conclusions will be described in deliverable D4.3 "Evaluation Results". While the pilots are running (12 months), reporting will be carried out regularly: every one or two months, consumption and performance data will be reported to the pilot consortium and then to the project consortium. Several end-user tests are identified in each pilot during pilot execution; their outcomes will be reported to the consortium once the data are cleaned and ready to be presented.
8.7. Pilot evaluation calendar
The TV-RING pilots can be described as successions of coordinated actions, scheduled differently in each region. This is mostly due to the interrelation between the applications to be deployed and the associated content. Since the evaluation process is complex, a calendar of activities has been defined per pilot. It will then be possible to merge all pilots and define common milestones that will facilitate the final evaluation and a better organisation of the related tasks.
The following sections give a detailed schedule of all actions linked to the pilot evaluation, aligned with the pilot execution. This information makes it much clearer how the evaluation process is expected to be conducted during the piloting stage. It is important to highlight that this calendar covers only evaluation tasks, not the other actions that are part of each pilot.
8.7.1. Dutch pilot
Image 10: Dutch pilot evaluation calendar
8.7.2. German pilot
Image 11: German pilot evaluation calendar
[Gantt chart, project months 12–30 (August 2014 – February 2016), covering the nominal project schedule (Pilot Germany, T4.2; Final Evaluation of Pilots, T4.5), the pilot execution schedule (AL app and TV show on air) and the evaluation plan schedule: preparation of user acquisition and user panel recruitment; end-user panel evaluation (introductory meeting if feasible, expectation interviews at the beginning of the home-use phase, regular online questionnaires, follow-up interviews at the end of the home-use phase, closing event if feasible); on-air user evaluation of test panel and viewers (advertising of the TV show and application via Facebook and on-air trailers, documented user-support feedback, continuous and regular technical measurement per 8.8.2, in-app questionnaire evaluation); professional user panel evaluation (expectation interview at the beginning of the pilot, editor training, regular interviews or questionnaires, follow-up interview); results aggregation and analysis (collection and statistical processing of data, evaluation and preparation of data, recommendations and ideas for new apps); and TVAppGallery user evaluation (user panel recruitment, evaluation, processing of the results).]
8.7.3. Spanish pilot
Image 12: Spanish pilot evaluation calendar
8.7.4. Common TV-RING evaluation calendar
[Gantt chart combining the three pilot evaluation calendars against the nominal project schedule (Pilots Execution T4.2/T4.3/T4.4; Final Evaluation of Pilots T4.5). Dutch pilot: module deployment (Recommender, DRM, Quiz) alongside the call for a user panel, a literature study on willingness to pay, interviews and observation on how people and user groups watch TV and their moods, questionnaires on viewer perception, and analysis of log files (user engagement), Comscore (stream starts, click-through) and Google Analytics (number and duration of visits). Spanish pilot: planned schedule with call for user panel members and recruitment, "Oh Happy Day" 2nd season on-demand and live-finale multicamera programme tests with a results retrospective, overall results aggregation and analysis, user panel focus groups and video-based observation with post-test interviews (phases I and II), in-app marking evaluation and qualitative non-managed ("like") evaluation; potential schedule with a second on-demand/live test, "Oh Happy Day" 3rd season tests, and La Marató 2014/2015 charity one-day specials with retrospectives; infrastructure work covering RTE CDN interconnection testing and planned/potential i2CAT interconnection. German pilot: as in Image 11 (pilot execution with AL app and TV show on air; user acquisition and panel recruitment; end-user, on-air and professional user panel evaluation; results aggregation and analysis; TVAppGallery user evaluation).]
9. Conclusions
In this document, each TV-RING pilot site has provided a comprehensive evaluation plan for the services it intends to test. A fair part of the work was based on the know-how and expertise of the TV-RING partners KU Leuven and IRT, who suggested approaches and provided templates with parameters and methodologies to be considered for evaluating both the user experience and the technical performance of all pilot services in the TV-RING pilot areas. Substantial work has gone into harmonising the evaluation plan while coping with the varied nature of the services being tested, as well as with the implications of running pilots in multiple locations and involving users in different ways. Thanks to the cooperation and collaboration among partners, the project is following a cohesive, coordinated approach to testing and evaluation. Based on this approach, we are optimistic that the TV-RING pilots will produce reliable and meaningful results that will in turn be of use to the whole HbbTV and connected TV ecosystem. The evaluation results will be documented in Deliverable D4.3, due in project month 30.
10. Bibliography & References
1. Bargas-Avila, J.A. and Hornbæk, K. Old wine in new bottles or novel challenges: a critical analysis of empirical studies of user experience. Proceedings of the 2011 annual conference on Human factors in computing systems, ACM (2011), 2689–2698.
2. Brooke, J. SUS-A quick and dirty usability scale. Usability evaluation in industry 189, (1996), 194.
3. Chuttur, M. Overview of the Technology Acceptance Model: Origins, Developments and Future Directions. 2009. http://sprouts.aisnet.org/9-37/.
4. Desmet, P. Measuring emotion: Development and application of an instrument to measure emotional responses to products. In Funology. Springer, 2005, 111–123.
5. Desmet, P.M., Hekkert, P., and Jacobs, J.J. When a Car Makes You Smile: Development and Application of an Instrument to Measure Product Emotions. Advances in consumer research 27, 1 (2000).
6. Fuchsberger, V., Moser, C., and Tscheligi, M. Values in Action (ViA): Combining Usability, User Experience and User Acceptance. CHI ’12 Extended Abstracts on Human Factors in Computing Systems, ACM (2012), 1793–1798.
7. Green, W., Dunn, G., and Hoonhout, J. Developing the scale adoption framework for evaluation (SAFE). International Workshop on, Citeseer (2008), 49.
8. Hassenzahl, M. The Interplay of Beauty, Goodness, and Usability in Interactive Products. Hum.-Comput. Interact. 19, 4 (2008), 319–349.
9. Hassenzahl, M., Burmester, M., and Koller, F. AttrakDiff: Ein Fragebogen zur Messung wahrgenommener hedonischer und pragmatischer Qualität. In Mensch & Computer 2003. Springer, 2003, 187–196.
10. Izard, C.E., Libero, D.Z., Putnam, P., and Maurice, O. Stability of emotion experiences and their relations to traits of personality. Journal of Personality and Social Psychology 64, 5 (1993), 847–860.
11. Jackson, S.A. and Eklund, R.C. Assessing Flow in Physical Activity: The Flow State Scale-2 and Dispositional Flow Scale-2. Human Kinetics Journals, 2010.
12. Jackson, S.A. and Marsh, H.W. Development and Validation of a Scale to Measure Optimal Experience: The Flow State Scale. Human Kinetics Journals, 2010.
13. Jain, J. and Boyce, S. Case study: longitudinal comparative analysis for analyzing user behavior. Proceedings of the 2012 ACM annual conference extended abstracts on Human Factors in Computing Systems Extended Abstracts, ACM (2012), 793–800.
14. Kahneman, D. (2010). The riddle of memory vs. experience. http://www.ted.com/talks/daniel_kahneman_the_riddle_of_experience_vs_memory
15. Kahneman, D., Krueger, A.B., Schkade, D.A., Schwarz, N., and Stone, A.A. The Day Reconstruction Method (DRM): Instrument Documentation. (2004). Retrieved April 3, 2005.
16. Karapanos, E., Zimmerman, J., Forlizzi, J., and Martens, J.-B. User experience over time: an initial framework. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM (2009), 729–738.
17. Karapanos, E., Martens, J.-B., and Hassenzahl, M. Reconstructing Experiences through Sketching. arXiv:0912.5343 [cs], (2009).
18. Knijnenburg, B.P., Willemsen, M.C., Gantner, Z., Soncu, H., and Newell, C. Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction 22, 4-5 (2012), 441–504.
19. Kujala, S. and Miron-Shatz, T. Emotions, Experiences and Usability in Real-life Mobile Phone Use. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM (2013), 1061–1070.
20. Kujala, S., Roto, V., Väänänen-Vainio-Mattila, K., Karapanos, E., and Sinnelä, A. UX Curve: A method for evaluating long-term user experience. Interacting with Computers 23, 5 (2011), 473–483.
21. Lang, P.J. Behavioral treatment and bio-behavioral assessment: Computer applications. (1980).
22. Lavie, T. and Tractinsky, N. Assessing dimensions of perceived visual aesthetics of web sites. International Journal of Human-Computer Studies 60, 3 (2004), 269–298.
23. Law, E.L.-C. The measurability and predictability of user experience. Proceedings of the 3rd ACM SIGCHI symposium on Engineering interactive computing systems, ACM (2011), 1–10.
24. Lewis, J.R. IBM computer usability satisfaction questionnaires: Psychometric evaluation and instructions for use. International Journal of Human-Computer Interaction 7, 1 (1995), 57–78.
25. McAuley, E., Duncan, T., and Tammen, V.V. Psychometric Properties of the Intrinsic Motivation Inventory in a Competitive Sport Setting: A Confirmatory Factor Analysis. Research Quarterly for Exercise and Sport 60, 1 (1989), 48–58.
26. Sheth, J.N., Newman, B.I., and Gross, B.L. Why we buy what we buy: A theory of consumption values. Journal of Business Research 22, 2 (1991), 159–170.
27. Snillito, M.L. and de Marie, D. Value: its measurement, design and management. (1992).
28. Vermeeren, A.P.O.S., Law, E.L.-C., Roto, V., Obrist, M., Hoonhout, J., and Väänänen-Vainio-Mattila, K. User experience evaluation methods: current state and development needs. Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries, ACM (2010), 521–530.
11. Annex
11.1. Pilot UX Evaluation Template
Pilot scenario description
In this section, please provide a short description of the application scenario in each pilot. For example, the Dutch pilot has three application scenarios; each one requires a separate template document.
Table 0. Pilot scenario description information
Pilot scenario
Short scenario description (1 paragraph)
Country / region
Target group (number of people, ages, individuals-households, type of viewer…)
Target device(s)
Total scenario deployment time period (month X1 – month X2)
Research question
Try to formulate what you want to learn from your pilot. What is your focus? You don’t have to complete 4 research questions; 3 is OK – 6 is also OK. This is a first iteration. When we (KU Leuven) receive your input, we will look at it, and provide feedback if necessary.
Table 2. Overview of research questions
Research Question 1
Research Question 2
Research Question 3
Research Question 4
Selected Evaluation Measures
Indicate a number of measures you believe might be interesting for your pilot – measures you believe will provide some insight into your research questions. The measures are available in a separate document (D4.1 UX Measures.xlsx). You can include your own measures if you cannot find them in our list.
Table 3. UX Measures and related research question
UX Measure Related Research Question(s) Number(s)
Preferred methods
Describe which methods you would like to use to answer your research questions. A selection of methods can be found in a separate document (D4.1 UX Methods.xlsx). Even more UX methods can be found at http://www.allaboutux.org. Finally, you can also use your own methods if that is required for your pilot scenario.
Make sure you include:
- What-people-say methods AND what-people-do methods
- Qualitative AND quantitative data gathering methods
If you are not certain about the right method, don't worry. We will help every partner set up the proper methodology based on your input.
Before start of scenario deployment
Method Related Measure
During scenario deployment
Method Related Measure
After scenario deployment
Method Related Measure
11.2. UX Measures Table
11.3. UX Methods Overview
11.4. TV-RING Complete UX Evaluation Methodology
Image 13: Evaluation methodology for the Dutch pilot
[Matrix: for each Dutch scenario (2nd screen, Recommender, DRM), UX measures and methods are mapped to the phases before pilot deployment, before/during/after each episode of use, and after pilot deployment. The larger-scale evaluation relies on questionnaires (expectations, current habits, motivation, aesthetics/appeal, distraction, engagement, enjoyment, sociability, social image, social presence, overall UX and, for DRM, willingness-to-pay), technical logging of engagement and usage, the IBM ASQ and SUS usability instruments, and the Flow State Scale. The in-depth UX evaluation covers the same measures through interviews, observation, and the UX Curve / interview; the Recommender scenario additionally uses TV questionnaires and Skype or telephone interviews on usability, relevance, timing, suitability for the group and overall UX.]
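Several of the larger-scale instruments named in the methodology (e.g. SUS) have fixed, published scoring rules. As a minimal illustrative sketch, not project code, the standard SUS formula from Brooke (1996) can be expressed as:

```python
# Standard SUS scoring (Brooke, 1996): ten items rated 1-5.
# Odd-numbered items contribute (rating - 1), even-numbered items
# contribute (5 - rating); the sum multiplied by 2.5 gives a 0-100 score.
def sus_score(ratings):
    if len(ratings) != 10:
        raise ValueError("SUS requires exactly 10 item ratings")
    total = sum((r - 1) if i % 2 == 0 else (5 - r)  # i = 0 is item 1 (odd)
                for i, r in enumerate(ratings))
    return total * 2.5
```

For example, a respondent who answers 5 on every odd-numbered item and 1 on every even-numbered item obtains the maximum score of 100.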
Image 14: Evaluation methodology for the German pilot
[Matrix: for the German scenario (Abenteuer Liebe), the larger-scale evaluation combines pre-deployment questionnaires on expectations and current habits, technical logging of engagement during use, post-episode and post-deployment questionnaires on distraction, empowerment, engagement, enjoyment, participation, reciprocity, sociability, social presence and overall UX, the IBM ASQ and SUS usability instruments, and the Flow State Scale. The in-depth UX evaluation covers the same measures, plus usability, through interviews, observation, and the UX Curve / interview.]
Image 15: Evaluation methodology for the Spanish pilot
[Matrix: for both Spanish scenarios (Multicam Live and Multicam VoD), the larger-scale evaluation uses pre-deployment questionnaires on expectations and current habits, technical logging of engagement and usage during use, post-episode and post-deployment questionnaires on aesthetics/appeal, engagement, enjoyment and overall UX, and the IBM ASQ and SUS usability instruments. The in-depth UX evaluation uses focus groups and interviews before deployment, observation and interviews around each episode of use, and focus group / co-creation sessions after deployment, covering aesthetics/appeal, engagement, enjoyment, usability and overall UX.]
11.5. General Calendar – Printable version