
Ecosystem infrastructure for smart and personalised inclusion and PROSPERITY for ALL stakeholders

D401.1 Evaluation framework and supporting material for evaluation design

Project Acronym Prosperity4All

Grant Agreement number FP7‐610510

Deliverable number D401.1

Work package number WP401

Work package title Evaluation framework and technical validation

Authors Katerina Touliou, Maria Gemou, Till Rieder, Matthias Berning, Stefan Schürz, Víctor Manuel Hernández Ingelmo, Jess Mitchell, Evangelos Bekiaris, Gregg Vanderheiden.

Status Final

Dissemination Level Confidential

Delivery Date 26/01/2015

Number of Pages 87

Ecosystem infrastructure for smart and personalised inclusion and PROSPERITY for ii ALL stakeholder. www.prosperity4all.eu

Abstract

Deliverable D401.1, “Evaluation framework and supporting material for evaluation design”, of the Prosperity4All project proposes a human-centred evaluation framework with four iteration phases, to be conducted with implementers, end-users and all relevant types of stakeholders, followed by a final impact assessment. Diversity and complexity strongly shape the framework, which incorporates many aspects, as is to be expected when accessibility products for diverse groups are brought into real use under the same roof. The deliverable starts by identifying the evaluation questions, the core aspects of the framework and the Key Performance Indicators (KPIs) that the evaluation is intended to address, and then specifies the logical models for each evaluation type (i.e. implementers, end-users, impact assessment). Detailed tables presenting the high-level objectives, methods and metrics for testing with implementers, testing with end-users and the final impact assessment are annexed as reference material for designing the detailed evaluations to follow, and are available to partners whenever needed (Annex B: Compilation of chosen evaluation materials & relevant instruments). The framework also addresses practical and ethical aspects of the evaluation as a whole and of the separate iterations, such as ethics approval, recruitment, data handling and analysis, risk mitigation and work allocation. The deliverable concludes with the requirements for sketching the first evaluation plan for implementers (context of forthcoming D402.1). It should be stressed that D401.1 is intended as the high-level framework of the evaluation to be held in Prosperity4All and, as such, as the backbone for the specific (and more detailed) evaluation plans that will follow in the corresponding deliverables (D402.1, D403.1, D404.1 & D404.2).
Also, because the highly relevant work of SP1 of the project (Economic Model) had not been finalised at the time of this deliverable’s release, revisions and updates of the presented content may emerge in an updated version of this deliverable and be reflected in the above deliverables.

Keyword List

Evaluation framework, implementers, end-users, usability testing, user experience, impact assessment.


Version History

Revision | Date | Author | Organisation | Description
1 | 01/06/2014 | Maria Gemou, Katerina Touliou, Till Rieder | CERTH, KIT | First version of the framework
2 | 6/09/2014 | Jess Mitchell | IDRC | Categories of actors, functional roles, and value propositions
3 | 9/09/2014 | Katerina Touliou, Maria Gemou, Till Rieder, Matthias Berning, Stefan Schürz, Víctor Manuel Hernández Ingelmo, Gregg Vanderheiden | CERTH, KIT, LIFEtool, TECHNOSITE, RTF-I | Second version of framework circulated for comments and suggestions
4 | 9/09/2014 | Gregg Vanderheiden | RTF-I | Suggestions about restructuring the tables, rephrasing evaluation questions and feedback on content
5 | 8/10/2014 | Mark Magennis | NCBI | Comments, suggestions
6 | 3/11/2014 | Katerina Touliou, Maria Gemou, Till Rieder, Matthias Berning, Stefan Schürz, Víctor Manuel Hernández Ingelmo, Jess Mitchell, Evangelos Bekiaris, Maria Panou, Athina Dimou, Gregg Vanderheiden | CERTH, KIT, LIFEtool, TECHNOSITE, RTF-I | Submitted for internal peer review
7 | 12/12/2014 | Gianna Tsakou, Mathias Peissner | SILO, FHG | Received internal review comments
8 | 26/01/2015 | Katerina Touliou | CERTH | Incorporated comments and changes and submitted to EC


Table of Contents

Executive Summary
1 Introduction
1.1 Evaluation in the project’s workplan
1.2 Evaluation aims and questions
5 Evaluation in the context of Prosperity4All
5.1 The Prosperity4All Key Performance Indicators (KPIs)
5.2 Steps in creating the evaluation framework
5.3 Iterative phases
6 The evaluation framework
6.1 Introduction
6.1.1 Major dimensions of evaluation in Prosperity4All
6.2 Evaluation with implementers & users
6.2.1 Evaluations with implementers
6.2.2 Evaluations with end-users
6.2.3 Evaluation context and evaluation conditions
6.2.4 Anticipated limitations
6.2.5 Implementers
6.2.5.1 Examples of generic personas and application scenarios for implementers
6.2.6 End-users
6.2.6.1 Examples of generic personas and application scenarios for end-users
6.2.7 Reference case (baseline)
6.2.8 Variations per evaluation phase
6.3 Impact assessment
6.3.1 Evaluation context and evaluation conditions
6.3.2 Examples of personas & generic application scenarios for impact assessment
6.3.3 Reference case
6.3.1 Anticipated limitations
7 Prosperity4All test sites descriptions
7.1 LIFETOOL (Austria)
7.2 TECHNOSITE (Spain)
7.3 KIT (Germany)
7.4 CERTH (Greece)
8 Participants recruitment
8.1 User involvement strategies
8.1.1 Selecting participants
8.2 Basic recruitment steps
8.3 Prosperity4All Collaborative Network
9 Ethical issues
9.1 Focus of Ethics in Evaluation
9.2 Ethics Control during Evaluation Activities
9.2.1 Ethics control for end-user testing at pilot sites
9.2.2 Ethics control for implementers testing
9.2.3 Ethical issues when users have interchangeable roles in the Prosperity4All ecosystem
10 Planning across evaluation phases
10.1 Core elements of testing plans
10.2 Towards the 1st Evaluation Phase – Building the Reference Case for the Implementers
10.3 Mapping between SP2 tools/resources and SP3 Implementations
11 Training issues
12 Integrity of Evaluation
13 Data handling & Statistical analysis
14 Practical Organisation of Work
15 Conclusions and Next Steps
References


List of Abbreviations

Abbreviation Full form

AAL Ambient Assisted Living

AoD Assistance on Demand

API Application Program Interface

AsteRICS ACS Assistive Technology Rapid Integration & Construction Set

AT Assistive Technology

D Deliverable

DoW Description of Work

DSpace DeveloperSpace

GUI Graphical User Interface

GPII Global Public Inclusive Infrastructure

HF Human Factors

IA Impact Assessment

ICT Information and Communications Technology

IDE Integrated Development Environment

ISO International Organization for Standardization

IT Information Technology

KPI Key Performance Indicator

LMS Learning Management System

Lo‐FI Low Fidelity

PoT Producer of Things

R&D Research and Development

SES Socio‐Economic Status

SP Sub‐Project

SPSS Statistical Package for Social Sciences

SUMI Software Usability Measurement Inventory


SUPRQ The Standardized User Experience Percentile Rank Questionnaire

SUS System Usability Scale

SVVP Software Verification and Validation Plan

S/W Software

TAM Technology Acceptance Model

TTS Text to Speech

UAT User Acceptance Testing

UEQ User Experience Questionnaire

UI User Interface

UX User Experience

WAMMI Website Analysis and Measurement Inventory

WoZ Wizard of Oz

WP Work Package

WtH Willingness to Have

WtP Willingness to Pay


List of Tables

Table 1: Key Performance Indicators (KPIs) addressed in SP4 evaluation: Overview and summary
Table 2: Evaluation phase 1 – Content, Actors, Evaluation methodology
Table 6: Lifecycle evaluation and impact assessment – Content, Actors, and Evaluation methodology
Table 7: Connecting the evaluation questions and the KPIs with primary indicators (implementers)
Table 8: Qualitative method comparison of relevant usage parameters
Table 9: Allocation of participants – Implementers
Table 10: Connecting the evaluation questions and the KPIs with primary indicator (end-users)
Table 11: Allocation of participants – evaluations with end-users
Table 12: Connecting the evaluation questions and the KPIs with primary indicators (impact assessment)
Table 13: Test sites ethics responsible persons
Table 14: Developers/Implementers feedback loop template
Table 15: Evaluation risk and mitigation plan

List of Figures

Figure 1: The core elements of the Prosperity4All evaluation framework
Figure 2: The connections in the Prosperity4All evaluation methodology
Figure 3: Functional roles for key actors
Figure 4: SP4 Evaluation interdependencies with other Prosperity4All SPs
Figure 5: Interdependency between C4All and Prosperity4All
Figure 6: Iterative phases
Figure 7: A logical model for the evaluations with internal and external implementers
Figure 8: Peter Morville - User Experience Honeycomb [16]
Figure 9: Potential tradeoffs of cognitive dimensions
Figure 10: Technology Acceptance Model
Figure 11: A logical model for the evaluations with end-users
Figure 12: A logical model for the impact assessment
Figure 13: User testing facilities (LIFEtool)
Figure 14: User testing facilities (CERTH/HIT)
Figure 15: Ethical issues monitoring process and actors
Figure 16: Initial mapping and interaction between the different functionalities and the SP2 developments (DoW, p. 63)
Figure 17: ATutor courses menu (example)
Figure 18: Gantt chart presenting the work allocation and the duration of each task in SP4


Executive Summary

This deliverable presents a human-centred, multi-faceted evaluation framework for the Prosperity4All project. The deliverable is structured in 15 chapters and 4 annexes. Chapter 1 briefly presents the aims of the project, followed by the questions that have driven the development of the evaluation framework. Chapter 2 briefly presents the envisioned outcome of the project as a rich, interactive ecosystem, together with the potential actors that may interact with the P4All platform.

The interdependencies arising among the project’s different working teams are discussed in Chapter 3, as well as the connections and overlapping areas between Cloud4all and Prosperity4All. After the framework has been positioned within the project and within European research, the evaluation is introduced in Chapter 4. A human-centred approach is adopted for the development of the evaluation framework, which is elaborated with stratification tables. The framework aims to address the evaluation-directed Key Performance Indicators (KPIs) for: a) implementers (developers who want to make their applications and services more accessible), and b) end-users (people with a broad range of accessibility-related needs and preferences, including several stakeholder groups) (complete table in Annex A). The evaluation framework addresses certain dimensions across four iterations, as presented in Chapter 5. A preliminary account of the objectives, methods and indicators for carrying out evaluations with both major groups is provided in Chapter 6, with considerations about potential variations among iterations (relevant evaluation materials are compiled in Annex B), and is depicted in three logical models, one for each evaluation type. Evaluations with end-users and stakeholder groups will be carried out across four pilot sites in Europe (Germany, Austria, Spain and Greece), which are described in Chapter 7.

Participants will be recruited according to the processes and guidelines included in Chapter 8, and testing will abide by European and national ethics guidelines, as provided in Chapter 9 (further related materials for the project teams are provided in Annexes C & D, whereas the analytical ethics framework of the project will be provided in the D501.2 releases). The core elements of testing across all evaluation phases are presented in Chapter 10, followed by an overview of the training platform that will be available online (Chapter 11). A mitigation strategy for issues arising during testing or evaluation planning is presented in Chapter 12. Chapter 13 focuses on how data will be handled and statistically processed in each evaluation. Work allocation within SP4 is depicted in a Gantt chart dedicated solely to this sub-project’s activities and discussed in Chapter 14. Finally, the deliverable closes with Chapter 15, which presents the conclusions and the next steps towards the planning of the first iteration with implementers.

The main outcomes of the deliverable are: a) the modelling of the evaluations to be carried out with implementers and end-users, based on overarching logical models for each evaluation type and the impact assessment, accompanied by a general overview of the potential methods, techniques and indicators to be used; b) the establishment of the timeframe for the evaluation, with consideration for the technical validation activities; and c) the definition of the mitigation steps and of a meta-evaluation activity to be conducted after each evaluation phase. The framework represents the foundation for the specific evaluation plans to follow.

As this deliverable is submitted before SP1 has finalised the list of actors and the models to be considered in the ecosystem, the work presented hereinafter will be amended accordingly. An updated version of this deliverable will be submitted in 12 months, when the respective SP1 documents have been finalised and made available. The related information will then be revisited in the analytical testing and impact assessment protocols of D402.1, D403.1, D404.1 & D404.2 and, if necessary, in an updated version of this deliverable.

1 Introduction

There are dozens of online marketplaces where people can buy and sell. Finding the right one depends on what people want to buy, how much they are willing to spend, and whom they are trying to sell to. Online e-commerce experience has multiplied by almost 4.2 since the dot-com bubble [1]. Even though we know four times more, almost a quarter of online shopping is still lost because users fail to complete transactions, leaving resources and potential on the table that still need to be covered or satisfied. There are giants like eBay or Amazon as well as smaller enterprises targeting narrower or niche markets. More relevant to the work carried out in the project are examples of multi-platforms available for developers and professionals, such as Mozilla Drumbeat [2] and Arcbazar [3]. A similar community is the Tetra Society of North America [4], which is dedicated to assisting people with disabilities to achieve an independent and fulfilling life by providing custom assistive devices, and which has been running since 1987 thanks to its extensive network of volunteers. It is difficult to identify how many similar platforms currently exist because they constantly pop up in the world of online purchasing. There are not many online spaces that offer users and developers the opportunity to communicate and share products and information. It is even more difficult to create markets addressing the needs of people representing the tail of the tails, especially in times of economic recession, when things are tough even for large international companies. The choices are far more restricted for people with disabilities looking for customised products. If we consider the limited accessibility of online shops and add to this the limited access to ATs, a substantial market emerges, and Prosperity4All aims to address its needs and requirements.

The aim of Prosperity4All is to provide an infrastructure for the development of an ecosystem by employing modern techniques, such as crowdsourcing and gamification, to enable new strategies for developing accessibility services and to introduce a new approach to accessibility solution development. Moreover, the project foresees the creation of productive and diverse commercial and research relationships, where the user is active and independent in choosing self-accommodating, customised and personalised solutions. As a result, the project will engage diverse stakeholder groups, which will freely interact in various ways once the ecosystem is deployed, and thus broaden the development process out to the user.

The Prosperity4All innovation is that any user with accessibility needs and preferences will be able to suggest, micro-finance, find, choose and buy customised tools, applications and services. Implementers will be better able to sell their services, tools and applications and to understand market needs. Such new directions in development and marketing require us to view testing and the user in a new light, focusing on human-centred techniques for testing the design process for the designer as well as for the user and, in addition, for a hybrid role (developer-user).

1.1 Evaluation in the project’s workplan

The evaluation framework is built on the objectives of the project and on the specific objectives set for the evaluation. It starts from the overall objectives, considering the market-related and economic goals of the project as depicted in the KPI table (overview: Table 1), and ends by proposing models for planning the evaluations with the two major actor groups: implementers, and end-users together with other stakeholder groups. The evaluation is part of the SP4 activities, which also include the impact assessment and the demonstration activities to be carried out later in the lifespan of the project, and there are six strategic, implementation and business-related aspects related to these aims. It comprises two evaluation types, a) testing with implementers (WP402) and b) testing with end-users (WP403), and a separate impact assessment (WP404). The evaluation follows the same strategic and implementation strategies that govern the whole project by closely adopting the following best practices. Firstly, it aims to reinforce and ensure that a human-centred design is followed, by including real users in evaluations. Secondly, it aims to organise and monitor any technical validation activities related to SP2 and SP3 development work, thus ensuring smooth implementation before each evaluation phase. Thirdly, this incremental, iterative and evolutionary approach also supports the development processes to be established during the project, leading to more sophisticated designs and development work in the later stages of the project, according to the feedback provided and the data gathered from real users. The incremental and iterative testing will allow products to be refined based on the tails-of-tails’ accessibility needs and on the clusters of needs that a user and a potential ecosystem might have.
Fourthly, the evaluation will aim to engage various stakeholder types and to address the complexity and diversity of the project, in order to make the evaluations as representative as possible. Lastly, the impact assessment (WP404) will initially be based on the current relevant market situation (WP101) and will compare it to the Prosperity4All business model (WP103). It also aims to create indicators, provide estimations of global usage and investigate opportunities for adoption of the P4All ecosystem, by measuring and monitoring the project’s impact and by applying validation simulative forecasting methods for continuously monitoring its sustainability at business, market and economic levels. The evaluation teams should always bear in mind two overarching objectives of the project: the first refers to the evaluation phases and the second to the impact assessment. First, the developer and implementer teams are working towards addressing the needs of a much wider array of users, including the unserved, and at a much lower cost. Second, the impact estimations will be for a system built on game theory and crowdsourcing, which allows the whole system to attract and effectively use a much wider and larger group of individuals working together, with individual drivers (magnanimous and commercial), to cumulatively create the ecosystem of people, products, services and users that needs to exist.

The user, whether in the role of end-user of a product or of an implementer wishing to provide a high-quality accessible product or a product customised for a user with very specific needs, is the key decision maker and, as such, will be at the centre of the evaluation activities carried out in SP4. As different evaluation activities will be carried out in order to address all evaluation dimensions, the current framework is established to support these activities, acting as a backbone and reference document. It is therefore vital that the framework be flexible enough to accommodate the specificities of each evaluation phase scheduled later in the project.

Building the evaluation framework requires knowledge not only of traditional usability and user experience testing but also insight into customer perception and e-commerce marketing analytics. The latter is especially true for the impact assessment. Usability will be a precondition for the SP3 applications and services, which are already products, and it might not be the principal desired outcome of the evaluations; investigation of cost-effectiveness, however, is a relevant dimension. The proposed SP2 solutions should decrease the cost and time that implementers require for enhancing or changing their products compared to existing practices. Existing practices will be investigated with structured interviews before any evaluations (i.e. with implementers and end-users) take place, in order to define participants’ current way of working. This is called the reference case (baseline); the reference case for implementers is presented in section 10.2. The reference case for end-users will be presented in the evaluation plan for end-users (WP403), which starts much later in the project. Focusing on implementers at the moment is relevant because the first iteration phase with implementers will start within the next few months (February-April 2015).

There is no monolithic approach to establishing the framework for evaluation and the respective plans; rather, a zero-based approach is followed, built up by borrowing perspectives from mixed methodologies and agile usability techniques, with stepwise addition of the evaluation and assessment components and tools to be selected for each phase. The complexity of probable interactions and interdependencies presents a challenge in bringing theory (e.g. user modelling, methodologies, instruments to measure quantitative and qualitative indicators) to practice (e.g. participatory design, remote testing, field testing) for evaluating a multi-faceted ecosystem that aims to provide a framework with internationalization prospects. For this purpose, the inherent language variations in Europe fit well with the internationalization of development work and testing, and cultural context variability is thus an advantage of operationalising this endeavour in Europe. In other words, Europe offers the opportunity of testing the work developed within the project in different languages and different cultural contexts and, thus, of investigating certain internationalization prospects of the project outcomes.

1.2 Evaluation aims and questions

Evaluations include back-end evaluation of the developer-facing tools (SP2) and the Developer Space (DSpace) resources, and of their use by internal and external developers and implementers in making applications accessible (SP3), as well as front-end evaluation of the user-facing tools (SP2) and of various internal (SP3) and external applications and services. Last but not least, an impact assessment will be conducted based on both the business and market models for the deployment of the ecosystem (with estimations of significant strengths, weaknesses and opportunities formed within SP1), also taking advantage of the outcomes of the iteration phases of the evaluation processes. The Developer Space (DSpace) is currently under construction and will allow structured search in a comprehensive list of building blocks for Assistive Technology development and implementation. A wiki page has been created that contains the GPII Developer Space Component Listing, which collects components for the upcoming Developer Space repository and gives an idea of the type of resources that will be made available there. The implementers will decide on and choose the tools and resources they believe will be of greatest use for their work. They will evaluate the DSpace and the resources available in it. Figure 1 depicts the core elements of the evaluation, offering a macro-perspective of the framework without dependencies. Implementers will use SP2 resources for making applications and services accessible (e.g. SP3 implementers), while end-users and other stakeholder groups will access the user-facing tools and the improved and enhanced applications and services (e.g. from SP3). Testing scenarios will be based on analyses carried out in SP1 in order to define the demand and supply chains characterising the Prosperity4All ecosystem. Finally, the impact assessment will be conducted for the progressively built-up ecosystem.

This document is the basic instrument for creating the later testing plans for each evaluation phase to be conducted within the lifespan of the project (D402.1 for tests with implementers, D403.1 for testing with end‐users, and D404.1 for testing the Prosperity4All platform). The testing plans will provide complete and detailed research protocols with defined steps and procedures for executing testing with implementers and end‐users and will include specific application scenarios for the actors addressed in each phase. They will follow the rationale and methodology established in the evaluation framework and will draw upon the initial objectives and techniques presented in this deliverable.


Figure 1: The core elements of the Prosperity4All evaluation framework

The importance of the evaluation framework lies not only in its content but also in the theoretical stance adopted in order to answer the research questions that will be shaped in the evaluation plans. Unavoidably, strict testing or traditional usability models cannot be adopted as they stand; they can only be adapted to the main objectives of the project. Traditional usability testing and its adaptations, such as Rapid Iterative Testing and Evaluation (RITE) [5], are conducted when developing an application over several iterations, starting with a mock‐up and ending with the final application/service/etc. For example, a typical usability benchmarking study is based on quantitative usability metrics in order to produce comparable results. Usability is a high level objective mainly for testing with implementers (WP402), where implementers will evaluate the SP2 resources to be developed within the framework of the project and we are interested in finding out whether they are able to use them (related to evaluation question 1, see below). The situation is different for end‐users, where the apps/services/etc. are already existing products; the usability testing is therefore not traditional, in the sense that it does not aim to reveal major usability problems but rather to establish whether the improvements make the apps/services/etc. more accessible and useful for the end‐user. Thus, the aim is not simply to bridge the gap in the mental models between developers, users, and evaluators (system causality conveyance; [6]).

Drawing upon existing work in creating frameworks or plans for evaluation, the following questions were identified as important aspects not only for the evaluation but also for the meta‐evaluation (i.e. the process of deriving the lessons learnt after each evaluation phase). A “five whys” approach [7] was initially adopted for sketching the framework depicted in Figure 2 and for gathering the information needed to get the complete picture. Such questions are used less often in research than in policy making and are posed in a direct way (Who is it about? What will happen? When will it take place? Where will it take place? Why will it happen?). The initial set of questions was discussed among the SP4 partners and the management teams, and the evaluation questions were the outcome of this approach and of those discussions. The key driving questions we are trying to answer with the evaluation framework are the following:

1. Are the SP2 tools/resources for Developers (DSpace and all of the frameworks, components, marketing tools, etc.) usable by and useful to developers/implementers (both SP3 developers and external developers, implementers)?

a. i.e. do SP2 tools/resources help SP3 developers?

i. ..to make their SP3 applications more useful or usable by end‐users? OR

ii. ..to decrease cost to develop? OR

iii. ..to increase market size/share? OR

iv. ..to increase profits?

2. Are the SP2 tools/resources for end‐users (Unified Listing, Marketplace, Consumer‐developer connection, Assistance on Demand (AoD) service) usable and accepted by and useful to end‐users?

3. Are the SP3 applications more accessible for end‐users after they have been enhanced with SP2 tools/resources?


4. Does the existence of the GPII, and more specifically of the Prosperity4All components of the GPII, enhance or facilitate the evolution of the accessibility ecosystem, such that it does one or more of the following:

a. Is it able to reach more people

i. in total,

ii. geographically,

iii. economically?

b. Is it able to better serve the tails and tails‐of‐the‐tails?

c. Does it reduce costs to consumers, developers, and/or governments?

5. What context, content, and other related factors (e.g. cultural) might affect the evaluations to be carried out within the project?

The usability of applications for end‐users is outside the scope of P4All, as long as these applications and services have not been built with P4All tools and technologies, are already existing products, and have been evaluated for their usability in the past. However, if certain elements and functionalities of an application change within the project lifetime to a degree that might affect its usability, then this could be of importance and will be highlighted according to the needs of each iteration phase. There are, though, attributes usually related to usability that will be of interest to study, such as efficiency and performance. When it comes to end‐users, the priority is to investigate these attributes primarily as part of accessibility testing.

These questions relate not only to the evaluation but also to the scope of the project. Undoubtedly, more questions will arise as the project evolves, and they will be reflected in the forthcoming evaluation deliverables.

Finally, complexity and diversity will characterize all steps of the development work and are reflected in the evaluation framework. On one hand, the layers of complexity are the following:

Different SP2 tools/resources will be used by different SP3 applications and services (although the same SP2 tools/resources might also be used for different SP3 applications and services);

Different levels of fidelity will be available at each evaluation phase for different implementations (for example, use or integration of SP2 tools/resources might prove to be easier for certain applications or services than for others);

Different end‐user and stakeholder groups will benefit from different SP2 user‐facing tools and SP3 applications and services.

On the other hand, the layers of diversity are the following:


Variety in professional backgrounds of developers, internal and external implementers and thus in needs, occupational habits, knowledge of accessibility;

Wide areas of applications and services (from educational to health and business);

Cultural diversity of end‐users, developers, and internal and external implementers;

The multi‐faceted full scale impact assessment of the Prosperity4All ecosystem in real life conditions will incorporate all aspects tested in previous evaluation phases and address different and numerous additional factors and aspects. The impact assessment has many facets as there are at least three actor categories, with at least three main categories of DSpace resources and SP2 tools for applications and services for 5 different application areas (e.g. health, education). The exact methodologies will be defined in WP404 deliverables (D404.1 & D404.2 due M12).

At this stage, it is not possible to map the use of SP2 tools/resources to SP3 and external applications because it depends on the implementers who will interact with Prosperity4All. The key to an efficient framework is to consider the outcome of the interactions between diversity and complexity, and their products and by‐products, at the various stages of development work, verification, and, subsequently, evaluation. This is the main reason a mitigation strategy was developed and added in chapter 0, acting as a pro‐active monitoring system for scheduling and assessing the evaluation efforts within the lifespan of the project, with the incorporation of a meta‐evaluation assessment after the end of each evaluation phase. Partners involved in each evaluation phase (either with implementers or end‐users) will be able to follow the mitigation strategy in order to solve any problems arising during the evaluation phase. The mitigation steps taken can also be used to ensure that the same problems do not appear in the next evaluation phase. In addition, the meta‐evaluation of the evaluation process, apart from revealing any issues, delays, and other problems with the conduct of sessions, shall aim to improve or refine the evaluation materials applied (in case these or similar ones are used in the next evaluation phase).

The deliverable comprises two parts: a) the main body and b) the annexes, which are available on the wiki. The evaluation framework is presented in the main body of the document, and the evaluation methods, techniques and materials have been annexed.

2 Methodology

During the last decade, the emergence of complex and mixed approaches in development and design has led to the adoption of equally mixed conceptual evaluation frameworks in practice. The Prosperity4All methodological repertoire includes the use of both qualitative and quantitative methods. Such practice, however, needs to be grounded in a human‐centred approach that can meaningfully guide the design and implementation of mixed‐method evaluations, as is the case for the iteration phases carried out in this project.

The framework was developed from existing literature on user‐centred methodologies and then adapted accordingly (i.e. to the work that will be carried out within the framework of the project). As there are many aspects to be considered and addressed in the framework, a schematic representation is presented early in the deliverable in order to guide the reader. Two other schematics are relevant to this depiction of the framework: the core elements of the framework (Figure 1) and the diagram presenting the interdependencies with other SPs within the project (Figure 4). As is evident from both aforementioned figures, evaluations with implementers (WP402) will be carried out to evaluate the use of DSpace resources and SP2 tools by internal and external implementers, and evaluations with end‐users (WP403) will be conducted for SP2 user‐facing tools and SP3 applications/services/etc. (improved by the implementers).

The chapters in this deliverable reflect the methodology followed to develop the evaluation framework. The evaluation questions (Chapter 1) were the basis for creating the three logical models (implementers, end users, impact assessment; Chapter 6.2) in order to be able to answer these questions and measure the Key Performance Indicators (KPIs) (section 5.1) of the project relevant to evaluation. As shown below, the identification of actors and their relations, as defined in SP1, will clearly affect many processes within evaluation, including recruitment. The methodology considers the connections between the different SPs with central focus on the evaluation activities. Not all direct connections are depicted because doing so would considerably increase the complexity of the diagram. However, the methodology follows a logical and necessary rationale. Each SP provides different content to the framework in order for the evaluation to be carried out. The questions reveal what is required from each SP and what the evaluation will give back to these SPs (i.e. evaluation outcomes will be communicated to development teams to improve their work). The methodology followed for the framework is to identify which elements are required in order to design and carry out the evaluation, how these elements are connected, and how these connections can be revealed and studied and their outcomes interpreted. The identified elements are shown in Figure 2 and are essential for answering the evaluation questions. Prosperity4All aims to create an ecosystem, and an illustration using concentric circles, as is usually encountered in user‐centred and user experience designs and evaluations, does not suffice for describing the ecosystem; therefore another representation was adopted.

The document is divided in two parts: a) the evaluation framework and b) a compilation of evaluation materials and the elaborated evaluation tables, which will provide guidance for the conduct of each evaluation type. The first part is discussed in this document, which is the main body of the deliverable. The second part is made available online, as it will serve as a direct reference for partners directly (acting as pilot sites) and indirectly involved in the evaluations carried out. The second part includes elaborated tables, indicators, methods and measurements per evaluation type and relevant materials, including any ethics related issues to be taken into consideration in line with the project’s ethics policy (D502.2: ‘Ethics Manual’).

Figure 2: The connections in the Prosperity4All evaluation methodology

3 Upper level description of Prosperity4All ecosystem

The idea is to introduce a holistic approach to inclusive design that will attract new contributors by applying crowdsourcing and gamification principles in order to create a service‐based infrastructure for promoting a prosperity‐based ecosystem. This is aimed to be a complex two‐sided network (i.e. end‐users and implementers) or a multi‐sided platform (i.e. multiple sides for end‐users, implementers, other stakeholder groups). Within the ecosystem, it is anticipated that consumers can be also producers.

Detailed demand‐supply transaction models will be developed (WP102) accommodating for factors that lead to greatest impact. These models will include all stages of the chain, from looking for a supplier to including third party funders and professionals from diverse but relevant backgrounds. An analogy to a party might be adopted in order to make it easier to understand what will be the process for identifying the value for stakeholder groups in the ecosystem to be created.

Who do we invite to this party?

How do we invite them?

How do we make them see this is something they want?

Who else do they expect to meet there?

What is going to happen and how?

Why is it going to be better than other attempts?


These questions were identified by SP1 partners and will drive the journey mapping for the buyer and the seller when interacting with the ecosystem, informing the final impact assessment. Journey mapping refers to the journey of users from the moment they join the platform until they complete a transaction or exit. This final aspect of modelling will be performed within SP1 and will be reflected in the impact assessment.

The first step is to identify the actors that have interest and potential to interact with the ecosystem. These are the actors that need to participate in the evaluation of the ecosystem, within but also outside the project duration (Figure 3). They are categorized in three groups based on their functional roles, although an actor can be more than one type of player and play more than one role at different times. In our party analogy, if these are the people joining the party then it is assumed that there is always a host (i.e. platform and operators) and guests can interact with the host and with other guests. An early categorization of the actors interacting with the Prosperity4All ecosystem is based on their functional role (see also Figure 3):

A. Actors – Direct impact

Producers

Consumers

Matchers

B. Re‐actors – measurement (“consuming” info/data)

Interested in success criteria

Monitoring the process

Funders/investors

Policy‐makers

Evaluating and marketing

C. Platform supporters

Owners/operators of platform

SP2 developers

Re‐actors (B) are organizations or individuals who will consume any resulting information or outcomes of the interactions of users with the platform. There are three components of the value proposition for any functional user group presented above: a) ethics, e.g. for advocates of inclusive design, b) law, e.g. companies forced by legislation and wanting to avoid fines, and c) economics, e.g. financial gain, exchange for small and large companies (i.e. companies required by law to offer accessible products). In addition, the main agenda for the demand‐supply modelling will probably be based on the unmet needs of involved populations not currently addressed by the existing technologies and markets. This work is currently underway and the categories and categorisations presented in this deliverable are initial proposals that will be further elaborated as important SP1 outcomes to guide evaluation, dissemination, and exploitation activities. The final connection of actors, models of exchange (SP1) and impact assessment via real life testing will be depicted in a user’s journey map considering both the scenarios of interactions and the actors interacting with the ecosystem. In other words, the final outcome will be addressed in impact assessment and it will be depicted with the journey map of groups of users from real‐life testing.

Users can be categorised in three functional groups based on the role they might play in Prosperity4All and the nature of their interaction with the ecosystem. Their roles interact and in some cases overlap, so their relationships are depicted with a Venn diagram (Figure 3). The empty spaces between the Venn diagrams refer to the interchangeable roles the actors might play within the Prosperity4All ecosystem (e.g. a teacher can also be a blind user looking for accessible products). Journey mapping for end‐users will therefore be fruitful for marketing and business strategies as well as for the impact assessment. We thus move forward from the traditional developer, end‐user, and stakeholder categorisation towards an interactive and reciprocal model for defining our actors. Realising this is important for the sustainability of the ecosystem to be created within the lifetime of the project and after the end of it.
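The overlap of roles described here can be sketched as a small data model. This is only an illustration in Python: the role labels are taken from the categorisation above, while the `Group` enum, the `ROLE_GROUP` mapping and the `groups_for` helper are hypothetical names introduced for the example and are not part of the project's tooling.

```python
from enum import Enum


class Group(Enum):
    ACTOR = "A"                # direct impact
    REACTOR = "B"              # "consuming" info/data
    PLATFORM_SUPPORTER = "C"


# Hypothetical mapping of some roles from the categorisation to their group.
ROLE_GROUP = {
    "producer": Group.ACTOR,
    "consumer": Group.ACTOR,
    "matcher": Group.ACTOR,
    "funder/investor": Group.REACTOR,
    "policy-maker": Group.REACTOR,
    "platform owner/operator": Group.PLATFORM_SUPPORTER,
    "SP2 developer": Group.PLATFORM_SUPPORTER,
}


def groups_for(roles):
    """Return the set of functional groups covered by one person's roles."""
    return {ROLE_GROUP[role] for role in roles}


# One person can hold several roles, within a group or across groups.
prosumer = groups_for({"consumer", "producer"})
consumer_and_policy_maker = groups_for({"consumer", "policy-maker"})
```

Representing roles as a set per person, rather than as a single label, keeps the interchangeable roles visible when recruitment or journey-mapping data are analysed.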

As this deliverable will be submitted before SP1 finalises the list of actors and the models to be implemented in the ecosystem, the work presented in this chapter and, subsequently, the related personas and application scenarios (sections 6.2.5.1, 6.2.6.1, and 6.3.2) are to be amended accordingly. This includes “the people who will join the party and how we will address/attract them”. An updated version of this deliverable will be submitted in 12 months, when the respective SP1 documents have been finalised and are available. The related information will then be revisited in the analytical testing and impact assessment protocols of D402.1, D403.1, D404.1 & D404.2, and the evaluation framework will be updated accordingly.

Figure 3: Functional roles for key actors


4 Evaluation interdependencies in the context of Prosperity4All & GPII

Successful evaluation relies on early identification of any interdependencies amongst different parts of the project (development work, evaluation, and exploitation activities). The evaluation framework is designed with these interdependencies in mind, which is why they are discussed early in the deliverable. Whatever their role and origin might be, they need to be examined as they stand but also with regard to their relations, connections, and impacts on all parts of the project. Addressing them within the framework ensures the following:

Selecting success indicators representative of the work carried out within the project;

Avoiding the pitfall of excluding valuable information that is not easily retrieved when dependencies are not revealed;

Effectively communicating methods, techniques and results between work teams, in the right direction and in due time;

Creating viable and sustainable links with other research efforts with similar directions (i.e. Cloud4All and GPII, as shown in Figure 5).

These interdependencies are present on an inner‐project level. The level of interaction with work outside Prosperity4All starts from the overall interdependency of Cloud4All and Prosperity4All in the GPII context, which is evident in Figure 5.

While Cloud4All focuses mostly on creating the infrastructure required for the most effective possible exploitation of user preference profiles in the accessibility world, ensuring that user needs and preferences regarding accessibility are met in a transferable, sustainable way, Prosperity4All takes a step forward towards easier development of tools for users with accessibility needs. It mainly addresses developers and the business world and, subsequently, the end‐users, who will benefit from high level accessibility products and services. With GPII as their common ground, both projects use new solutions, AT, and mainstream devices and services. Prosperity4All specifically aims to provide an online market offering customised solutions to end‐users with various accessibility needs and preferences, to developers involved in accessibility (or wishing to get involved), and to companies and enterprises creating ATs, so that they can communicate through a real, living ecosystem with benefits for all involved stakeholders.

As illustrated in Figure 5, Cloud4All is focused on creating the infrastructure required for the “auto‐personalization from preferences” (APfP) capability of GPII. Prosperity4All focuses on developing the infrastructure to allow a new ecosystem for accessibility developers to grow; one that is based on self‐rewarding collaboration, that can reduce redundant development, lower costs, increase market reach and penetration internationally, and create the robust cross‐platform spectrum of mainstream and assistive technology based access solutions required. This will be done through “a process based on true value propositions for all stakeholders and resulting in a system that can profitably serve markets as small as one, at a personally and societally affordable cost”.

[Figure 4 diagram labels: SP1: Economic model; SP2: Tools & Infrastructure; SP3: Implementations; SP4: Evaluation (WP1: Evaluation framework & technical validation; WP2: Evaluation with implementers; WP3: Evaluation with end-users; WP4: Assessment of P4A platform; WP5: Demonstration); SP5: Horizontal Actions; Training & Dissemination Activities]

• SP1 will orient the evaluation framework of SP4 WP1 through the supply chains and models.

• SP1 will also continuously interact with SP4 WP4, since it will define the evaluation indicators (among others) for the impact assessment of the integrated infrastructure.

• SP2 tools & infrastructure will be tested by implementers in SP4 WP2, who will give feedback to SP2 developers for optimisation.

• SP2 user facing tools, specifically, will be tested by end-users in SP4 WP3, who will give feedback to SP2 developers for optimisation.

• SP3 implementations will be tested by end users in SP4 WP3, who will give feedback to SP3 implementers for optimisation.

• SP2 tools and SP3 implementations will be demonstrated in SP4 WP5. Through demonstration, feedback for optimisation may emerge.

• SP4 interacts with SP5, mainly in terms of dissemination and training activities.

Figure 4: SP4 Evaluation interdependencies with other Prosperity4All SPs

Two of the major connection points between C4All and Prosperity4All are: 1) the tools for adding APfP capability to products (the tools will be part of the DeveloperSpace) and 2) the GPII Unified Listing/Marketplace which will be a key resource to developers as well as consumers.

Figure 5: Interdependency between C4All and Prosperity4All

External implementers:

will be attracted through SP5 activities;

will receive guidance and assistance from SP3 teams;

will use the SP2 development work and resources;

will have their work evaluated by end‐users (SP4 evaluation activities).


5 Evaluation in the context of Prosperity4All

5.1 The Prosperity4All Key Performance Indicators (KPIs)

The fulfilment of the evaluation goals can be measured by the application of specific indicators.

The overall project’s key performance indicators (KPIs) constitute the success criteria for the evaluations. As shown below (Table 1), most KPIs will be directly measured/proved through the evaluation outcomes. However some KPIs are also related to exploitation and other activities of the project (complete table in Annex A). Below, it is explicitly clarified where (i.e. which SP, WP, etc.) and how (i.e. through which evaluation processes of the framework as described in chapter 0 and shown in Annex B) these success indicators will be measured. The performance indicators refer to the full duration of the project, which means that the outcomes of the most mature evaluation phase for each type of users will be taken into account. For example, if there are two evaluation phases for an SP2 user facing tool with end‐users, then the outcomes of the second phase, when the tool will have undergone an optimization phase and will be most mature, will be used in the assessment against certain indicators.
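The rule that only the most mature evaluation phase counts towards a KPI can be sketched as follows. This is a minimal illustration in Python, assuming a simple record per tool and phase; the `PhaseResult` layout, the tool identifiers and the sample scores are hypothetical and are not project data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseResult:
    tool: str     # hypothetical tool identifier
    phase: int    # evaluation phase number (higher means more mature)
    score: float  # mean rating on the 0-5 scale


def most_mature_results(results):
    """Keep, for each tool, only the result of its latest evaluation phase."""
    latest = {}
    for result in results:
        current = latest.get(result.tool)
        if current is None or result.phase > current.phase:
            latest[result.tool] = result
    return latest


# Illustrative values only.
results = [
    PhaseResult("tool_a", 1, 2.8),
    PhaseResult("tool_a", 2, 3.6),
    PhaseResult("tool_b", 1, 3.1),
]
final = most_mature_results(results)
```

Here `tool_a` is assessed on its second-phase score, after optimisation, while `tool_b`, which was tested only once, is assessed on its single result.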

These success criteria, in order to serve as intermediate success milestones, will be instantiated per phase in the context of the respective evaluation plans. When an indicator is general, i.e. not referring to a specific tool/application, it is valid for each tool/application of the same cluster (e.g. for each SP2 User Facing Tool that will be tested with end‐users). The KPIs are connected to the evaluation questions, at least those indicators that will be measured during evaluation activities.

The following table is a summary of the complete table (Annex A), which is based on the original table included in the DoW (Table 1, p.23). This summary includes only evaluation related KPIs, with a brief statement for each KPI connected to the evaluation questions to be answered within this framework. There are other KPIs related to dissemination and management activities that are not addressed in this deliverable. The complete table contains detailed recommendations of methods and techniques to be used for measurement, together with complete descriptions. Evaluation is an exit‐point for numerous KPIs but not for all of them. Some KPIs are measured in other SPs, and this is important for the meta‐evaluation of the work performed within the project. It ensures the harmonization of the efforts carried out towards the deployment of a fully exploitable platform, because the project has many exit points and not just the evaluation activities. Numbering in the following table is not sequential; it reflects the number of each Prosperity4All vision objective in the original DoW table. The table was refined as a result of the collaboration between the members of the SP4 team, the project management teams and the partners involved in the evaluation process. Communication and exchange of ideas were established via regular management and SP4 audio conference meetings, especially during the first period of the life of the project, when the vision and the main strategy were set. Finally, these indicators have been mapped to the respective places in the tables of Annexes B.1, B.2, and B.5 (last column – “Success targets/thresholds”). Each measuring technique is connected to specific high level objectives.

Table 1: Key Performance Indicators (KPIs) addressed in SP4 evaluation: Overview and summary

Prosperity4All vision

KPI Related Evaluation Question(s) (p. 5)

Relevant SP/WP Measuring technique/tools

1) Reduce costs Reduction of time for experienced and non‐experienced developers Efficient use of SP2 user‐facing tools and/or SP3 apps/services (without additional costs)

1aii 1ai

WP402 WP403

Qualitative assessment via interviews and questionnaire feedback forms Related to Efficiency and indirectly to Usability (as a positive effect to output quality; reduce time and costs)

Positive Willingness to Have( WtH) and Pay (WtP)

1aiii‐iv WP404 Online feedback forms (part of impact assessment)

2) Address the full range of users

Year 3: User facing tools usability & user acceptance >3 (0‐5) Year 4: User facing tools usability & user acceptance >3.5 (0‐5) Identify main users per development (not existing in the KPI table in DoW)

2 WP403 Usability (e.g. SUS and task‐based usability evaluations), accessibility, and User acceptance scale (e.g. TAM) standardized scales

T401.2: Recruitment – ensure diversity by exploitation of Prosperity4All Collaborative Network Define main user groups per development

3) Address the tails and the tails’ tails

Developers show interest in developing/designing for smaller markets

4b WP402, WP404 User acceptance questionnaires (e.g. TAM), open‐questions, feedback mechanisms for platform (e.g. online feedback tools)

4) Address all technologies

Successful use of SP2 tools/resources/ resources by implementers User acceptance of SP3 implementations increase 1 unit (0‐5)

1 WP402 WP403

Interviews & Utility and Usability metrics (aspects of utility are inherent in usability) User acceptance (e.g. TAM)





for end‐users Ubiquity (Implementation objective: use in different devices, platforms and Operating Systems) (not included in the KP table in DoW)

SP2, SP3

Addressed within implementation (SP2 and SP3)

5) Provide a plan mechanism for creating a vibrant, profitable, assistive technology market

Business models (SP1) and stakeholders report that SP1 market mechanisms address their needs

4 SP1 and WP404 related to SP5

Quality of life, cost‐effectiveness, and organizational quality are valid measuring indicators for impact assessment and will be assessed via online feedback forms These quality attributes will be enriched as soon as business models within SP1 will be finalised

6) Decrease costs and expertise required of mainstream companies

Mainstream companies embracing the Prosperity4All ecosystem and using our tools.

1aiii/4c WP402 for implementers, WP404 and Exploitation related

Online feedback forms (rate of new comers)

8) Involve consumer and consumer expertise in product development

Consumers per site involved in product development design.

4a/5 WP402 WP404 for stakeholders and related to exploitation and dissemination

Interview developers: users involvement in implementation process (easy to involve, level of involvement, number and type of users involved) Usage/popularity, globalization/diversity of consumers communicating with developers when the ecosystem is out; can be measured in impact assessment ( as an outcome of exploitation/dissemination activities)

10) Recruit and engage more players

Number of unsubsidized external developers and stakeholders engaged in the

4a/5 In the context of T401.2 “Participants recruitment” as it seems to be

This is basically a success criterion of the recruitment and dissemination processes Recruitment utilises the





project>10 (each category)

criterion related to recruitment process rather than testing and could be reinforced by activities within SP5 WP402 in collaboration with SP2 and SP3 for implementers

Prosperity4All Collaboration Network to ensure this KPI is met For stakeholders: feedback monitoring tools (i.e. popularity/usage indicator) Interviews and questionnaires about revealing factors (indirect relation to KPI) about easiness, comfort, and successful utilsation of motivational techniques (e.g. gamification)

11) Not forget documents, media, and services

Number and Quality of WP204 results on Media and Material Automated/Crowd sourced Transformation Infrastructures and extent of usage of enhanced features of these tools

2/3 In the context of WP204 and WP404

Assessed in impact assessment with indicators such as: popularity/usage, conversion rates (visits with actions/visits no actions), number of downloads/access, average potential revenue per visit (for WP404)

12) Provide both technology and human accessibility service support

AoD ratings>3 (0‐5) for end‐users Ability to use (75% of users)

3 WP403 User acceptance scores (e.g. TAM) compared to reference case & user experience metrics (e.g. questionnaires, interviews, service diaries, etc.) Accessibility, utility, and general user experience qualitative and quantitative data

13) Work across all domains of life

External implementation included in the project Improved SP3 apps/services increase of user acceptance 1 unit (0‐5)

1ai/3 WP402, SP3, SP3, SP5

Records of number and types of external implementers interested in the project (i.e. interviews, communication reports, etc.). User acceptance ratings (e.g. TAM) and user experience metrics (diaries, questionnaires, etc.)

14) Be applicable and work internationally

External implementers participating in evaluations (from at least 8 countries including 2 non‐European)

1/4a/5 In the context of WP402, assisted by SP3 and accommodated by dissemination and exploitation WPs

Effective utilisation of networks already established (i.e. various entities have signed letters of collaboration) will be defined in recruitment protocols for evaluation phases (discussed in Chapter 8). This is a success criterion also for dissemination and collaboration activities within the project in order to attract external implementers to participate in evaluation among other project activities

Stakeholders participating in evaluations (from at least 8 countries including 2 non‐European)

4/5 In the context of WP404, accommodated by T502.3 End users’ and Stakeholder Connections

Relevant success indicators could be diversity, popularity as an impact assessment indicator. It is a prime dissemination target to attract stakeholders and engage them in any project activities, including evaluation.

5.2 Steps in creating the evaluation framework

The evaluation framework involves a combination of diverse end‐users groups with interchangeable roles at various parts of the interaction process. Before moving on to defining the characteristics of the Prosperity4All framework, it is necessary to outline the main characteristics of the actual evaluation components and steps. The research protocols of each evaluation plan will include the necessary evaluation instruments for testing to take place (e.g. specific scenarios/tasks, research questions and hypotheses, procedures, etc.).

The evaluation framework is grounded in a series of iterations across pilot sites and, as already stated, falls into the Human‐Centered family of evaluation methodologies.

‐Who are our target groups? Target groups are the list of actors presented in Figure 3 and will be finalized within the activities of WP101. There are two major categories: a) implementers, i.e. developers and designers who will use the SP2 resources, and b) end‐users, who will use the SP3 applications and services and the user‐facing SP2 applications and services. In addition, c) other interested parties will be considered mainly for the impact assessment.

‐What do we aim to measure? The initial evaluation questions (chapter 1.2) set the frame for the evaluation, and then the project’s KPIs (5.1) set the objectives within this framework. In a nutshell, evaluation activities aim to measure the utility of SP2 for making accessible applications/services for implementations and how accessible these implementations are for end‐user groups. In addition, the impact of the deployment of the Prosperity4All ecosystem on the accessibility market will be assessed. Therefore, the whole P4All ecosystem is addressed by the impact assessment and not by the separate evaluation phases, which address the specific groups of actors. The whole is not the sum of its parts.

‐What do we want to end up with? The final outcome is a flexible framework for conducting pilots across the different pilot sites but also for the assessment that will be carried out by the implementation teams (mainly at their own sites) at each iteration phase. Evaluation will be conducted for the work carried out within the project (for developers, implementers, end‐users, and stakeholders). The framework also addresses the rationale for further impact analysis to be carried out (consistent process to derive necessary methods and metrics as evident in the table of Annex B.5).

‐How do we know that our measures are both valid and reliable? Measures that are included are based on literature searches and state‐of‐the‐art techniques, standardized questionnaires and appropriate indicators. A valid measure is one that measures what it is supposed / expected to measure (e.g. selecting a standardized questionnaire will ensure a valid measurement). A reliable measure is one that will provide the same outcome every time it is applied (i.e. selection has been based on current literature and similar work performed, wherever available). A compilation of measures already used in the literature is provided in Annex B. Some of them are already standardized questionnaires; others are not. There will also be instances where broader and larger effects are measured and standardized questionnaires do not make sense or are not applicable. In these cases, there will be no investigation of validity and reliability; wherever it makes sense to use standardized instruments, they will be preferred. Data gathered from distribution and use of standardized instruments might prove useful for the impact assessment measurements.

‐Is it flexible enough to accommodate different pilot sites, diverse markets, and reciprocal end‐user roles? The flexibility is ensured by offering a diversified selection of methods and measures without restricting the choice to specific ones (i.e. solely traditional ones like the System Usability Scale (SUS) [8] and the Software Usability Measurement Inventory (SUMI) [9], or solely innovative ones created within the project). Selected indicators are depicted in Figure 7, Figure 11 and Figure 12 for each part of the evaluation framework (evaluation with implementers, end‐users, and the P4All ecosystem impact assessment, respectively) and the respective tables (Annexes B.1, B.2, and B.5). The dynamic and multi‐level nature of evaluation within Prosperity4All is a unique opportunity to assess aspects not present in other research efforts (increased variety in presented methods and metrics as evident in Annex B).

‐Does the framework provide information for statistical analysis and data processing? The framework will provide generic information for statistical analysis and data processing. Any qualitative differences between iterations will be interesting to investigate. Nevertheless, guidance for selection of appropriate statistics, either descriptive or inferential, will be provided when tests and questionnaires are selected, as is expected to be the case for the separate evaluation plans. The calculation of confidence intervals, e.g. for usability ratings, will indicate the precision and location of measurements, especially with reference to standardized measurable aspects (Chapter 13).
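To illustrate what such a confidence interval conveys, the following Python sketch computes an approximate 95% interval for the mean of a set of usability ratings. The ratings are invented for illustration, the function name `confidence_interval` is hypothetical, and a normal approximation is assumed; for the small samples typical of usability tests a t‐based interval would be somewhat wider and is usually preferable.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(ratings, level=0.95):
    """Approximate confidence interval for the mean rating.

    Assumption: normal approximation (z-interval), not a t-interval.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two ratings are needed")
    centre = mean(ratings)                     # location of the measurement
    sem = stdev(ratings) / sqrt(n)             # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return centre - z * sem, centre + z * sem


# Hypothetical ratings on a 0-5 scale from ten participants.
ratings = [3, 4, 4, 5, 3, 4, 2, 5, 4, 4]
low, high = confidence_interval(ratings)
print(f"mean = {mean(ratings):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

The centre of the interval gives the location of the measurement and its width gives the precision; a narrower interval in a later iteration would suggest a more stable estimate, provided the same instrument is used.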

‐Does it have a flexible data analysis plan? The data analysis plan will be prepared for each separate evaluation phase based on the backbone provided by the framework and therefore will be provided separately for each evaluation plan. Separate planning templates will be distributed to test site managers with regards to meta‐data descriptions, test execution, deviations from original planning and justifications, and other relevant issues. The framework is flexible as it targets areas or groups of indicators and metrics and not specific instruments. This was purposefully chosen in order to guide the evaluation but not restrict any future additions.

‐Does it offer alternatives in case of recruitment problems? A separate chapter dedicated to recruitment exists in the evaluation framework considering the networks identified in the technical annex and existing test site collaborations (e.g. previous projects) in order to avoid any hindrances. Suggestions for applying relevant recruitment techniques are provided (Chapter 8).

‐Does it accommodate for real aspects of the ICT evaluation within testing iterations? In each iteration phase, techniques and tests will be specific to the addressed user group. For example, evaluation of use of code (SP2) for making SP3 applications accessible will not be conducted with usability tests but rather with participatory methods such as heuristic evaluations and application of the cognitive dimensions methodology (Annex B.1).

‐Do we identify key actors for strengthening and stabilizing the feedback loop? Key actors in the feedback loop (i.e. feeding back the findings of the evaluations into the development process of SP2 and SP3) will be the evaluation working group including partners from SP2 and SP3 that are directly involved in the development work as discussed in the chapter dedicated to work allocation (Chapter 14). In addition, SP1 will provide the basis and reference case for the impact assessment.


‐How concrete and stable can the framework be at each stage of development? The evaluation framework provides the foundation for the evaluation by identifying specific evaluation objectives and areas, facilitating the definition and design of its dimensions (i.e. validation, evaluation phases, and impact assessment). Each evaluation plan is presented with no elaboration and details as there will be separate deliverables dedicated to these plans and it is not feasible to prepare them at this stage of the project.

‐Can we allow for changes? On one hand, the framework is flexible as it does not include specific procedures for testing. Each evaluation phase testing plan will be adapted based on the development work carried out in the project and the specific tools/applications available at each phase. Therefore, if changes are made during the development phase, they will be reflected in the testing plans. On the other hand, it provides the materials that would prove more appropriate, useful and efficient for any evaluation within the project, without neglecting to consider business‐oriented aspects as they are quite apparent in both the objectives and the impact assessment aims of the Prosperity4All ecosystem.

‐Have we thought about communication breakdowns and how to avoid/fix them? Communication among the SP4 task team has been established since the first months of the project with separate virtual meetings with developers, implementers and test sites. Feedback was collected from involved partners on how they envisaged evaluation at the beginning of the project. However, this was an early attempt; each group has a different idea and role in the project, and when people are brought together at such an early stage their visions of the work to be performed might not be the same. Efforts to harmonize these visions and bring them closer ensure that all ideas are well presented and represented from all five cornerstones (SPs) of the project.

This checklist is far from exhaustive but illustrates the top‐down thinking required by all involved partners (developers, implementers, evaluators, pilot site managers; what we want to improve/enhance) and bottom‐up processes (analysts, evaluation planners, the framework itself, resources’ availability; what we want to measure). This list of initial questions and answers is considered when building the framework and will be elaborated further in later evaluation plans.

This document compiles relevant and appropriate methods and techniques available and agreed by project partners to be used in the iteration phases to follow. To facilitate the selection of appropriate instruments even further, Annex B includes measuring instruments – including questionnaires and techniques – that are related to objectives set in the framework for carrying out testing with implementers and end‐users and probably even for conducting impact assessment analysis (i.e. web analytics for gathering large datasets over longer periods of time). Some of the attributes of interest are presented in the respective logical models (Chapter 6), but more that might be used in the evaluation plans and could be useful for the iterations to follow are mentioned in Annexes B.1, B.2, and B.5. Thus, Annex B will serve as a quick reference compilation of instruments to be used in tests that will be carried out with different user groups. This compilation is neither exhaustive nor restrictive and it will be enriched further for each evaluation plan (i.e. each evaluation plan will include an Annex with relevant materials). Test sites will use these and other appropriate instruments in evaluation phases depending on evaluation needs arising at each level and each site. Each evaluation plan will exploit Annex B and expand it within the respective deliverable.

5.3 Iterative phases

As human‐centered design is a proven prerequisite for a successful outcome (in all aspects), the different types of evaluation need to be placed in a formative evaluation context, so as to prove valuable for the implementation process.

Next to formative assessment, however, summative evaluation also needs to take place. While summative evaluation tells you how usable a system is, formative evaluation tells you what is not usable about it.

The aims of the three aspects of evaluation within the project are the following:

1. Technical validation: ensure that the system and its components are technically robust, identify what is not robust enough and strengthen it.

2. Iterations: find out what all relevant stakeholders think of the ecosystem (and its different parts) and what they believe is not good about it.

3. Impact assessment: find out how prosperous the ecosystem is (or is not) and how it could become more prosperous.

Evaluation in Prosperity4All will take place across four distinct evaluation phases. The feedback collected and consolidated in each phase will be used for optimization before the next evaluation round begins (formative) whereas, in each phase, the current outcomes will be assessed. The evaluation framework aims to set a standard basis for all iterative evaluation phases, to allow the collection of comparable results (across phases), but also to ensure that all three above aspects will be sufficiently covered.


Figure 6: Iterative phases

It is essential to pay attention to several issues stemming from the following diagram:

1. Gradual implementation of the Prosperity4All system: The Prosperity4All ecosystem will not be built at once. As such, we will not be in a position to evaluate the whole of it, meaning the multi‐sided platform, from the beginning. Inevitably, we have to evaluate the building components of the system progressively, across the different evaluation phases. This also implies that not all possible interactions between all potential actors that will participate in the ecosystem will be present. In reality, before the 3rd evaluation phase, the actors that will be matched in the available versions of the tools/resources will come from limited categories (see above) and will not be representative of the openness and wholeness of the ecosystem. In the third evaluation, however, the progressively built ecosystem will open towards actors external to the project (although a very first start will have been made in the 2nd phase), and this will allow a more “real‐life” approximation, which will also make the impact assessment feasible for the first time; it is not meaningful earlier. Table 2 presents in short how the ecosystem will look before each evaluation phase, who the actors present will be, what the expected level of interactions among them is and which type of evaluation is applicable on that basis.

2. Horizontal technical validation: Technical validation will always be performed at least 2 months before each iteration with users (whoever these users may be, i.e. developers, implementers, end‐users, other stakeholders, etc.), in order to allow adequate time for debugging and correction of significant technical failures. All details of the technical verification and validation that will take place in the project will be given in the Software Verification and Validation Plan (D401.2), which will be kept updated throughout the project.

3. Impact assessment and potential for a lifecycle evaluation: One of the major challenges of Prosperity4All is to build an ecosystem that will be dynamic and will auto‐evolve and auto‐improve after the end of the project. For this reason, automatic (or semi‐automatic) mechanisms will be built into the system that will allow its lifecycle evaluation and impact assessment. These automated feedback tools will be added to the ecosystem and will give quantifiable and automatic results. There is a possibility that first attempts will start from the 2nd evaluation phase of the project in order for the ecosystem to be improved as much as possible before the project ends and its lifecycle use starts. Also, lifecycle evaluation makes sense only in terms of gathering user‐related feedback and long‐term impact assessment, and not technical validation, as technical failures will be auto‐recognized through the feedback mechanisms and will be at the disposal of their contributors to correct them if they still want to be part of the ecosystem (this is part of the “openness” of the ecosystem). Lifecycle evaluation is not part of the evaluation activities as defined in the DoW but rather refers to the evaluation of the ecosystem after it is deployed and available for use by diverse actors and groups of actors. Lifecycle evaluation will happen either near the end or after the end of the project and will be achieved through automatic feedback mechanisms like the ones used for impact assessment.
The automatic feedback mechanisms will allow users to provide feedback about their experience (e.g. the FLOE metadata feedback tool, which can be integrated into existing websites to allow users to provide feedback about how a resource matched their preferences and to request alternatives or modifications to improve it; for an example with a simple Open Educational Resource (OER), please check their metadata feedback tool). The lifecycle evaluation will aim to sustain the growth and maintenance of the P4All ecosystem.


Table 2: Evaluation phase 1 – Content, Actors, Evaluation methodology

EP 1 TIMELINE: TESTING MARCH‐APRIL 2015

Content (What will be available before the evaluation phase)

A limited set of SP3 applications (internal to the project) that will have selected/decided to use certain SP2 components, frameworks and services (called tools, internal to the project). Developers of SP2 will indicate what tools will be ready to integrate for the first evaluation phase. These tools will be sampled from an initial set of tools documented in the component repository that will be created as part of T202.1. Three months ahead, each internal implementer will be called to pick at least 2 components, services or frameworks from the developer space of SP2 that have been contributed in the scope of the project. The tools tested will mostly be single components and services with APIs. The component repository itself, as the web‐based interface between developers and implementers, will be tested first.

Level of ecosystem interactions happening – who are the “actors” that are present

Interactions between internal implementers and tool/component developers are limited to existing facilities of the GPII developer space such as mailing list, the project bug tracker and to direct contact between the groups.

An initial version of the developer space repository will be used for matching “producers and developers of tools/components and applications” of the ecosystem.

The developer space repository will be used as an interface to identify external developers and components.

Evaluation methodology & involved actors

Technical Validation: It will be conducted for all modules/tools that will be involved in the evaluation phase 1.

User evaluation: the SP2 DF tools/components that internal (SP3) implementers will start using will be tested for the 2nd Implementation/Optimization phase (focus on formative aspects). As part of the web‐based infrastructure of the DeveloperSpace, the repository will be evaluated with all internal developers that have started their implementations. Hypotheses underlying different tools to be developed as part of the DSpace will be formatively tested with both implementers and developers.


EP 2 TIMELINE: TESTING STARTS JULY 2015

Content (What will be available before the evaluation phase)

An extended set of SP3 applications (internal to the project and a few external to the project) that will use selected SP2 tools/resources (internal to the project). In the second phase implementers will also be free to choose further tools that are listed within the DSpace repository but have not been developed inside the project. Tools addressing external implementers will particularly include IDEs and development frameworks with user interfaces such as the Web version of the AsteRICS ACS and the AsteRICS runtime API. The focus will particularly shift towards smart and adaptive components. Furthermore, the first set of services will be tested in depth. Also, feedback and gamification mechanisms, for example, will be tested as part of the web‐based interface of the DSpace. Evaluations with developers will cover some of the developer‐facing parts of the DSpace infrastructure. Also, all other developments of SP2 should be available for at least first/internal implementation testing; some of the SP2 developments should be available for testing with external implementers.

Level of ecosystem interactions happening – who are the “actors” that are present

Interactions are limited to interactions between internal and external implementers and tool/component developers (communication will be gradually formalized via elements of the developer space such as the component repository)

Participants matching “producers and developers of tools/components and applications” of the ecosystem.

Evaluation methodology & involved actors

Technical Validation: It will be conducted for all modules/tools that will be involved in the evaluation phase 2.

User evaluation: SP2 tools/resources that internal (SP3) implementers have integrated will be tested for the 3rd Implementation/Optimization phase. Relevant tools will be further tested with first external implementers. Focus shifts towards both quantitative and qualitative testing methods that scale well. Objectives are still formative. As the ecosystem is used increasingly, developer‐facing tools will be assessed. Evaluation with implementers will be extended further, particularly towards frameworks and services.

Impact Assessment (pre‐mature phase): Automatic feedback assessment of the overall ecosystem operation by all actors present (as outlined above)


EP 3 TIMELINE: TESTING STARTS AUGUST 2016

Content (What will be available before the evaluation phase)

Beta versions of SP2 tools/resources that have been used by SP3 applications (internal and external to the project).

First release of fully functional ecosystem up and running with available internal and external tools including many parts of SP2 Facing tools (Unified Listing, DSpace, MarketPlace, Feedback, Gamification)



Level of ecosystem interactions happening – who are the “actors” that are present

Interactions between internal and external implementers, internal (and potentially external) developers and all the other types of end‐users as shown below in the context of the ecosystem.

Participants matching the following users of the ecosystem:
– producers and developers of tools and applications
– consumers with disabilities and other consumers with unmet needs (some may be prosumers, sharing their approach with others)
– service organizations serving people with disabilities
– independent service providers (e.g., therapists)
– public or government social services mandated to support people with disabilities
– insurance companies charged with addressing the needs of people with disabilities
– teachers or educators
– employers of people with disabilities
– families that include people with disabilities and other groups that include people with disabilities

Evaluation methodology & involved actors

Technical Validation: It will be conducted for all modules/tools that will be involved in the evaluation phase 3.

User evaluation: Assessment of beta versions of SP2 DF tools by internal (SP3) and external implementers. Focus shifts towards summative evaluation objectives and success thresholds. Summative testing will focus on the effects of the overall P4A project/ecosystem on different developments and implementations. Testing with non‐professional developers in particular (potential prosumers) will be done for the first time, in parallel with the first end‐user testing.

– Assessment by end‐users (all the ones listed above, besides developers) of SP2 User‐Facing tools (Unified Listing and Open Marketplace – WP201, user‐configured Assistance on Demand – WP205, Consumers – Developers connection tools‐ WP206 and perhaps more to be defined)

Impact Assessment: Automatic feedback assessment of the overall ecosystem operation by all actors present (as outlined above).


EP 4 TIMELINE: TESTING STARTS MAY 2017

Content (What will be available before the evaluation phase)

All SP2 tools/resources and a series of tools/components available in the DSpace. Demo implementations outside the DSpace infrastructure exist that expose the end‐user facing aspects of the SP2 tools/resources (Feedback mechanisms, etc.)

All SP3 applications and external applications available. Ecosystem platform up and running with internal and external tools integrated.

Level of ecosystem interactions happening – who are the “actors” that are present

Interactions between all actors (ideally) that could be part of the ecosystem.

Ideally, participants should match all the possible actors of the ecosystem, but this will probably not be feasible within the lifetime of the project. However, representatives from the three main categories of actors, as defined within SP1, will be addressed.

Evaluation methodology & involved actors

Technical Validation: It will be conducted for all modules/tools that will be involved in the evaluation phase 4.

User evaluation: Full scale usability testing of SP2 User‐Facing tools by end‐users (in the broad sense, representing all different categories). Testing with selected SP3 applications (enhanced/developed with SP2 tools/resources) by end‐users (before and after when applicable), such as the Assistance on Demand services (WP303), Routing Guidance System (T301.6), T302.3 applications (improved educational content for blind and low‐sighted).

Impact Assessment: Automatic assessment of the overall ecosystem operation by all categories of actors present. Impacts in specific aspects will be assessed also through “before‐after” evaluations with end‐users.

Table 6: Lifecycle evaluation and impact assessment – Content, Actors, and Evaluation methodology

LIFECYCLE EVALUATION & IMPACT ASSESSMENT MIGHT START DECEMBER 2017 ‐

Content (What will be available before the evaluation phase)

Everything intended in the project available and optimized.

Hopefully more external to the project tools and applications (at least from the organizations that belong to the Collaborative Network).

Level of ecosystem interactions happening – who are the “actors” that are present

Interactions between all potential actors of the ecosystem, anticipated or not (see Chapter 3). Potential actors are outlined in Chapter 3; the list is not exhaustive.

Evaluation methodology & involved actors

User evaluation & Impact Assessment: Automatic and systematic recording of measures necessary for carrying out the impact assessment of the overall ecosystem operation by all actors present (as outlined above).


6 The evaluation framework

6.1 Introduction

A formative, inclusive Human‐Centered evaluation framework is adopted in the early stages of the project. Usability measures will be gathered even at the early iteration phases; however, summative assessment of the holistic user experience will only be carried out at the end of the development process to provide comparison data and success rates (relative to thresholds and criteria set during the lifespan of the project). It should be borne in mind that this is far from any traditional iterative testing of an application and its usefulness and acceptance from a zero‐based approach. We start with the main project objectives and the KPIs and then build the framework stepwise, considering the requirements for the categories of actors and the development work to be carried out within the project.

The evaluation framework addresses every element related to evaluation as soon as it is available for testing. In other words, it involves the following:

Technical validation: ensuring robustness and reliability of tools and applications to be tested by identified actors participating in all evaluation phases of the project.

Recruitment: the methods and techniques used to recruit participants based on inclusion/exclusion criteria elaborated further in respective testing plans as they depend on requirements of testing and procedures.

Evaluation phases: the main components of each evaluation are sketched with regard to the content, the user groups, the methods, tools and levels of maturity.

Main questions: these should not be confused with research hypotheses within a strict evaluation plan (i.e. testing null/testing hypotheses). These are the main questions of the evaluation framework, starting from purpose of evaluation and working through the final lifecycle assessment. These questions are general and will be the basis for the research questions in the evaluation plans. The evaluation questions are presented in Chapter 1.

Subjects of evaluation: Define the main subjects to be evaluated in each evaluation category. The main subjects under investigation will be presented based on DoW and early work available currently at the project.

Main research objectives indicators, methods, techniques: separate tables were prepared (Annex B) for implementers and end‐users for stratifying the high level objectives down to metrics and indicators.

Impact assessment of the integrated Prosperity4All platform will be also targeted in a sustainable way.

6.1.1 Major dimensions of evaluation in Prosperity4All

The evaluation framework encompasses the three following types/aspects of evaluation:


Technical validation has to be performed before any evaluation to any degree with any type of users. Technical validation will deal with the robustness of the system and its components and will encompass unit testing, integration testing and system testing. While unit and integration testing will be carried out by the individual development teams of each project deliverable every time unit or module modifications happen or new releases are produced, system testing will be conducted before the evaluation iterations. Technical validation is addressed in task 401.3 and will be presented in a separate deliverable (D401.2). Technical validation is a horizontal activity aiming to ensure that the tools and applications to be tested will be verified to work properly, they meet the specifications and requirements for actual testing and that they fulfill their intended purpose. Verification and validation will be implemented based on the Software Verification and Validation Plan (SVVP) by each development team for each tool (SP2) and application (SP3) to be used in the evaluation phases. As already mentioned, technical validation measures, procedures and the whole methodology will be described and discussed in detail in deliverable D401.2, thus, not included in this deliverable. Validation tests will be carried out in three development phases at three levels: unit testing, integration testing and system testing and data gathered by each testing phase at each level will be analysed to estimate performance success and set optimization goals for the next development cycle. Technical validation of tools or services that will be involved in each evaluation phase will start two months prior to the start data of the respective evaluation phase in order to ensure that applications, tools, and services are working and running. The latter depends on the functional readiness of each prototype to be used in each iteration phase. 
Further, tasks, scenarios and evaluation materials will be tested in a dry run before testing, to validate their appropriateness, feasibility and value for each evaluation, with the support of the SP2, SP3, and SP4 teams.

Subjects of evaluation. The two user categories will evaluate different developments. Implementers will evaluate SP2 developer facing/non facing resources and end‐users will evaluate the SP3 implementations. Other stakeholder groups and users holding both of the aforementioned roles will be addressed in impact assessment. Impact assessment will be carried out for the real life experience of the P4A ecosystem and other evaluations will contribute with any common measures and useful insights. The main areas of development that will be subject to evaluation are the following:

DeveloperSpace (Developer Facing elements)

Tools for “Auto‐personalization from preferences” (APfP) capability

Building Blocks

Frameworks and Tools

Financial Infrastructures

Developer Resources (information, people, testers, etc.)

Service Infrastructures

Media Augmentation Infrastructures

Document Transformation Infrastructures

Doc/Media Hybrid Infrastructure

Assistance on Demand


User Tools (User facing elements)

Unified Listing

Marketplace

User Participation

The principal distinctions in P4A evaluations are the following:

Evaluation – highest level objectives

Developer Facing (tested with developers) (usable and useful testing)

User Facing (tested with users) (usable and useful testing)

Overall Prosperity Evaluation – (impact assessment)

Within SP2 the overall technological infrastructure (WP201) will be developed, where the various tools and components will be used integrally by developers with an integrated payment system (WP201). The implementers will evaluate the following:

1. Developer Space (DSpace) and the components it includes:

resources to help new researchers or developers in the field get started including connections to consumers and other experts;

a rich array of components for building new or better solutions for different user groups with accessibility needs;

standard building blocks (standard input, output and processing modules), so they do not have to reinvent them once again (T202.2 and T202.3);

special translation modules (e.g. braille translators) (T202.2);

specialized modules to allow them to incorporate leading‐edge research into their products (T202.4, 202.4‐6); and

accessible web components for implementers to use on their pages, so that the pages behave differently when individuals visit the site with different personal preference sets, etc.

2. Frameworks and development environments to allow them to start with a functioning system and modify it for a new unmet need.

3. Open source transformation service infrastructures that can allow people to set up new services for their language or culture to make media and materials accessible, including the following:

the infrastructure for creating document transformation services for geographies or for a company or government agency where documents can never leave;

the infrastructure for media transformations services to make media accessible (end‐product evaluated by end‐users);

the mechanism to tie the two infrastructures together to create, for the first time, a unified way to handle the rapidly emerging documents and ebooks that have movies and media embedded in them;

All of these tied into the Cloud4all/GPII auto‐personalization, to make accessibility instant and transparent to the user (of Cloud4all/GPII) (T201.1) (end‐product evaluated by end‐users).

4. An Assistance on Demand service infrastructure to allow individuals and organizations to create Assistance on Demand services to handle those disabilities where we cannot automate accessibility to meet their needs (cognitive, language, and learning disabilities and aging) (WP205)

6. A series of mechanisms to keep consumers and developers closely connected in order to keep the entire ecosystem consumer‐need‐focused, and able to address the tails of the distributions rather than being “disability‐sweet‐spot”, technology, or existing‐product‐line focused.

mechanism for providing feedback on existing products (evaluated by end‐users)

mechanisms for describing tricks and strategies to make what exists work (evaluated by implementers)

mechanisms to allow users to ask for, or even pledge funds toward, new features (separately or while they are browsing/shopping for an existing solution) (implementers and end‐users).

For end‐users: SP3 implementations will be evaluated by end‐users, and they cover a very wide range of application domains: Communication, Daily Living, Health and Accessible Mobility (WP301), Education, eLearning, Business and Employment (WP302), Assistance on Demand (WP303). The specific SP3 products that will be enhanced per application area are also presented on page 39. The SP3 implementations also include deployment of the SP2 tools in public access points (T301.3). End‐users will evaluate the Unified Listing and Open Marketplace (T201), which allows users for the first time to find not only any assistive technologies that might help them but also mainstream products with built‐in accessibility features that would address their needs. Micropayment and GPII/Cloud4All autopersonalisation, although presented as separate dimensions below, will most probably be available and will be evaluated with the marketplace.

They will also evaluate the micro‐payment system for Assistance on Demand (AoD) which will probably bring the “appstore model” to the accessibility needs’ area.

Assistance on Demand services capable of providing new services to low incidence consumers whose needs cannot be met by technology today (or anytime soon). Allowing consumers to provide services to others or to other consumers;

enabling individuals with lower technology skills to provide technology mediated services through specially designed interfaces;

allowing individuals to be able to configure Assistance on Demand services for family or loved ones themselves (205.4);

that are all tied into the auto‐personalization of Cloud4all/GPII to make them autoconfigure and to ensure assistants communicate effectively in response to anonymous (but verified) requests for assistance where needs cannot be known. End‐users will evaluate the feedback mechanisms that will be built to allow consumers to better connect to developers, to pass on new ideas and to let developers know the user priorities for new capabilities.

The resources available in the Developer Space will be evaluated by internal and external implementers. The service infrastructures and the user tools will be evaluated by the end‐users, and in impact assessment representatives of major actor categories (including stakeholders) will participate in the overall Prosperity4All evaluation with consideration for economic, business, market, and real use perspectives. The impact assessment is an overall assessment carried out when the ecosystem is running in real life with all components, developments and resources, and is accessed by actors and re‐actors both internal (e.g. operators) and external to the project.

Evaluations with users. For the evaluation framework, a distinction will be made between two major groups, the implementers and the end‐users, as the primary groups interacting with the system. The stakeholders can only provide indirect feedback on evaluations but can be more active in impact assessment. Evaluation will be mainly formative at earlier stages for both implementers and end‐users, becoming more summative towards the final evaluations. Three iterations will be carried out for implementers and two for end‐users. The main methods will be expert reviews, focus groups, and remote testing, with mainly qualitative data collection during the initial phases, moving towards field and performance testing in the final phases. Usefulness, utility, cost efficiency and internationalization are attributes of particular interest during the four iteration phases in total.

Impact Assessment (IA) is "a process aimed at structuring and supporting the development of policies. It identifies and assesses the problem at stake and the objectives pursued. It identifies the main options for achieving the objective and analyses their likely impacts in the economic, environmental and social fields. It outlines advantages and disadvantages of each option and examines possible synergies and trade‐offs" [10].

Impact assessment is the upper scope of the evaluation activities in Prosperity4All. Ultimately, we need to understand, as objectively as possible, the viability, growth and sustainability of the P4All ecosystem. Some of the aspects introduced in the evaluation with users also feed into impact assessment. However, impact assessment refers to the whole ecosystem in a much broader sense.

In Prosperity4All, specific mechanisms will be built in WP404: “Assessment of Prosperity4All platform”, for monitoring the performance of the Prosperity4All ecosystem against the “prosperity” measures that will first be identified in SP1 and afterwards applied in WP404. Impact assessment is not the same as lifecycle evaluation, but the latter will follow the former. Lifecycle evaluation refers to the potential of gathering measures and data after the end of the project, when the ecosystem will be alive and growing. The lifecycle evaluation is not part of the evaluation as defined in the DoW, but it will be based on the metrics defined in the impact assessment deliverables (D404.1 and D404.2). Its purpose could be to preserve and increase the sustainability of the ecosystem and to improve its parts as it grows and progresses. A living system requires regular, systematic check‐ups to ensure that it is healthy and that its growth is healthy, too.

For each of the main evaluation categories, logical models were created to guide the later evaluation plans. The logical models share a similar structure: actors and tools, the process, and the outcomes.


6.2 Evaluation with implementers & users

Human factor evaluation is applicable both for the implementers, who will assess the SP2 tools/resources that they will have integrated in their applications (mostly back‐end evaluation), and for the end‐users (of all different categories as outlined in Chapter 1), who will assess both the SP2 User Facing tools and some of the SP3 applications that will have integrated one or more tools (only front‐end evaluation).

The evaluation perspectives are different as well as the objectives but both evaluations are supplementary and serve the same scope, to provide an efficient and highly performing ecosystem. During the selection of actors not only key actors will be selected but also professionals from supporting fields (e.g. requirements’ analysts, quality assurance specialists). The roles might be reciprocal and, therefore, the creation of a network map would depict who are the key actors, their relationships, and the exchanged data. The network map depicts their connections; how they might interact within the ecosystem and how the professionals might also be users (e.g. an implementer with visual impairment can also act as an end‐user).

Data‐to‐be‐collected (as presented in the tables of Annexes B.1, B.2, and B.5) are categorized as qualitative and quantitative, a categorization that derives from statistics. Both data types can be subjective or objective. Neither is exclusive or inclusive.

There is considerable overlap between the indicators to be studied and methods and measuring instruments (e.g. questionnaires and Think Aloud protocols). The reason for presenting different techniques for the same indicator is fourfold:

a) The evaluation framework serves as a reference point for suggesting appropriate and relevant techniques. The lack of explicit testing plans inevitably makes it difficult to specify the optimal measures (if such exist);

b) Using different measures could reveal valuable information lying in the “gaps” and one data type could complement the other;

c) Using different measures enhances the potential of impact assessment measures;

d) Using different measures adds to the validity (wherever potential for data triangulation exists).

The evaluation stratification tables in Annexes B.1, B.2, and B.5 were prepared following a top‐down process, i.e. from the higher evaluation objective to specific or general success criteria, depending on the objective. These tables will be used along with the compilation of measurements in order to allow the selection of the methods and metrics to be used at separate evaluation phases. The stratification process comprises the following: a) high level objectives, b) attribute, c) measuring methods, d) measuring techniques, tools, success criteria. The latter remain generic until the actual research questions of each detailed plan become available. KPIs will be measured for mature versions of SP2 tools/resources and implementations through the process of answering the P4All evaluation questions. In order to answer these evaluation questions, the appropriate methods and indicators are selected as depicted in each logical model (see, among others, sections 6.2.6 and 6.3). These are the primary and direct indicators appropriate for each method applied at each evaluation phase for each addressed user group. These techniques and indicators will also be used for measuring KPIs. It should be borne in mind, though, that KPIs will largely be measured during field testing (i.e. real life testing), and additional data might be gathered from the market and business modeling within SP1 and the indicators selected within WP404. Further supplementary and secondary measures and techniques may be selected either from the evaluation tables or per tool/implementation, depending on testing requirements.

6.2.1 Evaluations with implementers

Implementers are the Prosperity4All actors who will use the SP2 tools/resources in order to make their applications more accessible or to improve the user experience of accessible applications (e.g. implementers in SP3). In these user groups, developers will directly add outcomes of SP2 to SP3 applications and use the tools and infrastructure developed in SP2 during the improvement. During their implementation work, they will also evaluate the utility (among other attributes) of the SP2 tools/resources to “accessibilize” SP3 applications and services. SP3 implementers are representatives of a bigger group of implementers that needs to be considered in the wider scope. During a workshop in Crete we identified the following relevant stakeholders, who may enter the Developer Space (DSpace) from an implementer perspective and are closely related to the actors identified in chapter 3.

Categories of stakeholders that have an implementer perspective on the DSpace:

1. AT Developers

• Hardware AT

• Software AT (install)

• Web/Cloud AT

2. Mainstream Developers

• Desktop Applications

• Web/Cloud Applications

• Cloud Service Providers

• Hardware/Appliances

• Mobile Applications

3. System Integrators

4. Researchers

• University/College/Tech‐Institute /Faculty and Students

• Other Tech (or other) education

5. Community Developers

• Micro Service Developers

• Friends and Family

• Prosumers

6. Service Delivery Professionals

• Clinicians

• Teachers

Further stakeholders will be identified by the economic analysis (SP1) and will influence the evaluation framework. Other stakeholders may be influenced by the implementers’ perspective of the DSpace; this particularly includes the government. Governmental agencies set the regulatory frame for many implementations, and procurement officers and other decision makers likewise strongly influence what will be considered for implementation. The evaluation acknowledges that roles may be fluid, so that consumers with decision‐making powers also have influence on implementation decisions, and that pro‐sumers in particular are interesting stakeholders in the realm of accessibility. One underlying assumption of the evaluation framework is, however, that the implementers’ perspective on Prosperity4All is common to all the stakeholders, although very heterogeneous.

The following table presents the connection between the evaluation questions, the KPIs and the logical model for the implementers (Table 7). Each evaluation question is linked to KPIs, and according to the logical model certain indicators are primarily connected to both. These indicators are not the only important ones, but they are directly connected to the evaluation question (EQ) and the specific KPIs.

Apart from traditional testing aspects, there are therefore two key features to be considered for the evaluation framework: a) matchmaking, i.e. how tools fit the process and applications, and b) cost‐effectiveness of the use of specific tools, in order to reveal how implementers are driven to their choice of tool(s) (i.e. elaboration on the decision process in relation to a reference case). For the latter, multi‐criteria analysis could be used for a certain number of implementers. Tests with the SP3 applications and services will investigate the applicability and usefulness of the SP2 technology infrastructure. Most of the SP3 implementations are applications and services already offered to consumers and therefore they will not be evaluated per se. By contrast, many of the SP2 tools/resources and the Developer Space (DSpace) will be developed within the project, and therefore many of them will be offered as prototypes, mock‐ups or even proofs of concept at the very early stages of the project.
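As an illustration of how a multi‐criteria analysis of tool choice might be set up, the following minimal sketch uses a simple weighted‐sum model. The framework does not prescribe a specific multi‐criteria method; the criteria names, weights, tool names and scores below are hypothetical placeholders and not project data.

```python
from typing import Dict, List

# Hypothetical criteria and weights (assumed for illustration; weights sum to 1).
WEIGHTS: Dict[str, float] = {
    "fit_to_process": 0.4,    # matchmaking: how well the tool fits the process
    "integration_cost": 0.3,  # cost-effectiveness (higher score = cheaper to adopt)
    "documentation": 0.3,
}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of criterion scores (each score assumed on a 0-10 scale)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[c] * scores[c] for c in weights)


def rank_tools(candidates: Dict[str, Dict[str, float]],
               weights: Dict[str, float]) -> List[str]:
    """Return tool names ordered from highest to lowest weighted score."""
    return sorted(candidates,
                  key=lambda name: weighted_score(candidates[name], weights),
                  reverse=True)


# Hypothetical scores given by one implementer to two candidate tools.
candidates = {
    "tool_a": {"fit_to_process": 8, "integration_cost": 4, "documentation": 6},
    "tool_b": {"fit_to_process": 6, "integration_cost": 9, "documentation": 7},
}
ranking = rank_tools(candidates, WEIGHTS)
```

Comparing such rankings across implementers, and against a reference case, is one possible way to make the decision process behind tool choice explicit; other multi‐criteria methods could equally be used.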

Table 7: Connecting the evaluation questions and the KPIs with primary indicators (implementers)

Evaluation question(s) | KPIs | Primarily relevant attributes per evaluation phase
EQ1 | 1, 4, 13, 14 | 1st phase: Utility; 2nd phase: Usability/acceptance; 3rd phase: All the above and number of external implementers interested in the project, effective utilization of Collaborative network.
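When the detailed evaluation plans are prepared, the EQ1 row of Table 7 could be kept as a small lookup structure, so that the attributes relevant to a given phase can be retrieved consistently. The sketch below is only one possible encoding: the dictionary layout and the function name are assumptions, while the values are those given in Table 7.

```python
from typing import List

# EQ1 row of Table 7. The layout of this dictionary is an assumption;
# the KPI numbers and attribute names are taken from the table.
TABLE_7 = {
    "EQ1": {
        "kpis": [1, 4, 13, 14],
        "phase_attributes": {
            1: ["Utility"],
            2: ["Usability/acceptance"],
            3: [
                "Number of external implementers interested in the project",
                "Effective utilization of Collaborative network",
            ],
        },
        # The 3rd phase is described as "all the above" plus its own attributes.
        "cumulative_phases": {3},
    }
}


def attributes_for(question: str, phase: int) -> List[str]:
    """Return the primarily relevant attributes for a question and phase."""
    row = TABLE_7[question]
    own = list(row["phase_attributes"][phase])
    if phase in row["cumulative_phases"]:
        earlier = [
            attribute
            for p in sorted(row["phase_attributes"])
            if p < phase
            for attribute in row["phase_attributes"][p]
        ]
        return earlier + own
    return own
```

For example, `attributes_for("EQ1", 3)` returns the attributes of the first two phases followed by the two attributes specific to the third phase.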

The selection of specific evaluation methods, techniques and measures for specific SP3 applications will be explicitly presented and discussed in deliverable D402.1, dedicated to the evaluation plans for testing with implementers.

These SP2 tools/resources will be used to make SP3 applications and some external applications more accessible. Which tools will be used for which applications is not yet known, but a first attempt to map the SP2 tools/resources to the internal SP3 tools should be added to the next evaluation plan (D402.1). The definition of the evaluation metrics appropriate for testing the use of SP2 tools/resources (e.g. web‐based developer resources) for making an SP3 application accessible (e.g. integration of Prosperity4All with FLOE; T302.5) remains to be elaborated in the evaluation plans for implementers (D402.1). In order to compose the evaluation framework, tables (Annex B.1) have been created that reflect the rationale and the components of the evaluation with implementers, accommodating the objectives and needs of Prosperity4All for the actors within this group. It is necessary to clarify the categorization of data within the framework so that it can be communicated more easily to the involved partners and analysts.

One important consideration made within the evaluation framework was to evaluate the specific effect of Prosperity4All as much as possible. While most of the applications and services were already evaluated in the scope of other projects, enhancing them is “another story to be told”. It is very important that human factor evaluation also focusses on the unique prosperity propositions that come through the project and on the exposure of the tools within an ecosystem. It is therefore important to understand that all interactions between developers and implementers in the project are made through exactly that evolving ecosystem. Evaluation will be carried out for both the DSpace (where the SP2 tools/resources will be available) and the use of SP2 for SP3 applications (internal and external to the project) in the context of the DSpace.

The following SP3 applications and services will be improved with the use of SP2 tools/resources:

1. Learning and training S/W (T301.1)

2. Brian (T301.2)

3. GuadalInfo (T301.3)

4. Home appliances, home entertainment and home services (T301.4)

5. SOCIABLE (SILO)

6. MLS Destinator Talk&Drive™ (T301.6)

7. SpagoBI Open Source Business Intelligence Suite (T302.1)

8. Counselling and printing services (T302.2)

9. Learning material (T302.3)

10. ATLab Framework & SwitchTrainer (T302.4)

11. Flexible Learning for Open Education (FLOE)(T302.5)

12. Consumer, business, technical AoD services (T303.1‐3)

Grouping of the SP3 applications listed above by work package: WP301: Communication, Daily Living, Health, and Accessible Mobility (1‐6); WP302: Education, eLearning, Business and Employment (7‐11); WP303: Assistance on Demand Services (12).

Evaluation for implementers will be conducted for four groups of SP2 outcomes. These categories are not exclusive; some of the SP2 developments might fall into more than one category.

SP2 categories of tools to be used by developers:

1. Web‐based Developer Resources (DeveloperSpace)

2. Tools and Frameworks with Graphical User Interfaces for Development (IDEs)

3. Assistance on Demand (AoD) services

4. Building Blocks and Frameworks (with no graphical interfaces) for developers (API)

Those categories are driven both by the different kinds of SP2 outcomes (components, tools, services, infrastructure) and by practical considerations and needs for testing and evaluation. The categorization considers the applicability of methods as presented in the following tables. Both SP2 outcomes and SP3 implementations can belong to multiple categories.

Particularly for the infrastructure of the Developer Space (DSpace), SP2 will develop multiple user facing components that are exposed to implementers (there are also user facing components exposed to end‐users that are not part of the testing methods). Many of those outcomes will be presented as web‐based developer resources. The most prominent example is the component listing (repository), which will be a directly visible outcome of WP202. In those cases, proven user experience methodology can be applied. The user model of a developer is different from that of an end‐user in the domain; however, particularly here, transitions between roles need to be considered for certain stakeholder classes (actors who are user‐programmers can play both roles within the ecosystem). Table 8 presents the most relevant objectives and techniques that are derived from this perspective.

Particularly for the first and second category, the matchmaking aspect becomes even more important. While many of the web‐based resources will be an entry point for many types of stakeholders, the picture differentiates quickly after that. In particular, there will be no “one‐size‐fits‐all” usability for components. The goal of Prosperity4All is to enable the selection of fitting components and, furthermore, the fitness of the components for the relevant stakeholders, who differ from component to component. Because we often cannot get a summative [n:m] picture of components, services, tools and implementations, there will be a two‐stage process. The first stage will be a matchmaking inside the project via the DSpace. This matchmaking already takes the usability of the web resources into account. After this initial matchmaking, the hypothesis is that the selected implementation should fit the implementer that selected it. In the second stage we evaluate usability based on this assumption and also use the evaluation as a formative tool to improve the SP2 implementation. For those evaluations, established human factor evaluation techniques can be partially applied, particularly if a tool exposes a graphical interface to the developer. This is the reason why categories 1 and 2 are distinguished. However, the graphical user interface is only a small part of the overall usability of the APIs. Because of the challenges involved in testing (non‐graphical) interfaces for implementers, an outline of the main considerations made for the evaluation is provided before the specific objectives and methods are listed in tables.

In end‐user testing, clear demands are typically given. There is a clear explanation if the product does not meet users’ expectations. Missing functionality can be reasonably identified because the processes and procedures are obvious and well known (e.g. text‐processing system or spreadsheet program). By contrast, APIs have been, and still are, to a large extent business‐to‐business solutions used by experts. In the last decade, the Application Programming Interface (API) area has been changing: semi‐professional and non‐specialist user groups have grown and are of growing importance. The need for making code efficient, dependable, and reusable led to the development of APIs. This pattern makes it possible to reuse existing solutions in a new context without adapting or understanding their inner functionality. It is intuitively understandable that API quality and the development process are closely linked. For example, poor APIs directly increase development cost [18]. To improve the user experience in the field, an application/tool/service has to be improved to be attractive for actual and potential user groups.

Current evaluation methods emerged from two different branches. The first is the cognitive dimensions branch, which comes from the usability analysis of visual programming environments and is also discussed in the end‐users’ section. The work of Steve Clarke in 2004 [18] built on the cognitive dimensions introduced by Green in 1996 [19]. Judging by the citations in other studies, his initiative had a path‐breaking effect on research in this field. The user is the center of the evaluation, and designing and developing around the user is seen as an advantage. Developers and implementers are now seen as humans [20]. Those methods are often informal and simple to use. The second branch is the code metrics branch, which connects software cost estimation models like COCOMO with the codebase. Automatic evaluation methods are more conceivable here than in the first branch.

Current research in this field shows that design choices which make a good architecture often result in bad usability. Knowing this can help to evaluate the design, as well as to find a clever trade‐off. Farooq and Zirkler state that formal and empirical usability evaluation methods overlook many problems [12]. Lab studies based on the cognitive dimensions framework overcome those shortcomings, but are expensive and do not scale well [13]. A selection of appropriate methods, techniques and success criteria was set for testing with implementers in all three iteration phases, and the stratified tables to be used in the iterations to follow are presented in detail in Annex B.1. Based on this preliminary analysis, and on the resources and possibilities available in the scope of the project, a mixture of potential methods was selected that can be considered a framework for concrete evaluation steps for the other two important evaluation categories. The methods presented in Table 8 could address market‐oriented issues, as they are presented in Table 1. Time consumption, transferability of findings and human resources are related to cost‐effective measures and outcomes.

Table 8: Qualitative method comparison of relevant usage parameters

LS PR DPE ADI TW

Before implementation --- + + + + --- + + +

During implementation + + + + + + - + + +

After publishing + + -- --- + + + -



Human resources --- + + -- + + + + +

Time consumption -- + + --- + + + +

Scalability --- + + + + + +

Feedback relevance + + + + + -- - +

Transferability of results -- + + + + + --- --

LS = Lab studies, PR = Peer Reviews, DPE = Design pattern evaluation, ADI = (Automated) Documentation improvement, TW = Technical writers

Apart from the static evaluation phases, it would be useful to gain insight into the day‐to‐day obstacles that developers (at least internal implementers) encounter, in order to capture the dynamics of the inclusive design process. With this in mind, it would also be useful to selectively gather, at random time windows of the development cycle, information about reported bugs, errors and obstacles from the real reporting and communication between the developer and implementer teams within the project (e.g. JIRA issues). Collecting information from developers’ actual work adds ethnographic attributes and helps to achieve emic validity (i.e. from the perspective of the developer) and etic validity (i.e. from the perspective of the researcher as observer and data collector) for the results collected during pilots. Such indicators are easy to track and record, as they will be available to partners. This dynamic evaluation will be random and could be realised in collaboration with SP2, SP3, and the technical validation teams. The evaluation with implementers will incorporate more aspects of the economic success and market impact of the project and will subsequently provide valuable feedback for developers.
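The random sampling of time windows described above could be sketched as follows. This is a minimal illustration under stated assumptions: the issue records are plain dictionaries with a hypothetical `reported` date field (as they might look after export from an issue tracker), and the function names, window length and number of windows are not taken from the project.

```python
import random
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

Window = Tuple[date, date]


def random_windows(cycle_start: date, cycle_end: date, n_windows: int,
                   window_days: int, seed: Optional[int] = None) -> List[Window]:
    """Draw n_windows random (start, end) windows inside the development cycle.

    Dates are inclusive; windows are drawn independently and may overlap.
    """
    # Largest start offset that still keeps the whole window inside the cycle.
    max_offset = (cycle_end - cycle_start).days - window_days + 1
    if window_days < 1 or max_offset < 0:
        raise ValueError("window does not fit into the development cycle")
    rng = random.Random(seed)  # a seed makes the sampling reproducible
    windows = []
    for _ in range(n_windows):
        start = cycle_start + timedelta(days=rng.randint(0, max_offset))
        windows.append((start, start + timedelta(days=window_days - 1)))
    return sorted(windows)


def issues_in_windows(issues: List[Dict], windows: List[Window]) -> List[Dict]:
    """Keep the issue records whose 'reported' date falls in any window."""
    return [issue for issue in issues
            if any(start <= issue["reported"] <= end for start, end in windows)]
```

For example, `random_windows(date(2015, 3, 1), date(2015, 3, 31), 3, 5, seed=1)` draws three five‐day windows in March 2015, and `issues_in_windows` then selects the reports to be reviewed. Recording the seed with the results would allow the same windows to be reproduced later.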

The logical model (Figure 7) for the evaluation with implementers presents the main methods to be applied in the three iteration phases that will be carried out with internal and external implementers, and the main attributes to be measured. It also depicts the communication of results and inferences to the SP2 development and SP3 implementing teams. The logical model has three major dimensions: a) the participants and the subjects of evaluation (actors and tools), b) the evaluation procedure and respective samples (process), and c) the primary measures/indicators per evaluation phase (outcomes). The same modelling was applied for all three evaluation types (implementers, end‐users, and impact assessment). The evaluation stratification tables available on the wiki elaborate on potential methods, indicators, metrics, and measures for each evaluation type. They will be used as a reference both for building the evaluation plans and for choosing the appropriate evaluation instruments. Additionally, they might prove useful for connecting the evaluations with the real life impact assessment and subsequently the lifecycle evaluation.


Figure 7: A logical model for the evaluations with internal and external implementers


The model is based on specific assumptions, and when the time comes to interpret the results and draw inferences, the potential effect of external factors will be considered. Any interactions are depicted with arrows; they are important for understanding the evaluation content and planning the iterations. This model reflects the framework for the implementers.

6.2.2 Evaluations with end‐users

The end‐users represent a diverse cluster of users that is far from including only the person with accessibility needs. Within P4A, these are people who are more active in the decisions on selecting certain products, as their needs might not be addressed by products and services already existing in the market. They also come from various stakeholder groups (e.g. therapists, teachers, clinicians, users with a combination of accessibility needs, etc.) and they have a common goal: to find products and services customized to their needs. It is crucial to evaluate the user experience with the product, as well as the user acceptance of the product, after the robustness of the product has been validated. Even if the product satisfies all its technical specifications, the end‐users might still not end up using it and/or liking it.

User experience encompasses all aspects of the end‐user's interaction with the company, its services, and its products. It is sometimes helpful to distinguish UX and usability. Usability is a quality attribute that assesses how easy user interfaces are to use; the term "usability" also refers to methods for improving ease‐of‐use during the design process [14]. It is a quality attribute of the UI, covering whether the system is easy to learn, efficient to use, pleasant, and so forth. User Experience is an even broader concept. User experience (UX) focuses on having a deep understanding of users, what they need, what they value, their abilities, and also their limitations. At the core of UX is ensuring that users find value in what you are providing to them [15]. According to ISO 9241‐210 [17], user experience is “a person's perceptions and responses that result from the use or anticipated use of a product, system or service”. According to the ISO definition, user experience includes all the users' emotions, beliefs, preferences, perceptions, physical and psychological responses, behaviour and accomplishments that occur before, during and after use. The ISO also lists three factors that influence user experience: system, user and the context of use [18]. The third aspect, context, is very important for iterative evaluation frameworks, as the system usually evolves progressively and the context is quite different from one phase to another.

Figure 8: Peter Morville ‐ User Experience Honeycomb [16]

This implies that user experience in the first Lo‐Fi or low‐integration phases of a system may not give valid results, because users experience the system in a non‐realistic context. This is why it is very important, before any evaluation with users, to present to them in advance what exactly the applications, services, etc. will look like at the end, and also what the expectations are for the specific evaluation moment.

Also, note 3 of the standard hints that usability addresses aspects of user experience, e.g. "usability criteria can be used to assess aspects of user experience". Unfortunately, the standard does not go further in clarifying the relation between user experience and usability. Clearly, the two are overlapping concepts, with usability including pragmatic aspects (getting a task done) and user experience focusing on users’ feelings stemming from both pragmatic and hedonic aspects of the system [ 18]. In any case, in the Prosperity4All evaluation framework, we will try to define in which way usability is a subset of UX and which measures correspond to each.

It is interesting to note that quite often usability is not “measurable” in a global objective way but is rather debatable. The cognitive dimensions framework, first introduced by Green [ 19], is meant to provide discussion tools that help people who are not HCI experts (such as programming experts) make quick but useful evaluations. The dimensions shown in Figure 9 can be addressed separately, but if they are addressed together they provide a clearer picture of any problems and weaknesses present, especially for developers but also for end‐users.

Figure 9: Potential tradeoffs of cognitive dimensions [ 19]

Cognitive dimensions acknowledge that something that is cognitively hard in one environment may be much easier in another. Prosperity4All deals with a very diverse set of such environments, users and development activities. Green (1996) states that the properties also depend on the tools available in a given environment: “You can fix any kind of difficulty, either by changing the notation or by changing the environment, but you usually pay for it with another kind” [ 19]. Implementers of Prosperity4All may face some of those decisions when using tools from the Prosperity4All ecosystem.

Cognitive dimensions have proven a particularly natural match for visual programming languages such as the web‐based IDEs developed within P4A. For the usability of components, and especially for such specialized APIs as the ones to be developed within the project, only a few tools are available. In this case, an adapted form of the cognitive dimensions has been successfully used by Clarke, 1996, for Microsoft .NET, and we will also use it within the project. The cognitive dimensions will be useful mainly for evaluations with developers but also with end‐users, and will be especially interesting for the final evaluations, when interchangeable roles (user‐developer) will be of interest.

In addition to UX and usability, user acceptance is an essential part of user evaluation. User acceptance testing (UAT) is a process of verifying that a solution works for the user [ 20]. It is not system testing (ensuring that software does not crash and meets documented requirements), which is an objective of technical validation. Rather, it ensures that the solution will work for the user, i.e. it tests whether the user accepts the solution (software vendors often refer to this as beta testing).

In Prosperity4All, user acceptance will be investigated with various techniques, including the Technology Acceptance Model (TAM), which is an information systems theory that models how users come to accept and use a technology. The model suggests that when users are presented with a new technology, a number of factors influence their decision about how and when they will use it [ 21]. There are limitations and assumptions when using this model, as the complexity of systems is continuously increasing and predicting use gets more difficult, especially for growing systems, but it is a valuable instrument with acceptable predictive credibility.

Figure 10: Technology Acceptance Model [ 21]

The main purpose of using the TAM, and other user acceptance measures, within the project is to investigate participants' intention to use the offered Prosperity4All tools and applications in the future. The objective is to sum up the collected data under the U, E, A, and BI indicators (Figure 10) and to derive a weight for each component to be used in predicting actual system use. An exploratory analysis is usually performed to identify how much of the variance in the model is explained by each component.
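As a rough illustration of such an exploratory analysis, the sketch below aggregates questionnaire items into one score per TAM indicator and reports the share of variance in BI that each other indicator explains on its own. It assumes that U, E, A and BI stand for the usual TAM constructs (perceived usefulness, perceived ease of use, attitude, behavioural intention), that items are rated on a 5‐point scale, and that a squared bivariate correlation is an acceptable first look. The data, the names `responses`, `scale_scores`, `pearson_r` and `variance_explained` are all hypothetical, and a regression or structural model would normally be needed to derive actual weights.

```python
from math import sqrt
from statistics import mean

# Hypothetical 5-point ratings: one dict per participant, one list of
# item ratings per TAM indicator (U, E, A, BI). Not project data.
responses = [
    {"U": [5, 4, 5], "E": [4, 4, 3], "A": [5, 5], "BI": [5, 4]},
    {"U": [3, 3, 2], "E": [4, 3, 4], "A": [3, 2], "BI": [2, 3]},
    {"U": [4, 4, 4], "E": [2, 3, 2], "A": [4, 3], "BI": [4, 4]},
    {"U": [2, 1, 2], "E": [3, 2, 2], "A": [2, 2], "BI": [1, 2]},
    {"U": [5, 5, 4], "E": [5, 4, 5], "A": [4, 5], "BI": [5, 5]},
]


def scale_scores(data, scale):
    """Average each participant's item ratings for one indicator."""
    return [mean(participant[scale]) for participant in data]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("an indicator has no variance across participants")
    return sxy / sqrt(sxx * syy)


def variance_explained(data, target="BI", predictors=("U", "E", "A")):
    """Share of variance in the target explained by each predictor alone (r squared)."""
    y = scale_scores(data, target)
    return {p: pearson_r(scale_scores(data, p), y) ** 2 for p in predictors}


explained = variance_explained(responses)
for indicator, share in explained.items():
    print(f"{indicator}: {share:.2f} of the variance in BI")
```

Because each predictor is considered separately, the shares can overlap and do not add up to the variance explained by the full model.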

Evidently, the analysis of data collected about usability, user acceptance and user experience in general falls under a mixed approach. As systems become increasingly complex, all facets of development work, from UI design to evaluation, are characterized by a paradigm shift towards a human‐in‐the‐loop perspective on the development process, mixing automatic, semi‐automatic and manual transformations [ 22]. Therefore, the inclusion of end‐user groups with interchangeable roles not only covers the tails of long tails but also follows the current trend of design and development for active participation in the development process from all interested parties.

The end‐users are the direct users of the Prosperity4All implementations (i.e. user‐facing SP2 tools/resources like Unified Listings, AoD and Marketplace, and SP3 applications and services) and the indirect users (stakeholders), such as teachers, volunteers and service organizations, who are potential matchers between the actual producers and developers and the final consumers, as shown in Figure 3.

The evaluation with end‐users will be carried out in the two last evaluation phases of the project (i.e. 3rd and 4th) and potentially within the final lifecycle evaluation. The SP3 implementations will be applications and services which will be made accessible (and/or whose user experience will be improved), as well as external applications from companies and organizations which are interested in improving their work with the SP2 tools/resources contained in DeveloperSpace.

The end‐users will assess the front‐end of the Prosperity4All services (e.g. AoD of WP205) and implementations, and, therefore, emphasis will be placed on accessibility, user experience, and user acceptance. Usability will be mainly measured for the SP2 User Facing tools developed during the lifespan of the project. The boundaries between these concepts are not so clear and the authors accept a degree of overlap (e.g. usability is part of user experience, but user experience is much more than that, and a product is expected to be usable in order to be acceptable to the user). A fair representation of involved actors will be included, depending on both the type and the purpose of the implementations (e.g. for a health‐related application, involved stakeholders might be healthcare professionals). The evaluation will be mostly formative in the 3rd iteration; summative techniques will be added in the 4th and, if possible, in the final lifecycle assessment (impact assessment and its continuation after the end of the project). The evaluations with end‐users aim to answer the following main questions. The evaluation plans that will follow will be prepared with these questions in mind, but also to answer more elaborate questions about the use of specific SP2 tools/resources for specific SP3 implementations.

1. Are the User Facing (UF) tools useful (utility) and usable (usability) for their purpose?

2. Are UF tools, SP3 and external applications/services accessible?

3. Did we address the accessibility needs of tails‐of‐tails?

The baseline (or reference case) will be an estimation of users’ current way of doing things and will be investigated mostly with qualitative and subjective techniques. Revealing their current experience with various products, applications, and services will most probably not be an easy task. The types of products they use, their context preferences, and the frequency of use will serve as a baseline depicting their current use of applications and services. Users with accessibility needs know pretty well what they want, but sometimes they do not really know what they are missing. Lauesen (2004) points out that we might need to interview both typical and expert users in order to get the whole spectrum of use [ 23]. Thus, we will ask people representing what we call the average user, who give us an impression of the typical user profile, what is not possible for them to do right now and how they normally carry out relevant tasks. By average user, we refer to their computer literacy and not to their needs. Within P4A, the needs of tails and their tails are addressed and there is no average option for their accessibility needs. The expert user will have a broader overview and is knowledgeable about specific rules to apply and available options not commonly used.

In addition, during the final real‐life evaluation (i.e. lifecycle evaluation) real‐life scenarios will be implemented which shall be based on the application scenarios as identified by the work performed within SP1 and, specifically, the scenarios resulting from the demand‐supply chain modelling. There is a possibility that the interchangeability of the roles between implementers and developers will be addressed in the impact assessment, where real life testing will occur. In case a lifecycle evaluation takes place, then the expansion of lifecycle assessment shall accommodate for this endeavor, both research‐ and market‐wise.

Two major axes of user evaluation are considered: user experience and user acceptance. It has been argued that positive user experience and high product acceptance might be related to highly usable, effective, and accessible products. Usability might be a precondition for positive user experience, but often even this is questionable. User experience has a lot to do with value for the user, not only in terms of a given task but in terms of basic human needs. Usability and user experience are also considered in the evaluations through the KPIs (as shown in Table 1).

For those users who will also be developers or implementers, acceptance rates will be gathered from both sides (acceptance as developer and as user might be distinct, but both roles will be played by the same person, so a complete distinction cannot be attained). In most cases the testing plans will differ between participants in tests with implementers and tests with end‐users. In addition, the interchangeability of roles will probably be explored either in the last evaluation phase (i.e. 4th) or in the impact assessment. Investigating interchangeable roles should be feasible during the real‐life testing of the ecosystem, provided that users who can act in both roles are recruited. Therefore, fair representation of developers and implementers with accessibility needs should be attained. This is discussed further in the chapter dedicated to recruitment (Chapter 8).

Field testing corresponds largely to the impact assessment part. Some of the above indicators, and those included in the detailed tables, will also be measured objectively through field testing in the context of the impact assessment (section 6.3), in addition to the subjective measurement presented above. A better mapping will be possible once all indicators of the impact assessment and all the mechanisms that can be built into the ecosystem are known (this requires stable releases of the respective SP1 and SP2 outcomes).

The logical model (Figure 11) for the evaluations with end‐users is similar in structure and purpose to the model presented earlier for the evaluations with implementers. It comprises three main parts: a) the participants, i.e. end‐users and stakeholders, and the implementations they will test (actors and tools), b) the evaluations and number of participants per iteration phase (process), and c) the methods, indicators and measures per evaluation phase (outcomes), including gathering baseline data. Certain assumptions govern the evaluation phases. More could be added based on the specifics of each evaluation plan and the results from the third to the fourth phase. External factors will be considered when interpreting the main findings. The arrows depict the actual interactions and the evaluation flow. The logical model for the evaluation with end‐users can be used in conjunction with the stratification tables available at the WiKi for preparing and developing the evaluation plans and materials required for carrying out the iteration phases with end‐users.

6.2.3 Evaluation context and evaluation conditions

The initial two evaluation phases will focus on developers and implementers, whilst the two later on end‐users. Therefore, context and conditions related differences exist and are inherent in these two evaluation categories.

Context is defined as the environmental characteristics (e.g. remote testing, testing on a PC, tablet, mobile phone or other device, testing in a laboratory, field testing, ethnographic testing, etc.). Context also refers to the local environment of each user group addressed.

Conditions refer to the procedural elements of testing (e.g. whether there are any control conditions, or whether two versions of the same product are compared in A/B testing) and the potential application scenarios for testing.

The application scenarios refer to the scenarios with certain personas created out of the initial use cases prepared within SP1. They are referred to as potential application scenarios, i.e. cases and instances of application with general personas, and not scenarios related to software applications. These scenarios will be useful for creating the tasks and actual testing scenarios for testing with any of these groups. At the time of preparing this deliverable, the final modelling of demand and supply for the involved actors is not yet available. Therefore, these application scenarios are presented as typical examples of the scenarios to be used for testing with users, but also for the impact assessment of using the Prosperity4All platform and the ecosystem. The scenarios were based on the work circulated by the SP1 teams on the functional roles stakeholders might have and the suggested value propositions for attracting them to use the ecosystem and access the platform. The elaborated scenarios will be available in an updated version of the deliverable, which will be submitted in 12 months. The decision to update the deliverable was taken because the final work carried out in SP1 needs to be incorporated, and this will only be available after the submission of this deliverable, as the investigation of the market/world in which P4All will evolve is an ongoing process.

Any preliminary scenarios will facilitate the later development of the testing plan for the first iteration and they will be enriched with updated information and work available at the time of preparing the deliverable for the first iteration phase. These scenarios do not cover all instances; they remain examples of implementations and work to be carried out by users.


Figure 11: A logical model for the evaluations with end‐users


6.2.4 Anticipated limitations

Implementers

The P4A ecosystem is anticipated to be enriched with diverse products after the end of the project. With the number of external products and services attracted by the project, it is not possible to address the majority of tails and tails of tails, but rather a fair representation of them. Therefore, not all possible combinations, or optimal combinations, between SP2 and external implementations will be achieved. The representation that is attained will nevertheless provide a good‐practice approach for the lifecycle evaluation, which could be carried out with automatic metadata mechanisms once the ecosystem is finally deployed, self‐sustained and self‐regulating.

End‐users

The available user‐facing products might not cover the needs of all persons and will not address all the tails of the tails, because that would not be feasible within the lifetime of the project given the complexity and diversity discussed in earlier sections of this deliverable. There is also the possibility that the people who fulfil the requirements might be less interested in the specific products available and more interested in others that are not available yet.

6.2.5 Implementers

This group includes the SP3 implementers and external professionals who might be freelancers or even companies, service providers, and other groups as identified in the list of actors.

Testing with implementers will be primarily carried out in three contexts: a) in their own work environment with real use of components and tools (group assessment, i.e. peer heuristics), b) remote testing (remote data gathering), and c) face‐to‐face qualitative assessment. A considerable part of testing will be carried out in their own environment, gathering mostly qualitative data. Focus groups will be carried out with implementers in small groups (5‐8 participants), which could be organized in parallel with the demo workshops. The focus groups will enrich the data from the interviews held with participants and the rest of the collected data. This is relevant to data triangulation and to filling the “gaps” of other methods of data acquisition. Conditions will vary between phases and among users. Each user might implement a different SP2 outcome for making an SP3 (or other) application or service accessible. During the first phase, implementation will in some cases be emulated, as only non‐functional versions of tools might be available (e.g. mock‐ups, paper prototypes). The exact conditions of testing will be sketched when the final mapping between SP2 outcomes and SP3 apps and services is available, as they are closely related to the nature of use of SP2 tools/resources for SP3 apps and services (i.e. their matching).

Another sample of implementers should be anticipated for the impact assessment. They will probably assess the Prosperity4All platform remotely as part of the impact assessment. These users will interact freely with the platform, and the evaluation will involve a real‐life assessment of the ecosystem, as presented in section 6.3. The minimum number of participants per evaluation phase, as defined by the DoW, is presented in Table 9.

Table 9: Allocation of participants – Implementers

Implementers participating in each evaluation phase (minimum numbers)

| 1st | 2nd | 3rd | Total |
|---|---|---|---|
| 5 SP3 implementers | 5 SP3 implementers and a few external | 15 internal and external implementers (coming from 8 countries including 2 non‐European) | 25 in total |
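The minimum numbers in Table 9 can be kept as a small lookup and checked against recruitment progress. The following is only a sketch: the per‐phase minimums and the total come from the table, while the function name `recruitment_gaps` and the recruited counts are hypothetical, and the "few external" implementers of the second phase are not counted because the table gives no figure for them.

```python
# Minimum number of implementers per evaluation phase, as listed in Table 9.
# The second phase also foresees "a few external" implementers; no number is
# given for them, so only the 5 SP3 implementers are counted here.
MIN_IMPLEMENTERS = {"1st": 5, "2nd": 5, "3rd": 15}


def recruitment_gaps(recruited, minimums=MIN_IMPLEMENTERS):
    """Return, per phase, how many participants are still missing."""
    return {
        phase: max(0, needed - recruited.get(phase, 0))
        for phase, needed in minimums.items()
    }


total_minimum = sum(MIN_IMPLEMENTERS.values())

# Hypothetical recruitment status part-way through the project.
gaps = recruitment_gaps({"1st": 5, "2nd": 3})
```

Here `total_minimum` reproduces the 25 participants stated in the table, and `gaps` shows the phases where recruitment still falls short of the minimum.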

The conditions of testing will be based on application scenarios which will serve the requirements of the evaluation framework. The application scenarios utilize personas (based on identified actors) to present potential use of a tool/resource/application/service. They should not be confused with evaluation scenarios, which are used for testing purposes. Evaluation scenarios will be prepared before each evaluation phase and sketched in the evaluation plans. The following scenarios are based on initial work on identifying the main scenarios for all potentially involved actors, not only implementers. It is envisaged that these scenarios will be elaborated and improved when the relevant work becomes available. The same holds true for the application scenarios for testing with end‐users. SP2 teams are currently working with personas to structure the requirements for the DeveloperSpace. Additionally, discussions with SP1 partners working on the unmet needs of potential actors will facilitate structuring the appropriate scenarios, as these scenarios will subsequently result from the modelling work on the demand and supply chains within Prosperity4All. These personas and generic application scenarios for testing emerged in SP4 audio meetings carried out with partners from most SPs. These discussions are the main reason for using these personas to reach the application scenarios, which will be updated with work from SP1, leading to more enriched and specific application scenarios in the next updated version.

Three potential generic personas and application scenarios are based on initial ideas of how the main actors will interact with the ecosystem, still in a fragmented style, but focusing mainly on the story of who the user of a particular technology is, what they want, and what they know. The application scenarios are therefore usually written in narrative form, perhaps with pictures and illustrations as well. Scenarios are generally written at the beginning of a project, during the discovery and requirements gathering phases. Business case scenarios, on the other hand, are part of detailed product requirement documentation; they will complement the application scenarios included in this deliverable when they become available.


Based on the list of actors, application scenarios in the form of short stories are created for three groups of actors which belong to the producer category (i.e. people who produce products, create apps, improve services, etc.) and have a direct impact on the Prosperity4All developments [i.e. belong to the family of Producer of Things (PoTs) scenarios]. They were created for three different value propositions (i.e. reasons to join the platform). At this stage, scenarios are characterized by the functional role of the stakeholder, the (broad) value proposition, and the family of scenarios they belong to.

The value of these scenarios for the evaluation framework lies in the fact that they provide insight into the many types of actors that could be involved as implementers, the way they can work and collaborate, and the variations in their expertise, knowledge and even areas of interest within accessibility.

6.2.5.1 Examples of generic personas and application scenarios for implementers

The scenarios presented for implementers and end‐users and then expanded for the impact assessment are generic as the final demand‐supply models of the project are not available and their creation was based on the categorization of user groups based on their functional roles (chapter 3) and existing relevant literature [ 24].

The personas include functional elements (e.g. what the identified persona is doing with the implementation) accompanied by a short application scenario for testing purposes. The interdependence of the application scenarios and the initial work carried out in SP1 is evident in both the methodology of this framework (Figure 2) and the sketching of the interdependencies as identified in Figure 14.

Persona 1: Actor –Producer‐Economics: GUI adaptation of route guidance system for visually impaired users (support independent living)

Simon is a developer (Actor ‐Producer – supply‐end of chain) who has been making accessible applications for many years. He has worked in a large company for many years and lately he is interested in navigation support systems for marginalised user groups such as people with visual impairments. He found out about the Prosperity4All multi‐sided platform via a blog for developers which he often visits and whose newsletter he receives. When he visits the developer part of the platform, he is unsure which component of the DeveloperSpace is most appropriate for what he wants to do. He follows the link to the Prosperity4All training platform. He selects the curriculum for external implementers and specifically the course on adapting GUIs for visually impaired users, especially for navigation support software.

He then selects the component for changing the interface of the routing guidance system and makes it available to the platform for users to buy. There is also an option for the user to ask for a specific customisation to be made, to start a bidding process for it in order to fund the relevant customisation and there is an opportunity to hold a discussion with the developer prior to the purchase.


Application of scenario for testing: The implementer will adapt the GUI of the route guidance system for visually impaired users. Testing at early stages of development will be performed together with other low or medium fidelity prototypes. This is related to the end‐user application scenario mentioned in the next section.

Persona 2: Actor – Producer ‐ Law: Making accessible learning materials (Support independent education and work)

Carla (Actor – Producer – supply end of chain) is a freelancer who is currently collaborating with a large public library, aiming to make its digital resources accessible to blind and visually impaired users. She visits the Prosperity4All platform and accesses the part for developers and implementers in order to find relevant resources for her work. The training videos are very helpful and she finds numerous resources about different screen readers and their application to the vast and diverse digital books and information available. The workload is huge, but the resources and tools available at the Prosperity4All platform will assist Carla by saving her time spent looking for methods and tools on the internet and by increasing her potential and knowledge in the accessibility domain.

Application of scenario for testing: The implementer will use the SP2 tool(s) to make digital documents accessible to blind and visually impaired users (e.g. with either one or two screen readers). Testing at early stages of development will be performed together with either low or medium fidelity prototypes. This is relevant to the end‐user application scenario.

Persona 3: Actor ‐ Producer ‐ Ethics: Adaptation of AoD services for older people (support inclusion of lower or no literacy computer users)

Nick (Actor‐Producer‐supply end of chain) works as a developer and IT specialist in a national bank branch. He is also a volunteer at the regional Elderly Centre near his home. He is deeply concerned about older people and their limited or non‐existent digital literacy, and he helps them learn how to use computers. He wants to find a way to help older visitors use the website of the Elderly Centre. He is teaming up with a friend who works as a social worker at the centre, is well aware of the problems older computer users might face, and is himself an enthusiast (i.e. an amateur software designer). A friend informed him about the Prosperity4All platform and the availability of the AoD framework, which can help anyone set up an AoD service easily. Nick considers setting up a technical support service so that older persons facing technical difficulties with their IT are able to ask for help from IT experts during pre‐defined timeslots. Their work aims to increase independent use of computers by the users.

Application of scenario for testing: The implementer will use the SP2 AoD infrastructure to set up an AoD service for technical support. Testing at early stages of development will be with either low or medium fidelity prototypes. This is relevant to the respective end‐user application scenario.


6.2.6 End-users

The end‐users are the people who will use the SP3 and external implementations and the SP2 user‐facing tools. Testing with users from various disability groups and other stakeholder groups (like teachers, educators, volunteers, policy makers who might act as matchers or re‐actors) varies from traditional performance and usability testing to collecting quantifiable data, focus groups and interviews. The main objectives are set by the KPIs in Table 1. The following table presents the connection between the evaluation questions, the KPIs and the logical model for the end‐users (Table 10). Each evaluation question is linked to a specific KPI and according to the logical model there are certain attributes that are primarily connected to both of them.

Table 10: Connecting the evaluation questions and the KPIs with primary indicator (end‐users)

| Evaluation question | KPIs | Primary indicators per evaluation phase |
|---|---|---|
| 2, 3, 5 | 2, 12 | 1st: Utility, user experience. 2nd: all the above, and accessibility, user acceptance, and focused usability (i.e. usability of a specific functionality and/or feature(s)) |

The elements borrowed from performance testing (a form of usability evaluation of a working system under realistic lab conditions) serve mainly to identify whether any problems occur during the session. Data such as success rate, task time and user satisfaction with requirements might be gathered, but mainly for comparing the user’s subjective and perceived experience of an application or service rather than the usability of the actual product. The products have been tested in previous research and evaluation activities, as they are actual products. The interest for the project lies in the additions or changes made to the product within the lifetime of the project. Therefore, any performance measurements will focus on these aspects, i.e. on certain added functionalities and/or features. For example, users will probably be asked to complete a task with a device and/or application, and data will be gathered on site (i.e. as in lab settings).
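The measures named above (success rate, task time, user satisfaction) could be summarised per task along the following lines. This is a minimal sketch with hypothetical data: the `TaskAttempt` record, the `summarise` function and the 1 to 5 satisfaction rating are assumptions made for illustration, not instruments defined by the framework.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskAttempt:
    """One participant's attempt at one task during a lab session."""
    participant: str
    completed: bool
    seconds: float
    satisfaction: int  # hypothetical 1-5 rating given after the task


def summarise(attempts):
    """Compute success rate, mean task time and mean satisfaction for a task."""
    if not attempts:
        raise ValueError("no attempts recorded")
    completed = [a for a in attempts if a.completed]
    return {
        "success_rate": len(completed) / len(attempts),
        # Task time is averaged over successful attempts only.
        "mean_task_time": mean(a.seconds for a in completed) if completed else None,
        "mean_satisfaction": mean(a.satisfaction for a in attempts),
    }


# Hypothetical session data for a single added feature.
attempts = [
    TaskAttempt("P1", True, 40.0, 4),
    TaskAttempt("P2", False, 90.0, 2),
    TaskAttempt("P3", True, 60.0, 5),
]
summary = summarise(attempts)
```

Averaging task time over successful attempts only is one possible convention; abandoned attempts could equally be reported separately.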

Performance measures in the lab are planned both before and after the field tests to identify both issues that occur when users use a product for the first time and problems that persist even after longer periods of using the product. Field tests will be carried out in the final evaluation phase and will largely contribute to impact assessment calculations. A number of users will participate in sessions for testing specific implementations and a number of users will participate in field tests. Field tests are the real tests that will most probably provide data for the impact assessment calculation and simulations.
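The distinction between first‐use issues and persisting problems can be made explicit by comparing the issues logged in the two lab sessions. The sketch below assumes that issues are recorded as short labels; the labels and the function name `classify_issues` are hypothetical.

```python
def classify_issues(before_field_test, after_field_test):
    """Split logged issues by when they were observed in the lab sessions."""
    before, after = set(before_field_test), set(after_field_test)
    return {
        "first_use_only": sorted(before - after),  # resolved with practice
        "persistent": sorted(before & after),      # still present after field use
        "emerged_later": sorted(after - before),   # only seen after field use
    }


# Hypothetical issue labels from the two lab sessions of one participant group.
issues = classify_issues(
    ["cannot find settings", "unclear icon", "font too small"],
    ["font too small", "slow synchronisation"],
)
```

Issues in the `persistent` group are the ones the text singles out as problems that remain after longer periods of use.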

However, in order to provide meaningful results, performance testing to its full extent (before and after field tests) should be applied only when the stage of the prototypes allows it. Also, the tasks on which it is based need to be very specific and clear, and absolutely relevant to both the prototype to be evaluated and the specific user group which will evaluate it. If the maturity of the prototypes does not permit performance testing, only subjective methods should be used.

Testing sessions will be conducted upon specific evaluation scenarios that will be formulated upon the application scenarios emerging from the demand‐supply chains of the project (WP102). In order to approach a demand‐supply chain as realistically as possible, performance testing has to be realized with the participation of all the named actors. This requires very careful planning on the part of the test sites, and a recruitment process that successfully results in a diverse pool of users (encompassing all the users listed in Chapter 3).

In the first iteration with end‐users, sessions will probably be combined with contextual inquiry and naturalistic observation, depending on the characteristics of the implementation and the additions/changes planned to be made within the project. Service diaries and the think‐aloud protocol will be important tools for the facilitators. Different service diary measures are applicable for each key indicator. These measures might include metrics like errors, time to complete and success rates, but also the subjective assessment of the observer (e.g. awareness, loss, frustration).
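The objective service diary measures named above (errors, time to complete, success rates) could be summarised per session along the following lines. This is only a minimal sketch under stated assumptions: the `TaskRecord` structure, its field names and the sample values are hypothetical illustrations and are not part of the Prosperity4All instruments.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRecord:
    """One task attempt noted in a (hypothetical) service diary."""
    participant: str
    completed: bool
    seconds: float
    errors: int


def summarise(records):
    """Return success rate, mean time of completed tasks, and mean errors."""
    if not records:
        raise ValueError("no task records to summarise")
    completed = [r for r in records if r.completed]
    return {
        "success_rate": len(completed) / len(records),
        # Time is averaged over completed attempts only.
        "mean_time_completed": mean(r.seconds for r in completed) if completed else None,
        "mean_errors": mean(r.errors for r in records),
    }


# Illustrative values only, not project data.
records = [
    TaskRecord("P01", True, 95.0, 1),
    TaskRecord("P02", False, 180.0, 4),
    TaskRecord("P03", True, 120.0, 0),
    TaskRecord("P04", True, 85.0, 1),
]
summary = summarise(records)
print(summary)
```

Running the same summary on the sessions before and after the field tests would give a simple basis for comparison; the observer's subjective assessments (e.g. frustration) would still need to be recorded and analysed separately.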

Performance testing upon specific scenarios will be conducted before and after the use of the SP2 resources and tools (a consistent versioning system is necessary in this case).

While performance testing is applicable for both 3rd and 4th evaluation rounds, field testing is mainly applicable for the last evaluation round. Field testing is comparable to usability testing, but here the evaluation is performed in the user’s normal environment. That is, users will use a new product for a certain period in their own environment, doing their own everyday tasks. This is the most realistic type of testing that can be applied, but it requires a certain maturity of tools and applications, which can safely be assumed to be reached only before the last evaluation round. Feedback in this case will be collected through the built‐in feedback and logging mechanisms of the platform that are planned in WP404, since all user types will be motivated to remotely access and use the ecosystem. In reality, field testing with the help of the automatic mechanisms of WP404 will be continuously applied in the context of the lifecycle evaluation, beyond the end of the project.

It should be stressed that while performance testing aims at the human factor evaluation of the User Facing tools and the applications for users, field testing aims at the assessment of the overall ecosystem infrastructure in a realistic context, and is the means for the impact assessment. As such, through field testing, all the human factor indicators can be continuously monitored (through feedback mechanisms) and contribute to the impact assessment of Prosperity4All.

Questionnaires/interviews are applicable for both 3rd and 4th evaluation rounds with end‐users and will be administered in face‐to‐face sessions after performance testing. The same is valid for focus groups, which are mostly applicable for gathering user experience related measures.


Performance testing, as well as the completion of questionnaires, interviews and focus groups with end‐users, will be performed locally in the test sites that have been established in Prosperity4All. The process and mechanisms that will be followed for recruitment, as well as the test site infrastructure, are described in Chapters 8 and 7, respectively.

The numbers of participants per test site across the evaluation phases are shown in the following table (Table 11). Completion of the table is based on “rules‐of‐thumb” for formative and summative evaluation. Formative evaluation is usually carried out with 5 to 8 participants. Summative evaluation ‐ where inferential statistics are sometimes applied ‐ requires a higher number of participants. These numbers will be revisited when preparing the evaluation plans per user group.

Table 11: Allocation of participants – evaluations with end‐users

Participants per site in performance testing, by evaluation round:

Test site            3rd    4th    Total (participants)
TECHNOSITE, Spain    12     28     40
Lifetool, Austria    12     28     40
KIT, Germany         12     28     40
CERTH, Greece        12     28     40
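The per‐site figures in Table 11 can be cross‐checked with a few lines of arithmetic. The sketch below only recomputes totals from the table values; the per‐round and overall totals it derives are simple sums and are not stated in the table itself, and the variable names are illustrative.

```python
# Participants per site as listed in Table 11: (3rd round, 4th round).
ALLOCATION = {
    "TECHNOSITE, Spain": (12, 28),
    "Lifetool, Austria": (12, 28),
    "KIT, Germany": (12, 28),
    "CERTH, Greece": (12, 28),
}

# Total per site across both rounds (should match the "Total" column).
site_totals = {site: third + fourth for site, (third, fourth) in ALLOCATION.items()}

# Derived totals per evaluation round across all four sites.
round_totals = {
    "3rd": sum(third for third, _ in ALLOCATION.values()),
    "4th": sum(fourth for _, fourth in ALLOCATION.values()),
}

# Derived overall number of participants in performance testing.
grand_total = sum(site_totals.values())

print(site_totals, round_totals, grand_total)
```

If the numbers are revisited for the evaluation plans per user group, only the `ALLOCATION` mapping would need to change.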

Apart from testing within the framework of the already agreed testing phases, a number of users should participate in real testing of the platform as part of the impact assessment. The number of users might vary depending on the targets and aims of the impact assessment evaluation plan (D404.1) and the size of the estimated impact (e.g. whether several fragmented impacts will be calculated or an overall real life deployment will provide the numbers behind the impact assessment calculations). Based on the literature, such numbers vary between hundreds and thousands of users for long term impact. The numbers usually also reflect the sensitivity of content and context (e.g. health, medicines, etc.), the running period and the expected impact. For example, if the partners envision a large impact on the existing market, then the platform should run for at least six months and be accessed/used by potentially hundreds of users, probably representing most actors. Nonetheless, such a deployment cannot feasibly be achieved within the lifetime of the project, and, thus, other methods and directions should be investigated and probably followed.

Apart from the representation of other categories of actors in the impact assessment and final lifecycle assessment, each pilot site should recruit a group of stakeholders (5‐8) representing actor groups indirectly related to the ecosystem (e.g. parties interested in consuming information that will be available on the platform). Stakeholders will be introduced to the concept of Prosperity4All and will participate in focus groups with other stakeholders, aiming to elicit productive discussions that lead to new insights about the evaluation and the project and to gather qualitative data and suggestions.


6.2.6.1 Examples of generic personas and application scenarios for end-users

The application scenario examples presented in the previous version can be accompanied by respective application scenarios (i.e. use of implementations by end‐users). The following application scenarios are based on current un‐met needs of user groups as identified within the DoW and the tasks to be carried out in both SP2 and SP3. The actual demand‐supply modelling has not been performed yet and thus these application scenarios are preliminary and generic. Once the Prosperity4All models and respective scenarios have been developed, the application scenarios will be updated and enriched.

Persona 1: Actor ‐ Consumer ‐ Economics: Use of Brian for assisted living

Charlotte is a middle‐aged housewife and cares for her mother Linda who is suffering from age‐related cognitive impairment. She (family carer) and her mother have Brian integrated to their home with the enhanced sensory and monitoring support. The integration of this application allows Charlotte to have a few hours for herself and her mother to feel safer, less agitated and frustrated, and more relaxed when her daughter is away.

Application of scenario for testing: Enhanced SP3 application Brian with the use of SP2 resources will be tested by different end‐user groups, among others (carer and patient; both direct impact end‐user groups). This application will be tested in 3rd and 4th evaluation phases and potentially in the context of a lifecycle assessment. Initial assessment will probably be based on a non‐functional prototype.

Persona 2: Actor ‐ Consumer ‐ Law: Provision for accessible counselling and printing services

Ian is a university professor and he is required to provide teaching, studying, counselling and support services in an accessible digital format. Prosperity4All gives him the opportunity to do it without the necessity of any programming skills. The Prosperity4All implementations will make the digital information and resources accessible to blind and visually impaired users without excluding his students.

Application of scenario for testing: Ian represents a matcher (i.e. a mediator) who will test the accessibility of an SP3 application for his students (e.g. validation of concept, focus group with other stakeholders). There is, of course, an identical application scenario for the blind and visually impaired end‐users. There is a possibility that only prototypes will be available for earlier testing phases.

6.2.7 Reference case (baseline)

Reference case is the baseline which we assume reflects the previous and current state of working (in the case of implementers) and performing relevant tasks (for end‐users).

For implementers, the reference case consists of their current experience, preferences and needs regarding the existing tools that match, or could match, their application and/or service. Furthermore, the expectations of the developers regarding future tool use will be collected. This type of feedback can be collected only qualitatively. Techniques like persona building will be used to consolidate common views between multiple implementers. The most cost‐efficient method is remote workshops/focus groups, complemented by individual telephone interviews for more in‐depth investigation, preferably based on specific questions that would serve as a first guide. Ethnographic testing and studies of current behaviour will be conducted where feasible and affordable.

This collection and documentation of the current view of implementers and developers should be a continuous exercise that would optimally last for quite a long period before the evaluation round, in order to yield sufficient and valuable data. Evaluation questions may be added in an agile fashion, resulting from findings from those pre‐studies. It is especially essential that this exercise takes place before the first evaluation round, although it could also be partially repeated before the subsequent evaluation rounds, given that technology is evolving dynamically and is more than likely to influence the current view and expectations of any actor, in this case the implementers. It will also be used as a communication tool to ensure engagement of all evaluation parties and to support a participatory design process.

In section 10.2, there is an initial set of questions prepared in order to collect and progressively build the above described reference case for the implementers, prior to the first evaluation round. This is a continuous exercise that started at the very beginning of the project, aiming, besides collecting the needs and preferences of the implementers, to engage them in the whole implementation‐evaluation loop in which they should actively participate.

For end‐users, the reference case should respectively consist of what their current experience, needs and preferences are regarding user‐facing tools and applications of different kinds addressing them. Again, this type of feedback can only be collected subjectively through questionnaires, before performance testing.

6.2.8 Variations per evaluation phase

Not all presented methods and metrics for implementers and end‐users will be used in all evaluation phases. The first phases will be mostly formative, in many cases testing mock‐ups and development work with limited functionalities. In addition, there is a possibility that the tools will not all be at the same functional level, and therefore a one‐solution‐for‐all approach would not make sense and, most importantly, would not be effective. For lower fidelity work at each stage ‐ probably in the first two evaluation phases ‐ purely formative evaluation will take place on most occasions.

6.3 Impact assessment

The prosperity of the Prosperity4All ecosystem needs the feedback and interaction of all its potential actors, as already outlined in Chapter 3. Whoever the actor is, there are specific aspects that need to be evaluated, and the ways to collect feedback are in all cases the same. Besides, impact assessment needs to be structured in such a way that it will be feasible over the long term horizon and beyond the Prosperity4All initiative. The impact framework will be elaborated within WP404 and the respective deliverables (D404.1 and D404.2). The impact assessment is discussed in general terms in this deliverable in order to highlight its connection with the other evaluation phases and because it is itself an evaluation, and therefore a part of the evaluation framework.

The following table presents the connection between the evaluation questions, the KPIs and the logical model for the end‐users (Table 12). Each evaluation question is linked to a specific KPI and according to the logical model there are certain attributes that are primarily connected to both of them.

Table 12: Connecting the evaluation questions and the KPIs with primary indicators (impact assessment)

Evaluation question(s)    KPIs                   Primarily related attributes
4, 5                      3, 5, 6, 8, 10, 14     Usage, popularity, diversity, conversion rates, average potential revenue per unit, new user downloads

The frame for the impact assessment is just an early presentation of the methods and tools that could be used and applied (Figure 12). At the time this deliverable is prepared, it is too early to have a finalized and concrete approach and model of the Prosperity4All ecosystem, as this is anticipated at a later stage of the project. A Gestalt approach is adopted as a starting point for creating the framework for the assessment of the impact(s) of the ecosystem: the impact of the whole ecosystem is far from the sum of its separate parts (i.e. the SP2 tools/resources and repositories, the SP3 implementations, and the platform where they will be available and actors can interact with them). No content‐related attributes and high level objectives are listed, as the content and user interface aspects might be added later in the life of the project; they are key categories, but they are defined solely by their own attributes, so adding them to the following table would be of no added value, at least for this first disposition. At some point, the online purchasing/bidding experience will need to be evaluated from both sides (developers/implementers and end‐users) as a universal user experience [easily embedded into online feedback forms: administration of the E‐commerce worksheet and/or SUPR‐Q: usability, credibility (trust, value, comfort), loyalty, appearance] [25].

Figure 12: A logical model for the impact assessment

The evaluation of the indicators and measures resulting from the deployment of the Prosperity4All ecosystem will utilize social networking not only as a means of increasing popularity but also to investigate the role of social proof. Social proof is a psychological phenomenon where people reference the behaviour of others to guide their own behaviour, addressing tails and tails of tails in order to attain an alive and sustainable ecosystem. Additionally, crowdsourcing techniques, feedforward mechanisms and automatic feedback methods will be considered within WP404 ‐ in collaboration with the SP2 and SP3 teams involved in related development work ‐ as an integral part of defining the measures and mechanisms of the ecosystem.

Impact assessment will be carried out with data gathered probably during the last two iteration phases and will use automated metrics and measures. These data will allow for impact calculations, i.e. how the creation and growth of the P4All ecosystem will affect the relevant market and how it meets impact‐related goals. The impact assessment will also assess the potential consequences, positive and negative, of the deployment and use of the P4A ecosystem.

6.3.1 Evaluation context and evaluation conditions

Impact assessment, meaning assessment of the prosperity of the system in all the above aspects, will be facilitated as long as the infrastructure is up and running. The more actors and, specifically, the more external actors, tools and applications are part of the system, the more valuable the feedback will be. It is clear that the context should be as close to real life as possible, which means that no specific tasks should be set for the impact assessment. Only motivations will be given to all kinds of actors, from the 2nd evaluation round on, in order to trigger as many real life interactions as possible.

This process corresponds to the “field testing” stage of the impact assessment, where all types of actors will be motivated to use the platform and freely interact with each other as they need/prefer, remotely, from their own environment, while their feedback will be tracked through the logging and feedback mechanisms that will be built into the platform. The evaluation context will be as realistic as possible, with actors from different countries interacting with the system and performing all possible interactions. The integration of feedback mechanisms will allow for frequent, easy and constant gathering of subjective, self‐reported data from any visitors/subscribers, with the potential to create a rather big and dynamic user network. In addition, platform analytics will be able to provide objective and probably traditional web metrics, which will prove very important for the development of the feedback tools and forms.
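As an illustration of how logged interactions might feed the usage‐related attributes listed in Table 12 (e.g. usage, conversion rates), the following sketch aggregates a small event log. It is only a hedged example: the event names (`visit`, `download`, `feedback`), the log format and the definition of conversion used here are assumptions, since the actual logging mechanisms are to be defined in WP404.

```python
# Hypothetical log entries: (user_id, event_type). Event names are assumptions.
events = [
    ("u1", "visit"), ("u1", "download"),
    ("u2", "visit"),
    ("u3", "visit"), ("u3", "download"), ("u3", "feedback"),
    ("u4", "visit"),
]


def usage_metrics(events):
    """Aggregate (user_id, event_type) pairs into simple usage indicators."""
    users_by_event = {}
    for user_id, event_type in events:
        users_by_event.setdefault(event_type, set()).add(user_id)

    visitors = users_by_event.get("visit", set())
    # Only count downloads and feedback from users who also appear as visitors.
    downloaders = users_by_event.get("download", set()) & visitors
    responders = users_by_event.get("feedback", set()) & visitors

    def rate(subset):
        return len(subset) / len(visitors) if visitors else 0.0

    return {
        "unique_visitors": len(visitors),
        "downloading_users": len(downloaders),
        # Assumed definition: share of visitors with at least one download.
        "conversion_rate": rate(downloaders),
        "feedback_rate": rate(responders),
    }


metrics = usage_metrics(events)
print(metrics)
```

Counting unique users rather than raw events keeps the indicators comparable across actors with very different activity levels; other definitions of conversion are equally possible.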

6.3.2 Examples of personas & generic application scenarios for impact assessment

The application scenarios in their quite generic form presented in the two previous sections for implementers and end‐users serve also for impact assessment. However, two more scenarios are added for impact assessment to emphasize the interchangeability of roles that could be accommodated in impact assessment and the transaction processes relevant to real use of the platform. On purpose one scenario from the implementers and one from end‐users are modified to highlight the extension of the work to be involved in the final real deployment of the platform and the subsequent testing.


Persona 1: Actor – Producer/Consumer‐Law: Making accessible learning materials (Support independent education and work)

Carla (Actor – Producer – supply end of chain) is a freelancer who is currently collaborating with a large public library aiming to make their digital resources accessible to blind and visually impaired users. She is visually impaired and is very familiar with using screen readers. She visits the Prosperity4All platform and accesses the part for developers and implementers in order to find relevant resources for her work. The training videos were very helpful and she found numerous resources about other screen readers and their application to the vast and diverse digital books and information available. The workload is huge, but the resources and tools available on the Prosperity4All platform will assist Carla by saving her time otherwise spent looking for methods and tools on the internet and by increasing her potential and knowledge in the accessibility domain. She used the new tools at her work, but she became so interested in them that she returned to the Prosperity4All platform to select implementations with other resources she has not used before.

Application of scenario for testing: The implementer will access the platform and use the SP2 tool(s) to make digital documents accessible to blind and visually impaired users (e.g. with either one or two screen readers). The same developer is also visually impaired and can access the platform to perform transactions in order to get other (and/or similar) implementations with the resources that she has become familiar with. The user subscribes to the platform, likes the platform on Facebook and shares the link for the resources with friends and collaborators. The user completes his/her feedback and provides suggestions for the addition of other tools he/she came across a couple of weeks ago.

Persona 2: Actor – Consumer ‐ Law: Provision for accessible counselling and printing services

Ian is a university professor and he is required to provide teaching, studying, counselling and support services in an accessible digital format. Prosperity4All gives him the opportunity to do it without the necessity of any programming skills. The Prosperity4All implementations will make the digital information and resources accessible to blind and visually impaired users without excluding his students. He also joined the community to interact with other mediators and developers who have to offer a variety of tools and applications which he had no idea they existed! He shares the link to the platform with other staff at his department and his students.

Application of scenario for testing: Ian represents a matcher (i.e. a mediator) testing the accessibility of an SP3 application used by a number of his students (e.g. validation of concept, focus group with other stakeholders). There is, of course, an identical application scenario for the blind and visually impaired end‐users. All tools will be functional and available on the platform.

6.3.3 Reference case

There are several multi‐sided platforms at the moment offering various products and services but the situation is less overwhelming for the longer of the long tails. There are available spaces for developers and for users but their connection in one platform is an innovative feature to be offered by Prosperity4All. A brief overview is provided in the Introduction (Chapter 1).

The definitive reference case and the elaborated account of the current status quo with regard to similar ecosystems and platforms will be presented in deliverables D101.2 and D404.1. To build the reference case, the current market will be surveyed in order to highlight the unmet needs of the actors envisioned to interact with the ecosystem. In addition, cultural aspects and preferences hold a special place in the P4A ecosystem’s sustainability and longevity. To use a “supermarket analogy”, ten stores of the same supermarket chain might offer different types of products depending on the location of the store.

Therefore, the variations per phase, and as a whole, will most probably reveal an important evaluation dimension for the impact assessment. Some of the cultural aspects will prove more robust than others and, although they differ from one country to another, the underlying reason might be similar. For example, voice assistance might be very important for assistance on demand for older users across countries, but the language and the support required might differ. These can be called robust aspects. There will be others, though, that will prove to be very culture‐specific, and this will make the existing diversity even more complex. These are called culture‐sensitive aspects.

6.3.4 Anticipated limitations

Assessing impact requires established long‐term operation of the ecosystem in order to gather adequate real data and for a reasonable time period in order to reach valid and reliable outcomes that have to some extent generalisability. Therefore, the size of the estimated impact might be as large as anticipated but the sample size (i.e. users freely interacting with the ecosystem) might be restrictive. A way around might be to “borrow” data from the actual iteration phases but this is only feasible with considerable assumptions that bear their own set of limitations. The major limitation is that it might not be feasible to go with one, big overall ecosystem assessment but rather to base it on small, restricted market and economic estimations and then extrapolate from those smaller estimations to calculations of probabilities based on forecasting and simulation methods.

7 Prosperity4All test sites descriptions

Four test sites will be the testing centres for the evaluation phases to be carried out. Tests with developers and implementers will be led by KIT (Germany) and tests with end‐users will be carried out by KIT (Germany), LIFETool (Austria), TECHNOSITE (Spain) and CERTH (Greece).

Remote testing will take place mainly with implementers and external implementers. There should be also consideration for testing with various stakeholder groups in the final lifecycle evaluation assessment, where users from other countries might participate.

7.1 LIFETOOL (Austria)

LIFEtool is a non‐profit research organisation founded by Diakonie Austria and the Austrian Institute of Technology. It is dedicated to research and development in the fields of assistive technology and special needs, and is responsible for the non‐profit consultation network in Austria, Serbia and the Czech Republic. LIFEtool concentrates on the selection, production, distribution and evaluation of innovative information and communication devices. LIFEtool has worked in several successful research projects as partner and coordinator and can draw on broad experience in project and network management as well as in end‐user evaluation. Since 2010, LIFEtool has been active in the international AAL projects HOME.OLD and MOBILE.OLD (coordination), performing trial operations and evaluation at the Austrian pilot site. Whereas HOME.OLD targeted the prevention of isolation and loneliness, as well as the advancement of social interaction, MOBILE.OLD dealt with the maintenance of mobility and activity among elderly citizens. Within both projects, existing ICT services running on TV, smartphone and/or tablet were adapted to meet the requirements of elderly users. In line with its competences, LIFEtool will contribute to improving the user experience as well as ensuring the usability of the services. Specifically, LIFEtool will organize the Austrian pilot, inviting end‐users (recruitment, technical set‐up, trial operations) to its barrier‐free consultation room. This ambience features:

– a cosy homey atmosphere

– wireless as well as cable‐based internet connection (high bandwidth)

– multiplatform devices (Mac OS, iOS, Windows, Android, Linux)

– access to a dazzling array of assistive technology devices

– presentation equipment (beamer, smart board, flipchart, computers)

LIFETool’s interdisciplinary team combines educational, psychological and social expertise with technical and programming know‐how of researchers and developers.

Figure 13: User testing facilities (LIFEtool)

7.2 TECHNOSITE (Spain)

Technosite has vast experience in user testing and in the last decade has worked with a multitude of users and researchers in similar projects. The combination of Technosite and ONCE Foundation makes thousands of users available for recruitment, since ONCE Foundation has headquarters in every Spanish city. ONCE Foundation (FONCE) and Technosite (TECH) have collaborated in similar research studies before and have already built a common basis for conducting evaluation studies (e.g. C4All, ICARUS, CERMICLOUD, APSIS4all, INREDIS, T‐Orienta Project, AEGIS, Monitoring eAccessibility in Europe). The evaluations will take place at ONCE Foundation premises, where numerous other tests have been carried out and accessibility is not an issue. The building is situated in the city centre of Madrid (close to Seveal underground station). Users will receive a small reward as a token of gratitude and will be offered transport to the premises and refreshments; they will be welcomed by staff experienced in accessibility. Technosite intends to test the developed tools from different perspectives. The persons responsible for the pilot have a good knowledge of disability, accessibility and user involvement issues.

7.3 KIT (Germany)

The Study Centre for the Visually Impaired Students (SZS) is a cross‐faculty service and research facility at the Karlsruhe Institute of Technology (KIT). As part of the university, the SZS assists blind and partially sighted students, particularly in the completion of the study courses offered at KIT. Beyond this, strategic components of the work of the SZS include service offerings and research. Testing is an essential part of the research and comprises a broad test field with visually impaired persons (blind and partially sighted). The SZS has access to a broad test user base (also beyond the university boundaries), whose members are regularly involved in testing. Some examples from 2014 are tests for other running EU projects and accessibility testing of software, websites, and iOS or Android apps. Facilities exist that host a range of assistive technologies particularly for this user group. Accessible workstations exist for parallel testing. An extension of those facilities is planned in the near future. As supervisors/conductors, the support centre staff have long‐term experience, on the one hand, in making IT and AT accessible to visually impaired persons and, on the other, in preparing testing guidelines and tutorials for different accessibility studies.

7.4 CERTH (Greece)

The Hellenic Institute of Transport (HIT) is part of the Centre for Research and Technology Hellas (CERTH) which is a non‐profit organization under the auspices of the Ministry of Education and Religious Affairs, Sport and Culture and it is based in Thessaloniki. The institute has extensive experience in carrying out pilot and usability tests with older citizens and disabled users among other user groups in the framework of numerous European projects (e.g. C4All, ASK‐IT, OASIS, SAVE ME, REMOTE, VERITAS, ACCESSIBLE).

CERTH will conduct end‐user tests with participants suffering mainly from visual, motor, hearing and cognitive impairments utilizing the user testing facilities situated at the ground floor of its main building. Testing usually takes place in a specially designed living lab with full accessibility for both motor and visually impaired users, emulating a real home environment (Figure 14) with wireless and cable internet access, cameras (web for remote testing and free standing), accessible workstations, AT technologies, and presentation tools (wide screen TV, beamers, projector, smart board, flipchart).

Figure 14: User testing facilities (CERTH/HIT)

Testing supervisors have extended expertise in organizing, monitoring and conducting tests with disabled users and older people.

8 Participants recruitment

The recruitment of participants refers to the creation of strong and reliable liaisons with the addressed user groups, which is an essential component of the Prosperity4All project. The list of Prosperity4All actors is long and this is the reason for a specific task dedicated to recruitment processes within the project (401.1). Recruitment is realized at both pilot site and project level. Liaisons are an integral part of dissemination activities. The project teams have to attract external implementers and other stakeholder groups. The latter is true especially for the final assessment. The recruitment framework is established and will later be adapted to the requirements of each evaluation phase. Recruitment is also closely related to Ethics management at pilot sites and on project level (i.e. Ethics policy).

8.1 User involvement strategies

An important aspect is establishing a recruitment procedure that is efficient (i.e. users will be happy to participate and to return for further activities, either within this project or others, and not necessarily related to testing). Recruiting and retaining volunteers is an essential process that will guarantee a reliable outcome regarding users' input in all testing activities. For example, it will be beneficial for the project to recruit implementers who have provided feedback for preparing the reference case for the first iteration and then to have the same implementers participate in later stages. The reason is not so much to provide comparative statistical data as to provide rich qualitative data for investigating several work related attributes (e.g. time currently spent on implementations) and attitudes towards the use of the tools (which could also be important). Such gain in richness resulting from follow‐up adds value and reliability to testing and project outcomes.


There are three main recruitment sources (starting from small and moving towards a larger scale of effect) that partners and the project as a whole should utilize:

– Local liaisons with organizations, existing participant database, word‐of‐mouth;

– End‐user project representatives will also use their own liaisons and contacts, with assistance from the dissemination teams, and will collaborate closely with them to take advantage of dissemination activities (SP5) for recruiting participants (e.g. participation in events about accessibility outside the EU might bring in external implementers, and workshops held jointly with other projects or national initiatives bring in knowledge and practices from those projects);

– The extensive Prosperity4All collaborators network already established in the project, in which many companies and institutes have signed the letter of commitment (among others, IBM, Project Possibility and the Burton Blatt Institute). The dissemination and pilot site teams should harmonize their activities and exploit the potential of each member of the network in order to maximize the possibility of including diverse user groups and thus validate and generalize their findings (extremely valuable for impact assessment).

8.1.1 Selecting participants

As the platform will be used by many different types of users (e.g. developers, implementers, people with disabilities, organizations, volunteers, older citizens with low digital literacy), a major aim is to reflect this diversity in the testing procedures. This adds to ecological validity, facilitates the impact assessment and sets a strong foundation for a potential lifecycle assessment to kick off near the end of the project. Selection and exclusion criteria define the recruitment process based on the requirements of the evaluation and the pilot site. Apart from the criteria, certain aspects will be defined separately for each testing category: a) testing with implementers and b) testing with end‐users. The pilot site team will be responsible for applying the agreed criteria for recruitment. Usually a telephone interview suffices to check whether the criteria are met and whether the participant can be recruited to the study. Each pilot site should consider the following when preparing its groups of participants: equal gender representation, different age groups, variety in the disabilities addressed, people from different Socio‐Economic Status (SES) backgrounds, variation in experience with using applications in daily living activities (i.e. differences in ICT literacy), and professionals with and without experience in accessibility.

The latter (differences in experience of working in the accessibility area) is an important objective of the project and should therefore be taken into serious consideration when pilot sites recruit professionals for tests with implementers (either internal or external).

8.2 Basic recruitment steps

For contacting people with disabilities, there are certain guidelines and suggestions (Annex C.3). Researchers and people involved in testing and recruitment should always ask users with accessibility needs how they want to receive and fill in/complete information, and whether they require assistance or anything else.

Each activity that requires the participation of users should follow the steps below for managing user involvement:

First impression: Cognitive economy is a strong aspect of the impression users will form about the study and the test teams. In most cases, users will receive official invitations via the organization of which they are members. An official invitation letter will be prepared and communicated to interested parties and organizations. Test teams should be available for any further queries that the organizations, their members, and other people might have. Such research is often advertised in the media and on social networks. Information about the testing procedure and its specifics is also provided.

Interview via phone/arrange test appointment and check participation criteria: A short telephone interview will confirm that users fulfil the testing requirements and that they agree to participate. Reminder emails are sent if requested.

User compensation: If participants are compensated (e.g. monetary, voucher), the compensation should be proportionate to the research effort. Users should be informed about it during recruitment, prior to participation. The information/invitation letter sent out to organizations, centres and similar stakeholder bodies should provide the necessary information about compensation and voluntary activity, with reference to test duration.

8.3 Prosperity4All Collaborative Network

Assurance that participants from established international institutes and companies will be actively involved in the project (and thus should be officially invited to do so) should be obtained quite early in the project. In close and direct collaboration with the dissemination and management teams, partners involved in Task 401.2 should establish contact with members’ representatives (e.g. media and public relations managers). These organizations are already aware of the aims and objectives of the project; establishing contact will therefore be more effective if the aims of testing are communicated to them as early as possible, so that the recruitment process can start. The organizations that are members of the collaborative network and have signed the letter of commitment include, among others, MADA‐Qatar Assistive Technology Center, Center for Assistive Technology and Environmental Access (CATEA), Georgia Institute of Technology, Trace R&D Center, University of Wisconsin‐Madison, Assist Me Live, Ideal Group, IBM‐Corporate and IBM‐Korea Cloud Services, and The international trade association for consumers Foundation. Starting with such an extensive external pool of implementers ensures that they will bring in their own experiences (e.g. professional, cultural, personal) and will enhance the diversity of the existing sample of participants coming from Austria, Germany, Greece, and UK.


9 Ethical issues

Ethics are very important when carrying out any type of research with human participants. Participants should be respected and protected in all cases, and their needs should be considered before any evaluation phase starts. The supervisor and the facilitators should be experienced in the ethical code of conduct and should apply it in any interaction with users. This chapter mainly addresses ethical issues relevant to the evaluation process (i.e. carrying out tests with human participants, ethical approval by regional committees, ethical regulations in each country acting as a pilot site, and data handling). The Prosperity4All Ethics Manual is part of D502.1; ethical issues for the whole project will be addressed in that document and are not part of the evaluation framework. However, this chapter is relevant to the whole project’s ethical policy, and the two documents should be aligned with regard to their content and perspective.

9.1 Focus of Ethics in Evaluation

The core issue of ethics in evaluation relates to the conduct of tests with all types of users that are foreseen in the context of the project. The major categories of users that will be involved in testing are end‐users of tools and applications (including people with disabilities), developers and implementers. In addition, all types of stakeholders that may have an interest in accessibility services and products (e.g. service providers, carers, governmental organizations with relevant activities) will also be involved.

In this context, it is vital to establish an ethical code of conduct, with which we will comply across all anticipated evaluation phases of the project. The focus of moral responsibility during the pilots is to protect participants. Ethics refer to the correct rules of conduct necessary when carrying out research. The Prosperity4All partners have a moral responsibility to protect research participants from any harm, anticipated or not.

Obviously, the ethical code of conduct for evaluation aligns with the Prosperity4All overall ethics policy, as it will be described in the project’s ethics manual (incl. in D502.1).

Herein, the ethical code of conduct for conducting tests with developers, implementers and end‐users is presented. It is essential to note that this section does not in any way cover ethical issues related to the development work within the project for guaranteeing future products. In other words, the Prosperity4All ecosystem will incorporate services, products, and tools that need to be ethically designed with regard to any barriers anticipated or encountered in the ecosystem and to the protection, security and privacy of personal data (e.g. location, routing, disability type, purchase/account details). Such aspects are discussed in the Ethics Manual.

Ethical considerations will be further specified in the separate testing plan deliverables (subsequent versions of D402.1 and D403.1 for implementers and end‐users, respectively). By then, the evaluation schemes and the specific data collection techniques and procedures will be concrete, and an adopted ethical protocol will probably emerge.

The pilot testing ethical protocol should be connected to the overall ethical and legal code of conduct established for the Prosperity4All ecosystem, as presented in the Ethics Manual incorporated in deliverable D501.2 (updated in M3 and M6). Adherence to the ethics policy is to be reported in the ethical conduct controlling reports (updated on an annual basis; incl. in D501.3, D501.4, D501.5).

The major ethical issues that relate to evaluation are the following:

– Ethics Control and Monitoring

– Informed Consent

– Confidentiality and data protection

– Deception

– Risk assessment, Safety & Insurance

– Withdrawal from a trial

– Reimbursement and incentives for participation

– Accessible facilities and services

– Gender and overall equilibrium

– Debriefing to participants

The way they are handled within Prosperity4All and within the evaluation is presented in Annex C.4. Ethics control refers to both pilot testing and training activities involving human participants. The pilot site manager will work in close collaboration with the ethics responsible person. In a nutshell, ethics representatives from pilot sites will be responsible for ensuring adherence to the Prosperity4All ethical policy and the ethics code of conduct for evaluation, as well as to national and European laws, Directives, guidelines, and moral considerations.

They will also be responsible for supervising ethics‐related procedures, which entail preparing, completing and submitting the ethical application form to the regional/institutional or other relevant ethics body at least one month prior to conducting any test (this includes pre‐pilots, as well as technical verification if the latter involves participants). Time management with respect to ethics application submission might differ based on regional code‐of‐practice and work volume. In case of any issues, ethics responsible partners should communicate their problems to the Prosperity4All Ethics Advisory Board and collaborating partners.

Testing within Prosperity4All abides by both European and National guidelines, as discussed in Annex D. This Annex serves as a compendium of the legislation, guidelines and national rules and restrictions in force in each country serving as a pilot test site.


9.2 Ethics Control during Evaluation Activities

9.2.1 Ethics control for end-user testing at pilot sites

An ethics responsible person will be identified at each pilot site (cf. Table 13), to guarantee that the pilots abide by the ethics code of conduct for evaluation (an inherent part of the overall ethics policy of the project) and by the relevant policy and restrictions posed by the local research ethics committees and other respective national authorized bodies in each case.

In compliance with the ethical code of conduct for evaluation, the local ethics responsible will ensure that the collected/retrieved performance data of the test participants are stored securely and anonymised before use and post‐processing. Each pilot site ethics responsible is also responsible for communicating the Ethics application form for pilot conduction both to the regional and/or governmental bodies, following all the processes stipulated by local/regional/national law, and to the Prosperity4All Ethics Advisory Board. The Ethics responsible person will collaborate with the Ethics Test Site Coordinator and the Ethics Supervisor (Figure 15).

In addition, the Ethics Manual (D501.2) will include the Ethics Controlling Report, which also forms part of all periodic progress reports (D501.3, D501.4, D501.5, D501.6). Besides other types of partners that will complete it for other reasons in the course of the project (e.g. development entities when handling profile data), the test site ethics responsible persons will be obliged to complete it before the tests (in the form of a commitment) and after the tests (in order to prove conformance to their prior commitment and report any deviations from it). The Ethics Advisory Board will be responsible for collecting the forms from each test site and for approving or rejecting their content. Upon receipt of the ethics controlling report from a test site responsible, the Board will be empowered to cancel the tests if it considers that they do not comply with the ethics policy of the project and that there are serious concerns to be addressed beforehand. After the tests, the Board will be responsible for checking the level of conformance to the ethics rules, any deviations, and the reasons for them. On this basis, the Board may issue a warning notice for the next testing iteration or, in the extreme case, may forbid subsequent use of the data collected.

Table 13: Test sites ethics responsible persons

Site Ethics responsible

TECHNOSITE & FONCE, Spain Javier Bazquez

LIFEtool, Austria Stefan Schürz

KIT, Germany Thorsten Schwarz

CERTH, Greece Katerina Touliou


In parallel, the test site responsible persons will have to obtain approval from their local committees (through the Ethics Pilot Application Form they will submit to them; see Annex C.1). Prior to that, they will be obliged to communicate the pilot site ethics application form (i.e. research protocol) to the Prosperity4All Ethics Advisory Board, in order to establish that their research protocol agrees with the project’s ethics policy and objectives before submission to other national ethics‐related bodies. The core ethical issues presented in this chapter should be addressed when compiling the ethical application form, in addition to any country‐specific ethical issues. In short, in order to proceed with the tests, test sites should obtain prior consent from both the Ethics Advisory Board of the project and their local Boards (Figure 15). If either type of approval is not given, the test site will not proceed with the tests until all appropriate modifications have been made to comply with the respective rules and policies. Both types of approval should be received at least 1 month before the kick‐off of the tests (unless another timeframe is specifically stipulated by local regulation). The ethics responsible person for each test site is shown in Table 13.
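The approval rule described above can be expressed as a simple check. The following is a minimal sketch only: the names `SiteApproval` and `may_start_tests` are hypothetical, and "one month" is approximated as 30 days for illustration, whereas local regulation may stipulate a different timeframe.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: "one month" is approximated as 30 days for illustration only.
DEFAULT_LEAD_TIME = timedelta(days=30)


@dataclass
class SiteApproval:
    """Hypothetical record of the two approvals a test site needs."""
    site: str
    board_approved_on: Optional[date]  # Prosperity4All Ethics Advisory Board
    local_approved_on: Optional[date]  # local/regional ethics committee


def may_start_tests(approval: SiteApproval, kickoff: date,
                    lead_time: timedelta = DEFAULT_LEAD_TIME) -> bool:
    """Return True only if both approvals exist and both were received
    at least `lead_time` before the planned kick-off of the tests."""
    dates = (approval.board_approved_on, approval.local_approved_on)
    if any(d is None for d in dates):
        return False  # at least one type of approval is missing
    # The later of the two approvals determines the earliest kick-off.
    return max(dates) <= kickoff - lead_time
```

A site with both approvals dated in early March could, under these assumptions, start testing in mid‐April but not on 1 April; a site lacking either approval could not start at all.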

Figure 15: Ethical issues monitoring process and actors

9.2.2 Ethics control for implementers testing

A potential restriction is the possibility of external implementers participating in remote testing from another country; this is the most important scenario to consider for testing, which is why it is mentioned here. In such a case, the privacy and data protection laws of both countries apply, and the stricter guidelines of the two govern ethics, secure data collection, participant privacy, confidentiality, and anonymity. Usually, the applicable regulation is that of the country where the servers are hosted, or rather the one stated in the imprint on the website. However, hosting the servers and collecting the data at the evaluators’ site is preferred, in order to avoid having to deal with data export regulations.

9.2.3 Ethical issues when users have interchangeable roles in the Prosperity4All ecosystem

In some cases, users will play different roles that are interchangeable in the context of the Prosperity4All ecosystem. Fundamental ethical considerations are not affected by these roles. However, specifics might be affected when users participate as members of one group rather than another. If a user is participating as a developer, then reimbursement might differ (e.g. gift voucher) from the testing sessions where they are considered end‐users (e.g. monetary reimbursement). This is just one example of the secondary issues that might arise when participants play different roles. Such issues will probably be elaborated further later in the project.

10 Planning across evaluation phases

10.1 Core elements of testing plans

Certain aspects are common to all evaluation phases and plans. They are defined as ten necessary steps to be followed:

1. Decide purpose

Identify the objectives of the specific evaluation phase and come up with the appropriate research questions and respective hypotheses to be answered in each evaluation phase. The objectives of the whole evaluation framework were identified and added in the introduction of the deliverable.

2. Identify the primary intended users

Define the target user groups, the samples required for each group and the inclusion/exclusion criteria (if any). The intended users have been initially categorized based on the functional role they might play with regard to using the Prosperity4All platform.

3. Identify content and context of evaluation

Select the appropriate scenarios and application scenarios, as identified in SP1, and prepare the test tasks.

4. Use measures, indicators, metrics and materials

The evaluation material chosen for each evaluation phase will be based on the compilation annexed in this deliverable and the stratification tables as presented for implementers and end‐users. Specific materials (e.g. questionnaires, interview templates) will be prepared in each iteration phase to accommodate the needs of specific evaluation scenarios and tasks.

5. Procedure


The process of carrying out the test session is inevitably the main body of the evaluation plan. It will be described explicitly and in enough detail for all test sites testing the same aspects to be able to harmonise their activities and reach the desired outcome (i.e. to gather reliable and valid data) and, in general, to measure what they are supposed to measure. Successful implementation of the procedural steps entails a well‐managed execution of validation, pre‐tests, and actual tests.

6. Data management and collection

Data will be collected at each site based on templates that will be prepared and evaluated by the evaluation core team. Data are anonymized at the point of collection.

For each evaluation phase, separate templates will be prepared depending on the requirements and needs. An evaluation guide will provide guidance to evaluation team members who will actively be involved in test conduction and data collection.

Automatic feedback mechanisms will also gather anonymized data and will make them available in a format that is easier to work with (e.g. spreadsheet).

7. Synthesis of data from single evaluations ‐ Statistical analysis

Data analysis involves descriptive and inferential statistics. It is common to use descriptive statistics in iterative/formative assessments and inferential statistics in summative testing, where the aim is to quantify the change or improvement. The role of statistics will potentially be more to aggregate and visualize the available data from all pilot sites than to carry out robust statistical tests. However, the actual value of applying statistics will depend mostly on the actual testing plans. There will be various data patterns and, regardless of user group, both numeric and textual data will be collected.

8. Understand causes of outcomes and impacts

At this stage, the results are presented and data visually displayed (e.g. bar chart, word cloud, interactive mapping). The research questions of each evaluation phase will be answered based on the produced outcomes and impacts that have been observed and collected during the specific evaluation phase.

9. Discussion

The last two aspects of the evaluations are related to the inferences derived from the analysis of results. Discussion should not be taken lightly, as it will feed the next step and also the feedback loop back to developers (at least for the formative evaluation).

10. Lessons learnt – next steps

For the final impact assessment, users will be various ecosystem actors who may act as providers and buyers. The commercial element is therefore strong in this evaluation framework, and the evaluation plan for the later stages should include a journey map for buyers in order to assess the purchasing and bidding process.
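Step 7 above anticipates that statistics will serve mainly to aggregate the data available from all pilot sites. A minimal sketch of such a descriptive aggregation is given below; the function name, the `(site, score)` record layout and the choice of summary measures are illustrative assumptions rather than project specifications.

```python
from collections import defaultdict
from statistics import mean, median, pstdev


def summarise_by_site(records):
    """Group anonymised numeric scores by pilot site and return
    descriptive statistics (n, mean, median, population std. dev.).

    `records` is an iterable of (site, score) pairs; this layout is an
    assumption made for illustration.
    """
    by_site = defaultdict(list)
    for site, score in records:
        by_site[site].append(score)
    return {
        site: {
            "n": len(scores),
            "mean": mean(scores),
            "median": median(scores),
            "sd": pstdev(scores),
        }
        for site, scores in by_site.items()
    }
```

The resulting dictionary can be written to a spreadsheet or passed to a plotting tool; inferential tests, where the testing plans call for them, would be a separate step.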


10.2 Towards the 1st Evaluation Phase – Building the Reference Case for the Implementers

The reference case for the implementers needed to be built progressively from the beginning of the project, since the assessment of the selected and integrated tools by this group is the objective of the first evaluation round.

The following questions were put to partners involved in the SP2 and SP3 teams (internal members working in development and implementation teams) and were prepared in collaboration with representatives from all teams. As shown below, the questions are open‐ended and aim to gather as much feedback as possible about the current status of work, with consideration for unmet needs and experiences.

Implementers were interviewed on the following:

– For what AT/UX related areas of your product or development work are you using, or considering using/integrating, external tools and components?

– Which aspects of the design of tools and APIs are particularly important for you?

– What are the most important aspects concerning functionality?

– Are you satisfied with the current tools?

– What aspects are most frustrating / aggravating about them?

– What should the currently available tools (you considered or used) provide that they don’t?

– How do you expect new tools to help you prosper, i.e. to be better off?

– Are you incentivized to use more free (as in speech) vs. free (as in beer) vs. commercial tools? If yes, why?

– Where are you currently looking for new components, tools and frameworks?

– Are the tools accessible?

– Which are the critical parameters for your choice of new tools?

– How much effort (time and cost related) are you willing to spend to integrate new tools and frameworks?

It is important that for the reference case, we avoid using standard scales that will be used later in the HF assessment, since the purpose here is to get as much descriptive information as possible without “guiding” the interviewees.

In a similar way, analytical descriptions of the SP2 tools/resources were provided, covering the following items in each case. These descriptions have allowed a first understanding of how the SP2 tools/resources developers view the potential use of their tools in the context of SP3 and of implementations external to the project.

– Who are the target users of your tool/technology? What motivates them to use it?

– How is your tool/technology used today? Describe some of the applications, commercial or open source, in which it is used.

– Why should anyone outside the project use your tool/technology in the future? What is the specific value brought to your potential users?

– What should be done in order to make sure that your tool/technology will be widely used in commercial projects or products after P4A?

– How will the success of your tool/technology benefit from being part of the GPII?

– What are some of the problems or barriers you have encountered to wider adoption of your tool/technology?

– What are your goals and roadmap for improving your tool/technology over the next few years?

– Why is your tool/technology not already widely used in commercial projects or products? What are the main barriers/problems?

The data collected from these interviews will be transcribed and the content will be analysed in order to highlight the existing conditions for professionals working in relevant disciplines. The derived inferences will provide knowledge about, among other things, which tools are currently used by developers working in accessibility. The content analysis will result in specific themes/topics. These topics will be included in the later iteration phases in order to investigate any changes in participants’ attitudes with regard to the initially identified topics. The exact evaluation materials (e.g. questionnaire) will depend on the following: a) the iteration phase (readiness of tool/technology), b) the development‐implementation connection, and c) the theme/topic identified (for example, practices are easier to reveal than attitudes when using simple question items).

10.3 Mapping between SP2 tools/resources and SP3 Implementations

An early identification of the dimensions and functionalities to be evaluated during the lifetime of the project is presented in Figure 1. Connecting the SP2 tools/resources with the SP3 applications and services, both of which belong to the main categories shown in section 6.2.1, is referred to as mapping the outcomes or expected tools of SP2 to the applications and services of SP3. An initial connection between SP2 developments and SP3 applications and services is available in the Description of Work (DoW), as presented in Figure 16.


Figure 16: Initial mapping and interaction between the different functionalities and the SP2 developments (DoW, p. 63)

However, the decision on which SP2 tools/resources will improve and enhance which SP3 applications and services is part of the SP2 and SP3 development work (still in progress) and will provide the decisive input for identifying the appropriate evaluation methods in each case, as presented in this deliverable. The mapping addressed in each iteration will be included in the respective evaluation plan, which will be based on that mapping as well as on the respective reference case.

11 Training issues

An online training platform will be available for training and informing all participant groups (developers and end‐users), regardless of whether they are internal or external users. It will be developed on the basis of a hybrid educational and pedagogical model that incorporates conventional (face‐to‐face) and e‐learning techniques, with information drawn from the development work available at the time of testing.

Training will be carried out as on‐line tutorials with reciprocal feedback. Trainees will be able to assess the effectiveness and usefulness of the course, as well as their own readiness, learning experience and level. Depending on the evaluation plans for each phase, it should be ensured that appropriate and complete materials are available. Training will be necessary for several purposes that are not only related to the online training courses:


a) Learning to use the tools (for inexperienced developers/implementers), which should be evaluated for its effectiveness per se (Demonstration and Learning Days and face‐to‐face meetings);

b) Familiarisation with SP3 implementations, external applications and services for end‐users, depending on the needs and requirements of the addressed group (e.g. participants with cognitive impairment have an increased need for simplified training, which is carried out shortly before testing takes place);

c) Learning curve estimation prior to testing with regard to the testing materials, which relates to the specific scenarios and tasks to be carried out by implementers and end‐users;

d) Training on the actual testing procedure, if it is decided that this is a necessary step before any testing takes place;

e) Application guidelines provided to stakeholders;

f) Training of supervisors and facilitators in evaluation materials, testing scenarios, applications, services, and others that will be tested.

Training (T502.4) is essential for the evaluation and for successful, valid and reliable data collection, irrespective of data type (i.e. qualitative or quantitative). It is crucial to avoid false positives and false negatives when evaluating the use of SP2 tools/resources and/or SP3 apps. In other words, efficient training and use of the training materials is mandatory to ensure that learning does not become a confounder in later data collection, that is, that learning effects do not enter the evaluation process and distort the actual evaluation results.

The online training programme will be developed with ATutor: Learning Management System (http://atutor.ca/). ATutor is an Open Source Web‐based Learning Management System (LMS) used to develop and deliver online courses. Administrators can install or update ATutor in minutes, develop custom themes to give ATutor a new look, and easily extend its functionality with feature modules (Figure 17). Educators can quickly assemble, package, and redistribute web‐based instructional content, easily import pre‐packaged content, and conduct their courses online. Students learn in an accessible, adaptive, social learning environment. The training courses might share many aspects with traditional classroom courses, but the success criteria should take serious account of online interaction and of the specific learning requirements of each user group.

Training courses and content need to be available at least a month before any testing takes place. Evaluation of training activities and content will be considered for the next iteration phase, and the content will be specific to the requirements of the phase per se. Training scenarios will be prepared in close collaboration with the development and testing teams. They will be based on economic models and the derived application scenarios. They will also facilitate testing; tasks will therefore be available with examples and possibly accompanied by video tutorials. It is therefore important to harmonise the development work, identify user training requirements for this work, update the training content to accommodate users’ needs (e.g. narrator for blind users) and communicate the evaluation plan, including scenarios and tasks, in a timely manner, so that the appropriate information is delivered on time, prior to any testing and before each phase starts.

Figure 17: ATutor courses menu (example)

12 Integrity of Evaluation

The integrity of evaluation is closely related to the identification of risks and threats that could affect the evaluation process, the evaluation objectives, data collection and the results and, subsequently, the inferences drawn from these findings.

A risk protection and mitigation plan addresses potential risks or threats and provides a common step‐by‐step mitigation strategy in case any problems arise. The following table offers a quick and easy way to identify any issues and could be used by all partners at each evaluation phase. The risk mitigation plan and the feedback loop tables will be important communication instruments among the different work teams for the whole implementation process. The feedback loop table is a reporting template with which the evaluation supervisor communicates any problems arising during testing before any analysis is carried out (Table 14). Communicating problems as early as possible is a time‐efficient strategy and helps both sides to communicate issues. Whether and when a problem is fixed depends on the nature of the problem. Some problems are more time‐consuming and more complex than others.

Table 14: Developers/Implementers feedback loop template

Issue | Date | Tools/app version | Description | Allocated partner | Priority* (H/M/L) | Bug/failure/other?
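Where partners keep the feedback loop electronically, one row of Table 14 could be held as a simple record. The following Python sketch is illustrative only: the class name `FeedbackEntry`, the field names, the helper `by_priority` and the allowed values for the last column are assumptions derived from the column headings, not part of the template itself.

```python
from dataclasses import dataclass
from datetime import date

# Allowed values, taken from the column headings of Table 14.
PRIORITIES = ("H", "M", "L")           # High / Medium / Low
KINDS = ("bug", "failure", "other")    # assumed spelling of the last column


@dataclass
class FeedbackEntry:
    """One row of the developers/implementers feedback loop (hypothetical layout)."""
    issue: str
    reported_on: date
    tool_version: str
    description: str
    allocated_partner: str
    priority: str
    kind: str

    def __post_init__(self):
        # Reject values that the template does not foresee.
        if self.priority not in PRIORITIES:
            raise ValueError(f"priority must be one of {PRIORITIES}")
        if self.kind not in KINDS:
            raise ValueError(f"kind must be one of {KINDS}")


def by_priority(entries):
    """Return the entries ordered High, Medium, Low (stable within a level)."""
    return sorted(entries, key=lambda entry: PRIORITIES.index(entry.priority))
```

Sorting by priority is only one possible use; the same records could equally be exported to a shared spreadsheet.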

At the highest level, a distinction can be drawn between general testing issues and group-specific issues. The following table summarises key risks that might arise during evaluation (Table 15). Each potential risk is assigned a Likelihood of Occurrence (LoO) based on the complexity and diversity factors discussed at the beginning of this document. The aim is to provide mitigation strategies for all risks rated Medium/Medium or higher. Two steps were taken to complete the table with the most important risks that could affect the stability and insularity of the evaluation framework.

Step 1: For each risk, assign a High/Medium/Low value for both likelihood of occurrence and potential impact on the project.

Step 2: Develop a mitigation strategy for each High/High, High/Medium and Medium/High risk.

For the low risks there is no mitigation plan, but they will be revisited in the first evaluation plan to ensure that this categorisation was appropriate.
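The two steps can be sketched as a small lookup in Python. The function name `mitigation_requirement` and the returned labels are hypothetical; the High/High, High/Medium, Medium/High and Medium/Medium rules follow the text and the header of Table 15, while treating every other combination as "revisit" is an assumption made for this illustration.

```python
LEVELS = ("H", "M", "L")

# Combinations named in Step 2 (the set is symmetric, so argument order does not matter).
STRONGLY_RECOMMENDED = {("H", "H"), ("H", "M"), ("M", "H")}
# Combination for which Table 15 marks a mitigation plan as "recommended".
RECOMMENDED = {("M", "M")}


def mitigation_requirement(likelihood, impact):
    """Map a likelihood/impact rating pair to the need for a mitigation plan."""
    for value in (likelihood, impact):
        if value not in LEVELS:
            raise ValueError(f"rating must be one of {LEVELS}, got {value!r}")
    pair = (likelihood, impact)
    if pair in STRONGLY_RECOMMENDED:
        return "strongly recommended"
    if pair in RECOMMENDED:
        return "recommended"
    # Assumption: all remaining combinations are handled like the low risks.
    return "revisit in first evaluation plan"
```

For example, a risk rated Medium for likelihood and High for impact falls under "strongly recommended", whereas a Low/Low risk is only revisited in the first evaluation plan.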

Finally, meta-evaluation will be carried out after the end of each evaluation phase by all partners involved in testing, in order to sum up the lessons learnt and to correct or eliminate any false positives or negatives before the next evaluation phase. This process will be useful not only for evaluation, which benefits directly, but also for the development work that continues after each phase. It should be borne in mind that meta-evaluation is closely tied to maintaining a living and efficient feedback loop with developers and implementers, and that it strengthens the collaboration among partners with different roles, not only as identified in the DoW but also as relevant to the final Prosperity4All ecosystem.

Table 15: Evaluation risk and mitigation plan

Risk | Potential impact on evaluation success (L/M/H) | LoO (L/M/H) | Mitigation plan (strongly recommended for H/H, H/M and M/H; recommended for M/M)

Test planning and scheduling problems | H | L | Specific test plans will be available for each phase, and initial scheduling exists in the DoW (revisit in first evaluation plan).

Participant involvement and commitment problems | H (especially for external participants) | L | Many entities have signed letters of support and partners are experienced in user testing. Back-up sample: recruit more participants than required so that anyone who drops out can be replaced.

Monitoring & management of evaluation conduction | H | M | Internal: end-of-day testing meeting to discuss issues, problems, omissions and errors. External: keep common diaries in which evaluation teams report their activities (e.g. Google Docs). If there is no monitoring evidence, request reporting post hoc.

Test organizational & professionalism problems | M | L | Experience in testing and participation in numerous EU projects. Hold an emergency discussion with the responsible partners and management teams and decide upon next steps; ethics and good practice guide any such communication.

Test tools & environments problems | H | M | Ensure that the tools/applications/services to be tested are available. Depending on the tools' expected level of fidelity for the specific evaluation phase, the tools/developments will either not be tested in the specific iteration or will be tested at a later stage.

Test communication problems | M | L | Internal: train all involved members and provide material (align with training activities). External: align activities with other test sites. Create a template or space to discuss issues apart from regularly held meetings. In cases of mistakes and omissions because of lacking or inadequate communication, then

Requirements related problems | M | M | List participants' requirements before testing. Communicate the list of requirements to involved partners from other sites. Check requirements with test leaders. If unmet requirements during testing significantly affect the volume and quality of the gathered data, the responsible partner should conduct additional testing.

Data loss/inadequate data gathering | H | L | Pre-test with facilitators and representatives of user groups. Organise data in both raw and aggregated form, so that the initial form can be restored if problems or errors occur. Keep a testing diary in electronic format. Keep copies and duplicates. If data are lost and the loss affects the results and the quality, reliability and validity of the remaining data gathered for the specific iteration phase, the responsible partner will carry out additional testing to substitute any missing data.

Delays in ethical application submission and acceptance | M | M | Contact the regional Ethics Committee about requirements and the submission process as early as possible. Any delays in ethical application or approval affect testing and timelines. If approval has not been obtained and a delay is feasible for the specific partner, that partner will postpone its testing accordingly.


13 Data handling & Statistical analysis

Traditionally, four steps are taken in order to reach inferences. The first two concern data handling (data gathering and data entry) and the latter two concern statistical analysis (descriptive and inferential). First, data will be gathered at each pilot site.

Confidentiality and data protection (data handling & ethics)

Participants and the data obtained from them (performance or subjective responses) must be kept anonymous unless they give their full consent to do otherwise. In Prosperity4All, however, there is no reason to record names of participants or to map them to test data. The following guidelines should be followed by pilot sites carrying out tests with end-users:

1. Identifiable personal information should be encrypted (i.e. anonymisation and coding). Otherwise ethical approval is necessary specifically for this;

2. Anonymisation is preserved by consistently coding participants with unique identification codes. Only one person at each pilot site will have access to personal identifiers (if any). A Test ID will be issued for each participant; the person at the pilot site who collects and issues the IDs will not have participated in the evaluation and will not have come into contact with the test participants or their performance in the tests;

3. Each individual entrusted with personal information is personally responsible for their decisions about disclosing it;

4. Pilot site managers must take personal responsibility for ensuring that training procedures, supervision, and data security arrangements are sufficient to prevent unauthorised breaches of confidentiality.
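Guidelines 1 and 2 can be illustrated with a short Python sketch in which test data are stored under randomly issued Test IDs, while the key linking IDs to personal identifiers is a separate structure to be held only by the designated person. The function names `issue_test_ids` and `anonymise`, the `P4A` prefix, the ID format and the field names are assumptions made for illustration; the framework does not prescribe them.

```python
import secrets


def issue_test_ids(identifiers, prefix="P4A"):
    """Return {personal identifier: Test ID}, using unique random IDs.

    The returned key is the only link between people and test data and
    should be stored separately from the anonymised records.
    """
    key = {}
    issued = set()
    for person in identifiers:
        code = f"{prefix}-{secrets.token_hex(4).upper()}"
        while code in issued:  # regenerate in the unlikely case of a collision
            code = f"{prefix}-{secrets.token_hex(4).upper()}"
        issued.add(code)
        key[person] = code
    return key


def anonymise(records, key, id_field="name"):
    """Replace the personal identifier in each record with its Test ID."""
    anonymised = []
    for record in records:
        coded = {k: v for k, v in record.items() if k != id_field}
        coded["test_id"] = key[record[id_field]]
        anonymised.append(coded)
    return anonymised
```

Random IDs are used here instead of sequential numbers so that the code itself reveals nothing about recruitment order.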

Coding and storing anonymised data

Information should be anonymised so that individual identities cannot be revealed. Anonymisation provides a safeguard against accidental or mischievous release of confidential information. There are different ways in which personal data can be modified to conceal identities:

Coded information contains information which could readily identify people, but their identity is concealed by coding; the key is held by members of the research team using the information.

Anonymised data with links to personal information is anonymous to the research team that holds it, but contains coded information which could be used to identify people. The key to the code might be held by the custodians of a larger research database.

Unlinked anonymised data contains nothing that has reasonable potential to be used by anyone to identify individuals. Combinations of demographic data that might lead to the identification of individuals or small groups will be avoided (e.g. age, gender, nationality, occupational and Socio-Economic Status (SES), impairment type, address, other contact details). In cases of in-depth qualitative data collection (e.g. ethnographic observations, interviews), where data collection is more complex, potential identifying links in the data will be judged on a case-by-case basis and taken into serious consideration for ethics approval.

Databases containing participants' details will not be maintained after the end of the project unless participants request it (on many occasions participants tell researchers that they would like to take part in other studies). In such cases, participants give written consent to sharing their personal details; this also depends heavily on national laws and guidelines. For the statistical analysis, participants' answers will be associated with their type of impairment (if any) or expertise, age, gender, familiarity with and use of IT and AT, etc. However, every month during the project the anonymised data will be re-sorted randomly to mix the order of participants. Data handling will be carried out only on anonymised datasets, which will be aggregated, consolidated and analysed by the responsible partner.
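The periodic random re-sorting could be as simple as the following Python sketch. It assumes that the anonymised dataset is available as a list of records; the function name `resort_randomly` is hypothetical.

```python
import random


def resort_randomly(records, rng=None):
    """Return a copy of the records in random order, leaving the input untouched."""
    rng = rng or random.Random()
    shuffled = list(records)   # copy, so that the stored dataset is not modified in place
    rng.shuffle(shuffled)
    return shuffled
```

Passing a seeded `random.Random` instance makes the order reproducible for testing; in routine use the default unseeded generator is more appropriate, because the purpose is precisely that the order cannot be traced back.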

Different templates will be prepared for data gathering, based on data type. Additional materials related to data gathering will also be used, such as a meta-data template, i.e. a template that briefly describes the data types collected at each site and any related data that describe and present the procedure. Meta-data templates help analysts to understand the procedures and the nature of the tests conducted at each site. This is very helpful and efficient when the analyst is not the person responsible for the test or is not a member of the team that conducted it.

Separate common templates will be created for each instrument and technique applied. For example, interviews with open-ended questions will be transcribed under main themes and topics for further content analysis, and questionnaires could be made available as electronic forms (e.g. Google Forms). Common templates are essential instruments for harmonised data collection and consolidation of findings. Where different instruments are used for similar attributes but different facets (e.g. usefulness in usability), standardised values will be calculated to provide appropriate descriptive statistics. Data have already been assigned to categories (e.g. subjective and objective, qualitative and quantitative, and their combinations); this provides a first categorisation for further data analysis and for choosing the statistical software used for descriptive or inferential statistics. If further analysis is required, data will be imported either into statistical software (e.g. SPSS) or into qualitative data analysis tools (e.g. NVivo; content/theme analysis). As mentioned at the beginning of this document, calculating Confidence Intervals for certain data types will be of value for benchmarking, for formative purposes and for extrapolating from the data gathered within the lifespan of the project. The latter is of particular significance for the final impact assessment calculations.
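As an illustration of standardised values, the following Python sketch converts the raw scores of one instrument to z-scores, so that results from instruments with different scales can be described on a common scale. Reading "standardised values" as z-scores is an assumption, and the function name `standardise` is hypothetical.

```python
from statistics import mean, stdev


def standardise(scores):
    """Convert raw scores to z-scores: (score - sample mean) / sample standard deviation."""
    if len(scores) < 2:
        raise ValueError("at least two scores are needed")
    centre = mean(scores)
    spread = stdev(scores)     # sample standard deviation (n - 1 in the denominator)
    if spread == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return [(score - centre) / spread for score in scores]
```

Each instrument is standardised separately before the resulting values are compared or combined.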

Evaluation of mature versions will include the estimation of confidence intervals wherever appropriate, in order also to relate findings to marketability and to provide input to the impact assessment calculations. The impact assessment might also "borrow" meaningful aggregated analyses if they are judged to be of considerable value for it.
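A confidence interval for a mean could be estimated along the following lines. This minimal Python sketch assumes a normal approximation; for the small samples typical of user testing, an interval based on the t-distribution would be wider and is usually preferable. The function name `mean_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Return (lower, upper) bounds of a normal-approximation interval for the mean."""
    if len(values) < 2:
        raise ValueError("at least two values are needed")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    centre = mean(values)
    standard_error = stdev(values) / sqrt(len(values))
    # Two-sided critical value of the standard normal distribution.
    critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = critical * standard_error
    return centre - margin, centre + margin
```

The interval narrows as the number of participants grows, which is one reason why the estimates are of most use for the mature versions tested in the later phases.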


14 Practical Organisation of Work

The framework also includes practical information about the distribution of work among the partners involved. The following Gantt chart (Figure 18) presents the work allocation and a timeline for the work to be carried out within SP4 and the teams involved. It also includes activities related to the final demonstrations: although these may not be evaluation activities as such, they relate both to development work and to evaluation status.

This Gantt chart differs from those included in the technical annex. It was prepared specifically for the evaluation framework and presents the distribution of tasks within each of the SP4 WPs among the partners who lead and are involved in these tasks, as well as the duration of each task in actual dates. Partners involved in evaluation can use it as a reference to check where they are expected to contribute and when specific tasks are active. The project months have been replaced by real dates to give easy access to information about required work and deadlines, i.e. when work is expected to finish. The work relevant to SP4 lasts throughout the lifespan of the project. Evaluation planning and execution will be conducted in close collaboration with the other SPs and the management team.

Figure 18: Gantt chart presenting the work allocation and the duration of each task in SP4

15 Conclusions and Next Steps

The Prosperity4All evaluation framework is defined by the driving force to make accessible products-on-demand and to offer a living ecosystem in which all of these will be available. Initially, a zero-approach was adopted, built upon the two main user groups, implementers and end-users, and later extended to all stakeholders expected to interact with the ecosystem. Aspects of user experience and usability research were integrated with business indicators for the holistic evaluation of the ecosystem. Each phase, apart from being the stepping stone for the next, will offer the opportunity to reveal aspects different from those of the previous one. The first iteration will show utility strengths and weaknesses and will probably give a feel of completing the frame of the puzzle. The second iteration phase will be the communication portal for offering the Prosperity4All development work and idea to external implementers; the third phase will be the first with end-users; and the final phase will be a full-scale assessment. Each step within Prosperity4All is not just an iteration for software improvement; each entails its own elements and, thus, variation in planning and execution. Therefore, each evaluation plan will need to reflect validated development work and will stand as an evaluation with technical verification aspects embedded in each process before any testing with users.

The evaluation framework provides the backbone for the development of the evaluation plans of each iteration phase. The next step is the preparation of the evaluation plan for implementers (D402.1), whose main focus is to identify the empirical plan for conducting the first evaluation phase with implementers. Key aspects will be the definition of personas, as numerous actors are identified within SP1; the provision of specific evaluation metrics and instruments; the actual matching between the SP2 tools/resources and the SP3 internal applications and services, which also serves as a basis for the testing protocols and elaborated plans; and, last, building the reference case for developers and implementers.

Finally, this document shall be updated when the demand and supply chain modelling is complete with clear demand‐supply models, and subsequently application scenarios, for all three types of evaluations (i.e. implementers, end‐users, and impact assessment). This deliverable will be updated with new information from SP1 and an updated version will be available in 12 months.

