D4.1 – Requirements methodology

31 AUG 2015 OpenMinTeD

Open Mining INfrastructure for TExt and Data

Deliverable Code: D4.1

Version: 1 – Final

Dissemination level: PUBLIC

This deliverable presents the plan of design and development of the OpenMinTeD infrastructure services for the next 12 months. At months 12 and 24, an update of this deliverable will be produced. It is intended mainly for technical partners, to help them place their software release duties in the wider context of the project's software release, but also for the general reader who wants an overall picture of, and insight into, the technical activities.

H2020-EINFRA-2014-2015/H2020-EINFRA-2014-2

Topic: EINFRA-1-2014

Managing, preserving and computing with big research data

Research & Innovation action Grant Agreement 654021

Ref. Ares(2015)3782098 - 13/09/2015

PUBLIC

D4.1 - Requirements methodology Page 1

Document Description

D4.1 – Requirements methodology

WP4 - Community Driven Requirements and Evaluation

WP participating organizations: ARC, UNIMAN, UKP-TUDA, INRA, EMBL-EBI, AK, LIBER, UvA, OU, EPFL, CNIO, USFD, GESIS, GRNET, Frontiers

Contractual Delivery Date: 07/2015

Actual Delivery Date: 08/2015

Nature: Report

Version: 1 Final

Public Deliverable

Preparation Slip

Name | Organisation | Date

From: Nikolaos Marianos, Charalampos Thanopoulos, Vassilis Protonotarios, Effie Tsiflidou, Thodoris Kontogiannis | AK | 24/07/2015

Edited by: Nikolaos Marianos | AK | 24/07/2015

Reviewed by: Sophie Aubin, Richard Eckart de Castilho, Peter Mutschke | INRA, UKP-TUDA, GESIS | 19/08/2015

Approved by: Mike Chatzopoulos | ARC | 11/09/2015

For delivery: Mike Chatzopoulos | ARC | 11/09/2015

Document Change Record

Issue | Item | Reason for Change | Authors | Organization

V0.1 | Draft version | ToC, Tools and guidelines for partners | Charalampos Thanopoulos, Vassilis Protonotarios | AK

V0.5 | Draft version | Description of methodology | Nikolaos Marianos, Charalampos Thanopoulos, Vassilis Protonotarios, Effie Tsiflidou, Thodoris Kontogiannis | AK

V0.9 | First Delivery | First full version for review | Nikolaos Marianos, Charalampos Thanopoulos | AK

V1.0 | Final version | Final version addressing comments from peer review | Nikolaos Marianos, Vassilis Protonotarios | AK

Table of Contents

1| CONTEXT
1.1 BACKGROUND AND CONTEXT
1.2 USE CASES TO BE STUDIED BY THE PROJECT
1.2.1 SCHOLARLY COMMUNICATION
1.2.2 LIFE SCIENCES
1.2.3 AGRICULTURE / BIODIVERSITY
1.2.4 SOCIAL SCIENCES
1.2.5 SUMMARY OF INITIAL USE CASES ANALYSIS
1.3 OVERVIEW OF PROJECT STAKEHOLDERS
1.4 KEY MILESTONES OF REQUIREMENTS ANALYSIS

2| DESIRED OUTCOMES
2.1 PROFILING USER PERSONAS
2.2 CONTENT ANALYSIS
2.3 INFORMATION-RELATED CHALLENGES AND PROBLEMS
2.4 DESCRIPTION OF INFORMATION SERVICES & SYSTEMS
2.5 DESCRIPTION OF CURRENT USAGE SCENARIOS OF INFORMATION SERVICES & SYSTEMS
2.6 DESIGN INTERFACE WIREFRAMES
2.7 DESCRIPTION OF USER-ENVISAGED USAGE WORKFLOWS
2.8 LIST RESULTING REQUIREMENTS FOR TDM-POWERED OUTCOMES SERVICES

3| METHODOLOGY
3.1 INITIAL PROFILING OF TARGETED PERSONAS
3.1.1 USING ONLINE QUESTIONNAIRES
3.1.2 INTERVIEWS
3.1.3 USING THE PERSONA GRAPH
3.2 VALIDATION OF USERS’ REQUIREMENTS
3.3 DESIGNING FINAL WIREFRAMES WITH REQUIRED FEATURES FOR THE INFORMATION SERVICES OF THE USE CASES
3.4 TRANSLATING FEATURES INTO REQUIREMENTS FOR TEXT MINING SERVICES AND E-INFRA

4| TOOLS AND TEMPLATES
4.1 GENERIC QUESTIONNAIRE FOR USER & CONTENT PROFILING
4.2 EXAMPLE OF CUSTOMIZED QUESTIONNAIRE FOR AGRICULTURAL PERSONAS
4.3 GUIDELINES FOR THE ORGANIZATION OF REQUIREMENTS' WORKSHOPS PER USER PARTNER (FOR PERSONA VALIDATION & BRAINSTORMING)
4.3.1 PRACTICALITIES
4.3.2 INTERACTIVE SESSIONS
4.3.3 MATERIALS REQUIRED FOR THE IMPLEMENTATION OF THE EVENT
4.4 FINAL INTERFACE WIREFRAMES TO REVISIT & REFINE PER USER PARTNER AND INFORMATION SERVICE

5| SCHEDULE OF NEXT STEPS

6| CONCLUSIONS

7| REFERENCES

8| ANNEX A: PERSONA GRAPH TEMPLATE

9| ANNEX B: GENERIC ONLINE QUESTIONNAIRE

10| ANNEX C: ONLINE QUESTIONNAIRE EXAMPLE (AGRIS USE CASE)

11| ANNEX D: INVITATION TO THE WORKSHOP

12| ANNEX E: TEMPLATE FOR THE AGENDA OF A WORKSHOP

13| ANNEX F: SCRIPT FOR RUNNING A WORKSHOP

14| ANNEX G: METHODOLOGY OF THE WORKSHOP PPT TEMPLATE

15| ANNEX H: INTERACTIVE SESSION I PPT TEMPLATE

16| ANNEX I: INTERACTIVE SESSION II PPT TEMPLATE

17| ANNEX J: INTERACTIVE SESSION III PPT TEMPLATE

18| ANNEX K: REPORTING FORM FOR A WORKSHOP

Table of Figures

Figure 1: The envisaged use cases per thematic areas, according to the DoA
Figure 2: Empty Persona Graph with some initial questions to be answered
Figure 3: A Persona Graph produced using markers and a flipchart sheet during the OpenMinTeD kick-off meeting
Figure 4: Example of a completed Persona Graph for the Agriculture / Wheat use case, from the OpenMinTeD kick-off meeting
Figure 5: Identification of content / data requirements for the Wheat Researcher persona, through the Persona Graph
Figure 6: Identification of information challenges for the Wheat Researcher persona, through the Persona Graph
Figure 7: Identification of Solution Features for the Wheat Researcher persona, through the Persona Graph
Figure 8: Wireframe produced for the Agriculture / Wheat use case using a software tool, during the OpenMinTeD kick-off meeting
Figure 9: Process workflow for the Agriculture / Wheat use case, drawn during the OpenMinTeD kick-off meeting
Figure 10: Workflow & Wireframes for the Agriculture / Wheat use case, from the kick-off meeting
Figure 11: Example Drupal-based form for collecting responses: The AGRIS use case
Figure 12: Partial view of the OpenMinTeD generic questionnaire
Figure 13: Information about a persona to be transferred to a Persona Graph
Figure 14: Part of the generic online questionnaire with comments under each question
Figure 15: Part of the AGRIS use case online questionnaire
Figure 16: Requirements Elicitation Gantt
Figure 17: Example of a map guiding participants to the meeting place

List of tables

Table 1: Summary of the requirements extracted from the OpenMinTeD envisaged use cases
Table 2: Key milestones and outcomes timeframe
Table 3: Content sources of interest per thematic area
Table 4: Initial content requirements as Identified During the OpenMinTeD Kick-Off Meeting
Table 5: Information-related Challenges as Identified During the OpenMinTeD Kick-Off Meeting
Table 6: Solution features as Identified During the OpenMinTeD Kick-Off Meeting
Table 7: Important dates

Disclaimer

This document contains a description of the OpenMinTeD project findings, work and products. Certain parts of it might be under partner Intellectual Property Right (IPR) rules, so please contact the consortium head for approval prior to using its content.

In case you believe that this document harms in any way IPR held by you as a person or as a representative of an entity, please do notify us immediately.

The authors of this document have taken every available measure to ensure that its content is accurate, consistent and lawful. However, neither the project consortium as a whole nor the individual partners that implicitly or explicitly participated in the creation and publication of this document accept any responsibility for consequences that might arise from using its content.

This publication has been produced with the assistance of the European Union. The content of this publication is the sole responsibility of the OpenMinTeD consortium and can in no way be taken to reflect the views of the European Union.

The European Union is established in accordance with the Treaty on European Union (Maastricht). There are currently 28 Member States of the Union. It is based on the European Communities and the member states cooperation in the fields of Common Foreign and Security Policy and Justice and Home Affairs. The five main institutions of the European Union are the European Parliament, the Council of Ministers, the European Commission, the Court of Justice and the Court of Auditors. (http://europa.eu.int/)

OpenMinTeD is a project funded by the European Union (Grant Agreement No 654021).

Acronyms

DoA: Description of Action

TDM: Text and Data Mining

UIMA: Unstructured Information Management Architecture

Publishable Summary

This document is the first part of the project’s Work Package 4, titled “Community Driven Requirements and Evaluation”. It provides the methodology for collecting TDM-related requirements from the research communities that have been identified as potential end users of the OpenMinTeD project services.

The first step towards developing the project's text- and data-mining-powered services, as well as its end-user services and e-infrastructure, is to identify the actual needs of the communities that these are meant to serve. This requires a series of steps: identifying and describing the potential end users and communities of interest to the project; extracting, collecting and analyzing requirements related to the expected outcomes of the project; and validating all collected requirements before they are transformed into project outcomes.

In this context, the OpenMinTeD project aims to investigate, record and analyze the actual needs of a wide variety of stakeholders that will drive the requirement analysis and come up with the interoperability specifications and the design and development of the overarching platform, as well as the community services and support structures that will surround this platform. The definition of the functional specifications of OpenMinTeD and all subsequent research and development will be driven by this comprehensive requirement analysis. Its first priority is to bring different communities together and engage them in the process of requirements elicitation in a methodological manner.

As described in the following sections, the requirements will be elicited through an online survey, interviews and events such as workshops. These will engage researchers and users from the participating organizations, which bring together text mining researchers as well as various end user types and communities. The proposed methodology revolves around user profiling and analysis, that is, the identification of the characteristics of different user types. This will be carried out through the persona methodology, which develops user profiles from common characteristics such as educational and research background, behavior patterns, goals and skills. The analysis of each of the four user communities will help to prioritize the identified personas and to select the appropriate ones for problem validation interviews.

An important aspect of this task is also the validation of the collected requirements. This will take place mostly during workshops that ensure the participation of various types of stakeholders. Through interactive sessions, user feedback on content sources, content-related issues, proposed TDM-powered services and related workflows will be collected, and initial designs of these services and workflows will be drawn. As a next step, this feedback will be validated through online surveys and questionnaires, and through interviews held in person or via Skype or phone.

The methodology described in this document aims to provide the means to successfully perform the following activities:

identify typical use cases and applications;

record and chart the profiles of end users/researchers who are involved in or use TDM;

analyze the requirements of the distinct use cases and synthesize them to produce the OpenMinTeD functional specifications;

transform the generic use cases used in the requirements process into concrete and detailed scenarios that will drive domain-specific TDM applications to solve researchers’ problems;

define a validation framework through a set of indicators to validate all aspects of the OpenMinTeD infrastructure.

The means through which the methodology described in this document will be applied are the envisaged use cases, which aim to engage different groups of stakeholders in the elicitation of requirements and, at a later stage, in their validation. These use cases are described in the following sections.

1| CONTEXT

1.1 Background and context

The aim of the OpenMinTeD project is to enable the development of an infrastructure that fosters and facilitates the use of text and data mining technologies in the field of scientific publications but not limited to it, by two main user categories: application domain users and text-mining experts. OpenMinTeD aims to take advantage of existing tools and text mining platforms, facilitating access to them through the appropriate registries, and enabling or enhancing their interoperability through an interoperability layer based on existing standards. OpenMinTeD supports awareness of the benefits and training of text mining users and developers alike and demonstrates the merits of the approach through a number of use cases identified by scholars and experts from different scientific areas, ranging from life sciences (bioinformatics, biochemistry, etc.) to food and agriculture and social sciences and humanities related literature. It brings together different types of stakeholders, including content providers and scientific communities, text mining and infrastructure builders, legal experts, data and computing centres, industrial players and SMEs. Through its infrastructural foresight activities, OpenMinTeD’s vision is to make operational a virtuous cycle in which

a) primary content is accessible through standardized programmatic interfaces and access rules,

b) by well-documented and easily discoverable text mining services and workflows which process, analyze and annotate text to

c) identify patterns and extract new meaningful actionable knowledge, which will be used for

d) structuring, indexing and searching content, and, in tandem,

e) act as a new knowledge resource useful for drawing new relations between content items and firing a new mining cycle.

OpenMinTeD aims to provide a service-oriented infrastructure that will enable search, retrieval, selection and access, remote or local, to all the necessary content, TDM services, appropriately documented with formal metadata, combinable into executable workflows, usable in a range of use cases. The project takes a science-oriented, researcher-centred approach throughout its design and implementation phases, which aim at ensuring that researcher communities’ needs are perfectly addressed, thus maximizing the acceptance and uptake of the infrastructure. To achieve these aims, OpenMinTeD involves researchers from a number of scientific communities ranging from scholarly communication (OpenAIRE, UK/CORE, LIBER, Frontiers), to biochemistry (EMBL-EBI) and neuroinformatics (Human Brain Project), to agriculture (INRA, Agro-know/FAO) and social sciences (GESIS) domains to

(i) gather requirements and chart the respective fields as to TDM usage and practices as well as tools, resources and standards used,

(ii) define prototype applications that serve the corresponding scientific communities via the OpenMinTeD infrastructure, and

(iii) evaluate these applications in relation to the infrastructure.

It is worth noting that, instead of developing yet another infrastructure and set of TDM-powered services, the OpenMinTeD project builds upon existing efforts from text mining partners who have developed their own text-mining systems and platforms. These systems are in production and serve diverse scientific communities (GATE, GATE Cloud and AnnoMarket, Argo and U-Compare, Alvis, DKPro, META-SHARE and its language processing service providing node QT21), abide by architectural standards (UIMA, GATE) or implement their own architectures, align with different types of European infrastructures (OpenAIRE, EGI federated cloud, emerging AAI infrastructure), and provide support to community initiatives (e.g. BioCreative, BioNLP, FOSTER). The project will closely collaborate with the winning proposal of the GARRI (Governance for the Advancement of Responsible Research and Innovation) Call at critical phases to receive input regarding legal and policy aspects, as well as to plan for joint efforts in awareness and community engagement activities.

In order for the project to be able to engage user communities and test the application of these existing TDM-powered services, tools and e-infrastructures, it needs to define use cases that involve the user communities of interest, their content-related challenges and sources and identify potential solutions addressing these challenges. In this context, OpenMinTeD has defined the thematic areas of interest as well as a number of use cases involving different types of users, different content sources and different content-related challenges; in this way, the project will be able to test and propose a variety of solutions adapted to meet the needs of each individual user community, while the aim is to provide an appropriate solution to each type of user identified. These use cases are presented in the next section.

1.2 Use cases to be studied by the project

The OpenMinTeD project has identified and selected a number of use cases that will be studied throughout the lifetime of the project and will provide the means for developing the tools to serve the corresponding research communities. These use cases refer to communities and software platforms that could benefit from the outcomes of the OpenMinTeD project, which will offer innovative applications of text and data mining approaches. These use cases fall under four (4) thematic areas of interest to the OpenMinTeD project:

1. Scholarly Communication;

2. Life Sciences;

3. Agriculture / Biodiversity;

4. Social Sciences.

Each thematic area is expected to include a number of use cases that will be used for the elicitation of user requirements and the development of services aiming to meet the specific needs of each user community. An indicative list of use cases per thematic area is presented in the following figure.

FIGURE 1: THE ENVISAGED USE CASES PER THEMATIC AREAS, ACCORDING TO THE DOA

Each one of these use cases provides preliminary information on requirements that will be studied for the development of the project’s outcomes, in terms of content and data sources, TDM service providers, target users and communities. Brief information about these use cases per thematic area is provided in the following section, based on the project’s DoA. It should be noted that these use cases are subject to revisions, according to the initial outcomes of the methodology described in this document.

1.2.1 Scholarly communication

DESCRIPTION

In the context of Scholarly Communication, there are three (3) envisaged use cases that aim to focus on slightly different aspects:

1. Semantic search and discovery of open scientific outcomes

2. Map of academia – scholarly communication network

3. Research monitoring and analytics

The first use case aims to identify and address issues related to the traditional information retrieval technologies that can usually meet only the basic information retrieval needs of the end users but fail to provide refined search and retrieval options. Using content that has been semantically enriched and well-described, the project aims to provide content and service providers with enhanced semantic metadata extraction mechanisms and at the same time provide semantic search mechanisms.

The second use case aims to create a comprehensive dynamic map of relations in academia between people (such as an open citation index or a co-authorship network), institutions, publications, funding sources, patents and data/software citations. This is expected to be addressed by using scholarly communication infrastructures such as OpenAIRE [1] and CORE [2] to discover the appropriate disambiguation and entity resolution services, configure them as needed (e.g., based on language) and apply them to repository content with the appropriate policies.

[1] https://www.openaire.eu/
[2] http://core.ac.uk/
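
As an illustration of the kind of relation such a map would hold, the following minimal Python sketch derives a weighted co-authorship network from a list of publication records. The records, the field names and the function `coauthorship_edges` are hypothetical and are not part of OpenAIRE, CORE or OpenMinTeD. The sketch also assumes that author names have already been disambiguated, which is precisely the job of the entity resolution services mentioned above.

```python
from collections import Counter
from itertools import combinations

# Made-up publication records; real metadata would come from a repository.
PUBLICATIONS = [
    {"title": "Paper A", "authors": ["Ada", "Ben", "Cleo"]},
    {"title": "Paper B", "authors": ["Ada", "Ben"]},
    {"title": "Paper C", "authors": ["Dan"]},
]


def coauthorship_edges(publications):
    """Count how many publications each pair of authors has written together.

    Returns a Counter mapping an alphabetically ordered (author, author)
    pair to the number of shared publications.
    """
    edges = Counter()
    for publication in publications:
        # Sort and de-duplicate so that each pair has one canonical form.
        authors = sorted(set(publication["authors"]))
        for pair in combinations(authors, 2):
            edges[pair] += 1
    return edges


edges = coauthorship_edges(PUBLICATIONS)
for (first, second), weight in sorted(edges.items()):
    print(first, second, weight)
```

Single-author publications contribute no edges, so an author such as "Dan" would only appear in the map through other relation types (institution, funding source, citations).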

The third use case is focused on publishing (registered and discovered) innovative services for topic modelling (such as the ARC Communication Research Network [3]/OpenAIRE), potentially used by a wider range of stakeholders. These innovative services are expected to be enriched through OpenMinTeD with language detection and NLP mechanisms.

POTENTIAL END-USERS

Among the potential end users of this set of use cases are content (publishers and repositories) and service (OpenAIRE, CORE, Europe PMC) providers that will be engaged in using text-mining services to incorporate semantic metadata extraction mechanisms and provide semantic search mechanisms.

1.2.2 Life Sciences

DESCRIPTION

In the context of Life Sciences, there are two (2) envisaged use cases that aim to focus on slightly different aspects:

1. Text mining assisted curation of the EMBL-EBI chemical databases

2. Text mining assisted curation of the neuroscience resources KnowledgeSpace and NeuroLex

The former aims at implementing a workflow-based application oriented towards curation, demonstrating how it can accelerate extraction and curation of information about small molecules of biological interest (such as natural products, drugs, chemical compounds, etc.) from the available open literature. It will demonstrate how large-scale analysis of the literature (EPMC) can be used to automatically extract chemical structures, chemical properties, biological roles, bioactivities, biological targets and reactions, by leveraging text mining tools already developed by the community and shared tasks (BioNLP, BioCreative and U-Compare).

The latter aims to use the same approach as the former but in a different field. More specifically, it aims to implement a workflow-based application oriented towards curation, demonstrating how it can accelerate extraction and curation of information about neurons, brain regions, anatomical entities and diseases from the open literature. It will demonstrate how large-scale analysis of the literature (EPMC) can be used to automatically extract such information, leveraging text mining tools already developed by the community and shared tasks, and demonstrating how new solutions can be customized, based on the OpenMinTeD infrastructure and its annotation platform using crowdsourcing.

POTENTIAL END-USERS

The potential end users and other stakeholders of these use cases include but are not limited to:

Text mining and NLP developers working on the processing of unstructured data in the domains of life sciences, biomedicine or chemistry. Also developers of building block technologies such as information retrieval, text categorization, named entity recognition, named entity grounding, information extraction, relation mining, development of interactive text mining systems or visualization of text mining results.

Database curators carrying out literature curation, including model organism databases (MOD) like TAIR, MGI, RGD, WormBase, FlyBase, MaizeGDB; functional genome annotation databases (e.g. GOA), proteomics databases (BioGRID, IntAct, MINT), comparative toxicogenomics (CTD).

Experimental biomedical and basic science researchers: (1) to improve the interpretation and design of experimental research by improving the access to previously published information on the studied bio-entities; (2) using literature mining and knowledge discovery software for the generation of new hypotheses that will be experimentally validated.

Clinicians: improved information access for evidence-based clinical practice using text mining technologies and biomedical semantic search engines.

Chemists: systematic access to chemical information (structure associated chemical entities) described in the literature and patents.

Biocurators and Bioinformaticians: text-mining assisted curation results are useful as Gold Standard validation sets for predictive bioinformatics results.

Pharma Industry: drug discovery and target selection, identifying adverse drug effects, competitive intelligence and knowledge management.

Publishers: semantic annotation of online publications, structured digital abstracts.

Authors of scientific papers: author derived annotations (assisted completion of structured digital abstracts).

Patients: improved search engines, especially important for the detection of similar rare disease cases and for personalized medicine (automatic detection of mutation descriptions published in the literature).

Computer scientists: useful training data provided by BioCreative to improve the performance of cutting edge statistical machine learning algorithms and feature selection/exploration.

[3] https://researchdata.ands.org.au/arc-communications-research-network/64707

1.2.3 Agriculture / Biodiversity

DESCRIPTION

In the context of Agriculture and Biodiversity, there are four (4) envisaged use cases that aim to focus on slightly different aspects:

1. Enrich agricultural databases to assist food- and water-borne disease outbreak alerts and product recalls

2. Image, figure and dataset discovery in the AGRIS FAO online service (8M bibliographic resources)

3. Aggregation of various types of data for serving the Wheat researchers’ community

4. Indexing and search of biodiversity information by habitat through the GBIF portal

The first use case aims to provide solutions to the community of food safety officers and agencies, water quality managers and other stakeholders concerned with human health through nutrition, such as the Global Food Safety Partnership (GFSP). OpenMinTeD aims to provide semantic querying, including relations among microorganisms and their locations, together with a graphical display of the parsing results. This will rely on text mining-powered discovery and aggregation of relevant content (e.g., microorganisms, kind of food, food processing, and water origin) from diverse trusted sources, normalized against a shared reference vocabulary.

The second use case plans to support researchers using the AGRIS bibliographic database [4] by providing them with value-added services based on text-mining mechanisms: after each search, these services will preview related images and datasets that are located or mentioned inside the publications. End users are expected to be able to click the preview of a related figure (image, diagram, dataset name) and be redirected to the specific location in the publication to study it in more detail.

The third use case will focus on how text and data mining services may enhance and support further online publication and data search applications. The community of researchers and breeders working on Wheat and other plants is expected to benefit from the project’s outcomes through the automatic linking of different pieces of information from databases and literature. The targeted application will focus on information relating to genetics, regulation and phenotypes with information retrieval and data integration perspectives. Whenever possible, it will rely on Linked Open Data standards and resources and will conform to the RDA Wheat Data Interoperability Working Group [5] recommendations.

The fourth use case plans to support indexing and search of information in biodiversity databases through the GBIF (Global Biodiversity Information Facility) portal. The aim is to improve the search features, currently based on IDs, countries and numerical data (e.g. geolocalisation), by allowing the end user to query the portal with general habitat terms, such as 'aquatic environment', and retrieve all occurrences of species known to live in aquatic environments (e.g. river, lake). Structured habitat information is critical for ecology and genetics studies. Beyond GBIF, the workflow could interest many other stakeholders (e.g. those working on biodiversity, genetics, etc.). The information source for the use case could be the free text of the species occurrence records, especially the habitat slot, and external sources.
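
The habitat query described for this use case can be sketched, under strong simplifying assumptions, as a query expansion step followed by a match against the free-text habitat field. The vocabulary, the occurrence records and the function `search_by_habitat` below are invented for illustration and do not reflect the GBIF data model or API. A real service would rely on a curated habitat vocabulary and on text mining rather than on plain substring matching.

```python
# Hypothetical vocabulary: a general habitat term and its more specific terms.
HABITAT_TERMS = {
    "aquatic environment": ["river", "lake", "pond", "stream"],
}

# Made-up occurrence records; "habitat" stands in for the free-text habitat slot.
OCCURRENCES = [
    {"id": 1, "species": "Salmo trutta", "habitat": "Fast-flowing river with gravel bed"},
    {"id": 2, "species": "Quercus robur", "habitat": "Deciduous forest on clay soil"},
    {"id": 3, "species": "Nymphaea alba", "habitat": "Shallow lake margin"},
]


def search_by_habitat(general_term, occurrences, vocabulary=HABITAT_TERMS):
    """Return the records whose habitat text mentions the general term
    or any of the more specific terms listed for it in the vocabulary."""
    terms = [general_term] + vocabulary.get(general_term, [])
    hits = []
    for record in occurrences:
        text = record["habitat"].lower()
        # Naive substring test: "lake" would also match "flake", so a real
        # system would need tokenization and term normalization.
        if any(term in text for term in terms):
            hits.append(record)
    return hits


for record in search_by_habitat("aquatic environment", OCCURRENCES):
    print(record["id"], record["species"])
```

The point of the sketch is the separation between the user's general term and the specific terms found in the records: the end user never has to know that a given occurrence was recorded as "river" or "lake".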

POTENTIAL END USERS

Food safety officers and agencies, water quality managers

Information managers

Researchers

Breeders

Bioinformatics application providers

Veterinarians, epidemiologists

[4] http://agris.fao.org
[5] https://rd-alliance.org/groups/wheat-data-interoperability-wg.html

1.2.4 Social Sciences

DESCRIPTION

In the context of Social Sciences there are two (2) envisaged use cases that aim to focus on slightly different aspects:

1. Automatic detection, disambiguation and linking of (named) entities in Social Science text corpora to enhance indexing and searching

2. Automatic coding of unstructured answers in surveys

The first use case aims to provide solutions to the problem that social scientists spend considerable time searching for relevant publications and research data. Thus, there is considerable interest in automatic methods for entity recognition and linking to ensure a reliable and context-sensitive retrieval of relevant entities (such as publications, data, persons, institutions, places, references, citations, scientific concepts). This includes reliable recognition and disambiguation of relevant entities (such as persons and vague data citations in scientific publications) as well as linking recognized entities within and across documents. This may also enhance information extraction (such as detection of definitions and hot topics) and the generation of knowledge maps on the basis of linked entities. A further important issue of the use case is to make annotations comprehensively re-usable for the scientific community. This is also relevant for the analysis of communication documented in texts. Tracing scientific discourses over a long period of time may serve as an example of this.

The second use case addresses a well-known problem in survey research. Research data collected by surveys play a central role in Social Science research. Surveys usually contain elements that provide structured answers by participants, but also unstructured elements where participants are encouraged to give answers in free text form. While structured answers can easily be maintained, a recurring problem is to recognize and code unstructured answers according to the code schemas to which the survey under study has to be mapped. Text mining may help to solve this problem.
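
To make the coding problem concrete, the sketch below assigns a code from a small schema to each free-text answer by counting keyword matches. The schema, its keywords and the function `code_answer` are hypothetical and serve only as a baseline illustration. An actual text mining service would more likely use a trained classifier and an established code schema.

```python
import re

# Hypothetical code schema: each code is described by a few keywords.
CODE_SCHEMA = {
    "EMPLOYMENT": ["job", "work", "unemployed", "salary"],
    "HEALTH": ["health", "doctor", "hospital", "illness"],
}

# Label for answers that match no keyword and need manual coding.
UNCODED = "UNCODED"


def code_answer(answer, schema=CODE_SCHEMA):
    """Return the code whose keywords occur most often in the answer.

    Ties are resolved in favour of the code listed first in the schema;
    answers without any keyword match are returned as UNCODED.
    """
    tokens = re.findall(r"[a-z]+", answer.lower())
    counts = {
        code: sum(tokens.count(keyword) for keyword in keywords)
        for code, keywords in schema.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else UNCODED


for answer in ["I lost my job and cannot find work",
               "The hospital is too far away",
               "No comment"]:
    print(code_answer(answer), "<-", answer)
```

Keeping an explicit UNCODED outcome matters in practice: it separates the answers that can be coded automatically from those that still need a human coder.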

POTENTIAL END-USERS

□ Social scientists seeking information relevant to their research

□ Content and service providers

□ Social science researchers doing surveys.

1.2.5 Summary of initial use cases analysis

The following table provides an overview of information that is of interest to the context of the requirements elicitation processes discussed in this document, namely the type of end users, their content-related challenges where the OpenMinTeD project could provide solutions as well as an indicative list of content sources of interest to the users per thematic area.



TABLE 1: SUMMARY OF THE REQUIREMENTS EXTRACTED FROM THE OPENMINTED ENVISAGED USE CASES

Thematic area: Scholarly Communication

Potential end-users:
□ content providers, such as publishers and repositories
□ service providers such as OpenAIRE, CORE and Europe PMC
□ commercial and non-commercial services’ developers, text-miners, scientometricians, researchers and the general public
□ funders and government organisations needing to discover scientific trends and evaluate research impact
□ authors of scientific publications, publishers, institutional managers

Content-related aspects:
□ OpenMinTeD will allow content and service providers to use text-mining services to incorporate semantic metadata extraction mechanisms and provide semantic search mechanisms
□ Information extraction from full-texts of research papers
□ Extraction of citations from full-texts of research papers
□ Content recommendation of related research papers
□ Mining the licence of research papers
□ Supporting scientific knowledge discovery
□ Text-categorisation of papers

Content sources (indicative list):
□ Europe PMC content with attached licensing info, of which some is restrictive
□ Publisher content with restricted licenses, open publisher content with CC licenses (PLOS, Frontiers, Copernicus, etc.)
□ OpenAIRE and CORE repositories content with no specific licences
□ Publishers willing to provide content abiding to the legal specs (Frontiers, PLOS, etc.)
□ Researcher networks (Mendeley, Frontiers)
□ European Patent Office data (www.epo.org) to derive links to scientific publications or supporting data

Thematic area: Life Sciences

Potential end-users:
□ Biocurators
□ Researchers: Clinicians, Chemists, Bioinformaticians, Computer scientists
□ SMEs
□ Pharma Industries

Content-related aspects:
□ Improving performance of TDM methodologies
□ Promote the development of accessible real world TDM applications
□ Explore strategies to integrate TDM results with knowledge bases
□ Define formal and practical solutions for TDM interoperability

Content sources (indicative list):
□ Europe PMC content with attached licensing
□ Publisher content with restricted licenses, open publisher content with CC licenses
□ Frontiers publications and researcher network

Thematic area: Agriculture / Biodiversity

Potential end-users:
□ food safety officers and agencies
□ water quality managers
□ information managers, bioinformaticians, researchers, veterinarians, epidemiologists, breeders

Content-related aspects:
□ facilitate the discovery and aggregation of relevant content from diverse trusted sources
□ support added services based on text-mining mechanisms that will preview, after each search, related images/datasets that are located/mentioned inside the publications
□ automatic linking of different pieces of information obtained through TDM services

Content sources (indicative list):
□ AGRIS bibliographic database
□ agriculture-specific content repositories
□ Foodborne Outbreak Databases
□ ProMED-mail Europe PMC content with attached licensing info
□ GBIF portal
□ Encyclopedia of Life

Thematic area: Social Sciences

Potential end-users:
□ Social scientists seeking information relevant to their research
□ Content and service providers in the Social Sciences
□ Social science researchers doing surveys

Content-related aspects:
□ disambiguation, recognition, linking and context-sensitive retrieval of key entities
□ information extraction from full texts
□ automatic coding of unstructured answers in surveys
□ making annotations comprehensively re-usable for the scientific community

Content sources (indicative list):
□ The Social Sciences Open Access Repository (SSOAR)
□ Proceedings of the Annual Meetings of the German Society for Sociology (DGS)
□ Social media content relevant to Science research

This information can be used as a basis for the next steps of the methodology of eliciting requirements and more specifically for the initial profiling of the personas of interest to the project. Based on this information, persons with characteristics that fit into one of the aforementioned categories will be engaged in the user elicitation activities of the project, as described in the following sections.

1.3 Overview of project stakeholders

The project consortium consists of partners with long experience in the area of text and data mining that will contribute their expertise in activities related to the elicitation of requirements from the potential end users of the project’s outcomes. The OpenMinTeD project partners can fall under six (6) major categories:

1. Research scientific communities, including EMBL-EBI, EPFL, INRA, CNIO, GESIS, AK and LIBER. These communities are characterized as the text-mining service consumers and the corresponding partners are expected to play a key role during the elicitation of infrastructure requirements. These organizations may also be considered as User partners, who adapt the solutions provided by the TDM experts in order to work on TDM-powered services for specific user communities. In addition, they have close connections with the communities that they serve through their added-value services.

2. Text & Data Mining (TDM) experts, including ARC, UNIMAN, UKP-TUDA, USFD, INRA and CNIO, leaders in the text-mining domain, building and maintaining systems that actually serve customers. They are the organizations that use the infrastructure available for their TDM-powered solutions and related infrastructure.

3. e-Infrastructure providers, with expertise in building and supporting e-infrastructures that can be used (and are actually used) for the services to be developed by the project. Project partners ARC, USFD, UKP-TUDA, AK and GRNET are included in this category.

4. Content providers that are providing access to Open Access content. ARC, AK, GESIS and Frontiers are included in this category.

5. Legal experts that share their expertise on legal aspects such as international copyright and intellectual property laws. UvA and ARC are the partners included in this category.

6. Industry: represented by AK, which provides the business perspective contributing to the sustainability and exploitation of the project results, defining appropriate business models for SMEs using the OpenMinTeD infrastructure. AK is also considered a User partner.

This combination of different partner types allows the OpenMinTeD project to provide a complete set of TDM-powered solutions that will meet the needs of a wide variety of stakeholders.

In addition, the project has already identified a number of potential stakeholders that will benefit from the outcomes of the project. These stakeholder groups can be summarized as follows:

□ Small publishers, scholarly societies and repositories that would benefit from content that is more visible and therefore more widely used by researchers.

□ Text mining and language researchers that would benefit from their interaction with and contribution to the development of text mining services and various types of related software resources.

□ Reference tools and social networks used by researchers that will make use of the TDM-powered services of the project.

□ SMEs that will use the OpenMinTeD specifications and services to find/compile services into more advanced or innovative products.

□ Funding bodies that will be able to raise awareness of the need for TDM infrastructure services and policy harmonization.

□ Research Communities, such as ESFRI projects [6], that will be able to integrate TDM and open science awareness into their training material.

The definition of the expected user communities for the project will allow the engagement of targeted users of interest for the project and its expected outcomes.

Of specific interest to the activities described in this document are the partners who are actively involved in research scientific communities. These communities are the text-mining service consumers, and the corresponding partners are expected to play a key role during the elicitation of infrastructure requirements. These project-induced communities will be the ones providing the requirements for designing the TDM-powered solutions of the project, as well as adapting these solutions at a later stage. The project partners that fall into this category are the following:

6 https://ec.europa.eu/research/infrastructures/index_en.cfm?pg=esfri

EMBL-EBI: EMBL-EBI’s Cheminformatics and metabolism team provides the biomedical community with information on metabolism through the development and maintenance of MetaboLights, a metabolomics database and archive and ChEBI, EMBL-EBI’s database and ontology of chemical entities of biological interest.

École Polytechnique Fédérale de Lausanne (EPFL): The Neuroinformatics team at EPFL, associated with the Human Brain Project, the International Neuroinformatics Coordinating Facility and Neuroscience Information Framework, is developing KnowledgeSpace, a community-based semantic wiki for living review articles providing semantically linked data and employing curated and community contributed vocabularies and ontologies.

Institut National de la Recherche Agronomique (INRA): INRA carries out mission-oriented research for high-quality and healthy foods, competitive and sustainable agriculture and a preserved and valorised environment. It produces and enables access to knowledge to the international community of researchers and practitioners in agriculture but also towards policy makers and society.

Agro-Know (AK): Agro-Know is an SME that provides meaningful services on top of open data in the agrifood sector. Agro-Know has long experience in data and metadata management and a strong involvement in various agrifood research communities worldwide. Through a strategic partnership with the Food & Agriculture Organization (FAO) of the United Nations, the Chinese Academy of Agricultural Sciences, and the ARIADNE Foundation, Agro-Know is hosting the Data Processing Unit of the traditional AGRIS service, a global information system providing access to more than 8M bibliographic records, collecting and making accessible bibliography on agricultural science and technology.

Spanish National Cancer Research Centre (CNIO): CNIO is one of the main organizers and initial founding members of the international BioCreative [7] challenge initiative, the main initiative to evaluate and promote the implementation of text mining systems applied to life sciences.

Gesellschaft Sozialwissenschaftlicher Infrastruktureinrichtungen (GESIS): GESIS, the Leibniz-Institute for the Social Sciences is the largest infrastructure institution for the Social Sciences in Germany, offering empirical social researchers services for the various phases of the research data cycle.

Ligue des Bibliothèques Européennes de Recherche – Association of European Research Libraries (LIBER): LIBER is the principal association of the major research libraries of Europe and, along with ARC (through OpenAIRE), represents the scholarly communication domain.

1.4 Key milestones of requirements analysis

In order to work out a concise requirements elicitation plan, the scheduling of the different project activities needs to be taken into account. The following table presents the requirements-related tasks, together with the related milestones and deliverables that need to be delivered during the project and their respective deadlines. A detailed plan is given in Section 5.

7 www.biocreative.org



TABLE 2: KEY MILESTONES AND OUTCOMES TIMEFRAME

Key Milestones Leader Delivery Date

T4.1 Requirements elicitation (M1-14) AK 31/07/2016

D4.1 Requirements methodology AK 31/07/2015

MS14 Interim Requirements Report AK 30/11/2015

T4.2 Requirements analysis and harmonization (M3-16) AK 30/09/2016

MS19 Interim Requirements Analysis Report AK 29/02/2016

D4.2 Community Requirement Analysis Report AK 31/05/2016

D4.3 OpenMinTeD functional specifications ARC 31/07/2016
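The task durations in Table 2 are given in project months (e.g. M1-14), while the deadlines are calendar dates. The sketch below shows the arithmetic that relates the two. It assumes that month 1 of the project is June 2015, which is inferred from the table itself (M14 ends on 31/07/2016) and is not stated explicitly in this section; the function name is illustrative.

```python
import calendar
from datetime import date

# Assumption inferred from Table 2: project month 1 (M1) is June 2015.
PROJECT_START = (2015, 6)


def end_of_project_month(m, start=PROJECT_START):
    """Return the last calendar day of project month m (M1 is the start month)."""
    year, month = start
    index = (month - 1) + (m - 1)  # months counted from January of the start year
    year += index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


print(end_of_project_month(14))  # end of T4.1 (M1-14)
print(end_of_project_month(16))  # end of T4.2 (M3-16)
```

Under this assumption the computed dates match the deadlines listed for T4.1 and T4.2 in the table.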



2| DESIRED OUTCOMES

The methodology described in this document aims to provide the means for extracting information from potential stakeholders of the project, as they have been already identified. The requirements extracted from these stakeholders will be used for shaping the expected outcomes of the project in the form of e-infrastructure, text and data mining services as well as TDM-powered services for the end users. This chapter aims to provide a description of what each one of these expected outcomes should look like, facilitating the work of the project partners responsible for this task.

2.1 Profiling user personas

The first step in the proposed methodology is the identification of the different user profiles (personas) of interest to the OpenMinTeD project. The project has already identified a number of user groups that are expected to benefit from the envisaged services of the project. These groups are the following:

□ Researchers: people actively involved in (i.e. conducting) scientific research.

□ Curators: usually the content specialists who are responsible for an institution's collections (digital and analogue).

□ Text Miners: scientists working on deriving high-quality information from text.

□ Service Developers: software developers working on the development of services and the integration of the text- and data-mining services.

□ Publishers: entities (persons or organizations) that produce and distribute publications such as journals and books in printed or digital form.

□ People involved in Libraries and Databases, such as librarians, information scientists, knowledge and information managers, content and repository managers etc.

□ Research Communities that engage researchers working on topics of common interest.

□ SMEs, referring to companies with a limited number of employees that are working on topics of interest to the OpenMinTeD project.

Each one of these groups consists of one or more user types, such as those described in the previous chapter. A persona is a fictional character created to represent a user type that may use a service; more precisely, it is a representation of the behavior of a hypothesized but validated group of users. The profile of a persona is compiled from information extracted through interviews with real users; this information is collected, evaluated and validated before it is merged with related information from interviews with other users with similar characteristics. This compilation of information is used for developing the profile of a persona that includes behavior patterns, skills, attitudes, and other attributes. The use of personas is common in workflows related to product development and validation, as it allows the design of a product based on the needs assigned to a specific persona.

In the case of OpenMinTeD, the following attributes are used for the development of a persona:

□ Persona description and demographics;

□ Data types of interest to each persona as well as data types already being used (own data or data from external sources) by the specific persona;

□ Top challenges (data-related issues) faced by each persona regarding improving access to research data;

□ Features of the envisaged solutions for the specific community.

These different types of information are analysed in the following sections.

For the recording of these requirements, a template called Persona Graph is used. This allows the uniform description of different personas by different project partners involved in this process. The following figure presents an example of an empty Persona Graph with some initial questions to be answered.

FIGURE 2: EMPTY PERSONA GRAPH WITH SOME INITIAL QUESTIONS TO BE ANSWERED

The Persona Graph is based on an adapted version of the Lean Canvas, as it was presented by Maurya (2012). The Lean Canvas is a template that aims to include information related to the development of a new product and its business model. It focuses on addressing broad customer problems and solutions and delivering them to customer segments through a unique value proposition. The version used in the requirements’ elicitation methodology is adapted in a way that allows the collection of information focused on the project’s expected developments, without the need of collecting additional information or going into too many details that would not be substantial for the project.

A Persona Graph contains information that is collected through the use of the online questionnaire, interviews or through the focused workshops, as described in the next sections. The first version of the Persona Graph is usually hand-drawn, using markers and paper, as shown in the following figure.

FIGURE 3: A PERSONA GRAPH PRODUCED USING MARKERS AND A FLIPCHART SHEET DURING THE OPENMINTED KICK-OFF MEETING

The prioritization of the information recorded in each Persona Graph, based on the importance that users assign to each statement, is of major importance. Accordingly, in the boxes for Data Requirements, Information Challenges and Solution Features, the statements that users consider most important should be recorded at the top of the list and the least important at the bottom.
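As an illustration of the Persona Graph as a data structure, the following minimal sketch records statements per box and returns them in priority order. The class and field names are hypothetical, and the importance scores are invented for the example; only the statement texts are taken from the wheat researcher persona of the kick-off meeting.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    text: str
    importance: int  # hypothetical score assigned by users; higher = more important


@dataclass
class PersonaGraph:
    persona: str
    description: str = ""
    data_requirements: list = field(default_factory=list)
    information_challenges: list = field(default_factory=list)
    solution_features: list = field(default_factory=list)

    def prioritized(self, box):
        """Return the statement texts of one box, most important first."""
        ranked = sorted(getattr(self, box), key=lambda s: s.importance, reverse=True)
        return [s.text for s in ranked]


graph = PersonaGraph(persona="Wheat researcher")
graph.information_challenges = [
    Statement("Info expressed in many ways, dispersed in silos", importance=2),
    Statement("Raw data & publications are always separate", importance=3),
    Statement("No user-friendly tools to search and visualize info", importance=1),
]
top_down = graph.prioritized("information_challenges")
print(top_down)
```

Keeping an explicit score next to each statement, rather than relying on the order in which statements were written down, makes it easier to merge graphs filled in by different partners.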

The Persona Graph template is provided in the Annex of this document. The following figure shows how the information collected during the workshop was transferred from the handwritten form to the template.



FIGURE 4: EXAMPLE OF A COMPLETED PERSONA GRAPH FOR THE AGRICULTURE / WHEAT USE CASE, FROM THE OPENMINTED KICK-OFF MEETING

2.2 Content analysis

Content analysis is the second step of the proposed workflow. It aims to identify, record and analyze the characteristics of the content of interest, as well as the content already in use, for the personas identified and studied in this task. The following parameters are included in the content analysis:

□ Content type (e.g. publications, datasets, presentations, maps etc.);

□ Content format (e.g. PDF/DOC files, PPTs, ZIP files etc.);

□ Content volume (number of resources or size in MB/GB);

□ Content source (database, repository, website or web portal etc.) and interoperability options (e.g. OAI-PMH, API, RSS, SPARQL endpoint etc.);

□ Content language (referring to the language of the content and its associated metadata);

□ Intellectual Property Rights (IPR) and licensing information, referring to use, reuse, adapt and redistribute, among others.



The detailed analysis of the content identified through this process will allow the optimization of the project’s services so that they will meet the specific requirements of specific communities making use of specific content.
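The content analysis parameters can be recorded in a uniform structure so that sources described by different partners are comparable. The sketch below is one possible representation; the class name, field names and all example values are hypothetical and do not describe real repositories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentSource:
    name: str
    content_type: str        # e.g. publications, datasets
    content_format: str      # e.g. PDF, ZIP
    volume: int              # number of resources
    interoperability: tuple  # e.g. ("OAI-PMH", "API")
    language: str
    licence: str


# Hypothetical example records, for illustration only.
sources = [
    ContentSource("Repository A", "publications", "PDF", 10000, ("OAI-PMH",), "en", "CC BY"),
    ContentSource("Database B", "datasets", "ZIP", 250, ("API", "SPARQL endpoint"), "en", "restricted"),
    ContentSource("Portal C", "publications", "PDF", 4000, ("OAI-PMH", "RSS"), "de", "CC BY"),
]


def supporting(protocol, items):
    """Names of the sources that expose the given interoperability option."""
    return [s.name for s in items if protocol in s.interoperability]


def volume_by_licence(items):
    """Total number of resources per licence."""
    totals = {}
    for s in items:
        totals[s.licence] = totals.get(s.licence, 0) + s.volume
    return totals


print(supporting("OAI-PMH", sources))
print(volume_by_licence(sources))
```

Summaries of this kind show at a glance how much content is reachable through a given protocol and under which licensing conditions.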

FIGURE 5: IDENTIFICATION OF CONTENT / DATA REQUIREMENTS FOR THE WHEAT RESEARCHER PERSONA, THROUGH THE PERSONA GRAPH

The results of the preliminary analysis so far, through the description of the envisaged use cases, show that the vast majority of content of interest for the purposes of the project is indeed in the form of text. The following table provides information on the content sources for each one of the thematic areas of interest to the OpenMinTeD project.

TABLE 3: CONTENT SOURCES OF INTEREST PER THEMATIC AREA

Thematic area: Scholarly Communication
Content sources (indicative list):
□ Europe PMC content with attached licensing info, of which some is restrictive
□ Publisher content with restricted licenses, open publisher content with CC licenses (PLOS, Frontiers, Copernicus, etc.)
□ OpenAIRE and CORE repositories content with no specific licences
□ Publishers willing to provide content abiding to the legal specs (Frontiers, PLOS, etc.)
□ Researcher networks (Mendeley, Frontiers)
□ European Patent Office data (www.epo.org) to derive links to scientific publications or supporting data

Thematic area: Life Sciences
Content sources (indicative list):
□ Europe PMC content with attached licensing
□ Publisher content with restricted licenses
□ Open publisher content with CC licenses
□ Frontiers publications and researcher network

Thematic area: Agriculture / Biodiversity
Content sources (indicative list):
□ Foodborne Outbreak Databases
□ ProMED-mail Europe PMC content with attached licensing info
□ GBIF database
□ FAO AGRIS

Thematic area: Social Sciences
Content sources (indicative list):
□ The Social Sciences Open Access Repository (SSOAR)
□ LeibnizOpen: OA publications of Leibniz institutions researchers
□ Proceedings of the Annual Meetings of the German Society for Sociology (DGS)

Additional content sources and types will be identified through the profiling of the user personas using the methodology described in this document.

During the OpenMinTeD kick-off meeting, the outcomes of three of the envisaged use cases in terms of content / data requirements were collected and presented. These requirements are presented in the following table.

TABLE 4: INITIAL CONTENT REQUIREMENTS AS IDENTIFIED DURING THE OPENMINTED KICK-OFF MEETING

Use case: Biocreation
Content requirements:
□ Access full text biomedical literature (also pre-publication)
□ Linked information (interdisciplinary)
□ Exploration tools to navigate & discover content from different sources
□ Access to other domain experts
□ Community sharing / Social tools

Use case: Scholarly Communication
Content requirements:
□ Primary sources
□ Body
□ Articles
□ Dissertations
□ Multimedia
□ Metadata
□ Licenses

Use case: Agriculture / Wheat
Content requirements:
□ Needs to link raw data with publications (purpose: find similar/complementary info in both types of sources)
□ Needs knowledge about previous work on wheat phenotype and environmental info, including traits of plants (e.g. disease resistance), varieties, culture conditions (cultivation) and genetic info
□ Needs complementary info / surrounding info on related topics that affect wheat yields beyond her own expertise

Use case: Social Sciences
Content requirements: N/A

It should be noted that during the OpenMinTeD kick-off meeting, the use cases of Social Sciences were not introduced.

2.3 Information-related challenges and problems

One of the aspects of the definition of each persona and the identification of its attributes is the identification of information- and content-related challenges and problems that a specific persona is facing. These problems may have to do with identifying, retrieving, accessing and managing the content of interest, among others.

The identification of these content-related issues will allow the project partners to work on services that will address these issues and provide meaningful, TDM-powered solutions. These solutions may also be proposed by participants of the project’s user requirements’ workshops or through interviews.

When recording the information in the corresponding box of the Persona Graph, it is important to indicate the importance of each entry by prioritizing the entries in the list. The statement that users consider most important should be listed at the top of the list and the least important one at the bottom.



FIGURE 6: IDENTIFICATION OF INFORMATION CHALLENGES FOR THE WHEAT RESEARCHER PERSONA, THROUGH THE PERSONA GRAPH

TABLE 5: INFORMATION-RELATED CHALLENGES AS IDENTIFIED DURING THE OPENMINTED KICK-OFF MEETING

Thematic area: Scholarly Communication
Information-related challenges (indicative list):
□ Assigning & curating metadata
□ How to describe license
□ How to define the link between text and metadata
□ Figure out licensing
□ Technology updates
□ Exposing metadata to many aggregators
□ Too many standards
□ Create relationship to publishers
□ Visibility

Thematic area: Life Sciences
Information-related challenges (indicative list):
□ Literature is copyrighted; needs clear indicators on the copyrights of papers
□ Information scattered throughout different papers & different sources
□ Extracting different dimensions of information
□ Knowledge hidden in tables, figures and supplementary data
□ Filter different types of mentions of common entities to create new use cases

Thematic area: Agriculture / Wheat
Information-related challenges (indicative list):
□ Raw data & publications are always separate
□ Info expressed in many ways, dispersed in silos
□ Challenge is to identify, to normalize and integrate it in existing knowledge
□ No user-friendly tools to search, visualize and check/correct accuracy of info
□ Has the necessary subscriptions but no way / not allowed to download and share the publications. No software / legal barriers
□ Bioinformaticians are not text mining people; are not interested
□ Proper use of TDM would save time and money
□ Needs to filter non-relevant journals / wheat varieties that are commercialized in her country

Thematic area: Social Sciences
Information-related challenges (indicative list): N/A


2.4 Description of information services & systems

The OpenMinTeD project has compiled a list of available information services and systems that will serve as use cases and will be enhanced through the integration of the OpenMinTeD services. Additional ones are expected to be identified during the user requirements elicitation. Both existing and new information services and systems need to be described in detail by the expected end users of the project’s outcomes, through the interactive sessions of the workshops to be organized by the project as well as through interviews and other means of extracting requirements. These services and systems are expected to be enhanced with TDM-powered functionalities that will improve the experience of their end users.

An example of such a platform is AGRIS [8], a large bibliographic database of more than 8M bibliographic records. These records are already linked to various external sources, using the AGROVOC thesaurus as the backbone of the linked data approach, which allows users to retrieve related resources from these external sources. The AGRIS use case aims to use semantically enriched, TDM-powered services to allow users of the platform to retrieve components of research publications, such as images, charts and datasets, which are currently embedded in the document.

8 http://agris.fao.org



FIGURE 7: IDENTIFICATION OF SOLUTION FEATURES FOR THE WHEAT RESEARCHER PERSONA, THROUGH THE PERSONA GRAPH

TABLE 6: SOLUTION FEATURES AS IDENTIFIED DURING THE OPENMINTED KICK-OFF MEETING



Thematic area: Scholarly Communication
Solution features:
□ Service to provide & curate metadata
□ Service to automatically extract the funding source & the license type
□ Automatic linking to knowledge bases (entities), datasets, software code
□ Interfacing to intelligent services, e.g. semantic search, elsewhere
□ Training activities as part of dissemination
□ Identifying related content from everywhere
□ Automatically disambiguating authors’ names

Thematic area: Life Sciences
Solution features:
□ Repository of copyrighted material, use of Open Data, integrated OMICS
□ Dynamic & interactive interface with Text Mining Workflows integrated to Biocuration Workflows
□ Community Curating Tool
□ Crowdsourcing of info & community involvement
□ Dynamic filtering, relevance detection, provenance info

Thematic area: Agriculture / Wheat
Solution features:
□ Service that connects data & publications presented in a user-friendly way
□ Possibility to query in a specific way or browse through the knowledge base without using the exact term
□ Measurement of quality / citation impact
□ Possibility to run bioinformatics tools with the data extracted from text and databases
□ Should have alternative means to check validity
□ Interface to check and correct extracted data going back to the text
□ Data is accessible to everyone

Thematic area: Social Sciences
Solution features: N/A


2.5 Description of current usage scenarios of information services & systems

A set of requirements of high importance to the project relates to the interaction between the user and the information services and systems. The way that users interact with such services needs to be described in detail and depicted as a set of steps constituting a workflow. This information will allow a better understanding of the activities of a user when using a service or a system, and therefore a better identification of his/her requirements and of the steps that could be improved or enhanced with the use of the OpenMinTeD services.

2.6 Design interface wireframes

During the interactive sessions of the workshops to be organized by the project for the elicitation of user requirements, participants will be asked to share their ideas on the integration of the new, envisaged service or feature powered by text mining in a new or existing website or repository. This design refers to the location of the new service box or button in a way that the user will feel comfortable with, e.g. not interfering with the current workflow of the user. These requirements will drive the User Interface (UI) design that should be considered and implemented by the corresponding project partners. The wireframes that will result from the workshops will be drawings on flip chart paper. The final wireframes will be more polished and will be created by the partners using specific software tools (see the example in Figure 8).



FIGURE 8: WIREFRAME PRODUCED FOR THE AGRICULTURE / WHEAT USE CASE USING A SOFTWARE TOOL, DURING THE OPENMINTED KICK-OFF MEETING

2.7 Description of user-envisaged usage workflows

After describing the content-related issues and proposing potential solutions that would address these issues, usage workflows related to the use of these envisaged services need to be described. More specifically, potential users of the OpenMinTeD services (e.g. interviewees or participants of the project’s workshops) will be asked to describe the interaction between the users and these services in the form of a usage workflow.

A usage workflow refers to the description of all the activities that take place for a user completing a specified task; more precisely, a usage workflow is defined as a use case drawn out into a step-by-step procedure, sometimes accompanied by a flowchart. For example, if a user wants to extract a figure from a research publication, he/she would have to go through a number of steps, which could be more or less the following:

□ The user visits a content repository;

□ He/she uses a search term for retrieving publications related to his/her work;

□ He/she retrieves a number of results;

□ He/she filters the more relevant results through the use of filters;

□ He/she selects a specific publication;

□ He/she goes through the publication and identifies an interesting reference;

□ He/she clicks on a button allowing him/her to retrieve only the figures of this publication.
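The steps above can be recorded as an ordered list so that the workflow is kept in a logical order and the steps supported by text mining can be singled out. This is a minimal sketch: the `Step` class and the `tdm_powered` flag are hypothetical, and marking only the last step as TDM-powered is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Step:
    actor: str
    action: str
    tdm_powered: bool = False  # assumption: flags steps enabled by a TDM service


workflow = [
    Step("user", "visits a content repository"),
    Step("user", "uses a search term for retrieving publications"),
    Step("user", "retrieves a number of results"),
    Step("user", "filters the more relevant results"),
    Step("user", "selects a specific publication"),
    Step("user", "goes through the publication and identifies an interesting reference"),
    Step("user", "clicks a button to retrieve only the figures of the publication", tdm_powered=True),
]


def render(steps):
    """Number the steps in order and tag the TDM-powered ones."""
    lines = []
    for number, step in enumerate(steps, start=1):
        tag = " [TDM]" if step.tdm_powered else ""
        lines.append(f"{number}. The {step.actor} {step.action}{tag}")
    return lines


lines = render(workflow)
print("\n".join(lines))
```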



FIGURE 9: PROCESS WORKFLOW FOR THE AGRICULTURE/WHEAT USE CASE, DRAWN DURING THE OPENMINTED KICK-OFF MEETING

All these steps need to be recorded by the users in detail and in a logical order. This will allow the OpenMinTeD project partners to provide users with a set of functionalities that meet their expectations, taking them into consideration as User Experience (UX) design requirements.



FIGURE 10: WORKFLOW & WIREFRAMES FOR THE AGRICULTURE / WHEAT USE CASE, FROM THE KICK-OFF MEETING

2.8 List resulting requirements for TDM-powered outcomes services

The feedback and requirements gathered during the interviews, workshops and other means organized for this purpose will have to be collected, organized and validated before they are used for the development of the corresponding OpenMinTeD TDM-powered outcomes, such as e-infrastructure, text mining and final online services for end users. All additional related information collected through the aforementioned means will also have to be transformed into the corresponding requirements. The resulting requirements will be classified according to their topic, in the following categories:

Requirements for the definition and optimal description of the user persona (referring to the demographics of the persona);

Requirements related to the content sources already used or of interest to each persona;

Requirements related to the content-related issues of each persona;

Requirements related to the expected TDM-powered services of each persona;

Requirements related to the interaction of a user with the new service(s);

Requirements related to the integration of each envisaged service in an existing or new website, portal and other content source.

The requirements collected and validated through the processes described in the following chapters will drive the development of the project’s TDM-powered services both at functionality and at user interface level.
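The six categories listed in this section can be pictured as a simple classification scheme. The sketch below is only an illustration in Python: the names `Category`, `Requirement` and `group_by_category`, as well as the sample persona and statements, are hypothetical and are not prescribed by the project.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class Category(Enum):
    """The six requirement categories listed in this section."""
    PERSONA_DEMOGRAPHICS = "definition and description of the user persona"
    CONTENT_SOURCES = "content sources already used or of interest"
    CONTENT_ISSUES = "content-related issues"
    EXPECTED_SERVICES = "expected TDM-powered services"
    USER_INTERACTION = "interaction of a user with the new service(s)"
    INTEGRATION = "integration of a service in a website, portal or other content source"


@dataclass(frozen=True)
class Requirement:
    persona: str            # hypothetical persona label
    category: Category
    statement: str
    validated: bool = False  # set to True once confirmed in a workshop or interview


def group_by_category(requirements: Iterable[Requirement]) -> Dict[Category, List[Requirement]]:
    """Group requirements by topic so each category can be reviewed on its own."""
    grouped = defaultdict(list)
    for requirement in requirements:
        grouped[requirement.category].append(requirement)
    return dict(grouped)


# Hypothetical sample data, for illustration only.
collected = [
    Requirement("Researcher", Category.CONTENT_ISSUES, "Difficulty to find figures in publications"),
    Requirement("Researcher", Category.EXPECTED_SERVICES, "I would like to retrieve only the figures"),
    Requirement("Researcher", Category.CONTENT_ISSUES, "Lack of filters for relevant results"),
]
by_topic = group_by_category(collected)
print({category.name: len(items) for category, items in by_topic.items()})
```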

3| METHODOLOGY

Project partners involved in tasks related to the elicitation of requirements will have to organize and implement a number of interviews with specific stakeholders that fall into the user groups identified by the project, as described earlier in this document. The initial profiling of the personas may take place in the form of customized online questionnaires or interviews with users featuring the characteristics of interest to the OpenMinTeD project. In both cases, the same set of predefined questions should be used; in the former, the responses are provided by the anonymous users themselves, while in the latter the responses are recorded by the facilitator (project partner).

The validation of this feedback and the elaboration on the responses collected will be achieved during dedicated, focused workshops that aim to engage different types of users (personas) and allow them to work closely on the description of solutions to the challenges that each persona is facing. During these workshops, participants will be asked to describe each persona in detail, providing information on its demographics, content-related sources and content-related issues that need to be addressed (per persona). In addition, during the workshop, they will be asked to sketch what an initial design would look like in the form of wireframes, using paper and markers.

The next step of the process has to do with the validation of this initial feedback (referring to both the persona and the wireframes) with feedback acquired through online questionnaires, interviews & workshops. Only after this information has been validated can the project partners involved in this task confirm the suggested personas and design the wireframes using an appropriate software tool.

The proposed methodology consists of the following steps:

1. Initial profiling of a user persona

a. A generic online questionnaire is developed for the collection of initial requirements and the initial profiling of a persona.

b. The generic questionnaire is adapted by the project partners responsible, in order for it to meet the specific needs of each user community.

i. E.g. the survey questionnaire may be translated (all the results, i.e. the report and the interviewees’ answers, should be in English).

c. The adapted questionnaires are used for the initial profiling of each persona for each use case described in this document.

i. One adapted questionnaire for each use case.

ii. Partners distribute the survey questionnaire, using their own means.

d. Feedback is collected from the online questionnaires by the corresponding project partners.

e. Profiling can also take place through interviews (face to face, through Skype or phone call).

2. Validation of the collected requirements

a. Workshops are organized for the validation of the feedback collected regarding the personas.

b. New information regarding the personas is acquired.

c. Feedback from each workshop is collected through reporting.

d. Validation can also take place through questionnaires.

3. Brainstorming on new TDM-powered services, in terms of functionalities, user interface (wireframes) and interaction between the user and the services (usage workflows)

4. Transformation of validated information and user requirements to requirements that will allow technical partners of the project to work on the TDM-powered outcomes to serve the corresponding user communities and personas.

Reports from all workshops are used for the analysis of requirements that will drive the development of services by the project partners.

3.1 Initial profiling of targeted personas

The initial profiling of personas, including content sources of interest and relevance to them as well as information on content-related challenges and problems that these personas face is probably the most important aspect of this work. The information extracted from these personas will have to be transformed to requirements that will drive the development of the project’s services.

3.1.1 Using online questionnaires

For the initial profiling of targeted personas, the use of online questionnaires is proposed, as they provide a free, efficient and quick means of collecting requirements, with only limited effort required for setting up the initial questionnaire. While any online survey tool can be used for acquiring requirements, Google Forms9 is suggested for this purpose; it is a free, widely used tool that allows responses to be exported automatically to a spreadsheet and processed easily in that form. In addition, a Google Form can be easily and collaboratively revised by team members working on a specific use case, and adapted and reused with modifications for serving different needs, only requiring a valid Gmail address. However, the use of alternative tools such as LimeSurvey10 and SurveyMonkey11 may also be considered for this purpose. For example, Agro-Know plans to use the integrated Drupal form functionality of its Agro-Know Stem platform12.

FIGURE 11: EXAMPLE DRUPAL-BASED FORM FOR COLLECTING RESPONSES: THE AGRIS USE CASE

9 https://www.google.com/forms/about
10 https://www.limesurvey.org
11 https://www.surveymonkey.com
12 for example see http://www.akstem.com/agris

What is important is that the responses collected through any online questionnaire are exported and stored as a spreadsheet file, so that they can be processed and analyzed for the extraction of requirements.

STRUCTURE OF THE ONLINE QUESTIONNAIRE

An initial, generic online questionnaire in the form of a Google Form is available online13 and accessible upon request. Partners working on specific use cases for the project are encouraged to use this questionnaire and adapt it in order to meet the specific needs of the communities involved in the corresponding use cases. In this context, online copies of this questionnaire may be provided to partners working on this task, so that it can be freely adapted.

FIGURE 12: PARTIAL VIEW OF THE OPENMINTED GENERIC QUESTIONNAIRE

The structure of this generic questionnaire is the following:

13 https://docs.google.com/forms/d/1_OBM6cAl28MdhMSMU20xo2TePwses_JGhpnNoZitG2k/viewform?usp=send_form

Questions 1-7 refer to demographics, aiming to provide a generic profile of the persona;

Questions 8-10 refer to the content / data of relevance and interest to the specific persona;

Question 11 refers to content-specific challenges faced by the persona. Each response to this question should start with “Difficulty to...” or “Lack of...”. This question also aims to extract the importance of each one of the challenges reported by the person completing the survey, by using a scale with 5 values ranging from Very important to Not important at all;

Question 12 refers to the suggestion of potential solutions for addressing the content-specific challenges and enhancing existing content portals and websites. Each response to this question should start with “I would like to...”. This question also aims to extract the importance of each one of the solutions suggested by the person completing the survey, by using a scale with 5 values ranging from Very important to Not important at all.

Despite the fact that each partner is encouraged to adapt the questionnaire so that it better reflects the requirements of their specific communities, it is important that the structure of the questionnaire (e.g. the four different sections) is maintained.
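To make the four-section structure concrete, it can be written down as a small lookup table. The following Python sketch is illustrative only: the section names, `section_of` and `has_expected_prefix` are hypothetical helpers, not part of the generic questionnaire itself. Only the question ranges and the expected opening phrases are taken from the description above.

```python
# Question numbers per section, as described for the generic questionnaire.
GENERIC_SECTIONS = {
    "demographics": range(1, 8),   # questions 1-7
    "content": range(8, 11),       # questions 8-10
    "challenges": range(11, 12),   # question 11
    "solutions": range(12, 13),    # question 12
}

# Opening phrases expected for free-text statements in questions 11 and 12.
EXPECTED_PREFIXES = {
    "challenges": ("Difficulty to", "Lack of"),
    "solutions": ("I would like to",),
}


def section_of(question_number: int) -> str:
    """Return the (hypothetical) section name a question number belongs to."""
    for name, numbers in GENERIC_SECTIONS.items():
        if question_number in numbers:
            return name
    raise ValueError(f"Question {question_number} is not part of the generic questionnaire")


def has_expected_prefix(section: str, statement: str) -> bool:
    """Check that a statement starts with one of the phrases expected for its section."""
    prefixes = EXPECTED_PREFIXES.get(section)
    if prefixes is None:
        return True  # no phrasing convention for demographics and content questions
    return statement.strip().startswith(prefixes)


print(section_of(9))
print(has_expected_prefix("challenges", "Lack of access to full texts"))
print(has_expected_prefix("solutions", "Faster search"))
```

A partner adapting the questionnaire could use a check like this to confirm that the adapted version still maps onto the same four sections.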

CONSIDERATIONS REGARDING THE ONLINE QUESTIONNAIRE

A number of additional aspects should be taken into consideration in order to get the most out of the questionnaire.

For each use case, a minimum of fifteen (15) responses is required. This number refers to the total number of responses collected by any means (e.g. including face-to-face or Skype/phone interviews).

The questionnaire can be used either as an online survey by sharing the public URL of the survey (not the internal one that allows full access to the form) or as a script for direct interviews, i.e. through Skype, phone calls or even face to face.

The questionnaire can also be used for collecting preliminary feedback and requirements from registered participants of any of the events planned for collecting requirements (see next section). Then, participants may elaborate on their initial feedback through the interactive sessions of these events, which will also allow the validation of the personas and the feedback acquired through them.

The online questionnaire may remain open / active for the collection of as many responses as possible. It can also be used for the collection of feedback on solutions and features at a later stage, when the scenarios for each use case are run.

3.1.2 Interviews

Another means for the elicitation of requirements is through interviews. Interviews may be conducted with potential stakeholders, using the same set of questions included in the online questionnaire, in any of the following ways:

face to face

through Skype

through a phone call

Responses collected through the interviews should be recorded either using the online questionnaire (so that the same spreadsheet will contain all the responses) or in a spreadsheet that contains the same fields as the one provided by the online form, in order to ensure the homogeneity of the templates and facilitate the aggregation of responses from various means.

A minimum of fifteen (15) interviews per use case is expected to provide sufficient feedback for the purposes of this task.
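The aggregation of responses from different means can be sketched as follows. This is a minimal illustration in Python, assuming that each spreadsheet has been exported as CSV and that it contains a `use_case` column; the column names, the helper functions and the sample rows are all hypothetical. The check counts the total number of responses per use case collected by any means against the minimum of fifteen.

```python
import csv
import io
from collections import Counter

MIN_RESPONSES = 15  # minimum number of responses per use case


def read_responses(csv_text, source):
    """Parse exported CSV text and tag each row with the means it came from."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["source"] = source
    return rows


def aggregate(*batches):
    """Merge batches of responses, refusing sheets whose fields differ."""
    merged = [row for batch in batches for row in batch]
    if len({frozenset(row) for row in merged}) > 1:
        raise ValueError("Response sheets do not share the same fields")
    return merged


def use_cases_below_minimum(rows, minimum=MIN_RESPONSES):
    """Return the use cases that still need responses, with their current count."""
    counts = Counter(row["use_case"] for row in rows)
    return {use_case: n for use_case, n in counts.items() if n < minimum}


# Hypothetical sample exports, for illustration only.
online = read_responses("use_case,role\nAGRIS,Researcher\nAGRIS,Librarian\n", "online questionnaire")
interviews = read_responses("use_case,role\nAGRIS,Data manager\n", "interview")

responses = aggregate(online, interviews)
missing = use_cases_below_minimum(responses)
print(missing)
```

Rejecting sheets with different fields reflects the requirement that interview spreadsheets contain the same fields as the one produced by the online form.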

3.1.3 Using the Persona Graph

The Persona Graph consists of four (4) different boxes that need to be completed in detail:

1. Persona characteristics (demographics): Includes name, role in team, affiliation, research interests, field of expertise etc.

2. Data/content related requirements: What data/content types are of interest to the specific persona and which ones is this persona currently using?

3. Key information challenges: Does this persona face any challenges related to identifying, accessing, retrieving and managing data/content of interest?

4. Features of the solution: What is the expected solution that would help the persona address the aforementioned challenges?

The following figure provides an example of the information collected from a specific user for building the persona profile.

FIGURE 13: INFORMATION ABOUT A PERSONA TO BE TRANSFERRED TO A PERSONA GRAPH

It should be noted that during the collection of information for each persona, the importance of each need and challenge should also be mentioned and the corresponding input should be prioritized accordingly in the persona profile table. More specifically, the most important statement for each category should be at the top of the list while the least important one at the bottom of the same list.
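The four boxes of the Persona Graph and the prioritization rule can be illustrated with a small data structure. The Python sketch below is an assumption-laden example, not a project template: the class `PersonaGraph`, the box names and the numeric importance scale (5 for the most important, 1 for the least important) are hypothetical, as is the sample persona.

```python
from dataclasses import dataclass, field

# Boxes 2-4 of the Persona Graph hold prioritized statements (hypothetical names).
BOXES = ("data_requirements", "challenges", "solution_features")


@dataclass
class PersonaGraph:
    characteristics: dict  # box 1: name, role in team, affiliation, etc.
    entries: dict = field(default_factory=lambda: {box: [] for box in BOXES})

    def add(self, box: str, statement: str, importance: int) -> None:
        """Record a statement with its importance (assumed scale: 5 = most important)."""
        if box not in BOXES:
            raise ValueError(f"Unknown box: {box}")
        if not 1 <= importance <= 5:
            raise ValueError("Importance must be between 1 and 5")
        self.entries[box].append((importance, statement))

    def prioritised(self, box: str) -> list:
        """Return the statements of a box, most important first."""
        ranked = sorted(self.entries[box], key=lambda entry: entry[0], reverse=True)
        return [statement for _, statement in ranked]


# Hypothetical persona, for illustration only.
persona = PersonaGraph(characteristics={"role": "Researcher", "field": "Agriculture"})
persona.add("challenges", "Lack of filters for relevant results", importance=3)
persona.add("challenges", "Difficulty to extract figures from publications", importance=5)
persona.add("challenges", "Lack of links between related datasets", importance=1)

for statement in persona.prioritised("challenges"):
    print(statement)
```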

3.2 Validation of users’ requirements

After the targeted personas have been identified and their initial profiles have been created (including demographics, content-related issues and proposed solutions), they need to be validated in order to ensure their accuracy and eligibility for being used as the basis for the project’s envisaged services. This validation may take place in the form of smaller- or larger-scale events.

Small-scale events include interviews, which can take place either face to face, by phone or online (e.g. Skype calls), as described earlier.

Large-scale events include meetings with teams of stakeholders and even workshops. They can accommodate a large number of participants and should include sessions for teamwork. Such events need to be based on a well-defined agenda that will include at least the following sessions:

1. Introduction to the event, including the methodology to be followed throughout the event;

2. Introduction to the OpenMinTeD project;

3. Presentation of the specific user community and related content-related issues;

4. Interactive session for the profiling of a persona (Persona graph);

5. Interactive session for the design of initial wireframes of the envisaged TDM-powered services (hand-drawn wireframes on paper);

a. It will be useful if text mining partners are included in the discussions in order to guide the participants.

6. Interactive sessions for the definition and design of user-related workflows that will highlight the interaction of the users with the aforementioned services (hand-drawn workflows on paper).

a. It will be useful if text mining partners are included in the discussions in order to guide the participants.

The outcomes of each user requirements event should include the following:

1. input per persona in the Persona Graphs provided by the facilitators. It should be noted that in the case of such events, there should be parallel interactive sessions running per persona. The outcome should look like the one in Figure 4 of Section 2.5.

2. a visualization of the envisaged services to be developed by OpenMinTeD (per persona) (hand-drawn, using markers and paper)

3. a proposed workflow per persona (hand-drawn, using markers and paper), explaining the use of such a service and the role of different types of users.

Chapter 4 of this document provides detailed information on the preparation, implementation and reporting of events like workshops for the elicitation of user requirements.

3.3 Designing final wireframes with required features for the information services of the use cases

Through the aforementioned means of acquiring feedback from the specific user types, a number of services are expected to be identified, described and analyzed for each persona within each one of the use cases studied by the project. Such brainstorming sessions should take place in the context of workshops, where different types of potential end users are expected to participate.

As regards the design of wireframes, the following steps should be followed:

1. Initial wireframes will be drawn by the participants of the events (one wireframe per persona), highlighting the way that these services will be integrated in existing websites, finders and other sources of content.

2. Additional requirements and feedback collected through interviews or the online questionnaire are to be taken into consideration for improving or revising the initial wireframes.

3. Refined wireframes are to be designed by the OpenMinTeD project partners who are involved in this task, using software such as Balsamiq14, Pencil Project15 and Mockingbird16. The use of Balsamiq is proposed but partners are allowed to use any related software they feel more comfortable with, also taking into consideration the software’s business model.

4. Follow-up interviews are to take place for the validation of the proposed wireframes, with selected representatives of the targeted users’ community (solution validation).

Agro-Know will provide training on the aforementioned steps, focusing on the design of the wireframes and workflows using Balsamiq and other tools, where applicable.

3.4 Translating features into requirements for text mining services and e-infra

The definition and description of the expected TDM-powered services will provide a number of requirements that are expected to shape and drive the development of the OpenMinTeD services. The features of these services will need to be transformed into the corresponding requirements so that the partners of the project can work on and provide the services described by the users.

These features will be collected from the aforementioned means of requirements’ elicitation, including the interviews, the online questionnaires and the workshops to be organized by the project partners for this purpose, and will have to be transformed into requirements to be used by the project’s technical partners for the development of text and data mining services.

14 Desktop app or web app with 30-day trial https://balsamiq.com
15 Open source desktop app http://pencil.evolus.vn
16 Web app https://gomockingbird.com

4| TOOLS AND TEMPLATES

The implementation of the methodology described in this document requires the use of tools for the extraction, collection and organization of the user requirements. In order to facilitate the process, a number of tools and templates have already been developed and shared with the project partners working on the user requirements. These will need to be adapted by the project partners in order to include the specific information required for meeting the requirements of specific user communities.

The following sections provide information on these templates, the full version of which can be found as an Annex to this document.

4.1 Generic questionnaire for user & content profiling

The initial means for the elicitation of requirements is the online questionnaire that will be set up and shared with potential users of the TDM-powered services. In order to facilitate the process, a generic questionnaire was developed consisting of generic questions that can be revised by each partner responsible in order to meet the requirements of each community.

The generic questionnaire was developed as a Google Form and is available at http://goo.gl/forms/FopxFlNOao. It consists of twelve (12) questions in four (4) groups, as mentioned earlier in this document. The generic questionnaire contains questions based on the AGRIS use case (to be used as examples) as well as comments in each question that facilitate its adaptation; these comments should be removed before publishing and sharing the questionnaire with the users.

FIGURE 14: PART OF THE GENERIC ONLINE QUESTIONNAIRE WITH COMMENTS UNDER EACH QUESTION

The generic questionnaire can also be found in Annex B of this document.

4.2 Example of customized questionnaire for agricultural personas

The first step in the process regarding the questionnaire is the adaptation of the generic questionnaire to each one of the use cases studied by the project. In this context, the first customized questionnaire that was developed for profiling members of a specific community is the one for the AGRIS use case. In this case, the number and structure of the questions were kept intact while the questions were adapted in order to be appropriate for the identification of requirements from the agricultural research and information management community.

FIGURE 15: PART OF THE AGRIS USE CASE ONLINE QUESTIONNAIRE

The specific questionnaire is addressed to the users of the AGRIS platform, who are mostly researchers aiming to retrieve research publications related to their work. In this context, all pre-defined responses have been adapted so that they reflect the options most prominent for the specific user community.

The questionnaire is currently available at http://goo.gl/forms/c0fHKLNP7d and is ready to be circulated among the potential stakeholders and targeted users of the use case. It may also be found in Annex C of this document.

4.3 Guidelines for the organization of requirements' workshops per user partner (for persona validation & brainstorming)

The methodology proposed in this section is based on the organization and implementation of face-to-face workshops, which will allow the direct communication between the facilitator(s) and the stakeholders. These workshops can vary in scale (from meetings with fewer than 5 people to larger-scale events where participants will work in groups). The planning, organization and implementation of such events, which will engage the stakeholders defined by the project and allow the collection of requirements that will drive the outcomes of the project, is an important task that will be analyzed in the following sections.

The basic steps for the extraction and validation of user feedback during a workshop are the following:

1. Sessions during which users will provide their requirements related to the envisaged services to be provided by the OpenMinTeD project;

2. Design of the envisaged services by users drawing on paper (flip chart paper).

In practice, during a workshop the project partners will only need to collect and record the challenges/issues for each persona, and ask participants to draw potential new services using markers and paper as well as to show how the user (each persona identified) interacts with the service (workflow design).

4.3.1 Practicalities

Basic aspects that should be taken care of for the planning and implementation of a workshop include:

Identification of the target group and selection of people to be invited. This will allow the adaptation of the material provided in this section and the better organization of the event. The list of participants for each workshop should take into consideration the personas identified for the specific use case.

Invitation: After selecting the target group, potential participants should be formally invited through an invitation letter (probably in the form of an e-mail).

Pre-Workshop Checklist: Before the beginning of the workshop you should make sure that the following have been developed and are ready for use:

o List of attendants

o Agenda

o Presentations

o Reporting Template

Running the Workshop: A number of practical issues should be examined before the implementation of the Workshop, in order to ensure its success:

o Internet connection

o PC or laptop with PowerPoint installed for the presentations

o Projector (if needed) for delivering the presentations

o An internet browser, such as Mozilla Firefox, Google Chrome, Opera or Safari

o Flipchart sheets and/or whiteboard for drawing

Reporting: After the end of the workshop, facilitators are expected to provide a small report including a description of the workshop, the feedback gathered and photos of the event using a pre-defined template.

4.3.2 Interactive Sessions

There are three envisaged interactive sessions for each workshop:

Interactive Session 1: Identification and definition of the OpenMinTeD personas. For each persona, the following information is required:

1. Persona Demographics

2. Data Types of interest for each persona

3. Top challenges related to accessing research information for the specific persona

4. Proposed solutions for the aforementioned challenges for the specific persona

Interactive Session 2: Design of the envisaged OpenMinTeD-related solution for each persona.

Interactive Session 3: Design of the workflows for each persona, depicting the interaction between the user and the new service.

You can find more information about each one of these sessions in the PPT templates provided for each session in the Annex of this document.

4.3.3 Materials required for the implementation of the event

The organization of an event for eliciting user requirements requires the preparation of a number of materials in the form of agenda, presentations, script for facilitating the implementation of the event etc. The following sections aim to provide the necessary templates of various types of materials that are required for the implementation of an event. It should be noted that these templates are provided as generic material that provides a basic functionality; they can (and should) be adapted in order to meet the specific requirements of each event.

INVITATION TO THE WORKSHOP

A template for the invitation text is available through the project’s Redmine installation17 and is also included in Annex D of this document. This template can be freely adjusted in order to meet the style of the organizer as well as the specific user community to which it will be addressed.

AGENDA OF THE WORKSHOP (TEMPLATE)

The agenda of each event should be developed before the event, in order to ensure that all necessary sessions will be included and that the time available for the event will be allocated in the best possible way. In order to facilitate the development of an agenda, a template is provided. This template can be adapted in each case, in order to meet the specific needs of a specific event, based on the type of stakeholders, number of participants and other factors.

The agenda of the event is expected to include at least the following sessions:

an introduction to the Workshop;

a presentation of the project;

a presentation of the specific use case of interest to the workshop;

a presentation of the methodology to be followed during the workshop;

17 http://redmine.openminted.eu/attachments/download/21/1.%20OpenMinTeD_Invitation_to_workshop.docx

an interactive session for the extraction of requirements for each persona identified;

an interactive session for the visualization (design/drawing) of expected user interfaces for the services identified in the previous session;

an interactive session for the visualization of the user workflows, i.e. how the user interacts with the new service;

a number of breaks, depending on the total duration of the event.

The minimum duration of a large-scale event is expected to be around 4.5 hours, which may be extended depending on the number of participants and the availability of experts delivering additional presentations on the topics of text and data mining, among others.

An indicative agenda of a Workshop is available through the project’s Redmine installation18 and is also included in Annex E of this document.

SCRIPT, FACILITATION & NOTES

In order to ensure the uninterrupted flow of an event, a script indicating the roles and expected activities of a team of facilitators should be prepared. The script refers to a list of activities that take place before, during and after the event, their expected duration and the people assigned for each activity.

Three major teams are expected to be involved in the preparation and implementation of such an event:

1. Facilitators: They will be responsible for delivering the presentations (at least the ones related to the project), introducing external speakers and the general overview of the event. They are also responsible for reporting back on the event.

2. Assistants: A number of people assisting with practicalities will be required. They are the ones responsible for the documentation of the event (note keeping, taking photos and videos), practical arrangements (e.g. setting up the venue with laptop, projector, speakers, whiteboards and flipcharts) as well as for ensuring that participants will have access to whatever is needed during the event (e.g. sheets of paper, markers etc.).

3. TDM experts: A number of experts involved in each use case are welcome to take part and guide the participants (it is not required). In case no TDM experts from the project partners participate in the events, they will need to help the use case partners to develop the final wireframes at the next stage.

An indicative script of a Workshop is available through the project’s Redmine installation19, as well as in Annex F of this document.

PRESENTATIONS (TEMPLATES)

There are three presentations needed for the basic implementation of an event:

1. A presentation of the project, focusing on the project's aims & objectives of relevance to the audience of the event;

2. A presentation of the specific use case / user community of interest to the workshop;

18 http://redmine.openminted.eu/attachments/download/3/1.%20OpenMinTeD_Event_agenda.docx
19 http://redmine.openminted.eu/attachments/download/16/2.%20OpenMinTeD_Event_script.xlsx

3. A presentation that will provide information on the methodology to be followed during the event, as well as the key concepts. A template/example of this presentation is available through the project’s Redmine installation20, as well as in Annex G of this document.

All three presentations should be revised accordingly in order to be appropriate for the specific audience of the event.

Additional slides will be required for introducing the participants to the concept of the three interactive sessions that are expected to be included in the agenda of the event.

1. Interactive Session I: Identification of requirements. A template/example case is available through the project’s Redmine installation21, as well as in Annex H of this document.

2. Interactive Session II: Visualization of solutions. A template/example case is available through the project’s Redmine installation22, as well as in Annex I of this document.

3. Interactive Session III: Design of user workflows. A template/example case is available through the project’s Redmine installation23, as well as in Annex J of this document.

REPORTING FORM (TEMPLATE)

All partners organizing a workshop are kindly requested to provide a short report about it so that we can promote their work and make the most out of the feedback received. The following should be included per persona for each use case, as a part of the report:

1. Persona Graph: A Persona Graph for each different persona represented in the workshop should be prepared and collected. An example from the Kick-off meeting for the Agro-Wheat Use Case AS3 is available online24.

2. Hand-drawn wireframe on paper: An example from the Kick-off meeting for the Agro-Wheat Use Case AS3 is available online25.

3. Hand-drawn Workflow on paper, depicting the interaction between the user and the new service described above. An example from the Kick-off meeting for the Agro-Wheat Use Case AS3 is available online26.

This short report should also include the name of the event during which the workshop was held, some information regarding the event and information about the participants (number, background, main area of work and expertise). A short description of the workshop should follow, presenting its structure and focusing on the feedback received. Any other interesting comments and/or ideas from participants should also be included, so that feedback received as a result of face-to-face interaction can be collected. Finally, the presentations made as well as any photos taken should be included in the report.

A template for the reporting form is available through the project’s Redmine installation27 and is also available in Annex K of this document.

20 http://redmine.openminted.eu/attachments/download/22/4.%20OpenMinTeD_Workshop_Methodology_template.pptx
21 http://redmine.openminted.eu/attachments/download/17/Interactive%20Session%20I.pptx
22 http://redmine.openminted.eu/attachments/download/19/Interactive%20Session%20II.pptx
23 http://redmine.openminted.eu/attachments/download/18/Interactive%20Session%20III.pptx
24 http://redmine.openminted.eu/attachments/download/12/UseCase3_PersonaGraph.jpg
25 http://redmine.openminted.eu/attachments/download/13/UseCase3_Wireframe_Workflow.jpg
26 http://redmine.openminted.eu/attachments/download/13/UseCase3_Wireframe_Workflow.jpg
27 http://redmine.openminted.eu/attachments/download/24/5.%20OpenMinTeD_Reporting_Template.docx



4.4 Final interface wireframes to revisit & refine per user partner and information service

Each partner responsible for collecting feedback for a specific use case should ensure the collection of requirements and feedback from all available activities (workshops, online questionnaire and/or interviews) and follow the steps described below:

1. Drawing of the proposed wireframes using Balsamiq or any other related software;

2. Drawing of the proposed workflows using PowerPoint.

In both cases, Agro-Know can help by providing training on how these can be produced.

As a next step, the project partners responsible for this task should validate the wireframes and workflows produced. This can be done by presenting them to stakeholders and potential end users, either online (e.g. through Skype) or in focused meetings with a small number of participants (e.g. 5-10 people). During this step, project partners should be able to showcase the envisaged new services and explain their main functionalities. The aim is to understand whether these new services are of interest and use to the end users.

Project partners are kindly requested to provide the Agro-Know team with a list of people that have contributed to the validation of these services (including name, affiliation and email) so that they can be contacted for further information and clarifications, if needed.



5| SCHEDULE OF NEXT STEPS

This chapter presents the schedule of the requirements analysis steps, i.e. when the requirements are planned to be gathered and analysed. The figure below presents the Gantt chart of the various steps and deliverables, providing an overview of the process, while Table 7 presents an overview of all steps and their results/deliverables, the partners responsible and the respective deadlines.

FIGURE 16: REQUIREMENTS ELICITATION GANTT

TABLE 7: IMPORTANT DATES

Tasks / Deliverables | Responsible | Deadline
Generic online questionnaire | AK | 31/07/2015
Online questionnaire customisation and setup | User partners | 31/08/2015
Submission of results from online questionnaire to AK | User partners | 30/09/2015
Schedule 1st round of interactive sessions | User partners | 15/09/2015
Run 1st round of interactive sessions and submit updated personas matrix, initial wireframes and workflows to AK | User partners | 31/10/2015
Interim Requirements Report | AK | 30/11/2015
Schedule 2nd round of interactive sessions | User partners | 15/12/2015
Run 2nd round of interactive sessions and submit updated personas matrix, initial wireframes and workflows to AK | User partners | 15/01/2016
Interim Requirements Analysis Report | AK | 29/02/2016
Design High Fidelity Wireframes | User partners | 29/02/2016
Translate features into requirements for text mining services & e-infra | AK | 30/04/2016
D4.2 Community Requirement Analysis Report | AK | 31/05/2016
D4.3 OpenMinTeD functional specifications | ARC | 31/07/2016

[Gantt chart, Figure 16. Timeline: project months M3 to M14 (Aug 2015 to Jul 2016). Tasks and deliverables shown: Initial profiling of targeted personas; Online Questionnaire Customisation and Setup; Online Questionnaire Launch and Data Collection; Validation of users’ requirements; Schedule Interactive Sessions; Run Interactive Sessions; Interim Requirements Report; Design High Fidelity Wireframes; Translate features into requirements and specs for text mining services and e-infra; D4.2 Community Requirement Analysis Report; D4.3 OpenMinTeD functional specifications. Legend: Deliverable, Milestone.]



6| CONCLUSIONS

The methodology described in this document is intended to be used for the elicitation of requirements from the targeted end users. The related work undertaken so far includes the identification of the user communities of interest for the project, as well as the definition of a number of use cases that are expected to be studied by the project. Through these use cases, personas of interest to the project will be engaged in activities related to the elicitation of requirements that will eventually drive the TDM-powered outcomes of the project.

The methodology consists of a number of well-defined steps that should be completed for the successful acquisition of requirements. This series of steps allows the building of the targeted persona as well as the extraction of requirements in the following way:

1. Elicitation of initial feedback using an online questionnaire or interviews:
   a. Initial profiling of a target persona through the online questionnaire, in terms of demographics;
   b. Collection of content-related requirements, such as content-related challenges and sources;
   c. Identification of potential or expected solutions to the aforementioned challenges.

2. Acquisition of feedback regarding the:
   a. Functionalities of the services envisaged by the users;
   b. Expected user interface of the new service(s), in the form of hand-drawn wireframes;
   c. Interaction between the users and the new service(s), in the form of hand-drawn workflows / flowcharts.

3. Validation of feedback collected:
   a. During focused workshops, to be attended by various personas included in each use case;
   b. Through interviews or focus groups.

4. Transformation of feedback collected into requirements that can be used for the development of the project’s TDM-powered outcomes.

The results acquired through this process are expected to vary considerably in terms of terminology, level of familiarity with the proposed technologies and community maturity; at the same time, they are expected to converge in several areas of expectations and needs.

The next steps of this process include the analysis and harmonization of the requirements collected during this process and the subsequent synthesis of the results into a single report that allows:

• categorization of users (typology) and prioritization of the requirements;

• better understanding of end user requirements from the text mining research community;

• better use of available resources at the stage of the applications design and implementation, by reducing the number of systems that can handle user requirements and targeting more sustainable / reusable technology.

These activities are expected to take place in the context of Task T4.2 “Requirements Analysis and Harmonization”, which will start as soon as the elicitation of requirements has been successfully completed.



7| REFERENCES

[1] Bradbeer, J. (1999). “Evaluation”, FDTL Technical Report no. 8, University of Portsmouth.

[2] Farbey, B., Land, F. and Targett, D. (1993). “How to Assess your IT investment: A Study of Methods and Practices”, Oxford: Management Today and Butterworth-Heinemann.

[3] Manouselis, N., Psochios, Y., Marianos, N., Nikoulina, V. and Bosca, A. (2012). Deliverable D7.1 “Evaluation and Validation Plan”, Organic.Lingua ICT-PSP project.

[4] Maurya, A. (2012). Running Lean: Iterate from Plan A to a Plan that Works. O'Reilly Media; 2nd edition (March 9, 2012). ISBN-10: 1449305172.

[5] Ries, E. (2011). The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses. Crown Business; 1st Edition (September 13, 2011). ISBN-13: 9780307887894.

[6] Swinkels, F.G. (1997). “Managing the life cycle of Information and Communication Technology investments for added value.” In European Conference on the Evaluation of Information Technology. Delft: Delft University Press.

[7] Willcocks, L. and S. Lester. (1996). “The evaluation and management of information systems investments: from feasibility to routine operations.” In Investing in information systems: evaluation and management, ed. L. Willcocks. London: Chapman & Hall.



8| ANNEX A: PERSONA GRAPH TEMPLATE

Also available as a separate file at the project’s common space.



9| ANNEX B: GENERIC ONLINE QUESTIONNAIRE

Available at:

https://docs.google.com/forms/d/1_OBM6cAl28MdhMSMU20xo2TePwses_JGhpnNoZitG2k



10| ANNEX C: ONLINE QUESTIONNAIRE EXAMPLE (AGRIS USE CASE)

Available at:

https://docs.google.com/forms/d/1iCNCkp5l9k8oBNQbT0ngZaeqyRp2lEl8tLVwPjK_60Q/viewform





11| ANNEX D: INVITATION TO THE WORKSHOP

Also available as a separate file at the project’s common space.

Invitation to attend a workshop on (title of the workshop here)

Dear colleague,

By this email I would like to invite you to participate in the workshop that we will be holding on xx/xx/xx at the (place here) focusing on the identification of user requirements .................:

During this workshop participants will have the chance to get introduced to the OpenMinTeD project which is a European initiative (www.openminted.eu) that aims to enable the creation of an infrastructure that fosters and facilitates the use of text mining technologies in the scientific publications world, builds on existing text mining tools and platforms, and renders them discoverable and interoperable through appropriate registries and a standards-based interoperability layer, respectively.

During the workshop you will also be asked to participate in two interactive sessions aiming to collect requirements regarding the use of text and data mining technologies in the field of [name of field here].

The workshop will also allow participants to express their ideas on the design of the new services and applications that they have in mind, by drawing the user interface of these and sharing ideas on their integration in existing Web interfaces, like Web portals.

We want to better understand the current situation, as well as the needs and requirements of all people involved in research in the fields of [name of field here]. Your opinion is of great value to us. Therefore we would kindly like to ask you for your participation in our workshop.

For more information about the workshop and to let us know whether you are interested in participating please contact (your email here).

On behalf of the organizers,

Kind regards,

[Name of the project partner sending the invitation]



12| ANNEX E: TEMPLATE FOR THE AGENDA OF A WORKSHOP

Also available as a separate file and online at http://redmine.openminted.eu/attachments/download/3/1.%20OpenMinTeD_Event_agenda.docx

OpenMinTeD:

Open Mining INfrastructure for TExt and Data

User Requirements’ Workshop/Meeting Agenda

XX/XX/2015, City, Country

This project is funded under the Horizon 2020 programme, H2020-EINFRA-2014-2 | RIA - Research and Innovation action.





Location & duration

The meeting will be held on Friday, 24th of July 2015, from 14.00 to 18.40 at the [location].

Agenda

Time | Session
14:00-14:10 | Welcome: Setting up & introduction to the event (10 mins)
14:10-14:30 | Introduction to OpenMinTeD: Introduction to the OpenMinTeD project (general, aims & objectives) (20 mins)
14:30-14:50 | About the event: Aims & objectives, and methodology to be followed in the event (20 mins)
14:50-15:10 | Presentation of the specific community: Presentation of the community represented at the workshop (user types, data sources and data-related issues)
15:10-15:20 | Short break
15:20-16:20 | Interactive Session I: Identification of requirements (persona characteristics, data requirements, challenges, proposed solution)
16:20-17:00 | Interactive Session II: Drawing the wireframes
17:00-17:20 | Coffee break
17:20-18:00 | Interactive Session III: Design of user workflows
18:00-18:20 | Wrap-up: Overview of the workshop, discussions on the outcomes

Approximate duration of the event: 4.5 hours (duration may vary)



Useful information

Reaching the meeting premises

Please provide short and useful information on how the participants will be able to reach the meeting premises using public transportation or their own means. A map (screenshot or a URL to a Google map) would also be useful in some cases.

FIGURE 17: EXAMPLE OF A MAP GUIDING PARTICIPANTS TO THE MEETING PLACE

Necessary material

Will the participants need something specific with them during the meeting? Will they need e.g. their own laptops or to work on their own ideas before the workshop? If so, please mention that here.



List of Participants

This section will be useful for keeping track of the participants of the event. Alternatively, an online registration form can be used; however, the people that actually participated need to be highlighted.

Name | Organization | Role | Email



13| ANNEX F: SCRIPT FOR RUNNING A WORKSHOP

Also included as a separate file at the project’s common space.

Columns: Time | Basic event activity | Coordinated by (duration) | Support time | Support activity | People assigned | Notes
(Empty cells are shown as "-"; text lost in the source is shown as "[...]".)

Slot 13.30-14.00: Event organizers go to the venue to make sure that everything is in place. More specifically:
13.40-14.00 | Make sure all the materials needed are available | Person 1 (30') | - | - | Assistant 1 | Refers to all necessary materials (e.g. post-its, whiteboard, flipchart sheets, markers, laptop & projector etc.)
13.30-14.00 | Discuss with local hosts to make sure everything is OK | Person 2 (30') | - | - | Assistant 2 | Ensure wireless connectivity to the internet, availability of PPTs on the laptop, working power outlets, set up projector etc.
13.40-14.00 | Short meeting to go over the entire meeting | Facilitators (20') | - | - | - | Allocate presentations

Slot 14.00-14.10: Event starts
14.00-14.10 | Setup, settle down, introduction | Person 1 (10') | 14.00-14.10 | Documentation with photos | Assistant 1 | -
14.00-14.10 | Setup, settle down, introduction | Person 2 (10') | 14.00-14.10 | Helping people to set up | Assistant 2 | -

Slot 14.10-15.10: Presentations
14.10-14.50 | Introduction to OpenMinTeD & About the event | Facilitators (40') | 14.10-15.10 | Help facilitators switch between PPTs | Assistant 1 | -
14.50-15.10 | Facilitators lead the group into a group activity (to be decided) | Facilitators (20') | 14.10-15.10 | Documentation with photos, videos | Assistant 2 | -

Slot 15.10-15.20: Short break
15.10-15.20 | Short break to stretch feet and get up, not a long coffee break | Person 1 (10') | 15.10 | Announcement of short break | Assistant 1 | Help participants if needed (e.g. smoking area)

Slot 15.20-16.20: Interactive Session I
15.20-15.35 | Facilitators explain the concept of the interactive session to the participants | Facilitators (15') | 15.20-16.20 | Organize participants in groups if needed, explain the concept | - | -
15.35-16.10 | Participants work on the identification & recording of requirements using the templates provided | Participants (60') | 15.20-15.30 | Ensure availability of paper and markers | Assistant 1 (10') | -
16.10-16.30 | Announce end of Session I, participants to finalize their work | Facilitators (15') | 16.10-16.20 | Ensure that all groups will finalize their work & documentation of session with photos | Assistant 2 (10') | -
16.20 | Upcoming coffee break | Person 1 (30') | 16.00-16.20 | Arrange practical details for coffee breaks coming up | Assistant 1 (20') | Ensure coffee availability, snacks etc.

Slot 16.20-17.00: Interactive Session II
[...] | [...] participants | Facilitators (10') | 16.20-16.35 | Documentation with photos | Assistant 2 (15') | -
16.30-17.00 | Participants work on the drawing/design of wireframes | Participants (30') | 16.20-16.30 | Ensure availability of paper and markers | Assistant 1 (10') | -
16.50-17.00 | Announce end of Session II, participants to finalize their work | Facilitators (10') | 16.50-17.00 | Ensure that all groups will finalize their work & [...] | [...] | -

Slot 17.00-17.20: Coffee break
16.55 | Assistant 1 makes the announcement for the coffee break | Assistant 1 (5') | - | - | - | -
17.00-17.15 | Facilitators meet to briefly discuss upcoming phase | Facilitators (15') | - | - | - | -
17.15 | Assistant 1 makes the final announcement for people to come back in the rooms | Assistant 1 (5') | - | - | - | -

Slot 17.20-18.00: Interactive Session III
[...] | [...] participants | Facilitators (10') | 17.20-17.40 | Documentation with photos | Assistant 2 (20') | -
17.30-18.00 | Participants work on the drawing/design of workflows | Participants (30') | 17.20-17.30 | Ensure availability of paper and markers | Assistant 1 (10') | -
17.50-18.00 | Announce end of Session III, participants to finalize their work | Facilitators (10') | 17.50-18.00 | Ensure that all groups will finalize their work & [...] | [...] | -

Slot 18.00-18.20: Conclusions / Wrap-up
18.00-18.10 | Overview of the workshop, discussions on the outcomes | Facilitators (10') | 18.00-18.20 | Documentation with photos | Assistant 1 (20') | -
18.10-18.20 | Participants to provide their feedback on the workshop | Participants (10') | 18.20-18.30 | [...] | - | -

After the event
- | Adapt online feedback questionnaire to meet the needs of the event | Facilitators (30') | - | - | - | -
- | Share questionnaire with participants through email | Assistant 1 (10') | - | - | - | -
- | Evaluate outcomes of the event and prepare report | Facilitators (3h) | - | - | - | -
- | Organize and share photos & videos from the event | Assistant 1 (60') | - | - | - | -



14| ANNEX G: METHODOLOGY OF THE WORKSHOP PPT TEMPLATE

Also included as a separate file at the project’s common space.

15| ANNEX H: INTERACTIVE SESSION I PPT TEMPLATE

Also included as a separate file at the project’s common space.

16| ANNEX I: INTERACTIVE SESSION II PPT TEMPLATE

Also included as a separate file at the project’s common space.

17| ANNEX J: INTERACTIVE SESSION III PPT TEMPLATE

Also included as a separate file at the project’s common space.

18| ANNEX K: REPORTING FORM FOR A WORKSHOP

Also available as a separate file at the project’s common space.

Reporting Template for Organizers of the OpenMinTeD Workshops

For User Requirements

Formal Data:

Name of Institution organizing the event:

Date of event:

Location of event:

Number of Participants:

Type of participants:

Names of facilitators:

Any other supporting person? If yes, please list the name and function of this person:

Summary of the Workshop:

Content related Data:

List issues/questions raised during the session Presentation of the OpenMinTeD project

List issues/questions raised during the session Presentation of the Methodology

List issues/questions raised during the Interactive Session I



List issues/questions raised during the Interactive Session II

List issues/questions raised during the session Closing - Discussion

Other comments?
