the male fertility gene atlas - a web tool for collecting ...€¦ · 10/02/2020  · male...

23
Male Fertility Gene Atlas (MFGA) 1 The Male Fertility Gene Atlas – A web tool for collecting and integrating data about 1 epi-/genetic causes of male infertility 2 3 Running title: Male Fertility Gene Atlas (MFGA) 4 5 H. Krenz 1 , J. Gromoll 2 , T.Darde 3 , F. Chalmel 3 , M. Dugas 1 , F. Tüttelmann 4 * 6 7 1 Institute of Medical Informatics, University of Münster, 48149 Münster, Germany 8 2 Centre of Reproductive Medicine and Andrology, University Hospital Münster, 48149 Münster, 9 Germany 10 3 Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) 11 - UMR_S 1085, F-35000 Rennes, France 12 4 Institute of Human Genetics, University of Münster, 48149 Münster, Germany 13 14 *Correspondence address: Institute of Human Genetics, University of Münster, Vesaliusweg 15 12-14, 48149 Münster, Germany, [email protected] 16 17 18 Word count: abstract 276, main text: 3914 19 20 All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted February 13, 2020. . https://doi.org/10.1101/2020.02.10.20021790 doi: medRxiv preprint NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.

Upload: others

Post on 02-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    1 

The Male Fertility Gene Atlas – A web tool for collecting and integrating data about 1

epi-/genetic causes of male infertility 2

3

Running title: Male Fertility Gene Atlas (MFGA) 4

5

H. Krenz1, J. Gromoll2, T.Darde3, F. Chalmel3, M. Dugas1, F. Tüttelmann4* 6

7

1Institute of Medical Informatics, University of Münster, 48149 Münster, Germany 8

2Centre of Reproductive Medicine and Andrology, University Hospital Münster, 48149 Münster, 9

Germany 10

3Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) 11

- UMR_S 1085, F-35000 Rennes, France 12

4Institute of Human Genetics, University of Münster, 48149 Münster, Germany 13

14

*Correspondence address: Institute of Human Genetics, University of Münster, Vesaliusweg 15

12-14, 48149 Münster, Germany, [email protected] 16

17

18

Word count: abstract 276, main text: 3914 19

   20

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.

Page 2: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    2 

Abstract 21

Interconnecting results of previous OMICs studies is of major importance for identifying novel 22

underlying causes of male infertility. To date, information can be accessed mainly through 23

literature search engines and raw data repositories. However, both have limited capacity in 24

identifying relevant publications based on aggregated research results e.g. genes mentioned 25

in images and supplements. To address this gap, we present the Male Fertility Gene Atlas 26

(MFGA), a web tool that enables standardised representation and search of aggregated result 27

data of scientific publications. An advanced search function is provided for querying research 28

results based on study conditions/phenotypes, meta information and genes returning the exact 29

tables and figures from the publications fitting the search request as well as a list of most 30

frequently investigated genes. As basic prerequisite, a flexible data model that can 31

accommodate and structure a very broad range of meta information, data tables and images 32

was designed and implemented for the system. The first version of the system is published at 33

the URL https://mfga.uni-muenster.de and contains a set of 46 representative publications. 34

Currently, study data for 28 different tissue types, 32 different cell types and 20 conditions is 35

available. Also, ~5,000 distinct genes have been found to be mentioned in at least ten of the 36

publications. As a result, the MFGA is a valuable addition to available tools for research on the 37

epi-/genetics of male infertility. The MFGA enables a more targeted search and interpretation 38

of OMICs data on male infertility and germ cells in the context of relevant publications. 39

Moreover, its capacity for aggregation allows for meta-analyses and data mining with the 40

potential to reveal novel insights into male infertility based on available data. 41

42

43

Keywords: male infertility, genetics, epigenetics, omics, database 44

45

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 3: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    3 

Introduction 46

Male infertility is a prevalent and highly heterogeneous disease for which the underlying causes 47

can currently be identified for only about 30% of male partners in infertile couples (Tüttelmann 48

et al., 2018). In the last years, a growing number of studies have been published on the 49

genetics and epigenetics of male infertility using a broad range of genomic technologies. For 50

an overview see Oud et al. (2019). Often, researchers give extensive insight into their findings 51

by providing supplementary data or access to their raw data. Theoretically, this enables 52

clinicians and other researchers to validate those findings or interpret their own results in a 53

broader context; in practice, however, the findability and reusability of this vast set of 54

information is limited. To date, there is no public resource for the field of male infertility that 55

enables clinicians and researchers alike to easily access a comprehensive overview of recent 56

findings, although tools for other diseases have long been established, such as, for example, 57

the Gene ATLAS (Canela-Xandri et al., 2018) that was released in 2017. It provides information 58

on genetic associations of 778 traits identified in genome-wide association studies (Canela-59

Xandri et al., 2018), male infertility is, unfortunately, not one of them. 60

Instead of being accessible through a specified tool, research on male infertility relies mainly 61

on general search engines for scientific publications like PubMed or Google Scholar. These 62

engines can be employed to search for information based on keywords of interest, e.g., gene 63

names in combination with specific conditions. However, if the required information can only 64

be found in a supplementary table or figure, these search engines are unable to mark the 65

corresponding publication as relevant. Other sources of information are raw data repositories 66

like Gene Expression Omnibus (GEO) which provides access to the raw data files of 67

microarray and genomic data from many publications (Barrett et al., 2013), or Sequence Read 68

Archive (SRA). This is a marked achievement for the scientific community, but such 69

repositories do not allow users to find a data file based on a gene or variant of interest. For the 70

most part, publications of interest have to be identified in advance, and comprehensible insight 71

into the data can only be achieved by reanalysing it, using complex bioinformatics pipelines. 72

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 4: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    4 

To address this need specifically for the reproductive sciences, Chalmel and colleagues 73

established the ReproGenomics Viewer (RGV) in 2015 (Darde et al., 2019; Darde et al., 2015). 74

It provides a valuable resource of manually-curated transcriptome and epigenome data sets 75

processed with a standardised pipeline. RGV enables the visualisation of multiple data sets in 76

an interactive online genomics viewer and, thus, allows for comparisons across publications, 77

technologies and species (Darde et al., 2019). However, the RGV is not designed to show 78

downstream analysis results such as differentially expressed genes or genomic variation. Also, 79

it is not possible to identify relevant publications based on genes of interest. Other tools in this 80

field are GermOnline (Lardenois et al., 2010) and SpermatogenesisOnline (Zhang et al., 2013). 81

However, both have not been maintained in recent years. 82

In order to offer a public platform that provides access to a comprehensive overview of 83

research results in the field of epi-/genetics of male infertility and germ cells and to bridge the 84

gap between textual information in publications on the one hand and complex data sets on the 85

other hand, we designed, developed, and now introduce the publicly available Male Fertility 86

Gene Atlas (MFGA, https://mfga.uni-muenster.de). Its objective is to provide fast, simple and 87

straightforward access to aggregated analysis results of relevant publications, namely by 88

answering questions like “What is known about the gene STAG3 in the context of male 89

infertility?” or “Which genes have been identified to be associated with azoospermia?“. To this 90

end, we created an advanced search interface as well as comprehensive overviews and 91

visualisations of the publications and search results. A basic prerequisite was that we had to 92

design and implement a data model that can accommodate and structure a very broad range 93

of meta-information, data tables and images. 94

95

Materials and methods 96

Requirements engineering 97

The MFGA is designed to support researchers and clinicians in the field of male infertility in 98

finding and evaluating the analysis results of relevant publications. To ensure that the MFGA 99

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 5: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    5 

offers a relevant scope and focusses on the central needs and interests of the targeted user 100

group, an extensive requirements engineering process had to be employed. This process 101

aimed at defining the most relevant parts of the system design; in case of the MFGA, this 102

means answering the following five questions: 103

1. What kind of data should be available? 104

2. How should the database be maintained and kept up to date? 105

3. Who should be authorised to access the data on the website? 106

4. What interface should be provided to access the data? 107

5. How should the data be visualised? 108

Since user-centric design is proven to be a critical factor of success for software projects 109

(Maguire and Bevan, 2002), the requirements were acquired in close cooperation with a large 110

group of prospective users of the MFGA. These researchers and clinicians are part of the 111

Münster-based clinical research unit (CRU326) (http://www.male-germ-cells.de) and provided 112

input on a broad range of aspects of male infertility, germ cell development as well as sperm 113

morphology, motility, and function. The development of the MFGA is based on the software 114

development and requirements engineering paradigm Rapid Prototyping, which focusses on 115

developing early testing versions of the product to enable informed user feedback and iterative 116

improvements (Budde et al., 1992; Gordon and Bieman, 1995). 117

The requirements analysis was based on a set of 39 representative and relevant 118

publications/data sets that the MFGA should be able to host, provided by members of the 119

CRU326. The publications were highly heterogeneous and covered a broad range of 120

methodologies, from GWA studies to targeted genotyping, microarray, bulk or single-cell RNA 121

sequencing as well as methylation. The complete list is available in Suppl. Tab. S1 and has 122

already been included into the MFGA database. 123

124

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 6: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    6 

Current state of requirements 125

The requirements regarding the kind of data that should be available in the MFGA can be 126

summarised in two points: First, the MFGA should host data representing the analysis results 127

of a comprehensive set of publications on male infertility. Generally speaking, the MFGA needs 128

to be able to deal with publications of arbitrary form and content. To account for the extremely 129

high degree of heterogeneity between publications, a very flexible, modular data model is 130

required. However, to enable the MFGA’s central service of offering selective data retrieval 131

operations, it is crucial to identify, standardise and tag recurrent relevant information items and 132

to curate them manually for each publication. Whenever applicable, standardisation should be 133

based on ontologies as provided by OBO Foundry (Smith et al., 2007). Second, the MFGA 134

should provide complementary data from external databases to increase the comprehensibility 135

of information, e.g., to explain the background of gene names and provide links to more details. 136

However, the MFGA is not supposed to contain any raw data; instead the system should link 137

to the corresponding repositories if provided by a publication. Also, the MFGA is not planned 138

for processing raw data by itself; the system is only intended to show aggregated result data 139

provided by publications. 140

The database of the MFGA should be fully public and accessible to all researchers and 141

clinicians in the field of male infertility. Maintenance and curation of selected publications 142

should be provided by the development team. In order to keep the database up to date in a 143

structured way, a recurring process for preparing new relevant publications will be 144

implemented (Fig. 1). It comprises three main steps: selection of relevant publications, manual 145

curation according to the requirements of the MFGA database and data upload. Prospectively, 146

it should be triggered once per month and combine the proposed publications of four different 147

sources into one prioritised list. The four information sources are (1) the list of not yet 148

processed publications from the previous month, (2) newly published manuscripts from the 149

CRU326, (3) other relevant new publications identified by the MFGA team, and, as an 150

additional unbiased source, (4) recent publications queried from PubMed (query: “(male OR 151

men) AND (fertility OR infertility) AND (gene OR genetic) AND ("YYYY/MM/01"[Date - 152

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 7: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    7 

Publication] : "YEAR/MM/31"[Date - Publication])”). Publications will be curated and uploaded 153

by priority. Eventually, users are supposed to be enabled to act as registered contributors and 154

support selection, curation and upload of content. Thus, a user management and 155

authentication system is required. 156

Regarding the interface for accessing the information, users should be able to view a complete 157

list of publications and their key features to get a quick overview on the full content of the data 158

base. Additionally, an advanced search function should provide identifying and showing 159

subsets of relevant publications based on the occurrence of gene names or IDs in tables and 160

figures as well as meta-information like data type, OMICS, processed tissues/cells and 161

conditions/species. Detailed information and results of each individual publication should be 162

displayed by comprehensible, standardised overviews, including images, tables and on-163

demand plots. As a special feature, the MFGA should display data from different publications 164

in an aggregated way to allow for meta-analyses and integrated analyses. All pages should be 165

enriched with appropriate additional information from external databases and with cross 166

references, allowing for quick navigation through the atlas. 167

168

Results 169

IT architecture 170

The MFGA has been implemented as a modern Java web application that is securely hosted 171

on a server at the University of Münster, Germany, and can be accessed freely via the URL 172

https://mfga.uni-muenster.de. Its architecture (Suppl. Fig. S1) is based on the specific 173

requirements, identified during requirements engineering and explained in detail in the 174

supplement. For a good user experience, the graphical user interface utilises the Bootstrap 175

library which enables a smooth adjustment of the application to different browsers and devices 176

(Otto et al., 2020) and familiar design elements. On the server side, Spring Security framework 177

is employed to provide secure authentication processes and manage data access privileges 178

(Pivotal Software Inc., 2019). 179

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 8: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    8 

Data model 180

The specific relational data model of the MFGA as well as the preparation process for 181

publication data are a direct result of the requirements analysis and were developed based on 182

the initial set of relevant publications. They are the main prerequisite for enabling complex 183

search queries, aggregation functions and meta analyses on the MFGA’s database. For 184

representing the highly heterogeneous publications, a flexible and thus modular data model 185

was designed (Suppl. Fig. S2). It is based on information items that were identified to be 186

preserved over large subsets of publications on male infertility and standardised to seven 187

classes of information items: publication meta information, data set meta information, 188

processed tissue type, processed cell type, cohort’s condition, image and table (Tab. 1). For 189

each publication, they can freely be combined in any quantity including zero. In order to prevent 190

inconsistencies in the database caused by misspelling or synonymous terms, structured data 191

entry with drop-down lists and ontologies is implemented. Currently, the MFGA employs 192

BRENDA Tissue Ontology (BTO) (Gremse et al., 2011) for tissue type definition, Cell Ontology 193

(CL) (Diehl et al., 2016) for cell types and Human Phenotype Ontology (HPO) (Köhler et al., 194

2019) for conditions. Further technical specification, especially representation and annotation 195

of data tables and images, is explained in the extended methods (see supplement). 196

197

Available content 198

Currently there are 46 publications available on the MFGA, and they comprise information on 199

101 data sets (Fig. 2), 28 different tissue types, 32 different cell types and 20 conditions. The 200

majority of data sets contains data on the transcriptome (22 single-cell RNA sequencing, 12 201

bulk RNA sequencing & 3 microarray). The second largest group of data sets contains 202

information on the genome / exome (13 targeted genotyping, 9 genome-wide association 203

studies, 4 whole-genome sequencing & 4 sanger sequencing). Additionally, 6 data sets on 204

whole-genome bisulfite sequencing and 2 on targeted deep bisulfite sequencing are available. 205

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 9: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    9 

The most frequent conditions in the data sets are variants of abnormal spermatogenesis, e.g. 206

azoospermia and oligozoospermia. Testicular tissue has been processed by 32 proteome, 207

methylome and transcriptome data sets. The predominantly represented cell types are 208

embryonic stem cells, primordial germ cells, spermatogonia, spermatocytes, and spermatids. 209

Also, ~5,000 distinct genes have been found to be mentioned in at least ten of the publications, 210

with the top genes being TEX11, TEX14, TEX15, SYCP3, SMC1B, REC8 and HORMAD1. 211

The full list of 46 publications can be accessed via the URL https://mfga.uni-212

muenster.de/publication.html. 213

214

Functionality 215

The MFGA provides a web interface for fast, simple, and straightforward access to the results 216

of publications on the epi-/genetics of male infertility and germ cells. For this purpose, an 217

advanced search form is provided on the Search tab of the atlas (Suppl. Fig. S3), enabling the 218

specification of search requests for relevant data sets. This function supports search terms 219

from seven categories: OMICS, data type (e.g., single-cell RNA sequencing or targeted 220

sequencing), condition, species, cell type, tissue type and gene/ID. For each of the first six 221

categories, one can choose from a list of search terms equivalent to the information items 222

stored in the database. Condition, cell type and tissue type are based on appropriate 223

ontologies: HPO (Köhler et al., 2019), CL (Diehl et al., 2016) and BTO (Gremse et al., 2011). 224

The number of available data sets is shown in brackets after each search term. Gene names 225

and IDs can be specified in a text field as a comma-separated list. Also, there are three search 226

modalities: (1) users can perform a broad search to identify all data sets that are annotated 227

with at least one of the specified search terms, (2) users can perform a more targeted search 228

for all data sets that are annotated with at least one of the specified search terms per category 229

and (3) users can perform a very specific search for all data sets that are annotated with each 230

of the search terms (default option). As an example, Fig. 3 shows an extract of the output of a 231

simple search request for data sets in the MFGA database containing information on the gene 232

STAG3 (Suppl. Fig S2 for the full screenshot). This refers back to the introductory question: 233

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 10: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    10 

“What is known about the gene STAG3 in the context of male infertility?”. For the following 234

examples, the gene STAG3, which is located on chromosome 7, encodes a protein involved 235

in meiosis and can be related to male infertility (van der Bijl et al., 2019), is employed to explain 236

the general functionalities of the MFGA. A detailed Walk Through is provided on 237

https://mfga.uni-muenster.de/walkThrough.html. 238

Executing a search request on the MFGA results in a list of data sets matching the 239

requirements, represented via charts and result tables (Fig. 3 and Suppl. Fig. S3). The left 240

chart shows the number of returned data sets grouped by publication. Multiple data sets 241

matching the search request in one publication can indicate that the authors provided further 242

proof for their findings, such as e.g., van der Bijl et al. (2019), who screened two cohorts of 243

infertile men. A textual summary of the data sets returned by the search request and some 244

important meta-information is given by the Data sets table. All tables in the MFGA format are 245

fully interactive such that the columns can be sorted, filtered and plotted online. Whenever the 246

search contains genes or IDs, the right chart represents their frequencies in the different 247

publications. This provides an initial estimation of relevance. Additionally, a second table lists 248

their specific occurrences in data tables and figures. In the case when a search request is 249

targeted at revealing relevant genes, such as, e.g., in the introductory question: “Which genes 250

have been identified to be associated with azoospermia?” (Suppl. Fig. S4), the right chart and 251

second table present the genes that occur most frequently in data tables of the returned data 252

sets. 253

The search results returned by the MFGA are designed for quick navigation and information 254

access. Therefore, many cross references are provided: Table entries directly link to overview 255

pages of the corresponding publication and its data sets. Data tables and images are linked in 256

the MFGA as well. However, this functionality is restricted to publications that are published 257

under an open access license. As an example, Fig. 3 shows that the publication van der Bijl et 258

al. (2019) mentions STAG3. Clicking on the data set title leads to the overview page of that 259

publication (Suppl. Fig. S5). Also, data tables containing the searched gene can directly be 260

accessed in MFGA by a single click, e.g., supplementary table 3 of van der Bijl et al. (2019) 261

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 11: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    11 

which is shown partly in Fig. 4 (full version in Suppl. Fig. S6). For a deeper analysis of the 262

underlying read data of individual data sets, the MFGA provides a link to the RGV, whenever 263

a data set is available in both tools, such as, e.g., Hammoud et al. (2015) in Fig. 3. 264

Data from external public databases has been integrated in order to enrich information on the 265

publications shown in the MFGA. Throughout the application, gene names can be selected in 266

table cells and plots to open up an overlay with further explanations (for an example, see 267

Fig. 5). The overlay bundles textual information from RefSeq (O'Leary et al., 2016) and HGNC 268

(Yates et al., 2017; HGNC Database, 2018). Additionally, a link to the corresponding 269

GeneCards page (Stelzer et al., 2016) is provided, as well as a shortcut to the MFGA search 270

for that gene. Data from the GTEx project (Lonsdale et al., 2013; The Broad Institute of MIT 271

and Harvard, 2019) is employed to show the gene’s expression in different tissues scaled by 272

the largest median. The expression of the gene in testis tissue is highlighted in red. 273

274

Discussion 275

The MFGA provides a public platform to support researchers and clinicians in obtaining an 276

overview of the recent findings in the field of male infertility and germ cells. The web-based 277

tool includes recent libraries and methods for an improved user experience and straightforward 278

usability. In order to enable an advanced search for relevant publications, a relational data 279

model has been developed and implemented for knowledge representation. It records the 280

recurrent information items in a structured and consistent way and can accommodate arbitrary 281

publications from the field of male infertility, regardless of the kind of data analysis and 282

technology. The search form enables a broad range of simple to very complex search queries 283

such as “What is known about the gene STAG3 in the context of male infertility?”, “Which 284

genes have been identified to be associated with azoospermia?” or “Which genes have been 285

found to be expressed in Sertoli cells of human testis tissue using single cell RNA 286

sequencing?” (Suppl. Fig. S7). 287

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 12: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    12 

The main purpose of the MFGA is to enable researchers and clinicians to consider their own 288

research results in the context of other relevant literature. To this end, the system enables a 289

highly selective search for studies with comparable conditions and parameters and offers the 290

means to quickly review their main analysis results. Additionally, searches for genes can be 291

performed in order to identify data sets containing the corresponding genes in their analysis 292

results. This functionality supports the validation of candidate genes. Further, supplementary 293

information from various sources is embedded into the MFGA, e.g., RefSeq (O'Leary et al., 294

2016), HGNC (Yates et al., 2017; HGNC Database, 2018) and GTEx project (Lonsdale et al., 295

2013; The Broad Institute of MIT and Harvard, 2019). 296

Compared to general search engines like Google Scholar and PubMed, with millions of 297

records, the MFGA will always return smaller numbers of search results. However, in the 298

domain of male infertility, the relevance of the individual results in relation to the corresponding 299

search terms used in the MFGA is expected to be greater than when using those same search 300

terms in a large search engine. Search engines for literature are usually restricted to mining 301

textual information of publications and cannot consider information that is presented in tables, 302

images, or supplementary material. The MFGA, however, is specifically designed for that task, 303

which is enabled by its key features: A standardised data representation and disease-specific 304

manual curation. 305

Another problem occurs when relevant data sets are searched in raw data repositories like 306

GEO or processed data resources like RGV (Darde et al., 2015; Darde et al., 2019). While the 307

information that these tools are able to provide is much more detailed than overviews in the 308

MFGA, they cannot be used to identify a data set based on, for example, a list of genes that 309

are supposed to be differentially expressed, since raw data does usually not include that kind 310

of annotation. The MFGA facilitates searching through aggregated result data and, thus, 311

identifying relevant data sets. Once identified, such data sets might then be further investigated 312

in the RGV (Darde et al., 2015; Darde et al., 2019) or by processing GEO data with 313

bioinformatics tools. Since RGV (Darde et al., 2015; Darde et al., 2019) and MFGA 314

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 13: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    13 

complement each other in their functionality, the MFGA links to RGV whenever a data set is 315

present in both tools; further, an even closer integration is planned. 316

Regarding tools that might appear similar to the MFGA, Zhang et al. (2013) provide a tool 317

termed SpermatogenesisOnline for searching individual genes in the context of 318

spermatogenesis based on a set of publications from 2012 and earlier. However, it remains 319

unclear which publications were included and whether or not they have been updated since 320

then. Another tool from Lardenois et al. (2010), GermOnline, provides functional information 321

about individual genes and access to transcriptomics data from microarrays on germline 322

development from 2008. Both tools have, to our knowledge, not been updated and are, thus, 323

not very useful anymore. 324

There are some limitations to the approach of the MFGA. Since analysis results are not 325

reproduced using in-house pipelines, the MFGA is restricted to the analysis results authors are 326

presenting in their publications. In the case where a publication provides, e.g., only significant 327

SNPs from a GWA study, the MFGA, too, reports only these. The only meta-analyses enabled 328

are those that use the aggregation of information in the MFGA, e.g., the top score of gene 329

appearances, to approximate gene importance; the information cannot indicate correlation or 330

even causality. Thus, the MFGA focusses on proposing plausible research hypotheses. 331

Finally, full functionality of the MFGA can only be provided for publications under an open 332

access license. Publications that allow no or restricted reuse are only partly integrated. 333

Prospectively, the MFGA will be updated and enlarged based on the proposed content 334

management process (Fig. 1). In this context, a web form will be implemented to enable users 335

to propose publications that are, from their point of view, still missing. Additionally, a closer 336

integration of RGV and MFGA is planned, e.g. by coupling the publication submission forms, 337

as well as further tools for meta-analyses. 338

In conclusion, the MFGA is a valuable addition to available tools for research on the epi-339

/genetics of male infertility. It helps fill the gap between pure literature searches as provided 340

by PubMed and raw data repositories like GEO or SRA. The MFGA enables a more targeted 341

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 14: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    14 

search and interpretation of OMICS data on male infertility and germ cells in the context of 342

relevant publications. Moreover, its capacity for aggregation allows for meta-analyses and data 343

mining with the potential to reveal novel insights into male infertility based on available data. 344

Ultimately, by combining RGV and MFGA with AI methods, we aim to develop a powerful gene 345

prioritisation system dedicated to male infertility similar to the GPSy tool (Britto et al., 2012). 346

347

Acknowledgements 348

We are grateful to all members of the Clinical Research Unit (CRU) ‘Male Germ Cells’ who 349

contributed through either identifying relevant publications, supporting publication curation or 350

providing feedback during the development of the MFGA: Michael Storck, Marius Wöste, Nina 351

Neuhaus, Sven Berres, Sandra Laurentino, Lina Franziska Lanuza Pérez, Corinna Friedrich, 352

Maria Schubert, Nikithomas Loges, Isabella Aprea, Jana Emich, Eva Maria Mall, Johanna 353

Raidt, Nadja Rotte, Alexander Busch, Sara Kim Plutta, Jascha Henseler, Anna Natrup. We 354

thank Dr. Celeste Brennecka for language editing of the manuscript. 355

356

Authors’ roles 357

H.K. designed and developed the MFGA data model and system architecture, implemented 358

the tool and drafted the manuscript. J.G. provided critical input during the development of the 359

MFGA and first drafts of the manuscript. T.D. and F.C. contributed to integrating the mutual 360

links between RGV and MFGA. M.D. and F.T. contributed to the system design and supervised 361

the whole study. All authors critically revised the manuscript and approved the final version. 362

363

Funding 364

This work was carried out within the frame of the German Research Foundation (DFG) Clinical 365

Research Unit ‘Male Germ Cells: from Genes to Function’ (CRU326). 366

367

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 15: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    15 

Conflict of interest 368

The authors declare no conflicts of interest. 369

370

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 16: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    16 

References 371

Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy 372 KH, Sherman PM, Holko M, et al. NCBI GEO: archive for functional genomics data sets--373 update. Nucleic acids research 2013;41:D991-D995. 374

Britto R, Sallou O, Collin O, Michaux G, Primig M, Chalmel F. GPSy: a cross-species gene 375 prioritization system for conserved biological processes--application in male gamete 376 development. Nucleic acids research 2012;40:W458-65. 377

Budde R, Kautz K, Kuhlenkamp K, Züllighoven H. What is prototyping? Info Technology & 378 People 1992;6:89–95. 379

Canela-Xandri O, Rawlik K, Tenesa A. An atlas of genetic associations in UK Biobank. Nature 380 genetics 2018;50:1593–1599. 381

Chinosi M, Trombetta A. BPMN: An introduction to the standard. Computer Standards & 382 Interfaces 2012;34:124–134. 383

Darde TA, Lecluze E, Lardenois A, Stévant I, Alary N, Tüttelmann F, Collin O, Nef S, Jégou B, 384 Rolland AD, et al. The ReproGenomics Viewer: a multi-omics and cross-species resource 385 compatible with single-cell studies for the reproductive science community. Bioinformatics 386 (Oxford, England) 2019;35:3133–3139. 387

Darde TA, Sallou O, Becker E, Evrard B, Monjeaud C, Le Bras Y, Jégou B, Collin O, Rolland 388 AD, Chalmel F. The ReproGenomics Viewer: an integrative cross-species toolbox for the 389 reproductive science community. Nucleic acids research 2015;43:W109-W116. 390

Diehl AD, Meehan TF, Bradford YM, Brush MH, Dahdul WM, Dougall DS, He Y, Osumi-391 Sutherland D, Ruttenberg A, Sarntivijai S, et al. The Cell Ontology 2016: enhanced content, 392 modularization, and ontology interoperability. Journal of biomedical semantics 2016;7:44. 393

Gordon VS, Bieman JM. Rapid prototyping: lessons learned. IEEE Softw. 1995;12:85–95. 394

Gremse M, Chang A, Schomburg I, Grote A, Scheer M, Ebeling C, Schomburg D. The 395 BRENDA Tissue Ontology (BTO): the first all-integrating ontology of all organisms for enzyme 396 sources. Nucleic acids research 2011;39:D507-13. 397

Hammoud SS, Low DHP, Yi C, Lee CL, Oatley JM, Payne CJ, Carrell DT, Guccione E, Cairns 398 BR. Transcription and imprinting dynamics in developing postnatal male germline stem cells. 399 Genes & development 2015;29:2312–2324. 400

HGNC Database. HGNC Database: retrieved in November 2018: HUGO Gene Nomenclature 401 Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics 402 Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United 403 Kingdom, 2018. 404

Köhler S, Carmody L, Vasilevsky N, Jacobsen JOB, Danis D, Gourdine J-P, Gargano M, Harris 405 NL, Matentzoglu N, McMurry JA, et al. Expansion of the Human Phenotype Ontology (HPO) 406 knowledge base and resources. Nucleic acids research 2019;47:D1018-D1027. 407

Lardenois A, Gattiker A, Collin O, Chalmel F, Primig M. GermOnline 4.0 is a genomics gateway 408 for germline development, meiosis and the mitotic cell cycle. Database the journal of biological 409 databases and curation 2010;2010:baq030. 410

Lonsdale J, Thomas J, Salvatore Mea. The Genotype-Tissue Expression (GTEx) project. 411 Nature genetics 2013;45:580–585. 412

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 17: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    17 

Maguire M, Bevan N. User Requirements Analysis. In: Hammond J, Gross T, Wesson J (eds). 413 Usability. Boston, MA: Springer US, 2002. 414

O'Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B, 415 Smith-White B, Ako-Adjei D, et al. Reference sequence (RefSeq) database at NCBI: current 416 status, taxonomic expansion, and functional annotation. Nucleic acids research 2016;44:D733-417 D745. 418

Otto M, Thornton J, contributors aB. Introduction. Retrieved 09 January 2020. from 419 https://getbootstrap.com/docs/4.4/getting-started/introduction/, 2020. 420

Oud MS, Volozonoka L, Smits RM, Vissers LELM, Ramos L, Veltman JA. A systematic review 421 and standardized clinical validity assessment of male infertility genes. Human reproduction 422 (Oxford, England) 2019;34:932–941. 423

Pivotal Software Inc. Spring Security. Retrieved 10 December 2019. from 424 https://spring.io/projects/spring-security, 2019. 425

R Development Core Team. R: A language and environment for statistical computing. Vienna: 426 R Foundation for Statistical Computing, 2008. 427

Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland 428 A, Mungall CJ, et al. The OBO Foundry: coordinated evolution of ontologies to support 429 biomedical data integration. Nat Biotechnol 2007;25:1251–1255. 430

Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, Stein TI, Nudel R, 431 Lieder I, Mazor Y, et al. The GeneCards Suite: From Gene Data Mining to Disease Genome 432 Sequence Analyses. Current protocols in bioinformatics 2016;54:1.30.1-1.30.33. 433

The Broad Institute of MIT and Harvard. GTEx Portal. File: GTEx_Analysis_2016-01-434 15_v7_RNASeQCv1.1.8_gene_median_tpm.gct.gz. Retrieved 09 December 2019. from 435 https://gtexportal.org/home/datasets, 2019. 436

Tüttelmann F, Ruckert C, Röpke A. Disorders of spermatogenesis: Perspectives for novel 437 genetic diagnostics after 20 years of unchanged routine. Medizinische Genetik Mitteilungsblatt 438 des Berufsverbandes Medizinische Genetik e.V 2018;30:12–20. 439

van der Bijl N, Röpke A, Biswas U, Wöste M, Jessberger R, Kliesch S, Friedrich C, Tüttelmann 440 F. Mutations in the stromal antigen 3 (STAG3) gene cause male infertility due to meiotic arrest. 441 Human reproduction (Oxford, England) 2019;34:2112–2119. 442

Yates B, Braschi B, Gray KA, Seal RL, Tweedie S, Bruford EA. Genenames.org: the HGNC 443 and VGNC resources in 2017. Nucleic acids research 2017;45:D619-D625. 444

Zhang Y, Zhong L, Xu B, Yang Y, Ban R, Zhu J, Cooke HJ, Hao Q, Shi Q. 445 SpermatogenesisOnline 1.0: a resource for spermatogenesis based on manual literature 446 curation and genome-wide data mining. Nucleic acids research 2013;41:D1055-62. 447

448

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 18: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    18 

Table 1. Information classes and items available for each publication. 449

Information Class Information items1

Publication meta information

title publishing date

author important genes*

citation link to publication

abstract link to repository*

Data set meta information

title* OMICS type

technology* reference genome*

data type species

Processed tissue types

name maturity*

description* potential*

no of subjects* species

no of probes per subject* reference genome*

Processed cell types

name maturity*

description* potential*

no of subjects* species

no of cells per subject* reference genome*

Cohort's conditions Human phenotype ontology* id (Köhler et al., 2019)

name

size of cohort* comment*

Images title description*

annotation of genes or relevant IDs

path

Tables

title description*

annotation of standard columns (gene names, relevant IDs, loci, …)

annotation of all additional columns as numeric or text

450

1Optional items are marked with *. 451

452

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 19: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    19 

Figure 1. The process of identifying, selecting and prioritising novel publications 453

depicted based on Business Process Model and Notation (Chinosi and Trombetta, 2012). 454

The process is triggered monthly. Potential publications are collected and merged into a list. 455

This list is then prioritised by the MFGA team. Subsequently, curation and upload are 456

performed in the determined order. 457

458

459

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 20: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    20 

Figure 2. Data sets available on the MFGA grouped by technology. Currently 101 data 460

sets corresponding to 46 publications are fully curated and publically accessible. 461

Abbreviations: seq – sequencing, GWAS – genome-wide association study, WGBS – whole 462

genome bisulfite sequencing, BS – bisulfite. Plot created with R (R Development Core Team, 463

2008). 464

465

466

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 21: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    21 

Figure 3. Example screenshot of the MFGA’s search functionality (https://mfga.uni-467

muenster.de/search.html). It shows part of the result of the search for the gene STAG3 in the 468

MFGA database. The side bar on the left contains the search form. Here, STAG3 is typed into 469

the search field and search option ‘all search terms’ is checked. In the main window, plots 470

show the number of returned data sets and the frequency of STAG3 in data tables of the 471

corresponding publications. Below, the returned data sets are presented in a table enriched 472

with further meta-information, linking to the overview of data set and publication. The second 473

table shows the specific data tables and images in which the gene was found. Here, tables are 474

cropped and screenshot is compressed for better representability. See Suppl. Fig. S3 for 475

original screenshot. Retrieved 10 January 2020. 476

477

478

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 22: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    22 

Figure 4. Example screenshot of the MFGA’s functionality to represent data from 479

publications (https://mfga.uni-muenster.de/publication/results?ssDetailsId=3968811). It 480

shows part of supplementary table 3 of van der Bijl et al. (2019) in MFGA format. The table is 481

represented in an interactive way and can directly be filtered and sorted. Additionally, 482

histograms can be plotted for the table columns, here, e.g., based on column “Testis 483

Expression”. For the full screenshot see Suppl. Fig. S6. Retrieved 10 January 2020. 484

485

486

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint

Page 23: The Male Fertility Gene Atlas - A web tool for collecting ...€¦ · 10/02/2020  · Male Fertility Gene Atlas (MFGA) 2 21 Abstract 22 Interconnecting results of previous OMICs studies

Male Fertility Gene Atlas (MFGA)

    23 

Figure 5. Screenshot of the MFGA showing the overlay for additional information on 487

genes (https://mfga.uni-muenster.de/publication/results?ssDetailsId=3968811). Here, as an 488

example, the overlay for the gene STAG3 is shown. The textual information originates from 489

RefSeq (O'Leary et al., 2016) and HGNC (Yates et al., 2017). Additionally, gene expression in 490

different tissues is shown based on the data from the GTEx project (Lonsdale et al., 2013). 491

Links for searching the gene in the MFGA database and opening the corresponding 492

GeneCards entry (Stelzer et al., 2016) are provided. Retrieved 10 January 2020. 493

494

All rights reserved. No reuse allowed without permission. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for thisthis version posted February 13, 2020. .https://doi.org/10.1101/2020.02.10.20021790doi: medRxiv preprint