

Technical background document
Molecular Data Sharing for Public Health Purposes

Preface

This document outlines the potential strengths of the free exchange of genomic data for public health protection, and the limitations currently hampering the effective implementation of data sharing. It is intended as a guide for the conference on this topic, to be held in Utrecht, The Netherlands, 6-8 December 2009. At this conference we aim to bring together representatives of the relevant actors in the field and to produce a roadmap towards sustainable data exchange in public health.

In recent years there has been much talk about data sharing in genomics, but discussions have focused primarily on the high-profile human genome projects. From these discussions the Bermuda Principles were put forward in 1996, and they were confirmed and strengthened by the Fort Lauderdale agreement in 2003. These documents outline the key principles of open access currently applied to genomic data sharing in human genome projects. These agreements do not, however, extend to the field of microbial genomics, where other factors influence opinions and attitudes, and with them decision making and what is possible. For example, ethical considerations in microbial genomics are quite different from those in human genomics, as are the public health issues at stake, which pertain mainly (or only) to infectious diseases with a microbiological (viral, bacterial, parasitic) etiology.

Microbial genomics is developing extremely fast. Laboratories across the world that detect and characterize pathogens are exploring new microbial genomics techniques, yielding pathogen-specific genome data at an ever increasing speed. These genomic data sometimes have commercial value. Development mostly takes place in academia and industry, where innovation is a powerful driving force but is not necessarily coupled with the "down-to-earth", open access focus on the questions addressed by public health officials. Thus the public health arena, where real-time data exchange is sometimes a matter of life and death for many, is struggling to keep up.

Discussions on -…- are taking place among clinicians, people involved in the response to infectious disease outbreaks, epidemiologists and laboratory scientists involved in risk management, surveillance and research, and bioinformatics experts. In addition to these problem-oriented groups, interaction is needed at a more strategic level, dealing with issues such as data and specimen sharing, confidentiality, intellectual property and the market value of surveillance data, all of which may jeopardize the future of pathogen-based surveillance systems. A properly functioning data-sharing network for public health requires political commitment and agreed rules of engagement. As a result, there is increasing demand for guidance for organizations such as DG SANCO, ECDC, EFSA, WHO, FAO and Codex.

Here we describe … [to be completed once the content of the document is final]

Note: WHA is in May 2010


1 General intro on molecular data

i) What is molecular data?

The world of micro-organisms formally includes bacteria, Archaea, yeasts and fungi, and unicellular animals (the Protista). Informally, and technically incorrectly, the term micro-organism is also used to refer to subcellular structures such as plasmids and viruses, and to plant, animal and human cell lines. We will focus on data pertaining to (emerging) pathogenic micro-organisms and viruses.

In this document, molecular data means the data required to take adequate action to protect public health from infectious diseases. Such data generally comprise complete or partial genome sequences of pathogens of a microbial, parasitic or viral nature. Sequences may be DNA, RNA or amino acid sequences.

It is relevant to note here that many pathogens are highly changeable at the genetic level and consequently at the phenotypic level. Viruses in particular may have high mutation rates, which has several implications for molecular genomics.

We also feel it is important to note early on that information additional to the genomic data has great potential added value. We need to combine biological data across geographic and disciplinary boundaries. A sequence by itself is just a sequence. With epidemiological information as simple as a detection date, place and source it is already much more, and numerous other relevant pieces of information can be thought of. Combining data in this way has been made possible by technological advances in both the biological sciences and information and communication technology.
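The point that a sequence gains value once it carries a detection date, place and source can be made concrete with a small sketch. The record layout and the function name below are illustrative assumptions, not a schema used by any existing database; the example only shows how even this minimal metadata allows identical sequences found at different places to be linked.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IsolateRecord:
    """One pathogen sequence plus minimal epidemiological context (hypothetical schema)."""
    sequence: str        # nucleotide or amino acid string
    detected_on: date    # detection date
    place: str           # where the sample was taken
    source: str          # e.g. "patient", "food", "animal"


def shared_sequences(records):
    """Group records by identical sequence; keep groups seen in more than one place."""
    by_sequence = defaultdict(list)
    for record in records:
        by_sequence[record.sequence].append(record)
    return {
        seq: group
        for seq, group in by_sequence.items()
        if len({r.place for r in group}) > 1
    }


# Made-up example data, for illustration only.
records = [
    IsolateRecord("ATGGCA", date(2009, 3, 1), "Utrecht", "patient"),
    IsolateRecord("ATGGCA", date(2009, 3, 4), "Ghent", "food"),
    IsolateRecord("ATGTCA", date(2009, 3, 5), "Utrecht", "patient"),
]
linked = shared_sequences(records)
```

Here `linked` holds only the sequence that was detected in two places. Real analyses would compare near-identical rather than strictly identical sequences; exact matching is used only to keep the sketch short.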

ii) How does molecular data contribute to public health safety?

Molecular data may be used for public health protection in a number of ways, and the different uses are interlinked. In general, genomic data are nowadays the preferred basis for the classification of viruses and of most bacteria. Thus, scientists use sequence data to communicate clearly about microbes.

Public health protection against (emerging) infectious diseases is often approached through a cyclic procedure entailing three main activities: surveillance/detection/signaling, intervention, and increasing knowledge.

Surveillance is essential for establishing knowledge of the baseline circulation of pathogens, and is usually implemented in the form of longitudinal studies. Such surveillance is applied to numerous pathogens by different (sorts of) institutions and in collaborative efforts across the world. Good examples are influenza viruses [1], MRSA [2], poliovirus [3] and norovirus [4]. This knowledge of baseline prevalence in turn enables the (early) signaling of emerging variants (increasing incidence and/or a change in prevalent strains), or the identification of a common source of pathogens causing disease at different geographic locations (e.g. through contaminated foodstuffs). Additionally, detailed sequence data can be vital for monitoring the (possibly changing) virulence potential of a pathogen. Alternatively, when illness due to an unknown pathogen occurs, a metagenomic study of patient materials, for example, can be used to reveal genomic material of the etiologic agent.

Following from this stage, intervention strategies can be devised when needed. These typically entail the development and deployment of products like diagnostic assays and medicines / drugs, but also more basic actions like altering surveillance levels and quarantining patients. The intervention phase of this cycle involves valorization of the knowledge that was often gained during the surveillance phase. This valorization process usually takes place in commercial companies, who have economic interests that may differ from protection of public health.

[1] www.gisaid.org, www.offlu.net, www.fludb.org
[2] http://www.rivm.nl/earss/, http://www.harmony-microbe.net/
[3] http://www.who.int/immunization_monitoring/laboratory_polio/en/index.html/
[4] http://www.noronet.nl/fbve/background/, http://www.noronet.nl/noronet/, calicinet (in progress)


The cycle is completed by a phase of increasing knowledge and insight, gained from experiences during the surveillance and intervention stages. This stage usually entails research. Research with a public health focus is done at numerous sorts of research facilities, but typically these are national or international public health institutes, (academic) hospitals, or universities with a health focus in their academic program. The knowledge obtained from studying collected data catalyses new discoveries and research, which in their turn may shift or refine the focus of surveillance studies.

If genomic data describing pathogens are shared effectively and efficiently throughout the world, preparedness and intervention strategies that aim to minimize illness and death due to those pathogens can be drafted in a more timely, more efficient and more uniform manner across regions and, as such, are likely to be more effective.


2 A changing world asks for a new approach

i) Why this discussion and why now?

From the public health perspective we have observed that the situation around data sharing is changing. The processes underlying this change are outlined below; they include rapidly evolving technology, ongoing globalization, and ever increasing economic stakes, for example in the production of medicines. Additionally, the gap in available resources between developed and developing countries is still growing. The main effect of all these changes combined for public health protection is that data sharing is no longer a given, not even in situations of immediate threat to the global population. Therefore we attempt to put this discussion on the agenda of worldwide politics. While recognizing that the obstacles to real-time data sharing are genuine and valid, we aim to provide suggestions for alternative ways of publicizing data and governing ownership, in order to create a vision and road map towards a sustainable manner of real-time data sharing, thus enabling adequate interventions when future health threats are identified.

ii) An ever smaller world: molecular data should be analyzed through a global / collaborative approach

Pathogens differ greatly in how effectively they spread (in speed and range), owing to properties such as incubation time, mode of transmission, minimal infectious dose and level of shedding. Nevertheless, every infectious disease, anywhere in the world, is a potential threat to all people in the world. It is generally accepted that in modern times human behavioral patterns have greatly facilitated and accelerated the global spread of infectious pathogens; every infectious agent in the world is only as far away as the (airplane) traveling time it requires to get here, either in an infected individual or by another means of transport or transfer.

The fastest spreading and most infectious, and therefore most common, viruses, such as influenza and norovirus, can spread across the globe in a matter of weeks. When the basic reproduction number of a pathogen is low, or can be lowered by intervention strategies, spread is slower. SARS, for example, was spread across the world by travelers within several weeks and reached around 8 000 infections in about half a year. Intervention strategies, consisting mainly of quarantining patients and the people who had been in contact with them, kept SARS from growing into a larger epidemic in 2003.
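The effect of the reproduction number on the speed of spread can be illustrated with a deliberately simplified calculation. The sketch below assumes that every case causes a fixed average number of new cases in the next transmission generation, and it ignores immunity, chance effects and differences between individuals. The function name and the example values are illustrative assumptions, not estimates for any real pathogen.

```python
def expected_cases(reproduction_number, generations, initial_cases=1):
    """Expected cumulative cases after a number of transmission generations.

    Simplified model: each case causes `reproduction_number` new cases in the
    next generation; no depletion of susceptibles and no chance effects.
    """
    total = 0.0
    current = float(initial_cases)
    for _ in range(generations + 1):   # generation 0 holds the initial cases
        total += current
        current *= reproduction_number
    return total


# Illustrative values only, not measured reproduction numbers.
unchecked = expected_cases(3.0, 5)     # spread without intervention
controlled = expected_cases(0.5, 5)    # interventions push the number below 1
```

With a reproduction number above 1 the case count grows with every generation, whereas a value below 1 makes each generation smaller than the last, so the outbreak fades out. This is the reason that interventions such as quarantine aim to bring the number below 1.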

iii) The technology behind molecular biology is evolving rapidly

Given the ease with which new genomic data can be obtained these days, it is impractical not to share data; given the availability of samples of interest, any second scientist can very easily repeat the analyses done by the first. In (emerging) infectious disease research, however, the availability of sample material is often a bottleneck.

Growing technical possibilities have enabled extremely rapid, high-volume data generation, and this development is ongoing. Rapid (whole genome) sequencing of microbial pathogens has come within reach of many labs in the (developed) world, as has the screening of clinical samples for the presence of known and as yet undescribed pathogens, including viruses, bacteria and parasites. Sequencing speed and accuracy have increased tremendously during the past decades, so that the challenge now lies in analyzing and processing the data rather than in generating it. Currently, NCBI lists complete genomes for 2891 viruses and 40 viroids (2 September 2009, http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome). The Genomes Online Database (http://genomesonline.org) reports over 898 complete bacterial genomes, 66 Archaea and 116 Eukaryote full genome sequences, including the human genome and model organisms such as Caenorhabditis elegans, Arabidopsis thaliana and Drosophila melanogaster (2 September 2009). By definition, these numbers were outdated the second they were typed here.


Throughput times and costs have diminished drastically, as was well illustrated (and driven) by the Human Genome Project. The first draft of a full human genome took approximately 13 years and around $300 million [find correct figure] to accomplish. Reportedly, in 2009 the sequencing of a full human genome costs as little as around $30 000 to $70 000. A bacterial genome can be sequenced for under $5000 and a virus for around $100. Time-wise, developments are comparably fast: nowadays the whole genome of a virus, bacterium or fungus can ideally be obtained in a single sequencing run (454 sequencing) with a run time of 10 hours. A prize has been put up for the first team that can build a device and use it to sequence 100 human genomes within 10 days, with an accuracy of no more than one error in every 100,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $10,000 per genome. The first group to accomplish this stands to win $10 000 000 (http://genomics.xprize.org/).
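The prize conditions quoted above combine five separate thresholds, which is easy to lose sight of in running text. As a reading aid, the sketch below encodes them as a single check. The class and field names are hypothetical and chosen for illustration; only the threshold values are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class SequencingAttempt:
    """Hypothetical summary of a sequencing attempt; field names are illustrative."""
    genomes: int                  # number of human genomes sequenced
    days: float                   # total time taken
    errors_per_100k_bases: float  # error rate per 100,000 bases sequenced
    genome_coverage: float        # fraction of the genome accurately covered (0 to 1)
    cost_per_genome_usd: float    # recurring cost per genome


def meets_prize_criteria(attempt):
    """Check an attempt against the thresholds quoted in the text."""
    return (
        attempt.genomes >= 100
        and attempt.days <= 10
        and attempt.errors_per_100k_bases <= 1
        and attempt.genome_coverage >= 0.98
        and attempt.cost_per_genome_usd <= 10_000
    )
```

An attempt that misses any single threshold, for example one that costs $30 000 per genome, fails the check even if it satisfies all the others.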

A remaining challenge lies in the integration of physical (pertaining to materials) and informational (pertaining to digital data describing materials) resources, and mining the combined data.

It is important to note that, along with the increasing rate at which new data are being generated, the rate at which 'old' data are lost also increases.

iv) Modern IT and communication technologies enable new collaboration methods

Parallel to these technical developments in microbiology, and no less important, similar developments in information and communication technology have generated the computational power and infrastructure required for the analysis, storage, comparison, exchange and dissemination of the huge amounts of data involved. These developments have been crucial in enabling the advances in microbiology described above. Computer storage capacity and data processing power (memory capacity and processing speed) have grown continually. According to Moore's law, which was first posed in a 1965 paper [5] and has been updated several times since [6], capacity doubles, in its commonly quoted form, roughly every 18 months.
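To give a feel for what a fixed doubling time implies, the growth factor after t years with a doubling time of T years is 2^(t/T). The sketch below applies this with the 18-month figure quoted above as the default. The function name is an illustrative assumption, and the doubling time is a rule of thumb rather than a physical constant.

```python
def capacity_growth_factor(years, doubling_time_years=1.5):
    """Growth factor after `years`, assuming capacity doubles every `doubling_time_years`."""
    return 2.0 ** (years / doubling_time_years)


factor_3_years = capacity_growth_factor(3)     # two doublings
factor_15_years = capacity_growth_factor(15)   # ten doublings
```

Under this assumption capacity grows fourfold in three years and roughly a thousandfold in fifteen, which indicates how quickly the infrastructure for storing and analyzing sequence data can scale.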

Additionally, the internet has enabled entirely new methods of collaboration, opening the way to interactive networks with multiple users and contributors, rather than the earlier bilateral, one- or two-way scheme. While multi-user research collaborations are generally considered more effective and faster than traditional research carried out by single research groups or in small-scale collaborations, the conventional markers of a researcher's individual credit (e.g. publications, presentations) tend to be lost in such large-scale processes. Moreover, social factors play a big role in scientists' reluctance to participate in large-scale collaborations; it is much more pleasant and natural to collaborate with a few others whom you know well than with a large group of people with whom you may not be very familiar.

[5] Moore, Gordon E. (1965). "Cramming more components onto integrated circuits". Electronics Magazine. pp. 4. http://download.intel.com/museum/Moores_Law/Articles-Press_Releases/Gordon_Moore_1965_Article.pdf, accessed 20-10-2009
[6] See Wikipedia: http://en.wikipedia.org/wiki/Moore’s_law, accessed 20-10-2009

v) Access to molecular data

The process of data sharing [7] essentially entails making data accessible to others, either freely and without restrictions, or to a limited group of users, who may or may not have to pay for access and use. For genomic data, or sequences, this is generally done by depositing sequences in a searchable public sequence database, or alternatively in a database that is accessible to a defined and limited group of users.

Currently, sequence data of all organisms can be deposited in three major, publicly accessible databanks that share the submitted data amongst themselves (NCBI Entrez Genome Database GenBank, SAKURA DNA Databank of Japan (DDBJ), EMBL Nucleotide Sequence Database). Data deposited in these databanks can in principle be freely accessed, viewed and used by others, with no further obligation to the original submitter than mentioning the accession number assigned to the sequence. However, some sequences available in GenBank are patented. Their use is regulated under patent law, and licenses for use are required. These databanks increasingly offer tools that assist the user in data extraction, data mining and data analysis.

Besides these three major databanks, a growing number of databanks specialize in storing the genomes of specific organisms. For influenza, for example, a number of databanks exist, such as the Influenza Sequence Database (ISD) in Los Alamos, USA, and GISAID (Global Initiative on Sharing Avian Influenza Data). These specialized databases are not all publicly accessible, and right-to-use restrictions range from access for registered users only (where any scientist can register) to, for example, access for submitting parties only. Some examples are detailed later in this document.

It is not uncommon for sequences of (potential) scientific value obtained during research to be withheld from the public until a related research paper has been accepted for publication. This may take many months, and sometimes years. Additionally, in industry, and increasingly also in academia, academic spin-offs and some public health institutes, genomes or parts thereof are patented because of their potential monetary value. The Bayh-Dole Act, adopted in the US in 1980, deals with intellectual property (IP) derived from publicly funded research and allows researchers to claim patents on findings from such research. Many other countries, including EU member states, have adopted similar legislation.

vi) Define public health situations during which immediate data sharing is crucial. [marion]

[7] to share, shared, shar·ing, shares: To participate in, use, enjoy, or experience jointly or in turns. www.thefreedictionary.com


3 Players and data flows

Who are players in generating, exchanging and using this molecular data?

Many different parties all across the globe have a role in the process of molecular data exchange for public health purposes, the most important of which are listed in table 1.

Table 1. Players in the field of microbial genomic data exchange.

| Player | Generate data | Freely exchange data | Use data | Advance knowledge with data | Develop new products/procedures with data | Commercialize data | (column header lost in source) |
|---|---|---|---|---|---|---|---|
| The public | No/indirectly | NA | Indirectly | No | No | No | No |
| Sub national public health authorities | Yes | Mostly | Yes | Yes | Some | No | No |
| National Public Health Institutes, Developed Countries | Yes | Mostly | Yes | Yes | Yes | Some | Yes |
| National Public Health Institutes, Developing Countries | Moderately | Mostly | Yes | Some | No? | No | Yes |
| Governmental animal health related institutes | Yes § | Yes * | Yes | Yes | Some | Yes? | Yes |
| Food safety authorities | Moderately | Yes * | Yes | Some | Some | No? | Yes |
| Academic parties | Yes | Mostly ± | Yes | Yes | Yes | Some / yes | No |
| Commercial parties | Moderately | No | Yes | Some | Yes | Yes | No |
| International organizations (WHO, ECDC, EFSA, etc) | Moderately | Yes | Yes | Yes | No (Yes?) | No | Yes |
| Data-base owners / organizers | No | Yes | No.. | No | No… | No? | No |
| Funding Resources, parties funding the research | No | No # | No | No | No | No | No |

§ But pertaining to animal disease (note: zoonoses)
± But limited in timeliness relating to publication pressure
* But sometimes problematic exchange with human health systems
# Although funding parties may not have an active role here, they can make 'the difference' by requiring that data coming from research funded by them is shared
NA Not Applicable

Local, sub national authorities (e.g. Municipal Health Authorities) usually form the connecting bridge between the public and national institutes (e.g. National Public Health Institutes). The national health institutes in turn collaborate and confer with multiple layers of national and international authorities, governing bodies and institutes: their own and foreign national governments, international health authorities (e.g. ECDC, and regional and global WHO), and international governing bodies (EU, other example?). Even though this list is far from complete, it already shows a complex structure with multiple layers of national and international authorities. Mixed into this structure are collaborations with the other parties mentioned above, such as academic and commercial parties, which may bring additional interests.

Communication with food safety authorities (e.g. national food safety institutes, and international ones such as EFSA) can be vital for protecting public health against infectious disease, as is communication with animal health authorities, such as OIE. Public health researchers focus by definition on pathogens affecting humans. Pathogens, however, do not always keep to one host. On the contrary, a large proportion of emerging pathogens are of zoonotic origin, which makes the ongoing exchange of information and data between veterinary and medical scientists highly relevant. Such interactions are nonetheless not standard, and veterinary and medical scientists do not always speak the same language. Similarly, institutes and agencies dealing with food safety are generally poorly linked to public health authorities.

An interesting role in the system can be played by the parties who fund the research. Some funding parties require that all data obtained in research funded by them, and the resulting publications, are made publicly accessible. For example, the NIH requires that all data collected during NIH-funded research be made publicly available. Funding applications to the NIH should address a data sharing plan, which the NIH substantiates as follows: “Data sharing is essential for expedited translation of research results into knowledge, products and procedures to improve human health.” [8] Thus, researchers funded by 'public money' can be legally required to place the resulting data, even raw data, back in the public domain.

The interaction between the different parties is complex, and it is not always easy to know who has the mandate to act where and when, as illustrated for a selection of the (legislative) players in the European region in figure 1. In practice, because of the multilayered configuration of collaborations, the public interest, in the end the most important party in the whole scheme, may not always be served best.

Figure 1. Legislative force fields at play in the European Region.

[Note: some players are surely still missing, such as the World Bank; I know that they 'do things' during outbreaks, but not what mandate they have.]

[Shall I leave it at this or should I go into more detail and discuss the roles of all parties separately? Too many details will not improve the readability and use much????]

Define Open Access and Different levels/ interpretations thereof…

[8] http://grants.nih.gov/grants/policy/data_sharing/, on 28-8-2009


Open access does not (always) mean that anyone can make unlimited use of the data. From Dedeurwaerdere 2006, the institutional economics… And from Cook-Deegan 2006, regarding the science commons in life science:

• access: the right to access a resource/information;
• direct use: the right to change a resource/information;
• follow-on use: the right to change a resource/information and obtain ownership of the follow-on applications;
• management: the right to decide upon the way a resource/information (for instance, a database) is managed;
• ownership: the right to exclude others from the use of a resource (exclusion right) and to sell the resource and all the related rights (alienation right).

“From this institutional point of view, it is clear that the structure of the science commons differs widely when discussing cases such as GenBank, MOSAICC and GBIF (see Table 1). For instance, as we will discuss below, for GenBank, ‘‘open access’’ does not mean that the user of the information automatically has the right to use it for commercial purposes or to develop follow-on applications. If the sequences published on GenBank are the subject of patents, one has to get a licence to use them in research or product development. For GBIF, the ownership of the resource and all the related rights are in the hands of the local data provider and hence access conditions vary according to the policies of the (mostly public) funding agencies. Access to the international culture collections network MOSAICC is open to all; however, when acquiring a resource, users have to sign a Material Transfer Agreement that should guarantee traceability of the resources and fair benefit-sharing with their providers.” Cook-Deegan 2006

[Should something be included here about notifiable diseases?]


4. What has been done by others towards free sharing of data?

We are neither the first nor the only ones to address the issue of the free flow of scientific data and, more specifically, microbial genomic data. Within the whole field of life sciences there is an increasing need not only for (rapid) exchange of sequence data, but also for integration of all types of biological information across geographical and disciplinary boundaries. Other initiatives have, however, either not focused on the topic we find of specific and pressing importance, namely enabling public health protection through the efficient use of available data sources, or have focused on very specific pathogens.

Even though we feel that the (global) public health protection purpose adds crucial dimensions (urgency, necessity, enforceability?) to this discussion, which are likely to fundamentally shape its outcome, there is much to be learnt from these other initiatives. In this chapter we look at two notable examples from opposite extremes. First we take a brief look at The Microbial Commons, which was initiated in 2006 in Belgium. This is a broad and academic approach to sharing microbial materials and the data pertaining to them, and it aims to create a true microbial research commons. Secondly, we briefly discuss a very focused effort towards the free and real-time, but nonetheless regulated, sharing of influenza sequences and epidemiological data, also initiated in 2006: the Global Initiative on Sharing Avian Influenza Data (GISAID).

Microbial Commons

Form: The Microbial Commons initiative started out as a joint undertaking of several organizations (Belgian Science Policy, Science Commons, StrainInfo.net, Genomics Standards Consortium, Bioversity International and the US national committees of CODATA, IUMS and IUBS) [9]. This collaboration organized a symposium in Ghent, Belgium in June 2008 [10], preceded by several publications outlining the basics of the topic [11] and a Scientific Background Document [12]. This conference was recently followed up by a symposium held at the National Academy of Sciences, Washington, USA [13], where, among other things, the contents of a monograph were presented that resulted from the first symposium, the earlier Scientific Background Document and a large number of background studies. A brief summary tailored to our needs cannot do justice to the amount of work done by the people in the Microbial Commons undertaking, but an attempt is made here.

Aim and scope: The Microbial Commons forum focused its attention on a much broader and more academic area of microbial data sharing than we do. The organizers and participants focused on issues such as bioinformatics, intellectual property rights, material transfer agreements, text mining and integration with genomics databases, all aimed at building one integrated infrastructure for open microbial research.

This was done by aiming to accomplish the following tasks [14]:

1. Delineate the research and applications opportunities from improved integration of microbial data, information, and materials and from enhanced collaboration within the global microbial community.

2. Identify the global challenges and barriers (scientific, technical, institutional, legal, economic, and sociocultural) that hinder the integration of microbial resources and the collaborative practice of scientific communities in the microbial commons.

3. Characterize the alternative legal and policy approaches developed and implemented by other research communities, such as common-use licensing for scientific data and information, standard-form material transfer agreements, open access publishing, and open data networks, which could be applied successfully by the microbial research community.

4. Define the contributions of new information and communication technology (ICT) tools in building federated information infrastructures, such as ontologies, data and text mining, and web 2.0.

5. Discuss and evaluate the institutional design and governance principles of data and information sharing among information infrastructures, drawing upon and analyzing successful and failed case studies in the life sciences.

6. Identify the range of policy issues that need to be addressed for maximizing open access to materials, data and literature information in an integrated microbial research commons.

[9] From the website: http://www.microbialcommons.ugent.be/, accessed 29-10-2009
[10] In Ghent, 12-13 June 2008, http://www.microbialcommons.ugent.be
[11] International Social Science Journal, Vol. 188
[12] http://www.microbialcommons.ugent.be/2007_11_13_Microbial_Commons_conference-short.pdf
[13] In Washington, 8-9 October 2009, The International Symposium on Designing the Microbial Research Commons, http://sites.nationalacademies.org/PGA/brdi/PGA_050857
[14] From the website: http://sites.nationalacademies.org/PGA/brdi/PGA_050858, accessed 29-10-2009

Since part of the goal of the Microbial Commons document and derived symposia was to ‘gain a better understanding of the impact of policies on access and reuse that are internal and external to the microbial community’, there is good reason to learn from the knowledge gathered through this initiative.

Most important findings so far: It is proposed that the currently existing public databases and public bioinformatics centers are a good starting point for a future sequence database. A premise that we subscribe to is that data placed in the database that were generated by publicly funded research should be regarded as a public good, and that the resulting database should therefore be a public good as well. Nonetheless, integration of physical materials and digital resources is preferred over two stand-alone outcomes: shared digital data (e.g. sequence data) on the one hand, and a materials commons on the other. Integration is, however, difficult, because these resources are typically governed by different legal frameworks and different institutional settings.

Hurdles encountered: To reach the proposed situation, the foundations on which common practice in the scientific community is currently based will have to be broken down and rebuilt in the proposed new way. This would entail revolutionizing the (microbial) scientific world as we know it: the way scientists work, think and value their own and their peers’ work.

GISAID

Form: GISAID, a nonprofit organization based in Washington, D.C., has as its goal to facilitate and support data sharing and collaboration by scientists in the global community. Since May 2008, GISAID has maintained and made available, free of charge, a publicly accessible database for influenza gene sequences, known as the GISAID EpiFlu Database.

The GISAID database was initiated to stimulate and facilitate the international sharing of influenza sequences. On August 31st 2006 a paper [15] was published in Nature, outlining the future database and the prerequisites for access and use.

Aim and scope: Those who use the database will gain a more complete and global understanding of influenza. Collaborations will be sparked among scientists in the industrialized world as well as between them and scientists in developing countries. The knowledge gained may, for example, be employed to develop influenza vaccines, antiviral drugs and diagnostic kits. Whereas the initiative was originally set up to serve researchers investigating highly pathogenic H5N1 avian influenza, sequences of all influenza viruses, accompanied by epidemiological data, may be submitted. Access must be applied for; once granted, it gives the user the status of ‘registered user’. From the website [16]: “Registered users can upload data relating to sequences, clinical manifestations in humans, epidemiology, observations in poultry and other animals, etc. These data will be accessible to all other registered users, but not to others unless they have agreed to the same terms of use. This maintains confidentiality of the data. The data can be used to publish results if the publication acknowledges the originating laboratory and the authors agree to collaborate with the data provider in further analysis and research. The data can also be used to develop vaccines and other interventions.”

[15] Nature 442, 981 (2006)

Most important accomplishments so far: Scientists from over … countries have so far submitted … sequences, supported by … epidemiological data (access date …). Some countries, several of which were highly relevant in the unfolding story of avian influenza, were at first reluctant or even unwilling to share their data and information [17]. Agreeing on clear and reasonable terms of use for the submitted data has convinced even those initially unwilling to share. GISAID has become the database for retrieving and submitting data regarding the new influenza H1N1v; ‘the GISAID platform offers a comprehensive collection of the sequence of the A/H1N1 pandemic flu strain (swine lineage) together with latest news, discussion and scientific contributions on the subject’ [18].

Note: are there accomplishments to report, such as vaccines developed, etc.?

Hurdles encountered: As described above, data from Indonesia, a key nation and hot spot for avian influenza, were available only to a very limited number of researchers, from Indonesia and from the WHO. This was solved by… intervention of GISAID representatives???

Another hurdle had to be overcome in July 2009, when the Swiss Institute of Bioinformatics (SIB), which hosted the database on the basis of a formal agreement, ‘hijacked’ the database and made it accessible only through the SIB website [19]. Very rapidly a new and improved database was built, which was launched on September 14th 2009.

What can we learn from these initiatives from two extremes?

Not surprisingly, the outcomes of these approaches (a preliminary one in the case of the Microbial Commons), one from the academic and highly comprehensive side of the spectrum and the other very down-to-earth and practical, are very different in nature. The comprehensive approach taken by the Microbial Commons consortium, which led to revolutionary recommendations, is probably on too grand a scale for us. Nonetheless, the gist (and direction) of the outcome, the route towards it and the research done to support it are of great value for our discussion, albeit perhaps in the background. The bold action taken by GISAID to ensure the free flow of influenza virus data in the best possible manner, on the other hand, is inspiring in a different way. The success of the public database and the relative ease with which the influenza community has adopted it show that, with a clear goal and good, comprehensible basic agreements, our goal can be attained.

[16] http://platform.gisaid.org/, accessed 29-10-2009
[17] Nature 441, 1028 (2006)
[18] www.gisaid.org, accessed 18-9-2009
[19] http://www.prlog.org/10340963-gisaid-launches-second-influenza-database.pdf, accessed 18-9-2009


5. Examples of data-sharing initiatives and networks

Platforms and approaches? Overview of a number of networks, databases, etc., based on an inventory and a questionnaire. Interview principal investigators of several networks, see next page.

Attention for:

A Existing long-term or ongoing surveillance networks:
What was the original aim?
Rules of engagement and confidentiality
Set up (technical, what data is collected, who can contribute, who has access)
What output has been produced? Specific actions, papers / overviews, new insights etc.
What problems have been encountered and how were they overcome?
Examples: FBVE database, polio, influenza (success is current H1N1, failure H5N1-Indonesia), …, bacteria, other pathogens, systems focusing on a level other than the pathogen itself, such as food-borne, aviation-related, symptom / syndrome etc. (Note: just throwing out ideas here, not sure whether these exist.)

B Ad hoc response systems:
What (event) triggered this collaboration?
What aims, if any, were formulated?
What agreements were made?
Set up (technical, what data, who can contribute, who has access)
Would this have worked better if a system had originally been in place?
Examples: SARS, Q-fever meeting last year???, something gone terribly wrong??, include an example from the related field of agriculture (infections of crops???)

Summary of strengths and weaknesses of systems encountered in practice
Parallels with HUGO?
Examples of good practice
Examples of situations to avoid/solve


6. Identification of hurdles

Note: or leave this out (to be listed at the meeting), limit this here to the technical hurdles? Elaborate.

Hurdles may be:

Economic
(Planned) patenting of findings generally leads to delayed release and, later on, to restrictions in the use of the data.

Technical
Need for sharing is not recognized. “If I have no use for this data, or find no relevant scientific merit in it, then nobody else will.”
No infrastructure is available that enables structured data sharing. E.g. the public databases (GenBank etc.) do not allow for inclusion of relevant epidemiological data.

Scientific
Very often data is not shared in real time, but only after publication of a resulting scientific paper describing and exploring the data.
Need for sharing is not recognized.

Legal
A confidentiality agreement has been made within a collaborative network.
(Planned) patenting of findings leads to delay in release and restrictions in the use of the data.
In human genomics, sharing of data endangers / invades the privacy of patients. This (probably) does not hold true for microbial data exchange. Compare with human genome data and the Bermuda Principles / Fort Lauderdale agreement. (legal, moral / ethical)

Institutional
In a crisis situation, e.g. in the case of an emerging pathogen that causes disease in animals and humans, or spreads through contaminated food items, it may be unclear which organization has the mandate to take action or collect data. Complicated situations may arise, with overlap in the authorities of local, national and several international institutes and/or governing bodies (see force fields, previously). Interests and outlooks, and the resulting priorities, may vary greatly between different (levels of) organizations.

Social or sociocultural
Need for sharing is not recognized. “…”
“I know that Mr X, the other scientist, is also working on this topic, and if I share my data he will use it. I do not personally like Mr X, and therefore I will not share.”
The development of the internet, and the resulting novel opportunities for mass collaboration and data sharing, have led to a vastness and a kind of obscurity in which scientists are unwilling to put their (valuable) data “out into the great wide open”, for anyone to take, without maintaining control over it.


7. Conclusions and possible directions

Note: recommendations? Or do we only do that after the conference, and should this be more open-ended?


8. Acknowledgements

We thank the organizers of the Microbial Commons meeting in Washington for their kind invitation to attend their meeting.