

Page 1: The European Nutrigenomics Organisation (NuGO)

Un Oslo

Un Munich

Un Florence

Un Balearic Illes

Un Cork

Trinity

Un. Ulster

Rowett

Un Newcastle

Un Reading

IFR

DiFE

Un Krakow

Inserm Marseille

TNO

Un Wageningen

Un Maastricht

EBI

NuGO

Un Lund

Rikilt

Rivm

Page 2: The NuGO Black Box project

A distributed bioinformatics infrastructure for nutrigenomics research

Tony Travis

University of Aberdeen

Rowett Institute of Nutrition and Health

Page 3: NuGO is a virtual organisation

• Why?

• Management of research projects spans institutional boundaries

• Avoids duplication of effort and shares resources to solve problems effectively

• How?

• Scientists working in different research labs collaborate and share their data

• Labs develop trust relationships, and share intellectual property rights

Page 4: Why is data sharing important?

• It's evidence that a trust relationship exists

• Helps to reconcile conflicts of interest

– Take measures to restrict access to data

– Avoid accidental 'prior disclosure'

– Prevent unauthorised or inappropriate use

– Support potential patent applications

– Permit correct attribution of scientific work

• Lack of trust is a disincentive to data sharing

Page 5: NuGO is virtually organised

• Strengths

• Apply the aggregate resources of many partners to a problem

• e.g. PPS (Proof of Principle Study)

• Free exchange of ideas within NuGO

• Weaknesses

• Trust relationships are quite fragile...

• Conflicts due to 'prior disclosure' of unpublished data

• Unfair attribution of work accomplished

Page 6: Utopian view

Property is theft…

Share data freely

Everyone benefits

Ideas develop

Science prospers

Page 7: Not everyone agrees!

Big pharma make a profit by exploiting academic science

ISV's promote proprietary software

Knowledge is power...

Freedom is a threat

Page 8: Reconciliation...

• IPR

– Intellectual property rights are important

• Freedom

– Intellectual freedom is important

• Attribution

– Supports IPR

– Defends freedom

Page 9: Data management

• Single most difficult problem for science

– No simple solution to 'schema' integration

– Data centres are appropriate for business

– Business methods are not appropriate for science...

Page 10: Business computing methods

• Bottom line

– Minimise the cost of ICT infrastructure

– Centralise resources

– Maximise profit

• Rigid and inflexible ICT policies

– Reduce costs

– Use industry 'standards'

– Avoid expensive 'non-standard' solutions

Page 11: Scientific computing methods

• Intellectual freedom

– Maximise the benefit of ICT to scientists

– Collaborative development of software

– Freedom to innovate

• Flexible ICT policies

– User administered PC's

– Devolution of authority

– Well supported and documented

Page 12: Data centres are quite good

• The problem is business methods

– Profit, not the 'customer', is the priority

• A well-managed data-centre is a good place to store your data!

• Users don't need to worry about backups and disaster recovery

• Science is sometimes underfunded, so economies of scale can be important

Page 13: Let's compromise...

• Size matters

– A large, remote data centre is too big

– A typical laptop/desktop is too small

• The solution should be scaled appropriately

– Our unit of collaboration is the lab

– Let's say five or six people in a lab

– Everyone has their own PC

• We need a 'lab-scale' solution

Page 14: NBX strategy

• Server-grade PC

– Designed to be running 24/7/365

– Resilient to hardware failure

– Powerful enough for five or six people

• Use all available resources

– The NBX is a web server 'appliance'

– The lab PC's are clients that use the NBX

– The NBX does most of the work

– The client PC's display the results

Page 15: IT policy at NuGO partners

• Limited access requested

– Ports 22 (SSH) and 80 (HTTP) open

– Tunnel insecure protocols via SSH

• Client PC requirements modest

– Java-enabled web browser

– Optional installation of Windows clients

• Remote admin of NBX's by NuGO
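The "tunnel insecure protocols via SSH" point can be illustrated with a minimal sketch. It only builds the command line for a standard OpenSSH local port forward (`ssh -N -L`); the gateway name `nbx.example.org` and the forwarded port numbers are hypothetical and are not taken from the slides. The user name "manager" is borrowed from the local admin account mentioned later in the deck, purely as an example.

```python
import shlex


def ssh_tunnel_command(gateway, local_port, target_host, target_port, user="manager"):
    """Build an OpenSSH command that forwards local_port to target_host:target_port.

    Only port 22 has to be reachable on the gateway; the forwarded
    protocol travels inside the encrypted SSH connection.
    """
    forward = f"{local_port}:{target_host}:{target_port}"
    # -N: do not run a remote command, -L: local port forwarding
    return ["ssh", "-N", "-L", forward, f"{user}@{gateway}"]


# Hypothetical example: reach a service listening on port 5900 of the NBX
# through local port 5901.
cmd = ssh_tunnel_command("nbx.example.org", 5901, "localhost", 5900)
print(shlex.join(cmd))
```

A client would then connect to `localhost:5901` instead of opening another port in the partner's firewall.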

Page 16: Why a ‘black-box’ approach?

• Don’t need to know how it works to use it

• Deploy a pre-configured Linux server easily

– Install from ‘live’ DVD on existing hardware

– Pre-installed on systems supplied by NuGO

– Reduce need for IT support in every lab

– Automatic backup and software updates

• Autonomous system

– able to discover peer NBX systems

– cooperate with peers to share workload

Page 17: NuGO Black Box (NBX)

• Lab-scale server

– Based on Bio-Linux

– NERC/NEBC

• Web-appliance

– Web browser

– Web services

• NBX network

– Bioinformatics infrastructure

Page 25: NuGO data sharing network

Page 26: NBX roll-out to NuGO partners

• Limited access

– Ports 22 (SSH) and 80 (HTTP) open

• Client PC

– Web browser

– Optional clients

• NBX admin

– Local “manager”

– Remote “nugo”

Page 27: NuGO-Grid

• Network of NuGO-Linux servers

– Interconnected to create Grid

• Compute Grid

– Load-balance between servers

• Data Grid

– Pool data and share resources

• P2P (peer-to-peer)

– Local control of resource sharing
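As an illustration of the "load-balance between servers" idea, here is a minimal sketch of greedy least-loaded dispatch. It is not the scheduler used by NuGO-Grid, which the slides do not describe; the server names, load figures and the fixed per-job cost are hypothetical.

```python
def pick_server(loads):
    """Return the name of the server reporting the lowest load."""
    if not loads:
        raise ValueError("no servers available")
    return min(loads, key=loads.get)


def dispatch(jobs, loads, cost=1.0):
    """Greedily assign each job to the currently least-loaded server.

    `loads` maps server name to current load; each assigned job is
    assumed to add `cost` to the chosen server's load.
    """
    loads = dict(loads)  # work on a copy, leave the caller's data untouched
    assignment = {}
    for job in jobs:
        server = pick_server(loads)
        assignment[job] = server
        loads[server] += cost
    return assignment


# Hypothetical example with three servers and three jobs.
assignment = dispatch(["a", "b", "c"], {"nbx1": 0.5, "nbx2": 0.1, "nbx3": 2.0})
print(assignment)  # {'a': 'nbx2', 'b': 'nbx1', 'c': 'nbx2'}
```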

Page 28: Current status of NuGO-Grid

Page 29: Kerrighed

• Active EU FP6 project

• Funded until 2010

• http://www.kerrighed.org

• Uses ideas from openMosix and OpenSSI

Page 30: Prototype NBX clusters

• Maastricht

– Four NuGO NBX's

• RINH

– Four NuGO NBX's

– Four RINH NBX's

– Eight BioSS NBX's

• Collaboration with Mario Negri Institute

• Objectives

– Aggregate CPU and memory of locally connected NBX's

– Incremental upgrade of NBX's instead of NBX replacement

– Adjust resource to scale of problem

Page 31: XtreemOS Grid operating system

• INRIA, Paris

– Kerrighed-based

– SSI kernel patch

– Grid capabilities in Linux kernel space

– No middleware

– Virtual organisations

Page 32: XtreemOS proof of concept

• NuGO project at RINH and Mario Negri

– Evaluate Kerrighed and XtreemOS for NBX

– Using Bio-Linux 5.0 version of NuGO-Linux

– Prototype seven-node Kerrighed cluster

• EasyUbuntuClustering Wiki

– http://wiki.ubuntu.com/EasyUbuntuClustering

– Community-based collaborative development

– Part of the 'biobuntu' blueprint

Page 33: Harnessing the power of Disruptive Technologies

A PEER-TO-PEER approach for data sharing in clinical trials and bioinformatics research

Page 34: Peer-to-peer data sharing

• Luca Clivio, Mario Negri Institute Milan

– p2pDB for clinical data

– Case study 1: SINPE-DOMUS (in production)

  • Italian Registry of Domiciliar Artificial Nutrition

  • ~3000 patients enrolled in 60 centres

  • Each patient visited about 10 times

– Case study 2: (under test)

  • Italian Gynaecologic Ovarian Cancer Tissue Bank

  • Three tissue/cell line banks at the Oncology Dept.

Page 35: SINPE-DOMUS (web model)

[Diagram labels: Central Web server, centre (×8)]

Page 36: SINPE-DOMUS (distributed DB)

[Diagram labels: Centre (×8), MARIO NEGRI Institute]

Page 37: SINPE-DOMUS (p2pDB)

[Diagram labels: Centre (×8), MARIO NEGRI Institute]

Page 38: Push-based p2pDB

[Diagram labels: Storage Peer (×2), High performance Cluster Peer, Partner (×4), Index nodes A, B, C and D, p2p Network coordination]

Page 39: Proposed Infrastructure

ArrayExpress

GEO

Microarray Experiments DB (LIMS)

Clinical Trials DB

BioBank DB (tissue banks or cell lines)

Microarray experiments

Analysis workflow

Output

Tony Travis (RRI/BioSS)

NuGO Black Boxes (European nutrigenomics)

Duccio Cavalieri (Istituto Toscano Tumori)

Giovanna Chiorino (Fondo Edo Tempia)

Page 40: Unbalanced networks

• Inevitable in 'omics' research

– The huge amount of data involved does not allow a fully replicated distributed database

– Unpublished data cannot be shared without explicit agreement between collaborators

– Unverified data should not be shared at all

• p2p data sharing

– Designed for unbalanced networks

– map/reduce moves computation to data
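The statement that map/reduce moves computation to data can be sketched in a few lines. This is an in-process toy, not the p2pDB implementation: the `Peer` class, the lab names and the values are hypothetical. The point it shows is that each peer runs the map function on its own records and returns only a small summary, so the raw data never has to be replicated.

```python
from functools import reduce


class Peer:
    """A node that keeps its records local and only returns map results."""

    def __init__(self, name, records):
        self.name = name
        self._records = records  # raw data stays on the peer

    def run(self, map_fn):
        # The function travels to the data, not the other way round.
        return map_fn(self._records)


def local_summary(records):
    """Map step: reduce a peer's records to a (sum, count) pair."""
    return (sum(records), len(records))


def combine(a, b):
    """Reduce step: merge two (sum, count) pairs."""
    return (a[0] + b[0], a[1] + b[1])


# Hypothetical peers holding unequal amounts of data (an unbalanced network).
peers = [Peer("lab_a", [2.0, 4.0]), Peer("lab_b", [6.0]), Peer("lab_c", [])]
total, count = reduce(combine, (p.run(local_summary) for p in peers), (0.0, 0))
mean = total / count if count else None
print(mean)  # 4.0
```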

Page 41: The wrong cloud*

• Amazon Web Services (AWS)

– Elastic Compute Cloud (EC2)

– Simple Storage Service (S3)

• Cost effective if you seldom use a computer

– The more you use a computing and storage infrastructure, the less economic it becomes to rent it from someone else

• Nothing new: computer bureaux and expensive BIG iron

• Private clouds are the way forward

– Maximise use of resources within an organisation

* Peter Lucas, Joseph Ballay, Ralph Lombreglia (MAYA Design, Inc., March 2009). www.maya.com/file_download/126/The%20Wrong%20Cloud.pdf
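The rent-versus-own argument can be made concrete with a small worked calculation. All figures below are hypothetical and chosen only to show the arithmetic; they are not prices from AWS or costs reported by NuGO. Renting costs a fixed rate per hour of use, while owning costs a one-off purchase plus a lower running cost per hour, so there is a number of hours beyond which owning is cheaper.

```python
def break_even_hours(purchase_cost, owning_cost_per_hour, rental_cost_per_hour):
    """Hours of use at which owning a server becomes cheaper than renting.

    Solves: purchase_cost + owning_cost_per_hour * h == rental_cost_per_hour * h
    """
    saving_per_hour = rental_cost_per_hour - owning_cost_per_hour
    if saving_per_hour <= 0:
        raise ValueError("renting is never more expensive per hour; no break-even")
    return purchase_cost / saving_per_hour


# Hypothetical figures: a 3000-unit server, 0.25 per hour to run, 0.50 per hour to rent.
hours = break_even_hours(3000, 0.25, 0.50)
print(hours)  # 12000.0
```

With these assumed numbers, a machine used around the clock passes the break-even point in well under two years, while a machine used a few hours a week would take much longer.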

Page 42: Development of NuGO-Grid

• NBX Data Grid

– Data sharing

– NuGO-Linux

• NBX Compute Grid

– Kerrighed

– XtreemOS

Page 43: NuGO-Linux USB stick

• Installation

– Workstation/server

• Rescue

– Disaster recovery

• Personal NBX

– 'live' USB stick

– demo/evaluation

NuGO-Linux DVD 'iso' image at: http://nbx1.nugo.org/biobuntu

Contact: NuGO communications manager: [email protected]

Page 44: Summary

• Viable NBX Data Grid

• Basic NBX Compute Grid

• Bio-Linux 5.0 NBX's being deployed

• XtreemOS NBX proof of concept project

• Collaboration with NEBC and NTC

• Proposed p2pDB infrastructure

• NuGO-Linux USB stick

http://www.nugo.org/nbx

Page 45: Acknowledgements

NBX project

Ulrich Harttig (NIN and NBX repository)

Harrie Kools (NBX installation)

Philippe Rocca-Serra (base2)

Philip de Groot (GenePattern)

Chris Evelo, Martijn van Iersel, Thomas Kelder (Desktop)

Duccio Cavalieri (EuGene and NBX access policy)

Patrick Ahles, Charly John, Olivier Riche (NBX upgrade)

Lars Eissen, Caroline Reiff (NBX help-desk)

Marten Renkema (NuGO-Net NBX pages)

Kerrighed/XtreemOS proof of concept project

Luca Clivio, Alicia Mason (Kerrighed and XtreemOS)

Ruan Elliott (WPT coordinator and NBX tester)

p2pDB (IRFMN) project

Luca Clivio, Bioinformatics Dept.

Sergio Marchini, Oncology Dept.

Maddalena Fratelli, Biochemistry Dept.

Giovanna Chiorino, Fondo Edo Tempia, Biella

Duccio Cavalieri, Istituto Toscano Tumori, Università di Firenze