Code Analysis Repository and Modelling for E-Neuroscience (CARMEN)

An e-science virtual laboratory supporting collaboration in neurophysiology

Content

• Overview of CARMEN
  – What it is
  – Who it is for
  – What it consists of
• Current status
• Portal technology
  – To the extent that I know it!
• Ways forward and issues

CARMEN is a Big Data portal for neurophysiology

• An e-science virtual laboratory supporting collaboration in neurophysiology
• Runs on a cluster of machines at York University
• Portal based
  – That is, accessed over the web via a portal, from anywhere
• Enables uploading/downloading of data
• Runs services, including some display services
• Enables workflows

Theoreticians and modellers need data and can provide predictions to experimentalists

Analysts provide the tools to statistically and mathematically describe the data

Experimentalists obtain original data to test hypotheses and derive new knowledge

CARMEN Objectives

• To create a grid-enabled ‘virtual laboratory’ environment for neurophysiological data (‘co-laboratory’, or ‘virtual research environment’)
• To develop an extensible, client-defined ‘toolkit’ for data extraction, analysis and modelling
• To provide a ‘repository’ for archiving, sharing, integration and discovery of data
• To demonstrate and sustain advances in neuroscience enabled by e-science technology

CARMEN Consortium: Newcastle, York, Stirling, Leicester, Imperial, Manchester, Sheffield, Warwick, Plymouth, St Andrews, Cambridge

First two compute nodes (CAIRNs) marked on the consortium map

Collaborators in: Edinburgh; Berkeley; Washington; St. Louis; Aberdeen; Seoul; Pennsylvania; New York; Boston; Brazil

Access Portal through www.carmen.org.uk

Click on link to access the portal

Registration Open to all Academic Users

Automatic Registration for New Users

Organisation of Workspace

The workspace is divided into a repository of resources (left-hand window) and information and activity (right-hand window)

Organisation of Data

Data you have permission to access are organised into ‘personal’, ‘shared’ and ‘public’ folders

Entering Data and Metadata

Uploading new data generates metadata forms. Pre-compiled templates speed up the upload process
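A pre-compiled template can be thought of as a set of default answers that the user's own entries override. The sketch below assumes metadata are flat key/value pairs; `LAB_TEMPLATE`, `REQUIRED_FIELDS` and `build_metadata` are hypothetical names, and the field names are illustrative rather than CARMEN's actual form fields.

```python
from __future__ import annotations

# Hypothetical pre-compiled template: illustrative fields, not CARMEN's real form.
LAB_TEMPLATE = {
    "contact_name": "A. Researcher",
    "species": "mouse",
    "recording_system": "multielectrode array",
}

# Fields assumed (for this sketch) to be mandatory before an upload is accepted.
REQUIRED_FIELDS = ("contact_name", "species", "recording_system", "date")


def build_metadata(template: dict[str, str], entered: dict[str, str]) -> dict[str, str]:
    """Start from the template, let the user's entries override it, then check completeness."""
    form = {**template, **entered}
    missing = [name for name in REQUIRED_FIELDS if not form.get(name)]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    return form


record = build_metadata(LAB_TEMPLATE, {"date": "2014-01-01", "species": "rat"})
print(record["species"], record["contact_name"])  # rat A. Researcher
```

Only the fields that differ from the template have to be typed, which is where the time saving comes from.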

Entering Data and Metadata

Expanding windows are used for collecting the metadata

Entering Data and Metadata

A browser allows you to select files to upload to the platform

Setting Security Attributes

You have options to make data and metadata private, shared with collaborators, or public.

Setting Security Attributes

Typing part of a name, address or e-mail suggests possible registered users for sharing
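The three visibility options (private, shared with named collaborators, public) and the ‘personal’, ‘shared’ and ‘public’ folders can be illustrated with a minimal access-check sketch. The `Resource` class, its fields and the `can_read`/`folder_for` functions are hypothetical and are not CARMEN's actual security API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    PRIVATE = "private"
    SHARED = "shared"  # visible to named collaborators
    PUBLIC = "public"


@dataclass
class Resource:
    # Hypothetical repository item; not CARMEN's actual schema.
    name: str
    owner: str
    visibility: Visibility = Visibility.PRIVATE
    collaborators: set[str] = field(default_factory=set)


def can_read(user: str, res: Resource) -> bool:
    """Owners always have access; collaborators if shared; everyone if public."""
    if user == res.owner or res.visibility is Visibility.PUBLIC:
        return True
    return res.visibility is Visibility.SHARED and user in res.collaborators


def folder_for(user: str, res: Resource) -> str | None:
    """Which of the user's 'personal', 'shared' or 'public' folders shows this resource."""
    if not can_read(user, res):
        return None
    if user == res.owner:
        return "personal"
    return "shared" if res.visibility is Visibility.SHARED else "public"
```

For example, a recording owned by `alice` and shared with `bob` appears in Alice's personal folder, in Bob's shared folder, and not at all for other users.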

Finding Resources in the Repository

A search function using multiple terms enables users to find appropriate resources
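One simple reading of multi-term search is a case-insensitive AND over each resource's metadata text. The sketch below uses plain substring matching over hypothetical metadata dictionaries; the portal's real search engine, indexing and ranking are not described on the slide.

```python
from __future__ import annotations


def search(resources: dict[str, dict[str, str]], query: str) -> list[str]:
    """Names of resources whose metadata contain every query term (case-insensitive AND)."""
    terms = query.lower().split()
    hits = []
    for name, metadata in resources.items():
        # Concatenate all metadata values into one searchable text blob.
        text = " ".join(metadata.values()).lower()
        if all(term in text for term in terms):
            hits.append(name)
    return sorted(hits)


# Hypothetical repository contents.
repo = {
    "r1": {"species": "mouse", "region": "retina", "type": "spikes"},
    "r2": {"species": "rat", "region": "hippocampus"},
    "r3": {"species": "mouse", "region": "hippocampus"},
}
print(search(repo, "mouse hippocampus"))  # ['r3']
```

Adding terms narrows the result set, which is the behaviour users usually expect from a multi-term search box.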

Viewing Metadata and Annotations

Metadata can be viewed to understand any resource, and users with permission can add annotations

Viewing Data Files

Time series data files can be viewed by launching the thick client tool ‘Signal Data Explorer’

Viewing Time-Series Data Files

Multiple views can be used to explore the data, including a multielectrode array view

Feature Searching

Pattern matching function allows feature searching and averaging
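The slide does not say which matching algorithm Signal Data Explorer uses. As an assumption, the sketch below uses normalised cross-correlation, one common choice for template matching, and then averages the matched segments point by point. `find_matches`, `average_matches` and the 0.9 threshold are hypothetical.

```python
from __future__ import annotations

import math


def _normalise(window: list[float]) -> list[float]:
    """Zero-mean, unit-norm copy of a window (all zeros if the window is flat)."""
    mean = sum(window) / len(window)
    centred = [x - mean for x in window]
    norm = math.sqrt(sum(c * c for c in centred))
    return [c / norm for c in centred] if norm > 0 else [0.0] * len(window)


def find_matches(signal: list[float], template: list[float], threshold: float = 0.9) -> list[int]:
    """Start indices where the normalised cross-correlation with the template reaches threshold."""
    n = len(template)
    t = _normalise(template)
    matches = []
    i = 0
    while i + n <= len(signal):
        score = sum(a * b for a, b in zip(_normalise(signal[i:i + n]), t))
        if score >= threshold:
            matches.append(i)
            i += n  # jump past this match so detections do not overlap
        else:
            i += 1
    return matches


def average_matches(signal: list[float], starts: list[int], n: int) -> list[float]:
    """Point-by-point average of the matched segments (feature averaging)."""
    segments = [signal[s:s + n] for s in starts]
    return [sum(col) / len(segments) for col in zip(*segments)]


signal = [0, 0, 1, 3, 1, 0, 0, 0, 2, 6, 2, 0, 0]
hits = find_matches(signal, [1, 3, 1])
print(hits, average_matches(signal, hits, 3))  # [2, 8] [1.5, 4.5, 1.5]
```

Because the correlation is normalised, the second event is found even though its amplitude is twice the template's; averaging then recovers the mean waveform.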

Execute Services

Services are uploaded and shared like data. Available services can be added to a ‘Favourites’ list in the resources.

Execute Services

Associated metadata describe the function and required parameters

Set Service Parameters through Interface

Parameters are entered using an automatically generated entry form
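Because each service's metadata describe its parameters, an entry form can be generated from that description rather than hand-built per service. A minimal sketch, assuming a hypothetical `ParamSpec` record per parameter (name, type, default, description); the plain-text "form" stands in for the portal's web widgets.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ParamSpec:
    # Hypothetical description of one parameter, as a service's metadata might record it.
    name: str
    kind: type  # int, float or str; bool would need special parsing
    default: object
    description: str = ""


def render_form(specs: list[ParamSpec]) -> str:
    """Generate a plain-text entry form: one labelled line per parameter, showing its default."""
    return "\n".join(
        f"{s.name} ({s.kind.__name__}) [{s.default}]: {s.description}" for s in specs
    )


def parse_form(specs: list[ParamSpec], entered: dict[str, str]) -> dict[str, object]:
    """Convert the entered strings to the declared types; blank fields keep their defaults."""
    values = {}
    for s in specs:
        raw = entered.get(s.name, "").strip()
        values[s.name] = s.kind(raw) if raw else s.default
    return values


specs = [
    ParamSpec("threshold", float, 4.0, "detection threshold"),
    ParamSpec("min_gap", int, 3, "samples between events"),
]
print(render_form(specs))
print(parse_form(specs, {"threshold": "5.5"}))  # {'threshold': 5.5, 'min_gap': 3}
```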

Service Log

A Service Log monitors the progress of executed services and shows the results of prior services

Service Results Output

On completion, results are written to the resources directory, where they can be viewed

Common File Format (NDF)

An internal file format (NDF) allows services to run across multiple data types. A conversion service converts files from their original formats into NDF.

Workflow Software

NDF will enable services to be linked into more complex workflows

Platform Development: CARMEN ‘Cloud’ (CAIRN)

Architecture diagram components:
• Web Portal and Rich Clients, accessing the system through a Security layer
• Raw Signal Data Search & Visualisation
• Workflow Enactment Engine: enactment of scientific analysis processes
• Security: security policies controlling access to data & code
• Registry Service: search for data & analysis code
• Repository
  – Metadata: structured metadata store enabling search & annotation
  – Analysis code store
  – Data: raw & derived data store
• Compute cluster on which services are dynamically deployed

Metadata – The MINI Document

• An attempt to identify core information in 8 domains:
  1. Contact and context (4 terms)
  2. Study subject (16 terms)
  3. Recording location (5 terms)
  4. Task (4 terms)
  5. Stimulus (5 terms)
  6. Behavioural event (3 terms)
  7. Recording (6 terms)
  8. Time series data (3 terms)
• Most terms are currently user-defined, but they will become fixed by a lexicon (work with INCF)

Nature Precedings (2009) http://hdl.handle.net/10101/npre.2009.1720.2
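As a quick check on the MINI list, the per-domain counts add up to 46 core terms. A minimal sketch that encodes the counts and flags the domains a record leaves empty; the record structure and the `missing_domains` helper are assumptions, not part of the MINI document itself.

```python
from __future__ import annotations

# Term counts per MINI domain, as listed on the slide.
MINI_DOMAINS = {
    "Contact and context": 4,
    "Study subject": 16,
    "Recording location": 5,
    "Task": 4,
    "Stimulus": 5,
    "Behavioural event": 3,
    "Recording": 6,
    "Time series data": 3,
}


def missing_domains(record: dict[str, dict[str, str]]) -> list[str]:
    """MINI domains for which a (hypothetical) metadata record supplies no terms."""
    return [domain for domain in MINI_DOMAINS if not record.get(domain)]


print(len(MINI_DOMAINS), sum(MINI_DOMAINS.values()))  # 8 46
```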

Data format issues

• Raw electrophysiology data come from many different manufacturers and sources
  – Manufacturers: Multichannel Systems, Blackrock, Plexon, …, and some researchers build their own systems
  – Proprietary data formats
    • Not open data formats!
  – Some assistance with data conversion
    • Neuroshare DLLs (unidirectional: enables data interrogation)
• … and that’s just electrophysiology data: not including EEG data, for example

Data formats and services

• We want to run services, that is, code that processes data to produce derived data
  – [and updates the metadata to show how the derived data were obtained]
• We don’t want to have to write a new service for each possible data format
  – So we need either to:
    • Convert all the data into a fixed format, or
    • Write a set of data conversion services that enable each service to cope with all the data formats
  – CARMEN went with the first option
  – In fact, we developed our own data format, NDF, which is not directly HDF5 compatible
• Services run on a machine hidden behind a portal
  – What implications does this have for interactivity?
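The first option scales better: with N data formats and M services, format-specific services would mean N × M variants, whereas converting everything to one internal format needs only N converters plus M services. A minimal sketch of that design, assuming a converter registry keyed by file extension; the dict-based common format, the `.txt` layout and all function names are hypothetical placeholders, not NDF's real structure.

```python
from __future__ import annotations

import os
from typing import Callable

# Common internal representation assumed for this sketch (NOT the real NDF layout):
# {"rate_hz": float, "channels": list of per-channel sample lists}
Common = dict

CONVERTERS: dict[str, Callable[[bytes], Common]] = {}


def converter(ext: str):
    """Decorator that registers a raw-bytes -> common-format converter for one file extension."""
    def register(fn: Callable[[bytes], Common]) -> Callable[[bytes], Common]:
        CONVERTERS[ext] = fn
        return fn
    return register


@converter(".txt")
def from_text(raw: bytes) -> Common:
    # Hypothetical text format: line 1 is the sampling rate, then one comma-separated channel per line.
    lines = raw.decode().splitlines()
    return {"rate_hz": float(lines[0]),
            "channels": [[float(v) for v in line.split(",")] for line in lines[1:]]}


def to_common(filename: str, raw: bytes) -> Common:
    """Pick the registered converter by file extension."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in CONVERTERS:
        raise ValueError(f"no converter registered for {ext!r}")
    return CONVERTERS[ext](raw)


def channel_means(data: Common) -> list[float]:
    """An example service, written once against the common format only."""
    return [sum(ch) / len(ch) for ch in data["channels"]]


print(channel_means(to_common("rec.txt", b"1000\n1,2,3\n4,5,6")))  # [2.0, 5.0]
```

Supporting a new acquisition system then means registering one more converter; none of the existing services change.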

NDF (Neural Data Format) and HDF5

• NDF was developed by Bojian Liang at York University
  – At the time, HDF5 was not directly available to us
  – HDF4 could not cope with large datasets
    • Ours could be more than 20 Gbytes per experiment
• If we started again, we would use HDF5
  – But actually, HDF5 doesn’t really solve the problem
    • HDF5 is only a mechanism for data storage
    • The precise format still needs to be defined
  – And there needs to be an Application Programming Interface to go with it …

Services and Workflows

• Services take in data and produce new derived data and/or displays from that data
  – Services are often parameterised: parameters are values that can be set to alter the precise behaviour of a service
• Workflows are concatenated sequences of services
  – Again, each service may have its own parameter set
  – Also, there may be loops in the workflow, and possibly decisions (like when to terminate) as well
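A workflow in this sense can be sketched as a list of (service, parameters) pairs run in order, with an outer loop that stops when a termination test succeeds. The `scale` and `clip` services and the `max_iter` safeguard are hypothetical, chosen only to keep the example small; a real enactment engine would also handle data files, provenance and errors.

```python
from __future__ import annotations

from typing import Any, Callable

# A service takes data plus its own keyword parameters and returns derived data.
Service = Callable[..., Any]


def run_pipeline(data: Any, steps: list[tuple[Service, dict]]) -> Any:
    """Run a concatenated sequence of services, each with its own parameter set."""
    for service, params in steps:
        data = service(data, **params)
    return data


def run_until(data: Any, steps: list[tuple[Service, dict]],
              done: Callable[[Any], bool], max_iter: int = 100) -> Any:
    """Loop the pipeline until the termination decision says stop (or max_iter is reached)."""
    for _ in range(max_iter):
        if done(data):
            break
        data = run_pipeline(data, steps)
    return data


# Hypothetical services, for illustration only.
def scale(xs: list[float], factor: float) -> list[float]:
    return [x * factor for x in xs]


def clip(xs: list[float], limit: float) -> list[float]:
    return [min(x, limit) for x in xs]


steps = [(scale, {"factor": 2}), (clip, {"limit": 10})]
print(run_pipeline([1, 3, 6], steps))                     # [2, 6, 10]
print(run_until([1], steps, done=lambda xs: xs[0] >= 8))  # [8]
```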

Issues

• Design of portal-based access to Big Data
  – Who are the target clients?
  – What sort of user interface do they want or expect, or are they willing to put up with?
    • What are they used to?
  – Portal-based systems tend to have higher latency (delay) than local systems
    • (Latency is the time between the last keystroke/mouse gesture and the beginning of a visible reaction)
    • This is because the latency is the sum of the time for the local machine to send a message over the internet, the portal and server processing time, and the round-trip delay
    • This can present a problem for acceptance by those who previously used only local systems
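The latency decomposition in the bullets can be written as a simple sum. The symbol names are our own shorthand, not CARMEN's:

```latex
L_{\text{portal}} = t_{\text{send}} + t_{\text{proc}} + t_{\text{RTT}}
```

Here $t_{\text{send}}$ is the time for the local machine to put the message on the network, $t_{\text{proc}}$ the portal and server processing time, and $t_{\text{RTT}}$ the network round-trip delay. A purely local tool pays only its own processing time, so the extra network terms are what users coming from local systems notice.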

Parameter entry for services

• Currently entering them one by one …
  – Even with a default
• … is clumsy and slow
• Is there a better design?
• Would a command-driven approach be better?
  – Even if rather old-fashioned?
• Is there a better way?
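One possible command-driven alternative: the user types a single line of `key=value` pairs, and anything omitted keeps its default. A sketch, assuming hypothetical parameter names and defaults for an unspecified service; each value is coerced to the type of its default.

```python
from __future__ import annotations

import shlex

# Hypothetical defaults for an unspecified service; names are illustrative only.
DEFAULTS = {"threshold": 4.0, "window_ms": 2.0, "channels": "all"}


def parse_command(line: str, defaults: dict[str, object] = DEFAULTS) -> dict[str, object]:
    """Parse 'key=value' tokens; omitted keys keep defaults, values take the default's type."""
    params = dict(defaults)
    for token in shlex.split(line):  # shlex allows quoted values containing spaces
        key, sep, value = token.partition("=")
        if not sep or key not in defaults:
            raise ValueError(f"unrecognised parameter: {token!r}")
        params[key] = type(defaults[key])(value)
    return params


print(parse_command("threshold=5.5 channels=1-8"))
# {'threshold': 5.5, 'window_ms': 2.0, 'channels': '1-8'}
```

The trade-off is discoverability: a form shows every parameter, while a command line is faster once the user knows the names.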

Workflow creation is currently graphical

Workflows and GUIs

• Graphical workflow creation looks good
  – And sounds like the right way to go
• But for complex workflows
  – Like ones that cycle through a set of parameters in the different internal services
  – … it is not easy to see how to create the appropriate graphical tools
• What might be better?
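Cycling through parameter sets of several internal services is awkward to draw but short to script. A sketch of such a sweep over the Cartesian product of parameter values; the two-step `detect`/`count_in_bins` workflow and the `SIGNAL` samples are hypothetical.

```python
from __future__ import annotations

import math
from itertools import product

SIGNAL = [0.5, 3.2, 1.1, 4.8, 2.9, 0.2]  # hypothetical samples


def detect(signal: list[float], threshold: float) -> list[int]:
    """Service 1: indices of samples above threshold."""
    return [i for i, x in enumerate(signal) if x > threshold]


def count_in_bins(events: list[int], n_samples: int, bin_size: int) -> list[int]:
    """Service 2: number of events in each consecutive bin of bin_size samples."""
    counts = [0] * math.ceil(n_samples / bin_size)
    for i in events:
        counts[i // bin_size] += 1
    return counts


def workflow(threshold: float, bin_size: int) -> list[int]:
    return count_in_bins(detect(SIGNAL, threshold), len(SIGNAL), bin_size)


def sweep(run, grid: dict[str, list]) -> list[tuple[dict, object]]:
    """Run `run(**params)` once for every combination of the parameter values in grid."""
    names = list(grid)
    results = []
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        results.append((params, run(**params)))
    return results


for params, result in sweep(workflow, {"threshold": [1.0, 3.0], "bin_size": [2, 3]}):
    print(params, result)
```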

Technologies in CARMEN

• The CARMEN Virtual Laboratory Architecture: a three-tier architecture
  – Web Portal: Google Web Toolkit
  – System management: Java Servlet based
  – MySQL Database

CARMEN Architecture continued

Diagram layers: User's View; Web Portal (Google Web Toolkit); System management (Java Servlet based); MySQL Database (service metadata, security info, asset identifiers); Services on Compute Servers

• 1st tier: Portal built using Ajax and GWT, running in the browser
• 2nd tier: Back end built using Java, Servlets, C/C++ and XML, running across Linux and Windows
• 3rd tier: Storage using MySQL and an HP X9000 storage system

Service architecture

Service wrapping

• Services are written by users in many languages:
  – Matlab, R, Python, …
• … and they need to be deployed
• They need to respect the service architecture
  – Getting parameters, reading and writing files, returning results, …
• … and to do this, they need to be wrapped
• Web services wrapping: turn each service into a web service
  – Very general: enables remote services to be run
  – Heavyweight: using JAX-WS added about 20 Mbyte per service and 10–20 seconds to deployment
• Alternative: make each service a simple loadable class
  – Simple and fast
  – But this does imply that service deployment is local
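The loadable-class alternative amounts to a fixed contract that every user service implements. CARMEN's back end is Java-based and its real wrapper is not shown on the slides; this Python sketch only illustrates the contract (getting parameters, reading input files, writing and returning results), and `WrappedService`, `execute`, `run`, the JSON parameter file and `LineCounter` are all assumptions.

```python
from __future__ import annotations

import json
from abc import ABC, abstractmethod
from pathlib import Path


class WrappedService(ABC):
    """Hypothetical wrapper contract that each user service would implement."""

    def execute(self, param_file: Path, inputs: list[Path], out_dir: Path) -> list[Path]:
        params = json.loads(param_file.read_text())  # getting parameters
        data = [p.read_text() for p in inputs]        # reading input files
        results = self.run(data, **params)            # the user's own analysis code
        out_dir.mkdir(parents=True, exist_ok=True)
        written = []
        for i, text in enumerate(results):            # writing and returning results
            path = out_dir / f"result_{i}.txt"
            path.write_text(text)
            written.append(path)
        return written

    @abstractmethod
    def run(self, data: list[str], **params) -> list[str]:
        """User-supplied analysis: one output text per input text."""


class LineCounter(WrappedService):
    # Example user service: counts lines at least min_len characters long.
    def run(self, data: list[str], min_len: int = 0) -> list[str]:
        return [str(sum(1 for ln in text.splitlines() if len(ln) >= min_len)) for text in data]
```

Because the wrapper is an ordinary in-process class, deployment is just loading it, which is why this route is fast but ties execution to the local machine.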

What have we learned?

• CARMEN is quite an old project, as Big Data projects go
  – Started in 2006
• What would we do differently, in response to the users, if we started again?
• Poster.

What’s happening now with CARMEN?

• Right now (next week): being demonstrated again at the Society for Neuroscience meeting, Washington DC
  – 30,000 delegates!
• Practical issues
  – Running out of funding in February 2015
• Very soon
  – Writing a Horizon 2020 proposal to join it together with other European “Big Data in Neuroscience” projects to create a large-scale Virtual Learning Environment

CARMEN Consortium