REMI Database: A Data Management System for Nuclear Medicine Studies

Antall Johann Fernandes

Florida Institute of Technology

afernandes2010@my.fit.edu

30th November, 2011


REMI

Facts about REMI

Developed for the Nuclear Imaging Group of the Life Sciences Division at Lawrence Berkeley National Laboratory, California.

Supported by a subcontract on NIH grant 5-R01-EB007219-04, "Molecular Imaging of Cardiac Hypertrophy Using microPET and Pinhole SPECT".

Currently in production, housing 180 experiments and 8105 data files.


Motivation

Typical Work Process by the Nuclear Imaging Group

Perform a SPECT, DT-MRI or Micro-PET experiment.

Copy experimental data to individual researcher’s workstation.

Perform data analysis, image reconstruction and various other processing tasks on experimental data.

Extract results from processed data.

Publish results and findings based on results.

What happens to the experiment raw data or processed data?

Data resides on the individual researcher’s workstation under a researcher-generated directory structure.

Data is moved or copied to a central file server, into a researcher-generated location and directory structure.

Data about the experimental data may or may not be captured.


Knowledge about Data

What experimental data has been collected?

Where does all of the experimental data reside?

Data Sharing

Can we share the experimental data with others?


The Need for a System

Aware of all the experimental data captured over the years.

Knowledge of the location of the experimental data.

Provide the capabilities to share the experimental data.

Sharing Experimental Data

Data should be stored in a standardized manner.

Data should be query-able.

Meta-data needs to be captured in a timely manner. (Meta-data is explained later.)

Users should not need to deal with file systems.


Design Considerations

Typical Scientific Data Management Systems should have...

Creation of logical collections - physical data to logical collections.

Physical data handling - storage of the physical data files.

Security support - data access authorization and change verification.

Data ownership - who is responsible for data quality and meaning.

Persistence - data lifetime.

Knowledge and information discovery - identify useful information inside the data collection.

Meta-data collection, management and access.


REMI’s Design Considerations

Open access to downloading of experimental data.

Users should be able to access the database from anywhere.

Authorized researchers should be able to upload experimental data.

REMI should handle the physical storage of data files (explained later).


Understanding Experiments and Their Data

SPECT Experiment

Data and Meta-Data

Patient Coordinate System

Detector Coordinate System


Meta-data

What is Meta-data?

Information about data.

Describes how data is measured, acquired, and computed.

Enables data browsing, data transfer, and data documentation.

Makes it possible to build data-independent automated tools.
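As a minimal sketch of what a data-independent tool could look like, the check below validates a raw file's size using only its meta-data record. The field names `dimensions` and `data_type`, the type table, and the sample values are hypothetical and are not taken from REMI.

```python
from math import prod

# Hypothetical table of value sizes in bytes; not REMI's actual type list.
BYTES_PER_VALUE = {"uint8": 1, "int16": 2, "float32": 4, "float64": 8}


def expected_file_size(meta: dict) -> int:
    """Return the size in bytes that the meta-data predicts for a raw file."""
    return prod(meta["dimensions"]) * BYTES_PER_VALUE[meta["data_type"]]


def size_matches(meta: dict, actual_size: int) -> bool:
    """Generic check that needs no knowledge of the experiment itself."""
    return expected_file_size(meta) == actual_size


# Illustrative record: a 128 x 128 x 60 volume of 32-bit floats.
projection_meta = {"dimensions": [128, 128, 60], "data_type": "float32"}
```

The same function works for any modality as long as the meta-data record carries the two assumed fields, which is the point of the bullet above.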


Entities and Relationships

Entity

A "thing" or "object" in the real world that is distinguishable from all other objects, e.g. a Person or a Vehicle.

An entity has a set of properties whose values identify it.

Relationship

An association among several entities, e.g. a Person owns a Vehicle.


Entity-Relationship Data Model

Based on the perception of the real world that consists of a set of basic objects called entities, and of relationships among these objects.

Mapping Cardinalities

Represents constraints on the relationship.

Expresses the number of entities to which another entity can be associated via a relationship.
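The Person/Vehicle example and the idea of a mapping cardinality can be sketched in a few lines. This is an illustration only: the class fields are hypothetical, and the one-to-many cardinality (one owner per vehicle, any number of vehicles per person) is an assumption chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: int  # property whose value identifies the entity
    name: str


@dataclass(frozen=True)
class Vehicle:
    vehicle_id: int
    model: str
    owner_id: int  # a single owner per vehicle enforces the "one" side


def vehicles_owned_by(person: Person, vehicles: list) -> list:
    """Follow the 'owns' relationship from a person to their vehicles."""
    return [v for v in vehicles if v.owner_id == person.person_id]
```

Because each `Vehicle` holds exactly one `owner_id`, a vehicle cannot be associated with two people, while nothing limits how many vehicles point at the same person.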


REMI Meta-data Database Development

Logical Entity Relationship Diagram


Physical Table Design Diagram


Physical Table Design Diagram: Machine


Physical Table Design Diagram: Tracer


Physical Table Design Diagram: File Tag


Physical Table Design Diagram: File Association

Owner ID maps to the entity which owns the File.


Physical Table Design Diagram: User


REMI File Storage Development

REMI’s Design Consideration (mentioned earlier)

REMI should handle the physical storage of data files.

File Storage Schemes

Single Directory: all files under a single directory.

Directory Hierarchy: an individual directory for each experiment.


REMI File Storage

Concatenate the experiment ID and filename and generate a 64-character SHA256 hex string.

Create the directory structure based on the SHA256 string, directory levels, and directory name size.

The parameter directory name size determines how many characters of the hash value are used for each directory name.

The parameter directory levels determines the depth of the directory structure.

Example 1

SHA256 string: fb80f1735b3153e7a41d38390de5d9773c35259965..
directory levels: 2
directory name size: 2
=> ROOT DIR/fb/80/<experiment id> <filename>.<extension>


REMI File Storage: Example 2

SHA256 string: fb80f1735b3153e7a41d38390de5d9773c35259965..
directory levels: 2
directory name size: 3
=> ROOT DIR/fb8/0f1/<experiment id> <filename>.<extension>

REMI File Storage: Example 3

SHA256 string: fb80f1735b3153e7a41d38390de5d9773c35259965..
directory levels: 3
directory name size: 2
=> ROOT DIR/fb/80/f1/<experiment id> <filename>.<extension>
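The three examples can be reproduced with a short sketch of the path scheme. This is not REMI's actual code: the function names are hypothetical, the hash input is assumed to be the experiment ID directly followed by the filename, and the stored file name is assumed to join the two with an underscore.

```python
import hashlib
from pathlib import PurePosixPath


def split_digest(digest: str, levels: int, name_size: int) -> list:
    """Cut the first levels * name_size hex characters into directory names."""
    if levels * name_size > len(digest):
        raise ValueError("digest too short for the requested layout")
    return [digest[i * name_size:(i + 1) * name_size] for i in range(levels)]


def storage_path(root: str, experiment_id: str, filename: str,
                 levels: int = 2, name_size: int = 2) -> PurePosixPath:
    """Build the storage location for one data file."""
    # Assumption: the hash input is the experiment ID followed by the filename.
    key = f"{experiment_id}{filename}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()  # 64 hex characters
    # Assumption: the stored name joins experiment ID and filename with "_".
    stored_name = f"{experiment_id}_{filename}"
    return PurePosixPath(root, *split_digest(digest, levels, name_size), stored_name)
```

With two levels of two hex characters each, files spread over at most 256 x 256 leaf directories, so no single directory grows without bound while the path stays computable from the experiment ID and filename alone.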


File Storage

directory levels: 2
directory name size: 2


REMI Application Development

Rails Model-View-Controller (MVC) Architecture


System Design


System Component Design


Reasons to upload Data Files in Parts

Web Browsers can upload a maximum of 4 GB of data per request.

Certain data files are in excess of 50 GB in size.


File Upload Process



File Upload Process: Save the chunk files


File Upload Process: Concatenate the chunk files
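The two server-side steps named above, saving each chunk and then concatenating the chunks, might look roughly like the following. It is a sketch under assumptions: the function names, the zero-padded `.part` naming, and the use of one directory per upload are all hypothetical and not taken from REMI's implementation.

```python
import shutil
from pathlib import Path


def save_chunk(chunk_dir: Path, index: int, data: bytes) -> Path:
    """Store one uploaded chunk under a name that preserves its order."""
    chunk_dir.mkdir(parents=True, exist_ok=True)
    path = chunk_dir / f"{index:08d}.part"  # hypothetical naming scheme
    path.write_bytes(data)
    return path


def concatenate_chunks(chunk_dir: Path, total_chunks: int, target: Path) -> None:
    """Join chunks 0..total_chunks-1 into the final data file."""
    with target.open("wb") as out:
        for index in range(total_chunks):
            part = chunk_dir / f"{index:08d}.part"
            # Opening a missing chunk raises FileNotFoundError, which stops
            # the join instead of silently producing a truncated file.
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
```

Streaming each part with `shutil.copyfileobj` keeps memory use flat, which matters when the assembled file is tens of gigabytes.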


REMI User Interface Development

JSON (JavaScript Object Notation)

A lightweight data-interchange format.

Easy for humans to read and write.

Built on two structures.

A collection of name/value pairs. Realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.

An ordered list of values. Realized as an array, list, or sequence.
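For illustration, the two structures map directly onto Python's built-in types when parsed with the standard `json` module. The document content below is made up for the example.

```python
import json

# Illustrative JSON text: one object holding two name/value pairs and an array.
text = '{"experiment": "example", "modality": "SPECT", "files": ["a.img", "b.img"]}'

doc = json.loads(text)

# The object becomes a dict (name/value pairs); the array becomes a list.
names = sorted(doc.keys())
first_file = doc["files"][0]
```

The ordered list keeps its order after parsing, so `doc["files"]` can be indexed by position, while the object is accessed by name.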


JSON Syntax

[Figures: Object Syntax; Array Syntax; Value Syntax]


REMI Menu

id: Translates to the element ID on the HTML page.

value: Display name shown on the user interface.

link: URL to be called.

submenu: Array to contain sub-menus.
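A menu definition using the four fields described above might look like the JSON below, with a small renderer that turns it into nested HTML lists. The entries, links, and the `render_menu` function are hypothetical; only the field names `id`, `value`, `link`, and `submenu` come from the slide.

```python
import json
from html import escape

# Hypothetical menu definition; entries and URLs are invented for the example.
MENU_JSON = """
[
  {"id": "menu-download", "value": "Download", "link": "/download",
   "submenu": [
     {"id": "menu-download-spect", "value": "SPECT",
      "link": "/download/spect", "submenu": []}
   ]}
]
"""


def render_menu(items: list) -> str:
    """Render menu items as nested <ul> lists; an empty list renders nothing."""
    if not items:
        return ""
    parts = ["<ul>"]
    for item in items:
        parts.append(
            f'<li id="{escape(item["id"])}">'
            f'<a href="{escape(item["link"])}">{escape(item["value"])}</a>'
            f'{render_menu(item.get("submenu", []))}</li>'
        )
    parts.append("</ul>")
    return "".join(parts)


menu_html = render_menu(json.loads(MENU_JSON))
```

Because `submenu` is itself an array of the same item shape, one recursive function handles menus of any depth.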


Download Menu Options


Searching for Experiment based on Modality


Creating a New SPECT Experiment


Various Sections within the SPECT Experiment


Weaknesses

Weaknesses within REMI

Lack of full support for Semi-Structured Data.

Semi-Structured Data does not conform with the formal structure of tables and data models associated with relational databases. Meta-data is all semi-structured data.

Resume functionality for uploading and downloading data files.

Server-side file compression on large files.

Meta-data Templates and User Personalization.


Future Enhancements

Provide support for Semi-Structured Data.

Relational databases with XML support.

Move from a relational model to Entity-Attribute-Value model.

Document based databases.

User Personalization.

Provide user specific views of owned experiments.

Support saving of personalized search queries.

Meta-data

Best approach to modeling meta-data.

Make the meta-data capture process less cumbersome.

Reading meta-data from header files.
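To make the Entity-Attribute-Value option listed above concrete, the sketch below stores meta-data as (entity, attribute, value) rows and pivots them back into one record per experiment. The row contents and the `pivot` function are illustrative assumptions and do not describe REMI's schema.

```python
from collections import defaultdict

# Each row is (entity, attribute, value); the values are made up.
eav_rows = [
    ("experiment-1", "modality", "SPECT"),
    ("experiment-1", "collimator", "pinhole"),
    ("experiment-2", "modality", "Micro-PET"),
]


def pivot(rows: list) -> dict:
    """Group EAV rows into one attribute dictionary per entity."""
    records = defaultdict(dict)
    for entity, attribute, value in rows:
        records[entity][attribute] = value
    return dict(records)
```

Experiments can then carry different attribute sets without a schema change, at the cost of losing per-column types and constraints that a relational table would enforce.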


Resume functionality for Data File Uploads.

Files uploaded in chunks.

Query the server for existing file size on the server.

Transfer only the file chunks that lie beyond the size already on the server.
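The proposed resume logic for uploads can be sketched as a pure function that, given the size the server reports, lists the byte ranges still to be sent. The function name is hypothetical, and the sketch assumes the bytes already on the server are an intact prefix of the local file.

```python
def remaining_chunks(local_size: int, server_size: int, chunk_size: int) -> list:
    """Return (offset, length) pairs for the part of the file not yet uploaded."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= server_size <= local_size:
        raise ValueError("server_size must lie between 0 and local_size")
    chunks = []
    offset = server_size  # resume right after the bytes the server already has
    while offset < local_size:
        length = min(chunk_size, local_size - offset)
        chunks.append((offset, length))
        offset += length
    return chunks
```

A client would query the server for the existing size, call this function, and upload only the returned ranges; an empty list means the upload is already complete.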

Resume functionality for Data File Downloads.

Look at HTTP Byte Range Retrieval Extensions.

See how the resume functionality affects the compression process on the server.
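For downloads, HTTP byte ranges let a client ask for only the bytes it is missing by sending a `Range: bytes=N-` header. The sketch below only builds such a request with the standard library and does not send it; the function name and URL are hypothetical.

```python
import urllib.request


def build_resume_request(url: str, bytes_on_disk: int) -> urllib.request.Request:
    """Build a GET request for everything after the bytes already downloaded."""
    if bytes_on_disk < 0:
        raise ValueError("bytes_on_disk must not be negative")
    # "bytes=N-" asks for the range from offset N to the end of the resource.
    return urllib.request.Request(url, headers={"Range": f"bytes={bytes_on_disk}-"})


# Hypothetical URL, used only to show the header that would be sent.
request = build_resume_request("http://example.org/files/1", 1024)
```

A server that supports ranges answers with status 206 (Partial Content); one that does not answers 200 with the whole file, so a client should check the status before appending to a partial download.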


Questions


Demonstration

REMI Website


Thank You
