Prototyping a virtual filesystem for storing and processing petascale neural circuit datasets


Page 1: Prototyping a virtual filesystem for storing and processing petascale neural circuit datasets


Prototyping a virtual filesystem for storing and processing petascale neural circuit datasets.

R. Clay Reid, Jeff Lichtman, Wei-Chung Allen Lee
Harvard Medical School, Allen Institute for Brain Science, Center for Brain Science, Harvard University

Davi Bock
HHMI Janelia Farm

David Hall and Scott Emmons
Albert Einstein College of Medicine

Art Wetzel, Greg Hood and Markus Dittrich
National Resource for Biomedical Supercomputing, Pittsburgh Supercomputing Center
[email protected] 412-268-3912
www.psc.edu and www.nrbsc.org

Jan 11, 2012 Connectomics Data Project Overview

Page 2: Prototyping a virtual filesystem for storing and processing petascale neural circuit datasets


Reconstructing brain circuits requires high-resolution electron microscopy over “long” distances == BIG DATA

Recent ICs have 32 nm features; 22 nm chips are being delivered.

A synaptic junction >500 nm wide with cleft gap ~20 nm

Vesicles ~30 nm diam.

[Figure labels: dendritic spine, dendrite (image: www.coolschool.ca/lor/BI12/unit12/U12L04.htm); gate oxide 1.2 nm thick]

Page 3: Prototyping a virtual filesystem for storing and processing petascale neural circuit datasets


A 10 Tvoxel dataset aligned by our group was an essential part of the March 2011 Nature paper with Davi Bock, Clay Reid and Harvard colleagues.

Now we are working on two datasets of 100 TB each and expect to reach PBs in 2-3 years.

Page 4: Prototyping a virtual filesystem for storing and processing petascale neural circuit datasets


The CS project is to implement and test a prototype virtual filesystem to address common problems associated with neural circuit and other massive datasets.

The most important aim is reducing unwanted data duplication as raw data are preprocessed for final analysis. The virtual filesystem addresses this by replacing redundant storage with on-the-fly computation.

The second aim is to provide a convenient framework for efficient on-the-fly computation on multidimensional datasets within high-performance parallel computing environments using both CPU and GPGPU processing.

The Filesystem in Userspace (FUSE) mechanism provides a convenient implementation basis that will work across a variety of systems. There are many existing FUSE codes that serve as useful examples.
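To make the on-the-fly idea concrete, below is a minimal sketch in the style of existing FUSE codes, assuming the libfuse 2.x C API. It exposes one virtual file whose bytes are computed from a raw file at read time, so the derived version never occupies disk. The raw path, the virtual file name and the pixel-inversion "transform" are hypothetical placeholders, not the project's actual code.

/* Minimal sketch (assuming the libfuse 2.x C API): a single virtual file,
 * "/inverted", whose contents are computed from a raw file at read time.
 * RAW_PATH, the virtual name and the inversion "transform" are hypothetical.
 * Build assumption: gcc -Wall sketch.c `pkg-config fuse --cflags --libs` -o sketch
 */
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define RAW_PATH  "/data/raw_section.img"   /* hypothetical raw EM data on real storage */
#define VIRT_NAME "/inverted"               /* derived view, never stored on disk */

static int vfs_getattr(const char *path, struct stat *st)
{
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) { st->st_mode = S_IFDIR | 0755; st->st_nlink = 2; return 0; }
    if (strcmp(path, VIRT_NAME) == 0) {
        struct stat raw;
        if (stat(RAW_PATH, &raw) < 0) return -errno;
        st->st_mode = S_IFREG | 0444; st->st_nlink = 1;
        st->st_size = raw.st_size;          /* same size as the raw data, but computed */
        return 0;
    }
    return -ENOENT;
}

static int vfs_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
                       off_t off, struct fuse_file_info *fi)
{
    (void)off; (void)fi;
    if (strcmp(path, "/") != 0) return -ENOENT;
    filler(buf, ".", NULL, 0);
    filler(buf, "..", NULL, 0);
    filler(buf, VIRT_NAME + 1, NULL, 0);
    return 0;
}

static int vfs_open(const char *path, struct fuse_file_info *fi)
{
    if (strcmp(path, VIRT_NAME) != 0) return -ENOENT;
    if ((fi->flags & O_ACCMODE) != O_RDONLY) return -EACCES;
    return 0;
}

/* read(): pull the requested byte range from the raw file and transform it
 * on the fly, so the derived volume never duplicates storage. */
static int vfs_read(const char *path, char *buf, size_t size, off_t offset,
                    struct fuse_file_info *fi)
{
    (void)fi;
    if (strcmp(path, VIRT_NAME) != 0) return -ENOENT;
    int fd = open(RAW_PATH, O_RDONLY);
    if (fd < 0) return -errno;
    ssize_t n = pread(fd, buf, size, offset);
    close(fd);
    if (n < 0) return -errno;
    for (ssize_t i = 0; i < n; i++)         /* placeholder transform: invert 8-bit voxels */
        buf[i] = (char)(255 - (unsigned char)buf[i]);
    return (int)n;
}

static struct fuse_operations vfs_oper = {
    .getattr = vfs_getattr,
    .readdir = vfs_readdir,
    .open    = vfs_open,
    .read    = vfs_read,
};

int main(int argc, char *argv[])
{
    return fuse_main(argc, argv, &vfs_oper, NULL);
}

Mounted on an empty directory, reads of /inverted behave like reads of an ordinary file while the "preprocessed" bytes are generated on demand.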

Page 5: Prototyping a virtual filesystem for storing and processing petascale neural circuit datasets


We would eventually like to have a flexible software framework that allows a combination of common prewritten and user-written application codes to operate together and take advantage of parallel CPU and GPGPU technologies.

Page 6: Prototyping a virtual filesystem for storing and processing petascale neural circuit datasets


Multidimensional data structures that provide efficient random and sequential access, analogous to the 1D representations provided by standard filesystems, will be part of this work.
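One plausible building block, sketched below with assumed sizes, is chunked addressing: a 3D voxel coordinate maps to a chunk id plus an offset within that chunk, much as a standard filesystem maps a 1D byte offset to a block number plus an offset within the block. The 128-voxel chunk edge, the volume dimensions and the 8-bit voxel size are illustrative assumptions only.

/* Sketch of chunked 3D addressing (all sizes are illustrative assumptions). */
#include <stdint.h>
#include <stdio.h>

#define CHUNK 128                          /* assumed voxels per chunk edge */

static const uint64_t CX = 512, CY = 512;  /* assumed volume size in chunks along x and y */

typedef struct {
    uint64_t chunk_id;                     /* which chunk file/object holds the voxel */
    uint64_t offset;                       /* byte offset of the voxel inside that chunk */
} VoxelAddr;

static VoxelAddr voxel_addr(uint64_t x, uint64_t y, uint64_t z)
{
    VoxelAddr a;
    uint64_t cx = x / CHUNK, cy = y / CHUNK, cz = z / CHUNK;  /* chunk grid coordinates */
    uint64_t lx = x % CHUNK, ly = y % CHUNK, lz = z % CHUNK;  /* coordinates within the chunk */
    a.chunk_id = (cz * CY + cy) * CX + cx;                    /* row-major chunk ordering */
    a.offset   = (lz * CHUNK + ly) * CHUNK + lx;              /* 1 byte per 8-bit voxel */
    return a;
}

int main(void)
{
    VoxelAddr a = voxel_addr(70000, 43210, 900);
    printf("chunk %llu, offset %llu\n",
           (unsigned long long)a.chunk_id, (unsigned long long)a.offset);
    return 0;
}

Random access needs only integer division and modulo, while a sequential scan along x stays inside a single chunk for CHUNK voxels at a time.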

Students working on this project will have access to a parallel cluster which holds our large datasets along with the compilers and other tools required. Minimal end-to-end functionality with simple linear transforms (see the sketch below) can likely be achieved in about 8 weeks and then extended as time permits. Please contact Art Wetzel if there are further questions – [email protected].
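As a rough illustration of what "simple linear transforms" could mean inside a read path, the sketch below fills one output tile by mapping each aligned-output pixel back to raw source coordinates through a 2D affine transform with nearest-neighbor sampling. The Affine struct, matrix values and helper names are hypothetical, not taken from the project code.

/* Sketch of a simple linear (affine) transform stage; hypothetical names and values.
 * Build assumption: cc -Wall affine.c -lm -o affine */
#include <math.h>
#include <stdint.h>

typedef struct { double a, b, c, d, tx, ty; } Affine;  /* sx = a*x + b*y + tx, sy = c*x + d*y + ty */

/* Nearest-neighbor sample from an 8-bit source section stored row-major. */
static uint8_t sample(const uint8_t *src, int w, int h, double x, double y)
{
    int xi = (int)lround(x), yi = (int)lround(y);
    if (xi < 0 || yi < 0 || xi >= w || yi >= h)
        return 0;                                   /* outside the section: background */
    return src[(size_t)yi * w + xi];
}

/* Render one dw-by-dh output tile whose top-left corner is (ox, oy) in global
 * output coordinates; nothing derived is written back to disk. */
static void render_tile(const uint8_t *src, int sw, int sh,
                        uint8_t *dst, int dw, int dh,
                        int ox, int oy, const Affine *inv)
{
    for (int y = 0; y < dh; y++)
        for (int x = 0; x < dw; x++) {
            double gx = ox + x, gy = oy + y;
            double sx = inv->a * gx + inv->b * gy + inv->tx;
            double sy = inv->c * gx + inv->d * gy + inv->ty;
            dst[(size_t)y * dw + x] = sample(src, sw, sh, sx, sy);
        }
}

int main(void)
{
    uint8_t src[16] = { 0 };                        /* tiny 4x4 placeholder section */
    uint8_t tile[4];
    Affine identity = { 1, 0, 0, 1, 0, 0 };         /* no-op alignment for the demo */
    render_tile(src, 4, 4, tile, 2, 2, 0, 0, &identity);
    return 0;
}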