xldb2012_wed_0950_timfrazier



TRANSCRIPT

Page 1: xldb2012_wed_0950_TimFrazier

This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344

A Self-Organizing Repository for Fusion Science

6th Extremely Large Databases Conference, Stanford University

September 10 - 13, 2012

Tim Frazier, Allan Casey, Matt Hutton, Prav Patel

National Ignition Facility, Control and Information Systems

LLNL-PRES-579672

Page 2: xldb2012_wed_0950_TimFrazier
Page 3: xldb2012_wed_0950_TimFrazier

High Energy Density Science

Page 4: xldb2012_wed_0950_TimFrazier

Two simultaneous soccer matches can be played on top of the NIF!

Page 5: xldb2012_wed_0950_TimFrazier
Page 6: xldb2012_wed_0950_TimFrazier
Page 7: xldb2012_wed_0950_TimFrazier
Page 8: xldb2012_wed_0950_TimFrazier

NIF Data

Shot Data: Configuration, Calibration, Campaign, Control, Experimental Results, Analysis Results

IT Data: Code, Servers, Administration

Business Data: CAD/CAM, NPS, SMART, Shared File Services

User Data: Home Directories, Desktop, Shared File Services

The NIF Data taxonomy provides a map of the data we generate and manage

The diversity of data, format and a 30-year retention requirement present challenges & opportunities

8 Frazier—XLDB 2012—Stanford, September 2012

Page 9: xldb2012_wed_0950_TimFrazier

Full Aperture Backscatter

Diagnostic Instrument Manipulator (DIM)

Diagnostic Instrument Manipulator (DIM)

X-ray imager

Streaked x-ray detector

VISAR Velocity Measurements

Static x-ray imager

FFLEX Hard x-ray spectrometer

Near Backscatter Imager

DANTE Soft x-ray temperature

Diagnostic Alignment System

Cross Timing System

Up to 50 high-precision instruments are used to observe each NIF experiment

Page 10: xldb2012_wed_0950_TimFrazier


Page 11: xldb2012_wed_0950_TimFrazier

A Data Taxon is an annotation that specifies location, instrument & type of data

A data taxon enables a specific data object to be retrieved without knowledge of the database’s physical organization

Subsystem  Location   Unit   Identifier   Instance  Data ID
TD         TC143-274  DANTE  SCOPE-01-DB  SHOT      RAW_DANTE_SCOPE
TD         TC143-274  DANTE  SCOPE-01-DB  SHOT      SCOPECORR

The fields answer Where and What; the Data ID distinguishes an Experiment Result (RAW_DANTE_SCOPE) from an Analysis Result (SCOPECORR)
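
A minimal Python sketch of how a data taxon could serve as a lookup key, so that callers never see the physical organization. The field names follow the slide's column headers; the assignment of each example value to a field is read off the slide order and is an assumption, and the `TaxonIndex` class and the storage locators are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataTaxon:
    """Logical address of a data object (field names taken from the slide)."""
    subsystem: str
    location: str
    unit: str
    identifier: str
    instance: str
    data_id: str


class TaxonIndex:
    """Hypothetical catalog that maps a taxon to a physical storage locator."""

    def __init__(self):
        self._locators = {}

    def register(self, taxon, locator):
        self._locators[taxon] = locator

    def resolve(self, taxon):
        # Callers supply only the taxon; the locator format stays private.
        return self._locators[taxon]


# The two example rows from the slide; the locators are made up.
raw = DataTaxon("TD", "TC143-274", "DANTE", "SCOPE-01-DB", "SHOT", "RAW_DANTE_SCOPE")
corrected = DataTaxon("TD", "TC143-274", "DANTE", "SCOPE-01-DB", "SHOT", "SCOPECORR")

index = TaxonIndex()
index.register(raw, "blob-store/000123")
index.register(corrected, "blob-store/000124")
```

Because the dataclass is frozen, two taxa with equal fields hash to the same key, so a scientist who knows only the six field values can retrieve the object.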


Page 12: xldb2012_wed_0950_TimFrazier

The tools & methods we use for data presentation & dissemination have evolved over the last five years


Our ability to develop and deploy server-based analysis codes is no match for the scientists’ appetite for new & varied ways of using data

File systems

Web pages that query databases

Viewer with tagging, suitcasing & dashboarding

Wiki integration

Page 13: xldb2012_wed_0950_TimFrazier

The majority of tools used by the scientific community remain file-based

Frazier - ICALEPCS Conference France, October 10-14, 2011 13

File systems provide one, and only one, organizing principle for data; despite this deficiency, we can escape neither their utility nor their ubiquity


Page 14: xldb2012_wed_0950_TimFrazier

Simple web pages were easy to write and provided the ability to quickly & visually survey data sets


One of the most important requirements was the ability to download data sets, as files, for offline analysis


Page 15: xldb2012_wed_0950_TimFrazier

Web page structure is identical to the data model of the underlying database


A search is expressible as a RESTful URL

https://nifit.llnl.gov/pls/nif_dad/cmspub.cms_util.owa_shots?p_dttm_type=SHOT+DTTM&p_last_days=40&p_shot_id=N12072&p_y_if_include_system=Y&p_y_if_include_completed=Y
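
As an illustration, the example search can be assembled from its parameters with Python's standard library. The endpoint and the parameter names are the ones in the URL above; the function name, its defaults, and the use of "N" for a disabled flag are assumptions, since the slide shows only "Y".

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint copied from the example URL on the slide.
BASE_URL = "https://nifit.llnl.gov/pls/nif_dad/cmspub.cms_util.owa_shots"


def build_shot_search_url(shot_id, last_days, dttm_type="SHOT DTTM",
                          include_system=True, include_completed=True):
    """Encode one shot search as a URL (hypothetical helper)."""
    params = {
        "p_dttm_type": dttm_type,
        "p_last_days": last_days,
        "p_shot_id": shot_id,
        # "N" for a disabled flag is an assumption; the slide shows only "Y".
        "p_y_if_include_system": "Y" if include_system else "N",
        "p_y_if_include_completed": "Y" if include_completed else "N",
    }
    # urlencode uses quote_plus by default, so the space becomes "+".
    return f"{BASE_URL}?{urlencode(params)}"


url = build_shot_search_url("N12072", 40)
# Parsing the query string recovers the original search parameters.
query = parse_qs(urlsplit(url).query)
```

Because the whole search lives in the URL, it can be bookmarked, mailed to a colleague, or pasted into a wiki page.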

Page 16: xldb2012_wed_0950_TimFrazier

Data are non-relational but NOT without structure!


The task of analysis code is to interpret the non-relational structure as scientific phenomena

Page 17: xldb2012_wed_0950_TimFrazier

The repository provides the ability to quickly visualize results and then download them for offline analysis


Original instrument data are downloadable for further analysis

Thumbnail images are created as data are ingested into the Repository

Page 18: xldb2012_wed_0950_TimFrazier

Hierarchical Data Format (HDF5) files are the long-term storage format for data


Oracle BLOBs provide the container and curation toolset for HDF files
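
A rough sketch of the file-in-a-BLOB idea, under loud assumptions: SQLite from the Python standard library stands in for Oracle, the table and column names are hypothetical, the payload is placeholder bytes rather than a real HDF5 file, and the checksum is only one example of what a curation toolset might record.

```python
import hashlib
import sqlite3

# HDF5 files begin with this 8-byte signature (when there is no user block).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def open_repository():
    """Create an in-memory stand-in for the BLOB table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hdf_blob ("
        " name TEXT PRIMARY KEY,"
        " sha256 TEXT NOT NULL,"
        " content BLOB NOT NULL)"
    )
    return conn


def ingest(conn, name, content):
    """Store file bytes as a BLOB together with a checksum."""
    if not content.startswith(HDF5_SIGNATURE):
        raise ValueError(f"{name} does not start with the HDF5 signature")
    digest = hashlib.sha256(content).hexdigest()
    conn.execute(
        "INSERT INTO hdf_blob (name, sha256, content) VALUES (?, ?, ?)",
        (name, digest, content),
    )
    return digest


def retrieve(conn, name):
    """Return the stored bytes after re-checking the recorded checksum."""
    row = conn.execute(
        "SELECT sha256, content FROM hdf_blob WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    digest, content = row
    content = bytes(content)
    if hashlib.sha256(content).hexdigest() != digest:
        raise ValueError(f"checksum mismatch for {name}")
    return content


repo = open_repository()
payload = HDF5_SIGNATURE + b"placeholder, not a real HDF5 file"
ingest(repo, "example.h5", payload)
```

The file stays a self-describing unit that any HDF5 tool can read once downloaded, while the database supplies the naming, integrity checks and transactions around it.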

Page 19: xldb2012_wed_0950_TimFrazier

Multi-step analyses, run on large compute clusters, reduce raw images to results


An eight-step, 55-input calculation

Page 20: xldb2012_wed_0950_TimFrazier

The pedigree of calculations and conclusions is an integral part of the data model


Five calculations use this image

Data pedigree & other meta-data are stored in relational structures
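
One way to picture pedigree held in relational structures is a link table between data objects and the calculations that consume them. The sketch below uses SQLite from the Python standard library; the schema, table names and sample rows are all hypothetical and are not the NIF data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_object (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE calculation (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- One row per (calculation, input) pair records the pedigree.
CREATE TABLE calculation_input (
    calculation_id INTEGER NOT NULL REFERENCES calculation(id),
    data_object_id INTEGER NOT NULL REFERENCES data_object(id),
    PRIMARY KEY (calculation_id, data_object_id)
);
""")

# Made-up sample data: five calculations read image_a, one reads image_b.
conn.executemany("INSERT INTO data_object (id, name) VALUES (?, ?)",
                 [(1, "image_a"), (2, "image_b")])
conn.executemany("INSERT INTO calculation (id, name) VALUES (?, ?)",
                 [(n, f"calc_{n}") for n in range(1, 7)])
conn.executemany(
    "INSERT INTO calculation_input (calculation_id, data_object_id) VALUES (?, ?)",
    [(n, 1) for n in range(1, 6)] + [(6, 2)],
)


def calculations_using(conn, data_object_name):
    """List the calculations that take the named data object as an input."""
    rows = conn.execute(
        "SELECT c.name FROM calculation c"
        " JOIN calculation_input ci ON ci.calculation_id = c.id"
        " JOIN data_object d ON d.id = ci.data_object_id"
        " WHERE d.name = ? ORDER BY c.name",
        (data_object_name,),
    ).fetchall()
    return [row[0] for row in rows]


users_of_image_a = calculations_using(conn, "image_a")
```

The same link table can be walked in the other direction to list every input behind a given result.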

Page 21: xldb2012_wed_0950_TimFrazier

Downloading data sets can compromise data pedigree


Scientists demanded the ability to group data sets using a taxonomy that was meaningful to them; they also needed to be able to quickly assess key performance metrics of an experiment


Page 22: xldb2012_wed_0950_TimFrazier

A more feature-rich Viewer has become the primary interface for scientists


Page 23: xldb2012_wed_0950_TimFrazier

Web page structure is still identical to the data model of the underlying database but with more polish!


Tagging enables user-driven groups to be established
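
A small sketch of tagging as a user-driven grouping layered over the fixed data model. The `TagIndex` class, the tag names and the data object identifiers are hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical many-to-many mapping between tags and data objects."""

    def __init__(self):
        self._objects_by_tag = defaultdict(set)
        self._tags_by_object = defaultdict(set)

    def tag(self, data_object_id, tag_name):
        self._objects_by_tag[tag_name].add(data_object_id)
        self._tags_by_object[data_object_id].add(tag_name)

    def group(self, tag_name):
        """Every data object a user has placed in this group."""
        return sorted(self._objects_by_tag.get(tag_name, ()))

    def tags_for(self, data_object_id):
        return sorted(self._tags_by_object.get(data_object_id, ()))


tags = TagIndex()
tags.tag("shot-A/image-1", "symmetry-study")
tags.tag("shot-B/image-7", "symmetry-study")
tags.tag("shot-A/image-1", "needs-review")
```

Unlike a directory tree, one object can belong to any number of groups at once.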

Page 24: xldb2012_wed_0950_TimFrazier

Dashboards are templates for visualizing the key performance metrics of an experiment


Page 25: xldb2012_wed_0950_TimFrazier

Scientists demanded support for multi-file, offline analysis


“Suitcasing” streamlines offline analysis & adds traceability of what was downloaded

Page 26: xldb2012_wed_0950_TimFrazier

A complete package of data is moved from the Viewer to Desktop and then back again


Page 27: xldb2012_wed_0950_TimFrazier

A data suitcase enables the export & import of scientific analysis


The database’s pedigree framework associates downloaded data with analysis results produced offline
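
The round trip can be sketched as an archive that carries a manifest of what was downloaded, so that a result produced offline can be linked back to its inputs on import. This is an illustrative guess at the mechanism, not the NIF implementation; the archive layout, manifest fields and function names are all assumptions.

```python
import hashlib
import io
import json
import zipfile


def pack_suitcase(suitcase_id, files):
    """Bundle {name: bytes} into a zip archive with a checksum manifest."""
    manifest = {
        "suitcase_id": suitcase_id,
        "files": {name: hashlib.sha256(data).hexdigest()
                  for name, data in files.items()},
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("manifest.json", json.dumps(manifest, sort_keys=True))
        for name, data in files.items():
            archive.writestr(f"data/{name}", data)
    return buffer.getvalue()


def import_result(suitcase_bytes, result_name, result_data):
    """Check the suitcase contents, then build a pedigree record for a result."""
    with zipfile.ZipFile(io.BytesIO(suitcase_bytes)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        for name, expected in manifest["files"].items():
            actual = hashlib.sha256(archive.read(f"data/{name}")).hexdigest()
            if actual != expected:
                raise ValueError(f"{name} was modified after download")
    return {
        "result": result_name,
        "result_sha256": hashlib.sha256(result_data).hexdigest(),
        "suitcase_id": manifest["suitcase_id"],
        "derived_from": sorted(manifest["files"]),
    }


# Made-up payloads standing in for downloaded data sets.
suitcase = pack_suitcase("suitcase-001", {"raw.h5": b"raw bytes",
                                          "corrected.h5": b"corrected bytes"})
record = import_result(suitcase, "offline_fit.csv", b"fit results")
```

The returned record is the piece a pedigree framework would store: it names the offline result and the exact downloaded inputs it came from.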

Page 28: xldb2012_wed_0950_TimFrazier

Discussions, often captured in Excel & PowerPoint, were missing from the Repository


A services & database-based Wiki added a missing feature set


Page 29: xldb2012_wed_0950_TimFrazier

Long-lived analysis & collaboration are supported by a Wiki that integrates with the repository


Meetings, goals and expectations are as important as the experimental data

Data mining across experiments requires set-at-a-time visualization

Each shot is annotated with a statement of purpose and a brief statement of outcome

Page 30: xldb2012_wed_0950_TimFrazier

We have learned several lessons along the way; some are worth writing down & sharing

• When the cost of data production is high, save everything

• All experimental data have structure; the question is “how much are you willing to invest to find it?”

• Choose formats & data models that let the data describe themselves

• The most useful presentation of data for users is a file system

• Science is a social activity masquerading as an intellectual activity

• Users will show you the way to go, if you let them


Page 31: xldb2012_wed_0950_TimFrazier