This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344
A Self-Organizing Repository for Fusion Science
6th Extremely Large Databases Conference, Stanford University
September 10 - 13, 2012
Tim Frazier, Allan Casey, Matt Hutton, Prav Patel
National Ignition Facility, Control and Information Systems
LLNL-PRES-579672
High Energy Density Science
Two simultaneous soccer matches can be played on top of the NIF!
NIF Data
  Shot Data: Configuration, Calibration, Campaign, Control, Experimental Results, Analysis Results
  IT Data: Code, Servers, Administration
  Business Data: CAD/CAM, NPS, SMART, Shared File Services
  User Data: Home Directories, Desktop, Shared File Services
The NIF Data taxonomy provides a map of the data we generate and manage
The diversity of data and formats, together with a 30-year retention requirement, presents challenges & opportunities
8 Frazier—XLDB 2012—Stanford, September 2012
[Figure labels: Full Aperture Backscatter; Diagnostic Instrument Manipulator (DIM), shown twice; X-ray imager; Streaked x-ray detector; VISAR Velocity Measurements; Static x-ray imager; FFLEX Hard x-ray spectrometer; Near Backscatter Imager; DANTE Soft x-ray temperature; Diagnostic Alignment System; Cross Timing System]
Up to 50 high precision instruments are used to
observe each NIF experiment
A Data Taxon is an annotation that specifies location,
instrument & type of data
A data taxon enables a specific data object to be retrieved without knowledge
of the database’s physical organization
Subsystem  Location   Unit   Identifier   Instance  Data ID
TD         TC143-274  DANTE  SCOPE-01-DB  SHOT      RAW_DANTE_SCOPE
TD         TC143-274  DANTE  SCOPE-01-DB  SHOT      SCOPECORR
Column groups on the slide: Where, What, Experiment Result / Analysis Result
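A minimal sketch of how a six-field taxon could act as a retrieval key. The field names follow the slide's column headers; the "/" separator, the `DataTaxon` class, and the in-memory `store` are illustrative assumptions, not the NIF repository's actual encoding or API.

```python
from dataclasses import dataclass

# Field order follows the slide's column headers. The "/" separator is an
# assumption made for this sketch only.
FIELDS = ("subsystem", "location", "unit", "identifier", "instance", "data_id")


@dataclass(frozen=True)
class DataTaxon:
    subsystem: str
    location: str
    unit: str
    identifier: str
    instance: str
    data_id: str

    @classmethod
    def parse(cls, text):
        """Build a taxon from a "/"-separated string with exactly six fields."""
        parts = text.split("/")
        if len(parts) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
        return cls(*parts)

    def __str__(self):
        return "/".join(getattr(self, name) for name in FIELDS)


# Hypothetical store keyed by taxon: the caller names the data it wants and
# never sees how or where the object is physically kept.
store = {}

raw = DataTaxon.parse("TD/TC143-274/DANTE/SCOPE-01-DB/SHOT/RAW_DANTE_SCOPE")
corrected = DataTaxon.parse("TD/TC143-274/DANTE/SCOPE-01-DB/SHOT/SCOPECORR")
store[raw] = b"raw scope trace"
store[corrected] = b"corrected scope trace"
```

Because the taxon is frozen and hashable, the same string always resolves to the same object, regardless of how the backing store is organized.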
The tools & methods we use for data presentation &
dissemination have evolved over the last five years
Our ability to develop and deploy server-based analysis codes is no match for
the scientists’ appetite for new & varied ways of using data
File systems
Web pages that query databases
Viewer with tagging, suitcasing & dashboarding
Wiki integration
The majority of tools used by the scientific
community remain file-based
Frazier - ICALEPCS Conference France, October 10-14, 2011 13
File systems provide one, and only one, organizing principle for data; despite
this deficiency, we can escape neither their utility nor their ubiquity
Simple web pages were easy to write and provided
the ability to quickly & visually survey data sets
One of the most important requirements was the ability to download data sets,
as files, for offline analysis
Web page structure is identical to the data model of
the underlying database
A search is expressible as a RESTful URL
https://nifit.llnl.gov/pls/nif_dad/cmspub.cms_util.owa_shots?p_dttm_type=SHOT+DTTM&p_last_days=40&p_shot_id=N12072&p_y_if_include_system=Y&p_y_if_include_completed=Y
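A short sketch of composing and decoding such a search URL with the Python standard library. The endpoint and parameter names are taken from the URL on the slide; the helper `shot_search_url` and the use of "N" for a disabled flag are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint copied from the slide; everything else here is a sketch.
BASE = "https://nifit.llnl.gov/pls/nif_dad/cmspub.cms_util.owa_shots"


def shot_search_url(shot_id, last_days, include_system=True, include_completed=True):
    """Encode a shot search as a URL query string (hypothetical helper)."""
    params = {
        "p_dttm_type": "SHOT DTTM",
        "p_last_days": last_days,
        "p_shot_id": shot_id,
        # "N" as the negative value is an assumption; the slide only shows "Y".
        "p_y_if_include_system": "Y" if include_system else "N",
        "p_y_if_include_completed": "Y" if include_completed else "N",
    }
    return BASE + "?" + urlencode(params)


url = shot_search_url("N12072", 40)

# Decoding the URL recovers the search criteria, so a search can be
# bookmarked, shared, or replayed as a plain link.
query = parse_qs(urlsplit(url).query)
```

Since the whole search lives in the URL, the same link can be pasted into a Wiki page or an email and will reproduce the query.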
Data are non-relational but NOT without structure!
The task of analysis code is to interpret the non-relational structure as scientific
phenomena
The repository provides the ability to quickly visualize
results and then download them for offline analysis
Original instrument data are
downloadable for further analysis
Thumbnail images are created as data
are ingested into the Repository
Hierarchical Data Format (HDF5) files are the long
term storage format for data
Oracle BLOBs provide the container and curation toolset for HDF files
Multi-step analyses, run on large compute clusters,
reduce raw images to results
An eight-step, 55-input calculation
The pedigree of calculations and conclusions is an
integral part of the data model
Five calculations
use this image
Data pedigree & other meta-data are stored in relational structures
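A minimal sketch of pedigree held in relational tables, using SQLite as a stand-in for the repository's database. The table names, column names, and sample rows are all hypothetical; the slides do not show the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: objects, calculations, and the inputs each one consumed.
    CREATE TABLE data_object (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE calculation (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        output_id INTEGER NOT NULL REFERENCES data_object(id)
    );
    CREATE TABLE calculation_input (
        calculation_id INTEGER NOT NULL REFERENCES calculation(id),
        input_id INTEGER NOT NULL REFERENCES data_object(id),
        PRIMARY KEY (calculation_id, input_id)
    );
    INSERT INTO data_object VALUES
        (1, 'raw_image'), (2, 'background'), (3, 'corrected_image'), (4, 'result');
    INSERT INTO calculation VALUES (1, 'correct', 3), (2, 'reduce', 4);
    INSERT INTO calculation_input VALUES (1, 1), (1, 2), (2, 3);
""")


def calculations_using(conn, object_name):
    """Forward pedigree: which calculations take this object as an input?"""
    rows = conn.execute(
        """SELECT c.name FROM calculation c
           JOIN calculation_input ci ON ci.calculation_id = c.id
           JOIN data_object d ON d.id = ci.input_id
           WHERE d.name = ? ORDER BY c.name""",
        (object_name,),
    )
    return [row[0] for row in rows]


def pedigree_of(conn, object_name):
    """Backward pedigree: every object that contributed to this one."""
    rows = conn.execute(
        """WITH RECURSIVE ancestor(id) AS (
               SELECT ci.input_id FROM calculation c
               JOIN calculation_input ci ON ci.calculation_id = c.id
               WHERE c.output_id = (SELECT id FROM data_object WHERE name = ?)
               UNION
               SELECT ci.input_id FROM ancestor a
               JOIN calculation c ON c.output_id = a.id
               JOIN calculation_input ci ON ci.calculation_id = c.id
           )
           SELECT name FROM data_object
           WHERE id IN (SELECT id FROM ancestor) ORDER BY name""",
        (object_name,),
    )
    return [row[0] for row in rows]
```

The first query answers "which calculations use this image"; the second walks the chain of inputs back from a result to the raw data it came from.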
Downloading data sets can compromise data
pedigree
Scientists demanded the ability to group data sets using a taxonomy that was
meaningful to them; they also needed to be able to quickly assess key
performance metrics of an experiment
A more feature-rich Viewer has become the primary
interface for scientists
Web page structure is still identical to the data model
of the underlying database but with more polish!
Tagging enables user-driven
groups to be established
Dashboards are templates for visualizing the key
performance metrics of an experiment
Scientists demanded support for multi-file, offline
analysis
“Suitcasing” streamlines offline analysis & adds traceability of what was downloaded
A complete package of data is moved from the
Viewer to Desktop and then back again
A data suitcase enables the export & import of
Scientific Analysis
The database’s pedigree framework associates downloaded data with
analysis results produced offline
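One way a data suitcase could work is sketched below: a zip archive with a manifest that records who exported which objects and a checksum for each, so re-imported work can be tied back to exactly what was downloaded. The archive layout, the `manifest.json` name, and both helper functions are assumptions for illustration; the slides do not describe the actual suitcase format.

```python
import hashlib
import io
import json
import zipfile


def pack_suitcase(objects, user):
    """Bundle named data objects plus a manifest into one archive (bytes)."""
    manifest = {
        "user": user,
        "objects": {
            name: hashlib.sha256(data).hexdigest() for name, data in objects.items()
        },
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("manifest.json", json.dumps(manifest, sort_keys=True))
        for name, data in objects.items():
            zf.writestr(name, data)
    return buf.getvalue()


def unpack_suitcase(blob):
    """Return (manifest, objects); raise if any object no longer matches its checksum."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        objects = {}
        for name, digest in manifest["objects"].items():
            data = zf.read(name)
            if hashlib.sha256(data).hexdigest() != digest:
                raise ValueError(f"checksum mismatch for {name}")
            objects[name] = data
    return manifest, objects


# Hypothetical file names and user; the contents are placeholder bytes.
suitcase = pack_suitcase(
    {"RAW_DANTE_SCOPE.h5": b"raw bytes", "SCOPECORR.h5": b"corrected bytes"},
    user="scientist1",
)
manifest, objects = unpack_suitcase(suitcase)
```

On import, the manifest's checksums identify which repository objects an offline result was derived from, which is the information a pedigree record needs.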
Discussions, often captured in Excel & PowerPoint,
were missing from the Repository
A Wiki built on services & the database added the missing feature set
Long-lived analysis & collaboration are supported by
a Wiki that integrates with the repository
Meetings, goals and expectations are as important as the experimental data
Data mining across experiments requires set-at-a-time visualization
Each shot is annotated with a statement of purpose and a brief statement of outcome
We have learned several lessons along the way;
some are worth writing down & sharing
• When the cost of data production is high, save everything
• All experimental data has structure; the question is “how much are
you willing to invest to find it?”
• Choose formats & data models that let the data describe themselves
• The most useful presentation of data for users is a file system
• Science is a social activity masquerading as an intellectual activity
• Users will show you the way to go, if you let them