Usability Issues Facing 21st Century Data Archives
Joey Mukherjee and David Winningham [email protected]
Current Archiving Goal
[Flow diagram; box labels: Mission Team, Raw Data, Processed Data, Write Papers, Data Iteration, Quality Data, Archive, Future Scientists. "Quality Data" appears twice.]
Current Archiving Reality
[Flow diagram; box labels: Mission Team, Raw Data, Processed Data, Write Papers, Data Iteration, Data Subsets, Permanent Archive, Future Scientists, Unchecked Data, Home Institution Archive, Public Data.]
New Goal
[Flow diagram; box labels: Mission Team, Raw Data, Processed Data, Write Papers, Data Iteration, Archive, Future Scientists. "Processed Data" appears three times.]
Standardizing HOWTO
Make it easy
Make it useful
Make it extensible
Make it Easy
Reading / writing files must be super easy (i.e. cheap!)
– Either with tools or libraries
Tools can be command line or GUI
Make it Useful
How do I look at it?
– Plots/Analysis
What else can I do with it?
– Read into IDL, Matlab, Excel, etc.
Must have immediate benefits
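One way to picture "read into IDL, Matlab, Excel" is a plain CSV export, since all three can import it. The sketch below is only an illustration: the function name `to_csv`, the column name, the ISO 8601 time strings, and the sample values are assumptions, not part of any format discussed in this talk.

```python
import csv
import io


def to_csv(times, columns):
    """Return a time series as CSV text.

    times   -- list of timestamp strings (ISO 8601 is assumed here)
    columns -- dict mapping a column name to a list of values,
               one value per timestamp
    """
    names = list(columns)
    for name in names:
        if len(columns[name]) != len(times):
            raise ValueError(f"column {name!r} does not match the time axis")

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["time"] + names)  # header row
    for i, t in enumerate(times):
        writer.writerow([t] + [columns[name][i] for name in names])
    return buffer.getvalue()


# Illustrative values only.
text = to_csv(
    ["2008-01-01T00:00:00Z", "2008-01-01T00:00:01Z"],
    {"electron_flux": [1.5, 2.0]},
)
print(text)
```

The point of the sketch is the size of the effort: if an export like this takes a few lines, the "immediate benefits" bar is easy to clear.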
Make it Extensible
Must be possible for others to add value-added services
Must be able to hold varieties of data
Must agree to give up control on content
Case Studies: HTML
Easy to create!
Once done, look at in browser
Embrace / Extend
Case Studies: SPASE
Creation is slow and difficult
Once created, no real benefits yet
VxOs have embraced, no one extended yet
Case Studies: IDFS
Until recently, difficult to create, complex
Once in, easy to look at, use, archive, etc.
Somewhat extensible
Things right with IDFS
Efficient
Self documenting
Calibrations stored in text file
Science units derived instead of stored
Little to no reprocessing ever needed
Other IDFS Benefits
Can store most types of space physics data from raw telemetry to highly processed science units
Reversible from science units to raw telemetry
Usable by data processor, scientist, and data archiver
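The ideas of keeping calibrations in a text file, deriving science units on demand, and reversing back to raw telemetry can be sketched in a few lines. This is not the IDFS calibration scheme or its API: the one-line-per-sensor text layout, the linear gain/offset model, the function names, and the numbers are all assumptions chosen to keep the illustration small.

```python
def parse_calibration(text):
    """Parse lines of 'name gain offset'; '#' starts a comment.

    The layout is hypothetical, not the IDFS calibration file format.
    """
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, gain, offset = line.split()
        table[name] = (float(gain), float(offset))
    return table


def to_science(raw_counts, cal):
    """Derive science units from raw counts (assumed linear calibration)."""
    gain, offset = cal
    return [gain * count + offset for count in raw_counts]


def to_raw(values, cal):
    """Invert the linear calibration to recover the raw integer counts."""
    gain, offset = cal
    return [round((value - offset) / gain) for value in values]


CAL_TEXT = """
# sensor     gain  offset   (illustrative values only)
temperature  0.5   -10.0
"""

cal = parse_calibration(CAL_TEXT)["temperature"]
raw_counts = [0, 20, 255]          # only the raw telemetry is stored
science = to_science(raw_counts, cal)
recovered = to_raw(science, cal)   # science units map back to telemetry
print(science, recovered)
```

Because only the raw counts are stored, correcting a calibration means editing the text file rather than reprocessing the archive, which is the "little to no reprocessing" benefit listed earlier.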
Things wrong with IDFS
Overly complex format and API
Not enough support in other tools - poor buy-in
Analysis routines merged with the file format - tried to do too much!
Implementation Plan
Develop a simple file format that can contain any and all types of time series space physics data
Develop tools that allow someone to create and inspect files in this format
Merge in the best parts of IDFS, CDF, netCDF, HDF, FITS, etc. without breaking the paradigm of simplicity
Simple File Format
Format might already exist:
– HDF5
– XML
– JSON
– Other data models?
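To make the JSON option concrete, here is a minimal sketch of a self-describing time series record with a write/read round trip and a one-line "inspect" summary. The field names (`format_version`, `variable`, `times`, `values`), the helper names, and the sample data are hypothetical; the talk does not define a schema.

```python
import json


def make_record(name, units, times, values):
    """Build a self-describing time series record (hypothetical layout)."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    return {
        "format_version": 1,
        "variable": {"name": name, "units": units},
        "times": times,
        "values": values,
    }


def summarize(record):
    """Return a one-line description, as an 'inspect' tool might print."""
    var = record["variable"]
    count = len(record["values"])
    if count == 0:
        return f"{var['name']} [{var['units']}]: no samples"
    first, last = record["times"][0], record["times"][-1]
    return f"{var['name']} [{var['units']}]: {count} samples, {first} to {last}"


# Illustrative values only.
record = make_record(
    "proton_density",
    "cm^-3",
    ["2008-01-01T00:00:00Z", "2008-01-01T00:01:00Z"],
    [4.2, 4.5],
)

encoded = json.dumps(record, indent=2)   # what would be written to disk
decoded = json.loads(encoded)            # what a reader would get back
print(summarize(decoded))
```

A text format like this is easy to create and inspect but is not efficient for large volumes, which is one reason a binary container such as HDF5 also appears on the list.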
Making it useful
Get buy-in from visualization tools (SDDAS, DataShop, VisBard, IDL DLM, etc.)
Get buy-in from archive sites (PDS, PSA, NSSDC, etc.)
Seed money is essential
Advantages
Providers
Users
Management
Advantages: Providers
Instrument teams now have something to work toward
Can develop expertise
Advantages: Users
Quick ways to create plots or access data
Expertise again!
Advantages: Management
Homogeneous archives are infinitely easier to manage and maintain
Value-added services are a natural extension of quality archives
Conclusion
Why now? Because SPASE is gaining traction, this is the next logical step.
This will save money for everyone in the long run.
Everyone benefits from value-added services.