What Makes a Data Archive Tick: Marrying Content and User Support
Steven Worley
National Center for Atmospheric Research
Computational and Information Systems Laboratory
May 17-21, 2010
Summer Institute for Data Curation for Earth and Environmental Science
Graduate School of Library and Information Science
University of Illinois, Urbana-Champaign
① How to make and keep the archive content relevant to the users?
② How to engage the users?
How to make and keep the archive content relevant to the users?
Know your users
Define your focus community
You cannot serve everyone
Design the service so that it does not limit others
At decision points (e.g. changes in service) ask: “Is this a significant benefit for my users?”
The case @ NCAR
Atmospheric, oceanic, and some related geoscience research
Graduate students and higher education
NCAR scientists, and researchers at universities with graduate degree programs in meteorology and oceanography
Over 50% of our 6000+ unique users each year are outside the focus group
Understand their science, both current work and trends
Attend seminars, symposia, and meetings where they present their work
Corollary: have science-educated staff
The case @ NCAR – Research Data Archive
All have an MS degree or higher
• meteorology (6)
• oceanography (2)
• computing science (1)
• exception: administration (1)
Routinely review journals, bulletins, and relevant newsletters
Search for science that depends strongly on your data focus
Contact the authors and offer a data-sharing service
@ NCAR
Develop close contacts with a few key users
Seek ‘honest’ opinions about your service
Make your service known through presentations and publications
@ NCAR
Know how your users work
How do they prefer to handle data?
Digital files: write and run program code to evaluate content
Digital files: specific formats that are application friendly, e.g. netCDF, GIS, WMO, ASCII text convenient for worksheets
Images of analyses (charts, line graphs, 2D/3D contoured plots)
@ NCAR
Digital files are key
Some images for discovery, but not critical
Design the systems to deliver what users want
Choosing the content
At decision points (e.g. adding a new dataset) ask: “Can we handle this efficiently?”
Does it supplement or extend the central data foci?
Does it address a new need or trend?
Are the formats aligned with user preferences? If not, can we make a cost-effective conversion?
Do you have staff (data scientists / stewards) who can understand the scientific content?
@ NCAR
Atmospheric, oceanic, and related geoscience observations, or analyses derived from observations, that support climate and weather research.
Choosing the content
Evaluate user metrics
Which datasets are most popular?
Who is using what – can you distinguish your focus group?
Are there any trends?
Caution: this is only part of the story
@ NCAR
Our user registration allows us to track this
Examples:
Unique users by service path
Users in four service categories:
• MSS – to the CISL HPC environment
• Web – to the worldwide community
• Orders – one-off, consulting-assisted data preparation
• TIGGE
About 6 thousand users annually
FY09: MSS = 266, Web = 5649, Orders = 196, TIGGE = 44
Amount of data by service path
162 TB in FY09
FY09 (TB): MSS = 31, Web = 120, Orders = 9, TIGGE = 2
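The two service-path slides can be cross-checked with a little arithmetic. The Python sketch below uses only the FY09 figures quoted above; the dictionary layout and the `shares` helper are illustrative and not part of NCAR's tooling.

```python
# FY09 figures from the slides, keyed by service path.
unique_users = {"MSS": 266, "Web": 5649, "Orders": 196, "TIGGE": 44}
data_tb = {"MSS": 31, "Web": 120, "Orders": 9, "TIGGE": 2}

def shares(counts):
    """Return the total and each path's percentage share (rounded to 0.1)."""
    total = sum(counts.values())
    return total, {path: round(100 * n / total, 1) for path, n in counts.items()}

users_total, user_pct = shares(unique_users)
tb_total, tb_pct = shares(data_tb)

# Summing per-path user counts can double count anyone who used more than
# one path, so users_total is an upper bound on distinct users.
print(users_total, user_pct)
print(tb_total, tb_pct)
```

The per-path user counts sum to 6155, consistent with "about 6 thousand users annually", and the volumes sum to the stated 162 TB. Dividing volume by users, MSS users moved roughly 0.12 TB each against about 0.02 TB for Web users, which suggests that the small HPC community accounts for a disproportionate share of the data volume.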
NCAR-CSM Symposium on Climate and Energy, 7 May 2010
User-ranked popular datasets
Top 10 datasets/groups, FY09 (~6000 unique users annually)
Unique users (FY09) | Dataset IDs | Title
2878 | ds082.0, ds083.2, ds083.0 | NCEP FNL Operational Model Global Tropospheric Analyses
924 | ds090.0 | NCEP/NCAR Global Reanalysis Products
510 | ds758.0, ds759.3, ds759.2 | NGDC Global 2' and 5' Elevations, USGS 30 ARC-second
477 | ds461.0, ds351.0, ds337.0, ds464.0, ds353.4 | NCEP ADP/PREPBUFR Global Surface and Upper Air Observations
358 | ds608.0 | NCEP North American Regional Reanalysis (NARR)
264 | ds609.2 | GCIP NCEP ETA model output
262 | ds540.1, ds540.0 | International Comprehensive Ocean-Atmosphere Data Set (ICOADS)
190 | ds744.4 | QSCAT/NCEP Blended Ocean Winds
173 | ds277.0 | NCEP V2.0 OI Global SST, V3.0 Extended Reconstructed Analyses
153 | ds335.0, ds336.0 | Unidata (IDD) Observations and Model Data
5921 | All datasets | All DSS datasets
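Counting unique users for a dataset group (a row that lists several dataset IDs) is a set union, not a sum, and the same holds across rows: the ten rows above add up to 6189, more than the 5921 users of all datasets, presumably because many users access more than one group. A minimal Python sketch, assuming a hypothetical access log in which each row carries a focus-group flag taken from the user's registration record:

```python
from collections import defaultdict

# Hypothetical access-log rows: (user_id, dataset_id, in_focus_group).
access_log = [
    ("u1", "ds083.2", True),
    ("u1", "ds083.0", True),
    ("u2", "ds083.2", False),
    ("u2", "ds090.0", False),
    ("u3", "ds090.0", True),
]

# Two of the dataset groups from the table.
groups = {
    "NCEP FNL": {"ds082.0", "ds083.2", "ds083.0"},
    "NCEP/NCAR Reanalysis": {"ds090.0"},
}

def unique_users_by_group(log, groups):
    """Map group name -> (distinct users, distinct focus-group users)."""
    users = defaultdict(set)
    focus = defaultdict(set)
    for user, dsid, in_focus in log:
        for name, members in groups.items():
            if dsid in members:
                users[name].add(user)  # a user counts once per group
                if in_focus:
                    focus[name].add(user)
    return {name: (len(users[name]), len(focus[name])) for name in groups}

counts = unique_users_by_group(access_log, groups)
all_users = {user for user, _, _ in access_log}
print(counts)
print(len(all_users))
```

Here user u1 touches two FNL dataset IDs but counts once for that group, and the per-group totals (2 + 2) exceed the 3 distinct users overall, the same effect seen in the table.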
Remain flexible – expect constant change
Be ready to take opportunities when they come along
Readjust priorities
Resist ‘tight’ mission control
Take advice from advisory groups, but don’t depend on them exclusively
Use a holistic approach
@ NCAR, an unplanned example:
Arctic System Reanalysis – NSF-sponsored research critical to assessing the changes happening in the Arctic
It needs controlled access to the first prototype data – we do this!
Sustaining for the long term
Richness and data value grow over time
Data assets tend to complement each other and add value to many different research questions
Scientific publications lead to broader and increased interest
Definitive data citation is a work in progress
Staffing needs to be base/core funded
Grant-directed funding can lead to a fractured, ad hoc, incomplete archive, which can be a major frustration for users
@ NCAR – the Research Data Archive
Began 40+ years ago
Today sustained by 9 persons
Collaborations
Participate in or volunteer for committees and panels that tackle data issues (of all sorts)
Learn from others, share knowledge
Share efforts and data with other organizations
No one group can do it all (no group has the resources and all the expertise required)
@ NCAR (conferences like SIDC for EES)
Volunteerism: NAS, AMS, NOAA, WMO, NASA
National and international data agreements with:
• European Centre for Medium-Range Weather Forecasts
• Japan Meteorological Agency
• U.S. National Weather Service, National Centers for Environmental Prediction
How to Engage the Users?
Data Discovery – how can people find you?
All 600+ RDA datasets have metadata in the GCMD
• Exported automatically via OAI-PMH
Similarly: RDA > CDP @ NCAR > BADC in the UK
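In OAI-PMH the harvesting service pulls metadata over plain HTTP: it sends a request such as `verb=ListRecords` and pages through the results with a `resumptionToken`. The sketch below shows that harvesting side in Python using only the standard library. The endpoint URL, the record identifier, and the choice of the `oai_dc` metadata prefix are assumptions for illustration; the slides do not say which metadata format the RDA exports.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request; 'from' limits it to records changed since a date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)

def parse_page(response_xml):
    """Return record identifiers and the resumptionToken (None when no pages remain)."""
    root = ET.fromstring(response_xml)
    ids = [e.text for e in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    return ids, (token.text if token is not None and token.text else None)

# Hypothetical endpoint and a trimmed-down response page.
url = list_records_url("https://example.org/oai", from_date="2010-05-01")
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header>
      <identifier>oai:example.org:ds277.0</identifier>
      <datestamp>2010-05-01</datestamp>
    </header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
ids, token = parse_page(sample)
print(url)
print(ids, token)
```

Because `resumptionToken` is an exclusive argument in OAI-PMH, the follow-up request would carry only `verb=ListRecords` and the token, not the original `metadataPrefix` or `from`.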
Design your portal to evolve – it will/should
2002
• Search
• Navigation
• List of menus
• Unique layout of links
• Picture of people
2008
• Search
• Two ways
• Navigation
• Links
• News
• Text
• People
Primary design feature for the web portal:
• Data Discovery – Find Data!
2010
• All about search
• Gone from the top: people, text, news
Navigation once they arrive
Working principles:
• Uniform across the web portal
• Keep organizational elements out of prime visual territory
@ NCAR
Have user registration – only required to get data
All discovery metadata open – unlimited searching
The complete data knowledge package, and data cycle
What is a complete data knowledge package? Rich metadata plus the data files!
One example: http://dss.ucar.edu/datasets/ds277.0/
The pieces that make rich metadata:
• Dataset navigation (Access, Documentation, Software)
• Title
• Summary
• Period of data record
• Update cycle
• Scientific parameters (variables)
• Earth reference levels
• Times – temporal increment
• Data types – points or grids
• Geospatial coverage
• Source organizations
• Related Internet sites
• Publications
• Acknowledgement statement
• Volume – size of the dataset
• Data formats
• Related datasets in the NCAR collection
• Consulting contact (email and phone)
• A 2nd pointer to Data Access
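One way to keep the rich-metadata pieces listed above together is a single record type per dataset, with a completeness check run before a dataset page is published. The field names below are hypothetical, chosen to mirror the slide bullets; they do not describe the RDA's actual metadata schema.

```python
from dataclasses import dataclass, field, fields

@dataclass
class DatasetMetadata:
    """Hypothetical per-dataset record mirroring the rich-metadata bullets."""
    dataset_id: str
    title: str = ""
    summary: str = ""
    period_of_record: str = ""            # start and end of the data record
    update_cycle: str = ""
    parameters: list[str] = field(default_factory=list)
    reference_levels: list[str] = field(default_factory=list)
    temporal_increment: str = ""
    data_types: list[str] = field(default_factory=list)   # points and/or grids
    spatial_coverage: str = ""
    source_organizations: list[str] = field(default_factory=list)
    related_sites: list[str] = field(default_factory=list)
    publications: list[str] = field(default_factory=list)
    acknowledgement: str = ""
    volume_bytes: int = 0
    data_formats: list[str] = field(default_factory=list)
    related_datasets: list[str] = field(default_factory=list)
    consulting_contact: str = ""
    access_url: str = ""

def missing_fields(record):
    """Names of fields that are still empty (or zero), in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]

rec = DatasetMetadata(
    "ds277.0",
    title="NCEP V2.0 OI Global SST, V3.0 Extended Reconstructed Analyses",
    access_url="http://dss.ucar.edu/datasets/ds277.0/",
)
missing = missing_fields(rec)
print(len(missing), missing[:3])
```

A check like this makes "rich metadata" an enforceable property rather than a guideline: a page with empty fields can be flagged before users ever see it.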
The complete data knowledge package, and data cycle
Data cycle facts:
• Datasets are republished as new versions.
• Datasets are corrected and extended in time or space.
• Scientific analysis and publication will occur at random points along the data cycle.
Because of the data cycle, data referencing is more challenging than traditional publication referencing.
How can you accurately trace/recover what has been used for a publication?
@ NCAR
We don’t have a systematic (organization-wide) way to handle the data cycle
We do not discard/delete old versions of data
Ad hoc approach
Currently building version-tracking software
Versioning will be included in the DOI implementation
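The question of how to recover exactly what was used for a publication can be sketched as a registry that never deletes old versions and stores a checksum for every file. This is a hypothetical Python illustration, not NCAR's version-tracking software; the names `publish`, `verify`, and the file name are invented for the example.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical snapshot: one dataset version and a checksum per file."""
    dataset_id: str
    version: str
    file_checksums: tuple  # sorted ((file_name, sha256_hex), ...) pairs

def snapshot(dataset_id, version, files):
    """Build a snapshot from {file_name: bytes}; SHA-256 identifies the exact content."""
    sums = tuple(sorted((name, hashlib.sha256(data).hexdigest())
                        for name, data in files.items()))
    return DatasetVersion(dataset_id, version, sums)

# (dataset_id, version) -> DatasetVersion. Entries are never deleted,
# matching the practice of keeping old versions of the data.
registry = {}

def publish(dataset_id, version, files):
    """Register a new version; refuse to overwrite an existing one."""
    key = (dataset_id, version)
    if key in registry:
        raise ValueError(f"{dataset_id} {version} exists; publish a new version instead")
    registry[key] = snapshot(dataset_id, version, files)
    return registry[key]

def verify(dataset_id, version, files):
    """True if the files match the recorded version byte for byte."""
    return snapshot(dataset_id, version, files) == registry[(dataset_id, version)]

# A corrected file becomes v2 while v1 stays recoverable.
publish("ds277.0", "v1", {"sst_1990.bin": b"original values"})
publish("ds277.0", "v2", {"sst_1990.bin": b"corrected values"})
print(verify("ds277.0", "v1", {"sst_1990.bin": b"original values"}))   # True
print(verify("ds277.0", "v1", {"sst_1990.bin": b"corrected values"}))  # False
```

A publication would then cite the pair (dataset ID, version), which is the kind of identifier a versioned DOI could point to, and anyone holding the files can check them against the registry.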
Consultation
Critical two-way communication
1. Benefits for the user
• Guidance to the best available datasets
• Consolidate research ideas into required data sources
• Software assistance
• Customized data preparation if necessary
2. Benefits to archive stewardship
• Detect ways to improve our search process
• Learn about data requirement trends
• Occasionally, acquire new data resources from scientific efforts
• Learn about data problems we might have
Provide research tool support and documentation
Provide users a starting point for data evaluation
• Simple access programs in the languages used by the focus community
• Pointers to applications (IDL, MATLAB, NCL, NCO, etc.)
• Specific examples are VERY helpful!
Software/applications and documentation must be maintained for the long term, to guarantee that users will understand the meaning and have access.
@ NCAR
Remain aware of proprietary software traps, e.g. for documents: will .xls be viable 50 years from now, now that .xlsx is the standard? Is .pdf any better?
Prefer data file formats that define everything to the byte/bit level; computer code could always be written to access these.
All kinds of reports, project descriptions, and documents that explain the intent of the data are vital for the long term.
Use dedicated document directories for each dataset.
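To illustrate why a byte-level format specification is enough to recover data decades later, here is a reader for a hypothetical fixed-length binary record, written only from its documented layout with Python's standard `struct` module. The layout, station ID, and values are invented for the example.

```python
import struct

# Hypothetical fixed-length record, documented to the byte (big-endian, no padding):
#   bytes 0-7    station ID, ASCII, space padded
#   bytes 8-9    year, signed 16-bit
#   byte  10     month, unsigned 8-bit
#   byte  11     day, unsigned 8-bit
#   bytes 12-15  latitude, hundredths of a degree, signed 32-bit
#   bytes 16-19  longitude, hundredths of a degree, signed 32-bit
#   bytes 20-21  temperature, tenths of a degree C, signed 16-bit
RECORD = struct.Struct(">8shBBiih")

def read_records(buf):
    """Decode every complete record in buf into a dict in physical units."""
    out = []
    for off in range(0, len(buf) - RECORD.size + 1, RECORD.size):
        sid, yr, mo, dy, lat, lon, t = RECORD.unpack_from(buf, off)
        out.append({
            "station": sid.decode("ascii").rstrip(),
            "date": (yr, mo, dy),
            "lat": lat / 100,
            "lon": lon / 100,
            "temp_c": t / 10,
        })
    return out

# Two identical records packed with the same layout, then decoded.
buf = RECORD.pack(b"STN001  ", 2010, 5, 17, 3985, -10467, 215) * 2
recs = read_records(buf)
print(len(recs), recs[0])
```

The reader needs nothing but the layout description, which is the point of the slide: a proprietary container may become unreadable, but a format defined to the byte can always be decoded by new code.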
Follow-up aid
Notification service for significant dataset changes
• If an error is corrected, notify all users of the data
Subscription service
• Inform users when new data is available
• Prepare special products based on a user-determined template, e.g. past requests
@ NCAR
We have an automated notification service, provided users register accurately
We do not have a subscription service – yet
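An automated correction notice depends on knowing who downloaded what, which is one reason accurate registration matters. A minimal Python sketch, assuming a hypothetical download log keyed by the e-mail address users gave at registration. The optional `since` cut-off (notify only those who downloaded after an error entered the data) is an added refinement, not something the slides describe.

```python
from datetime import date

# Hypothetical download log built from registered-user activity:
# (e-mail given at registration, dataset ID, download date).
downloads = [
    ("a@example.edu", "ds540.0", date(2009, 11, 2)),
    ("b@example.edu", "ds540.0", date(2010, 3, 15)),
    ("a@example.edu", "ds090.0", date(2010, 1, 7)),
    ("c@example.edu", "ds540.1", date(2010, 4, 20)),
]

def users_to_notify(log, dataset_id, since=None):
    """Distinct users of dataset_id, optionally only those downloading on or after 'since'."""
    return sorted({email for email, dsid, day in log
                   if dsid == dataset_id and (since is None or day >= since)})

def correction_notice(dataset_id, note):
    """Plain-text notice body; actually sending it is outside this sketch."""
    return f"Subject: Correction to {dataset_id}\n\n{note}"

everyone = users_to_notify(downloads, "ds540.0")
recent = users_to_notify(downloads, "ds540.0", since=date(2010, 1, 1))
print(everyone)  # ['a@example.edu', 'b@example.edu']
print(recent)    # ['b@example.edu']
```

Using a set before sorting means a user who downloaded the same dataset several times receives a single notice.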
① How to make and keep the archive content relevant to the users?
② How to engage the users?
http://dss.ucar.edu/