29 March 2004 Steven Worley, NSF/NCAR/SCD 1
Research Data Stewardship and Access
Steven Worley, CISL/SCD
Cyberinfrastructure meeting with Priscilla Nelson and NSF colleagues
How is cyberinfrastructure used in this domain?
• Harvest data to build RDA content
  – World-wide
• Create standard metadata
  – Enable discovery and metadata sharing
• Provide data access
  – Internally to NCAR/UCAR
  – Externally to global research community
Definition of the RDA
• 500 plus distinct archived datasets
• Continual growth for about 40 years
• Each has metadata displayed on a web page
• All data on the MSS (primary + backups)
  – 548K files
  – 100.5 TB
Harvest data to build RDA content
[Bar chart: Dataset Update Frequency. Number of datasets (axis 0 to 35) by frequency: Annual, Several/yr, Weekly, Monthly, Irregular, Daily. Total Active Datasets = 79]
[Bar chart: Dataset Update Method. Number of datasets (axis 0 to 60) by method: Other Dataset, Network, Tape, CDROM]
Harvest data to build RDA content
• Current network methods
  – Manual web download
  – Automatic scripted FTP
  – Subscription upload
  Commodity internet
• Limitations
  – Slow for large volumes
  – Success/failure checks are the responsibility of staff
• Future
  – Exploit larger bandwidth networks
  – Larger bandwidth tools, ESG… etc
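The "automatic scripted FTP" method, and the limitation that success/failure checks fall to staff, suggest one check that a script could perform itself. The following is a minimal sketch only, assuming Python's standard `ftplib` and a server that answers the FTP `SIZE` command; the function name `fetch_and_verify` and its arguments are hypothetical and are not taken from the RDA's actual harvesting scripts.

```python
import os
from ftplib import FTP  # a real transfer would pass an FTP instance as `ftp`


def fetch_and_verify(ftp, remote_name, local_path):
    """Download one file over an open FTP connection and check its size.

    Returns True only if the server reported a size and the local file
    matches it. This is a hypothetical helper, not the RDA's own code.
    """
    # Switch to binary mode so SIZE reports a byte count.
    ftp.voidcmd("TYPE I")
    expected = ftp.size(remote_name)  # may be None if the server gives no size

    # Stream the remote file to disk.
    with open(local_path, "wb") as out:
        ftp.retrbinary(f"RETR {remote_name}", out.write)

    actual = os.path.getsize(local_path)
    return expected is not None and actual == expected
```

A size match does not prove the content is intact, so a fuller check would also compare checksums where the data provider publishes them.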
Create standard metadata
• Legacy metadata
  – Hardcopy and images
  – Digitally online since about 1980
  – Local standardized format
• Currently
  – Legacy metadata remains available
    • Used to derive web pages
  – Transformed to standards used in CDP
  – Incorporated into THREDDS catalogues
    • Enable searches across UCAR
• Future
  – More detailed metadata for accurate discovery (e.g. file level metadata)
  – Continue to be exported through CDP and data server systems
Provide data access (delivery)
• Internally – to NCAR computing systems
• Currently, from the NCAR MSS
  – Supercomputer
  – Data analysis systems
  – Divisional computer systems
  MSS is a tape-based archive system not designed to be a scalable file server
• Future
• SANs between computer systems and MSS
• Enable rapid file service and unburden the archive system
Internal (MSS) access metrics
[Chart: Unique Users for MSS, by year, 1990 to 2004 (axis 0 to 500)]
[Chart: Data Delivery from MSS, in terabytes, by year, 1990 to 2004 (axis 0 to 25)]
Files read for 2004
• 25K
Provide data access (delivery)
• Externally – to the internet
• Caveat: some NCAR users
• Currently, traditional data server
  – Web and FTP downloads
    • Most popular data only (166K files, 10.7 TB)
  – Subsetting
    • By request and delayed mode processing
• Future
  – More traditional services
  – Key datasets available through portals (CDP/ESG)
Provide data access (delivery)
• Data server (Web and FTP) metrics
• Jan. – Feb. 2005 Only
  – New system to accurately track users
  – Old system provided “fuzzy” metrics

                 January 2005   February 2005
  Unique Users   517            523
  Amount (TB)    1.2            1.8
  No. Files      6151           12403
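The two months of data server metrics can be turned into per-file and per-user figures. This is a rough sketch, assuming that "TB" on the slide means 10**12 bytes (the slide does not say whether decimal or binary units are used); the dictionary layout and the function name `mean_file_size_mb` are illustrative only.

```python
# Figures copied from the slide; TB is ASSUMED to mean 10**12 bytes.
metrics = {
    "January 2005": {"users": 517, "tb": 1.2, "files": 6151},
    "February 2005": {"users": 523, "tb": 1.8, "files": 12403},
}


def mean_file_size_mb(tb, files):
    """Average delivered file size in MB (10**6 bytes) for one month."""
    return tb * 1e12 / files / 1e6


for month, m in metrics.items():
    size_mb = mean_file_size_mb(m["tb"], m["files"])
    files_per_user = m["files"] / m["users"]
    print(f"{month}: {size_mb:.0f} MB per file, "
          f"{files_per_user:.1f} files per user")
```

Under that unit assumption, the number of files roughly doubled between the two months while the user count stayed nearly flat, so the average file delivered in February was smaller than in January.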
Future
• Fact
  – Dataset size and complexity are growing – need to handle more data
• How?
  – Use advanced networks to harvest rapidly
  – More complete metadata, in a standard
    • Improved data discovery and access
    • Improved (more efficient) data management
  – Provide critical collections through portals
    • Interoperable access through servers (e.g. GDS)
  – Distributed archives
    • Share metadata with other portals (global discovery)
Key Case – ERA-40
• 35 TB collection, 30 distinct product lines
• Added about 10 products (computed in SCD)
  – Support Climate Modeling
• Metrics for 2004

                         Web & FTP   NCAR MSS   Total
  Unique Users           68          70         129
  Number of Data Files   28426       12898      41324
  Data Amount (GB)       10778       9500       20278

• Web & FTP = MSS in Data Amount
• Over 20 TB delivered
• 13K files from non-file server MSS
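The ERA-40 metrics can be cross-checked with simple arithmetic. The sketch below uses only the numbers on the slide; the variable names are illustrative, and the reading of the user totals (that "Total" counts each person once across both access paths) is an assumption rather than something the slide states.

```python
# 2004 ERA-40 metrics as shown on the slide.
web_ftp = {"users": 68, "files": 28426, "gb": 10778}
mss = {"users": 70, "files": 12898, "gb": 9500}
total = {"users": 129, "files": 41324, "gb": 20278}


def check_additive(key):
    """True if the Web & FTP and MSS columns sum to the Total column."""
    return web_ftp[key] + mss[key] == total[key]


files_ok = check_additive("files")  # file counts add up
gb_ok = check_additive("gb")        # data amounts add up

# Unique users do not add up (68 + 70 != 129). ASSUMING the total counts
# each person once, the difference is the number who used both paths.
overlap = web_ftp["users"] + mss["users"] - total["users"]

# Share of the delivered volume that went out through Web & FTP.
web_share = web_ftp["gb"] / total["gb"]

print(files_ok, gb_ok, overlap, f"{web_share:.0%}")
```

The two access paths are close but not exactly equal in data amount, so "Web & FTP = MSS" reads best as "roughly equal".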
Conclusions
Are using basic cyberinfrastructure now
Will use new proven components in our operations
With cyberinfrastructure we plan to:
• improve data acquisition, discovery, and access
• improve our management efficiency
In the process we will:
• seamlessly integrate new and traditional systems
• not lose track of critical legacy data and metadata
Questions/Discussion