![Page 1: A Geospatial Data Catalog and Metadata Management Tools for the U.S. Environmental Protection Agency’s Western Ecology Division David L. Bradford Geosciences](https://reader030.vdocument.in/reader030/viewer/2022033105/56649e4c5503460f94b41723/html5/thumbnails/1.jpg)
A Geospatial Data Catalog and Metadata Management Tools
for the U.S. Environmental Protection Agency’s
Western Ecology Division
David L. Bradford
Geosciences
Oregon State University
Introduction
• U.S. EPA Summer Internship: Western Ecology Division, Corvallis, OR
• Large amount of GIS data (4 TB) representing 20+ years' worth of research
• Common national datasets
• Virtually no metadata and no central index
• Hard to know whether/where data exist
• MISSION: come up with a catalog for these geospatial data…
• …with one intern, no budget, no new infrastructure, and do it all in 14 weeks?
Introduction
• Background: the Western Ecology Division (WED) & the need for metadata
• Research questions, hypothesis: give them a fish or teach them to fish?
• Approach: system development life cycle
• Results: EPA Synchronizer, GeoData Gateway, & metadata “harvesting”
• Discussion & Conclusions: automating metadata creation, overcoming institutional inertia
Background
• June through September 2007
• The WED – laboratory under the National Health & Environmental Effects Research Laboratory (NHEERL)
• EPA Office of Research & Development (ORD)
• Project team: Connie Burdick, Denis White, Randy Comeleo, Patrick Clinton, & yours truly
• Help from: Office of Environmental Information (OEI) GeoData Gateway team
Metadata
• Information about data
• Self-indexing, fitness for purpose, how to manipulate (Green & Bossomaier, 2002; Longley et al., 2005)
• Time-consuming (i.e., expensive) to create (e.g., Ma, 2007)
• A “hassle” for the analyst
• Standard: Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) (FGDC, 1998)
• LINCHPIN: GOOD METADATA
• Objective: Tools to create standards-compliant metadata and automate the process as much as possible
Existing EPA Process
• WED projects launched, GIS data created
• Different PIs, different goals, shared analysts
• Before: informal “over-the-cubicle-wall” communication was sufficient to manage data; could get by without metadata
• Now: informal methods breaking down
• GIS analysts/contractors recently dispersing to different offices, buildings, sites
• Data now require multiple disk volumes
Existing Resources & Infrastructure
• Data storage: Windows NT-based servers (2.5 TB), Linux RAID server (1.5 TB)
• Web server: Windows NT-based (IIS)
• ESRI ArcGIS Suite, ArcObjects Libraries
• EPA Metadata Editor (EME)
• Second Copy (batch file copy utility)
• GeoData Gateway (GDG)
• Microsoft Visual Studio 2005 Integrated Development Environment (IDE)
Other Parameters and Constraints
• Budget: 1 summer intern
• Team: 4 analysts, 1 developer (the intern), 1 GDG administrator, local tech support
• Users: 14 GIS analysts (half contract staff); ~ 50 local GIS data “consumers”
• Data: 4 TB (coverages & shapefiles)
Other Parameters and Constraints (cont.)
• Standards & Policies
– FGDC-CSDGM
– EPA National Geospatial Data Policy
– EPA Metadata Technical Specification v1.0
– GeoData Gateway Governance Structure
• Primary constraint: Don’t relocate the data! Interlinked, interdependent datasets
Challenges
• Can an effective geospatial catalog system be assembled, using existing EPA resources, that has minimal long-term administrative costs?
• Can such a system be more than just a one-time inventory, i.e., can the solution be sustained by the WED GIS community long after the programmer leaves?
Propositions
• A sustainable geospatial catalog solution can be developed using existing or freely available (e.g., open source) tools, software components, and EPA resources
• Regardless of architecture, in order to be self-sustaining, it will require that primary GIS users implement a policy of creating consistent metadata
• The system cannot be fully implemented within 14 weeks
Approach
• System Development Life Cycle
– Identify the need: done
– Requirements Analysis: identify resources,
constraints, functionality, user interfaces
– Architectural Design: weigh options,
choose strategy, develop “blueprint”
Approach (cont.)
• System Development Life Cycle (cont.)
– Software Development: code missing
components, unit test
– Integrated System Testing: implement
components and test entire system
– User Training and Implementation: “roll it
out”
Results: Requirements Analysis
• Support existing processes
• Use existing infrastructure
• Arcane, “homegrown” solution: No
• Low maintenance solution: Yes
• User interfaces:
– ArcGIS-Integrated
– Web Portal
• Don’t relocate datasets
Results: Architectural Design
1. Metadata creation/maintenance
• GIS analyst responsibility
• But, as automated as possible using EPA Synchronizer - new software tool
• Edit/validate metadata using EPA Metadata Editor (EME) - existing tool
• EPA Synchronizer uses EME Defaults Database (local MS Access database)
• Once this step happens, the rest is magic
Results: Architectural Design
2. Internal “harvesting” of metadata
• Weekly server process that runs automatically (Second Copy)
• Locates all new & modified metadata files contained within specified disk volumes
• Copies metadata files (including their containing directory structure) to a “web accessible folder” (WAF) on the WED’s intranet server
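The harvesting step above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Second Copy configuration the WED used: the function name `harvest_metadata`, the `last_run_time` parameter, and the assumption that metadata files are the `.xml` files ArcGIS writes beside each dataset are all hypothetical.

```python
import shutil
from pathlib import Path


def harvest_metadata(volume_root, waf_root, last_run_time):
    """Copy metadata files changed since last_run_time into the WAF.

    Assumption: metadata files are the *.xml files stored beside each
    dataset. Returns the list of destination paths that were written.
    """
    volume_root = Path(volume_root)
    waf_root = Path(waf_root)
    copied = []
    for src in sorted(volume_root.rglob("*.xml")):
        if not src.is_file():
            continue
        # Only new or modified files are harvested.
        if src.stat().st_mtime <= last_run_time:
            continue
        # Recreate the containing directory structure under the WAF.
        dest = waf_root / src.relative_to(volume_root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves the modification time
        copied.append(dest)
    return copied
```

A weekly scheduled job could call this once per disk volume, passing the timestamp of the previous run. Only the metadata is copied, so the datasets themselves never move.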
Results: Architectural Design
3. GeoData Gateway (GDG) metadata harvest
• ESRI GIS Portal Toolkit server (the catalog system)
• maintained by EPA Office of Environmental Information
• Configured to automatically harvest the WED’s metadata from the WAF
• Validates metadata and posts to GDG catalog
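The slides do not describe how the Gateway validates a record, so the following is only a guess at the simplest useful check: confirm that a handful of required elements are present and non-empty before a record is posted. The list `REQUIRED_PATHS` is a hypothetical subset chosen for illustration, written with CSDGM short element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of required elements (CSDGM short names); the actual
# GeoData Gateway validation rules are not given in the slides.
REQUIRED_PATHS = [
    "idinfo/citation/citeinfo/title",
    "idinfo/descript/abstract",
    "idinfo/descript/purpose",
]


def missing_elements(xml_text):
    """Return the required paths that are absent or empty in a record."""
    root = ET.fromstring(xml_text)
    missing = []
    for path in REQUIRED_PATHS:
        node = root.find(path)
        # Treat whitespace-only text the same as a missing element.
        if node is None or not (node.text or "").strip():
            missing.append(path)
    return missing
```

A record would be posted to the catalog only when the returned list is empty; otherwise the list tells the analyst which fields still need attention.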
Results: Architectural Design
4. Users search GDG using ArcCatalog or a web browser
• full-text searchable on any metadata element value
• can search using geographic extent (completely within or overlapping)
• results returned include full local path to actual dataset
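The two geographic search modes above ("completely within" and "overlapping") reduce to simple bounding-box comparisons. The sketch below shows that logic under stated assumptions: the `Extent` type and function names are hypothetical, coordinates are decimal degrees, and boxes that cross the antimeridian are not handled. It is not the GIS Portal Toolkit's actual implementation.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    """Bounding box in decimal degrees (hypothetical helper type)."""
    west: float
    south: float
    east: float
    north: float


def is_within(dataset, search):
    """True if the dataset extent lies completely inside the search box."""
    return (search.west <= dataset.west and dataset.east <= search.east
            and search.south <= dataset.south and dataset.north <= search.north)


def overlaps(dataset, search):
    """True if the dataset extent shares any area or edge with the search box."""
    return (dataset.west <= search.east and search.west <= dataset.east
            and dataset.south <= search.north and search.south <= dataset.north)
```

For example, with rough illustrative boxes `state = Extent(-124.6, 41.9, -116.4, 46.3)` and `basin = Extent(-123.7, 43.4, -121.6, 45.9)`, `is_within(basin, state)` is true while `is_within(state, basin)` is false, and `overlaps` is true in both directions.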
Results: Software Development
• Synchronization: the term used by ESRI to describe the update of metadata using internal dataset info
© 2002 ESRI
Results: Software Development
• A custom tool, called the EPA Synchronizer, was developed based on ESRI white paper and sample code
• Written in Visual Basic using ArcObjects libraries
• Can automatically create most of the metadata, pulling values from two sources: dataset, and EME defaults database
• User then inserts Title, Abstract, Purpose, & Supplemental Info using EME
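The actual EPA Synchronizer was written in Visual Basic against ArcObjects; the Python sketch below only illustrates the merging idea described above. It combines values read from the dataset with values from a defaults store into a CSDGM-style XML tree and leaves the four user-supplied fields empty. The function names, the dictionary inputs, and the rule that dataset values override defaults are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Elements the analyst still fills in by hand in EME (CSDGM short names).
USER_FIELDS = {
    "idinfo/citation/citeinfo/title",
    "idinfo/descript/abstract",
    "idinfo/descript/purpose",
    "idinfo/descript/supplinf",
}


def set_value(root, path, value):
    """Create any missing elements along path, then set the leaf text."""
    node = root
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    node.text = str(value)


def synchronize(dataset_values, default_values):
    """Build a metadata tree from two sources.

    Assumption: when both sources supply a value, the dataset wins.
    """
    root = ET.Element("metadata")
    merged = {**default_values, **dataset_values}
    for path, value in merged.items():
        if path not in USER_FIELDS:
            set_value(root, path, value)
    for path in sorted(USER_FIELDS):
        set_value(root, path, "")  # empty placeholder for the analyst
    return root
```

Here `dataset_values` stands in for properties read from the dataset itself (for example, bounding coordinates under `idinfo/spdom/bounding`), and `default_values` stands in for entries from the EME defaults database (for example, the originator under `idinfo/citation/citeinfo/origin`).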
Results: Unit Testing
Remainder of process is automated.
Results: Integrated System Testing
• Identify major commonly-used national and regional datasets
• Start process of creating metadata for them
• Automated processes for harvesting metadata would be triggered
• Full system test would be enabled
• This step has barely begun
Results: User Training and Implementation
• Implementation has not yet occurred
• Draft of instructional user documentation completed, focused on metadata creation and catalog searching
• Technical instructions detail installation and configuration of software tools, harvesting processes, and GDG administration
• Catalog (create metadata for) select existing datasets
• Create metadata for new datasets
Discussion
• Seemingly monumental challenge at first, but untapped existing resources emerged (GDG, EME, Second Copy, web server)
• Federated approach:
– autonomy in data maintenance
– non-intrusive data access
– no changes to data structure
• An elegant, minimalist solution
Discussion
• But the jury is still out.
• Odds of success would increase with:
– Dedicated permanent staff vs. temporary; GIS service and support requires GIS skills, administrative skills, and IT skills (Longley et al., 2005; Longstreth, 1995)
– A champion in the organization; someone needs to foster a high level of support for the project (Obermeyer, 1995)
– Conscious effort to overcome institutional inertia; turf battles, unwillingness to reorganize can kill a project (Evans and Ferreira, 1995)
– Formalized quality control of digital information
– Less paranoia, less government red tape
Conclusion
• Data used in a shared environment become cleaner – more complete and correct (Craig, 1995)
• Useful legacy datasets will receive new metadata
• Some unseen hurdles remain; will need a champion to see it through
• GDG team has plans to bundle EPA Synchronizer with EPA Metadata Editor
Obermeyer and Pinto, 1994
Literature Cited
Craig, William J. (1995). Why We Can’t Share Data: Institutional Inertia. In: Onsrud, H.J. and G. Rushton (Eds.) Sharing Geographic Information. Rutgers University, Center for Urban Policy Research, New Brunswick, New Jersey: 107-118.
ESRI (2002). Creating a Custom Metadata Synchronizer, An ESRI White Paper. July 2002. ESRI, Redlands, CA. http://www.esri.com, last accessed November 26, 2007.
Evans, John and J. Ferreira Jr. (1995). Sharing Spatial Information in an Imperfect World: Interactions Between Technical and Organizational Issues. In: Onsrud, H.J. and G. Rushton (Eds.) Sharing Geographic Information. Rutgers University, Center for Urban Policy Research, New Brunswick, New Jersey: 448-460a.
FGDC (1998). FGDC-STD-001-1998, Content Standard for Digital Geospatial Metadata. Federal Geographic Data Committee, June 1998.
Green, David and T. Bossomaier (2002). Online GIS and Spatial Metadata. Taylor & Francis, London; New York.
Longley, Paul A., M.F. Goodchild, D.J. Maguire, and D.W. Rhind (2005). Geographic Information Systems and Science, 2nd Ed. John Wiley & Sons, Ltd, Chichester, West Sussex, England.
Longstreth, Karl (1995). GIS Collection Development, Staffing, and Training. Journal of Academic Librarianship, vol. 21 no. 4: 267-275.
Ma, Jin (2007). SPEC Kit 298: Metadata. Association of Research Libraries, Washington, DC.
Obermeyer, Nancy J. (1995). Reducing Inter-Organizational Conflict to Facilitate Sharing Geographic Information. In: Onsrud, H.J. and G. Rushton (Eds.) Sharing Geographic Information. Rutgers University, Center for Urban Policy Research, New Brunswick, New Jersey: 138-148.
Obermeyer, Nancy J. and J.K. Pinto (1994). Managing Geographic Information Systems. The Guilford Press, New York.
Questions?