Post on 31-Mar-2015
Enhanced Data Description for End Users
ScribeKey, LLC
Brian Hebert, Solutions Architect
www.scribekey.com
ScribeKey Project Experience
• Global FGDC Metadata production for large commercial data provider(s)
• Federal Agency Assistance: Assess, describe, and standardize large collection of geospatial datasets
• Experience with data cleansing, metadata, integration, presentation, application development.
200+ Countries, 72 Layers, 100s of Attributes, 100s of Domains, Quarterly Updates
50+ States, 400 Layers, 1000s of Attributes, 100s of Domains, Annual Updates
Goal: Make Data Easy to Understand and Use
• Data users today have more information than ever to keep track of.
• Individual provider data may be just part of larger data use and mission.
• Learning about data can take considerable time and effort.
• How can we best help data customers understand and use data effectively?
• Reduce the learning curve.
Multiple Data Description Sources
Diagram labels: Website, Documentation, Metadata, User, Tech Support, Data Itself
Users learn how to use data through a variety of sources.
Data Description Checklist
• Is there a Data User Guide? A glossary and index?
• Are primary data categories and entities fully described?
• Are all acronyms, abbreviations, provider vocabulary terms explained?
• Are short, cryptic database field names and values explained?
• Are data types, lengths, keys, nulls allowed, formats, lists clear to help user form SQL queries?
• Is FGDC/ISO Metadata available?
• Are sample values and data profiles available?
• Are data presentations, maps, symbols, reports prepared for quick start?
• All this info in one place?
Solution: Lightweight HTML Data Dictionary
Diagram labels: Meaning, Structure, Contents
Complete metadata describes Meaning, Structure, and Contents. Maximize understanding by end user to help write queries/reports.
Full descriptions of data categories, entities, attributes, domain values. Information integrated from documentation, data profiles, metadata, and data provider website. Available as standalone HTML or on web site.
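As a rough illustration of the lightweight HTML idea, the sketch below renders a few dictionary entries as one standalone page. The entry fields, the sample layer, and the function name `render_dictionary` are assumptions made for this example; the slides do not show ScribeKey's actual format.

```python
from html import escape

# Hypothetical dictionary entries; field names and values are invented for this sketch.
ENTRIES = [
    {
        "name": "ROADS",
        "geometry": "Line",
        "definition": "Street centerlines <draft>",
        "attributes": ["ROAD_ID", "NAME", "CLASS"],
    },
]


def render_dictionary(entries):
    """Return one standalone HTML page describing every entry."""
    parts = ["<html><body><h1>Data Dictionary</h1>"]
    for entry in entries:
        # Escape all text so field values cannot break the markup.
        parts.append(f"<h2>{escape(entry['name'])} ({escape(entry['geometry'])})</h2>")
        parts.append(f"<p>{escape(entry['definition'])}</p>")
        items = "".join(f"<li>{escape(a)}</li>" for a in entry["attributes"])
        parts.append(f"<ul>{items}</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)


page = render_dictionary(ENTRIES)
```

Writing `page` to a single `.html` file gives a dictionary that can be opened locally or posted to a website without a server-side component.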
Dataset Overview
A Library Science Indexing/Abstracting approach is taken to ensure the most important and useful information is seen first.
Focus here is on clearly describing top level data categories, layers and tables.
Key data provider terminology and concepts are explained.
Layer and Table Details
Includes Name, Geometry Type, Definition, Attribute List, Keywords, and link to standard FGDC/ISO Metadata.
Drill down to review Attributes and Domains.
FGDC metadata is typically organized and accessed as a set of separate XML documents. ScribeKey’s approach integrates these separate documents, making all information available at a single access point.
Search/Highlight/Filter/Sort
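One way to picture the integration of separate FGDC XML documents is a small index built from each file. The sketch below assumes the standard FGDC CSDGM short element names (`idinfo`, `citeinfo`, `title`, `abstract`, `eainfo`, `attr`, `attrlabl`, `attrdef`); the two inline documents and the `summarize` helper are invented for illustration and are not ScribeKey's tooling.

```python
import xml.etree.ElementTree as ET

# Two tiny stand-ins for separate FGDC metadata documents (contents invented).
DOCS = [
    """<metadata>
      <idinfo>
        <citation><citeinfo><title>Roads</title></citeinfo></citation>
        <descript><abstract>Street centerlines.</abstract></descript>
      </idinfo>
      <eainfo><detailed>
        <attr><attrlabl>NAME</attrlabl><attrdef>Street name</attrdef></attr>
      </detailed></eainfo>
    </metadata>""",
    """<metadata>
      <idinfo>
        <citation><citeinfo><title>Parcels</title></citeinfo></citation>
        <descript><abstract>Tax parcels.</abstract></descript>
      </idinfo>
      <eainfo><detailed>
        <attr><attrlabl>OWNER</attrlabl><attrdef>Owner of record</attrdef></attr>
      </detailed></eainfo>
    </metadata>""",
]


def summarize(xml_text):
    """Pull the title, abstract, and attribute definitions out of one document."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("idinfo/citation/citeinfo/title", default=""),
        "abstract": root.findtext("idinfo/descript/abstract", default=""),
        "attributes": [
            (a.findtext("attrlabl", default=""), a.findtext("attrdef", default=""))
            for a in root.iterfind("eainfo/detailed/attr")
        ],
    }


# A single lookup keyed by dataset title: the "single access point".
index = {summary["title"]: summary for summary in map(summarize, DOCS)}
```

An index like this could then feed search, filter, and sort features over all layers at once.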
Attributes and Domain Values
Core Data Info: All dataset metadata including Data Type, Length, Format, Nulls Allowed, Primary and Foreign Keys, Join Information, Sample Values, Percent Complete.
This data profiling information is essential for end users wanting to generate information products such as reports, maps, charts, and graphs from SQL queries.
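Several of the items listed here (data type, nulls allowed, primary key, sample values, percent complete) can be derived directly from a database. The following is a minimal sketch using Python's built-in `sqlite3` module; the `roads` table, its rows, and the `profile_columns` helper are assumptions for this example, and a production profiler would need to handle other database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE roads (road_id INTEGER PRIMARY KEY, name TEXT, class TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO roads (name, class) VALUES (?, ?)",
    [("Main St", "S1400"), (None, "S1400"), ("Route 9", "S1200"), (None, "S1100")],
)


def profile_columns(conn, table, sample_size=3):
    """Return one profile dict per column of `table` (table name must be trusted)."""
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    profiles = []
    for _cid, name, col_type, notnull, _default, pk in columns:
        # COUNT(column) counts only non-null values.
        filled = conn.execute(f'SELECT COUNT("{name}") FROM "{table}"').fetchone()[0]
        samples = [
            row[0]
            for row in conn.execute(
                f'SELECT DISTINCT "{name}" FROM "{table}" '
                f'WHERE "{name}" IS NOT NULL ORDER BY 1 LIMIT ?',
                (sample_size,),
            )
        ]
        profiles.append({
            "name": name,
            "type": col_type,
            "nulls_allowed": not notnull,
            "primary_key": bool(pk),
            "percent_complete": round(100.0 * filled / total, 1) if total else 0.0,
            "samples": samples,
        })
    return profiles


profiles = profile_columns(conn, "roads")
```

Each resulting dict corresponds to one attribute row in a data dictionary page.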
Helping with the Data Provider/End User Communication Gap
Data providers and users have different languages and understandings of data. Use of keywords, aliases, and definitions in data dictionary helps bridge this gap; provides a translation.
User Language: “Layer, Table, Attribute, Map, Symbol, Centroid, Join, Report”
Provider Language: “Impute, FROMHN, EDGES, ADDRFN, Internal Point, MTFCC, S1100”
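The translation role of aliases can be sketched as a simple glossary lookup that annotates provider terms with user-facing wording. The glossary entries below are illustrative assumptions rather than definitions taken from the slides, and `translate` is a hypothetical helper name.

```python
import re

# Illustrative glossary: provider term -> user-facing alias (assumed wording).
GLOSSARY = {
    "FROMHN": "from house number",
    "MTFCC": "feature class code",
    "S1100": "primary road",
}


def translate(text, glossary=GLOSSARY):
    """Append the user-facing alias after each known provider term."""
    pattern = r"\b(" + "|".join(map(re.escape, glossary)) + r")\b"

    def annotate(match):
        term = match.group(0)
        return f"{term} ({glossary[term]})"

    return re.sub(pattern, annotate, text)


annotated = translate("Filter EDGES where MTFCC = S1100")
```

Terms missing from the glossary, such as `EDGES` here, pass through unchanged, which makes gaps in the dictionary easy to spot.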
How Does Data Profiling Help?
An essential tool for enhanced metadata: shows end user actual sample values, data types, lengths, formats, percent complete, etc. This valuable contents information is typically not found in metadata.
NUM  FIELD          DESCRIPTION
1    DatasetId      A unique identifier for the dataset
2    DatabaseName   The name of the source database
3    TableName      The name of the source database table
4    RecordCount    The number of records in the table
5    ColumnCount    The number of columns in the table
6    NumberOfNulls  The number of null values in the table
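The six fields listed above can be filled in with a few queries per table. This is a minimal sketch using Python's built-in `sqlite3` module; the `parcels` table, the `demo.db` name, and the `profile_table` helper are assumptions for illustration only.

```python
import sqlite3


def profile_table(conn, dataset_id, database_name, table):
    """Build one table-level profile record (table name must be trusted)."""
    # Column names are the second field of each PRAGMA table_info row.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")').fetchall()]
    record_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    null_count = 0
    for col in columns:
        null_count += conn.execute(
            f'SELECT COUNT(*) FROM "{table}" WHERE "{col}" IS NULL'
        ).fetchone()[0]
    return {
        "DatasetId": dataset_id,
        "DatabaseName": database_name,
        "TableName": table,
        "RecordCount": record_count,
        "ColumnCount": len(columns),
        "NumberOfNulls": null_count,
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (parcel_id INTEGER, owner TEXT, zoning TEXT)")
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?)",
    [(1, "Smith", None), (2, None, None), (3, "Lee", "R1")],
)
record = profile_table(conn, 1, "demo.db", "parcels")
```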
ScribeKey Metadata Generation
• Sample data is reviewed and profiled. Any metadata is imported into the repository.
• From the profile, existing user documentation, technical support staff, and website, a metadata repository is populated and metadata document templates are developed.
• FGDC/ISO Metadata is generated, as XML/HTML reports, from the metadata repository.
Diagram labels: Metadata Repository, Metadata Templates, Metadata Export App, FGDC XML, HTML, DOC
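The export step can be pictured as turning one repository record into an FGDC-style XML fragment. The sketch below uses standard FGDC CSDGM short element names, but the record layout and the `build_fgdc` function are assumptions, and the output is only a fragment, not a complete, valid FGDC record.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository record; the keys are invented for this sketch.
RECORD = {
    "title": "Roads",
    "abstract": "Street centerlines.",
    "attributes": [("NAME", "Street name"), ("CLASS", "Road class code")],
}


def build_fgdc(record):
    """Return an FGDC-style XML fragment for one repository record."""
    root = ET.Element("metadata")
    idinfo = ET.SubElement(root, "idinfo")
    citeinfo = ET.SubElement(ET.SubElement(idinfo, "citation"), "citeinfo")
    ET.SubElement(citeinfo, "title").text = record["title"]
    descript = ET.SubElement(idinfo, "descript")
    ET.SubElement(descript, "abstract").text = record["abstract"]
    detailed = ET.SubElement(ET.SubElement(root, "eainfo"), "detailed")
    for label, definition in record["attributes"]:
        attr = ET.SubElement(detailed, "attr")
        ET.SubElement(attr, "attrlabl").text = label
        ET.SubElement(attr, "attrdef").text = definition
    return ET.tostring(root, encoding="unicode")


xml_text = build_fgdc(RECORD)
```

Because every output is generated from the same record, an HTML or DOC report built from it stays consistent with the XML.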
Map, Query, Report Preparation
Metadata Layers .MXD Preparation
Prepared for end user quick start: can include symbol set up, joins/relates, maps, queries, reports.
Use metadata to create GIS layers to allow variety of map presentations, reports, etc. to summarize and highlight datasets by metadata values.
The Geospatial Metadata Repository
The Metadata Repository, implemented as an RDBMS, is populated with automated tools, then used to generate metadata outputs, data dictionary content, schemas, maps, etc.
Diagram labels: Data Layers, Metadata, Documents, Assessments; METADATA REPOSITORY (Areas, Entities, Attributes, Domains); Derivative Datasets, Meta-Maps, Pivot Tables, Schemas, Data Dictionary, Enhanced User Views
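A relational layout for the four repository levels named in the diagram (Areas, Entities, Attributes, Domains) might look like the sketch below. The column names, the parent-child keys, and the sample rows are assumptions; the slides do not show the actual schema.

```python
import sqlite3

# Assumed schema: each level references its parent level.
SCHEMA = """
CREATE TABLE areas (
    area_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE entities (
    entity_id INTEGER PRIMARY KEY,
    area_id INTEGER NOT NULL REFERENCES areas(area_id),
    name TEXT NOT NULL,
    definition TEXT
);
CREATE TABLE attributes (
    attribute_id INTEGER PRIMARY KEY,
    entity_id INTEGER NOT NULL REFERENCES entities(entity_id),
    name TEXT NOT NULL,
    data_type TEXT,
    definition TEXT
);
CREATE TABLE domains (
    domain_id INTEGER PRIMARY KEY,
    attribute_id INTEGER NOT NULL REFERENCES attributes(attribute_id),
    code TEXT NOT NULL,
    meaning TEXT
);
"""

repo = sqlite3.connect(":memory:")
repo.executescript(SCHEMA)
repo.execute("INSERT INTO areas VALUES (1, 'Transportation')")
repo.execute("INSERT INTO entities VALUES (1, 1, 'ROADS', 'Street centerlines')")
repo.execute("INSERT INTO attributes VALUES (1, 1, 'CLASS', 'TEXT', 'Road class code')")
repo.executemany(
    "INSERT INTO domains VALUES (?, ?, ?, ?)",
    [(1, 1, "A", "Example value A"), (2, 1, "B", "Example value B")],
)

# One join walks the full hierarchy, which is what a data dictionary page needs.
rows = repo.execute(
    """
    SELECT a.name, e.name, t.name, d.code, d.meaning
    FROM areas a
    JOIN entities e ON e.area_id = a.area_id
    JOIN attributes t ON t.entity_id = e.entity_id
    JOIN domains d ON d.attribute_id = t.attribute_id
    ORDER BY d.code
    """
).fetchall()
```

The same tables could drive the other outputs listed in the diagram, since each is a different query or report over this hierarchy.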
Recap: ScribeKey Data Description Support
• Generate or Upgrade FGDC/ISO Metadata
• Profile Data to provide user with actual contents information
• Help develop Data User Guides (PDF) and Website Copy
• Help author Indexes, Abstracts, and Glossaries
• Integrate multiple and separate data description materials in a single lightweight HTML front end.
• Help prepare ArcMap, .mxd, symbols, joins, reports, and maps
• Result: Data is as easy to understand and use as possible
About www.scribekey.com
• ScribeKey, LLC: Massachusetts Corporation
• Brian Hebert, PMP, 30+ years designing and building desktop and web DB/GIS solutions
• Extensive experience producing metadata and data dictionaries for data providers and end users
• Extensive experience with data integration, data quality assessments, data cleansing, ETL, and application development with ESRI/ArcObjects, .NET, SQL, XML, HTML
• Small focused teams, template approach, quick turnarounds, practical approach