Enhanced Data Description for End Users ScribeKey, LLC Brian Hebert, Solutions Architect www.scribekey.com

Post on 31-Mar-2015

Page 1

Enhanced Data Description for End Users

ScribeKey, LLC

Brian Hebert, Solutions Architect

www.scribekey.com

Page 2

ScribeKey Project Experience

• Global FGDC Metadata production for large commercial data provider(s)

• Federal Agency Assistance: Assess, describe, and standardize large collection of geospatial datasets

• Experience with data cleansing, metadata, integration, presentation, application development.

200+ Countries, 72 Layers, 100s of Attributes, 100s of Domains, Quarterly Updates

50+ States, 400 Layers, 1000s of Attributes, 100s of Domains, Annual Updates

Page 3

Goal: Make Data Easy to Understand and Use

• Data users today have more information than ever to keep track of.

• Individual provider data may be just part of larger data use and mission.

• Learning about data can take considerable time and effort.

• How can we best help data customers understand and use data most effectively?

• Reduce the learning curve.

Page 4

Multiple Data Description Sources

Users learn how to use data through a variety of sources.

Diagram labels: Website, Documentation, Metadata, User, Tech Support, Data Itself, Email

Page 5

Data Description Checklist

• Is there a Data User Guide? A glossary and index?

• Are primary data categories and entities fully described?

• Are all acronyms, abbreviations, provider vocabulary terms explained?

• Are short, cryptic database field names and values explained?

• Are data types, lengths, keys, nulls allowed, formats, lists clear to help user form SQL queries?

• Is FGDC/ISO Metadata available?

• Are sample values and data profiles available?

• Are data presentations, maps, symbols, reports prepared for quick start?

• All this info in one place?

Diagram labels: Meaning, Structure, Contents

Complete metadata describes Meaning, Structure, and Contents. Maximize understanding by end user to help write queries/reports.

Page 6

Solution: Lightweight HTML Data Dictionary

Full descriptions of data categories, entities, attributes, domain values. Information integrated from documentation, data profiles, metadata, and data provider website. Available as stand-alone HTML or on a web site.
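As a rough illustration of what a lightweight, stand-alone HTML data dictionary could look like, the sketch below renders a small nested dictionary of entities, attributes, and domain values into a single page. It is not ScribeKey's implementation; the `DATA_DICTIONARY` contents, the `render_dictionary` function, and the page layout are all assumptions made for this example.

```python
from html import escape

# Hypothetical dictionary content; the layer, attribute, and domain values are invented.
DATA_DICTIONARY = {
    "Roads": {
        "definition": "Road centerlines",
        "attributes": {
            "ROAD_CLS": {
                "definition": "Road class code",
                "domain": {"1": "Primary road", "2": "Secondary road"},
            },
        },
    },
}


def render_dictionary(dictionary):
    """Return one stand-alone HTML page listing entities, attributes, and domain values."""
    parts = ["<html><body><h1>Data Dictionary</h1>"]
    for entity, info in dictionary.items():
        parts.append(f"<h2>{escape(entity)}</h2>")
        parts.append(f"<p>{escape(info['definition'])}</p>")
        for attr, meta in info["attributes"].items():
            parts.append(f"<h3>{escape(attr)}</h3>")
            parts.append(f"<p>{escape(meta['definition'])}</p>")
            parts.append("<ul>")
            # Domain values are optional; attributes without a coded domain get an empty list.
            for code, meaning in meta.get("domain", {}).items():
                parts.append(f"<li>{escape(code)}: {escape(meaning)}</li>")
            parts.append("</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)


html_page = render_dictionary(DATA_DICTIONARY)
```

The resulting string can be written to a single `.html` file and opened locally or published on a web site, which matches the stand-alone delivery described on this slide.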

Page 7

Dataset Overview

A Library Science Indexing/Abstracting approach is taken to ensure the most important and useful information is seen first.

Focus here is on clearly describing top level data categories, layers and tables.

Key data provider terminology and concepts are explained.

Page 8

Layer and Table Details

Includes Name, Geometry Type, Definition, Attribute List, Keywords, and link to standard FGDC/ISO Metadata.

Drill down to review Attributes and Domains.

FGDC metadata is typically organized and accessed as a set of separate XML documents. ScribeKey’s approach integrates these separate documents, making all information available at a single access point.

Search/Highlight/Filter/Sort

Page 9

Attributes and Domain Values

Core Data Info: All dataset metadata including Data Type, Length, Format, Nulls Allowed, Primary and Foreign Keys, Join Information, Sample Values, Percent Complete.

This data profiling information is essential for end users who want to generate information products such as reports, maps, charts, and graphs from SQL queries.
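To make the column-level items above concrete (data type, length, nulls, sample values, percent complete), here is a minimal profiling sketch using Python's built-in `sqlite3` module. The `roads` table, its rows, and the `profile_column` helper are assumptions for illustration only; the slides do not say which database or tooling ScribeKey uses.

```python
import sqlite3


def profile_column(conn, table, column, sample_size=3):
    """Summarize one column: declared type, max length, nulls, percent complete, samples."""
    # Table and column names are interpolated into the SQL, so pass trusted names only.
    declared_types = {
        row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")
    }
    total, non_null, max_length = conn.execute(
        f'SELECT COUNT(*), COUNT("{column}"), MAX(LENGTH("{column}")) FROM {table}'
    ).fetchone()
    samples = [
        row[0]
        for row in conn.execute(
            f'SELECT DISTINCT "{column}" FROM {table} '
            f'WHERE "{column}" IS NOT NULL ORDER BY "{column}" LIMIT ?',
            (sample_size,),
        )
    ]
    return {
        "data_type": declared_types[column],
        "max_length": max_length,
        "null_count": total - non_null,
        "percent_complete": round(100.0 * non_null / total, 1) if total else None,
        "sample_values": samples,
    }


# Hypothetical sample table used only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roads (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO roads (name) VALUES (?)",
    [("Main St",), ("Elm St",), (None,), ("Oak Ave",)],
)
profile = profile_column(conn, "roads", "name")
```

With one null among four rows, the profile reports the column as 75 percent complete, which is the kind of contents information a user needs before relying on the column in a query or report.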

Page 10

Helping with the Data Provider/End User Communication Gap

Data providers and users have different languages and understandings of data. Use of keywords, aliases, and definitions in the data dictionary helps bridge this gap by providing a translation.

User Language: “Layer, Table, Attribute, Map, Symbol, Centroid, Join, Report”

Provider Language: “Impute, FROMHN, EDGES, ADDRFN, Internal Point, MTFCC, S1100”

Page 11

How Does Data Profiling Help?

An essential tool for enhanced metadata: shows end user actual sample values, data types, lengths, formats, percent complete, etc. This valuable contents information is typically not found in metadata.

NUM  FIELD          DESCRIPTION
1    DatasetId      A unique identifier for the dataset
2    DatabaseName   The name of the source database
3    TableName      The name of the source database table
4    RecordCount    The number of records in the table
5    ColumnCount    The number of columns in the table
6    NumberOfNulls  The number of null values in the table
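The six profile fields listed above could be filled in for a table roughly as follows. This is a sketch under stated assumptions: it uses SQLite through Python's `sqlite3` module, the `edges` table and its rows are invented, and `DatasetId` and `DatabaseName` are simply passed in by the caller because the slides do not say how they are assigned.

```python
import sqlite3


def profile_table(conn, dataset_id, database_name, table):
    """Build one table-level profile record using the field names from the slide."""
    # Table names are interpolated into the SQL, so pass trusted names only.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    record_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    # NumberOfNulls is interpreted here as the total of null cells across all columns.
    number_of_nulls = sum(
        conn.execute(
            f'SELECT COUNT(*) FROM {table} WHERE "{col}" IS NULL'
        ).fetchone()[0]
        for col in columns
    )
    return {
        "DatasetId": dataset_id,
        "DatabaseName": database_name,
        "TableName": table,
        "RecordCount": record_count,
        "ColumnCount": len(columns),
        "NumberOfNulls": number_of_nulls,
    }


# Hypothetical sample table used only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (id INTEGER, fullname TEXT, mtfcc TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [(1, "Main St", "S1400"), (2, None, "S1100"), (3, None, None)],
)
record = profile_table(conn, dataset_id=1, database_name="demo", table="edges")
```

Counting nulls per column and summing them is only one reading of "the number of null values in the table"; a per-column breakdown may be more useful in practice.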

Page 12

ScribeKey Metadata Generation

• Sample data is reviewed and profiled. Any existing metadata is imported into the repository.

• From the profile, existing user documentation, technical support staff, and website, a metadata repository is populated and metadata document templates are developed.

• FGDC/ISO Metadata is generated, as XML/HTML reports, from the metadata repository.

Diagram labels: Metadata Repository; Metadata Templates; Metadata Export App; FGDC XML, HTML, PDF, DOC
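The export step can be pictured with a short sketch that turns one repository record into an FGDC-style XML fragment using Python's standard `xml.etree.ElementTree`. This is not the Metadata Export App itself. The element short names (`idinfo`, `citeinfo`, `eainfo`, `detailed`, `enttyp`, `attr`, and so on) are intended to follow the FGDC CSDGM entity and attribute section and should be checked against the standard; the fragment omits the many elements a complete, valid record requires, and the sample values are invented.

```python
import xml.etree.ElementTree as ET


def build_fgdc_fragment(title, entity_label, entity_def, attributes):
    """Return a partial FGDC-style metadata tree for one entity and its attributes."""
    root = ET.Element("metadata")

    # Identification section: only the dataset title is filled in for this sketch.
    idinfo = ET.SubElement(root, "idinfo")
    citation = ET.SubElement(idinfo, "citation")
    citeinfo = ET.SubElement(citation, "citeinfo")
    ET.SubElement(citeinfo, "title").text = title

    # Entity and attribute section: one entity type plus its attribute definitions.
    eainfo = ET.SubElement(root, "eainfo")
    detailed = ET.SubElement(eainfo, "detailed")
    enttyp = ET.SubElement(detailed, "enttyp")
    ET.SubElement(enttyp, "enttypl").text = entity_label
    ET.SubElement(enttyp, "enttypd").text = entity_def
    for label, definition in attributes:
        attr = ET.SubElement(detailed, "attr")
        ET.SubElement(attr, "attrlabl").text = label
        ET.SubElement(attr, "attrdef").text = definition
    return root


# Hypothetical repository content for one layer.
root = build_fgdc_fragment(
    title="Sample Roads Dataset",
    entity_label="Roads",
    entity_def="Road centerlines",
    attributes=[("NAME", "Street name"), ("ROAD_CLS", "Road class code")],
)
xml_text = ET.tostring(root, encoding="unicode")
```

Because the tree is built from repository values rather than edited by hand, the same record could also feed an HTML report, in line with the multiple output formats shown on this slide.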

Page 13

Map, Query, Report Preparation

Metadata Layers

Use metadata to create GIS layers to allow a variety of map presentations, reports, etc. to summarize and highlight datasets by metadata values.

.MXD Preparation

Prepared for end user quick start: can include symbol set up, joins/relates, maps, queries, reports.

Page 14

The Geospatial Metadata Repository

The Metadata Repository, implemented as an RDBMS, is populated with automated tools and then used to generate metadata outputs, data dictionary content, schemas, maps, etc.

Diagram labels: Data Layers, Metadata, Documents, Assessments; METADATA REPOSITORY (Areas, Entities, Attributes, Domains); Derivative Datasets, Meta-Maps, Pivot Tables, Schemas, Data Dictionary, Enhanced User Views
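One way to picture a repository organized around Areas, Entities, Attributes, and Domains is the relational sketch below, built with Python's `sqlite3` module. The table and column names, the strict Area > Entity > Attribute > Domain hierarchy, and the sample rows are all assumptions for illustration; the slides name the four concepts but do not show the actual schema.

```python
import sqlite3

# Hypothetical schema: each level references its parent through a foreign key.
SCHEMA = """
CREATE TABLE areas (
    area_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE entities (
    entity_id INTEGER PRIMARY KEY,
    area_id INTEGER NOT NULL REFERENCES areas(area_id),
    name TEXT NOT NULL,
    definition TEXT
);
CREATE TABLE attributes (
    attribute_id INTEGER PRIMARY KEY,
    entity_id INTEGER NOT NULL REFERENCES entities(entity_id),
    name TEXT NOT NULL,
    data_type TEXT,
    definition TEXT
);
CREATE TABLE domains (
    domain_id INTEGER PRIMARY KEY,
    attribute_id INTEGER NOT NULL REFERENCES attributes(attribute_id),
    code TEXT NOT NULL,
    meaning TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO areas VALUES (1, 'Transportation')")
conn.execute("INSERT INTO entities VALUES (1, 1, 'Roads', 'Road centerlines')")
conn.execute(
    "INSERT INTO attributes VALUES (1, 1, 'ROAD_CLS', 'TEXT', 'Road class code')"
)
conn.executemany(
    "INSERT INTO domains VALUES (?, ?, ?, ?)",
    [(1, 1, "1", "Primary road"), (2, 1, "2", "Secondary road")],
)

# A single join yields flat rows suitable for data dictionary or pivot table output.
rows = conn.execute(
    """
    SELECT a.name, e.name, t.name, d.code, d.meaning
    FROM areas a
    JOIN entities e ON e.area_id = a.area_id
    JOIN attributes t ON t.entity_id = e.entity_id
    LEFT JOIN domains d ON d.attribute_id = t.attribute_id
    ORDER BY a.name, e.name, t.name, d.code
    """
).fetchall()
```

Keeping the four levels in separate tables means one query can drive several of the outputs named on this slide, such as data dictionary pages and schema listings, without duplicating descriptions.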

Page 15

Recap: ScribeKey Data Description Support

• Generate or Upgrade FGDC/ISO Metadata

• Profile Data to provide user with actual contents information

• Help develop Data User Guides (PDF) and Website Copy

• Help author Indexes, Abstracts, and Glossaries

• Integrate multiple and separate data description materials in a single lightweight HTML front end.

• Help prepare ArcMap, .mxd, symbols, joins, reports, and maps

• Result: Data is as easy to understand and use as possible

Page 16

About www.scribekey.com

• ScribeKey, LLC: Massachusetts Corporation

• Brian Hebert, PMP, 30+ years designing and building desktop and web DB/GIS solutions

• Extensive experience producing metadata and data dictionaries for data providers and end users

• Extensive experience with data integration, data quality assessments, data cleansing, ETL, and application development with ESRI/ArcObjects, .NET, SQL, XML, HTML

• Small focused teams, template approach, quick turnarounds, practical approach
