Introduction to the GAO Enterprise Taxonomy Project, Version 1.0, June 17, 2008 (Draft)
Introduction
• Definitions, Concepts, Context
• About GAO
• Development Process
  o Research
  o Strategy
  o Design
  o Implementation
    Roadmap Sequencing Plan
  o Administration
• Elements of Information Architecture Maturity
• Search Beta Demo
Definitions, Concepts, Context
• Enterprise Search
  o Google
  o Better Metadata = Better Search
• What is Taxonomy?
• What is Information Architecture (IA)?
• Perspective from a Taxonomy Manager Point of View
• Other perspectives
  o Web Designer
  o Usability Engineer
  o Data Architect
Information Architecture for the World Wide Web, Peter Morville and Louis Rosenfeld
Definitions, Concepts, Context: Metadata Schema (excerpt)

Element         Data Type  Length    Required  Source           Purpose
Asset Metadata
Unique ID       String     Variable  Y         System supplied  System identifier to retrieve item.
Creator         String     Variable  Y         System supplied  Editorial ownership.
Title           String     Variable  N         System supplied  Text search & results display
Description     String     Variable  N         User supplied
Date            Date       Fixed     N         System supplied  Publish, feature, & review content.
Subject Metadata: Search and Browse (Faceted Navigation)
Topic           String     Variable  Y         Topic CV
Program         String     Variable  Y         Program CV
Agency          String     Variable  Y         Agency CV
Type            String     Variable  Y         Content Type CV
Use Metadata
Security Level  String     Variable  N         Security CV      Use control
Audience        String     Variable  N         Audience CV      Target, personalize content.

Courtesy of EPA
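A schema like the excerpt above can be checked mechanically. The following is a minimal sketch, assuming a subset of the elements and made-up controlled-vocabulary (CV) values; the `Element` class, the `validate` function, and the sample records are hypothetical and are not part of the GAO or EPA material.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    name: str
    required: bool
    vocabulary: Optional[frozenset] = None  # None means free text


# Hypothetical controlled-vocabulary values, for illustration only.
TOPIC_CV = frozenset({"Space exploration", "Water quality"})
TYPE_CV = frozenset({"Report", "Testimony"})

# A subset of the schema excerpt (Program, Agency, and others omitted for brevity).
SCHEMA = [
    Element("Unique ID", required=True),
    Element("Creator", required=True),
    Element("Title", required=False),
    Element("Description", required=False),
    Element("Topic", required=True, vocabulary=TOPIC_CV),
    Element("Type", required=True, vocabulary=TYPE_CV),
]


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for element in SCHEMA:
        value = record.get(element.name)
        if value is None or value == "":
            if element.required:
                problems.append(f"missing required element: {element.name}")
            continue
        if element.vocabulary is not None and value not in element.vocabulary:
            problems.append(
                f"{element.name}: {value!r} is not in the controlled vocabulary"
            )
    return problems


good = {"Unique ID": "A-1", "Creator": "Team X",
        "Topic": "Space exploration", "Type": "Report"}
bad = {"Unique ID": "A-2", "Topic": "Rockets", "Type": "Report"}

print(validate(good))
print(validate(bad))
```

The required flag and the CV check correspond to the Required and Source columns of the table; length and data-type checks are left out of this sketch.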
Definitions, Concepts, Context: Faceted Classification
• A faceted classification schema enables search and discovery by multiple attributes. These facets bring additional context to the search for assets.
[Figure: a content item classified by facets. Recoverable labels: Content, Space Shuttle, Space exploration, Engagement, Report, NASA, 00’-05’, Creator, Consumer]
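As a rough illustration of discovery by multiple attributes, the sketch below describes each asset with several independent facets and retrieves assets that match any combination of them. The facet names follow the Subject Metadata elements; the records and the `find` helper are invented for this example.

```python
# Each asset is described by several independent facets.
# The sample values echo labels used in this deck; the records are made up.
ASSETS = [
    {"title": "Shuttle safety review", "Topic": "Space exploration",
     "Agency": "NASA", "Type": "Report"},
    {"title": "Shuttle budget hearing", "Topic": "Space exploration",
     "Agency": "NASA", "Type": "Testimony"},
    {"title": "Drinking water audit", "Topic": "Water quality",
     "Agency": "EPA", "Type": "Report"},
]


def find(assets, **facets):
    """Return assets whose facet values match every requested facet."""
    return [a for a in assets
            if all(a.get(name) == value for name, value in facets.items())]


reports = find(ASSETS, Type="Report")
nasa_reports = find(ASSETS, Agency="NASA", Type="Report")

print([a["title"] for a in reports])
print([a["title"] for a in nasa_reports])
```

Because the facets are independent, a user can start from any attribute (agency, topic, or type) and combine them in any order.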
Definitions, Concepts, Context: Faceted Navigation System
Navigation System:
• Wireframes
• Blueprints (Site Maps)
• Global/Local Templates
• Hierarchies
• A-Z Index
Search and Browse Zone: Faceted Navigation, aka Classification Scheme or Taxonomy
Faceted Navigation: The volume and granularity of content present findability problems. Some systems integrate search and browse, allowing users to go back and forth.
Search System:
• Query Builder
• Search Engine
• Relevance Ranking
• Results Presentation
• Metadata Schema
• Controlled Vocabulary
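One way to picture the back-and-forth between search and browse is a keyword query followed by facet counts that the user can click to narrow the results. The sketch below assumes a naive title match in place of a real search engine; the documents and the function names are hypothetical.

```python
from collections import Counter

# Made-up documents for illustration.
DOCS = [
    {"title": "Space Shuttle safety review", "Type": "Report", "Agency": "NASA"},
    {"title": "Space Shuttle budget testimony", "Type": "Testimony", "Agency": "NASA"},
    {"title": "Space Shuttle cost report", "Type": "Report", "Agency": "NASA"},
    {"title": "Drinking water audit", "Type": "Report", "Agency": "EPA"},
]


def search(docs, keyword):
    """Search step: naive case-insensitive match on the title."""
    needle = keyword.lower()
    return [d for d in docs if needle in d["title"].lower()]


def facet_counts(docs, facet):
    """Browse step: count how many results fall under each facet value."""
    return Counter(d[facet] for d in docs)


def refine(docs, facet, value):
    """Narrow the current result set to one facet value."""
    return [d for d in docs if d[facet] == value]


hits = search(DOCS, "shuttle")
counts = facet_counts(hits, "Type")
narrowed = refine(hits, "Type", "Testimony")

print(counts)
print([d["title"] for d in narrowed])
```

The counts are computed over the current result set, so each refinement shows the user what remains before they commit to the next click.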
Definitions, Concepts, Context: User Centered Design Focus
The user should be able to:
• Search multiple repositories efficiently and intuitively
• Find an object without having to know where it is stored
• Use keyword queries integrated with browse to discover both known and unknown data
• Save searches and apply personal tags to content, increasing its findability
• Expose relationships between items; increased context improves sense-making
• Use a common, familiar information model when searching across repositories
• Keep search simple
“Deliver the right information to the right person at the right time”
Mission and Work
GAO’s Mission is to support the Congress in meeting its constitutional responsibilities and to help improve the performance and ensure the accountability of the federal government for the benefit of the American people. We provide Congress with timely information that is objective, fact-based, nonpartisan, non-ideological, fair, and balanced.
GAO’s Work is done at the request of congressional committees or subcommittees or is mandated by public laws or committee reports. We also undertake research under the authority of the Comptroller General. We support congressional oversight by:
  o auditing agency operations to determine whether federal funds are being spent efficiently and effectively;
  o investigating allegations of illegal and improper activities;
  o reporting on how well government programs and policies are meeting their objectives;
  o performing policy analyses and outlining options for congressional consideration; and
  o issuing legal decisions and opinions, such as bid protest rulings and reports on agency rules.
GAO Engagement Overview
• The Engagement Management Process sets forth specific activities that need to be completed for an engagement. These activities allow an engagement to successfully proceed to product issuance. The activities are not necessarily done sequentially. This process applies to GAO's:
  o Congressionally requested work
  o Legislative mandates
  o Comptroller General Authority (CGA) work
• Engagement Process: Phases or Activities
  o Acceptance
  o Planning and Design
  o Data Gathering and Analysis
  o Product Development and Distribution
  o Results
• Many Document Types
  o Report, Testimony, Decision, Guidance, CG Presentation, etc.
• ~3 million documents in the electronic records management system
• ~180,000 engagement publications (audit/legal)
• ~40 engagement system applications
Acceptance
- Evaluate Request, Mandate, CGA Proposal
- Make Acceptance Decision
- Communicate Decision
Planning & Design
- Staff Engagement
- Launch Engagement
- Design Engagement
- Plan Engagement
- Commit to Engagement
Data Gathering & Analysis
- Gather Data
- Analyze Data
- Reach Message Agreement
Product Development & Distribution
- Develop Product
- Obtain Concurrence
- Index and Reference
- Address Agency Comments
- Perform Final Processing
Results
- Recommendation Tracking
- Accomplishment Reporting
- Audit Documentation Archive
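Engagement Systems
[Diagram: engagement databases db1 through db11 and the fields each one holds, shown across the Creation, Indexing, and Publication stages. Recoverable labels:]
db1: Requestor, Subject, Team, Accepted Date
db2: Requestor, Subject, Team, Job #
db4: Team, Subject, Title, Objectives, Job #
db6: Document, Team, Job #
db8: Team, Title, Product #
db10: Requestor, Team, Job #, Title, Product #
db3, db5, db7, db9, db11: no fields listed
Other labels: Strategic Planning, Workforce Planning, Performance & Accountability Management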
Typical Enterprise Semantic Problem
“Let us go down there and confuse their language”
Information assets are:
• Fragmented
• Decentralized
• Inconsistently described
Multiple repositories, search systems, and results presentations are confusing to the user:
• Where is the information stored?
• What keywords will retrieve the information?
To-Be GAO Information Architecture:
• Enterprise IA Vision: Unify Information Space
• Common Information Infrastructure
• Taxonomy is the common metadata and vocabulary that provides meaning and context to assets
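A common metadata layer is often reached through a crosswalk that renames each system's local fields to shared element names. The sketch below is illustrative only: the system names, the local field spellings, and the common element names are assumptions, not GAO's actual systems or schema.

```python
# Hypothetical crosswalk: system -> {local field name: common element name}.
CROSSWALK = {
    "tracking": {"Job #": "job_number", "Requestor": "requester", "Team": "team"},
    "publishing": {"JobNo": "job_number", "Product #": "product_number",
                   "Title": "title"},
}


def harmonize(system, record):
    """Rename a record's local fields to the common element names.

    Fields with no crosswalk entry are returned separately so that a
    person can review them instead of silently dropping them.
    """
    mapping = CROSSWALK[system]
    common, unmapped = {}, {}
    for field, value in record.items():
        if field in mapping:
            common[mapping[field]] = value
        else:
            unmapped[field] = value
    return common, unmapped


a, _ = harmonize("tracking", {"Job #": "120001", "Team": "IT"})
b, leftover = harmonize("publishing",
                        {"JobNo": "120001", "Title": "Example", "Notes": "n/a"})

print(a)
print(b)
print(leftover)
```

Once both records carry the same `job_number` element, a single query can retrieve them from either repository without the user knowing where each is stored.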
Development Process: Near and Longer Term View
[Diagram: phases Research, Strategy, Design, Implementation, Administration. Other labels: Longer Term IA Program; Iterative: Series of mini-projects; Near Term]
Research: Information Gathering and Analysis
• Audiences, tasks, needs, information-seeking behavior, experience, vocabularies
• Business goals, funding, politics, culture, technology, human resources
• Document types, content objects, metadata, volume, existing structure
Information Architecture for the World Wide Web, Peter Morville and Louis Rosenfeld
Research Methods
• Focus Groups
• Interviews
• Questionnaires
• Benchmarking
Research: Benchmarking

Requirement                   NASA  World Bank  EPA
Faceted Navigation            Yes   Yes         Yes
Core Metadata Specification   Yes   Yes         Yes
Enterprise Metadata Profile   Yes   Yes         Yes
Metadata Registry/Repository  Yes   Yes         Yes

Tools, by tool type and vendor (lists may be incomplete)
• NASA: Categorization, Extraction, Faceted Navigation, Taxonomy Manager, Inxight, Siderean, SchemaLogic
• World Bank: Categorization, Clustering, Extraction, Summarization, Taxonomy Manager, Teragram
• EPA: Categorization, Taxonomy Manager, Content Intelligence Services, Synaptica

Statement of Needs and corresponding Target Requirements
• Corporate Taxonomy: Faceted Navigation
• Develop an enterprise vocabulary using manual or technological solution: Manual or Extraction Tool
• Automatic assignment of content to relevant categories: Categorization Tool
• GAO needs a consistent information infrastructure shared across different applications: Core Metadata Specification, Information Model, Metadata Registry/Repository
NASA: View from the top...
IA work supports many different stakeholders
Priority: Improve our ability to work more efficiently
Goal: Improve ability to store, archive, retrieve project information
Capabilities:
• Enterprise Content Management
• Product Lifecycle Management
• Information Discovery and Retrieval
Processes:
• Document Storage
• Web Content Management
• Records Management
• Work Flow
• Product Data Management
• Requirements Management
• Risk Management
• Cross Repository Retrieval
• External Partners Data Exchange
• Access Verification
• Export Compliance
Technologies:
• Electronic Library - DocuShare
• Document Repository - Teamcenter Community
• Web Content - Rhythmyx
• PDMS - Teamcenter Enterprise
• Requirements Repository - DOORS, Cradle, Core
• Risk Management - ARM
• Portals - Inside JPL, Teamcenter Community
• Search Engine - Google
• Problem Reporting - PFR/PRS
• Manufacturing/Inventory - iPICS
Common Information Infrastructure:
• Security: Authentication
• Metadata Standards
• Domain Taxonomies
• Schema Registries
• Unique Object Identifiers
Courtesy of NASA
Strategy: GAO To-Be IA (Draft)
• Involve users in information gathering and analysis to identify and develop a set of enterprise information architecture needs or capabilities
• Define a common information infrastructure (common metadata and vocabulary) necessary to sufficiently describe GAO’s information assets
• Design an enterprise asset tagging workflow that increases data integrity and productivity
• Develop an approach for data validation, both for new metadata and for the conversion of existing metadata from GAO systems
• Create an EIA roadmap that highlights near term “quick wins” and longer term to-be architectures
[Diagram: the Business/Financial, Information and Technology Mgt, Engagement Audit/Legal, and Human Capital domains each harmonize with Common Metadata and Vocabulary]
Strategy: Some Specific Capabilities (Draft)
• Simple and transparent metadata capture process for content creation
• Auto-population of metadata
  o Entity/concept extraction, categorization, clustering, summarization
  o Process for metadata creation and conversion
• Improve data integrity, quality and governance
• Semantic interoperability
Dual track work plans: improve (quick wins) as-is systems and develop to-be roadmaps
  o As-Is GAO Search Beta (demo)
  o To-Be Information Architecture Roadmap
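To make the idea of auto-populating metadata concrete, here is a deliberately simple rule-based categorizer that suggests Topic values from the words in a document. Commercial categorization tools are far more sophisticated; the rules, topics, and `categorize` function below are hypothetical and only sketch the principle.

```python
import re

# Hypothetical classification rules: topic -> trigger terms.
RULES = {
    "Space exploration": {"shuttle", "orbit", "launch"},
    "Water quality": {"drinking", "contaminant", "watershed"},
}


def categorize(text, rules=RULES, min_hits=1):
    """Return topics whose trigger terms appear in the text, most hits first."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scored = [(len(words & terms), topic) for topic, terms in rules.items()]
    # Sort by descending hit count, then alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [topic for hits, topic in scored if hits >= min_hits]


topics = categorize("The shuttle launch was delayed by a drinking water issue.")
print(topics)
print(categorize("Budget overview"))
```

Suggested values like these would normally pass through validation and quality assurance before being stored, so that the tool assists the tagger without replacing review.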
Strategy Vision: Search and Browse (Draft)
[Diagram labels: Common Information Infrastructure; Fragmented Information Infrastructure]
Strategy Vision: Meta-Tagging Workflow for Content Creation and Conversion (Draft)
[Workflow diagram. Recoverable labels:]
• Content Creation
• Content Object
• Metadata Extraction, Categorization Applications
• Document processing auto-population
• Data Conversion
• Legacy metadata
• Validate against metadata schema
• Quality Assurance
• Metadata Repository (Harmonize)
• Common Information Infrastructure
Design Approach: Four separate but related work tracks (Draft)
• Metadata Schema and Vocabularies (Taxonomy) – assess existing metadata schemas, industry standards and asset use cases to derive a “baseline” metadata framework: a core model plus selective extensions as required by content type and domain. The baseline model will provide a foundation for the project’s subsequent work tracks.
• Information Infrastructure – use the baseline metadata schema to build a metadata element registry/repository and relate metadata element definitions to resources; design and implement an architectural framework for metadata auto-population tools.
• Governance Framework – define both process (meta-tagging workflow) and technology (meta-tagging tools) activities that provide quality assurance to GAO’s content stakeholders.
• Information Retrieval Systems – create search and navigation roadmaps for “quick wins” and a longer range vision; implement mechanisms for faceted navigation.
Implementation: Prototype Sequence of Events (Draft)
[Timeline diagram. Recoverable labels, in extracted order:]
• Information Gathering & Planning
• Conduct Interviews and Use Case
• Prototype Testing
• Build Information Retrieval Mechanisms (Search and Browse)
• Assess information environment
• Build Infrastructure Architecture
• Design Metadata Schema and Vocabulary
• Facet analysis and Use Case
• Assess results and iterate design
• Content Integration
• Core Metadata Specification Version 1.0 Complete
• Mapping/Crosswalk
• Element Definition/Registry
• Audit of information domains and content types - prioritize
• Faceted Navigation Enabled
• Develop metadata strategy
• Auto-populate repository
• Governance Framework
Administration: Data Governance (Draft)
• The need to plan, define, enable, and measure information architecture changes drives the need to address governance and consensus building regarding new roles, responsibilities, and workflows as early as possible
Who will manage the metadata schema and vocabularies?
• Core Metadata Specification
• Maintain Taxonomies
• Maintain Authority Files
• Maintain Thesaurus
• Maintain Classification Rules
[Diagram labels: Thesaurus, Authority Files, Classification Rules, Data Dictionary, Taxonomy]
Who will manage impacts to the business?
• Manage consensus across content owner groups on common metadata
• Champion changes to existing systems and business unit procedures to capture accurate and more precise asset descriptions
Who will manage impacts to asset tagging workflows?
• Meta-tagging new content and legacy content conversion
• Determine if ‘re-indexing’ is required and how frequently it will occur
Who will manage the infrastructure impacts?
• Foster integration of content and search engines
• Auto-population tools
Increasing Levels of IA Maturity: Iterative Development
• Search engine indexes multiple repositories
• Advanced computation of relevance
• Search log and click trail analysis
• Core metadata specification
• Faceted Navigation
• Intelligent Search and Discovery
• Metadata Registry/Repository
• Semantic Interoperability
• Enterprise Search Best Practice
• Auto-population of metadata
• Improved data integrity and governance
• Enterprise 2.0 – social tagging, wikis, blogs