eim intro - information lifecycle
TRANSCRIPT
-
8/9/2019 EIM Intro - Information Lifecycle
1/23
RFCorsello
Research
Foundation
Enterprise Information Management
Information Lifecycle
-
Introduction
Information Management is a complex subject covering all aspects of managing information within a given domain or organization
Sharing between domains and organizations is a large need that brings additional complexity
Information Management requires an understanding of the three basic states of information:
Data: raw values
Information: data under a given context
Knowledge: the emergence of understanding from information
-
Information States
From Data to Knowledge
-
Data
Data is:
Raw values, such as 5 or 5 dollars
Collections of values, such as a spreadsheet file
Data does not imply format or structure
Data itself is the value, not the storage
Data may be structured or formatted in any way
Any specific format MAY provide context
Data within an appropriate context becomes information
Once removed from its context, or placed in an irrelevant context, information is once again data
Data may be relevant in many contexts, which is the cornerstone for sharing
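
As a minimal sketch of the points on this slide, the snippet below pairs a raw value with a context. The names `Information`, `as_information` and `strip_context` are hypothetical and chosen only for illustration. The same value becomes different information under different contexts, and dropping the context leaves plain data.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Information:
    """A raw value paired with the context that gives it meaning (illustrative only)."""
    value: Any
    context: str


def as_information(value: Any, context: str) -> Information:
    # Attaching a context does not change the value itself.
    return Information(value=value, context=context)


def strip_context(info: Information) -> Any:
    # Removing the context leaves only the raw value: data again.
    return info.value


raw = 5  # data: a raw value with no context
price = as_information(raw, "price of a coffee in US dollars")
count = as_information(raw, "number of open support tickets")
```

Here `price` and `count` share one value but are not interchangeable, which is why the same data can be shared across many contexts.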
-
Information
Information is:
Data under a relevant context
Context must be relevant to the data
Context must also be relevant to the observer
A concept: information does not physically differ from data
Information implies an ability to gain understanding
But only if the information is available
Information must be relevant to the context of the observer
Data must also be relevant to the context of the observer to be more than simply data
Knowledge arises from information
Facts are information known to be correct or true
Information is data with relevant context
Data is any value or set of values
-
Knowledge
Knowledge is:
The theoretical or practical understanding of a subject
Facts and information
Awareness or familiarity gained by experience of a fact or situation
Knowledge arises from information
Facts are information known to be correct or true
Poor context for information may greatly impact the ability to derive knowledge
Information implies an ability to gain understanding
But only if the information is available
Knowledge results in the synthesis of information
Possibly just in the mind
Tools for content management enable the creation of context
Linking between contexts and content provides additional information
Knowledge management is more appropriately information management
It is the users that translate knowledge into information within the systems
-
Data to Knowledge
From an Information Technology perspective
All knowledge is represented as information
All information is represented as data
Structure of the data as stored provides context
Derivation of information from data comes from how the data is accessed
The access methods and data structures form the context
Knowledge is represented as new information generated from existing information
One user creates information that a second user accesses
The second user has gained knowledge from the first
If software were able to take action based upon the information, that software would also have gained knowledge: an intelligent agent
Data is what is managed in a computer system
The contexts are also represented in a computer as data
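
A small sketch of how the access method supplies the context, using Python's standard `struct` module. The field names `temperature_celsius` and `record_id` are hypothetical. The same four stored bytes yield different information depending on how they are read, and the contexts themselves are held as ordinary data (format strings in a dictionary).

```python
import struct

raw = b"\x00\x00\xa0\x40"  # data: four bytes with no inherent meaning

# Each access method supplies a different context for the same stored bytes.
as_float = struct.unpack("<f", raw)[0]  # read as a little-endian 32-bit float
as_int = struct.unpack("<I", raw)[0]    # read as a little-endian unsigned 32-bit integer

# The context itself can be stored as data: here, one format string per reading.
contexts = {"temperature_celsius": "<f", "record_id": "<I"}
readings = {name: struct.unpack(fmt, raw)[0] for name, fmt in contexts.items()}
```

Neither reading is more correct than the other. Which one counts as information depends on the context that is relevant to the observer.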
-
Information Lifecycle
From Capture to Disposition
-
The Lifecycle
The information lifecycle is the set of processes by which data comes into existence, is managed over time, and is eventually discarded.
There are generally four basic states of the information lifecycle:
Creation, collection or capture
Distribution, use and access
Maintenance, update or change
Disposition, archival or destruction
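
The four states can be sketched as an enumeration with a table of permitted moves. The transition table is an assumption made for illustration. The slides name the states but do not prescribe which moves between them are allowed.

```python
from enum import Enum


class Stage(Enum):
    CREATION = "creation, collection or capture"
    USE = "distribution, use and access"
    MAINTENANCE = "maintenance, update or change"
    DISPOSITION = "disposition, archival or destruction"


# Assumed transitions; the slides list the states but not the allowed moves.
ALLOWED = {
    Stage.CREATION: {Stage.USE, Stage.MAINTENANCE},
    Stage.USE: {Stage.MAINTENANCE, Stage.DISPOSITION},
    Stage.MAINTENANCE: {Stage.USE, Stage.DISPOSITION},
    Stage.DISPOSITION: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the target stage, or raise if the move is not in ALLOWED."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target
```

Making disposition a terminal state in this sketch mirrors the idea that disposed data has left the lifecycle.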
-
Creation
The creation of data is the process of generating and storing data
For some data, the entire lifecycle may be outside of IT systems, such as paper records.
The data creation phase is broken into three primary areas:
Capture
Assessment and Approval
Ingestion
-
Creation
Capture, Assessment, Ingestion
-
Capture
Data capture can be divided into four primary categories:
Continual or telemetry, where data is automatically generated and fed into an information repository, such as surveillance cameras
Bulk, or offline, where data is collected and aggregated, then fed in bulk to an information repository
Manual, which is the traditional human process of collecting data a single item at a time
Derived, or automated generation, where data is created by performing computations on other data. This includes activities such as:
Models and simulations
Statistical analysis
Interpolation or smoothing
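
As a hedged illustration of derived data, the sketch below computes a moving average (smoothing) and a linear interpolation from captured values. The function names and sample readings are hypothetical. The outputs are new data generated entirely from existing data.

```python
from statistics import mean


def moving_average(values, window=3):
    """Smooth a series with a simple trailing moving average (derived data)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [mean(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]


def interpolate(x0, y0, x1, y1, x):
    """Linearly interpolate a value at x between two captured points."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


captured = [10.0, 12.0, 11.0, 15.0, 14.0]    # hypothetical telemetry readings
smoothed = moving_average(captured)           # derived from the captured data
midpoint = interpolate(0, 10.0, 2, 11.0, 1)   # estimate for a missing reading
```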
-
Assessment and Approval
The assessment process involves the evaluation of captured data to ensure it meets pre-defined criteria for acceptance
The two primary parts to the assessment process are:
Quality Assurance (QA)
Quality Control (QC)
QA is the set of practices that are performed to:
Ensure data will meet acceptance criteria prior to being created. This involves activities such as:
maintenance and calibration of instruments
guidance on the proper usage of instruments.
Evaluate collected data to enhance quality assurance activities for future collections.
Evaluate QC practices and results to ensure quality criteria are met.
QC is the set of practices that:
Ensure the data within a repository will meet or exceed quality criteria
Prevent poor quality data from making it into the public repositories
The assessment and approval stage ensures that only created data meeting quality and acceptance criteria is available to future users
Poor assessment and approval practices result in poor quality data being available
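
A minimal sketch of a quality control gate, assuming hypothetical acceptance criteria for a temperature reading. Only records that pass every check are accepted. The rest are held back from the repository.

```python
def passes_qc(record, criteria):
    """Return True only if every acceptance check accepts the record."""
    return all(check(record) for check in criteria)


def assess(records, criteria):
    """Split captured records into accepted and rejected lists."""
    accepted, rejected = [], []
    for record in records:
        (accepted if passes_qc(record, criteria) else rejected).append(record)
    return accepted, rejected


# Hypothetical acceptance criteria for a temperature reading.
criteria = [
    lambda r: r.get("sensor_id") is not None,
    lambda r: isinstance(r.get("celsius"), (int, float)) and -90 <= r["celsius"] <= 60,
]

captured = [
    {"sensor_id": "a1", "celsius": 21.5},
    {"sensor_id": None, "celsius": 19.0},
    {"sensor_id": "b2", "celsius": 999},
]
accepted, rejected = assess(captured, criteria)
```

Keeping the rejected records, rather than silently dropping them, gives the QA side something to evaluate when improving future collections.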
-
Ingestion
Accepted data is loaded into the appropriate business repositories
The process of ingesting data may involve transformations to match the destination format
Transformation is a common requirement for automated collection methods
To maintain a full and verifiable chain of custody
Raw data is kept in addition to the transformed data
For space savings, raw data are often archived to an offline store
Once data is ingested, it is available for consumption
It is not uncommon for the entire creation process to be automated in a single system
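
The ingestion steps above can be sketched as follows, assuming a hypothetical comma-separated input and in-memory stores. The raw bytes are kept unchanged and linked to the transformed record by a SHA-256 digest, which is one simple way to support a verifiable chain of custody.

```python
import hashlib


def ingest(raw_bytes, transform, repository, raw_archive):
    """Store the transformed record and keep the raw bytes, linked by a digest."""
    digest = hashlib.sha256(raw_bytes).hexdigest()
    raw_archive[digest] = raw_bytes   # raw data kept untouched
    record = transform(raw_bytes)     # match the destination format
    record["raw_sha256"] = digest     # link back to the raw source
    repository.append(record)
    return record


def csv_line_to_record(raw_bytes):
    # Hypothetical transformation: "sensor,celsius" text into a dict.
    sensor, celsius = raw_bytes.decode("utf-8").strip().split(",")
    return {"sensor_id": sensor, "celsius": float(celsius)}


repository, raw_archive = [], {}
record = ingest(b"a1,21.5\n", csv_line_to_record, repository, raw_archive)
```

In practice the raw archive would be an offline store rather than a dictionary. The digest lets anyone confirm that an archived raw item is the one a record was derived from.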
-
Distribution and Use
Use of data within a repository is the primary purpose for the data's existence
Data use is considered in several ways:
Discoverability
Accessibility
Usability
-
Distribution and Use
Discoverable, Accessible, Available
-
Discoverable
Once data is within a repository it may be used
In order to use data it must be discovered by a potential user
Mechanisms to facilitate the location of data are discovery mechanisms
If data cannot be found, it cannot be used
Discoverability is key in the storage of data and the availability of that storage to the user system
If a user must search in multiple locations to find data, it is of marginal discoverability and use
For data to be discovered, the discovery data (metadata or catalog) must also be accessible
A user interface is the central location for discovery to be exposed to a user
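
A minimal sketch of a discovery mechanism, assuming a hypothetical in-memory catalog. One catalog describes data held in several repositories, so the user searches in a single place instead of visiting each location.

```python
catalog = [
    {"id": "ds-1", "title": "River gauge telemetry",
     "keywords": {"water", "telemetry"}, "location": "repo-a"},
    {"id": "ds-2", "title": "Soil survey samples",
     "keywords": {"soil", "survey"}, "location": "repo-b"},
]


def discover(catalog, term):
    """Return catalog entries whose title or keywords mention the term."""
    term = term.lower()
    return [
        entry for entry in catalog
        if term in entry["title"].lower() or term in entry["keywords"]
    ]


hits = discover(catalog, "telemetry")
```

The catalog entries are themselves data that must be stored and accessible. If the catalog cannot be reached, nothing it describes can be found.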
-
Accessibility
Data must be accessed to provide value
The accessibility of data involves aspects such as:
Security
Logical location
Format
If data is secured so that potential users cannot access it, the value of the data is diminished to those users
In sensitive domains, this is expected and desired
Logical location further limits accessibility if the data is contained within a repository that cannot be accessed
Behind a firewall
Simply far away, so that data transfer may take too long
Data in formats that are proprietary or poorly supported may not be accessible to the tools required
Overall, accessibility is a balancing act with security, need and cost
-
Usability
Data in an unusable format given the available tools is unusable
If data must be processed prior to being used it is less usable
If processing time is long, data may become irrelevant before it is usable
Usability has subtle implications such as:
Scale
Temporal currency
Accuracy and precision
Low precision data cannot be used in a high precision analysis
Cost of data creation is always a trade-off against anticipated use
Collecting high-quality, high-precision data can always pay off if the cost is shared with users in need of lower-precision data
Redundant data collections are purely evil
-
Maintenance and Disposition
Now that we have it, what do we do with it and how do we get rid of it?
-
Maintenance
Data that changes over time must be maintained
Data editing is subject to discovery, access and usage in addition to the need for edits
In some scenarios, only the current values are relevant
In other scenarios, temporal changes are of greater significance than the current values
Editing scenarios affect and influence data management strategies.
The maintenance phase of the lifecycle includes:
The entire set of practices and processes governing data management and maintenance
Issues such as archival, availability, continuity of operations (COOP), fault-tolerance, performance and total costs
The maintenance phase is the longest lived part of the data lifecycle
All data uses occur within the maintenance phase
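
The two editing scenarios above (only the current value matters, or the changes over time matter more) can be sketched with a small versioned container. The class name `VersionedValue` and the sample readings are hypothetical.

```python
from datetime import datetime, timezone


class VersionedValue:
    """Keeps every edit so both the current value and its history are available."""

    def __init__(self):
        self._history = []  # list of (timestamp, value), oldest first

    def update(self, value, when=None):
        when = when or datetime.now(timezone.utc)
        self._history.append((when, value))

    def current(self):
        return self._history[-1][1]

    def changes(self):
        # Differences between consecutive values, for trend-style analysis.
        values = [v for _, v in self._history]
        return [b - a for a, b in zip(values, values[1:])]


level = VersionedValue()
for reading in (2.0, 2.5, 4.0):
    level.update(reading)
```

A store that only needs `current()` could overwrite in place and save space. Keeping history costs more storage, which is one way editing scenarios influence the data management strategy.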
-
Disposition
Disposition involves the processes and practices by which data is aged within repositories
Disposition includes:
Archival or removal of old data
Segregation of historical data from live data
Mechanisms for making segregated data available
Disposition is commonly driven by storage costs and legal mandates such as:
Sarbanes-Oxley (Sarb-Ox / SOX)
Clinger-Cohen
Health Insurance Portability and Accountability Act (HIPAA)
Once data has been disposed of, it is no longer part of the information lifecycle
If the data is still available, such as historic data, it is not disposed
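
A hedged sketch of segregating historical data from live data by age. The retention period and sample records are hypothetical policy values for illustration, not figures taken from any of the mandates listed above.

```python
from datetime import date, timedelta


def segregate(records, today, retention_days):
    """Split records into live and historical by age; nothing is destroyed here."""
    cutoff = today - timedelta(days=retention_days)
    live = [r for r in records if r["created"] >= cutoff]
    historical = [r for r in records if r["created"] < cutoff]
    return live, historical


records = [
    {"id": 1, "created": date(2019, 8, 1)},
    {"id": 2, "created": date(2012, 1, 15)},
]
# The retention period is a hypothetical policy value, not a regulatory figure.
live, historical = segregate(records, today=date(2019, 8, 9), retention_days=365)
```

Because the historical records remain available after segregation, they are not yet disposed of in the sense used on this slide.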
-
Questions