curation of information and knowledge
Curation of Information & Knowledge
© 2011 Jorn Bettin
http://commons.wikimedia.org/wiki/File:Wentletrap_001.jpg
Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
Converting raw data and tacit knowledge into Relevant Information and Explicit Knowledge
Quality
http://commons.wikimedia.org/wiki/File:JonWoodApril2007Texas.jpg
Complexity ...
Relevance ...
http://commons.wikimedia.org/wiki/File:Administrative_burden.JPG
Understandability ...
Value of Knowledge is hard to communicate
• It’s not tangible
• It’s not raw data
• Much of it is tacit
http://commons.wikimedia.org/wiki/File:Cloud_computing_icon.svg
Measuring Quality of Information
Relevant dimensions
1. Accuracy
2. Currency
3. Completeness
4. Security
5. Reliability
6. Unambiguity
7. Findability
8. Traceability
9. Simplicity
10. Usability
Accuracy
Why does it matter?
• Information is used for operational and strategic decision making
• It must be trustworthy
How is it measurable?
• Define acceptable tolerance intervals
How can it be improved?
• Focus on relevant information and eliminate irrelevant information
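A minimal sketch of how an acceptable tolerance interval could be turned into an accuracy measure. The `Tolerance` class, the `accuracy_ratio` function and the stock-count figures are assumptions made for illustration; they are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """Acceptable interval around a reference value (hypothetical structure)."""
    lower: float  # largest acceptable shortfall, as a non-negative number
    upper: float  # largest acceptable excess, as a non-negative number

    def accepts(self, reference: float, observed: float) -> bool:
        return reference - self.lower <= observed <= reference + self.upper


def accuracy_ratio(pairs, tolerance):
    """Share of (reference, observed) pairs that fall inside the tolerance."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no measurements to assess")
    hits = sum(1 for ref, obs in pairs if tolerance.accepts(ref, obs))
    return hits / len(pairs)


# Illustrative figures only: physical stock counts vs. counts held in a system.
stock_tolerance = Tolerance(lower=2, upper=2)
samples = [(100, 101), (50, 50), (75, 70), (20, 22)]
ratio = accuracy_ratio(samples, stock_tolerance)
```

Three of the four sample pairs lie within the interval, so the ratio is 0.75. The interval itself is a business decision; the code only makes the agreed interval checkable.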
Currency
Why does it matter?
• Information is used for operational and strategic decision making
• It must be timely
How is it measurable?
• Define acceptable temporal delays
How can it be improved?
• Increase the level of automated system integration
• Invest in adequate computing and network infrastructure
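As a sketch, an acceptable temporal delay can be recorded per information source and compared with the time of the last update. The source names, delays and timestamps below are hypothetical and serve only to show the check.

```python
from datetime import datetime, timedelta


def is_current(last_updated, now, max_delay):
    """True if the information is no older than the acceptable delay."""
    return now - last_updated <= max_delay


# Hypothetical sources and acceptable delays, chosen only for illustration.
max_delays = {
    "exchange_rates": timedelta(minutes=15),
    "customer_addresses": timedelta(days=30),
}
last_updates = {
    "exchange_rates": datetime(2011, 5, 2, 9, 0),
    "customer_addresses": datetime(2011, 4, 20, 12, 0),
}
now = datetime(2011, 5, 2, 10, 0)

# Names of the sources whose information is older than agreed.
stale = sorted(
    name for name, updated in last_updates.items()
    if not is_current(updated, now, max_delays[name])
)
```

Here only the exchange rates are stale: they are one hour old against an acceptable delay of fifteen minutes.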
Completeness
Why does it matter?
• Information is used for operational and strategic decision making
• It must be sufficiently free of gaps
How is it measurable?
• Specify the sources of each piece of information
• Distinguish between mandatory and optional information for decision making
How can it be improved?
• Focus on relevant information and eliminate irrelevant information
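One possible way to make the distinction between mandatory and optional information measurable is sketched below. The field names and the sample record are invented for the example, and treating `None` and the empty string as "missing" is an assumption.

```python
def completeness(record, mandatory, optional):
    """Return (is_usable, optional_coverage) for one record.

    A field counts as present when its value is neither None nor "".
    """
    def present(field):
        return record.get(field) not in (None, "")

    missing_mandatory = [f for f in mandatory if not present(f)]
    filled_optional = sum(1 for f in optional if present(f))
    coverage = filled_optional / len(optional) if optional else 1.0
    return (not missing_mandatory, coverage)


# Hypothetical order record with one mandatory gap and one optional gap.
order = {
    "customer_id": "C-17",
    "amount": 120.0,
    "currency": "",
    "notes": None,
    "region": "north",
}
usable, optional_coverage = completeness(
    order,
    mandatory=["customer_id", "amount", "currency"],
    optional=["notes", "region"],
)
```

The record is not usable for decision making because the mandatory currency is missing, while half of the optional fields are filled.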
Security
Why does it matter?
• To enforce information ownership
• To ensure compliance with privacy legislation
• To prevent theft of information
How is it measurable?
• Strength of authentication mechanisms
• Strength of encryption mechanisms
• Level of alignment between role based access control and job descriptions
How can it be improved?
• Introduce stronger authentication and encryption
• Remove ambiguities from job descriptions
Reliability
Why does it matter?
• To avoid outages
• To prevent disasters
How is it measurable?
• Define the acceptable minimum availability of each information source
How can it be improved?
• Use software designs that tolerate temporary outages of required/external services
• Invest in system and data centre replication technology
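A small sketch of how a minimum availability per information source could be checked against recorded outages. The source names, targets and outage durations are hypothetical, and availability is assumed here to mean the fraction of a period without outage.

```python
def availability(total_minutes, outage_minutes):
    """Fraction of the period during which the source was available."""
    if total_minutes <= 0:
        raise ValueError("the period must be positive")
    downtime = sum(outage_minutes)
    return (total_minutes - downtime) / total_minutes


# Hypothetical targets and outage logs for a 30-day period (43,200 minutes).
PERIOD = 30 * 24 * 60
targets = {"order_database": 0.999, "reporting_warehouse": 0.99}
outages = {"order_database": [30, 45], "reporting_warehouse": [120]}

# Sources that fell below their agreed minimum availability.
breaches = sorted(
    name for name, minimum in targets.items()
    if availability(PERIOD, outages[name]) < minimum
)
```

With these figures the order database misses its target (75 minutes of outage against a 99.9% minimum), while the reporting warehouse stays above its 99% minimum.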
Unambiguity
Why does it matter?
• To minimise communication errors
• To prevent wrong decisions
• To prevent disasters
How is it measurable?
• Count the homonyms in each role-specific context
How can it be improved?
• Establish a comprehensive registry of concepts
• Use concept names that are tailored to the role-specific context
• Use semantic identities instead of names when communicating information
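Counting homonyms per role-specific context can be sketched as follows, assuming that each use of a name has already been linked to a concept identifier. The contexts, names and identifiers are made up, and names are compared case-insensitively as a simplifying assumption.

```python
from collections import defaultdict


def count_homonyms(usages):
    """Count, per context, the names that refer to more than one concept.

    `usages` is an iterable of (context, name, concept_id) triples.
    """
    concepts_by_name = defaultdict(set)
    contexts = set()
    for context, name, concept_id in usages:
        contexts.add(context)
        concepts_by_name[(context, name.lower())].add(concept_id)

    counts = {context: 0 for context in contexts}
    for (context, _name), concepts in concepts_by_name.items():
        if len(concepts) > 1:
            counts[context] += 1
    return counts


# Hypothetical usages: "account" means two different things within sales.
usages = [
    ("sales", "account", "concept-001"),
    ("sales", "Account", "concept-002"),
    ("sales", "order", "concept-003"),
    ("finance", "account", "concept-002"),
]
homonyms = count_homonyms(usages)
```

The sales context contains one homonym and the finance context none. The same name appearing in two different contexts is not counted, since the measure is taken within each context.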
Findability
Why does it matter?
• To enable staff to find relevant information
• To speed up decision making
• To prevent disasters
How is it measurable?
• Count how often staff need to talk to colleagues to find information that is stored in an information system
How can it be improved?
• Provide advanced support for queries
• Make the query engine aware of the role-specific context
• Allow query by information category, by container, by name, and by semantic identity
Traceability
Why does it matter?
• To speed up root cause analysis of errors
• To speed up the learning curve for new staff
• To meet legal & regulatory compliance needs
How is it measurable?
• Count how often staff need to talk to colleagues or need to resort to ad-hoc search for tracing the source of an error
How can it be improved?
• Consistent use of information categories and containers
• Automatic tagging of information with temporal & spatial meta data
• Adherence to retention constraints
Simplicity
Why does it matter?
• To accommodate human cognitive limits
• To prevent wrong decisions
• To prevent disasters
How is it measurable?
• Collect artefact complexity metrics
How can it be improved?
• Intuitive representations that are developed in collaboration with domain experts
• As needed, role-specific representations
• Provide an explicit modularisation mechanism for all artefacts
Usability
Why does it matter?
• Intuitive user/system interaction
• Device independent information access
• To discourage use of non-compliant tools
How is it measurable?
• Validation by average users
How can it be improved?
• Consistency of representations across devices
• Use of high-quality icons that are developed in collaboration with domain experts
• Ensure adequate reliability
Knowledge Curation
http://commons.wikimedia.org/wiki/File:Wentletrap_001.jpg
Knowledge Repositories
Examples
A language artefact is a non-hardware artefact
• information content of pheromones
• information content of body language
• live music
• live speech
• information content in traditional symbolic notations
• program/diagram/hypertext/database content
• information content of recorded sound/pictures/videos
• information content of genetic material
http://commons.wikimedia.org/wiki/File:Photo_with_histogram.JPG
Definition
A language artefact
• is a container of information
• is instantiated by a specific actor (human or system)
• is consumed by at least one actor (human or system)
• represents a natural unit of work (for the instantiating & consuming actors)
• may contain links to other artefacts
• has a state and a lifecycle
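The definition of a language artefact can be read as a small data structure. The sketch below is one possible reading: the state names, the allowed transitions and the field names are assumptions, since the slides do not prescribe a particular lifecycle.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    RETIRED = "retired"


# Assumed lifecycle: which state may follow which.
ALLOWED = {
    State.DRAFT: {State.PUBLISHED, State.RETIRED},
    State.PUBLISHED: {State.RETIRED},
    State.RETIRED: set(),
}


@dataclass
class Artefact:
    name: str
    content: str          # the information contained in the artefact
    producer: str         # the actor (human or system) that instantiated it
    consumers: list       # at least one consuming actor
    links: list = field(default_factory=list)  # links to other artefacts
    state: State = State.DRAFT

    def __post_init__(self):
        if not self.consumers:
            raise ValueError("an artefact needs at least one consumer")

    def move_to(self, new_state):
        """Advance the lifecycle, rejecting transitions that are not allowed."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot go from {self.state.name} to {new_state.name}"
            )
        self.state = new_state


glossary = Artefact("glossary", "agreed terms", "analyst", ["developer"])
spec = Artefact("spec", "requirements", "architect", ["developer"], links=[glossary])
spec.move_to(State.PUBLISHED)
```

The "natural unit of work" criterion is not something code can enforce; it remains a modelling decision about where one artefact ends and the next begins.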
Communication
Definition
Software is an arbitrary set of language artefacts
Software Producers
software systems & other humans
software developers
1st-Level Categorisation
operational data
meta data
Definitions
Data, Information, Knowledge
• uncategorised data has very little value
• categorised data is valuable information
• information combined with an understanding of its usage context is valuable knowledge
the categories (= meta data) must be relevant to the organisation
Value Chain
[Diagram: A, B, C, D, E and F linked by produce and consume arrows]
Selection criteria for a metadata repository
• Adequate support for CR compatible versioning, branching, locking requirements
• Support for interfaces with current commercial products (e.g. ERwin)
• Metamodelling capability and ideally an extensible metametamodel
• Support for development of adapters
• Adequate support for generalisation/specialisation
• Support for multiple terminologies/jargons
• Integration with open source template/transformation languages
• RDBMS datastore binding (to support referential integrity)
• Support for information ownership
• Adequate support for role based access control
Elements of knowledge acquisition
• Collaboration
• Exploration
• Observation
• Validation
• Abstraction
• Modularisation
• Representation
Learning
Collaboration
“We are smarter than me”
Jean-Marie Favre, Software Anthropologist
Exploration
Raw data acquired by exploration is essential for understanding an unknown domain
• Data can be analysed and categorised
• Lack of data only leads to speculation
Observation
Connecting the dots: building a mental model
• Associating information with time, space, and other attributes of origin
• Noticing possible associations between different pieces of information
http://commons.wikimedia.org/wiki/File:Knowledge,_observation_and_reality.svg
Tacit
Validation
Confirming observations
• Using the scientific method
• By comparing with observations from others
• By involving domain experts from related disciplines
• Remember: we are smarter than me!
Abstraction
Look for Commonalities
• Avoid repetition
• Identify patterns
• Remember: KISS!
Photographer Kurt Salzmann - www.salzmaenner.ch
Modularisation
http://commons.wikimedia.org/wiki/File:Modular_origami.jpg
Modules preserve Simplicity
• Rely on role-based separation of concerns
• Modules must correspond to a natural unit of work
• Roles and modular artefacts represent the building blocks of value chains
• Optimise within the organisational context of customers, suppliers, and available skills
Representation
Modelling is about clarity
• Balancing act between simplicity and not compromising the desired intent
• Focus is on human cognitive abilities & limits
• As needed use multiple syntax elements (visual containers, symbols, text, mathematical expressions)
• Borrow syntax from established languages, or design syntax in close collaboration with the user community
Code
All models are code
a system of symbols used for
• identification
• classification in the sense of grouping
a system of signals used to send messages
a set of conventions governing behaviour
Modelling is meta coding to improve clarity of code
Examples
Class : Mammal
dateOfBirth
Class : Dog
isPoliceDog
Class : Cat
Dog : Jack{1/5/03, yes}
Dog : Susie{1/2/00, no}
Cat : Coco{4/3/07}
Cat : Peter{10/9/98}
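The class and instance notation above maps closely onto classes in a general-purpose language. The sketch below is one such mapping, not the notation used in the slides; the `name` field and the snake_case attribute names are assumptions, and dates are kept as the strings shown because the day/month order is not stated.

```python
from dataclasses import dataclass


@dataclass
class Mammal:
    name: str
    date_of_birth: str  # kept as written on the slide, not parsed


@dataclass
class Dog(Mammal):
    is_police_dog: bool


@dataclass
class Cat(Mammal):
    pass


animals = [
    Dog("Jack", "1/5/03", True),
    Dog("Susie", "1/2/00", False),
    Cat("Coco", "4/3/07"),
    Cat("Peter", "10/9/98"),
]

# Classification in the sense of grouping: select by class and attribute.
police_dogs = [a.name for a in animals if isinstance(a, Dog) and a.is_police_dog]
cats = [a.name for a in animals if isinstance(a, Cat)]
```

Both dogs and cats carry a date of birth because they specialise `Mammal`, which is the commonality the example is meant to show.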
http://commons.wikimedia.org/wiki/
Communication Costs
Not all code is a model
• a system of signals that includes a translation of messages to deal with someone else’s syntax
• a system of symbols used for classification in the sense of obfuscation or encryption
http://commons.wikimedia.org/wiki/File:Encryption_-_decryption.svg
Software suffers from the same problems as way back when natural language evolved to enrich the exchange between humans
Increasingly the artefacts exchanged between humans are neither hardware nor natural language (encoded in speech or symbolic notation)
All language artefacts share the problems of natural language: unanticipated interpretations
Today
http://commons.wikimedia.org/wiki/File:Discussion.jpg
Minimising Unanticipated Interpretation
Requires collaboration and good will between artefact producers & all consumers
• Associating information with its usage context
• Respecting the notational and terminological preferences of all parties
• Assigning a unique semantic identity to each piece of information (= concept)
Semantic Modelling
[Diagram: semantic domains A, B, C]
Semantic Modelling
Semantic Domains, Models
1. Identification of concepts and assignment of semantic identities
2. Modelling
3. Naming of concepts in as many terminologies as required by artefact producers and consumers
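Assigning semantic identities and then naming concepts in several terminologies can be illustrated with a toy registry. This is a sketch under stated assumptions: the `ConceptRegistry` class, the use of UUIDs as identities and the sample terminologies are all hypothetical, and the modelling step itself is not covered.

```python
import uuid


class ConceptRegistry:
    """Toy registry: one semantic identity per concept, many names per identity."""

    def __init__(self):
        self._names = {}  # identity -> {terminology: name}
        self._index = {}  # (terminology, lowercased name) -> identity

    def identify(self):
        """Create a concept and assign it a semantic identity."""
        identity = uuid.uuid4()
        self._names[identity] = {}
        return identity

    def name(self, identity, terminology, name):
        """Name the concept in one terminology, refusing homonyms."""
        key = (terminology, name.lower())
        if self._index.get(key, identity) != identity:
            raise ValueError(
                f"{name!r} already names another concept in {terminology!r}"
            )
        previous = self._names[identity].get(terminology)
        if previous is not None:
            self._index.pop((terminology, previous.lower()), None)
        self._names[identity][terminology] = name
        self._index[key] = identity

    def resolve(self, terminology, name):
        """Look up the semantic identity behind a name in one terminology."""
        return self._index[(terminology, name.lower())]

    def translate(self, name, source, target):
        """Find the name used in another terminology for the same concept."""
        return self._names[self.resolve(source, name)][target]


registry = ConceptRegistry()
customer = registry.identify()
registry.name(customer, "sales", "Client")
registry.name(customer, "billing", "Debtor")
billing_name = registry.translate("Client", "sales", "billing")
```

Because artefacts can refer to the identity rather than to a name, each party keeps its preferred terminology while still referring to the same concept.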
• Based on the mathematics of model theory & denotational semantics
• Constitutes a solid foundation for information engineering & knowledge curation
• Not the same as modelling with the Resource Description Framework (Semantic Web)
• Not the same as classical entity-relationship modelling
• Not the same as object-oriented modelling
Semantic Modelling
• Focuses on the meaning of information in a concrete usage context
• Converts tacit knowledge into explicit knowledge for use by humans and software tools
• The Resource Description Framework only partially implements denotational semantics
• Entity-relationship schemas lack a mechanism for modularity
• Object-oriented models are limited to one level of instantiation
Semantic Modelling
Without delving into the formal mathematical details, the significance of model theory is best appreciated intuitively by considering the following observations:
• Formal linguistics as pioneered by Noam Chomsky in the 1950s and 1960s can be expressed as a special case of model theory.
• The work of model theorists goes back to the beginning of the 20th century, and was motivated by mathematicians who were concerned about potential logical inconsistencies in the mathematical symbol system and the conventions governing its use.
• The resulting research into symbol systems has led to a mathematical theory that can be used to formalise any symbol system, not limited to the languages invented by humans, and including the genetic code.
• The pictures produced on flip charts and white boards constitute domain specific languages as well, and with the help of their authors, sets of pictures can easily be formalised mathematically, using a specialised software tool for semantic modelling.
Model Theory
Semantic Domains
[Diagram: semantic domains A, B, C, D, E, F]
Modular Models
Modules preserve Simplicity
• Roles and modular artefacts represent the building blocks of value chains
• Optimise within the organisational context of customers, suppliers, and available skills
[Diagram labels: role based, separation of concerns, unit of work]
A B C
D E F
A B C
D E F
Connected Semantic Domains
Shared Language
[Diagram: shared subdomains ab, ac, ad, bc, cf, de, df, ef]
Jargon = Words + Symbols
[Diagram labels: D, df, F: View Point, Perspective, Jargon; F, f: View Point, Reflexive Jargon, DSML]
DSML = Domain Specific Modelling Language
Jargons develop on top of Shared Semantic Subdomains
[Diagram: semantic domains A, B, C, D, E, F with shared subdomains ab, ac, ad, bc, cf, de, df, ef]
Thank you
Jorn Bettin
+61 424 758 540
Knowledge Reconstruction & Risk Management http://jornbettin.com
Gmodel Team Blog the-software-artefact.blogspot.com
The Role of Artefacts tiny.cc/artefacts
From Muddling to Modelling tiny.cc/muddleToModel
Model Oriented Domain Analysis tiny.cc/domainanalysis
More Information
jbettin @ ibrs.com.au www.ibrs.com.au