Peter Fox
Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01
Week 12, April 27, 2010
Information discovery and integration and course summary
Contents
• Review of last class, reading
• Information discovery
• Information integration
• Summary of this course and what you needed to learn
• Objectives
• Discussion of reading
• Next class
Recall forms of information
• Structured/ un-structured
• Presentation and organization
• Syntax-semantics-pragmatics
• Managed, designed and architected.
• Goal of this part of the class is to understand how discovery and integration are enabled or disabled based on these factors
Discovery
• How does someone find your information?
• How would you provide discovery of
  – collections
  – files
  – ‘bits’
• How would you find ->
Discovery
o Federated Search
o Folksonomies (user contributed)
o Intelligent Agents
o Search Engines
o Taxonomies
o Find photos of Kim
o Boy or girl?
Use cases
• Find a sound recording of a swallow.
• Excuse me?
Use cases
• Find a sound recording of an African Swallow
• Find a sound recording of a bird that sounds like an African Swallow
• Media types – how can you discover them?
Use cases
• Find the movie that Jeanne Tripplehorn first starred in / that was her most successful / in which she was lead actress?
• Has anyone sequenced the genome of a mouse?
• Discovery can often involve information integration
Three level ‘metadata’ solution for DATA
Level 1: Data Registration at the Discovery Level, e.g. volcano location and activity
Level 2: Data Registration at the Inventory Level, e.g. list of datasets, times, products
Level 3: Data Registration at the Item Detail Level, e.g. access to individual quantities
Data Discovery: Earth Sciences Virtual Database, a data warehouse where the schema heterogeneity problem is solved; schema-based integration
Data Integration: ontology-based data integration using scientific workflows
A. K. Sinha, Virginia Tech, 2006
Three level ‘metadata’ solution?
Level 1: Registration at the Discovery Level, e.g. find the upper-level entry point to a source
Level 2: Registration at the Inventory Level, e.g. list of datasets, using the logical organization
Level 3: Registration at the Item Detail Level, i.e. annotation, e.g. tagging
Information Discovery: catalog/index; schema-based integration
Information Integration: integration using mapping management
A. K. Sinha, Virginia Tech, 2006
Information discovery
• What makes discovery work?
  – Metadata
  – Logical organization
  – Attention to the fact that someone would want to discover it
  – It turns out that file types are a key enabler or inhibitor to discovery
• What does not work?
  – Result ranking using *any* conventional algorithm
Federated search
• “is the simultaneous search of multiple online databases or web resources and is an emerging feature of automated, web-based library and information retrieval systems. It is also often referred to as a portal or a federated search engine.” (Wikipedia)
• Libraries have been doing this for a long time (Z39.50, ISO 23950)
• Key is consistent search metadata fields (keywords)
• E.g. Geospatial One Stop http://www.geodata.gov
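To make the federated search idea concrete, here is a minimal sketch in Python. It assumes two hypothetical in-memory catalogs (`library_a`, `library_b`) that share one consistent `keywords` field; a real system would query remote services (for example over Z39.50) rather than local lists. The names `CATALOGS`, `search_source` and `federated_search` are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical catalogs; the shared "keywords" field is what makes
# a single query meaningful across both sources.
CATALOGS = {
    "library_a": [
        {"title": "Volcano activity maps", "keywords": {"volcano", "map"}},
        {"title": "Bird song archive", "keywords": {"bird", "audio"}},
    ],
    "library_b": [
        {"title": "Volcano location inventory", "keywords": {"volcano", "inventory"}},
    ],
}


def search_source(name, keyword):
    """Search one catalog and tag each hit with the source it came from."""
    kw = keyword.lower()
    return [
        {"source": name, "title": rec["title"]}
        for rec in CATALOGS[name]
        if kw in rec["keywords"]
    ]


def federated_search(keyword):
    """Query every catalog concurrently and merge the hits into one list."""
    names = sorted(CATALOGS)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        per_source = pool.map(lambda n: search_source(n, keyword), names)
        return [hit for hits in per_source for hit in hits]


if __name__ == "__main__":
    for hit in federated_search("volcano"):
        print(hit["source"], "->", hit["title"])
```

If the sources used different field names for keywords, the single query would need a per-source translation step, which is where discovery starts to overlap with integration.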
Search engines (1)
• Contains an automated spider or crawler
• No theoretical limits in the amount of indexing (limited by hardware)
• Support remote indexing
• Continual background indexing of content
• Custom metatag support (some low-end products do not support this feature)
• Support for indexing PDF, .doc, etc. (some low-end products do not support this feature)
• Supports URL and word exclusions & inclusions
Search engines (2)
• SSI supported
• Search by custom metatags
• Case sensitive or insensitive searching
• Simple Customizable search/results pages
• Boolean Searching capabilities
• Provide users meta description and page title in search results
• Inexpensive – $200
• Easily customizable search/results interface
Search engines (3)
• Result weighting feature
• URL Inclusion list
• Require significant memory (RAM) and disk space as the collection grows
• Low-end alternatives often do not possess the capabilities to do phrase or natural language searching.
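As a rough illustration of the indexing and Boolean searching features listed for search engines, the sketch below builds a tiny inverted index in Python. The documents and the function names (`build_index`, `search_and`, `search_or`, `search_not`) are made up for the example; it does case-insensitive word matching only, with no phrase or natural language searching.

```python
from collections import defaultdict

# Hypothetical document collection: id -> text
DOCS = {
    1: "African swallow sound recording",
    2: "European swallow photo",
    3: "Volcano activity map",
}


def build_index(docs):
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_and(index, *terms):
    """Documents containing every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """Documents containing at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def search_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return set(all_ids) - index.get(term.lower(), set())


INDEX = build_index(DOCS)
```

The index grows with the vocabulary and the collection, which is one reason memory and disk use rise as the collection grows.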
Improve www discovery
• Implement metatags on your and your partners' web sites
• Update content frequently
• Register your site with the major search engines (tools exist to aid in this process)
• Perform a basic study of where your site ranks in the results of the major search engine providers
• Do not spam the search engine providers
• Re-evaluate your web site directory structure to ensure information is appropriately categorized/described within your URL strings
Improve www discovery
• Look through your server log files to determine what users are trying to find on your site and/or the path they are using to find information
• Perform basic usability testing of your site to determine what users expect and can easily gather from your site. This also may determine why users go to an Internet search engine provider versus accessing your site directly.
• Realize that Internet search engines don’t all act the same or index at the same time period, and each often values a particular metatag, document date, etc. more than another vendor's product does.
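The advice about server log files can be sketched as follows. This example assumes logs in the common Apache-style access log format and a hypothetical on-site search page at `/search?q=...`; both are assumptions for illustration, as are the sample log lines.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Made-up sample lines in an Apache-style access log format.
LOG_LINES = [
    '10.0.0.1 - - [27/Apr/2010:10:00:00 -0400] "GET /search?q=swallow+recording HTTP/1.1" 200 512',
    '10.0.0.2 - - [27/Apr/2010:10:01:00 -0400] "GET /data/volcano/index.html HTTP/1.1" 200 2048',
    '10.0.0.1 - - [27/Apr/2010:10:02:00 -0400] "GET /search?q=swallow+recording HTTP/1.1" 200 512',
    '10.0.0.3 - - [27/Apr/2010:10:03:00 -0400] "GET /search?q=geotiff HTTP/1.1" 200 256',
]

# Pull the requested URL out of the quoted request field.
REQUEST = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')


def summarize(lines):
    """Count requested paths and the terms users typed into the site search."""
    paths, queries = Counter(), Counter()
    for line in lines:
        match = REQUEST.search(line)
        if not match:
            continue  # skip lines that do not look like requests
        url = urlsplit(match.group(1))
        paths[url.path] += 1
        for term in parse_qs(url.query).get("q", []):
            queries[term] += 1
    return paths, queries


PATHS, QUERIES = summarize(LOG_LINES)
```

`QUERIES.most_common()` then gives a first view of what users are trying to find, and `PATHS` shows which parts of the site they reach.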
Smart search
• Semantically aware search, e.g. http://noesis.itsc.uah.edu
• Faceted search, e.g. mspace (http://mspace.fm), Earth System Grid (http://esg.prototype.ucar.edu), Exhibit (MIT)
NOESIS
Faceted search
• Semantically aware search, e.g. http://noesis.itsc.uah.edu
• Faceted search, e.g. mspace (http://mspace.fm), Earth System Grid (http://esg.prototype.ucar.edu)
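A minimal sketch of what faceted search does, in Python. The items and the facet names (`media`, `region`) are invented for the example and are not taken from mspace or the Earth System Grid; the point is only that each item carries metadata fields that can be used both to filter and to count.

```python
from collections import Counter

# Hypothetical items described by two facets: media type and region.
ITEMS = [
    {"title": "Swallow song", "media": "audio", "region": "Africa"},
    {"title": "Swallow in flight", "media": "image", "region": "Europe"},
    {"title": "Volcano eruption", "media": "image", "region": "Africa"},
]


def filter_items(items, **selected):
    """Keep the items whose facet values match every selected facet."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count the remaining items per value of one facet (the numbers shown next to each choice)."""
    return Counter(item[facet] for item in items)


african = filter_items(ITEMS, region="Africa")
media_counts = facet_counts(african, "media")
```

After narrowing to one region, the counts tell the user how many results each media type would leave, which is what lets people browse without knowing the right keyword in advance.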
Summary - discovery
• Useful to write a few discovery use cases to drive how your design is developed
• Evolution of your role in facilitating discovery and what/ how others implement access to your information
Information integration
• “Involves combining information residing in different sources and providing users with a unified view of them. This process becomes significant in a variety of situations both commercial (when two similar companies need to merge their databases) and scientific (combining research results from different bioinformatics repositories, for example).
• Integration appears with increasing frequency as the volume and the need to share existing information explodes.
• It has become the focus of extensive theoretical work, and numerous open problems remain unsolved.
• In management circles, people frequently refer to data integration as "Enterprise Information Integration" (EII).” (Wikipedia)
• Is this an information management challenge? (rhetorical question)
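The "unified view" idea can be sketched with two small sources that describe the same kind of thing under different schemas. Everything here is hypothetical (the field names, the stations, the mapping functions); the sketch only shows a per-source mapping into one shared schema, which is a simple form of schema-based integration.

```python
# Two hypothetical sources with different field names and units.
SOURCE_A = [{"station_id": "S1", "temp_c": 21.5}]
SOURCE_B = [{"id": "S2", "temperature_f": 68.0}]

# One mapping per source: unified field name -> function of the source record.
MAPPINGS = {
    "a": {
        "station": lambda r: r["station_id"],
        "temperature_c": lambda r: r["temp_c"],
    },
    "b": {
        "station": lambda r: r["id"],
        # Convert Fahrenheit to Celsius so both sources share one unit.
        "temperature_c": lambda r: round((r["temperature_f"] - 32) * 5 / 9, 1),
    },
}


def unified_view(sources, mappings):
    """Apply each source's mapping and record where every row came from."""
    view = []
    for name, records in sources.items():
        for rec in records:
            row = {field: fn(rec) for field, fn in mappings[name].items()}
            row["source"] = name
            view.append(row)
    return view


VIEW = unified_view({"a": SOURCE_A, "b": SOURCE_B}, MAPPINGS)
```

Keeping the `source` field in each row preserves provenance, which matters because the integrated result is itself a new product to manage.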
Aiding integration
• Standards – formats for sure but also
• Metadata
• Semantics
• As such any integration capability is HIGHLY curated or left entirely to the end user
• If left to the user, results in a new product which should also be managed and shared
• What do you do?
Recall elements/ forms of information
• Structured/ un-structured, content, context
• Presentation and organization
• Syntax-semantics-pragmatics
• Managed, designed and architected.
• Integration poses an important challenge here
  – Two forms presented/ organized differently
  – Different structure, semantics…
• Information back to data back to information
Micro life cycle of data
Geospatial
• Much of the work on information integration has focused on the dynamic integration of structured data sources, such as databases or XML data.
• With the more complex geospatial data types, such as imagery, maps, and vector data, researchers have focused on the integration of specific types of information, such as placing points or vectors on maps, but much of this integration is only partially automated.
• The challenge is that the dynamic integration of online data and geospatial data is beyond the state of the art of existing integration systems.
Geospatial
• The conflation process divides into the following tasks: (1) find a set of conjugate point pairs, termed "control point pairs", in both vector and image datasets, (2) filter control point pairs, and (3) utilize algorithms, such as triangulation and rubber-sheeting, to align the rest of the points and lines in two datasets using the control point pairs.
• Typically, human input has been essential to find control point pairs and/or filter control points
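The three conflation tasks can be illustrated with a deliberately simplified Python sketch. It assumes the control point pairs are already given (task 1), filters them by comparing each pair's offset with the median offset (task 2), and then aligns the remaining points with a single mean translation (task 3). The translation is a stand-in chosen for brevity; it is not triangulation or rubber-sheeting, which warp locally. All coordinates and the tolerance `tol` are made up.

```python
from statistics import median

# Hypothetical control point pairs: (vector point, image point).
# The last pair is a deliberate mismatch.
PAIRS = [
    ((0, 0), (10, 5)),
    ((10, 0), (20, 5)),
    ((0, 10), (10, 15)),
    ((5, 5), (40, 40)),
]


def filter_pairs(pairs, tol=2.0):
    """Task 2: drop pairs whose offset is far from the median offset."""
    dxs = [ix - vx for (vx, _), (ix, _) in pairs]
    dys = [iy - vy for (_, vy), (_, iy) in pairs]
    mx, my = median(dxs), median(dys)
    return [
        pair for pair, dx, dy in zip(pairs, dxs, dys)
        if abs(dx - mx) <= tol and abs(dy - my) <= tol
    ]


def mean_offset(pairs):
    """Average vector-to-image offset over the accepted pairs."""
    n = len(pairs)
    dx = sum(ix - vx for (vx, _), (ix, _) in pairs) / n
    dy = sum(iy - vy for (_, vy), (_, iy) in pairs) / n
    return dx, dy


def align(points, offset):
    """Task 3 (simplified): shift the remaining vector points by the offset."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in points]


GOOD = filter_pairs(PAIRS)
OFFSET = mean_offset(GOOD)
ALIGNED = align([(1, 1), (2, 3)], OFFSET)
```

The automatic filter here replaces the human judgment the slide mentions, and it only works because the bad pair is an obvious outlier; real imagery rarely makes it that easy.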
Vectors on maps
Different contexts?
• Heavily relies on metadata, especially on structural/ use metadata
• Is more often than not what leads to new findings, and abduction!
• Exercise – how does integration occur for the other aspects of information?
Review of course content
Abduction
• Is a method of logical inference introduced by Peirce which comes prior to induction and deduction, for which the colloquial name is to have a "hunch".
• Abductive reasoning starts when an inquirer considers a set of seemingly unrelated facts, armed with an intuition that they are somehow connected.
• The term abduction is commonly presumed to mean the same thing as hypothesis; however, an abduction is actually the process of inference that produces a hypothesis as its end result
Use Case
• … is a collection of possible sequences of interactions between the system under discussion and its actors, relating to a particular goal.
• The collection of Use Cases should define all system behavior relevant to the actors to assure them that their goals will be carried out properly.
• Any system behavior that is irrelevant to the actors should not be included in the use cases.
  – is a prose description of a system's behavior when interacting with the outside world.
  – is a technique for capturing functional requirements of business systems and, potentially, of an IT system to support the business system.
  – can also capture non-functional requirements
Developed for NASA TIWG
Table of Contents
• ==Plain Language Description==
• ===Short Definition===
• ===Purpose===
• ===Describe a scenario of expected use===
• ===Definition of Success===
• ==Formal Use Case Description==
• ===Use Case Identification===
• ===Revision Information===
• ===Definition===
• ===Successful Outcomes===
• ===Failure Outcomes===
• ==General Diagrams==
• ===Schematic of Use case===
• ==Use Case Elaboration==
• ===Actors===
• ====Primary Actors====
• ====Other Actors====
• ===Preconditions===
• ===Postconditions===
• ===Normal Flow (Process Model)===
• ===Alternative Flows===
• ===Special Functional Requirements===
• ===Extension Points===
• ==Diagrams==
• ===Use Case Diagram===
• ===State Diagram===
• ===Activity Diagram===
• ===Other Diagrams===
• ==Non-Functional Requirements==
• ===Performance===
• ===Reliability===
• ===Scalability===
• ===Usability===
• ===Security===
• ===Other Non-functional Requirements===
• ==Selected Technology==
• ===Overall Technical Approach===
• ===Architecture===
• ===Technology A===
• ====Description====
• ====Benefits====
• ====Limitations====
• ===Technology B===
• ====Description====
• ====Benefits====
• ====Limitations====
• ==References==
Information theory
• Semiotics, also called semiotic studies or semiology, is the study of sign processes (semiosis), or signification and communication, signs and symbols. It is usually divided into three branches:
  – Syntactics: Relation of signs to each other in formal structures
  – Semantics: Relation between signs and the things to which they refer; their denotata
  – Pragmatics: Relation of signs to their impacts on those who use them
Information integrity
• The information (entropy) of a random variable is defined as $H = -\sum_i p_i \log p_i$, where $p_i$ is the probability of outcome $i$. It represents the uncertainty of the variable.
• In later classes we will cover cognitive and social factors in reducing the conditional entropy, thus reducing the uncertainty and increasing information content and value
• We will also cover semiotics (signs) as a prelude to visualization as a presentation mechanism for information
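As a small worked example of the entropy definition, the Python sketch below computes the Shannon entropy in bits (base-2 logarithm, an assumption made here since the slide does not fix the base). The function name `entropy` and the sample distributions are illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


fair_coin = entropy([0.5, 0.5])       # maximal uncertainty for two outcomes
certain = entropy([1.0])              # no uncertainty at all
four_way = entropy([0.25] * 4)        # four equally likely outcomes
biased_coin = entropy([0.9, 0.1])     # less uncertain than a fair coin
```

A fair coin carries one bit of uncertainty, a certain outcome carries none, and a biased coin falls in between.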
Information gain/loss
• The mutual information of two variables defines how much information one variable contains about the other.
• It is therefore defined as the decrease of the uncertainty of one variable by knowing the other.
• In probabilistic terms, the entropy of one variable decreases (or at least does not increase) when we condition on the other.
• What does this mean for an information system? E.g. a website or web service?
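The statements about mutual information can be checked numerically. The sketch below assumes a joint distribution given as a dictionary from `(x, y)` pairs to probabilities, and computes $I(X;Y) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)p(y)}$ in bits; the two example distributions are invented for the illustration.

```python
import math


def mutual_information(joint):
    """Mutual information in bits for a joint distribution {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p   # marginal of X
        py[y] = py.get(y, 0.0) + p   # marginal of Y
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


# Independent variables: knowing Y says nothing about X.
INDEPENDENT = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# Perfectly correlated variables: knowing Y removes all uncertainty about X.
CORRELATED = {(0, 0): 0.5, (1, 1): 0.5}

mi_independent = mutual_information(INDEPENDENT)
mi_correlated = mutual_information(CORRELATED)
```

For a website or web service, one loose reading is that good metadata plays the role of the second variable: knowing it should reduce the user's uncertainty about the content.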
Noise
• Most often refers to ‘data’ but does apply to information
• Uncertainty, especially any that is introduced, is a source of noise or, more accurately, bias in the use or interpretation of the information
• Noise/ bias is context and structure dependent
• Noise/ bias contamination is rampant in information systems
• Quality control and verification is less developed for information sources, e.g. ‘people do not report problems’
Library science
• Curates the artifacts of knowledge
• Organizes and manages them for consumers
  – Cataloging and classification
• Preservation
  – ‘maintaining or restoring access to artifacts, documents and records through the study, diagnosis, treatment and prevention of decay and damage’ (Wikipedia)
• Digital age
  – Curation and preservation
Cognitive Science
• Cognitive science is an interdisciplinary study of the mind and intelligence
• It operates at the intersection of psychology, philosophy, computer science, linguistics, anthropology, and neuroscience.
• Of relevance for data and information science are three significant theoretical underpinnings
  – mental representation,
  – the nature of expertise,
  – and intuition
• Very relevant to model, metamodel choice
Social Science
• Branch of humanities
• Especially as it relates to networks of scientists
• Exploits sociology of groups, teams
• Cultural norms as well as discipline norms
  – Modes of what and how rewards are given
  – Between those who produce and those who consume data and information
  – How you collect, understand, model and design models and architectures is as much social as technical skill
Presentation
• Separation of content from presentation!!
• The theory here is more empirical or semi-empirical
• Is developed based on a solid understanding of minimizing information uncertainty beginning with content, context and structural considerations and, as we will see, adding cognitive and social factors to reduce uncertainty.
• Physiology for humans, color, …
Organization
• Organizations as producers and consumers
• Organization of information presentation, e.g. layout on a web page
• Also (again) content, context and structure
• How do you organize
  – Information you’ve collected this semester
  – Information given to you by others
Context
• Internal - Human context, tacit knowledge
• External
Structure
• Is information stored or only presented?
• Structural representation of information content can bias presentation, e.g.
  – Modern image capture devices (digital cameras) often convert 2-byte integers to float, or to 4-byte integers; what are the implications?
• Appropriate choice of information structure can significantly decrease uncertainty, e.g. returning land images in GeoTIFF, which can encode geographic location, instead of PNG
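One way to explore the question about integer-to-float conversion is to test it directly. This Python sketch uses the standard `struct` module and assumes unsigned 16-bit pixel values and 32-bit IEEE floats, which is an assumption about the devices, not something the slide specifies. It shows that every 2-byte integer survives the conversion exactly but takes twice the storage, whereas a large 4-byte integer does not always survive.

```python
import struct


def roundtrip_float32(value):
    """Store a number as a 32-bit float and read it back."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


# Every unsigned 16-bit value is exactly representable as a 32-bit float.
all_uint16_exact = all(roundtrip_float32(v) == v for v in range(2**16))

# Storage cost per sample: 2 bytes as uint16, 4 bytes as float32.
size_uint16 = struct.calcsize("<H")
size_float32 = struct.calcsize("<f")

# A 32-bit float has a 24-bit significand, so 2**24 + 1 cannot be stored exactly.
big_value = 2**24 + 1
after_conversion = roundtrip_float32(big_value)
```

So the structural choice changes file size and, for wider integers, the values themselves, while also implying a precision that the sensor never had.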
Content
• Presentation
• Translation
• Encoding
Mental Representation
• Thinking = representational structures + procedures that operate on those structures.
• Analogy: data structures + algorithms = running programs; mental representations + procedures = thinking
• Methodological consequence: study the mind by developing computer simulations of thinking.
Semiotics
• Also called semiotic studies or semiology, is the study of sign processes (semiosis), or signification and communication, signs and symbols
Semiotic model
Syntax
• Relation of signs to each other in formal structures
• … the term syntax is also used to refer directly to the rules and principles that govern the …
• But not the meaning or the use!
Semantics
• Relation between signs and the things to which they refer; their denotata
• Study of meaning of … (anything?)
• Mainly need to worry about failures
Pragmatics
• Relation of signs to their impacts on those who use them
• the ways in which context contributes to meaning, conveying and use
Information Models
• Conceptual models, sometimes called domain models, are typically used to explore domain concepts
• High-level conceptual models are often created as part of initial requirements envisioning efforts as they are used to explore the high-level static business or science or medicine structures and concepts.
• Conceptual models are often created as the precursor to logical models or as alternatives to them
• Followed by logical and physical models
Object models
• A data model is a logical organization of real-world objects (entities), constraints on them, and the relationships among objects.
  – A database (DB) language is a concrete syntax for an object (data) model.
  – A DB system implements that model.
Object design
• Object-oriented modeling is a formal way of representing something in the real world. It draws from traditional set theory and classification theory. Some basics to keep in mind in object-oriented modeling are that:
  – Instances are things.
  – Properties are attributes.
  – Relationships are pairs of attributes.
  – Classes are types of things.
  – Subclasses are subtypes of things.
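The object-modeling basics above map fairly directly onto a small Python sketch. The classes `Instrument`, `Camera` and `Dataset` are hypothetical examples, and modeling a relationship as an attribute that refers to another instance is one common simplification rather than the only way to do it.

```python
from dataclasses import dataclass


@dataclass
class Instrument:
    """A class: a type of thing."""
    name: str  # a property (attribute)


@dataclass
class Camera(Instrument):
    """A subclass: a subtype of Instrument."""
    bit_depth: int = 16


@dataclass
class Dataset:
    """Another class, related to Instrument through `produced_by`."""
    title: str
    produced_by: Instrument  # a relationship, held as a reference to an instance


# Instances are things.
camera = Camera(name="Field camera", bit_depth=12)
dataset = Dataset(title="Land images, April 2010", produced_by=camera)
```

Because `Camera` is a subtype, the dataset can be linked to it wherever an `Instrument` is expected.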
Architectures
• Building on content, context, and users, some illustrate information architecture as an iceberg.
• Just like an iceberg, the majority of information architecture work is out of sight, "below the water."
• The work includes the creation of plans, controlled vocabularies, and blueprints, all before any user interfaces are created.
![Page 56: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/56.jpg)
Interfaces
• Increasingly, tiered architectures contain numerous interfaces
• Information flow at interfaces, and thus software engineering at those interfaces, becomes a very important consideration
![Page 57: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/57.jpg)
And relation to design?
• “In the context of information systems design, information architecture refers to the analysis and design of the data stored by information systems, concentrating on entities, their attributes, and their interrelationships.
• It refers to the modeling of data for an individual database and to the corporate data models an enterprise uses to coordinate the definition of data in several (perhaps scores or hundreds) of distinct databases.”
![Page 58: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/58.jpg)
Design theory
• Elements
– Form
– Value
– Texture
– Lines
– Shapes
– Direction
– Size
– Color
• Relate these to signs and relations between them
![Page 59: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/59.jpg)
Principles of design
• Balance
• Gradation
• Repetition
• Contrast
• Harmony
• Dominance
• Unity
![Page 60: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/60.jpg)
Reference architectures
• “provides a proven template solution for an architecture for a particular domain. It also provides a common vocabulary with which to discuss implementations, often with the aim to stress commonality.
• A reference architecture often consists of a list of functions and some indication of their interfaces (or APIs) and interactions with each other and with functions located outside of the scope of the reference architecture.” (Wikipedia)
![Page 61: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/61.jpg)
Stateful versus stateless
• A key distinction between Grids and Web environments is state, i.e. the knowledge of ‘who’ knows and remembers ‘what’
• Increasingly there is a need for maintaining some form of state, i.e. reducing information entropy, in web and internet-based architectures
• Thus, enter the need for ‘state for a defined purpose’…
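One way to picture the distinction is the minimal sketch below. It is not tied to any particular Grid or Web framework; `stateless_handler`, `StatefulService`, and the session ids are hypothetical names used only for illustration.

```python
def stateless_handler(request: dict) -> str:
    """Stateless: everything needed to answer arrives in the request itself."""
    return f"{request['user']} asked for {request['query']}"


class StatefulService:
    """Stateful: remembers 'who' asked 'what' between calls, keyed by session id."""

    def __init__(self) -> None:
        self._history: dict[str, list[str]] = {}

    def handle(self, session_id: str, query: str) -> int:
        """Store the query and return how many queries this session has made."""
        queries = self._history.setdefault(session_id, [])
        queries.append(query)
        return len(queries)

    def recall(self, session_id: str) -> list[str]:
        """Return a copy of what this session has asked so far."""
        return list(self._history.get(session_id, []))


service = StatefulService()
service.handle("session-1", "aurora images")
service.handle("session-1", "solar wind speed")
```

The stateful service keeps only the query history per session, a small instance of 'state for a defined purpose' rather than remembering everything about every caller.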
![Page 62: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/62.jpg)
Life-cycle elements
• Acquisition: Process of recording or generating a concrete artefact from the concept (see transduction)
• Curation: The activity of managing the use of data from its point of creation to ensure it is available for discovery and re-use in the future
• Preservation: Process of retaining usability of data in some source form for intended and unintended use
• Stewardship: Process of maintaining integrity across acquisition, curation and preservation
![Page 63: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/63.jpg)
Acquisition
• Learn / read what you can about the developer of the means of acquisition
– Documents may not be easy to find
– Remember bias!!!
• Document as you go
• Have a checklist (the Management list) and review it often
![Page 64: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/64.jpg)
Curation
• From Producers to Consumers
• Consider the organization and presentation of the data
• Document what has been (and has not been) done
• Consider and address the provenance to date
• Be as technology-neutral as possible
• Look to add metainformation
![Page 65: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/65.jpg)
Preservation
• Refers to the full life cycle
• Archiving is a component
• Stewardship is the act of preservation
• Intent is that ‘you can open it any time in the future’ and that ‘it will be there’
• This involves steps that may not be conventionally thought of
• Think 10, 20, 50, 200 years… Looking historically gives some guide to future considerations
![Page 66: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/66.jpg)
Summary of Management
• Creation of logical collections
• Physical handling
• Interoperability support
• Security support
• Ownership
• Metadata collection, management and access.
• Persistence
• Knowledge and information discovery
• Dissemination and publication
![Page 67: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/67.jpg)
Workflow
• General definition: series of tasks performed to produce a final outcome
• Information workflow – “analysis pipeline”
– Automate tedious jobs that users traditionally performed by hand for each dataset
– Process large volumes of data/information faster than one could do by hand
![Page 68: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/68.jpg)
Workflows
• Formal models of the flow of data/information among processing components
• May be simple and linear or more complex
• Can process many data/information types:
– Archives
– Web pages
– Streaming/real time
– Images (e.g., medical or satellite)
– Simulation output
– Observational data
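A simple, linear workflow can be sketched as a list of processing components applied in order to each dataset. The step functions (`drop_missing`, `to_kelvin`, `round_values`), the `run_workflow` helper, and the sample values are all hypothetical and stand in for whatever an analysis pipeline actually does.

```python
import math
from functools import reduce
from typing import Callable, Iterable

Step = Callable[[list[float]], list[float]]


def run_workflow(steps: Iterable[Step], data: list[float]) -> list[float]:
    """Pass the data through each processing component in order."""
    return reduce(lambda current, step: step(current), steps, data)


def drop_missing(values: list[float]) -> list[float]:
    """Remove NaN placeholders for missing readings."""
    return [v for v in values if not math.isnan(v)]


def to_kelvin(values: list[float]) -> list[float]:
    """Convert degrees Celsius to kelvin."""
    return [v + 273.15 for v in values]


def round_values(values: list[float]) -> list[float]:
    """Round to two decimal places for reporting."""
    return [round(v, 2) for v in values]


pipeline = [drop_missing, to_kelvin, round_values]

# The same pipeline runs over every dataset instead of being repeated by hand.
datasets = {"site_a": [20.0, float("nan"), 21.5], "site_b": [18.25]}
results = {name: run_workflow(pipeline, values) for name, values in datasets.items()}
```

More complex workflows branch and merge, but the idea is the same: the flow among components is written down once as a model and then executed for each input.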
![Page 69: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/69.jpg)
Visualization?
• Reducing amount of data, quantization
• Patterns
• Features
• Events
• Trends
• Irregularities
• Exit points for analysis
• Leading to presentation of data, cognitive science, and the mental representation
![Page 70: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/70.jpg)
Types of visualization
• Color coding (including false color)
• Classification of techniques is based on
– Dimensionality
– Information being sought, i.e. purpose
• Line plots
• Contours
• Surface rendering techniques
• Volume rendering techniques
• Animation techniques
• Non-realistic, including ‘cartoon/artist’ style
![Page 71: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/71.jpg)
Metadata
• Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.
• Metadata is often called data about data or information about information.
![Page 72: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/72.jpg)
Different types of metadata
• Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords.
• Structural metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters (used).
• Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it.
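To make the three types concrete, here is a small hypothetical metadata record for a single document, plus a discovery check that looks only at the descriptive part. Every field name and value is invented for illustration and does not follow any particular metadata standard.

```python
record = {
    # Descriptive: supports discovery and identification.
    "descriptive": {
        "title": "Example lecture notes",
        "abstract": "Notes on information discovery and integration.",
        "author": "A. Author",
        "keywords": ["discovery", "integration", "metadata"],
    },
    # Structural: how the compound object is put together.
    "structural": {
        "page_order": ["cover", "chapter-1", "chapter-2"],
    },
    # Administrative: helps manage the resource.
    "administrative": {
        "created": "2010-04-27",
        "file_type": "application/pdf",
        "access": ["staff", "students"],
    },
}


def matches(rec: dict, term: str) -> bool:
    """Return True if the term appears in the title, abstract, or keywords."""
    descriptive = rec["descriptive"]
    searchable = [descriptive["title"], descriptive["abstract"], *descriptive["keywords"]]
    return any(term.lower() in text.lower() for text in searchable)
```

A search for "discovery" finds the record, while a search for "pdf" does not, because file type lives in the administrative block that this discovery function never consults.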
![Page 73: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/73.jpg)
Sub-types (admin)
• Rights management metadata, which deals with intellectual property rights
• Preservation metadata, which contains information needed to archive and preserve a resource.
![Page 74: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/74.jpg)
Micro life cycle of data
![Page 75: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/75.jpg)
In one slide?
• Use case – you have to know the goal (+more)
• Conceptual and logical models -> information models
• Understand information flows and uncertainties (sign systems), the life cycle and manage them
• Apply information, library, cognitive, social science, and design elements to developing a design of an architecture
• Think the design through (e.g. get closer to the physical model (workflow?)) and assess the presentation, organization, content, context, structure, syntax, semantics and pragmatics
![Page 76: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/76.jpg)
What would your slide include?
![Page 77: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/77.jpg)
Objectives
• To instruct future information architects how to sustainably generate information models, designs and architectures
• To instruct future technologists how to understand and support essential data and information needs of a wide variety of producers and consumers
• For both to know the tools and requirements needed to properly handle data and information
• Students will learn and be evaluated on the underpinnings of informatics, including theoretical methods, technologies and best practices.
![Page 78: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/78.jpg)
Learning Objectives
• Through class lectures, practical sessions, written and oral presentation assignments and projects, students should:
– Understand and develop skill in Development and Management of multi-skilled teams in the application of Informatics
– Understand and know how to develop Conceptual and Information Models and Explain them to non-experts
– Gain knowledge of and apply Informatics Standards
– Develop skill in Informatics Tool Use and Evaluation
![Page 79: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/79.jpg)
Discussion
• About discovery?
• Integration?
• All of the material?
![Page 80: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/80.jpg)
Reading for this week
• Is retrospective
• Also covers metadata and information modeling
![Page 81: 1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 12, April 27, 2010 Information discovery and integration and course summary](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e0f5503460f94afa4cc/html5/thumbnails/81.jpg)
What is next
• Break on May 4, no class
• Week 13 – Project presentations (May 11, i.e. in 2 weeks)
• IDEA surveys after April 28