Digital Libraries Prof. Marcos Andre Goncalves Universidade Federal de Minas Gerais

Post on 20-Dec-2015



Synchronous Scholarly Communication

Same time, same or different place

Asynchronous, Digital Library Mediated Scholarly Communication

Different time and/or place

Digital Libraries Shorten the Chain from

Editor

Publisher

A&I (abstracting and indexing services)

Library

Reviewer

DLs Shorten the Chain to

Author

Reader

Digital Library

Editor

Reviewer

Teacher

Learner

Librarian

DL Overview: Why of Global Interest?

• National projects can preserve antiquities and heritage: cultural, historical, linguistic, scholarly

• Knowledge and information are essential to economic and technological growth and to education

• DL: a domain for international collaboration
  – wherein all can contribute and benefit
  – which leverages investment in networking
  – which provides useful content on the Internet & WWW
  – which will tie nations and peoples together more strongly and through deeper understanding

Digital Libraries: Objectives

• World literature: 24 hours / 7 days / from the desktop
• Integrated “super” information systems: 5S
  [Table: related areas and their coverage]
• Ubiquitous, higher quality, lower cost
• Education, knowledge sharing, discovery
• Disintermediation -> collaboration
• Universities reclaim property
• Interactive courseware, student works
• Scalable, sustainable, usable, useful

How is a DL different from a database?

• A traditional SQL database has as its basic element data items in a relation, e.g.:
    SELECT name
    FROM employee, project
    WHERE employee.deptnumber = '25'
      AND project.number = '100'

• Databases exploit known structures and relations

• DBMS retrieval is not probabilistic (Frakes, Baeza-Yates, p. 3)
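The contrast between exact-match database retrieval and ranked IR retrieval can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the `employee` rows and the three documents are hypothetical, and the term-count `rank` function is a deliberately crude stand-in for the probabilistic and vector models used by real IR systems.

```python
import sqlite3
from collections import Counter

# Exact match: the DBMS returns precisely the rows that satisfy the predicate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, deptnumber TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ana", "25"), ("Bruno", "30"), ("Carla", "25")],  # hypothetical rows
)
exact = [row[0] for row in conn.execute(
    "SELECT name FROM employee WHERE deptnumber = '25' ORDER BY name")]

# Ranked retrieval: every document gets a graded score and results are ordered.
docs = {  # hypothetical collection
    "d1": "digital library services for scholarly communication",
    "d2": "relational database query processing",
    "d3": "digital preservation in the digital library",
}

def rank(query: str, collection: dict) -> list:
    """Return (doc_id, score) pairs, best first; score = occurrences of query terms."""
    terms = query.lower().split()
    scored = []
    for doc_id, text in collection.items():
        counts = Counter(text.lower().split())
        scored.append((doc_id, sum(counts[t] for t in terms)))
    # Descending score; ties broken by id so the output is deterministic.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

print(exact)                          # ['Ana', 'Carla']
print(rank("digital library", docs))  # [('d3', 3), ('d1', 2), ('d2', 0)]
```

Note that `d2` still appears with score 0: a ranked system orders the collection (or a cutoff of it) instead of returning a yes/no set.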

How is a DL different from the WWW?

• The key word is “managed”
  – A DL is managed; the WWW is not

• Some meta searchers (Yahoo, Lycos) attempt to add an organizational framework to their web holdings
  – However, most are focused on keyword searching (e.g., Google)

How is a DL different from the WWW?

• Another key difference is who controls the input into the system
  – most meta searchers hunt down their holdings (Lycos is short for Lycosidae lycosa, the “wolf spider”, which pursues its prey and does not build a web; Mauldin, IEEE Expert, 1/97)
  – some (Yahoo) have humans in the loop for review and classification

• To date, DLs are generally more tightly controlled and have a targeted customer set

DL = Content + Services

• “Why not just use the WWW?”
  – The WWW by itself has weak archival and management characteristics

• “Why not use an RDBMS?”
  – In the same way that a card catalog is not a TL, an RDBMS is not a DL; it is a candidate technology for use in DLs

• A DL is the union of the content and the services defined on the content

[Figure: layered DL architecture]
  Access: WWW (HTTP) access (most common); non-WWW access (now uncommon)
  Digital Library Services: searching, browsing, citation analysis, usage analysis, alerts
  Supporting technologies: vector and/or Boolean search engines (traditional IR); RDBMS; file systems; other technologies
  Content
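The “content + services” view can be sketched as a content store with services defined as functions over it. The class and method names below (`DigitalLibrary`, `search`, `browse`) and the sample records are hypothetical, chosen only to illustrate the layering; a real DL would sit these services on top of an IR engine, an RDBMS, or a file system.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalObject:
    oid: str
    text: str
    metadata: dict

@dataclass
class DigitalLibrary:
    """Hypothetical minimal DL: a content store plus services defined over it."""
    content: dict = field(default_factory=dict)

    def add(self, obj: DigitalObject) -> None:
        self.content[obj.oid] = obj

    def search(self, query: str) -> list:
        """Searching service: ids of objects whose text contains every query term."""
        terms = query.lower().split()
        return sorted(oid for oid, obj in self.content.items()
                      if all(t in obj.text.lower().split() for t in terms))

    def browse(self, field_name: str) -> dict:
        """Browsing service: object ids grouped by the value of one metadata field."""
        groups = {}
        for oid, obj in sorted(self.content.items()):
            groups.setdefault(obj.metadata.get(field_name, "unknown"), []).append(oid)
        return groups

dl = DigitalLibrary()
dl.add(DigitalObject("o1", "Pottery from the cemetery", {"site": "Bab edh-Dhra'"}))
dl.add(DigitalObject("o2", "Bone fragments from locus 86", {"site": "Nimrin"}))
dl.add(DigitalObject("o3", "Pottery sherds from the north-west quadrant", {"site": "Nimrin"}))
print(dl.search("pottery"))  # ['o1', 'o3']
print(dl.browse("site"))     # {"Bab edh-Dhra'": ['o1'], 'Nimrin': ['o2', 'o3']}
```

Adding a new service (e.g., alerts) means adding a method over the same content, which is the point of the slide's layering.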

How is a DL Different from a Traditional Library?

• A TL has as its focus physical objects
  – even if the card catalog (metadata) is electronic, the purpose is to point you to a physical location
  – trafficking in physical objects has both obvious and subtle implications:
    • an object can exist in only one place
    • if you have it, I can’t have it (zero-sum distribution)
    • I have to go to the object, or wait for it to come to me

TLs vs. DLs

• DLs are clearly better than TLs at:
  – dissemination and storing a variety of information

• However, TL objects are more survivable
  – Who will archive the research information?
    • the publishers?
    • the institutions?
    • the authors?
  – Will the average DL object still be accessible in 10 years?
    • take my digital preservation seminar in the spring!

Image from: http://www.ancientegypt.co.uk/writing/rosetta.html

How is a DL Different from a Traditional Library?

• Digital library
  – removing the physical restriction has obvious benefits:
    • multiple access, multiple listings, electronic transmission
  – it also complicates many other issues:
    • intellectual property, terms and conditions, etc.

• Note that a TL offers additional social and educational benefits
  – Most TLs also offer hybrid services.

DL Definitions - 1

• “A digital library is an organized and focused collection of digital objects, including text, images, video, and audio, along with methods of access and retrieval, and for selection, creation, organization, maintenance, and sharing of the collection.”

• Witten & Bainbridge – “How to Build a Digital Library” – Morgan Kaufmann 2003

DL Definitions - 2

• “Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities”

• Waters, D.J. – CLIR Issues, July/August 1998 – www.clir.org/pubs/issues/issues04.html

Informal 5S & DL Definitions

DLs are complex systems that

• help satisfy info needs of users (societies)

• provide info services (scenarios)

• organize info in usable ways (structures)

• present info in usable ways (spaces)

• communicate info with users (streams)

5Ss

• Streams (e.g., text, video, audio, image): describe properties of the DL content, such as encoding and language for textual material or particular forms of multimedia data
• Structures (e.g., collection, catalog, hypertext, document, metadata): specify organizational aspects of the DL content
• Spaces (e.g., measure; measurable, topological, vector, probabilistic): define logical and presentational views of several DL components
• Scenarios (e.g., searching, browsing, recommending): detail the behavior of DL services
• Societies (e.g., service managers, learners, teachers): define managers, responsible for running DL services; actors, who use those services; and the relationships among them

5S and DL formal definitions and compositions (April 2004 TOIS)

[Figure: dependency graph of the formal definitions]
• 5S: streams (d.9), structures (d.10), spaces (d.18), scenarios (d.21), societies (d.24)
• Mathematical building blocks: relation (d.1), function (d.2), sequence (d.3), tuple (d.4)*, language (d.5), graph (d.6), grammar (d.7)
• Spaces: measurable (d.12), measure (d.13), probability (d.14), vector (d.15), topological (d.16)
• Supporting definitions: event (d.10), state (d.18), services (d.22), transmission (d.23)
• Metadata and content: structural metadata specification (d.25), descriptive metadata specification (d.26), structured stream (d.29), digital object (d.30), collection (d.31), metadata catalog (d.32), repository (d.33), hypertext (d.36)
• Services: indexing service (d.34), searching service (d.35), browsing service (d.37)
• Digital library (minimal) (d.38)


ETANA-DL

• Archaeological DL
• Integrated DL
  – Heterogeneous data handling

• Applies and extends the OAI-PMH
  – Open Archives Initiative Protocol for Metadata Harvesting

• Design considerations
  – Componentized
  – Extensible
  – Portable
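OAI-PMH requests are plain HTTP GETs carrying a `verb` argument (such as `ListRecords`), and responses are XML. The sketch below assumes a hypothetical repository at `https://example.org/oai` and a made-up one-record response; it builds a request URL and extracts identifiers and Dublin Core titles. It is not ETANA-DL's harvester, and a real one would also handle protocol errors, `from`/`until` dates, and sets.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0, oai_dc, and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken replaces the other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(xml_text: str) -> Tuple[List[dict], Optional[str]]:
    """Return per-record identifier and dc:title values, plus the resumptionToken if any."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [t.text for t in rec.iterfind(".//dc:title", NS)],
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)  # an empty token marks the last page

# Hypothetical response, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:item-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Pottery sherd</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would loop: request, parse, and re-request with the returned token until it comes back empty.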

Initial ETANA-DL Member Locations

Virginia Tech
Mississippi State University
Vanderbilt University
Canadian University College
Walla Walla College
Andrews University
CWRU (Case Western Reserve University)
Willamette University

Map courtesy: www.enchantedlearning.com

Lahav Website

Megiddo Opening Screen

Locus Screen: Pictures

View all

Area Screen

ETANA-DL Approach

• Applying and extending digital library (DL) techniques to solve key problems: making primary data available, data preservation, and interoperability

• Modeling archaeological information systems using 5S to better understand the domain and design the system and the supporting services

• Rapidly prototyping DLs that handle heterogeneous archaeological data using componentized frameworks:
  – eliciting requirements
  – refining the metamodel and union schema
  – modeling sites
  – mapping
  – harvesting
  – providing useful services
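The “mapping” step, translating each site's local record format into a shared union schema, can be sketched as a per-site field-name table. The union fields, site names, and local field names below are all hypothetical; ETANA-DL's actual union schema is not given on the slide.

```python
from typing import Dict, Optional

# Hypothetical union schema shared by all participating sites.
UNION_FIELDS = ("object_id", "site", "locus", "object_type", "period")

# Hypothetical per-site mappings: local field name -> union-schema field name.
SITE_MAPPINGS: Dict[str, Dict[str, str]] = {
    "site_a": {"ObjID": "object_id", "Locus": "locus", "Type": "object_type", "Period": "period"},
    "site_b": {"id": "object_id", "locus_no": "locus", "category": "object_type"},
}

def to_union(site: str, record: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map one site-specific record into the union schema; unmapped union fields stay None."""
    mapping = SITE_MAPPINGS[site]
    unified: Dict[str, Optional[str]] = {name: None for name in UNION_FIELDS}
    unified["site"] = site
    for local_name, value in record.items():
        target = mapping.get(local_name)
        if target is not None:  # local fields with no union counterpart are dropped
            unified[target] = value
    return unified
```

Keeping the mappings as data means that adding a site is a configuration change rather than new code, which fits the componentized design goal.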

ETANA-DL Website

Marking Items

Marking: writing notes for a specific user

Marked Items Display

Sender, date, object OAI ID

Sender comments

Options: View record, Add record to Items of Interest, Re-mark item (redirect), Unmark item (remove item from list)

Discussions Page

Discussions about an object

View/post messages, create new threads

Recommendations

Items recommended on the basis of similar interests
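“Items recommended on the basis of similar interests” describes collaborative filtering. A minimal user-based sketch follows, assuming each user's interests are a set of item ids (for example, their Items of Interest list) and using cosine similarity on those binary vectors; the slide does not say which similarity measure ETANA-DL uses, and the users and items are hypothetical.

```python
from math import sqrt

INTERESTS = {  # hypothetical users -> ids of items they marked as interesting
    "alice": {"o1", "o2"},
    "bob": {"o1", "o2", "o3"},
    "carol": {"o2", "o4"},
    "dave": {"o5"},
}

def similarity(a: set, b: set) -> float:
    """Cosine similarity between two users' binary item-interest vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(user: str, interests: dict, k: int = 3) -> list:
    """Score items the user lacks by summing the similarity of users who hold them."""
    mine = interests[user]
    scores = {}
    for other, theirs in interests.items():
        if other == user:
            continue
        sim = similarity(mine, theirs)
        for item in theirs - mine:
            scores[item] = scores.get(item, 0.0) + sim
    ranked = sorted((pair for pair in scores.items() if pair[1] > 0),
                    key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:k]]
```

Here `recommend("alice", INTERESTS)` favors `o3` (held by the very similar `bob`) over `o4` (held by the less similar `carol`) and skips `o5`, since `dave` shares nothing with her.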

ETANA-DL Searching Service

ETANA-DL Multi-dimensional Browsing

3 new sites

2 new types of artifacts
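Multi-dimensional browsing of this kind is often built as faceted navigation: counts per facet value, plus narrowing by selected values. Because the facet values are computed from the data, newly added sites or artifact types show up without code changes. The sketch below uses hypothetical records and function names; it is one plausible implementation, not ETANA-DL's.

```python
from collections import Counter

# Hypothetical artifact records with two browsing dimensions (facets).
ITEMS = [
    {"site": "Nimrin", "type": "pottery"},
    {"site": "Nimrin", "type": "bone"},
    {"site": "Bab edh-Dhra'", "type": "pottery"},
    {"site": "Lahav", "type": "seed"},
]

def facet_counts(items, facet):
    """How many items carry each value of one facet (shown next to each browse link)."""
    return dict(Counter(item[facet] for item in items if facet in item))

def narrow(items, **selected):
    """Keep items matching every selected facet value, e.g. narrow(items, type="pottery")."""
    return [item for item in items
            if all(item.get(f) == v for f, v in selected.items())]
```

Selecting `type="pottery"` and then recounting by `site` gives the per-site breakdown for that artifact type only.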

ETANA-DL Visual Browsing Service

Visual browse by site

Visual Browsing Nimrin: Topographical Drawings

Full site; northwest quadrant

Square: N40/W20

Visual Browsing Nimrin: Square Information

Square: N40/W20

Locus: 86

Loci layout

Visual Browsing Nimrin: Locus Sheet

Visual Browsing Bab edh-Dhra'

Cemetery

Pottery # 25

ETANA Societies

1. Historic and prehistoric societies (being studied)
2. Archaeologists (in academic institutes, fieldwork settings, or local and national governmental bodies)
3. Project directors
4. Technical staff (photographers, technical illustrators, and their assistants)
5. Field staff (responsible for the actual work of excavation)
6. Camp staff (e.g., camp managers, registrars, tool stewards)
7. General public (e.g., educators, learners, citizens)

ETANA Societies

• Social issues
  1. Who owns the finds?
  2. Where should they be preserved?
  3. What nationality and ethnicity do they represent?
  4. Who has publication rights?
  5. What interactions took place between those at the site studied and others? What theories are proposed, and by whom, about this?

ETANA Scenarios

1. Life at the site in former times
2. Digital recording: the planning stage and the excavation stage
3. Planning stage: remote sensing, fieldwalking, field surveys, building surveys, consulting historical and other documentary sources, and managing the sites and monuments
4. Excavation
   1. Detailed information is recorded for each layer of soil and for features such as post holes, pits, and ditches.
   2. Data about each artifact is recorded together with information about its exact find spot.
   3. Numerous environmental and other samples are taken for laboratory analysis, and the location and purpose of each is carefully recorded.
   4. Large numbers of photographs are taken, both general views of the progress of excavation and detailed shots showing the contexts of finds.
5. Organization and storage of material
6. Analysis, and hypothesis generation and testing
7. Publications, museum displays
8. Information services for the general public

ETANA Spaces

1. Geographic distribution of found artifacts
2. Temporal dimension (as inferred by archaeologists)
3. Metric or vector spaces
   1. used to support retrieval operations and to calculate distance (and similarity)
   2. used to browse / constrain searches spatially
4. 3D models of the past, used to reconstruct and visualize archaeological ruins
5. 2D interfaces for human-computer interaction
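Using a metric space to constrain a search spatially can be sketched as a bounding-box filter followed by ordering by Euclidean distance. The finds and their (x, y) coordinates on a local excavation grid are hypothetical; a production system would typically use a spatial index (e.g., an R-tree) rather than a linear scan.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical finds with (x, y) coordinates on a local excavation grid.
FINDS = [
    {"id": "f1", "xy": (1.0, 1.0)},
    {"id": "f2", "xy": (4.0, 4.0)},
    {"id": "f3", "xy": (9.0, 9.0)},
]

def in_box(point, lower_left, upper_right):
    """True if point lies inside the axis-aligned box (edges included)."""
    (x, y), (x0, y0), (x1, y1) = point, lower_left, upper_right
    return x0 <= x <= x1 and y0 <= y <= y1

def spatial_search(finds, lower_left, upper_right, center):
    """Keep finds inside the box, nearest to `center` first."""
    hits = [f for f in finds if in_box(f["xy"], lower_left, upper_right)]
    return [f["id"] for f in sorted(hits, key=lambda f: dist(f["xy"], center))]
```

The same distance function could measure similarity between feature vectors, which is the retrieval use mentioned in item 3.1.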

ETANA Structures

1. Site organization
   1. Region, site, partition, sub-partition, locus, …
2. Temporal orderings (ages, periods)
3. Taxonomies
   1. for bones, seeds, building materials, …
4. Stratigraphic relationships
   1. above, beneath, coexistent
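The “above/beneath” stratigraphic relationship is transitive, so a structure that stores only direct overlaps can still answer “what lies beneath this unit?” by following the relation. The sketch assumes the recorded relation is acyclic and uses hypothetical unit ids; it does not model the “coexistent” relation.

```python
# Hypothetical stratigraphic record: each unit maps to the units it directly overlies.
DIRECTLY_ABOVE = {
    "L1": {"L2"},
    "L2": {"L3", "L4"},
    "L4": {"L5"},
}

def beneath(unit, directly_above):
    """All units lying directly or indirectly beneath `unit` (transitive closure of 'above')."""
    seen = set()
    stack = list(directly_above.get(unit, ()))
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(directly_above.get(u, ()))
    return seen

def is_above(a, b, directly_above):
    """True if unit a lies (directly or indirectly) above unit b."""
    return b in beneath(a, directly_above)
```

Storing only direct overlaps avoids redundant facts in the record; the derived relation is computed when a query needs it.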

ETANA Streams

1. successive photos and drawings of excavation sites, loci, unearthed artifacts

2. audio and video recordings of excavation activities and discussions

3. textual reports

4. 3D models used to reconstruct and visualize archaeological ruins.

Streams

• Multiple media types and representations
  – See ch. 4 for IR (except some here for non-text)
  – Standards for each, and for some combinations

• Text
  – Character strings, encoding (Unicode)
  – Morphology -> stemming
  – Syntax, semantics -> stop words
  – ** POS tagging, phrases

• Images, audio, video, graphics, animation
  – Capture, digitization, representation
  – CBIR for each

• ** Compression, processing, analysis
• ** Synchronization, rendering, presentation, interchange
  – RealVideo, SMIL, QoS
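The text-stream steps listed above (Unicode encoding, stop words, stemming) can be chained into a tiny indexing pipeline. This is a sketch only: the stop list is a handful of words and the suffix stripper is deliberately crude (real systems use, e.g., the Porter stemmer), so both are illustrative assumptions.

```python
import re
import unicodedata

STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is"}  # tiny illustrative list
SUFFIXES = ("ing", "es", "s", "ed")  # crude suffix stripping, checked in this order

def normalize(text: str) -> str:
    """NFC-normalize so composed and decomposed forms match, then casefold."""
    return unicodedata.normalize("NFC", text).casefold()

def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text: str) -> list:
    """Tokenize on word characters, drop stop words, and stem what remains."""
    tokens = re.findall(r"\w+", normalize(text))
    return [stem(t) for t in tokens if t not in STOP_WORDS]
```

For example, `index_terms("Searching digital libraries")` yields `["search", "digital", "librari"]`; the odd-looking `librari` is normal for stemmers, since stems only need to be consistent, not real words.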

Content-Based Information Retrieval

Problems

• Image similarity is subjective

– Personal Interpretation

• Concept x Appearance

By Visual Features

– Retrieve images with 50 percent white colour and 50 percent black colour
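A query by visual features such as “50 percent white, 50 percent black” can be answered from a very coarse colour histogram. The sketch assumes grayscale images given as lists of rows of 0–255 values; the near-black/near-white thresholds and the tolerance are hypothetical parameters, not values from the slides.

```python
def bw_proportions(pixels, dark=50, light=205):
    """Fractions of near-white (>= light) and near-black (<= dark) pixels."""
    flat = [p for row in pixels for p in row]
    white = sum(p >= light for p in flat) / len(flat)
    black = sum(p <= dark for p in flat) / len(flat)
    return white, black

def matches_query(pixels, target_white=0.5, target_black=0.5, tol=0.1):
    """True if the image is roughly target_white white and target_black black."""
    white, black = bw_proportions(pixels)
    return abs(white - target_white) <= tol and abs(black - target_black) <= tol
```

Note that such a feature captures appearance only: a checkerboard and a half-black, half-white photo match equally, which is the “concept x appearance” problem noted earlier.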

Textual Information Retrieval

Query on Google using “Sunset” and “Rio de Janeiro”

Query result

Image Classification by Shape

Work of Torres et al.

• Search in collections of fish images using a combination of
  – image properties (CBIR) and
  – textual descriptions

Motivation

• Query 1:
  – List all metadata related to fish that were observed in the Amazon River

• Query 2:
  – Retrieve images of fish whose shape is similar to that in the example

• Query 3:
  – List all metadata related to fish that were observed in the Amazon River and whose shape is similar to that in the example
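Query 3 mixes a metadata predicate with a content-based similarity criterion. One simple way to evaluate it is to filter on metadata first and then rank the survivors by the distance between shape descriptors. This filter-then-rank strategy is an assumption for illustration, not necessarily how Torres et al. combine the two; the records and two-number shape descriptors are hypothetical.

```python
from math import dist

# Hypothetical records: metadata plus a numeric shape descriptor (feature vector).
FISH = [
    {"id": "fish-1", "observed_in": "Amazon River", "shape": (0.9, 0.1)},
    {"id": "fish-2", "observed_in": "Amazon River", "shape": (0.2, 0.8)},
    {"id": "fish-3", "observed_in": "Tennessee River", "shape": (0.9, 0.1)},
]

def query3(records, location, example_shape, k=5):
    """Metadata filter first, then rank survivors by descriptor distance to the example."""
    candidates = [r for r in records if r["observed_in"] == location]
    candidates.sort(key=lambda r: dist(r["shape"], example_shape))
    return [r["id"] for r in candidates[:k]]
```

Filtering first is cheap when the metadata predicate is selective; when it is not, a system may instead query a similarity index first and filter afterwards.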

Motivation

• Retrieve fish descriptions whose shapes are similar to the one shown below, that belong to the “Notropis” genus, that have large “eyes”, and that have been observed in the “Tennessee River”

Problem

• There is no biodiversity information system that allows queries involving:
  – geographic data
  – species metadata
  – image descriptors

• Existing systems:
  – metadata, or
  – metadata + spatial data
  – images are stored as separate files
    • with no possibility of retrieval by content

WeBioS

Torres: Visualizations

Spiral Pattern

Concentric Rings Pattern

Structures

• Digital objects
  – Documents, digitization, packaging (METS), interchange, standards, format conversion
  – Genre: plays, encyclopedias, dictionaries, educational resources: courses (e.g., syllabi) and lessons
  – Structural organizations (books, chapters, sections), excerpts/spans (mark, superimposed info)

• Metadata: standards, markup

• Knowledge structures & representations
  – Databases, schemas, ontologies, thesauri, lexicons, authority files, concept maps, semantic networks

• Indexes
  – Inverted files, signature files, R-trees, quadtrees, etc.

• Clusters & classification schemes
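Of the index structures listed, the inverted file is the workhorse of text search: it maps each term to the set of documents containing it, so a Boolean AND becomes an intersection of postings. The sketch below keeps postings as in-memory sets over a hypothetical three-document collection; production indexes usually store sorted, compressed postings lists on disk.

```python
from collections import defaultdict

# Hypothetical collection.
DOCS = {
    "d1": "inverted files and signature files",
    "d2": "r-trees and quad trees",
    "d3": "inverted index construction",
}

def build_inverted_index(docs):
    """Map each lowercase term to the set of ids of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)

def boolean_and(index, *terms):
    """Documents containing every term; intersect the shortest postings first."""
    postings = sorted((index.get(t.lower(), set()) for t in terms), key=len)
    if not postings:
        return set()
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result

INDEX = build_inverted_index(DOCS)
```

Starting from the shortest postings keeps intermediate results small, which matters when common terms have very long lists.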

Degree of Structure

Chaotic: Web
Organized: DLs
Structured: DBs

Digital Objects (DOs)

• Born digital

• Digitized version of a “real” object
  – Is the DO version the same, better, or worse?
  – Decision for ETDs (electronic theses and dissertations): structured + rendered

• Surrogate for a “real” object
  – Not covered explicitly in the metamodel for a minimal DL
  – Crucial in the metamodel for an archaeology DL

Metadata Objects (MDOs)

• MARC

• Dublin Core

• RDF

• IMS

• OAI (Open Archives Initiative)

• Crosswalks, mappings

• Ontologies

• Topic maps, concept maps
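A crosswalk maps elements of one metadata scheme onto another. The sketch below hard-codes a small subset of the pairings commonly used when mapping MARC 21 to simple Dublin Core (245 to title, 100 to creator, 650 to subject, 520 to description, 260 $b to publisher, 260 $c to date). It is a simplification, not the full Library of Congress crosswalk, which covers many more tags, indicators, and subfields. MARC fields are represented here as plain `(tag, subfield, value)` triples, a hypothetical format rather than real MARC encoding.

```python
# Simplified subset of a MARC 21 -> Dublin Core crosswalk: (tag, subfield) -> DC element.
MARC_TO_DC = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("650", "a"): "subject",
    ("520", "a"): "description",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
}

def crosswalk(marc_fields):
    """Convert (tag, subfield, value) triples into a DC dict of repeatable elements."""
    dc = {}
    for tag, code, value in marc_fields:
        element = MARC_TO_DC.get((tag, code))
        if element is not None:  # fields without a mapping are dropped (information loss)
            dc.setdefault(element, []).append(value)
    return dc

RECORD = [
    ("245", "a", "How to build a digital library"),
    ("100", "a", "Witten, Ian H."),
    ("260", "b", "Morgan Kaufmann"),
    ("260", "c", "2003"),
    ("999", "a", "local note"),
]
```

The dropped `999` field illustrates why moving from a complex scheme to a simple one is lossy.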

Complex to Simple

MARC ($50) -> Dublin Core (DC) + thesis

Spaces

• Retrieval models

– Boolean, extended Boolean

– Vector, LSI

– Probabilistic: classical, belief network, inference network, language models

• User interfaces and visualization
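For the vector model listed above, a minimal tf-idf and cosine-similarity sketch follows. It uses raw term frequency times ln(N/df) as the weight, which is one common choice among several, and a made-up three-document collection.

```python
import math
from collections import Counter

# Hypothetical collection.
DOCS = {
    "d1": "digital library services",
    "d2": "database query processing",
    "d3": "digital preservation",
}

def idf_table(docs):
    """idf(t) = ln(N / df(t)); a term present in every document gets weight 0."""
    n = len(docs)
    df = Counter(t for text in docs.values() for t in set(text.lower().split()))
    return {t: math.log(n / d) for t, d in df.items()}

def weigh(text, idf):
    """tf-idf vector for a text; terms unseen in the collection get weight 0."""
    return {t: tf * idf.get(t, 0.0) for t, tf in Counter(text.lower().split()).items()}

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def vector_search(query, docs):
    """Document ids ordered by cosine similarity to the query, best first."""
    idf = idf_table(docs)
    q = weigh(query, idf)
    scores = {d: cosine(q, weigh(text, idf)) for d, text in docs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For the query "digital library", `d1` ranks first because it matches both terms, and the rarer "library" carries more weight than "digital". `d3` follows on "digital" alone, and `d2` scores 0.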

User interfaces and visualization

• 2D interfaces

• 3D interfaces

• GIS

• Other paradigms

Scenarios

• Recall OO for streams: now we have objects as well as scenarios, e.g., interface components

• Information access
  – Searching: ad hoc, filtering/routing
  – Browsing: using an organization, using a visualization, using links (e.g., hypertext, hypermedia)
  – Workflow: sessions, feedback, etc.

• Scenario-based design
• Usability: goals, tasks, claims

• NOTE: this is covered in the outline

Societies

• User communities
  – Authors, editors, teachers, students, readers
  – Personal(ization), group(ware), community, global
  – Accessibility, universal access

• Librarians: reference, acquisition, operations
• Research community
  – Associations, conferences, publications, labs, projects
• Economics
  – Copyright, intellectual property rights, digital rights management, authorization, authentication, security, privacy, self-archiving (eprints)
  – Publishers, catalogers, distributors, sustainability
  – Open source, commercial, hybrid