
Integrating Bibliographical Data from Heterogeneous Digital Libraries

Eike Schallehn, Martin Endig, Kai-Uwe Sattler

Otto-von-Guericke-University Magdeburg
Institute for Technical and Business Information Systems
PO Box 4120
39016 Magdeburg
Germany

ADBIS - DASFAA 2000 SYMPOSIUM

September 5 - 8, 2000


Overview

• Introduction
• Specific requirements
• General overview of the approach
• Source descriptions
• XML adapter
• Special application: Citation Linking
• Conclusion and outlook


Introduction

• Integration of bibliographical metadata: author, title, publisher, citations, ...

• Wide range of existing providers
  – Specific to a research or geographical area
  – Publishers, libraries, resellers
  – Differ in scope, quality and quantity of maintained data

• Problems for users
  – Knowledge about locality, scope, quality etc. is required

• Goal: single point of access


General Overview (1)

[Figure: layered architecture]

Application Layer: higher-level services, single point of access
Federation Layer: federation service
Adapter/Source Layer: one adapter per source (DBLP, Publisher: Springer, Publisher: Spektrum, DBN, ...)


General Overview (2)

• Non-cooperative providers:
  – WWW databases
  – Z39.50 sources

• Cooperative providers:
  – Databases (relational, object-relational, etc.)
  – Sources with limited query facilities, capable of providing XML

Adapters for certain classes of providers
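The layering described so far can be illustrated with a minimal sketch: each class of provider sits behind an adapter with a uniform query method, and a federation service fans a query out to all adapters. All class names, the dict-based record format and the in-memory stand-in source are assumptions made for illustration; the slides do not show the actual interfaces.

```python
from abc import ABC, abstractmethod

# Hypothetical record type: one bibliographic entry as a plain dict.
Record = dict


class Adapter(ABC):
    """Wraps one class of provider behind a uniform query interface."""

    @abstractmethod
    def query(self, conditions: dict) -> list[Record]:
        """Return records matching all attribute = value conditions."""


class InMemoryAdapter(Adapter):
    """Stand-in for a real source (WWW database, Z39.50, XML provider)."""

    def __init__(self, name: str, records: list[Record]):
        self.name = name
        self.records = records

    def query(self, conditions: dict) -> list[Record]:
        # Tag every hit with its source so results stay traceable.
        return [
            {**r, "source": self.name}
            for r in self.records
            if all(r.get(k) == v for k, v in conditions.items())
        ]


class FederationService:
    """Single point of access: sends a query to every adapter."""

    def __init__(self, adapters: list[Adapter]):
        self.adapters = adapters

    def query(self, conditions: dict) -> list[Record]:
        results = []
        for adapter in self.adapters:
            results.extend(adapter.query(conditions))
        return results


federation = FederationService([
    InMemoryAdapter("src1", [{"title": "ODBMS", "authors": "Heuer"}]),
    InMemoryAdapter("src2", [{"title": "SQL", "authors": "Moore"}]),
])
hits = federation.query({"title": "ODBMS"})
```

A real adapter would translate the conditions into whatever the provider understands (an HTML form request, a Z39.50 query, an XML request) instead of filtering a local list.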


Special Requirements (1)

• Efficient access for global applications:
  – Object-relational, conforming to standards
  – Flexible import and integration for large numbers of constantly changing systems

• Minimal provider-side resource consumption:
  – Limited query facilities often exist
  – Additional functionality constrained by local resources

Assumption: common interest in cooperation (on a certain level)


Special Requirements (2)

• Efficient transfer:
  – Usage of XML as transfer standard
  – Size of intermediate results is a critical factor for possibly slow network connections
  – Move queries with high selectivity to the source

• Minimal provider-side implementation effort:
  – Only wrapping of existing functionality required
  – Tools for design based on source description
  – XML can easily be created


Source Description (1)

• Access to local sources mainly depends on their query capabilities

• Global query rewriting and query processing are based on this information

• Example: WWW databases
  – Constant selections
  – Set of allowed comparison operators per attribute
  – Simple combinations

Corresponding description required


Source Description (2)

• Example: Source1 exports a relation BookStore and allows equality match on authors and title, used either separately or in a simple combination

alter table BookStore set query constraints (
  predicates ((authors,=),(title,=)),
  combinations ((authors),(title),(authors,title)));

• Example continued: Source2 exports a relation Books and allows only equality match on title
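A source description of this kind can be read as two sets: the allowed (attribute, operator) pairs and the allowed attribute combinations. The sketch below shows one way to check whether a conjunction of conditions can be sent to a source as-is. The class `SourceDescription`, its method `supports`, and the attribute name `authors` are hypothetical; the slides give only the declaration syntax, not how it is evaluated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDescription:
    relation: str
    predicates: frozenset    # allowed (attribute, operator) pairs
    combinations: frozenset  # allowed attribute sets per source query

    def supports(self, conditions):
        """conditions: list of (attribute, operator, value) joined by AND."""
        if not conditions:
            return False
        ops_ok = all((a, op) in self.predicates for a, op, _ in conditions)
        attrs = frozenset(a for a, _, _ in conditions)
        return ops_ok and attrs in self.combinations


# Source1: equality on authors and title, alone or combined.
book_store = SourceDescription(
    relation="BookStore",
    predicates=frozenset({("authors", "="), ("title", "=")}),
    combinations=frozenset({
        frozenset({"authors"}),
        frozenset({"title"}),
        frozenset({"authors", "title"}),
    }),
)

# Source2: equality on title only.
books = SourceDescription(
    relation="Books",
    predicates=frozenset({("title", "=")}),
    combinations=frozenset({frozenset({"title"})}),
)

both = [("title", "=", "ODBMS"), ("authors", "=", "Saake")]
print(book_store.supports(both), books.supports(both))
```

Under these assumptions the combined condition is accepted by BookStore but not by Books, which only accepts the title condition on its own.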


Source Description (3)

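select * from Publications
where title='ODBMS' and authors='Saake' or authors='Heuer';

[Figure: decomposition of the query above into source queries. Leaf relations, left to right: (BookStore), (BookStore), (Books), (Books). Selection labels in the figure: title = 'ODBMS' combined with authors = 'Saake'; authors = 'Heuer'; authors = 'Heuer' and authors = 'Saake'; title = 'ODBMS'.]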
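To illustrate how a global query might be split according to source capabilities, the following sketch takes the query in disjunctive form and, for each source and each disjunct, pushes the largest supported subset of conditions to the source and leaves the rest as a residual filter for the federation layer. This is an assumed strategy written for illustration, not the FRAQL algorithm; the table `CAPABILITIES` and the functions `pushable` and `plan` are hypothetical.

```python
from itertools import combinations

# Assumed capability tables: allowed (attribute, operator) pairs and
# allowed attribute combinations per source relation.
CAPABILITIES = {
    "BookStore": {
        "predicates": {("authors", "="), ("title", "=")},
        "combinations": [{"authors"}, {"title"}, {"authors", "title"}],
    },
    "Books": {
        "predicates": {("title", "=")},
        "combinations": [{"title"}],
    },
}


def pushable(source, conds):
    """True if the source can evaluate all conds in a single query."""
    cap = CAPABILITIES[source]
    attrs = {a for a, _, _ in conds}
    return (
        all((a, op) in cap["predicates"] for a, op, _ in conds)
        and attrs in cap["combinations"]
    )


def plan(source, conjunction):
    """Split one conjunction into (pushed, residual) for a source.

    Larger condition subsets are tried first. An empty pushed part means
    the source cannot evaluate any of the conditions itself.
    """
    for size in range(len(conjunction), 0, -1):
        for subset in combinations(conjunction, size):
            if pushable(source, subset):
                residual = [c for c in conjunction if c not in subset]
                return list(subset), residual
    return [], list(conjunction)


# (title = 'ODBMS' and authors = 'Saake') or authors = 'Heuer'
query_dnf = [
    [("title", "=", "ODBMS"), ("authors", "=", "Saake")],
    [("authors", "=", "Heuer")],
]

plans = {
    (source, i): plan(source, conj)
    for source in CAPABILITIES
    for i, conj in enumerate(query_dnf)
}
```

With these inputs, BookStore receives both disjuncts unchanged, while Books receives only the title condition and the authors condition stays as a residual filter. For the second disjunct nothing can be pushed to Books; how the original system handles that case is not stated in the slides.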


XML Adapter (1)

• XML for transfer of bibliographical metadata
  – Cooperative providers
  – Provider-specific DTD
  – Underlying data management may vary

Transformation to object-relational structures

• Application of XSLT
  – Intermediate step: transformation to internal DOM representation according to own DTD
  – Design of XSLT mapping supported by tools

• Further result and query processing at federation layer


XML Adapter (2)

Example query:

select *
from Publications
where title like 'ODBMS';

Possible XML result:

<ROWSET>
  <ROW num="1">
    <ID>1</ID>
    <TITLE>ODBMS</TITLE>
    <AUTHORS>Heuer, A.</AUTHORS>
  </ROW>
  ...
</ROWSET>
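As a rough illustration of the result translation step, the sketch below turns a ROWSET document of the shape shown on this slide into flat records. It uses plain parsing with Python's standard `xml.etree.ElementTree` as a stand-in for the XSLT mapping described in the slides; the mapping table `FIELD_MAP` and the function `rows_from_xml` are assumptions.

```python
import xml.etree.ElementTree as ET

XML_RESULT = """
<ROWSET>
  <ROW num="1">
    <ID>1</ID>
    <TITLE>ODBMS</TITLE>
    <AUTHORS>Heuer, A.</AUTHORS>
  </ROW>
</ROWSET>
"""

# Assumed mapping from provider element names to global attribute names.
FIELD_MAP = {"ID": "id", "TITLE": "title", "AUTHORS": "authors"}


def rows_from_xml(xml_text, field_map=FIELD_MAP):
    """Convert each ROW element into a dict of mapped attributes."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for row in root.findall("ROW"):
        record = {}
        for child in row:
            # Elements without a mapping entry are ignored.
            if child.tag in field_map:
                record[field_map[child.tag]] = (child.text or "").strip()
        rows.append(record)
    return rows


rows = rows_from_xml(XML_RESULT)
```

Keeping the element-to-attribute mapping in a table mirrors the idea of a provider-specific mapping that can be generated by tools rather than hand-coded per source.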


XML Adapter (3)

[Figure: XML adapter architecture. Components: FRAQL Query Processor, Query Evaluator, Query Translator, XML Parser, Result Translator. Inputs and exchanged artifacts: Source Description, XSLT, XML.]


Application: Citation Linking

Integrating citation information from various sources:

Local relation src1.Publ

Key     Title            Authors   References
Ada98   Databases        Adams     Sco98
Ada98   Databases        Adams     Mey92
Sco98   Dig. Libraries   Scott     Moo94

Local relation src2.Publ

Key     Title   Authors   References
Moo94   SQL     Moore     …
…       …       …         …

Mapping table MapPubl

key     id       source
Moo94   Moo94    Publ_2
Mey92   Mey92a   Publ_3

Import relation Publ_1

id      source   title            authors
Ada98   Publ_1   Databases        Adams
Sco98   Publ_1   Dig. Libraries   Scott

Import relation Ref_1

id      source   ref_id   ref_source
Ada98   Publ_1   Sco98    Publ_1
Ada98   Publ_1   Mey92a   Publ_3
Sco98   Publ_1   Moo94    Publ_2

Import relation Publ_2

id      source   title   authors
Moo94   Publ_2   SQL     Moore
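The citation-linking step can be sketched as follows: a local relation that stores references inline is split into a publication relation and a reference relation, and reference keys that point outside the source are resolved through the mapping table. The row values follow the figure as best they can be recovered from the transcript; the function `import_source` and the rule "a key found in the same source stays local" are assumptions, not the authors' stated procedure.

```python
# Local relation src1.Publ: (key, title, authors, reference key).
SRC1_PUBL = [
    ("Ada98", "Databases", "Adams", "Sco98"),
    ("Ada98", "Databases", "Adams", "Mey92"),
    ("Sco98", "Dig. Libraries", "Scott", "Moo94"),
]

# Mapping table MapPubl: local reference key -> (global id, import relation).
MAP_PUBL = {
    "Moo94": ("Moo94", "Publ_2"),
    "Mey92": ("Mey92a", "Publ_3"),
}


def import_source(rows, source, mapping):
    """Split a local relation into a publication and a reference relation."""
    local_keys = {key for key, _, _, _ in rows}
    publ, refs = [], []
    for key, title, authors, ref in rows:
        entry = (key, source, title, authors)
        if entry not in publ:           # one row per publication
            publ.append(entry)
        if ref in local_keys:
            target = (ref, source)      # reference stays within this source
        elif ref in mapping:
            target = mapping[ref]       # resolved via the mapping table
        else:
            continue                    # unresolved references are skipped
        refs.append((key, source) + target)
    return publ, refs


publ_1, ref_1 = import_source(SRC1_PUBL, "Publ_1", MAP_PUBL)
```

Separating references into their own relation with a (ref_id, ref_source) pair is what lets a citation in one source point at a publication held by another.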


Conclusion and Outlook

• Application of concepts known from the area of federated and mediator-based systems

• Focus on adapter for partially cooperative providers of bibliographical data
  – Description of query capabilities for efficient distributed query processing
  – Result transfer based on XML
  – Result transformation based on XSLT

• Better tool support
• Improve identification of identical objects across sources