The Species Analyst

Page 1: The Species Analyst

Distributed Databases and Applications

John Wieczorek
Museum of Vertebrate Zoology, UC Berkeley

Page 2: The Species Analyst

Distributed Databases

Multiple sources of data
…under local control,
…with concepts in common
…and a desire to deliver data as part of a community.

Page 3: The Species Analyst

Distributed Databases

The Species Analyst (TSA)
The Integrated Taxonomic Information System (ITIS)
FishNet
The Mammal Networked Information System (MaNIS)
HerpNET
The Ornithological Information System (ORNIS)
…

Page 4: The Species Analyst

Distributed Databases

European Natural History Science Information Network (ENHSIN)
Biological Collection Access for Europe (BioCASE)
Australia Virtual Herbarium (AVH)
Red Mundial de Información Sobre Biodiversidad, Comisión Nacional para el Conocimiento y Uso de la Biodiversidad (REMIB, CONABIO)

Page 5: The Species Analyst

Distributed Databases

Mountain and Plains Spatio-Temporal Database-Informatics (MaPSTeDI)
Ocean Biogeographic Information System (OBIS)
Pacific Basin Information Node, National Biological Information Infrastructure (PBIN, NBII)
Species Link, Centro de Referência em Informação Ambiental (Species Link, CRIA)
A Virtual Herbarium of the Chicago Region (vPlants)
Spatial Analysis of Local Vegetation Inventories Across Scales (SALVIAS)
…

Page 6: The Species Analyst

Distributed Databases

Berkeley Natural History Museums (BNHM)
Association of Biological Collections, UC Davis
…

Page 7: The Species Analyst

Distributed Databases

LifeMapper
Global Biodiversity Information Facility (GBIF)

Page 8: The Species Analyst

Distributed vs. centralized

Multiple sources of data
…under local control,
…with concepts in common
…and a desire to deliver data as part of a community

Page 9: The Species Analyst

Distributed vs. centralized

In other words, distribute the headache rather than have one central migraine.

Page 10: The Species Analyst

DiGIR
Distributed Generic Information Retrieval

John Wieczorek, Stan Blum, Dave Vieglais, P.J. Schwartz

Page 11: The Species Analyst

Project Rationale

To avoid multiple incongruous development efforts
To pool resources and create a community of experts
To solve the problem of scalability

Page 12: The Species Analyst

Project Goals

To define a protocol for retrieving structured data from multiple, heterogeneous databases across the Internet
To build a reference implementation of both provider and portal software using said protocol

Page 13: The Species Analyst

Design Goals

To use open protocols and standards, such as HTTP and XML
To decouple the protocol, software and semantics
To make new data provider installations as easy as possible
To have open source development and GNU General Public Licensing

Page 14: The Species Analyst

DiGIR Architecture

[Diagram: User Interface, Protocol, Portal Engine, Provider]

Page 15: The Species Analyst

DiGIR Architecture

[Diagram: Provider]

Page 16: The Species Analyst

DiGIR Architecture

[Diagram: Provider, Registry]

Page 17: The Species Analyst

DiGIR Architecture

[Diagram: Portal Engine]

Page 18: The Species Analyst

DiGIR Architecture

[Diagram: Portal Engine, Registry]

Page 19: The Species Analyst

DiGIR Architecture

[Diagram: User Interface]

Page 20: The Species Analyst

DiGIR Architecture

[Diagram: User Interface, Protocol, Portal Engine]

Page 21: The Species Analyst

DiGIR Architecture

[Diagram: User Interface, Protocol, Portal Engine, Protocol, Provider]

Page 23: The Species Analyst

DiGIR Architecture

[Diagram: User Interface, Protocol, Portal Engine]

Page 24: The Species Analyst

DiGIR Component Summary

Page 25: The Species Analyst

DiGIR Protocol

Defines request and response message formats for communication between provider, portal engine, and user interfaces
  Metadata requests
  Search requests
  Inventory requests
Remains unfettered by the structure of the data it transfers
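
The separation between message envelope and data structure can be sketched as follows. This is a minimal illustration only: the element names (request, header, type, destination, filter, equals), the "Genus" concept, and the example.org address are assumptions made for the sketch and do not reproduce the actual DiGIR schema.

```python
import xml.etree.ElementTree as ET

# The three request kinds named on the slide.
REQUEST_TYPES = {"metadata", "search", "inventory"}


def build_request(request_type, destination, payload=None):
    """Wrap an optional payload in a generic request envelope.

    Element names here are hypothetical, not the real DiGIR schema.
    """
    if request_type not in REQUEST_TYPES:
        raise ValueError(f"unknown request type: {request_type}")
    request = ET.Element("request")
    header = ET.SubElement(request, "header")
    ET.SubElement(header, "type").text = request_type
    ET.SubElement(header, "destination").text = destination
    if payload is not None:
        # The envelope never inspects the payload, so the payload can
        # follow any data schema without changing the protocol.
        body = ET.SubElement(request, request_type)
        body.append(payload)
    return request


def request_type_of(document):
    """Read the request type back from a serialized request document."""
    return ET.fromstring(document).findtext("header/type")


# Usage: a search whose filter refers to a made-up "Genus" concept.
search_filter = ET.Element("filter")
ET.SubElement(search_filter, "equals", {"concept": "Genus"}).text = "Peromyscus"
xml_text = ET.tostring(
    build_request("search", "http://example.org/provider", search_filter),
    encoding="unicode",
)
```

Because only the header is interpreted by the envelope code, a different community could swap in its own concepts without touching build_request.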

Page 26: The Species Analyst

Portal Engine

The entry point for a “user”
Can query a registry for potential providers
Can determine, based on provider metadata, whether a provider should be queried
Can send requests to multiple providers
Communicates via protocol compliant messaging only

Page 27: The Species Analyst

Portal Engine, continued

Assembles responses from providers
Returns packaged results to the “user”
Logs activity
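
The portal engine's select, fan out, and assemble cycle might look roughly like the sketch below. It is not the DiGIR portal implementation (which this deck describes as Java-based): the providers are hypothetical in-memory dictionaries standing in for remote services, and the names PROVIDERS, should_query, query_provider, and portal_search are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote providers. Each entry pairs
# provider metadata (the concepts it supports) with its records.
PROVIDERS = {
    "museum_a": {
        "concepts": {"Genus", "Species"},
        "records": [
            {"Genus": "Peromyscus", "Species": "maniculatus"},
            {"Genus": "Mus", "Species": "musculus"},
        ],
    },
    "museum_b": {
        "concepts": {"Genus"},
        "records": [{"Genus": "Peromyscus"}],
    },
    "museum_c": {
        "concepts": {"Family"},
        "records": [{"Family": "Muridae"}],
    },
}


def should_query(metadata, concept):
    """Decide from provider metadata whether a provider can answer."""
    return concept in metadata["concepts"]


def query_provider(name, concept, value):
    """Stand-in for a remote request: filter one provider's records."""
    records = PROVIDERS[name]["records"]
    return [dict(r, provider=name) for r in records if r.get(concept) == value]


def portal_search(concept, value):
    """Select providers, query them concurrently, assemble one result list."""
    targets = sorted(
        name for name, meta in PROVIDERS.items() if should_query(meta, concept)
    )
    with ThreadPoolExecutor(max_workers=4) as pool:
        batches = pool.map(lambda n: query_provider(n, concept, value), targets)
        return [record for batch in batches for record in batch]


results = portal_search("Genus", "Peromyscus")
```

Checking metadata before sending anything keeps providers that cannot answer (museum_c in this made-up data) out of the request entirely.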

Page 28: The Species Analyst

Provider

Receives requests
Retrieves data from database
Sends results to requestor
Supplies metadata to describe data classification and availability
Logs requests
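
The provider responsibilities listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the DiGIR provider (which this deck describes as PHP-based): requests are plain dictionaries rather than XML documents, the database is an in-memory SQLite table, and the table, column, and record values are all invented.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("provider")

# Hypothetical metadata describing what this provider can serve.
METADATA = {"name": "Example provider", "concepts": ["CatalogNumber", "Genus"]}


def make_database():
    """Create a small in-memory database with invented specimen rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE specimens (catalog_no TEXT, genus TEXT)")
    db.executemany(
        "INSERT INTO specimens VALUES (?, ?)",
        [("EX-1", "Peromyscus"), ("EX-2", "Mus")],
    )
    return db


def handle_request(db, request):
    """Log the request, then answer it from metadata or the database."""
    log.info("request received: %s", request)
    if request["type"] == "metadata":
        return METADATA
    if request["type"] == "search":
        rows = db.execute(
            "SELECT catalog_no, genus FROM specimens "
            "WHERE genus = ? ORDER BY catalog_no",
            (request["genus"],),
        ).fetchall()
        return [{"CatalogNumber": c, "Genus": g} for c, g in rows]
    raise ValueError(f"unsupported request type: {request['type']}")


db = make_database()
matches = handle_request(db, {"type": "search", "genus": "Peromyscus"})
```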

Page 29: The Species Analyst

Registry

Supports provider “advertising”
May be global and open
May be private
Need not be used at all
Example: Universal Description, Discovery and Integration (UDDI)

Page 30: The Species Analyst

User Interfaces

Must be able to assemble and send a request document to a portal
Must be able to receive and interpret a response document from the portal
This is where the real fun is!

Page 31: The Species Analyst

Example Network Configurations

Page 32: The Species Analyst

BNHM Network Configuration

[Diagram: working databases at PHMA, UCBG, UCJEPS, UCMP (4), and Essig; online databases; a DiGIR Provider; the BNHM DiGIR Portal; and the BNHM Presentation Layer.]

Page 33: The Species Analyst

MaNIS Network Configuration

[Diagram: working databases, online databases, DiGIR Providers, MaNIS DiGIR Portals, and MaNIS Presentation Layers.]

Page 34: The Species Analyst

MaNIS Network Configuration

[Diagram: institutional databases (LACM, MS Access; MVZ, Sybase; TTU, FoxPro; UWBM, 4D-Mac; CAS, SQL Server); online MS Access and SQL Server databases; five DiGIR Providers; three MaNIS DiGIR Portals; and the MVZ-MaNIS, LACM-MaNIS, and UWBM-MaNIS Presentation Layers.]

Page 39: The Species Analyst

Other Network Configurations

[Diagram: working databases, online databases, DiGIR Providers, and DiGIR Portals.]

Page 40: The Species Analyst

DiGing a little deeper

Page 41: The Species Analyst

Provider Installation

Web server (Apache, IIS, etc.)
PHP: Hypertext Preprocessor (PHP)
Provider software (DiGIR)
Configuration tool
Testing scripts
Provider scripts
Provider manual (DiGIR)

Page 42: The Species Analyst

Provider Configuration Tool

Provider metadata
Resources
Database connection
Establishing table relationships
Concept to column (i.e., field, attribute) mapping
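
Concept-to-column mapping and table relationships can be pictured as a lookup that turns a query phrased in shared concepts into SQL against the local schema. The sketch below assumes a made-up three-table schema; the names CONCEPT_TO_COLUMN, FROM_CLAUSE, and build_query, and every table and column name, are hypothetical and do not reflect the configuration format the DiGIR tool produces.

```python
# Hypothetical mapping from shared concepts to local table columns.
CONCEPT_TO_COLUMN = {
    "CatalogNumber": "specimens.catalog_no",
    "Genus": "taxa.genus",
    "Country": "localities.country",
}

# Hypothetical table relationships, expressed once as a join clause.
FROM_CLAUSE = (
    "specimens "
    "JOIN taxa ON specimens.taxon_id = taxa.id "
    "JOIN localities ON specimens.locality_id = localities.id"
)


def build_query(return_concepts, filters):
    """Translate a concept-level query into parameterized SQL.

    return_concepts: concept names to return, in order.
    filters: dict of concept name -> required value (equality only).
    Raises KeyError if a concept has no mapped column.
    """
    columns = ", ".join(
        f"{CONCEPT_TO_COLUMN[c]} AS {c}" for c in return_concepts
    )
    conditions = " AND ".join(f"{CONCEPT_TO_COLUMN[c]} = ?" for c in filters)
    sql = f"SELECT {columns} FROM {FROM_CLAUSE}"
    if conditions:
        sql += f" WHERE {conditions}"
    # Values travel separately as parameters rather than inside the SQL text.
    return sql, list(filters.values())


sql, params = build_query(["CatalogNumber", "Genus"], {"Country": "Mexico"})
```

Keeping the mapping in data rather than in code is what would let each institution retain its own schema while answering the same concept-level questions.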

Page 43: The Species Analyst

Portal Configuration

Web server (Apache, IIS, etc.)
Sun Java 2 (JDK 1.4)
Tomcat (Apache)
Portal software (DiGIR)
Portal installation documentation (DiGIR)

Page 44: The Species Analyst

Portal Installation

Engine configuration file (finding providers)
Presentation configuration file (defining the Information Domain)
Presentation customization
Engine start and stop scripts
Presentation start and stop scripts

Page 45: The Species Analyst

Portal Demonstrations

Page 46: The Species Analyst

DiGIR Project Information

The DiGIR project is a collaborative effort.
DiGIR is currently established as an open source development project on SourceForge (https://sourceforge.net/projects/digir).
Further documentation is available on the DiGIR web site (http://digir.net).