InterLib(-Related) Activities at SDSC/DICE

Post on 14-Jan-2016

Page 1: InterLib(-Related) Activities at SDSC/DICE

San Diego Supercomputer Center

National Partnership for Advanced Computational Infrastructure

InterLib(-Related) Activities at SDSC/DICE

Bertram Ludaescher

[email protected]

• IBM HPSS (Storage/Archival, e.g. ADL)

• SDSC SRB/(E)MCAT (Data Handling/Information Discovery)

• AMICO Image Collection (CDL Testbed)

  • Excelon as XML Data Server

• MIX: Mediation of Information using XML (with DB-Lab UCSD)

Page 2: InterLib(-Related) Activities at SDSC/DICE

HPSS, SRB, MCAT

• HPSS: Storage/Archival of large datasets
  • (UCB, UCSB, Stanford)

• SRB/(E)MCAT: Data Handling/Information Discovery
  • transparent access to remote storage
  • replication
  • containers for large number of small items
  • caching
  • authorization
  • proxy operation support (filtering, data subsetting)
  • usage of security infrastructure (GSI)
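
The "containers for large number of small items" bullet can be illustrated with a minimal sketch: many small objects are concatenated into one blob plus an offset index, so the archive stores one large file instead of thousands of tiny ones. The `Container` class and its `put`/`get` methods are hypothetical names made up for illustration and are not the SRB API.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical container: one blob plus an index of (offset, length)."""
    blob: bytearray = field(default_factory=bytearray)
    index: dict = field(default_factory=dict)

    def put(self, name: str, data: bytes) -> None:
        # Append the item and remember where it lives inside the blob.
        self.index[name] = (len(self.blob), len(data))
        self.blob.extend(data)

    def get(self, name: str) -> bytes:
        # Read back a single item without touching the others.
        offset, length = self.index[name]
        return bytes(self.blob[offset:offset + length])


# Made-up sample items standing in for small thumbnail files.
container = Container()
container.put("thumb_0001.jpg", b"first thumbnail bytes")
container.put("thumb_0002.jpg", b"second thumbnail bytes")
```

Here `container.get("thumb_0002.jpg")` returns only the second item, while the storage layer would see a single object.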

Page 3: InterLib(-Related) Activities at SDSC/DICE

SRB Interface

[Architecture diagram. Labels:]

• Application (two instances)
• SRB Master
• SRB Agent
• MCAT
• SRB Server (three instances)
• MCAT Core
• Dublin Core
• Eco Core

Page 4: InterLib(-Related) Activities at SDSC/DICE

Managing Metadata: EMCAT

• Extensible Meta Data Catalog - EMCAT
  • Exploits dependencies & relationships (m:n, tc, <=>, …)
  • T-Language - Markup, Filter & Presentation
  • Meta Data Repository (Object-, System-, Collection-level)
  • Based on Kernel Meta Meta Data
  • Extensible
  • Uniform Access and Federation interface
  • Metadata exchange Interface Protocol

• MAPS - Meta data Attribute Presentation Structure
  • query, update and result structures
  • Close to Z39.50

Page 5: InterLib(-Related) Activities at SDSC/DICE

SRB/MCAT Future

• Performance Improvements and Consolidation
• Delayed Action Manager - mirror, cronjobs
• Support for Methods
• Handling Very Large Data sets - partitions
• More Drivers - Sybase, NTFS, LDAP
• Extensible MCAT
• Language Support - Perl, Fortran

http://www.npaci.edu/DICE/SRB

Page 6: InterLib(-Related) Activities at SDSC/DICE

The AMICO Digital Library Project

http://www.amico.org

http://www.npaci.edu/DICE/AMICO

Art Museum Image Consortium

Richard Marciano et al.

• 55,146 objects: 750 MB
• 53,763 thumbnail images: 319 MB
• 57,609 full tiff images: 180 GB
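
As a rough worked example, the collection figures above imply average item sizes. The sketch below assumes binary units (1 GB = 1024 MB, 1 MB = 1024 KB), which the slide does not state; with decimal units the results shift slightly.

```python
# Counts and totals are taken from the slide.
# The unit conversion (1 GB = 1024 MB, 1 MB = 1024 KB) is an assumption.
full_tiff_count, full_tiff_gb = 57_609, 180
thumb_count, thumb_mb = 53_763, 319
object_count, object_mb = 55_146, 750

avg_tiff_mb = full_tiff_gb * 1024 / full_tiff_count
avg_thumb_kb = thumb_mb * 1024 / thumb_count
avg_object_kb = object_mb * 1024 / object_count

print(f"average full tiff image: {avg_tiff_mb:.1f} MB")
print(f"average thumbnail image: {avg_thumb_kb:.1f} KB")
print(f"average object: {avg_object_kb:.1f} KB")
```

Under these assumptions a full tiff image averages roughly 3 MB, which is why the full images dominate the storage requirement.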

Page 7: InterLib(-Related) Activities at SDSC/DICE

AMICO Consortium of 26 (now 31) museums

AGO_  Art Gallery of Ontario
AIC_  Art Institute of Chicago
AKAG  Albright-Knox Art Gallery, Buffalo, NY
ASIA  Asia Society
BMFA  Boston Museum of Fine Arts
CCP_  Center for Creative Photography, U. Arizona
CMA_  The Cleveland Museum of Art
DMCC  Davis Museum and Cultural Center, Wellesley College, MA
FASF  Fine Arts Museums of San Francisco
GEH_  George Eastman House, Rochester, NY
JPGM  J. Paul Getty Museum, Los Angeles, CA
LACM  Los Angeles County Museum of Art
LOC_  Library of Congress
MACM  Musée d'art contemporain de Montréal
MBAM  Musée des beaux-arts de Montréal
MCAS  Museum of Contemporary Art, San Diego
MIA_  The Minneapolis Institute of Arts
MMA_  The Metropolitan Museum of Art
NGC_  National Gallery of Canada, Ottawa/Ontario
NMAA  National Museum of American Art, Smithsonian Institution
PMA_  Philadelphia Museum of Art
SFMO  San Francisco Museum of Modern Art
SJMA  San Jose Museum of Art
TFC_  The Frick Collection, NY
WAC_  Walker Art Center, Minneapolis, MN
WMAA  Whitney Museum of American Art, NY

Page 8: InterLib(-Related) Activities at SDSC/DICE

Raw Metadata Structure

- catdata: 8 files
    16,604  year1.d990429
    14,430  year1.d990512
    22,938  year1.d990520
    54,303  year1.d990627
        15  year1.d990708
    54,298  year1.d990731
        93  year1.d990806
       657  year1.d990813

- tiffmetadata: 23 files
     2963  AGO_.tiffmetadata.txt
     1016  AIC_.tiffmetadata.txt
      894  AKAG.tiffmetadata.txt
      187  ASIA.tiffmetadata.txt
     7591  BMFA.tiffmetadata.txt
      401  CCP_.tiffmetadata.txt
     1455  CMA_.tiffmetadata.txt
       56  DCMC.tiffmetadata.txt
      470  DMCC.tiffmetadata.txt
    10141  FASF.tiffmetadata.txt
     2137  GEH_.tiffmetadata.txt
     1459  JPGM.tiffmetadata.txt
     1013  LACM.tiffmetadata.txt
    20654  LOC_.tiffmetadata.txt
       86  MACM.tiffmetadata.txt
       50  MBAM.tiffmetadata.txt
       31  MCAS.tiffmetadata.txt
     1440  MIA_.tiffmetadata.txt
      550  MMA_.tiffmetadata.txt
     1507  NGC_.tiffmetadata.txt
     1416  NMAA.tiffmetadata.txt
      154  PMA_.tiffmetadata.txt
      158  SFMO.tiffmetadata.txt
       86  SJMA.tiffmetadata.txt
       68  Such.tiffmetadata.txt
      396  WAC_.tiffmetadata.txt
    37069  replacements.txt
    57499  replacements2.txt

- thumbmeta: 52,689 files
    AGO_.1016.25_thum.met*
    AGO_.1016.32_thum.met*
    AGO_.1016.39_thum.met*
    …
    WAC_.994C_thum.met
    WAC_.996C_thum.met
    WAC_.998C_thum.met
    WAC_.99C_thum.met*
    WMAA.1557_56_thum.met
    WMAA.31_426_thum.met
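
The thumbmeta file names appear to encode the museum code and an object identifier. A minimal sketch of pulling those apart follows; the pattern (four-character museum code, a dot, the identifier, then `_thum.met` and an optional trailing `*`) is an assumption inferred from the listed names, and `parse_thumb_name` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed pattern: <4-char museum code>.<object id>_thum.met, optional trailing "*".
THUMB_NAME = re.compile(r"^(?P<museum>.{4})\.(?P<object_id>.+)_thum\.met\*?$")


def parse_thumb_name(name: str) -> tuple[str, str]:
    """Split a thumbmeta file name into (museum code, object id)."""
    match = THUMB_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected thumbmeta file name: {name!r}")
    return match["museum"], match["object_id"]


# Three of the names shown in the listing above.
names = [
    "AGO_.1016.25_thum.met*",
    "WAC_.994C_thum.met",
    "WMAA.1557_56_thum.met",
]
per_museum = Counter(parse_thumb_name(n)[0] for n in names)
```

Counting by the first tuple element gives a per-museum tally, which is the grouping a later split-by-museum step would need.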

Page 9: InterLib(-Related) Activities at SDSC/DICE

Page 10: InterLib(-Related) Activities at SDSC/DICE

Page 11: InterLib(-Related) Activities at SDSC/DICE

Page 12: InterLib(-Related) Activities at SDSC/DICE

AMICO Metadata Conversion Steps

[Flow diagram. Labels, in an order inferred from the extracted text:]

• Tape Read
• “Raw” Metadata files: catdata (8 files), tiffmetadata (23 files), thumbmeta (52,689 files)
• Merge
• Consolidated Metadata files: 1 catdata, 1 tiffmetadata, 1 thumbmeta
• Convert to XML
• 3 XML files: 1 catdata, 1 tiffmetadata, 1 thumbmeta
• Split-by-museums
• 1 XML file per museum
• Split-by-file size
• Multiple XML files per museum
• Split-by-machines
• Multiple museum XML files per machine
• eXcelon Dump&Load Utility
• eXcelon Data Server (three instances)
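
The convert and split steps named on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual tooling: the record fields (`museum`, `id`, `title`), the XML element names, the sample records, and the helper functions are all made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def split_by_museum(records):
    """Group records by museum code (assumed to be the 'museum' key)."""
    groups = defaultdict(list)
    for record in records:
        groups[record["museum"]].append(record)
    return dict(groups)


def record_to_xml(record):
    """Convert one flat record into an <object> element (names are made up)."""
    element = ET.Element("object", id=record["id"])
    ET.SubElement(element, "title").text = record["title"]
    return element


def split_by_size(records, max_bytes):
    """Yield XML documents for one museum, each holding about max_bytes of records."""
    root, size = ET.Element("museum"), 0
    for record in records:
        element = record_to_xml(record)
        element_size = len(ET.tostring(element))
        # Start a new file when adding this record would exceed the budget.
        if len(root) > 0 and size + element_size > max_bytes:
            yield ET.tostring(root)
            root, size = ET.Element("museum"), 0
        root.append(element)
        size += element_size
    if len(root) > 0:
        yield ET.tostring(root)


# Made-up sample records; only the museum codes come from the slides.
records = [
    {"museum": "AGO_", "id": "AGO_.1", "title": "Sample A"},
    {"museum": "AGO_", "id": "AGO_.2", "title": "Sample B"},
    {"museum": "WAC_", "id": "WAC_.1", "title": "Sample C"},
]
files = {
    museum: list(split_by_size(group, max_bytes=60))
    for museum, group in split_by_museum(records).items()
}
```

With the small `max_bytes` used here, the two AGO_ records land in separate XML documents while WAC_ gets a single one, mirroring the "multiple XML files per museum" outcome.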

Page 13: InterLib(-Related) Activities at SDSC/DICE

Alternative System Architectures

[Architecture diagram. Labels:]

• AMICO metadata server
• * eXcelon
• * Oracle 8i
• * DB2
• HPSS
• SRB
• Fileserver
• 180GB RAID (two instances)
• Data Server (two instances)
• 100Mbit Ethernet
• HPSS
• DB2

Page 14: InterLib(-Related) Activities at SDSC/DICE

Current catalog metadata count (per museum)

Page 15: InterLib(-Related) Activities at SDSC/DICE

Average tiff size in MB (per museum)

Page 16: InterLib(-Related) Activities at SDSC/DICE

Excelon Metadata Layout

[Layout diagram. Labels:]

• XMLStore
• Machine1, Machine2
• Museum directories: Museum1, Museum2, Museum-n
• File1.xml, File2.xml
• Binder doc.xml
• XQL Query
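
To illustrate querying a store laid out as museum directories holding several XML files, here is a minimal sketch. It uses the limited XPath subset of Python's `xml.etree.ElementTree` as a stand-in for XQL, does not model the binder document, and assumes an invented in-memory layout, element names, and sample content.

```python
import xml.etree.ElementTree as ET

# Hypothetical store layout: museum directory -> file name -> XML text.
xml_store = {
    "Museum1": {
        "File1.xml": "<museum><object id='a1'><title>Sample A</title></object></museum>",
        "File2.xml": "<museum><object id='a2'><title>Sample B</title></object></museum>",
    },
    "Museum2": {
        "File1.xml": "<museum><object id='b1'><title>Sample A</title></object></museum>",
    },
}


def query_store(store, path):
    """Run one path expression over every file; collect (museum, file, element)."""
    hits = []
    for museum, files in sorted(store.items()):
        for file_name, text in sorted(files.items()):
            root = ET.fromstring(text)
            for element in root.findall(path):
                hits.append((museum, file_name, element))
    return hits


# Select objects whose <title> child has exactly this text.
hits = query_store(xml_store, "./object[title='Sample A']")
ids = [element.get("id") for _, _, element in hits]
```

The point of the sketch is that one query fans out over all per-museum files and the results are gathered into a single answer.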

Page 17: InterLib(-Related) Activities at SDSC/DICE

MIX: Mediation of Information Using XML ... for the AMICO CDL Prototype

[Architecture diagram. Labels:]

• BBQ Interface (slide carousel interface)
• XMAS query
• XML doc
• MIXm engine (two instances)
• View based on AMICO DTD
• Wrapper
• MARC Database
• AMICO XML Database (two instances)
• SRB/MCAT
• HPSS
• Request for image (X.509)
• tif file

XMAS: XML Matching and Structuring query language
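
The mediator idea (a wrapper exports a non-XML source as XML, and the mediator answers queries against one integrated view) can be sketched as below. This is not XMAS and not the MIX API: the function names, element names, and sample records are hypothetical, and a plain keyword filter stands in for a real query language.

```python
import xml.etree.ElementTree as ET


def marc_wrapper():
    """Hypothetical wrapper: export MARC-like records as XML."""
    # 001 is the MARC control number; 245 $a is the title proper.
    marc_records = [{"001": "m1", "245a": "Sample Title One"}]
    root = ET.Element("records")
    for record in marc_records:
        item = ET.SubElement(root, "record", id=record["001"])
        ET.SubElement(item, "title").text = record["245a"]
    return root


def amico_source():
    """Hypothetical AMICO XML source; element names are invented."""
    return ET.fromstring(
        "<amico><object id='a1'><name>Sample Title Two</name></object></amico>"
    )


# Each source: (label, loader, path to records, tag holding the title).
SOURCES = [
    ("MARC", marc_wrapper, "record", "title"),
    ("AMICO", amico_source, "object", "name"),
]


def mediated_view(keyword):
    """Build one integrated <view> of works whose title contains keyword."""
    view = ET.Element("view")
    for label, load, record_path, title_tag in SOURCES:
        for record in load().findall(record_path):
            title = record.findtext(title_tag, default="")
            if keyword.lower() in title.lower():
                work = ET.SubElement(view, "work", source=label, id=record.get("id"))
                work.text = title
    return view


view = mediated_view("title")
print(ET.tostring(view, encoding="unicode"))
```

The client sees a single XML view and never needs to know which underlying source a given work came from, beyond the `source` attribute added here for illustration.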

Page 18: InterLib(-Related) Activities at SDSC/DICE

SDSC/DICE Discussion Topics

• ADL: caching of HPSS data
• ADEPT access to ADL for CDL testbed: SRB?
• “Union Catalog”:
  • AMICO DTD <=XMAS=> MARC
• SDLIP access to SRB/MCAT and MIX
• Use of GINF (Stanford)
• ...