San Diego Supercomputer Center
National Partnership for Advanced Computational Infrastructure
InterLib(-Related) Activities at SDSC/DICE
Bertram Ludaescher
• IBM HPSS (Storage/Archival, e.g. ADL)
• SDSC SRB/(E)MCAT (Data Handling/Information Discovery)
• AMICO Image Collection (CDL Testbed)
  • Excelon as XML Data Server
• MIX: Mediation of Information using XML (with DB-Lab UCSD)
HPSS, SRB, MCAT
• HPSS: Storage/Archival of large datasets
  • (UCB, UCSB, Stanford)
• SRB/(E)MCAT: Data Handling/Information Discovery
  • transparent access to remote storage
  • replication
  • containers for a large number of small items
  • caching
  • authorization
  • proxy operation support (filtering, data subsetting)
  • usage of security infrastructure (GSI)
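The "containers for a large number of small items" feature is easiest to see in miniature. The sketch below assumes a simple layout (one byte blob plus an offset/length index per item); the `Container` class and its methods are hypothetical and do not reproduce SRB's actual container format. The usual motivation is that archival storage such as HPSS handles a few large files more efficiently than many small ones.

```python
from typing import Dict, Tuple


class Container:
    """Hypothetical container: many small items stored in one large byte blob."""

    def __init__(self) -> None:
        self._blob = bytearray()                      # concatenated item payloads
        self._index: Dict[str, Tuple[int, int]] = {}  # item name -> (offset, length)

    def put(self, name: str, data: bytes) -> None:
        """Append an item to the blob and record its offset and length."""
        if name in self._index:
            raise KeyError(f"duplicate item name: {name}")
        self._index[name] = (len(self._blob), len(data))
        self._blob.extend(data)

    def get(self, name: str) -> bytes:
        """Return one item by slicing the blob at its recorded offset."""
        offset, length = self._index[name]
        return bytes(self._blob[offset:offset + length])

    def to_bytes(self) -> bytes:
        """The single object that would be written to archival storage."""
        return bytes(self._blob)

    def __len__(self) -> int:
        return len(self._index)


# Usage: pack two small (hypothetical) metadata records, then read one back.
box = Container()
box.put("a.met", b"title=First")
box.put("b.met", b"title=Second")
second = box.get("b.met")
```

Reading one item back needs only its index entry and a byte-range read of the container, so the archive stores one large object instead of thousands of tiny files.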
SRB Interface
[Figure: SRB architecture diagram. Components shown: two Applications, three SRB Servers, SRB Master, SRB Agent, MCAT. MCAT metadata schemas shown: MCATCore, DublinCore, EcoCore]
Managing Metadata: EMCAT
• Extensible Meta Data Catalog - EMCAT
  • Exploits dependencies & relationships (m:n, tc, <=>, …)
  • T-Language - Markup, Filter & Presentation
  • Meta Data Repository (Object-, System-, Collection-level)
  • Based on Kernel Meta Meta Data
  • Extensible
  • Uniform Access and Federation interface
  • Metadata exchange Interface Protocol
• MAPS - Meta data Attribute Presentation Structure
  • query, update and result structures
  • Close to Z39.50
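To illustrate what "extensible" can mean for a catalog like EMCAT, here is a minimal sketch in which attribute schemas are registered at run time and objects are queried by schema-qualified attributes. The schema names MCATCore and DublinCore come from the SRB Interface slide and "title"/"creator" are real Dublin Core elements; the MCATCore attribute list, the object names, the `Catalog` class, and its methods are all hypothetical and do not reproduce the EMCAT or MAPS interfaces.

```python
from collections import defaultdict
from typing import Dict, List, Set


class Catalog:
    """Hypothetical extensible catalog: schemas are attribute sets added at run time."""

    def __init__(self) -> None:
        self.schemas: Dict[str, Set[str]] = {}
        # object name -> {"Schema.attribute": value}
        self.meta: Dict[str, Dict[str, str]] = defaultdict(dict)

    def register_schema(self, schema: str, attributes: List[str]) -> None:
        """Extend the catalog with a new schema without touching existing objects."""
        self.schemas[schema] = set(attributes)

    def annotate(self, obj: str, schema: str, attr: str, value: str) -> None:
        """Attach a value, rejecting attributes the schema does not define."""
        if attr not in self.schemas.get(schema, set()):
            raise ValueError(f"{schema}.{attr} is not a registered attribute")
        self.meta[obj][f"{schema}.{attr}"] = value

    def query(self, conditions: Dict[str, str]) -> List[str]:
        """Names of objects whose attributes match every condition exactly."""
        return sorted(
            obj for obj, attrs in self.meta.items()
            if all(attrs.get(key) == value for key, value in conditions.items())
        )


cat = Catalog()
cat.register_schema("MCATCore", ["owner", "size"])      # attribute list is illustrative
cat.register_schema("DublinCore", ["title", "creator"])
cat.annotate("img-001", "DublinCore", "creator", "Artist A")
cat.annotate("img-001", "MCATCore", "owner", "amico")
cat.annotate("img-002", "DublinCore", "creator", "Artist B")
matches = cat.query({"DublinCore.creator": "Artist A"})
```

Qualifying each attribute with its schema name keeps a core schema and community schemas side by side without name clashes.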
SRB/MCAT Future
• Performance Improvements and Consolidation
• Delayed Action Manager - mirror, cronjobs
• Support for Methods
• Handling Very Large Data sets - partitions
• More Drivers - Sybase, NTFS, LDAP
• Extensible MCAT
• Language Support - Perl, Fortran
http://www.npaci.edu/DICE/SRB
The AMICO Digital Library Project
http://www.amico.org
http://www.npaci.edu/DICE/AMICO
Art Museum Image Consortium
Richard Marciano et al.
55,146 objects 750 MB
53,763 thumbnail images 319 MB
57,609 full tiff images 180 GB
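The totals above imply per-item averages, which put the later per-museum "Average tiff size in MB" chart in context. The arithmetic below assumes decimal units (1 GB = 1000 MB, 1 MB = 1000 KB) and reads "750 MB" as the size of the object records; both are assumptions, not statements from the slide.

```python
# Totals from the AMICO slide. Decimal units and the reading of "750 MB"
# as the size of the object records are assumptions.
objects, objects_mb = 55_146, 750
thumbnails, thumbnails_mb = 53_763, 319
tiffs, tiffs_gb = 57_609, 180

avg_object_kb = objects_mb * 1000 / objects            # about 13.6 KB per object record
avg_thumbnail_kb = thumbnails_mb * 1000 / thumbnails   # about 5.9 KB per thumbnail
avg_tiff_mb = tiffs_gb * 1000 / tiffs                  # about 3.1 MB per full TIFF
```

Under these assumptions a full TIFF is roughly 500 times the size of a thumbnail, which is why the TIFFs dominate storage while the metadata and thumbnails stay small.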
AMICO Consortium of 26 (now 31) museums
• AGO_ Art Gallery of Ontario
• AIC_ Art Institute of Chicago
• AKAG Albright-Knox Art Gallery, Buffalo, NY
• ASIA Asia Society
• BMFA Boston Museum of Fine Arts
• CCP_ Center for Creative Photography, U. Arizona
• CMA_ The Cleveland Museum of Art
• DMCC Davis Museum and Cultural Center, Wellesley College, MA
• FASF Fine Arts Museums of San Francisco
• GEH_ George Eastman House, Rochester, NY
• JPGM J. Paul Getty Museum, Los Angeles, CA
• LACM Los Angeles County Museum of Art
• LOC_ Library of Congress
• MACM Musée d'art contemporain de Montréal
• MBAM Musée des beaux-arts de Montréal
• MCAS Museum of Contemporary Art, San Diego
• MIA_ The Minneapolis Institute of Arts
• MMA_ The Metropolitan Museum of Art
• NGC_ National Gallery of Canada, Ottawa/Ontario
• NMAA National Museum of American Art, Smithsonian Institution
• PMA_ Philadelphia Museum of Art
• SFMO San Francisco Museum of Modern Art
• SJMA San Jose Museum of Art
• TFC_ The Frick Collection, NY
• WAC_ Walker Art Center, Minneapolis, MN
• WMAA Whitney Museum of American Art, NY
Raw Metadata Structure
- catdata: 8 files
    16,604 year1.d990429
    14,430 year1.d990512
    22,938 year1.d990520
    54,303 year1.d990627
        15 year1.d990708
    54,298 year1.d990731
        93 year1.d990806
       657 year1.d990813
- tiffmetadata: 23 files
     2963 AGO_.tiffmetadata.txt
     1016 AIC_.tiffmetadata.txt
      894 AKAG.tiffmetadata.txt
      187 ASIA.tiffmetadata.txt
     7591 BMFA.tiffmetadata.txt
      401 CCP_.tiffmetadata.txt
     1455 CMA_.tiffmetadata.txt
       56 DCMC.tiffmetadata.txt
      470 DMCC.tiffmetadata.txt
    10141 FASF.tiffmetadata.txt
     2137 GEH_.tiffmetadata.txt
     1459 JPGM.tiffmetadata.txt
     1013 LACM.tiffmetadata.txt
    20654 LOC_.tiffmetadata.txt
       86 MACM.tiffmetadata.txt
       50 MBAM.tiffmetadata.txt
       31 MCAS.tiffmetadata.txt
     1440 MIA_.tiffmetadata.txt
      550 MMA_.tiffmetadata.txt
     1507 NGC_.tiffmetadata.txt
     1416 NMAA.tiffmetadata.txt
      154 PMA_.tiffmetadata.txt
      158 SFMO.tiffmetadata.txt
       86 SJMA.tiffmetadata.txt
       68 Such.tiffmetadata.txt
      396 WAC_.tiffmetadata.txt
    37069 replacements.txt
    57499 replacements2.txt
- thumbmeta: 52,689 files
    AGO_.1016.25_thum.met*
    AGO_.1016.32_thum.met*
    AGO_.1016.39_thum.met*
    …...
    WAC_.994C_thum.met
    WAC_.996C_thum.met
    WAC_.998C_thum.met
    WAC_.99C_thum.met*
    WMAA.1557_56_thum.met
    WMAA.31_426_thum.met
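The thumbmeta names above appear to follow a pattern: a four-character museum code, a dot, an object identifier, and the suffix `_thum.met`, with the trailing `*` looking like the executable marker that `ls -F` appends. Assuming that pattern holds for all 52,689 files (only a few are listed), a sketch for grouping them by museum follows; the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed naming pattern: <4-character museum code>.<object id>_thum.met,
# optionally followed by '*' (the executable marker that `ls -F` appends).
THUMB_RE = re.compile(r"^(?P<museum>[A-Z_]{4})\.(?P<object>.+)_thum\.met\*?$")


def parse_thumb_name(name: str) -> Tuple[str, str]:
    """Split a thumbmeta file name into (museum code, object id)."""
    match = THUMB_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return match.group("museum"), match.group("object")


def count_by_museum(names: Iterable[str]) -> Counter:
    """Number of thumbmeta files per museum code."""
    return Counter(parse_thumb_name(n)[0] for n in names)


sample = ["AGO_.1016.25_thum.met*", "AGO_.1016.32_thum.met*", "WMAA.31_426_thum.met"]
per_museum = count_by_museum(sample)
```

Raising on an unexpected name, rather than skipping it, surfaces files that break the assumed convention before they reach the merge step.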
AMICO Metadata Conversion Steps
• Tape Read
• Merge “raw” metadata files:
  - catdata (8 files)
  - tiffmetadata (23 files)
  - thumbmeta (52,689 files)
• Consolidated metadata files: 1 catdata, 1 tiffmetadata, 1 thumbmeta
• Convert to XML: 3 XML files (1 catdata, 1 tiffmetadata, 1 thumbmeta)
• Split-by-museums: 1 XML file per museum
  - Split-by-file size: multiple XML files per museum
  - Split-by-machines: multiple museum XML files per machine
• eXcelon Dump&Load Utility
• eXcelon Data Server (three instances shown)
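The convert and split steps above can be sketched as plain functions. This is a minimal illustration under stated assumptions, not the project's conversion code: each merged record is assumed to be a dict with a `museum` field, the `<records>`/`<record>` element names are hypothetical, and split-by-machines is shown as a greedy largest-file-first balancing rule, which is one plausible policy rather than the documented one.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Iterable, List

Record = Dict[str, str]  # hypothetical: one merged record, field name -> value


def record_to_xml(rec: Record) -> bytes:
    """Serialize one record as <record><field>value</field>...</record>."""
    el = ET.Element("record")
    for field, value in rec.items():
        ET.SubElement(el, field).text = value
    return ET.tostring(el, encoding="unicode").encode("utf-8")


def split_by_museum(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Group records by their (assumed) 'museum' field."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        groups[rec["museum"]].append(rec)
    return dict(groups)


def _wrap(pieces: List[bytes]) -> bytes:
    return b"<records>" + b"".join(pieces) + b"</records>"


def split_by_size(records: List[Record], max_bytes: int) -> List[bytes]:
    """Pack records into <records> documents whose payload is at most max_bytes.

    A single record larger than max_bytes still gets its own document.
    """
    chunks: List[bytes] = []
    current: List[bytes] = []
    size = 0
    for rec in records:
        piece = record_to_xml(rec)
        if current and size + len(piece) > max_bytes:
            chunks.append(_wrap(current))
            current, size = [], 0
        current.append(piece)
        size += len(piece)
    if current:
        chunks.append(_wrap(current))
    return chunks


def split_by_machines(file_sizes: Dict[str, int], machines: List[str]) -> Dict[str, List[str]]:
    """Greedy balancing: largest file first, onto the currently least-loaded machine."""
    load = {m: 0 for m in machines}
    plan: Dict[str, List[str]] = {m: [] for m in machines}
    for name, size in sorted(file_sizes.items(), key=lambda kv: -kv[1]):
        target = min(machines, key=lambda m: load[m])
        plan[target].append(name)
        load[target] += size
    return plan
```

Splitting by size keeps each document small enough for a bulk loader, while splitting by machine spreads the load across data servers; the two can be combined by first splitting by size and then balancing the resulting files.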
Alternative System Architectures
[Figure: alternative architectures for the AMICO metadata server. Candidate metadata servers: eXcelon, Oracle 8i, DB2. Other components shown: HPSS, SRB, file server with 180 GB RAID, data servers, 100 Mbit Ethernet]
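The architecture slide labels a 180 GB RAID file server and 100 Mbit Ethernet. A back-of-envelope calculation shows why placing the TIFFs matters: even at full nominal bandwidth, moving the whole 180 GB collection over that link takes about 4 hours. The `efficiency` parameter is a hypothetical knob for real-world throughput below the nominal rate; decimal units are assumed.

```python
def transfer_hours(size_gb: float, link_mbit_s: float, efficiency: float = 1.0) -> float:
    """Hours to move size_gb (decimal gigabytes) over a link of link_mbit_s.

    efficiency is a hypothetical fraction of nominal bandwidth actually achieved.
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbit_s * 1e6 * efficiency)
    return seconds / 3600


best_case = transfer_hours(180, 100)         # full TIFF set at nominal 100 Mbit/s
half_speed = transfer_hours(180, 100, 0.5)   # same transfer at 50% efficiency
```

The nominal figure is a lower bound; protocol overhead and shared traffic only lengthen it.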
Current catalog metadata count (per museum)
Average tiff size in MB (per museum)
Excelon Metadata Layout
[Figure: eXcelon metadata layout. Components shown: XMLStore; museum directories (Museum1, Museum2, ..., Museum-n); Machine1, Machine2; File1.xml, File2.xml; binder document doc.xml; XQL query]
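One way to read the binder doc.xml and XQL Query labels is that a binder document lists which XML files (on which machine) make up the collection, and a query is evaluated over each listed file. The sketch below follows that reading, which is an assumption. It uses ElementTree's limited XPath syntax as a stand-in for XQL, and the binder format, the in-memory `store`, and the sample records are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical binder document: records which file (on which machine) holds what.
BINDER = """<binder>
  <file machine="Machine1" museum="Museum1">File1.xml</file>
  <file machine="Machine2" museum="Museum2">File2.xml</file>
</binder>"""


def query_all(binder_xml: str, store: Dict[str, str], path: str) -> List[ET.Element]:
    """Evaluate one ElementTree path expression over every file the binder lists."""
    hits: List[ET.Element] = []
    for entry in ET.fromstring(binder_xml).findall("file"):
        doc = ET.fromstring(store[entry.text])   # store maps file name -> XML text
        hits.extend(doc.findall(path))
    return hits


# Hypothetical stand-in for the files held by the data servers.
store = {
    "File1.xml": "<records><record><creator>A</creator><title>t1</title></record>"
                 "<record><creator>B</creator><title>t2</title></record></records>",
    "File2.xml": "<records><record><creator>A</creator><title>t3</title></record></records>",
}
titles = [r.findtext("title") for r in query_all(BINDER, store, ".//record[creator='A']")]
```

Keeping the file list in one binder document means a query does not need to know how the museum files were split across machines.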
MIX: Mediation of Information Using XML … for the AMICO CDL Prototype
[Figure: MIX mediation architecture. Components shown: XMAS query; view based on AMICO DTD; MIXm engine (shown twice); wrapper; MARC database; XML doc; AMICO XML database (shown twice); SRB/MCAT; HPSS; request for image (X.509); tif file; BBQ interface (slide carousel interface)]
XMAS: XML Matching and Structuring query language
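In the mediation setup on this slide, a wrapper turns a source such as a MARC database into XML, and the MIXm engine answers queries against a view based on the AMICO DTD. The sketch below shows the wrapper/mediator pattern only. It does not implement XMAS (a plain Python filter stands in for the query), and the target element names are assumptions rather than the AMICO DTD. MARC tags 100 (main entry, personal name) and 245 (title statement) are real, but real MARC records also carry indicators and subfields, which this sketch ignores.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

# Hypothetical mapping from MARC 21 tags to element names in an AMICO-style view.
MARC_TO_VIEW = {"100": "creator", "245": "title"}


def marc_wrapper(records: Iterable[Dict[str, str]]) -> List[ET.Element]:
    """Wrap simplified MARC records (tag -> value) as view-level <object> elements."""
    out = []
    for rec in records:
        el = ET.Element("object", source="MARC")
        for tag, name in MARC_TO_VIEW.items():
            if tag in rec:
                ET.SubElement(el, name).text = rec[tag]
        out.append(el)
    return out


def amico_wrapper(xml_text: str) -> List[ET.Element]:
    """Assume the AMICO source already exposes <object> elements in the view."""
    out = []
    for el in ET.fromstring(xml_text).findall("object"):
        el.set("source", "AMICO")
        out.append(el)
    return out


def mediate(sources: Iterable[List[ET.Element]], creator: str) -> List[ET.Element]:
    """Answer one view-level query (objects by creator) over all wrapped sources."""
    return [el for src in sources for el in src if el.findtext("creator") == creator]


marc = marc_wrapper([{"100": "A", "245": "Book on A"}, {"100": "B", "245": "Other"}])
amico = amico_wrapper("<objects><object><creator>A</creator><title>Painting</title></object></objects>")
hits = mediate([marc, amico], "A")
```

Because each wrapper emits the same view, the mediator's query logic stays independent of the source formats. This is the idea behind the "AMICO DTD <=XMAS=> MARC" union-catalog topic.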
SDSC/DICE Discussion Topics
• ADL: caching of HPSS data
• ADEPT access to ADL for CDL testbed: SRB?
• “Union Catalog”:
  • AMICO DTD <=XMAS=> MARC
• SDLIP access to SRB/MCAT and MIX
• Use of GINF (Stanford)
• ...