TRANSCRIPT
4/5/05 University of Southern California
As If By Magic
Presentation to the Coalition for Networked Information
April 5, 2005
Presented by:
Mike Pearce, Deputy CIO, University of Southern California
Judy Truelson, ISD Reference Coordinator, University of Southern California
Sam Gustman, CTO, Survivors of the Shoah Visual History Foundation
Supporting the Expectations
Users expect to have simple, seamless service from server and storage environments, through the network, and onto the desktop whenever, wherever, and however.
Why shouldn’t I be able to … ?
Google and Amazon make it look easy.
The Landscape
Current examples of divergent data archives at USC:
Survivors of the Shoah Visual History Foundation (VHF)
Southern California Earthquake Center (SCEC)
Geographical Information System (GIS)
InscriptiFact
Storage Case Study Overview
Data in NATIVE Form: Source
VHF: Interviews with Video/Audio
GIS: Boater Trip Route Maps
SCEC: Earthquake Sensors
InscriptiFact: Dead Sea Scrolls
Data in NATIVE Form: Format
VHF: Beta SP Videotape
GIS: Paper
SCEC: Binary Files on Disk and Tape
InscriptiFact: TIFF, JPEG, "MRSID" on Disk/Tape (MRSID: JPEG-like format unique to InscriptiFact)
Data in NATIVE Form: Total Size
VHF: 6.0 PB compressed or 18.0 PB uncompressed
GIS: 30 Megabytes
SCEC: 3 GB of Earthquake Sensor Data per Timestep (1/1000 second); 3 GB of Geographical Data
InscriptiFact: 100,000 Images: 1.75 PB
Processed Stored Data: Format
VHF: VHS Tape, MPEG1, JPEG on Disk/Tape
GIS: Binary/WORD Files on Disk
SCEC: Binary Files on Disk and Tape
InscriptiFact: TIFF, JPEG, "MRSID" on Disk/Tape
Processed Stored Data: Total
VHF: 6.0 PB compressed, or 18.0 PB uncompressed (stored on videotapes, not digitized yet)
GIS: 400 MB
SCEC: 24 TB (data from 30 stations)
InscriptiFact: 100,000 Images: 1.75 PB
Current Storage
VHF: 3 TB local disk cache at USC; 20 TB disk cache at Shoah; 200 TB on tape robot at Shoah
GIS: 1 TB Network Storage; 18 desktop PC drives; Total Size Unknown
SCEC: 4 TB disk at USC; 9 TB on disk at SDSC; 16 TB on tape at SDSC
InscriptiFact: 5,200 images stored in 87 GB on disk; 218 GB available on disk; total collection also stored on tape
Storage Needed: In 2 Years
VHF: 200 TB disk cache: access; 200 TB disk cache or tape: backup
GIS: 1 Terabyte this year; size doubles each year
SCEC: 24 TB disk at USC: now; 1 Petabyte: next 6 months
InscriptiFact: 260 GB on disk
Storage Needed: In 5 Years
VHF: 6.0 PB compressed or 18.0 PB uncompressed; multiple mirrors for access
GIS: 16 Terabytes
SCEC: 1 Petabyte
InscriptiFact: 1.75 PB on disk
Usage
VHF: Viewing of non-analyzed data
GIS: Boating Traffic Predictions based on analyzing of data in Excel
SCEC: Earthquake simulations, both in data files and in display format, using the 'TeraShake' Video simulator
InscriptiFact: Viewing of non-analyzed data
Units and abbreviations
1 GB = 1 Gigabyte = 1024 Megabytes
1 TB = 1 Terabyte = 1024 Gigabytes
1 PB = 1 Petabyte = 1024 Terabytes
VHF = Visual History Foundation (Sam Gustman)
GIS = Geographical Information System (John Wilson)
SCEC = Southern California Earthquake Center (Tom Jordan)
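The GIS figures in the storage overview (1 Terabyte this year, size doubling each year, 16 Terabytes in 5 years) can be reproduced with a few lines of arithmetic. The following is a minimal Python sketch, not part of the presentation: the function names are hypothetical, and it assumes that "this year" counts as year 1, so that year 5 is four doublings later. Unit conversion uses the slide's own definition of 1 PB = 1024 TB.

```python
# Binary unit factor as defined on the slide: 1 PB = 1024 TB.
TB_PER_PB = 1024


def projected_size_tb(start_tb: float, doublings: int) -> float:
    """Size in TB after a number of yearly doublings (hypothetical helper)."""
    return start_tb * 2 ** doublings


def tb_to_pb(size_tb: float) -> float:
    """Convert terabytes to petabytes using the slide's 1024-based units."""
    return size_tb / TB_PER_PB


if __name__ == "__main__":
    # Assumption: "this year" is year 1, so year N is N - 1 doublings later.
    for year in range(1, 6):
        size = projected_size_tb(1, year - 1)
        print(f"Year {year}: {size:g} TB ({tb_to_pb(size):.4f} PB)")
```

Under that assumption, year 5 comes out at 16 TB, which agrees with the 5-year GIS estimate in the overview.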
The Landscape (continued)
Technology issues:
Diverse data use / Formats
Access restrictions
Storage / Media / Data patterns / Devices
Standards / Metadata
Applications / Tools
Network Limitations
Redundancy / Petabyte world
Computing … Power … Air conditioning
The Landscape (continued)
Sociological issues:
Human and financial resource limitations
How data is organized – who does it, to what level, and how?
User expectations, requirements, and specialized applications
Expertise and skills
Building consensus around strategies with Faculty Advisory Groups, Federation Management Standards, etc.
Competing priorities – rock vs. sand:
Security – protecting what, for (and from) whom
Administrative applications CRM …
Bridging the Gap
From Today’s Landscape to Meeting Expectations, as if by magic
Playing Together
Requirements
Social Issues
Method to Access and/or Analyze: Applications (Java, C++, Fortran, Browsers, XML)
"Grid": Federated Tools (InCommon, Globus, Identity Management)
Schema of the Bits: Metadata, what's in the bits (SRB, Documentum, DSpace)
Managing the Bits: Storage System (Databases, Files, Archives, Preservation)
The Way Forward – Realizing the Miracle
Heterogeneous data sources are here to stay
Key is to help data sources play together – Middleware (SRB, DSpace, etc.)
Build flexible strategies with multiple access methods while being unobtrusive to the scholar
Pick what you can do – recognize you can’t do it all
Shared vision and direction
Raw storage – preservation, disaster recovery, etc.
Toolsets
Portals
Federated Identity Management
Survivors of the Shoah Visual History Foundation collected 51,659 interviews, recorded testimonies on 232,554 beta tapes, amassed 116,277 hours of testimony, recorded 32,064 miles of videotape, and conducted interviews in 56 countries and in 32 languages.
Interviewers: 2,373
Videographers: 1,045
Volunteers: 2,000
Number of Interviews by Country
Argentina 737
Australia 2,483
Austria 184
Belarus 253
Belgium 207
Bolivia 22
Bosnia & Herzegovina 43
Brazil 567
Bulgaria 636
Canada 2,844
Chile 65
Colombia 14
Costa Rica 19
Republic of Croatia 330
Czech Republic 567
Denmark 95
Dominican Republic 1
Ecuador 9
Estonia 9
Finland 1
France 1,675
Georgia 6
Germany 677
Greece 303
Hungary 730
Ireland 5
Israel 8,474
Italy 419
Japan 1
Kazakhstan 6
Latvia 77
Lithuania 133
Macedonia 9
Mexico 112
Moldova 283
Netherlands 1,051
New Zealand 55
Norway 34
Peru 2
Poland 1,429
Portugal 2
Romania 147
Russia 712
Slovakia 665
Slovenia 12
South Africa 254
Spain 6
Sweden 331
Switzerland 68
Ukraine 3,434
United Kingdom 873
United States 19,843
Uruguay 126
Uzbekistan 25
Venezuela 227
Yugoslavia 361
Zimbabwe 6
Total: 51,659 testimonies, 56 countries
Testimony Language Statistics
Bulgarian 622
Croatian 394
Czech 574
Danish 72
Dutch 1,080
English 24,947
Flemish 5
French 1,886
German 933
Greek 303
Hebrew 6,317
Hungarian 1,285
Italian 432
Japanese 1
Ladino 10
Latvian 6
Lithuanian 45
Macedonian 9
Norwegian 34
Polish 1,571
Portuguese 563
Romani 28
Romanian 123
Russian 7,011
Serbian 374
Sign 4 (3 American & 1 Hungarian)
Slovak 574
Slovenian 6
Spanish 1,350
Swedish 269
Ukrainian 318
Yiddish 513
Total: 51,659 testimonies, 32 languages
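The per-language counts can be tallied to cross-check the slide's totals. The following is a minimal Python sketch, not part of the presentation; it assumes the "Sign" entry counts as 4 testimonies (3 American and 1 Hungarian). With that assumption the listed figures cover 32 languages and sum to 51,659, the same number of interviews quoted for the collection as a whole.

```python
# Testimony counts per language, copied from the slide.
# Assumption: "Sign (3 American & 1 Hungarian)" is counted as 4 testimonies.
LANGUAGE_COUNTS = {
    "Bulgarian": 622, "Croatian": 394, "Czech": 574, "Danish": 72,
    "Dutch": 1080, "English": 24947, "Flemish": 5, "French": 1886,
    "German": 933, "Greek": 303, "Hebrew": 6317, "Hungarian": 1285,
    "Italian": 432, "Japanese": 1, "Ladino": 10, "Latvian": 6,
    "Lithuanian": 45, "Macedonian": 9, "Norwegian": 34, "Polish": 1571,
    "Portuguese": 563, "Romani": 28, "Romanian": 123, "Russian": 7011,
    "Serbian": 374, "Sign": 4, "Slovak": 574, "Slovenian": 6,
    "Spanish": 1350, "Swedish": 269, "Ukrainian": 318, "Yiddish": 513,
}

language_count = len(LANGUAGE_COUNTS)
testimony_total = sum(LANGUAGE_COUNTS.values())

if __name__ == "__main__":
    print(f"{language_count} languages, {testimony_total:,} testimonies")
```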
Current Shoah Foundation Architecture
Diagram labels:
Foundation Central Database
Physical Tape Management Database
Production Database
Physical Tape Management System
Production Scheduling and Tracking Systems
Cataloging and Pre-Interview Questionnaire Data Entry Station
ADIC 400 Terabyte Tape Archive with AIT-2 Media
Digitization and Tape Copy Station
Foundation Public Database and Web Server and Web Services
Beta-SP Taped Testimony
End-User Workstation
MPEG-1
MPEG-1 and JPEGs
Production and Survivor Data
Interview Details, Release Status, Videographer/Interviewer Data
Media Tracking Data
Subset of data for public use
Video
Metadata
Long Term Architecture Goal
Diagram labels:
Universities: U. of Michigan, Yale, Rice, USC
Distributed caches over Internet2 with 1-20 Terabytes of capacity
Locations: Australia, West Coast, MidWest, East Coast
Multiple 200 TB mirrors (disk arrays) in different geographic locations
Foundation Central Database
Physical Tape Management Database
Production Database
Physical Tape Management System
Production Scheduling and Tracking Systems
Cataloging and Pre-Interview Questionnaire Data Entry Station
6 Petabyte Tape Archive with broadcast quality preservation copy wrapped in AAF
Digitization and Tape Copy Station
Foundation Public Database and Web Server and Web Services
Beta-SP Taped Testimony
End-User Workstation
90 Mbps JPEG2000
MPEG and JPEGs
Production and Survivor Data
Interview Details, Release Status, Videographer/Interviewer Data
Media Tracking Data
Subset of data for public use
Video
Metadata
Access Grant Efforts with Universities
• MALACH: Multilingual Access to Large Spoken Archives
– $7.5 Million Large ITR from the NSF
– Univ. of Maryland, Johns Hopkins, IBM, Charles University, Univ. of West Bohemia
– http://www.clsp.jhu.edu/research/malach/
• Mellon Grant – $1 Million
– USC, Rice and Yale (University of Michigan added to project on separate funding)
Scholarly Uses of the Shoah Foundation’s Visual History Archive (VHA) in Research and Instructional Programs
Mellon Grant Project Implementation
(September 2003 – September 2005)
Mellon Grant Project (June 2004-August 2004)
USC set out to:
Assess the usefulness of the archive for instruction and research
Assess the implications of digitally accessible video in various areas of study
USC Mellon Grant Commitments
Tier I: USC faculty integrate the archive into coursework
Tier II: USC faculty integrate the archive into scholarly research
Tier III: USC makes the archive available on campus computers to interested researchers outside the first two tiers
Findings: Classroom observations have proven valuable in facilitating first-time use of the VHA
The Holocaust as visual culture in terms of interviewing techniques, camera effects, videographic methods
Other Conclusions from survey data
Classroom integration of the VHA requires support beyond self-help materials in order to maximize use of the archive
USC’s Ongoing Commitments
Expansion of the local cache of testimonies to reflect the Shoah Foundation’s entire collection
Addition of Internet2 partner universities: the University of Michigan has joined the collaboration.
Active promotion of the Shoah Foundation VHA through presentations, bookmarks, posters, and instruction and training
Contact Information
Moderator: Lynn O’Leary Archer, Senior Associate Dean and Executive Director, Resources & Services/Archival Research Center, Director of USC Libraries, [email protected]
Presenters:
Mike Pearce, Deputy CIO, USC, [email protected]
Sam Gustman, Chief Technical Officer, Survivors of the Shoah Visual History Foundation, [email protected]
Judy Truelson, ISD Reference Coordinator, USC, [email protected]