Arctos/TACC Collaboration
Chris Jordan
Texas Advanced Computing Center
Arctos: A 15-year history
MVZ: 1995 - Hired Stan Blum to develop relational data model (following modeling by Assoc. Systematic Collections).
MVZ: 1997 - Hired John Wieczorek to implement model (desktop application) using Sybase and Versata. Partial implementation (e.g., no loans).
UAM: 1998-2000 - John W. migrated mammal data to Oracle, set up Versata.
UAM: 2002 - Dusty McDonald replaced Versata with ColdFusion, implemented full model (first web-based instance, aka Arctos).
MSB: 2003 - Joined Arctos at UAM (first multi-hosting instance).
MVZ and MCZ: 2005-2007 - Implemented separate instances of Arctos at Berkeley and Harvard (MVZ: first Postgres, then Oracle).
MVZ: 2009 - Moved hosting of data to Alaska (Virtual Private Database version).
Major repositories using the Arctos database (34 collections of specimens or observations, 1.3M records)
TACC and TeraGrid
10-year history of Research Cyberinfrastructure
Supercomputing, Visualization and Storage
Supported by NSF to provide research resources
TACC expansion of Data-focused support
1 Petabyte dedicated online disk
10 Petabytes offline archive
National network of replication resources
Data Diversity at TACC
Image Collections (Natural History, Art, etc.)
Structured Data (Economics, Public Health)
BioMolecular Data (DNA, RNAseq, etc.)
Physical Sciences/Simulation Data
Geographic data (Climate, Disaster Preparedness)
Integrated Infrastructure Supports Diverse Collections
Arctos is… a versatile online collections management system
Cataloged Items (ID, attributes, parts, etc.; batch uploading, downloading, editing; encumbrances)
Localities & Collecting Events (mapping, media, history)
Transactions (loans, accessions, borrows, permits; email reminders)
Usage (publications, projects, sponsors, GenBank)
Curatorial (object tracking, parts, condition, relations, etc.)
Determination history (identification, georef, attributes)
Breadth of Data in Arctos
Fish, amphibians, reptiles, mammals, birds and bird eggs/nests, plants, arthropods, fossils, molluscs
Specimens and observations
Media (images, audio)
Publications, fieldnotes
Arctos constantly evolving to incorporate new kinds of data, e.g.:
Better representation of non-publication documents (fieldnotes, correspondence)
Cultural collections (art, anthropology...)
Nearly all that is known about an object (or observation) can be included in Arctos.
Arctos/TACC Partnership
Arctos hosts web/database resources
TACC hosts media collections
Images, Recordings, etc.
Simple workflows for automated generation of thumbnails, JPG versions, MP3s, OCR
Replication policies automatically replicate to various storage locations
Images directly served from TACC to browsers
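The derivative-generation and replication steps on this slide can be pictured with a small planning sketch. It is only an illustration, not the actual TACC workflow: the file extensions, the derivative naming scheme, the storage site names, and the function names (`plan_derivatives`, `plan_replication`, `plan_ingest`) are all assumptions made for the example, and it only plans the work rather than converting or copying any files.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical extension groups; the real ingest rules are not given in the slides.
IMAGE_EXTS = {".tif", ".tiff", ".dng", ".png"}
AUDIO_EXTS = {".wav", ".aiff", ".flac"}

# Hypothetical names standing in for the "various storage locations".
REPLICA_SITES = ["primary-disk", "offline-archive", "remote-replica"]


def plan_derivatives(original: str) -> list[str]:
    """Return the derivative paths to generate for one original media file."""
    path = PurePosixPath(original)
    ext = path.suffix.lower()
    stem = path.with_suffix("")  # same path with the extension removed
    if ext in IMAGE_EXTS:
        # Assumed naming: thumbnail, web JPG, and OCR text for every image.
        return [f"{stem}_thumb.jpg", f"{stem}.jpg", f"{stem}_ocr.txt"]
    if ext in AUDIO_EXTS:
        return [f"{stem}.mp3"]
    return []  # unknown types get no derivatives


def plan_replication(files: list[str], sites: list[str] = REPLICA_SITES) -> list[tuple[str, str]]:
    """Pair every file with every storage site it should be copied to."""
    return [(site, f) for f in files for site in sites]


def plan_ingest(original: str) -> list[tuple[str, str]]:
    """Plan replication of the original together with all of its derivatives."""
    return plan_replication([original] + plan_derivatives(original))


if __name__ == "__main__":
    for site, f in plan_ingest("uam/herb/sheet_0001.tif"):
        print(site, f)
```

Keeping the plan separate from the execution means the same rules could drive a dry run, a batch job, or an audit of what should exist at each site.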
Arctos/TACC History
Initial work with UAF Herbarium in 2008
Brought on MVZ Collections in 2009
Ongoing work on web audio, OCR
New collections from UAF, UNM, others
Currently >300,000 digital objects under management
Support >100,000 downloads of original scans each year
Advantages for Collections
Lower cost and management overhead
Highly reliable, large-scale infrastructure
No scalability issues
Longer-term partnerships promote technical collaboration to add capabilities over time
Provides built-in “Data Management Plan”
Long-Term Sustainability
TACC plan is to be a permanent research data resource
Arctos will evolve over time but the collections have permanent value
Infrastructure foundation is stable
Agency funding future is uncertain
Develop diverse funding sources and models to support robust, long-term operation
Ongoing Efforts
Expansion of storage resources at TACC (~10PB online disk)
Greater engagement in data management activities
Working with BRC, ADBC awards and associated data
iPlant Data/Genetic resources – link to specimen records?
Thanks for your Time
Steffi Ickert-Bond, UAF
Gordon Jarrell, UNM
Carla Cicero, MVZ
Michelle Koo, MVZ
Dusty McDonald, Arctos