1/22/2006
Columbia University
Notable New Yorkers …
Project objective
– Digitally preserve oral history recordings on a variety of media, along with paper or electronic transcripts of interviews
Notable New Yorkers include
» John B. Oakes
» Bennett Cerf
» Kenneth Clark
» Ed Koch
» Mary Lasker
» Frances Perkins
» Mamie Clark
Source files
– Analog cassettes
• 5” and 7” reels
– Lengths: 7 minutes to in excess of 67 hours
– Typed transcripts
• 100 pages to maximum 5,566 pages
– Binders
– MS Word files
Recorded media to be re-recorded to a common digital file format
– Create preservation masters and access copies
» Preservation masters: 96kHz/24bit WAV files
» Access copies: 44.1kHz/16bit WAV files
» Web-accessible copies: .mp3 format
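The storage cost of these formats follows directly from sample rate and bit depth. A minimal sketch of that arithmetic, assuming uncompressed PCM; the channel count is an assumption, since the slides do not say whether the transfers were mono or stereo, and `pcm_bytes_per_hour` is a hypothetical helper name:

```python
def pcm_bytes_per_hour(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed PCM payload for one hour of audio (WAV header excluded)."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * 3600


# Channel count is an assumption: both mono and stereo are shown.
for label, rate, bits in [
    ("preservation master", 96_000, 24),
    ("access copy", 44_100, 16),
]:
    for channels in (1, 2):
        gigabytes = pcm_bytes_per_hour(rate, bits, channels) / 1e9
        print(f"{label}: {channels} channel(s), {gigabytes:.2f} GB per hour")
```

Under these assumptions a preservation master needs roughly three times the space of an access copy with the same channel count.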
Transcripts
– OCLC to provide to CUL
• Archival TIFF images
• Re-keyed files
• Electronically formatted interviews
Physical condition assessment
– Interview transcript quality varied
• Moderate to extensive written revisions & edits
– Audio files varied in quality & format
• None provided for preview
• Known life-expectancies & recovery issues for media (audio cassette, reel-to-reel, etc.)
• No evidence of ‘sticky shed’ or vinegar syndrome identified by library
Recommended workflow
– OCLC to pick up materials at CUL
– OCLC to deliver audio masters to Safe Sound Archive
– OCLC to return material to CUL
Digitization specifications
– OCLC delivered bi-tonal text pages
• 1-bit TIFF
• Group IV TIFF compression
• 600 dpi
– XML mark-up of full-text
• OCLC facilitated
– Re-keying with 99.95% accuracy
– TEI-Lite DTD mark-up
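To put the 99.95% re-keying figure in concrete terms, the sketch below converts an accuracy rate into a maximum error count. It assumes accuracy is measured per character, which the slides do not state, and the characters-per-page figure is purely illustrative; `max_rekeying_errors` is a hypothetical helper name.

```python
from fractions import Fraction


def max_rekeying_errors(total_characters: int, accuracy: str = "0.9995") -> Fraction:
    """Largest error count consistent with the stated accuracy rate.

    Exact fractions avoid floating-point drift in the subtraction.
    """
    return total_characters * (1 - Fraction(accuracy))


# Per-character measurement is an assumption, not stated in the source.
print(max_rekeying_errors(10_000))  # errors allowed per 10,000 characters

# Hypothetical page density, for illustration only.
ASSUMED_CHARS_PER_PAGE = 1_500
longest_transcript_pages = 5_566  # maximum page count given in the source
allowed = max_rekeying_errors(longest_transcript_pages * ASSUMED_CHARS_PER_PAGE)
print(float(allowed))
```

Even at this accuracy level, the longest transcript could still contain errors numbering in the thousands, which is why the separate quality-control pass matters.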
Audio reformatting specifications & workflow
– Transmittal and trafficking policies
• Database customization
• Material log-in
• Cross-checking against packing list
Evaluation & engineers’ notes
– Compact audio cassettes played back on Nakamichi cassette decks w/mechanical & electrical playback alignments
– Digitization w/Prism Sound analog to digital converters
• Output 96kHz/24bit preservation master & 44.1kHz access copy concurrently
Reel-to-reel tapes played back on Studer tape decks
– With mechanical & electrical playback alignments
– Digitization w/Prism Sound analog to digital converters
• Output 96kHz/24bit preservation master & 44.1kHz access copy concurrently
CUL elected semi-monitored approach
– Up to 3 originals transferred simultaneously
– SSA guarantees 1:1 representation
– Monitoring alternative: 100% monitored
Quality Control Procedures
• OCLC performs 100% quality assurance of all original TIFF images
– Reviewed for completeness, alignment, illumination regularity, and detail consistency throughout image
» Software allows 1:1 viewing, zooming @ 100%+ and reduced full-page view
Quality Control Procedures
• SSA expects zero returns or rework
– Achieved via database automation of file naming (reduces human error)
– Each file on delivery medium opened & auditioned on separate computer to assure recoverability
– Each file checked @ beginning & end to assure completeness
Quality Control Procedures
• SSA expects zero returns or rework
– And spot-checked throughout for consistency
– File names checked by separate person for naming and contents against original recordings
» Person also proofs file headers
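Part of the header proofing and the beginning/end completeness check could in principle be scripted. The sketch below is not SSA's actual procedure, which relied on people opening and auditioning files; it is a minimal illustration using Python's standard `wave` module, and `check_wav` and its parameters are hypothetical names.

```python
import wave


def check_wav(path, expected_rate=96_000, expected_bits=24, probe_frames=1024):
    """Return a list of problems found in a WAV file (empty list means none)."""
    problems = []
    try:
        with wave.open(path, "rb") as wav:
            # Proof the header against the expected specification.
            if wav.getframerate() != expected_rate:
                problems.append(f"sample rate is {wav.getframerate()} Hz")
            if wav.getsampwidth() * 8 != expected_bits:
                problems.append(f"bit depth is {wav.getsampwidth() * 8}")

            total = wav.getnframes()
            if total == 0:
                problems.append("no audio frames")
            else:
                # Read a short run of frames at the beginning and at the end.
                frame_size = wav.getsampwidth() * wav.getnchannels()
                count = min(probe_frames, total)
                head = wav.readframes(count)
                wav.setpos(total - count)
                tail = wav.readframes(count)
                wanted = count * frame_size
                if len(head) != wanted or len(tail) != wanted:
                    problems.append("audio data shorter than header declares")
    except (wave.Error, EOFError) as exc:
        problems.append(f"unreadable WAV file: {exc}")
    return problems
```

Calling `check_wav("master.wav")` with the defaults tests against the preservation-master specification; passing `expected_rate=44_100, expected_bits=16` tests an access copy. A script like this cannot replace listening, since it says nothing about whether the audio content matches the original recording.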
File Delivery
– Text files: XML as per CUL specifications
• Challenges encountered due to variations in the interview format/transcript format (some memoir style, some strict Q&A format)
– Audio files:
• Portable hard drives
– FTP impractical due to file size
» Est. 270GB = 1,000 hours connect time
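The 270GB and 1,000-hour estimate implies a particular sustained transfer rate. The short sketch below backs that rate out of the two figures; it assumes decimal gigabytes (10^9 bytes) and ignores protocol overhead, and `implied_throughput_kbps` is a hypothetical helper name.

```python
def implied_throughput_kbps(total_bytes: float, connect_hours: float) -> float:
    """Sustained rate in kilobits per second needed to move the data in the given time."""
    seconds = connect_hours * 3600
    return total_bytes * 8 / seconds / 1000


# Figures from the estimate above; decimal gigabytes are an assumption.
rate = implied_throughput_kbps(270e9, 1_000)
print(f"{rate:.0f} kbit/s sustained")
```

At a rate of that order, 1,000 hours is about six weeks of continuous connection, which makes shipping portable hard drives the more practical choice.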
Digital Archive
– CUL files delivered for uploading to the OCLC Digital Archive repository
Client acceptance of deliverables
– OCLC standard: 30-day image file retention
– Planned 30-day customer acceptance period