Methods for Knowledge Management & Digital Preservation. The Theory and Practice of Digital History. Carl A. Young, M.A. in waiting, 1 December 2009.
“Think. Learn. Succeed.” Ver 1.2
Methods for Knowledge Management & Digital Preservation
The Theory and Practice of Digital History
Carl A. Young, M.A. in waiting, 1 December 2009
Project Overview
Challenge
Resource- and skill-constrained historians and archivists require efficient methods for capturing, analyzing, and sharing original artifacts.
Methodology
• Multi-phase project
• Develop a low-cost process for digitally archiving documents
• Store them in a standards-based data storage platform
• Set the conditions to scale with future phases
• Create a collaborative, accessible, online digital repository that fully leverages the optionality of the digital domain
Major Phases
• Phase I: Prototyping
• Phase II: Capture
• Phase III: Web Access
• Phase IV: Initial Expansion
• Phase V: Infinite Expansion
Phase I: Prototype
Completed in November 2009, this phase established a usable, affordable methodology for project development by prototyping the capture and conversion of an original artifact for testing and exploration purposes.
Phase I: Prototype (cont.)
Demonstration
• Original: digital camera, .JPG file format, 2 MB
• Treatment with Photoshop: .TIFF, 29 MB
• Adobe conversion: .PDF, 278 KB
Time elapsed: photo <1 min; treatment ~3 min; conversion <1 min
Phase I: Prototype (cont.)
Process Flowchart
Phase II: Capture
Completed in November 2009, this phase performed and documented a low-budget document capture, artifact preservation, and conversion to a distributable format: a historic text is extracted from the original document, archived, and presented to the user in both the original capture format (.jpg or .tiff) and distributable formats (.pdf and .xml), together with an evaluation of optical character recognition (OCR) and transcription requirements.
Phase II: Capture (cont.)
Image Treatment
• Select Area
• Image > Adjustments > Curves (“Digitization”)
  – Channel: RGB
  – Output: 203
  – Input: 160
• Filter > Blur
  – Smart Blur: Radius 100, Threshold 100, Quality High, Mode Normal
  – Surface Blur: Radius 100, Threshold 25
  – Surface Blur (if needed): Radius 100, Threshold 25
  – Lens Blur: Shape Octagon, Radius 5, Blade Curve 50, Rotation 300, Brightness 10, Threshold 75, Noise 3, Distribution Uniform
• Select > Color Range
  – Shadows, no Invert
• Select > Modify > Expand: 2
• Cut
• File > New*: Width 1600, Height 2500, Resolution 300, Color Mode RGB 16-bit
• Paste, Flatten, clean up as needed, and Save As .TIFF
* Recommend saving as a preset.
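Two of the image-treatment settings above can be checked numerically. The sketch below is illustrative and rests on stated assumptions: the Curves step is treated as a single control point at Input 160 / Output 203 on the 0 to 255 scale of Photoshop's Curves dialog and approximated with straight segments (Photoshop draws a smooth curve through its points, so its exact output will differ somewhat), and the File > New Width and Height are assumed to be in pixels. The function names `curves_lut`, `apply_curves`, and `canvas_size_bytes` are hypothetical.

```python
"""Numeric checks for the Phase II image-treatment settings (illustrative sketch)."""


def curves_lut(input_level: int = 160, output_level: int = 203, max_level: int = 255) -> list[int]:
    """Lookup table for a one-point Curves adjustment.

    Assumption: straight segments through (0, 0), (input_level, output_level)
    and (max_level, max_level). Photoshop uses a smooth curve instead.
    """
    lut = []
    for x in range(max_level + 1):
        if x <= input_level:
            y = x * output_level / input_level
        else:
            y = output_level + (x - input_level) * (max_level - output_level) / (max_level - input_level)
        lut.append(round(y))
    return lut


def apply_curves(pixels: list[tuple[int, int, int]], lut: list[int]) -> list[tuple[int, ...]]:
    """Apply the same table to every channel, as with Channel: RGB."""
    return [tuple(lut[c] for c in px) for px in pixels]


def canvas_size_bytes(width_px: int, height_px: int, channels: int = 3, bits_per_channel: int = 16) -> int:
    """Uncompressed pixel data of a flattened image (no alpha, layers, or metadata).

    Assumption: width and height are pixel counts.
    """
    return width_px * height_px * channels * bits_per_channel // 8


if __name__ == "__main__":
    lut = curves_lut()
    print(lut[160])  # the control point: input 160 maps to output 203
    print(apply_curves([(120, 160, 230)], lut))
    size = canvas_size_bytes(1600, 2500)
    print(size, round(size / 2**20, 1))                # bytes, MiB
    print(round(1600 / 300, 2), round(2500 / 300, 2))  # print size in inches at 300 ppi
```

Mapping input 160 to 203 brightens the midtones and pushes light paper tones toward white. Under the pixel assumption, the new canvas holds 24,000,000 bytes of pixel data (about 22.9 MiB), close to the ~23 MB per-page .tiff figure in the labor estimates, and prints at roughly 5.33 × 8.33 inches at 300 ppi.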
Phase II: Capture (cont.)
OCR and Transcription Demo
Time elapsed: OCR <1 min; transcription ~5 min
OCR / Transcription
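The lesson that OCR is not reliable can be quantified by comparing the raw OCR output for a page with its corrected transcription. The slides do not name a metric; the sketch below swaps in character error rate (edit distance divided by the length of the reference transcription), a common OCR measure. The function names and sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text: str, transcription: str) -> float:
    """Edits needed to turn OCR output into the transcription, per reference character."""
    if not transcription:
        raise ValueError("transcription must not be empty")
    return levenshtein(ocr_text, transcription) / len(transcription)


if __name__ == "__main__":
    # Hypothetical sample: a typical OCR confusion of "rn" for "m".
    print(character_error_rate("Militiarnan's Guide", "Militiaman's Guide"))
```

Tracking this rate per page would show whether the ~5 minutes of error correction per page is driven by a few bad pages or by uniformly noisy OCR.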
Phase II: Capture (cont.)
TEI Demo
Time elapsed: preliminary data ~45 min; per page ~5 min
Look at UVA’s TEI How To.
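One reading of these timings is that the ~45 minutes of preliminary data covers document-level metadata entered once, while each page then adds only its own text. A minimal sketch of that split, using Python's standard `xml.etree.ElementTree` and the TEI P5 namespace, follows. The element choice (a `teiHeader` with title, publication, and source statements, then one `<pb/>` and `<p>` per page) is a simplified assumption rather than the UVA guide's template, and `build_tei` and its sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI elements without a prefix


def tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace ({namespace}tag notation)."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei(title: str, source_note: str, pages: list[str]) -> str:
    root = ET.Element(tei("TEI"))

    # Document-level header: entered once per artifact (the "preliminary data").
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Unpublished prototype transcription."  # placeholder
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = source_note

    # Per-page work: a page-break marker followed by the corrected transcription.
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for number, page_text in enumerate(pages, start=1):
        ET.SubElement(body, tei("pb"), n=str(number))
        ET.SubElement(body, tei("p")).text = page_text

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_tei("Militiaman’s Guide", "Printed original, fair condition.",
                    ["Text of page one.", "Text of page two."]))
```

Real TEI encoding of a printed manual would usually add divisions, headings, and lists; the point here is only that the header cost is fixed per document while the body grows page by page.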
Phase II: Capture (cont.)
Methodology Flow Chart
Phase II: Capture (cont.)
Labor Estimates
Militiaman’s Guide: 155 pages total, type text, fair condition
Total: 40 hours (optimal) / 5 GB
Per-Page Estimates
• Photography: ~30 sec, 2.5 MB @ 5 Mpxl
• .tiff conversion: ~3 min, 23 MB
• .pdf conversion: ~1 min, 300 KB
• OCR: ~45 sec
• Error correction/transcription: ~5 min
• TEI: ~5 min (~45 min overhead)
Case Estimates (times in h:mm unless noted)
• Photography: ~1:15, ~400 MB
• .tiff conversion: ~7:45, 3.5 GB
• .pdf conversion: ~2:30, 50 MB
• OCR: ~2 hrs
• Error correction/transcription: ~13 hrs
• TEI: ~14 hrs
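The case estimates follow from multiplying each per-page figure by the 155 pages and adding the one-time TEI overhead. The sketch below reproduces that arithmetic; the dictionary layout and the function name `case_estimate` are hypothetical, and storage is summed in decimal units (1 GB = 1,000 MB).

```python
# Per-page estimates from the slide: (minutes of labor, megabytes of storage).
PER_PAGE = {
    "photography": (0.5, 2.5),
    "tiff_conversion": (3.0, 23.0),
    "pdf_conversion": (1.0, 0.3),
    "ocr": (0.75, 0.0),
    "transcription": (5.0, 0.0),
    "tei": (5.0, 0.0),
}
# One-time setup per case (minutes), independent of page count.
OVERHEAD_MIN = {"tei": 45.0}


def case_estimate(pages: int) -> tuple[dict[str, tuple[float, float]], float, float]:
    """Return per-task (minutes, MB), total hours, and total GB for one case."""
    per_task = {
        task: (pages * minutes + OVERHEAD_MIN.get(task, 0.0), pages * mb)
        for task, (minutes, mb) in PER_PAGE.items()
    }
    total_hours = sum(minutes for minutes, _ in per_task.values()) / 60
    total_gb = sum(mb for _, mb in per_task.values()) / 1000
    return per_task, total_hours, total_gb


if __name__ == "__main__":
    per_task, hours, gb = case_estimate(155)  # Militiaman's Guide
    for task, (minutes, mb) in per_task.items():
        print(f"{task:16s} {minutes / 60:5.1f} h {mb:7.1f} MB")
    print(f"total            {hours:5.1f} h {gb:7.2f} GB")
```

For 155 pages this gives roughly 40.1 hours and 4.0 GB: the hour total matches the slide's 40-hour figure, while the slide budgets a rounder 5 GB for storage.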
Equipment Baseline
• Consumer-grade HP 5 Mpxl digital camera ($125)
• Slightly above consumer-grade PC ($1,100)
  – 4 GB RAM
  – 1 GB VRAM
  – 500 GB SATA HD
  – Dual screens
• Consumer software ($600)
  – Adobe Creative Suite 3
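Two quick figures follow from this baseline: the one-time equipment cost, and roughly how many volumes the size of the Militiaman’s Guide the 500 GB drive could hold at the ~5 GB per-case storage estimate. The sketch ignores space used by the operating system, software, and any backup copies, and its names are hypothetical.

```python
EQUIPMENT_USD = {
    "HP 5 Mpxl digital camera": 125,
    "PC": 1_100,
    "Adobe Creative Suite 3": 600,
}
DRIVE_GB = 500
CASE_STORAGE_GB = 5  # per-case estimate for the 155-page Militiaman's Guide


def baseline_summary(equipment: dict[str, int], drive_gb: int, case_gb: int) -> tuple[int, int]:
    """Total equipment cost and the number of whole cases that fit on the drive."""
    return sum(equipment.values()), drive_gb // case_gb


if __name__ == "__main__":
    cost, cases = baseline_summary(EQUIPMENT_USD, DRIVE_GB, CASE_STORAGE_GB)
    print(f"${cost:,} of equipment; about {cases} similar cases per drive")
```

That works out to $1,825 of equipment and on the order of 100 comparable volumes per drive before any allowance for backups.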
Lessons Learned
• Use a tripod/mount
• Use consistent lighting
• Safely flatten pages as much as possible
• Use a mounting frame
• Use the highest resolution available
• OCR is NOT reliable
• Need an efficient method for TEI
Phase III: Web-Access
This phase is the subject of this grant funding request. A team of professional developers will construct a suitable multimedia database for storing and accessing original artifact captures, distributable .pdf versions, and XML-based data and metadata derived from the original. The team will also develop a working prototype website to access the data. Data archiving and disaster recovery will be fundamental to this phase. Successful conclusion of this phase will yield a working version 1.0 available for release and continued development.
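One way to picture this storage requirement is a relational schema in which each artifact has several derivatives per page: the original capture, the distributable .pdf, and the TEI/XML data. The sketch below uses Python's built-in `sqlite3` module and stores file paths rather than the files themselves. The table names, columns, paths, and the choice of SQLite are assumptions for illustration, not the design the developers would deliver.

```python
import sqlite3

SCHEMA = """
CREATE TABLE artifact (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    page_count INTEGER,
    condition  TEXT
);
CREATE TABLE derivative (
    id          INTEGER PRIMARY KEY,
    artifact_id INTEGER NOT NULL REFERENCES artifact(id),
    page        INTEGER,
    kind        TEXT NOT NULL CHECK (kind IN ('capture', 'pdf', 'tei_xml')),
    media_type  TEXT NOT NULL,  -- e.g. image/tiff, application/pdf
    path        TEXT NOT NULL,  -- location in archival storage
    size_bytes  INTEGER
);
"""


def open_repository(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def add_artifact(conn: sqlite3.Connection, title: str, page_count: int, condition: str) -> int:
    cur = conn.execute(
        "INSERT INTO artifact (title, page_count, condition) VALUES (?, ?, ?)",
        (title, page_count, condition),
    )
    return cur.lastrowid


def add_derivative(conn, artifact_id, page, kind, media_type, path, size_bytes=None) -> None:
    conn.execute(
        "INSERT INTO derivative (artifact_id, page, kind, media_type, path, size_bytes)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (artifact_id, page, kind, media_type, path, size_bytes),
    )


def derivatives_for(conn, artifact_id, page):
    """All stored versions of one page, ordered by kind."""
    return conn.execute(
        "SELECT kind, media_type, path FROM derivative"
        " WHERE artifact_id = ? AND page = ? ORDER BY kind",
        (artifact_id, page),
    ).fetchall()


if __name__ == "__main__":
    conn = open_repository()
    guide = add_artifact(conn, "Militiaman's Guide", 155, "fair")
    add_derivative(conn, guide, 1, "capture", "image/tiff", "guide/0001.tiff")
    add_derivative(conn, guide, 1, "pdf", "application/pdf", "guide/0001.pdf")
    add_derivative(conn, guide, 1, "tei_xml", "application/tei+xml", "guide/0001.xml")
    print(derivatives_for(conn, guide, 1))
```

Keeping large .tiff captures on disk and only their paths in the database keeps the database small and lets the archive files be backed up independently, which bears on the disaster-recovery goal.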
Phase III: Web-Access (cont.)
Flow Chart
Phase III: Web-Access (cont.)
Work Breakdown Structure
• Database Development
• Prototype Evaluation
• Prototype Web Development
• Alpha: Test & Mod
• Beta: Test & Mod
• RC1: Test & Mod
• v1.0
• Documentation
• Disaster Recovery Testing
Estimated Cost: $52,000
Phase III: Web-Access (cont.)
Project Gantt Chart
Phase IV: Initial Expansion
Beyond the scope of this grant request, this phase seeks to develop partnerships and data shares across multiple institutions with similar projects in development or production. The level of participation will directly determine the scale of this phase. The minimal costs are expected to be shared across participating institutions.
Phase IV: Initial Expansion (cont.)
Work Breakdown Structure
• Conduct Lifecycle Management Review
• Documentation
• Disaster Recovery Testing
• Publish Methodology
• Find Partners
• Large Scale Capture
• Leverage v1.0
• Update Code and Processes
Estimated Cost: $8,000
Phase V: Infinite Expansion
Optionally, and depending on the success of the earlier phases, this phase will greatly expand collaborative efforts by potentially making this capability available to amateur and resource-constrained archivists and historians, providing a standards-based methodology and data capture technique along with a collaborative platform for sharing the data once stored. This aspect of the final phase will be limited only by technology maintenance and scalability costs.
Phase V: Infinite Expansion (cont.)
Work Breakdown Structure
• Publish Updated Methodology
• Publish Membership Schema
• Open Data Models
• Leverage Current Version
• Conduct Lifecycle Management Review
• Documentation
• Disaster Recovery Testing
• Release New Version(s)
Estimated Cost: $82,000
Summary
Project Summary
• 5-Phase Approach
• “How-To”
  – Digitization
  – TEI
  – Manage the project
• Sets the stage
  – Broad/ambitious goals and plan
  – Manageable pieces
  – Flexible optionality
Grant Request / Funding Summary
• Phase III support: $51,733.33
  – Prototype Validation
  – Database Development
  – Web Development
  – Hosting
  – Disaster Recovery
• Phase IV and V templates
  – Future expansion as desired
  – Flexible planning
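Collecting the estimated costs from the work breakdown structures (including the Phase I and Phase II figures in the backup slides) gives an overall planning picture. The Phase III estimate of $52,000 appears to be the rounded form of the $51,733.33 requested. The dictionary below is simply a transcription of those slide figures, and `cost_rollup` is a hypothetical helper.

```python
PHASE_COST_USD = {
    "Phase I: Prototype": 5_000,
    "Phase II: Capture": 2_000,
    "Phase III: Web-Access": 52_000,  # grant request: $51,733.33
    "Phase IV: Initial Expansion": 8_000,
    "Phase V: Infinite Expansion": 82_000,
}


def cost_rollup(costs: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Total estimated cost and each phase's share of it, in percent."""
    total = sum(costs.values())
    return total, {phase: 100 * cost / total for phase, cost in costs.items()}


if __name__ == "__main__":
    total, shares = cost_rollup(PHASE_COST_USD)
    print(f"Total: ${total:,}")
    for phase, share in shares.items():
        print(f"{phase:28s} {share:5.1f}%")
```

The roll-up comes to $149,000 across all five phases, with Phase III at about 35% and the optional Phase V at about 55% of the total.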
QUESTIONS
CONCLUSION
Dead Guy Quote
Man had always assumed that he was more intelligent than dolphins because he had achieved so much... the wheel, New York, wars, and so on, whilst all the dolphins had ever done was muck about in the water having a good time. But conversely the dolphins believed themselves to be more intelligent than man for precisely the same reasons.
- Douglas Adams
BACKUP
Phase I: Prototype (cont.)
Work Breakdown Structure
• Image Capture
• Image Preservation
• Image Manipulation
• Database Development
• TEI Process Development
• Data Development
• Static Web-Page Prototyping
• Documentation
• Disaster Recovery Testing
Estimated Cost: $5,000
Phase I: Prototype (cont.)
Gantt Chart
Phase II: Capture (cont.)
Work Breakdown Structure
• Image Capture
• TEI
• Prototype Database Input
• Documentation
• Disaster Recovery Testing
Estimated Cost: $2,000
Phase II: Capture (cont.)
Gantt Chart