Digital Image Scanning
Instructor: Geri Bunker Ingram
[email protected]
An Infopeople Workshop
August 2005
This Workshop Is Brought to You By the Infopeople Project
Infopeople is a federally funded grant project supported by the California State Library. It provides a wide variety of training to California libraries. Infopeople workshops are offered around the state and are open for registration on a first-come, first-served basis.
For a complete list of workshops, and for other information about the Project, go to the Infopeople website at infopeople.org.
August 2005 - Digital Image Scanning - Geri Ingram, DiMeMa, Inc.
Introductions
  Please tell us again, your
    Name
    Library
    Position and role within the Local History Project
  Are there lingering questions from yesterday that we should discuss?
Learning Objectives
  Understand the basics of digital imaging
  Interpret and evaluate scanning specifications for your project
  Differentiate among different technology options for various formats
  Understand the significance of standard metadata
  Learn about display and navigation options.
Agenda
  9:00-10:30   What is Digitization?
  10:30-10:45  BREAK
  10:45-12:00  Technology Infrastructure
  12:00-1:00   LUNCH
  1:00-2:30    Metadata, Rights, Quality Control
  2:30-2:45    BREAK
  2:45-4:00    Effectiveness
What is Digitization?
What is Digitization?
  Process of digitization
    resolution
    bit depth
  The Local History Project guidelines and standards
  The implications of these standards
A Refresher on Scanning
  Scanning takes reflected light signals and changes them to digital data.
  The resulting digitized image is made up of a grid of individual picture elements.
  Picture elements are known as “pixels”.
  Pixels are made up of binary digits (bits).
  Each bit is expressed as either “0” or “1”.
Controlling Spatial Detail and Accuracy
  Two settings affect spatial detail and accuracy during the scanning process:
    bit depth (the number of bits used to define each pixel)
    resolution (the number of samples, or dots, taken per inch)
Adjusting Bit Depth
  Binary digit (bit) “depth”
    number of bits used to define each pixel
    the greater the bit depth, the greater the number of tones (grays or color)
  Black and white (bitonal) = 1 bit per pixel
  Grayscale = 8 bits per pixel (256 shades of gray)
  Color = 24 bits per pixel (16.7 million color tones)
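The tone counts on this slide follow from the fact that n bits can represent 2^n distinct values. A minimal sketch of that arithmetic (the function name `tones` is ours, chosen for illustration):

```python
def tones(bit_depth: int) -> int:
    """Number of distinct tones one pixel can take at the given bit depth."""
    return 2 ** bit_depth


# The three bit depths named on the slide.
for label, bits in [("bitonal", 1), ("grayscale", 8), ("color", 24)]:
    print(f"{label}: {bits} bits per pixel -> {tones(bits):,} tones")
```

Twenty-four bits give 16,777,216 tones, which the slide rounds to 16.7 million.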
Adjusting Resolution
  Resolution is a sampling rate: how many dots per inch will you scan? E.g., 400 dpi.
  The effect:
    the higher the rate, the smoother the image
    the more it can be magnified before its individual pixels become visible
  High resolution = many dots per inch
  Low resolution = fewer dots per inch
Sometimes Resolution Is Expressed As Absolute Pixel Dimensions
  Pixel dimensions = (dpi x width) x (dpi x height)
  Example: an 8” x 10” image scanned at 400 dpi has pixel dimensions of (400 x 8) x (400 x 10) = 3200 x 4000.
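The pixel-dimension formula can be checked with a few lines of Python. This is only an illustrative sketch; the function name and the choice to round to whole pixels are assumptions, not part of the workshop materials.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return (width, height) in pixels for an original measured in inches."""
    # Each inch of the original is sampled `dpi` times in each direction.
    return (round(dpi * width_in), round(dpi * height_in))


print(pixel_dimensions(8, 10, 400))  # the 8 x 10 inch example at 400 dpi
```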
Storing Your Images
  Very high quality images create very large files
    The higher the resolution, the greater the file size
    The higher the bit depth, the greater the file size
  For the exercise coming up: two different formulas to figure out how much disk space images need
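The exercise formulas themselves are not reproduced in these notes. As a stand-in, the sketch below uses one commonly used estimate for an uncompressed image: total pixels multiplied by bit depth, divided by 8 bits per byte. The function name and the sample values are assumptions for illustration only.

```python
def uncompressed_size_bytes(width_in: float, height_in: float,
                            dpi: int, bit_depth: int) -> int:
    """Estimate the uncompressed size of a scan, in bytes."""
    width_px = round(dpi * width_in)
    height_px = round(dpi * height_in)
    total_bits = width_px * height_px * bit_depth
    return total_bits // 8  # 8 bits per byte


# An 8 x 10 inch original at 400 dpi, at the three bit depths discussed earlier.
for label, bits in [("bitonal", 1), ("grayscale", 8), ("color", 24)]:
    size = uncompressed_size_bytes(8, 10, 400, bits)
    print(f"{label}: {size:,} bytes ({size / 1_000_000:.1f} MB)")
```

Real files will differ somewhat because of headers and any compression applied, so treat the result as a planning figure rather than an exact size.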
Three Or More Files For Every Image
  Master image
    This is one you do not tamper with, and you use a file format that does not lose data when you save it.
  Two derivatives:
    access (service) image
    small (thumbnail) image.
Master Files
  Stored offline:
    it is valuable
    usually too large for common bandwidth
  Not uncommon to have multi-megabyte master images.
  The exception is the JPEG2000 format, which enjoys a progressive display (details later).
Service or Access Images
  By contrast, a common range for the service or access image is 100 to 500 KB.
Thumbnail: The Smallest Access Image
  A thumbnail may be only a few KB, and typically is no larger than about 150-200 pixels on a side.
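To see what "no larger than about 200 pixels on a side" means for a large master, the sketch below scales pixel dimensions down so the longer side fits a limit while keeping the aspect ratio. It only does the arithmetic; it is not how any particular product derives its thumbnails, and the function name and the 200-pixel default are assumptions.

```python
def thumbnail_size(width_px: int, height_px: int, max_side: int = 200) -> tuple[int, int]:
    """Scale dimensions so the longer side is at most max_side pixels."""
    longest = max(width_px, height_px)
    if longest <= max_side:
        return (width_px, height_px)  # already small enough
    # Scale both sides by the same factor to preserve the aspect ratio.
    return (max(1, round(width_px * max_side / longest)),
            max(1, round(height_px * max_side / longest)))


print(thumbnail_size(3200, 4000))  # a 3200 x 4000 pixel master
```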
For the Local History Project
  Full resolution image and large service image delivered directly to libraries
  Import either of them to CONTENTdm to derive a service image and thumbnail
    Automatic with CONTENTdm software
Keeping Your Master
  Retain on your local system, on the CDs delivered, or in any other manner you like.
  CDL will also receive a copy of both master and derivative.
  Store the master as your “preservation” copy.
  Important to understand the storage implications of your master images
Local History Project Scanning
  A common specification has been developed
  Scanning vendor (will have been) selected
  It is still important to understand the specification and infrastructure issues.
Exercise #1
  Calculating File Sizes for Digital Images
Technology Infrastructure
Technology Infrastructure
  In this unit we will discuss the hardware, software and networking requirements of digital projects.
  We will touch on data storage again briefly and will delve into the question of compression and file formats.
The Local History Project
  Will run on computers located around the state, connected through the Internet.
  The smooth operation of this distributed infrastructure involves not only hardware and software, but also depends upon good communication among people.
All Of This Takes Planning
  All the partners in the project
    including the info tech service providing partners
  Must demonstrate good communication skills and consistently confer with each other
Library Policies
  Security
  Intellectual property
  Policies must be in sync with the info tech provider
    regardless of who that may be
CDL Will Be Providing Access To Your Collections
  They must be able to protect their networks from misuse.
  The end-users must be able to easily access unrestricted material.
Distributed Architecture
  Designed for the Local History Project, it has local libraries feeding material into a central databank
  Fairly sophisticated, and yet divides the labor according to appropriate tasks.
The Local History Project Will Comprise A Set Of Collections
  Each built locally
    Using the CONTENTdm Acquisition Station software, and stored on the CONTENTdm server.
    The materials will be copied to the CDL
  Part of collaborative program for both access and preservation
Local History Project Offers At Least Three Outlets For Collections
  The way your metadata will get into the CDL is through the use of the CONTENTdm export function.
  A customized export/import mechanism writes your metadata in the METS format
  You will be trained in its use during your CONTENTdm training session
Managing The Digital Files
  Because your scanning will have been done by a vendor, we will not discuss the attributes of scanning software fully.
  But you will need to know something about the various pieces of software in use.
The Processing That Will Be Done For You Includes:
  Scanning: representing a print item as a digital image. E.g., the software that runs your digital camera or your scanner.
  OCR Software: if you have text that you would like made searchable, software such as OmniPage then converts the words in the image to a text file that can be searched.
  Lastly, a Digital Asset Management System (e.g. CONTENTdm) provides a way to organize the image files, make derivatives and add metadata to each image.
CONTENTdm Selected
  High-performance tool
  Easy-to-use interface
  Will scale as the collections grow
    i.e., it will continue to perform well and be manageable even when there are millions of objects
Hardware: From Scanning To Storage
  The lifecycle of collections now includes preservation of the digital image.
  Before scanning hardware or specifications are set
    consider the technical issues for access AND for long-term preservation of the digital image
Sustaining Collections Over Time
  Data needs to be saved and protected at every stage in its life-cycle
  Many ways of accomplishing this are in experimental stages
Preservation Of Digital Files
  Data migration
    e.g. moving files from CD to DVD
  Backup and archiving plans
    e.g. storing files online or on a central backup server
  Disaster recovery plans, for both analog and digital resources
    heaven forbid! The library burns down... what happens to your CDs, your computers?
Preservation Repository Must Also Be Managed
  Sized, weeded, protected and moved
  Because CDL is offering long-term preservation, your scans and metadata must meet the standards set for the repository!
Choosing Among File Formats
  One decision that affects a collection’s accessibility and preservation potential is the format of the files you choose to keep.
Many File Formats (LHDRP is requiring those marked *)
  TIFF*
  JPEG2000
  GIF*
  JPEG*
  PDF
  MrSID: proprietary, wavelet-based compression for progressive display
Choosing among the file formats means you need to understand something about what the file format specification implies.
Compression Used To Reduce File Sizes
  Two kinds: lossy and lossless.
  Lossy: an irrecoverable loss of data, considerable size reductions (JPEG).
  Lossless (JPEG2000 in its lossless mode, and TIFF): no loss of data.
    TIFF: no loss of data, but when the file is stored uncompressed its size is not reduced
    JPEG2000: no loss of data in lossless mode, but can also reduce the size of the file delivered for display, as it is decompressed at the point of display.
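The difference between the two kinds of compression can be demonstrated with Python's standard library. In this sketch, `zlib` stands in for a lossless scheme and a simple bit-masking step stands in for a lossy one. Neither is the actual algorithm used by JPEG or JPEG2000, and the synthetic "image" bytes are an assumption made purely for illustration.

```python
import zlib

# Synthetic 8-bit grayscale data: a smooth, repetitive gradient (assumed sample).
original = bytes((x // 4) % 256 for x in range(4096))

# Lossless: compress, then decompress, and get every byte back.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), "bytes ->", len(compressed), "bytes compressed")
print("lossless round trip identical:", restored == original)

# Lossy stand-in: discard the low 4 bits of each pixel. The discarded
# detail cannot be recovered from the reduced data afterwards.
reduced = bytes(value & 0xF0 for value in original)
print("lossy result identical to original:", reduced == original)
```

This is why the master is kept in a format that does not lose data: a lossless file can always be turned back into the original pixels, while a lossy derivative cannot.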
TIFF: Tagged Image File Format
  TIFF itu-t.7 is a 24-bit storage format in widespread use.
  Useful for both color and bitonal (black & white) images
  Provides a high level of detail. It is used for archival files (masters).
  When compression is used, it should be lossless.
JPEG: Joint Photographic Experts Group/JFIF (JPEG File Interchange Format)
  JPEGs are commonly used in bitmap image editing programs (e.g., Paintshop), in viewers, and, most important for our project, in web browsers
  24-bit, lossy compression format
  Well suited for screen and print presentations.
JPEG2000
  Provides highly detailed views of objects
  Not a proprietary format
    but not all software can handle a JPEG2000 file
    both Photoshop and CONTENTdm have that capability
  To view a file saved as JPEG2000, some products require a browser “plug-in”.
    CONTENTdm does not require one, but has a built-in viewer in the extended server software.
  CDL does not currently support JPEG2000, so for this project, you will not create JPEG2000 files.
GIF: Graphics Interchange Format
  8-bit, lossless compression format
  Well-suited to low resolution screen display
  Often used for thumbnails
  Supported by all major computer platforms and web browsers
PDF: Portable Document Format
  Proprietary (Adobe) format, now de facto standard (is actually several formats)
  All need a plug-in or external application for web display, but that “reader” is free to download.
  Widely used for printing and viewing multi-page documents
A Word About File Naming
  Best practice is to use the standard 8.3 convention, e.g., house178.txt.
  Use lower-case characters only as some operating systems such as Unix are case-sensitive.
  Avoid punctuation characters in filenames altogether.
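The three naming rules above can be turned into a quick check before files are handed over. The sketch below is one possible reading of those rules, not an official project validator: up to 8 lower-case letters or digits, a dot, then up to 3 more. The function name and the optional `allow_underscore` switch are assumptions; the switch exists because names such as page_01.jpg use an underscore as a separator.

```python
import re

# Strict reading of the slide: lower-case letters and digits only, 8.3 layout.
_STRICT = re.compile(r"[a-z0-9]{1,8}\.[a-z0-9]{1,3}")
# Relaxed variant that also tolerates underscores in the base name.
_WITH_UNDERSCORE = re.compile(r"[a-z0-9_]{1,8}\.[a-z0-9]{1,3}")


def is_valid_name(name: str, allow_underscore: bool = False) -> bool:
    """Return True if the filename follows the 8.3, lower-case, no-punctuation rules."""
    pattern = _WITH_UNDERSCORE if allow_underscore else _STRICT
    return pattern.fullmatch(name) is not None


for candidate in ["house178.txt", "House178.txt", "my-house.txt", "verylongname.txt"]:
    print(candidate, is_valid_name(candidate))
```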
File Naming
  Simple: a single image
  Compound: more than one image
    Components need to be named and stored in logical fashion
    E.g., when assembling, page_01.jpg will precede page_02.jpg (alphanumeric sort)
    E.g., when assembling a hierarchy, items need to be stored in logical directories
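The leading zero in page_01.jpg matters because an alphanumeric sort compares names character by character, not by numeric value. A short sketch with made-up page names shows the difference:

```python
# Without zero padding, page 10 sorts ahead of page 2.
unpadded = ["page_1.jpg", "page_2.jpg", "page_10.jpg"]
print(sorted(unpadded))

# With two-digit padding, alphanumeric order matches page order.
padded = [f"page_{number:02d}.jpg" for number in (1, 2, 10)]
print(sorted(padded))
```

If a compound object could run past 99 pages, a wider padding (for example three digits) would be needed to keep the order intact.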
Local History Project Conventions
  Vendor must deliver files named with an appropriate scheme
    that works for your library
    And for the Local History Project
  Exercise will focus on file handling
    File formats, naming and organization
Hardware
  Digital project hardware components will include at minimum
    Servers
    Desktop computers
    Network components
Your CONTENTdm Environment
  Server located and managed remotely for the Local History Project.
  Computer on your desktop
  Network: IT provider uses components (e.g., routers, cables, access points, network interface cards) to connect everything together and to the internet.
Data Storage: Day-to-day, And Over The Long Haul
  As you populate your collections, it is important to back up the workstations and network drives regularly. At the site of the CONTENTdm server, as well as at CDL, servers will also be backed up regularly.
Digital collection servers
  Remember: form follows function.
  Hardware is sized for the project and for the environment, AFTER the software has been chosen.
CONTENTdm Server is Hosted by OCLC
  For LHDRP
    One-year license
    After that, depends on funding; if funded, could be extended
Considerations If You Run A Server
  Processor style and speed
  Minimum RAM
  Minimum online storage
  These variables always depend upon the context of your organization, the operating system environments supported, and the application requirements.
  The minimum requirements for servers in general assure good performance, i.e., you can very rapidly search and retrieve dense data, and display to many concurrent users.
CONTENTdm 4 Minimum Server Requirements
  CPU: Intel Pentium® 4 or greater
  RAM: 512 MB minimum
  Operating Systems:
    Linux, Unix, Sun Solaris™ 8 or higher, Windows 2000/2003
  Dedicated Web server
    (IIS 4.0 or later with Windows®, Apache with UNIX)
Storage For Files
  Both “derivatives” (service images and thumbnails) are kept online
  The archival TIFF is stored offline
The Files Most Commonly Seen As Derivative (Access) Files
  JPEGs averaging 100 K (with most CONTENTdm collections)
  Estimate 500 jpgs will need about 50 MB space to store the access (service, derivative) images
  To size a CONTENTdm server, assume that a 1 GB disk will store 10,000 jpgs for high-quality display
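The sizing rule of thumb above is simple multiplication, sketched here so it can be rerun with your own collection size. The 100 KB average comes from the slide; the function names and the use of decimal units (1 MB = 1,000 KB, 1 GB = 1,000,000 KB) are assumptions.

```python
AVERAGE_JPEG_KB = 100  # average access image size quoted on the slide


def storage_needed_mb(image_count: int, average_kb: int = AVERAGE_JPEG_KB) -> float:
    """Disk space, in MB, needed for the given number of access images."""
    return image_count * average_kb / 1000


def images_per_disk(disk_gb: int, average_kb: int = AVERAGE_JPEG_KB) -> int:
    """How many access images of the average size fit on a disk of disk_gb GB."""
    return (disk_gb * 1_000_000) // average_kb


print(storage_needed_mb(500), "MB for 500 images")
print(images_per_disk(1), "images on a 1 GB disk")
```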
To Populate The Collections, On The Desktop CONTENTdm Requires
Monitor capable of 1024 x 768 resolution
256 MB RAM (512 MB recommended)
Disk capacity to hold images (temporarily) and software
  i.e., 100 MB for installation of the Acquisition Station
Windows 2000 or XP
128 Kbps minimum network connection
A Desktop Wish List: Not Required, But Nice!
A dedicated computer for digitization with:
  A 19” or 21” display monitor
  1 GB RAM (for multimedia)
  3.2 GHz/800 MHz processors optimized for image manipulation
  Graphics processors (up to 128 MB dedicated RAM) for high-quality video, multiple monitors, etc.
High-quality loupes, scales and updated targets
Digitizing Devices: Scanners and Cameras
In this phase of the project, your scanning will be outsourced
But information on scanners and cameras is included here for future reference
We Will Discuss The Primary Types Of These “Capture” Devices
Flatbed scanners
Transparency scanners
Overhead scanners
Wide format scanners
Cameras
  Copy stand cameras
  Camera “backs”
The Flatbed Scanner
Chances are you have one of these in your library (or your home). They handle unbound material up to 11” x 17” in size, and some come with automatic document feeder attachments so that you can stack a document for scanning.
The makes and models vary greatly in cost and quality. Some have transparency adapters too, but if you have a lot of film (slides) to scan, you may look for a specialized scanner just for them.
Transparency Scanner
For transparent material, both negatives and slides, there are many makes and models to choose from, but a commonly used one is made by Nikon.
E.g., Nikon LS-2000 Film Scanner
  36-bit color
  58 MB file size
  20 second scan speed
  2700 dpi resolution
  35mm film strip or slide format
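To see how resolution and bit depth drive file size, here is a rough sketch of the arithmetic for a 35mm frame scanned at 2700 dpi. It assumes the standard 36 x 24 mm frame and an uncompressed file with no header overhead; the function names are illustrative only, and the slide does not say how its 58 MB figure was derived.

```python
MM_PER_INCH = 25.4


def scan_pixels(width_mm, height_mm, dpi):
    """Return (width_px, height_px) for an original scanned at the given dpi."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def uncompressed_mb(width_px, height_px, bits_per_pixel):
    """Return the uncompressed image size in MB (1 MB = 1,000,000 bytes)."""
    return width_px * height_px * bits_per_pixel / 8 / 1_000_000


# Assumed frame size: a standard 35mm frame is 36 x 24 mm.
width_px, height_px = scan_pixels(36, 24, 2700)
print(width_px, height_px)
print(uncompressed_mb(width_px, height_px, 36))  # 36-bit color packed tightly
print(uncompressed_mb(width_px, height_px, 48))  # same samples stored as 16 bits per channel
```

Under these assumptions the frame comes to roughly 3,800 x 2,550 pixels, about 44 MB at 36 bits per pixel and a little under 59 MB when each channel is stored in 16 bits, which is in the neighborhood of the quoted file size.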
Overhead Scanner
If you do a lot of interlibrary loan, you may already own an overhead scanner. It was designed for books and other bound documents, so that the page is protected from touch by the machine.
E.g., Minolta PS 3000 and PS 7000 are widely in use
Cameras
For 3-dimensional items and sometimes for oversize items, cameras are becoming very popular. Discussions on various listservs such as “imagelib” are lively with comparisons of cameras, from the consumer models we carry on our vacations to high-quality professional setups.
E.g., Nikon COOLPIX 3100
  Effective pixels 3.2 million (total pixels: 3.34 million)
Copy Stands
Copy stands are used for long exposures, repeated placement of objects, etc.
An example of a high quality camera and copy stand is the Leica S1 Pro Digital Camera used in the digitization lab at the University of Utah. It is described as:
  Triple linear color CCD line, high-performance full step motor.
  Full scan time is 185 seconds. Viewfinder offers laterally correct image on a focusing screen with a grid.
  Produces file sizes of 75 MB at 36-bit color or 150 MB at 48-bit color.
Camera Specification
Resolution for cameras is often given as the total number of pixels delivered by a device. For example, a camera may be described as ‘x number of megapixels’.
A megapixel is 1,000,000 pixels.
E.g., Canon’s S45 (4.5 Megapixel) maximum resolution: 2272 x 1704 which, if you do the math, is closer to 3.9 megapixels…
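“Doing the math” just means multiplying width by height and dividing by one million. A minimal sketch, with an illustrative function name:

```python
def megapixels(width_px, height_px):
    """Return the pixel count of an image in megapixels (millions of pixels)."""
    return width_px * height_px / 1_000_000


print(round(megapixels(2272, 1704), 2))  # 3.87
```

The same check works for any camera or scanner specification that lists pixel dimensions.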
For Highest Quality Professional Work
Photographers fit 4x5 traditional film cameras with “camera backs” that store the images digitally instead of in analog format. E.g., PhaseOne PowerPhase, a digital back to a 4x5 view camera that can produce resolutions of 10,000 x 12,000 pixels.
Camera vs. Scanner
Scanners and cameras share broadly similar technologies, and at this point there are negligible quality differences at the high end. Of course scanners can only handle 2-dimensional or flat images, while cameras can handle both 2-dimensional and 3-dimensional objects.
Digital Cameras: Versatile and Fast
They are preferred for delicate or fragile originals and increasingly for large flat works such as maps and aerial photos.
But the lighting is hard to control; to get professional quality work you may find yourself hiring a professional photographer to come in. Rare materials should not be subjected to strong light of course, so if doing that sort of photography in-house, you might use a strobe light.
Prints From Digital: What Does The User Need?
Many libraries are creating revenue-generating (cost-recovery) programs that provide prints from the collection.
With the advent of digitization programs, these prints are increasingly made from digitized copies of the original.
Occasionally users even purchase the digital file itself.
Cost of Commercial Printer
To serve the occasional professional user
  Outsource to a commercial house or offer to sell the digital image instead.
“Pro-sumer” photo-quality printers can be had for under $100
  e.g., Canon i560s
  Some of your users may prefer to buy the TIFF and print at home
IF You Print From Digital, You Will House Large Files
The B&H Photo house in New York City estimates these file sizes for good output:
Up to 3 MB    Good for proofing, web use, presentations
3-20 MB       Good for up to 8x10 prints
21-50 MB      Good for up to 16x20 prints
51-99 MB      Good for up to 24x30 prints
100-125 MB    Good for over 24x30 prints
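If you want to apply these bands to a batch of files, a small lookup is enough. The sketch below simply transcribes the table; treating each upper bound as inclusive, and mapping anything above 125 MB to the last band, are assumptions made for illustration.

```python
# (upper bound in MB, suggested use), transcribed from the table above.
PRINT_BANDS = [
    (3, "proofing, web use, presentations"),
    (20, "up to 8x10 prints"),
    (50, "up to 16x20 prints"),
    (99, "up to 24x30 prints"),
    (125, "over 24x30 prints"),
]


def suggested_use(file_mb):
    """Return the suggested output for a file of file_mb megabytes."""
    for upper_mb, use in PRINT_BANDS:
        if file_mb <= upper_mb:
            return use
    # Assumption: files larger than the last band are treated like the last band.
    return PRINT_BANDS[-1][1]


print(suggested_use(2))   # proofing, web use, presentations
print(suggested_use(58))  # up to 24x30 prints
```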
Networking Puts It All Together
To move your digital images from your workstation to your CONTENTdm server, you will use the internet.
Your connection should have sufficient bandwidth for the digital formats you are importing.
Your users will of course need to have connections strong enough to download the images in real time.
Speeds
T1: 1.544 million bits per second (Mbps); this bandwidth is sufficient for building the collection.
T3: 45 Mbps; of course this is even better, much faster.
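To get a feel for what these speeds mean, you can estimate an ideal transfer time: file size in megabytes, times 8 bits per byte, divided by the line speed in Mbps. This sketch ignores protocol overhead and shared use of the line, so real transfers will be slower; the function name is illustrative.

```python
LINK_MBPS = {"T1": 1.544, "T3": 45.0}  # speeds quoted on the slide


def transfer_seconds(file_mb, link_mbps):
    """Ideal transfer time in seconds for a file of file_mb megabytes."""
    return file_mb * 8 / link_mbps


# A 100 KB access JPEG (0.1 MB) and a 58 MB master scan, as quoted earlier in the deck.
for name, mbps in LINK_MBPS.items():
    print(name, round(transfer_seconds(0.1, mbps), 2), round(transfer_seconds(58, mbps), 1))
```

On these assumptions a 100 KB JPEG moves over a T1 in about half a second, while a 58 MB master file takes about five minutes.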
Wireless
The most popular wireless mode
  802.11 b/g (WiFi)
  shared 11 Mbps for “b” and 33-54 Mbps for “g”.
This should be quite adequate for your end-users to access your collections.
Security
Network access is made secure through various methods:
  IP ranges (addresses like 209.116.xxx.xxx)
  Passwords
  Mixed models
  Integrated with a parent organization’s model!
Exercise #2
Materials Preparation
Metadata, Rights, Quality Control
Metadata
Standards and schemes
Access and preservation
A full one-day workshop on the metadata
Template for Local History Project in Project Guide
A Refresher: What Do We Mean By “Metadata”?
Metadata is information about the digital object.
Good metadata helps in finding and preserving a digital object or aggregation of digital objects.
Metadata Schema Examples
AACR2 (MARC format)
Dublin Core (DC)
Visual Resources Association Core (VRA Core)
Metadata Object Description Schema (MODS)
Encoded Archival Description (EAD)
Types of Metadata
Descriptive
Administrative
Structural
Technical
“Preservation”
Descriptive Metadata
Terms that say what the digital object represents, that is, what it is “about”
It’s what your users expect: it identifies the information resources in a way that allows them to be discovered.
Administrative Metadata
Facilitates both short-term and long-term management and processing of digital collections
Includes data pertinent to the creation of the digital object
Includes rights management, access control and use requirements
Structural Metadata
Facilitates navigation and presentation
Provides information about the internal structure of resources
  including page, section, chapter numbering, indexes, and table of contents
Describes the relationship among materials (e.g., photograph B was front of Postcard A)
Technical Metadata
Describes the features of the digital file
  e.g., resolution, pixel dimensions, and the compression factor used in saving the file.
“Preservation” Metadata
The ability to preserve your digital resources into the future depends in part on how completely you’ve applied metadata, especially
  administrative
  structural
  technical metadata
LHDRP, CONTENTdm and the Dublin Core
Title
Creator
Subject
Description
Publisher
Contributor
Date
Type
Format
Identifier
Source
Language
Relation
Coverage
Rights
CONTENTdm offers Audience too
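As a rough illustration of what an item-level record built from these elements might look like, here is a sketch that checks a record's field names against the list above. The sample record and its values are invented for illustration, and this is not how CONTENTdm itself stores or validates metadata.

```python
# The fifteen Dublin Core elements listed on the slide.
DC_ELEMENTS = {
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
}
CONTENTDM_EXTRA = {"Audience"}  # the additional element the slide mentions


def unknown_fields(record):
    """Return the field names in record that are not in the element list, sorted."""
    return sorted(set(record) - DC_ELEMENTS - CONTENTDM_EXTRA)


# Hypothetical item record; every value here is invented for illustration only.
record = {
    "Title": "Main Street, looking north",
    "Date": "1925",
    "Type": "Image",
    "Format": "image/jpeg",
    "Rights": "Contact the holding library for reuse",
    "Photographer": "Unknown",  # not an element name from the list
}
print(unknown_fields(record))  # ['Photographer']
```

A check like this catches stray field names early, before a collection is exported.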
You Will Use These Elements To Describe Your Collections
At the item level
  during the CONTENTdm building process.
Later, your collections will be
  exported
  imported to OAC
Metadata and CONTENTdm classes scheduled
  There we will delve into applying the Dublin Core element set
Rights: Metadata
When material needs to be restricted
  The reasons should be made clear to the end-users.
  If possible, the right to access the objects should be negotiated.
You will have to clear your materials of any restrictions so that they can be freely displayed on the CDL’s public access site(s).
Access To CONTENTdm Server
The Dublin Core Rights field can be used to explain the rights situation for the item
Mechanisms are in place to allow you to restrict access to materials at the item and the collection level.
Some commonly used mechanisms for controlling access to digital materials are user name/password challenges and IP (internet protocol) address ranges.
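To make the IP-range idea concrete, here is a minimal sketch using Python's standard ipaddress module. The network 209.116.0.0/16 stands in for “addresses like 209.116.xxx.xxx”; the function names and the mixed rule (in range, or password accepted) are assumptions for illustration, not a description of how CONTENTdm implements access control.

```python
import ipaddress

# Assumed example range, standing in for addresses like 209.116.xxx.xxx.
ALLOWED_NETWORKS = [ipaddress.ip_network("209.116.0.0/16")]


def ip_allowed(address):
    """Return True if the address falls inside one of the allowed networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in ALLOWED_NETWORKS)


def may_view(address, password_ok):
    """Mixed model: allow in-range addresses, or anyone who passed a password check."""
    return ip_allowed(address) or password_ok


print(ip_allowed("209.116.12.34"))  # True
print(ip_allowed("10.0.0.1"))       # False
print(may_view("10.0.0.1", True))   # True
```

Here password_ok is simply a yes/no result from whatever login check your system performs; the sketch does not handle passwords itself.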
CONTENTdm
Uses both usernames/passwords and IP ranges
  to control access at the collection and the item level
  when your users are viewing your images on a CONTENTdm server.
Quality Control
Getting the materials off to the vendor
  appropriately packed, tagged and flagged
Getting the materials back from the vendor
  what will you check for?
  texts and photos: different things to look for
Texts And Images Of Texts
The scan produces a file in image format, which in itself is not searchable
There are a number of ways to create searchable text from images of text.
Converting Images To Text
Re-keying
  very expensive, but high-quality
  for handwritten text or foreign language fonts, you will have to create typescripts by hand.
OCR (Optical Character Recognition) is the automated way
  with correction, expensive
  without correction, lower accuracy
What is OCR?
OCR “engines” are pattern recognition algorithms which can convert images of alphanumeric characters into machine-recognizable characters.
OCR Has Been Around Since The 1970s
Much research to improve accuracy and extend the readable language sets.
Very expensive in the early days
Available to desktop consumers in the mid-late 1990s
Now There Is Decent “Pro-sumer” Desktop Software
Such as ABBYY FineReader (this is offered as an extension of CONTENTdm).
Service bureaus (vendors) have also developed proprietary software
  get up to 90% accuracy
  can handle large volumes
  use filters, formulas and multi-pass methods
The Problem With OCR
When used on barely legible old texts, film, etc., it creates “dirty” ASCII
  “Guesses” are saved in a string not intended for human view. (These should be cleaned up if display is important.)
  You can hide the dirty ASCII from display but allow the search engine to index on it
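The “hide it but index it” idea can be sketched in a few lines: each page record keeps its uncorrected OCR text in a field that the search looks at but the display never shows. The class, field names and sample text below are invented for illustration and do not reflect any particular product's internals.

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    identifier: str
    title: str          # shown to users
    ocr_text: str = ""  # uncorrected OCR: searched, never displayed

    def display(self):
        """Return only the fields meant for human view."""
        return f"{self.identifier}: {self.title}"


def search(records, term):
    """Match the term against both the title and the hidden OCR text."""
    term = term.lower()
    return [r.display() for r in records
            if term in r.title.lower() or term in r.ocr_text.lower()]


# Hypothetical pages with typical OCR errors left in place.
pages = [
    PageRecord("p001", "City directory, page 1", "Smith Jno. blksmith 12 Main st"),
    PageRecord("p002", "City directory, page 2", "Tay1or Wm. grocer 40 Oak st"),
]
print(search(pages, "grocer"))  # ['p002: City directory, page 2']
```

Note that a search for “Taylor” would miss page 2 because the OCR read the letter l as the digit 1, which is exactly the cost of leaving the text uncorrected.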
To What Degree Is The Accuracy Of The OCR Important?
This depends on the quality of the image being processed, and on the intended use of the captured text.
A rule of thumb: higher resolution and greater bit-depth give more accurate OCR (and larger file sizes).
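One way to put a number on accuracy for a sample page is to compare the OCR output with a hand-keyed transcript of the same page. The sketch below uses Python's standard difflib for the comparison; it is a rough approximation rather than a formal character error rate, and the function name and sample strings are invented for illustration.

```python
from difflib import SequenceMatcher


def character_accuracy(ocr_text, reference_text):
    """Return the fraction of reference characters that the OCR text matches."""
    if not reference_text:
        return 1.0 if not ocr_text else 0.0
    matcher = SequenceMatcher(None, reference_text, ocr_text, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference_text)


# Hypothetical sample: the OCR read the letter "l" as the digit "1".
print(round(character_accuracy("Tay1or Wm. grocer", "Taylor Wm. grocer"), 3))
```

Running a check like this on a handful of pages scanned at different settings gives you a basis for comparing them.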
Imaging Vendor Checklist: Identifying Unacceptable Scans
Image not correct size
File name is incorrect
File format is incorrect
Loss of detail
Too light or too dark
Image cropped incorrectly
Image rotated incorrectly
Image reversed
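The first three items on this checklist can be screened automatically; the rest (detail, tone, cropping, rotation, reversal) still need a person looking at the image. The sketch below is one way to do the automatic part. The naming pattern, the minimum pixel size and the function name are all hypothetical, so substitute your project's actual specification, and note that checking the extension is only a stand-in for a true format check.

```python
import re
from pathlib import PurePath

# Hypothetical project spec; the naming pattern and pixel minimum are invented.
SPEC = {
    "name_pattern": re.compile(r"^[a-z]{4}_\d{5}\.tif$"),
    "extension": ".tif",
    "min_long_edge_px": 3000,
}


def scan_problems(filename, width_px, height_px, spec=SPEC):
    """Return the checklist items a delivered file fails, in checklist order."""
    problems = []
    if max(width_px, height_px) < spec["min_long_edge_px"]:
        problems.append("image not correct size")
    if not spec["name_pattern"].match(filename):
        problems.append("file name is incorrect")
    if PurePath(filename).suffix.lower() != spec["extension"]:
        problems.append("file format is incorrect")
    return problems


print(scan_problems("sjpl_00012.tif", 4000, 3000))  # []
print(scan_problems("photo12.jpg", 800, 600))
```

The pixel dimensions are passed in as numbers here; in practice you would read them from the file with an imaging tool of your choice.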
Identifying Correct Packaging Of Digital Materials
Object identifier
The order of the compound object’s parts
  corresponding file names and directory structure
Verify to CALIFA
Exercise #3
Quality Control
Effectiveness
What Is Success?
Best practices in the digitization process, evaluation and quality control.
Usability testing
As technology changes,
  as long as you are relying on agreed-upon standards,
  you will be able to go back and correct, improve and expand.
User-driven Purposes
There are many reasons for undertaking a digitization project,
  and all include improving and expanding end-user access to your materials.
Even “preserving” the content and “conserving the originals”
  are done because someday a person may need to access the resource
Late Turn-of-the-century History
Regular use of digitization in cultural heritage organizations
  such as libraries and archives
Leaders in the field like the California Digital Library and the Digital Library Federation documented “best practices”
Principles, Part 1
Leading practices proven over time
  Scan at the highest resolution appropriate to the informational content of the originals
  Scan at an appropriate level of quality to avoid rescanning and re-handling of the originals in the future (scan once)
  Create and store a master image file that can be used to produce derivative image files and serve a variety of current and future user needs
  Use image file formats and compression techniques that conform to industry standards
  Create backup copies of all files on a stable medium
Principles, Part 2
  Create meaningful metadata for image files or collections
  Store media in an appropriate environment
  Monitor and recopy data as necessary
  Outline a migration strategy for transferring data across generations of technology
  Anticipate and plan for future technological developments
  Scan (or have your vendor scan) at the appropriate settings for source material
  Inspect master images at 100% magnification (all or a sample)
Local History Project Standards
The California State Library, CALIFA and CDL
  Partnered to create a set of standards for digital imaging and metadata
  To ensure that your collections are accessible to your public and well-preserved into the future.
  Selected a digital collection management tool
  Prepared a straightforward path for your materials from CONTENTdm to the CDL
Let’s Revisit Our Project Plans
And make sure we chart our course for the next steps!
Exercise #4
Assessing and Improving Your Local History Project
Conclusion
Please fill out your evaluation forms
See you in a few weeks for CONTENTdm training!