The National Digital Newspaper Program (NDNP)
An NEH/LC Collaborative Program
Enhancing access to historical newspapers
Release: September 2006
NDNP Mission
• Enhance access to all American newspapers
• Improve access to products of United States Newspaper Program (USNP) using current technologies
• Establish standards and “best practices” for newspaper digital reformatting and access
• Use multi-phased approach for research and scaled development
• Develop geographically-diverse program that benefits all US communities
Why Newspapers?
• Newspapers: a unique resource for understanding the fundamentals of history
– Democracy, free press, diverse geographic viewpoints at the community level
• Enormous corpus of newspapers presents an archival challenge
• Text-intensive layout is labor-intensive to search without reference tools
• Digitization of microfilmed corpus economically feasible
Why a National Effort?
• Voluminous, distributed collections
• No one institution holds the “master collection”
• Broad user-base for newspaper material
• Think nationally, select locally
• Comprehensive chronological coverage, eventually
• Need for leadership to build on past national efforts (USNP)
LC’s Historical Newspaper Activities
• 20-year NEH/LC collaboration of USNP
– Existing national network of cooperative programs
– Standards established for preservation microfilm
– Standards established for descriptive metadata/cataloging
• American Memory’s “Stars and Stripes”
– http://memory.loc.gov/ammem/sgphtml/sashtml/sashome.html
– Proof-of-concept for historical newspaper format and description
What will NDNP Produce?
• Web access to
– National directory of US newspaper holdings (what, when, where) – based on USNP legacy data
– More than 30 million page images of historical newspapers digitized primarily from microfilm, with full-text
– Historical context of newspaper, printing tech, etc
• Depository of duplicate digitized microfilm at LC
How?
• Multi-partner program
– NEH: Funds the program (“We the People” initiative)
– LC: Aggregates, preserves and serves
– Awardees: Select and convert
• Phase I – FY04-FY06 (Test bed)
– NEH awardees (up to 10) with existing digital collections infrastructure and master microfilm negatives
– 100,000 pages each + 100,000 LC pages by 2007 (from 1900-1910)
– Microfilm reel analysis for research
Phase I Timeline
2004
• July – NEH cooperative agreement guidelines issued, LC technical architecture under development
• October – Application deadline; 15 applications received
2005
• April – NEH Awards announced
• May – Award conference held at LC
2006
• September – NDNP application publicly available via Web
NDNP, September 2006
• Web access - American Chronicle
– Newspaper Title Directory, 1693-present
– Full-text of content w/in visual newspaper layout (page-level access)
– Contextual historical material (Encyclopedia)
• Converted content from all awardees
– Initial time period covered: 1900-1910
Newspaper Title Directory
• Re-use of CONSER and Newspaper Union List, created under USNP (maintained by OCLC)
• 147,000 newspaper titles
• 900,000 holdings records
• Searchable, Web access to all USNP-collected data, tied to digitized issues when available, as well as external newspaper Web sites
Full Text with Page-level Access
• Preserves integrity of primary historical content, text in context
• Minimal metadata required to achieve reasonable search results
• Economics of large-scale, large-format digitization
• Allows creation of substantial content-base for research and development on additional search strategies and technologies
Digital Asset Specifications
• Page Image - grayscale, 400 dpi, from microfilm
• TIFF 6.0; JPEG 2000 (.jp2); PDF with Hidden Text
• OCR
– XML – NDNP/ALTO Schema
– Page-level, uncorrected, column zones with “bounding box” mapping coordinates
• Metadata
– XML in METS/MODS for digital objects
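To illustrate how “bounding box” coordinates in ALTO-style OCR output might support highlighting a search hit on a page image, here is a minimal Python sketch. The sample XML is hypothetical and is not actual NDNP output. It assumes the ALTO convention of String elements carrying CONTENT, HPOS, VPOS, WIDTH and HEIGHT attributes, and the helper names (local_name, extract_words, find_word) are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical ALTO-style fragment for illustration only (not real NDNP data).
SAMPLE = """<alto>
  <Layout>
    <Page ID="P1" WIDTH="6000" HEIGHT="8000">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100" VPOS="200" WIDTH="1200" HEIGHT="3000">
          <TextLine>
            <String CONTENT="DAILY" HPOS="110" VPOS="210" WIDTH="300" HEIGHT="60"/>
            <String CONTENT="NEWS" HPOS="430" VPOS="210" WIDTH="260" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip any '{namespace}' prefix so the sketch works with or without one."""
    return tag.rsplit("}", 1)[-1]


def extract_words(alto_xml):
    """Return each OCR word with its (HPOS, VPOS, WIDTH, HEIGHT) box."""
    root = ET.fromstring(alto_xml)
    words = []
    for elem in root.iter():
        if local_name(elem.tag) == "String":
            box = tuple(
                int(float(elem.get(key, "0")))
                for key in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
            )
            words.append({"text": elem.get("CONTENT", ""), "box": box})
    return words


def find_word(words, query):
    """Case-insensitive exact match; returns the boxes to highlight."""
    q = query.lower()
    return [w["box"] for w in words if w["text"].lower() == q]


if __name__ == "__main__":
    page_words = extract_words(SAMPLE)
    print(find_word(page_words, "news"))
```

Because the OCR is uncorrected, a real search layer would likely need fuzzier matching than the exact comparison used here.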
Historical Context
An Encyclopedia of Newspaper History
• Brief essays for each title digitized
– Publisher, geography, significant events covered, audience/community, politics
• History of each participating state and the role of newspapers in its history
• Presentations for technology developments, significant people, places, etc
Future Phases: 2007-2024
• Addition of new partners (continuation of Phase I test bed, to represent all 54 states and territories)
• Increased efficiency in workflows, tools, technology, sustainable resources
• Additional access capabilities, improved technology
Aggregate ~ Preserve ~ Serve
For more information, contact [email protected]
Georgia Higley
Head, Newspaper Section
Serials and Government Publications Division
Library of Congress
[email protected]
http://www.loc.gov/rr/news/