IISG Applications Overview
Royal Netherlands Academy of Arts and Sciences (KNAW)
International Institute of Social History (IISG)
Library Applications Workflow
Vyacheslav Tykhonov
mailto: [email protected] 18, 2012
Software Tools Overview
Evergreen library system (core) with external applications developed at IISG
Digital Repository to store metadata and files (images, video, audio, etc.)
OCR service to convert images to text
VisualMets Viewer to browse scans
HiTiME project for Named Entity Recognition
Search (VuFind) as interface to access linked metadata
Evergreen applications overview
Charts Builder
GeoLocator
Visual Timelines
Custom Reports
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) (for VuFind, WorldCat, ...)
ISBN reader
Related bibliographic records finder
Authority linking application
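As an illustration of how a harvester such as VuFind might request records over OAI-PMH, here is a minimal sketch that only builds the request URL. The verb `ListRecords` and the arguments `metadataPrefix`, `set` and `resumptionToken` come from the OAI-PMH specification. The base URL and the helper name `build_list_records_url` are assumptions made for this example and do not describe the actual IISG endpoint.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords"}
    if resumption_token is not None:
        # Per the protocol, resumptionToken is an exclusive argument:
        # it is sent together with the verb only.
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
        if set_spec is not None:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# "https://example.org/oai" is a placeholder, not a real repository.
first_page = build_list_records_url("https://example.org/oai")
next_page = build_list_records_url("https://example.org/oai",
                                   resumption_token="abc")
```

A harvester would fetch the first URL, read the `resumptionToken` element from the response, and keep requesting follow-up pages until the token comes back empty.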
Evergreen Charts Builder
http://evergreen.iisg.nl/charts/report.1900.html
Charts Builder with filtering by country/language/dates
Evergreen GeoLocator (example)
OCR Service
Optical character recognition (OCR) application for conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text
Texts can be stored in the Digital Repository as a separate layer and used for further analysis
OCR service can recognize more than 40 languages with high accuracy
Can be trained to work in other languages too
High speed of recognition (1-2 seconds per page)
OCR Example (Dora Russell Archive)
HiTiME Project
HiTiME is a text analysis system for the recognition and extraction of historical events and facts from historical sources and archives.
Named Entity Recognition process:
Persons (Dora Russell, Karl Marx, ...)
Locations (Amsterdam, the Netherlands, ...)
Dates (October 18, 2012, ...)
All named entities are stored in a Knowledge Base and can be linked; links between persons can form social networks.
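To make the three entity types concrete, here is a toy sketch that tags persons, locations and dates in a sentence. It is not how HiTiME works: it swaps in a simple name-list lookup plus a regular expression, whereas the slides describe a machine-learning system. The name lists, the labels and the function `find_entities` are all assumptions made for illustration.

```python
import re

# Tiny hand-made name lists (assumption; a real system would learn these).
PERSONS = {"Karl Marx"}
LOCATIONS = {"Amsterdam", "the Netherlands"}

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Matches dates written like "October 18, 2012".
DATE_RE = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")


def find_entities(text):
    """Return (label, value) pairs in order of appearance in the text."""
    found = []
    for label, names in (("PERSON", PERSONS), ("LOCATION", LOCATIONS)):
        for name in sorted(names):
            for match in re.finditer(re.escape(name), text):
                found.append((match.start(), label, name))
    for match in DATE_RE.finditer(text):
        found.append((match.start(), "DATE", match.group()))
    # Sort by character offset, then drop the offset.
    return [(label, value) for _, label, value in sorted(found)]


entities = find_entities("Karl Marx visited Amsterdam on October 18, 2012.")
```

Pairs of persons found in the same document could then be stored as edges, which is one simple way the linked entities could form a social network.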
IISG resources for HiTiME (Machine Learning)
Training on Authority Records from Evergreen can improve accuracy and recall of Named Entity Recognition (NER)
Evergreen MARC21 records for Topic Detection and Tracking (for example, 6XX Subject Access Fields)
IISG archives and collections can be used to create a corpus of related documents
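As a rough illustration of using the 6XX Subject Access Fields as topic labels, the sketch below pulls subject terms out of a record. The record layout (a list of `(tag, subfields)` pairs), the sample data and the function `subject_terms` are assumptions made for this example; a real workflow would read MARC21 records exported from Evergreen with a MARC library.

```python
def subject_terms(record):
    """Collect subfield 'a' from every 6XX field of a simplified record.

    `record` is a list of (tag, subfields) pairs, where subfields is a
    dict such as {"a": "Socialism"}. This layout is a simplification
    chosen for the sketch, not a MARC21 serialization.
    """
    terms = []
    for tag, subfields in record:
        # 6XX covers tags 600 through 699 (Subject Access Fields).
        if tag.isdigit() and 600 <= int(tag) <= 699 and "a" in subfields:
            terms.append(subfields["a"])
    return terms


# Made-up sample record for illustration only.
sample_record = [
    ("245", {"a": "Das Kapital"}),
    ("600", {"a": "Marx, Karl"}),
    ("650", {"a": "Socialism"}),
    ("651", {"a": "Amsterdam"}),
]
topics = subject_terms(sample_record)
```

The resulting terms could serve as topic labels when grouping related documents into a training corpus.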
HiTiME - BWSA 2.0 Demo
http://ilk.uvt.nl/hitime/bwsa_tmp/
HiTiME - Example (demo)
Combining Tools: OCR + HiTiME
VisualMets + OCR + NER (demo)
Visual Timeline Application (example)
Questions?
International Institute of Social History (IISG)
Royal Netherlands Academy of Arts and Sciences (KNAW)
Digital Infrastructure Department (DI)
Vyacheslav Tykhonov
Library Systems Developer
mailto: [email protected]