semantic blumenbach digital library & virtual museum · project called "semantic...
<term xml:lang='la' sortKey='Hystrix'>Hystrix</term></hi> </hi> </hi>. <term xml:lang='de' sortKey='Stachelschwein'> Stachelschwein</term>. (Fr. <hi rendition="#i"> <hi rendition="#r"> <term xml:lang='fr' sortKey=' porc-epic '> porc-<lb type="inWord"/>epic</term></hi> </hi>. Engl. <hi rendition="#i"> <hi rendition="#r"> <term xml:lang='fr' sortKey=' porc-epic '> porcupine</term></hi> </hi>.) ….. <p rendition="#l2em">v.<persName ref=''> Schreber </persName><hi rendition="#r">tab</hi>. 169.</p> <p rendition="#l1em">In<placeName ref='#GettyId:'> Canada</placeName>, auf Labrador, um die Hudsons-<lb type="inWord"/>bay etc. Thut zumahl im Winter den jungen<lb/>Baumstämmen großen Schaden.</p> <p rendition="#indent-2">2. <hi rendition="#i"> <hi rendition="#r"><term xml:lang='la' sortKey='Hystrix Cristata'>Cristata</term></hi> </hi>. <hi rendition="#r">H. spinis longissimis, capite cri-<lb type="inWord"/>stato, cauda abbreuiata</hi>.</p>
Contact: Dr. Jörg Wettlaufer
Akademie der Wissenschaften zu Göttingen (ADWG)
Göttingen Centre for Digital Humanities (GCDH)
Tel.: +49 (0)551 39 20477
www.gcdh.de | www.digihum.de
Semantic Blumenbach
II. Named Entity Recognition
on multilingual historical TEI encoded texts
Corpus of 12 editions of “Handbuch der Naturgeschichte” from 1779-1830.
Named Entity Recognition (NER) on TEI P5 Tite encoded full-texts for places, persons and objects from the natural history domain.
The texts are multilingual and use the irregular orthography of the second half of the 18th century. A list-based parser nevertheless reaches precision and recall above 90%.
Identification of approx. 10,000 terms per text, plus approx. 1,000 persons, 1,200 places, and 1,300 references to the collection database per text.
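A list-based approach of this kind can be sketched in a few lines. The following is a minimal illustration, not the project's parser: the gazetteer entries, the function names, and the choice of exact whole-word matching are all assumptions made for the example. Spelling variants would have to be listed explicitly in such a gazetteer, which is one reason irregular orthography makes the task harder.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical gazetteer: surface form -> TEI element name.
# A real list would be far larger and include spelling variants.
GAZETTEER = {
    "Canada": "placeName",
    "Labrador": "placeName",
    "Schreber": "persName",
    "Stachelschwein": "term",
}


def build_pattern(entries):
    """Compile one regex that matches any gazetteer entry as a whole word."""
    # Longest entries first, so multi-word names win over their prefixes.
    alternatives = sorted(entries, key=len, reverse=True)
    joined = "|".join(re.escape(entry) for entry in alternatives)
    return re.compile(r"(?<!\w)(" + joined + r")(?!\w)")


def tag_text(text, gazetteer=GAZETTEER):
    """Wrap every gazetteer match in the TEI element assigned to it."""
    pattern = build_pattern(gazetteer)
    parts = []
    position = 0
    for match in pattern.finditer(text):
        parts.append(escape(text[position:match.start()]))
        element = gazetteer[match.group(1)]
        parts.append(f"<{element}>{escape(match.group(1))}</{element}>")
        position = match.end()
    parts.append(escape(text[position:]))
    return "".join(parts)


sample = "In Canada, auf Labrador, um die Hudsonsbay etc."
tagged = tag_text(sample)
print(tagged)
```

Matching on whole words keeps "Canada" from being tagged inside a longer token, at the cost of missing inflected or variant forms that are not in the list.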
References to collection database via:
<term xml:lang="de" sortKey="Holz"><rs type="Palaeobotanik" ref="101 113 194 195 196 209 313 409 440 642">Holz</rs></term>
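To illustrate how such an annotation could be read back, here is a minimal sketch using Python's standard library. It assumes that the whitespace-separated values in the ref attribute are record identifiers of the collection database; the function name and the output layout are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

snippet = (
    '<term xml:lang="de" sortKey="Holz">'
    '<rs type="Palaeobotanik" '
    'ref="101 113 194 195 196 209 313 409 440 642">Holz</rs>'
    '</term>'
)


def collection_refs(term_xml):
    """Return one record per <rs> element found inside a <term>."""
    term = ET.fromstring(term_xml)
    records = []
    for rs in term.iter("rs"):
        records.append({
            "term": term.get("sortKey"),
            "lang": term.get(XML_LANG),
            "category": rs.get("type"),
            # Assumption: each token in @ref is one database record ID.
            "object_ids": rs.get("ref", "").split(),
        })
    return records


refs = collection_refs(snippet)
print(refs[0]["term"], refs[0]["category"], refs[0]["object_ids"])
```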
Scientific Communication Infrastructure
References
Goerz, Guenther & Martin Scholz: Adaptation of NLP Techniques to Cultural Heritage Research and Documentation, Journal of Computing and Information Technology - CIT 18, 2010, 4, 317–324.
Wettlaufer, Jörg & Sree Ganesh Thotempudi: Poster - NER in historical Text corpora. Lessons learned so far. 4.-6.03.2013, Mehr Personen – Mehr Daten – Mehr Repositorien, Tagung des Personendatenrepositoriums der BBAW, Berlin.
[Figure: text-object example. Labels: Object; Object Metadata; Text (Metadata). Relation labels: "is author of", "has collected".]
Link to catalog: http://resolver.sub.uni-goettingen.de/purl?PPN625161807_0009
http://books.google.de/books?id=fnfwrkZjm9kC
D. Joh. Fr. Blumenbach's … Handbuch der Naturgeschichte : nebst zwey Kupfertafeln. – Sechste Auflage. – Göttingen : Bey Johann Christian Dieterich, 1799.
Abbildungen naturhistorischer Gegenstände 9 (81; 1809): Taf. 81
Digital Library & Virtual Museum Digital Humanities Research Collaboration – Lower Saxony
I. Introduction
Blumenbach-online, a project of the Göttingen Academy of Sciences and Humanities, started in January 2010. It aims to digitize and present online the writings and collections of the influential Göttingen physician and naturalist Johann Friedrich Blumenbach (1752-1840), one of the founding fathers of physical anthropology. To date, almost half of the textual material (77,000 pages altogether) and roughly a quarter of the collections have been digitized and either converted into TEI-encoded texts or entered into a database. In the spin-off project "Semantic Blumenbach", we apply Semantic Web technologies to explore text-object relationships and to establish methods for presenting and providing linked data for Blumenbach-online.
genus species
Academy of Sciences and Humanities
Göttingen, Germany
Martin Scholz,
Diplom-Informatiker
Friedrich-Alexander-Universität
Erlangen-Nürnberg
Department Informatik
AG Digital Humanities
Tel: +49 9131 85 29094
martin.scholz@cs.fau.de
Pictures (Metadata)
Text
"semblu" Semantic Blumenbach OWL
III. WissKI
WissKI is a project of the Artificial Intelligence Chair (AI) at the Department of Computer Science of the Friedrich-Alexander-University of Erlangen-Nuremberg and the Department of Museum Informatics at the Germanisches Nationalmuseum (GNM) in Nuremberg.
The goal of the WissKI project is to apply the concept of wikis to the scientific domain and to support transdisciplinary collaboration between scientists and researchers from various domains.
Drupal 6-based modules with ECRM as top ontology.
VI. Conclusion
With the additional modules developed for Semantic Blumenbach, WissKI provides a powerful and attractive environment for connecting texts and objects in RDF within the CIDOC CRM model.
Ingest of already annotated TEI P5 texts of several hundred pages is now possible.
WissKI provides a SPARQL endpoint and supports LIDO as an exchange format.
Downside: steep learning curve for modelling data in CIDOC CRM.
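As an illustration of how a SPARQL endpoint could be addressed, the sketch below builds a GET request URL following the SPARQL Protocol convention of passing the query in a `query` parameter. The endpoint URL is a placeholder, the ECRM namespace is assumed, and the query pattern (linguistic objects referring to other entities) is a guess at a useful question rather than the project's actual data model. No request is sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint; the real address is not given in this poster.
ENDPOINT = "http://example.org/wisski/sparql"

# Assumed ECRM namespace and a hypothetical query pattern.
QUERY = """
PREFIX ecrm: <http://erlangen-crm.org/current/>
SELECT ?text ?object WHERE {
  ?text a ecrm:E33_Linguistic_Object ;
        ecrm:P67_refers_to ?object .
} LIMIT 10
"""


def build_request_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)
# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY.strip())
```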
Mark Fichtner
Diplom-Informatiker
Germanisches Nationalmuseum
Referat für Museums- und
Kulturinformatik
90402 Nürnberg
Tel. +49 911 1331-264
Exploration of text-object relationships with semantic web technologies in the history of science
Sree Ganesh Thotempudi
Digital Humanities Research Collaboration – Lower Saxony
Göttingen Centre for Digital Humanities (GCDH)
Tel.: +49 (0)551 39 20479
www.gcdh.de
Results for 12 editions of the same book
[Figure: Absolute numbers of tagged strings in the different editions of Blumenbach's natural history manual, 1779-1830. Vertical axis: 0 to 10,000. Series: <personName>, <placeName>, <term>, <rs>. Annotation: "Optimization for this edition".]
V. Connecting Texts and Objects
Texts (terms with reference strings) and objects are connected via the unique IDs of the collection database (semblu:E42_KerndatenID).
TEI-encoded books of several hundred pages can now be ingested with the newly added WissKI modules (texttei and book_import), including triplification of entities from the text with an XSLT stylesheet by Martin Scholz.
Ingest and disambiguation of data from a relational database (RDB) via an ODBC connector to WissKI by Mark Fichtner.
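The idea of linking a term to collection objects through shared IDs can be sketched as a small triple generator. This is only an illustration in Python, not the XSLT stylesheet mentioned above. The semblu namespace and the IRI layout are placeholders, the ECRM namespace is assumed, and the choice of E33 Linguistic Object, P1 is identified by, and P67 refers to is one plausible CRM modelling, not necessarily the project's.

```python
# Namespaces: the ECRM one is assumed, the semblu one is a placeholder.
ECRM = "http://erlangen-crm.org/current/"
SEMBLU = "http://example.org/semblu/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def iri(namespace, local_name):
    """Return an IRI in N-Triples angle-bracket notation."""
    return f"<{namespace}{local_name}>"


def link_triples(term_key, object_ids):
    """Yield (subject, predicate, object) triples for one annotated term."""
    term = iri(SEMBLU, f"term/{term_key}")
    yield (term, RDF_TYPE, iri(ECRM, "E33_Linguistic_Object"))
    for object_id in object_ids:
        museum_object = iri(SEMBLU, f"object/{object_id}")
        identifier = iri(SEMBLU, f"kerndaten/{object_id}")
        # The identifier class name is taken from the poster text.
        yield (identifier, RDF_TYPE, iri(SEMBLU, "E42_KerndatenID"))
        yield (museum_object, iri(ECRM, "P1_is_identified_by"), identifier)
        yield (term, iri(ECRM, "P67_refers_to"), museum_object)


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = list(link_triples("Holz", ["101", "113"]))
print(to_ntriples(triples))
```

Each object ID contributes three statements, so a term with ten references yields thirty-one triples including its own type statement.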
visit us at: wiss-ki.eu & www.blumenbach-online.de & dhfv-ent2.gcdh.de/blumenbach/wisski/
IV. Modelling the data in ECRM
Erlangen CRM (ECRM) is an OWL DL version of the CIDOC Conceptual Reference Model (CRM). It serves as a top ontology on which application ontologies (semblu) can build.
The Pathbuilder module of WissKI is used for the internal display and import of data. Once groups are defined for linguistic objects and museum objects, the system can disambiguate incoming data automatically and thereby connect objects that use the same identifier.
The WissKI system offers good support for local and global authorities.
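The identifier-based connection described in this section can be illustrated with a small grouping routine. This sketch does not use or reproduce the Pathbuilder API; the record layout, the group names, and the sample data are hypothetical and only show the principle that records carrying the same identifier end up in one entry.

```python
from collections import defaultdict

# Hypothetical incoming records from the text import and the database import.
incoming = [
    {"group": "linguistic", "identifier": "101", "label": "Holz (text mention)"},
    {"group": "museum", "identifier": "101", "label": "collection record 101"},
    {"group": "museum", "identifier": "113", "label": "collection record 113"},
]

KNOWN_GROUPS = ("linguistic", "museum")


def connect_by_identifier(records):
    """Group records so that entries sharing an identifier are connected."""
    connected = defaultdict(lambda: {group: [] for group in KNOWN_GROUPS})
    for record in records:
        group = record["group"]
        if group not in KNOWN_GROUPS:
            raise ValueError(f"unknown group: {group}")
        connected[record["identifier"]][group].append(record["label"])
    return dict(connected)


result = connect_by_identifier(incoming)
for identifier, groups in sorted(result.items()):
    print(identifier, groups)
```

In this toy data, identifier 101 links a text mention to a museum record, while 113 has a museum record but no text mention yet.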
Graph view of the text-object relationship
Pathbuilder
Basic model of the text-object relationship