digiTAAL
Some exciting examples
Ineke Schuurman, coordinator CLARIN-Vlaanderen
Digital Humanities
• Language as object of research
• Language as means for research
• Modern languages
• Old languages
• Written, audio, video (collections of) documents
Treebanks
Available for most ‘modern’ languages
But also possible for ‘dead’ languages like Latin, Ancient Greek
http://nlp.perseus.tufts.edu/syntax/treebank/getinvolved.html
Index Thomisticus Treebank, Milan
http://itreebank.marginalia.it/
Full query language needed
More treebanks
• Medieval Portuguese treebank
– Under construction
• In the near future: INPOLDER (CLARIN-NL)
A parser, not yet a corpus, BUT: raw older Dutch text can be entered through a web interface, and parsed (syntactically analysed) text will be returned
– Uncorrected, but manual correction is possible
Visualization
Gabmap: doing dialect analysis on the web
ADEPT project (CLARIN-NL)
Dialects (examples Netherlands/Flanders + USA)
www.gabmap.nl, including tutorial, manual, video, FAQ, …
Pronunciation distance
Gabmap: doing dialect analysis on the web
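A pronunciation distance can be illustrated with a plain edit distance between two phonetic transcriptions of the same word. The sketch below is only an illustration under that assumption: the function names and the example words are hypothetical, and it is not claimed to be the exact measure Gabmap computes.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        current = [i]
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            current.append(min(previous[j] + 1,          # delete symbol_a
                               current[j - 1] + 1,       # insert symbol_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer transcription."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# Hypothetical transcriptions of one word at two sites.
print(normalized_distance("melk", "melək"))
```

Averaging such distances over many words gives one number per pair of sites, which is the kind of table a dialect map or clustering can start from.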
Dendrogram
Gabmap: doing dialect analysis on the web
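A dendrogram is the tree that results from repeatedly merging the two closest groups of sites. The following minimal sketch uses average linkage on a small, made-up distance table; the site labels, the distances and the choice of average linkage are assumptions for illustration, not a description of Gabmap's settings.

```python
from itertools import combinations


def average_linkage(labels, dist):
    """Agglomerative clustering; returns the merge tree as nested tuples."""
    # Each cluster is a pair: (tree built so far, list of member labels).
    clusters = [(label, [label]) for label in labels]

    def cluster_distance(a, b):
        # Mean distance over all cross-cluster pairs of members.
        pairs = [(x, y) for x in a[1] for y in b[1]]
        return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]


# Hypothetical distances between four sites A, B, C and D.
dist = {
    frozenset(("A", "B")): 0.1, frozenset(("A", "C")): 0.6,
    frozenset(("A", "D")): 0.7, frozenset(("B", "C")): 0.5,
    frozenset(("B", "D")): 0.8, frozenset(("C", "D")): 0.2,
}
print(average_linkage(["A", "B", "C", "D"], dist))
```

Sites that end up in the same inner tuple are the ones a dendrogram would join lowest in the tree.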
Audio
CLARIN pilot (NL/FL) TTNWW, audio part
TAAL2SPRAAK (CLARIN-Vlaanderen)
Audio as a means to enlarge accessibility of larger collections of data (tapes)
Transcription, even if not 100% correct, is very helpful in finding what you are looking for, especially if synchronized with time
(useful for psychology, sociology, history)
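Why a time-synchronized transcript helps, even an imperfect one, can be shown with a small lookup: every recognized word carries its start time, so a search returns the positions on the tape to jump to. The data layout and the function name below are hypothetical and only serve as an illustration.

```python
def find_word(transcript, query):
    """Return the start times (in seconds) at which the query word occurs."""
    query = query.lower()
    return [start for start, word in transcript if word.lower() == query]


# Hypothetical, hand-made transcript: (start time in seconds, recognized word).
transcript = [
    (0.0, "in"), (0.4, "Leuven"), (1.1, "was"),
    (1.5, "de"), (1.8, "markt"), (62.3, "Leuven"),
]
print(find_word(transcript, "leuven"))
```

A recognition error only hides the occurrence it affects; the remaining hits still lead the researcher to the right stretch of audio.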
Audio and older texts
• Digitization of old texts still problematic (cf DigiHIST)
Experiment: Read medieval text aloud and have it automatically transcribed (not trained, modern language model used)
Audio Leuvense Schepenbank
• http://www.ccl.kuleuven.be/CLARIN/SAL8130_0093_inge_moris.hardsubs.mp4
• http://www.ccl.kuleuven.be/CLARIN/SAL8130_0093_inge_moris_4gr.pdf
Raw material !!
Written part TTNWW
• Relate documents, make texts more accessible by making explicit data that are not expressed as such
Paris formulated objections, London/John didn’t
What is a name, what kind of name is it?
• Analysis of names in fiction
• Sagalassos project (archaeology): temporal and geospatial analysis
Web service, end of 2012
Some more examples
• When is ‘now’? And where?
Stylometry
Stylene (CLARIN-Vlaanderen)– UAntwerpen/Univ.College Gent
• Is the text as a whole written by the same person?
• Show development in style of a specific author
• Is a text clear? Is it really understandable by, say, children aged 10-12?
Web service (autumn 2012)
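Questions like these are usually approached by turning a text into a handful of measurable features. The sketch below computes two very crude ones, average sentence length and average word length; these features and the function name are assumptions chosen for illustration and are not taken from Stylene.

```python
import re


def style_profile(text):
    """Two simple style features: words per sentence and letters per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    if not sentences or not words:
        raise ValueError("text contains no words")
    return {
        "words_per_sentence": len(words) / len(sentences),
        "letters_per_word": sum(len(w) for w in words) / len(words),
    }


print(style_profile("The cat sat. It slept."))
```

Comparing such profiles between two halves of a text, or between a text and material known to be readable for a given age group, gives a first and very rough indication; real stylometric tools rely on far richer feature sets.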
‘stylometry’ as means
• Is thesis X written by the student or by ‘Wikipedia’?
Reliability
• Can text X have been written by a 10-year-old girl? (paedophilia)
Reusability of data
• For the same kind of research
• For a completely different kind of research
Both should be encouraged
• time and money
To be taken into account: IPR !
Veterans project
• Interviews with veterans of Dutch military actions (1940-2010)
• 1000 interviews (2.5 h), semi-structured
Original: social and military historians
Who else can use this archive?
First: reluctance
Veterans 2
People from diverse disciplines were invited to write a paper (theology, psychology, discourse analysis, anthropology, sociology, ...)
Turned out to be a very valuable corpus!
Digital Humanities aspect: several tools were made available to facilitate research in different disciplines, tools to give access to spoken content
“Circulation of Knowledge”
“Geleerdenbrievenproject” (Letters of scientists)
17th century: Grotius (Hugo de Groot), Constantijn Huygens, Christiaan Huygens, Descartes, …
20,000 letters, mainly Dutch, French, Latin
Intended for “history of science”, of course also relevant for other disciplines
Polish example: Sejm
• Polish parliament, 1918 – now
• Texts, records, video
Goal: all kinds of linguistic research
• But of course: wealth of information for other disciplines as well
Conclusions
• Several ‘easy-to-use’ research possibilities are (or will soon be) available
• Others are still more complex, but do offer possibilities for new kinds of projects (or easier ways of doing research)
• Lots of material could be used by third parties as well: do not keep stuff “in your drawer”
• Students and (young) researchers should be made aware of new possibilities
• Sound Registers (1739-1799)