Abstracts DH Benelux 2017 conference, Tuesday 4 July 2017
Session A
1. Coin Production in the Low Countries, fourteenth century to the present
Rombert Stapel1, Jaco Zuijderduijn2, Jan Lucassen1, Kerim Meijer1
1 International Institute for Social History, Amsterdam, Netherlands; 2 Lund University, Lund, Sweden
This project collects, combines and makes available data on mint production in the Low Countries (Netherlands, Belgium, Luxembourg) and has developed a web application to query and visualize the data, which is also linked to a digital map of (changing) historical boundaries in the Low Countries from 1100 to the present (available in Linked Open Data). It provides scholars with a user-friendly approach to large datasets, and allows them access to such variables as regional production figures and coin denominations.
Introduction
Monetization is a key concept in economics and in economic history. Throughout history currencies were a crucial element of economic exchange: first in the form of metal coins, which made up the lion’s share of currencies and were widely used in everyday transactions. Only much later did paper money emerge: before the First World War very few ordinary people would ever have seen paper money. Finally, nowadays non-material book money has become much more important than currencies, and with the onset of mobile banking and virtual currencies such as Bitcoin, it seems future generations will see much less of currencies than people in the past.
Historical societies depended as much on media of exchange as we do today: coins and paper money helped a great deal in realizing everyday transactions, as did various forms of credit. Coin production figures are of crucial importance for understanding development in the long run.1 The study of coins, their quantity, denominations and use (e.g. in wage payments), and of monetary policy in general, provides important insight into economic and social history, and this project provides historians with a firm quantitative basis for their research.
In this paper, we will present the project and its goals, give an overview of the process of data collection and the web application we built to query and visualize the data (including geospatial visualizations), and provide some of the results for historical research that stem from our dataset.
Project
Coin Production in the Low Countries, fourteenth century to the present provides an overview of coin production figures covering many centuries. Of course we deal with omissions: not all mint accounts go back to the fourteenth century, and not all administrative records have survived. The website allows for an overview of the mint house data we have at our disposal at the moment, and visualizes the missing data. Coin Production in the Low Countries, fourteenth century to the present also does not pretend to be the final dataset: like any other dataset it reflects the data that has been collected and made available up until now. Although we are confident we cover the vast majority of the coins minted in the Low Countries, some new or overlooked sources may emerge in the future; we are likely to make additions in time. The dataset represents the data we presently have, and is a tool to be used by scholars looking for variables related to coin production.
1 Jan Lucassen and Jaco Zuijderduijn, ‘Coins, currencies, and credit instruments. Media of exchange in economic and social history’, Tijdschrift voor sociale en economische geschiedenis 11 (2014) 1-13.
Our goal for this project was to take the aforementioned datasets, check the validity of the collected data, select and/or (re)calculate the relevant variables for our project, combine the different datasets, and present our selected variables in a web application which allows the user to query and visualize the data.
Manual web application
Figure 1. Number of coins minted in Flanders between 1334 and 1700, organised per alloy (status: November 2016).
In the web application,2 the user can query the data and create (and export) their own subsets. Different queries and selections can be made at the top left. This includes the possibility to display the Value in denier groot, a common coin used as money of account, in hourly wages. The query starts by clicking ‘Run’. There are three tabs: ‘Table’, ‘Chart’, and ‘Map’. The variables in the table and chart can be adjusted freely on the right. The map needs some further introduction. At the moment, the map is used to give a rough indication of how complete our dataset is for particular mint houses and authorities in time. We have turned to the works by Hugo Vanhoudt and H. Enno van Gelder, supplemented with data from our own datasets, to determine the years of activity of mint houses and authorities.3
2 https://datasets.socialhistory.org/dataverse/coinproduction/search/.
The colour of a region that minted coins (e.g. Duchy of Brabant) depends on the number of years (in a particular query) in which we know that region was minting coins and on the number of those years for which we have actual production figures in our dataset. This also applies to the mint houses, where we have used pie charts. For this purpose, we have created a GIS map of all major authorities in the Low Countries in time.4 This means that borders will change with time and mint houses will pop up and disappear.5 A slider in the top left corner of the map allows the user to change the years. On the right of the application, different options regarding the map can be selected, choosing whether the colours and pie charts should change instantaneously with the slider or not.
Figure 2. Map of the Low Countries (1432) with percentage of data availability in that year (status: November 2016).
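The colouring rule described for the map can be illustrated with a small calculation. The following Python sketch is not the project's code; the function name `coverage` and the example years are assumptions made purely for illustration. It computes, for one region and one query window, the share of known minting years for which production figures exist.

```python
def coverage(active_years, data_years, start, end):
    """Share of known minting years in [start, end] that have production figures.

    Hypothetical helper: returns None when the region is not known
    to have minted at all in the queried window.
    """
    window = set(range(start, end + 1))
    active = set(active_years) & window
    if not active:
        return None  # nothing to colour for this region in this window
    covered = active & set(data_years)
    return len(covered) / len(active)


# Invented example values, not taken from the dataset.
example_active = range(1400, 1411)                      # known minting years 1400..1410
example_data = [1400, 1401, 1402, 1405, 1409, 1420]     # years with production figures

share = coverage(example_active, example_data, 1400, 1409)
print(share)
```

A value near 1 would correspond to a region whose known minting years are almost fully documented in the query window, and a value near 0 to a region that is known to have minted but for which few figures survive.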
3 H. Vanhoudt, Atlas der munten van België van de Kelten tot heden (Heverlee 2007, 2nd edition); H.E. van Gelder, De Nederlandse munten (Utrecht 2002, 8th edition).
4 For some important disclaimers regarding these GIS maps, see the introduction at http://hdl.handle.net/10622/HPIC74/.
5 This process was visualized in a movie of the period 1100-2016, where each frame is a year: http://hdl.handle.net/10622/5KGG1T.
2. Mapping the Place: “De Krook Quarter”
Piraye Hacıgüzeller, Sally Chambers, Christophe Verbruggen and Hans Blomme
Ghent Centre for Digital Humanities, Ghent University
The presentation will elaborate on a new project that the Ghent Centre for Digital Humanities (GhentCDH) is starting to carry out, “Mapping the Place: ‘De Krook Quarter’”, which involves “deep mapping” of a historical district in Ghent. In the presentation, the context, framework, workflow and impact of the project will be described and discussed.
The objective of the “Mapping the Place” project is to harness the well-demonstrated power of cartography as a participatory tool (Perkins 2007). Specifically, the project aims to contribute to the participatory governance of cultural heritage in Europe through “deep mapping” a district in Ghent (Belgium) that embodies place-based heritage such as Vooruit (a people’s palace established in 1913 that has been turned into a vibrant international contemporary arts centre), the Minard Theatre, De Krook (the newly built city library and digital innovation centre) and adjoining former Wintercircus, and the surrounding streets (Kuiperskaai) that used to connect a Latin Quarter and red light district. In collaboration with the heritage institutions responsible for management of these places, GhentCDH will employ a variety of participatory mapping tools and methodologies in order to involve a range of communities in a deep mapping project.
Deep maps are “thick spatial descriptions” of places that break away from the Cartesian paradigm in cartography. The latter, known also as “Western scientific mapping” (Pickles 2004; see Turnbull 1996), limits both the content and the methods of mapping, as it traditionally aims to map only empirically observable phenomena, which are considered to constitute reality exclusively. Deep maps, on the other hand, inspired by the concept of “thick description” popularised by anthropologist Clifford Geertz (Bodenhamer et al. 2015), are based on a much more flexible and fruitful definition of what can constitute a map and what constitutes places, aiming to bring together a large and rich array of spatial qualities. Deep mapping is even more promising today as digital cartography opens up many possibilities to collect and crowdsource new types of geospatial information and to visualise, integrate and analyse it in novel ways with the help of technologies such as geographical information systems, virtual and augmented reality, and real-time mapping.
The participatory deep map of Ghent, displayed in De Krook and Vooruit, will be an innovative, open-ended, multi-vocal and largely digital cartographic process that will bring together geographical information, sensual experiences, memories, oral histories, creative narratives, emotions, knowledges, imaginations, practices and events. The map is planned to be produced through the following five types of activity: a) playful community mapping exercises (Pinder 1996; 2005) will be organised for diverse groups in order to carry out a certain cartographic task (e.g. mapping an area), and their knowledge and experiences of the places will be revealed in the process through their interaction (e.g. Grasseni 2004); b) a digital online crowdsourcing platform for heritage places will be created where people can enter cartographic information (see Perkins 2013); c) geospatial data on people’s emotions (http://biomapping.net/), movement, sound and smell will be collected in real time and converted into data sculptures or paintings by artists (see, e.g., www.refikanadol.com/); d) multi-layered geographic information systems and three-dimensional virtual reality displays will be installed in De Krook, allowing diverse groups of visitors to annotate their experiences and knowledge about the heritage places that the deep mapping project focuses on; e) (non-)digital map-based or map-aided games (e.g. geocaching) will be designed, developed and/or employed in order to facilitate conversation about the heritage places in question between diverse groups of people, as well as informing them about and engaging them with these places. The layers of the participatory deep map will be distributed across many locations in De Krook, comprising a geographical information systems component, virtual reality room, game room, exhibition room, digital sculpture and painting rooms, screens for real-time mapping, and computers with access to the digital crowdsourcing platform.
References
Bodenhamer, D.J., Corrigan, J. & Harris, T.M. eds., 2015. Deep maps and spatial narratives, Indiana: Indiana University Press.
Grasseni, C., 2004. Skilled landscapes: mapping practices of locality. Environment and Planning D: Society and Space, 22, pp. 699–717.
Perkins, C., 2007. Community mapping. The Cartographic Journal, 44(2), pp. 127–137.
Perkins, C., 2013. Plotting practices and politics: (Im)mutable narratives in OpenStreetMap. Transactions of the Institute of British Geographers, 39(2), pp. 304–317.
Pickles, J., 2004. A history of spaces: Cartographic reason, mapping and the geo-coded world, London & New York: Routledge.
Pinder, D., 1996. Subverting cartography: the situationists and maps of the city. Environment and Planning A, 28, pp. 405–427.
Pinder, D., 2005. Arts of urban exploration. Cultural Geographies, 12(4), pp. 383–411.
Turnbull, D., 1996. Cartography and science in early modern Europe: mapping the construction of knowledge spaces. Imago Mundi, 48, pp. 5–24.
3. Cinemas on the Move: A geospatial analysis of the role of traveling cinemas in the Dutch cinema landscape
Jolanda Visser, Julia Noordegraaf and Ivan Kisjes
University of Amsterdam
The emergence of the cinema as a new cultural industry at the dawn of the twentieth century has had a significant impact on the social, cultural and economic infrastructures of modernizing societies. Cinema’s technological and cultural innovation, combined with economic competition, significantly reconfigured the role and place of entertainment culture in public life. Besides being an economic factor of importance, it also has literally “taken place” in urban and rural infrastructures, transforming the organization and experience of modern public space.
The ways in which cinema has taken place in Dutch public space have been the subject of a number of studies. Some focus on the history of specific cinema theatres and the urban context in which they function (Visser 2012; Noordegraaf et al. 2016). Others have investigated national and local cinema networks and focused on the organization and economics of the industry (Dibbets 1980 & 2006; Oort 2016). Yet other studies focused on the ways in which movies reached their audiences and how this correlates with specific religious and ideological orientations (Boter and Pafort-Overduin 2009), or studied the popularity of certain genres or stars (Van Beusekom 2013). In addition, a comprehensive database has been created that facilitates data-driven research on national Dutch film culture.6
At the same time, though, the study of the role of cinema in modern public life has focused primarily on urban contexts. When plotting the locations of cinemas from the Cinema Context database on a map, it appears that the majority of cinemas are located in urbanized areas. In fact, there were cinema screenings in less urbanized areas as well; those were frequented by traveling cinemas. At present, the role and impact of these traveling cinemas in Dutch cinema culture remain entirely unknown. In this paper, we present the results of the very first study of the impact of traveling cinemas on Dutch film culture. Using a combination of network and geospatial analysis software, the paper contributes: 1. new insights into the way cinema as a leisure industry contributed to the shaping of modern Dutch identity; and 2. a reflection on the affordances and limitations of GIS and network analysis tools for (cinema) historical research.
6 www.cinemacontext.nl
Central Question
Our research aims to establish the role and place of traveling cinemas in the Dutch, post-WWII cinema landscape. What was the relation between the permanent and traveling cinemas, in terms of geographical distribution, market share, and distribution and exhibition practices? In order to answer this question, we approach the Dutch cinema landscape as a network with socio-economic (distribution, consumption) and cultural (programming) dimensions. In order to analyse this network, we combine a geospatial analysis of the network of permanent and traveling cinemas and owners/exhibitors in The Netherlands in 1949 with an in-depth case study of one particular section of this market. This pairing allows us to combine a macrosocial analysis of the role of traveling cinemas in the national cinema market with an analysis of the contextual features that explain causality in one specific case (Ragin 1987).
Method
For the research, we adopted a two-tiered approach. First, we extended the data on the location of permanent cinemas and their owners in the Cinema Context database with newly assembled data on the places frequented by traveling cinemas. Then, we mapped these cinemas according to their typologies, distinguishing between permanent theatres, theatres with occasional screenings and traveling cinemas in QGIS. This resulted in a geospatial analysis of the organization of the Dutch industry that, for the first time, includes data on traveling cinemas.
Second, the networks of cinema exhibitors of permanent and traveling cinemas have been analyzed by processing the data on theatres and owners/exhibitors in Gephi. The resulting graph allowed us to recognise the influence of cinema chains as well as individual, non-networked entrepreneurs. By projecting these data on historical maps in QGIS, we could compare the geographical distribution of different types of cinemas with the network of cinema owners/exhibitors. We identified a number of clusters where permanent cinemas and mobile cinemas were related and used this analysis to select one case for further, in-depth analysis of film flows within a cinema chain with a traveling department. The selected case study tracks the film flows of the cinema chain of Joh. Miedema and his competitors in the Northern province of Friesland in 1949.
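As an illustration of the second step, the sketch below groups cinemas by owner/exhibitor and separates chains from isolates, which is roughly the structure that the owner/exhibitor graph makes visible. It is a simplified stand-in written in plain Python, not the authors' Gephi workflow; the record layout, the function name `classify` and all cinema and owner names are hypothetical.

```python
from collections import defaultdict


def classify(records):
    """Split owners into chains (two or more venues) and isolates (one venue).

    records: iterable of (venue, owner, kind) tuples, where kind is
    'permanent' or 'traveling'. Also returns the chains that combine
    both kinds, i.e. chains with a traveling department.
    """
    by_owner = defaultdict(list)
    for venue, owner, kind in records:
        by_owner[owner].append((venue, kind))
    chains = {o: v for o, v in by_owner.items() if len(v) > 1}
    isolates = {o: v for o, v in by_owner.items() if len(v) == 1}
    mixed = sorted(
        o for o, v in chains.items()
        if {kind for _, kind in v} == {"permanent", "traveling"}
    )
    return chains, isolates, mixed


# Invented records for illustration only.
records = [
    ("Cinema A", "Owner X", "permanent"),
    ("Cinema B", "Owner X", "permanent"),
    ("Village hall C", "Owner X", "traveling"),
    ("Cinema D", "Owner Y", "permanent"),
    ("Cinema E", "Owner Z", "permanent"),
    ("Cinema F", "Owner Z", "permanent"),
]

chains, isolates, mixed = classify(records)
in_chain = sum(len(v) for v in chains.values())
print(f"{in_chain} of {len(records)} venues belong to a chain")
print("Chains with a traveling department:", mixed)
```

In a real analysis the grouped result would still need to be joined to coordinates before it can be compared with the geographical distribution on a map.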
Results
Some of the data sets used already existed (Cinema Context database), some had to be digitized partly (census data) and some had to be created (film programming, traveling cinema locations and screenings). In the first phase of the project the data of the cinemas and the networks of cinemas were combined. The first results showed the geographical networks of Dutch permanent cinemas in relation to the network of owners/exhibitors. In general, as also shown by Dibbets (1980), one can conclude that half of the cinemas belonged to a cinema chain, leaving the other half as isolates.
After adding the mobile cinema networks, we identified a clear geographical distribution for exhibitors of a cinema chain with a traveling department, among others in the provinces of Friesland and Drenthe. The selected case study focused on the network of Joh. Miedema in Friesland, which comprised 10 permanent cinemas surrounded by places he claimed for his mobile department. It appears he used these mobile screening locations to construct a buffer zone around the permanent cinemas in his chain, to ward off competition from other owners in the region. Reconstructing film programming practices within that network and comparing them to those of his competitors in the province of Friesland in 1949 provides new insights into the economics of a cinema chain with a traveling department, the socio-economic and cultural context of the various sites visited, and patterns of taste. Based on the first results of this research, the benefits and pitfalls of the combined use of Gephi and QGIS will also be evaluated.
References
Beusekom, Ansje van. “Distributing, programming and recycling Asta Nielsen films in the Netherlands, 1911-1920.” In Importing Asta Nielsen: The international film star in the making 1910-1914, edited by Martin Loiperdinger & Uli Jung, 259-272. New Barnet, Herts UK: John Libbey/KINtop, 2013.
Boter, Jaap, and Clara Pafort-Overduin. “Compartementalisation and Its Influence on Film Distribution and Exhibition in The Netherlands, 1934-1936.” In Digital Tools in Media Studies: Analysis and Research: An Overview, edited by Michael Ross, Manfred Grauer, and Bernhard Freisleben, 55–68. Bielefeld: Transcript Verlag, 2009.
Dibbets, Karel. “Bioscoopketens in Nederland: Economische concentratie en geografische spreiding van een bedrijfstak, 1928-1977.” Doctoraalscriptie, Universiteit van Amsterdam, 1980. Online: http://kd.home.xs4all.nl/home/Karel%20Dibbets%20%20Bioscoopketens%20in%20Nederland%201980.pdf
Dibbets, Karel. “Het Taboe van de Nederlandse Filmcultuur: Neutraal in Een Verzuild Land.” Tijdschrift Voor Mediageschiedenis 9, no. 2 (2006): 46–64.
Hallam, Julia, and Les Roberts, eds. Locating the Moving Image: New Approaches to Film and Place, 2014.
Horak, Laura. “Using Digital Maps to Investigate Cinema History.” In The Arclight Guidebook to Media History and the Digital Humanities, edited by Charles R Acland and Eric Hoyt, 65–102. Falmer: Reframe Books, 2016.
Noordegraaf, Julia, Opgenhaffen, Loes, & Bakker, Norbert. “Cinema Parisien 3D: 3D Visualisation as a Tool for the History of Cinemagoing”. Alphaville, 11 (2016): 45-61.
Oort, Thunnis van. “Industrial Organization of Film Exhibitors in the Low Countries: Comparing the Netherlands and Belgium, 1945–1960.” Historical Journal of Film, Radio and Television (March 17, 2016): 1–24. Online first: http://dx.doi.org/10.1080/01439685.2016.1157294
Oort, Thunnis van. “‘Coming up This Weekend’: Ambulant Film Exhibition in the Netherlands”. (Forthcoming).
Ragin, Charles C. The Comparative Method: Moving beyond Qualitative and Quantitative Strategies. Berkeley, CA: University of California Press, 1987.
Visser, Jolanda, Samen naar The Movies – 100 jaar Bioscoop op de Haarlemmerdijk 161, The Movies Art House Cinemas and Film Distribution Amsterdam: 2012.
Session B
1. Soft skills in hard places: the changing face of DH training in European research infrastructures
Jennifer Edmond, Trinity College Dublin
Vicky Garnett, Trinity College Dublin
Research infrastructures are becoming an increasingly distinct presence in the landscape of the digital humanities, creating unique research ecosystems that interact with, but remain distinct from, the traditional university-based ones. It is a research sector still very much in the process of defining itself, however, in particular in the arts and humanities, not only in terms of how exactly infrastructures support research but also in terms of how a word with such “hard” connotations (conjuring up images of roads and bridges) can encompass the many “soft” resources and skills, from data to know-how, that we now recognise as a part of infrastructural provision for research in Europe. This tension is already evident in how research infrastructure is defined, with some camps preferring to fall back on long lists of elements infrastructure may or may not comprise, such as data, services and tools, while others remain more theoretical, placing them in the role of “mediating” (Badenoch and Fickers, 2010) or “below the level of the work” (Edwards et al., 2012). Regardless of how we conceptualise it, however, infrastructure is undeniable as a rising presence, with a growing impact on how research is conceptualised and carried out, how research results are communicated and shared, and how the potential scale of a humanities project can be conceptualised.
There is one element in this landscape of change that has steadfastly remained based within the universities, however: that is the manner in which new generations of researchers are formed, through training and education. Some of the reasons for this lie in the need for specialised procedures, staff, resources and expertise to deliver formal educational programmes, a layer of provision that research infrastructures seldom have. Indeed, it is the lack of this layer that most distinctly differentiates activities of the research infrastructure from those of the more familiar academic context. As we continue to develop our understanding of what it means to ‘teach’ the digital humanities (e.g. Fyfe, 2011, Hirsch, ed., 2012, or Bellamy, 2012), however, we need also to reconsider the utility, responsibility and potential contributions of actors other than universities in this process, and how we integrate them into recognised learning pathways. It is not that infrastructures do not offer training opportunities, just that the paradigm informing much of this training has historically been founded upon a narrower conceptualisation of the added value of the infrastructural space for creating and sharing unique knowledge. As such, projects and platforms would traditionally create materials to assist users approaching specific tools developed or hosted by the infrastructure, serving a very narrow conceptualisation of the user and his or her needs.
There has been an increasing number of examples of the infrastructural community expanding their activities to fill spaces less easily addressed by traditional, formal, course- and institution-based training contexts, however. Hands-on training with specific collections or objects, or using transnational access to build skills, for example, are mechanisms that have been developed to great effect by infrastructures, as has the model of partnering with other organisations to deliver credit-bearing programmes. These are mechanisms that have arisen in part because of the opportunities that exist, for example, when researchers work in close proximity to specific scientific instruments, as in the fields of cultural heritage and preservation, but have also arisen as accidents of design. Many research infrastructure funding schemes include fixed elements drawn directly from the longer tradition of infrastructure development in the fields of science and technology, mechanisms that do not necessarily fit humanities modes of work or interaction.
Even as such programmes remained largely ad hoc extensions of the originating user-support model of training, they exposed the potential of research infrastructures not only as places that support research, but where unique knowledge was being created, and where this knowledge could and should be shared. The development of a theoretical understanding of the strengths of the research infrastructure, what knowledge they contribute to digital humanities, and how this knowledge could be more systematically shared has been a primary goal of the training programme of the PARTHENOS (Pooling Activities, Resources and Tools for Heritage e-Research, Optimization and Synergies, http://www.parthenos-project.eu/) cluster project, itself a collaboration between a number of research infrastructures and their affiliated projects.
As an infrastructure cluster, PARTHENOS is charged with deepening understanding of what infrastructure is and how common activities can be better aligned for maximal benefit to researchers between the communities that have built landmark research infrastructures at European level. The PARTHENOS training framework seeks first and foremost to make a distinction between research work that does and that does not engage with data and service infrastructures such as the PARTHENOS partners represent. At the next level, the framework seeks to address the digital humanities not only as a set of domains, but also as a set of roles and actors, following up on the work of the DigCurv project (http://www.digcurv.gla.ac.uk/). By reconceptualising a didactic system from the first principles of who might need digital infrastructure and what they might need to know or be able to do, PARTHENOS has been able to create bespoke training materials that draw from the unique experiences within research infrastructures and the unique knowledge they create. The materials exist within a simple but evolving framework, addressing experience levels from the novice (for example: “What is an Infrastructure”), to the intermediate (for example: “Management Challenges in Research Infrastructures”) and advanced (for example: “Introduction to Infrastructures as Collaborations”) levels. Modules are designed to build bridges between potential users and the entire context of research infrastructures and how they operate, ranging from basic questions about what resources are available and how they work through to much more fundamental explorations of the opportunities and challenges that exist in this environment, issues that even expert practitioners struggle to define and address.
The paper will embed a presentation of PARTHENOS’s work in a theoretical discussion of the role of research infrastructures in the development of skills and careers in the digital humanities. It will give an overview of some of the practical interventions the project has made to address the thorny issues of developing training and education programmes outside of the academy, including awareness raising, foresight work, embedding in higher education, partnerships and accreditation. Working in concert with its constituent partners (the DARIAH, CLARIN and E-RIHS Research Infrastructures, as well as their partner projects, such as CENDARI, EHRI, ARIADNE, and IPERION CH), the PARTHENOS team is testing the potential for infrastructural knowledge, for its transmission as materials for self-directed use by independent learners and trainers, and for its capacity to be integrated in the programmes of universities and professional organisations alike. Through this programme of engagement PARTHENOS will bring an extended horizon for training not only to research infrastructures and their users, but to all of digital humanities.
References
Badenoch, A., and A. Fickers, Materializing Europe: Transnational Infrastructures and the Project of Europe (Palgrave Macmillan, 2010)
Bellamy, Craig, ‘The Sound of Many Hands Clapping: Teaching the Digital Humanities through Virtual Research Environment (VREs)’, Digital Humanities Quarterly, 6 (2012)
Edwards, Paul N., Knobel, Cory P., Jackson, Steven J., and Bowker, Geoffrey C., Understanding Infrastructure: Dynamics, Tensions, and Design <http://hdl.handle.net/2027.42/49353> [accessed 16 November 2012]
Fyfe, Paul, ‘Digital Pedagogy Unplugged’, Digital Humanities Quarterly, 5 (2011)
Hirsch, Brett D., Digital Humanities Pedagogy: Practices, Principles and Politics (Cambridge: Open Book Publishers, 2012) <http://www.openbookpublishers.com/product/161/digital-humanities-pedagogy--practices--principles-and-politics> [accessed 7 April 2017]
2. Ranke.2 - How to Get Digital Source Criticism on the Teaching Agenda
Stefania Scagliola - C2DH – Centre for Contemporary and Digital History, University of Luxemburg
Abstract
The term Ranke.2 refers to the need to reassess Leopold von Ranke’s method for historical source criticism, in the light of the impact of digitization and the world wide web on the position of the archive and the craft of the historian. It is also the proposed title of a platform for lessons on digital source criticism, a project that is being developed at the Centre for Contemporary and Digital History at the University of Luxemburg.
While a number of scholars have successfully addressed various theoretical and epistemological implications of the digital turn for the historical craft, little is known about how this subject is dealt with in the realm of teaching. This paper pleads for an assessment of the concept of Digital Source Criticism from the perspective of Digital Humanities Pedagogy. It starts off with some reflections on why and how Ranke’s concept has to be reconsidered. Then it discusses whether source criticism can still be regarded as a specific historical method. The third section of the paper is an account of a small-scale exploration among humanities scholars involved in teaching at the humanities faculty of the University of Luxemburg. They were asked to share their understanding of how digital source criticism should be taught. The paper concludes with a plea for integrating small-scale DH interventions into the traditional historical curriculum.
‘Everything has changed and everything has stayed the same’
With the arrival of digitally-based ‘fake news’ and the inability of sections of the public to distinguish it from the ‘real thing’, the vital importance of digital source criticism should be evident. What is less evident is how it affects the craft of the historian. Historians educated in the 21st century are witnessing the consolidation of the ‘digital turn’ with profound consequences for the historical profession. The German scholar Leopold von Ranke was responsible for an earlier radical change in scholarly practice in the 19th century: he introduced the so-called ‘archival turn’. He also introduced the concept of the ‘seminar’ and encouraged a new generation of aspiring scholars to visit numerous archives, scrutinize and compare documents, and trace back the identity and motives of the author and the circumstances under which a document came into existence. Ranke made a distinction between ‘external’ source criticism, which focuses on the creation, appearance and alleged or real authenticity of a source, and ‘internal’ source criticism, which evaluates the evidential value that can be attributed to a particular source. This new approach became widespread and problematized the tradition of ‘universal histories’, based on broad philosophical concepts and ideas about the evolution of mankind. Rigorous fact-checking took the place of myth-making. Ranke’s innovation in the second half of the 19th century coincided with the period of modern state formation and the creation of national archives. It gradually became the backbone of professional history, with a strong orientation towards the archive as the guardian of authenticity and historical relevance (Risbjerg Eskildsen 2008).
We now live in a globalized world with cultural and disciplinary boundaries that are blurred, with digital technology that has permeated academic research practice, and with the opportunity to copy, alter and remix data with relative ease. It therefore comes as no surprise that concern about the origin, authenticity and value of historical sources in digital form is increasing (Jones and Hafner, 2012). How this has affected the historical profession and what changes need to be introduced has been discussed by several scholars (Fickers 2012, Sternfeld 2014, Zaagsma 2014, Föhr 2015). They plead for a critical reflection on the nature of sources in digital form and for an investment in digital skills to enable students and practitioners to apply digital tools in a professional manner and understand their potential, bias and limits. Critical reading and thinking are no longer enough in terms of safeguards, but have to be complemented with a more technical and mathematical understanding of digital phenomena (Scagliola 2016).
In addition to the traditional Rankian inquiry into the context in which a historical source came into existence, two additional processes of creation and possible manipulation need to be scrutinized. The first involves identifying alterations and loss of context that occur during the transformation from analog source to digital object (Fickers 2012, Treleani 2013). Transparency should be the norm, as to who was involved in the chain of digitization, what choices were made and what tools were used. If this is absent, the scholar must have enough contextual and technical knowledge to be able to identify this gap, reconstruct the missing information to the extent possible, and evaluate how it may influence the historical interpretation of the object.
The second process relates to a better understanding of the algorithm-based selection bias of search engines, since these increasingly determine our frame of reference and have also penetrated academic library systems (Van Dijk 2010, Vaidhyanathan 2009). It looks as if our earlier dependency on the policy of the national archive with regard to granting access to documents, based on national security and other concerns, has been replaced by a dependency on one of the biggest stakeholders in search technology: Google. The merits and perils of algorithm-based search technologies have been the object of academic debates and have led to reflections on the epistemology of the digital environment (Wouters et al. 2013, Liu 2014). However, these remain limited discussions between the ‘usual suspects within the community of DH scholars’. They do not seem to matter enough to push for reforming, if not revolutionizing, the curriculum.
Crap-Detection or Digital Philology?
The question we face is how to go about adjusting and adapting the classical humanities curriculum to the requirements of 21st century academic research. Where do we start? Should we make a distinction between general academic digital skills and those that are calibrated for specific fields of research such as history?
When observing the learning subject ‘methods of research’, which is often taught in the first year of a humanities bachelor curriculum, one gains the impression that with the ‘Googlelization of knowledge’ and the more general digitization of information (Vaidhyanathan 2009), topics that in the past belonged to distinctive (sub-)fields of research such as critical media studies, information science, literacy studies and education studies are now more and more alike. This calls for a renegotiation of boundaries and specification of what is distinctive about history.
When we look at the realm of education, the call for training young people in assessing the trustworthiness of what they consult and of what they engage with through social media is a recurrent feature. There are many initiatives aiming at making the use of digital media less dangerous for the novice in the field (Scanlon 2014, Cartelli 2013, Bellanca 2010). The writer Howard Rheingold has re-introduced Hemingway’s journalistic principles for ‘crap-detection’, and points to the importance of web resources that give advice on how to detect false information (Rheingold 2013).
However, when students enter academia with the intent to explore the world of historical narratives, philosophical concepts and general cultural heritage, will the possession of general critical media literacy be enough to avoid pitfalls? It seems that some special skills are needed. In addition to being able to distinguish fake from real, they should also be able to trace back the history of the various versions of a document. This philological inquiry in a digital environment requires understanding the back end of a digital document and sometimes requires applying forensic software to detect the trail of binary digits that each manipulation has left. Moreover, Web 2.0 and forthcoming Web 3.0 technology also require students and academics to be able to express their thoughts and insights in other ways than writing a text in the form of an essay. Therefore, digital source criticism, when applied to history, involves more than a mere critical reading of digital sources and writing of articles that are published online. It entails the active application of tools to trace and detect changes, and to create digital content. It is not just one more method as part of a wider repertoire of the historian’s craft; it is a new concept of conducting historical research. This has serious implications for what needs to be put in practice and consequences for its relationship to the existing curriculum. The latter has an established status with ingrained social practices, in which lecturers are involved who have put effort into it. Changing these practices requires patience and diplomacy.
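Tracing changes between versions of a digital document can be illustrated at a very basic level. The sketch below is an assumption-laden classroom example rather than a forensic tool: it fingerprints two hypothetical transcriptions with SHA-256 and lists the lines that differ using Python's standard `difflib` module. The helper names and the sample sentences are invented, and the comparison works on text only; it does not inspect file metadata or binary traces.

```python
import difflib
import hashlib


def fingerprint(text):
    """Return a SHA-256 hex digest; any change to the text changes the digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_lines(old, new):
    """List removed ('- ') and added ('+ ') lines between two versions."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Two invented versions of the same transcription.
version_1 = "The charter was sealed in 1432.\nIt names two witnesses."
version_2 = "The charter was sealed in 1434.\nIt names two witnesses."

same = fingerprint(version_1) == fingerprint(version_2)
print("Identical:", same)
for line in changed_lines(version_1, version_2):
    print(line)
```

The fingerprint only answers whether two versions are identical; the line comparison shows where they diverge, which is the point at which a historian's judgement about the alteration has to begin.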
OntheVergeofTransformationPassiveifnotactiveresistanceamonglecturerswhentryingtointroducedigitalmethodsinthehumanitiesisnotuncommon.Thisisoftenseenasbeinganinstinctivereactiontoprotectestablishedpositionsofpowerandexpertise(Scanlon2013,DeJongetall2011).Fearfornewtechnologiesanddistrustofrosypromisesaboutwhatsuchtechnologiescando,alsoplayarole.Anotherobstructiveelementcanbetherigidorganizationalstructureoftraditionalacademicteaching,thatisbasedonthetimespanoflecturesofjustoneortwohours.Thishardlyleavesspaceforlearningnewskillsletaloneexperimenting.(HendersonandRomeo2013).
ToexplorethespaceforthesubjectofDigitalSourceCriticismattheFacultyofHumanitiesoftheUniversityofLuxemburg,asmall-scaleuserstudywasconducted.7TheFacultyisasalientenvironmentfortestinginterestinDigitalSourceCriticism,asitisexperiencingconsiderableinstitutionalchanges.AsofOctober2016,thenewCentreforContemporaryandDigitalHistoryhasbeenestablished,thatwilltakeupinnovativeresearchandteachinginclosecollaborationwithitsformerbasis,theInstituteofHistory.
ThefirstpartoftheuserstudyconsistedofapresentationoftheenvisionedformatforlessonsonDSCduringthemainmeetingoftheInstituteofhistory,followedbyasurvey.
Theplanistocreateanappealingvideoessayaroundaparticulardatatypeinwhichthedigitalversionisproblematizedandcomparedtoitsanalogversion.Subsequentlystudentshavetoreadliteratureandconductresearch,andfinallycreateadigitalpublicationorobjectwithasimilartypeofdatawiththehelpofdigitaltools.Thesurveytocollectfeedbackonthisformatwassetoutto40colleaguehistorians,amixofprofessors,lecturersandPh.D.students.Thisyieldedninebenevolentresponses,whichallstressedtheimportanceofthetopic,butalsotheexistinglimitationstointegrateitintotheirlessons,duetolackofexpertiseandtime,andofspacewithinthelimitsoftheprescribedICTS.
Thenextstepwastoorganizefocusgroupswithcolleaguesfromthenewcenter.Fourmeetingswereheldwiththreetofourparticipants,amixofjuniorandseniorcolleagues.Inaddition,afewface-to-faceinterviewswereheld.Thebackgroundoftheparticipantsvaried,mostofthemwere
7 The consultation of lecturers is work in progress; it should be completed in the coming months and should yield a more solid foundation for designing and realizing Ranke.2, the new teaching platform on Digital Source Criticism.
historians, among whom media studies was overrepresented. Special weight was given to the feedback of an information scientist and of two historians specialized in digital methods, all three with ample teaching experience. Again, they were first shown the presentation on the ideal-typical format of the Digital Source Criticism lesson, after which three main questions were presented:
I. In what way is digital source criticism relevant for your research?
II. What do you regard as necessary digital skills for students (basic, academic, specific for historians)?
III. What would you choose to integrate in your courses: the video essay, the assignments, the hands-on component or a combination?
The feedback to the presentation and questions was in most cases recorded and later transcribed. In a few cases notes were jotted down during the interview. The most salient concerns and preferences that came out of the consultations are summarized below:
- The level of digital literacy when entering the university
The level of competences is too diverse because of a lack of systematic coverage of the topic in secondary education. An entrance test should be considered, so that gaps can be covered with individual training units.
- Limited time
Digital literacy and competences to deal with digital data are best taught in collaborative projects, which take up time because of the need to teach skills. Think of how much time it takes to learn to write according to academic standards. At the same time, lecturers of thematic courses consider digital source criticism a topic that belongs to the subject ‘research methods’ - a subject with a limited number of hours in the curriculum which is offered only once, most often in the first year of a bachelor. Most teaching is thematic and not about methods.
- The ‘branding’ of the term Digital Source Criticism is problematic
Creating a special term for this type of source criticism suggests it is a different and new practice. A lecturer of ‘methods of research’ suggested using the generic term Source Criticism, which can be applied to any source, regardless of whether it is in analogue or digital form.
- There is a need for continuity in the ‘framing’ of the problem
Some lecturers of media studies stated that giving too much attention to the transformation from analog to digital risks obscuring the many transformations and manipulations that already occur between analog media (e.g. in the process of editing newsreels). They prefer to frame the subject in a more general way, e.g. ‘reflecting on transformations’.
- The majority of researchers and PhD students work with non-digitized sources
Taking into account how many lecturers and researchers work with thematic subjects and with data and literature that is not digitized, it would be disproportionate to place Digital Source Criticism, a methodological topic, as a central subject on the curriculum. The principle of ‘hybrid’ research cultures should be emphasized, as it connects better to the dominant teaching practice.
Conclusion
To address such concerns, a smart communication strategy should be considered in which ‘digital source criticism’ is presented as a ‘hybrid concept’ that encompasses both differences and continuities in dealing with source criticism. What could be considered is to substitute the principle of a series of lessons that would take up much of the time in the curriculum with smaller teaching units with a digital component. These could be complementary in a thematic course, and more central in a methodological subject. A way to support this approach
would be to follow the pedagogical principle of the SAMR model, which stands for Substitute, Augment, Modify, Redefine. It was designed to gradually integrate technology into the curriculum (Puentedura 2014). The process starts with merely substituting tasks that have to be completed manually with a technology, and then gradually adding technological components to familiarize new users with the possibilities that they offer. This gradual process should lead to a redefinition of the original task.
This SAMR model approach is currently being considered as an instrument to realize the envisioned transition. At the same time, however, master and PhD students will be immersed in intensive DH collaborative courses with experimental components at the new centre.
The policy of combining gradual change with immersive and experimental learning could be the solution to create a common ground among different generations of historians and future generations of students of history.
References
James A. Bellanca (2010), 21st Century Skills: Rethinking How Students Learn, Solution Tree Press. See also: http://www.p21.org/about-us/our-history
Catherine Francis Brooks (2016), ‘Disciplinary convergence and interdisciplinary curricula for students in an information society’. In: Innovations in Education and Teaching International, http://www.tandfonline.com/toc/riie20/current
Antonio Cartelli (2013) (ed), Fostering 21st Century Digital Literacy and Technical Competency, Information Science Reference.
Jose Van Dijck (2010), Search engines and the production of academic knowledge. International Journal of Cultural Studies, 13(6). doi:10.1177/1367877910376582.
Andreas Fickers (2012), ‘Towards A New Digital Historicism? Doing History in the Age of Abundance.’ VIEW Journal of European Television History and Culture, 1(1).
Pascal Föhr, ‘Poster “Historical Source Criticism in the Digital Age”’, Historical Source Criticism, 31 March 2015, http://hsc.hypotheses.org/328.
Michael Henderson and Jeoff Romeo (2016), Teaching and Digital Technologies: Big Issues and Critical Questions, Cambridge University Press.
Rodney H. Jones and Christoph A. Hafner (2012), Understanding Digital Literacies; a Practical Introduction, Routledge.
De Jong, Ordelman, Scagliola, Audio-visual Collections and the User Needs of Scholars in the Humanities; a Case for Co-Development, Proceedings of Supporting Digital Humanities, 2011, Copenhagen. http://files.beeldengeluid.nl/pdf/r-en-d_audio-visual-collections-and-userneeds_dejong-ordelman-scagliola_20111117.pdf
Alan Liu (2014), “Theses on the Epistemology of the Digital: Advice For the Cambridge Centre for Digital Knowledge.” http://liu.english.ucsb.edu/theses-on-the-epistemology-of-the-digital-page
Ruben Puentedura (2014), SAMR and TPCK: A Hands-On Approach to Classroom Practice, http://www.hippasus.com/rrpweblog/archives/000140.html
Howard Rheingold (2013), http://rheingold.com/2013/crap-detection-mini-course/, retrieved 1-5-2017.
Kasper Risbjerg Eskildsen, ‘Leopold Ranke’s archival turn: location and evidence in modern historiography’, Modern Intellectual History, 5, 3 (2008), pp. 425–453. doi:10.1017/S1479244308001753
Eileen Scanlon (2014), Scholarship in the digital age: Open educational resources, publication and public engagement. British Journal of Educational Technology, 45: 12–23. doi:10.1111/bjet.12010
Matteo Treleani (2013), ‘Recontextualisation; ce que les média numériques font aux documents audiovisuels’, in: Réseaux, 1 (no 177), http://www.cairn.info/publications-de-Treleani-Matteo--99590.htm
Stefania Scagliola (2016), Digital Source Criticism in the 21st Century: Reconsidering Ranke’s Principle in the Digital Age, blog Digital History Lab, August 2016. http://www.dhlab.lu/blog-post/digital-source-criticism-inthe-21st-century-reconsidering-rankes-principles-in-the-digital-age/
Joshua Sternfeld (2014), ‘Historical Understandings in the Quantum Age’, Journal of Digital Humanities, Vol 3, nr. 2, http://journalofdigitalhumanities.org/3-2/historical-understanding-in-thequantum-age/
Siva Vaidhyanathan (2009), ‘The Googlization of Universities’, in: The NEA 2009 Almanac of Higher Education, 2009, http://www.nea.org/assets/img/PubAlmanac/ALM_09_06.pdf
Paul Wouters, Anne Beaulieu, Andrea Scharnhorst and Sally Wyatt (2013) (eds), Virtual Knowledge; Experimenting in the Humanities and the Social Sciences.
Gerben Zaagsma, ‘On Digital History’, BMGN - Low Countries Historical Review 128/4 (2013), 3-29.
3. Individual presentation: Video essays and the new possibilities for film criticism and pedagogy
Irina Trocan, Cinema and Media PhD, National University of Film and Theatre Bucharest
The shift of film criticism to the online sphere in recent years has led to a number of mutations, including the increase in popularity of a relatively new format: the video essay. Roughly an audiovisual version of film criticism - a mode of analysis that employs the discussed object (the cinematic work) directly - the video essay quotes the film even as it deconstructs it. It can therefore be easier to grasp without necessarily being simplified as discourse – a seven-minute clip can be as rich and thoughtful as a longform essay – and allows for the survival of intelligent film criticism in a rather dyslexic cultural environment.
The aim of this presentation is to summarize the current state of video essays and their aesthetic and didactic possibilities. In 2017, the history of video essays is simultaneously too short and too long. Since the form is roughly a decade old in popular view, in order to discern its influences, one would have to look beyond the practice itself to examine either the more timeworn tradition of essay cinema – the non-narrative films of Chris Marker, Jean-Luc Godard, Harun Farocki – or the audiovisual histories and TV broadcasts on the subject of cinema – Mark Cousins' The Story of Film: An Odyssey or A Personal Journey with Martin Scorsese through the American Cinema being popular examples. However, a decade of video-essay-making is also long enough for the form to have experienced its first moments of crisis and for attempts to theorize it to become increasingly difficult and dangerously reductive. For instance, video essays made circa 2014 were problematic in their over-reliance on voice-over (i.e. audio commentary of the author overlapped with the images), whereas in 2017, being aimed at social media distribution, several of them adopt the irrelevant/muted audio, text-on-screen format, thus placing all the weight on the visual component; faced with the newer pattern, commenters have gone from pleading for less voice-over to asking for more of it. This constantly changing media landscape makes it urgent to develop strategies for
aesthetic evaluation and curation of video essays – otherwise, the overproduction of online content will obscure the best ones and the more provocative possibilities of the form.
Essential (though understated) production guidelines
Due to their ability to quote from film with no need of processing it into a new language, popular video essays are often made from immediately striking fragments: striking film imagery (as in Stanley Kubrick films), dialogues (Aaron Sorkin-scripted one-liners), or even blatant juxtapositions (comparing two stylistically similar films in a split-screen, with the aim of proving just how much the later film borrows from the earlier, usually canonic one). However, their range of subjects largely overlaps with that of cinephile/pop online criticism: overviews of a certain artist's filmography, a certain genre, film festival, national cinema, trend of technical evolution in film craft.
There are already a few prominent platforms for launching video essays, which provide video-essayists with opportunities (on-the-job training, access to necessary media) even as they sometimes limit their creative options. The first and already most controversial is the video-on-demand platform Fandor with its annexed publication, Keyframe; others are the BFI/Sight & Sound website; the Netherlands-based platform Filmkrant; MUBI (also annexed to a VOD platform), and the most academic-oriented, [in]Transition (which is more similar to a distributor than a producer, to borrow terminology from the film market).
While there are also video-essayist 'superstars' with distinctive styles, for the sake of brevity, I will only focus on the institutional guidelines which they must follow. Studying these authors' work over several years proves that, even in this seemingly lax working process, shifting editorial demands can have a significant impact on what they produce and how widely it circulates. I would further argue that the formative training of the video-essayists (whether they are filmmakers, critics, academics) is itself only partly relevant to the rigor or whimsicality of their videographic criticism. Although the format is in rapid development and expansion, and making a video essay is hypothetically accessible to anyone who owns a computer and editing software, hierarchies and mandatory style markers can easily be traced among the most well-known video essays made to date, which once again indicates that the total creative freedom of the Internet is merely a utopian dream.
Challenges to the development of video essays
The difficulties of this new form tend to be pragmatic, since the video essays depend on very precarious factors. The first is their survival and continued availability in the online sphere, which the recent Fandor scandal - involving the withdrawal of several hundred video essays - has proved tenuous. The second is the legal circumstance of their right to exist, namely the Fair Use copyright exception: this states that clips of artwork can be used by individuals without permission and copyright ownership as long as the ultimate purpose is different from the straightforward exploitation of the material. A note on Fair Use in the brochure The Videographic Essay: Criticism in Sound and Image ends with a disclaimer that the authors merely offer peer advice – they are not, nor do they claim to be, lawyers.
Video essays as study material
Among the most remarkable feats of video essays is the popularization of film as theory – or audiovisual thinking. As Volker Pantenburg points out in his comparative study of Farocki and Godard, theory has thus far been predominantly linguistic, even when it is self-reflexive and proposes a break with the dominant “amalgam of structuralism, Lacanian psychoanalysis, post-structuralism, and Marxism”. As Pantenburg puts it, “writing against the film theories of the 1970s continues to assume a clear distinction between the films on the one side and their analysis and theorization on the other.”
Similarly, in his 2012 essay Visualization Methods for Media Studies, Lev Manovich could be talking about video essays when using terms like “collection montage” and claiming there is a future in visualization of media artifacts when grouping them by intrinsic, yet-unarticulated features: “the most important question, which is still unresolved, is how to combine distant and close readings”. For this, video essays could be a powerful tool of scholarship and a more complex way of conveying information than written language.
Bibliography
Eric Faden, Catherine Grant, Kevin B. Lee, Jason Mittell, The Videographic Essay: Criticism in Sound and Image, caboose books
Pantenburg, Volker, Farocki/Godard: Film as Theory (Film Culture in Transition), Amsterdam University Press 2015
Wees, William C. (1993), Recycled Images: The Art and Politics of Found Footage Films, Anthology Film Archives
Manovich, Lev (2001), The Language of New Media, MIT Press
Manovich, Lev (2012), Museum Without Walls, Art History Without Names: Visualization Methods for Humanities and Media Studies, manovich.net
Witt, Michael (2013), Jean-Luc Godard, Cinema Historian, Indiana University Press
Session C
1. The Pyramid of Conscientious Digital Humanities Research: how to get a ‘general idea of what you should be seeing’
Serge ter Braake, University of Amsterdam
‘The only way to know if your results are useful or wildly off the mark is to have a general idea of what you should be seeing.’8
The question of how to cope with a massive number of digital humanities texts, and the tools to process them, has led to publications on ‘algorithmic criticism’, ‘tool criticism’ and ‘data criticism’. What these publications have in common is the quest for a conscientious way to deal with tools and data, balanced with humanist domain knowledge and methodologies.9 Humanities texts can be poems that were written after a sudden burst of inspiration, well-crafted texts on the history of an empire, the innermost thoughts of a diary writer or conscientiously crafted bookkeeping accounts of long-gone rulers. The field of Digital Humanities tends to treat these texts quite badly. Texts are ripped out of their original contexts, chopped into pieces, linked to other texts, and used for analyses that go far beyond their original intentions.
Depending on the research question of the individual researcher, or research group, this ‘text ransacking’ is not necessarily a bad thing. Digital Humanities can, should, and does ask questions that go beyond the scope of texts that could be studied intensely by one human being. There are, however, plenty of dangers involved in using digital tools without really knowing what exactly they do. First of all there is the question of when we know enough of what a tool does to perform conscientious digital analyses. Secondly there is the question of whether we keep (enough) in touch with the material we study with digital methods. Where lies the domain knowledge threshold that is necessary to deal with digital data carefully? At what point do we have a ‘general idea of what we should be seeing’?
The danger of ‘black box tooling’ is increasingly getting attention.10 The danger of losing touch with the original source material requires some further explanation. For some humanities scholars, digital humanities research mainly extends the work they are already doing: same kind of data, larger approaches. When Father Roberto Busa initiated the Index Thomisticus in the 1940s, he obviously was already familiar with the work of Thomas Aquinas. When literary scholars want to study the language use in the works of Jane Austen we may assume they have already read quite a bit of
8 Megan R. Brett, ‘Topic Modeling: A Basic Introduction’, Journal of Digital Humanities, vol. 2, nr. 1, Winter 2012.
9 To cite only a few: On ‘algorithmic criticism’ the slightly dated but still insightful: S. Ramsay, Reading Machines: Toward an Algorithmic Criticism (Chicago 2011). On data criticism: Frederick W. Gibbs and Trevor J. Owens, ‘The Hermeneutics of Data and Historical Writing (2012 revision)’, in: Jack Dougherty and Kristen Nawrotzki eds., Writing History in the Digital Age (Michigan, 2013); On tool criticism: S. ter Braake, A.S. Fokkens, N. Ockeloen and C. van Son, ‘Digital History: towards new methodologies’ in: Bozic, Mendel-Gleason, Debruyne and O’Sullivan eds., 2nd IFIP Workshop on Computational History and Data-Driven Humanities (2016).
10 See for example the Tool Criticism Workshop in Amsterdam: http://event.cwi.nl/toolcriticism/; Albert Meroño-Peñuela, Ashkan Ashkpour, Marieke van Erp, Kees Mandemakers, Leen Breure, Andrea Scharnhorst, Stefan Schlobach, Frank van Harmelen, ‘Semantic Technologies for Historical Research: A Survey’, Semantic Web Journal, Volume 6, Number 6 (2015) 539-564; Ter Braake et al., ‘Digital History’.
Austen. These scholars certainly already have a general idea of what they could be seeing. When historians use large newspaper archives for digital research, however, including different newspapers spanning numerous decades, things become more complex. Historians are often experts on one or several historical topics, with the necessary archival sources attached to them. Few historians are experts on a wide variety of historical newspapers. This problem is compounded by the way digital tools deal with these newspapers. Text is transformed into ‘data’, taken away from the page and its surroundings, and transformed together with other pieces of text into an aggregated result.11
The questions I want to address here are:
1. When does a researcher know enough of a tool to use it conscientiously?
2. When does a researcher know his or her material well enough to use digital tools for distant reading analyses?
And finally, following from this:
3. At what point do we decide that the answers to 1) and 2) are not cost-efficient anymore? At what point should we decide that a ‘simple’ tool and close reading practices are more practical for humanist research than complicated tools used on large datasets?
If we want to visualise the interplay between researcher, algorithm, tool, interface and data, then we can come to a pyramid of conscientious digital humanities research, as visualised below. On top there is the humanist researcher, with all of his or her presuppositions acquired from prior knowledge. This researcher will mostly be working with an interface, but also has to understand the tool behind the interface and the data and algorithms behind the tool. If the humanist lacks a sufficient grasp of either the computer algorithms or the data that is used, the results that are
11 For example the ShiCo tool, tracing concepts through time: https://github.com/NLeSC/ShiCo. See for reflections on the loss of context C. Jeurgens, ‘The Scent of the Digital Archive: Dilemmas with Archive Digitisation’, BMGN - Low Countries Historical Review 128(4) (2013) pp. 30–54
provided by the tool through an interface may be misinterpreted, or significant errors may not be spotted.
In short, there should be a ‘general idea’ of what we could be seeing, both by knowing the tool and the data. In this presentation, I will present a proposal, a step-by-step plan, of what could be done to reach this general understanding by taking the example of my own research on concept drift in De Gids and Vaderlandsche Letteroefeningen, two nineteenth-century journals dealing with all kinds of topics of general interest. These steps include: 1) manual close reading; 2) digital close reading; 3) digital analysis; 4) criticism of the results; 5) reflection on steps 1 and 2: were they sufficient? 6) reflection on step 3: was this tool the best to use for this purpose?
When going through this cycle these questions should always be considered: at what point are the requirements for conscientious digital humanities research too high to be worth the effort? At what point is the pyramid too costly? When is it more efficient, and in fact conscientious, to settle for a ‘simpler’ tool? At what threshold should the digital make room again for more traditional humanities?
2. This is my ground truth, tell me yours: Potentials of multiple annotations for digital humanities
Berit Janssen, Meertens Institute, Amsterdam and Institute for Logic, Language and Computation, University of Amsterdam
Many methods in digital humanities rely on computational methods, which may be trained on a set of reference annotations, also referred to as ground truth. However, human judgements are rarely unanimous: this has led to research into how information from human judges can best be combined to increase knowledge of the “true” relationships in data (e.g., Dong et al., 2014). However, in many domains, for instance in music information retrieval, it may be assumed that multiple annotator judgements form equally valid interpretations of data such as music similarity or chord estimation (Koops, 2016; Schedl, 2014). The present contribution shows how multiple annotations can be used to reveal human strategies and knowledge by investigating how annotators may agree or disagree on different subgroups in data.
As an example, I present a dataset of annotations on phrase similarity in 360 Dutch folk songs.12 These folk songs are categorized into 26 groups of variants, or tune families. Three annotators worked independently to give labels to phrases within tune families. The labels consisted of a letter combined with a number, with which annotators could indicate similarity in three categories: “almost identical” (same letter and number), “related but varied” (same letter but different number), and “different” (different letter and number). The annotators did not agree on phrase similarity at all times, but with Fleiss’ κ = 0.71 (Fleiss & Cohen, 1973), the agreement was substantial.
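To make the agreement figure concrete, the following is a minimal sketch of how Fleiss’ κ can be computed for a fixed number of annotators. It is not the procedure used for the dataset above: the function name, the category names and the toy ratings are illustrative assumptions, and the abstract does not state how the letter-number labels were mapped to categories before κ was calculated.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for items that were each rated by the same number of raters.

    `ratings` is a list of items; each item is a list with one category
    label per rater. (Hypothetical helper, written for illustration only.)
    """
    if not ratings:
        raise ValueError("at least one rated item is required")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # Per item: how many raters chose each category.
    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement: sum of squared overall category proportions.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_expected = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    if p_expected == 1.0:
        # Degenerate case: only one category was ever used.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented toy judgements by three annotators on four phrase pairs.
toy_ratings = [
    ["identical", "identical", "identical"],
    ["related", "related", "different"],
    ["different", "different", "different"],
    ["identical", "related", "related"],
]
print(round(fleiss_kappa(toy_ratings), 3))
```

The value depends on how the labels are turned into categorical judgements, so a figure computed this way is only comparable to the reported κ if the same mapping is used.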
The dataset was used to evaluate pattern matching algorithms: these algorithms compared each phrase in the dataset against the melodies within the tune family from which the query phrase was taken, and returned a match score. For evaluation purposes, the three annotations were combined through a majority vote: if two or more annotators had given any phrase in a given variant the same
12 Available from liederenbank.nl/mtc
label as that of the query phrase, the variant was considered to contain an instance of the phrase, which a pattern matching algorithm should find (cf. Janssen, van Kranenburg & Volk, 2017).
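The majority-vote rule described above can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code of the study: the function name, the data layout (one label per annotator for the query phrase, one list of phrase labels per annotator for the variant) and the example labels are all hypothetical.

```python
def variant_contains_phrase(query_labels, variant_labels, min_votes=2):
    """Majority vote over annotators for one (query phrase, variant) pair.

    query_labels:   the label each annotator gave to the query phrase,
                    e.g. ["A1", "A1", "B2"] for three annotators.
    variant_labels: for each annotator, the labels that annotator gave to
                    the phrases of the variant, in the same annotator order.
    An annotator "votes" for the variant if any of its phrases carries the
    same label (same letter and number) as the query phrase.
    """
    if len(query_labels) != len(variant_labels):
        raise ValueError("need one label list per annotator")
    votes = sum(
        1 for query, labels in zip(query_labels, variant_labels) if query in labels
    )
    return votes >= min_votes


# Invented example: annotators 1 and 2 find the query phrase in the variant.
query = ["A1", "A1", "B2"]
variant = [["A1", "C0"], ["D0", "A1"], ["D3", "E0"]]
print(variant_contains_phrase(query, variant))
```

With three annotators, `min_votes=2` corresponds to the "two or more annotators" criterion; the parameter is exposed only so that the sketch also covers other panel sizes.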
The added value of combining multiple annotations is that next to the evaluation of pattern matching algorithms, also the annotators themselves may be compared to the majority vote. This comparison shows that individual annotators agree around 87% with the majority vote: they miss about 10% of the relevant phrase instances, and find about 10% irrelevant occurrences, as compared with the majority vote. Flexer and Grill (2016) showed how such inter-rater disagreement introduces an upper bound for various tasks in music information retrieval.
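The comparison of a single annotator with the majority vote can be sketched in the same spirit. The definitions below are assumptions made for illustration: "missed" is taken as the share of majority-vote instances the annotator did not mark, and "spurious" as the share of the annotator's own instances that the majority vote rejects; the abstract does not spell out the exact denominators, and the toy data are invented.

```python
def compare_to_majority(annotator, majority):
    """Compare one annotator's yes/no judgements with the majority vote.

    Both arguments are equally long lists of booleans, one entry per
    (query phrase, variant) pair: True means "the variant contains an
    instance of the phrase".
    """
    if not annotator or len(annotator) != len(majority):
        raise ValueError("need two non-empty lists of equal length")

    agree = sum(1 for a, m in zip(annotator, majority) if a == m)
    missed = sum(1 for a, m in zip(annotator, majority) if m and not a)
    spurious = sum(1 for a, m in zip(annotator, majority) if a and not m)
    relevant = sum(1 for m in majority if m)   # instances per majority vote
    found = sum(1 for a in annotator if a)     # instances per annotator

    return {
        "agreement": agree / len(annotator),
        "missed": missed / relevant if relevant else 0.0,
        "spurious": spurious / found if found else 0.0,
    }


# Invented toy judgements for five (phrase, variant) pairs.
annotator = [True, True, False, False, True]
majority = [True, False, True, False, True]
print(compare_to_majority(annotator, majority))
```

Running the same function separately per tune family would give the distribution of disagreement over families that the next paragraph discusses.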
The current work presents a way to learn from inter-rater disagreement: the dataset is categorized into tune families, which form homogeneous groups of melodies with high distinctiveness between groups. An analysis of the distribution of disagreement with the majority vote over tune families reveals that individual annotators disagree with the majority vote in different ways, such that some tune families lead to few disagreements for one annotator, but many disagreements for another annotator. This differs from the errors produced by the three best-performing pattern matching algorithms: they show similar trends over the tune families, such that a tune family in which one algorithm produces many irrelevant results will also be more difficult to handle for other algorithms. This suggests that the strategies of the compared pattern matching algorithms may be similar, while the annotators bring different strategies to the table.
References
Dong, X. L., Gabrilovich, E., Heitz, G., Horn, W., Murphy, K., Sun, S., & Zhang, W. (2014). From data fusion to knowledge fusion. Proceedings of the VLDB Endowment, 7(10), 881-892.
Fleiss, J. L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33(3), 613-619.
Flexer, A., & Grill, T. (2016). The Problem of Limited Inter-rater Agreement in Modelling Music Similarity. Journal of New Music Research, 45(3), 239-251.
Janssen, B., van Kranenburg, P. & Volk, A. (2017, in press). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. Journal of New Music Research.
Koops, Hendrik Vincent, et al. "Integration And Quality Assessment Of Heterogeneous Chord Sequences Using Data Fusion." International Society for Music Information Retrieval Conference. 2016.
Schedl, M., Gómez, E., & Urbano, J. (2014). Music information retrieval: Recent developments and applications. Foundations and Trends in Information Retrieval, 8(2-3), 127-261.
3. Digital History Projects as Boundary Objects
Max Kemman, University of Luxembourg, max.kemman@uni.lu
Digital history is concerned with the incorporation of digital methods in historical research practices. Thus, digital history aims to use methods, concepts, or tools from other disciplines to the benefit of historical research, making it a form of methodological interdisciplinarity (Klein, 2014). This requires expertise of different facets, such as history, technology, and data management, and as a result many digital history activities are a collaboration of scholars and professionals from different backgrounds.
Such collaborations would fit Svensson’s characterisation of digital humanities as a fractioned trading zone (Svensson, 2011, 2012). Simply stated, this means first that digital humanities functions as heterogeneous collaborations, i.e., with participants from different disciplinary backgrounds, and second that the participants act voluntarily.
In this paper, we will investigate these two aspects in the context of digital history to understand how digital history projects function as heterogeneous collaborations, and what the participants’ incentives are for entering such collaborations.
We will look at digital history projects as boundary objects, a concept developed by Leigh Star and Griesemer to describe an object that maintains a common identity among the different participants, yet is shaped individually according to disciplinary needs (Star and Griesemer, 1989; Star, 2010). This concept could be used for example to refer to the tool under development, or the data on which the tool and historian will work. However, in this paper we will approach the project itself as boundary object; the project binds the participants together, and all participants subscribe to a common description of the project’s goals, while at the same time the participants shape the project according to their own needs. As one digital history project coordinator described it in an interview:
“[Y]ou have a research idea, and you fit that to the call you’re applying to, and then you get funding… And if you then hire researchers, yes they too have their own idea of course, and their own line of research they’re working on, and they try to fit that in the research project.”
This leads us to investigate the incentives for collaboration. When interdisciplinary collaboration in digital history is written about, this is almost always done to underscore its positive or even necessary effects (e.g. Eijnatten et al., 2013; Hitchcock, 2014; Sternfeld, 2011). However, such collaboration is not trivial and requires dedication and investments from all involved, as shown e.g. by Siemens (2009; 2012). In order to investigate the activities of individual participants we will follow the work of Weedman on incentives for collaborations between earth scientists and computer scientists (1998). For several digital history projects based in the Benelux, we have interviewed the participants and inquired about their reasons for joining the project, their individual goals with the project, and the expected effects of their participation after the project has ended. For example, in an interview one historian noted about their project:
“[W]e’re supposed to be advising the team developing the tool. And trying to then carry out research on a specific case study. And so originally it was like wow we’re going to be able to use the tool, but very quickly it became clear ok actually probably we’re not going to be able to use the tool.”
By looking into the incentives of all the participants of a project, we will unpack the trading zones of digital history projects, to gain an understanding of how heterogeneous, interdisciplinary collaborations work, and how participants shape these collaborations. This will allow us to look into why a situation as described above by this historian occurs, and how individual shaping of the project can lead to this. Moreover, we will argue that these incentives go beyond disciplinary boundaries, which means that the trading zone in a digital history project is more complex than the (in)famous Two Cultures as described by C.P. Snow.
This research is part of PhD research on how the interdisciplinary interactions in digital history affect the practices of historians on a methodological and epistemological level (Kemman, 2016). By unpacking digital history projects, we aim to gain better insight into how digital history functions as a coordination of practices between historians and collaborators from different backgrounds, and how individual incentives shape this coordination.
References
Eijnatten, J. van, Pieters, T., and Verheul, J. (2013). Big Data for Global History: The Transformative Promise of Digital Humanities. BMGN - Low Countries Historical Review, 128(4): 55–77.
Hitchcock, T. (2014). Big Data, Small Data and Meaning. Available from: http://historyonics.blogspot.co.uk/2014/11/big-data-small-data-and-meaning_9.html.
Kemman, M. (2016). Dimensions of Digital History Collaborations. DHBenelux. Belval, Luxembourg.
Klein, J. T. (2014). Interdisciplining Digital Humanities: Boundary Work in an Emerging Field. University of Michigan Press, online edition.
Leigh Star, S. (2010). This is Not a Boundary Object: Reflections on the Origin of a Concept. Science, Technology & Human Values, 35(5): 601–617.
Leigh Star, S. and Griesemer, J. R. (1989). Institutional Ecology, ‘Translations’ and Boundary Objects: Amateurs and Professionals in Berkeley’s Museum of Vertebrate Zoology, 1907-39. Social Studies of Science, 19(3): 387–420.
Siemens, L. (2009). ‘It’s a team if you use “reply all”’: An exploration of research teams in digital humanities environments. Literary and Linguistic Computing, 24(2): 225–233.
Siemens, L. and INKE Research Group (2012). From Writing the Grant to Working the Grant: An Exploration of Processes and Procedures in Transition. Scholarly and Research Communication, 3(1).
Sternfeld, J. (2011). Archival theory and digital historiography: Selection, search, and metadata as archival processes for assessing historical contextualization. American Archivist, 74(2): 544–575.
Svensson, P. (2011). The digital humanities as a humanities project. Arts and Humanities in Higher Education, 11(1-2): 42–60.
Svensson, P. (2012). Beyond the Big Tent. In Gold, M. K., editor, Debates in the Digital Humanities. University of Minnesota Press, online edition.
Weedman, J. (1998). The Structure of Incentive: Design and Client Roles in Application-Oriented Research. Science, Technology & Human Values, 23(3): 315–345.
SessionD
1.ModellingandAnalyzingCharacterNetworksinRecentDutchLiteratureRoelSmeets(PhDcandidate)RadboudUniversityNijmegen,DepartmentofLiteraryandCulturalStudies
Keywords:socialnetworkanalysis,characternetworks,DigitalLiteraryStudies,Dutchliterature
CharacterrelationsWhenweinterpretnovelsweareinfluencedby(hierarchical)relationsbetweencharacters.Theserelationsarenotneutral,butvalue-laden:e.g.thewayinwhichweconnectClarrisawithRichardisofmajorimportanceforourinterpretationofthegenderrelationsinMrsDalloway(1925).Inliterarystudies,characterrelationshavethereforelainatthefoundationofavarietyofcriticalstudiesonliterature(e.g.Minnaard2010,Song2015).Abasicpremiseinsuchcriticismisthatideologicalbiasesareexposedinthe(hierarchical)relationsbetweenrepresentationsofcertaingroups(i.e.gender,ethnicity,socialclass).
Close reading – the common, traditional method in literary studies – is well suited for fine-grained analyses of the nuances and subtleties of character relations, but falls short when it comes to finding patterns among character relations or testing hypotheses on character relations in larger bodies of literary texts (cf. Stronks 2013).
Social Network Analysis
In computational linguistics, a broadening range of research has been carried out in recent years on the computational analysis of social networks in (literary) texts (e.g. Elson et al. 2010, Karsdorp et al. 2012). On the basis of automated, computational models, character relations of all kinds are formalized and mapped in large amounts of text. Although in its infancy, this branch of research shows that social networks can in fact be reliably extracted automatically from narrative texts (Van de Camp 2016), and relationships can also be classified accurately by computational models trained on examples, e.g. as being romantic (Karsdorp et al. 2015).
The current research project departs from the hypothesis that a computational approach to character relations can reveal (hierarchical) patterns between characters in literary texts in a more data-driven and empirically informed way. In order to test this hypothesis, experiments are being conducted with different forms of social network analysis of characters in a corpus of 170 recent Dutch literary novels. The two major methodological challenges are:
1. to define the nodes that constitute the social network of a novel
2. to define and to weight the relations between the nodes
The first methodological challenge is about doing a form of character detection: NLP techniques such as Named Entity Recognition and Resolution, pronominal resolution and coreference resolution come to mind. However, automatic character detection in literary texts is far from a convenient classification task (Vala et al. 2015).
The second methodological challenge is about finding a way to decide when and how two or more characters in a text ‘interact’. When Franco Moretti in his famous book Distant Reading (2013) made a character network of Shakespeare’s Hamlet, he did that on the basis of occurrences of character X (the addressee) in the lines of character Y (the speaker). Novels are fundamentally different from dramatic plays in that respect: characters in novels usually don’t speak to each other in a direct way, and the definition and weighting of character interaction therefore requires a different approach.
Top-down and bottom-up approach
In this talk I will argue that a practical combination of manually gathered data and computational analysis can yield insight into patterns between character relations in recent Dutch literature. Instead of using a bottom-up approach of character detection, I will start top-down using a predefined list of names of characters from each novel in my corpus. Furthermore, I will use manually gathered data from earlier research to ascribe demographic features to the characters that constitute the nodes of the network (Van der Deijl et al. 2016). As such, it will be possible to relate demographic backgrounds of characters to their respective place in the character network of the novel. More data are currently being gathered manually from the research corpus: thematic relations such as family, friend, lover, colleague and enemy, which will be used to depict the nature of the relations between the characters in the corpus.
I will demonstrate in this talk how manually gathered data (demographic features and thematic relations) can be used for defining both the nodes of the network and the nature of the relations between the nodes. Moreover, I will show how a top-down approach based on manually gathered data can be complemented and enriched by a bottom-up, computational analysis of co-occurrences, which will be used for weighting the relations (or: interactions) between the character nodes. The co-occurrence analysis will consist of precisely delineated textual windows (on the sentence level) that are searched for different tokens (variants of names, pronouns) referring to specific character entities in adjacency with tokens belonging to other character entities.
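The sentence-level co-occurrence weighting described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the character names and their variants are hypothetical, the sentence splitter is deliberately naive, and pronouns are not resolved at all, although the abstract mentions them as tokens of interest.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical name variants per character entity (not taken from the corpus).
CHARACTERS = {
    "Anna": {"anna", "mevrouw de vries"},
    "Bram": {"bram"},
    "Carla": {"carla"},
}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def characters_in(sentence, characters):
    """Return the set of character entities mentioned in one sentence."""
    lowered = sentence.lower()
    found = set()
    for entity, variants in characters.items():
        if any(re.search(r"\b" + re.escape(v) + r"\b", lowered) for v in variants):
            found.add(entity)
    return found

def cooccurrence_weights(text, characters):
    """Count, per pair of entities, the sentences in which both are mentioned."""
    weights = Counter()
    for sentence in split_sentences(text):
        present = sorted(characters_in(sentence, characters))
        for pair in combinations(present, 2):
            weights[pair] += 1
    return weights
```

The resulting counts could serve as edge weights between the predefined character nodes; a wider or narrower window than the sentence would change the weights, which is why the window has to be delineated precisely.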
References
Camp, Matje van de. 2016. A link to the past: Constructing Historical Social Networks from Unstructured Data. PhD thesis, Tilburg University (Tilburg School for Humanities).
Deijl, Lucas van der, Pieterse, Saskia, Prinse, Marion & Smeets, Roel. 2016. ‘Mapping the Demographic Landscape of Characters in Recent Dutch Prose: A Quantitative Approach to Literary Representation.’ In: Journal of Dutch Literature (7:1).
Elson, David, Dames, Nicholas & McKeown, Kathleen. 2010. ‘Extracting Social Networks from Literary Fiction’. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), Uppsala.
Karsdorp, Folgert, Kranenburg, Peter van, Meder, Theo & Antal Van den Bosch. 2012. ‘Casting a spell: Identification and ranking of actors in folktales.’ In: F. Mambrini, M. Passarotti, and C. Sporleder (eds.), Proceedings of the Second Workshop on Annotation of Corpora for Research in the Humanities (ACRH-2), pp. 39–50.
Karsdorp, Folgert, Kestemont, Mike, Schöch, Christof, & Bosch, Antal van den. 2015. ‘The Love Equation: Computational Modeling of Romantic Relationships in French Classical Drama.’ In: Proceedings of the Sixth International Workshop on Computational Models of Narrative, pp. 98-107.
Minnaard, Liesbeth. 2010. ‘The Spectacle of an Intercultural Love Affair: Exoticism in Van Deyssel's Blank en geel’. In: Journal of Dutch Literature (1:1).
Moretti, Franco. 2013. Distant Reading. London: Verso.
Song, Angeline M. G. 2015. A Postcolonial Woman’s Encounter With Moses and Miriam. New York: Palgrave Macmillan US.
Stronks, Els. 2013. ‘De afstand tussen close en distant. Methoden en vraagstellingen in computationeel letterkundig onderzoek’. In: Tijdschrift Voor Nederlandse Taal- en Letterkunde (4).
Vala, Hardik, Jurgens, David, Piper, Andrew & Ruths, Derek. 2015. ‘Mr. Bennet, his coachman, and the Archbishop walk into a bar but only one of them gets recognized: On the difficulty of detecting characters in literary texts.’ In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 769–774, Lisbon, Portugal, Association for Computational Linguistics.
2. Spinozist discourse in Dutch textual culture (1660-1720): A computational approach to the dissemination of the Radical Enlightenment
Lucas van der Deijl, University of Amsterdam
Lia van Gemert, University of Amsterdam
Erik van Zummeren, University of Amsterdam
Contact: [email protected]
Keywords: Spinozism, Radical Enlightenment, topic modeling, discourse analysis, text mining
Since the linguistic turn, the term ‘discourse’ has been an important instrument for many humanities scholars (Bové 1995). It has become common practice to study cultural history through the language and discussions in which it was mediated. Currently, the growing availability of digitised historical material provides new ways and scales to study historical discourses, which have been recognised by digital humanities scholars at an early stage (Olsen & Harvey 1988). However, digital approaches to historical corpora face the problem that the often loosely defined term ‘discourse’ is not easy to formalise. In traditional literary studies, the very lack of definition is inherent to the influential post-structuralist paradigm that reinvented the term, in which meaning is considered ‘indefinite’ by definition. Within this tradition, discursive elements are measured through both manifest and latent semantic relations, with an equal focus on what is said and what is left out, forgotten or suppressed. Quantitative methods, on the contrary, require a more reductive understanding of what a discourse comprises (e.g. Jockers 2013; Ramsay 2011). They primarily rely on information represented in computationally measurable text elements, which challenges the traditional use of the term. Digital Humanities thus promise new opportunities for cultural history, but also require a critical translation of traditional methodology.
A dominant approach in the study of intellectual discourses focuses on concepts (e.g. Mandelbaum 1965; Lovejoy 2001; Kuukkanen 2008). Philosophers and computational linguists have created models and methods in order to account computationally for conceptual change or drift through time (Betti & Van den Berg 2014; Kenter et al. 2015). Secondly, studies that employ digital text analysis to approach historical discourses often use ‘topics’ as a representation or indication of discursive patterns in large text corpora (e.g. Nelson 2010). Topic modeling is a useful technology for narrowing down a research corpus into a selection that could be of interest to the researcher. The method also allows tracing the evolution of dominant themes over time. It is especially useful when the researcher has no strong intuitions about the corpus: the power of topic modeling is its independence from assumptions (Underwood 2012). The use of topics as a measure for ‘discourse’ in the traditional sense is, however, problematic. A topic is formally defined as a ‘distribution [of words] over a vocabulary’ and is no more than a set of words that are statistically likely to co-occur in a given text (Blei 2012). A discourse in the Foucauldian sense comprises (historical) values, shared assumptions, ‘common sense’, associations, automated modes of writing and thinking, which constitute and regulate power relations through language and intertextuality (e.g. Foucault 1977; Bové 1995). When following Foucault’s notion of discourse, collocations – the basic linguistic element for topic modeling – could be misleading. The operationalisation of discourses through topics may be intuitive, but is theoretically far from evident.
The study of the dissemination of concepts and discourses is especially relevant in the context of the so-called Radical Enlightenment, a movement of proto-Enlightenment intellectual innovation in which Spinoza played a key role (Israel 2001; Jacob 1981; Krop 2014). As a result of the explosive theological and scientific debates that threatened the stability of the Republic throughout the seventeenth century, radical discourses that challenged orthodox-Calvinist doctrine were firmly suppressed through censorship and prosecution of authors, publishers and printers (Israel 1997). In spite (or because) of this censorship, radical discourses circulated ‘underground’, in clandestine publications and circuits (cf. Darnton 1982). Many cultural historians have also indicated how authors communicated radical ideas indirectly and ambiguously through literary genres such as novels and pornography (Van Bunge 2003; Elias 1974; Leemans 2002; Wortel 2006). The Foucauldian meaning of ‘discourse’ as a possible means for the reinforcement of power relations becomes evident during the Radical Enlightenment.
Rather than elaborating on the theoretical difference between topics, concepts and discourses on an abstract level, this paper demonstrates it through a case study. It presents computer-assisted discourse analysis as an approach to a specific historical question: how did Spinozist philosophy disseminate into a ‘Spinozist’ discourse in early modern Dutch textual culture (1660-1720)? In this study, Spinozist philosophy was reduced to a set of characteristic concepts (cf. De Bolla 2013), which were identified through tf-idf (note 13) frequency analyses and then refined by hand. The concepts were represented as networks of co-occurring words in seventeenth-century Dutch translations of eight works written by the philosopher, translated by Pieter Balling (?–1664) and J.H. Glazemaker (1620-1682) (Thijssen-Schouten 1967; Steenbakkers 1999) (note 14). These conceptual networks were used as a measure to identify Spinozist ‘discourse’ in a corpus of 500 texts published between 1660 and 1720. For pragmatic reasons, the vocabularies were assumed to be stable, but this paper addresses possible advancements based on the literature on conceptual and linguistic drift (Betti & Van den Berg 2014; Kenter et al. 2015). Also, conventional procedures applied in computational intellectual history were modified in order to reduce the problems caused by spelling variation in historical Dutch (e.g. in Herbelot et al. 2012; Tangherlini & Leonard 2013).
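To make the tf-idf step concrete, the following sketch ranks the terms of one document against a small reference set. It is an illustration only: the abstract does not say which tf-idf variant was used, so the plain formulation below (relative term frequency times the logarithm of the inverse document frequency) is an assumption, the token lists in any usage are hypothetical, and nothing is done here about spelling variation.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: dict mapping a document name to its list of tokens.

    Returns a dict mapping each document name to {term: tf-idf score}.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))  # count each term once per document
    scores = {}
    for name, tokens in documents.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores

def top_terms(scores, name, k=3):
    """Highest-scoring terms of one document; ties are broken alphabetically."""
    ranked = sorted(scores[name].items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]
```

A term that occurs in every document receives a score of zero, which is why such a ranking surfaces vocabulary that is characteristic of one author rather than merely frequent; the manual refinement mentioned above would then start from the top of this list.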
The results obtained through the concept-oriented ‘top-down’ approach are contrasted with a more ‘bottom-up’ transformation of the corpus based on topic modeling. This paper evaluates the differences between both approximations of Spinozist discourse and shows how Spinozist texts unknown to the computer were successfully identified and described. Based on these results, it formulates a working hypothesis on the dissemination of Spinozist discourse in Dutch textual culture and advances the debate on the resonance of (Radical) Enlightenment ideas with computational results (Darnton 1982; Israel 2001; Leemans 2002; Edelstein 2010 etc.).
13 ‘term frequency – inverse document frequency’.
14 Korte verhandeling van God, de mensch en deszelvs welstand (1660-1661); Renatus Des Cartes Beginzelen der wysbegeerte, I en II bewezen (1664); Aanhangzel, over-natuirkundige gedachten (1664); Handeling van de verbetering van 't verstant (1667); Zedekunst, In vijf delen onderscheiden (1677); Brieven Van verscheide geleerde Mannen Aan B.d.S (1677); Staatkundige verhandeling (1677); De Rechtzinnige Theologant, of godgeleerde staatkundige verhandeling (1693).
References
Betti, A. & H. van den Berg, ‘Modelling the History of Ideas’. British Journal for the History of Philosophy 22 (2014) 4: 812-835.
Blei, D., ‘Probabilistic Topic Models’. Communications of the ACM 55 (2012) 4: 77-84.
Bolla, P. de, The Architecture of Concepts. The Historical Formation of Human Rights. New York 2013.
Bové, P.A., ‘Discourse’. In: F. Lentricchia & T. McLaughlin, Critical Terms for Literary Study. Chicago 1995: 50-64.
Bunge, W. van, ‘Philopater, de radicale Verlichting en het einde van de Eindtijd’. Mededelingen van de Stichting Jacob Campo Weyerman 26 (2003): 10-19.
Darnton, R., The literary underground of the Old Regime. Cambridge (MA) 1982.
Elias, W., ‘Het spinozistische erotisme van Adriaan Beverland’. Tijdschrift voor de Studie van de Verlichting 2 (1974): 283-320.
Edelstein, D., The Enlightenment. A genealogy. Chicago 2010.
Foucault, M., ‘The Archeology of Knowledge and the Discourse on Language’. Trans. A. Sheridan. New York 1977.
Gemert, L. van, ‘Stenen in het mozaïek. De vroegmoderne Nederlandse roman als internationaal fenomeen’. Tijdschrift voor Nederlandse Taal- en Letterkunde 124 (2008) 1: 20-30.
Herbelot, A., E. von Redecker, J. Müller, ‘Distributional techniques for philosophical enquiry’. Proceedings of the 6th EACL Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities. Avignon 2012: 45-54.
Israel, J., ‘The banning of Spinoza’s works in the Dutch Republic’. In: C. Berkvens-Stevelinck e.a. (red.), The emergence of tolerance in the Dutch Republic. Leiden 1997.
Israel, J., Radical Enlightenment. New York 2001.
Jacob, M.C., The radical Enlightenment. Pantheists, freemasons and republicans. London 1981.
Jockers, M., Macroanalysis. Digital Methods and Literary History. Urbana 2013.
Kenter, T.M., M. Wevers, P. Huijnen & M. de Rijke, ‘Ad Hoc Monitoring of Vocabulary Shifts over Time’. Proceedings of the 24th ACM International Conference on Information and Knowledge Management. Melbourne 2015.
Krop, H., Spinoza. Een paradoxale icoon van Nederland. Amsterdam 2014.
Kuukkanen, J.M., ‘Making Sense of Conceptual Change’. History and Theory 47 (2008): 351-372.
Leemans, I., Het woord is aan de onderkant. Radicale ideeën in Nederlandse pornografische romans 1670-1700. Nijmegen 2002.
Lovejoy, A.O., ‘The Historiography of Ideas’. Proceedings of the American Philosophical Society 78 (1938): 529-543.
Lovejoy, A.O., The Great Chain of Being. A Study of the History of an Idea. Cambridge, MA/London 2001 [1964].
Mandelbaum, M., ‘The History of Ideas. Intellectual History, and the History of Philosophy’. History and Theory 5 (1965): 33-66.
Nelson, R.K., ‘Mining the Dispatch’, 2010. [http://dsl.richmond.edu/dispatch/pages/home]
Olsen, M. & L.G. Harvey, ‘Computers in Intellectual History: Lexical Statistics and the Analysis of Political Discourse’. The Journal of Interdisciplinary History 18 (1988) 3: 449-464.
Ramsay, S., Reading Machines. Towards an Algorithmic Criticism. Urbana: University of Illinois Press, 2011.
Siebrand, S.J., Spinoza and the Netherlands. An inquiry into the early reception of his philosophy. Dissertation Rijksuniversiteit Groningen 1980.
Steenbakkers, P.M.L., ‘Benedictus de Spinoza. Een overzicht.’ Filosofie 9 (1999) 6: 4-14.
Tangherlini, T.R. & P. Leonard, ‘Trawling in the Sea of the Great Unread: Sub-corpus topic modeling and Humanities research’. Poetics 41 (2013) 6: 725-749.
Thijssen-Schouten, C.L., Uit de Republiek der Letteren. Elf studiën op het gebied der ideeëngeschiedenis van de Gouden Eeuw. Den Haag 1967.
Underwood, T., ‘Topic modeling made just simple enough’. Online 2012. [https://tedunderwood.com/2012/04/07/topic-modeling-made-just-simple-enough/]
Wortel, D., ‘Vrouwen in mannenkleren en Spinoza. De Kloekmoedige Land- en Zee-Heldin (1682) als verpakking van de filosofie van Spinoza’. In: Spiegel der Letteren 48 (2006): 27-55.
Session E
1. Building a Conceptual Architecture and Data Model to address the Sustainable Data Integration Problem
George Bruseker, Maria Theodoridou, Martin Doerr (ICS-FORTH)
Research Infrastructures (RI) seeking to provide a unified resource set to their user community tend to begin with the elaboration of a new model for unifying a domain of discourse and then seek out the institutional and political support to undertake mappings to the defined common structure. These projects are undertaken with the critical aim of facilitating broad resource access within the domain of interest. Such projects, however, notably face strong challenges both in terms of defining an adequate model and, then, in sustaining a mapping and aggregation process which is unavoidably time-consuming and expensive. While such resource integration projects undoubtedly serve a crucial role in research environments, an essential aspect of this process seems to be consistently overlooked. Data are fundamentally heterogeneous in nature - a state that cannot be avoided - and are in a process of continuous potential or actual change. Further, actors managing resources change composition, status and activities. This quickly creates the potential for obsolescence of any integrated data environment as the indexed resources inevitably change.
It seems, then, that value can be had from a new approach that focuses on making integration sustainable and useful in the long run by modelling and managing the integration process itself. By modelling this meta-meta level and providing a data structure for the tracking of the same, we argue, it is possible to provide the necessary management structures for building ground-up and on-demand aggregation which will meet the aims of this process both in the present and into the future. This paper will outline the proposal of a new conceptual architecture to support highly scalable integration activities for developing ever more integrated pools of resources and a conceptual model capable of representing the data required to drive this process.
The proposed conceptual architecture has at its core a registry that is a logically if not physically distinct data structure that holds data pertaining to the activities of RIs and their members themselves, the resources they provide and the manner in which they do so. The registry maintains the picture of who has and does what and where resources are, as well as their level of compatibility with other resources. The data requirements of this registry are extremely light in order to form as little a barrier as possible to participation in such a service by potential partners. The basic functional relationships that are tracked to allow the long-term management and control of resources are: part-of, metadata-of and indexed-by. Additional metadata is only requested in order to help disambiguate entities in the registry and to support its readability by the operators. In the proposed architecture, source metadata and data as well as their multiple mappings remain in a content cloud which can be either data held by trusted providers who guarantee their maintenance or, otherwise, can be copied into a stable storage facility at the time of registration. The registry is intended to enable decisions with regard to the management of data, based on the high-level view of resources, wherever they may reside across the data cloud. Such decisions could include: identifying datasets for an integration, identifying gaps in coverage, connecting orphaned datasets to appropriate curators, following up with service providers with regard to availability/quality of service etc.
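A registry of this kind can be pictured as a very small graph of identifiers and typed links. The sketch below is a hypothetical illustration, not the Parthenos implementation: only the three relation names come from the text, while the class name, the `kind` labels and the gap-finding query are assumptions made for the example.

```python
from dataclasses import dataclass, field

# The three functional relationships named in the text.
RELATIONS = ("part-of", "metadata-of", "indexed-by")

@dataclass
class Registry:
    kinds: dict = field(default_factory=dict)  # identifier -> kind (assumed labels)
    links: set = field(default_factory=set)    # (source, relation, target) triples

    def register(self, identifier, kind):
        """Record a resource with a deliberately light description."""
        self.kinds[identifier] = kind

    def relate(self, source, relation, target):
        """Track one functional relationship between two registered resources."""
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        for end in (source, target):
            if end not in self.kinds:
                raise KeyError(end)
        self.links.add((source, relation, target))

    def unindexed_datasets(self):
        """Datasets with no 'indexed-by' link: one kind of coverage gap."""
        indexed = {s for s, r, _ in self.links if r == "indexed-by"}
        return sorted(i for i, k in self.kinds.items()
                      if k == "dataset" and i not in indexed)
```

Because the registry stores only identifiers and links, the data themselves can stay in the content cloud, and a management question such as "which datasets are not yet indexed anywhere" becomes a simple query over the links.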
In order to support the proposed architecture, it is necessary to propose a new conceptual model describing integration processes themselves. This is the function of the Parthenos Model. Built off an analysis of the registries of existing RIs, it aims to model the fundamental resources and relations that are of interest to manage in integration. Identified through this process were a number of fundamental entities, the study of whose relations drove the model development. These are: services, projects, datasets, software, and actors. What was of interest in the model was to understand the nature of these objects not as such but as they play a role within RIs. Taking this scope into account allowed for strong analytic distinctions of the high-level entities of interest, delivering a compact model of approximately 38 classes and 50 relations.
Particular modelling challenges include defining the functional role of services and collections. Service plays a central, if often overlooked, role in RI discourse. It is what binds assets to actors and allows for effective communication between agents on a scientific and technical level. A particular challenge was to model service beyond the scope of e-services and to understand the full range of its meaning. This led to the definition of service as a willingness and ability for someone to take action to the benefit of some other agent. Modelling service at this generic level and then providing high-level classes for hosting, curating and e-services allows a highly flexible description of the various kinds of service RIs provide to their members, notably including non-IT related services. Another particular modelling challenge faced was to address the perennial question of what constitutes a ‘collection’. The fact of the plurality of an object is easily modelled through part-of relations, but this misses an aspect of the phenomenon that ‘collection’ tries to express. Consideration of this question in relation to the context of service allowed for a highly useful new conceptualization, distinguishing persistent and volatile digital objects. The former are static information objects whose identity is fixed at the bit level and have an objectively identifiable existence over time from their structure. A volatile digital object, however, has no fixed identity in itself, since it undergoes continuous change and modification. It inherits an identity from the fact that it is an object under curation, the activity of a curation service, undertaken with some specific plan. By making reference to the service of curation and its plan, we can identify volatile digital objects or ‘collections’ over time.
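The distinction between the two kinds of digital object can be illustrated with two small classes. This is a sketch under assumptions: the class and attribute names are hypothetical, and using a SHA-256 digest to stand for "identity fixed at the bit level" is a choice made for the example rather than something the model prescribes.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class PersistentObject:
    """Static information object: identity follows from its bits alone."""
    content: bytes

    @property
    def identity(self):
        # Assumption: a content hash stands in for bit-level identity.
        return hashlib.sha256(self.content).hexdigest()

@dataclass
class VolatileObject:
    """Object under curation: identity comes from the service and its plan."""
    curation_service: str
    curation_plan: str
    members: list

    @property
    def identity(self):
        # The members may change; the curating service and plan do not.
        return (self.curation_service, self.curation_plan)
```

Adding or removing members leaves the identity of a `VolatileObject` untouched, whereas changing a single byte of a `PersistentObject` yields a different object, which mirrors how a ‘collection’ can be followed over time while its contents change.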
The proposed shift in focus from domain modelling to modelling of RI integration processes themselves is currently being tested within the Parthenos Project, where the architecture and model are being implemented. The model is being developed and validated through an iterative process of mapping from the participating RIs’ registries to the model for integration in the registry. The mapping process is being undertaken using the X3ML toolkit for writing declarative mappings. Once populated, the registry will be used to get an overview of the integrated resource capacities of the joined RIs and determine appropriate deep-level integrations. The technologies to run the aggregation and the subsequent VREs are provided through the GCube and D4Science systems. To date, the model has shown itself robust against basic revision and flexible enough to describe this high-level management picture of integration.
2. Improving data quality in Europeana by designing extensive EDM records - The Universitätsbibliothek Heidelberg study case
Pierre-Edouard Barrault, Valentine Charles, Antoine Isaac (Europeana Foundation, Prins Willem-Alexanderhof 5, 2595 BE, The Hague, The Netherlands)
Introduction
For this paper, we have worked on improving the results of the mapping process from the METS (note 15) to EDM (note 16) schemas, for metadata records associated with cultural heritage objects. We chose to present the case of the Universitätsbibliothek Heidelberg (note 17), the library of Heidelberg University, which was founded in 1386 and is Germany's oldest university and one of the world's oldest surviving universities. Its magnificent collection of about 25,000 records (note 18) contains parchments (note 19) and early printed books from the 14th century until the Modern Age, as well as books, magazines and newspapers from the 19th century onward, in various languages including French (note 20), German, Italian and Spanish. It is without any doubt a solid accomplishment for an old book digitization project, demonstrating the value added by respecting both content integrity, thanks to high digitization standards coupled with the IIIF framework, and informational quality, through rich, highly structured, open data. In addition, the institution offers its collection under the Creative Commons Attribution-ShareAlike (BY-SA) open license, allowing for free re-use (note 21).
On the other hand, Europeana Collections (note 22) is a European platform partnering with cultural institutions to centralize, in an open online database, all metadata and content related to cultural heritage objects available across Europe. The platform acts as a search engine to explore these collections, offers a set of curated channels focused on specific themes, and also makes several Web services available that can be used by developers, creatives and researchers for tackling and re-using digital cultural resources.
Prior to this experiment, the collection of the Universitätsbibliothek Heidelberg in Europeana was based on harvests of the institution's OAI-PMH server, which exposed metadata under the ESE schema. We used to receive limited metadata records in which multiple values for a given field were mapped into only one instance of this field. Fields such as dc:date, dc:type and dc:subject were affected. Having several strings packed into a single metadata field with separators prevents the Europeana automatic semantic enrichment from detecting the appropriate string and enriching the record based on the matching string. Other shortcomings stemmed from the lack of language attributes or relevant hierarchical data.
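The multi-value problem can be illustrated with a small sketch that splits a packed field into separate instances, so that each value can be matched on its own. The separator characters, the function names and the sample subject values are assumptions made for the example; the text does not specify which separators occurred in the harvested records.

```python
import re

def split_values(raw, separators=";|"):
    """Split one packed string into its individual, trimmed values."""
    pattern = "[" + re.escape(separators) + "]"
    return [part.strip() for part in re.split(pattern, raw) if part.strip()]

def expand_record(record, fields=("dc:date", "dc:type", "dc:subject")):
    """Return a copy of the record in which every field holds a list of values.

    Only the listed fields are split; all other fields keep their single value.
    """
    expanded = {}
    for name, value in record.items():
        if name in fields:
            expanded[name] = split_values(value)
        else:
            expanded[name] = [value]
    return expanded
```

With one value per field instance, an enrichment step can compare each string against a vocabulary separately instead of failing on the concatenated whole.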
15 See http://www.loc.gov/standards/mets/mets-schemadocs.html
16 See http://pro.europeana.eu/share-your-data/data-guidelines/edm-documentation
17 See http://www.uni-heidelberg.de/index.html
18 Europeana records for this institution: http://www.europeana.eu/portal/en/search?view=grid&q=PROVIDER%3A%22Universit%C3%A4tsbibliothek+Heidelberg%22&per_page=96
19 See Heidelberger Schicksalsbuch (Heidelberg Book of Fate), 1491 http://www.europeana.eu/portal/en/record/07932/diglit_cpg832
20 See Le Sifflet: journal humoristique de la famille (Le Sifflet: humorous family newspaper), 1872 http://www.europeana.eu/portal/en/record/07931/diglit_sifflet1872.html?q=PROVIDER%3A%22Universit%C3%A4tsbibliothek+Heidelberg%22
21 See http://creativecommons.org/licenses/by-sa/4.0/
22 See http://www.europeana.eu/portal/en
IIIF implementation
We focused our work on this specific provider with the hope of improving its collections, which were already available in Europeana Collections, with the IIIF (note 23) features they had implemented on their side. This open technological framework can be implemented within content management systems to enable deep visualisation features (zoom, crop, effects), and to make image sharing easier on the Web.
The main aim of this experiment was to implement IIIF metadata elements, which were not present in the data previously submitted by this institution to the Europeana Collections database. After investigating the available data on the institution’s side, we decided to harvest METS records as this was a much richer metadata source, regarding both IIIF core elements and metadata range and quality.
Data quality
Even if metadata improvements are not always obvious on a result page in the Europeana Collections portal, they nevertheless have a strong impact on search and overall findability. Ingestion of reliable data therefore contributes to ensuring a cohesive experience for its users, from
In the case of digital cultural heritage, high-quality datasets could be defined as an ensemble of standardised (such as LOD resources), granular, specific, relevant and consistent metadata, associated with high quality visualisation standards. The nature of the records themselves should obviously be taken into consideration when defining the overall strategy. For instance, OCR (note 24) techniques would make sense in the case of text documents, while focusing on high digitization standards would better suit photographs. Data quality is nevertheless critical to support users' focused discovery scenarios (note 25), and a long-term strategy to improve it should be considered de facto by any cultural institution, as a lever to reach a wider audience.
By using another metadata source from the Universitätsbibliothek Heidelberg, we refined and improved the overall data quality by relying on Linked Open Data resources from the GND authority vocabulary maintained by the German National Library (note 26), which were available in the original METS records. We therefore included, as systematically as possible, the provided URIs of resources related to agents, concepts and places. This approach follows LOD implementation best practices: only links to resources are provided in the ingested records, and then Europeana de-references them, fetching all the available metadata for each provided URI.
We also applied stricter conditions to the mapping in order to preserve the semantic precision and granularity of the original data as much as possible. This was done by choosing more specific metadata fields, and rejecting irrelevant ones. We focused on core metadata elements related to typology, format, temporal and geographical information. We also created an ad hoc description field in order to provide more physical location information to users.
23 See http://iiif.io/
24 See https://en.wikipedia.org/wiki/Optical_character_recognition
25 Most Europeana users rely on the search functionality, and 59% of them use extra filtering options to refine their search. More than half of the users search items based on specific geographical location. (Source: Europeana Collections Online Survey, April 2016)
26 See http://www.dnb.de/EN/Standardisierung/GND/gnd_node.html
Further normalization was done for agents related to these records (e.g. creators and contributors), which were previously sent without any role distinction. We disambiguated the mapping of these elements using the MARC Relators codes (note 27) originally embedded in the METS records, such as “aut”, which represents “Author”. The codes were used to identify the agents as creators or contributors, and then were normalized into strings to be directly incorporated into the resulting EDM records as additional metadata.
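The role disambiguation can be sketched as a lookup on the relator code. This is an illustration under assumptions rather than the mapping actually used: only “aut” as “Author” comes from the text; the other codes are standard MARC relator codes chosen for the example, the decision as to which codes count as creators is assumed, and so is the “Name (Role)” string format.

```python
# MARC relator codes and their labels ("aut" is the example given in the text).
RELATOR_LABELS = {
    "aut": "Author",
    "ill": "Illustrator",
    "edt": "Editor",
    "trl": "Translator",
}

# Assumption: these codes are treated as creators, everything else as contributors.
CREATOR_CODES = {"aut", "ill"}

def map_agents(agents):
    """agents: list of (name, relator_code) pairs taken from a source record.

    Returns a dict with separate creator and contributor lists, each entry
    carrying the role as a readable string.
    """
    result = {"dc:creator": [], "dc:contributor": []}
    for name, code in agents:
        target = "dc:creator" if code in CREATOR_CODES else "dc:contributor"
        label = RELATOR_LABELS.get(code, code)  # fall back to the raw code
        result[target].append(f"{name} ({label})")
    return result
```

Keeping the lookup table separate from the creator/contributor decision makes it easy to adjust either one when a provider uses the codes differently.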
Finally, hierarchical relationships that were not made available in the original conversion were represented in the new metadata. We focused on records for individual journals encompassed in bigger volumes, and mapped the relevant metadata - references to parent and children records - within hierarchical fields. This enabled a better experience for end users thanks to the display of a widget dedicated to browsing hierarchical resources by following their order or their membership.
Results
The first outcome of this work is an extensive report presenting this case study, standing as data guidelines available in the Pro section of Europeana Collections (note 28). However, our results rely on both qualitative and quantitative achievements.
The overall data improvement empowers Europeana users - creatives, searchers, the curious - with higher quality results, allowing them to tailor their experience even further from the main public access. Specific data reuse or data mining scenarios also benefit from such an experiment, thanks to Europeana’s REST API (note 29). In addition, the compatibility with the IIIF framework ensures a seamless user experience carried out through extended visualisation features. This can be transposed into more advanced applications by directly reusing the aggregated IIIF metadata from Europeana, e.g. within Digital Humanities visualisation projects.
Finally, the updated datasets didn’t necessarily grow in size, record-wise. But instead of the former rule of one thumbnail per record (for about 25K records), the newly added IIIF metadata now enables Europeana’s viewer to fetch more than 3.5M high-resolution pictures (more than 1600px wide) from all the connected JSON manifests (note 30).
27 See http://www.loc.gov/marc/relators/
28 See http://pro.europeana.eu/share-your-data/data-guidelines/edm-case-studies/the-universitaetsbibliothek-heidelberg-case-study
29 See http://labs.europeana.eu/api/introduction
30 See http://iiif.io/api/annex/notes/jsonld/#greedy-compaction-of-terms
3. Easing Access to Linked Data Resources for Digital Humanities Scholars
Albert Meroño-Peñuela 1 and Rinke Hoekstra 1,2
1 Computer Science Department, Vrije Universiteit Amsterdam, NL {albert.merono,rinke.hoekstra}@vu.nl
2 Faculty of Law, University of Amsterdam, NL
Abstract. Semantic Web technology comprises a variety of languages, standards and practices that, over the last two decades, have facilitated the emergence of the Linked Open Data (LOD) Cloud – a global Web graph of more than 100 billion interconnected statements [1]. Datasets in this LOD cloud cover a variety of domains, including geography, government, life sciences, linguistics, media, publications and social networking. Despite this success in integrating data on the Web, Semantic Web technology is still very present at every level of the LOD cloud. This includes the early layer of accessing Linked Data; that is, the mechanism by which users select and grab the data they consider for their applications or analyses. Accessing Linked Data requires certain technical skills – mostly involving understanding of the Resource Description Framework (RDF) [6] and the SPARQL [7] query language, but also others such as SQUIN [3] or Linked Data Fragments [8] – that very often exclude potential users. In the digital humanities, many scholars lack this technical knowledge, and consequently miss a great deal of LOD sources of interest to them. This includes, but is not limited to, multiple linked datasets on historical statistics (e.g. CEDAR [2], CLARIAH [4]), museum collections (e.g. Amsterdam, British Museum, Smithsonian), linguistic resources (e.g. lexinfo, BabelNet), and media (e.g. MusicBrainz, BBC, New York Times, Linked Movie Database). Although these scholars are becoming more and more tech-savvy, deep knowledge of technology should not be a strict requirement for accessing Linked Data. In order to address this issue, we propose grlc [5], a Linked Data accessing server that uses SPARQL queries stored anywhere on the Web to generate comprehensive, well documented, neatly organized, and provenance-trusted API specifications. Such APIs make any Linked Data actionable, making access to Linked Data sources easy, repeatable and shareable with one single URI entry point. grlc relies on the Swagger UI (note 31), an OpenAPI (note 32) frontend, to present these APIs to the user as an intuitive user interface. In this demo, we will show how grlc can help ease the traditionally high technical requirements to access Linked Data. We will illustrate this with several running use cases in CLARIAH (note 33), a Dutch national project to build digital infrastructure for the humanities.
Keywords: Linked Data, API, REST, SPARQL, #LD, Web Data access, middleware, OpenAPI
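The general idea of turning a stored SPARQL query into a documented API operation can be sketched as follows. This is a toy illustration, not grlc's actual parser or output format: the `#+` comment keys, the `?_name` placeholder convention, the example endpoint and the query itself are all assumptions made for the sketch.

```python
import re

# A hypothetical stored query; the "#+" comment lines carry its documentation.
QUERY = """#+ summary: Municipalities recorded in a given census year
#+ endpoint: http://example.org/sparql
SELECT ?municipality WHERE {
  ?municipality <http://example.org/year> ?_year .
}
"""

def describe_query(name, text):
    """Derive a minimal API operation description from one query file."""
    meta = {}
    for line in text.splitlines():
        match = re.match(r"#\+\s*(\w+):\s*(.+)", line)
        if match:
            meta[match.group(1)] = match.group(2).strip()
    # Variables written as ?_name are treated as parameters of the operation.
    params = sorted(set(re.findall(r"\?_(\w+)", text)))
    return {
        "path": "/" + name,
        "summary": meta.get("summary", ""),
        "endpoint": meta.get("endpoint"),
        "parameters": params,
    }
```

A scholar would then only need to know the path and its parameters, for example a year, while the SPARQL stays in the stored query file where it can be versioned and shared.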
References
1. Abele, A., McCrae, J.P., Buitelaar, P., Jentzsch, A., Cyganiak, R.: Linking Open Data cloud diagram. http://lod-cloud.net/ (2017)
2. CEDAR Project, http://www.cedar-project.nl/
3. Hartig, O.: Squin: A traversal based query execution system for the web of linked data. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. pp. 1081–1084. SIGMOD ’13, ACM, New York, NY, USA (2013), http://doi.acm.org/10.1145/2463676.2465231
4. Hoekstra, R., Meroño-Peñuela, A., Dentler, K., Rijpma, A., Zijdeman, R., Zandhuis, I.: An Ecosystem for Linked Humanities Data. In: Proceedings of the 1st Workshop on Humanities in the Semantic Web (WHiSe 2016), ESWC 2016 (2016)
5. Meroño-Peñuela, A., Hoekstra, R.: grlc Makes GitHub Taste Like Linked Data APIs. In: The Semantic Web: ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 – June 2, 2016, Revised Selected Papers. pp. 342–353. Springer (2016)
6. The World Wide Web Consortium (W3C): Resource Description Framework (RDF). http://www.w3.org/RDF/
7. The World Wide Web Consortium (W3C): SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query/
8. Verborgh, R., Sande, M.V., Colpaert, P., Coppens, S., Mannens, E., van de Walle, R.: Web-Scale Querying through Linked Data Fragments. In: Proceedings of the 7th Workshop on Linked Data on the Web (LDOW 2014), WWW 2014 (2014)
31 See http://swagger.io/swagger-ui/
32 See https://www.openapis.org/
33 See http://www.clariah.nl/en/
37
Session F
1. The Nederlab research environment: an update
Hennie Brugman & Antal van den Bosch
Meertens Institute, Amsterdam
[email protected]
Nederlab³⁴ (Brugman, 2016) is a five-year 'NWO groot' project building a research infrastructure primarily for historians and literary, linguistic and cultural scholars. Building this infrastructure involves activities in three main tracks:
1. Acquisition, harmonisation/semantic mapping, text enrichment and metadata curation of a substantial number of existing (historical) Dutch digital text collections of our academic and cultural heritage partners in the Benelux.
2. Improving the quality of the output of existing language processing tools when they are applied to historical Dutch texts from 800 until the present.
3. Building a virtual research environment with a powerful search backend for exploration, search and analysis of metadata and annotated text from our very large aggregated and integrated collections (Brouwer, 2016).
We are currently in the last year of our project. Therefore, in our contribution we would like to take the opportunity to evaluate to what extent we have been able to implement our original, ambitious project use cases. We intend to support this evaluation with a demonstration at DHBenelux 2017.
In general, we expect to have processed between twenty and thirty collections by the end of our project and to have made those available to the research community. At the moment of writing, we have reached a total of almost ten billion words of annotated text, accessible through our online Virtual Research Environment (VRE), the 'research portal'³⁵. During the last year of our project we are carrying out a number of scientific pilot projects in an open call, to test the usability of this VRE and the Nederlab collections, and to add extensions based on real user needs.
Below we will zoom in on our original categories of use cases.
1. Detecting the onset of change
When do new concepts occur for the first time? Or new word forms? Or word combinations (collocations)?
By the end of our project we will have collection data for all periods between 800 and the present, thereby enabling full diachronic searches. Our Nederlab research portal is able to visualise time distributions over all hits found for specific queries, for both document and hit counts, showing absolute as well as relative frequencies (for example, the number of occurrences of 'vliegtuig' (airplane) for each year). The system supports complex queries for sequential patterns over multiple parallel annotation layers using the Corpus Query Language (CQL) (2), a query language introduced by the Corpus WorkBench (CWB) and regularly used in our domain (e.g. by Sketch Engine, MTAS, BlackLab). Nederlab uses Multi Tier Annotation Search (MTAS)³⁶. Searching for patterns using CQL, in combination with grouping of results, enables researchers to investigate word combinations and how often they occur, for specific periods in time. For example, it is possible to query for the most frequent nouns used in sentences containing the lemma 'varen', for each century, to investigate potential shifts in meaning over time (in this case from 'go' to 'go by boat').

34 www.nederlab.nl
35 www.nederlab.nl/onderzoeksportaal
36 https://meertensinstituut.github.io/mtas/
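To make the pattern queries described above more concrete, the following minimal Python sketch assembles CQL token patterns as plain strings. The helper names `token` and `sequence`, the layer names `lemma` and `pos`, and the tag value `N` are illustrative assumptions; the annotation layers and tagsets actually exposed by Nederlab may be named differently. The sketch only builds query strings and does not contact any service.

```python
def token(**constraints: str) -> str:
    """Build one CQL token pattern, e.g. token(lemma="varen") -> [lemma="varen"]."""
    if not constraints:
        return "[]"  # an empty pattern matches any single token
    # Layer names are sorted so that the generated string is deterministic.
    parts = [f'{layer}="{value}"' for layer, value in sorted(constraints.items())]
    return "[" + " & ".join(parts) + "]"


def sequence(*patterns: str) -> str:
    """Join token patterns into a sequential pattern (tokens in this order)."""
    return " ".join(patterns)


# A noun, followed by any one token, followed by a form of the lemma 'varen'.
# The layer names and the tag "N" are hypothetical.
query = sequence(token(pos="N"), token(), token(lemma="varen"))
print(query)
```

Values are inserted verbatim, so a real implementation would also need to escape quotation marks and regular-expression characters in the search terms.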
2. Establishing the spread of changes
How do such changes spread, over time, over places, from one text type to another, from one author to another?
Our system allows users to search for words or patterns and visualise the results as distributions over many metadata dimensions, even over multiple dimensions simultaneously (e.g. time and genre). It is also possible to directly compare time distributions for different search terms simultaneously (using a 'trends' visualisation, e.g. 'mensch' versus 'mens') (Tjong Kim Sang, 2016).
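Comparing two search terms over time only makes sense once raw hit counts are normalised by the amount of text available for each period. The sketch below shows one simple way to do that in Python; the function name, the per-million scaling and all counts are invented for illustration and are not taken from the Nederlab portal.

```python
def relative_frequencies(hits_per_year, tokens_per_year, per=1_000_000):
    """Return hits per `per` corpus tokens for every year that has corpus text."""
    return {
        year: per * hits_per_year.get(year, 0) / total
        for year, total in sorted(tokens_per_year.items())
        if total > 0  # skip years without any text to avoid dividing by zero
    }


# Toy counts (hypothetical): corpus size and hits for two spelling variants.
tokens_per_year = {1900: 2_000_000, 1910: 4_000_000}
mensch_hits = {1900: 400, 1910: 200}
mens_hits = {1900: 100, 1910: 600}

mensch_trend = relative_frequencies(mensch_hits, tokens_per_year)
mens_trend = relative_frequencies(mens_hits, tokens_per_year)
print(mensch_trend, mens_trend)
```

With these toy numbers the older spelling falls while the newer one rises, which is the kind of pattern a 'trends' view is meant to expose.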
3. Finding connections and networks
Find and investigate motives using semantic word fields around concepts. Establish relations between persons and places.
We already support expansion of queries with historical variants using a web service built around the Dutch historical lexicon by the Instituut voor de Nederlandse Taal (INT). We intend to generalize and extend this query expansion mechanism to include semantic expansion and expansion with user-defined domain lexica. We will do this in collaboration with a number of our ongoing scientific pilot projects. An example of such a domain lexicon is a semantic lexicon containing emotion words.
Networks of persons and places can be charted on the basis of the named entities that were added to our corpus during the enrichment process. We use CQL searching in combination with grouping functionality to do this (e.g. list the most frequently mentioned persons in sentences or paragraphs containing the location 'Deventer').
4. Detecting similarities and differences between texts
Investigate reuse of text fragments among authors. Compare texts or text collections with corpus analysis tools.
For individual texts or for any subcollection of texts from our complete corpus we can query for statistics. We can determine total numbers of documents, tokens and types, but also the mean and median number of words per document; in fact, our system can return complete word count distributions that can be directly visualised. Other supported statistics are numbers of sentences, paragraphs, divisions and heads, and frequency lists over words or over any of the annotation layers, for any subcollection of our corpus. All of these statistics and lists can in principle be used to compare text documents or complete document collections. All statistics can also be exported for further analysis in external tools such as R.
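The basic statistics listed above can be illustrated with a small Python sketch. It assumes that each document is already available as a list of tokens; the function name and the toy subcollection are hypothetical, and the sketch says nothing about how the Nederlab backend computes these figures.

```python
from statistics import mean, median


def corpus_statistics(documents):
    """Summarise a non-empty subcollection given as a list of token lists."""
    lengths = [len(doc) for doc in documents]
    all_tokens = [tok for doc in documents for tok in doc]
    return {
        "documents": len(documents),
        "tokens": len(all_tokens),      # running words
        "types": len(set(all_tokens)),  # distinct word forms
        "mean_length": mean(lengths),
        "median_length": median(lengths),
    }


# Toy subcollection of three very short documents.
sample = [["de", "kat", "zit"], ["de", "hond"], ["de"]]
stats = corpus_statistics(sample)
print(stats)
```

Two subcollections can then be compared by calling the function on each and inspecting the two dictionaries side by side.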
Conclusion
After a number of years of constructing the foundations of our infrastructure, the project is now at a stage where we can start using it for real research pilots or projects. Although there is substantial room for improvement on many aspects of our products, our initial aims are within reach.
References
Brouwer, Matthijs, Hennie Brugman, Marc Kemps-Snijders (2016). ‘MTAS: A Solr/Lucene based Multi Tier Annotation Search solution’, CLARIN Annual Conference 2016, Aix-en-Provence, France, 26-28 October 2016.
Brugman, Hennie, Martin Reynaert, Nicoline van der Sijs, René van Stipriaan, Erik Tjong Kim Sang, Antal van den Bosch (2016). ‘Nederlab: Towards a Single Portal and Research Environment for Diachronic Dutch Text Corpora’, in: Proceedings of LREC (10th edition of the Language Resources and Evaluation Conference), 23-28 May 2016, Portorož (Slovenia), pp. 1277-1281.
Christ, O. (1994). A Modular and Flexible Architecture for an Integrated Corpus Query System. In: Proceedings of COMPLEX ’94: 3rd Conference on Computational Lexicography and Text Research, Budapest.
Tjong Kim Sang, Erik (2016). 'Finding Rising and Falling Words', in: Proceedings of the COLING 2016 workshop Language Technology Resources and Tools for Digital Humanities, ACL, Osaka, Japan, 2016. http://ifarm.nl/erikt/papers/lt4dh2016.pdf
2. Modeling the evolution of languages through text mining
A proposed methodology applied to the transition between Latin and Romance vernaculars
Florian Cafiero and Remy Verdo
The mechanisms at stake in the passage from a “dilated diasystem”, where a language becomes more and more complex, to a “disconnected diasystem”, where two distinct linguistic systems appear in the same cultural system, seem to be a well-studied problem.
For instance, several models have been presented to describe the evolution from Latin to Romance vernaculars in the past decade. The first model proposed to address this question is Ernst Pulgram’s (Pulgram, 1950: 462). In this pioneering work, however, the Latin language is represented according to the traditional written vs. oral distinction, which does not allow a very detailed analysis. Its deterministic approach might also lead to some inaccuracies, the language being considered as always further from “old Latin” as time goes by. In 1986, Walter Berschin (Berschin, 1986: 148) proposed a more comprehensive, two-sided diachronic model. The concept of vulgar Latin is more refined here, as it includes both written and spoken language. Yet here too, vulgar Latin is separated from literary Latin, the “stylistic level” (Stilhöhe) of which, even at its lowest, never crosses, or even touches, the curve of oral language. What is more, this vulgar Latin is supposed to evolve linearly, as in Pulgram’s work. The curve modeling literary Latin seems to represent only the evolution of the higher register of language that is observed. It disregards the co-existence of different registers of language in literary Latin, and ignores their articulation with vulgar Latin. Last, we can only regret the absence of data taken from “diplomatic” texts, in which stylistic and pragmatic efforts are also to be noticed.
Hence, those studies raise a few problems. They do not address the problem of registers well, using very broad distinctions and overlooking the possibility that different language registers could be used at the same time, even in the same text. They are also “expert views”, based on the author’s extensive experience rather than on a systematic analysis of the texts.
We thus propose a methodology to systematically study the evolution of a language from one form to another, taking into account our remark on registers. This methodology involves computerized statistical analysis and artificial intelligence, but should not be seen as an automated process disconnected from the linguist’s analysis. On the contrary, it has been designed as a way to extend the way of thinking of a particular expert. It makes it possible to partially re-create the expert’s own point of view and to apply it to a large amount of text that would otherwise take too long to analyze.
The first step consists of “traditional” linguistic analysis of a selection of texts, aimed at differentiating several registers used inside the texts of one’s period of interest.
Our sample corpus consists of three hagiographical texts and twenty-one diplomatic texts. The three hagiographical texts were written in the later Merovingian or early Carolingian age (ca. 650-780), then rewritten during the Carolingian Renaissance (from 780 to the death of Charles the Bald in 877, or so). The diplomatic texts are 21 original Frankish royal charters dating from ca. 665 to 868. Most of them record a judgment. Originally part of the great collection of the monastery of Saint-Denis, they are kept in the French National Archives.
We isolate five language registers in this sample corpus, consistent with Michel Banniard’s work (Banniard, 2008), and we design a table of criteria to characterize them.
We then go through a calibration phase. We apply various computing methods that can help isolate the different language registers used in the various texts of the corpus, or inside one text of the corpus. This initially calls for unsupervised methods, as we would not want to influence the outcome of the computations. The statistical analysis could reveal divisions we ignored or highlight unnoticed phenomena. We try clustering algorithms such as k-means, hierarchical clustering, and various neural networks. We then compare the performance of those algorithms with that of supervised algorithms, for which our sample corpus is used as training data.
Crucial for these analyses is the way we choose to present the texts to our algorithms. Lemmatizing the texts would remove too much information. Here, even small variations, such as variations in written form, are likely to be significant; they can sometimes be even more significant than the grammatical structure of the texts itself. This is why we apply our computations to two types of version of our corpus texts. In the first, the texts are treated as a list of words, either without any further treatment or restricted to a selection of the most frequent words. In the second, the texts are treated as n-grams (for 3 < n < 8), either without any further treatment or restricted to a selection of the most frequent forms. N-grams can perform very well here, as they implicitly take into account the structure of the sentences (here, which word comes after which).
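The second representation can be sketched in a few lines of Python. The text above does not say whether the n-grams are built over characters or over words, so the helper below accepts any sequence and works for both; the function names, the default range of n from 4 to 7 (that is, 3 < n < 8) and the Latin sample phrase are illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter


def ngrams(sequence, n):
    """Return all contiguous n-grams of a string or token list as tuples."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def ngram_profile(sequence, n_values=range(4, 8), top=None):
    """Count n-grams for every n in n_values; keep the `top` most frequent (all if None)."""
    counts = Counter()
    for n in n_values:
        counts.update(ngrams(sequence, n))
    return counts.most_common(top)


# Word-level n-grams keep word order; character-level n-grams keep spelling variation.
words = "in dei nomine amen".split()
word_4grams = ngrams(words, 4)
char_4grams = ngrams("nomine", 4)
print(word_4grams, char_4grams)
```

Passing `top` restricts the profile to the most frequent forms, which mirrors the "selection of the most frequent forms" variant mentioned above.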
We compare all those findings with our own “expert” model designed on our sample, and select the solution that gives the most accurate division into registers.
We then run the selected algorithm on an extended corpus, formed by a large selection of texts written during the same period (650-877), and follow the evolution of the registers across time on this broader corpus. We conclude by assessing the global consistency of these results with the model we designed by analyzing our first sample.
BIBLIOGRAPHY
Michel Banniard, « Du latin des illettrés au roman des lettrés : la question des niveaux de langue en France (viiie-xiie siècle) », in Zwischen Babel und Pfingsten: Sprachdifferenzen und Gesprächsverständigung in der Vormoderne (9.-16. Jh.): Akten der 3. deutsch-französischen Tagung des Arbeitskreises « Gesellschaft und individuelle Kommunikation in der Vormoderne » (GIK) in Verbindung mit dem Historischen Seminar der Univ. Luzern, Höhnscheid (Kassel), 16-19 nov. 2006, Peter von Moos ed., Münster, 2008 (« Gesellschaft und individuelle Kommunikation in der Vormoderne », 1), p. 269-286.
W. Berschin, Biographie und Epochenstil im lateinischen Mittelalter, Stuttgart, t. 3: Karolingische Biographie, 750-920, 1991.
Piera Molinelli, « Per una sociolinguistica del latino », in Latin vulgaire – latin tardif : actes du VIIe colloque international sur le latin vulgaire et tardif (Séville, 02-06 septembre 2003), éd. Carmen Arias Abellán, Séville: Universidad de Sevilla, 2006, p. 463-474.
Giovanni Polara, « Problemi di ortografia e di interpunzione nei testi latini di età carolina », Grafia e interpunzione del latino nel Medioevo (Roma, sept. 1984), éd. Alfonso Maieru, Rome, 1987.
Ernst Pulgram, « Spoken and written Latin », Language. Journal of the Linguistic Society of America, t. 26, 1950.
3. Experiments in fine-grained entity typing for Dutch
Marieke van Erp and Piek Vossen
Computational Lexicology and Terminology Lab, Vrije Universiteit Amsterdam
Introduction
Many entity recognition approaches classify recognised entities into a limited set of coarse-grained entity types [1]. However, fine-grained entity types are more useful for deeper natural language analysis and end-user tasks, in particular in the digital humanities domain, where entity linking (grounding an entity in a knowledge base) is not possible. For example, while standard named entity recognition may determine that an entity is a person, knowing whether that entity is a writer or a politician is important for populating a database of persons with particular occupations. Currently, fine-grained entity typing has only been investigated for English. In this abstract, we present a fine-grained entity typing system for Dutch using training data extracted from Wikipedia and DBpedia. Our system achieves performance comparable to that for English, with an F1 measure of .90 on 59 types and .57 on 269 types.
Approach
Our approach to generating training data is inspired by [2] and [3]. In [2], the training data is generated using Wikipedia: the wikilink anchor text is extracted as an entity mention, which is mapped to its corresponding Freebase entity types. We also take the Wikipedia wikilinks, anchor text and surrounding text, but instead of linking them to Freebase, we link them to DBpedia [4]. The advantage of DBpedia is that it is based on Wikipedia, so a direct link between a wikilink and DBpedia is available through a mappings file.³⁷
Feature name | Description | Example
Mention | The entity phrase | San Francisco
Head | The syntactic head of the entity phrase | Francisco
Non-head | The non-head tokens in the entity phrase | San
Entity-shape | The word shape of the words in the entity phrase | Aaa Aaaaaaaa
Trigrams | Character trigrams in the entity head | _Fr Fra ran anc nci cis isc sco co_
Word before | The word before the entity phrase | te
Word after | The word after the entity phrase | Californië
Table 1: Description of the extracted features

37 http://downloads.dbpedia.org/2016-04/core-i18n/nl/wikipedia_links_nl.ttl.bz2
We base our feature vectors on [3], but leave out the dependency and topic related features because of processing constraints. This results in the features displayed in Table 1.
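The features in Table 1 can be illustrated with a short Python sketch. It is not the authors' implementation: the function names are hypothetical, the input is assumed to be a tokenised sentence with the mention given as a token span, and the syntactic head is approximated by the last token of the mention instead of being taken from a parser.

```python
def word_shape(word):
    """Map uppercase letters to 'A', lowercase to 'a', digits to '0'; keep other characters."""
    shape = []
    for ch in word:
        if ch.isupper():
            shape.append("A")
        elif ch.islower():
            shape.append("a")
        elif ch.isdigit():
            shape.append("0")
        else:
            shape.append(ch)
    return "".join(shape)


def char_trigrams(word):
    """Character trigrams of a word padded with '_' on both sides."""
    padded = f"_{word}_"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def extract_features(tokens, start, end):
    """Features for the mention tokens[start:end] (end exclusive, start < end)."""
    mention = tokens[start:end]
    head = mention[-1]  # assumption: last token stands in for the syntactic head
    return {
        "mention": " ".join(mention),
        "head": head,
        "non_head": mention[:-1],
        "shape": " ".join(word_shape(w) for w in mention),
        "trigrams": char_trigrams(head),
        "word_before": tokens[start - 1] if start > 0 else None,
        "word_after": tokens[end] if end < len(tokens) else None,
    }


tokens = ["gevestigd", "te", "San", "Francisco", "Californië"]
features = extract_features(tokens, 2, 4)
print(features)
```

A classifier would then receive these values as a bag of string features, one training instance per wikilink.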
To compare our results to those in previous work, we mapped the DBpedia type hierarchy to the entity typing hierarchy used in [2] and [3]. Out of the 86 types that were present, 9 types could not be mapped to the DBpedia type hierarchy.³⁸ As not all types are present in the dataset, we only find 59 of the types from previous work in our dataset. We also perform a series of experiments with the full DBpedia type hierarchy, resulting in an experiment with 269 types to predict.
As there are no fine-grained entity typing datasets available for Dutch yet, we split the generated dataset into two thirds for training and one third for testing. This results in about 1 million training instances for the set with 59 entity types, and 2 million for the set with 269 entity types.
We use the FastText algorithm [5, 6]³⁹ to train our type prediction model. This algorithm learns representations for character n-grams, and words are represented as the sum of their n-gram vectors. This helps in covering morphologically rich languages, words that do not occur often and, potentially, entity mentions that do not occur in the training corpus.
Experiments and Results
We first evaluate our approach on the entity types from previous work (rows 2-6 in Table 2). At Level 1, coarse-grained entity types (person, location, organisation, and other) are evaluated. These are the same high-level types that are present in most named entity classification tasks. At Level 2, the finer-grained entity types directly below these are evaluated (e.g. person/artist and organisation/company). At Level 3, super fine-grained types are evaluated (e.g. person/artist/music and organisation/company/news), for which we still achieve a macro F1 of .90.
Types | Precision | Recall | F1
Level 1: 4 types | .98 | .98 | .98
Level 2: 33 types | .92 | .90 | .91
Level 3: 24 types | .89 | .91 | .90
Overall (59 types) | .93 | .88 | .90
Overall only dark entities (59 types) | .67 | .56 | .60
DBpedia types (269) | .68 | .52 | .57
DBpedia types, only dark entities (269 types) | .50 | .41 | .44
Table 2: Precision, recall and macro-average F1
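Table 2 reports macro-averaged scores. As a reminder of what such an average involves, the sketch below computes precision, recall and F1 per type and then takes the unweighted mean over types. This is one common definition and may differ in detail from the evaluation script used for the table; the sketch also assumes exactly one gold and one predicted type per mention, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def macro_scores(gold, predicted):
    """Return macro-averaged (precision, recall, F1) over all labels seen."""
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # label p was predicted wrongly
            fn[g] += 1  # gold label g was missed
    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)  # assumes at least one label is present
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


gold = ["person", "person", "location", "location"]
predicted = ["person", "location", "location", "location"]
macro_p, macro_r, macro_f1 = macro_scores(gold, predicted)
print(round(macro_p, 3), round(macro_r, 3), round(macro_f1, 3))
```

Because every type counts equally, rare types with poor scores pull a macro average down more than they would a micro average, which is relevant when the number of types grows from 59 to 269.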
38 The types we could not map were the following: location/structure/government, organization/stockexchange, other/health, other/livingthing, other/product/car, other/product/computer, person/education, person/education/student, person/education/teacher
39 https://github.com/facebookresearch/fastText
We also evaluated the approach on only dark entities (i.e. entity mentions that were not present in the training data).⁴⁰ Here we see that the scores drop to an F1 of .60, which is in line with previous research [7]. It is unlikely that there is no overlap between the training and test data, but this issue deserves further investigation.
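Selecting the dark entities amounts to filtering the test set by mention string. A minimal Python sketch follows, assuming test instances are (mention, type) pairs and that "not present in the training data" means an exact string match; the function name and the toy data are hypothetical.

```python
def split_dark_entities(train_mentions, test_instances):
    """Split test instances into (seen, dark) by whether the mention occurred in training."""
    seen_in_training = set(train_mentions)
    seen, dark = [], []
    for mention, entity_type in test_instances:
        target = seen if mention in seen_in_training else dark
        target.append((mention, entity_type))
    return seen, dark


train_mentions = ["Amsterdam", "Rembrandt"]
test_instances = [
    ("Amsterdam", "location/city"),
    ("Deventer", "location/city"),
    ("Rembrandt", "person/artist"),
]
seen, dark = split_dark_entities(train_mentions, test_instances)
print(len(seen), len(dark))
```

Scores computed on the `dark` subset alone then indicate how well the model generalises to mentions it has never observed.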
Furthermore, we see that the results for the DBpedia type hierarchy containing 269 types are significantly lower, but there is less training data available for those and not all 685 DBpedia types are covered. This is partly a result of the mappings file only containing the most specific DBpedia type; for example, http://nl.dbpedia.org/resource/Old_Amsterdam is listed as having type ‘Cheese’ in the mappings file, but its superclass ‘Food’ is not present.
Conclusions and Future Work
We have presented an approach and experiments for fine-grained entity typing for Dutch, which can be particularly interesting for collecting information about entities in digital humanities sources. Our results are on par with previous work for English and our software is available at https://github.com/cltl/multilingual-finegrained-entity-typing.
For future work, we aim to test the approach on historical datasets such as the NIOD “Getuigen Verhalen” dataset and Biografisch Portaal. We also intend to compile a subset of the most relevant types for the digital humanities domain and provide a trained model for reuse by humanities researchers.
References:
[1] Nadeau, D., Sekine, S.: A survey of named entity recognition and classification. Lingvisticae Investigationes 30(1), 3–26 (2007)
[2] Ling, X., Weld, D.S.: Fine-grained entity recognition. In: AAAI (2012)
[3] Gillick, D., Lazic, N., Ganchev, K., Kirchner, J., Huynh, D.: Context-dependent fine-grained entity type tagging. In: arXiv (2014)
[4] Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia - a crystallization point for the web of data. Web Semantics: science, services and agents on the world wide web 7(3), 154–165 (2009)
[5] Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Tech. rep., arXiv (2016), https://arxiv.org/abs/1607.04606
[6] Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. Tech. rep., arXiv (2016), https://arxiv.org/abs/1607.01759
[7] Yaghoobzadeh, Y., Schütze, H.: Corpus-level fine-grained entity typing using contextual information. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. pp. 715–725. Association for Computational Linguistics, Lisbon, Portugal (17-21 September 2015)

40 Whilst we made sure the training data and test data were separate on the instance level, popular entities can still be mentioned in both datasets.
Session G
1. Predicting familial risk of dyslexia by applying machine learning to infant vocabulary data
Ao Chen*1,2, Frank Wijnen2, Charlotte Koster3, Hugo Schnack1
1 Department of Psychiatry, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands
2 Institute of Linguistics, Utrecht University, Utrecht, the Netherlands
3 Center for Language and Cognition Groningen, University of Groningen, Groningen, the Netherlands
Background
The combination of rapid progress in the development of computational tools, such as machine learning, and the growing availability of digitized data in language research (e.g., the DANS data archive) and of tools to assess these data (e.g., via CLARIAH) has made it possible to investigate language acquisition in an automated way and on a large scale (we used 22,000 vocabulary scores in our study). In this study, we applied a machine learning algorithm to vocabulary data to map the pattern of vocabulary development in individual children. We investigated whether individual differences between children in word knowledge in different word classes (e.g., nouns, pronouns, helping verbs) can be used to detect whether a child is at risk of developing dyslexia. Early detection of developmental dyslexia, a specific reading disorder, will enable interventions at an early age, before the onset of formal reading and spelling instruction. Although deviations in early speech/language development have frequently been related to (risk of) dyslexia (van der Leij et al, 2013), none of these markers has been successfully used to predict later language/literacy performance at the individual level. Machine learning is a technique capable of discovering patterns in data to make such predictions. In the past decade machine learning has been successfully employed in, e.g., medicine and the humanities. Recent examples include the prediction of disease course in psychosis (Koutsouleris et al, 2016) and the attribution of the Dutch anthem to a writer who was previously not considered as its author (Kestemont et al, 2016). The aim of this study was to investigate whether early vocabulary development can be used to predict whether or not an infant is at risk of dyslexia.
Method
We investigated early vocabulary development in two large, independent samples of children at familial risk of dyslexia (FR; N=495) and typically developing children (TD; N=498) between 17 and 35 months of age. The Dutch version of the MacArthur-Bates Communicative Development Inventory (Words and Sentences) (N-CDI; Fenson et al, 1993) was used to measure each infant’s vocabulary development. This was done by counting the number of words he/she knew in 22 word categories. Together, these 22 features formed the feature vector representing the subject. We trained a linear support vector machine (SVM; Vapnik, 1999) to predict at-risk status at the individual level, based on these feature vectors. SVM is a supervised machine learning technique that is able to find patterns in the input data (word counts in 22 categories, in our case) that are related to some output measure (in our case: belonging to the FR or TD group). The training procedure results in a model that optimally predicts for (new) subjects to which group they belong. This prediction is based on the weighted sum of the input variables, where the weights are the result of the optimization procedure during training.
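The prediction step of a trained linear model is just the weighted sum mentioned above, followed by a threshold. The Python sketch below illustrates this with two features instead of 22; the weights, the bias term, the threshold at zero and the convention that positive values mean FR are all assumptions made for the example, not values from the study.

```python
def decision_value(weights, bias, features):
    """Weighted sum of the input variables plus a bias term."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict_group(weights, bias, features):
    """Assign 'FR' when the decision value is positive, otherwise 'TD' (assumed convention)."""
    return "FR" if decision_value(weights, bias, features) > 0 else "TD"


# Toy model with two word categories; a negative weight means that knowing
# more words in that category lowers the predicted risk.
weights = [-0.5, 0.25]
bias = 1.0
print(predict_group(weights, bias, [4, 2]))  # many words known in category 1
print(predict_group(weights, bias, [1, 2]))  # few words known in category 1
```

Training an SVM consists of choosing the weights and the bias; once they are fixed, classifying a new child requires nothing more than this sum.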
Performance of our prediction model was assessed by the percentage of FR subjects that were correctly classified as FR (sensitivity), the percentage of TD subjects correctly classified (specificity) and the balanced accuracy (mean of sensitivity and specificity).
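These three measures follow directly from the counts of correct and incorrect classifications in each group. A minimal Python sketch, with a hypothetical function name and invented toy labels:

```python
def classification_summary(true_groups, predicted_groups, positive="FR"):
    """Return (sensitivity, specificity, balanced accuracy) for a two-group problem."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(true_groups, predicted_groups):
        if truth == positive:
            if guess == positive:
                tp += 1  # FR child classified as FR
            else:
                fn += 1  # FR child classified as TD
        else:
            if guess == positive:
                fp += 1  # TD child classified as FR
            else:
                tn += 1  # TD child classified as TD
    sensitivity = tp / (tp + fn)  # assumes at least one FR subject
    specificity = tn / (tn + fp)  # assumes at least one TD subject
    return sensitivity, specificity, (sensitivity + specificity) / 2


true_groups = ["FR"] * 4 + ["TD"] * 4
predicted_groups = ["FR", "FR", "FR", "TD", "TD", "TD", "FR", "FR"]
summary = classification_summary(true_groups, predicted_groups)
print(summary)
```

Balanced accuracy weights both groups equally, so it stays informative even when the two samples differ in size.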
The model’s generalizability was tested using cross-validation. In this setup the model is trained and tested on different subsamples.
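The text does not specify which cross-validation scheme was used. As an illustration of the general idea, the Python sketch below generates k-fold splits in which every subject is used for testing exactly once and never appears in the training part of the same split; the function name, the number of folds and the fixed random seed are assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for k folds over range(n_samples); needs k <= n_samples."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the splits reproducible
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(10, 5))
for train, test in splits:
    print(sorted(test))
```

A model is fitted on each training part and scored on the matching test part, and the scores are then averaged over the folds.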
Results
There was a specific age period, 18-20 months, in which the model was able to predict the status of being at risk (FR). At 19-20 months of age, the cross-validation accuracy was 68% (p<0.01), with a sensitivity of 70% and a specificity of 67%. In the other age groups the accuracy was lower and not significant.
Not all 22 features contributed to the same extent to the discrimination between the FR and TD subjects at age 19-20 months. The weights of 5 word categories were significantly different from zero. The categories 'helping verbs' and 'prepositions and locations' contributed most. The model had learnt from the data that knowing fewer words in these categories at this age is a significant marker for being at family risk.
Conclusion
Machine learning methods are promising techniques for separating FR and TD children at an early age, before they start reading. There is a sensitive window in which the difference between FR and TD is most evident. The model also indicated the word categories in which FR infants know (on average) fewer words than TD infants. It should be noted that we did not predict the manifestation of dyslexia, but only elevated risk. We will follow these children up, and the ultimate goal is to train a model that is able to discriminate at an early age between the FR children who develop dyslexia and those who do not.
References
CLARIAH. http://www.clariah.nl
DANS. https://dans.knaw.nl
Fenson, L., Dale, P.S., Reznick, J.S., Thal, D., Bates, E., Hartung, J.P., et al. (1993). The MacArthur Communicative Development Inventories: User’s Guide and Technical Manual. San Diego, CA: Singular Publishing Group.
Kestemont M, Stronks E, De Bruin M, De Winkel T. Van wie is het Wilhelmus? (2016 Dec) Amsterdam University Press.
Koutsouleris N, Kahn RS, Chekroud AM, Leucht S, Falkai P, Wobrock T, Derks EM, Fleischhacker WW, Hasan A. Multisite prediction of 4-week and 52-week treatment outcomes in patients with first-episode psychosis: a machine learning approach. Lancet Psychiatry. 2016 Oct;3(10):935-946. doi:10.1016/S2215-0366(16)30171-7.
van der Leij, A., van Bergen, E., van Zuijen, T., de Jong, P., Maurits, N., and Maassen, B. (2013). Precursors of developmental dyslexia: an overview of the longitudinal Dutch dyslexia programme study. Dyslexia 19, 191–213. doi:10.1002/dys.1463.
Vapnik, VN. (1999). An overview of statistical learning theory. Neural Networks, IEEE Transactions on, 10(5), 988-999. doi:10.1109/72.788640.
2. The Dictionary of the Southern Dutch Dialects (DSDD): Designing a Virtual Research Environment for digital lexicographical research
Prof. dr. Jacques Van Keymeulen
Ghent University, Belgium
The southern Dutch dialect area consists of four dialect groups: (1) the Flemish dialects, spoken in French Flanders (France), West and East Flanders (Belgium) and Zeeland Flanders (the Netherlands); (2) the Brabantic dialects, spoken in Antwerp and Flemish Brabant (Belgium) and Northern Brabant (the Netherlands); (3) the Limburgian dialects, spoken in the Limburg provinces of Belgium and the Netherlands; (4) the Zeeland dialects, spoken in Zeeland and Goeree-Overflakkee (the Netherlands).
The dialect vocabulary of the Flemish, Brabantic and Limburgian dialects is collected in three regional dictionaries (WVD, WBD and WLD respectively), which are set up according to the same plan, conceived by prof. A. Weijnen (Nijmegen): they are onomasiologically arranged and published in thematic fascicles. Contrary to their titles, these dictionaries are to be considered geographically oriented inventories of word usage, and not dictionaries proper, since it is impossible to describe meaning in an onomasiologically arranged dictionary. They are atlases, not dictionaries! We retain the word 'dictionaries', however, since the three projects are traditionally known as such.
Figure 1: Research areas of the 4 regional dialect dictionaries of the southern Dutch area
The three dictionaries describe the vocabulary of the traditional dialects of the first half of the twentieth century in the southern part of the Dutch language area, in a joint international and inter-university project. The WBD, the 'mother' of the two other projects, was finished in 2005; the WLD was completed in 2008. They were compiled at the University of Nijmegen and the University of Leuven. The WVD started 12 years later than its sister projects (in 1972 at Ghent University, by prof. W. Pée) and will continue until about 2019.
The dictionaries were set up in parallel in order to make the aggregation of the data possible, thus fulfilling the objectives of the founders of the projects. To that effect, in 2016 a consortium of 11 linguists, computer scientists, digital humanities experts and geographers was created to support the project “Dictionary of the Southern Dutch Dialects” (DSDD). It aims at the aggregation and standardization of the three comprehensive dialect lexicographic databases into one DSDD database (to which, hopefully, the alphabetically arranged WZD will be added in the future). In particular, dialectologists from Ghent University work closely with the Ghent Centre for Digital Humanities (GhentCDH) to prepare the ground for the aggregation of the three southern Dutch dialect databases and their exploitation via a Virtual Research Environment for digital lexicographical research. The Ghent team will work closely with the Instituut voor de Nederlandse Taal with regard to the technical and linguistic sustainability of the DSDD. Through this collaboration, interoperability with CLARIN will also be ensured. The DSDD is additionally a pilot project of DARIAH-BE (Belgium).
The DSDD Virtual Research Environment will enable a research programme with new research questions, particularly in the field of quantitative lexicology and geographical analysis. During the project 2-3 research use cases will be developed to test the applicability of the newly aggregated DSDD for digital scholarship. For example:
1. What systematic lexico-geographical patterns do the southern Dutch dialects show? Do they coincide with the traditional ones, based on phonology (see De Vriendt 2012)? Are there geographical patterns in semantics?
2. In order to explore the geographical spread of several dialectology concepts and to link them to “Kloeke plaatscodes” (which are used in linguistic research for mapping/linking dialectology concepts to geographical regions), a set of generic building blocks for automatic atlas/heat map generation will be developed. Segmentation and clustering techniques can be run over the generated atlases/heat maps in order to automatically detect the homogeneity (or heterogeneity) of a particular dialectology concept. Furthermore, spatial querying techniques will be supported in order to geographically search/explore this kind of dialectology concept.
3. Cluster analysis and exploration of the linkage (and visualization) of linguistic data with synchronic and diachronic extralinguistic data of all kinds.
By its end, the project will a) make the newly aggregated DSDD available via a user-friendly website and b) enable the DSDD for digital scholarship. To this end, a professionally designed, user-friendly web application, or Virtual Research Environment (including an application programming interface (API) for data export), will be created. The exported data will be used with existing digital research tools (e.g. for geo-visualisation, qualitative lexicology and dialectometry) to validate the research case studies described above.
At the DHBenelux Conference, we will present the plan for the aggregation and the structure of the database, and dwell on the different ‘editorial’ problems that have to be solved. The different dictionaries/databases were indeed composed over a very long period of time, at different places (Nijmegen, Leuven, Ghent) and by different editors; hence a great number of inconsistencies arose over time. In order to compose an aggregated DSDD database, a number of standardization activities have to be carried out. Additionally, we will present the initial results of the Virtual Research Environment requirements analysis.
3. Establishing interdisciplinary dialogue: conducting a qualitative investigation into linguistic requirements for Natural Language Generation
Emma Clarke and Owen Conlan
Background
Dialogue systems, commonly referred to as chatbots, are becoming increasingly popular. In 2016, 'chatbot' was shortlisted as word of the year by Oxford Dictionaries⁴¹, and platforms such as Facebook Messenger⁴² are frequently utilised to communicate updates or information, sell products or provide services. While the goal of a dialogue system which communicates naturally with its user appeared to have been ‘within reach’ as far back as 2001 (Rambow et al., 2001), current Natural Language Generation (NLG) research approaches continue to have limitations when it comes to the ‘natural-ness’ of their interactions (LeCun et al., 2015) (Reiter and Dale, 2006) (Manning and Schütze, 1999). Thus, the NLG field is looking to move towards more natural conversational interfaces by taking influence from natural human speech, and as dialogue systems become more human-like, the interspersion of persuasive language within them will become more applicable. Some prior research has been carried out on the development of persuasive dialogue systems (Prakken, 2009) (Parsons et al., 2003) (Walton and Krabbe, 1995). Most recently, Hiraoka et al. (2016) observed that “these persuasive dialogue systems are in their first stages of development, and are far from the abilities of their human counterparts, both in terms of persuasive ability, and also ability to achieve user satisfaction”. The focus of this research project is the language of persuasion, namely rhetorical devices. We believe that in order to understand the requirements of the NLG community in this area, the establishment of cross-disciplinary conversation is essential.
Challenge
The nuances of human speech, such as sarcasm, slang and wordplay, and the human ability to process and understand these subtleties make them equally fascinating and frustrating for researchers in the areas of natural language processing, understanding and generation. A major challenge faced by NLG researchers is how to incorporate linguistic understanding into NLG systems in order to generate more natural-sounding language. This challenge is expected to persist in the next generation of natural language systems (Dale, 2016) (LeCun et al., 2015) (Ward and DeVault, 2015) (Gartner, n.d.).
Often lacking in dialogue systems and NLG research is linguistic expertise presented in an understandable form that dissects natural elements of human speech, particularly elements which are difficult for machines to learn. Ward and DeVault (2015) highlight this interdisciplinary engagement in their ‘Ten Challenges in Highly-Interactive Dialog Systems’.
As interdisciplinary research becomes more prevalent, the requirement for computer science practitioners to engage with non-technical researchers from diverse backgrounds will increase. Dale (2016) also refers to cross-disciplinary conversations and encourages dialogue systems developers to access the expertise of the computational linguistics community, in which research into discourse phenomena has been ongoing since the inception of the field. Dale (2016) presents an encouraging call to action: “If we want to have better conversations with machines, we stand to benefit from having better conversations among ourselves.”
Approach
The overall aim of this research (fig. 1) is to establish an approach to understanding how rhetorical devices function in natural human speech in order to propose a method which can be built into practical NLG applications such as dialogue systems (chatbots). The work will draw upon structured rather than random influence by observing the usage of these linguistic strategies for persuasion in human speech. From these observations, a TEI schema has been customised in order to mark up a set of rhetorical devices within a corpus.
41 https://en.oxforddictionaries.com/word-of-the-year/word-of-the-year-2016
2 https://www.messenger.com/
Figure 1
This paper will present findings on the central component of the diagram above: the cross-disciplinary engagement with NLG practitioners in order to develop a pragmatic approach to incorporating persuasive language into dialogue systems. We explore how a customised TEI schema is used in semi-structured interviews with NLG researchers (an ongoing, iterative process). Based on qualitative findings from the interviews, the schema is revised and amended to incorporate requirements and suggestions. The final schema will ultimately be used to mark up and annotate speeches from the corpus so that they can be added to NLG systems as part of the system training.
Method
A series of semi-structured interviews is being carried out in which ten NLG practitioners are asked questions in order to understand current and future requirements of NLG applications such as dialogue systems.
In the course of each interview, the TEI schema is presented and the suggestions of the NLG practitioners are sought. The interviews are recorded and the resulting outcomes are analysed using atlas.ti software. The results are then summarised to create an overall picture of NLG researcher requirements.
Outcomes (to date)
The process outlined above is ongoing at the time of submission. However, preliminary findings from the interviews can be summarised as follows:
• Both template-driven and deep learning systems use annotated data. In a rule-based approach, annotations are used to help further engineer features by hand while a deep learning approach uses annotation to help learn and understand structure.
• There is an emerging question in NLG research about how to deal with sentence structure and nuance. Increasingly, researchers are using marked up text to help systems learn higher order structures.
• Pattern-matching alone is not a robust enough approach.
• A very clear annotation schema that marks up features of rhetorical devices would be useful for NLG researchers working in the area of persuasion.
Conclusion
The aim of this research is to engage in an interdisciplinary conversation with NLG practitioners. The process of engagement and the findings from the interviews will be presented in this paper.
References
Dale, R., 2016. The return of the chatbots. Nat. Lang. Eng. 22, 811–817.
Gartner, n.d. Gartner’s 2016 Hype Cycle for Emerging Technologies Identifies Three Key Trends That Organizations Must Track to Gain Competitive Advantage [WWW Document]. URL http://www.gartner.com/newsroom/id/3412017 (accessed 11.24.16).
Hiraoka, T., Neubig, G., Sakti, S., Toda, T., Nakamura, S., 2016. Construction and analysis of a persuasive dialogue corpus, in: Situated Dialog in Speech-Based Human-Computer Interaction. Springer, pp. 125–138.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436–444.
Manning, C.D., Schütze, H., 1999. Foundations of statistical natural language processing. MIT Press, Cambridge, Mass.; London.
Parsons, S., Wooldridge, M., Amgoud, L., 2003. Properties and complexity of some formal inter-agent dialogues. J. Log. Comput. 13, 347–376.
Prakken, H., 2009. Models of persuasion dialogue, in: Argumentation in Artificial Intelligence. Springer, pp. 281–300.
Rambow, O., Bangalore, S., Walker, M., 2001. Natural language generation in dialog systems, in: Proceedings of the First International Conference on Human Language Technology Research. Association for Computational Linguistics, pp. 1–4.
Reiter, E., Dale, R., 2006. Building natural language generation systems, Digitally printed 1st pbk. version. ed, Studies in natural language processing. Cambridge University Press, Cambridge, U.K.; New York.
Walton, D., Krabbe, E.C., 1995. Commitment in dialogue: Basic concepts of interpersonal reasoning. SUNY press.
Ward, N.G., DeVault, D., 2015. Ten challenges in highly-interactive dialog systems, in: AAAI Spring Symposium on Turn-Taking and Coordination in Human-Machine Interaction.
Session H
1. Getting the Bigger Picture: Exploratory Search and Narrative Creation for Media Research into Disruptive Events
dr. Berber Hagedoorn, University of Groningen, Research Centre for Media Studies and Journalism
dr. Sabrina Sauer, University of Groningen, Research Centre for Media Studies and Journalism
Introduction
Digital Humanities centres on questions that are raised by and answered with digital tools in the Humanities. At the same time, it interrogates the value and limitations of digital methods in Humanities disciplines. While it is important to understand how digital technologies can offer new venues for Humanities research, it is equally essential to understand, and therefore to be able to interpret, ‘the user side’ of Digital Humanities: specifically, how Humanities researchers appropriate and domesticate search tools to ask and answer new questions, and apply digital methods. Previous user research in Digital Humanities concentrates on assessing, for example, how and why Digital Humanities benefits from studies into user needs and behaviour (Warwick, 2012), user requirement research, as well as participatory design research (Kemman & Kleppe, 2014).
Exploratory search is crucial for Humanities researchers who draw upon media materials in their research. Audio-visual, online and digital sources are in abundance, scattered across different platforms, and changing daily in our contemporary landscape. Supporting researchers' explorations becomes even more important when scholars study media events. A ‘media event’ is an event with a specific narrative that gives the event its meaning, and is in contemporary societies increasingly recognized as non-planned or disruptive. Disruptive media events, such as the ‘sudden’ rise of populist politicians, terrorist attacks or environmental disasters, are shocking and unexpected, making them difficult to interpret. This leads to problems for media researchers who analyse how narratives construct different political, economic or cultural meanings around such events. Previous research argues that media events should always be viewed in relation to their wider political and sociocultural contexts. Events, as they unfold in the media, may correspond to long-term social phenomena, and the way in which such events are ‘constructed’ has particular connotations (Jiménez-Martínez, 2016). Specific actors (newscasters, governments, institutions) use media events to build narratives in line with their own political, economic or cultural purposes. Media researchers also build narratives around events; prior research underlines the importance of visualizing, constructing and storing narratives during information navigation to contextualize material (Akker et al., 2011; Kruijt, 2016; De Leeuw, 2012). Offering media researchers the ability to explore and create lucid narratives about media events therefore greatly supports their interpretative work.
This paper proposes to add to this body of research by presenting the insights of a cross-disciplinary user study that involves, broadly speaking, researchers studying audio-visual materials in a co-creative design process, set up to fine-tune and further develop a digital tool that supports Humanities research through exploratory search. This paper focuses on how researchers, in both academic and professional settings, use digital search technologies in their daily work practices to discover and explore digital audio-visual archival material. We focus specifically on three user groups, namely (1) Media Studies researchers, (2) Humanities researchers that use audio-visual materials as a source and (3) Media professionals. These user groups are the foreseen end users of the tool, because they create audio-visual narratives for their respective work purposes. We set up co-creative design sessions with 74 participants (group 1: 24; group 2: 40; group 3: 10) to observe and reflect on the practices of media researchers in terms of how they interact with search tools to explore, access and retrieve digitized audio-visual material, in order to interpret, and in some cases re-use, this material in new audio-visual productions.
Methodology
In our user study, we employ a user-centred design methodology to evaluate and fine-tune the exploratory search tool DIVE+ media browser. It offers events-driven exploration of digital heritage material, where events are prominent building blocks in the creation of narrative backbones (De Boer et al., 2015), and links a variety of different media sources and collections. DIVE+ offers intuitive exploration of media events at different levels of detail. It connects media objects, subjects (“concepts”), events, and persons to aid in the formulation of research questions, and to contextualize the former into overarching narratives and timelines. Our main research question throughout the case study is: how does exploratory search support media researchers in their study of how media events are constructed across different media and instilled with specific cultural or political meanings? To be able to answer this question, we study how media researchers construct navigation paths via exploratory search and, by means of user studies, evaluate the role of narratives in (1) learning and (2) research. In this process, we compare DIVE+ to other online search tools.
The user study observes media researchers as they use DIVE+ to explore media events, across 3 stages: (1) research question formulation; (2) DIVE+ use; and (3) comparative user evaluations of the DIVE+ browser, compared to other online search tools. The collected data, consisting of both qualitative data (observational and focus group data) and logging data gathered during user testing, provides insights about how media researchers search and explore digital audio-visual archives. We utilize a case study approach, which combines grounded theory (that fosters an understanding of how researchers interpret and create narratives) with usability methodologies, such as work task evaluations. This, first of all, allows us to draw conclusions about how search tools and digital technologies co-construct the researcher’s professional practice. Second, the data helps us probe the question of how the ‘digitality’ of search and retrieval shapes the practice of media research and, by extension, creative processes.
The research presented in this paper takes an interdisciplinary approach: it combines insights from Media Studies, as well as from Information Studies and Science and Technology Studies, and integrates ideas about narrative creation, search practices, and overarching notions about how users and technologies co-construct meaning. Therefore the presented research does not focus on how Digital Humanities’ tools have an impact on researchers’ practices, but rather analyses how researchers make use of search tools. We subsequently (1) draw conclusions about scholarly practice and the role of search technologies for digitized audio-visual materials therein; and (2) present lessons learned on how to optimize the search tool that is used, in order to improve its performance.
Acknowledgments
The authors would like to thank the anonymous reviewers of the first version of this abstract for their helpful comments and suggestions. This research was supported by the Netherlands Institute for Sound and Vision (partially in the context of Berber Hagedoorn as Sound and Vision Researcher in Residence in 2016-7) and the Netherlands Organisation for Scientific Research (NWO) under project number CI-14-25 as part of the MediaNow project. This research was also supported by CLARIAH, Common Lab Infrastructure of Arts and Humanities, in the context of the Research Pilot Narrativizing Disruption: How exploratory search can support media researchers to interpret ‘disruptive’ media events as lucid narratives (https://www.clariah.nl/projecten/research-pilots/nardis), CLARIAH project number CC17-13. All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.
Bibliography
Akker, C. van den, Legêne, S., Erp, M. van, Aroyo, L., Segers, R., Meij, L. van der, Ossenbruggen, J. van, Schreiber, G., Wielinga, B., Oomen, J., Jacobs, G. (2011). Digital Hermeneutics: Agora and the Online Understanding of Cultural Heritage Categories and Subject Descriptors. WebSci 11, Koblenz, Germany.
Boer, V. de, Oomen, J., Inel, O., Aroyo, L., Staveren, E. van, Helmich, W., & Beurs, D. de. (2015). DIVE into the Event-Based Browsing of Linked Historical Media. Web Semantics: Science, Services and Agents on the World Wide Web, 35(3), 152–158.
De Leeuw, S. (2012). European Television History Online: History and Challenges. VIEW Journal of European Television History and Culture, 1(1), 3–11.
Jiménez-Martínez, C. (2016). Integrative disruption: the rescue of the 33 Chilean miners as a live media event. In: Fox, A., (ed.) Global Perspectives on Media Events in Contemporary Society. IGI Publishers, Hershey, USA, 60-77.
Katz, E., and Liebes, T. (2007). ‘No More Peace!’: How Disaster, Terror and War Have Upstaged Media Events. International Journal of Communication 1, 157-166.
Kemman, M., and Kleppe, M. (2014). "User Required? On the Value of User Research in the Digital Humanities." Selected Papers from the CLARIN 2014 Conference, October 24-25, 2014, Soesterberg, The Netherlands. No. 116. Linköping University Electronic Press.
Kruijt, M. (2016). Supporting Exploratory Search with Features, Visualizations, and Interface Design: A Theoretical Framework. University of Amsterdam.
Warwick, C. (2012). "Studying users in Digital Humanities." Digital Humanities in practice, 1-21.
2. Bias in the analysis of multilingual legislative speech
Laura Hollink, Astrid van Aggelen, Jacco van Ossenbruggen
Centrum Wiskunde & Informatica, Amsterdam, [email protected]
In this paper we investigate the application of natural language processing tools to the multilingual proceedings of the European Parliament. This work is part of a study in which we explore (1) how subcorpora in different languages may lead to different conclusions about the political landscape, (2) how to determine what a potential language-related bias originates from, and (3) to what extent we can limit or even prevent an unwanted language-bias.
Parliamentary speech has been used to study party positions [1,2,3], issue selection [4,5,6,7] and the level of disagreement within a debate [8]. Many studies have moved away from manual coding (which is done in e.g. [4,5]) and instead position speech texts on one or more (latent) dimensions in statistical models based on relative word frequencies [1,2,3,6,7,8], often in combination with basic pre-processing steps such as stemming and stopping. These models and tools, while imperative to analyse bigger datasets, add a source of errors and bias. One source of potential bias comes from the fact that the tools used perform differently on different languages. Considering that the aforementioned studies were carried out on the European, Irish, US, Spanish, Norwegian and Swedish legislatures, the comparability and reproducibility of the results for different languages is unclear.
In the European Parliament, the spoken accounts appear in (currently) 24 languages. Here, the uncertainty stems not only from tools that perform differently on each language, but also from the fact that the availability of data in each language varies. Members of Parliament (MEPs) are free to speak in any of the official languages. Speeches are sometimes translated into (some) other languages, depending on prioritization within the EP, specific translation requests of the members and (supposedly) budgetary constraints. Thus, we are left with 24 subcorpora of varying size, one per language, including both original and translated speech.
The need to study language-effects in this context has been recognised before. Proksch et al. [3] reported a modest language-effect42 in their study of party positions in the European Parliament, which they ascribed to translation rather than actual differences in position taking between three countries. However, while the overall effect may be small, we argue that specific local effects could still lead to significant biases in the results. For example, French translations of German texts seemed to systematically get a more neutral position than the original text, while the opposite was not the case. It is important to realise that the proceedings of the European Parliament are not only a corpus for researchers. Residents of the European Union have a right to access these documents in order to make informed votes and to hold the MEPs accountable43. This right would be compromised when French speaking citizens come to different conclusions about what has been discussed than German speaking citizens. Our aim is to gain insight into how working with subcorpora in different languages may lead to different conclusions about the political landscape.
In this study, we use the data provided by the Talk of Europe project [9], in which speech transcripts and all available translations were crawled from the website of the EP44, and converted into the semantic web format RDF. Data is available from 1999 to 2015 and contains around 300K speeches in 22K debates. We apply topic detection to six language-specific subcorpora of the proceedings of the European Parliament: German, English, French, Italian, Spanish and Dutch. We use the JEX software developed by the European Commission's Joint Research Centre, which learns multi-label categorisation rules from documents that were previously manually indexed using the multilingual Eurovoc thesaurus [10]. The advantage of using this tool over, for instance, widely used topic modeling approaches such as LDA [11], is that the output is directly comparable across languages: the tool uses a single thesaurus, Eurovoc, to classify documents in each language, and concepts in the Eurovoc thesaurus have labels in all languages. In a later stage of the study, we plan to include other topic detection techniques, and widen the scope to all EU languages.
Over 2000 distinct Eurovoc topics were detected in the six subcorpora. The frequency distributions over topics vary per language. Figure 1 visualises the distance between languages. We use Kullback–Leibler divergence [12], a non-symmetric measure for the difference between two distributions. A higher score, visualized as a redder colour, signifies a greater distance. For example, Italian and French are relatively close, while Spanish and German are far apart. There are four hypotheses as to what these differences originate from:
1. MEPs speaking one language indeed speak about different topics than their colleagues who speak in another language.
2. There is a bias in the selection of speeches that are being translated.
3. There is a bias in how certain topics are translated, e.g. translators use more ambiguous or polarized language.
4. The topic detection tool works differently on one language than on another.
42 A correlation coefficient ranging between 0.86 and 0.93 when comparing party positions derived from texts in German, French and English [3].
43 Regulation (EC) No 1049/2001 of the European Parliament and of the Council
44 http://www.europarl.europa.eu
Figure 1: Heatmap of differences between topic distributions in languages.
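The distance measure behind a heatmap of this kind can be sketched as follows. This is a minimal illustration, assuming per-language topic counts are available as plain dictionaries. The function names and the additive smoothing are assumptions made for the example: the Kullback–Leibler divergence D(P‖Q) = Σ p_i log(p_i / q_i) is undefined when a topic has zero frequency in Q, and the abstract does not say how such cases were handled.

```python
from math import log


def topic_distribution(counts, topics, alpha=1.0):
    """Turn raw topic counts into a probability distribution over `topics`.

    `alpha` is an additive smoothing constant (an assumption of this sketch)
    that keeps every probability strictly positive.
    """
    total = sum(counts.get(t, 0) for t in topics) + alpha * len(topics)
    return [(counts.get(t, 0) + alpha) / total for t in topics]


def kl_divergence(p, q):
    """D(P || Q) in nats; non-symmetric, and zero only when P equals Q."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def divergence_matrix(counts_by_language, alpha=1.0):
    """Pairwise divergences between languages, keyed by (from, to)."""
    # Use the union of all topics so every distribution has the same support.
    topics = sorted({t for c in counts_by_language.values() for t in c})
    dists = {
        lang: topic_distribution(counts, topics, alpha)
        for lang, counts in counts_by_language.items()
    }
    return {(a, b): kl_divergence(dists[a], dists[b]) for a in dists for b in dists}
```

Because the measure is non-symmetric, the entry for (Spanish, German) generally differs from the entry for (German, Spanish), so a heatmap built from this matrix need not be symmetric around its diagonal.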
In our presentation, we will tackle this issue from two sides. Firstly, we compare different subsets of topics based on whether or not speeches were translated, and to which languages, to explore hypotheses 1 and 2. Then, to study hypothesis 4 (and to a lesser extent hypothesis 3) we zoom in to topics that appear to be particularly distinctive between languages, and compare the topic annotations to what was actually said in the debates. As an example of the latter method, Figure 2 shows the differences in frequency of the detected topics “nuclear weapons” and “nuclear energy”. Remarkably, only French and Italian speeches seem to be about nuclear weapons, while English and Spanish speeches are often about nuclear energy. As a comparison, Figure 3 plots the occurrences of the phrases “nuclear weapons” and “nuclear energy” (and translations thereof) in the raw speech texts. Here, part of the effect is gone, suggesting an error of the topic annotation software, while part of the effect remains: German texts indeed seem to talk less about both nuclear weapons and nuclear energy.
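The comparison between detected topics and raw phrase occurrences can be sketched as below. This is a hedged illustration and not the authors' pipeline: the record layout (a dict with a `text` string and a `topics` set), the function name and the simple case-insensitive substring match are all assumptions made for the example.

```python
def annotation_vs_text(speeches, topic, phrases):
    """Compare how often a topic is annotated with how often it is mentioned.

    `speeches` is a list of dicts with a "text" string and a "topics" set
    (hypothetical layout). Returns a pair of shares in [0, 1]:
    (share of speeches annotated with `topic`,
     share of speeches whose text contains any of `phrases`).
    """
    if not speeches:
        return 0.0, 0.0
    lowered = [p.lower() for p in phrases]
    annotated = sum(1 for s in speeches if topic in s["topics"])
    mentioned = sum(
        1 for s in speeches if any(p in s["text"].lower() for p in lowered)
    )
    return annotated / len(speeches), mentioned / len(speeches)
```

Run once per language subcorpus, with the phrase list translated for that language, a large gap between the two shares points towards the annotation tool (hypothesis 4), whereas a difference that persists in the raw text points towards what was actually said or translated.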
With this study, we aim to contribute to the discussion about systematic methods for tool criticism and source criticism in a complex multilingual context like the European Parliament.
Figure 2: Frequency of topics in debates.
Figure 3: Frequency of phrases in debate texts.
References
[1] Benoit, Kenneth, and Michael Laver. Estimating Irish Party Positions Using Computer Wordscoring: The 2002 Elections. Irish Political Studies Vol. 18, Iss. 1, 2003.
[2] Laver, Michael J., Kenneth R. Benoit, and John Garry. Extracting Policy Positions from Political Texts Using Words as Data. American Political Science Review 97(2): 311–31, 2003.
[3] Proksch, S.-O. and Slapin, J.B. Position Taking in European Parliament Speeches, British Journal of Political Science, 40(3), pp. 587–611, 2010.
[4] Hanna Bäck, Marc Debus & Jochen Müller. Who Takes the Parliamentary Floor? The Role of Gender in Speech-making in the Swedish Riksdag. Political Research Quarterly 67: 504–518, 2014.
[5] Markus Baumann. Constituency Demands and Limited Supplies: Comparing Personal Issue Emphases in Co-sponsorship of Bills and Legislative Speech. Scandinavian Political Studies, Vol. 39, issue 4, pp. 366-387, 2016.
[6] Pardos-Prado, Sergi, and Iñaki Sagarzazu. The Political Conditioning of Subjective Economic Evaluations: The Role of Party Discourse. British Journal of Political Science 46(4), 799-823, 2016.
[7] Kevin M. Quinn, Burt L. Monroe, Michael Colaresi, Michael H. Crespin, Dragomir R. Radev. An automated method of topic-coding legislative speech over time with application to the 105th-108th US Senate. Midwest Political Science Association Meeting. 2006.
[8] Benjamin E. Lauderdale, Alexander Herzog. Measuring Political Positions from Legislative Speech. Polit Anal; 24(3): 374-394, 2016.
[9] Astrid van Aggelen, Laura Hollink, Max Kemman, Martijn Kleppe, and Henri Beunders. The debates of the european parliament as linked open data. Semantic Web, 8(2): 271–281, 2017.
[10] Pouliquen Bruno, Steinberger Ralf, Camelia Ignat. Automatic Annotation of Multilingual Text Collections with a Conceptual Thesaurus. In Proceedings of the Workshop Ontologies and Information Extraction at the Summer School The Semantic Web and Language Technology - Its Potential and Practicalities (EUROLAN'2003). Bucharest, Romania, 28 July - 8 August 2003.
[11] Blei, David M., Ng, Andrew Y., Jordan, Michael I. Lafferty, John, ed. Latent Dirichlet Allocation. Journal of Machine Learning Research. 3(4–5): pp. 993–1022, 2003.
[12] Kullback, S., Leibler, R.A. On information and sufficiency. Annals of Mathematical Statistics. 22(1): 79–86, 1951.
Session I
Cultural Heritage Data for Research: A Europeana Research Panel
Nienke van Schaverbeke, Head of Europeana Collections
Marjolein de Vos, Europeana Data Partner Services
Dr. Agiatis Benardou, Digital Curation Unit, R.C. "Athena", Institute for the Management of Information Systems
Panel members:
Nienke van Schaverbeke - Head of Europeana Collections - session Chair
Dr. Agiatis Benardou - Digital Curation Unit, R.C. "Athena", Institute for the Management of Information Systems - Researcher Needs Management
1 Member of our Board from a research network (http://research.europeana.eu/blogpost/europeana-research-advisory-board-established) - TBC
Marjolein de Vos - Europeana, Digitised Medieval Manuscripts Maps - Data Quality
Dr. Caroline Ardrey - University of Birmingham - Europeana Grants Winner
Dr. Dana Mustata - University of Groningen. Academic in a digital humanities related field, outsider to Europeana - TBC
In this panel, members of the Europeana Research Advisory Board, Europeana Data Partner Services, one of the Research Grants winners and, importantly, an academic external to Europeana will present and discuss the value of Europe’s cultural heritage data for research in the humanities and social sciences, and the ways in which Europeana Research is promoting and enabling its use. The panel is part of a larger ongoing discussion about making cultural heritage available for research and the opportunities, challenges, and considerations involved in this.
In short, the panel will focus on the following points:
• Europeana Research - Objectives & Achievements
• Relationship to other research networks and infrastructures (DARIAH, CLARIN, EHRI, Parthenos etc)
• Researcher needs and community engagement
• Data aggregation and quality improvement
• Using Europeana data in research
Europeana Research was established as a link between cultural heritage institutions and researchers. We recognize that undertaking research on the digitised content of Europe’s galleries, museums, libraries, and archives has huge potential that should be exploited. But issues with regard to licensing, interoperability, and access can often impede the re-use of that data in research. Europeana Research aims to help with these issues, liberating cultural heritage for meaningful academic re-use. We work on a series of activities to enhance and increase the use of Europeana data for research, and develop the content, capacity, and impact of Europeana, by fostering collaborations between Europeana and the cultural heritage and research sector, as well as liaising with other digital research infrastructures and networks.
Europeana Research is governed by an Advisory Board comprising renowned digital humanities experts who help us grow and strengthen services for DH researchers. In the first section of the panel we will highlight our main objectives and greatest achievements, such as the Research Grants Programme.
Following this introduction, one of our panel members, a representative from a research network that we collaborate with, and an academic who is not connected to Europeana will expand and elaborate on the relationship between their network and Europeana, and the value thereof.
Since our target audience is research communities in the humanities and the social sciences, it is vital to understand their heterogeneous needs vis-à-vis their information behaviour and their interaction with digital content. In this part of the panel, we will go into detail about how we come to understand the needs of our users, how to cater to them, and how we continuously develop and further this understanding and adapt to the requirements.
With more than 54 million objects from 40 countries and in a variety of languages, the Europeana portal contains a substantial amount of data to manage. The Data Partner Services team not only works continuously on ingesting new data for the portal, but also invests time in evaluating and improving existing data. We make data quality plans with aggregators and direct providers to further the findability and granularity of the records in the portal. Furthermore, there is a specially assigned Data Quality Committee that works on refining and expanding the Europeana Data Model. During this part of the panel, we will talk about the work that is being done from the metadata perspective on data quality, the importance of understanding researchers' needs for this, and the value of cultural heritage data for research.
In 2016 the Europeana Research Grants Programme was launched, in which Digital Humanities researchers were encouraged to apply with a project where Europeana data would be central in answering their research question. The unprecedented success of this call for proposals shows us how important it is to make heritage data available, and the variety of ideas shows us the range of potential of what is in the portal. To further illustrate and strengthen the points that will be mentioned in the panel, one of the winners of the Europeana Research Grants Programme 2016 will discuss her project as a showcase of Europeana data re-use for research and the potential offered to research communities through open access, clear licensing, and adequate digital tools.
After providing short explanations on the points mentioned in this proposal, we will encourage discussion from the panel and the audience on these matters. These could lead to valuable insights for Europeana Research in the wider discussion of opening up cultural heritage for the research community. We also welcome suggestions for Europeana Research’s future activities and improving services.
Session J
Text mining in practice: A discussion on user-applied text mining techniques in historical research.
Language: English, Duration: 60 minutes
In this panel we look at the application of text mining techniques in historical research. In recent years, text mining has come within reach of any vaguely computer-literate scholar. The growing availability of large digital text collections leads to growing abilities to apply digital and quantitative approaches to the study of historical texts. Commonly used languages and statistical environments such as Python and R offer applicable software solutions for free. This has liberated historians and other humanities scholars from the shackles of time-consuming and often expensive programming work by hired external programmers.
Techniques like topic modelling, word embeddings, sentiment and emotion mining are increasingly being used in the humanities and social sciences. Historians, political scientists, sociologists and others now have the opportunity to use advanced text mining techniques on large datasets from their desktops. Although still mostly experimental, the potential gains now appear enormous.
It is often claimed that this enables researchers to study concepts and developments in longitudinal, systematic and quantitative ways that were impossible before. But what do these digital techniques really add to more traditional approaches? How can traditional approaches and innovative digital methodologies be paired in a meaningful and enriching manner? Does quantitative text analysis primarily provide context to existing knowledge, or is it a radical departure from what went before?
We believe that quantitative text analysis could well prove to be a dramatic, agenda-setting change. As yet, however, several problems need to be addressed. First, most of the techniques involved are less than a decade old, researchers are scattered among departments and disciplines, and there is as yet no overarching discussion about best practices, pitfalls and problems with methodology, nor even a shared platform to discuss basic technical problems. There is a distinct need for a better exchange of information and sharing of experience, both inside and outside the world of digital humanities.
A second problem that needs to be addressed is the slow advancement of new techniques in published research outside the narrow digital humanities world. Anecdotal evidence suggests that leading journals in the humanities, political and social sciences are not particularly keen on papers using text-mining methodologies. This unwillingness is at least in part inspired by the problem mentioned above. There are few established norms to evaluate the validity of new techniques. On the other hand, conservatism may also play a role.
A third problem, which also impacts publication opportunities, is that the bulk of publications using text-mining techniques are still primarily about text mining. The corpora used, and the research questions asked, in many cases still seem peripheral to technological glitz. It is of course useful to investigate the technical opportunities that new techniques have to offer, but for the wider dissemination of these techniques it will probably prove necessary to tackle existing research problems in various fields and show that this particular field of the digital humanities has something to offer to the study of history.
We propose to discuss these problems with a mixed panel of experienced text mining researchers from different (sub-)disciplines. Our central goal is to discuss practices for validation of techniques and methodologies. We want to come up with a proposal for integrating text mining techniques in historical research practice in a meaningful, substantive, and contributive way, and pave the way for the move of text mining into common research practice, beyond the current hype.
Chair:
• Dr. Ralf Futselaar (EUR/NIOD)
Panel members:
• Dr. Jesse de Does (IvdNT)
• Prof. dr. Yasuto Nakano (KGU, Japan)
• Dr. Martijn Schoonvelde (VU)
• Milan van Lange, MA (NIOD/UU)
Session K
Mapping Historical Leiden: The Creation of a Digital Atlas
Organiser: Arie van Steensel, University of Groningen ([email protected])
Panellist: Jaap Evert Abrahamse, Cultural Heritage Agency ([email protected])
Speakers: Ellen Gehring, Erfgoed Leiden en Omstreken ([email protected]); Roos van Oosten, Leiden University ([email protected]); Arie van Steensel, University of Groningen ([email protected])
The digital revolution has rendered maps even more useful for all kinds of purposes, such as navigating, locating services, or geotagging activities. Moreover, a growing array of digital technologies, applications and platforms offer new research opportunities for scholars in the humanities, for whom maps are both a source about the past and a tool to study the past, and they allow heritage organisations to unlock, visualise and analyse diverse historical and archaeological data and objects innovatively on the basis of geographical relations. It is beyond doubt that the spatial encoding of objects and textual information offers a new framework of analysis and enables us to better explore the experiences and meanings of space and place in the past.45 Tools, maps and data are often readily available for the study of the more recent past, but this is less the case for the pre-modern period. In general, it requires a considerable time investment to develop historical Geo Information Systems (GIS) and online mapping platforms. These efforts, however, pay off in the long run, since these applications open a whole range of new research opportunities and novel ways to present and visualise research results.46
This panel presents and critically discusses the first results of the Mapping Historical Leiden project, which aims to develop a dynamic digital atlas of the pre-modern city of Leiden. The first phase of this project, a collaboration between historians, archaeologists and Leiden’s heritage organisation (Erfgoed Leiden en Omstreken), was recently completed (the first version of the atlas is accessible online at hlk.erfgoedleiden.nl, in Dutch). The mapping tool still requires further technical improvements to make it easier to upload and analyse additional data, and more geocoded datasets will become available in the coming months. The tool enables users to link, identify and search data across place and time, rather than providing static snapshots of the urban space in the past.
Apart from its technical resources and aspects, the mapping tool’s research possibilities will be demonstrated by two case studies: one on the relation between space and wealth in sixteenth-century Leiden, and the other on the city’s sanitary infrastructure in the early modern period. Together, these presentations will offer an opportunity to discuss the possibilities of digital mapping tools and the value of collaboration between scholars and specialists from the heritage sector in the field of digital humanities, but also the practical and technical challenges of historical GIS and potential pitfalls of partnerships.
45 See, for example, Anne Kelly Knowles and Amy Hillier, eds., Placing History: How Maps, Spatial Data, and GIS Are Changing Historical Scholarship (Redlands, Calif: ESRI Press, 2008); David J. Bodenhamer, John Corrigan, and Trevor M. Harris, eds., The Spatial Humanities: GIS and the Future of Humanities Scholarship (Bloomington: Indiana University Press, 2010); Alexander von Lünen and Charles Travis, eds., History and GIS: Epistemologies, Considerations and Reflections (Dordrecht: Springer, 2013); Ian N. Gregory and A. Geddes, eds., Toward Spatial Humanities: Historical GIS and Spatial History (Bloomington: Indiana University Press, 2014).
46 See, for example, Onno Boonstra and Gerrit Bloothooft, eds., Tijd en ruimte: nieuwe toepassingen van GIS in de alfawetenschappen (Utrecht: Matrijs, 2009); a theme issue of PCA. Post Classical Archaeologies 2 (2012) on GIS for archaeologists and historians; Hélène Noizet, Boris Bove, and Laurent Jacques Costa, eds., Paris de parcelles en pixels: analyse géomatique de l’espace parisien médiéval et moderne (Saint-Denis: Presses Universitaires de Vincennes, 2013); Nicholas Terpstra and Colin Rose, eds., Mapping Space, Sense, and Movement in Florence: Historical GIS and the Early Modern City (New York: Routledge, 2016).
Presentation 1 (Ellen Gehring): One Size Fits All? Developing a Multi-Functional Digital Mapping Tool
Building a cutting-edge map application for scholars, heritage managers and the general public is a major challenge in technical and methodological terms. Mapping Historical Leiden has overcome some of the barriers, and this presentation focuses on the technical aspects of the mapping tool. Crucial for the project, for example, was the development of a so-called historical geocoder, which makes it possible to link different geometric forms and to define their relations. Apart from technicalities, it will further be shown how very diverse data can be standardised through an advanced use of databases to ensure meaningful spatial analyses. The code of the mapping tool is available as open source, and since it is unnecessary for others to reinvent the wheel, it will finally be explained how the tool can be utilised in other contexts.
Presentation 2 (Arie van Steensel): Wealth and Place in Late Medieval Leiden: a Parcel-Based Analysis
Leiden has a unique source, the so-called Book of Waterways and Streets, which contains about a hundred cadastral maps that were drawn for fiscal purposes in the second half of the sixteenth century. In this presentation, it will first be demonstrated how these maps were turned into a georeferenced base map. Secondly, it will be shown how this sixteenth-century pre-cadastral map can be used to analyse the relation between wealth and space in the city of Leiden at parcel level, resulting in a more refined understanding of the complex relationship between occupation, wealth and place, which challenges common assumptions about the social geography of premodern cities and towns. The main point to be made is that historical GIS makes it possible to reinterpret sources that inform us about the importance of space and locality in structuring human interactions, as well as to present these data in an attractive and accessible way.
Presentation 3 (Roos van Oosten): Was sanitary infrastructure a privilege?
Scholars have generally accepted that sanitary infrastructure was the privilege of the wealthy few. However, with the uncovering of hundreds of cesspits and water supply facilities in the town of Leiden in the past decades, this assumption can now be tested for different time periods. In order to investigate the question of access to sanitary arrangements, the archaeologically documented sanitary structures must be plotted and a financial valuation attached to them. Socio-economic data based on tax registers are available from about 1600, which will be most useful in this venture. Furthermore, thanks to HISGIS, we also have access to socio-economic data from 1832, which will allow us to establish a long-term perspective on the development of Leiden's sanitary infrastructure.
Session L
1. Was the Ferguut written by one or two authors? Theo Meder, Gosse Bouma, Hannah Mars, Trudy Havinga (RUG)
In 1989, Willem Kuiper published his thesis on the Middle Dutch romance Ferguut, in which he concluded that the romance was written by two authors. Kuiper showed differences in writing style at all levels (rhyme, syntax, vocabulary, spelling) and concluded that this was no coincidence. According to Kuiper, the first author translated the Old French Fergus by Guillaume le Clerc up to approximately vs. 2592, after which the second author completed the second half without a French example: in the spirit of Fergus, but in his own words. Nowhere in the text is there a clear reference to dual authorship (cf. the Roman van Walewein), but the style break halfway through the text was nevertheless something that a scholar like Eelco Verwijs noticed as well. Other researchers questioned or denied the finding that the Ferguut was written by two authors, such as W.J.A. Jonckbloet and, after the appearance of Kuiper's thesis, also Bart Besamusca and Mike Kestemont. With Kestemont's thesis we entered the era of e-humanities. Whereas Kuiper had to do his quantitative style analysis by hand, today the programming language R in combination with the stylometric package Stylo can perform the job much faster, more thoroughly and completely without bias (Stylo doesn't know or care what texts it is presented with or what the outcome may be, whereas human researchers may be influenced by preconceived ideas). In its analysis, the software takes into account not only all the differences (as Kuiper did), but all the similarities as well, even at levels of which writers and readers are hardly aware, such as word order and the use of function words. At this level every author leaves his most personal fingerprint behind.
Somewhat cautiously, Kestemont ultimately assumes that Ferguut was written by one author who, as a translator, drew on a different register than as a free writer. Because Ferguut plays no prominent role in Kestemont's investigation, we want to zoom in on this particular romance. The central question: was the Ferguut written by one or two authors?
In order to investigate whether the two parts of the Ferguut are stylistically similar, we compare the similarity between the two parts of the Ferguut with the similarity between two or three parts of other 'randomly' selected Middle Dutch texts from around the same period and region, most of them dealing with courtly life. Seven texts we know to have been written by a single author; an eighth text we know to have been written by two authors. We involve the following texts in the analysis: Ferguut, Beatrijs, De Borchgravinne van Vergi, Lanceloet en het hert met de witte voet, Van den vos Reynaerde by Willem (the Aernout mentioned in the preface is the author of an Old French Renart tranche), three poems (a deliberate misfit) by Willem van Hildegaersberch (Van den Serpent, Van den Paep die sijn Baeck gestolen wert, Van den Wijnvaet) and De Roman van Walewein. For this experiment we looked at the complete texts, and cut up the longer texts into two or three even pieces in case there were no clear textual divisions. All the editions had to be thoroughly cleaned and converted to txt format.
Only De Roman van Walewein was most certainly written by two authors: up to about two-thirds of the total number of verses, the story is written by Penninc (vs. 1–7880); the last part is written by Pieter Vostaert (vs. 7881–11198). For the analysis we therefore cut this text into three pieces, so that the third part is written by Vostaert. As an experiment, we cut Van den Vos Reynaerde into three even pieces. The other longer texts we cut into two even pieces. Ferguut is cut at the location where the style transition should occur, that is, the place where the second author took over from the first, according to Kuiper. All these texts and fragments are then presented to Stylo for analysis. In this way, we can compare the similarity between the two parts of the Ferguut with the similarity between the two or three parts of a number of texts that we know were written by a single author, and with the three parts of a text which we know was written by two authors. If the stylometric analysis shows that the two parts of the Ferguut look as much alike as two parts of the texts of one author, and resemble each other more than the first two and the third part of the Walewein, that indicates that the Ferguut was also written by one author. If the analysis shows that the two parts of the Ferguut look less alike than the two parts of the texts of one author, and just as much as, or less than, the three parts of the Walewein together, this may indicate that the Ferguut was written by two authors.
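The comparison logic described here can be illustrated outside Stylo. The following is a minimal Python sketch, not the Stylo R package and not the pipeline actually used for this study: the function names are hypothetical, and the choice of a Burrows-style Delta (mean absolute difference of z-scored n-gram frequencies) as the distance is an assumption made for illustration only.

```python
from collections import Counter
from statistics import mean, pstdev


def split_even(tokens, parts):
    """Cut a token list into `parts` pieces of (almost) equal length."""
    size, rest = divmod(len(tokens), parts)
    pieces, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < rest else 0)
        pieces.append(tokens[start:end])
        start = end
    return pieces


def ngram_profile(tokens, n=3):
    """Relative frequency of every word n-gram in one text piece."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def delta_matrix(profiles, top=100):
    """Pairwise Burrows-style Delta over the `top` most frequent n-grams.

    `profiles` maps a piece name (e.g. "ferguut_1") to its n-gram profile.
    Assumption: this distance is illustrative; it is not claimed to be
    the measure configured in Stylo for the analysis reported here.
    """
    names = sorted(profiles)
    overall = Counter()
    for profile in profiles.values():
        overall.update(profile)
    features = [g for g, _ in overall.most_common(top)]
    if not features:
        raise ValueError("no n-grams found in any piece")
    # z-score every feature across all pieces
    z = {name: [] for name in names}
    for g in features:
        values = [profiles[name].get(g, 0.0) for name in names]
        mu, sigma = mean(values), pstdev(values)
        for name, v in zip(names, values):
            z[name].append((v - mu) / sigma if sigma else 0.0)
    return {
        (a, b): sum(abs(x - y) for x, y in zip(z[a], z[b])) / len(features)
        for a in names for b in names if a < b
    }
```

With such a matrix, the decision rule above amounts to checking whether the distance between the two Ferguut pieces is closer to the within-author distances (for instance between the Reynaert pieces) or to the distance between the Penninc and Vostaert parts of the Walewein.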
In the graph, based on word tri-grams, Stylo shows what many already expected: all texts and writers cluster neatly together (N.B.: the same happens with word bi-grams and with character bi-grams and tri-grams. As one can see in the graph, in hindsight the full texts need not have been included, but we wanted to be very sure we would not encounter any nasty surprises). The three parts of the Reynaert are stylistically most alike, the two parts of the Beatrijs mostly resemble each other, Vergi part 1 looks most like Vergi part 2, etc. The two parts of Ferguut also match each other stylistically rather than any other text. Even the exemplum, the jest and the song of Hildegaersberch share the style of one and the same author. Only Walewein exhibits the expected deviation: part 3 wanders off and positions itself somewhere between Ferguut and Reynaert, rather than next to the other parts of the Walewein. This graph of the stylometric analysis justifies no other conclusion than that the Walewein was written by two authors, but Ferguut by one author. Furthermore, it shows that the three Arthurian romances and Reynaert cluster together, and that the courtly, religious and moralistic texts form a separate group. We experimented with all kinds of different parameters, but the results (practically) remained the same. Rolling delta produced nothing conclusive. Only cutting up the Ferguut into even smaller pieces and clustering them reproduced the style differences that Kuiper discovered on the basis of small pieces of comparison, but at the cost of losing the long-range overview of similarity across the text material.
Reservations can be made about the techniques used: stylometrics works better with longer texts than with shorter ones; stylometrics works better on Standard Modern Dutch than on Middle Dutch texts with their unstable spelling; stylometrics works better on Middle Dutch rhyme pairs; all the editions should be either diplomatic or critical or in any other way normalized/standardized; et cetera.
Still, all things considered, based on multiple stylometric examinations, Stylo sees more similarities than differences between the two parts of the Ferguut, both at the level of word order and in the use of function words, traits that are considered to be rather personal for each author. The Ferguut was most probably written by one author. In writing the second half of the text, the author may have been inspired, also stylistically, by the fairy tale known as ATU 314A The Shepherd and the Three Giants, which was present in the Old French Fergus as well. What we already knew about Walewein is confirmed: the last part of the romance shows more stylistic differences than similarities compared to other romances like the Reynaert and even Ferguut, and therefore Walewein was written by two authors. Finally, it is good to know now that one author could have several stylistic registers: one for when he translated, and one for when he freely retold a story.
References
B. Besamusca: 'De Vlaamse opdrachtgevers van Middelnederlandse literatuur: een literair-historisch probleem', in: De nieuwe taalgids 84 (1991), p. 150-162.
A.Th. Bouwman: Reinaert en Renart. Het dierenepos Van den vos Reynaerde vergeleken met de Oudfranse Roman de Renart. 2 parts, Amsterdam 1991.
W. Bisschop & E. Verwijs (eds.): Willem van Hildegaersberch: Gedichten. 's-Gravenhage 1870.
K.H. van Dalen-Oskam: De stijl van R. Amsterdam 2013.
T. Dekker, J. van der Kooi & T. Meder: Van Aladdin tot Zwaan kleef aan. Lexicon van sprookjes: ontstaan, ontwikkeling, variaties. Nijmegen 1997.
M. Draak (ed.): Lanceloet en het hert met de witte voet. 6th imprint, Den Haag 1979.
M. Eder, J. Rybicki & M. Kestemont: 'Stylometry with R: a package for computational analyses', in: The R Journal (2016), as download: https://journal.r-project.org/archive/accepted/eder-rybicki-kestemont.pdf
G.A. van Es (ed.): De jeeste van Walewein en het schaakbord. Zwolle 1957.
J.D. Janssens, R. van Daele & V. Uyttersprot (eds.): Van den Vos Reynaerde. Het Comburgse handschrift. 2nd imprint, Leuven 1998.
W.J.A. Jonckbloet (ed.): Beatrijs. Eene sproke uit de XIIIe eeuw. Den Haag 1841.
W.J.A. Jonckbloet: Geschiedenis der Nederlandsche letterkunde. 4th imprint, Groningen 1888, part 1.
M. Kestemont: Het gewicht van de auteur. Stylometrische auteursherkenning in Middelnederlandse literatuur. Gent 2013.
P. de Keyser (ed.): De Borchgravinne van Vergi. Antwerpen 1943.
W. Kuiper: Die riddere metten witten scilde. Oorsprong, overlevering en auteurschap van de Middelnederlandse Ferguut, gevolgd door een diplomatische editie en een diplomatisch glossarium. Amsterdam 1989.
E. Rombauts, N. de Paepe & M.J.M. de Haan (eds.): Ferguut. Den Haag 1982.
E. Stamatatos: 'A survey of modern authorship attribution methods', in: Journal of the Association for Information Science and Technology 60 (2008) 3, p. 538–556.
H.-J. Uther: The Types of International Folktales. A Classification and Bibliography. 3 volumes. Helsinki 2004.
2. Stylometry applied to book preferences Peter Boot, [email protected]
Introduction
One of the oldest and most active fields in Digital Humanities is authorship attribution. It has been shown many times that writers have a characteristic style that can be used to tell them apart (e.g. Burrows, 2002). It is also well known that word usage can be used to predict personality characteristics (e.g. Noecker, Ryan, & Juola, 2013). Personality characteristics in turn are related to preferences in different art forms (e.g. Cantador, Fernández-Tobías, Bellogín, Kosinski, & Stillwell, 2013). This suggests that, as one would hope, the stylistic differences whereby we tell authors apart (such as differences in function word usage) are not just meaningless preferences for one function word over another, but are related to artistic preference, in a way that is still to be clarified.
This paper, continuing earlier work (Boot, 2014), tries to contribute to that clarification, in that it will remove the middle term (the personality characteristics) and show that there is a direct relation between the words that people use and their preferences in art, in this case, for books. The writers that I study here are the writers of book reviews, not books. In the first section, I will use book reviews and ratings from book discussion sites and show correlations between word usage and book ratings. In the second section, I will take an exploratory approach and create a clustering of reviewers by word usage. For the two clusters, I will then look at their preferred word usage, as well as the word usage in the book descriptions of their preferred books.
Correlations between word usage and ratings
The data that the paper uses were collected from a number of Dutch book discussion sites. These sites include hebban.nl, lezerstippenlezers.be, bol.com and the now defunct sites watleesjij.nu and dizzie.nl.
The correlations were computed as follows: I selected reviews from users who had written at least 100,000 characters, excluding some users with multiple accounts. I computed relative word frequencies in their reviews and normalized the results (centred around zero and divided by the standard deviation). In order to remove words with thematic links to books (murder, war, castle, love), I limited the computation to words defined as function words in the Dutch LIWC 2007 dictionary (Boot, Zijlstra, & Geenen, 2017, in press). For the same users I retrieved the book ratings and created a matrix of users by rating, excluding books that were rated only once. I computed the bias-corrected distance correlation (a multivariate generalization of the correlation coefficient, see Székely & Rizzo, 2013) between the two matrices, and repeated that computation for reviews in all genres, in literature and in the literary thriller. The results are given in the first row of Table 1.
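The two computational steps named in this paragraph, normalizing relative word frequencies and computing a bias-corrected distance correlation between two matrices with one row per user, can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the paper: all function names are hypothetical, the U-centring follows the definition in Székely & Rizzo (2013) as I understand it, and the sketch assumes complete matrices (how unrated books were handled is not stated here).

```python
import math


def relative_frequencies(tokens, vocabulary):
    """Share of a reviewer's tokens taken up by each word in `vocabulary`."""
    total = len(tokens) or 1
    return [tokens.count(word) / total for word in vocabulary]


def zscore_columns(matrix):
    """Centre each column around zero and divide by its standard deviation."""
    columns = []
    for col in zip(*matrix):
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col))
        columns.append([(v - mu) / sd if sd else 0.0 for v in col])
    return [list(row) for row in zip(*columns)]


def _u_centred_distances(rows):
    """Pairwise Euclidean distances between rows, U-centred (zero diagonal)."""
    n = len(rows)
    d = [[math.dist(a, b) for b in rows] for a in rows]
    row_sum = [sum(r) for r in d]  # the matrix is symmetric: row sums = column sums
    total = sum(row_sum)
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                u[i][j] = (d[i][j]
                           - row_sum[i] / (n - 2)
                           - row_sum[j] / (n - 2)
                           + total / ((n - 1) * (n - 2)))
    return u


def bias_corrected_dcor(x_rows, y_rows):
    """Bias-corrected distance correlation between two row-aligned matrices."""
    n = len(x_rows)
    if n != len(y_rows) or n < 4:
        raise ValueError("need the same number of rows (at least 4) in both matrices")
    a = _u_centred_distances(x_rows)
    b = _u_centred_distances(y_rows)

    def inner(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * (n - 3))

    denom = inner(a, a) * inner(b, b)
    return inner(a, b) / math.sqrt(denom) if denom > 0 else 0.0
```

The two matrices need not have the same number of columns: one may hold 200 function-word frequencies per user and the other that user's ratings, as long as the rows refer to the same users in the same order.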
To be absolutely sure that no content aspects of the reviews were reflected in the word usage, I repeated the computation using part-of-speech tags. The texts were tagged using TreeTagger and instead of the relative word frequencies I used relative frequencies of POS bigrams. The results are given in the second row of the table.
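How relative frequencies of POS bigrams can be derived from tagger output may be easier to see in a short sketch. The function name is hypothetical, the tag labels in the usage example are placeholders rather than the actual TreeTagger tagset for Dutch, and counting bigrams per sentence (so that no bigram spans a sentence boundary) is an assumption.

```python
from collections import Counter


def pos_bigram_frequencies(tag_sequences, top=100):
    """Relative frequencies of the `top` most frequent POS bigrams.

    `tag_sequences` holds one list of POS tags per sentence.
    """
    counts = Counter()
    for tags in tag_sequences:
        counts.update(zip(tags, tags[1:]))
    total = sum(counts.values())
    if not total:
        return {}
    return {bigram: c / total for bigram, c in counts.most_common(top)}


# Placeholder tags, for illustration only
example = pos_bigram_frequencies([["DET", "NOUN", "VERB"], ["DET", "NOUN"]])
```

The resulting per-reviewer vectors can then take the place of the function-word frequencies in the correlation computation.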
Table 1: Correlations with p-values

                                    All genres        Literature        Literary thriller
                                    189 reviewers     41 reviewers      32 reviewers
                                    166 reviews       126 reviews       88 reviews
                                    (avg.)            (avg.)            (avg.)
function words (200) vs. ratings    0.20 (0.000)      0.16 (0.000)      0.41 (0.000)
POS bigrams (100) vs. ratings       0.16 (0.000)      0.10 (0.002)      0.22 (0.000)
It is hard to interpret these correlation sizes, but it is clear that there are highly significant correlations between function word usage and book ratings. The fact that these correlations persist even when looking at POS bigrams shows that the relation is to some extent based purely on linguistic style, not on content. Why sequences of POS tags should be related to literary preference is an intriguing question that this paper will not solve.
Exploratory analysis
To get a feel for what this correlation might mean in terms of real reviews and ratings, I created a clustering based on function word usage for a group of reviewers. I removed a few outliers and was left with two clusters, cluster 1 containing 20 reviewers and cluster 2 containing 11.
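The clustering method is not specified here, so the following is only a sketch of one way to split reviewers into two groups from their normalized function-word vectors: a small deterministic k-means with k = 2. The algorithm choice, the initialization and the function name are all assumptions made for illustration, not a description of the procedure actually used.

```python
import math


def two_means(points, iterations=100):
    """Split points into two clusters with a deterministic k-means (k = 2).

    Initial centres: the first point and the point farthest from it
    (an arbitrary but reproducible choice).
    """
    first = points[0]
    second = max(points, key=lambda p: math.dist(p, first))
    centres = [list(first), list(second)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            0 if math.dist(p, centres[0]) <= math.dist(p, centres[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in (0, 1):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centres[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels
```

Each point would be one reviewer's vector of z-scored function-word frequencies; outliers would have to be removed beforehand.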
I then looked at their reviews and preferred books. A sample of reviews from cluster 1 showed their informal, direct and very personal writing, characteristics that were much less prominent in cluster 2. This impression is confirmed when looking at contrastive keywords in the reviews of both clusters. The 20 keywords with the largest effect size (Gabrielatos & Marchi, 2011) for both clusters are shown in Table 2. It is clear that cluster 1 prefers the first person, while cluster 2 shows more interest in writing.
Table 2

Cluster   Preferred review words
1         thought (was of the opinion), very, because, completely, me, actually, therefore, read (past part.), beautiful, after all, had, have (1st pers. sing.), am, I, very, all, good, otherwise, yet, again
2         writer (fem.), writer, novel, reader, years, under, know, these, characters, one, between, gives, second, the, them, of, until, end, in, who
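Contrastive keywords ranked by effect size can be computed along the following lines. The sketch uses %DIFF, the percentage difference between normalized frequencies in a study corpus and a reference corpus, which is an effect-size metric discussed by Gabrielatos & Marchi (2011); that this exact metric and these settings were used for the tables here is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def percent_diff_keywords(study_tokens, reference_tokens, top=20, min_count=1):
    """Rank words by %DIFF between a study corpus and a reference corpus.

    %DIFF = (NF_study - NF_reference) * 100 / NF_reference,
    where NF is the frequency normalized by corpus size.
    Words absent from the reference corpus are skipped here, because
    %DIFF is undefined for a zero reference frequency (a simplification).
    """
    study, reference = Counter(study_tokens), Counter(reference_tokens)
    n_study, n_reference = len(study_tokens), len(reference_tokens)
    scores = {}
    for word, count in study.items():
        if count < min_count or reference[word] == 0:
            continue
        nf_study = count / n_study
        nf_reference = reference[word] / n_reference
        scores[word] = (nf_study - nf_reference) * 100 / nf_reference
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]
```

For the comparison of two clusters, the reviews of one cluster serve as the study corpus and those of the other as the reference corpus, and then the roles are swapped.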
Turning to the ratings, while there were many books that were rated significantly higher by one of the groups, the preferences were hard to understand in terms of taste. Ratings summed by genre didn't show a very clear picture either. It was only when looking at contrastive word usage in the (publisher-provided) book descriptions for books read by either cluster that a clearer picture emerged.
Table 3

Cluster   Keywords in preferred book descriptions
1         thriller, investigation, police, murdered, murder, case, body, someone, further, secret, above, know, very, sits, very, disappeared, within, nothing, appears, found, become, part, truth, books, there, something, else
2         in which, without, about, parents, family, city, big stories, last, exist, us, we, writer, history, love, country, tells, century, novel, Netherlands, war
Here it becomes clear that cluster 1 prefers thrillers and police novels, while cluster 2 has a less focussed interest in family, writing and the country. It is worthwhile to repeat that these clusters of content words result from clustering reviewers on the basis of function words.
Conclusion
Taken together, the correlations and the exploratory analysis show that there is a relation between the function words that people use and their preferences for books. This relation still holds at the level of part-of-speech tags. This clearly shows that the word usage that helps tell authors apart is to some extent related to artistic preference. A possible explanation would be that the reviewers unconsciously imitate the books they read in their use of function words. That seems unlikely, among other reasons because the effect is also visible when we just look at the reviews in a single genre (second and third column of Table 1). The more likely explanation is that function word usage is at least in part determined by artistic preference and related personality characteristics. The 'fingerprint' metaphor that is often used in this context, with its suggestion of an essentially random identifier, unlikely to be related to artistic preference, must therefore be considered inappropriate.
Literature
Boot, P. (2014). Dimensions of literary appreciation. Word use and ratings on a book discussion site. Digital Humanities 2014. Retrieved from http://dharchive.org/paper/DH2014/Paper-825.xml
Boot, P., Zijlstra, H., & Geenen, R. (2017, in press). The Dutch translation of the Linguistic Inquiry and Word Count (LIWC) 2007 dictionary. Dutch Journal of Applied Linguistics, 6(1).
Burrows, J. (2002). 'Delta': A measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing, 17(3), 267-287.
Cantador, I., Fernández-Tobías, I., Bellogín, A., Kosinski, M., & Stillwell, D. (2013). Relating Personality Types with User Preferences in Multiple Entertainment Domains. Proceedings of the 1st Workshop on Emotions and Personality in Personalized Services (EMPIRE 2013), at the 21st Conference on User Modeling, Adaptation and Personalization (UMAP 2013).
Gabrielatos, C., & Marchi, A. (2011). Keyness: Matching metrics to definitions. Theoretical-methodological challenges in corpus approaches to discourse studies - and some ways of addressing them.
Noecker, J., Ryan, M., & Juola, P. (2013). Psychological profiling through textual analysis. Literary and Linguistic Computing, 28(3), 382-387.
Székely, G. J., & Rizzo, M. L. (2013). The distance correlation t-test of independence in high dimension. Journal of Multivariate Analysis, 117, 193-213.
3. Corpus enrichment for 17th century Dutch: a pilot study Feike Dietz1, Marjo van Koppen2, Irene Kramer1 and Marijn Schraagen2 1 Institute for Cultural Inquiry, 2 Utrecht Institute of Linguistics OTS, Utrecht University
1 Introduction
The Dutch language in the 17th century was a mixture of fading linguistic properties from the preceding language phase, Middle Dutch, and upcoming new ways to construct words and sentences. Within these language dynamics we observe a type of language variation that has rarely been addressed before: variation within individual language users (intra-author variation). The aim of the current project is to describe and analyse in detail the linguistic and literary/rhetorical contexts in which intra-author variation occurs. As a prerequisite, the data needs to be annotated linguistically, using part of speech (POS) information and (morpho-)syntactic structure, and sociolinguistically, describing various factors that influence language use.
In a pilot project we restrict our research to the letters of the famous Dutch author and politician P.C. Hooft, written between 1600 and 1638. This collection is relatively large (approximately 800 letters, ~300,000 words) and contains sociolinguistic variation in type of correspondent and type of letter. The corpus can be used, among other things, to study the loss of negative concord in Dutch, which is observed in Hooft's letters from this period (Paardekooper, 2016).
As a starting point for obtaining POS tags, the Adelheid tagger for Middle Dutch (van Halteren and Rem, 2013) is used. Because the tagger is trained on Middle Dutch, the results are not highly accurate for 17th century texts. Therefore, a correction procedure for POS tags and lemmas is performed by human annotators. Additionally, the annotators provide the necessary sociolinguistic information about letters and correspondents. When annotation is completed, a detailed and systematic analysis of linguistic phenomena will become feasible.
2 Approach
The source data is available in a diplomatic edition (Van Tricht, 1976). We use this edition after separating Hooft's original seventeenth-century texts from the metadata (page numbers, footnotes, annotations).
Figure 1: Example of the newly developed annotation tool
2.1 Part-of-Speech tagging
A collaboration with the Nederlab project (Brugman et al., 2016) is established to increase availability of the enriched corpus, by including the POS tagging and sociolinguistic metadata in the Nederlab research infrastructure. The integration necessitates conversion of the CRM tagset used by Adelheid to the CGN tagset used by Nederlab. Additionally, the tagging needs to be represented in the FoLiA XML format for linguistic annotation (van Gompel and Reynaert, 2013). The CRM tagset is more extensive than CGN, notably in the use of surface form features such as form-e (words ending in -e). Surface form features are related to case marking, which is an important aspect in the study of linguistic variation in 17th century Dutch. Therefore, we decided to keep these features in the mapping to CGN tags (see Figure 1).
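The idea of carrying surface form features over into the target tags can be made concrete with a small sketch. It assumes tags written as a head followed by comma-separated features in parentheses, as in the guideline examples (e.g. N(ev,non-nom,form-s)). The function names are hypothetical, and the mapping table passed in is a placeholder: the real CRM-to-CGN correspondence is not given here and is not reproduced.

```python
def parse_tag(tag):
    """Split a tag such as 'N(ev,non-nom,form-s)' into head and feature list."""
    head, _, rest = tag.partition("(")
    features = [f.strip() for f in rest.rstrip(")").split(",") if f.strip()]
    return head.strip(), features


def map_tag(tag, head_map):
    """Rename the tag head via `head_map` while keeping surface form features.

    `head_map` is a hypothetical lookup from source heads to target heads;
    heads missing from it are left unchanged. Features starting with
    'form-' are moved to the end but never dropped.
    """
    head, features = parse_tag(tag)
    surface = [f for f in features if f.startswith("form-")]
    other = [f for f in features if not f.startswith("form-")]
    return f"{head_map.get(head, head)}({','.join(other + surface)})"
```

For example, with the placeholder mapping {"SRC_ART": "LID"}, the made-up source tag SRC_ART(bep,form-n) becomes LID(bep,form-n).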
2.2 Sociolinguistic tagging
A key hypothesis in intra-author variation is the influence of sociological factors on linguistic choices. To evaluate this hypothesis systematically, all letters are being annotated with the following information:
• Goal: express thanks, ask advice, recommend, invite
• Topic: politics, religion, personal affairs, administration
• For individual correspondents:
  o name, gender, year of birth and death
  o status of correspondent as literary author
  o relation to Hooft: family members, literary friends, politicians, etc.
• For group correspondents:
  o name
  o domain: government, financial or legal institutions, civil associations
• Letter structure: greeting, introduction, narratio, closing formulas
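The annotation scheme listed above could be captured in a record structure such as the following. This is only a sketch of one possible representation: the class and field names are hypothetical and do not describe the data model of the annotation tool or of Nederlab.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IndividualCorrespondent:
    name: str
    gender: Optional[str] = None
    year_of_birth: Optional[int] = None
    year_of_death: Optional[int] = None
    literary_author: Optional[bool] = None   # status as literary author
    relation_to_hooft: Optional[str] = None  # e.g. family member, literary friend, politician


@dataclass
class GroupCorrespondent:
    name: str
    domain: Optional[str] = None  # e.g. government, financial or legal institution, civil association


@dataclass
class LetterAnnotation:
    goal: str    # e.g. express thanks, ask advice, recommend, invite
    topic: str   # e.g. politics, religion, personal affairs, administration
    individuals: List[IndividualCorrespondent] = field(default_factory=list)
    groups: List[GroupCorrespondent] = field(default_factory=list)
    structure: List[str] = field(default_factory=list)  # e.g. greeting, introduction, narratio, closing formulas


# Placeholder values only; no real correspondent data is implied
letter = LetterAnnotation(
    goal="invite",
    topic="personal affairs",
    individuals=[IndividualCorrespondent(name="(placeholder correspondent)")],
    structure=["greeting", "narratio", "closing formulas"],
)
```

Keeping unknown values as None rather than guessing them makes it explicit which metadata still has to be supplied by an annotator.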
2.3 Annotation process
A tool has been developed (see Figure 1) to perform POS and sociolinguistic annotation in an efficient way. A pool of annotators is available for the task, who will perform partly overlapping annotations to allow for agreement measurements. The annotation process is currently ongoing. A protocol has been developed to guide the post-correction process (see Figure 2 for examples).
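Agreement on the overlapping portion of the annotations can be quantified in several ways. The sketch below computes Cohen's kappa for two annotators who labelled the same tokens; choosing this particular measure is an assumption, since the text does not say which agreement statistic will be used, and the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # observed proportion of items on which the annotators agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # agreement expected by chance, from each annotator's label distribution
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)
```

The inputs would be the POS tags (or sociolinguistic labels) that two annotators assigned to the shared items.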
Figure 2: Annotation guideline examples

Comparative and superlative adjectives are annotated individually. This rule is also applied for irregular adverbs, such as veel, meer, meest and wel/goed, beter, best. As an example, minste in the sentence below (1634, Van Tricht p. 527) receives a separate lemma minst:

. . . waer aen het minste deel niet en zal hebben, Me Joffre.

Nominatives and non-nominatives are differentiated. We chose not to denominate dative, genitive, accusative and ablative. Instead, the surface form, related to case marking, is annotated. An example from 1633 (Van Tricht p. 437):

Veel gelux[N(ev,non-nom,form-s)] met . . . den[LID(bep,form-n)] jongen[N(ev,non-nom,form-n)] Arnout, dien god geeve 't lof des[LID(bep,form-s)] geenen nae te ijvren, daer hij den naem af draeght.

3 Analysis
In related work (Kramer, 2016) the use of negation by Hooft has been studied manually. Kramer shows that Hooft uses mostly single negation in different syntactical environments (subclauses, inversion, main clauses, local negation, V1 (verb-initial) sentences). Additionally, the negation particle niet can be used as an alternative for the noun 'nothing'. Furthermore, Hooft uses bipartite negation in almost all syntactical environments as well (all except V1). In Kramer's research, no single environment seemed to particularly call for the use of bipartite negation. This research, however, encompassed only 107 letters. The fully annotated corpus will allow a more quantitative analysis, as well as a larger range and higher level of detail of linguistic phenomena.
Nobels and Rutten (2014) note the influence of gender and social class on negation (p. 41): 'while single negation spread from the north to the south, it also turned into a social variant, as the upper ranks in society and male letter writers seemed to be quicker to pick up on the incoming variant than the lower ranks and female letter writers'. Nobels and Rutten (2014) also note (p. 43) that traditions in letter writing affect linguistic development: 'fixed formulae were memorized as a whole (or copied) by writers from any social background. These fixed formulae occur in certain parts of the letters, mostly in the beginning and the ending'. With the current annotation effort, this type of observation can be studied systematically.
References
Brugman, H., Reynaert, M., van der Sijs, N., van Stipriaan, R., Tjong Kim Sang, E., and van den Bosch, A. (2016). Nederlab: Towards a single portal and research environment for diachronic Dutch text corpora. In Proceedings of LREC 2016.
van Gompel, M. and Reynaert, M. (2013). FoLiA: A practical XML format for linguistic annotation - a descriptive and comparative study. Computational Linguistics in the Netherlands Journal, 3:63–81.
van Halteren, H. and Rem, M. (2013). Dealing with orthographic variation in a tagger-lemmatizer for fourteenth century Dutch charters. Language Resources and Evaluation, 47(4):1233–1259.
Kramer, I. (2016). Variatie in negatie, een syntactische en retorische analyse van het gebruik van enkele en tweeledige negatie in de brieven van P.C. Hooft van 1633 tot 1638 aan Joost Baek en Tesselschade Roemersdochter Visser. BA thesis, Universiteit Utrecht.
Nobels, J. and Rutten, G. (2014). Language norms and language use in seventeenth-century Dutch: negation and the genitive. In Rutten, G., editor, Norms and usage in language history, 1600-1900. A sociolinguistic and comparative perspective, pages 21–48. John Benjamins Publishing Company.
Paardekooper, P. (2016). Bloei en ondergang van onbeperkt ne/en, vooral dat bij niet-woorden. Neerlandistiek.nl.
van Tricht, H. (1976). De briefwisseling van Pieter Corneliszoon Hooft. Tjeenk Willink/Noorduijn.