![Page 1: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/1.jpg)
April 21, 2023
WAHSP/BILAND
Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media
![Page 2: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/2.jpg)
Research team: Stephen Snelders (UU), Pim Huijnen (UU), Daan Odijk (ISLA, UvA), Fons Laan (ISLA), Maarten de Rijke (ISLA), Toine Pieters (UU)
![Page 3: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/3.jpg)
Research
Creating big-data resources
![Page 4: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/4.jpg)
National Library of the Netherlands, Digital Newspaper Archive
More than 10,000,000 pages
More than 1,200 titles
1618-1995
More than 30,000,000 articles
Still growing...
![Page 5: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/5.jpg)
How did/do you study 30 million newspaper articles?
![Page 6: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/6.jpg)
Dutch press on Germany, Frank van Vree (1989)
More than 1,200 titles
1618-1995
More than 31,000,000 articles
4
1930-1939
4,000
Sampling
![Page 7: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/7.jpg)
![Page 8: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/8.jpg)
Developing semantic document selection tools
![Page 9: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/9.jpg)
WE NEED:
A semi-automatic and interactive open-source application.
An application that does not replace, but supports, the intuition and insights of the historical researcher with expert knowledge of a specific topic or domain.
An application that is user-friendly.
![Page 10: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/10.jpg)
Problem:
Context and background of Dutch drug and eugenics debates over time
Aim:
Understanding and evaluating public debates around drugs, addiction and eugenics in the Netherlands, 1900-1945
Research question:
What are the dynamics (in terms of patterns and trends) of public debates and sentiments around drugs, addiction and eugenics in Dutch newspapers in the first half of the twentieth century?
![Page 11: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/11.jpg)
Poe’s detective finds the truth by using data in those newspaper articles that do not concern the murder.
In a similar way we will find terms and sentiments in those newspaper articles that may seem irrelevant, but are not.
![Page 12: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/12.jpg)
E-everything
Information-extraction
Recognize structure in text
Part of speech
Noun, verb, …
Entities
people, organisations, locations, temporal expressions, …
Relations
Who, what, with whom, how, why
![Page 13: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/13.jpg)
Information-extraction (2)
![Page 14: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/14.jpg)
Enjoyable, but what does it tell us?
![Page 15: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/15.jpg)
![Page 16: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/16.jpg)
Start query: Opium
![Page 17: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/17.jpg)
Drugs and drug policy
![Page 18: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/18.jpg)
Odijk D., de Rooij O., Peetz M-H., Pieters T., de Rijke M., Snelders S. (2012). "Semantic Document Selection", TPDL 2012: Theory and Practice of Digital Libraries: Springer, September.
![Page 19: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/19.jpg)
Combining and clustering queries
![Page 20: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/20.jpg)
By carefully inspecting the word counts, we found quantitative evidence for historical turning points that indicate the criminalization of the drugs debate around 1924.
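The kind of word-count inspection described here can be illustrated with a minimal sketch. It is not the project's actual tool: the function names `yearly_share` and `largest_rise` are hypothetical, and the articles are invented toy data, not taken from the newspaper archive. The sketch assumes that counts are normalized per year, since the number of digitized articles may differ from year to year.

```python
from collections import Counter
import re


def yearly_share(articles, term):
    """Map each year to the share of that year's articles mentioning `term`."""
    totals, hits = Counter(), Counter()
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    for year, text in articles:
        totals[year] += 1
        if pattern.search(text):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


def largest_rise(series):
    """Return (year, increase) for the biggest rise over the preceding year."""
    years = sorted(series)
    steps = [(series[cur] - series[prev], cur) for prev, cur in zip(years, years[1:])]
    if not steps:  # fewer than two years: no rise to report
        return None
    increase, year = max(steps)
    return year, increase


# Invented toy data (year, article text); NOT real archive content.
TOY_ARTICLES = [
    (1901, "opium in de apotheek"),
    (1901, "marktberichten"),
    (1902, "opium als geneesmiddel"),
    (1902, "scheepvaart"),
    (1903, "opium smokkel"),
    (1903, "opium en politie"),
]

shares = yearly_share(TOY_ARTICLES, "opium")
print(shares)
print(largest_rise(shares))
```

A jump found this way is only a candidate turning point; the historian still has to read the underlying articles to decide what it means.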
![Page 21: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/21.jpg)
Eugenics case: query overerving (inheritance), 1867
Primarily associations with health-related terms/entities
![Page 22: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/22.jpg)
Eugenics case
![Page 23: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/23.jpg)
Eugenics case: query overerving, 1935
In 1935, however, the medical context in which the term inheritance was used made way for a legal and racial context.
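A shift in context like this can be made visible by comparing the terms that co-occur with a query in different years. The following is a minimal sketch of that idea, not the application's implementation: `cooccurring_terms` is a hypothetical helper, and the article texts are invented for illustration only.

```python
from collections import Counter
import re


def cooccurring_terms(articles, query, year):
    """Count words appearing in the given year's articles that contain `query`."""
    counts = Counter()
    for article_year, text in articles:
        if article_year != year:
            continue
        tokens = re.findall(r"\w+", text.lower())
        if query in tokens:
            # Count every other word in the article, not the query itself.
            counts.update(token for token in tokens if token != query)
    return counts


# Invented toy texts; NOT quotations from the newspaper archive.
TOY_ARTICLES = [
    (1867, "overerving ziekte arts"),
    (1867, "overerving ziekte"),
    (1935, "overerving wet ras"),
    (1935, "overerving wet"),
    (1935, "voetbal uitslag"),
]

for year in (1867, 1935):
    print(year, cooccurring_terms(TOY_ARTICLES, "overerving", year).most_common(2))
```

A real corpus would also need stop-word filtering and some tolerance for OCR errors before the top co-occurring terms become interpretable.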
![Page 24: October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media](https://reader035.vdocument.in/reader035/viewer/2022081603/56649e8b5503460f94b9084e/html5/thumbnails/24.jpg)
E-Humanity Approaches to Reference Cultures: The Emergence of the United States in Public Discourse in the Netherlands, 1890-1990
Challenges:
1. OCR repair
2. Improving text-mining software and data infrastructure
3. Developing new historical research strategies
4. Educating historians and other humanities researchers
NEW HORIZONS in DIGITAL HUMANITIES